Unsupervised Statistical Learning Using Python: A Short Course

An 8-Hour Livestream Seminar Taught by Edwin Dalmaijer, DPhil


Python is a general-purpose programming language. It is open-source, powerful, and easy to use. Because of this, Python is one of the most popular languages in the world, and it has become indispensable in data science.

In this course, we will cover two main classes of analytical approaches that aim to uncover what makes up a dataset by identifying separable dimensions or subgroups. These are unsupervised machine-learning techniques: you give them your data without any labelled outcomes, and they find structure in it on their own.

Starting August 12, we are offering this seminar as an 8-hour synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two 2-hour lecture sessions which include hands-on exercises, separated by a 30-minute break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian.

Computing

To run the hands-on exercises, we will use carefully crafted interactive notebooks in Google Colaboratory. For this, you only need a web browser (such as Firefox) and a Google account.

Alternatively, you are welcome to install Python on your own computer. In addition to Python (version 3.7 or higher), you will need the packages NumPy, SciPy, Matplotlib, and scikit-learn. Python package installation can be a bit tricky for those who aren’t familiar with it. We will cover installing Python packages on the first day of the course, so you might want to wait to install anything until then.
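If you do install locally, a quick check along the following lines can confirm that your interpreter and the four packages are in place. This is only a sketch and not part of the course materials; the `check_environment` helper is a name made up for illustration. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib
import sys

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn"]


def check_environment(min_python=(3, 7), modules=REQUIRED):
    """Report whether Python is recent enough and which modules can be imported.

    Returns a dict with a "python_ok" flag plus one entry per module:
    its version string, "unknown" if it has no __version__, or None if missing.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

Any entry that prints as `None` points to a package that still needs installing.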

Who Should Register?

This course is aimed at people who already know the basics of Python. This includes those who have taken Code Horizons’ Introduction to Python for Data Analysis.

The content leans towards data science, so this course will be especially useful to those who would like to expand their expertise in data handling, visualization, statistics, and basic machine learning.

Outline

Day 1: Dimensional analyses

Recap of crucial skills
- NumPy arrays
- Loading data from files
Generating realistic fake data
- Creating predictors with realistic covariance
- Creating outcomes with ground-truth models
Principal component analysis (PCA)
- Fitting a PCA
- Scree plots
Factor analysis
- PCA as a first step
- Rotations
Independent component analysis (ICA)
- Uncovering the signals that created an outcome
- Using ICA in noise reduction
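
To give a flavour of the Day 1 topics, here is a minimal sketch that generates fake predictors with a chosen covariance structure and then fits a PCA to them with scikit-learn. The covariance matrix, sample size, and seed are arbitrary assumptions picked for illustration, not values taken from the course notebooks.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(seed=1)

# Hypothetical covariance matrix: the first two variables correlate strongly,
# the third is nearly independent of them.
cov = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])
X = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=500)

# Fit a PCA and project the data onto its components.
pca = PCA(n_components=3)
scores = pca.fit_transform(X)

# Proportion of variance explained per component, in descending order.
# These are the values a scree plot displays.
explained = pca.explained_variance_ratio_
print(explained)
```

Because the first two variables share most of their variance, the first component should account for the largest share, which is the pattern a scree plot makes visible.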

Day 2: Subgroup analyses

Cluster analysis
- k-means
- The curse of dimensionality
- Evaluating outcomes
Hierarchical agglomerative clustering
- Distance and linkage
- Plotting a dendrogram
Fuzzy clustering
- c-means
- Fuzzy silhouette score
Mixture modelling
- Latent class analysis
- Latent profile analysis
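
As a small illustration of the Day 2 material, the sketch below runs k-means on simulated data for several candidate numbers of clusters and compares the solutions with the silhouette score, one common way of evaluating clustering outcomes. The blob centres, spread, and range of k are assumptions chosen so that the subgroups are clearly separated; real data are rarely this tidy.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Simulated data with three known subgroups (the ground truth is available
# only because the data are fake).
X, true_labels = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 0], [0, 6]],
    cluster_std=0.8,
    random_state=0,
)

# Fit k-means for several candidate k and score each solution.
silhouettes = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    silhouettes[k] = silhouette_score(X, labels)

# The k with the highest mean silhouette is the best-supported solution here.
best_k = max(silhouettes, key=silhouettes.get)
print(silhouettes, best_k)
```

The silhouette score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters.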

Seminar Information

Monday, August 12 – Tuesday, August 13, 2024

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:30am-12:30pm (convert to your local time)
1:00pm-3:00pm

Payment Information

The fee of $695 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.

Contact Information

+1 610-715-0115
info@statisticalhorizons.com