Regression Modeling for Prediction Using Python: A Short Course

An 8-Hour Livestream Seminar Taught by Edwin Dalmaijer, DPhil

Python is a general-purpose programming language. It is open-source, powerful, and easy to use. Because of this, Python is one of the most popular languages in the world, and it has become indispensable in data science.

In this course, we will cover various types of regression models. These are supervised statistical techniques: you supply a set of predictor variables and an outcome, and the model finds the parameters that best predict the outcome from those variables.
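To make this concrete, here is a minimal sketch of what "finding the parameters" can look like with scikit-learn. The data are simulated, and the coefficient values, sample size, and noise level are arbitrary choices for illustration rather than course material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated data (illustrative assumption): two predictors and one outcome
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))

# Ground-truth parameters that the model should recover
true_intercept = 0.5
true_coef = np.array([2.0, -1.0])
y = true_intercept + X @ true_coef + rng.normal(scale=0.1, size=n)

# Fit ordinary least squares: the estimated parameters are stored
# in the intercept_ and coef_ attributes of the fitted model
model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
```

Because the outcome was generated from known parameters, you can compare the printed estimates against `true_intercept` and `true_coef` to see how closely the fit recovers them.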

You will learn how to fit regression models to data and how to use trained models for prediction. You will be introduced to multivariable and multivariate linear regression, logistic regression, and mediation analysis. To guard against over-fitting, you will also learn how to divide datasets into training and test sets, and how to implement different types of cross-validation.
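The sketch below illustrates why a held-out test set matters. It assumes simulated data in which only one of 40 predictors truly affects the outcome (an arbitrary setup chosen to provoke over-fitting), then compares the fit on the training half with the fit on the unseen half.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Simulated data (illustrative assumption): 40 predictors, only the first matters
rng = np.random.default_rng(1)
n, p = 200, 40
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

# Hold out half of the data; the model never sees it during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R-squared; the training value is optimistic because
# the model has partly fitted noise in the 39 irrelevant predictors
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"R-squared on training data: {train_r2:.2f}")
print(f"R-squared on held-out data: {test_r2:.2f}")
```

The gap between the two scores is the over-fitting that train/test splits and cross-validation are designed to expose.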

The above will be implemented in scikit-learn, a Python package for machine learning. It is a powerful tool for data science, and its common interface will allow you to extend what you learn in this course to other models.
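As a rough illustration of that common interface, the sketch below runs the same `fit`, `predict`, and `score` calls on two different estimators. Ridge regression is used here only as an example of "another model" and is an assumption on our part, as are the simulated data and parameter values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Simulated data (illustrative assumption): two correlated predictors
rng = np.random.default_rng(2)
n = 200
cov = [[1.0, 0.6], [0.6, 1.0]]
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Every scikit-learn estimator exposes the same core methods,
# so swapping models only changes the line that constructs them
models = {
    "ordinary least squares": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
}

r2 = {}
predictions = {}
for name, model in models.items():
    model.fit(X, y)                      # estimate parameters
    predictions[name] = model.predict(X) # predicted outcomes
    r2[name] = model.score(X, y)         # R-squared for regressors
    print(f"{name}: R-squared = {r2[name]:.3f}")
```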

Starting October 24, we are offering this seminar as an 8-hour synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two 2-hour lecture sessions, separated by a 30-minute break, and each session includes hands-on exercises. You are encouraged to join the lectures live, but you will have the opportunity to view the recorded sessions later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 

Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian.

Join Dr. Dalmaijer for Unsupervised Statistical Learning Using Python on August 12-13 to learn how to use the scikit-learn package in Python to uncover subgroups and latent components in datasets with unsupervised machine-learning techniques.

Computing

To run hands-on exercises, we will be using carefully crafted interactive notebooks via Google Colaboratory. For this, you only need an internet browser (like Firefox) and a Google account.

Alternatively, you are welcome to install Python on your own computer. In addition to Python (version 3.7 or higher), you will need the packages NumPy, SciPy, Matplotlib, and scikit-learn. Python package installation can be a bit tricky for those who aren’t familiar with it. We will cover installing Python packages on the first day of the course, so you might want to wait to install anything until then.
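If you do install locally, a short script along these lines can report whether your setup meets the requirements listed above. It is only a sketch: it checks that each package can be found, not that a particular package version is installed.

```python
import sys
from importlib.util import find_spec

# Package names as listed above, mapped to the module names used in import statements
required = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}

# The course asks for Python 3.7 or higher
python_ok = sys.version_info >= (3, 7)
version = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {version}: {'OK' if python_ok else 'too old'}")

# find_spec returns None when a top-level module is not installed
status = {name: find_spec(module) is not None for name, module in required.items()}
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```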

Who Should Register?

This course is aimed at people who already know the basics of Python. This includes those who have taken Code Horizons’ Introduction to Python for Data Analysis.

The content leans towards data science, so this course will be especially useful to those who would like to expand their expertise in data handling, visualization, statistics, and basic machine learning.

Outline

Day 1: Linear regression

  • Recap of crucial skills
    • NumPy arrays
    • Loading data from files
  • Generating realistic fake data
    • Creating predictors with realistic covariance
    • Creating outcomes with ground-truth models
  • Linear regression
    • Understanding ordinary least squares
    • Using scikit-learn to do the hard work for you
  • Logistic regression
    • Predicting binary outcomes
  • Mediation analysis
    • Direct and indirect effects
    • Three regressions in a trench coat
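To give a flavour of the last two topics, here is a minimal sketch of a mediation analysis built from three ordinary regressions on simulated data. The path values, noise levels, and variable names are assumptions made for illustration, and this is one common way to set up the analysis rather than necessarily the approach taken in the course.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ground-truth model (illustrative assumption): x affects y directly,
# and also indirectly through the mediator m
rng = np.random.default_rng(3)
n = 500
a_true, b_true, direct_true = 0.8, 0.5, 0.3
x = rng.normal(size=n)
m = a_true * x + rng.normal(scale=0.5, size=n)
y = direct_true * x + b_true * m + rng.normal(scale=0.5, size=n)

X = x.reshape(-1, 1)  # scikit-learn expects a 2-D predictor array

# Regression 1: outcome on predictor gives the total effect
total = LinearRegression().fit(X, y).coef_[0]

# Regression 2: mediator on predictor gives path a
a = LinearRegression().fit(X, m).coef_[0]

# Regression 3: outcome on predictor and mediator gives
# the direct effect and path b
full = LinearRegression().fit(np.column_stack([x, m]), y)
direct, b = full.coef_

# The indirect effect is the product of the two mediated paths
indirect = a * b
print(f"total = {total:.3f}, direct = {direct:.3f}, indirect = {indirect:.3f}")
```

With ordinary least squares, the total effect equals the direct effect plus the indirect effect, which makes for a handy sanity check on the three fits.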

Day 2: Cross-validation and hold-out sets

  • Multivariable linear regression
    • Moving from one predictor to many
    • Using binary and continuous predictors
    • Dummy variables
  • Hold-out sets
    • Dividing data into training and test sets
  • Cross-validation
    • k-folds
    • Hold-one-out
    • Monte-Carlo
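The three cross-validation schemes listed above can be sketched with scikit-learn's splitter classes. We assume here that "hold-one-out" corresponds to `LeaveOneOut` and that Monte Carlo cross-validation corresponds to `ShuffleSplit` (repeated random train/test splits). The simulated data, with one continuous predictor and one binary dummy variable, are also an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import (
    KFold, LeaveOneOut, ShuffleSplit, cross_val_score
)

# Simulated data (illustrative assumption): a continuous predictor
# and a dummy variable coding a two-level category as 0 or 1
rng = np.random.default_rng(4)
n = 120
x_cont = rng.normal(size=n)
x_dummy = rng.integers(0, 2, size=n)
X = np.column_stack([x_cont, x_dummy])
y = 1.0 + 2.0 * x_cont + 1.5 * x_dummy + rng.normal(scale=0.5, size=n)

schemes = {
    "k-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "leave-one-out": LeaveOneOut(),
    "Monte Carlo": ShuffleSplit(n_splits=20, test_size=0.2, random_state=0),
}

# Mean squared error is used because R-squared is undefined
# when a test fold contains a single observation
mse = {}
for name, cv in schemes.items():
    scores = cross_val_score(
        LinearRegression(), X, y, cv=cv, scoring="neg_mean_squared_error"
    )
    mse[name] = -scores.mean()
    print(f"{name}: mean squared error = {mse[name]:.3f} over {len(scores)} splits")
```

All three schemes estimate the same out-of-sample error; they differ in how many models are fitted and how the test observations are chosen.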

Seminar Information

Thursday, October 24 – Friday, October 25, 2024

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:30am-12:30pm (convert to your local time)
1:00pm-3:00pm

Payment Information

The fee of $695 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.