Statistical Computing with Python

A 3-Day Livestream Seminar Taught by Jason Anastasopoulos, Ph.D.


NOTE: this course is designed for those who have taken Introduction to Python for Data Analysis or who already have some experience with Python.

Python is rapidly becoming the preferred language of data scientists in both industry and academia. It’s used by Google, Facebook and other tech giants to perform data analysis and run machine learning algorithms that can handle hundreds of thousands of terabytes of data per day.

Python can be used for:

  • Storing and analyzing large and small datasets.
  • Web scraping and data collection using APIs.
  • Beautiful data visualization.
  • Natural language processing and text analysis.
  • General machine learning.
  • Deep learning.
  • Image analysis and much, much more…

This seminar is an intermediate course on statistical computing with Python. Its goal is to teach participants advanced data analysis and visualization applications of the Python language.

Starting January 6, we are offering this seminar as a 3-day synchronous*, livestream workshop. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. An additional lab session will be held Thursday and Friday afternoons, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.

More Details About the Course Content

By the end of this seminar, you will have gained skills in the following areas:

  • Natural language processing: Grasp the basics of natural language processing and sentiment analysis.
  • Advanced data visualization: Use advanced Python plotting functionality, including plotting geospatial data and text data.
  • Big data analysis and inference: Learn how to deal with massive data in Python.
  • Statistical inference: Perform data analysis and basic statistical inference with Python, including GLMs, ANOVA, and hypothesis testing.
  • Web-scraping: Scrape and parse semi-structured data, including HTML, XML, and JSON.
  • Databases: Create and extract information from SQL and MongoDB databases with Python.


This is a hands-on class that will involve at least two hours of structured and supervised assignments. To ensure that you are prepared, you must do the following BEFORE the first class:

➔ Download and install Anaconda Python 3.7+ Individual Edition for your operating system:

➔ Familiarize yourself with Google Colaboratory Python Notebooks:

You should also know how to access the command prompt (Windows users) or the terminal (Mac users). We will briefly review how to access these in class, but it will save you time and effort if you come already knowing these basics. Online resources can help you get started with the Windows Command Prompt or the Mac Terminal.


Participants receive access to a private repository containing all of the lecture notes, code and data needed for the class.

Participants interested in getting a jump start on some of the material should consider reading the “Python Data Science Handbook” by Jake VanderPlas. This book is not required, but it is recommended as a useful reference.

Who Should Register?

This seminar is designed for students who already have basic programming skills in Python and want to learn more advanced applications typically used by data scientists and academic researchers.

This course assumes that you have already completed Introduction to Python for Data Analysis or a similar introduction to Python course.

Seminar Outline

Day 1: Semi-structured Data: Web-scraping and Databases

1. Semi-structured data:

  • HTML parsing.
  • JSON parsing.

2. Database creation and extraction:

  • Introduction to SQL.
  • Introduction to MongoDB.
  • Using MongoDB and SQL to store and retrieve data.

3. Web-scraping and Databases Assignment and Review
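
To give a flavor of the Day 1 topics, here is a minimal sketch of parsing JSON and storing and retrieving it with SQL. It is not taken from the course materials: the JSON payload, the table name "results", and the column names are all made up for illustration, and it uses only Python's built-in json and sqlite3 modules with an in-memory database.

```python
import json
import sqlite3

# Hypothetical JSON payload, standing in for data returned by a web API.
raw = '[{"name": "Ada", "score": 91}, {"name": "Grace", "score": 88}]'
records = json.loads(raw)  # a list of dictionaries

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (name TEXT, score INTEGER)")

# Named placeholders (:name, :score) are filled from each dictionary.
conn.executemany(
    "INSERT INTO results (name, score) VALUES (:name, :score)", records
)
conn.commit()

# Retrieve only the rows with a score above 90.
high_scores = conn.execute(
    "SELECT name, score FROM results WHERE score > ? ORDER BY name", (90,)
).fetchall()
print(high_scores)
conn.close()
```

The same pattern (parse, insert, query) applies to larger data sets; only the source of the JSON and the database connection would change.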

Day 2: Big Data Analysis

1. General statistical inference

  • Linear regression.
  • Generalized Linear Models.
  • Time series analysis.

2. Big data analysis and inference

  • Import/export of massive data sets.
  • Statistical inference with massive data sets.

3. Statistical Inference and Big Data Analysis Assignment and Review
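
As a small illustration of the linear regression topic from Day 2, the sketch below fits a least-squares line in plain Python. It is only an assumption about how one might demonstrate the idea: the helper name fit_line and the data are made up, and the seminar itself may well rely on dedicated statistics libraries instead.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope is the sum of cross-deviations over the sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Because the data were constructed without noise, the fitted line recovers the intercept of 1 and the slope of 2.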

Day 3: Advanced Data Visualization and Collection

1. Advanced Data Visualization

  • Making beautiful plots with Seaborn.
  • Geospatial data visualization.

2. Advanced Data Collection

  • Collecting social media data with APIs.

3. Advanced Data Visualization and Collection Assignment and Review

Day 4: Natural Language Processing and Sentiment Analysis

1. Unstructured data and natural language processing:

  • Introduction to text processing.

2. Sentiment analysis.

3. Natural Language Processing Assignment and Review
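
For readers new to sentiment analysis, the following is a minimal sketch of one simple approach: counting words from a positive and a negative word list. The word lists, the function name sentiment_score, and the sample sentences are all hypothetical, and this is not necessarily the method taught in the seminar.

```python
import re

# Tiny, made-up word lists; real lexicons contain thousands of entries.
POSITIVE = {"helpful", "great", "practical", "enjoyed"}
NEGATIVE = {"confusing", "boring", "difficult"}


def sentiment_score(text):
    """Return (# positive words) minus (# negative words) in the text."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


print(sentiment_score("The exercises were helpful and practical."))
print(sentiment_score("The lecture was boring and confusing."))
```

A score above zero suggests positive text and a score below zero suggests negative text; a word-count approach like this ignores negation and context, which is one reason more advanced methods exist.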

Reviews of Statistical Computing with Python

“The course is helpful for me because the content, especially for the methods and skills the instructor shares, is very practical and easy to understand. And he is pleased to answer students’ questions and doubts which is very important for the short-term course.”
  Nelson Qilong Wang, HKBU

“I really enjoyed the numerous advanced topics discussed, specifically NLP, geospatial mapping, and big data analysis. Jason took time to retouch on basic principles right before diving into the meat of material. The exercises were also helpful.”

“This course is a great introduction to manipulating and processing data sets in multiple formats.”

Seminar information

Thursday, January 6, 2022 – Saturday, January 8, 2022

Each day will follow this schedule:

10:00am-2:00pm ET (New York time): Live lecture via Zoom

4:00pm-5:00pm ET: Live lab session via Zoom (Thursday and Friday only)

Payment Information

The fee of $895 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.