Statistical Computing with Python

A 3-Day Remote Seminar Taught by Jason Anastasopoulos, Ph.D.

 

NOTE: this course is designed for those who have taken Introduction to Python for Data Analysis or who already have some experience with Python.

Download a sample of the course materials


Python is rapidly becoming the preferred language of data scientists in both industry and academia. Google, Facebook, and other tech giants use it to analyze data and run machine learning algorithms on enormous volumes of data every day.

Python can be used for:

  • Storing and analyzing large and small datasets.
  • Web scraping and data collection using APIs.
  • Beautiful data visualization.
  • Natural language processing and text analysis.
  • General machine learning.
  • Deep learning.
  • Image analysis and much, much more…

Starting October 22, we are offering this seminar as a 3-day synchronous*, remote workshop for the first time. Each day will consist of a 4-hour, live morning lecture held via the free video-conferencing software Zoom. Participants are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if they are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. An additional session will be held Thursday and Friday afternoons as an “office hour”, where participants can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 

More Details About the Course Content

This seminar is an intermediate course on statistical computing with Python. The goal is for participants to learn advanced data analysis and visualization applications of the Python language.

By the end of this seminar, you will be able to:

  • Natural language processing: Grasp the basics of natural language processing and sentiment analysis.
  • Advanced data visualization: Use advanced Python plotting functionality, including plotting geospatial data and text data.
  • Big data analysis and inference: Import, export, and draw inferences from massive datasets in Python.
  • Statistical inference: Perform data analysis and basic statistical inference with Python, including GLMs, ANOVA, and hypothesis testing.
  • Web-scraping: Scrape and parse semi-structured data, including HTML, XML, and JSON.
  • Databases: Create SQL and MongoDB databases and extract information from them with Python.

Computing

This remote seminar is held via Zoom, a free video conferencing application. Instructions for joining a session via Zoom are available here. Before the seminar begins, you will receive an email with the meeting code and password needed to join.

This is a hands-on class that will involve at least two hours of structured and supervised assignments. To ensure that you are prepared, you must do the following BEFORE the first class:

➔ Download and install Anaconda Individual Edition (Python 3.7+) for your operating system: https://www.anaconda.com/products/individual

➔ Familiarize yourself with Google Colaboratory Python Notebooks: https://colab.research.google.com/notebooks/intro.ipynb

You should also know how to open the Command Prompt (Windows users) or the Terminal (Mac users). We will briefly review how to do this in class, but you will save time and effort if you arrive already knowing these basics. Many online resources can help you get started with the Windows Command Prompt or the Mac Terminal.

Materials

Participants receive access to a private repository containing all of the lecture notes, code and data needed for the class.

Participants interested in getting a jump start on some of the material should consider reading the “Python Data Science Handbook” by Jake VanderPlas. This book is not required, but it is recommended as background reading and a useful reference.

Who Should Register?

This seminar is designed for students who already have basic programming skills in Python and want to learn more advanced applications typically used by data scientists and academic researchers.

This course assumes that you have already completed Introduction to Python for Data Analysis or a similar introductory Python course.

Seminar Outline

Day 1: Semi-structured Data: Web-scraping and Databases

1. Semi-structured data:

  • HTML parsing.
  • JSON parsing.

2. Database creation and extraction:

  • Introduction to SQL.
  • Introduction to MongoDB.
  • Using MongoDB and SQL to store and retrieve data.

3. Web-scraping and Databases Assignment and Review.
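As a rough preview of the Day 1 topics, the sketch below uses only Python's standard library: `html.parser` to pull links out of an HTML page, `json` to parse a JSON payload, and `sqlite3` to store and query the parsed records. The table layout, the helper names (`extract_links`, `store_records`, `titles_with_min_score`), and the sample data are hypothetical and not taken from the course materials. MongoDB is left out because it needs a running server and the third-party `pymongo` driver.

```python
import json
import sqlite3
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Parse an HTML string and return its link targets in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def store_records(conn: sqlite3.Connection, json_text: str) -> None:
    """Parse a JSON array of objects and insert them into a hypothetical 'posts' table."""
    records = json.loads(json_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT, score INTEGER)"
    )
    # Named placeholders bind each dict's values by key.
    conn.executemany(
        "INSERT INTO posts (id, title, score) VALUES (:id, :title, :score)", records
    )
    conn.commit()


def titles_with_min_score(conn: sqlite3.Connection, min_score: int) -> List[str]:
    """Retrieve titles whose score is at least min_score, highest score first."""
    rows = conn.execute(
        "SELECT title FROM posts WHERE score >= ? ORDER BY score DESC", (min_score,)
    ).fetchall()
    return [title for (title,) in rows]


if __name__ == "__main__":
    page = '<p><a href="https://example.com/a">A</a> <a href="/b">B</a></p>'
    print(extract_links(page))  # ['https://example.com/a', '/b']

    payload = '[{"id": 1, "title": "Intro", "score": 12}, {"id": 2, "title": "Parsing", "score": 5}]'
    db = sqlite3.connect(":memory:")
    store_records(db, payload)
    print(titles_with_min_score(db, 10))  # ['Intro']
```

The in-memory database (`":memory:"`) keeps the example self-contained; passing a file path to `sqlite3.connect` instead would persist the data between sessions.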

Day 2: Advanced Data Analysis and Visualization

1. General statistical inference

  • Linear regression.
  • Generalized Linear Models.
  • Time series analysis.

2. Big data analysis and inference

  • Import/export of massive data sets.
  • Statistical inference with massive data sets.

3. Advanced topics in data visualization

  • Making beautiful plots with Seaborn.
  • Geospatial data visualization.

4. Advanced Data Analysis and Visualization Assignment and Review.
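To illustrate how regression can scale to data that do not fit in memory, here is one possible sketch, offered as an assumption rather than as the course's method: simple linear regression fitted by ordinary least squares while accumulating running sums over chunks of (x, y) pairs. The helpers `chunked` and `ols_streaming` are hypothetical. In practice the chunks might come from a file reader such as pandas `read_csv` with its `chunksize` argument.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def chunked(rows: Sequence[Point], size: int) -> Iterator[Sequence[Point]]:
    """Yield consecutive slices of rows; stands in for reading a large file in chunks."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def ols_streaming(chunks: Iterable[Iterable[Point]]) -> Tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares, one chunk at a time.

    Only five running sums are kept, so memory use does not grow with the
    number of observations. Returns (intercept, slope).
    """
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if n == 0 or denom == 0:
        raise ValueError("need at least two distinct x values to fit a line")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    # Synthetic data lying exactly on y = 2 + 3x, read three rows at a time.
    data: List[Point] = [(float(x), 2.0 + 3.0 * x) for x in range(10)]
    print(ols_streaming(chunked(data, 3)))  # (2.0, 3.0)
```

The closed-form slope uses raw sums, which can lose precision on very large or poorly scaled data; centering the data or using a numerically stable update would be more robust. The sketch also returns only point estimates, not standard errors or other inferential quantities.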

Day 3: Natural Language Processing

1. Collecting social media data with APIs.

2. Unstructured data and natural language processing:

  • Introduction to text processing.

3. Sentiment analysis.

4. Natural Language Processing Assignment and Review.
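A minimal lexicon-based approach gives a feel for basic text processing and sentiment analysis: tokenize the text, look up each word's score, and sum the scores. The word list below is a hypothetical toy lexicon and the one-word negation rule is a simplifying assumption; this is not necessarily the method taught in the course, which may rely on established lexicons or trained models.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon for illustration only; real analyses use
# curated sentiment lexicons or trained classifiers.
TOY_LEXICON: Dict[str, int] = {
    "good": 1, "great": 2, "love": 2, "happy": 1,
    "bad": -1, "terrible": -2, "hate": -2, "sad": -1,
}
NEGATIONS = {"not", "never", "no"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> int:
    """Sum lexicon scores, flipping the sign of a word directly preceded by a negation."""
    tokens = tokenize(text)
    score = 0
    for i, token in enumerate(tokens):
        value = lexicon.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score


if __name__ == "__main__":
    for post in ["I love this course, it is great!", "This is not good.", "The room is blue."]:
        print(sentiment_score(post), post)
```

Positive totals suggest positive sentiment, negative totals negative sentiment, and zero means no lexicon words were found. Simple rules like this miss sarcasm, context, and longer-range negation, which is why practical tools go further.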

Seminar information

Thursday, October 22, 2020 –
Saturday, October 24, 2020

Each day will follow this schedule:

10:00am-2:00pm ET: Live lecture via Zoom

3:00pm-4:00pm ET: Live “office hour” via Zoom (Thursday and Friday only)

Payment Information

The fee of $795 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.