Statistical Computing with Python
A 4-Day Remote Seminar Taught by Jason Anastasopoulos, Ph.D.
Python is rapidly becoming the preferred language of data scientists in both industry and academia. It’s used by Google, Facebook and other tech giants to perform data analysis and run machine learning algorithms that can handle hundreds of thousands of terabytes of data per day.
Python can be used for:
- Storing and analyzing large and small datasets.
- Web scraping and data collection using APIs.
- Beautiful data visualization.
- Natural language processing and text analysis.
- General machine learning.
- Deep learning.
- Image analysis and much, much more…
This seminar is an intermediate course on statistical computing with Python. The goal is to introduce participants to advanced data analysis and visualization applications of the Python language.
Starting June 8, we are offering this seminar as a 4-day synchronous*, remote workshop. Each day will consist of a 3-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. Additional lab sessions will be held on Tuesday and Thursday afternoons, where you can review the exercise results with the instructor and ask questions.
*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for two weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
More Details About the Course Content
By the end of this seminar, you will be able to do the following:
- Natural language processing: Apply the basics of natural language processing and sentiment analysis.
- Advanced data visualization: Use advanced Python plotting functionality, including plotting geospatial data and text data.
- Big data analysis and inference: Work with massive datasets in Python.
- Statistical inference: Perform data analysis and basic statistical inference with Python, including GLMs, ANOVA, and hypothesis testing.
- Web-scraping: Scrape and parse semi-structured data, including HTML, XML, and JSON.
- Databases: Create and extract information from SQL and MongoDB databases with Python.
This is a hands-on class that will involve at least two hours of structured and supervised assignments. To ensure that you are prepared, you must do the following BEFORE the first class:
➔ Download and install Anaconda Python 3.7+ Individual Edition for your operating system: https://www.anaconda.com/products/individual
➔ Familiarize yourself with Google Colaboratory Python Notebooks: https://colab.research.google.com/notebooks/intro.ipynb
You should also know how to open the Command Prompt (Windows users) or the Terminal (Mac users). We will briefly review this in class, but you will save time and effort if you arrive already knowing these basics. Many online resources can help you get started with the Windows Command Prompt or the Mac Terminal.
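One quick way to confirm that your installation meets the Python 3.7+ requirement is to run a short check in the terminal or in a notebook. The following is only a minimal sketch: the helper name `meets_minimum_version` is hypothetical and is not part of the course materials.

```python
import sys

# Minimum version taken from the setup instructions above (Python 3.7+).
MINIMUM_VERSION = (3, 7)


def meets_minimum_version(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the given interpreter version is at least `minimum`."""
    # Compare only (major, minor) so that patch releases do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum_version() else "too old, please upgrade"
    print(f"Python {current}: {status}")
```

Running the script prints the interpreter version followed by either "OK" or a prompt to upgrade.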
Participants receive access to a private repository containing all of the lecture notes, code and data needed for the class.
Participants interested in getting a jump start on some of the material should consider reading the “Python Data Science Handbook” by Jake VanderPlas. This book is not required, but it is a useful reference.
Who Should Register?
This seminar is designed for students who already have basic programming skills in Python and want to learn more advanced applications typically used by data scientists and academic researchers.
This course assumes that you have already completed Introduction to Python for Data Analysis or a similar introduction to Python course.
Day 1: Semi-structured Data: Web-scraping and Databases
1. Semi-structured data:
- HTML parsing.
- JSON parsing.
2. Database creation and extraction:
- Introduction to SQL.
- Introduction to MongoDB.
- Using MongoDB and SQL to store and retrieve data.
3. Web-scraping and Databases Assignment and Review
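To give a rough sense of the kind of task the Day 1 topics describe, the sketch below parses a small JSON document and stores and retrieves it with SQL. It is an illustration only and not taken from the course materials: it assumes SQLite (via Python's built-in `sqlite3` module) as the SQL engine, the sample data is invented, and the `articles` table and the helper function names are hypothetical.

```python
import json
import sqlite3

# Invented sample payload, standing in for JSON returned by a web API.
RAW_JSON = """
[
  {"id": 1, "title": "Intro to SQL", "tags": ["sql", "databases"]},
  {"id": 2, "title": "Parsing HTML", "tags": ["html", "scraping"]},
  {"id": 3, "title": "Parsing JSON", "tags": ["json", "scraping"]}
]
"""


def load_records(raw):
    """Parse a JSON array of objects into a list of Python dicts."""
    return json.loads(raw)


def store_records(conn, records):
    """Create a (hypothetical) `articles` table and insert one row per record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, tags TEXT NOT NULL)"
    )
    # Parameterized queries keep the data separate from the SQL text.
    conn.executemany(
        "INSERT INTO articles (id, title, tags) VALUES (?, ?, ?)",
        [(r["id"], r["title"], ",".join(r["tags"])) for r in records],
    )
    conn.commit()


def titles_with_tag(conn, tag):
    """Return titles whose comma-separated tag list contains `tag`, ordered by id."""
    rows = conn.execute(
        "SELECT title FROM articles "
        "WHERE ',' || tags || ',' LIKE ? ORDER BY id",
        (f"%,{tag},%",),
    )
    return [title for (title,) in rows]


# An in-memory database avoids creating any files on disk.
conn = sqlite3.connect(":memory:")
store_records(conn, load_records(RAW_JSON))
scraping_titles = titles_with_tag(conn, "scraping")
print(scraping_titles)  # ['Parsing HTML', 'Parsing JSON']
```

Storing the tags as a comma-separated string keeps the sketch short; a separate tags table, or a document store such as MongoDB, would be a more natural fit for nested JSON.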
Day 2: Big Data Analysis
1. General statistical inference:
- Linear regression.
- Generalized Linear Models.
- Time series analysis.
2. Big data analysis and inference:
- Import/export of massive data sets.
- Statistical inference with massive data sets.
3. Statistical Inference and Big Data Analysis Assignment and Review
Day 3: Advanced Data Visualization and Collection
1. Advanced data visualization:
- Making beautiful plots with Seaborn.
- Geospatial data visualization.
2. Advanced data collection:
- Collecting social media data with APIs.
3. Advanced Data Visualization and Collection Assignment and Review
Day 4: Natural Language Processing and Sentiment Analysis
1. Unstructured data and natural language processing:
- Introduction to text processing.
2. Sentiment analysis.
3. Natural Language Processing Assignment and Review
Recent Comments From Jason Anastasopoulos's Other Seminars
“A very nice interactive course that provided me with a good introduction to Python.”
Tahereh Dehdarirad, Chalmers University of Technology
“The notes that Jason provided were outstanding, as were the videos that could be watched at any time after the course was complete.”
“Great applications. Very approachable, but also not too simplistic where I could easily learn it on my own. Instructor was very inclusive, and offered great responses.”
Tuesday, June 8, 2021 – Friday, June 11, 2021
Each day will follow this schedule:
11:00am-2:00pm ET (New York time): Live lecture via Zoom
4:00pm-5:00pm ET: Live lab session via Zoom (Tuesday and Thursday only)
The fee of $895 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.