Introduction to Python for Data Analysis

A 3-Day Remote Seminar Taught by Jason Anastasopoulos, Ph.D.

 

NOTE: this course is designed for those who have no previous experience with Python. If you have some previous experience with Python, you may want to consider Statistical Computing with Python.

Download a sample of the course materials


Python is a premier language for modern data science and data analysis. It is a free, open-source language that has a simple, easy-to-understand syntax and an incredible range of data analysis and visualization libraries. In three days, this seminar combines an introductory and an intermediate course in Python. The goal is to get participants to fully understand many of the basic elements of Python and immediately apply them to practical data analysis and data collection problems.

Starting October 1, we are offering this seminar as a 3-day synchronous*, remote workshop. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. Participants are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if they are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. An additional session will be held Thursday and Friday afternoons as an “office hour”, where participants can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 

More Details About the Course Content

Python is rapidly becoming the preferred language of data scientists in both industry and academia. It’s used by Google, Facebook and other tech giants to perform data analysis and run machine learning algorithms that can handle hundreds of thousands of terabytes of data per day.

Python can be used for:

  • Storing and analyzing large and small datasets.
  • Web scraping and data collection using APIs.
  • Beautiful data visualization.
  • Natural language processing and text analysis.
  • General machine learning.
  • Deep learning.
  • Image analysis and much, much more…

How you will benefit from this seminar:

This seminar is a foundational course in Python. The goal is to get participants to fully understand many of the basic elements of Python and immediately apply them to practical data analysis problems.

By the end of this seminar you will be able to:

  • Program using Python (Jupyter) notebooks and IDEs.
  • Understand and use basic data analysis and visualization libraries such as NumPy, Pandas, Matplotlib, Seaborn and statsmodels, among others.
  • Use the basic building blocks needed to do data analysis: variables, lists, loops, dictionaries, Boolean operators, functions.
  • Perform data analysis and basic statistical inference: GLMs, ANOVA, hypothesis testing.
  • Produce beautiful data visualizations.

Computing

This remote seminar is held via Zoom, a free video conferencing application. Instructions for joining a session via Zoom are available here. Before the seminar begins, participants will receive an email with the meeting code and password needed to join.

This is a hands-on class that will involve at least two hours of structured and supervised assignments. To ensure that you are prepared, you must do the following BEFORE the first class:

➔ Download and install Anaconda Python 3.7+ Individual Edition for your operating system: https://www.anaconda.com/products/individual

➔ Familiarize yourself with Google Colaboratory Python Notebooks: https://colab.research.google.com/notebooks/intro.ipynb

You should also know how to access the command prompt (Windows users) or the terminal (Mac users). We will briefly review how to access these in class, but it will save you time and effort if you come already knowing these basics. You can get resources on the internet that will help you get started with the Windows Command Prompt or the Mac Terminal.

Materials

Participants receive access to a private repository containing all of the lecture notes, code and data needed for the class.

Participants interested in getting a jump start on some of the material should consider reading the free book “Python for Everybody” by Dr. Charles R. Severance. This book is not required but is recommended as optional reading and as a useful reference.

Who Should Register?

This seminar is designed for anyone who wants to quickly and efficiently obtain a solid foundation in the Python language that will allow them to begin using the language for their research, data analysis or visualization needs.

This seminar does not assume any previous programming experience. However, those at an intermediate or advanced level in other packages or languages can also benefit greatly from this course.

Seminar Outline

Day 1: Introduction to Python

1. Getting started with Python:

  • Why Python?
  • Introduction to Anaconda Python.
  • Introduction to Python (Jupyter) notebooks.
  • Overview of basic libraries used: NumPy, Pandas, Matplotlib, SciPy, statsmodels.

2. Python basics and data structures:

  • Variables: numbers, string values, using variables.
  • Lists and loops: list basics, simple loops, Pythonic loops.
  • Logical statements in Python.
  • Using and creating dictionaries.
  • Creating functions.

3. Python basics assignment solutions and review.
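
To give a feel for the Day 1 topics listed above (variables, lists, loops, logical statements, dictionaries and functions), here is a minimal sketch. It is not taken from the course materials; the data and the function name `count_by_age` are made up for illustration.

```python
# Hypothetical survey responses: (respondent name, age) pairs.
responses = [("Ana", 34), ("Ben", 27), ("Chloe", 45), ("Dev", 27)]


def count_by_age(records):
    """Return a dictionary mapping each age to the number of records with that age."""
    counts = {}
    for _name, age in records:  # unpack each (name, age) tuple directly
        counts[age] = counts.get(age, 0) + 1
    return counts


# A simple loop combined with a logical statement.
over_30 = []
for name, age in responses:
    if age > 30:
        over_30.append(name)

# The same filter written as a list comprehension, a more "Pythonic" loop.
over_30_pythonic = [name for name, age in responses if age > 30]

age_counts = count_by_age(responses)

print(over_30)     # ['Ana', 'Chloe']
print(age_counts)  # {34: 1, 27: 2, 45: 1}
```

Both loops produce the same list; the comprehension is simply the more compact form.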

Day 2: Data Analysis and Manipulation

1. Data analysis and statistical inference:

  • Handling arrays with Pandas and NumPy.
  • Basic data analysis:
    A. Summary statistics: mean, median, mode, variance and standard deviation.
    B. Hypothesis testing: t-tests, confidence intervals.

2. Data analysis and manipulation assignment solutions and review.
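
As a rough illustration of the Day 2 topics listed above, the sketch below computes the summary statistics and a two-sample t statistic. It is an assumption-laden example, not course material: the scores are invented, the helper `welch_t` is hypothetical, and it uses only Python's standard library so that it runs without any installs, whereas the seminar itself works with NumPy and Pandas.

```python
import math
import statistics

# Hypothetical test scores for two groups (made-up numbers).
group_a = [72, 85, 90, 66, 85, 78]
group_b = [64, 70, 75, 70, 68, 61]

# Summary statistics for the first group.
summary = {
    "mean": statistics.mean(group_a),
    "median": statistics.median(group_a),
    "mode": statistics.mode(group_a),
    "variance": statistics.variance(group_a),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(group_a),      # sample standard deviation
}


def welch_t(sample_1, sample_2):
    """Welch's t statistic for two independent samples with unequal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    standard_error = math.sqrt(
        statistics.variance(sample_1) / n1 + statistics.variance(sample_2) / n2
    )
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error


t_stat = welch_t(group_a, group_b)

print(summary["median"], summary["mode"])  # 81.5 85
print(round(t_stat, 2))
```

Turning the t statistic into a p-value or a confidence interval requires the t distribution, which libraries such as SciPy provide (for example `scipy.stats.ttest_ind`).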

Day 3: Statistical Inference and Data Visualization

1. Statistical inference:

  • Linear regression, logistic regression, generalized linear models.

2. Data visualization:

  • Basic plots: Scatterplots, line plots, heatmaps.
  • Distributions: Densities, box plots, histograms.

3. Statistical inference and visualization assignment review.
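
To make the linear regression topic listed for Day 3 concrete, here is a minimal sketch of a one-predictor ordinary least squares fit using the closed-form formulas. The data and the function name `fit_line` are hypothetical, and the example deliberately sticks to the standard library; in the seminar, models like this would more likely be fitted with statsmodels, which also reports standard errors and test statistics.

```python
import statistics

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(hours, scores)

# Predicted score for a student who studies 6 hours.
predicted = intercept + slope * 6

print(round(intercept, 1), round(slope, 1))  # 47.7 4.1
```

In this made-up data set, each additional hour of study is associated with about 4.1 more points on the exam.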

Seminar Information

Thursday, October 1, 2020 – Saturday, October 3, 2020

Each day will follow this schedule:

10:00am-2:00pm ET: Live lecture via Zoom

3:00pm-4:00pm ET: Live “office hour” via Zoom (Thursday and Friday only)

Payment Information

The fee of $795 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.