Introduction to Python for Data Analysis

A 3-Day Livestream Seminar Taught by Edwin Dalmaijer, Ph.D.

Python is one of the most popular programming languages in the world. It is a general-purpose language, but also highly user-friendly. This makes Python a very powerful tool with a relatively gentle learning curve. It is also open-source and backed by a large international community of users who help each other and continue to develop additional functionality.

In the field of data science, Python has become indispensable. It is used for quick prototyping of statistical models and machine-learning pipelines, and you can even find highly mature Python applications in production environments! It has also become a go-to language in science, from astrophysics (e.g. black hole imaging) to zoology (e.g. evolution simulation).

This course is aimed at beginners, including those who are new to programming altogether. We will start with the basics of coding, including variables, logic, loops, functions, and object-oriented programming. We will then discuss reading and writing data files, processing large quantities of data quickly, and data visualization. Finally, the course will cover statistics, regression, model fitting, and a bit of machine learning. No prior knowledge of any of these topics is assumed.

Starting September 19, we are offering this seminar as a 3-day synchronous* livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions with hands-on exercises, separated by a 1-hour break. You are encouraged to join the lectures live, but you will have the opportunity to view the recorded sessions later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will remain accessible for four weeks after the seminar, so you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.

More Details About the Course Content

More specifically, the course will cover how to write your own functions and classes using variables, statements, and loops. These building blocks make up the majority of codebases, so mastering them is crucial. You will also be introduced to some of the most commonly used packages: NumPy and SciPy for fast computing, Matplotlib for publication-quality visualizations, and scikit-learn for machine learning.

The course will be very hands-on. It will run through interactive notebooks in your internet browser, so you won't have to download anything. However, we will provide advice on how to install Python and additional packages, so that you can continue using Python at home and at work.

By the end of this course, you should be able to find your own way in Python, handle datasets, and write complete analyses. Because the course introduces some of the most commonly used tools, you will also be well placed to keep deepening your Python knowledge.

Computing

To run hands-on exercises, we will be using carefully crafted interactive notebooks via Google Colaboratory. For this, you only need an internet browser (like Firefox) and a Google account.

Alternatively, you are welcome to install Python on your own computer. In addition to Python (version 3.7 or higher), you will need the packages NumPy, SciPy, Matplotlib, and scikit-learn. Installing Python packages can be a bit tricky if you aren't familiar with it. We will cover package installation on the first day of the course, so you may want to wait until then before installing anything.
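If you do install Python locally, a short check script can confirm that your setup meets these requirements. The sketch below is an illustration and not part of the course materials. It assumes the usual import names (`numpy`, `scipy`, `matplotlib`, and `sklearn` for scikit-learn), and it reads each package's `__version__` attribute.

```python
import importlib
import sys

# Minimum Python version named in the course description
MIN_PYTHON = (3, 7)

# Display name -> import name (scikit-learn is imported as "sklearn")
PACKAGES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def check_environment():
    """Return (python_ok, versions); versions maps each package to its version, or None if missing."""
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    versions = {}
    for label, import_name in PACKAGES.items():
        try:
            module = importlib.import_module(import_name)
        except ImportError:
            versions[label] = None  # package is not installed
        else:
            versions[label] = getattr(module, "__version__", "unknown")
    return python_ok, versions


if __name__ == "__main__":
    python_ok, versions = check_environment()
    print("Python", sys.version.split()[0], "OK" if python_ok else "(3.7 or higher needed)")
    for label, version in versions.items():
        print(f"{label}: {version if version else 'not installed'}")
```

Any package reported as "not installed" still needs to be added before you can run the exercises locally.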

Who Should Register?

This course is for anyone who would like to learn Python or dip their toes into programming. The content leans towards data science, so this course will be especially useful to those who would like to expand their expertise in data handling, visualization, statistics, and basic machine learning. No prior knowledge of coding or statistics is necessary: we'll start with the basics and work our way up from there.

Outline

Day 1: Programming basics

  • Variables
    • Numerical values (int and float), operations, and functions
    • Text values (str), operations, and functions
    • Booleans and logical operations
    • Collections (tuples, lists, dictionaries), operations, and functions
  • If statements
  • While loops
  • For loops
  • Functions
    • What is a function?
    • Input and output
  • Classes
    • The class as a blueprint, the instance as a product
    • Bound variables
    • Bound functions
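The following sketch shows how several of the Day 1 building blocks fit together. It is an illustration only and is not taken from the course materials. The `Sample` class, the helper functions, and the reaction-time numbers are all hypothetical. The code uses variables, a dictionary, a Boolean, `if`, `for`, and `while`, two functions, and a class with bound variables (instance attributes) and a bound function (method).

```python
class Sample:
    """Blueprint for a set of measurements; each instance holds its own data."""

    def __init__(self, name, values):
        # Bound variables (instance attributes): every instance gets its own copy
        self.name = name            # str
        self.values = list(values)  # list of floats

    def mean(self):
        # Bound function (method): computes on this instance's own values
        return sum(self.values) / len(self.values)


def count_above(values, threshold):
    """Count values larger than threshold, using a for loop and an if statement."""
    count = 0
    for value in values:
        if value > threshold:
            count += 1
    return count


def halvings_until_below(x, limit=1.0):
    """Halve x in a while loop until it is at most limit; return how many halvings it took."""
    steps = 0
    while x > limit:
        x = x / 2
        steps += 1
    return steps


# A dictionary mapping hypothetical participant IDs to Sample instances
samples = {
    "p01": Sample("p01", [0.41, 0.38, 0.52, 0.47]),
    "p02": Sample("p02", [0.55, 0.61, 0.49]),
}

for pid, sample in samples.items():
    is_slow = sample.mean() > 0.5  # a Boolean value
    print(pid, round(sample.mean(), 3), "slow" if is_slow else "fast")

print(count_above(samples["p01"].values, 0.45))  # 2
print(halvings_until_below(10.0))                # 4
```

Each `Sample` instance is a separate "product" built from the same blueprint. Calling `samples["p01"].mean()` therefore uses only that participant's values.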

Day 2: Data processing and visualization

  • NumPy
    • Arrays: fast, scalable, fantastic
    • Useful array manipulation functions
    • Random data generation
  • Loading and writing data
    • Paths and the os module
    • Writing a CSV file
    • Loading a CSV file
    • Managing big data with memory-mapped arrays
  • Data visualization
    • Matplotlib
    • The basics: scatter plots, lines, error bars, and bar charts
    • Better than bars: box plots and violin plots
    • Drawing distributions
    • Heatmaps
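Several of the Day 2 topics can be sketched in a few lines of NumPy: random data generation, paths, CSV input and output, and memory-mapped arrays. This is an illustration under stated assumptions rather than course material. The file names, column names, and data distributions are hypothetical. The sketch uses `np.savetxt` and `np.loadtxt` for the CSV file and `np.memmap` for the file-backed array. Plotting is left out so that the sketch needs only NumPy.

```python
import os
import tempfile

import numpy as np


def roundtrip_csv(folder, n_rows=100, seed=1):
    """Generate random data, write it to a CSV file in folder, and load it back."""
    rng = np.random.default_rng(seed)
    # Hypothetical dataset: column 0 ~ Normal(100, 15), column 1 ~ Uniform(18, 65)
    data = np.column_stack([
        rng.normal(loc=100.0, scale=15.0, size=n_rows),
        rng.uniform(18.0, 65.0, size=n_rows),
    ])
    path = os.path.join(folder, "data.csv")
    # comments="" writes the header as a plain first line instead of prefixing it with "# "
    np.savetxt(path, data, delimiter=",", header="score,age", comments="")
    loaded = np.loadtxt(path, delimiter=",", skiprows=1)
    return data, loaded


def memmap_row_sum(folder, shape=(1000, 50)):
    """Write a file-backed array, reopen it read-only, and sum its first row."""
    path = os.path.join(folder, "big.dat")
    # mode="w+" creates the file; the array's contents live in that file on disk
    arr = np.memmap(path, dtype="float32", mode="w+", shape=shape)
    arr[:] = 0.0
    arr[0, :] = 1.0
    arr.flush()  # make sure the changes are written to disk
    del arr
    # Reopening needs the same dtype and shape, because the raw file stores neither
    view = np.memmap(path, dtype="float32", mode="r", shape=shape)
    total = float(view[0].sum())
    del view
    return total


with tempfile.TemporaryDirectory() as folder:
    data, loaded = roundtrip_csv(folder)
    print(loaded.shape)               # (100, 2)
    print(loaded.mean(axis=0))        # column means
    print(np.allclose(data, loaded))  # True
    print(memmap_row_sum(folder))     # 50.0
```

Only the parts of a memory-mapped array that you actually access are read from disk. This is what makes memory-mapped arrays useful for datasets larger than the available RAM.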

Day 3: Statistics

  • Basic statistical tests
    • Tests of relations
    • Tests of differences
  • Model fitting
    • Scikit-learn
    • Linear regression
    • Multivariable regression
    • Regularization (LASSO, Ridge)
    • Writing your own model
    • Generic data fitting with minimization
    • Bayesian Information Criterion
  • Cross-validation
    • Data for training, and data for testing
    • N-fold cross-validation
  • Unsupervised learning algorithms
    • Machine learning with scikit-learn
    • Data-driven methods
    • K-means clustering
    • Fuzzy clustering
    • Gaussian mixture models
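Some of the Day 3 ideas can be illustrated with NumPy alone, which keeps the mechanics visible: linear regression, separate data for training and testing, N-fold cross-validation, and the Bayesian Information Criterion. The sketch below is not from the course materials. It works in three steps:

- It fits an ordinary least-squares regression with `np.linalg.lstsq`.
- It estimates out-of-sample error with N-fold cross-validation.
- It computes a BIC of the form n ln(RSS/n) + k ln(n). This form assumes Gaussian errors and drops an additive constant.

The simulated data and all function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares; returns coefficients with the intercept first."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Linear prediction: intercept plus weighted sum of the predictors."""
    return coef[0] + X @ coef[1:]


def n_fold_cv(X, y, n_folds=5, seed=0):
    """Return the mean squared prediction error on each held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)  # shuffled, near-equal folds
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on all other folds, test on fold i
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_linear(X[train_idx], y[train_idx])
        residuals = y[test_idx] - predict(coef, X[test_idx])
        errors.append(np.mean(residuals ** 2))
    return np.array(errors)


def bic(X, y):
    """BIC for a linear model with Gaussian errors: n*ln(RSS/n) + k*ln(n), constant dropped."""
    coef = fit_linear(X, y)
    rss = np.sum((y - predict(coef, X)) ** 2)
    n, k = len(y), len(coef)  # k counts the intercept and slopes
    return n * np.log(rss / n) + k * np.log(n)


# Simulated (hypothetical) data: y depends on two predictors plus noise with SD 0.5
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

print(fit_linear(X, y))             # should be close to [2, 3, -1]
print(n_fold_cv(X, y).mean())       # should be close to the noise variance, 0.25
print(bic(X[:, :1], y), bic(X, y))  # the model using both predictors should score lower
```

scikit-learn, which the course uses for this material, provides the same cross-validation procedure ready-made through `sklearn.model_selection.KFold` and `sklearn.model_selection.cross_val_score`.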

Seminar information

Monday, September 19 – Wednesday, September 21, 2022

Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm Monday-Wednesday
1:30pm-4:00pm Monday
1:30pm-3:30pm Tuesday & Wednesday

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.