GitHub for Data Analysis: A Short Course

A 3-Day Livestream Seminar Taught by Aaron Gullickson, Ph.D.

Check out Dr. Gullickson’s blog post, where he gives a brief introduction to creating your first R project repository using GitHub.

Git is a free, open-source distributed version control system that programmers and data analysts use to track project progress efficiently, code without fear of error, and collaborate sanely. Although version control originated in software development, data scientists have adopted it to manage projects efficiently and to share research materials (such as code) with broader communities.

GitHub, a website that provides online open-access git repositories, has emerged as a leading choice for data analysts and researchers seeking to collaborate and share projects using git. GitHub provides a variety of additional features and workflows that improve the experience of using git.

This seminar will familiarize you with using git through GitHub and demonstrate how to integrate GitHub into a research workflow. The seminar will focus on the basic git workflow and how to use git and GitHub to simplify research collaboration.

Starting October 11, we are offering this seminar as a 3-day synchronous* livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions that include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but you will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions. Live captions can be translated to a variety of languages including Spanish, Korean, and Italian.

More Details About the Course Content

The course will introduce you to the basic workflow of git, including how to commit, push, and pull changes to underlying research material, as well as how to create and clone repositories through GitHub. You will also learn how to create separate branches of code for saner collaboration and how to merge branches using GitHub pull requests.
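As a rough illustration of the workflow described above, the following Python sketch drives the git command line through one pass of commit, branch, merge, push, clone, and pull. It is not taken from the course materials. The file names, branch name, folder names, and demo identity are all hypothetical, and a local bare repository stands in for a GitHub repository so that nothing touches the network. It assumes git is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside `cwd` and return its trimmed stdout."""
    done = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return done.stdout.strip()


def demo_workflow(workdir):
    """Commit, branch, merge, push, clone, and pull inside `workdir`."""
    base = Path(workdir)
    remote = base / "remote.git"        # local stand-in for a GitHub repository
    repo = base / "demo-project"        # hypothetical project folder
    clone = base / "collaborator-copy"  # hypothetical collaborator's copy

    git("init", "--bare", str(remote), cwd=base)
    git("init", str(repo), cwd=base)
    # Repository-local settings so the sketch runs without global git config.
    git("config", "user.name", "Demo User", cwd=repo)
    git("config", "user.email", "demo@example.com", cwd=repo)
    git("config", "commit.gpgsign", "false", cwd=repo)

    # Commit: record a snapshot of a new file.
    (repo / "analysis.R").write_text("x <- 1\n")
    git("add", "analysis.R", cwd=repo)
    git("commit", "-m", "Add analysis script", cwd=repo)
    default_branch = git("rev-parse", "--abbrev-ref", "HEAD", cwd=repo)

    # Branch: do new work away from the default branch.
    git("checkout", "-b", "new-model", cwd=repo)
    (repo / "model.R").write_text("fit <- lm(y ~ x)\n")
    git("add", "model.R", cwd=repo)
    git("commit", "-m", "Add model script", cwd=repo)

    # Merge: bring the branch back into the default branch.
    # (On GitHub this step would typically go through a pull request.)
    git("checkout", default_branch, cwd=repo)
    git("merge", "--no-edit", "new-model", cwd=repo)

    # Push: publish the default branch to the remote.
    git("remote", "add", "origin", str(remote), cwd=repo)
    git("push", "-u", "origin", default_branch, cwd=repo)

    # Clone: a collaborator obtains a full copy, history included.
    git("clone", str(remote), str(clone), cwd=base)

    # Pull: after one more pushed commit, the collaborator catches up.
    (repo / "README.md").write_text("# Demo project\n")
    git("add", "README.md", cwd=repo)
    git("commit", "-m", "Add README", cwd=repo)
    git("push", "origin", default_branch, cwd=repo)
    git("pull", "--ff-only", cwd=clone)

    # Commit subjects as seen from the collaborator's copy, newest first.
    return git("log", "--format=%s", cwd=clone).splitlines()
```

Calling `demo_workflow` with the path of an empty scratch directory returns the commit subjects as the collaborator's copy sees them, newest first. With a real GitHub repository, the remote path would be replaced by the repository's URL.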

You will learn both command line tools for working with git and how to interact with git using GitHub Desktop and RStudio. While you will learn how to use git to manage a project directly in RStudio, the principles of using git developed in this course can be applied broadly to any statistical software package that uses scripting.

The seminar will be very hands-on, and you will learn how to create and manage your own remote repositories through GitHub. You are welcome to bring projects to the course for which you would like to construct GitHub repositories.

Computing

To participate in the hands-on exercises and to follow along in the class, you will need to have git installed on your computer. Git is free, open source, and available on Windows, Mac, and Linux platforms. Windows users will also need to use the Git Bash application (installed automatically with git) for command line operations. You will also need to create a free account on GitHub.

We will also make use of additional GUI clients that can make it easier to work with git. You should install GitHub Desktop, as this will be our primary method of interacting with git. You are also encouraged to download and install R and RStudio to learn how to manage a project through git and GitHub using RStudio.

Who Should Register?

This course is for anyone who wants to improve their statistical research workflow and learn to easily collaborate on research and share the products of that research. The principles learned in this course can be applied broadly to working in any statistical or coding environment.

Outline

Day 1: The Basic Git Workflow

  • What is version control?
  • Setting up git
  • Time to commit: working with a local repository
  • Push and pull: working with a remote repository
  • Making your first repository on GitHub

Day 2: Collaborating with Others

  • Dealing with (git) conflicts
  • Branching for sanity
  • Creating pull requests
  • Collaborating with GitHub tools

Day 3: Dealing with Complications

  • Undoing changes
  • Learning good repository organization principles
  • Ignoring things (in git)
  • Working with large files
  • Using the README
  • Creating GitHub templates
  • Extending git and GitHub with other tools

Reviews of GitHub for Data Analysis

“The content was well planned and gets you going with GitHub in 3 days! The instructor, Aaron Gullickson is an extremely competent, experienced educator. He prepared the content in a concise and a very well-structured way.” 
  Agz Leman, University of Surrey 

“Having mastered studio primarily by trial and error, GitHub was a scary interface to me. Before attending this class, I never understood fork or pull requests. This class answered everything and is a great class. I enjoyed all the sessions. The course was structured in an easily understandable way.” 
  Soundarya Soundararajan 

“It was easy to follow. The content was exactly what I hoped for with many practical advices. It was a nice atmosphere, so that I was not afraid to ask questions. And Aaron is not only an expert on git, you could see he is in love with git.” 
  Ann-Kristin Koop, International Association for the Evaluation of Educational Achievement 

“Git is a mind bender, and I’m grateful to have taken this course—particularly with this instructor. Dr. Gullickson clearly has guided a lot of people through this process and has thought about how best to learn the material. I highly recommend GitHub for Data Analysis for any who are working with data whether you are working solo or in groups.”
  Michael Davies, NGIC

Seminar Information

Wednesday, October 11,
Thursday, October 12 &
Saturday, October 14, 2023

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm
1:30pm-3:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.