Version Control for Data Analysis Using GitHub: A Short Course

An 8-hour Livestream Seminar Taught by Aaron Gullickson, Ph.D.

Check out Dr. Gullickson’s blog post, where he gives a brief introduction to creating your first R project repository using GitHub.

Git is a free, open-source distributed version control system that is used by programmers and data analysts to track project progress efficiently, code without fear of error, and collaborate sanely. Data scientists and researchers have adopted version control to facilitate efficient project management and to easily disseminate research materials (such as code) to broader communities.

GitHub, a website that hosts git repositories online, has emerged as a leading choice for data analysts and researchers seeking to collaborate and share projects using git. GitHub provides a variety of additional features and workflows that improve the experience of using git.

This seminar will familiarize you with using git through GitHub and demonstrate how to integrate git and GitHub into a research workflow. We will focus on teaching you the basic git workflow and how to use git and GitHub to simplify your research collaboration.

Starting April 24, we are offering this seminar as an 8-hour synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two 2-hour lecture sessions which include hands-on exercises, separated by a 30-minute break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 

Closed captioning is available for all live and recorded sessions. Captions can be translated into a variety of languages, including Spanish, Korean, and Italian.

More Details About the Course Content

The course will introduce you to the basic workflow of git, including how to commit, push, and pull changes to underlying research material, as well as how to create and clone repositories through GitHub. You will also learn how to create separate branches of code for efficient collaboration and how to merge branches using GitHub pull requests.

The principles of using git developed in this course can be applied broadly to any statistical software package that uses scripting. In this course, you will learn both command-line tools for working with git and how to interact with git using GitHub Desktop and RStudio. Together, these tools cover both command-line and graphical ways of working with git.

The seminar will be very hands-on, and you will learn how to create and manage your own remote repositories through GitHub. You are welcome to bring projects to the course for which you would like to construct GitHub repositories.

Computing

To participate in the hands-on exercises and follow along in class, you will need to have git installed on your computer. Git is free, open source, and available on Windows, Mac, and Linux platforms. Windows users will also need to use the Git Bash application (installed automatically with git) for command-line operations. You will also need to create a free account on GitHub.

We will also make use of additional GUI clients that can make it easier to work with git. You should install GitHub Desktop, as this will be our primary method of interacting with git. You are also encouraged to download and install R and RStudio to learn how to manage a project through git and GitHub using RStudio.

Who Should Register?

This course is for anyone who wants to improve their statistical research workflow and learn to easily collaborate on research and share the products of that research. The principles learned in this course can be applied broadly to working in any statistical or coding environment.

Outline

Day 1: The basic git workflow

    • What is version control?
    • Setting up git
    • Time to commit: Working with a local repository
    • Push and pull: Working with a remote repository
    • Making your first repository on GitHub
    • Learning good repository organization principles

Day 2: Collaboration and complications

    • Dealing with (git) conflicts
    • Branching and pull requests
    • Collaborating with GitHub tools
    • Undoing changes
    • Ignoring things (in git)
    • Working with large files

Reviews of Version Control for Data Analysis Using GitHub

“The content was well planned and gets you going with GitHub in a few days! The instructor, Aaron Gullickson, is an extremely competent, experienced educator. He prepared the content in a concise and a very well-structured way.” 
  Agz Leman, University of Surrey 

“Having mastered RStudio primarily by trial and error, GitHub was a scary interface to me. Before attending this class, I never understood fork or pull requests. This class answered everything and is a great class. I enjoyed all the sessions. The course was structured in an easily understandable way.” 
  Soundarya Soundararajan 

“It was easy to follow. The content was exactly what I hoped for with a lot of practical advice. It was a nice atmosphere, so that I was not afraid to ask questions. And Aaron is not only an expert on git, you could see he is in love with git.” 
  Ann-Kristin Koop, International Association for the Evaluation of Educational Achievement 

“Git is a mind bender, and I’m grateful to have taken this course—particularly with this instructor. Dr. Gullickson clearly has guided a lot of people through this process and has thought about how best to learn the material. I highly recommend this workshop for anyone working with data whether you are working solo or in groups.”
  Michael Davies, NGIC

Seminar Information

Thursday, April 24 – Friday, April 25, 2025

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:30am-12:30pm
1:00pm-3:00pm

Payment Information

The fee of $695 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.