Introduction to the Analysis of Electronic Health Records: A Short Course

A 3-Day Livestream Seminar Taught by Jesse Gronsbell, Ph.D.


The widespread adoption of electronic health records (EHRs) has generated massive amounts of clinical data with the potential to improve healthcare delivery and advance biomedical research. EHRs contain comprehensive patient-level information collected over time, including demographics, disease diagnoses, medical procedures, and vital signs. Large-scale EHR databases are also increasingly being linked across healthcare systems and to biobanks containing detailed genetic data, making it possible to characterize individual health at unprecedented scale and precision.

However, EHR data is complex and heterogeneous. Effective data analysis requires a deep understanding of the data as well as familiarity with modern statistical and machine learning methods. This course will provide a broad overview of the analysis of EHR data for participants with little or no prior experience with the topic. We will start with the opportunities and challenges associated with the analysis of EHR data. We will then build an understanding of data provenance and structure. Finally, we will cover basic and advanced methods for EHR data analysis and their use in various research applications.

We will cover a full suite of methods for processing EHR data, developing phenotyping models, generating real-world evidence, and developing fair and privacy-preserving predictive models. You will also be introduced to publicly available datasets, software packages for statistical analyses, and tools for clinical natural language processing. The course will be hands-on and will use the R and RStudio computing environment. After completing the course, you will be prepared to analyze your own EHR dataset and deepen your knowledge of the topic.

Starting November 7, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian.


This seminar will use R as the base software and incorporate publicly available clinical natural language processing software such as MetaMap. All of the datasets used for exercises are openly available and detailed instructions will be provided for additional software.

Basic familiarity with R is highly desirable, but even novice R coders should be able to follow the presentation and do the exercises.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics.

Who Should Register?

This course is for you if you want to learn the fundamentals of EHR data analysis and apply them to your own biomedical research questions. While no prior knowledge of EHR data is necessary, knowledge of linear and logistic regression is required for the course.


Day 1

1. Introduction to electronic health record (EHR) data

    • Types of EHR systems
    • EHR terminology
    • Data structure and provenance

2. Opportunities and challenges for EHR-based applications

    • Opportunities: comparative effectiveness studies, clinical decision support, biobank analyses, etc.
    • Challenges: selection bias, missing data, measurement error, etc.

Day 2

3. Curating research quality data

    • Code mapping
    • Free-text processing

4. EHR-based phenotyping

    • Rule-based algorithms
    • Machine learning methods

Day 3

5. Real-world evidence generation with EHRs

6. Predictive modeling with EHRs

    • Fairness considerations
    • Privacy-preserving algorithms

Reviews of Introduction to the Analysis of Electronic Health Records

“Dr. Gronsbell’s depth of knowledge, clarity of explanation, and kindness shone through in this course. Her pedagogy and course content were excellent! She’s a brilliant instructor, and I highly recommend this seminar to anyone planning to work with, or have worked with, EHRs.” 
  Savannah L. Kelly, University of Mississippi 

“The seminar was very well structured and Jesse gave excellent explanations. I learned a lot in this seminar and I am glad I signed up! I also liked that participants asked relevant questions and contributed to the learning experience.” 
  Julius Weise, Universität des Saarlandes

“Dr. Gronsbell is both a cutting-edge researcher in this area and a very good teacher! She was very responsive to student questions.” 
  Clayton Brown, University of Maryland, Baltimore 

“I liked the clear step wise approach as well as the focus on phenotypes and how to proceed to ‘extract’ the phenotypes.” 
  Jan Posthumus, Basilea Pharmaceutica International Ltd, Allschwil 

Seminar Information

Thursday, November 7 – Saturday, November 9, 2024

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.