Data Visualization Using Stata and LLMs - A Short Course
A 3-Day Livestream Seminar Taught by Trenton Mize, Ph.D.
Understanding data and effectively presenting model results are challenges that data analysts face almost every day. There is seldom a more effective solution than a well thought out visualization. Problems in the data are easily identified; complex effects are quickly summarized; effect sizes and variability are immediately clear. In this seminar, we will cover best practices for accurately representing data as well as many specific approaches to data exploration, model diagnostics, and model presentation.
The primary focus is on the applied analyst’s “bread and butter” types of visualizations: those that will be useful in almost every research project. These range from exploratory data visualizations that help you understand your data to visualizations of complex models that can help translate results to a broad audience.
In addition to learning data visualization tools in Stata, this course will also equip you with a set of structured prompts to use with your Large Language Model (LLM) of choice. LLMs like ChatGPT or Claude can serve as invaluable “research assistants” but need to be prompted skillfully to maximize their usefulness and avoid pitfalls. You will learn how to use LLMs to help write code, fix errors, customize graphs, and give feedback to ensure your figures follow best practices for accessibility and effectiveness. Explicit discussion of LLM prompting will comprise approximately 15-20% of course time.
Starting April 8, this seminar will be presented as a 3-day synchronous, livestream workshop via Zoom. Each day will feature two lecture sessions with hands-on exercises, separated by a 1-hour break. Live attendance is recommended for the best experience. But if you can’t join in real time, recordings will be available within 24 hours and can be accessed for four weeks after the seminar.
Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian.
ECTS Equivalent Points: 1
More Details About the Course Content
Topics covered range from exploratory data analysis techniques to methods for presenting complex model results. Applied exercises will help you implement the techniques we cover in Stata. Additional template Stata code will be provided, allowing you to reproduce all workshop examples.
The seminar will use Stata. Stata is widely used to clean, examine, model, and visualize data. The data and model visualization capabilities of Stata are impressive yet vastly underutilized by most users. This seminar will teach you best practices for data visualization in general, along with specific ways to implement them in Stata.
We will use ChatGPT and Claude for course examples of LLM prompting, though most LLMs can be used for the purpose of aiding data visualization.
Computing
This course will use Stata for the examples and exercises. Stata version 19 will be used for the examples, but the exercises can also be done with versions 14-18.
For LLM support, the instructor will use ChatGPT and Claude. Both have free versions that allow limited use. However, most modern LLMs (e.g., ChatGPT, Claude, Gemini, Copilot) will be useful for understanding, modifying, and interpreting data visualizations.
The lecture slides are accompanied by a full set of Stata replication files. To replicate the instructor’s examples, you should have Stata already installed on your computer when the course begins. Basic familiarity with Stata is highly desirable, but even novice Stata coders should be able to follow the presentation and do the exercises. For users new to Stata, an “Introduction to Stata” guide will be provided before the seminar begins which covers the basics of getting started using Stata.
If you’d like to familiarize yourself with Stata basics before the seminar begins, we recommend following along with a “getting started” video like the one here.
Seminar participants who are not yet ready to purchase Stata could take advantage of StataCorp’s 30-day software return policy.
Who Should Register?
If you use data, you can benefit from this seminar. Stata is a flexible and powerful tool for visualizing your data to better understand data and statistical models. This seminar is for anyone who wants to learn tools for creating effective and attractive data visualizations.
No experience with LLMs is required for the seminar. The instructor will provide full prompts necessary to aid in your data visualizations.
No statistical background is required for the seminar—but a working knowledge of regression or ANOVA is helpful to understand some of the more advanced visualization techniques we cover.
Outline
Day 1: Univariate distributions
- Why visualize data?
- The art and science of effective data visualization
- Introduction to data visualization in Stata
- Unique benefits of Stata for visualization
- Common options universal to most graphs
- Commonly used Stata tools
- LLMs: Overview of effective prompts for data visualizations
- Plots of univariate distributions
- Histograms
- Kernel density plots
- Overlays for group comparisons
- Box (and whisker) plots
- Violin plots
- Transforming distributions
- Visual tools and cautions
- Plotting parts of a whole, and amounts across groups
- Pie charts (and many cautions)
- Perceptual accuracy and choosing plots
- Stacked bar charts; group comparisons
- Stacked bar charts
- Bar charts
- Dot plots
- Radar/spider plots
- Confidence intervals and standard errors
- Visual tools for conveying uncertainty
- Balance plots
- Observational data, experimental data, and causal inference matching methods
- Using LLMs to help write code for data visualizations
- Stata tools
- Using LLMs to fix errors in code
- Using LLMs to customize graphics
Day 2: Bivariate relationships
- Plots of bivariate relationships
- Scatterplots
- Options for continuous and nominal variables
- Scatterplot smoothing
- Lowess
- Incorporating covariates
- Local polynomial smoothing
- Heat plots
- Correlation matrices as heat plots
- Plotting change over time
- Slopegraphs
- Ridgeline plots (AKA joyplots)
- Area plots
- General data visualization rules and guidelines
- Axis range rules
- 3D graphics
- Using color well
- Nominal vs ordinal palettes
- Color blindness-proofing your graphs
- Figures that work in color or black and white
- Fonts
- Graphics file formats
- Graph schemes in Stata
- Confidence intervals and inferring statistical significance
- Maps
- Map projection options; pros and cons
- Choropleth maps
- Area vs population issues in visualization
- World, countries, states, and counties
- LLM checks for data visualizations
- Teaching your preferred LLM data visualization rules
- Easy checks for your figures to be maximally accessible and effective
Day 3: Visualizing model results & advanced topics
- Visualizing model results
- Coefficient plots
- Comparing across models and/or groups
- Plots of model predictions
- Continuous predictors vs nominal predictors
- Adding distributional information to plots
- Univariate and group-specific
- Visualization with many groups
- Marginal effects
- Plots of effects
- Summaries
- Plots of group differences
- Interaction effects
- Nominal x nominal interactions
- Nominal x continuous interactions
- Continuous x continuous interactions
- Nonlinear interaction effects
- Diagnosing, modeling, and visualizing nonlinearities
- Scatterplot smoothing
- Binned scatterplots
- Continuous variables modeled as nominal
- Ordinal predictors
- Splines
- Model diagnostics
- Residuals
- Influence
- Added-variable plots
- Using LLMs to reproduce figures of interest
- LLM prompts for reproduction
- LLM prompts for improvements and suggestions
Reviews of Data Visualization Using Stata and LLMs
“The clarity with which Trent teaches the course is exceptional. I appreciate how he shares all the code with the students so that we can replicate it with our own data. Trent is an excellent instructor, and I always learn so much in classes with him. Everything in the course is extremely well-organized and Trent is super responsive in addressing questions!”
Vasundhara Kaul, Purdue University
“Trent was a fantastic instructor – excellent balance of content, explanations, resources, and breaks. I also loved the “levity” portions of each session. While I took the course to learn data visualization, I ended up learning about Stata code and got a bit of a statistics refresher too!”
Jessie Jensen, Rutgers University
“I enjoyed the ability to immediately translate what I learned to my current projects. Data visualization includes many hands-on applications that are useful right away. Just as important, I think there is a data visualization approach/perspective to doing data analysis that comes through.”
Sean Lauer, University of British Columbia
“This seminar was accessible yet sophisticated. I liked that we were provided ample resources to engage with the content even further.”
Ráchael Powers, University of South Florida
Seminar Information
Wednesday, April 8 – Friday, April 10, 2026
Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).
10:00am-12:30pm (convert to your local time)
1:30pm-3:30pm
Payment Information
The fee of $995 USD includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.
