Using Large-Language Models for Social Science Research: A Short Course

A 3-Day Livestream Seminar Taught by Ethan C. Busby, Ph.D.

This seminar explores how to integrate large language models (LLMs) into research on human attitudes and behavior. It provides an introduction to LLMs like ChatGPT, Claude, Gemini, and Llama, discusses the critical concept of prompt engineering, and teaches you how to use LLMs in three common use cases:

  1. Coding open-ended survey responses,
  2. Generating simulated or synthetic samples, and
  3. Using LLMs as treatments in randomized experiments.

The course is designed to give you the tools you need to start using LLMs in these ways in your own work. You’ll gain a foundational understanding of what these generative AI tools are, the most efficient ways to interact with them, and detailed knowledge of how to apply LLMs in these three use cases.

This seminar includes a set of exercises and supplemental tutorials to help you apply the skills from the seminar yourself. By working through these applications, you will build the practical skills needed to use LLMs in your own projects.

Starting February 27, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions that include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but you will have the opportunity to view the recorded session later if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, so you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.

More Details About the Course Content

As a foundation for the course, you will first receive an accessible introduction to the history, structure, and nature of large language models (LLMs). We will then turn to the differences among types of LLMs (open- and closed-source LLMs, LLMs provided by different companies, etc.).

A significant amount of seminar time will be devoted to “prompt engineering,” the principles of interacting with LLMs. We will contrast prompting with fine-tuning (which we will discuss only at a conceptual level) and cover different approaches to prompting. You will also spend time practicing these prompting methods yourself.

The course will provide detailed demonstrations of how to interact with LLMs in public-facing interfaces (like ChatGPT), online developer platforms, and through APIs in R and RStudio. You will be given time to work through exercises and gain experience working with these interfaces as a part of the seminar.

The final part of the course covers three common use cases: coding open-ended survey responses, generating simulated or synthetic samples, and using LLMs as treatments in randomized experiments. Of these three, we will devote more time to synthetic samples and treatments than to coding open-ended texts.

Computing

You should have access to a computer with the most recent versions of R and RStudio installed.

Before the course begins, you will receive detailed instructions and examples for setting up API keys for three LLM providers (OpenAI, Anthropic, and Google) and for GitHub Copilot.

The goal is to make the course as accessible and approachable as possible. The applications and examples, however, assume a working familiarity with R programming. Many of these skills transfer to other languages (like Python), but all of the demonstrations and exercises use R and RStudio.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.

Who Should Register?

This course is aimed at social scientists and behavioral researchers without advanced programming skills who want to enhance their research with AI tools (specifically, large language models or LLMs). If you are a data scientist, market researcher, social scientist, or analyst interested in how LLMs can help in your work, this course is for you.

Outline

Day 1: Background and start-up

    • Conceptual introduction to LLMs
    • Brief comparison of LLMs to other related tools (notably unsupervised machine learning, supervised machine learning, and BERT models)
    • Discussion of different kinds of LLMs, with costs and benefits to each
      • Closed vs. open-source LLMs
      • LLMs produced by different companies
      • LLMs of different sizes
      • LLMs with different linguistic capacities
      • Leaderboards
    • Demonstrations
      • Different methods of interacting with LLMs
      • Demonstrations with OpenAI, Anthropic, and Google
      • Demonstrations using R
    • Exercises and practice

Day 2: Prompting and ethics

    • Revisiting the conceptual nature of LLMs
    • Prompting and prompt engineering
      • Principle-based approaches
      • The need for trial and error
      • The concept of alignment and the construction of the models
      • Where in the workflow to budget time and resources for prompting
      • The role of validation
    • Demonstrations
      • Prompt engineering through the public-facing interfaces of different LLMs
      • Demonstration of playground/sandbox/workbench elements of closed-source LLMs
    • Exercises and practice
    • Discussion of ethical considerations when working with LLMs

Day 3: Three use cases of LLMs

    • Walk through several examples of how researchers are currently using LLMs
    • Cover three common use cases, with demonstrations and exercises for each
    • Coding open-ended survey responses
      • Motivation/need for this tool
      • How LLMs can be used to generate dictionaries
      • How LLMs can be used directly as coders
      • Revisiting the concept of validation
    • Synthetic/simulated samples with LLMs
      • Introduction to the concept
      • Applications
      • Extension of prompt engineering
      • Limits and possibilities
    • LLMs as treatments
      • Different ways to use LLMs as treatments
      • Generating vignettes and images
      • Use within a survey tool
      • As interactive/dynamic components
      • Uses and limits
      • Ethical considerations
    • Demonstrations of all three use cases
    • Exercises with all three use cases
    • Recap and conclusion

Seminar Information

Thursday, February 27 – Saturday, March 1, 2025

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm
1:30pm-3:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.