Using Large Language Transformer Models for Research in R: A Short Course

A 3-Day Livestream Seminar Taught by Hudson Golino, Ph.D. and Alexander Christensen, Ph.D.


This seminar will introduce you to basic techniques for converting unstructured text data to structured data in R. The course will also cover word embeddings and their uses, a necessary precursor to large language transformer models (LLMs), and you will gain hands-on experience implementing word embeddings in R.

Additionally, the course will cover the concept of zero-shot classification, which uses LLMs to classify text without the need for labeled training data. You will learn about Hugging Face Transformers and implement zero-shot classification in R. Finally, the course will cover retrieval-augmented generation, which you will use with R and pre-trained transformer models to summarize the topics in texts for automatic zero-shot text classification.

Overall, the goal of this course is to provide you with a comprehensive, applied understanding of LLMs for research applications. By the end of the course, you will be equipped with the skills needed to apply these techniques to analyze and extract insights from unstructured text data in your research.

Starting August 6, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions that include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lectures live, but you will have the opportunity to view the recorded sessions later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian.

Computing

This is a hands-on course with instructor-led software demonstrations and guided exercises. The guided exercises are designed for the R language, so you should use a computer with a recent version of R (version 4.1.3 or later) and RStudio (version 2022.02.1+461 or later).

To follow along with the course exercises, you should have good familiarity with the use of R, including opening and executing data files and programs, as well as performing very basic data manipulation and analyses.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics.

Who Should Register?

The course is designed for participants who have a solid basic understanding of R and are interested in applying natural language processing (NLP) techniques to extract insights from unstructured text data for research purposes.

Outline

Introduction to text mining

What is text mining
Common applications of text mining
From texts to structured data
Overview of the process of converting unstructured text data to structured data
Text tokenization, stop word removal, and stemming
Transforming text data into a usable format for modeling
Topic modeling with exploratory graph analysis for cross-sectional text data
Generalized local linear approximation and time-delay embedding
Topic modeling with dynamic exploratory graph analysis for time-series text data (or intensive longitudinal text data)

Word embeddings

Introduction to word embeddings and their use in text classification
Different types of word embeddings
BERT word embeddings in R
Mining word embeddings with exploratory graph analysis in R

Introduction to large language transformer models: Understanding the concept of zero-shot classification

Introduction to large language transformer models
Understanding the difference between traditional NLP models and large language transformer models
Introduction to Hugging Face Transformers and their use in R
Research examples of zero-shot classification using R

Automatic text classification and summarization

Automatic text classification and summarization using R

Reviews of Using Large Language Transformer Models for Research in R

“I recently completed a course on large language modeling, and it exceeded my expectations. The presentations were top-notch, providing clear insights into complex concepts. The discussions were engaging, fostering a collaborative learning environment. The instructors were knowledgeable, making the entire experience highly valuable. I highly recommend this course to anyone interested in exploring large language models.”
Dr. Sepideh Banava, UCSF

“The in-depth explanation and the statistical walkthroughs with the code were excellent, as was the focus on application. I appreciated the responsiveness of the instructors on Zoom chat and Slack to answer questions from participants. I stayed up until 4:30 am in Hong Kong for almost the entire course. I was too sleepy to attend the second session live on the first night. It’s currently 5 am as I write this, which is a testament to the course’s value. I will definitely be revisiting the recordings too.”
Stefano Occhipinti, The Hong Kong Polytechnic University

“The lecturers were very dedicated and put a lot of effort into teaching us the content.”
Michael Thrun, IAP-GmbH

“I recently completed the Using Large Language Transformer Models for Research in R course and highly recommend the training. The instructors were friendly, helpful, and thorough in their approach, making sure important concepts were clearly explained and understood. They were always available to answer questions and provide guidance, which made the learning process so much more enjoyable and effective.

The time spent going over the worked examples was particularly useful, as it allowed me to gain a deeper understanding by seeing how the concepts we had learned actually functioned in a practical manner. I highly recommend this course to researchers looking for an introduction to using large language models in their research. It is well-structured, comprehensive, and the support provided by the instructors is second to none.”
William Rayo, Oregon State University

“I loved that the course was geared towards R users. So many courses are taught by Python power users. The instructors are incredibly knowledgeable and have developed R packages for researchers to put LLMs into practice! The instructors are fantastic. The course materials (slides, exercises, references) were great and will be a valuable resource. I also appreciated the updates and links on Slack.”
Juan Fung, National Institute of Standards and Technology

“I liked the very knowledgeable presenters.”
Nicholas Shirlaw, University of New South Wales

“I appreciated the very clean and well-organized R code.”
Garth Rauscher, University of Illinois Chicago

“This course opened a huge door for many different and important tools.”
Bruno Teixeira, Bristol Myers Squibb

“The R code and detailed discussion of processing text and using the packages was great!”
Jay Unick, University of Maryland

Seminar Information

Tuesday, August 6 – Thursday, August 8, 2024

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm
1:30pm-3:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.

Contact Information

+1 610-715-0115 info@statisticalhorizons.com