Using Large Language Transformer Models for Research in R: A Short Course

A 3-Day Livestream Seminar Taught by Hudson Golino, Ph.D. and Alexander Christensen, Ph.D.


This seminar will introduce you to basic techniques for converting unstructured text data to structured data in R. The course will also cover word embeddings, a necessary precursor to large language transformer models (LLMs), and you will gain hands-on experience implementing them in R.

Additionally, the course will cover the concept of zero-shot classification, which involves using LLMs for text classification without the need for labeled data. You will learn about Hugging Face Transformers and implement zero-shot classification in R. Finally, the course will cover retrieval-augmented generation to summarize topics in texts for automatic zero-shot text classification using R and pre-trained transformer models.

Overall, the goal of this course is to provide you with a comprehensive (applied) understanding of LLMs for research applications. By the end of the course, you will be equipped with the necessary skills to apply these techniques to analyze and extract insights from unstructured text data in your research work.

Starting August 6, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 

Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian.

More Details About the Course Content

Why are Large Language Transformer Models (LLMs) so popular nowadays?

Large language transformer models, such as GPT-4, have gained popularity for several reasons:

  1. State-of-the-art performance: These models have achieved state-of-the-art performance on a wide range of natural language processing tasks, including language translation, text summarization, question answering, and language generation.
  2. Zero-shot learning: LLMs can perform tasks for which they have not been explicitly trained, a property known as zero-shot learning. This is because they have been trained on a vast amount of diverse text data, allowing them to understand the underlying patterns and relationships in natural language.
  3. Scalability: LLMs are highly scalable and can be fine-tuned for specific tasks with relatively small amounts of task-specific data.
  4. General-purpose: LLMs are designed to be general-purpose, meaning they can be used for a wide variety of natural language processing tasks without the need for specialized models for each task.
  5. Ease of use: Many LLMs are available as pre-trained models, allowing developers and researchers to use them without the need for extensive training or expertise in natural language processing.

Overall, the combination of state-of-the-art performance, zero-shot learning, scalability, general-purpose design, and ease of use makes large language transformer models highly attractive for a wide range of natural language processing applications.

Our course is designed as a first introduction to natural language processing and large language models for research applications, covering some basic concepts and applications of transformer models in R.

Computing

This is a hands-on course with instructor-led software demonstrations and guided exercises. These guided exercises are designed for the R language, so you should use a computer with a recent version of R (version 4.1.3 or later) and RStudio (version 2022.02.1+461 or later).

To follow along with the course exercises, you should be familiar with R, including opening data files, running programs, and performing very basic data manipulation and analyses.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.

Who Should Register?

The course is designed for participants who have a solid basic understanding of R and are interested in applying NLP techniques to extract insights from unstructured text data for research purposes.

Outline

Introduction to text mining

  • What is text mining
  • Common applications of text mining
  • From texts to structured data
  • Overview of the process of converting unstructured text data to structured data
  • Text tokenization, stop word removal, and stemming
  • Transforming text data into a usable format for modeling
  • Topic modeling with exploratory graph analysis for cross-sectional text data
  • Generalized local linear approximation and time-delay embedding
  • Topic modeling with dynamic exploratory graph analysis for time-series text data (or intensive longitudinal text data)

Word embeddings

  • Introduction to word embeddings and their use in text classification
  • Different types of word embeddings
  • BERT word embeddings in R
  • Mining word embeddings with exploratory graph analysis in R

Introduction to large language transformer models: Understanding the concept of zero-shot classification

  • Introduction to large language transformer models
  • Understanding the difference between traditional NLP models and large language transformer models
  • Introduction to Hugging Face Transformers and its implementation in R
  • Research examples of zero-shot classification using R

Automatic text classification and summarization

  • Automatic text classification and summarization using R

Reviews of Using Large Language Transformer Models for Research in R

“I recently completed a course on large language modeling, and it exceeded my expectations. The presentations were top-notch, providing clear insights into complex concepts. The discussions were engaging, fostering a collaborative learning environment. The instructors were knowledgeable, making the entire experience highly valuable. I highly recommend this course to anyone interested in exploring large language models.” 
  Dr. Sepideh Banava, UCSF 

“The in-depth explanation and the statistical walkthroughs with the code were excellent, as was the focus on application. I appreciated the responsiveness of the instructors on Zoom chat and Slack to answer questions from participants. I stayed up until 4:30 am in Hong Kong for almost the entire course. I was too sleepy to attend the second session live on the first night. It’s currently 5 am as I write this, which is a testament to the course’s value. I will definitely be revisiting the recordings too.” 
  Stefano Occhipinti, The Hong Kong Polytechnic University 

“The lecturers were very dedicated and put a lot of effort into teaching us the content.” 
  Michael Thrun, IAP-GmbH 

“I recently completed the Using Large Language Transformer Models for Research in R course and highly recommend the training. The instructors were friendly, helpful, and thorough in their approach, making sure important concepts were clearly explained and understood. They were always available to answer questions and provide guidance, which made the learning process so much more enjoyable and effective.

The time spent going over the worked examples was particularly useful, as it allowed me to gain a deeper understanding by seeing how the concepts we had learned actually functioned in a practical manner. I highly recommend this course to researchers looking for an introduction to using large language models in their research. It is well-structured, comprehensive, and the support provided by the instructors is second to none.”
  William Rayo, Oregon State University

“I loved that the course was geared towards R users. So many courses are taught by Python power users. The instructors are incredibly knowledgeable and have developed R packages for researchers to put LLMs into practice! The instructors are fantastic. The course materials (slides, exercises, references) were great and will be a valuable resource. I also appreciated the updates and links on Slack.”
  Juan Fung, National Institute of Standards and Technology

“I liked the very knowledgeable presenters.”
  Nicholas Shirlaw, University of New South Wales

“I appreciated the very clean and well-organized R code.”
  Garth Rauscher, University of Illinois Chicago

“This course opened a huge door for many different and important tools.”
  Bruno Teixeira, Bristol Myers Squibb

“The R code and detailed discussion of processing text and using the packages was great!”
  Jay Unick, University of Maryland

Seminar Information

Tuesday, August 6 – Thursday, August 8, 2024

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm (convert to your local time)
1:30pm-3:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.