Using Large Language Transformer Models for Research in R: A Short Course

An 8-Hour Livestream Seminar Taught by Hudson Golino, Ph.D. and Alexander Christensen, Ph.D.

Over the years, we’ve gotten many requests for short introductory courses. Today we are proud to unveil our newest “mini” course: Using Large Language Transformer Models for Research in R. In just 8 hours (over 2 days), you will learn to use natural language processing (NLP) techniques and large language transformer models (LLMs) for research applications using the R programming language. We hope you enjoy the course.

This seminar will introduce you to basic techniques to convert unstructured text data to structured data in R. As a necessary precursor to large language transformer models (LLMs), the course will also cover word embeddings and their use, and you will gain hands-on experience implementing word embeddings in R.

Additionally, the course will cover the concept of zero-shot classification, which involves using LLMs for text classification without the need for labeled data. You will learn about Hugging Face Transformers and implement zero-shot classification in R. Finally, the course will cover automatic text classification and summarization using R and pre-trained transformer models.

Overall, the goal of this course is to provide you with a comprehensive (applied) understanding of LLMs for research applications. By the end of the course, you will be equipped with the necessary skills to apply these techniques to analyze and extract insights from unstructured text data in your research work.

Starting August 28, we are offering this seminar as an 8-hour synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two 2-hour lecture sessions which include hands-on exercises, separated by a 30-minute break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 

Closed captioning is available for all live and recorded sessions. Live captions can be translated into a variety of languages, including Spanish, Korean, and Italian.

More Details About the Course Content

Why are Large Language Transformer Models (LLMs) so popular nowadays?

Large language transformer models, such as GPT-4, have gained popularity for several reasons:

  1. State-of-the-art performance: These models have achieved state-of-the-art performance on a wide range of natural language processing tasks, including language translation, text summarization, question answering, and language generation.
  2. Zero-shot learning: LLMs can perform tasks for which they have not been explicitly trained. Because they are trained on vast amounts of diverse text, they pick up patterns and relationships in natural language that carry over to new tasks.
  3. Scalability: LLMs are highly scalable and can be fine-tuned for specific tasks with relatively small amounts of task-specific data.
  4. General-purpose: LLMs are designed to be general-purpose, meaning they can be used for a wide variety of natural language processing tasks without the need for specialized models for each task.
  5. Ease of use: Many LLMs are available as pre-trained models, allowing developers and researchers to use them without the need for extensive training or expertise in natural language processing.

Overall, the combination of state-of-the-art performance, zero-shot learning, scalability, general-purpose design, and ease of use makes large language transformer models highly attractive for a wide range of natural language processing applications.

Our course is designed as a first introduction to natural language processing and large language models for research applications, covering some basic concepts and applications of transformer models in R.

Computing

This is a hands-on course with instructor-led software demonstrations and guided exercises. These guided exercises are designed for the R language, so you should use a computer with a recent version of R (version 4.1.3 or later) and RStudio (version 2022.02.1+461 or later).

To follow along with the course exercises, you should have good familiarity with the use of R, including opening data files, running programs, and performing very basic data manipulation and analyses.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.

Who Should Register?

The course is designed for participants who have a solid basic understanding of R and are interested in applying NLP techniques to extract insights from unstructured text data for research purposes.

Outline

Introduction to text mining

  • What is text mining?
  • Common applications of text mining
  • From texts to structured data
  • Overview of the process of converting unstructured text data to structured data
  • Text tokenization, stop word removal, and stemming
  • Transforming text data into a usable format for modeling
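To give a flavor of the steps listed above, here is a minimal sketch in base R of tokenization, stop word removal, and building a document-term matrix. It is only an illustration, not the course's own material: the three example sentences, the tiny stop word list, and the helper name `tokenize` are all assumptions made up for this example, and stemming is left out because base R has no built-in stemmer.

```r
# Toy corpus: three short "documents" (invented for illustration).
docs <- c(
  d1 = "The model reads the text.",
  d2 = "Text mining turns text into data!",
  d3 = "The data feed the model."
)

# A tiny hand-picked stop word list (an assumption; real lists are much longer).
stop_words <- c("the", "into", "a", "an", "of")

# Hypothetical helper: lowercase, replace non-letters with spaces,
# split on whitespace, then drop stop words.
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z[:space:]]", " ", x)
  tokens <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  tokens[!(tokens %in% stop_words)]
}

tokens <- lapply(docs, tokenize)

# Vocabulary: every distinct term that survived stop word removal.
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per document, one column per term,
# each cell holding how often the term occurs in the document.
dtm <- t(vapply(
  tokens,
  function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(dtm) <- vocab

print(dtm)
```

The resulting matrix is the kind of structured data that standard statistical models can take as input. In practice, dedicated text mining packages handle these steps more robustly than this hand-rolled version.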

Word embeddings

  • Introduction to word embeddings and their use in text classification
  • Different types of word embeddings
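As a small illustration of the idea behind word embeddings, the sketch below represents three words as numeric vectors and compares them with cosine similarity in base R. The vectors are hand-made toy numbers, not the output of a trained model, and the names `embeddings` and `cosine_similarity` are assumptions introduced for this example only.

```r
# Toy 3-dimensional "embeddings" (hand-made numbers, NOT from a trained model).
embeddings <- rbind(
  happy = c(0.9, 0.1, 0.0),
  glad  = c(0.8, 0.2, 0.1),
  table = c(0.0, 0.1, 0.9)
)

# Cosine similarity: dot product divided by the product of the vector lengths.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

sim_happy_glad  <- cosine_similarity(embeddings["happy", ], embeddings["glad", ])
sim_happy_table <- cosine_similarity(embeddings["happy", ], embeddings["table", ])

# Words with similar meanings should score higher than unrelated words.
print(c(happy_glad = sim_happy_glad, happy_table = sim_happy_table))
```

With these toy vectors, "happy" and "glad" point in nearly the same direction and so score close to 1, while "happy" and "table" score close to 0. Real embeddings work the same way but have hundreds of dimensions learned from large text collections.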

Introduction to large language transformer models: Understanding the concept of zero-shot classification

  • Introduction to large language transformer models
  • Understanding the difference between traditional NLP models and large language transformer models
  • Introduction to Hugging Face Transformers and its implementation in R
  • Research examples of zero-shot classification using R

Automatic text classification and summarization

  • Automatic text classification and summarization using R

Seminar Information

Monday, August 28 and Wednesday, August 30, 2023

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:30am-12:30pm (convert to your local time)
1:00pm-3:00pm

Payment Information

The fee of $595 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.