Team Science Principles for Data Scientists: A Short Course

A 3-Day Livestream Seminar Taught by Manisha Desai, Ph.D.


Data scientists increasingly work as members of scientific teams. In this course, participants who are (or are training to be) data scientists will learn team science tools for engaging clinical and translational investigators in the collaborative research process. These principles apply across the medical, behavioral, and social sciences.

The course covers how to engage with non-data scientists in a team setting at every stage of the translational research process, from study design through data management and data analysis to the dissemination of findings. We will address the following questions:

  • How should a data scientist be integrated into the team?
  • Is there a difference between a data scientist consulting on a project and collaborating on one?
  • When should a data scientist be onboarded to a project, and what happens when the ideal does not occur in practice?
  • More generally, how should the data scientist engage collaborators, given that investigators may be ready to include data science expertise at different stages?
  • Are data collection, extraction, cleaning, and management topics that should concern a data scientist?
  • What do topics like authorship and reasonable timelines have to do with upholding principles of rigor and reproducibility?
  • Who is responsible for interpreting empirical findings?

This course touches upon these issues and more in the context of a multidisciplinary team.

Starting September 28, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions, with hands-on exercises, separated by a 1-hour break. You are encouraged to join the lectures live, but if you are unable to attend at the scheduled time, you will be able to view the recorded session later in the day.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions. Live captions can be translated into a variety of languages, including Spanish, Korean, and Italian.

More Details About the Course Content

By the end of this seminar, you will have the skills to:

  • Assemble and recommend an optimal data science sub-team.
  • Engage collaborators in the scientific process of study design.
  • Educate collaborators on optimal ways to engage data scientists as true peers.
  • Educate collaborators on how to integrate rigor and reproducibility principles.
  • Engage the entire team to jointly design a study and develop an ideal statistical analysis plan.
  • Play an integral role during data collection and data extraction phases of the study.
  • Disseminate findings deliberately, both to the team and to the research community.
  • Ensure the data scientist’s voice is heard in the study design, study implementation, and interpretation of findings.

In addition to lectures, the material will be taught through simulated role-playing and real-time demonstrations of collaboration with a guest collaborator.

This course will not rely on any particular computing package.

Who Should Register?

Anyone who is, or would like to become, a collaborative data scientist would benefit from this course. Prior introductory coursework in statistics or biostatistics and in study design would be helpful.

Assessing project needs from a data science lens and getting integrated into the team

  • Introduction to the assessment meeting and its purpose
  • Essential elements of an assessment meeting
  • Assessment tool
  • Autopsies on examples of assessment meetings
  • Extracting next steps from assessment meetings
  • Four assessment scenarios to troubleshoot (examples below)
    • My grant is near final and I just need a power calculation.
    • I have a simple question that requires a simple t-test for you to perform.
    • Can you review my manuscript to make sure the stats are correct?
    • My study is really simple and should take a few hours. I want to use some electronic health record data to study whether those who died of late-stage colon cancer utilized more resources than necessary if they were on private vs. public insurance. I have all the data cleaned and ready to analyze.

Essentials to designing a study and developing a statistical analysis plan

  • Essential elements of a statistical analysis plan
  • Statistical analysis plan templates for different types of studies
  • Examples of good and bad statistical analysis plans
  • The purpose and importance of pre-registering the statistical analysis plan
  • Engagement with guest collaborator

The ideal statistical analysis plan and data management plan

  • Essential elements of a data sharing and management plan
  • Examples of data sharing and management plans
  • Role of data scientists in ongoing data management and data cleaning
  • Workshop to collectively establish an example statistical analysis plan (continuing example)

Disseminating findings and troubleshooting collaborative mishaps

  • Scenarios with output illustrating principles of optimal dissemination to the study team while the study is ongoing
  • Scenarios with output illustrating principles of optimal dissemination to the research community
  • Five dissemination scenarios to troubleshoot (examples below)
    • Unrealistic deadlines
    • Authorship issues
    • Tone of the findings
    • Altering/deviating from statistical analysis plan
    • Disseminating negative findings

Seminar Information

Thursday, September 28 – Saturday, September 30, 2023

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.