An Introduction to How Much Data

The topic of this course is sample-size planning: How do you know how much data to collect for a research project? 

Strangely, this is not often a point of emphasis in our scientific training. It is common for even advanced graduate courses in statistics to omit this topic or touch on it only in passing. Even worse, the informal training you might receive in a lab is often wrong.

The lack of strong training on sample-size determination in science is worrisome. You wouldn’t get into a plane with a pilot who didn’t have a flight plan. So it seems a bit strange that often we charge off into experimentation with no real sense of where we are going and when it will end. That’s a missed opportunity, because with just a bit of initial planning we can do research that is more fruitful and reproducible. 

The goal of this workshop is to demystify sample-size planning and to provide actionable, practical advice about how to develop solid sample-size plans for your research. This course was developed not by a professional statistician but by a neuroscientist who accidentally fell down the rabbit hole of caring deeply about statistical inference. Thus, the course aims to present material clearly, in language fellow bench scientists can understand without having to delve too deeply into statistical arcana.

Here’s the outline of the course:

I hope you enjoy this workshop. Please submit questions or bug reports to the GitHub page for this course: https://github.com/rcalinjageman/How-Much-Data/discussions.

The development of this course was supported by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health through grant 5R25GM132784-02. You can find other training modules focused on Rigor and Reproducibility here: https://www.nigms.nih.gov/training/pages/clearinghouse-for-training-modules-to-enhance-data-reproducibility.aspx.