An Introduction to How Much Data
The topic of this course is sample-size planning: How do you know how much data to collect for a research project?
Strangely, this is not often a point of emphasis in our scientific training. It is common for even advanced graduate courses in statistics to omit this topic or touch on it rarely. Even worse, the informal training you might receive in a lab is often wrong.
The lack of strong training on sample-size determination in science is worrisome. You wouldn’t get into a plane with a pilot who didn’t have a flight plan. So it seems a bit strange that often we charge off into experimentation with no real sense of where we are going and when it will end. That’s a missed opportunity, because with just a bit of initial planning we can do research that is more fruitful and reproducible.
The goal of this workshop is to demystify sample-size planning and to provide actionable, practical advice about how to develop solid sample-size plans for your research. This course was developed not by a professional statistician but by a neuroscientist who accidentally fell down the rabbit hole of caring deeply about statistical inference. Thus, the course aims to present material clearly, in language fellow bench scientists can understand without having to delve too deeply into statistical arcana.
Here’s the outline of the course:
First, we’ll discuss why sample-size planning is so vital to good science and therefore why it is worth investing your time and energy into completing this workshop. You may have been driven to this class because of some external mandate imposed by a funder or training program. The goal for this section is to help develop your own, intrinsic motivation for learning how to plan sample sizes.
Next, we need to clear away some very common but very bad practices: we’ll discuss the perils of “run-and-check” and of just copying forward sample sizes from previous studies. It turns out these informal approaches to sample-size determination undermine sound science.
The unit on effect sizes provides some of the foundational knowledge needed to understand sample-size planning. Specifically, we define effect sizes, discuss how they can be expressed in different units, and provide some tips on how to build “effect size zoos” and start thinking in effect sizes.
Finally, we get to the actual nitty-gritty of sample-size planning, covering three major approaches:
Planning for Evidence (still under development)
With an understanding of some of the different approaches to sample-size planning, we’re ready for a checklist for a quality sample-size plan and a review of some model plans.
Last but not least, this workshop describes some important strategies for dealing with sample-size sticker shock. There are lots of ways to optimize your experiments to help reduce your sample-size needs and get more information out of your experiments.
Hope you enjoy this workshop. Please submit questions or bug reports to the GitHub page for this course: https://github.com/rcalinjageman/How-Much-Data/discussions.
The development of this course was supported by the NIGMS division of the National Institutes of Health through grant 5R25GM132784-02. You can find other training modules focused on Rigor and Reproducibility here: https://www.nigms.nih.gov/training/pages/clearinghouse-for-training-modules-to-enhance-data-reproducibility.aspx.