Decreasing Sample-Size Needs

When you get serious about sample-size planning, you will often find that getting clear answers to your scientific questions takes much larger samples than you’d like. While there is no doubt that many fields have settled on demonstrably inadequate sample sizes, the answer doesn’t have to be solely about collecting more data: we can also design better studies!

The video above gives an example of optimization - the art of coaxing more out of your data. There are three basic approaches:

Reducing Variation is Gold

Imagine you are characterizing neurogenesis in animals raised in stressed vs. standard conditions. From previous studies, you expect control animals to have about 3,500 new neurons labelled 1 hour after BrdU injection, with a standard deviation of 500. You expect stress to have a pretty notable effect, say a 20% reduction to 2,800 labelled neurons, a difference of 3,500 - 2,800 = 700 neurons. This example is inspired by (Mirescu, Peters, and Gould 2004) (note that we’ll analyze this scenario with t-tests, as the original authors did, even though count data is typically not normally distributed and is therefore not ideally suited to this kind of analysis).
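Before worrying about sample size, it helps to see how big this expected effect is in standardized terms - a quick back-of-the-envelope calculation of Cohen’s d:

# Expected effect in standardized units (Cohen's d)
(3500 - 2800) / 500   # d = 1.4 - a very large effect by most benchmarks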

Let’s say the typical study in your field uses n = 6/group. Is that adequate? With alpha = 0.05, how much power would that give us to detect the expected effect? Let’s find out with statpsych:

if (!require("statpsych")) install.packages("statpsych")
Loading required package: statpsych
# Estimate power for a 2-group design
statpsych::power.mean2(
  alpha = 0.05,
  n1 = 6,
  n2 = 6,
  var1 = 500^2,
  var2 = 500^2,
  es = 3500-2800
)
     Power
 0.5906058

Uh oh, power is only about 59%. That means we have a high risk of missing true effects, and that the statistically significant effects we do obtain are likely to be inflated (see What Not To Do).
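One obvious option is to collect more animals. Assuming statpsych’s size.test.mean2 sample-size function (it takes a planning value for the average within-group variance), a rough sketch of that calculation suggests we’d need roughly double the typical 6 per group to reach 90% power:

if (!require("statpsych")) install.packages("statpsych")

# Sketch: per-group n needed for 90% power under the original assumptions
statpsych::size.test.mean2(
  alpha = 0.05,
  pow = 0.9,
  var = 500^2,       # planning value for the average within-group variance
  es = 3500 - 2800
)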

We could throw more data at the problem. But first, what would happen if we could reduce our sampling variability? Imagine, for example, that we better-standardize our cell counts, perhaps by having two trainees count them independently and using the average. We could also refine how we handle borderline cases, and perhaps standardize our injection and dissection protocols a little better. Suppose that through these steps we could reduce within-group variation by just 20%. What would that do to our power?

if (!require("statpsych")) install.packages("statpsych")

# Same scenario, within group sd reduced by 20% through optimization
statpsych::power.mean2(
  alpha = 0.05,
  n1 = 6,
  n2 = 6,
  var1 = (500 * .8) ^2,   # Imagine reducing the sd to 80% of what your lab typically obtains
  var2 = (500 * .8) ^2,   # same reduction in both groups
  es = 3500-2800
)
     Power
 0.7797131

That’s a big jump in power! We’re now getting close to a reasonable power with the same number of samples.

Would a 20% increase in sample size have the same impact?

if (!require("statpsych")) install.packages("statpsych")

# Same scenario but increase sample-size by 20%... not the same impact!
statpsych::power.mean2(
  alpha = 0.05,
  n1 = 6 * 1.2,
  n2 = 6 * 1.2,
  var1 = 500^2,
  var2 = 500^2,
  es = 3500-2800
)
     Power
 0.6868732

No! Why not? Well, recall the formula for the standard error of the mean:

\[ \sigma_{M} = \frac{\sigma}{\sqrt{N}} \]

This shows that expected sampling error is directly proportional to within-group variation, while sample size helps only through its square root. Reducing noise (when possible) can therefore be much more impactful than increasing sample size: it takes roughly a 56% increase in sample size to match the benefit of a 20% reduction in within-group standard deviation!
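Here’s the arithmetic behind that 56% figure, sketched in a few lines:

# Standard error of the mean under the original and optimized scenarios
sd_original <- 500
sd_reduced  <- 500 * 0.8           # 20% reduction in within-group sd
n <- 6

sd_original / sqrt(n)              # SEM of ~204 neurons
sd_reduced / sqrt(n)               # SEM of ~163 neurons

# Sample size needed to reach that same SEM with the original sd
n * (sd_original / sd_reduced)^2   # 6 / 0.8^2 = 9.375, about a 56% increase

Checking that against the power calculation, even a 50% increase in sample size only just brings us to roughly the same power as the 20% reduction in noise: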

if (!require("statpsych")) install.packages("statpsych")

# To get the same impact through sample-size, we need ~50% increase
statpsych::power.mean2(
  alpha = 0.05,
  n1 = 6 * 1.5,
  n2 = 6 * 1.5,
  var1 = 500^2,
  var2 = 500^2,
  es = 3500-2800
)
   Power
 0.79613

More Gold: Optimizing for Larger Effects

In addition to reducing noise, we can work on maximizing signal. We might extend our treatment (longer stress), increase the magnitude of the treatment (stronger stress), and/or focus on measures that are especially susceptible to the treatment. For example, we might find that only some layers of the hippocampus undergo significant neurogenesis. If we could restrict our labelling to these layers, we could avoid having our effect diluted by unaffected regions.

As with reducing noise, increasing signal gets us a lot more bang for the buck. Let’s continue the previous example (2-group design, 6 per group, reduction of neurogenesis by 700 neurons, within-group standard deviation of 500 neurons).

Again, here is our ‘standard scenario’, in which we learn we have an inadequate sample size:

if (!require("statpsych")) install.packages("statpsych")

# Estimate power for a 2-group design
statpsych::power.mean2(
  alpha = 0.05,
  n1 = 6,
  n2 = 6,
  var1 = 500^2,
  var2 = 500^2,
  es = 3500-2800
)
     Power
 0.5906058

And now let’s check our power if we can increase the effect size by just 20%:

if (!require("statpsych")) install.packages("statpsych")

# Estimate power for a 2-group design
statpsych::power.mean2(
  alpha = 0.05,
  n1 = 6,
  n2 = 6,
  var1 = 500^2,
  var2 = 500^2,
  es = (3500-2800) * 1.2   # increase effect size
)
     Power
 0.7463204

Wow! We get to nearly reasonable power without needing any more resources.
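Even better, the two strategies stack. Here’s a minimal sketch combining the 20% noise reduction from the previous section with the 20% larger effect - run it to see how far power climbs with still just 6 animals per group:

if (!require("statpsych")) install.packages("statpsych")

# Combine both optimizations: 20% smaller sd and 20% larger effect, same n
statpsych::power.mean2(
  alpha = 0.05,
  n1 = 6,
  n2 = 6,
  var1 = (500 * 0.8)^2,
  var2 = (500 * 0.8)^2,
  es = (3500 - 2800) * 1.2
)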

Certainly there are limits to what optimization can do, but working diligently to increase signal and decrease noise can help you get clearer answers with the same resources! That’s the type of thing that can make a huge difference over the course of your career.

Design Matters, Too

Our experimental design can also influence the efficiency of our experiment. In general, the simple two-group design is the least efficient; within-subjects designs typically offer much more bang for the buck.

Let’s take a look at the benefits, this time using a precision approach. Here we’ll use statpsych to simulate converting a between-subjects study to within-subjects. For each set of simulations we’ll focus on the typical confidence interval width.

First, the between-subjects scenario. We’ll again work with 6 animals per group. We’ll assume the groups have equal variation (sd.ratio = 1) and that both come from normal distributions (dist1 = 1; dist2 = 1; where 1 tells statpsych to simulate draws from a normal distribution). We’ll conduct 1000 studies and report the average 95% confidence-interval width in standard deviation units:

if (!require("statpsych")) install.packages("statpsych")

# Base scenario - 6 animals/group, between subjects, equal variance
statpsych::sim.ci.mean2(
  alpha = 0.05,
  n1 = 6,
  n2 = 6,
  sd.ratio = 1,
  dist1 = 1,
  dist2 = 1,
  rep = 1000
)
                             Coverage Lower Error Upper Error Ave CI Width
Equal Variances Assumed:        0.958        0.02       0.022     2.544827
Equal Variances Not Assumed:    0.960        0.02       0.020     2.602374

Wow! Our typical confidence interval will be ~2.5 standard deviations wide! That’s a lot of uncertainty, showing that our sample-size is appropriate only for assays in which we are justified in expecting truly massive effects.
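To put that in the raw units of our running example (within-group standard deviation of 500 neurons), a rough conversion:

# Average CI width converted from sd units to neurons
2.54 * 500   # ~1,270 neurons of uncertainty - nearly twice the expected 700-neuron difference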

What if we ran the same study as a within-subjects design? We’ll keep everything the same, but will also specify the correlation between pre/post measures. We’ll use 0.70, a plausible value for a measure with decent reliability.

While within-subjects designs may not be feasible for studies in which the measurement requires destruction of the sample, matched-control designs can offer some of the same benefits.
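Why does the correlation matter so much? Within-subjects analyses run on difference scores, and the standard deviation of a difference score shrinks as the correlation between the two measures rises. A quick sketch of that arithmetic, using the within-group sd from our running example:

# SD of a difference score: sd * sqrt(2 * (1 - r))
sd_within <- 500
r <- 0.7

sd_within * sqrt(2)             # ~707: difference between two independent animals
sd_within * sqrt(2 * (1 - r))   # ~387: paired difference when measures correlate at .7

With that in mind, here is the within-subjects simulation: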

if (!require("statpsych")) install.packages("statpsych")

# Switch to within-subjects, just n = 6, correlation of .7 between repeated measures
statpsych::sim.ci.mean.ps(
  alpha = 0.05,
  n = 6,
  sd.ratio = 1,
  cor = 0.7,
  dist1 = 1,
  dist2 = 1,
  rep = 1000
)
 Coverage Lower Error Upper Error Ave CI Width
    0.949       0.025       0.026     1.527108

Holy cow! We are now using half the animals (1 group of 6 rather than 2 groups of 6), yet our confidence interval has shrunk considerably, to about 1.5 standard deviations in length. That’s still quite long, and only acceptable for assays where we expect pretty large effects… but it’s a clear improvement despite using half the resources. And what if we kept our total of 12 animals, but ran them all in the within-subjects design?

if (!require("statpsych")) install.packages("statpsych")

# Switch to within-subjects, n = 12
statpsych::sim.ci.mean.ps(
  alpha = 0.05,
  n = 12,
  sd.ratio = 1,
  cor = 0.7,
  dist1 = 1,
  dist2 = 1,
  rep = 1000
)
 Coverage Lower Error Upper Error Ave CI Width
    0.952       0.025       0.023    0.9628865

Nice - we’ve got our precision down to ~1 standard deviation without increasing our sample size. Of course, we need to think critically about whether a separate control/untreated group is needed. But we might be able to demonstrate once that controls show no change over time, and then leverage that finding for repeated mechanistic studies run as within-subjects designs, getting a lot more out of each experiment without much more in resources!

For many studies, within-subjects measurement is simply not feasible. With neurogenesis, for example, counting new neurons requires sacrificing the animal, so additional measures on the same subject are not possible. In those cases, though, matched-control designs and/or rigorous selection of covariates can provide many of the same benefits. For example, we could conduct a matched-control design by pre-testing stress reactivity in all animals prior to the stress manipulation, forming matched pairs of similar reactivity, and then randomly assigning treatment within each pair. Depending on how strongly linked the scores are within matched pairs, we can obtain many of the benefits of a within-subjects design.
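As a rough illustration, here is a sketch of how the matched-pairs version of our running example could be summarized, assuming statpsych’s ci.mean.ps function (a confidence interval for a paired or matched mean difference computed from summary statistics) and assuming the matching induces a correlation of about .7 within pairs:

if (!require("statpsych")) install.packages("statpsych")

# Sketch: CI for the stress effect analyzed as matched pairs
# (assumes matching yields a within-pair correlation of ~.7)
statpsych::ci.mean.ps(
  alpha = 0.05,
  m1 = 3500,    # mean, control animals
  m2 = 2800,    # mean, stressed animals
  sd1 = 500,
  sd2 = 500,
  cor = 0.7,
  n = 6         # 6 matched pairs
)

The stronger the matching (the higher the within-pair correlation), the narrower the resulting interval; weak matching buys little over the independent-groups design.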

Further Resources

It’s surprisingly difficult to find good practical advice on optimization. Here are some papers I’ve found helpful:

References

“Here Are Some Ways of Making Your Study Replicable. (No, the First Steps Are Not Preregistration or Increasing the Sample Size!).” n.d. https://statmodeling.stat.columbia.edu/2023/06/22/here-are-some-ways-of-making-your-study-replicable-no-its-not-what-you-think/.
Ioannidis, John P. A., Sander Greenland, Mark A. Hlatky, Muin J. Khoury, Malcolm R. Macleod, David Moher, Kenneth F. Schulz, and Robert Tibshirani. 2014. “Increasing Value and Reducing Waste in Research Design, Conduct, and Analysis.” The Lancet 383 (9912): 166–75. https://doi.org/10.1016/S0140-6736(13)62227-8.
Kraemer, H C. 1991. “To Increase Power in Randomized Clinical Trials Without Increasing Sample Size.” Psychopharmacology Bulletin 27 (3): 217–24. http://www.ncbi.nlm.nih.gov/pubmed/1775591.
Lazic, Stanley E. 2018. “Four Simple Ways to Increase Power Without Increasing the Sample Size.” Laboratory Animals 52 (6): 621–29. https://doi.org/10.1177/0023677218767478.
MacKinnon, Sean. 2013. “Increasing Statistical Power in Psychological Research Without Increasing Sample Size.” http://osc.centerforopenscience.org/2013/11/03/Increasing-statistical-power/.
Meyvis, Tom, and Stijn M. J. Van Osselaer. 2018. “Increasing the Power of Your Study by Increasing the Effect Size.” Journal of Consumer Research 44 (5): 1157–73. https://doi.org/10.1093/jcr/ucx110.
Mirescu, Christian, Jennifer D. Peters, and Elizabeth Gould. 2004. “Early Life Experience Alters Response of Adult Neurogenesis to Stress.” Nature Neuroscience 7 (8): 841–46. https://doi.org/10.1038/nn1290.