The video above gives an overview of planning for precision. Now let’s dig in with an example.
In the planning for power section, we imagined conducting a study of the effect of stress on neurogenesis in the hippocampus. The plan is to raise mice under stressful or standard conditions and then, as adults, characterize neurogenesis by injecting BrdU and counting labelled cells 2 hours after injection (this example is inspired by [@mirescu2004]).
To plan for power, we need a lot of inputs: the type of test we'll use, its stringency, the desired power level, the typical level of neurogenesis in the control group, a quantitative prediction about the degree to which stress will change neurogenesis, and the expected standard deviation within each group. This long list of inputs can feel frustrating – that's because planning for power is for testing a hypothesis, and until you have a well-developed hypothesis it is simply not possible to make a good sample plan.
Planning for precision can help – the goal of planning for precision is to conduct a descriptive study, one where we simply want to provide a quality measurement. That is, we're not trying to confirm a specific theory about neurogenesis (stress halts neurogenesis); we're trying to accurately measure the impact of stress on neurogenesis. Planning for precision lets us weigh our expected sampling error against resources: how close to the truth can you afford to get?
Planning for precision requires fewer inputs than planning for power. You need:
The confidence level you want for your confidence interval (95% is typical, but thought and judgement are needed when selecting a confidence level),
The within-group variability you expect (variance or standard deviation); this can come from the prior literature, previous studies in your lab using this assay, and/or a pilot study.
The desired confidence interval width (how much uncertainty are you willing to live with?).
Setting the confidence level is pretty easy. Knowing the variability of scores to expect within a group can present some challenges, but it is usually possible to develop good guesses. Critically, standard deviations in the published literature are likely not very biased because they are not a major point of selection/filtering. So, perhaps you find previous BrdU experiments in the same strain of mice and find that a typical standard deviation is 500 neurons.
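For example, here is a minimal sketch in R of one way to turn several published standard deviations into a single planning value; the specific values are invented for illustration:

# Invented SDs from three hypothetical published BrdU studies
published_sds <- c(450, 520, 540)

# Average the variances, then convert back to an SD for planning
planning_sd <- sqrt(mean(published_sds^2))
planning_sd  # about 505; we round to 500 below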
The last input is often the most difficult: how much uncertainty can you tolerate? We need a number: are you ok with an answer within 100 neurons of the truth? 500 neurons? 1,000 neurons? The answer will be informed by what you can afford and also by your expectations of the magnitude of the effect. In political polling, pollsters want small margins of error in races expected to be close; they can tolerate large margins of error in non-competitive races. Similarly, if you have reason to believe that stress will have an enormous impact on neurogenesis, then you might be comfortable with a fairly long confidence interval. If you believe the effect will be subtle, you'd rather have a short confidence interval. Remember, the goal is to get a reasonable reading on the effect, to describe it well enough to inform your theories and hypotheses – you don't yet need to be so precise as to test a hypothesis, but you still want to generate useful knowledge.
If you have expectations about the control group, you might be able to think in terms of percentages. For example, if you know that typical neurogenesis is 3,500 new neurons, you could think about getting a confidence interval that spans 10% (350 neurons), 20% (700 neurons), etc.
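As a quick sketch of that arithmetic, assuming a control mean of 3,500 new neurons:

control_mean <- 3500
control_mean * c(0.10, 0.20)  # candidate CI widths of 350 and 700 neurons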
Most likely you will do some exploration and iteration – you might initially want a very short confidence interval, but then scale back to find a reasonable compromise between precision and cost.
For this example, let's check out confidence interval widths of 250 neurons (half a standard deviation long), 500 (one standard deviation long), and 1,000 (two standard deviations long), and we can then think about the costs and pick a precision level.
To get our sample-size needs we will again turn to statpsych in R. For this two-group design we will use the size.ci.mean2 function. This function, along with all other functions in statpsych, is documented here: https://dgbonett.github.io/statpsych/reference/index.html. The parameters for this function are:
alpha - instead of a confidence level, statpsych asks for an alpha level. So for a 95% confidence interval you input .05, for 99% the input is .01, etc.
var - statpsych asks for the within-group variability to be expressed as a variance, the squared standard deviation
w - the desired confidence-interval width
R - the n2/n1 ratio; typically we use 1 to indicate equal group sizes
Here's our code, starting with a desired CI width of 250 neurons (the entire CI is just half the standard deviation of 500 neurons).
if (!require("statpsych")) install.packages("statpsych")
Loading required package: statpsych
statpsych::size.ci.mean2(
  alpha = 1 - 0.95,  # alpha of .05 for 95% CI
  var = 500^2,       # sd is 500, so we enter 500^2 for variance
  w = 250,           # we'll start with a CI length of 250
  R = 1              # R = 1 indicates equal sample sizes
)
n1 n2
124 124
Whoa! Getting that much precision is expensive. Let's consider something a bit less ambitious:
if (!require("statpsych")) install.packages("statpsych")
statpsych::size.ci.mean2(
  alpha = 1 - 0.95,
  var = 500^2,
  w = 500,  # update CI length to 500
  R = 1
)
n1 n2
32 32
Much better! That's a lot fewer animals, though we will also be a lot less certain about the effects of stress. Let's see what happens if we're willing to be even more uncertain:
if (!require("statpsych")) install.packages("statpsych")
statpsych::size.ci.mean2(
  alpha = 1 - 0.95,
  var = 500^2,
  w = 1000,  # update CI length to 1000
  R = 1
)
n1 n2
9 9
That’s a somewhat large, but not atypical sample size for these types of studies. But it is also not especially precise: we should expect our typical confidence interval to be 1,000 neurons wide. Given that within-group variability is 500, that’s a pretty long interval, something only suitable if we believe stress will have a pretty enormous impact on neurogenesis.
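If you want to compare many candidate widths in one pass, a small sketch like this maps each width to its per-group sample size, reusing the inputs above (the [1, 1] indexing pulls n1 from the one-row n1/n2 output shown earlier):

widths <- c(250, 500, 1000)
sapply(widths, function(w) {
  statpsych::size.ci.mean2(alpha = .05, var = 500^2, w = w, R = 1)[1, 1]
})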
While planning for precision takes fewer inputs, it still often reveals that more resources are needed than have been typical in a field. Part of that is just the sad truth: many assays are regularly conducted with sample sizes that are indefensibly small, and we need to change that. But there are other ways to deal with this sticker shock; see the section on Decreasing Sample-Size Needs.
Important Considerations When Planning for Precision
What if your inputs are off? That is, what if variation is 20% higher than you expected? Or 50%? The best practice is to explore a variety of inputs and judiciously choose a sample size.
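For example, here is a minimal sketch of that kind of sensitivity check, re-running the w = 500 plan with the planning SD inflated by 20% and 50%:

sds <- c(500, 500 * 1.2, 500 * 1.5)  # planned SD, then 20% and 50% higher
sapply(sds, function(s) {
  statpsych::size.ci.mean2(alpha = .05, var = s^2, w = 500, R = 1)[1, 1]
})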
Does your sample-size plan match your research question? It sounds obvious, but you need to make sure you are planning for precision for the analysis that answers your research question. For example, suppose you describe the effects of both stress (high/low) and sex (male/female) on neurogenesis. You will probably want to describe how much stress affects neurogenesis in male mice. And you will probably want to describe how much stress affects neurogenesis in female mice. But most importantly, you will probably want to describe the difference between these simple effects, the extent to which stress effects differ between the sexes. In that case, you need to make sure your sample-size plan is for the confidence interval for the difference between the simple effects (the interaction), an effect size that has much higher sampling error than either simple effect; a sketch follows below.
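As a hedged sketch, statpsych's size.ci.lc.mean.bs (listed in the package reference linked above) plans sample size for a confidence interval on a between-subjects linear contrast; the coefficients c(1, -1, -1, 1) pick out the stress-by-sex interaction, the difference between the two simple effects of stress:

statpsych::size.ci.lc.mean.bs(
  alpha = .05,
  var = 500^2,          # planning value for the average within-group variance
  w = 500,              # desired CI width for the interaction contrast
  v = c(1, -1, -1, 1)   # contrast coefficients for the 2x2 interaction
)

With these inputs the per-group requirement comes out substantially larger than the 32 per group we found for the simple two-group comparison at the same width, which is exactly the point: interactions are estimated with much more sampling error.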
Will you estimate multiple parameters? When you estimate multiple parameters, you increase the overall risk of providing a confidence interval that doesn't capture the true parameter for the population. You thus need to make a sample-size plan that includes the increased needs that occur when you make multiple estimates. This can become complicated, and sometimes it can only be solved through simulation; a simple starting point is sketched below.
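One common starting point (a general Bonferroni-style adjustment, not anything specific to statpsych) is to split alpha across the estimates so the set of intervals has the desired overall coverage:

# Two simultaneous intervals with roughly 95% family-wise coverage:
# split alpha of .05 across the two estimates
statpsych::size.ci.mean2(alpha = .05 / 2, var = 500^2, w = 500, R = 1)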
Do you have non-independent/hierarchical data? Hierarchical data (e.g., multiple cells recorded from each animal in each condition) violates the independence assumed in most statistical tests. You will need to make sure you have a proper analysis strategy (e.g., a sufficient-summary-statistics approach) and a sample-size plan that matches; a sketch follows.
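Here is a minimal sketch of the sufficient-summary-statistics idea; the data frame and its column names are invented for illustration:

# Invented per-cell data: 6 mice, 20 cells per mouse, 3 mice per condition
set.seed(1)
cell_data <- data.frame(
  animal    = rep(paste0("m", 1:6), each = 20),
  condition = rep(c("stress", "control"), each = 60),
  count     = rpois(120, lambda = 30)
)

# Collapse to one summary value per animal; the animal, not the cell,
# is the independent unit that counts toward n in size.ci.mean2
per_animal <- aggregate(count ~ animal + condition, data = cell_data, mean)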