P-Hacking for Beginners
That famous p-value in studies should not have the final say in your conclusions.
Of all the subjects in biostatistics, the one I have the hardest time explaining to students is the concept of the p-value. At its core, the p-value is the probability of getting results at least as extreme as the ones you observed, assuming the null hypothesis is true. In English? Suppose there is actually no association between the things we are observing (the cause and the effect). The p-value tells us how likely we would be to see an association at least as strong as the one in our data purely by chance. It is often described as "the probability that what we're seeing is just chance," but strictly speaking it isn't: it is calculated by assuming chance is the only thing going on. (This is all a very simplistic way of putting it. Remember, this is not a statistics class.)
I told you it was hard to explain.
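If code helps more than words, here is a minimal sketch of that definition using a permutation test, which is one way (not the only way) to compute a p-value. The idea: if there is no association between group and outcome, the group labels are arbitrary, so we shuffle them many times and count how often the shuffled data show a difference at least as large as the real one. The function name `permutation_p_value`, its parameters, and the "treated" and "control" numbers are all made up for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Under the null hypothesis, group labels are interchangeable, so we
    shuffle them and count how often a shuffled difference in means is at
    least as extreme as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        # Small tolerance so floating-point rounding doesn't hide exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # The +1 terms count the observed labeling itself as one permutation.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical outcome scores, invented for this example.
    treated = [12, 15, 14, 16, 13, 17]
    control = [11, 12, 10, 13, 12, 11]
    print(permutation_p_value(treated, control))
```

With these invented numbers, enumerating all 924 ways to split the 12 values into two groups of 6 gives an exact answer of 14/924, about 0.015, and the shuffled estimate should land close to that. Feed it two identical groups and the p-value is 1: nothing about the data looks unusual under the null.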
Basically, you want a very low p-value when you perform a statistical analysis. The lower, the better: the lower it is, the harder it is to blame the association you see on chance alone. By convention, we use 0.05 as the cut-off (the significance level), because if there is truly no association, we're willing to accept at most a 5% probability of wrongly concluding that there is one.
Because 0.051 would be crazy.
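That 5% is a long-run rate, and a quick simulation makes it visible. The sketch below runs many imaginary studies in which both groups come from exactly the same distribution, so the null hypothesis is true by construction, and counts how often the p-value still dips under 0.05. To keep it to Python's standard library, it assumes normally distributed data with a known standard deviation of 1 and uses a simple z-test; the function names and settings are hypothetical choices for illustration, not a standard recipe.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known SD."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_studies=10_000, n_per_group=30, alpha=0.05, seed=1):
    """Fraction of simulated null studies that come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Both groups are drawn from the same distribution: no real association.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(false_positive_rate())
```

The printed fraction should hover around 0.05: roughly one in twenty studies of a nonexistent effect will cross the line anyway, which is exactly the risk the cut-off is meant to cap.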
Here’s the thing, though. It is possible to manipulate the design of your studies so that the p-value of the statistical analysis you perform lands under 0.05. That would make your results “statistically significant,” even though the strength of the association you’re seeing hasn’t changed at all. Let me give you an example.