Statistics
p-value
Statistical measure used in the frequentist approach: the probability of observing results at least as extreme as those obtained, assuming the null hypothesis is true. In other words, it quantifies how surprising the observation would be if there were no real difference between the variants tested.
CRO / A/B testing:
In an A/B test, the null hypothesis states that variation B has no effect compared to A.
➡️ A low p-value (e.g. 0.03) means the observed results would be unlikely if this hypothesis were true, which supports the idea that a real effect exists.
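As an illustration, here is a minimal sketch of how such a p-value can be computed for conversion rates, assuming a two-sided two-proportion z-test (one common choice, not the only one). The function name and the visitor and conversion counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: A and B share the same conversion rate."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the best estimate of the common rate if H0 were true.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Hypothetical traffic: 10,000 visitors per variant, 5.0% vs 5.7% conversion.
p = two_proportion_p_value(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
print(f"p-value = {p:.3f}")
```

With these made-up numbers the p-value falls below 0.05. With identical conversion counts in both variants it equals 1, meaning there is nothing surprising under the null hypothesis.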
To remember:
- A p-value of less than 0.05 (classic threshold) is often considered statistically significant → the null hypothesis is rejected.
- A p-value greater than 0.05 does not mean that there is no effect, only that there is insufficient evidence to conclude that one exists.
- The p-value measures neither the probability that the variation is better nor the magnitude of the effect. For that, look at the uplift, the confidence intervals or the estimated business impact.
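To make the last point concrete, the sketch below reports the relative uplift and a confidence interval on the absolute difference in conversion rate. It assumes a simple Wald (normal-approximation) interval, and the function name and figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def uplift_with_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (relative uplift, CI low, CI high); the CI is on the absolute difference B - A."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a
    # Unpooled standard error (Wald interval for a difference of proportions).
    std_err = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    relative_uplift = diff / rate_a
    return relative_uplift, diff - z_crit * std_err, diff + z_crit * std_err


# Hypothetical traffic: 10,000 visitors per variant, 5.0% vs 5.7% conversion.
uplift, ci_low, ci_high = uplift_with_ci(500, 10_000, 570, 10_000)
print(f"Relative uplift: {uplift:.1%}")
print(f"95% CI on the absolute difference: [{ci_low:.4f}, {ci_high:.4f}]")
```

The width of the interval shows how uncertain the size of the effect remains, which the p-value alone does not reveal.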
Common pitfalls in CRO:
- Interpreting a p-value in isolation, regardless of data volume, minimum detectable effect (MDE) or business context.
- P-hacking: stopping a test as soon as the p-value dips below 0.05 (e.g. at 0.049) inflates the risk of false positives (misleading results).
- Forgetting that the p-value says nothing about practical significance: a small effect can be statistically significant but useless.
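The early-stopping pitfall can be checked with a small simulation. The sketch below runs A/A tests, where no real effect exists, and compares two rules: declaring a winner as soon as any interim look shows p < 0.05, versus looking only once at the planned end. All parameters (number of tests, number of looks, traffic, conversion rate) are hypothetical, and the p-value again assumes a two-proportion z-test.

```python
import random
from math import erfc, sqrt


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test p-value (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation observed, so no evidence against H0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return erfc(abs(z) / sqrt(2))


def simulate_aa_tests(n_tests=400, looks=10, visitors_per_look=200,
                      rate=0.10, alpha=0.05, seed=42):
    """Run A/A tests (no real effect) and compare two stopping rules."""
    rng = random.Random(seed)
    stopped_early = 0   # significant at ANY look (peeking)
    fixed_horizon = 0   # significant at the planned final look only
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        any_significant = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            p = p_value(conv_a, n, conv_b, n)
            any_significant = any_significant or p < alpha
        stopped_early += any_significant
        fixed_horizon += p < alpha  # p now holds the final look's value
    return stopped_early / n_tests, fixed_horizon / n_tests


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"False positives with peeking:       {peeking_rate:.1%}")
print(f"False positives with fixed horizon: {fixed_rate:.1%}")
```

Because both variants are identical, every "significant" result here is a false positive. The peeking rule can only match or exceed the fixed-horizon rate, and in practice it tends to land well above the nominal 5%.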