5 A/B testing mistakes that quietly ruin your results
Peeking, sample-ratio mismatch, underpowered tests, ignored guardrails, and multiple comparisons — the common A/B testing mistakes that lead to confident but wrong decisions.
A/B testing looks simple — show two versions, measure which wins — but the statistics are easy to get wrong in ways that feel right. Here are five mistakes that lead teams to ship changes confidently in the wrong direction.
1. Peeking and stopping early
Watching a test daily and calling it the moment p < 0.05 appears pushes the real false-positive rate far above the nominal 5%, because every extra look is another chance for random noise to cross the threshold. A fixed-horizon test is only valid at its pre-computed sample size. If you need to monitor continuously, use a sequential or always-valid testing method built for it.
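To see the effect, here is a small A/A simulation in Python. Both arms draw from the same distribution, so every "significant" result is a false positive. The setup (10 looks, 100 users per arm per look, known unit variance, a two-sided z-test) is an illustrative assumption, not a model of any particular platform. The fixed-horizon rule checks only the final look, while the peeking rule counts an experiment as a "win" if any look crosses the threshold.

```python
import math
import random


def simulate_peeking(n_experiments=5000, looks=10, batch=100, z_crit=1.96, seed=0):
    """A/A simulation: both arms come from N(0, 1), so every 'win' is a false positive.

    Returns (fixed_horizon_rate, peeking_rate).
    """
    rng = random.Random(seed)
    batch_sd = math.sqrt(batch)  # the sum of `batch` iid N(0, 1) draws is N(0, batch)
    fixed_hits = 0
    peek_hits = 0
    for _ in range(n_experiments):
        sum_a = sum_b = 0.0
        n = 0
        ever_significant = False
        significant = False
        for _look in range(looks):
            sum_a += rng.gauss(0.0, batch_sd)
            sum_b += rng.gauss(0.0, batch_sd)
            n += batch
            # Two-sample z statistic; each arm has known variance 1.
            z = (sum_a - sum_b) / n / math.sqrt(2.0 / n)
            significant = abs(z) > z_crit
            ever_significant = ever_significant or significant
        fixed_hits += significant      # analyst reads the result only at the planned size
        peek_hits += ever_significant  # analyst stops at the first look with p < 0.05
    return fixed_hits / n_experiments, peek_hits / n_experiments


fixed_rate, peeking_rate = simulate_peeking()
print(f"fixed horizon: {fixed_rate:.3f}, stop at first p < 0.05: {peeking_rate:.3f}")
```

The fixed-horizon rate should land near 5%, while the peeking rate comes out several times higher, even though nothing changed between the arms.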
2. Underpowered tests
Running a test without a power analysis often means it can't detect the effect you care about. Decide the minimum effect worth detecting (the minimum detectable effect, or MDE), then compute the sample size and runtime before you start. A "non-significant" result from an underpowered test tells you almost nothing: it can't distinguish "no effect" from "an effect too small for this sample to reveal."
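A minimal sketch of that calculation for a conversion-rate metric, assuming a two-sided two-proportion z-test with the usual normal-approximation formula. The function names and the daily-traffic figure are hypothetical, chosen for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Users needed per arm to detect an absolute lift of `mde_abs` on a
    conversion rate with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def runtime_days(n_per_arm, daily_users, arms=2):
    """Days needed if `daily_users` new users enter the test each day (hypothetical traffic)."""
    return math.ceil(n_per_arm * arms / daily_users)


n = sample_size_per_arm(baseline_rate=0.10, mde_abs=0.01)
print(n, "users per arm;", runtime_days(n, daily_users=2000), "days")
```

Detecting a lift from 10% to 11% needs just under 15,000 users per arm here. Because the required sample scales with 1 / MDE², halving the effect you want to detect roughly quadruples the sample size.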
3. Ignoring sample-ratio mismatch (SRM)
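If you configured a 50/50 split but observe 52/48 across tens of thousands of users, something is broken: assignment, logging, or a redirect is dropping users non-randomly. SRM invalidates the experiment's results. Always run an SRM check, typically a chi-square goodness-of-fit test on the assignment counts, before trusting a readout.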
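A sketch of a two-arm SRM check in plain Python, assuming a chi-square goodness-of-fit test with one degree of freedom. The function name is hypothetical, and the alert threshold is a team choice; a strict cutoff such as p < 0.001 is a common convention rather than a rule.

```python
import math


def srm_p_value(count_a, count_b, expected_ratio_a=0.5):
    """Chi-square goodness-of-fit p-value for a two-arm split."""
    total = count_a + count_b
    exp_a = total * expected_ratio_a
    exp_b = total - exp_a
    chi2 = (count_a - exp_a) ** 2 / exp_a + (count_b - exp_b) ** 2 / exp_b
    # Survival function of chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(5200, 4800))  # 52/48 on 10,000 users
print(srm_p_value(52, 48))      # 52/48 on 100 users
```

The same 52/48 ratio is alarming on 10,000 users (p is around 6e-5) but unremarkable on 100 users (p is around 0.69), which is why the check should be a test on counts rather than an eyeball on percentages.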
4. No guardrail metrics
A variant can lift your primary metric while quietly hurting latency, raising churn, or cutting revenue. Define guardrail metrics up front so a "win" that breaks something else gets caught before rollout.
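One way to encode that rule is sketched below. It assumes you already have, for each metric, the treatment-minus-control delta and its standard error, and it blocks a launch when any guardrail is significantly worse at roughly 95% confidence. `MetricResult`, `ship_decision`, and the metric names are hypothetical; some teams instead use a non-inferiority margin per guardrail.

```python
from dataclasses import dataclass

Z_95 = 1.96  # two-sided 95% normal critical value


@dataclass
class MetricResult:
    name: str
    delta: float                  # treatment minus control
    std_error: float              # standard error of delta
    higher_is_better: bool = True


def significantly_worse(m: MetricResult) -> bool:
    harm = -m.delta if m.higher_is_better else m.delta
    return harm - Z_95 * m.std_error > 0


def significantly_better(m: MetricResult) -> bool:
    gain = m.delta if m.higher_is_better else -m.delta
    return gain - Z_95 * m.std_error > 0


def ship_decision(primary: MetricResult, guardrails: list[MetricResult]):
    """Guardrails are checked first, so a primary-metric win cannot override them."""
    broken = [g.name for g in guardrails if significantly_worse(g)]
    if broken:
        return "block", broken
    if significantly_better(primary):
        return "ship", []
    return "no decision", []


primary = MetricResult("conversion_rate", delta=0.004, std_error=0.0015)
latency = MetricResult("p95_latency_ms", delta=40.0, std_error=10.0, higher_is_better=False)
print(ship_decision(primary, [latency]))
```

Here the primary metric is a clear win, but the latency guardrail blocks the launch.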
5. Multiple comparisons
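Test twenty metrics or twenty variants at α = 0.05 and you should expect about one to look significant by chance; across 20 independent tests, the chance of at least one false positive is about 64%. Pre-register your primary metric and correct for multiple comparisons (for example, with Bonferroni or Holm) rather than fishing for any green number.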
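Below is a sketch of the familywise-error arithmetic and of the Holm-Bonferroni step-down correction, one standard way to control the chance of any false positive. The p-values are made up for illustration. Holm sorts p-values from smallest to largest and compares the k-th smallest (counting from 0) against α / (m − k), stopping at the first failure.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return True for each hypothesis rejected after Holm's step-down correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return rejected


# Chance of at least one false positive across 20 independent tests at alpha = 0.05.
familywise_error = 1 - (1 - 0.05) ** 20
print(f"{familywise_error:.2f}")  # 0.64

p_values = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-metric p-values
print(holm_bonferroni(p_values))      # [True, False, False, True]
```

Uncorrected, all four metrics would count as "wins" at p < 0.05; after the correction only two survive.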
The fix is process, not just math
Most of these are solved by deciding the rules before the experiment runs: one primary metric, a pre-computed sample size, an SRM check, and guardrails. Variance-reduction techniques like CUPED, which use pre-experiment data to reduce metric variance, can then make trustworthy tests faster, which means more real learning per quarter.
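A minimal sketch of the CUPED adjustment, assuming each user has a pre-experiment value X of the same metric. It computes θ = cov(X, Y) / var(X) and replaces each outcome Y with Y − θ(X − mean(X)). The data below is synthetic and the function name is hypothetical. In a real test you would typically estimate θ on pooled data from both arms, then compare arm means of the adjusted values.

```python
import random
from statistics import fmean, pvariance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x))."""
    mx, my = fmean(x), fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
    theta = cov / pvariance(x, mx)
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(42)
x = [rng.gauss(10, 3) for _ in range(5000)]   # synthetic pre-experiment metric
y = [0.8 * xi + rng.gauss(0, 1) for xi in x]  # synthetic in-experiment metric, correlated with x
adjusted = cuped_adjust(y, x)

print(f"variance before: {pvariance(y):.2f}, after: {pvariance(adjusted):.2f}")
```

The adjustment leaves the mean unchanged but removes the variance explained by the pre-period metric (a fraction equal to the squared correlation between X and Y), so the same precision is reached with fewer users.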