A/B Test Sample Size Calculator
This A/B Test Calculator helps you determine the exact sample size needed for statistically significant experiments. Drag sliders to simulate different test parameters, see how your baseline compares to industry benchmarks, and share your test plan with your team — all in real time.
Detecting a 10% relative lift means going from 3.00% to 3.30% (absolute change: 0.300 pp).
Example output: 53.2K visitors needed per variation (106.4K total across both variations), about 22 days to complete at 5.0K visitors/day.
Statistical Power Curve
Recommended Actions
Performing well: a test duration of 22 days is within a healthy range.
Run for at least 7 days regardless of sample size to capture weekly patterns.
Track conversion rate trends during the test to catch any anomalies.
Ensure your traffic split is truly 50/50 for accurate results.
Risk Radar
What happens to your days to complete if each variable drops by 15%?
⚠️ Confidence is your most sensitive variable. A 15% decrease would change days to complete by 8.81 days.
A/B Test Sample Size: The Complete Guide
Running an A/B test without calculating sample size is one of the most common — and costly — mistakes in conversion rate optimization. End a test too early and you risk acting on noise instead of signal. Run it too long and you waste weeks that could have been spent shipping improvements. This calculator gives you the exact number of visitors you need before you launch your experiment, so you can plan with confidence instead of guessing.
How A/B Test Sample Size Is Calculated
The sample size formula is rooted in frequentist hypothesis testing. You start with two conversion rates: your current baseline (the control) and the expected rate after the change (the variant). The difference between these two rates is driven by your Minimum Detectable Effect (MDE) — the smallest relative improvement you want the test to be able to detect.
Two Z-scores determine the required sample size. The first, Z-alpha, comes from your confidence level. At 95% confidence (the industry standard), Z-alpha is 1.96, meaning you accept a 5% false-positive risk. The second, Z-beta, comes from your desired statistical power. At 80% power, Z-beta is 0.8416, meaning you accept a 20% chance of missing a real effect. The formula combines these Z-scores with the variance of both conversion rates to determine the minimum observations per variation.
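The calculation described above can be sketched in a few lines using the standard two-proportion formula; this is a minimal sketch, and the calculator itself may round slightly differently:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(baseline, mde_relative, confidence=0.95, power=0.80):
    """Minimum visitors per variation for a two-sided, two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)   # expected variant rate
    delta = p2 - p1                      # absolute effect to detect
    p_bar = (p1 + p2) / 2                # pooled rate under the null hypothesis
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                      # 0.8416 at 80%
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    return ceil(n)

# 3.00% baseline, 10% relative MDE, defaults of 95% confidence / 80% power
print(sample_size_per_variation(0.03, 0.10))  # → 53211, matching the ~53.2K shown above
```

Note that both variance terms appear: the pooled variance under the null and the separate variances under the alternative, each scaled by its own Z-score.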
Choosing the Right Minimum Detectable Effect
The MDE is the single most influential parameter on your sample size. A 5% relative MDE on a 3% baseline conversion rate means you want to detect a lift from 3.0% to 3.15% — an absolute change of just 0.15 percentage points. Detecting such a small shift requires a massive sample. A 20% MDE (3.0% to 3.6%) is much easier to detect and requires roughly 16 times fewer visitors.
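The inverse-square relationship is easy to see with a rough pooled-variance approximation (Z-values fixed here at the 95% confidence / 80% power standard; exact figures from a full calculator will differ slightly):

```python
# Rough per-variation sample size: n ≈ (z_alpha + z_beta)^2 * 2*p_bar*(1-p_bar) / delta^2
# with z_alpha = 1.96 (95% confidence) and z_beta = 0.8416 (80% power).
z_sq = (1.96 + 0.8416) ** 2
baseline = 0.03
for mde in (0.05, 0.10, 0.20):
    delta = baseline * mde                 # absolute effect
    p_bar = baseline * (1 + mde / 2)       # midpoint of control and variant rates
    n = z_sq * 2 * p_bar * (1 - p_bar) / delta ** 2
    print(f"MDE {mde:.0%}: ~{n:,.0f} visitors per variation")
```

Doubling the MDE quarters the sample; quadrupling it (5% to 20%) cuts it by roughly 15x, slightly less than the pure 16x because the variance term shifts with the variant rate.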
The key question is: what is the smallest improvement worth acting on? If a 5% lift would generate $50,000 in annual revenue, it is worth the patience. If it would generate $500, use a larger MDE and run shorter tests. This calculator shows you the exact trade-off so you can make an informed decision.
Confidence Level vs. Statistical Power
Confidence level controls your false-positive rate (Type I error) — the probability of declaring a winner when there is no real difference. Statistical power controls your false-negative rate (Type II error) — the probability of missing a real winner. These two parameters work in tension: increasing either one raises the required sample size.
The standard combination of 95% confidence and 80% power is appropriate for most business decisions. For high-stakes tests — pricing changes, checkout redesigns, or anything with significant revenue impact — consider 99% confidence and 90% power, which nearly doubles the required sample. For low-risk exploratory tests, 90% confidence and 80% power cuts the required sample by roughly a fifth relative to the standard.
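Because the required sample scales approximately with the square of the summed Z-scores, you can compare settings without running the full formula; a quick sketch:

```python
from statistics import NormalDist

def z_squared(confidence, power):
    """(z_alpha + z_beta)^2: sample size scales roughly linearly with this."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - (1 - confidence) / 2)  # two-sided test
    z_beta = nd.inv_cdf(power)
    return (z_alpha + z_beta) ** 2

standard = z_squared(0.95, 0.80)
for conf, pw in [(0.90, 0.80), (0.95, 0.80), (0.95, 0.90), (0.99, 0.90)]:
    multiple = z_squared(conf, pw) / standard
    print(f"{conf:.0%} confidence / {pw:.0%} power: {multiple:.2f}x the standard sample")
```

The printed multiples show 90%/80% at about 0.79x the standard sample and 99%/90% at about 1.90x, which is where the "roughly a fifth" and "nearly doubles" figures come from.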
How to Use This Calculator
Start by entering your current conversion rate as the baseline. Set the MDE to the smallest relative improvement you care about — 10% is a common starting point. Adjust confidence and power if needed, then enter your daily traffic to see how many days the test will take. Use the power curve chart to understand how sample size affects your ability to detect real effects.
If the test duration is too long, use the interactive sliders to explore trade-offs. The Risk Radar shows which variable has the biggest impact on test duration, helping you focus your attention. Click "Days to Complete" to enter reverse-goal mode, where you set a target duration and the calculator tells you the daily traffic needed to meet it.
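Reverse-goal mode is simple arithmetic once the total sample is fixed. A sketch using the example figures from earlier on this page (the 53,211 visitors per variation is an assumed input here, not a value the snippet derives):

```python
from math import ceil

total_sample = 2 * 53211   # both variations, from the worked example above
for target_days in (14, 22, 30):
    daily_traffic = ceil(total_sample / target_days)
    print(f"finish in {target_days} days: need {daily_traffic:,} visitors/day")
```

Halving the duration requires doubling the daily traffic, which is why short deadlines usually force either more traffic or a larger MDE.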
Common Pitfalls to Avoid
Never stop a test early because p-values look promising. Statistical significance fluctuates during a test, and early stopping inflates false-positive rates dramatically. Always run for the full calculated duration. Also ensure your test runs for at least one full week to capture day-of-week effects — even if your sample size is reached in three days.
Watch out for the "peeking problem." Every time you check results mid-test and consider stopping, you effectively run a new hypothesis test. Some testing platforms offer sequential testing methods that account for peeking, but the standard fixed-horizon approach used here assumes you check results only once — at the end.
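The inflation from peeking can be demonstrated with a small A/A simulation: both arms share the same true 3% rate, so every "significant" result is a false positive. The traffic and checkpoint counts below are illustrative assumptions, not the calculator's settings:

```python
import random

def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic with a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)
    se = (p * (1 - p) * (1 / n_a + 1 / n_b)) ** 0.5
    return (p_b - p_a) / se if se else 0.0

random.seed(7)
Z_CRIT, RATE = 1.96, 0.03          # 95% confidence; identical true rates (A/A)
visitors, checks, trials = 5000, 10, 300
peek_fp = final_fp = 0
for _ in range(trials):
    a = [random.random() < RATE for _ in range(visitors)]
    b = [random.random() < RATE for _ in range(visitors)]
    # Peeking: test at 10 evenly spaced checkpoints, stop at first "winner".
    peeked = any(
        abs(z_stat(sum(a[:n]), n, sum(b[:n]), n)) > Z_CRIT
        for n in range(visitors // checks, visitors + 1, visitors // checks)
    )
    peek_fp += peeked
    # Fixed horizon: test exactly once, at the end.
    final_fp += abs(z_stat(sum(a), visitors, sum(b), visitors)) > Z_CRIT
print(f"false-positive rate with 10 peeks: {peek_fp / trials:.1%}")
print(f"false-positive rate checking once: {final_fp / trials:.1%}")
```

With repeated checks the false-positive rate climbs well above the nominal 5%, typically landing near 20% for ten peeks, while the single end-of-test check stays close to 5%.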
Help us make this tool better
We built Scenarical to help marketers make smarter decisions. If something feels off, we'd love to hear about it.
Related Tools
Landing Page Conversion Estimator
Estimate landing page conversion rates and revenue. Simulate changes to traffic, conversion rate, and average order value.
ROAS Calculator
Calculate Return on Ad Spend across multiple scenarios. Compare channels, adjust budgets, and see your profitability in real time.
Email ROI Calculator
Calculate the ROI of your email marketing campaigns. Factor in list size, open rates, click rates, and revenue per subscriber.