A/B Test Sample Size Calculator

This A/B Test Calculator helps you determine the exact sample size needed for statistically significant experiments. Drag sliders to simulate different test parameters, see how your baseline compares to industry benchmarks, and share your test plan with your team — all in real time.


Detecting a 10% relative lift means going from 3.00% to 3.30% (absolute change: 0.300 pp).

Per Variation: 53.2K visitors needed

Total Sample: 106.4K (both variations combined)

Days to Complete: 22 (at 5.0K visitors/day)

Statistical Power Curve

Your Baseline vs. Landing Page Average: your 3% conversion rate is below the landing page average of 5.89%.

Recommended Actions

Performing Well

Test duration of 22 days is within a healthy range.

Run for at least 7 days regardless of sample size to capture weekly patterns.

📈 Track conversion rate trends during the test to catch any anomalies.

🎯 Ensure your traffic split is truly 50/50 for accurate results.

🛡️ Risk Radar

What happens to your days to complete if each variable drops by 15%?

⚠️ Confidence is your most sensitive variable. A 15% decrease would change days to complete by 8.81 days.

A/B Test Sample Size: The Complete Guide

Running an A/B test without calculating sample size is one of the most common — and costly — mistakes in conversion rate optimization. End a test too early and you risk acting on noise instead of signal. Run it too long and you waste weeks that could have been spent shipping improvements. This calculator gives you the exact number of visitors you need before you launch your experiment, so you can plan with confidence instead of guessing.

How A/B Test Sample Size Is Calculated

The sample size formula is rooted in frequentist hypothesis testing. You start with two conversion rates: your current baseline (the control) and the expected rate after the change (the variant). The difference between these two rates is driven by your Minimum Detectable Effect (MDE) — the smallest relative improvement you want the test to be able to detect.

Two Z-scores determine the required sample size. The first, Z-alpha, comes from your confidence level. At 95% confidence (the industry standard), Z-alpha is 1.96, meaning you accept a 5% false-positive risk. The second, Z-beta, comes from your desired statistical power. At 80% power, Z-beta is 0.8416, meaning you accept a 20% chance of missing a real effect. The formula combines these Z-scores with the variance of both conversion rates to determine the minimum observations per variation.
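The calculation described above can be sketched in Python. This is a minimal implementation of the standard unpooled two-proportion formula, not the calculator's exact source code; with the defaults shown on this page (3% baseline, 10% relative MDE, 95% confidence, 80% power) it reproduces the ~53.2K per-variation figure:

```python
from math import ceil
from statistics import NormalDist  # Python 3.8+

def sample_size_per_variation(baseline, mde_rel, confidence=0.95, power=0.80):
    """Minimum visitors per variation for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)  # variant rate implied by the relative MDE
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                      # 0.8416 at 80%
    variance = p1 * (1 - p1) + p2 * (1 - p2)  # unpooled variance of both rates
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = sample_size_per_variation(0.03, 0.10)
print(n)  # 53208 — the ~53.2K per-variation figure shown above
```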

Choosing the Right Minimum Detectable Effect

The MDE is the single most influential parameter on your sample size. A 5% relative MDE on a 3% baseline conversion rate means you want to detect a lift from 3.0% to 3.15% — an absolute change of just 0.15 percentage points. Detecting such a small shift requires a massive sample. A 20% MDE (3.0% to 3.6%) is much easier to detect and requires roughly 16 times fewer visitors.
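To see the inverse-square relationship concretely, here is a small sweep over MDE values using the same unpooled formula (a sketch, not the calculator's exact code). Note that the 5%-vs-20% ratio comes out near 15 rather than exactly 16, because the variance term also shifts slightly as the variant rate changes:

```python
from math import ceil
from statistics import NormalDist

# 95% confidence, 80% power
z = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)

def n_per_variation(baseline, mde_rel):
    p2 = baseline * (1 + mde_rel)
    variance = baseline * (1 - baseline) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p2 - baseline) ** 2)

for mde in (0.05, 0.10, 0.20):
    print(f"MDE {mde:>4.0%}: {n_per_variation(0.03, mde):>8,} per variation")
```

Halving the MDE roughly quadruples the sample; quartering it multiplies the sample by roughly sixteen.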

The key question is: what is the smallest improvement worth acting on? If a 5% lift would generate $50,000 in annual revenue, it is worth the patience. If it would generate $500, use a larger MDE and run shorter tests. This calculator shows you the exact trade-off so you can make an informed decision.

Confidence Level vs. Statistical Power

Confidence level controls your false-positive rate (Type I error) — the probability of declaring a winner when there is no real difference. Statistical power controls your false-negative rate (Type II error) — the probability of missing a real winner. These two parameters work in tension: increasing either one raises the required sample size.

The standard combination of 95% confidence and 80% power is appropriate for most business decisions. For high-stakes tests — pricing changes, checkout redesigns, or anything with significant revenue impact — consider 99% confidence and 90% power. For low-risk exploratory tests, 90% confidence and 80% power trims the required sample by roughly 20% versus the 95/80 standard, and needs less than half the sample of the 99/90 high-stakes setting.
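One way to see the trade-off is to compute the per-variation sample at each setting. A sketch using the same unpooled two-proportion formula, with this page's 3% baseline and 10% MDE assumed:

```python
from math import ceil
from statistics import NormalDist

def n_per_variation(baseline, mde_rel, confidence, power):
    p2 = baseline * (1 + mde_rel)
    z = (NormalDist().inv_cdf(1 - (1 - confidence) / 2)
         + NormalDist().inv_cdf(power))
    variance = baseline * (1 - baseline) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p2 - baseline) ** 2)

for conf, power, label in [(0.90, 0.80, "exploratory"),
                           (0.95, 0.80, "standard"),
                           (0.99, 0.90, "high-stakes")]:
    n = n_per_variation(0.03, 0.10, conf, power)
    print(f"{label:>11} ({conf:.0%} conf / {power:.0%} power): {n:>8,}")
```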

How to Use This Calculator

Start by entering your current conversion rate as the baseline. Set the MDE to the smallest relative improvement you care about — 10% is a common starting point. Adjust confidence and power if needed, then enter your daily traffic to see how many days the test will take. Use the power curve chart to understand how sample size affects your ability to detect real effects.
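The duration arithmetic is simple: total sample (both variations) divided by daily traffic, rounded up. A sketch with the example values from this page:

```python
from math import ceil

n_per_variation = 53_208  # from the sample size formula (3% baseline, 10% MDE)
daily_traffic = 5_000     # total visitors per day entering the test

total_sample = 2 * n_per_variation
days = ceil(total_sample / daily_traffic)
print(days)  # 22, matching the estimate above
```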

If the test duration is too long, use the interactive sliders to explore trade-offs. The Risk Radar shows which variable has the biggest impact on test duration, helping you focus your attention. Click "Days to Complete" to enter reverse-goal mode, where you set a target duration and the calculator tells you the daily traffic needed to meet it.
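Reverse-goal mode simply inverts that division: pick a target duration and solve for the daily traffic needed. A sketch, using a hypothetical 14-day target:

```python
from math import ceil

total_sample = 106_416  # both variations, from the sample size formula
target_days = 14        # hypothetical target duration

daily_needed = ceil(total_sample / target_days)
print(daily_needed)  # 7602 visitors per day to finish in 14 days
```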

Common Pitfalls to Avoid

Never stop a test early because p-values look promising. Statistical significance fluctuates during a test, and early stopping inflates false-positive rates dramatically. Always run for the full calculated duration. Also ensure your test runs for at least one full week to capture day-of-week effects — even if your sample size is reached in three days.

Watch out for the "peeking problem." Every time you check results mid-test and consider stopping, you effectively run a new hypothesis test. Some testing platforms offer sequential testing methods that account for peeking, but the standard fixed-horizon approach used here assumes you check results only once — at the end.
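The inflation from peeking is easy to demonstrate by simulation. This sketch (assuming NumPy is available) runs A/A tests, where no real difference exists, and compares the false-positive rate of a single end-of-test check against checking at five interim looks:

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.05              # identical conversion rate in both arms (an A/A test)
n = 5_000             # visitors per arm at the planned horizon
looks = [1_000, 2_000, 3_000, 4_000, 5_000]  # five interim checks
z_crit = 1.96         # two-sided 95% significance threshold
sims = 2_000

# Simulate visitors and track cumulative conversion counts per arm.
conv_a = np.cumsum(rng.random((sims, n)) < p, axis=1)
conv_b = np.cumsum(rng.random((sims, n)) < p, axis=1)

def significant_at(k):
    """Two-proportion z-test on the first k visitors of each arm."""
    pa, pb = conv_a[:, k - 1] / k, conv_b[:, k - 1] / k
    pooled = (conv_a[:, k - 1] + conv_b[:, k - 1]) / (2 * k)
    se = np.sqrt(2 * pooled * (1 - pooled) / k)
    return np.abs(pa - pb) / se > z_crit

single_look = significant_at(n)  # check once, at the end
peeking = np.any([significant_at(k) for k in looks], axis=0)  # stop at any look

print(f"false-positive rate, single look: {single_look.mean():.3f}")
print(f"false-positive rate, 5 peeks:     {peeking.mean():.3f}")
```

The single-look rate lands near the nominal 5%, while peeking at five looks roughly doubles or triples it, even though nothing real is being detected.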


Help us make this tool better

We built Scenarical to help marketers make smarter decisions. If something feels off, we'd love to hear about it.