Problems in traditional A/B testing

First-generation testing tools like Optimizely and VWO are good with buttons and other micro-optimizations. But they are not designed for the entire conversion optimization task.

4,400+ visitors and 20 days were needed to show that the two versions were actually the same 🤷

No traction

Traditional A/B testing starts with an off-the-cuff hypothesis: “By using a bigger call-to-action button here, we should get more leads.” Here's why this approach fails:

  1. Not fixing bottlenecks. The style or the copy of your buttons is hardly your biggest concern. While you run your experiment, your site is probably leaking tons of people who could be captured with a simple fix. You should really be fixing your bottlenecks: that's where you make the biggest wins with the least effort.
  2. Not shooting for maximum traction. “More leads in this time period” is hardly your best goal. What you really want is to collect maximum traction during the entire lifetime of your visitors: the maximum number of prospects, suspects, leads, and customers, plus maximum virality.

You make the biggest wins by starting from your bottlenecks and by striving for maximum visitor lifetime value.

Many businesses would be better off if they didn’t run A/B tests at all.

David Kadavy

Author of “Design for Hackers”

Only one way to optimize

Traditional A/B testing is relatively trivial: you are encouraged to make small changes on a single page and see how they impact one metric over the current time range. However, there is so much more you can do for your growth that is not possible with this method.

  1. Content marketing: optimizing for a better audience that wants your product more than the other segments do.
  2. Website optimization: optimizing for a more engaging experience with global design changes, new product launches, a brand refresh, or a new information architecture.

It's sloooooooow

In traditional A/B testing, the results take too long to develop.

It's not unusual to wait for several months before you can trust the results. A rule of thumb is to collect 250-400 conversions per variation before the results are statistically significant.

For example, let's assume you have a reasonably popular site with 10,000 monthly visitors, and your lead conversion rate is 3%. That generates a charming 10 leads every day (1). Then we start the above A/B test with a more prominent call-to-action button: A is the original button, B is the bigger one.

With 300 conversions per variant, you must wait for 60 days before you can be sure which button was better during the experiment. (2)

Learn why traditional A/B testing is so slow

(1) 10,000 / 30 × 0.03 = 10

(2) 2 × 300 / 10 = 60
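The arithmetic in footnotes (1) and (2) can be sketched as a small Python script. This is only a rough estimate under the same assumptions as the example: a 30-day month, steady traffic, and every lead counting toward the test. The function names `daily_conversions` and `days_to_finish` are hypothetical and chosen just for this illustration.

```python
def daily_conversions(monthly_visitors, conversion_rate, days_per_month=30):
    """Average conversions per day, as in footnote (1)."""
    return monthly_visitors / days_per_month * conversion_rate


def days_to_finish(monthly_visitors, conversion_rate,
                   conversions_per_variant, variants=2):
    """Days until all variants together reach their target, as in footnote (2)."""
    total_needed = variants * conversions_per_variant
    return total_needed / daily_conversions(monthly_visitors, conversion_rate)


if __name__ == "__main__":
    # The example site: 10,000 monthly visitors and a 3% lead conversion rate.
    per_day = daily_conversions(10_000, 0.03)
    print(f"Leads per day: {per_day:.0f}")

    # The rule-of-thumb range of 250-400 conversions per variation.
    for target in (250, 300, 400):
        days = days_to_finish(10_000, 0.03, target)
        print(f"{target} conversions per variant: about {days:.0f} days")
```

For the example site, the 250-400 rule of thumb works out to a wait of roughly 50 to 80 days with two variants.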

Blocked progress

Perhaps the biggest problem in traditional A/B testing is that you are not expected to make any updates to the site while the experiment is running. You must make sure that the button, and nothing else, is making the impact.

You must stop working on the site while waiting for the results. This can take a couple of months.

And every time you run a test, you are likely leaking tons of visitors somewhere on the top of your funnel. You could have fixed those and made a bunch of other optimizations while waiting for the button experiment to end.

That is not acceptable for a startup because you want to be continually improving and “pivoting.” You want to make big content and functional changes to your website, and you want to find new and better audiences that generate more traction.

Wasted data

When a traditional A/B test is over, none of the experimentation data is usable anymore. You only know which variant performed better in the past couple of weeks.

Instead of just learning one thing, you could have collected new insights about your landing pages, referring domains, and visitors, and then reused this information in the next experiment.

Unfortunately, your next test starts again from a “heuristic analysis” or from some external data.

User experience damage

Traditional A/B testing software causes the following distractions for your visitors just by being included on your pages:

  1. Slower page load time. For example, Optimizely weighs a whopping 796K, which is literally 200 times bigger than Volument. Despite all the superintelligence included in the JavaScript file, Optimizely slows down your pages, and your visitors get a less convincing first impression.
  2. Flash of original content. Visual A/B testing tools can cause an off-putting FOOC effect, where the original content is briefly displayed before the alternate (B variant) kicks in. This makes for a flaky experience, and savvy people realize they are just part of a marketing experiment.

A bad first impression makes people want to leave the site before they even bother to find out what the product is.

The UX damage caused by the various analytics tools