A Different Approach to A/B Testing

Figure: A/B test loop for building human insight

Dear friends,

When a lot of data is available, machine learning is great at automating decisions. But when data is scarce, consider using the data to augment human insight, so people can make better decisions.

Let me illustrate this point with A/B testing. The common understanding of the process is:

  • Build two versions of your product. For example, on the DeepLearning.AI website, one version might say, “Build your career with DeepLearning.AI,” and another, “Grow your skills with DeepLearning.AI.”
  • Show both versions to groups of users chosen at random and collect data on their behavior.
  • Launch the version that results in better engagement (or another relevant metric); a sketch of this decision step follows the list.
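
For concreteness, here is a minimal sketch of that final decision step, assuming you have counted clicks for each headline (the counts and the 5% significance threshold are illustrative, not from this letter):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    # Two-proportion z-test: is version B's click rate different from A's?
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

# Hypothetical data: 5,000 visitors saw each headline.
p_a, p_b, z, p = two_proportion_z(conv_a=180, n_a=5000, conv_b=215, n_b=5000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p:.3f}")
print("Launch B" if p < 0.05 and p_b > p_a else "No clear winner yet")
```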

But this is not how I typically use A/B testing. Often I run such tests to gain insight, not to choose which product to launch. Here’s how it works:

  • Build two versions of your product.
  • Have the product team make predictions about which version will perform better.
  • Test both versions and collect data on user behavior.
  • Show the results to the team, and let the data update their beliefs about users and their reactions. If someone says, “Oh, that’s weird. I didn’t realize our users wanted that!” then we’ve learned something valuable. (One way to track predictions against outcomes is sketched after this list.)
  • Based on the team’s revised intuitions, have them decide what to launch. It could be version A, version B, or something else.
  • Repeat until you reach diminishing returns in terms of learning.

On major websites, where the developers may run thousands of automated experiments a day — for example, trying out different ad placements to see who clicks on what — it’s not possible for people to look at every experimental result to hone their intuition. In this case, fully or mostly automated decision making works well. An algorithm can try multiple versions and pick the one that achieves the best metrics (or use the data to learn what to show a given user). But when the number of daily experiments is small, using such experiments to hone your intuition allows you to combine limited trials with human insight to arrive at a better decision.
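
The letter doesn't name a specific algorithm for this automated setting; Thompson sampling is one common choice, sketched here with made-up variants and click rates:

```python
import random

# Thompson sampling: keep a Beta posterior over each variant's click rate,
# sample from each posterior, and serve the variant with the best sample.
successes = {"A": 1, "B": 1, "C": 1}  # Beta(1, 1) priors
failures = {"A": 1, "B": 1, "C": 1}

def choose_variant():
    samples = {v: random.betavariate(successes[v], failures[v]) for v in successes}
    return max(samples, key=samples.get)

def record(variant, clicked):
    if clicked:
        successes[variant] += 1
    else:
        failures[variant] += 1

true_rate = {"A": 0.030, "B": 0.036, "C": 0.041}  # hypothetical ground truth
for _ in range(10_000):
    v = choose_variant()
    record(v, random.random() < true_rate[v])

# Traffic served per variant; most of it drifts toward the best performer.
print({v: successes[v] + failures[v] - 2 for v in successes})
```

The sampling step shifts traffic toward whichever version looks best so far while still exploring the others, so the system converges on good metrics without a human in the loop.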

Beyond A/B testing, the same concept applies to building machine learning systems. If your dataset size is modest, combining data-derived insights with human insights is critical. For example, you might do careful error analysis to derive insights and then design a system architecture that captures how you would carry out the task. If you have a massive amount of data, more automation — perhaps a large end-to-end learning algorithm — can work. But even then, error analysis and human insight still play important roles.
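
As a small illustration of the counting step in such an error analysis (the tags and counts are hypothetical):

```python
from collections import Counter

# Hand-assigned tags for a sample of misclassified examples.
error_tags = [
    "blurry image", "mislabeled", "blurry image", "unusual background",
    "blurry image", "mislabeled", "blurry image",
]
for tag, count in Counter(error_tags).most_common():
    print(f"{tag}: {count}/{len(error_tags)} ({count / len(error_tags):.0%})")
```

The tallies point you toward the failure mode worth fixing first.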

Keep learning!

Andrew
