Difference between revisions of "A b test"
Revision as of 19:11, 28 January 2022
Name: Sample size calculation for A/B test
Author: Vaso Dzhinchvelashvili
Method: Monte Carlo
Tool: Python
Some theory
An A/B test is an experiment that shows how a feature influences performance, measured by some target metric.
α (Alpha) is the probability of a Type I error in a hypothesis test: incorrectly rejecting the null hypothesis.
β (Beta) is the probability of a Type II error in a hypothesis test: incorrectly failing to reject the null hypothesis. (1 – β is the power.)
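Written as conditional probabilities, with H_0 denoting the null hypothesis (the notation is added here for illustration and does not come from the calculator):

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad
\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}), \qquad
\text{power} = 1 - \beta
```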
Problem definition
Imagine you are an analyst with a real problem: you need to understand how many observations you need and how long you should run the experiment.
Limitations
Normally, many experiments run within the same company. To keep each experiment clean, users should not overlap (a user cannot participate in two different experiments at the same time). So the longer you run an experiment, the more other experiments get postponed, and product development is stopped or slowed down.
So I hardcoded 3 months as the maximum length of an experiment; this also lowers the chance that the calculations take an unreasonably long time.
Let’s assume there is an ‘old’ feature, and a ‘new’ feature has been developed to replace it. The company needs to decide which feature to keep and which to sunset (exclude from the product). The two features cannot coexist outside the experiment.
Goal
Your goal (as an analyst who uses the calculator) is to understand whether the new feature increases or decreases the target metric. You want that conclusion to be statistically significant.
What you have
In a real-life situation you could have the following inputs:
1 Historical information on how users performed in the past with the old feature: the expectation of the metric and the variation of the metric.
2 How many users access the feature monthly.
3 Some assumptions: you expect the new feature to increase or decrease the metric by 0 to x% (in either direction). Of course, you want your metric to skyrocket (+10000%), but in reality you don’t expect more than, say, a 20% rise. As mentioned, you can’t run an experiment longer than 3 months. In practice you might, but so that my work can be evaluated in a reasonable time (the calculation takes a while), I hardcoded the maximum length. It can be changed in the code.
4 Lastly, the chance you are ready to accept of being wrong when deciding whether a difference was random or not random (i.e. caused by the new feature), normally 5%.
These parameters should be entered in the UI (how to do this is explained below).
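To make the inputs concrete, here is a minimal sketch of how they could be grouped in Python. The class name, field names and example numbers are hypothetical and are not taken from the calculator's code. It also assumes that users are split evenly between the two features and that each monthly user counts as one new observation.

```python
from dataclasses import dataclass

# The 3-month cap mentioned in the text; adjustable.
MAX_EXPERIMENT_MONTHS = 3


@dataclass
class ExperimentInputs:
    """Hypothetical container for the inputs listed above."""
    baseline_mean: float      # historical expectation of the metric
    baseline_std: float       # historical variation, as a standard deviation
    monthly_users: int        # users who access the feature per month
    max_relative_lift: float  # largest expected change, e.g. 0.20 for 20%
    alpha: float = 0.05       # accepted chance of a false positive

    def max_users_per_group(self, months: int = MAX_EXPERIMENT_MONTHS) -> int:
        """Users available to each of two equal groups (assumed 50/50 split)."""
        return self.monthly_users * months // 2

    def absolute_effect(self) -> float:
        """Largest expected change in the metric, in the metric's own units."""
        return self.baseline_mean * self.max_relative_lift


# Made-up example values, for illustration only.
inputs = ExperimentInputs(
    baseline_mean=100.0,
    baseline_std=30.0,
    monthly_users=10_000,
    max_relative_lift=0.20,
)
print(inputs.max_users_per_group())  # 15000
print(inputs.absolute_effect())
```

With these made-up numbers, the 3-month cap limits each group to 15,000 users, which is the upper bound any required sample size has to fit under.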
Model
The calculator uses the Monte Carlo method to estimate the chance of seeing a nonrandom difference in means between samples.
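The idea can be sketched as follows. This is an illustration, not the calculator's actual code: it assumes a normally distributed metric, equal group sizes and a two-sided z-test on the difference in means, and the function name and parameters are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def rejection_rate(n, baseline_mean, baseline_std, relative_lift,
                   alpha=0.05, n_sims=400, seed=0):
    """Share of simulated experiments in which the difference in means
    is declared nonrandom (two-sided z-test, normal metric assumed)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    new_mean = baseline_mean * (1 + relative_lift)
    rejections = 0
    for _ in range(n_sims):
        # Draw one simulated experiment: n users per group.
        old = [rng.gauss(baseline_mean, baseline_std) for _ in range(n)]
        new = [rng.gauss(new_mean, baseline_std) for _ in range(n)]
        # Standard error of the difference in sample means.
        se = (stdev(old) ** 2 / n + stdev(new) ** 2 / n) ** 0.5
        z = (mean(new) - mean(old)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# With a real 20% lift, the rate estimates the power (1 - beta).
power_estimate = rejection_rate(n=50, baseline_mean=100.0,
                                baseline_std=30.0, relative_lift=0.20)
# With no lift, the rate estimates the Type I error rate (close to alpha).
false_positive_estimate = rejection_rate(n=50, baseline_mean=100.0,
                                         baseline_std=30.0, relative_lift=0.0)
print(power_estimate, false_positive_estimate)
```

Repeating the estimate for increasing values of `n` and stopping at the first one that reaches the desired power gives a simulated sample size.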
The script also uses 2 different formulas and 1 Python function to calculate the sample size needed to achieve a given confidence level. As it turned out, all of them are tuned for 80% accuracy; however, from the literature review it is not obvious where beta enters, since the Z, T and other statistics have only alpha in their formulas.
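For comparison, one common textbook formula for a two-sided test on the difference of two means with equal group sizes is n = 2 (z_{1-α/2} + z_{1-β})² σ² / δ² per group, where σ is the standard deviation and δ is the absolute difference to detect. Beta enters through the z_{1-β} term. The sketch below implements that textbook formula. It is not necessarily one of the formulas the script uses, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(std, effect, alpha=0.05, power=0.80):
    """Textbook per-group sample size for a two-sided two-sample z-test.

    std    -- standard deviation of the metric (assumed equal in both groups)
    effect -- absolute difference in means to detect
    power  -- 1 - beta
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 * std ** 2 / effect ** 2)


# Made-up example: std of 30, absolute effect of 20 (a 20% lift on a mean of 100).
print(sample_size_per_group(std=30.0, effect=20.0))  # 36
```

At alpha = 0.05 and 80% power the factor 2 (z_{1-α/2} + z_{1-β})² is roughly 16, which may explain why formulas that show only alpha still behave as if they were tuned for 80%.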
How to run the simulation
1) Add the UI-Copy1 clean file to a Jupyter notebook:
2) Run the first block of code:
It should take no more than 20-30 seconds.
3) Run the second block of code:
It should also launch quickly. As a result, the following UI should be visible (I didn’t have to install any other software or libraries, but I read online that some people had problems):
As can be seen, all the parameters from the real-life problem can be entered into the calculator.