I analyzed my new clients' past tests and found at least 6 out of about a hundred with SRM. The discrepancy is about 3-4%, and the test variations always receive slightly more traffic. SRM calculators confirm it.
Strictly speaking, we can't trust the results of those tests, or our set-up. The only anomaly I found is that the Optimize snippet is placed below the GTM snippet. Could that be the reason?
What looks like SRM can actually be something else, such as returning users. So it's not that there's zero tolerance; it's more a signal of "something happened here and we need to investigate a little more".
Agreed, returning users may sometimes be a factor.
However, most of the scientific sources on SRM I've come across say it is not tolerable. Do you have any sources that support the opposite?
Here is a quote from the paper "Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners":
For example, if a 50/50 split is expected between two experiment variants, the ratio between the number of users exposed to each of the groups at the end of the experiment is expected to be close to 1. While there are many data quality issues that could decrease the validity and significance of a controlled experiment, Sample Ratio Mismatch in most cases completely invalidates experiment results. For example, a ratio of 50.2/49.8 (821,588 versus 815,482 users) diverges enough from an expected 50/50 ratio that the probability that it happened by chance is less than 1 in 500k.
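For reference, the check behind most SRM calculators is a chi-square goodness-of-fit test on the observed counts. A minimal sketch in Python (the function name is mine, not from any particular calculator) reproduces the paper's example figures:

```python
import math

def srm_pvalue(n_control, n_variant, expected_ratio=0.5):
    """Chi-square goodness-of-fit test (df=1) for sample ratio mismatch.

    Compares observed user counts in two groups against the expected
    traffic split and returns the p-value: the probability of seeing a
    split at least this skewed by chance alone.
    """
    total = n_control + n_variant
    exp_c = total * expected_ratio
    exp_v = total * (1 - expected_ratio)
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_variant - exp_v) ** 2 / exp_v
    # For 1 degree of freedom, the chi-square survival function
    # simplifies to erfc(sqrt(chi2 / 2)), available in the stdlib.
    return math.erfc(math.sqrt(chi2 / 2))

# The paper's example: 821,588 vs 815,482 users on an intended 50/50 split.
p = srm_pvalue(821_588, 815_482)
print(f"p = {p:.2e}")  # on the order of 1 in 500k, as the paper states
```

A common convention is to flag SRM when this p-value falls below a strict threshold such as 0.001; a 3-4% discrepancy at typical traffic volumes will fail that check decisively.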