I have data that looks like this.
Here, freq is the number of conversions in a visit, and the control and variant counts are the numbers of sessions (visits). Converting Sessions is just a binary yes/no: did the session have at least one conversion?
I am trying to run two tests on this.
- Did the variant have significantly more conversions than the control?
- Did the variant have significantly more converting sessions than the control?
Where I am stuck is which test to use for each question. I’m leaning towards a Mann-Whitney U test for Test 1, as my data is left skewed, but I want to know if this is the right approach (especially with the medians for both groups being the same). And if Mann-Whitney U is the way to go, is there a calculator someone can recommend? I really want to understand the statistical power of my results given my sample size.
For Test 2, I would usually go with a Z-test, but I am second-guessing myself: how can a binary value be normally distributed? This has always confused me when it comes to web analytics. Any advice here would be greatly appreciated.
Welcome to the community. Thank you for sharing info on what you’re trying to figure out.
We will tag relevant experts to see what tests they recommend early AM EST tomorrow. And you’ll get a guaranteed recommendation from our community.
Check back tomorrow
If I got the problem and the data right…
This sounds like an unpaired two-sample t-test, or in this case its non-parametric counterpart, the Mann-Whitney U test (two-sample data, the dependent variable is interval, the independent variable is a factor with two levels).
I would use R and wilcox.test for this.
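In case it helps to see what the test does with data in a frequency-table layout, here is a minimal Python sketch (standard library only) of the Mann-Whitney U test using the tie-corrected normal approximation. The function name and the session counts are made up for illustration, and no continuity correction is applied, so the p-value may differ slightly from what wilcox.test or an online calculator reports.

```python
import math

def mann_whitney_from_counts(control_counts, variant_counts):
    """Mann-Whitney U via the tie-corrected normal approximation.

    Both lists hold session counts aligned to the same ascending
    conversion values (for example 0, 1, 2, 3 conversions per session).
    Returns (U for the control group, z, two-sided p-value).
    """
    n1 = sum(control_counts)
    n2 = sum(variant_counts)
    n = n1 + n2

    rank_sum_control = 0.0
    tie_term = 0.0
    seen = 0  # sessions with a strictly smaller conversion count
    for c, v in zip(control_counts, variant_counts):
        t = c + v  # all sessions sharing this value are tied
        if t == 0:
            continue
        midrank = seen + (t + 1) / 2  # average rank of the tied block
        rank_sum_control += c * midrank
        tie_term += t**3 - t
        seen += t

    u_control = rank_sum_control - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_control - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_control, z, p_two_sided

# Made-up counts of sessions with 0, 1, 2 and 3 conversions.
control = [900, 80, 15, 5]
variant = [880, 95, 18, 7]
u, z, p = mann_whitney_from_counts(control, variant)
print(f"U = {u:.1f}, z = {z:.3f}, p = {p:.4f}")
```

A negative z here means the control sessions tend to rank lower (fewer conversions) than the variant sessions. Because the test compares ranks rather than medians, it can still detect a shift when both medians are the same value, although with most sessions tied at zero the heavy tie correction reduces its sensitivity.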
It looks more like a normal chi-squared test to me. If I understood your question correctly, you want to compare the number of converting sessions between control and variant, not the binary yes/no values per se. In a table this should look something like this:
This one you can also test with R: chisq.test(table(data$dependentVariable, data$independentVariable))
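To make the calculation concrete, here is a minimal Python sketch (standard library only) of a Pearson chi-squared test on a 2x2 table of converting versus non-converting sessions. The function name and the counts are made up for illustration. Note that this sketch applies no continuity correction, whereas R's chisq.test applies Yates' correction to 2x2 tables by default, so the two can give slightly different numbers.

```python
import math

def chi_squared_2x2(conv_a, total_a, conv_b, total_b):
    """Pearson chi-squared test of independence on a 2x2 table.

    Rows are the two groups; columns are converting and
    non-converting sessions. Returns (statistic, p-value).
    """
    observed = [
        [conv_a, total_a - conv_a],
        [conv_b, total_b - conv_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [
        conv_a + conv_b,
        (total_a - conv_a) + (total_b - conv_b),
    ]
    grand_total = total_a + total_b

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Made-up example: 100 of 1000 control sessions convert, 120 of 1000 variant sessions.
stat, p = chi_squared_2x2(100, 1000, 120, 1000)
print(f"chi-squared = {stat:.3f}, p = {p:.4f}")
```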
I’m not a statistics pro by any means, so please take this comment with a grain of salt.
Thank you for the thoughtful response, @zoya.ruhe. What do you think, @chriswragge?
Thanks for the reply @zoya.ruhe !
Totally agree with your recommendation for wilcox.test. I found this site (Mann-Whitney U test) to be super helpful; it also displays the R code at the bottom, with descriptions of what the results mean.
I actually went with a two-sample proportion Z-test (Two sample proportion test calculator with step-by-step solution). If I am not mistaken, the normal chi-squared test tests for normality. While my data is still binary in nature (yes, a session converts, or no, it doesn’t), according to the Central Limit Theorem I have to assume that it follows a normal distribution. I still don’t really understand why binary data can be considered normal, but I have decided to trust people smarter than me.
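For what it's worth, two points may clear up the confusion. First, the chi-squared test on a 2x2 table is a test of independence rather than of normality, and for two groups its uncorrected statistic is exactly the square of the pooled two-proportion z statistic, so the two approaches give the same two-sided p-value. Second, the Central Limit Theorem does not say that the individual yes/no values are normal. It says that the conversion rate, which is an average over many sessions, is approximately normal. The Python sketch below (standard library only) illustrates both. The function names, counts, true rate and seed are all made up for illustration.

```python
import math
import random

def two_proportion_z_test(conv_a, total_a, conv_b, total_b):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    rate_a = conv_a / total_a
    rate_b = conv_b / total_b
    pooled = (conv_a + conv_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

def simulate_coverage(true_rate, n_sessions, n_experiments, seed=0):
    """Share of simulated conversion rates within 1.96 standard errors.

    Each experiment draws n_sessions binary outcomes. If the observed
    rate is approximately normal, the share should be close to 0.95.
    """
    rng = random.Random(seed)
    std_err = math.sqrt(true_rate * (1 - true_rate) / n_sessions)
    inside = 0
    for _ in range(n_experiments):
        conversions = sum(rng.random() < true_rate for _ in range(n_sessions))
        observed_rate = conversions / n_sessions
        if abs(observed_rate - true_rate) <= 1.96 * std_err:
            inside += 1
    return inside / n_experiments

# Made-up example: 100 of 1000 control sessions convert, 120 of 1000 variant sessions.
z, p = two_proportion_z_test(100, 1000, 120, 1000)
print(f"z = {z:.3f}, z squared = {z * z:.3f}, p = {p:.4f}")

coverage = simulate_coverage(true_rate=0.1, n_sessions=1000, n_experiments=2000)
print(f"share of simulated rates within 1.96 standard errors: {coverage:.3f}")
```

The approximation gets worse when sessions are few or the conversion rate is very close to 0 or 1, which is the usual reason calculators ask for a minimum number of conversions per group.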