Get a better understanding of data behavior and testing results.
Use data visualization techniques like boxplots and scatterplots to explore your data and detect outliers.
- Boxplots use the interquartile range (IQR, the span between the 25th and 75th percentiles) to flag data points that fall well outside it, typically more than 1.5 × IQR above the 75th percentile or below the 25th percentile.
- Scatterplots can reveal outliers that don’t fit the visual pattern of the rest of your data.
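The boxplot rule can also be checked numerically. The sketch below assumes the common Tukey convention (fences at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR) and uses Python's standard `statistics.quantiles` with the `"inclusive"` method, which interpolates quartiles the same way as Excel's QUARTILE.INC. The function name `iqr_outliers` and the order values are hypothetical, for illustration only.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (low, high) outliers using Tukey's fences, the rule most boxplots
    use to decide which points fall beyond the whiskers.

    Hypothetical helper: k=1.5 is the usual convention, not a fixed standard.
    """
    # Quartiles with linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    low = [v for v in values if v < lower]
    high = [v for v in values if v > upper]
    return low, high


# Hypothetical order values (in dollars) with one bulk order.
orders = [120, 135, 140, 150, 155, 160, 145, 130, 150, 4800]
low, high = iqr_outliers(orders)
print("low outliers:", low)
print("high outliers:", high)
```

Returning the low and high tails separately makes it easy to see which side your extreme values sit on, rather than just how many there are.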
Most testing tools allow you to access or download the full raw data from experiments you’ve run.
Set up filters in your testing tool that exclude abnormally large or small values from your test results.
For example, if you’re tracking revenue as an A/B test goal and your past analytics data suggests the average web order has been $150, you might set up a filter that excludes any order above $1,000.
Don’t remove outliers without careful analysis: sometimes outliers are part of the story, not a distraction from it.
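If you apply the revenue filter to exported raw data yourself, it helps to keep the excluded orders rather than silently dropping them, so they can still be reviewed. This is a minimal sketch assuming order amounts are plain numbers; the $1,000 cap comes from the example above, while the helper name `split_by_threshold`, the `min_value` floor of 0, and the sample amounts are hypothetical.

```python
def split_by_threshold(orders, max_value=1000.0, min_value=0.0):
    """Split orders into those inside [min_value, max_value] and those outside.

    Hypothetical helper: excluded orders are returned, not discarded, so they
    can be inspected before deciding whether they really are noise.
    """
    kept, excluded = [], []
    for amount in orders:
        if min_value <= amount <= max_value:
            kept.append(amount)
        else:
            excluded.append(amount)
    return kept, excluded


# Hypothetical order amounts in dollars.
kept, excluded = split_by_threshold([150, 980, 1250, 140, 4800])
print("kept:", kept)
print("excluded for review:", excluded)
```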
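To reduce the influence of extreme values quickly, use the Excel function TRIMMEAN, which averages your data after excluding a set percentage of the highest and lowest values. In R, use mean() with its trim argument, for example mean(x, trim = 0.1). Note that TRIMMEAN’s percentage is the total fraction excluded (split between both ends), while R’s trim is the fraction removed from each end, so TRIMMEAN(range, 0.2) corresponds to mean(x, trim = 0.1).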
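For comparison, here is a trimmed mean in plain Python. It is a sketch that follows R's convention of dropping floor(n × trim) values from each end; the function name `trimmed_mean` and the order values are hypothetical.

```python
import math


def trimmed_mean(values, proportion_per_tail):
    """Mean after dropping floor(n * proportion_per_tail) values from each end.

    Mirrors R's mean(x, trim = proportion_per_tail); Excel's TRIMMEAN takes
    the total fraction instead, i.e. TRIMMEAN(range, 2 * proportion_per_tail).
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion_per_tail < 0.5:
        raise ValueError("proportion_per_tail must be in [0, 0.5)")
    data = sorted(values)
    k = math.floor(len(data) * proportion_per_tail)  # values cut from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical order values with one bulk order.
orders = [120, 135, 140, 150, 155, 160, 145, 130, 150, 4800]
print(sum(orders) / len(orders))   # 608.5, pulled up by the bulk order
print(trimmed_mean(orders, 0.1))   # 145.625, lowest and highest values dropped
```

A trimmed mean does not delete anything from your data set; it only changes the summary statistic, which is why it is a quick first check rather than a substitute for looking at the outliers themselves.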
Use your knowledge of historical data and data visualization techniques to determine where your extreme values lie.
For example, most outliers in optimization tests sit at the high end, often because of bulk orders, and a boxplot can confirm whether that holds for your data.
Use conditional formatting in Excel to highlight values more than three standard deviations from the mean of your raw data set, then decide whether to trim them.
A boxplot gives you a visual check on those extreme values.
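The three-standard-deviation rule is easy to reproduce outside Excel. This sketch assumes the sample standard deviation (the same quantity as Excel's STDEV.S); the function name `three_sigma_outliers` and the order values are hypothetical.

```python
import statistics


def three_sigma_outliers(values, k=3.0):
    """Return values more than k sample standard deviations from the mean."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample SD, comparable to Excel's STDEV.S
    if sigma == 0:
        return []  # all values identical, nothing can be extreme
    return [v for v in values if abs(v - mu) > k * sigma]


# Hypothetical 20 order values with one bulk order.
orders = [120, 135, 140, 150, 155, 160, 145, 130, 150, 148,
          152, 138, 142, 158, 149, 151, 147, 153, 144, 4800]
print(three_sigma_outliers(orders))
```

One caveat: an extreme value also inflates the standard deviation it is measured against, so the rule can miss outliers in small samples. With only ten values, no point can ever lie more than about 2.85 sample standard deviations from the mean, so a three-sigma cutoff flags nothing at all; the IQR rule behind boxplots does not have this weakness.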
Use statistical methods to analyze the underlying distribution of your observed data and determine whether it is too skewed to treat as approximately normal.
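One simple check is sample skewness. The sketch below computes the moment-based coefficient g1 = m3 / m2^1.5; the function name `sample_skewness` is hypothetical, and the "|skewness| > 1 means highly skewed" cutoff mentioned afterwards is a common rule of thumb, not something defined in this article. Excel's SKEW function applies a small-sample adjustment, so its results will differ slightly.

```python
import statistics


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias adjustment)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no skew
    return m3 / m2 ** 1.5


# Hypothetical revenue per order: a long right tail gives positive skew.
print(sample_skewness([120, 135, 140, 150, 155, 160, 145, 130, 150, 4800]))
```

Positive values indicate a long right tail (such as a few bulk orders) and negative values a long left tail; under the rule of thumb above, values beyond ±1 suggest the data is far from normal.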
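Common statistical tests for comparing two variations include Student’s t-test, a parametric test that assumes approximately normal data, and the Mann-Whitney U test, a non-parametric alternative that does not. Which you use will depend on whether your data meets the t-test’s assumptions or is too skewed for it.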
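To show what the Mann-Whitney U test actually compares, here is a small sketch: U counts how often a value from one variation beats a value from the other, and a two-sided p-value comes from the normal approximation. Assumptions, stated plainly: the function name `mann_whitney_u` is hypothetical, and it applies no tie correction or continuity correction, so it is only a rough illustration for reasonably large samples. For real analyses, a library routine such as SciPy's `scipy.stats.mannwhitneyu` (or `scipy.stats.ttest_ind` for the t-test) is the safer choice.

```python
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value) via the normal approximation.

    Sketch only: no tie correction and no continuity correction.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # U1 counts pairs where a value from `a` exceeds one from `b`; ties count half.
    u1 = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u          # z <= 0 because u is the smaller U
    p = 2 * NormalDist().cdf(z)      # two-sided p-value
    return u1, min(p, 1.0)


# Hypothetical per-visitor revenue for control (a) and variation (b).
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 4))
```

Because the test works on comparisons between values rather than on their size, a single bulk order cannot pull the result around the way it can move a mean in a t-test. Without tie correction, data with many identical values (for example, lots of zero-revenue visitors) will produce conservative p-values.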
If your outliers turn out to be meaningful, study them as their own segment: for example, which demographic, behavioral, or firmographic traits correlate with their unique purchasing behavior?