Deal with outliers in data when running tests

Business Benefits

Get a better understanding of data behavior and testing results.

Use data visualization techniques like boxplots and scatterplots to explore your data and detect outliers.

  • Boxplots draw a box from the 25th to the 75th percentile (the interquartile range, or IQR) and typically flag points that fall more than 1.5 times the IQR above the 75th percentile or below the 25th percentile as outliers.
  • Scatterplots reveal points that don’t fit the visual pattern of the rest of your data.
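The boxplot rule can also be applied numerically. The following is a minimal sketch, assuming the common 1.5 × IQR convention and a made-up list of order values; the function name and the sample data are illustrative only.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical order values in dollars; 2400 stands in for a bulk order.
orders = [120, 135, 150, 142, 160, 155, 138, 149, 2400]
lower, upper, outliers = iqr_outliers(orders)
print(f"fences: {lower} to {upper}, outliers: {outliers}")
```

Different tools compute quartiles slightly differently, so the fences may not match a charting tool's boxplot exactly.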

Access your raw experiment data instead of simply trusting the dashboard on your testing tool.

Most testing tools allow you to access or download the full raw data from experiments you’ve run.

Set up filters in your testing tool to exclude abnormally large or small values from your testing results.

For example, if you’re tracking revenue as an A/B test goal, and your past analytics data suggests the average web order has been $150, you might set up a filter that excludes any order above $1,000.

Don’t remove outliers without analyzing them first, as sometimes outliers are part of the story, not a distraction from it.
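If your testing tool can't apply such a filter, the same idea can be reproduced on exported data. This is a rough sketch, assuming the $1,000 cap from the example above and invented order values. It sets the excluded orders aside rather than deleting them, so they can still be reviewed.

```python
ORDER_CAP = 1000  # dollars; the example threshold from the text, not a recommendation

def split_by_cap(order_values, cap=ORDER_CAP):
    """Separate orders at or below the cap from those above it."""
    kept = [v for v in order_values if v <= cap]
    excluded = [v for v in order_values if v > cap]
    return kept, excluded

# Hypothetical order values for one test variation.
orders = [95, 150, 210, 180, 4200, 130]
kept, excluded = split_by_cap(orders)

mean_all = sum(orders) / len(orders)
mean_kept = sum(kept) / len(kept)
print(f"mean with all orders: {mean_all}")
print(f"mean after filter:    {mean_kept}")
print(f"set aside for review: {excluded}")
```

Comparing the two means shows how much a single large order moves the result, which helps decide whether the filter is hiding noise or hiding a real effect.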

Trim your data set to hide or remove outliers during your post-test analysis.

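For example, use the Excel function TRIMMEAN to calculate a mean that excludes a set percentage of the highest and lowest values. In R, use mean(x, trim = 0.1), which drops 10% of the observations from each end before averaging.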
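For readers working outside Excel or R, a trimmed mean is simple to compute by hand. The sketch below follows the convention of R's trim argument, where the fraction is removed from each end. Excel's TRIMMEAN instead takes the total fraction to exclude, so a value of 0.2 there corresponds to 0.1 here. The data is invented for illustration.

```python
import math

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the given fraction of values from EACH end."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)  # number of values cut per end
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)

# Hypothetical order values; 2400 stands in for a bulk order.
orders = [120, 135, 150, 142, 160, 155, 138, 149, 145, 2400]
print(sum(orders) / len(orders))       # plain mean: 369.4
print(trimmed_mean(orders, trim=0.1))  # trimmed mean: 146.75
```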

Use your knowledge of historical data and data visualization techniques to determine where your extreme values lie.

For example, in revenue-based tests most outliers sit at the high end, often because of bulk orders, and a boxplot will confirm whether that holds for your data.

Use conditional formatting in Excel to highlight values more than three standard deviations from the mean of your raw data set, then trim them.

A boxplot gives a visual cross-check of which values sit far from the bulk of your data.
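The three-standard-deviation rule can be sketched outside Excel as well. This example assumes the sample standard deviation and uses invented order values. Note that extreme values inflate the standard deviation itself, so in a very small sample even an obvious outlier may not cross the threshold.

```python
import statistics

def three_sigma_outliers(values, threshold=3.0):
    """Return values whose distance from the mean exceeds threshold x stdev."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > threshold * stdev]

# Hypothetical order values; 2400 stands in for a bulk order.
orders = [120, 135, 150, 142, 160, 155, 138, 149, 145, 152,
          147, 158, 133, 141, 151, 144, 156, 139, 148, 2400]
flagged = three_sigma_outliers(orders)
print(flagged)
```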

Use statistical methods to analyze the underlying distribution of your observed data and determine whether it is too skewed to be treated as normally distributed.

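Common tests for comparing variations include Student’s t-test, a parametric test that assumes roughly normally distributed data, and the Mann-Whitney U-test, a non-parametric alternative that makes no such assumption. Which you use will depend on whether your data meets the parametric assumption.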
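As an illustration of both steps, the sketch below computes a simple skewness measure and a Mann-Whitney U statistic from scratch. It is an approximation only: the p-value uses the normal approximation with no tie or continuity correction, and the data is invented. For real analysis, a statistics library such as SciPy (scipy.stats.mannwhitneyu) is a better choice.

```python
import math
from statistics import NormalDist, mean, pstdev

def skewness(values):
    """Population skewness: the average cubed z-score."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def mann_whitney_u(a, b):
    """Return (U for sample a, approximate two-sided p-value)."""
    # U counts the pairs in which a value from a exceeds a value from b;
    # ties count as half a win.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p

# Hypothetical order values per variation; control contains one bulk order.
control = [120, 135, 150, 142, 160, 2400]
variant = [155, 170, 165, 180, 175, 190]

print(f"control skewness: {skewness(control):.2f}")
u, p = mann_whitney_u(control, variant)
print(f"U = {u}, approximate p = {p:.3f}")
```

Because the test works on ranks, the bulk order counts only as "the largest value" rather than pulling the comparison by its full dollar amount.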

Segment your outliers and analyze them independently.

For example, which demographic, behavioral, or firmographic traits correlate with their unique purchasing behavior?
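One simple way to start is to compare how traits are distributed among outliers versus typical orders. The sketch below assumes hypothetical records with customer_type and channel fields and a $1,000 cutoff; your own data will have different fields and a different definition of an outlier.

```python
from collections import Counter

# Hypothetical order records; field names and values are illustrative only.
orders = [
    {"value": 120, "customer_type": "consumer", "channel": "organic"},
    {"value": 150, "customer_type": "consumer", "channel": "paid"},
    {"value": 140, "customer_type": "business", "channel": "organic"},
    {"value": 165, "customer_type": "consumer", "channel": "email"},
    {"value": 2400, "customer_type": "business", "channel": "email"},
    {"value": 3100, "customer_type": "business", "channel": "email"},
]

def segment_share(records, field):
    """Share of records falling into each value of the given field."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

CUTOFF = 1000  # assumed outlier threshold in dollars
outliers = [o for o in orders if o["value"] > CUTOFF]
typical = [o for o in orders if o["value"] <= CUTOFF]

for field in ("customer_type", "channel"):
    print(field, "outliers:", segment_share(outliers, field))
    print(field, "typical: ", segment_share(typical, field))
```

A trait that dominates the outlier group but is rare among typical orders is a candidate for its own segment or a follow-up test.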