Estimate the standard deviation of continuous data

georgi_georgiev · March 10, 2021, 5:34am

Business Benefits

Get more actionable test results from non-binomial metrics.

Choose a period of historical data to use as a baseline when estimating standard deviation.

If you are working with test data this would simply be the test duration period. It’s important that the selected period is neither too long nor too short, as this could capture information with little predictive capacity and distort the accuracy of your estimate. 2-8 weeks of data is usually sufficient.

Extract the value of the metric of choice for each individual user to a spreadsheet, allocating one row per user.

Some values might be zeroes, which is fine. For example, if your metric of choice is revenue per user, it’s normal to have many users with a revenue value of zero unless the dataset only includes paying customers.

Add any missing zero values to your spreadsheet.

Some tools won’t extract data for each individual user or session and might ignore zero values. If your chosen metric is something like transaction revenue per user or per session, manually add these zero values to your spreadsheet to ensure your estimate is accurate.

For example, if you want to calculate the average revenue per user from Google Analytics data, you’ll need to first extract transaction data. Users with no transactions would have a revenue value of zero, so you’ll need to assign zero values to users who didn’t make any purchases until there is a value associated with each user.

Use Excel’s `STDEV.S()` function or R’s `sd()` function or an online standard deviation calculator like the GIGA Calculator Standard Deviation Calculator to estimate the population standard deviation of all values based on your sample.

Include all values in the input, including zeroes. Then use the population standard deviation in sample size calculations or when calculating statistical estimates, such as p-values and confidence intervals.

Alternatively, complete the process manually using the programming language of your choice.

Use Excel’s AVERAGE() function, R’s mean() function, or an arithmetic mean calculator to compute the arithmetic mean of all values.
Use Excel’s POWER() function, R’s ^ operator, or an online exponent calculator to subtract each value from the mean and raise it to the power of two and record the result in a second column.
Use Excel’s SUM() function, R’s sum() function, or an online sum calculator to sum the values of the second column.
Divide the sum by the number of values, subtract one, and take the square root of the result as the standard deviation of the metric of interest.

Last edited by @hesh_fekry 2023-11-14T15:10:52Z

Topic	Replies	Views
Calculate statistical significance for absolute difference in means Step-by-step Playbooks marketing , 5-testing , cro , working-with-results , playbook , significance	380	April 1, 2021
Measure and report customer retention rate Step-by-step Playbooks marketing , automation , saas , playbook , retention-loyalty , customer-satisfaction , measurement-reporting	276	January 16, 2022
Estimate the value of on-site conversions Step-by-step Playbooks marketing , cro , 6-monitoring , playbook , reporting	364	January 18, 2021
Calculate a 95% confidence interval for absolute difference in means Step-by-step Playbooks marketing , 5-testing , cro , working-with-results , playbook	1116	March 29, 2021
Choose ecommerce metrics for a dashboard in Tableau Step-by-step Playbooks marketing , ecommerce , analytics , dashboard , playbook	281	January 16, 2021