I’m doing Momoko Price’s message mining technique for an actual copywriting client I have. This client has thousands and thousands of survey responses.

My question: How do I know when I have ENOUGH data? Is there such a thing as statistical significance for message mining?

I appreciate any insight if you happen to know - Thank you!

I remember seeing a post or blog from Ben Labay about surveys. He mentioned that 300 responses is normally enough to get a good measure of the results, at least where feedback is concerned.

We recently mined thousands of Intercom chats. The strongest signals indeed emerged after 200-300. There were some small shifts in items lower down the count, but to be honest they were already too far down to be actionable.

I think it might depend on your dataset and results and what you are trying to do.

For message testing, the Wynter panels tend to be fairly small too, compared to the numbers you'd need for statistical significance.

I’m not 100% sure how to interpret the chart in point #3 “How Big a Sample Do I Need?” though. (Do we add ‘margin of error’ and ‘confidence level’ together?)

Interestingly, my own gut feeling was that 300 was a good number to aim for. Then, I saw your comment, confirming this.

Yeah, that’s a very technical way to determine truth (margin of error and confidence level), and it’s important! (You would choose a level you’re comfy with in each, not add them together.)
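If it helps to see how those two knobs combine: below is a quick sketch using the standard sample-size formula for estimating a proportion (n = z² · p(1−p) / e²). This is generic survey math, not anything specific to Momoko's method, and the confidence/margin values are just illustrative choices.

```python
import math
from statistics import NormalDist

def sample_size(confidence=0.95, margin_of_error=0.05, p=0.5):
    # z-score for the chosen confidence level (two-tailed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # p = 0.5 is the worst case (largest required n),
    # used when you don't know the true proportion in advance
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

print(sample_size(0.95, 0.05))   # 385 — the classic "~400" survey figure
print(sample_size(0.95, 0.056))  # 307 — loosen the margin a bit and you land near 300
```

So the 300 rule of thumb roughly corresponds to 95% confidence with a ±5-6% margin of error. You pick a level you're comfortable with on each axis; they feed into the formula separately rather than being added together.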

But your sample size probably won’t reach any of the higher sizes for this type of learning anyway.

My basic rule is: if 1-3 people had answered differently, would it change my whole data set? If so, the sample size is too small.

One more thought here: Take the info you're gathering as input for a testing strategy. You can see a trend or theme in your message mining, but rather than just running with it as the plan, test out two or three top trends! See which one gets more of your desired result.