29 June 2015

Understanding the concept of statistical significance

A simple explanation, without any math

We quite often read in the science pages of the newspaper that some treatment has been found to have a statistically significant effect compared to the placebo, at the 95 per cent level.

This sounds complex. What does it mean?

Yes, it does sound complicated. But the concept behind it is actually very simple. Let us try and understand statistical significance without any mathematics.

As you know, we use sample surveys to measure various things. For example, we could be measuring

- The percentage of people who use a particular brand of soap, OR

- The average amount of money spent on eating out each month, OR

- The satisfaction level of customers with their brand of TV...and so on and so forth

The thing to remember is that whenever we use a sample survey, the finding will be close to the true value, but not exact.

So, if the percentage of people using a brand is 60 per cent in reality, then our sample survey could give us an estimate of 58 or 61...or some such figure.

This slight inaccuracy is because of something called sampling error. When the sample size is large, the error is likely to be smaller and the estimate more accurate. Just remember that sample surveys always carry some built-in and unavoidable inaccuracy because of sampling error. The only way to avoid it entirely is to survey everyone, that is, to conduct a census rather than a sample survey.
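For those who would like to see sampling error in action, here is a minimal sketch in Python; the 60 per cent "true" share and the sample sizes are purely hypothetical numbers chosen for illustration.

```python
# A minimal simulation of sampling error. The "true" share of 60 per cent
# and the sample sizes are hypothetical, chosen only for illustration.
import random

random.seed(42)  # fixed seed so the sketch gives repeatable output

TRUE_SHARE = 0.60  # assume 60% of the population really uses the brand


def survey_estimate(sample_size):
    """Simulate one sample survey and return the estimated percentage."""
    users = sum(1 for _ in range(sample_size) if random.random() < TRUE_SHARE)
    return 100.0 * users / sample_size


for n in (100, 1_000, 10_000):
    print(f"sample of {n:>6}: estimated share = {survey_estimate(n):.1f}%")
```

Each run lands near 60 per cent but rarely on it exactly, and the larger samples tend to stray less; that gap between the estimate and the true value is the sampling error described above.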

I hope this is clear so far.

Yes it is. But I am afraid it still doesn’t explain what statistical significance is.

Yes, I am coming to that. Let us take Scenario 1: we measure the awareness of a brand through a sample survey, and then do it again after six months. Since each figure will have some degree of error associated with it, the two figures will probably not be the same, even though nothing has happened in the interim to actually affect brand awareness.

Let us now take Scenario 2: we measure the awareness of a brand through a sample survey, and then run an ad campaign for six months. At the end of the campaign, we once again measure the brand awareness through another sample survey. Here too, the two figures will not be the same. However, now the difference could either be because of sampling error, or because of the fact that an ad campaign has been run.

This is where significance testing comes in useful. An appropriate significance test tells us how likely it is that a difference of this size would arise from sampling error alone. If that likelihood is very small, we can conclude that the campaign has had an impact; if not, the difference could easily be just sampling error.
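For the curious, here is a rough sketch of one such test for Scenario 2, a two-proportion z-test written in plain Python; all the survey counts below are made-up numbers used only for illustration.

```python
# A sketch of a two-proportion z-test for Scenario 2.
# All counts are hypothetical: 240 of 500 respondents were aware of the
# brand before the campaign, 290 of 500 after it.
from math import sqrt, erf

aware_before, n_before = 240, 500
aware_after, n_after = 290, 500

p1 = aware_before / n_before            # 48% awareness before
p2 = aware_after / n_after              # 58% awareness after
pooled = (aware_before + aware_after) / (n_before + n_after)

# Standard error of the difference, assuming there is no real change
se = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
z = (p2 - p1) / se

# Two-sided p-value from the normal approximation: the chance of seeing a
# difference at least this large if sampling error alone were at work
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(f"difference = {100 * (p2 - p1):.1f} points, z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Statistically significant at the 95 per cent level.")
else:
    print("Not significant: the gap could easily be sampling error.")
```

With these made-up figures the p-value comes out well below 0.05, so the jump in awareness would be called statistically significant at the 95 per cent level.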

Statistically significant means that the difference between two figures, or whatever we have measured, is not merely due to the sampling process, but reflects some real difference.

It is enough for all of us to remember this much.

And when something is statistically significant, it means it is good, right?

No, no! Not always. Good or bad depends on what the desired outcome is. If we run an ad campaign, then we want the brand awareness to be different. But if there are different teachers handling two sections of (say) Class VII, then we don’t want the marks to be significantly different.

And a note of caution: if the significance test says that the difference is probably because of the campaign, it does not automatically mean that the campaign is a success. The campaign can be treated as a success only when the extent of the difference justifies the money spent on it. Significance testing by itself cannot answer whether the difference is large enough, given that spend. That is a marketing expert's call.
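To make the point concrete, here is a small hypothetical illustration: with a very large sample, even a one-point lift in awareness clears the significance bar, yet whether that lift justifies the campaign budget remains a marketing judgement.

```python
# Hypothetical illustration: statistically significant is not the same as
# commercially worthwhile. With huge samples, even a tiny lift is "significant".
from math import sqrt, erf


def two_prop_p_value(x1, n1, x2, n2):
    """Two-sided p-value for comparing two proportions (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))


# A lift of just 1 percentage point (50% -> 51%) measured on samples of 50,000
p = two_prop_p_value(25_000, 50_000, 25_500, 50_000)
print(f"p-value = {p:.4f}")  # well below 0.05, so "statistically significant"
# ...yet whether a 1-point lift justifies the spend is not a statistical question.
```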

So these test results can be misinterpreted, or even misused?

Yes, they can. If the statistician is sufficiently unscrupulous, and the audience is not familiar with the concepts, then the tests can be misused. Or if a statistician is unable to explain the concept in simple terms, and the audience is unwilling to invest the time and energy to understand the concepts, then the test results can be misinterpreted.

Haven’t you heard the remark “There are lies, damned lies, and statistics”? It is attributed, probably wrongly, to Benjamin Disraeli. Essentially, it refers to the use of statistical analysis and data to bolster a weak argument, or to further a case that has no other justification. Naturally, such a tactic can succeed only if the audience does not understand the statistics well.

To gain a better understanding of this subject, the best reference book is probably the statistics textbook written by PK Viswanathan, a noted professor of Economics. Most universities run courses on statistics, and there are several good certificate programmes as well.
