Sunday, May 18, 2014

Statistical Con

Disclaimer: I am not a statistician, and this post is purely a myopic view of an extremely vast and difficult subject.

So I came across an organization that is trying to impose a dress code on its employees. Now, I have a very neutral opinion about whether organizations should or should not impose dress codes. My belief is that it completely depends on the organization's culture and the business it is in. (For example, I like seeing customer service executives in a certain dress code so that I know I ain't asking my questions to a civilian or a fellow shopper.) What I am more interested in is how the organization actually conducted an internal study and concluded that employees were voluntarily interested in having a dress code imposed on themselves. The study concluded with a statistical analysis of the answers, and some of those answers went against my understanding of how people think.

This got me interested in the subject: can we actually misuse statistics? Apparently we can, and interestingly, we have been doing it all along. There are even books on the topic, such as "How to Lie with Statistics", and there is an entire Wikipedia article dedicated to the misuse of statistics.

How far a statistic applies really depends on how representative the sample of people is.

The other day I went to the Samsung service center to get my mobile phone fixed, and there were already 56 people ahead of me that day. We were all made to sit in a big room, where everyone instantly concluded that Samsung phones must be the worst because there were so many of us there at once. Of course, a service center only ever sees the phones that broke; the ones working fine never show up there.

In the dress code survey, it was never revealed how many people in the organization were actually surveyed or what their demographics (organizational positions) were. People in sales and in higher positions are more customer-facing, and they naturally already have to adhere to a dress code. So if the survey respondents were drawn mostly from these groups, the results would be severely biased.
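To see how much this can matter, here is a minimal Python sketch with entirely made-up numbers: a hypothetical 1,000-person organization in which 200 customer-facing employees mostly want a dress code and 800 back-office employees mostly do not. None of these figures come from the actual survey; the names `Employee`, `build_population` and `support_rate` are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical organization: every number here is invented for illustration.
@dataclass
class Employee:
    customer_facing: bool
    wants_dress_code: bool

def build_population():
    staff = []
    # 200 customer-facing staff (sales, senior roles): assume 180 want a dress code
    staff += [Employee(True, i < 180) for i in range(200)]
    # 800 back-office staff: assume only 240 want one
    staff += [Employee(False, i < 240) for i in range(800)]
    return staff

def support_rate(sample):
    """Fraction of the sampled employees who say they want a dress code."""
    return sum(e.wants_dress_code for e in sample) / len(sample)

population = build_population()
customer_facing_only = [e for e in population if e.customer_facing]
every_fifth = population[::5]  # a systematic sample across the whole roster

print(f"Whole organization:   {support_rate(population):.0%}")            # 42%
print(f"Customer-facing only: {support_rate(customer_facing_only):.0%}")  # 90%
print(f"Every 5th employee:   {support_rate(every_fifth):.0%}")           # 42%
```

Surveying only the customer-facing staff reports 90% support, while the whole (imaginary) organization sits at 42%. Taking every fifth name from the full roster recovers 42% here because it draws from both groups in proportion.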

The answer a survey gets depends on the way the question is framed.

I have noticed that when a question is framed in righteous terms, we come up with noble thoughts and try to be righteous. If a question is framed like "Do you think a person should be penalized for breaking the organization's dress code policy?", chances are I would reply yes. However, if it asked "Do you think you should be severely punished for breaking your organization's dress code policy?", I would probably be hoping for some leniency towards myself.

The way a sentence is framed or a graph is drawn can change how severe a statement appears.

If I say "25% of the population is at risk of heart disease", it seems like a big number, but it still feels distant. But if I say "1 in 4 people are at risk of heart disease", suddenly the risk feels very close. This is because we all know a lot more than 4 people, which means we are gonna know many individuals who actually face that risk!

The way news media blatantly abuse statistics these days is truly appalling (well summarized in this XKCD comic). This is especially true when they call up a small, hand-picked group of people in one city (without any regard for unbiased sampling!) and then overgeneralize the results into an attention-grabbing headline like "Indians are lazy".
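To put a rough number on "a small sample": if a poll picks n people at random and asks a yes/no question, the usual 95% margin of error is about 1.96 × √(p(1 − p)/n). The Python sketch below assumes a simple random sample and the normal approximation (`margin_of_error` is just an illustrative name). Note that it only captures chance error; no formula can rescue a hand-picked sample like the ones described above.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p measured on n randomly
    chosen people (normal approximation; assumes a simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.5 is the worst case: it gives the widest margin for a given n.
for n in (10, 100, 1000):
    print(f"n = {n}: +/- {margin_of_error(0.5, n):.1%}")
# n = 10: +/- 31.0%
# n = 100: +/- 9.8%
# n = 1000: +/- 3.1%
```

With only 10 people, a "50%" result could plausibly be anywhere from roughly 19% to 81%, which is hardly material for a national headline.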

And finally: correlation doesn't necessarily imply causation.

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

I recently came across two amazing websites that plot correlations between totally unrelated variables to drive this point home!

The first one is from Business Insider

Correlation doesn't mean causation

The second, Spurious Correlations by Tyler Vigen, actually finds and plots correlations between completely unrelated variables.

Spurious Correlations
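Finding such pairs is easy once you compare enough series. As a purely hypothetical illustration, the Python sketch below generates 50 independent random walks (think of them as made-up yearly statistics that have nothing to do with each other), computes the Pearson correlation for every one of the 1,225 pairs, and reports the strongest. It usually turns up at least one pair that looks impressively correlated, even though by construction no series influences any other. The function names and the seed are arbitrary choices for this sketch.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def random_walk(rng, steps):
    """A series that drifts by a random step each 'year'; it ignores every other series."""
    value, series = 0.0, []
    for _ in range(steps):
        value += rng.gauss(0, 1)
        series.append(value)
    return series

def most_correlated_pair(num_series=50, steps=12, seed=2014):
    """Generate independent random walks and return (r, i, j) for the strongest pair."""
    rng = random.Random(seed)
    data = [random_walk(rng, steps) for _ in range(num_series)]
    best = (0.0, None, None)
    for i in range(num_series):
        for j in range(i + 1, num_series):
            r = pearson(data[i], data[j])
            if abs(r) > abs(best[0]):
                best = (r, i, j)
    return best

r, i, j = most_correlated_pair()
print(f"Series {i} and {j}: r = {r:.2f}")
```

Changing `seed` picks out a different "winning" pair every time, which is the point: the strong correlation comes from searching through many pairs, not from any real link. Random walks are used because trending series, like many year-by-year statistics, are especially prone to looking correlated.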

So the next time you come across a statistical analysis in a news article, be warned, be very warned: there is a good chance the journalist has no background in statistics.
