Flaws and Fallacies in Statistical Thinking 2/5
Stephen K. Campbell
3/ Meaningless Statistics
A meaningless statistic is, by definition, one that doesn't lead to any change in behaviour or decision-making. It is also bloody annoying.
Mixing precise numbers with vague qualitative criteria (e.g., "75% of our staff has been with the company for a very long time") is flat-out misleading, but it gives the source a way out: the definition of "very long time" can be adjusted on the fly. The trick is to use common terms that everyone understands personally (i.e., not in any universally accepted way) and to bend them to the source's advantage. Examples include "key personnel", "major supplier" and "overreliance"; a smart manager can trick their not-so-smart higher-ups by deploying these terms when convenient.
Definitions of vague terms can be overcomplicated: they make assumptions about some things and adjust others to fit the definitional frame. As a result, such definitions become too verbose to understand and useless for any practical purpose. The complexity itself is the manipulative technique: the more verbose the definition, the better it is supposed to be (it's not!). [MK: the closest example I can come up with is the practice of building complex financial forecasting models: the more parameters the model takes into account, the more invested in it its creators become, and the more pain those creators experience when things inevitably don't go as planned.]
4/ Far-Fetched Estimates
Much misleading information comes to us as estimates ("educated guesses") of unknown statistical facts. [MK: I remember at INSEAD we had a challenge of guesstimating ranges for some 25 or 30 questions, and how poorly the whole class performed.]
An Unknowable Statistic is one that no one can possibly know. It may be a wild guess or a blatant fabrication, a loosely prepared estimate (based on common sense, of course) or something prepared with painstaking but wasted care. The cause may be a physical barrier interfering with data collection, garbage data, or the fact that the people who should know either don't know or won't say.
Not all cases get reported: abortions, shoplifting, fraud, etc. That's why even estimating the order of magnitude of fraud (especially money laundering) is a challenge.
Some estimates are built on eccentric theories (Flat Earth Society, anyone?). Wrong underlying assumptions about the nature of things can lead to a correct estimate only by virtue of a miracle. Say, the theory is that if the middle class cuts its spending on gourmet foods, a recession will follow within a few months: an attempt to find a strong signal in otherwise noisy consumer sentiment, but a failed one.
Preposterous estimates are made to shock the recipient and make the subject more dramatic and/or important than it really is. Again, poverty and theft numbers are good candidates to shock and influence policies.
The build-up from a dubious cluster occurs when the group of people chosen to prove a point is not representative of the relevant population, or when the count or measure attached to the cluster (e.g., the number of crimes, or the average loss from a theft) is an Unknowable Statistic or can't be measured reliably enough.
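The build-up can be reduced to a single multiplication, which is what makes it look so authoritative. A minimal sketch (all numbers hypothetical, chosen only to illustrate the mechanism):

```python
# "Build-up from a dubious cluster": a population-wide total is manufactured
# by multiplying a per-case figure from a small, unrepresentative cluster by
# an assumed (often unknowable) total case count. All figures are invented.

def built_up_total(avg_loss_in_cluster: float, assumed_case_count: int) -> float:
    """Extrapolate a grand total from a cluster average and a guessed count."""
    return avg_loss_in_cluster * assumed_case_count

# A cluster of 20 *reported* thefts averages $900 each (reported cases skew
# large), and the promoter assumes 2 million thefts a year:
headline = built_up_total(900.0, 2_000_000)  # a $1.8bn headline number

# If typical (mostly unreported) thefts average $60 and the true count is
# only known to within a factor of 4, the "estimate" spans a huge range:
low = built_up_total(60.0, 1_000_000)   # $60m
high = built_up_total(60.0, 4_000_000)  # $240m
```

The arithmetic is trivially correct, which is exactly the trap: the precision of the multiplication lends false authority to two inputs that were guesses to begin with.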
Uncritical projection of trends is the analyst's pet peeve. The most obvious fallacy is uncapped growth (humanity will overpopulate the Earth) or uncapped shrinkage (there won't be any oil left soon). There are many, many examples of projected trends that look ridiculous to anyone but their author.
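Why uncapped projections go so wrong can be shown in a few lines. A sketch with illustrative parameters (p0, r and cap are arbitrary, not from the book): an uncapped exponential and a capped (logistic) curve fit the early data almost equally well, yet their long-run forecasts diverge wildly.

```python
import math

def exponential(t: float, p0: float, r: float) -> float:
    """Uncapped growth: keeps compounding forever."""
    return p0 * math.exp(r * t)

def logistic(t: float, p0: float, r: float, cap: float) -> float:
    """Same early growth rate, but saturates at `cap`."""
    return cap / (1 + (cap / p0 - 1) * math.exp(-r * t))

p0, r, cap = 1.0, 0.5, 10.0

# Early on, the two curves are nearly indistinguishable...
early = (exponential(1, p0, r), logistic(1, p0, r, cap))

# ...but projected far out, the uncapped trend dwarfs the capped one
# by orders of magnitude.
late = (exponential(20, p0, r), logistic(20, p0, r, cap))
```

The point is not which curve is "right"; it's that early data alone can't distinguish them, so projecting the uncapped one uncritically is a choice of assumption, not a finding.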
It’s in fact better to ask twice than to repeat someone’s words twice.
Asking Questions About Estimates
What kind of reputation for integrity does the source have and what is its expertise in the subject matter? [MK: this is not the only question as experts are quite often mistaken, too.]
Does the supplier of the data have a vested interest in the data or its interpretation? Such interests can show up as intentional or unintentional bias.
What supportive evidence is offered, and does it really support the point, or does it just confuse the reader?
Do the underlying assumptions, methodology or theory seem okay? Are the respondents representative of the population, and are there any uncapped-trend assumptions?
Do the estimates appear plausible? Common sense and a bit of factual knowledge about the topic are all that's needed to see the naked king behind the veil of words. [MK: that's also why, after a certain point in one's career, it's helpful to be more of a generalist than a specialist.]