Flaws and Fallacies in Statistical Thinking 5/5
Stephen K. Campbell
12/ Faulty Induction
A proper statistical process involves making valid inferences about the aggregate of items of interest from a sample of such items. Done properly, this has a high probability of being correct at a fraction of the cost and time required to process the entire set of items.
Induction: arriving at conclusions about an entire group of items from a study of particular cases.
Deduction is the opposite. It takes a major premise (all X is Y) and a minor premise (a person is X) and arrives at a conclusion (the person is Y); no new information is created because the major premise is broad enough to cover all relevant cases.
In induction, conclusions are broader than the premises: some people are X and Y, hence all X are Y. Such a conclusion creates new information but may also be dramatically wrong and misleading. The right way to look at it is that the new information only has a certain probability of being true. (If it’s 100% correct, it’s enumeration, not induction.) Thus, accepting the conclusion means depending on the knowledge and personal integrity of the one presenting the new information.
The chance of the conclusion being correct improves when the sample is representative of the whole. The conclusions must also be checked against real-world facts: if something can’t exist in the physical world, the conclusion is most likely wrong.
The conclusions from a sample observed (from) may be generalized to describe a larger group (to); if the “from” sample is not a subset of the “to” group, or even is not a representative subset of it, the generalization is faulty by default. Otherwise, it stands a good chance of being true.
People may have different opinions on what the term “representative” means and use it freely to add weight to their words instead of digging a bit deeper. That’s why the recipient should put extra effort into understanding the selection criteria for the representative sample.
A sample must be chosen randomly to avoid biases (implicit and explicit) and to provide enough variation in key properties. Judgement sampling by definition is based on an “expert” opinion, i.e., personal conviction plays at least some role there. Convenience sampling is simply using whatever sample is available at the moment (fast, cheap and dirty) and passing it off as a representative sample.
Probability-based sampling is the universal approach: each possible different sample of a given size contained within a population has an equal chance of being selected. A purely random approach is the simplest but has some limitations (uncertain accessibility of items in the sample, for instance, in choosing random people and trying to contact them).
Stratified random sampling splits the group into multiple mutually exclusive subgroups (AKA strata) and gets random items from each of these subgroups. This is done to ensure representativeness and reduce sampling error by grouping together sample items more alike with respect to a characteristic under investigation.
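The difference between simple random and stratified sampling can be sketched in a few lines. This is a minimal illustration, not a survey-grade implementation: the urban/rural population, the function names, and the proportional allocation rule are all assumptions made for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=0):
    """Draw n items so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n, seed=0):
    """Draw from each stratum in proportion to its share of the population.

    Assumption: proportional allocation with simple rounding, so the total
    may differ slightly from n when the shares do not divide evenly.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 900 urban and 100 rural residents.
population = [("urban", i) for i in range(900)] + [("rural", i) for i in range(100)]

srs = simple_random_sample(population, 50)
strat = stratified_sample(population, lambda person: person[0], 50)

# The stratified sample always holds exactly 5 rural residents (10% of 50);
# the simple random sample holds 5 only on average.
rural_in_srs = sum(1 for person in srs if person[0] == "rural")
rural_in_strat = sum(1 for person in strat if person[0] == "rural")
print(rural_in_srs, rural_in_strat)
```

The stratified draw fixes the rural share by construction, which is the sense in which stratification protects representativeness for the characteristic used to form the strata.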
Cluster sampling divides the large group into clusters and then draws probability samples from them.
There’s no universal rule for the sample size, as it depends on:
amount of variation in the population or the rarity of a specified event
degree of precision required
the kind of sample design with practical consideration like the cost, time, and manpower.
The more variable the population, the larger the sample must be to capture it. This is particularly true when looking for links to rare events (say, certain types of cancer).
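To make the link between variability, precision, and sample size concrete, here is a rough sketch using the textbook formula for estimating a mean, n = (z * sigma / E)^2. It assumes simple random sampling from a large population and a normal approximation; the function name and the numbers are illustrative, not taken from the book.

```python
import math

def required_sample_size(std_dev, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean within +/- margin_of_error.

    Assumptions: simple random sampling, large population, normal
    approximation; z=1.96 corresponds to roughly 95% confidence.
    """
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Same precision, different amounts of variation in the population.
low_variation = required_sample_size(std_dev=5, margin_of_error=1)
high_variation = required_sample_size(std_dev=20, margin_of_error=1)

# Same variation, but twice the precision (half the margin of error).
tighter = required_sample_size(std_dev=5, margin_of_error=0.5)

print(low_variation, high_variation, tighter)  # 97 1537 385
```

Quadrupling the standard deviation multiplies the required sample by roughly sixteen, and halving the margin of error roughly quadruples it.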
MK: No human characteristics can be inferred from a sample size a typical human (i.e., not a researcher or a professional) will come across in their daily life. But there can be a self-selection bias on, say, Tinder, making the sample not truly random.
Every probability-based conclusion has its confidence interval. A statistical fact obtained from a sample is a fact only insofar as that specific sample is concerned.
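As an illustration of what such an interval looks like, the sketch below computes an approximate 95% confidence interval for a proportion using the normal (Wald) approximation. The survey numbers are made up, and the approximation is only reasonable for fairly large samples and proportions not too close to 0 or 1.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Approximate confidence interval for a proportion (Wald method)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical survey: 40 of 100 respondents answer "yes".
low, high = proportion_confidence_interval(successes=40, n=100)
print(f"{low:.3f} to {high:.3f}")  # roughly 0.304 to 0.496

# The same 40% share observed in a sample ten times larger.
low_big, high_big = proportion_confidence_interval(successes=400, n=1000)
print(f"{low_big:.3f} to {high_big:.3f}")
```

The point estimate is 40% in both cases, but only the larger sample supports a narrow statement about the whole population.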
13/ Relationships: Causal and Casual
The most useful part of statistics is finding the relationship between an action or condition A and a result B. This is the kind of practical knowledge people are looking for, because if B is cancer, then eliminating A reduces or eliminates the chance of B. So, it can be very personal.
There are 4 types of causes: necessary (must be present for the event to occur), sufficient (will bring about the event alone and by itself), necessary and sufficient (will bring about the result and without which the event won’t occur; very rare in real life) and contributory (B is more likely to occur if A is present than when it’s not).
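The four definitions can be expressed as simple checks over observed cases. The sketch below is only a labelling aid under a big assumption: it reports which type of cause the observations are consistent with, and says nothing about whether A actually causes B. The function name and the sample records are hypothetical.

```python
def classify_condition(observations):
    """Label how condition A relates to event B in the observed cases.

    observations: list of (a_present, b_occurred) boolean pairs.
    The label describes the data only; it is not proof of causation.
    """
    with_a = [b for a, b in observations if a]
    without_a = [b for a, b in observations if not a]

    # Necessary: B is observed, but never in the absence of A.
    necessary = any(with_a) and not any(without_a)
    # Sufficient: whenever A is present, B follows.
    sufficient = len(with_a) > 0 and all(with_a)

    if necessary and sufficient:
        return "necessary and sufficient"
    if necessary:
        return "necessary"
    if sufficient:
        return "sufficient"

    # Contributory: B is more frequent with A than without it.
    rate_with = sum(with_a) / len(with_a) if with_a else 0.0
    rate_without = sum(without_a) / len(without_a) if without_a else 0.0
    return "contributory" if rate_with > rate_without else "no apparent link"

# B never happens without A, but A alone does not guarantee B.
necessary_case = classify_condition([(True, True), (True, False), (False, False)])
# A always brings B, but B also happens without A.
sufficient_case = classify_condition([(True, True), (True, True), (False, True), (False, False)])
# B is merely more frequent when A is present.
contributory_case = classify_condition(
    [(True, True), (True, False), (False, True), (False, False), (False, False)]
)
print(necessary_case, sufficient_case, contributory_case)
```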
The fallacy of the common sense is that people are looking for a single contributory cause of a certain event, while there may be none or multiple causes. [MK: this indeed creates entertainment value and a nice talking point for the laymen. The danger is when these straightforward explanations make it to the government policies.]
Oversimplification is very tempting, but it usually means throwing the baby out with the bathwater. It may be helpful for guessing the major factor when running experiments but is definitely harmful when it comes to presenting the conclusions to the public.
The “before and after” advertising cliches are a goldmine of faulty cause and effect claims. [MK: however, let’s not forget that change management, yes, that change management, is also prone to this single solution bias. It goes like this: unfreeze —> change —> refreeze. Whatever the new state of the system is, can be confidently traced back to the “change” step. This may work for individual departments or, say, product lines, but on a larger scale the outcome can’t be attributed solely to the change made.]
When it comes to people matters, politicians and CEOs, say, are hugely tempted to attribute all the results of their tenures to their own actions, as if those actions were the primary driver. This works both ways: political opponents can be blamed for anything bad that happened under their watch, too.
When searching for an explanation of some event, one can employ testing by method of agreement: finding the characteristic present in all subjects and using it as a hypothetical cause of the event (even if there may be more independent characteristics).
A more precise method is the method of difference (similar to A/B tests): identify a characteristic present in all cases where the phenomenon occurs, ensure this characteristic is absent from all cases in which the phenomenon doesn’t occur, and ensure all other characteristics are the same for all the cases. It becomes a controlled experiment, which is almost impossible to pull off in real life (time, money, people, quality of data).
Concomitant variations: if two phenomena vary together (i.e., respond to a cause), one of these phenomena causes the other or is in some way connected to a cause. It’s not always the case, of course, giving way to the post hoc fallacy (i.e., if B occurred after A, A has to be the cause of B; or if X and Y are correlated to a statistically significant degree, one is the cause and the other is the effect).
Statistically Correlated Variables
This shallow dive into statistics is needed to explain that while X and Y can be correlated (i.e., change together), one doesn’t necessarily cause the other.
Case 1: X and Y correlate because X is the cause of Y. That’s perfect.
Case 2: X and Y correlate because Y is the cause of X. The most curious example is when sales volumes drive ad spend and not the other way around. It’s far from impossible. Another example is that families with 1 car drive less than families with 2 cars; the extra car doesn’t cause more driving, it is bought because of a need for more driving that couldn’t be satisfied with 1 car.
Case 3: X and Y correlate because they interact with each other. The supply and demand interdependency is a great example of this.
Case 4: X and Y correlate by chance. First of all, the data might not be as clean as it seems: people tend to remember strange coincidences while completely ignoring instances when such coincidences didn’t take place. Secondly, some data sets do correlate without an underlying reason, purely by chance. Check here for some bizarre correlations.
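Chance correlation is easy to reproduce. The sketch below generates 40 short series of pure random noise and looks for the most strongly correlated pair; the series count, length, and seed are arbitrary choices for the demonstration, and `pearson` is a plain hand-written Pearson coefficient rather than a library call.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)

# 40 unrelated series of 10 observations each: independent noise by construction.
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(40)]

# 40 series give 780 distinct pairs to compare.
strongest = max(abs(pearson(a, b)) for a, b in combinations(series, 2))
print(f"strongest correlation among unrelated series: {strongest:.2f}")
```

With hundreds of pairs and only ten points per series, some pair will look impressively correlated even though nothing connects the series, which is what happens when enough data sets are compared against each other.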
Case 5: X and Y correlate because they are both effects of Z, which is outside the analysis. The most common external cause is the population growth driving up all kinds of numbers. [MK: say, there’s a correlation between the growth in the number of schools and cars on the road, but both are driven by the population growth.]
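The schools-and-cars example can be simulated. In the sketch below, 200 hypothetical towns differ in population, and both the number of schools and the number of cars are generated from population plus independent noise; all rates and noise levels are invented for the illustration. Dividing by population is used here as a crude way of taking Z out of the picture.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(7)

# Z: population of 200 hypothetical towns.
populations = [rng.uniform(1_000, 100_000) for _ in range(200)]

# X and Y each depend on Z only, with independent 10% noise.
schools = [p * 0.001 * (1 + rng.gauss(0, 0.1)) for p in populations]
cars = [p * 0.5 * (1 + rng.gauss(0, 0.1)) for p in populations]

raw_r = pearson(schools, cars)

# Per-capita figures remove the common driver.
schools_per_capita = [s / p for s, p in zip(schools, populations)]
cars_per_capita = [c / p for c, p in zip(cars, populations)]
per_capita_r = pearson(schools_per_capita, cars_per_capita)

print(f"raw: {raw_r:.2f}, per capita: {per_capita_r:.2f}")
```

The raw counts correlate strongly, while the per-capita figures are close to uncorrelated, because the only thing schools and cars share in this setup is the size of the town.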
It’s wrong to extrapolate beyond the range of the observed data. This also has a lot to do with the Uncritical projection of trends from Chapter 4. Most things can’t go to infinity or break the laws of physics, so there’s only a usable range of values where the regression makes sense.
The regression fallacy usually sounds like this: natural characteristics over time tend to regress to the mean. (E.g., tall parents have shorter kids or vice versa.) It’s incorrect because this would mean that there’s a natural trend towards the reduction in dispersion and this is not how Mother Nature works. A number of observed examples of high-IQ individuals having mediocre-IQ children don’t prove the fallacy; it’s the existence of the high-IQ individuals that stands out.
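A small simulation shows why regression toward the mean does not imply shrinking dispersion. It assumes a deliberately simple model: parent and child heights are measured in standard deviations from the mean, correlate at a hypothetical 0.5, and have the same spread by construction.

```python
import random
import statistics

rng = random.Random(1)
r = 0.5  # assumed parent-child correlation, chosen for illustration

# Heights in standardized units (0 = population mean, 1 = one standard deviation).
parents = [rng.gauss(0, 1) for _ in range(20_000)]
# Each child shares part of the parent's deviation plus independent variation,
# scaled so that children have the same overall spread as parents.
children = [r * p + (1 - r * r) ** 0.5 * rng.gauss(0, 1) for p in parents]

# Look only at unusually tall parents and their children.
tall_pairs = [(p, c) for p, c in zip(parents, children) if p > 1.5]
mean_tall_parents = statistics.mean([p for p, _ in tall_pairs])
mean_their_children = statistics.mean([c for _, c in tall_pairs])

spread_parents = statistics.pstdev(parents)
spread_children = statistics.pstdev(children)

print(f"tall parents: {mean_tall_parents:.2f}, their children: {mean_their_children:.2f}")
print(f"spread of parents: {spread_parents:.2f}, spread of children: {spread_children:.2f}")
```

Children of very tall parents are, on average, closer to the mean than their parents, yet the generation as a whole is just as dispersed. The new extremes simply come from other families.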
In business there are always some firms earning top profits at this very moment; there’s no guarantee that their past performance indicates future performance: in industries with at least some competition, getting to the top of the distribution (relative to the other firms in the sector) and staying there is quite uncommon. There are some short-term factors allowing mediocre companies to earn higher profits than others, if only for some time.
Faulty deduction is very common, too. What’s true for the aggregate group may be inappropriate for a subgroup. If the power generation in the country is 5% above the aggregate demand, some parts of the country may still experience power shortages. Exports being 10% of the GNP may look unimportant unless one understands that there are lots of industries involved, tens of millions of jobs, and for them these 10% are very much important.
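The power example comes down to simple arithmetic. The sketch below uses four made-up regions and assumes, for simplicity, that surplus power cannot be moved between them.

```python
# Hypothetical regions: (generation, demand) in the same arbitrary units.
# Assumption: no transmission between regions.
regions = {
    "North": (400, 300),
    "South": (250, 300),
    "East": (200, 200),
    "West": (200, 200),
}

total_generation = sum(generation for generation, _ in regions.values())
total_demand = sum(demand for _, demand in regions.values())
surplus_pct = 100 * (total_generation - total_demand) / total_demand

# Regions where local generation falls short of local demand.
shortages = [name for name, (generation, demand) in regions.items() if generation < demand]

print(f"national surplus: {surplus_pct:.0f}%")  # national surplus: 5%
print("regions with shortages:", shortages)     # regions with shortages: ['South']
```

The aggregate shows a comfortable 5% surplus, yet one region is short by a sixth of its demand.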
Generally, the population averages (1 in 14 people has a drinking problem) can’t be applied to smaller groups of people collected according to some principle, not randomly. And these smaller groups don’t necessarily possess the same characteristics as the general population.
Statistics can be misleading if some important explanatory information is deliberately withheld. A good example is that in US corporations PR usually reports to the president / Chair of the Board. What gets omitted is that the majority of corporations have 4 or fewer officers, and the president is the only person in charge of the company image. [MK: I think the author meant the CEO, not the president. That’s how I read this.]
Confusing the reader with superfluous data, which can be dropped without any impact on the message, is another trick. Many people do it just to pad their message as they believe the numbers are too dry to be presented as is, but sometimes the effect is that the unnecessary information is used to redirect one’s attention from what the facts really imply.
Crazy ratios make for some dramatic or comical conclusions. Often the fallacy involves using fractions when nothing other than an integer is appropriate. It’s no use knowing that in any given year 1/60 of a child is born to every citizen. Or that men on average buy 1/3 of a pyjama every year.
Using black-and-white reasoning (aka false dichotomy) helps with discussing extremes, but with nothing else. Ignoring the shades of grey is intellectually unconscionable. Any reasoning leading the reader to one of the extremes is manipulative.
Playing with words that sound the same but have slightly different meanings (aka equivocation) is another manipulative technique. Businesses and governments use terms in their strict sense, but in the media these words can have either the commonly used meaning (“unemployed” in the policy sense is far different from the common meaning in “he’s unemployed most of the time”) or a literal meaning (especially when needed to amplify the message, making it more emotional).