WMDs (Weapons of Math Destruction) are algorithms that judge people on many criteria. But they're opaque, unquestioned, and unaccountable, and they operate at a scale to sort, target, or "optimize" millions of people. (direct quote)
1/ Bomb Parts: What is a Model?
A model is just an abstract representation of some process, usually used to predict outcomes and/or explain past performance.
The people building models often lack the data they need to predict the subject's behaviour, so they use proxies, drawing statistical correlations from whatever is available about the person: a zip code, a credit score, language patterns, etc. Using some of this data (e.g., a zip code as a stand-in for race) is discriminatory and may be illegal.
Models are simplifications, so they will never completely describe reality. But if the right variables and constraints are used, and the model is "trained" (i.e., the values in its formulas are adjusted through repeated "runs" of the model against known outcomes), there is a lot of practical value in it, in the form of probabilities of expected outcomes.
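To make the "repeated running" idea concrete, here is a minimal sketch, not from the book, with entirely made-up features and data: a toy repayment scorer whose weights are nudged on every pass over the examples until its output becomes a usable probability.

```python
import math

# A toy scorer (hypothetical features and data): "training" means repeatedly
# running over the examples and adjusting the values in the formula.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Each example: (proxy features, observed outcome 1 = repaid, 0 = defaulted)
data = [([0.9, 0.2], 1), ([0.8, 0.3], 1), ([0.4, 0.9], 0),
        ([0.3, 0.8], 0), ([0.7, 0.4], 1), ([0.2, 0.7], 0)]

weights = [0.0, 0.0]
bias = 0.0
lr = 0.5

# Repeated runs: after each example, nudge the weights so the predicted
# probability moves toward the observed outcome.
for _ in range(1000):
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        error = p - y
        weights = [w - lr * error * xi for w, xi in zip(weights, x)]
        bias -= lr * error

# The output is a probability of the expected outcome, not a verdict.
new_person = [0.6, 0.5]
print(sigmoid(sum(w * xi for w, xi in zip(weights, new_person)) + bias))
```

The point is only the mechanics: the formula's values are adjusted run after run, and the final output is a probability, not a certainty.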
Whatever is not included in the model (its "blind spots") indicates what the model's creator didn't find useful. These omissions can be deliberate or a result of poor model construction, laziness, or negligence. A model is only useful if its author makes it clear what was omitted and why.
Every model should make explicit what it is trying to accomplish: optimize a process, predict future results, ensure sufficient representation of a certain group of people, etc. A model's objective reflects not only the task at hand, but also its author and, if it is produced in a corporate environment, the corporate priorities.
Models have a limited useful life: over time they become less accurate as the underlying data drifts and unexpected variables emerge.
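A minimal sketch of that aging, with invented numbers: a decision threshold calibrated on last year's data keeps being applied while the population it scores drifts, and accuracy quietly degrades.

```python
import random

# Hypothetical example: a fixed threshold rule applied before and after the
# scored population drifts downward.
random.seed(1)

def make_data(shift, n=1000):
    # Scores cluster around 70 for "good" outcomes and 50 for "bad" ones;
    # `shift` moves both groups, mimicking drift in the population.
    data = []
    for _ in range(n):
        good = random.random() < 0.5
        mean = (70 if good else 50) + shift
        data.append((random.gauss(mean, 8), good))
    return data

THRESHOLD = 60  # calibrated on the old, unshifted data

def accuracy(data):
    return sum((score >= THRESHOLD) == good for score, good in data) / len(data)

print("accuracy on old data: ", round(accuracy(make_data(shift=0)), 3))
print("accuracy after drift: ", round(accuracy(make_data(shift=-15)), 3))
```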
The danger of simple models is that they may become persuasive enough to morph into beliefs (racism is one example); at that stage they are very hard to change because of confirmation bias. Such simplified beliefs can literally decide whether a person lives or dies, or at least whether they get a longer prison sentence.
Supposedly, models that are blind to bias will bring great improvements in fairness and consistency. But it first has to be proven that they don't merely camouflage these biases with technology (say, a supposedly race-blind model trained by a racist). Metadata (i.e., data describing other data) reveals the needed details about a person without asking direct questions that might be illegal. Bias usually hits where it hurts the most.
The outputs of models are rarely questioned, even if they are clearly biased. The unfair outputs of one model may themselves become inputs into similar models (an ex-convict isn't hired, can't support themselves, steals, gets a longer sentence, and thus an even higher risk of being incarcerated again and again). This feedback loop is a typical example of a WMD.
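A minimal sketch of that feedback loop, with all numbers invented: a score above the threshold triggers a punitive decision, and the consequence of that decision is fed back as new data that raises the next score, so two nearly identical people drift to opposite extremes.

```python
# Hypothetical feedback loop: the model's decision becomes the model's input.
THRESHOLD = 0.5

def next_score(score):
    denied = score > THRESHOLD          # model-driven decision (e.g., not hired)
    if denied:
        return min(1.0, score + 0.1)    # record worsens, score rises further
    return max(0.0, score - 0.05)       # stable income, score slowly improves

for start in (0.45, 0.55):              # two people, nearly identical at the start
    score = start
    path = [round(score, 2)]
    for _ in range(6):
        score = next_score(score)
        path.append(round(score, 2))
    print(f"start={start}: {path}")
```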
Models can be used dishonestly by not revealing the possible consequences to the person being profiled or questioned. This is true in both law-enforcement and corporate settings. It's quite clear why: as with security cameras, whoever knows the setup knows how to avoid being caught. At the same time, even the most honest model can be distrusted for the mere reason of its existence.
Intellectual property rights are not helping create transparency: it's very tempting to hide the inner workings and the suspected deficiencies of a model by claiming it is proprietary. This may be a fair point: reverse engineering a model can be very hard, and if the model is self-learning (i.e., it incorporates new data every time it runs), it becomes an impossible task given the limited number of external observations.
Models are by nature probabilistic, i.e., there will inevitably be cases when they produce false positives or false negatives, and it is miserable to be the person on whom the model misfired. A wrong credit score can mean both reduced job prospects and higher mortgage payments. A single model glitch can propagate through an interconnected system and affect a person's entire life. The danger is that decisions based on model outputs usually can't be appealed, which creates the worst kind of unaccountability.
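A minimal sketch of why a probabilistic model always misfires on someone, using made-up scores: whatever decision threshold is chosen, some deserving cases land below it (false negatives) and some undeserving ones land above it (false positives).

```python
import random

# Hypothetical scores for 1000 people; the threshold decides approve/reject.
random.seed(2)
THRESHOLD = 0.5

cases = [(random.gauss(0.65 if truly_ok else 0.35, 0.15), truly_ok)
         for truly_ok in (True, False) for _ in range(500)]

fp = sum(score >= THRESHOLD and not ok for score, ok in cases)   # wrongly approved
fn = sum(score < THRESHOLD and ok for score, ok in cases)        # wrongly rejected
print(f"false positives: {fp}, false negatives: {fn}, out of {len(cases)} cases")
# Each false negative is a real person denied a loan or a job by mistake.
```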
Part 2.
Famous psychologists and neuroscientists claim that a human being has two "cognition systems": S1, a probabilistic engine working on associations learnt from past experience, and S2, the true intelligence, i.e., the ability to formally reason about prerequisites, logical connections, and conclusions. It is funny that what we call "artificial intelligence" is purely S1, while when we talk about the intelligence of a human being we normally mean the ability to activate and use S2.
A huge problem with various profiling and prediction techniques is that they conserve the status quo. It is a fact of the world that some people commit more crime than others. However, we all understand that the reason is not that some people are inferior, but that they are put in situations where they lack the skills or opportunities to earn a living without crime. A sensible policy would reflect this; a probabilistic automaton can only say "beer drinking + living in this area + low education = criminal". Excluding some data would not help, because the system will automatically find proxy variables, as the sketch below illustrates (e.g., it may switch from the registration address, which is illegal to use in many countries, to the geography of Uber rides or Facebook check-ins, which is information one can easily buy).
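A minimal sketch of that proxy leakage, with entirely synthetic data: the protected attribute is never given to the model, but a correlated feature (here, a made-up "check-in area" code) lets it be reconstructed anyway.

```python
import random

# Synthetic population: `group` is the protected attribute the model never sees;
# `area` is an innocuous-looking feature that happens to correlate with it.
random.seed(3)

people = []
for _ in range(10_000):
    group = random.random() < 0.5
    p_area_one = 0.8 if group else 0.2
    area = 1 if random.random() < p_area_one else 0
    people.append((group, area))

# Guessing the protected attribute from the proxy alone is already quite accurate,
# so excluding the attribute itself changes little.
correct = sum((area == 1) == group for group, area in people)
print(f"protected attribute recovered from the proxy: {correct / len(people):.0%} accurate")
```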
The genie is out of the bottle, so banning algorithms is not the way to go. And I don't think anyone has a ready solution to this problem.