Weapons of Math Destruction 5/6

Cathy O'Neil

Max Kraynov

Aug 16, 2021

Part 4.

8/ Collateral Damage

While official credit scores are regulated to the extent that one can’t target people by their credit scores alone, their proxies (e-scores) are not, and they are widely used to profile, say, website visitors. Such profiling can include financial situation, location (matched against post codes with higher delinquency rates), clickstreams and interests. All of this helps target offers based on these scores (I seem to be doing well, so here’s an ad for a credit card with 12.99% APR, while my colleague from a less privileged suburb will get an offer for 18.99% APR). Offers, responses to them and expected customer lifetime value are all fed into a model, which does have a feedback loop, and both targeting and offers get adjusted for revenue maximization.
Actually, post codes may even be redundant: since there’s a sizeable correlation between credit scores and education, and between education and class and race, it’s possible to guess one’s credit score range, class and race from one’s spelling mistakes and letter capitalisation.
What started as an attempt to eliminate “buckets” (i.e., grouping people who display similar attributes or are “like each other”) in credit reporting has come full circle: web targeting brings us back to identifying the buckets we would fit in.
Buckets work most of the time and offer a fast and efficient way of navigating through a series of internal targeting if-else conditions. But no one is immune from being misclassified and put in the wrong bucket. [MK: And for the avoidance of doubt, the “wrong” bucket is almost never a good thing: it usually means being passed over for promotion, having a loan application declined or not getting accepted to a good school.] This is where people who have no idea how they were scored, and who have no one to complain to, become collateral damage.
Buckets may punish people for something they have nothing to do with: if the algorithm assigns high weight to one’s social network (“birds of a feather flock together”), it may lift the prospects of someone with a “good” network of ex-classmates, but completely sink someone hardworking who tries to break out of the circle of “friends” from an inner-city school.
While credit scores are regulated, hiring practices relying on credit scores as proxies for consistent behaviour and good judgement are not. And as any HR manager knows, anyone can be rejected if needed, so the credit score may not even appear as the reason for the rejection. Just like refusing a breathalyser, refusing an employer access to one’s credit report is treated as an automatic offence.
Let’s be clear: there are genuine life circumstances where missing a credit card payment or being late with a bill is a lesser evil than, say, not being able to pay for an emergency surgery. Or losing a paycheck-to-paycheck job, which also happens. So it’s not completely moral to use a credit score as a proxy for anything but an indication of one’s wealth.
One can be unlucky enough to share a name with, say, a known terrorist, which almost guarantees a place on the “no fly” list unless they apply for a redress number. Just think about it: one has to actively prove one’s innocence simply because some algorithm used by authorities gave a false positive.
Companies assembling profiles for every person do so by mixing and matching data from multiple databases: public, private and sometimes illegally obtained. Anyone who knows what ETL means (“extract, transform, load” for those who don’t) will immediately pull the fire alarm and explain that creating a super-data set out of multiple data sets is prone to errors simply by virtue of data never being 100% correct even within a single database, let alone multiple ones.
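To make the ETL concern a bit more tangible, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each source database is correct for a given person with the same probability and that errors are independent across sources. The 95% figure and the function name `merged_accuracy` are hypothetical and do not come from the book.

```python
def merged_accuracy(per_source_accuracy: float, n_sources: int) -> float:
    """Chance that a merged profile contains no errors at all.

    Illustrative assumption: every source is independently correct
    with the same probability, so the probabilities multiply.
    """
    return per_source_accuracy ** n_sources


if __name__ == "__main__":
    # 0.95 is a made-up per-source accuracy, used only as an example.
    for n in (1, 3, 5, 10):
        share = merged_accuracy(0.95, n)
        print(f"{n:>2} sources: {share:.1%} of merged profiles are error-free")
```

Under these simplified assumptions, merging ten sources that are each 95% correct leaves only about 60% of profiles fully error-free. Real data errors are not independent, so this is an intuition pump rather than an estimate.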
This again creates a chain of no accountability: data aggregators don’t deal with end users; they deal with businesses relying on that data to make decisions about these end users. Sometimes it’s possible to fix data issues via court orders, but only if the reason for the decision is provided to the end user.
The most common data issues are duplicate identities (my data is spread across 2+ of my profiles that should have been merged) and blended identities (my data contains data from someone else whom the system took for me because of a full match on name and/or date of birth, an incorrectly entered identifier such as a Social Security Number, or even because they owned my mobile phone number 10 years ago).
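Both failure modes can be reproduced with a deliberately naive matcher. The sketch below is an assumption-laden toy, not how any real data aggregator works: the records, the sources and the `naive_key` rule (exact name plus date of birth) are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical records from three sources; every value here is invented.
RECORDS = [
    {"source": "bank", "name": "John Smith", "dob": "1980-01-01", "phone": "555-0100"},
    # Same person as the bank record, but the name was mistyped at entry.
    {"source": "telco", "name": "Jon Smith", "dob": "1980-01-01", "phone": "555-0100"},
    # A different person who happens to share the name and date of birth.
    {"source": "court", "name": "John Smith", "dob": "1980-01-01", "phone": "555-0199"},
]


def naive_key(record):
    """Match on exact (lower-cased) name and date of birth only."""
    return (record["name"].lower(), record["dob"])


def build_profiles(records):
    """Group record sources under whatever key the naive matcher produces."""
    profiles = defaultdict(list)
    for record in records:
        profiles[naive_key(record)].append(record["source"])
    return dict(profiles)


profiles = build_profiles(RECORDS)

if __name__ == "__main__":
    for key, sources in profiles.items():
        print(key, "->", sources)
```

The bank and court records end up in one profile (a blended identity), while the telco record with the typo becomes a separate profile for the same person (a duplicate identity). Loosening the match rule to catch the typo would only make blending more likely, which is the trade-off automated matching cannot escape.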
The core issue is automated data collection and matching. While algorithms work really well 90%+ of the time, that’s not nearly enough to rely on them completely without being irresponsible. There’s no guarantee that the algorithm will self-learn and automatically fix previous errors, so some errors will remain in the user profile forever.

Part 6.

