Yesterday Doctorow pointed to another of Maciej Cegłowski's barn-burning speeches. This one is entitled Haunted by Data, and it is just as much of a must-read. Doctorow is obviously a fan of Cegłowski's and now so am I. It is hard to write talks this good, and even harder to ensure that they are relevant to stuff I was posting in May. This one takes the argument of The Panopticon Is Good For You, also from May, and makes it more general and much clearer. Below the fold, details.
I argued that the big data enthusiasts in the health industry were failing to, and probably had never even considered, ensuring that they had informed consent from their
Maciej Cegłowski spoke ... about the toxicity of data -- the fact that data collected is likely to leak, and that data-leaks resemble nuclear leaks in that even the "dilute" data (metadata or lightly contaminated boiler suits and tools) are still deadly when enough of them leak out (I've been using this metaphor since 2008).
Cegłowski writes:
In particular, I'd like to draw a parallel between what we're doing and nuclear energy, another technology whose beneficial uses we could never quite untangle from the harmful ones. A singular problem of nuclear power is that it generated deadly waste whose lifespan was far longer than the institutions we could build to guard it. Nuclear waste remains dangerous for many thousands of years. The data we're collecting about people has this same odd property. Tech companies come and go, not to mention the fact that we share and sell personal data promiscuously. But information about people retains its power as long as those people are alive, and sometimes as long as their children are alive. No one knows what will become of sites like Twitter in five years or ten. But the data those sites own will retain the power to hurt for decades.
It isn't just that the data lasts forever; the organization becomes addicted to the flow of data:
In a world where everything is tracked and kept forever, like the world we're for some reason building, you become hostage to the worst thing you've ever done. Whoever controls that data has power over you, whether or not they exercise it. And yet we treat this data with the utmost carelessness, as if it held no power at all.
You can't just set up an elaborate surveillance infrastructure and then decide to ignore it. These data pipelines take on an institutional life of their own, and it doesn't help that people speak of the "data driven organization" with the same religious fervor as a "Christ-centered life". The data mindset is good for some questions, but completely inadequate for others. But try arguing that with someone who insists on seeing the numbers.
In the same way an addict always wants more drug, the organization always wants more data. Here is Cegłowski:
The promise is that enough data will give you insight. ... There's a little bit of a con going on here. On the data side, they tell you to collect all the data you can, because they have magic algorithms to help you make sense of it. On the algorithms side, where I live, they tell us not to worry too much about our models, because they have magical data. ... The data collectors put their faith in the algorithms, and the programmers put their faith in the data.
And here is Doctorow:
Big Data's advocates believe that all this can be solved with more Big Data. This requires them to deny the privacy harms from collecting (and, inevitably, leaking) our personal information, and to assert without evidence that they can massage the data so that it can't be associated with the humans from whom it was extracted.
And, like the addict, the organization's effectiveness decays as the drug takes over:
Eroom's Law (which is ‘Moore’s Law’ spelled backwards). It's the observation that the number of drugs discovered per billion dollars in research has dropped by half every nine years since 1950. ... This is astonishing, because the entire science of biochemistry has developed since 1950. Every step of the drug discovery pipeline has become more efficient, some by orders of magnitude, and yet overall the process is eighty times less cost-effective. This has been a bitter pill to swallow for the pharmacological industry. They bought in to the idea of big data very early on.
I hope this is enough to get you to read Cegłowski's talk; it's well worth your time. While you're there, read this one too.