Thursday, October 8, 2015

Two In Two Days

Tuesday, Cory Doctorow pointed to "another of [Maciej Cegłowski's] barn-burning speeches". It is entitled What Happens Next Will Amaze You and it is a must-read exploration of the ecosystem of the Web and its business model of pervasive surveillance. I added a comment pointing to it on my post from last May, Preserving the Ads?, because Cegłowski goes into much more of the awfulness of the Web ecosystem than I did.

Yesterday Doctorow pointed to another of Maciej Cegłowski's barn-burning speeches. This one is entitled Haunted by Data, and it is just as much of a must-read. Doctorow is obviously a fan of Cegłowski's and now so am I. It is hard to write talks this good, and even harder to ensure that they are relevant to stuff I was posting in May. This one takes the argument of The Panopticon Is Good For You, also from May, and makes it more general and much clearer. Below the fold, details.

I argued that the big data enthusiasts in the health industry were failing to ensure, and probably had never even considered ensuring, that they had informed consent from their patients (or rather subjects, or rather victims) as to the negative consequences of the inevitable leak of the big data that was being collected about them. Doctorow writes:
Maciej Cegłowski spoke ... about the toxicity of data -- the fact that data collected is likely to leak, and that data-leaks resemble nuclear leaks in that even the "dilute" data (metadata or lightly contaminated boiler suits and tools) are still deadly when enough of them leak out (I've been using this metaphor since 2008).
Cegłowski writes:
In particular, I'd like to draw a parallel between what we're doing and nuclear energy, another technology whose beneficial uses we could never quite untangle from the harmful ones. A singular problem of nuclear power is that it generated deadly waste whose lifespan was far longer than the institutions we could build to guard it. Nuclear waste remains dangerous for many thousands of years. The data we're collecting about people has this same odd property. Tech companies come and go, not to mention the fact that we share and sell personal data promiscuously. But information about people retains its power as long as those people are alive, and sometimes as long as their children are alive. No one knows what will become of sites like Twitter in five years or ten. But the data those sites own will retain the power to hurt for decades.
...
In a world where everything is tracked and kept forever, like the world we're for some reason building, you become hostage to the worst thing you've ever done. Whoever controls that data has power over you, whether or not they exercise it. And yet we treat this data with the utmost carelessness, as if it held no power at all.
It isn't just that the data lasts forever; the organization also becomes addicted to the flow of data:
You can't just set up an elaborate surveillance infrastructure and then decide to ignore it. These data pipelines take on an institutional life of their own, and it doesn't help that people speak of the "data driven organization" with the same religious fervor as a "Christ-centered life". The data mindset is good for some questions, but completely inadequate for others. But try arguing that with someone who insists on seeing the numbers.
In the same way an addict always wants more of the drug, the organization always wants more data. Here is Cegłowski:
The promise is that enough data will give you insight. ... There's a little bit of a con going on here. On the data side, they tell you to collect all the data you can, because they have magic algorithms to help you make sense of it. On the algorithms side, where I live, they tell us not to worry too much about our models, because they have magical data. ... The data collectors put their faith in the algorithms, and the programmers put their faith in the data.
And here is Doctorow:
Big Data's advocates believe that all this can be solved with more Big Data. This requires them to deny the privacy harms from collecting (and, inevitably, leaking) our personal information, and to assert without evidence that they can massage the data so that it can't be associated with the humans from whom it was extracted.
And, like the addict, the organization's effectiveness decays as the drug takes over:
The pharmaceutical industry has something called Eroom's Law (which is ‘Moore’s Law’ spelled backwards). It's the observation that the number of drugs discovered per billion dollars in research has dropped by half every nine years since 1950. ... This is astonishing, because the entire science of biochemistry has developed since 1950. Every step of the drug discovery pipeline has become more efficient, some by orders of magnitude, and yet overall the process is eighty times less cost-effective. This has been a bitter pill to swallow for the pharmacological industry. They bought in to the idea of big data very early on.
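As a rough check on the arithmetic in that quote, here is a minimal Python sketch. It assumes a clean exponential decline with the nine-year halving period the quote states; the function names are hypothetical and nothing here comes from Cegłowski's talk beyond the two numbers (nine years, eighty times).

```python
import math

# Halving period taken from the quote: output per research dollar
# halves every nine years. A smooth exponential is an assumption.
HALVING_PERIOD_YEARS = 9


def decline_factor(years):
    """How many times less cost-effective the process is after `years`."""
    return 2 ** (years / HALVING_PERIOD_YEARS)


def years_for_decline(factor):
    """Years of steady halving needed to reach a given decline factor."""
    return HALVING_PERIOD_YEARS * math.log2(factor)


if __name__ == "__main__":
    # Years of halving implied by the quoted "eighty times" figure.
    print(round(years_for_decline(80), 1))
```

Under that assumption an eighty-fold decline takes roughly 57 years of halving, so the two figures in the quote are consistent with a period starting in 1950 and running into the late 2000s.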
I hope this is enough to get you to read Cegłowski's talk; it's well worth your time. While you're there, read this one too.

1 comment:

David. said...

Sixteen years ago Cory Doctorow wrote Personal data is as hot as nuclear waste about the danger of data leaking. Nine years ago Maciej Cegłowski wrote Haunted by Data on the same theme:

"Nuclear waste remains dangerous for many thousands of years. The data we're collecting about people has this same odd property. Tech companies come and go, not to mention the fact that we share and sell personal data promiscuously. But information about people retains its power as long as those people are alive, and sometimes as long as their children are alive."

Now, Mark Pesce writes Data is the new uranium – incredibly powerful and amazingly dangerous:

"Most security execs know they have pools of data all over the place, and that marketing departments have built massive data-gathering and analytics engines into all customer-facing systems, and acquire more data every day.

But they're mostly unable to identify all the data they hold, and are unsure if those who collect it understand the reputational and financial risks of a data breach – blame for which lands on a CISO's desk no matter who messed up.

CISOs therefore increasingly feel that the cost of managing data sometimes exceeds its value. Those I observed have found themselves wishing for a world with less data that needs securing."

It is Groundhog Day.