Thursday, February 27, 2020

Ludwig Siegele On Data

Ludwig Siegele's latest Special report for The Economist is entitled A deluge of data is giving rise to a new economy. He provides an excellent overview of the impact the availability of vast amounts of data is having on business. But follow me below the fold for my two quibbles.

First, Siegele, like many others, elides the difference between IDC's "data generated" numbers and the data that can actually be stored in data centers and processed, for example to train AIs, manage inventories, or track people. He writes:
If the amount of data generated around the world is any guide, this new economy is growing fast. The first human genome (three gigabytes of data, which nearly fills a dvd) was sequenced 17 years ago; in April, 23andMe, a firm which offers genetic testing, claimed more than 10m customers. The latest autonomous vehicles produce up to 30 terabytes for every eight hours of driving (or some 6,400 dvds). idc, a market-research firm, estimates the world will generate about 90 zettabytes (19trn dvds) this year and next (see chart), more than all data produced since the advent of computers.
But then, without drawing a distinction between "generated data" and "stored data", he writes:
Many firms want to use data to infuse their corporate applications with ai. They have built central repositories such as “data lakes”, which hold all kinds of digital information.
The data that can be stored in order to extract value from it in the future can be no more than (and is actually much less than) the amount of storage manufactured, which is many orders of magnitude smaller than the amount generated. I've been making this point at least since 2016's Where Did All Those Bits Go?:
The numbers projected by IDC for "data generated" have always greatly exceeded the numbers actually reported for storage media manufactured in a given year, which in turn as discussed above exaggerate the capacity added to the world's storage infrastructure. So where were the extra projected bytes stored?

The assumption behind "demand exceeds supply" is that every byte of "data generated" in the IDC report is a byte of demand for permanent storage capacity. In a world where storage was free there would still be much data generated that was never intended to be stored for any length of time, and would thus not represent demand for storage media.
If the value to be extracted by storing a lot more data were large, customers would pay more for storage and more would be manufactured. But the results for storage manufacturers don't show huge margins driving large investments in capacity.
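The bound can be written as a toy model. This is a minimal sketch under stated assumptions: the function name `storable_bytes` and its inputs are hypothetical, and apart from the 90 ZB figure from the quoted IDC estimate the numbers are placeholders chosen only to show the cap binding, not estimates of retention or of storage media shipped.

```python
def storable_bytes(generated: float, retained_fraction: float, manufactured: float) -> float:
    """Upper bound on bytes that can be kept for later analysis in a year (toy model).

    generated:         bytes generated, e.g. an IDC-style "data generated" figure
    retained_fraction: share of generated data anyone intends to keep (hypothetical)
    manufactured:      bytes of storage media manufactured that year (hypothetical)
    """
    if not 0.0 <= retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be between 0 and 1")
    wanted = generated * retained_fraction  # demand for storage, not data generated
    return min(wanted, manufactured)        # can never exceed the media that exists

ZB = 10**21
# Placeholder inputs, not estimates: only the 90 ZB comes from the quoted IDC figure.
print(storable_bytes(90 * ZB, 0.5, 1 * ZB) / ZB)  # → 1.0
```

However much is "generated", the stored total is capped by the media manufactured, and in practice it is smaller still because not every byte is worth keeping.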

So the value that can be extracted from the average byte of IDC's "data generated" number is actually very low. This is an argument for edge computing: compress the data from sensors enough that it can be communicated over the available bandwidth and stored in the available storage. The Large Hadron Collider is an example of this. Much of its computing infrastructure is devoted to throwing away data flowing from the instruments, so that the limited write bandwidth and storage capacity are used only for the events that are potentially interesting.
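The kind of filtering described above can be sketched in a few lines. This is a minimal illustration, not how the LHC does it (its real trigger systems are multi-stage and far more elaborate): the `Event` type, the `energy` field used as an "interest" score, and the per-window byte budget are all hypothetical assumptions, and greedy selection by score is just one simple policy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Event:
    payload: bytes
    energy: float  # hypothetical feature used to judge how interesting an event is

def select_for_upload(events: Iterable[Event], budget_bytes: int,
                      score: Callable[[Event], float]) -> List[Event]:
    """Keep the most interesting events that fit the uplink/storage budget; drop the rest."""
    kept: List[Event] = []
    used = 0
    # Greedy: consider events from most to least interesting.
    for ev in sorted(events, key=score, reverse=True):
        size = len(ev.payload)
        if used + size <= budget_bytes:
            kept.append(ev)
            used += size
    return kept

# Three 400-byte events, but only 1,000 bytes of budget: the least interesting is discarded.
events = [Event(b"x" * 400, 1.0), Event(b"x" * 400, 9.0), Event(b"x" * 400, 5.0)]
kept = select_for_upload(events, budget_bytes=1000, score=lambda e: e.energy)
print([e.energy for e in kept])  # [9.0, 5.0]
```

The budget, not the volume of sensor output, determines how much leaves the edge; everything else is thrown away before it ever consumes bandwidth or storage.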

Second, in discussing the connectivity for edge computing, Siegele writes:
Ericsson, a maker of network gear, predicts that the number of iot devices will reach 25bn by 2025, up from 11bn in 2019. Such an estimate may sound self-serving, but this explosion is the likely outcome of a big shift in how data is collected. Currently, many devices are tethered by cable. Increasingly, they will be connected wirelessly. 5g, the next generation of mobile technology, is designed to support 1m connections per square kilometre, meaning that in Manhattan alone there could be 60m connections. Ericsson estimates that mobile networks will carry 160 exabytes of data globally each month by 2025, four times the current amount.
The idea that 5G is there to provide bandwidth to the IoT has a snag. Combine 5G's 2Gb/s bandwidth with the IoT's notorious lack of security and you have a disaster. For there to be billions of IoT devices, they need to be very cheap, too cheap for effective software support. As I write this, we have an example of the problem in Dan Goodin's Flaw in billions of Wi-Fi devices left communications open to eavesdropping:
Billions of devices—many of them already patched—are affected by a Wi-Fi vulnerability that allows nearby attackers to decrypt sensitive data sent over the air, researchers said on Wednesday at the RSA security conference.
Manufacturers have made patches available for most or all of the affected devices, but it’s not clear how many devices have installed the patches. Of greatest concern are vulnerable wireless routers, which often go unpatched indefinitely.
Note that the phones and tablets (hundreds of dollars) are very likely to have been patched, whereas the routers (tens of dollars) are unlikely to have been. The projected numbers of IoT devices imply prices of a few dollars each, so those devices are even less likely to get patched.

As I wrote recently:
The IoT has proliferated for two reasons, the Things are very cheap and connecting them to the Internet is unregulated, so ISPs cannot impose hassles. But connecting a Thing to the 5G Internet will require a data plan from the carrier, so they will be able to impose requirements, and thus costs. Among the requirements will have to be that the Things have UL certification, adequate security and support, including timely software updates for their presumably long connected life. It is precisely the lack of these expensive attributes that have made the IoT so ubiquitous and such a security dumpster-fire!
Thus the rosy 5G-IoT scenario is unlikely. And there's something odd when Siegele writes:
the number of iot devices will reach 25bn by 2025, up from 11bn in 2019 ...  5g, the next generation of mobile technology, is designed to support 1m connections per square kilometre, meaning that in Manhattan alone there could be 60m connections.
About 1.6M people live in Manhattan, but the daytime population includes another 1.5M commuters, plus an average of 0.2M tourists, about 3.3M people in all. The figures Siegele quotes from Ericsson suggest that in 2025 the 8.1B humans will average around 3 devices each. Filling 60M connections would take about 18 devices for each of Manhattan's roughly 3.3M daytime occupants, about 6 times the world average. Manhattan humans will have a lot more than average, probably a lot more than 6 times the world average.
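The arithmetic behind this comparison, using only the figures in the post and the quoted Ericsson estimate, can be checked directly. The variable names are illustrative; the 60M figure is the quote's 1m connections per square kilometre applied to Manhattan.

```python
# Figures from the post and the quoted Ericsson estimate.
iot_devices_2025 = 25e9
world_population_2025 = 8.1e9
manhattan_connections = 60e6                 # 5G design density (1m/km^2) applied to Manhattan
manhattan_daytime = 1.6e6 + 1.5e6 + 0.2e6    # residents + commuters + tourists

world_avg = iot_devices_2025 / world_population_2025
manhattan_per_person = manhattan_connections / manhattan_daytime

print(f"world average: {world_avg:.1f} devices per person")                  # 3.1
print(f"Manhattan, at 60M connections: {manhattan_per_person:.1f} per person")  # 18.2
print(f"ratio: {manhattan_per_person / world_avg:.1f}x")                     # 5.9x
```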
