Thursday, January 11, 2018

It Isn't About The Technology

A year and a half ago I attended Brewster Kahle's Decentralized Web Summit and wrote:
I am working on a post about my reactions to the first two days (I couldn't attend the third) but it requires a good deal of thought, so it'll take a while.
As I recall, I came away from the Summit frustrated. I posted the TL;DR version of the reason half a year ago in Why Is The Web "Centralized"?:
What is the centralization that decentralized Web advocates are reacting against? Clearly, it is the domination of the Web by the FANG (Facebook, Amazon, Netflix, Google) and a few other large companies such as the cable oligopoly.

These companies came to dominate the Web for economic not technological reasons.
Yet the decentralized Web advocates persist in believing that the answer is new technologies, which suffer from the same economic problems as the existing decentralized technologies underlying the "centralized" Web we have. A decentralized technology infrastructure is necessary for a decentralized Web but it isn't sufficient. Absent an understanding of how the rest of the solution is going to work, designing the infrastructure is an academic exercise.

It is finally time for the long-delayed long-form post. I should first reiterate that I'm greatly in favor of the idea of a decentralized Web based on decentralized storage. It would be a much better world if it happened. I'm happy to dream along with my friend Herbert Van de Sompel's richly-deserved Paul Evan Peters award lecture entitled Scholarly Communication: Deconstruct and Decentralize?. He describes a potential future decentralized system of scholarly communication built on existing Web protocols. But even he prefaces the dream with a caveat that the future he describes "will most likely never exist".

I agree with Herbert about the desirability of his vision, but I also agree that it is unlikely. Below the fold I summarize Herbert's vision, then go through a long explanation of why I think he's right about the low likelihood of its coming into existence.

Herbert identifies three classes of decentralized Web technology and explains that he decided not to deal with these two:
  • Distributed file systems. Herbert is right about this. Internet-scale distributed file systems were first prototyped in the late 90s with Intermemory and OceanStore, and many successors have followed in their footsteps. None has achieved sustainability or Internet platform scale. The reasons are many; I wrote about the economic one in Is Distributed Storage Sustainable? Betteridge's Law applies, so the answer is "no".
  • Blockchains. Herbert is right about this too. Even the blockchain pioneers have to admit that, in the real world, blockchains have failed to deliver any of their promised advantages over centralized systems. In particular, as we see with Bitcoin, maintaining decentralization against economies of scale is a fundamental, unsolved problem:
    Trying by technical means to remove the need to have viable economics and governance is doomed to fail in the medium- let alone the long-term. What is needed is a solution to the economic and governance problems. Then a technology can be designed to work in that framework.
    And, as Vitalik Buterin points out, the security of blockchains depends upon decentralization:
    In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently.
Herbert's reason for disregarding distributed file systems and blockchains is that they both involve entirely new protocols. He favors the approach being pursued at MIT in Sir Tim Berners-Lee's Solid project, which builds on existing Web protocols. Herbert's long experience convinces him (and me) that this is a less risky approach. My reason is different; they both reduce to previously unsolved problems.

The basic idea of Solid is that each person would own a Web domain, the "host" part of a set of URLs that they control. These URLs would be served by a "pod", a Web server controlled by the user that implemented a whole set of Web API standards, including authentication and authorization. Browser-side apps would interact with these pods, allowing the user to:
  • Export a machine-readable profile describing the pod and its capabilities.
  • Write content for the pod.
  • Control others' access to the content of the pod.
Pods would have inboxes to receive notifications from other pods, so that, for example, if Bob writes a comment in his pod that links to a document Alice has written in her pod, a notification announcing that event appears in the inbox of Alice's pod. Alice can then link from the document in her pod to Bob's comment in his pod. In this way, users are in control of their content which, if access is allowed, can be used by Web apps elsewhere.
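As a concrete sketch of this notification flow, here is what Bob's pod might send to Alice's inbox. The URLs, function names, and payload shape are my assumptions, loosely modeled on the W3C Linked Data Notifications recommendation that the Solid stack builds on; a real pod would use ActivityStreams vocabulary and discover the inbox URL from the pod's profile.

```python
import json
import urllib.request

def make_announce_notification(actor, comment_url, target_url):
    """Build a JSON-LD payload announcing that a comment in the actor's
    pod links to a document in someone else's pod. (Simplified vocabulary.)"""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": actor,
        "object": comment_url,   # Bob's comment, stored in Bob's pod
        "target": target_url,    # Alice's document, stored in Alice's pod
    }

def send_notification(inbox_url, notification):
    """POST the notification to the target pod's inbox (hypothetical URL)."""
    req = urllib.request.Request(
        inbox_url,
        data=json.dumps(notification).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )
    return urllib.request.urlopen(req)

# Bob's pod announces his comment on Alice's document:
note = make_announce_notification(
    actor="https://bob.example/profile#me",
    comment_url="https://bob.example/comments/42",
    target_url="https://alice.example/docs/paper",
)
# send_notification("https://alice.example/inbox/", note)  # needs a live pod
```

The point of the sketch is how little machinery is involved: the "decentralization" is just ordinary HTTP requests between servers the users control.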

In Herbert's vision, institutions would host their researchers' "research pods", which would be part of their personal domains but would have extensions specific to scholarly communication, such as automatic archiving upon publication.

Herbert demonstrates that the standards and technology needed to implement his pod-based vision for scholarly communication exist, if the implementation is currently a bit fragile. But he concludes by saying:
By understanding why it is not feasible we may get new insights into what is feasible.
I'll take up his challenge, but in regard to the decentralized Web that underlies and is in some respects a precondition for his vision. I hope in a future post to apply the arguments that follow to his scholarly communication vision in particular.

The long explanation for why I agree with Herbert that the Solid future "will most likely never exist" starts here. Note that much of what I link to from now on is a must-read, flagged (MR). Most of these pieces are long, and cover many issues less closely related to the reason I agree with Herbert than the parts I cite, but still relevant.

Cory Doctorow introduces his post about Charlie Stross' keynote for the 34th Chaos Communications Congress (MR) by writing (MR):
Stross is very interested in what it means that today's tech billionaires are terrified of being slaughtered by psychotic runaway AIs. Like Ted Chiang and me, Stross thinks that corporations are "slow AIs" that show what happens when we build "machines" designed to optimize for one kind of growth above all moral or ethical considerations, and that these captains of industry are projecting their fears of the businesses they nominally command onto the computers around them. 
Stross uses the Paperclip Maximizer thought experiment to discuss how the goal of these "slow AIs", which is to maximize profit growth, makes them a threat to humanity. The myth is that these genius tech billionaire CEOs are "in charge", decision makers. But in reality, their decisions are tightly constrained by the logic embedded in their profit growth maximizing "slow AIs".

Here's an example of a "slow AI" responding to its Prime Directive and constraining the "decision makers".  Dave Farber's IP list discussed Hiroko Tabuchi's New York Times article How Climate Change Deniers Rise to the Top in Google Searches, which described how well-funded climate deniers were buying ads on Google that appeared at the top of search results for climate change. Chuck McManis (Chuck & I worked together at Sun Microsystems. He worked at Google then built Blekko, another search engine.) contributed a typically informative response. As previously, I have Chuck's permission to quote him extensively:
publications, as recently as the early 21st century, had a very strict wall between editorial and advertising. It compromises the integrity of journalism if the editorial staff can be driven by the advertisers. And Google exploited that tension and turned it into a business model.
How did they do that?
When people started using Google as an 'answer this question' machine, and then Google created a mechanism to show your [paid] answer first, the stage was set for what has become a gross perversion of 'reference' information.
Why would they do that? Their margins were under pressure:
The average price per click (CPC) of advertisements on Google sites has gone down for every year, and nearly every quarter, since 2009. At the same time Microsoft's Bing search engine CPCs have gone up. As the advantage of Google's search index is eroded by time and investment, primarily by Microsoft, advertisers have been shifting budget to be more of a blend between the two companies. The trend suggests that at some point in the not too distant future advertising margins for both engines will be equivalent.
And their other businesses weren't profitable:
Google has scrambled to find an adjacent market, one that could not only generate enough revenue to pay for the infrastructure but also to generate a net income. YouTube, its biggest success outside of search, and the closest thing they have, has yet to do that after literally a decade of investment and effort.
So what did they do?
As a result Google has turned to the only tools it has that work: it has reduced payments to its 'affiliate' sites (AdSense for content payments), boosted the number of ad 'slots' on Google sites, and finally paid third parties to send search traffic preferentially to Google (this too hurts Google's overall search margin)
And the effect on users is:
On the search page, Google's bread and butter so to speak, for a 'highly contested' search (that is what search engine marketeers call a search query that can generate lucrative ad clicks) such as 'best credit card' or 'lowest home mortgage', there are many web browser window configurations that show few, if any organic search engine results at all!
In other words, for searches that are profitable, Google has moved all the results it thinks are relevant off the first page and replaced them with results that people have paid to put there. Which is pretty much the definition of "evil" in the famous "don't be evil" slogan notoriously dropped in 2015. I'm pretty sure that no-one at executive level in Google thought that building a paid-search engine was a good idea, but the internal logic of the "slow AI" they built forced them into doing just that.

Another example is that Mark Zuckerberg's "personal challenge" for 2018 is to "fix Facebook". In Facebook Can't Be Fixed (MR) John Battelle writes:
You cannot fix Facebook without completely gutting its advertising-driven business model.

And because he is required by Wall Street to put his shareholders above all else, there’s no way in hell Zuckerberg will do that.

Put another way, Facebook has gotten too big to pivot to a new, more “sustainable” business model.
...
If you’ve read “Lost Context,” you’ve already been exposed to my thinking on why the only way to “fix” Facebook is to utterly rethink its advertising model. It’s this model which has created nearly all the toxic externalities Zuckerberg is worried about: It’s the honeypot which drives the economics of spambots and fake news, it’s the at-scale algorithmic enabler which attracts information warriors from competing nation states, and it’s the reason the platform has become a dopamine-driven engagement trap where time is often not well spent.
John Battelle's “Lost Context” is also a must-read (MR).

I have personal experience of this problem. In the late 80s I foresaw a bleak future for Sun Microsystems. Its profits were based on two key pieces of intellectual property, the SPARC architecture and the Solaris operating system. In each case they had a competitor (Intel and Microsoft) whose strategy was to make owning that kind of IP too expensive for Sun to compete. I came up with a strategy for Sun to undergo a radical transformation into something analogous to a combination of Canonical and an App Store. I spent years promoting and prototyping this idea within Sun.

One of the reasons I have great respect for Scott McNealy is that he gave me, an engineer talking about business, a very fair hearing before rejecting the idea, saying "It's too risky to do with a Fortune 100 company". Another way of saying this is "too big to pivot to a new, more “sustainable” business model". In the terms set by Sun's "slow AI" Scott was right and I was wrong. Sun was taken over by Oracle in 2009; their "slow AI" had no answer for the problems I identified two decades earlier. But in those two decades Sun made its shareholders unbelievable amounts of money.

In Herbert's world of scholarly communication, a similar process can be seen at work in the history of open access (MR, my comments here). In May 1995 Stanford Libraries' HighWire Press pioneered the move of scholarly publishing to the Web by putting the Journal of Biological Chemistry on-line. Three years later, Vitek Tracz was saying:
with the Web technology available today, publishing can potentially happen independently of publishers. If authors started depositing their papers directly into a central repository, they could bypass publishers and make it freely available.
He started the first commercial open-access publisher, BioMed Central, in 2000 (the Springer "slow AI" bought it in 2008). In 2002 came the Budapest Open Access Initiative:
By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
Sixteen years later, the "slow AIs" which dominate scholarly publishing have succeeded in growing profits so much that Roger Schonfeld can tweet:
I want to know how anyone can possibly suggest that Elsevier is an enemy of open access. I doubt any company today profits more from OA and its growth!
What Elsevier means by "open access" is a long, long way from the Budapest definition. The Open Access advocates, none of them business people, set goals which implied the demise of Elsevier and the other "slow AIs" without thinking through how the "slow AIs" would react to this existential threat. The result was that the "slow AIs" perverted the course of "open access" in ways that increased their extraction of monopoly rents, and provided them with even more resources to buy up nascent and established competitors.

Elsevier's Research Infrastructure
Now the "slow AIs" dominate not just publishing, but the entire infrastructure of science. If I were Elsevier's "slow AI" I would immediately understand that Herbert's "research pods" needed to run on Elsevier's infrastructure. Given university IT departments' current mania for outsourcing everything to "the cloud" this would be trivial to arrange. They've already done it to institutional repositories. Elsevier would then be able to, for example, use a Microsoft-like "embrace, extend and extinguish" strategy to exploit its control over researchers' pods.

Open access advocates point to the rise in the proportion of papers that are freely accessible. They don't point to the rise in payments to the major publishers, the added costs to Universities of dealing with the fragmented system, the highly restrictive licenses that allow "free access" in many cases, the frequency with which author processing charges are paid without resulting in free access, and all the other ills that the "slow AIs" have visited upon scholarly communication in the pursuit of profit growth.

What people mean by saying "the Web is centralized" is that it is dominated by a small number of extremely powerful "slow AIs", the FAANGs (Facebook, Apple, Amazon, Netflix, Google) and the big telcos. None of the discussion of the decentralized Web I've seen is about how to displace them; it's all about building a better-mousetrap network infrastructure, so much better along some favored axes that, magically, the world will beat a path to its door.

This is so not going to happen.

For example, you could build a decentralized, open source social network system. In fact, people did. It is called Diaspora and it launched in a blaze of geeky enthusiasm in 2011. Diaspora is one of the eight decentralization initiatives studied by the MIT Media Lab's Defending Internet Freedom through Decentralization (MR) report:
The alpha release of the Diaspora software was deeply problematic, riddled with basic security errors in the code. At the same time, the founders of the project received a lot of pressure from Silicon Valley venture capitalists to “pivot” the project to a more profitable business model. Eventually the core team fell apart and the Diaspora platform was handed over to the open source community, who has done a nice job of building out a support website to facilitate new users in signing up for the service. Today it supports just under 60,000 active participants, but the platform remains very niche and turnover of new users is high.
Facebook has 1.37×10⁹ daily users, so it is about 22,800 times bigger than Diaspora. Even assuming Diaspora was as good as Facebook, an impossible goal for a small group of Eben Moglen's students, no-one had any idea how to motivate the other 99.996% of Facebook users to abandon the network where all their friends were and restart building their social graph from scratch. The fact that after 6 years Diaspora has 60K active users is impressive for an open source project, but it is orders of magnitude away from the scale needed to be a threat to Facebook. We can see this because Facebook hasn't bothered to react to it.
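For the record, the arithmetic behind these figures:

```python
facebook_dau = 1.37e9    # Facebook daily active users, late 2017
diaspora_users = 60_000  # Diaspora active participants

# How many times bigger Facebook is:
ratio = facebook_dau / diaspora_users
print(round(ratio))  # ≈ 22833, i.e. "about 22,800 times bigger"

# Share of Facebook's user base that would have to be persuaded to move:
share_elsewhere = (1 - diaspora_users / facebook_dau) * 100
print(round(share_elsewhere, 3))  # ≈ 99.996 (percent)
```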

Suppose the team of students had been inspired, and built something so much better than Facebook along axes that the mass of Facebook users cared about (which don't include federation, censorship resistance, open source, etc.) that they started to migrate. Facebook's "slow AI" would have reacted in one of two ways. Either the team would have been made a financial offer they couldn't refuse, which wouldn't have made a dent in the almost $40B in cash and short-term investments on Facebook's balance sheet. Or Facebook would have tasked a few of their more than 1000 engineers to replicate the better system. They'd have had an easy job because (a) they'd be adding to an existing system rather than building from scratch, and (b) because their system would be centralized, so wouldn't have to deal with the additional costs of decentralization.

Almost certainly Facebook would have done both. Replicating an open source project in-house is very easy and very fast. Doing so would reduce the price they needed to pay to buy the startup. Hiring people good enough to build something better than the existing product is a big problem for the FAANGs. The easiest way to do it is to spot their startup early and buy it. The FAANGs have been doing this so effectively that it no longer makes sense to do a startup in the Valley with the goal of IPO-ing it; the goal is to get bought by a FAANG.

Let's see what happens when one of the FAANGs actually does see something as a threat. Last January Lina M. Khan of the Open Markets team at the New America Foundation published Amazon's Antitrust Paradox (MR) in the Yale Law Journal. Her 24,000-word piece got a lot of well-deserved attention for describing how platforms evade antitrust scrutiny. In August, Barry Lynn, Khan's boss, and the entire Open Markets team were ejected from the New America Foundation. Apparently, the reason was this press release commenting favorably on Google's €2.5 billion loss in an antitrust case in the EU. Lynn claims that:
hours after his press release went online, [New America CEO] Slaughter called him up and said: “I just got off the phone with Eric Schmidt and he is pulling all of his money,”
The FAANGs' "slow AIs" understand that antitrust is a serious threat. €2.5 billion checks get their attention, even if they are small compared to their cash hoards. The PR blowback from defenestrating the Open Markets team was a small price to pay for getting the message out that advocating for effective antitrust enforcement carried serious career risks.

This was a FAANG reacting to a law journal article and a press release. "All of his money" had averaged about $1M/yr over two decades. Imagine how FAANGs would react to losing significant numbers of users to a decentralized alternative!

Khan argued that:
the current framework in antitrust—specifically its pegging competition to “consumer welfare,” defined as short-term price effects—is unequipped to capture the architecture of market power in the modern economy. We cannot cognize the potential harms to competition posed by Amazon’s dominance if we measure competition primarily through price and output. Specifically, current doctrine underappreciates the risk of predatory pricing and how integration across distinct business lines may prove anticompetitive. These concerns are heightened in the context of online platforms for two reasons. First, the economics of platform markets create incentives for a company to pursue growth over profits, a strategy that investors have rewarded. Under these conditions, predatory pricing becomes highly rational—even as existing doctrine treats it as irrational and therefore implausible. Second, because online platforms serve as critical intermediaries, integrating across business lines positions these platforms to control the essential infrastructure on which their rivals depend. This dual role also enables a platform to exploit information collected on companies using its services to undermine them as competitors.
In the 1930s, antitrust was aimed at preserving a healthy market by eliminating excessive concentration of market power. But:
Due to a change in legal thinking and practice in the 1970s and 1980s, antitrust law now assesses competition largely with an eye to the short-term interests of consumers, not producers or the health of the market as a whole; antitrust doctrine views low consumer prices, alone, to be evidence of sound competition. By this measure, Amazon has excelled; it has evaded government scrutiny in part through fervently devoting its business strategy and rhetoric to reducing prices for consumers.
The focus on low prices for "consumers" rather than "customers" is especially relevant for Google and Facebook; it is impossible to get monetary prices lower than those they charge "consumers". The prices they charge the "customers" who buy ad space from them are another matter, but they don't appear to be a consideration for current antitrust law. Nor is the non-monetary price "consumers" pay for the services of Google and Facebook in terms of the loss of privacy, the spam, the fake news, the malvertising and the waste of time.

Perhaps the reason for Google's dramatic reaction to the Open Markets team was that they were part of a swelling chorus of calls for antitrust action against the FAANGs from both the right and the left. Roger McNamee (previously) was an early investor in Facebook and friend of Zuckerberg's, but in How to Fix Facebook — Before It Fixes Us (MR) even he voices deep concern about Facebook's effects on society. He and ethicist Tristan Harris provide an eight-point prescription for mitigating them:
  1. Ban bots.
  2. Block further acquisitions.
  3. "be transparent about who is behind political and issues-based communication"
  4. "be more transparent about their algorithms"
  5. "have a more equitable contractual relationship with users"
  6. Impose "a limit on the commercial exploitation of consumer data by internet platforms"
  7. "consumers, not the platforms, should own their own data"
Why would the Facebook "slow AI" do any of these things when they're guaranteed to decrease its stock price? The eighth is straight out of Lina Khan:
we should consider that the time has come to revive the country’s traditional approach to monopoly. Since the Reagan era, antitrust law has operated under the principle that monopoly is not a problem so long as it doesn’t result in higher prices for consumers. Under that framework, Facebook and Google have been allowed to dominate several industries—not just search and social media but also email, video, photos, and digital ad sales, among others—increasing their monopolies by buying potential rivals like YouTube and Instagram. While superficially appealing, this approach ignores costs that don’t show up in a price tag. Addiction to Facebook, YouTube, and other platforms has a cost. Election manipulation has a cost. Reduced innovation and shrinkage of the entrepreneurial economy has a cost. All of these costs are evident today. We can quantify them well enough to appreciate that the costs to consumers of concentration on the internet are unacceptably high.
McNamee understands that the only way to get Facebook to change its ways is the force of antitrust law.

Another of the initiatives studied by the MIT Media Lab's Defending Internet Freedom through Decentralization (MR) is Solid. They describe the project's goal thus:
Ultimately, the goal of this project is to render platforms like Facebook and Twitter as merely “front-end” services that present a user’s data, rather than silos for millions of people’s personal data. To this end, Solid aims to support users in controlling their own personal online datastore, or “pod,” where their personal information resides. Applications would generally run on the client-side (browser or mobile phone) and access data in pods via APIs based on HTTP.
In other words, to implement McNamee's #7 prescription.

Why do you think McNamee's #8 talks about the need to "revive the country’s traditional approach to monopoly"? He understands that having people's personal data under their control, not Facebook's, would be viewed by Facebook's "slow AI" as an existential threat. Exclusive control over the biggest and best personal data of everyone on the planet, whether or not they have ever created an account, is the basis on which Facebook's valuation rests.

The Media Lab report at least understands that there is an issue here:
The approach of Solid towards promoting interoperability and platform-switching is admirable, but it begs the question: why would the incumbent “winners” of our current system, the Facebooks and Twitters of the world, ever opt to switch to this model of interacting with their users? Doing so threatens the business model of these companies, which rely on uniquely collecting and monetizing user data. As such, this open, interoperable model is unlikely to gain traction with already successful large platforms. While a site like Facebook might share content a user has created–especially if required to do so by legislation that mandates interoperability–it is harder to imagine them sharing data they have collected on a user, her tastes and online behaviors. Without this data, likely useful for ad targeting, the large platforms may be at an insurmountable advantage in the contemporary advertising ecosystem.
The report completely fails to understand the violence of the reaction Solid will face from the FAANGs' "slow AIs" if it ever gets big enough for them to notice.

Note that the report fails to understand that you don't have to be a Facebook user to have been extensively profiled. Facebook's "slow AI" is definitely not going to let go of the proprietary data it has collected (and in many cases paid other data sources for) about a person. Attempts to legislate this sharing in isolation would meet ferocious lobbying, and might well be unconstitutional. Nor is it clear that, even if legislation passed, the data would be in a form usable by the person, or by other services. History tends to show that attempts to force interoperability upon unwilling partners are easily sabotaged by them.

McNamee points out that, even if sharing were forced upon Facebook, it would likely do little to reduce their market power:
consumers, not the platforms, should own their own data. In the case of Facebook, this includes posts, friends, and events—in short, the entire social graph. Users created this data, so they should have the right to export it to other social networks. Given inertia and the convenience of Facebook, I wouldn’t expect this reform to trigger a mass flight of users. Instead, the likely outcome would be an explosion of innovation and entrepreneurship. Facebook is so powerful that most new entrants would avoid head-on competition in favor of creating sustainable differentiation. Start-ups and established players would build new products that incorporate people’s existing social graphs, forcing Facebook to compete again.
After all, allowing users to export their data from Facebook doesn't prevent Facebook maintaining a copy. And you don't need to be a Facebook user for them to make money from data they acquire about you. Note that, commendably, Google has for many years allowed users to download the data they create in the various Google systems (but not the data Google collects about them) via the Data Liberation Front, now Google Takeout. It hasn't caused their users to leave.

No alternate social network can succeed without access to the data Facebook currently holds. Realistically, if this is to change, there will be some kind of negotiation. Facebook's going-in position will be "no access". Thus the going-in position for the other side needs to be something that Facebook's "slow AI" will think is much worse than sharing the data.

We may be starting to see what the something much worse might be. In contrast to the laissez-faire approach of US antitrust authorities, the EU has staked out a more aggressive position. It fined Google the €2.5 billion that got the Open Markets team fired. And, as Cory Doctorow reports (MR):
Back in 2016, the EU passed the General Data Protection Regulation, a far-reaching set of rules to protect the personal information and privacy of Europeans that takes effect this coming May.
Doctorow explains that these regulations require that:
Under the new directive, every time a European's personal data is captured or shared, they have to give meaningful consent, after being informed about the purpose of the use with enough clarity that they can predict what will happen to it. Every time your data is shared with someone, you should be given the name and contact details for an "information controller" at that entity. That's the baseline: when a company is collecting or sharing information about (or that could reveal!) your "racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, … [and] data concerning health or data concerning a natural person’s sex life or sexual orientation," there's an even higher bar to hurdle.
Pagefair has a detailed explanation of what this granting of granular meaningful consent would have to look like. It is not a viable user interface to the current web advertising ecosystem of real-time auctions based on personal information.
All of these companies need to get consent
Here is Pagefair's example of what is needed to get consent from each of them.

The start of a long, long chain of dialog boxes
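A toy model shows why this chain of dialogs is unworkable as a user interface. The vendor names and purposes below are hypothetical, and real RTB auctions involve far more parties than five; under the GDPR each party needs separate, informed, purpose-specific consent.

```python
# Hypothetical parties in a single real-time-bidding chain:
vendors = ["publisher", "ssp.example", "exchange.example",
           "dsp.example", "data-broker.example"]
# Hypothetical distinct processing purposes, each needing its own consent:
purposes = ["ad selection", "frequency capping", "cross-site profiling"]

# One informed-consent prompt per (vendor, purpose) pair:
prompts = [f"Allow {v} to use your personal data for {p}?"
           for v in vendors for p in purposes]

print(len(prompts))  # 15 dialogs for just five parties and three purposes
```

The combinatorics only get worse: a page with several ad slots can involve dozens of vendors, each of which may change from one page load to the next.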
Doctorow's take on the situation is:
There is no obvious way the adtech industry in its current form can comply with these rules, and in the nearly two years they've had to adapt, they've done virtually nothing about it, seemingly betting that the EU will just blink and back away, rather than exercise its new statutory powers to hit companies for titanic fines, making high profile examples out of a few sacrificial companies until the rest come into line.

But this is the same institution that just hit Google with a $2.73 billion fine. They're spoiling for this kind of fight, and I wouldn't bet on them backing down. There's no consumer appetite for being spied on online ... and the companies involved are either tech giants that everyone hates (Google, Facebook), or creepy data-brokers no one's ever heard of and everyone hates on principle (Acxiom). These companies have money, but not constituencies.

Meanwhile, publishers are generally at the mercy of the platforms, and I assume most of them are just crossing their fingers and hoping the platforms flick some kind of "comply with the rules without turning off the money-spigot" switch this May.
Pagefair's take is:
Websites, apps, and adtech vendors, should switch from using personal data to monetize direct and RTB advertising to “non-personal data”. Using non-personal, rather than personal, data neutralizes the risks of the GDPR for advertisers, publishers, and adtech vendors. And it enables them to address the majority (80%-97%) of the audience that will not give consent for 3rd party tracking across the web.
The EU is saying "it is impractical to monetize personal information". Since Facebook's and Google's business models depend on monetizing personal information, this certainly looks like "something worse" than making it portable.

I remember at Esther Dyson's 2001 conference listening to the CEO of American Express explain how they used sophisticated marketing techniques to get almost all their customers to opt in to information sharing. If I were Facebook's or Google's "slow AI" I'd be wondering if I could react to the GDPR by getting my users to opt in to my data collection, while structuring things so they wouldn't opt in to everyone else's. I would be able to use their personal information, but I wouldn't be able to share it with anyone else. That is a problem for everyone else, but for me it's a competitive advantage.

It is hard to see how this will all play out:
  • The Chinese government is enthusiastic about enabling companies to monetize personal information. That way the companies fund the government's surveillance infrastructure:
    WeChat, the popular mobile application from Tencent Holdings, is set to become more indispensable in the daily lives of many Chinese consumers under a project that turns it into an official electronic personal identification system.
  • The US has enabled personal information to be monetized, but seems to be facing a backlash from both right and left.
  • The EU seems determined to eliminate, or at least place strict limits on, monetizing of personal information.
Balkanization of the Web seems more likely than decentralization.

If a decentralized Web doesn't achieve mass participation, nothing has really changed. If it does, someone will have figured out how to leverage antitrust to enable it. And someone will have designed a technical infrastructure that fits with and builds on that discovery, not a technical infrastructure designed to scratch the itches of technologists.




37 comments:

David. said...

Richard Smith, ex-editor of the BMJ, has figured out what Elsevier's "slow AI" is doing:

"The company recognises that science publishing will become a service that scientists will largely run themselves. In a sense, it always has been with scientists producing the science, editing the journals, peer reviewing the studies, and then reading the journals. ... Elsevier have recognised the importance of this trend and are creating their own software platforms to speed up and make cheaper the process of publishing science.

But how, I wondered, can Elsevier continue to make such big profits from science publishing? Now, I think I see. The company thinks that there will be one company supplying publishing services to scientists—just as there is one Amazon, one Google, and one Facebook; and Elsevier aims to be that company. But how can it make big profits from providing a cheap service?

The answer lies in big data. ... Elsevier will come to know more about the world’s scientists—their needs, demands, aspirations, weaknesses, and buying patterns—than any other organisation. The profits will come from those data and that knowledge. The users of Facebook are both the customers and the product, and scientists will be both the customers and the product of Elsevier."

Unfortunately, he's a bit late to the game. I realized that Elsevier had figured this out almost a decade ago when I served on the jury for Elsevier's Grand Challenge. In 2010 I wrote:

"In 2008 I was on the jury for the Elsevier Grand Challenge, a competition Elsevier ran with a generous prize for the best idea of what could be done with access to their entire database of articles. This got a remarkable amount of attention from some senior managers. Why did they sponsor the competition? They understood that, over time, their ability to charge simply for access to the raw text of scholarly articles will go away. Their idea was to evolve to charging for services based on their database instead."

Ruben Verborgh said...

I don't think decentralization or Balkanization are the only options. Being a strong believer and creator of decentralized technology, I still intend to maintain my Facebook account for the years to come—while simultaneously also using decentralized storage and applications. There is room for both options to exist in parallel, and they would just have different usages.

I see a strong potential for Solid-like solutions in several business sectors, such as the legal and medical domains, where special data requirements make the decoupling of data and apps a strong advantage.

We should stop seeing decentralized solutions as competitors to Facebook, but rather as useful platforms in their own right, which already have relevant use cases.

David. said...

Ruben, if all you have are some applications in the "legal and medical domains", you haven't decentralized the Web, have you?

David. said...

"Zuckerberg’s announcement on Wednesday that he would be changing the Facebook News Feed to make it promote “meaningful interactions” does little to address the concerns I have with the platform." writes Roger McNamee.

Adrian Cockcroft said...

Hi David, it would be good to catch up sometime and have a chat about this. I’m now at AWS and though in some ways Cloud is centralizing, it’s also decentralizing in other ways. A lot of companies are shutting down entire data centers and moving core backend systems to a more distributed cloud based architecture.

David. said...

Hi, Adrian!

Decentralization and distribution are two different things. Vitalik Buterin identifies three axes of (de)centralization:

- Architectural (de)centralization — how many physical computers is a system made up of? How many of those computers can it tolerate breaking down at any single time?

- Political (de)centralization — how many individuals or organizations ultimately control the computers that the system is made up of?

- Logical (de)centralization — does the interface and data structures that the system presents and maintains look more like a single monolithic object, or an amorphous swarm? One simple heuristic is: if you cut the system in half, including both providers and users, will both halves continue to fully operate as independent units?

As far as I can see, systems built on AWS may be architecturally decentralized (but are more likely just distributed), but are politically and logically centralized, so even if decentralization delivered its promised advantages, they would get few of them.

I believe that AWS is better at running data centers than most companies are. Whether they are enough better to outweigh the added risk that comes from correlations between the failure of my system and failures of other systems at AWS that my system depends upon (say my supply chain partners) is an interesting question.

As far as I can see, the biggest selling point of AWS is that it provides the in-house IT organization some place to point fingers when things go wrong. It's the modern equivalent of "no-one ever got fired for buying IBM".

Sarven Capadisli said...

Thank you for putting this article together with great references.

Just to clarify (a minor point given everything that you've mentioned): the "Solid approach" doesn't mandate that we must all have our domains and run our personal online datastores (pod) there. It is certainly one of the ways of going at it. The bigger story there is that our assets are access controlled working alongside a global authentication mechanism. The bottom line there is about where one places their trust, eg. if a researcher trusts their institution to take care of their pod, that's all good - as exemplified in Herbert's talk.

To support your argument about how the "big players" are not particularly concerned - at least in public at this time - about decentralised initiatives, we can simply look at their involvement in standardisation efforts. Namely: the W3C Social Web Working Group, which went on a long journey in coming up with recommendations towards interoperable protocols and data exchange on the Web, with emphasis on "social" stuff. The big players, while having W3C Member status, did not participate in this WG. I think that speaks volumes.

David. said...

Thanks for the clarification, Sarven!

My concern is, as I briefly mentioned in the post, that research institutions are in a desperate rush to outsource everything to do with IT to "The Cloud". There are a number of reasons, including the fact that they can't hire enough skilled people, and that outsourcing means that the CIO has some place to point the finger when things go wrong. "The Cloud" generally means AWS, but in the case of researchers' "pods" it would definitely mean Elsevier (who would probably run on AWS).

So the "pods" would all end up controlled by Elsevier. Which in the big picture would not be much of a change, and would probably end up being even more expensive than the system we have now.

Sarven Capadisli said...

I agree with you on the likelihood of Elsevier (or some other company) seamlessly offering pod services to institutions. While there are alarm bells all around about never-ending vendor lock-in - eg "embrace, extend and extinguish" - I suppose it might settle down on which core features of the system are non-proprietary. For instance, could institutions or researchers decide to pack up their bags and go to the next "hosting" provider, ie. continuous and ubiquitous offering of the research objects that are part of the commons? Would the generated data be reusable by any application that conforms to some open standards? If I were shopping around for a service or a tool, I'd check to see if it passes an acid test along the lines of https://linkedresearch.org/rfc#acid-test. From that angle, the possibility of "controlled by Elsevier" could mean anything. Is the web hosting provider for my domain controlling my pod? They may be mining the data (including all the interactions that go with it) but I suppose I can't be certain. To escape that, I'll have to run my own hosting box.

I assume that most people agree that the major academic publishers will continue to milk the system. In the meantime, the best I think we (researchers and institutions) can do is embrace the feasibility of this lets-get-all-decentralised-bandwagon because the current path is not quite working in our favour. If we are lucky, the effect of this may be that the playing field will even out for new service/tool providers to participate, and maybe diversity in how one can participate - wishful thinking?

Just to contrast: ORCID is a great service for many, but what's the end result or some of the consequences of such effort? New scholarly systems are being built that solely recognise identifiers that virtually include ORCID's domain name. One is literally forbidden to participate / make their contributions to humanity - and be acknowledged by it - unless they create an ORCID account. This is not ORCID's fault but precisely what happens when we go all in on centralised systems, no matter how shiny or useful it may seem at first. Its design has consequences. DOI is precisely the same story. Your "scholarly" article doesn't have a DOI? It is considered to be equivalent to a random blogpost on the Web regardless of what it says, or dare I say, a "preprint". Again, this is not because ORCID or DOI are bad or wrong designs. The bigger question in my opinion is can we have different systems cooperate and be welcome to participate in the big Scholarly Bazaar?

For our little corner in Web Science - but not exclusive to it - I've tried to capture some of our problems and steps towards a paradigm shift which you might be interested in having a look at: http://csarven.ca/web-science-from-404-to-200. Same old, same old :)

Kingsley Idehen said...

Hi David,

A little correction regarding the section about Solid in this insightful article:

A WebID (HTTP URI that identifies an Agent) and its associated WebID-Profile doc (collection of RDF sentences using various notations) only require users to possess Read-Write privileges over a Folder (a/k/a LDPContainer instance these days). Fundamentally, the domain ownership requirement is one of the problems with current storage models (e.g., systems where identification depends on the ".well-known" pattern), in regards to a viable Read-Write Web etc.

Conventions outlined by the Solid Project enable PODs (or Personal Data Spaces) to be scoped to folders. Thus, Google Drive, Dropbox, OneDrive, Amazon AWS, Rackspace, and Box are all viable launch points [1].

[1] https://medium.com/virtuoso-blog/taking-control-of-access-to-resources-stored-in-google-drive-dropbox-microsoft-onedrive-box-and-d25ab3dd27d9 -- Mounting various Storage Services

PMcB said...

Hi David - fantastic article (I've an even bigger MR backlog now!). But I didn't understand your response to Ruben. Surely both centralised and decentralised systems have their place (e.g. Facebook 'cos all my mates are there already, or Facebook-decentralised 'cos they pay me per view, or allow non-Facebookers to comment on or annotate my posts, or whatever). No one is suggesting *everything* must be decentralised for us to start 'decentralising the web', incrementally, as deemed appropriate by individuals, are they?

David. said...

PMcB, what people mean by "the Web is centralized" is that it is dominated by the FAANGs. If you want to change that, you have to displace some or all of the FAANGs. Otherwise the Web will still be centralized, i.e. dominated by the FAANGs. It's fine to build something, e.g. Diaspora, that attracts say 60K users. But that hasn't changed the big picture at all.

What I'm trying to do is to get people to think about how to displace the FAANGs instead of thinking about the neat-oh technology they think it'd be cool to build for themselves and a few friends. Decentralizing the Web: It Isn't About The Technology.

PMcB said...

David - hhhmmm... Ok, I think I see where you're coming from now, and it's a good point. But for whatever reason, one word popped into my head after reading your response - Tesla. Maybe it'll take a social-media breach on the scale of Equifax to wake up the general populace to the consequences of Facebook et al, but regardless, like autonomous or electric-only cars, I think (purely personal opinion) that a large proportion of the Web will become decentralised (how 'large' is just a reflection of how naive you are I guess!), but my conviction is based on it simply being the more 'correct' thing to do - like electric cars (full disclosure: I'm a cyclist, and therefore inherently dislike all cars!). I know a counter-argument is 'but where's the money in decentralisation', but wasn't that exactly the argument 10 years ago with electric cars too (which seems so utterly myopic and 'missing the whole point' today)??

David. said...

Prof. Scott Galloway of NYU argues for breaking up the FAANGs in his talk to the DLD conference.

David. said...

Tesla delivered about 47K cars in the first half of 2017. In the whole of 2017 General Motors delivered 3M in the US, 4M in China. Let's say Tesla delivered 100K cars in 2017. In just these two regions GM alone sold 70 times as many.

Tesla is an amazing achievement, but it's still a very small car company. It's a little more than 1/3 the size of Porsche. But the key point is that, right from the start with the Roadster, Elon Musk had an explanation for why people would buy the cars that wasn't simply that they were electric. They were fast and fun to drive.

"being the more 'correct' thing to do" is not an explanation for why people will migrate off the FAANGs.

David. said...

In Facebook And Google’s Surveillance Capitalism Model Is In Trouble, Paul Blumenthal joins in the calls for regulation. He too notices the looming impact of the GDPR:

"For the most part, Facebook and Google prevent you from using their products if you decline to agree to their entire terms of service. You cannot pick and choose what to agree to and still use their free services.

The GDPR changes that by requiring online companies, in some cases, to get permission from each individual user to collect, share and combine their data for use by advertisers. Companies will not be allowed to ban people from their services who decline to share their data for advertising purposes. There are 734 million EU residents who will soon be able to opt out of helping Facebook and Google make money. If companies do not comply with the new regulations they will face fines totaling four percent of their global revenues."

David. said...

Anti-trust is having an effect:

"Google's bet against the Commission cost it $2.73 billion; Qualcomm's out more than a billion, and Apple's got to pay $15.4 billion in evaded taxes.

These 9- and 10-figure invoices have made deathbed converts out of Big Tech, who are now in a mad scramble to comply with the GDPR before its fines of the greater of 4% of global total profit or 20 million Euros kick in this May."

As Everett Dirksen said "A billion here, a billion there, pretty soon you're talking real money."
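To make the quoted fine rule concrete: it is simply a maximum of two quantities, the percentage-of-turnover figure and the €20 million floor (GDPR Article 83 specifies 4% of total worldwide annual turnover; the quote above says "profit"). A minimal sketch, with a hypothetical function name and made-up figures:

```python
# Illustrative sketch of the GDPR maximum-fine rule: the greater of
# 4% of global annual turnover or EUR 20 million.
# (Function name and sample figures are hypothetical.)
def gdpr_max_fine(global_turnover_eur: float) -> float:
    return max(0.04 * global_turnover_eur, 20_000_000.0)

# A company with EUR 100 billion turnover faces up to EUR 4 billion;
# a EUR 1 million company still faces the EUR 20 million floor.
print(gdpr_max_fine(100e9))  # 4000000000.0
print(gdpr_max_fine(1e6))    # 20000000.0
```

The €20M floor is what makes the rule bite even for small data brokers; for the giants, the 4% term dominates.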

David. said...

Adam Ludwin's A Letter To Jamie Dimon is worth a read. Even though I don't agree with all of it, he makes some very good points, such as:

"Since Ethereum is a platform, its value is ultimately a function of the value of the applications built on top. In other words, we can ask if Ethereum is useful by simply asking if anything that has been built on Ethereum is useful. For example, do we need censorship resistant prediction markets? Censorship resistant meme playing cards? Censorship resistant versions of YouTube or Twitter?

While it’s early, if none of the 730+ decentralized apps built on Ethereum so far seem useful, that may be telling. Even in year 1 of the web we had chat rooms, email, cat photos, and sports scores. What are the equivalent killer applications on Ethereum today?"

David. said...

David Dayen's Tech Companies Are Under Pressure Everywhere Except Where It Matters looks at the pathetic state of anti-trust in the US, particularly at the Federal Trade Commission, and contrasts it with all the political rhetoric about the tech behemoths. As usual with politicians, you need to look at what they do, not what they say.

David. said...

"Mining and oil companies exploit the physical environment; social media companies exploit the social environment. This is particularly nefarious because social media companies influence how people think and behave without them even being aware of it. This has far-reaching adverse consequences on the functioning of democracy, particularly on the integrity of elections.

The distinguishing feature of internet platform companies is that they are networks and they enjoy rising marginal returns; that accounts for their phenomenal growth. The network effect is truly unprecedented and transformative, but it is also unsustainable. It took Facebook eight and a half years to reach a billion users and half that time to reach the second billion. At this rate, Facebook will run out of people to convert in less than 3 years."

This is from George Soros' wide-ranging and very interesting remarks in Davos. He goes on:

"The exceptional profitability of these companies is largely a function of their avoiding responsibility for – and avoiding paying for – the content on their platforms.

They claim they are merely distributing information. But the fact that they are near- monopoly distributors makes them public utilities and should subject them to more stringent regulations, aimed at preserving competition, innovation, and fair and open universal access."

David. said...

Prof. Scott Galloway continues his argument for breaking up the FAANGs in Esquire. It is long but really worth reading:

"Why should we break up big tech? Not because the Four are evil and we’re good. It’s because we understand that the only way to ensure competition is to sometimes cut the tops off trees, just as we did with railroads and Ma Bell. This isn’t an indictment of the Four, or retribution, but recognition that a key part of a healthy economic cycle is pruning firms when they become invasive, cause premature death, and won’t let other firms emerge. The breakup of big tech should and will happen, because we’re capitalists."

David. said...

German court rules Facebook use of personal data illegal by Hans-Edzard Busemann & Nadine Schimroszik at Reuters reveals that:

"a court had found Facebook’s use of personal data to be illegal because the U.S. social media platform did not adequately secure the informed consent of its users.

The verdict, from a Berlin regional court, comes as Big Tech faces increasing scrutiny in Germany over its handling of sensitive personal data that enables it to micro-target online advertising.

The Federation of German Consumer Organisations (vzvb) said that Facebook’s default settings and some of its terms of service were in breach of consumer law, and that the court had found parts of the consent to data usage to be invalid. "

David. said...

Chuck McManis points me to Rowland Manthorpe's Google’s nemesis: meet the British couple who took on a giant, won... and cost it £2.1 billion. It is a fascinating account of the history behind the EU's anti-trust fine on Google.

David. said...

Chris Dixon's Why Decentralization Matters is sort-of-half-right:

"The question of whether decentralized or centralized systems will win the next era of the internet reduces to who will build the most compelling products, which in turn reduces to who will get more high quality developers and entrepreneurs on their side."

The first part of the sentence is right, the second part is techno-optimism like most of the rest of the essay.

David. said...

Sam D'Amico argues that it is about the technology, in a sense.

David. said...

What happens if you give an AI control over a corporation? is an interesting question. Clive Thompson points to a paper by UCLA law professor Lynn LoPucki:

"Odds are high you'd see them emerge first in criminal enterprises, as ways of setting up entities that engage in nefarious activities but cannot be meaningfully punished (in human terms, anyway), even if they're caught, he argues. Given their corporate personhood in the US, they'd enjoy the rights to own property, to enter into contracts, to legal counsel, to free speech, and to buy politicians -- so they could wreak a lot of havoc."

Not that the current "slow AIs" can be "meaningfully punished" if they engage in "nefarious activities".

David. said...

Lina Khan's The Supreme Court Case That Could Give Tech Giants More Power reports:

"But the decision in a case currently before the Supreme Court could block off that path, by effectively shielding big tech platforms from serious antitrust scrutiny. On Monday the Court heard Ohio v. American Express, a case centering on a technical but critical question about how to analyze harmful conduct by firms that serve multiple groups of users. Though the case concerns the credit card industry, it could have sweeping ramifications for the way in which antitrust law gets applied generally, especially with regards to the tech giants."

David. said...

Shira Ovide's How Amazon’s Bottomless Appetite Became Corporate America’s Nightmare concludes:

"Amazon is far from invulnerable. All the same old red flags are there—a puny 2.7 percent e-commerce profit in North America, massive outlays to establish delivery routes abroad—but few are paying attention. Anyone buying a share of Amazon stock today is agreeing to pay upfront for the next 180 years of profit. By one measure, it’s generating far less cash than investors believe. And its biggest risk may be the fear of its power in Washington, New York, and Brussels, a possible prelude to regulatory crackdown." [my emphasis]

David. said...

"An index of 10 tech growth shares pushed its advance to 23 percent so far this year, giving the group an annualized return since early 2016 of 67 percent. That frenzied pace tops the Nasdaq Composite Index’s 66 percent return in the final two years of the dot-com bubble." writes Lu Wang at Bloomberg:

"In addition to the quartet of Facebook, Amazon, Netflix and Google, the NYSE index also includes Apple, Twitter, Alibaba, Baidu, Nvidia and Tesla. These companies have drawn money as investors bet that their dominance in areas from social media to e-commerce will foster faster growth. ... At 64 times earnings, the companies in the NYSE FANG+ Index are valued at a multiple that’s almost three times the broader gauge’s. That compared with 2.7 in March 2000."

David. said...

Cory Doctorow points out that:

"In 2007, the Guardian's Victor Keegan published "Will MySpace ever lose its monopoly?" in which he enumerated the unbridgeable moats and unscalable walls that "Rupert Murdoch's Myspace" had erected around itself, evaluating all the contenders to replace Myspace and finding them wanting."

I could be wrong! But that was then and this is now. Facebook is far bigger and more embedded than MySpace ever was.

David. said...

In The Real Villain Behind Our New Gilded Age Eric Posner and Glen Weyl reinforce the meme that the absence of effective antitrust policy is the cause of many of society's ills, including inequality, slow economic growth and the FAANGs:

"the rise of the internet has bestowed enormous market power on the tech titans — Google, Facebook — that rely on networks to connect users. Yet again, antitrust enforcers have not stopped these new robber barons from buying up their nascent competitors. Facebook swallowed Instagram and WhatsApp; Google swallowed DoubleClick and Waze. This has allowed these firms to achieve near-monopolies over new services based on user data, such as training machine learning and artificial intelligence systems. As a result, antitrust authorities allowed the creation of the world’s most powerful oligopoly and the rampant exploitation of user data."

David. said...

People like convenience more than privacy – so no, blockchain will not 'decentralise the web' by Matt Asay makes the same point:

"It's not that a blockchain-based web isn't possible. After all, the original web was decentralised, too, and came with the privacy guarantees that blockchain-based options today purport to deliver. No, the problem is people.

As user interface designer Brennan Novak details, though the blockchain may solve the crypto crowd's privacy goals, it fails to offer something as secure and easy as a (yes) Facebook or Google login: 'The problem exists somewhere between the barrier to entry (user-interface design, technical difficulty to set up, and overall user experience) versus the perceived value of the tool, as seen by Joe Public and Joe Amateur Techie.'"

David. said...

Cory Doctorow's Facebook is worth much less to its users than search and email, but it keeps a larger share of the value reports on an experiment by:

"Economists Erik Brynjolfsson, Felix Eggers and Avinash Gannamaneni have published an NBER paper (Sci-Hub mirror) detailing an experiment where they offered Americans varying sums to give up Facebook, and then used a less-rigorous means to estimate how much Americans valued other kinds of online services: maps, webmail, search, etc.

They concluded that 20% of Facebook's users value the service at less than a dollar a month, and at $38/month, half of Facebook's users would quit.

Search is the most valued online service -- the typical American won't give up search for less than $17,500/year -- while social media is the least valuable ($300)."

David. said...

"In 1993, John Gilmore famously said that “The Internet interprets censorship as damage and routes around it.” That was technically true when he said it but only because the routing structure of the Internet was so distributed. As centralization increases, the Internet loses that robustness, and censorship by governments and companies becomes easier." from Censorship in the Age of Large Cloud Providers by Bruce Schneier.

I believe that current efforts to decentralize the Web won't be successful for economic reasons. Bruce adds another reason: governments everywhere prefer that it be centralized. But that doesn't mean either of us thinks decentralizing the Web isn't important.

David. said...

In his must-read Intel and the Danger of Integration, Ben Thompson describes how Intel became the latest large "slow AI" to get trapped defending its margins:

"The company’s integrated model resulted in incredible margins for years, and every time there was the possibility of a change in approach Intel’s executives chose to keep those margins."

Actually, Intel's slow AI forced them to make that choice, just as Sun's did.

David. said...

Rachel M Cohen's Has the New America Foundation Lost its Way? lays out the way "slow AIs" took over the New America Foundation and led to the defenestration of the Open Markets team I discussed above.

David. said...

Perhaps it is simpler to say that Intel…was disrupted by Steven Sinofsky is fascinating commentary on Ben Thompson's Intel and the Danger of Integration:

"Disruption is never one feature, but full set of *assumptions* that go into a business."

That is another way of putting the "slow AI" concept.