Friday, July 24, 2015

Amazon owns the cloud

Back in May I posted about Amazon's Q1 results, the first in which they broke out AWS, their cloud services, as a separate item. The bottom line was impressive:
AWS is very profitable: $265 million in profit on $1.57 billion in sales last quarter alone, for an impressive (for Amazon!) 17% net margin.
Again via Barry Ritholtz, Re/Code reports on Q2:
Amazon Web Services, ... grew its revenue by 81 percent year on year in the second quarter. It grew faster and with higher profit margins than any other aspect of Amazon’s business.

AWS, which offers leased computing services to businesses, posted revenue of $1.82 billion, up from $1 billion a year ago, as part of its second-quarter results.

By comparison, retail sales in North America grew only 26 percent to $13.8 billion from $11 billion a year ago.

The cloud computing business also posted operating income of $391 million — up an astonishing 407 percent from $77 million at this time last year — for an operating margin of 21 percent, making it Amazon’s most profitable business unit by far. The North American retail unit turned in an operating margin of only 5.1 percent.
Revenue is growing at 81% year-on-year with a 21% and growing operating margin, despite price competition from the likes of Google, Microsoft and IBM.
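The arithmetic behind those percentages is easy to check from the figures Re/Code quotes; here is a quick sanity check (the small gap between 82% here and the reported 81% comes from the rounding of the prior-year revenue to $1 billion):

```python
# Sanity check of the Re/Code figures quoted above (billions of dollars).
# These are the reported numbers, not independent data.
aws_rev_q2_2015 = 1.82       # AWS revenue, Q2 2015
aws_rev_q2_2014 = 1.00       # AWS revenue, Q2 2014 (rounded in the report)
aws_op_2015     = 0.391      # AWS operating income, Q2 2015
aws_op_2014     = 0.077      # AWS operating income, Q2 2014

rev_growth = (aws_rev_q2_2015 / aws_rev_q2_2014 - 1) * 100
op_margin  = (aws_op_2015 / aws_rev_q2_2015) * 100
op_growth  = (aws_op_2015 / aws_op_2014 - 1) * 100

print(f"Revenue growth:          {rev_growth:.0f}%")  # ~82% (reported 81%)
print(f"Operating margin:        {op_margin:.0f}%")   # ~21%
print(f"Operating income growth: {op_growth:.0f}%")   # ~408% (reported 407%)
```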
Amazon clearly dominates the market; the competition is having no effect on their business. As I wrote nearly a year ago, based on Benedict Evans' analysis:
Amazon's strategy is not to generate and distribute profits, but to re-invest their cash flow into starting and developing businesses. Starting each business absorbs cash, but as they develop they turn around and start generating cash that can be used to start the next one.
Unfortunately, S3 is part of AWS for reporting purposes, so we can't see the margins for the storage business alone. But I've been predicting for years that if we could, we would find them to be very generous.

Wednesday, July 15, 2015

Be Careful What You Wish For

Richard Poynder has a depressing analysis of the state of Open Access entitled HEFCE, Elsevier, the “copy request” button, and the future of open access and Bjoern Brembs has a related analysis entitled What happens to publishers that don’t maximize their profit?. They contrast vividly with Director Zhang's vision for China's National Science Library, and with Rolf Schimmer's description of the Max Planck Institute's plans. I expressed doubt that Schimmer's plan would prevent Elsevier ending up with all the money. Follow me below the fold to see how much less optimistic Brembs and Poynder are than I was.

Tuesday, July 7, 2015

IIPC Preservation Working Group

The Internet Archive has by far the largest archive of Web content but its preservation leaves much to be desired. The collection is mirrored between San Francisco and Richmond in the Bay Area, both uncomfortably close to the same major fault systems. There are partial copies in the Netherlands and Egypt, but they are not synchronized with the primary systems.

Now, Andrea Goethals and her co-authors from the IIPC Preservation Working Group have a paper entitled Facing the Challenge of Web Archives Preservation Collaboratively that reports on a survey of Web archives' preservation activities in the following areas: Policy, Access, Preservation Strategy, Ingest, File Formats and Integrity. They conclude:
This survey also shows that long term preservation planning and strategies are still lacking to ensure the long term preservation of web archives. Several reasons may explain this situation: on one hand, web archiving is a relatively recent field for libraries and other heritage institutions, compared for example with digitization; on the other hand, web archives preservation presents specific challenges that are hard to meet.
I discussed the problem of creating and maintaining a remote backup of the Internet Archive's collection in The Opposite of LOCKSS. The Internet Archive isn't alone in having less than ideal preservation of its collection. It's clear the major challenges are the storage and bandwidth requirements for Web archiving, and their rapid growth. Given the limited resources available, and the inadequate reliability of current storage technology, prioritizing collecting more content over preserving the content already collected is appropriate.

Tuesday, June 30, 2015

Blaming the Victim

The Washington Post is running a series called Net of Insecurity. So far it includes:
  • A Flaw In The Design, discussing the early history of the Internet and how the difficulty of getting it to work at all and the lack of perceived threats meant inadequate security.
  • The Long Life Of A Quick 'Fix', discussing the history of BGP and the consistent failure of attempts to make it less insecure, because those who would need to take action have no incentive to do so.
  • A Disaster Foretold - And Ignored, discussing L0pht and how they warned a Senate panel 17 years ago of the dangers of Internet connectivity but were ignored.
Perhaps a future article in the series will describe how successive US administrations consistently strove to ensure that encryption wasn't used to make systems less insecure, and that the encryption that was used was as weak as possible. They prioritized their (and their opponents') ability to spy over mitigating the risks that Internet users faced, and they got what they wanted, as we see with the compromise of the Office of Personnel Management and the possibly related compromise of health insurers including Anthem. These breaches revealed the kind of information that renders everyone with a security clearance vulnerable to phishing and blackmail. Be careful what you wish for!

More below the fold.

Tuesday, June 23, 2015

Future of Research Libraries

Bryan Alexander reports on a talk by Xiaolin Zhang, the head of the National Science Library at the Chinese Academy of Sciences (CAS), on the future of research libraries.
Director Zhang began by surveying the digital landscape, emphasizing the rise of ebooks, digital journals, and machine reading. The CAS decided to embrace the digital-first approach, and canceled all print subscriptions for Chinese-language journals. Anything they don’t own they obtain through consortial relationships ...

This approach works well for a growing proportion of the CAS constituency, which Xiaolin referred to as “Generation Open” or “Generation Digital”. This group benefits from – indeed, expects – a transition from print to open access. For them, and for our presenter, “only ejournals are real journals. Only smartbooks are real books… Print-based communication is a mistake, based on historical practicality.” It’s not just consumers, but also funders who prefer open access.
Below the fold, some thoughts on Director Zhang's vision.

Friday, June 19, 2015

EE380 talk on eBay storage

Russ McElroy & Farid Yavari gave a talk to Stanford's EE380 course describing how eBay's approach to storage (YouTube) is driven by their Total Cost of Ownership (TCO) model. As shown in this screengrab, by taking into account all the cost elements, they can justify the higher capital cost of flash media in much the same way that Ian Adams, Ethan Miller and I did in our 2011 paper Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes, but with much more realistic data and across a broader span of applications.
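I haven't seen the details of eBay's model, but a minimal sketch of the kind of calculation a TCO model performs, with purely illustrative numbers that are not from the talk, looks like this:

```python
# A minimal, purely illustrative TCO sketch comparing flash and hard disk for
# a given capacity. The parameters are invented for illustration; they are
# NOT eBay's figures or numbers from the EE380 talk.
def tco(capex_per_tb, watts_per_tb, capacity_tb, years,
        elec_price_kwh=0.10, pue=1.5):
    """Capital cost plus lifetime power (with cooling overhead via PUE)."""
    capex = capex_per_tb * capacity_tb
    energy_kwh = watts_per_tb * capacity_tb * 24 * 365 * years / 1000.0
    opex = energy_kwh * elec_price_kwh * pue
    return capex + opex

capacity_tb, years = 1000, 4   # 1PB kept for 4 years
hdd   = tco(capex_per_tb=40,  watts_per_tb=10, capacity_tb=capacity_tb, years=years)
flash = tco(capex_per_tb=300, watts_per_tb=2,  capacity_tb=capacity_tb, years=years)
print(f"HDD:   ${hdd:,.0f}")    # capex dominates less, power dominates more
print(f"Flash: ${flash:,.0f}")  # capex dominates
```

On capital and power alone the disks still win with these made-up numbers; the point of a full TCO model is that once the other cost elements (density, performance per watt, space, reliability, co-locating compute) are included, the answer can flip in favor of flash.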

We were inspired by the 2009 paper FAWN: A Fast Array of Wimpy Nodes, in which David Andersen and his co-authors from CMU showed that a network of large numbers of small CPUs coupled with modest amounts of flash memory could process key-value queries at the same speed as the networks of beefy servers used by, for example, Google, but using two orders of magnitude less power.

As this McElroy slide shows, power cost is important and it varies over a 3x range (a problem for Kaminska's thesis about the importance of 21 Inc's bitcoin mining hardware). He specifically mentions the need to get the computation close to the data, with ARM processors in the storage fabric. In this way the amount of data to be moved, and thus the capital cost, can be significantly reduced, since as he reports the network hardware accounts for 25% of the cost of the rack, and it burns a lot of power.
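A back-of-the-envelope illustration, with numbers of my own invention rather than McElroy's, of why pushing the computation into the storage fabric shrinks the network:

```python
# Illustrative only: network traffic for a query that must examine 10TB of
# objects but whose filtered result is 10GB.
scan_bytes   = 10e12   # data the query has to look at
result_bytes = 10e9    # data the application actually needs back

# Move data to compute: the whole scan crosses the network.
# Move compute to data: only the filtered result crosses the network.
reduction = scan_bytes / result_bytes
print(f"Network traffic reduced ~{reduction:,.0f}x by filtering in the storage fabric")
```

With the network hardware at 25% of the rack cost, that reduction shows up directly in capital cost as well as in power.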

At present, eBay relies on tiering, moving data to less expensive storage such as consumer hard drives when it hasn't been accessed in some time. As I wrote last year:
Fundamentally, tiering like most storage architectures suffers from the idea that in order to do anything with data you need to move it from the storage medium to some compute engine. Thus an obsession with I/O bandwidth rather than what the application really wants, which is query processing rate. By moving computation to the data on the storage medium, rather than moving data to the computation, architectures like DAWN and Seagate's and WD's Ethernet-connected hard disks show how to avoid the need to tier and thus the need to be right in your predictions about how users will access the data.
That post was in part about Facebook's use of tiering, which works well because Facebook has highly predictable data access patterns. McElroy's talk suggests that eBay's data accesses are somewhat predictable, but much less so than Facebook's, which makes plausible his implication that tiering isn't a good long-term approach.
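To make the dependence on prediction concrete, here is a caricature of an access-time tiering policy; it is not eBay's actual mechanism, just the simplest possible version of the idea:

```python
from datetime import datetime, timedelta

# Caricature of an access-time tiering policy (not eBay's actual mechanism):
# objects untouched for threshold_days are demoted to cheaper, slower storage.
def select_for_demotion(last_access_by_object, now, threshold_days=90):
    cutoff = now - timedelta(days=threshold_days)
    return [obj for obj, last_access in last_access_by_object.items()
            if last_access < cutoff]

catalog = {
    "listing-photos/123.jpg": datetime(2015, 1, 5),
    "listing-photos/456.jpg": datetime(2015, 6, 28),
}
print(select_for_demotion(catalog, now=datetime(2015, 7, 1)))
# ['listing-photos/123.jpg'] -- a good decision only if that object really
# stays cold; every unpredicted access to a demoted object pays the full
# cost of promoting it back up the tiers.
```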

Tuesday, June 16, 2015

Alphaville on Bitcoin

I'm not a regular reader of the Financial Times, so I really regret I hadn't noticed that Izabella Kaminska and others at the FT's Alphaville blog have been posting excellent work in their BitcoinMania series. For a taste, see Bitcoin's lien problem, in which Kaminska discusses the problems caused by the fact that the blockchain records the transfer of assets but not the conditions attached to the transfer:
For example, let's hypothesise that Tony Soprano was to start a bitcoin loan-sharking operation. The bitcoin network would have no way of differentiating bitcoins being transferred from his account with conditions attached - such as repayment in x amount of days, with x amount of points of interest or else you and your family get yourself some concrete boots - and those being transferred as legitimate and final settlement for the procurement of baked cannoli goods.

Now say you've lost all the bitcoin you owe to Tony Soprano on the gambling website Satoshi Dice. What are the chances that Tony forgets all about it and offers you a clean slate? Not high. Tony, in all likelihood, will pursue his claim with you.
She reports work by George K. Fogg at Perkins Coie on the legal status of Tony's claim:
Indeed, given the high volume of fraud and default in the bitcoin network, chances are most bitcoins have competing claims over them by now. Put another way, there are probably more people with legitimate claims over bitcoins than there are bitcoins. And if they can prove the trail, they can make a legal case for reclamation.

This contrasts considerably with government cash. In the eyes of the UCC code, cash doesn't take its claim history with it upon transfer. To the contrary, anyone who acquires cash starts off with a clean slate as far as previous claims are concerned. ... According to Fogg there is currently only one way to mitigate this sort of outstanding bitcoin claim risk in the eyes of US law. ... investors could transform bitcoins into financial assets in line with Article 8 of the UCC. By doing this bitcoins would be absolved from their cumbersome claim history.

The catch: the only way to do that is to deposit the bitcoin in a formal (a.k.a licensed) custodial or broker-dealer agent account.
In other words, to avoid the lien problem you have to submit to government regulation, which is what Bitcoin was supposed to escape from. Government-regulated money comes with a government-regulated dispute resolution system. Bitcoin's lack of a dispute resolution system is seen in the problems Ross Ulbricht ran into.
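The technical root of Kaminska's lien problem is how little a transaction actually records. A drastically simplified sketch, not Bitcoin's real transaction format, makes the point that there is simply no field in which the conditions could live:

```python
from collections import namedtuple

# Drastically simplified view of what a blockchain transaction records:
# inputs, outputs and a signature. This is a sketch for illustration,
# not Bitcoin's actual transaction structure.
Transaction = namedtuple("Transaction", ["inputs", "outputs", "signature"])

# Tony lends 10 BTC. The "repay in 30 days, with points" condition exists
# only off-chain; the network cannot distinguish this loan from final
# settlement for cannoli.
loan = Transaction(
    inputs=[("tony_utxo_1", 10.0)],
    outputs=[("borrower_address", 10.0)],
    signature="tony_sig",
)
print(loan)
```

The conditions, and hence any competing claims, travel with the coins only in the legal world, which is exactly why Fogg's Article 8 workaround has to reach outside the protocol.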

Below the fold, I start from some of Kaminska's more recent work and look at another attempt to use the blockchain as a Solution to Everything.