Tuesday, April 21, 2015

The Ontario Library Research Cloud

One of the most interesting sessions at the recent CNI was on the Ontario Library Research Cloud (OLRC). It is a collaboration between universities in Ontario to provide a low-cost, distributed, mutually owned private storage cloud with adequate compute capacity for uses such as text-mining. Below the fold, my commentary on their presentations.

For quite some time my argument that, once you get to a reasonable scale, commercial cloud storage is significantly more expensive than doing it yourself has been greeted with skepticism. It was very nice to hear a talk that agreed with me.

Admittedly, Ontario is a nearly ideal environment for a collaborative private storage cloud. It is home to nearly 40% of Canadians, almost all concentrated close to the US border, and a quarter of the top Canadian universities. The universities have a long history of collaborating; the fruits of that collaboration include ORION, a shared high-bandwidth network connecting the campuses, and the striking success of Scholar's Portal, which ingests the e-journals to which Ontario subscribes and provides local access to them. It currently holds about 38M articles and about 610K e-books.

Scholar's Portal is branching out to act as a data repository. The partner institutions estimated that their storage needs would grow rapidly to over a petabyte, and OLRC set four goals for the shared storage infrastructure. They estimated the cost of using commercial cloud services, which would not have met the other three goals in any case, and were confident that with off-the-shelf hardware and open source software they could build a system providing significant savings.
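The shape of that cost comparison can be sketched with a back-of-envelope calculation. All the figures below are my illustrative assumptions, not OLRC's actual numbers: 2015-era commercial object storage list pricing of around $0.03/GB/month, against hardware bought up front and amortized over five years.

```python
# Back-of-envelope: commercial cloud vs. self-hosted storage.
# Every figure here is an illustrative assumption, NOT an OLRC number.

USABLE_TB = 1200              # ~1.2 PB usable capacity
CLOUD_PER_GB_MONTH = 0.03     # assumed commercial list price, $/GB/month
HW_CAPEX = 500_000            # assumed up-front hardware cost, $
OPEX_PER_YEAR = 100_000       # assumed power/space/staff, $/year
YEARS = 5                     # amortization period

cloud_total = USABLE_TB * 1000 * CLOUD_PER_GB_MONTH * 12 * YEARS
diy_total = HW_CAPEX + OPEX_PER_YEAR * YEARS

print(f"cloud: ${cloud_total:,.0f} over {YEARS} years")
print(f"DIY:   ${diy_total:,.0f} over {YEARS} years")
```

Even with generous allowances for operating costs, at petabyte scale the recurring per-gigabyte charge dominates, which is the crux of the argument.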

They received a grant from the provincial government to cover the cost of the initial hardware, but the grant conditions meant they had only three months to purchase it. This turned out to be a major constraint on the procurement. Dell supplied 4.8PB of raw disk in 77 MD1200 shelves, and 19 PowerEdge R720xd heads, for a total usable storage capacity of 1.2PB. Two nodes were in Toronto, and one each in Ottawa, Kingston, and Guelph. They were connected by 10G Ethernet VLAN links.
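The gap between 4.8PB raw and 1.2PB usable is a factor of four, which is consistent with Swift's default of three replicas per object plus headroom kept free for rebalancing and failure recovery. The 25% headroom figure below is my assumption to make the arithmetic come out; the 3x replication is Swift's default.

```python
# Why 4.8 PB raw yields only 1.2 PB usable, assuming Swift's default
# 3-way replication plus an assumed 25% of space held back as headroom
# for rebalancing and recovery.

RAW_PB = 4.8
REPLICAS = 3        # Swift's default replica count
HEADROOM = 0.25     # assumed fraction reserved for rebalance/recovery

usable_pb = RAW_PB / REPLICAS * (1 - HEADROOM)
print(f"{usable_pb:.1f} PB usable")
```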

The software is the Swift OpenStack open source object storage infrastructure. This is hardware agnostic, so future hardware procurement won't be so constrained.
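The hardware-agnosticism comes from Swift's "ring": object names are hashed to partitions, and partitions are mapped to devices spread across zones, so any commodity box can join the cluster. The sketch below is a simplification of that idea, not Swift's actual implementation (real Swift uses weighted partition assignment over a prebuilt ring file), and the zone names are just stand-ins for the five OLRC sites.

```python
# Illustrative sketch of Swift-style ring placement: hash the object
# name to a partition, then map the partition to replica devices in
# distinct zones. A simplification, not Swift's real ring code.

import hashlib

ZONES = ["Toronto-1", "Toronto-2", "Ottawa", "Kingston", "Guelph"]
PART_POWER = 8      # 2**8 = 256 partitions
REPLICAS = 3

def partition(name: str) -> int:
    """Hash an object name to a partition number (top PART_POWER bits)."""
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) >> (128 - PART_POWER)

def replica_zones(name: str) -> list[str]:
    """Deterministically pick REPLICAS distinct zones for an object."""
    part = partition(name)
    return [ZONES[(part + i) % len(ZONES)] for i in range(REPLICAS)]

zones = replica_zones("AUTH_test/container/article-0001.pdf")
print(zones)   # three distinct sites for the three replicas
```

Because placement depends only on the hash and the ring, swapping the underlying disks or vendors changes nothing about how clients address objects.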

The partners initially set up a test network with three nodes and ran tests to see what the impact of Bad Things happening would be on the network. The bottom line is that 1G Ethernet is the absolute minimum you need - recovering from the loss of a 48TB shelf over a 1G link took 8 days.
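The 8-day figure passes a sanity check: moving 48TB over a 1Gb/s link takes about 4.4 days at full wire speed, so 8 days implies only a bit over half of the theoretical throughput once replication traffic, protocol overhead, and disk contention take their cut. A quick calculation:

```python
# Sanity-check the 8-day recovery of a 48 TB shelf over a 1 Gb/s link.
# Ideal transfer time is ~4.4 days, so the observed 8 days implies
# roughly half the wire-speed throughput was achieved in practice.

SHELF_BYTES = 48e12    # 48 TB shelf
LINK_BPS = 1e9         # 1 Gb/s link
OBSERVED_DAYS = 8

ideal_days = SHELF_BYTES * 8 / LINK_BPS / 86400
efficiency = ideal_days / OBSERVED_DAYS

print(f"ideal: {ideal_days:.1f} days, effective utilization: {efficiency:.0%}")
```

That makes 1G a hard floor, and explains the choice of 10G links for the production deployment.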

As I've been pointing out for some time, archives are going through a major change in access pattern. Scholars used to access a few individual items from an archive, but increasingly they want to mine the entire corpus. OLRC realized that they needed to provide this capability, so they added another 5 R720xd servers as a compute cluster.

Now that the system is up and running, they have actual costs to work with. OLRC's next step is to figure out pricing, which they are confident will be significantly less than commercial clouds. I will be very interested to follow their progress - this is exactly how Universities should collaborate to build affordable infrastructure.

1 comment:

  1. For what it's worth, after working pretty deeply in the OpenStack Swift system for about a year, I'm a bit less sanguine about its use as an agnostic backend for archival stores (I worked on the Seagate/Evault LTS2 project, which initially used it as a backend).

    It has a lot of funny quirks and a hacked-together design that make it dangerous in a way. It has almost too much passive safety: without a lot of monitoring it will keep chugging along accepting data even when half the disks in the system have failed.

    That said, it's open source and has a (relatively) straightforward architecture at a high level.