Thursday, January 10, 2019

Digital Preservation Network Is No More

In Why Is the Digital Preservation Network Disbanding? Roger Schonfeld examines the demise of the Digital Preservation Network, which was announced last month:
An initial announcement said directly that "After careful analysis of the Digital Preservation Network's membership, operating model, and finances, the Board of Trustees of DPN passed a resolution to affect an orderly wind-down of DPN," including committing to consultations with each member to ensure that content would not be lost in the wind-down. Shortly thereafter, messages came out from DPN's hubs, both individually including HathiTrust, and collectively, characterizing their operating and financial strength and ability to provide for an orderly transition. Because DPN was not itself directly preserving anything but rather a broker for preservation services by underlying repositories, it does not appear that any content will be put at risk.
Below the fold, I look at various views of the lessons to be learned.

I'm often critical of Ithaka's and Schonfeld's work, so it is important to start by saying that I agree with much of what he wrote. He starts thus:
The vision for the Digital Preservation Network (DPN) was outlined in an early overview that illustrates much of the founders' thinking. It was established to solve two problems at once. First, its founders observed that not all preservation services at the time were as resilient as one might hope, and so, "the heart of DPN is a commitment to replicate the data and metadata of research and scholarship across diverse software architectures, organizational structures, geographic regions, and political environments." Second, as far too little scholarly content was being preserved, DPN would also enable existing preservation capacity to be utilized for a wider array of purposes, recognizing that, "once that infrastructure is in place, it can be extended at much lower marginal costs." In one sense, DPN thereby offered an elegant technical solution. But as elegant as it may have been technically, its product offering was never as clear as it could have been. And as much as it accomplished, it ultimately could not be sustained.
Schonfeld is correct that diversity was a theme of DPN from the start, but he doesn't have the full story. I took part in the very first meeting that led to the DPN. It was hosted on his mega-yacht by the leader of an organization that held a vast video collection of significant academic interest, which he was anxious to see preserved for posterity. The leader was very wealthy, with the kind of donor potential that ensured the attendance of a number of major University Librarians including James Hilton (Michigan) and Michael Keller (Stanford). I was one of the few technical people in the meeting. We pointed out that the scale of the video collection meant that the combined resources of the assembled libraries could not possibly store even a single copy, much less the multiple copies needed for robust preservation.

Each of the libraries represented had made significant investments in establishing an institutional repository, which was under-utilized due to the difficulty of persuading researchers to deposit materials. With the video collection out of the picture as too expensive, the librarians seized on diversity as the defense against the monoculture threat to preservation. In my view there were two main reasons:
  • Replicating pre-ingested content from other institutions was a quicker and easier way to increase the utilization of their repository than educating faculty.
  • Jointly marketing a preservation service that, through diversity, would be more credible than those they could offer individually was a way of transferring money from other libraries' budgets to their repositories' budgets.
Alas, this meant that the founders' incentives were not aligned with their customers'. Despite this, the marketing part worked well. As Schonfeld writes:
It was comparatively easy to get several dozen libraries to sign up to pay a $20,000 annual membership fee, especially after Hilton met with AAU presidents and pitched an early vision of DPN to them. One reader suggested to me that initial sign-ups may have been more out of courtesy or community citizenship than commitment.
The reader was right. Another example of this phenomenon is the contrast between LOCKSS, marketed to librarians, and the greater participation in Ithaka's Portico, marketed initially to University presidents by Bill Bowen, ex-President of Princeton and ex-President of the Andrew W. Mellon Foundation.

Schonfeld lists reasons for DPN's failure, with most of which I concur:
  • "[DPN] had a strong technical vision, but a clear product offering took time to emerge and the value proposition was not uniformly understood." I disagree that DPN's technical vision was strong. Given the commitment to infrastructure diversity, it was limited to implementing transfer of content and metadata, and fixity auditing between diverse mutually trusting repositories. Even this limited capability took far too long to achieve production, especially given that the basic functions were widely available off-the-shelf. As always, reconciling diverse metadata standards was a stumbling block, but this is more of an organizational than a technical issue. Members cannot be faulted for observing that a "clear product offering" was taking far too long to reach production.
  • "CIOs and others were comfortable that cloud solutions were secure enough for almost all purposes. ... DPN and its members were, ... unsuccessful in distinguishing the added value of a preservation solution from cloud storage." This is a continuing problem, compounded by accounting systems that subject (large but intermittent) capital expenses to more severe scrutiny than (smaller but recurring) operational expenses that, capitalized over an appropriate planning horizon, may be significantly larger. I will return to it in the report on cloud storage on which I'm about to start work.
  • "While memberships have proved to be a durable way to fund certain kinds of "clubs" such as professional organizations, they seem to be misaligned when there is a need to match value with actual products or services. Membership organizations necessarily seek a degree of consensus in their governing and cross-subsidization in their fee structures, while product organizations need to be able to deliver a clear solution to a well-defined problem for a reasonable price and to adapt their approach aggressively as marketplace conditions dictate." Cameron Neylon wrote the book (or rather the blog post) on this problem in 2016 with Squaring Circles: The economics and governance of scholarly infrastructures, and I commented at length here. Neylon wrote:
    Membership models can work in those cases where there are club goods being created which attract members. Training experiences or access to valued meetings are possible examples. In the wider world this parallels the "Patreon" model where members get exclusive access to some materials, access to a person (or more generally expertise), or a say in setting priorities. Much of this mirrors the roles that Scholarly Societies play or at least could play.
    In summary, DPN's "club goods" were not seen as justifying the membership dues. Neylon subsequently expanded his post into Sustaining Scholarly Infrastructures through Collective Action: The Lessons that Olson can Teach us. In a fascinating read he applies the analysis of Mancur Olson's The Logic of Collective Action: Public Goods and the Theory of Groups.
  • [Figure: British Library Real Income]
    "the academic community, and its libraries in particular, have moved into a period of more rigorous value assessment and more strategic resource allocation, ... the library and scholarly communications communities have created more small independent collaborative organizations than we can possibly sustain. We need, and are experiencing, consolidation." This is the crux of the matter. Preservation Is Not A Technical Problem, it is an economic problem. We know how to do it, we just don't want to pay enough from continually decreasing library discretionary funds to have it done that way. DPN lacked Portico's focused marketing pitch, lacked LOCKSS' focus on cost minimization, and lacked both of their business models' justification of protecting investment in expensive paywalled content. Even once DPN achieved production it was little used and thus vulnerable:
    only 27 members ever deposited content into DPN.
    DPN’s financial model needed members to submit additional content beyond 5 TB annual deposit to succeed. Only 2 of 60 members did.
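The "fixity auditing between diverse mutually trusting repositories" mentioned in the first item above can be illustrated with a minimal sketch. This is not DPN's implementation. It assumes that each node holds a full copy of an object, that SHA-256 is the fixity algorithm, and that the majority digest is taken as correct (reasonable only because the nodes trust each other). The node names and the function names `fixity_digest` and `audit_replicas` are hypothetical.

```python
import hashlib
from collections import Counter


def fixity_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest used here as the fixity value."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> dict[str, bool]:
    """Map each node name to True if its copy matches the majority digest.

    Assumption: nodes are mutually trusting, so a simple majority vote
    is taken as the reference value.
    """
    digests = {node: fixity_digest(content) for node, content in replicas.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return {node: digest == majority_digest for node, digest in digests.items()}


# Hypothetical replicas of one object held at three nodes; node_c is damaged.
replicas = {
    "node_a": b"scholarly object v1",
    "node_b": b"scholarly object v1",
    "node_c": b"scholarly object v1 (bit rot)",
}
report = audit_replicas(replicas)
damaged = sorted(node for node, ok in report.items() if not ok)
print(damaged)  # ['node_c']
```

A real audit would stream large files rather than hold them in memory, and would repair the damaged copy from a good one. The sketch covers only the comparison step, which is the part that is widely available off-the-shelf.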
Schonfeld is certainly right that a necessary consolidation is under way:
Consolidation in this sector can arise in one of two ways - through pivots and shut-downs as services decline and organizations fail, or through value-creating reorganizations and mergers among organizations that are doing fine but should be able to deliver more.
But I think he underestimates the risks. The differences in quality between different preservation services are evident only in the long term; the difference in price is evident immediately. And, importantly:
While research universities and cultural heritage institutions are innately long-running, they operate on that implicitly rather than by making explicit long-term plans.
This gives rise to a Gresham's Law of preservation, in which low-quality services, economizing on replication, metadata, fixity checks and so on, out-compete higher-quality services. Services such as DPN and DuraCloud, which act as brokers layered on top of multiple services and whose margins are thus stacked on those of the underlying services, find it hard to deliver enough value to justify their margins, and are strongly incentivized to use the cheapest among the available services.
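The effect of stacked margins can be shown with a few lines of arithmetic. The sketch below is purely illustrative: the base cost and both margin figures are invented for the example and do not describe DPN, DuraCloud, or any real service, and `stacked_price` is a hypothetical helper.

```python
def stacked_price(base_cost: float, margins: list[float]) -> float:
    """Apply each layer's margin in turn on top of the previous layer's price."""
    price = base_cost
    for margin in margins:
        price *= 1.0 + margin
    return price


# Hypothetical figures: a storage cost of 100 per TB per year,
# a 20% margin for the underlying service, and a 15% margin for the broker.
direct = stacked_price(100.0, [0.20])          # buying from the underlying service
brokered = stacked_price(100.0, [0.20, 0.15])  # buying through the broker

print(f"direct: {direct:.2f}, brokered: {brokered:.2f}")
print(f"broker premium: {brokered - direct:.2f}")
```

With these made-up numbers the customer pays about 138 through the broker against 120 directly. The broker has to deliver at least that difference in visible value, or cut the gap by choosing a cheaper underlying service.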

The point of DPN was to use organizational and technological diversity to mitigate monoculture risk. Unfortunately, as Brian Arthur's 1994 Increasing Returns and Path Dependence in the Economy described, technology markets have increasing returns to scale. Thus, as services scale up, diversity becomes an increasingly expensive luxury, and monoculture risk increases. Fortunately, the winner then acquires too-big-to-fail robustness (see Amazon).
