The Internet Archive has by far the largest archive of Web content, but its preservation leaves much to be desired. The collection is mirrored between San Francisco and Richmond in the Bay Area, both uncomfortably close to the same major fault systems. There are partial copies in the Netherlands and Egypt, but they are not synchronized with the primary systems.
Now, Andrea Goethals and her co-authors from the IIPC Preservation Working Group have a paper entitled "Facing the Challenge of Web Archives Preservation Collaboratively" that reports on a survey of Web archives' preservation activities in the following areas: Policy, Access, Preservation Strategy, Ingest, File Formats and Integrity. They conclude:
This survey also shows that long term preservation planning and
strategies are still lacking to ensure the long term preservation of web
archives. Several reasons may explain this situation: on one hand, web
archiving is a relatively recent field for libraries and other heritage
institutions, compared for example with digitization; on the other hand,
web archives preservation presents specific challenges that are hard to
meet.
I discussed the problem of creating and maintaining a remote backup of the Internet Archive's collection in "The Opposite of LOCKSS". The Internet Archive isn't alone in having less-than-ideal preservation of its collection. It's clear that the major challenges are the storage and bandwidth requirements of Web archiving, and their rapid growth. Given the limited resources available and the inadequate reliability of current storage technology, prioritizing collecting more content over preserving the content already collected is appropriate.