The Internet Archive has by far the largest archive of Web content
but its preservation leaves much to be desired. The collection is mirrored between San Francisco and Richmond in the Bay Area, both uncomfortably close to the same major fault systems. There are partial copies in the Netherlands and Egypt, but they are not synchronized with the primary systems.
Now, Andrea Goethals and her co-authors from the IIPC Preservation Working Group
have a paper entitled Facing the Challenge of Web Archives Preservation Collaboratively
that reports on a survey of Web archives' preservation activities in the following areas: Policy, Access, Preservation Strategy, Ingest, File Formats and Integrity. They conclude:
This survey also shows that long term preservation planning and
strategies are still lacking to ensure the long term preservation of web
archives. Several reasons may explain this situation: on one hand, web
archiving is a relatively recent field for libraries and other heritage
institutions, compared for example with digitization; on the other hand,
web archives preservation presents specific challenges that are hard to
I discussed the problem of creating and maintaining a remote backup of the Internet Archive's collection in The Opposite of LOCKSS. The Internet Archive isn't alone in having less-than-ideal preservation of its collection. It's clear the major challenges are the storage and bandwidth requirements for Web archiving, and their rapid growth. Given the limited resources available, and the inadequate reliability of current storage technology, prioritizing collecting more content over preserving the content already collected is appropriate.