Thursday, December 12, 2013

UK National Archive

Joe Fay at The Register has an interesting piece about a tour of the UK National Archive.

The archive has an excellent and comprehensive approach to preserving the UK government's Web presence:
It uses a crawler to trawl the UK government’s web estate, aiming to hit sites every six months. With the government looking to shutter many obscure or unloved sites, the pressure is on. The web archive currently stands at around 80TB, with the crawler pulling in 1.6TB a month. At time of writing, there are 3 billion urls in the archive, with 1 billion captured last year alone.
But does anyone really care? Seems like they do. Espley said the archive gets around 15 to 20 million page views a month. This often maps to current events - the assumption being that visitors are often cross checking current government positions/statements against previous positions.
One must hope that the cross-checking doesn't turn up anything embarrassing enough to imperil the Archive's budget ...
