Friday, January 18, 2008

Digital Preservation for the "Google Generation"

A study by researchers at University College London, sponsored by the British Library and JISC, supports a point I've been making since the start of the LOCKSS program nearly a decade ago.
"The report Information Behaviour of the Researcher of the Future (PDF format; 1.67MB) also shows that research-behaviour traits that are commonly associated with younger users – impatience in search and navigation, and zero tolerance for any delay in satisfying their information needs – are now becoming the norm for all age-groups, from younger pupils and undergraduates through to professors."

What this means for digital preservation is that transparency of access is essential. Readers don't have the patience or the attention span to jump through hoops to obtain access to preserved materials. The Web is training users that if they click on a link and nothing happens within about 10 seconds, they should forget that link and click elsewhere. Those few seconds are all a digital preservation system has to satisfy its readers.

If preserved materials are not instantly available through readers' normal finding techniques, primarily search engines such as Google, they will not be used. We made this observation in late 1998 during the initial design of the LOCKSS system. It motivated us to make access to preserved content completely transparent, by having an institution's LOCKSS box behave as a persistent Web cache. Content thus remains instantly available at its original URL. From our 2000 Freenix paper (pdf):
"Unless links to pages continue to resolve, the material will effectively be lost because no-one will have the knowledge or patience to retrieve it."
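The persistent-Web-cache idea can be illustrated with a minimal Python sketch. It assumes a simplified model in which the cache first tries the publisher's original URL and, if that request fails or times out, serves the preserved copy stored under the same URL, so the reader's link keeps resolving. The names `PersistentWebCache`, `fetch_from_origin` and `preserve` are hypothetical, the 10-second timeout simply echoes the attention span mentioned above, and this is not the actual LOCKSS implementation, which does considerably more (collecting the content and checking its integrity, for example).

```python
import urllib.request
from typing import Callable, Dict, Optional


def fetch_from_origin(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a page from the publisher's site.

    Raises OSError (urllib's URLError/HTTPError and socket timeouts are
    all OSError subclasses) if the site is down, slow, or missing the page.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


class PersistentWebCache:
    """Hypothetical sketch: serve content at its original URL, falling back
    to a preserved copy when the origin can no longer supply it."""

    def __init__(self, fetch: Callable[[str], bytes] = fetch_from_origin) -> None:
        self._fetch = fetch
        # Preserved copies, keyed by the content's original URL.
        self._preserved: Dict[str, bytes] = {}

    def preserve(self, url: str, content: bytes) -> None:
        """Record a preserved copy under the URL readers already link to."""
        self._preserved[url] = content

    def get(self, url: str) -> Optional[bytes]:
        """Return live content if the origin answers, else the preserved copy.

        Returns None only if the origin fails and nothing was preserved.
        """
        try:
            return self._fetch(url)
        except OSError:
            return self._preserved.get(url)


if __name__ == "__main__":
    # Simulate a publisher whose site has disappeared.
    def dead_publisher(url: str) -> bytes:
        raise OSError("publisher site is gone")

    cache = PersistentWebCache(fetch=dead_publisher)
    cache.preserve("http://journal.example/vol1/article1.html", b"<html>article</html>")
    print(cache.get("http://journal.example/vol1/article1.html"))
```

Because the fallback is keyed by the original URL, nothing changes from the reader's point of view: the same link, found through the same search engine, still returns the page.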

Dark archives are thus not useful to the general readership. Although they may provide useful insurance for the content, their complex and time-consuming access methods mean that readers will require a separate access copy. For small collections, the cost of an additional copy is insignificant. But for collections large enough that the cost of storage matters, maintaining that extra copy is a problem.

2 comments:

Anonymous said...

"If preserved materials are not instantly available through their normal finding techniques, primarily search engines such as Google, they will not be used." While this is likely true for the general reader, it is quite terrifying for the future of research to think that it might become true for research.

Research is hard; PhDs are not supposed to be trivial, point and click activities. Only a small proportion of relevant research data (depending on field, admittedly) exists or is even catalogued in the digital domain. To do research properly will surely require patient pursuit of relevant resources wherever they are. In the future, as the past, many may be missed, perhaps to be discovered later, perhaps not.

I think I'm trying to say that preservation is potentially valuable for researchers whether it can meet the "instantly accessible" standard or not!

Anonymous said...

This is the case with all forms of research, accessible by whatever medium, and not just the Internet or Google.

Books, scrolls and parchments in a remote library in a war-torn country that have not been committed to other forms of media are in danger of being lost forever, not just to the lazy researcher but to the dedicated one as well.

The problem here seems two-pronged: the sheer volume of work still to be done in committing historical data to a digital medium, and then nurturing a mindset where people look beyond the first results in a search engine for the information they want.

Again, I raise my hat to you in admiration of the work you do.