The Internet Archive, the Wikimedia Foundation, and volunteers from the Wikipedia community have now fixed more than one million broken outbound web links on English Wikipedia. This has been done by the Internet Archive's monitoring for all new, and edited, outbound links from English Wikipedia for three years and archiving them soon after changes are made to articles. This combined with the other web archiving projects, means that as pages on the Web become inaccessible, links to archived versions in the Internet Archive's Wayback Machine can take their place. This has now been done for the English Wikipedia and more than one million links are now pointing to preserved copies of missing web content.
This is clearly a good thing, but follow me below the fold.
The lead developers of the bot that has been fixing up the links are Maximilian Doerr and Stephen Balbach:
As a result of their work, in close collaboration with the non-profit Internet Archive and the Wikimedia Foundation's Wikipedia Library program and Community Tech team, now more than one million broken links have been repaired. For example, footnote #85 from the article about Easter Island, now links to the Wayback Machine instead of a now-missing page.
Footnote 85 on the Easter Island page now looks like this:
However, Alfred Metraux pointed out that the rubble filled Rapanui walls were a fundamentally different design to those of the Inca, as these are trapezoidal in shape as opposed to the perfectly fitted rectangular stones of the Inca. See also this FAQ at the Wayback Machine (archived 11 October 2007)
The URL that has been updated to point to an archived copy is:
https://web.archive.org/web/20071011083729/http://islandheritage.org/faq.html
This shows that the Internet Archive collected the page:
http://islandheritage.org/faq.html
on 11 October 2007.
This is clearly a big improvement, as Mark Graham writes:
"What Max and Stephen have done in partnership with Mark Graham at the Internet Archive is nothing short of critical for Wikipedia's enduring value as a shared repository of knowledge. Without dependable and persistent links, our articles lose their backbone of reliable sources. It's amazing what a few people can do when they are motivated by sharing - and preserving - knowledge," said Jake Orlowitz, head of the Wikipedia Library.
"Having the opportunity to contribute something big to the community with a fun task like this is why I am a Wikipedia volunteer and bot operator. It's also the reason why I continue to work on this never-ending project, and I'm proud to call myself its lead developer," said Maximilian, the primary developer and operator of InternetArchiveBot.
But wiring the Internet Archive in as the only source of archived Web pages, while expedient in the short term, is also a problem. It is true that the Wayback Machine is by far the largest repository of archived URLs, but research using Memento (RFC 7089) has shown that significantly better reproduction of archived pages can be achieved by aggregating all the available Web archives.
Reinforcing the public perception that the Wayback Machine is the only usable Web archive reduces the motivation for other institutions, such as national libraries, to maintain their own Web archiving efforts. Given the positive effects of aggregating even relatively small Web archives, this impairs the quality of the reader's experience of the preserved Web, and thus Wikipedia.
Perhaps at some point the InternetArchiveBot could be replaced by a MementoBot that inserted links to a Memento aggregator instead of directly to the Wayback Machine. The Wayback Machine would still be the source for most of the broken link replacements, but more links would resolve. Other Web archives would get credit for their efforts, in the cooperative spirit of Brewster Kahle's "Building Libraries Together".
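To make the idea concrete, here is a minimal sketch of the rewriting step such a hypothetical MementoBot might perform. It takes a Wayback Machine link like the Easter Island one above, recovers the original URL and capture time, and builds a Memento TimeGate request with the Accept-Datetime header defined in RFC 7089. The function names are invented for illustration, the aggregator endpoint is an assumption based on the public Time Travel service, and the sketch assumes the 14-digit Wayback timestamp is in UTC. It is not how InternetArchiveBot works.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: TimeGate endpoint of a Memento aggregator (here the public
# Time Travel service). Any RFC 7089 TimeGate could be substituted.
AGGREGATOR_TIMEGATE = "http://timetravel.mementoweb.org/timegate/"

# Wayback Machine links embed a 14-digit capture timestamp and the original URL.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(url):
    """Split a Wayback Machine URL into (capture datetime, original URL)."""
    match = WAYBACK_RE.match(url)
    if match is None:
        raise ValueError("not a Wayback Machine capture URL: " + url)
    # Assumption: the timestamp is YYYYMMDDhhmmss in UTC.
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured.replace(tzinfo=timezone.utc), match.group(2)


def timegate_request(wayback_url, timegate=AGGREGATOR_TIMEGATE):
    """Return (URL, headers) asking an aggregator for the closest memento."""
    captured, original = parse_wayback_url(wayback_url)
    # RFC 7089 Accept-Datetime uses the HTTP date format, always in GMT.
    headers = {"Accept-Datetime": format_datetime(captured, usegmt=True)}
    return timegate + original, headers


if __name__ == "__main__":
    url, headers = timegate_request(
        "https://web.archive.org/web/20071011083729/"
        "http://islandheritage.org/faq.html"
    )
    print(url)
    print(headers)
```

The returned URL and headers can be handed to any HTTP client. A TimeGate would be expected to redirect to whichever archive holds the capture closest to the requested date, which in most cases would still be the Wayback Machine.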
[Update - Blogger appears to have bloggered the link to Mark Graham's post, so I fixed it. Sorry about that.]