Saturday, October 13, 2007

Who's looking after the snowman?

In a post to the liblicense mailing list James O'Donnell, Provost of Georgetown University, asks:

"So when I read ten years from now about this cool debate among Democratic candidates that featured video questions from goofy but serious viewers, including a snowman concerned about global warming, and people were watching it on YouTube for weeks afterwards: how will I find it? Who's looking after the snowman?"

This is an important question. Clearly, future scholars will not be able to understand the upcoming election without access to YouTube videos, blog posts and other ephemera. In this particular case, I believe there are both business and technical reasons why Provost O'Donnell can feel somewhat reassured, and legal and library reasons why he should not. Follow me below the fold for the details.

Here is the pre-debate version of the snowman's video, and here are the candidates' responses. CNN, which broadcast the debate, has the coverage here. As far as I can tell the Internet Archive doesn't collect videos like these.

From a business point of view, YouTube videos are a business asset of Google, and will thus be preserved with more than reasonable care and attention. As I argued here, content owned by major publishing corporations (which group now includes Google) is at very low risk of accidental loss; the low and rapidly decreasing cost per byte of storage makes the business decision to keep it available rather than take it down a no-brainer. And that is ignoring the other aspects of the Web's Long Tail economics which mean that the bulk of the revenue comes from the less popular content.

Technically, YouTube video is Flash Video. It can easily be downloaded, for example by this website. The content is in a widely used web format that has at least two open-source players (MPlayer and VLC). It is thus perfectly feasible to preserve it, and for the reasons I describe here the open-source players make it extraordinarily unlikely that it would not be possible to play the video in 10, or even 30, years. If someone collects the video from YouTube and preserves the bits, it is highly likely that the bits will be viewable indefinitely.
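As a rough illustration of what "preserving the bits" gives a future reader to work with, here is a minimal sketch that checks whether a downloaded file starts with a Flash Video header. It assumes the publicly documented FLV layout: a 3-byte "FLV" signature, a version byte, a flags byte (0x04 for audio, 0x01 for video) and a 4-byte big-endian header size. The function name and the sample bytes are hypothetical, chosen only for this example; this is not the tool used by any of the sites or players mentioned above.

```python
import struct


def parse_flv_header(data: bytes) -> dict:
    """Parse the 9-byte header at the start of a Flash Video (FLV) file."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes for an FLV header")
    # ">" means big-endian with no padding: 3-byte signature, version,
    # flags, then a 4-byte header size (3 + 1 + 1 + 4 = 9 bytes).
    signature, version, flags, header_size = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file: missing 'FLV' signature")
    return {
        "version": version,
        "has_audio": bool(flags & 0x04),
        "has_video": bool(flags & 0x01),
        "header_size": header_size,
    }


# Hypothetical sample header: version 1, audio and video present, 9-byte header.
sample = b"FLV\x01\x05\x00\x00\x00\x09"
info = parse_flv_header(sample)
print(info)
```

A check like this only confirms that the file claims to be Flash Video. Whether it actually plays still depends on the codecs inside it and on a working player.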

But will anyone other than Google actually collect and preserve the bits? Provost O'Donnell's library might want to do so, but the state of copyright law places some tricky legal obstacles in the way. Under the DMCA, preserving a copy of copyrighted content requires the copyright owner's permission. Although I heard rumors that CNN would release video of the debate under a Creative Commons license, on their website there is a normal "All Rights Reserved" copyright notice. And on YouTube, there is no indication of the copyright status of the videos. A library downloading the videos would have to assume it didn't have permission to preserve them. It could follow the example of the Internet Archive and depend on the "safe harbor" provision, complying with any "takedown letters" by removing them. This is a sensible approach for the Internet Archive, which aims to be a large sample of the Web, but not for the kind of focused collections Provost O'Donnell has in mind.

The DMCA, patents and other IP restrictions place another obstacle in the way. I verified that an up-to-date Ubuntu Linux system using the Totem media player plays downloaded copies of YouTube videos very happily. Totem uses the GStreamer media framework with plugins for specific media. Playing the YouTube videos used the FFmpeg library. As with all software, it is possible that some patent holder might claim that it violated their patents, or that in some way it could be viewed as evading some content protection mechanism as defined by the DMCA. As with all open source software, there is no indemnity from a vendor against such claims. Media formats are so notorious for such patent claims that Ubuntu segregates many media plugins into separate classes and provides warnings during the install process that the user may be straying into a legal gray area. The uncertainty surrounding the legal status is carefully cultivated by many players in the media market, as it increases the returns they may expect from what are, in many cases, very weak patents and content protection mechanisms. Many libraries judge that the value of the content they would like to preserve doesn't justify the legal risks of preserving it.


James J. O'Donnell said...

David well outlines some of the issues that face preservation efforts, wherever they come from. Let me highlight one more.

YouTube is interested in page views and maximum number of page views and links through whatever video you happen to be looking at. Well and good, and good luck to them: if I enjoy seeing the stuff that attracts my attention, then I'm happy to play along with this, remembering Richard Lanham's work on *The Economics of Attention* in the process.

But the traditional library function recognizes the original commercial value of information objects *and* goes on to something else. Looking at 2007 YouTube 20 years from now doesn't play along with that original business plan, but has a historical and spectatorial purpose. Not, hey, what a cool snowman; but, hmm, and just why did snowmen become important icons in politics in 2007 and how were they used? The library function is one of a place where all sorts of originally commercial objects get used in ways that go beyond the business plan of the original producer.

Now, if we believe the long tail argument, then YouTube may have a business plan 20 years from now in keeping this old stuff around and accessible. Or perhaps not. The question for the snowman, poor abused snowman, would then be: do you feel lucky? If so, trust YouTube. But I think the snowman got on that screen because he wasn't feeling especially lucky, wasn't feeling that he can just trust the aggregate collection of economic impulses of his contemporaries to make things all work out for the good.

Jim O'Donnell

(Eugippius was a sixth-century scholar and monk, in case you wonder.)

David. said...

I should have used Google earlier. It turns out that a team at the University of North Carolina is in fact selecting and collecting YouTube video of the 2008 Presidential campaign. They describe their system in this paper (PDF).

However, their focus is on the process of identifying and selecting suitable videos. Their paper ignores the issues of preservation.

The paper also fails to make the economic case for expending resources on collecting and preserving video when there is no convincing reason to believe it won't remain available from YouTube indefinitely. To steal Eugippius's words from the comment above, they don't think the snowman is lucky. I should stress that I believe the case can be made, but it is not a slam dunk and it does not depend on knowing whether Google will continue to make the video available. Maybe you trust a single archive to control history. Or maybe you are worried about future Winston Smiths, whether corporate or governmental.

James J. O'Donnell said...

And meanwhile, Google themselves just made the rights management issue harder. If you got those snowman clips from CNN or another provider that claims copyright, well . . .:

Google Takes Step on Video Copyrights
Published: October 16, 2007, New York Times

"SAN BRUNO, Calif., Oct. 15 — Google is seeking to put an end to the copyright wars over online video.

"On Monday, the company unveiled a long-anticipated system that, if effective, would allow media companies to prevent their clips from being uploaded to YouTube without permission.

"Whether the system will work well enough to satisfy media companies who have been irked by the proliferation of unauthorized copyrighted clips on YouTube is not yet clear. But if successful, the system, which Google is offering to all media companies, could usher in a detente between them and Google."

Jim O'Donnell

Anonymous said...

Should we be sanguine about any one entity preserving content, even if that entity has absolutely guaranteed resources and procedures to do the preservation? Even if that company is relatively well liked and has happy slogans like "don't be evil"?

What if their idea of "evil" is different than someone else's? What if their expanded profitability depends on playing along with other powers that might be, arguably, evil?

I don't think these scenarios are all that far-fetched considering the major search engines agreeing to censor themselves in trade for making economic inroads in China (comparing image search results for Tiananmen Square gives a graphic example). How about phone companies supplying information to the US government without warrants?

Perhaps some individuals with copies of a video on their hard drives can supply a memory that has become unpopular or profitability-destroying for a company.

I'd rather see some robustness against political, social and economic whims institutionalized, rather than having to rely on a diaspora of individuals to allow access to ideas or a cultural heritage, however.