- Some of them are data. Some data is just facts, and so is not subject to copyright. In some jurisdictions, collections of facts are protected by copyright. In Europe, databases are covered by database right, which is distinct from copyright.
- The copyright releases signed by authors differ, and the extent to which they cover supplemental materials may not be clear.
For material that is subject to copyright, we strongly encourage use of Creative Commons licenses. They permit all activities required for preservation without consultation with the publisher. The legal risks of interpreting other license terms as permitting these activities without explicit permission are considerable, so even if the material was released under some other license terms, we would generally prefer not to depend on them and would seek explicit permission from the publisher instead. Obtaining explicit permission from the publisher is time-consuming and expensive. So is having a lawyer analyze the terms of a new license to determine whether it covers the required activities.
Efforts, such as those we cite in the article, are under way to develop suitable licenses for data, but they have yet to achieve even the limited penetration of Creative Commons for copyright works. Until there is a simple, clear, widely accepted license in place, difficulties will lie in the path of any broad approach to preserving supplemental materials, especially data. Creating such a license will be a more difficult task than Creative Commons faced, since it will not be able to draw on the firm legal foundation of copyright. Note that the analogs of Creative Commons licenses for software, the various Open Source licenses, are also based on copyright.
When and if suitable licenses become common, one or more machine-readable ways to identify content published under the licenses will be useful. We're agnostic as to how this is done; the details will have little effect on the archiving process once we have implemented a parser for the machine-readable rights expressions that we encounter. We have already done this using the various versions of the Creative Commons license for the Global LOCKSS Network.
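As a rough illustration of what such a parser might look like, the sketch below scans an HTML page for links marked `rel="license"`, the convention Creative Commons recommends for its machine-readable markup, and checks whether any of them points at a license on a small allowlist. This is not the LOCKSS implementation; the function names, the allowlisted host, and the `/licenses/` path prefix are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist: hosts and path prefixes of licenses that a
# human lawyer has already reviewed. These values are assumptions.
REVIEWED_HOSTS = {"creativecommons.org"}
REVIEWED_PATH_PREFIXES = ("/licenses/",)


class LicenseLinkParser(HTMLParser):
    """Collects the href of every <a> or <link> tagged rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be absent.
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "license" in rel_tokens and href:
            self.license_urls.append(href)


def find_license_urls(html):
    """Return every license URL the page points to, in document order."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.license_urls


def is_reviewed_license(url):
    """True only if the URL names a license on the reviewed allowlist."""
    parts = urlparse(url)
    return (parts.hostname in REVIEWED_HOSTS
            and parts.path.startswith(REVIEWED_PATH_PREFIXES))


def may_preserve(html):
    """True if the page declares at least one reviewed license."""
    return any(is_reviewed_license(u) for u in find_license_urls(html))
```

Note that the code never tries to interpret license terms. It only recognizes a pointer to a license text and asks whether that text is one already approved by a person, so any unrecognized license falls back to seeking explicit permission.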
The idea of a general "rights language" that would express the terms of a wide variety of licenses in machine-readable form is popular. But it is not a panacea. If there were a wide variety of license terms, even if they were encoded in machine-readable form, we would be reluctant to depend on them. There are few enough Creative Commons licenses, and they are simple enough, that they can be reviewed and approved by human lawyers. It would be too risky to depend on software interpreting the terms of licenses that had never had this review. So a small set of simple, clear licenses is essential for preservation. Encoding these licenses in machine-readable form is a good thing. That is what the machine-readable form of the Creative Commons license does; it does not express the specific rights but simply points to the text of the license in question.
Encoding the specific terms of a wide variety of complex licenses in a rights language is much less useful. The software that interprets these encodings will not end up in court, nor will the encodings. The archives that use the software will end up in court facing the text of the license in a human language.