The CLOCKSS Archive is a dark archive of e-journal and e-book content, jointly managed by publishers and libraries, implemented using the LOCKSS technology and operated on behalf of the CLOCKSS not-for-profit by the LOCKSS team at the Stanford Library. For well over a year the LOCKSS team and CLOCKSS management have been preparing for and undergoing the Trustworthy Repositories Audit and Certification (TRAC) process for the CLOCKSS Archive with the Center for Research Libraries (CRL).
CRL just released the Certification Report on the CLOCKSS Archive. I'm happy to report that our work was rewarded with an overall score that equals the previous best, and the first ever perfect score in the "Technologies, Technical Infrastructure, Security" category. We are grateful for this wonderful endorsement of the LOCKSS technology.
In the interests of transparency the LOCKSS team have released all the non-confidential documentation submitted during the audit process. As you will see, there is a lot of it. What you see at the link is not exactly what we submitted. It has been edited to correct errors and obscurities we found during the audit, and to add material from the confidential part of the submission that we decided was not really confidential. These documents will continue to be edited as the underlying reality changes, to keep them up-to-date and satisfy one of the on-going requirements of the certification.
This is just a news item. In the near future I will follow up with posts describing the process of being audited, what we did to make the process work, and the lessons we learned that may be useful for future audits.
Update: the post describing the audit process is here and the post discussing the lessons to be drawn is here.
Thank you very much for opening your documentation! This will be an enormously useful classroom tool.
David, I was referred to you as an expert on bit rot. Given today's RAID 10/6 technologies, is it really a concern?
Jack, at scale some bit rot is simply unavoidable. You can have more or less, depending on how much you want to spend, but you cannot have none. For details, see my series of posts and publications on A Petabyte for a Century. And don't get fooled by the hype around the various claims for quasi-immortal media.
I guess my question remains the same. Bit rot of course still occurs, but given all the checks and balances of RAID 5/6/10 (as far as I have been "sold", these technologies use checksums, parity, and auto-correction), isn't data loss due to bit rot, media failure, etc. mostly a thing of the past?
Jack, it's your data. You decide how much you want to spend reducing the probability of bit rot, but no matter how much you spend, the probability of rot will not be zero. And if you have a lot of data and want to keep it for a long time, achievable levels of reliability, such as S3's eleven nines, are not sufficient to let you ignore the possibility.
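As a rough illustration of that last point, here is a minimal back-of-the-envelope sketch. It assumes "eleven nines" means each object independently has a 1e-11 chance of being lost in any given year, which is a simplification; the function name, the object count, and the retention period are hypothetical and chosen only to show the arithmetic.

```python
import math

def prob_at_least_one_loss(num_objects, years, durability_nines=11):
    """Probability that at least one object is lost over the period.

    Assumption (not from any vendor specification): losses are independent,
    and each object has a 10**-durability_nines chance of loss per year.
    """
    p_loss_per_object_year = 10.0 ** (-durability_nines)
    # Probability that every object survives every year:
    #   (1 - p) ** (num_objects * years)
    # computed via log1p/expm1 to avoid floating-point round-off.
    log_all_survive = num_objects * years * math.log1p(-p_loss_per_object_year)
    return -math.expm1(log_all_survive)

# Hypothetical archive: one billion objects kept for a century.
print(prob_at_least_one_loss(1e9, 100))
```

Under these assumptions, the expected number of lost objects is about one (1e9 objects times 100 years times 1e-11), so the chance of losing at least one object comes out at roughly 63%. The numbers are illustrative, not a prediction for any real service.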