Update: the third post discussing the lessons to be drawn is here.
From the point of view of an Archive being audited, the TRAC process can be divided into phases:
- The Archive negotiates a contract with an auditing organization, in our case CRL.
- The Archive generates and submits to the Auditors documentation describing its organization, policies, operations, technology and so on in great detail.
- The Auditors request further information, and evidence to support the claims in the documentation.
- A delegation of Auditors visits the Archive to ask questions, receive demonstrations, examine equipment, and so on.
- The Auditors prepare a draft Certification Report, which is reviewed by the Archive.
- The Auditors release their report.
Submission Phase
Our contract was signed in July last year, but the LOCKSS team started work on the necessary documentation about six months earlier. I was assigned responsibility for the submission, and worked on it full-time from June through September, editing previously written documents, writing new ones, and organizing the submission. The example provided by the previous TRAC audit of Scholar's Portal was enormously useful in these early stages. In particular we observed that:
- They used a Wiki to assemble the documents they submitted to their auditors.
- The Wiki included a page for every one of the ISO16363 Criteria.
- The Wiki was made public, promoting transparency and providing support for future audits.
- The structure of the documentation, determined by the ISO16363 criteria, was extremely useful to the auditors but opaque to anyone not intimately familiar with OAIS and the audit criteria.
- The pages for the criteria were in some respects repetitious, which we felt would cause difficulty in ensuring that the information was consistent.
- Because the pages were implemented by a Wiki, the edit history of each page was visible to the auditors and the public, which could inhibit free discussion among the team as the pages were created.
Based on these observations, we decided to create two Wikis:
- A Wiki (documents.clockss.org) that, following the Scholar's Portal example, would be made public at the end of the audit.
- A Wiki (trac.clockss.org) to contain all the confidential information that the auditors would request, to be taken down at the end of the audit.
The public Wiki had to serve three audiences:
- The auditors, who needed to access the content via the ISO16363 criteria.
- Members of the LOCKSS team, so that as far as possible these documents would be detailed enough to replace our internal documentation.
- Interested members of the public, who needed to access the content via some understandable, non-OAIS-related structure.
To serve these audiences, the public Wiki contained two parallel structures:
- A set of documents organized around coherent themes, which we called "the documents". The themes we ended up with were:
- CLOCKSS Archive Documents, describing the goals, organization, functions and requirements of the CLOCKSS Archive.
- LOCKSS Program Documents, describing the policies, practices and technology of the LOCKSS Program, which operates the CLOCKSS Archive under contract to the CLOCKSS Board.
- LOCKSS Adaptations to CLOCKSS Archive Documents, describing the adaptations made to the generic LOCKSS technology to satisfy the requirements of the CLOCKSS Archive.
- OAIS Conformance Documents, describing the mapping between the architecture of the CLOCKSS Archive and the OAIS Reference Model.
- A set of pages matching the hierarchical structure of the ISO16363 criteria, with one page for each criterion. This set of pages we called "the criteria". They would, as far as possible, serve only as a finding aid, with each page having minimal content but linking to the appropriate sections of "the documents".
The next step was to provide the leaf nodes with relevant content: notes about what content the auditors would need to see to judge that criterion, and links to appropriate sections of as yet non-existent pages in "the documents". A typical leaf node page was very sparse, for example:
4.2.3 - The repository shall document the final disposition of all SIPs.
Relevant Documents
- As regards harvest SIPs, see Definition of AIP.
- As regards file transfer SIPs, see Definition of AIP.
The next step was to create "the documents" pointed to by "the criteria". Although not in a suitable form, a good deal of the necessary content already existed as CLOCKSS Board documents, published papers from the LOCKSS team, this blog, and the LOCKSS team's internal Wiki and bug tracking system. This material was reviewed and incorporated as appropriate in "the documents". Despite this, "the documents" mostly had to be written from scratch, after extensive consultation with the relevant team members. Writing "the documents" in this way had several beneficial effects:
- It revealed that in some cases different team members had different ideas about how the process worked, or were in fact executing a different process from the one documented in the internal Wiki.
- The team came to distinguish for the first time between the team members as individuals, and the roles they played in the various processes. The documents were written to assign responsibilities to roles, not to individuals, and a page on the internal Wiki was created under the control of the LOCKSS Executive Director that mapped from these roles to individual team members.
- These new documents were placed under a formal document change system. Each specifies the roles which must review, and the role which must approve future changes to them. Because the documents are in a Wiki, conformance to this system is easy to establish through the edit comments.
Discussion Phase
About 6 weeks after the submission the auditors responded with an e-mailed list of questions and requests for further documentation covering:
- Statistics. The auditors asked for detailed statistics in three areas:
- Counts of articles, journals, files, etc.
- The rate of growth of the archive.
- A list of file formats with a count of the instances of each.
- Administrative Documents. The auditors asked for 15 categories of such documents, all of which were confidential.
- Content Samples. The auditors asked for sample output from FITS, which could be provided, and sample content, which could not. Since CLOCKSS is a dark archive, absent a trigger event access to content in the archive is not permitted.
- Reports. The auditors asked for samples of three kinds of report.
- Additional Requests. The auditors asked for four further responses, one based on one of "the documents", one based on the LOCKSS team's January 2005 D-Lib paper and two based on a 2007 analysis of LOCKSS by CRL.
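Of the requested statistics, the file-format census is the easiest to illustrate. The sketch below is a minimal, hypothetical example that approximates format by file extension; the audit itself relied on FITS output for format identification, not on anything this simple:

```python
# Minimal sketch of a file-format census like the one the auditors
# requested. Format is approximated here by file extension; the real
# audit used FITS output for format identification.
import os
import pathlib
import tempfile
from collections import Counter

def format_census(root: str) -> Counter:
    """Walk a content tree and count instances of each extension."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts

# Example: build a tiny synthetic tree and census it
with tempfile.TemporaryDirectory() as root:
    for name in ("a.pdf", "b.pdf", "c.xml", "d"):
        pathlib.Path(root, name).write_text("x")
    census = format_census(root)
```

Running the real census over an archive of this size is dominated by I/O, so in practice such counts are generated incrementally as content is ingested rather than by a full walk on demand.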
Lists of URLs for two sample archival units (AUs) were provided, one collected by web harvesting and one supplied from the publisher via file transfer. Keepers and KBART reports were already public and, in fact, already linked from the appropriate places in "the documents". A sample of the weekly internal report from the LOCKSS team to the CLOCKSS Executive Director was provided.
Once the requested information had been collected, the part that could be made public but was not already public was edited into "the documents". Pages containing the requested confidential information were added to the trac.clockss.org Wiki. A page in that Wiki was created to form the response to the auditors' request by taking the text of their e-mail, adding Wiki markup, and then adding the text of the response to each request in bold font, with links to the confidential information or to "the documents" as appropriate.
The auditors were notified of our response about five weeks after the request, just before Christmas 2013. They were given password-protected read-only accounts on trac.clockss.org that allowed them to read these pages.
Inquisition Phase
About eight weeks after our response to the auditors' first request we received a proposed schedule for the auditors' on-site visit, covering two days about five weeks later. It consisted of a list of requests for aspects of the Archive's functions to be demonstrated, and a set of questions similar to, but more detailed than, those in the first request. There were a total of 36 such questions covering:
- Ingest
- Storage and data management
- Metadata
- Integrity
- Miscellaneous
- Follow up to previous statements
- Understandability, Rendering Content, and Representation Information
- Content examples
- A question based on our 9-year-old format migration paper whose answer was in "the documents". The first request had requested evidence for this answer, which the first response had provided.
- Two questions based on our 14-year-old first paper on the LOCKSS prototype.
Using the internal Wiki, we developed responses to the questions in the proposed schedule, and started planning suitable demonstrations. As soon as we started to consider which team members were most appropriate to handle each question and demonstration, it became obvious that the structure of the proposed schedule was inappropriate. Instead, we suggested a re-organization of the schedule to the auditors. We proposed to structure the presentations of demonstrations and answers to questions around the CLOCKSS Archive's workflow, thus:
- Engaging. The work of the CLOCKSS Executive Director and the Director of Publisher Outreach in recruiting publishers and libraries.
- Preparing. The work of the LOCKSS team in preparing the CLOCKSS system to ingest the content of newly recruited publishers.
- Ingesting. The operations of the CLOCKSS system as it ingests the flow of content from established publishers, and the quality control and monitoring processes performed by the LOCKSS team as it does.
- Preserving. The operations of the CLOCKSS system as it preserves the ingested content, and the monitoring processes performed by the LOCKSS team as it does.
- Extracting. The processes that extract metadata from the preserved content, and the uses to which the metadata is put.
- Triggering. The processes that occur when the CLOCKSS board declares a trigger event to extract preserved content from the CLOCKSS system and deliver it in usable form to re-publishing sites.
For each of these stages we prepared:
- Detailed answers to each of their specific questions with links to the relevant sections of "the documents". In some cases this required detailed consultation with the relevant team member.
- An outline of each of the requested demonstrations. In some cases the relevant team member chose the specific example to be demonstrated.
Each stage's presentation consisted of:
- An overview of the area, based on the content of "the documents".
- A list of each of the questions relevant to that area with answers.
The auditors' request for a demonstration of the LOCKSS: Polling and Repair Protocol in action, even via annotated logs, posed significant problems. The directory trees and files for production preserved content (AUs) are very large, and a poll on a typical AU takes a long time. The daemon runs many such polls simultaneously. If the daemon's logging mechanism were configured to generate enough detail to follow every step of one real poll, that level of detail would apply to every poll under way for the duration of that poll, so the volume of log data would be enormous.
Instead we gave a live demo of the file structure and polling process on a small AU of synthetic content in the LOCKSS team's STF testing framework. We used STF to create a network of 5 virtual LOCKSS boxes each running the full LOCKSS daemon in 5 processes on the laptop running the projector used for the demos. Each was specially configured to preserve just the synthetic AU. STF caused the first box (the poller) to call a poll on it. The remaining 4 boxes were voters in this poll. The results of the poll showed up in the poll status page of the poller with, as expected, 100% agreement and in the vote status page of each voter. The daemons in all these boxes had logging configured to show details of this single poll, and these logs were shown to the auditors after they watched the poll proceed. The logs are linked from the documentation here.
Then we re-created the same network with the same content, but on the poller we damaged the current version of the content of one of the AU's URLs before calling the poll. The results of the poll showed up in the poll status page of the poller, showing that the damage had been repaired, and in the vote status page of each voter. The auditors were then shown the poll and vote status pages of production CLOCKSS boxes to demonstrate that similar processes were under way in the real CLOCKSS network.
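The two demonstrations can be sketched in miniature. The following is a simplified illustration only, not the real LOCKSS: Polling and Repair Protocol (which uses nonces, sampled content and multiple voting rounds); it shows just the basic idea of a poller comparing content hashes with its voters and repairing a damaged copy from the majority:

```python
# Simplified illustration of a LOCKSS-style poll. NOT the real Polling
# and Repair Protocol; it only shows hash comparison among peers and
# repair of a damaged copy from the majority.
import hashlib
from collections import Counter

def content_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def call_poll(poller: dict, voters: list, url: str) -> str:
    """Poller compares its hash of url against each voter's hash.
    If the majority disagrees with the poller, repair from a voter."""
    my_hash = content_hash(poller[url])
    votes = [content_hash(v[url]) for v in voters]
    if sum(1 for v in votes if v == my_hash) * 2 > len(votes):
        return "agreement"            # majority agrees: no action
    # Damage detected: repair from a voter holding the consensus version
    consensus, _count = Counter(votes).most_common(1)[0]
    for v in voters:
        if content_hash(v[url]) == consensus:
            poller[url] = v[url]      # replace the damaged copy
            return "repaired"

# Five boxes preserving one synthetic AU with a single URL, as in the demo
url = "http://example.com/au/file1"
boxes = [{url: b"good content"} for _ in range(5)]
poller, voters = boxes[0], boxes[1:]
first = call_poll(poller, voters, url)   # all copies identical

poller[url] = b"damaged"                 # damage the poller's copy
second = call_poll(poller, voters, url)  # poll detects and repairs it
```

In the first poll every voter's hash matches the poller's, mirroring the 100% agreement shown in the demo; in the second, the poller's copy disagrees with the majority and is replaced, mirroring the repair demonstration.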
About a week before the auditors' visit, they e-mailed a draft document setting out the view they had derived from "the documents" of the mapping between the workflow of the CLOCKSS Archive and the OAIS model of the flow from SIP to AIP to DIP (SIP-AIP-DIP). This highlighted some areas of uncertainty and revealed some significant misunderstandings. I made extensive edits to the auditors' draft clarifying the uncertainties and returned it to them.
The two days preceding the auditors' visit were given over to a complete run-through, with each presenter giving their talk and team members playing the role of auditors. A review after the run-through changed some of the time allocations and, more importantly, identified a missing presentation. Looking back on SIP-AIP-DIP and the run-through, it was clear that the set of documents lacked an introductory document: LOCKSS: Basic Concepts. A slot was created after the initial introduction to present an overview of seven basic concepts.
This supporting document was written and the presentation created overnight. Other presentations were edited to respond to feedback from the rehearsal audience, and collected in PDF form on the single laptop used to project them. This avoided time-wasting projector-swapping. The PDFs contained live hyperlinks to web pages and demonstrations.
The presentations and demonstrations generally went well, and the auditors expressed satisfaction at the end. The only significant problem we encountered was that, although a team member had been assigned to record the auditors' questions and our answers in the internal Wiki during this time, the discussion rapidly overwhelmed their typing, so our record of the discussion was inadequate.
Follow-Up
At the end of the visit the pages we had created in the internal Wiki underwent a final edit to reflect as far as possible the outcome of the discussions, and were then transferred to the confidential Wiki (trac.clockss.org) to provide a record of our answers. This part of the confidential Wiki was structured as an introduction, a page for each of the workflow stages described above, and a page with the text of the auditors' proposed schedule, linking each question to the appropriate location in the relevant workflow stage page with the answer.
About a month after the visit the auditors made one final request for information. As before, we put the text of the request and our answer into the confidential Wiki.
About two months after the visit the auditors sent us a draft of their certification report for review. We made 10 comments, ranging from trivial to significant, with suggested rewordings. About six weeks after these comments, the auditors released their certification report, which addressed all of them. Shortly after that, CLOCKSS Archive management put out a press release, the LOCKSS team made documents.clockss.org publicly accessible, and I put up a blog post.