Tuesday, September 25, 2018

Web Archives As Evidence

In "Blockchain Solves Preservation!" I critiqued John Collomosse et al.'s "ARCHANGEL: Trusted Archives of Digital Public Documents". They argue that integrity validation via hashes is needed because:
Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation — trust at face value in a centralised authority, like a national government archive or University.
But they also write that:
acceptance of content evidence might eventually become similar to acceptance of DNA evidence in court, but that establishing that level of confidence would require strong public engaged to explain Blockchain in an accessible manner particularly explaining why one could trust the cryptographic assurances inherent in a DLT solution.
At least as far as courts are concerned, they're wrong about both "face value" and how trust is established. Below the fold, an explanation.

Kieran McCarthy's "Archive.org's Wayback Machine is legit legal evidence, US appeals court judges rule" reports on the gradual process by which US appeals courts are allowing Web pages recovered from the Internet Archive's Wayback Machine to be entered in evidence:
The second circuit, based in New York, was asked over the summer to review an appeal by an Italian computer hacker in which he sought to exclude screenshots of websites run by him that tied him to a virus and botnet he was ultimately convicted over. Prosecutors had taken screenshots of his webpages from the Internet Archive and used them as trial evidence – and he wanted the files thrown out.

Fabio Gasperini argued that the presented Wayback Machine archives of his webpages were not adequately authenticated as legit and untampered, and so shouldn't have been included in his criminal trial.
...
In the Gasperini case, however, the second circuit noted that the prosecution had included testimony from the Internet Archive's office manager, "who explained how the Archive captures and preserves evidence of the contents of the internet at a given time."
...
The manager also testified that the prosecution's screenshots of the Wayback Machine's archive of Gasperini's webpages really did match the contents of the Internet Archive. And, combined, this created a sufficient degree of authenticity. Gasperini's lawyers were also able to cross-examine the office manager, the appeals court noted.
Note that the court based its acceptance on testimony from personal knowledge of the archive's processes, not on a general public belief in the technology. And note that the processes in question do record hashes of the content. These hashes are not entangled in Merkle trees; doing so at the Internet Archive's scale would be expensive.
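To make the distinction concrete, here is a minimal sketch, not the Internet Archive's actual code, contrasting independently recorded per-capture hashes with hashes entangled in a Merkle tree. The function names (`digest`, `verify_capture`, `merkle_root`) are hypothetical, SHA-256 is an assumed hash choice, and pairing an odd node with itself is just one common convention.

```python
import hashlib


def digest(data: bytes) -> bytes:
    """SHA-256 digest of a byte string (the hash choice is an assumption)."""
    return hashlib.sha256(data).digest()


def verify_capture(content: bytes, recorded: bytes) -> bool:
    """Independent per-capture check: recompute the hash and compare it
    with the one stored alongside the capture."""
    return digest(content) == recorded


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise into a single root.

    Each level hashes the concatenation of adjacent pairs; an odd node
    at the end of a level is paired with itself.
    """
    if not leaves:
        raise ValueError("no leaves")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    # Hypothetical captures of one page at three different times.
    captures = [b"page v1", b"page v2", b"page v3"]
    recorded = [digest(c) for c in captures]

    print(verify_capture(captures[1], recorded[1]))  # True
    print(verify_capture(b"tampered", recorded[1]))  # False

    # Someone able to rewrite both a capture and its stored hash defeats the
    # per-capture check, but the substitution changes the Merkle root.
    root = merkle_root(recorded)
    forged = list(recorded)
    forged[1] = digest(b"forged")
    print(merkle_root(forged) != root)  # True
```

The per-capture check only detects tampering by someone who cannot also alter the stored hash. Entanglement protects the record of hashes itself, but only once the root is published somewhere the archive cannot rewrite, and every new capture changes the root. This naive version recomputes the whole tree; real systems append incrementally, but still have to maintain and publish roots continuously, which is the kind of ongoing work that becomes expensive at the Archive's scale.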

The Third Circuit came to a similar conclusion via a similar route:
The appeals judges' decision reflected a similar one back in 2011 by the third circuit (United States v. Bansal) where a witness testified "from personal knowledge" how the Wayback Machine worked and how reliable it was. The court decided this provided "sufficient proof" that its mirrored pages were authentic.
...
Of course, it may still be the case that if a prosecution does not have an Internet Archive staffer to act as a witness in a case to explain the process by which it takes a snapshot and testify the screenshots were not faked, the material could be thrown out at a later date.
...
At some point, we imagine, it will no longer be necessary for there to be a witness, and the Internet Archive will stand up as a wholly legitimate source of past online activity.
The legal system is gradually building trust in the evidentiary value of the Wayback Machine by requiring personal testimony from the people operating the underlying processes, and exposing these witnesses to cross-examination. Systems based on blockchains would have to undergo a similar process. Those using public, permissionless blockchains would have difficulty doing so, since it would not be possible to obtain testimony from "the people operating the underlying processes". In the case of ARCHANGEL, which is intended to use a permissioned blockchain, testimony would be required both from the operators of the network nodes and from the archives generating the hashes to be injected into the chain. Mere public confidence would not, and should not, be sufficient.