TL;DR: Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary. Below the fold, my results.
- Month: The date marked on the media in Sharpie, and verified via the on-disk metadata.
- Media: The type of media.
- Good: The number of media with this type and date for which all MD5 checksums were correctly verified.
- Bad: The number of media with this type and date for which any file failed MD5 verification.
- Vendor: The vendor name on the media.
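For readers who want to run a similar check, here is a minimal sketch of the Good/Bad classification described above. It is not the script used for these results. It assumes each disk carries a manifest in `md5sum` format; the manifest name `MD5SUMS` and the function names `md5_of` and `verify_disk` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_disk(mount_point, manifest_name="MD5SUMS"):
    """Return the names of files that fail verification (empty list = "Good").

    Assumption: the manifest uses md5sum's "<hash>  <name>" or "<hash> *<name>"
    line format, with names relative to the mount point.
    """
    root = Path(mount_point)
    failures = []
    for line in (root / manifest_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, _, name = line.partition(" ")
        name = name.lstrip(" *")  # drop md5sum's text/binary mode marker
        try:
            ok = md5_of(root / name) == expected.lower()
        except OSError:
            # An unreadable sector surfaces as an I/O error; count it as a failure.
            ok = False
        if not ok:
            failures.append(name)
    return failures
```

A disk would count as Good when `verify_disk("/mnt/cdrom")` returns an empty list, and as Bad otherwise. The mount point here is only an example.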
sr 6:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sr 6:0:0:0: [sr0] tag#0 Sense Key : Medium Error [current]
sr 6:0:0:0: [sr0] tag#0 Add. Sense: L-EC uncorrectable error
sr 6:0:0:0: [sr0] tag#0 CDB: Read(10) 28 00 00 05 64 30 00 00 02 00 00 00
blk_update_request: critical medium error, dev sr0, sector 1413312
sr 6:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sr 6:0:0:0: [sr0] tag#0 Sense Key : Medium Error [current]
sr 6:0:0:0: [sr0] tag#0 Add. Sense: L-EC uncorrectable error
sr 6:0:0:0: [sr0] tag#0 CDB: Read(10) 28 00 00 05 64 30 00 00 02 00 00 00
blk_update_request: critical medium error, dev sr0, sector 1413312
Aug 12 14:34:37 nuc7 kernel: [194688.719850] Buffer I/O error on dev sr0, logical block 176664, async page read

I never intended these weekly backups to be a long-term archive. The intended use was disaster recovery; until now I was just too lazy to dispose of the back catalog. I've retained the sample of disks for future re-analysis. But the remaining roughly 1,200 disks older than 3 years will be recycled by the CD Recycling Center of America, once I figure out how to ship 45 lbs of optical disks!
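As a side note on the kernel log above, the three numbers it reports (the CDB address, the sector, and the logical block) all point at the same spot on the disk. The sketch below decodes the Read(10) CDB and converts between units. It assumes 2048-byte optical sectors, 512-byte kernel sectors, and 4096-byte buffer blocks; these sizes are not stated in the log, but they are the ones under which the arithmetic agrees. The function name `decode_read10` is hypothetical.

```python
def decode_read10(cdb_hex):
    """Decode the logical block address and transfer length of a SCSI Read(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x28:
        raise ValueError("not a Read(10) command")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB bytes copied from the log.
lba, blocks = decode_read10("28 00 00 05 64 30 00 00 02 00 00 00")

# Assumed unit sizes: 2048-byte disk sectors, 512-byte kernel sectors,
# 4096-byte buffer blocks.
sector_512 = lba * 2048 // 512
block_4k = sector_512 * 512 // 4096

print(lba, blocks, sector_512, block_4k)  # 353328 2 1413312 176664
```

The last two values match the `sector 1413312` and `logical block 176664` entries in the log, so the failure is a single unreadable region rather than several.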
I'm sorry that the sample isn't bigger, but it was time-consuming feeding 40 disks into the reader, and I need the space in the cupboards now.
Professor Wildani and I often discussed what (if anything) could be labeled "archive by accident", and whether there's value in it or whether we should care. The net result of those discussions was a resounding "Who knows", and a return to the usual problem of identifying high-value data without an oracle, lest we become packrats and data hoarders, with all the problems that entails.
I remember at a Dagstuhl workshop a few years back (you were there, I believe) talking with folks about doing crude automatic triage: tossing near-duplicates, flagging things for a human to pick at, etc. The assumption was that we will inevitably toss things that may be valuable, but may still wind up with a greater corpus of "useful" stuff with a reduced workload. Potentially an intractable problem, but we can dream :)
Regardless, interesting stuff! Thanks for sharing!
The 2019 update of this post is here.