Tuesday, July 16, 2019

The EFF vs. DMCA Section 1201

As the EFF's Parker Higgins wrote:
Simply put, Section 1201 means that you can be sued or even jailed if you bypass digital locks on copyrighted works—from DVDs to software in your car—even if you are doing so for an otherwise lawful reason, like security testing.
Section 1201 is obviously a big problem for software preservation, especially when it comes to games.

Last December in Software Preservation Network I discussed both the SPN's important documents relating to the DMCA:
Below the fold, some important news about Section 1201.

Tuesday, July 9, 2019

Finn Brunton's "Digital Cash"

I attended the book launch event for Finn Brunton's Digital Cash at the Internet Archive, and purchased a copy. It is a historian's review of the backstory leading up to Satoshi Nakamoto's Bitcoin. To motivate you to read it, below the fold I summarize its impressive breadth.

Tuesday, July 2, 2019

The Web Is A Low-Trust Society

Back in 1992 Robert Putnam et al published Making democracy work: civic traditions in modern Italy, contrasting the social structures of Northern and Southern Italy. For historical reasons, the North has a high-trust structure whereas the South has a low-trust structure. The low-trust environment in the South had led to the rise of the Mafia and persistent poor economic performance. Subsequent effects include the rise of Silvio Berlusconi.

Now, in The Internet Has Made Dupes-And Cynics-Of Us All, Zeynep Tufekci applies the same analysis to the Web:
ONLINE FAKERY RUNS wide and deep, but you don’t need me to tell you that. New species of digital fraud and deception come to light almost every week, if not every day: Russian bots that pretend to be American humans. American bots that pretend to be human trolls. Even humans that pretend to be bots. Yep, some “intelligent assistants,” promoted as advanced conversational AIs, have turned out to be little more than digital puppets operated by poorly paid people.

The internet was supposed to not only democratize information but also rationalize it—to create markets where impartial metrics would automatically surface the truest ideas and best products, at a vast and incorruptible scale. But deception and corruption, as we’ve all seen by now, scale pretty fantastically too.
Below the fold, some commentary.

Thursday, June 27, 2019

The Risks Of Outsourcing

My Cloud for Preservation post was in some sense all about the risks of outsourcing IT infrastructure to the cloud. Below the fold I comment on two recent articles illustrating different aspects of these risks.

Tuesday, June 25, 2019

Lina M. Khan On Structural Separation

In It's The Enforcement, Stupid! I argued that anti-trust enforcement was viable only if there were "bright lines". I even went further and, following Kim Stanley Robinson's Pacific Edge, suggested a hard cap on corporate revenue, as a way of making anti-trust self-executing.

Much of the recent wave of attention to anti-trust was sparked by Lina Khan's masterful January 2017 Yale Law Journal article Amazon's Antitrust Paradox (a must-read, even at 24,000 words). Now Cory Doctorow writes:
Khan (who is now a Columbia Law fellow) is back with The Separation of Platforms and Commerce -- clocking in at 61,000 words with footnotes! -- that describes the one-two punch of contemporary monopolism, in which Reagan-era deregulation enthusiasts took the brakes off of corporate conduct but said it would be OK because antitrust law would keep things from getting out of control, while Reagan-era antitrust "reformers" (led by Robert Bork and the Chicago School) dismantled antitrust.
You should definitely read Khan's latest magnum opus. OK, maybe you can skip the footnotes, I admit I did. Below the fold I examine two threads among many in the article.

Thursday, June 20, 2019

Michael Nelson's CNI Keynote: Part 3

Here is the conclusion of my three-part "lengthy disquisition" on Michael Nelson's Spring CNI keynote Web Archives at the Nexus of Good Fakes and Flawed Originals (Nelson starts at 05:53 in the video, slides).

Part 1 and Part 2 addressed Nelson's description of the problems of the current state of the art. Below the fold I address the way forward.

Wednesday, June 19, 2019

HAMR-ing Home My Point

In Double-headed Seagate disk drives? Yes, on their way, Chris Mellor mentions that Seagate:
expects to intro 20TB+ HAMR-based nearline HDDs in calendar 2020.
Volume production of HAMR drives is still 1 year away. In 2009 Dave Anderson of Seagate presented this roadmap. It shows HAMR drives a year away in 2010. They have been a year away ever since. A decade of real-time slip.

Only the good Dr. Pangloss believes industry roadmaps.

Tuesday, June 18, 2019

Michael Nelson's CNI Keynote: Part 2

My "lengthy disquisition" on Michael Nelson's Spring CNI keynote Web Archives at the Nexus of Good Fakes and Flawed Originals (Nelson starts at 05:53 in the video, slides) continues here. Part 1 had an introduction and discussion of two of my issues with Nelson's big picture.
Below the fold I address my remaining issues with Nelson's big picture of the state of the art. Part 3 will compare his and my views of the path ahead.

Thursday, June 13, 2019

Michael Nelson's CNI Keynote: Part 1

Michael Nelson and his group at Old Dominion University have made major contributions to Web archiving. Among them are a series of fascinating papers on the problems of replaying archived Web content. I've blogged about several of them, most recently in All Your Tweets Are Belong To Kannada and The 47 Links Mystery. Nelson's Spring CNI keynote Web Archives at the Nexus of Good Fakes and Flawed Originals (Nelson starts at 05:53 in the video, slides) understandably focuses on recounting much of this important research. I'm a big fan of this work, and there is much to agree with in the rest of the talk.

But I have a number of issues with the big picture Nelson paints. Part of the reason for the gap in posting recently was that I started on a draft that discussed both the big picture issues and a whole lot of minor nits, and I ran into the sand. So I finally put that draft aside and started this one. I tried to restrict myself to the big picture, but despite that it is still too long for a single post. Follow me below the fold for the first part of a lengthy disquisition.

Thursday, May 23, 2019

Regulating Cryptocurrencies

Satoshi Nakamoto's Bitcoin emerged not just from three decades of computer science research, but also from two interrelated cult-like ideologies of the right, libertarianism and Austrian economics. Governments are generally happy with computer science research until it gets in the way of law enforcement, but non-kleptocratic governments tend to be unhappy with both libertarianism and Austrian economics, particularly when they get in the way of law enforcement.

Below the fold, a look at the varying approaches governments are taking to the problems they perceive cryptocurrencies pose.

Tuesday, May 21, 2019

Ten Hot Topics

The topic of scholarly communication has received short shrift here for the last few years. There has been too much to say about other topics, and developments such as Plan S have been exhaustively discussed elsewhere. But I do want to call attention to an extremely valuable review by Jon Tennant and a host of co-authors entitled Ten Hot Topics around Scholarly Publishing.

The authors pose the ten topics as questions, which allows for a scientific experiment. My hypothesis is that all these questions, while not strictly headlines, will nevertheless obey Betteridge's Law of Headlines, in that the answer will be "No". Below the fold, I try to falsify my hypothesis.

Thursday, May 16, 2019

Review Of Data Storage In DNA

Luis Ceze, Jeff Nivala and Karin Strauss of the University of Washington and Microsoft Research have published a fascinating review of the history and state of the art in Molecular digital data storage using DNA. The abstract reads:
Molecular data storage is an attractive alternative for dense and durable information storage, which is sorely needed to deal with the growing gap between information production and the ability to store data. DNA is a clear example of effective archival data storage in molecular form. In this Review, we provide an overview of the process, the state of the art in this area and challenges for mainstream adoption. We also survey the field of in vivo molecular memory systems that record and store information within the DNA of living cells, which, together with in vitro DNA data storage, lie at the growing intersection of computer systems and biotechnology.
They include a comprehensive bibliography. Below the fold, some commentary and a few quibbles.

Tuesday, May 14, 2019

Storing Data In Oligopeptides

Bryan Cafferty et al have published a paper entitled Storage of Information Using Small Organic Molecules. There's a press release from Harvard's Wyss Institute at Storage Beyond the Cloud. Below the fold, some commentary on the differences and similarities between this technique and using DNA to store data.

Thursday, May 9, 2019

Immutability FTW!

There's an apparently apocryphal story that when Willie Sutton, the notorious bank robber of the 1930s to 1950s, was asked why he robbed banks, he answered:
Because that's where the money is!
Today's Willie Suttons don't need a disguise or an (unloaded) Thompson submachine gun, because they rob cryptocurrency exchanges. As David Gerard writes:
Crypto exchange hacks are incredibly rare, and only happen every month or so.
Yesterday Bloomberg reported:
Binance, one of the world’s largest cryptocurrency exchanges, said hackers withdrew 7,000 Bitcoins worth about $40 million via a single transaction in a “large scale security breach,” the latest in a long line of thefts in the digital currency space.
Below the fold, a few thoughts:

Thursday, May 2, 2019

Lets Put Our Money Where Our Ethics Are

I found a video of Jefferson Bailey's talk at the Ethics of Archiving the Web conference from a year ago. It was entitled Lets Put Our Money Where Our Ethics Are. The talk is the first 18.5 minutes of this video. It focused on the paucity of resources devoted to archiving the huge proportion of our culture that now lives on the evanescent Web. I've also written on this topic, for example in Pt. 2 of The Amnesiac Civilization. Below the fold, some detailed numbers (that may by now be somewhat out-of-date) and their implications.

Thursday, April 25, 2019

Short talk at Asilomar Microcomputer Workshop

I gave a revised version of Blockchain: What's Not To Like? in the 2019 Asilomar Microcomputer Workshop's Athematic session. Below the fold, the text of the talk with links to the sources. Readers should also consult the "Additional Material" in the original talk, the video of my original presentation, and the podcast interview.

Thursday, April 18, 2019

Personal Pods and Fatcat

Sir Tim Berners-Lee's Solid project envisages a decentralized Web in which people control their own data stored in personal "pods":
The basic idea of Solid is that each person would own a Web domain, the "host" part of a set of URLs that they control. These URLs would be served by a "pod", a Web server controlled by the user that implemented a whole set of Web API standards, including authentication and authorization. Browser-side apps would interact with these pods, allowing the user to:
  • Export a machine-readable profile describing the pod and its capabilities.
  • Write content for the pod.
  • Control others' access to the content of the pod.
Pods would have inboxes to receive notifications from other pods. So that, for example, if Alice writes a document and Bob writes a comment in his pod that links to it in Alice's pod, a notification appears in the inbox of Alice's pod announcing that event. Alice can then link from the document in her pod to Bob's comment in his pod. In this way, users are in control of their content which, if access is allowed, can be used by Web apps elsewhere.
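The Alice-and-Bob exchange described above can be sketched as a toy model. This is only an illustration under my own assumptions: the `Pod` class, its method names and the `alice.example` / `bob.example` domains are hypothetical, and nothing here is the Solid API. Authentication, authorization and the actual Web standards a real pod would implement are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy stand-in for a personal pod: stored documents plus an inbox."""
    owner: str
    domain: str
    documents: dict = field(default_factory=dict)  # path -> {"body", "links"}
    inbox: list = field(default_factory=list)      # received notifications

    def url(self, path):
        """Build the URL of a resource under the owner's domain."""
        return f"https://{self.domain}/{path}"

    def write(self, path, body, links=()):
        """Store content in this pod and return its URL."""
        self.documents[path] = {"body": body, "links": list(links)}
        return self.url(path)

    def notify(self, source_url, target_url):
        """Record a notification that source_url links to target_url."""
        self.inbox.append({"source": source_url, "target": target_url})

    def add_link(self, path, link_url):
        """Add an outgoing link to a document already in this pod."""
        self.documents[path]["links"].append(link_url)


alice = Pod("Alice", "alice.example")
bob = Pod("Bob", "bob.example")

# Alice writes a document; Bob writes a comment in his own pod linking to it.
doc_url = alice.write("paper", "Alice's document")
comment_url = bob.write("comment", "Bob's comment", links=[doc_url])

# Bob's pod announces the comment to the inbox of Alice's pod.
alice.notify(comment_url, doc_url)

# Alice reads her inbox and links from her document back to Bob's comment.
for note in alice.inbox:
    if note["target"] == doc_url:
        alice.add_link("paper", note["source"])
```

The point of the sketch is that each party's content never leaves their own pod; only URLs and notifications cross between them.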
In his Paul Evan Peters Award Lecture, my friend Herbert Van de Sompel applied this concept to scholarly communication, envisaging a world in which access, for both humans and programs, to all the artifacts of research would be greatly enhanced.
In Herbert's vision, institutions would host their researchers' "research pods", which would be part of their personal domain but would have extensions specific to scholarly communication, such as automatic archiving upon publication.
Follow me below the fold for an update to my take on the practical possibilities of Herbert's vision.

Tuesday, April 16, 2019

Tuesday, April 9, 2019

What is Amazon?

In Why It's Hard To Escape Amazon's Long Reach, Paris Martineau and Louise Matsakis have compiled an amazingly long list of businesses that exist inside Amazon's big tent. After it went up, they had to keep updating it as people pointed out businesses they'd missed. In most of those businesses, Amazon's competitors are at a huge disadvantage:
While its retail business is the most visible to consumers, the cloud computing arm, Amazon Web Services, is the cash cow. AWS has significantly higher profit margins than other parts of the company. In the third quarter, Amazon generated $3.7 billion in operating income (before taxes). More than half of the total, $2.1 billion, came from AWS, on just 12 percent of Amazon’s total revenue. Amazon can use its cloud cash to subsidize the goods it ships to customers, helping to undercut retail competitors who don’t have similar adjunct revenue streams.
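To get a feel for how lopsided those quoted figures are, here is a rough back-of-the-envelope sketch. It uses only the three numbers in the quote; the variable names are mine, and the "margin ratio" is an illustrative derived quantity, not a figure reported by Amazon or by the article.

```python
# Figures taken from the quote above (third quarter).
total_operating_income = 3.7e9   # dollars
aws_operating_income = 2.1e9     # dollars
aws_revenue_share = 0.12         # AWS share of total revenue

# Share of operating income coming from AWS versus everything else.
aws_income_share = aws_operating_income / total_operating_income
rest_income_share = 1 - aws_income_share

# Operating income generated per unit of revenue share, for each side.
aws_intensity = aws_income_share / aws_revenue_share
rest_intensity = rest_income_share / (1 - aws_revenue_share)

# How many times more operating income AWS yields per revenue dollar
# than the rest of the company, on these figures.
margin_ratio = aws_intensity / rest_intensity

print(f"AWS share of operating income: {aws_income_share:.0%}")
print(f"Implied operating margin ratio, AWS vs. the rest: {margin_ratio:.1f}x")
```

On these assumptions each dollar of AWS revenue carries several times the operating income of a dollar of revenue elsewhere in the company, which is what makes the cross-subsidy possible.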
In the mid-50s my father wrote a textbook, Organisation of retail distribution, with a second edition in the mid-60s. He would have been fascinated by Amazon. I've written about Amazon from many different viewpoints, including storage as a service, and anti-trust, so I'm fascinated with Amazon, too. Now, when you put recent posts by two different writers together, an extraordinarily interesting picture emerges, not just of Amazon but of the risks inherent to the "friction-free" nature of the Web:
  • Zack Kanter's What is Amazon? is easily the most insightful thing I've ever read about Amazon. It starts by examining how Walmart's "slow AI" transformed retail, continues by describing how Amazon transformed Walmart's "slow AI" into one better suited to the Internet, and ends up with a discussion of how Amazon's "slow AI" seems recently to have made a fundamental mistake.
  • Izabella Kaminska's series Amazon (sub)Prime? and Amazon (sub)Prime - Part II provides the deep dive to go with Kanter's big picture, looking in detail into one of the many symptoms of the "slow AI's" apparent mistake.
Below the fold, a long meditation on these posts.

Thursday, April 4, 2019

Digitized Historical Documents

Josh Marshall of Talking Points Memo trained as a historian. From that perspective, he has a great post entitled Navigating the Deep Riches of the Web about the way digitization and the Web have transformed our access to historical documents. Below the fold, I bestow both praise and criticism.

Tuesday, April 2, 2019

First We Change How People Behave

Then the system will work the way we want. My skepticism about Level 5 self-driving cars keeps getting reinforced. Below the fold, two recent examples.

Thursday, March 28, 2019

The 47 Links Mystery

Nearly a year ago, in All Your Tweets Are Belong To Kannada, I blogged about Cookies Are Why Your Archived Twitter Page Is Not in English. It describes some fascinating research by Sawood Alam and Plinio Vargas into the effect of cookies on the archiving of multi-lingual web-sites.

Sawood Alam just followed up with Cookie Violations Cause Archived Twitter Pages to Simultaneously Replay In Multiple Languages, another fascinating exploration of these effects. Follow me below the fold for some commentary.

Tuesday, March 26, 2019

FAST 2019

I wasn't able to attend this year's FAST conference in Boston, and reading through the papers I didn't miss much relevant to long-term storage. Below the fold a couple of quick notes and a look at the one really relevant paper.

Thursday, March 21, 2019

Cost-Reducing Writing DNA Data

In DNA's Niche in the Storage Market, I addressed a hypothetical DNA storage company's engineers and posed this challenge:
increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
Now, a company called Catalog plans to demo a significant step in the right direction:
The goal of the demonstration, says Park, is to store 125 gigabytes, ... in 24 hours, on less than 1 cubic centimeter of DNA. And to do it for $7,000.
That would be 1E12 bits for $7E3. At the theoretical maximum 2 bits/base, it would be $1.4E-8 per base, versus last year's estimate of 1E-4, or around 7,000 times better.
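A short sketch for working through this arithmetic from the quoted demo figures. It assumes decimal gigabytes (1E9 bytes), 8 bits per byte, and the theoretical maximum of 2 bits/base; the function and variable names are mine and purely illustrative.

```python
def cost_per_base(gigabytes, dollars, bits_per_base=2.0):
    """Return (bits, bases, dollars per base) for a synthesis run.

    Assumes decimal gigabytes (1e9 bytes) and 8 bits per byte.
    """
    bits = gigabytes * 1e9 * 8
    bases = bits / bits_per_base
    return bits, bases, dollars / bases


# Catalog's stated goal: 125 gigabytes for $7,000.
bits, bases, per_base = cost_per_base(125, 7_000)

# Last year's estimate of synthesis cost, in dollars per base.
previous_estimate = 1e-4
improvement = previous_estimate / per_base

print(f"{bits:.1e} bits encoded in {bases:.1e} bases")
print(f"${per_base:.1e} per base, roughly {improvement:,.0f} times better")
```

Real encodings store fewer than 2 bits/base once error correction and addressing are included, so the per-base cost from this sketch is a lower bound on what the demo would actually achieve.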

If the demo succeeds, it marks a major achievement. But below the fold I continue to throw cold water on the medium-term prospects for DNA storage.

Tuesday, March 19, 2019

Compression vs. Preservation

An archive is in a hardware refresh cycle and they have asked me to comment on concerns arising because their favored storage hardware uses data compression, which may not be possible to disable even if doing so were a good idea. This is an issue I wrote about two years ago in Threats to stored data.

Because similar concerns keep re-appearing in discussions of digital preservation, I decided this time to discuss it in the same way as Cloud for Preservation, writing a post with a general discussion of the issues without referring to a specific institution. Below the fold, the details.

Thursday, March 14, 2019

It's The Enforcement, Stupid!

Kim Stanley Robinson is a remarkable author. In 1990 he concluded his Wild Shore triptych of novels describing alternate futures for California with Pacific Edge:
Pacific Edge (1990) can be compared to Ernest Callenbach's Ecotopia, and also to Ursula K. Le Guin's The Dispossessed. This book's Californian future is set in the El Modena neighborhood of Orange in 2065. It depicts a realistic utopia as it describes a possible transformation process from our present status, to a more ecologically-focused future.
Why am I writing about this now, nearly three decades later? Follow me below the fold for an explanation.

Thursday, March 7, 2019

It Isn't Just Cryptocurrency Mining

Izabella Kaminska's Just because it's digital doesn't mean it's green reports on:
A new report by the carbon emission think-tank The Shift Project out this week highlights that not much has changed since [2014]. ICT still contributes to about 4 per cent of global greenhouse gas emissions, which is still twice that of civil aviation. What is worse, its contribution is growing more quickly than that of civil aviation.
Cryptocurrency mining is definitely a problem, but how big a part of the problem isn't clear. It could be quite big. Follow me below the fold for some surprising details.

Tuesday, March 5, 2019

Demand Is Far From Insatiable

Based on numbers that IDC conjures from thin air, pundits believe that demand for storage is insatiable because everyone says Lets Just Keep Everything Forever In The Cloud. That idea assumes storage is free, but Storage Will Be Much Less Free Than It Used To Be. (Both links are from 2012). Below the fold I look at some real-world numbers showing how much storage actual customers are buying.

Tuesday, February 26, 2019

Economic Models Of Long-Term Storage

My work on the economics of long-term storage with students at the UC Santa Cruz Center for Research in Storage Systems stopped about six years ago some time after the funding from the Library of Congress ran out. Last year to help with some work at the Internet Archive I developed a much simplified economic model, which runs on a Raspberry Pi.

Two recent developments provide alternative models:
  • Last year, James Byron, Darrell Long, and Ethan Miller's Using Simulation to Design Scalable and Cost-Efficient Archival Storage Systems (also here) reported on a vastly more sophisticated model developed at the Center. It both includes much more detailed historical data about, for example, electricity cost, and covers various media types including tape, optical, and SSDs.
  • At the recent PASIG Julian Morley reported on the model being used at the Stanford Digital Repository, a hybrid local and cloud system, and he has made the spreadsheet available for use.
Below the fold some commentary on all three models.

Tuesday, February 12, 2019

IT Improves Productivity!

In The Productivity Paradox David Rotman writes:
Productivity growth in most of the world’s rich countries has been dismal since around 2004. Especially vexing is the sluggish pace of what economists call total factor productivity—the part that accounts for the contributions of innovation and technology. In a time of Facebook, smartphones, self-driving cars, and computers that can beat a person at just about any board game, how can the key economic measure of technological progress be so pathetic? Economists have tagged this the “productivity paradox.”

Some argue that it’s because today’s technologies are not nearly as impressive as we think. The leading proponent of that view, Northwestern University economist Robert Gordon, contends that compared with breakthroughs like indoor plumbing and the electric motor, today’s advances are small and of limited economic benefit. Others think productivity is in fact increasing but we simply don’t know how to measure things like the value delivered by Google and Facebook, particularly when many of the benefits are “free.”
My view is that IT is only one of the factors driving the decrease of productivity in the general economy, but that there are some areas of the economy in which IT is greatly increasing productivity. An explanation is below the fold.

Thursday, February 7, 2019

Cloud For Preservation

Imagine you're responsible for preserving the long-established digital collection at a large research or national library. It is currently preserved in home-grown software, or off-the-shelf software that's been extensively customized, that you are responsible for running on hardware run by your institution's IT department. You are probably not a large customer of theirs. They are probably laying down the law, saying "cloud first", especially as you are looking at a looming hardware refresh. Below the fold, I examine a set of issues that need to be clarified in the decision-making process.

Tuesday, February 5, 2019

The Economics Of Bitcoin Transactions

Izabella Kaminska's BIS trolls bitcoin reports on analysis of the economics of Bitcoin transactions from Raphael Auer at the Bank for International Settlements. She starts:
Bitcoin aspires to take over the world. But as we all know (according to poorly sourced conspiracy forums), the world is currently run by the Bank of International Settlements (BIS), the central bank to central banks. That means Bitcoin needs to displace the BIS in the near future if it is to get anywhere.

But it takes one to know one.

So here's the dominant global payments system calling out the aspiring global payments system in an excellent piece of professional trolling this week
Auer's is indeed an excellent piece of work. Follow me below the fold for some details.

Thursday, January 31, 2019

Facebook's Catch-22

John Herrman's How Secrecy Fuels Facebook Paranoia takes the long way round to come to a very simple conclusion. My shorter version of Herrman's conclusion is this. In order to make money Facebook needs to:
  1. Convince advertisers that it is an effective means of manipulating the behavior of the mass of the population.
  2. Avoid regulation by convincing governments that it is not an effective means of manipulating the behavior of the mass of the population.
The dilemma is even worse because among the advertisers Facebook needs to believe in its effectiveness are individual politicians and political parties, both big advertisers! This Catch-22 is the source of Facebook's continuing PR problems, listed by Ryan Mac. Follow me below the fold for details.

Tuesday, January 29, 2019

Blockchain Video and Podcast

CNI has now posted the video of my 20-minute talk Blockchain: What's Not To Like? to YouTube and Vimeo. Here is the YouTube version:



Gerry Bayne interviewed me at the Fall CNI meeting for CNI's podcast series. The 20-minute conversation is a companion piece to the talk. The podcast is on the Educause SoundCloud channel.

I made one, easily spotted, mistake in the interview when I said $3,000 instead of $300,000. But other than that I'm happy with both the video and the podcast.

Tuesday, January 22, 2019

Trump's Shutdown Impacts Information Access

Government shutdown causing information access problems by James A. Jacobs and James R. Jacobs is important. It documents the effect of the Trump government shutdown on access to globally important information:
Twitter and newspapers are buzzing with complaints about widespread problems with access to government information and data (see for example, Wall Street Journal (paywall 😐 ), ZDNet News, Pew Center, Washington Post, Scientific American, TheVerge, and FedScoop to name but a few).
Matthew Green, a professor at Johns Hopkins, said “It’s worrying that every single US cryptography standard is now unavailable to practitioners.” He was responding to the fact that he could not get the documents he needed from the National Institute of Standards and Technology (NIST) or its branch, the Computer Security Resource Center (CSRC). The government shutdown is the direct cause of these problems.
They point out how this illustrates the importance of libraries collecting and preserving web-published information:
Regardless of who you (or your user communities) blame for the shutdown itself, this loss of access was entirely foreseeable and avoidable. It was foreseeable because it has happened before. It was avoidable because libraries can select, acquire, organize, and preserve these documents and provide access to them and services for them whether the government is open or shut-down.
Go read the whole thing, and weep for the way libraries have abandoned their centuries-long mission of safeguarding information for future readers.

Thursday, January 10, 2019

Digital Preservation Network Is No More

In Why Is the Digital Preservation Network Disbanding? Roger Schonfeld examines the demise of the Digital Preservation Network which was announced last month:
An initial announcement said directly that "After careful analysis of the Digital Preservation Network's membership, operating model, and finances, the Board of Trustees of DPN passed a resolution to effect an orderly wind-down of DPN," including committing to consultations with each member to ensure that content would not be lost in the wind-down. Shortly thereafter, messages came out from DPN's hubs, both individually including HathiTrust, and collectively, characterizing their operating and financial strength and ability to provide for an orderly transition. Because DPN was not itself directly preserving anything but rather a broker for preservation services by underlying repositories, it does not appear that any content will be put at risk.
Below the fold, I look at various views of the lessons to be learned.

Thursday, January 3, 2019

Trust In Digital Content

This is the fourth and I hope final part of a series about trust in digital content that might be called:
Is this the real life?
Is this just fantasy?
The series so far moved down the stack:
  • The first part was Certificate Transparency, about how we know we are getting content from the Web site we intended to.
  • The second part was Securing The Software Supply Chain, about how we know we're running the software we intended to, such as the browser that got the content whose certificate was transparent.
  • The third part was Securing The Hardware Supply Chain, about how we can know that the hardware the software we secured is running on is doing what we expect it to.
Below the fold this part asks whether, even if the certificate, software and hardware were all perfectly secure, we could trust what we were seeing.