Thursday, May 23, 2019

Regulating Cryptocurrencies

Satoshi Nakamoto's Bitcoin emerged not just from three decades of computer science research, but also from two interrelated cult-like ideologies of the right, libertarianism and Austrian economics. Governments are generally happy with computer science research until it gets in the way of law enforcement, but non-kleptocratic governments tend to be unhappy with both libertarianism and Austrian economics, particularly when they get in the way of law enforcement.

Tuesday, May 21, 2019

Ten Hot Topics

The topic of scholarly communication has received short shrift here for the last few years. There has been too much to say about other topics, and developments such as Plan S have been exhaustively discussed elsewhere. But I do want to call attention to an extremely valuable review by Jon Tennant and a host of co-authors entitled Ten Hot Topics around Scholarly Publishing.

Thursday, May 16, 2019

Review Of Data Storage In DNA

Luis Ceze, Jeff Nivala and Karin Strauss of the University of Washington and Microsoft Research have published a fascinating review of the history and state-of-the-art in Molecular digital data storage using DNA. The abstract reads:
Molecular data storage is an attractive alternative for dense and durable information storage, which is sorely needed to deal with the growing gap between information production and the ability to store data. DNA is a clear example of effective archival data storage in molecular form. In this Review, we provide an overview of the process, the state of the art in this area and challenges for mainstream adoption. We also survey the field of in vivo molecular memory systems that record and store information within the DNA of living cells, which, together with in vitro DNA data storage, lie at the growing intersection of computer systems and biotechnology.
Tuesday, May 14, 2019

Storing Data In Oligopeptides

Thursday, May 9, 2019

Immutability FTW!

There's an apparently apocryphal story that when Willie Sutton, the notorious bank robber of the 1930s to 1950s, was asked why he robbed banks, he answered:
Because that's where the money is!
Today's Willie Suttons don't need a disguise or an (unloaded) Thompson submachine gun, because they rob cryptocurrency exchanges. As David Gerard writes:
Crypto exchange hacks are incredibly rare, and only happen every month or so.
Yesterday Bloomberg reported:
Binance, one of the world’s largest cryptocurrency exchanges, said hackers withdrew 7,000 Bitcoins worth about $40 million via a single transaction in a “large scale security breach,” the latest in a long line of thefts in the digital currency space.
Thursday, May 2, 2019

Lets Put Our Money Where Our Ethics Are

Thursday, April 25, 2019

Short talk at Asilomar Microcomputer Workshop

Thursday, April 18, 2019

Personal Pods and Fatcat

Sir Tim Berners-Lee's Solid project envisages a decentralized Web in which people control their own data stored in personal "pods":
The basic idea of Solid is that each person would own a Web domain, the "host" part of a set of URLs that they control. These URLs would be served by a "pod", a Web server controlled by the user that implemented a whole set of Web API standards, including authentication and authorization. Browser-side apps would interact with these pods, allowing the user to:
  • Export a machine-readable profile describing the pod and its capabilities.
  • Write content for the pod.
  • Control others' access to the content of the pod.
Pods would have inboxes to receive notifications from other pods. So that, for example, if Alice writes a document and Bob writes a comment in his pod that links to it in Alice's pod, a notification appears in the inbox of Alice's pod announcing that event. Alice can then link from the document in her pod to Bob's comment in his pod. In this way, users are in control of their content which, if access is allowed, can be used by Web apps elsewhere.
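
The Alice and Bob exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Solid API: the `Pod` class, its `write` and `link` methods, the `pods` lookup table and the `alice.example` / `bob.example` domains are all hypothetical, and a real pod would deliver notifications over HTTP with authentication and authorization rather than by appending to an in-memory list.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Pod:
    """Hypothetical in-memory stand-in for a personal pod (not the Solid API)."""
    domain: str
    resources: dict = field(default_factory=dict)  # path -> {"body", "links"}
    inbox: list = field(default_factory=list)      # notifications from other pods

    def url(self, path: str) -> str:
        return f"https://{self.domain}/{path}"

    def write(self, path: str, body: str) -> str:
        """Store content in this pod and return its URL."""
        self.resources[path] = {"body": body, "links": []}
        return self.url(path)

    def link(self, path: str, target_url: str, pods: dict) -> None:
        """Link one of our resources to a URL and notify the pod that hosts it."""
        self.resources[path]["links"].append(target_url)
        target_pod = pods.get(urlparse(target_url).hostname)
        if target_pod is not None and target_pod is not self:
            target_pod.inbox.append(
                {"source": self.url(path), "target": target_url}
            )


alice = Pod("alice.example")
bob = Pod("bob.example")
pods = {pod.domain: pod for pod in (alice, bob)}

doc_url = alice.write("doc", "Alice's document")
bob.write("comment", "Bob's comment on Alice's document")
bob.link("comment", doc_url, pods)        # a notification lands in Alice's inbox

notification = alice.inbox[0]
alice.link("doc", notification["source"], pods)  # Alice links back to the comment
```

The point of the sketch is that each pod holds only its owner's content; the inbox is the single place where another pod can leave a trace, and it is up to Alice whether to act on it.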
In his Paul Evan Peters Award Lecture, my friend Herbert Van de Sompel applied this concept to scholarly communication, envisaging a world in which access, for both humans and programs, to all the artifacts of research would be greatly enhanced.
In Herbert's vision, institutions would host their researchers' "research pods", which would be part of their personal domain but would have extensions specific to scholarly communication, such as automatic archiving upon publication.
Tuesday, April 16, 2019

Tuesday, April 9, 2019

What is Amazon?

In Why It's Hard To Escape Amazon's Long Reach, Paris Martineau and Louise Matsakis have compiled an amazingly long list of businesses that exist inside Amazon's big tent. After it went up, they had to keep updating it as people pointed out businesses they'd missed. In most of those businesses, Amazon's competitors are at a huge disadvantage:
While its retail business is the most visible to consumers, the cloud computing arm, Amazon Web Services, is the cash cow. AWS has significantly higher profit margins than other parts of the company. In the third quarter, Amazon generated $3.7 billion in operating income (before taxes). More than half of the total, $2.1 billion, came from AWS, on just 12 percent of Amazon’s total revenue. Amazon can use its cloud cash to subsidize the goods it ships to customers, helping to undercut retail competitors who don’t have similar adjunct revenue streams.
In the mid-50s my father wrote a textbook, Organisation of retail distribution, with a second edition in the mid-60s. He would have been fascinated by Amazon. I've written about Amazon from many different viewpoints, including storage as a service, and anti-trust, so I'm fascinated with Amazon, too. Now, when you put recent posts by two different writers together, an extraordinarily interesting picture emerges, not just of Amazon but of the risks inherent to the "friction-free" nature of the Web:
  • Zack Kanter's What is Amazon? is easily the most insightful thing I've ever read about Amazon. It starts by examining how Walmart's "slow AI" transformed retail, continues by describing how Amazon transformed Walmart's "slow AI" into one better suited to the Internet, and ends up with a discussion of how Amazon's "slow AI" seems recently to have made a fundamental mistake.
  • Izabella Kaminska's series Amazon (sub)Prime? and Amazon (sub)Prime - Part II provides the deep dive to go with Kanter's big picture, looking in detail into one of the many symptoms of the "slow AI's" apparent mistake.
Thursday, April 4, 2019

Digitized Historical Documents

Tuesday, April 2, 2019

First We Change How People Behave

Then the system will work the way we want. My skepticism about Level 5 self-driving cars keeps getting reinforced. Below the fold, two recent examples.

Thursday, March 28, 2019

The 47 Links Mystery

Nearly a year ago, in All Your Tweets Are Belong To Kannada, I blogged about Cookies Are Why Your Archived Twitter Page Is Not in English. It describes some fascinating research by Sawood Alam and Plinio Vargas into the effect of cookies on the archiving of multi-lingual web-sites.

Tuesday, March 26, 2019

FAST 2019

I wasn't able to attend this year's FAST conference in Boston and, reading through the papers, I didn't miss much relevant to long-term storage. Below the fold, a couple of quick notes and a look at the one really relevant paper.

Thursday, March 21, 2019

Cost-Reducing Writing DNA Data

In DNA's Niche in the Storage Market, I addressed a hypothetical DNA storage company's engineers and posed this challenge:
increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
Now, a company called Catalog plans to demo a significant step in the right direction:
The goal of the demonstration, says Park, is to store 125 gigabytes, ... in 24 hours, on less than 1 cubic centimeter of DNA. And to do it for $7,000.
That would be 1E12 bits for $7E3. At the theoretical maximum 2 bits/base, that is 5E11 bases, or $1.4E-8 per base, versus last year's estimate of 1E-4, or around 7,000 times better.
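
As a rough check on the arithmetic, here is a minimal sketch that recomputes the cost per base from the figures quoted for the demo. It assumes decimal gigabytes (1E9 bytes) and that last year's estimate of 1E-4 is in dollars per base; the variable names are illustrative only.

```python
GIGABYTE = 10**9      # assumption: decimal gigabytes
BITS_PER_BYTE = 8
BITS_PER_BASE = 2     # theoretical maximum cited in the text

demo_bytes = 125 * GIGABYTE
demo_cost_usd = 7_000
previous_cost_per_base_usd = 1e-4   # assumption: last year's estimate is $/base

demo_bits = demo_bytes * BITS_PER_BYTE
demo_bases = demo_bits / BITS_PER_BASE    # each base carries 2 bits
cost_per_base_usd = demo_cost_usd / demo_bases
improvement = previous_cost_per_base_usd / cost_per_base_usd

print(f"bits stored:    {demo_bits:.1e}")
print(f"bases needed:   {demo_bases:.1e}")
print(f"cost per base:  ${cost_per_base_usd:.1e}")
print(f"improvement:    {improvement:,.0f}x")
```

Real encodings achieve fewer than 2 bits/base once error correction and addressing are included, so the per-base figure here is the optimistic end of the range.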

Tuesday, March 19, 2019

Compression vs. Preservation

An archive is in a hardware refresh cycle and they have asked me to comment on concerns arising because their favored storage hardware uses data compression, which may not be possible to disable even if doing so were a good idea. This is an issue I wrote about two years ago in Threats to stored data.

Thursday, March 14, 2019

It's The Enforcement, Stupid!

Kim Stanley Robinson is a remarkable author. In 1990 he concluded his Wild Shore triptych of novels describing alternate futures for California with Pacific Edge:
Pacific Edge (1990) can be compared to Ernest Callenbach's Ecotopia, and also to Ursula K. Le Guin's The Dispossessed. This book's Californian future is set in the El Modena neighborhood of Orange in 2065. It depicts a realistic utopia as it describes a possible transformation process from our present status, to a more ecologically-focused future.
Thursday, March 7, 2019

It Isn't Just Cryptocurrency Mining

Izabella Kaminska's Just because it's digital doesn't mean it's green reports on:
A new report by the carbon emission think-tank The Shift Project out this week highlights that not much has changed since [2014]. ICT still contributes to about 4 per cent of global greenhouse gas emissions, which is still twice that of civil aviation. What is worse, its contribution is growing more quickly than that of civil aviation.
Tuesday, March 5, 2019

Demand Is Far From Insatiable

Based on numbers that IDC conjures from thin air, pundits believe that demand for storage is insatiable because everyone says Lets Just Keep Everything Forever In The Cloud. That idea assumes storage is free, but Storage Will Be Much Less Free Than It Used To Be. (Both links are from 2012). Below the fold I look at some real-world numbers showing how much storage actual customers are buying.

Tuesday, February 26, 2019

Economic Models Of Long-Term Storage

My work on the economics of long-term storage with students at the UC Santa Cruz Center for Research in Storage Systems stopped about six years ago, some time after the funding from the Library of Congress ran out. Last year, to help with some work at the Internet Archive, I developed a much simplified economic model, which runs on a Raspberry Pi.

Two recent developments provide alternative models:
  • Last year, James Byron, Darrell Long, and Ethan Miller's Using Simulation to Design Scalable and Cost-Efficient Archival Storage Systems (also here) reported on a vastly more sophisticated model developed at the Center. It includes much more detailed historical data about, for example, electricity cost, and covers various media types including tape, optical, and SSDs.
  • At the recent PASIG, Julian Morley reported on the model being used at the Stanford Digital Repository, a hybrid local and cloud system, and he has made the spreadsheet available for use.
Tuesday, February 12, 2019

IT Improves Productivity!

In The Productivity Paradox David Rotman writes:
Productivity growth in most of the world’s rich countries has been dismal since around 2004. Especially vexing is the sluggish pace of what economists call total factor productivity—the part that accounts for the contributions of innovation and technology. In a time of Facebook, smartphones, self-driving cars, and computers that can beat a person at just about any board game, how can the key economic measure of technological progress be so pathetic? Economists have tagged this the “productivity paradox.”

Some argue that it’s because today’s technologies are not nearly as impressive as we think. The leading proponent of that view, Northwestern University economist Robert Gordon, contends that compared with breakthroughs like indoor plumbing and the electric motor, today’s advances are small and of limited economic benefit. Others think productivity is in fact increasing but we simply don’t know how to measure things like the value delivered by Google and Facebook, particularly when many of the benefits are “free.”
Thursday, February 7, 2019

Cloud For Preservation

Imagine you're responsible for preserving the long-established digital collection at a large research or national library. It is currently preserved in home-grown software, or off-the-shelf software that's been extensively customized, that you are responsible for running on hardware run by your institution's IT department. You are probably not a large customer of theirs. They are probably laying down the law, saying "cloud first", especially as you are looking at a looming hardware refresh. Below the fold, I examine a set of issues that need to be clarified in the decision-making process.

Tuesday, February 5, 2019

The Economics Of Bitcoin Transactions

Izabella Kaminska's BIS trolls bitcoin reports on analysis of the economics of Bitcoin transactions from Raphael Auer at the Bank for International Settlements. She starts:
Bitcoin aspires to take over the world. But as we all know (according to poorly sourced conspiracy forums), the world is currently run by the Bank of International Settlements (BIS), the central bank to central banks. That means Bitcoin needs to displace the BIS in the near future if it is to get anywhere.

But it takes one to know one.

So here's the dominant global payments system calling out the aspiring global payments system in an excellent piece of professional trolling this week
Thursday, January 31, 2019

Facebook's Catch-22

John Herrman's How Secrecy Fuels Facebook Paranoia takes the long way round to come to a very simple conclusion. My shorter version of Herrman's conclusion is this. In order to make money Facebook needs to:
  1. Convince advertisers that it is an effective means of manipulating the behavior of the mass of the population.
  2. Avoid regulation by convincing governments that it is not an effective means of manipulating the behavior of the mass of the population.
Tuesday, January 29, 2019

Blockchain Video and Podcast

CNI has now posted the video of my 20-minute talk Blockchain: What's Not To Like? to YouTube and Vimeo.

Gerry Bayne interviewed me at the Fall CNI meeting for CNI's podcast series. The 20-minute conversation is a companion piece to the talk. The podcast is on the Educause SoundCloud channel.

I made one, easily spotted, mistake in the interview when I said $3,000 instead of $300,000. But other than that I'm happy with both the video and the podcast.

Tuesday, January 22, 2019

Trump's Shutdown Impacts Information Access

Government shutdown causing information access problems by James A. Jacobs and James R. Jacobs is important. It documents the effect of the Trump government shutdown on access to globally important information:
Twitter and newspapers are buzzing with complaints about widespread problems with access to government information and data (see for example, Wall Street Journal (paywall 😐 ), ZDNet News, Pew Center, Washington Post, Scientific American, TheVerge, and FedScoop to name but a few).
Matthew Green, a professor at Johns Hopkins, said “It’s worrying that every single US cryptography standard is now unavailable to practitioners.” He was responding to the fact that he could not get the documents he needed from the National Institute of Standards and Technology (NIST) or its branch, the Computer Security Resource Center (CSRC). The government shutdown is the direct cause of these problems.
They point out how this illustrates the importance of libraries collecting and preserving web-published information:
Regardless of who you (or your user communities) blame for the shutdown itself, this loss of access was entirely foreseeable and avoidable. It was foreseeable because it has happened before. It was avoidable because libraries can select, acquire, organize, and preserve these documents and provide access to them and services for them whether the government is open or shut-down.
Go read the whole thing, and weep for the way libraries have abandoned their centuries-long mission of safeguarding information for future readers.

Thursday, January 10, 2019

Digital Preservation Network Is No More

In Why Is the Digital Preservation Network Disbanding? Roger Schonfeld examines the demise of the Digital Preservation Network, which was announced last month:
An initial announcement said directly that "After careful analysis of the Digital Preservation Network's membership, operating model, and finances, the Board of Trustees of DPN passed a resolution to affect an orderly wind-down of DPN," including committing to consultations with each member to ensure that content would not be lost in the wind-down. Shortly thereafter, messages came out from DPN's hubs, both individually including HathiTrust, and collectively, characterizing their operating and financial strength and ability to provide for an orderly transition. Because DPN was not itself directly preserving anything but rather a broker for preservation services by underlying repositories, it does not appear that any content will be put at risk.
Below the fold, I look at various views of the lessons to be learned.

Thursday, January 3, 2019

Trust In Digital Content

This is the fourth and I hope final part of a series about trust in digital content that might be called:
Is this the real life?
Is this just fantasy?
The series so far moved down the stack:
  • The first part was Certificate Transparency, about how we know we are getting content from the Web site we intended to.
  • The second part was Securing The Software Supply Chain, about how we know we're running the software we intended to, such as the browser that got the content whose certificate was transparent.
  • The third part was Securing The Hardware Supply Chain, about how we can know that the hardware the software we secured is running on is doing what we expect it to.
Below the fold this part asks whether, even if the certificate, software and hardware were all perfectly secure, we could trust what we were seeing.