Tuesday, October 30, 2018

Controlled Digital Lending

Three years ago in Emulation and Virtualization as Preservation Strategies I wrote about Controlled Digital Lending (CDL):
One idea that might be worth exploring as a way to mitigate the legal issues is lending. The Internet Archive has successfully implemented a lending system for their collection of digitized books; readers can check a book out for a limited period, and each book can be checked out to at most one reader at a time. This has not encountered much opposition from copyright holders.

A similar system for emulation would be feasible; readers would check out an emulation for a limited period, and each emulation could be checked out to at most one reader at a time. One issue would be dependencies. An archive might have, say, 10,000 emulations based on Windows 3.1. If checking out one blocked access to all 10,000 that might be too restrictive to be useful.
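The core CDL invariant — each digitized item checked out to at most one reader at a time — is simple to state. A minimal sketch in Python (the class and names are illustrative, not the Internet Archive's implementation) makes both the invariant and the dependency dilemma concrete:

```python
class LendingLibrary:
    """Toy model of controlled digital lending: an item may be
    checked out to at most one reader at a time."""

    def __init__(self):
        self.checked_out = {}  # item_id -> reader_id

    def check_out(self, item_id, reader_id):
        # Refuse the loan if any other reader currently holds the item.
        if item_id in self.checked_out:
            return False
        self.checked_out[item_id] = reader_id
        return True

    def check_in(self, item_id, reader_id):
        # Only the current borrower can return the item.
        if self.checked_out.get(item_id) == reader_id:
            del self.checked_out[item_id]


lib = LendingLibrary()
lib.check_out("emulation-42", "alice")   # succeeds
lib.check_out("emulation-42", "bob")     # refused: already on loan
```

The dependency question is whether `item_id` should name the individual emulation or a shared component such as the Windows 3.1 image it runs on; locking at the shared-component level would make one loan block all 10,000 dependents.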
Now, Controlled Digital Lending by Libraries offers libraries the opportunity to:
  • better understand the legal framework underpinning CDL,
  • communicate their support for CDL, and
  • build a community of expertise around the practice of CDL.
Below the fold, some details.

Thursday, October 25, 2018

Syndicating Journal Publisher Content

There's a lot of good information in Roger Schonfeld's Will Publishers Syndicate Their Content?. It starts:
The scholarly publishing sector has struggled to address the problems that users face in their discovery-to-access workflow and thereby stave off skyrocketing piracy. The top-line impact of these struggles is becoming clearer, starting with Elsevier’s absence from Germany. This makes the efforts to establish seamless single-platform access to all scholarly publications — equal in extent as Sci-Hub but legitimate, and which I term a Supercontinent of Scholarly Publishing — all the more urgent. The technical solutions are challenging, and at the STM meeting in Frankfurt last week it became clear that, although progress is being made, policy, governance, and competition issues may complicate the drive to consensus.
Schonfeld asserts that providing a seamless, uniform view of the publisher's content, whether paywalled or open access, requires two services:
First, it requires an ability to authorize appropriate access in a decentralized distribution environment. A Shared Entitlements System, as it is sometimes called, would be a kind of common authorization service for all publishers. As I will discuss below, there are at least two options for how Entitlements can be addressed. Second, it requires Distributed Usage Logging, which is to say the ability for all usage, wherever it takes place, to be “counted” in measuring the value of articles on behalf of authors and licenses on behalf of publishers.
Below the fold, a rather long explanation of why I think Schonfeld's analysis doesn't go far enough.

Tuesday, October 23, 2018

Gini Coefficients Of Cryptocurrencies

The Gini coefficient expresses a system's degree of inequality or, in the blockchain context, centralization. It therefore factors into arguments, like mine, that claims of blockchains' decentralization are bogus.
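For the record, here is one standard way to compute a Gini coefficient from a list of balances, in Python. This is just the textbook formula, not the methodology behind any of the estimates quoted below:

```python
def gini(balances):
    """Gini coefficient of a list of non-negative balances.

    0.0 means perfect equality; values approaching 1.0 mean one
    holder owns almost everything.
    """
    xs = sorted(balances)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n,
    # with 1-based rank i over the sorted balances.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


gini([1, 1, 1, 1])   # 0.0: everyone holds the same amount
gini([0, 0, 0, 1])   # 0.75: one holder owns everything (max for n=4)
```

Note that the answer depends entirely on whose balances go into the list, which is exactly the problem with the estimate Roubini cites.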

In his testimony to the US Senate Committee on Banking, Housing, and Urban Affairs' hearing on “Exploring the Cryptocurrency and Blockchain Ecosystem" entitled Crypto is the Mother of All Scams and (Now Busted) Bubbles While Blockchain Is The Most Over-Hyped Technology Ever, No Better than a Spreadsheet/Database, Nouriel Roubini wrote:
wealth in crypto-land is more concentrated than in North Korea where the inequality Gini coefficient is 0.86 (it is 0.41 in the quite unequal US): the Gini coefficient for Bitcoin is an astonishing 0.88.
The link is to Joe Weisenthal's How Bitcoin Is Like North Korea from nearly five years ago, which was based upon a Stack Exchange post, which in turn was based upon a post by the owner of the Bitcoinica exchange from 2011! Which didn't look at all holdings of Bitcoin, let alone the whole of crypto-land, but only at Bitcoinica's customers!

Follow me below the fold as I search for more up-to-date and comprehensive information. I'm not even questioning how Roubini knows the Gini coefficient of North Korea to two decimal places.

Thursday, October 18, 2018

Betteridge's Law Violation

Erez Zadok points me to Wasim Ahmed Bhat's Is a Data-Capacity Gap Inevitable in Big Data Storage? in IEEE Computer. It is a violation of Betteridge's Law of Headlines because the answer isn't no. But what, exactly, is this gap? Follow me below the fold.

Tuesday, October 16, 2018

Software Heritage Foundation Update

I first wrote about the Software Heritage Foundation two years ago. It is four months since their Archive officially went live. Now Roberto di Cosmo and his collaborators have an article, and a video, entitled Building the Universal Archive of Source Code in Communications of the ACM describing their three challenges of collecting, preserving, and sharing source code, and setting out their current status:
Software Heritage is an active project that has already assembled the largest existing collection of software source code. At the time of writing the Software Heritage Archive contains more than four billion unique source code files and one billion individual commits, gathered from more than 80 million publicly available source code repositories (including a full and up-to-date mirror of GitHub) and packages (including a full and up-to-date mirror of Debian). Three copies are currently maintained, including one on a public cloud.

As a graph, the Merkle DAG underpinning the archive consists of 10 billion nodes and 100 billion edges; in terms of resources, the compressed and fully de-duplicated archive requires some 200TB of storage space. These figures grow constantly, as the archive is kept up to date by periodically crawling major code hosting sites and software distributions, adding new software artifacts, but never removing anything. The contents of the archive can already be browsed online, or navigated via a REST API.
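The Merkle DAG structure is what makes the full de-duplication possible: every node's identifier is a hash of its content, so identical files or directories anywhere in the archive collapse into a single node. A simplified Python sketch (Software Heritage's real identifiers use a Git-compatible scheme; the `blob`/`tree` prefixes here are just illustrative):

```python
import hashlib

def file_id(content: bytes) -> str:
    # Content-addressed identifier for a file ("blob"): identical
    # files anywhere in the archive share one node.
    return hashlib.sha1(b"blob " + content).hexdigest()

def dir_id(entries: dict) -> str:
    # entries maps a name to the id of a file or subdirectory.
    # Hashing the sorted (name, id) pairs means a directory's id
    # changes if and only if anything beneath it changes.
    payload = b"".join(
        name.encode() + b"\0" + ident.encode()
        for name, ident in sorted(entries.items())
    )
    return hashlib.sha1(b"tree " + payload).hexdigest()


readme = file_id(b"hello\n")
root = dir_id({"README": readme})
```

A commit node would in turn hash the root directory's id plus its parent commits, so the whole history of every repository hangs off the same shared pool of file and directory nodes.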
I have always believed, as I wrote in 2013:
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
I'm very disappointed that national libraries haven't accepted this argument, let alone the argument that preservation and access to their other digital collections largely depend on preserving and providing access to open source software. Since they have failed in this task, it is up to the Software Heritage Foundation to step into the breach.

You can find out more at their Web site, and support this important work by donating.

Thursday, October 11, 2018

I'm Shocked, Shocked To Find Collusion Going On

The security of a permissionless peer-to-peer system generally depends upon the assumption of uncoordinated choice, the idea that each peer acts independently upon its own view of the system's state. Vitalik Buterin, a co-founder of Ethereum, wrote in The Meaning of Decentralization:
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently.
Another way of saying this is that the system isn't secure if enough peers collude with each other. Below the fold, I look at why this is a big problem.
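To see why collusion breaks the security argument, consider the standard gambler's-ruin analysis from Nakamoto's Bitcoin paper, sketched here in Python. A coalition controlling a fraction p of the mining power can eventually rewrite a chain that is k blocks ahead of it with probability (p/(1-p))^k, and with certainty once p reaches one half:

```python
def catchup_probability(p, k):
    """Probability that a coalition with mining-power share p
    eventually overtakes an honest chain k blocks ahead
    (gambler's-ruin bound; certainty once p >= 0.5)."""
    if p >= 0.5:
        return 1.0
    q = 1.0 - p
    return (p / q) ** k


catchup_probability(0.1, 6)   # tiny: 6 confirmations defeat a 10% coalition
catchup_probability(0.3, 6)   # noticeably larger for a 30% coalition
catchup_probability(0.5, 6)   # 1.0: a majority coalition always wins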

Tuesday, October 9, 2018

Click On The Llama

There was lots of great stuff at the Internet Archive's Annual Bash. But for those of us who can remember the days before PCs played music, the highlight was right at the end of the presentations when the awesome Jason Scott introduced the port of 1997's WinAmp to the Web. Two years earlier:
WinPlay3 was the first real-time MP3 audio player for PCs running Windows, both 16-bit (Windows 3.1) and 32-bit (Windows 95). Prior to this, audio compressed with MP3 had to be decompressed prior to listening.
WinPlay3 was the first, but it was bare-bones. It was WinAmp that really got people to realize that the PC was a media device. But the best part was that WinAmp was mod-able. It unleashed a wave of creativity (Debbie does WinAmp, anyone?), now preserved in the Archive's collection of over 5,000 WinAmp skins!

Jason has the details in his blog post Don't Click on the Llama:
Thanks to Jordan Eldredge and the Webamp programming community for this new and strange periscope into the 1990s internet past.
When I first clicked on the llama on The Swiss Family Robinson on my Ubuntu desktop the sound ceased. It turns out that the codec selection mechanism is different between the regular player and WinAmp, and it needed a codec I didn't have installed. The fix was:
sudo apt install ubuntu-restricted-extras
I should also note that the Archive's amazing collection of emulations now includes the Commodore 64 (Jason's introduction is here), and 1,100 additional arcade machines.

Thursday, October 4, 2018

I Don't Really Want To Stop The Show

But I thought you might like to know,
It was twenty years ago today that Vicky Reich and I walked into Mike Keller's office in the Stanford Library and got the go-ahead to start the LOCKSS Program. I told the story of its birth five years ago.

Over the last couple of years, as we retired, the program has migrated from being an independent operation under the umbrella of the Stanford Library, to being one of the programs run by the Library's main IT operation, Tom Cramer's DLSS. The transition will shortly be symbolized by a redesigned website (its predecessor looked like this).

Now we are retired, on my blog there are lists of Vicky's and my publications from 1981 on (the LOCKSS ones start in 2000), and talks from 2006 on.

Thanks again to the NSF, Sun Microsystems, and the Andrew W. Mellon Foundation for the funding that allowed us to develop the system. Many thanks to the steadfast support of the libraries of the LOCKSS Alliance, and the libraries and publishers of the CLOCKSS Archive, that has sustained it in production. Special thanks to Don Waters for facilitating the program's evolution off grant funding, and to Margaret Kim for the original tortoise logo.

PS - Google is just one week older.  Vicky was the librarian on the Stanford Digital Library Project with Larry Page and Sergey Brin that led to Google.

Wednesday, October 3, 2018

Brief Talk At Internet Archive Event

Vicky Reich gave a brief talk at the Building A Better Web: The Internet Archive’s Annual Bash. She followed Jefferson Bailey's talk, which reported that the Internet Archive's efforts to preserve scholarly journals have already accumulated full text and metadata of nearly 8.7M articles, of which nearly 1.5M are from "at-risk" small journals. This is around 10% of the entire academic literature.

Below the fold, an edited text of Vicky's talk with links to the sources.

Tuesday, October 2, 2018

Bitcoin's Academic Pedigree

Bitcoin's Academic Pedigree (also here) by Arvind Narayanan and Jeremy Clark starts:
If you've read about bitcoin in the press and have some familiarity with academic research in the field of cryptography, you might reasonably come away with the following impression: Several decades' worth of research on digital cash, beginning with David Chaum, did not lead to commercial success because it required a centralized, banklike server controlling the system, and no banks wanted to sign on. Along came bitcoin, a radically different proposal for a decentralized cryptocurrency that didn't need the banks, and digital cash finally succeeded. Its inventor, the mysterious Satoshi Nakamoto, was an academic outsider, and bitcoin bears no resemblance to earlier academic proposals.
They comprehensively debunk this view, showing that each of the techniques Nakamoto used had been developed over the preceding three decades of academic research, and that Nakamoto's brilliant contribution was:
the specific, complex way in which the underlying components are put together.
Below the fold, details on the specific techniques.