Tuesday, December 29, 2020

Michael Nelson's Group On Archiving Twitter

The rise and fall of the Trump administration has amply illustrated the importance of Twitter in the historical record. Alas, Twitter has no economic motivation to cater to the needs of historians. As they work to optimize Twitter's user experience, the engineers are likely completely unaware of the problems they are causing the Web archives trying to preserve history. Even if they were aware, they would be unable to justify the time and effort necessary to mitigate them.

Over the last six months Michael Nelson's group at Old Dominion University have continued their excellent work to evaluate exactly how much trouble future historians will have to contend with in three new blog posts from Kritika Garg and Himarsha Jayanetti.
Below the fold, some commentary on each of them.

Tuesday, December 22, 2020

Stablecoins

I have long been skeptical of Bitcoin's "price" and, despite its recent massive surge, I'm still skeptical. But it turns out I was wrong two years ago when I wrote in Blockchain: What's Not To Like?:
Permissionless blockchains require an inflow of speculative funds at an average rate greater than the current rate of mining rewards if the "price" is not to collapse. To maintain Bitcoin's price at $4K requires an inflow of $300K/hour.
I found it hard to believe that this much actual money would flow in, but since then Bitcoin's "price" hasn't dropped below $4K, so I was wrong. Caution — I am only an amateur economist, and what follows below the fold is my attempt to make sense of what is going on.
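The $300K/hour figure in the quote can be reproduced with back-of-envelope arithmetic. This is a minimal sketch, not the original calculation: it assumes the 12.5 BTC block reward in force in 2018 and an average of one block every ten minutes, and it ignores transaction fees. The function and constant names are illustrative.

```python
# Assumptions (not stated in the post): a 12.5 BTC block reward, as in 2018,
# and one block every 10 minutes on average. Transaction fees are ignored.
BLOCK_REWARD_BTC = 12.5
BLOCKS_PER_HOUR = 6


def required_inflow_per_hour(price_usd: float) -> float:
    """Dollars per hour needed to buy up newly mined coins at price_usd."""
    return BLOCK_REWARD_BTC * BLOCKS_PER_HOUR * price_usd


print(required_inflow_per_hour(4_000))  # 300000.0
```

Under these assumptions, 75 new BTC per hour at $4K each have to be absorbed by buyers, which is where the $300K/hour comes from.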

Tuesday, December 8, 2020

RISC vs. CISC

The architectural debate between Complex Instruction Set Computers (CISC) and Reduced Instruction Set Computers (RISC) really took off in the 1980s:
In particular, two projects at Stanford University and the University of California, Berkeley are most associated with the popularization of this concept. Stanford's MIPS would go on to be commercialized as the successful MIPS architecture, while Berkeley's RISC gave its name to the entire concept and was commercialized as the SPARC.
For the last decade or more the debate has seemed frozen, with the CISC x86 architecture dominating the server and desktop markets, while the RISC ARM architecture dominated the mobile market. But two recent developments are shaking things up. Below the fold, some discussion.

Tuesday, December 1, 2020

737 MAX Ungrounding

My post 737 MAX: The Case Against Boeing is a year old and has accumulated 58 updates in comments. Now the aircraft is returning to service, it is time for a new post. Below the fold, Bjorn Fehrm has two interesting posts about the ungrounding.

Tuesday, November 24, 2020

I Rest My Case

Jeff Rothenberg's seminal 1995 Ensuring the Longevity of Digital Documents focused on the threat of the format in which the documents were encoded becoming obsolete, and rendering their content inaccessible. This was understandable; it was a common experience in the preceding decades. Rothenberg described two different approaches to the problem: migrating the document's content from the doomed format to a less doomed one, and emulating the software that accessed the document in a current environment.

The Web has dominated digital content since 1995, and in the Web world formats go obsolete very slowly, if at all, because they are in effect network protocols. The example of IPv6 shows how hard it is to evolve network protocols. But now we are facing the obsolescence of a Web format that was very widely used, as the long effort to kill off Adobe's Flash comes to fruition. Fortunately, Jason Scott's Flash Animations Live Forever at the Internet Archive shows that we were right all along. Below the fold, I go into the details.

Thursday, November 19, 2020

Storage Media Update

My last post on storage media was After A Decade, HAMR Is Still Nearly Here back in July. Below the fold, I look at some of the developments since then.

Thursday, November 12, 2020

Even More On The Ad Bubble

I've been writing for some time about the hype around online advertising. There's a lot of evidence that it is ineffective. Recently, the UK's Information Commissioner's Office concluded an investigation into Cambridge Analytica's involvement in the 2016 US election and the Brexit referendum. At The Register, Shaun Nichols summarizes their conclusions in UK privacy watchdog wraps up probe into Cambridge Analytica and... it was all a little bit overblown, no?:
El Reg has heard on good authority from sources in British political circles that Cambridge Analytica's advertised powers of online suggestion were rather overblown and in fact mostly useless. In the end, it was skewered by its own hype, accused of tangibly influencing the Brexit and presidential votes on behalf of political parties and campaigners using its Facebook data. Yet, no evidence, according to the ICO, could be found supporting those specific claims.
Below the fold I look at this, a recent book on the topic, and other evidence that has emerged since I wrote Contextual vs. Behavioral Advertising.

Tuesday, November 3, 2020

The Order Flow

The MacGuffin in the last two books of William Gibson's Blue Ant trilogy is Chombo, a reclusive hacker. In Spook Country he tracks a container full of US currency, and in Zero History:
"It's the order flow, isn't it?" Milgrim had had no intent to ask this at all. Hadn't been thinking off it. Yet it had emerged. His therapist had told him that ideas, in human relations, had lives of their own. Were in a sense autonomous.
"Of course"
"That's what Chombo was doing. Finding the order flow."
"He found it a week before they kidnapped him, but his work, to that point, would have been useless, Without him, I mean."
"And the market, the whole thing, it's no longer real? Because you know the future?"
"It's a very tiny slice of the future. The merest paring. Minutes."
"How many?"
Bigend had glanced around the empty lounge. "Seventeen, presently."
"Is that enough?"
"Seven would have been entirely adequate. Seven seconds, in most cases."
Entirely adequate to make Hubertus Bigend much, much richer, because knowing the order flow allows him to front-run the transactions.

Wikipedia defines front-running thus:
Front running, also known as tailgating, is the prohibited practice of entering into an equity (stock) trade, option, futures contract, derivative, or security-based swap to capitalize on advance, nonpublic knowledge of a large ("block") pending transaction that will influence the price of the underlying security. ... A front running firm either buys for its own account before filling customer buy orders that drive up the price, or sells for its own account before filling customer sell orders that drive down the price. Front running is prohibited since the front-runner profits from nonpublic information, at the expense of its own customers, the block trade, or the public market.
Follow me below the fold for a discussion of why the architecture of cryptocurrencies means that no-one needs Chombo's mysterious skills to front-run the order flow.

Thursday, October 29, 2020

The Long Now

A talk by Stewart Brand and Danny Hillis about 25 years ago explaining the concept of the "Long Now" and the idea of building a 10,000-year clock to illustrate it was what started me thinking about long-term digital preservation. The idea of Lots Of Copies Keep Stuff Safe (LOCKSS), and the acronym, came a couple of years later.

Hōryū-ji by Nekosuki, CC-BY-SA

Now, in The Data of Long-lived Institutions on the Long Now Foundation's blog, Alexander Rose refers to Hōryū-ji:
At about 1,400 years old, these are the two oldest continuously standing wooden structures in the world. And they’ve replaced a lot of parts of them. They keep the roofs on them, and even in a totally humid and raining environment, the central timbers of these buildings have stayed true. Interestingly, this temple was also the place where, over a thousand years ago, a Japanese princess had a vision that she needed to send a particular prayer out to the world to make sure that it survived into the future. And so she had, literally, a million wooden pagodas made with the prayer put inside them, and distributed these little pagodas as far and wide as she could. You can still buy these on eBay right now. It’s an early example of the philosophy of “Lots of Copies Keep Stuff Safe” (LOCKSS).
Below the fold, more on Rose's interesting post.

Tuesday, October 27, 2020

Unbanking The Banked

David Gerard has been writing a follow-up to his must-read and surprisingly successful Attack of the 50 Foot Blockchain: Bitcoin, Blockchain, Ethereum & Smart Contracts, entitled Libra Shrugged: How Facebook Tried to Take Over The Money. The Kindle version goes live on Amazon next Monday.

As one of his Patreons, I've been reading the chapters as he finished them. It isn't a laugh-a-minute read like his first book, but like the first it is a copiously sourced account of incredible hubris. Facebook's hubris led them to believe that they could, in effect, become a sovereign currency issuer like a government, without any of the responsibilities that governments assume when they control a currency. Actual governments looked at this proposal and responded "you have to be kidding". Follow me below the fold for more.

Tuesday, October 6, 2020

A Note On Blockchains

Blockchains have three components, a data structure, a set of replicas, and a consensus mechanism:
  • The data structure is often said to provide immutability or to be tamper-proof, but this is wrong. It is made out of bits, and bits can be changed or destroyed. What it actually provides is tamper-evidence, revealing that the data structure has changed.
  • If an unauthorized change to the data structure is detected the damage must be repaired. So there must be multiple replicas of the data structure to allow an undamaged replica to be copied to the damaged replica.
  • The role of the consensus mechanism is to authorize changes to the data structure, and prevent unauthorized changes. A change is authorized if the consensus of the replicas agrees to it.
Below the fold, some details.
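The first two components can be illustrated with a toy hash chain. This is a minimal sketch under stated assumptions, not how any particular blockchain is implemented: the block layout, the all-zero genesis value and the function names are invented for illustration, and the consensus mechanism is left out entirely.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash a block's payload together with its predecessor's hash."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link each payload to the hash of the block before it."""
    chain, prev = [], GENESIS
    for payload in payloads:
        h = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": h})
        prev = h
    return chain


def first_bad_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev = GENESIS
    for i, block in enumerate(chain):
        expected = block_hash(prev, block["payload"])
        if block["prev"] != prev or block["hash"] != expected:
            return i
        prev = block["hash"]
    return None


def repair(damaged, replica):
    """Replace a damaged copy with a copy of a replica that still verifies."""
    if first_bad_block(replica) is not None:
        raise ValueError("replica is damaged too")
    return [dict(block) for block in replica]


chain = build_chain(["a pays b 1", "b pays c 2", "c pays a 3"])
replica = [dict(block) for block in chain]

chain[1]["payload"] = "b pays c 200"  # the bits can be changed...
print(first_bad_block(chain))         # ...but the change is evident: 1

chain = repair(chain, replica)
print(first_bad_block(chain))         # None
```

Note that the chain does nothing to stop the change; it only reveals it, and the fix comes from the undamaged replica. Someone who rewrote every later hash as well would pass this check, which is why deciding which replica is authoritative falls to the consensus mechanism.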

Tuesday, September 29, 2020

Liability In The Software Supply Chain

Atlantic Council Report On Software Supply Chains was already rather long when I got to the last of the report's recommendations that I wanted to discuss, the one entitled Bring Lawyers, Guns and Money. It proposes imposing liability on actors in the software supply chain, and I wrote:
The fact that software vendors use licensing to disclaim liability for the functioning of their products is at the root of the lack of security in systems. These proposals are plausible but I believe they would either be ineffective or, more likely, actively harmful. There is so much to write about them that they deserve an entire post to themselves.
Below the fold is the post they deserve.

Tuesday, September 22, 2020

Moxie Marlinspike On Decentralization

The Ecosystem Is Moving: Challenges For Distributed And Decentralized Technology is a talk by Moxie Marlinspike that anyone interested in the movement to re-decentralize the Internet should watch and think about. Marlinspike concludes "I'm not entirely optimistic about the future of decentralized systems, but I'd also love to be proven wrong".

I spent nearly two decades building and operating in production the LOCKSS system, a small-ish system that was intended, but never quite managed, to be completely decentralized. I agree with Marlinspike's conclusion, and have been writing with this attitude since at least 2014's Economies Of Scale In Peer-to-Peer Networks. It is always comforting to find someone coming to the same conclusion via a completely different route, as with scalability expert Todd Hoff in 2018 and now Moxie Marlinspike based on his experience building the Signal encrypted messaging system. Below the fold I contrast his reasons for skepticism with mine.

Thursday, September 17, 2020

Don't Say We Didn't Warn You

Just over a quarter-century ago, Stanford Libraries' HighWire Press pioneered the switch of academic journal publishing from paper to digital when they put the Journal of Biological Chemistry on-line. Even in those early days of the Web, people understood that Web pages, and links to them, decayed over time. A year later, Brewster Kahle founded the Internet Archive to preserve them for posterity.

One difficulty was that although academic journals contained some of the Web content that was most important to preserve for the future, the Internet Archive could not access them because they were paywalled. Two years later, Vicky Reich and I started the LOCKSS (Lots Of Copies Keep Stuff Safe) program to address this problem. In 2000's Permanent Web Publishing we wrote:
Librarians have a well-founded confidence in their ability to provide their readers with access to material published on paper, even if it is centuries old. Preservation is a by-product of the need to scatter copies around to provide access. Librarians have an equally well-founded skepticism about their ability to do the same for material published in electronic form. Preservation is totally at the whim of the publisher.

A subscription to a paper journal provides the library with an archival copy of the content. Subscribing to a Web journal rents access to the publisher's copy. The publisher may promise "perpetual access", but there is no business model to support the promise. Recent events have demonstrated that major journals may vanish from the Web at a few months notice.

This poses a problem for librarians, who subscribe to these journals in order to provide both current and future readers with access to the material. Current readers need the Web editions. Future readers need paper; there is no other way to be sure the material will survive.
Now, Jeffrey Brainard's Dozens of scientific journals have vanished from the internet, and no one preserved them and Diana Kwon's More than 100 scientific journals have disappeared from the Internet draw attention to this long-standing problem. Below the fold I discuss the paper behind the Science and Nature articles.

Thursday, September 10, 2020

Amazon Is Profitable?

Six years ago, in Two Brief Updates I first wrote about Benedict Evans' insightful analysis of Amazon's financial reports:
He shows how Amazon's strategy is not to generate and distribute profits, but to re-invest their cash flow into starting and developing businesses. Starting each business absorbs cash, but as they develop they turn around and start generating cash that can be used to start the next one.
He is now back with a similarly insightful analysis entitled Amazon's profits, AWS and advertising, which starts:
People argue about Amazon a lot, and one of the most common and long-running arguments is about profits. The sales keep going up, and it takes a larger and larger share of US retail every year (7-8% in 2019), but it never seems to make any money. What’s going on?
Below the fold, some details of Evans' explanation.

Tuesday, September 8, 2020

Open Source Saturation

In Supporting Open Source Software I discussed the critical need for better support for contributors to open source projects. Now, Quo Vadis, Open Source? The Limits of Open Source Growth by Michael Dorner, Maximilian Capraro and Ann Barcomb presents statistical evidence suggesting that this problem is affecting the vitality of the open source environment. Follow me below the fold for the details.

Tuesday, September 1, 2020

Shout-Out To Gutenberg Project

I've mentioned before that my father spent his whole career, apart from WW2 as an RNVR watch officer on convoy escorts, at Harrods, the iconic London department store. He even published a textbook on retail distribution. So I can't resist a shout-out to the amazing work of Eric Hutton and the volunteers of Project Gutenberg who, over the last 13 years, have scanned, OCR-ed and proof-read the entire Harrods catalog from 1912. Below the fold, the details.

Thursday, August 27, 2020

Lack Of Anti-Trust Enforcement

The accelerating negative effects that have accumulated since the collapse of anti-trust enforcement in the US have been a prominent theme on this blog. This search currently returns 16 posts stretching back to 2009. Recently, perhaps started by Lina M. Khan's masterful January 2017 Yale Law Journal article Amazon's Antitrust Paradox, a consensus has been gradually emerging as to these negative effects. One problem for this consensus is that "real economists" don't believe the real world, they only believe mathematical models that produce approximations to the real world.

Now, Yves Smith's Fed Economists Finger Monopoly Concentration as Underlying Driver of Neoliberal Economic Restructuring; Barry Lynn in Harpers and Fortnite Lawsuit Put Hot Light on Tech Monopoly Power covers three developments in the emerging anti-monopoly consensus:
  1. Apple and Google ganging up on Epic Games.
  2. Lina M. Khan's ex-boss Barry Lynn's The Big Tech Extortion Racket: How Google, Amazon, and Facebook control our lives.
  3. Market Power, Inequality, and Financial Instability by Fed economists Isabel Cairó and Jae Sim
The first two will have to wait for future posts, but the last of these may start to convince "real economists" because, as Yves Smith writes:
they developed a model to simulate the impact of companies’ rising market power, in conjunction with the assumption that the owners of capital liked to hold financial assets (here, bonds) as a sign of social status. They wanted to see if it would explain six developments over the last forty years. ... And it did!
Follow me below the fold for the details.

Thursday, August 20, 2020

Optical Media Durability: Update

Two years ago I posted Optical Media Durability and discovered:
Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary.
A year ago I repeated the mind-numbing process of feeding 45 disks through the reader and verifying their checksums. It is time again for this annual chore, and once again this year I failed to find any errors. Below the fold, the details.
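The verification step is easy to automate once a disk is mounted. This is a minimal sketch, not the procedure actually used for these disks: it assumes SHA-256 checksums and a manifest mapping relative paths to expected hex digests, and the names `sha256_of` and `verify` are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs.

    `manifest` (an assumed format) maps a path relative to `root`
    to the SHA-256 hex digest recorded when the disk was written.
    """
    bad = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel_path)
    return bad
```

An empty list from `verify` corresponds to the "failed to find any errors" outcome; any path it returns identifies a file to re-read or restore from another copy.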

Tuesday, August 18, 2020

Atlantic Council Report On Software Supply Chains

Eighteen months ago I posted a four-part series called Trust In Digital Content. The second part was Securing The Software Supply Chain, about how we know we're running the software we intended to. Now, Bruce Schneier's Survey of Supply Chain Attacks starts:
The Atlantic Council has released a report that looks at the history of computer supply chain attacks.
The Atlantic Council also has a summary of the report entitled Breaking trust: Shades of crisis across an insecure software supply chain:
Software supply chain security remains an under-appreciated domain of national security policymaking. Working to improve the security of software supporting private sector enterprise as well as sensitive Defense and Intelligence organizations requires more coherent policy response together industry and open source communities. This report profiles 115 attacks and disclosures against the software supply chain from the past decade to highlight the need for action and presents recommendations to both raise the cost of these attacks and limit their harm.
Below the fold, some commentary on the report and more recent attacks.

Tuesday, August 11, 2020

"Good" News For Bitcoin!

David Gerard is fond of pointing out how adept Bitcoin cultists are at spinning every item of news as "good news for Bitcoin". His latest news post has two notable items suited to this spin cycle:
  • Two successive successful 51% attacks on Ethereum Classic.
  • A new, more realistic estimate of Bitcoin's energy usage; it is only as much as Belgium.
Follow me below the fold for details and commentary.

Tuesday, August 4, 2020

Contextual vs. Behavioral Advertising

In his New York Times op-ed entitled What if We All Just Sold Non-Creepy Advertising? Gabriel Weinberg, founder and CEO of DuckDuckGo (Jack Dorsey's and my default search engine), draws a clear distinction between the two types of Web advertising:
There is no reason to fear that sites cannot still make money with advertising. That’s because there are already two kinds of highly profitable online ads: contextual ads, based on the content being shown on screen, and behavioral ads, based on personal data collected about the person viewing the ad. Behavioral ads work by tracking your online behavior and compiling a profile about you using your internet activities (and even your offline activities in some cases) to send you targeted ads.
He argues that the creepiness of behavioral ads isn't necessary for sites to make money from ads. Below the fold I look at the evidence that Weinberg is right.

Tuesday, July 28, 2020

After A Decade, HAMR Is Still Nearly Here

At the 2009 Library of Congress workshop on Architectures for Digital Preservation, Dave Anderson of Seagate presented the company's roadmap for hard disks. He included this graph projecting that the next recording technology, Heat Assisted Magnetic Recording (HAMR), would take over in the next year, and would be supplanted by a successor technology called Bit Patterned Media around 2015.

I started expressing my gradually increasing skepticism the following year. Now, nearly eleven years after Dave's talk, it is time to follow me below the fold for another update.

Tuesday, July 21, 2020

Twitter Fails Security 101 Again

On July 15 the New York Times reported on the day's events at Twitter:
It was about 4 in the afternoon on Wednesday on the East Coast when chaos struck online. Dozens of the biggest names in America — including Joseph R. Biden Jr., Barack Obama, Kanye West, Bill Gates and Elon Musk — posted similar messages on Twitter: Send Bitcoin and the famous people would send back double your money.
Two days later Nathaniel Popper and Kate Conger's Hackers Tell the Story of the Twitter Attack From the Inside was based on interviews with some of the perpetrators:
Mr. O'Connor said other hackers had informed him that Kirk got access to the Twitter credentials when he found a way into Twitter’s internal Slack messaging channel and saw them posted there, along with a service that gave him access to the company’s servers. People investigating the case said that was consistent with what they had learned so far. A Twitter spokesman declined to comment, citing the active investigation.
Below the fold, some commentary on this and other stories of the fiasco.

Thursday, July 9, 2020

Inefficiency Is Good!

Back in 2015 I wrote Brittle systems and Pushing back against network effects, among other things about the need for resilient systems and the importance of antitrust enforcement in getting them:
All over this blog (e.g. here) you will find references to W. Brian Arthur's Increasing Returns and Path Dependence in the Economy because it pointed out the driving forces, often called network effects, that cause technology markets to be dominated by one, or at most a few, large players. This is a problem for digital preservation, and for society in general, for both economic and technical reasons. The economic reason is that these natural but unregulated monopolies extract rents from their customers. The technical reason is that they make the systems upon which society depends brittle, subject to sudden, catastrophic and hard-to-recover-from failures.
Now, the pandemic has inspired two writers to address the bigger version of the same problem, Bruce Schneier in The Security Value of Inefficiency and Jonathan Aldred in This pandemic has exposed the uselessness of orthodox economics. Below the fold, some commentary.

Tuesday, June 30, 2020

Bill Shannon RIP

Last Thursday my friend Bill Shannon lost a long battle with cancer. The Mercury News has his obituary. I thought to create a Wikipedia page for him as I did for my friend John Wharton. But, true to Bill's unassuming nature, he left almost no footprint on the Web. The lack of reliable sources attesting to his notability made such a page impossible. The brief account below the fold, compiled with invaluable assistance from many of his friends, will have to do instead. Comments with memories of Bill are welcome.

The image is Bill's card from the deck of playing cards the Usenix Association created for the 25th anniversary of the Unix operating system in 1994.

Thursday, June 25, 2020

Deanonymizing Ethereum Users

In last January's Bitcoin's Lightning Network I discussed A Cryptoeconomic Traffic Analysis of Bitcoin’s Lightning Network by the Hungarian team of Ferenc Béres, István A. Seres, and András A. Benczúr. They demolished the economics of the Lightning Network, writing:
Our findings on the estimated revenue from transaction fees are in line with the widespread opinion that participation is economically irrational for the majority of the large routing nodes who currently hold the network together. Either traffic or transaction fees must increase by orders of magnitude to make payment routing economically viable.
Below the fold I comment on their latest work.

Thursday, June 18, 2020

Breaking: Peer Review Is Broken!

The subhead of The Pandemic Claims New Victims: Prestigious Medical Journals by Roni Caryn Rabin reads:
Two major study retractions in one month have left researchers wondering if the peer review process is broken.
Below the fold I explain that the researchers who are only now "wondering if the peer review process is broken" must have been asleep for more than the last decade.

Tuesday, June 16, 2020

Supporting Open Source Software

In the Summer 2020 issue of Usenix's ;login: Dan Geer and George P. Sieniawski have a column entitled Who Will Pay the Piper for Open Source Software Maintenance? (it will be freely available in a year). They make many good points, some of which are relevant to my critique in Informational Capitalism of Prof. Kapczynski's comment that:
open-source software is fully integrated into Google’s Android phones. The volunteer labor of thousands thus helps power Google’s surveillance-capitalist machine.
Below the fold, I discuss "the volunteer labor of thousands".

Thursday, June 4, 2020

"More Is Not Better" Revisited

I have written many times on the topic of scholarly communication since the very first post to this blog thirteen years ago. The Economist's "Graphic Detail" column this week is entitled How to spot dodgy academic journals. It is about the continuing corruption of the system of academic communication, and features this scary graph. It shows:
  • Rapid but roughly linear growth in the number of "reliable" journals launched each year. About three times as many were launched in 2018 as in 1978.
  • Explosive growth since 2010 in the number of "predatory" journals launched each year. In 2018 almost half of all journals launched were predatory.
Below the fold, some commentary.

Tuesday, June 2, 2020

Informational Capitalism

In The Law of Informational Capitalism, Prof. Amy Kapczynski of the Yale Law School reviews two books, Shoshana Zuboff’s The Age of Surveillance Capitalism and Julie Cohen’s Between Truth and Power: The Legal Constructions of Informational Capitalism to document the legal structures on which the FAANGs and other "big tech" companies depend for their power.

Below the fold, some commentary on her fascinating article.

Tuesday, May 19, 2020

The Death Of Corporate Research Labs

In American innovation through the ages, Jamie Powell wrote:
who hasn’t finished a non-fiction book and thought “Gee, that could have been half the length and just as informative. If that.”

Yet every now and then you read something that provokes the exact opposite feeling. Where all you can do after reading a tweet, or an article, is type the subject into Google and hope there’s more material out there waiting to be read.

So it was with Alphaville this Tuesday afternoon reading a research paper from last year entitled The changing structure of American innovation: Some cautionary remarks for economic growth by Arora, Belenzon, Patacconi and Suh (h/t to KPMG’s Ben Southwood, who highlighted it on Twitter).

The exhaustive work of the Duke University and UEA academics traces the roots of American academia through the golden age of corporate-driven research, which roughly encompasses the postwar period up to Ronald Reagan’s presidency, before its steady decline up to the present day.
Arora et al argue that a cause of the decline in productivity is that:
The past three decades have been marked by a growing division of labor between universities focusing on research and large corporations focusing on development. Knowledge produced by universities is not often in a form that can be readily digested and turned into new goods and services. Small firms and university technology transfer offices cannot fully substitute for corporate research, which had integrated multiple disciplines at the scale required to solve significant technical problems.
As someone with many friends who worked at the legendary corporate research labs of the past, including Bell Labs and Xerox PARC, and who myself worked at Sun Microsystems' research lab, this is personal. Below the fold I add my 2c-worth to Arora et al's extraordinarily interesting article.

Friday, May 15, 2020

Economics Of Decentralized Storage

Almost two years ago, in The Four Most Expensive Words in the English Language, I wrote skeptically about the economics of decentralized storage networks. I followed up two months later with The Triumph Of Greed Over Arithmetic. Now, Got a few spare terabytes of storage sitting around unused? Tardigrade can turn that into crypto-bucks is Thomas Claburn's report on initial experience with Tardigrade, the "decentralized" storage network from Storj Labs. Below the fold, some more skepticism.

Tuesday, May 5, 2020

Carl Malamud Wins (Mostly)

In Supreme Court rules Georgia can’t put the law behind a paywall Timothy B. Lee writes:
A narrowly divided US Supreme Court on Monday upheld the right to freely share the official law code of Georgia. The state claimed to own the copyright for the Official Code of Georgia Annotated and sued a nonprofit called Public.Resource.Org for publishing it online. Monday's ruling is not only a victory for the open-government group, it's an important precedent that will help secure the right to publish other legally significant public documents.

"Officials empowered to speak with the force of law cannot be the authors of—and therefore cannot copyright—the works they create in the course of their official duties," wrote Chief Justice John Roberts in an opinion that was joined by four other justices on the nine-member court.
Below the fold, commentary on various reports of the decision, and more.

Tuesday, April 28, 2020

Rarely Is The Question Asked

Four years ago the first major Smart Contract was launched. Then this happened:
"Smart contracts" are programs, and programs have bugs. Some of the bugs are exploitable vulnerabilities. Research has shown that the rate at which vulnerabilities in programs are discovered increases with the age of the program. The problems caused by making vulnerable software immutable were revealed by the first major "smart contract". The Decentralized Autonomous Organization (The DAO) was released on 30th April 2016, but on 27th May 2016 Dino Mark, Vlad Zamfir, and Emin Gün Sirer posted A Call for a Temporary Moratorium on The DAO, pointing out some of its vulnerabilities; it was ignored. Three weeks later, when The DAO contained about 10% of all the Ether in circulation, a combination of these vulnerabilities was used to steal its contents.
$25M goes Poof!
Now, David Gerard reports the latest Smart Contract fiascos in The dForce and Hegic DeFi exploits, and why Smart Contracts are bad. One caused the $25M loss shown in the chart, the other caused this reassuring message to users:
!! ALERT A typo has been found in the code. Because of that, liquidity in expired options contracts can’t be unlocked for new options. !! Please EXERCISE ALL OF YOUR ACTIVE OPTIONS CONTRACTS NOW.
Below the fold, some details.

Thursday, April 23, 2020

Funder Publishing Platforms

After posting Never Let A Crisis Go To Waste earlier this month, I can't resist a shout-out to Elizabeth Gadd's The purpose of publications in a pandemic and beyond:
The virus is reminding us that the purpose of scholarly communication is not to allocate credit for career advancement, and neither is it to keep publishers afloat. Scholarly communication is about, well, scholars communicating with each other, to share insights for the benefit of humanity. And whilst we’ve heard all this before, in a time of crisis we realise afresh that this isn’t just rhetoric, this is reality.
Below the fold, a few comments.

Tuesday, April 21, 2020

Outsourcing Reduces Productivity

Salim Furth provides yet more evidence of falling productivity in What’s Behind Falling Productivity: The Census May Hold the Answer:
Records kept since 1940 tell a contrasting story: even as the census has introduced labor-saving technologies, it has required more, not fewer, workers. The efficiency of census-taking appears to have declined over time as it has for most of the economy.
Below the fold, some commentary.

Thursday, April 9, 2020

Yay, Library of Congress!

LoC Web Archive team
The web archiving team at the Library of Congress got some high-visibility, well-deserved publicity in the New York Times with Steven Kurutz's Meet Your Meme Lords:
For the past 20 years, a small team of archivists at the Library of Congress has been collecting the web, quietly and dutifully in its way. The initiative was born out of a desire to collect and preserve open-access materials from the web, especially U.S. government content around elections, which makes this the team’s busy season.

But the project has turned into a sweeping catalog of internet culture, defunct blogs, digital chat rooms, web comics, tweets and most other aspects of online life.
Kurutz did a good job; the article is well worth reading.

Tuesday, April 7, 2020

Never Let A Crisis Go To Waste

On March 13th, an Elsevier press release entitled Elsevier gives full access to its content on its COVID-19 Information Center for PubMed Central and other public health databases to accelerate fight against coronavirus announced:
From today, Elsevier, a global leader in research publishing and information analytics specializing in science and health, is making all its research and data content on its COVID-19 Information Center available to PubMed Central, the archive of biomedical and lifescience at the US. National Institutes of Health’s National Library of Medicine, and other publicly funded repositories globally, such as the WHO COVID database, for as long as needed while the public health emergency is ongoing. This additional access allows researchers to use artificial intelligence to keep up with the rapidly growing body of literature and identify trends as countries around the world address this global health crisis.
Elsevier and the other oligopoly academic publishers have reacted similarly in earlier virus outbreaks. Prof. John Willinsky pounced on this admission that these companies' normal restrictive access policies, based on copyright ownership, slow the progress of science, and thus violate the US Constitution's intellectual property clause:
That Congress shall have Power...To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.
Below the fold I provide some details of his proposal.

Tuesday, March 31, 2020

Archival Cloud Storage Pricing

Although there are significant technological risks to data stored for the long term, its most important vulnerability is to interruptions in the money supply. The current pandemic is likely to cause archives to suffer significant interruptions in the money supply.

In Cloud For Preservation I described how much of the motivation for using cloud services was their month-by-month pay-for-what-you-use billing, which transforms capital expenditures (CapEx) into operational expenditures (OpEx). Organizations typically find OpEx much easier to justify than CapEx because:
  • The numbers they look at are smaller, even if what they add up to over time is greater.
  • OpEx is less of a commitment, since it can be decreased if circumstances change.
Unfortunately, the lower the commitment the higher the risk to long-term preservation. Since preservation doesn't deliver immediate returns, it is likely to be first on the chopping block. Thus both reducing storage cost and increasing its predictability are important for sustainable digital preservation. Below the fold I revisit this issue.
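The first bullet's trade-off can be made concrete with a little arithmetic. This is a minimal sketch, assuming hypothetical figures (a $10,000 up-front purchase versus a $300 monthly bill) that are not taken from any real price list; the function names are mine.

```python
import math

def cumulative_opex(monthly_bill, months):
    """Total paid after `months` of pay-as-you-go billing."""
    return monthly_bill * months

def break_even_month(capex, monthly_bill):
    """First month in which cumulative OpEx strictly exceeds the one-time CapEx."""
    return math.floor(capex / monthly_bill) + 1

# Hypothetical figures, for illustration only.
CAPEX = 10_000        # one-time purchase
MONTHLY_BILL = 300    # pay-for-what-you-use bill

print(cumulative_opex(MONTHLY_BILL, 60))      # 18000 over five years
print(break_even_month(CAPEX, MONTHLY_BILL))  # 34
```

With these assumed numbers each monthly bill looks small next to the purchase price, yet after the 34th month the organization has paid more in total.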

Tuesday, March 24, 2020

More On Failures From FAST 2020

A Study of SSD Reliability in Large Scale Enterprise Storage Deployments by Stathis Maneas et al., which I discussed in Enterprise SSD Reliability, wasn't the only paper at this year's USENIX FAST conference about storage failures. Below the fold I comment on one specifically about hard drives rather than SSDs, making it more relevant to archival storage.

Tuesday, March 17, 2020

Proof-of-Stake In Practice

At the most abstract level, the work of Eric Budish, Raphael Auer, Joshua Gans and Neil Gandal is obvious. A blockchain is secure only if the value to be gained by an attack is less than the cost of mounting it. These papers all assume that actors are "economically rational", driven by the immediate monetary bottom line, but this isn't always true in the real world. As I wrote when commenting on Gans and Gandal:
As we see with Bitcoin's Lightning Network, true members of the cryptocurrency cult are not concerned that the foregone interest on capital they devote to making the system work is vastly greater than the fees they receive for doing so. The reason is that, as David Gerard writes, they believe that "number go up". In other words, they are convinced that the finite supply of their favorite coin guarantees that its value will in the future "go to the moon", providing capital gains that vastly outweigh the foregone interest.
Follow me below the fold for a discussion of a recent attack on a Proof-of-Stake blockchain that wasn't motivated by the immediate monetary bottom line.

Tuesday, March 10, 2020

Enterprise SSD Reliability

I couldn't attend this year's USENIX FAST conference. Because of the COVID-19 outbreak the normally high level of participation from Asia was greatly reduced, with many registrants and even some presenters unable to make it. But I've been reading the papers, and below the fold I have commentary on an extremely interesting one about the reliability of SSD media in enterprise applications.

Saturday, March 7, 2020

Guest Post: Michael Nelson's Response

Back last June I posted a three-part series on Michael Nelson's CNI keynote Web Archives at the Nexus of Good Fakes and Flawed Originals and offered him a guest post to respond. Now, I owe Nelson a profound apology. He e-mailed me in January, but I completely misunderstood his e-mail and missed the attachment containing the HTML of the guest post. It is no real excuse that I was on painkillers and extremely short of sleep at the time.

So, below the fold, greatly delayed through my failure, is Michael Nelson's response, which is also available here.

Tuesday, March 3, 2020

Falling Research Productivity Revisited

Last year, in Falling Research Productivity, I commented on Are Ideas Getting Harder to Find? by Nicholas Bloom et al. Now, The Economist's current issue has a Free Exchange column entitled How to get more innovation bang for the research buck that takes off from the same paper:
In a paper by Nicholas Bloom, Charles Jones and Michael Webb of Stanford University, and John Van Reenen of the Massachusetts Institute of Technology (MIT), the authors note that even as discovery has disappointed, real investment in new ideas has grown by more than 4% per year since the 1930s. Digging into particular targets of research—to increase computer processing power, crop yields and life expectancy—they find that in each case maintaining the pace of innovation takes ever more money and people.
Follow me below the fold for some commentary on a number of the other papers they cite.

Thursday, February 27, 2020

Ludwig Siegele On Data

Ludwig Siegele's latest Special report for The Economist is entitled A deluge of data is giving rise to a new economy. He provides an excellent overview of the impact the availability of vast amounts of data is having on business. But follow me below the fold for my two quibbles.

Tuesday, February 18, 2020

The Scholarly Record At The Internet Archive

The Internet Archive has been working on a Mellon-funded grant aimed at collecting, preserving and providing persistent access to as much of the open-access academic literature as possible. The motivation is that much of the "long tail" of academic literature comes from smaller publishers whose business model is fragile, and who are at risk of financial failure or takeover by the legacy oligopoly publishers. This is particularly true if their content is open access, since they don't have subscription income. This "long tail" content is thus at risk of loss or vanishing behind a paywall.

The project takes two opposite but synergistic approaches:
  • Top-Down: Using the bibliographic metadata from sources like CrossRef to ask whether each article is in the Wayback Machine and, if it isn't, trying to get it from the live Web. Then, if a copy exists, adding the metadata to an index.
  • Bottom-up: Asking whether each of the PDFs in the Wayback Machine is an academic article, and if so extracting the bibliographic metadata and adding it to an index.
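The two approaches can be sketched in a few lines. This is only an illustration under loud assumptions: the archive and the live Web are plain dictionaries, and `top_down`, `bottom_up`, `fetch_live`, `looks_like_article` and `extract_metadata` are hypothetical names, not the Internet Archive's actual code or APIs.

```python
def top_down(metadata_records, archive, fetch_live):
    """Start from bibliographic metadata and look for each article."""
    index = {}
    for record in metadata_records:
        url = record["url"]
        if url not in archive:
            content = fetch_live(url)      # try the live Web
            if content is not None:
                archive[url] = content
        if url in archive:                 # a copy exists, so index the metadata
            index[url] = record
    return index

def bottom_up(archive, looks_like_article, extract_metadata):
    """Start from archived documents and ask whether each is an article."""
    index = {}
    for url, content in archive.items():
        if looks_like_article(content):
            index[url] = extract_metadata(content)
    return index

# Toy stand-ins for the Wayback Machine, the live Web and CrossRef.
archive = {"https://example.org/a.pdf": "TITLE: A\nbody"}
live_web = {"https://example.org/b.pdf": "TITLE: B\nbody"}
records = [
    {"url": "https://example.org/a.pdf", "title": "A"},
    {"url": "https://example.org/b.pdf", "title": "B"},
    {"url": "https://example.org/c.pdf", "title": "C"},  # not available anywhere
]

td_index = top_down(records, archive, live_web.get)
bu_index = bottom_up(
    archive,
    lambda content: content.startswith("TITLE:"),
    lambda content: {"title": content.splitlines()[0][len("TITLE:"):].strip()},
)
```

In this toy run the top-down pass pulls the second article in from the live Web, and the bottom-up pass then finds both archived copies; the synergy is that each pass can fill gaps the other leaves.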
Below the fold, a discussion of the progress that has been made so far.

Thursday, February 13, 2020

Economic Limits Of Proof-of-Stake Blockchains

In 2018's Cryptocurrencies Have Limits I discussed Eric Budish's The Economic Limits Of Bitcoin And The Blockchain, an important analysis of the economics of two kinds of "51% attack" on Bitcoin and other cryptocurrencies based on "Proof-of-Work" (PoW) blockchains. Among other things, Budish shows that, for safety, the value of transactions in a block must be low relative to the fees in the block plus the reward for mining the block. In last year's The Economics Of Bitcoin Transactions I discussed Raphael Auer's Beyond the doomsday economics of “proof-of-work” in cryptocurrencies, in which Auer shows that:
proof-of-work can only achieve payment security if mining income is high, but the transaction market cannot generate an adequate level of income. ... the economic design of the transaction market fails to generate high enough fees.
Follow me below the fold for a discussion of a fascinating recent paper that extends Budish's analysis.

Tuesday, February 11, 2020

More On The Ad Bubble

Google UI Timeline
Two weeks ago a firestorm erupted over a seemingly insignificant change to the UI of Google's search engine. It was enough to get Google to backtrack. A week later Daisuke Wakabayashi and Tiffany Hsu had the details in Why Google Backtracked on Its New Search Results Look, including this informative timeline graphic of the history of such changes since 2007. Their explanation for why Google made the change was:
Users complained that Google was trying to trick people into clicking on more paid results, while marketing executives said it was yet another step in blurring the line between ads and unpaid search results, forcing them to spend more money with the internet company.
Well, yes, but follow me below the fold for the bigger picture.

Thursday, February 6, 2020

Meta: Slow Blogging

Blogging is slow right now because my physical therapist wants me standing up and moving around at least every 15 minutes. Long-form blogging in 15-minute increments is hard.

Thursday, January 30, 2020

Regulating Social Media: Part 1

It has become obvious that self-regulated social media are a threat to pretty much every country's national security. This is intended to be the start of a series looking at the range of suggestions as to how, at least in the United States, regulating them might be done, including (I hope) at least these:
Below the fold, I start with the first of them.

Tuesday, January 14, 2020

Advertising Is A Bubble

The surveillance economy, and thus the stratospheric valuations of:
Facebook and Alphabet (Google’s parent), which rely on advertising for, respectively, 97% and 88% of their sales.
depend on the idea that targeted advertising, exploiting as much personal information about users as possible, results in enough increased sales to justify its cost. This is despite the fact that both experimental research and the experience of major publishers and advertisers show the opposite. Now, The new dot com bubble is here: it’s called online advertising by Jesse Frederik and Maurits Martijn provides an explanation for this disconnect. Follow me below the fold to find out about it and enjoy some wonderful quotes from them.

Thursday, January 9, 2020

Library of Congress Storage Architecture Meeting

The Library of Congress has finally posted the presentations from the 2019 Designing Storage Architectures for Digital Collections workshop that took place in early September. I've greatly enjoyed the earlier editions of this meeting, so I was sorry I couldn't make it this time. Below the fold, I look at some of the presentations.

Tuesday, January 7, 2020

Bitcoin's Lightning Network (updated)

Discussions of cryptocurrencies and other blockchain technologies are bedeviled by a nearly universal assumption that attributes that are possible to achieve in theory are guaranteed to be realized in practice. Examples include decentralization and anonymity.

Back in June David Gerard asked:
How good a business is running a Lightning Network node? LNBig provides 49.6% ($3.7 million in bitcoins) of the Lightning Network’s total channel liquidity funding — that just sits there, locked in the channels until they’re closed. They see 300 transactions a day, for total earnings on that $3.7 million of … $20 a month. They also spent $1000 in channel-opening fees.
Even if the Lightning Network worked (which it doesn't), and were decentralized (which it isn't), Gerard's point was that the transaction fees were woefully inadequate to cover the costs of running a node. Now, A Cryptoeconomic Traffic Analysis of Bitcoin’s Lightning Network by the Hungarian team of Ferenc Béres, István A. Seres, and András A. Benczúr supports Gerard's conclusion with a detailed analysis.
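Gerard's numbers can be turned into a yield with simple arithmetic. This is a minimal sketch using only the figures in the quote ($3.7 million locked up, $20 a month earned, $1000 in channel-opening fees); the function names are my own, and it ignores changes in the Bitcoin "price" and the foregone interest on the capital.

```python
def annual_fee_yield(capital_usd, monthly_fees_usd):
    """Fee income per year as a fraction of the capital locked in channels."""
    return 12 * monthly_fees_usd / capital_usd

def months_to_recoup(upfront_cost_usd, monthly_fees_usd):
    """Months of fee income needed just to pay back an up-front cost."""
    return upfront_cost_usd / monthly_fees_usd

# Figures from Gerard's quote above.
CAPITAL = 3_700_000
MONTHLY_FEES = 20
OPENING_FEES = 1_000

print(f"{annual_fee_yield(CAPITAL, MONTHLY_FEES):.4%}")       # 0.0065%
print(f"{months_to_recoup(OPENING_FEES, MONTHLY_FEES):.0f}")  # 50
```

On these figures the node earns well under a hundredth of a percent per year, and needs more than four years of fees merely to recover what it spent opening channels.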

Below the fold, some commentary.

Thursday, January 2, 2020

Bunnie Huang's Betrusted Project

The awesome Bunnie Huang asks Can We Build Trustable Hardware? It is a fascinating approach to the problem I discussed in Securing The Hardware Supply Chain:
how we can know that the hardware the software we secured is running on is doing what we expect it to?
Bunnie's experience has made him very skeptical of the integrity of the hardware supply chain:
In the process of making chips, I’ve also edited masks for chips; chips are surprisingly malleable, even post tape-out. I’ve also spent a decade wrangling supply chains, dealing with fakes, shoddy workmanship, undisclosed part substitutions – there are so many opportunities and motivations to swap out “good” chips for “bad” ones. Even if a factory could push out a perfectly vetted computer, you’ve got couriers, customs officials, and warehouse workers who can tamper the machine before it reaches the user.
Below the fold, some discussion of Bunnie's current project.