Thursday, May 24, 2018

How Far Is Far Enough?

When collecting an individual web site for preservation by crawling it is necessary to decide where its edges are, which links encountered are "part of the site" and which are links off-site. The crawlers use "crawl rules" to make these decisions. A simple rule would say:
Collect all URLs starting https://www.nytimes.com/
NoScript on http://nytimes.com
If a complex "site" is to be properly preserved the rules need to be a lot more complex. The image shows the start of the list of DNS names from which the New York Times home page embeds resources. Preserving this single page, let alone the "whole site", would need resources from at least 17 DNS names. Rules are needed for each of these names. How are all these more complex rules generated? Follow me below the fold for the answer, and news of an encouraging recent development.

Tuesday, May 22, 2018

ASICs and Mining Centralization

Three and a half years ago, as part of my explanation of why peer-to-peer networks that were successful would become centralized, I wrote in Economies of Scale in Peer-to-Peer Networks:
When new, more efficient technology is introduced, thus reducing the cost per unit contribution to a P2P network, it does not become instantly available to all participants. As manufacturing ramps up, the limited supply preferentially goes to the manufacturers best customers, who would be the largest contributors to the P2P network. By the time supply has increased so that smaller contributors can enjoy the lower cost per unit contribution, the most valuable part of the technology's useful life is over.
I'm not a blockchain insider. But now in a blockbuster post a real insider, David Vorick, the lead developer of Sia, a blockchain based cloud storage platform, makes it clear that the effect I described has been dominating the Bitcoin and other blockchains for a long time, and that it has led to centralization in the market for mining hardware:
The biggest takeaway from all of this is that mining is for big players. The more money you spend, the more of an advantage you have, and there’s not an easy way to change that equation. At least with traditional Nakamoto style consensus, a large entity that produces and controls most of the hashrate seems to be more or less the outcome, and at the very best you get into a situation where there are 2 or 3 major players that are all on similar footing. But I don’t think at any point in the next few decades will we see a situation where many manufacturing companies are all producing relatively competitive miners. Manufacturing just inherently leads to centralization, and it happens across many different vectors.
Below the fold, the details.

Wednesday, May 16, 2018

Shorter talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text with links to the sources of the shorter talk, which was updated from and entitled DNA's Niche in the Storage Market .

Longer talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text with links to the sources of the longer talk, which was updated from and entitled The Medium-Term Prospects for Long-Term Storage Systems.

Monday, May 14, 2018

Blockchain for Peer Review

An initiative has started in the UK called Blockchain for Peer Review. It claims:
The project will develop a protocol where information about peer review activities (submitted by publishers) are stored on a blockchain. This will allow the review process to be independently validated, and data to be fed to relevant vehicles to ensure recognition and validation for reviewers.  By sharing peer review information, while adhering to laws on privacy, data protection and confidentiality, we will foster innovation and increase interoperability.
Everything about this makes sense and could be implemented with a database run by a trusted party, as for example CrossRef does for DOI resolution. Implementing it with a blockchain is effectively impossible. Follow me below the fold for the explanation.

Tuesday, May 8, 2018

Prof. James Morris: "One Last Lecture"

The most important opportunity in my career was when Prof. Bob Sproull, then at Xerox PARC, suggested that I should join the Andrew Project (paper) then just starting at Carnegie-Mellon and run by Prof. James (Jim) Morris. The two years I spent working with Jim and the incredibly talented team he assembled (James Gosling, Mahadev Satyanarayanan, Nathaniel Borenstein, ...) changed my life.

Jim's final lecture at CMU is full of his trademark insights and humor, covering the five mostly CMU computing pioneers who influenced his career. You should watch the whole hour-long video, but below the fold I have transcribed a few tastes:

Monday, May 7, 2018

Might Need Some Work

"I Agree" - Source
Cory Doctorow writes:
"I Agree" is Dima Yarovinsky's art installation for Visualizing Knowledge 2018, with printouts of the terms of service for common apps on scrolls of colored paper, creating a bar chart of the fine print that neither you, nor anyone else in the history of the world, has ever read.
Earlier, Doctorow explained that the GDPR requires that:
Under the new directive, every time a European's personal data is captured or shared, they have to give meaningful consent, after being informed about the purpose of the use with enough clarity that they can predict what will happen to it.

Wednesday, May 2, 2018

"Privacy Is No Longer A Social Norm"

It is widely believed that in 2010 Mark Zuckerberg said "Privacy is no longer a social norm" but apparently that wasn't exactly what he said. Below the fold, I take off from this and other misquotes to look at our home-town's major industry, surveillance. Facebook (now headquartered in Menlo Park) has been getting all the attention recently, but they probably know less about you than Palantir Technologies, still headquartered in Palo Alto.