Thursday, July 28, 2016

End of Moore's Law

Richard Chirgwin at The Register reports that the Semiconductor Industry Association has issued their roadmap for chip technology, the ITRS:
The group suggests that the industry is approaching a point where economics, rather than physics, becomes the Moore's Law roadblock. The further below 10 nanometres transistors go, the harder it is to make them economically. That will put a post-2020 premium on stacking transistors in three dimensions without gathering too much heat for them to survive.
This is about logic, such as CPUs, but it is related to the issues that have forced flash memories to use 3D.

Energy demand of computing
There are problems other than the difficulty of making transistors smaller:
The biggest is electricity. The world's computing infrastructure already uses a significant slice of the world's power, and the ITRS says the current trajectory is self-limiting: by 2040, ... computing will need more electricity than the world can produce.
So we're looking at limits on the affordability of both the amount of data that can be stored and the computations that can be performed on it.

The ITRS points to the wide range of different applications that the computations will be needed for, and the resulting:
research areas a confab of industry, government and academia see as critical: cyber-physical systems; intelligent storage; realtime communication; multi-level and scalable security; manufacturing; “insight” computing; and the Internet of Things.
We can see the end of the era of data and computation abundance. Dealing with an era of constrained resources will be very different. In particular, enthusiasm for blockchain technology as A Solution To Everything will need to be tempered by its voracious demand for energy. An estimate of the 2020 energy demand of the bitcoin blockchain alone ranges from optimistically the output of a major power station to pessimistically the output of Denmark. Deploying technologies that, like blockchains, deliberately waste vast amounts of computation will no longer be economically feasible.

Tuesday, July 26, 2016

The Citation Graph

An important point raised during the discussions at the recent JISC-CNI meeting is also raised by Larivière et al's A simple proposal for the publication of journal citation distributions:
However, the raw citation data used here are not publicly available but remain the property of Thomson Reuters. A logical step to facilitate scrutiny by independent researchers would therefore be for publishers to make the reference lists of their articles publicly available. Most publishers already provide these lists as part of the metadata they submit to the Crossref metadata database and can easily permit Crossref to make them public, though relatively few have opted to do so. If all Publisher and Society members of Crossref (over 5,300 organisations) were to grant this permission, it would enable more open research into citations in particular and into scholarly communication in general.
In other words, despite the importance of the citation graph for understanding and measuring the output of science, the data are in private hands, and are analyzed by opaque algorithms to produce a metric (journal impact factor) that is easily gamed and is corrupting the entire research ecosystem.

Simply by asking to flip a bit, publishers already providing their citations to CrossRef can make them public, but only a few have done so.
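To make concrete what public reference lists would enable, here is a minimal sketch of turning per-article reference lists into citation-graph edges and per-article citation counts. It assumes records shaped roughly like the work metadata Crossref serves, with a "DOI" field and an optional "reference" list whose entries may or may not carry a "DOI"; the sample records and their DOIs are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

def citation_edges(works):
    """Yield (citing_doi, cited_doi) pairs from Crossref-style work records.

    Assumption: each record is a dict with a "DOI" key and, when the
    publisher has made references public, a "reference" list of dicts.
    References without a DOI cannot be matched and are skipped.
    """
    for work in works:
        citing = work.get("DOI")
        for ref in work.get("reference", []):
            cited = ref.get("DOI")
            if citing and cited:
                # DOIs are case-insensitive, so normalize before matching.
                yield citing.lower(), cited.lower()

def citation_counts(works):
    """Count how many times each DOI is cited within the given records."""
    counts = defaultdict(int)
    for _, cited in citation_edges(works):
        counts[cited] += 1
    return dict(counts)

# Hypothetical sample data, not real articles.
SAMPLE_WORKS = [
    {"DOI": "10.5555/a",
     "reference": [{"key": "r1", "DOI": "10.5555/c"},
                   {"key": "r2", "unstructured": "A reference with no DOI"}]},
    {"DOI": "10.5555/b",
     "reference": [{"key": "r1", "DOI": "10.5555/C"},
                   {"key": "r2", "DOI": "10.5555/a"}]},
    {"DOI": "10.5555/c"},  # reference list not public
]
```

The third sample record shows the problem described above: an article whose reference list is not public contributes no outgoing edges, so anything it cites is undercounted.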

Larivière et al's painstaking research shows that journal publishers and others with access to these private databases (Web of Science and Scopus) can use them to graph the distribution of citations to the articles they publish. Doing so reveals that:
the shape of the distribution is highly skewed to the left, being dominated by papers with lower numbers of citations. Typically, 65-75% of the articles have fewer citations than indicated by the JIF. The distributions are also characterized by long rightward tails; for the set of journals analyzed here, only 15-25% of the articles account for 50% of the citations
Thus, as has been shown many times before, the impact factor of a journal conveys no useful information about the quality of a paper it contains. Further, the data on which it is based is itself suspect:
On a technical point, the many unmatched citations ... that were discovered in the data for eLife, Nature Communications, Proceedings of the Royal Society: Biology Sciences and Scientific Reports raises concerns about the general quality of the data provided by Thomson Reuters. Searches for citations to eLife papers, for example, have revealed that the data in the Web of Science™ are incomplete owing to technical problems that Thomson Reuters is currently working to resolve. ...
Because the citation graph data is not public, audits such as Larivière et al's are difficult and rare. Were the data to be public, both publishers and authors would be able to, and motivated to, improve it. It is perhaps a straw in the wind that Larivière's co-authors include senior figures from PLoS, AAAS, eLife, EMBO, Nature and the Royal Society.
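The skew that Larivière et al describe is easy to see with a few lines of arithmetic. The sketch below is illustrative only: the citation counts are synthetic, not taken from the paper, and the plain mean of the counts is used as a stand-in for the JIF, which is actually computed over a specific two-year window. The function names are hypothetical.

```python
from statistics import mean

def fraction_below_mean(counts):
    """Fraction of articles with fewer citations than the mean count."""
    avg = mean(counts)
    return sum(1 for c in counts if c < avg) / len(counts)

def smallest_fraction_for_share(counts, share=0.5):
    """Smallest fraction of articles that accounts for `share` of all citations.

    Articles are taken from most cited to least cited until their
    combined citations reach the requested share of the total.
    """
    target = share * sum(counts)
    running = 0
    for n, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= target:
            return n / len(counts)
    return 1.0

# Synthetic citation counts for 16 articles in a hypothetical journal:
# many lightly cited papers and a few heavily cited ones.
SAMPLE_COUNTS = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 25, 60]
```

With these made-up counts the mean is pulled well above the typical article by a handful of highly cited papers, which is why a journal-level average says little about any individual paper.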

Thursday, July 21, 2016

QLC Flash on the horizon

Exabytes shipped
Last May in my talk at the Future of Storage workshop I discussed the question of whether flash would displace hard disk as the bulk storage medium. As the graph shows, flash is currently only a small proportion of the total exabytes shipped. How rapidly it could displace hard disk is determined by how rapidly flash manufacturers can increase capacity. Below the fold I revisit this question based on some more recent information about flash technology and the hard disk business.

Tuesday, July 19, 2016

More on Terms of Service

When Jefferson Bailey & I finished writing My Web Browser's Terms of Service I thought I was done with the topic, but two recent articles brought it back into focus. Below the fold are links, extracts and comments.

Saturday, July 16, 2016

What is wrong with science?

This is a quick post to flag two articles well worth reading.

Wednesday, July 6, 2016

Talk at JISC/CNI Workshop

I was invited to give a talk at a workshop convened by JISC and CNI in Oxford. Below the fold, an edited text with links to the sources.

Tuesday, July 5, 2016

The Major Threat is Economic

I've frequently said that the major threat to digital preservation is economic; back in 2013 I posted The Major Threat is Economic. We are reminded of this by the announcement last March that:
The future of the Trove online database is in doubt due to funding cuts to the National Library of Australia.
Trove is the National Library's system:
In 2014, the database's fifth year, an estimated 70,000 people were using the website each day.

Australian Library and Information Association chief executive Sue McKarracher said Trove was a visionary move by the library and had turned into a world-class resource.
"If you look at things like the digital public libraries in the United States, really a lot of that came from looking at our Trove and seeing what a nation could do investing in a platform that would hold museum, gallery and library archives collections and make them accessible to the world."