Tuesday, June 16, 2020

Supporting Open Source Software

In the Summer 2020 issue of Usenix's ;login: Dan Geer and George P. Sieniawski have a column entitled Who Will Pay the Piper for Open Source Software Maintenance? (it will be freely available in a year). They make many good points, some of which are relevant to my critique in Informational Capitalism of Prof. Kapczynski's comment that:
open-source software is fully integrated into Google’s Android phones. The volunteer labor of thousands thus helps power Google’s surveillance-capitalist machine.
Below the fold, I discuss "the volunteer labor of thousands".

I pointed out that very few kernel developers are unpaid volunteers:
even if one assumes that all of the “unknown” contributors are working on their own time, well over 85 percent of all kernel development is demonstrably done by developers who are being paid for their work. ... kernel developers are in short supply, so anybody who demonstrates an ability to get code into the mainline tends not to have trouble finding job offers. Indeed, the bigger problem can be fending those offers off. As a result, volunteer developers tend not to stay that way for long.
This was relevant to Kapczynski's comment because the vast bulk of open-source code in Android is the kernel and user-level code developed and supported by Google. However, Android is a very small part of the universe of open source, so I pointed out that the bigger picture is much less rosy:
It is definitely the case that there are gaps in this support, important infrastructure components dependent on the labor of individual volunteers.
Catalin Cimpanu illustrates the scale of the inadequate support problem in Vulnerabilities in popular open source projects doubled in 2019:
A study that analyzed the top 54 open source projects found that security vulnerabilities in these tools doubled in 2019, going from 421 bugs reported in 2018 to 968 last year.

According to RiskSense's "The Dark Reality of Open Source" report, released today, the company found 2,694 bugs reported in popular open source projects between 2015 and March 2020.

The report didn't include projects like Linux, WordPress, Drupal, and other super-popular free tools, since these projects are often monitored, and security bugs make the news, ensuring most of these security issues get patched fairly quickly.

Instead, RiskSense looked at other popular open source projects that aren't as well known but broadly adopted by the tech and software community. This included tools like Jenkins, MongoDB, Elasticsearch, Chef, GitLab, Spark, Puppet, and others.

RiskSense says that one of the main problems they found during their study was that a large number of the security bugs they analyzed had been reported to the National Vulnerability Database (NVD) many weeks after they've been publicly disclosed.

The company said it usually took on average around 54 days for bugs found in these 54 projects to be reported to the NVD, with PostgreSQL seeing reporting delays that amounted to eight months.
Czech firm JetBrains surveyed nearly 20K developers for its annual Developer Ecosystem survey:
And when asked if they contributed to open-source projects:
  • 44% said "No, but I would like to."
  • 20% said "I have only contributed a few times."
  • 16% said "Yes, from time to time (several times a year)."
  • 11% said "Yes, regularly (at least once a month)."
  • 4% said "No, and I would not like to."
  • 3% said "I work full-time on open-source code and get paid for it."
  • 2% said "I work full-time on open-source code but do not get paid for it."
So only 5% of developers work full-time on open source, and only 16% devote any significant proportion of their time to it. For 36% of developers, contributing is something they do rarely, probably only to fix an annoying bug they encounter. A significant improvement would be if some way could be found to encourage half the "No, but I would like to" developers to contribute rarely, getting occasional contributors to 58% of the population.
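The groupings behind those figures can be checked with a few lines of arithmetic. This is only a sketch: the percentages are the ones quoted from the survey above, while the dictionary keys and variable names are my own labels, and which answers count as "significant" or "occasional" is my reading of the paragraph above.

```python
# Survey shares (percent), as quoted above from the JetBrains survey.
# The key names are illustrative labels, not the survey's own identifiers.
responses = {
    "no_but_would_like_to": 44,
    "only_a_few_times": 20,
    "several_times_a_year": 16,
    "at_least_monthly": 11,
    "no_and_would_not_like_to": 4,
    "full_time_paid": 3,
    "full_time_unpaid": 2,
}

# Full-time open source developers, paid or not.
full_time = responses["full_time_paid"] + responses["full_time_unpaid"]

# Developers devoting a significant share of their time: full-time plus monthly.
significant = full_time + responses["at_least_monthly"]

# Occasional contributors: a few times ever, or several times a year.
occasional = responses["only_a_few_times"] + responses["several_times_a_year"]

# Occasional contributors if half of the "would like to" group were converted.
occasional_if_half_converted = occasional + responses["no_but_would_like_to"] // 2

print(full_time, significant, occasional, occasional_if_half_converted)
```

The four printed values correspond to the 5%, 16%, 36% and 58% figures in the paragraph above.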

Geer and Sieniawski address maintenance of open source software (OSS):
Although there is "a high correlation between being employed and being a top contributor to" OSS, sustaining it takes more than a regular income stream. Long-term commitment to open source stewardship is also essential, as is budgeting time for periodic upkeep. For perspective, consider that 36% of professional developers report never contributing to open source projects, with another 28% reporting less than one open source contribution per year. Thus, despite more direct enterprise engagement with open source, risk-averse attitudes towards licensing risk and potential loss of proprietary advantage endure by and large. Consider further Table 1, which shows how concentrated contribution patterns are, particularly in JavaScript, and thus where additional OSS maintenance support could have an outsized impact.
Here is their Table 1:

Top 50     Primary    Language    Language    Average Dependent   Average Direct
Packages   Language   Rank 2019   Rank 2018   Projects            Contributors
npm        JS         1           1           3,500,000           35
Pip        Python     2           3           78,000              204
Maven      Java       3           2           167,000             99
NuGet      .NET/C++   6           5           94,000              109
RubyGems   Ruby       10          10          737,000             146

Thirty-five people maintaining code with, on average, 100,000 dependent projects for each of them is surely a problem, especially when you consider how vulnerable the JavaScript supply chain is, and how tempting a phishing target each of the maintainers is for cryptojackers and other miscreants. To illustrate the problem, the Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks by Marc Ohm, Henrik Plate, Arnold Sykosch and Michael Meier:
presents a dataset of 174 malicious software packages that were used in real-world attacks on open source software supply chains, and which were distributed via the popular package repositories npm, PyPI, and RubyGems.
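The 100,000-to-one ratio for npm comes from dividing the two right-hand columns of Table 1. As a rough sketch, the same division can be done for every row. The figures are my reading of Table 1, and treating "dependents per contributor" as a measure of maintainer load is an assumption for illustration, not something Geer and Sieniawski compute.

```python
# (average dependent projects, average direct contributors) per ecosystem,
# transcribed from Table 1. The ratio below is an illustrative load measure.
table_1 = {
    "npm": (3_500_000, 35),
    "Pip": (78_000, 204),
    "Maven": (167_000, 99),
    "NuGet": (94_000, 109),
    "RubyGems": (737_000, 146),
}

# Dependent projects per direct contributor for each ecosystem.
ratios = {
    name: dependents / contributors
    for name, (dependents, contributors) in table_1.items()
}

# Print ecosystems from most to least heavily loaded.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<9}{ratio:>10,.0f}")
```

On these numbers npm is the outlier by a wide margin, which is consistent with the column's point that extra maintenance support there could have an outsized impact.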
Nadia Eghbal's 143-page 2016 report for the Ford Foundation Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure is a comprehensive account of the need for support of essential open source infrastructure outside the kernel. She concludes:
In the last five years, open source infrastructure has become an essential layer of our social fabric. But much like startups or technology itself, what worked for the first 30 years of open source’s history won’t work moving forward. In order to maintain our pace of progress, we need to invest back into the tools that help us build bigger and better things.

Figuring out how to support digital infrastructure may seem daunting, but there are plenty of reasons to see the road ahead as an opportunity.

Firstly, the infrastructure is already there, with clearly demonstrated present value. This report does not propose to invest in an idea with unknown future value. The enormous social contributions of today’s digital infrastructure cannot be ignored or argued away, as has happened with other, equally important debates about data and privacy, net neutrality, or private versus public interests. This makes it easier to shift the conversation to solutions.

Secondly, there are already engaged, thriving open source communities to work with. Many developers identify with the programming language they use (such as Python or JavaScript), the function they provide (such as data science or devops), or a prominent project (such as Node.js or Rails). These are strong, vocal, and enthusiastic communities. The builders of our digital infrastructure are connected to each other, aware of their needs, and technically talented. They already built our city; we just need to help keep the lights on so they can continue doing what they do best.

Infrastructure, whether physical or digital, is not easy to understand, and its effects are not always visible, but this should compel us to look more, not less, closely. When a community has spoken so vocally and so often about its needs, all we need to do is listen.
Around the same time as Eghbal, Cameron Neylon wrote about the related problem of infrastructure for academic research in Squaring Circles: The economics and governance of scholarly infrastructures. I discussed it in Cameron Neylon's Squaring Circles, and he expanded on it in his 2017 paper Sustaining Scholarly Infrastructures through Collective Action: The Lessons that Olson can Teach us.

Neylon starts by identifying the three possible models for the sustainability of scholarly infrastructures:
Infrastructures for data, such as repositories, curation systems, aggregators, indexes and standards are public goods. This means that finding sustainable economic models to support them is a challenge. This is due to free-loading, where someone who does not contribute to the support of the infrastructure nonetheless gains the benefit of it. The work of Mancur Olson (1965) suggests there are only three ways to address this for large groups: compulsion (often as some form of taxation) to support the infrastructure; the provision of non-collective (club) goods to those who contribute; or mechanisms that change the effective number of participants in the negotiation.
In other words, the choices for sustainability are "taxation, byproduct, oligopoly". Applying them to open source support:
  • Taxation conflicts with the "free as in beer, free as in speech" ethos of open source.
  • Byproduct is, in effect, the "Red Hat" model of free software with paid support. Red Hat was the second-place contributor to the Linux kernel and was worth $34B when acquired by IBM last year. Others using this model may not have been quite as successful, but many have managed to survive (the LOCKSS program runs this way) and some to flourish (e.g. Canonical).
  • Oligopoly is what happens in practice. Take, for example, the Linux Foundation, which is:
    supported by members such as AT&T, Cisco, Fujitsu, Google, Hitachi, Huawei, IBM, Intel, Microsoft, NEC, Oracle, Orange S.A., Qualcomm, Samsung, Tencent, and VMware, as well as developers from around the world
    It is pretty clear that the corporate members, and especially the big contributors like Intel, have more influence than the "developers from around the world".
In 2013 Jack Conte, together with Sam Yam, launched Patreon, a platform by which "users" of artists' products, such as the YouTube music videos Conte and his wife Nataly Dawn made as Pomplamoose, could support them by small monthly payments. Since then Patreon has transferred over a billion dollars from its now over 5M "patrons" to its over 150K members.

Linux Mint donations
Among the Patreon members I patronize is the Linux Mint distro; I use it on many of my computers. Mint raises $10-15K/month in donations, of which about $2.5K/month comes via Patreon. That is a fairly small proportion (roughly 17-25%) of a fairly small income stream, but unlike the rest of the donations it is dependable, regular income. Mint only joined Patreon quite recently, and hasn't been aggressive about marketing its membership. But I think many of their users would be willing to pay Mint's basic $5/month Patreon tier.

Although between the X Window System and the LOCKSS Program I have made contributions to open source software, when writing this post I realized that since my retirement my only contribution has been a fix to a minor but annoying bug in touchpad-indicator, which is an important part of my Linux environment on Acer C720 Chromebooks. I need to do better in future.

2 comments:

Mike Linksvayer said...

> In the Summer 2020 issue of Usenix's ;login: Dan Geer and George P. Sieniawski have a column entitled Who Will Pay the Piper for Open Source Software Maintenance? (it will be freely available in a year).

It seems to be available now https://www.usenix.org/system/files/login/articles/login_summer20_11_geer.pdf

> Taxation conflicts with the "free as in beer, free as in speech" ethos of open source.

How so? OSS which has been created or maintained by tax-funded developers is not itself taxed; it is free as in beer and as in speech.

David. said...

Mike, in Olson & Neylon's terminology "taxation" is a shorthand for "compulsion (often as some form of taxation) to support the infrastructure" (see passage quoted in the post). It generally means what the US government would term a "user fee", i.e. a charge on the users. I stand by the assertion that "compulsion (often as some form of taxation) to support the infrastructure" conflicts with the "free as in beer, free as in speech" ethos of open source. The history of Tor is an example of some of the issues around government-funded open source software.