Tuesday, September 8, 2020

Open Source Saturation

In Supporting Open Source Software I discussed the critical need for better support for contributors to open source projects. Now, Quo Vadis, Open Source? The Limits of Open Source Growth by Michael Dorner, Maximilian Capraro and Ann Barcomb presents statistical evidence suggesting that this problem is affecting the vitality of the open source environment. Follow me below the fold for the details.

Their abstract reads (my emphasis):
Open source software plays a significant role in the software industry. Prior work described open source to be growing polynomially or even exponentially. However, such growth cannot be sustained infinitely given finite resources. In this study, we present the results of four accumulated measurements on size and growth of open source considering over 224,000 open source projects for the last 25 years. For each of those projects, we measured lines of code, commits, contributors and lifecycle state over time, which reproduces and replicates the measurements of three well-cited studies. We found the number of active open source projects has been shrinking since 2016 and the number of contributors and commits has decreased from a peak in 2013. Open source -- although initially growing at exponential rate -- is not growing anymore. We believe it has reached saturation.
Commit rate
The authors observe (Fig. 4) that the monthly commit rate grew exponentially until 2010, peaked in 2013, and then declined until, in 2019, it matched the rate from 2007.

I would suggest that this is an application of W. Brian Arthur's model of the tech economy. When a new market niche is discovered, many efforts to address it will be started. Over time increasing returns to scale (network effects) will propel a few, perhaps only one, of them to dominate the niche.

The winners will probably grow to dominate the niche by adding features (commits) rapidly. The losers will be unable to keep up, so their commit rates will drop and they will become inactive, and then abandoned. As competition for the winners decreases, they will add fewer features. Their commits will increasingly be bug fixes or responses to vulnerabilities, so their rate will decrease too, but it will stabilize at a lower, roughly constant level. Note that this model does not apply to a few large, aggregate projects such as the Linux kernel.
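The winner-take-most dynamic described above can be illustrated with a toy simulation. This is only a sketch of increasing returns, not Arthur's actual model: the function names, the number of projects and adopters, and the `exponent` parameter are all assumptions made for illustration. Each new adopter picks a project with probability proportional to its current user count raised to `exponent`, so an exponent above 1 rewards whichever project is already ahead.

```python
import random


def simulate_niche(n_projects=10, n_adopters=5000, exponent=2.0, seed=0):
    """Return final user counts for projects competing in one niche.

    Each new adopter picks a project with probability proportional to
    (current users) ** exponent. An exponent above 1 models increasing
    returns; an exponent of 0 means adopters choose uniformly at random.
    All parameters are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    users = [1] * n_projects  # every project starts with a single user
    for _ in range(n_adopters):
        weights = [u ** exponent for u in users]
        chosen = rng.choices(range(n_projects), weights=weights, k=1)[0]
        users[chosen] += 1
    return users


def top_share(users):
    """Fraction of all users held by the most popular project."""
    return max(users) / sum(users)


if __name__ == "__main__":
    for exponent in (0.0, 1.0, 2.0):
        share = top_share(simulate_niche(exponent=exponent))
        print(f"exponent={exponent}: top project holds {share:.0%} of users")
```

With an exponent above 1, one project typically ends up holding most of the users, while with an exponent of 0 the shares stay close to equal. The sketch models adoption only; the link from adoption to commit rates is the argument made in the text, not something the code demonstrates.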

Project lifecycle
Perhaps the most interesting graph is Fig. 6, showing the population of open source projects at different stages of their lifecycle through time. The authors observe that:
we were able to confirm an exponential growth until 2013 over all available projects, while most of the projects are inactive – they do not receiving [sic] a commit within a given month. ... most inactive projects never receive a contribution again – they are abandoned.
And also that:
The portion of actively developed open source projects which receive at least one contribution in a given month is small and approximately constant over time.
If I'm right, the active group of projects will comprise those large projects, typically infrastructure, and a group of smaller projects still competing to dominate their niche. It would be very interesting to stratify Dorner et al.'s active project data by both age and size. One would expect two groups:
  • Larger, older, actively maintained projects.
  • Smaller, younger projects that, over time, become inactive.
Note that a smaller project that is inactive may be widely used. It has dominated its niche, acquired all important features, and shaken out all the easy-to-find bugs. A small proportion of smaller projects may remain in this mature state until obsoleted.
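The stratification suggested above could be sketched roughly as follows. Everything here is hypothetical: the `Project` record, the field names, the cut-offs of 5 years and 100,000 lines, and the sample data are invented for illustration and do not reflect the structure of the Open Hub data or the authors' method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical per-project record; the field names are assumptions."""
    name: str
    age_years: float
    lines_of_code: int
    active: bool  # received at least one commit in the month of interest


def stratum(project, age_cut=5.0, size_cut=100_000):
    """Label a project as (size, age) using arbitrary illustrative cut-offs."""
    size = "larger" if project.lines_of_code >= size_cut else "smaller"
    age = "older" if project.age_years >= age_cut else "younger"
    return (size, age)


def stratify_active(projects, age_cut=5.0, size_cut=100_000):
    """Count the active projects that fall into each (size, age) stratum."""
    return Counter(
        stratum(p, age_cut, size_cut) for p in projects if p.active
    )


if __name__ == "__main__":
    # Made-up sample data, not taken from the paper.
    sample = [
        Project("infra-a", 20.0, 2_000_000, True),
        Project("infra-b", 12.0, 450_000, True),
        Project("tool-c", 1.5, 8_000, True),
        Project("tool-d", 2.0, 3_000, True),
        Project("tool-e", 9.0, 5_000, False),  # mature but inactive
    ]
    for key, count in sorted(stratify_active(sample).items()):
        print(key, count)
```

If the conjecture holds, most active projects would land in the ("larger", "older") and ("smaller", "younger") strata, with few in the other two.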

Active contributor rate
The authors observe (Fig. 7) that the number of active contributors per month behaves similarly to the number of commits per month. It grew exponentially until 2010, peaked in 2013, and then declined until, in 2019, it matched the level from 2007.

Of course, if all contributors were equally productive, this is what one would expect. They aren't; it is well known that programmer productivity has a long-tailed distribution. But presumably the distribution doesn't change much with time, so the difference between the mean and median productivity is relatively stable, leading to the same result.
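The reasoning above can be checked with a small numerical sketch. It assumes, purely for illustration, that each contributor's monthly commit count is drawn from a log-normal distribution whose shape does not change over time; the function name and the `sigma` value are hypothetical and are not estimated from the paper's data.

```python
import random
import statistics


def commits_per_contributor(n_contributors, sigma=1.0, seed=0):
    """Return (mean, median) monthly commits per contributor.

    Productivity is drawn from a log-normal distribution, used here only
    as a stand-in for "long-tailed"; sigma is an arbitrary assumption.
    """
    rng = random.Random(seed)
    commits = [rng.lognormvariate(0.0, sigma) for _ in range(n_contributors)]
    mean = sum(commits) / len(commits)
    return mean, statistics.median(commits)


if __name__ == "__main__":
    for n in (10_000, 20_000, 40_000):
        mean, median = commits_per_contributor(n, seed=n)
        print(f"contributors={n}: mean={mean:.2f} median={median:.2f} "
              f"total={mean * n:.0f}")
```

Because the mean stays roughly the same while the head count changes, total commits scale with the number of contributors even though the mean sits well above the median. That is the sense in which a stable long-tailed distribution leads to the same result as equal productivity.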

The authors suggest some possible explanations for the saturation of open source that they observe:
  • A decrease in developers willing to volunteer, and no corresponding increase in paid development work
  • The shift from volunteer to paid contributions reducing the effective time for contributing for each participant, due to company resource management
  • An increase in episodic participation [3], with more people preferring to volunteer less
  • A generational shift (the mean age of contributors in 2005 was 31, and in 2017 it was 30 [17,18]) from collective to reflexive volunteering [21], perhaps in response to the growing role of open source participation in career development
  • Increasing code complexity requiring skills fewer developers possess, and discouraging newcomers [45]
  • Increasing formalization of software projects, requiring significant effort on the part of developers to adhere to submission or foundation guidelines
  • A decreased quality of contributions and, therefore, a lower acceptance rate of contributions and an overload for reviewers and committers
These all seem plausible, but I would add one more, again based on W. Brian Arthur's model. Over time, the likelihood that a newly opened market niche is adjacent to an existing winner increases. So demand for new features is increasingly likely to be satisfied by one of its regular contributors adding a small number of commits to an existing project, rather than several new contributors making a larger number of commits to start a new project. This is analogous to the way the absence of anti-trust enforcement allows the tech oligopolists to suppress competition from new startups.

Tip of the hat to Glyn Moody, who concludes:
The new research might be an indication that the open source community, which has selflessly given so much for decades, is showing signs of altruism fatigue. Now would be a good time for companies to start giving back by supporting open source projects to a much greater degree than they have so far.

5 comments:

Stefane Fermigier said...

As indicated in the paper: "The most serious threat to our the validity of our study is the unknown precision and accuracy of Open Hub as measurement system."

I went to my OpenHub profile (https://www.openhub.net/accounts/sfermigier) and, according to it, I haven't committed since 2014 (which is not true at all).

I'm pretty sure the OpenHub data are massively bogus, hence the paper too (Garbage in, Garbage out).

Etienne Juliot said...

Hi.

These statistics count the number of commits. But since Git is now mainstream for most Open Source projects, it dramatically changes the number of commits you need to push a new feature. And 2010 is close to when lots of Open Source projects switched to Git.
First, I remember that when we were working with SVN, we made lots of small commits to avoid conflicts, because that was the way to go. Now, with Git AND an efficient CI, we commit when the feature is coded, the tests are OK, the documentation is here, and the build is without regression. That changes the game for this number of commits.
As you can commit locally and then squash your commits to make them coarse-grained, that can influence this number too.
It would also be interesting to know whether this number covers only master, or also all branches and even the review branches. At my company (Obeo), we are working on several Open Source projects of the Eclipse Foundation, and our workflow is to commit all ongoing work to Gerrit and to use this Gerrit branch to test and validate. Then, only when it is finished, Gerrit pushes the result to master (or an official branch) as one commit. I checked at https://www.openhub.net/p/eclipse_sirius and I highly suspect that the commits inside Gerrit, which represent the large majority of our work, aren't counted.

David. said...

Clive Thompson's The Few, the Tired, the Open Source Coders is subtitled:

"The open source movement runs on the heroic efforts of not enough people doing too much work. They need help."

Thompson writes:

"Recently, Nadia Eghbal—the head of writer experience at the email newsletter platform Substack—published Working in Public, a fascinating book for which she spoke to hundreds of open source coders. She pinpointed the change I'm describing here. No matter how hard the programmers worked, most “still felt underwater in some shape or form,” Eghbal told me.

Why didn't the barn-raising model pan out? As Eghbal notes, it's partly that the random folks who pitch in make only very small contributions, like fixing a bug. Making and remaking code requires a lot of high-level synthesis—which, as it turns out, is hard to break into little pieces. It lives best in the heads of a small number of people.

Yet those poor top-level coders still need to respond to the smaller contributions (to say nothing of requests for help or reams of abuse). Their burdens, Eghbal realized, felt like those of YouTubers or Instagram influencers who feel overwhelmed by their ardent fan bases—but without the huge, ad-based remuneration."

shanen comments:

"I actually blame rms for confusing "free" with "free". He's gone away now, but the confusion is apparently his legacy and the main result is the "competitive failure" of most OSS (per the Wired story). The existence of a few counterexamples of success doesn't change the big picture of LOTS of failure."

Using the analogy of venture capital illuminates this. The reason some startups (e.g. Nvidia) are very successful is precisely because a huge number of startups failed. In Nvidia's case, 6 months after we started we knew of 36 other startups trying to build 3D graphics chips. All but 1 failed. This rate of failure should be expected. The reason a few open source projects are very successful is that lots of things were tried - the market selected the best and the much larger number of the rest failed. The market needs lots of different efforts in order to find the best. Open source, with very low barriers to entry, is extremely good at this.

David. said...

Thomas Claburn reports that OpenCollective opens cash conduit between tech biz and unappreciated developers:

"OpenCollective, an online funding and community platform founded in 2015, on Wednesday launched Funds for Open Source, a program to facilitate financial support for open source software projects.

Open source software is everywhere but its rewards have not been evenly distributed. Technology giants like Amazon, Apple, Facebook, Google, and Microsoft all depend upon open source software, but not all those who develop the code underlying these businesses get recognized, paid, or supported."

See also GitHub Sponsors.

David. said...

Danny Bradbury's When software depends on a project thanklessly maintained by a random guy in Nebraska, is open source sustainable? is an excellent overview of the problem:

"Commercial reliance on open-source software (OSS) is huge. Software integrity company Synopsys, which publishes a regular report on open-source security and risk, found that the number of open-source components per commercial application jumped from 84 in 2016 to 528 last year. Yet the money that open-source maintainers get for working on this software, often in their free time, hasn't grown much if at all.

Funding for OSS projects is typically dire. In 2019, developer André Staltz collected data from Open Collective and GitHub to assess project revenues. Over 50 per cent of projects couldn't sustain their maintainers above the poverty line, while 31 per cent generated enough for a salary considered unacceptable in the industry."