rawdog

<	July 2018	>
Mon	Tue	Wed	Thu	Fri	Sat	Sun
25	26	27	28	29	30	01
02	03	04	05	06	07	08
09	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31	01	02	03	04	05

Sunday, 01 July 2018

10:00 PM

A framework for thinking about Open Source Sustainability? [Living in an Ivory Basement] 10:00 PM, Sunday, 01 July 2018 02:00 PM, Thursday, 06 August 2020

I just revisited Nadia Eghbal’s wonderful post on “the tragedy of the commons” and her thoughts of an alternate ending for it, based on Elinor Ostrom's work on Common Pool Resources, and it resonated with some thinking I’d been doing in another context, and I wanted to share.

Nadia has been exploring the open source sustainability problem (ref), in which a good deal of our important open source software is maintained by a relatively small number of people without much in the way of guaranteed funding. The size and scope of the problem vary depending on who you talk to, but there was a pretty shocking picture of the scientific python computing ecosystem in which there were only half a dozen maintainers for numpy. Since quite a bit of the Python scientific computing ecosystem relies on numpy, it seems critically challenging to have so few maintainers. For an excellent detailed discussion on a specific instance of the general challenges around software development, see "The Astropy problem", Muna et al., 2016.

(The discussion below is mostly focused on scientific software, but I think it might apply much more broadly.)

I myself work mostly in bioinformatics, where the field uses a constantly frothing mixture of software packages that are maintained (or not ;) by a wide variety of people - some graduate students, some postdocs, some faculty, some staff. We develop software in my lab (mostly khmer and sourmash along with some other things) and over the years we’ve developed tips and tricks for keeping the software going, mostly revolving around testing and continuous integration. But there is always something that isn’t working, and even with automation maintenance is a constant low-level burden. Luckily very few people use our software compared to projects like Jupyter, so we are not particularly deluged by bug reports and maintenance issues.

That having been said, the constant need to maintain our open source software affects us quite a bit. It is rare that a week goes by where some piece of software we maintain isn’t revealed to have a bug, or need some evolution to add a new function or command line flag. If I and the other people in the lab are on a research kick (vs coding kick), then we may not get to the problem for a while.

The same is true of training materials. We run an annual two-week workshop on sequence analysis, and every year we need to evolve the lessons a bit to account for new methods, new software, and new data types. While some of the lessons from 2010 may still work, my guess is that most of them have undergone bitrot.

From my own experience as well as from observing quite a few packages over the years, I’ve come to the firm conclusion that open online projects (including software and training materials) that aren’t actively maintained quickly decay. And, if people are actively using a project, they will invariably find bugs and problems that need to be fixed. This completely ignores the people that (lord forfend) actually want to help you improve your online projects and submit pull requests on GitHub that need to be reviewed and merged or rejected, or (if you’re really successful) the companies that want to join your project and merge in their own code.

This need for constant attention to projects, the sprawling ecosystem of amazing scientific software packages, and the relatively small community of actual maintainers, when combined, lead to the open source sustainability problem in science: we do not have the person power to keep it all running without heroic efforts. And when you couple this with the lack of clear career paths for software maintenance in science, it is clear that we cannot ethically and sustainably recruit more people into open source maintainership.

Recently a group of colleagues and I were brainstorming about another open online project (more on that later) and trying to frame it as a common pool resource problem. We were looking for this framing because we knew that success would present a sustainability problem and were hoping to use the common pool resource framework to think about sustainability.

Common Pool Resources, the Tragedy of the Commons, and Design Principles for Sustainability

Virtually everything I know about the common pool resource framework comes from Elinor Ostrom's excellent book, Governing the Commons. This is a tremendously readable book that outlines the problem in a very general way, and discusses how it has been solved by communities in the past.

Briefly, in the 60s and 70s, Elinor Ostrom and her collaborators noted that the so-called "tragedy of the commons”, in which common pool resources were over-utilized by selfish actors, was not an inevitable end; that government regulation and corporatization were not the only solution to managing commonses; and that, in fact, many communities had figured out how to manage common pool resources locally. From her own and others' case studies, Ostrom extracted eight "design principles" for sustainability of common pool resources.

Nadia does a great rundown of this in her blog post, so I will just point you at the eight design principles on Wikipedia, which are very readable.

For this work, Ostrom received the 2009 Nobel prize.

Back to open online projects

So, my colleagues and I wondered, how would this framework apply to an open online project in which we were working with digital resources? After all, digital resources are not consumable in the same way as physical resources, and (to borrow from the open source musings above) it’s not like someone using our project’s source code is consuming that source code in such a way that it would not be usable by others.

During this conversation, we realized the answer: effort. The common pool resource in open online projects is effort.

When a contributor to a project adds a feature, what are they doing? Applying effort. When a contributor files a bug report? They’re applying effort. When they file a good bug report? More effort. When they write documentation? When they test a feature? When they suggest a new feature? They’re applying effort, all of it.

But it goes deeper than that. When you bring a new contributor into a project, you’re growing the available pool of effort. When you engage a new investor in supplying funding for an open source project, often that funding goes to increasing the amount of dedicated effort that is being applied to the project.

Of course, not all contributions are positive in their effect on effort, as I wrote about here. Some contributions (new feature proposals, or bad bug reports) cost the project more net effort than they bring. Significant feature additions that don’t come with contributions to the underlying maintenance of the project can be very costly to the core project maintainers, if only in terms of reviewing and rejecting. And underpinning all of this is the low susurration of maintenance needs: as I outline above, maintenance needs act as a net drag on project effort.

At #GCCBOSC, Fernando Perez riffed on this same idea a bit, and pointed out that there are other extractive approaches being used by people who recruit from open source projects. Many companies recruit out of open source communities, and in a simple sense they are mining the effort that the open source community put into training the people in question.

If you look at the list of eight design principles for a sustainable common pool resource, and define “effort” as the common pool resource in question, you see that they apply more or less directly to the way open source projects have evolved:

Who is a contributor to an open source project is clearly defined.
Effort in open online projects is applied locally, to the needs of the project.
Many open source projects follow the rule that those who contribute participate in design decisions.
People who contribute significantly are often invited to join the project officially in more significant decision making roles.
There is often a range of sanctions for contributors who violate community rules.
Most conflicts are handled internally to the project, rather than being escalated to the legal system.
Most conflicts are handled by lightweight methods and discussions.
Many open source contributors contribute to multiple projects, e.g. in the Python ecosystem there are many projects to which the same people contribute. In this sense the Python ecosystem can be considered a larger-scale CPR of effort with many locally articulated CPRs like "core CPython dev" and "numerical computing/numpy library".

It seems likely to me that this will generalize to open communities of many kinds.

So, uhh, what does this all mean?

Since my colleagues and I started thinking this way, and I started looking at open source projects and other online community resources through this lens, it has proven to be a really nice simple framework for me to think about open source sustainability. The entire “How open is too open?” blog post came directly out of this thinking! It also gives a straightforward explanation for why recruiting more people to your project is viewed so positively: you’re increasing the pool of effort available for allocation to the project's needs; this further explains why Codes of Conduct and contributor guidelines are becoming so important, because they enable you to recruit and retain effort over the long term.

By itself, this perspective doesn’t solve any problems. But it does tie into a really nice collection of case studies, and a lot of deep thinking from the CPR literature about how to sustainably manage community resources.

More specifically, in the context of generic open online projects, it suggests a few points of consideration.

First, the pool of effort available to an open online project needs to be preserved against encroachment. For successful projects, this means that potential contributions should evaluated in terms of their net likely impact on the pool of effort. While this principle is already enshrined for technical contributions (see: “technical debt”), it should also be considered for bug reports and feature suggestions. (Many projects already do this, of course!)

Second, the cost of the constant maintenance needs (code, documentation, installation, etc.) on the pool of available effort needs to be taken into account. Contributions of new features that do not come with effort applied to maintenance should be carefully considered - is this new contributor likely to stick around? Can they and will they devote some effort to maintenance? If not, maybe those contributions should be deferred in favor of contributions that add maintenance effort to the project, e.g. via partnerships.

Third, training and nurturing new contributors should be considered in the cold hard light of increasing the available effort over the long term. But contributor psychology is tricky, so it may not be simple to predict who will stick around. Some projects have excellent incubators, like the Python Core Mentorship Program, where people who are interested in applying their effort to recruiting new contributors can do so. I suspect that considerations like creating a friendly environment and laying out expectations like “we’re happy to help you get up to speed on both adding new features AND FIXING BUGS so that you contribute to our maintenance effort” might help point new contributors in the right direction. In the long term, the health of the community is the health of the project.

Fourth, there are some interesting governance implications around allowing all or most of the resource appropriators to participate in decision making. I need to dig more into this, but, briefly, I think projects should formally lay out what level of investment and contribution is rewarded with what kind of operational, policy making, and constitutional decision making authority.

Fifth, defining maturity metrics may help with setting funder expectations and obtaining investments. From experience, the primary goal of many funders is to chart a path to project sustainability. I think the above design principles (and the case studies from CPR) can serve as a foundation for a set of project maturity rubrics that connect to sustainability. If a project is writing a funding proposal, they could articulate which design principles they’re targeting for improvement and how that ties into a broader framework of sustainability. For example, a project could say “right now we are worried about our ability to onboard many new contributors, and we’re starting to get some inquiries from companies about contributing on a larger scale; we’d like to build out our governance principles and improve our guidance for contributors, so that new contributors and investors have clear expectations about what level of investment is expected from them.” I suspect funders would welcome this level of clarity :).

Does the Common Pool Resource framework for “effort" really fit open source projects, and open online projects generally?

Good question! I am by no means knowledgeable about CPR, and I have a lot of reading ahead of me. I can see a few disconnects with the CPR framework, and I need to work through those; but I’m really, really excited by how well it fits with my intuition about how open source projects work. Having a conceptual framework like CPR is letting me revisit my observations and fit them into a neater picture and maybe reach different conclusions about fixes. Again, check out the “How open is too open?” blog post for an example of this.

Something else I really want to try is to engage in case studies of open source projects to see how practices in real living sustained open source projects do and don’t fit this framework. I have a sabbatical opportunity coming up in a few years… hmm….

One of the things I really like about this framework is that it divorces open source project thinking from the fuzzy woo that I used to value so much about free and open source software -- “we should be a big happy family and all work together!” starts to pale in the face of maintainers' lives getting destroyed by their commitment to open. This intersects with a comment that Fernando Perez made: open source projects really do run a significant part of the world these days, and we cannot afford to think about any part of them - including sustainability - in the informal way we are used to. We should hold ourselves accountable to the goal of building sustainable open projects, and lay out a realistic and hard-nosed framework by which the investors with money (the tech companies and academic communities that depend on our toolkits) can help foster that sustainability. To be clear, in my view it should not be the job of (e.g.) Google to figure out how to contribute to our sustainability; it should be our job to tell them how they can help us, and then follow through when they work with us. Many important projects are really incapable of doing this at the moment, however, and in any case there are no simple solutions in this space. But maybe CPR is a framework for thinking about it. We shall see!

On a personal level, it's interesting to look back at my various open source project efforts (none of which are sustainable/sustained :) and see why they fail to meet the design principles laid out by Ostrom. That may be another blog post down the road...

--titus

P.S. I'd like to especially thank Cameron Neylon and Michael Nielsen for pointing Elinor Ostrom's work out to me a few years back! I would also like to thank Nadia Eghbal for all of her explorations of this topic - the framing she provided in her blog posts is really instrumental to this post, and I hope to be fellow travelers going forward :)

P.P.S. While I'm dropping names, I got excellent feedback and suggestions on this topic from many people, including Luiz Irber, Katy Huff, Katie Mack, Cory Doctorow, Jake VanderPlas, Tracy Teal, Fernando Perez, Michael Crusoe, and Matthew Turk - thank you all! Several of these conversations were enabled by #scifoo18 and #gccbosc, so thanks, SciFoo and BOSC!

Feed	RSS	Last fetched	Next fetched after
	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Bits of DNA	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
blogs.perl.org	XML	12:00 AM, Tuesday, 18 January 2022	12:15 AM, Tuesday, 18 January 2022
Blue Collar Bioinformatics	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Boing Boing	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Epistasis Blog	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Futility Closet	XML	12:00 AM, Tuesday, 18 January 2022	12:15 AM, Tuesday, 18 January 2022
gCaptain	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Hackaday	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
In between lines of code	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
InciWeb Incidents for California	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
LeafSpring	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Living in an Ivory Basement	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
LWN.net	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Mastering Emacs	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Planet Debian	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Planet Emacsen	XML	12:00 AM, Tuesday, 18 January 2022	12:15 AM, Tuesday, 18 January 2022
RNA-Seq Blog	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
RStudio Blog - Latest Comments	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
RWeekly.org - Blogs to Learn R from the Community	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
The Adventure Blog	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
The Allium	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022
Variance Explained	XML	12:00 AM, Tuesday, 18 January 2022	12:30 AM, Tuesday, 18 January 2022

planet.donarmstrong.com