Tuesday, 15 January 2019

Revisiting authorship, and JOSS software publications [Living in an Ivory Basement] 11:00 PM, Tuesday, 15 January 2019 02:00 PM, Thursday, 06 August 2020

We are slowly working towards a v2.0 release of sourmash, our software for MinHash and modulo hash analysis of genomic data, and the question of proper authorship is once again on my mind!

The question du jour: how should authorship on software papers be decided?

Some background - our previous take on authorship

Those of you with long memories may recall a hullabaloo in 2015 over this question, occasioned by the khmer v2.0 paper submission to F1000 Research. Briefly, some took exception to our offer of authorship in the publication to all contributors to the GitHub repository, while others thought it was just fine. The reviewers had some interesting things to say about our authorship considerations (see the "Open Peer Review" section of the paper) but, despite some reservations, the paper was ultimately approved for publication.

One of the strongest outcomes for me of all of this was that I realized how inane any cut-and-dried principle of scientific authorship was - or, to rephrase, authorship decisions often contain a strong subjective component. Arguments for and against authorship can easily be made in many situations, but there are always corner cases that break each argument.

And this is true even when you have full version control history, as we do on khmer and sourmash.

For one example, it's always possible to argue that even significant code contributions are not major intellectual contributions. "It's just a bug fix", or "that's just engineering" - those or similar arguments can always be made. This makes it difficult to just look at the git commit log!

The same goes in reverse: someone who didn't contribute to the code base at all (and for whom there are no tracked contributions) might have made significant intellectual contributions. Lots of action on scientific and software projects happens in other forums or in meatspace, and those contributions should be rewarded too! (In the khmer foofarah, I note that I could have easily said, "hey, look, en zyme and I took long moonlit walks along the banks of the Willamette, and discussed khmer deeply and thoroughly; on that basis I consider their intellectual contribution to be significant and deserving of authorship." Who could have gainsaid me, other than en zyme?)

The bottom line is that I think we might not have been entirely right to offer authorship to all committers (in the strict scientific sense of authorship), but at the same time people arguing against it were also off base. In fact, by focusing on git commit records we probably spiked the conversation and centered it around something reasonably verifiable rather than the deeper and more interesting questions. So it goes.

My real conclusion is this:

Fundamentally, in order to nurture a diverse array of valuable scientific contributions, we need new models of publication with new models of authorship.

Authorship v3.0, for sourmash v2.0?

Authorship for khmer was confused by the lack of a good publication venue - F1000 Research is, basically, too close to a traditional journal.

Conveniently, in the intervening years, a wonderful group of people created the Journal of Open Source Software (JOSS)! And, in fact, this is where sourmash v1.0 was published. But authorship for sourmash 1.0 was easy: it was just me and Luiz Irber, because we were the only contributors!

It is now time to release sourmash 2.0, a much revised and expanded piece of software that a number of people have been using (in my lab and outside it) for many many things. And of course we want to start by publishing an update in JOSS!

Since the publication of sourmash 1.0, however, we have had some amazing contributions to sourmash. Some really significant code and doc contributions have made it in, because we do our best to be friendly about accepting code. But there's more!

Many different members of the Lab for Data Intensive Biology have contributed extensively to the sourmash software by using sourmash in their own projects. A number of people in my lab engaged intellectually with the MinHash and modulo hash methodology, found and reported bugs, asked questions that led to new docs and tests, and motivated new functionality.

Outside of the lab, there were also a number of contributions. We had one user who picked up sourmash for their own project, and over a period of almost two years has reported many bugs and UX problems. A colleague at another institution pointed out that some of our calculations were incorrect, and provided utility code to validate our calculations. Another colleague built a pipeline on top of sourmash and reported several different bugs. Some collaborators really dug into sourmash functionality for another pipeline implementation, and suggested a lot of features. And yet another wonderful person created a bioconda recipe for sourmash, which led to many more users.

From my perspective, all of these people contributed to sourmash, and made the software better. I think they deserve to be offered authorship on the software paper in JOSS.

(Tracking who receives consideration for authorship is hard on a multi-year project, especially when many contributions come in informally via e-mail or in-person interactions. Thoughts there welcome...)

A rubric for authorship on software publications

I think the right rubric for authorship on software publications is engaged effort in the project. (This comes out of a broader effort to think about sustainability using the Common Pool Resource framework.)

What this means in practice is this:

If I or anyone else involved in a project can pinpoint the contribution made by an individual, and it was a positive contribution, as opposed to an extractive one, that person is a contributor. This excludes users who just used the software, but could include users who filed bug reports, asked good questions that led to new documentation, and otherwise engaged with the software.

And contributors deserve to be offered authorship.

Who decides?

I've been involved in several authorship disputes over the years, and, in my opinion and my experience, the senior author decides authorship. After that point other authors can decide if they want their contributions to be part of that scholarly work.

So, basically, in the case of sourmash I get to decide who to invite; at least in this case, there is no question who is the senior author on sourmash 2.0.

In this case, I believe that having submitted code via a PR is definitely a contribution. I'm going to have to think about how to evaluate issue contributions; right now I don't have a good way to summarize them...
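One possible starting point for summarizing issue contributions is a simple per-person tally. The sketch below is only an illustration, not something sourmash does today: the `ISSUES` records, their field names, and the `summarize_issue_contributions` helper are all hypothetical, standing in for whatever an issue tracker export might look like. It counts issues opened (overall and per label) and issues commented on by someone other than the opener.

```python
from collections import Counter

# Hypothetical issue records; the people, labels, and field names are
# made up for illustration and do not come from any real tracker export.
ISSUES = [
    {"number": 1, "author": "alice", "labels": ["bug"], "commenters": ["bob", "alice"]},
    {"number": 2, "author": "bob", "labels": ["question"], "commenters": ["carol"]},
    {"number": 3, "author": "alice", "labels": ["bug"], "commenters": []},
]


def summarize_issue_contributions(issues):
    """Return a dict mapping each person to a Counter of their issue activity."""
    summary = {}
    for issue in issues:
        author = issue["author"]
        tally = summary.setdefault(author, Counter())
        tally["opened"] += 1
        for label in issue["labels"]:
            tally[f"opened:{label}"] += 1
        # Count each commenter once per issue, and skip the issue's own author.
        for person in set(issue["commenters"]) - {author}:
            summary.setdefault(person, Counter())["commented_on"] += 1
    return summary


if __name__ == "__main__":
    summary = summarize_issue_contributions(ISSUES)
    for person in sorted(summary):
        print(person, dict(summary[person]))
```

A tally like this only says who engaged and how often; whether a given bug report or question was a positive contribution is still a judgement call, and it misses anything that came in by e-mail or in person.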

Other questions

Authorship discussions are an excellent opportunity for involved commentary and scholarly nitpicking! Here are some of my nits --

When people (inside or outside the lab) use unpublished features of sourmash in their own research, should they include some or all of the sourmash authors on their work? I've generally defaulted to "no" - they should certainly cite sourmash, but even if sourmash enabled otherwise impossible analyses, we didn't contribute specifically to their research, and shouldn't be co-authors.


Is publishing the software in JOSS and then writing papers on various bits of sourmash for other journals so-called "salami slicing", or breaking up publications into their least publishable units?

For example, here we're expecting to have a version-specific JOSS paper, an F1000 "sourmash use cases" paper, and at least two (maybe three) different research papers investigating and using sourmash for different projects. These papers will have different messages, different workflows, and different (but overlapping) sets of authors, so my judgement is that it's not salami slicing. But even if you agree, so many papers will make citing sourmash challenging and complicated, and probably "dilute" the citations to any one paper... a conundrum for those who count citations-per-person!

I think it's up to us to make the message of each paper clear, and provide clear citation guidance for different features (as we have done for khmer, note).


Another question: does authorship keep accruing over versions? Should all the authors on sourmash 2.0 be authors on sourmash 3.0? I'm not sure how to deal with this. Wait a few more years for the khmer 3.0 and 4.0 releases and we'll see ;).

The bottom line

Citations are currency in academia, for better or for worse, so discussions of authorship are always fraught with tension around dilution of this currency. I think it's better to acknowledge people's engaged effort than not, and leave it up to co-authors to explain their own effort when asked. This is more or less how our merit and promotion process works here at UC Davis - we have to explain clearly how we have contributed to each paper.

From the perspective of sustainability, I would argue that thinking broadly about contribution is a way to recruit, reward and retain maintainers and engaged users, which could lead to increased sustainability.

(This is all quite different from legal authorship on the software itself, which is a different and differently challenging question that I'm not remotely qualified to discuss. :)

--titus

p.s. Thanks (as always) to Michael Crusoe and Luiz Irber for their major impact on my thinking here! I would also like to especially thank Matthew Turk and Daniel Katz for their engagement with me on this topic over the years.

p.p.s. I would be remiss in not mentioning Casey Greene's tremendously-inspiring-and-as-usual-oddball approach to authorship on the Deep Learning in Biology review paper. So! It is now mentioned!
