Article · Open access · Peer-reviewed

Elevating The Status of Code in Ecology

2015; Elsevier BV; Volume: 31; Issue: 1; Language: English

10.1016/j.tree.2015.11.006

ISSN

1872-8383

Authors

K. A. S. Mislan, Jeffrey Heer, Ethan P. White

Topic(s)

Research Data Management Practices

Abstract

Code is frequently written for ecological studies. Most ecology journals do not address code or software. Journals can promote the release of code by changing article formats and requirements. Code archives should provide a license and be long-term and citable.

Code is increasingly central to ecological research but often remains unpublished and insufficiently recognized. Making code available allows analyses to be more easily reproduced and can facilitate research by other scientists. We evaluate how journals handle code, discuss barriers to its publication, and suggest approaches for promoting and archiving code.

Most ecologists now commonly write code as part of their laboratory, field, or modeling research. The transition to a greater reliance on code has been driven by increases in the quantity and types of data used in ecological studies, alongside improvements in computing power and software [1: Hampton S.E. et al., Big data and the future of ecology, Front. Ecol. Environ. 11, 156-162 (2013)]. Code is written in programming languages such as R and Python, and ecologists use it for a wide variety of tasks, including manipulating, analyzing, and graphing data. A benefit of this transition to code-based analyses is that code provides a precise record of what has been done, making it easy to reproduce, adapt, and expand existing analyses. Scientific code can be separated into two general categories: analysis code and scientific software.
Analysis code is used to correct errors in data, simulate model results, conduct statistical analyses, and create figures [2: Peng R.D., Reproducible research in computational science, Science 334, 1226-1227 (2011)]. Release of analysis code is necessary for the results of a study to be reproducible [2]. The majority of code written for ecological studies is analysis code, and making this code available is valuable even if it is rough, because it documents precisely which analyses were conducted [2; 3: Hampton S.E. et al., The Tao of open science for ecology, Ecosphere 6, 120 (2015); 4: Barnes N., Publish your computer code: it is good enough, Nature 467, 753 (2010)]. Scientific software is more general and is designed to be used in many different projects (e.g., R and Python packages). The development of ecological software is becoming more common, and software is increasingly recognized as a research product [5: Rubenstein M.A., Dear Colleague Letter – Issuance of a New NSF Proposal & Award Policies and Procedures Guide, National Science Foundation (2012), www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp; 6: Poisot T., Best publishing practices to improve user confidence in scientific software, Ideas Ecol. Evol. 8, 50-54 (2015)]. Journals are the primary means by which ecologists communicate the results of their studies. Therefore, examining how journals handle code is a useful way to evaluate the current status of code in ecology.
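As a concrete illustration of the two categories, the following minimal Python sketch separates a general, reusable function (the kind of thing that could be packaged as scientific software) from study-specific analysis code that corrects a data error and fits a trend. The species counts, the assumed data-entry error, and the names `fit_linear_trend` and `run_analysis` are all hypothetical; the point is only that the script records every step exactly, so rerunning it reproduces the same result.

```python
from statistics import mean


def fit_linear_trend(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept).

    Stands in for general-purpose scientific software: it knows nothing
    about any particular study and could be reused across projects.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have equal length >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def run_analysis():
    """Analysis code: the exact, study-specific steps behind one result."""
    # Hypothetical yearly counts; the 2012 value is assumed to contain a
    # data-entry error (an extra trailing zero).
    years = [2010, 2011, 2012, 2013, 2014]
    raw_counts = [10, 13, 160, 19, 22]

    # Step 1: correct the error. Doing this in code records the correction
    # precisely instead of leaving it to a prose description.
    counts = [c // 10 if y == 2012 else c for y, c in zip(years, raw_counts)]

    # Step 2: statistical analysis using the reusable function.
    slope, intercept = fit_linear_trend(years, counts)
    return counts, slope, intercept


if __name__ == "__main__":
    counts, slope, intercept = run_analysis()
    print(f"corrected counts: {counts}")
    print(f"trend: {slope:.2f} individuals per year")
```

Because the correction lives in code rather than in a methods paragraph, a reader can see exactly which value was changed and how, even if the script itself is rough.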
To explore the current status of code in ecology journals, we identified journals through a search of the Journal Citation Reports (JCR) using the following settings: 'Ecology' for category, '2013' for year, 'SCIE' (Science Citation Index Expanded) and 'SSCI' (Social Sciences Citation Index) for editions, and 'Web of Science' for the category schema. We selected the top 100 results for analysis and, after excluding museum bulletins, a book, and a journal with broken website links, evaluated a total of 96 journals. We searched the author guidelines of each journal to determine whether there was any mention of code or software in the context of scientific research. We also conducted more specific searches to determine whether journals had a section for documenting scientific software releases, and whether journals had a policy requiring the release of code and/or data for article publication. Data release policies provide a useful comparison to code release policies because there have been ongoing efforts to encourage or require the release of data once results are published (e.g., [7: Whitlock M.C. et al., Data archiving, Am. Nat. 175, 145-146 (2010)]). As of June 1, 2015, more than 75% of ecology journals did not mention scientific code in their author guidelines (Figure 1). Of the journals that mentioned scientific code, only 14% required code to be made available. Nearly three times as many journals (38%) required data to be made available. A very small subset of journals (7%) had created a special section for software releases or had added software releases to the list of options for existing methods sections (Figure 1). These findings are similar to those of a recent analysis of journal code policies in other scientific fields [8: Stodden V. et al., Toward reproducible computational research: an empirical analysis of data and code policy adoption by journals, PLoS ONE 8, e67111 (2013)].
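The text does not state how the guideline searches were carried out. Purely as a hypothetical sketch of how such a screen could be partly automated and turned into percentages like those reported above, the Python code below classifies author-guideline text with naive keyword rules. The keyword lists, the category names, and the four guideline snippets are invented for illustration; keyword matching is a crude stand-in for careful reading and will misclassify policies that are phrased differently.

```python
import re
from collections import Counter

# Hypothetical keyword rules; these are assumptions, not the authors' criteria.
CODE_WORDS = ("code", "software", "script")
DATA_WORDS = ("data",)
REQUIRE_WORDS = ("must", "required", "mandatory")


def sentences(text):
    """Split guideline text into lowercase sentences (naively, on . ! ?)."""
    return [s.strip().lower() for s in re.split(r"[.!?]", text) if s.strip()]


def has_word(sentence, words):
    """True if any keyword (or its plural) appears as a whole word."""
    return any(re.search(rf"\b{w}s?\b", sentence) for w in words)


def classify(text):
    """Return the set of policy categories one journal's guidelines fall into."""
    categories = set()
    for s in sentences(text):
        if has_word(s, CODE_WORDS):
            categories.add("mentions code")
            if has_word(s, REQUIRE_WORDS):
                categories.add("requires code")
        if has_word(s, DATA_WORDS) and has_word(s, REQUIRE_WORDS):
            categories.add("requires data")
    return categories


def tally(guidelines_by_journal):
    """Percent of all journals per category, plus requirement among mentioners."""
    n = len(guidelines_by_journal)
    counts = Counter(
        c for text in guidelines_by_journal.values() for c in classify(text)
    )
    result = {cat: round(100 * k / n) for cat, k in counts.items()}
    if counts["mentions code"]:
        result["requires code | mentions code"] = round(
            100 * counts["requires code"] / counts["mentions code"]
        )
    return result


# Invented guideline snippets for four hypothetical journals.
toy_guidelines = {
    "Journal A": "Manuscripts must be double spaced. Data must be archived publicly.",
    "Journal B": "Authors must make all analysis code available upon publication.",
    "Journal C": "Figures should be high resolution.",
    "Journal D": "Software used should be cited in the methods.",
}

if __name__ == "__main__":
    print(tally(toy_guidelines))
```

The last entry in the result is a conditional percentage (the share of code-mentioning journals that require code), which is a different denominator from the share of all journals.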
Elevating the status of code in ecology will require changes in attitude and policy by both journals and researchers. Researchers are often concerned about making their code public for a variety of reasons [4; 9: Ince D.C. et al., The case for open computer programs, Nature 482, 485-488 (2012)]. One of the main concerns is that publishing code takes time and researchers do not receive sufficient credit to justify this effort. This is compounded by concerns that releasing code may increase the risk of being scooped or hinder the researcher's (or their institution's) ability to commercialize the software [9]. In ecology, we believe that the benefits of publishing code outweigh the potential risks. There is little potential for commercializing ecological analysis code, or even ecological software, and reuse of the code by others will raise the impact of the code author's publications. It is also common for scientists to believe that their code is not useful and that the description of what their code does (typically in the methods section of a journal article) is sufficient to allow the analysis to be reproduced. However, computational and statistical methods have become increasingly complicated, and access to the analysis code is now crucial to understanding precisely how analyses were conducted [2, 4, 9].
Even code that is rough and difficult to run on other systems (owing to software dependencies and differences in computing platforms) still provides valuable information as part of detailed documentation of the analyses [2, 4, 9]. Given the relatively low risk and potentially large benefit to science of releasing code, sufficient incentives are needed to motivate scientists to take the time to do so. Journals can promote the release of code used in ecological studies by increasing the visibility and discoverability of code and software. One way to increase visibility is to indicate code availability in the table of contents of all formats of the journal and to provide direct links from the online table of contents to the code (Figure 2A). In the article itself, links to code displayed prominently on the first page will also increase visibility (Figure 2B). This format has already been adopted for data by some ecology journals, including The American Naturalist. In addition, journals can require and verify that code is made available at the time an article is submitted for review or accepted for publication [10: Nosek B.A. et al., Promoting an open research culture, Science 348, 1422-1425 (2015)]. Journal requirements for data to be made available have been very successful [3].
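The "require and verify" step could be partly automated. The following hypothetical Python sketch (not any journal's actual workflow) extracts DOI-like strings from a manuscript's code-availability statement with an approximate regular expression and flags statements that offer no persistent identifier for the code. The regular expression is a common approximation that does not match every valid DOI, the `10.1234/...` identifiers are placeholders, and a real check would also need to resolve each DOI to confirm that it actually points to the code.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit prefix, "/", suffix.
# It is a common heuristic and does not match every valid DOI.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(statement):
    """Return DOI-like strings found in a code-availability statement."""
    # The suffix class also matches sentence punctuation, so trim it off the end.
    return [m.rstrip(".,;:") for m in DOI_PATTERN.findall(statement)]


def check_code_availability(statement):
    """Hypothetical editorial check: is archived code cited by a DOI?"""
    dois = extract_dois(statement)
    if dois:
        return "ok", dois
    if re.search(r"available (up)?on request", statement, re.IGNORECASE):
        return "not archived: code available only on request", []
    return "missing: no persistent identifier for the code", []


if __name__ == "__main__":
    for text in [
        "All analysis code is archived at https://doi.org/10.1234/example-code.",
        "Code is available upon request from the authors.",
        "Analyses were run in R.",
    ]:
        print(check_code_availability(text))
```

Treating "available on request" as a separate outcome reflects that such statements do not make code available at the time of submission or acceptance.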
Specialized software sections in journals go a step further: they promote highly refined code that can be used broadly for ecological analyses and visualization, and they provide an associated publication [11: Pettersson L.B. and Rahbek C., Editorial: launching Software Notes, Ecography 31, 3 (2008)]. Communicating the availability of software to the ecology community in a well-described journal format highlights software as a product of ecological research. Discoverability can be enhanced if searchable article databases (e.g., journal archives, Web of Science, and PubMed) include an option to search for articles with code. This search capability would make it more feasible to find, compare, and adapt code from multiple research articles for a new study. To increase the value of code releases within the existing academic incentive structure, papers and other scientific products that use publicly available code need to cite the code and its associated publication (if there is one). Journals should encourage or require the citation of code, and provide instructions and examples for how to do so in their author instructions. Citing code will increase the impact of journal articles that include code, and give credit to ecologists who develop valuable software resources. It is also important to consider how best to make ecological code publicly available. Ecologists may not be aware of the steps needed to archive code or of how easy it is to do so with available resources [3; 12: Stodden V. and Miguez S., Best practices for computational science: software infrastructure and environments for reproducible and extensible research, J. Open Res. Software 2, 1-6 (2014); 13: Wilson G. et al., Best practices for scientific computing, PLoS Biol. 12, e1001745 (2014)].
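Once code has been archived with a DOI, citing it works much like citing an article. The minimal Python sketch below formats a reference-list entry from a code release's metadata. The `CodeRelease` fields, the citation layout, and the example record (including the placeholder DOI `10.1234/example-code`) are assumptions for illustration, not a style required by any journal or archive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodeRelease:
    """Hypothetical metadata for one archived release of some code."""
    authors: List[str]  # "Family, Initials" strings, in author order
    year: int
    title: str
    version: str
    archive: str        # where the release is deposited (e.g., Zenodo)
    doi: str            # DOI assigned by the archive


def format_citation(release: CodeRelease) -> str:
    """Return a reference-list entry for an archived code release."""
    if len(release.authors) > 2:
        names = f"{release.authors[0]} et al."
    else:
        names = " and ".join(release.authors)
    return (
        f"{names} ({release.year}) {release.title}, version {release.version}. "
        f"{release.archive}. https://doi.org/{release.doi}"
    )


# Hypothetical example record; the DOI is a placeholder, not a real deposit.
example = CodeRelease(
    authors=["Doe, J.", "Roe, R."],
    year=2015,
    title="Analysis code for a hypothetical population study",
    version="1.0.0",
    archive="Zenodo",
    doi="10.1234/example-code",
)

if __name__ == "__main__":
    print(format_citation(example))
```

Recording the version alongside the DOI identifies exactly which release of the code a study used.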
Table 1 compares some of the common resources available for archiving code. A license, which states the conditions under which the code can be used, should be included with a submission to an archive. If a submission does not include a license, others will not legally be able to use the code. Most of the resources in Table 1 provide a license or license options, making it easy to add a license when code is submitted. Archives need to be long-term, assuring continuous availability ([14: White E.P., Some thoughts on best publishing practices for scientific software, Ideas Ecol. Evol. 8, 55-57 (2015)], https://caseybergman.wordpress.com/2012/11/08/on-the-preservation-of-published-bioinformatics-code-on-github/). All of the resources in Table 1 store submissions for the long term except GitHub and Bitbucket. Some of the archives assign code submissions a digital object identifier (DOI), which makes code straightforward to cite in scientific publications. Other considerations are whether it is possible to search specifically for code within the archive, the process for uploading code, and the cost of archiving code. Most of the archives host code for free if the code is made publicly available. Overall, Zenodo, Figshare, Dryad, and PANGAEA are good options for archiving because they provide licenses, are long-term, and are easily citable (Table 1).

Table 1. Comparison of Common Resources (Zenodo, Figshare, Dryad Digital Repository, PANGAEA Data Publisher, GitHub, and Bitbucket) Used for Archiving Code and Data(a)

| | Zenodo | Figshare | Dryad | PANGAEA | GitHub and Bitbucket | Supplementary material |
|---|---|---|---|---|---|---|
| Default license | Flexible | MIT | CC0 | CC-BY | Flexible | None |
| Long-term | Yes(b) | Yes(b) | Yes(b) | Yes(b) | No | Yes(b) |
| Assigns DOI | Yes | Yes | Yes | Yes | No | No |
| Code search option | Yes | Yes | No | No | Yes | No |
| Upload from GitHub | Yes | No | No | No | − | No |
| Cost to author | None | None | Possible | None | None | None |

(a) For the default licenses: flexible means that multiple license options are available from a menu, MIT is the Massachusetts Institute of Technology License, CC0 is the Creative Commons Zero License, and CC-BY is the Creative Commons Attribution License. DOI, digital object identifier. Zenodo, Figshare, Dryad, and PANGAEA are good options for archiving because they provide licenses, are long-term, and are citable. The cost to authors assumes that the code is publicly available. Note that the information in this table is subject to change.
(b) Long-term availability depends on continued government funding or the success of the companies involved.

Journals can have a significant impact on increasing the value of code within the ecology community. We believe that broad adoption of these suggestions (increasing the visibility and discoverability of code, requiring that code be archived, and increasing citation incentives for doing so) will motivate more authors to release both analysis code and scientific software.
By fostering reproducibility and reuse, greater availability of code can improve the quality and accelerate the pace of research in ecology.

K.A.S. was supported by the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery and the Moore/Sloan Data Science Environments Project at the University of Washington. This work was supported in part by the Gordon and Betty Moore Foundation Data-Driven Discovery Initiative (grants GBMF4553 to J.M.H. and GBMF4563 to E.P.W.). We thank Carl Boettiger for thoughtful comments that significantly improved the paper.

References