Code Management Guidelines

R and GitHub Starter Kit for New Team Members

Author

Florencia D’Andrea

Published

December 23, 2024

GitHub Organization: www.github.com/StringhiniLab

Goal

The goal of this manual is to provide the minimum necessary guidelines for new members of Dr. Silvia Stringhini’s lab to follow agreed-upon practices in code management.

Introduction

The use of programming languages has become an essential part of data analysis for most researchers today. In this context, a basic skill set in computer science is key to ensuring reliable and reproducible results (Wilson et al. 2017; Hicks 2023; Abdill, Talarico, and Grieneisen 2024). Although a variety of educational materials, tutorials, and recommended practices specifically designed to train researchers are available (The Carpentries; Our Coding Club; The Turing Way Community 2023; CodeRefinery Project), there is a trade-off: adopting and practicing these techniques often requires significant effort, taking time away from researchers’ primary fields of study (Allen and Mehler 2019; Goldsmith et al. 2021; Hicks 2023).

One consequence of insufficient training is that researchers may be unsure whether their code is written correctly, which makes them less willing to share their analyses (Gomes et al. 2022). As a result, fewer publications include available code, which undermines the reproducibility and transparency of scientific research (Gomes et al. 2022; Sharma et al. 2024). The problem is exacerbated by the lack of incentives in the scientific system: many authors do not share their code despite the benefits of making it open source (Allen and Mehler 2019; Melvin et al. 2022; Bertram et al. 2023; Tazare et al. 2024; Xu et al. 2025).

Researchers should be encouraged to adopt best practices and to seek training in computational tools that facilitate or enhance their work. However, leaving code management decisions entirely in their hands could have negative consequences for a research group.

Ten reasons to define code management practices from day one

Would the problem be solved if future new members of the lab arrived with better training in data science? No. We believe the research group should still define its priorities when it comes to managing code.

There are several benefits to defining clear minimum guidelines and basic computational skills from the moment new members join the lab:

  1. Avoid messy projects from the start.
    Centralizing data analyses on a GitHub Organization and creating standards for pushing code promotes improved repository structuring, version control, and better-documented code, ensuring reproducibility from the project’s inception.

  2. Implement minimum documentation and project management best practices.
    Defining group-level criteria for code and data management facilitates collaboration, saving time and avoiding errors.

  3. Focus on domain-specific skills first.
    Identifying domain-specific computational skills can save time for new researchers.
    This knowledge is sometimes shared in publications tailored to each discipline, but it is too specific to be covered by general training courses and tutorials for scientists. The only exception we know of is Data Carpentry (Data Carpentry 2024).

  4. Early peer review.
    Sharing analyses with team members in private repositories allows for valuable feedback. Although initially restricted, this practice fosters confidence in making code publicly accessible upon publication.

  5. Define a set of practices that should not be overlooked.
    Not all researchers who take a course in Git and GitHub will make their code available unless guidelines state whether they are expected to do so, and how. Without such guidelines, each researcher will adopt these practices to a different degree.

  6. More efficient use of time.
    A researcher may take a workshop on a computational tool only at an advanced stage of a project, after decisions about code organization, documentation, and file structure have already been made. Making those decisions well from the beginning saves valuable time.

  7. Maintain the group’s research history.
    This approach helps create and standardize a historical archive of the group’s data analyses, ensuring continuity and avoiding dependence on researchers leaving behind their code and data when they move on.

  8. Facilitate exchange of ideas about data and code management among team members.
    Creating guidelines helps build a body of knowledge that can be improved over time with contributions from students and researchers, allowing for discussions on which practices should be added, prioritized, or removed.

  9. Make informed decisions about what to learn next.
    A researcher may hear that they should learn to use GitHub. By explaining from the beginning what GitHub is and the minimum knowledge required, it becomes easier for them to assess if they should focus on learning additional skills or not. Supporting new members of the research group in adopting basic computational techniques from the start lowers the barrier for researchers to explore other tools early.

  10. Adoption of open science practices.
    If the group aims to begin making research code available, these guidelines and training will encourage researchers to release their code as open source.

How to cite this book?

D’Andrea, F., & Stringhini, S. Code Management Guidelines: R and GitHub Starter Kit for New Team Members. https://github.com/StringhiniLab/GitHubProceduresLab. Available at: https://stringhinilab.github.io/GitHubProceduresLab/ DOI: https://doi.org/10.5281/zenodo.14510774

References

Abdill, Richard, Emma Talarico, and Laura Grieneisen. 2024. “A How-to Guide for Code Sharing in Biology.” PLoS Biology 22 (9): e3002815.
Allen, Christopher, and David MA Mehler. 2019. “Open Science Challenges, Benefits and Tips in Early Career and Beyond.” PLoS Biology 17 (5): e3000246.
Bertram, Michael G, Josefin Sundin, Dominique G Roche, Alfredo Sánchez-Tójar, Eli SJ Thoré, and Tomas Brodin. 2023. “Open Science.” Current Biology 33 (15): R792–97.
CodeRefinery Project. “CodeRefinery Lessons.” https://coderefinery.org/lessons/.
Data Carpentry. 2024. “Data Carpentry.” https://datacarpentry.org/.
Goldsmith, Jeff, Yifei Sun, Linda Fried, Jeannette Wing, Gary W Miller, and Kiros Berhane. 2021. “The Emergence and Future of Public Health Data Science.” Public Health Reviews 42: 1604023.
Gomes, Dylan GE, Patrice Pottier, Robert Crystal-Ornelas, Emma J Hudgins, Vivienne Foroughirad, Luna L Sánchez-Reyes, Rachel Turba, et al. 2022. “Why Don’t We Share Data and Code? Perceived Barriers and Benefits to Public Archiving Practices.” Proceedings of the Royal Society B 289 (1987): 20221113.
Hicks, Daniel J. 2023. “Open Science, the Replication Crisis, and Environmental Public Health.” Accountability in Research 30 (1): 34–62. https://doi.org/10.1080/08989621.2023.1962713.
Melvin, Ryan L, Steven J Barker, Joe Kiani, and Dan E Berkowitz. 2022. “Pro-Con Debate: Should Code Sharing Be Mandatory for Publication?” Anesthesia & Analgesia 135 (2): 241–45.
Our Coding Club. “Setting up a GitHub Repository for Your Lab - Version Control and Code Management with GitHub.” https://ourcodingclub.github.io/tutorials/git-for-labs/.
Sharma, Nitesh Kumar, Ram Ayyala, Dhrithi Deshpande, Yesha Patel, Viorel Munteanu, Dumitru Ciorba, Viorel Bostan, et al. 2024. “Analytical Code Sharing Practices in Biomedical Research.” PeerJ Computer Science 10: e2066.
Tazare, John, Shirley V Wang, Rosa Gini, Daniel Prieto-Alhambra, Peter Arlett, Daniel R Morales Leaver, Caroline Morton, et al. 2024. “Sharing Is Caring? International Society for Pharmacoepidemiology Review and Recommendations for Sharing Programming Code.” Pharmacoepidemiology and Drug Safety 33 (9): e5856.
The Carpentries. “The Carpentries Teaches Foundational Coding and Data Science Skills to Researchers Worldwide.” https://carpentries.org/.
The Turing Way Community. 2023. “The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research.” Zenodo. https://doi.org/10.5281/zenodo.7625728.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” PLOS Computational Biology 13 (6): 1–20. https://doi.org/10.1371/journal.pcbi.1005510.
Xu, Edward, Anna Catharina V. Armond, David Moher, and Kelly Cobey. 2025. “Key Challenges in Epidemiology: Embracing Open Science.” Journal of Clinical Epidemiology 178: 111618. https://doi.org/10.1016/j.jclinepi.2024.111618.