Why should you use Git/GitHub?

1 Collaboration and Version Control

Git allows you to keep track of all the versions of your analysis and track changes even when all your collaborators are working on different computers. The utility of Git-based version control might be apparent for massive collaborative projects, like the dplyr package with over 250 contributors.

But even for smaller projects (or if you work alone), there are benefits to avoiding the “file renaming” method, for example:

  • You can collaborate on the most up-to-date versions of the analysis asynchronously

    https://doi.org/10.1186/1751-0473-8-7

  • You can see what you changed, who made the change, and when between versions by looking at the “diff,”even years later

  • If you make a mistake and break your code or want to try something new, you can always revert to the last working version

  • It allows you to combine organizing and sharing your work into your daily workflow, and

  • Make revisiting your code easier for future you.

2 Reproducibility and Sharing

2.1 Reproducibility

We make any small decisions during data analysis that can significantly impact the outcome of an analysis. Often, these choices are left out of the methods sections of publications. Hosting your work on GitHub in a public repository provides a line-for-line method for your analysis which can be reviewed by others interested in your work or looking to use similar methods.

2.2 Sharing

GitHub provides great options for hosting and sharing your research. (This is a not an exhaustive list)

  • GitHub Markdown Documents: I’ve used this in the past for Knit Rmarkdown files (It’s not the cleanest approach but here is an example from a recent project)

  • Websites

  • Books/reports

3 Recover from mistakes

See the section on recovering from mistakes.