-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path_git-basics.qmd
93 lines (78 loc) · 4.63 KB
/
_git-basics.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
In our work lives, we regularly work with files, either creating,
editing, moving, copying, or deleting them. These files can be
anything from text documents, to images, to code. When we work on these
files, we often make changes to them, and sometimes many changes. We
might want to keep track of these changes, so we can see *what* we've
done, *when* we did it, *why* we did it, and *who* did it. This is both helpful for
potential collaborators and our future selves.
If a file has the ability to internally "track changes", like Word
does, you may have used that before, but likely only when getting
feedback from others. On the file level, you may have "tracked changes"
informally by saving multiple versions of a file with different names,
like in the example image below.

Does this way of saving files and keeping track of versions look
familiar? The above image may exaggerate how some people's versioning looks
like, but there is some truth to it: It is the most common approach to
"version control".
This "informal" version control isn't ideal because it involves multiple
copies of the same file. It makes it difficult to keep track of specific
changes and find the right version of the files.
There are, however, "formal" version control systems that automatically
manage changes to files. One of the world's most popular version control
systems is called
[Git](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F).
Git is used by millions of people around the world, including thousands
of organisations and researchers.
With Git you can create snapshots of file changes, known as *commits*. Each commit
captures:
- What specific changes were made to the file or files.
- Who made the changes to the files.
- When they made the changes to the files.
Each commit also has a short message attached to it that can
describe *why* the changes were made.
Git stores these commits in a history log. The history log allows you to
quickly go back and explore the changes made to files, along with a
message describing the changes. This is extremely useful when you
revisit your own work after a long time (because you *will* forget
things) and when you work in groups or with collaborators.
Git only tracks changes to files *within a specific folder* (and it's
sub-folders). In Git terminology, this folder is called a
**repository** (or a *repo* for short). The best way to use a repository
is to store all files related to a specific project, like a research
project, in this repository (this folder). This way, you can track all
changes made to all files in the project. It keeps things more organised and
self-contained, since everything related to a project is in one place.
Any type of file can be stored in the repository, including both
code and other non-code based files like Word or images. However, Git has
more features and tools for tracking specific changes when the file is
text-based, like a `.txt`, `.csv`, or code. Since these text-based files
are literally only text characters, it is easier to track the changes to
exact lines of text. Unlike files like images, or Word documents (that
actually aren't just text), there are no "lines" to track changes on.
To understand how powerful formal version control like Git is, consider about these
questions:
- How many files of different versions of a scientific document or
thesis do you have laying around after getting feedback from your
supervisor or co-authors?
- Have you ever wanted to test an analysis in a
file but ended up creating a new one to avoid modifying the original?
- Have you ever deleted something and wished you hadn't?
- Have you ever forgotten what you were doing on a project, or why you
chose a particular strategy or analysis?
All these problems can be fixed by using formal version control! There
are many good reasons to use version control, especially in science:
- Transparency of work done to demonstrate or substantiate your
scientific claim and protect against accusations of fraud.
- Claim to first discovery, since you have a time-stamped history of
your work.
- Evidence of contributions and work, since who does what is tracked.
- Easier collaboration, because you can work on a single file/folder
in a single central location rather than emailing file versions
around.
- Organized files and folders, since there is one single project
folder and one single version of each file, rather than multiple
versions of the same file.
- Less time spent on finding things related to your projects, because
everything is organized and in one place.