Git Concepts
Last updated
We use version control all the time in our lives. Think of something like Google Docs, where you can "rewind" your changes if you make a mistake and see how the document has changed over time.
The catch is that Google Docs, as the name suggests, is confined to a single document. Couldn't we just "Google Docify" our folders? Not quite. Here's an example of how the Google Docs method of version control starts to fall apart.
Google Docs takes a snapshot of your document every n amount of time, and tries to "blame" each change on someone (every character/line changed has to be attributed to someone).
Imagine I'm helping modify a cake recipe in Google Docs:
At x point in time, a snapshot is taken. There is a line that says "Add 5g of sugar".
At x + 1 point in time, I decide that's too much sugar and change the line to "Add 3g of sugar".
At x + 2 point in time, Person Y accidentally sits on his keyboard while the doc is open and replaces the line with "Add 2348g of sugar".
At x + 3 point in time, a snapshot is taken. Google Docs versioning now shows that Person Y has changed 5g to 2348g of sugar, and my change is lost to time.
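The timeline above can be replayed as a toy simulation. This is only a sketch of interval-based snapshotting, not how Google Docs actually works internally; the function name `snapshot_history` and the placeholder "original author" are assumptions made up for illustration.

```python
# Edits from the cake recipe example: (time, author, new text of the line).
edits = [
    (1, "me", "Add 3g of sugar"),
    (2, "Person Y", "Add 2348g of sugar"),
]


def snapshot_history(initial, edits, snapshot_times):
    """Return what a periodic snapshotter records: one entry per snapshot."""
    author, text = initial
    pending = sorted(edits)
    history = []
    for t in sorted(snapshot_times):
        # Apply every edit made up to this snapshot; only the last one survives.
        while pending and pending[0][0] <= t:
            _, author, text = pending.pop(0)
        history.append((t, author, text))
    return history


# Snapshots are taken at x (time 0) and x + 3 (time 3).
history = snapshot_history(("original author", "Add 5g of sugar"), edits, [0, 3])
for entry in history:
    print(entry)
```

The history only ever contains the 5g and 2348g versions. The 3g edit happened between two snapshots, so it was never recorded.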
Now imagine this problem on a large codebase of millions of lines, with hundreds of engineers contributing to different parts of it. How can one prevent something like this? We want each and every change to be well documented, justified, and, more importantly, reversible. These are the guarantees that version control systems like Git provide.
A commit is a snapshot of the entire repository at a point in time, plus some metadata. More specifically, it contains:
A hash, which is a (practically) unique identifier for the commit, sort of like your student ID
The author of the commit, the author's email, and the time of the commit
Each commit also has a parent commit (except the first commit)
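The fields listed above can be modelled in a few lines of Python. This is a minimal sketch and not Git's real object format: the `Commit` class, its field names, and the way the fields are joined before hashing are all assumptions for illustration. What it does share with Git is the idea that the identifier is a hash computed from the commit's contents, including the parent's hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    snapshot: str           # stand-in for the full repository contents
    author: str
    email: str
    timestamp: int          # seconds since the Unix epoch
    parent: Optional[str]   # hash of the parent commit, None for the first commit

    @property
    def hash(self) -> str:
        # Hash every field, so changing any of them changes the identifier.
        payload = "\n".join(
            [self.snapshot, self.author, self.email, str(self.timestamp), self.parent or ""]
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


first = Commit("Add 5g of sugar", "Alice", "alice@example.com", 1700000000, None)
second = Commit("Add 3g of sugar", "Alice", "alice@example.com", 1700000060, first.hash)

print(first.hash)
print(second.parent == first.hash)
```

Because the parent's hash is part of what gets hashed, rewriting an old commit would change the identifier of every commit that comes after it.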
We can "chain" commits by following the parent commit until we hit the first commit. If we do this for every commit, we get a directed acyclic graph.
Directed: Commits point to their parents.
Acyclic: Following parents never leads back to a commit you have already visited, so there cannot be commit cycles.
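Following parent links can be sketched as a simple loop. The commit ids (`c1` to `c4`) and the `ancestry` helper below are hypothetical, and the history is a plain dictionary rather than anything Git stores on disk.

```python
# Hypothetical history: each commit id maps to its parent's id.
# c3 and c4 share the parent c2, which is why the result is a graph, not a line.
parents = {
    "c1": None,   # first commit, no parent
    "c2": "c1",
    "c3": "c2",
    "c4": "c2",
}


def ancestry(commit_id, parents):
    """Follow parent links from commit_id back to the first commit."""
    chain = []
    current = commit_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


print(ancestry("c3", parents))
print(ancestry("c4", parents))
```

The loop is guaranteed to terminate only because the graph is acyclic: each step moves strictly closer to the first commit.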
Git relies on the core concept of a repository, which is essentially a parent folder that Git has been added to so that it can track changes to the folder and its contents (including sub-folders).
These repositories can exist either locally on your machine or remotely on an external server. This guide will look at both.
GitHub is an example of a hosted remote Git server, where you can create remote repositories and work on them locally, pushing your changes to the remote when you are ready. This is where the "decentralized" nature of Git comes from.
Think of it like having two versions of a Google Doc. When you are editing your document on a train, for example, you might lose connectivity, and you'll have an offline copy which is different from the online copy (the source of truth).
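The offline/online analogy can be sketched as two lists that drift apart and are later brought back in sync. This is a deliberately simplified toy: the `push` function is a made-up stand-in for what pushing achieves, and it assumes nobody else changed the remote in the meantime.

```python
# Toy model: a repository is just an ordered list of commit messages.
remote = ["Initial recipe"]   # copy on the server
local = list(remote)          # the local copy starts out identical

# Working offline: commits are recorded locally, the remote is untouched.
local.append("Reduce sugar to 3g")
local.append("Add baking time")
ahead_by = len(local) - len(remote)
print(ahead_by)


def push(local, remote):
    """Copy the commits the remote is missing (assumes remote is a prefix of local)."""
    if local[:len(remote)] != remote:
        raise ValueError("histories have diverged; this toy model cannot push")
    remote.extend(local[len(remote):])


push(local, remote)
print(local == remote)
```

Until `push` runs, the local copy is two commits ahead of the remote, just like the offline document on the train.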