Creative Commons License
This blog is licensed under a Creative Commons License.

Git from the bottom up

In my pursuit to understand Git, it’s been helpful for me to understand it from the bottom up – rather than look at it only in terms of its high-level commands. And since Git is so beautifully simple when viewed this way, I thought others might be interested to read what I’ve found, and perhaps avoid the pain I went through finding it.

The following article offers what I’ve learned on this journey so far. I hope it can help others to comprehend this wonderful system, and discover some of the joy I’ve experienced in the past few weeks. NOTE: After receiving more than fifty corrections by e-mail from very helpful readers, I’ve updated the PDF to reflect their input. The date at the front should read “Fri, 2 May 2008” if you have the latest version.

Here is a summary from the table of contents:

  • Introduction
  • Repository: Directory content tracking
  • Introducing the blob
  • Blobs are stored in trees
  • How trees are made
  • The beauty of commits
  • A commit by any other name…
  • Branching and the power of rebase
  • Index Cache: Meet the middle man
  • Taking the index cache farther
  • To reset, or not to reset
  • Last links in the chain: Stashing and the reflog

35 Comments

Your link to the pdf file is broken when accessed via feedreader.

http://feeds.feedburner.com/blog_assets/git.from.bottom.up.pdf

greets

Please check to see if the link is fixed now.

awesome, I look forward to reading it :)

I read the PDF and it was helpful in solidifying my mental image of git and introducing me to other tools. Thank you.

I think an interesting explanation to add would be blob injection, that is, two or more HEADs that have no shared history between them. I do that when I want to add meta-info about a cloned upstream repository.

The PDF doesn’t help me understand whether Git knows about “moving a function from one file to another”, or where (or if) compression and differencing come into play. Perhaps these are details not necessary for this level of understanding.

Your git-stash usage as a backup system sounds novel, but I suppose it doesn’t track files that are not registered (untracked, like new files).

Perhaps nitpicks: Is there an extraneous quote in the title or is this some special font for “t”? “Git’ …”? On page 25, “they’re” is broken across lines incorrectly.

John,

Thanks for this resource: as a newcomer to Git, the PDF was a very comprehensible and eye opening read.

I spotted one very minor grammar mistake: on p24 it should be “This approach has two distinct advantages” rather than “This approach two distinct advantages”.

I was a bit confused by your use of HEAD@{1} notation in the section on hard resets until I came to the section on stashing. Perhaps the material on recovering from an inadvertent hard reset could be moved into the stashing section?

Thanks again! Max

@piyo I’m not sure I follow what you mean about “blob injection”, could you share an example?

Also, you’re right, git-stash does not track unregistered files.
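For anyone who wants to see this for themselves, here is a minimal sketch. It assumes a reasonably recent Git (one that has `git stash push` and `--include-untracked`), and the file names and identity values are made up for the illustration.

```shell
#!/bin/sh
# Sketch: a plain stash leaves untracked files behind;
# --include-untracked sweeps them up too. All names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "tracked" > a.txt
git add a.txt
git commit -q -m "Initial commit"

echo "change" >> a.txt          # modification to a tracked file
echo "new" > untracked.txt      # a file Git has never been told about

# A plain stash saves the change to a.txt only.
git stash push -q
if [ -f untracked.txt ]; then left_behind=yes; else left_behind=no; fi

# Restore the change, then stash again, this time including untracked files.
git stash pop -q
git stash push -q --include-untracked
if [ -f untracked.txt ]; then swept=no; else swept=yes; fi

echo "left behind by plain stash: $left_behind"
echo "swept up with --include-untracked: $swept"
```

So stash can still work as a backup for new files, provided you ask for them explicitly.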

The curly “t” is a “stylistic alternate” within the OpenType font Garamond Premier Pro.

I’ve added yours and Max’s grammatical corrections. Thank you!

@Max I added a short note on the usage of HEAD@{1}, that it will be explained in the next section. I just thought it important to point out there so that people would connect it with the idea of restoring from an accidental reset.
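To make the connection concrete, here is a rough sketch of recovering from an accidental hard reset with HEAD@{1}. It assumes the reflog is enabled (the default for a non-bare repository); the file name and commit messages are invented for the example.

```shell
#!/bin/sh
# Sketch: undo an accidental hard reset using the reflog entry HEAD@{1}.
# File names and commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "one" > foo.c
git add foo.c
git commit -q -m "first"
echo "two" > foo.c
git commit -q -a -m "second"

lost=$(git rev-parse HEAD)      # the commit we are about to "lose"

# The accident: HEAD, the index, and the working tree all move back one commit.
git reset -q --hard HEAD~1

# HEAD@{1} names where HEAD pointed before its most recent move,
# which is the commit we just reset away from.
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)

echo "lost:      $lost"
echo "recovered: $recovered"
```

The two hashes printed at the end should match, and foo.c is back to its second version.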

Hey the pdf looks very nice, how did you produce it?

Do you know if there is something similar to this for mercurial?

thanks in advance

@MB The PDF was written using the word processor Mellel, the font Garamond Premier Pro from Adobe, and the drawing program OmniGraffle Pro for the diagrams.

I know of nothing similar for Mercurial, but I haven’t really looked either.

Regarding what I call “blob injection”, it is a way of adding blobs that are unrelated to the current repository’s history. In other words, adding another track of blobs with no relation between the existing blobs.

Actually, you’ve already touched on that with your CVS and Subversion import example in “Diving into Git”. However, you speedily sew these two tracks together with git-rebase. I, however, keep the tracks separate, because I use them for separate data, like where I got this repository from and how it should be checked out. This is a limited technique, probably only suitable for metadata.

An example:

$ git clone git://repo.or.cz/git.git
# or any other repo
$ mkdir checkout_info && cd checkout_info
$ git init
$ echo "This repository is from git://repo.or.cz/git.git" > .git_checkout_info
$ git add .git_checkout_info && git commit -m "Checkout info"
$ git tag checkout_info
$ cd ../git
$ git remote add checkout_info ../checkout_info/.git
$ git fetch checkout_info # this is where we insert unrelated info.
$ rm -rf ../checkout_info # a directory, so plain rm fails; no longer needed
$ gitk --all -d # notice there is nothing connecting the two tracks together.
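If gitk is not at hand, one way to check that the two tracks share no history is `git merge-base`, which exits non-zero when two commits have no common ancestor. The following is only a sketch using throwaway local repositories in place of the real clone; the directory names, file contents, and identity values are assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch: fetch an unrelated history into a repository and confirm that it
# shares no ancestor with HEAD. All names here are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the cloned upstream repository.
git init -q main
cd main
echo "hello" > greeting
git add greeting
git commit -q -m "Main history"
cd ..

# A second, unrelated repository holding only the metadata.
git init -q info
cd info
echo "This repository came from somewhere" > .git_checkout_info
git add .git_checkout_info
git commit -q -m "Checkout info"
git tag checkout_info
cd ..

# Bring the unrelated tag (and the objects behind it) into the main repository.
cd main
git fetch -q ../info refs/tags/checkout_info:refs/tags/checkout_info

# merge-base fails when there is no common ancestor.
if git merge-base HEAD checkout_info >/dev/null; then
  shared=yes
else
  shared=no
fi
echo "shared history: $shared"
```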

Ah, I actually use this feature too, to keep “side-band” data which is related to the repository, but doesn’t belong in any regular working tree.

I could have added a note about it in the PDF, but I think it might have added a bit more complexity than necessary. This is the kind of thing for you to blog about so I can link there! :)

Would it be possible to see the article in a format better suited to on-screen reading, like HTML or plain text? Maybe just the words without the pictures?

I have no easy way to convert it to HTML at present.

Excellent document even for people accustomed to Git, in order to have a deeper understanding of the beast.

Keep up the good work!

Thank you very much for writing this. It really increased my understanding of git.

Thank you for this paper, it was very useful to me!

I was wondering if you could license your work under a CC license (or something similar) so I could translate it into Italian and share it.

Masci, consider it done. Just e-mail me so that I’m certain to get done what you need for the translation to happen.

Very well written paper, thanks a lot. I was very glad to see that a lot of the needs of developers are taken into account by Git.

The funny thing is that I have written a pattern language about CM, and most (though not all) of its techniques are covered by Git. If you would like to have a look at my paper, just drop me an e-mail.

Concerning your expression “the index”: I would rather call this a “temporary repository”, which might be a better fit for its intention. But anyway, now it is too late.

However, from my point of view a “staging area” might not really be covered by “the index”. I see a staging area developed with the use of a multi-dimensional file system (UnionFS or aufs), because then the time required to share a change with other developers is

1) independent of the size of the code

2) independent of the number of developers

A significant advantage for high-speed development; I have used something like this for over a century.

best regards

Juergen

Thank you very much for the work.

Nice and very well thought. :-)

Thanks John

Very helpful!

Your bottom-up tutorial helped me grok git’s guts in an easy (and entertaining) way.

Thank you a lot for taking the time to put together this quality document.

Thank you so much! This document was what I needed to actually get git. I knew it was good but without the understanding of how it works, it was a real confusion whenever anything went a bit wrong.

Thank you.

Great exposition; many thanks.

I do have one question, though. In “Branching and the power of rebase”, you note that “the “base” of the Z branch is A, while the base of the D branch continues back to some unlabeled commit in the past”. I’m probably being dense, but I don’t see how/why the “base”(s) of these two branches should be different: Why does the Z branch ‘stop’ at A when the D branch doesn’t?

David, technically speaking you are right, they can both be viewed as independent branches with the same ancestor commit. I suppose I meant that Z stops at A only for the sake of considering Z a “branch off of A”. I’ll see about rewording it.

Excellent writing, thanks!

The url for the git-core tutorial has changed slightly - it is now at http://www.kernel.org/pub/software/scm/git/docs/gitcore-tutorial.html

Thanks for the update. I’ll include this among the next round of edits.

Thank you very much for this; as a bottom-up guy it made me understand git better than dozens of other tutorials. A small correction:

This means that when you create a tree from your index and store it under a commit (all of which is done by commit), you are also, inadvertently adding that commit to the reflog, which can be viewed using the following command:

You got an extra comma after “also”.

Hi John,

a comment to the paragraph about reset:

$ git reset HEAD foo.c

actually isn’t a mixed reset. If you specify a path, HEAD is never touched.

So

$ git reset HEAD~3

makes a (mixed) reset to HEAD~3, so your HEAD changes.

$ git reset HEAD~3 foo.c

only changes the index entry for foo.c though.
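The difference is easy to observe in a scratch repository. Below is a small sketch, with a made-up file and commit messages, that compares HEAD before and after each form of reset.

```shell
#!/bin/sh
# Sketch: "git reset <commit> -- <path>" rewrites only the index entry,
# while "git reset <commit>" (mixed) moves HEAD. Names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "one" > foo.c
git add foo.c
git commit -q -m "first"
echo "two" > foo.c
git commit -q -a -m "second"

before=$(git rev-parse HEAD)

# Path form: copy foo.c from HEAD~1 into the index, nothing else.
git reset -q HEAD~1 -- foo.c
after_path=$(git rev-parse HEAD)
staged=$(git show :foo.c)       # what the index now holds for foo.c
working=$(cat foo.c)            # the working tree copy is untouched

# Commit form: a mixed reset, so HEAD itself moves to HEAD~1.
git reset -q HEAD~1
after_commit=$(git rev-parse HEAD)

echo "HEAD unchanged by path reset: $([ "$before" = "$after_path" ] && echo yes || echo no)"
echo "index has '$staged', working tree has '$working'"
echo "HEAD moved by commit reset:   $([ "$before" != "$after_commit" ] && echo yes || echo no)"
```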

Hello John

Above, you say “The date at the front should read “Fri, 2 May 2008” if you have the latest version.”

I took this to mean that the file http://ftp.newartisans.com/pub/git.from.bottom.up.pdf would carry that date. However, the file downloaded today contains “Thu, 11 Sep 2008” just after the title.

I am confused as to which or where is the latest version of your document.

Thanks for letting me know, I’ll try to rectify the situation this week, as well as incorporating several recent corrections that were sent in.

this is an excellent write up; I’ve been reading much into git and i found this to be conceptually useful and pleasing to the visual whole of my brain.

i am in the process of converting my company to git, and i intend to use git hooks to sync merges on specific branches from our dev server to ALL distributed production servers (web servers) transparently, verify syntax in certain files, and update our custom internal ticketing system. we will be able to develop, sync to staging, and sync to production without ever leaving the dev server.

thank you for your investment into this document; i expect many others will find it as useful as i have.

You deserve a lot of praise for this article. It’s well written and looks great. Far too few authors care about aesthetics although it makes a text so much easier to read and follow.

I will give this to my interested co-workers.

Thank you!

John, thanks for the work you put into this. Since I am coming from SVN, I’m really struggling to “git” git, and this approach is usually what works better for me than tutorials and the like.

One issue is confusing me. You defined several terms at the beginning, but did not define “commit”. Coming from SVN, there appears to be a subtle but important difference. In particular, this, from page six:

“This first commit added my greeting file to the repository. It contains one Git tree…”

is ambiguous. On first reading, I took “it” to refer to the repository, but subsequent statements are inconsistent with that, and seem to point toward “it” referring to the commit. Which further implies that a commit is more a piece of data than an action, or simply a change in the state of the single repository tree as it is in SVN.

I’m moving on with a big asterisk next to my tentative understanding of what a commit actually is in the system.

Great article, indeed.

@Kyle: I’m a git newbie and SVN convert too, but looking at http://eagain.net/articles/git-for-computer-scientists/ I guess that each commit has one tree.
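One way to see that a commit is a piece of data pointing at exactly one tree is to print the raw commit object with `git cat-file`. This is just a sketch in a throwaway repository; the file name and message are borrowed loosely from the article’s greeting example, and the identity values are placeholders.

```shell
#!/bin/sh
# Sketch: inspect a commit object and count the trees it references.
# The repository contents and identity values are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "Hello, world!" > greeting
git add greeting
git commit -q -m "Added my greeting"

# The commit is an object in its own right: a tree line, author and
# committer lines, then the message.
git cat-file -p HEAD

object_type=$(git cat-file -t HEAD)
tree_count=$(git cat-file -p HEAD | grep -c '^tree ')
tree_type=$(git cat-file -t 'HEAD^{tree}')

echo "HEAD is a $object_type referencing $tree_count $tree_type"
```

A commit after the first would also carry one or more `parent` lines, but still only one `tree` line.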

Hi John, I just started to read your GFTBU. It’s so very well written and has such a good intro; it really is the right way things should be taught. Brilliant. Just some feedback… thanks.

About this Entry

This page contains a single entry by John Wiegley published on April 27, 2008 6:32 PM.
