A Brief Tutorial on a Shared Git Repository

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2011/01/23/git-shared-repository-tutorial.html

A while ago, I set up Git for a group privately sharing the same
central repository. Specifically, this is a tutorial for those who would
want to have a Git setup that is a little bit like a SVN repository: a
central repository that has all the branches that matter published there
in one repository. I found this file today floating in a directory of
“things I should publish at some point”, so I decided just to
put it up. Every time I came across this file, it reminded me that I should
publish it, and it’s really morally wrong (IMO) to keep generally useful
technical information private, even when it’s only laziness that’s causing
it.

Before you read this, note that most developers don’t use Git this way,
particularly with the advent of shared
hosting facilities like Gitorious, as systems like Gitorious solve the
odd problems that this tutorial addresses. When I originally
wrote this (more than a year ago), the only well-known project that I
found using a system like this was Samba; I haven’t seen a lot of other
projects that do this. Indeed, this process is not really what Git is
designed to do, but sometimes groups that are used to SVN expect there to be
a “canonical repository” that has all the contents of the
shared work under one proverbial roof, and set up a “one true Git
repository” for the project from which everyone clones.

Thus, this tutorial is primarily targeted at a user who is mostly familiar
with an SVN workflow and who has ssh access to
host.example.org, which hosts a Git repository (usually writable by
multiple people) living in the directory
/git/REPOSITORY.git/.

Ultimately, the stuff that I’ve documented herein is basically to fill
in the gaps that I found when reading the following tutorials:

So, here’s my tutorial, FWIW. (I apologize that I commit the mortal sin
of tutorial writing: I drift wildly between second-person-singular,
first-person-plural, and passive-voice third-person. If someone sends
me a patch to the HTML file that fixes this, I’ll apply it. 🙂)

Initial Setup

Before you start using git, you should run these commands to let it
know who you are so your info appears correctly in commit logs:

         $ git config --global user.email you@example.org
         $ git config --global user.name "Your Real Name"
        

Examining Your First Clone

To get started, first we clone the repository:

          $ git clone ssh://host.example.org/git/REPOSITORY.git/
        

Now, note that Git almost always operates in terms of
branches. Unlike Subversion, Git’s branches are first-class citizens and
most operations in Git operate around a branch. The default branch is
often called “master”, although I tend to avoid using the
master branch for much, mainly because everyone who uses git has a
different perception of what the master branch should embody. Therefore,
giving all your branches more descriptive names is helpful. But, when you
first import something into git (for example, from existing Subversion
trees), everything from Subversion’s trunk is thrown on the master
branch.

So, we take a look at the result of that clone command. We have a new
directory, called REPOSITORY, that contains a “working
checkout” of the repository, and under that there is one special
directory, REPOSITORY/.git/, which is a full copy of the repository. Note
that this is not like Subversion, where what you have on your local
machine is merely one view of the repository. With Git, you have a full
copy of everything. However, an interesting thing has been done on your
copy with the branches. You can take a look with these commands:

          $ git branch
          * master
          $ git branch -r
          origin/HEAD
          origin/master
        

The first list of branches shows the branches that are personal and local
to you. (With no options, git branch
shows you only “local” branches; -r means
“remote” branches. You can also use -a to see all of
them.) Unless you take action to publish your local branches in some way,
they will be your private area to work in and live only on your
computer. (And be aware: they are not backed up unless you back them up!)
The remote ones, which all start with “origin/”, track the
progress on the shared repository.

(Note the term “origin” is a standard way of referring to
“the repository from whence you cloned”, and
origin/BRANCH refers to “BRANCH as it looks in the
repository from whence you cloned”. However, there is nothing
magical about the name “origin”. It’s set up to DTRT in your
WORKING-DIRECTORY/.git/config file, and the clone command set it
all up for you, which is why you have them now.)
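
If you want to see for yourself that “origin” is only configuration, here is a minimal sketch. It assumes a throwaway bare repository in a temporary directory standing in for host.example.org; the directory names are made up for illustration and are not part of the setup described above.

```shell
set -e
# Throwaway sandbox: a local bare repository stands in for the shared server.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/REPOSITORY.git"

# Cloning an empty repository prints a harmless warning, which we discard.
git clone -q "$sandbox/REPOSITORY.git" "$sandbox/REPOSITORY" 2>/dev/null
cd "$sandbox/REPOSITORY"

# "origin" is nothing more than a couple of entries in .git/config.
origin_url=$(git config --get remote.origin.url)
origin_fetch=$(git config --get remote.origin.fetch)
echo "url:   $origin_url"
echo "fetch: $origin_fetch"
```

The fetch line is the rule that maps every branch on the server to an origin/BRANCH name in your clone.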

Get to Work

The canonical way to “get moving” with a new task in Git is
to somehow create a branch for it. Branches are designed to be cheap and
quick to create so that users will not be shy about creating a new one.
Naming conventions are your own, but generally I like to call a
branch USERNAME/TASK when I’m still not sure exactly what I’ll be
doing with it (i.e., who I will publish it to, etc.) You can always merge
it back into another branch, or copy it to another branch (perhaps using a
more formal name) later.

Where do you Start Your Branch From?

Once a repository exists, each branch in the repository comes from
somewhere — it has a parent. These relationships help Git know how
to easily merge branches together. So, the most typical procedure of
starting a new branch of your own is to begin with an existing branch.
The git checkout command is the easiest to use to start this:

           $ git checkout -b USERNAME/feature origin/master
        

In this example, we’ve created our own local branch, called
USERNAME/feature, and it’s started from the current state
of origin/master. When you are getting started, you will
probably want to base your new branches off of ones that
exist on the origin. This isn’t a rule; it’s just less confusing
for a newbie if all your branches have a parent revision that lives on the
server.

Now, it’s important to note here that no branch stands still. It’s
best to think about a branch as a “moving pointer” to a linked
list of some set of revisions in the repository.

Every revision stored in git, local or remote, has a SHA1 identifier, which
is computed from the revision’s own content together with the identifiers
of the revisions before it.

Meanwhile, the only two substantive differences between one of these
SHA1 identifiers and an actual branch are that (a) Git keeps changing what
identifier the branch refers to as new commits come in (aka it moves the
branch’s HEAD), and (b) Git keeps track of the history of identifiers the
branch previously referred to.
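
The “moving pointer” idea can be watched in a scratch repository. This is only a sketch: the sandbox directory, the file name, and the identity are all hypothetical, chosen so that nothing touches your real work.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of local defaults
git config user.name "Example User"       # hypothetical identity, local to this sandbox
git config user.email "user@example.org"

echo one > FILE
git add FILE
git commit -q -m "first revision"
before=$(git rev-parse master)            # identifier the branch points at now

echo two >> FILE
git commit -q -a -m "second revision"
after=$(git rev-parse master)             # same branch name, new identifier
parent=$(git rev-parse master^)           # the previous revision is its parent

echo "master was $before"
echo "master is  $after"
# Git also keeps the history of identifiers the branch referred to.
git reflog show master
```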

So, above, when we asked git checkout to create a new branch called
USERNAME/feature based on origin/master, the two
important things to realize are that (a) your new branch has its HEAD
pointing at the same head that is currently the HEAD of
origin/master, and (b) you got a new list to start adding
revisions in the new branch.

We didn’t have to use a branch for that. We could have simply started
our branch from any old SHA1 of any revision. We happened to want to
declare a relationship with the master branch on the server in
this case, but we could have easily picked any SHA1 from our git log and
used that one.
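
Here is a small sketch of starting a branch from an arbitrary SHA1 rather than from a branch. The scratch repository, the three toy revisions, and the branch name USERNAME/experiment are assumptions made up for the example.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.org"

for n in 1 2 3; do
    echo "revision $n" > FILE
    git add FILE
    git commit -q -m "revision $n"
done

# Pick an older revision, as you might from git log: the one before the latest.
old_sha=$(git rev-parse master~1)

# Start the new branch at that identifier instead of at a branch.
git checkout -q -b USERNAME/experiment "$old_sha"
current=$(git rev-parse HEAD)
cat FILE
```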

Do Not Fear the checkout

Every time you run a git checkout SOMETHING command, your
entire working directory changes. This normally scares Subversion users;
it certainly scared me the first time I used git checkout
SOMETHING. But, the only reason it is scary is because svn
switch, which is the roughly analogous command in the Subversion
world, so often doesn’t do something sane with your working copy. By
contrast, switching branches and changing your whole working directory is
a common occurrence with git.

Note, however, that git checkout will refuse to switch branches if
doing so would overwrite uncommitted changes in your directory (which, BTW,
also makes it safer than svn switch). However, don’t be too Subversion-user-like and
therefore afraid to commit things. Remember, with Git (and unlike with
Subversion), committing and publishing are two different operations. You
can commit to your heart’s content on local branches and merge or push
into public branches later. (There are even commands to squash many
commits into one before putting it on a public branch, in case you don’t
want people to see all the intermediate goofiness you might have done.
This is why, BTW, many Git users commit as often as an SVN user would save
in their editors.)

However, if you must switch checkouts but really do fear making
commits, there is a tool for you: look into git stash.
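
For the curious, a minimal git stash round trip looks something like the following. The scratch repository and the branch name other-branch are hypothetical, and output is sent to /dev/null only to keep the sketch quiet.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.org"

echo "committed" > FILE
git add FILE
git commit -q -m "initial revision"
git branch other-branch

echo "half-finished idea" >> FILE         # an uncommitted change
git stash > /dev/null                      # shelve it; the working directory is clean again
git checkout -q other-branch               # go do something else...
git checkout -q master                     # ...and come back later
git stash pop > /dev/null                  # put the shelved change back
tail -n 1 FILE
```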

Share with the Group

Once you’ve been doing some work, you’ll end up with some useful work
finished on a USERNAME/feature branch. As noted before, this is
your own private branch. You probably want to use the shared repository
to make your work available to others.

When using a shared Git repository, there are two ways to share your
branches with your colleagues. The first procedure is when you simply
want to publish directly on an existing branch. The second is when you
wish to create your own branch.

Publishing to Existing Branch

You may choose to merge your work directly into a known branch on the
remote repository. That’s a viable option, certainly, but often you want
to make it available on a separate branch for others to examine, even
before you merge it into something like the master branch.
We discuss the slightly more complicated new branch publication next, but
for the moment, we can consider the quicker process of publishing to an
existing branch.

Let’s consider when we have work on USERNAME/feature and we
would like to make it available on the master branch. Make sure
your USERNAME/feature branch is clean (i.e., all your changes are
committed).

The first thing you should verify is that you have what I call a
“local tracking branch” (this is my own term that I made up, I
think; you won’t likely see it in other documentation) that is tied
directly with the same name to the origin. This is not completely
necessary, but it makes it much more convenient to keep track of what you
are doing. To check, do a:

           $ git branch -a
           * USERNAME/feature
             master
             origin/master
        

In the list, you should see both master and
origin/master. If you don’t have that, you should create it
with:

           $ git checkout -b master origin/master
        

So, either way, you want to be on the master branch. To get
there if it already existed, you can run:

           $ git checkout master
        

And you should be able to verify that you are now on master with:

           $ git branch
           * master
           ...
        

Now, we’re ready to merge in our changes:

           $ git merge USERNAME/feature
           Updating ded2fb3..9b1c0c9
           Fast forward
           FILE ...
           N files changed, X insertions(+), Y deletions(-)
        

If you don’t get any message about conflicts, everything is fine. Your
changes from USERNAME/feature are now on master. Next,
we publish it to the shared repository:

          $ git push
          Counting objects: N, done.
          Compressing objects: 100% (A/A), done.
          Writing objects: 100% (A/A), XXX bytes, done.
          Total G (delta T), reused 0 (delta 0)
          refs/heads/master: IDENTIFIER_X -> IDENTIFIER_Y
          To ssh://host.example.org/git/REPOSITORY.git
           X..Y  master -> master
        

Your changes can now be seen by others when they git pull (See
below for details).
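
If you would like to rehearse that whole sequence without touching the real server, something like this sketch should do. It assumes a bare repository in a temporary directory playing the part of host.example.org, and it spells out git push origin master rather than relying on whatever a plain git push defaults to in your version of Git.

```shell
set -e
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/REPOSITORY.git"
git --git-dir="$sandbox/REPOSITORY.git" symbolic-ref HEAD refs/heads/master

git clone -q "$sandbox/REPOSITORY.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.org"

# Seed the shared master so there is something to branch from.
echo base > FILE
git add FILE
git commit -q -m "initial revision"
git push -q origin master

# Do the work on a private branch, all committed.
git checkout -q -b USERNAME/feature origin/master
echo feature >> FILE
git commit -q -a -m "add feature"

# Merge into the local master, then publish.
git checkout -q master
git merge -q USERNAME/feature
git push -q origin master

server_master=$(git --git-dir="$sandbox/REPOSITORY.git" rev-parse master)
local_feature=$(git rev-parse USERNAME/feature)
echo "server master: $server_master"
```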

Publishing to a New Branch

Suppose that, instead of immediately putting the feature
on the master branch, you wanted simply to mirror your personal
feature branch to the rest of your colleagues so they can try it out
before it officially becomes part of master. To do that, first,
you need to tell Git that you want to make a new branch on the shared
repository. In this case, too, you have to use the git push command.
(It is a catch-all command for any operations you want to do to the
remote repository without actually logging into the server where the
shared Git repository is hosted. Thus, not surprisingly, nearly any
git push command you can think of will require you to be
net.connected.)

So, first let’s create a local branch that has the actual name we want
to use publicly. To do this, we’ll just use the checkout command, because
it’s the most convenient and quick way to create a local branch from an
already existing local branch:

          $ git branch -l
          * USERNAME/feature
            master
            ...
          $ git checkout -b proposed-feature USERNAME/feature
          Switched to a new branch “proposed-feature”
          $ git branch -l
          * proposed-feature
            USERNAME/feature
            master
            ...
        

Now, again, we’ve only created this branch locally. We need an
equivalent branch on the server, too. This is where git push comes in:

          $ git push origin proposed-feature:refs/heads/proposed-feature
        

Let’s break that command down. The first argument for push is always
“the place you are pushing to”. That can be any sort of git
URL, including ssh://, http://, or git://. However, remember that the
original clone operation set up this shorthand “origin” to
refer to the place from whence we cloned. We’ll use that shorthand here
so we don’t have to type out that big long URL.

The second argument is a colon-separated item. The left hand side is
the local branch we’re pushing from on our local repository, and
the right hand side is the branch we are pushing to on the remote
repository.

(BTW, I have no idea why refs/heads/ is necessary. It seems
you should be able to say proposed-feature:proposed-feature and git would
figure out what you mean. But, in the setups I’ve worked with, it doesn’t
usually work if you don’t put in refs/heads/.)

That operation will take a bit to run, but when it is done we see
something like:

          Counting objects: 35, done.
          Compressing objects: 100% (31/31), done.
          Writing objects: 100% (33/33), 9.44 MiB | 262 KiB/s, done.
          Total 33 (delta 1), reused 27 (delta 0)
          refs/heads/proposed-feature: 0000000000000000000000000000000000000000
                                         -> CURRENT_HEAD_SHA1_SUM
          To ssh://host.example.org/git/REPOSITORY.git/
           * [new branch]      proposed-feature -> proposed-feature
        

In older Git clients, you may not see that last line, and you won’t get
the origin/proposed-feature branch until you do a subsequent pull. I
believe newer git clients do the pull automatically for you.

Reconfiguring Your Client to see the New Remote Branch

Annoyingly, as the creator of the branch, we have some extra config
work to do to officially tell our repository copy that these two branches
should be linked. Git didn’t know from our single git push
command that our repository’s relationship with that remote branch was
going to be a long term thing. To marry our local branch to
origin/proposed-feature, we must use the
commands:

          $ git config branch.proposed-feature.remote origin
          $ git config branch.proposed-feature.merge refs/heads/proposed-feature
        

We can see that this branch now exists because we find:

          $ git branch -a
          * proposed-feature
            USERNAME/feature
            master
            origin/HEAD
            origin/proposed-feature
            origin/master
         

After this is done, the remote repository has a
proposed-feature branch and, locally, we have a
proposed-feature branch that is a “local tracking
branch” of origin/proposed-feature. Note that
our USERNAME/feature, where all this stuff started from, is
still around too, but can be deleted with:

          $ git branch -d USERNAME/feature
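
As an aside, reasonably recent versions of Git have a -u (--set-upstream) option to git push that appears to do the push and the two git config steps in one go. The sketch below assumes a throwaway bare repository in place of the real server; note that in this sandbox the short branch name is enough, without refs/heads/, though I can’t promise that holds for every setup.

```shell
set -e
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/REPOSITORY.git"

git clone -q "$sandbox/REPOSITORY.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.org"

echo base > FILE
git add FILE
git commit -q -m "initial revision"
git checkout -q -b proposed-feature

# Push the branch and record the local/remote pairing in one step.
git push -q -u origin proposed-feature

# These are the same two settings the git config commands write by hand.
remote=$(git config --get branch.proposed-feature.remote)
merge=$(git config --get branch.proposed-feature.merge)
echo "remote: $remote"
echo "merge:  $merge"
```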
        

Finding It Elsewhere

Meanwhile, someone else who has separately cloned the repository before
we did this won’t see these changes automatically, but a simple git pull
command can get it:

          $ git pull
          remote: Generating pack...
          remote: Done counting 35 objects.
          remote: Result has 33 objects.
          remote: Deltifying 33 objects...
          remote:  100% (33/33) done
          remote: Total 33 (delta 1), reused 27 (delta 0)
          Unpacking objects: 100% (33/33), done.
          From ssh://host.example.org/git/REPOSITORY.git
           * [new branch]      proposed-feature -> origin/proposed-feature
          Already up-to-date.
          $ git branch -a
          * master
            origin/HEAD
            origin/proposed-feature
            origin/master
        

However, their checkout directory won’t be updated to show the changes
until they make a local “mirror” branch to show them the
changes. Usually, this would be done with:

          $ git checkout -b proposed-feature origin/proposed-feature
        

Then they’ll have a working copy with all the data and a local branch
to work on.

BTW, if you want to try this yourself just to see how it works, you can
always make another clone in some other directory just to play with, by
doing something like:

          $ git clone ssh://host.example.org/git/SOME-REPOSITORY.git/ \
            extra-clone-for-git-didactic-purposes
        

Now on this secondary checkout (which makes you just like the user who
is not the creator of the new branch), work can be pushed and pulled on
that branch easily. Namely, anything you merge into or commit on your
local proposed-feature branch will automatically be pushed to
origin/proposed-feature on the server when you git push. And,
anything that shows up from other users on the origin/proposed-feature
branch will show up when you do a git pull. These two branches were paired
together from the start.
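
To play both roles without any server at all, here is a sketch that uses two clones of a throwaway bare repository. The directory names creator and colleague are made up, and it uses git fetch (the download half of git pull) so that nothing depends on pull settings.

```shell
set -e
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/REPOSITORY.git"
git --git-dir="$sandbox/REPOSITORY.git" symbolic-ref HEAD refs/heads/master

# The creator seeds master on the shared repository.
git clone -q "$sandbox/REPOSITORY.git" "$sandbox/creator" 2>/dev/null
cd "$sandbox/creator"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.org"
echo base > FILE
git add FILE
git commit -q -m "initial revision"
git push -q origin master

# A colleague clones before the new branch exists.
git clone -q "$sandbox/REPOSITORY.git" "$sandbox/colleague"

# The creator publishes proposed-feature.
git checkout -q -b proposed-feature
echo feature >> FILE
git commit -q -a -m "add feature"
git push -q origin proposed-feature:refs/heads/proposed-feature

# The colleague picks it up and makes the local mirror branch.
cd "$sandbox/colleague"
git fetch -q origin
git checkout -q -b proposed-feature origin/proposed-feature
tail -n 1 FILE
```

Because the colleague’s branch was created from origin/proposed-feature, the pairing is recorded in their .git/config without any extra git config commands.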

Irrational Rebased Fears

When using a shared repository like this, git rebase
usually screws something up. When Git is used in the
“normal way”, rebase is one of the amazing things about Git.
The rebase idea is: you unwind the entire work you’ve done on one of your
local branches, bringing in changes that other people have made in the
meantime, and then reapply your changes on top of them.

It works out great when you use Git the way the Linux Project does.
However, if you use a single, shared repository in a work group, rebase
can be dangerous.
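
One way to see where the trouble comes from is to watch what rebase does to identifiers. This is a sketch in a scratch repository with made-up file names; the variable published merely pretends that the identifier had already been pushed to the shared repository.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.org"

echo base > FILE
git add FILE
git commit -q -m "initial revision"

git checkout -q -b USERNAME/feature
echo feature > FEATURE
git add FEATURE
git commit -q -m "add feature"
published=$(git rev-parse USERNAME/feature)   # pretend others already have this

git checkout -q master
echo other > OTHER
git add OTHER
git commit -q -m "someone else's work"

git checkout -q USERNAME/feature
git rebase -q master
rebased=$(git rev-parse USERNAME/feature)

# Same change, new identifier: the old revision is no longer an ancestor.
if git merge-base --is-ancestor "$published" "$rebased"; then
    echo "still a fast-forward"
else
    echo "history diverged: a plain git push would be rejected"
fi
```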

Generally speaking, though, with a shared repository, you can
use git merge and won’t need rebasing. My usual work flow is
that I get started on a feature with:

          $ git checkout -b bkuhn/new-feature starting-branch
        

I work work work away on it. Then, when it’s ready, I send a patch around
to a mailing list that I generate with:

          $ git diff $(git merge-base starting-branch bkuhn/new-feature) bkuhn/new-feature
        

Note that the thing in the $() returns a single identifier for a
version, namely, the version of the fork point between starting-branch and
bkuhn/new-feature. Therefore, the diff output is just the stuff I’ve
actually changed. This generates all the differences between the place
where I forked and my current work.
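
Git also has a three-dot form of git diff that, as far as I know, computes the same thing without the explicit merge-base call. The sketch below checks that claim in a scratch repository; the file names and commits are hypothetical.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.org"

echo base > FILE
git add FILE
git commit -q -m "initial revision"
git branch starting-branch

git checkout -q -b bkuhn/new-feature starting-branch
echo feature >> FILE
git commit -q -a -m "add feature"

# Meanwhile, unrelated work lands on starting-branch.
git checkout -q starting-branch
echo unrelated > OTHER
git add OTHER
git commit -q -m "unrelated work on starting-branch"

explicit=$(git diff $(git merge-base starting-branch bkuhn/new-feature) bkuhn/new-feature)
shorthand=$(git diff starting-branch...bkuhn/new-feature)
echo "$explicit"
```

Either way, the patch contains only the change to FILE, not the unrelated OTHER file that appeared on starting-branch after the fork.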

Once I have discussed and decided with my co-developers that we like
what I’ve done, I do this:

          $ git checkout starting-branch
          $ git merge bkuhn/new-feature
        

If all went well, this should automatically commit my feature into
starting-branch. Usually, there is also an origin/starting-branch, which
I’ve probably set up for automatic push/pull with my local
starting-branch, so I then can make the change officially by running:

          $ git push
        

The fact that I avoid rebase is probably merely FUD, and if I learned
more, I could use it safely in cases with a shared repository. But I have
no advice on how to make it work. In particular, this Git FAQ entry
shows quite clearly that my work sequence ceases to work
all that well when you do a rebase; namely, doing a git push
becomes more complicated.

I am sure a rebase would easily become very necessary if I lived on
bkuhn/new-feature for a long time and there had been tons of changes
underneath me, but I generally try not to dive too deep into a fork,
although many people love DVCS because they can do just that. YMMV,
etc.