markpasc (markpasc) wrote,

git equivalent to “svn copy” for forking files with history?

As might be obvious from my previous post, I don’t yet grok git. (It took three tries over about a week to figure out what I posted there.) My other major question as a dual git/subversion user is how do I svn copy in git? I don’t see that adequately answered anywhere.

The most common use of svn copy is to branch, which is precisely what I don’t mean here. git seems to promote branches to a first-order concept, in that the entire git repository exists across branches, and there are specific commands for branching and merging. You can’t have a nonstandard trunk/branch/tag hierarchy like you do sometimes in subversion, because there is no hierarchy. git branches are completely orthogonal to your file structure.

Git From the Bottom Up suggests (perhaps a little facetiously) phrasing your problem in git’s language in order to understand:

Understanding commits is the key to grokking Git. You’ll know you have reached the Zen plateau of branching wisdom when your mind contains only commit topologies, leaving behind the confusion of branches, tags, local and remote repositories….

In these terms, you find branches are really names for commits other than the master head, the tip of the main trunk. It’s not really that branches are first-order things, but that branches are names for commits instead of files. Either way, they’re completely orthogonal to the filesystem.
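To make "branches are names for commits" concrete, here is a minimal sketch in a throwaway repository. The branch name topic, the file name, and the commit identity are made up for illustration; the point is only that a new branch resolves to the same commit as HEAD and changes nothing on disk.

```shell
set -e
# Work in a throwaway repository (the location is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > fred
git add fred
# Identity is passed inline so the sketch does not depend on global config.
git -c user.name=Example -c user.email=example@example.org commit -q -m "Initial"

# Create a branch; "topic" is a hypothetical name.
git branch topic

# Both names resolve to the same commit ID.
head_commit=$(git rev-parse HEAD)
topic_commit=$(git rev-parse topic)
echo "HEAD:  $head_commit"
echo "topic: $topic_commit"

# Creating the branch did not add or move anything in the working tree.
files=$(ls)
echo "files: $files"
```

Contrast this with a subversion branch, which is a copied directory that you can see in the repository's file tree.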

Looking again just now for the answer to my question, I found this thread, which illustrates git’s current position on copying, and how it’s contrary to this second use of svn copy that I’m trying to figure out:

svn copy::
Duplicate something in working copy or repository, remembering history.
cp A B; git add B::
Git doesn’t have a direct equivalent of svn copy. It’s arguable whether it needs it once the user knows they can git-add so easily.

Git wins. Git’s ability to detect copies after-the-fact means that a git-copy isn’t necessary.

svn copy is more like git checkout -b, i.e. its primary purpose is not to “copy” things, it is to create branches. You generally do not copy code (I hope).

Well, in fact, I often do copy code with svn, because I want to fork a file with history. Often I discover when I’m working on (say) some class, I’ve accreted unrelated functionality around the class’s real work, and I need to separate it out. Obviously I can do that just fine and check both parts in, but if I naïvely fork one file in twain—by doing a real file copy and adding the new file, say—the new one will lose all its history. The first commit git will know about for it is the one where it appears fully formed from Zeus’ head, though it happens to share an equivalent blob with some other file in the commit.
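A small sketch of the naïve fork, assuming a throwaway repository and the placeholder file names fred and wilma: after a plain cp and git add, a path-limited git log lists every commit that touched the original but only the forking commit for the copy.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Helper so commits work without any global git identity configured.
commit() { git -c user.name=Example -c user.email=example@example.org commit -q -m "$1"; }

# Give fred two commits of history.
printf 'line one\n' > fred
git add fred
commit "Add fred"
printf 'line two\n' >> fred
git add fred
commit "Extend fred"

# Naive fork: a real file copy, then add the new file.
cp fred wilma
git add wilma
commit "Fork fred into wilma"

# Count the commits a plain, path-limited log reports for each file.
fred_commits=$(git log --oneline -- fred | wc -l | tr -d ' ')
wilma_commits=$(git log --oneline -- wilma | wc -l | tr -d ' ')
echo "commits touching fred:  $fred_commits"
echo "commits touching wilma: $wilma_commits"
```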

In git terms, you can see why it’s not obvious how to copy with history: to duplicate svn copy A B, you want git, whenever you ask about B’s history, to include all of A’s previous commits. It’s as though you want to change all A’s commits retroactively to include B, sort of. It’s more about the behavior of the tools than anything you can articulate in a git repository’s data.

So I looked for behavior: lo and behold, Andy Parkin’s message above notes that git can “detect copies after-the-fact,” and I guess he means for example git log’s -C and --find-copies-harder options. According to the manual, this seems to be exactly the behavior I need:

-C
Detect copies as well as renames. See also --find-copies-harder.
--find-copies-harder
For performance reasons, by default, -C option finds copies only if the original file of the copy was modified in the same changeset. This flag makes the command inspect unmodified files as candidates for the source of copy. This is a very expensive operation for large projects, so use it with caution. Giving more than one -C option has the same effect.

If only it worked that way. See the terminalcast I did showing what I mean: once I duplicate the file, git log doesn’t show fred’s initial commit in wilma’s history, even with the -C or --find-copies-harder flags.

I would be delighted to be wrong, but as far as I can tell, it’s not possible to fork files in git, while it’s trivial with svn copy.

Tags: code, git, subversion
  • 15 comments
No, git log will not follow the copy to display more history. That feature just isn't implemented yet. But it does see the copy:

dragon-% mkdir test
dragon-% cd test
dragon-% git init
Initialized empty Git repository in /home/taral/tmp/test/.git/
dragon-% echo hello > file1
dragon-% git add file1
dragon-% git commit -m Initial
Created initial commit de6bf2e: Initial
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 file1
dragon-% cp file1 file2
dragon-% git add file2
dragon-% git commit -m Fork
Created commit 214ffb7: Fork
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 file2
dragon-% git log -C -C --name-status --no-walk
commit 214ffb73dcef3293adf7cb5581ff1536f3ab9339
Author: Taral <taral@taral.net>
Date:   Fri Jul 11 15:15:02 2008 -0700

    Fork

C100    file1   file2


That last line means that file2 was copied from file1.
Ah, I see. I guess that’s good to know!
The question I have is: Why do you need svn copy?

I’m writing a plugin that implements a feature and some conversion routines. I implemented the feature in an Lecf::App module, and now I’m working on the conversion portion. At first I made it one method of Lecf::App, but it grew until it made sense to make part of it a callable routine. Then those parts grew their own routines.

Eventually I wrote a bunch of new code without watching it very carefully, and I had a compilation error, which due to a delicate part of the unrelated feature portion meant the app didn’t actually load at all. That plus the fact that the conversion portion will be rarely used and so won’t need to be loaded as often as the feature indicated I should probably move those routines out of Lecf::App into a new Lecf::Convert module.

With subversion, I would svn cp lib/Lecf/App.pm lib/Lecf/Convert.pm, then delete opposite halves of each file, and I’d still be able to refer to each revision I made to the conversion routines just fine with svn log and svn blame.

That’s what I mean by “fork a file with history.”

Ahh! Makes sense now. Interesting use-case. Unfortunately git doesn't actually keep per-file history, it only keeps per-commit history. So what you want literally doesn't exist in git's mind. When reconstructing per-file history for diff purposes, it can compare hashes to find out if a file is a perfect copy of another one, but that's really kind of a hack.
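The hash comparison can be seen directly with git hash-object, which prints the blob ID git would assign to a file's contents. This is only a sketch of the exact-copy case; file3 is a made-up third file added for contrast, and copies that have since been edited are matched by similarity rather than by identical hashes.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo hello > file1
cp file1 file2                  # byte-for-byte copy
echo "hello, world" > file3     # different contents, for contrast

# Blob IDs depend only on file contents, not on the file name.
hash1=$(git hash-object file1)
hash2=$(git hash-object file2)
hash3=$(git hash-object file3)
echo "file1: $hash1"
echo "file2: $hash2"
echo "file3: $hash3"
```

The first two IDs match because the contents match, which is all git has to go on when it reports a perfect copy.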

Yeah, that’s the conclusion I came to. It’s hard to make history follow a copied file when you’re only tracking blobs and commits. It’d be adequate if git log and git blame used the copying information.

It sort of sounds to me like trying to extract a feature branch out of another line of work. You might go that route.

If the changes are in discrete commits and haven't been bumped back to svn yet, you might also check out rebase --interactive to reorder, toss, etc. the various commits, then save off the ones you need as a patch to apply to the newly extracted feature.

But alas, I r no gitwizard.

Hmmm, that’s an interesting idea, copying the old version and moving those commits to the new file. You’d need heavy history rewriting fu though, so I doubt it’s worth it in practice.

But neat!

git mv f1 f2
git commit -m 'rename file1 to file2'
git checkout HEAD^ f1
git commit -m 'actually I want to copy file1 to file2'
oops, that doesn't work either. sorry

markpasc, thanks so much for this post. I had exactly the same problem, and was glad that a net.search turned up your post first, because it is a clear, concise explanation for the hard-core svn user who is learning git.

I thought that since I am using git svn, and therefore my backend repository is svn itself, I could “cheat” to get this done in my case. However, the cheat that I thought might work did not. Here's what I tried:

svn cp svn+ssh://user@svn.example.org/svn/existing_file svn+ssh://user@svn.example.org/svn/similar_forked_file
git svn fetch
git svn rebase

Unfortunately, while I see similar_forked_file appear as it should, in the git log, it has only the revision of being added anew.

This is definitely a feature that git needs, as, like you, I often fork a file to implement something very similar, such as generalizing a specific feature to use it elsewhere.

BTW, great LJ layout. I am going to have to figure out what layout you are using and use it myself later!

No problem! git is still new enough that there are plenty of parts to explain, apparently.

And thanks! This is a customized layer on top of Bloggish. I had to write some S2 code to do some of the trickier tricks.
Great post, markpasc!

What might be helpful would be if git visualizers like qgit/gitk/github had support for --find-copies-harder... they could conceivably graph the history of a particular file including its ancestry.

Git is very different from SVN; I'm finding the transition a bit tricky. But after using git a little, I now realize why I absolutely need its features.
Someone else has recommended to use
git log --follow
see http://stackoverflow.com/a/1043566 .
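A quick way to try this on the file1/file2 example from earlier in the thread is sketched below. It compares a path-limited git log with and without --follow; whether the copy's source commit actually shows up is an assumption that may depend on the git version, so the sketch prints the counts rather than promising a result.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Helper so commits work without any global git identity configured.
commit() { git -c user.name=Example -c user.email=example@example.org commit -q -m "$1"; }

echo hello > file1
git add file1
commit "Initial"
cp file1 file2
git add file2
commit "Fork"

# Commits listed for file2 without and with --follow.
plain=$(git log --oneline -- file2 | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- file2 | wc -l | tr -d ' ')
echo "without --follow: $plain"
echo "with --follow:    $followed"
```

Note that --follow accepts only a single path at a time.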
Good to know you can ask for this now. Thanks!