How can I recover after a checksum mismatch with 'git svn clone'? - migration

I'm cloning an SVN repository to git as part of our migration plan. I've hit various snags along the way, forcing me to continue the clone with a git svn fetch command. The most recent failure I can't figure out how to solve:
$ git svn fetch
Checksum mismatch: dc/trunk-4632-jh/dc-smtpd/lib/Qpsmtpd/Address.pm.t 8ce3aea3f47dc115e8fe53bd62d0f074cfe93ec6
expected: 59de969022e46135fa6dc7599fc2f3b4
got: 4334926a01c905cdb7fce71265e370c1
I found this related answer, however that solution doesn't work because git svn log is not yet functional, as the repo is not fully in place:
$ git svn log dc/trunk-4632-jh/dc-smtpd/lib/Qpsmtpd/Address.pm.t
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions
log --no-color --first-parent --pretty=medium HEAD: command returned error: 128
How can I proceed?

Another answer to an old question, but straightforward solutions are tough to find for this problem, so hopefully this helps others.
I think this issue is caused by a file being corrupted during transfer. I'm not sure how or why it happens, but in my case I get the same error at different revisions every time I do a new clone, and sometimes not at all.
Using the questioner's error message:
$ git svn fetch
Checksum mismatch: dc/trunk-4632-jh/dc-smtpd/lib/Qpsmtpd/Address.pm.t
8ce3aea3f47dc115e8fe53bd62d0f074cfe93ec6
expected: 59de969022e46135fa6dc7599fc2f3b4
got: 4334926a01c905cdb7fce71265e370c1
The following steps allowed me to resume and make progress:
1) View all branches (these will all be remote branches): git branch -a
2) Check out the affected branch (this will take some time to complete): git checkout remotes/origin/trunk-4632-jh
3) Find the last revision in which the problematic file was changed, and note the highest revision number: git svn log dc-smtpd/lib/Qpsmtpd/Address.pm.t
4) Reset back to this revision: git svn reset -r (rev #) -p
5) Carry on: git svn fetch
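The steps above can be sketched as a pair of shell functions. This is only a sketch under assumptions: the names highest_rev and resume_fetch are made up for illustration, and highest_rev assumes that git svn log prints svn-style header lines of the form "r1234 | author | date | N lines". Nothing is executed until you call resume_fetch yourself.

```shell
# Print the highest revision number found in `git svn log` output on stdin.
# Assumes svn-style header lines such as "r1234 | author | date | 1 line".
highest_rev() {
    sed -n 's/^r\([0-9][0-9]*\) |.*/\1/p' | sort -n | tail -n 1
}

# Hypothetical helper: check out the affected branch, find the last revision
# that touched the file, reset git-svn to the parent of that revision (-p),
# and fetch again.
resume_fetch() {
    branch=$1    # e.g. remotes/origin/trunk-4632-jh
    file=$2      # e.g. dc-smtpd/lib/Qpsmtpd/Address.pm.t
    git checkout "$branch" || return 1
    rev=$(git svn log "$file" | highest_rev)
    if [ -z "$rev" ]; then
        echo "no revision found for $file" >&2
        return 1
    fi
    git svn reset -r "$rev" -p && git svn fetch
}
```

A call might look like resume_fetch remotes/origin/trunk-4632-jh dc-smtpd/lib/Qpsmtpd/Address.pm.t; if the fetch fails again on a different file, the same call can be repeated with the new path.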
Good luck.

I know this is old, but maybe it will be helpful for future reference, as none of the search results on this are helpful.
I've hit a similar issue on our huge repository, which takes days to clone, and unfortunately at one point I had to restart my machine. I am currently working out how to resolve the problem, so please keep in mind this is more a suggestion than a tested solution.
I think you need to try creating a branch and checking out the commits you already have from the previous fetch:
git checkout -b master git-svn
After that is done you should have a working tree up to that commit. Further fetches will probably fail due to an object mismatch, but at that point it should at least be possible to use "git svn reset" to revert the faulty svn fetches (see the OP's related answer link). If that's true, find the offending commit, reset to before it, and then continue fetching.
You might want to rebase and revert to the state before that broken commit on your master branch, or convert back to a bare repository if that's what you're after (in my case it is).
Hope this works. I'll post an update when my checkout is done (will take at least a few hours... sigh).
Edit: That seemed to work. I successfully discarded some git-svn commits and am able to re-fetch them again. :)
Edit2: Make sure to reset until you don't get any object mismatch warnings on git svn fetch (otherwise you will run into the same issue soon).
Cheers,
Henryk

See also: Git svn rebase : checksum mismatch
In our case, additional processing of the files (server-side includes in Apache) caused the checksum problem.
We solved it by disabling SSI in Apache's /etc/httpd.conf file for the duration of the migration, commenting out these directives:
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
The front-end Apache server had been interpreting .shtml files and serving new content (and thus a new hash) that differed from the original file.
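As a rough sketch of that change, the filter below comments out the two directives in a config read from standard input. The function name comment_out_ssi is hypothetical, and the sketch assumes each directive sits on its own line, as quoted above; it writes the result to standard output and does not touch the real file.

```shell
# Prefix the two SSI directives for .shtml with '#'; all other lines
# (including ones that are already commented out) pass through unchanged.
comment_out_ssi() {
    sed -e 's|^[[:space:]]*AddType[[:space:]][[:space:]]*text/html[[:space:]][[:space:]]*\.shtml|#&|' \
        -e 's|^[[:space:]]*AddOutputFilter[[:space:]][[:space:]]*INCLUDES[[:space:]][[:space:]]*\.shtml|#&|'
}
```

For example, comment_out_ssi < /etc/httpd.conf > httpd.conf.migration produces a copy to review before installing it and reloading Apache; the original file can be restored once the migration is done.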

That means some files in the repository got corrupted. This can have various causes, such as software bugs or bit rot on drives. I was recently migrating a very old ~10 GB svn repository to git, so some corruption was expected.
To fix the corruption, you basically need to dump the entire repository and import it again while filtering out the errors. Note that our goal is to complete the import process no matter why or how the repository got corrupted. You cannot simply fix the corruption without having a backup and diffing through the revision files.
The first basic one-off command you could use is:
svnadmin create repo2
svnadmin dump repo | sed '/^Text-content-md5/d' | svnadmin load repo2
This strips the stored MD5 checksums from the dump, so the new repo will have updated checksums.
If you encounter more errors during the dump and load (which is expected), try an incremental approach so you can continue from the point where you left off. The command below dumps revisions 101 through 150 (inclusive).
svnadmin dump --incremental -r101:150 repo | sed '/^Text-content-md5/d' | svnadmin load repo2
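To avoid typing each range by hand, the incremental approach can be wrapped in a loop. The sketch below is an assumption-laden illustration: rev_ranges and load_in_chunks are hypothetical names, the chunk size is arbitrary, and only the exit status of svnadmin load is checked, since a plain pipeline reports the status of its last command.

```shell
# Print "first:last" ranges of at most $3 revisions covering $1..$2.
rev_ranges() {
    start=$1
    last=$2
    step=$3
    while [ "$start" -le "$last" ]; do
        end=$((start + step - 1))
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        echo "$start:$end"
        start=$((end + 1))
    done
}

# Hypothetical wrapper: dump $1 into $2 chunk by chunk, stopping at the
# first chunk that fails so you know where to resume.
load_in_chunks() {
    src=$1
    dst=$2
    for range in $(rev_ranges "$3" "$4" "$5"); do
        echo "loading r$range" >&2
        svnadmin dump --incremental -r"$range" "$src" \
            | sed '/^Text-content-md5/d' \
            | svnadmin load "$dst" \
            || { echo "failed in r$range" >&2; return 1; }
    done
}
```

For instance, load_in_chunks repo repo2 101 1000 50 would process revisions 101 to 1000 in blocks of 50; after fixing whatever broke a block, rerun it starting from the first revision of the failed range.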
Some common errors and solutions:
'Premature end of content data in dumpstream': That means the Content-length of some file does not match the repository version, so some data in that file is lost. We must skip it. Add | svndumpfilter exclude path/to/file.jar to the pipeline, like this:
svnadmin dump --incremental -r101:150 repo | svndumpfilter exclude path/to/file.jar | sed '/^Text-content-md5/d' | svnadmin load repo2
Property errors: Add --bypass-prop-validation to the svnadmin load command.
After populating your second repo, you would simply run svnserve -d -r repo2 and try git svn fetch again.
Good luck!

Related

Can't svn2git public repos

I need to migrate a customer from SVN to Git, so I wanted first to try svn2git on a public SVN repository.
I have found several public repos, e.g., https://svn.alfresco.com/repos/alfresco-open-mirror/alfresco and http://svn.apache.org/repos/asf/spamassassin. Running svn co on them works without problems, but when I try svn2git, I get the following problem:
D:\Documents\work\svn2git\apache>svn2git http://svn.apache.org/repos/asf/spamassassin
Initialized empty Git repository in D:/Documents/work/svn2git/apache/.git/
Using higher level of URL: http://svn.apache.org/repos/asf/spamassassin => http://svn.apache.org/repos/asf
W: Ignoring error from SVN, path probably does not exist: (160013): Filesystem has no item: REPORT request failed on '/repos/asf/!svn/bc/100': File not found: revision 100, path '/spamassassin'
W: Do not be alarmed at the above message git-svn is just searching aggressively for old history.
This may take a while on large repositories
Checked through r100
Checked through r200
Checked through r300
It ran the whole night, and ended with:
Checked through r22000
Checked through r22100
W: Ignoring error from SVN, path probably does not exist: (175002): RA layer request failed: PROPFIND request failed on '/repos/asf': PROPFIND of '/repos/asf': could not connect to server (http://svn.apache.org)
W: Do not be alarmed at the above message git-svn is just searching aggressively for old history.
This may take a while on large repositories
Checked through r477700
Path 'spamassassin' was probably deleted:
RA layer request failed: PROPFIND request failed on '/repos/asf': PROPFIND of '/repos/asf': could not connect to server (http://svn.apache.org)
Will attempt to follow revisions r477601 .. r477700 committed before the deletion
r477601 .. r477679 OK
Checked through r748600
Path 'spamassassin' was probably deleted:
RA layer request failed: PROPFIND request failed on '/repos/asf': PROPFIND of '/repos/asf': could not connect to server (http://svn.apache.org)
Will attempt to follow revisions r748501 .. r748600 committed before the deletion
Checked through r748700
Checked through r748800
Checked through r748900
Checked through r749000
Checked through r749100
Checked through r749200
Checked through r749300
Checked through r749400
W: Ignoring error from SVN, path probably does not exist: (175002): RA layer request failed: PROPFIND request failed on '/repos/asf/!svn/vcc/default': PROPFIND of '/repos/asf/!svn/vcc/default': could not connect to server (http://svn.apache.org)
W: Do not be alarmed at the above message git-svn is just searching aggressively for old history.
This may take a while on large repositories
Checked through r805700
Path 'spamassassin' was probably deleted:
RA layer request failed: PROPFIND request failed on '/repos/asf/!svn/vcc/default': PROPFIND of '/repos/asf/!svn/vcc/default': could not connect to server (http://svn.apache.org)
Will attempt to follow revisions r805601 .. r805700 committed before the deletion
Checked through r805800
Checked through r805900
Checked through r806000
Checked through r806100
Checked through r806200
Checked through r806300
Checked through r806400
Checked through r806500
Checked through r806600
Checked through r806700
command failed:
git checkout -f master
Why does it happen? Is it a permission problem?
Well, so far everything looks fine. If it fails, you should specify the message it fails with.
But besides that, for a one-time migration, git-svn (the svn2git tool you use is based on git-svn) is not the right tool for converting repositories or parts of repositories. It is a great tool if you want to use Git as a frontend for an existing SVN server, but for one-time conversions you should not use git-svn or tools based on it. Use a different tool, also called svn2git, which is much better suited to this use case.
There are plenty of tools called svn2git; probably the best one is the KDE one from https://github.com/svn-all-fast-export/svn2git. I strongly recommend using that svn2git tool. It is the best one I know of, and it is very flexible in what you can do with its rules files. The svn2git that is based on git-svn suffers from most of the same drawbacks as pure git-svn, just working around some of the issues.
You will easily be able to configure svn2git's rules file to produce the result you want from your current SVN layout, including any complex histories that might exist, and including producing several Git repos out of one SVN repo or combining different SVN repos into one Git repo cleanly in one run if you like.
If you are not 100% sure about the history of your repository, svneverever from http://blog.hartwork.org/?p=763 is a great tool for investigating the history of an SVN repository when migrating it to Git.
Even though git-svn is easier to start with, here are some further reasons why using the KDE svn2git instead of git-svn is superior, besides its flexibility:
the history is rebuilt much better and cleaner by svn2git (if the correct one is used), this is especially the case for more complex histories with branches and merges and so on
the tags are real tags and not branches in Git
with git-svn the tags contain an extra empty commit, which also means they are not part of the branches, so a normal fetch will not get them until you give --tags to the command, as by default only tags pointing to fetched branches are fetched. With the proper svn2git, tags are where they belong
if you changed the layout in SVN you can easily configure this with svn2git; with git-svn you will eventually lose history
with svn2git you can also split one SVN repository into multiple Git repositories easily
or combine multiple SVN repositories in the same SVN root into one Git repository easily
the conversion is a gazillion times faster with the correct svn2git than with git-svn
You see, there are many reasons why git-svn is worse and the KDE svn2git is superior. :-)

Is git svn dcommit atomic?

In my company we have a subversion server and everyone is using subversion on their machines.
However I'd like to use git, committing changes locally and then "push" them when I'm ready.
However, I can't understand what happens in the following situation.
Let's say that I made 3 git commits locally and now I'm ready to "push" everything on the subversion server. If I understand correctly, git svn dcommit should basically make 3 commits sequentially on the server, right? But what happens if in the meantime (let's say between the second and the third commit) another colleague of mine issues a commit?
The scenarios I can think of are:
1) git kind of "locks" (is that even possible?) the subversion server during commits so that my commits are done atomically and my colleague's one is done after mine
2) The commit history on the server becomes mine1-mine2-other-mine3 (even if 'other' should fail since my colleague doesn't have an updated working copy at that point).
I think it's #2, but perhaps the committing speed is so high that this seldom becomes an issue. So which one is it, #1 or #2?
No, locks are not supported in Git; it's not the Git way (the Git way is branching and merging).
With git-svn you'll get mine1-mine2-other-mine3 history. If you need atomicity, have a look at SubGit project (it is installed into the SVN server and creates a pure Git interface for the SVN repository).
There was a similar question recently that might be interesting for you.
If you are lucky then number 2 but most of the time you aren't that lucky. In my experience when I dcommit a lot of commits and someone else commits while doing that usually 2 things happen:
It stops with dcommitting your other changes.
You lose the commits not-yet dcommitted.
Number 2 is really really annoying. The main problem is that you need to be totally up-to date to use git svn dcommit. This is because git-svn doesn't let the server merge revisions on the fly. (Because it would require both committers to have a working tree with both changes).
The only way to solve this is the following procedure, which I found here:
1) Open .git/logs/HEAD
2) Look for your most recent commit (note that these commits are sorted by “unix time”, although you can also find the latest one by reading the shortlog there)
3) Confirm that the commit you found is the right one: git show <hash from log>
4) git reset --hard <hash from log>
5) git svn rebase
6) git svn dcommit
Following this procedure allows you to pick up from where it failed. I hope they fix this soon, but they said this isn't a priority for them yet.
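The lookup in .git/logs/HEAD can be scripted roughly as follows. This is a sketch, not a vetted recovery tool: last_commit_hash and resume_dcommit are hypothetical names, and the parser assumes the usual reflog line layout of old hash, new hash, author, timestamp, then a tab and a message starting with "commit" for ordinary commits.

```shell
# Print the new-commit hash of the last "commit" entry in a reflog file
# read from stdin.
# Line layout: <old-sha> <new-sha> <name> <email> <unix-time> <tz><TAB><message>
last_commit_hash() {
    awk -F '\t' '$2 ~ /^commit/ { split($1, f, " "); h = f[2] }
                 END { if (h != "") print h }'
}

# Hypothetical wrapper for the remaining steps; $1 is the hash you have
# already inspected with `git show`.
resume_dcommit() {
    git reset --hard "$1" && git svn rebase && git svn dcommit
}
```

In practice you would run hash=$(last_commit_hash < .git/logs/HEAD), inspect it with git show "$hash", and only then call resume_dcommit "$hash", since the hard reset discards whatever the working tree currently holds.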
Of course, if you commit in small groups and have a fast connection to the server, it shouldn't happen that often. (I only hit it 2-3 times while actively working and committing every day for 6 months.)

git-svn : rebase fails multiple times then works

We are rolling out git on our clients to interact with a central SVN repository. On most workstations it works fine, but we have one workstation where the person has to run git svn rebase 3-4 times before it completes. Each time there is no error, but random files are marked as modified or new. The files seem to belong to a commit that was pulled down from the central svn repository but not completed. Rerunning git svn rebase a few more times clears this up. The computer is top of the line, with plenty of hard drive space and 16 GB of RAM. Has anyone else run into issues like this?
I had a similar issue that was solved by upgrading to git 1.8 (see: Why does "git rebase" leave opposite sets of modifications in the stage and the working copy?).
Maybe you should try that.

How to fix "file not found" on git svn dcommit?

I'm trying to run git svn dcommit; however, one directory keeps failing, which stops my commit, and I keep getting this error:
Filesystem has no item: File not found: transaction '43999-6', path '/path/to/folder' at /usr/local/git/libexec/git-core/git-svn line 572
I tried adding the folder back in, but I continue to get that error. Can I remove a commit from the tree to bypass this? I'm not sure what else to do here.
Edit
Some of the following don't fully answer my question, but they seem to point in the right direction:
issue about tracking and not detaching the HEAD
issue about rebasing
issue about recovering commits
The last issue seems to be what I wanted, but with the size of my repo (last time, took me around a whole work day to checkout the entire thing), and the little amount of work I would have lost by just doing a hard reset (which ultimately seemed to do the trick), I went for the hard reset option.
svn reset --hard didn't work for me
The reason for this is that, when doing a dcommit to svn, the commit that deleted the file appears to be made in both git and svn at the same time, but the link between them is lost.
The solution that worked for me was to reset master to the commit before the problem, then merge all successive commits back into master (except the faulty one), then redo the file deletion.
There may be a more elegant solution...
side note:
git svn DOES svn rename/move files correctly.
It (either tortoisegit+mysgit or jgit/egit) does it automagically all the time ;)
I don't think git-svn actually supports renaming files. I get this error every time I try to rename something. I always end up having to rename it with svn and then rebase with git-svn.
Update
This is likely due to the fact that git-svn doesn't play nicely with spaces in URLs. I often have to rename project paths in order to get them to work with git-svn. Of course, this isn't an acceptable solution for projects that actually have other people working on them. For those I simply have to resort to using svn to move files. It's a huge hassle.
I was able to work around the problem of git svn not working for repositories with spaces in them by patching git-svn.
I updated the url_path function to:
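sub url_path {
    my ($self, $path) = @_;
    my $url = $self->{url} . '/' . $self->repo_path($path);
    if ($self->{url} =~ m#^https?://#) {
        # Percent-encode every character outside the safe set (this includes spaces).
        $url =~ s!([^~a-zA-Z0-9_./-])!uc sprintf("%%%02x",ord($1))!eg;
        # The ':' after the scheme was encoded too, so restore it.
        $url =~ s!^(https?)%3A//!$1://!;
    }
    $url
}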
This ensures that the spaces in the url are encoded correctly.
It seems to work for me, but hasn't been tested thoroughly.
I believe the problem should be fixed in Git >= 1.8.0
You should consider upgrading.
Home page: https://github.com/git/git
I know this is an old question but I had this exact issue recently and wanted to share how I fixed the problem. Admittedly this is not a nice solution but it allowed me to complete my commit. I did the following:
Added the folder/file under complaint back into svn using svn.
Committed my original code from git to svn (git svn dcommit --rmdir)
Deleted the folder/file in git and committed this to svn.
This meant I had an extra 2 small commits, one to add and then another to remove the offending folder/file but after this everything worked as expected again. I know this isn't a nice solution and it doesn't address the root of the problem but at least it allowed me to commit my code. Hopefully this can help someone else in this situation needing a quick fix.

RA layer request failed while git-svn fetch

I use git svn to sync with the subversion repos:
$ mkdir prj && cd prj
$ git svn init http://url/to/repos/branches/experimental
$ git svn fetch
and got the error message:
RA layer request failed: OPTIONS of 'http://url/to/repos/branches/experimental':
Could not read status line: connection was closed by proxy server
(http://url/to/repos) at /usr/bin/git-svn line 1352
Why and how can I fix this?
I had the same issue when accessing a SVN repo through a proxy.
The solution for me was to edit ~/.subversion/servers and add the needed proxy to the [global] section. Uncomment the relevant lines (http-proxy-host, http-proxy-port, optionally http-proxy-username and http-proxy-password) and enter the needed information there.
This is needed because git svn uses the settings stored in ~/.subversion/servers to access SVN repositories.
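As an illustration of what the edited section might contain, the small function below prints the two mandatory settings. The name proxy_stanza is hypothetical and the host and port are placeholders you must supply; in the stock servers file the section is named [global].

```shell
# Print a proxy stanza for ~/.subversion/servers.
# $1 = proxy host, $2 = proxy port (both supplied by the caller).
proxy_stanza() {
    printf '[global]\n'
    printf 'http-proxy-host = %s\n' "$1"
    printf 'http-proxy-port = %s\n' "$2"
}
```

Running proxy_stanza proxy.example.com 3128 only prints the lines. It is safer to merge them by hand into the existing [global] section than to append them blindly, because the file usually already contains that section header.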
It seems like this is a timeout issue on the server. Here's one bug report (I can't access the ticket it's a duplicate of, unfortunately). It's happening a lot to me, but if I just try the command again, it gets a little farther before timing out again. Eventually, I'll have the whole repository, and won't have to do this again, I hope.
I saw a similar error:
Could not read response body: connection was closed by server
I was able to resolve it by setting Timeout to 6000 in the Apache config.