Get SVN URL of removed git-svn file - git-svn

I would like to track a removed file as far back in history as possible, while using git-svn on a subdirectory of the SVN repository.
Using git log --full-history -- path/to/removed_file.py, I can see the history starting with the time the file was moved into the subdirectory I checked out using git-svn.
I can see which SVN revision that was from the git-svn suffix in the commit message, so I would now like to use svn log <full_url>@revision to see the rest of the history.
I know that I could use git svn info --url path/to/existing_file.py to see the required full SVN url, but what is a quick (ideally scriptable) way of getting the SVN URL of a file that is no longer in the repository?

To git, it doesn't matter much that a file foo/bar.py is removed in HEAD — as long as you have it in history, you can view every past version of it.
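For instance (a minimal illustration with placeholder names, not taken from the question's repo), you can still list the file's history and print any old version of it:

$ #-- every commit that ever touched the now-deleted file
$ git log --full-history --oneline -- foo/bar.py
$
$ #-- print the file as it was just before the deleting commit;
$ #-- <deleting-commit> is a placeholder hash taken from the log above
$ git show <deleting-commit>^:foo/bar.py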
For concreteness, I'll take this git-svn repo from the LLVM project as an example. There, the file docs/todo.rst was deleted in SVN revision 308987 (git commit fb572868…) and is absent in master.
Let's first init a local clone.
$ git clone https://github.com/llvm-mirror/lnt && cd lnt
Cloning into 'lnt'...
...
$ git svn init https://llvm.org/svn/llvm-project/lnt/trunk
$ git update-ref refs/remotes/git-svn refs/remotes/origin/master
$
$ #-- ask svn info of anything to check setup and/or force laziness
$ git svn info --url README.md
Rebuilding .git/svn/refs/remotes/git-svn/.rev_map.91177308-0d34-0410-b5e6-96231b3b80d8 ...
r154126 = 3c3062527ac17b5fac440c55a3e1510d0ab8c9d9
r154135 = 82a95d29ac7d25c355fbd0898a44dc3e71a75fd8
...
r374687 = 446f9a3b651086e87684d643705273ef78045279
r374824 = 8c57bba3687ada10de5653ae46c537e957525bdb
Done rebuilding .git/svn/refs/remotes/git-svn/.rev_map.91177308-0d34-0410-b5e6-96231b3b80d8
https://llvm.org/svn/llvm-project/lnt/trunk/README.md
So it gives back the README.md URL as expected. Now let's try the case of a deleted file:
$ git svn info --url docs/todo.rst
svn: 'docs/todo.rst' is not under version control
Fails, just like you say. man git-svn says that info "Does not currently support a -r/--revision argument."
OK then, let's try emulating what it does, first by hand.
https://llvm.org/svn/llvm-project/lnt/trunk/README.md?r=374824 is the URL for a given file at a given revision.
Our vanished docs/todo.rst is available at https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986. Notice the decrement: per git show fb572868 | grep git-svn-id, docs/todo.rst is already deleted in r308987, so we request r308986.
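For reference, the by-hand lookup goes roughly like this (the output line is approximate, and the svn log call uses standard peg-revision syntax; treat it as an untested sketch against this server):
$ git show fb572868 | grep git-svn-id
    git-svn-id: https://llvm.org/svn/llvm-project/lnt/trunk@308987 91177308-0d34-0410-b5e6-96231b3b80d8
$
$ #-- then ask SVN for the file's history as it stood one revision earlier
$ svn log https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst@308986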
On to scripting it... it's a rather simple job.
git-svn-oldinfo () {
  relfname="$1"
  # take the last commit touching the file, split its git-svn-id line into
  # base URL and revision, and point at the revision just before the deletion
  git log -n1 -- "$relfname" \
    | awk '/git-svn-id:/ {sub(/@/, " ", $2); print $2}' \
    | { read baseurl rev; echo "${baseurl}/${relfname}?p=$((rev-1))"; }
}
#-- test:
$ git-svn-oldinfo docs/todo.rst
https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986
Quick-n-dirty but tested — you're welcome to adjust & extend as needed.
Edit
Despite git log being a "porcelain" command (i.e. not really designed for scripting), it's quite possible to parse the filenames out of it too, if you want to query by globs like **/removed_file.py:
git-svn-oldinfo-glob () {
  fileglob="$1"
  # the first line of `git log --stat --format=oneline` gives the commit hash,
  # the next --stat line names the matched file; the commit object's last line
  # is the git-svn-id, which yields base URL and revision as above
  git log -n1 --stat --format=oneline -- "$fileglob" \
    | { read commit msg; \
        read fullname _remainder_dummy; \
        git cat-file -p $commit \
          | tail -n1 \
          | awk '/git-svn-id:/ {sub(/@/, " ", $2); print $2}' \
          | { read baseurl rev; echo "${baseurl}/${fullname}?p=$((rev-1))"; }
      }
}
#-- test:
$ git-svn-oldinfo-glob '**/todo.rst'
https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986
Take it with a grain of salt: it'll probably break in hilarious ways or output garbage if the glob matches multiple files, non-removed files, files with whitespace in the name, etc.
As always, check out man git-log and customize as needed.

Related

KDE SVN2GIT "WARN: Branch ... in repository ... doesn't exist at revision ... -- did you resume from the wrong revision?" Can't continue

I'm trying to migrate an 11GB SVN repo with more than 24k revisions into a single Git repository.
I made a single-file dump of the SVN repository using the svnrdump command and loaded it into my local SVN server running on my MacBook.
I downloaded svn2git from the https://github.com/svn-all-fast-export/svn2git repository.
Due to differences in how SVN and Git handle tags, I used the merged-branches-tags.rules file from the svn2git samples directory, which looks like this (I've removed the comments):
create repository myproject
end repository

match /trunk/
    repository myproject
    branch master
end match

match /(branches|tags)/([^/]+)/
    repository myproject
    branch \2
end match
Then I used a docker image solution as described in the documentation (in my console it was a single line; I've split it here to clarify what I was doing):
docker run --rm -it \
    -v /Users/me/work/SVN/dest:/workdir \
    -v /Users/me/work/svnServer/repositories/my_svn_repo:/tmp/svn \
    -v /Users/me/work/SVN/svn2git/samples:/tmp/conf \
    svn2git /usr/local/svn2git/svn-all-fast-export \
    --rules /tmp/conf/merged-branches-tags.rules \
    --add-metadata --svn-branches --debug-rules --svn-ignore --empty-dirs \
    /tmp/svn/
During the first try I got an error between revisions 12600 and 12601:
Exporting revision 12601 /tags/7.0M0p0000 was copied from /tags rev 12600
rev 12601 /tags/7.0M0p0000/ matched rule: "/tmp/conf/merged-branches-tags.rules:28 /(branches|tags)/([^/]+)/" exporting.
.WARN: SVN reports a "copy from" # 12601 from /tags # 12600 but no matching rules found! Ignoring copy, treating as a modification
WARN: Transaction: "7.0M0p0000" is not a known branch in repository "myproject"
Going to create it automatically
add/change dir ( /tags/7.0M0p0000 -> "7.0M0p0000" "" )
+++++
WARN: Branch "7.0M0p0000" in repository "myproject" doesn't exist at revision 12601 -- did you resume from the wrong revision?
Failed to write to process: Process crashed for repository myproject
6223345 modifications from SVN /tags/7.0M0p0000/ to myproject/7.0M0p0000%
I've checked it, and in rev 12601 there is a new tag named "7.0M0p0000", which I'm going to import as a branch and which wasn't in the repo at rev 12600.
Do you have any ideas what I can do to fix that and continue my migration?
Any help will be really appreciated.
After further investigation, it turned out that the mentioned "7.0M0p0000" tag was created in rev 12601 as a copy of all the tags from rev 12600.
I found it in the dump file, created with this command:
svnrdump dump -r 12600:12601 --incremental http://xxx.xxx.xxx.xxx/svn/my_repo > my_repo.dump
There was an entry:
Revision-number: 12601
...
Node-path: tags/7.0M0p0000
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 12600
Node-copyfrom-path: tags
It seems that KDE's svn2git is unable to deal with such cases (and the copy itself was probably made by mistake).
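If you want to double-check the revision bounds before writing the skip rule below, something along these lines should work (an untested sketch; the URL is the same placeholder server as in the svnrdump command above, and the range is just a guess to bracket the problem):
# list everything that happened under /tags in that revision range, so you
# can see where the bogus tag appears and when it goes away again
svn log -v -r 12600:12610 http://xxx.xxx.xxx.xxx/svn/my_repo/tags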
The only solution I found was to skip this tag completely by adding a match to my merged-branches-tags.rules file (the order of match expressions is important):
match /tags/7.0M0p0000/
    min revision 12600
    max revision 12606
end match

...

match /(branches|tags)/([^/]+)/
    repository myproject
    branch \2
end match

Apache Subversion pre-commit to restrict files

I'm new to Apache SVN and I need some help using a pre-commit script to filter which files are being uploaded to my repository.
I searched a lot and found this script in another question, but it didn't work for me.
#!/bin/bash

REPOS=$1
TXN=$2
AWK=/usr/bin/awk
SVNLOOK="/usr/bin/svnlook";

#Put all the restricted formats in variable FILTER
FILTER=".(sh|xls|xlsx|exe|xlsm|XLSM|vsd|VSD|bak|BAK|class|CLASS)$"

# Figure out what directories have changed using svnlook.
FILES=`${SVNLOOK} changed -t ${REPOS} ${TXN} | ${AWK} '{ print $2 }'` > /dev/null

for FILE in $FILES; do
    #Get the base Filename to extract its extension
    NAME=`basename "$FILE"`
    #Get the extension of the current file
    EXTENSION=`echo "$NAME" | cut -d'.' -f2-`
    #Checks if it contains the restricted format
    if [[ "$FILTER" == *"$EXTENSION"* ]]; then
        echo "Your commit has been blocked because you are trying to commit a restricted file." 1>&2
        echo "Please contact SVN Admin. -- Thank you" 1>&2
        exit 1
    fi
done
exit 0
If I try to use svnlook changed -t repodirectory, it doesn't work because of a missing subcommand.
I overwrote my pre-commit.tmpl, but it didn't work. Can someone help me?
First: it seems you are using svnlook incorrectly. It should be called with these parameters:
svnlook changed ${REPOS} -t ${TXN}
-t means 'read from transaction', and TXN is the transaction name itself.
Second: I'm not sure I understand correctly, but the hook file should be named pre-commit, not pre-commit.tmpl.
Third: pre-commit should have the correct permissions. For testing, try a+rwx.
Update: it is not easy to obtain a transaction object for testing, but you can use svnlook <subcommand> -r <revision> <repository_path> and experiment on already committed revisions.
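Putting those points together, a corrected hook might look roughly like this (a sketch only: the filter list and messages are taken from the question's script, and filenames containing spaces are still not handled):
#!/bin/bash
REPOS="$1"
TXN="$2"
SVNLOOK=/usr/bin/svnlook

# extensions to block, matched case-insensitively below
FILTER='\.(sh|xls|xlsx|xlsm|exe|vsd|bak|class)$'

# note the argument order: repository path first, then -t <transaction>
FILES=$($SVNLOOK changed "$REPOS" -t "$TXN" | awk '{ print $2 }')

for FILE in $FILES; do
    if echo "$FILE" | grep -iEq "$FILTER"; then
        echo "Your commit has been blocked because you are trying to commit a restricted file." 1>&2
        echo "Please contact SVN Admin. -- Thank you" 1>&2
        exit 1
    fi
done
exit 0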

`git svn dcommit` failing on a branch

I have been using git-svn to communicate with my company’s svn repo for a while now without any major headaches.
Today, the “headache”-part changed dramatically:
I’ve been working on master/trunk pretty exclusively, and needed to merge most (but not all!) of those change-sets into a new svn branch that originated from a pre-existing svn branch.
Basically this:
🍒---💩---💩---💩--1🍒--1🍒---💩--1🍒---💩---💩--1🍒--1🍒--1🍒---💩 master/trunk
  \
   \
    2🍒--2🍒--2🍒--2🍒--2🍒 versioned-release
Should have become this:
🍒---💩---💩---💩--1🍒--1🍒---💩--1🍒---💩---💩--1🍒--1🍒--1🍒---💩 master/trunk
  \
   \
    2🍒--2🍒--2🍒--2🍒--2🍒 versioned-release
                          \
                           \
                            1🍒--1🍒--1🍒--1🍒--1🍒--1🍒 new-versioned-release
Where 💩 are commits that shouldn’t be in the new-versioned-release, and x🍒 the wanted commits from the respective branches x.
So I did the following:
git checkout -b versioned-release-svn remotes/versioned-release
git svn branch new-versioned-release -m "Preparing for merge of XXX"
git checkout -b new-versioned-release-svn remotes/new-versioned-release
git cherry-pick ... for every 1🍒, resolving any conflicts on the way.
Because I wanted to be sure I was really going to target the correct branch on the repo, I then ran git svn dcommit --dry-run which did not yield any errors or warnings, but told me…
Committing to svn://username@$repo-host/$repo-name/$path/branches/new-versioned-release ...
…followed by a couple of diff-tree lines.
So I dropped the --dry-run, and halfway through the commits I ended up with…
Item already exists in filesystem: File already exists: filesystem '/data/subvroot/$repo-name/db', transaction '20856-g3m', path '/$path/branches/new-versioned-release/some-directory' at /usr/libexec/git-core/git-svn line 862
…and a bunch of unstaged changes.
Apart from the obvious — “WTF?!?” and “How do I get out of this mess without losing everything I did?” — I have two questions:
Assuming I was back to before the git svn dcommit: how do I get my local branch dcommitted to its planned destination?
By now it seems obvious that this wasn’t the right way to achieve what I wanted… but how should I have done it instead?
Everything I’ve found for the error message that somehow resembles my situation is this other Stack Overflow question, and the proposed solution of “somehow […] to blow away the .git/svn metadata directory” doesn’t resonate all that well with me…
Someone just upvoted my old question, so I thought I’d share how I do this nowadays.
It works really quite well.
Assuming the git repository has been created using
git svn clone \
    --prefix svn/ \
    --stdlayout \
    svn://username@$repo-host/$repo-name/$path \
    $git_repo_name
change into the git repo, and there run
git checkout svn/versioned-release
git svn branch new-versioned-release
This will result in the following history on the SVN server:
🍒---💩---💩---💩--1🍒--1🍒---💩--1🍒---💩---💩--1🍒--1🍒--1🍒---💩 trunk
  \
   \
    2🍒--2🍒--2🍒--2🍒--2🍒 versioned-release
                          \
                           \
                            3⭐️ new-versioned-release
Now I’d run
git checkout svn/new-versioned-release
git checkout -b new-versioned-release
# resulting in the following **local** history:
#
# 🍒---💩---💩---💩--1🍒--1🍒---💩--1🍒---💩---💩--1🍒--1🍒--1🍒---💩 master (tracks 'svn/trunk')
#   \
#    \
#     2🍒--2🍒--2🍒--2🍒--2🍒--3⭐️ new-versioned-release (tracks 'svn/new-versioned-release')
This is the foundation for achieving what I wanted.
There is one additional commit, because branching in SVN doesn’t work the same way as in Git: creating a branch always means a new revision (a.k.a. commit), and that’s where the 3⭐️ comes from. It doesn’t really matter, but it’s there.
I can now git cherry-pick all the 1🍒s, ending up with this local history:
🍒---💩---💩---💩--1🍒--1🍒---💩--1🍒---💩---💩--1🍒--1🍒--1🍒---💩 master (tracks 'svn/trunk')
  \
   \
    2🍒--2🍒--2🍒--2🍒--2🍒--3⭐️--1🍒--1🍒--1🍒--1🍒--1🍒--1🍒 new-versioned-release (tracks 'svn/new-versioned-release')
When I now git svn dcommit while sitting on new-versioned-release in git, the history on the SVN server looks like what I wanted to end up with:
🍒---💩---💩---💩--1🍒--1🍒---💩--1🍒---💩---💩--1🍒--1🍒--1🍒---💩 trunk
  \
   \
    2🍒--2🍒--2🍒--2🍒--2🍒 versioned-release
                          \
                           \
                            3⭐️--1🍒--1🍒--1🍒--1🍒--1🍒--1🍒 new-versioned-release
The only difference being that additional 3⭐️ from creating the third SVN branch.

How to detect code change frequency?

I am working on a program written by several folks with largely varying skill level. There are files in there that have never changed (and probably never will, as we're afraid to touch them) and others that are changing constantly.
I wonder, are there any tools out there that would look at the entire repo history (git) and produce an analysis of how frequently a given file changes? Or a package? Or the project?
It would be valuable to recognize that (for example) we spent 25% of our time working on a particular set of packages, which would be indicative of that code's fragility, as compared with code that "just works".
If you're looking for an open-source solution, I'd probably consider starting with gitstats and looking at extending it by grabbing file logs and aggregating that data.
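If you only need raw per-file commit counts before reaching for a full tool, a rough one-liner in the same spirit (whole history, no rename detection) already gives a ranking:
# count how many commits touched each file, most-churned first
git log --pretty=format: --name-only | grep -v '^$' | sort | uniq -c | sort -rn | head -20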
I'd have a look at NChurn:
NChurn is a utility that helps assess the churn level of the files in
your repository. Churn can help you detect which files are changed the
most over their lifetime. This helps identify potential bug hives and
improper design. The best thing to do is to plug NChurn into your build
process and store the history of each run. Then, you can plot the
evolution of your repository's churn.
I wrote something that we use to visualize this information successfully.
https://github.com/bcarlso/defect-density-heatmap
Take a look at the project and you can see what the output looks like in the readme.
You can do what you need by first getting a list of files that have changed in each commit from Git.
~ $ git log --pretty="format:" --name-only | grep -v ^$ > file-changes.txt
~ $ for i in `cat file-changes.txt | cut -d"." -f1,2 | uniq`; do num=`cat file-changes.txt | grep $i | wc -l`; if (( $num > 1 )); then echo $num,0,$i; fi; done | heatmap > results.html
This will give you a tag cloud in which files that churn more show up larger.
I suggest using a command like
git log --follow -p file
That will give you all the changes that happened to the file in its history (including renames). If you want to get the number of commits that changed the file, then on a UNIX-based OS you can do:
git log --follow --format=oneline Gemfile | wc -l
You can then create a bash script to apply this to multiple files with the name aside.
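For example, a minimal (and slow on large repositories) sketch of such a script could loop over the tracked files:
# commit count per currently tracked file, using --follow as above
git ls-files | while read -r f; do
    echo "$(git log --follow --format=oneline -- "$f" | wc -l) $f"
done | sort -rn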
Hope it helped !
Building on a previous answer, I suggest the following script to parse all project files:
#!/bin/sh
cd "$1"
find . -path ./.git -prune -o -name "*" -exec sh -c 'git log --follow --format=oneline $1 | wc -l | awk "{ print \$1,\"\\t\",\"$1\" }" ' {} {} \; | sort -nr
cd ..
If you save the script as file_churn.sh, you can parse your git project directory by calling:
> ./file_churn.sh project_dir
Hope it helps.

How to ignore certain files when branching / checking out?

I'd like to compare a few files from the bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but due to the binary blob in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files, it takes ages to get just the text files. I've already downloaded 555MB and it's still counting.
Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:
file size
file extension
file name (include only debian/ for example)
I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.
As far as I'm aware, no. You're downloading the branch history, not just the individual files. And each file is an integral part of the branch's history.
On the bright side, you only have to check it out once. Unless those binary files change, they'll be skipped the next time you pull from Launchpad.
Depending on the branch's history, you may be able to cut down on the download size if you use a lightweight checkout (bzr checkout --lightweight). But of course, that may come back and bite you later, as it means you won't get a local copy of the branch, only the checked-out files. So it'll work much like SVN, where every operation has to go through the server. And as long as you don't need to look at the branch history, or commit your changes, that should serve you just fine, I believe.
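For what it's worth, that would look something like this (untested against this particular branch; note that the current tree, including the binary blobs it contains, is still downloaded, only the branch history is skipped):
# working tree only, no local branch history
bzr checkout --lightweight lp:ubuntu/nvidia-graphics-drivers nvidia-lw
cd nvidia-lw/debian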
I ended up doing some dirty grepping on the HTTP response, since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not provide enough information to me.
The Bash script below relies on the workings of Launchpad's front end, Loggerhead. It recursively downloads from a given URL. Currently, it ignores *.run files. Save it as bzrdl in a directory available from $PATH and run it with bzrdl http://launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/debian/. All files will be saved in the current directory; be sure that it's empty to avoid conflicts.
#!/bin/bash
max_retries=5
rooturl="$1"
if ! [[ $rooturl =~ /$ ]]; then
    echo "Usage: ${0##*/} URL"
    echo "URL must end with a slash. Example URL:"
    echo "http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/"
    exit 1
fi
tmpdir="$(mktemp -d)"
target="$(pwd)"
# used for holding HTTP response before extracting data
tmp="$(mktemp)"
# url_filter reads download URLs from stdin (piped)
url_filter() {
    grep -v '\.run$'
}
get_files_from_dir() {
    local slash=/
    local dir="$1"
    # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
    local storedir="${dir//$slash/.d${slash}}"
    mkdir -p "$tmpdir/$storedir" "$target/$dir"
    local i subdir
    for ((i=0; i<$max_retries; i++ )); do
        if wget -O "$tmp" "$rooturl$dir"; then
            # store file list
            grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
                grep '^<a' | cut -d '"' -f 2 | url_filter \
                > "$tmpdir/$storedir/files"
            IFS=$'\n'
            for subdir in $(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
                grep -F '<a ' | rev | cut -d / -f 2 | rev); do
                IFS=$' \t\n'
                get_files_from_dir "$dir$subdir/"
            done
            return
        fi
    done
    echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
}
download_files() {
    local slash=/
    local dir="$1"
    # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
    local storedir="${dir//$slash/.d${slash}}"
    local done=false
    local subdir
    cd "$tmpdir/$storedir"
    for ((i=0; i<$max_retries; i++)); do
        if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
            done=true
            break
        fi
    done
    $done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
    for subdir in *.d; do
        download_files "$dir${subdir%%.d}/"
    done
}
get_files_from_dir ''
# make *.d expand to nothing if no directories are found
shopt -s nullglob
download_files ''
echo "TMP dir: $tmpdir"
echo "Errors : $(wc -l "$tmpdir/errors" 2>/dev/null | cut -d ' ' -f 2 || echo 0)"
The temporary directory and file are not removed afterwards; that must be done manually. Any errors (failures to download) will be written to $tmpdir/errors.
It's confirmed to work with:
bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric/files/head:/debian/
Feel free to correct any mistakes or add improvements.
There is no way to selectively check out a specific directory from a Bazaar branch at the moment, although we do have plans to add such support in the future.
There is definitely too much traffic for the clone you are doing, considering the size of the branch. It's probably a bug in the client implementation.
Here on bzr 2.4 it is still quite slow but not too bad (60s):
localhost:/tmp% bzr branch http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric
Most recent Ubuntu Oneiric version: 275.09.07-0ubuntu1
Packaging branch status: CURRENT
Branched 37 revision(s).
From the log:
[11866] 2011-07-31 00:56:57.007 INFO: Branched 37 revision(s).
56.786 Transferred: 5335kB (95.8kB/s r:5314kB w:21kB)