How to ignore certain files when branching / checking out? - bazaar

I'd like to compare a few files from the bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but due to the binary blob in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files, it takes ages to get just the text files. I've already downloaded 555MB and it's still counting.
Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:
file size
file extension
file name (include only debian/ for example)
I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.

As far as I'm aware, no. You're downloading the branch history, not just the individual files. And each file is an integral part of the branch's history.
On the bright side, you only have to check it out once. Unless those binary files change, they'll be skipped the next time you pull from Launchpad.
Depending on the branch's history, you may be able to cut down on the download size if you use a lightweight checkout (bzr checkout --lightweight). But of course, that may come back and bite you later, as it means you won't get a local copy of the branch, only the checked-out files. So it'll work much like SVN, where every operation has to go through the server. And as long as you don't need to look at the branch history, or commit your changes, that should serve you just fine, I believe.

I ended up doing some dirty grep-ing on the HTTP response since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not provide enough information to me.
The below Bash script relies on the working of Launchpads front-end Loggerhead. It recursively downloads from a given URL. Currently, it ignores *.run files. Save it as bzrdl in a directory available from $PATH and run it with bzrdl http://launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/debian/. All files will be saved in the current directory, be sure that it's empty to avoid conflicts.
#!/bin/bash
max_retries=5
rooturl="$1"
if ! [[ $rooturl =~ /$ ]]; then
echo "Usage: ${0##*/} URL"
echo "URL must end with a slash. Example URL:"
echo "http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/"
exit 1
fi
tmpdir="$(mktemp -d)"
target="$(pwd)"
# used for holding HTTP response before extracting data
tmp="$(mktemp)"
# url_filter reads download URLs from stdin (piped)
url_filter() {
grep -v '\.run$'
}
get_files_from_dir() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
mkdir -p "$tmpdir/$storedir" "$target/$dir"
local i subdir
for ((i=0; i<$max_retries; i++ )); do
if wget -O "$tmp" "$rooturl$dir"; then
# store file list
grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
grep '^<a' | cut -d '"' -f 2 | url_filter \
> "$tmpdir/$storedir/files"
IFS=$'\n'
for subdir in $(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
grep -F '<a ' | rev | cut -d / -f 2 | rev); do
IFS=$' \t\n'
get_files_from_dir "$dir$subdir/"
done
return
fi
done
echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
}
download_files() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
local done=false
local subdir
cd "$tmpdir/$storedir"
for ((i=0; i<$max_retries; i++)); do
if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
done=true
break
fi
done
$done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
for subdir in *.d; do
download_files "$dir${subdir%%.d}/"
done
}
get_files_from_dir ''
# make *.d expand to nothing if no directories are found
shopt -s nullglob
download_files ''
echo "TMP dir: $tmpdir"
echo "Errors : $(wc -l "$tmpdir/errors" 2>/dev/null | cut -d ' ' -f 2 || echo 0)"
The temporary directory and file is not removed afterwards, that must be done manually. Any errors (failures to download) will be written to $tmpdir/errors
It's confirmed to work with:
bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric/files/head:/debian/
Feel free to correct any mistakes or add improvements.

There is no way to selectively check out a specific directory from a Bazaar branch at the moment, although we do have plans to add such support in the future.
There is definitely too much traffic for the clone you are doing, considering the size of the branch. It's probably a bug in the client implementation.
Here on bzr 2.4 it is still quite slow but not too bad (60s):
localhost:/tmp% bzr branch http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric
Most recent Ubuntu Oneiric version: 275.09.07-0ubuntu1
Packaging branch status: CURRENT
Branched 37 revision(s).
From the log:
[11866] 2011-07-31 00:56:57.007 INFO: Branched 37 revision(s).
56.786 Transferred: 5335kB (95.8kB/s r:5314kB w:21kB)

Related

Get SVN URL of removed git-svn file

I would like to track a removed file as far back in history as possible, while using git-svn on a subdirectory of the SVN repository.
Using git log --full-history -- path/to/removed_file.py, I can get see the history starting with the time the file was moved into the subdirectory I checked out using git-svn.
I can see which SVN revision that was in the git-svn commit message postfix, so I would now like to use svn log <full_url>#revision to see the rest of the history.
I know that I could use git svn info --url path/to/existing_file.py to see the required full SVN url, but what is a quick (ideally scriptable) way of getting the SVN URL of a file that is no longer in the repository?
To git, it doesn't matter much that a file foo/bar.py is removed in HEAD — as long as you have it in history, you can view every past version of it.
For clarity of concreteness, I'll take this git-svn repo from the LLVM project as an example. There, the file docs/todo.rst has been deleted in svn revision 308987, git commit fb572868… and is absent in master.
Let's first init a local clone.
$ git clone https://github.com/llvm-mirror/lnt && cd lnt
Cloning into 'lnt'...
...
$ git svn init https://llvm.org/svn/llvm-project/lnt/trunk
$ git update-ref refs/remotes/git-svn refs/remotes/origin/master
$
$ #-- ask svn info of anything to check setup and/or force laziness
$ git svn info --url README.md
Rebuilding .git/svn/refs/remotes/git-svn/.rev_map.91177308-0d34-0410-b5e6-96231b3b80d8 ...
r154126 = 3c3062527ac17b5fac440c55a3e1510d0ab8c9d9
r154135 = 82a95d29ac7d25c355fbd0898a44dc3e71a75fd8
...
r374687 = 446f9a3b651086e87684d643705273ef78045279
r374824 = 8c57bba3687ada10de5653ae46c537e957525bdb
Done rebuilding .git/svn/refs/remotes/git-svn/.rev_map.91177308-0d34-0410-b5e6-96231b3b80d8
https://llvm.org/svn/llvm-project/lnt/trunk/README.md
So it gives back the README.md URL as expected. Now let's try the case of a deleted file:
$ git svn info --url docs/todo.rst
svn: 'docs/todo.rst' is not under version control
Fails, just like you say. man git-svn says that info Does not currently support a -r/--revision argument.
OK then, let's try emulating what it does, first by hand.
https://llvm.org/svn/llvm-project/lnt/trunk/README.md?r=374824 — this is the URL for given file at given revision.
Our vanished docs/todo.rst is available at https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986 Notice the decrement: per git show fb572868 | grep git-svn-id, docs/todo.rst is already deleted in r308987 — so we request r308986.
On to scripting it... rather simple job.
git-svn-oldinfo () {
relfname="$1"
git log -n1 -- "$relfname" \
| awk '/git-svn-id:/ {sub(/#/, " ", $2); print $2}' \
| { read baseurl rev; echo "${baseurl}/${relfname}?p=$((rev-1))"; }
}
#-- test:
$ git-svn-oldinfo docs/todo.rst
https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986
Quick-n-dirty but tested — you're welcome to adjust & extend as needed.
Edit
Despite git log being a "porcelain" command (i.e. not really designed for scripting), it's quite possible to parse out the filenames from it too, if you're to query by globs like **/removed_file.py:
git-svn-oldinfo-glob () {
fileglob="$1"
git log -n1 --stat --format=oneline -- "$fileglob" \
| { read commit msg; \
read fullname _remainder_dummy; \
git cat-file -p $commit \
| tail -n1 \
| awk '/git-svn-id:/ {sub(/#/, " ", $2); print $2}' \
| { read baseurl rev; echo "${baseurl}/${fullname}?p=$((rev-1))"; } \
}
}
#-- test:
$ git-svn-oldinfo-glob '**/todo.rst'
https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986
Take it with a grain of salt: it'll probably break in hilarious ways or output garbage if the glob matches multiple files, non-removed files, files with whitespace in the name, etc.
As always, check out man git-log and customize as needed.

Move file, change permissions and rename it keeping the same extesion

Using zsh 5.2 on Fedora 24 workstation.
I want to be programatically able to:
move an image file (can have jpg/ jpeg/ png/ JPG/ PNG extensions)
from /tmp/folder1 to ~/Pictures
This file will have the same few initial characters --- prefix111.jpg OR prefix222.png, etc.
rename the file such that samefilename.JPG becomes 20161013.jpg
20161013 is today's date in yyyymmdd format
Note that the extension becomes small letters
And JPEG or jpeg becomes jpg
change the permissions of the moved file to 644
All at one go.
If there are multiple prefix* files, the command should just fail silently.
I will initially like to do it at the command prompt with an option to add a cron job later. I mean, will the same zsh command/ script work in cron?
I am sure, this is doable. However, with my limited shell knowledge, could only achieve:
mv /tmp/folder1/prefix-*.JPG ~/Pictures/$(date +'%Y%m%d').jpg
Problems with my approach are many. It does not handle capitalization, does not take care of different extensions and does not address the permission issue.
How about this:
#!/bin/sh
FILES="/tmp/folder1/prefix*.jpg /tmp/folder1/prefix*.jpeg /tmp/folder1/prefix*.png h/tmp/folder1/prefix*.JPG /tmp/folder1/prefix*.PNG"
if [ $(ls $FILES | wc -l ) -gt 1 ]; then
exit 1
fi
if [ $(ls $FILES | grep -i '\.png$') ]; then
SUFF=png
else
SUFF=jpg
fi
DEST=$HOME/Pictures/$(date +'%Y%m%d').$SUFF
mv $FILES $DEST
chmod 644 $DEST

Apache Subversion pre-commit to restrict files

I'm new to Apache SVN and I need some help to use a pre-commit script to filter which files are being upload to my repository.
I was searching a lot and found this script on another question, but it didn't work for me.
#!/bin/bash
REPOS=$1
TXN=$2
AWK=/usr/bin/awk
SVNLOOK="/usr/bin/svnlook";
#Put all the restricted formats in variable FILTER
FILTER=".(sh|xls|xlsx|exe|xlsm|XLSM|vsd|VSD|bak|BAK|class|CLASS)$"
# Figure out what directories have changed using svnlook.
FILES=`${SVNLOOK} changed -t ${REPOS} ${TXN} | ${AWK} '{ print $2 }'` > /dev/null
for FILE in $FILES; do
#Get the base Filename to extract its extension
NAME=`basename "$FILE"`
#Get the extension of the current file
EXTENSION=`echo "$NAME" | cut -d'.' -f2-`
#Checks if it contains the restricted format
if [[ "$FILTER" == *"$EXTENSION"* ]]; then
echo "Your commit has been blocked because you are trying to commit a restricted file." 1>&2
echo "Please contact SVN Admin. -- Thank you" 1>&2
exit 1
fi
done
exit 0
If I try to use svnlook changed -t repodirectory it didn't work because had a missing subcommand.
I overwrote my pre-commit.tmpl but it didn't work, can someone help me?
First - seems you incorrectly use svnlook. It should has parameters:
svnlook changed ${REPOS} -t ${TXN}
-t means 'read from transaction' and TXN - transaction name itself.
Second - not sure if I understand correctly, but hook file should has name pre-commit not pre-commit.tmpl
Third - pre-commit should has correct rights. For tests try a+rwx
update. It is not easy to obtain transaction object for tests, but you can use svnlook -r <revision> <repositiry_path> and experiment on already commited revisions.

How to detect code change frequency?

I am working on a program written by several folks with largely varying skill level. There are files in there that have never changed (and probably never will, as we're afraid to touch them) and others that are changing constantly.
I wonder, are there any tools out there that would look at the entire repo history (git) and produce analysis on how frequently a given file changes? Or package? Or project?
It would be of value to recognize that (for example) we spent 25% of our time working on a set of packages, which would be indicative or code's fragility, as compared with code that "just works".
If you're looking for an OS solution, I'd probably consider starting with gitstats and look at extending it by grabbing file logs and aggregating that data.
I'd have a look at NChurn:
NChurn is a utility that helps asses the churn level of your files in
your repository. Churn can help you detect which files are changed the
most in their life time. This helps identify potential bug hives, and
improper design.The best thing to do is to plug NChurn into your build
process and store history of each run. Then, you can plot the
evolution of your repository's churn.
I wrote something that we use to visualize this information successfully.
https://github.com/bcarlso/defect-density-heatmap
Take a look at the project and you can see what the output looks like in the readme.
You can do what you need by first getting a list of files that have changed in each commit from Git.
~ $ git log --pretty="format:" --name-only | grep -v ^$ > file-changes.txt
~ $ for i in `cat file-changes.txt | cut -d"." -f1,2 | uniq`; do num=`cat file-changes.txt | grep $i | wc -l`; if (( $num > 1 )); then echo $num,0,$i; fi; done | heatmap > results.html
This will give you a tag cloud with files that churn more will show up larger.
I suggest using a command like
git log --follow -p file
That will give you all the changes that happened to the file in the history (including renames). If you want to get the number of commits that changed the file then you can do on a UNIX-based OS :
git log --follow --format=oneline Gemfile | wc -l
You can then create a bash script to apply this to multiple files with the name aside.
Hope it helped !
Building on a previous answer I suggest the following script to parse all project files
#!/bin/sh
cd $1
find . -path ./.git -prune -o -name "*" -exec sh -c 'git log --follow --format=oneline $1 | wc -l | awk "{ print \$1,\"\\t\",\"$1\" }" ' {} {} \; | sort -nr
cd ..
If you call the script as file_churn.sh you can parse your git project directory calling
> ./file_churn.sh project_dir
Hope it helps.

Is there a curl/wget option that prevents saving files in case of http errors?

I want to download a lot of urls in a script but I do not want to save the ones that lead to HTTP errors.
As far as I can tell from the man pages, neither curl or wget provide such functionality.
Does anyone know about another downloader who does?
I think the -f option to curl does what you want:
-f, --fail
(HTTP) Fail silently (no output at all) on server errors. This is mostly done to better
enable scripts etc to better deal with failed attempts. In normal cases when an HTTP
server fails to deliver a document, it returns an HTML document stating so (which often
also describes why and more). This flag will prevent curl from outputting that and
return error 22. [...]
However, if the response was actually a 301 or 302 redirect, that still gets saved, even if its destination would result in an error:
$ curl -fO http://google.com/aoeu
$ cat aoeu
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
To follow the redirect to its dead end, also give the -L option:
-L, --location
(HTTP/HTTPS) If the server reports that the requested page has moved to a different
location (indicated with a Location: header and a 3XX response code), this option will
make curl redo the request on the new place. [...]
One liner I just setup for this very purpose:
(works only with a single file, might be useful for others)
A=$$; ( wget -q "http://foo.com/pipo.txt" -O $A.d && mv $A.d pipo.txt ) || (rm $A.d; echo "Removing temp file")
This will attempt to download the file from the remote Host. If there is an Error, the file is not kept. In all other cases, it's kept and renamed.
Ancient thread.. landed here looking for a solution... ended up writing some shell code to do it.
if [ `curl -s -w "%{http_code}" --compress -o /tmp/something \
http://example.com/my/url/` = "200" ]; then
echo "yay"; cp /tmp/something /path/to/destination/filename
fi
This will download output to a tmp file, and create/overwrite output file only if status was a 200. My usecase is slightly different.. in my case the output takes > 10 seconds to generate... and I did not want the destination file to remain blank for that duration.
NOTE: I am aware that this is an older question, but I believe I have found a better solution for those using wget than any of the above answers provide.
wget -q $URL 2>/dev/null
Will save the target file to the local directory if and only if the HTTP status code is within the 200 range (Ok).
Additionally, if you wanted to do something like print out an error whenever the request was met with an error, you could check the wget exit code for non-zero values like so:
wget -q $URL 2>/dev/null
if [ $? != 0]; then
echo "There was an error!"
fi
I hope this is helpful to someone out there facing the same issues I was.
Update:
I just put this into a more script-able form for my own project, and thought I'd share:
function dl {
pushd . > /dev/null
cd $(dirname $1)
wget -q $BASE_URL/$1 2> /dev/null
if [ $? != 0 ]; then
echo ">> ERROR could not download file \"$1\"" 1>&2
exit 1
fi
popd > /dev/null
}
I have a workaround to propose, it does download the file but it also removes it if its size is 0 (which happens if a 404 occurs).
wget -O <filename> <url/to/file>
if [[ (du <filename> | cut -f 1) == 0 ]]; then
rm <filename>;
fi;
It works for zsh but you can adapt it for other shells.
But it only saves it in first place if you provide the -O option
As alternative you can create a temporal rotational file:
wget http://example.net/myfile.json -O myfile.json.tmp -t 3 -q && mv list.json.tmp list.json
The previous command will always download the file "myfile.json.tmp" however only when the wget exit status is equal to 0 the file is rotated as "myfile.json".
This solution will prevent to overwrite the final file when a network failure occurs.
The advantage of this method is that in case that something is wrong you can inspect the temporal file and see what error message is returned.
The "-t" parameter attempt to download the file several times in case of error.
The "-q" is the quiet mode and it's important to use with cron because cron will report any output of wget.
The "-O" is the output file path and name.
Remember that for Cron schedules it's very important to provide always the full path for all the files and in this case for the "wget" program it self as well.
You can download the file without saving using "-O -" option as
wget -O - http://jagor.srce.hr/
You can get mor information at http://www.gnu.org/software/wget/manual/wget.html#Advanced-Usage