Bazaar: How to export just the changed files of a specific revision?

I wonder if there is any way to export just the files that changed in a specific revision.
E.g.: I have a branch with three files:
file.php
file.js
file.css
Only file.js changed in the last commit.
How do I use the export command to export just the changed file (file.js) and prevent exporting the others?
Is there any plugin or external third-party tool?

Using bzr export you can specify a single directory to export, but not individual files.
As an alternative, you can get the contents of a file at some past revision like this:
bzr cat -r REV path/to/file > file.rREV
You can get the list of changed files at some past revision with the one-liner:
bzr diff -c REV | grep ^===
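For the branch above, the output would look something like this (illustrative):
=== modified file 'file.js'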
To wrap it up, here's a complete one-liner that does just what you asked for: export just the modified files of some specific revision REV into a directory called EX:
bzr diff -cREV | grep '^=== modified file ' | sed -e "s/[^']*//" -e "s/'//g" |\
while read fname; do echo $fname; mkdir -p EX/"$(dirname "$fname")";\
bzr cat -rREV "$fname" > EX/"$fname"; done
It loops over the modified files in revision REV, prepares the export directory EX with all parent directories needed to save the file preserving the path, and finally gets the file with bzr cat and writes it at the correct relative path inside EX.
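The same logic reads more clearly as a small script; this is just a sketch of the one-liner above, with REV and EX as placeholders:
#!/bin/sh
REV=123   # revision to export (placeholder)
EX=EX     # export directory
bzr diff -c "$REV" | grep '^=== modified file ' | sed -e "s/[^']*//" -e "s/'//g" |
while read -r fname; do
    echo "$fname"
    mkdir -p "$EX/$(dirname "$fname")"   # recreate parent directories inside EX
    bzr cat -r "$REV" "$fname" > "$EX/$fname"
done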

Related

Extract huge tar.gz archives from S3 without copying archives to a local system

I'm looking for a way to extract a huge dataset (18 TB+, found here: https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations). I need the process to be fast (i.e. I don't want to spend twice the time by first copying and then extracting the files), and I don't want the archives to take up extra space, not even one 20 GB+ archive.
Any thoughts on how to achieve that?
If you can arrange to pipe the data straight into tar, it can uncompress and extract it without needing a temporary file.
Here is an example. First, create a tar file to play with:
$ echo abc >one
$ echo def >two
$ tar cvf test.tar one two
one
two
$ gzip test.tar
Remove the test files
$ rm one two
$ ls one two
ls: cannot access one: No such file or directory
ls: cannot access two: No such file or directory
Now extract the contents by piping the compressed tar file into the tar command.
$ cat test.tar.gz | tar xzvf -
one
two
$ ls one two
one two
The only part missing now is how to download the data and pipe it into tar. Assuming you can access the URL with wget, you can get it to send the data to stdout. So you end up with this:
wget -qO- https://yourdata | tar xzvf -
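Since the archives live in S3, the same trick works with the AWS CLI, which can stream an object to stdout when you give - as the destination; bucket and key here are placeholders:
aws s3 cp s3://your-bucket/archive.tar.gz - | tar xzvf -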

Move a file, change permissions and rename it keeping the same extension

Using zsh 5.2 on Fedora 24 workstation.
I want to be able to do the following programmatically:
move an image file (which can have a jpg/jpeg/png/JPG/PNG extension)
from /tmp/folder1 to ~/Pictures
This file will have the same few initial characters --- prefix111.jpg OR prefix222.png, etc.
rename the file such that samefilename.JPG becomes 20161013.jpg
20161013 is today's date in yyyymmdd format
Note that the extension becomes lowercase
And JPEG or jpeg becomes jpg
change the permissions of the moved file to 644
All at one go.
If there are multiple prefix* files, the command should just fail silently.
I would initially like to do it at the command prompt, with an option to add a cron job later. I mean, will the same zsh command/script work in cron?
I am sure this is doable. However, with my limited shell knowledge, I could only achieve:
mv /tmp/folder1/prefix-*.JPG ~/Pictures/$(date +'%Y%m%d').jpg
Problems with my approach are many. It does not handle capitalization, does not take care of different extensions and does not address the permission issue.
How about this:
#!/bin/sh
FILES="/tmp/folder1/prefix*.jpg /tmp/folder1/prefix*.jpeg /tmp/folder1/prefix*.png /tmp/folder1/prefix*.JPG /tmp/folder1/prefix*.PNG"
# fail silently unless exactly one file matches
if [ "$(ls $FILES 2>/dev/null | wc -l)" -ne 1 ]; then
    exit 1
fi
# pick the target extension: any PNG variant stays png, everything else becomes jpg
if ls $FILES 2>/dev/null | grep -qi '\.png$'; then
    SUFF=png
else
    SUFF=jpg
fi
DEST=$HOME/Pictures/$(date +'%Y%m%d').$SUFF
mv $FILES "$DEST" 2>/dev/null
chmod 644 "$DEST"
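Since the question mentions zsh specifically, here is a zsh-native sketch using case-insensitive globbing and parameter modifiers; the paths and the prefix pattern are assumptions taken from the question:
#!/usr/bin/env zsh
setopt extended_glob null_glob
files=( /tmp/folder1/prefix*.(#i)(jpg|jpeg|png) )  # case-insensitive extension match
(( $#files == 1 )) || exit 1                       # fail silently unless exactly one match
ext=${${files[1]:e}:l}                             # extension, lowercased
[[ $ext == jpeg ]] && ext=jpg                      # normalize jpeg -> jpg
dest=~/Pictures/$(date +%Y%m%d).$ext
mv -- $files[1] $dest && chmod 644 $dest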

s3cmd copy files preserving path

Is there a way to copy files to an S3 bucket while preserving the file path?
This is the example:
1. I produce a list of files that differ between bucket1 and bucket2 using s3cmd sync --dry-run
The list looks like this:
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/definition/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/nodetypes.cnd
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/properties.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/app-store/.content.xml
I need to process this list to upload only the files in the list to a new location in the bucket (e.g. s3://bucket/diff/), BUT with the full path as shown in the list.
A simple loop like this:
diff_file_list=$(s3cmd -c s3cfg sync --dry-run s3://BUCKET/20150831/PROD s3://BUCKET/20150831/DEV | awk '{print $2}')
for f in $diff_file_list; do
s3cmd -c s3cfg cp $f s3://BUCKET/20150831/DIFF/
done
does not work; it produces this:
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/definition/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/nodetypes.cnd copied to s3://BUCKET/20150831/DIFF/nodetypes.cnd
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/properties.xml copied to s3://BUCKET/20150831/DIFF/properties.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/origin-store/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
Thanks,
Short answer: no, it is not! That is because the paths in S3 buckets are not actually directories/folders; an S3 bucket has no such concept of structure, even if various tools present it that way (including s3cmd, which is really confusing...).
So the "path" is actually a key prefix (although s3cmd sync to a local destination knows how to translate this prefix into a directory structure on your filesystem).
For a bash script the solution is:
1. Create a file listing all the paths from an s3cmd sync --dry-run command (basically a list of diffs) => file1
2. Copy that file and use sed to modify the paths as needed:
sed 's/\(^s3.*\)PROD/\1DIFF/' file1 > file2
3. Merge the files so that line 1 of file1 is continued by line 1 of file2, and so on:
paste file1 file2 > final.txt
4. Read final.txt line by line in a loop, and use each line as a set of two parameters for a copy or sync command:
while IFS='' read -r line || [[ -n "$line" ]]; do
s3cmd -c s3cfg sync $line
done < "final.txt"
Notes:
1. $line in the s3cmd call must not be quoted; if it is, the sync command will complain that it received only one parameter... of course!
2. The [[ -n "$line" ]] is there so that read does not fail if the last line has no trailing newline.
Unfortunately, Boto could not help more, so if you need something similar in Python you would do it pretty much the same way...
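Putting the steps together, the whole thing could look like this (a sketch; the bucket names and the s3cfg path are taken from the question):
#!/bin/bash
# 1. list the differing files
s3cmd -c s3cfg sync --dry-run s3://BUCKET/20150831/PROD s3://BUCKET/20150831/DEV \
    | awk '{print $2}' > file1
# 2. rewrite PROD -> DIFF in the destination paths
sed 's/\(^s3.*\)PROD/\1DIFF/' file1 > file2
# 3. pair source and destination on one line
paste file1 file2 > final.txt
# 4. copy each pair; $line is intentionally unquoted so it expands to two arguments
while IFS='' read -r line || [[ -n "$line" ]]; do
    s3cmd -c s3cfg sync $line
done < final.txt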

SSH unzip AND change the filename (or get the filename contained therein)

/usr/bin/curl http://somewebsite.com/foo.zip -o 4232.zip
unzip -o -q -L 4232.zip
chown 508 /home/me/www/inbound/data/??????.xml
rm -f 4232.zip
I am using this SSH script to download a zip file called foo.zip, rename it to 4232.zip, then extract the contents.
My problem is that the zip file contains a single file whose name is constantly changing. I cannot see a flag for unzip that lets me rename the file(s) inside the zip.
How can I rename the mystery file inside? There is really only ever one file in my immediate project.
-or-
How can I get that filename so I can change ownership and use it in a PHP script that will process it later on...
Any help would be appreciated.
This will tell you the name of the .xml file in the zip:
unzip -l z.zip | grep -o '[^/]*\.xml'
If you want to extract and rename the xml file:
unzip -p z.zip \*.xml > NEWFILE.xml
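Tying that back into the original script, a sketch (assuming, as described, that the zip contains exactly one .xml file; the -d destination and dropping -L are assumptions):
/usr/bin/curl http://somewebsite.com/foo.zip -o 4232.zip
# grab the name of the single .xml entry from the listing
fname=$(unzip -l 4232.zip | grep -o '[^/ ]*\.xml' | head -n 1)
unzip -o -q 4232.zip -d /home/me/www/inbound/data
chown 508 "/home/me/www/inbound/data/$fname"
rm -f 4232.zip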

How to ignore certain files when branching / checking out?

I'd like to compare a few files from the bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but due to the binary blob in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files, it takes ages to get just the text files. I've already downloaded 555MB and it's still counting.
Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:
file size
file extension
file name (include only debian/ for example)
I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.
As far as I'm aware, no. You're downloading the branch history, not just the individual files. And each file is an integral part of the branch's history.
On the bright side, you only have to check it out once. Unless those binary files change, they'll be skipped the next time you pull from Launchpad.
Depending on the branch's history, you may be able to cut down on the download size by using a lightweight checkout (bzr checkout --lightweight). But of course, that may come back to bite you later, since it means you won't get a local copy of the branch, only the checked-out files. So it'll work much like SVN, where every operation has to go through the server. As long as you don't need to look at the branch history or commit your changes, that should serve you just fine, I believe.
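For example (the target directory name nvidia-lw is arbitrary):
bzr checkout --lightweight lp:ubuntu/nvidia-graphics-drivers nvidia-lw
cd nvidia-lw/debian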
I ended up doing some dirty grepping of the HTTP response, since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not give me enough information.
The Bash script below relies on the workings of Launchpad's front end, Loggerhead. It recursively downloads from a given URL. Currently, it ignores *.run files. Save it as bzrdl in a directory in your $PATH and run it with bzrdl http://launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/debian/. All files will be saved in the current directory; be sure that it's empty to avoid conflicts.
#!/bin/bash
max_retries=5
rooturl="$1"
if ! [[ $rooturl =~ /$ ]]; then
echo "Usage: ${0##*/} URL"
echo "URL must end with a slash. Example URL:"
echo "http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/"
exit 1
fi
tmpdir="$(mktemp -d)"
target="$(pwd)"
# used for holding HTTP response before extracting data
tmp="$(mktemp)"
# url_filter reads download URLs from stdin (piped)
url_filter() {
grep -v '\.run$'
}
get_files_from_dir() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
mkdir -p "$tmpdir/$storedir" "$target/$dir"
local i subdir
for ((i=0; i<$max_retries; i++ )); do
if wget -O "$tmp" "$rooturl$dir"; then
# store file list
grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
grep '^<a' | cut -d '"' -f 2 | url_filter \
> "$tmpdir/$storedir/files"
IFS=$'\n'
for subdir in $(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
grep -F '<a ' | rev | cut -d / -f 2 | rev); do
IFS=$' \t\n'
get_files_from_dir "$dir$subdir/"
done
return
fi
done
echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
}
download_files() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
local done=false
local subdir
cd "$tmpdir/$storedir"
for ((i=0; i<$max_retries; i++)); do
if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
done=true
break
fi
done
$done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
for subdir in *.d; do
download_files "$dir${subdir%%.d}/"
done
}
get_files_from_dir ''
# make *.d expand to nothing if no directories are found
shopt -s nullglob
download_files ''
echo "TMP dir: $tmpdir"
echo "Errors : $(wc -l "$tmpdir/errors" 2>/dev/null | cut -d ' ' -f 2 || echo 0)"
The temporary directory and file are not removed afterwards; that must be done manually. Any errors (failures to download) will be written to $tmpdir/errors.
It's confirmed to work with:
bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric/files/head:/debian/
Feel free to correct any mistakes or add improvements.
There is no way to selectively check out a specific directory from a Bazaar branch at the moment, although we do have plans to add such support in the future.
There is definitely too much traffic for the clone you are doing, considering the size of the branch. It's probably a bug in the client implementation.
Here on bzr 2.4 it is still quite slow but not too bad (60s):
localhost:/tmp% bzr branch http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric
Most recent Ubuntu Oneiric version: 275.09.07-0ubuntu1
Packaging branch status: CURRENT
Branched 37 revision(s).
From the log:
[11866] 2011-07-31 00:56:57.007 INFO: Branched 37 revision(s).
56.786 Transferred: 5335kB (95.8kB/s r:5314kB w:21kB)