Gitlab Tee: collapsed multi-line command & unrecognized option: append - gitlab-ci

Inside my GitLab CI file I have a block which is copied from the "Publish npm packages" instructions:
before_script:
  - |
    {
      echo "@${CI_PROJECT_ROOT_NAMESPACE}:registry=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/npm/"
      echo "${CI_API_V4_URL#https?}/projects/${CI_PROJECT_ID}/packages/npm/:_authToken=\${CI_JOB_TOKEN}"
    } | tee --append .npmrc
When I try to run this in Alpine Linux I'm getting:
$ { # collapsed multi-line command
tee: unrecognized option: append
BusyBox v1.31.1 () multi-call binary.
Usage: tee [-ai] [FILE]...
Copy stdin to each FILE, and also to stdout
-a Append to the given FILEs, don't overwrite
-i Ignore interrupt signals (SIGINT)

The reason is simple: there are two implementations of tee.
BusyBox, which Alpine uses, has tee but it doesn't provide an --append option. It does provide -a (short for append), which is defined as "append to the given FILEs, don't overwrite".
GNU coreutils provides its own tee too. It has the --append option you're making use of here, also defined as "append to the given FILEs, do not overwrite", and as a shorthand GNU tee provides the alias -a.
So in short, if you want something that is compatible with Alpine/BusyBox as well as distros that ship GNU tee, use -a (supported by both) instead of --append (supported only by GNU tee).
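Applied to the snippet from the question, the portable version only changes the tee flag:
before_script:
  - |
    {
      echo "@${CI_PROJECT_ROOT_NAMESPACE}:registry=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/npm/"
      echo "${CI_API_V4_URL#https?}/projects/${CI_PROJECT_ID}/packages/npm/:_authToken=\${CI_JOB_TOKEN}"
    } | tee -a .npmrc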

Related

RSYNC and folder hierarchy

After making a full forensic copy of a hard drive using dd, I would like to keep up with changes between the original and the backup disk, so I started using rsync.
Whenever I run
rsync -a -v -n --progress /media/drive1 /media/drive2
the command starts listing all files contained in drive1. However, only a couple of them have changed since I did the dd copy.
Trying that on a single folder
rsync -a -v -n --progress /media/drive1/folder /media/drive2
works fine and just displays the new files in that folder - those which are not contained in /media/drive2/folder.
However, executing the command on the level of both volumes
rsync -a -v -n --progress /media/drive1 /media/drive2
does not account for the differences, contrary to the readily available documentation, but instead lists all files which are already on both drives.
What is my mistake?
The way rsync treats its source and destination paths is easy to get wrong. When you use the command:
rsync -a -v -n --progress /media/drive1 /media/drive2
...it tries to sync the drive1 folder into drive2; that is, it creates and populates /media/drive2/drive1. When you add "/folder" to the source path, it works as expected because then it's trying to sync with /media/drive2/folder, which is what you want.
Fortunately, the solution is easy: add "/" to the end of the source path, which tells it to sync the contents of drive1 into drive2, rather than the folder itself:
rsync -a -v -n --progress /media/drive1/ /media/drive2
BTW, I'd recommend keeping the -n (--dry-run) flag until you're sure it's doing what you want before running it "for real". You'll probably also have to delete /media/drive2/drive1.
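To make the trailing-slash behaviour concrete, here's a small sketch (paths as in the question; -n keeps it a dry run):
rsync -a -n /media/drive1 /media/drive2    # copies the directory itself, creating /media/drive2/drive1/...
rsync -a -n /media/drive1/ /media/drive2   # copies its contents directly into /media/drive2/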

Use env variable to represent SSH options

These two ssh options:
ssh -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0
I am interested in setting those using environment variables. Is there a way to do that? Something like:
export StrictHostKeyChecking=no
export NumberOfPasswordPrompts=0
but that's of course not quite right
The "proper" way to do this is within your ~/.ssh/config file, where you can make those global for all connections or you can restrict it by host (or a few more advanced things). I'm going to assume you can't do that but that you do still have the ability to alter your ~/.bashrc or whatever.
You can solve this by putting this in your ~/.bashrc or ~/.zshrc (or by running these lines before ssh):
SSH_ARGS=( -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0 )
alias ssh='ssh "${SSH_ARGS[@]}"'
If you want explicit environment variables like what you're proposing in your question (e.g. $StrictHostKeyChecking) then you'll have to make a really ugly alias or function to convert them all to the final ssh call. Here's a solution that should work in bash but won't work in POSIX shell:
ssh() {
    local SSH_VAR SSH_VAR_VALUE
    for SSH_VAR in StrictHostKeyChecking NumberOfPasswordPrompts; do # expand this
        SSH_VAR_VALUE="${!SSH_VAR}" # bash indirect expansion
        if [ -n "$SSH_VAR_VALUE" ]; then
            set -- -o "$SSH_VAR=$SSH_VAR_VALUE" "$@"
        fi
    done
    command ssh "$@"
}
The above example only supports the two SSH options named in the question. The for line will likely have to get really long, as it must explicitly name every option you want to support (hence the # expand this comment).
This loops over every supported SSH variable and uses bash's indirect expansion to check whether it is set in the environment (and not empty). If it is, -o followed by the option name and its value is prepended to the start of the argument list ($@).
If you use zsh, you'll need zsh's parameter expansion flag P, replacing bash's ${!SSH_VAR} with zsh's ${(P)SSH_VAR}. Other shells may need to replace that whole "bash indirect expansion" line with eval "SSH_VAR_VALUE=\"\${$SSH_VAR}\"" (watch your quoting there, or that line can be dangerous).
Invoking command ssh prevents the recursion we'd get from calling ssh from within a function of the same name. command ssh ignores the function and actually runs the true ssh command.
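For example (hypothetical host; plain assignments work too, since the function runs in the same shell):
export StrictHostKeyChecking=no
export NumberOfPasswordPrompts=0
ssh user@example.com
# actually runs: ssh -o NumberOfPasswordPrompts=0 -o StrictHostKeyChecking=no user@example.com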

Scripts installed by the deb package have wrong prefix

Building our own deb packages, we've run into the issue of having to manually patch some scripts so they get the proper prefix.
In particular,
We're building mono
We're using official tarballs.
The scripts that end up with the wrong prefix are: mcs, xbuild, nunit-console4, etc.
An example of a wrong script:
#!/bin/sh
exec /root/7digital-mono/mono/bin/mono \
    --debug $MONO_OPTIONS \
    /root/7digital-mono/mono/lib/mono/2.0/nunit-console.exe "$@"
What should be the correct end result:
#!/bin/sh
exec /usr/bin/mono \
    --debug $MONO_OPTIONS \
    /usr/lib/mono/2.0/nunit-console.exe "$@"
The workaround we're using in our build-package script before calling dpkg-buildpackage:
sed -i s,`pwd`/mono,/usr,g $TARGET_DIR/bin/mcs
sed -i s,`pwd`/mono,/usr,g $TARGET_DIR/bin/xbuild
sed -i s,`pwd`/mono,/usr,g $TARGET_DIR/bin/nunit-console
sed -i s,`pwd`/mono,/usr,g $TARGET_DIR/bin/nunit-console2
sed -i s,`pwd`/mono,/usr,g $TARGET_DIR/bin/nunit-console4
Now, what is the CORRECT way to fix this? Full debian package creation scripts here.
Disclaimer: I know there are preview packages of Mono 3 here! But those don't work for Squeeze.
The proper way is to not call ./configure --prefix=$TARGET_DIR.
That tells all the binaries/scripts/... that the installed files will end up in ${TARGET_DIR}, whereas they really should end up in /usr.
Instead, you can use the DESTDIR variable (as in make install DESTDIR=${TARGET_DIR}) to change (prefix) the installation target at install time: the files will end up in ${TARGET_DIR}/${prefix} but will only have ${prefix} "built in".
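A minimal sketch of that flow (standard autotools; adjust the configure flags to whatever the mono build actually needs):
./configure --prefix=/usr
make
make install DESTDIR=${TARGET_DIR}
The generated scripts then have /usr baked in, while the installed files land under ${TARGET_DIR}/usr/... ready for packaging.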

Utilizing multi core for tar+gzip/bzip compression/decompression

I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit).
I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.
Is there any way I can utilize the unused cores to make it faster?
You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.
For example use:
tar -c --use-compress-program=pigz -f tar.file dir_to_zip
You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz
By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.
tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz
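Decompression can also be piped through pigz, although gzip decompression itself is essentially single-threaded (pigz only uses extra threads for reading, writing and checksumming):
pigz -dc archive.tar.gz | tar xf -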
Common approach
There is an option for the tar program:
-I, --use-compress-program PROG
filter through PROG (must accept -d)
You can use a multithreaded version of an archiver or compressor utility.
The most popular multithreaded archivers are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive
The archiver must accept -d. If your replacement utility doesn't have this parameter and/or you need to specify additional parameters, then use pipes (add parameters if necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz
The input and output of the single-threaded and multithreaded versions are compatible. You can compress using the multithreaded version and decompress using the single-threaded version, and vice versa.
p7zip
For compression with p7zip you need a small shell script like the following:
#!/bin/sh
case $1 in
-d) 7za -txz -si -so e;;
*) 7za -txz -si -so a .;;
esac 2>/dev/null
Save it as 7zhelper.sh. Here is an example of usage:
$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z
xz
Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0").
This is a fragment of the man page for the 5.1.0alpha version:
Multithreaded compression and decompression are not implemented yet, so this
option has no effect for now.
However, this will not work for decompression of files that haven't also been compressed with threading enabled. From the man page for version 5.2.2:
Threaded decompression hasn't been implemented yet. It will only work
on files that contain multiple blocks with size information in
block headers. All files compressed in multi-threaded mode meet this
condition, but files compressed in single-threaded mode don't even if
--block-size=size is used.
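For example, a compression call using that variable could look like this (file names are placeholders):
XZ_DEFAULTS="-T 0" tar -cJf archive.tar.xz paths_to_archive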
Recompiling with replacement
If you build tar from sources, then you can recompile with the following parameters (the combined configure call is shown after the list):
--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip
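Combined, the configure invocation would look something like this (a sketch; the replacement tools must already be installed):
./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip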
After recompiling tar with these options you can check the output of tar's help:
$ tar --help | grep "lbzip2\|plzip\|pigz"
-j, --bzip2 filter the archive through lbzip2
--lzip filter the archive through plzip
-z, --gzip, --gunzip, --ungzip filter the archive through pigz
You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:
tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/
If you want to have more flexibility with filenames and compression options, you can use:
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec \
tar -P --transform='s#/my/path/##g' -cf - {} + | \
pigz -9 -p 4 > myarchive.tar.gz
Step 1: find
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec
This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log. Add as many -o -name "pattern" clauses as you want inside the \( ... \) group (the grouping is needed so that -type f and -exec apply to both patterns).
-exec will execute the next command using the results of find: tar
Step 2: tar
tar -P --transform='s#/my/path/##g' -cf - {} +
--transform is a simple string replacement parameter. It strips the path of the files from the archive, so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you'd lose the benefit of find: all files of the directory would be included.
-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.
-cf - tells tar to create the archive and write it to stdout; the archive file name is specified later, after pigz.
{} + expands to all the files that find found previously.
Step 3: pigz
pigz -9 -p 4
Use as many parameters as you want.
In this case -9 is the compression level and -p 4 is the number of cores dedicated to compression.
If you run this on a heavily loaded web server, you probably don't want to use all available cores.
Step 4: archive name
> myarchive.tar.gz
Finally.
A relatively newer (de)compression tool you might want to consider is zstandard. It does an excellent job of utilizing spare cores, and it has made some great trade-offs when it comes to compression ratio vs. (de)compression time. It is also highly tweak-able depending on your compression ratio needs.
Here is an example of tar with the modern zstd compressor, as finding good examples for this one was difficult:
Install the zstd and pv utilities via apt (for Ubuntu)
Compress multiple files and folders (zstd command alone can only do single files)
Display progress using pv - shows the total bytes compressed and compression speed GB/sec real-time
Use all physical cores with -T0
Set compression level higher than the default with -8
Display the resulting wall clock and CPU time used after the operation is finished using time
apt install zstd pv
DATA_DIR=/path/to/my/folder/to/compress
TARGET=/path/to/my/archive.tar.zst
time (cd $DATA_DIR && tar -cf - * | pv | zstd -T0 -8 -o $TARGET)
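To extract such an archive again, tar can drive zstd directly; a sketch (tar passes -d to the program given with -I, which zstd understands; the extraction path is a placeholder):
tar -I zstd -xf $TARGET -C /path/to/extract/to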

How to ignore certain files when branching / checking out?

I'd like to compare a few files from the bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but due to the binary blob in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files, it takes ages to get just the text files. I've already downloaded 555MB and it's still counting.
Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:
file size
file extension
file name (include only debian/ for example)
I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.
As far as I'm aware, no. You're downloading the branch history, not just the individual files. And each file is an integral part of the branch's history.
On the bright side, you only have to check it out once. Unless those binary files change, they'll be skipped the next time you pull from Launchpad.
Depending on the branch's history, you may be able to cut down on the download size if you use a lightweight checkout (bzr checkout --lightweight). But of course, that may come back and bite you later, as it means you won't get a local copy of the branch, only the checked-out files. So it'll work much like SVN, where every operation has to go through the server. And as long as you don't need to look at the branch history, or commit your changes, that should serve you just fine, I believe.
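For reference, the lightweight checkout would be something along these lines (a sketch; the local directory name is whatever bzr derives from the branch URL):
bzr checkout --lightweight lp:ubuntu/nvidia-graphics-drivers
cd nvidia-graphics-drivers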
I ended up doing some dirty grep-ing on the HTTP response since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not provide enough information to me.
The Bash script below relies on the workings of Launchpad's front-end Loggerhead. It recursively downloads from a given URL. Currently, it ignores *.run files. Save it as bzrdl in a directory available from $PATH and run it with bzrdl http://launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/debian/. All files will be saved in the current directory; make sure it's empty to avoid conflicts.
#!/bin/bash
max_retries=5
rooturl="$1"
if ! [[ $rooturl =~ /$ ]]; then
    echo "Usage: ${0##*/} URL"
    echo "URL must end with a slash. Example URL:"
    echo "http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/"
    exit 1
fi
tmpdir="$(mktemp -d)"
target="$(pwd)"
# used for holding HTTP response before extracting data
tmp="$(mktemp)"
# url_filter reads download URLs from stdin (piped)
url_filter() {
    grep -v '\.run$'
}
get_files_from_dir() {
    local slash=/
    local dir="$1"
    # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
    local storedir="${dir//$slash/.d${slash}}"
    mkdir -p "$tmpdir/$storedir" "$target/$dir"
    local i subdir
    for ((i=0; i<$max_retries; i++ )); do
        if wget -O "$tmp" "$rooturl$dir"; then
            # store file list
            grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
                grep '^<a' | cut -d '"' -f 2 | url_filter \
                > "$tmpdir/$storedir/files"
            IFS=$'\n'
            for subdir in $(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
                grep -F '<a ' | rev | cut -d / -f 2 | rev); do
                IFS=$' \t\n'
                get_files_from_dir "$dir$subdir/"
            done
            return
        fi
    done
    echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
}
download_files() {
    local slash=/
    local dir="$1"
    # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
    local storedir="${dir//$slash/.d${slash}}"
    local done=false
    local subdir
    cd "$tmpdir/$storedir"
    for ((i=0; i<$max_retries; i++)); do
        if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
            done=true
            break
        fi
    done
    $done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
    for subdir in *.d; do
        download_files "$dir${subdir%%.d}/"
    done
}
get_files_from_dir ''
# make *.d expand to nothing if no directories are found
shopt -s nullglob
download_files ''
echo "TMP dir: $tmpdir"
echo "Errors : $(wc -l "$tmpdir/errors" 2>/dev/null | cut -d ' ' -f 2 || echo 0)"
The temporary directory and file are not removed afterwards; that must be done manually. Any errors (failures to download) will be written to $tmpdir/errors.
It's confirmed to work with:
bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric/files/head:/debian/
Feel free to correct any mistakes or add improvements.
There is no way to selectively check out a specific directory from a Bazaar branch at the moment, although we do have plans to add such support in the future.
There is definitely too much traffic for the clone you are doing, considering the size of the branch. It's probably a bug in the client implementation.
Here on bzr 2.4 it is still quite slow but not too bad (60s):
localhost:/tmp% bzr branch http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric
Most recent Ubuntu Oneiric version: 275.09.07-0ubuntu1
Packaging branch status: CURRENT
Branched 37 revision(s).
From the log:
[11866] 2011-07-31 00:56:57.007 INFO: Branched 37 revision(s).
56.786 Transferred: 5335kB (95.8kB/s r:5314kB w:21kB)