How do I transfer small files quickly over the network with zstd?

As the question states, I want to back up many small files and send them via ssh to a destination. Does rsync speed things up significantly compared with tar?

This works quite well and is significantly faster than gzip.
Push (Upload)
tar -c --zstd -f - src_dir | ssh user@dest_addr "cd dest_dir && tar -x --zstd -f -"
This does the following:
Creates a tar archive of src_dir, compresses it with zstd, and writes it to STDOUT
Connects via ssh, piping STDOUT over the network
On the remote host, reads the data from STDIN and extracts it into dest_dir
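The three steps above can be wrapped in a small shell function. This is a sketch under stated assumptions: the name push_dir, the COMPRESS variable, and passing the transport as a "runner" command are all hypothetical, not part of tar or ssh. Passing sh -c instead of ssh user@dest_addr runs the receiving half locally, which is handy for a dry run.

```shell
#!/bin/sh
# push_dir SRC DEST RUNNER...   (hypothetical helper)
#   SRC    : local directory to send
#   DEST   : existing directory on the receiving side
#   RUNNER : command that runs a shell string on the receiving side,
#            e.g. "ssh user@dest_addr"; "sh -c" runs it locally instead.
# COMPRESS (hypothetical) picks the compressor; tar -I needs one that
# accepts -d for decompression, which zstd and gzip both do.
push_dir() {
    src=$1
    dest=$2
    shift 2
    comp=${COMPRESS:-zstd}
    # Sender: archive and compress to stdout (-f -).
    # Receiver: read the stream from stdin and extract it inside DEST.
    tar -c -I "$comp" -f - "$src" |
        "$@" "cd '$dest' && tar -x -I '$comp' -f -"
}
```

For example, push_dir src_dir dest_dir ssh user@dest_addr. DEST is wrapped in single quotes for the remote shell, so it must not itself contain a single quote.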
Custom zstd flags
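This uses level 19, the highest level available without --ultra (the default level is 3), and multithreading (-T0 lets zstd pick the number of threads from the detected cores). Level 19 is far slower than the default, so it mainly pays off on slow links.
tar -c -I "zstd -19 -T0" -f - src_dir | ssh user@dest_addr "cd dest_dir && tar -x --zstd -f -"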
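Whether a high level is worth it depends on the data and the link speed. A rough way to decide is to measure the compressed size at a few levels; the helper below is a hypothetical sketch (compare_levels and COMPRESS are made-up names) that prints one "LEVEL BYTES" line per level. Prefix the call with time to see the CPU cost as well.

```shell
#!/bin/sh
# compare_levels DIR LEVEL...   (hypothetical helper)
# Prints "LEVEL BYTES" for each level: the size of DIR's tar stream after
# compression at that level. COMPRESS defaults to zstd and is left unquoted
# on purpose so extra flags such as "zstd -T0" are split into words.
compare_levels() {
    dir=$1
    shift
    comp=${COMPRESS:-zstd}
    for level in "$@"; do
        size=$(tar -cf - "$dir" | $comp -"$level" -c | wc -c | tr -d ' ')
        echo "$level $size"
    done
}
```

For example, compare_levels src_dir 1 3 9 19 shows how much each step up in level actually saves on your own files.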
With progress
tar -c --zstd -f - src_dir | pv --timer --rate | ssh user@dest_addr "cd dest_dir && tar -x --zstd -f -"
Pull (Download)
ssh user@src_addr "tar --zstd -cf - src_dir" | tar -x --zstd -f - --directory dest_dir
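The pull direction is the mirror image: the remote tar writes the archive to stdout and the local tar extracts it. Again a hypothetical sketch; pull_dir, COMPRESS and the runner argument are assumptions, with sh -c standing in for ssh user@src_addr in a local dry run.

```shell
#!/bin/sh
# pull_dir SRC DEST RUNNER...   (hypothetical helper)
#   SRC    : directory on the sending side (relative to where RUNNER starts,
#            i.e. the login directory for ssh)
#   DEST   : existing local directory to extract into
#   RUNNER : e.g. "ssh user@src_addr"; "sh -c" runs the sender locally.
pull_dir() {
    src=$1
    dest=$2
    shift 2
    comp=${COMPRESS:-zstd}
    # Remote side: archive and compress SRC to stdout.
    # Local side: decompress from stdin and extract into DEST (-C).
    "$@" "tar -c -I '$comp' -f - '$src'" |
        tar -x -I "$comp" -f - -C "$dest"
}
```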

Related

PID recv: short read in CRIU

I am receiving a "PID recv: short read" error while using lazy-pages migration with CRIU.
At the source, I run the following command:
memhog -r1000 64m
cd /tmp/dump
sudo -H -E criu dump -t $(pidof memhog) -D /tmp/dump --lazy-pages --address 10.237.23.102 --port 1234 --shell-job --display-stats -vvvv -o d.log
Then, in a separate terminal on the source machine itself:
scp -r /tmp/dump/ dst:/tmp/
Now, on the destination machine I start the daemon:
cd /tmp/dump
criu lazy-pages --page-server --address $(gethostip -d src) --port 1234 --display-stats -vvvvv
And finally, the restore command:
cd /tmp/dump
criu restore -D /tmp/dump/ --shell-job --lazy-pages -vvvv --display-stats -o restore.log -vvvv
The error is thrown by the lazy server daemon on the destination machine.
Furthermore, it works fine with the memhog binary installed from the numactl package, but not with memhog built from source.
Any suggestions for solving this will be appreciated.
Update: Solved. See the answer below.
Found the issue:
I was building memhog separately on the two machines, so the build-IDs of the two binaries did not match. Solution: build it on one machine and then just scp the binary over to the other machine.
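One way to confirm the fix took effect is to check that both machines really run the same binary. The sketch below is a hypothetical helper (same_binary is a made-up name) that compares SHA-256 checksums through a runner command such as ssh dst. Identical files necessarily carry the same build-ID; to look at the build-ID itself, readelf -n FILE prints it as a "Build ID:" note.

```shell
#!/bin/sh
# same_binary LOCAL_PATH REMOTE_PATH RUNNER...   (hypothetical helper)
# Succeeds (exit 0) only if both files exist and have the same SHA-256.
#   RUNNER : e.g. "ssh dst"; "sh -c" compares against a local file instead.
same_binary() {
    local_path=$1
    remote_path=$2
    shift 2
    a=$(sha256sum < "$local_path" | cut -d' ' -f1)
    b=$("$@" "sha256sum < '$remote_path'" | cut -d' ' -f1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, same_binary ./memhog /tmp/memhog ssh dst, where both paths are placeholders for wherever the binary lives on each machine.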

Streaming stdout from remote shell call

I have a read-only remote filesystem that stores logs.
I use ssh -t to run grep queries on these logs. Sometimes the queries take too long and the ssh connection times out.
Is there some way to stream the stdout back and keep the ssh connection alive?
Example command:
ssh -t my-host.com "cd /path/to/my/folder ; find ./ -name '*' -print0 | xargs -0 -n1 -P8 zgrep -B 5 -H 'My search string'" > search_result.txt
Thanks
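A possible approach, sketched under assumptions: ssh already streams the remote stdout back while the command runs, and output redirected to a local file does not need a terminal, so -t can be dropped. If the timeout is caused by the connection sitting idle while zgrep scans (an assumption; the question doesn't say), OpenSSH's ServerAliveInterval and ServerAliveCountMax options make the client send keepalive messages. search_logs is a hypothetical wrapper around the question's own pipeline; sh -c can stand in for ssh to try it locally.

```shell
#!/bin/sh
# search_logs DIR PATTERN RUNNER...   (hypothetical helper)
# Runs the question's find | xargs | zgrep pipeline through RUNNER. Results
# reach the local side while the search is still running, in chunks, since
# grep buffers output that is not going to a terminal. PATTERN is wrapped in
# single quotes for the remote shell, so it must not contain a single quote.
search_logs() {
    dir=$1
    pattern=$2
    shift 2
    # -type f passes only regular files to zgrep; -P8 runs up to eight zgrep
    # processes at once, so results from different files may interleave.
    "$@" "cd '$dir' && find . -type f -print0 | xargs -0 -n1 -P8 zgrep -B 5 -H '$pattern'"
}

# Example against a real host: the client sends a keepalive after 30 s of
# silence and gives up after 6 unanswered ones (host and path are the
# question's placeholders):
#   search_logs /path/to/my/folder 'My search string' \
#       ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=6 my-host.com \
#       > search_result.txt
```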

TAR over two hops

I need to create a tar archive and ship it to my local folder.
If I can create the tar file, I can easily copy it to my local folder using scp.
The problem is the first step: creating the tar file on the remote server, which is accessible only through another server (a bastion server).
Here is the command I'm currently using:
timestamp="20160226-085856"
ssh bastion_server -t ssh remote_server "sudo su -c \"cp -r /etc/nginx /home/ubuntu/backup/nginx_26Feb && cd /home/ubuntu/backup && tar -C /home/ubuntu/backup -cf backup_nginx-$timestamp.tar ./nginx_26Feb\" "
Here is the error i am getting:
su: invalid option -- 'r'
Usage: su [options] [LOGIN]
Any help here would be great.
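Give it a try without the fancy sudo su -c. The su error itself comes from quoting: each ssh hop strips one level of quotes, so by the time the command reaches remote_server, su receives cp's -r as one of its own options. Give each hop its own level of quoting and run the whole chain as root with sudo sh -c:
ssh -t bastion_server "ssh -t remote_server 'sudo sh -c \"cp -r /etc/nginx \
/home/ubuntu/backup/nginx_26Feb && cd /home/ubuntu/backup && \
tar -C /home/ubuntu/backup -cf backup_nginx-$timestamp.tar ./nginx_26Feb\"'"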
Or, better, set up a proper two-hop ~/.ssh/config:
Host bastion
Hostname bastion_server
Host remote
Hostname remote_server
ProxyCommand ssh -W %h:%p bastion
and then just run
ssh remote "sudo su -c 'cp -r /etc/nginx /home/ubuntu/backup/nginx_26Feb \
&& cd /home/ubuntu/backup && tar -C /home/ubuntu/backup -cf \
backup_nginx-$timestamp.tar ./nginx_26Feb'"
With the bastion handled by the config, a single level of quoting is all the command needs.
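Since the goal is only to get the tarball into a local folder, the copy, tar and scp steps can also be collapsed into one stream: let tar on remote_server write the archive to stdout and redirect it into a local file. A sketch under assumptions: fetch_tar and SUDO are hypothetical names; with OpenSSH 7.3 or later, ssh -J bastion_server remote_server jumps through the bastion without a config file, and the ssh remote alias from the config above works the same way. No terminal is allocated (a tty would mangle the binary stream), so SUDO=sudo only works where sudo doesn't ask for a password.

```shell
#!/bin/sh
# fetch_tar OUT DIR RUNNER...   (hypothetical helper)
#   OUT    : local file to write the tar archive to
#   DIR    : absolute path of the directory to archive on the remote host
#   RUNNER : e.g. "ssh -J bastion_server remote_server" or "ssh remote";
#            "sh -c" archives a local directory instead (dry run).
# SUDO (hypothetical, empty by default) is prepended to the remote tar,
# e.g. SUDO=sudo for root-only files; sudo must not prompt, as there is no tty.
fetch_tar() {
    out=$1
    dir=$2
    shift 2
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    # The remote tar writes to stdout (-f -); the local shell saves the stream.
    "$@" "${SUDO:-} tar -C '$parent' -cf - '$name'" > "$out"
}
```

For example, fetch_tar "backup_nginx-$timestamp.tar" /etc/nginx ssh -J bastion_server remote_server.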

Utilizing multi core for tar+gzip/bzip compression/decompression

I normally compress using tar zcvf and decompress using tar zxvf (using gzip out of habit).
I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.
Is there any way I can utilize the unused cores to make it faster?
You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.
For example use:
tar -c --use-compress-program=pigz -f tar.file dir_to_zip
You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz
By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.
tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz
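Because pigz writes ordinary gzip output, a script can use it when it is installed and fall back to gzip otherwise. A minimal sketch; make_targz is a hypothetical name, and using every core reported by nproc is an assumption you may want to lower on a busy machine.

```shell
#!/bin/sh
# make_targz OUT PATH...   (hypothetical helper)
# Writes a gzip-compressed tar of PATH... to OUT, using pigz on all
# available cores when it is installed and single-threaded gzip otherwise.
make_targz() {
    out=$1
    shift
    if command -v pigz >/dev/null 2>&1; then
        tar cf - "$@" | pigz -9 -p "$(nproc)" > "$out"
    else
        tar cf - "$@" | gzip -9 > "$out"
    fi
}
```

Either way the result is a regular .tar.gz, so tar xzf extracts it with or without pigz on the receiving machine.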
Common approach
tar has an option for this:
-I, --use-compress-program PROG
filter through PROG (must accept -d)
You can use a multithreaded version of the archiver or compressor utility.
The most popular multithreaded compressors are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive
The compressor must accept -d. If your replacement utility lacks this option, or you need to specify additional options, use pipes instead (adding options as necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz
The single-threaded and multithreaded versions are compatible: you can compress with the multithreaded version and decompress with the single-threaded one, and vice versa.
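The "must accept -d" contract is easy to satisfy with a wrapper script: tar runs the program with no arguments when creating an archive and with -d when extracting, and in both cases the data flows from stdin to stdout. The sketch below writes a hypothetical wrapper named myfilter that simply delegates to gzip; swap in any compressor whose own flags differ.

```shell
#!/bin/sh
# Create a throwaway -I filter ("myfilter" is a hypothetical name) in a
# temporary directory and print its path.
filter_dir=$(mktemp -d)
cat > "$filter_dir/myfilter" <<'EOF'
#!/bin/sh
# tar -x calls us with -d; tar -c calls us with no arguments.
case $1 in
    -d) exec gzip -dc ;;
    *)  exec gzip -9c ;;
esac
EOF
chmod +x "$filter_dir/myfilter"
echo "$filter_dir/myfilter"
```

Use it as tar -I "$filter_dir/myfilter" -cf OUTPUT_FILE.tar.gz paths_to_archive, and with the same -I plus -xf to extract. Giving a path that contains a slash means the script does not need to be on PATH.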
p7zip
To use p7zip for compression, you need a small shell script like the following:
#!/bin/sh
case $1 in
-d) 7za -txz -si -so e;;
*) 7za -txz -si -so a .;;
esac 2>/dev/null
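Save it as 7zhelper.sh and make it executable (chmod +x 7zhelper.sh). tar looks the helper up on PATH, not in the current directory, so pass it with a path. Example usage:
$ tar -I ./7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I ./7zhelper.sh -xf OUTPUT_FILE.tar.7z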
xz
Regarding multithreaded XZ support: if you are running XZ Utils 5.2.0 or above, you can use multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0").
This is a fragment of the man page for version 5.1.0alpha:
Multithreaded compression and decompression are not implemented yet, so this
option has no effect for now.
However, threaded decompression will not work for files that haven't also been compressed with threading enabled. From the man page for version 5.2.2:
Threaded decompression hasn't been implemented yet. It will only work
on files that contain multiple blocks with size information in
block headers. All files compressed in multi-threaded mode meet this
condition, but files compressed in single-threaded mode don't even if
--block-size=size is used.
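Both ways of enabling threads can be sketched as small functions (xz_tar_env and xz_tar_prog are hypothetical names); -T0 asks xz for as many threads as there are cores and needs XZ Utils 5.2.0 or later. On the decompression side, XZ Utils 5.4.0 and later do implement threaded decompression, subject to the multi-block condition quoted above.

```shell
#!/bin/sh
# xz_tar_env OUT PATH...  : threading via the XZ_DEFAULTS environment variable
# xz_tar_prog OUT PATH... : threading via tar -I with an explicit xz option
# (both helper names are hypothetical)
xz_tar_env() {
    out=$1
    shift
    # XZ_DEFAULTS is inherited by the xz process that tar -J starts.
    XZ_DEFAULTS="-T0" tar -cJf "$out" "$@"
}
xz_tar_prog() {
    out=$1
    shift
    tar -I "xz -T0" -cf "$out" "$@"
}
```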
Recompiling with replacement
If you build tar from source, you can configure it with these options:
--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip
After recompiling tar with these options you can check the output of tar's help:
$ tar --help | grep "lbzip2\|plzip\|pigz"
-j, --bzip2 filter the archive through lbzip2
--lzip filter the archive through plzip
-z, --gzip, --gunzip, --ungzip filter the archive through pigz
You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:
tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/
If you want to have more flexibility with filenames and compression options, you can use:
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec \
tar -P --transform='s#/my/path/##g' -cf - {} + | \
pigz -9 -p 4 > myarchive.tar.gz
Step 1: find
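find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec
This command looks for the files you want to archive, in this case the *.sql and *.log files under /my/path/ (find also descends into subdirectories). The parentheses group the -o alternatives; without them, -type f would apply only to the *.sql test and -exec only to the *.log test, so the .sql files would never reach tar. Add as many -o -name "pattern" inside the parentheses as you want.
-exec runs the next command, tar, on the results of find.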
Step 2: tar
tar -P --transform='s#/my/path/##g' -cf - {} +
--transform applies a sed-style substitution to member names. Here it strips /my/path/ from each name, so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you'd lose the benefits of find: all files of the directory would be included.
-P (--absolute-names) tells tar to keep absolute paths, so it doesn't print the warning "Removing leading `/' from member names". The leading / will be removed by --transform anyway.
-cf - tells tar to create an archive and write it to stdout; the file name is given later by the shell redirection.
{} + passes all the files that find found to tar as arguments, batching as many as fit on one command line.
Step 3: pigz
pigz -9 -p 4
Use as many parameters as you want.
In this case -9 is the compression level and -p 4 is the number of cores dedicated to compression.
If you run this on a heavily loaded web server, you probably don't want to use all available cores.
Step 4: archive name
> myarchive.tar.gz
Finally, the shell redirects pigz's output to the archive file.
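If find matches more files than fit on one command line, -exec ... {} + starts tar more than once, and the output is several archives back to back, which tar only reads past the first one with --ignore-zeros (-i). A variant that sidesteps this feeds the file list to a single tar with --null -T -; it is a swapped-in technique rather than the answer's own, and archive_sql_logs is a hypothetical name. Running find from inside the base directory also removes the need for -P and --transform, since member names come out relative (./...).

```shell
#!/bin/sh
# archive_sql_logs BASE OUT   (hypothetical helper)
# Archives every *.sql and *.log file under BASE into the gzip file OUT,
# with member names relative to BASE. Uses pigz when installed, else gzip.
archive_sql_logs() {
    base=$1
    out=$2
    if command -v pigz >/dev/null 2>&1; then
        z="pigz -9 -p 4"
    else
        z="gzip -9"
    fi
    # find prints NUL-terminated names; a single tar reads them from stdin
    # (--null -T -), so a long file list never splits the archive.
    # $z is unquoted on purpose so its flags become separate words.
    ( cd "$base" &&
        find . -type f \( -name '*.sql' -o -name '*.log' \) -print0 |
        tar --null -T - -cf - ) | $z > "$out"
}
```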
A relatively new (de)compression tool you might want to consider is zstandard. It does an excellent job of utilizing spare cores, and it has made some great trade-offs when it comes to compression ratio vs. (de)compression time. It is also highly tweakable depending on your compression ratio needs.
Here is an example of tar with the modern zstd compressor, since good examples of this were hard to find:
Use apt to install the zstd and pv utilities on Ubuntu
Compress multiple files and folders into one archive (the zstd command alone compresses files individually and cannot bundle them)
Display progress using pv, which shows the total bytes passed to zstd and the throughput in real time
Use all physical cores with -T0
Set compression level higher than the default with -8
Display the resulting wall clock and CPU time used after the operation is finished using time
apt install zstd pv
DATA_DIR=/path/to/my/folder/to/compress
TARGET=/path/to/my/archive.tar.zst
time (cd "$DATA_DIR" && tar -cf - * | pv | zstd -T0 -8 -o "$TARGET")
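The answer only shows the compression side. A hypothetical pair of helpers (zst_pack and zst_unpack are made-up names) can round-trip a directory; it drops pv and time for brevity and archives . instead of * so that hidden files are included too. zstd needs no level or thread options to decompress.

```shell
#!/bin/sh
# zst_pack DIR ARCHIVE    : tar DIR's contents and compress with zstd
# zst_unpack ARCHIVE DEST : decompress and extract into DEST
zst_pack() {
    # "." (not "*") also picks up dotfiles; -q hides zstd's progress output.
    (cd "$1" && tar -cf - .) | zstd -q -T0 -8 -o "$2"
}
zst_unpack() {
    mkdir -p "$2"
    # -d decompresses, -c writes to stdout for tar to read (-f -).
    zstd -q -dc "$1" | tar -xf - -C "$2"
}
```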

is it possible to take a large number of files & tar/gzip and stream them on-the-fly?

I have a large number of files which I need to backup, problem is there isn't enough disk space to create a tar file of them and then upload it offsite. Is there a way of using python, php or perl to tar up a set of files and upload them on-the-fly without making a tar file on disk? They are also way too large to store in memory.
I always do this just via ssh:
tar czf - FILES/* | ssh me@someplace "tar xzf -"
This way, the files all end up unpacked on the other machine. Alternatively,
tar czf - FILES/* | ssh me@someplace "cat > foo.tgz"
puts them in an archive on the other machine, which is what you actually wanted.
You can pipe the output of tar over ssh:
tar zcvf - testdir/ | ssh user@domain.com "cat > testdir.tar.gz"
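Since the local side never writes the archive to disk, it is worth checking the copy that lands on the far side. A sketch under assumptions: stream_to_file is a hypothetical name, and the check simply runs gzip -t on the remote file after cat has written it, so the function's exit status reports whether the stored archive decompresses cleanly (a failure of the local tar itself is not reported). sh -c again stands in for ssh user@domain.com.

```shell
#!/bin/sh
# stream_to_file SRC REMOTE_FILE RUNNER...   (hypothetical helper)
#   SRC         : local file or directory to archive
#   REMOTE_FILE : path of the .tar.gz to create on the other side
#   RUNNER      : e.g. "ssh user@domain.com"; "sh -c" writes locally instead
stream_to_file() {
    src=$1
    remote_file=$2
    shift 2
    # tar streams the archive to stdout; the far side saves it with cat and
    # then re-reads it with gzip -t to verify it decompresses without errors.
    tar czf - "$src" |
        "$@" "cat > '$remote_file' && gzip -t '$remote_file'"
}
```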