How to efficiently transfer remotely millions of files - ssh

One machine holds almost five million small (50 KB) text files. I need to send them to another machine on our LAN. I tried
scp *.txt remote_machine:
since the SSH connection between them is set up to be passwordless. But a new connection is established for each file, so it is painfully slow.
I therefore wonder what the best strategy for doing this would be.

You can make a files.tar.gz file before transferring.

I'm not sure whether scp is multi-threaded. If it isn't, try something like this, running the commands in parallel (for example, each in its own terminal), to make better use of all cores/CPUs and the network bandwidth:
scp [A-M]*.txt remote_machine:
scp [N-Z]*.txt remote_machine:
scp [0-9]*.txt remote_machine:
...
Of course the patterns to use depend on the naming of your files.
Instead of scp you could also use rsync with the same approach.

SSH also means encryption and decryption overhead, so why not use FTP for the transfer if security is not a real concern?
Moreover, if your network is slow, you can archive the data and decompress it after transferring.
In short, issue the following commands to make the archive:
cd /path/to/transfer/folder
tar -cvpjf /tmp/transfer.tar.bz2 .
To transfer it, open an FTP session to the remote machine and upload the archive in binary mode:
ftp remotemachine
binary
put /tmp/transfer.tar.bz2 transfer.tar.bz2
On the receiving side, issue these commands in the folder where you want everything extracted:
cd /path/where/to/extract
tar -xvpjf ~/transfer.tar.bz2
rm ~/transfer.tar.bz2
You can definitely automate this; I automated the process to transfer big chunks of data to a target.

Tar and gzip the files together, then untar them at the other end.
tar cz *.txt | ssh remote_machine 'tar xz'
SSH itself slows things down. If you are copying between hosts on the same network and security isn't an issue, it may be better to use a raw TCP connection.
remote_machine$ nc -l 3333 -q 1 | tar xz
local_machine$ tar cz *.txt >/dev/tcp/remote_machine/3333
If you want to use a different port number from 3333, make sure you change it in both lines.
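With several million files, a shell glob such as *.txt can exceed the system's argument-length limit before tar even starts. A minimal sketch of one way around that, assuming a tar that supports --null and -T (GNU tar and bsdtar do): let find produce the file list and feed it to tar on stdin. The function name pack_txt and the temporary directories are made up for illustration, and the second tar stands in for "ssh remote_machine 'tar xz'" so the sketch runs on one machine.

```shell
#!/bin/sh
# pack_txt SRC_DIR: write a gzip-compressed tar stream of SRC_DIR's *.txt files to stdout.
# (pack_txt is a hypothetical helper name, not part of any tool.)
pack_txt() {
    # find emits NUL-terminated names on a pipe, so the file list is never
    # passed as command-line arguments and cannot overflow the argument limit.
    ( cd "$1" && find . -maxdepth 1 -type f -name '*.txt' -print0 | tar -czf - --null -T - )
}

# Local demonstration with a small number of files.
src=$(mktemp -d)
dst=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
    printf 'file %s\n' "$i" > "$src/f$i.txt"
    i=$((i + 1))
done

# In real use the right-hand side would be: ssh remote_machine 'tar xzf -'
pack_txt "$src" | tar -xzf - -C "$dst"

copied=$(find "$dst" -type f -name '*.txt' | wc -l)
echo "copied $((copied)) files"
```

The same pipeline should work with the nc variant above by replacing the right-hand side of the pipe.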

Related

Where do I find the spec for scp -t?

So I have discovered I can do the following with scp via stdin:
Directory creation
scp -tr .
stdin -> D0755 0 <directory_name>
stdin -> \x00
File writing
scp -tr .
stdin -> C<filemode, eg. 0744> <file_size_in_bytes> <filename>
stdin -> actual file bytes
stdin -> \x00
In the man pages I can't find any mention of this, nor have I had luck with googling. Where do I find the spec for these various commands: file creation, directory creation? What else can I do? I'm curious where this is defined. I'm struggling to find where I even found this code initially. Why is there no mention of the -t flag in the scp man page?
scp transfers files by opening an SSH connection to a remote server and invoking another copy of scp on the remote system. The two scp instances communicate through a simple protocol; one instance sends commands and file data; the other instance acts on the commands to store the files on its local system.
The -t option tells scp that it was invoked by another scp instance and that it'll be receiving files. There is another option -f which tells scp that it was invoked by another instance and should send files.
You'd have to ask the OpenSSH developers why the options aren't documented. One might presume that it's because they're not intended for use by humans and so not really part of the user interface.
The best online descriptions of the SCP protocol that I know of are:
How the SCP protocol works
Ruby net-scp source code
OpenSSH scp source code
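The file-writing exchange described in the question can be sketched as a small helper that prints the bytes for one file: a C header line, the file contents, and a terminating zero byte. This is only an illustration of the message layout as the question describes it; the helper name scp_file_record and the fixed mode 0644 are assumptions, and a real sender also reads the receiver's one-byte acknowledgements, which this sketch ignores.

```shell
#!/bin/sh
# scp_file_record FILE: print the byte sequence for sending one file,
# following the layout quoted in the question:
#   C<mode> <size_in_bytes> <filename>\n, then the file bytes, then \0
# (hypothetical helper; the mode is hard-coded to 0644 as an assumption)
scp_file_record() {
    file=$1
    # $(( ... )) strips the padding some wc implementations add
    size=$(( $(wc -c < "$file") ))
    printf 'C0644 %s %s\n' "$size" "$(basename "$file")"
    cat "$file"
    printf '\0'
}

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/hello.txt"

# Measure the record instead of printing it, since it contains a NUL byte.
record_len=$(scp_file_record "$tmp/hello.txt" | wc -c)
echo "record is $((record_len)) bytes"
```

Under these assumptions, piping the output of scp_file_record into "scp -t <target_directory>" is the manual equivalent of what the question does by hand.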

Best way to copy files from Docker volume on remote server to local host?

I've got,
My laptop
A remote server I can SSH into which has a Docker volume inside of which are some files I'd like to copy to my laptop.
What is the best way to copy these files over? Bonus points for using tools like rsync that are fast, can resume, and show progress, and for not writing any temporary files.
Note: my user on the remote server does not have permission to just scp the data straight out of the volume mount in /var/lib/docker, although I can run any containers on there.
Having this problem, I created dvsync which uses ngrok to establish a tunnel that is being used by rsync to copy data even if the machine is in a private VPC. To use it, you first start the dvsync-server locally, pointing it at the source directory:
$ docker run --rm -e NGROK_AUTHTOKEN="$NGROK_AUTHTOKEN" \
--mount source=MY_DIRECTORY,target=/data,readonly \
quay.io/suda/dvsync-server
Note, you need the NGROK_AUTHTOKEN which can be obtained from ngrok dashboard. Then start the dvsync-client on the target machine:
docker run -e DVSYNC_TOKEN="$DVSYNC_TOKEN" \
--mount source=MY_TARGET_VOLUME,target=/data \
quay.io/suda/dvsync-client
The DVSYNC_TOKEN can be found in the dvsync-server output; it is a base64-encoded private key plus tunnel info. Once the data has been copied, the client will exit.
I'm not sure about the best way of doing so, but if I were you I would run a container sharing the same volume (read-only, as it seems you just want to download the files within the volume) and download them from there.
This container could run rsync, as you wish.
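A minimal sketch of the "container sharing the same volume" idea, using tar instead of rsync so that nothing is written to a temporary file: a throwaway container mounts the volume read-only and streams it to stdout, and ssh carries that stream to the laptop. The names my_server, my_volume, ./restore and the helper volume_tar_cmd are placeholders, the alpine image is an assumption, and unlike rsync this cannot resume an interrupted transfer.

```shell
#!/bin/sh
# volume_tar_cmd VOLUME: print the command to run on the remote server.
# It mounts VOLUME read-only at /data in a temporary container and writes
# a tar stream of its contents to stdout.
# (hypothetical helper; "alpine" is just an assumed small image that ships tar)
volume_tar_cmd() {
    printf 'docker run --rm -v %s:/data:ro alpine tar -C /data -cf - .' "$1"
}

# Show the command that would be sent to the server.
echo "remote command: $(volume_tar_cmd my_volume)"

# Real use, not executed in this sketch (my_server and ./restore are placeholders):
#   mkdir -p ./restore
#   ssh my_server "$(volume_tar_cmd my_volume)" | tar -xf - -C ./restore
```

No -t flag is passed to docker run or ssh, so the tar stream is not altered by a terminal layer.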

Is it possible to copy files over ssh during an active connection?

Very often I need to copy a file from a server I am connected to over ssh. Let's say a MySQL dump. What I do is
local $ ssh my_server
server$ mysqldump database >> ~/export.sql
server$ exit
local $ scp my_server:~/export.sql .
I know ssh has a lot of features like ssh-agent, port forwarding, etc., and I was wondering if there is any way to execute scp FROM the server to copy to my local computer (without creating another ssh connection).
First of all, this question is off-topic here, so it will be migrated or put on hold early.
Anyway, I described the solution for similar problem here, but it should help you: https://stackoverflow.com/a/33266538/2196426
Summed up, yes it is possible using remote port forwarding:
[local] $ ssh -R 2222:xyz-VirtuaBox:22 remote
[remote]$ scp -P 2222 /home/user/test xyz@localhost:/home/user
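If the goal is only to get a command's output (such as the dump in the question) onto the local machine, a single ssh invocation can run the command remotely and redirect its stdout into a local file, with no scp step at all. The sketch below is an illustration under assumptions: fetch_output and REMOTE are made-up names, and "sh -c" is used as a local stand-in for "ssh my_server" so the example runs without a server.

```shell
#!/bin/sh
# REMOTE is the command prefix used to run something on the server.
# "sh -c" is a local stand-in so the sketch runs anywhere;
# for real use set REMOTE='ssh my_server' (my_server as in the question).
REMOTE=${REMOTE:-sh -c}

# fetch_output REMOTE_COMMAND LOCAL_FILE: run the command on the server
# and save whatever it prints to a file on the local machine.
# (hypothetical helper name)
fetch_output() {
    # $REMOTE is intentionally unquoted so "ssh my_server" splits into two words
    $REMOTE "$1" > "$2"
}

out=$(mktemp)
# Stand-in for: fetch_output 'mysqldump database' export.sql
fetch_output 'printf "%s\n" "-- pretend this is a SQL dump"' "$out"
cat "$out"
```

Written directly, the real-world equivalent is a one-liner of the form: ssh my_server 'mysqldump database' > export.sql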

How to copy a directory from local machine to remote machine

I am using ssh to connect to a remote machine.
Is there a way I can copy an entire directory from a local machine to the remote machine?
I found this link to do it the other way round i.e copying from remote machine to local machine.
The easiest way is scp:
scp -r /path/to/local/storage user@remote.host:/path/to/copy
rsync is best for when you want to update versions where it has been previously copied.
If that doesn't work, rerun with -v and see what the error is.
It is very easy with rsync as well (the -a flag makes it recurse into the directory and preserve attributes):
rsync -a /path/to/local/storage user@remote.host:/path/to/copy
I recommend the usage of rsync over scp, because it is highly likely that you will one day need a feature that rsync offers and then you benefit from your experience with the tool.
This worked for me:
rsync -avz -e 'ssh' /path/to/local/dir user@remotehost:/path/to/remote/dir
If you have to use an SSH port other than 22:
rsync -avzh -e 'ssh -p sshPort' /my/local/dir/ remoteUser@host:/path/to/remote/dir
If your remote server uses the default port 22:
rsync -avzh /my/local/dir/ remoteUser@host:/path/to/remote/dir
This worked for me.
Follow this link for detailed understanding.
We can do this with the scp command, for example (remotehost is the server's hostname or IP address):
scp -r /path/to/local/machine/directory user@remotehost:/path/to/server/directory
In case of a different port
By default, SCP operates on port 22, but this can be overridden by supplying the -P flag followed by the port number, for example:
scp -P 8563 -r /path/to/local/machine/directory user@remotehost:/path/to/server/directory
NOTE: the -r flag copies a directory's files and folders recursively instead of a single file.

Smart way to copy multiple files from different paths using scp [duplicate]

This question already has answers here:
scp or sftp copy multiple files with single command
I would like to know an easy way to use scp to copy files and folders that are present in different paths on my file system. The SSH destination server requests a password and I cannot put this in configuration files. I know that scp doesn't have a password parameter that I could supply from a script, so for now I must copy each file or directory one by one, writing my password every time.
In addition to the already mentioned glob:
you can use {,} to define alternative paths or path parts in a single statement,
e.g.: scp user@host:/{PATH1,PATH2} DESTINATION
From this site:
Open the master connection
SSHSOCKET=~/.ssh/myUsername@targetServerName
ssh -M -f -N -o ControlPath="$SSHSOCKET" myUsername@targetServerName
Open and close other connections without re-authenticating as you like
scp -o ControlPath="$SSHSOCKET" myUsername@targetServerName:remoteFile.txt ./
Close the master connection
ssh -S "$SSHSOCKET" -O exit myUsername@targetServerName
It's intuitive, safer than creating a key pair, faster than creating a compressed file and worked for me!
If you can express all the names of the files you want to copy from the remote system using a single glob pattern, then you can do this in a single scp command. This usage will only support a single destination folder on the local system for all files though. For example:
scp 'RemoteHost:/tmp/[abc]*/*.tar.gz' .
copies all of the files from the remote system which are named (something).tar.gz and which are located in subdirectories of /tmp whose names begin with a, b, or c. The single quotes protect the glob pattern from being interpreted by the shell on the local system.
If you cannot express all the files you want to copy as a single glob pattern and you still want the copy to be done using a single command (and a single SSH connection which will ask for your password only once), then you can either:
Use a different command than scp, like sftp or rsync, or
Open an SSH master connection to the remote host and run several scp commands as slaves of that master. The slaves will piggyback on the master connection which stays open throughout and won't ask you for a password. Read up on master & slave connections in the ssh manpage.
Create a key pair and copy the public key to the server side.
ssh-keygen -t rsa
Append the contents of ~/.ssh/id_rsa.pub to the file ~/.ssh/authorized_keys of the server-side user. You will no longer need to type the password.
However, be careful! Anybody who can access your local account can then ssh to the server without a password as well.
Alternatively, if you cannot use public key authentication, you may add the following configuration to SSH (either to ~/.ssh/config or as the appropriate command-line arguments):
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r
ControlPersist 2m
With this config, the SSH connection will be kept open for 2 minutes so you'll only need to type the password the first time.
This post has more details on this feature.