How to move files from a remote server to S3 at the command line

I have a lot of big files on a remote server and I want to move them into S3. I want to do it at the command line or with a bash script (e.g., I do NOT want to use a GUI app like Cyberduck) so that I can automate/replicate the effort.
I have tried mounting the remote server onto my local machine with OSXFUSE and sshfs and then pushing the files to S3 with s3cmd. This does work, but I keep running into errors (the connection being lost for no apparent reason, mount errors, etc.).
Is this the best way to do it? Does anyone know a better way to do it?
Thanks,
A

You can use the MinIO client (mc) to do this.
$ mc cp --recursive localDir/ s3/remoteBucket
If the network connection drops, mc gives you the option to resume the upload.
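If mc isn't configured yet, something like this should work when run on the remote server (the alias name, keys, paths, and bucket are placeholders; recent mc releases use alias set, older ones use config host add):
$ mc alias set s3 https://s3.amazonaws.com ACCESS_KEY SECRET_KEY
$ mc cp --recursive /path/to/bigfiles/ s3/remoteBucket/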

Is your remote server in EC2? Your current setup requires two copies (first pulling the data to your local machine via sshfs, then pushing it to S3 via s3cmd); if you run s3cmd on the remote server directly, you can reduce that to one.
If you want to mount S3 as a filesystem, you can also use tools like goofys or s3fs. Again, you should do that on the remote server to avoid the extra copy.
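For example, a rough sketch of both options, run on the remote server (bucket name and paths are placeholders):
# one-time setup, then sync straight to the bucket
s3cmd --configure
s3cmd sync /data/bigfiles/ s3://my-bucket/bigfiles/
# or mount the bucket with goofys and copy into it like a local disk
goofys my-bucket /mnt/s3
cp -r /data/bigfiles /mnt/s3/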

Related

CSV transfer from SFTP server to S3 via WinSCP

My work uses WinSCP for SFTP transfers. We have some data coming in this way each week, and I would like to get it into an S3 bucket. We want to automate this transfer with a cron job or something similar.
I know there are AWS tools, but they cost money and money can't be spent. We also do not have an ETL tool like Alteryx, otherwise I would use it. Nothing on the internet gives much detail about transferring files from an SFTP server to another server; mostly I find guides on transferring from a server to a local machine.
Below is the code I have found.
Can these WinSCP commands be used to transfer to an S3 bucket somehow at the 'put'? (I cannot use the generator like other posts have suggested because I do not have access to our AWS account or any buckets yet.) This is all about proving a concept.
# Connect to SFTP server using a password
open sftp://user:password@example.com/ -hostkey="ssh-rsa 2048 xxxxxxxxxxx...="
# Upload file (THIS IS WHERE I WOULD WANT S3 PATH SYNTAX)
put d:\examplefile.txt /home/user/
# Exit WinSCP
Exit
Once I have this command, we can then create a Windows scheduled task, from what I have read. This would automate getting the file where we need it, so we can do more with it than the SFTP server lets us.
If I understand the question correctly, you are asking how to transfer files directly from an SFTP server to S3 using a script running on yet another machine.
It's not possible (unless AWS has a feature for that, but then it won't be free). You have to download the files from the SFTP server and then upload them to S3.
With WinSCP scripting, you can do it with a script like:
open sftp://username:password@sftp.example.com/
get /sftp/path/*
exit
open s3://accesskey:secretkey@s3.amazonaws.com/
put * /bucket/
exit
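To automate it, you can save those commands to a script file and run it from a scheduled task via winscp.com (the paths and file names here are placeholders):
winscp.com /ini=nul /script=C:\scripts\sftp-to-s3.txt /log=C:\scripts\transfer.log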
You can mount your bucket on your server with s3fs (a FUSE filesystem) like a normal hard disk.
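A minimal sketch of such a mount, with placeholder bucket, mount point, and credentials (s3fs reads the keys from a passwd file):
echo ACCESS_KEY:SECRET_KEY > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
s3fs my-bucket /mnt/s3 -o passwd_file=~/.passwd-s3fs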

Accessing external hard drive after logging into a remote machine using ssh command

I am doing an intensive computing project with a very old C program. The program requires a library called Sun Performance Library, which is commercial software. Instead of purchasing the library myself, I am running the program by logging onto a Solaris machine in our computer lab with the ssh command, while the working directory that stores the output data is still on my local Mac.
Now a problem has occurred: the program uses a large amount of disk space to save intermediate results, and the space on my local Mac is quickly filling up (50 GB per user, as prescribed by the administrator). These results are necessary for the next stage of computing, and I cannot delete any of them before the program finally produces the output data. Therefore, I have to move the working directory to an external hard drive in order to continue. Obviously,
cd /Volumes/VOLNAME
is not the correct way to do it, because the remote machine will give me an error saying
/Volumes/VOLNAME: No such file or directory.
So, what is the correct way to do it?
sshfs recently added support for "slave mode", which allows you to do this. Assuming you have sshfs on Solaris (I'm not sure about this), the following command (run from your Mac) will do what you want:
dpipe /usr/lib/openssh/sftp-server = ssh SOLARISHOSTNAME sshfs MACHOSTNAME:/Volumes/VOLNAME MOUNTPOINT -o slave
This will result in the MOUNTPOINT directory on the server being mounted to your local external drive. Note that I'm not sure whether macOS has dpipe; if it doesn't, you can replace it with one of the equivalent solutions at How to make a bidirectional pipe between two programs?. Also, if your SFTP server binary is somewhere else, substitute its path.
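One of those equivalents is a named FIFO; a rough sketch under the same assumptions (host names, mount point, and the sftp-server path are placeholders):
mkfifo /tmp/sshfs-pipe
/usr/lib/openssh/sftp-server < /tmp/sshfs-pipe | ssh SOLARISHOSTNAME sshfs MACHOSTNAME:/Volumes/VOLNAME MOUNTPOINT -o slave > /tmp/sshfs-pipe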
The common way to mount a remote volume in Solaris is via NFS, but that usually requires root permissions.
Another approach would be to make your application read its data from stdin and write its results to stdout, without using the file system directly. Then you could redirect the data from/to your local machine through ssh while running the program on the remote host. For instance (your_program stands for your application on the Solaris machine):
ssh user@host your_program </Volumes/VOLNAME/input.data >/Volumes/VOLNAME/output.data

How to upload files directly to Amazon S3 from a remote server?

Is it possible to upload a file to S3 from a remote server?
The remote server is basically a URL-based file server. For example, given http://example.com/1.jpg, it serves the image. It doesn't do anything else, and I can't run code on this server.
Is it possible to have another server tell S3 to upload a file from http://example.com/1.jpg?
upload from http://example.com/1.jpg
server -------------------------------------------> S3 <-----> example.com
If you can't run code on the server or execute requests then, no, you can't do this. You will have to download the file to a server or computer that you own and upload from there.
You can see the operations you can perform on amazon S3 at http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html
Checking the operations for both the REST and SOAP APIs you'll see there's no way to give Amazon S3 a remote URL and have it grab the object for you. All of the PUT requests require the object's data to be provided as part of the request. Meaning the server or computer that is initiating the web request needs to have the data.
I have had a similar problem in the past where I wanted to download my users' Facebook thumbnails and upload them to S3 for use on my site. The way I did it was to download the image from Facebook into memory on my server, then upload it to Amazon S3; the whole thing took under 2 seconds. After the upload to S3 was complete, I wrote the bucket/key to a database.
Unfortunately there's no other way to do it.
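For example, a minimal download-then-upload from a machine you control might look like this (assuming the AWS CLI is installed and configured; the bucket name is a placeholder):
curl -o 1.jpg http://example.com/1.jpg
aws s3 cp 1.jpg s3://my-bucket/1.jpg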
I think the suggestion provided is quite good: you can scp the file to an EC2 instance and upload it to the S3 bucket from there. Using the .pem file gives you passwordless authentication, and a PHP script can validate the extensions and pass the file as an argument to the scp command.
The only problem with this solution is that you must have an instance in AWS. You can't use this solution if your website is hosted with another hosting provider and you are trying to upload files straight to the S3 bucket.
Technically it's possible using AWS Signature Version 4. Assuming your remote server is the customer in the image below, you could prepare a form on the main server and send the form fields to the remote server for it to curl. Detailed example here.
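A closely related variant (not the POST-form flow described above, but easier to sketch) is a pre-signed PUT URL: the main server signs the request with an AWS SDK and hands the URL to the remote machine, which then only needs curl. The bucket, key, and query string below are placeholders, not a working signature:
curl -X PUT --upload-file 1.jpg "https://my-bucket.s3.amazonaws.com/1.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=PLACEHOLDER&X-Amz-Signature=PLACEHOLDER"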
You can use the scp command from the terminal.
1) Using the terminal, go to the directory containing the file you want to transfer to the server.
2) Type this:
scp -i yourAmazonKeypairPath.pem fileNameThatYouWantToTransfer.php ec2-user@ec2-00-000-000-15.us-west-2.compute.amazonaws.com:
N.B. Add "ec2-user@" before the ec2... hostname you got from the EC2 website!! This is such a picky error!
3) Your file will be uploaded and the progress will be shown. When it is at 100%, you are done!
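Note that scp only lands the file on the EC2 instance; to get it into S3 you would still run something like this on the instance afterwards (assuming the AWS CLI is installed there; the bucket name is a placeholder):
aws s3 cp fileNameThatYouWantToTransfer.php s3://my-bucket/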

server to server transfer

I need to transfer one file from an old host with no SSH access to a new host where I do have SSH access. I'm having a hard time figuring this out and am looking for a simple answer if there is one. I'm also trying to avoid the slow upload times from my local machine, hence the need for a server-to-server transfer.
Are you able to use FTP? You could use that to transfer the files.
If you have the URL of the file you want to move from your old host, you can use the wget command in your SSH session on the new host. This works for any file type, and for folders too if you want.
For example, if you want to move http://www.yourhost.com/file.zip to your new host, you would SSH into the folder you want to download it to, and type:
wget http://www.yourhost.com/file.zip
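For a whole folder, wget's recursive options are one way to do it (assuming the old host serves a directory listing; the URL is a placeholder):
wget -r -np -nH http://www.yourhost.com/folder/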

Fastest / best way copy data between S3 to EC2?

I have a fairly large amount of data (~30G, split into ~100 files) I'd like to transfer between S3 and EC2: when I fire up the EC2 instances I'd like to copy the data from S3 to EC2 local disks as quickly as I can, and when I'm done processing I'd like to copy the results back to S3.
I'm looking for a tool that'll do a fast / parallel copy of the data back and forth. I have several scripts hacked up, including one that does a decent job, so I'm not looking for pointers to basic libraries; I'm looking for something fast and reliable.
Unfortunately, Adam's suggestion won't work because his understanding of EBS is wrong (although I wish he were right, and I have often thought it should work that way). EBS has nothing to do with S3; it only gives you an "external drive" for EC2 instances that is separate from, but attachable to, the instances. You still have to copy between S3 and EC2, even though there are no data transfer costs between the two.
You didn't mention the operating system of your instance, so I cannot give tailored information. A popular command-line tool I use is http://s3tools.org/s3cmd ... it is based on Python and, according to the info on its website, should work on Windows as well as Linux, although I use it ALL the time on Linux. You could easily whip up a quick script that uses its built-in "sync" command, which works similarly to rsync, and have it triggered every time you're done processing your data. You could also use the recursive put and get commands to move data only when needed.
There are graphical tools like CloudBerry Pro that have some command-line options for Windows too, which you can use to set up scheduled commands. http://s3tools.org/s3cmd is probably the easiest.
By now there is a sync command in the AWS command-line tools that should do the trick: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
On startup:
aws s3 sync s3://mybucket /mylocalfolder
before shutdown:
aws s3 sync /mylocalfolder s3://mybucket
Of course, the details are always fun to work out, e.g. how parallel it is (and whether you can make it more parallel, and whether that is any faster given the virtual nature of the whole setup).
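On the parallelism question, the AWS CLI's S3 transfers are already concurrent and the knobs are configurable; for example (the values here are illustrative, not tuned recommendations):
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB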
Btw hope you're still working on this... or somebody is. ;)
I think you might be better off using an Elastic Block Store to store your files instead of S3. An EBS is akin to a 'drive' on S3 that can be mounted into your EC2 instance without having to copy the data each time, thereby allowing you to persist your data between EC2 instances without having to write to or read from S3 each time.
http://aws.amazon.com/ebs/
Install the s3cmd package with
yum install s3cmd
or
sudo apt-get install s3cmd
depending on your OS
then copy data with:
s3cmd get s3://tecadmin/file.txt
You can also list files with s3cmd ls.
For more details, see this.
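For the ~100 files in this question, a recursive pull is probably closer to what you want; a sketch with placeholder bucket and paths (run s3cmd --configure once first to store your keys):
s3cmd get --recursive s3://my-bucket/my-prefix/ /mnt/data/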
For me, the best option is:
wget http://s3.amazonaws.com/my_bucket/my_folder/my_file.ext
from PuTTY (note this only works if the object is publicly readable).