My work uses WinSCP for SFTP transfers. We have some data coming in this way each week and I would like to get it into an S3 bucket. We want to automate this transfer with a cron job or something similar.
I know there are AWS tools for this, but they cost money and no money can be spent. We also do not have an ETL tool like Alteryx, otherwise I would use it. Nothing on the internet gives much detail about transferring files from one SFTP server to another server; mostly I am reading about how to transfer from a server to a local machine.
Below is the code I have found.
Can these WinSCP commands be used to transfer to an S3 bucket somehow at the 'put' step? (I cannot use the script generator like other posts suggest, because I do not have access to our AWS account or any buckets yet.) This is all about proving a concept.
# Connect to SFTP server using a password
open sftp://user:password@example.com/ -hostkey="ssh-rsa 2048 xxxxxxxxxxx...="
# Upload file (THIS IS WHERE I WOULD WANT S3 PATH SYNTAX)
put d:\examplefile.txt /home/user/
# Exit WinSCP
Exit
Once I have this command working we can create a Windows scheduled task, from what I have read. This would automate getting the file to a place where we can do more with it, since the SFTP server limits us.
If I understand the question correctly, you are asking how to transfer files directly from SFTP server to S3 using a script running on yet another machine.
It's not possible (unless AWS has a feature for that, but then it won't be free). You have to download the files from the SFTP server and then upload them to S3.
With WinSCP scripting, you can do it with a script like:
open sftp://username:password@sftp.example.com/
get /sftp/path/*
exit
open s3://accesskey:secretkey@s3.amazonaws.com/
put * /bucket/
exit
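To run this unattended (for example from the Windows scheduled task mentioned in the question), here is a minimal sketch of the command the task would execute, assuming the script above is saved as C:\scripts\transfer.txt (an illustrative path) and WinSCP is installed in its default location:
rem Run the WinSCP script non-interactively and keep a log of the transfer
"C:\Program Files (x86)\WinSCP\winscp.com" /ini=nul /script=C:\scripts\transfer.txt /log=C:\scripts\transfer.log
Point the scheduled task at this command and set the weekly trigger in Task Scheduler.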
You can mount your bucket on your server with s3fs (a FUSE filesystem), so it behaves like a normal hard disk.
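A minimal sketch of that mount, assuming the s3fs-fuse package is installed (mybucket and /mnt/s3 are illustrative names):
# s3fs reads credentials from a file in ACCESS_KEY:SECRET_KEY form
echo "ACCESS_KEY:SECRET_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
# Mount the bucket so it can be used like a normal directory
s3fs mybucket /mnt/s3 -o passwd_file=~/.passwd-s3fs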
Related
I have a lot of big files on a remote server and I want to move them into S3. I want to do it at the command line or with a bash script (e.g., I do NOT want to use a GUI app like Cyberduck) so that I can automate/replicate the effort.
I have tried mounting my remote server onto my local machine using osxfuse and sshfs and then pushing the files to S3 using s3cmd. This does work, but I keep running into errors (the connection being lost for no apparent reason, mount errors, etc.).
Is this the best way to do it? Does anyone know a better way to do it?
Thanks,
A
You can use the MinIO client, aka mc, to do the same.
$ mc cp --recursive localDir/ s3/remoteBucket
In case of a network disconnection, mc will give you the option to resume the upload.
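For completeness, a sketch of the one-time setup that command assumes (the alias name s3 and the bucket remoteBucket are illustrative, and the alias syntax is that of recent mc releases):
# Register the S3 endpoint and credentials under the alias "s3" (run once)
mc alias set s3 https://s3.amazonaws.com ACCESS_KEY SECRET_KEY
# mc mirror only transfers new or changed objects on repeated runs
mc mirror localDir/ s3/remoteBucket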
Is your remote server in EC2? Your current setup requires two copies (first pulling the data to your local machine via sshfs, then pushing it to S3 via s3cmd); if you run s3cmd on the remote server directly you can reduce that to one.
If you want to mount S3 as a filesystem, you can also use tools like goofys or s3fs. Again, you should do that on the remote server to avoid extra copies.
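A sketch of the single-copy approach mentioned above, run directly on the remote server after configuring s3cmd with s3cmd --configure (the paths and bucket name are illustrative):
# Push the files straight from the remote server into S3, no local hop
s3cmd put --recursive /data/bigfiles/ s3://mybucket/bigfiles/
# Or sync, so interrupted or repeated runs only transfer what is missing or changed
s3cmd sync /data/bigfiles/ s3://mybucket/bigfiles/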
I have an S3 bucket with about 100 GB of small files (in folders).
I have been requested to back this up to a local NAS on a weekly basis.
I have access to an EC2 instance that is attached to the S3 storage.
My NAS allows me to run an SFTP server.
I also have access to a local server in which I can run a cron job to pull the backup if need be.
How can I best go about this? If possible I would like to only download the files that have been added or changed, or to compress everything on the server end and then push the compressed file to the SFTP server on the NAS.
The end goal is to have a complete backup of the S3 bucket on my NAS with the lowest amount of transfer each week.
Any suggestions are welcome!
Thanks for your help!
Ryan
I think the most scalable method for you to achieve this is using AWS Elastic MapReduce and Data Pipeline.
The architecture is this way:
You would use Data Pipeline to configure S3 as an input data node, and then an EC2 node with Pig/Hive scripts to do the required processing and send the data to SFTP. Pig can be extended with a custom UDF (user-defined function) to send data to SFTP. You can then set this pipeline up to run at a periodic interval. Having said this, it requires quite some reading to achieve all of it, but it is a good skill to pick up if you foresee future data transformation needs.
Start reading from here:
http://aws.typepad.com/aws/2012/11/the-new-amazon-data-pipeline.html
A similar method can be used for taking periodic backups of DynamoDB to S3, reading files from FTP servers, processing them, and moving them to, say, S3/RDS, etc.
How can I access the files sitting in the following S3 folder, which is owned by someone else?
s3n://elasticmapreduce/samples/wordcount/input
The files in s3n://elasticmapreduce/samples/wordcount/input are public, and are made available by Amazon as input to the sample word count Hadoop program. The best way to fetch them is to:
1) Start a new Amazon Elastic MapReduce job flow (it doesn't matter which one) from the Amazon Web Services console, and make sure that you keep the job alive with the Keep Alive option.
2) Once the EC2 machines have started, find the instances on EC2 from the Amazon Web Services console.
3) ssh into one of the running EC2 instances as the hadoop user, for example
ssh -i keypair.pem hadoop@ec2-IPADDRESS.compute-1.amazonaws.com
4) Obtain the files you need, using hadoop dfs -copyToLocal s3://elasticmapreduce/samples/wordcount/input/0002 .
5) sftp the files to your local system
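Step 5 has no command shown above; a sketch of pulling the file back with scp, using the same illustrative key pair and address as in step 3 and assuming the copy in step 4 landed in the hadoop user's home directory:
# Run from your local machine: copy the file out of the instance
scp -i keypair.pem hadoop@ec2-IPADDRESS.compute-1.amazonaws.com:0002 .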
You can access wordSplitter.py here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/wordSplitter.py
You can access the input files here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0012
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0011
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0010
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0009
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0008
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0007
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0006
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0005
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0004
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0003
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0002
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0001
The owner of the folder (most likely of a file in the folder) must have made it accessible to anonymous readers.
If that is the case, s3n://x/y... is translated to
http://s3.amazonaws.com/x/y...
or
http://x.s3.amazonaws.com/y...
x is the name of the bucket.
y... is the path within the bucket.
If you want to make sure the file exists, e.g. if you suspect the name was misspelled, you can open the following in your browser:
http://s3.amazonaws.com/x
and you'll see XML describing the "files", that is the S3 objects, available.
Try this:
http://s3.amazonaws.com/elasticmapreduce
I tried this, and it seems that the path you want is not public.
The AWS EMR documentation quotes s3://elasticmapreduce/samples/wordcount/input in one of the "getting started" examples. But s3 is different from s3n, so the input might be available to EMR, but not over HTTP access.
In Amazon S3 there is no concept of folders; a bucket is just a flat collection of objects. But you can list all the files you are interested in from a browser with the following URL:
s3.amazonaws.com/elasticmapreduce?prefix=samples/wordcount/input/
Then you can download them by specifying the whole name, e.g.
s3.amazonaws.com/elasticmapreduce/samples/wordcount/input/0001
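Both steps can also be done from the command line, for example with curl (the local file name is illustrative):
# List the objects under the prefix (returns an XML listing)
curl "http://s3.amazonaws.com/elasticmapreduce?prefix=samples/wordcount/input/"
# Download one object by its full key
curl -o 0001 http://s3.amazonaws.com/elasticmapreduce/samples/wordcount/input/0001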
Is it possible to upload a file to S3 from a remote server?
The remote server is basically a URL-based file server. For example, http://example.com/1.jpg serves the image. It doesn't do anything else, and I can't run code on this server.
Is it possible to have another server tell S3 to upload a file from http://example.com/1.jpg?
upload from http://example.com/1.jpg
server -------------------------------------------> S3 <-----> example.com
If you can't run code on the server or execute requests then, no, you can't do this. You will have to download the file to a server or computer that you own and upload from there.
You can see the operations you can perform on Amazon S3 at http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html
Checking the operations for both the REST and SOAP APIs, you'll see there's no way to give Amazon S3 a remote URL and have it grab the object for you. All of the PUT requests require the object's data to be provided as part of the request, meaning the server or computer that initiates the request needs to have the data.
I had a similar problem in the past where I wanted to download my users' Facebook thumbnails and upload them to S3 for use on my site. The way I did it was to download the image from Facebook into memory on my server, then upload it to Amazon S3; the whole thing took under 2 seconds. After the upload to S3 was complete, I wrote the bucket/key to a database.
Unfortunately there's no other way to do it.
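A minimal command-line sketch of that download-then-upload approach, assuming the AWS CLI is installed and configured on the server you own (the bucket name is illustrative):
# Fetch the image from the remote URL and stream it straight into S3 via stdin
curl -s http://example.com/1.jpg | aws s3 cp - s3://mybucket/1.jpg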
I think the suggestion provided is quite good: you can scp the file to an instance and move it into the S3 bucket from there. Using the .pem file gives you password-less authentication, and a PHP script can validate the extensions and pass the file as an argument to the scp command.
The only problem with this solution is that you must have an instance in AWS. You can't use this solution if your website is hosted with another hosting provider and you are trying to upload files straight to the S3 bucket.
Technically it's possible using AWS Signature Version 4. Assuming your remote server is the client, you could prepare a pre-signed POST form on the main server and send the form fields to the remote server, for it to curl. Detailed example here.
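For illustration only, here is roughly what the remote server would end up running once the main server has generated the signed POST policy; every field value below is a placeholder, not something you can copy verbatim:
# POST the file to the bucket using the pre-signed form fields prepared elsewhere
curl https://mybucket.s3.amazonaws.com/ \
  -F key=1.jpg \
  -F x-amz-algorithm=AWS4-HMAC-SHA256 \
  -F x-amz-credential=PLACEHOLDER \
  -F x-amz-date=PLACEHOLDER \
  -F policy=BASE64_ENCODED_POLICY \
  -F x-amz-signature=PLACEHOLDER \
  -F file=@1.jpg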
You can use the scp command from a terminal.
1) Using the terminal, go to the directory containing the file you want to transfer to the server.
2) Type this:
scp -i yourAmazonKeypairPath.pem fileNameThatYouWantToTransfer.php ec2-user@ec2-00-000-000-15.us-west-2.compute.amazonaws.com:
N.B. Add "ec2-user@" before the ec2-... address you got from the EC2 console!! This is such a picky error!
3) Your file will be uploaded and the progress will be shown. When it reaches 100%, you are done!
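Note that scp only gets the file onto the EC2 instance. If the final destination is an S3 bucket, you would still need a copy step on the instance, for example with the AWS CLI (the bucket name is illustrative):
# Run on the EC2 instance after the scp completes
aws s3 cp fileNameThatYouWantToTransfer.php s3://mybucket/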
Does anyone know of a script or program that can be used for backing up multiple websites?
Ideally, I would like to have it set up on a server where the backups will be stored.
I would like to be able to add the website login info, and have it connect and create a zip file or similar that would then be sent back to the backup server to be saved, etc.
But it would also need to be able to be set up as a cron job so it backs everything up at least daily.
I can find PC-to-server backups that are similar, but no server-to-server remote backup scripts, etc.
It would be heavily used, and it needs a GUI so the less techy can use it too.
Does anyone know of anything similar to what we need?
HTTrack website mirroring utility.
Wget and scripts
RSync and FTP login (or SFTP for security); a cron-driven sketch of this option is shown below.
Git can be used for backup and has security features and networking ability.
7Zip can be called from the command line to create a zip file.
In any case you will need to implement either secure FTP (SSH secured) OR a password-secured upload form. If you feel clever you might use WebDAV.
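As a concrete sketch of the rsync-over-SSH option driven by cron (the host, paths, and schedule are all illustrative):
# crontab entry on the backup server: pull the site's files every night at 02:00
0 2 * * * rsync -az -e ssh backupuser@site1.example.com:/var/www/ /backups/site1/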
Here's what I would do:
Put a backup generator script on each website (outputting a ZIP)
Protect its access with a .htpasswd file
On the backup server, make a cron script download all the backups and store them
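A minimal sketch of that last step, assuming each site exposes its generator at a URL like /backup.php behind HTTP basic auth (all names and credentials here are illustrative):
# crontab entry on the backup server: fetch each site's ZIP nightly, dated by day
0 3 * * * curl -u backup:PASSWORD -o /backups/site1-$(date +\%F).zip https://site1.example.com/backup.php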