QNAP backup to Amazon S3

I am currently using a QNAP NAS and want to sync local files to AWS S3.
I have created a job and scheduled the backup operation, but the synchronization only works for small files; it fails for large files such as 3-4 GB files.
How can I solve this issue?

Lowering multipart_chunksize to a smaller value will reduce the chance of the S3 sync failing for big files.
Try executing these AWS CLI commands:
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 2MB
I'm not sure, but this might resolve your issue.
For a detailed description of how these configuration values work, please follow the link below:
https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
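For reference, a rough sketch of what those two settings end up looking like in ~/.aws/config, plus a manual sync you can run to check that large files go through; the share path and bucket name are just examples, not your values:

# contents of ~/.aws/config after running the two commands above
[default]
s3 =
  max_concurrent_requests = 20
  multipart_chunksize = 2MB

# manual test sync from the NAS share to the bucket (example path and bucket)
aws s3 sync /share/backup s3://my-backup-bucket/qnap/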

Related

How to decompress split zip files on AWS S3?

I've got a file (4 GB) which is too big to upload to AWS S3 over an unstable internet connection, so I split the file into several parts using WinZip.
So, file.csv became a series of files:
- file.z01
- file.z02
- ...
- file.z12
After uploading them to AWS S3, I need to unzip the archive. How do I do it?
You won't be able to do it without the help of an EC2 instance.
If you have already uploaded these split zip files, launch a new EC2 instance, download the files from S3 using curl or wget, combine them, and upload the result to S3 again.
Since you are using WinZip, consider launching a Windows-based instance, as it will be hard for you to find a Linux-based equivalent of WinZip.
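That said, if you do end up on a Linux instance, Info-ZIP's zip can usually reassemble a WinZip split archive. A rough sketch of the combine-and-extract step (bucket name and prefixes are assumptions, and you should test the reassembly on your own parts first):

# download all the parts from S3 (example bucket and prefix)
aws s3 cp s3://my-bucket/uploads/ . --recursive --exclude "*" --include "file.z*"

# merge the split archive back into a single zip, then extract it
zip -s 0 file.zip --out file-single.zip
unzip file-single.zip

# upload the extracted file back to S3
aws s3 cp file.csv s3://my-bucket/extracted/file.csv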

AWS S3 stops uploading from my Lenovo® ix2-dl

I have a Lenovo® ix2-dl NAS drive that I set up to back up to AWS S3. It connected fine, but for some reason it only uploads 5% of my data. How can I get it to upload all of my Lenovo® ix2-dl data?
I have updated my NAS to the latest firmware, 4.1.218.34037.
I recently had issues with the S3 backup feature, where the uploads simply stopped working. No errors, nothing in the logs to indicate an issue. I tested my AWS S3 access key and secret with another method and was able to upload files just fine.
To resolve the issue, I had to create a new AWS S3 bucket, then go into the Lenovo's S3 setup and provide the required info. I think what made this work for me was making sure the bucket name contained nothing other than letters and numbers. My old bucket name was similar to lastname.family.pics; my new bucket, which works, is similar to lastname123.
Hope this helps. This feature had worked fine for a long time; perhaps an update came down with different requirements for the API.
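For reference, one way to create a letters-and-digits-only bucket and sanity-check the credentials from the AWS CLI before pointing the NAS at it; the bucket name and region below are just examples:

# create a bucket whose name is only lowercase letters and digits
aws s3 mb s3://lastname123 --region us-east-1

# quick write test with the same access key/secret the NAS will use
echo test > test.txt
aws s3 cp test.txt s3://lastname123/test.txt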

Download files from FTP to amazon EMR

I need to download files from an FTP server to Amazon EMR. I have a shell script to download the files, but it works on Linux machines, not on the Amazon EMR namenode. I am not getting any error; the terminal displays nothing after running the shell script.
Note: I have enabled the ports on the master security group. I know about the other approach of downloading from FTP to S3 and then to Amazon EMR, but I need to download the files directly to Amazon EMR.
I assume you have tried to download the files from the FTP server to Amazon EMR using bootstrap scripts.
To debug what's going wrong: can you connect to the master/slave nodes while they are up and check that your script runs well? That will tell you whether the script is running at all.
Another way to debug is, once a node is launched, to run the script manually on the EMR nodes and see if it throws an error.
Hopefully this will help you work out why the scripts are not running.
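As a concrete version of that manual test, something like the following should work, assuming you have SSH access to the master node; the key path, master hostname, and FTP URL are placeholders:

# SSH to the EMR master node (the default user on EMR nodes is hadoop)
ssh -i ~/my-key.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

# on the master node, run the same download by hand and watch the output
wget -v ftp://user:password@ftp.example.com/path/file.csv

# if the download works, push the file into HDFS for your EMR job
hadoop fs -put file.csv /data/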

Script to take an S3 bucket, compress it, and push the compressed file to an SFTP server

I have an S3 bucket with about 100 GB of small files (in folders).
I have been asked to back this up to a local NAS on a weekly basis.
I have access to an EC2 instance that is attached to the S3 storage.
My NAS allows me to run an SFTP server.
I also have access to a local server on which I can run a cron job to pull the backup if need be.
How can I best go about this? If possible, I would like to download only the files that have been added or changed, or compress everything on the server end and then push the compressed file to the SFTP server on the NAS.
The end goal is to have a complete backup of the S3 bucket on my NAS with the lowest amount of transfer each week.
Any suggestions are welcome!
Thanks for your help!
Ryan
I think the most scalable method for you to achieve this is to use AWS Elastic MapReduce and Data Pipeline.
The architecture works like this:
You use Data Pipeline to configure S3 as an input data node, then an EC2 instance with Pig/Hive scripts to do the required processing and send the data to SFTP. Pig can be extended with a custom UDF (user-defined function) to send data to SFTP. You can then set up this pipeline to run at a periodic interval. Having said this, it requires quite some reading to achieve all of it, but it's a good skill to pick up if you foresee future data-transformation needs.
Start reading from here:
http://aws.typepad.com/aws/2012/11/the-new-amazon-data-pipeline.html
A similar method can be used for taking periodic backups of DynamoDB to S3, reading files from FTP servers, processing them, and moving them to, say, S3 or RDS.
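For comparison, the lighter-weight route the question hints at (incremental pull, compress, push over SFTP) can be sketched with the AWS CLI and standard tools, run weekly from cron on the EC2 instance; the bucket name, local paths, and SFTP host below are assumptions:

# pull only new/changed objects from the bucket into a local mirror
aws s3 sync s3://my-bucket /data/s3-mirror

# compress the mirror into a dated archive
tar czf backup-$(date +%Y%m%d).tar.gz -C /data s3-mirror

# push the archive to the SFTP server on the NAS (non-interactive batch mode)
echo "put backup-$(date +%Y%m%d).tar.gz /backups/" | sftp -b - user@nas.example.local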

Getting large datasets onto Amazon Elastic MapReduce

There are some large datasets (25 GB+, downloadable on the Internet) that I want to play around with using Amazon EMR. Instead of downloading the datasets onto my own computer and then re-uploading them to Amazon, what's the best way to get them onto Amazon?
Do I fire up an EC2 instance, download the datasets (using wget) into S3 from within the instance, and then access S3 when I run my EMR jobs? (I haven't used Amazon's cloud infrastructure before, so I'm not sure if what I just said makes any sense.)
I recommend the following...
fire up your EMR cluster
elastic-mapreduce --create --alive --other-options-here
log on to the master node and download the data from there
wget http://blah/data
copy into HDFS
hadoop fs -copyFromLocal data /data
There's no real reason to put the original dataset through S3. If you want to keep the results you can move them into S3 before shutting down your cluster.
If the dataset is represented by multiple files you can use the cluster to download it in parallel across the machines. Let me know if this is the case and I'll walk you through it.
Mat
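On Mat's point about keeping results before shutting the cluster down, a minimal sketch; the bucket name and output path are assumptions, and older EMR Hadoop versions used the s3n:// scheme rather than s3://:

# copy job output from HDFS to S3 before terminating the cluster
hadoop distcp /data/output s3n://my-bucket/output/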
If you're just getting started and experimenting with EMR, I'm guessing you want these on s3 so you don't have to start an interactive Hadoop session (and instead use the EMR wizards via the AWS console).
The best way would be to start a micro instance in the same region as your S3 bucket, download to that machine using wget and then use something like s3cmd (which you'll probably need to install on the instance). On Ubuntu:
wget -O dataset http://example.com/mydataset
sudo apt-get install s3cmd
s3cmd --configure
s3cmd put dataset s3://mybucket/
The reason you'll want your instance and S3 bucket in the same region is to avoid extra data transfer charges. Although you'll be charged for inbound bandwidth to the instance for the wget, the transfer to S3 will be free.
I'm not sure about it, but it seems to me that Hadoop should be able to download files directly from your sources.
Just enter http://blah/data as your input, and Hadoop should do the rest. It certainly works with S3, so why should it not work with HTTP?