Backing up to S3: symlinks

I'm trying to figure out how to back up some data to S3.
We have a local backup system implemented with rsnapshot, and that works perfectly. We're trying to use s3cmd's sync command to mimic rsync when transferring the files.
The problem we're having is that symlinks aren't uploaded as symlinks; s3cmd seems to resolve them to the file they point to and upload that instead. Does anyone have any suggestions as to why this happens?
Am I missing something obvious? Or is it that S3 just isn't suited to this sort of operation? I could set up an EC2 instance and attach some EBS, but it'd be preferable to use S3.

Amazon S3 itself has no concept of symlinks, which is why I suspect s3cmd uploads the target file instead. It's a limitation of S3, not of s3cmd.
I'm assuming you need the symlink itself copied, though? If that's the case, could you tar/gzip the directory (symlinks included) and upload the archive instead?

There are no symlinks available on S3, but what you can use is the s3fs project (hosted on Google Code), which exposes an S3 bucket as a FUSE-based file system. More information here:
https://code.google.com/p/s3fs/wiki/FuseOverAmazon
and here:
http://tjstein.com/articles/mounting-s3-buckets-using-fuse/
I hope it helps
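A minimal sketch of that approach, assuming s3fs is already installed and using placeholder names for the bucket, mount point, and credentials file:
# credentials file holds ACCESS_KEY_ID:SECRET_ACCESS_KEY and must not be world-readable
chmod 600 ~/.passwd-s3fs
# mount the bucket so it looks like a local directory
s3fs backup-bucket /mnt/s3-backup -o passwd_file=${HOME}/.passwd-s3fs
# rsnapshot/rsync can then write to /mnt/s3-backup like any other path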

Try using the -F / --follow-symlinks option when using sync. This worked for me.
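For reference, a hedged example of that invocation (local path and bucket name are placeholders):
# -F / --follow-symlinks makes s3cmd dereference symlinks and upload the files they point to
s3cmd sync --follow-symlinks /backups/ s3://my-backup-bucket/backups/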

Related

Streaming compression to S3 bucket with a custom directory structure

I have an application that needs to create a compressed file from various objects stored on S3. The issue I'm facing is that I would like to compress the objects on the fly, without downloading them into a container first and compressing them there. The files can be quite big, so I could easily run out of disk space, and of course there is the extra round trip of downloading the files to disk, compressing them, and uploading the compressed file back to S3.
It is worth mentioning that I would like to place the files in different directories inside the output archive, so that when a user decompresses it they see the files organized in folders.
Since S3 does not have the concept of a physical folder structure, I am not sure whether this is possible, or whether there is a better way than downloading and re-uploading the files.
NOTE
My issue is not about how to use AWS Lambda to export a set of big files. It is about how to export files from S3 without downloading the objects to a local disk, creating a zip file locally, and uploading it back to S3. I would simply like to zip the files on S3 on the fly and, most importantly, be able to customize the directory structure.
For example,
inputs:
big-file1
big-file2
big-file3
...
output:
big-zip.zip
with the directory structure of:
images/big-file1
images/big-file2
videos/big-file3
...
I have almost the same use case as yours. I researched it for about two months and tried multiple approaches, but in the end I had to use ECS (EC2) for my use case, because the zip file can be huge, around 100 GB.
Currently AWS doesn't offer a native way to perform the compression. I have talked to them and they are considering it as a feature, but no timeline has been given yet.
If your files are around 3 GB in size, you can consider Lambda to achieve your requirement.
If your files are more than 4 GB, I believe it is safer to do it with ECS or EC2 and attach more volume if it requires more space/memory for the compression.
Thanks,
Yes, there are at least two ways: either using AWS-Lambda or AWS-EC2
EC2
Since the aws-cli cp command supports streaming to and from standard input/output, you can pipe an S3 file through any archiver using a Unix pipe, e.g.:
aws s3 cp s3://yours-bucket/huge_file - | gzip | aws s3 cp - s3://yours-bucket/compressed_file
AWS-Lambda
Since maintaining and using an EC2 instance just for compression may be too expensive, you can use Lambda for one-off compressions.
But keep in mind that Lambda has a lifetime limit of 15 minutes. So, if your files are really huge, try this sequence:
Compress the file in chunks, one Lambda invocation per chunk, so that each invocation finishes within the limit
The compressed chunks can then be merged on S3 into one object using Upload Part - Copy
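A rough sketch of that merge step with the AWS CLI, assuming the compressed chunks already exist in the bucket as part1.gz, part2.gz, ... (all bucket, key, and file names are placeholders):
# start a multipart upload for the final object and note the UploadId it returns
aws s3api create-multipart-upload --bucket my-bucket --key big-zip.zip
# copy each pre-compressed chunk in as a part (every part except the last must be at least 5 MB)
aws s3api upload-part-copy --bucket my-bucket --key big-zip.zip --upload-id "<UploadId>" --part-number 1 --copy-source my-bucket/part1.gz
# repeat for the remaining parts, collect the returned ETags into parts.json, then assemble
aws s3api complete-multipart-upload --bucket my-bucket --key big-zip.zip --upload-id "<UploadId>" --multipart-upload file://parts.json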

gsutil + rsync remove synced files? (no --remove-source-files flag?)

Is there a way to make gsutil rsync remove synced files?
As far as I know, normally it is done by passing --remove-source-files, but it does not seem to be an option with gsutil rsync (documentation).
Context:
I have a script that produces a large number of CSV files (100 GB+). I want those files to be transferred to Cloud Storage (and, once transferred, removed from my HDD).
Ended up using gcsfuse.
Per documentation:
Local storage: Objects that are new or modified will be stored in their entirety in a local temporary file until they are closed or synced.
One workaround for small buckets is to delete all bucket contents and re-sync periodically.
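For reference, a minimal gcsfuse sketch (bucket name and mount point are placeholders):
# mount the bucket; files written here are uploaded when they are closed or synced
gcsfuse my-csv-bucket /mnt/gcs-csv
# the CSV-producing script can write directly into /mnt/gcs-csv instead of the local HDD
fusermount -u /mnt/gcs-csv   # unmount when finished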

How to use Amazon S3 as Moodle Data Root

I am trying to move my moodledata folder contents into Amazon S3. I didn't find any document (or guide) on how to configure this setup.
I am using MOODLE 3.3 STABLE build version.
Can anyone help me to setup this?
You could use s3fs and mount it on your webserver.
I suggest using a local directory (for performance) for:
cache, localcache and sessions
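A hedged sketch of that mount as an /etc/fstab entry, assuming the bucket is called moodledata-bucket and the s3fs credentials live in /etc/passwd-s3fs (placeholder names):
# mount the bucket at the moodledata path on boot; keep cache, localcache and sessions on local disk
moodledata-bucket /var/moodledata fuse.s3fs _netdev,allow_other,passwd_file=/etc/passwd-s3fs 0 0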

Pentaho - upload list of files to Amazon s3

I am looking for a way to upload a list of files to Amazon S3.
I have tried this: http://open-bi.blogspot.co.il/2010/03/kettel-job-plugin-send-files-to-amazon.html
But it did not work for me. I am using Kettle 5.
I would prefer a transformation step, but a job step would also be great.
Thanks
I am looking for the same thing. I think the best solution might be to use FTP? I think you can send files to S3 via FTP.
In my scenario, I also have to move and rename the files before uploading. Since we have the path to the file and the filename, we can use a FileExists step to make sure the file exists first, then run the move and rename. After that I was going to try an SFTP step to upload the entire directory of files to Amazon.

Fastest / best way to copy data between S3 and EC2?

I have a fairly large amount of data (~30G, split into ~100 files) I'd like to transfer between S3 and EC2: when I fire up the EC2 instances I'd like to copy the data from S3 to EC2 local disks as quickly as I can, and when I'm done processing I'd like to copy the results back to S3.
I'm looking for a tool that'll do a fast / parallel copy of the data back and forth. I have several scripts hacked up, including one that does a decent job, so I'm not looking for pointers to basic libraries; I'm looking for something fast and reliable.
Unfortunately, Adam's suggestion won't work, because his understanding of EBS is wrong (although I wish he were right, and I've often thought it should work that way). EBS has nothing to do with S3; it only gives you an "external drive" that is separate from, but attachable to, EC2 instances. You still have to copy between S3 and EC2, even though there are no data transfer costs between the two.
You didn't mention your instance's operating system, so I can't give tailored information. A popular command-line tool I use is http://s3tools.org/s3cmd. It is based on Python, so according to the info on its website it should work on Windows as well as Linux, although I use it all the time on Linux. You could easily whip up a quick script that uses its built-in sync command, which works similarly to rsync, and have it triggered every time you're done processing your data. You could also use the recursive put and get commands to move data only when needed.
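For instance, a hedged sketch of that sync pattern (bucket and directory names are placeholders):
# pull the input data down when the instance starts
s3cmd sync s3://my-data-bucket/input/ /mnt/data/input/
# push the results back when processing is done
s3cmd sync /mnt/data/output/ s3://my-data-bucket/output/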
There are also graphical tools like CloudBerry Pro for Windows, which have some command-line options you can use to set up scheduled commands. http://s3tools.org/s3cmd is probably the easiest.
By now, there is a sync command in the AWS Command line tools, that should do the trick: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
On startup:
aws s3 sync s3://mybucket /mylocalfolder
before shutdown:
aws s3 sync /mylocalfolder s3://mybucket
Of course, the details are always fun to work out, e.g. how parallel it is (and whether you can make it more parallel, and whether that is any faster given the virtual nature of the whole setup).
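On the parallelism point, the AWS CLI's S3 transfer concurrency is configurable; a hedged example (the value 20 is arbitrary):
# raise the number of concurrent S3 requests from the default of 10
aws configure set default.s3.max_concurrent_requests 20
# and allow a deeper task queue (default is 1000)
aws configure set default.s3.max_queue_size 10000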
Btw hope you're still working on this... or somebody is. ;)
I think you might be better off using an Elastic Block Store to store your files instead of S3. An EBS is akin to a 'drive' on S3 that can be mounted into your EC2 instance without having to copy the data each time, thereby allowing you to persist your data between EC2 instances without having to write to or read from S3 each time.
http://aws.amazon.com/ebs/
Install the s3cmd package with
yum install s3cmd
or
sudo apt-get install s3cmd
depending on your OS
then copy data with:
s3cmd get s3://tecadmin/file.txt
You can also list files with s3cmd ls.
For more details, see the s3cmd documentation.
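A couple of hedged examples of those commands (prefix and local path are placeholders):
# list what is in the bucket
s3cmd ls s3://tecadmin/
# pull a whole prefix down to the instance
s3cmd get --recursive s3://tecadmin/data/ /mnt/data/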
For me the simplest way is:
wget http://s3.amazonaws.com/my_bucket/my_folder/my_file.ext
run from a PuTTY session (note that this only works if the object is publicly readable).