Is there a way to compress 20-40 MB images when uploading to S3?

I am looking to create a workflow where, when I upload a file to S3, an original copy and a compressed copy are stored in different folders in the same bucket (a different bucket works too). I tried to do this with the 'compress' app from the AWS Serverless Application Repository. However, it does not compress images larger than 4 MB.
The structure I want to create is:
I upload the file to S3
The original file (with the 100% file size) goes into one folder
A compressed copy is created that goes into another folder in the same bucket
Is there a way to figure this out? I'm new to AWS.

Yes, you can achieve this by doing the following.
The uploaded file triggers an S3 bucket event notification whose destination is an AWS Lambda function. The Lambda reads the file from S3, compresses it, and then saves the compressed copy under another prefix (folder) in the same bucket.
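A minimal sketch of what such a Lambda could look like, assuming the images can be re-encoded with Pillow and using hypothetical uploads/ and compressed/ prefixes (the prefix names, quality setting, and library choice are illustrative, not prescribed by this answer):

# Hypothetical Lambda handler: re-encode uploaded images and store the
# result under a "compressed/" prefix in the same bucket. Assumes Pillow
# is packaged with the function (e.g. via a Lambda layer).
import io
from urllib.parse import unquote_plus

import boto3
from PIL import Image

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])   # keys in events are URL-encoded
        if not key.startswith("uploads/"):
            continue                                        # don't reprocess our own output
        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        image = Image.open(io.BytesIO(original)).convert("RGB")
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG", quality=70, optimize=True)
        buffer.seek(0)
        s3.put_object(
            Bucket=bucket,
            Key=key.replace("uploads/", "compressed/", 1),
            Body=buffer,
            ContentType="image/jpeg",
        )

Scope the S3 event notification to the uploads/ prefix so that writing the compressed copy does not trigger the function again.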

Related

AWS s3 event ObjectRemoved - get file

I am trying to access a file that has been deleted from an S3 bucket using AWS Lambda.
I have set up a trigger for s3:ObjectRemoved:*; however, after extracting the bucket and key of the deleted file, the object is already gone from S3, so I do not have access to its contents.
What approach should be taken with AWS Lambda to get the contents of a file after it has been deleted from an S3 bucket?
The comment proposed by @keithRozario was useful; however, with versioning, issuing a plain GET request for the deleted key results in a not-found error, as per the S3 documentation.
@Ersoy's suggestion was to create a 'bin' bucket or directory/prefix, keep a copy there under the same file name, and work with that as your requirements dictate.
In my case, I copy each object to a bin directory as soon as it is created, and then read from that folder when the file is deleted from the main upload directory.
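A minimal sketch of that approach, assuming an s3:ObjectCreated:* trigger on the main upload prefix and a hypothetical bin/ prefix (the names are illustrative):

import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")

def handler(event, context):
    # Runs on s3:ObjectCreated:*; copies each new object to a "bin/" prefix
    # so its contents remain readable after the original is deleted.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        s3.copy_object(
            Bucket=bucket,
            Key="bin/" + key,
            CopySource={"Bucket": bucket, "Key": key},
        )

The s3:ObjectRemoved:* handler then reads bin/<key> instead of the deleted object.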

Streaming compression to S3 bucket with a custom directory structure

I have an application that needs to create a compressed file from different objects that are stored on S3. The issue I am facing is that I would like to compress the objects on the fly, without downloading the files into a container and doing the compression there. The reason is that the files can be quite big, so I can easily run out of disk space, and of course there is the extra round-trip time of downloading the files to disk, compressing them, and uploading the compressed file to S3 again.
It is worth mentioning that I would like to place the files in different directories inside the output archive, so that when a user decompresses the file they see the contents organized in different folders.
Since S3 does not have the concept of a physical folder structure, I am not sure whether this is possible, or whether there is a better way than downloading and re-uploading the files.
NOTE
My issue is not about how to use AWS Lambda to export a set of big files. It is about how I can export files from S3 without downloading the objects to a local disk, create a zip file, and upload it back to S3. I would like to simply zip the files on S3 on the fly and, most importantly, be able to customize the directory structure.
For example,
inputs:
big-file1
big-file2
big-file3
...
output:
big-zip.zip
with the directory structure of:
images/big-file1
images/big-file2
videos/big-file3
...
I have almost the same use case as yours. I researched it for about 2 months and tried multiple approaches, but in the end I had to use ECS (EC2) for my use case, because the zip file can be huge, like 100 GB.
Currently AWS doesn't support a native way to perform compression on S3. I have talked to them and they are considering it as a feature, but no timeline has been given yet.
If your files are around 3 GB in size, you can consider Lambda to achieve your requirement.
If your files are more than 4 GB, I believe it is safer to do it with ECS or EC2, and to attach more volume if more space/memory is required for the compression.
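For reference, here is a rough sketch of the kind of streaming zip such an ECS/EC2 task could run, so the objects never have to be staged on local disk. It assumes the third-party smart_open library (which handles the multipart upload of the output) and uses placeholder bucket and key names together with the example layout above; it is not the exact code behind this answer.

import shutil
import zipfile

import boto3
from smart_open import open as sopen

s3 = boto3.client("s3")

# Illustrative mapping of S3 keys to the paths wanted inside the archive.
layout = {
    "big-file1": "images/big-file1",
    "big-file2": "images/big-file2",
    "big-file3": "videos/big-file3",
}

# smart_open streams the zip to S3 as a multipart upload while we write it.
with sopen("s3://my-bucket/big-zip.zip", "wb") as out:
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for key, arcname in layout.items():
            body = s3.get_object(Bucket="my-bucket", Key=key)["Body"]
            with zf.open(arcname, "w") as dest:      # writable entry, Python 3.6+
                shutil.copyfileobj(body, dest)       # copy in chunks, no temp file

The directory structure inside the zip is controlled entirely by the arcname you pass, so it does not need to match the S3 key layout.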
Yes, there are at least two ways: either using AWS-Lambda or AWS-EC2
EC2
Since the aws-cli supports stdin/stdout with the cp command, you can pipe an S3 file through any archiver using a Unix pipe, e.g.:
aws s3 cp s3://yours-bucket/huge_file - | gzip | aws s3 cp - s3://yours-bucket/compressed_file
AWS-Lambda
Since maintaining and using an EC2 instance just for compression may be too expensive, you can use Lambda for one-off compressions.
But keep in mind that Lambda has an execution time limit of 15 minutes. So, if your files are really huge, try this sequence:
To make sure each piece gets compressed within the limit, compress the file in parts, with each part handled by its own Lambda invocation
The compressed parts can then be merged on S3 into one object using Upload Part - Copy (UploadPartCopy)
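A minimal sketch of that merge step, assuming the compressed pieces were already written as separate S3 objects (the names below are placeholders) and that every part except the last is at least 5 MB, as S3's multipart rules require:

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"                                            # placeholder names
part_keys = ["parts/chunk-1.gz", "parts/chunk-2.gz", "parts/chunk-3.gz"]
target_key = "merged/big-file.gz"

# Merge the part objects server-side with UploadPartCopy; the data never
# leaves S3.
upload = s3.create_multipart_upload(Bucket=bucket, Key=target_key)
parts = []
for number, key in enumerate(part_keys, start=1):
    result = s3.upload_part_copy(
        Bucket=bucket,
        Key=target_key,
        UploadId=upload["UploadId"],
        PartNumber=number,
        CopySource={"Bucket": bucket, "Key": key},
    )
    parts.append({"PartNumber": number, "ETag": result["CopyPartResult"]["ETag"]})

s3.complete_multipart_upload(
    Bucket=bucket,
    Key=target_key,
    UploadId=upload["UploadId"],
    MultipartUpload={"Parts": parts},
)

Note that the result is only usable if the pieces concatenate into a valid file; independently gzipped chunks do (gzip members can be concatenated), but a zip archive cannot be assembled this way.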

How to import data to Amazon S3 from URL

I have an S3 bucket and the URL of a large file. I would like to store the content located at the URL in the S3 bucket.
I could download the file to my local machine and then upload it to S3 with Cloudberry or Jungledisk or whatever. However, if the file is large, this may take a long time because the file must be transferred twice, and my network connection is much slower than Amazon's.
If I have a lot of data to store in S3, I can start an EC2 instance, retrieve the files to the instance with curl or wget, and then push the data from the EC2 instance to S3. This works, but it's a lot of steps if I just want to archive one file.
Any suggestions?
You can stream the file directly from the source to S3.
If you are using node, you can use streaming-s3.
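If you prefer Python over node, a roughly equivalent sketch (the URL, bucket, and key below are placeholders) streams the download straight into an S3 upload without touching local disk:

import boto3
import requests

s3 = boto3.client("s3")
url = "https://example.com/large-file.tar.gz"        # placeholder URL

# Stream the HTTP response body directly into S3; upload_fileobj does a
# multipart upload under the hood, so large files are fine.
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    response.raw.decode_content = True               # undo any transfer encoding
    s3.upload_fileobj(response.raw, "my-bucket", "imports/large-file.tar.gz")

Run this from somewhere with a fast connection to both the source and S3 (for example, an EC2 instance in the same region) so your slow local link is never involved.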

Compress file on S3

I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed.
I know that by compressing it, it'll be about 2.2 GB (gzip). How can I download this file locally as quickly as possible when the transfer is the bottleneck (250 kB/s)?
I've not found any straightforward way to compress the file on S3, or enable compression on transfer in s3cmd, boto, or related tools.
S3 does not support stream compression, nor is it possible to compress an uploaded file remotely.
If this is a one-time process, I suggest downloading it to an EC2 machine in the same region, compressing it there, and then uploading it to your destination.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
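If you want to avoid provisioning 17.7 GB of disk on that instance, a variation is to do the gzip as a pure stream from S3 back to S3; a rough sketch, assuming the third-party smart_open library and placeholder bucket/key names:

import gzip
import shutil

import boto3
from smart_open import open as sopen

s3 = boto3.client("s3")

# Read the uncompressed object as a stream and write a gzipped copy next
# to it; only the much smaller .gz is downloaded afterwards.
body = s3.get_object(Bucket="my-bucket", Key="hive/output.csv")["Body"]   # placeholders
with sopen("s3://my-bucket/hive/output.csv.gz", "wb") as out:
    with gzip.GzipFile(fileobj=out, mode="wb") as gz:
        shutil.copyfileobj(body, gz)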
If you need this more frequently, see:
Serving gzipped CSS and JavaScript from Amazon CloudFront via S3
Late answer but I found this working perfectly.
aws s3 sync s3://your-pics .
find . -name "*.jpg" -exec gzip {} \; -print
aws s3 sync . s3://your-pics --content-encoding gzip --dryrun
This will download all the files in the S3 bucket to the machine (or EC2 instance), compress the image files, and upload them back to the S3 bucket.
Verify the data before removing the --dryrun flag.
There are now pre-built apps in Lambda that you can use to compress images and files in S3 buckets. So just create a new Lambda function, select a pre-built app of your choice, and complete the configuration.
Step 1 - Create a new Lambda function
Step 2 - Search for prebuilt app
Step 3 - Select the app that suits your need and complete the configuration process by providing the S3 bucket names.

Is it possible to add files to Amazon S3 buckets using web URL as source?

I am trying to load one of my S3 buckets.
The file I am trying to load is a huge tarball on the web; I don't want to download it to my disk and then start uploading it to the S3 bucket.
Is there any way I can directly specify this URL and have it added to S3?
You have to "put" to S3; it does not "get". S3 will not fetch content from an external URL for you, so something you run (your machine, an EC2 instance, or a Lambda) has to download the file and push it to S3.