Uploading smaller files that use only a single part instead of multipart upload with the AWS CLI

I have configured AWS S3 and a Lambda function that triggers when a file is uploaded to S3. I configured the s3:ObjectCreated:CompleteMultipartUpload event to trigger the Lambda. When I tested through the AWS CLI with large files, it worked. But when I upload a smaller file, less than 5 MB, the event does not trigger the Lambda. How can I make this work for small files that are uploaded as a single part?

The AWS CLI only uses multipart upload for files above its multipart threshold (8 MB by default); smaller files are uploaded with a single PutObject call, which raises the s3:ObjectCreated:Put event instead. Add s3:ObjectCreated:Put to your notification configuration so your Lambda gets notified too, or use s3:ObjectCreated:* to cover both cases.
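If you'd rather wire this up programmatically than in the console, a minimal boto3 sketch might look like the following (the bucket name and Lambda ARN are placeholders, and the function also needs a resource-based policy allowing S3 to invoke it):

```python
import boto3

s3 = boto3.client("s3")

# Subscribe the Lambda function to every ObjectCreated event so both
# single-part (Put) and multipart (CompleteMultipartUpload) uploads trigger it.
s3.put_bucket_notification_configuration(
    Bucket="my-upload-bucket",  # placeholder bucket name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",  # placeholder ARN
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```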

Related

Can I trust aws-cli to re-upload my data without corrupting when the transfer fails?

I use S3 extensively to store encrypted and compressed backups of my workstations, and I use the AWS CLI to sync them to S3. Sometimes a transfer fails partway through; I usually just retry it and let it finish.
My question is: does S3 have some kind of check to make sure the previously failed transfer didn't leave corrupted files behind? Is syncing again enough to fix the previously failed transfer?
Individual files uploaded to S3 are never partially stored: either the entire file is uploaded and S3 stores it as an object, or the upload is aborted and no object is ever created.
Even in the multipart case, individual parts can be uploaded, but they never form a complete S3 object unless all of the parts are uploaded and the "Complete Multipart Upload" operation is performed. So there is no need to worry about corruption from partial uploads.
Syncing again will certainly be enough to fix the previously failed transfer.
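For what it's worth, that lifecycle is easy to see in a rough boto3 sketch (bucket, key, and file name are placeholders): the object only becomes visible once complete_multipart_upload succeeds, and aborting simply discards the parts.

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-backup-bucket", "backups/host1.tar.gz.enc"  # placeholders

# Starting the upload creates no object in the bucket.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
try:
    with open("host1.tar.gz.enc", "rb") as f:
        part = s3.upload_part(
            Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
            PartNumber=1, Body=f,
        )
    # Only this call assembles the parts into a visible S3 object.
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": [{"ETag": part["ETag"], "PartNumber": 1}]},
    )
except Exception:
    # Aborting discards any uploaded parts; no object ever appears.
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"])
    raise
```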
Yes, looks like AWS CLI does validate what it uploads and takes care of corruption scenarios by employing MD5 checksum.
From https://docs.aws.amazon.com/cli/latest/topic/s3-faq.html
The AWS CLI will perform checksum validation for uploading and downloading files in specific scenarios.
The AWS CLI will calculate and auto-populate the Content-MD5 header for both standard and multipart uploads. If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and instead will return an error message back to the AWS CLI.
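The same protection is available if you upload through the SDK yourself. A minimal sketch, assuming boto3 and placeholder bucket/key names, that supplies the Content-MD5 header so S3 rejects the object if the bytes were corrupted in transit:

```python
import base64
import hashlib

import boto3

s3 = boto3.client("s3")

with open("backup.tar.gz", "rb") as f:
    data = f.read()

# Base64-encoded MD5 of the payload; S3 recomputes it server-side and
# rejects the PUT with an error if the two digests do not match.
content_md5 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

s3.put_object(
    Bucket="my-backup-bucket",    # placeholder
    Key="backups/backup.tar.gz",  # placeholder
    Body=data,
    ContentMD5=content_md5,
)
```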

Is there a way to specify file extension to the file saved to s3 by kinesis firehose

I am setting up a Kinesis Firehose stream, and everything works well: the delimited files are created on S3. But I was wondering if there is a way to specify an extension for these files, since the consumer requires them to be either .csv or .txt. Is there any way of doing this?
You can create an S3 trigger that invokes a Lambda function and rename the object from there. S3 has no rename operation, so the function copies the object to a key with the desired extension and deletes the original.
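A minimal sketch of such a handler, assuming the files should end up with a .csv extension (the guard keeps the function from re-triggering on the object it just wrote):

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Copy each Firehose-delivered object to a key ending in .csv, then delete the original."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".csv"):
            continue  # already renamed; avoid looping on our own copy
        s3.copy_object(
            Bucket=bucket,
            Key=key + ".csv",
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)
```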
I was not able to get an extension on the files generated by Firehose, but I ended up using Data Pipeline to do this, via the ShellCommandActivity component, which lets you run shell commands on the files in Amazon S3 and write the resulting files back to S3 or any other location you'd like.

Merging pdf files stored on Amazon S3

Currently I'm using PDFBox to download all my PDF files to my server and then merge them together. It works perfectly fine, but it's very slow since I have to download them all.
Is there a way to perform all of this on S3 directly? I've been trying to find a way to do it, in Java or even Python, and haven't been able to.
I read the following:
Merging files on S3 Amazon
https://github.com/boazsegev/combine_pdf/issues/18
Is there a way to merge files stored in S3 without having to download them?
EDIT
The way I ended up doing it was with concurrent.futures, using concurrent.futures.ThreadPoolExecutor with a maximum of 8 worker threads to download all the PDF files from S3 in parallel.
Once all the files were downloaded, I merged them with PDFBox. Simple.
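A rough sketch of that download step with boto3 (bucket and key names are placeholders; the shared client is safe to use from multiple threads for these calls):

```python
import concurrent.futures
import os

import boto3

s3 = boto3.client("s3")
bucket = "my-pdf-bucket"  # placeholder
keys = ["reports/a.pdf", "reports/b.pdf", "reports/c.pdf"]  # placeholder keys

def download(key):
    # Each task writes to its own local path; downloads run in parallel threads.
    local_path = os.path.join("/tmp", os.path.basename(key))
    s3.download_file(bucket, key, local_path)
    return local_path

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    local_files = list(pool.map(download, keys))

# local_files can now be handed to the merge step (PDFBox in the answer above).
```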
S3 is just a data store, so at some level you need to transfer the PDF files from S3 to a server and then back. You'll probably get the best speed by doing the merging on an EC2 instance located in the same region as your S3 bucket.
If you don't want to spin up an EC2 instance yourself just for this, another alternative is AWS Lambda, a compute service where you upload your code and AWS manages its execution.

Event-driven Elastic Transcoder?

Is there a way to set up a transcoding pipeline on AWS such that it automatically transcodes any new files uploaded to a particular S3 bucket and places them in another bucket?
I know there is a REST API, and that in theory the uploader could also issue a REST request to the transcoder after it has uploaded the file, but for a variety of reasons, this isn't really an option.
This can now be accomplished using AWS Lambda.
Lambda basically allows you to trigger and run code based on events. You could easily create a Lambda function that runs as soon as a new file is uploaded to a designated S3 bucket and starts a transcoding job for that newly uploaded video file.
This is literally one of the example use cases provided in the Lambda documentation.
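A minimal sketch of such a handler with boto3 (the pipeline ID and preset ID are placeholders; the pipeline's own configuration determines the input and output buckets):

```python
import urllib.parse

import boto3

transcoder = boto3.client("elastictranscoder")

PIPELINE_ID = "1111111111111-abcde1"  # placeholder: your Elastic Transcoder pipeline ID
PRESET_ID = "1351620000001-000010"    # placeholder: a system or custom preset ID

def handler(event, context):
    """Start an Elastic Transcoder job for each video that triggered this invocation."""
    for record in event["Records"]:
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        transcoder.create_job(
            PipelineId=PIPELINE_ID,
            Input={"Key": key},
            # The output lands in the bucket configured on the pipeline.
            Output={"Key": key + ".mp4", "PresetId": PRESET_ID},
        )
```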

Upload file to Amazon S3 and assign callback for percentage uploaded

Is there an easy way to assign a callback function for the percentage uploaded in PHP with Amazon S3?
Something similar to this File Download example, but for upload?
The AWS SDK for PHP 1.2.6 includes a runnable sample in _samples/cli-s3_progress_bar.php, which shows how to track upload/download progress.
Download it here: http://aws.amazon.com/releasenotes/PHP/1553377899765189
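The PHP sample above is the answer to the question as asked; purely to illustrate the callback pattern, here is a sketch of the equivalent in Python, where boto3's upload_file accepts a Callback that receives the number of bytes transferred so far (bucket and key are placeholders):

```python
import os
import threading

import boto3

class ProgressPercentage:
    """Callback object invoked by boto3 with the number of bytes transferred so far."""
    def __init__(self, filename):
        self._size = os.path.getsize(filename)
        self._seen = 0
        self._lock = threading.Lock()  # upload_file may invoke the callback from worker threads

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen += bytes_amount
            print(f"{self._seen / self._size:.1%} uploaded")

s3 = boto3.client("s3")
s3.upload_file(
    "video.mp4", "my-bucket", "uploads/video.mp4",  # placeholders
    Callback=ProgressPercentage("video.mp4"),
)
```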