Resize existing images in s3 bucket/folder - amazon-s3

I have thousands of existing images in S3, and I need to resize all images from one folder and put them in another folder, still within the same bucket. Is there any solution to resize them, with or without Lambda? And what trigger should I use?
Thanks in advance

If this is a one-time job, I would use the simplest approach:
Start an EC2 instance.
Mount the S3 bucket as a filesystem with s3fs or goofys.
Run ImageMagick on all the files with scaling parameters.
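A minimal sketch of that last step, assuming the bucket is mounted at /mnt/s3 (hypothetical mount point) with source images under an originals/ folder and output going to a resized/ folder (both folder names are made up here), and ImageMagick's convert on the PATH:

```python
# Sketch only: assumes the bucket is mounted at /mnt/s3 via s3fs or goofys,
# with hypothetical "originals" and "resized" folders, and ImageMagick installed.
import os
import subprocess

SRC = "/mnt/s3/originals"
DST = "/mnt/s3/resized"

os.makedirs(DST, exist_ok=True)

for name in os.listdir(SRC):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    src_path = os.path.join(SRC, name)
    dst_path = os.path.join(DST, name)
    # "1024x1024>" only shrinks images larger than 1024px; adjust to taste.
    subprocess.run(
        ["convert", src_path, "-resize", "1024x1024>", dst_path],
        check=True,
    )
```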

Rather than resizing the images, you could consider using a "resize-on-the-fly" service such as:
Cloudinary
Imgix
You can construct URLs that automatically resize images to the desired size, without having to resize and store them yourself.
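For illustration, the URL patterns of these services typically look like the following (the domain, cloud name, and image names below are placeholders, not real accounts):

```python
# Illustrative only: "demo" and the image names are placeholders.
# imgix: width/height/crop mode are passed as query parameters.
imgix_url = "https://demo.imgix.net/photo.jpg?w=400&h=300&fit=crop"

# Cloudinary: the transformation is encoded in the URL path.
cloudinary_url = "https://res.cloudinary.com/demo/image/upload/w_400,h_300,c_fill/photo.jpg"
```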

I am using Serverless Image Handler from AWS and I'm enjoying it. It's very simple to implement!

Related

Is there a way to edit the size of an image on Cloudinary without changing the URL?

I am optimizing a small website with images hosted on Cloudinary.
I would like to reduce all the file sizes of the images in one go without editing the URLs on the website.
Can this be done?
You'll have to reupload the images while performing incoming transformations, where the transformations are applied upon upload rather than on the fly.
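A rough sketch of such a re-upload with Cloudinary's Python SDK, assuming credentials are configured; re-using the original public_id with overwrite keeps existing URLs working. The cloud name, keys, file name, public_id, and target width below are placeholders:

```python
import cloudinary
import cloudinary.uploader

# Placeholder credentials: replace with your own cloud name and API keys.
cloudinary.config(cloud_name="demo", api_key="KEY", api_secret="SECRET")

# Incoming transformation: applied once at upload time, not on delivery,
# so the stored asset itself gets smaller without any URL change.
cloudinary.uploader.upload(
    "local-copy-of-image.jpg",          # placeholder local file
    public_id="existing/public_id",     # placeholder: keep the original public_id
    overwrite=True,                     # replace the existing asset in place
    invalidate=True,                    # purge cached copies of the old version
    transformation=[{"width": 1200, "crop": "limit", "quality": "auto"}],
)
```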

Extract data from MarkLogic 8.0.6 to AWS S3

I'm using MarkLogic 8.0.6 and we also have JSON documents in it. I need to extract a lot of data from MarkLogic and store it in AWS S3. We tried to run "mlcp" locally and then upload the data to AWS S3, but it's very slow because it generates a lot of files.
Our MarkLogic platform is already connected to S3 to perform backups. Is there a way to extract a specific database to AWS S3?
It would be OK for me to have one big file with one JSON document per line.
Thanks,
Romain.
I don't know about getting it to S3, but you can use CORB2 to extract MarkLogic documents to one big file with one JSON document per line.
S3:// is a native file type in MarkLogic, so you can also iterate through all your docs and export them with xdmp:save("s3://...").
If you want to make aggregates, then you may want to marry this idea with Sam's suggestion of CORB2 to control the process and assist in grouping your whole database into multiple manageable aggregate documents. Then use a post-back task to run xdmp:save.
Thanks guys for your answers. I did not know about CORB2; this is a great solution! But unfortunately, due to bad I/O, I would prefer a solution that writes directly to S3.
I can use a basic ML query and dump to s3:// with the native connector, but I always face memory errors, even when launching with the "spawn" function to generate a background process.
Do you have any XQuery example to extract each document to S3 one by one without running into memory errors?
Thanks

Creating thumbnails for images on S3

I have a quite common situation, I suppose. I have a website that is located on Amazon EC2 and I'd like to move all dynamic files to Amazon S3. Everything seems OK, except for two points:
I'm using the PDFNet library with its WebViewer. To display PDF files in the browser, WebViewer uses a special ".xod" format, and PDFNet provides functionality to convert PDF files to the XOD format. Consider the case where a PDF file was uploaded to S3 and no XOD file was created (I'm going to use Lambda to avoid this in the future, but still). In this case, do I have to download the file to my local machine, convert it to a XOD file, and upload the XOD file to S3? (I don't see any other way to do it, but it can take a lot of traffic.)
The second problem is almost the same, but it's related to thumbnails. Currently I dynamically resize thumbnails depending on the required resolution, and I'd like to keep that. AWS Lambda is not suitable in this case; what is the best way to do it?
Why do you say that Lambda is not suitable here?
For point #1: PDFNet provides a library for Java; you can write a Lambda function in Java (it's possible now) and use that to get infinite scale.
For point #2: Amazon's tutorial (http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html) gives a detailed example of how to resize images when they are uploaded to S3. The example is in Node.js; you can write a Java version as well if you like.
Note that if you want custom logic for decision making, you can add user-defined metadata while uploading the file to S3 (http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#User-Defined Metadata), which you can use in your Lambda function to make decisions while resizing.
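The linked tutorial is in Node.js; here is a hedged sketch of the same event-driven pattern in Python, using boto3 plus Pillow (which would need to be bundled with the function or added as a layer). The "max-width" metadata key and the "thumbnails/" output prefix are made up for illustration:

```python
import io
import os
import urllib.parse

import boto3
from PIL import Image  # Pillow must be packaged with the function or provided as a layer

s3 = boto3.client("s3")

def handler(event, context):
    # Standard S3 event structure: one record per created object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        obj = s3.get_object(Bucket=bucket, Key=key)
        # User-defined metadata (sent as "x-amz-meta-max-width" at upload time)
        # comes back lower-cased without the prefix; "max-width" is a made-up key.
        max_width = int(obj["Metadata"].get("max-width", 800))

        img = Image.open(io.BytesIO(obj["Body"].read()))
        img.thumbnail((max_width, max_width))  # preserves aspect ratio, only shrinks

        buf = io.BytesIO()
        img.save(buf, format=img.format or "JPEG")
        buf.seek(0)

        # Write to a separate prefix so the output does not re-trigger this Lambda.
        s3.put_object(Bucket=bucket, Key="thumbnails/" + os.path.basename(key), Body=buf)
```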

Change resolutions of image files stored on S3 server

Is there a way to run ImageMagick or some other tool on S3 servers to resize the images?
The way I know is to first download all the image files to my machine, then convert them and re-upload them to S3. The problem is that the number of files is more than 10,000, and I don't want to download all the files to my local machine.
Is there a way to convert them on the S3 server itself?
Look at this: https://github.com/Turistforeningen/node-s3-uploader.
It is a library providing some features for S3 uploading, including resizing as you want.
Another option is NOT to change the resolution, but to use a service that can convert the images on-the-fly when they are accessed, such as:
Cloudinary
imgix
Also check out the following article on Amazon's compute blog; I found myself here because I had the same question. I think I'm going to implement this in Lambda so I can just specify the size and see if that helps. My problem is that I have image files on S3 that are 2 MB. I don't want them at full resolution, because I have an app that is retrieving them and it sometimes takes a while for a phone to pull down a 2 MB image. But I don't mind storing them at full resolution if I can get a different size just by specifying it in the URL. Easy!
https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/
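The example in that post is in Node.js; roughly sketched in Python, the flow it describes is: parse the requested size out of the request, generate the resized copy into the bucket on first request, then redirect the client to it. The bucket name and the assumed "WIDTHxHEIGHT/key" URL layout below are illustrative assumptions:

```python
import io

import boto3
from botocore.exceptions import ClientError
from PIL import Image  # assumed to be bundled with the function

BUCKET = "my-image-bucket"  # placeholder bucket name
s3 = boto3.client("s3")

def handler(event, context):
    # Assumed request layout: key = "<width>x<height>/<original-key>", e.g. "300x300/photos/cat.jpg"
    requested = event["queryStringParameters"]["key"]
    size_part, original_key = requested.split("/", 1)
    width, height = (int(n) for n in size_part.split("x"))
    resized_key = f"{size_part}/{original_key}"

    try:
        s3.head_object(Bucket=BUCKET, Key=resized_key)  # already generated earlier?
    except ClientError:
        original = s3.get_object(Bucket=BUCKET, Key=original_key)
        img = Image.open(io.BytesIO(original["Body"].read()))
        img.thumbnail((width, height))
        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        buf.seek(0)
        s3.put_object(Bucket=BUCKET, Key=resized_key, Body=buf, ContentType="image/jpeg")

    # Redirect the client to the (now existing) resized object; later requests skip the resize.
    return {
        "statusCode": 301,
        "headers": {"Location": f"https://{BUCKET}.s3.amazonaws.com/{resized_key}"},
    }
```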
S3 alone does not enable arbitrary compute (such as resizing) on the data.
I would suggest looking into AWS Lambda (available in the AWS console), which will allow you to set up a little program (which they call a Lambda) to run when certain events occur in an S3 bucket. You don't need to set up a VM; you only need to specify a few files with a particular entry point. The program can be written in a few languages, namely Node.js, Python, and Java. You'd be able to do it all from the console's web GUI.
Usually these are set up for computing things on new files being uploaded. To trigger the program for files that are already in place on S3, you have to "force" S3 to emit one of the events you can hook into for the files you already have. The list is here. Forcing an S3 copy might be sufficient (copy A to B, delete B), an S3 rename operation (rename A to A.tmp, rename A.tmp to A), and creation of new S3 objects would all work. You essentially just poke your existing files in a way that causes your Lambda to fire. You may also invoke your Lambda manually.
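A hedged sketch of that "poke your existing files" idea with boto3: copying each object onto itself emits an ObjectCreated event your Lambda can react to. S3 only accepts a self-copy if something changes, hence MetadataDirective="REPLACE" with a marker field. Bucket and prefix names are placeholders:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-image-bucket"   # placeholder
PREFIX = "originals/"        # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        head = s3.head_object(Bucket=BUCKET, Key=key)
        # A self-copy is rejected unless metadata (or storage class, etc.) changes,
        # so re-send the existing metadata plus a marker field and keep the content type.
        s3.copy_object(
            Bucket=BUCKET,
            Key=key,
            CopySource={"Bucket": BUCKET, "Key": key},
            Metadata={**head.get("Metadata", {}), "touched": "true"},
            MetadataDirective="REPLACE",
            ContentType=head.get("ContentType", "binary/octet-stream"),
        )
```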
This example shows how to automatically generate a thumbnail out of an image on S3, which you could adapt to your resizing needs and reuse to create your Lambda:
http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser-create-test-function-create-function.html
Also, here is the walkthrough on how to configure your lambda with certain S3 events:
http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser.html

Moving files >5 gig to AWS S3 using a Data Pipeline

We are experiencing problems with files produced by Java code which are written locally and then copied by the Data Pipeline to S3. The error mentions file size.
I would have thought that if a multipart upload is required, then the Pipeline would figure that out. I wonder if there is a way of configuring the Pipeline so that it indeed uses multipart uploading. Otherwise the current Java code, which is agnostic about S3, either has to write directly to S3 or has to do what it used to do and then use multipart uploading -- in fact, I would think the code would just write directly to S3 and not worry about uploading at all.
Can anyone tell me whether Pipelines can use multipart uploading, and if not, whether the correct approach is to have the program write directly to S3, or to continue writing to local storage and then have a separate program invoked within the same Pipeline do the multipart upload?
The answer, based on AWS support, is that indeed 5 GB files can't be uploaded directly to S3. And there is currently no way for a Data Pipeline to say, "You are trying to upload a large file, so I will do something special to handle this." It simply fails.
This may change in the future.
Data Pipeline CopyActivity does not support files larger than 4GB. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html
This is below the 5GB limit imposed by S3 for each file-part put.
You need to write your own script wrapping AWS CLI or S3cmd (older). This script may be executed as a shell activity.
Writing directly to S3 may be an issue as S3 does not support append operations - unless you can somehow write multiple smaller objects in a folder.
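As an alternative to wrapping the AWS CLI or s3cmd, a small boto3 script run from the shell activity can do the same job; boto3's transfer manager switches to multipart upload automatically above the configured threshold. The file path, bucket, and key below are placeholders:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Anything above multipart_threshold is uploaded in parts (here 100 MB chunks),
# which is how files beyond the 5 GB single-PUT limit get into S3.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
)

# Placeholder paths: the locally written output file and its destination in S3.
s3.upload_file(
    "/tmp/large-output.json",
    "my-target-bucket",
    "exports/large-output.json",
    Config=config,
)
```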