I have a scenario where I want to host different versions of my JavaScript file on Amazon S3, all of which should be available at the same time. Due to the constraints of my platform, I can only use GET parameters to differentiate between these files.
Ex.
https://s3.bucket.aws.com/bucketname/main.js?ver=1
https://s3.bucket.aws.com/bucketname/main.js?ver=2
How do I store these file versions?
Turn on object versioning on the S3 bucket, and you can retrieve a specific object version by adding a versionId=xxx query parameter. https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectVersions.html
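As a rough illustration of the answer above, here is a small Python sketch that adds the versionId parameter to an object URL. The function name versioned_url and the VERSION_IDS mapping are made up for this example, and the version IDs are placeholders; real IDs are assigned by S3 once versioning is enabled.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(object_url: str, version_id: str) -> str:
    """Return object_url with a versionId query parameter added or replaced."""
    parts = urlsplit(object_url)
    # Keep any existing parameters except a previous versionId.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "versionId"]
    query.append(("versionId", version_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical mapping from the platform's ver=N values to S3 version IDs.
VERSION_IDS = {"1": "EXAMPLE-VERSION-ID-1", "2": "EXAMPLE-VERSION-ID-2"}

url = versioned_url("https://s3.bucket.aws.com/bucketname/main.js", VERSION_IDS["2"])
print(url)
```

Note that the parameter has to be named versionId, so a platform that can only send ver=N would still need some mapping like the one assumed here.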
I am using Amazon S3 to store a large number of text files.
My software is in Java, and I am using the official S3 SDK.
Apart from create/delete/retrieve, I often need to append new content to files.
S3 does not support append, so I have implemented an append operation that basically:
- with an S3 GET, obtains the file metadata from S3
- with an S3 GET, downloads the whole file into a local copy
- performs the append to the local copy
- with an S3 PUT, uploads the local file to S3, overwriting the old one.
Appends are never performed concurrently.
I have tested the software, and so far it seems to work well.
And here's my issue: in scenarios where appends are very frequent, big parts of my files are lost when I perform an append.
Might this depend on S3 eventual consistency on overwrite PUTs?
Thanks for your help!
Yes, it could. Eventual consistency means that the next GET of an object may or may not return the results of the last PUT when an object has been overwritten.
Enable bucket versioning and you should easily be able to identify what happens in these events by capturing and logging the object's version-id each time you upload or download it.
If the version you last uploaded isn't the one you subsequently download, that's a sign of eventual consistency causing the issue.
On the other hand, if you capture the version ID each time you PUT the object and store it somewhere that offers strongly consistent reads (such as DynamoDB or RDS), you can always request that latest version explicitly by its ID when you download it.
Explicit requests for a specific version of an object solve the problem because they have no consistency limitations -- a given, specified version of an object either exists or doesn't. The consistency issue is related to implicitly fetching the "latest" version of an object. If the specific index replica that happens to serve your request hasn't yet learned of the latest version, it will serve up a prior version.
This holds true whether versioning is enabled or not, because an overwrite of an object is not truly an overwrite, even in an unversioned bucket. It's a "store, update index to the new internal storage location, purge the old storage location" operation. This isn't documented, but atomic overwrites and the consistency model dictate that it must necessarily be the case.
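The version-pinning idea in this answer could look roughly like the Python sketch below (the question uses the Java SDK; Python is used here only for brevity). It assumes a versioned bucket and a client with boto3-style put_object and get_object calls. The names append_to_object, version_store and FakeS3 are invented for the example: version_store is a plain dict standing in for a strongly consistent store, and FakeS3 is an in-memory stand-in so the sketch runs without AWS.

```python
import io


def append_to_object(client, bucket, key, extra, version_store):
    """Append bytes via download + re-upload, reading the exact version
    recorded at the previous upload instead of the implicit "latest"."""
    known_version = version_store.get((bucket, key))
    if known_version is None:
        current = b""  # first write: nothing to download yet
    else:
        # Explicit VersionId: the read is pinned to the last uploaded version.
        response = client.get_object(Bucket=bucket, Key=key, VersionId=known_version)
        current = response["Body"].read()
    result = client.put_object(Bucket=bucket, Key=key, Body=current + extra)
    # A VersionId is only returned when bucket versioning is enabled.
    version_store[(bucket, key)] = result["VersionId"]
    return result["VersionId"]


class FakeS3:
    """In-memory stand-in (assumption) mimicking the two calls used above."""

    def __init__(self):
        self.objects = {}
        self.counter = 0

    def put_object(self, Bucket, Key, Body):
        self.counter += 1
        version_id = "v%d" % self.counter
        self.objects[(Bucket, Key, version_id)] = Body
        return {"VersionId": version_id}

    def get_object(self, Bucket, Key, VersionId):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key, VersionId)])}


client = FakeS3()
versions = {}  # stand-in for DynamoDB/RDS in this sketch
append_to_object(client, "logs", "app.txt", b"first\n", versions)
append_to_object(client, "logs", "app.txt", b"second\n", versions)
```

Because every download names the version ID stored after the previous upload, an append never builds on an older copy of the file.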
My project needs to meet the following requirements:
- store a large number of files for a reasonable price
- tag individual files with custom tags
- have an API method to search files by name (contains) and tags (exact)
- do it all via a JS SDK (keep the project serverless)
I did some work with Amazon S3, and it turned out that:
- there is no search method in the JS SDK: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjectsV2-property
- listObjects accepts a Prefix parameter (i.e. the filename starts with a given string), so there is no way to search by "contains"
- there is no parameter to search by tag at all; I can only get the tags of an individual file with getObjectTagging
So the question is: what stable service can I use for file storage with the functionality described above?
Azure? Google Cloud? Backblaze B2? Something else?
Thanks!
If you use Azure blob storage, you can use Azure Search blob indexer to index both the metadata and textual content of your blobs. For a walkthrough of setting this up, see Build and query your first Azure Search index in the portal.
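To pin down the matching the question asks for (substring match on the name, exact match on tags), here is a tiny in-memory Python sketch. It is not Azure Search or any S3 API. The record layout and the search function are assumptions made purely to illustrate the semantics an index would have to provide.

```python
def search(records, name_contains="", tags=None):
    """Return names that contain the substring and carry every requested
    tag with exactly the requested value."""
    wanted = tags or {}
    return [
        record["name"]
        for record in records
        if name_contains in record["name"]
        and all(k in record["tags"] and record["tags"][k] == v for k, v in wanted.items())
    ]


# Hypothetical file records: a name plus custom tags.
records = [
    {"name": "reports/2017-q1.pdf", "tags": {"team": "sales"}},
    {"name": "reports/2017-q2.pdf", "tags": {"team": "ops"}},
    {"name": "images/logo.png", "tags": {"team": "sales"}},
]

print(search(records, name_contains="2017", tags={"team": "sales"}))
# prints ['reports/2017-q1.pdf']
```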
I would like to ask for your help: I read the Amazon guide about using query string parameters in URLs to request content stored in buckets, and it was not clear to me.
I am planning to use a bucket to store media content and to use a query string parameter for the different versions of a media file. So, if I have an image, I can create the original version, the small version, the large version, and so on. Then I can request the different versions on my website as needed.
But I did not understand how this is all managed. Do all the versions of the file have the same file name? And when using a PowerShell script to upload the files to the bucket, how do I specify the version that I am uploading?
Thank you.
Is there a way to run ImageMagick or some other tool on the S3 servers to resize images?
The only way I know is to first download all the image files to my machine, convert them, and then re-upload them to S3. The problem is that there are more than 10000 files, and I don't want to download them all to my local machine.
Is there a way to convert them on the S3 server itself?
Take a look at https://github.com/Turistforeningen/node-s3-uploader.
It is a library that provides several features for uploading to S3, including the resizing you want.
Another option is NOT to change the resolution, but to use a service that can convert the images on-the-fly when they are accessed, such as:
- Cloudinary
- imgix
Also check out the following article on Amazon's compute blog. I found myself here because I had the same question. I think I'm going to implement this in Lambda so I can just specify the size, and see if that helps. My problem is that I have image files on S3 that are 2 MB each. I don't want them at full resolution because I have an app that retrieves them, and it sometimes takes a while for a phone to pull down a 2 MB image. But I don't mind storing them at full resolution if I can get a different size just by specifying it in the URL. Easy!
https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/
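To make "specifying the size in the URL" concrete, here is a hedged Python sketch of the two pure pieces such a function would need: reading the requested size out of the path and computing output dimensions that keep the aspect ratio. The WIDTHxHEIGHT/ path prefix is an assumed convention for this example, and parse_resize_key and fit_within are hypothetical names. The actual pixel resizing (an image library inside the Lambda) is left out.

```python
import re

# Assumed convention: "<width>x<height>/<original key>".
SIZE_KEY = re.compile(r"^(\d+)x(\d+)/(.+)$")


def parse_resize_key(key):
    """Split '300x200/photos/cat.jpg' into (300, 200, 'photos/cat.jpg')."""
    match = SIZE_KEY.match(key)
    if match is None:
        raise ValueError("expected WIDTHxHEIGHT/original-key, got %r" % key)
    width, height, original = match.groups()
    return int(width), int(height), original


def fit_within(src_w, src_h, max_w, max_h):
    """Scale (src_w, src_h) down to fit inside the box, keeping aspect ratio."""
    scale = min(max_w / src_w, max_h / src_h, 1.0)  # never upscale
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))


print(parse_resize_key("300x200/photos/cat.jpg"))
print(fit_within(4000, 2000, 400, 400))
```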
S3 alone does not provide arbitrary compute (such as resizing) on the data.
I would suggest looking into AWS Lambda (available in the AWS console), which allows you to set up a small program (which they call a Lambda) to run when certain events occur in an S3 bucket. You don't need to set up a VM; you only need to specify a few files with a particular entry point. The program can be written in a few languages, namely Node.js, Python, and Java. You can do it all from the console's web GUI.
Usually these are set up to process new files as they are uploaded. To trigger the program for files that are already in place on S3, you have to "force" S3 to emit one of the events you can hook into for the files you already have. The list is here. Forcing an S3 copy might be sufficient (copy A to B, delete B). A rename (rename A to A.tmp, then A.tmp back to A, which S3 tools carry out as a copy plus a delete) or the creation of new S3 objects would also work. You essentially just poke your existing files in a way that causes your Lambda to fire. You may also invoke your Lambda manually.
This example shows how to automatically generate a thumbnail out of an image on S3, which you could adapt to your resizing needs and reuse to create your Lambda:
http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser-create-test-function-create-function.html
Also, here is the walkthrough on how to configure your Lambda for certain S3 events:
http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser.html
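As a minimal sketch of what such a Lambda might start with, the Python below pulls the bucket and key out of an S3 event notification. The function names and the resize callback are hypothetical, and the resizing itself is not shown. The only thing relied on is the Records / s3 / bucket / object layout of S3 event notifications, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys are URL-encoded in the notification (a space arrives as '+').
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def handler(event, context, resize=None):
    """Lambda entry point; `resize` is a hypothetical callable(bucket, key)."""
    targets = objects_from_event(event)
    if resize is not None:
        for bucket, key in targets:
            resize(bucket, key)
    return {"processed": len(targets)}


# Trimmed-down sample event for local testing (real events carry more fields).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "photos"}, "object": {"key": "2014/12/01/my+cat.jpg"}}}
    ]
}
print(handler(sample_event, None))
```

Keeping the event parsing separate from the resize step makes it easy to test the handler locally before wiring it to a bucket.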
I need to store user-uploaded files in Amazon S3. I'm new to S3, but as I understand from the docs, S3 requires me to specify the file upload path in the PUT request.
I'm wondering if there is a way to send a file to S3 and simply get back a link for http(s) access. I would like Amazon to handle all the headaches related to file/folder structure itself. For example, I just pipe a file from node.js to S3, and in the callback I get an http link with no expiration date, while Amazon itself creates something like /2014/12/01/.../$hash.jpg and just returns me the final link. Such a use case seems quite common.
Is it possible? If not, could you suggest any options to simplify the file storage/filesystem tree structure in S3?
Many thanks.
S3 doesn't have folders, actually. In a normal filesystem, 2014/12/01/blah.jpg would mean you've got a 2014 folder with a folder called 12 inside it and so on, but in S3 the entire 2014/12/01/blah.jpg is the key, essentially a single long filename. You don't have to create any folders.
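Since the key is just a string, the application can generate it itself. Here is a minimal Python sketch, assuming a date-plus-content-hash naming scheme like the one in the question. The names make_key and object_url are invented for the example. The URL uses the virtual-hosted bucket.s3.amazonaws.com form and would only work as a permanent link if the object is publicly readable.

```python
import hashlib
from datetime import date
from urllib.parse import quote


def make_key(data: bytes, extension: str, day: date) -> str:
    """Build a key such as '2014/12/01/<sha256 of the content>.jpg'."""
    digest = hashlib.sha256(data).hexdigest()
    return "%04d/%02d/%02d/%s.%s" % (day.year, day.month, day.day, digest, extension)


def object_url(bucket: str, key: str) -> str:
    """Virtual-hosted style URL for the key (slashes are left unescaped)."""
    return "https://%s.s3.amazonaws.com/%s" % (bucket, quote(key))


key = make_key(b"example image bytes", "jpg", date(2014, 12, 1))
print(key)
print(object_url("my-bucket", key))
```

The slashes in the generated key do not create anything in S3; they are simply characters in one long key.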