CloudFront bucket - Media consumption using Query String parameter - amazon-s3

I would like to ask for your help: I read the Amazon guide on using query string parameters in URLs to request content stored in buckets, and it was not clear to me.
I am planning to use a bucket to store media content and to use a query string parameter to select different versions of a media file. For example, for an image I could create the original version, a small version, a large version, and so on, and then request whichever version I need on my website.
But I did not understand how this is all managed. Do all the versions of the file have the same file name? And when using a PowerShell script to upload the files to the bucket, how do I specify which version I am uploading?
Thank you.

Related

Maintain different versions of same file on Amazon s3 with "GET" parameters

I have a scenario where I want to host different versions of my JavaScript file on Amazon S3, all of which should be available at the same time. Due to the constraints of my platform, I can only use 'GET' params to differentiate between these two files.
Ex.
https://s3.bucket.aws.com/bucketname/main.js?ver=1
https://s3.bucket.aws.com/bucketname/main.js?ver=2
How do I store these file versions?
Turn on object versioning on S3, and you can retrieve a specific object version by adding a versionId=xxx query parameter to the GET request. https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectVersions.html
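As a rough illustration of the answer above, here is a minimal Python sketch that appends a versionId query parameter to an object URL. The helper name with_version_id, the VERSION_IDS lookup table, and the version ID strings are hypothetical; real version IDs are generated by S3 when versioning is enabled, so a mapping from the platform's own ver=1 / ver=2 numbers to S3 version IDs would have to be kept somewhere by the application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version_id(object_url: str, version_id: str) -> str:
    """Return object_url with its versionId query parameter set to version_id."""
    parts = urlsplit(object_url)
    # Keep existing query parameters, but drop any previous versionId.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "versionId"
    ]
    query.append(("versionId", version_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical lookup from the application's own version number to the
# version ID that S3 assigned at upload time (placeholder values).
VERSION_IDS = {"1": "EXAMPLE-VERSION-ID-1", "2": "EXAMPLE-VERSION-ID-2"}

url = with_version_id(
    "https://s3.bucket.aws.com/bucketname/main.js", VERSION_IDS["2"]
)
print(url)
```

The URL is the one from the question; whether an unauthenticated request for a specific version succeeds depends on the bucket's permissions.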

Creating thumbnails for images on S3

I have what I suppose is quite a common situation. I have a website that is located on Amazon EC2, and I'd like to move all dynamic files to Amazon S3. Everything seems OK except for two points:
I'm using the PDFNet library with their WebViewer. To display PDF files in the browser, WebViewer uses a special ".xod" format, and PDFNet provides functionality to convert PDF files to that format. Take the case where a PDF file was uploaded to S3 and no xod file was created (I'm going to use Lambda to avoid this in future, but still). In this case, do I have to download the file to my local machine, convert it to a xod file, and upload the xod file to S3? I don't see any other way to do it, but it can use a lot of traffic.
The second problem is almost the same, but it concerns thumbnails. Currently I resize thumbnails dynamically depending on the required resolution, and I'd like to keep doing that. Amazon Lambda is not suitable in this case, so what is the best way to do it?
Why do you say that Lambda is not suitable here?
For point 1: PDFNet provides a library for Java, so you can write a Lambda function in Java (it's possible now) and let Lambda handle the scaling.
For point 2: Amazon's tutorial (http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html) gives a detailed example of how to resize images when they are uploaded to S3. The example is in Node.js; you can write a Java version as well if you like.
Note that if you want custom decision logic, you can add attributes when uploading the file to S3 (http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#User-Defined Metadata), which your Lambda function can then use to decide how to resize.
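The metadata idea above could look roughly like the following Python sketch of the decision step only (no AWS calls, no actual resizing). The "thumb-size" metadata key, its "WIDTHxHEIGHT" format, and the default size are assumptions made up for this example; the function expects the metadata as a plain dictionary, however the Lambda function obtained it.

```python
from typing import Mapping, Tuple

# Hypothetical fallback used when the metadata is missing or malformed.
DEFAULT_SIZE = (128, 128)


def thumbnail_size(metadata: Mapping[str, str]) -> Tuple[int, int]:
    """Pick a thumbnail size from user-defined object metadata.

    Assumes a hypothetical "thumb-size" key holding "WIDTHxHEIGHT".
    """
    raw = metadata.get("thumb-size", "")
    try:
        # Raises ValueError on non-numeric parts or a wrong number of parts.
        width, height = (int(part) for part in raw.lower().split("x"))
    except ValueError:
        return DEFAULT_SIZE
    if width <= 0 or height <= 0:
        return DEFAULT_SIZE
    return (width, height)


print(thumbnail_size({"thumb-size": "320x240"}))
print(thumbnail_size({}))
```

Falling back to a default rather than raising keeps a badly tagged upload from failing the whole resize step; whether that is the right behavior depends on the application.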

File permissions on a web server?

I'm new at writing code for websites. The website allows users to upload files, such as profile pictures or other pictures. The files are saved in the Unix file system, and the URLs to find those images are stored in a MySQL database.
It seems like the only way I can let users upload files is to give write access to everybody using chmod. Otherwise it complains that it doesn't have write permissions. But users shouldn't be able to write whatever they want or overwrite other users' stuff. Similarly, to allow users to see images that they have rightful access to, they need read permissions on the file system. But that means that anybody with the URL to a picture can see the image too, correct? That's not what I want.
Is there a solution to this contradiction? Or am I thinking about the problem incorrectly? Thanks for any help.
You need to manage the permissions in your application and not expose arbitrary parts of your local filesystem directly to the clients. Your application should decide what files someone can see or where to write data. You should not trust data (filenames, etc.) from your clients. Ideally, store files on disk using systematically generated names and store the human-readable names in the database.
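A minimal Python sketch of that advice, under several assumptions: an in-memory SQLite database stands in for the MySQL database from the question, and the table layout, the extension whitelist, and the helper names (new_stored_name, record_upload, can_read) are all hypothetical.

```python
import os
import sqlite3
import uuid

# Hypothetical whitelist of extensions the site accepts.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def new_stored_name(client_filename: str) -> str:
    """Return a server-generated name; only a whitelisted extension is kept."""
    ext = os.path.splitext(client_filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext!r}")
    return uuid.uuid4().hex + ext


def record_upload(db: sqlite3.Connection, owner_id: int, client_filename: str) -> str:
    """Store the mapping from generated name to owner and display name."""
    stored = new_stored_name(client_filename)
    db.execute(
        "INSERT INTO uploads (stored_name, owner_id, display_name) VALUES (?, ?, ?)",
        (stored, owner_id, client_filename),
    )
    return stored


def can_read(db: sqlite3.Connection, user_id: int, stored_name: str) -> bool:
    """The application, not the file system, decides who may read a file."""
    row = db.execute(
        "SELECT 1 FROM uploads WHERE stored_name = ? AND owner_id = ?",
        (stored_name, user_id),
    ).fetchone()
    return row is not None


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE uploads (stored_name TEXT PRIMARY KEY, owner_id INTEGER, display_name TEXT)"
)
name = record_upload(db, 1, "../../etc/me.PNG")
print(can_read(db, 1, name), can_read(db, 2, name))
```

With this layout only the web server's own user needs write access to the upload directory, and images would be served through application code that calls something like can_read first, rather than directly from a world-readable path.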
SunStar9,
Since you are already using a MySQL database to store the URL of the image on the file system, why not just store the image itself as a BLOB (binary large object)?
This is generally a well-accepted design practice for allowing users to upload binary data to a website.
Are you using PHP, Java, Ruby/Rails, or something else to develop your website? Depending on what you are using, there could be file upload/management plugins or modules that will help you build what you are trying to do, if you are certain you want to use the file system for storing the image data.

How do services like Dropbox implement delta encoding if their files are stored in the cloud?

Dropbox claims that during syncing only the portions of files that changed are transmitted back to the main server, which is obviously great functionality, but how do they apply changes to files stored in the Amazon S3 cloud? For example, let's say a 30-page document on a user's desktop contains changes to only page 4. Dropbox now syncs the blocks representing the changes, but what happens on the backend if the files they store are in the cloud? Does that mean they have to download the 30-page document stored in S3 to their server, replace the blocks representing page 4, and then upload it back to the cloud? I doubt this is the case, because it would be somewhat inefficient. The other option I could think of is that Amazon S3 supports updating a stored file by byte range, so that, for example, a PUT request to file X for bytes 100-200 replaces all the bytes from 100 to 200 with the body of the PUT request. So I was curious how companies that use cloud services such as Amazon implement this type of syncing.
Thanks
Since S3 and similar storage services don't offer file system capabilities, anything that pretends to store files and directories needs to emulate a file system. When doing this, files are often split into pages of a certain size, and each page is stored as a separate object in the storage. This way a changed block requires uploading only one page (for example) and not the whole file. I should note that with files like office documents this approach can fall short if the file size changes: for example, if you insert a page at the beginning or delete a page, every subsequent page shifts, so the whole file changes and would need to be re-uploaded. We didn't analyze how Dropbox in particular does its job; I have just described the common scenario. There also exist various "patch algorithms", where a patch can be created locally (if Dropbox has an older local copy in the cache) and then applied to one or more blocks on the server.
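The fixed-size paging scheme described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Dropbox does it: the 4-byte block size is deliberately tiny, SHA-256 is an arbitrary choice of hash for detecting changed blocks, and the function names are made up for this example.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # tiny on purpose; a real system would use far larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Indices of blocks in new that are absent from or differ from old."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]


old = b"aaaabbbbccccdddd"
print(changed_blocks(old, b"aaaaXbbbccccdddd"))   # in-place edit -> [1]
print(changed_blocks(old, b"Xaaaabbbbccccdddd"))  # insertion at start -> [0, 1, 2, 3, 4]
```

The second call shows the weakness mentioned above: a single byte inserted at the beginning shifts every block boundary, so every block counts as changed.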
There are several synchronizing tools which transfer deltas over the wire like rsync, rdiff, rdiff-backup, etc. For bi-directional synchronising with S3 there are paid services like s3rsync for example. For pure client-side synchronising, tools like zsync can be considered (which is what many people employ to roll-out app updates).
An alternative approach would be to tarball a directory, generate a delta file (using rdiff or xdelta3), and upload the delta file using a timestamp as part of the key. To sync, all you need to do is perform these two checks client-side:
1. You have all the delta files from S3. If not, pull them and apply them to generate the latest backup state.
2. Your last backup state corresponds to your current directory. If not, generate a new delta file and push it to S3.
The main concern here is the additional client-side space usage of at least 100%. On the other hand, this approach will help you revert changes if needed.
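The bookkeeping side of this approach (naming delta files by timestamp and working out which ones still need to be pulled) might look like the Python sketch below. It does not create or apply deltas, which would be the job of rdiff or xdelta3, and it does not talk to S3; the key layout, the ".delta" suffix, and the function names are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def delta_key(prefix: str, when: datetime) -> str:
    """Build a key whose UTC timestamp makes keys sort chronologically.

    Expects a timezone-aware datetime.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{stamp}.delta"


def missing_deltas(remote_keys: Iterable[str], local_keys: Iterable[str]) -> List[str]:
    """Deltas present remotely but not yet pulled locally, oldest first."""
    return sorted(set(remote_keys) - set(local_keys))


first = delta_key("backups/docs", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
second = delta_key("backups/docs", datetime(2024, 1, 3, 0, 0, 0, tzinfo=timezone.utc))
print(first)
print(missing_deltas([second, first], [first]))
```

Because the timestamp format is fixed-width and most-significant-first, sorting the keys as strings also sorts them by time, which gives the order in which pulled deltas would need to be applied.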

jets3t and Downloading Files from AmazonS3 with Different Name

We're using Amazon S3 for file storage and recently found out that we need to keep some sort of directory structure. Since S3 doesn't allow that, we know we can name the files according to their structure for storage. For example...
abc/123/draft.doc
What I want to know is: if I want to provide a public link to this particular file, is there any way that the file can simply be named draft.doc instead of abc/123/draft.doc?
I feel stupid. After some more investigation I realized that by creating a GET url to the resource, I get exactly what I need.