An input text file uploaded from the web browser needs to be processed in AWS Lambda, and the output (JSON) needs to be rendered back to the browser. (Note: AWS Elastic Beanstalk is being used.)
How do I handle the case where 10 users/clients upload text files with the same name? AWS Lambda should render the output to the respective user/client. How can this be done with S3 or EFS?
(NOTE: the users cannot be uniquely identified, as there are no login credentials for the users.)
We had a similar problem and solved it in the following way.
Find something that makes each upload unique and name the file accordingly.
Filename-TimeStamp.Extension
If there are frequent uploads in a given time, then add a random sequence number.
Filename-TimeStamp-RandomSequence.Extension
If you want to make it completely random, you can use a UUID (hexadecimal) or idgen (alphanumeric).
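A minimal sketch of the naming scheme above, in Python. The function name unique_object_name is made up for illustration and is not part of any AWS SDK; the timestamp format is also just one possible choice.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath


def unique_object_name(filename: str) -> str:
    """Return 'Filename-TimeStamp-RandomSequence.Extension' for an uploaded file."""
    path = PurePosixPath(filename)
    # UTC timestamp with microseconds, so names sort roughly by upload time.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    # uuid4 supplies the random part, so uploads in the same instant still differ.
    token = uuid.uuid4().hex
    return f"{path.stem}-{stamp}-{token}{path.suffix}"


if __name__ == "__main__":
    # Two uploads of the "same" file get two different names.
    print(unique_object_name("input.txt"))
    print(unique_object_name("input.txt"))
```

If the generated name is returned to the browser at upload time, the client can presumably ask for the JSON output stored under that same name, which ties each result to the client that uploaded it without any login.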
Hope this helps.
It seems that the GET parameter location is a reserved parameter on AWS S3. Say I have a resource on an S3 bucket, accessible via the web:
http://my-bucket.s3.amazonaws.com/index.html
... and I simply append the GET parameter location to it, I get an HTTP 403:
http://my-bucket.s3.amazonaws.com/index.html?location=US
It works so long as I change the parameter name to something else. For example:
http://my-bucket.s3.amazonaws.com/index.html?loc=US
So clearly location is a reserved word in AWS S3. My question is: is there a list of all reserved words I shouldn't try to use as GET parameters with S3?
I searched the docs but couldn't find any such list.
location in the query tells S3 that you're asking for the location of a bucket. It's one of several "subresources" (things that are not objects) in S3 that are accessed via query string parameters.
You could probably compile a nearly complete list by reviewing the entire API reference documentation, but here's a partial list found in some older docs (Signature Version 2):
The subresources that must be included when constructing the CanonicalizedResource Element are acl, lifecycle, location, logging, notification, partNumber, policy, requestPayment, torrent, uploadId, uploads, versionId, versioning, versions, and website.
https://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html
They periodically add new ones, like select and delete and tagging, so an exhaustive list is not future-proof.
Your safest bet is to use parameters beginning with x- (but not beginning with x-amz since these may be reserved or carry other implications). This is mentioned in the logging documentation:
You can include custom information to be stored in the access log record for a request by adding a custom query-string parameter to the URL for the request. Amazon S3 ignores query-string parameters that begin with "x-", but includes those parameters in the access log record for the request, as part of the Request-URI field of the log record.
https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
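As a rough illustration of the rule above, here is a small Python check. The set of names is only the partial list quoted from the Signature Version 2 docs plus the three newer ones mentioned, so it is not exhaustive, and both function names are hypothetical helpers rather than anything from an AWS library.

```python
# Partial list of S3 subresource names (from the quoted Signature Version 2
# docs, plus select, delete and tagging). Not exhaustive; S3 may add more.
KNOWN_S3_SUBRESOURCES = frozenset({
    "acl", "lifecycle", "location", "logging", "notification", "partNumber",
    "policy", "requestPayment", "torrent", "uploadId", "uploads", "versionId",
    "versioning", "versions", "website", "select", "delete", "tagging",
})


def is_known_subresource(name: str) -> bool:
    """True if the parameter name is in the (partial) subresource list."""
    return name in KNOWN_S3_SUBRESOURCES


def is_safe_custom_param(name: str) -> bool:
    """True if the name starts with 'x-' but not with 'x-amz'."""
    lowered = name.lower()
    return lowered.startswith("x-") and not lowered.startswith("x-amz")


if __name__ == "__main__":
    for candidate in ("location", "loc", "x-location", "x-amz-location"):
        print(candidate, is_known_subresource(candidate), is_safe_custom_param(candidate))
```

Note that a name like loc passes today only because it is not on the list; the x- prefix is the one case the logging documentation explicitly says S3 ignores.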
I have set up a CloudFront distribution to deal with image resizing for an app.
I have my image resize function sitting in AWS Lambda, with an API Gateway call wrapped around it. In order to call this function the following URL is used:
/images?url=&width=&height=
and the example:
/images?height=300&width=300&url=smodlEMvQc
When I add this onto the end of my CloudFront URL as follows:
examplecloudfront.net/images?height=300&width=300&url=smodlEMvQc
The query strings never appear in Popular Objects, which suggests the URLs are not being cached.
I have ticked the forward query string option, so the URL including the query string should be showing up in the popular items, but I have tested the same URLs many times without any success.
For the sake of completeness.
The query strings will appear in cloudfront logs.
Enable them in the AWS console for the CloudFront distribution that you are using.
This will direct the logs into an S3 bucket.
The logs are gzip-compressed, tab-separated text files that show the URL and the query strings.
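To check whether the query strings are reaching CloudFront, the downloaded logs can be tallied with a short script. This is a sketch assuming the standard log layout (tab-separated rows with a "#Fields:" header naming the cs-uri-stem and cs-uri-query columns); the sample log text is made up and shortened to a few columns, and count_requests is a hypothetical helper name.

```python
from collections import Counter


def count_requests(log_text: str) -> Counter:
    """Count requests per URL (path plus query string) in a CloudFront log."""
    fields: list[str] = []
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            # The header names the columns, so we do not hard-code positions.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split("\t")))
        stem = row.get("cs-uri-stem")
        if stem is None:
            continue
        query = row.get("cs-uri-query", "-")
        # CloudFront writes "-" when a request has no query string.
        counts[stem if query == "-" else f"{stem}?{query}"] += 1
    return counts


# Made-up sample with only a subset of the real columns.
SAMPLE = (
    "#Version: 1.0\n"
    "#Fields: date time cs-uri-stem sc-status cs-uri-query\n"
    "2021-04-01\t10:00:00\t/images\t200\theight=300&width=300&url=smodlEMvQc\n"
    "2021-04-01\t10:00:05\t/images\t200\theight=300&width=300&url=smodlEMvQc\n"
    "2021-04-01\t10:00:09\t/index.html\t200\t-\n"
)

if __name__ == "__main__":
    for url, hits in count_requests(SAMPLE).most_common():
        print(hits, url)
```

For a real file, the text can be read with gzip.open(path, "rt") from the standard library before passing it to the function.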
I would like to send the last modified date of the uploaded file to the server. I have the JavaScript snippet to get that using the File API ($(this).fineUploaderS3('getFile', id).lastModifiedDate). I would like to send this information when the uploadSuccess endpoint is called, but I cannot find the right callback in the Events | Fine Uploader documentation, and I cannot find a way to inject the data.
These are submitted as POST parameters to my server when the upload to S3 finishes: key, uuid, name, bucket. I would like to inject the lastModified date here somehow.
Option 2:
Asking the Amazon S3 service for the last modification date does not help directly, because the uploaded file carries the upload date, not the file's original date. It would be great if we could inject the information into the FineUploader->S3 communication in a way that S3 would use it to set its own last modified date for the uploaded file.
Other perspective I considered:
If I use onSubmit and setParams, then the Amazon S3 server will take it as 'x-amz-meta-lastModified'. The problem is that when I upload larger files (which are uploaded in chunks using a different request flow), I get a signing error: ...<Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>....
EDIT
The "Other perspective I considered" approach works. The problem was the name of the custom metadata which I used in setParams. It cannot contain capital letters, otherwise the signing fails. I did not find any reference documentation for this. For one, I checked Object Key and Metadata - Amazon Simple Storage Service. If someone could find me a reference, I would include it here.
The original question (when and how to send last modified date to the server component) remains.
(Server is PHP.)
EDIT2
Option 2 will not work: as far as my research went, the "Last Modified" entry cannot be manually altered in Amazon S3.
If the S3 API does not return the expected last modified date, you can check the value of the lastModifiedDate on the File object associated with the upload (provided the browser supports the file API) and send that value as a parameter to the upload success endpoint. See the documentation for the setUploadSuccessParams API method for more details.
If I have a bucket with hundreds of thousands of images, is it OK to search for each image I want to display on my site via its ID, or is there a more efficient way (maybe including having multiple folders in a bucket)?
I was also thinking of giving each image a unique hash or something similar in order to stop duplicated names in the bucket. Does that seem like a good idea?
You just link to each image using normal URLs. For public files the URLs are in the format:
http://mybucket.s3.amazonaws.com/myimage.jpg
For private files, you need to generate a URL (which is easy using any of the SDKs) in the format:
http://mybucket.s3.amazonaws.com/myimage.jpg?AWSAccessKeyId=44CF9SAMPLEF252F707&Expires=1177363698&Signature=vjSAMPLENmGa%2ByT272YEAiv4%3D
There's nothing wrong with storing each file under a unique name. If you set the correct headers on the file, any downloads can still have the original name, e.g. Content-Disposition: attachment; filename=myimage.jpg;
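One way to get unique names is to derive the key from the file contents, sketched below in Python. Both helper names are hypothetical, and hashing the contents is just one option (a random UUID would also work); a side effect of hashing is that two uploads of identical bytes map to the same key.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_key(original_name: str, data: bytes) -> str:
    """Build an object key from the SHA-256 of the contents plus the extension."""
    digest = hashlib.sha256(data).hexdigest()
    return digest + PurePosixPath(original_name).suffix.lower()


def content_disposition(original_name: str) -> str:
    """Header value that makes downloads use the original file name."""
    # Escape backslashes and quotes so the quoted filename stays well-formed.
    safe = original_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'attachment; filename="{safe}"'


if __name__ == "__main__":
    print(hashed_key("myimage.jpg", b"example image bytes"))
    print(content_disposition("myimage.jpg"))
```

The header value would be supplied when the object is uploaded; with boto3, for instance, put_object accepts a ContentDisposition argument for this.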
For listing a bucket's contents you would use the API's GetBucket command. I find it easier to use the SDKs for any access via the API.
It can be a pain to search or do things in parallel over bucket objects as amazon lists everything lexicographically (the only way currently supported). The problem with using random IDs is that all of it would be written to the same block storage and you cannot do search in parallel to optimize.
Here is an interesting article on performance improvements. I use it for my work and see significant difference in high load.
http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
Is there any way to update files stored on Amazon CloudFront (Amazon's CDN service)?
Seems like it won't take any update of a file we make (e.g. removing the file and storing the new one with the same file name as before).
Do I have to explicitly trigger an update process to remove the files from the edge servers to get the new file contents published?
Thanks for your help
Here is how I do it using the CloudFront control panel.
Select CloudFront from the list of services.
Make sure Distributions from the top left is selected.
Next click the link for the associated distribution from the list (under id).
Select the Invalidations tab.
Click the Create Invalidation button and enter the location of the files you want to be invalidated (updated).
Then click the Invalidate button and you should now see InProgress under status.
It usually takes 10 to 15 minutes to complete your invalidation request, depending on the size of your request.
Once it says completed you are good to go.
Tip:
Once you have created a few invalidations, if you come back and need to invalidate the same files, tick the select box next to an earlier invalidation and the Copy link will become available, making it even quicker.
Amazon added an invalidation feature. See the API Reference.
Sample Request from the API Reference:
POST /2010-08-01/distribution/[distribution ID]/invalidation HTTP/1.0
Host: cloudfront.amazonaws.com
Authorization: [AWS authentication string]
Content-Type: text/xml
<InvalidationBatch>
<Path>/image1.jpg</Path>
<Path>/image2.jpg</Path>
<Path>/videos/movie.flv</Path>
<CallerReference>my-batch</CallerReference>
</InvalidationBatch>
Set TTL=1 hour and replace
http://developer.amazonwebservices.com/connect/ann.jspa?annID=655
Download Cloudberry Explorer freeware version to do this on single files:
http://blog.cloudberrylab.com/2010/08/how-to-manage-cloudfront-object.html
Cyberduck for Mac & Windows provides a user interface for object invalidation. Refer to http://trac.cyberduck.ch/wiki/help/en/howto/cloudfront.
I seem to remember seeing this on serverfault already, but here's the answer:
By "Amazon CDN" I assume you mean "CloudFront"?
It's cached, so if you need it to be updated right now (as opposed to "new version will be visible in 24 hours") you'll have to choose a new name. Instead of "logo.png", use "logo.png--0", then upload the update as "logo.png--1" and change your HTML to point to that.
There is no way to "flush" amazon cloudfront.
Edit: This was not possible, it is now. See comments to this reply.
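For anyone who still prefers renaming over invalidation, the "--N" suffix scheme described above can be automated. This is a small Python sketch; bump_version is a made-up helper name, and the suffix format simply follows the "logo.png--0" example.

```python
import re

# Matches names that already carry a "--<number>" version suffix.
_VERSIONED = re.compile(r"^(?P<base>.+)--(?P<n>\d+)$")


def bump_version(name: str) -> str:
    """Return the next versioned name: logo.png -> logo.png--0 -> logo.png--1."""
    match = _VERSIONED.match(name)
    if match is None:
        # No suffix yet, so start counting at 0.
        return f"{name}--0"
    return f"{match['base']}--{int(match['n']) + 1}"


if __name__ == "__main__":
    name = "logo.png"
    for _ in range(3):
        name = bump_version(name)
        print(name)
```

The HTML then has to reference whatever name the helper returned, so in practice this would run as part of the deploy step that writes the page.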
CloudFront's user interface offers this under the [i] button > "Distribution Settings", tab "Invalidations": https://console.aws.amazon.com/cloudfront/home#distribution-settings
In Ruby, using the fog gem:
require 'fog'

# Credentials and the distribution ID are read from the environment.
AWS_ACCESS_KEY = ENV['AWS_ACCESS_KEY_ID']
AWS_SECRET_KEY = ENV['AWS_SECRET_ACCESS_KEY']
AWS_DISTRIBUTION_ID = ENV['AWS_DISTRIBUTION_ID']

# Open a connection to the CloudFront (CDN) API.
conn = Fog::CDN.new(
  :provider => 'AWS',
  :aws_access_key_id => AWS_ACCESS_KEY,
  :aws_secret_access_key => AWS_SECRET_KEY
)

# Paths to invalidate, relative to the distribution root.
images = ['/path/to/image1.jpg', '/path/to/another/image2.jpg']
conn.post_invalidation AWS_DISTRIBUTION_ID, images
Even with an invalidation, it still takes 5-10 minutes for the invalidation to process and refresh on all Amazon edge servers.
CrossFTP for Win, Mac, and Linux provides a user interface for CloudFront invalidation. Check this for more details: http://crossftp.blogspot.com/2013/07/cloudfront-invalidation-with-crossftp.html
I am going to summarize possible solutions.
Case 1: One-time update: Use Console UI.
You can manually go through the console's UI as per #CoalaWeb's answer and initiate an "invalidation" on CloudFront that usually takes less than one minute to finish. It's a single click.
Additionally, you can manually update the path it points to in S3 there in the UI.
Case 2: Frequent update, on the Same path in S3: Use AWS CLI.
You can use the AWS CLI to run the same invalidation from the command line.
The command is:
aws cloudfront create-invalidation --distribution-id E1234567890 --paths "/*"
Replace the E1234567890 part with the DistributionId that you can see in the console. You can also limit this to certain files instead of /* for everything.
An example of how to put it in package.json for a Node/JavaScript project as a target can be found in this answer. (different question)
Notes:
I believe the first 1000 invalidations per month are free right now (April 2021).
The user that performs AWS CLI invalidation should have CreateInvalidation access in IAM. (Example in the case below.)
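If the invalidation has to run from a script rather than a shell, the same request can be made with boto3. This is a sketch, assuming boto3 is installed and credentials are configured; build_invalidation_batch and invalidate are hypothetical helper names, and E1234567890 is the same placeholder ID as in the CLI command.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for create_invalidation."""
    items = list(paths)
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique per request; a timestamp is one option.
        "CallerReference": caller_reference or str(time.time_ns()),
    }


def invalidate(distribution_id, paths):
    """Send the invalidation; needs boto3 and the CreateInvalidation permission."""
    import boto3  # third-party; imported here so the helper above runs without it

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


if __name__ == "__main__":
    # Only prints the payload; call invalidate("E1234567890", ["/*"]) to send it.
    print(build_invalidation_batch(["/*"]))
```

Keeping the payload builder separate from the network call makes it easy to check the paths before anything is sent.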
Case 3: Frequent update, the Path on S3 Changes every time: Use a Manual Script.
If you are storing different versions of your files in S3 (i.e. the path contains the version-id of the files/artifacts) and you need to change that in CloudFront every time, you need to write a script to perform that.
Unfortunately, AWS CLI for CloudFront doesn't allow you to easily update the path with one command. You need to have a detailed script. I wrote one, which is available with details in this answer. (different question)