I have set up a CloudFront distribution to handle image resizing for my app.
My image-resize function runs in AWS Lambda, with an API Gateway call wrapped around it. The function is called with the following URL:
/images?url=&width=&height=
and an example:
/images?height=300&width=300&url=smodlEMvQc
When I add this onto the end of my CloudFront URL as follows:
examplecloudfront.net/images?height=300&width=300&url=smodlEMvQc
The query strings never appear under Popular Objects, which indicates the URLs are not being cached.
I have ticked the forward-query-strings option, so the query-string-inclusive URLs should be showing up under Popular Objects, but I have tested the same URLs many times without success.
For the sake of completeness: the query strings will appear in the CloudFront logs.
Enable logging for the CloudFront distribution that you are using from the AWS console.
This will direct the logs into an S3 bucket.
The logs are gzipped, tab-delimited text files that show the URL path and the query strings.
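If you want to check programmatically, here is a minimal sketch (Python with boto3) that reads one log file and prints the path and query-string columns; the bucket and key names are hypothetical:

    # Sketch: read one gzipped CloudFront access log from S3 and print the
    # cs-uri-stem and cs-uri-query columns. Bucket/key names are hypothetical.
    import gzip
    import io

    import boto3

    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="my-log-bucket",
                        Key="cf-logs/EXXXXXXXXXXXX.2024-01-01-00.abcd1234.gz")

    with gzip.open(io.BytesIO(obj["Body"].read()), "rt") as log:
        fields = []
        for line in log:
            if line.startswith("#Fields:"):
                fields = line.split()[1:]  # column names from the header line
            elif not line.startswith("#"):
                row = dict(zip(fields, line.rstrip("\n").split("\t")))
                print(row["cs-uri-stem"], row["cs-uri-query"])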
We want to show a different website depending on the current state (location) of the visiting user. Is this possible? I've seen you can do geo restriction, but I guess this is kind of the opposite.
Doing it without a redirect would be awesome, i.e. using different S3 buckets.
You can use Lambda@Edge to inspect the request and redirect (or re-route) the user as you wish.
Here is a standard pretty URL redirect example that you can adjust to your needs.
You can also add multiple origins (e.g. different S3 buckets) and use path patterns to direct traffic to a specific origin based on the URL.
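A minimal sketch of the Lambda@Edge approach, written as a Python origin-request handler; it assumes the CloudFront-Viewer-Country header is whitelisted for the cache behavior (there is also a CloudFront-Viewer-Country-Region header if you need a region/state rather than a country), and the bucket and region names are hypothetical:

    # Sketch: Lambda@Edge origin-request handler that swaps the S3 origin
    # based on the viewer's country, with no redirect. Names are hypothetical.
    def lambda_handler(event, context):
        request = event["Records"][0]["cf"]["request"]
        headers = request["headers"]

        country = ""
        if "cloudfront-viewer-country" in headers:
            country = headers["cloudfront-viewer-country"][0]["value"]

        if country == "US":
            # Serve this viewer from a different S3 bucket.
            domain = "my-us-bucket.s3.us-east-1.amazonaws.com"
            request["origin"] = {
                "s3": {
                    "domainName": domain,
                    "region": "us-east-1",
                    "authMethod": "none",
                    "path": "",
                    "customHeaders": {},
                }
            }
            headers["host"] = [{"key": "host", "value": domain}]

        return request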
It seems that the GET parameter location is a reserved parameter on AWS S3. Say I have a resource on an S3 bucket, accessible via the web:
http://my-bucket.s3.amazonaws.com/index.html
If I simply append the GET parameter location to it, I get an HTTP 403:
http://my-bucket.s3.amazonaws.com/index.html?location=US
It works so long as I change the parameter name to something else. For example:
http://my-bucket.s3.amazonaws.com/index.html?loc=US
So clearly location is a reserved word in AWS S3. My question is: is there a list of all reserved words I shouldn't try to use as GET parameters with S3?
I searched the docs but couldn't find any such list.
location in the query tells S3 that you're asking for the location of a bucket. It's one of several "subresources" (things that are not objects) in S3 that are accessed via query string parameters.
You could probably compile a nearly complete list by reviewing the entire API reference documentation, but here's a partial list found in some older docs (Signature Version 2):
The subresources that must be included when constructing the CanonicalizedResource Element are acl, lifecycle, location, logging, notification, partNumber, policy, requestPayment, torrent, uploadId, uploads, versionId, versioning, versions, and website.
https://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html
They periodically add new ones, like select, delete, and tagging, so an exhaustive list is not future-proof.
Your safest bet is to use parameters beginning with x- (but not beginning with x-amz since these may be reserved or carry other implications). This is mentioned in the logging documentation:
You can include custom information to be stored in the access log record for a request by adding a custom query-string parameter to the URL for the request. Amazon S3 ignores query-string parameters that begin with "x-", but includes those parameters in the access log record for the request, as part of the Request-URI field of the log record.
https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
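As a quick illustration of the difference (bucket and object names hypothetical): the reserved subresource requires an authorized, signed request, while an x- parameter is simply ignored:

    # Sketch: "?location" is the GetBucketLocation subresource and needs a
    # signed, authorized request, which is why an anonymous GET returns 403.
    # Bucket/object names are hypothetical.
    import boto3
    import requests

    s3 = boto3.client("s3")
    print(s3.get_bucket_location(Bucket="my-bucket")["LocationConstraint"])

    # Parameters starting with "x-" are ignored by S3 (but kept in the logs).
    r = requests.get("http://my-bucket.s3.amazonaws.com/index.html?x-loc=US")
    print(r.status_code)  # 200 for a public object, unlike ?location=US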
An input text file from the web browser needs to be processed in AWS Lambda, and the output (JSON) needs to be rendered back to the browser. (Note: AWS Elastic Beanstalk is being used.)
How do I handle the case where 10 users/clients upload text files with the same name? AWS Lambda should render the output back to the respective user/client. How can this be done with S3 or EFS?
(Note: the users cannot be uniquely identified, as there are no login credentials.)
We had a similar problem and solved it in the following way.
Find a source of uniqueness and name the file accordingly:
Filename-TimeStamp.Extension
If there are frequent uploads within the same timestamp, then add a random sequence number:
Filename-TimeStamp-RandomSequence.Extension
If you want to make it completely random, you can use a UUID (hexadecimal) or idgen (alphanumeric).
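A minimal Python sketch of that scheme (the bucket name is hypothetical):

    # Sketch of the Filename-TimeStamp-RandomSequence.Extension scheme above.
    # Bucket name is hypothetical.
    import os
    import time
    import uuid

    import boto3

    def unique_key(filename: str) -> str:
        stem, ext = os.path.splitext(filename)      # ext keeps the dot
        timestamp = time.strftime("%Y%m%dT%H%M%S")  # TimeStamp part
        random_part = uuid.uuid4().hex[:8]          # RandomSequence part
        return f"{stem}-{timestamp}-{random_part}{ext}"

    s3 = boto3.client("s3")
    key = unique_key("input.txt")  # e.g. input-20240101T120000-1a2b3c4d.txt
    s3.upload_file("input.txt", "my-upload-bucket", key)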
Hope this helps.
I think I need some explanation on this. On this page: http://stage.bullydog.com/Products/unfiltered-product/bd/BDGTPD/bully-dog-gt-platinum-diesel, you will see the following images:
The 2 on the left don't have any query strings and are located at:
http://stage.bullydog.com/azure/bdgtpd/40420_1.png and http://stage.bullydog.com/azure/bdgtpd/40420_4.png
On the third image, I put a query string of ?w=500 and that is located at:
http://stage.bullydog.com/azure/bdgtpd/40420_5.png?w=500
Here is a capture of the network traffic when I requested the page that contained the images:
Here is where I need some clarification and overall what is happening here:
If the image URL contains a query string, does it pull the image from the Azure CDN? I noticed the image 40420_5.png?w=500 has a request URL of http://stage.bullydog.com/azure/bdgtpd/40420_5.png?w=500, so it doesn't appear to be pulling from the Azure CDN. Why is this?
For the other images, take 40420_4 for example: I noticed it issues a 302 first and then another request. Why does it do this?
If using srcset, is ImageResizer beneficial here? For example, is it better to set the different sources with the query string attached (40420_5.png?w=250, 40420_5.png?w=500, etc.), or is it better to just create the different image sizes (40420_5_w250.png, 40420_5_w500.png)? Or maybe Slimmage with SlimResponse would be the way to go?
AzureReader2 will issue a redirect to Azure if no processing is required; that is the 302 you see for the plain image URLs. Otherwise it serves the processed result itself, and your browser will not be able to see that AzureReader2 is making an HTTP request in the background to fetch the source resource. Your URLs should always point to the ImageResizer server.
Srcset + ImageResizer is great. Most people use it with the w=[value] and zoom=[value] querystring commands.
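As a rough illustration of why the query-string approach scales nicely, here is a tiny Python sketch that builds a srcset value from one master image (the widths chosen are arbitrary):

    # Sketch: build a srcset attribute from a single master image using
    # ImageResizer's w= command; the widths here are arbitrary.
    BASE = "http://stage.bullydog.com/azure/bdgtpd/40420_5.png"
    WIDTHS = [250, 500, 1000]

    srcset = ", ".join(f"{BASE}?w={w} {w}w" for w in WIDTHS)
    print(srcset)  # ...?w=250 250w, ...?w=500 500w, ...?w=1000 1000w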
If I have a bucket with hundreds of thousands of images, is it OK to look up each image I want to display on my site via its ID, or is there a more efficient way (such as using multiple folders within the bucket)?
I was also thinking of giving each image a unique hash or something similar to prevent duplicate names in the bucket. Does that seem like a good idea?
You just link to each image using normal URLs. For public files, the URLs are in the format:
http://mybucket.s3.amazonaws.com/myimage.jpg
For private files, you need to generate a URL (which is easy using any of the SDKs) in the format:
http://mybucket.s3.amazonaws.com/myimage.jpg?AWSAccessKeyId=44CF9SAMPLEF252F707&Expires=1177363698&Signature=vjSAMPLENmGa%2ByT272YEAiv4%3D
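With boto3, for example, generating such a URL is a single call (bucket and key names hypothetical):

    # Sketch: generate a presigned GET URL in the format shown above.
    # Bucket and key names are hypothetical.
    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "mybucket", "Key": "myimage.jpg"},
        ExpiresIn=3600,  # seconds the link stays valid
    )
    print(url)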
There's nothing wrong with storing each file under a unique name. If you set the correct headers on the file, any downloads can still use the original name, e.g. Content-Disposition: attachment; filename=myimage.jpg;
For listing a bucket's contents, you would use the API's GET Bucket operation. I find it easier to use the SDKs for any access via the API.
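A minimal boto3 sketch of that listing call (ListObjectsV2 is the SDK name for the GET Bucket operation; the bucket name is hypothetical):

    # Sketch: list a bucket's contents page by page.
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="mybucket"):
        for obj in page.get("Contents", []):
            print(obj["Key"])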
It can be a pain to search or do things in parallel over bucket objects, as Amazon lists everything lexicographically (the only ordering currently supported). The problem with completely random IDs is that the keys carry no meaningful prefixes, so you cannot split a search across prefixes and run it in parallel to optimize.
Here is an interesting article on performance improvements. I use its techniques in my work and see a significant difference under high load.
http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
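For example, if your keys carry a short, meaningful prefix (rather than being fully random), you can fan a listing out across prefixes; a hedged sketch, with the prefix layout and bucket name hypothetical:

    # Sketch: parallel listing by key prefix; this only works when keys share
    # meaningful prefixes (e.g. images/00/ ... images/ff/). Names hypothetical.
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")
    PREFIXES = [f"images/{i:02x}/" for i in range(256)]

    def list_prefix(prefix):
        paginator = s3.get_paginator("list_objects_v2")
        keys = []
        for page in paginator.paginate(Bucket="mybucket", Prefix=prefix):
            keys.extend(o["Key"] for o in page.get("Contents", []))
        return keys

    with ThreadPoolExecutor(max_workers=16) as pool:
        all_keys = [k for keys in pool.map(list_prefix, PREFIXES) for k in keys]
    print(len(all_keys))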