S3 Fallback Bucket - amazon-s3

I have built a system where I have product templates. A brand will override the template to create a product. Images can be uploaded to the template and be overridden on the product. The product images are uploaded to the corresponding brand's S3 bucket, but the product template images are uploaded to a generic S3 bucket.
Is there a way to make the brand's bucket fall back to the generic bucket if a file URL returns a 404 or 403, similar to the hosted-website redirect rules? These are just buckets of images, so they aren't hosted websites, and I was hoping to avoid turning that on.

There is not a way to do this with S3 alone, but it can be done with CloudFront, in conjunction with two S3 buckets, configured in an origin group with appropriate origin failover settings, so that 403/404 errors from the first bucket cause CloudFront to make a follow-up request to the second bucket.
After you configure origin failover for a cache behavior, CloudFront does the following for viewer requests:
When there’s a cache hit, CloudFront returns the requested file.
When there’s a cache miss, CloudFront routes the request to the primary origin that you identified in the origin group.
When a status code that has not been configured for failover is returned, such as an HTTP 2xx or HTTP 3xx status code, CloudFront serves the requested content.
When the primary origin returns an HTTP status code that you’ve configured for failover, or after a timeout, CloudFront routes the request to the backup origin in the origin group.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/high_availability_origin_failover.html
This seems to be the desired behavior you're describing. It means, of course, that cache misses that need to fall back to the second bucket will require additional time to be served, but for cache hits there won't be any delay, since CloudFront only goes through the "try one, then try the other" sequence on cache misses. It also means that you'll be paying for some traffic on the primary bucket for objects that aren't present, so it makes the most sense if the primary bucket will have the object more often than not.
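For what it's worth, if you script this with boto3 rather than the console, the origin group portion of the DistributionConfig looks roughly like the sketch below. This is only a fragment, and the origin IDs are placeholders, not anything from your setup:

    # Fragment of a CloudFront DistributionConfig for boto3's
    # create_distribution() / update_distribution().  IDs are placeholders.
    origin_groups = {
        "Quantity": 1,
        "Items": [
            {
                "Id": "brand-with-generic-fallback",
                "FailoverCriteria": {
                    # Retry against the second member on 403/404 from the primary.
                    "StatusCodes": {"Quantity": 2, "Items": [403, 404]}
                },
                "Members": {
                    "Quantity": 2,
                    "Items": [
                        {"OriginId": "brand-bucket-origin"},    # primary: brand's bucket
                        {"OriginId": "generic-bucket-origin"},  # fallback: generic template bucket
                    ],
                },
            }
        ],
    }

The cache behavior's TargetOriginId then points at the origin group's Id instead of at an individual origin.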
This solution does not redirect the browser -- CloudFront tries the second origin before returning a response -- so you'll want to be mindful of the Cache-Control settings you attach to the fallback objects when you upload them: if a previously-absent primary object is added after a fallback object has already been fetched and cached (by either CloudFront or the browser), the new object will not be visible until the cached copies expire.

Related

How to use Akamai in front of S3 buckets?

I have a static website that is currently hosted on Apache servers. I have an Akamai server which routes requests for my site to those servers. I want to move my static websites to Amazon S3, to get away from having to host those static files on my servers.
I created an S3 bucket in Amazon and gave it the appropriate policies. I also set up my bucket for static website hosting. It told me that I can access the site at
http://my-site.s3-website-us-east-1.amazonaws.com
I modified my Akamai properties to point to this URL as my origin server. When I go to my website, I get HTTP 504 errors.
What am I missing here?
Thanks
K
S3 buckets don't support HTTPS?
Buckets support HTTPS, but not directly in conjunction with the static web site hosting feature.
See Website Endpoints in the S3 Developer Guide for discussion of the feature set differences between the REST endpoints and the web site hosting endpoints.
Note that if you try to connect directly to your website hosting endpoint with your browser over HTTPS, you will get a timeout error.
The REST endpoint https://your-bucket.s3.amazonaws.com will work for providing HTTPS between the bucket and the CDN, as long as there are no dots in the name of your bucket.
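If you want to convince yourself of the difference, a quick check from Python (the bucket name, key, and region here are placeholders, and it assumes the object is publicly readable):

    import requests

    # HTTPS against the REST endpoint works, provided the bucket name
    # contains no dots (the wildcard certificate is *.s3.amazonaws.com).
    r = requests.get("https://your-bucket.s3.amazonaws.com/index.html", timeout=5)
    print(r.status_code)

    # HTTPS against the website hosting endpoint does not work -- nothing
    # answers on port 443, so this should time out rather than respond.
    try:
        requests.get("https://your-bucket.s3-website-us-east-1.amazonaws.com/index.html", timeout=5)
    except requests.exceptions.RequestException as exc:
        print("website endpoint over HTTPS failed:", exc)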
Or, if you need the website hosting features (index documents and redirects), you can place CloudFront between Akamai and S3, encrypting the traffic with CloudFront as it leaves the AWS network on its way to Akamai (it would still be in the clear from S3 to CloudFront, but that is internal traffic on the AWS network). CloudFront automatically provides HTTPS support on the dddexample.cloudfront.net hostname it assigns to each distribution.
I admit, it sounds a bit silly, initially, to put CloudFront behind another CDN, but it's really pretty sensible -- CloudFront was designed in part to augment the capabilities of S3. CloudFront also provides Lambda@Edge, which allows injection of logic at four trigger points in the request processing cycle (before and after the CloudFront cache, during the request and during the response), where you can modify request and response headers, generate dynamic responses, and make external network requests if needed to implement processing logic.
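To give a flavor of what a Lambda@Edge trigger can do (this is just an illustrative sketch of an origin-response handler in Python, not something the setup above requires; the header name is made up):

    # Minimal Lambda@Edge origin-response handler: adds a response header
    # before CloudFront caches and returns the object.
    def lambda_handler(event, context):
        response = event["Records"][0]["cf"]["response"]
        response["headers"]["x-served-by"] = [
            {"key": "X-Served-By", "value": "cloudfront-behind-akamai"}
        ]
        return response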
I faced this problem recently and, as mentioned by Michael - sqlbot, putting CloudFront between Akamai and the S3 bucket could be a workaround, but that means using a CDN behind another CDN. I strongly recommend configuring the redirects and customizing the error response directly in Akamai (using the REST API endpoint of your bucket). You'll need to create three rules. First, go to CDN > Properties and select your property, click Edit New Version based on the last one, and click Add Rule in the Property Configuration Settings section. The first rule is responsible for redirecting empty paths to index.html; create it just like the image below:
builtin.AK_PATH is an Akamai variable. The next rule is responsible for redirecting paths other than the static ones (html, ico, json, js, css, jpg, png, gif, etc.) to /index.html:
The last rule is responsible for customizing the error response when the origin returns an HTTP error code (just like CloudFront's Error Pages). When the origin returns a 404 or 403 HTTP status code, Akamai will call the Failover Hostname Edge Server (which is inside the Akamai network) with the /index.html path. This setup is triggered when pages are refreshed in the browser and when the application has redirect links (which open new tabs, for example). In the Property Hostnames section, add a new hostname that will work as the Failover Hostname Edge Server; the name should have fewer than 16 characters. Then add the -a.akamaihd.net suffix to it (that's the Akamai pattern), for example: failover-a.akamaihd.net:
Finally, create a new empty rule just like the image below (type the hostname you just created in the Alternate Hostname in This Property section):
Since you are already using Akamai as a CDN, you could simply use their NetStorage product line to achieve this in a simplified manner.
All you would need to do is move the content from S3 to Akamai, and it would take care of the rest (hosting, distribution, scaling, security, redundancy).
The origin settings on the Luna control panel could simply point to the NetStorage FTP location. This will also remove the network latency otherwise present when accessing the S3 bucket from the Akamai network.

Cloudfront redirecting to S3 endpoint with 307 Temporary Redirect

I'm getting the CloudFront endpoint redirecting to the S3 endpoint with a 307 Temporary Redirect. Is there a reason why this is happening?
I've tried creating a website endpoint and changing the origin, but no luck -- same result.
The temporary redirect is actually caused by the way S3 buckets behave when they are newly created (thanks to @Michael-sqlbot for clarifying this).
From the docs (Temporary Request Redirection)
Due to the distributed nature of Amazon S3, requests can be temporarily routed to the wrong facility. This is most likely to occur immediately after buckets are created or deleted. For example, if you create a new bucket and immediately make a request to the bucket, you might receive a temporary redirect, depending on the location constraint of the bucket.
Change your Origin Domain Name to bucketname.s3-region.amazonaws.com per the docs:
If you're using an Amazon CloudFront distribution with an Amazon S3 origin, CloudFront forwards requests to the default S3 endpoint (s3.amazonaws.com), which is in the us-east-1 Region. If you must access Amazon S3 within the first 24 hours of creating the bucket, you can change the Origin Domain Name of the distribution to include the regional endpoint of the bucket. For example, if the bucket is in us-west-2, you can change the Origin Domain Name from bucketname.s3.amazonaws.com to bucketname.s3-us-west-2.amazonaws.com.
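If you'd rather script that change than click through the console, a rough boto3 sketch (the distribution ID, bucket name, and region are placeholders, and it assumes the S3 origin is the first one in the list):

    import boto3

    cloudfront = boto3.client("cloudfront")

    # Fetch the current config along with the ETag that update_distribution() requires.
    resp = cloudfront.get_distribution_config(Id="EDFDVBD6EXAMPLE")
    config = resp["DistributionConfig"]

    # Point the S3 origin at the bucket's regional endpoint instead of the
    # default s3.amazonaws.com endpoint.
    config["Origins"]["Items"][0]["DomainName"] = "bucketname.s3-us-west-2.amazonaws.com"

    cloudfront.update_distribution(
        Id="EDFDVBD6EXAMPLE",
        IfMatch=resp["ETag"],
        DistributionConfig=config,
    )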

Cloudfront Won't Set Expiration Header from S3 Origin

I am using an S3 bucket to store a bunch of product images for a large web site. These images are being served through CloudFront with the S3 bucket as the origin. I have noticed that CloudFront does not put an expiration header on the images, even though I have set the distribution behavior to customize the cache headers and set a long min, max, and default TTL in CloudFront.
I understand that I can put an expiration on each S3 object; however, this is going to be quite impractical as I have millions of images. I was hoping that CloudFront would do me the honors of adding this header for me, but it does not.
So my question is: is the only way to get this expiration header to apply it to every S3 object, or am I missing something in CloudFront that will do it for me?
CloudFront's TTL configuration only controls the amount of time CloudFront keeps the object in the cache.
It doesn't add any headers.
So, yes, you'll need to set these on the objects in S3.
Note that Cache-Control: is usually considered a better choice than Expires:.
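If you do go that route, note that S3 object metadata can't be edited in place; you have to copy the object over itself with a REPLACE metadata directive. A sketch for a single object with boto3 (the bucket, key, and values are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # "Updating" Cache-Control on an existing object means copying it onto
    # itself with replaced metadata.
    s3.copy_object(
        Bucket="my-product-images",
        Key="products/12345.jpg",
        CopySource={"Bucket": "my-product-images", "Key": "products/12345.jpg"},
        MetadataDirective="REPLACE",
        CacheControl="public, max-age=31536000",
        ContentType="image/jpeg",  # re-specify, or the copy can reset it
    )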
An alternative to avoid updating the objects is to configure a proxy server in EC2 in the same region as the bucket, and let the server add the headers as the responses pass through it.
Request: CloudFront >> Proxy >> S3
Response: S3 >> Proxy >> CloudFront
...for what it's worth.

Using S3 error document with https

We have an S3 bucket with website hosting enabled, and an error document set. We want to use it to serve images over https.
Over http, the 404 works fine: example. But for https, we need to use a different URL scheme, and the 404 no longer works: example. (That URL scheme also fails with http: example.)
Is there some way to do this? Have I misconfigured the S3 bucket, or something along those lines? (I've given 'list' permission to everyone, which turned the failure from a 403 to a 404, but not the 404 I want.)
We solved this by setting up CloudFront as an interface to the S3 bucket.
One tricky bit: the origin for the CloudFront distribution needs to have its origin protocol policy set to HTTP only. That means it can't directly be an S3 origin, which always has the 'match viewer' policy. Instead, you can set it to the URL of the bucket's website endpoint: instead of BUCKET.s3.amazonaws.com, use the endpoint given by S3's static website hosting, BUCKET.s3-website-REGION.amazonaws.com.
This might have unintended side effects, and there might be better ways to do it, but it works.
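For reference, if you script the distribution, the origin entry ends up looking something like the sketch below (bucket and region are placeholders; only the relevant keys are shown):

    # Fragment of a CloudFront origin definition pointing at the S3 website
    # hosting endpoint.  The website endpoint only speaks HTTP, so the
    # origin protocol policy must be http-only.
    origin = {
        "Id": "s3-website-origin",
        "DomainName": "BUCKET.s3-website-REGION.amazonaws.com",
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            "OriginProtocolPolicy": "http-only",
        },
    }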

How can I hide a custom origin server from the public when using AWS CloudFront?

I am not sure if this exactly qualifies for Stack Overflow, but since I need to do this programmatically, and I figure lots of people on SO use CloudFront, I think it does... so here goes:
I want to hide public access to my custom origin server.
CloudFront pulls from the custom origin, however I cannot find documentation or any sort of example on preventing direct requests from users to my origin when proxied behind CloudFront unless my origin is S3... which isn't the case with a custom origin.
What technique can I use to identify/authenticate that a request is being proxied through CloudFront instead of being directly requested by the client?
The CloudFront documentation only covers this case when used with an S3 origin. The AWS forum post that lists CloudFront's IP addresses has a disclaimer that the list is not guaranteed to be current and should not be relied upon. See https://forums.aws.amazon.com/ann.jspa?annID=910
I assume that anyone using CloudFront has some sort of way to hide their custom origin from direct requests / crawlers. I would appreciate any sort of tip to get me started. Thanks.
I would suggest using something similar to facebook's robots.txt in order to prevent all crawlers from accessing all sensitive content in your website.
https://www.facebook.com/robots.txt (you may have to tweak it a bit)
After that, just point your app (e.g. Rails) to be the custom origin server.
Now rewrite all the URLs on your site to become absolute URLs like:
https://d2d3cu3tt4cei5.cloudfront.net/hello.html
Basically all URLs should point to your CloudFront distribution. Now if someone requests a file from https://d2d3cu3tt4cei5.cloudfront.net/hello.html and CloudFront does not have hello.html cached, it can fetch it from your server (over an encrypted channel like HTTPS) and then serve it to the user.
So even if the user does a view-source, they do not see your origin server -- they only see your CloudFront distribution.
more details on setting this up here:
http://blog.codeship.io/2012/05/18/Assets-Sprites-CDN.html
Create a custom CNAME that only CloudFront uses. On your own servers, block any request for static assets not coming from that CNAME.
For instance, if your site is http://abc.mydomain.net then set up a CNAME for http://xyz.mydomain.net that points to the exact same place and put that new domain in CloudFront as the origin pull server. Then, on requests, you can tell if it's from CloudFront or not and do whatever you want.
The downside is that this is security through obscurity. The client never sees the requests for http://xyz.mydomain.net, but that doesn't mean they won't have some way of figuring it out.
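If the origin is an app you control, the check itself can be as simple as looking at the Host header. A hypothetical sketch with Flask (the hostname and path prefix are made up for illustration; as noted, this is still security through obscurity):

    from flask import Flask, abort, request

    app = Flask(__name__)

    # Only CloudFront is configured with this hostname; direct visitors use
    # the public hostname and are refused static assets.
    CDN_ONLY_HOST = "xyz.mydomain.net"

    @app.before_request
    def block_direct_asset_requests():
        if request.path.startswith("/static/") and request.host != CDN_ONLY_HOST:
            abort(403)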
[I know this thread is old, but I'm answering it for people like me who see it months later.]
From what I've read and seen, CloudFront does not consistently identify itself in requests. But you can get around this problem by overriding robots.txt at the CloudFront distribution.
1) Create a new S3 bucket that only contains one file: robots.txt. That will be the robots.txt for your CloudFront domain.
2) Go to your distribution settings in the AWS Console and click Create Origin. Add the bucket.
3) Go to Behaviors and click Create Behavior:
Path Pattern: robots.txt
Origin: (your new bucket)
4) Set the robots.txt behavior at a higher precedence (lower number).
5) Go to invalidations and invalidate /robots.txt.
Now abc123.cloudfront.net/robots.txt will be served from the bucket and everything else will be served from your domain. You can choose to allow/disallow crawling at either level independently.
Another domain/subdomain will also work in place of a bucket, but why go to the trouble.
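If you prefer scripting step 5 instead of using the console, the invalidation is a couple of lines with boto3 (the distribution ID is a placeholder):

    import time

    import boto3

    cloudfront = boto3.client("cloudfront")

    # Invalidate the cached /robots.txt so the new behavior takes effect
    # without waiting for the old copy to expire.
    cloudfront.create_invalidation(
        DistributionId="EDFDVBD6EXAMPLE",
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/robots.txt"]},
            "CallerReference": str(time.time()),  # any unique string
        },
    )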