How to serve an AWS EC2 instance from an S3 subdirectory

I have a website hosted on AWS S3, served over CloudFront: www.mysite.com
I am hosting a blog on an EC2 instance.
I would like to have this blog served from www.mysite.com/blog
For SEO purposes I do not want it to be www.blog.mysite.com
Is it possible to achieve this with only S3 and CloudFront?
I have played around with S3 redirects and Lambda@Edge, but the documentation on these is not great. In the case of Lambda@Edge I want to avoid further complexity if I can. S3 redirects work, but the user is no longer under the mysite domain.
S3 redirect example:
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>blog/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <HostName>${EC2_Public_DNS}</HostName>
    </Redirect>
  </RoutingRule>
</RoutingRules>
Other articles I have read involve using apache or nginx servers handling the redirect. I would rather not have to add one of these.

Since a redirect is an instruction for the browser, telling it to look elsewhere for the requested resource, CloudFront isn't designed to follow redirects itself -- it returns the redirect to the browser.
What you want, instead, is a new CloudFront Cache Behavior, and a new CloudFront Origin server declaration, configured in the existing CloudFront distribution that's handling your site.
In CloudFront, add a new Origin, setting the Origin Domain Name to the hostname pointing to the EC2 instance (or to the load balancer in front of the instance, if there is one). You'll notice a field called "Origin Path", which you might be tempted to set to "/blog/" or something similar, but that is incorrect. Leave "Origin Path" blank.
Then add a new Cache Behavior matching the Path Pattern /blog/* and pointing it to the new origin.
That, in a nutshell, is what you are looking for, but there are several other factors that will require appropriate settings and configuration.
You'll need a TLS certificate on your origin server, unless you set the Origin Protocol Policy to HTTP Only, in which case traffic between CloudFront and EC2 runs unencrypted. CloudFront has specific requirements for correctly configuring TLS on your origin server, and most TLS-related misconfigurations will result in a 502 Bad Gateway error, though of course there can be other causes for that error code.
Your blog software might require query string parameters and/or cookies, which CloudFront, by default, strips from all requests (because they interfere with caching). These are two of the Cache Behavior settings that commonly require customization, since the defaults are based on appropriate settings for typical static content.
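If you prefer to script the changes described above rather than click through the console, here is a rough boto3 sketch. The distribution ID and origin hostname are placeholders, the forwarding settings are only one reasonable choice, and the CloudFront API may require additional fields depending on how your distribution is already configured, so treat this as a starting point rather than a complete implementation.

import boto3

cloudfront = boto3.client('cloudfront')

DISTRIBUTION_ID = 'EXXXXXXXXXXXXX'              # placeholder
BLOG_ORIGIN_DOMAIN = 'blog-origin.mysite.com'   # placeholder: EC2 or ELB hostname

# Fetch the current configuration; update_distribution requires sending the
# whole config back together with the ETag returned here.
response = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
config = response['DistributionConfig']
etag = response['ETag']

# New custom origin for the blog. "OriginPath" is deliberately left blank,
# as discussed above.
config['Origins']['Items'].append({
    'Id': 'blog-ec2-origin',
    'DomainName': BLOG_ORIGIN_DOMAIN,
    'OriginPath': '',
    'CustomHeaders': {'Quantity': 0},
    'CustomOriginConfig': {
        'HTTPPort': 80,
        'HTTPSPort': 443,
        'OriginProtocolPolicy': 'https-only',   # or 'http-only' if the origin has no TLS
        'OriginSslProtocols': {'Quantity': 1, 'Items': ['TLSv1.2']},
    },
})
config['Origins']['Quantity'] = len(config['Origins']['Items'])

# New cache behavior routing /blog/* to the new origin, forwarding query
# strings and cookies since blog software usually needs them.
config['CacheBehaviors'].setdefault('Items', []).append({
    'PathPattern': '/blog/*',
    'TargetOriginId': 'blog-ec2-origin',
    'ViewerProtocolPolicy': 'redirect-to-https',
    'ForwardedValues': {
        'QueryString': True,
        'Cookies': {'Forward': 'all'},
    },
    'MinTTL': 0,
    'TrustedSigners': {'Enabled': False, 'Quantity': 0},
})
config['CacheBehaviors']['Quantity'] = len(config['CacheBehaviors']['Items'])

cloudfront.update_distribution(
    Id=DISTRIBUTION_ID,
    IfMatch=etag,
    DistributionConfig=config,
)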
You will also need to configure your blog software to expect the incoming requests to include the path prefix "/blog/", because CloudFront does not remove path components. The only way to present the path to the origin server with one or more elements stripped from it is to use Lambda@Edge to rewrite the path -- as I explained here.
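If you do end up needing that path rewrite, a minimal origin-request Lambda@Edge sketch (Python is one of the supported runtimes) looks roughly like this; the "/blog" prefix handling is an assumption about how your blog expects its paths.

# Minimal Lambda@Edge origin-request handler: strips the "/blog" prefix
# before the request is forwarded to the EC2 origin.
def handler(event, context):
    request = event['Records'][0]['cf']['request']
    uri = request['uri']
    if uri == '/blog' or uri.startswith('/blog/'):
        # "/blog/post-1" becomes "/post-1"; a bare "/blog" becomes "/"
        request['uri'] = uri[len('/blog'):] or '/'
    return request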
If you are now mentally protesting the path being set to "/blog/" instead of "/blog", the thing to keep in mind is that this path needs correct anchoring: HTTP semantics assume directory levels end with "/", while files and other resources don't, so you're likely to encounter difficulties if you try to put the blog at a path that doesn't end with "/". For the benefit of users who shouldn't be expected to type the trailing "/", you still do need to configure a redirect in S3 -- but only in order to send requests for /blog right back to /blog/.
<RoutingRules>
  <RoutingRule>
    <Condition>
      <!-- S3 routing rules only support prefix matching (KeyPrefixEquals),
           so this also matches any other key that begins with "blog" -->
      <KeyPrefixEquals>blog</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyWith>blog/</ReplaceKeyWith>
      <HostName>${main_site_hostname}</HostName>
      <Protocol>https</Protocol>
    </Redirect>
  </RoutingRule>
</RoutingRules>
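The same routing rule can also be applied with boto3 instead of pasting XML into the console. A sketch, assuming hypothetical bucket and hostname values; note that put_bucket_website replaces the website configuration wholesale, so include your index document and any other rules you already have.

import boto3

s3 = boto3.client('s3')

s3.put_bucket_website(
    Bucket='www.mysite.com',                    # placeholder bucket name
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'RoutingRules': [
            {
                'Condition': {'KeyPrefixEquals': 'blog'},
                'Redirect': {
                    'HostName': 'www.mysite.com',   # placeholder hostname
                    'Protocol': 'https',
                    'ReplaceKeyWith': 'blog/',
                },
            },
        ],
    },
)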
When testing, you may also want to set your Error Caching Minimum TTL to 0 so that you don't fix a problem and keep seeing cached errors returned for the next 5 minutes, even though the error has been resolved by a change you made. CloudFront does this to help avoid overloading an origin server that might already be experiencing problems (as evidenced by the fact that it's returning errors), but it catches some users off guard.

Related

How to use Akamai in front of S3 buckets?

I have a static website that is currently hosted on Apache servers. I have an Akamai server which routes requests for my site to those servers. I want to move my static websites to Amazon S3, to get away from having to host those static files on my own servers.
I created an S3 bucket in Amazon and gave it the appropriate policies. I also set up my bucket for static website hosting. It told me that I can access the site at
http://my-site.s3-website-us-east-1.amazonaws.com
I modified my Akamai properties to point to this URL as my origin server. When I go to my website, I get HTTP 504 errors.
What am I missing here?
S3 buckets don't support HTTPS?
Buckets support HTTPS, but not directly in conjunction with the static web site hosting feature.
See Website Endpoints in the S3 Developer Guide for discussion of the feature set differences between the REST endpoints and the web site hosting endpoints.
Note that if you try to directly connect to your web site hosting endpoint with your browser, you will get a timeout error.
The REST endpoint https://your-bucket.s3.amazonaws.com will work for providing HTTPS between the bucket and the CDN, as long as there are no dots in the name of your bucket.
Or, if you need the website hosting features (index documents and redirects), you can place CloudFront between Akamai and S3, encrypting the traffic as it leaves the AWS network on its way to Akamai (it would still be in the clear from S3 to CloudFront, but that is internal traffic on the AWS network). CloudFront automatically provides HTTPS support on the dddexample.cloudfront.net hostname it assigns to each distribution.
I admit, it sounds a bit silly, initially, to put CloudFront behind another CDN, but it's really pretty sensible -- CloudFront was designed in part to augment the capabilities of S3. CloudFront also provides Lambda@Edge, which allows injection of logic at four trigger points in the request processing cycle (before and after the CloudFront cache, during the request and during the response), where you can modify request and response headers, generate dynamic responses, and make external network requests if needed to implement processing logic.
I faced this problem recently and, as mentioned by Michael - sqlbot, putting CloudFront between Akamai and the S3 bucket could be a workaround, but then you're using a CDN behind another CDN. I strongly recommend configuring the redirects, and customizing the response on origin errors, directly in Akamai (using the REST API endpoint of your bucket). You'll need to create three rules. First, go to CDN > Properties and select your property, click Edit New Version based on the last one, and click Add Rule in the Property Configuration Settings section. The first rule is responsible for redirecting empty paths to index.html; create it just like the image below:
builtin.AK_PATH is an Akamai built-in variable. The next rule is responsible for redirecting paths other than static assets (html, ico, json, js, css, jpg, png, gif, etc.) to /index.html:
The last rule is responsible for customizing the error response when the origin returns an HTTP error code (much like CloudFront's Error Pages). When the origin returns a 404 or 403 HTTP status code, Akamai will call the Failover Hostname Edge Server (which is inside the Akamai network) with the /index.html path. This setup is triggered when refreshing pages in the browser and when the application has redirection links (which open new tabs, for example). In the Property Hostnames section, add a new hostname that will work as the Failover Hostname Edge Server; the name should have fewer than 16 characters, then add the -a.akamaihd.net suffix to it (that's the Akamai pattern). For example: failover-a.akamaihd.net.
Finally, create a new empty rule just like the image below (type the hostname you just created in the Alternate Hostname in This Property section):
Since you are already using Akamai as a CDN, you could simply use their NetStorage product line to achieve this in a simplified manner.
All you would need to do is move the content from S3 to Akamai, and it would take care of the rest (hosting, distribution, scaling, security, redundancy).
The origin settings in the Luna control panel could simply point to the NetStorage FTP location. This also removes the network latency otherwise present when accessing the S3 bucket from the Akamai network.

Redirection issue following migration from Node.js app to static site on Amazon S3

I plan to migrate my personal blog, which presently uses Node.js as a backend, to Amazon S3, considering the fact that the content is pretty much always static.
One problem I noticed is that there's no way to do redirection or anything of the sort on Amazon S3 (as far as I know).
Let's say I have this URL:
http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean
When I'll migrate it to Amazon, I'll have to create this folder hierarchy:
/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean/
and then add the file index.html in it, containing the data.
Considering this, my URL will then be changed from:
http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean
to
http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean/
There's no way to redirect that right now using Amazon S3.
Also, anyone requesting http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean/index.html will obtain a file, and this is annoying in terms of SEO.
Is there an available solution to prevent this behavior and preserve good SEO of my blog?
EDIT
And for people flagging this as not an appropriate question: I'm looking here to make proper permanent redirections on Amazon S3, to make sure that visitors looking for articles in the future will find them. Please note that here "visitors" includes both humans and robots.
It seems like we can create redirection rules this way (for a-propos to a-propos/):
<?xml version="1.0"?>
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>a-propos</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyWith>a-propos/</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
Considering I have a ton of URLs to redirect (83 in total), it seems impossible to do, because there's a limit on the number of routing rules:
83 routing rules provided, the number of routing rules in a website configuration is limited to 50.
Other than that, the only option I see is to add the x-amz-website-redirect-location header (object metadata) to a file with the same name as the prior URL.
For the example above, create an object named a-propos and set its x-amz-website-redirect-location metadata to /a-propos/. This should work, but it takes forever to do by hand.
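Since the per-object approach is tedious to do manually, it can be scripted. A rough boto3 sketch, assuming a hypothetical list of old paths and that zero-byte placeholder objects are acceptable:

import boto3

s3 = boto3.client('s3')
BUCKET = 'blogue.jpmonette.net'   # placeholder bucket name

# Old URLs (without trailing slash) that should redirect to the new
# trailing-slash versions; in practice this would be the full list of 83.
old_paths = [
    'a-propos',
    '2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean',
]

for key in old_paths:
    # Creates an empty object whose only purpose is to carry the
    # x-amz-website-redirect-location metadata honoured by the S3
    # website endpoint.
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=b'',
        WebsiteRedirectLocation='/' + key + '/',
    )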
Syntax can be found here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/HowDoIWebsiteConfiguration.html#configure-bucket-as-website-routing-rule-syntax
Rule generator can be found here:
http://quiet-cove-8872.herokuapp.com/

Using S3 error document with https

We have an S3 bucket with website hosting enabled, and an error document set. We want to use it to serve images over https.
Over HTTP, the 404 works fine. But for HTTPS, we need to use a different URL scheme, and the 404 no longer works. (That URL scheme also fails with HTTP.)
Is there some way to do this? Have I misconfigured the S3 bucket, or something along those lines? (I've given 'list' permission to everyone, which turned the failure from a 403 to a 404, but not the 404 I want.)
We solved this by setting up CloudFront as an interface to the S3 bucket.
One tricky bit: the origin for the CloudFront distribution needs to have its origin protocol policy set to HTTP Only. That means it can't directly be an S3 bucket origin, which always has the 'match viewer' policy. Instead, you can set the origin to the URL of the S3 bucket's website endpoint: instead of BUCKET.s3.amazonaws.com, use the endpoint given by S3's static website hosting, BUCKET.s3-website-REGION.amazonaws.com.
This might have unintended side effects, and there might be better ways to do it, but it works.
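For reference, the relevant part of such an origin, as it would appear in a CloudFront distribution config submitted via boto3, is sketched below; the bucket name and region are placeholders.

# Sketch of a CloudFront origin entry pointing at the S3 *website*
# endpoint (not the REST endpoint), which only speaks plain HTTP.
origin = {
    'Id': 's3-website-origin',
    'DomainName': 'BUCKET.s3-website-us-east-1.amazonaws.com',  # placeholder
    'OriginPath': '',
    'CustomHeaders': {'Quantity': 0},
    'CustomOriginConfig': {
        'HTTPPort': 80,
        'HTTPSPort': 443,
        'OriginProtocolPolicy': 'http-only',  # website endpoints have no HTTPS
        'OriginSslProtocols': {'Quantity': 1, 'Items': ['TLSv1.2']},
    },
}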

How can I hide a custom origin server from the public when using AWS CloudFront?

I am not sure if this exactly qualifies for Stack Overflow, but since I need to do this programmatically, and I figure lots of people on SO use CloudFront, I think it does... so here goes:
I want to hide public access to my custom origin server.
CloudFront pulls from the custom origin; however, I cannot find documentation or any sort of example on preventing direct requests from users to my origin when it is proxied behind CloudFront, unless my origin is S3... which isn't the case with a custom origin.
What technique can I use to identify/authenticate that a request is being proxied through CloudFront instead of being directly requested by the client?
The CloudFront documentation only covers this case when used with an S3 origin. The AWS forum post that lists CloudFront's IP addresses has a disclaimer that the list is not guaranteed to be current and should not be relied upon. See https://forums.aws.amazon.com/ann.jspa?annID=910
I assume that anyone using CloudFront has some sort of way to hide their custom origin from direct requests / crawlers. I would appreciate any sort of tip to get me started. Thanks.
I would suggest using something similar to Facebook's robots.txt in order to prevent crawlers from accessing sensitive content on your website.
https://www.facebook.com/robots.txt (you may have to tweak it a bit)
After that, just point your app (e.g. Rails) to be the custom origin server.
Now rewrite all the URLs on your site to become absolute URLs like:
https://d2d3cu3tt4cei5.cloudfront.net/hello.html
Basically, all URLs should point to your CloudFront distribution. Now if someone requests a file from https://d2d3cu3tt4cei5.cloudfront.net/hello.html and CloudFront does not have hello.html cached, it can fetch it from your server (over an encrypted channel like HTTPS) and then serve it to the user.
So even if the user does a view source, they do not learn your origin server... they only see your CloudFront distribution.
more details on setting this up here:
http://blog.codeship.io/2012/05/18/Assets-Sprites-CDN.html
Create a custom CNAME that only CloudFront uses. On your own servers, block any request for static assets not coming from that CNAME.
For instance, if your site is http://abc.mydomain.net then set up a CNAME for http://xyz.mydomain.net that points to the exact same place, and put that new domain in CloudFront as the origin pull server. Then, on incoming requests, you can tell whether they came through CloudFront or not and do whatever you want.
The downside is that this is security through obscurity. The client never sees the requests for http://xyz.mydomain.net, but that doesn't mean they won't have some way of figuring it out.
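A minimal sketch of the origin-side check, assuming a Python/Flask application, Flask's default /static/ path, and the hypothetical xyz.mydomain.net hostname reserved for CloudFront:

from flask import Flask, request, abort

app = Flask(__name__)

# Hostname configured as the CloudFront origin; requests arriving under any
# other Host header are assumed not to have come through CloudFront.
CLOUDFRONT_ONLY_HOST = 'xyz.mydomain.net'

@app.before_request
def reject_direct_requests():
    # Only guard static assets, per the approach described above.
    if request.path.startswith('/static/') and request.host != CLOUDFRONT_ONLY_HOST:
        abort(403)   # or redirect to the CloudFront URL instead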
[I know this thread is old, but I'm answering it for people like me who see it months later.]
From what I've read and seen, CloudFront does not consistently identify itself in requests. But you can get around this problem by overriding robots.txt at the CloudFront distribution.
1) Create a new S3 bucket that only contains one file: robots.txt. That will be the robots.txt for your CloudFront domain.
2) Go to your distribution settings in the AWS Console and click Create Origin. Add the bucket.
3) Go to Behaviors and click Create Behavior:
Path Pattern: robots.txt
Origin: (your new bucket)
4) Set the robots.txt behavior at a higher precedence (lower number).
5) Go to invalidations and invalidate /robots.txt.
Now abc123.cloudfront.net/robots.txt will be served from the bucket and everything else will be served from your domain. You can choose to allow/disallow crawling at either level independently.
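Step 5 can also be done from code; a small boto3 sketch, with the distribution ID as a placeholder:

import time
import boto3

cloudfront = boto3.client('cloudfront')

cloudfront.create_invalidation(
    DistributionId='EXXXXXXXXXXXXX',   # placeholder
    InvalidationBatch={
        'Paths': {'Quantity': 1, 'Items': ['/robots.txt']},
        # CallerReference just needs to be unique per invalidation request.
        'CallerReference': str(time.time()),
    },
)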
Another domain/subdomain will also work in place of a bucket, but why go to the trouble.

.htaccess, YSlow, and "Use cookie-free domains"

One of YSlow's measurables is to use cookie-free domains to serve static files.
"When the browser requests a static
image and sends cookies with the
request, the server ignores the
cookies. These cookies are unnecessary
network traffic. To workaround this
problem, make sure that static
components are requested with
cookie-free requests by creating a
subdomain and hosting them there." --
Yahoo YSlow
I interpret this to mean that I could experience performance gains if I move www.example.com/images to static.example.com/images.
Although this is easy to do, I would lose the handy ability within my content management system (Joomla/WordPress) to easily reference and link to these images.
Is it possible to use .htaccess to redirect all requests for a particular folder on www.example.com to a folder on static.example.com instead? Would this method also fool the CMS into thinking the images were located in the default locations on its own domain?
Is it possible to use .htaccess to redirect all requests for a particular folder on www.example.com to a folder on static.example.com instead?
Possible, but counterproductive -- the client would have to make an HTTP request, get the redirect response, then make another HTTP request.
This costs a lot more than the single line of cookie data saved!
Would this method also fool the CMS into thinking the images were located in the default locations on its own domain?
No.
Although this is easy to do, I would lose the handy ability within my content management system (Joomla/WordPress) to easily reference and link to these images.
What you could try to do is create a plugin in Joomla that dynamically creates these references.
For example, you have a plugin such that when you enter {dinamic_path path} in an article, it prepends 'static.example.com/images' to the path provided. So, every time you need to change the server path, you just change it in the plugin. For the links that are already in the database, you can try to use phpMyAdmin to change them to this structure.
It still loses the WYSIWYG ability in TinyMCE, but it is an alternative.
In theory you could create a virtual domain that points directly to the images folder, such as images.example.com. Then in your CMS (hopefully at the theme layer) you could replace any paths that point to the images folder with an absolute path to the subdomain.
The redirects would cause far more network traffic, and far more latency, than simply leaving things as they are.
It would redirect the request, but the client would still be sending its cookies to the server, so really you accomplish nothing. You would have to access the files directly from a domain that isn't storing cookies for it to work.
What you really want to do is use staticexample.com/images instead of static.example.com/images, so that you don't pick up any cookies that may have been set on the example.com domain. If all you do is serve images from that domain with a simple Apache server or something, then you can configure that server not to return even a session cookie.
The redirects are a very bad idea. Cookies cause some performance hit, but round trips to the server, such as a redirect would cause, are a much more serious performance issue.
I did the below and it worked:
<FilesMatch "^(?!.*\.(gif|jpe?g|png)$)">
    # Apply the cookie domain only to non-image requests; plain FilesMatch
    # has no "!" negation operator, so a negative lookahead is used instead.
    php_value session.cookie_domain example.com
</FilesMatch>
What this means is that the cookie settings are not applied to image requests, so the images are served cookie-free.