Map multiple subdomains to the same S3 bucket

Is there some way to map multiple (thousands) of subdomains to one s3-bucket?
If so is it also possible to map it to a specific path in the bucket for each subdomain?
I want test1.example.com to map to mybucket/test1 and test2.example.com to map to mybucket/test2.
I know the last part isn't possible with normal DNS records, but maybe there is some nifty Route 53 feature?

It's not possible with S3 directly; you can only use one subdomain with an S3 bucket.
However, you can map multiple subdomains to a CloudFront distribution.
Update (thanks to @SimonHutchison's comment below)
You can now map up to 100 alternate domains to a CloudFront
distribution - see http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_cloudfront
You can also use a wildcard to map any subdomain to your distribution:
Using the * Wildcard in Alternate Domain Names
When you add alternate domain names, you can use the * wildcard at the
beginning of a domain name instead of specifying subdomains
individually. For example, with an alternate domain name of
*.example.com, you can use any domain name that ends with example.com in your object URLs, such as www.example.com,
product-name.example.com, and marketing.product-name.example.com. The
name of an object is the same regardless of the domain name, for
example:
www.example.com/images/image.jpg
product-name.example.com/images/image.jpg
marketing.product-name.example.com/images/image.jpg
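For reference, adding such a wildcard alternate domain name to an existing distribution can also be scripted. Below is a minimal sketch using boto3 (Python); the distribution ID is a placeholder, and note that CloudFront will normally also require an ACM certificate (ViewerCertificate) covering the wildcard before it accepts the alias.

import boto3

cloudfront = boto3.client("cloudfront")

def add_wildcard_alias(distribution_id, wildcard_domain):
    """Add a wildcard alternate domain name (CNAME) to an existing distribution."""
    # Fetch the current config together with its ETag, which update_distribution needs.
    resp = cloudfront.get_distribution_config(Id=distribution_id)
    config, etag = resp["DistributionConfig"], resp["ETag"]

    items = config["Aliases"].get("Items", [])
    if wildcard_domain not in items:
        items.append(wildcard_domain)
    config["Aliases"] = {"Quantity": len(items), "Items": items}

    # Note: CloudFront also expects a ViewerCertificate that covers the alias.
    cloudfront.update_distribution(
        Id=distribution_id, DistributionConfig=config, IfMatch=etag
    )

add_wildcard_alias("E1ABCDEXAMPLE", "*.example.com")  # placeholder distribution ID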

In October 2012, Amazon introduced a feature to handle redirects (HTTP 301) for S3 buckets. You can read the release notes here and refer to this link for configuration via the Console / API.
From the AWS S3 docs:
Redirect all requests
If your root domain is example.com and you want to serve requests for both http://example.com and http://www.example.com, you can create two buckets named example.com and www.example.com, maintain website content in only one bucket, say, example.com, and configure the other bucket to redirect all requests to the example.com bucket.
Advanced conditional redirects
You can conditionally route requests according to specific object key names or prefixes in the request, or according to the response code. For example, suppose that you delete or rename an object in your bucket. You can add a routing rule that redirects the request to another object. Suppose that you want to make a folder unavailable. You can add a routing rule to redirect the request to another page, which explains why the folder is no longer available. You can also add a routing rule to handle an error condition by routing requests that return the error to another domain, where the error will be processed.
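As a sketch of what these two configurations look like via the API, the snippet below uses boto3's put_bucket_website; the bucket names, prefix, and target key are hypothetical. Note that RedirectAllRequestsTo cannot be combined with an index document or routing rules, so the two configurations go on separate buckets.

import boto3

s3 = boto3.client("s3")

# "Redirect all requests": the www bucket forwards everything to the root-domain bucket.
s3.put_bucket_website(
    Bucket="www.example.com",
    WebsiteConfiguration={
        "RedirectAllRequestsTo": {"HostName": "example.com", "Protocol": "http"}
    },
)

# "Advanced conditional redirects": requests under a removed folder go to an explanation page.
s3.put_bucket_website(
    Bucket="example.com",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "RoutingRules": [
            {
                "Condition": {"KeyPrefixEquals": "old-folder/"},
                "Redirect": {"ReplaceKeyWith": "folder-gone.html"},
            }
        ],
    },
)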

Redirect using WAF vs CDN

Our application has 2 domains.
http://www.example.org
and
https://secure.example.org
We are planning to decommission https://secure.example.org and have just one secure domain name: https://www.example.org
But we want to make sure any old URL still works and gets redirect to the new URL.
http://www.example.org/my-url should redirect you to https://www.example.org/my-url
https://secure.example.org/my-url should redirect you to https://www.example.org/my-url.
The question is: should the redirect be done at the CDN or the WAF? We could also do it at the Apache web server, but we would like to avoid extra hops. What is the best approach, and what are the pros and cons of each?
AWS CloudFront does not support redirects, but this can be achieved using Lambda or S3. But is there any concern if we use WAF for redirects?
I'm not sure why you need a CDN for this, and I'm fairly certain this is not a feature of AWS WAF. If your domain names are managed inside AWS (Route 53), you can simply create an alias record that points the old record at the new one.
If your domain names are managed outside of AWS, try migrating them to Route 53. If you were going to use CloudFront (the AWS CDN) to do this, you could put it in front of your old URL, but it would still require that you place an alias on the CDN. With CloudFront you can configure HTTP-to-HTTPS redirects, if that is your interest in using the CDN.
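If you do go the Route 53 route, creating such an alias record can be scripted. A minimal boto3 sketch, with a hypothetical hosted zone ID, and with the caveat that an alias only points DNS at the other name; it does not itself issue an HTTP redirect:

import boto3

route53 = boto3.client("route53")

ZONE_ID = "Z1EXAMPLEZONEID"  # placeholder hosted zone ID; both records live in this zone

route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={
        "Comment": "Point secure.example.org at www.example.org",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "secure.example.org",
                "Type": "A",
                "AliasTarget": {
                    # For an alias to another record in the same hosted zone,
                    # this is the zone's own ID.
                    "HostedZoneId": ZONE_ID,
                    "DNSName": "www.example.org",
                    "EvaluateTargetHealth": False,
                },
            },
        }],
    },
)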

hosting multiple sites on S3 bucket serving index.html from directory path

I'm new to using AWS S3. I want to know if it's possible to host multiple static websites in one bucket using the website routing metadata option. I am planning to have multiple folders, each with its own index.html, but how can I configure the bucket settings to route to each individual site when a user types the address?
For example by typing
http://<bucket-name>.s3-website-<AWS-region>.amazonaws.com/folder1
will take them to website 1
and
http://<bucket-name>.s3-website-<AWS-region>.amazonaws.com/folder2
will take them to website 2
If this is possible, is there any way to also achieve the configuration using the AWS CLI?
This is possible with a slight modification to the URL. You need to use the URLs as follows, with a trailing slash, to serve the index.html document inside folder1 and folder2.
http://<bucket-name>.s3-website-<AWS-region>.amazonaws.com/folder1/
http://<bucket-name>.s3-website-<AWS-region>.amazonaws.com/folder2/
If you create such a folder structure in your bucket, you must have an
index document at each level. When a user specifies a URL that
resembles a folder lookup, the presence or absence of a trailing slash
determines the behavior of the website. For example, the following
URL, with a trailing slash, returns the photos/index.html index
document.
Reference: Index Document Support
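If you want to script this instead of using the console, the website configuration (and the per-folder index documents) can be set up with the SDK; the AWS CLI counterpart is aws s3api put-bucket-website. A minimal boto3 sketch with a hypothetical bucket name and local file layout:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-multi-site-bucket"  # hypothetical bucket name

# The index-document suffix applies at every "folder" level, so /folder1/ serves
# folder1/index.html and /folder2/ serves folder2/index.html.
s3.put_bucket_website(
    Bucket=BUCKET,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Upload one index document per site.
for folder in ("folder1", "folder2"):
    s3.upload_file(
        Filename=f"{folder}/index.html",
        Bucket=BUCKET,
        Key=f"{folder}/index.html",
        ExtraArgs={"ContentType": "text/html"},
    )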

Wildcard subdomains point to appropriate S3/CloudFront subdirectories

I need multiple subdomains to point to individual buckets/subdirectories on Amazon S3 (synced to a CloudFront distribution), where I'm hosting some static files.
So that ANY
SUBDOMAINNAME.example.com
automatically points to
s3.amazonaws.com/somebucket/SUBDOMAINNAME
or
somedistributionname.cloudfront.net/SUBDOMAINNAME
Is there a way to accomplish this without running a server for redirection?
Can it be done without changing DNS records for each new subdomain or, if not, adding the DNS rules programmatically?
What is the most efficient way of doing it, in terms of resource usage? (There might be hundreds of subdomains with hundreds of daily requests for each.)
update: this answer was correct when written, and the techniques described below are still perfectly viable but potentially less desirable, since Lambda@Edge can now be used to accomplish this objective, as I explained in my answer to Serving a multitude of static sites from a wildcard domain in AWS.
No, there is no way to do this automatically.
Is there a way to accomplish this without running a server for redirection?
Technically, it isn't redirection that you'd need, to accomplish this. You'd need path rewriting, and that's why the answer to your ultimate question is "no" -- because Route 53 (and DNS in general) can't do anything related to paths.
Route 53 does support wildcard DNS, but that's of limited help without CloudFront and/or S3 supporting a mechanism to put the host header from the HTTP request into the path (which they don't).
Now, this could easily be accomplished in a "zero-touch" mode with a single Route 53 * wildcard entry, a single CloudFront distribution configured for *.example.com, and one or more EC2 instances running HAProxy to do the request path rewriting and proxy the request onward to the S3 bucket. A single line in a basic configuration file would accomplish that request rewrite:
http-request set-path /%[req.hdr(host)]%[path]
Then you'd need the proxy to send the actual bucket endpoint hostname to S3, instead of the hostname supplied by the browser:
http-request set-header Host example-bucket.s3.amazonaws.com
The proxy would send the modified request to S3, return S3's response to CloudFront, which would return the response to the browser.
However, if you don't want to take this approach, since a server would be required, then the alternative solution looks like this:
Configure a CloudFront distribution for each subdomain, setting the alternate domain name for the distribution to match the specific subdomain.
Configure the Origin for each subdomain's distribution to point to the same bucket, setting the origin path to /one-specific-subdomain.example.com. CloudFront will change a request for GET /images/funny-cat.jpg HTTP/1.1 to GET /one-specific-subdomain.example.com/images/funny-cat.jpg HTTP/1.1 before sending the request to S3, resulting in the behavior you described. (This is the same net result as the behavior I described for HAProxy, but it is static, not dynamic, hence one distribution per subdomain; in neither case would this be a "redirect" -- so the address bar would not change).
Configure an A-record Alias in Route 53 for each subdomain, pointing to the subdomain's specific CloudFront distribution.
This can all be done programmatically through the APIs, using any one of the SDKs, or using aws-cli, which is a very simple way to test, prototype, and script such things without writing much code. CloudFront and Route 53 are both fully automation-friendly.
Note that there is no significant disadvantage to each site using its own CloudFront distribution, because your hit ratio will be no different, and distributions do not have a separate charge -- only request and bandwidth charges.
Note also that CloudFront has a default limit of 200 distributions per AWS account but this is a soft limit that can be increased by sending a request to AWS support.
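To illustrate the per-subdomain distribution approach from code, here is a rough boto3 sketch that creates one distribution whose origin path is the subdomain's folder. The bucket website endpoint and subdomain are placeholders, the config is trimmed to the essentials, and current CloudFront will generally also want a ViewerCertificate covering the alias.

import time
import boto3

cloudfront = boto3.client("cloudfront")

def create_subdomain_distribution(subdomain, bucket_website_endpoint):
    """Create one distribution per subdomain, with the origin path set to /<subdomain>."""
    config = {
        "CallerReference": f"{subdomain}-{int(time.time())}",
        "Comment": f"Distribution for {subdomain}",
        "Enabled": True,
        # Current CloudFront also expects a ViewerCertificate covering this alias.
        "Aliases": {"Quantity": 1, "Items": [subdomain]},
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "s3-website-origin",
                "DomainName": bucket_website_endpoint,
                "OriginPath": f"/{subdomain}",
                # S3 website endpoints only speak HTTP.
                "CustomOriginConfig": {
                    "HTTPPort": 80,
                    "HTTPSPort": 443,
                    "OriginProtocolPolicy": "http-only",
                },
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-website-origin",
            "ViewerProtocolPolicy": "allow-all",
            "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
            "MinTTL": 0,
        },
    }
    return cloudfront.create_distribution(DistributionConfig=config)

create_subdomain_distribution(
    "one-specific-subdomain.example.com",
    "example-bucket.s3-website-us-east-1.amazonaws.com",  # placeholder website endpoint
)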
Since Lambda@Edge, this can be done with a Lambda function triggered by the CloudFront "Viewer Request" event.
Here is an example of such a Lambda function, where a request like foo.example.com/index.html will return the file /foo/index.html from your origin.
You will need a CloudFront distribution with the CNAME *.example.com, and an A record "*.example.com" pointing to it.
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const subdomain = getSubdomain(request);
    if (subdomain) {
        // Prefix the URI with the subdomain so /index.html becomes /foo/index.html.
        request.uri = '/' + subdomain + request.uri;
    }
    callback(null, request);
};

function getSubdomain(request) {
    // CloudFront exposes each header as an array of { key, value } pairs.
    const hostItem = request.headers.host.find(item => item.key === 'Host');
    // Capture everything before the last two labels (e.g. "foo" in "foo.example.com").
    const reg = /(?:(.*?)\.)[^.]*\.[^.]*$/;
    const [_, subdomain] = hostItem.value.match(reg) || [];
    return subdomain;
}
As for the costs, take a look at Lambda pricing. At current pricing it works out to about $0.913 per million requests.
A wildcard works on S3. I just added a wildcard (*) A record that points to an IP and it worked.

Why must the Amazon S3 bucket name be the same as website name when hosting a static website

I want to host a static website on S3, e.g. example.com. This requires a bucket with the same name, example.com.
Then I found that example.com had already been taken by someone else.
So that's my question: why must the bucket name be the same as the website name? Since Route 53 can map the website to the bucket endpoint, this limitation seems unnecessary.
Is there any reason for this?
The brief answer is, "that's how Amazon designed it."
If the bucket name weren't the same as the domain name, how would S3 know which bucket to use to serve requests for a given domain?
You can't say "Route 53," because S3 was created before Route 53, and web site hosting in S3 works the same even if you aren't using Route 53 for DNS.
Similarly, it can't be a configuration option on the bucket, because that would just create a new series of problems -- if the previous owner of a domain still had their bucket configured with your domain, you'd have exactly the same problem as you do, now.
You can still host your site on S3, but with a mismatched bucket name, you need either a reverse proxy server in EC2 in the same region, to rewrite the host header in each request to match the bucket name, or, you can use CloudFront to accomplish a similar purpose, because the bucket name, then, does not need to match -- CloudFront will rewrite the Host header also.
There's a pretty simple reason for this: by the time Amazon gets the request from your browser, the main information available is the domain in the URL, which isn't enough to figure it out.
Suppose your site is example.com, but that bucket name is taken, so you make the bucket my-example. Then you'll have a URL something like http://my-example.s3-website.us-east-1.amazonaws.com/. That will work just fine in your browser, because it gets resolved to some AWS web server, which looks at the Host HTTP header, pulls out your bucket name, and grabs your bucket content.
Now suppose you add something to Route53 to make example.com work. You can either add A records, which let your browser turn example.com directly into an IP address for some AWS S3 webserver. Or you can put in a CNAME, which points from example.com to the full my-example hostname. Either way, your browser's going to look up an IP address, contact an Amazon webserver, and send a Host header that just says example.com. So if that isn't the bucket name, it doesn't know what to do.
Admittedly, it could go an extra step. After all, you told it the hostname when you set up the bucket for serving websites. So at first thought, it seems like it would be nice if it used that as well. However, that won't really solve your problem either, because whoever set up the example.com bucket could well have set it up for hosting.
It seems like the best way to work around this is CloudFront, which can associate domain names with arbitrary buckets.
I think this is just the way AWS has designed it, and that's it: check this.
I have done this for my company's website and it works great!
Create an S3 bucket and configure it to host a website
Amazon S3 lets you store and retrieve your data from anywhere on the
internet. To organize your data, you create buckets and upload your
data to the buckets by using the AWS Management Console. You can use
S3 to host a static website in a bucket. The following procedure
explains how to create a bucket and configure it for website hosting.
To create an S3 bucket and configure it to host a website
Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
Choose Create bucket.
Enter the following values:
Bucket name - Enter the name of your domain, such as example.com.
Region - Choose the region closest to most of your users.
Make note of the region that you choose; you'll need this information later in the process.
Choose Next.
On the Configure options page, choose Next to accept the default values.
On the Set permissions page, uncheck the Block all public access check box, and choose Next.
Note
The console displays a message about public access to the bucket. Later in this procedure, you add a bucket policy that limits access to the bucket.
On the Review page, choose Create bucket.
On the list of S3 buckets, choose the name of the bucket that you just created.
Choose the Properties tab.
Choose Static website hosting.
Choose Use this bucket to host a website.
For Index document, enter the name of the file that contains the main page for your website.
Note
You'll create an HTML file and upload it to your bucket later in the process.
Choose Save.
Choose the Permissions tab.
Choose Bucket policy.
Copy the following bucket policy and paste it into a text editor. This policy grants everyone on the internet ("Principal":"*") permission to get the files ("Action":["s3:GetObject"]) in the S3 bucket that is associated with your domain name ("arn:aws:s3:::your-domain-name/*"):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AddPerm",
    "Effect": "Allow",
    "Principal": "*",
    "Action": [
      "s3:GetObject"
    ],
    "Resource": [
      "arn:aws:s3:::your-domain-name/*"
    ]
  }]
}
In the bucket policy, replace the value your-domain-name with the name of your domain, such as example.com. This value must match the name of the bucket.
Choose Save.
Create another S3 Bucket, for www.your-domain-name
In the preceding procedure, you created a bucket for your domain name, such as example.com. This allows your users to access your website by using your domain name, such as example.com.
If you also want your users to be able to use www.your-domain-name, such as www.example.com, to access your sample website, you create a second S3 bucket. You then configure the second bucket to route traffic to the first bucket.
Note
Websites typically redirect your-domain-name to www.your-domain-name, for example, from example.com to www.example.com. Because of the way S3 works, you must set up the redirection in the opposite direction, from www.example.com to example.com.
To create an S3 bucket for www.your-domain-name
Choose Create bucket.
Enter the following values:
Bucket name - Enter www.your-domain-name. For example, if you registered the domain name example.com, enter www.example.com.
Region - Choose the same region that you created the first bucket in.
Choose Next.
On the Configure options page, choose Next to accept the default values.
On the Set permissions page, choose Next to accept the default values.
On the Review page, choose Create bucket.
In the list of S3 buckets, choose the name of the bucket that you just created.
Choose the Properties tab.
Choose Static website hosting.
Choose Redirect requests.
Enter the following values:
Target bucket or domain
Enter the name of the bucket that you want to redirect requests to. This is the name of the bucket that you created in the procedure To create an S3 bucket and configure it to host a website.
Protocol - Enter http. You're redirecting requests to an S3 bucket that is configured as a website endpoint, and Amazon S3 doesn't support HTTPS connections for website endpoints.
Choose Save.
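For completeness, the same bucket policy, website configuration, and www redirect can be applied from code rather than the console. A boto3 sketch, assuming both buckets already exist and that public access blocking has been disabled for the primary bucket as described above; the domain name is a placeholder:

import json
import boto3

s3 = boto3.client("s3")
DOMAIN = "example.com"  # replace with your registered domain name

# Same public-read policy as above, applied to the primary bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AddPerm",
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject"],
        "Resource": [f"arn:aws:s3:::{DOMAIN}/*"],
    }],
}
s3.put_bucket_policy(Bucket=DOMAIN, Policy=json.dumps(policy))

# The primary bucket hosts the site...
s3.put_bucket_website(
    Bucket=DOMAIN,
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)

# ...and the www bucket redirects every request to it over HTTP.
s3.put_bucket_website(
    Bucket=f"www.{DOMAIN}",
    WebsiteConfiguration={
        "RedirectAllRequestsTo": {"HostName": DOMAIN, "Protocol": "http"}
    },
)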

How can I hide a custom origin server from the public when using AWS CloudFront?

I am not sure if this exactly qualifies for StackOverflow, but since I need to do this programmatically, and I figure lots of people on SO use CloudFront, I think it does... so here goes:
I want to hide public access to my custom origin server.
CloudFront pulls from the custom origin, however I cannot find documentation or any sort of example on preventing direct requests from users to my origin when proxied behind CloudFront unless my origin is S3... which isn't the case with a custom origin.
What technique can I use to identify/authenticate that a request is being proxied through CloudFront instead of being directly requested by the client?
The CloudFront documentation only covers this case when used with an S3 origin. The AWS forum post that lists CloudFront's IP addresses has a disclaimer that the list is not guaranteed to be current and should not be relied upon. See https://forums.aws.amazon.com/ann.jspa?annID=910
I assume that anyone using CloudFront has some sort of way to hide their custom origin from direct requests / crawlers. I would appreciate any sort of tip to get me started. Thanks.
I would suggest using something similar to Facebook's robots.txt in order to prevent all crawlers from accessing sensitive content on your website.
https://www.facebook.com/robots.txt (you may have to tweak it a bit)
After that, just point your app.. (eg. Rails) to be the custom origin server.
Now rewrite all the URLs on your site to become absolute URLs like:
https://d2d3cu3tt4cei5.cloudfront.net/hello.html
Basically, all URLs should point to your CloudFront distribution. Now if someone requests a file from https://d2d3cu3tt4cei5.cloudfront.net/hello.html and CloudFront does not have hello.html cached, it can fetch it from your server (over an encrypted channel like HTTPS) and then serve it to the user.
So even if the user does a view-source, they do not see your origin server, only your CloudFront distribution.
More details on setting this up here:
http://blog.codeship.io/2012/05/18/Assets-Sprites-CDN.html
Create a custom CNAME that only CloudFront uses. On your own servers, block any request for static assets not coming from that CNAME.
For instance, if your site is http://abc.mydomain.net, then set up a CNAME, http://xyz.mydomain.net, that points to the exact same place, and put that new domain in CloudFront as the origin pull server. Then, on each request, you can tell whether it came from CloudFront or not and do whatever you want.
The downside is that this is security through obscurity. The client never sees the requests for http://xyz.mydomain.net, but that doesn't mean they won't have some way of figuring it out.
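As an illustration of the "block anything that didn't come through the CloudFront-only CNAME" idea, here is a minimal origin-side sketch using Flask (the hostnames are the hypothetical ones from above); any web server or framework that can read the Host header would work the same way:

from flask import Flask, abort, request

app = Flask(__name__)

# Only CloudFront is configured with this origin hostname; regular visitors
# use abc.mydomain.net, so their requests carry a different Host header.
CLOUDFRONT_ONLY_HOST = "xyz.mydomain.net"

@app.before_request
def require_cloudfront_host():
    # Drop the port, then reject anything that did not arrive via the CloudFront-only CNAME.
    if request.host.split(":")[0] != CLOUDFRONT_ONLY_HOST:
        abort(403)

@app.route("/static/<path:name>")
def static_asset(name):
    return app.send_static_file(name)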
[I know this thread is old, but I'm answering it for people like me who see it months later.]
From what I've read and seen, CloudFront does not consistently identify itself in requests. But you can get around this problem by overriding robots.txt at the CloudFront distribution.
1) Create a new S3 bucket that only contains one file: robots.txt. That will be the robots.txt for your CloudFront domain.
2) Go to your distribution settings in the AWS Console and click Create Origin. Add the bucket.
3) Go to Behaviors and click Create Behavior:
Path Pattern: robots.txt
Origin: (your new bucket)
4) Set the robots.txt behavior at a higher precedence (lower number).
5) Go to invalidations and invalidate /robots.txt.
Now abc123.cloudfront.net/robots.txt will be served from the bucket and everything else will be served from your domain. You can choose to allow/disallow crawling at either level independently.
Another domain/subdomain will also work in place of a bucket, but why go to the trouble?
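If you script step 5, the invalidation call looks roughly like this with boto3 (the distribution ID is a placeholder):

import time
import boto3

cloudfront = boto3.client("cloudfront")

# Step 5 above: invalidate the cached copy so the new robots.txt behavior takes effect.
cloudfront.create_invalidation(
    DistributionId="E1ABCDEXAMPLE",  # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/robots.txt"]},
        "CallerReference": str(int(time.time())),
    },
)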