Modifying CDN Migration Paths - amazon-s3

I'm trying to migrate a CDN from Rackspace to AWS.
In the former, everything is mapped to individual containers via CNAME records like so:
container1 = CNAME = container1.cdndomain
container2 = CNAME = container2.cdndomain
container3 = CNAME = container3.cdndomain
When we set up AWS, everything I read said to set up a single CloudFront distribution with various buckets. So that's what I did.
Now I'm trying to somehow remap all of those containers to their new AWS home and corresponding buckets, but the single CloudFront distribution is making it hard for me.
I'd rather not go through thousands of lines of code and config files and change all the current URLs (e.g. manually change container1.cdndomain to cdndomain/container1).
But I can't find a way to remap
this: http://bgimgs.cdndomain/image
to its AWS counterpart
here: http://cdndomain/bucket/image
We use Zerigo for DNS and the interface will accept this CNAME path:
container.domain = CNAME = cdndomain/bucket
but aws doesn't route that to the correct bucket.
I've tried an .htaccess solution
RewriteEngine On
RewriteCond %{HTTP_HOST} ^container1\.cdndomain(.*)$ [NC]
RewriteRule ^(.*)$ http://cdndomain/container1/$1 [L,R=301]
But that's not working either.
Any ideas?

I don't know where the advice came from to only use one CloudFront distribution with multiple S3 buckets. Sure, you can, but if it doesn't match your needs, there is no reason why you should.
Just create multiple distributions.
If you are already accessing them with different hostnames, then they're different collections of objects, and I can't actually think of any reason such advice would be applicable in your case. The only reason that comes to mind for using just one CloudFront distribution with multiple buckets would be to serve all the content (from the different buckets) behind one hostname, or behind multiple hostnames with the same paths used at each hostname (a hack to get browsers to load assets faster by convincing the browser that the assets are coming from different hosts, so it will open more parallel connections).
In your case, this doesn't seem like what you need at all. Distributions, themselves, are free. Problem solved.
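Purely as an illustration (the cloudfront.net hostnames below are made up, and each distribution would need the matching hostname added as an alternate domain name), the DNS for the new setup could then mirror the old one:
container1.cdndomain = CNAME = d111111abcdef8.cloudfront.net
container2.cdndomain = CNAME = d222222abcdef8.cloudfront.net
container3.cdndomain = CNAME = d333333abcdef8.cloudfront.net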
Regarding DNS, the other vendor may accept that configuration, but that doesn't mean it's a valid configuration. It isn't. A CNAME record can only map one hostname to another hostname; it can't carry a path, so you can't change paths with DNS.
Regarding .htaccess, S3 doesn't process .htaccess files. If you were creating them on your web server, that would be a no-go, too, since the web server would not see the requests in order to redirect them.

Related

Does Amazon S3 support symlinks?

I have an object which I would like to address using different keys without actually copying the object itself, like a symlink in Linux. Does Amazon S3 provide such a thing?
S3 does not support the notion of a symlink, where one object key is treated as an alias for a different object key. (You've probably heard this before: S3 is not a filesystem. It's an object store).
If you are using the static web site hosting feature, there is a partial emulation of this capability, with object-level redirects:
http://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
This causes requests for "object-a" to be greeted with a 301 Moved Permanently response, with the URL for "object-b" in the Location: header, which serves a similar purpose, but is of course still quite different. It only works if the request arrives at the website endpoint (not the REST endpoint).
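As a rough sketch (bucket and key names are invented), the redirect can be set as metadata on a placeholder object, e.g. with the AWS CLI:
# "object-a" becomes an empty placeholder whose only job is to 301 to "object-b"
aws s3api put-object --bucket my-bucket --key object-a --website-redirect-location /object-b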
If you use a reverse proxy (haproxy, nginx, etc.) in EC2 to handle incoming requests and forward them to the bucket, then of course you have the option at the proxy layer of rewriting the request URL before forwarding to S3, so you could translate the incoming request path to whatever you needed to present to S3. How practical this is depends on your application and motivation, but this is one of the strategies I use to modify where, in a particular bucket, an object appears, compared to where it is actually stored, allowing me to rewrite paths based on other attributes in the request.
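For what it's worth, a minimal nginx sketch of that idea, assuming the bucket is reachable through its S3 website endpoint (every name here is hypothetical):
server {
    listen 80;
    server_name assets.example.com;
    location / {
        # translate the public path into the path actually stored in the bucket
        rewrite ^/old-path/(.*)$ /new-path/$1 break;
        proxy_pass http://my-bucket.s3-website-us-east-1.amazonaws.com;
        proxy_set_header Host my-bucket.s3-website-us-east-1.amazonaws.com;
    }
}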
I had a similar question and needed a solution, which I describe below. While S3 does not support symlinks, you can approximate one with the following:
echo "https://s3.amazonaws.com/my.bucket.name/path/to/a/targetfile" > file
aws s3 cp file s3://my.bucket.name/file
wget $(curl https://s3.amazonaws.com/my.bucket.name/file)
What this is actually doing is fetching the contents of the file, which is really just a pointer to the target file, and then passing that URL to wget (curl could also be used for the final download instead of wget).
This is really just a workaround, though, as it's not a true symlink but rather a creative way to simulate one.
Symlinks no, but same object to multiple keys, maybe.
Please refer to Rodrigo's answer at Amazon S3 - Multiple keys to one object
If you're using S3's website serving, you can do it via the x-amz-website-redirect-location header.
If you're not using website serving, you can create your own custom metadata header (x-amz-meta-KeyAlias) and handle it manually.
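A rough sketch of that manual approach (bucket, keys, and the metadata name are only examples) could be:
# create an "alias" object whose user metadata names the real key
aws s3api put-object --bucket my.bucket.name --key alias-key --metadata KeyAlias=real/target/key
# your code reads the metadata (S3 returns user metadata names lowercased) and then fetches the real key
aws s3api head-object --bucket my.bucket.name --key alias-key --query Metadata.keyalias --output text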

multiple domains in server - howto

Let's suppose we have shopify.com, a platform where everybody can create an e-shop and serve it under their own domain; in other words, the user can add their domain.
When somebody adds a domain, what's actually happening under the hood?
As far as I know, in apache2 a new VirtualHost is created for each new domain, pointing to the user's folder. But is this the best and most efficient solution?
I'm asking mainly out of curiosity, and I'd also like to know how those systems work (like shopify.com or webs.com, where every user adds a domain).
Thank you in advance!!
You have a few options that I know of, mostly depending on whether traffic goes to the same IP or not.
When setting up DNS entries you can specify a wildcard for subdomains, *.example.com, which makes any request for a subdomain that isn't matched by a more specific DNS record resolve to wherever the wildcard points.
So, having:
*.example.com <ip A>
blog.example.com <ip B>
Would make blog.example.com go to <ip B> and all other subdomains go to <ip A> (note that the bare example.com isn't covered by the wildcard and needs its own record).
This means you could give each new subdomain its own IP (very unlikely to be what you want), or you could catch them all at the same IP and handle them there.
As you mentioned, you could add a new virtual host for each new subdomain created. However, that's kind of a heavy solution, and it would generally involve reloading your web server's configuration each time. Instead, you can use something like rewrites to achieve much the same effect as per-subdomain virtual hosts.
Having a rewrite rule that maps <subdomain>.example.com/<resource> to example.com/<subdomain>/<resource> would mean that all that's necessary is creating a new folder, containing the user's content, in the root of your served directories. No change to configuration. Also, in case you're not familiar with rewrites: they're invisible to the browser/user, so the user still sees <subdomain>.example.com/<resource> even though they're being served the content from example.com/<subdomain>/<resource>.
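A minimal mod_rewrite sketch of that idea might look like this (example.com, the www exception, and the /sites/ folder layout are all hypothetical, not a drop-in config):
RewriteEngine On
# leave the main site alone
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$ [NC]
# don't rewrite again once the request is already under /sites/
RewriteCond %{REQUEST_URI} !^/sites/
# capture the subdomain from the Host header (%1 below)
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$ [NC]
# serve the request from that user's folder, invisibly to the browser
RewriteRule ^/?(.*)$ /sites/%1/$1 [L]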
This isn't a definitive list of the possibilities, simply a couple possibilities. Any large or scalable solution is probably going to involve many layers of indirection allowing for more complex DNS directing, load balancing, and serving of content.

Strange domains in mod_pagespeed cache folder

About a year ago I installed mod_pagespeed on my VPS server, set it up and left it running. Recently, while exploring files on my server, I went to the pagespeed cache folder and discovered some strange folders.
The folders are usually named this way: ,2Fwww.mydomain.com, or ,2F111.111.111.111 for IP addresses. I was surprised to see some domains that do not belong to me, like:
24x7-allrequestsallowed.com
allrequestsallowed.com
m.odnoklassniki.ru
www.fbi.gov
www.securitylab.ru
It looks like something dodgy is going on. Was my server compromised, or is there a reasonable explanation?
That does look peculiar. Everything in the cache folder should be files that mod_pagespeed tried to rewrite. There are two ways that I know of that this can happen:
1) You reference some third-party resource (say, an image from another domain, or the Google Analytics script) and you have explicitly enabled rewriting of that domain with ModPagespeedDomain www.example.com or ModPagespeedDomain *.
2) If your server accepts HTTP requests with invalid Host headers. Try (for example) wget --header="Host: www.fbi.gov" www.yourdomain.com/foo/bar.html. If your server accepts requests like that it may be providing mod_pagespeed with an incorrect base domain, and then subresources would be fetched from the same domain (so if www.yourdomain.com/foo/bar.html references some.jpeg, and your server accepts invalid host headers, we could fetch www.fbi.gov/foo/some.jpeg as the resource). There was a recent security release that makes sure all of these subrequests are done against localhost (not arbitrary third-party websites). Please see: https://developers.google.com/speed/docs/mod_pagespeed/CVE-2012-4001
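For example, the check from point 2 could look like this (www.yourdomain.com stands in for your actual hostname, and the path is arbitrary):
# if this returns your page instead of an error, the server is accepting arbitrary Host headers
wget -S --header="Host: www.fbi.gov" http://www.yourdomain.com/foo/bar.html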
You might want to look through these folders and see what specific resources are in there. I think that the biggest concern you should have is that someone might be trying to perform an XSS attack on your users or maybe a DDoS attack against another website (like www.fbi.gov), using your server as one vector. I do not think that these folders are indicative that your server itself is compromised.
If you would like to discuss this more, https://groups.google.com/forum/?fromgroups#!forum/mod-pagespeed-discuss is a good list to join and email.

How do I rewrite URLs with Nginx admin / Apache / Wordpress

I have the following URL format:
www.example.com/members/admin/projects/?projectid=41
And I would like to rewrite them to the following format:
www.example.com/avits/projectname/
Project names do not have to be unique when a user creates them therefore I will be checking for an existing name and appending an integer to the end of the project name if a project of the same name already exists. e.g. example.project, example.project1, example.project2 etc.
I am happy setting up the GET request to query the database by project name; however, I am having huge problems setting up these pretty URLs.
I am using Apache with Nginx Admin installed, which means that all static content is served via Nginx without the overhead of Apache.
I am totally confused as to whether I should be employing an nginx rewrite rule in my nginx.conf file or standard rewrites in my .htaccess file.
To confuse matters further, although this is a rather large custom application, it is built on top of a WordPress backbone for easy blogging functionality, meaning that I also have the built-in WordPress rewrite module at my disposal.
I have tried all three methods with absolutely no success. I have read a lot on the matter but simply cannot seem to get anything to work. I am certain this is purely down to a complete lack of understanding with regard to URL rewriting, combined with the fact that I don't know which type of rewriting is applicable in my case, which means I am doing nothing more than going round in circles.
Can anyone clear up this matter for me and explain how to rewrite my URLs in the manner described above?
Many thanks.
If you are proxying all the non static file requests to Apache, do the rewrites there - you don't need to do anything on nginx as it will just pass the requests to the back end.
The problem with what you are proposing is that it's not actually a rewrite; a rewrite takes the first URL and just rearranges it or moves the user to another location.
What you need actually takes application logic to derive the project name from the project ID.
For example you can rewrite:
www.example.com/members/admin/projects/?projectid=41
To:
www.example.com/avits/41/
fairly easily, but mapping that /41/ to /projectname/ would have to happen in your app code, because a URL rewrite can't do that.
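For illustration only (assuming Apache rules in the site root, and using the script path from the question), the ID-based form of the pretty URL could be wired up like this, leaving the name-to-ID lookup to your code:
RewriteEngine On
# serve /avits/41/ internally from the existing script, invisibly to the visitor
RewriteRule ^avits/([0-9]+)/?$ /members/admin/projects/?projectid=$1 [L,QSA]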

Prevent subdomains with AWS S3

I am considering moving one of my very static sites to use Amazon's Simple Storage Service. I have read a few articles describing how I can load the files in and set things up so that http://www.example.com/ is directed at those files, but is there a way I can ensure that people who go to http://example.com/ get 301'd to http://www.example.com/?
Just a heads up for those who find this: S3 now supports the root domain. You no longer have to do a redirect.
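If you do still want the 301, a rough sketch of the usual two-bucket approach (bucket and domain names are examples) is to give the bare-domain bucket a website configuration that redirects everything:
# make the example.com bucket answer every request with a redirect to www.example.com
aws s3api put-bucket-website --bucket example.com --website-configuration '{"RedirectAllRequestsTo":{"HostName":"www.example.com"}}'
DNS for example.com then points at that bucket's website endpoint (for the bare domain that typically means a Route 53 alias record).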