Rewriting static resource URLs in .htaccess for a CDN - apache

I have an existing live application based on a PHP/MySQL/Apache stack. A quick performance evaluation revealed that a CDN would help us gain a lot of speed, and we are planning to use CloudFront for the CDN.
The issue is that the existing code wasn't written with a CDN in mind.
At the moment, our HTML output references static resources in link tags with relative paths like "./images/test.png".
Is there any way to identify these links just before sending the output and rewrite them to load from the CDN URL?
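One low-touch option, if mod_substitute is available, is to rewrite the generated HTML on the way out. A minimal .htaccess sketch (the CloudFront hostname and the ./images/ and ./js/ prefixes are placeholders for your own values):

<IfModule mod_substitute.c>
  AddOutputFilterByType SUBSTITUTE text/html
  Substitute "s|./images/|https://dxxxxxxxx.cloudfront.net/images/|n"
  Substitute "s|./js/|https://dxxxxxxxx.cloudfront.net/js/|n"
</IfModule>

The n flag treats the pattern as a fixed string rather than a regex. Alternatively, mod_pagespeed can rewrite resource domains for you (see the related question below).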

Related

Serve images via Cloudfront while using mod_pagespeed with Apache

We have mod_pagespeed on Apache and filters like convert_jpeg_to_webp.
We would, however, also like to add a CDN to the website, especially for the images and JavaScript files, to further improve performance and the PageSpeed score. The problem is that once we enable the CDN, the JPEG versions of the images are delivered via the CDN. We would like to get the WebP versions of the images, which work exactly as expected when the CDN is not enabled.
For example, without cdn, www.domainname.com/assets/images/imagename.jpg.pagespeed.ce.fqqfe4pa.jpg
is converted to
www.domainname.com/assets/images/imagename.jpg.pagespeed.ce.fqqfe4pa.webp
But with CDN enabled, we get something like this in return:
cdn.domainname.com/assets/images/imagename.jpg.pagespeed.ce.fqqfe4pa.jpg
Does anyone have a solution?
Have you authorized your CDN domain? You can find further details here: https://www.modpagespeed.com/doc/domains
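For illustration, the domain directives described in that document look roughly like this in pagespeed.conf, using the domains from the example above (adjust to your setup):

ModPagespeedDomain cdn.domainname.com
ModPagespeedMapRewriteDomain cdn.domainname.com www.domainname.com

The first line authorizes the CDN domain for rewriting; the second maps resources served from www.domainname.com onto cdn.domainname.com so that rewritten (e.g. WebP) URLs point at the CDN.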

Cloudfront static sites with versioning

I like the idea of hosting a static site in S3 + Cloudfront.
Best practice seems to be to version files in S3. For example, for site version 2324, I'd put stuff in
s3://my-site-assets/2324/images/logo.jpg
The thing I'm having trouble with is how to version the actual pages. If a "hello" page is at
s3://my-site-assets/2324/hello.html
I would want visitors to https://my-site.com/hello.html to get the correct version.
Is this possible with a 100% static site? Right now, I'm doing something similar by versioning assets, but my pages are all served via EC2/Varnish/ELB. It seems quite heavyweight just for rewriting hello.html -> 2324/hello.html.
It is possible today with Lambda@Edge. You have to do server-side redirection to load the latest versioned site. Since you're versioning your site, you must be maintaining the version number somewhere; use that number in your Lambda@Edge logic.
Request (https://my-site.com/hello.html) -> L@E (redirect here) -> CF -> S3 (and all the way back)
L@E logic: replace(base_url, base_url + '/' + latest_version)
Reference doc on routing via Lambda@Edge: https://making.close.com/posts/redirects-using-cloudfront-lambda-edge
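A minimal sketch of that logic as a CloudFront origin-request handler in Node.js (it rewrites the request URI on its way to S3 rather than issuing a redirect; the hard-coded version number is purely illustrative, you would normally look it up wherever you maintain it):

'use strict';

// Latest deployed site version; hard-coded here only for illustration.
const LATEST_VERSION = '2324';

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  // /hello.html -> /2324/hello.html before CloudFront forwards the request to S3
  request.uri = '/' + LATEST_VERSION + request.uri;
  callback(null, request);
};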

Setting up the dispatcher and CDN integration with AEM 6.x - Steps and Best Practices?

We need to set up a new AEM 6.x project that, in production, makes use of the benefits of a CDN (like Akamai) and a dispatcher module within an Apache HTTP Web Server.
So this question is about where to begin, what the steps involved are, and which best practices to take into consideration.
It entirely depends on how you want to configure your systems; both the dispatcher and the CDN cache have their own best practices outlined in their documentation (available on the internet).
There are two types of setup I have seen so far:
Cache everything on the dispatcher as well as the CDN
Cache everything on the dispatcher but do not cache HTML on the CDN (so effectively you are caching images, CSS, and JS, but no HTML)
Cache everything on the dispatcher as well as the CDN
After the first hit, everything gets cached
Simple setup
Cache cleanup is complex: you will need your own logic, hooked into the dispatcher flush, to flush the CDN cache. Refer to the Akamai Connector.
There are complexities around related-content flushes: while publishing content from author to publish, AEM identifies the related content and sends activations for it. The same needs to happen for the CDN flush as well.
A complete flush of the CDN cache is not a practical option; it takes a lot of time to complete.
Not caching HTML on the CDN
Has all the advantages of the above approach
For libraries and image assets, implement selector-based versioning (AEM ACS Commons provides that for client libraries; for assets you could implement your own URL-rewriter logic that adds the last-modified date as a selector to the asset call, with your rendering servlet taking care of selector handling)
With proper Expires headers set on assets and client libraries you will not have to worry about explicit CDN cache management (a rough sketch follows at the end of this answer)
Pages, when activated with new assets and/or libraries, will refer to the updated selectors and get cached on the dispatcher. When a call is made to such a page, the CDN caches the libraries and assets, and the page refers to the CDN version of them. Assets and libraries are independent of the pages and are refreshed independently.
Based on the TTL, outdated resources get cleared off the CDN.
There may be additional steps required to get the above working; what I have outlined is the high-level approach. You will need to follow the security, SSL, domain-modelling, and other configuration guidelines as specified in the dispatcher documentation and your CDN setup. For a few of these you can refer to the Akamai blog here.
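As a rough illustration of the Expires-header point above, a dispatcher vhost snippet might look like the following (the /etc.clientlibs and /content/dam paths are assumptions; adjust them to wherever your project serves client libraries and assets from):

<IfModule mod_expires.c>
  ExpiresActive On
  # Versioned client libraries and DAM assets can be cached aggressively,
  # because a new version gets a new selector-based URL.
  <LocationMatch "^/(etc\.clientlibs|content/dam)/">
    ExpiresDefault "access plus 1 year"
  </LocationMatch>
</IfModule>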

React Router + AWS Backend, how to SEO

I am using React and React Router in my single page web application. Since I'm doing client side rendering, I'd like to serve all of my static files (HTML, CSS, JS) with a CDN. I'm using Amazon S3 to host the files and Amazon CloudFront as the CDN.
When the user requests /css/styles.css, the file exists so S3 serves it.
When the user requests /foo/bar, this is a dynamic URL so S3 adds a hashbang: /#!/foo/bar. This will serve index.html. On my client side I remove the hashbang so my URLs are pretty.
This all works great for 100% of my users.
All static files are served through a CDN
A dynamic URL will be routed to /#!/{...} which serves index.html (my single page application)
My client side removes the hashbang so the URLs are pretty again
The problem
The problem is that Google won't crawl my website. Here's why:
Google requests /
They see a bunch of links, e.g. to /foo/bar
Google requests /foo/bar
They get redirected to /#!/foo/bar (302 Found)
They remove the hashbang and request /
Why is the hashbang being removed? My app works great for 100% of my users so why do I need to redesign it in such a way just to get Google to crawl it properly? It's 2016, just follow the hashbang...
</rant>
Am I doing something wrong? Is there a better way to get S3 to serve index.html when it doesn't recognize the path?
Setting up a node server to handle these paths isn't the correct solution because that defeats the entire purpose of having a CDN.
In this thread Michael Jackson, top contributor to React Router, says "Thankfully hashbang is no longer in widespread use." How would you change my set up to not use the hashbang?
You can also check out this trick: set up a CloudFront distribution and then alter the 404 behaviour in the "Error Pages" section of your distribution. That way you can serve domain.com/foo/bar links again :)
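If you manage the distribution as code (CloudFormation, for example), that console setting corresponds roughly to this fragment of the DistributionConfig (illustrative only; with an S3 REST origin the missing-key error surfaces as 403 rather than 404):

CustomErrorResponses:
  - ErrorCode: 403
    ResponseCode: 200
    ResponsePagePath: /index.html
  - ErrorCode: 404
    ResponseCode: 200
    ResponsePagePath: /index.html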
I know this is a few months old, but for anyone who comes across the same problem: you can simply specify "index.html" as the error document in S3. The error document property can be found under bucket Properties => Static Website Hosting => Enable website hosting.
Please keep in mind that taking this approach means you will be responsible for handling HTTP errors like 404 in your own application, along with other HTTP errors.
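For reference, the same setting can be applied with the AWS CLI (the bucket name is a placeholder):

aws s3 website s3://your-bucket-name --index-document index.html --error-document index.html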
The hashbang is not recommended when you want to make an SEO-friendly website; even if it gets indexed by Google, the page will show only a little, thin content.
The best way to build your website is by using current techniques, namely progressive enhancement; search for it on Google and you will find many articles about it.
Mainly, you should have a separate link for each page, and when the user clicks on any page they are taken to that page with whatever effect you want, even if it is a single-page website.
In this case, Google will have a unique link for each page and the user still gets the fancy effect and the great UX.
EX: a "Contact Us" link that points to its own real URL, e.g. /contact-us.

Redirection issue following migration from Node.js app to static site on Amazon S3

I plan to migrate my personal blog, which presently uses Node.js as a backend, to Amazon S3, considering that the content is pretty much always static.
One problem I noticed is that there's no way to do redirection whatsoever on Amazon S3 (as far as I know).
Let's say I have this URL:
http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean
When I migrate it to Amazon, I'll have to create this folder hierarchy:
/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean/
and then add an index.html file in it, containing the content.
Considering this, my URL will then be changed from:
http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean
to
http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean/
There's no way to redirect that right now using Amazon S3.
Also, anyone requesting http://blogue.jpmonette.net/2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean/index.html will obtain the file, and this is annoying in terms of SEO.
Is there an available solution to prevent this behavior and preserve good SEO of my blog?
EDIT
And for people flagging this as an inappropriate question: I'm looking to set up proper permanent redirections on Amazon S3, to make sure that visitors looking for articles in the future will find them. Please note that "visitors" includes both humans and robots.
It seems like we can create redirection rules this way (to redirect a-propos to a-propos/):
<?xml version="1.0"?>
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>a-propos</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyWith>a-propos/</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
Considering I have a ton of URLs to redirect (83 in total), this seems impossible to do, because there is a limit on the number of routing rules:
83 routing rules provided, the number of routing rules in a website configuration is limited to 50.
Other than that, the only option I see is to add the x-amz-website-redirect-location header to a file with the same name as the prior URL.
For the example above, create a file named a-propos, add the x-amz-website-redirect-location header to it, and put a-propos/ as the value. This should work, but doing it by hand for every URL takes forever (a scripted sketch is at the end of this post).
Syntax can be found here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/HowDoIWebsiteConfiguration.html#configure-bucket-as-website-routing-rule-syntax
Rule generator can be found here:
http://quiet-cove-8872.herokuapp.com/
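As referenced above, a rough sketch of scripting the x-amz-website-redirect-location approach with the AWS SDK for JavaScript (v2); the bucket name and key list are placeholders:

// Create zero-byte placeholder objects whose x-amz-website-redirect-location
// header makes the S3 website endpoint return a 301 to the new trailing-slash URL.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const bucket = 'my-blog-bucket'; // placeholder bucket name
const keys = [
  'a-propos',
  '2013/06/11/hebergez-vos-applications-nodejs-grace-a-digitalocean',
  // ...the other old URLs
];

keys.forEach((key) => {
  s3.putObject({
    Bucket: bucket,
    Key: key,
    Body: '',
    WebsiteRedirectLocation: '/' + key + '/',
  }, (err) => {
    if (err) console.error(key, err);
    else console.log('redirect set for', key);
  });
});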