I have a web app served by Apache. The HTML pages sent to browsers include several CSS files hosted on the same domain as the app.
I've noticed that some websites include my CSS (and images) in their own pages, and this increases the traffic on my (limited) Apache server.
I want to allow CSS access only for pages hosted at specific domain(s).
How can I configure the web server (Apache) to refuse to serve CSS to pages outside those domain(s)?
Example (valid access)
myhost.com/index.html includes styles/mystyles.css
Example (invalid access)
foreignhost.com/index.html includes myhost.com/styles/mystyles.css
Hotlinking can be prevented with .htaccess files, but it might be more fun to change the URL of your CSS files and put up a file at the old URL that makes their entire site hot pink.
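For reference, a minimal .htaccess sketch of the Referer-based block (assuming mod_rewrite is enabled; myhost.com stands in for your real domain):

RewriteEngine On
# Allow requests that carry no Referer at all (direct visits; some proxies strip it)
RewriteCond %{HTTP_REFERER} !^$
# Allow referers from your own domain, with or without www, over HTTP or HTTPS
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?myhost\.com/ [NC]
# Everything else asking for CSS or images gets a 403 Forbidden
RewriteRule \.(css|png|jpe?g|gif)$ - [F,NC]

Bear in mind the Referer header is optional and trivially spoofed, so this only deters casual hotlinking.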
Related
It seems that if my page contains links that use the .html extension, Google Search Console's crawl or live test fails to retrieve those resources, showing a Redirection Error message under Page resources: x/x couldn't be loaded, even though a client browser can get the results just fine.
Side note: if those .html URLs are tested directly with a live URL test, they show as retrievable and indexable.
Is there any solution to this besides removing all the .html extensions in production? I keep the extensions for the local testing Python server.
I am using React and React Router in my single page web application. Since I'm doing client side rendering, I'd like to serve all of my static files (HTML, CSS, JS) with a CDN. I'm using Amazon S3 to host the files and Amazon CloudFront as the CDN.
When the user requests /css/styles.css, the file exists so S3 serves it.
When the user requests /foo/bar, this is a dynamic URL, so S3 redirects it to a hashbang URL: /#!/foo/bar, which serves index.html. On the client side I remove the hashbang so my URLs stay pretty.
This all works great for 100% of my users.
All static files are served through a CDN
A dynamic URL will be routed to /#!/{...} which serves index.html (my single page application)
My client side removes the hashbang so the URLs are pretty again
The problem
The problem is that Google won't crawl my website. Here's why:
Google requests /
They see a bunch of links, e.g. to /foo/bar
Google requests /foo/bar
They get redirected to /#!/foo/bar (302 Found)
They remove the hashbang and request /
Why is the hashbang being removed? My app works great for 100% of my users so why do I need to redesign it in such a way just to get Google to crawl it properly? It's 2016, just follow the hashbang...
</rant>
Am I doing something wrong? Is there a better way to get S3 to serve index.html when it doesn't recognize the path?
Setting up a node server to handle these paths isn't the correct solution because that defeats the entire purpose of having a CDN.
In this thread, Michael Jackson, a top contributor to React Router, says "Thankfully hashbang is no longer in widespread use." How would you change my setup to not use the hashbang?
You can also check out this trick: set up a CloudFront distribution, then alter the 404 behaviour in the "Error Pages" section of the distribution. That way you can use domain.com/foo/bar links again :)
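For the record, a sketch of the relevant fragment of the CloudFront distribution config (the field names come from the DistributionConfig API; the values are illustrative): map 404s from the S3 origin to /index.html with a 200 response so the SPA can route client-side.

"CustomErrorResponses": {
    "Quantity": 1,
    "Items": [{
        "ErrorCode": 404,
        "ResponsePagePath": "/index.html",
        "ResponseCode": "200",
        "ErrorCachingMinTTL": 0
    }]
}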
I know this is a few months old, but for anyone who comes across the same problem: you can simply specify "index.html" as the error document in S3. The error document property can be found under bucket Properties => Static Website Hosting => Enable website hosting.
Please keep in mind that taking this approach means you will be responsible for handling HTTP errors like 404 in your own application, along with other HTTP errors.
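If you'd rather script it than click through the console, the same setting can be applied with the AWS CLI (the bucket name is a placeholder):

aws s3 website s3://my-bucket --index-document index.html --error-document index.html

Note that S3 itself still sends the error status code along with that document, which is worth keeping in mind for the error handling mentioned above.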
The hashbang is not recommended when you want to make an SEO-friendly website; even if it gets indexed in Google, the page will show only a little, thin content.
The best way to build your website is by using the latest trend and techniques, which is "progressive web enhancement": search for it on Google and you will find many articles about it.
Mainly, you should have a separate link for each page, and when the user clicks on any page they are taken to that page using any effect you want, even if it is a single-page website.
In this case, Google will have a unique link for each page and the user will still have the fancy effect and the great UX.
EX: a regular "Contact Us" link, as in the sketch below.
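For illustration, hypothetical markup (the URL is made up): each page gets a real, crawlable URL, and the client-side router can still intercept the click for the fancy transition.

<a href="/contact-us">Contact Us</a>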
I am using a static site generator for my site, which means my entire site is static. All my resources and HTML files are referenced with the domain name prefixed, so that a CDN can be used.
But due to SEO concerns I disabled non-www access and redirect it to the www.domain.com variant. Now, apparently, I cannot use a CDN, because the origin server needs to live on a different hostname than the www name the CDN serves.
Can a CDN be used for HTML files?
How can I deliver content through www.domain.com and use a CDN?
Can I give the CDN access to static.domain.com as an origin server, but deny access to other clients? Seems clumsy!
Any ideas?
Using Apache 2.2, trying to use the Level 3 CDN through my hosting company's site.
Depending on what you are able to set on the CDN via your hosting company, the best way would be to override the Host header in the CDN settings.
So, first let's look at your DNS settings:
www should point to the CDN
origin should point to your web server.
Now, on the CDN, you set your origin to origin.yourdomain.com and add (I can't tell you if this is possible in your setup) an "HTTP Host header override" of www.yourdomain.com. In some cases it's implemented the other way around, so you would "force IP-Host" to origin.yourdomain.com.
In both cases, what you want to achieve is this:
When an end user requests www.yourdomain.com, it resolves to the CDN.
The CDN needs to fetch the content from your server, so it establishes a session on port 80 (assuming HTTP) to origin.yourdomain.com.
Once the connection is open, the CDN sends (amongst others) an HTTP Host header of www.yourdomain.com (this is the name-based virtual host Apache sees and evaluates).
That way you can set up your web server in exactly the same way as you would without a CDN.
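On the Apache side this is just an ordinary name-based virtual host; a minimal sketch (ServerName and DocumentRoot are placeholders):

# Selected when the CDN connects to origin.yourdomain.com but sends
# "Host: www.yourdomain.com"
<VirtualHost *:80>
    ServerName www.yourdomain.com
    DocumentRoot /var/www/yourdomain
</VirtualHost>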
So I have a site http://www.example.com.
The JS/CSS/images are served from a CDN - http://xxxx.cloudfront.net OR http://cdn.example.com; they are both the same thing. The CDN just serves any type of file, including my PHP pages. Google somehow started crawling that CDN site as well; two sites actually - cdn.example.com AND http://xxxx.cloudfront.net. Considering:
I am NOT trying to set up a subdomain OR a mirror site. If that happens, it is a side effect of me trying to set up a CDN.
The CDN is some web server, not necessarily Apache. I do not know what type of server it is.
There is no request processing on the CDN; it just fetches things from the origin server. I don't think you can put custom files directly on the CDN; whatever needs to be on the CDN has to come from the origin server.
How do I prevent the crawling of PHP pages?
Should I allow crawling of images from cdn.example.com OR from example.com? The links to images inside the HTML all point to cdn.example.com. If I allow crawling of images only from example.com, then there is practically nothing to crawl - there are no links to such images. If I allow crawling of images from cdn.example.com, doesn't that leak away the SEO benefits?
Some alternatives that I considered, based on Stack Overflow answers:
Write a custom robots_cdn.txt and serve it based on HTTP_HOST, as suggested in many Stack Overflow answers.
Serve a new robots.txt from the subdomain. As I explained above, I do not think the CDN can be treated like a subdomain.
Do a 301 redirect to www.example.com when HTTP_HOST is cdn.example.com.
Suggestions?
A related question: How do I disallow a mirror site (on a subdomain) using robots.txt?
You can put a robots.txt in your root directory so that it will be served at cdn.yourdomain.com/robots.txt. In this robots.txt you can disallow all crawlers with the setting below:
User-agent: *
Disallow: /
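Since the CDN pulls everything from the origin, one way to get that file served only under the CDN hostname is the questioner's alternative 1: a mod_rewrite sketch for the origin's .htaccess (assuming mod_rewrite is enabled; robots_cdn.txt would hold the disallow-all rules above):

RewriteEngine On
# When the request arrives under the CDN hostname, serve the blocking file
RewriteCond %{HTTP_HOST} ^cdn\.example\.com$ [NC]
RewriteRule ^robots\.txt$ /robots_cdn.txt [L]

The robots.txt served on www.example.com stays untouched.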
I have an ASP web site that gives visitors a warning - a red X (in Chrome) and "not verified" (in Firefox) - when they try to log in. See the picture.
Please advise what this means and what I should do.
Thanks
When a page is loaded via an HTTPS URL, the browser security model states that all resources referenced by that page should also be HTTPS URLs. Check your page for references to JavaScript, CSS, JPGs, etc. All of them should be using HTTPS when the main page is loaded by HTTPS.
If you have JavaScript that dynamically loads content with XHR, you need to make sure the URLs you load match the scheme (HTTP or HTTPS) of the main page. This is particularly important for JavaScript that is intended to be reused on multiple HTML pages, some of which are loaded via HTTP and some via HTTPS.
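A small sketch of that idea (the endpoint is made up): derive the XHR URL from the page's own location, so the same script is safe on both HTTP and HTTPS pages.

// "/api/data" is a placeholder endpoint on the same host as the page
var url = window.location.protocol + "//" + window.location.host + "/api/data";
var xhr = new XMLHttpRequest();
xhr.open("GET", url, true);
xhr.send();

A scheme-relative URL ("//example.com/api/data") achieves the same when the content lives on another host.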