Cloudflare Pages .html redirect causes Redirection Error for Google indexing

It seems that if my page's links use the .html extension, Google Search Console's crawl or live test fails to retrieve those resources, reporting a Redirection Error under "Page resources: x/x couldn't be loaded", even though a client browser fetches them just fine.
Side note: if those .html URLs are run through a live URL test directly, they show as retrievable and indexable.
Is there any solution to this besides removing all the .html path extensions in production? The extensions are kept for the local testing Python server.
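One way to automate the workaround the question itself names (extensionless links in production, while the local Python server still sees .html files) is a small build step that strips the extension from internal links before deploying. A minimal Node sketch, where the ./dist folder, the flat layout, and the href pattern are all assumptions for illustration:

// Minimal build-step sketch: strip the .html extension from internal
// links so production links match Cloudflare Pages' extensionless URLs,
// while the source files keep .html for local testing.
const fs = require("fs");
const path = require("path");

const dist = "./dist"; // assumed output folder, not from the question
for (const name of fs.readdirSync(dist)) {
  if (!name.endsWith(".html")) continue;
  const file = path.join(dist, name);
  const html = fs.readFileSync(file, "utf8");
  // Only touch relative links; absolute URLs contain ":" and are skipped.
  fs.writeFileSync(file, html.replace(/href="([^":]+)\.html"/g, 'href="$1"'));
}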

Related

URL with trailing slash does not load scripts and CSS in PHP site

What is the difference between these two URLs:
abc.com/company-profile.php
abc.com/company-profile.php/
When I directly open the first one (without the trailing slash), the page opens perfectly.
But when I search for it in Google I get the second link (with the trailing slash), and when I click on that link, the page shows only bare markup, with no script or CSS included.
After clicking on any style.css, the page shows a URL like abc.com/company-profile.php/css/style.css.
What is happening?
The cause can be that Google indexed your website from a sitemap.xml file containing the company-profile.php/ URL and not company-profile.php.
And unfortunately your website doesn't handle the trailing slash at the end of the URL well; you can fix this by updating the .htaccess file, as in the sketch below.
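A minimal .htaccess sketch of that trailing-slash fix, assuming Apache with mod_rewrite enabled; it 301-redirects any request with path info after .php back to the bare .php URL:

RewriteEngine On
# Redirect /company-profile.php/anything back to /company-profile.php
RewriteRule ^(.+\.php)/ /$1 [R=301,L]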
You also have to check the sitemap generated for and downloaded by Google.
The sitemap's URL and name differ depending on whether you are using WordPress, Drupal, another CMS, or a custom website.

Has Google changed crawlers in a way that could lead to the 404 growth?

Since yesterday I have been seeing a growing number of 404 errors on our website. It is very strange, because the pages reported as missing don't exist on our site, and we didn't release any code changes that day.
Google Webmaster Tools is reporting these errors, but when I look at the pages that supposedly link to the missing URLs, there are no such links there. Could this be a Google crawler issue?
404 URL:
http://www.justanswer.co.uk/boat/home-improvement/homework/writing
Linked from:
http://www.justanswer.co.uk/boat/home-improvement/homework
http://www.justanswer.co.uk/boat/home-improvement/hvac
It seems that you have CORS issues with cross-domain JavaScript.
https://www.facebook.com/connect/ping?client_id=172525162793917&domain=www.justanswer.co.uk&origin=1&redirect_uri=http%3A%2F%2Fstaticxx.facebook.com%2Fconnect%2Fxd_arbiter.php%3Fversion%3D42%23cb%3Df316e5bca883b5%26domain%3Dwww.justanswer.co.uk%26origin%3Dhttp%253A%252F%252Fwww.justanswer.co.uk%252Ff50e0366c05c14%26relation%3Dparent&response_type=token%2Csigned_request%2Ccode&sdk=joey
is saying that
Given URL is not allowed by the Application configuration: One or more of the given URLs is not allowed by the App's settings. It must match the Website URL or Canvas URL, or the domain must be a subdomain of one of the App's domains.

React Router + AWS Backend, how to SEO

I am using React and React Router in my single-page web application. Since I'm doing client-side rendering, I'd like to serve all of my static files (HTML, CSS, JS) from a CDN. I'm using Amazon S3 to host the files and Amazon CloudFront as the CDN.
When the user requests /css/styles.css, the file exists, so S3 serves it.
When the user requests /foo/bar, this is a dynamic URL, so S3 redirects it to the hashbang form /#!/foo/bar, which serves index.html. On the client side I remove the hashbang so my URLs are pretty.
This all works great for 100% of my users.
All static files are served through a CDN
A dynamic URL will be routed to /#!/{...} which serves index.html (my single page application)
My client side removes the hashbang so the URLs are pretty again
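For reference, the hashbang redirect described above is typically configured with S3 static-website routing rules along these lines (a sketch; example.com stands in for the real host):

<RoutingRules>
  <RoutingRule>
    <Condition>
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <HostName>example.com</HostName>
      <ReplaceKeyPrefixWith>#!/</ReplaceKeyPrefixWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>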
The problem
The problem is that Google won't crawl my website. Here's why:
Google requests /
They see a bunch of links, e.g. to /foo/bar
Google requests /foo/bar
They get redirected to /#!/foo/bar (302 Found)
They remove the hashbang and request /
Why is the hashbang being removed? My app works great for 100% of my users, so why do I need to redesign it just to get Google to crawl it properly? It's 2016, just follow the hashbang...
</rant>
Am I doing something wrong? Is there a better way to get S3 to serve index.html when it doesn't recognize the path?
Setting up a node server to handle these paths isn't the correct solution because that defeats the entire purpose of having a CDN.
In this thread Michael Jackson, top contributor to React Router, says "Thankfully hashbang is no longer in widespread use." How would you change my set up to not use the hashbang?
You can also check out this trick: set up a CloudFront distribution, then alter the 404 behaviour in the "Error Pages" section of the distribution. That way you can have domain.com/foo/bar links again :)
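As a sketch, that trick corresponds to a custom error response in the distribution config; the 404 code and zero TTL here are illustrative choices (some setups match 403 instead, since S3 can return 403 for missing keys):

{
  "CustomErrorResponses": {
    "Quantity": 1,
    "Items": [
      {
        "ErrorCode": 404,
        "ResponsePagePath": "/index.html",
        "ResponseCode": "200",
        "ErrorCachingMinTTL": 0
      }
    ]
  }
}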
I know this is a few months old, but for anyone who comes across the same problem: you can simply specify index.html as the error document in S3. The error document property can be found under bucket Properties => Static website hosting => Enable website hosting.
Please keep in mind that taking this approach means you will be responsible for handling HTTP errors like 404 in your own application, along with other HTTP errors.
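The same setting can also be applied with the AWS CLI, sketched here with a placeholder bucket name:

aws s3 website s3://my-bucket/ --index-document index.html --error-document index.html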
The hashbang is not recommended when you want an SEO-friendly website; even if the page is indexed by Google, it will show only thin content.
The best way to build your website is to use the current approach of progressive enhancement; search for it on Google and you will find many articles about it.
Mainly, you should have a separate link for each page, and when the user clicks it he is taken to that page with any effect you want, even in a single-page website.
In this case, Google will have a unique link for each page and the user will have the fancy effect and the great UX.
EX: a "Contact Us" link that points to its own URL, e.g. /contact-us.
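As a sketch of that idea, using the History API so each page keeps a real, crawlable URL (renderPage is a hypothetical client-side router, not something from the answer):

// Give every internal link a real URL instead of a hashbang.
document.querySelectorAll("a[data-route]").forEach(function (link) {
  link.addEventListener("click", function (event) {
    event.preventDefault();
    history.pushState(null, "", link.getAttribute("href"));
    renderPage(location.pathname); // hypothetical client-side router
  });
});
// Handle back/forward navigation the same way.
window.addEventListener("popstate", function () {
  renderPage(location.pathname);
});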

How to Avoid a Mixed-Content Error When Displaying a Search Result?

Question:
How can I include both https: and http: results from a single domain in a Google Custom Search Engine, but display any such result in an iframe whose parent window is secure?
How It's Structured:
My Google custom search engine currently searches "mydomainname.com/directory/" with the option to "Include all pages whose address contains this URL". It operates on a specific page of the website to search pages within the specified directory. The Link Target set in Websearch Settings is an iframe on the same page as the search bar.
The browser window and the iframe src are both on the same secure domain. And since the search results all come from a directory within the site structure, they are all on this same domain as well.
Currently some results appear as "https://..." and some appear as "www...". Obviously, this creates a mixed-content error when the browser window is https:// and an attempt is made to display an http:// search result in the iframe.
The results that are http:// will, of course, also work as https:// URLs. I do not know what makes a page or file appear in the search results as "www." versus "https://" when they all originate from a single secure domain.
The "http://" results appear even if I specify the site to be searched as https://www.mydomainname.com/directory/. I don't want to exclude these results, but I do want them to be displayable when browsing the site securely.
The Objective:
So the bottom-line rule that I need to work around is that insecure pages or files cannot be loaded into an iframe on a secure web page. I obviously want users to be able to use the https:// site, but then I need the search to function in a way that allows all possible search results for those users.
The reason I need the results' target to be this iframe is that this is the frame that displays all the content of the web page. The search results work in harmony with the organization of other information, such that choosing a link from a category in the page's navigation and choosing a search result both display the chosen content in the same location, the iframe.
What I've Tried:
I've tried designating https:// specifically in the Google Custom Search Engine settings and removing : 'http' from the script line gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + '//cse.google.com/cse.js?cx=' + cx;.
I looked in the script file it links to, http://cse.google.com/cse.js?cx=012685392925564329750:ghl2znnfada, but I can't decipher what might need to be changed in it.
In the error log on the console I don't see much of relevance except for the expected inability to load insecure pages while browsing securely. But there is this, which looks like it may be relevant, though I could be completely wrong because I can't really decipher it either:
Mixed Content: The page at 'https://mydomainname.com/directory/index.php' was loaded over HTTPS, but requested an insecure script 'http://www.google.com/jsapi?key=ABQIAAAAdCtw6Xq1Q31YAr7VSQOSvxS5g7WKqCWUBuUdhz3-rUOumR2saRSPGvey2WjYALW7f5_JzakSL3lAEg'. This request has been blocked; the content must be served over HTTPS.
Proposed Paths to a Solution:
I am open to any solution methods that may be possible. I have considered several routes but am not sure how to properly execute them, or I have failed in my attempts to execute them.
Some solutions I thought may work are:
Show all results as https:// links (without excluding any) so that they can be accessed whether on a secure connection to the site or not.
Redirect any links clicked without https:// to be loaded into the iframe as https://
Change something about the pages and files on the server so that they only appear in the search results as https://
Change something about Google's search engine script so it parses all found results as https://
Somehow show links as http:// if browsing non-secure, and https:// if browsing secure *
*I don't know how viable or efficient this would be
The most robust solution is to migrate your whole website to https:
use a 301 (permanent) redirect from http to https,
and activate HSTS (if possible with includeSubdomains).
Google will take a little time to update its index, but HSTS will automatically replace http with https, so you should avoid any mixed-content issues.
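A minimal Apache sketch of both steps, assuming mod_rewrite and mod_headers are enabled (the max-age value is a common choice, not something from the answer):

RewriteEngine On
# 301-redirect every http request to its https equivalent.
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Send HSTS only on https responses, covering subdomains as suggested.
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains" env=HTTPS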

SSL certificate warning in ASP site

I have an ASP website that gives visitors a warning, with a red X in Chrome, and Firefox says the site is not verified when they try to log in. See the picture.
Please advise what this means and what I should do.
Thanks
When a page is loaded via an HTTPS URL, the browser security model states that all resources referenced by that page should also be HTTPS URLs. Check your page for references to JavaScript, CSS, JPGs, etc. All of them should be using HTTPS when the main page is loaded by HTTPS.
If you have JavaScript that dynamically loads content with XHR, you need to make sure the URLs you load match the scheme (HTTP or HTTPS) of the main page. This is particularly important for JavaScript that is intended to be reused on multiple HTML pages, some of which are loaded via HTTP and some via HTTPS.
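A small sketch of that scheme-matching advice (the host api.example.com and the path are placeholders, not from the answer):

// Build request URLs from the page's own scheme so an HTTPS page
// never triggers a mixed-content block.
function apiUrl(pathname) {
  return window.location.protocol + '//api.example.com' + pathname;
}

var xhr = new XMLHttpRequest();
xhr.open('GET', apiUrl('/session/status'));
xhr.onload = function () { console.log(xhr.responseText); };
xhr.send();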