Testing that a page is not the 404 page with Cypress and Gatsby

I have a Gatsby site. If I hit a URL that doesn't exist, Gatsby serves up a 404 page; however, it doesn't change the URL.
I am testing this site using Cypress. Cypress's recommended way of testing navigation within a site is to use location, but in this case checking the pathname of the page that was navigated to is not reliable: if the page doesn't exist, the URL still has the same pathname as if it did. For example, if I get Cypress to cy.click() a link with an href of /incorrect-url/ and test its pathname, I get a passing test even though the page that loaded was the 404 page, not the page I was expecting.
I know I could test that the elements I'm expecting are present on the page I've navigated to, but I'd prefer a reliable way to know whether the page 404'd (Gatsby returns the page with a 404 status code).
To summarise:
- checking the location/pathname is not a reliable way of testing that a specific page has loaded
- I don't want to check for elements on the page as a way of verifying it
- surely there is a way of verifying that the page loaded without a 404 status code
How can I reliably check that the page navigated to was not the 404 page?

I know the question was asked 2 months ago, so hopefully you've already found a way to validate the status, but for future reference: you can make a request to your link's href attribute, and you will then have access to the status code.
cy.get('SELECTOR_FOR_YOUR_LINK').then((link) => cy.request(link.prop('href')).its('status').should('eq', 200));
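
If you want the assertion, rather than cy.request() itself, to decide the outcome, you can pass failOnStatusCode: false. A minimal sketch, assuming a hypothetical /incorrect-url/ link:

    cy.get('a[href="/incorrect-url/"]').then((link) => {
      // failOnStatusCode: false stops cy.request() from failing the test on its own,
      // so the .should() below is what actually decides pass or fail
      cy.request({ url: link.prop('href'), failOnStatusCode: false })
        .its('status')
        .should('eq', 200);
    });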


Google 404 soft error on index page that is working fine

A friend of mine has been having trouble getting her site indexed by Google and asked me to have a look, but this is not something I know much about, so I was hoping for some assistance.
Looking at her Search Console, Google's crawl report shows a soft-404 error on the index page. I have marked this as fixed a few times, because the site looks fine to me, but it keeps coming back.
If I fetch the site as Google, it seems to work fine, although it shows the mobile version instead of the desktop one.
It also keeps reporting a recurring 404 for the page http://www.smeyan.com/new-page, which doesn't exist anywhere I can see, including in the server files or the sitemaps.
Here is what I know about this site:
It used to be a Wix site and was moved to a HostGator shared server 2-3 months ago.
It's using JavaScript/jQuery .load() to pull page content from outside the index.html template.
It has two sitemaps, one for the URLs and one for both URLs and images:
http://www.smeyan.com/sitemap_url.xml http://www.smeyan.com/sitemap.xml
It has been about 2 months since it was submitted for indexing and Google has not indexed any of the content; a search for site:www.smeyan.com shows some old stuff from the Wix server, although Search Console says it has 172 images indexed.
It has www. set as the preferred domain in Search Console.
Has anyone experienced this and has a direction for a fix?
How long a lifetime is set in this site's Cache-Control header? If it's long, you should use Google's Removals tool for the obsolete snippets and cached copies. I simulated a Google visit to your webpage: correct 404 return code, correct headers. So report the "not found" pages via Google Removals, request a Googlebot visit, and then keep calm and wait for a reaction.
BTW: for permanently removed content, use 410 Gone for Google, or report it via Removals.
https://support.google.com/webmasters/answer/1663419?hl=en
The only download error I saw while using Chrome's Inspect function pertains to a SCRIPT tag with a Facebook URL as the source (src) file.
(Screenshots of the error reported by Inspect, and of the SCRIPT tag that caused it, were attached here.)
I am not sure this is the cause of the recurring 404 error, but it is an issue that needs attention on this website.
I checked your site with Tor Browser, which has scripts disabled. You should provide all of your content inside a <noscript> tag as well. It doesn't have to be beautiful, but it should be visible to bots: <a href="...">...</a>, <img> etc., and plain text. Without that, the site is not optimized for search bots (read up on SEO); content listed in a sitemap may never be indexed if it is never linked.
Your webpage probably also doesn't meet the requirements for screen readers (for blind people).
Note: the image with the "SMEYAN" caption is visible on the webpage and is indexed.
The second image on the webpage (in the source), <img class="gallery-full-image" src="./galleries/home_gallery/smeyan_home-1.jpg" />, is also indexed, so that part seems well implemented.
The menu, however, doesn't work without scripts.
Please use a <noscript> element and implement a version for blind people and for noscript browsers (no scripts required, alt attributes on images). You can test it by disabling scripts, or with the NoScript extension for Firefox.
BTW, you should build with HTML and CSS (including animations) and use JS only where it is actually needed. Or use the <noscript> method.
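
For example, a minimal <noscript> fallback could look like this (the link targets and text are placeholders; only the image path is taken from your page source):

    <noscript>
      <!-- plain, crawlable fallback for bots and script-less browsers -->
      <a href="/galleries/">Galleries</a>
      <a href="/contact/">Contact</a>
      <img src="./galleries/home_gallery/smeyan_home-1.jpg" alt="SMEYAN home gallery" />
    </noscript>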
Googlebot currently uses a web rendering service (WRS) based on the old Chrome 41 (M41), so it may fail where modern browsers succeed.
To learn how Googlebot works, read this.
Add this code to the page to see the real error.
You can also see the error using the URL Inspection live test in Google Search Console; it will show under the "More info" tab.
Note: if the bot gets a 301 code, or if the page has too little content to be significant, it will report a soft 404 error and won't show a preview or any other error.

How to remove the hashtag (#) from vue-router URLs?

I want to remove the hashtag (#) from my URLs, but I also need to keep the no-reload navigation. Can I do that?
I have: page.com/#/home
I want: page.com/home
I tried mode: 'history', but the page reloads with it.
UPD: Is it possible to create an SPA without page reloads and with normal URLs?
When activating history mode, you first need to configure your server according to the documentation. The reason is that history mode only changes the URL of the current page; when the user actually reloads, they'll get a 404 error, because the requested URL doesn't exist on the server. Reconfiguring the server to always serve your SPA's main index.html resolves this.
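
On the client side, history mode itself is just a router option. A minimal sketch in vue-router 3 syntax (the route and the Home component are placeholders; the server-side fallback to index.html still has to be configured separately):

    import Vue from 'vue';
    import VueRouter from 'vue-router';
    import Home from './components/Home.vue'; // placeholder component

    Vue.use(VueRouter);

    const router = new VueRouter({
      mode: 'history', // clean URLs: page.com/home instead of page.com/#/home
      routes: [
        { path: '/home', component: Home },
      ],
    });

    export default router;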
When using a # in the URL (no history mode), the browser tries to navigate to the element whose ID follows the # (within the same document). This is the original behavior of the fragment identifier, so if you add a link with such a fragment identifier to your HTML, the browser won't reload the page but will look for the ID inside the document. vue-router watches this change and routes you to the correct route; this is why it works with hashes. If you instead add a regular URL to the HTML, the browser's native behavior is to actually navigate to that page (a hard link), which causes the reload effect you experienced.
The way to handle this is to never use regular links to route within a Vue single-page application. Use the <router-link> tag for routing from one page to another (but only within the SPA). This is the way to go regardless of whether the browser would follow a # link without reloading. Here is the documentation for the recommended routing tag: link
You can also route from one route to another programmatically, with $router.push(). Here is the documentation for that: link
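
For illustration, both styles side by side (the /home path is the one from the question; everything else is a placeholder):

    <!-- declarative: navigates within the SPA, no full page reload -->
    <router-link to="/home">Home</router-link>

    // programmatic, e.g. inside a component method:
    this.$router.push('/home');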

Change link behavior for non-existent files / broken links

When users click on a link to a file (e.g. csv, zip), there are two things that could happen:
If the file exists: it starts downloading (the web page does not change).
If the file does not exist: the user will be redirected to a 404 page (default or custom).
If there's no redirection when the file exists (or, more precisely, the user doesn't perceive one), why is there a redirection when the file doesn't exist?
I understand that when the browser gets a 404 response code it needs to display it to the user somehow, but shouldn't it take into consideration the event that triggered the request? Typing an incorrect URL into the address bar is not the same as clicking on a link. Wouldn't showing an alert and remaining on the same page be a more "appropriate behavior" for the browser?
It would be relatively easy to create a back-end (or front-end) script that checks all the links, verifies whether the files exist, and then replaces the behavior accordingly, as sketched below. But is there something different that can be done? Is it possible to (programmatically) change the browser's settings so it doesn't redirect if the file doesn't exist? Or could the server return an HTTP code other than 404 (maybe another one in the 400s family) to cause such behavior?
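
A rough front-end sketch of that idea (the a.download selector and the alert are placeholders; it probes each link with a HEAD request before letting the browser follow it):

    // guard every download link: check the file first, then follow or warn
    document.querySelectorAll('a.download').forEach((link) => {
      link.addEventListener('click', async (event) => {
        event.preventDefault();
        const res = await fetch(link.href, { method: 'HEAD' });
        if (res.ok) {
          window.location.href = link.href; // file exists: proceed with the download
        } else {
          alert(`Sorry, that file is unavailable (HTTP ${res.status}).`);
        }
      });
    });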

Broken Link Test 404 error, but pages appear in browser

I just did an SEO test of my site http://www.photographyattic.com using seositecheckup.com. It flagged pages with 404 errors:
From 100 distinct anchor links analyzed, 72 of them seem to be broken.
These pages don't seem broken when I view them in my browser. Example: http://www.photographyattic.com/category-1
Any idea why this would be?
http://www.photographyattic.com/category-1 is sending HTTP status code 404. The page doesn't have to look broken because of that; you can display whatever you like on a 404 page.
You should send status code 200 instead.
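
You can confirm what a URL actually sends, independently of how it renders, with a quick script. A minimal sketch using Node 18+'s built-in fetch (run it as an ES module):

    // prints the HTTP status code the server sends, not what the page looks like
    const res = await fetch('http://www.photographyattic.com/category-1');
    console.log(res.status); // 404 here, even though the page renders fine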

SEO - 301 redirect via 404 page

I am new to this, so I will try to explain myself clearly.
I am doing my 301 redirect from a custom 404 page. I got it working; my question is more about how Google treats this. Since we go through a 404 page first, would Google just record a 404, or would it actually record the 301? As I said, I am new to this and have looked through Google to try to find an answer.
Anyway, any help or comment would be greatly appreciated. Thanks in advance.
Best practice in this case could be:
If the page doesn't exist but we have a new one with highly similar content, we can make a 301 redirect, simply saying "Moved Permanently", which instructs Google to take the new URL into account and prioritize it.
If the page doesn't exist and we have no idea why someone would type this link (the URL never existed and is simply wrong), then we serve a 404 "Not Found". It means the URL is wrong and someone (or some other website) fooled you into following it. You shouldn't automatically redirect the user from this page; place a link to the homepage instead, so the user can choose what to do.
If the page doesn't exist, we know we once had it, and it will not exist in the future either (we have simply decided we will no longer have this page), then serve a 410 "Gone" page, also with a link to the homepage, and let the user decide.
HTTP codes are not just theory; they are a standard we should use. I've noticed that many 404 pages are served without the correct HTTP response code, which only suggests poor development behind them.
More about HTTP response codes here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
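
As an illustration only, here is how those three cases might look on an Express server (a sketch; the paths are placeholders, and the same status codes can be sent from any stack):

    const express = require('express');
    const app = express();

    // 301: the content lives on at a highly similar URL
    app.get('/old-page', (req, res) => res.redirect(301, '/new-page'));

    // 410: the page existed once and is gone for good
    app.get('/removed-page', (req, res) =>
      res.status(410).send('Gone. <a href="/">Back to the homepage</a>'));

    // 404: anything else that never existed
    app.use((req, res) =>
      res.status(404).send('Not Found. <a href="/">Back to the homepage</a>'));

    app.listen(3000);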
From my understanding, a 301 redirect is the best way to retain "link juice" and should be used if the page the 404 replaces has a lot of external links, substantial traffic, etc.
Redirecting a generic 404 straight to the home page is not ideal, as it may confuse the user. Letting the 404 stand keeps the page from being repeatedly indexed and crawled by search engines.
Read more about it here: http://moz.com/learn/seo/http-status-codes
It is not OK to redirect a 404 page to another one; it's better to fix the problem and show the original page. If that's impossible, you should show a 404 page and put some helpful links in it.
If you want to redirect to the correct page, that's fine, but the best approach is to display the original page regardless of the duplication; in that case you must use rel="canonical" to tell search engines where the correct version of the page is.
https://support.google.com/webmasters/answer/139394?hl=en
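
For reference, the canonical hint is a single element in the page's <head> (the URL here is just a placeholder):

    <link rel="canonical" href="https://example.com/correct-version-of-the-page/" />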