Google 404 soft error on index page that is working fine - indexing

A friend of mine has been having trouble getting her site indexed by google and asked me to have a look, but that is not something I really know much about and was hoping for some assistance.
Looking at her search console, google crawl shows an error of soft-404 on the index page. I marked this as fixed a few times, because the site looks fine to me but it keeps coming back.
If I fetch the site as google it seems to be working fine, although it is showing the mobile version instead of the desktop.
It keeps giving another reoccurring 404 of a page http://www.smeyan.com/new-page, which doesn't exist anywhere I can see including server files or sitemaps.
Here is what I know about this site:
It used to be a wix site and was moved to a host gator shared server 2-3 months ago.
It's using JavaScript/jQuery .load to get page content outside the index.html template.
It has 2 sitemaps one for the URLs and one for both URLs and images
http://www.smeyan.com/sitemap_url.xml http://www.smeyan.com/sitemap.xml
It has been about 2 months since it was submitted for indexing and google has not indexed any of the content when you search for site:www.smeyan.com it shows some old stuff from the wix server. Although search console says it has 172 images indexed.
it has www. as a preference set in search console.
Has anyone experienced this and has an direction for a fix?

How long time was set for this site in Cache-Control header? If long, you should use "google removals" for obsolete snippets and cache. I simulated Google visit on your webpage. Correct 404 return code. Correct headers. Thus. Report google removals for "not found" pages. You must request visit of Googlebot and keep calm and wait for reaction.
BTW: For permanently removed content use 410 Gone for Google or... report via Removals.
https://support.google.com/webmasters/answer/1663419?hl=en

The only download error that I saw while using Chrome's Inspect function pertains to a SCRIPT tag with a Facebook url as the source (src) file.
This is the error as reported by Inspect.
This is the SCRIPT tag that caused the error.
I am not sure that this is the cause of the reoccurring 404 error, but it is an issue that needs attention on this website.

I checked your site with Tor Browser which has... DISABLED SCRIPTS. You should provide any content on your site with use of <noscript/> tag. It doesn't have to be beautiful but should be visible for bots. <a href... ></a>, <img/> etc. and... TEXT. Without it the site is NOT OPTIMIZED for search bots. Read about SEO. The sitemap content can be never indexed if the content will be never linked.
Probably your webpage also doesn't meet requirements for screen readers (for blind people).
Note: The image with "SMEYAN" caption is visible on webpage and is indexed.
second image on the webpage (in source): <img class="gallery-full-image" src="./galleries/home_gallery/smeyan_home-1.jpg" /> and indexed
The menu also doesn't work without scripts.
I thought the step is good implemented.
Please use <noscript/> element and implement version for blind people (without scripts, provide alt tag for images) and for noscript browsers. You can test it via disabling script or via NOSCRIPT extension for Firefox.
BTW. You should use HTML, CSS (including animations) and... use the JS ONLY if it is needed. Or... <noscript/> method.

Google bot currently use web rendering service (WRS) that is based on old Chrome 41 (M41), so it may fail where browsers succeed.
To learn how google boot works read this.
Add this code to the page to see the real error.
You can see the error using Url Inspector live, from google search console. It will show at more info tab.
Note: if the bot gets a 301 code or if the page is too little to have significant content it will return a soft 404 error, and won't preview or show any other error.

Related

CloudFlare failed to purge dynamic content to a blog post

I run my blog that linked with Cloudflare when I create a new post I don't see the current post that affected my site. I tried many methods that I learned from sites like trying to use a page rule to bypass Cloudflare cache but it didn't work. Also, I turn it off auto minify js, CSS, and Html still does not work. my blog still shows the oldest post that is from since 5 days ago. When you log in WordPress dashboard panel you will see the current posts but for the normal visit will see the cached posts that remain static all time
here is my Cloudflare setting
Page Rules Settings
Page Speed Settings
Page Caching Settings
I need your help from everybody who knows about this problem and how we can solve it
Thanks....!
Looking at the site now, looks like you have maybe disabled Cloudflare? I'm seeing the latest posts, and there are no Cloudflare response headers coming back.
For troubleshooting caching issues, one of the most useful things you can do is to inspect the response headers (using Chrome Dev Tools or similar 'Network' tab). First step is to identify which request is responsible for the cached content (a document, or an AJAX call, etc.)
From there, you can look at the response headers to see why it is behaving this way, specifically you'll want to check the Cache-Control and CF-Cache-Status header. More info here - https://developers.cloudflare.com/cache/about/default-cache-behavior
I fixed this problem because all performance is managed by Ezoic. I tried to purge everything on Ezoic and all things are working perfectly.

Google AppEngine API Explorer redirects and lists no URLs

I'm having an unending issue trying to use the AppEngine API explorer with the stupidly simple helloworld example.
When trying to navigate to the url to explore the API my Chrome browser redirects to HTTPS from the default HTTP and no API's are listed. I have gone through every possible fix I can find (Like this, and all of these) and none are working reliably.
What's the most infuriating is I have gotten the API listed TWICE but now no longer displays with any of the methods below.
The setup I had when it worked the first time:
Chrome launched with "C:\Program Files (x86)\Google\Chrome\Application\Chrome.exe" --unsafely-treat-insecure-origin-as-secure=http://localhost:8080 (As per the tutorial)
The url being: (http://)apis-explorer.appspot.com/apis-explorer/?base=http://localhost:8080/_ah/api&root=http://localhost:8080/_ah/api#p/
The second time it worked was using also using the above URL but lasted only a second before being redirected to HTTPS and not listing anything.
Some specifics:
Windows 10 OS.
Every time the page loads I get the "The API you are exploring is hosted over HTTP, which can cause problems. Learn how to use Explorer with a local HTTP API." message, even the times the API displayed correctly.
Every time I now load any of the API Explorer URLs I get redirected to HTTPS, and nothing is listed. Also the URL is escaped (%3A instead of ':'). Not sure if it's important but the first time it worked the URL was HTTP and NOT escaped.
I have tried the shield in the search bar and enabling Load unsafe scripts ( from here).
Tried launching Chrome as usual and with the flags --unsafely-treat-insecure-origin-as-secure=http://localhost:8080 and/or --allow-running-insecure-content (from this answer).
Tried http://localhost:8080/_ah/api/explorer
Tried http://apis-explorer.appspot.com/apis-explorer/?base=http://localhost:8080/_ah/api#p/
http://localhost:8080/_ah/admin works correctly and shows the Admin console every time.
Since the API's being listed once I haven't touched the project code, but restarted the server, Chrome, and tried different URLs on more occasions than I care to count.
I also tried accessing the API URL directly as explained in this answer but cannot find the correct URL to access the helloworld /sayHi endpoint. Maybe someone can help me work out what I need to prefix it with as all of the variations I try give me a 404.
Any help would be a very very appreciated.

Firefox: "Some parts of this page are not secure, such as images." What counts as insecure?

I use the browser Firefox, and sometimes, on certain webpages, the SSL icon says "Some parts of this page are not secure, such as images." What, exactly, counts as an insecure element?
Thanks!
Anything that is delivered over an insecure channel.
What this generally means is that the developer of the web page is combining HTTP-based URLs with HTTPS-based URLs in the same page. The URLs could be for images as well as JavaScript, CSS, or anything else that can be referenced from a web page. As a user, there's not much you can do about this -- it's a warning that there is a possibility that your data could be delivered to other servers in an open, unencrypted manner over the Internet. This is a Bad Thing, but you can't do much except avoid that site, or contact the support or webmaster for the site.
If you're the developer, most of the time you can use a scheme-relative URLs when referencing images or javascript, etc.
i.e. Instead of this:
<img src="http://example.com/dot.png">
use this:
<img src="//example.com/dot.png">
YMMV.
See also: https://url.spec.whatwg.org/
In firefox you can see in Inspect Element=>Network Tab=>Domain Columns.
And also please check in Console tab too.
I hope it will solve your problem.
"Insecure" simply means "not loaded via HTTPS".
This is insecure:
<img class="media-object" src="http://placehold.it/50x50">
This is secure:
<img class="media-object" src="https://placehold.it/50x50">
"Some parts of this page are not secure, such as images."
means not all content are are loading with secure https you can use this online tool to determine which resource is loading with http whynopadlock
This is My Solution
To resolve this issue make sure that the page code does not pull data directly from a non-secure URL.
View the page source html code to check for non-secure items. This can be done in a web browser by doing a right click and selecting 'view source'.
To identify non-secure elements view the source code of the page and search for the text src="http://
This will then highlight elements on your page being loaded from a non-secure URL.
The source code (HTML) needs to be checked for NON SECURE tags. (i.e. http://www.symantec.com/images/seals/Secure...) Ensure the following references are changed to HTTPS or a virtual directory.
Note: The webmaster should always be consulted prior to any adjustments made to a web site.
Thank You and I hope this will Help out
For wordpress users that can't find any 'http:' in the page source check if you have a favicon set. Wordpress will default to their W icon (w-logo-blue.png) and I've had a couple sites continue to serve it from http even after fully converting to ssl.
Dashboard -> Appearance -> Customize -> Site Identity -> add a site icon
FOR IMAGES
First of all, check whether your image file has HTTP instead of HTTPS if so change it to https or rather save those images and put in in the Server.
For instance,
<img src="http://example.com/images/image.jpg">(http image source)
to
<img src="https://example.com/images/image.jpg">(https image source)or into
<img src="//example.com/images/image.jpg">(server image)
Firefox throws an error when you have mixed active content. This is having a combination of HTTP and HTTPs requests; it's a security issue as it leaves room for man in the middle attack- intercepting HTTP content requests with malicious or unwanted requests.
Tip: Check all the following urls in your active content:
< script >
< link >
< iframe >
< XMLHttpRequest >
fetch()
urls in CSS (#font-face,cursor,background.image)
< object >
Navigator.sendBeacon (look for url)
Another tip: make sure to check all files (css gets overlooked)

Url rewrite for an existing website

I hope this is not a very stupid question. I have to rewrite the url's of an existing website. I have never done that before, so I'm a little bit confused. I have allready written some rules that make a page like faq.php be accessible also by entering only faq without the .php extension, but the page fap.php it's still accessible and in the footer when I move the mouse over the faq icon the link shown in the browser is faq.php. To make the browser display faq and not faq.php do I have to change the href link in the page manually or there is another way?
For example: in this website(stackoverflow) when you move the mouse over questions, tags, users or other links, the browser displays only /questions (or the others) without .php(or whatever they used to built the website). How can I achieve that?
What you see in the status bar is what your server sends to the browser.
To remove .php from faq.php you must either
send only faq (PHP) or
replace the contents of href attributes on the client side (JavaScript).

why are some pages in my start directory not showing in my site map?

If i generate a site map at http://www.xml-sitemaps.com/, for aromapersona.com, I get 9 pages, however there are a bunch more pages that should show up. For example, aromapersona.com/candle_holder is in the same "front" directory as the other 9 pages, but doesnt generate in sitemap. Is this because no other pages on my site link to it? Im trying to get these other URLs indexed, and I even edited the site map to include this URL as well as others and submitted to google via webmaster tools, and still nothing. Advice?
I'm not familiar with aromapersona.com but it will only be able to list pages that are linked to from the initial page you give it (or ones they link to) unless you provide the site with FTP access (which I presume you dont).
If you include the URL's in your sitemap for goggle it should eventually list them, but linking to them from other parts of your site is probably the most effective.
I have not checked the website, but do also take the cause is not because of noindex, nofollow, robots.txt, javascript links, mixing http/https etc.
In clear wording: There is no link pointing to the subpage "candle_holder", hence the XML site generator (which works by following links on your site) cannot detect it.
You can add it manually to the XML, but then again, it should be accessible from the site directly.