Why won't my sitemap validate after following Google's advice for hreflang elements? - seo

After adding multilingual support to a site, I followed Google's advice on XML sitemaps and added hreflang links to the alternate language versions.
However, after doing this I'm finding various online validators (such as this one) are returning validation errors along these lines:
Element '{http://www.w3.org/1999/xhtml}link': No matching global
element declaration available, but demanded by the strict wildcard
A sanitised extract of my sitemap is below. I've preserved the actual formatting (not neatly indented due to how the framework renders it) in case that matters:
<?xml version="1.0" encoding="UTF-8" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
http://www.w3.org/1999/xhtml
http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd"><url>
<loc>http://dev.domain.com/template/en-GB/</loc>
<lastmod>2017-02-17</lastmod>
<xhtml:link rel="alternate" hreflang="en-GB" href="http://dev.domain.com/template/en-gb/" /><xhtml:link rel="alternate" hreflang="de" href="http://dev.domain.com/template/de-de/" /><xhtml:link rel="alternate" hreflang="en-PH" href="http://dev.domain.com/template/en-gb/" /><xhtml:link rel="alternate" hreflang="cs-CZ" href="http://dev.domain.com/cz/cs-cz/about-domain/thing" /><xhtml:link rel="alternate" hreflang="af-ZA" href="http://dev.domain.com/cz/cs-cz/about-domain/thing" /><xhtml:link rel="alternate" hreflang="ar-QA" href="http://dev.domain.com/cz/cs-cz/about-domain/thing" /><xhtml:link rel="alternate" hreflang="de-DE" href="http://dev.domain.com/de/de-de/applications" /><xhtml:link rel="alternate" hreflang="x-default" href="http://dev.domain.com/template/en-gb/" />
</url><url>
<loc>http://dev.domain.com/template/en-gb/about-domain-en</loc>
<lastmod>2017-02-28</lastmod>
<xhtml:link rel="alternate" hreflang="en" href="http://dev.domain.com/template/en-gb/about-domain-en" /><xhtml:link rel="alternate" hreflang="de-DE" href="http://dev.domain.com/template/de-de/uber-domain" /><xhtml:link rel="alternate" hreflang="en-PH" href="http://dev.domain.com/template/en-gb/about-domain-en" />
</url><url>
<loc>http://dev.domain.com/template/en-gb/about-domain/thing-en</loc>
<lastmod>2016-10-20</lastmod>
<xhtml:link rel="alternate" hreflang="en-GB" href="http://dev.domain.com/template/en-gb/about-domain/thing-en" /><xhtml:link rel="alternate" hreflang="de-DE" href="http://dev.domain.com/template/de-de/uber-domain/thing" /><xhtml:link rel="alternate" hreflang="en-PH" href="http://dev.domain.com/template/en-gb/about-domain/thing-en" />
</url></urlset>
I can't see any issues with the XML, and it appears to follow Google's advice precisely. Am I safe to deploy this to production, or is Google going to penalise the site until I work out what the problem is?

A sitemap only helps Google understand how to crawl your site; in no case will it cause a penalty. If the sitemap is not set up properly, that will not lower your site's current position in the SERPs, but URLs that Google does not already know about may go unindexed if they are not linked from anywhere else.
It is a different matter if the content of your URLs does not comply with Google's best practices; any penalty would come from that, not from the sitemap.
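Separately from schema validation, it can be reassuring to confirm that the file is well-formed and that the xhtml:link elements are read in the expected namespace. The following is a minimal sketch using only the Python standard library; the function name `alternates_by_url` and the shortened sample document are assumptions made for illustration, and the check says nothing about how Google itself processes the file.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used for lookups; the URIs match those in the sitemap above.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Shortened sample based on the second <url> entry of the sitemap above.
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>http://dev.domain.com/template/en-gb/about-domain-en</loc>
<xhtml:link rel="alternate" hreflang="en" href="http://dev.domain.com/template/en-gb/about-domain-en" />
<xhtml:link rel="alternate" hreflang="de-DE" href="http://dev.domain.com/template/de-de/uber-domain" />
</url>
</urlset>"""


def alternates_by_url(xml_bytes):
    """Map each <loc> to its {hreflang: href} alternates."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the
    # document is not well-formed XML.
    root = ET.fromstring(xml_bytes)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        result[loc] = {
            link.get("hreflang"): link.get("href")
            for link in url.findall("xhtml:link", NS)
        }
    return result


parsed = alternates_by_url(sitemap_xml.encode("utf-8"))
for loc, alternates in parsed.items():
    print(loc, alternates)
```

If this parses cleanly and lists the alternates you expect, the document is at least well-formed and correctly namespaced, which is a narrower property than validity against the sitemap schema.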

Related

SEO - hreflang "no return tags"

I have recently been seeing an increased number of "hreflang no return tags" errors in the Google webmaster console, and I cannot figure out what I am missing. My site is www.example.com and it can be accessed in different languages as www.example.com/#!/xx, where xx is one of the following options: it, ro, ru, pt, en, es, fr.
My code snippet looks like:
<link view-head rel="alternate" hreflang="x-default" href="{{domain_absolute}}#!/{{mainVars.currentLanguage}}/--about-us" />
<link view-head rel="alternate" hreflang="es" href="{{domain_absolute}}#!/es/--about-us" />
<link view-head rel="alternate" hreflang="pt" href="{{domain_absolute}}#!/pt/--about-us" />
<link view-head rel="alternate" hreflang="ro" href="{{domain_absolute}}#!/ro/--about-us" />
<link view-head rel="alternate" hreflang="ru" href="{{domain_absolute}}#!/ru/--about-us" />
<link view-head rel="alternate" hreflang="en" href="{{domain_absolute}}#!/en/--about-us" />
<link view-head rel="alternate" hreflang="it" href="{{domain_absolute}}#!/it/--about-us" />
<link view-head rel="alternate" hreflang="fr" href="{{domain_absolute}}#!/fr/--about-us" />
And the errors I get from Google are the following:
Original URL : #!/en/some-document
Alternate URLs: http://www.example.com/?_escaped_fragment_=/en and http://www.example.com/?_escaped_fragment_=/en/some-document - no return tags
I get the same errors for all of the supported languages.
What am I doing wrong?
# is a special character. Everything in a URL after the # is ignored.
#! is a special case. If you use #! then Google treats it as a signal to convert it to a different URL for AJAX crawling. This "feature" is deprecated. And your URL structure tells me you are not using #! in this way anyway.
Bottom line: change your URLs so that they do not use #, and give each language its own URL.
Although it is deprecated, Google still understands #! and converts the URL to the escaped fragment version. Since Google is telling you it cannot find the return tags on the /?_escaped_fragment_ version of the URL, this suggests that the page you serve in response to that rewritten request is missing the hreflang elements.
View the source of http://www.example.com/?_escaped_fragment_=/en/some-document. If you don't see the hreflang markup you show in your question, that is your problem. Make sure it is present on both versions; and yes, the annotations should also cover the escaped fragment version of the URL. Once they do, the error will go away.
The alternative I use for sites with more than 5 country/language combinations is to use XML sitemaps and submit the escaped fragment version.
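The "no return tags" condition can also be checked offline once the annotations of each page have been collected. The following is a minimal sketch under that assumption; the function name `find_missing_return_tags`, the dictionary layout, and the sample URLs are hypothetical, and the logic is only an approximation of the rule, not Google's actual check.

```python
from typing import Dict, List, Tuple


def find_missing_return_tags(
    annotations: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Return (page, alternate) pairs where the alternate does not link back.

    `annotations` maps a page URL to its {hreflang: alternate URL} entries.
    """
    missing = set()
    for page, alternates in annotations.items():
        for target in alternates.values():
            if target == page:
                continue  # a self-reference needs no return tag
            # Alternates declared by the target page (empty if it was never collected).
            back_links = annotations.get(target, {}).values()
            if page not in back_links:
                missing.add((page, target))
    return sorted(missing)


pages = {
    "http://www.example.com/en/about": {
        "en": "http://www.example.com/en/about",
        "es": "http://www.example.com/es/about",
    },
    # The Spanish page only references itself and forgets the English page.
    "http://www.example.com/es/about": {
        "es": "http://www.example.com/es/about",
    },
}

for page, target in find_missing_return_tags(pages):
    print(f"{target} does not link back to {page}")
```

Running the same check against the URLs Google actually requests (including any escaped fragment variants) would show whether both versions of a page carry the same annotations.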

SEO - Canonical url and multilanguage site

Our website targets several languages and countries. We chose to use subdomains to manage our URLs, and we want to avoid creating duplicate content and canonical issues. The content for www. and en. is currently identical, but we plan to adapt the content for en. (in order to target the UK).
For the main domain, Google sees:
<link href="https://www.example.com/" hreflang="en" rel="canonical" data-trid="15">
<link href="https://example.com/" hreflang="x-default" rel="alternate" data-trid="16">
<link href="https://example.com/" hreflang="en" rel="alternate" data-trid="17">
<link href="https://en.example.com/" hreflang="en-gb" rel="alternate" data-trid="18">
<link href="https://fr.example.com/" hreflang="fr" rel="alternate" data-trid="19">
<link href="https://de.example.com/" hreflang="de" rel="alternate" data-trid="20">
For the English subdomain:
<link href="https://en.example.com/" hreflang="en-gb" rel="canonical">
<link href="https://example.com/" hreflang="x-default" rel="alternate">
<link href="https://example.com/" hreflang="en" rel="alternate">
<link href="https://en.example.com/" hreflang="en-gb" rel="alternate">
<link href="https://fr.example.com/" hreflang="fr" rel="alternate">
<link href="https://de.example.com/" hreflang="de" rel="alternate">
What is the best practice to avoid any canonical & duplicate content issues?
Thanks for your help!
Common Mistakes
Important: Make sure that your provided hreflang value is actually valid. Take special care in regard to the two most common mistakes:
Missing confirmation links: If page A links to page B, page B must link back to page A. If this is not the case for all pages that use hreflang annotations, those annotations may be ignored or not interpreted correctly.
Incorrect language codes: Make sure that all language codes you use identify the language (in ISO 639-1 format) and optionally the region (in ISO 3166-1 Alpha 2 format) of an alternate URL. Specifying the region alone is not valid. Via / Read:
https://support.google.com/webmasters/answer/189077?hl=en
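The language-code rule quoted above can be approximated with a shape check. This is a minimal sketch, assuming only the forms described in this thread (a two-letter language, an optional two-letter region, or `x-default`); the names `HREFLANG_SHAPE` and `has_valid_shape` are hypothetical. It checks the shape only and does not confirm that a code actually exists in ISO 639-1 or ISO 3166-1, so a region code used on its own would still pass and needs a lookup against the real code lists.

```python
import re

# Shape check only: two-letter language, optional "-" plus two-letter region,
# or the special value "x-default". Matching is case-insensitive.
HREFLANG_SHAPE = re.compile(r"x-default|[a-z]{2}(?:-[a-z]{2})?", re.IGNORECASE)


def has_valid_shape(value: str) -> bool:
    """Return True if `value` has the shape of a language[-region] code."""
    return HREFLANG_SHAPE.fullmatch(value) is not None


for candidate in ["en", "en-gb", "x-default", "en_GB", "eng", ""]:
    print(repr(candidate), has_valid_shape(candidate))
```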
The best way to avoid duplicates is to create unique content for each version. If you can't add that content now, you have to block these pages or subdomains with robots.txt, or add a canonical link pointing to the original page, which is en.example.com in your case.
Once you have content for each version, remove these restrictions so that Google can index them.

What should be the name of the sitemap file for Google SEO?

I created a sitemap for my website that contains the below code:
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://www.example.com/</loc>
</url>
<url>
<loc>http://www.example.com/aboutus.html</loc>
</url>
<url>
<loc>http://www.example.com/contactus.html</loc>
</url>
<url>
<loc>http://www.example.com/careers.html</loc>
</url>
<url>
<loc>http://www.example.com/terms.html</loc>
</url>
</urlset>
My question is: what should the file containing this code be named so that Google can find my sitemap?
There is no fixed name/location defined.
You can tell search engines where your sitemap is located by
linking it from your robots.txt,
sending an HTTP request to each search engine, or
submitting it to each search engine.
Of course the easiest way is to include the URL in your robots.txt, as you only have to do this one time, instead of contacting each search engine separately. And note that not only search engines are interested in your sitemap, so including it in your robots.txt allows other consumers to find it also.
Simply add this line, containing the full URL to your sitemap, in your robots.txt:
Sitemap: http://www.example.com/your-sitemap-file.xml
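To confirm that a robots.txt file exposes the sitemap as intended, the standard library parser can read the `Sitemap:` lines back. This is a minimal sketch assuming Python 3.8 or later (where `RobotFileParser.site_maps()` is available); the `Disallow` rule in the sample file is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; only the Sitemap line comes from the answer above.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/your-sitemap-file.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if no Sitemap line exists.
sitemaps = parser.site_maps()
print(sitemaps)
```

For a live site, `parser.set_url(...)` followed by `parser.read()` would fetch the file over HTTP instead of parsing a string.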
By convention, an XML sitemap is usually named /sitemap.xml.
Using any other name is not a problem, but you then have to submit it in the webmaster tools.

Sitemap with alternate langs urls, but same images

I'm trying to generate the XML sitemap for an ecommerce website. It is a multilanguage project: the product pages are in multiple languages, but the product images are the same for every language. Reading Google's post on multilingual sitemaps and here, it seems that I must do this:
<url>
<loc>http://www.example.com/en/product-1</loc>
<xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/de/product-1" />
<xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/en/product-1" />
<image:image>
<image:loc>http://www.example.com/image1.jpg</image:loc>
</image:image>
<image:image>
<image:loc>http://www.example.com/image2.jpg</image:loc>
</image:image>
</url>
<url>
<loc>http://www.example.com/de/product-1</loc>
<xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/de/product-1" />
<xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/en/product-1" />
<image:image>
<image:loc>http://www.example.com/image1.jpg</image:loc>
</image:image>
<image:image>
<image:loc>http://www.example.com/image2.jpg</image:loc>
</image:image>
</url>
So my question is: must I repeat the images under every URL if they are always the same?
Thanks in advance!
If an image is surrounded by German text, it will rank in German. Same for other languages.
If your pages point to those images, Google will find them and rank them properly. So the answer to your question would be no.

Multi-language site - one domain or multiple domains?

Which is the best way to construct a multi-language site if you want the most effective SEO?
Use a single domain?
www.domain.com/en
www.domain.com/de
www.domain.com/dk
Or use multiple domains, one for each language?
www.domain.com
www.domain.de
www.domain.dk
If you have multi-lingual content you should follow Google's new multi-lingual guidelines. Basically, you use subdomains for the different translations:
To explain how it works, let’s look at some example URLs:
http://www.example.com/ - contains the general homepage of a website, in Spanish
http://es-es.example.com/ - is the version for users in Spain, in Spanish
http://es-mx.example.com/ - is the version for users in Mexico, in Spanish
http://en.example.com/ - is the generic English language version
On all of these pages, we could use the following markup to specify language and optionally the region:
<link rel="alternate" hreflang="es" href="http://www.example.com/" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" />
<link rel="alternate" hreflang="en" href="http://en.example.com/" />
If you specify a regional subtag, we’ll assume that you want to target that region.
Keep in mind that all of these annotations are to be used on a per-URL basis. You should take care to use the specific URL, not the homepage, for both of these link elements.
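Because the same set of link elements goes on every language version of a given URL, it is convenient to generate them from a single mapping. This is a minimal sketch; the function name `hreflang_links` and the dictionary layout are hypothetical, and the sample values are the example URLs quoted above.

```python
from html import escape


def hreflang_links(alternates):
    """Build one <link> element per (hreflang, URL) pair."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in alternates.items()
    ]


# One mapping per page; every language version of that page emits the same set.
homepage_alternates = {
    "es": "http://www.example.com/",
    "es-ES": "http://es-es.example.com/",
    "es-MX": "http://es-mx.example.com/",
    "en": "http://en.example.com/",
}

for tag in hreflang_links(homepage_alternates):
    print(tag)
```

For an inner page, the mapping would hold that page's specific URL in each language rather than the homepages, in line with the per-URL note above.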