How to add hreflang tags in SAP Spartacus - spartacus-storefront

We have multiple base stores with multiple languages and want to add hreflang tags to all pages (if easier, cart and checkout may be excluded) to tell search engines that there are multiple versions of the same page in different languages.
We followed the guide on Automatic Context Configuration to set up the multiple base stores.
We also added the SiteContextSelectorComponent to allow the user to change the language.
At the moment all pages share the same structure:
https://company.com/us/en/c/CategoryName -> where us is the country and en the language
Other examples:
https://company.com/us/es/c/CategoryName -> us store with spanish language
https://company.com/us/de/c/CategoryName -> us store with german language
https://company.com/de/de/c/CategoryName -> de german store with german language
The hreflang tags should be generated like this:
<link rel="alternate" hreflang="en-us" href="https://company.com/us/en/c/CategoryName" />
<link rel="alternate" hreflang="es-us" href="https://company.com/us/es/c/CategoryName" />
<link rel="alternate" hreflang="de-us" href="https://company.com/us/de/c/CategoryName" />
<link rel="alternate" hreflang="de-de" href="https://company.com/de/de/c/CategoryName" />
<link rel="alternate" hreflang="x-default" href="https://company.com/c/CategoryName" />
Working with meta resolvers might not work, because the tags are <link> elements, not <meta> elements.
So what's the Spartacus way to do this?
An Angular solution might be to inject the document and add the tags that way. But is this really the preferred approach?
A further question: where is the right place to add these tags (on language switch? on route change?)
Kind regards and thanks for your help,
Andreas

At the moment of writing, Spartacus doesn't have a mechanism for declaring language-oriented alternate URLs in the <head> of the document.
However, Spartacus already implements a mechanism for resolving the canonical URL. It's based on the concept of multiple PageMetaResolvers and the core PageMetaService. The SeoMetaService then observes the resolved metadata for the current page and updates the document with the appropriate canonical URL link, title, and other data.
You will need to write a customization for the "alternate" links. Here are my recommendations on how to do it:
extend the SeoMetaService, and perhaps the PageMetaLinkService, for physically placing the links in the <head> of the document
perhaps extend the PageMetaConfig and PageMetaService for resolving the specific values for your alternate links
The concrete strategy for resolving the "alternate" links might vary from customer to customer. Your case seems to be simple, since you don't have any localisation of the URL segments (e.g. /category/xxx in English vs. /categoría/xxx in Spanish). Your URLs seem to vary only in the URL prefix, i.e. the standard URL site context. You can therefore derive your specific "alternate" links from the canonical URL (which Spartacus has already implemented) and replace only the language in the prefix of the URL path, e.g. using a RegExp.
Since you don't need a different strategy for resolving "alternate" links on different pages (e.g. one strategy for PDP and another for PLP), you might "hardcode" your way of resolving "alternate" links in either your customized PageMetaService or SeoMetaService. In other words, you won't need to bother with customizing each and every existing PageMetaResolver.
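To illustrate the RegExp idea, here is a rough sketch of a helper that derives the alternate links from a canonical URL with the /{country}/{language}/ prefix structure from the question. The Spartacus service names above are real, but this function, its wiring into a customized SeoMetaService, and the context list are hypothetical:

```typescript
interface AlternateLink {
  hreflang: string;
  href: string;
}

// Sketch only: derives alternate URLs from a canonical URL of the shape
// https://host/{country}/{language}/rest-of-path by swapping the prefix
// for every configured site context. How you obtain the contexts (e.g.
// from SiteContextConfig) is left out here.
function buildAlternateLinks(
  canonicalUrl: string,
  contexts: { country: string; language: string }[],
  defaultUrl?: string
): AlternateLink[] {
  const prefixPattern = /^(https?:\/\/[^/]+)\/([a-z]{2})\/([a-z]{2})(\/.*)?$/;
  const match = canonicalUrl.match(prefixPattern);
  if (!match) {
    return [];
  }
  const [, origin, , , rest = ''] = match;
  const links: AlternateLink[] = contexts.map(({ country, language }) => ({
    hreflang: `${language}-${country}`,
    href: `${origin}/${country}/${language}${rest}`,
  }));
  if (defaultUrl) {
    links.push({ hreflang: 'x-default', href: defaultUrl });
  }
  return links;
}
```

In a customized SeoMetaService you would then render each entry as a <link rel="alternate"> element, e.g. via Angular's Renderer2 or the injected document.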

Related

Google URL Parameter tool - what to exclude?

Situation: Site built on OpenCart, which utilizes faceted navigation.
Problem: Google Webmaster Tools' "URL Parameters" tool reports a huge number of URLs with parameters like "sort", "order", "limit", "search", and "page".
I would like to exclude them, but I'm worried about 2 things:
1.) Maybe there's a better way to handle this issue? Exclusion directives in robots.txt? Something else? I.e. fixing the problem on the site, before Google detects it in the first place.
2.) I don't want to accidentally exclude actual content.
So... anyone familiar with SEO and/or OpenCart, please give me a 2nd opinion on which of these parameters I should exclude, or change the settings for?
Thanks!
I'm not aware of a robots.txt option, but you might solve this using HTTP headers and/or HTML meta tags.
You could set <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> for duplicate pages in the HTML <head> (cf. http://www.robotstxt.org/meta.html).
Another approach could be to provide canonical URLs (e.g. in the HTML <head> or via HTTP headers, cf. https://yoast.com/rel-canonical/, https://support.google.com/webmasters/answer/139066 and https://support.google.com/webmasters/answer/1663744).
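As a sketch of the canonical approach, here is a small helper that builds the equivalent Link HTTP header value by stripping the faceted-navigation parameters. The parameter names come from the question; the host and exact parameter set are assumptions you would adapt to your OpenCart setup:

```typescript
// Sketch: builds a canonical Link header value by removing the
// faceted-navigation query parameters named in the question, leaving
// any genuinely content-bearing parameters intact.
function canonicalLinkHeader(url: string): string {
  const facetParams = ['sort', 'order', 'limit', 'search', 'page'];
  const u = new URL(url);
  for (const p of facetParams) {
    u.searchParams.delete(p);
  }
  return `<${u.toString()}>; rel="canonical"`;
}
```

The returned value would be sent as a `Link` response header, which search engines accept as an alternative to the in-page <link rel="canonical"> element.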

Using noindex and nofollow to avoid duplicate content penalization

Scenario:
I own a website with original content. But to support some categories I use creative commons licensed contents, which is, of course, duplicate content.
Question:
If I want to avoid penalization for duplicate content, are these statements true?
I should mention the original author to be a fair human being.
I must use meta noindex to prevent robots from indexing the content.
I must use a canonical URL to mention the original content and its author.
I don't need to use the nofollow meta along with noindex, because it has other purposes.
I don't have to use rel="nofollow" on incoming links inside my site that point to the duplicate content, because it won't be indexed anyway, given the noindex meta tag.
I did my research and that is what I got from it. But I am not sure about this, and I would like to understand it before applying anything at all.
Thank you.
In order to avoid the penalization for duplicate content, you can of course use meta noindex and rel="nofollow". Here is the syntax:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This tells robots not to index the content of a page, and/or not scan it for links to follow.
There are two important considerations when using the robots <META> tag:
Robots can ignore your <META> tag. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
The NOFOLLOW directive only applies to links on this page. It's entirely likely that a robot might find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrive at your undesired page.
Don't confuse this NOFOLLOW with the rel="nofollow" link attribute.

Google only indexing English content for multi-language website -- default not English

I have a multi-language website. The default language is German. Google seems to be indexing only the English version at example.de/en/. The redirect being used is based on geolocation. For example if someone visits the site from outside of Germany, they will see the /en/ site.
From what I understand, Google's crawlers will end up being redirected because they are based in the USA, and thus will only index the English version of the site both on google.com and google.de. Since my site primarily targets Germany, I want to make sure that when someone searches on google.de they will see the German site in the results. What is the best way for me to go about this? I am currently using the hreflang property. For example, on the English site we have this code:
<link rel="alternate" hreflang="de-DE" href="https://mysite.de/" />
And on the German site we have this code:
<link rel="alternate" hreflang="en-US" href="https://mysite.de/en/" />
Shouldn't Google recognize this and display relevant search results based on which version of Google is being queried?
You are not using the hreflang links properly. You need to use x-default for the default language, and you need to add multiple hreflang links, in each page, to specify the alternate versions of your pages. You could also achieve this by using hreflang in your sitemap. Check here for more details.
Right now, Google is probably confused by what you are trying to achieve, because the information is incomplete. It probably tries to play it safe to avoid duplicate content.
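To illustrate the sitemap route, here is a sketch of a generator for the xhtml:link entries that the sitemap hreflang format expects for one <url> element. The URL set mirrors the question; treating the German root as x-default is an assumption based on German being the stated default language:

```typescript
// Sketch: every language version of a page must list all versions,
// including itself, plus an x-default entry for the fallback.
function sitemapAlternates(
  entries: { hreflang: string; href: string }[]
): string {
  return entries
    .map(
      (e) =>
        `<xhtml:link rel="alternate" hreflang="${e.hreflang}" href="${e.href}"/>`
    )
    .join('\n');
}

const xml = sitemapAlternates([
  { hreflang: 'de-DE', href: 'https://mysite.de/' },
  { hreflang: 'en-US', href: 'https://mysite.de/en/' },
  { hreflang: 'x-default', href: 'https://mysite.de/' },
]);
```

The same complete set (all versions plus x-default) would go in the <head> of both the German and the English page if you stay with in-page link tags instead.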

Editing the head element on an old blog platform on a post-by-post basis. Is this impossible or am I missing something?

Sorry for being a total rookie.
I am trying to help my professor implement this advice:
Either as a courtesy to Forbes or a favor to yourself, you may want to include the rel="canonical" link element on your cross-posts. To do this, on the content you want to take the backseat in search engines, you add the link element in the head of the page. The URL should be for the content you want to be favored by search engines. Otherwise, search engines see duplicate content, grow confused, and then get upset. You can read more about the canonical tag here: http://www.mattcutts.com/blog/canonical-link-tag/. Have a great day!
The problem is I am having trouble figuring out how to edit the head element on a post-by-post basis. We are currently on a super old blogging platform (Movable Type 3.2 from 2005), so maybe it is not possible. But I'd like to know if that is likely the reason, so I'm not missing out on a workaround.
If anyone could point me in the right direction, I would greatly appreciate it!
Without knowing much about your installation, I'll give a general description, and hopefully it matches what you see and helps.
In Movable Type, each blog has a "Design" section where you can see and edit the templates for the blog. On this page, the templates that are published once are listed under "Index Templates," and the templates published multiple times, once per entry, per category, etc., are listed under "Archive Templates."
There probably is an archive template called "Entry" (could be renamed) publishing to a path like category/sub-category/entry-basename.php. This is the main template that publishes each entry. Click on this to open the template editor.
This template could be an entire HTML document, or it might have "includes" that look like <MTInclude module=""> or <$mt:Include module=""$> (MT supports varying tag styles).
You may find there is an included module that contains the <head> content, or it might just be right in that template. To "follow" the includes and see those templates, there should be links on the side of the included templates.
Once you find the <head> content, you can add a canonical link tag like this:
<mt:IfArchiveType type="Individual">
<mt:If tag="EntryPermalink">
<link rel="canonical" href="<$mt:EntryPermalink$>" />
</mt:If>
</mt:IfArchiveType>
Depending on your needs, you might want to customize this to output a specific URL structure for other types of content, like category listings. The above will just take care of telling search engines the preferred URL for each entry.
#Charlie: maybe I'm missing something, but your solution basically places a canonical link on each entry to… itself, which is a no-no for search engines (the link should point to another page that's considered the canonical one).
#user2359284: you need a way to define the canonical entry for those which need this link. As Shmuel suggested, either reuse an unused field or use a custom field plugin. Then you simply add that link in the header, in the proper archive template that outputs your notes. Assuming the Entry template includes the same header as other templates, and that, say, you're using the Keywords field to set the URL, the following code should work (the mt:IfArchiveType test simply ensures it's output in the proper context, which you don't need if your Entry template has its own code for the header):
<mt:IfArchiveType type="Individual">
<link rel="canonical" href="<$mt:EntryKeywords$>" />
</mt:IfArchiveType>

Google Webmaster Tools - Remove query parameters from URL

I am using JBoss Seam on a Jetty web server and am having some issues with the query parameters breaking links when they appear in google searches.
The first parameter is one that JBoss Seam uses to track conversations, cid or conversationId. This is a minor issue, as Google is complaining that I am submitting different URLs with the same information.
Secondly, would it make sense to publish/remove urls via the Google Webmaster API instead of publishing/removing via the sitemap?
Walter
Hey Walter, I would recommend that you use the rel=canonical tag to tell the search engines to ignore certain parameters in your URL strings. The canonical tag is a common standard that Google, Yahoo and Microsoft have committed to supporting.
For example, if JBoss is creating URLs that look like this: mysite.com?cid=FOO&conversationId=BAR, then you can create a canonical tag in the <head> section of your pages like this:
<html>
<head>
<link rel="canonical" href="http://mysite.com" />
</head>
</html>
The search engines will use this information to normalize the URLs on your website to the canonical (or shortest and most authoritative) version. Specifically, they will treat this as a 301 redirect from the URL of the HTTP request to the URL specified in the canonical tag (as long as you haven't done anything silly, like making it an infinite loop or pointing to a URL that doesn't exist).
While the canonical tag is pretty fricken cool, it is only a 90% solution, in that you can still run into issues with metrics tracking with all the extra parameters on your website. The best solution would be to update your infrastructure to trap these tracking parameters, create a cookie, and then use a 301 redirect to redirect the URL to the canonical version. However, this can be a prohibitive amount of work for that extra 10% gain, so many people prefer to start with the canonical tag.
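As a sketch of the "trap the tracking parameters, then 301" idea, the helper below computes the redirect target for a request URL. The parameter names cid and conversationId come from the question; the cookie-setting step is omitted:

```typescript
// Sketch: returns the 301 redirect target with JBoss Seam's conversation
// parameters removed, or null when the URL is already canonical (so no
// redirect should be issued).
function conversationRedirectTarget(url: string): string | null {
  const tracking = ['cid', 'conversationId'];
  const u = new URL(url);
  const hadTracking = tracking.some((p) => u.searchParams.has(p));
  if (!hadTracking) {
    return null;
  }
  for (const p of tracking) {
    u.searchParams.delete(p);
  }
  return u.toString();
}
```

Before issuing the redirect you would stash the stripped values in a cookie (or session) so the conversation state survives the hop.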
As for your second question, generally you don't want to remove these URLs from Google if people are linking to them. By using the canonical tag you achieve the same goal, but don't lose any of the value of the inbound links to your website.
For more information about the canonical tag, and the specific issues & solutions, check out this article I wrote on it here: http://janeandrobot.com/library/url-referrer-tracking.
Google Webmaster Tools will tell you about duplicate titles and other issues that Google sees being caused by "duplicates" that are really the same page being served up under two different URL versions. I suggest trying to make sure the number of errors listed under duplicate titles in your Webmaster Tools account is as close to zero as possible.