Should the home page canonical URL end in a "/" or does it matter? - canonical-link

For the home page's canonical URL...
<link rel="canonical" href="http://mysite.com" />
OR
<link rel="canonical" href="http://mysite.com/" />
AND
Where should it be placed within the head section? at the top, somewhere in the middle, or last?

The rel="canonical" attribute should be used only to specify the preferred
version of many pages with identical content (although minor differences,
such as sort order, are okay).
That's from Google's description of the canonical link. By that definition the tag isn't meant for the site's root, so this question is moot.

Should it? I don't think there's a single answer to that. Does it matter? Yes, in many cases. They are two different URLs, and there can be cases (such as with web optimization or analytics) where code looks at the URL and makes decisions based on what it contains. If it expects "www.mysite.com/page?test=1" and sees "www.mysite.com/page/?test=1", it won't work. Server redirects may also be affected by the difference.
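To illustrate the point, here is a small Python sketch (not from the original answer; the helper name normalize_root is made up). It relies on the RFC 3986 normalization rule that an empty path after the host is equivalent to "/", so the two home page URLs collapse to the same string, while "/page" and "/page/" stay distinct.

```python
from urllib.parse import urlsplit


def normalize_root(url: str) -> str:
    """Replace an empty path with "/" (RFC 3986, section 6.2.3); leave other paths alone."""
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")
    return parts.geturl()


# The two home page variants from the question normalize to the same URL.
home_a = normalize_root("http://mysite.com")
home_b = normalize_root("http://mysite.com/")

# A trailing slash on a non-root path is a genuinely different URL.
page_a = normalize_root("http://www.mysite.com/page?test=1")
page_b = normalize_root("http://www.mysite.com/page/?test=1")

print(home_a == home_b)  # True
print(page_a == page_b)  # False
```

So for the bare domain the choice is mostly cosmetic, but for any deeper path the two spellings have to be treated as separate URLs.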

Assuming you don't already have a redirect to force a trailing /, then you can choose to use either version of the canonical tag.
It's up to you: how would you like your URL to appear in the search results, with a trailing / or without?
You may place the canonical tag anywhere in the head section.

I also recently had to decide whether my homepage's canonical URL should use a trailing slash. Googling the question didn't turn up a good answer. However, I noticed that the top search results mostly point to site homepages with a trailing slash.

Related

Google URL Parameter tool - what to exclude?

Situation: Site built on OpenCart, which utilizes faceted navigation.
Problem: Google Webmaster Tools' "URL Parameters" tool reports a huge number of URLs with parameters like "sort", "order", "limit", "search", and "page".
I would like to exclude them, but I'm worried about 2 things:
1.) Maybe there's a better way to handle this issue? Exclusion directives in robots.txt? Something else? I.e. fixing the problem on the site, before Google detects it in the first place.
2.) I don't want to accidentally exclude actual content.
So... anyone familiar with SEO and/or OpenCart, please give me a 2nd opinion on which of these parameters I should exclude, or change the settings for?
Thanks!
I'm not aware of a robots.txt option. But you might solve this using HTTP headers and/or HTML headers.
You could set <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> for duplicate pages in the HTML head (cf. http://www.robotstxt.org/meta.html).
Another approach could be to provide canonical URLs (e.g., in HTML header or HTTP headers, cf. https://yoast.com/rel-canonical/, https://support.google.com/webmasters/answer/139066 and https://support.google.com/webmasters/answer/1663744).
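A rough Python sketch of the canonical-URL idea, sent as an HTTP Link header. The function names are hypothetical, and the choice of which parameters are presentation-only ("sort", "order", "limit") is an assumption; "search" and "page" are deliberately kept because they change which items are shown.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: these parameters only reorder or resize a listing.
PRESENTATION_PARAMS = {"sort", "order", "limit"}


def canonical_url(url: str) -> str:
    """Drop presentation-only query parameters and keep everything else in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return parts._replace(query=urlencode(kept)).geturl()


def canonical_link_header(url: str) -> str:
    """Build the value of an HTTP Link header that points at the canonical URL."""
    return f'<{canonical_url(url)}>; rel="canonical"'


example = "http://example.com/category?path=20&sort=price&order=ASC&limit=25"
print(canonical_url(example))
print(canonical_link_header(example))
```

The same canonical_url value could equally be written into a link rel="canonical" element in the page head; the header form is mainly useful for responses that have no HTML head.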

301 redirect vs canonical links?

For technical reasons on a site we may have two or more links that refer to the same product page. For example:
http://example.com/a-nice-product-no1234.html
and:
http://example.com/a-nice-foobar-product-no1234.html
Apparently the first one is the "correct" link. What is the right approach when the second link is opened?
Approach 1)
Redirect 301 to the first link
Approach 2)
Status 200 and
<link rel="canonical" href="http://example.com/a-nice-product-no1234.html">
in the HTML head? Is approach 2) applicable for other search engines than Google? Other suggestions?
Thank you!
If
http://example.com/a-nice-foobar-product-no1234.html
is in any way invalid, or you intend to remove it, a 301 Moved Permanently is the way to go.
A technical discussion from Google of rel="canonical" shows it should be used to indicate original content, as opposed to, say, the same content ordered or formatted differently.
A redirect also has the benefit that users won't bookmark and pass around links to these "slightly invalid" pages, so their use will lessen over time.
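As a sketch of approach 1 in Python (framework-agnostic; the lookup table, regex and function name are all made up for illustration): pull the product number out of the requested path, look up the one "correct" path for it, and answer with a 301 if the request used any other spelling.

```python
import re

# Hypothetical lookup: product number -> the one "correct" path for that product.
CANONICAL_PATHS = {"1234": "/a-nice-product-no1234.html"}

# Assumes every product URL ends in "-no<digits>.html", as in the question.
PRODUCT_RE = re.compile(r"-no(\d+)\.html$")


def resolve(path: str):
    """Return (status, location): 200 for the canonical path, 301 for an alias, 404 otherwise."""
    match = PRODUCT_RE.search(path)
    if match is None:
        return (404, None)
    canonical = CANONICAL_PATHS.get(match.group(1))
    if canonical is None:
        return (404, None)
    if path != canonical:
        return (301, canonical)
    return (200, None)


print(resolve("/a-nice-foobar-product-no1234.html"))
print(resolve("/a-nice-product-no1234.html"))
```

A real handler would then send the Location header for the 301 case and render the product page for the 200 case.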

Removing URL duplicates when using pretty urls

I'm using pretty URLs in my web app. One example is 'forum/post/1', which invokes PostController in the Forum module, which loads the post with id=1. This is what I need, but that post is also accessible from 'forum/post/view/id/1'. That's bad, because search crawlers don't like it when the same page is accessible from several URLs, right?
I'm using the Yii framework, which supports a 'useStrictParsing' option: an incoming request must match at least one "pretty" route, otherwise it fails with a 404. However, it's not a perfect solution, because I don't have pretty URLs for every controller/action.
Ideally, the framework should redirect 'forum/post/view/id/1' to 'forum/post/1' with a 301 status code. How did you solve this problem? This isn't a Yii/PHP-specific question: how does your framework/tool deal with it?
If there are multiple ways to reach the same content, the best way to make sure search engines only rank the pretty URL is to use a canonical tag in the head of your document:
<link rel="canonical" href="http://www.mydomain.com/nice-url/" />
This is very useful on Windows-based systems, as IIS is not case-sensitive with its web pages but the web standard is.
So
www.maydomain.com/Newpage.aspx
www.maydomain.com/newpage.aspx
www.maydomain.com/NEWPAGE.aspx
Google sees these as different pages, and you can then be marked down for having a site with duplicate content. With a canonical link that is not a problem: each of the pages above would carry the same canonical link tag, and that URL is the only one the search engines will use.
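A minimal Python sketch of how one canonical href could be derived for all case variants. It assumes the lower-case spelling is the preferred one and that the server really does serve the same page for every casing; the function name is made up.

```python
from urllib.parse import urlsplit


def canonical_href(url: str) -> str:
    """Lower-case the path and drop the fragment so case variants share one canonical URL."""
    parts = urlsplit(url)
    return parts._replace(path=parts.path.lower(), fragment="").geturl()


variants = [
    "http://www.maydomain.com/Newpage.aspx",
    "http://www.maydomain.com/newpage.aspx",
    "http://www.maydomain.com/NEWPAGE.aspx",
]

# All three variants collapse to a single canonical URL.
canonical = {canonical_href(url) for url in variants}
print(canonical)
```

Each page would then emit that single value in its canonical link tag, whichever casing was requested.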
Provided that no one links to your non-pretty urls, the search engines will never know that they exist.
If you do want to eliminate them, you could bypass your web framework by adding an alias in your web server's configuration file; the URL will be redirected before it ever reaches the framework.
Frameworks like Django, which don't provide 'magic' routing, don't face this issue: the only routes that exist are those you define manually. In its case, you could define a view for the non-pretty URL which returns the appropriate redirect.
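Whichever layer does it, the redirect rule itself is small. Here is a framework-agnostic Python sketch (the pattern and function name are made up for the 'forum/post' example from the question) that maps the verbose route onto the pretty one:

```python
import re

# Matches the verbose route from the question, e.g. /forum/post/view/id/1
VERBOSE_POST = re.compile(r"^/forum/post/view/id/(\d+)/?$")


def pretty_redirect(path: str):
    """Return (301, pretty_path) for the verbose route, or None if no redirect is needed."""
    match = VERBOSE_POST.match(path)
    if match is None:
        return None
    return (301, f"/forum/post/{match.group(1)}")


print(pretty_redirect("/forum/post/view/id/1"))
print(pretty_redirect("/forum/post/1"))
```

In practice you would need one such rule per controller/action that has a pretty route, which is why doing it generically inside the framework's URL manager is the nicer option when it is available.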

Google Webmaster Tools - Remove query parameters from URL

I am using JBoss Seam on a Jetty web server and am having some issues with query parameters breaking links when they appear in Google searches.
The first parameter is one JBoss Seam uses to track conversations, cid or conversationId. This is a minor issue, as Google is complaining that I am submitting different URLs with the same information.
Secondly, would it make sense to publish/remove urls via the Google Webmaster API instead of publishing/removing via the sitemap?
Walter
Hey Walter, I would recommend that you use the rel=canonical tag to tell the search engines to ignore certain parameters in your URL strings. The canonical tag is a common standard that Google, Yahoo and Microsoft have committed to supporting.
For example, if JBoss is creating URLs that look like this: mysite.com?cid=FOO&conversationId=BAR, then you can create a canonical tag in the <head> section of your page like this:
<html>
<head>
<link rel="canonical" href="http://mysite.com" />
</head>
</html>
The search engines will use this information to normalize the URLs on your website to the canonical (or shortest & most authoritative) version. Specifically, they will treat this as a 301 redirect from the URL of the HTTP request to the URL specified in the canonical tag (as long as you haven't done anything silly, like make it an infinite loop, or pointed to a URL that doesn't exist).
While the canonical tag is pretty fricken cool, it is only a 90% solution, in that you can still run into issues with metrics tracking with all the extra parameters on your website. The best solution would be to update your infrastructure to trap these tracking parameters, create a cookie, and then use a 301 redirect to redirect the URL to the canonical version. However, this can be a prohibitive amount of work for that extra 10% gain, so many people prefer to start with the canonical tag.
As for your second question, generally you don't want to remove these URLs from Google if people are linking to them. By using the canonical tag, you achieve the same goal, but don't lose any of the value of the inbound links to your website.
For more information about the canonical tag, and the specific issues & solutions, check out this article I wrote on it here: http://janeandrobot.com/library/url-referrer-tracking.
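For the "trap the parameters, set a cookie, then 301" variant mentioned above, a rough Python sketch might look like this. The function name is hypothetical, and whether Seam's cid/conversationId can really be moved into a cookie is an assumption you would have to check for your app.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Parameters taken from the question; treat this set as an assumption for your own site.
TRAPPED_PARAMS = {"cid", "conversationId"}


def strip_trapped(url: str):
    """Return (clean_url, trapped_values) if a redirect is needed, otherwise None."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    trapped = {key: value for key, value in pairs if key in TRAPPED_PARAMS}
    if not trapped:
        return None
    kept = [(key, value) for key, value in pairs if key not in TRAPPED_PARAMS]
    clean_url = parts._replace(query=urlencode(kept)).geturl()
    return clean_url, trapped


result = strip_trapped("http://mysite.com/page?cid=FOO&conversationId=BAR&x=1")
print(result)
```

The caller would store the trapped values in a cookie and send a 301 to clean_url; a request that carries none of the parameters returns None and is served normally, so there is no redirect loop.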
Google Webmaster Tools will tell you about duplicate titles and other issues Google sees that are caused by "duplicates" which are really the same page served under two different URLs. I suggest trying to get the number of errors listed under duplicate titles in your Webmaster Tools account as close to zero as possible.

Should I be concerned if googlebot is trying to index marketing URLs?

I have recently started using Google Webmaster Tools.
I was quite surprised to see just how many links Google is trying to index.
http://www.example.com/?c=123
http://www.example.com/?c=82
http://www.example.com/?c=234
http://www.example.com/?c=991
These are all campaigns that exist as links from partner sites.
For right now they're all being denied by my robots file until the site is complete - as is EVERY page on the site.
I'm wondering what the best approach is for dealing with links like this, before I make my robots.txt file less restrictive.
I'm concerned that they will be treated as different URLs and start appearing in Google's search results. They all correspond to the same page, give or take. I don't want people finding them as they are and clicking on them.
My best idea so far is to render a page that contains a query string as follows:
// DO NOT TRY THIS AT HOME. See edit below
<% if (Request.QueryString.Count > 0) { %>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<% } %>
Do I need to do this? Is this the best approach?
Edit: This turns out NOT TO BE A GOOD APPROACH. It turns out that Google is seeing NOINDEX on a page that has the same content as another page that does not have NOINDEX. Apparently it figures they're the same thing and the NOINDEX takes precedence. My site completely disappeared from Google as a result. Caveat: it could have been something else I did at the same time, but I wouldn't risk this approach.
This is the sort of thing that rel="canonical" was designed for. Google posted a blog article about it.
Yes, Google would interpret them as different URLs.
Depending on your web server, you could use a rewrite filter to remove the parameter for search engines, e.g. a URL rewrite filter for Tomcat or mod_rewrite for Apache.
Personally I'd just redirect to the same page with the tracking parameter removed.
That seems like the best approach, unless the page exists in its own folder, in which case you can modify the robots.txt file to just ignore that folder.
For resources that should not be indexed I prefer to do a simple return in the page load:
// IsBot is a helper you would have to write yourself; it is not built into ASP.NET
if (IsBot(Request.UserAgent))
    return;