Duplicate without user-selected canonical [closed] - seo

Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 months ago.
We have a live website with URL e.g. abc.com but when the site fully loads, it gets redirected to abc.com/home.
I submitted all the pages to Google Search Console. Under Coverage it reports
"Duplicate without user-selected canonical", and the page is not listed under the valid URLs. We have not added the URL "abc.com/home" to the sitemap that we submitted to Search Console.
How do I deal with "Duplicate without user-selected canonical" so that I get good SEO rankings?

Google maintains a support article listing the different ways you can specify which URL to treat as the "canonical" (main) version when Google detects that your site has multiple pages that are duplicates. That assumes the pages really are duplicates; if they aren't meant to be, find out why Google sees them that way and fix it.
Reasons you may see duplicate URLs:
There are valid reasons why your site might have different URLs that point to the same page, or have duplicate or very similar pages at different URLs. Here are the most common:
To support multiple device types:
https://example.com/news/koala-rampage
https://m.example.com/news/koala-rampage
https://amp.example.com/news/koala-rampage
To enable dynamic URLs for things like search parameters or session IDs:
https://www.example.com/products?category=dresses&color=green
https://example.com/dresses/cocktail?gclid=ABCD
https://www.example.com/dresses/green/greendress.html
If your blog system automatically saves multiple URLs as you position the same post under multiple sections.
https://blog.example.com/dresses/green-dresses-are-awesome/
https://blog.example.com/green-things/green-dresses-are-awesome/
If your server is configured to serve the same content for www/non-www http/https variants:
http://example.com/green-dresses
https://example.com/green-dresses
http://www.example.com/green-dresses
If content you provide on a blog for syndication to other sites is replicated in part or in full on those domains:
https://news.example.com/green-dresses-for-every-day-155672.html (syndicated post)
https://blog.example.com/dresses/green-dresses-are-awesome/3245/ (original post)
I should also add an example for analytics campaigns. In my case, Google is detecting URLs with third-party (non-Google) campaign URL parameters as separate (and therefore duplicate) pages.
Telling Google about your Canonical pages:
The support article also includes a table and details on the various methods for telling Google about canonical pages, roughly in order of importance:
Add a <link> tag with the rel=canonical attribute to the HTML of all duplicate pages, pointing to the canonical URL (e.g. Google's example: <link rel="canonical" href="https://example.com/dresses/green-dresses" />)
Send a rel=canonical HTTP header (e.g. Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical")
Submit your canonical URLs in a sitemap
Use 301 (permanent) redirects for URLs that have permanently moved so that the old and new locations aren't marked as duplicates of each other
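As a rough illustration of the first two methods, here is a minimal Python sketch that collapses the http/https and www/non-www variants listed above onto one URL and renders it as a <link> tag or a Link header. The preference for https and the bare host, and the function names, are assumptions made for the example; pick whichever variant your site actually treats as canonical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the site owner prefers https and the bare (non-www) host.
PREFERRED_SCHEME = "https"


def canonical_url(url: str) -> str:
    """Collapse http/https and www/non-www variants onto one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep path and query untouched; drop the fragment, which is never sent to the server.
    return urlunsplit((PREFERRED_SCHEME, host, parts.path or "/", parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


def canonical_link_header(url: str) -> str:
    """Render the equivalent HTTP response header (useful for PDFs and other non-HTML files)."""
    return f'Link: <{canonical_url(url)}>; rel="canonical"'


if __name__ == "__main__":
    for variant in (
        "http://example.com/green-dresses",
        "https://example.com/green-dresses",
        "http://www.example.com/green-dresses",
    ):
        print(canonical_link_tag(variant))
```

All three variants produce the same tag, which is the point: every duplicate page names the same canonical URL.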

Related

How to evidence a particular page on Google SERP? [duplicate]

This question already has answers here:
How to get Google Sitelinks on a website? [closed]
(3 answers)
Closed 2 years ago.
I noticed that some Google search results are not a single URL but a single URL followed by a two-column list of what I call "important links" of the website.
For example: if you open Google and search for "amazon.it" (without the double quotes), you get this:
As you can see, some links are highlighted directly on the SERP ("eBook Kindle", for example).
I know I can produce a sitemap.xml for Google's bot, but my question is: how can I get particular links of my website presented this way in Google's SERP?
As far as I know, there is no special syntax in the sitemap protocol to "force" or "suggest" this to search engines. For future readers: this is the link to the sitemap protocol.
These are called sitelinks and they are unrelated to sitemaps. Google only shows them when:
It understands the structure of your website (typically via the structures in URLs).
It trusts your website's content (no spam).
The content/link is relevant and useful for the corresponding user query.
Some say implementing breadcrumbs helps Google find SiteLinks candidates, but this has never been confirmed and may be completely false.

Canonical Link Element for Dynamic Pages ( rel="canonical") [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I have a stack system that passes page tokens in the URL. My pages are also dynamically created, so I have a single PHP page that serves the content based on its parameters.
index.php?grade=7&page=astronomy&pageno=2&token=foo1
I understand the search indexing goal to be: "The goal is to have only one link per unique set of data on your website."
Bing has a way to specify parameters to ignore.
Google, it seems, uses rel="canonical", but is it possible to use this to tell Google to ignore the token parameter? My URL (without tokens) can be anything like:
index.php?grade=5&page=astronomy&pageno=2
index.php?grade=6&page=math&pageno=1
index.php?grade=7&page=chemistry&page2=combustion&pageno=4
If there is not a solution for Google... Other possible solutions:
If I provide a sitemap for each base page, I can supply base URLs, but any crawling of that page's links will create tokens on the resulting pages. Plus I would have to constantly recreate the sitemap to cover new pages (e.g. with 25 posts per page, post 26 is on page 2).
One idea I've had is to identify bots on page load (I do this already) and disable all tokens for bots. Since (I'm presuming) bots don't use session data between pages anyway, the back buttons and editing features are useless. Is it feasible (or is it crazy) to write custom code for bots?
Thanks for your thoughts.
You can use the Google Webmaster Tools to tell Google to ignore certain URL parameters.
This is covered on the Google Webmaster Help page.
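Independently of any setting on Google's side, the page itself can name its token-free URL as canonical. A minimal Python sketch of the parameter stripping follows (the asker's site is PHP, so treat this as pseudocode for the same idea). The set of ignored parameters and the example.com host are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "token" is the only parameter that does not change the page content.
IGNORED_PARAMS = {"token"}


def strip_ignored_params(url: str) -> str:
    """Return the URL with session-style parameters removed, other parameters kept in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


if __name__ == "__main__":
    # example.com is a placeholder host, not the asker's site.
    url = "https://example.com/index.php?grade=7&page=astronomy&pageno=2&token=foo1"
    print(strip_ignored_params(url))
```

The result can then be emitted in a <link rel="canonical"> tag on every tokenised variant of the page, so all of them point at the same token-free URL.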

Google search results showing my site even though I've disallowed it in robots.txt [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
My staging site is showing up in search results, even though I've specified that I don't want the site crawled. Here's the contents of my robots.txt file for the staging site:
User-agent: Mozilla/4.0 (compatible; ISYS Web Spider 9)
Disallow:
User-agent: *
Disallow: /
Is there something I'm doing wrong here?
Your robots.txt tells Google not to crawl/index your page's content.
It doesn't tell Google not to add your URL to their search results.
So if your page (which is blocked by robots.txt) is linked somewhere else, and Google finds this link, it checks your robots.txt if it is allowed to crawl. It finds that it is forbidden, but hey, it still has your URL.
Now Google might decide that it would be useful to include this URL in their search index. But as they are not allowed (per your robots.txt) to get the page's metadata/content, they only index it with keywords from your URL itself, and possibly anchor/title text that someone else used to link to your page.
If you don't want your URLs to be indexed by Google, you'd need to use the meta-robots, e.g.:
<meta name="robots" content="noindex">
See Google's documentation: Using meta tags to block access to your site
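To check that a staging page really carries the tag, something like the following Python sketch can help. It uses only the standard library's html.parser; the class and function names are made up for this example, and it looks only at the generic "robots" meta name, not crawler-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex,follow".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html: str) -> bool:
    """True if the HTML contains a robots meta tag with the noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
    print(has_noindex(page))
```

Keep in mind that a crawler can only read this tag if it is allowed to fetch the page, so the tag and a robots.txt block on the same URL work against each other.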
Your robots file looks clean, but remember that Google, Yahoo, Bing, etc. do not need to crawl your site in order to index it.
There is a very good chance the Open Directory Project or a less polite bot of some kind stumbled across it. Once someone else finds your site these days it seems everyone gets their hands on it. Drives me crazy too.
A good rule of thumb when staging is:
1. Always test your robots file for syntax oversights before posting it on your production site. Try robots.txt Checker, Analyze robots.txt, or Robots.txt Analysis - Check whether your site can be accessed by Robots.
2. Password protect your content while staging. Even if it's somewhat bogus, put a login and password at your index's root. It's an extra step for your fans and testers, but well worth it if you want polite or impolite bots out of your hair.
3. Depending on the project, you may not want to use your actual domain for testing. Even if I have a static IP, sometimes I'll use dnsdynamic or noip.com to stage my password-protected site. So for example, if I want to stage my domain ihatebots.com :) I will simply go to dnsdynamic or noip (they're free, btw) and create a fake domain such as ihatebots.user32.com or somthingtotallyrandom.user32.com and then assign my IP address to it. This way, even if someone crawls my staging project, my original domain ihatebots.com is still untouched by any kind of search engine result (so are its records, btw).
Remember there are billions of dollars around the world aimed at finding you 24 hours a day, and that number is ever increasing. It's tough these days. Be creative and always password protect if you can while staging.
Good luck.

real meaning of nofollow links [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I'm confused about what nofollow attributes really do.
I believe they tell search engine spiders not to follow the target.
But my question is: do nofollow links alter the Pagerank?
Thanks
ok, here is a simplified pipe google uses to serve your pages
discovery
crawling
indexing
ranking
discovery is basically the step of discovering new urls. common sources are
links it finds on other pages
sitemap.xml
when you add nofollow to a link on your page and google discovers that link, the link is still pushed into the google discovery queue, but with the flag "nofollow = do not crawl that site (based on this discovery), do not index this url (based on this discovery), do not rank this url (based on this discovery)"
so basically you have de-valued that specific link. the link does not count as a vote for that other page.
that said:
it does not help you to save "pagerank" - the concept of pagerank is just thoughtcancer - the "link juice" does not stay on your page, it just gets flushed into nirvana. congrats. it's like voting with a note not to count that vote.
there are only 2 use cases where a nofollow link makes sense:
if you can't vote for another page (user generated content without editorial quality assurance)
or if you won't vote for another page (a link to a site you want to point out is sh*t)
p.s.: this is the site for not programming related SEO questions https://webmasters.stackexchange.com/
nofollow is poorly named. I'll try and give another explanation:
All the links on a web page acquire link juice that they can pass on to the pages they link to.
The amount of juice available to a page is based on the link juice it receives from other pages that link to it. This all relates to the PageRank algorithm.
How the juice is distributed to the links is outside the question, and a Google secret. But each link gets a share.
nofollow in a link says don't pass on my share of link juice.
It is believed that this link juice simply leaks away, so nofollow cannot be used to retain ranking, only to deny the recipient any boost in their ranking.
A good use for nofollow is when external users can add their own links in your website. This can protect you from people spamming you to pass on juice to their own websites.
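The description above can be turned into a toy calculation. The Python sketch below is only a model of this answer's wording (equal shares per link, with nofollow shares discarded). As noted, how Google really distributes juice is not public, so the even split and the function name are assumptions.

```python
def distribute_link_juice(page_juice, links):
    """Toy model: split a page's juice evenly across its links.

    links is a list of (target, nofollow) pairs. Shares belonging to
    nofollow links are discarded rather than given back to the other links.
    Returns a dict mapping each followed target to the juice it receives.
    """
    if not links:
        return {}
    share = page_juice / len(links)
    passed = {}
    for target, nofollow in links:
        if not nofollow:
            passed[target] = passed.get(target, 0.0) + share
    return passed


if __name__ == "__main__":
    links = [("a", False), ("b", False), ("spam", True), ("d", False)]
    passed = distribute_link_juice(1.0, links)
    print(passed)
    print("leaked:", 1.0 - sum(passed.values()))
```

In this model, marking the "spam" link nofollow does not raise the share of the other three links; its quarter is simply lost, which is why nofollow denies a boost but retains nothing.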
nofollow is indeed badly named. What it does is prevent the passing of PageRank and anchor text benefit to the receiving link. However, nofollow links can still be beneficial. Trust and authority can still be passed on, so a link from Wikipedia is still very valuable.
The nofollow attribute means PR is not passed through the link. Nofollow links do not alter PageRank, but they can still help drive traffic to your site. :)
If you have links on your pages that link to external web sites, you can add nofollow so that your site does not "spill" page rank to the external pages that you link to (which they would if you don't add nofollow).

SEO Help with Pages Indexed by Google [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all.
Let's take a look at this page, for example: http://www.joemajewski.com/wow/profile.php?id=3
I created my own CMS, and this is simply a breakdown of user id #3's statistics, which I noticed is indexed by Google, although it shouldn't be. I understand that it takes some time before Google's results reflect accurately on my site's content, but this has been improperly indexed for nearly six months now.
Here are the precautions that I have taken:
My robots.txt file has a line like this:
Disallow: /wow/profile.php*
When running the url through Google Webmaster Tools, it indicates that I did, indeed, correctly create the disallow command. It did state, however, that a page that doesn't get crawled may still get displayed in the search results if it's being linked to. Thus, I took one more precaution.
In the source code I included the following meta data:
<meta name="robots" content="noindex,follow" />
I am assuming that follow means to use the page when calculating PageRank, etc, and the noindex tells Google to not display the page in the search results.
This page, profile.php, is used to take the $_GET['id'] and find the corresponding registered user. It displays a bit of information about that user, but is in no way relevant enough to warrant a display in the search results, so that is why I am trying to stop Google from indexing it.
This is not the only page Google is indexing that I would like removed. I also have a WordPress blog, and there are many category pages, tag pages, and archive pages that I would like removed, and am doing the same procedures to attempt to remove them.
Can someone explain how to get pages removed from Google's search results, and possibly suggest some criteria for deciding which types of pages I don't want indexed? In terms of my WordPress blog, the only pages that I truly want indexed are my articles. Everything else I have tried to block, with little luck from Google.
Can someone also explain why it's bad to have pages indexed that don't provide any new or relevant content, such as pages for WordPress tags or categories, which are clearly never going to receive traffic from Google?
Thanks!
It would be a better idea to revise your meta robots directives to:
<meta name="robots" content="noindex,noarchive,nosnippet,follow" />
My robots file was blocking access to the page where the meta tag was included. Thus, even though the meta tag told Google to not index my pages, Google never got that far.
Case closed. :P
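The conflict described here can be reproduced with Python's standard urllib.robotparser. This is a sketch under assumptions: the question does not show the User-agent line, so "User-agent: *" is assumed, and the trailing * is dropped from the rule because the standard-library parser does plain prefix matching and does not implement wildcard patterns (for a trailing wildcard the prefix rule matches the same URLs). The helper name is made up.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the asker's rule; only the Disallow line appears in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wow/profile.php
"""


def crawler_can_see_meta(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """A crawler can only read a page's meta robots tag if robots.txt lets it fetch the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.joemajewski.com/wow/profile.php?id=3"
    if crawler_can_see_meta(ROBOTS_TXT, url):
        print("Page is crawlable, so a noindex meta tag can take effect.")
    else:
        print("Page is blocked by robots.txt, so the noindex meta tag is never read.")
```

For noindex to work, the Disallow rule for that path has to go, so the crawler can fetch the page and see the tag.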
If you have blocked the URL in robots.txt and tested it, it should work. In that case you don't need to add an additional meta tag to the page.
Give Google some time to recrawl your website; it should work.
To remove URLs, you can use Google Webmaster Tools (I am sure you know that).