I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all.
Let's take a look at this page, for example: http://www.joemajewski.com/wow/profile.php?id=3
I created my own CMS, and this page is simply a breakdown of user id #3's statistics. I noticed it is indexed by Google, although it shouldn't be. I understand that it takes some time before Google's results accurately reflect my site's content, but this page has been improperly indexed for nearly six months now.
Here are the precautions that I have taken:
My robots.txt file has a line like this:
Disallow: /wow/profile.php*
When I run the URL through Google Webmaster Tools, it indicates that I did, indeed, create the Disallow rule correctly. It did state, however, that a page that doesn't get crawled may still be displayed in the search results if it is being linked to. Thus, I took one more precaution.
In the source code I included the following meta data:
<meta name="robots" content="noindex,follow" />
I am assuming that follow means to use the page's links when calculating PageRank, etc., and that noindex tells Google not to display the page in the search results.
This page, profile.php, is used to take the $_GET['id'] and find the corresponding registered user. It displays a bit of information about that user, but is in no way relevant enough to warrant a display in the search results, so that is why I am trying to stop Google from indexing it.
This is not the only page Google is indexing that I would like removed. I also have a WordPress blog, and there are many category pages, tag pages, and archive pages that I would like removed, and am doing the same procedures to attempt to remove them.
Can someone explain how to get pages removed from Google's search results, and possibly suggest some criteria for deciding what types of pages I shouldn't want indexed? In terms of my WordPress blog, the only pages that I truly want indexed are my articles. Everything else I have tried to block, with little luck from Google.
Can someone also explain why it's bad to have pages indexed that don't provide any new or relevant content, such as WordPress tag or category pages, which are clearly never going to receive traffic from Google?
Thanks!
It would be a better idea to revise your meta robots directives to:
<meta name="robots" content="noindex,noarchive,nosnippet,follow" />
My robots.txt file was blocking access to the page where the meta tag was included. Thus, even though the meta tag told Google not to index my pages, Google never got that far.
Case closed. :P
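For anyone hitting the same wall: once the Disallow rule is removed so Googlebot can actually fetch profile.php, the noindex signal can also be sent as an HTTP header, which doesn't depend on the crawler parsing the HTML. A rough sketch only (the user lookup is omitted, and keeping the meta tag as well is optional):

<?php
// profile.php - sketch only; assumes the robots.txt Disallow rule for
// this path has been removed so Googlebot can actually fetch the page.
// X-Robots-Tag is the HTTP-header equivalent of the robots meta tag.
header('X-Robots-Tag: noindex, follow');

$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
// ... look up user #$id and render the profile as before ...
?>
<!DOCTYPE html>
<html>
<head>
    <!-- keeping the meta tag as well is harmless belt-and-braces -->
    <meta name="robots" content="noindex,follow" />
    <title>User profile</title>
</head>
<body>
    <!-- profile markup goes here -->
</body>
</html>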
If you have blocked the URL in robots.txt and tested it, that should work; you shouldn't need to add an additional meta tag to that particular page.
Give Google some time to re-crawl your website. It should work!
For removing URLs, you can also use Google Webmaster Tools (I'm sure you already know that).
My staging site is showing up in search results, even though I've specified that I don't want the site crawled. Here's the contents of my robots.txt file for the staging site:
User-agent: Mozilla/4.0 (compatible; ISYS Web Spider 9)
Disallow:
User-agent: *
Disallow: /
Is there something I'm doing wrong here?
Your robots.txt tells Google not to crawl your page, and therefore not to index your page's content.
It doesn't tell Google not to add your URL to their search results.
So if your page (which is blocked by robots.txt) is linked somewhere else, and Google finds this link, it checks your robots.txt to see if it is allowed to crawl. It finds that it is forbidden, but hey, it still has your URL.
Now Google might decide that it would be useful to include this URL in their search index. But as they are not allowed (per your robots.txt) to get the page's metadata/content, they only index it with keywords from your URL itself, and possibly anchor/title text that someone else used to link to your page.
If you don't want your URLs to be indexed by Google, you'd need to use the robots meta tag, e.g.:
<meta name="robots" content="noindex">
See Google's documentation: Using meta tags to block access to your site
Your robots.txt file looks clean, but remember: Google, Yahoo, Bing, etc. do not need to crawl your site in order to index it.
There is a very good chance the Open Directory Project or a less polite bot of some kind stumbled across it. Once someone else finds your site these days, it seems everyone gets their hands on it. Drives me crazy too.
A good rule of thumb when staging is:
1. Always test your robots.txt file for any syntax oversights before putting it on your production site. Try robots.txt Checker, Analyze robots.txt, or Robots.txt Analysis - Check whether your site can be accessed by Robots.
2. Password protect your content while staging. Even if it's somewhat bogus, put a login and password at your index's root. It's an extra step for your fans and testers, but well worth it if you want polite or impolite bots out of your hair. (A PHP sketch of this is shown after this list.)
3. Depending on the project, you may not want to use your actual domain for testing. Even if I have a static IP, sometimes I'll use dnsdynamic or noip.com to stage my password-protected site. So for example, if I want to stage my domain ihatebots.com :) I will simply go to dnsdynamic or noip (they're free, by the way), create a throwaway hostname such as ihatebots.user32.com or somethingtotallyrandom.user32.com, and then assign my IP address to it. This way, even if someone crawls my staging project, my original domain ihatebots.com remains untouched by any kind of search engine result (and so do its DNS records).
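For point 2, if you'd rather do the password protection in PHP than at the web-server level, a minimal sketch of HTTP Basic auth might look like this (the file name and credentials are placeholders; under CGI/FastCGI setups the PHP_AUTH_* variables may need extra server configuration):

<?php
// staging-auth.php - minimal sketch of HTTP Basic auth in PHP.
// Include it at the very top of the staging site's entry script.
// The credentials below are placeholders; change them.
$validUser = 'staging';
$validPass = 'change-me';

$user = isset($_SERVER['PHP_AUTH_USER']) ? $_SERVER['PHP_AUTH_USER'] : '';
$pass = isset($_SERVER['PHP_AUTH_PW'])   ? $_SERVER['PHP_AUTH_PW']   : '';

if ($user !== $validUser || $pass !== $validPass) {
    header('WWW-Authenticate: Basic realm="Staging"');
    header('HTTP/1.1 401 Unauthorized');
    exit('Authentication required.');
}
// ...continue serving the staging site as normal...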
Remember, there are billions of dollars around the world aimed at finding you 24 hours a day, and that number is ever increasing. It's tough these days. Be creative, and always password protect if you can while staging.
Good luck.
I have a product page on my website which I added 3 years ago.
Production of the product has now stopped, and the product page was removed from the website.
What I did is start displaying a message on the product page saying that production of the product has stopped.
When someone searches Google for that product, the product page which was removed from the site still shows up first in the search results.
The PageRank for the product page is also high.
I don't want the removed product page to be shown at the top of the search results.
What is the proper method of removing a page from a website so that the removal is reflected in whatever Google has indexed?
Thanks for the replies.
Delete It
The proper way to remove a page from a site is to delete the actual file that is returned to the user/bot when the page is requested. If the file is no longer on the webserver, any well-configured webserver will return a 404, and the bot/spider will remove the page from the index on its next refresh.
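If the product URL is served by a script rather than a static file, you can send the status code yourself. A rough sketch in PHP (the file name is illustrative); a 410 "Gone" is arguably even clearer than a 404 for a permanently removed product:

<?php
// discontinued-product.php - sketch: tell bots the page is intentionally gone.
// 410 Gone signals permanent removal; a plain 404 also works.
header('HTTP/1.1 410 Gone');
?>
<!DOCTYPE html>
<html>
<head><title>Product no longer available</title></head>
<body>
    <p>This product has been discontinued.</p>
</body>
</html>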
Redirect It
If you want to keep the good "Google juice" or SERP ranking the page has, probably due to inbound links from external sites, you'd be best to set your webserver to do a 301 (permanent) redirect to a similar, updated product page.
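A minimal sketch of such a redirect in PHP (the target URL is a placeholder for whichever similar product you pick):

<?php
// old-product.php - sketch: permanently redirect the discontinued
// product's URL to a similar, current product. The target is a placeholder.
header('Location: http://www.example.com/products/new-widget', true, 301);
exit;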
Keep and convert
However, if the page is doing so well that it ranks #1 for searches to the entire site, you need to use this to your advantage. Leave the bulk of the copy on the page the same, but highlight to the viewer that the product no longer exists and provide some helpful options to the user instead: tell them about a newer, better product, tell them why it's no longer available, tell them where they can go to get support if they already have the discontinued product.
I completely agree with the above suggestion and want to add just one point.
If you want to remove that page from Google's search results, just log in to Google Webmaster Tools (you must have verified the website there) and submit that particular page in a URL removal request.
Google will de-index that page and it will be removed from the Google search results.
My website has about 200 useful articles. Because the website has an internal search function with lots of parameters, the search engines end up spidering urls with all possible permutations of additional parameters such as tags, search phrases, versions, dates etc. Most of these pages are simply a list of search results with some snippets of the original articles.
According to Google Webmaster Tools, Google has spidered only about 150 of the 200 entries in the XML sitemap. It looks as if Google has not yet seen all of the content, years after it went online.
I plan to add a few "Disallow:" lines to robots.txt so that the search engines no longer spider those dynamic URLs. In addition, I plan to disable some URL parameters in the Webmaster Tools "Website configuration" --> "URL parameters" section.
Will that improve or hurt my current SEO ranking? It will look as if my website is losing thousands of content pages.
This is exactly what canonical URLs are for. If one page (e.g. an article) can be reached by more than one URL, then you need to specify the primary URL using a canonical link. This prevents duplicate content issues and tells Google which URL to display in its search results.
So do not block any of your articles and you don't need to enter any parameters, either. Just use canonical URLs and you'll be fine.
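As an illustration, a canonical link emitted from PHP might look like the sketch below; $article->slug is a hypothetical property standing in for however your CMS knows the article's one primary URL:

<?php
// Sketch: emit a canonical link on pages reachable via several
// parameterised URLs. $article->slug is a hypothetical property
// holding the article's primary URL path.
$canonicalUrl = 'http://www.example.com/articles/' . $article->slug;
?>
<link rel="canonical" href="<?php echo htmlspecialchars($canonicalUrl); ?>" />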
As nn4l pointed out, canonical is not a good solution for search pages.
The first thing you should do is have search results pages include a robots meta tag saying noindex. This will help get them removed from your index and let Google focus on your real content. Google should slowly remove them as they get re-crawled.
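As a sketch of that first measure, the meta tag can be emitted only on search-result pages; is_search_page() here is a placeholder for however your CMS detects a search request:

<?php
// Sketch: mark internal search-result pages as noindex while leaving
// real articles indexable. is_search_page() is a placeholder for
// however your CMS detects a search-results request.
if (is_search_page()) {
    echo '<meta name="robots" content="noindex,follow" />' . "\n";
}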
Other measures:
In Google Webmaster Tools, tell Google to ignore all those search parameters. This is just a band-aid, but it may help speed up the recovery.
Don't block the search page in the robots.txt file as this will block the robots from crawling and cleanly removing those pages already indexed. Wait till your index is clear before doing a full block like that.
Your search system is probably based on links (a tags) or GET-based forms rather than POST-based forms; that is why the result pages got indexed. Switching them to POST-based forms should stop robots from trying to index those pages in the first place (see the sketch below). JavaScript or AJAX is another way to do it.
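A minimal sketch of a POST-based search form (the /search.php endpoint is a placeholder):

<!-- Sketch: a POST-based search form. Crawlers generally do not submit
     POST forms, so result pages stop being discovered as crawlable URLs.
     /search.php is a placeholder endpoint. -->
<form action="/search.php" method="post">
    <input type="text" name="q" />
    <input type="submit" value="Search" />
</form>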
I'm getting confused about what nofollow attributes really do.
I believe that they tell search engine spiders not to follow the target link.
But my question is: do nofollow links alter PageRank?
Thanks
OK, here is a simplified pipeline Google uses to serve your pages:
discovery
crawling
indexing
ranking
Discovery is basically the step of discovering new URLs; common sources are:
links it finds on other pages
sitemap.xml
If you add a nofollow to a link on your page and Google discovers that link, the link still gets pushed into Google's discovery queue, but with the flag "nofollow = do not crawl that site (based on this discovery), do not index this URL (based on this discovery), do not rank this URL (based on this discovery)".
So basically you have de-valued that specific link. The link does not count as a vote for that other page.
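For reference, a nofollow link is just an ordinary anchor with a rel attribute, e.g.:

<!-- a de-valued link: the target URL is still discovered, but the link
     does not count as a vote for it -->
<a href="http://example.com/some-page" rel="nofollow">some page</a>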
That said:
It does not help you to save "PageRank" (the whole notion of hoarding PageRank is a harmful meme anyway): the "link juice" does not stay on your page, it just gets flushed into nirvana. Congrats. It's like casting a vote with a note attached saying not to count that vote.
There are only two use cases where a nofollow link makes sense:
if you can't (user-generated content without editorial quality assurance),
or if you won't (a link to a site you want to point out is sh*t),
vote for another page.
P.S.: Non-programming SEO questions belong on https://webmasters.stackexchange.com/
nofollow is poorly named. I'll try and give another explanation:
All the links on a web page acquire link juice that they can pass on to the pages they link to.
The amount of juice available to a page is based on the link juice it receives from other pages that link to it. This all relates to the PageRank algorithm.
How the juice is distributed among the links is beyond the scope of the question, and a Google secret. But each link gets a share.
nofollow on a link says: don't pass on my share of link juice.
The general belief is that this link juice simply leaks away, so nofollow cannot be used to retain ranking; it only denies the recipient any boost in their ranking.
A good use for nofollow is when external users can add their own links to your website. This protects you from people spamming you in order to pass juice on to their own websites.
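A rough sketch of how you might enforce that on user-submitted markup in PHP (add_nofollow() and $userHtml are illustrative names, not part of any library):

<?php
// Sketch: force rel="nofollow" on every link in user-submitted HTML
// before rendering it. add_nofollow() and $userHtml are illustrative names.
function add_nofollow($userHtml)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($userHtml); // @ silences warnings about sloppy user markup

    foreach ($doc->getElementsByTagName('a') as $link) {
        $link->setAttribute('rel', 'nofollow');
    }
    // Note: saveHTML() returns a full <html><body> document; trim as needed.
    return $doc->saveHTML();
}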
nofollow is indeed badly named. What it does is prevent the passing of PageRank and anchor text benefit to the receiving link. However, nofollow links can still be beneficial. Trust and authority can still be passed on, so a link from Wikipedia is still very valuable.
The nofollow attribute means not to pass PageRank with the link. Nofollow links do not alter PageRank, but they can still help drive traffic to your site. :)
If you have links on your pages that point to external websites, you can add nofollow so that your site does not "spill" PageRank to the external pages you link to (which it would if you didn't add nofollow).
I have a blog built in WordPress, and my domain name is like example.com (I can't give the real name, because sometimes the editors will mark this question as spam :( and if anyone really wants to check my site directly, I will add it at the end of the question).
The site is at http://example.com and the blog is at http://example.com/articles/
and the sitemap.xml is available at http://example.com/sitemap.xml
Google visits my site daily and all my new articles get crawled. If I search for "article title + example.com" I get a result from Google for my site, but the heading is not the actual one; it is taken from another article's data.
(I think I can give you a sample search query; please don't take this as spam.)
Installing Go Language in Ubuntu + tutorboy. But the result is only listed with the proper title after a long, wrong title :( I think now you understand what I am facing ... please help me find out why this happens.
Edit:
How can I improve my SEO with WordPress?
When I search that query I don't get the page "Installing Go...", I get the "PHP header types" article, which has the above text on the page (links at the right). So the titles showing in Google are correct.
Google has obviously not crawled that page yet since it's quite new. Give it time, especially if your site is new and/or unpopular.
A couple of things I need to make clear:
Google crawled your site on 24 Nov 2009 12:01:01 GMT, so needless to say Google does not actually visit your site (blog) every day.
When I queried the phrase you provided, the results are right. There are two URLs relating to your site: one is the home page of your blog, the other is the page that is more closely related to your query. The reason is that the query phrase is directly related to the page tutorboy.com/articles/php/use-php-functions-in-javascript.html; however, your home page also contains some related keywords. That is why Google presents two pages on the result page.
Your second question is hard to answer briefly, since it needs a complicated answer. Still, the following steps are crucial to your SEO.
Unique and good content. Content is king, and it is the one thing that has stayed constant while other factors change as search engine technology evolves. Also keep your site content fresh.
Back links. Part of the reason that Google does not revisit your site right after you update it is that your site lacks enough back links.
Good structure. Properly use tags like <title>, the meta description, and alt attributes on images, etc. (see the sketch after this list).
Use web analytics tools like Google Analytics. It's free, and you can see a lot of things that you would otherwise miss.
Most importantly, grab some SEO books or spend a couple of minutes every day reading SEO articles.
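To illustrate the "good structure" point, the structural tags might look roughly like this (the title and description wording is purely illustrative):

<!-- Sketch: a descriptive <title>, a meta description, and alt text on images. -->
<head>
    <title>Installing the Go Language on Ubuntu - Tutorboy</title>
    <meta name="description" content="A step-by-step guide to installing Go on Ubuntu." />
</head>
...
<img src="go-logo.png" alt="The Go language logo" />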
Good Luck,