How to hide party mode from search engines? - seo

How do I remove the party mode version of websites I build so that it does not show up in search engine results? So far party mode isn't being indexed, but I am afraid the links could eventually become indexed, and I want to make sure I don't screw up my client's SEO.
For example:
www.example.com/?partymode
#above should not be indexed
www.example.com/
#above *should* be indexed
Does party mode get indexed separately, or does Google automatically know to exclude it?

Check out the "active parameters > No URLs" section at https://support.google.com/webmasters/answer/6080550?hl=en#active_parameters. It lets you forbid the crawling of any URL containing a specific query parameter.
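If you'd rather not depend on the URL parameters tool, a robots.txt rule is an alternative way to get a similar effect. A minimal sketch, assuming the parameter is always literally named partymode (Googlebot understands * wildcards in robots.txt):

User-agent: *
# block any URL whose query string begins with "partymode"
Disallow: /*?partymode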

Related

Search Database via Google Custom Search? Attaching Google CSE to a (SQL/NoSQL) database for a website?

TOPIC - Google Search Engine / Custom Search - with Database
References
Search for "Google Search Engine" and "Google Custom Search"
(New to StackOverflow; just joined the other day. I'm limited to 2 links I can post right now.)
NOTE:
I have not YET decided/committed to any specific coding language, framework, etc. - not until I figure out how to accomplish what my question (below) asks.
BACKGROUND INFO
What I'm trying to do (for now) is add a "search box / search engine" to a simple website I'm building out. Before I get too far into it (planning ahead), I would like to use Google CSE if at all possible (it can do A LOT of things and works well). However, I will have a database (not sure of the type YET - it will depend on what my options are and what I can do with CSE) of "items" that I want to be able to search quickly in the search box, i.e. like Amazon.com.
QUESTION:
Is there any way at all to use Google Custom Search and/or the Custom Search API to search/attach a database (SQL, NoSQL, or other)? I would HIGHLY prefer being able to do all of this on Google Cloud Platform, using one of their storage/database products.
If I understand what you're trying to do, Google CSE is enough.
From the Google doc you linked:
#Defining a Custom Search Engine in Control Panel
In the Sites to search section, add the pages you want to include in your search engine. You can include any sites you want, not just the sites you own. You can include whole site URLs or individual page URLs. You can also use URL patterns.
#Enabling Autocomplete
[...] you can enable or disable the autocomplete feature using the enableAutoComplete attribute.
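For illustration, a minimal sketch of embedding the Custom Search element with autocomplete switched on. YOUR_CX_ID is a placeholder for your own engine ID, and data-enableAutoComplete is my reading of how the enableAutoComplete attribute from the quote above is set in markup:

<script async src="https://cse.google.com/cse.js?cx=YOUR_CX_ID"></script>
<div class="gcse-search" data-enableAutoComplete="true"></div>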
As for the "Is there any way at all [...] to search a database" part: I'd say not directly, but it's not a big problem.
Google CSE works on "indexable web pages", so it won't work against a raw DB, a restricted intranet, or a custom network not served over http(s)://.
But in your case, if you build a DB, I suppose you'll have to build web pages to display the data stored inside it to your users (like product pages on Amazon)?
If so, you can run Google CSE against those pages by adding your http://[server ip] or http://[domain name] to the whitelist.
As far as I know, custom search won't guarantee all your content will be indexed.
You probably want to try exporting a full sitemap.xml or an RSS feed, and if the custom search results from either of those don't satisfy you, look at the Google Search Appliance product.
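If you go the feed route, this is roughly the shape of a minimal RSS 2.0 feed; every URL and title here is a placeholder:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example product feed</title>
    <link>http://www.example.com/</link>
    <description>One item per indexable product page</description>
    <item>
      <title>Example product</title>
      <link>http://www.example.com/products/1</link>
    </item>
  </channel>
</rss>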
There's also http://sphinxsearch.com/ by the way.

How do I test a website in webmaster tools without indexing it

Suppose I have my live site at www.mywebsite.com, tracked and managed via Google Webmaster Tools. Then I want to add to the project list a subdomain like test.mywebsite.com which I use for testing purposes. Of course that subdomain shouldn't be tracked or indexed by Google, but I would like to use "fetch as Google" feature on it to see how the crawler manages the pages. Can I set up such a test environment without being indexed by Google?
I haven't had a chance to test this, but I think if you add noindex tags to your site, it should still be possible to register the site with Webmaster Tools, as Google can still see the site's content in order to verify ownership.
I believe "fetch as Google" returns live results rather than what is already indexed (it wouldn't be very useful if it didn't let you check new pages or re-check updated ones), so temporarily removing the noindex tag when you run it should allow the feature to be used (it may also return some useful information without removing it).
The fact that "fetch as" has a separate "submit" button suggests to me that it will not automatically index pages found via this method, so that should not be a concern.
Adding canonical tags pointing to your main content would provide an additional safeguard against the test site accidentally being listed.
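To illustrate both tags, a minimal sketch for a page on the test subdomain; the canonical URL is a hypothetical example pointing at the corresponding live page:

<head>
  <!-- ask crawlers not to index this test page -->
  <meta name="robots" content="noindex">
  <!-- declare the live version as the canonical document -->
  <link rel="canonical" href="http://www.mywebsite.com/some-page">
</head>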
Google can't provide any information about your website if it's not indexed.
In other words, you can use Google Webmaster Tools without your website being indexed, but it will be pretty much useless, since it will not provide any data.
Google Webmaster Tools won't let you do that, but you can check a website for SEO issues and other errors (missing search descriptions, missing image alt text, etc.) with Bing Webmaster Tools.

SEO: how can dynamic URL with query strings be searched by search engine bots?

I’m developing an ecommerce web site in ASP.NET using SQL server 2008 database.
Most of my pages are database driven and all the content is gathered from a SQL Server.
Every product page is created dynamically from data coming from the database, hence every product’s page URL has a unique query string, containing a “product_id” variable.
Example: http://www.myecommence.com/products.aspx?product_id=1
I'd like to improve my Search Engine Optimization.
Dealing with a small number of products could be fine, but what if I had more than 1000 products? How could every product be crawled?
How does the Google spider/bot know that a product_id with a hypothetical number of 767 exists?
I've been Googling this, but I still can't understand how pages that have absolutely no reference on the site or on external sites can be crawled. If this were possible, the spider would have to know how to read the website's database tables, but I guess that this is not the case.
At this point, since most of the pages and links are dynamic, how could they be indexed? The same thing applies to “user detail” pages that are accessed via a query string using a “user_id=n”.
Probably what I'm asking has already been discussed, but some points are still not clear to me.
I would advise using Mod Rewrite rules to make your URLs search engine friendly.
This is very important for Google.
As is a good category structure.
Eg:
domain.com/t-shirts/girls/star-wars-t-shirt/
is far better than
domain.com/products.aspx?product_id=1
Here is some info:
http://msdn.microsoft.com/en-us/library/ms972974.aspx
http://www.wrox.com/WileyCDA/Section/id-305997.html
To answer your questions:
Dealing with a small number of products could be fine, but what if I had more than 1000 products? How could every product be crawled?
If you have a good sitemap / menu structure etc, it is likely that Google will crawl all your pages.
How does the Google spider/bot know that a product_id with a hypothetical number of 767 exists?
Via crawling your site, via your sitemap, via the menu system on the site, etc. However, always remember: Google is not psychic - it cannot find a page unless you tell it how to / link to it.
I've been Googling this, but I still can't understand how pages that have absolutely no reference on the site or on external sites can be crawled. If this were possible, the spider would have to know how to read the website's database tables, but I guess that this is not the case.
If you have no reference - you are doing something wrong. Improve your site structure.
At this point, since most of the pages and links are dynamic, how could they be indexed? The same thing applies to “user detail” pages that are accessed via a query string using a “user_id=n”.
Nothing wrong with a dynamic URL per se - but again I would recommend implementing search engine friendly URLs via Mod Rewrite or similar - see the above resources and the sketch below.
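Since your site is ASP.NET, the IIS counterpart of Mod Rewrite is the URL Rewrite module. A hypothetical web.config rule (the rule name and URL pattern are my own examples, not from your project) could map a friendly URL onto your existing query-string page:

<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- /products/767/some-product-name -> products.aspx?product_id=767 -->
        <rule name="FriendlyProductUrl" stopProcessing="true">
          <match url="^products/([0-9]+)/" />
          <action type="Rewrite" url="products.aspx?product_id={R:1}" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>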
Good luck,
Colin
Modern systems optimize for SEO by allowing either custom or automated URLs that remap to your ID-based URL pattern. This URL style allows a fully custom, word-for-word product title or keyword/description, which carries more weight than a random ID number in a URL.
To ensure all individual pages are indexed, you generally benefit most from submitting or making available a sitemap.xml file. More info from Google on generating one here:
https://code.google.com/p/googlesitemapgenerator/
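For reference, a minimal sitemap.xml sketch with a single hypothetical product URL (the date is a placeholder, and lastmod is optional):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.myecommence.com/products.aspx?product_id=767</loc>
    <lastmod>2013-01-01</lastmod>
  </url>
</urlset>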
Hope that gets you going in the right direction!

Can search engines index pages generated by server side code?

I'm guessing a site like Stack Overflow doesn't keep an HTML file around for every question ever asked. Instead, server-side code creates the page every time a question is clicked on (I think). Is it possible for search engines to index every question on Stack Overflow, or would a page per question need to be kept in the directory so the search engine can crawl it?
Yes. Search engines can index dynamically generated pages no problem. In fact, from the search engine bot's perspective, it can't really even distinguish between a dynamically generated page and a static one.
You might be interested by the Dynamic URLs vs. static URLs post on the Official Google Webmaster Central Blog.
Yes it's perfectly possible - when a link is followed the server returns HTML just like any other web page. The only difference is that the server generated it, rather than a person.
As far as the client (be it a browser or search engine) is concerned, there is no difference between a server-generated page and a static file. They're virtually indistinguishable (depending on how the page is generated, it might be missing Last-Modified headers, etc). As such, yes, search engines can index generated pages without a problem.
That said, there is something to be said for giving them a hint. Using sitemaps, for example, gives a search engine a nice listing of all your pages, so it's less likely to miss them. More importantly, it can summarize last modified times, to focus the search engine's attention on what has changed recently. This isn't mandatory, but it does help - regardless of whether the pages are static HTML or generated.
Any link that uses a GET can be followed by most crawlers. Anything that requires a POST will generally be ignored.
The mechanism for generating the page is irrelevant.
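As a tiny illustration (the URLs here are made up), the first element below can be followed by a crawler, the second generally cannot:

<!-- a plain GET link: crawlable -->
<a href="/questions/123/some-question">View question</a>
<!-- a POST form: crawlers generally won't submit it -->
<form method="post" action="/questions/123/vote">
  <input type="submit" value="Vote">
</form>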
Yes, if it's not restricted by robots.txt or meta tags. A search engine requests the web page like a normal user; no one has access to the server-side code (if your site isn't hacked).
Search engines can see pretty much anything on a given Web page that isn't hidden behind client-side code (i.e., JavaScript).
So, if there's a URL that you can enter into your browser's address bar to get this page, and this page is linked to from somewhere, a search engine will find it and "see" the same content that you do. The fact that the page was generated dynamically by a server is irrelevant to a search engine, since what is sent to a browser upon requesting a URL is still just an HTML file.
In other words, that HTML file doesn't exist in the same form on the server - it's actually some server-side code that generates HTML, not a static HTML file - but that's not what a search engine is crawling through and indexing anyway; rather, it follows links to document URLs, which are exactly what you see in your browser's address bar.

Is there a way to prevent Googlebot from indexing certain parts of a page?

Is it possible to fine-tune directives to Google to such an extent that it will ignore part of a page, yet still index the rest?
There are a couple of different issues we've come across which would be helped by this, such as:
RSS feed/news ticker-type text on a page displaying content from an external source
users entering contact details (phone numbers etc.) who want them visible on the site but would rather they not be Google-able
I'm aware that both of the above can be addressed via other techniques (such as writing the content with JavaScript), but am wondering if anyone knows if there's a cleaner option already available from Google?
I've been doing some digging on this and came across mentions of googleon and googleoff tags, but these seem to be exclusive to Google Search Appliances.
Does anyone know if there's a similar set of tags to which Googlebot will adhere?
Edit: Just to clarify, I don't want to go down the dangerous route of cloaking/serving up different content to Google, which is why I'm looking to see if there's a "legit" way of achieving what I'd like to do here.
What you're asking for can't really be done; Google either takes the entire page or none of it.
You could do some sneaky tricks, though, like putting the part of the page you don't want indexed in an iframe and using robots.txt to ask Google not to index that iframe's source.
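A sketch of that trick, using ticker.html as a hypothetical name for the fragment you don't want indexed. In robots.txt:

User-agent: *
Disallow: /ticker.html

And in the page itself:

<iframe src="/ticker.html" title="News ticker"></iframe>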
In short: NO - unless you use cloaking, which is discouraged by Google.
Please check out the official documentation from here
http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html
Go to section "Excluding Unwanted Text from the Index"
<!--googleoff: index-->
here will be skipped
<!--googleon: index-->
I found a useful resource for handling certain duplicate content and keeping such content from being indexed by the search engine:
<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index-->
On your server, detect the search bot by IP using PHP or ASP. Then feed the IP addresses that fall into that list a version of the page you wish to have indexed. In that search-engine-friendly version of your page, use the canonical link tag to specify to the search engine the page version that you do not want indexed.
This way, the page with the content you do not want indexed will be indexed by address only, while only the content you wish to be indexed will actually be indexed. This method will not get you blocked by the search engines and is completely safe.
Yes, you can definitely stop Google from indexing some parts of your website by creating a custom robots.txt file and listing in it the portions you don't want indexed, such as wp-admin, or a particular post or page. Before creating it, check your site's existing robots.txt, for example at www.yoursite.com/robots.txt.
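A minimal robots.txt sketch along those lines; the paths are examples only:

User-agent: *
# keep admin screens out
Disallow: /wp-admin/
# block one particular page
Disallow: /some-private-page/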
All search engines either index or ignore the entire page. The only possible way to implement what you want is to:
(a) have two different versions of the same page
(b) detect the browser used
(c) if it's a search engine, serve the second version of your page.
This link might prove helpful.
There are meta tags for bots, and there's also robots.txt, with which you can restrict access to certain directories.