Stop Google from indexing files from our second server

I've been scouring the web for the right terms for this question, but after a few hours I decided to post it here.
The scenario is:
We have a website running on two servers, so the files are synchronized between them. The second server is for internal purposes. Let's call the first server www and the second ww2. ww2 is automatically updated whenever the files on www are updated.
Now Google is indexing ww2, which I want to stop; only www should be crawled and indexed. My questions are:
1. How can I get the pages from ww2 that have already been crawled removed from Google's index?
2. How can I stop Google from indexing ww2?
Thanks guys!

You can simply use robots.txt to disallow crawling. There is also a robots meta tag that Google obeys.
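For example, a minimal sketch of both approaches (assuming ww2 serves its own copy of robots.txt that the sync from www does not overwrite; all names here are placeholders):

    # robots.txt at the web root of ww2 only -- tells well-behaved crawlers to stay off the whole host
    User-agent: *
    Disallow: /

    <!-- or, per page, a robots meta tag in the <head> of pages served by ww2 -->
    <meta name="robots" content="noindex, nofollow">

Keep in mind that robots.txt only stops crawling; pages that are already indexed may linger until you request their removal.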

For your first question, Google has a removal tool in their Webmaster Tools; here is more info about it: Remove page or site from Google's search results
For the second question, you can either use a robots.txt file to block Google from crawling that server (more info here: Blocking Google) or restrict access to the server altogether.
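Because the files are synchronized between the two servers, it can be easier to handle this per host at the web server level than to maintain a different robots.txt on ww2. A sketch, assuming Apache 2.4 with mod_headers enabled and that ww2 answers on its own hostname (hostname, paths, and addresses are placeholders):

    # Virtual host for the internal server only
    <VirtualHost *:80>
        ServerName ww2.example.com
        DocumentRoot /var/www/site

        # Ask search engines not to index anything served from this host
        Header set X-Robots-Tag "noindex, nofollow"

        # Or go further and restrict access to internal addresses entirely
        # <Location />
        #     Require ip 10.0.0.0/8
        # </Location>
    </VirtualHost>

Restricting access is the most reliable option: if Google cannot reach ww2 at all, it will eventually drop those URLs from the index.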

Related

Cross site scripting in Domino?

I have a Domino site that is getting high-severity cross-site scripting findings from AppScan.
We don't have a license to run AppScan; another group has to do that (yeah, big corporations :) ). But I have noticed that IE will also complain with a URL such as:
http://myserver.com/cld/cldg.nsf/vwTOC?OpenView&Start=28
(IE will warn you about cross-site scripting with such a URL).
I noticed that the notes.net forum site does not produce such an error in IE when I try to inject script tags. I guess it must scrub the URL before the page is rendered? How is this done on the notes.net forum? Is it done at the server level or at the database level?
I did find this thread
How to avoid a XSP/Domino Cross-Site Scripting Vulnerability?
where Steve mentions his blog and web rules, but the blog says they are not needed in 8.5.4 and above. Am I understanding that right? If so, we are at 8.5.4. Is there something I still need to do to scrub my URL?
Edit: We are at 8.5.3, not 8.5.4. I was mistaken. Our admin is going to try Steve's suggestions.

Preventing one domain from being indexed by search engines

I have a site that is available through two domains. One is a domain I got for free with a hosting plan and don't want to promote. However, when I perform search queries, pages from my site on that domain pop up.
What techniques are there to keep that one domain from being indexed while the other is indexed normally? Remember, it's one hosting space with the exact same files on it.
I have already submitted it in Google Webmaster Tools, but obviously that only works for Google.
I would set up a sitewide 301 redirect from the domain you don't want to use to the other one. That way you will remove it from the index and also move people to the correct domain. You can probably do it in the .htaccess file (Apache server). I'm not at my computer at the moment, so I can't easily give you the exact commands.
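For reference, a minimal .htaccess sketch of such a sitewide 301 (assuming Apache with mod_rewrite enabled; the domain names are placeholders):

    RewriteEngine On
    # Requests that arrive on the free/unwanted domain...
    RewriteCond %{HTTP_HOST} ^(www\.)?unwanted-domain\.example$ [NC]
    # ...get permanently redirected to the same path on the preferred domain
    RewriteRule ^(.*)$ http://www.preferred-domain.example/$1 [R=301,L]

Since both domains point at the same files, the condition on HTTP_HOST is what keeps the redirect from looping on the preferred domain.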

SharePoint 2010 not searching content within files

Having made sure that all the proper indexing options are set, my dev install of SP2010 is still not searching the content of Word docs, only the titles. Any suggestions?
Does your crawl account have sufficient permissions to access the files attached to the list items? Are you crawling your site as a SharePoint site or as a web site (in the latter case you need to make sure that you have links pointing to the documents)?
Do you have a robots.txt file at the root of your web application with exclusion rules that might prevent the content from being properly crawled?
If you really want to know what's happening while the crawler is doing its job, you can install Fiddler on your dev machine and point the proxy settings of your Search Service Application at the proxy created by Fiddler. Doing so will let you watch in real time which URLs/content are being crawled and the HTTP status codes being returned, to diagnose permission or content issues.
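As a quicker first check before the Fiddler route, you can also request one of the problem documents directly with the crawl account's credentials and look at the status code you get back. A PowerShell sketch (the URL and account below are placeholders for your own values):

    # Placeholder URL of a Word document whose content is not being indexed
    $url  = "http://sp2010-dev/sites/team/Shared%20Documents/sample.docx"
    # Placeholder crawl account credentials
    $cred = New-Object System.Net.NetworkCredential("svc_crawl", "password", "DOMAIN")

    $request = [System.Net.HttpWebRequest]::Create($url)
    $request.Credentials = $cred
    $request.Method = "HEAD"   # only the status code matters here, not the body
    try {
        $response = $request.GetResponse()
        "Status: $([int]$response.StatusCode)"   # 200 means the crawl account can reach the file
        $response.Close()
    } catch [System.Net.WebException] {
        "Request failed: $($_.Exception.Message)"   # a 401/403 here points at a permissions issue
    }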
Hope it helped.

FAST Search for SharePoint crawler issue with DokuWiki pages

My level of frustration is maxing out over crawling DokuWiki sites.
I have a content source in FAST Search for SharePoint that I set up to crawl a dokuwiki/doku.php site. My crawl rules are set to http://servername/*, match case, include all items in this path, and crawl complex URLs. Testing the content source against the crawl rules shows that it will be crawled. However, the crawl always lasts under 2 minutes and completes having crawled only the page I pointed it to and none of the links on that page. I have checked with the DokuWiki admin and he has the robots setting set to allow. When I look at the source of the pages I see that it says
<meta name="robots" content="index,follow">
So, to verify that the other linked pages were not the problem, I added those links to the content source manually and recrawled. For example, the source page has three links:
site A
site B
site C.
I added the site A, B, and C URLs to the crawl source. The result of this crawl is 4 successes: the primary source page and the links A, B, and C that I added manually.
So my question is: why won't the crawler follow the links on the page? Is this something I need to change in the crawler on my end, or does it have to do with how namespaces are defined and links are constructed in DokuWiki?
Any help would be appreciated
Eric
Did you disable the delayed indexing options and rel=nofollow options?
The issue was around authentication, even though nothing in the FAST crawl logs suggested that authentication was the problem.
The fix was adding a $freepass setting for the IP address of the search indexing server so that Apache would not go through the authentication process for each page hit.
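For anyone hitting the same wall: if the exemption is done in the Apache configuration itself rather than through the wiki's $freepass variable, a sketch of the idea (Apache 2.4 syntax; the IP address and paths are placeholders) would be:

    # Let the FAST indexing server bypass the basic auth protecting the wiki
    <Location /dokuwiki>
        AuthType Basic
        AuthName "Internal wiki"
        AuthUserFile /etc/apache2/htpasswd
        # Multiple Require lines are ORed by default: either a valid login
        # or a request coming from the indexer's IP address is let through
        Require valid-user
        Require ip 192.0.2.10
    </Location>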
Thanks for the reply
Eric

Site down because of moving to another host: bad for SEO?

I have bought an iPad website and it has been moved to my server.
Now I have tried to create an addon domain, but it does not work on my first hosting account.
On my second hosting account it works, but there is another iPad website on that server, so I don't think that is smart to do because they would share the same IP address.
So adding an addon domain does not work, and the site is down now!
I have submitted a service ticket, but I think it will take at least 8 hours before I get an answer.
Can anyone tell me how bad this is for my SERP position in Google?
The website has always been on the first page.
Will this 404 error hurt my site? Or is it better to place the site on the same server as the other iPad website?
It is not ideal to serve 404s/timeouts; however, your rankings should recover. You mentioned that the sites are different. Moving the site to a different server/IP shouldn't matter too much as long as you can minimize the downtime of the move (and moving is probably preferable to downtime, if possible). To be clear: do NOT serve site #2 as site #1 in the short term, as you will run into duplicate content issues.
If you don't already have one, you might open a Google Webmaster Tools account. It will provide you with some diagnostics about your outage (e.g. how many fetches Google attempted, the response codes returned, etc.), and if something major happens, which is unlikely, you can request re-inclusion.
I believe it is very bad if the 404 is a result of an internal link.
I cannot tell you anything about which server you should host it on, though, as I have no idea whether that scenario is bad. Could you possibly host it on the one server and then, when the other is up, host it from there?