Remove Google indexing from our image server - SEO

We do a lot of email marketing, and sometimes developers will put the HTML file out on the image server (I know the easy answer is to not do this), but those HTML files end up getting indexed by Google and eventually rank high in search results, which in turn makes the SEO companies want us to remove these pages. Is it possible to have Google not index anything from our subdomain? We have image.{ourUrl}.com where we put all these files.
Would putting a robots.txt file in the main directory do it? Or would we need to add a robots.txt file in every directory?
Is there an easy way to blanket this?

A robots.txt file would only stop crawling; files that are linked from elsewhere might still end up in the index. A noindex directive would work, and for non-HTML files you can send it as an X-Robots-Tag response header. See here https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
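As a sketch of the blanket approach, assuming the subdomain runs Apache with mod_headers (an assumption; the domain name below is a placeholder for your own), you can send the header on every response from that vhost:

```apache
# Hypothetical vhost config for the image subdomain (substitute your domain).
# Sends X-Robots-Tag: noindex on every response served from this host,
# which tells Google not to index any file it fetches from here.
<VirtualHost *:80>
    ServerName image.example.com
    DocumentRoot /var/www/images

    <IfModule mod_headers.c>
        Header set X-Robots-Tag "noindex, nofollow"
    </IfModule>
</VirtualHost>
```

Note that, unlike robots.txt, Google has to be able to crawl a URL to see this header, so don't also Disallow the same URLs in robots.txt.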

Related

Hide PDFs from Google and Smart Search in Kentico

Not sure if this is even possible but wanted to give it a shot.
Is it possible to add PDFs and other files to Kentico Media Library folder that wouldn’t be searchable through Google or another search engine? It also should not be searchable through Kentico's Smart Search.
Users should be able to access it ONLY in case they know the full URL.
I know I can add the path to robots.txt to disable indexing, but is there a more foolproof way?
Thanks.
By default, any files in the Media Library are not searchable by Kentico's smart search index (you need to add files to the Content Tree to be able to index them, or create a custom indexer yourself).
The robots.txt is a good first step; well-behaved search engines honor it as long as it's set up properly (though note it blocks crawling, not indexing, so URLs linked from elsewhere can still appear in results).
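As a sketch, assuming the media library lives under a /media/ path (an assumption; check your site's actual media library folder), the robots.txt entry would look like:

```
# robots.txt at the site root
# The /media/ path is a placeholder for your Kentico media library folder.
User-agent: *
Disallow: /media/
```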
If you want to take it a step further, modify the response the server sends for those files to include the header
X-Robots-Tag: noindex
There are more tags to look at here.
You can modify response headers through the URL Rewrite module in IIS.
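A sketch of how that can look with the IIS URL Rewrite module, using an outbound rule that writes the header only for PDF responses (the rule name and file-extension pattern are assumptions; adjust them to your media types):

```xml
<!-- web.config: hypothetical outbound rule for the IIS URL Rewrite module.
     Adds X-Robots-Tag: noindex to responses whose URL ends in .pdf. -->
<configuration>
  <system.webServer>
    <rewrite>
      <outboundRules>
        <rule name="NoIndexPdfs">
          <!-- Matching the (empty) response header lets us set its value. -->
          <match serverVariable="RESPONSE_X_Robots_Tag" pattern=".*" />
          <conditions>
            <add input="{REQUEST_URI}" pattern="\.pdf$" />
          </conditions>
          <action type="Rewrite" value="noindex" />
        </rule>
      </outboundRules>
    </rewrite>
  </system.webServer>
</configuration>
```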

Why does my website name change to Japanese characters when indexed by Google?

Is this malware or something? When I search for my site name as a keyword, the result shows up in Japanese characters. Can you tell me why this happens?
First make sure that nobody has injected malicious code into one of your website's index files; titles rewritten into Japanese in search results are a common symptom of that kind of hack. Maybe there is code that redirects visitors to another site.
You can try:
make a fresh, clean index file
delete any strange files, or better, reinstall your site's templates
check the site in Google Search Console

.htaccess redirect hotlinked PDF files, no single rule for where

I've seen a lot of anti-hotlink strategies, but so far none where each file needs a unique redirect.
My employer's site has over 500 PDF files of original artwork for printable papercrafts which she offers for free, monetizing through ads.
What we're trying to prevent is others simply linking to our .pdf files and letting their users access our content without ever seeing our ads. The goal is to catch these external links and redirect them to our .html page which links to that file.
What makes this different from a lot of problems I've read is that while we want to get the user as close to the file they're seeking as possible, there is no calculable link between the file names of the .pdf requested, and the .html where they should land.
The best idea I've come up with so far, given my knowledge of .htaccess is to use the best mod_rewrite anti-hotlink strategy I can find to rename /PDF/file.pdf to something like /PDF/file.redirect, then write a separate redirect rule for each one, such as /PDF/fall-leaves.redirect to /seasons.html, and so-on.
Is there a better solution to this problem?
Thanks,
John
You can use a RewriteMap instead of a bunch of rules. See the Apache documentation for more details on how that works, but it's basically a lookup table.
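A sketch under these assumptions: Apache 2.4, hypothetical file names, and the point that RewriteMap itself must be declared in the server or virtual-host config (it is not allowed in .htaccess, though rules that use the map can live there):

```apache
# In the server/vhost config: declare a plain-text lookup table.
RewriteMap pdfpages "txt:/etc/apache2/pdf-pages.map"

# In .htaccess at the document root:
RewriteEngine On
# Only act on hotlinks: referer is set and is not our own site.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Look the PDF basename up in the map; fall back to the homepage if missing.
RewriteRule ^PDF/([^/]+)\.pdf$ ${pdfpages:$1|/index.html} [R=302,L]
```

The map file is just one "pdf-basename target-page" pair per line, so adding a new PDF means adding one line rather than one rule:

```
# /etc/apache2/pdf-pages.map (example entries)
fall-leaves  /seasons.html
snowman      /winter.html
```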

Google, do not index YET

While building a site on its actual live hosting platform, is there a way to tell Google not to index the website yet? I found the following:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
But would that tell them never to come back, or would they simply see the noindex tag and not list the result, then when they come back to crawl again later, once my site is good to go and I have removed the noindex, start indexing the site?
Sounds like you want to use a robots.txt file instead:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449&topic=2370588&ctx=topic
Update your robots.txt file when you want your content to be indexed.
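While the site is under construction, the blanket version of that robots.txt is just:

```
# robots.txt at the site root: blocks all well-behaved crawlers.
# Delete or empty this file at launch to allow crawling and indexing.
User-agent: *
Disallow: /
```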
You can use the robots.txt method.
You can specify which subpages may be crawled, and Google checks the file again before indexing, so you can delete the file later in order to get fully indexed.
More Information
About /robots.txt
Robots.txt File Generator
You can always change it later. The way Google and other robots find your page is through links from other pages; as long as nothing links to it, it won't be found. Also, once your site is up, chances are it will initially sit far back in the search results anyway.

Google crawling XML file

I need an XML sitemap file so Google can crawl and index my website. I'm using some software to generate the XML file. My question is: do I need to list all dynamic pages, like this:
http://mysite.com/page/?id=01
http://mysite.com/page/?id=02
http://mysite.com/page/?id=03
http://mysite.com/page/?id=04
http://mysite.com/page/?id=05
If yes, why is that? And what is going to happen if I don't include them and just list:
http://mysite.com/page/
If I include all the IDs, the result would be a huge XML file. Does Google accept such a large file, or is there a limit?
Thanks in advance for all help and time.
Google isn't going to index all your dynamic pages anyway. It will throw many of them out even if you put them in the sitemap.xml, because the content will be too similar.
There is a limit to the number of entries in a sitemap.xml file; it used to be ~50k pages / 10 MB per file. In my experience Google will crawl a few thousand and stop if they look too similar and have no inbound links.
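For reference, a minimal sitemap.xml sketch that lists only the main page (the URL is the asker's example; a real sitemap uses your own absolute URLs):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mysite.com/page/</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```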
You do not need an XML sitemap at all. It just makes it easier for Google to crawl your content.
And obviously you don't have to put dynamic pages in it.
If this is a real issue, try reading up on rel="canonical", which is made to exclude those types of pages from Google. While its usefulness depends on the use case, you may find it is the right solution for you.
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=139394
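A sketch of what that looks like, assuming the dynamic pages are variants of the same content (URLs taken from the asker's example):

```html
<!-- In the <head> of each dynamic variant, e.g. http://mysite.com/page/?id=01 -->
<!-- Tells Google which URL is the preferred version to index. -->
<link rel="canonical" href="http://mysite.com/page/" />
```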