Why does Google index this? [duplicate] - seo

On this webpage:
http://www.alvolante.it/news/pompe_benzina_%E2%80%9Ctruccate%E2%80%9D_autostrada-308391044
there is this image:
http://immagini.alvolante.it/sites/default/files/imagecache/anteprima_100/images/rifornimento_benzina.jpg
Why is this image indexed if robots.txt contains "Disallow: /sites/"?
You can see that it is indexed from this search:
http://www.google.it/images?q=rifornimento+benzina&um=1&ie=UTF-8&source=og&sa=N&hl=it&tab=wi&biw=1280&bih=712
P.S. The robots.txt is present both on the domain alvolante.it and on the subdomain immagini.alvolante.it.
P.P.S. This is NOT my website, so I can't use Google Webmaster Tools.

If this image has been hotlinked by another website, Google will crawl it there, despite the original domain's robots.txt.
There is really not much you can do beyond trying to prevent hotlinking through .htaccess, some other form of redirection, or an IIS directive. However, as you are not the owner of the original site, these alternatives are not viable.
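For reference, hotlink prevention of that sort might look like the following .htaccess sketch (only the owner of alvolante.it could deploy it, and it assumes Apache with mod_rewrite enabled):

RewriteEngine On
# Let requests with no referer through (direct visits, some proxies)...
RewriteCond %{HTTP_REFERER} !^$
# ...and requests referred by the site's own pages and subdomains.
RewriteCond %{HTTP_REFERER} !^https?://([^/]+\.)?alvolante\.it/ [NC]
# Everything else asking for an image gets a 403.
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]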

Related

MediaWiki on Subdomain (.htaccess rewrite)

I am using an Apache server (I can't configure Apache's root config files) and running my core website (Invision Power) on the root domain "example.com". We decided to expand our services with a wiki using MediaWiki, which is installed and can currently be reached at "example.com/w/".
I am utterly noobish with .htaccess and rewrite conditions/rules and am looking for help! We want our wiki to be accessed via wiki.example.com - and this URL should NOT change in the address bar. Each page (wiki.example.com/Main_Page) should be accessed like this.
Please keep in mind that we want our core website to keep working as it has for years now. So example.com and any other folders should not be affected by the rewrite rule.
Can someone please help - do you need any further information?
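For what it's worth, the kind of rule being asked about might look like the sketch below - untested, and it assumes wiki.example.com points at the same document root as example.com and that mod_rewrite is available (MediaWiki's $wgServer and $wgArticlePath would need matching changes):

RewriteEngine On
# Only act on requests arriving at the wiki host...
RewriteCond %{HTTP_HOST} ^wiki\.example\.com$ [NC]
# ...that are not already under the real install path.
RewriteCond %{REQUEST_URI} !^/w/
# Rewrite internally, so the address bar keeps wiki.example.com.
RewriteRule ^(.*)$ /w/index.php?title=$1 [L,QSA]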

redirect to another domain, but keep the url [duplicate]

I have 2 vhosts: www.example1.com and www.example2.com
I have a page on www.example1.com/page.html
I want this page to be accessible on the other domain too, meaning I want people to be able to go to www.example2.com/page.html and view the page that is physically on the other vhost. But the URL must display www.example2.com.
I don't want to simply move the page physically to the other vhost, because it is a WordPress page and that would be a huge hassle.
Is this possible?
Yes, it's possible.
I suggest a reverse proxy to do that. Software like nginx does it very well, as does Apache of course (mod_proxy, as I recall).
Anyway, the downside of that approach is that your server (www.example2.com) would request the page from example1 and relay it itself, rather than the visitor fetching it directly from example1.
By the way, this technique is used for load balancing as well.
Edit #1:
In nginx it's called ngx_http_upstream_module.
Edit #2:
grahaminn's approach is also correct, but mine displays the URL "correctly" for every page - rather than one fixed URL, which would cause problems with, for example, bookmarking a specific page.
Two options:
Use Apache ProxyPass: http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
Use Rewrite with the [P] flag (requires mod_proxy to be installed), in the www.example2.com configuration:
RewriteRule ^/(.*)$ http://www.example1.com/$1 [P]
http://httpd.apache.org/docs/2.4/rewrite/flags.html
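For example, a ProxyPass setup for this question might look like the sketch below (it assumes mod_proxy and mod_proxy_http are enabled, and it goes in the www.example2.com virtual host):

<VirtualHost *:80>
    ServerName www.example2.com
    # Fetch the page from example1 and serve it under example2's URL.
    ProxyPass        /page.html http://www.example1.com/page.html
    # Rewrite any redirects from example1 so the client stays on example2.
    ProxyPassReverse /page.html http://www.example1.com/page.html
</VirtualHost>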
You can just serve a page from example2.com that uses JavaScript or an iframe to load the content from example1.com.

Using GoDaddy Domain Hosting to link to Amazon S3 Website [closed]

So I have some domains hosted at GoDaddy.com. I am trying to avoid paying for a hosting service, using a static Amazon S3 site instead (as I don't expect much traffic at all for these sites). I have had some success, but I'm not sure if this is a poor solution...
What I did for domain.com:
Set up permanent forwarding on GoDaddy to www.domain.com
Remove all DNS records except the A record pointing to the GoDaddy IP, and a CNAME of www to the Amazon S3 site
It works as planned if someone types in www.domain.com. It seems to work alright for domain.com, too. However, it seems to do a 302 redirect instead of a 301, even when I tell GoDaddy to make it a permanent forward. I could ultimately go to Google Webmaster Tools and say that I want it to use www.domain.com. However, that seems a little excessive.
Any suggestions on how to make this solution work better?
Possibly by changing some of the DNS settings or some other GoDaddy options that I don't know about?
You must name your S3 bucket the same as your domain, "www.example.com". Make sure you include the "www." subdomain prefix as part of the bucket name.
Set up your bucket as a website per Amazon's instructions. Make sure you have an "index.htm" file name entered and the correct bucket policy set up under Permissions.
Under the GoDaddy DNS settings, make just one entry: set host "www" as a CNAME pointing to "s3-website-us-east-1.amazonaws.com", or whatever S3 website endpoint Amazon supplies for your bucket. Leave off the "http://www.example.com." part at the start of the URL that Amazon supplies.
The last step, under Forwarding/manage, is to "forward only" your naked domain name "example.com" to "www.example.com".
If you did it right, your browser will display your site as "www.example.com" whether or not you typed the www when you entered the URL.
NOTE: You could just forward to your bucket using the complete bucket URL with "name masking"; however, most web crawlers will not see your complete site if you do it that way, and web searches will fail.
Be sure to wait at least 30 minutes before testing your changes, and by all means clear your browser's cache, or it will use the old address that it remembers from the past.
The problem of the DNS apex requiring an A record is definitely not well solved.
I can't personally vouch for them, but www.wwwizer.com hosts a free redirect service.

block search engines from indexing dev sites

I think one of my sites recently got delisted from Google because it found and started indexing my dev site, which is basically a replica of my main site (dev.site.com and site.com).
Anyway, is there a way to create one robots.txt that would prevent anything on dev.site.com from being indexed, while leaving site.com fully indexed?
I know I could just have separate robots.txt files for each, but it would be easier to have one that covers both, especially since I work with a whole lot of sites that have dev sites, and I would like an easy workflow that doesn't require changing the robots.txt files when I push new versions of a site live.
Perhaps you could serve the robots.txt file dynamically, e.g. via PHP:
<?php
// Send the right robots.txt depending on which host was requested.
header('Content-Type: text/plain');
if ($_SERVER['HTTP_HOST'] === 'dev.site.com') {
    // Dev site: block all crawlers.
    echo "User-agent: *\nDisallow: /";
} else {
    // Live site: allow everything.
    echo "User-agent: *\nDisallow:";
}
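For requests to /robots.txt to actually reach that script, an .htaccess rule along these lines would also be needed (assuming mod_rewrite; "robots.php" is just a hypothetical filename for the script above):

RewriteEngine On
# Hand robots.txt requests to the PHP script.
RewriteRule ^robots\.txt$ robots.php [L]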
Another approach is to add a line to the dev site's .htaccess file:
Header set X-Robots-Tag "noindex, nofollow"
This is advocated as superior to robots.txt, because with robots.txt alone, if there is a link to your dev site, search engines may still report that link (even though they do not index your site). This is advocated here:
http://yoast.com/prevent-site-being-indexed/
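Since you want one configuration covering both hosts, the header can also be set conditionally; here is a sketch using mod_setenvif and mod_headers (hostnames taken from your question):

# Flag requests that arrive on the dev host...
SetEnvIfNoCase Host ^dev\.site\.com$ DEV_SITE
# ...and send the noindex header only for those.
Header set X-Robots-Tag "noindex, nofollow" env=DEV_SITE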
It's part of the standard that each host must have its own robots.txt, so a subdomain like dev.site.com needs its own file (whereas site.com/dev would be covered by site.com's robots.txt).

Upgrading a site with SEO in mind

I'm managing an established site which is currently being upgraded (completely replaced), but I'm worried that I'll lose all my Google indexing (that is, there will be a lot of pages in Google's index which won't exist at that location any more).
The last time I upgraded a (different) site, someone told me I should have done something so that my SEO isn't adversely affected. The problem is, I can't remember what that something was.
Update for some clarification: Basically I'm looking for some way to map the old paths to the new ones. For example:
User searches for "awesome page"
Google returns mysite.com/old_awesome_page.php, user clicks it.
My site takes them to mysite.com/new_awesome_page.php
And when Google gets around to crawling the site again...
Google crawls my site, refreshing the existing indexes.
Requests old_awesome_page.php
My site tells Google that the page has now moved to new_awesome_page.php.
There won't be a simple 1:1 mapping like that; it'll be more like (old) index.php?page=awesome --> (new) index.php/pages/awesome, so I can't just replace the contents of the existing files with redirects.
I'm using PHP on Apache.
301 redirect all your old (gone) pages to the new ones.
You need to put some rewrite rules in an .htaccess file.
You can find lots of good information here. It's for Apache 1.3, but it works for Apache 2, too.
From that article, a sample for redirecting to files that have moved directories:
RewriteEngine on
RewriteRule ^/~(.+) http://newserver/~$1 [R,L]
This reads:
Turn on the rewrite engine.
For anything that starts with /~, followed by one or more of "anything", rewrite it to http://newserver/~ followed by that "anything".
The [L] means that the rewriting should stop after this rule.
There are additional flags that you can use to turn this into a permanent [R=301] redirect, which is what you want here so that Google updates its index.
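For example, the same sample as a permanent redirect (placeholder names as in the docs sample above):

RewriteRule ^/~(.+) http://newserver/~$1 [R=301,L]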
You could do:
RewriteEngine on
RewriteRule ^old_page\.php$ /new_page.php [R=301,L]
But you'd have to have a rule for every page. To avoid this, I'd look at using Regular Expressions, as in the first example.
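For the index.php?page=awesome --> index.php/pages/awesome mapping described in the question, a sketch might look like this (note that RewriteRule cannot see the query string, so it has to be matched with RewriteCond; the trailing "?" drops the old query string from the target):

RewriteEngine on
# Match old-style URLs such as /index.php?page=awesome...
RewriteCond %{QUERY_STRING} ^page=([^&]+)$
# ...and 301 them to the new path-style URL.
RewriteRule ^index\.php$ /index.php/pages/%1? [R=301,L]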
You can tune Google's view of your site, and probably notify it of changes, from within Google Webmaster Tools. I think you should build a sitemap of your new site, and have it verified when the site changes.