Googlebot indexing my localhost dev machine - where did it find my IP? - seo

I've discovered Googlebot indexing my development site (home PC) via its IP address. Surprise. I've changed my .htaccess file to prevent future access, but...
How did Googlebot find me anyway? I made a request to Google to index my live site, but there shouldn't be any links to my IP anywhere on the web.
The only place my IP is listed on my site is in a PHP function that is used to exclude my address from being logged. Can Googlebot (or any bot) harvest IP addresses from raw PHP code?

It is unlikely that IP addresses can be harvested from your PHP code, because the web server executes the PHP script and sends only the result to the browser.
But there are lots of bots that just scan random IP addresses on port 80 looking for vulnerable software, often using Google's user agent string. Did you check whether the request's IP address actually belongs to Google? There is even a search engine for IP addresses that have web servers running; you could check whether your own host is listed: http://www.shodanhq.com
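One way to check whether a request really came from Google is a reverse DNS lookup followed by a forward lookup of the returned name. The sketch below assumes genuine Googlebot hosts end in googlebot.com or google.com; the function name and the injectable resolver parameters are made up for illustration, and it only handles IPv4.

```python
import socket

# Assumption: genuine Googlebot hosts reverse-resolve to one of these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google host name
    and that host name resolves back to the same ip."""
    try:
        host = reverse(ip)[0]          # PTR lookup: IP -> host name
    except OSError:                    # no PTR record or lookup failure
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]   # forward lookup: host name -> IPs
    except OSError:
        return False
    # A spoofed PTR record will not resolve back to the original IP.
    return ip in addresses
```

Calling `is_real_googlebot("...")` with an address taken from your access log should be enough to tell a real crawler from a scanner that merely borrows the user agent string.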
It is generally a bad idea to let a development server listen on 0.0.0.0 (all network interfaces), because that can expose it to the Internet. If you don't need to access it from the outside, let it listen on 127.0.0.1 only; otherwise you could run into trouble if you don't update it often.
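To illustrate the difference, here is a minimal sketch of a development server bound to the loopback address only, using Python's built-in http.server rather than Apache. The helper name and the port are arbitrary choices for this example; in Apache the equivalent setting would be a `Listen 127.0.0.1:80` directive.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_dev_server(port=8000, host="127.0.0.1"):
    """Create a server that accepts connections on the loopback
    interface only, so other machines cannot reach it."""
    return HTTPServer((host, port), SimpleHTTPRequestHandler)


if __name__ == "__main__":
    server = make_dev_server()
    print("Serving on http://%s:%d" % server.server_address)
    server.serve_forever()
```

Passing `host="0.0.0.0"` instead would make the same server reachable on every interface, which is the situation described above.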

Related

IP address not responding, but 127.0.0.1 responding: Apache

I currently have Apache running as part of XAMPP and I am able to run the PHP scripts by accessing them at 127.0.0.1/<program_name>.php but when I try to access them as <my_ip>/<program_name>.php I get no response.
Am I doing something incorrectly or does my configuration need fixing?
Assuming you are trying to access it from an external IP address, you need to set up port forwarding on your router to send web traffic to the LAN IP of your machine.
You may also need to disable firewalls at various points in your network, or allow web traffic through them.
In short, there is not enough information here to give you a definitive answer.

Finding domain names from IP addresses in Apache access.log

I am using Webalizer on Ubuntu to generate Apache access.log reports for my site. Webalizer only gives me the IP addresses from which my site was accessed. I want the domain names for those IP addresses.
Is there any way to get the domain names? Is there some other tool that I need to use? I tried using Awstats, but my server admin told me not to because it changes the access rights of various files. I cannot use Google Analytics or Piwik either.
I tried nslookup, but how can I use nslookup on the access.log file? I don't know how to do this. Can anyone help me out?
If you just want to see who a single IP address is registered to, copy and paste the IP address into the upper right field at arin.net.
If you have a bunch of IP addresses to look up, there are other web analytics software products you can use - check out Angelfish Software.
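If a script is acceptable, the lookup that nslookup does by hand can be run over the whole log. This is a rough sketch, assuming the common/combined log format where the client IP is the first field on each line; the function names are made up for this example, and many addresses will simply have no reverse DNS entry.

```python
import socket


def client_ip(log_line):
    """Return the first whitespace-separated field (the client address
    in the common/combined log format), or None for an empty line."""
    parts = log_line.split(None, 1)
    return parts[0] if parts else None


def resolve_log_hosts(lines, reverse=socket.gethostbyaddr):
    """Map each distinct client IP in the log to its reverse DNS name,
    or to None when no name can be found."""
    cache = {}
    for line in lines:
        ip = client_ip(line)
        if ip is None or ip in cache:
            continue                      # skip blanks and repeat visitors
        try:
            cache[ip] = reverse(ip)[0]    # PTR lookup: IP -> host name
        except OSError:
            cache[ip] = None              # no reverse entry for this IP
    return cache


if __name__ == "__main__":
    # The path is an assumption (the usual Ubuntu default); adjust as needed.
    with open("/var/log/apache2/access.log") as log:
        for ip, name in resolve_log_hosts(log).items():
            print(ip, name or "(no reverse DNS)")
```

Each address is looked up only once, which matters because reverse lookups can be slow on a large log.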

How do I redirect a specific port for my subdomain to another IP address

Ok so I have a domain registered, for these purposes I will refer to it as mydomain.com.
I also have Shared Hosting (just fyi) so I may be restricted in doing what I am planning.
So basically I have a sub-domain, gserver.mydomain.com, which points to a directory on the host server and shows a separate website for this subdomain, displaying information about its corresponding gameserver.
Since it's about a gameserver, naturally I would want gserver.mydomain.com to also direct users to the gameserver's IP, but I can't point it to both the web server and the game server in the zone record, as they are separate IPs.
If the gameserver listens on, let's say, port 2400, is it possible to have gserver.mydomain.com:2400 point to another IP (the gameserver's IP) while still retaining the web host's IP on port 80?
I have a general idea of how to go about it but with the current Hosting Plan, restrictions may be preventing me.
As far as DNS is concerned, it's not possible to use port (TCP or UDP) information, as DNS basically handles only names and IP addresses.
So gserver.mydomain.com will always be resolved to the IP in the DNS database, regardless of the :port. Actually, the :port is not part of the DNS name.
If all of your servers are HTTP servers and you have access to an Apache web server, you can use something like ProxyPass.
You can take a look at this link http://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypass
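To illustrate the point that the port is not part of the DNS name, this small sketch splits a URL the way a client does before it resolves anything. The helper name is made up for this example; only the host name returned here would ever be sent to DNS.

```python
from urllib.parse import urlsplit


def host_and_port(url):
    """Split a URL into the host name (the only part DNS ever sees)
    and the port (used later, when the client opens the connection)."""
    parts = urlsplit(url)
    return parts.hostname, parts.port


# Both URLs yield the same host name, so both resolve to the same IP.
print(host_and_port("http://gserver.mydomain.com/"))
print(host_and_port("http://gserver.mydomain.com:2400/"))
```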

Error with DOJO when using IP

Strange error with a project using dojo:
If I call http://localhost/project, everything works as expected.
If I call http://127.0.0.1/project, everything works as expected.
If I call http://192.168.2.1/project, I get the following error (ONLY in IE6!):
"Bundle not found, locale.."
Any ideas?
I am running Zend Server CE with PHP 5.2.
If I add 192.168.2.1 to "hosts" (Windows), it works.
It sounds like Zend Server is performing some kind of virtual site support, using the site name as a partial domain.
I can't say 100% if or how it does, because I don't use Zend, but I can explain the principle using Apache as an example.
There are 3 ways in which a web site can be virtually hosted under a single web server application, this applies to most servers on the market today, Apache, IIS, nginx and many others.
It all boils down to one thing, giving one running server application instance the ability to host multiple individual websites.
The 3 methods of separating sites are as follows:
By IP address: If you have multiple IP addresses (usually, but not always, because you have multiple network interface cards), then you can tell your server application to listen on one IP for one site, another IP for another site, and so on. If you browse to one IP you'll get one site, and if you browse to the other IP you'll get the other.
By port number: If you're using only one IP address, then you can bind to multiple port numbers. Port 80 is generally the default for web servers, but by browsing to an address with the port number appended (http://mysite.com:99) you force the browser to use that port. You can then have multiple websites listening on different ports and select them manually at browse time as required.
By host name header: This is by far the most common way of supporting multiple sites. All web servers that understand the HTTP/1.1 protocol have to obey a header field in the request that contains the host name. When a request comes in for, e.g., http://mysite.com/, there will be an entry in the request header that looks like 'Host: mysite.com'. The web server can then use that to work out which site is meant, and it selects and serves the correct website.
The problems start to arise however when you start to use IP addresses that generally cannot be resolved or have no DNS name, because the web server then doesn't know which hostname to tag it to.
As an example in Apache, if you set up a virtual host, then try to browse that server using just the IP address, you'll get the default server, which in many cases won't even be configured to respond correctly or display anything.
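The host name header method, and the fall-through to the default server when you browse by IP, can be sketched roughly as follows. This is not how Apache is implemented; the function, the site table and the paths are made up for illustration, and the port stripping assumes host names or IPv4 addresses only.

```python
def select_site(host_header, vhosts, default_site):
    """Pick a site from the Host request header, falling back to the
    default site when the name matches no configured virtual host."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    name = (host_header or "").partition(":")[0].strip().lower().rstrip(".")
    return vhosts.get(name, default_site)


# Hypothetical configuration: one named site plus a default.
vhosts = {"mysite.com": "/var/www/mysite"}
default_site = "/var/www/default"

print(select_site("mysite.com", vhosts, default_site))    # named site
print(select_site("192.168.2.1", vhosts, default_site))   # default site
```

A request made with a bare IP address carries that IP in the Host header, so it matches none of the configured names and lands on the default.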
To compound this, going up to web application layer, many frameworks also do their own checks on hostnames and other variables passed to them by the web server, and many make decisions on how to operate based on this information.
If you've gotten to the default web application by IP address, then there's a high chance that the framework may get confused at being presented with an IP address as a host name.
As the OP noted, in many cases you can add a name to your hosts file and use this as a poor man's DNS substitute. The file to modify can be found in the following locations:
c:\windows\system32\drivers\etc\ - on Windows
/etc/ - on Linux/Unix
The file is generally just called 'hosts' and is a plain text file. Adding a line like this (the address is only a placeholder; use your server's real IP):
123.456.789.123 myserver
will tie http://myserver/ to http://123.456.789.123/
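For illustration, the format of the hosts file is simple enough to read with a few lines of code. This sketch uses a made-up function name and sample addresses; it keeps the first address seen for each name, which is an assumption about lookup order, not something taken from any particular operating system's documentation.

```python
def parse_hosts(text):
    """Parse hosts-file text into a {name: address} dictionary.
    Each non-comment line is: address, then one or more names."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments and blanks
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            # Assumption: the first entry for a name wins.
            mapping.setdefault(name.lower(), address)
    return mapping


sample = """
# sample hosts file
127.0.0.1    localhost
192.168.2.1  myserver myserver.local   # dev machine
"""
print(parse_hosts(sample)["myserver"])
```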
If you can, and you're doing a lot of web applications, it may be worth setting up your own DNS server. Most Linux distros will allow you to install 'Bind', and I believe there is a version available for Windows too.
I'm not going to go into the pros and cons of private DNS servers here, as it's a whole other subject in itself, but if you're likely to be making a lot of additions to your hosts file, then in the long run you'll find it a better option.

What is happening when you enter

First, URL stands for Uniform Resource Locator. IP addresses are very difficult to remember, so names such as www.intrepidkarthi.com are used instead. A URL normally contains three parts. Take http://intrepidkarthi.com/index.php as an example: "http" is the protocol, then comes the server name, and then the requested file name.
Here is my understanding of what happens behind the browser.
The flow of work
Your browser communicates with a name server to translate the server name "www.intrepidkarthi.com" into an IP address, which it uses to connect to the server machine. Your browser first checks whether it already has the appropriate IP address cached from previous visits to the site. If not, it makes a DNS query to your DNS server (which might be your router or your ISP's DNS server). DNS stands for Domain Name System. For example, if you want karthik's phone number, you look it up in your telephone directory. Likewise, your computer doesn't know intrepidkarthi.com's IP address, so it looks it up in DNS.
The browser then opens a connection to the server at that IP address on port 80. HTTP uses port 80 by default.
The browser sends a GET request to the server, asking for the file "http://www.google.com/karthikeyan.htm". The webserver then returns the requested page and your browser renders it to the screen.
The firewall will control connections to & from your computer. For the most part it will just be controlling who can connect to your computer and on what ports. For web browsing your firewall generally won't be doing a whole lot.
Your router essentially guides your request through the network, helping the packets get from computer to computer and potentially doing some NAT (Network Address Translation) to translate IP addresses along the way (so that a request from your internal LAN can be passed onto the wider internet and back).
I don't know whether what I have understood is correct. I need to understand it completely, down to the hardware level.
The DNS cache lives mainly in your operating system's resolver rather than in the browser, although some browsers also keep a short-lived cache of their own.
A server name in DNS may have many IP addresses. Browsers usually choose one at random.
DNS is a tree. To resolve www.google.com, you ask the name servers for google.com for the IP address of the host www.
The returned HTML page is only a small part of the information. It in turn makes your browser open many connections to other servers to fetch scripts, pictures, etc.
Otherwise okay.
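The flow described in the question (resolve the name, connect to port 80, send a GET request, read the reply) can be sketched with plain sockets. This is a simplified illustration for plain http:// URLs only, with made-up function names; a real browser also handles HTTPS, redirects, caching and much more.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_bytes) for a plain HTTP GET of url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"      # lets the server pick the right site
        "Connection: close\r\n"
        "\r\n"
    )
    return parts.hostname, parts.port or 80, request.encode("ascii")


def fetch(url):
    """Resolve the host, connect, send the request, return the raw reply."""
    host, port, request = build_get_request(url)
    # create_connection asks the OS resolver for the host's addresses
    # and tries them in turn until one connects.
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                 # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Note that the request line carries only the path, while the server name travels separately in the Host header.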