Googlebot-Image/1.0 requesting multiple images - apache

For a while I have noticed that the Google image bot requests a bunch of images in a single request. This request always ends in a 404, even though all the images exist.
The request URL consists of a comma-separated list of URLs.
Here is a line from the Apache access.log:
66.249.76.96 - - [21/Nov/2018:15:25:14 +0100] "GET /images/img1.jpg,https://example.com/images/img2.jpg,https://example.com/images/img3.jpg HTTP/1.1" 404 10459 "-" "Googlebot-Image/1.0"
Is this request type even possible? And how can I fix the server to serve the images?
Thanks in advance.
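To see how often this happens and which images are involved, the log can be scanned for request targets that contain several comma-joined URLs. The following is only a minimal sketch: it assumes the common/combined log format shown above, and the names LOG_RE and split_joined_targets are made up for illustration. Splitting on every comma is naive, since a comma can legitimately appear in a URL.

```python
import re

# Leading fields of the common/combined log format:
# host ident user [time] "METHOD target PROTOCOL" status bytes ...
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def split_joined_targets(line):
    """Return the individual URLs if the request target looks like a
    comma-joined list, otherwise an empty list."""
    match = LOG_RE.match(line)
    if match is None:
        return []
    parts = [part for part in match.group("target").split(",") if part]
    # A single part means an ordinary request, not a joined list.
    return parts if len(parts) > 1 else []
```

Run against the log line above, this yields the three image URLs as separate entries, which makes it easier to look for the page whose markup lists them together.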

Related

Bigger HTTPS size

In the Apache access log, I found that HTTPS returns a bigger response size than HTTP:
210.10.0.156 - - [29/Apr/2019:12:22:46 +0800] "GET /robots.txt HTTP/1.1" 200 5837 "-" "curl/7.52.1"
As you can see, it is 5837 bytes, whereas over HTTP it is less than 1000 bytes.
My robots.txt contains only:
User-agent: *
Disallow: /
Is this normal?
When I tried the same thing on another server with cPanel installed, the size was much lower. I'm not sure what configuration I missed. Any advice?
Yes, this is perfectly normal.
Your website may not be configured to serve content over HTTP and may instead redirect visitors to HTTPS with a 301 or 302 rule. In that case, all a client gets over HTTP is a redirect response, which is usually smaller than the regular page it was expecting.
You can go to your website with http:// or https:// at the start of the URL and see if it looks any different.
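The same check can be scripted. Here is a rough sketch using Python's standard http.client, which does not follow redirects, so a 301 or 302 shows up as-is. The function name probe and the returned dictionary keys are made up for illustration.

```python
import http.client


def probe(host, path="/robots.txt", use_tls=False, port=None, timeout=10):
    """Fetch one URL without following redirects and report what came back."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    # port=None lets http.client pick the default (80 or 443).
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"User-Agent": "size-probe-sketch"})
        resp = conn.getresponse()
        body = resp.read()
        return {
            "status": resp.status,
            "location": resp.getheader("Location"),
            "body_bytes": len(body),
        }
    finally:
        conn.close()
```

Calling probe("your-host") and probe("your-host", use_tls=True) and comparing the two results should show whether the HTTP side is only a redirect. Note that body_bytes counts the body only, so it will not necessarily match the size column in the access log.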

Web page not loading CSS. HTTP works, HTTPS does not

I have an HTTPS-based site that loads CSS via HTTP just fine, but not via HTTPS.
http://site/foo.css
... loads the asset fine. But...
https://site/foo.css
Does not. I get an Apache 502 error. The Apache access log shows:
[07/Nov/2018:10:17:20 -0800] "GET /foo.css HTTP/1.0" 200 95568 "-" ...
That tells me that it's trying to load my foo.css as HTTP even though I specified HTTPS. Also note that while my browser gives a 502 error, I get a 200 response in the logs.
Seems like some sort of HTTPS misconfiguration but I'm not sure what. Help?
Use
//site/foo.css
instead of adding the protocol when linking your CSS.
Link everything with https, as secure sites do not allow mixed content.
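To find leftover http:// references in a page, something like the following could be used. It is a minimal sketch with Python's standard html.parser. The class and function names are made up, and it only looks at src attributes and the href of link tags, so it will miss assets referenced from inside CSS or scripts.

```python
from html.parser import HTMLParser


class MixedContentFinder(HTMLParser):
    """Collect subresource URLs that are hard-coded to plain http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            # href on <a> is navigation, not a fetched asset, so only
            # <link href> and any src attribute are checked.
            is_subresource = name == "src" or (tag == "link" and name == "href")
            if is_subresource and value.strip().lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html_text):
    finder = MixedContentFinder()
    finder.feed(html_text)
    finder.close()
    return finder.insecure
```

Protocol-relative references such as //site/foo.css and https:// references are not reported, because they do not force plain HTTP on a secure page.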

Apache access log, strange post requests

I'm getting a lot of strange requests in my access log:
ip login:"-" - - [24/May/2017:01:26:30 +0700] "POST /3A348409-DD98-D443-96A4-D712F51D8B11/D89B1EDB-4CED-D145-9246-16243451D23D/from HTTP/1.0" 404 1346 Time:"2s" pid:23050 Mem:"2097152
ip login:"-" - - [24/May/2017:00:48:35 +0700] "POST /3A348409-DD98-D443-96A4-D712F51D8B11/E970DBFE-0DB1-A749-9392-CF1704CC81FD/from HTTP/1.0" 404 1348 Time:"0s" pid:22893 Mem:"4194304"
ip login:"-" - - [23/May/2017:00:33:08 +0700] "POST /CE92AFB2-2FDE-8742-B5ED-0629F2B9B622/2D682DC1-D8C5-574F-8A0E-AC62EB96CBD8/from HTTP/1.0" 404 1348 Time:"0s" pid:6695 Mem:"4194304"
...
Also, sometimes (not so frequently), I get another type of log record containing parts of my HTML pages:
ip login:"-" - - [23/May/2017:14:00:49 +0700] "GET /static/legacy/js/ion%20value=201602>%D4%E5%E2%F0%E0%EB%FC%202016</option><option%20value=201601>%DF%ED%E2%E0%F0%FC%202016</option><option%20value=201512>%C4%E5%EA%E0%E1%F0%FC%202015</option><option%20value=201511>%CD%EE%FF%E1%F0%FC%202015</option><option%20value=201510>%CE%EA%F2%FF%E1%F0%FC%202015</option><option%20value=201509>%D1%E5%ED%F2%FF%E1%F0%FC%202015</option><option%20value=201508>%C0%E2%E3%F3%F1%F2%202015</option><option%20value=201507>%C8%FE%EB%FC%202015</option><option%20value=201506>%C8%FE%ED%FC%202015</option><option%20value=201505>%CC%E0%E9%202015</option><option%20value=201504>%C0%EF%F0%E5%EB%FC%202015</option><option%20value=201503>%CC%E0%F0%F2%202015</option><option%20value=201502>%D4%E5%E2%F0%E0%EB%FC%202015</option><option%20value=201501>%DF%ED%E2%E0%F0%FC%202015</option><option%20value=201412>%C4%E5%EA%E0%E1%F0%FC%202014</option><option%20value=201411>%CD%EE%FF%E1%F0%FC%202014</option><option%20value=201410>%CE%EA%F2%FF%E1%F0%FC%202014</option><option%20value=201409>%D1%E5%ED%F2%FF%E1%F0%FC%202014</option><option%20value=201408>%C0%E2%E3%F3%F1%F2%202014</option><option%20value=201407>%C8%FE%EB%FC%202014</option><option%20value=201406>%C8%FE%ED%FC%202014</option><option%20value=201405>%CC%E0%E9%202014</option><option%20value=201404>%C0%EF%F0%E5%EB%FC%202014</option><option%20value=201403>%CC%E0%F0%F2%202014</option><option%20value=201402>%D4%E5%E2%F0%E0%EB%FC%202014</option><option%20value=201401>%DF%ED%E2%E0%F0%FC%202014</option><option%20value=201312>%C4%E5%EA%E0%E1%F0%FC%202013</option><option%20value=201311>%CD%EE%FF%E1%F0%FC%202013</option></select></td></tr><script%20type= HTTP/1.0" 404 1347 Time:"0s" pid:15377 Mem:"4194304"
Does anyone know anything about it?
OS: Ubuntu 15.10 x64
Apache: v 2.4.24
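The percent-encoded bytes in that GET line become readable if they are decoded as Windows-1251. That encoding is an assumption based on the byte values, not something the log states. A small sketch using urllib.parse.unquote, with a made-up helper name:

```python
from urllib.parse import unquote


def decode_logged_path(path, encoding="cp1251"):
    """Percent-decode a logged request path using a single-byte code page."""
    # errors="replace" keeps going if a byte has no mapping in the code page.
    return unquote(path, encoding=encoding, errors="replace")
```

For example, decode_logged_path("%D4%E5%E2%F0%E0%EB%FC%202016") gives "Февраль 2016" (February 2016), so the requested path really is a fragment of a month drop-down from the page's own markup.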
Looks to me like someone found a cross-site scripting (XSS) vulnerability somewhere in your code.
Without seeing the code in the file (presumably) located at /static/legacy/js/ion, it's almost impossible to offer any advice on what needs to be done.
Generally speaking, though, somewhere along the line there is code that writes user-supplied data to the output without sanitizing it first. It could be inside that file, or in whatever code produces the output containing that line.
Either way, it would probably be best to search for things like $_POST, $_GET, $_REQUEST, etc., and check whether user-provided values are output without first being sanitized.
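The superglobals above are PHP, but the principle is the same in any language: escape values at the point where they are written into HTML. As an illustration only, here is the idea in Python with the standard html.escape. The render_option helper is hypothetical and not taken from the site in question.

```python
import html


def render_option(value, label):
    """Build one <option> element, escaping both the attribute and the text."""
    safe_value = html.escape(str(value), quote=True)
    safe_label = html.escape(str(label), quote=True)
    return '<option value="{}">{}</option>'.format(safe_value, safe_label)
```

With this, a label such as <script>alert(1)</script> is written out as inert text rather than as markup.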

Strange GET request in Apache Log

I'm monitoring my website through the Apache log and I saw some strange requests:
51.255.65.74 - - [28/May/2016:11:48:02 -0300] "GET /insert/xahanave.html HTTP/1.1" 404 1035 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.1; +http://ahrefs.com/robot/)"
207.46.13.128 - - [28/May/2016:11:49:13 -0300] "GET / HTTP/1.1" 200 14188 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
66.249.64.87 - - [28/May/2016:11:49:32 -0300] "GET /css/kin8tengoku-1144-may.html HTTP/1.1" 404 1039 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
My FTP doesn't have the folder "/insert/xanahave", nor a file 'kin8tengoku' in the css folder. Is it possible to make a request to a non-existent file or folder?
Important: some days ago my site was hacked and an "insert" folder was created over FTP without permission, but everything has since been cleaned up and the "insert" folder doesn't exist anymore. My big question is: why do requests to this folder continue?
Because the files were picked up by Ahrefs, the Bing search engine and the Google search engine while they were up, and these crawlers periodically recheck files to see if anything has changed. This is how Google and the like return up-to-date information on your site.
You can tell it's these companies from the user agent sent (at the end of each line). Some more nefarious bots pretend to be Googlebot, but a quick Google search of these IP addresses shows them to be legitimate.
As you can see, your server correctly responds with a 404 (page not found) status and, provided there are no links to these URLs, these companies will eventually take the hint, drop them from their index and stop requesting them. This can take a month or two. They don't do this immediately in case the 404 is an error, for example because you accidentally removed the page.
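Instead of searching for each IP address by hand, a crawler can be checked with a reverse DNS lookup followed by a forward lookup, which is the approach Google and Bing describe for verifying their bots. Below is a minimal sketch. The function name and the suffix list are assumptions to review against each vendor's documentation, and the default forward lookup handles IPv4 only.

```python
import socket

# Hostname suffixes treated as genuine (assumption: adjust per vendor docs).
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then confirm that the
    hostname resolves back to the same IP."""
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The forward lookup must include the original address; otherwise the
    # PTR record could have been forged by whoever controls the IP block.
    return ip in addresses
```

The reverse and forward parameters exist only so the lookups can be swapped out, for example with cached results when checking a whole log file.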

Apache 500 Error due to User Agent?

I am currently getting 500 errors from Apache when using an alarming probe shell script that was provided to me.
Unfortunately, I have not been able to get to the bottom of why the script generates a 500 error when accessing content locally on the server, while other methods such as wget and telnet work fine.
The following are the Apache access log entries for each of the attempts:
Using Wget
127.0.0.1 - "" [19/Mar/2013:14:31:44 +1100] "GET /index.html HTTP/1.1" 200 1635 "-" "Wget/1.13.3" "-"
Using Telnet
127.0.0.1 - "" [20/Mar/2013:13:12:11 +1100] "GET /index.html HTTP/1.1" 200 1635 "-" "-" "-"
Using the Probe Scripts
127.0.0.1 - - [19/Mar/2013:14:33:56 +1100] "GET /index.html HTTP/1.1" 500 - "-" "" "-"
The only difference I can see is that the probe has a - instead of a "" in the remote user field (3rd item), which either way tells me no user was passed in any of the instances (this is expected, since there is no authentication).
I've bumped up the logging for everything in Apache and can't figure out what is amiss. There is no processing involved, it's a static file, and I have tried other file types too, such as images, to no avail.
Does anyone have any ideas, or has anyone seen something similar?
Thanks,
Tony
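One other difference visible in the log lines above is the user-agent column: "" for the probe, "-" for telnet and "Wget/1.13.3" for wget. Assuming that "" means the probe sent a User-Agent header with an empty value (an assumption, since the log format is not shown), the request could be reproduced outside the script with a sketch like this. The function name is made up.

```python
import http.client


def get_with_user_agent(host, path="/index.html", user_agent="",
                        port=80, timeout=10):
    """Send one GET with an explicit (possibly empty) User-Agent header
    and return the response status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # http.client adds no User-Agent of its own, so the header goes
        # out exactly as given here, including an empty value.
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()
```

If get_with_user_agent("127.0.0.1") returns 500 while get_with_user_agent("127.0.0.1", user_agent="test") returns 200, the empty header is the trigger, and any rewrite or access rules that match on User-Agent would be the next place to look.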