Apache access logs show a domain name where IP addresses usually are - apache

Very rarely, a computer connecting to my server shows up in the access log with a domain name where the IP address usually is. Can someone explain why this happens and whether it is something I should keep a closer eye on?
(related log snippet)
403 - ec2-52-53-242-144.us-west-1.compute.amazonaws.com - - [30/Nov/2017:20:26:47 -0500] "OPTIONS / HTTP/1.1" 339 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

Related

Why does one specific customer's IP get refused (403 error) from our apache2.4?

We never had any problem and we didn't deploy anything, but one particular customer, on his IPv6 address, is now getting a 403 error from our Apache and I just can't figure out why.
I'm not sure what to provide, but I double-checked every a2 config file.
I can see the customer's requests in the access.log (with the 403 status code), but nothing in the error.log.
access.log:
2a02:2788(...):102f - - [17/May/2021:12:54:12 +0200] "GET /page_url HTTP/1.0" 403 368 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.75"
2a02:2788(...):102f - - [17/May/2021:12:54:15 +0200] "GET /page_url HTTP/1.0" 403 368 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.75"
It's not at the application level either; we don't have anything that returns a 403 error.
Any idea what Apache can do to trigger a 403 error specifically for one IP?
Why/how is the customer seemingly making an HTTP/1.0 request? This alone could be sufficient reason for the server to reject the request since normal users using normal browsers don't send HTTP 1.0 requests. (HTTP/1.1 is expected.)
Generally, only certain bots make HTTP 1.0 requests.
An Apache module like mod_security could potentially have a rule that would block such requests. (Or any other rule using mod_rewrite, for instance, could also block such requests - but this is certainly not a default.)
Edg/89.0.774.75
It would seem this may have been a bug with Microsoft Edge, as the following Microsoft community post (from around the same time as this question) would seem to suggest:
https://answers.microsoft.com/en-us/microsoftedge/forum/all/internet-explorer-and-ms-edge-sends-ssl-requests/22708bcd-f196-45fb-84c9-6d8c34e7e08f
And as also noted in the above article, this would seem to have been "fixed" in later versions. So, your customer may also now be "fixed". (?)
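To confirm whether the 403 responses line up with HTTP/1.0 requests, the access log can be tallied by client, protocol, and status. The following is only a minimal sketch, assuming the common/combined log layout shown in the question; the function name `protocol_counts` and the sample address are hypothetical.

```python
import re
from collections import Counter

# Matches the client, request line, and status of a common/combined log entry.
REQUEST_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>HTTP/\d\.\d)" '
    r'(?P<status>\d{3})'
)

def protocol_counts(lines):
    """Count (client, protocol, status) triples across access log lines."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.match(line)
        if m:  # lines that do not fit the assumed layout are skipped
            counts[(m["client"], m["proto"], m["status"])] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical sample using a documentation-only IPv6 address.
    sample = [
        '2001:db8::1 - - [17/May/2021:12:54:12 +0200] "GET /page_url HTTP/1.0" 403 368 "-" "UA"',
        '2001:db8::1 - - [17/May/2021:12:54:15 +0200] "GET /page_url HTTP/1.1" 200 512 "-" "UA"',
    ]
    for key, n in protocol_counts(sample).items():
        print(key, n)
```

If every 403 for that customer pairs with HTTP/1.0 and every HTTP/1.1 request succeeds, that would point toward a protocol-based rule rather than an IP-based one.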

log format for goaccess log analysis

I installed goaccess and am trying to parse and analyse a log file, but I'm having trouble with the log format. Does anyone know which format to use for a log like the one below? [updated the log sample]
::1 - - [24/Jun/2013:17:10:39 -0500] "GET /favicon.ico HTTP/1.1" 404 286 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36" 0 -
It worked after using --log-format=COMBINED.
Answer credits to #Pete Darrow.
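For reference, the sample line follows the Apache combined layout with two extra trailing fields. A rough sketch of how such a line breaks down is below; it is not how goaccess parses logs internally, and the regular expression and the name `parse_combined` are illustrative assumptions.

```python
import re

# Combined log layout: host ident user [time] "request" status size "referrer" "agent"
# Anything after the user agent is captured as "extra" (e.g. the trailing "0 -").
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
    r'(?P<extra>.*)$'
)

def parse_combined(line):
    """Return a dict of fields for a combined-format line, or None if it does not match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None

if __name__ == "__main__":
    line = ('::1 - - [24/Jun/2013:17:10:39 -0500] "GET /favicon.ico HTTP/1.1" 404 286 '
            '"-" "Mozilla/5.0 (Windows NT 6.1; WOW64)" 0 -')
    print(parse_combined(line))
```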

301 error randomly happening for python cgi file request

I have a URL which sometimes fails to resolve and kicks me back to its parent directory. So I type this:
www.mysite.com/hub/parent/mycgi.cgi
... and get sent here instead:
www.mysite.com/hub/parent/
The parent dir in my file system has an index.cgi page that ends up showing, and this index.cgi has the exact same stats and permissions as mycgi. 775 and the group/owner are the same.
This problem is hard to reproduce, but some combination of logging in and out while incognito, then trying the URL in the browser causes the issue. I don't see anything in my httpd/error_log, but in the access log I can see:
<internal proxy IP> - - [10/May/2017:11:52:41 -0700] "GET /hub/parent/mycgi.cgi? HTTP/1.1" 301 236 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Geck..."
I also see this (sometimes) when I add a ?:
<internal proxy IP> - - [10/May/2017:11:35:58 -0700] "GET /hub/parent/mycgi.cgi? HTTP/1.1" 301 236 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Geck..."
I know that 301 means "Moved Permanently", but these files are not moving... How is this possible and what can be done to fix it?

Where statement in Kibana search?

This is a typical log line from Apache being stored in AWS Elasticsearch. I'd like to add a viz to my dashboard showing top referrers. The problem is that many static files have referrers from the site's own domain, which prevents me from seeing the data I want.
Is it possible to have a search expression like "where REFERRER does not contain VHOST"?
123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET / HTTP/1.1" 200 42766 "http://facebook.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A456 Safari/602.1" Server=aws8 SSL=- 8868 0
123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET /js/lib/jquery-ui/jquery-ui.js HTTP/1.1" 200 42766 "http://example.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A456 Safari/602.1" Server=aws8 SSL=- 8868 0
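Outside Kibana, the intended filter ("REFERRER does not contain VHOST") can be sketched directly against the raw log lines. This is only an illustration of the logic, not a Kibana query; it assumes the field order shown in the two sample lines, and the name `external_referrers` is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout: host ident user [time] vhost "request" status size "referrer" ...
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] (?P<vhost>\S+) "[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

def external_referrers(lines):
    """Count referrer hostnames, skipping referrers that contain the line's own vhost."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ref_host = urlsplit(m["referrer"]).hostname
        if ref_host is None:  # "-" or an empty referrer
            continue
        if m["vhost"].lower() in ref_host:  # referrer contains the vhost: self-referral
            continue
        counts[ref_host] += 1
    return counts

if __name__ == "__main__":
    sample = [
        '123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET / HTTP/1.1" 200 42766 "http://facebook.com/" "UA"',
        '123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET /js/lib/jquery-ui/jquery-ui.js HTTP/1.1" 200 42766 "http://example.com/" "UA"',
    ]
    print(external_referrers(sample).most_common(10))
```

The comparison is per line, which is the part a plain search expression tends to struggle with: the excluded value differs depending on each document's own vhost field.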

Apache access log investigation

I have been monitoring Google Analytics for our ecommerce server. Normally we have fewer than 10 visitors at a time. However, recently I have been seeing unusual bot activity. Sometimes it jumps to over 50 connections at a time, all within a few minutes. I am not sure if it is a bad crawler or someone committing click fraud on our Google PPC ad campaigns.
The following is a small part of our access_log. Checking the IP addresses does not reveal much. The IP addresses are also unique, and I could not find any repeat access from the same IP when I compared over a few days.
76.189.130.73 - - [27/Feb/2016:21:32:25 -0600] "GET /hp-ce260x-toner-cartridge.html HTTP/1.1" 200 11548 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/4E423F"
71.82.43.43 - - [27/Feb/2016:21:32:26 -0600] "GET /hp-cb540a-oem-black-toner-cartridge.html HTTP/1.1" 200 11497 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
68.4.69.7 - - [27/Feb/2016:21:32:25 -0600] "GET /hp-c9723a-magenta-laser-toner-cartridge.html HTTP/1.1" 200 11233 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36"
50.54.179.218 - - [27/Feb/2016:21:32:26 -0600] "GET /hp-q5942xd-black-toner-cartridge.html HTTP/1.1" 200 11299 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
64.213.217.226 - - [27/Feb/2016:21:32:28 -0600] "GET /hp-q2682a-yellow-toner-cartridge.html HTTP/1.1" 200 11336 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36"
50.25.245.238 - - [27/Feb/2016:21:32:29 -0600] "GET /hp-ce255x-oem-high-yield-toner-cartridge.html HTTP/1.1" 200 11196 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36"
I am not sure if this is related, but I also see some crawling from ahrefs.com/robot/ and webmeup-crawler.com/, though their IP addresses are consistent. I have already modified robots.txt to block the ahrefs.com bot.
robots.txt can be abused, but it's mainly meant for Google's bots looking for what's available to be searched for. I have noticed in my own log that both Google and random IP addresses try a variety of different directories, including these:
/phpMyAdmin/scripts/setup.php
/phpmyadmin/scripts/setup.php
/pma/scripts/setup.php
/robots.txt (Google in this case)
'9\xdd\xb1\xf8\xa1\xa8\xa8\x82\x904\x1f\x84\xbeNv\x7fa\xd9\xd4,)\x98^\xbf\x98\x14\x82q
\x19\xa5\b\x7f\xee\x98\x02\xde_\xa1\x1b\xc0
\x06\xe6\xf2\xba\"!=\xe1\x18?\xb6\xf5$\xb4n0[\x92\xe9_
\x8b[Y5nS\x1d (some kind of hash cracker)
//wp-login.php
/blog//wp-login.php
/wordpress//wp-login.php
/wp//wp-login.php
/?author=1
What they are looking for are mostly pre-created directories from free download templates.
You should know that nearly all IPs starting with 66.249 are Google.
The rest you can look up yourself.
In your case it looks like the bot(s) are looking for an HP printer to mess with.
Hope this helped
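As a rough way to check the points above against a log file, the sketch below counts requests per client IP and tests whether an address starts with the 66.249 prefix mentioned in the answer. The /16 network is an assumption taken directly from that "starting with 66.249" remark, not a verified list of Google ranges, and the function names are hypothetical.

```python
import ipaddress
from collections import Counter

# Assumption: the "66.249" prefix from the answer, written as a /16 network.
ANSWER_PREFIX = ipaddress.ip_network("66.249.0.0/16")

def summarize_clients(lines):
    """Count requests per client IP; entries logged as hostnames are skipped."""
    hits = Counter()
    for line in lines:
        client = line.split(" ", 1)[0]
        try:
            ip = ipaddress.ip_address(client)
        except ValueError:
            continue  # hostname, empty line, or malformed address
        hits[str(ip)] += 1
    return hits

def has_answer_prefix(client):
    """True if the address falls inside the assumed 66.249.0.0/16 block."""
    ip = ipaddress.ip_address(client)
    return ip.version == 4 and ip in ANSWER_PREFIX

if __name__ == "__main__":
    # Hypothetical sample; the second address is made up for illustration.
    sample = [
        '76.189.130.73 - - [27/Feb/2016:21:32:25 -0600] "GET /a.html HTTP/1.1" 200 11548 "-" "UA"',
        '66.249.66.1 - - [27/Feb/2016:21:32:26 -0600] "GET /robots.txt HTTP/1.1" 200 120 "-" "UA"',
    ]
    for client, n in summarize_clients(sample).most_common():
        print(client, n, has_answer_prefix(client))
```

A prefix match is only a first hint; an address inside the block is not proof of who operates it.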