Get Top 50 IPs from access.log including IPv6 - Apache

I often use this to check website access logs by IP address. The problem is that it only matches IPv4 addresses, not IPv6.
Any idea what regex I can use so that it also matches IPv6 (or a separate command that does)?
sed -e 's/^\([[:digit:]\.]*\).*"\(.*\)"$/\1 \2/' access.log | sort -n | uniq -c | sort -nr | head -50

Matching IP addresses via regular expressions can be tricky: yours matches many things that aren't valid IPv4 addresses, such as 100000.55.
There's a Perl module, Regexp::Common, that provides well-tested regular expressions for matching all sorts of things, including both IPv4 and IPv6 addresses. If you install it (the Ubuntu package is libregexp-common-perl), you can replace the sed part of that pipeline with
perl -MRegexp::Common=net -lne '/^($RE{net}{IPv4}|$RE{net}{IPv6}).*"(.*)"$/ && print "$1 $2"'
to match both address families.
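If installing a Perl module isn't an option, a rough grep-based fallback can pull both address families out of the log. This is only a sketch: the function name is mine, and the IPv6 pattern is deliberately loose, so it will also match some colon-separated hex strings that aren't valid addresses.

```shell
# Top client IPs (IPv4 plus a loose IPv6 match) from a log whose lines
# start with the address. The IPv6 branch is approximate by design.
top_ips() {
  grep -oE '^([0-9]{1,3}\.){3}[0-9]{1,3}|^([0-9a-fA-F]{0,4}:){2,7}[0-9a-fA-F]{0,4}' "$1" \
    | sort | uniq -c | sort -nr | head -50
}
```

Usage: `top_ips access.log`.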

Related

ssh-keyscan - Is there any way to specify a port other than 22 from within the file specified by -f?

I am looking for a way to define a port within the keyscan file passed to ssh-keyscan with the -f flag, instead of having to specify it on the command line.
The following is how I currently do it:
/usr/bin/ssh-keyscan -f /home/ansible/.ssh/test-scanner-keyscan_hosts -p 16005 >> /home/ansible/.ssh/known_hosts;
Contents of the keyscan file:
mainserver1.org,shortname1,shortname2
mainserver2.org,shortname1,shortname2
The issue is, each "mainserver" has a unique SSH port that is different from the others. While this works for mainserver1, since its port is 16005, mainserver2 will fail because its port is 17005. The only workaround currently is to run a separate ssh-keyscan command for each server, specifying its port.
Instead, I want to be able to specify the ports within the file, and/or use a method that can scan a list with per-host ports. The issue is, there doesn't seem to be any way to do that.
I tried the following within the keyscan file, and it does not work:
mainserver1.org:16005,shortname1,shortname2
mainserver2.org:17005,shortname1,shortname2
Is there any way to make this work, whether with ssh-keyscan or some other mechanism within Ansible? Otherwise, I have to do an ssh-keyscan task for EVERY server because the SSH ports are all different.
Thanks in advance!
You're actually welcome to use that format, and then use it to drive the actual implementation, since ssh-keyscan -f accepts "-" to read from stdin; thus:
scan_em() {
  local fn port
  fn="$1"
  # collect each distinct port, then scan the hosts that use it
  for port in $(grep -Eo ':[0-9]+' "$fn" | sed 's/://' | sort -u); do
    sed -ne "s/:${port}//p" "$fn" | ssh-keyscan -f - -p "$port"
  done
}
scan_em /home/ansible/.ssh/test-scanner-keyscan_hosts >> /home/ansible/.ssh/known_hosts
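If you ever need the host/port pairs themselves (say, to loop over them in an Ansible task), a small awk helper can split the same host:port,aliases format. The function name and the default of port 22 for lines without a :port suffix are my own assumptions:

```shell
# Print "host port" for each line of a host:port,aliases file,
# defaulting to port 22 when no :port suffix is present.
list_host_ports() {
  awk -F, '{
    n = split($1, hp, ":")                        # "mainserver1.org:16005" -> host, port
    printf "%s %s\n", hp[1], (n > 1 ? hp[2] : 22)
  }' "$1"
}
```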

Apache server running at nearly 100%

We have just moved our web apps to a self-hosted site on DigitalOcean, vs our previous web host. The instance is getting hammered with requests according to New Relic, but we are seeing very few page views. Throughput is around 400 rpm, whereas we only have about 1 page view per minute.
When I look at the access log, it is getting hammered by what I am guessing are spambots trying to access a nonexistent downloads folder. It's causing my CPU to run at 95%, even though nothing is actually happening.
How can I stop this spamming access?
So far I have created a downloads folder and put a Deny All in an .htaccess file in it. That appeared to cool things down, but now it's getting worse again (hence the desperate post).
Find a pattern of malevolent requests and restrict the IP they are coming from.
Require a hashed header to be provided with each request to verify the identity of the person/group wanting access.
Restrict more than N downloads to any IP over M time threshold.
Distribute traffic load via DNS proxying to multiple hosts/web servers.
Switch to NGINX. NGINX is more performant than Apache in most cases with high levels of requests. See DigitalOcean's article: https://www.digitalocean.com/community/tutorials/apache-vs-nginx-practical-considerations.
Make sure your firewall employs a whitelist of hosts/ports, not a wildcard (*).
I'd use iptables to drop any connection from the spambot IP addresses.
Find which IPs are connected to your Apache server:
netstat -tn 2>/dev/null | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head
You should get something like:
20 49.237.134.0
10 31.187.6.0
15 166.137.246.0
Once you find the bot IP addresses (probably the ones with the higher number of connections), use iptables to DROP further connections:
iptables -A INPUT -s 49.237.134.0 -p tcp --destination-port 80 -j DROP
iptables -A INPUT -s 31.187.6.0 -p tcp --destination-port 80 -j DROP
iptables -A INPUT -s 166.137.246.0 -p tcp --destination-port 80 -j DROP
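To avoid copying addresses by hand, the netstat output above can be turned into the matching iptables commands. This sketch only prints them (the 50-connection threshold and the function name are arbitrary assumptions), so you can review the list before running anything:

```shell
# Read "count ip" lines (as produced by the netstat pipeline above) and
# print an iptables DROP command for every IP over the threshold.
suggest_blocks() {
  awk -v limit="${1:-50}" '$1 > limit {
    printf "iptables -A INPUT -s %s -p tcp --destination-port 80 -j DROP\n", $2
  }'
}
```

Usage: pipe the netstat output into it, e.g. `netstat -tn 2>/dev/null | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | suggest_blocks 50`.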
Note:
Make sure you're not dropping connections from search-engine bots like Google, Yahoo, etc.
You can use www.infobyip.com to get detailed information about a specific IP address.

How can I remove a Host entry from an ssh config file?

The standard format of the file is:
Host example
HostName example.com
Port 2222
Host example2
Hostname two.example.com
Host three.example.com
Port 4444
Knowing the host's short name, I want to remove an entire entry in order to re-add it with new details.
The closest I have got is this, except I need to keep the following Host declaration, and the second search term will capture too many lines (like HostName):
sed '/Host example/,/\nHost/d'
With GNU sed:
host="example"
sed 's/^Host/\n&/' file | sed '/^Host '"$host"'$/,/^$/d;/^$/d'
Output:
Host example2
Hostname two.example.com
Host three.example.com
Port 4444
s/^Host/\n&/: Insert a newline before every line that begins with "Host"
/^Host '"$host"'$/,/^$/d: delete all lines matching "Host $host" to next empty line
/^$/d: clean up: delete every empty line
My final solution (which is somewhat OS X-oriented) was this. It's largely based on Cyrus's earlier answer. Note the temporary file: redirecting straight back into $SOURCE would truncate it before sed gets to read it.
sed < "$SOURCE" "/^$/d;s/Host /$NL&/" | sed '/^Host '"$HOST"'$/,/^$/d;' > "$SOURCE.tmp" && mv "$SOURCE.tmp" "$SOURCE"
This is more resilient to HostName directives which aren't indented.
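An awk alternative avoids the newline-injection pass entirely. This is a sketch (the function name is mine); it assumes each entry starts with a line beginning with the word "Host", which matches the standard format:

```shell
# Print the config with the block for the given short name removed.
# Toggles a skip flag at every "Host" line; HostName/Hostname lines
# never match /^Host[ \t]/, so indentation doesn't matter.
remove_host() {
  awk -v h="$1" '/^Host[ \t]/ { skip = ($2 == h) } !skip'
}
```

Usage: `remove_host example < ~/.ssh/config`.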

Get VM IP addresses eth0 ... ethn with ESX commands

I need to get all the IP addresses that a VM has. I can only use ESX commands, no PowerCLI.
If I use, with or without grep:
vim-cmd vmsvc/get.summary 1 | grep -i "ip"
I only get the first vNIC IP address; I need all of them :-(.
If it is possible...
Thanks to all!
So, I finally found the solution.
With the command:
vim-cmd vmsvc/get.guest <vmid>
we get many ipAddress keys with their values, which we can then parse with grep.
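For a quick parse, something like this works as a sketch. The function name is mine, the key = "value" layout is assumed from typical get.guest output, and it only extracts IPv4 addresses:

```shell
# Pull the unique IPv4 values of every ipAddress key from get.guest output.
get_vm_ips() {
  grep -o 'ipAddress = "[^"]*"' | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
}
```

Usage: `vim-cmd vmsvc/get.guest <vmid> | get_vm_ips`.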

Grep logs for IPs in the format [client 123.456.78.90]

I'm a grep and sed newbie, and have read through a bunch of answers on SO referring to grepping IPs in apache logs with no luck for my particular situation.
I have megs of error logs from bots and nefarious humans hitting a site, and I need to search through the logs and find the most common IPs so I can confirm they're bad and block them in .htaccess.
But, my error logs don't have the IP as the first item on the line as it seems most Apache logs do, according to the other answers here on SO. In my logs, the IP is within each line and in the format [client 123.456.78.90].
This older answer is exactly what I need, I think, Grepping logs for IP addresses, as it "will print each IP... sorted prefixed with the count."
But according to the answerer, "It assumes the IP-address is the first thing on each line."
How can I modify the sed command from that answer to handle the format [client 123.456.78.90], rather than an IP at the start of each line?
sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c
Update 8/25/14: This works, per Kent's answer below:
grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+' logfile|sort|uniq -c
Update 9/02/14
To sort by the number of occurrences of each IP:
grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+' logfile|sort -n | uniq -c | sort -rn
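If you'd rather anchor on the [client ...] wrapper specifically, so stray dotted numbers elsewhere on the line can't match, a sed variant along these lines should work (a sketch; the function name is mine and it assumes one client tag per line):

```shell
# Count occurrences of each IP appearing as "[client 1.2.3.4]" in a log.
count_clients() {
  sed -n 's/.*\[client \([0-9.]*\)\].*/\1/p' "$1" | sort | uniq -c | sort -rn
}
```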
grep is for Globally finding Regular Expressions on individual lines and Printing them (G/RE/P get it?).
sed is for Stream EDiting (SED get it?), i.e. making simple substitutions on individual lines.
For any other general text manipulation (including anything that spans multiple lines) you should use awk (named after 3 guys who ran out of imagination for naming tools).
awk '
match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) { cnt[substr($0,RSTART,RLENGTH)]++ }
END { for (ip in cnt) print cnt[ip], ip }
' logfile
Quick and dirty:
grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+' logfile|sort|uniq -c
A big difference between sed and grep: sed can change the input text (e.g. substitution), but grep can't. :-)