How to pipe tail -f into awk

I'm trying to set up a script where an alert is generated when a certain string appears in a log file.
The solution already in place greps the whole log file once a minute and counts how often the string appears, using the log line's timestamp to count only occurrences in the previous minute.
I figured it would be much more efficient to do this with a tail, so I tried the following, as a test:
FILENAME="/var/log/file.log"
tail -f $FILENAME | awk -F , -v var="$HOSTNAME" '
BEGIN {
failed_count=0;
}
/account failure reason/ {
failed_count++;
}
END {
printf("%saccount failure reason (Errors per Interval)=%d\n", var, failed_count);
}
'
but this just hangs and doesn't output anything. Somebody suggested this minor change:
FILENAME="/var/log/file.log"
awk -F , -v var="$HOSTNAME" '
BEGIN {
failed_count=0;
}
/account failure reason/ {
failed_count++;
}
END {
printf("%saccount failure reason (Errors per Interval)=%d\n", var, failed_count);
}
' <(tail -f $FILENAME)
but that does the same thing.
The awk I'm using (I've simplified in the code above) works, as it's used in the existing script where the results of grep "^$TIMESTAMP" are piped into it.
My question is, how can I get the tail -f to work with awk?

Assuming your log looks something like this:
Jul 13 06:43:18 foo account failure reason: unknown
│ │
│ └── $2 in awk
└────── $1 in awk
you could do something like this:
FILENAME="/var/log/file.log"
tail -F "$FILENAME" | awk -v hostname="$HOSTNAME" '
NR == 1 {
last=$1 " " $2;
}
$1 " " $2 != last {
printf("%s account failure reason (Errors on %s)=%d\n", hostname, last, failed);
last=$1 " " $2;
failed=0;
}
/account failure reason/ {
failed++;
}
'
Note that I've changed this to tail -F (capital F) because it handles log aging. This isn't supported in every operating system, but it should work in modern BSDs and Linuces.
How does this work?
Awk scripts consist of sets of test { commands; } evaluated against each line of input. (There are two special tests, BEGIN and END whose commands run when awk starts and when awk ends, respectively. In your question, awk never ended, so the END code was never run.)
The script above has three test/command sections:
In the first, NR == 1 is a test that evaluates true on only the first line of input. The command it runs creates the initial value for the last variable, used in the next section.
In the second section, we test whether the "last" variable (month and day) differs from the value on the current line. If it does, we've reached a new day's data, so it's time to print a summary (log) of the previous day, reset our variables and move on.
In the third, if the line we're evaluating matches the regular expression /account failure reason/, we increment our counter.
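To check the three sections without waiting on a live log, the same awk program can be fed a few fixed lines. This is only a sketch: the sample lines, the function names sample_log and count_per_day, and the host name myhost are made up for illustration.

```shell
# Hypothetical sample log lines; the real file's format may differ.
sample_log() {
  printf '%s\n' \
    'Jul 13 06:43:18 foo account failure reason: unknown' \
    'Jul 13 06:44:02 foo login ok' \
    'Jul 13 07:10:55 foo account failure reason: locked' \
    'Jul 14 00:00:01 foo account failure reason: unknown' \
    'Jul 15 00:00:03 foo login ok'
}

# Same three sections as the script above; the host name is passed as $1.
count_per_day() {
  awk -v hostname="$1" '
    NR == 1 { last = $1 " " $2 }          # remember the first day seen
    $1 " " $2 != last {                   # day changed: report and reset
      printf("%s account failure reason (Errors on %s)=%d\n", hostname, last, failed)
      last = $1 " " $2
      failed = 0
    }
    /account failure reason/ { failed++ } # count matching lines
  '
}

sample_log | count_per_day myhost
```

This prints one line for Jul 13 (2 errors) and one for Jul 14 (1 error). Nothing is printed for Jul 15, because a summary is only emitted once a line from a later day arrives.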
Clear as mud? :-)

Related

Why does my AWK-Script wait for me to press a key?

I started learning AWK today and I wrote a script that calculates powers of two. When I start it though, it waits for me to press enter and when it's done printing the powers of two, it doesn't print the word "End", although it's defined in the End section.
Here's my code:
BEGIN{
print "Power of two"
x=0
}
{
while(res<=1000){
res = 2^x
print 2 "^" x "=" res
x++
}
}
END{
print "End"
}
You told your script to expect input by providing code between BEGIN and END ({ while ... }) to process that input. That section doesn't start until you provide input by hitting Enter, and it won't end until you end the input by typing Ctrl-D, at which point your END section will be executed.
It sounds like this is what you intended to write instead:
BEGIN{
print "Power of two"
x=0
while(res<=1000){
res = 2^x
print 2 "^" x "=" res
x++
}
print "End"
}
or if you really want to have an END section for some reason:
BEGIN{
print "Power of two"
x=0
while(res<=1000){
res = 2^x
print 2 "^" x "=" res
x++
}
exit
}
END{
print "End"
}
Your awk script has a middle section that needs some input (as @Sundeep writes).
Try this
echo "50" | awk '
BEGIN{
print "Power of two"
x=0
}
{
while(res<=$1){
res = 2^x
print 2 "^" x "=" res
x++
}
}
END{
print "End"
}'
Power of two
2^0=1
2^1=2
2^2=4
2^3=8
2^4=16
2^5=32
2^6=64
End
The reason your program performs further actions upon keypress is that it is awaiting input. Input generally comes from files or output from another command when using a pipe. However, from time to time, awk can await input from /dev/stdin. This is the case when you call your program without any files as argument (See [1] section Extended description: Overall Program Structure), or use <hyphen> as an argument (See [1] section Operands).
$ awk -f source.awk # input from /dev/stdin via keyboard
$ awk -f source.awk [file] - # input from /dev/stdin via [file and] keyboard
$ cmd | awk -f source.awk # input from /dev/stdin via pipe
$ cmd | awk -f source.awk [file] - # input from /dev/stdin via [file and] pipe
Be aware that the above cases might need input from /dev/stdin. The need for input depends on the program structure of awk. So we can ask ourselves now the following question:
When does awk require input from a file, command, keyboard or any other possible form of input?
An awk program is composed of pairs of the form:
pattern { action }
where pattern is generally a logical condition that determines whether or not action should be executed. POSIX awk recognizes two special patterns, BEGIN and END. GNU awk has other special patterns such as BEGINFILE and ENDFILE, but for this answer we can classify them as regular patterns. We can now make the following statements (See [1] subsection Special patterns):
A regular pattern always requires input.
The special pattern BEGIN does not require input (except when it contains a getline)
The special pattern END always requires input to be read before it is executed
From this we can say:
An awk program does not require input if
it only consists of BEGIN patterns which do not call getline.
or, a BEGIN pattern calls the exit routine before any getline could be called.
in any other case, awk will require input!
The last statement comes from the rules bound to the exit statement. The exit statement shall invoke all END actions in the order in which they occur in the program source and then terminate the program without reading further input. (See [1] subsection Actions)
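The rules above can be tried out with three tiny programs. This is a sketch; the function names are made up for illustration.

```shell
# 1. Only BEGIN: no input is read, so this returns at once.
begin_only() { awk 'BEGIN { print "begin" }'; }

# 2. exit inside BEGIN: END still runs, but no input is read.
begin_exit_end() { awk 'BEGIN { print "begin"; exit } END { print "end" }'; }

# 3. A regular rule: awk reads input until EOF, then runs END.
with_main_rule() { awk '{ n++ } END { print n " record(s)" }'; }

begin_only
begin_exit_end
printf 'a\nb\n' | with_main_rule
```

The first two return immediately even when started from a terminal. The third would sit waiting for keyboard input if it were called without the pipe.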
Based on the above, we can now answer the OP's question:
Why does my AWK-Script wait for me to press a key?
Since the OP's program roughly looks like:
BEGIN { something without exit }
pattern { something else }
END { something final }
it will require input. Furthermore, the OP calls it as
$ awk -f file.awk
which implies that the input comes from /dev/stdin, or in this case, the keyboard. Therefore awk will wait to execute the regular pattern-action pair until it has received a record (here a line) from the keyboard, i.e. until you press some keys followed by Enter. Every time such a line has been sent, awk will process all regular patterns. The END pattern will only be executed when the input is finished. You can inform awk that the keyboard input is finished by sending an end-of-file (EOF), which is done by pressing Ctrl-D.
A clean rewrite of the code can be found in Ed Morton's answer, and a quick workaround in Jotne's answer.
[1]: POSIX standard, utility section, awk

Can this filter be implemented using Sed?

I wrote a script to groom my .bash_history file, filtering "uninteresting" commands like ls from the persisted history.
(I know there's the HISTIGNORE variable, but that would also exclude such commands from the current session's in-memory history. I find it useful to have them around within the scope of a single session, but not persisted across sessions.)
The history file can contain multi-line history entries with embedded newlines, so the entries are separated by timestamps. The script takes an input file like:
#1501304269
git stash
#1501304270
ls
#1501304318
ls | while IFS= read line; do
echo 'line is: ' $line
done
and filters out single-line ls, man, and cat commands, producing:
#1501304269
git stash
#1501304318
ls | while IFS= read line; do
echo 'line is: ' $line
done
Note that multi-line entries are unfiltered -- I figure if they're interesting enough to warrant multiple lines, they're worth remembering.
I implemented it in Awk, but I've been reading about Sed's multiline capabilities (N, h, H, x, etc.) and I'd like to try it for this purpose. If nothing else, I'd be curious to compare the two for speed.
Here's the Awk script:
/^#[[:digit:]]{10}$/ {
timestamp = $0
histentry = ""
next
}
$1 ~ /^(ls?|man|cat)$/ {
if (! timestamp) {
print
} else {
histentry = $0
}
next
}
timestamp {
print timestamp
timestamp = ""
}
histentry {
print histentry
histentry = ""
}
{ print }
Can this be done using Sed?
Sure, it can be done with sed. Here is an example using GNU sed's -z option, which lets us deal with the whole file at once instead of working line by line:
sed -rz "s/(#[0-9]{10}\n(cat|ls|man)\n)+(#[0-9]{10}\n|$)/\3/g;" yourfile
If everything works fine and you have a backup of your history file, you might even use GNU sed's -i option for in-place modification.
The -r option enables extended regexps; the -z option is explained in the manual like this:
Treat the input as a set of lines, each terminated by a zero byte
(the ASCII 'NUL' character) instead of a newline. This option can
be used with commands like 'sort -z' and 'find -print0' to process
arbitrary file names.
The basic idea is this: an uninteresting command is preceded and followed by a timestamp (or it is the last line in the file).
the timestamp RE #[0-9]{10} is taken from your awk script
(#[0-9]{10}\n(cat|ls|man)\n)+ matches one or more of the uninteresting commands
(#[0-9]{10}\n|$) the second timestamp is captured into \3 (due to being in the third pair of parens) for reuse in the replacement part, and the alternation |$ fits the end-of-file case
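Here is a small way to try the substitution on the sample from the question without touching the real history file. It is a sketch that assumes GNU sed (for -r and -z); the function names sample_history and groom_history and the trailing cat entry are made up for illustration.

```shell
# Hypothetical sample history, ending with a bare "cat" to exercise the |$ case.
sample_history() {
  printf '%s\n' \
    '#1501304269' \
    'git stash' \
    '#1501304270' \
    'ls' \
    '#1501304318' \
    'ls | while IFS= read line; do' \
    "echo 'line is: ' \$line" \
    'done' \
    '#1501304400' \
    'cat'
}

# Same expression as above, reading stdin instead of a file (GNU sed only).
groom_history() {
  sed -rz 's/(#[0-9]{10}\n(cat|ls|man)\n)+(#[0-9]{10}\n|$)/\3/g'
}

sample_history | groom_history
```

The bare ls and cat entries disappear together with their timestamps, and the multi-line entry survives. Note that, as written, the expression only matches commands that are alone on their line (ls, not ls -l), whereas the awk script tests $1 and so also catches commands with arguments.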

Having awk act upon results from a system command

I have found some excellent help here about how to invoke external commands from within awk and store results in a variable. What I have not been able to find is how to have awk act upon the result as it would on an ordinary input text file.
I use awk to parse a small HTML file (the status page of a running Tahoe LAFS node) in order to find some IP addresses listed. On each IP address I run an nmap scan of a specific port to see if it is open (yes, this is to become an automated Tahoe LAFS grid monitor). Using an if statement I can pick out the line of the output from nmap that contains the status (open/filtered/closed) of the port as its second field (typically "8098/TCP open unknown"). I would like to strip the line of fields 1 and 3 and only keep field 2, however, $2 of course refers to the fields in the HTML file I use as input to my awk script. I tried a user defined function which just did return $2, but that also refers to the field in the input HTML file.
Is there a way to refer to fields in an internally created variable inside an awk script? Something like a nested awk command within an awk script?
Use the getline "function". It sets $0 to the entire record and $1 through $NF in the usual way:
$ awk '/test/ {
> while (("ping -c 2 google.com") | getline > 0) {
> printf("$1 = %s, $2 = %s\n", $1, $2);
> }
> }'
abc
test
$1 = PING, $2 = google.com
$1 = 64, $2 = bytes
$1 = 64, $2 = bytes
$1 = , $2 =
$1 = ---, $2 = google.com
$1 = 2, $2 = packets
$1 = round-trip, $2 = min/avg/max/stddev
xyz
$
Edit: added parentheses around (cmd | getline) (it works for me without them but I guess some awk variants require it?).
Edit 2: apparently the "parentheses around getline" thing comes from a quite different issue noted in the GNU awk manuals:
According to POSIX, ‘expression | getline’ is ambiguous if expression contains unparenthesized operators other than ‘$’—for example, ‘"echo " "date" | getline’ is ambiguous because the concatenation operator is not parenthesized. You should write it as ‘("echo " "date") | getline’ if you want your program to be portable to all awk implementations.
In this case, the expression before the pipe is a single string, so there is no ambiguity. I moved the parentheses to where they would be needed for a more complex expression.
Also, it's a good idea to call close() on the command after the while loop exits. If there is another line matching test, awk will assume the existing sub-command should be read further, unless it has been close()d. As the command match is via the string, it's even better, rather than parenthesizing the left hand side of the pipe-to-getline, to store it in a variable and use that variable as an argument to close. For example:
awk '/^test / {
cmd = sprintf("ping -c %d %s", $2, $3)
while (cmd | getline > 0) print
close(cmd)
}'
(a variant without the semicolons that some dislike :-) ), which, when fed:
test 1 google.com
produces:
PING google.com (74.125.225.161): 56 data bytes
64 bytes from 74.125.225.161: icmp_seq=0 ttl=56 time=22.898 ms
--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 22.898/22.898/22.898/0.000 ms
Addendum (poking around on the web, I discovered that this is less obvious than I thought it was): be aware that this kind of "bare" getline, because it replaces the "current line", causes any remaining pattern-and-action rules in the script to fire on the new line contents. E.g., after the above, $0 begins with round-trip min/avg, so a subsequent rule of the form /^round/ would match, even though the input line that triggered the "ping" was test 1 google.com. If this is not the last rule, it's probably appropriate to add a next directive to it. (In a complicated script I'd put that in every getline-ing action, even the last one, in case the last rule is moved, or more are added.)
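One way to sidestep the "current line gets replaced" effect is the cmd | getline var form, which reads into a variable and leaves $0 and the fields alone; split() then gives access to the fields of that variable. A minimal sketch, where the echo command is a made-up stand-in for the real nmap call and port_status is a hypothetical name:

```shell
port_status() {
  awk '
    /^check / {
      # Hypothetical stand-in for nmap: emit one line shaped like its port report.
      cmd = "echo " $2 "/tcp open unknown"
      while ((cmd | getline line) > 0) {
        n = split(line, f, " ")      # f[1], f[2], ... are the fields of line
        if (n >= 2) status = f[2]
      }
      close(cmd)
      print $2, status               # $2 still belongs to the input record
    }
  '
}

printf 'check 8098\n' | port_status
```

This prints 8098 open: f[2] comes from the command's output, while $2 is still the second field of the input line that triggered the rule.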
As the relevant part of my final awk script is too large to fit as a comment, I'll post as an answer. The stripInputRecord, getIpNumber and getPortNumber functions just pick out the useful parts from the HTML code.
/address/ {
ip = stripInputRecord( $0 );
ip = getIpNumber( ip );
port[na] = stripInputRecord( $0 );
port[na] = getPortNumber( port[na] );
if (!(ip~"N/A")) {
if (ip~/loopback/) {
ip="127.0.0.1";
port[na]=stdp;
}
cmd="nmap -PN -p "stdp" "ip
cmd2="nmap -PN -p " port[na] " " ip
while ((cmd | getline)==1) {
if ($0~stdp) {
stdportstatus[na] = $2
}
}
while ((cmd2 | getline)==1) {
if ($0~port[na]) {
otherportstatus[na] = $2
}
}
}
close(cmd)
close(cmd2)
if ($0~/N\/A/) {
stdportstatus[na] = "-";
otherportstatus[na] = "-";
}
na++;
}
Thank you all (especially torek!)

Read from more files using awk, how?

I would like to read more input files with awk. In every file in my folder starting with ftp_dst_ I want to run this little awk script.
for i in ftp_dst_*;
do
gawk -v a="$a" -v b="$b" -v fa="$fa" -v fb="$fb" -v max="$max" '
BEGIN{
FS=" ";
OFS="\t";
}
{
if ($8 == "nrecvdatabytes_")
{
b=a;
a=$1;
if (b!=0)
{
fa=a-b;
if (fa>max && fa!=0)
{
max=fa;
}
}
}
}
END{
print "lol";
#print flowid, max;
}
'./ftp_dst_*
done
So now ftp_dst_5, ftp_dst_6, ftp_dst_7 are in the folder so I should get 3 lines with lol in the command line. Of course this "print lol" is only a try, I want to get 3 values from the 3 files.
So how can I read from all these files using awk?
By using a glob in the argument, all the files are taken together as if they were one file. Without the shell for loop, you would get output one time. Since you have the for loop, you should be getting the output three times. Part of your problem may be that you need a space after the closing single quote or you may need to change the argument to "$i" as Karl Nordström suggested if you want each file to be considered separately.
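A sketch of the per-file variant, passing "$i" so that each file gets its own awk run and its own END block. The helper name max_gap is made up, and the gap calculation is a simplified version of the one in the question, not a drop-in replacement.

```shell
# Hypothetical helper: largest gap between consecutive $1 values on matching lines.
max_gap() {
  awk '
    $8 == "nrecvdatabytes_" {
      if (seen && $1 - prev > max) max = $1 - prev
      prev = $1
      seen = 1
    }
    END { print max + 0 }            # + 0 prints 0 when there was no gap at all
  ' "$1"
}

# One awk run per file: one output line per file.
for i in ftp_dst_*; do
  [ -e "$i" ] || continue            # skip the literal pattern when nothing matches
  printf '%s\t%s\n' "$i" "$(max_gap "$i")"
done
```

With ftp_dst_5, ftp_dst_6 and ftp_dst_7 in the folder this gives three lines, one per file.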

Unable to make a factorial function in AWK

The code
#!/usr/bin/awk
# Sed and AWK by O'Reilly (p.179)
# Example of what should happen: factorial 5 gives
# factorial
# Enter number: 3
# The factorial of 3 is 6
BEGIN {
printf("Enter number: ")
}
$1 ~ /^[0-9]+$/ {
# assign value of $1 to number & fact
number = $1
if (number == 0)
fact = 1
else
fact = number
for (x = 1; x < number; x++)
fact *=x
printf("The factorial of %d is %g\n", number, fact)
exit
}
# if not a number, prompt again.
{ printf("\nInvalid entry. Enter a number: ")
}
I run the command unsuccessfully by
./factorial.awk
I get
/usr/bin/awk: syntax error at source line 1
context is
>>> <<< ./factorial.awk
/usr/bin/awk: bailing out at source line 1
What does the error message mean?
I think that the problem is that you are writing a shell script and passing it to awk for execution. The following is a shell script, hence the #! /bin/sh, so it will be passed to the shell (Bourne-compatible in this case).
#! /bin/sh
awk 'BEGIN { printf("Hello world!\n"); exit }'
The she-bang (#!) line tells the current interpreter which interpreter to pass the script to for execution. You have to pass the script to the awk interpreter so you need to call awk explicitly. This assumes that awk is in your path somewhere.
The following, however, is an awk script.
#! /usr/bin/awk -f
BEGIN {
printf("Hello world!\n");
exit
}
The she-bang invokes awk with -f and the name of the script file, so awk reads the file as its program. You don't need to explicitly invoke awk in this case and you don't have to quote the entire script since it is passed directly to awk.
Think of the she-bang as saying take what follows the she-bang, append the name of the file, and execute it. Wikipedia describes the usage pretty well including some common ways to solve the path to the interpreter problem.
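To make that concrete, here is a sketch that writes the two-line awk script to a temporary file and then runs the command the she-bang line would roughly expand to. The helper name make_hello_script is made up for illustration.

```shell
# Hypothetical helper: write the awk script from above to the file named in $1.
make_hello_script() {
  printf '%s\n' '#! /usr/bin/awk -f' 'BEGIN { printf("Hello world!\n"); exit }' > "$1"
}

script=$(mktemp)
make_hello_script "$script"
# Running the file directly with that first line is roughly equivalent to this call:
awk -f "$script"
rm -f "$script"
```

Without the -f on the first line, awk would be handed the file name as if it were the program text, which matches the syntax error shown in the question.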
Possibly a dumb answer but in my terminal I would have to type in:
./factorial.awk
where the file is factorial.awk.
You could edit your path environment variable to include . but ./ should work just fine I think. And adding . to $PATH could prove to be very dangerous in some situations where you would run code that you did not expect to.
Does that work??
EDIT:
./factorial.awk
-bash: ./factorial.awk: /usr/bin/gawk: bad interpreter: No such file or directory
That says that it ran the file but could not find the program gawk.
Please type in 'which gawk' and then 'which awk'.
Is your first line supposed to be:
#!/usr/bin/awk
Also, just to amuse me, type in:
sudo apt-get install gawk
That will make sure you actually have gawk on your system.
EDIT2:
I took a look at your code and this is what I have now. I removed two quotes and a dash.
#!/usr/bin/gawk
# I think we do not need these (p.179) so I comment them out, since I do not know where else to put them.
# The same bug occurs also with them.
#fact = number
#for (x = number -1 ; x > 1; x--)
# fact *= x
awk # factorial: returns factorial of user-supplied number
BEGIN {
printf("Enter number: ")
}
$1 ~ /^[0-9]+$/ {
# assign value of $1 to number & fact
number = $1
if (number == 0)
fact = 1
else
fact = number
#loop to multiply fact*x until x = 1
for (x = number - 1; x > 1; x--)
fact *= x
printf("The factorial of %d is %g\n", number, fact)
#exit -- saves user from typing ^-d
exit
}
# if not a number, prompt again.
{ printf("\nInvalid entry. Enter a number: ")
}
Maybe it wasn't that complicated. Change the first line:
#!/usr/bin/awk ---------> #!/usr/bin/awk -f
Check whether there is a file /usr/bin/gawk; if not, use either the path of awk or the correct location for gawk.
Also, did you make the script executable?
And also, do you have the current directory in your PATH?
I got the script to work in Ubuntu and OS X by running
awk -f factorial.awk
It seems that you cannot run the script as follows although the book says so
./factorial.awk
Here's a recursive version:
#!/usr/bin/awk -f
function f(x) {
if (x <= 1) return 1
return (f(x-1) *x)}
BEGIN {
printf("Enter number: ")
}
$1 ~ /^[0-9]+$/ {
printf("The factorial of %d is %d\n", $1, f($1))
exit
}
{ printf("\nInvalid entry. Enter a number: ")
}
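For a quick non-interactive check, the same recursive function can be wrapped in a shell function and fed its number through a pipe. This is a sketch; the wrapper name factorial and the exit status for invalid input are choices made for illustration.

```shell
factorial() {
  echo "$1" | awk '
    function f(x) { return (x <= 1) ? 1 : x * f(x - 1) }
    $1 ~ /^[0-9]+$/ { printf("The factorial of %d is %d\n", $1, f($1)); exit }
    { print "Invalid entry"; exit 1 }
  '
}

factorial 5
```

This prints The factorial of 5 is 120. For large arguments %d will stop being exact, since awk numbers are floating point; the %g used in the question's version trades digits for range.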
This question was the top hit on Google for the search phrase "awk factorial", so here's a simple way to print a factorial in awk:
$ awk 'BEGIN{x=1;for(i=2;i<=6;i++)x*=i;print x}'
720
As a shell function (the space after -v is required by nawk which comes with macOS but not by gawk):
$ fac(){ awk -v "n=$1" 'BEGIN{x=1;for(i=2;i<=n;i++)x*=i;print x}';}
$ fac 6
720
As an awk function for calculating k-combinations:
$ awk 'function f(x){r=1;for(i=2;i<=x;i++)r*=i;return r}BEGIN{n=5;k=3;print f(n)/(f(k)*f(n-k))}'
10