Linux AWK Programming - awk

I am trying to sort through mail log files using awk. My goal is to determine which emails had a delay longer than 10 seconds. The log file displays the delays as delay=xxxx. I have come up with:
awk '/delay/ { if($9 >=10) print}' filename
When I run this command it returns all the entries containing the word delay, not just those with a delay greater than 10 seconds.
Please Help

Here is a sample file, mog:
$ cat mog
... the log file displays the delays in delay=11 ...
... the log file displays the delays in delay=9 ...
and the script:
$ awk '/delay/{split($9,a,"=");if(a[2]>=10)print}' mog
... the log file displays the delays in delay=11 ...
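If delay= is not always the ninth field, a variant that does not rely on the field position is possible (a sketch, assuming the numeric value immediately follows delay= somewhere on the line):
$ awk -F'delay=' 'NF > 1 && $2+0 >= 10' mog
With the sample mog file this again keeps only the delay=11 line: splitting on the string delay= puts the number at the start of $2, and $2+0 forces a numeric comparison.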

Related

numerical comparison to less or equal with awk

I'm trying to build a script (not command line) in awk that will pull a list of users on a Linux system then save them to a file.
I have most of it working, but I can't figure out how to filter for users that are not system users, i.e. have an ID over 1000. However, when I built the code and ran it, it returned an empty file. I'm saving it to a file from the command line.
Any advice here would be fantastic as I have been pulling my hair out trying to figure out why this isn't working. The code I currently have is this:
#! /usr/bin/awk -f
BEGIN { FS=":" }
/$3<=1000/ { print "Username :",$1,"User ID :",$3}
Inside /.../, awk treats $3<=1000 as a regular expression to match against the line, not as a numeric comparison, so no lines are selected. Put the condition outside the slashes:
$3<=1000 {print "Username :",$1,"User ID :",$3}
or
{if($3<=1000) {print "Username :",$1,"User ID :",$3}}
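Note that the question asks for users with an ID over 1000 (non-system users), so the comparison probably needs to be >= 1000 rather than <= 1000. A complete script along those lines might look like this (a sketch; reading /etc/passwd and the 1000 cutoff are assumptions based on the question):
#!/usr/bin/awk -f
# List non-system users from a passwd-style file (colon-separated fields).
BEGIN { FS = ":" }
# Field 3 is the numeric user ID; 1000 is the usual first regular-user ID.
$3 >= 1000 { print "Username :", $1, "User ID :", $3 }
Saved as, say, users.awk and made executable, it can be run as ./users.awk /etc/passwd > users.txt.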

grep command to extract time in "HR:MIN MST" format from the log file

Here's a sample of a log file format I am creating from my program.
Sample log file
Job execution started 2018-05-16 05:54:08 MST
Starting job for '2018-05-16'
Starting job for '2018-05-16'
Control table count is : '33768'
Processing batch_id: '11548
11568
11598
11611
11637
11662
11688
Completed job for '2018-05-16'
Job execution completed 2018-05-16 06:04:59 MST
I want to extract only the start time and end time from a log file. Can anyone help me determine how I might achieve this?
You could use awk like:
awk -F'[ :]' '/Job execution (started|completed)/{ print $5 ":" $6, $8}' infile
This sets the field separator to either a colon or a space, and prints fields 5, 6 and 8 only from the lines matching the given pattern /.../ (fields 5 and 6 are the hour and minute, field 8 is the time zone).
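With the sample log above, that command prints:
05:54 MST
06:04 MST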

Run awk on file being decoded continuously and then print patterns from decode file using awk

I have a command which decodes binary logs to ASCII format.
From the ASCII file, I need to grep some patterns using awk and print them.
How can this be done?
What I have tried in a shell script is below, and it does not work.
command > file.txt | awk /pattern/ | sed/pattern/
Also, I need the command to continuously decode the file and keep printing patterns as the file is updated.
Thanks in advance
command to continuously decode file and keep printing patterns
The first question is exactly how continuously manifests itself. Most log files grow by being appended to -- for our purpose here, by some unknown external process -- and are periodically rotated. If you're going to continuously decode them, you're going to have to keep track of log rotation.
Can command continuously decode, or do you intend to re-run command periodically, picking up where you left off? If the latter, you might instead try some variation of:
cat log | command | awk
If that can't be done, you'll have to record where each iteration terminates, something like:
touch pos
while [ -f pos ]
do
    command | awk -v status=pos -f script.awk >> output || rm pos
done
where script.awk skips input until NR reaches the line number saved in the pos file. It then processes lines until EOF and overwrites pos with its final NR. On error it calls exit 1, so the pos file is removed and the loop terminates.
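A minimal sketch of such a script.awk, assuming the pos file name is passed in as -v status=pos as in the loop above (error handling elided):
# script.awk -- resume from the line number saved in the file named by status
BEGIN {
    start = 0
    if ((getline start < status) > 0)   # previously saved position, if any
        start += 0                       # force numeric
    close(status)
}
NR <= start { next }                     # skip lines handled on an earlier pass
{ print }                                # real pattern-matching logic goes here
END { print NR > status }                # remember how far we got for next time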
I recommend you ignore sed, and put all the pattern matching logic in one awk script. It will be easier to understand and cheaper to execute.

Can I speed up an AWK program using the NR variable

I am using awk to pull out data from a file that is 30M+ records. I know within a few thousand records where the records I want are. I am curious whether I can cut down on the time it takes awk to find the records by giving it a starting point, i.e. setting NR. For example, my record is more than 25 million lines in, so I could use the following:
awk 'BEGIN{NR=25000000}{rest of my script}' in
Would this make awk skip straight to the 25 millionth record and save the time of scanning each record before that?
For a better example: I am using this awk in a loop in sh. I need the normal output of the awk script, but I would also like it to pass along its NR when it finishes, so the next iteration of the loop can pick up from there when it comes back to this script.
awk -v n=$line -v r=$record 'BEGIN{a=1}$4==n{print $10;a=2}($4!=n&&a==2){(pass NR out to $record);exit}' in
Nope. Let's try it:
$ cat -n file
1 one
2 two
3 three
4 four
$ awk 'BEGIN {NR=2} {print NR, $0}' file
3 one
4 two
5 three
6 four
Are your records fixed length, or do you know the average line length? If yes, then you can use a language that allows you to open a file and seek to a position. Otherwise you have to read all those lines:
awk -v start=25000000 'NR < start {next} {your program here}' file
To maintain your position between runs of the script, I'd use a language like perl: at the end of the run use tell() to output the current position, say to a file; then at the start of the next run, use seek() to pick up where you left off. Add a check that the starting position is less than the current file size, in case the file was truncated.
One way (using sed), if you know the line numbers:
for n in 3 5 8 9 ....
do
sed -n "${n}p" file |awk command
done
or
sed -n "25000,30000p" file |awk command
Records generally have no fixed size, so awk has no choice but to scan the first part of the file even just to skip those records.
Should you want to skip the first part of the input file and you (roughly) know the size to ignore, you can use dd to drop it before awk ever sees it. For example, assuming a record is about 80 bytes wide, the first 25 million records occupy roughly 80 × 25,000,000 = 2,000,000,000 bytes, i.e. 80 blocks of 25 MB:
dd if=inputfile bs=25MB skip=80 | awk ...
Note that the skip will usually land in the middle of a line, so the first (partial) record awk sees should be discarded.
Finally, you can spare awk from scanning the trailing records by exiting from the awk script once you are past the end of the interesting zone.
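For example, combining a skip with an early exit (a sketch; the upper bound is illustrative):
awk 'NR > 30000000 { exit } NR >= 25000000 { your program here }' file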

Is there a way to create an awk input inactivity timer?

I have a text source (a log file), which gets new lines appended to it by some third party.
I can output the additions to my source file using tail -f source. I can then pipe that through an awk script awk -f parser.awk to parse and format the output.
My question is: while tail -f source | awk -f parser.awk is running, is there a way to call function foo() inside my parser.awk script every time there is more than 5 seconds elapsed without anything coming through the pipe into the standard input of the awk script?
Edit: Currently using GNU Awk 3.1.6. May be able to upgrade to newer version if required.
If your shell's read supports -t and -u, here's an ugly hack:
{ echo hello; sleep 6; echo world; } | awk 'BEGIN{
while( "while read -t 5 -u 3 line; do echo \"$line\"; done" | getline > 0 )
print
}' 3<&0
You can replace the print in the body of the while loop with your script. However, it would probably make a lot more sense to put the read timeout between tail and awk in the pipeline, and it would make even more sense to re-implement tail so that it times out on its own.
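A sketch of what putting the read timeout between tail and awk could look like, assuming bash (read -t, and read returning a status above 128 on timeout); the __IDLE__ marker is an invented placeholder that parser.awk would have to recognise:
tail -f source | {
    while :; do
        if IFS= read -r -t 5 line; then
            printf '%s\n' "$line"        # pass real log lines through
        elif [ "$?" -gt 128 ]; then      # read timed out: 5s with no input
            echo "__IDLE__"              # marker for parser.awk to react to
        else
            break                        # EOF: tail went away
        fi
    done
} | awk -f parser.awk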
Not exactly the answer to your question. However there is a little hack in shell that can do practically what you want:
{ tail -f log.file >&2 | { while : ; do sleep 5; echo SECRET_PHRASE ; done ; } ; } 2>&1 | awk -f script.awk
When awk receives SECRET_PHRASE it will run the foo function every 5 seconds. Unfortunately, it will run it every 5 seconds even if there was some output from tail during that time.
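On the awk side of that pipeline the handling could be as simple as this (a sketch; the body of foo is only a placeholder):
# script.awk -- call foo() whenever the 5-second marker arrives
/^SECRET_PHRASE$/ { foo(); next }
{ print }                                # normal parsing/formatting goes here
function foo() {
    print "no new input in the last interval" > "/dev/stderr"
}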
P.S. You can replace { } with ( ) and vice versa: with braces no subshell is created, with parentheses one is.
Another way is to append the secret phrase directly to the log file when nothing has been written there for five seconds, but that is probably a bad idea, since it would pollute the log file.