Search logs within date/time range - awk

I, Newbie, have searched this forum high and low, and have tried several awks, seds, & greps.
I am trying to search log files and output all entries that fall within a given date & time range.
Unfortunately, the logs that I am searching all have different date formats.
I did get this one to work:
awk '$0 >= "2018-08-23.11:00:00" && $0 <= "2018-08-23.14:00:00"' catalina.out
for that specific date format.
I can't get these date formats to work, maybe an issue with the spacing?
2018-08-23 11:00:00, or Aug 23, 2018 11:00:00
Some examples of what I have tried:
sed -n '/2018-08-23 16:00/,/2018-08-23 18:00/p' testfile.txt
sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' testfile.txt
awk '$0 >= "2018-08-23 17:00:00" && $0 <= "2018-08-23 19:00:00"' testfile.txt
I have also tried setting variables:
FROM="Aug 23, 2018 17:00:00" , TO="Aug 23, 2018 19:00:00"
awk '$0 >= "$FROM" && $0 <= "$TO"' testfile.txt
Can anyone help me with this?
UPDATE: I got THIS to work for the 2018-08-23 11:00:00 format
grep -n '2018-08-23 11:[0-9][0-9]' testfile.txt | head -1
grep -n '2018-08-23 12:[0-9][0-9]' testfile.txt | tail -1
awk 'NR>=2 && NR<=4' testfile.txt > rangeoftext
But I could not get it to work with the Aug 23, 2018 11:00:00 -- again, I think this may be a space issue? Not sure how to resolve....

This is a difficult problem. grep and sed have no concept of a date, and even GNU awk has only limited support for dates and times.
The problem becomes somewhat more tractable if you use a sane date format, i.e. a date format that can be used in string comparisons, such as 2018-08-15 17:00:00. This should work regardless of whether the string contains whitespace or not. However, beware of tools that automatically split on whitespace, such as the shell and awk.
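For instance, a plain string comparison on such a format behaves correctly even with the embedded space (awk prints 1 for "true"):
$ awk 'BEGIN { print ("2018-08-15 17:00:00" < "2018-08-15 18:00:00") }'
1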
Now, to your examples:
sed -n '/2018-08-23 16:00/,/2018-08-23 18:00/p' testfile.txt
sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' testfile.txt
awk '$0 >= "2018-08-23 17:00:00" && $0 <= "2018-08-23 19:00:00"' testfile.txt
The first two should work, but only if the file really contains both timestamps, since you are only checking for the presence of certain arbitrary strings. The third should also work, provided that the records all start with a timestamp.
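As for the shell-variable attempt: single quotes stop the shell from expanding $FROM and $TO, so awk compares every record against the literal strings "$FROM" and "$TO". The usual fix is to pass the values in with -v (a sketch using the question's variables; note this only helps with the sortable YYYY-MM-DD format, as explained above):
FROM="2018-08-23 17:00:00"
TO="2018-08-23 19:00:00"
awk -v from="$FROM" -v to="$TO" '$0 >= from && $0 <= to' testfile.txt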

This might be what you're looking for (making some assumptions about what your input file might look like):
$ cat file
Aug 22, 2018 11:00:00 bad
2018-08-23 11:00:00 good
Aug 23, 2018 11:00:00 good
2018-08-24 11:00:00 bad
$ cat tst.awk
BEGIN {
    min = raw2dt(min)
    max = raw2dt(max)
}
{ cur = raw2dt($0) }
(cur >= min) && (cur <= max)

function raw2dt(raw,   tmp, mthNr, dt, fmt) {
    fmt = "%04d%02d%02d%02d%02d%02d"
    if ( match(raw,/[0-9]{4}(-[0-9]{2}){2}( [0-9:]+)?/) ) {
        # YYYY-MM-DD [hh:mm:ss]: split on every non-alphanumeric run
        split(substr(raw,RSTART,RLENGTH),tmp,/[^[:alnum:]]+/)
        dt = sprintf(fmt, tmp[1], tmp[2], tmp[3], tmp[4], tmp[5], tmp[6])
    }
    else if ( match(raw,/[[:alpha:]]{3} [0-9]{2}, [0-9]{4}( [0-9:]+)?/) ) {
        # Mon DD, YYYY [hh:mm:ss]: map the month name to its number first
        split(substr(raw,RSTART,RLENGTH),tmp,/[^[:alnum:]]+/)
        mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",tmp[1])+2)/3
        dt = sprintf(fmt, tmp[3], mthNr, tmp[2], tmp[4], tmp[5], tmp[6])
    }
    return dt
}
$ awk -v min='Aug 23, 2018 11:00' -v max='2018-08-23 11:00' -f tst.awk file
2018-08-23 11:00:00 good
Aug 23, 2018 11:00:00 good
The above will work using any POSIX awk in any shell on any UNIX box.

When trying to obtain the set of log entries that appear between two dates, you should not rely on sed for the range check. Yes, it is true that sed has a cool and very useful feature for address ranges (so does awk, btw.), but
sed -n '/date1/,/date2/p' file
will not always work: it only works if date1 and date2 actually occur in the file. If either of them is missing, this fails. As POSIX puts it:
An editing command with two addresses shall select the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second.
[address[,address]]
On top of that, when comparing dates you should never use string comparisons unless the format is sane. Sane formats include YYYY-MM-DD and YYYY-MM-DD hh:mm:ss. Bad formats include "Aug 1 2018" (it sorts before "Jan 1 2018"), "99-01-31" (it sorts after "01-01-31"), and "2018-2-1" (it sorts after "2018-11-1").
So if you can, try to convert the dates you obtain into a sane format. The sanest of all is the date's offset from an epoch. Unix has various tools that can compute the number of seconds since the UNIX epoch of 1970-01-01 00:00:00 UTC, and this is what you are really after.
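For example, GNU date can do this conversion directly (pinning the timezone to UTC here so the number is reproducible):
$ TZ=UTC date -d '2018-08-23 11:00:00' +%s
1535022000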
As you mention, your log-file has various date-formats, and this does not make things easy. Even though gnu awk has various Time Functions, they require that you know the format beforehand.
Since we do not know which formats exist in your log file, we will make use of the Unix command date, which has a very elaborate interpreter that knows a lot of formats.
Also, I will assume that in awk you are able to uniquely identify the date somehow and store it in a string. Maybe there is a special character that always appears after the date which allows you to do this:
Example input file:
2018-08-23 16:00 | some entry
Aug 23 2018 16:01:01 | some other entry
So, in this case, we can say:
awk -F'|' -v t1="$(date -d "START_DATE" +%s)" \
          -v t2="$(date -d "END_DATE" +%s)" '
    {   # ask date(1) to convert the first field to epoch seconds
        cmd = "date -d \"" $1 "\" +%s"
        cmd | getline epoch
        close(cmd)
    }
    (t1 <= epoch && epoch <= t2)' testfile
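With the example file above and, say, START_DATE='2018-08-23 15:59' and END_DATE='2018-08-23 16:30' (hypothetical bounds), both sample lines should be printed. Keep in mind that this spawns one date process per input line, which can be slow on large log files.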

Related

How do I print every nth entry of the mth column, starting from a particular line of a file?

Consider the following data in a file file.txt:
$
$
$
FORCE 10 30 40
* 1 5 4
FORCE 11 20 22
* 2 3 0
FORCE 19 25 10
* 16 12 8
.
.
.
I want to print every 2nd element of the third column, starting from line 4, resulting in:
30
20
25
I have tried:
cat file.txt | sed 's/\|/ /' | awk 'NR%2==4 {print $3}'
However, this is not resulting in anything being printed and no errors generated either.
You might use awk, checking that the row number is > 3 and then checking for an even row number with NR%2==0.
Note that you don't have to use cat:
awk 'NR > 3 && NR%2==0 {
    print $3
}' file.txt
Output
30
20
25
Using sed
$ sed -En '4~2s/([^ \t]*[ \t]+){2}([^ \t]*).*/\2/p' input_file
30
20
25
I have tried:
cat file.txt | sed 's/\|/ /' | awk 'NR%2==4 {print $3}'
However, this is not resulting in anything being printed and no errors generated either.
You do not need cat, as sed can read a file on its own; in this case it would be sed 's/\|/ /' file.txt.
You should also consider whether you need that part at all: your sample input does not contain a pipe character, so the substitution does nothing to it. You might drop that part entirely if the lines holding the values you want do not contain that character.
The output is empty because NR%2==4 never holds: the remainder of a division by x is always smaller than x (in the particular case of %2, only the values 0 and 1 are possible).
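A quick demonstration of the possible remainders:
$ awk 'BEGIN { for (n = 1; n <= 6; n++) print n, n%2 }'
1 1
2 0
3 1
4 0
5 1
6 0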
This might work for you (GNU sed):
sed -nE '4~2s/^((\S+)\s*){3}.*/\2/p' file
Turn off implicit printing by setting the -n option and reduce back slashes in regexps by turning on -E.
From the fourth line and then every second line thereafter, capture the third column and print it.
N.B. The \2 holds the last occurrence captured by that back reference, which in conjunction with the {3} selects the third column.
Alternative:
sed -n '4,${s/^\(\(\S\+\)\s*\)\{3\}.*/\2/p;n}' file

AWK command reduce YYYYMMDD by 4 years

I have a csv file with 2 columns, i.e. DATE and TYPE. If TYPE is B, the DATE should be reduced by 4 years, i.e. in YYYYMMDD the YYYY should have 4 subtracted. For example, if the date is 20200422, it should become 20160422 for rows with B under TYPE.
Thank you!
DATE,TYPE,
20200101,A
20200422,B
20200401,B
Since in Awk a string that looks like a number can be treated as one, the solution can be as simple as subtracting 40000.
$ awk 'BEGIN { print 20200422 - 40000 }'
20160422
$ awk 'BEGIN { print "20200422" - 40000 }'
20160422
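Applied to the question's sample file, a minimal sketch (it assumes comma-separated fields with a header line):
$ awk 'BEGIN { FS=OFS="," } NR>1 && $2=="B" { $1 -= 40000 } { print }' file.csv
DATE,TYPE,
20200101,A
20160422,B
20160401,B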
Just learned about sed's e flag (for "execute") a few days ago. It seems to work great for this problem. Note that it is a GNU extension, so this trick may not work on macOS (whose sed is based on FreeBSD's).
Here's the input file I created for testing:
$ cat myfile.csv
DATE,TYPE
20200310,B
20180228,B
20181215,A
20130404,A
20050228,B
And here is the sed solution and the output:
$ cat myfile.csv | sed -E 's/^([[:digit:]]{8}),B$/echo $((\1 - 40000)),B/e'
DATE,TYPE
20160310,B
20140228,B
20181215,A
20130404,A
20010228,B
Of course, the "date" won't be a valid date if you start, for example, with 19040229 and subtract 4 from the year; 1904 was a leap year, but 1900 was not. Luckily, 2000 is divisible by 400, so it was a leap year; you will be fine for years between 1905 and 2103.
A safer option would be to replace echo $((\1 - 40000)),B with a call to GNU date:
date -d "\1 -4 years" +%Y%m%d,B
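Putting that together (a sketch; the same caveat about the e flag being GNU-only applies):
$ sed -E 's/^([[:digit:]]{8}),B$/date -d "\1 -4 years" +%Y%m%d,B/e' myfile.csv
DATE,TYPE
20160310,B
20140228,B
20181215,A
20130404,A
20010228,B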

Use grep or awk to extract formatted dates from a text file

I have a file with many dates, written as "January 1, 2014". How can I extract all of these dates from the file in chronological order (they are ordered in the file) using awk or grep?
I basically want:
grep "$a %d, %d" file.txt
But, I want to let $a = {January, ... , December}.
Basically, in the end, I want a file that has:
June 1, 2010
June 5, 2010
...
Since there are only 12 month names, it is not unreasonable to hardcode them into the expression. Remember I am using ... below but you should write in the actual month names.
egrep -o '(January|February|March|...|December) [0-9]+, [0-9]+' Input.txt
TL;DR
$ sort -M /tmp/dates | awk -v month=June '$0 ~ month {print $1, $2, $3}'
Use GNU Sort and GNU Awk
GNU sort provides the --month-sort flag. Given the following input:
December 31, 2014
June 5, 2010
December 31, 2013
June 1, 2010
January 1, 2009
the sort command (with its -M flag) will group the lines by month name. If you need full chronological order, you can always add secondary sort keys, too.
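A sketch of such a multi-key sort with GNU sort, assuming the Month D, YYYY layout (primary key year, then month name, then day):
$ sort -k3,3n -k1,1M -k2,2n /tmp/dates
January 1, 2009
June 1, 2010
June 5, 2010
December 31, 2013
December 31, 2014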
Meanwhile, you can then use the awk command to match each line against the desired month (stored in the month variable) and then print just the date fields from each matching record.
On my system, given the input above, I get the following output when month=June:
June 1, 2010
June 5, 2010

convert month from Aaa to xx in little script with awk

I am trying to report on the number of files created on each date. I can do that with this little one liner:
ls -la foo*.bar|awk '{print $7, $6}'|sort|uniq -c
and I get a list how many fooxxx.bar files were created by date, but the month is in the form: Aaa (ie: Apr) and I want xx (ie: 04).
I have feeling the answer is in here:
awk '
BEGIN {
    m = split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", d, "|")
    for (o = 1; o <= m; o++) {
        months[d[o]] = sprintf("%02d", o)
    }
    format = "%m/%d/%Y %H:%M"
}
{
    split($4, time, ":")
    date = (strftime("%Y") " " months[$2] " " $3 " " time[1] " " time[2] " 0")
    print strftime(format, mktime(date))
}'
But I have little to no idea what I need to strip out, and no idea how to pass $7 to whatever I carve out of this to convert Apr to 04.
Thanks!
Here's the idiomatic way to convert an abbreviated month name to a number in awk: index returns the 1-based position of the name within the string, so "Jan" yields (1+2)/3 = 1, "Feb" yields (4+2)/3 = 2, and so on.
$ echo "Feb" | awk '{printf "%02d\n",(index("JanFebMarAprMayJunJulAugSepOctNovDec",$0)+2)/3}'
02
$ echo "May" | awk '{printf "%02d\n",(index("JanFebMarAprMayJunJulAugSepOctNovDec",$0)+2)/3}'
05
Let us know if you need more info to solve your problem.
Assuming the name of the months only appear in the month column, then you could do this:
ls -la foo*.bar|awk '{sub(/Jan/,"01");sub(/Feb/,"02");print $7, $6}'|sort|uniq -c
Just use the field number of your month as an index into the months array.
print months[$6]
Since ls output differs from system to system and sometimes on the same system depending on file age and you didn't give any examples, I have no way of knowing how to guide you further.
Oh, and don't parse ls.
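If you do stick with parsing ls, a sketch that puts those pieces together (it assumes, as in your one-liner, that the day is in field 7 and the month name in field 6 of your ls output):
ls -la foo*.bar | awk '
BEGIN {
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", d, " ")
    for (i = 1; i <= n; i++) months[d[i]] = sprintf("%02d", i)   # e.g. Apr -> 04
}
{ print $7, months[$6] }' | sort | uniq -c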
To parse AIX istat, I use:
istat .profile | grep "^Last modified" | read dummy dummy dummy mon day time dummy yr dummy
echo "M: $mon D: $day T: $time Y: $yr"
-> M: Mar D: 12 T: 12:05:36 Y: 2012
To parse AIX istat month, I use this two-liner AIX 6.1 ksh 88:
monstr="???JanFebMarAprMayJunJulAugSepOctNovDec???"
mon="Oct" ; hugo=${monstr%${mon}*} ; hugolen=${#hugo} ; let hugol=hugolen/3 ; echo "Month: $hugol"
-> Month: 10
A result of 1..12 means the month name was ok; anything outside that range means it was not (an unknown name leaves the whole string intact, yielding 14, and the leading ??? keeps Jan from mapping to 0).
Instead of "hugo", use meaningful names ;-))
Adding a version for AIX that shows how to retrieve all the date elements (in whatever timezone you need them in) and display ISO 8601 output:
tempTZ="UTC" ; TZ="$tempTZ" istat /path/to/somefile \
| grep modified \
| awk -v tmpTZ="$tempTZ" '
BEGIN {Mmms="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec";
n=split(Mmms,Mmm," ") ;
for(i=1;i<=n;i++){ mm[Mmm[i]]=sprintf("%02d",i) }
}
{ printf("%s-%s-%sT%s %s\n", $NF, mm[$4], $5, $6, tmpTZ) }
' ## this will output an iso8601 date of the modification date of that file,
## for ex: 2019-04-18T14:16:05 UTC
## you can set tempTZ to anything, for ex: tempTZ="UTC+2" to see that date in the UTC+2 timezone... or tempTZ="EST", etc.
I show the ISO 8601 version to make it more known & used, but of course you may only need the "mm" portion, which is easily done: mm[$4]

Utility Script - To find out change in time

I have a list of timestamps in a text file. I want to figure out the times at which the change is more than a given threshold.
Input format:
10:13:55
10:14:00
10:14:01
10:14:02
10:14:41
10:14:46
10:17:58
10:18:00
10:19:10
10:19:16
If the threshold is, say, 30 seconds, I want the output to list the cases where the change is >= 30 seconds
e.g. 10:14:02 and 10:14:41, 10:14:46 and 10:17:58
Solutions in bash, python or ruby would be helpful. Thanks.
I tend to use awk (with a sed filter to break your lines up) for things like that:
echo '10:13:55 10:14:00 10:14:01 10:14:02
10:14:41 10:14:46 10:17:58 10:18:00
10:19:10 10:19:16' |
sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//' -e 's/ /\n/g' |
awk -F: '
    NR==1 {s=$0; s1=$1*3600+$2*60+$3}
    NR>1  {t1=$1*3600+$2*60+$3; if (t1-s1 > 30) print s" "$0; s1=t1; s=$0}
'
outputs:
10:14:02 10:14:41
10:14:46 10:17:58
10:18:00 10:19:10
Here's how it works:
It sets the field separator to : for easy extraction.
When the record number is 1 (NR==1), it simply stores the time (s=$0) and number of seconds since midnight (s1=$1*3600+$2*60+$3). This is the first baseline.
Otherwise (NR>1), it gets the seconds since midnight (t1=$1*3600+$2*60+$3) and, if that's more than 30 seconds since the last one, it outputs the last time and this time (if (t1-s1 > 30) print s" "$0).
Then it resets the baseline for the next line (s1=t1;s=$0).
Keep in mind the sed command is probably more complicated than it needs to be in this example: it collapses all space sequences to one space, removes them from the start and end of lines, then converts the remaining spaces into newline characters so awk sees one time per line. Depending on the input form of your data (mine is complicated since it's formatted for readability), this may not all be necessary.
Update: Since the question edit has stated that the input is one time per line, you don't need the sed part at all.
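In that case the whole pipeline reduces to the awk part reading the file directly (using whatever your input file is named):
awk -F: '
    NR==1 {s=$0; s1=$1*3600+$2*60+$3}
    NR>1  {t1=$1*3600+$2*60+$3; if (t1-s1 > 30) print s" "$0; s1=t1; s=$0}
' times.txt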
Python:
from datetime import datetime

data = open("times.txt").read()
lasttime = None
for timestamp in [datetime.strptime(s, "%H:%M:%S") for s in data.split()]:
    if lasttime and (timestamp - lasttime).seconds > 30:
        print lasttime.time(), "and", timestamp.time()
    lasttime = timestamp
In Python:
import datetime

data = open('filename').read()
times = [datetime.datetime.strptime(x, '%H:%M:%S') for x in data.split()]
for i in range(1, len(times)):
    if times[i] - times[i-1] > datetime.timedelta(seconds=30):
        print times[i-1].time(), times[i].time()
Ruby:
require 'time'

# parse all timestamps, then compare each consecutive pair
times = File.read(filename).split.map { |s| Time.parse(s) }
times.each_cons(2) do |earlier, later|
  puts "#{earlier.strftime('%H:%M:%S')} and #{later.strftime('%H:%M:%S')}" if later - earlier > 30
end
@OP, your algorithm is just to find a way to iterate over each field, converting them to seconds, and compare against the neighbours.
gawk 'BEGIN { threshold = 30 }
{
    for (i = 1; i <= NF; i++) {
        m = split($i, t, ":")
        n = split($(i+1), w, ":")
        sec      = (t[1]*3600) + (t[2]*60) + t[3]
        sec_next = (w[1]*3600) + (w[2]*60) + w[3]
        if ( (sec_next - sec) > threshold ) {
            print $i, $(i+1)
        }
    }
}' file
output:
# ./shell.sh
10:14:02 10:14:41
10:14:46 10:17:58
10:18:00 10:19:10