I have a text file and the last 2 lines look like this...
Uptime: 822832 Threads: 32 Questions: 13591705 Slow queries: 722 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.518
Uptime: 822893 Threads: 31 Questions: 13592768 Slow queries: 732 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.618
How do I find the difference between the two values of each parameter?
The expected output is:
61 -1 1063 10 0 0 0 0.1
In other words, I would like to subtract the earlier uptime value from the current uptime, find the difference between the Threads and Questions values, and so on.
The purpose of this exercise is to watch this file and alert the user when a difference is too high, e.g. if the slow-queries delta is more than 500 or the "Questions" delta is too low (<100).
(It happens to be MySQL status output, but the question has nothing to do with MySQL itself, so the mysql tag does not apply.)
Just a slight variation on ghostdog74's (original) answer:
tail -2 file | awk '{
  gsub(/[a-zA-Z: ]+/, " ")
  m = split($0, a, " ")
  for (i = 1; i <= m; i++)
    if (NR == 1) b[i] = a[i]; else print a[i] - b[i]
}'
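To sanity-check it, here is a hypothetical run feeding the two status lines from the question on stdin instead of from a file:

```shell
printf '%s\n' \
  'Uptime: 822832 Threads: 32 Questions: 13591705 Slow queries: 722 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.518' \
  'Uptime: 822893 Threads: 31 Questions: 13592768 Slow queries: 732 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.618' |
awk '{
  gsub(/[a-zA-Z: ]+/, " ")   # strip labels; only the numbers remain
  m = split($0, a, " ")
  for (i = 1; i <= m; i++)
    if (NR == 1) b[i] = a[i]; else print a[i] - b[i]
}'
```

This prints the eight differences, one per line (61, -1, 1063, 10, 0, 0, 0, 0.1).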
Here's one way. tail is used to get the last 2 lines, which is especially efficient if you have a big file.
tail -2 file | awk '
{
  gsub(/[a-zA-Z: ]+/, " ")
  m = split($0, a, " ")
  if (f) {
    for (i = 1; i <= m; i++) {
      print -(b[i] - a[i])
    }
    # check Questions, Slow queries etc.
    if (-(b[3] - a[3]) < 100) {
      print "Questions parameter too low"
    } else if (-(b[4] - a[4]) > 500) {
      print "Slow queries more than 500"
    } else if (a[1] - b[1] < 0) {
      print "mysql ...... "
    }
    exit
  }
  for (i = 1; i <= m; i++) { b[i] = a[i]; f = 1 }
}'
output
$ ./shell.sh
61
-1
1063
10
0
0
0
0.1
gawk:
BEGIN {
  arr[1] = "0"
}
length(arr) > 1 {
  print $2-arr[1], $4-arr[2], $6-arr[3], $9-arr[4], $11-arr[5], $14-arr[6], $17-arr[7], $22-arr[8]
}
{
  arr[1] = $2
  arr[2] = $4
  arr[3] = $6
  arr[4] = $9
  arr[5] = $11
  arr[6] = $14
  arr[7] = $17
  arr[8] = $22
}
In Excel, when I enter A1=1/1/1970 and A2=TODAY(), A2 resolves to 17-Apr-2020.
When I subtract A1 from A2, I get 18369, which is the epoch-day count for my data-transformation logic.
I'm trying to simulate the same functionality in awk, but I'm not getting a whole number:
$ awk ' BEGIN { t=strftime("%F",systime()); print t; gsub("-"," ",t); t2=mktime(t " 0 0 0" ); print t2/(24*60*60) } '
2020-04-17
18368.8
$
Is this awk translation correct? What is the reason for the 0.2 difference in my awk output, and how do I fix it?
Converting my comment to an answer so that the solution is easy to find for future visitors.
You are seeing this difference due to timezone offset between your current local timezone and UTC.
On more recent GNU awk versions you can pass an additional utc-flag argument to the mktime function:
awk 'BEGIN {t=strftime("%F",systime()); print t;
gsub("-"," ",t); t2=mktime(t " 0 0 0", 1); print t2/(24*60*60)}'
However, if you are on an older awk version, use this workaround to get time computations in UTC:
TZ=UTC awk 'BEGIN {t=strftime("%F",systime()); print t;
gsub("-"," ",t); t2=mktime(t " 0 0 0"); print t2/(24*60*60) } '
Epoch days are computed as the number of complete multiples of 86400 seconds that have passed since 1-Jan-1970. There is a boundary issue in this calculation: 1-Jan-1970 itself must not count as epoch day 1, because that day had not yet completed.
That is the reason the Java epoch-day function returns 0 for 1-Jan-1970:
java.time.LocalDate.parse("1970-01-01").toEpochDay
res2: Long = 0
Now, with this, for 17-Apr-2020 we get:
java.time.LocalDate.parse("2020-04-17").toEpochDay
res3: Long = 18369
Let's try to get the same answer using awk.
awk -v dt="2020-04-17" '
function epoch_days(dta) {
  # sum the day counts of every complete year since 1970, then add the
  # day-of-year of the target date minus one
  for (i = 1970; i < strftime("%Y", mktime(dta[1] " 12 31 0 0 0")); i++)
    e1 += strftime("%j", mktime(i " 12 31 0 0 0")) + 0
  e1 += strftime("%j", mktime(dta[1] " " dta[2] " " dta[3] " 0 0 0")) - 1
  return e1
}
BEGIN {
  split(dt, dta, "-")
  print epoch_days(dta)
}'
18369
Let's cross-verify for the current date, i.e. 22-Aug-2021:
java.time.LocalDate.now
res5: java.time.LocalDate = 2021-08-22
java.time.LocalDate.now.toEpochDay
res6: Long = 18861
And our awk code returns
awk '
function epoch_days(dta) {
  for (i = 1970; i < strftime("%Y", mktime(dta[1] " 12 31 0 0 0")); i++)
    e1 += strftime("%j", mktime(i " 12 31 0 0 0")) + 0
  e1 += strftime("%j", mktime(dta[1] " " dta[2] " " dta[3] " 0 0 0")) - 1
  return e1
}
BEGIN {
  dt = systime()            # 22-Aug-2021 when this was run
  dt2 = strftime("%F", dt)
  split(dt2, dta, "-")
  print epoch_days(dta)
}'
18861
I have previously used awk to reduce an enormous data table which has mostly zeros, to a smaller table with just the interesting rows (those with not too many zeros), with something like this:
awk -F '\t' '{count=0} {for(i=2; i<30; i++) if($i==0) count++} {if(count<5) print $0}' BigTable > SmallerTable
Now I would like to filter a similar table, to find rows with non-zero values in most of the "female" columns and zeros in most of the "male" columns. I tried to use the same awk logic, but my code returns all lines of the input file.
#! /usr/bin/awk -f
FS="\t"
{countF=0} {for(i=2; i<7; i++) if($i==0) countF++}
# count zeros in female columns 2-6
{countM=0} {for(i=7; i<12; i++) if($i==0) countM++}
# count zeros in male columns 7-12
{if (countF<2 && countM>3) {print $0}}
# if fewer than 2/5 females AND more than 3/5 males are zero, print line
My input file starts like this:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN F_CR1 F_CR2 F_CR3 F_CR4 F_CR6 M_CR10 M_CR5 M_CR7 M_CR8 M_CR9
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 14727 13526 13318 13862 11040 18975 21411 20079 16285 15611
CCGGTGTGACAACTGTAGTGAACTCAGCTCA 23 32 26 15 28 28 42 29 8 22
AACCAAATCTACAAACAGGAGATGTTGTTCT 107 110 118 106 95 100 121 132 92 90
GAAATAGAACAGGCCTGGAAGCCATGTCAAA 15 15 16 12 11 31 23 19 9 28
Have I messed up the syntax in the print line? Any advice much appreciated!
Change FS="\t" to BEGIN{FS="\t"}. Right now the result of that assignment is a true condition which invokes the default action of printing every line.
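A tiny demonstration of the effect (any input will do): the value of the assignment is the non-empty string "\t", which is always true as a pattern, so every record is printed.

```shell
# Equivalent to awk '1' -- the pattern FS="\t" is truthy for every record,
# so the default action { print } fires each time.
printf 'a\nb\n' | awk 'FS="\t"'
```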
Then change your shell script to:
/usr/bin/awk '
BEGIN { FS="\t" }
{
# count zeros in female columns
countF=0
for(i=2; i<=6; i++) {
if ($i==0) {
countF++
}
}
# count zeros in male columns
countM=0
for(; i<=NF; i++) {
if ($i==0) {
countM++
}
}
}
# if fewer than 2/5 females AND more than 3/5 males are zero, print line
countF<2 && countM>3
' "$@"
so it's more awk-ish and easier to enhance later if/when you need to separate shell args into awk args and awk variable assignments (shebangs are not useful for this).
Also consider abbreviating it and removing the hard-coded Male/Female limits, getting them from the header line instead:
/usr/bin/awk '
BEGIN { FS="\t" }
FNR==1 {
for (i=2; i<=NF; i++) {
sub(/_.*/,"",$i)
gender[i] = $i
}
next
}
{
for (i=2; i<=NF; i++) {
count[gender[i]] += ($i==0)
}
}
count["F"]<2 && count["M"]>3
' "$@"
The above is untested since you didn't provide the expected output for us to test with.
Does anybody know how to transpose this input of rows in a file?
invStatus: ONLINE
System: 55
StatFail: 0
invState: 0 Unknown
invFailReason: None
invBase: SYS-MG5-L359-XO1-TRAFFIC STAT 5: TRAF2
invFlag: 0xeee5 SEMAN PRESENT STATUS H_DOWN BASE LOGIC_ONLINE DEX EPASUS INDEX ACK
dexIn: 0
dexIO: 0
badTrans: 0
badSys: 0
io_IN: 0
io_OUT: 0
Tr_in: 0
Tr_out: 0
into similar output:
invBase: SYS-MG5-L359-XO1-TRAFFIC STAT 5: TRAF2
invFlag: 0xeee5 SEMAN PRESENT STATUS H_DOWN BASE LOGIC_ONLINE DEX EPASUS INDEX ACK
invStatus: ONLINE System: 55 StatFail: 0 invState: 0 Unknown invFailReason: None
dexIn: 0 dexIO: 0 badTrans 0 badSys: 0
io_IN: 0 io_OUT: 0 Tr_in: 0 Tr_out: 0
My first attempt was to append ";" to the end of each row, then join multiple rows, then split them again on that string, but I am still getting messy output.
I am at this stage with formatting:
cat port | sed 's/$/;/g' | awk 'ORS=/;$/?" ":"\n"'
I'd start with this
awk -F: '
{data[$1] = $0}
END {
OFS="\t"
print data["invBase"]
print data["invFlag"]
print data["invStatus"], data["System"], data["StatFail"], data["invState"], data["invFailReason"]
print data["dexIn"], data["dexIO"], data["badTrans"], data["badSys"]
print data["io_IN"], data["io_OUT"], data["Tr_in"], data["Tr_out"]
}
' file
invBase: SYS-MG5-L359-XO1-TRAFFIC STAT 5: TRAF2
invFlag: 0xeee5 SEMAN PRESENT STATUS H_DOWN BASE LOGIC_ONLINE DEX EPASUS INDEX ACK
invStatus: ONLINE System: 55 StatFail: 0 invState: 0 Unknown invFailReason: None
dexIn: 0 dexIO: 0 badTrans: 0 badSys: 0
io_IN: 0 io_OUT: 0 Tr_in: 0 Tr_out: 0
Then, to make it as pretty as you want, start with storing the line lengths and change the print statements to printf statements using some of those lengths.
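For instance, a minimal printf sketch (the width of 22 is an arbitrary assumption) that pads each stored record to a fixed column width instead of relying on tab stops:

```shell
printf '%s\n' 'dexIn: 0' 'dexIO: 0' 'badTrans: 0' |
awk -F: '
  { data[$1] = $0 }
  END {
    # left-justify each record in a 22-character column
    printf "%-22s%-22s%s\n", data["dexIn"], data["dexIO"], data["badTrans"]
  }'
```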
A closer look at the file reveals that, except for 3 lines, they are sequential and can be pasted into 4 columns:
awk -F: '
$1 == "invBase" || $1 == "invFlag" { print; next }
$1 == "invStatus" { invStatus = $0; next }
{ line[n++] = $0 }
END {
  printf "%s\t", invStatus   # never use input data as a printf format string
  paste = "paste - - - -"
  for (i = 0; i < n; i++) print line[i] | paste
  close(paste)
}
' file
which provides the same output as above.
I have a table (.tr file) with different rows (events).
Event     Time    PacketLength    PacketId
sent 1 100 1
dropped 2 100 1
sent 3 100 2
sent 4.5 100 3
dropped 5 100 2
sent 6 100 4
sent 7 100 5
sent 8 100 6
sent 10 100 7
And I would like to create a new table like the following, but I don't know how to do it in AWK.
SentTime    PacketLength    Dropped
1 100 Yes
3 100 Yes
4.5 100
6 100
7 100
8 100
10 100
I have simple code to find dropped or sent packets, times, and ids, but I do not know how to create a column in my table with the results for dropped packets.
BEGIN{}
{
Event = $1;
Time = $2;
Packet = $6;
Node = $10;
id = $11;
if (Event=="s" && Node=="1.0.1.2"){
printf ("%f\t %d\n", $2, $6);
}
}
END {}
You have to save all the information in an array to postprocess it at the end of the file. Obviously, if the file is huge, this could cause memory problems.
BEGIN {
  template = "#sentTime\t#packetLength\t#dropped"
}
{
  print $0
  event = $1
  time = $2
  packet_length = $3
  packet_id = $4
  # save all the info in an array, keyed by packet id
  packet_info[packet_id] = packet_info[packet_id] "#" packet_length "#" time "#" event
}
END {
  # traverse the information in the array
  for (time in packet_info) {
    print "the time is: " time " = " packet_info[time]
    # for every element in the array (= packet),
    # the data has this format: "#100#1#sent#100#2#dropped"
    split(packet_info[time], info, "#")
    # info[2] <-- 100
    # info[3] <-- 1
    # info[4] <-- sent
    # info[5] <-- 100
    # info[6] <-- 2
    # info[7] <-- dropped
    line = template
    line = gensub("#sentTime", info[3], "g", line)      # gensub requires gawk
    line = gensub("#packetLength", info[2], "g", line)
    if (info[4] == "dropped")
      line = gensub("#dropped", "yes", "g", line)
    if (info[7] == "dropped")
      line = gensub("#dropped", "yes", "g", line)
    line = gensub("#dropped", "", "g", line)
    print line
  }
}
I would say...
awk '/sent/{pack[$4]=$2; len[$4]=$3}
/dropped/{drop[$4]}
END {print "Sent time", "PacketLength", "Dropped";
for (p in pack)
print pack[p], len[p], ((p in drop)?"yes":"")
}' file
This stores the sent times in pack[], the lengths in len[], and the dropped packets in drop[], so that they can be fetched later on.
Test
$ awk '/sent/{pack[$4]=$2; len[$4]=$3} /dropped/{drop[$4]} END {print "Sent time", "PacketLength", "Dropped"; for (p in pack) print pack[p], len[p], ((p in drop)?"yes":"")}' a
Sent time PacketLength Dropped
1 100 yes
3 100 yes
4.5 100
6 100
7 100
8 100
10 100
I have a file named bt.B.1.log that looks like this:
.
.
.
Time in seconds = 260.37
.
.
.
Compiled procs = 1
.
.
.
Time in seconds = 260.04
.
.
.
Compiled procs = 1
.
.
.
and so on for 40 records of Time in seconds and Compiled procs (dots represent useless lines).
How do I add a single column with the value of Compiled procs (which is 1) to the result of the following two commands:
This prints the average of Time in seconds values (thanks to dawg for this one)
awk -F= '/Time in seconds/ {s+=$2; c++} END {print s/c}' bt.B.1.log > t1avg.dat
Desired output:
260.20 1
This prints all of the ocurrences of Time in seconds, but there is a small problem with it; it is printing an extra blank line at the beginning of the list.
awk 'BEGIN { FS = "Time in seconds =" } ; { printf $2 } {printf " "}' bt.B.1.log > t1.dat
Desired output:
260.37 1
260.04
.
.
.
In both cases I need the value of Compiled procs to appear only once, preferably in the first line, and with no use of intermediate files.
What I managed to do so far prints all values of Time in seconds, but with the Compiled procs column appearing on every line and with strange indentation:
awk '/seconds/ {printf $5} {printf " "} /procs/ {print $4}' bt.B.1.log > t1.dat
Please help!
UPDATE
Contents of file bt.B.1.log:
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Start in 16:40:51--25/12/2014
NAS Parallel Benchmarks 3.3 -- BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of active processes: 1
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.7507984505732E-14
2 0.9933052259015E+02 0.9933052259015E+02 0.3147459568137E-14
3 0.3564602564454E+03 0.3564602564454E+03 0.4783990739472E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.2309751522921E-13
5 0.3270754125466E+04 0.3270754125466E+04 0.8481098651866E-14
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.2682819657265E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1989963674771E-15
3 0.1312257334221E+02 0.1312257334221E+02 0.4060995034457E-15
4 0.1200692532356E+02 0.1200692532356E+02 0.2958887128106E-15
5 0.1245957615104E+03 0.1245957615104E+03 0.2281113665977E-15
Verification Successful
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 260.37
Total processes = 1
Compiled procs = 1
Mop/s total = 2696.83
Mop/s/process = 2696.83
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3
Compile date = 25 Dec 2014
Compile options:
MPIF77 = mpif77
FLINK = $(MPIF77)
FMPI_LIB = -L/usr/lib/openmpi/lib -lmpi -lopen-rte -lo...
FMPI_INC = -I/usr/lib/openmpi/include -I/usr/lib/openm...
FFLAGS = -O
FLINKFLAGS = -O
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb@nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
Finish in 16:45:14--25/12/2014
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Start in 16:58:50--25/12/2014
NAS Parallel Benchmarks 3.3 -- BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of active processes: 1
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.7507984505732E-14
2 0.9933052259015E+02 0.9933052259015E+02 0.3147459568137E-14
3 0.3564602564454E+03 0.3564602564454E+03 0.4783990739472E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.2309751522921E-13
5 0.3270754125466E+04 0.3270754125466E+04 0.8481098651866E-14
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.2682819657265E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1989963674771E-15
3 0.1312257334221E+02 0.1312257334221E+02 0.4060995034457E-15
4 0.1200692532356E+02 0.1200692532356E+02 0.2958887128106E-15
5 0.1245957615104E+03 0.1245957615104E+03 0.2281113665977E-15
Verification Successful
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 260.04
Total processes = 1
Compiled procs = 1
Mop/s total = 2700.25
Mop/s/process = 2700.25
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3
Compile date = 25 Dec 2014
Compile options:
MPIF77 = mpif77
FLINK = $(MPIF77)
FMPI_LIB = -L/usr/lib/openmpi/lib -lmpi -lopen-rte -lo...
FMPI_INC = -I/usr/lib/openmpi/include -I/usr/lib/openm...
FFLAGS = -O
FLINKFLAGS = -O
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb@nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
Finish in 17:03:12--25/12/2014
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
There are 40 entries in the log, but I've provided only 2 for abbreviation purposes.
To fix the first issue, replace:
awk -F= '/Time in seconds/ {s+=$2; c++} END {print s/c}' bt.B.1.log > t1avg.dat
with:
awk 'BEGIN { FS = "[ \t]*=[ \t]*" } /Time in seconds/ { s += $2; c++ } /Compiled procs/ { if (! CP) CP = $2 } END { print s/c, CP }' bt.B.1.log >t1avg.dat
A potential minor issue is that 260.205 1 might be output, though the question does not flag this as a weakness of the given script. Rounding with something like printf "%.2f %s\n", s/c, CP gives 260.21 1. To truncate the extra digit instead, use something like printf "%.2f %s\n", int(s/c * 100) / 100, CP.
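Run against a tiny synthetic log containing only the relevant lines (a real log has 40 such records), the fixed averaging command produces:

```shell
printf '%s\n' \
  ' Time in seconds =                   260.37' \
  ' Compiled procs  =                        1' \
  ' Time in seconds =                   260.04' \
  ' Compiled procs  =                        1' |
awk 'BEGIN { FS = "[ \t]*=[ \t]*" }
  /Time in seconds/ { s += $2; c++ }
  /Compiled procs/  { if (!CP) CP = $2 }
  END { print s/c, CP }'
# prints: 260.205 1
```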
To fix the second issue, replace:
awk 'BEGIN { FS = "Time in seconds =" } ; { printf $2 } {printf " "}' bt.B.1.log > t1.dat
with:
awk 'BEGIN { FS = "[ \t]*[=][ \t]" } /Time in seconds/ { printf "%s", $2 } /Compiled procs/ { if (CP) { printf "\n" } else { CP = $2; printf " %s\n", $2 } }' bt.B.1.log > t1.dat
BTW, the "strange indentation" is a result of failing to output a newline when using printf and failing to filter unwanted input lines from processing.