Related post: How to select lines between two patterns?
@fedorqui, thanks for providing all these different awk options. I have been using this to parse /var/log/messages when troubleshooting OOMs, and it works great. I want to extend this further, but I have not been able to figure out how to proceed. What I am trying to do:
Print the lines between rss and "Out of memory". I have done that with the example.
Order the sections between each pair of matches by the rss field. I have not been able to figure this out.
Add an extra column with its own header and perform a mathematical operation on it. I have managed this partially, but I am running into formatting issues: I am not sure how to skip the first and last lines when adding the column, so I lose those lines, and I am also unable to keep the original spacing if I do any operation other than print.
Here's the command I am using right now:
less /var/log/messages | awk '/swapents/{x=1; print "=================="}; /Out of memory/{x=0} x' | sed 's/[]\[]//g'
Here's the source Data:
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.617265 pid uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.622250 1828 0 1828 4331 116 14 3 0 -1000 udevd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.627310 2664 0 2664 28002 53 23 3 0 -1000 auditd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.633181 2680 0 2680 62032 1181 24 4 0 0 rsyslogd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.638888 2694 0 2694 3444 61 11 3 0 0 irqbalance
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.644912 2710 81 2710 5430 56 14 3 0 0 dbus-daemon
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.651108 2779 0 2779 19958 203 42 3 0 -1000 sshd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.656670 2789 0 2789 5622 56 17 3 0 0 xinetd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.653452 Out of memory: Kill process 43390 (mysql) score 1000 or sacrifice child
blah
blah
blah
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.617265 pid uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.622250 1828 0 1828 4331 116 14 3 0 -1000 udevd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.627310 2664 0 2664 28002 53 23 3 0 -1000 auditd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.633181 2680 0 2680 62032 1181 24 4 0 0 rsyslogd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.638888 2694 0 2694 3444 61 11 3 0 0 irqbalance
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.644912 2710 81 2710 5430 56 14 3 0 0 dbus-daemon
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.651108 2779 0 2779 19958 203 42 3 0 -1000 sshd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.656670 2789 0 2789 5622 56 17 3 0 0 xinetd
Sep 8 11:35:15 ip-10-23-15-70 kernel: 11810061.653452 Out of memory: Kill process 43390 (mysql) score 1000 or sacrifice child
Here's what my output looks like:
================== 0MB
Sep 8 11:35:15 pid 0MB name <---- should be header (Pid virt rss etc)
Sep 8 11:35:15 1828 0MB udevd
Sep 8 11:35:15 2664 0MB auditd
Sep 8 11:35:15 2680 4MB rsyslogd
Sep 8 11:35:15 2694 0MB irqbalance
Sep 8 11:35:15 2710 0MB dbus-daemon
Sep 8 11:35:15 2779 0MB sshd
Sep 8 11:35:15 2789 0MB xinetd
Sep 8 11:35:15 2822 0MB crond
Sep 8 11:35:15 Out 0MB or <---- should be footer (out of memory etc)
================== 0MB
Sep 8 11:35:15 pid 0MB name <---- should be header (Pid virt rss etc)
Sep 8 11:35:15 1828 0MB udevd
Sep 8 11:35:15 2664 0MB auditd
Sep 8 11:35:15 2680 4MB rsyslogd
Sep 8 11:35:15 2694 0MB irqbalance
Sep 8 11:35:15 2710 0MB dbus-daemon
Sep 8 11:35:15 2779 0MB sshd
Sep 8 11:35:15 2789 0MB xinetd
Sep 8 11:35:15 2822 0MB crond
Sep 8 11:35:15 Out 0MB or <---- should be footer (out of memory etc)
================== 0MB
You can see from the output that awk tries to calculate values for the separator I added for each OOM section; I would love to avoid this if possible. Also, the header and footer are getting chopped off, and it would be nice to avoid that too.
Here's what I would like:
========================
Sep 8 11:35:15 pid rss memused_MB oom_score_adj name
Sep 8 11:35:15 2664 53 {rss*4/1024} -1000 auditd
Sep 8 11:35:15 2789 56 {rss*4/1024} 0 xinetd
Sep 8 11:35:15 2710 56 {rss*4/1024} 0 dbus-dae
Sep 8 11:35:15 2694 61 {rss*4/1024} 0 irqbalan
Sep 8 11:35:15 1828 116 {rss*4/1024} -1000 udevd
Sep 8 11:35:15 2680 181 {rss*4/1024} 0 rsyslogd
Sep 8 11:35:15 2779 203 {rss*4/1024} -1000 sshd
Sep 8 11:35:15 Out of memory: Kill process 43390 (mysql) score 1000 or sacrifice child
========================
Sep 8 11:35:15 pid rss memused_MB oom_score_adj name
Sep 8 11:35:15 2664 53 {rss*4/1024} -1000 auditd
Sep 8 11:35:15 2789 56 {rss*4/1024} 0 xinetd
Sep 8 11:35:15 2710 56 {rss*4/1024} 0 dbus-dae
Sep 8 11:35:15 2694 61 {rss*4/1024} 0 irqbalan
Sep 8 11:35:15 1828 116 {rss*4/1024} -1000 udevd
Sep 8 11:35:15 2680 181 {rss*4/1024} 0 rsyslogd
Sep 8 11:35:15 2779 203 {rss*4/1024} -1000 sshd
Sep 8 11:35:15 Out of memory: Kill process 43390 (mysql) score 1000 or sacrifice child
========================
awk solution:
$ cat tst.awk
/swapents/ {
    x = 1
    print "=================="
    printf( "%s %s %s pid\t%4s\tmemused_MB\toom_score_adj\tname\n", $1, $2, $3, "rss")
    next
}
/Out of memory/ {
    printf( "%s %s %s %s\n", $1, $2, $3, substr($0, index($0, $7)))
    x = 0
}
x {
    printf( "%s %s %s %s\t%4d\t%10.5f\t%13d\t%s\n", $1, $2, $3, $7, $11, ($11*4)/1024, $15, $16 )
}
You can play around with the formatting, like the precision in column 6, using the specifiers in the printf function. Call this with:
$ awk -f tst.awk /var/log/messages
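For instance, changing the precision of the memused_MB column is just a different conversion specifier; a standalone illustration, using the rss value from the rsyslogd line:

```shell
# column 6 above uses %10.5f; %9.2f gives the same value at two decimals
awk 'BEGIN { printf "%10.5f\n", 1181 * 4 / 1024 }'   # "   4.61328"
awk 'BEGIN { printf "%9.2f\n",  1181 * 4 / 1024 }'   # "     4.61"
```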
EDIT: with sorting
OP asked to sort the output by the rss column. Using a standard sort wouldn't work here, because you want to sort between the starting and ending matches. You can solve this by saving the intermediate result in an array and sorting that with a self-defined function. Like this:
$ cat tst2.awk
/swapents/ {
    x = 1
    print "=================="
    printf( "%s %s %s pid\t%4s\tmemused_MB\toom_score_adj\tname\n", $1, $2, $3, "rss")
    next
}
/Out of memory/ {
    n = asort(a, sorted, "cmp_rss")
    for (i = 1; i <= n; i++) {
        print sorted[i]
    }
    delete a
    printf( "%s %s %s %s\n", $1, $2, $3, substr($0, index($0, $7)))
    x = 0
}
x {
    a[i++] = sprintf( "%s %s %s %s\t%4d\t%10.5f\t%13d\t%s", $1, $2, $3, $7, $11, ($11*4)/1024, $15, $16 )
}
function cmp_rss(i1, v1, i2, v2)
{
    split(v1, a1, " ")
    split(v2, a2, " ")
    rss1 = a1[5]
    rss2 = a2[5]
    return (rss1 - rss2)
}
which leads to:
$ awk -f tst2.awk input.txt
==================
Sep 8 11:35:15 pid rss memused_MB oom_score_adj name
Sep 8 11:35:15 2664 53 0.20703 -1000 auditd
Sep 8 11:35:15 2710 56 0.21875 0 dbus-daemon
Sep 8 11:35:15 2789 56 0.21875 0 xinetd
Sep 8 11:35:15 2694 61 0.23828 0 irqbalance
Sep 8 11:35:15 1828 116 0.45312 -1000 udevd
Sep 8 11:35:15 2779 203 0.79297 -1000 sshd
Sep 8 11:35:15 2680 1181 4.61328 0 rsyslogd
Sep 8 11:35:15 Out of memory: Kill process 43390 (mysql) score 1000 or sacrifice child
==================
Sep 8 11:35:15 pid rss memused_MB oom_score_adj name
Sep 8 11:35:15 2664 53 0.20703 -1000 auditd
Sep 8 11:35:15 2710 56 0.21875 0 dbus-daemon
Sep 8 11:35:15 2789 56 0.21875 0 xinetd
Sep 8 11:35:15 2694 61 0.23828 0 irqbalance
Sep 8 11:35:15 1828 116 0.45312 -1000 udevd
Sep 8 11:35:15 2779 203 0.79297 -1000 sshd
Sep 8 11:35:15 2680 1181 4.61328 0 rsyslogd
Sep 8 11:35:15 Out of memory: Kill process 43390 (mysql) score 1000 or sacrifice child
Based on Marc Lambrichs' response I was able to create this one-liner that does the job. Thanks so much. The only thing missing now is sorting by the rss column; I haven't been able to get the rss field sorted.
less /var/log/messages|awk '/swapents/ {x=1; print "==================";gsub(/\[|\]/, "") ;printf "%s %s %s %s %10s %10s %10s %15s memory_used %-s\n", $1,$2,$3,$4,$7,$10,$11,$15,$16 ;next } {gsub(/\[|\]/, "")} /Out of memory/ {print $0 ;x=0 } x {printf "%s %s %s %s %10s %10s %10s %15s %9.2fMB %-s\n", $1,$2,$3,$4,$7,$10,$11,$15,$11*4/1024,$16}'
For readability, the awk code formatted:
/swapents/ {
    x = 1
    print "=================="
    gsub(/\[|\]/, "")
    printf "%s %s %s %s %10s %10s %10s %15s memory_used %-s\n", $1, $2, $3, $4, $7, $10, $11, $15, $16
    next
}
{
    gsub(/\[|\]/, "")
}
/Out of memory/ {
    print $0
    x = 0
}
x {
    printf "%s %s %s %s %10s %10s %10s %15s %9.2fMB %-s\n", $1, $2, $3, $4, $7, $10, $11, $15, $11*4/1024, $16
}
I am trying to split my file based on the patterns it contains and store the pieces as separate text files named after each pattern.
I am using awk to split my lines on two different separators ("\t" and "_"), use $2 as the file name, and append each line ($0) to the related generated file.
My command is:
awk 'BEGIN {FS="[_ \t]"}; {fn=$2"_HP100.txt"}; {print $0 >fn}' my_file.txt
awk can generate a text file named after my $2, and it can also give $0 as the whole line when I print it separately. But my generated text files are always empty... Am I missing something?
My text file contains:
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
Expected output is a CGTCGCCAAACTTAAC_HP100.txt file that contains all the lines with the _CGTCGCCAAACTTAAC tag.
You can just use this awk command:
awk -F '[\t_]' '{print $0 >> ($2 "_HP100.txt")}' file
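A quick way to sanity-check this, using hypothetical two-line sample data (the tag after the underscore becomes the file name):

```shell
# build a tiny tab-separated sample, then split it by the tag after "_"
printf 'read1_AAAA\t0\t12\t100\nread2_CCCC\t16\t2\t200\n' > sample.txt
awk -F '[\t_]' '{print $0 >> ($2 "_HP100.txt")}' sample.txt
cat AAAA_HP100.txt    # contains the full first input line
```

`>>` appends, so the files survive across runs; with `>` inside awk, the file is truncated on first use and then appended to for the rest of that run, which also works for a single invocation.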
With your shown samples and attempts, please try the following awk code. It is a combination of awk + sort + cut + awk; this helps in case you have multiple lines whose 2nd fields are the same and you want to save all lines with the same $2 into the same output file. It sorts on $2 (by making the 2nd field the 1st field so it can be sorted, and then keeping only the required/actual portion of the input).
awk -F'_|\t' '{print $2,$0}' Input_file | sort | cut -d' ' -f2- |
awk -F'_|\t' '
prev != $2 {
    close(outputFile)
    outputFile = $2 "_HP100.txt"
}
{
    print $0 > (outputFile)
    prev = $2
}
'
NOTE: pass your Input_file to the first awk used in the code above. Also, this code is only suitable for small files.
The approach I have in mind to deal with the 50 GB issue is to collect a certain amount each round; when the arrays reach a fixed threshold, output the rows grouped by their file names,
instead of nonstop concatenating into a very, very long string. The current approach uses one array for the rows themselves, and another for mapping file names to row numbers (i.e. NR):
collect until the threshold is reached (I didn't set one; that's up to you)
dump out all data in the arrays to files
— this also allows re-creating the original input row order
use print (. . .) >> (fn) instead of >, just in case
use split("", arr) to reset both arrays to a pristine state
— it's cleaner and more portable than the delete arr statement
host-system resource friendly:
minimizes concurrent resource usage on the host by requiring only a single instance of awk
the threshold you set also constrains RAM usage at any moment to a sane level
avoids the need for shell loops, xargs, or GNU parallel
avoids nonstop close(fn)
avoids wasting time on any amount of data sorting
mawk '1;1;1;1;1;1;1' raw_input.txt | shuf |
mawk2 '
BEGIN { OFS = "_HP100.txt"
FS = "^[^_]+_|[ \t].+$"
} { ___[ NR ] = $_
sub(OFS, _, $!(NF = NF))
sub( "$", "=" NR,____[$_])
} END {
OFS = "\f\r\t\t"
for(_ in ___) { print _, ___[_] }
for(_ in ____) { print _,____[_] } } '
1
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
2
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
3
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
4
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
5
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
6
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
7
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
8
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
9
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
10
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
11
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
12
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
13
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
14
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
15
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
16
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
17
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
18
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
19
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
20
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
21
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
22
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
23
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
24
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
25
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
26
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
27
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
28
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
29
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
30
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
31
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
32
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
33
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
34
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
35
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
36
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
37
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
38
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
39
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
40
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
41
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
42
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
43
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
44
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
45
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
46
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
47
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
48
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
49
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
50
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
51
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
52
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
53
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
54
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
55
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
56
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
57
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
58
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
59
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
60
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
61
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
62
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
63
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
TCATACCAAGTCTCCG_HP100.txt
=5=12=15=23=24=40=48
CGTCGCCATCGCTAGG_HP100.txt
=29=30=34=38=44=46=50
TCATCGAACCTCGTTG_HP100.txt
=9=25=27=31=36=49=59
AGCGTCTTTCATGCTG_HP100.txt
=3=8=14=17=18=22=26
GCCTATTCCCTCGTTG_HP100.txt
=11=37=52=54=56=57=61
GATGCGTTTTCTGGTT_HP100.txt
=6=7=10=20=39=41=55
TCGAACGACCGTTGCG_HP100.txt
=16=19=32=42=43=60=63
TGGATCAACAGGACCA_HP100.txt
=1=13=33=35=45=47=62
CGTCGCCAAACTTAAC_HP100.txt
=2=4=21=28=51=53=58
I have a benchmarking tool that has an output looking like this:
Algorithm Data Size CPU Time (ns)
----------------------------------------
bubble_sort 1 16.1
bubble_sort 2 19.1
bubble_sort 4 32.8
bubble_sort 8 74.3
bubble_sort 16 257
bubble_sort 32 997
bubble_sort 64 4225
bubble_sort 128 18925
bubble_sort 256 83565
bubble_sort 512 313589
bubble_sort 1024 1161146
insertion_sort 1 16.1
insertion_sort 2 17.7
insertion_sort 4 26.5
insertion_sort 8 43.7
insertion_sort 16 96.1
insertion_sort 32 263
insertion_sort 64 770
insertion_sort 128 2807
insertion_sort 256 10775
insertion_sort 512 38956
insertion_sort 1024 135419
std_sort 1 17.3
std_sort 2 20.7
std_sort 4 24.4
std_sort 8 32.7
std_sort 16 59.6
std_sort 32 173
std_sort 64 345
std_sort 128 762
std_sort 256 1769
std_sort 512 3982
std_sort 1024 18500
And I'm trying to transform this to become more like this:
Data Size bubble_sort insertion_sort std_sort
1 16.1 16.1 17.3
2 19.1 17.7 20.7
4 32.8 26.5 24.4
8 74.3 43.7 32.7
16 257 96.1 59.6
32 997 263 173
64 4225 770 345
128 18925 2807 762
256 83565 10775 1769
512 313589 38956 3982
1024 1161146 135419 18500
Is there a simple way to achieve this using awk? I'm mostly interested in the numbers in the final table, so the header line isn't essential.
==============================
EDIT:
I was actually able to achieve this using the following code
{
    map[$1][$2] = $3
}
END {
    for (algo in map) {
        some_algo = algo
        break
    }
    printf "size "
    for (algo in map) {
        printf "%s ", algo
    }
    print ""
    for (size in map[some_algo]) {
        printf "%s ", size
        for (algo in map) {
            printf "%s ", map[algo][size]
        }
        printf "\n"
    }
}
This works. However, it has two minor problems: it looks a little difficult to read, so is there a better, more idiomatic way to do the job? Also, the order of the resulting columns differs from the order of the original data rows. Is there a simple way to fix this order?
Here is an alternative:
$ sed 1,2d file |
pr -w200 -3t |
awk 'NR==1{print "Data_Size", $1,$4,$7} {print $2,$3,$6,$9}' |
column -t
Data_Size bubble_sort insertion_sort std_sort
1 16.1 16.1 17.3
2 19.1 17.7 20.7
4 32.8 26.5 24.4
8 74.3 43.7 32.7
16 257 96.1 59.6
32 997 263 173
64 4225 770 345
128 18925 2807 762
256 83565 10775 1769
512 313589 38956 3982
1024 1161146 135419 18500
Here is a Ruby solution to do this.
Ruby is very much like awk, but with additional functions and data structures. The advantage here is that it correctly deals with missing values by inserting n/a when one of the data values is missing.
$ sed 1,2d file |
ruby -lane 'BEGIN{
    h = Hash.new { |n, k| n[k] = {} }
    l = "Data_Size"
  }
  h[l][$F[1]] = $F[1]
  h[$F[0]][$F[1]] = $F[2]
  END{
    puts h.keys.join("\t")
    h[l] = h[l].sort { |a, b| a[0].to_i <=> b[0].to_i }.to_h
    h[l].each_key { |k|
      a = []
      h.each_key { |j|
        a.push(h[j][k] || "n/a")
      }
      puts a.join("\t")
    }
  }' | column -t
Taking your example and removing the line bubble_sort 4 32.8, it prints:
Data_Size bubble_sort insertion_sort std_sort
1 16.1 16.1 17.3
2 19.1 17.7 20.7
4 n/a 26.5 24.4
8 74.3 43.7 32.7
16 257 96.1 59.6
32 997 263 173
64 4225 770 345
128 18925 2807 762
256 83565 10775 1769
512 313589 38956 3982
1024 1161146 135419 18500
Comma is always killed by the OOM killer when I try to edit a file in which I use 10 modules (mainly Cro::HTTP), shortly after analysis starts. I can see that many raku processes are running (in order to analyze?):
journalctl output:
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 4722] 1000 4722 654 29 40960 0 0 comma.sh
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 4771] 1000 4771 1085318 155576 1949696 0 0 java
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 4825] 1000 4825 783 35 40960 0 0 fsnotifier64
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5036] 1000 5036 52039 24008 364544 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5038] 1000 5038 51119 25114 372736 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5039] 1000 5039 52391 23805 368640 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5047] 1000 5047 51473 22787 352256 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5049] 1000 5049 51129 22929 356352 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5050] 1000 5050 49796 21981 348160 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5052] 1000 5052 50929 25154 368640 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5057] 1000 5057 52078 23535 364544 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5066] 1000 5066 51071 22735 348160 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5075] 1000 5075 51254 22555 356352 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5081] 1000 5081 49423 21271 335872 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5093] 1000 5093 49375 21590 344064 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5100] 1000 5100 50784 22763 352256 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5104] 1000 5104 49360 21141 335872 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: [ 5115] 1000 5115 46338 14169 282624 0 0 rakudo
janv. 09 19:47:42 samuel-Virtual-Machine kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user#1000.service,task=java,pid=4771,uid=1000
janv. 09 19:47:42 samuel-Virtual-Machine kernel: Out of memory: Killed process 4771 (java) total-vm:4341272kB, anon-rss:622236kB, file-rss:0kB, shmem-rss:68kB, UID:1000 pgtables:1904kB oom_score_adj:0
janv. 09 19:47:42 samuel-Virtual-Machine kernel: oom_reaper: reaped process 4771 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:68kB
Is this normal behavior? Is there an option to limit the number of parallel raku processes?
(I work on a small VM with 4 GB of memory.)
It's expected that Comma will invoke the selected Raku compiler in order to obtain symbols from modules. That should take place once at the start of editing a file using a particular module and then be cached (and the caching is across the project as a whole).
Aside from the number of rakudo instances spawned, the memory usage of the java process itself looks a bit on the high side. Probably it's worth asking the Comma developers to take a look at it, and providing some more detailed information. Of note, the Help menu has a "Collect Logs and Diagnostic Data" option (which will provide a zip file that can be sent to the developers, although note it may also include some data about the project you are working on). Any other information to aid reproduction (such as the list of modules being used) would also be useful.
I am following the instruction here: https://github.com/tensorflow/models/tree/master/inception
After running bazel-bin/inception/imagenet_train --num_gpus=1 --batch_size=32 --train_dir=/tmp/imagenet_train --data_dir=/tmp/imagenet_data
I get the following error:
bazel-bin/inception/download_and_preprocess_imagenet.runfiles/inception/inception/data/download_imagenet.sh: line 105: bazel-bin/inception/download_and_preprocess_imagenet.runfiles/inception/inception/data/imagenet_lsvrc_2015_synsets.txt: No such file or directory
I saw the post on this in 202 but the suggestion to "add main before /inception" in work_dir did not solve the problem. Below is the output of ls -l -R bazel-bin/inception/download_and_preprocess_imagenet.runfiles/:
bazel-bin/inception/download_and_preprocess_imagenet.runfiles/:
total 8
drwxr-xr-x 3 parsa parsa 4096 Jun 30 14:34 inception
-r-xr-xr-x 1 parsa parsa 1737 Jun 30 14:34 MANIFEST
bazel-bin/inception/download_and_preprocess_imagenet.runfiles/inception:
total 4
drwxr-xr-x 3 parsa parsa 4096 Jun 30 14:34 inception
bazel-bin/inception/download_and_preprocess_imagenet.runfiles/inception/inception:
total 12
lrwxrwxrwx 1 parsa parsa 149 Jun 30 14:34 build_imagenet_data -> /home/parsa/.cache/bazel/_bazel_parsa/cf59658e104287859f50b192c32a27cc/execroot/inception/bazel-out/local-fastbuild/bin/inception/build_imagenet_data
drwxr-xr-x 2 parsa parsa 4096 Jun 30 14:34 data
lrwxrwxrwx 1 parsa parsa 162 Jun 30 14:34 download_and_preprocess_imagenet -> /home/parsa/.cache/bazel/_bazel_parsa/cf59658e104287859f50b192c32a27cc/execroot/inception/bazel-out/local-fastbuild/bin/inception/download_and_preprocess_imagenet
-r-xr-xr-x 1 parsa parsa 0 Jun 30 14:34 init.py
bazel-bin/inception/download_and_preprocess_imagenet.runfiles/inception/inception/data:
total 32
lrwxrwxrwx 1 parsa parsa 94 Jun 30 14:34 build_imagenet_data.py -> /home/parsa/Documents/development/brain/models/inception/inception/data/build_imagenet_data.py
lrwxrwxrwx 1 parsa parsa 107 Jun 30 14:34 download_and_preprocess_imagenet.sh -> /home/parsa/Documents/development/brain/models/inception/inception/data/download_and_preprocess_imagenet.sh
lrwxrwxrwx 1 parsa parsa 92 Jun 30 14:34 download_imagenet.sh -> /home/parsa/Documents/development/brain/models/inception/inception/data/download_imagenet.sh
lrwxrwxrwx 1 parsa parsa 114 Jun 30 14:34 imagenet_2012_validation_synset_labels.txt -> /home/parsa/Documents/development/brain/models/inception/inception/data/imagenet_2012_validation_synset_labels.txt
lrwxrwxrwx 1 parsa parsa 103 Jun 30 14:34 imagenet_lsvrc_2015_synsets.txt -> /home/parsa/Documents/development/brain/models/inception/inception/data/imagenet_lsvrc_2015_synsets.txt
lrwxrwxrwx 1 parsa parsa 93 Jun 30 14:34 imagenet_metadata.txt -> /home/parsa/Documents/development/brain/models/inception/inception/data/imagenet_metadata.txt
-r-xr-xr-x 1 parsa parsa 0 Jun 30 14:34 init.py
lrwxrwxrwx 1 parsa parsa 110 Jun 30 14:34 preprocess_imagenet_validation_data.py -> /home/parsa/Documents/development/brain/models/inception/inception/data/preprocess_imagenet_validation_data.py
lrwxrwxrwx 1 parsa parsa 97 Jun 30 14:34 process_bounding_boxes.py -> /home/parsa/Documents/development/brain/models/inception/inception/data/process_bounding_boxes.py
I solved the problem by replacing the LABELS_FILE with the actual full file path in download_and_preprocess_imagenet.sh
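For illustration only, the fix amounts to replacing a relative path with an absolute one; the original assignment shown here is hypothetical, so adjust it to whatever your copy of download_and_preprocess_imagenet.sh actually contains:

```shell
# hypothetical original: a path resolved relative to the bazel runfiles tree
# LABELS_FILE="${SYNSETS_DIR}/imagenet_lsvrc_2015_synsets.txt"
# replacement: the absolute path to the file in the checkout
LABELS_FILE="/home/parsa/Documents/development/brain/models/inception/inception/data/imagenet_lsvrc_2015_synsets.txt"
```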
I'm new to supporting WAS (WebSphere Application Server). I'm currently having an issue with my WAS installation, which runs on AIX across 2 servers/nodes.
While investigating, I found in our application log some activity called "Performing Cache Maintenance":
===================================
2017-01-14 01:31:52,619: [Cache Maintenance] com.ibm.srm.util.db.ServerCache refreshed
2017-01-14 01:31:53,314: [Cache Maintenance] Memory: available=[6884mb] used=[9500mb] %used avail=[58%] max=[16384mb] %used max=[58%] total=[16384mb] free=[6884mb] used by doMaintenance=[-251,201,392bytes] Time=[22,818ms]
2017-01-14 01:51:53,325: -------- Performing Cache Maintenance --------
2017-01-14 01:51:53,325: null : QN=319 Select * from perform.cache_timestamps where row_class_name not like '%Cache' and row_class_name not like '%(SRM 6.0)'
2017-01-14 01:51:53,333: Returning 19 data records, QN=319, 2 columns, Time: 8ms conn/query time: 5ms
2017-01-14 01:51:53,333: [Cache Maintenance] Memory: available=[5492mb] used=[10892mb] %used avail=[66%] max=[16384mb] %used max=[66%] total=[16384mb] free=[5492mb] used by doMaintenance=[532kb] Time=[8ms]
===================================
After this activity is triggered, I found that the mpmstats 'bsy' value keeps increasing until it reaches the MaxClients maximum, which is 4000:
===================================
[Sat Jan 14 01:38:58 2017] [notice] mpmstats: rdy 166 bsy 234 rd 0 wr 234 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 01:38:58 2017] [notice] mpmstats: bsy: 234 in mod_was_ap22_http.c
[Sat Jan 14 01:48:58 2017] [notice] mpmstats: rdy 195 bsy 505 rd 0 wr 505 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 01:48:58 2017] [notice] mpmstats: bsy: 505 in mod_was_ap22_http.c
[Sat Jan 14 01:58:58 2017] [notice] mpmstats: rdy 180 bsy 720 rd 0 wr 720 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 01:58:58 2017] [notice] mpmstats: bsy: 720 in mod_was_ap22_http.c
[Sat Jan 14 02:08:59 2017] [notice] mpmstats: rdy 105 bsy 895 rd 1 wr 894 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 02:08:59 2017] [notice] mpmstats: bsy: 894 in mod_was_ap22_http.c
[Sat Jan 14 02:18:59 2017] [notice] mpmstats: rdy 112 bsy 1088 rd 1 wr 1087 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 02:18:59 2017] [notice] mpmstats: bsy: 1087 in mod_was_ap22_http.c
[Sat Jan 14 02:28:59 2017] [notice] mpmstats: rdy 158 bsy 1242 rd 1 wr 1241 ka 0 log 0 dns 0 cls 0
----
[Sat Jan 14 04:55:34 2017] [notice] mpmstats: rdy 0 bsy 4000 rd 0 wr 4000 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 04:55:34 2017] [notice] mpmstats: bsy: 4000 in mod_was_ap22_http.c
[Sat Jan 14 04:57:04 2017] [notice] mpmstats: reached MaxClients (4000/4000)
[Sat Jan 14 04:57:04 2017] [notice] mpmstats: rdy 0 bsy 4000 rd 0 wr 4000 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 04:57:04 2017] [notice] mpmstats: bsy: 4000 in mod_was_ap22_http.c
[Sat Jan 14 04:58:34 2017] [notice] mpmstats: reached MaxClients (4000/4000)
[Sat Jan 14 04:58:34 2017] [notice] mpmstats: rdy 0 bsy 4000 rd 0 wr 4000 ka 0 log 0 dns 0 cls 0
[Sat Jan 14 04:58:34 2017] [notice] mpmstats: bsy: 4000 in mod_was_ap22_http.c
===================================
It seems WAS is not processing the client requests, so busy workers keep accumulating until the maximum is reached.
The questions are:
Is there any log I can check to find out why WAS is not processing the client requests until the maximum is reached?
Does the "Cache Maintenance" activity block WAS from processing client requests? As mentioned by our developer, this activity should not lead to this issue.
What procedure can I follow to identify/resolve this issue?
I'd appreciate any help with this, as the issue has been occurring for a long time and is still not resolved.