I'm trying to match the lines containing (123) and then manipulate field 2 replacing x and + by space that will give 4 columns. Then change order of column 3 by Column 4.
To finally print sorted first by column 3 and second by column 4.
I'm able to get the output piping sort command after awk output in this way.
$ echo "
0: 1920x1663+0+0 kpwr(746)
323: 892x550+71+955 kpwr(746)
211: 891x550+1003+410 kpwr(746)
210: 892x451+71+410 kpwr(746)
415: 891x451+1003+1054 kpwr(746)
1: 894x532+70+330 kpwr(123)
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
61: 892x1+71+375 kpwr(0)
252: 892x1+71+921 kpwr(0)" |
awk '/\(123\)/{b = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2); print b}' |
sort -k3 -k4 -n
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
How can I get the same output using only awk without the need to pipe sort? Thanks for any help.
Here is how you can get it from awk (gnu) itself:
awk '/\(123\)/{
$2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
split($2, a) # split by space and store into array a
# store array by index 3 and 4
rec[a[3]][a[4]] = (rec[a[3]][a[4]] == "" ? "" : rec[a[3]][a[4]] ORS) $2
}
END {
PROCINFO["sorted_in"]="#ind_num_asc" # sort by numeric key ascending
for (i in rec) # print stored array rec
for (j in rec[i])
print rec[i][j]
}' file
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
Can you handle GNU awk?:
$ gawk '
BEGIN {
PROCINFO["sorted_in"]="#val_num_asc" # for order strategy
}
/\(123\)$/ { # pick records
split($2,t,/[+x]/) # split 2nd field
if((t[4] in a) && (t[3] in a[t[4]])) { # if index collision
n=split(a[t[4]][t[3]],u,ORS) # split stacked element
u[n+1]=t[1] OFS t[2] OFS t[4] OFS t[3] # add new data
delete a[t[4]][t[3]] # del before rebuilding
for(i in u) # sort on whole record
a[t[4]][t[3]]=a[t[4]][t[3]] ORS u[i] # restack to element
} else
a[t[4]][t[3]]=t[1] OFS t[2] OFS t[4] OFS t[3] # no collision, just add
}
END {
PROCINFO["sorted_in"]="#ind_num_asc" # strategy on output
for(i in a)
for(j in a[i])
print a[i][j]
}' file
Output:
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
With collisioning data like:
1: 894x532+70+330 kpwr(123) # this
1: 123x456+70+330 kpwr(123) # and this, notice order
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
output would be:
123 456 330 70 # ordered by the whole record when collision
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
I was almost done with writing and my solution was ditto as #anubhava's so adding a bit tweak to his solution :) This one will take care of multiple lines of same values here.
awk '
BEGIN{
PROCINFO["sorted_in"]="#ind_num_asc"
}
/\(123\)/{
$2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
split($2, a," ")
arr[a[3]][a[4]] = (arr[a[3]][a[4]]!=""?arr[a[3]][a[4]] ORS:"")$2
}
END {
for (i in arr){
for (j in arr[i]){ print arr[i][j] }
}
}' Input_file
So I have a log file that contains entries like this:
[STAT] - December 11, 2017 13:16:05.360
.\something.cpp(99): [Text] Code::Open Port 1, baud 9600, parity 0, Stop bits 0, flow control 0
[STAT] - December 11, 2017 13:20:24.637
.\something\more\code.cpp(100): [log]
fooBarBaz[32] = 32, 1, 2, 7, 3, 1092, 5, 196875, 6, 270592, 20, 196870, 8, 289, 30, 196867, 11, 1156, 5, 196875, 28, 278784, 5, 196874, 32, 266496, 30, 6866, 36, 147712, 5, 196874,
[STAT] - December 11, 2017 13:20:40.939
.\something\more\code.cpp(100): [log]
fooBarBaz[8] = 8, 1, 2, 1, 31, 532992, 5, 196875,
[STAT] - December 11, 2017 13:18:16.214
.\something\more\code.cpp(100): [log]
fooBarBaz[12] = 12, 1, 2, 2, 17, 296960, 10, 196872, 51, 1792, 50, 196878,
On the command line, I can do this:
gawk -F', *' '/fooBarBaz\[[^0].*\]/ {for (f=5; f<=NF; f+=4) print $f | "sort -n" }' log
Which produces an output like this:
3
6
8
11
17
28
31
32
36
51
I'd like to have an awk script do the same thing, but my efforts so far haven't
worked.
#!/usr/local/bin/gawk -f
BEGIN { print "lines"
FS=", *";
/fooBarBaz\[[^0].*\]/
}
{
{for (f=5; f<=NF; f+=4) print $f}
}
I don't think my regular expression statement is in the right place, because
running gawk -f script.awk prints lines not relevant to my data.
What am I doing wrong?
tl;dr: On lines with fooBarBaz and not [0], I want to parse the digits starting at position 5 and then position 4 to the end of the line.
Optimized GNU awk solution:
parse_digits.awk script:
#!/bin/awk -f
BEGIN{
FS=", *";
PROCINFO["sorted_in"]="#ind_num_asc";
print "lines";
}
/fooBarBaz\[[1-9]+[0-9]*\]/{
for (i=5; i <= NF; i+=4)
if ($i != "") a[$i]
}
END{
for (i in a) print i
}
Usage:
awk -f parse_digits.awk inputfile
The output:
lines
3
6
8
11
17
28
31
32
36
51
I am trying to look for $2 of file1 (skipping the header) in $2 of file2 and if they match and the value in $10 is > 30 and $11 is > 49, then print the line to a output file. The below awk has syntax errors in it though shellcheck didn't return any. Both the input and output are tab-delimited. I think the below is close, but not sure what is wrong. Thank you :).
awk
awk -F'\t' -v OFS='\t' 'NR==FNR{A[$2];next}$2 in A
{if($10 >.5 OFS $11 > 49)
print ; next
' file1 file2
awk: cmd. line:2: {if($10 >.5 OFS $11 > 49)
awk: cmd. line:2: ^ syntax error
awk: cmd. line:3: print ; next
awk: cmd. line:3: ^ unexpected newline or end of string
file1
Missing in IDP but found in Reference:
2 166848646 G A exonic SCN1A 68 13 16;20 0;0 17;15 0;0 0;0 0;0 c.[5139C>T]+[=] 52.94
file2
chr2 166245425 SCN2A AMPL5155065355 SNP Het C/T C T 54 100 50 23 27
chr2 166848646 SCN1A AMPL1543060606 SNP Het G/A G A 52.9411764706 100 68 32 36
desired output
2 166848646 G A exonic SCN1A 68 13 16;20 0;0 17;15 0;0 0;0 0;0 c.[5139C>T]+[=] 52.94
edit with new awk
awk -F'\t' -v OFS='\t' 'NR==FNR{A[$2];next}$2 in A {
if($10 >.5 OFS $11 > 49) >>> if($10 >.5 && $11 > 49)
print }
' file1 file2 > out
awk: cmd. line:2: if($10 >.5 OFS $11 > 49) >>> if($10 >.5 && $11 > 49)
awk: cmd. line:2: ^ syntax error
here you go...
$ awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$2]; next}
($2 in a) && $10>30 && $11>49 ' file1 file2
I am trying to use awk to remove the lines in file that do not match the digits after the NM_ but before the . in $2 of list. Thank you :).
file
204 NM_003852 chr7 + 138145078 138270332 138145293
204 NM_015905 chr7 + 138145078 138270332 138145293
list
TRIM24 NM_015905.2
awk
awk -v OFS="\t" '{ sub(/\r/, "") } ; NR==FNR { N=$2 ; sub(/\..*/, "", $2); A[$2]=N; next } ; $2 in A { $2=A[$2] } 1' list file > out
current output
204 NM_003852 chr7 + 138145078 138270332 138145293
204 NM_015905.2 chr7 + 138145078 138270332 138145293
desired output (line 1 removed as that is the line that does not match)
204 NM_015905.2 chr7 + 138145078 138270332 138145293
awk 'NR==FNR{split($2,f2,".");a[f2[1]];next} $2 in a' list file
$ awk -F'[ .]' 'NR==FNR{a[$2];next}$2 in a' list file
204 NM_015905 chr7 + 138145078 138270332 138145293
fatal: not enough arguments to satisfy format string
`%s SPT=80'
^ ran out for this one
This my code
for ((h = 1 ; h < 4 ; h++ )); do
x=$(awk -v i=h -v j=17 'FNR == 2 {printf "%s " $j}' newiptables.log)
echo $x
This is my file
Dec 26 09:17:51 localhost kernel: IN=eth0 OUT= MAC=00:10:c6:a8:da:68:00:90:7f:9c:50:5a:08:00 SRC=198.252.206.16 DST=10.128.1.225 LEN=313 TOS=0x00 PREC=0x00 TTL=64 ID=59334 PROTO=TCP SPT=80 DPT=56506 WINDOW=46535 RES=0x00 ACK PSH URGP=0
Dec 26 09:17:52 localhost kernel: IN=eth0 OUT= MAC=00:10:c6:a8:da:68:00:90:7f:9c:50:5a:08:00 SRC=198.252.206.16 DST=10.128.1.225 LEN=1440 TOS=0x00 PREC=0x00 TTL=64 ID=47303 PROTO=TCP SPT=80 DPT=56506 WINDOW=46535 RES=0x00 ACK URGP=0
Dec 26 09:17:52 localhost kernel: IN=eth0 OUT= MAC=00:10:c6:a8:da:68:00:90:7f:9c:50:5a:08:00 SRC=198.252.206.16 DST=10.128.1.225 LEN=1440 TOS=0x00 PREC=0x00 TTL=64 ID=47559 PROTO=TCP SPT=80 DPT=56506 WINDOW=46535 RES=0x00 ACK URGP=0
The problem is a missing comma in the printf command for awk:
awk -v i=h -v j=17 'FNR == 2 {printf "%s ", $j}' newiptables.log
^
|== This is needed
Quoting from the manual:
A simple printf statement looks like this:
printf format, item1, item2, ...