Print blank space in a table with awk
I have input that looks something like this:
No Type Pid Status Cause Start Rstr Err Sem Time Program Cl User Action Table
-------------------------------------------------------------------------------------------------------------------------------
0 DIA 10897 Wait yes no 0 0 0 NO_ACTION
1 DIA 10903 Wait yes no 0 0 0 NO_ACTION
2 DIA 10909 Wait yes no 0 0 0 NO_ACTION
3 DIA 10916 Wait yes no 0 0 0 NO_ACTION
4 DIA 10917 Wait yes no 0 0 0 NO_ACTION
5 DIA 9061 Wait yes no 1 0 0 NO_ACTION
6 DIA 10919 Wait yes no 0 0 0 NO_ACTION
7 DIA 10920 Wait yes no 0 0 0 NO_ACTION
8 UPD 10921 Wait yes no 0 0 0 NO_ACTION
9 BTC 24376 Wait yes no 0 0 0 NO_ACTION
10 BTC 25651 Wait yes no 1 0 0 NO_ACTION
11 BTC 25361 Wait yes no 0 0 0 NO_ACTION
12 BTC 15201 Wait yes no 0 0 0 NO_ACTION
13 BTC 5241 Wait yes no 0 0 0 NO_ACTION
14 BTC 23572 Wait yes no 0 0 0 NO_ACTION
15 BTC 8603 Wait yes no 0 0 0 NO_ACTION
16 BTC 1418 Wait yes no 0 0 0 NO_ACTION
17 BTC 18127 Wait yes no 1 0 0 NO_ACTION
18 BTC 14780 Wait yes no 0 0 0 NO_ACTION
19 BTC 18234 Wait yes no 0 0 0 NO_ACTION
20 BTC 14856 Wait yes no 0 0 0 NO_ACTION
21 SPO 10934 Wait yes no 0 0 0 NO_ACTION
22 UP2 10939 Wait yes no 0 0 0 NO_ACTION
Now I am using awk to convert it to something like the below:
NO=0,Type=DIA,Pid=10897,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=
The above is a sample for one line; it would be the same for all lines.
We remove the column header with a sed command at runtime. When we then run awk, it does not see the blank column between Status and Cause, so the value that belongs to Start ends up in Cause.
We are using the below command:
awk 'BEGIN{FS=" ";OFS=","}{print "NO="$1,"Type="$2,"Pid="$3,"Status="$4,"Cause="$5,"Start="$6,"Rstr="$7,"Err="$8,"Sem="$9,"Time="$10,"Program="$11,"Cl="$12,"User="$13,"Action="$14,"Table="$15;}'
We want the output to be like this:
NO=0,Type=DIA,Pid=10897,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
One more thing: these blank fields will sometimes contain values.
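A minimal sketch (with made-up values) of why the shift happens: awk's default whitespace splitting collapses runs of spaces, so a blank column simply disappears and every later field moves left.

```shell
# Two records with three fixed-width columns; in the second record the
# middle column is blank. Default-FS awk sees 3 fields, then only 2.
printf 'a b c\na   c\n' | awk '{ print NF }'
```

This is exactly why the value that belongs to Start lands in Cause once the blank Cause column is swallowed.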
This may do:
awk 'NR==1 {for (i=1;i<=NF;i++) a[i]=$i;c=NF;next} NR>2 {for (i=1;i<=c;i++) printf "%s=%s,",a[i],$i;print ""}' file
No=0,Type=DIA,Pid=10897,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=1,Type=DIA,Pid=10903,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=2,Type=DIA,Pid=10909,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=3,Type=DIA,Pid=10916,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=4,Type=DIA,Pid=10917,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=5,Type=DIA,Pid=9061,Status=Wait,Cause=yes,Start=no,Rstr=1,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=6,Type=DIA,Pid=10919,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=7,Type=DIA,Pid=10920,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=8,Type=UPD,Pid=10921,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=9,Type=BTC,Pid=24376,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=10,Type=BTC,Pid=25651,Status=Wait,Cause=yes,Start=no,Rstr=1,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=11,Type=BTC,Pid=25361,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
No=12,Type=BTC,Pid=15201,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=,
NO_ACTION is hard to handle, but it could be done with fixed field widths, e.g. FIELDWIDTHS="3 3 3 3 3 3 3 3". Since the headers are not aligned with the data, though, it is hard to do in a simple command.
There is no clear information on what your data looks like. We do not know whether it is tab-delimited (which would be nice) or only space-delimited. If it is space-delimited, as the example suggests, then it is hard to distinguish empty columns.
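For contrast, a tab-delimited sketch (made-up values): with FS set to a literal tab, two adjacent tabs delimit an empty field, so nothing shifts.

```shell
# Adjacent tabs keep the empty middle column as an empty field 2.
printf 'Wait\t\tyes\n' | awk -F'\t' '{ print "Cause=[" $2 "]" }'
```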
The only way I can see to distinguish empty columns is by assuming that each header in the input file is aligned with its corresponding column, so we can use this to our advantage. The following solution is for GNU awk 4.2 or higher.
Have a file convert.awk which contains the following content:
BEGIN{ OFS="," }
# Read header and find the starting index of each column
# and the corresponding length.
# We assume that the headers are uniquely defined.
(FNR==1) {
    h[1]=$1; l=1
    for (i=2;i<=NF;++i) {
        h[i]=$i; t=index($0,$i); f=f " "(t-l); l=t
    }
    n=NF; FIELDWIDTHS = f " *"
    next
}
# skip ruler
/^[-]+$/ { next }
# print record
{
    for (i=1;i<=n;++i) {
        t=(i>NF ? "" : $i); gsub("(^ *| *$)","",t)
        printf "%s%s=%s",(i==1?"":OFS),h[i],t
    }
    printf ORS
}
and run with:
$ awk -f convert.awk input > output
This outputs:
No=0,Type=DIA,Pid=10897,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=1,Type=DIA,Pid=10903,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=2,Type=DIA,Pid=10909,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=3,Type=DIA,Pid=10916,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=4,Type=DIA,Pid=10917,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=5,Type=DIA,Pid=9061,Status=Wait,Cause=,Start=yes,Rstr=no,Err=1,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=6,Type=DIA,Pid=10919,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=7,Type=DIA,Pid=10920,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=8,Type=UPD,Pid=10921,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=9,Type=BTC,Pid=24376,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=10,Type=BTC,Pid=25651,Status=Wait,Cause=,Start=yes,Rstr=no,Err=1,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=11,Type=BTC,Pid=25361,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=12,Type=BTC,Pid=15201,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=13,Type=BTC,Pid=5241,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=14,Type=BTC,Pid=23572,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=15,Type=BTC,Pid=8603,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=16,Type=BTC,Pid=1418,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=17,Type=BTC,Pid=18127,Status=Wait,Cause=,Start=yes,Rstr=no,Err=1,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=18,Type=BTC,Pid=14780,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=19,Type=BTC,Pid=18234,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=20,Type=BTC,Pid=14856,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=21,Type=SPO,Pid=10934,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
No=22,Type=UP2,Pid=10939,Status=Wait,Cause=,Start=yes,Rstr=no,Err=0,Sem=0,Time=0,Program=,Cl=,User=,Action=NO_ACTION,Table=
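As a quick sanity check on the converted output (a sketch; the file name `output` is assumed, and the 15-pair count comes from the 15-column header above), you can flag any record whose pair count is off:

```shell
# Flag any converted record that does not contain exactly 15 key=value
# pairs (one per header column).
awk -F, 'NF != 15 { print "bad record on line " NR }' output
```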
Related
Awk multiply integers in geometric sequence in each cell
I have a dataframe like this:

0 1 0 1 0 0
1 1 1 1 0 0
0 0 1 1 0 1
.
.
.

And I want to multiply each of them with a geometric sequence 1, 10, 100, 1000, 10000, ..., 10^(n-1), so the result will be:

0 10 0 1000 0 0
1 10 100 1000 0 0
0 0 100 1000 0 100000
.
.
.

I have tried with:

awk '{n=0 ; x=0 ; for (i = 1; i <= NF; i++) if ($i == 1) {n=10**i ; x = x+n } print x }' test.txt

But the results were not what I expected.
With GNU awk:

awk '{for (i=1; i<=NF; i++){if($i==1){n=10**(i-1); $i=$i*n}} print}' test.txt

Output:

0 10 0 1000 0 0
1 10 100 1000 0 0
0 0 100 1000 0 100000
Note: in this answer, we always assume single digits per column.

There are a couple of things you have to take into account. If you have a sequence given by:

a b c d e

then the final number will be edcba. awk is not aware of integers, but knows only floating-point numbers, so there is a maximum number it can reach from an integer perspective, and that is 2^53 (see: biggest integer that can be stored in a double). This means that multiplication is not the way forward. Even if we don't use awk, this concern remains for integer arithmetic, as the maximum value is 2^64-1 in the unsigned version.

Having said that, it is better to just write the number with n places and use 0 as a filler. For example, if you want to compute 3 × 10^4, you can do:

awk 'BEGIN{printf "%0.*d",4+1,3}' | rev

Here we make use of rev to reverse the string (00003 → 30000).

Solution 1: in the OP, the code alludes to the fact that the final sum is requested (a b c d e → edcba), so we can just do one of the following:

sed 's/ //g' file | rev
awk -v OFS="" '{$1=$1}1' file | rev

If you want to get rid of possible leading zeros in the result, you can do:

sed 's/ //g' file | rev | sed 's/^0*//'

Solution 2: if the OP only wants the multiplied columns as output, we can do:

awk '{for(i=NF;i>0;--i) printf("%0.*d"(i==1?ORS:OFS),i,$i)}' file | rev

Solution 3: if the OP wants both the sum and the multiplied columns as output:

awk '{ s=$0; gsub(/ /,"",s); printf s OFS }
     { for(i=NF;i>0;--i) printf("%0.*d"(i==1?ORS:OFS),i,$i) }' file | rev
What you wrote is absolutely not what you want. Your awk program parses each line of the input and computes only one number per line, which happens to be 10 times the integer you would see if you wrote the 0's and 1's in reverse order. So, for a line like:

1 0 0 1 0 1

your awk program computes 10+0+0+10000+0+1000000 = 1010010. As you can see, this is the same as 10 times 101001 (100101 reversed).

To do what you want, you can loop over all fields and modify them on the fly by multiplying them by the corresponding power of 10, as shown in another answer. Note: another awk solution, a bit more compact but strictly equivalent for your inputs, could be:

awk '{for(i=1;i<=NF;i++) $i*=10**(i-1)} {print}' test.txt

The first block loops over the fields and modifies them on the fly, multiplying each by the corresponding power of 10. The second block prints the modified line.

As noted in another answer, there is a potential overflow issue with the pure arithmetic approach. If you have lines with many fields, you could hit the maximum integer representable in floating-point format. It could be that the strange 1024 values in the output you show are due to this. If there is a risk of overflow, as suggested in the other answer, you could use another approach where the trailing zeroes are added not by multiplying by a power of 10, but by concatenating the value 0 represented on the desired number of digits, something that printf and sprintf can do:

$ awk 'BEGIN {printf("%s%0.*d\n",1,4,0)}' /dev/null
10000

So, a GNU awk solution based on this could be:

awk '{for(i=1;i<=NF;i++) $i = $i ? sprintf("%s%0.*d",$i,i-1,0) : $i} 1' test.txt
How about not doing any math at all:

{m,n,g}awk '{ for(_+=_^=___=+(__="");_<=NF;_++) { $_=$_ ( \
              __=__""___) } } gsub((_=" "(___))"+",_)^+_'

=
1 0 0 0 10000 0 0 0 0 1000000000 10000000000
1 0 0 0 10000 0 0 10000000 0 0 10000000000
1 0 0 0 10000 100000 0 0 0 0 10000000000
1 0 0 1000 0 0 1000000 0 100000000 1000000000
1 0 0 1000 10000 0 0 0 0 1000000000 10000000000
1 0 100 0 0 0 1000000 10000000 0 0 10000000000
1 0 100 0 0 100000 1000000 10000000 100000000 1000000000
1 0 100 0 10000 0 1000000 0 100000000
1 0 100 1000 0 100000 0 0 0 0 10000000000
1 0 100 1000 10000 0 0 10000000
1 10 0 0 0 0 1000000 10000000 0 1000000000
1 10 0 1000 0 100000 0 0 100000000
1 10 0 1000 0 100000 0 0 100000000 1000000000 10000000000
1 10 0 1000 0 100000 0 10000000 100000000 1000000000
1 10 100 1000 10000 100000 0 0 0 1000000000
Write GeoTIFF File to GRIB2 Using GDAL
I am looking to convert a GeoTIFF file to GRIB2 and define several pieces of metadata manually, as seen in the provided literature here. I am using the GDAL library, specifically the utility gdal_translate. My attempt to convert and pass specific metadata is as follows:

gdal_translate -b 1 -mo DISCIPLINE=0 IDS_CENTER=248 IDS_SUBCENTER=4 IDS_MASTER_TABLE=24 IDS_SIGNF_REF_TIME=1 IDS_REF_TIME=2020-07-02T00:00:00Z IDS_PROD_STATUS=0 IDS_TYPE=1 PDS_PDTN=0 PDS_TEMPLATE_NUMBERS="0 4 2 0 96 0 0 0 1 0 0 0 0 103 0 0 0 0 2 255 0 0 0 0 0 7 228 7 2 13 0 0 1 0 0 0 0 2 2 1 0 0 0 1 255 0 0 0 0" PDS_TEMPLATE_ASSEMBLED_VALUES="0 4 2 0 96 0 0 1 0 103 0 2 255 0 0 2020 7 2 13 0 0 1 0 2 2 1 1 255 0" input.tif output.grb2

However, upon executing this command I receive the following error:

ERROR 6: Too many command options 'IDS_MASTER_TABLE=24'

Potential errors: not calling the correct option (currently using -mo) when attempting to pass the metadata, all metadata pairs needing to be encased in quotation marks, etc. Any help would be greatly appreciated!
You need to add an -mo flag for every metadata item. Your command would become:

$ gdal_translate -b 1 \
    -mo DISCIPLINE=0 \
    -mo IDS_CENTER=248 \
    # etc.
    input.tif output.grb2
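If typing -mo dozens of times is tedious, the repeated flags can be generated in a shell loop (a sketch; only the first few pairs from the question are shown, and echo is used instead of actually running gdal_translate):

```shell
# Collect one "-mo KEY=VALUE" pair per metadata item in the positional
# parameters, then pass them all to gdal_translate at once.
set --
for kv in DISCIPLINE=0 IDS_CENTER=248 IDS_SUBCENTER=4; do
  set -- "$@" -mo "$kv"
done
echo gdal_translate -b 1 "$@" input.tif output.grb2
```

Using `set -- "$@" ...` instead of string concatenation keeps each option a separate argument, so values containing spaces (like PDS_TEMPLATE_NUMBERS) survive intact.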
How to awk interval expression and grouping together?
I am trying to get the measure of CPU time spent on user tasks, system tasks, interrupt handling, IO wait etc. by parsing the below output of /proc/stat. My intent is to retrieve the numerical values in the first line (the one that starts with "cpu ") into separate array elements indexed from 1 through N.

[kcube#myPc ~]$ cat /proc/stat
cpu 70508209 48325 12341967 18807644987 671141 0 11736 0 0 0
cpu0 4350458 319 868828 1175271469 23047 0 2397 0 0 0
cpu1 3944197 277 857728 1175822236 16462 0 1025 0 0 0
cpu2 3919468 538 924717 1175628294 136617 0 2270 0 0 0
cpu3 3763268 441 855219 1175968114 43631 0 733 0 0 0
cpu4 3551196 147 856029 1176198902 18392 0 851 0 0 0
cpu5 5394823 1806 997806 1174089493 120122 0 2056 0 0 0
cpu6 3425023 656 839042 1176324091 58718 0 3 0 0 0
cpu7 3167959 189 811389 1176654383 19218 0 2 0 0 0
cpu8 4454976 5046 625657 1175714502 10447 0 26 0 0 0
cpu9 5049813 5365 655732 1175082394 10511 0 30 0 0 0
cpu10 4746872 4727 630042 1175408141 10959 0 28 0 0 0
cpu11 5367186 4684 659408 1174759103 9992 0 23 0 0 0
cpu12 4744405 5940 704282 1175177246 149934 0 714 0 0 0
cpu13 4689816 5954 650193 1175439255 13494 0 5 0 0 0
cpu14 4872185 5479 699429 1175126266 16945 0 898 0 0 0
cpu15 5066558 6748 706459 1174981089 12643 0 669 0 0 0

I have the below awk script:

[kcube#myPc ~]$ cat test.awk
{
  if ( match($0,/cpu\s(\s[[:digit:]]+){10}$/, ary) ) {
    print ary[0]
    print ary[1]
  }
}

This always gives me the last numeric value of the first line in ary[1]. What I am looking for is to have:

ary[1] = 70508209
ary[2] = 48325

and so on. I have never used interval expressions and grouping together. I tried searching for answers but couldn't find one. Can someone help me out? I'm using GNU Awk 4.0.2.
$ cat tst.awk
match($0,/^cpu\s(\s[[:digit:]]+){10}$/,ary) {
    print "bad match:", ary[1]
    print "bad match:", ary[2]
}
match($0,/^cpu\s+([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)\s([[:digit:]]+)$/,ary) {
    print "good match:", ary[1]
    print "good match:", ary[2]
}
/^cpu\s/ && split($0,tmp,/[[:digit:]]+/,ary) {
    print "good split:", ary[1]
    print "good split:", ary[2]
}

$ awk -f tst.awk file
bad match: 0
bad match:
good match: 70508209
good match: 48325
good split: 70508209
good split: 48325

An interval expression defines how many repetitions of the previous expression must exist for the regexp to match, that is all. It is not part of populating capture groups; that is entirely up to the round brackets enclosing regexp segments. To do what you want, you need to either define explicit capture groups for each number, or use split() or similar to create the array based on a regexp that describes each entity you want captured. All of the above uses GNU awk, for the 3rd arg to match() and the 4th arg to split().

Note that you can also just do this with GNU awk's FPAT:

$ awk -v FPAT='[0-9]+' '/^cpu /{for (i=1; i<=NF; i++) print i, $i}' file
1 70508209
2 48325
3 12341967
4 18807644987
5 671141
6 0
7 11736
8 0
9 0
10 0
awk: take entries from file and add values in between
I have the following file:

2 some
5 some
8 some
10 thing
15 thing
19 thing

Now I want to end up with entries where, for "some", rows 2, 5 and 8 contain a 1 and every other row contains a 0. It doesn't matter how many rows there are. This means for "some":

0
1
0
0
1
0
0
1
0
0

and for "thing":

0
0
0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
1
0

Is this possible in a quick way with awk? I mean with something like:

awk '{for(i=1;i<=10;i++) entries[$i]=0
      for(f=0;f<=NF;f++) entries[$f]=1}' testfile.txt
Another awk; the output terminates at the last index:

awk -v key='thing' '$2==key{while(++c<$1) print 0; print 1}' file

To add some extra 0's after the last 1, add:

END{while(i++<3) print 0}
Something like this seems to work in order to produce the "some" data:

$ cat file1
2 some
5 some
8 some
10 thing
15 thing
19 thing

$ awk 'max<$1 && $2=="some"{max=$1;b[$1]=1}END{for (i=1;i<=max;i++) print (i in b?1:0)}' file1
0
1
0
0
1
0
0
1

Similarly, this one works for the "thing" data:

$ awk 'max<$1 && $2=="thing"{max=$1;b[$1]=1}END{for (i=1;i<=max;i++) print (i in b?1:0)}' file1

Alternatively, as mentioned by glennjackman in the comments, we could use an external variable to select between some and thing:

$ awk -v word="some" 'max<$1 && $2==word{max=$1;b[$1]=1}END{for (i=1;i<=max;i++) print (i in b?1:0)}' file1
# for thing just apply awk -v word="thing" ...

You can achieve better parameterizing using an awk variable like this:

$ w="some"   # selectable / set by shell, by script, etc.
$ awk -v word="$w" 'max<$1 && $2==word{max=$1;b[$1]=1}END{for (i=1;i<=max;i++) print (i in b?1:0)}' file1
perl:

perl -lanE '
    push @{$idx{$F[1]}}, $F[0] - 1;  # subtract 1 because we are working with
                                     # (zero-based) array indices
    $max = $F[0];                    # I assume the input is sorted by column 1
  }
  END {
    $, = "\n";
    for $word (keys %idx) {
        # create a $max-sized array filled with zeroes
        @a = (0) x $max;
        # then, populate the entries which should be 1
        @a[ @{$idx{$word}} ] = (1) x @{$idx{$word}};
        say $word, @a;
    }
' file | pr -2T -s | nl -v0

0	thing	some
1	0	0
2	0	1
3	0	0
4	0	0
5	0	1
6	0	0
7	0	0
8	0	1
9	0	0
10	1	0
11	0	0
12	0	0
13	0	0
14	0	0
15	1	0
16	0	0
17	0	0
18	0	0
19	1	0
Eliminate lines based on values in multiple columns
I'm trying to remove rows from a big table, conditioned on one column having one value and another column having other values. So far I've been trying this, but I guess I'm not combining the awk calls properly:

awk '$11 !="1"' | awk '$20==2 || $20==3' infile.txt > out.txt

The code is probably too simple but should work anyway, or not? Thanks.

Edit: this is what the table looks like:

5306083 TGATCAATCTCATAAC[A/C]AAAAAAAAA consensus_24 211 1 species 0 0 0 T 0 7 T recommended 0.708 F 0 -100 T recommended
5193751 AGTAGCTTGCGCGGA[C/T]GGGGGGGGG consensus_32 227 1 species 0 0 0 T 1 1 T not_recommended 0.75 F 0 -100 T not_recommended
5193254 TAAAAAAAAAAAAAA[G/T]ATTCATCC consensus_26 192 1 species 0 0 0 T 1 0 T not_recommended 0.726 F 0 -100 T neutral

So if I filter out rows where $11 is 1 and $20 is "neutral" or "not_recommended", I would get this:

5306083 TGATCAATCTCATAAC[A/C]AAAAAAAAA consensus_24 211 1 species 0 0 0 T 0 7 T recommended 0.708 F 0 -100 T recommended
awk '$11!=1 && ($20==2 || $20==3)' infile.txt > out.txt

should do.

UPDATE: based on the input given, you should get two lines in the output for this condition:

$ awk '$11==1 && ($20=="not_recommended" || $20=="neutral")' file
5193751 AGTAGCTTGCGCGGA[C/T]GGGGGGGGG consensus_32 227 1 species 0 0 0 T 1 1 T not_recommended 0.75 F 0 -100 T not_recommended
5193254 TAAAAAAAAAAAAAA[G/T]ATTCATCC consensus_26 192 1 species 0 0 0 T 1 0 T not_recommended 0.726 F 0 -100 T neutral

But I guess what you mean is that you want the negation of this, which is different from your original post:

$ awk '$11!=1 || ($20!="not_recommended" && $20!="neutral")' file
5306083 TGATCAATCTCATAAC[A/C]AAAAAAAAA consensus_24 211 1 species 0 0 0 T 0 7 T recommended 0.708 F 0 -100 T recommended