awk to select CSV file columns greater than a certain number


I have a CSV file named "awk_column_select_test.csv":
a,b,c
0.2,0.4,0.5
0.3,0.6,0.7
0.4,0.8,0.9
I am trying to write an awk program to select the rows where column 1, column 2, or column 3 is greater than 0.5.
My awk program, named "awk_select_column_test.awk", looks like this:
#!/usr/bin/awk -f
BEGIN {FS=","; cutoff="0.5"}
{$1 > cutoff || $2 > cutoff || $3 > cutoff}
END {print}
Then I ran it on the command line with:
awk -f awk_select_column_test.awk awk_column_select_test.csv
I got the following output with only 1 row:
0.4,0.8,0.9
However, I was hoping to get 2 rows like this:
0.3,0.6,0.7
0.4,0.8,0.9

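You have an extra pair of curly braces {} around the condition, which turns it into an action that evaluates the test and throws the result away, and the END block prints only the last line read (that is why you got 0.4,0.8,0.9). The logic itself is all right. Use the condition as a pattern instead, assign cutoff the number 0.5 rather than the string "0.5" so the comparisons are numeric, and skip the header line with NR>1 (otherwise "a" compares greater than 0.5 as a string and the header is printed):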
#!/usr/bin/awk -f
BEGIN {FS=","; cutoff=0.5}
NR>1 && ($1 > cutoff || $2 > cutoff || $3 > cutoff){print}
Since print is the default action, this can also be written as:
#!/usr/bin/awk -f
BEGIN {FS=","; cutoff=0.5}
NR>1 && ($1 > cutoff || $2 > cutoff || $3 > cutoff)
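If there are more columns than you want to name one by one, a loop over NF applies the same test to every field. The sketch below wraps it in a shell function; the name select_rows_above and its argument order are made up for illustration, and it assumes a single header line followed by plain numeric fields.

```shell
# Hypothetical helper: print the data rows (header skipped) in which ANY
# field is greater than the cutoff.
select_rows_above() {
  # $1 = numeric cutoff, $2 = CSV file (use - for stdin)
  awk -F, -v cutoff="$1" '
    NR > 1 {
      for (i = 1; i <= NF; i++)
        if ($i + 0 > cutoff + 0) { print; next }   # +0 forces a numeric comparison
    }' "$2"
}
```

Usage: select_rows_above 0.5 awk_column_select_test.csv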

Related

AWK: Compare two columns conditionally in one file

I have a pipe-delimited (|) file where $1 holds IDs and $2 and $3 hold values. The file has ~5000 rows, with each ID in $1 repeated multiple times. The file looks like this:
a|1|2
a|2|0
a|3|3
a|4|0
b|5|3
b|2|4
I am trying to print the lines where $2 on the current line is <= the maximum $3 for that ID, so the output would be:
a|1|2
a|2|0
a|3|3
b|2|4
Any lead on this would be highly appreciated! Thank you.

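It sounds like you just want to print, for each $1, the lines where $2 is less than or equal to the maximum $3 for that $1. Reading the file twice does this: the first pass (NR==FNR) records the maximum $3 per ID, and the second pass prints the matching lines: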
$ cat tst.awk
BEGIN { FS="[|]" }
NR==FNR {
max[$1] = ( ($1 in max) && (max[$1] > $3) ? max[$1] : $3 )
next
}
$2 <= max[$1]
$ awk -f tst.awk file file
a|1|2
a|2|0
a|3|3
b|2|4
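Passing the file twice works for regular files but not for a pipe. As an alternative (not the answer's method), a single pass can buffer every line in memory and decide in the END block. The function name print_le_group_max is hypothetical, and the sketch assumes the whole input fits in memory.

```shell
# Hypothetical helper: single-pass version for input that can only be read
# once (e.g. a pipe). Every line is kept in memory until END.
print_le_group_max() {
  awk 'BEGIN { FS = "[|]" }
    {
      line[NR] = $0; id[NR] = $1; v2[NR] = $2 + 0
      # track the running maximum of $3 for this ID
      if (!($1 in max) || $3 + 0 > max[$1]) max[$1] = $3 + 0
    }
    END {
      for (i = 1; i <= NR; i++)
        if (v2[i] <= max[id[i]]) print line[i]
    }'
}
```

Usage: print_le_group_max < file, or some_command | print_le_group_max.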

Awk compare 2 files and match 2 columns, get % difference between column 3s

I have two files in the format below. I want to compare lines where columns 1 and 2 match and compute the difference between the two numbers in column 3.
If file2 column 3 is greater than file1 column 3, I would like a + at the end of the row.
If file2 column 3 is less than file1 column 3, I would like a - at the end of the row.
If either file's column 3 is 0, I would like a * at the end of the row.
I only want to print lines where the difference between the two values is > 15%.
file1
abc,1,472
abc,2,536
abc,3,652
abc,4,512
abc,5,474
abc,6,266
abc,7,520
def,1,954
def,9,538
def,10,136
def,11,341
def,12,183
def,13,1209
def,14,365
def,15,536
def,16,979
def,17,0
xyz,1,547
xyz,19,0
xyz,20,0
xyz,21,0
xyz,22,0
xyz,23,0
xyz,24,0
file2
abc,1,456
abc,2,533
abc,3,643
abc,4,444
abc,5,124
abc,6,255
abc,7,520
def,1,954
def,9,538
def,10,435
def,11,341
def,12,155
def,13,1209
def,14,365
def,15,536
def,16,979
def,17,0
xyz,1,547
xyz,19,124
xyz,20,0
xyz,21,0
xyz,22,0
xyz,23,0
xyz,24,0
Expected output:
abc,5,474,124,74%,- // (474-124)/474 = 74%
def,10,136,435,69%,+ // (435-136)/435 = 69%
xyz,19,0,124,100%,* // either file has 0, print 100% and *
I have tried multiple iterations of this but cannot seem to get the formatting to work.
awk -F, 'FNR==NR{a[$1,$2]; next ;b[$1,$2,$3]; next} $1,$2 in a {if ($3>b[$3]) {Q=((b[$3]/$3) *100)) {print Q,$0 }} else if (b[$3]>$3) {Q=(($3/b[$3]) *100)){print Q,$0 }}' file1 file2
I get this error:
^ unexpected newline or end of string
I also tried variations on this line, but I cannot figure out the division-by-zero error:
awk -F, 'FNR==NR{a[$1,$2]; next ;b[$1,$2,$3]; next} $1,$2 in a {if ((Q=(b[$3]/$3) > 15) || (Q=($3/b[$3])) > 15 ){print Q,$0}}' file1 file2
awk: cmd. line:1: (FILENAME=file2 FNR=1) fatal: division by zero attempted

You need to handle a zero denominator (the file1 value) separately: the relative change cannot be computed in that case, so flag it instead of dividing. Note that the percentage is relative to the file1 value, so def,10 comes out as 219% ((435-136)/136).
$ awk -F, -v OFS=, '{k=$1 FS $2}            # key: columns 1 and 2
  FNR==NR {a[k]=$3; next}                 # file1: remember column 3
  k in a {
    if (a[k] && $3) q=$3/a[k]-1           # relative change vs. file1
    else if (a[k] || $3) zero=1           # exactly one side is 0
    else q=0                              # both 0: no change
    plus=q>0.15
    minus=q<-0.15
    q=q<0?-q:q
    if (zero) plus=minus=0
    if (plus || minus || zero)
      print k,a[k],$3,(zero?100:int(100*q))"%",(plus?"+":minus?"-":"*")
    q=zero=0
  }' file1 file2
abc,5,474,124,73%,-
def,10,136,435,219%,+
def,12,183,155,15%,-
xyz,19,0,124,100%,*
You can put this in a diff.awk file and run it with awk -f diff.awk file1 file2;
the file contents would be:
BEGIN{FS=OFS=","}
{k=$1 FS $2}
... the code in between
q=zero=0}
Note that the program text goes into the file without the surrounding single quotes. You could also make the file executable with a shebang line such as #!/usr/bin/awk -f, but running it with awk -f is simpler.
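For reuse, the same idea can be wrapped in a shell function with the threshold passed in. This is a restructured sketch, not the answer verbatim: pct_diff is a hypothetical name, the threshold is given as a fraction (0.15 for 15%), and a row is flagged with * whenever exactly one of the two values is 0, as the question asks.

```shell
# Hypothetical helper: compare column 3 of two CSV files on the key
# (column 1, column 2) and print key,old,new,pct%,flag when the change
# exceeds the threshold. flag: + increase, - decrease, * one value is 0.
pct_diff() {
  # $1 = threshold as a fraction (0.15 = 15%), $2 = file1, $3 = file2
  awk -F, -v OFS=, -v t="$1" '
    { k = $1 FS $2 }                         # key from columns 1 and 2
    FNR == NR { a[k] = $3; next }            # first file: remember column 3
    k in a {
      zero = 0; q = 0
      if (a[k] != 0 && $3 != 0) q = $3 / a[k] - 1   # relative to file1
      else if (a[k] != 0 || $3 != 0) zero = 1       # exactly one side is 0
      if (zero)        print k, a[k], $3, "100%", "*"
      else if (q > t)  print k, a[k], $3, int(100 * q) "%", "+"
      else if (q < -t) print k, a[k], $3, int(-100 * q) "%", "-"
    }' "$2" "$3"
}
```

Usage: pct_diff 0.15 file1 file2. Rows where both values are 0 (such as xyz,20) are never printed, since there is no change.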

awk to extract the data between Dates

I would like to extract the line items whose date in the second field ($2) falls between 5 Apr and 10 Apr. There are many gzip files in that directory.
Inputs.gz
Des1,DATE,Des1,Des2,Des3
ab,01-APR-15,10,0,4
ab,04-APR-15,25,0,12
ab,05-APR-15,40,0,6
ab,07-APR-15,55,0,6
ab,10-APR-15,70,0,1
ab,11-APR-15,85,0,1
I have tried the command below, but it is incomplete:
zcat Inputs*.gz | awk 'BEGIN{FS=OFS=","} { if ( (substr($2,1,2) >=5) && (substr($2,1,2) <=10) ) print $0 }' > Output.txt
Expected output:
ab,05-APR-15,40,0,6
ab,07-APR-15,55,0,6
ab,10-APR-15,70,0,1
Please suggest ...

Try this:
awk -F",|-" '$2 >= 5 && $2 <= 10'
It adds the date delimiter to the FS using the -F flag. To ensure that it's APR of 2015, you could separately add tests like:
awk -F",|-" '$2 >= 5 && $2 <= 10 && $3=="APR" && $4==15'
While this makes the date easy to parse up front, if you want to print it out again, you'll need to reconstruct it with something like _date = $2 "-" $3 "-" $4. And if you need to manipulate the data in general, you'd want to add back in the BEGIN {OFS=","} part.
The field numbering I used assumes there are no "-" delimiters in the first field.
I get the following output:
ab,05-APR-15,40,0,6
ab,07-APR-15,55,0,6
ab,10-APR-15,70,0,1
If other fields may also contain dates (or hyphens) and you only care about the one in the second comma-separated field, you could use split like this:
awk -F"," '{ split($2, darr, "-") } darr[1] >= 5 && darr[1] <= 10 && darr[2]=="APR" && darr[3]==15'
which is like saying:
for every line, parse the 2nd field into the darr array using the - delimiter
for every line, if the logic darr[1] >= 5 && darr[1] <= 10 && darr[2]=="APR" && darr[3]==15 is true print the whole line.
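Comparing the day alone only works within a single month. A sketch for arbitrary ranges, assuming DD-MON-YY dates with English month abbreviations and two-digit years, converts each date into a sortable YYMMDD number (05-APR-15 becomes 150405) and compares that instead. filter_by_date is a hypothetical name.

```shell
# Hypothetical helper: keep rows whose field 2 (DD-MON-YY) lies within
# [from, to], both given as YYMMDD numbers, e.g. 150405 150410.
filter_by_date() {
  awk -F, -v from="$1" -v to="$2" '
    BEGIN {
      n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", m, " ")
      for (i = 1; i <= n; i++) mon[m[i]] = i      # month name -> 1..12
    }
    split($2, d, "-") == 3 && (d[2] in mon) {      # also skips the header line
      key = d[3] * 10000 + mon[d[2]] * 100 + d[1]  # 05-APR-15 -> 150405
      if (key >= from + 0 && key <= to + 0) print
    }'
}
```

Usage in the question's setting: zcat Inputs*.gz | filter_by_date 150405 150410 > Output.txt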

Another simple solution uses a regular expression:
awk -F',' '$2 ~ /([0][5-9]|10)-APR-15/{ print $0 }' txt
-F',' sets the field separator.
$2 is the second field.
~ matches it against a regular expression.
/([0][5-9]|10)-APR-15/ is the regular expression: it matches 05 to 09 or 10, followed by -APR-15.
Setting the field separator inside the program instead:
awk 'BEGIN{ FS="," } $2 ~ /([0][5-9]|10)-APR-15/{ print $0 }' txt
Listing each day number explicitly:
awk 'BEGIN{ FS="," } $2 ~ /(05|06|07|08|09|10)-APR-15/{ print $0 }' txt

awk printing against the condition

I have a simple question but I could not figure it out.
I want to print all the lines of a file that DO NOT match the condition in my awk if statement. I can only get it to print the lines that do match; how do I get the others?
This is my code:
awk '{if ($18==0 && $19==0 && $20==0 && $21==0) print $0}' file
I also tried this:
awk '{if !($18==0 && $19==0 && $20==0 && $21==0) print $0}' file
But the second one doesn't work. Any help is appreciated. Thank you.

Here you can do:
awk '$18+$19+$20+$21!=0' file
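print $0 is not needed, since it is the default action. Note that this assumes the fields are non-negative: values such as 1 and -1 would sum to 0, and that line would be skipped.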

The negation (!) needs to be inside the parentheses:
awk '{if (!($18==0 && $19==0 && $20==0 && $21==0)) print $0}' file
That is, the whole condition gets its own set of parentheses and the ! applies to that; the outer set belongs to the if.
(FYI, if you had said how it "didn't work" (i.e., a syntax error on !), that would have been more helpful. Please remember to include error messages or symptoms of something not working in future questions!)

You could also reverse your conditional statement:
You want the opposite of:
awk '{if ($18==0 && $19==0 && $20==0 && $21==0) print $0}' file
which can be written either as:
awk '{if ($18!=0 || $19!=0 || $20!=0 || $21!=0) print $0}' file
or
awk '{if (!($18==0 && $19==0 && $20==0 && $21==0)) print $0}' file
Example:
cat file
A 0 0 0
B 1 1 1
C 1 0 1
awk '$2+$3+$4!=0' file
B 1 1 1
C 1 0 1
awk '{if ($2!=0 || $3!=0 || $4!=0) print $0}' file
B 1 1 1
C 1 0 1
awk '{if (!($2==0 && $3==0 && $4==0)) print $0}' file
B 1 1 1
C 1 0 1
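By contrast, negating an OR is not equivalent; it prints only the lines in which none of the fields is 0:
awk '{if (!($2==0 || $3==0 || $4==0)) print $0}' file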
B 1 1 1
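With many columns, writing out every comparison gets long. A sketch that loops over a range of fields and prints a line as soon as one of them is non-zero (the negation of "all are 0"); unlike the sum test, it is not fooled by values such as 1 and -1. print_not_all_zero and its arguments are hypothetical, and the input is assumed to be whitespace-separated.

```shell
# Hypothetical helper: print the lines in which at least one of the fields
# first..last is non-zero, i.e. the negation of "all of them are 0".
print_not_all_zero() {
  # $1 = first field, $2 = last field, $3 = input file (use - for stdin)
  awk -v first="$1" -v last="$2" '
    {
      for (i = first; i <= last; i++)
        if ($i != 0) { print; next }   # one non-zero field is enough
    }' "$3"
}
```

For the question's columns this would be print_not_all_zero 18 21 file.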

New to Awk. Struggling with negative number formatting

Goal: to output only data that is above 1 and below -1
or
output data that is between 1 and -1
I have the basics of awk and can print column 2 (where my data is)
Notice I also specified a range of 0-1:
awk '/[0-1]/ {print $2}' test.dat
I also need the line number, so I added NR:
awk '/[0-1]/ {print $2 NR}' test.dat
To be clear, the point is to identify which lines of the data are outside the acceptable range, so we can ignore them in our analysis (i.e., anything greater than 1 or less than -1 is too large a change).
Any help you can provide would be great. I have pasted some sample data below.
http://pastebin.com/7tpBAqua

Not sure if you want to evaluate the data in every column, or if there's a specific column you need to test. Testing a single column is simplest; testing multiple or all columns is a fairly simple repetitive extension of the pattern. Since you mention column 2 specifically, let's assume you want to print column 2 only when it is between -1 and 1:
awk -F, '($2 >= -1) && ($2 <= 1) { print $2 }'
To test for the field being greater than 1 or less than -1 instead:
awk -F, '($2 < -1) || ($2 > 1) { print $2 }'
Printing a different field, or the entire line instead ($0) should be fairly obvious. To examine each field, either simply repeat the entire ($2 >= -1) && ($2 <= 1) { print $2 } clause for each field you're interested in (which quickly gets verbose), or something like this (not tested):
awk -F, '{ for (i = 1; i <= NF; ++i) if (($i >= -1) && ($i <= 1)) print $i; }'
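Since the goal is to find which lines are out of range, here is a sketch that reports the line number and the line itself whenever any field from column 2 onward falls outside [lo, hi]. It assumes comma-separated numeric data with a label or time value in column 1 (the pastebin data isn't reproduced here, so that layout is a guess); report_out_of_range is a hypothetical name.

```shell
# Hypothetical helper: print "NR: line" for every line in which any field
# from column 2 onward lies outside [lo, hi].
report_out_of_range() {
  # $1 = lower bound, $2 = upper bound, $3 = input file (use - for stdin)
  awk -F, -v lo="$1" -v hi="$2" '
    {
      for (i = 2; i <= NF; i++)
        if ($i + 0 < lo + 0 || $i + 0 > hi + 0) {
          print NR ": " $0   # line number, then the offending line
          next
        }
    }' "$3"
}
```

Usage: report_out_of_range -1 1 test.dat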

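awk -F'[ ,]' 'NR>2{for (i=2;i<=NF;i++) if ($i<-1 || $i>1) { print NR; next } }' file
This treats both spaces and commas as separators, skips the first two lines (presumably headers), checks fields 2 through NF, and prints the line number as soon as one value is below -1 or above 1, then moves on to the next line.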
