awk include column value in pattern

I am looking for a way to pattern match against another column in awk. For example, I wish to find rows for which the value in column 4 is nested in column 5.
Performing awk '$4 ~ /$5/' doesn't work, as the dollar sign is interpreted as part of the regular expression. How do I get the column 5 value into this pattern match?
Many thanks!

If you're looking for a literal match, not a regex, you can use
awk 'index($5,$4)' file
This will print the lines where $4 is a substring of $5.

> awk '$2 ~ $1' <<< "field another_field"
field another_field
This will print lines where $2 contains a match for the value of $1 (used as a regex).
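Applied to the original question (the value in column 4 nested in column 5), the same dynamic-regex idea would be, as a minimal sketch:
awk '$5 ~ $4' file
Here $4 is used as a dynamic regex; if it may contain regex metacharacters, prefer the literal index($5,$4) test above.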

Related

Extract all negative numbers from column2 into another file

I have a very long data file "file.dat" with two columns.
Here I am showing a very small portion. I want to extract all the negative numbers from column2 into another file, say file2.dat, and similarly the positive numbers from the same column2 into another file, file3.dat.
4.0499 -7.1787
4.0716 -7.1778
4.0932 -7.1778
4.1148 -7.1785
4.1365 -7.1799
4.1581 -7.1819
4.1798 -7.1843
4.2014 -7.1868
4.2231 -7.1890
4.2447 -7.1902
4.2663 -7.1900
4.2880 -7.1886
<------- Note: this kind of break also occurs in many other places
0.0000 2.1372
0.0707 2.1552
0.1414 2.2074
0.2121 2.2864
0.2828 2.3791
0.3535 2.4646
0.4242 2.5189
0.4949 2.5207
0.5655 2.5098
Expected Results for Negative numbers file2.dat
-7.1787
-7.1778
-7.1778
-7.1785
-7.1799
-7.1819
-7.1843
-7.1868
-7.1890
-7.1902
-7.1900
-7.1886
Expected Results for Positive numbers file3.dat
2.1372
2.1552
2.2074
2.2864
2.3791
2.4646
2.5189
2.5207
2.5098
Nearest Solution I found
This solution did not work for me because of my lack of knowledge.
http://www.unixcl.com/2009/11/awk-extract-negative-numbers-from-file.html
It is quite simple to do with awk. You simply check the value in the 2nd column and write it out based on its value, e.g.
awk '$2<0 {print $2 > "negative.dat"} $2>=0 {print $2 > "positive.dat"}' file
Where the two rules used by awk above are:
$2<0 {print $2 > "negative.dat"}: if the value in the 2nd column is less than 0, write it to negative.dat;
$2>=0 {print $2 > "positive.dat"}: if the value in the 2nd column is greater than or equal to 0, write it to positive.dat.
Example Use/Output
With your example data in file (without your comment line), running the above results in:
$ cat negative.dat
-7.1787
-7.1778
-7.1778
-7.1785
-7.1799
-7.1819
-7.1843
-7.1868
-7.1890
-7.1902
-7.1900
-7.1886
The positive values in:
$ cat positive.dat
2.1372
2.1552
2.2074
2.2864
2.3791
2.4646
2.5189
2.5207
2.5098
David's answer is pretty good; here is a shorter awk one-liner using a ternary condition:
awk 'NF>1 {print $2 > ($2 < 0 ? "neg.dat" : "pos.dat")}' file
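As a quick sanity check, a hypothetical run on the sample saved as file (comment line removed; the blank break lines have no fields, so NF>1 skips them):
$ awk 'NF>1 {print $2 > ($2 < 0 ? "neg.dat" : "pos.dat")}' file
$ head -2 neg.dat
-7.1787
-7.1778
$ head -2 pos.dat
2.1372
2.1552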

Using an awk pattern to filter file data

I have the following file (named /tmp/test99) which contains the rows:
"0","15","wall15"
123132,09808098,"0","15"
I am trying to filter the rows that contain "0" in the 3rd field and "15" in the 4th field (like the second row),
I tried running:
cat /tmp/test99 | awk '/"0","15"/{print>"/tmp/0_15_file.out"} '
but instead of getting only the second row, I also get the first row, which starts with "0","15".
Could you please help with the pattern ?
Thanks:)
You may check if Fields 3 and 4 are equal to some hardcoded value using
awk -F, '$3=="\"0\"" && $4=="\"15\""'
Set the field separator to a comma; then, if Field 3 is "0" and Field 4 is "15", print the line, else discard it.
See the online demo:
s='"0","15","wall15"
123132,09808098,"0","15"'
awk -F, '$3=="\"0\"" && $4=="\"15\""' <<< "$s"
# => 123132,09808098,"0","15"
Could you please try the following. (A comment on your attempt: you need NOT use cat with awk; awk can read Input_file by itself.)
awk -F, '$3~/"0"/ && $4~/"15"/' Input_file

awk / gawk printf when variable format string, changing zero to dash

I have a table of numbers I am printing in awk using printf.
The printf accomplishes some truncation for the numbers.
(cat <<E\OF
Name,Where,Grade
Bob,Sydney,75.12
Sue,Sydney,65.2475
George,Sydney,84.6
Jack,Sydney,35
Amy,Sydney,
EOF
)|gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3=0}
printf("%s,%s,%d \n",$1,$2,$3)}'
This produces:
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,0
Amy,Sydney,0
What I want is to display scores which are less than 50, or missing, as a dash ("-").
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
This requires the 3rd format specifier in printf to change from %d to %s.
So in some rows, the third column should be a value, and in some rows, the third column should be a string. How can I tell this to GAWK? Or should I just pipe through another awk to re-format?
$ gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3="-"} else {$3=sprintf("%d", $3)}
printf("%s,%s,%s \n",$1,$2,$3)}' ip.txt
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
use if-else to assign the value to $3 as needed
sprintf allows you to assign the result of the formatting to a variable
for this case, you could use the int function as well (see the sketch after this list)
now printf has %s for $3 as well
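A sketch of the int() variant mentioned above; it should produce the same output as the sprintf version:
$ gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3="-"} else {$3=int($3)}      # int() truncates, so sprintf("%d", ...) is not needed
printf("%s,%s,%s \n",$1,$2,$3)}' ip.txt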
Assuming you simply missed the commas in the header and the space after the third column is not needed, you could do this with a simple one-liner:
$ awk -F, -v OFS=, 'NR>1{$3 = $3 < 50 ? "-" : int($3)} 1' ip.txt
Name,Where,Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
The ?: ternary operator is an alternative to if-else
1 is an awk idiom to print the contents of $0

I want to check whether the number in the 1st column is equal to the 2nd column

I want to check whether the number in the 1st column is equal to the 2nd column; the 1st column should start with ABC and end with DEF, and the number between these should match the 2nd column.
Can anyone help me here, please?
My input:
ABC12345DEF |12345 |23132331331|
ABC95678DEF |45678 |23132331331|
ABC87887DEF |86187 |23132331331|
ABC89043DEF |89043 |23132331331|
Output Should be:
ABC12345DEF |12345 |23132331331|
ABC89043DEF |89043 |23132331331|
I'm trying to use the following, but it's not working.
awk -F '|' '($1 !~ /ABC+[$2]+DEF/)' WHTFile.txt > QC2Valid.txt
This should work for your requirement:
awk -F'|' '{s=$2;sub(/\s/,"",s)}$1 ~ s' input
ABC12345DEF |12345 |23132331331|
ABC89043DEF |89043 |23132331331|
The problems in your code:
you cannot concatenate strings in awk with +
you cannot put a field reference inside /.../; that is a static regex literal (build the pattern by string concatenation instead, as in the sketch after this list)
you should use ~ instead of !~, given the requirement you described
your column two has trailing spaces; you should remove them before matching
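A minimal sketch of the dynamic-regex version of these points (it assumes the trailing space in the 2nd field must be stripped before matching):
# build the pattern by string concatenation at runtime instead of a /.../ literal
awk -F'|' '{ s = $2; gsub(/ /, "", s) } $1 ~ ("ABC" s "DEF")' WHTFile.txt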
In the column |12345 |, is the trailing space data? I assume not, based on your proposed output. In that case, the separator isn't just the pipe character.
To see if $2 is embedded within constants in $1, this will do the trick:
$ awk -F '[ |]+' '"ABC" $2 "DEF" == $1 { print }'
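A hypothetical run against the sample, assuming it is saved as WHTFile.txt:
$ awk -F '[ |]+' '"ABC" $2 "DEF" == $1 { print }' WHTFile.txt
ABC12345DEF |12345 |23132331331|
ABC89043DEF |89043 |23132331331|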

Awk Field number of matched pattern

I was wondering if there's a built-in command in awk to get the field number of the phrase that you just matched.
Banana is yellow.
awk '/yellow/{ for (i=1;i<=NF;i++) if($i ~ /yellow/) print $i}'
Is there a way to avoid writing the loop?
Your command doesn't work when I test it. Here's my version:
echo "banana is yellow" | awk '{for (i=1;i<=NF;i++) if($i ~/yellow/) print i}'
The output is :
3
As far as I know, there's no such built-in feature. To improve your command: the pattern match /yellow/ at the beginning is not necessary, and $i will print the matching field rather than the field number that you need.
Alternatively, you can use an array to store each field and its corresponding index number, and then print the field number with arr["yellow"].
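A small sketch of that array idea (arr is just an illustrative name):
echo "banana is yellow" | awk '{for (i=1; i<=NF; i++) arr[$i]=i; print arr["yellow"]}'
3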
If the input is a one-line string, you can set the record separator to the field separator. That way you can use NR to print the position:
awk 'BEGIN{RS=FS}/yellow/{print NR}' <<< 'banana is yellow'
3