In the awk below I am trying to match lines whose tab-delimited field $2 contains the string FP or RFP. The lines that do not have those keywords in $2 should be printed to a file result, and at the same time the lines that do have them should be printed to another file removed. The awk has a syntax error when I try to print to two files; if I only print to one, the awk runs. Thank you :).
input
12 aaa
123 FP bbb
11 ccc
10 RFP ddd
result
12 aaa
11 ccc
removed
123 FP bbb
10 RFP ddd
awk
awk -F'\t' 'BEGIN{d["FP"];d["RFP"]}!($2 in d) {print > "removed"}; else {print > "result"}' file
awk: cmd. line:1: BEGIN{d["FP"];d["RFP"]}!($2 in d) {print > "removed"}; else {print > "result"}
awk: cmd. line:1: ^ syntax error
else goes with if. Your script didn't have an if, just an else, hence the syntax error. All you need is:
awk -F'\t' '{print > ($2 ~ /^R?FP$/ ? "removed" : "result")}' file
or if you prefer the array approach you are trying to use:
awk -F'\t' '
BEGIN{ split("FP RFP",t,/ /); for (i in t) d[t[i]] }
{ print > ($2 in d ? "removed" : "result") }
' file
Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn awk syntax and semantics.
Btw when writing if/else code like you show in your question:
if ( !($2 in d) ) removed; else result
THINK about the fact you're using negative (!) logic which makes your code harder to understand right away AND opens you up to potential double negatives. Always try to express every condition in a positive way, in this case that'd be:
if ($2 in d) result; else removed
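For reference, the ternary-redirection one-liner from the answer can be exercised end to end on the question's sample; the file names file, result, and removed all come from the question:

```shell
# Recreate the question's tab-delimited sample input.
printf '12\taaa\n123\tFP\tbbb\n11\tccc\n10\tRFP\tddd\n' > file

# Route each line to "removed" when $2 is FP or RFP, otherwise to "result".
awk -F'\t' '{print > ($2 ~ /^R?FP$/ ? "removed" : "result")}' file

cat result   # the lines without the keywords
cat removed  # the lines with the keywords
```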
Related
I am executing the following awk command:
awk -F'\t' '{ split($4,array,"[- ]"); print > array[1]""array[2]""array[3]}' myFile.txt
but seeing this error:
awk: syntax error at source line 1
context is
{ split($4,array,"[- ]"); print > >>> array[1]"" <<<
awk: illegal statement at source line 1
awk: illegal statement at source line 1
What can be the reason for that? How to fix the script?
Those pairs of double quotes are doing nothing, you could just remove them:
awk -F'\t' '{ split($4,array,"[- ]"); print > array[1] array[2] array[3]}' myFile.txt
An unparenthesized expression on the right side of input or output redirection is undefined behavior per POSIX which is why some awks (e.g. gawk) will interpret your code as you intended:
awk -F'\t' '{ split($4,array,"[- ]"); print > (array[1] array[2] array[3])}' myFile.txt
while others can interpret it as:
awk -F'\t' '{ split($4,array,"[- ]"); (print > array[1]) (array[2] array[3])}' myFile.txt
which is a syntax error in any awk, or anything else.
You can fix your syntax error by adding the parens:
awk -F'\t' '{ split($4,array,"[- ]"); print > (array[1] array[2] array[3])}' myFile.txt
but that could have other problems too and the right way to do what you're trying to do depends on whatever it is you're trying to do, which we can't tell just from your code. If you post a new question with sample input and expected output then we can help you write your code the right way.
You need
print > (array[1]""array[2]""array[3])
in many implementations of awk. Note the parenthesis around the expression that generates the filename.
You might want to close the file afterwards too, in case there are a lot of possible filenames to be created, and use appending instead:
awk -F'\t' '{ split($4,array,"[- ]")
file = array[1] "" array[2] "" array[3]
print >> file
close(file)
}' myFile.txt
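As a quick check of the approach above (the input line here is illustrative), assigning the filename to a variable first also sidesteps the parenthesization issue entirely, since a bare variable name after a redirection is unambiguous in every awk:

```shell
# Hypothetical 4-column input; $4 holds the pieces the filename is built from.
printf 'x\ty\tz\tab-cd ef\n' > myFile.txt
rm -f abcdef   # start clean, since we append below

awk -F'\t' '{ split($4, array, "[- ]")
              file = array[1] array[2] array[3]   # build the name first
              print >> file                       # bare variable: no parens needed
              close(file)                         # release the descriptor per line
            }' myFile.txt

cat abcdef   # the whole record, in a file named from the split pieces
```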
here's an awk-based solution, verified on 4 awk variants, that requires no array splitting while also closing file connections along the way:
pristine $0 has been pre-saved, so performing ++NF against a blank OFS does not result in data truncation
(ps: as a matter of fact, saving $0 is only necessary for gawk and nawk)
SETUP and INPUT
removed 'wyx-8979479BCCF-;#%&*[)(]~'
zsh: no matches found: wyx*
1 --------INPUT------------
2 bca 0106 qsr wyx-8979479BCCF-=;#%&*[)(]~ testtail
CODE
{m,n,g}awk '
BEGIN {
OFS = _
FS = "^[^\t]*\t[^\t]*\t[^\t]*\t|[ -]|\t[^\t]*$"
} {
___ = $(_*(__==_?_:close(__)))
print(___) > (__ = $!++NF) }'
# mawk-1/2 specific streamlining
mawk 'BEGIN { FS="^[^\t]*\t[^\t]*\t[^\t]*\t|[ -]|\t[^\t]*$"(OFS=_)
} { print $(_*(__==_?_:close(__))) > (__ = $!++NF) }'
OUTPUT
-rw-r--r-- 1 501 20 50 Jun 19 12:13 wyx8979479BCCF=;#%&*[)(]~
1 bca 0106 qsr wyx-8979479BCCF-=;#%&*[)(]~ testtail
I have a file with 100 columns of data. I want to print the first column and i-th column in 99 separate files, I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
The following single awk may help you here too:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i > "file"i;close("file"i)}}' Input_file
An all awk solution. First test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for iterates from 2 to the last field. If there are more fields than you desire to process, change the NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to do what you're trying to accomplish:
for i in {2..99}; do
awk -v x="$i" '{print $1" " $x }' input.txt > data${i}
done
Note
the -v switch of awk to pass variables
$x is the nth column, where n is the value passed in your variable x
Note2: this is not the fastest solution (a single awk call is faster), but I just tried to correct your logic. Ideally, take the time to understand awk; it's never wasted time
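The -v fix can be checked on a small sample; input.txt and the data file prefix mirror the question:

```shell
# Two rows, three columns.
printf '11 12 13\n21 22 23\n' > input.txt

# Pass the shell loop variable into awk instead of interpolating it into the script.
for i in 2 3; do
  awk -v x="$i" '{print $1, $x}' input.txt > "data$i"
done

cat data2   # columns 1 and 2
cat data3   # columns 1 and 3
```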
I have a tab delimited file that looks something like this:
foo 0 4
boo 3 2
blah 4 0
flah 1 1
I am trying to calculate the log2 ratio between the two columns for each row. My problem is with division by zero.
What I have tried is this:
cat file.txt | awk -v OFS='\t' '{print $1, log($3/$2)/log(2)}'
when there is a zero as the denominator, the awk will crash. What I would want to do is some sort of conditional statement that would print an "inf" as the result when the denominator is equal to 0.
I am really not sure how to go about this?
Any help would be appreciated
Thanks
You can implement that as follows (with a few additional tweaks):
awk 'BEGIN{OFS="\t"} {if ($2==0) {print $1, "inf"} else {print $1, log($3/$2)/log(2)}}' file.txt
Explanation:
if ($2==0) {print $1, "inf"} else {...} - First check to see if the 2nd field ($2) is zero. If so, print $1 and inf and move on to the next line; otherwise proceed as usual.
BEGIN{OFS="\t"} - Set OFS inside the awk script; mostly a preference thing.
... file.txt - awk can read from files when you specify one as an argument; this saves the use of a cat process. (See UUOC, the "useless use of cat".)
awk -F'\t' '{print $1,($2 ? log($3/$2)/log(2) : "inf")}' file.txt
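Run on a two-row sample (values chosen so the log2 result is exact), the ternary version prints inf for the zero denominator and the log2 of the ratio otherwise; adding OFS='\t' keeps the output tab-delimited:

```shell
# foo has a zero denominator; boo gives 4/2 = 2, whose log2 is exactly 1.
printf 'foo\t0\t4\nboo\t2\t4\n' > file.txt

awk -F'\t' -v OFS='\t' '{print $1, ($2 ? log($3/$2)/log(2) : "inf")}' file.txt
# foo	inf
# boo	1
```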
My file looks like:
L 0 256 * * * * *
H 0 307 100.0 + 0 0
S 30 351 * * * * *
D 8 27 * * * * 99.3
C 11 1 * * * * *
for my script I would like to start by having awk print $0 for certain lines, selected using $1
Such as
awk '{if ($1!="C") {print $0} else if ($1!="D") {print $0}}'
But, there has to be a way to combine "C" and "D" into one IF statement... right?
For example, if I want to search for $1 == L, H, or S (i.e. NOT C or D), how would I write this?
Your present condition is not correct, as $1!="C" and $1!="D" can't both be false at the same time. Hence, it will always print the whole file.
This will do as you described:
awk '{if ($1!="C" && $1!="D") {print $0}}' file
Using awk, you can provide rules for specific patterns with the syntax
awk 'pattern {action}' file
see the awk manual page for the definition of a pattern. In your case, you could use a regular expression as a pattern with the syntax
awk '/regular expression/ {action}' file
and a basic regular expression which would suit your needs could be
awk '/^[^CD]/ {print $0}' file
which you can actually shorten into
awk '/^[^CD]/' file
since {print $0} is the default action, as suggested in the comments.
awk '$1 ~ /[CD]/' file
awk '$1 ~ /[LHS]/' file
awk '$1 ~ /[^LHS]/' file
awk '$1 !~ /[LHS]/' file
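On the question's sample the bracket-expression variants behave as expected; anchoring the class (^...$) is a safe habit in case a first field ever contains more than one character:

```shell
# Recreate the question's sample (first three columns shown).
printf 'L 0 256\nH 0 307\nS 30 351\nD 8 27\nC 11 1\n' > file

awk '$1 ~ /^[LHS]$/' file   # keep the L, H, and S rows
awk '$1 !~ /^[CD]$/' file   # same result, stated negatively
```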
In somewhat the same vein, what would be a simple awk rule to keep only the lines that don't have something in one field and, simultaneously, something else in another field? Possibly with a pattern.
e.g.:
A=1 B=2 C=3
B=2 E=5 C=3
A=1 D=4 C=3
A line like this
awk '$1$3 != "A=1""C=3" {print $0}' IN > OUT
seems to do the trick,
and the moment I start to use the && operator (in a bash script), awk tends to delete all the lines that contain A=1 OR C=3. But the goal is to keep the middle line.
I would like to tell awk to delete lines one and three when field one contains the letter 'A' (as a sub-string) and field three simultaneously contains the letter 'C'.
But awk doesn't want to listen to me, or I'm seriously doing something wrong (I'm still in the experimental awk phase)
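One way to express "delete the line only when both fields match" is to negate the whole && condition, sketched here on the sample above (the file names IN and OUT come from the question):

```shell
printf 'A=1 B=2 C=3\nB=2 E=5 C=3\nA=1 D=4 C=3\n' > IN

# Drop a line only when $1 contains A AND $3 contains C; keep everything else.
awk '!($1 ~ /A/ && $3 ~ /C/)' IN > OUT

cat OUT   # only the middle line survives
```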
I am a beginner in AWK, so please help me learn it. I have a text file named snd and its values are
1 0 141
1 2 223
1 3 250
1 4 280
I want to print the entire row when the third column value is the minimum
This should do it:
awk 'NR == 1 {line = $0; min = $3}
NR > 1 && $3 < min {line = $0; min = $3}
END{print line}' file.txt
EDIT:
What this does is:
Remember the 1st line and its 3rd field.
For the other lines, if the 3rd field is smaller than the min found so far, remember the line and its 3rd field.
At the end of the script, print the line.
Note that the test NR > 1 can be skipped, as for the 1st line, $3 < min will be false. If you know that the 3rd column is always positive (not negative), you can also skip the NR == 1 ... test as min's value at the beginning of the script is zero.
EDIT2:
This is shorter:
awk 'NR == 1 || $3 < min {line = $0; min = $3}END{print line}' file.txt
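The shortened version can be verified against the question's snd file:

```shell
# Recreate the question's input file.
printf '1 0 141\n1 2 223\n1 3 250\n1 4 280\n' > snd

# Seed min from the first line, then keep the line with the smallest $3.
awk 'NR == 1 || $3 < min {line = $0; min = $3} END {print line}' snd
# 1 0 141
```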
You don't need awk to do what you want. Use sort
sort -nk 3 file.txt | head -n 1
Results:
1 0 141
I think sort is an excellent answer, unless for some reason what you're looking for is the awk logic to do this in a larger script, or you want to avoid the extra pipes, or the purpose of this question is to learn more about awk.
$ awk 'NR==1{x=$3;line=$0} $3<x{line=$0} END{print line}' snd
Broken out into pieces, this is:
NR==1 {x=$3;line=$0} -- On the first line, set an initial value for comparison and store the line.
$3<x{line=$0} -- On each line, compare the third field against our stored value, and if the condition is true, store the line. (We could make this run only on NR>1, but it doesn't matter.)
END{print line} -- At the end of our input, print whatever line we've stored.
You should read man awk to learn about any parts of this that don't make sense.
a short answer for this would be:
sort -k3,3n temp|head -1
since you have asked for awk:
awk '{if(min>$3||NR==1){min=$3;a[$3]=$0}}END{print a[min]}' your_file
But I always prefer the shorter one.
For calculating the smallest value in any column, let's say the last column:
awk '(FNR==1){a=$NF} {a=$NF < a?$NF:a} END {print a}'
this will only print the smallest value of the column.
In case the complete line is needed, it's better to use sort:
sort -r -n -t [delimiter] -k[column] [file name]
awk -F ";" '(NR==1){a=$NF;b=$0} {a=$NF<a?$NF:a;b=$NF>a?b:$0} END {print b}' filename
this will print the line with the smallest value that is encountered first.
awk 'BEGIN {OFS=FS=","}{if ( a[$1]>$2 || a[$1]=="") {a[$1]=$2;} if (b[$1]<$2) {b[$1]=$2;} } END {for (i in a) {print i,a[i],b[i]}}' input_file
We use || a[$1]=="" because when the 1st value of field 1 is encountered, a[$1] will hold the empty string.
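A small run makes the per-key bookkeeping concrete; the input here is hypothetical (key in column 1, value in column 2), and since the order of for (i in a) is unspecified, the output is piped through sort:

```shell
printf 'a,5\na,2\nb,7\nb,9\n' > input_file

# a[] tracks the minimum per key, b[] the maximum.
awk 'BEGIN {OFS = FS = ","}
     { if (a[$1] > $2 || a[$1] == "") a[$1] = $2
       if (b[$1] < $2) b[$1] = $2 }
     END {for (i in a) print i, a[i], b[i]}' input_file | sort
# a,2,5
# b,7,9
```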