Multiple awk pattern matching in one line - awk

Let's say I want to match foo and bar in a file. The following works:
/foo/{commands}/bar/{commands}
Note: here there is no separator between /foo/{commands} and /bar/{commands}.
The following is also okay:
/foo/{commands1}{commands2} where {commands2} is executed for every line and I've left out the pattern.
But what if I want to leave out the commands? What's awk's syntax rule here? The following doesn't work:
/foo//bar/
Of course, I could write it as /foo/{print}/bar/{print}, but I was wondering what's the delimiter for separating segments and why you sometimes need it and sometimes you don't.

awk programs are built from pattern { action } rules. If you write a /.../ pattern without an action, the default action applies: the line is printed whenever the pattern matches. To put two such rules on one line, separate them with a semicolon:
awk '/foo/;/bar/' Input_file
This means:
Because they are separated by ;, they are treated as two independent rules.
When /foo/ matches a line, no action is given, so the line is printed.
When /bar/ matches a line, the same applies: the pattern is true, no action is given, and the line is printed.
Note that a line containing both strings is printed twice, which you probably don't want. To avoid that, use a single pattern with alternation:
awk '/foo|bar/' Input_file
Or, if you need both strings to be present on the same line, try:
awk '/foo/ && /bar/' Input_file
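To see the difference between the three forms side by side, here is a small sketch. The four-line sample input is made up for illustration and is not from the question.

```shell
# Hypothetical sample input: foo only, bar only, both, neither.
input='only foo
only bar
foo and bar
neither'

# Two pattern-only rules separated by ";": a line matching both is printed twice.
two_rules=$(printf '%s\n' "$input" | awk '/foo/;/bar/')

# One rule with alternation: every matching line is printed once.
either=$(printf '%s\n' "$input" | awk '/foo|bar/')

# One rule that requires both patterns on the same line.
both=$(printf '%s\n' "$input" | awk '/foo/ && /bar/')

printf '%s\n' '--- /foo/;/bar/' "$two_rules" \
              '--- /foo|bar/' "$either" \
              '--- /foo/ && /bar/' "$both"
```

The first form prints "foo and bar" twice, the second prints each matching line once, and the third prints only the line that contains both strings.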

To match foo and bar in a file - just combine patterns:
awk '/foo/ && /bar/ ....'

Related

Awk - Grep - Match the exact string in a file

I have a file that looks like this
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free
I need to match the values in the fourth column exactly as they are written. So if I am searching for "Super" I need it to return the line with "Super" only.
ON,111111,TEN000812,Super,7483747483,767,Free
Likewise, if I'm looking for "Super Man" I need that exact line returned.
ON,454644,FRED84848,Super Man,65757,555,Free
I have tried using grep, but grep will match all instances that contain Super. So if I do this:
grep -i "Super" file.txt
It returns all lines, because they all contain "Super"
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free
I have also tried with awk, and I believe I'm close, but when I do:
awk '$4==Super' file.txt
I still get output like this:
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
I have been at this for hours, and any help would be greatly appreciated at this point.
You were very close. Just set the field separator to a comma in your solution and you are all set.
awk 'BEGIN{FS=","} $4=="Super"' Input_file
One more thing about the OP's attempt: when comparing the 4th field with a string value, the string must be wrapped in double quotes. Without them, Super is treated as an (unset) awk variable.
Or, if you want to pass the value to compare as an awk variable, try the following.
awk -v value="Super" 'BEGIN{FS=","} $4==value' Input_file
You are quite close, actually. You can try:
awk -F, '$4=="Super" {print}' file.txt
I find this form easier to grasp, though it is slightly longer than @RavinderSingh13's.
-F sets the field separator, in this case a comma.
Next you have a condition followed by an action.
The condition checks whether the fourth field is exactly the string Super.
If it is, the line is printed.
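As a quick check of the answers above, here is a sketch that runs the exact-match comparison against the three sample rows from the question. The helper name match_field4 is made up for this example.

```shell
# The three sample rows from the question.
data='ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free'

# match_field4 is a hypothetical helper: print rows whose 4th comma-separated
# field equals the argument exactly (string comparison, not a regex match).
match_field4() {
    printf '%s\n' "$data" | awk -F, -v value="$1" '$4 == value'
}

super=$(match_field4 "Super")
super_man=$(match_field4 "Super Man")

printf '%s\n' "$super" "$super_man"
```

Because == compares the whole field, "Super" matches neither "SuperMan" nor "Super Man", and the embedded space in "Super Man" is not a problem once the separator is a comma.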

AWK script, linefeed under Windows causing different function

I have a simple AWK script which I am trying to run under Windows with GNU Awk 3.1.6.
The awk script is run with awk -f script.awk f1 f2 under Windows 10.
After spending almost half a day debugging, I came to find that the following two scenarios produce different results:
FNR==NR{
a[$0]++;cnt[1]+=1;next
}
!a[$0]
versus
FNR==NR
{
a[$0]++;cnt[1]+=1;next
}
!a[$0]
The difference of course being the linefeed at line 1.
It puzzles me because I don't recall reading anywhere that awk is sensitive to linefeeds. Other linefeeds in the script are unimportant.
Example 1 achieves the desired result. Example 2 prints f1, which is not desired.
So I made it work, but I would like to know why.
From the docs (https://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html)
awk is a line-oriented language. Each rule’s action has to begin on
the same line as the pattern. To have the pattern and action on
separate lines, you must use backslash continuation; there is no other
option.
Note that the action only has to begin on the same line as the pattern. After that as we're all aware it can be spread over multiple lines, though not willy-nilly. From the same page in the docs:
However, gawk ignores newlines after any of the following symbols and
keywords:
, { ? : || && do else
In Example 2, since there is no action beginning on the same line as the FNR == NR pattern, the default action of printing the line is performed when that statement is true (which it is for all and only f1). Similarly in that example, the action block is not paired with any preceding pattern on its same line, so it is executed for every record (though there's no visible result for that).
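The effect is easy to reproduce. The sketch below uses two tiny made-up files in a temporary directory as stand-ins for f1 and f2, and drops the cnt[] counter from the question since it does not affect the output.

```shell
# Hypothetical stand-ins for f1 and f2.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/f1"
printf 'b\nc\n' > "$tmpdir/f2"

# Brace on the pattern's line: prints the lines of f2 that are not in f1.
same_line=$(awk 'FNR==NR{
a[$0]++; next
}
!a[$0]' "$tmpdir/f1" "$tmpdir/f2")

# Newline before the brace: "FNR==NR" is now a pattern-only rule (prints f1),
# and the block is a pattern-less rule that runs "next" for every line,
# so the final !a[$0] rule is never reached.
split_line=$(awk 'FNR==NR
{
a[$0]++; next
}
!a[$0]' "$tmpdir/f1" "$tmpdir/f2")

rm -rf "$tmpdir"
printf 'same line:\n%s\nsplit:\n%s\n' "$same_line" "$split_line"
```

With these inputs the first program prints c, while the second prints a and b, that is, the contents of f1.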

How do you get awk to move to next line when a condition has been actioned?

I'm trying to write an awk script which checks certain conditions and throws away lines meeting those conditions.
The specific conditions are to throw away the first two lines of the file and any line that starts with the text xyzzy:. To that end, I coded up:
awk '
NR < 2 {}
/^xyzzy:/ {}
{print}'
thinking that it would throw away the lines where either of those two conditions were met and print otherwise.
Unfortunately, it appears that the print is being processed even when the line matches one of the other two patterns.
Is there a C-like continue action that will move on the next line ignoring all other condition checks for the current line?
I suppose I could use something like ((NR > 1) && (!/^xyzzy:/)) {print} as the third rule but that seems rather ugly to me.
Alternatively, is there another way to do this?
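Use the keyword next as your action. It stops processing the current line immediately and moves on to the next input line, skipping any remaining rules.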
This keyword is often useful when you want to iterate over 2 files; sometimes it's the same file that you want to process twice.
You'll see the following idiom:
awk '
FNR==NR {
< stuff that works on file 1 only >
next
}
{
< stuff that works on file 2 only >
}' ./infile1 ./infile2
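Applied to the filtering problem from the question, that could look like the sketch below. The sample input is made up, and the first rule uses NR <= 2 to drop the first two lines as the question's text asks (the question's own code used NR < 2, which only drops the first).

```shell
# Hypothetical input: two header lines, then data mixed with xyzzy: lines.
input='header 1
header 2
keep this
xyzzy: drop this
keep that'

# "next" abandons the current line, so the final print rule never sees it.
filtered=$(printf '%s\n' "$input" | awk '
NR <= 2    { next }
/^xyzzy:/  { next }
{ print }')

printf '%s\n' "$filtered"
```

Only "keep this" and "keep that" are printed.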

Explain this awk command

Please explain what exactly this awk command does:
awk '$0!~/^$/{print $0}'
It removes blank lines. The condition is: $0 (the whole line) does not match (!~) the regexp /^$/ (the beginning of the line immediately followed by the end of the line).
Similar to grep -v '^$'
It prints non-empty input lines. Note: "Empty" does not mean "blank", in this case.
Your example could be rewritten as simply:
awk '!/^$/'
or
sed '/^$/d'
Like Ben Jackson and the others said, it removes completely empty lines: not the ones with one or more whitespace characters, but the zero-character-long ones. We will never know if this was the intended behaviour.
I'd like to remark that the code is at least redundant, if not triply redundant, depending on what it's used for.
What it does is that it prints the input line to the output if the input line is not the empty line.
Since awk's default behaviour is to print the input line when a condition without a following action block is met, this would suffice:
awk '$0!~/^$/' or even shorter awk '$0!=""'
If you could be sure that no line would evaluate to zero, even
awk '$0'
could do the trick.
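A short sketch of how these variants differ, using a made-up input with an empty line, a spaces-only line and a line containing just 0. The NF variant is not from this thread; it is included only to show one common way to also drop whitespace-only lines.

```shell
# Hypothetical input: text, an empty line, a spaces-only line, a lone 0, more text.
input=$(printf 'text\n\n   \n0\nmore')

# Drops only the zero-length line.
nonempty=$(printf '%s\n' "$input" | awk '$0 != ""')

# Also drops the "0" line, because a line that looks numeric and equals zero is false.
truthy=$(printf '%s\n' "$input" | awk '$0')

# Drops empty and whitespace-only lines (NF is 0 for both), but keeps "0".
nonblank=$(printf '%s\n' "$input" | awk 'NF')

printf '%s\n' '--- $0 != ""' "$nonempty" '--- $0' "$truthy" '--- NF' "$nonblank"
```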
Make it readable first...
echo '$0!~/^$/{print $0}' | a2p
==>
$, = ' ';
$\ = "\n";
while (<>) {
chomp;
if ($_ !~ /^$/) {
print $_;
}
}
And then interpret. In this case: don't print empty lines.

Does awk print all if field variable doesn't exist?

I am trying to understand some scripts that I have inherited and that make use of awk. In one of the scripts are these lines:
report=`<make call to Java class that generates a report>`
report=`echo $report|awk '{print $5}'`
The report generated in line 1 has data like this:
ABC1234:0123456789:ABCDE
ABC4321:9876543210:EDCBA
...
The awk generated report is the same as the original one.
There is no 5th field in the report since there is no whitespace and a different field separator has not been defined. I know that using $0 will return all fields. Does specifying a field that doesn't exist do the same?
No:
echo "1 2 3"|awk '{print $5}'
The above prints nothing. I don't know why it behaves the way you describe. If you used " instead of ', it would print the whole line, because $5 would be expanded by the shell (typically to an empty string) before awk sees it, but as written it should not.
Something is wrong with your test.
The expected awk behavior in this case is to print a blank line for each input line, and that's what I see when I run with either the 1TA or gawk.
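Both points can be checked with a small sketch. The function name expanded is made up; it is only there so that $5 is guaranteed to be an unset positional parameter when the shell expands it.

```shell
# A three-field line has no $5: awk prints an empty string followed by a
# newline, i.e. one blank line per input line.
content=$(echo "1 2 3" | awk '{print $5}')
line_count=$(echo "1 2 3" | awk '{print $5}' | awk 'END { print NR }')

# NF shows how many fields awk found in a report line from the question.
nf_default=$(echo "ABC1234:0123456789:ABCDE" | awk '{print NF}')
nf_colon=$(echo "ABC1234:0123456789:ABCDE" | awk -F: '{print NF}')

# With double quotes the shell expands $5 first. Inside this hypothetical
# function $5 is unset, so awk receives "{print }" and prints the whole line.
expanded() { echo "1 2 3" | awk "{print $5}"; }
shell_expanded=$(expanded)

printf 'lines=%s default NF=%s colon NF=%s expanded=%s\n' \
    "$line_count" "$nf_default" "$nf_colon" "$shell_expanded"
```

So a missing field yields an empty line rather than the whole record; getting the whole record back points to something outside awk, such as shell quoting.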