Awk to print a single next word following a pattern match - awk

This Q is a variation on the theme of printing something after a pattern.
There will be input lines with words. Some lines will match a pattern where the pattern will be one or multiple words separated by space. The pattern might have a leading/trailing space which needs to be obeyed. I need to print the word immediately following the match.
Example input
The quick brown fox jumps over the lazy dog
Pattern : "brown fox "
Desired output : jumps
The pattern will only occur once in the line. There will always be a word following the pattern. There will be lines without the pattern.
awk or sed would be nice.
Cheers.
EDIT :
I failed to ask the question properly. There will be one or more spaces between the pattern and the next word. This breaks Andre's proposal.
% echo -e "The quick brown fox jumps over the lazy dog\n" | awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }'
jumps
% echo -e "The quick brown fox    jumps over the lazy dog\n" | awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }'

This works, given that the desired word is followed by a space:
$ echo -e "The quick brown fox jumps over the lazy dog\n" > file
$ awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }' file
jumps
Edit:
If there're more spaces use this:
$ awk -F 'brown fox' 'NF>1{ sub(/^ */,"",$NF);
sub(/ .*/,"",$NF); print $NF }' file

Disclaimer: this solution assumes that if no pattern is found (There will be lines without the pattern.) it is appropriate to print empty line, if this does not hold true ignore this answer entirely.
I would use AWK for this in the following way. Let the content of file.txt be
The quick brown fox jumps over the lazy dog
No animals in this line
The quick brown fox jumps over the lazy dog
then
awk 'BEGIN{FS="brown fox *"}{sub(/ .*/,"",$2);print $2}' file.txt
output
jumps
jumps
Explanation: I set the field separator FS to "brown fox" followed by any number of spaces. Whatever comes after the match appears in the 2nd field. I strip from the 2nd field everything from the first space onward, then print that field. If there is no match, the 2nd field is empty and these actions print an empty line.

With GNU grep:
$ grep -oP '(?<=brown fox )(\w+)' file
jumps
If you have more than 1 space after the match:
$ echo 'The quick brown fox jumps over the lazy dog' | grep -oP '(?<=\bbrown fox\b)\s+\K(\w+)'
jumps
Perl, with the same regex:
$ perl -lnE 'print $1 if /(?<=\bbrown fox )(\w+)/' file
Or, if you have multiple spaces:
$ perl -lnE 'print $1 if /(?<=brown fox)\s+(\w+)/' file
(As stated in comments, both the GNU grep and Perl regex could be \bbrown\h+fox\h+\K\w+ which has the advantage of supporting multiple spaces between brown and fox)
With awk, you can split on the string and split the result (this works as-is for multi spaces):
pat='brown fox'
awk -v pat="$pat" 'index($0, pat){
split($0,arr, pat)
split(arr[2], arr2)
print arr2[1]}' file
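Note that split() treats pat as a regular expression. If the pattern has to be taken literally, including a trailing space, a minimal sketch with index() and substr() could look like the following. The function name next_word is hypothetical, and the sketch assumes the pattern contains no backslashes, since awk processes escape sequences in -v assignments.

```shell
# Hypothetical helper: print the word that follows a literal string.
# Usage: next_word 'brown fox ' < file
next_word() {
    awk -v pat="$1" '
    {
        pos = index($0, pat)                  # 0 when the line lacks pat
        if (pos == 0) next                    # skip lines without a match
        rest = substr($0, pos + length(pat))  # text after the match
        sub(/^ +/, "", rest)                  # drop any extra spaces
        sub(/ .*/, "", rest)                  # keep only the first word
        print rest
    }'
}

printf '%s\n' 'The quick brown fox    jumps over the lazy dog' 'No animals here' |
    next_word 'brown fox '
```

For the two sample lines this prints jumps once; the line without the pattern produces no output.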

With GNU awk, you might also use a capture group with function match.
\ybrown\s+fox\s+(\w+)
\y A word boundary
brown\s+ Match brown and 1+ whitespace chars
fox\s+ Match fox and 1+ whitespace chars
(\w+) Capture 1+ word chars in group 1
In awk, get the group 1 value using arr[1]
Example
echo "The quick brown fox jumps over the lazy dog" |
awk 'match($0,/\ybrown\s+fox\s+(\w+)/, arr) {print arr[1]}'
Output
jumps

Related

How to remove all lines after a line containing some string?

I need to remove all lines after the first line that contains the word "fox".
For example, for the following input file "in.txt":
The quick
brown fox jumps
over
the
lazy dog
the second fox
is hunting
The result will be:
The quick
brown fox jumps
I prefer a script made in awk or sed but any other command line tools are good, like perl or php or python etc.
I am using gnuwin32 tools in Windows, and the solution I could find was this one:
grep -m1 -n fox in.txt | cut -f1 -d":" > var.txt
set /p MyNumber=<var.txt
head -%MyNumber% in.txt > out.txt
However, I am looking for a solution that is shorter and that is portable (this one contains Windows specific command set /p).
Similar questions:
How to delete all the lines after the last occurence of pattern?
How to delete lines before a match perserving it?
Remove all lines before a match with sed
How to delete the lines starting from the 1st line till line before encountering the pattern '[ERROR] -17-12-2015' using sed?
How to delete all lines before the first and after the last occurrence of a string?
awk '{print} /fox/{exit}' file
With GNU sed:
sed '0,/fox/!d' file
or
sed -n '0,/fox/p' file
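If GNU sed is not available, the 0,/fox/ address form will not work, since it is a GNU extension. A portable sketch uses the POSIX q command, which prints the current line and then stops reading. The sample data is the in.txt content from the question, and the variable name result is only for illustration.

```shell
# Print everything up to and including the first line containing "fox".
result=$(printf '%s\n' 'The quick' 'brown fox jumps' 'over' 'the' \
                       'lazy dog' 'the second fox' 'is hunting' |
         sed '/fox/q')
printf '%s\n' "$result"
```

With a file, the equivalent call would be sed '/fox/q' in.txt > out.txt.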

Join between two patterns

I have been searching for an answer to this but am not finding what works. This is what I am trying to accomplish. In a file I have lines that begin with a specific pattern and sometimes there is a line between them and other times there is not. I am trying to join the line between the patterns to the first pattern line. Example below:
Current output:
Name: Doe John
Some Random String
Mailing Address: 1234 Street Any Town, USA
Note: The "Some Random String" line sometimes does not exist so the join would not be needed
Desired output:
Name: Doe John Some Random String
Mailing Address: 1234 Street Any Town, USA
I have tried sed and awk answers I have found on the net but cannot wrap my head around how to make this work. My sed and awk skills are very basic at this point so I don't quite understand some of the solutions even when explained.
Thanks for any help or a pointer to documentation that talks about what I am trying to accomplish.
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '{printf("%s%s",FNR>1 && $0~/^Mailing/?ORS:OFS,$0)} END{print ""}' Input_file
Or, if you want to start a new line for both the Name and the Mailing lines, try the following.
awk '
{
printf("%s%s",FNR>1 && ($0~/^Mailing/ || $0 ~/Name:/)?ORS:OFS,$0)
}
END{
print ""
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
printf("%s%s",FNR>1 && ($0~/^Mailing/ || $0 ~/Name:/)?ORS:OFS,$0)
##Using printf to print two strings. The 1st is ORS (newline) when the line number is
##greater than 1 AND the line either starts with Mailing or contains Name:, otherwise
##it is OFS (space). The 2nd string is the current line.
}
END{ ##Starting END block of this program from here.
print "" ##Printing new line here.
}
' Input_file ##Mentioning Input_file name here.
Another awk where you define the specific patterns:
$ awk '
BEGIN {
p["Name"] # define the specific patterns that start the record
p["Mailing Address"]
}
{
printf "%s%s",(split($0,t,":")>1&&(t[1] in p)&&NR>1?ORS:""),$0
}
END {
print "" # conditional operator controls the ORS so needed here
}' file
Output on slightly modified data (extra space comes from your data, didn't trim them):
Name: Doe John Some Random String
Mailing Address: 1234 Street Any Town, USA Using: but not specific pattern
How about a GNU sed solution:
sed '
/^Name:/{ ;# if the line starts with "Name:" enter the block
N ;# read the next line and append to the pattern space
:l1 ;# define a label "l1"
/\nMailing Address:/! {N; s/\n//; b l1} ;# if the next line does not start with "Mailing Address:"
;# then append next line, remove newline and goto label "l1"
}' file
This might work for you (GNU sed):
sed '/Name:/{:a;N;/Mailing Address:/!s/\s*\n\s*/ /;$!ta}' file
If a line contains Name: keep appending lines and replacing white space either side of the newline by a space, until the end of file or a line containing Mailing Address:.
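For readers who find the one-liners above hard to follow, here is a more explicit awk sketch of the same idea: buffer the current record, append continuation lines to it, and flush it when the next record starts. It assumes that records start only with "Name:" or "Mailing Address:" at the beginning of a line. The function name join_continuations is hypothetical.

```shell
# Hypothetical helper: join continuation lines onto the preceding record line.
join_continuations() {
    awk '
    /^(Name|Mailing Address):/ {   # this line starts a new record
        if (have) print buf        # flush the previous record, if any
        buf = $0; have = 1
        next
    }
    {                              # any other line is a continuation
        buf = (have ? buf " " : "") $0
        have = 1
    }
    END { if (have) print buf }    # flush the last record
    '
}

printf '%s\n' 'Name: Doe John' 'Some Random String' \
              'Mailing Address: 1234 Street Any Town, USA' |
    join_continuations
```

On the sample from the question this prints the Name line with "Some Random String" appended, followed by the Mailing Address line unchanged.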

Use awk to swap two sets of word(s) separated by a given string

In my input file, each line has two word(s) separated by the string 'swap'.
I need to swap the word(s) before 'swap' with the word(s) after 'swap' from each line.
Input: 'cat myfile.txt'
world swap hello
hoo swap woo
I'm cooler swap You're cool
Expected Output:
hello swap world
woo swap hoo
You're cool swap I'm cooler
Is it additionally possible to replace 'swap' with '-' in the output like:
hello - world
woo - hoo
You're cool - I'm cooler
Following awk may help you here.
awk -F" +swap +" '{print $NF,"-",$1}' Input_file
Try this:
awk -F' swap ' '{print $2 " - " $1}' myfile.txt

In a file with two word lines, grep only those lines which have both words from a whitelist

I have a file1:
green
yellow
apple
mango
and a file2:
red apple
blue banana
yellow mango
purple cabbage
I need to find elements from file2 where both words belong to the list in file1. So it should show:
yellow mango
I tried:
awk < file2 '{if [grep -q $1 file1] && [grep -q $2 file1]; then print $0; fi}'
I am getting syntax error.
This will do the trick:
$ awk 'NR==FNR{a[$0];next}($1 in a)&&($2 in a)' file1 file2
yellow mango
Explanation:
NR is a special awk variable that tracks the current line number across all input, and FNR tracks the line number within the current file, so the condition NR==FNR is only true while reading the first file. a is an associative array whose keys are the unique lines of the first file. $0 is the value of the current line. The next statement jumps to the next input line, so the rest of the script is skipped for lines of the first file. The final part is straightforward: if the first field $1 and the second field $2 are both in the array a, the current line is printed. The default action in awk is {print $0}, so the print is implicit.
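The same logic, spelled out over several lines, might look like the sketch below. The function name both_in_list and the array name allowed are hypothetical, and the sample files are recreated in a temporary directory. One caveat of the NR==FNR idiom: if the first file is empty, the condition is also true for the second file.

```shell
# Hypothetical helper: $1 = whitelist file, $2 = file to filter.
both_in_list() {
    awk '
    NR == FNR {        # true only while reading the first file
        allowed[$0]    # referencing the element creates the key
        next           # skip the filter rule for whitelist lines
    }
    ($1 in allowed) && ($2 in allowed)   # no action block: prints the line
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' green yellow apple mango > "$tmp/file1"
printf '%s\n' 'red apple' 'blue banana' 'yellow mango' 'purple cabbage' > "$tmp/file2"
both_in_list "$tmp/file1" "$tmp/file2"
rm -rf "$tmp"
```

For the sample files this prints yellow mango.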
This is a very hackish approach and probably frowned upon by many of the grep/sed implementors. In addition it is probably terminal dependent. You have been warned.
GNU grep, when in color mode, highlights the pieces of the input that were matched by one of the patterns. In theory this could be used as a test for a full match. Here it even works in practice, with some help from GNU sed:
grep --color=always -f file1 file2 | sed -n '/^\x1b.*\x1b\[K *\x1b.*\x1b\[K$/ { s/\x1b\[K//g; s/\x1b[^m]*m//gp }'
Output:
yellow mango
Note that the sed pattern assumes space separated columns in file2.
You can do it with bash, sed and grep:
grep -f <(sed 's/^/^/' file1) file2 | grep -f <(sed 's/$/$/' file1)
this is a bit obscure, so I will break it down:
grep -f <file> reads a sequence of patterns from a file and will match on any of them.
<(...) is bash process substitution and will execute a shell command and create a pseudo-file with the output that can be used in place of a filename.
sed 's/^/^/' file1 inserts a ^ character at the start of each line in file1, turning the lines into patterns that will match the first word of file2.
sed 's/$/$/' file1 inserts a $ character at the end, so the patterns will match the second word.
Edit:
Use:
grep -f <(sed 's/^/^/;s/$/\b/' file1) file2 | grep -f <(sed 's/$/$/;s/^/\b/' file1)
to get round the issue that Jonathan pointed out in his comment.

sed script to print the first three words in each line

I wonder how I can do the following with sed:
I need to keep only the first three words in each line.
For example, the following text:
the quick brown fox jumps over the lazy bear
the blue lion is hungry
will be transformed into:
the quick brown
the blue lion
In awk you can say:
{print $1, $2, $3}
You can use cut like this:
cut -d' ' -f1-3
I would suggest awk in this situation:
awk '{print $1,$2,$3}' ./infile
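If the number of words should be adjustable, a small sketch with a loop could be used instead of listing the fields by hand. The function name first_words and the variable n are hypothetical. Lines with fewer than n words are printed in full.

```shell
# Hypothetical helper: print the first n words of each line (default 3).
first_words() {
    awk -v n="${1:-3}" '{
        out = ""
        for (i = 1; i <= n && i <= NF; i++)
            out = out (i > 1 ? " " : "") $i   # join fields with one space
        print out
    }'
}

printf '%s\n' 'the quick brown fox jumps over the lazy bear' \
              'the blue lion is hungry' |
    first_words 3
```

For the sample text this prints "the quick brown" and "the blue lion".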
% (echo "A B C D E F G H";echo "a b c d e f g h") | sed -E 's/([^\s].){3}//'
I put the "-E" in there for OS X compatibility. Other Unix systems may or may not need it.
edit: damnitall - brainfart. use this:
% sed -E 's/(([^ ]+ ){3}).*/\1/' <<END
the quick brown fox jumps over the lazy bear
the blue lion is hungry
END
the quick brown
the blue lion
Just using the shell
while read -r a b c d
do
echo "$a $b $c"
done < file
Ruby(1.9)+
ruby -ane 'print "#{$F[0]} #{$F[1]} #{$F[2]}\n"' file
If you need a sed script, you can try:
echo "the quick brown fox jumps over the lazy bear" | sed 's/^\([a-zA-Z]\+\ [a-zA-Z]\+\ [a-zA-Z]\+\).*/\1/'
But I think it would be easier using cut:
echo "the quick brown fox jumps over the lazy bear" | cut -d' ' -f1,2,3
Here's an ugly one with sed:
$ echo the quick brown fox jumps over the lazy bear | sed 's|^\(\([^[:space:]]\+[[:space:]]\+\)\{2\}[^[:space:]]\+\).*|\1|'
the quick brown
If Perl is an option:
perl -lane 'print "$F[0] $F[1] $F[2]"' file
or
perl -lane 'print join " ", @F[0..2]' file