How to search by variable in awk

I'm trying to get the N-th row after a given pattern with awk.
The problem is that awk searches for the literal text "patt" instead of the variable's value:
awk -v patt=${1} -v rows=${2} 'NR==p {print} /patt/ {p=NR+rows}'
How do I escape the "patt" variable?

Use the awk matching operator instead of the slashes:
awk -v patt="$1" -v rows="$2" 'NR==p {print} $0 ~ patt {p=NR+rows}'
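A minimal sketch of the matching-operator approach on made-up input. The sample lines and the values given to patt and rows are assumptions for illustration; in the real script they would come from "$1" and "$2".

```shell
# Hypothetical sample input: five lines, with START on line 2.
input=$(printf '%s\n' alpha START beta gamma delta)

# Illustrative values standing in for "$1" and "$2".
patt='^START$'
rows=2

result=$(printf '%s\n' "$input" |
  awk -v patt="$patt" -v rows="$rows" '
    NR == p   { print }          # print the line that sits rows lines after the last match
    $0 ~ patt { p = NR + rows }  # dynamic regex: the variable value is used as the pattern
  ')
printf '%s\n' "$result"
```

The value of patt is still treated as a regular expression here, so regex metacharacters in it keep their special meaning.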

I've managed to get it to work with double quotes:
patt=${1}
awk -v rows=${2} "NR==p {print} /${patt}/ {p=NR+rows}" $3

There's nothing special about the string containing the awk program, so you can build it as usual in the shell, e.g.:
awk -v rows="$2" 'NR==p {print} /'"$1"'/ {p=NR+rows}'

Related

Chain awk regex matches like grep

I am trying to use awk to select/remove data based on cell entries in a CSV file.
How do I chain Awk commands to build up complex searches like I have done with grep? I plan to use Awk to select rows based on matching criteria in cells in multiple columns, not just the first column as in this example.
Test data
123,line1
123a,line2
abc,line3
G-123,line4
G-123a,line5
Separate Awk statements with intermediate files
awk '$1 !~ /^[[:digit:]]/ {print $0}' file.txt > output1.txt
awk '$1 !~ /^G-[[:digit:]]/ {print $0}' output1.txt > output2.txt
mv output2.txt output.txt
cat output.txt
Chained or multi-line grep version (I think limited to first column only)
grep -v \
-e "^[[:digit:]]" \
-e "^G-[[:digit:]]" \
file.txt > output.txt
cat output.txt
How can I rewrite the Awk command to avoid the intermediate files?
Generally, in awk there are boolean operators available (it's better than grep! :) )
awk '/match1/ || /match2/' file
awk '(/match1/ || /match2/ ) && /match3/' file
and so on ...
In your example you could use something like:
awk -F, '$1 ~ /^[[:digit:]]/ || $1 ~ /^G-[[:digit:]]/' input >> output
Note: This is just an example of how to use boolean operators. Also the regular expression itself could have been used here to express the alternative match:
awk -F, '$1 ~ /^(G-)?[[:digit:]]/' input >> output
In your awk commands and example, awk regards file.txt as having only one field because you have not defined FS, so the default whitespace field separator is used.
With that said, you can easily AND your two pattern matches together like this:
awk '($1 !~ /^[[:digit:]]/) && ($1 !~ /^G-[[:digit:]]/) {print $0}' file.txt
To make awk use comma as a field separator, you can define it in a BEGIN block. In this example, the output should be just line3
awk 'BEGIN {FS=","} ($1 !~ /^[[:digit:]]/) && ($1 !~ /^G-[[:digit:]]/) {print $2}' file.txt
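As a rough sketch of how these conditions combine, the snippet below runs the question's test data through one awk call with a comma field separator. The second command is a hypothetical multi-column filter (the condition on $2 is invented for illustration) showing that the same boolean operators work across fields.

```shell
# Test data from the question.
data=$(printf '%s\n' '123,line1' '123a,line2' 'abc,line3' 'G-123,line4' 'G-123a,line5')

# Drop rows whose first field starts with a digit or with G- plus a digit.
kept=$(printf '%s\n' "$data" |
  awk -F, '$1 !~ /^[[:digit:]]/ && $1 !~ /^G-[[:digit:]]/')

# Hypothetical multi-column filter: first field starts with a digit or G-,
# AND the second field ends in 4 or 5.
multi=$(printf '%s\n' "$data" |
  awk -F, '($1 ~ /^[[:digit:]]/ || $1 ~ /^G-/) && $2 ~ /[45]$/')

printf 'kept:\n%s\nmulti:\n%s\n' "$kept" "$multi"
```

A pattern with no action prints the whole record, so {print $0} can be left out.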
I would suggest the literal translation of that grep command in awk is
awk '
/^[[:digit:]]/ {next}
/^G-[[:digit:]]/ {next}
{print}
' file.txt
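To check that the next-based program behaves like the grep -v chain, a small sketch can run both on the question's test data and compare the results. The variable names are made up for the example.

```shell
# Test data from the question.
data=$(printf '%s\n' '123,line1' '123a,line2' 'abc,line3' 'G-123,line4' 'G-123a,line5')

with_grep=$(printf '%s\n' "$data" |
  grep -v -e '^[[:digit:]]' -e '^G-[[:digit:]]')

with_awk=$(printf '%s\n' "$data" | awk '
  /^[[:digit:]]/   { next }   # skip lines starting with a digit
  /^G-[[:digit:]]/ { next }   # skip lines starting with G- and a digit
  { print }                   # everything else is kept
')

printf 'grep: %s\nawk:  %s\n' "$with_grep" "$with_awk"
```

Each rule that calls next acts like one -e exclusion, so further exclusions can be added as extra lines.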
But you have several examples of how to write it more concisely.
You can use
awk '$1 !~ /^(G-)?[[:digit:]]/' file.txt > output.txt
The awk tries to find in Field 1:
^ - start of string
(G-)? - an optional G- char sequence (note the regex flavor in awk is POSIX ERE, so (...) denotes a group and ? denotes a zero-or-one quantifier)
[[:digit:]] - a digit.
If the match is found, the record (=line) is not printed. Else, the line is printed.
To stick to your question, I would use:
awk '$1 !~ /^[[:digit:]]/ && $1 !~ /^G-[[:digit:]]/' file.txt > output.txt
But I like @Wiktor Stribiżew's regex approach!
With your shown samples, this can also be done in grep with a single regexp, so there is no need to chain several patterns. Adding this solution in case you or anyone else needs it.
grep -v -E '^(G-)?[[:digit:]]' Input_file
Explanation: grep's -v option omits lines that match the pattern, and -E enables ERE (extended regular expressions). The regex ^(G-)?[[:digit:]] matches a line that starts with a digit, optionally preceded by G-, so those lines are not printed.

Testing in awk for data with square brackets [syslog]

I have a text file like this:
File1 [test]
File1 sgfg
File1 fdgsfg
File1 [rsyslog]
File1 moredata
File1 MAX_EVENTS = 256
File1 fgsfg
File1 [other]
File1 Not this
File2 [syslog]
File2 extra
File2 MAX_EVENTS = 12
With awk I would like to match field $2 when it contains [syslog]
For example, this works:
awk '$2~/\[syslog\]/' file
But I would like to define the text to match in advance, in a variable.
These do not work:
awk -v var="[syslog]" '$2~var' file
awk -v var="\[syslog\]" '$2~var' file
awk -v var="syslog" '{test="["var"]"} $2~test' file
This works, since both sub calls must succeed and the text must match, but it is complicated :)
awk -v var="syslog" 'sub(/^\[/,"",$2) && sub(/\]/,"",$2) && $2==var' file
Working cases:
$ awk -v var='[syslog]' 'index($2, var)' file
File2 [syslog]
$ awk -v var='syslog' '$2~"\\[" var "\\]"' file
File2 [syslog]
$ awk -v var='[[]syslog[]]' '$2~var' file
File2 [syslog]
Basically take care of the escaping, or don't use regex matching.
As Ed kindly mentioned in the comment, ] alone does not need to be escaped:
awk -v var='syslog' '$2~"\\[" var "]"' file
awk -v var='[[]syslog]' '$2~var' file
You didn't say if you wanted a full or partial match or if you wanted a string or regexp match so here's some options:
Full string match:
awk -v var='[syslog]' '$2 == var' file
Partial string match:
awk -v var='[syslog]' 'index($2,var)' file
Full regexp match:
awk -v var='[[]syslog]' '$2 ~ "^"var"$"' file
Partial regexp match:
awk -v var='[[]syslog]' '$2 ~ var' file
There are of course, many other ways to do that too including escaping regexp metachars within the awk script to make them literal, specifying the string between [...] in the var then adding them in the awk script, matching just at the start or end of the field, etc.
See How do I find the text that matches a pattern? for more info on the different kinds of matching and Is it possible to escape regex metacharacters reliably with sed (applies to awk too) for how to escape regexp metachars to make them be treated as literal.
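The four kinds of match can be tried side by side. This is only a sketch on a shortened version of the question's file; the values syslog] and r?syslog] are chosen for illustration so that the partial matches return more than one line.

```shell
# Shortened sample of the question's file.
data=$(printf '%s\n' 'File1 [test]' 'File1 [rsyslog]' 'File1 MAX_EVENTS = 256' 'File2 [syslog]' 'File2 extra')

# Full string match: $2 must equal the value exactly.
full_string=$(printf '%s\n' "$data" | awk -v var='[syslog]' '$2 == var')

# Partial string match: the value may appear anywhere in $2, no regex involved.
partial_string=$(printf '%s\n' "$data" | awk -v var='syslog]' 'index($2, var)')

# Full regexp match: anchor the dynamic regex at both ends.
full_regexp=$(printf '%s\n' "$data" | awk -v var='[[]syslog]' '$2 ~ ("^" var "$")')

# Partial regexp match: an optional r before syslog].
partial_regexp=$(printf '%s\n' "$data" | awk -v var='r?syslog]' '$2 ~ var')

printf '%s\n' "$full_string" "$partial_string" "$full_regexp" "$partial_regexp"
```

With this data the two full matches return only File2 [syslog], while the two partial matches also return File1 [rsyslog].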
How about something like this?
awk -v var="[syslog]" '$2 == var' my_file
A bit of explanation. If you don't need regular expression matching you can just use == operator which compares strings literally.
Your "Not working" examples weren't working because:
1. The regular expression is not what you intended: [syslog] is a bracket expression that matches a single character, any of s, y, l, o, g.
2. The escaping is not correct; var="\\\\[syslog\\\\]" would have worked. awk should have warned you about this with the message awk: warning: escape sequence '\[' treated as plain '['.
3. This builds the same string, [syslog], inside the script, so it fails for the same reason as the first example.
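The bracket-expression behaviour is easy to see on a small sample. In this sketch (shortened data, made-up variable name) the unescaped value [syslog] matches every second field that contains at least one of the letters s, y, l, o or g:

```shell
# Shortened sample of the question's file.
data=$(printf '%s\n' 'File1 [test]' 'File1 [rsyslog]' 'File1 MAX_EVENTS = 256' 'File2 [syslog]' 'File2 extra')

# [syslog] used as a regex is a one-character bracket expression.
too_many=$(printf '%s\n' "$data" | awk -v var='[syslog]' '$2 ~ var')
printf '%s\n' "$too_many"
```

Here [test] matches because it contains an s, while MAX_EVENTS and extra do not match because none of the five lowercase letters occur in them.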

How to extract string from a file in bash

I have a file called DB_create.sql which has this line
CREATE DATABASE testrepo;
I want to extract only testrepo from this. So I've tried
cat DB_create.sql | awk '{print $3}'
This gives me testrepo;
I need only testrepo. How do I get this ?
With your shown samples, please try following.
awk -F'[ ;]' '{print $(NF-1)}' DB_create.sql
OR
awk -F'[ ;]' '{print $3}' DB_create.sql
OR without setting any field separators try:
awk '{sub(/;$/,"");print $3}' DB_create.sql
A simple explanation: set the field separator to a space OR a semicolon, then print the second-to-last field, $(NF-1), which is what the OP needs here. Also, you do not need cat with awk, because awk can read Input_file by itself.
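A short sketch of how the line splits with this separator, using the sample line from the question. It also shows why the parentheses in $(NF-1) matter: without them awk evaluates $NF first and then subtracts 1.

```shell
line='CREATE DATABASE testrepo;'

# The trailing ; produces an empty last field, so NF is 4.
nf=$(printf '%s\n' "$line" | awk -F'[ ;]' '{ print NF }')

# Second-to-last field: the database name.
second_last=$(printf '%s\n' "$line" | awk -F'[ ;]' '{ print $(NF-1) }')

# Without parentheses this is ($NF) - 1, i.e. the empty last field minus one.
without_parens=$(printf '%s\n' "$line" | awk -F'[ ;]' '{ print $NF-1 }')

printf 'NF=%s  $(NF-1)=%s  $NF-1=%s\n' "$nf" "$second_last" "$without_parens"
```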
Using gnu awk, you can set record separator as ; + line break:
awk -v RS=';\r?\n' '{print $3}' file.sql
testrepo
Or using any POSIX awk, just do a call to sub to strip trailing ;:
awk '{sub(/;$/, "", $3); print $3}' file.sql
testrepo
You can use
awk -F'[;[:space:]]+' '{print $3}' DB_create.sql
where the field separator is set to a [;[:space:]]+ regex that matches one or more occurrences of ; or/and whitespace chars. Then, Field 3 will contain the string you need without the semi-colon.
More pattern details:
[ - start of a bracket expression
; - a ; char
[:space:] - any whitespace char
] - end of the bracket expression
+ - a POSIX ERE one or more occurrences quantifier.
See the online demo.
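The + quantifier matters when the line contains runs of whitespace. The sketch below uses a deliberately messy, made-up variant of the line (several spaces, a tab, and a space before the semicolon) and compares the two separators:

```shell
# Hypothetical messy input: three spaces, a tab, and a space before the ;
messy=$(printf 'CREATE   DATABASE\ttestrepo ;')

# One or more ; or whitespace chars count as a single separator.
name=$(printf '%s\n' "$messy" | awk -F'[;[:space:]]+' '{ print $3 }')

# With the single-character separator, each extra space creates an empty field.
single=$(printf '%s\n' "$messy" | awk -F'[ ;]' '{ print $3 }')

printf 'with +: "%s"   without +: "%s"\n' "$name" "$single"
```

With the single-character separator, the third field of this input is empty, because the second and third spaces each start a new field.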
Use your own code but adding the function sub():
cat DB_create.sql | awk '{sub(/;$/, "",$3);print $3}'
It is better not to use cat, though. Here you can see why: Comparison of cat pipe awk operation to awk command on a file
So this way is better:
awk '{sub(/;$/, "",$3);print $3}' file

Regexp in gawk matches multiples ways

I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave - I just want to understand why?! (I have written a less elegant way around it now...).
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so did
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
The trouble is that instead of getting
HeaderUUIiewConsenFlagPSMessage
I get the match with everything from there to the last pipe of the string HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
Now I accept this is because the regexp in /msgcontent1=(.*)[|]/ can match multiple ways, but HOW do I make it match the way I want it to??
With your shown samples, please try the following. Written and tested in GNU awk, this prints only the contents from msgcontent1= up to the first occurrence of |.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
OR with echo + awk try:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use gnu awk like this to extract value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this gnu grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^\|]*)/,a)) print a[1] }'
this prints HeaderUUIiewConsenFlagPSMessage
The reason your regex matches msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that matching is greedy, so it always finds the longest possible match.
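The difference can be reproduced with the portable two-argument match(), which sets RSTART and RLENGTH. This is a sketch only: the offsets 12 and 13 are the length of msgcontent1= and that length plus the trailing pipe. POSIX regular expressions have no lazy quantifier, so the usual fix is a negated bracket expression that cannot cross a pipe.

```shell
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'

# Greedy: .* runs as far as it can, so the match ends at the LAST pipe.
greedy=$(printf '%s\n' "$s" |
  awk 'match($0, /msgcontent1=.*[|]/) { print substr($0, RSTART + 12, RLENGTH - 13) }')

# Bounded: [^|]* stops before the first pipe after the key.
bounded=$(printf '%s\n' "$s" |
  awk 'match($0, /msgcontent1=[^|]*/) { print substr($0, RSTART + 12, RLENGTH - 12) }')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```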
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage

Using Awk to search for a string that has spaces

I'm having trouble searching for the last occurrence of a string in a file using awk. I'm passing a string to the script, for example "Ping has failed on hostname". I keep getting awk: ^ unterminated string.
#!/bin/sh
LOG=/opt/netcool/omnibus/log/mttrapd.log
TMP_FILE=sitescope.$$
args="$*"
#ruby sitescope.rb
echo "looking for $1 "
tail -1000 $LOG > $TMP_FILE
echo "WORD = $args"
awk '"/'$args'/" {f=$0} END{print f}' $TMP_FILE > data.out
rm -f $TMP_FILE
Rather than play quoting games, pass the shell variable to awk with the -v option
awk -v pattern="$*" 'match($0, pattern) {f=$0} END {print f}' "$TMP_FILE" > data.out
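A small sketch of the -v approach with a phrase that contains spaces. The log lines and the phrase are invented for the example; in the script the phrase would come from "$*".

```shell
# Hypothetical log excerpt.
log=$(printf '%s\n' \
  'Ping has failed on hostA 10:00' \
  'Disk ok on hostA' \
  'Ping has failed on hostA 10:05' \
  'CPU ok on hostA')

# Illustrative search phrase with spaces; quoting keeps it as one value.
phrase='Ping has failed on hostA'

last=$(printf '%s\n' "$log" |
  awk -v pattern="$phrase" '
    $0 ~ pattern { f = $0 }   # remember the most recent matching line
    END          { print f }  # after the last line, print the final match
  ')
printf '%s\n' "$last"
```

Because the phrase is passed as a quoted variable value, the shell never splits it into separate words, which is what caused the unterminated string error in the original script.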
The point of the single quotes around the awk program is to keep everything in one argument (and prevent shell substitution). You can be a bit more flexible in how you put that argument together, for example:
awk "/$args/"' {f=$0} END{print f}' $TMP_FILE > data.out