awk: search pattern passed through a variable

We have written a shell script that checks multiple file names against a search pattern.
File format:
<number>_<20180809>.txt
The name starts with a single number and ends with an 8-digit number.
Command:
awk -v string="12_1234" -v serch="^[0-9]+_+[0-9][0-9][0-9][0-9]$" "BEGIN{ if (string ~/serch$/) print string }"
If the string matches, the command should print it.

You can just change your command in the following way and it will work:
awk -v string='12_1234' -v search='^[0-9]+_+[0-9][0-9][0-9][0-9]$' 'BEGIN{ if (string ~ search) print string }'
12_1234
You do not need the /.../ syntax when the right-hand side of the ~ operator is a variable: /serch$/ looks for the literal text serch rather than the contents of the variable. You also had one extra $. You were really close!
Then adapt the search regex to ^[0-9]_[0-9]{8}$ so that it matches exactly your <number>_<20180809> pattern.
If you are just extracting this information from a file, you can also use grep:
$ awk -v string='1_12345678' -v search='^[0-9]_[0-9]{8}$' 'BEGIN{ if (string ~ search) print string }'
1_12345678
$ (search='^[0-9]_[0-9]{8}$'; echo '1_12345678')| grep -oE "$search"
1_12345678
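As a rough sketch of how the corrected command might be wrapped to check several file names, assuming the names carry the .txt suffix shown in the question. The function name match_name is made up for illustration, and the eight digits are spelled out instead of {8} only so that it also runs on awk versions without interval expressions:

```shell
# Hypothetical helper: prints the name only if it looks like <digit>_<8 digits>.txt
match_name() {
    # [.] is used instead of \. because backslashes in -v values are
    # processed as escape sequences before the regexp is ever built.
    awk -v name="$1" \
        -v re='^[0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.]txt$' \
        'BEGIN { if (name ~ re) print name }'
}

match_name '1_20180809.txt'   # printed, it fits the pattern
match_name '12_1234.txt'      # nothing printed
```

Because everything happens in BEGIN, awk never reads any input; it only tests the value passed in.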

Related

Testing it awk for data with square brackets [syslog]

I have a text file like this:
File1 [test]
File1 sgfg
File1 fdgsfg
File1 [rsyslog]
File1 moredata
File1 MAX_EVENTS = 256
File1 fgsfg
File1 [other]
File1 Not this
File2 [syslog]
File2 extra
File2 MAX_EVENTS = 12
With awk I would like to match field $2 when it contains [syslog].
For example, this works:
awk '$2~/\[syslog\]/' file
But I would like to define the text to match in advance, in a variable.
These do not work:
awk -v var="[syslog]" '$2~var' file
awk -v var="\[syslog\]" '$2~var' file
awk -v var="syslog" '{test="["var"]"} $2~test' file
This works, since both sub calls must succeed as well as the text match, but it is complicated :)
awk -v var="syslog" 'sub(/^\[/,"",$2) && sub(/\]/,"",$2) && $2==var' file
Working cases:
$ awk -v var='[syslog]' 'index($2, var)' file
File2 [syslog]
$ awk -v var='syslog' '$2~"\\[" var "\\]"' file
File2 [syslog]
$ awk -v var='[[]syslog[]]' '$2~var' file
File2 [syslog]
Basically take care of the escaping, or don't use regex matching.
As Ed kindly mentioned in the comment, ] alone does not need to be escaped:
awk -v var='syslog' '$2~"\\[" var "]"' file
awk -v var='[[]syslog]' '$2~var' file
You didn't say if you wanted a full or partial match, or if you wanted a string or regexp match, so here are some options:
Full string match:
awk -v var='[syslog]' '$2 == var' file
Partial string match:
awk -v var='[syslog]' 'index($2,var)' file
Full regexp match:
awk -v var='[[]syslog]' '$2 ~ "^"var"$"' file
Partial regexp match:
awk -v var='[[]syslog]' '$2 ~ var' file
There are of course, many other ways to do that too including escaping regexp metachars within the awk script to make them literal, specifying the string between [...] in the var then adding them in the awk script, matching just at the start or end of the field, etc.
See How do I find the text that matches a pattern? for more info on the different kinds of matching and Is it possible to escape regex metacharacters reliably with sed (applies to awk too) for how to escape regexp metachars to make them be treated as literal.
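One of the other ways mentioned above, keeping only the text between the brackets in the variable and adding the brackets inside the awk script, could look roughly like this. The function name find_section is an assumption made for this sketch:

```shell
# Hypothetical helper: reads data on stdin and prints lines whose second
# field is exactly [name], using a string comparison (no regexp involved).
find_section() {
    awk -v name="$1" '$2 == "[" name "]"'
}

printf '%s\n' 'File1 [rsyslog]' 'File2 [syslog]' 'File2 extra' | find_section syslog
```

Since == compares strings, the brackets need no escaping at all here.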
How about something like this?
awk -v var="[syslog]" '$2 == var' my_file
A bit of explanation. If you don't need regular expression matching you can just use == operator which compares strings literally.
Your "Not working" examples weren't working because:
1. The regular expression is not correct. [syslog] is a bracket expression that matches a single character, any of s, y, l, o, g.
2. The escaping is not correct; var="\\\\[syslog\\\\]" would have worked. awk should have warned you about this with the message awk: warning: escape sequence '\[' treated as plain '['.
3. This builds the same string [syslog] as the first attempt, so it is again a bracket expression and fails for the same reason.
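To make the difference visible, here is a small sketch that runs the same variable once as a regexp and once as a string against a few rows taken from the question's file. The variable names are only for illustration:

```shell
# A few sample rows from the question.
sample='File1 [test]
File1 [rsyslog]
File1 MAX_EVENTS = 256
File2 [syslog]
File2 extra'

# As a regexp, [syslog] is a bracket expression: any one of s, y, l, o, g.
as_regexp=$(printf '%s\n' "$sample" | awk -v var='[syslog]' '$2 ~ var')

# As a string, [syslog] has to equal the whole field.
as_string=$(printf '%s\n' "$sample" | awk -v var='[syslog]' '$2 == var')

printf 'regexp match:\n%s\n\nstring match:\n%s\n' "$as_regexp" "$as_string"
```

The regexp version also picks up [test] and [rsyslog], because each contains at least one of those letters, while the string version prints only the [syslog] row.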

Recognising backslash in awk field separator

Input is
AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999
The awk script sets the field separator to "\x0D" (I have tried with and without escaping the backslash).
The awk script is:
BEGIN {FS="\\x0D"}
{print NF}
It should output 3 because there are 2 occurrences of the field separator, but it outputs 1, which indicates that the separator is not being recognized.
There are 2 ways to provide a regexp in awk - a static regexp (aka regexp literal) written as /regexp/ and a dynamic regexp (aka computed regexp) written as "regexp" and used in a regexp context. A field separator is just a regexp with some additional behavior, so let's just consider regexps in general to explain what's going on in your example.
The split() function takes a field separator (a regexp for our purposes) as its third argument, so it provides a good test bed:
Using a static regexp:
$ awk '{print split($0,a,/\x0D/)}' file
1
The \ above is escaping the x, it's not a literal \. For that you need to escape the \ itself:
$ awk '{print split($0,a,/\\x0D/)}' file
3
What if we used a dynamic regexp instead of the above static regexp?
$ awk '{print split($0,a,"\x0D")}' file
1
$ awk '{print split($0,a,"\\x0D")}' file
1
$ awk '{print split($0,a,"\\\x0D")}' file
' is not a known regexp operator FNR=1) warning: regexp escape sequence `\
1
$ awk '{print split($0,a,"\\\\x0D")}' file
3
The behavior above is because awk first parses the string to convert it into a regexp (using up one layer of escape chars) and then parses it a second time when using it as a regexp (using up a second layer of escape chars).
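The two layers can be looked at separately. This is only a sketch, with the question's input held in a shell variable and variable names chosen for illustration:

```shell
input='AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999'

# Layer 1: the string literal "\\\\x0D" is parsed down to the text \\x0D.
after_string_parse=$(awk 'BEGIN { print "\\\\x0D" }')

# Layer 2: used as a regexp, \\x0D means a literal backslash followed by x0D.
pieces=$(printf '%s\n' "$input" | awk '{ print split($0, a, "\\\\x0D") }')

printf 'string after first parse: %s\nnumber of pieces: %s\n' \
    "$after_string_parse" "$pieces"
```

Printing the string shows what is left for the regexp engine: two backslashes, which it then reads as one literal backslash.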
Unfortunately when you specify a FS there is no option to specify it as a literal regexp, it's always specified using a string and thus is a dynamic regexp and so needs an extra layer of escaping:
$ awk -v FS='\x0D' '{print NF}' file
1
$ awk -v FS='\\x0D' '{print NF}' file
1
$ awk -v FS='\\\x0D' '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS='\\\\x0D' '{print NF}' file
3
Now - what if you were using the wrong type of quotes in the shell part of the script, i.e. " instead of '? Then you introduce even more pain because now you're inviting the shell to also parse the string even before awk gets to see and parse it twice:
$ awk -v FS="\\\\x0D" '{print NF}' file
1
$ awk -v FS="\\\\\x0D" '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS="\\\\\\x0D" '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS="\\\\\\\x0D" '{print NF}' file
3
That's different from the case where the double quotes are used inside awk, because that's all wrapped inside single quotes and so protected from the shell already:
$ awk 'BEGIN{FS="\\\\x0D"} {print NF}' file
3
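To see what awk actually receives in each case, one can print the argument after the shell has finished with it. A small sketch; the variable names are made up:

```shell
# Single quotes: the shell passes the text through untouched.
from_single=$(printf '%s' '\\\\x0D')

# Double quotes: the shell turns each \\ into \ and leaves \x alone,
# so seven backslashes are needed to end up with the same four.
from_double=$(printf '%s' "\\\\\\\x0D")

printf 'single: %s\ndouble: %s\n' "$from_single" "$from_double"
```

Both lines print the same text, which is why the seven-backslash double-quoted form behaves like the four-backslash single-quoted one.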
So - in the shell always use the most restrictive quotes (' over " over none) unless you have a very specific reason not to, and when using regexps or field separators always use literal /.../ rather than dynamic "...", again unless you have a very specific reason not to.
The odd, truncated-looking error messages above are caused by the \r characters the tool tries to print as part of the escape sequence we're providing; they're really all warning: regexp escape sequence '\^M' is not a known regexp operator
You need two backslashes in the regexp for a literal backslash, since \ is an escape character, and each of those has to be doubled again inside the string assigned to FS:
$ echo 'AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999' |
awk 'BEGIN{ FS="\\\\x0D" } { print NF }'
3

AWK that reads up to the /

I have the following lines of text :
170311 005201 0433 DE(N) itemhandling itemAddBarCodeData: Barcode(1/1) <0157357069/OK> ##[ti=7672,
170311 005323 0433 DE(N) itemhandling itemAddBarCodeData: Barcode(1/1) </NOREAD> ##[ti=7672,
I have the following script :
grep "itemAddBarCodeData" %myItemHandling% | gawk -F "[<>]+" -v OFS=, "{for(i=1;i<=NF;++i){if($i~/Barcode/){print substr($1,5,2)substr($1,3,2)substr($1,1,2),substr($1,8,6),$(i+1)}}}" > %myOutputPath%%myFilename%
What I need is a script that reads only the text after the / (NOREAD or OK), so that the output looks like this:
11-03-17,00:52:01,OK
11-03-17,00:53:23,NOREAD
any help would be greatly appreciated
Thanks
Complex gawk approach (patsplit is a gawk extension):
gawk -F"[ />]" '{patsplit($1, a, /[0-9]{2}/); patsplit($2, b, /[0-9]{2}/);
printf("%s-%s-%s,%s:%s:%s,%s\n",a[3],a[2],a[1],b[1],b[2],b[3],$10)}' inputfile
The output:
11-03-17,00:52:01,OK
11-03-17,00:53:23,NOREAD
-F"[ />]" - "composite" field separator: a space, a / or a > each ends a field
patsplit(string, array [, fieldpat [, seps ] ]) - divides string into pieces defined by fieldpat and stores the pieces in array and the separator strings in the seps array.
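To check where $10 comes from, it can help to list the fields that this separator produces for one of the question's lines. A quick sketch, with the line stored in a shell variable named for illustration:

```shell
line='170311 005201 0433 DE(N) itemhandling itemAddBarCodeData: Barcode(1/1) <0157357069/OK> ##[ti=7672,'

# Print every field with its number, using the same composite separator.
fields=$(printf '%s\n' "$line" |
    awk -F'[ />]' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }')

printf '%s\n' "$fields"
```

The / inside Barcode(1/1) also splits a field, which is why the status ends up in field 10 rather than 9.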
You can use this following script:
script.awk
/\/[A-Z]+>/ { match($1"-"$2,/(..)(..)(..)-(..)(..)(..)/,ts)
dt=mktime( sprintf("20%s %s %s %s %s %s",
ts[1], ts[2], ts[3],
ts[4], ts[5], ts[6]) )
dtd = strftime( "%d-%m-%y", dt )
dts = strftime( "%H:%M:%S", dt )
match ( $0, /\/[A-Z]+>/) # set RSTART and RLENGTH
print dtd, dts, substr( $0, RSTART+1, RLENGTH-2)
}
Run it like this: awk -v OFS=, -f script.awk yourfile
The important part is the second match function call, which matches a string of capital letters [A-Z], preceded by a / and followed by a >.
It should match the OK and NOREAD cases and not the Barcode(1/1).
The variables RSTART and RLENGTH are set by the match function; we have to correct them by +1 and -2, because the matched RE includes the / and the >.
The first match, mktime, strftime and the sprintf function call are another way to format the date and time. The three-argument form of match and the time functions are GNU AWK extensions.
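The RSTART/RLENGTH part on its own does not depend on the GNU extensions. A minimal sketch of just that step, with a made-up function name and shortened sample lines:

```shell
# Hypothetical helper: prints only the capital-letter status found
# between a / and a > on each input line.
extract_status() {
    awk 'match($0, /\/[A-Z]+>/) {
        # The match covers "/STATUS>", so skip the / and drop the >.
        print substr($0, RSTART + 1, RLENGTH - 2)
    }'
}

printf '%s\n' 'Barcode(1/1) <0157357069/OK> ##' 'Barcode(1/1) </NOREAD> ##' |
    extract_status
```

Lines without such a status print nothing, since match returns 0 for them.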
Regular awk version:
awk '
{
d=$1$2
gsub(/../,"& ",d)
split(d,T)
split($8,R,"[/>]")
printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]
}
' file
With script in file:
script.awk:
{
d=$1$2
gsub(/../,"& ",d)
split(d,T)
split($8,R,"[/>]")
printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]
}
awk -f script.awk file
Crammed onto one line:
awk '{d=$1$2; gsub(/../,"& ",d); split(d,T); split($8,R,"[/>]"); printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]}' file
You don't need grep when you're using awk. With GNU awk for gensub():
$ awk '/itemAddBarCodeData/{print gensub(/(..)(..)(..) (..)(..)(..).*\/([^>]+).*/,"\\3-\\2-\\1,\\4:\\5:\\6,\\7",1)}' file
11-03-17,00:52:01,OK
11-03-17,00:53:23,NOREAD
Here's a pragmatic combination of awk and sed that is conceptually relatively simple:
On Linux and BSD/macOS:
awk -F'[ />]' -v OFS=, '/itemAddBarCodeData/ {print $1, $2, $10}' file |
sed -E 's/^(..)(..)(..),(..)(..)(..)/\3-\2-\1,\4:\5:\6/'
On a Windows system, invoked from cmd.exe, different quoting and line continuation rules apply (assumes the presence of ported GNU utilities):
awk -F"[ />]" -v OFS=, "/itemAddBarCodeData/ {print $1, $2, $10}" file ^
| sed -E "s/^(..)(..)(..),(..)(..)(..)/\3-\2-\1,\4:\5:\6/"
Note how:
"..." strings rather than '...' strings must be used to protect the embedded content from interpretation by the shell
Unlike with "..." on Unix, $ has no special meaning to cmd.exe, so it can be used as-is.
^ as the very last character on a line serves as the explicit line-continuation character, and the line must be broken before the | (whereas on Unix a line ending in | is implicitly continued).
This is only used for readability here; of course, you can place your command on a single line.

How to use variable including special symbol in awk?

In my case, if a certain pattern is found as the second field of a line in a file, I need to print the first two fields. It should also handle special symbols such as the backslash.
My solution is to first use sed to replace \ with \\, then pass the new variable to awk; awk then parses \\ as \ and matches field 2.
escaped_str=$( echo "$pattern" | sed 's/\\/\\\\/g')
input | awk -v awk_escaped_str="$escaped_str" '$2==awk_escaped_str { $0=$1 " " $2 " "}; { print } '
But this seems too complicated, and it cannot handle every case.
Is there a simpler way that also covers all other special symbols?
The way to pass a shell variable to awk without backslashes being interpreted is to pass it in the arg list instead of populating an awk variable outside of the script:
$ shellvar='a\tb'
$ awk -v awkvar="$shellvar" 'BEGIN{ printf "<%s>\n",awkvar }'
<a b>
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]=""; printf "<%s>\n",awkvar }' "$shellvar"
<a\tb>
and then you can search a file for it as a string using index() or ==:
$ cat file
a b
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } index($0,awkvar)' "$shellvar" file
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } $0 == awkvar' "$shellvar" file
a\tb
You need to set ARGV[1]="" after populating the awk variable to avoid the shell variable value also being treated as a file name. Unlike any other way of passing in a variable, ALL characters used in a variable this way are treated literally with no "special" meaning.
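Applied to the question's task (print the first two fields when field 2 equals the text), the ARGV approach might be wrapped like this. The function name print_if_field2_is is an assumption for this sketch, and it reads its data from stdin:

```shell
# Hypothetical helper: $1 is the literal text to compare with field 2.
print_if_field2_is() {
    awk 'BEGIN { want = ARGV[1]; ARGV[1] = "" }   # take the value, then hide it from file processing
         $2 == want { print $1, $2 }' "$1"
}

# The backslash in the data and in the argument is kept as-is.
printf '%s\n' 'x a\tb z' 'y ab z' | print_if_field2_is 'a\tb'
```

With ARGV[1] emptied and no other file arguments, awk falls back to reading standard input.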
There are three variations you can try without needing to escape your pattern:
This one compares literal strings; nothing is interpreted as a regex:
$2 == expr
This one tests whether a literal string is a substring of the field:
index($2, expr)
This one tests a regex pattern:
$2 ~ pattern

awk use a command line variable

awk -F, -f awkfile.awk -v mysearch="search term"
I am trying to use the above command from the terminal and use mysearch as the search term in the awk program. My awk program runs perfectly fine when I assign the search term inside the program, but how do I get the variable mysearch to be used?
For example, the line where it is used is if($j ~ /mysearch/){. This does not use the variable as the search term; it actually searches for the string mysearch.
Just remove the slashes:
$j ~ mysearch
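In context that could look something like the following. The loop over the fields and the function name search_fields are guesses at what the program does, since the question does not show it:

```shell
# Hypothetical helper: $1 is the regexp, comma-separated data comes from stdin.
search_fields() {
    awk -F, -v mysearch="$1" '{
        for (j = 1; j <= NF; j++)
            if ($j ~ mysearch)          # no slashes: use the variable as the regexp
                print NR ": field " j ": " $j
    }'
}

printf '%s\n' 'a,search term,c' 'x,y,z' | search_fields 'search term'
```

The value is treated as a regexp here, so characters such as . or [ keep their special meaning.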
This is not ideal, but I suggest writing a bash script that takes the search term, substitutes it into the awk script, then runs the script. For example:
$ cat dosearch.sh
sed "s/XXX/$1/" awktemplate.awk > awkfile.awk
awk -f awkfile.awk data.txt
$ cat awktemplate.awk
{
j = 1
if ($j ~ /XXX/) {
# Do something, such as
print "Found:", $0
}
}
$ cat data.txt
foo here
bar there
xyz everywhere
$ ./dosearch.sh foo
Found: foo here
$ ./dosearch.sh bar
Found: bar there
In the above example, the awk template contains "XXX" as a search term; the bash script replaces that search term with the first parameter, then invokes awk on the modified script.
$ cat input
tinky-winky
dipsy
laa-laa
noo-noo
po
$ teletubby='po'
$ awk -v "regexp=$teletubby" '$0 ~ regexp' input
po
Note that anything could go into the shell variable, even a full-blown regexp, e.g. ^d.*y. Just make sure to use single quotes to prevent the shell from doing any expansion.
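For instance, with the regexp mentioned above in the shell variable. The data is fed through printf here only to keep the sketch self-contained:

```shell
# Single quotes keep the shell from expanding * or anything else.
teletubby='^d.*y'

matches=$(printf '%s\n' tinky-winky dipsy laa-laa noo-noo po |
    awk -v "regexp=$teletubby" '$0 ~ regexp')

printf '%s\n' "$matches"
```

Only dipsy starts with d and has a y later on, so it is the only line printed.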