awk: any way to access the matched groups in action? [duplicate]

This question already has answers here:
AWK: Access captured group from line pattern
(7 answers)
Closed 8 years ago.
I often find myself doing the same match in the action as the pattern, to access some part of the input record, e.g.
/^Compiled from \"(.*)\"$/ {
file_name = gensub("^Compiled from \"(.*)\"$", "\\1", "g");
print file_name;
}
So the regexp matching is done twice. Is there any way I can access \\1 in the action without matching again?
I am trying to reduce both the pattern matching and the extra code.

Unfortunately, GAWK doesn't have the carry-forward feature of sed, which reuses the last pattern via an empty //.
sed '/\(patt\)ern/ {s//new\1/}' inputfile
However, you can rejoice since variables have recently been invented and they can be used for just this purpose!
BEGIN {
pattern = "^Compiled from \"(.*)\"$"
}
$0 ~ pattern {
file_name = gensub(pattern, "\\1", "g");
print file_name;
}

Related

Replace part of string with variable using sed or awk [duplicate]

This question already has answers here:
replace parts of url with sed in a file
(2 answers)
Closed 1 year ago.
Is there a way to replace only a part of a string with a random number using sed or awk?
From a URL list file, I need to replace each GET url param that matches the pattern with a random number.
Example:
https://example.com?param1=somevalue&param2=somevalue
https://otherexample.com?param1=somevalue&param2=somevalue
Replace the value of "param1=" with a random number
So, after replace:
https://example.com?param1=512512412&param2=somevalue
https://otherexample.com?param1=9568478547&param2=somevalue
Basically the same syntax for sed and awk - the extended regex is
s/(param)([0-9])=([0-9]{3,})/\1\2={ … fill in the rand number you want … }/g
On the awk side, either mawk or gawk:
match($0, /(param)([0-9])=([0-9][0-9]+)/)
$0 = substr($0, 1, RSTART + 6) { … fill in rand # … } substr($0, RSTART + RLENGTH)
(RSTART + 6 keeps everything up to and including the "=", since "paramN=" is seven characters; the pieces are glued together by awk's implicit string concatenation.)
Unfortunately, mawk2 has neither backreferences like \1 \2 nor interval expressions like {3,}, so if you want to write portable code then match() is the way to go.
If you can stay on gawk, then gensub() works reasonably well.

Find strings for 'awk' in the squid log [duplicate]

This question already has answers here:
How do I use shell variables in an awk script?
(7 answers)
Closed 5 years ago.
I want to do the following with the code below: when I find either of these 2 strings, I want to get the IP from the same line that contains the string (at least one of them) and write that IP into a .txt file so that I can handle it in squid.conf.
I'm trying to build a Splash Page in squid, and I only have the features of IPcop. The code that I put up does not work because it compares any string, not the ones I need. Can anyone help?
#!/bin/sh
TAIL="/usr/bin/tail -f"
SQUID="/var/log/squid/access.log"
PRINCIPAL1="http://cartilha.cert.br/"
PRINCIPAL2="cartilha.cert.br:443"
LOG="/tmp/autenticados.txt"
$TAIL $SQUID | gawk '{if ($7 = $PRINCIPAL1 || $7 = $PRINCIPAL2) {print $3} }' >> $LOG
Passing the variables with -v did not work with the conditional either.
I do not think the topic is a duplicate, because the one they pointed to has no conditional whatsoever.
I tried putting == instead of just = but doing so it does not output anything.
The idea is simple: when one of the links above is accessed, I need to pick up which IP accessed it and drop it into a txt file ... just that.

user input inside awk -- or -- variable usage in search pattern [duplicate]

This question already has answers here:
get the user input in awk
(3 answers)
Closed 6 years ago.
I'm trying to take user input in pure [g]awk code. Now the requirement is that I want the user to enter either today or current - number of days to generate a report. I can't find any routine inside awk to read the user's input. Some time back I read a document on awk where it was done using either sprintf or printf, but I don't know how.
OR
in awk, I'm using a BEGIN block to set up a variable and then search based on it, but the variable-based search is not working. Something like below:
awk -F "|" ' BEGIN { PWR="Nov 3"; }
/Deployment started at PWR/ { print $1 + $NF }' /var/log/deployments
this flatly refuses to search for the pattern "Deployment started at Nov 3".
Inside the regex slashes, you don't have access to your variables. What you can do is make a string out of the search phrase then apply that string as a regex.
awk -F "|" ' BEGIN { PWR="Nov 3"; }
$0 ~ "Deployment started at "PWR { print $1 + $NF }' /var/log/deployments

Change FS and RS to parse newline char [duplicate]

This question already has answers here:
Read lines from a file into a Bash array [duplicate]
(6 answers)
Closed 6 years ago.
I'm using awk in a shell script to parse a file.
My question has been marked as a duplicate of another one, but I want to use awk and I didn't find the same question.
Here is the file format:
Hi everyone I'm new\n
Can u help me please\n
to split this file\n
with awk ?\n
The result I hope:
tab[0]=Hi everyone I'm new
tab[1]=Can u help me please
tab[2]=to split this file
tab[3]=with awk ?
So I tried to change the FS and RS values to get what I wanted, but without success. Here is what I tried:
config=`cat $1`
tab=($(echo $config | awk '
{
for (i = 1; i < (NF); i++)
print $i;
}'))
And what I get:
Hi
everyone
I'm
new
Can
u
help
me
please
to
split
this
file
with
awk
Do you know how to proceed please? :/
The problem is that however you parse the file in awk, it's returned to the shell as a simple string.
AWK splits a file into records (line ending in \n), and records are further split into fields (separated by FS, space by default).
In order to assign the returned string to an array, you need to set the shell's IFS to newline, or assign the lines to array items one by one (you can filter the record with NR, which would then require you to read the file several times with AWK).
Your best course of action is to just print the records in AWK and assign them to a bash array using compound assignment, with IFS set to newline character
#!/bin/bash
declare -a tab
IFS='
'
# Compound assignment: array=(words)
# Print record: { print } is the same as { print $0 }
# where $0 is the record and $1 ... $N are the fields in the record
tab=($(awk '{ print }' file))
unset IFS
for index in "${!tab[@]}"; do
echo "${index}: ${tab[index]}"
done
# Output:
# 0: Hi everyone I'm new
# 1: Can u help me please
# 2: to split this file
# 3: with awk ?
Notice that awk is hardly used at all here, and could be replaced with a simple cat.
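As an aside, on bash 4 or newer the mapfile builtin (a.k.a. readarray) avoids the IFS juggling entirely; a sketch:

```shell
#!/bin/bash
# mapfile reads one line per array element, with no word splitting,
# so IFS never needs to be changed or restored.
mapfile -t tab < file
for index in "${!tab[@]}"; do
    echo "${index}: ${tab[index]}"
done
```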

format regexp constant on several lines for readability

For learning purposes I am implementing a little regexp matcher for telephone numbers. My goal is readability, not the shortest possible gawk program:
# should match
#1234567890
#123-456-7890
#123.456.7890
#(123)456-7890
#(123) 456-7890
BEGIN{
regexp="[0-9]{10},[0-9]{3}[-.][0-9]{3}[.-][0-9]{4},\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
len=split(regexp,regs,/,/)
}
{for (i=1;i<=len;i++)
if ($0 ~ regs[i]) print $0
}
For better readability I would like to split the regexp="... assignment over several lines, like:
regexp="[0-9]{10}
,[0-9]{3}[-.][0-9]{3}[.-][0-9]{4}
,\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
Is there an easy way to do this in awk?
BEGIN {
regs[1] = "[0-9]{10}"
regs[2] = "[0-9]{3}[-.][0-9]{3}[.-][0-9]{4}"
regs[3] = "\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
c = 3
}
{
for (i = 1; i <= c; i++)
if ($0 ~ regs[i])
print $0
}
If your awk implementation supports length(array), use it (see Jaypal Singh's comments below):
BEGIN {
regs[1] = "[0-9]{10}"
regs[2] = "[0-9]{3}[-.][0-9]{3}[.-][0-9]{4}"
regs[3] = "\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
}
{
for (i = 1; i <= length(regs); i++)
if ($0 ~ regs[i])
print $0
}
Consider also the side effects of the computed (dynamic) regular expressions,
see the GNU awk manual for more information.
The following link may contain the answer you were looking for :
http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html
It says that in awk script files or on the command line of certain shells, awk commands can be split over several lines in the same manner as makefile commands. Simply end the line with a backslash (\) and awk will discard the newline character upon parsing. Combine this with implicit concatenation of strings (similar to C) and the solution could be
BEGIN {
regexp = "[0-9]{10}," \
"[0-9]{3}[-.][0-9]{3}[.-][0-9]{4}," \
"\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
len = split(regexp, regs, /,/)
}
Nevertheless, I would favor the solution that stores the regular expressions in an array directly: it better reflects the intent of the statement and doesn't force the programmer to do any more work than required. Also, there is no need for the length function since one can use the foreach syntax. One should note that arrays in awk are like maps in Java or dictionaries in Python in that they don't associate a range of integer indices with values. Rather they map string keys to values. Even if integers are used as keys, they are implicitly converted to a string. Thus the length function is not always provided since it is misleading.
BEGIN {
regs[1] = "[0-9]{10}"
regs[2] = "[0-9]{3}[-.][0-9]{3}[.-][0-9]{4}"
regs[3] = "\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
}
{
for (i in regs) { # i receives each key added to the regs array
if ($0 ~ regs[i]) {
print # by default `print' prints the whole record
break # we can stop after the first matching regexp
}
}
}
Note that the break command exits the for loop prematurely. This is necessary if each record must only be printed once, even though several regular expressions could match.
Well you can store the regexp in variables, then join them, e.g.:
awk '{
COUNTRYCODE="WHATEVER_YOUR_COUNTRY_CODE_REGEXP"
CITY="CITY_REGEXP"
PHONENR="PHONENR_REGEX"
THE_WHOLE_THING=COUNTRYCODE CITY PHONENR
if ($0 ~ THE_WHOLE_THING) { print "BINGO" }
}'
HTH
The consensus seems to be that there is no simple way to split multiline strings without disturbing awk? Thanks for the other ideas, but they make me as the programmer do the work of the computer, which I don't enjoy. So I came up with this solution, which in my opinion is pretty close to a kind of executable specification. I use bash, here documents and process substitution to create the files for awk on the fly:
#!/bin/bash
# numbers that should be matched
read -r -d '' VALID <<'valid'
1234567890
123-456-7890
123.456.7890
(123)456-7890
(123) 456-7890
valid
# regexp patterns that should match
read -r -d '' PATTERNS <<'patterns'
[0-9]{10}
[0-9]{3}\.[0-9]{3}\.[0-9]{4}
[0-9]{3}-[0-9]{3}-[0-9]{4}
\([0-9]{3}\) ?[0-9]{3}-[0-9]{4}
patterns
gawk --re-interval 'NR==FNR{reg[FNR]=$0;next}
{for (i in reg)
if ($0 ~ reg[i]) print $0}' <(echo "$PATTERNS") <(echo "$VALID")
Any comments are welcome.
I want to introduce my favorite approach, as it hasn't been mentioned yet. I like to use awk's simple string append operation, which is just the default operator between two terms, like implicit multiplication in typical math notation:
x = x"more stuff"
appends "more stuff" to x and sets the new value to x again. So you can write
regexp = ""
regexp = regexp"[0-9]{10}"
regexp = regexp"[0-9]{3}[-.][0-9]{3}[.-][0-9]{4}"
regexp = regexp"\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
To control additional separator characters like newlines between the snippets, most languages I know of, and awk too, can use array join and split methods to make a string from an array and to convert the string back into an array, without losing the original structure of the array (for example the newline markers):
i = 0
regexp[i++] = "[0-9]{10}"
regexp[i++] = "[0-9]{3}[-.][0-9]{3}[.-][0-9]{4}"
regexp[i++] = "\\([0-9]{3}\\) ?[0-9]{3}-[0-9]{4}"
Using regstr = join(regexp, ",") then adds the split character "," you used.
Of course there is no join function in awk, but I guess it is very simple
to implement, knowing the string append operation above.
My method may seem more verbose, but it has the advantage that the original data, the regexp string snippets in this part, are prefixed by the same constant string for each snippet. That means the code can be generated by a very simple algorithm (or even some editor shortcuts).