user input inside awk -- or -- variable usage in search pattern [duplicate]

This question already has answers here:
get the user input in awk
(3 answers)
Closed 6 years ago.
I'm trying to take user input in pure [g]awk code. The requirement is that the user enters either today's date or the current date minus a number of days, to generate a report. I can't find any routine inside awk to read the user's input. Some time back I read a document on awk where it was done using either sprintf or printf, but I don't know how.
OR
In awk, I'm using the BEGIN block to set up a variable and then trying to search based on it, but the variable is not being used in the search. Something like the below:
awk -F "|" ' BEGIN { PWR="Nov 3"; }
/Deployment started at PWR/ { print $1 + $NF }' /var/log/deployments
This stubbornly refuses to match any line with the pattern "Deployment started at Nov 3".

Inside the regex slashes you don't have access to your variables. What you can do is build a string containing the search phrase and then apply that string as a regex.
awk -F "|" ' BEGIN { PWR="Nov 3"; }
$0 ~ "Deployment started at "PWR { print $1 + $NF }' /var/log/deployments

Related

filter unique parameters from file

I have a file containing URLs plus params, like the following:
https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
https://example.com/endpoint/endpoint2/?param1=123
https://example.com/endpoint/endpoint2/
https://example.com/endpoint/endpoint5/"//i.example.com/00/s/Nzk5WDEwMjQ=/z/47IAAOSwBu5hXIKF
and I need to keep only the URLs with unique params.
The desired output:
https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
I managed to filter only the URLs that have params with grep:
grep -E '(\?[a-zA-Z0-9]{1,9}\=)'
but I need to filter on the params at the same time, so I tried awk with the same regex, but it gives an error:
awk '{sub(\?[a-zA-Z0-9]{1,9}\=)} !seen[$0]++'
Update
I am sorry for editing the desired output, but when I tried the scripts I figured out that there is a lot of garbage in my file that needs to be filtered out too.
I tried @James Brown's answer with some editing and it looks good, except that unfortunately it does not filter the last line:
awk -F '?|&' '$2&&!a[$2]++'
And to be more clear about why that output is good for me:
it chose the 1st line because it has at least param1,
the 2nd line because it has at least param3,
the 3rd line because it has at least param2.
The comparison method here is to keep just the unique first parameter, whether or not it is concatenated with other parameters by the & char.
Edited version after the requirements changed some:
$ awk -F? '{ # ? as field delimiter
split($2,b,/&/) # split at & to get whats between ? and &
if(b[1]!=""&&!a[b[1]]++) # no ? means no $2
print
}' file
Output as expected. Original answer was:
A short one:
$ awk -F? '$2&&!a[$2]++' file
Explained: split records at ? (-F?); if there is a second field ($2) and (&&) it is unique so far, tracked by counting instances of the parameter string in array a (!a[$2]++), output the record.
EDIT: The following solution may help when the query string has both ? and & present in it and we want to consider both of them when removing duplicates.
awk '
/\?/{
  match($0,/\?[^&]*/)               # first param: from ? up to (not including) the first &
  val=substr($0,RSTART,RLENGTH)
  match($0,/&.*/)                   # the rest of the query string, if any
  if(!seen[val]++ && !seen[substr($0,RSTART,RLENGTH)]++){
    print
  }
}' Input_file
2nd solution (may help when there are no & parameters in the query string): with your shown samples, please try the following awk program.
awk 'match($0,/\?.*$/) && !seen[substr($0,RSTART,RLENGTH)]++' Input_file
OR the above could be shortened as follows (as per Ed sir's suggestions); note the parentheses around the assignment, since without them s=index($0,"?")&&... would assign the result of the whole && expression to s:
awk '(s=index($0,"?")) && !seen[substr($0,s)]++' Input_file
Explanation: the simple explanation would be: use awk's match function, which matches everything from ? to the end of the line, then add an AND condition to make sure we only output lines whose matched value has not been seen before.
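For reference, match() sets RSTART (the 1-based position of the match) and RLENGTH (its length); a quick sketch with a made-up URL:
echo 'https://example.com/?p=1&q=2' |
awk '{ if (match($0, /\?[^&]*/)) print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }'
prints 21 4 ?p=1.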
With gnu awk, you could also match the URL up to the first occurrence of the question mark, and then capture the first parameter using your initial pattern ([a-zA-Z0-9]{1,9}=[^&]+), where [^&]+ matches any characters except the &.
Then you can apply the !seen[...]++ idea to the value of capture group 1:
awk '
match($0, /https?:\/\/[^?]+\?([a-zA-Z0-9]{1,9}=[^&]+)/, arr) && !seen[arr[1]]++
' file
Output
https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
Using awk, you can check that the string starts with the protocol and contains a question mark.
Then, to get the first parameter only, you can split on ? and & and use the second part of the split as the key for seen:
awk '
/^https?:\/\/[^?]*\?/ && split($0, arr, /[?&]/) > 1 && !seen[arr[2]]++
' file
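For reference, split() returns the number of parts, so the > 1 check guarantees arr[2] exists. On the first sample line the split yields:
arr[1] = "https://example.com/endpoint/"
arr[2] = "param1=123"
arr[3] = "param2=1212"
so arr[2], the first parameter, becomes the key for seen.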

Replace part of string with variable using sed or awk [duplicate]

This question already has answers here:
replace parts of url with sed in a file
(2 answers)
Closed 1 year ago.
Is there a way to replace only a part of a string with a random number using sed or awk?
From a URL list file, I need to replace each GET URL param that matches the pattern with a random number.
Example:
https://example.com?param1=somevalue&param2=somevalue
https://otherexample.com?param1=somevalue&param2=somevalue
Replace "param1=" with random number
So, after replace:
https://example.com?param1=512512412&param2=somevalue
https://otherexample.com?param1=9568478547&param2=somevalue
Basically the same syntax works for sed and awk; the extended regex is
s/(param)([0-9])=([0-9]{3,})/\1\2={…fill in rand number u want…}/g
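A minimal sketch filling that slot with the shell's $RANDOM, assuming one number for the whole run is acceptable (the shell expands it once, so every match gets the same value); [^&]+ is swapped in for the value so it also matches non-numeric values like somevalue, and urls.txt is a hypothetical file name:
sed -E "s/(param)([0-9])=([^&]+)/\1\2=${RANDOM}/g" urls.txt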
On the awk side, either mawk or gawk:
match($0, /(param)([0-9])=([0-9][0-9]+)/)
$0 = substr($0, 1, RSTART + 6) {…fill in rand #…} substr($0, RSTART + RLENGTH)
Unfortunately, since mawk2 has neither backreferences like \1 \2 nor interval expressions like {3,}, match is the way to go if you want to write portable code.
If you want to stay on gawk, then gensub works reasonably well.
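A minimal gawk sketch along those lines, assuming one random number per line is acceptable (every paramN on the same line gets the same value) and again using [^&]+ for the value; urls.txt is hypothetical:
gawk 'BEGIN { srand() }
{
  r = int(rand() * 1000000000)   # fresh random number for each line
  print gensub(/(param)([0-9])=([^&]+)/, "\\1\\2=" r, "g")
}' urls.txt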

Find strings for 'awk' in the squid log [duplicate]

This question already has answers here:
How do I use shell variables in an awk script?
(7 answers)
Closed 5 years ago.
I want to do the following with the code below: when I find either of these 2 strings, I want to get the IP from the same line containing the string (at least one of them) and write that IP into a .txt file, so that I can handle it in squid.conf.
I'm trying to build a Splash Page in squid, and I only have the features of IPcop. The code that I put up does not work because it matches any string, not the ones I need. Can anyone help?
#!/bin/sh
TAIL="/usr/bin/tail -f"
SQUID="/var/log/squid/access.log"
PRINCIPAL1="http://cartilha.cert.br/"
PRINCIPAL2="cartilha.cert.br:443"
LOG="/tmp/autenticados.txt"
$TAIL $SQUID | gawk '{if ($7 = $PRINCIPAL1 || $7 = $PRINCIPAL2) {print $3} }' >> $LOG
Passing the variables in with -v, the conditional still does not work.
I do not think this question is a duplicate, because the one they linked has no conditional whatsoever.
I tried putting == instead of just =, but doing so it still gives nothing.
The idea is simple: when one of the links above is accessed, I need to record which IP accessed it and write it to a txt file. Just that.
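For reference, the usual fix is to pass the shell variables in with -v and compare with == (a single = assigns, and awk cannot see $PRINCIPAL1 inside single quotes); a minimal sketch under the question's own setup, untested against a real squid log:
$TAIL $SQUID | gawk -v p1="$PRINCIPAL1" -v p2="$PRINCIPAL2" '$7 == p1 || $7 == p2 { print $3 }' >> $LOG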

Change FS and RS to parse newline char [duplicate]

This question already has answers here:
Read lines from a file into a Bash array [duplicate]
(6 answers)
Closed 6 years ago.
I'm using awk in a shell script to parse a file.
My question has been marked as a duplicate of another, but I want to use awk and I didn't find the same question there.
Here is the file format:
Hi everyone I'm new\n
Can u help me please\n
to split this file\n
with awk ?\n
The result I hope for:
tab[0]=Hi everyone I'm new
tab[1]=Can u help me please
tab[2]=to split this file
tab[3]=with awk ?
So I tried to change the FS and RS values to get what I wanted, but without success. Here is what I tried:
config=`cat $1`
tab=($(echo $config | awk '
{
for (i = 1; i < (NF); i++)
print $i;
}'))
And what I get:
Hi
everyone
I'm
new
Can
u
help
me
please
to
split
this
file
with
awk
Do you know how to proceed, please? :/
The problem is that however you parse the file in awk, the result is returned to the shell as a plain string.
AWK splits a file into records (lines ending in \n), and records are further split into fields (separated by FS, a space by default).
In order to assign the returned string to an array, you need to set the shell's IFS to a newline, or assign the lines to array items one by one (you could filter the records with NR, but that would require reading the file several times with AWK).
Your best course of action is to just print the records in AWK and assign them to a bash array using compound assignment, with IFS set to the newline character:
#!/bin/bash
declare -a tab
IFS='
'
# Compound assignment: array=(words)
# Print record: { print } is the same as { print $0 }
# where $0 is the record and $1 ... $N are the fields in the record
tab=($(awk '{ print }' file))
unset IFS
for index in "${!tab[@]}"; do
echo "${index}: ${tab[index]}"
done
# Output:
# 0: Hi everyone I'm new
# 1: Can u help me please
# 2: to split this file
# 3: with awk ?
Notice that awk is hardly used at all here and could be replaced with a simple cat.
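For completeness, bash 4+ can do this without awk or the IFS juggling at all, via mapfile (also spelled readarray), which is the usual answer to the linked duplicate; a minimal sketch:
#!/bin/bash
mapfile -t tab < file   # -t strips the trailing newline from each line
for index in "${!tab[@]}"; do
  echo "${index}: ${tab[index]}"
done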

awk: any way to access the matched groups in action? [duplicate]

This question already has answers here:
AWK: Access captured group from line pattern
(7 answers)
Closed 8 years ago.
I often find myself doing the same match in the action as in the pattern, to access some part of the input record, e.g.
/^Compiled from \"(.*)\"$/ {
file_name = gensub("^Compiled from \"(.*)\"$", "\\1", "g");
print file_name;
}
So the regexp matching is done twice. Is there any way I can access \\1 in the action without matching again?
I am trying to cut down on both the pattern matching and the extra code.
Unfortunately, GAWK doesn't have the carry-forward feature of sed, which reuses the last pattern via an empty //:
sed '/\(patt\)ern/ {s//new\1/}' inputfile
However, you can rejoice since variables have recently been invented and they can be used for just this purpose!
BEGIN {
pattern = "^Compiled from \"(.*)\"$"
}
$0 ~ pattern {
file_name = gensub(pattern, "\\1", "g");
print file_name;
}
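For the record, gawk 4.0+ also offers match() with a third array argument, which exposes the capture groups directly so the pattern and the extraction are one call; a minimal sketch:
match($0, /^Compiled from "(.*)"$/, m) {
  print m[1]   # m[1] is the first capture group; m[0] is the whole match
}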