awk sed grep to extract pattern with special characters - awk

I am trying to understand the switches and arguments in awk and sed.
For instance, to get the value next to nonce in this line from the file response.xml:
WWW-Authenticate: ServiceAuth realm="WinREST", nonce="1828HvF7EfPnRtzSs/h10Q=="
At another member's suggestion, I use:
nonce=$(sed -nE 's/.*nonce="([^"]+)"/\1/p' response.xml)
To get the value next to the word idOperation in the line below, I was trying:
idOper=$(sed -nE 's/.*idOperation="([^"]+)"/\1/p' zarph_response.xml)
Line to extract the value from:
{"reqStatus":{"code":0,"message":"Success","success":true},"idOperation":"185-16-6"}
How do I get the 185-16-6?
And what if the data to extract is not enclosed in quotes,
like the 1 next to operStatus?
{"reqStatus":{"code":0,"message":"Success","success":true},"operStatus":1,"amountReceived":0,"amountDismissed":0,"currency":"EUR"}

The following awk may help:
awk -F"\"" '/idOperation/{print $(NF-1)}' Input_file
Solution 2: the following sed may help as well:
sed '/idOperation/s/\(.*:\)"\([^"]*\)\(.*\)/\2/' Input_file
EDIT: To get the digit after the string operStatus, the following may help (the offset 12 skips the 12 characters of operStatus":):
awk 'match($0,/operStatus[^,]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
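Staying closer to the sed command from the question, a minimal sketch only has to account for the JSON punctuation: the key is followed by ":" rather than =", and a bare number has no closing quote to stop at. It assumes each JSON object sits on one line and that values contain no escaped quotes; the sample lines are the ones from the question, inlined instead of read from a file.

```shell
#!/bin/sh
# Sample lines copied from the question.
line1='{"reqStatus":{"code":0,"message":"Success","success":true},"idOperation":"185-16-6"}'
line2='{"reqStatus":{"code":0,"message":"Success","success":true},"operStatus":1,"amountReceived":0,"amountDismissed":0,"currency":"EUR"}'

# Quoted value: match "idOperation":"..." and keep what is between the quotes.
idOper=$(printf '%s\n' "$line1" | sed -nE 's/.*"idOperation":"([^"]+)".*/\1/p')

# Unquoted numeric value: capture the run of digits after "operStatus":
operStatus=$(printf '%s\n' "$line2" | sed -nE 's/.*"operStatus":([0-9]+).*/\1/p')

echo "$idOper"
echo "$operStatus"
```

The only change from the nonce command is the text between the key and the capture group, plus a trailing .* so that the rest of the line is dropped.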

Using grep with Perl-style regexes (GNU grep):
grep -oP "nonce=\"\K(.*)?(?=\")" Input_file
grep -oP "idOperation\":\"\K(.*)?(?=\")" Input_file
If the input is JSON you can use jq (the -r option prints the string without its surrounding quotes):
jq -r .idOperation Input_file

Related

How to print specific string from a sentence using awk

I have the following line in a file:
FQDN=joe.blogs.com.
How can I print the string "joe"?
I have tried awk -F"=" '{print $2}' file
but this returns joe.blogs.com because "=" is the only delimiter.
Is it possible to use two delimiters on the same line?
You can use a regular expression as FS. Let the content of file.txt be
FQDN=joe.blogs.com.
then
awk 'BEGIN{FS="[=.]"}{print $2}' file.txt
output
joe
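To see why $2 is the right field, it can help to print every field that the regex separator produces. This is only a small sketch; it feeds the sample line through printf rather than reading file.txt.

```shell
#!/bin/sh
# With FS="[=.]" both "=" and "." act as field separators.
fields=$(printf 'FQDN=joe.blogs.com.\n' |
  awk 'BEGIN { FS = "[=.]" } { for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')
echo "$fields"
```

Field 1 is FQDN and field 2 is joe. The trailing dot also yields an empty last field, which is harmless here.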
If you are OK with sed, try the following:
sed 's/.*=\([^.]*\)\..*/\1/' Input_file
With GNU grep and its -oP options, the following also works:
grep -oP '(.*=)\K([^.]*)' Input_file
You could use GNU grep:
grep -oP '(?<=FQDN=)[^.]+' file
-oP: print only the match, using a Perl-style regex
(?<=FQDN=): lookbehind for 'FQDN='
[^.]+: all characters up to a '.'
Or with Perl:
perl -lne 'print $1 if /(?<=FQDN=)([^.]+)/' file
With awk I would probably do:
awk 'BEGIN{FS="[.=]"} /FQDN=/{print $2}' file
Why not keep it simple and pipe awk into awk?
awk -F"=" '{print $2}' file | awk -F"." '{print $1}'
Can I use two field delimiters on one line?
Not as two separate delimiters. You can do further string manipulation as post-processing, or you can use a regex as the field delimiter.
Another option is to use awk's split function:
awk -F= '{ split($2,map,".");print map[1] }' file
Split the second =-separated field into the array map using "." as the delimiter, then print the first element of the array.

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but my command prints only https because the separator is :. What is the problem with my command, and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : characters in your input, $2 will not work in awk because it gives you only the second field. You actually need an equivalent of cut -d: -f2-, but you also need to check the key name that comes before the first :.
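The cut idea can be sketched as follows. This is only an illustration: the data is inlined via printf rather than read from input.txt, and the EXAMPLE_URL and KEY:GOOGLE_URL: lines are extra sample lines added to show that the key check is anchored to the start of the line.

```shell
#!/bin/sh
# grep keeps only lines whose key is exactly GOOGLE_URL;
# cut then prints field 2 to the end, re-joined with ":".
url=$(printf '%s\n' \
  'EXAMPLE_URL:http://www.example.com/' \
  'GOOGLE_URL:https://www.google.com/' \
  'KEY:GOOGLE_URL:' |
  grep '^GOOGLE_URL:' | cut -d: -f2-)
echo "$url"
```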
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
Try the following, written and tested with the shown samples in GNU awk. It looks for the string GOOGLE_URL and then captures the URL, whether it starts with http or https; if you need only https, change http[s]? to https in the solution below.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation of the above:
awk '                                ##Start the awk program.
/^GOOGLE_URL:/{                      ##If the line starts with GOOGLE_URL:, do the following.
  match($0,/http[s]?:\/\/.*/)        ##Match from http:// or https:// (s is optional) to the end of the line.
  print substr($0,RSTART,RLENGTH)    ##Print the substring that match() found.
}
' Input_file                         ##Name of the input file.
2nd solution: If you need everything that comes after the first :, try the following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK in the following way for this task.
Let the content of file.txt be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: In GNU AWK, FS can be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line; a GOOGLE_URL: in the middle or at the end of a line is then not a separator (consider the 3rd line of input). With this FS each line has either 1 or 2 fields. The latter is the case only if the line starts with GOOGLE_URL:, so I check the number of fields (NF) and, if it is 2, print the 2nd field ($2), since the first field is empty in this case.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

Special character replacement using sed or awk

I have a string in a dat file
Times-New $\Gamma$
and I want to replace it, using sed or awk (I would prefer awk), with
/Symbol G
I could replace only one $ sign but failed for the remaining text.
So what I want is something like:
sed 's/Times-New $\Gamma$/Symbol G/g' case.dat
Could you please help me?
Try the following, written and tested with the shown samples. We need to escape \ and $ so that awk treats them as literal characters in the regex.
awk '{gsub(/Times-New \$\\Gamma\$/,"/Symbol G")} 1' Input_file
Or, as per the OP's comment, try (thanks to anubhava for adding this solution):
awk '{gsub(/Times-New[ \t]+(G|\$\\Gamma\$)/, "/Symbol G")} 1' Input_file
With GNU sed, try the following (# is used as the s delimiter so that the / in /Symbol G needs no escaping):
sed -E 's#Times-New \$\\Gamma\$#/Symbol G#g' Input_file
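If escaping every special character by hand becomes error-prone, one alternative is to avoid regexes altogether and search for the text as a plain substring. The sketch below is not from the answers above: replace_literal is a hypothetical helper name, and the strings are passed through the environment (ENVIRON) because awk -v would interpret backslash sequences in them.

```shell
#!/bin/sh
# replace_literal OLD NEW: replace every literal occurrence of OLD with NEW
# on each line of standard input. OLD is never interpreted as a regex.
# (replace_literal is a hypothetical helper name used for this sketch.)
replace_literal() {
  old="$1" new="$2" awk '
    BEGIN { old = ENVIRON["old"]; new = ENVIRON["new"] }
    {
      out = ""
      rest = $0
      # index() is a plain substring search, so "$" and "\" need no escaping.
      while (old != "" && (pos = index(rest, old)) > 0) {
        out = out substr(rest, 1, pos - 1) new
        rest = substr(rest, pos + length(old))
      }
      print out rest
    }'
}

result=$(printf '%s\n' 'Times-New $\Gamma$' | replace_literal 'Times-New $\Gamma$' '/Symbol G')
echo "$result"
```

For the file in the question, the call would be replace_literal 'Times-New $\Gamma$' '/Symbol G' < case.dat. The trade-off is that no regex features such as [ \t]+ are available.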

isolate similar data from stream

We parse data of the following format -
35953539535393 BG |..|...|REF_DATA^1^Y^|...|...|
35953539535393 B |..|...|REF_DATA_IND^1^B^|...|...|
We need to print the unique values of REF_DATA* appearing in the file using a script.
So, the output for the above data would be:
REF_DATA^1^Y^
REF_DATA_IND^1^B^
How do we achieve this using grep, sed or awk in a one-liner script?
This might work for you (GNU sed & sort):
sed '/\n/!s/[^|]*REF_DATA[^|]*/\n&\n/;/^[^|]*REF_DATA/P;D' file | sort -u
Surround the intended strings by newlines, print only those strings on separate lines and sort those lines showing only unique values.
Could you please try the following and let me know if it helps.
awk 'match($0,/REF_DATA[^|]*/){val=substr($0,RSTART,RLENGTH);if(!array[val]++){print val}}' Input_file
Adding a non-one-liner form of the solution too.
awk '
match($0,/REF_DATA[^|]*/){
  val=substr($0,RSTART,RLENGTH);
  if(!array[val]++){
    print val
  }
}' Input_file
Assuming you have GNU grep:
command_to_produce_data | grep -oP '(?<=[|])REF_DATA.+?(?=[|])' | sort -u
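If the REF_DATA value is always the 4th |-separated field, this is enough (pipe it through sort -u if duplicates are possible):
awk -F\| '{print $4}' file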
REF_DATA^1^Y^
REF_DATA_IND^1^B^
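The field-based approach and the uniqueness check can also be combined in a single awk call. A minimal sketch, assuming the value always sits in the 4th |-separated field; the third input line is a duplicate added here only to show the de-duplication, and seen is an arbitrary array name.

```shell
#!/bin/sh
# Print $4 only the first time each REF_DATA* value is encountered.
uniq_refs=$(printf '%s\n' \
  '35953539535393 BG |..|...|REF_DATA^1^Y^|...|...|' \
  '35953539535393 B |..|...|REF_DATA_IND^1^B^|...|...|' \
  '35953539535393 BG |..|...|REF_DATA^1^Y^|...|...|' |
  awk -F'|' '$4 ~ /^REF_DATA/ && !seen[$4]++ { print $4 }')
echo "$uniq_refs"
```

Unlike sort -u, this keeps the values in the order in which they first appear.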

how to extract line portion on the basis of start substring and end substring using sed or awk

I have a multiline file with text having no spaces.
Thereisacat;whichisverycute.Thereisadog;whichisverycute.
Thereisacat;whichisverycute.Thereisadog;whichisverycute.
I want to extract the string between cat and cute (the first occurrence, not the second); that is, the output should be
;whichisvery
;whichisvery
I am close to getting it, but with the command below I end up with the string from cat to the last cute.
sed -e 's/.*cat\(.*\)cute.*/\1/'
I am getting
;whichisverycute.Thereisadog;whichisvery
;whichisverycute.Thereisadog;whichisvery
How can I get the text from cat to the first occurrence of cute, not the last?
Given the input you posted all you need is:
$ awk -F'cat|cute' '{print $2}' file
;whichisvery
;whichisvery
In sed:
sed 's/cute.*//;s/.*cat//' Input_file
In awk:
awk '{sub(/cute.*/,"");sub(/^.*cat/,"");print}' Input_file
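The same "first occurrence" logic can be written without regexes by searching for plain substrings, which sidesteps greedy matching entirely. A minimal sketch, with the sample line inlined and the two markers passed as awk variables (start and stop are names chosen for this example):

```shell
#!/bin/sh
# Find the first "cat", then the first "cute" that follows it,
# and print whatever lies between the two.
between=$(printf '%s\n' \
  'Thereisacat;whichisverycute.Thereisadog;whichisverycute.' |
  awk -v start='cat' -v stop='cute' '
    {
      s = index($0, start)
      if (s == 0) next                      # no start marker on this line
      rest = substr($0, s + length(start))  # text after the first "cat"
      e = index(rest, stop)
      if (e == 0) next                      # no stop marker after it
      print substr(rest, 1, e - 1)
    }')
echo "$between"
```

One difference from the sed and awk one-liners above: this version looks for cute only after the first cat, and it skips lines where either marker is missing instead of printing them.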