Regex to extract multiple words from a paragraph - awk

$ cat file.txt
NAME="Ubuntu" <--- some of this
VERSION="20.04.4 LTS (Focal Fossa)" <--- and some of this
ID=ubuntu
ID_LIKE=debian
VERSION_ID="20.04"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
I want this: Ubuntu 20.04.4 LTS.
I managed with two commands:
echo "$(grep '^NAME="' ./file.txt | sed -E 's/NAME="(.*)"/\1/') $(grep '^VERSION="' ./file.txt | sed -E 's/VERSION="(.*) \(.*"/\1/')"
How could I simplify this to one command using grep/sed or perl?

With your shown samples, try the following awk code.
awk -F'"' '/^NAME="/{name=$2;next} /^VERSION="/{print name,$2}' Input_file
Explanation:
The field separator is set to " for every line.
If a line starts with NAME=", the 2nd field is stored in the variable name. next then skips the remaining rules of the awk program for that line, since they do not need to run.
If a line starts with VERSION=", name and the 2nd field are printed. Note that the 2nd field is the whole quoted value, so this prints Ubuntu 20.04.4 LTS (Focal Fossa); the next answer also strips the codename.
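Since the question asks for grep/sed or perl, here is a minimal sed sketch of the same idea; it is not taken from any of the answers. It assumes the NAME line comes before the VERSION line and that the VERSION value ends with a parenthesised codename. pretty_release and the sample file are hypothetical names used only for illustration.

```shell
# Hypothetical helper: print "<NAME> <VERSION without codename>" from an
# os-release style file passed as $1.
pretty_release() {
    sed -nE \
        -e '/^NAME="/{s/^NAME="(.*)"$/\1/;h;}' \
        -e '/^VERSION="/{s/^VERSION="(.*) \(.*"$/\1/;H;x;s/\n/ /;p;}' \
        "$1"
}

# Sample data modelled on the question (assumed, trimmed to a few lines).
sample=$(mktemp)
cat > "$sample" <<'EOF'
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
VERSION_ID="20.04"
EOF

result=$(pretty_release "$sample")
echo "$result"
rm -f "$sample"
```

Here the sed hold space (h, H, x) plays the role of the awk variable name: the NAME value is parked there until the VERSION line arrives.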

Here is another awk solution:
awk -F '=?"' '$1 == "NAME" {s = $2; next}
$1 == "VERSION" {sub(/ \(.*/, "", $2); print s, $2}' file
Ubuntu 20.04.4 LTS

Related

Using awk to print without double quotes

I would like to get the right value of the following command as a string without double quotes.
$ grep '^VERSION=' /etc/os-release
VERSION="20.04.3 LTS (Focal Fossa)"
When I pipe it with the following awk, I don't get the desired output.
$ grep '^VERSION=' /etc/os-release | awk '{print $0}'
VERSION="20.04.3 LTS (Focal Fossa)"
$ grep '^VERSION=' /etc/os-release | awk '{print $1}'
VERSION="20.04.3
$ grep '^VERSION=' /etc/os-release | awk '{print $2}'
LTS
How can I fix that?
You may use this single awk command:
awk -F= '$1=="VERSION" {gsub(/"/, "", $2); print $2}' /etc/os-release
20.04.3 LTS (Focal Fossa)
1st solution: With your shown samples, please try the following awk code.
awk 'match($0,/^VERSION="[^"]*/){print substr($0,RSTART+9,RLENGTH-9)}' Input_file
Explanation: awk's match function matches from the starting VERSION=" up to (but not including) the next ". substr then skips the 9 characters of VERSION=" and prints only the rest of the matched part, which is the desired output per the OP's shown samples.
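To make the offsets in that substr call concrete, here is a small sketch that prints RSTART and RLENGTH for the sample line before extracting the value. The variable names line, offsets and value are illustrative only.

```shell
line='VERSION="20.04.3 LTS (Focal Fossa)"'

# match() sets RSTART (1-based start of the match) and RLENGTH (its length).
offsets=$(printf '%s\n' "$line" |
    awk 'match($0, /^VERSION="[^"]*/) { print RSTART, RLENGTH }')

# VERSION=" is 9 characters long, so skip 9 and shorten the length by 9.
value=$(printf '%s\n' "$line" |
    awk 'match($0, /^VERSION="[^"]*/) { print substr($0, RSTART + 9, RLENGTH - 9) }')

echo "$offsets"
echo "$value"
```

The match covers the 9-character prefix plus the 25-character value, which is why subtracting 9 from RLENGTH leaves exactly the quoted text.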
2nd solution: Using GNU grep with PCRE enabled (the -P option), try the following.
grep -oP '^VERSION="\K[^"]*' Input_file
3rd solution: Using awk's capability to set different field separators and then check conditions accordingly and print values.
awk -F'"' '$1=="VERSION="{print $2}' Input_file
Assuming that "the right value" you want output is 20.04.3:
$ awk -F'[" ]' '/^VERSION=/{print $2}' file
20.04.3
or if it's the whole quoted string:
$ awk -F'"' '/^VERSION=/{print $2}' file
20.04.3 LTS (Focal Fossa)
You can use an awk command like
awk 'match($0, /^VERSION="([^"]*)"/, m) {print m[1]}' /etc/os-release
Here, ^VERSION="([^"]*)" matches VERSION=" at the start of the string (^), then captures into Group 1 any zero or more chars other than " (with ([^"]*)) and then matches ". The match is saved in m where m[1] holds the Group 1 value.
Or, sed like
sed -n '/^VERSION="\([^"]*\)".*/s//\1/p' /etc/os-release
See an online test:
s='VERSION="20.04.3 LTS (Focal Fossa)"'
awk 'match($0, /^VERSION="([^"]*)"/, m) {print m[1]}' <<< "$s"
sed -n '/^VERSION="\([^"]*\)".*/s//\1/p' <<< "$s"
Here, the -n option suppresses the default line output. /^VERSION="\([^"]*\)".*/ matches a string starting with VERSION=", captures into Group 1 any zero or more chars other than ", and then matches " and the rest of the string; the whole match is replaced with the Group 1 value. The empty pattern // reuses the previous regex. The p flag prints only the result of the substitution.
Both output 20.04.3 LTS (Focal Fossa).
Since the file /etc/os-release conforms to a variable assignment in bash or the shell in general (POSIX), sourcing it should do the job.
source /etc/os-release; echo "$VERSION"
Use a subshell if you do not want to pollute the current environment's variables.
( source /etc/os-release; echo "$VERSION" )
Assigning it to a variable.
version=$( source /etc/os-release; echo "$VERSION" )
If the shell you're using does not conform to POSIX, run it through sh:
sh -c '. /etc/os-release; echo "$VERSION"'
See your local man page if available.
man 5 os-release
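The subshell variant above can be wrapped in a function. This is only a sketch: release_version is a hypothetical name, and a temporary sample file stands in for /etc/os-release so the example does not depend on the host system.

```shell
# Hypothetical helper. The body is written in ( ... ) rather than { ... },
# so it always runs in a subshell and the sourced variables never reach the caller.
release_version() (
    . "$1"
    printf '%s\n' "$VERSION"
)

# Stand-in for /etc/os-release (assumed contents).
sample=$(mktemp)
cat > "$sample" <<'EOF'
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
EOF

version=$(release_version "$sample")
echo "$version"
rm -f "$sample"
```

Keep in mind that sourcing executes the file as shell code, so this is only appropriate for files you trust.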

Remove '||'(double pipe) separated multiple columns from file using shell script command

Please suggest a shell command to remove the last two '||'-delimited columns from a file. (Let's assume the example below.)
File Name: abc.dat
"a1"||"a2"||"a3"||"a4"
"b1"||"b2"||"b3"||"b4"
"c1"||"c2"||"c3"||"c4"
output should be like :
"a1"||"a2"
"b1"||"b2"
"c1"||"c2"
I tried the cut and awk commands below, but they did not work:
awk -F '||' '{print $1$2}' ${file} >> ${file}
cut -d'||' -f2 --complement ${file} >> ${file} (not working as cut: the delimiter must be a single character)
With your shown samples, please try the following. Set the field separator to || (escaped so that each | is treated as a literal character) and set OFS to || as well. Then print the 1st and 2nd fields of each line of Input_file.
awk -F'\\|\\|' -v OFS="||" '{print $1,$2}' Input_file
Once you are happy with the results of the above command, use the following to write the changes back to Input_file itself.
awk -F'\\|\\|' -v OFS="||" '{print $1,$2}' Input_file > temp && mv temp Input_file
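The original -F '||' attempt fails because awk treats a multi-character field separator as a regular expression, in which | means alternation. As a sketch of a slightly more general form, the bracket expression [|][|] matches two literal pipes without any backslash escaping. keep_first_cols and the sample file are hypothetical names for illustration.

```shell
# Hypothetical helper: keep only the first N "||"-separated columns.
keep_first_cols() {
    # $1 = number of leading columns to keep, $2 = input file
    awk -F'[|][|]' -v OFS='||' -v n="$1" '{
        out = $1
        for (i = 2; i <= n && i <= NF; i++) out = out OFS $i
        print out
    }' "$2"
}

# Sample data copied from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"a1"||"a2"||"a3"||"a4"
"b1"||"b2"||"b3"||"b4"
"c1"||"c2"||"c3"||"c4"
EOF

result=$(keep_first_cols 2 "$sample")
echo "$result"
rm -f "$sample"
```

Keeping the first N columns is equivalent to dropping the last two only when every line has exactly N+2 columns, as in the sample.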
2nd solution: Using GNU grep, try the following.
grep -oP '^.*?\|\|"[^"]*' Input_file
Rather than assuming || is the delimiter, assume that | is the delimiter and the second field is empty.
$ cut -d'|' -f1-3 <<EOF
> "a1"||"a2"||"a3"||"a4"
> "b1"||"b2"||"b3"||"b4"
> "c1"||"c2"||"c3"||"c4"
> EOF
"a1"||"a2"
"b1"||"b2"
"c1"||"c2"
(This assumes that || was chosen for some aesthetic reason, rather than to allow for single pipes in each field.)
You may use:
awk '{sub(/(\|{2}[^|]*){2}$/, "")} 1' file
"a1"||"a2"
"b1"||"b2"
"c1"||"c2"
Or if you just want to remove last 2 columns without caring how many columns are there in total use:
awk -F '\\|{2}' -v OFS='||' '{
$NF = $(NF-1) = ""
sub(/([|]{2})*$/, "")
} 1' file

Regexp in gawk matches multiple ways

I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave - I just want to understand why?! (I have written a less elegant way around it now...).
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so did
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
Trouble is, instead of getting
HeaderUUIiewConsenFlagPSMessage
I get the match with everything from there to the last pipe of the string HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
Now I accept this is because the regexp in /msgcontent1=(.*)[|]/ can match multiple ways, but HOW do I make it match the way I want it to??
With your shown samples please try following. Written and tested in GNU awk this will print only contents from msgcontent1= till | first occurrence.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
OR with echo + awk try:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use gnu awk like this to extract value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this gnu grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^\|]*)/,a)) print a[1] }'
this prints HeaderUUIiewConsenFlagPSMessage
The reason your regex matches msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that matching is greedy: .* always takes the longest possible match, so it runs up to the last | in the string.
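The difference can be seen side by side with POSIX awk's two-argument match. This is a sketch only; the variable names s, greedy and bounded are illustrative.

```shell
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'

# .* is greedy: the match extends to the LAST pipe in the line.
# Skip the 12-character "msgcontent1=" prefix and drop the trailing pipe.
greedy=$(printf '%s\n' "$s" |
    awk 'match($0, /msgcontent1=.*[|]/) { print substr($0, RSTART + 12, RLENGTH - 13) }')

# [^|]* cannot cross a pipe, so the match stops before the FIRST one.
bounded=$(printf '%s\n' "$s" |
    awk 'match($0, /msgcontent1=[^|]*/) { print substr($0, RSTART + 12, RLENGTH - 12) }')

echo "greedy:  $greedy"
echo "bounded: $bounded"
```

awk regular expressions have no non-greedy quantifier such as .*?, so a negated bracket expression is the usual way to stop at the first delimiter.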
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage

using awk command to get the correct name

I want to get the filename from a long string in a shell script. After reading some examples from likegeeks.com, I wrote a simple solution:
#!/bin/bash
cdnurl="http://download.example.com.cn/download/product/vpn/rules/vpn_patch_20190218162130_sign.pkg?wsSecret=9cadeddedfr7bb85a20a064510cd3f353&wsABSTime=5c6ea1e7"
echo ${cdnurl}
url=`echo ${cdnurl} | awk -F'/' '{ print $NF }'`
result=`echo ${url} | awk -F '?' '{ print $1}'`
echo ${url}
echo ${result}
I just want to get vpn_patch_20190218162130_sign.pkg, and this does it. I wonder whether there is a smarter way (maybe one line).
If what follows pkg is not ?, how can I use pkg itself to get the filename? I am not sure that ? always follows pkg, but the filename is always *.pkg.
You can try this; it is more robust than the second awk command below:
echo "$cdnurl" | awk -F'/' '{sub(/\?.*/, "", $NF); print $NF}'
vpn_patch_20190218162130_sign.pkg
# less robust
echo "$cdnurl" | awk -F'[?/]' '{print $(NF-1)}'
You should use sed:
sed -E 's|.*/(.*\.pkg).*|\1|' <<< "$cdnurl"
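For the follow-up question (the name always ends in .pkg, but what follows it may not be ?), a sketch that anchors on the .pkg suffix instead of the separator could look like this. pkg_name is a hypothetical helper, and the second URL below is made up to exercise the "no ? after pkg" case.

```shell
# Hypothetical helper: print the first "<name>.pkg" run in $1 that contains
# neither "/" nor "?". A string regex is used so "/" needs no escaping.
pkg_name() {
    printf '%s\n' "$1" |
        awk 'match($0, "[^/?]*[.]pkg") { print substr($0, RSTART, RLENGTH) }'
}

cdnurl='http://download.example.com.cn/download/product/vpn/rules/vpn_patch_20190218162130_sign.pkg?wsSecret=9cadeddedfr7bb85a20a064510cd3f353&wsABSTime=5c6ea1e7'
pkg_name "$cdnurl"

# Made-up URL where ";" rather than "?" follows the file name.
pkg_name 'http://download.example.com.cn/rules/vpn_patch_20190218162130_sign.pkg;token=abc'
```

When the query string is known to start with ?, plain parameter expansion also works without any external process: file=${cdnurl##*/}; file=${file%%\?*}.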

Printing only part of next line after matching a pattern

I want to print the next line after a match.
My file content looks like this:
SSID:CoreFragment
Passphrase:WiFi1234
SSID:CoreFragment_5G
Passphrase:WiFi1234
SSID:Aleph_inCar
Passphrase:1234567890
For example, if I search for WIFI-3 (the SSID), then I want to print 1234ABCD. I used this command to search for the SSID:
grep -oP '^SSID:\K.+' file_name
After this search I want to print Passphrase of that particular match.
I'm working on Ubuntu 18.04
ssid=$(grep -oP '^SSID:\K.+' list_wifi.txt)
for ssid in $(sudo iwlist wlp2s0 scan | grep ESSID | cut -d '"' -f2)
do
if [ $ssid == $ssid_name ]; then
echo "SSID found...";
fi
done
I want to print next line after match.
Another awk:
$ awk -F: -v s="$ssid" '$0=="SSID:"s{c=NR+1} c==NR{print $2; exit}' file
1234ABCD
This only prints the value if it is on the next line.
awk -F: '/WIFI-3/{getline;print $2; exit}' file
1234ABCD
Robustly (won't fail due to partial matches, etc.) and idiomatically:
$ awk -F':' 'f{print $2; exit} ($1=="SSID") && ($2=="WIFI-3"){f=1}' file
1234ABCD
Please try the following:
ssid="WIFI-3"
passphrase=$(grep -A 1 "^SSID:$ssid" file_name | tail -n 1 | cut -d: -f2)
echo "$passphrase"
which yields:
1234ABCD
The code tags have changed the look of the samples, so I am adding this now.
var=$(awk '/SSID:[a-zA-Z]+-[0-9]+/{flag=1;next} flag{sub(/.*:/,"");value=$0;flag=""} END{print value}' Input_file)
echo "$var"
Could you please try the following.
awk '/Passphrase/ && match($0,/WIFI-3 Passphrase:[0-9a-zA-Z]+/){val=substr($0,RSTART,RLENGTH);sub(/.*:/,"",val);print val;val=""}' Input_file
Using Perl
$ export ssid="WIFI-3"
$ perl -0777 -lne ' /SSID:$ENV{ssid}\s*Passphrase:(\S+)/ and print $1 ' yash.txt
1234ABCD
$ export ssid="Aleph_inCar"
$ perl -0777 -lne ' /SSID:$ENV{ssid}\s*Passphrase:(\S+)/ and print $1 ' yash.txt
1234567890
$
$ cat yash.txt
SSID:CoreFragment
Passphrase:WiFi1234
SSID:CoreFragment_5G
Passphrase:WiFi1234
SSID:Aleph_inCar
Passphrase:1234567890
SSID:WIFI-1
Passphrase:1234ABCD
SSID:WIFI-2
Passphrase:123456789
SSID:WIFI-3
Passphrase:1234ABCD
You can capture it in variables as
$ passphrase=$(perl -0777 -lne ' /SSID:$ENV{ssid}\s*Passphrase:(\S+)/ and print $1 ' yash.txt)
$ echo $passphrase
1234567890
$