I have a file input.txt that stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from input.txt, but my command prints only https because the separator is :. What is wrong with my command, and how can I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : characters in your input, printing $2 in awk will not work because it gives you only the 2nd field. You actually need an equivalent of cut -d: -f2-, but you also need to check the key name that comes before the first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
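As a minimal sketch of the same idea (exact key check, then everything after the first :), the lookup can be wrapped in a small shell function. The function name get_value and the temporary sample file are assumptions made for illustration; they are not part of the original question.

```shell
# Hypothetical helper: print everything after the first ':' on the
# first line whose key (the text before that ':') equals $1 exactly.
get_value() {
    awk -v k="$1" '
        {
            i = index($0, ":")               # position of the first colon, 0 if none
            if (i > 0 && substr($0, 1, i - 1) == k) {
                print substr($0, i + 1)      # the value, later colons included
                exit
            }
        }' "$2"
}

# Sample data (assumed), written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'EXAMPLE_URL:http://www.example.com/' \
              'GOOGLE_URL:https://www.google.com/' > "$tmp"

URL=$(get_value GOOGLE_URL "$tmp")
printf '%s\n' "$URL"
rm -f "$tmp"
```

Because the key is compared as a plain string and the split uses index(), neither regex metacharacters in the key nor extra colons in the value should affect the result.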
Please try the following, written and tested with the shown samples in GNU awk. It looks for lines starting with GOOGLE_URL: and extracts the http or https URL from them; if you need only https, change http[s]? to https in the solution below.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^GOOGLE_URL:/{ ##Checking condition if line starts from GOOGLE_URL: then do following.
match($0,/http[s]?:\/\/.*/) ##Using match function to match http[s](s optional) : till last of line here.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched value from above function.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: In case you need anything coming after first : then try following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK in the following way for this task:
Let the content of file.txt be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: In GNU AWK, FS may be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line; GOOGLE_URL: in the middle or at the end of a line will not act as a separator (consider the 3rd line of input). With this FS there are either 1 or 2 fields in each line. The latter is the case only if the line starts with GOOGLE_URL:, so I check the number of fields (NF) and, if it is 2, print the 2nd field ($2), as the first field in this case is empty.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile
Related
I have a file called DB_create.sql which has this line
CREATE DATABASE testrepo;
I want to extract only testrepo from this. So I've tried
cat DB_create.sql | awk '{print $3}'
This gives me testrepo;
I need only testrepo. How do I get this ?
With your shown samples, please try following.
awk -F'[ ;]' '{print $(NF-1)}' DB_create.sql
OR
awk -F'[ ;]' '{print $3}' DB_create.sql
OR without setting any field separators try:
awk '{sub(/;$/,"");print $3}' DB_create.sql
A simple explanation: set the field separator to a space OR a semicolon, then print the second-to-last field, $(NF-1), which is what the OP needs. The last field is the empty string after the trailing ;. Also, you do not need cat with awk, because awk can read Input_file by itself.
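To illustrate why the parentheses in $(NF-1) matter, here is a small sketch using the sample line from the question. The variable names are assumptions chosen for the example.

```shell
line='CREATE DATABASE testrepo;'

# With FS='[ ;]' the trailing ';' creates an empty 4th field.
nf=$(printf '%s\n' "$line" | awk -F'[ ;]' '{print NF}')

# $(NF-1) is the field just before that empty last field.
db=$(printf '%s\n' "$line" | awk -F'[ ;]' '{print $(NF-1)}')

# $NF-1 parses as ($NF)-1: the empty last field counts as 0, minus 1.
wrong=$(printf '%s\n' "$line" | awk -F'[ ;]' '{print $NF-1}')

printf 'NF=%s  $(NF-1)=%s  $NF-1=%s\n' "$nf" "$db" "$wrong"
```

On this input the three values should be 4, testrepo and -1 respectively.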
Using gnu awk, you can set record separator as ; + line break:
awk -v RS=';\r?\n' '{print $3}' file.sql
testrepo
Or using any POSIX awk, just do a call to sub to strip trailing ;:
awk '{sub(/;$/, "", $3); print $3}' file.sql
testrepo
You can use
awk -F'[;[:space:]]+' '{print $3}' DB_create.sql
where the field separator is set to a [;[:space:]]+ regex that matches one or more occurrences of ; or/and whitespace chars. Then, Field 3 will contain the string you need without the semi-colon.
More pattern details:
[ - start of a bracket expression
; - a ; char
[:space:] - any whitespace char
] - end of the bracket expression
+ - a POSIX ERE one or more occurrences quantifier.
Use your own code, but add a call to the sub() function:
cat DB_create.sql | awk '{sub(/;$/, "",$3);print $3}'
It's better not to use cat, though. Here you can see why: Comparison of cat pipe awk operation to awk command on a file
So this way is better:
awk '{sub(/;$/, "",$3);print $3}' file
I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave - I just want to understand why?! (I have written a less elegant way around it now...).
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so did
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
Trouble instead of getting
HeaderUUIiewConsenFlagPSMessage
I get the match with everything from there to the last pipe of the string HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
Now I accept this is because the regexp in /msgcontent1=(.*)[|]/ can match multiple ways, but HOW do I make it match the way I want it to??
With your shown samples please try following. Written and tested in GNU awk this will print only contents from msgcontent1= till | first occurrence.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
OR with echo + awk try:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use gnu awk like this to extract value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this gnu grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^\|]*)/,a)) print a[1] }'
this prints HeaderUUIiewConsenFlagPSMessage
The reason your regex captures HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that matching is greedy: (.*) always takes the longest possible match, so [|] matches the last pipe in the string rather than the first.
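The difference can be seen side by side in a short sketch that uses only POSIX awk features (match, RSTART, RLENGTH) instead of the gawk-only array argument. The variable names greedy and bounded are assumptions for illustration.

```shell
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'

# Greedy: .* runs as far as it can, so [|] matches the LAST pipe.
# 12 = length of "msgcontent1=", 13 = that prefix plus the closing pipe.
greedy=$(printf '%s\n' "$s" |
    awk 'match($0, /msgcontent1=.*[|]/) { print substr($0, RSTART + 12, RLENGTH - 13) }')

# Bounded: [^|]* cannot cross a pipe, so the match stops at the FIRST one.
bounded=$(printf '%s\n' "$s" |
    awk 'match($0, /msgcontent1=[^|]*/) { print substr($0, RSTART + 12, RLENGTH - 12) }')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```

awk regular expressions have no lazy quantifier such as .*?, so a negated bracket expression is the usual way to stop at the first delimiter.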
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage
I would like to know whether there are alternatives, using other bash commands, for printing the range of IPs under #Hiko, other than the sed, tail and head pipeline below, which I worked out to get what I needed from my hosts file.
I'm just curious and keen to learn more about bash, and I hope to gain more knowledge from the community.
:D
$ sed -n '/#Hiko/,/#Pico/p' /etc/hosts | tail -n +3 | head -n -2
/etc/hosts
#Tito
192.168.1.21
192.168.1.119
#Hiko
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
#Pico
192.168.1.23
192.168.1.93
192.168.1.121
1st solution: With shown samples could you please try following. Written and tested in GNU awk.
awk -v RS= '/#Pico/{exit} /#Hiko/{found=1;next} found' Input_file
Explanation:
awk -v RS= ' ##Starting awk program from here.
/#Pico/{ ##Checking condition if line has #Pico then do following.
exit ##exiting from program.
}
/#Hiko/{ ##Checking condition if line has #Hiko is present in line.
found=1 ##Setting found to 1 here.
next ##next will skip all further statements from here.
}
found ##Checking condition if found is SET then print the line.
' Input_file ##mentioning Input_file name here.
2nd solution: Without setting the RS variable, try the following.
awk '/#Pico/{exit} /#Hiko/{found=1;next} NF && found' Input_file
3rd solution: You could look for record #Hiko and then could print its next record and come out with shown samples.
awk -v RS= '/#Hiko/{found=1;next} found{print;exit}' Input_file
NOTE: All of the solutions above check whether the string #Hiko or #Pico is present anywhere in the line. If you want an exact match, change only the /#Hiko/ and /#Pico/ parts to /^#Hiko$/ and /^#Pico$/ respectively.
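A minimal sketch of the exact-match variant in paragraph mode follows. It assumes, as the solutions above do, that headers and IP blocks in the file are separated by blank lines; the shortened sample file is made up for the example.

```shell
# Assumed sample: each header and each IP block is its own paragraph.
hosts=$(mktemp)
printf '%s\n' '#Tito' '' '192.168.1.21' '192.168.1.119' '' \
              '#Hiko' '' '192.168.1.243' '192.168.1.125' '' \
              '#Pico' '' '192.168.1.23' > "$hosts"

# RS= (empty) makes awk read blank-line-separated paragraphs as records,
# so /^#Hiko$/ matches only a paragraph that is exactly "#Hiko".
ips=$(awk -v RS= '/^#Pico$/{exit} /^#Hiko$/{found=1; next} found' "$hosts")

printf '%s\n' "$ips"
rm -f "$hosts"
```

With this sample the command should print the two addresses between #Hiko and #Pico, one per line.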
With sed (checked with GNU sed, syntax might differ for other implementations)
$ sed -n '/#Hiko/{n; :a n; /^$/q; p; ba}' /etc/hosts
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
-n turn off automatic printing of pattern space
/#Hiko/ if line contains #Hiko
n get next line (assuming there's always an empty line)
:a label a
n get next line (using n will overwrite any previous content in the pattern space, so only single line content is present in this case)
/^$/q if the current line is empty, quit
p print the current line
ba branch to label a
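Since the question asks about other shell approaches, the same line-by-line logic can also be sketched as a plain shell loop without sed or awk. This is only an illustration; the sample file and the variable names are assumptions.

```shell
# Assumed sample with the same layout as the hosts file in the question.
hosts=$(mktemp)
printf '%s\n' '#Tito' '' '192.168.1.21' '' \
              '#Hiko' '' '192.168.1.243' '192.168.1.125' '' \
              '#Pico' '' '192.168.1.23' > "$hosts"

ips=$(
    in_block=0
    while IFS= read -r line; do
        case $line in
            ('#Hiko') in_block=1 ;;   # start printing after this header
            ('#'*)    in_block=0 ;;   # any other header ends the block
            ('')      ;;              # ignore blank lines
            (*)       if [ "$in_block" -eq 1 ]; then printf '%s\n' "$line"; fi ;;
        esac
    done < "$hosts"
)

printf '%s\n' "$ips"
rm -f "$hosts"
```

Unlike the sed version, this loop does not stop at the first empty line; it keeps reading until the next header, which may or may not be what you want for your file.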
You can use
awk -v RS= '/^#Hiko$/{getline;print;exit}' file
awk -v RS= '$0 == "#Hiko"{getline;print;exit}' file
Which means:
RS= - make awk read the file paragraph by paragraph
/^#Hiko$/ or $0 == "#Hiko" - finds a paragraph that is equal to #Hiko
{getline;print;exit} - gets the next paragraph, prints it and exits.
You may use:
awk -v RS= 'p && NR == p + 1; $1 == "#Hiko" {p = NR}' /etc/hosts
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
This might work for you (GNU sed):
sed -n '/^#/h;G;/^[0-9].*\n#Hiko/P' file
Copy the header to the hold buffer.
Append the hold buffer to each line.
If the line begins with a digit and contains the required header, print the first line in the pattern space.
I want to search a file for a specific pattern that I have in a variable, and print a line only if the pattern occurs at the start of that line.
I have done it with grep here:
grep -n "^"$curdate"" ./file
Now I want to do the same with awk. So far I have this:
awk -v pat="$input" -F ":" '$0~pat{print NR") "$2 }' ./file
But the problem with the awk code above is that it prints every line that contains the pattern, even if the pattern is in the middle of the line and not ONLY at the start!
I think the solution is easy, but I cannot find the syntax for it!
Please try the following; there were no samples, so I could not test it, but it should work. The regexp anchor ^ means the pattern has to match at the start of each line.
awk -v pat="$input" -F ":" '$0~"^"pat{print NR") "$2 }' ./file
2nd solution: Using awk's index function, try the following.
awk -v pat="$input" -F ":" 'index($0,pat)==1{print NR") "$2 }' Input_file
3rd solution: Using awk's substr function:
awk -v pat="$input" -F ":" 'substr($0,1,length(pat))==pat{print NR") "$2 }' Input_file
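The regex solution and the index/substr solutions are not quite equivalent: the first treats the variable as a regular expression, the others as a literal string. The following sketch shows the difference on made-up sample data (the file contents and the value of input are assumptions).

```shell
f=$(mktemp)
printf '%s\n' '10.5:first' 'x 10.5:second' '1015:third' > "$f"
input='10.5'

# Regex prefix match: the "." in the pattern matches any character,
# so line 3 ("1015:third") is selected as well.
regex_hits=$(awk -v pat="$input" -F: '$0 ~ "^" pat {print NR") "$2}' "$f")

# Literal prefix match: only lines that really begin with "10.5".
literal_hits=$(awk -v pat="$input" -F: 'index($0, pat) == 1 {print NR") "$2}' "$f")

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_hits" "$literal_hits"
rm -f "$f"
```

If the pattern comes from user input and should be matched literally, the index or substr form is probably the safer choice.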
I have this kind of log:
2018-10-05 09:12:38 286 <190>1 2018-10-05T09:12:38.474640+00:00 app web - - Class uuid=uuid-number-one cp=xxx action='xxxx'
2018-10-05 10:11:23 286 <190>1 2018-10-05T10:11:23.474640+00:00 app web - - Class uuid=uuid-number-two cp=xxx action='xxxx'
I need to extract uuid and run a second query with:
./getlogs --search 'uuid-number-one OR uuid-number-two'
For the moment for the first query I do this to extract uuid:
./getlogs | grep 'uuid' | awk 'BEGIN {FS="="} { print $2 }' | cut -d' ' -f1
My three questions:
I think I could get rid of grep and cut and use only awk?
How could I capture only the value of uuid. I tried awk '/uuid=\S*/{ print $1 }' or awk 'BEGIN {FS="uuid=\\S*"} { print $1 }' but it's a failure.
How could I aggregate the result and turn it into one shell variable that I can use after for the new command?
You could define two field separators:
$ awk -F['= '] '/uuid/{print $12}' file
Result:
uuid-number-one
uuid-number-two
Question 2:
The pattern part in awk just selects lines to process. It doesn't change the internal variables like $1 or NF. You need to do the replacement afterwards:
$ awk '/uuid=/{print gensub(/.*uuid=(\S*).*/, "\\1", "")}' file
Question 3:
var=$(awk -F['= '] '/uuid/{r=r","$12}END{print substr(r,2)}' file)
Implement the actual aggregation for each line (here r=r","$12).
Could you please try following(tested on shown samples and in BASH environment).
awk 'match($0,/uuid=[^ ]*/){print substr($0,RSTART+5,RLENGTH-5)}' Input_file
Solution 2nd: In case your uuid does not contain a space, use the following.
awk '{sub(/.*uuid=/,"");sub(/ .*/,"")} 1' Input_file
Solution 3rd: Using sed, the following may help (assuming the uuid does not contain any spaces).
sed 's/\(.*uuid=\)\([^ ]*\)\(.*\)/\2/' Input_file
Solution 4th: using awk field separator method for shown samples.
awk -F'uuid=| cp' '{print $2}' Input_file
To concatenate all values into a shell variable use following.
shell_var=$(awk 'match($0,/uuid=[^ ]*/){val=val?val OFS substr($0,RSTART+5,RLENGTH-5):substr($0,RSTART+5,RLENGTH-5)} END{print val}' Input_file)
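Since the second query in the question joins the values with OR, here is a sketch that builds that string directly. It reads the two sample log lines from a temporary file rather than from ./getlogs; the variable names log and query are assumptions.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
2018-10-05 09:12:38 286 <190>1 2018-10-05T09:12:38.474640+00:00 app web - - Class uuid=uuid-number-one cp=xxx action='xxxx'
2018-10-05 10:11:23 286 <190>1 2018-10-05T10:11:23.474640+00:00 app web - - Class uuid=uuid-number-two cp=xxx action='xxxx'
EOF

# Collect every uuid value (5 = length of "uuid=") and join with " OR ".
query=$(awk 'match($0, /uuid=[^ ]*/) {
                 id = substr($0, RSTART + 5, RLENGTH - 5)
                 out = (out == "" ? id : out " OR " id)
             }
             END { print out }' "$log")

printf '%s\n' "$query"
rm -f "$log"
```

The result could then be passed to the second command from the question as ./getlogs --search "$query", with the double quotes keeping it a single argument.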