awk: search a file for a pattern stored in a variable, matching only lines that start with that pattern

I want to search a file for a specific pattern that I have in a variable, and print a line only if the pattern is at the start of that line.
I have done it with grep:
grep -n "^$curdate" ./file
Now I want to do the same with awk. This is what I have:
awk -v pat="$input" -F ":" '$0~pat{print NR") "$2 }' ./file
But the problem with the awk code above is that it prints every line that contains the pattern, even if it finds it in the middle of the line and not ONLY at the start!
I think the solution is easy, but I cannot find the syntax for it!

Could you please try the following? Since there are no samples I couldn't test it, but it should work. It uses the regex anchor ^, which means the value must occur at the start of each line.
awk -v pat="$input" -F ":" '$0~"^"pat{print NR") "$2 }' ./file
2nd solution: Using awk's index() function, try the following.
awk -v pat="$input" -F ":" 'index($0,pat)==1{print NR") "$2 }' Input_file
3rd solution: Using awk's substr() function:
awk -v pat="$input" -F ":" 'substr($0,1,length(pat))==pat{print NR") "$2 }' Input_file
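A small self-contained check of the three solutions, assuming made-up sample lines and a pattern that contains a regex metacharacter (the dot). It suggests that solution 1 treats pat as a regular expression, while index() and substr() compare it as plain text. All three are prefix tests, so a line starting with 1.20 also matches 1.2. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data; pat contains ".", which is special in a regex.
sample='1.2:release notes
1x2:not a version
see 1.2:mid-line mention
1.20:later release'
pat='1.2'

# 1) Regex anchor: "^" pat is a dynamic regex, so "." matches any character.
by_regex() {
  printf '%s\n' "$sample" | awk -v pat="$pat" -F ':' '$0 ~ "^" pat { print NR ") " $2 }'
}

# 2) index(): literal search; a result of 1 means the match starts at column 1.
by_index() {
  printf '%s\n' "$sample" | awk -v pat="$pat" -F ':' 'index($0, pat) == 1 { print NR ") " $2 }'
}

# 3) substr(): literal comparison of the first length(pat) characters.
by_substr() {
  printf '%s\n' "$sample" | awk -v pat="$pat" -F ':' 'substr($0, 1, length(pat)) == pat { print NR ") " $2 }'
}

echo "regex:";  by_regex    # lines 1, 2 and 4
echo "index:";  by_index    # lines 1 and 4
echo "substr:"; by_substr   # lines 1 and 4
```

If the key must be the whole first field rather than a prefix, a stricter test is $1 == pat.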

Related

How can I extract using sed or awk between newlines after a specific pattern?

I'd like to know whether there are alternatives, using other bash commands, for printing the range of IPs under #Hiko in my hosts file, other than the sed, tail and head pipeline below, which I figured out myself.
I'm just curious and keen on learning more about bash; I hope to gain more knowledge from the community.
:D
$ sed -n '/#Hiko/,/#Pico/p' /etc/hosts | tail -n +3 | head -n -2
/etc/hosts
#Tito

192.168.1.21
192.168.1.119

#Hiko

192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242

#Pico

192.168.1.23
192.168.1.93
192.168.1.121
1st solution: With the shown samples, please try the following. Written and tested in GNU awk.
awk -v RS= '/#Pico/{exit} /#Hiko/{found=1;next} found' Input_file
Explanation:
awk -v RS= '   ##Start the awk program; RS= (empty) reads the input paragraph by paragraph.
/#Pico/{       ##If the record contains #Pico, do the following.
exit           ##Exit the program.
}
/#Hiko/{       ##If the record contains #Hiko, do the following.
found=1        ##Set found to 1.
next           ##Skip all further statements for this record.
}
found          ##If found is set, print the record.
' Input_file   ##Name of the input file.
2nd solution: Without changing RS (reading line by line), try the following.
awk '/#Pico/{exit} /#Hiko/{found=1;next} NF && found' Input_file
3rd solution: With the shown samples, you could look for the #Hiko record, print the record that follows it, and exit.
awk -v RS= '/#Hiko/{found=1;next} found{print;exit}' Input_file
NOTE: All of the solutions above check whether the string #Hiko or #Pico appears anywhere in the line (or record). If you want to match the exact string, change only the /#Hiko/ and /#Pico/ parts above to /^#Hiko$/ and /^#Pico$/ respectively.
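A runnable sketch of the 1st solution combined with the exact-match patterns from the NOTE. The sample is a trimmed, hypothetical copy of the question's hosts file, assuming a blank line after each header and after each block of addresses (the layout the RS= solutions rely on).

```shell
#!/bin/sh
# Trimmed, hypothetical sample in the same layout as the question's /etc/hosts.
hosts_sample() {
  printf '%s\n' '#Tito' '' '192.168.1.21' '192.168.1.119' '' \
                '#Hiko' '' '192.168.1.243' '192.168.1.125' '' \
                '#Pico' '' '192.168.1.23'
}

# RS= (empty) puts awk in paragraph mode: each blank-line-separated block is one record.
# With ^...$ the whole record must be exactly "#Hiko" or "#Pico".
hiko_block() {
  awk -v RS= '/^#Pico$/ { exit } /^#Hiko$/ { found = 1; next } found'
}

hosts_sample | hiko_block
```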
With sed (checked with GNU sed, syntax might differ for other implementations)
$ sed -n '/#Hiko/{n; :a n; /^$/q; p; ba}' /etc/hosts
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
-n turn off automatic printing of pattern space
/#Hiko/ if line contains #Hiko
n get next line (assuming there's always an empty line)
:a label a
n get next line (using n will overwrite any previous content in the pattern space, so only single line content is present in this case)
/^$/q if the current line is empty, quit
p print the current line
ba branch to label a
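The same steps can be sketched in awk, reading line by line, as a tiny state machine. This is an illustration rather than part of the sed answer; it assumes the same blank-line layout, and the function names and sample are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample in the same layout as the question's /etc/hosts.
sample() {
  printf '%s\n' '#Tito' '' '192.168.1.21' '' \
                '#Hiko' '' '192.168.1.243' '192.168.1.125' '' \
                '#Pico' '' '192.168.1.23'
}

# state 0: looking for #Hiko
# state 1: #Hiko seen; the next line is assumed to be empty and is skipped (like n)
# state 2: print each line until the first empty line, then quit (like /^$/q; p)
hiko_lines() {
  awk 'state == 2 { if ($0 == "") exit; print; next }
       state == 1 { state = 2; next }
       /#Hiko/    { state = 1 }'
}

sample | hiko_lines
```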
You can use
awk -v RS= '/^#Hiko$/{getline;print;exit}' file
awk -v RS= '$0 == "#Hiko"{getline;print;exit}' file
Which means:
RS= - make awk read the file paragraph by paragraph
/^#Hiko$/ or $0 == "#Hiko" - finds a paragraph that is exactly #Hiko
{getline;print;exit} - gets the next paragraph, prints it and exits.
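One detail worth guarding against in the getline version: if #Hiko is the last paragraph, getline returns 0, $0 stays "#Hiko", and the header itself is printed. A sketch that checks the return value (the function name and sample are hypothetical):

```shell
#!/bin/sh
# Paragraph mode; (getline) > 0 confirms that a next paragraph was actually read.
hiko_next_para() {
  awk -v RS= '$0 == "#Hiko" { if ((getline) > 0) print; exit }'
}

printf '%s\n' '#Tito' '' '192.168.1.21' '' '#Hiko' '' '192.168.1.243' '192.168.1.125' '' '#Pico' \
  | hiko_next_para
```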
You may use:
awk -v RS= 'p && NR == p + 1; $1 == "#Hiko" {p = NR}' /etc/hosts
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
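The one-liner above comes without an explanation; an annotated copy, run on a hypothetical sample, may make it easier to follow:

```shell
#!/bin/sh
# Paragraph mode (RS empty): every blank-line-separated block is one record.
#   Rule 1: once p is non-zero, print the record whose number is p + 1.
#   Rule 2: when the first field is "#Hiko", remember that record number in p.
# Unlike the getline version, this keeps reading the input instead of exiting early.
next_after_hiko() {
  awk -v RS= 'p && NR == p + 1; $1 == "#Hiko" { p = NR }'
}

printf '%s\n' '#Tito' '' '192.168.1.21' '' '#Hiko' '' '192.168.1.243' '192.168.1.125' '' '#Pico' '' '192.168.1.23' \
  | next_after_hiko
```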
This might work for you (GNU sed):
sed -n '/^#/h;G;/^[0-9].*\n#Hiko/P' file
Copy the header to the hold buffer.
Append the hold buffer to each line.
If the line begins with a digit and contains the required header, print the first line in the pattern space.
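For comparison, the same "remember the last header" idea can be written in awk, with a variable h standing in for sed's hold space. This is a sketch, not the answer's own method; note that it compares the header exactly, whereas the sed regex also accepts headers that merely start with #Hiko.

```shell
#!/bin/sh
# Keep the most recent "#..." line in h; print digit-leading lines while h is #Hiko.
by_header() {
  awk '/^#/ { h = $0; next } /^[0-9]/ && h == "#Hiko"'
}

printf '%s\n' '#Tito' '' '192.168.1.21' '' '#Hiko' '' '192.168.1.243' '192.168.1.125' '' '#Pico' '' '192.168.1.23' \
  | by_header
```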

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but it prints only https because the separator is :. What is the problem with my grep command, and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : characters in your input, $2 will not work in awk because it gives you only the 2nd field. You actually need the equivalent of cut -d: -f2-, but you also need to check the key name that comes before the first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
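The substr() approach wraps naturally into a small shell function. The name get_value is hypothetical; the sketch reads the file given as $2, or standard input when $2 is omitted (awk treats the operand - as stdin).

```shell
#!/bin/sh
# Print the value for key $1, keeping any later ":" characters that belong to it.
get_value() {
  awk -F: -v k="$1" '$1 == k { print substr($0, length(k FS) + 1) }' "${2:--}"
}

printf '%s\n' 'EXAMPLE_URL:http://www.example.com/' 'GOOGLE_URL:https://www.google.com/' \
  | get_value GOOGLE_URL
```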
Or using GNU grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
Could you please try the following, written and tested with the shown samples in GNU awk. It looks for the string GOOGLE_URL and then extracts the http or https URL; if you need only https, change http[s]? to https in the solution.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation:
awk '                              ##Start the awk program.
/^GOOGLE_URL:/{                    ##If the line starts with GOOGLE_URL:, do the following.
match($0,/http[s]?:\/\/.*/)        ##Use match() to find http, an optional s, then :// through the end of the line.
print substr($0,RSTART,RLENGTH)    ##Print the matched substring (RSTART and RLENGTH are set by match()).
}
' Input_file                       ##Name of the input file.
2nd solution: If you need everything after the first :, try the following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK the following way for this task:
Let file.txt content be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: in GNU AWK, FS may be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line; GOOGLE_URL: in the middle or at the end of a line is therefore not a separator (see the 3rd line of the input). With this FS each line has either 1 or 2 fields, and it has 2 only if the line starts with GOOGLE_URL:. So I check the number of fields (NF) and, in that case, print the 2nd field ($2), because the first field is empty.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
123test
$ awk 'BEGIN{print '"$v"'}'
123
Why is there a difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
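One way to see the difference is to print the exact arguments the shell hands to awk. The helper show_args is hypothetical and only for illustration; each argument is printed wrapped in <...>.

```shell
#!/bin/sh
v=123test

# Print each argument exactly as the shell passes it.
show_args() { printf '<%s>\n' "$@"; }

show_args 'BEGIN{print "'$v'"}'    # <BEGIN{print "123test"}>  awk sees a string literal
show_args 'BEGIN{print '"$v"'}'    # <BEGIN{print 123test}>    the number 123 followed by the unset variable test
show_args 'BEGIN{print ' "$v" '}'  # three separate arguments; awk's program is only "BEGIN{print "
```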
Getting shell variables into awk
This may be done in several ways; some are better than others. This should cover most of them. If you have a comment, please leave it below.
Using -v (The best way, most portable)
Use the -v option. (P.S. Put a space after -v, or it will be less portable, e.g. awk -v var= rather than awk -vvar=.)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most awk implementations, and the variable is available in the BEGIN block as well.
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning: as Ed Morton writes, escape sequences are interpreted, so \t becomes a real tab and not a literal \t, if that is what you are searching for. This can be avoided by using ENVIRON[] or by accessing the value via ARGV[].
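A quick comparison of the three ways of passing the same value, assuming a shell variable that contains a backslash followed by t (not a tab). The function names are hypothetical.

```shell
#!/bin/sh
variable='a\tb'   # backslash and t, not a tab

via_v()       { awk -v var="$variable" 'BEGIN { print var }'; }          # -v: \t becomes a real tab
via_environ() { var="$variable" awk 'BEGIN { print ENVIRON["var"] }'; }  # kept literally
via_argv()    { awk 'BEGIN { print ARGV[1] }' "$variable"; }             # kept literally

via_v
via_environ
via_argv
```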
P.S. If you use a vertical bar or other regex metacharacters such as |, ? or ( as a separator, they must be double-escaped. For example, three vertical bars ||| become -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
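Both separator spellings can be checked on a made-up line; each should print the second field, b. The function names are hypothetical.

```shell
#!/bin/sh
line='a|||b|||c'

# Escaped form: the shell passes \\|\\|\\| unchanged (single quotes), and awk's
# escape processing turns it into the regex \|\|\|, i.e. three literal bars.
split_escaped() { printf '%s\n' "$line" | awk -F'\\|\\|\\|' '{ print $2 }'; }

# Bracket form: inside a bracket expression, | is an ordinary character.
split_bracket() { printf '%s\n' "$line" | awk -F'[|][|][|]' '{ print $2 }'; }

split_escaped
split_bracket
```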
Example of getting data from a program or function into awk (here GNU date is used):
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
This way you can also set a different field separator (FS) for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
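A runnable sketch of per-file separators, using two hypothetical files created in a temporary directory. Each FS=... operand takes effect when awk reaches it in the operand list, i.e. just before the following file is read.

```shell
#!/bin/sh
per_file_fs() {
  dir=$(mktemp -d) || return 1
  printf 'a,b\n' > "$dir/file1.txt"   # comma-separated (hypothetical sample)
  printf 'c;d\n' > "$dir/file2.txt"   # semicolon-separated (hypothetical sample)
  awk '{ print $2 }' FS=',' "$dir/file1.txt" FS=';' "$dir/file2.txt"
  rm -r "$dir"
}

per_file_fs   # prints b, then d
```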
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
Here-string
Variable can also be added to awk using a here-string from shells that support them (including Bash):
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s\n' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use ENVIRON to read environment variables.
Export a variable before running awk, then print it like this:
export X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by extracting the variable within the code, so it becomes a part of it.
If you want to make an awk that changes dynamically with use of variables, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way, or even make it crash with invalid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable, you NEED the shell variable to expand to become part of the text of the awk script before awk interprets it. (see comment below by Ed M.)
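If you do inject an operator this way, one hedged way to limit the code-injection risk described above is to accept only a fixed set of operator strings before building the program. This is a sketch; the list of allowed operators is an assumption.

```shell
#!/bin/sh
calc() {
  # Only these operator strings are allowed into the awk program text.
  case $2 in
    '+'|'-'|'*'|'/'|'%'|'^') ;;
    *) echo "calc: unsupported operator: $2" >&2; return 1 ;;
  esac
  awk -v x="$1" -v z="$3" 'BEGIN { print x '"$2"' z }'
}

calc 2.7 '+' 3.4                                 # 6.1
calc 2.7 '*' 3.4                                 # 9.18
calc 1 '; system("date");' 2 || echo "rejected"  # nothing is executed
```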
Extra info:
Use of double quote
It's always good to double-quote variables: "$variable".
If not, the shell splits the value into words, so multiple lines are joined into one long line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quote:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quote, it does not expand the value of the variable:
awk -v var='$variable' 'BEGIN {print var}'
$variable
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass in the command-line option -v with a variable name (v) and a value (=) taken from the shell variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I adapted Jotne's answer for a for loop:
for i in `seq 11 20`; do host myserver-$i | awk -v i="$i" '{print "myserver-"i" " $4}'; done
I had to insert the date at the beginning of each line of a log file; it's done like this:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
The output can be redirected to another file to save it.
Pro Tip
It can be handy to create a function that handles this, so you don't have to type everything every time. Using the selected solution we get:
awk_switch_columns() {
  awk -v a="$1" -v b="$2" '{ t = $a; $a = $b; $b = t; print }'
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

Combine grep -f and awk

I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' Somefile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
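A runnable sketch of the dynamic-regex version including the suggested next, using hypothetical SomeFile.txt and File.txt contents created in a temporary directory. The line "foo baxr" matches both patterns but is printed once.

```shell
#!/bin/sh
combine_demo() {
  dir=$(mktemp -d) || return 1
  printf 'id1 foo\nid2 ba.r\n' > "$dir/SomeFile.txt"    # column 2 holds the patterns
  printf 'foo baxr\nbazr\nqux\n' > "$dir/File.txt"
  # First file: store column 2 as array keys. Second file: print on the first match.
  awk 'NR == FNR { a[$2]; next } { for (i in a) if ($0 ~ i) { print; next } }' \
    "$dir/SomeFile.txt" "$dir/File.txt"
  rm -r "$dir"
}

combine_demo   # prints "foo baxr" and "bazr"
```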
If the "patterns" are actually fixed strings that must match the whole line, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' Somefile.txt File.txt
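Note that $0 in a only matches lines equal to a key. For fixed strings that may appear anywhere in the line (closer to grep -F -f), index() can replace the regex test. A sketch with hypothetical files; the == line just separates the two outputs.

```shell
#!/bin/sh
fixed_demo() {
  dir=$(mktemp -d) || return 1
  printf 'id1 a.b\n' > "$dir/SomeFile.txt"
  printf 'xa.by\naxb\na.b\n' > "$dir/File.txt"
  # Whole-line match: print a line only if it equals one of the keys.
  awk 'NR == FNR { a[$2]; next } $0 in a' "$dir/SomeFile.txt" "$dir/File.txt"
  echo '=='
  # Substring match: index() treats the key as plain text, so "." is not special.
  awk 'NR == FNR { a[$2]; next } { for (i in a) if (index($0, i)) { print; next } }' \
    "$dir/SomeFile.txt" "$dir/File.txt"
  rm -r "$dir"
}

fixed_demo
```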
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh support that; other shells probably do too, but I haven't tested them.
Simpler than the above, and it works in any shell, is a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
Here - is the argument to -f; it has a special meaning and stands for stdin. Thanks to Tom Fenech for mentioning that!

Using Awk to search for a string that has spaces

I'm having trouble searching for the last occurrence of a string in a file using awk. I'm passing a string to the script, for example "Ping has failed on hostname", and I keep getting awk: ^ unterminated string.
#!/bin/sh
LOG=/opt/netcool/omnibus/log/mttrapd.log
TMP_FILE=sitescope.$$
args="$*"
#ruby sitescope.rb
echo "looking for $1 "
tail -1000 $LOG > $TMP_FILE
echo "WORD = $args"
awk '"/'$args'/" {f=$0} END{print f}' $TMP_FILE > data.out
rm -f $TMP_FILE
Rather than play quoting games, pass the shell variable to awk with the -v option:
awk -v pattern="$*" 'match($0, pattern) {f=$0} END {print f}' "$TMP_FILE" > data.out
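A self-contained sketch of the -v approach reading from a pipe instead of the temporary file; the function name and sample log lines are hypothetical. "$*" joins all arguments with spaces, so the pattern may contain spaces; it is still a regex, and -v processes backslash escapes in it.

```shell
#!/bin/sh
# Print the last line matching the pattern built from all arguments.
last_match() {
  awk -v pattern="$*" 'match($0, pattern) { f = $0 } END { print f }'
}

printf '%s\n' 'Ping has failed on hostA' 'ok' 'Ping has failed on hostB' 'ok' \
  | last_match Ping has failed on
```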
The point of the single quotes around the awk program is to keep everything in one argument (and to prevent shell expansion). You can be a bit more flexible in how you build that argument, for example:
awk "/$args/"' {f=$0} END{print f}' $TMP_FILE > data.out