I am new to awk and want to use it in a shell script to select an AMI name in my automation pipeline.
{"us-west_ami" :"ami-123" }
I want to select "ami-123" and pass it to a new job.
I tried to use print $NF but it is not selecting the last value.
If it is JSON, use the right tool, e.g. jq:
kent$ jq '."us-west_ami"' <<<'{"us-west_ami" :"ami-123" }'
"ami-123"
print $NF does select the last field, but first you need to define the record and field separators (RS and FS). In this case it would be easiest to use GNU awk and define FPAT:
$ awk 'BEGIN{FPAT="\"[^\"]+\""}{print $NF}' file
"ami-123"
See the GNU awk manual for more details on FPAT.
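As a portable sketch of the field-separator point above (this variant is not from the answer, and it assumes the one-line input shown in the question), setting FS to a double quote makes the value the fourth field in any POSIX awk:

```shell
# Assumption: the input is the single-line sample from the question.
json='{"us-west_ami" :"ami-123" }'

# Splitting on double quotes puts the key in field 2 and the value in field 4,
# so the value comes out without its surrounding quotes.
ami=$(printf '%s\n' "$json" | awk -F'"' '{ print $4 }')

printf '%s\n' "$ami"
```

This only holds while the key/value layout stays exactly like the sample; for anything closer to real JSON, jq remains the safer choice.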
grep is not the right tool to parse JSON, but for the given input this will work:
$ grep -oP '"us-west_ami"(\s+)?:\K[^,}]*' <<< '{"us-west_ami" :"ami-123" }'
"ami-123"
To save it in a variable:
$ myvar=$(grep -oP '"us-west_ami"(\s+)?:\K[^,}]*' <<<'{"us-west_ami" :"ami-123" }')
$ echo "$myvar"
"ami-123"
cat file1.txt | awk -F '{print $1 "|~|" $2 "|~|" $3}' > file2.txt
I am using the above command to extract the first three columns from file1.txt and write them to file2.txt.
But only getting the column names and not the column data.
How to do that?
|~| - is the delimiter.
file1.txt has values as :
a|~|b|~|c|~|d|~|e
1|~|2|~|3|~|4|~|5
11|~|22|~|33|~|44|~|55
111|~|222|~|333|~|444|~|555
My expected output is:
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
With your shown samples, please try the following awk code. You need to set the field separator to |~| and remove leading whitespace from the lines, then print the fields.
awk -F'\\|~\\|' -v OFS='|~|' '{sub(/^[[:blank:]]+/,"");print $1,$2,$3}' Input_file
In case you want to keep the spaces (which were in the initial post before the edit), try the following:
awk -F'\\|~\\|' -v OFS='|~|' '{print $1,$2,$3}' Input_file
NOTE: After a chat with the user, it turned out the code was not working because gunzip -c file was being used incorrectly: its output was being saved into a variable, and the awk program was then run on that variable. Correcting that command produced the right file, and the awk program ran fine on it. Adding this as a reference for future readers.
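The note above can be sketched as follows: rather than storing the gunzip -c output in a variable, pipe it straight into awk. The file name and sample rows here are placeholders, and the bracket-expression separator [|]~[|] is just an equivalent spelling of the escaped one used above.

```shell
# Hypothetical sample data, compressed to mimic the setup described in the note.
tmpdir=$(mktemp -d)
printf '%s\n' 'a|~|b|~|c|~|d|~|e' '1|~|2|~|3|~|4|~|5' | gzip -c > "$tmpdir/file1.txt.gz"

# Stream the decompressed lines directly into awk instead of saving them in a variable.
first_three=$(gunzip -c "$tmpdir/file1.txt.gz" |
  awk -F'[|]~[|]' -v OFS='|~|' '{ print $1, $2, $3 }')

printf '%s\n' "$first_three"
rm -rf "$tmpdir"
```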
One approach would be:
awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
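This approach replaces every "|~|" with a "," (the input field separator) and sets the output field separator to "|~|", so it assumes the data itself contains no commas. All leading whitespace is trimmed with sub(). Note that \s is a GNU awk extension; [[:space:]] is the portable spelling.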
Example Use/Output
With your data in file1.txt, you would have:
$ awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
Let me know if this is what you intended. You can simply redirect, e.g. > file2.txt to write to the second file.
For such cases, my bash+awk script rcut comes in handy:
rcut -Fd'|~|' -f-3 ip.txt
The -F option enables fixed string input delimiter (which is given using the -d option). And by default, the output field separator will also be same as -d when -F is active. -f-3 is similar to cut syntax to specify first three fields.
For better speed, use hck command:
hck -Ld'|~|' -D'|~|' -f-3 ip.txt
Here, -L enables literal field separator and -D specifies output field separator.
Another benefit is that hck supports -z option to automatically handle common compressed formats based on filename extension (adding this since OP had an issue with compressed input).
Another way:
sed 's/|~|/\t/g' file1.txt | awk '{print $1"|~|"$2"|~|"$3}' > file2.txt
First replace the |~| delimiter with a tab, then use the default awk separator and print the columns you need.
I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but my script prints only https because the separator is :. What is the problem with my command, and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : in your input, getting $2 will not work in awk because it will just give you 2nd field. You actually need an equivalent of cut -d: -f2- but you also need to check key name that comes before first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
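The cut -d: -f2- idea mentioned above can also be used directly. A minimal sketch of the asker's script rewritten that way, assuming input.txt holds one KEY:VALUE pair per line (a temporary file stands in for it here):

```shell
# Stand-in for input.txt from the question.
input=$(mktemp)
printf '%s\n' 'GOOGLE_URL:https://www.google.com/' > "$input"

# Select the line by its key, then keep everything after the first colon.
URL=$(grep '^GOOGLE_URL:' "$input" | cut -d: -f2-)

printf '%s\n' "$URL"
rm -f "$input"
```

Using $(...) instead of backticks and printf '%s\n' instead of embedding the variable in the format string keeps the URL from being interpreted as a printf format.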
Could you please try the following, written and tested with the shown samples in GNU awk. It looks for the string GOOGLE_URL and captures the http or https URL that follows; if you need only https, change http[s]? to https in the solution below.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^GOOGLE_URL:/{ ##Checking condition if line starts from GOOGLE_URL: then do following.
match($0,/http[s]?:\/\/.*/) ##Using match function to match http[s](s optional) : till last of line here.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched value from above function.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: In case you need everything that comes after the first :, try the following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK in the following way for that task.
Let the content of file.txt be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: In GNU AWK, FS may be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line, so that GOOGLE_URL: in the middle or at the end of a line does not act as a separator (consider the 3rd line of input). With this FS there are either 1 or 2 fields in each line; the latter is the case only if the line starts with GOOGLE_URL:, so I check the number of fields (NF) and, in that case, print the 2nd field ($2), as the first field is then empty.
(tested in gawk 4.2.1)
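If gawk is not available, the same "key must start the line" check can be sketched without a regex FS. This is an alternative to the answer above, not its method; it relies only on index() and substr(), and the variable name p is arbitrary:

```shell
# Same three sample lines as in the answer above.
sample=$(mktemp)
printf '%s\n' 'EXAMPLE_URL:http://www.example.com/' \
              'GOOGLE_URL:https://www.google.com/' \
              'KEY:GOOGLE_URL:' > "$sample"

# index() returns 1 only when the line begins with the prefix, so the third
# line (prefix at position 5) is skipped. substr() then drops the prefix.
url=$(awk -v p='GOOGLE_URL:' 'index($0, p) == 1 { print substr($0, length(p) + 1) }' "$sample")

printf '%s\n' "$url"
rm -f "$sample"
```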
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile
I would like to be able to print several substrings via awk.
Here is an example of what I usually do:
awk '{print substr($0,index($0,string),10)}' test.txt > result.txt
This allows me to print 10 letters starting where my string is found.
But the result is only the first substring, instead of several as I expected.
Here is an example using the string "ATGC":
test.txt
ATGCATATAAATGCTTTTTTTTT
result.txt
ATGCATATAA
instead of
ATGCATATAA
ATGCTTTTTT
What do I have to add?
I'm sure the answer is easy for you guys !
Thank you for your help.
If you have gawk (GNU awk), you can make use of FPAT:
awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' file
With your example:
$ awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' <<<"ATGCATATAAATGCTTTTTTTTT"
ATGCATATAA
ATGCTTTTTT
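Without gawk, one way to get every occurrence is to loop with index() over the unread remainder of the line. This is a sketch of an alternative, not the FPAT method above, and the variable names s and rest are arbitrary. Unlike FPAT it resumes searching just after each found string, so occurrences closer than 10 characters apart are also reported, and a match near the end of the line is printed even if fewer than 10 characters remain.

```shell
line='ATGCATATAAATGCTTTTTTTTT'

result=$(printf '%s\n' "$line" | awk -v s='ATGC' '{
  rest = $0
  # Keep searching the remainder until the string no longer occurs.
  while ((p = index(rest, s)) > 0) {
    print substr(rest, p, 10)            # up to 10 characters from the match
    rest = substr(rest, p + length(s))   # continue just past this occurrence
  }
}')

printf '%s\n' "$result"
```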
awk '{print substr($0,1,10),RS substr($0,length -12,10)}' file
ATGCATATAA
ATGCTTTTTT
I would like to take the number after the - sign and put it as column 2 in my matrix. I know how to grep the string but not how to print it after the text string.
in:
1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC
4-376323 GGCTGGTCCGATGGTAGTGGGTTATCAGAAC
5-221398 GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT
6-180339 TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT
out:
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
awk -F'[[:space:]-]+' '{print $3,$2}' file
Seems like a simple substitution should do the job:
sed -E 's/[0-9]+-([0-9]+)[[:space:]]*(.*)/\2 \1/' file
Capture the parts you're interested in and use them in the replacement.
Alternatively, using awk:
awk 'sub(/^[0-9]+-/, "") { print $2, $1 }' file
Remove the leading digits and - from the start of the line. When this is successful, sub returns true, so the action is performed, printing the second field, followed by the first.
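A small run illustrating how the return value of sub() acts as the condition. The second input line is a made-up one without the leading number, added only to show that it is skipped:

```shell
# One line from the question plus a hypothetical line that lacks the N- prefix.
out=$(printf '%s\n' '1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT' 'no prefix here' |
  awk 'sub(/^[0-9]+-/, "") { print $2, $1 }')

printf '%s\n' "$out"
```

Only the first line is printed, because sub() returns 0 for the second line and the action is never run for it.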
Using regex ( +|-) as field separator:
$ awk -F"( +|-)" '{print $3,$2}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
Here is another awk:
$ awk 'split($1,a,"-") {print $2,a[2]}' file
awk '{sub(/.-/,"");print $2,$1}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' SomeFile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
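The two blocks described above can be tried on throwaway data. The file contents below are invented for illustration, and the variant with next is used so each matching line is printed once:

```shell
# Hypothetical stand-ins for SomeFile.txt and File.txt.
tmpdir=$(mktemp -d)
printf '%s\n' 'x foo' 'y bar' > "$tmpdir/SomeFile.txt"
printf '%s\n' 'foo line' 'baz' 'bar and foo' > "$tmpdir/File.txt"

# First file: collect column 2 as array keys. Second file: print each line
# that matches any key, stopping at the first hit.
matches=$(awk 'NR==FNR { a[$2]; next }
               { for (i in a) if ($0 ~ i) { print; next } }' \
          "$tmpdir/SomeFile.txt" "$tmpdir/File.txt")

printf '%s\n' "$matches"
rm -rf "$tmpdir"
```

The line "bar and foo" matches both keys but appears only once in the result.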
If the "patterns" are actually fixed strings that must equal the whole line, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' SomeFile.txt File.txt
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh support that; other shells probably do too, but I haven't tested them.
Simpler than the above, and supported by all shells, would be to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!