I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave - I just want to understand why?! (I have written a less elegant way around it now...).
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so did
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
Trouble instead of getting
HeaderUUIiewConsenFlagPSMessage
I get the match with everything from there to the last pipe of the string HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
Now I accept this is because the regexp in /msgcontent1=(.*)[|]/ can match multiple ways, but HOW do I make it match the way I want it to??
With your shown samples please try following. Written and tested in GNU awk this will print only contents from msgcontent1= till | first occurrence.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
OR with echo + awk try:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use gnu awk like this to extract value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this gnu grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^\|]*)/,a)) print a[1] }'
this prints HeaderUUIiewConsenFlagPSMessage
The reason your regex match msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that matching is 'hungry' so it allways finds the longest possible match
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage
I would like to be able to print several substrings via awk.
Here an example of what I usually do;
awk' {print substr($0,index($0,string),10)} ' test.txt > result.txt
This allow me to print 10 letters after the discovery of my string.
But the result is the first one substring, instead of several as I expected.
Here an example if I use the string "ATGC" :
test.txt
ATGCATATAAATGCTTTTTTTTT
result.txt
ATGCATATAA
instead of
ATGCATATAA
ATGCTTTTTT
What I have to add ?
I'm sure the answer is easy for you guys !
Thank you for your help.
If you have gawk (gnu awk), you can make use of FPAT:
awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' file
With your example:
$ awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' <<<"ATGCATATAAATGCTTTTTTTTT"
ATGCATATAA
ATGCTTTTTT
awk '{print substr($0,1,10),RS substr($0,length -12,10)}' file
ATGCATATAA
ATGCTTTTTT
I have this kind of log:
2018-10-05 09:12:38 286 <190>1 2018-10-05T09:12:38.474640+00:00 app web - - Class uuid=uuid-number-one cp=xxx action='xxxx'
2018-10-05 10:11:23 286 <190>1 2018-10-05T10:11:23.474640+00:00 app web - - Class uuid=uuid-number-two cp=xxx action='xxxx'
I need to extract uuid and run a second query with:
./getlogs --search 'uuid-number-one OR uuid-number-two'
For the moment for the first query I do this to extract uuid:
./getlogs | grep 'uuid' | awk 'BEGIN {FS="="} { print $2 }' | cut -d' ' -f1
My three question :
I think I could get rid of grep and cut and use only awk?
How could I capture only the value of uuid. I tried awk '/uuid=\S*/{ print $1 }' or awk 'BEGIN {FS="uuid=\\S*"} { print $1 }' but it's a failure.
How could I aggregate the result and turn it into one shell variable that I can use after for the new command?
You could define two field separators:
$ awk -F['= '] '/uuid/{print $12}' file
Result:
uuid-number-one
uuid-number-two
Question 2:
The pattern part in awk just selects lines to process. It doesn't change the internal variables like $1 or NF. You need to do the replacement afterwards:
$ awk '/uuid=/{print gensub(/.*uuid=(\S*).*/, "\\1", "")}' file
Question 3:
var=$(awk -F['= '] '/uuid/{r=r","$12}END{print substr(r,2)}' file)
Implement the actual aggregation for each line (here r=r","$12).
Could you please try following(tested on shown samples and in BASH environment).
awk 'match($0,/uuid=[^ ]*/){print substr($0,RSTART+5,RLENGTH-5)}' Input_file
Solution 2nd: In case your uid is not having space in it then use following.
awk '{sub(/.*uuid=/,"");sub(/ .*/,"")} 1' Input_file
solution 3rd: using sed following may help you(considering that uid is not having any space in its values).
sed 's/\(.*uuid=\)\([^ ]*\)\(.*\)/\2/' Input_file
Solution 4th: using awk field separator method for shown samples.
awk -F'uuid=| cp' '{print $2}' Input_file
To concatenate all values into a shell variable use following.
shell_var=$(awk 'match($0,/uuid=[^ ]*/){val=val?val OFS substr($0,RSTART+5,RLENGTH-5):substr($0,RSTART+5,RLENGTH-5)} END{print val}' Input_file)
I have a file, from which I want to retrieve the first column, and add a comma between each value.
Example:
AAAA 12345 xccvbn
BBBB 43431 fkodks
CCCC 51234 plafad
to obtain
AAAA,BBBB,CCCC
I decided to use awk, so I did
awk '{ $1=$1","; print $1 }'
Problem is: this add a comma also on the last value, which is not what I want to achieve, and also I get a space between values.
How do I remove the comma on the last element, and how do I remove the space? Spent 20 minutes looking at the manual without luck.
$ awk '{printf "%s%s",sep,$1; sep=","} END{print ""}' file
AAAA,BBBB,CCCC
or if you prefer:
$ awk '{printf "%s%s",(NR>1?",":""),$1} END{print ""}' file
AAAA,BBBB,CCCC
or if you like golf and don't mind it being inefficient for large files:
$ awk '{r=r s $1;s=","} END{print r}' file
AAAA,BBBB,CCCC
awk {'print $1","$2","$3'} file_name
This is the shortest I know
Why make it complicated :) (as long as file is not too large)
awk '{a=NR==1?$1:a","$1} END {print a}' file
AAAA,BBBB,CCCC
For better porability.
awk '{a=(NR>1?a",":"")$1} END {print a}' file
You can do this:
awk 'a++{printf ","}{printf "%s", $1}' file
a++ is interpreted as a condition. In the first row its value is 0, so the comma is not added.
EDIT:
If you want a newline, you have to add END{printf "\n"}. If you have problems reading in the file, you can also try:
cat file | awk 'a++{printf ","}{printf "%s", $1}'
awk 'NR==1{printf "%s",$1;next;}{printf "%s%s",",",$1;}' input.txt
It says: If it is first line only print first field, for the other lines first print , then print first field.
Output:
AAAA,BBBB,CCCC
In this case, as simple cut and paste solution
cut -d" " -f1 file | paste -s -d,
In case somebody as me wants to use awk for cleaning docker images:
docker image ls | grep tag_name | awk '{print $1":"$2}'
Surpised that no one is using OFS (output field separator). Here is probably the simplest solution that sticks with awk and works on Linux and Mac: use "-v OFS=," to output in comma as delimiter:
$ echo '1:2:3:4' | awk -F: -v OFS=, '{print $1, $2, $4, $3}' generates:
1,2,4,3
It works for multiple char too:
$ echo '1:2:3:4' | awk -F: -v OFS=., '{print $1, $2, $4, $3}' outputs:
1.,2.,4.,3
Using Perl
$ cat group_col.txt
AAAA 12345 xccvbn
BBBB 43431 fkodks
CCCC 51234 plafad
$ perl -lane ' push(#x,$F[0]); END { print join(",",#x) } ' group_col.txt
AAAA,BBBB,CCCC
$
This can be very simple like this:
awk -F',' '{print $1","$1","$2","$3}' inputFile
where input file is : 1,2,3
2,3,4 etc.
I used the following, because it lists the api-resource names with it, which is useful, if you want to access it directly. I also use a label "application" to find specific apps in a namespace:
kubectl -n ops-tools get $(kubectl api-resources --no-headers=true --sort-by=name | awk '{printf "%s%s",sep,$1; sep=","}') -l app.kubernetes.io/instance=application
I want to find every line of a file that contains any of the strings held in a column of a different file.
I have tried
grep "$(awk '{ print $1 }' file1.txt)" file2.txt
but that just outputs file2.txt in its entirety.
I know I've done this before with a pattern I found on this site, but I can't find that question anymore.
I see in the OP's comment that maybe the question is no longer a question. However, the following slight modification will handle the blank line situation. Just add a check to make sure the line has at least one field:
grep "$(awk '{if (NF > 0) print $1}' file1)" file2
And if the file with the patterns is simply a set of patterns per line, then a much simpler version of it is:
grep -f file1 file2
That causes grep to use the lines in file1 as the patterns.
THere is no need to use grep when you have awk
awk 'FNR==NR&&NF{a[$0];next}($1 in a)' file2 file1
$(awk '{ print $1 }' file1.txt) | grep text > file.txt