use awk to print a column, adding a comma - awk

I have a file from which I want to retrieve the first column and add a comma between the values.
Example:
AAAA 12345 xccvbn
BBBB 43431 fkodks
CCCC 51234 plafad
to obtain
AAAA,BBBB,CCCC
I decided to use awk, so I did
awk '{ $1=$1","; print $1 }'
Problem is: this adds a comma after the last value too, which is not what I want, and I also get a space between the values.
How do I remove the comma after the last element, and how do I remove the space? I spent 20 minutes looking at the manual without luck.

$ awk '{printf "%s%s",sep,$1; sep=","} END{print ""}' file
AAAA,BBBB,CCCC
or if you prefer:
$ awk '{printf "%s%s",(NR>1?",":""),$1} END{print ""}' file
AAAA,BBBB,CCCC
or if you like golf and don't mind it being inefficient for large files:
$ awk '{r=r s $1;s=","} END{print r}' file
AAAA,BBBB,CCCC

awk '{print $1","$2","$3}' file_name
This is the shortest I know

Why make it complicated :) (as long as file is not too large)
awk '{a=NR==1?$1:a","$1} END {print a}' file
AAAA,BBBB,CCCC
For better portability:
awk '{a=(NR>1?a",":"")$1} END {print a}' file

You can do this:
awk 'a++{printf ","}{printf "%s", $1}' file
a++ is used as a condition (a pattern). On the first row it evaluates to 0 (false), so no comma is printed; on every later row it is non-zero (true), so a comma is printed before the value.
EDIT:
If you want a newline, you have to add END{printf "\n"}. If you have problems reading in the file, you can also try:
cat file | awk 'a++{printf ","}{printf "%s", $1}'
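Putting the pieces together, a minimal sketch of the full command with a trailing newline (same approach as above):
awk 'a++{printf ","} {printf "%s", $1} END{printf "\n"}' file
With the sample input from the question this prints AAAA,BBBB,CCCC.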

awk 'NR==1{printf "%s",$1;next;}{printf "%s%s",",",$1;}' input.txt
It says: if it is the first line, print only the first field; for the other lines, first print a comma, then print the first field.
Output:
AAAA,BBBB,CCCC

In this case, a simple cut-and-paste solution:
cut -d" " -f1 file | paste -s -d,

In case somebody like me wants to use awk for cleaning up docker images:
docker image ls | grep tag_name | awk '{print $1":"$2}'

Surprised that no one is using OFS (the output field separator). Here is probably the simplest solution that sticks with awk and works on Linux and Mac: use "-v OFS=," to output with a comma as the delimiter:
$ echo '1:2:3:4' | awk -F: -v OFS=, '{print $1, $2, $4, $3}' generates:
1,2,4,3
It works for multi-character separators too:
$ echo '1:2:3:4' | awk -F: -v OFS=., '{print $1, $2, $4, $3}' outputs:
1.,2.,4.,3

Using Perl
$ cat group_col.txt
AAAA 12345 xccvbn
BBBB 43431 fkodks
CCCC 51234 plafad
$ perl -lane ' push(@x,$F[0]); END { print join(",",@x) } ' group_col.txt
AAAA,BBBB,CCCC
$

This can be done very simply, like this:
awk -F',' '{print $1","$1","$2","$3}' inputFile
where the input file is:
1,2,3
2,3,4
etc.

I used the following because it also lists the api-resource names, which is useful if you want to access them directly. I also use the label "application" to find specific apps in a namespace:
kubectl -n ops-tools get $(kubectl api-resources --no-headers=true --sort-by=name | awk '{printf "%s%s",sep,$1; sep=","}') -l app.kubernetes.io/instance=application

Related

Regexp in gawk matches multiple ways

I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave - I just want to understand why?! (I have written a less elegant way around it now...).
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so did
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
Trouble is, instead of getting
HeaderUUIiewConsenFlagPSMessage
I get the match with everything from there up to the last pipe of the string: HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
Now I accept this is because the regexp in /msgcontent1=(.*)[|]/ can match multiple ways, but HOW do I make it match the way I want it to??
With your shown samples, please try the following. Written and tested in GNU awk, this will print only the contents after msgcontent1=, up to the first occurrence of |.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
OR with echo + awk try:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use gnu awk like this to extract value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this gnu grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^\|]*)/,a)) print a[1] }'
this prints HeaderUUIiewConsenFlagPSMessage
The reason your regex matches msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that regex matching is greedy: it always finds the longest possible match.
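A quick way to see that greediness on a made-up toy string (not from the question):
$ echo 'axbxb' | awk '{ match($0, /a.*b/); print substr($0, RSTART, RLENGTH) }'
axbxb
$ echo 'axbxb' | awk '{ match($0, /a[^b]*b/); print substr($0, RSTART, RLENGTH) }'
axb
The first pattern takes the longest possible match; the negated character class in the second stops at the first b, which is the same trick [^|]* uses in the answers above.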
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage

Why does awk not filter the first column in the first line of my files?

I've got a file with following records:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt;2;CLI001
depots/import/HDN1YYAA_20102018.txt;32;CLI001
depots/import/HDN1YYAA_25102018.txt;1;CAB001
depots/import/HDN1YYAA_50102018.txt;1;CAB001
depots/import/HDN1YYAA_65102018.txt;1;CAB001
depots/import/HDN1YYAA_80102018.txt;2;CLI001
depots/import/HDN1YYAA_93102018.txt;2;CLI001
When I execute following oneliner awk:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR==1){print $1}}END {}'
the output is not the expected:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
while I am supposed to get only the first column.
If I run it through all the records:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR>0){print $1}}END {}'
then it starts splitting only from the second line, and I get the following output:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_25102018.txt
depots/import/HDN1YYAA_50102018.txt
depots/import/HDN1YYAA_65102018.txt
depots/import/HDN1YYAA_80102018.txt
depots/import/HDN1YYAA_93102018.txt
Does anybody know why awk is skipping only the first line?
I tried deleting the first record, but the behaviour is the same: it still skips the first line.
The reason is that FS=";" in your command is used as a pattern: the assignment is evaluated for every record, but only after that record has already been split with the previous field separator, so the first line is still split on the default whitespace.
First, it should be
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}END {}' filename
You can omit the END block if it is empty:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}' filename
You can use the -F command line argument to set the field delimiter:
awk -F';' '{if(NR==1){print $1}}' filename
Furthermore, awk programs consist of a sequence of CONDITION [{ACTIONS}] elements, so you can omit the if:
awk -F';' 'NR==1 {print $1}' filename
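To see the pitfall directly, here is a small demonstration on an inline two-line sample (not your data):
$ printf 'a;b\nc;d\n' | awk 'FS=";" {print $1}'
a;b
c
$ printf 'a;b\nc;d\n' | awk -F';' '{print $1}'
a
c
In the first run the assignment only takes effect from the second record on; in the second run the separator is set before any line is read.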
You need to specify the delimiter either in a BEGIN block or as a command-line option:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}'
awk -F ';' '{ if(NR==1){print $1}}'
cut might be better suited here; for all lines:
$ cut -d';' -f1 file
to skip the first line
$ sed 1d file | cut -d';' -f1
to get the first line only
$ sed 1q file | cut -d';' -f1
however at this point it's better to switch to awk
if you have a large file and are only interested in the first line, it's better to exit early:
$ awk -F';' '{print $1; exit}' file

Get only part of a file name in Awk

I have tried
awk '{print FILENAME}'
And the result was the full path of the file.
I want to get only the file name: for example, from "test/testing.test.txt" I just want "testing", without ".test.txt".
Use -F to delimit by the period and print the first string before that delimiter:
awk -F'.' '{ print $1 }'
Alternatively,
ls -l | awk '{ print $9 }' | awk -F"." '{ print $1 }'
will run through the whole folder👍
(there's a fancier way to do it, but that's easy).
Use the sub and/or split functions to extract the part of FILENAME you want.
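A minimal sketch of the split approach, assuming the path test/testing.test.txt from the question and that the wanted part is everything before the first dot of the basename:
awk 'FNR==1 {
    n = split(FILENAME, parts, "/")   # the basename is the last "/"-separated piece
    split(parts[n], name, ".")        # keep only the part before the first "."
    print name[1]                     # prints "testing"
}' test/testing.test.txt
FNR==1 makes it run once per input file; adapt the indices if you want a different part of the name.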

initialising field separators on condition in awk

I know that initialising FS in BEGIN is the correct practice, but what if I need different field separators for different lines (lines containing a particular pattern)? E.g., my awk script is
{if($0 ~ /.*youtube.*/){FS="=";print $2}}
This code is not processing the first line. How do I fix this?
You can use split. E.g., to get the middle value ("green") from the third field:
echo "on,cat ,blue|green|red,more" | awk -F, '{split($3,a,"|");print a[2]}'
green
And the BEGIN block is not the only place where you can set the field separator:
echo "on,two,three" | awk -F, '{print $2}'
echo "on,two,three" | awk '{print $2}' FS=,
echo "on,two,three" | awk 'BEGIN{FS=","} {print $2}'
echo "on,two,three" | awk -v FS=, '{print $2}'
All these will print two
But they differ in when the assignment takes effect:
awk -F, 'BEGIN{print FS}'
,
while this does not print the comma, because the trailing FS=, assignment is only processed when awk starts reading input, i.e. after the BEGIN block has already run:
awk 'BEGIN{print FS}' FS=,
Back to your problem:
This:
awk '{if($0 ~ /.*youtube.*/){FS="=";print $2}}' file
should be:
awk '{if($0 ~ /.*youtube.*/){split($0,a,"=");print a[2]}}' file
You do not need to test for any characters before and after the regex, so:
awk '{if($0 ~ /youtube/){split($0,a,"=");print a[2]}}' file
And this could be simplified even more:
awk '/youtube/ {split($0,a,"=");print a[2]}' file
If the data is like this:
cat file
youtube=thisisyoutube1 //starts here
youtube=thisisyoutube2
youtube=thisisyoutube3
youtube=thisisyoutube4
yautube=thisisnottobeprinted
Then do it like this:
awk -F= '/youtube/ {split($2,a," ");print a[1]}' file
thisisyoutube1
thisisyoutube2
thisisyoutube3
thisisyoutube4

strip out value from return using awk

I'm not sure how to strip out the "DST=" from these lines.
Here is my command (it's returning what it should), and please, if there is a more efficient or better way, feel free to criticize.
awk '{print $10}' iptables.log |sort -u
DST=96.7.49.64
DST=96.7.49.65
DST=96.7.50.64
DST=98.27.88.26
DST=98.27.88.28
DST=98.27.88.45
DST=98.27.88.50
As you can see, I need to grab unique IPs from an iptables log.
Thanks!
If you don't mind the unsorted output, here's a better way using awk:
awk '!a[$10]++ { sub(/DST=/,"",$10); print $10 }' file
or you can do the substitution inside awk itself, using its equivalent sub() function, i.e.
awk '{sub(/DST=/,"",$10); print $10}' iptables.log |sort -u
Update:
Is there any way to key just on DST=, regardless of whether it's at field 10 or 11?
awk '$10~/^DST=/{sub(/DST=/,"",$10); print $10};$11~/^DST=/{sub(/DST=/,"",$11); print $11}' iptables.log | sort -u
OR
awk '{ for (i=9; i<13; i++) {
           if ($i ~ /^DST=/) { sub(/DST=/, "", $i); print $i }
       }
     }' iptables.log | sort -u
Note that you can change the range of fields to check and print; I'm testing fields 9-12 just as an example. Variables in awk like $i refer to the i-th field of the current line, just like $1, $9, $87, etc.
As I don't have an iptables.log to test with, I can't verify it beyond confirming that the awk syntax doesn't fail. If this doesn't work, please post 2-4 sample lines of simplified data.
IHTH
You could pipe the result of your output through sed to remove the DST= from each line:
awk '{print $10}' iptables.log | sed 's/^DST=//' | sort -u
awk '{split($10,a,"=");b[a[2]];next}END{for(i in b)print i}' iptables.log
This splits field 10 on "=", uses the part after it as an array key (which removes duplicates), and prints the unique values in the END block.