Put a comma in a specific column - awk

I would like to know how to put a comma after a specific column (at a given space). For example:
a b c d e
And I would like this.
a b c d, e
A comma in the 4th space.
I tried with this command.
awk -F '{print $4}' < file.txt | cut -d"," -f4-

$ awk '{$4=$4","}1' file
a b c d, e

If your Input_file has only 5 fields (or it has more fields and you want to do this for the second-to-last field), then the following may also help:
awk '{$(NF-1)=$(NF-1)","} 1' Input_file
Or, with sed, simply replace the 4th space with a comma and a space as follows.
sed 's/ /, /4' Input_file
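For a column number other than the 4th, the same idea can be parameterized. The following is a minimal sketch, not taken from the answers above: the function name add_comma_after_field is hypothetical, and it assumes whitespace-separated fields. Because it assigns to a field, awk rebuilds the line with single spaces between the fields.

```shell
# Hypothetical helper: append a comma to field number $1 of every input line.
# Lines with fewer than $1 fields are printed unchanged.
add_comma_after_field() {
    awk -v n="$1" 'n >= 1 && n <= NF { $n = $n "," } 1'
}

printf 'a b c d e\n' | add_comma_after_field 4
# a b c d, e
```

Passing the column number with -v keeps the awk program itself constant, so no shell quoting tricks are needed.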

echo a b c d e| awk '{$0=gensub(/ /,", ",4)}1'
a b c d, e

print specific value from 7th column using pattern matching along with first 6 columns

file1
1 123 ab456 A G PASS AC=0.15;FB=1.5;BV=45; 0|0 0|0 0|1 0|0
4 789 ab123 C T PASS FB=90;AC=2.15;BV=12; 0|1 0|1 0|0 0|0
desired output
1 123 ab456 A G PASS AC=0.15
4 789 ab123 C T PASS AC=2.15
I used
awk '{print $1,$2,$3,$4,$5,$6,$7}' file1 > out1.txt
sed -i 's/;/\t/g' out1.txt
awk '{print $1,$2,$3,$4,$5,$6,$7,$8}' out1.txt
output generated
1 123 ab456 A G PASS AC=0.15
4 789 ab123 C T PASS FB=90
I want to print first 6 columns along with value of AC=(*) from 7th column.
With your shown samples, please try the following awk code.
awk '
{
val=""
while(match($7,/AC=[^;]*/)){
val=(val?val:"")substr($7,RSTART,RLENGTH)
$7=substr($7,RSTART+RLENGTH)
}
print $1,$2,$3,$4,$5,$6,val
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
val="" ##Nullifying val here.
while(match($7,/AC=[^;]*/)){ ##Running while loop to use match function to match AC= till semi colon all occurrences here.
val=(val?val:"")substr($7,RSTART,RLENGTH) ##Creating val and keep adding matched regex value to it, from 7th column.
$7=substr($7,RSTART+RLENGTH) ##Assigning rest pending values to 7th column itself.
}
print $1,$2,$3,$4,$5,$6,val ##Printing appropriate columns required by OP along with val here.
}
' Input_file ##Mentioning Input_file name here.
$ awk '{
n=split($7,a,/;/) # split $7 on ;s
for(i=1;i<=n&&a[i]!~/^AC=/;i++); # just loop looking for AC
print $1,$2,$3,$4,$5,$6,a[i] # output
}' file
Output:
1 123 ab456 A G PASS AC=0.15
4 789 ab123 C T PASS AC=2.15
If AC= is not found, an empty field is output instead.
Any time you have tag=value pairs in your data I find it best to first populate an array (f[] below) to hold those tag-value mappings so you can print/test/rearrange those values by their tags (names).
Using any awk in any shell on every Unix box:
$ cat tst.awk
{
split("",f) # clear the mappings left over from the previous line
n = split($7,tmp,/[=;]/)
for (i=1; i<n; i+=2) {
f[tmp[i]] = tmp[i] "=" tmp[i+1]
}
sub(/[[:space:]]*[^[:space:]]+;.*/,"")
print $0, f["AC"]
}
$ awk -f tst.awk file
1 123 ab456 A G PASS AC=0.15
4 789 ab123 C T PASS AC=2.15
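The tag-to-value array idea above extends naturally to looking up any tag by name. This is only a sketch under stated assumptions: the function name print_tag is hypothetical, the tag=value pairs are assumed to sit in the 7th field separated by semicolons, and the array is cleared for every line so that a value from an earlier line cannot leak into a line that lacks the tag.

```shell
# Hypothetical helper: print fields 1-6 followed by the tag=value pair named by $1.
print_tag() {
    awk -v tag="$1" '{
        split("", f)                    # portable way to empty the array
        n = split($7, tmp, /[=;]/)      # tmp holds tag, value, tag, value, ...
        for (i = 1; i < n; i += 2)
            f[tmp[i]] = tmp[i] "=" tmp[i+1]
        print $1, $2, $3, $4, $5, $6, f[tag]
    }'
}

printf '%s\n' '1 123 ab456 A G PASS AC=0.15;FB=1.5;BV=45; 0|0 0|0 0|1 0|0' | print_tag FB
# 1 123 ab456 A G PASS FB=1.5
```

If the requested tag is missing from a line, the last output field is simply empty.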
This might work for you (GNU sed):
sed -nE 's/^((\S+\s){6})\S*;?(AC=[^;]*);.*/\1\3/p' file
Turn off implicit printing -n and add easier regexp -E.
Match the first six fields and their delimiters and append the AC tag and its value from the next.
With only GNU sed:
$ sed -r 's/(\S+;)?(AC=[^;]*).*/\2/' file1
1 123 ab456 A G PASS AC=0.15
4 789 ab123 C T PASS AC=2.15
Lines without an AC=... part in the 7th field will be printed without modification. If you prefer removing the 7th field and the end of the line, use:
$ sed -r 's/(\S+;)?(AC=[^;]*).*/\2/;t;s/\S+;.*//' file1

awk shrinks multiple spaces to one [duplicate]

I'm trying to get rid of the first column using awk. If I assign an empty string to the first column, then all runs of spaces between the other columns are shrunk to a single space. How can I disable this space shrinking?
$ echo 'a b c' | awk '{print $0}'
a b c
$ echo 'a b c' | awk '{$1=""; print $0}'
b c
I'm using the standard awk from the Ubuntu repo
$ dpkg -l | grep awk
ii mawk 1.3.3-17ubuntu3 amd64 a pattern scanning and text processing language
When you modify any field in awk (e.g. $1=""), you force awk to rebuild the record, joining the fields with OFS, which defaults to a single space.
Having said that, one way in awk to remove first column while preserving whitespaces between fields is:
echo 'a b c' | awk '{sub(/^[[:blank:]]*[^[:blank:]]+[[:blank:]]+/, "")} 1'
b c
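To make the difference visible, here is a small sketch with input that really contains runs of spaces. The function name strip_first_column is hypothetical, and the pattern uses [ \t] rather than [[:blank:]] on the assumption that an older mawk without POSIX character classes may be in use.

```shell
# Assigning to a field rebuilds $0 with OFS, so runs of blanks collapse:
printf 'a  b   c\n' | awk '{ $1 = ""; print }'
# prints " b c"

# Hypothetical helper: sub() edits $0 directly, so the remaining spacing survives.
strip_first_column() {
    awk '{ sub(/^[ \t]*[^ \t]+[ \t]+/, "") } 1'
}

printf 'a  b   c\n' | strip_first_column
# prints "b   c"
```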
Or if you're using gnu-awk then use:
echo 'a b c' | awk -v RS='[[:blank:]]+' 'NR > 1{ORS=RT; print}'
b c
Another way using perl array slice:
echo 'a b c' | perl -lane 'print join "\t", @F[1..2]'
 Output
b c
echo 'a b c' | awk '{print substr($0,index($0,FS)+1)}'
sed might be easier
$ echo 'a b c' | sed -E 's/\S+\s+//'
b c

Count b or B in even lines

I need to count the number of times the letter 'b' or 'B' appears in the even lines of file.txt, e.g. for a file.txt like:
everyB or gbnBra
uitiakB and kanapB bodddB
Kanbalis astroBominus
I got the first part but I need to count these b or B letters and I do not know how to count them together
awk '!(NR%2)' file.txt
$ awk '!(NR%2){print gsub(/[bB]/,"")}' file
4
Could you please try the following? It is one more approach with awk, written on mobile and not yet tested, but it should work.
awk -F'[bB]' 'NR%2 == 0{print (NF ? NF - 1 : 0)}' Input_file
Thanks to @Ed sir for solving, in the comments, the problem of lines with zero matches.
In a single awk:
awk '!(NR%2){gsub(/[^Bb]/,"");print length}' file.txt
gsub(/[^Bb]/,"") deletes every character in the line except for B and b.
print length prints the length of the resulting string.
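The awk commands above print one count per even line. If a single total over the whole file is wanted instead, the per-line counts can be summed. This is a sketch only; the function name count_b_even_lines is hypothetical, and it relies on gsub() returning the number of substitutions it made.

```shell
# Hypothetical helper: total number of b/B characters on even-numbered lines.
# Reads the files given as arguments, or standard input when there are none.
count_b_even_lines() {
    awk '!(NR % 2) { total += gsub(/[bB]/, "") } END { print total + 0 }' "$@"
}

printf '%s\n' 'everyB or gbnBra' 'uitiakB and kanapB bodddB' 'Kanbalis astroBominus' |
    count_b_even_lines
# 4
```

The total + 0 in the END block makes the function print 0 rather than an empty line when there are no even lines at all.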
awk '!(NR%2)' file.txt | tr -cd 'Bb' | wc -c
Explanation:
awk '!(NR%2)' file.txt : keep only even lines from file.txt
tr -cd 'Bb' : keep only B and b characters
wc -c : count characters
Example:
With the file below, the result is 4.
everyB or gbnBra
uitiakB and kanapB bodddB
Kanbalis astroBominus
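Here is another way (GNU sed, for the 2~2 address; tr strips the newlines so that wc -c counts only the letters)
$ sed -n '2~2s/[^bB]//gp' file | tr -d '\n' | wc -c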

Remove empty columns by awk

I have an input file, which is tab delimited, and I want to remove all empty columns. Empty columns : $13=$14=$15=$84=$85=$86=$87=$88=$89=$91=$94
INPUT: tsv file with more than 90 columns
a b d e g...
a b d e g...
OUTPUT: tsv file without empty columns
a b d e g....
a b d e g...
Thank you
This might be what you want:
$ printf 'a\tb\tc\td\te\n'
a b c d e
$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=""} 1'
a c e
$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a c e
Note that the above doesn't remove all empty columns as some potential solutions might do, it only removes exactly the column numbers you want removed:
$ printf 'a\tb\t\td\te\n'
a b d e
$ printf 'a\tb\t\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a e
remove ALL empty columns:
If you have a tab-delimited file with empty columns and you want to remove all of them, it implies that you have multiple consecutive tabs. Hence you could just replace each run of tabs with a single tab, and then delete the leading tab that is left over if the first column was empty as well (\t and \+ in the pattern are GNU sed syntax):
sed 's/\t\+/\t/g;s/^\t//' <file>
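Collapsing tabs treats every empty cell as removable, which shifts the data when a column is empty in only some rows. As an alternative, here is a two-pass sketch that drops only the columns that are empty in every row. It is not from the answers above: the function name drop_empty_columns is hypothetical, and it assumes a regular file (not a pipe), since the file is read twice.

```shell
# Hypothetical helper: drop every column that is empty in all rows of TSV file $1.
drop_empty_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR {                         # first pass: note columns holding any value
        for (i = 1; i <= NF; i++) if ($i != "") keep[i] = 1
        if (NF > max) max = NF
        next
    }
    {                                   # second pass: print only the noted columns
        out = ""; sep = ""
        for (i = 1; i <= max; i++)
            if (i in keep) { out = out sep $i; sep = OFS }
        print out
    }' "$1" "$1"
}

# Example call (file names are placeholders):
# drop_empty_columns input.tsv > output.tsv
```

Cells that are empty in only some rows are kept, so the remaining columns stay aligned.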
remove SOME columns: See Ed Morton or just use cut:
cut --complement -f 13,14,15,84,85,86,87,88,89,91,94 <file>
remove selected columns if and only if they are empty:
Basically a simple adaptation from Ed Morton :
awk -v col=13,14,15,84,85,86,87,88,89,91,94 'BEGIN{FS=OFS="\t"; n=split(col,a,",")}
{ for(i=1;i<=n;++i) if ($(a[i])=="") $(a[i])=RS; gsub("(^|"FS")"RS,"") }
1' <file>

AWK: convert string to a column

I want to convert string (eg: abcdef) to a column
This is what I want.
a
b
c
d
e
f
I know how to convert a string to a column by using sed
$ echo abcdef | sed 's/[^.]/&\n/g'|sed '$d'
But how do I convert it using awk?
[akshay@localhost tmp]$ awk -v ORS= 'gsub(/./,"&\n")' <<<"abcdefgh"
a
b
c
d
e
f
g
h
You can set the field separator to an empty string, so that every character is a different field. Then, loop through them and print:
$ awk -v FS="" '{for (i=1;i<=NF;i++) print $i}' <<< "abcdef"
a
b
c
d
e
f
Which is equivalent to:
awk -F "" '{for (i=1;i<=NF;i++) print $i}' <<< "abcdef"
Only working with awk's internal variables:
echo abcdef | awk 'BEGIN{FS="";OFS="\n"}{$1=$1}1'
It sets the input field separator (FS) to nothing, which means that every character is a field. The output field separator (OFS) is set to newline. Note that the $1=$1 is needed to rebuild the record with the new OFS.
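Splitting on an empty FS is not specified by POSIX, so its behavior may differ between awk implementations. A sketch that avoids it walks the string with substr() instead; the function name chars_to_column is hypothetical.

```shell
# Hypothetical helper: print every character of each input line on its own line.
chars_to_column() {
    awk '{ n = length($0); for (i = 1; i <= n; i++) print substr($0, i, 1) }'
}

echo abcdef | chars_to_column
```

With the input above this prints the letters a through f, one per line.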