awk, print all columns and add new column with substr

I have this table
USI,Name,2D-3D
RO0001,Patate,2D
RO0002,Haricot,3D
RO0003,Banane,2D
RO0004,Pomme,2D
RO0005,Poire,2D
and I want this
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
I managed to build the "RO_2D_Patate" string with awk:
awk -F "," '{print substr($1,1,2)"_"substr($3,1,2)"_"$2}' Test4.txt
But I also want to print the whole line ($0) in front of it, as in my second table.
I have tried everything, but I am still a novice.
Any ideas?

awk -F, '{print $0 (NR>1 ? FS substr($1,1,2)"_"$3"_"$2 : "")}' Test4.txt

$ awk -F, -v OFS=, 'NR>1{$4=substr($1,1,2)"_"$3"_"$2}1' Test4.txt
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire

awk -F, 'NR>1{print $0,substr($1,1,2)"_"$NF"_"$2}/USI/' OFS=, file
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
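For anyone who wants to try this without creating Test4.txt, here is a minimal self-contained sketch of the same approach. The function name add_key and the inline sample rows are only for illustration and are not part of the question; the header line is left unchanged, as in the expected output.

```shell
# add_key: hypothetical helper name; reads CSV on stdin, writes CSV on stdout.
add_key() {
    awk 'BEGIN { FS = OFS = "," }
         NR > 1 {
             # first two characters of the USI, then the 2D/3D flag, then the name
             $4 = substr($1, 1, 2) "_" $3 "_" $2
         }
         { print }'
}

printf '%s\n' 'USI,Name,2D-3D' 'RO0001,Patate,2D' 'RO0002,Haricot,3D' | add_key
```

Because FS and OFS are both set to a comma, assigning $4 rebuilds the line with commas rather than spaces.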

Related

AWK:Remove multiple columns and retain the column delimiters [duplicate]

This command works. It outputs the field separator (in this case, a comma):
$ echo "hi,ho"|awk -F, '/hi/{print $0}'
hi,ho
This command has strange output (it is missing the comma):
$ echo "hi,ho"|awk -F, '/hi/{$2="low";print $0}'
hi low
Setting the OFS (output field separator) variable to a comma fixes this case, but it really does not explain this behaviour.
Can I tell awk to keep the OFS?
When you assign to a field, awk reconstructs the line ($0) from its fields and puts the value of OFS between them, which by default is a single space. You modified the value of $2, which forced awk to rebuild $0.
When you print the line as is using $0 in your first case, you did not modify any fields, so awk did not rebuild the line and the original field separator is preserved.
In order to preserve the field separator, you can specify that using:
BEGIN block:
$ echo "hi,ho" | awk 'BEGIN{FS=OFS=","}/hi/{$2="low";print $0}'
hi,low
Using -v option:
$ echo "hi,ho" | awk -F, -v OFS="," '/hi/{$2="low";print $0}'
hi,low
Defining at the end of awk:
$ echo "hi,ho" | awk -F, '/hi/{$2="low";print $0}' OFS=","
hi,low
Your first example does not change anything, so the line is printed exactly as it was read.
Your second example changes the line, so awk rebuilds it using the default OFS, which is a single space.
So to overcome this:
echo "hi,ho"|awk -F, '/hi/{$2="low";print $0}' OFS=","
hi,low
In your BEGIN action, set OFS = FS.
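To see the rebuild happen in isolation, here is a small sketch. It deliberately uses ";" as OFS (my choice, not from the question) so the rebuilt line is easy to tell apart from the input, and it uses the common `$1 = $1` idiom, which assigns a field to itself just to force awk to rebuild $0.

```shell
# No field is assigned, so $0 is printed exactly as read and OFS is not used.
raw=$(echo "hi,ho" | awk -F, -v OFS=";" '{ print $0 }')

# Assigning any field, even to itself, makes awk rebuild $0 with OFS.
rebuilt=$(echo "hi,ho" | awk -F, -v OFS=";" '{ $1 = $1; print $0 }')

printf '%s\n%s\n' "$raw" "$rebuilt"
```

The first line printed is hi,ho and the second is hi;ho, which shows that OFS only matters once a field has been assigned (or when several values are printed with commas between them).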

Using AWK to extract one column from a tab separated file

I know this is a simple question, but the awk command is literally melting my brain. I have a tab separated file "inputfile.gtf" and I need to extract one column from it and put it into a new file "newfile.tsv" I cannot for the life of me figure out the proper syntax to do this with awk. Here is what I've tried:
awk -F, 'BEGIN{OFS="/t"} {print $8}' inputfile.gtf > newfile.tsv
also
awk 'BEGIN{OFS="/t";FS="/t"};{print $8}' inputfile.gtf > newfile.tsv
Both of these just give me an empty file. Everywhere I search, people seem to have completely different ways of trying to achieve this simple task, and at this point I am completely lost. Any help would be greatly appreciated. Thanks.
Why not simply:
awk -F'\t' '{print $8}' inputfile.gtf > newfile.tsv
You have specified the wrong delimiter: /t is a slash followed by the letter t, whereas the tab character is written \t:
awk 'BEGIN{ FS=OFS="\t" }{ print $8 }' inputfile.gtf > newfile.tsv
Your 1st command :
awk -F, 'BEGIN{OFS="/t"} {print $8}' inputfile.gtf > newfile.tsv
You are setting -F, which is wrong here, as your file is not comma separated.
Next, OFS="/t" is incorrect; it should be OFS="\t". But you do not need it at all: you are printing only a single field, and OFS is only used when you print at least two fields.
Your 2nd command :
awk 'BEGIN{OFS="/t";FS="/t"};{print $8}' inputfile.gtf > newfile.tsv
Again, it should be \t, not /t. Also, setting FS="\t" is equivalent to -F "\t".
What you actually need is :
awk -F"\t" '{print $8}' inputfile.gtf > newfile.tsv
or
awk -v FS="\t" '{print $8}' inputfile.gtf > newfile.tsv
And if the only whitespace in your file is the tabs between fields (no field is empty or contains spaces), then you can rely on the default field splitting and simply use:
awk '{print $8}' inputfile.gtf > newfile.tsv
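The difference between the default splitting and an explicit tab separator is easiest to see on a line that has an empty field and a field with spaces. The sample line below is made up for illustration; it is not taken from the .gtf file in the question.

```shell
# Three tab-separated fields: "a", "" (empty) and "b c d".
line=$(printf 'a\t\tb c d')

# Explicit tab separator: the empty field is kept and spaces stay inside a field.
with_tab=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF ":" $3 }')

# Default FS: any run of blanks separates fields, so the empty field disappears.
default_fs=$(printf '%s\n' "$line" | awk '{ print NF ":" $3 }')

printf '%s\n%s\n' "$with_tab" "$default_fs"
```

This prints 3:b c d for the tab separator and 4:c for the default, so the shorter command is only safe when the data has no empty fields and no embedded spaces.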

How to delete first three columns in a delimited file

For example, I have a csv file as follows,
12345432|1346283301|5676438284971|13564357342151697 ...
87540258|1356433301|1125438284971|135643643462151697 ...
67323266|1356563471|1823543828471|13564386436651697 ...
There are hundreds more columns, but I want to remove the first three columns and save the result to a new file (saving to the same file would be even better).
This is the result I want.
13564357342151697 ...
135643643462151697 ...
13564386436651697 ...
I have been looking and trying but I am not able to do it. And below is the code I have.
awk -F'|' '{print $1 > "newfile"; sub(/^[^|]+\|/,"")}1' old.csv > new.csv
Appreciate if someone can help me. Thank you.
You can use cut :
cut -f4- -d'|' old.csv > new.csv
@Heng: try:
awk -F"|" '{for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"":"|")};print ""}' Input_file
OR
awk -F"|" '{for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":"|")};}' Input_file
You can redirect this command's output to a file as needed.
EDIT:
awk -F"|" 'FNR==1{++e;fi="REPORT_A1_"e;} {for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":"|") > fi}}' Input_file1 Input_file2 Input_file3
This is what you're looking for:
awk -F '|' '{$1=$2=$3=""; print $0}' oldfile > newfile
But the line is then rebuilt with the default OFS (a space) and starts with leading separators. So set OFS to '|' in a BEGIN block and add the following substitution, which strips the three leading '|' left behind by the emptied columns:
sub(/^\|\|\|/,"")
awk 'BEGIN{FS=OFS="|"} {$1=$2=$3=""; sub(/^\|\|\|/,""); print $0}' oldFile > newFile
If the only column you want to keep is the last one, you can print just that field:
awk -F\| '{print $NF}' file >newfile
13564357342151697 ...
135643643462151697 ...
13564386436651697 ...

initialising field separators on condition in awk

I know that initialising FS in BEGIN is the correct practice, but what if I need different field separators for different lines (lines containing a particular pattern)? E.g. my awk script is
{if($0 ~ /.*youtube.*/){FS="=";print $2}}
This code is not processing the first line. How can I fix this?
You can use split. For example, to get the middle value (green) from the third field:
echo "on,cat ,blue|green|red,more" | awk -F, '{split($3,a,"|");print a[2]}'
green
And the BEGIN block is not the only place where you can set the field separator:
echo "on,two,three" | awk -F, '{print $2}'
echo "on,two,three" | awk '{print $2}' FS=,
echo "on,two,three" | awk 'BEGIN{FS=","} {print $2}'
echo "on,two,three" | awk -v FS=, '{print $2}'
All these will print two
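But they differ in when the assignment takes effect. With -F (or -v), FS is already set when the BEGIN block runs:
awk -F, 'BEGIN{print FS}'
,
An assignment placed after the program is processed only after BEGIN has run, so this prints the default FS (a single space) instead of a comma:
awk 'BEGIN{print FS}' FS=,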
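A quick way to compare the three forms side by side is to print FS between brackets from the BEGIN block. This is just an illustrative sketch; the brackets are only there to make a blank value visible.

```shell
# -F and -v are applied before BEGIN runs.
opt=$(awk -F, 'BEGIN { printf "[%s]", FS }')
var=$(awk -v FS=, 'BEGIN { printf "[%s]", FS }')

# An assignment operand after the program is not seen by BEGIN.
operand=$(awk 'BEGIN { printf "[%s]", FS }' FS=,)

printf '%s %s %s\n' "$opt" "$var" "$operand"
```

This prints [,] [,] [ ]: the last form still has the default single-space FS while BEGIN is running.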
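Back to your problem. A new FS value only applies to records read after the assignment, which is why the first matching line is not split on "=". Using split() on the current line avoids this.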
This:
awk '{if($0 ~ /.*youtube.*/){FS="=";print $2}}' file
should be:
awk '{if($0 ~ /.*youtube.*/){split($0,a,"=");print a[2]}}' file
You do not need .* before and after the pattern, since a regex match is not anchored, so:
awk '{if($0 ~ /youtube/){split($0,a,"=");print a[2]}}' file
And this can be simplified even further:
awk '/youtube/ {split($0,a,"=");print a[2]}' file
If data is like this:
cat file
youtube=thisisyoutube1 //starts here
youtube=thisisyoutube2
youtube=thisisyoutube3
youtube=thisisyoutube4
yautube=thisisnottobeprinted
Then do like this:
awk -F= '/youtube/ {split($2,a," ");print a[1]}' file
thisisyoutube1
thisisyoutube2
thisisyoutube3
thisisyoutube4
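To make the timing issue concrete, here is a small sketch that compares assigning FS inside the action with calling split() on the current line. The two-line input is made up for illustration. The comment about the first line assumes an awk that follows POSIX on this point (a changed FS applies from the next record); some older implementations are known to behave differently.

```shell
# Assigning FS in the action: on a POSIX-conforming awk the first record was
# already split on blanks, so its $2 is empty; the second record uses "=".
late=$(printf 'k=v1\nk=v2\n' | awk '{ FS = "="; print "[" $2 "]" }')

# split() works on the current line, so both records are handled the same way.
split_out=$(printf 'k=v1\nk=v2\n' | awk '{ split($0, a, "="); print "[" a[2] "]" }')

printf '%s\n---\n%s\n' "$late" "$split_out"
```

The split() version prints [v1] and [v2], while the FS version only gets the second line right, which matches the symptom described in the question.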
