I know 2 things about awk:
1.
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ var {print $0}' file.txt # will print the line where 3rd field includes the variable $PAT
2.
awk '$3 ~ /^aGeneName/' file.txt # will print the line where 3rd field starts with string "aGeneName"
But what I want is the combination of these two: I want to print the line where the 3rd field starts with the variable $PAT, something like
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ /^var/ {print $0}' file.txt # but this is wrong, since variable can't be put into //
One way is like this:
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ "^" var {print $0}' file.txt
The {print $0} can be omitted here, since printing the line is awk's default action.
Another way, when var is a plain string with no regex metacharacters in it:
PAT='aGeneName'
awk -v var="$PAT" 'index($3, var)==1' file.txt
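The difference between the two approaches shows up as soon as the variable contains a regex metacharacter. Here is a minimal sketch; the sample file contents and the name a.Gene are made up for illustration:

```shell
# Hypothetical sample data: the 3rd field holds a gene name.
tmp=$(mktemp)
printf '%s\n' 'chr1 100 a.GeneX' 'chr1 200 aXGeneY' 'chr2 300 other' > "$tmp"

PAT='a.Gene'

# Regex prefix match: "." matches any character, so two lines are selected.
regex_out=$(awk -v var="$PAT" '$3 ~ "^" var' "$tmp")

# Literal prefix match: only the field that really starts with "a.Gene".
literal_out=$(awk -v var="$PAT" 'index($3, var) == 1' "$tmp")

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_out" "$literal_out"
rm -f "$tmp"
```

So "^" var is the one to use when PAT is meant as a pattern, and index() when it is meant as plain text.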
I am writing a simple bash loop where I use awk to grab lines from a file. The search pattern changes on each pass through the loop. The program works fine until I try to add symbols to the variable that awk uses as the search pattern.
WORKING PROGRAM (first search term is "cat")
list="cat dog"
for k in $list
do
vark="$k"
awk '/'$vark'/{print RS $0}' RS=\> FILE1 > FILE2
done
But when I try to add the symbols "|", ">", and "_" to the variable, the loop breaks. I have tried multiple separators for the symbols but cannot get them integrated into the variable correctly.
BROKEN PROGRAM (first search term is "|cat>_")
list="cat dog"
for k in $list
do
vark="$k"
varkk="|$k>_"
awk '/'$varkk'/{print RS $0}' RS=\> FILE1 > FILE2
done
Thank you so much for your help!
The correct way to pass a shell/bash variable to awk is with the -v option, as in -v varname="$shell_var".
From
awk '/'$varkk'/{print RS $0}' RS=\> FILE1 > FILE2
To
awk -v regexp="$varkk" -v RS='>' '$0 ~ regexp{print RS $0}' FILE1 > FILE2
You can also do the looping inside awk itself, reading the patterns (one per line) from a file:
awk '
FNR==NR{ arr[$0]; next }
{ for(i in arr)if($0 ~ i){ print RS $0; next} }
' pattern_file RS='>' FILE1 > FILE2
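A related point: with $0 ~ regexp the value is still treated as a regular expression, and "|" is the alternation operator there, so a term like "|cat>_" may not behave as a literal string. If a literal search is what is wanted, index() is one option. A minimal sketch, assuming line-oriented input; the sample lines are made up:

```shell
# Hypothetical sample lines; only some contain the literal text "|cat>_" or "|dog>_".
tmp=$(mktemp)
printf '%s\n' 'id1|cat>_alpha' 'id2 cat beta' 'id3|dog>_gamma' > "$tmp"

# index() returns 0 when the literal string is absent, so no character
# of the search term is interpreted as a regex operator.
hits=$(for k in cat dog; do
    awk -v s="|$k>_" 'index($0, s)' "$tmp"
done)

printf '%s\n' "$hits"
rm -f "$tmp"
```

Note that -v still processes backslash escapes, so a search term containing backslashes would need extra care.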
I have this table
USI,Name,2D-3D
RO0001,Patate,2D
RO0002,Haricot,3D
RO0003,Banane,2D
RO0004,Pomme,2D
RO0005,Poire,2D
and I want this
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
I managed to build the string "RO_2D_Patate" with awk:
awk -F "," '{print substr($1,1,2)"_"substr($3,1,2)"_"$2}' Test4.txt
But I want to print the whole line ($0) in front of it, as in my second table.
I have tried everything, but I am still a novice!
Any ideas?
awk -F, '{print $0 (NR>1 ? FS substr($1,1,2)"_"$3"_"$2 : "")}' Test4.txt
$ awk -F, -v OFS=, 'NR>1{$4=substr($1,1,2)"_"$3"_"$2}1' Test4.txt
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
awk -F, 'NR>1{print $0,substr($1,1,2)"_"$NF"_"$2}/USI/' OFS=, file
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
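For readers who find the one-liners dense, the same idea can be spelled out with one rule for the header and one for the data rows. This is only a sketch; it feeds a few rows from the question through a here-document so it runs without Test4.txt:

```shell
# Sample rows copied from the question, fed through a here-document.
result=$(awk -F, -v OFS=, '
    NR == 1 { print; next }                       # header: print unchanged
    { print $0, substr($1, 1, 2) "_" $3 "_" $2 }  # data: append the new column
' <<'EOF'
USI,Name,2D-3D
RO0001,Patate,2D
RO0002,Haricot,3D
EOF
)
printf '%s\n' "$result"
```

The comma in print $0, ... is what inserts OFS between the original line and the new field.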
I'm trying to remove columns beyond number 26 from all lines of a file, using this code:
awk '{ FS = ";" ; for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}'
It works on every line except the first one, where it shows 2 more fields (and cuts the last one in two).
Is there anything wrong in my code?
Thanks a lot
This is because you set FS inside the main block, which runs once per line, after that line has already been split. It should be set in a BEGIN{} block (or outside as a parameter, as other answers correctly suggest):
awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
In fact, to accomplish your goal it is easier to use cut:
cut -d';' -f-26 file
       ^    ^^^
       |    all fields up to the 26th
       delimiter
Example with 4 cols
sample file:
$ cat a
1col1;col2;col3;col4;col5;col6
2col1;col2;col3;col4;col5;col6
3col1;col2;col3;col4;col5;col6
previous code (line 1 is split before FS takes effect, so its output is wrong):
$ awk '{FS=";"; for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
2col1;col2;col3;col4
3col1;col2;col3;col4
new code:
$ awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
with cut:
$ cut -d';' -f-4 a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
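To make the timing of the FS assignment visible, one can simply print NF for each line. A small sketch, assuming a POSIX-conforming awk where a new FS only applies from the next record on:

```shell
tmp=$(mktemp)
printf '%s\n' '1col1;col2;col3;col4;col5;col6' '2col1;col2;col3;col4;col5;col6' > "$tmp"

# FS assigned inside the main block: line 1 has already been split on whitespace.
late=$(awk '{ FS = ";"; print NF }' "$tmp" | paste -sd' ' -)

# FS assigned in BEGIN: every line is split on ";".
early=$(awk 'BEGIN { FS = ";" } { print NF }' "$tmp" | paste -sd' ' -)

printf 'late:  %s\nearly: %s\n' "$late" "$early"
rm -f "$tmp"
```

With the late assignment the first line is seen as a single field, which is why only that line comes out wrong.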
You can also try this awk, which truncates any line with more than 26 fields:
awk -F';' 'NF>26{NF=26}1' OFS=';' yourfile
@fedorqui is right.
But you can also set the field separator with the -F option:
awk -F";" '{for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
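One more thing to watch in the loop versions: printf $i FS uses the field itself as the format string, so a field containing "%" can be misinterpreted. A hedged sketch of a safer loop follows; the variable name max and the sample input are only for illustration, and 4 stands in for 26 to keep the example short:

```shell
# Keep only the first max fields of each ";"-separated line.
max=4
out=$(printf '%s\n' 'a;b%d;c;d;e;f' '1;2' |
    awk -F';' -v max="$max" '{
        n = (NF < max) ? NF : max           # short lines keep all their fields
        for (i = 1; i <= n; i++)
            printf "%s%s", $i, (i < n ? FS : ORS)
    }')
printf '%s\n' "$out"
```

Passing the data through "%s" means the field text is never parsed as a format. As written, empty input lines produce no output at all.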
I would like to use awk to convert "{(linefeed)" to just "{".
I tried this, without success:
awk '{gsub("{\n", "{")}' input >output
Any sensible, descriptive solutions?
Use GNU awk for multi-char RS to let you read the whole file at once:
awk -v RS='^$' -v ORS= '{gsub(/{\n/, "{")} 1' input >output
Your problem is that, unlike in perl, the record separator is not part of the record in awk, so gsub never sees the newline.
Instead: if the last character on the line is an open brace, print the line without a newline, else print it with a newline.
awk '/{$/ {printf "%s", $0; next} 1' file
or,
awk '{printf "%s%s", $0, /{$/ ? "" : ORS}' file
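A quick way to check the line-joining approach is to pipe a few lines through it. This is a sketch with made-up input; it writes the brace as [{] in the regex, which is an assumption on my part to avoid relying on how a given awk treats a bare "{":

```shell
# Lines ending in "{" are printed without a trailing newline,
# so the following line is glued onto them.
out=$(printf '%s\n' 'int main() {' '    return 0;' '}' |
    awk '{ printf "%s%s", $0, (/[{]$/ ? "" : ORS) }')
printf '%s\n' "$out"
```

Only the newline is removed; any indentation at the start of the following line is kept.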