I have an awk command to print out the total number of times "200" occurred in column 26.
awk '$26 ~ /200/{n++}; END {print n+0}' testfile
How do I modify this statement so I can pass 200 as a variable, e.g. if I have a shell variable $code with a value of 200?
Thanks in advance
awk '$26 ~ code {n++} END {print n+0}' code=200 testfile
If a filename on the command line has the form var=val it is treated as a variable assignment. The variable var will be assigned the value val.
§ Awk Program Execution
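Here is a small sketch that compares the var=val operand form quoted above with awk's -v option. It uses made-up data, with column 2 standing in for column 26, and the variable name c is only illustrative. It assumes a POSIX-compatible awk. An operand assignment is applied only when awk reaches that operand, which is after BEGIN has run. A -v assignment is applied before BEGIN.

```shell
#!/bin/sh
# Illustrative data: column 2 holds the code (the real file uses column 26).
code=200
data='a 200
b 404
c 200'

# Operand form: c=... is applied when awk reaches that operand, i.e. after
# BEGIN has run, so BEGIN still sees c as empty. "-" then reads stdin.
operand=$(printf '%s\n' "$data" |
  awk 'BEGIN { printf "[%s]", c } $2 == c { n++ } END { print n+0 }' c="$code" -)

# -v form: applied before BEGIN, so the value is visible everywhere.
dashv=$(printf '%s\n' "$data" |
  awk -v c="$code" 'BEGIN { printf "[%s]", c } $2 == c { n++ } END { print n+0 }')

echo "$operand"   # []2
echo "$dashv"     # [200]2
```

Both forms count the same two lines. Only BEGIN sees a difference.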
awk -v var="$shellVar" '$26~var{n++} END{print n+0}' file
The line above shows how to use a shell variable in awk. Some notes on your awk one-liner:
print n+0 is worth keeping. n is only ever set by n++, so if no line matches it stays uninitialized, and print n would print an empty line instead of 0.
The ; before END is unnecessary and can be removed.
I copied your check for 200, but it is risky. A regex match can give the wrong result here: if the value in $26 were 20200, /200/ would still match. If $26 holds only a number, consider using 1*$26 == 200 or $26 == "200" instead.
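The sketch below shows both points from the answer above on made-up single-column data, where $1 stands in for $26. The regex test over-counts and the equality test does not. It also shows what a bare print n outputs when nothing matches.

```shell
#!/bin/sh
# Made-up single-column data; $1 stands in for $26.
data='200
20200
2000
200'

# Regex test: matches any value that merely contains "200".
regex=$(printf '%s\n' "$data" | awk -v var=200 '$1 ~ var { n++ } END { print n+0 }')
# Equality test: numeric comparison, so only the exact value 200 counts.
numeric=$(printf '%s\n' "$data" | awk -v var=200 '$1 == var { n++ } END { print n+0 }')
# No match at all: n is never set, so a bare "print n" prints an empty line.
empty=$(printf '%s\n' "$data" | awk -v var=999 '$1 == var { n++ } END { print n }')

echo "regex=$regex numeric=$numeric empty=[$empty]"   # regex=4 numeric=2 empty=[]
```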
I have a flat file with a phone number in the field spanning columns 314 to 323. I want to dummy out that field with 1234567890.
I tried the commands below, but both throw errors:
awk '{var=substr($0,314,10);gsub("[0-9]","1234567890",$var); print}' final_phone.txt >final_phone.txt1
awk 'var=substr($0,314,10) { $var = "1234567890" }1' final_phone.txt >final_phone.txt1
**fatal: grow_fields_arr: fields_arr: can't allocate 9849885432 bytes of memory (Cannot allocate memory)**
In the first case, I tried to assign the substring to a variable and then use gsub to match the digit pattern and substitute 1234567890.
In the second case, I tried to assign the new value to the substring on each line.
Can someone help me with the syntax here?
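@markp-fuso's comment about why your code generates that specific error message is correct. $var refers to the field whose number is the value of var. A 10-digit phone number therefore makes awk try to grow its field array to billions of entries.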
All you need is:
awk '{$0=substr($0,1,313) "1234567890" substr($0,324)} 1' file
or if you want to check for numbers first:
awk 'substr($0,314,10) ~ /^[0-9]+$/{$0=substr($0,1,313) "1234567890" substr($0,324)} 1' file
and using variables:
awk '
BEGIN { beg=314; lgth=10; new="1234567890" }
substr($0,beg,lgth) ~ /^[0-9]+$/ {
$0 = substr($0,1,beg-1) new substr($0,beg+lgth)
}
1' file
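You can check the substr() splice on a toy record. This sketch assumes a hypothetical 3-character field in columns 5 to 7 instead of the real 10-character field at column 314. It also shows that a line whose field is not all digits is printed unchanged.

```shell
#!/bin/sh
# Toy records: a hypothetical 3-character field in columns 5-7
# (the real file uses beg=314, lgth=10, new="1234567890").
out=$(printf 'abcd123efg\nabcdXY9efg\n' | awk '
BEGIN { beg=5; lgth=3; new="000" }
substr($0,beg,lgth) ~ /^[0-9]+$/ {
    # keep the text before the field, splice in the new value, keep the rest
    $0 = substr($0,1,beg-1) new substr($0,beg+lgth)
}
1')

printf '%s\n' "$out"
# abcd000efg
# abcdXY9efg   (left alone: the field is not all digits)
```

The replacement value should be exactly lgth characters long so the record keeps its fixed width.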
I am trying to filter out all the singletons from a FASTA file.
Here is my input file:
>OTU1;size=3;
ATTCCCCGGGGGGG
>OTU2;size=1;
ATCCGGGACTGATC
>OTU3;size=5;
GAACTATCGGGTAA
>OTU4;size=1;
AATTGGCCATCT
The expected output is:
>OTU1;size=3;
ATTCCCCGGGGGGG
>OTU3;size=5;
GAACTATCGGGTAA
I've tried
awk -F'>' '{if($1>=2) {print $0}' input.fasta > ouput.fasta
but this removes all the headers.
Could anyone help me out?
Could you please try the following.
awk -F'[=;]' '/^>/{flag=""} $3>=2{flag=1} flag' Input_file
$ awk '/>/{f=/=1;/}!f' file
>OTU1;size=3;
ATTCCCCGGGGGGG
>OTU3;size=5;
GAACTATCGGGTAA
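This sketch runs two of the approaches above on the sample plus two made-up records. The first is the flag-based one with a size >= 2 test. The second is the header-regex one. The extra records are a size=12 entry, which checks that the "=1;" test does not misfire, and a sequence wrapped over two lines. In the regex, "=" is bracketed as [=], which matches the same text.

```shell
#!/bin/sh
# Sample: OTU1 and OTU2 from the question plus two made-up records.
fasta='>OTU1;size=3;
ATTCCCCGGGGGGG
>OTU2;size=1;
ATCCGGGACTGATC
>OTU5;size=12;
GGGG
CCCC'

# Flag approach: each header resets the flag, then sets it if size >= 2.
a=$(printf '%s\n' "$fasta" | awk -F'[=;]' '/^>/{flag=""} $3>=2{flag=1} flag')

# Header-regex approach: drop records whose header contains "=1;".
# [=] is used so no awk can read the regex start as the /= operator.
b=$(printf '%s\n' "$fasta" | awk '/>/{f=/[=]1;/}!f')

printf '%s\n' "$a"
# >OTU1;size=3;
# ATTCCCCGGGGGGG
# >OTU5;size=12;
# GGGG
# CCCC
```

Both approaches carry the decision made on the header line over to every following sequence line, so wrapped sequences are kept or dropped as a whole.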
awk -v FS='[;=]' 'prev_sz>=2 && !/size/{print prev RS $0} /size/{prev=$0;prev_sz=$(NF-1)}'
>OTU1;size=3;
ATTCCCCGGGGGGG
>OTU3;size=5;
GAACTATCGGGTAA
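Store the size from each header line in the prev_sz variable and the whole header line in prev. For each sequence line, if prev_sz is >= 2, print the previous (header) line followed by the current line. RS (a newline by default) separates the two. This assumes each sequence fits on a single line.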
While all the above methods work, they rely on the input always looking the same, i.e. the sequence name in your FASTA file must have the form:
>NAME;size=value;
A few solutions can handle somewhat more elaborate sequence names, but none handle the more generic case, e.g.
>NAME;label1=value1;label2=value2;STRING;label3=value3;
Print sequence where label xxx matches value vvv:
awk '/>/{f = /;xxx=vvv;/}f' file.fasta
Print sequences where label xxx has a numeric value p greater than q:
awk -v label="xxx" -v limit=q \
'BEGIN{ere=";" label "="}
/>/{ f=0; if (match($0,ere)) { value=0+substr($0,RSTART+RLENGTH); f=(value>limit) } }
f' <file>
In the above, ere is a regular expression we try to match. We use it to find the location of the value attached to label xxx. The substring starting there has non-numeric characters after the value. Adding 0 converts it to a number and drops everything after the leading numeric part (e.g. 3;label4=value4; is converted to 3). We check whether the value is greater than our limit and print the sequence based on that result.
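Here is a sketch of the generic filter on made-up headers. The label appears in different positions in them, or not at all. label=size and limit=2 are only example values. The match() call is guarded, so a header without the label is never selected.

```shell
#!/bin/sh
# Made-up headers: the label appears first, later, or not at all.
fasta='>A;size=3;x=1;
AAAA
>B;x=9;size=1;
CCCC
>C;x=5;
GGGG'

out=$(printf '%s\n' "$fasta" | awk -v label="size" -v limit=2 '
BEGIN { ere = ";" label "=" }
/>/ {
    f = 0
    # only compute a value when the label is actually present
    if (match($0, ere)) {
        value = 0 + substr($0, RSTART + RLENGTH)   # e.g. "3;x=1;" -> 3
        f = (value > limit)
    }
}
f')

printf '%s\n' "$out"
# >A;size=3;x=1;
# AAAA
```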
I have a file with two columns of data, and I want to find the maximum value in the second column and print it.
File: load_measure
11:20,18.03
11:25,17.85
11:30,18.24
11:35,19.19
11:40,18.45
11:45,17.53
11:50,17.56
11:55,17.60
12:00,18.51
12:05,18.50
I tried the code below, but it returns 0:
awk 'BEGIN {max = 0} {if ($2>max) max=$2} END {print max}' load_measure
0
I tried using $max instead, but it does not print the real max:
awk 'BEGIN {max = 0} {if ($2>max) max=$2} END {print $max}' load_measure
12:05,18.50
Can anyone explain what I'm doing wrong?
thank you!
When your fields are separated by something other than white space, you need to tell awk what that something is by populating FS. You also need to set max to the first value read so the script works for all-negative input. Finally, print max+0 in the END block to ensure numeric output even if the input file is empty:
awk -F, 'NR==1{max=$2} $2>max{max=$2} END{print max+0}' file
When max is 2, print max prints the value of max, i.e. 2. By contrast, print $max prints the value of the field indexed by the value of max, i.e. $2. In an END section, that will either be null or the value of $2 on the last line read (undefined behavior per POSIX, so awk-dependent). In your case max stays 0, so print $max prints $0, which in your awk is the last line read.
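This sketch reproduces the behavior on the first three lines of load_measure. With the default FS, each line is a single field, so $2 is empty and max never changes. With -F, the value is found.

```shell
#!/bin/sh
# First three lines of load_measure from the question.
data='11:20,18.03
11:25,17.85
11:30,18.24'

# Default FS (blanks): each line is one field, $2 is empty, max stays 0.
wrong=$(printf '%s\n' "$data" | awk 'BEGIN {max = 0} {if ($2>max) max=$2} END {print max}')
# Comma FS: $2 is the load value; max starts from the first value read.
right=$(printf '%s\n' "$data" | awk -F, 'NR==1{max=$2} $2>max{max=$2} END{print max+0}')
# Empty input: max+0 still prints 0 rather than an empty line.
none=$(printf '' | awk -F, 'NR==1{max=$2} $2>max{max=$2} END{print max+0}')

echo "$wrong $right $none"   # 0 18.24 0
```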
You should specify the value of FS, the input field separator. It describes how each record is split into fields. It may even be an extended regular expression.
On awk's command line, FS can be specified as -F <sep> (or -v FS=<sep>). You can also set it in the BEGIN block.
I normally use the latter method, but that's just personal preference:
BEGIN {max=0;FS=","} ....
Your problem can also be solved like this:
awk -F, -v m=0 '$2>m {m=$2} END {print m}'
thus sparing an if statement.
The POSIX-mandated default value is a single space (0x20), and it is treated specially. Fields are separated by runs of blanks (spaces and tabs) and newlines, and leading and trailing blanks are ignored.
See the official GNU Awk documentation for more details.
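This small sketch shows that special case on a made-up line. The default FS collapses runs of blanks and ignores leading and trailing ones. A bracketed single space, [ ], is an ordinary regex that splits on every space.

```shell
#!/bin/sh
line='  a   b  '

# Default FS (a single space): runs of blanks separate fields and
# leading/trailing blanks are ignored, so there are 2 fields.
default_split=$(printf '%s\n' "$line" | awk '{ print NF ":" $1 }')
# FS="[ ]" is an ordinary regex: every single space is a separator,
# which yields empty fields (7 spaces -> 8 fields).
literal_split=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')
# Tabs count as blanks for the default FS too.
tabbed=$(printf 'x\ty z\n' | awk '{ print NF }')

echo "$default_split $literal_split $tabbed"   # 2:a 8 3
```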
I am using awk and need to find whether a field, in this case $24, contains the string 3:2, and if so print a line (for a sed command). The field may also contain other letters, spaces, or \n.
For example:
$24 == "3:2" {print "s/(inter = ).*/\\1\"" "3:2_pulldown" "\"/" >> NR }
With the line above, it never finds such a string, although it exists.
Can you help me with the command, please?
If you're looking for "3:2" within $24, then you want $24 ~ /3:2/ or index($24, "3:2") > 0
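The sketch below compares the three tests on made-up lines. It uses a tab as FS so that each whole line, spaces included, lands in $1 in place of $24.

```shell
#!/bin/sh
# Made-up lines; a tab is used as FS so each whole line lands in $1.
data='3:2
ratio 3:2 pulldown
2:3'

# Exact string equality: only a field that is exactly "3:2".
eq=$(printf '%s\n' "$data" | awk -F'\t' '$1 == "3:2" { n++ } END { print n+0 }')
# Regex match: "3:2" anywhere in the field.
re=$(printf '%s\n' "$data" | awk -F'\t' '$1 ~ /3:2/ { n++ } END { print n+0 }')
# index(): literal substring search, no regex interpretation.
ix=$(printf '%s\n' "$data" | awk -F'\t' 'index($1, "3:2") > 0 { n++ } END { print n+0 }')

echo "== $eq, ~ $re, index $ix"   # == 1, ~ 2, index 2
```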
Why are you using awk to generate a sed script?
Update
To pass a variable from the shell to awk, use the -v option:
val="3:2" # or however you determine this value
awk -v v="$val" '$24 ~ v {print}'
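There is one caveat when passing the value with -v. With ~, the value is used as a regular expression, so any metacharacters in it matter. index() treats it as a literal string. The value 3.2 below is made up for illustration.

```shell
#!/bin/sh
val="3.2"   # made-up value containing a regex metacharacter (.)
data='3.2
302'

# With ~, v is a dynamic regex: "." matches any character, so 302 matches too.
re=$(printf '%s\n' "$data" | awk -v v="$val" '$1 ~ v { n++ } END { print n+0 }')
# With index(), v is a plain string: only the literal "3.2" matches.
lit=$(printf '%s\n' "$data" | awk -v v="$val" 'index($1, v) > 0 { n++ } END { print n+0 }')

echo "regex: $re, literal: $lit"   # regex: 2, literal: 1
```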
awk '$24~/3:2/' file_name
This will search for "3:2" in field 24 and print the matching lines.
Awk is awesome for text manipulation, but a little opaque to me. I would like to run an awk command that boils down to something like this:
awk '{$x = ($3 > 0 ? 1 : -1); print $1*$x "\t" $2*$x}' file
I want to assign $x on each line, i.e. without using the -v option, and then use it inside my print statement. Unfortunately, after the ; awk has forgotten the values of $1 and $2. Putting the assignment outside the braces doesn't seem to work either. How does this work?
AWK doesn't use dollar signs on its variables:
awk '{x = ($3 > 0 ? 1 : -1); print $1*x "\t" $2*x}' file
In your version, x is never assigned, so it is effectively 0 and $x is $0. On every line you replace the entire input record with 1 or -1. Assigning to $0 re-splits the record, so $1 becomes 1 or -1 and $2 becomes empty. That's why $1 and $2 seem to be "forgotten".
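This sketch runs both versions side by side on made-up input. The original version is run on a single line to keep its output easy to read.

```shell
#!/bin/sh
# Made-up input: multiply columns 1 and 2 by the sign of column 3.
data='2 3 5
4 6 -1'

# Fixed version: x is an ordinary variable.
good=$(printf '%s\n' "$data" | awk '{x = ($3 > 0 ? 1 : -1); print $1*x "\t" $2*x}')

# Original version on one line: x is unset, so $x is $0. Assigning 1 to $0
# re-splits the record, leaving $1 = 1 and $2 empty.
bad=$(printf '2 3 5\n' | awk '{$x = ($3 > 0 ? 1 : -1); print $1*$x "\t" $2*$x}')

printf '%s\n%s\n' "$good" "$bad"
# 2	3
# -4	-6
# 1	0
```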