Passing a shell variable to awk in gnuplot

I need to pass a shell variable to awk in gnuplot, but I get an error message.
The variable is set in the script and is called FILE. It changes according to the date.
My code (in a gnuplot script):
plot FILE using 1:14 with points pointtype 7 pointsize 1 # this works fine
replot '< awk ''{y1 = y2; y2 = $14; if (NR > 1 && y2 - y1 >= 100) printf("\n") ; if (NR > 1 && y2 -y1 <= -100) printf("\n"); print}'' FILE' using 1:14 with linespoints
Error message:
awk: fatal: cannot open file `FILE' for reading (No such file or directory)
When I hard code the FILE path the replot works.
Could anyone clarify the code I need to pass this variable to awk? Am I on the right track with something like this?
% environment_variable=FILE
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
Here is my gnuplot script code, cobbled together mostly from other posts:
#FILE selection - we want to plot the most recent data file
FILE = strftime('/data/%Y-%m-%d.txt', time(0)) # this is correct
print "FILE is : " .FILE
#set file path variable for awk : (This is where my problem is)
awk -v var="$FILE" '{print var}'
awk '{print $0}' <<< "$FILE"
Thank you in advance

If FILE is a gnuplot variable that contains the path of the file, you can do this:
FILE = 'input'
plot '<awk ''1'' ' . FILE
This concatenates the value of the gnuplot variable FILE onto the end of the awk command. The resulting command is therefore awk '1' input (which just prints every line of the file); you can replace the '1' with whatever it is you want to do with awk.
By the way, your awk script can be simplified a little bit to this:
awk '{ y1 = y2; y2 = $14 } NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" } { print $1, $14 }'
You rarely need if in awk, because each { } block can be given a condition (a block without a condition is always executed). Assuming you haven't modified the output record separator (the ORS variable), print "" is the same as printf("\n"). Rather than specifying using 1:14 in gnuplot, you may as well print only the columns you are interested in, with print $1, $14.
So your replot line in gnuplot would be:
replot '<awk ''{ y1 = y2; y2 = $14 } NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" } { print $1, $14 }'' ' . FILE with linespoints
Of course, this line is getting a bit long. You might want to split it up a bit:
awk_cmd = '{ y1 = y2; y2 = $14 } NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" } { print $1, $14 }'
replot sprintf("<awk '%s' %s", awk_cmd, FILE) with linespoints
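If you want to try the simplified awk script outside gnuplot first, here is a minimal sketch wrapped in a shell function. The function name insert_gaps and the awk variable col are hypothetical; col is passed in with awk's -v option (the approach the question asks about) and defaults to 14, while the sample data uses column 2 just to stay short.

```shell
#!/bin/sh
# insert_gaps (hypothetical helper): reads whitespace-separated data on stdin,
# prints column 1 and column "$1" (default 14), and emits an empty line whenever
# that column jumps by 100 or more, up or down, between consecutive records.
insert_gaps() {
  awk -v col="${1:-14}" '
    { y1 = y2; y2 = $col }                                      # previous and current value
    NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" }  # blank line splits the gnuplot line
    { print $1, $col }                                          # keep only the two plotted columns
  '
}

# Usage example with a two-column sample
printf '1 10\n2 20\n3 500\n4 510\n' | insert_gaps 2
```

Since only two columns are printed, gnuplot plots them as 1:2 by default rather than 1:14.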

Related

Concatenating array elements into a one string in for loop using awk

I am working on a Variant Call Format (VCF) file. Here is what I am trying to do:
Input:
1 877803 838425 GC G
1 878077 966631 C CCACGG
Output:
1 877803 838425 C -
1 878077 966631 - CACGG
In summary, I am trying to delete the first letter of the longer string and replace the single-letter string with "-".
And here is my code:
awk 'BEGIN { OFS="\t" } /#/ {next}
{
m = split($4, a, //)
n = split($5, b, //)
x = "-"
delete y
if (m>n){
for (i = n+1; i <= m; i++) {
y = sprintf("%s", a[i])
}
print $1, $2, $3, y, x
}
else if (n>m){
for (j = m+1; i <= n; i++) {
y = sprintf("%s", b[j]) ## Problem here
}
print $1, $2, $3, x, y
}
}' input.vcf > output.vcf
But I am getting the following error at line 15, not even at line 9:
awk: cmd. line:15: (FILENAME=input.vcf FNR=1) fatal: attempt to use array y in a scalar context
I don't know how to concatenate array elements into one string using awk.
I would be very happy if you could help me.
Merry X-Mas!
You may try this awk:
awk -v OFS="\t" 'function trim(s) { return (length(s) == 1 ? "-" : substr(s, 2)); } {$4 = trim($4); $5 = trim($5)} 1' file
1 877803 838425 C -
1 878077 966631 - CACGG
More readable form:
awk -v OFS="\t" 'function trim(s) {
return (length(s) == 1 ? "-" : substr(s, 2))
}
{
$4 = trim($4)
$5 = trim($5)
} 1' file
You can use awk's substr function to process the 4th and 5th space-delimited fields:
awk '{ substr($4,2)==""?$4="-":$4=substr($4,2);substr($5,2)==""?$5="-":$5=substr($5,2)}1' file
If the string from position 2 onwards in field 4 is equal to "", set field 4 to "-"; otherwise, set field 4 to the part of the field from position 2 to the end. Do the same with field 5. Finally, print every line, modified or not, with the shorthand 1.
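Neither answer shows the string concatenation the question asks about. Here is a hedged sketch that keeps the question's structure (vcf_trim is a hypothetical name). The accumulator is reset with y = "" instead of delete y (delete y turns y into an array, which is what the "array y in a scalar context" error complains about), each character is appended with y = y substr(...), the second loop uses j consistently, and substr() replaces split(s, a, //), since splitting on an empty separator may not be portable to every awk.

```shell
#!/bin/sh
# vcf_trim (hypothetical helper): the question's loop, with y used as a scalar string
vcf_trim() {
  awk 'BEGIN { OFS = "\t" }
  /^#/ { next }                                  # skip VCF header lines
  {
    m = length($4); n = length($5)
    y = ""                                       # scalar accumulator, not "delete y"
    if (m > n) {
      for (i = n + 1; i <= m; i++) y = y substr($4, i, 1)   # append one character at a time
      print $1, $2, $3, y, "-"
    } else if (n > m) {
      for (j = m + 1; j <= n; j++) y = y substr($5, j, 1)
      print $1, $2, $3, "-", y
    }
  }'
}

printf '1 877803 838425 GC G\n1 878077 966631 C CCACGG\n' | vcf_trim
```

Like the question's code, it prints nothing when both strings have the same length.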

Concatenate columns and adds digits awk

I have a CSV file:
number1;number2;min_length;max_length
"40";"1801";8;8
"40";"182";8;8
"42";"32";6;8
"42";"4";6;6
"43";"691";9;9
I want the output to be:
4018010000;4018019999
4018200000;4018299999
42320000;42329999
423200000;423299999
4232000000;4232999999
42400000;42499999
43691000000;43691999999
So the new file will consist of:
column_1 = a concatenation of old_column_1 + old_column_2 + a number of "0"s equal to (old_column_3 - length of old_column_2)
column_2 = a concatenation of old_column_1 + old_column_2 + a number of "9"s equal to (old_column_3 - length of old_column_2)
That is the case when min_length = max_length. When min_length is not equal to max_length, I need to take into account all the possible lengths. So for the line "42";"32";6;8, the lengths are 6, 7 and 8.
Also, I need to delete the quotation marks everywhere.
I tried paste and cut like this:
paste -d ";" <(cut -f1,2 -d ";" < file1) > file2
for concatenating the first two columns, but I think it's easier with awk. However, I can't figure out how to do it. Any help is appreciated. Thanks!
Edit: I added column 4 (max_length) to the input.
You may use this awk:
awk 'function padstr(ch, len, s) {
s = sprintf("%*s", len, "")
gsub(/ /, ch, s)
return s
}
BEGIN {
FS=OFS=";"
}
{
gsub(/"/, "");
for (i=0; i<=($4-$3); i++) {
d = $3 - length($2) + i
print $1 $2 padstr("0", d), $1 $2 padstr("9", d)
}
}' file
4018010000;4018019999
4018200000;4018299999
42320000;42329999
423200000;423299999
4232000000;4232999999
42400000;42499999
43691000000;43691999999
With awk:
awk '
BEGIN{FS = OFS = ";"} # set field separator and output field separator to be ";"
{
$0 = gensub("\"", "", "g"); # Drop double quotes
s = $1$2; # The range header number
l = $3-length($2); # Number of zeros or 9s to be appended
l = 10^l; # Get 10 raised to that number
print s*l, (s+1)*l-1; # Appending n zeros is multiplication by 10^n
# Appending n nines gives s*10^n + (10^n - 1) = (s+1)*10^n - 1
}' input.txt
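Explanation inline as comments. Note that gensub() is a GNU awk (gawk) extension, and that this version uses only min_length ($3) and ignores max_length ($4), so it prints a single line for "42";"32";6;8.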
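A portable sketch that also handles max_length could build the padding with a plain loop instead of gensub() or a "%*s" width. The names expand_ranges and repeat are hypothetical, and it assumes the first line is the header shown in the question, so it skips it. Building strings rather than multiplying also avoids depending on how a particular awk prints large numbers.

```shell
#!/bin/sh
# expand_ranges (hypothetical helper): one output line per length from min_length to max_length
expand_ranges() {
  awk '
    # repeat(ch, n): return ch repeated n times (s is a local variable)
    function repeat(ch, n,    s) {
      s = ""
      while (n-- > 0) s = s ch
      return s
    }
    BEGIN { FS = OFS = ";" }
    NR == 1 { next }                          # skip the header line
    {
      gsub(/"/, "")                           # drop the double quotes
      for (len = $3; len <= $4; len++) {      # every length from min_length to max_length
        d = len - length($2)                  # how many 0s or 9s to append
        print $1 $2 repeat("0", d), $1 $2 repeat("9", d)
      }
    }
  '
}

printf 'number1;number2;min_length;max_length\n"40";"1801";8;8\n"42";"32";6;8\n' | expand_ranges
```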

Print all the files which are at maximum depth in a directory

Print all the files which are present at the maximum depth.
For example, given:
abc/1/2/3/4/r.txt
abc/1/f1.txt
abc/11/22/44/66/77/f2.txt
abc/11/22/44/66/77/f4.txt
abc/11/22/44/66/77/f5.txt
this should print:
abc/11/22/44/66/77/f2.txt
abc/11/22/44/66/77/f4.txt
abc/11/22/44/66/77/f5.txt
I have written this command:
$ cat listoffiles.txt | awk -F "/" ' { if ( NF > x ) { x = NF; y = $0 } }END{ print y }'
but it prints only the first occurrence.
Keep buffering deepest files and discard them whenever the max depth changes. At the end, dump what's in the buffer.
awk -F'/+' 'NF>max{max=NF;delete buf} NF==max{buf[$0]} END{for(f in buf) print f}' file
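for (f in buf) visits the keys in an unspecified order, so the files may not come out in input order. If order matters, a sketch can store the lines under numeric indexes instead (deepest_files is a hypothetical name). Unlike the answer, which uses each line as an array key, this version also keeps duplicate lines.

```shell
#!/bin/sh
# deepest_files (hypothetical helper): print the paths with the most "/"-separated
# components, in the order they appear in the input (file arguments or stdin)
deepest_files() {
  awk -F'/+' '
    NF > max  { max = NF; n = 0 }     # deeper path found: forget the shallower ones
    NF == max { buf[++n] = $0 }       # remember paths at the current maximum depth, in order
    END { for (i = 1; i <= n; i++) print buf[i] }
  ' "$@"
}

printf 'abc/1/2/3/4/r.txt\nabc/1/f1.txt\nabc/11/22/44/66/77/f2.txt\nabc/11/22/44/66/77/f4.txt\nabc/11/22/44/66/77/f5.txt\n' | deepest_files
```

With the question's file you would call it as deepest_files listoffiles.txt.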

What does this awk command do?

awk 'NR > 1 {for(x=1;x<=NF;x++) if(x == 1 || (x >= 4 && x % 2 == 0))
printf "%s", $x (x == NF || x == (NF-1) ? "\n":"\t")}' depth.txt > depth_concoct.txt
I think:
NR > 1 means it starts from the second line,
for(x=1;x<=NF;x++) means for every field,
if(x == 1 || (x >= 4 && x % 2 == 0)) means if x equals 1 or (I don't understand the code from this part onward)
and I know that the input file for awk is depth.txt and the output of awk will be saved to depth_concoct.txt.
What does the code in the middle mean?
$ awk '
NR > 1 { # starting from the second record
for(x=1;x<=NF;x++) # iterate every field
if(x == 1 || (x >= 4 && x % 2 == 0)) # for the 1st, 4th and every even-numbered field after the 4th
printf "%s", # print the field, followed by
$x (x == NF || x == (NF-1) ? "\n":"\t") # a newline if it is the last printed field, otherwise a tab
}' depth.txt > depth_concoct.txt
(x == NF || x == (NF-1) ? "\n":"\t") is called the conditional (ternary) operator. In this context it is basically a streamlined version of:
if( x == NF || x == (NF-1) ) # if this is the last field to be printed
printf "\n" # finish the record with a newline
else # else
printf "\t" # print a tab after the field
You can rewrite it as below, which should be easier to read:
$ awk 'NR>1 {printf "%s", $1;
for(x=4;x<=NF;x+=2) printf "\t%s", $x;
print ""}' inputfile > outputfile
The complexity of the code is sometimes just an implementation detail.
It prints the first field and every second field starting from the 4th.
Assuming your file has 8 fields, this is equivalent to:
$ awk -v OFS='\t' 'NR>1{print $1,$4,$6,$8}' inputfile > outputfile
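To convince yourself that the question's one-liner and the loop rewrite print the same thing, you can run both on one sample. The function names original and rewrite and the sample lines are made up; the first sample line is skipped because of NR > 1.

```shell
#!/bin/sh
# original: the command from the question, reading stdin
original() {
  awk 'NR > 1 {for(x=1;x<=NF;x++) if(x == 1 || (x >= 4 && x % 2 == 0))
  printf "%s", $x (x == NF || x == (NF-1) ? "\n":"\t")}'
}

# rewrite: the loop version from the answer, reading stdin
rewrite() {
  awk 'NR>1 {printf "%s", $1;
  for(x=4;x<=NF;x+=2) printf "\t%s", $x;
  print ""}'
}

# Both should print field 1, 4, 6 and 8 of the second line, tab-separated
printf 'h1 h2 h3 h4 h5 h6 h7 h8\na b c d e f g h\n' | original
printf 'h1 h2 h3 h4 h5 h6 h7 h8\na b c d e f g h\n' | rewrite
```

With an odd number of fields (say 9) both still stop at field 8, because the original's x == (NF-1) test catches the last even field.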

awk: subtract the next row from one row if their first two columns are the same

We would like to subtract $17 of rows whose $1 and $2 are the same. Input:
targetID,cpd_number,Cell_assay_id,Cell_alt_assay_id,Cell_type_desc,Cell_Operator,Cell_result_value,Cell_unit_value,assay_id,alt_assay_id,type_desc,operator,result_value,unit_value,Ratio_operator,Ratio,log_ratio,Cell_experiment_date,experiment_date,Cell_discipline,discipline
111,CPD-123456,2222,1111,IC50,,6.1,uM,1183,1265,Ki,,0.16,uM,,38.125,1.7511,2003-03-03 00:00:00,2003-02-10 00:00:00,Cell,Enzyme
111,CPD-123456,2222,1111,IC50,,9.02053,uM,1183,1265,Ki,,0.16,uM,,56.3783,-1.5812,2003-02-27 00:00:00,2003-02-10 00:00:00,Cell,Enzyme
111,CPD-777888,3333,4444,IC50,,6.1,uM,1183,1265,Ki,,0.16,uM,,38.125,-1,2003-03-03 00:00:00,2003-02-10 00:00:00,Cell,Enzyme
111,CPD-777888,3333,4444,IC50,,9.02053,uM,1183,1265,Ki,,0.16,uM,,56.3783,-3,2003-02-27 00:00:00,2003-02-10 00:00:00,Cell,Enzyme
The desired output should be (1.7511-(-1.5812)=3.3323) and (-1-(-3)=2):
3.3323
2
First attempt:
awk -F, ' last != $1""$2 && last{ # ONLY When last key "TargetID + Cpd_number"
print C # differs from the current key, print the subtraction
C=0} # reset accumulators
{ # This block process each line of infile
C -= $17 # C calc
line=$0 # Line will be actual line without activity
last=$1""$2} # Store the key in order to track switching
END{ # This block triggers after the complete file read
# to print the last average that cannot be triggered during
# the previous block
print C}' input
It will give the output:
-0.1699
4
The second attempt:
#!/bin/bash
tail -n+2 test > test2 # remove the title/header
awk -F, '$1 == $1 && $2 == $2 {print $17}' test2 >> test3 # print $17 if the $1 and $2 are the same
awk 'NR==1{s=$1;next}{s-=$1}END{print s}' test3
rm test2 test3
test3 will be:
1.7511
-1.5812
-1
-3
The output is:
7.3323
Could any guru kindly give some comments? Thanks!
You could try the awk command below:
$ awk -F, 'NR==1{next} {var=$1; foo=$2; bar=$17; getline;} $1==var && $2==foo{xxx=bar-$17; print xxx}' file
3.3323
2
awk '
BEGIN { FS = "," }
NR == 1 { next } # skip header line
{ # accumulate totals
if ($1 SUBSEP $2 in a) # if key already exists
a[$1,$2] -= $17 # subtract $17 from value
else # if first appearance of this key
a[$1,$2] = $17 # set value to $17
}
END { # print results
for (x in a)
print a[x]
}
' file
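for (x in a) visits the keys in an unspecified order, so with many compounds the results may not follow the input order. Here is a sketch that remembers the first-seen order in a second array while keeping the same subtraction logic and, like the answer, skipping the header line. The names subtract_pairs, order and n are hypothetical.

```shell
#!/bin/sh
# subtract_pairs (hypothetical helper): for each ($1, $2) key, start from the first
# row's $17 and subtract $17 of every later row with the same key; print in input order
subtract_pairs() {
  awk -F, '
    NR == 1 { next }                  # skip the header line
    {
      k = $1 SUBSEP $2                # key: targetID + cpd_number
      if (k in a) {
        a[k] -= $17                   # later rows: subtract their log_ratio
      } else {
        a[k] = $17                    # first row for this key: start from its log_ratio
        order[++n] = k                # and remember the first-seen order
      }
    }
    END { for (i = 1; i <= n; i++) print a[order[i]] }
  ' "$@"
}

printf '%s\n' \
  'targetID,cpd_number' \
  '111,CPD-123456,2222,1111,IC50,,6.1,uM,1183,1265,Ki,,0.16,uM,,38.125,1.7511,2003-03-03 00:00:00,2003-02-10 00:00:00,Cell,Enzyme' \
  '111,CPD-123456,2222,1111,IC50,,9.02053,uM,1183,1265,Ki,,0.16,uM,,56.3783,-1.5812,2003-02-27 00:00:00,2003-02-10 00:00:00,Cell,Enzyme' \
  '111,CPD-777888,3333,4444,IC50,,6.1,uM,1183,1265,Ki,,0.16,uM,,38.125,-1,2003-03-03 00:00:00,2003-02-10 00:00:00,Cell,Enzyme' \
  '111,CPD-777888,3333,4444,IC50,,9.02053,uM,1183,1265,Ki,,0.16,uM,,56.3783,-3,2003-02-27 00:00:00,2003-02-10 00:00:00,Cell,Enzyme' \
  | subtract_pairs
```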