How to awk every nth line starting from different lines each iteration

How to awk every nth line starting from different lines each iteration - awk

I would like awk to print every nth line out of a file starting from line 0. Then, after awk has gone through the whole file, I would like it to print every nth line starting from line 1...then print every nth line starting from line 2...etc, up to printing every nth line starting from line n-1. My sad attempt thus far:
#!/bin/bash
rm *.sad *.sadd *.out
#Create loop index
for i in $(seq 20 1 36);
do
listm+=($i)
done
#Create input file
for j in "${listm[#]}"
do
if [ $j -eq 20 ];
then
awk 'NR % 20 == 0' vel_VMDout > atomvel.dat
awk '{print $2,$3,$4}' atomvel.dat > velocity.dat
else
awk 'NR % 20 == 1' vel_VMDout > $j.sad
egrep -v "^[[:space:]]*$|^#" $j.sad > $j.sadd
awk '{print $2, $3, $4}' $j.sadd > $j.out
paste velocity.dat $j.out > taste
fi
done
Let me try to clarify this by providing the input and what the output should look like. Th input is an xyz file of an MD simulation consisting of frames of the atoms' xyz coordinates.
INPUT:
This image shows the 1st snapshot and part of the second snapshot. Because these are snapshot, the ordering of the atoms do not change. Thus, I am trying to print the xyz coordinates from each snapshot for each specific atom in their own columns as shown below. This would eventually make a file consisting of 3N columns, where N is the number of atoms.
OUTPUT:
As you can see, the each atoms' coordinates are in their own columns and the total file is a Nx3N array. My bash script was me trying to do this, but could only do the first two atoms. I wanted to print every nth line (coordinates of the nth atom) so they look like the output. I really appreciate your patience all.

Generating sample data
This is a step that should not be necessary; the question should have included usable sample data and the required output from that sample data.
At one level, it won't help much because you don't have my random number generator program, but the script below shows how I generated the data that follows, and it illustrates the lengths to which it might be necessary to go when the question doesn't supply readable data. I generated some data that looks similar to the data in the question (at least superficially):
18
Generated by VMD in absentia
C 0.979485 -6.665347 0.575383
C 1.191999 -3.002386 2.859484
C 3.151517 -5.610077 0.429413
C 3.439828 -6.454984 1.319724
C 3.726201 -0.123038 2.096854
C 1.363325 -3.031238 0.016019
C 6.090283 -3.915340 2.396358
C 0.407755 -7.957784 -0.846842
C 0.203074 -0.796428 2.659573
O 2.600610 -2.259674 -0.260378
O 4.773839 -6.765097 0.588508
H 2.743424 -2.890016 2.906452
H 2.810233 -6.641054 -0.797672
H 6.854169 -3.191721 -0.925670
O 2.914233 -1.060001 0.776983
H 3.803923 -1.497032 2.908799
H 5.669443 -7.227666 -0.647552
H 0.092455 -5.850637 2.959987
18
Generated by VMD in absentia
C 6.042840 -7.254720 2.093573
C 2.551942 -6.044322 2.061072
C 3.523150 -6.167163 2.451689
C 5.197316 -3.429866 -0.412062
C 2.548777 -6.422851 1.282846
C 3.775197 -2.012031 1.377440
C 3.405112 -3.206415 -0.879886
C 1.448359 -5.419629 0.467291
C 3.661964 -2.789234 2.644294
O 4.214854 -2.439574 -0.951704
O 5.297609 -2.320418 2.709898
H 2.653940 -4.431080 -0.511743
H 5.040635 -0.676199 -0.590970
H 1.546725 -1.294582 2.562937
O 4.231461 -7.180908 1.629901
H 3.297836 -1.557133 -0.133280
H 3.442481 -4.489962 2.111930
H 1.423611 -7.982655 0.715618
18
Generated by VMD in absentia
C 1.432495 -7.686243 2.525734
C 5.038409 -4.976270 2.826846
C 6.184137 -7.303094 2.711561
C 3.208125 -0.606556 1.978725
C 2.171859 -6.792060 0.678988
C 6.521124 -5.622797 -0.773797
C 1.725619 -5.768633 -0.223397
C 3.602427 -2.325680 1.762008
C 1.937521 -1.686895 1.743159
O 0.745526 -0.114246 -0.949490
O 4.754360 -6.531145 1.998913
H 1.114732 -1.158810 1.486939
H 6.410490 -5.411647 0.062737
H 4.164330 -6.743763 1.802804
O 2.587841 -3.979700 2.609748
H 2.192073 -2.815376 -0.809569
H 5.501795 -2.326438 1.325829
H 3.285032 -1.212541 1.284453
18
Generated by VMD in absentia
C 3.564424 -3.117406 -0.032879
C 2.894745 -0.632591 0.532311
C 3.384916 -5.383135 1.179585
C 0.793488 -0.894539 -0.886891
C 1.348785 -6.501867 1.648604
C 2.189941 -2.438067 0.616090
C 2.043378 -4.966472 0.691603
C 3.124161 -5.792896 0.545362
C 5.741472 -0.640590 2.825374
O 0.300550 -7.149663 0.942726
O 1.344387 -0.121382 2.169401
H 4.963296 -0.964665 -0.230523
H 6.651423 -4.905053 2.509626
H 5.059694 -6.166516 0.102255
O 5.046864 -3.288883 0.853948
H 2.389007 -3.057664 1.806301
H 2.365876 -0.956860 1.458959
H 2.892502 -0.097422 -0.531714
The script I used to do it was:
random -n $((4 * 18)) -T '%8:6[0:7]F %8:6[-8:0]F %8:6[-1:3]F' |
awk 'BEGIN { n = split("CCCCCCCCCOOHHHOHHH", atoms, ""); atoms[0] = atoms[n] }
NR % n == 1 { print n; print " Generated by VMD in absentia" }
{ print "", atoms[NR%18], " ", $0 }'
The -n option to random says how many rows to generate; I chose 72. The -T option is a template, and the notation %8:6[0:7]F means use %8.6F format to print uniformly distributed random numbers between 0 and 7. The awk script takes the data that is so generated and interpolates the noise (the number of atoms and a variant on the 'generated by VMD' line), as well as tagging the lines with the appropriate atomic symbol.
Processing the sample data
Given some data, you then need to munge it to get the required output. This script more or less does the job. There are endless ways it should be improved, of course, such as taking file names as command line arguments, using temporary file names instead of fixed names, cleaning up the intermediate files, different compounds, different atoms (nitrogen, phosphorous, etc), and so on. However, it should adapt reasonably easily.
input="data"
output="output"
n=$(sed 1q "$input")
n2=$(($n+2))
for ((i = 3; i <= n2; i++))
do
colno=$(printf "%.2d" $(($i-2)))
awk -v N=$n2 -v R=$i \
' BEGIN { name["C"] = "Carbon"; name["H"] = "Hydrogen"; name["O"] = "Oxygen";
R0 = R % N }
NR > 2 && NR <= R { count[$1]++; }
NR == R { printf "%-32.32s\n", name[$1] " " count[$1]; }
NR % N == R0 { xyz = sprintf("%s %s %s", $2, $3, $4); printf "%-32.32s\n", xyz }
' "$input" > "column.$colno"
done
paste -d ' ' column.* > "$output"
The first four lines set up the control parameters, collecting the number of lines per unit of data from the input file, and adjusting things accordingly. The for loop iterates over offsets 3 to $n2 inclusive (skipping the two header lines), and runs the awk script. That encodes atom types (BEGIN), determines which atom it is processing this time (NR > 2 && NR <= R and NR == R), and then arranges to print the triplets of data for the relevant atom. The formatting is carefully organized so that the column headings and the actual xyz-triplets are uniformly spaced. These are written to a file column.$colno. When all's done, the column.* files are pasted to generate a single output file, which looks like this:
Carbon 1 Carbon 2 Carbon 3 Carbon 4 Carbon 5 Carbon 6 Carbon 7 Carbon 8 Carbon 9 Oxygen 1 Oxygen 2 Hydrogen 1 Hydrogen 2 Hydrogen 3 Oxygen 3 Hydrogen 4 Hydrogen 5 Hydrogen 6
0.979485 -6.665347 0.575383 1.191999 -3.002386 2.859484 3.151517 -5.610077 0.429413 3.439828 -6.454984 1.319724 3.726201 -0.123038 2.096854 1.363325 -3.031238 0.016019 6.090283 -3.915340 2.396358 0.407755 -7.957784 -0.846842 0.203074 -0.796428 2.659573 2.600610 -2.259674 -0.260378 4.773839 -6.765097 0.588508 2.743424 -2.890016 2.906452 2.810233 -6.641054 -0.797672 6.854169 -3.191721 -0.925670 2.914233 -1.060001 0.776983 3.803923 -1.497032 2.908799 5.669443 -7.227666 -0.647552 0.092455 -5.850637 2.959987
6.042840 -7.254720 2.093573 2.551942 -6.044322 2.061072 3.523150 -6.167163 2.451689 5.197316 -3.429866 -0.412062 2.548777 -6.422851 1.282846 3.775197 -2.012031 1.377440 3.405112 -3.206415 -0.879886 1.448359 -5.419629 0.467291 3.661964 -2.789234 2.644294 4.214854 -2.439574 -0.951704 5.297609 -2.320418 2.709898 2.653940 -4.431080 -0.511743 5.040635 -0.676199 -0.590970 1.546725 -1.294582 2.562937 4.231461 -7.180908 1.629901 3.297836 -1.557133 -0.133280 3.442481 -4.489962 2.111930 1.423611 -7.982655 0.715618
1.432495 -7.686243 2.525734 5.038409 -4.976270 2.826846 6.184137 -7.303094 2.711561 3.208125 -0.606556 1.978725 2.171859 -6.792060 0.678988 6.521124 -5.622797 -0.773797 1.725619 -5.768633 -0.223397 3.602427 -2.325680 1.762008 1.937521 -1.686895 1.743159 0.745526 -0.114246 -0.949490 4.754360 -6.531145 1.998913 1.114732 -1.158810 1.486939 6.410490 -5.411647 0.062737 4.164330 -6.743763 1.802804 2.587841 -3.979700 2.609748 2.192073 -2.815376 -0.809569 5.501795 -2.326438 1.325829 3.285032 -1.212541 1.284453
3.564424 -3.117406 -0.032879 2.894745 -0.632591 0.532311 3.384916 -5.383135 1.179585 0.793488 -0.894539 -0.886891 1.348785 -6.501867 1.648604 2.189941 -2.438067 0.616090 2.043378 -4.966472 0.691603 3.124161 -5.792896 0.545362 5.741472 -0.640590 2.825374 0.300550 -7.149663 0.942726 1.344387 -0.121382 2.169401 4.963296 -0.964665 -0.230523 6.651423 -4.905053 2.509626 5.059694 -6.166516 0.102255 5.046864 -3.288883 0.853948 2.389007 -3.057664 1.806301 2.365876 -0.956860 1.458959 2.892502 -0.097422 -0.531714
Your task is to understand why all the bits of the awk script are present. For example, why is R0 needed (hint, experiment without the R0 calculation, and use R in its place).

Related

How to replace a value to another value in a specific column on a gzipped file using awk?

I have a compressed file (.gz) The file has approx 7000000 rows and the first few lines look like this:
CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE T_STAT P
1 54712 1:54712 TTTTC T ADD 1460 0.00428077 0.0561095 0.0762931 0.939196
1 825069 rs4475692 G C G ADD 1460 -0.000411661 0.0413083 -0.00996558 0.99205
1 825410 rs13303179 G A G ADD 1460 0.00489633 0.041967 0.116671 0.907137
The end of the file has X in the first column
X 154929637 rs35185538:154929637:CT:C CT C C ADD 1460 0.0787708 0.0396199 1.98816 0.0469823
X 154929952 rs4012982:154929952:CAA:C CAA C C ADD 1460 0.0265508 0.0522027 0.50861 0.611104
X 154930230 rs781880:154930230:A:G A G G ADD 1460 0.0827871 0.0356246 2.32387 0.0202707
I want to replace the X (only the X) to 23 and preserve the header. I have tried to no avail.
gunzip -c file.gz | awk 'NR==1{gsub(/\X/,"23",$1)} 1' > out.txt
Any help will be appreciated.
Avni.

You could check only for X in the first column and check if the row number is greater than 1.
Then you can replace X at the start of the string using ^X with 23.
awk 'NR > 1 && $1=="X" {sub(/^X/,"23")}1' > out.txt

Awk failing extraction

I have a huge file containing the xyz positions of some atoms from different molecules. The whole file contains ~ 10000 configurations. I have created a script that iterates over the total number of configurations and extracts the coordinates associated with a specific atomic species that is systematically repeated at a fixed position, along with each frame associated with each system. My code works perfectly, except in the case in which the atomic position coincides with the last position of the frame I have to process, skipping to grab it and print in the corresponding file.
Each frame contains 384 atoms. In the xyz format, we have to take into account two extra lines at the beginning, where the number of atoms (in this case 384, line #1) and a blank/commented line are (line #2) are located.
The awk file with the list of atoms position lines is of the form:
{n = NR%386}
n == 1 {print "24"; next}
n == 2 ||
n == 91 ||
...
n == 378 ||
n == 380 ||
n == 381 ||
n == 386
where the n=NR%386 is the number of lines that awk has to account at every iteration in order to have the correct number of frames, in
n == 1 {print "24"; next}
the code prints the number of atoms I want to extract for each frame, in this case, 24.
The problem arises with the last value, in the last position of each frame before advancing to the next frame:
n == 386
When using the command
awk -f file.awk filename.xyz >> test.txt
the code will skip reading, extracting, and printing the last coordinate.
The filename.xyz I have to process is something like:
384
i = 3171, time = 3171.000, E = -3298.3005315786
C 6.66359796 19.29831718 16.63773520
C 6.19922671 19.83243350 15.35406226
C 7.73577004 21.24303011 16.94974860
C 7.32315891 21.77975003 15.67093925
N 5.08248005 17.55384984 15.51887635
N 7.75857672 23.00895664 15.43811018
N 8.58649028 22.07495287 17.61330368
N 7.45555304 19.97249138 17.42360101
...
...
...
N 3.62924684 23.22942656 15.38486984
N 4.52670891 22.25077226 17.55981432
N 3.17369677 20.23465407 17.45881199
N 2.28230853 21.30557433 14.86646780
S 1.48394488 18.18032187 17.21253664
S 0.70072709 19.13053602 14.60582837
S 4.67511560 23.53830074 16.57005901
Currently, just trying to extract only position 386
n == 386
produces something like:
1
i = 3171, time = 3171.000, E = -3298.3005315786
1
i = 3172, time = 3172.000, E = -3298.3023115390
1
i = 3173, time = 3173.000, E = -3298.3056102462
1
i = 3174, time = 3174.000, E = -3298.3101590395
that are just the corresponding to the commented lines, apparently skipping or not correctly interpreting which line to grep.
I would like to understand why awk if not able to extract the last line properly and how to solve the problem.

This appears to be a math problem. NR%386 will never be 386 because of the way the modulus operator works (there is no remainder when you divide 386 by 386). So your n==386 will never get executed. Try using (NR-1)%386 instead of NR%386 and shift all your conditionals accordingly:
n == 0 {print "24"; next}
etc. If you need n for calculations, add one to it.

Reading fields in previous lines for moving average

Main Question
What is the correct syntax for recursively calling AWK inside of another AWK program, and then saving the output to a (numeric) variable?
I want to call AWK using 2/3 variables:
N -> Can be read from Bash or from container AWK script.
Linenum -> Read from container AWK program
J -> Field that I would like to read
This is my attempt.
Container AWk program:
BEGIN {}
{
...
# Loop in j
...
k=NR
# Call to other instance of AWK
var=(awk -f -v n="$n_steps" linenum=k input-file 'linenum-n {printf "%5.4E", $j}'
...
}
END{}
Background for more general questions:
I have a file for which I would like to calculate a moving average of n (for example 2280) steps.
Ideally, for the first n rows the average is of the values 1 to k,
where k <= n.
For rows k > n the average would be of the last n values.
I will eventually execute the code in many large files, with several columns, and thousands to millions of rows, so I'm interested in streamlining the code as much as possible.
Code Excerpt and Description
The code I'm trying to develop looks something like this:
NR>1
{
# Loop over fields
for (j in columns)
{
# Rows before full moving average is done
if ( $1 <= n )
{
cumsum[j]=cumsum[j]+$j #Cumulative sum
$j=cumsum[j]/$1 # Average
}
#moving average
if ( $1 > n )
{
k=NR
last[j]=(awk -f -v n="$n_steps" ln=k input-file 'ln-n {printf "%5.4E", $j}') # Obtain value that will get ubstracted from moving average
cumsum[j]=cumsum[j]+$j-last[j] # Cumulative sum adds last step and deleted unwanted value
$j=cumsum[j]/n # Moving average
}
}
}
My input file contains several columns. The first column contains the row number, and the other columns contain values.
For the cumulative sum of the moving average: If I am in row k, I want to add it to the cumulative sum, but also start subtracting the first value that I don't need (k-n).
I don't want to have to create an array of cumulative sums for the last steps, because I feel it could impact performance. I prefer to directly select the values that I want to substract.
For that I need to call AWK once again (but on a different line). I attempt to do it in this line:
k=NR
last[j]=(awk -f -v n="$n_steps" ln=k input-file 'ln-n {printf "%5.4E", $j}'
I am sure that this code cannot be correct.
Discussion Questions
What is the best way to obtain information about a field in a previous line to the one that AWK is working on? Can it be then saved into a variable?
Is this recursive use of AWK allowed or even recommended?
If not, what could be the most efficient way to update the cumulative sum values so that I get an efficient enough code?
Sample input and Output
Here is a sample of the input (second column) and the desired output (third column). I'm using 3 as the number of averaging steps (n)
N VAL AVG_VAL
1 1 1
2 2 1.5
3 3 2
4 4 3
5 5 4
6 6 5
7 7 6
8 8 7
9 9 8
10 10 9
11 11 10
12 12 11
13 13 12
14 14 13
14 15 14

If you want to do a running average of a single column, you can do it this way:
BEGIN{n=2280; c=7}
{ s += $c - a[NR%n]; a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
Here we store the last n values in an array a and keep track of the cumulative sum s. Every time we update the sum we correct by first removing the last value from it.
If you want to do this for a couple of columns, you have to be a bit handy with keeping track of your arrays
BEGIN{n=2280; c[0]=7; c[1]=8; c[2]=9}
{ for(i in c) { s[i] += $c[i] - a[n*i + NR%n]; a[n*i + NR%n] = $c[i] } }
{ printf $0
for(i=0;i<length(c);++i) printf OFS (s[i]/(NR < n : NR ? n))
printf ORS
}
However, you mentioned that you have to add millions of entries. That is where it becomes a bit more tricky. Summing a lot of values will introduce numeric errors as you loose precision bit by bit (when you add floats). So in this case, I would suggest implementing the Kahan summation.
For a single column you get:
BEGIN{n=2280; c=7}
{ y = $c - a[NR%n] - k; t = s + y; k = (t - s) - y; s = t; a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
or a bit more expanded as:
BEGIN{n=2280; c=7}
{ y = $c - k; t = s + y; k = (t - s) - y; s = t; }
{ y = -a[NR%n] - k; t = s + y; k = (t - s) - y; s = t; }
{ a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
For a multi-column problem, it is now straightforward to adjust the above script. All you need to know is that y and t are temporary values and k is the compensation term which needs to be stored in memory.

Plotting a function directly from a text file

Is there a way to plot a function based on values from a text file?
I know how to define a function in gnuplot and then plot it but that is not what I need.
I have a table with constants for functions that are updated regularly. When this update happens I want to be able to run a script that draws a figure with this new curve. Since there are quite few figures to draw I want to automate the procedure.
Here is an example table with constants:
location a b c
1 1 3 4
2
There are two ways I see to solve the problem but I do not know if and how they can be implemented.
I can then use awk to produce the string: f(x)=1(x)**2+3(x)+4, write it to a file and somehow make gnuplot read this new file and plot on a certain x range.
or use awk inside gnuplot something like f(x) = awk /1/ {print "f(x)="$2 etc., or use awk directly in the plot command.
I any case, I'm stuck and have not found a solution to this problem online, do you have any suggestions?

Another possibilty to have a somewhat generic version for this, you can do the following:
Assume, the parameters are stored in a file parameters.dat with the first line containing the variable names and all others the parameter sets, like
location a b c
1 1 3 4
The script file looks like this:
file = 'parameters.dat'
par_names = system('head -1 '.file)
par_cnt = words(par_names)
# which parameter set to choose
par_line_num = 2
# select the respective string
par_line = system(sprintf('head -%d ', par_line_num).file.' | tail -1')
par_string = ''
do for [i=1:par_cnt] {
eval(word(par_names, i).' = '.word(par_line, i))
}
f(x) = a*x**2 + b*x + c
plot f(x) title sprintf('location = %d', location)

This question (gnuplot store one number from data file into variable) had some hints for me in the first answer.
In my case I have a file which contains parameters for a parabola. I have saved the parameters in gnuplot variables. Then I plot the function containing the parameter variables for each timestep.
#!/usr/bin/gnuplot
datafile = "parabola.txt"
set terminal pngcairo size 1000,500
set xrange [-100:100]
set yrange [-100:100]
titletext(timepar, apar, cpar) = sprintf("In timestep %d we have parameter a = %f, parameter c = %f", timepar, apar, cpar)
do for [step=1:400] {
set output sprintf("parabola%04d.png", step)
# read parameters from file, where the first line is the header, thus the +1
a=system("awk '{ if (NR == " . step . "+1) printf \"%f\", $1}' " . datafile)
c=system("awk '{ if (NR == " . step . "+1) printf \"%f\", $2}' " . datafile)
# convert parameters to numeric format
a=a+0.
c=c+0.
set title titletext(step, a, c)
plot c+a*x**2
}
This gives a series of png files called parabola0001.png,
parabola0002.png,
parabola0003.png,
…, each showing a parabola with the parameters read from the file called parabola.txt. The title contains the parameters of the given time step.
For understanding the gnuplot system() function you have to know that:
stuff inside double quotes is not parsed by gnuplot
the dot is for concatenating strings in gnuplot
the double quotes for the awk printf command have to be escaped, to hide them from gnuplot parser
To test this gnuplot script, save it into a file with an arbitrary name, e.g. parabolaplot.gplot and make it executable (chmad a+x parabolaplot.gplot). The parabola.txt file can be created with
awk 'BEGIN {for (i=1; i<=1000; i++) printf "%f\t%f\n", i/200, i/100}' > parabola.txt

awk '/1/ {print "plot "$2"*x**2+"$3"*x+"$4}' | gnuplot -persist
Will select the line and plot it

This was/is another question about how to extract specific values into variables with gnuplot (maybe it would be worth to create a Wiki entry about this topic).
There is no need for using awk, you can do this simply with gnuplot only (hence platform-independent), even with gnuplot 4.6.0 (March 2012).
You can do a stats (check help stats) and assign the values to variables.
Data: SO15007620_Parameters.txt
location a b c
1 1 3 4
2 -1 2 3
3 2 1 -1
Script: (works with gnuplot 4.6.0, March 2012)
### read parameters from separate file into variables
reset
FILE = "SO15007620_Parameters.txt"
myLine = 1 # line index 0-based
stats FILE u (a=$2, b=$3, c=$4) every ::myLine::myLine nooutput
f(x) = a*x**2 + b*x + c
plot f(x) w l lc rgb "red" ti sprintf("f(x) = %gx^2 + %gx + %g", a,b,c)
### end of script
Result:

gnuplot store one number from data file into variable

OSX v10.6.8 and Gnuplot v4.4
I have a data file with 8 columns. I would like to take the first value from the 6th column and make it the title. Here's what I have so far:
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
K = #read in data here
graph(n) = sprintf("K=%.2e",n)
set term aqua enhanced font "Times-Roman,18"
plot file using 1:3 title graph(K)
And here is what the first few rows of my data file looks like:
1.00e-07 1.00e-07 1.00e+00 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
1.11e-06 1.00e-07 9.02e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
2.12e-06 1.00e-07 4.72e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
3.13e-06 1.00e-07 3.20e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12090.00
I don't know how to correctly read in the data or if this is even the right way to go about this.
EDIT #1
Ok, thanks to mgilson I now have
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
set term aqua enhanced font "Times-Roman,18"
K = "`head -1 datafile | awk '{print $6}'`"
print K+0
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
but I get the error: Non-numeric string found where a numeric expression was expected
EDIT #2
file = "testPlot.txt"
K = "`head -1 file | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number #this is line 9
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
This gives the error--> head: file: No such file or directory
"testPlot.gnu", line 9: Non-numeric string found where a numeric expression was expected

You have a few options...
FIRST OPTION:
use columnheader
plot file using 1:3 title columnheader(6)
I haven't tested it, but this may prevent the first row from actually being plotted.
SECOND OPTION:
use an external utility to get the title:
TITLE="`head -1 datafile | awk '{print $6}'`"
plot 'datafile' using 1:3 title TITLE
If the variable is numeric, and you want to reformat it, in gnuplot, you can cast strings to a numeric type (integer/float) by adding 0 to them (e.g).
print "36.5"+0
Then you can format it with sprintf or gprintf as you're already doing.
It's weird that there is no float function. (int will work if you want to cast to an integer).
EDIT
The script below worked for me (when I pasted your example data into a file called "datafile"):
K = "`head -1 datafile | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number
graph(n) = sprintf("K=%.2e",n)
plot "datafile" using 1:3 title graph(K)
EDIT 2 (addresses comments below)
To expand a variable in backtics, you'll need macros:
set macro
file="mydatafile.txt"
#THE ORDER OF QUOTES (' and ") IS CRUCIAL HERE.
cmd='"`head -1 ' . file . ' | awk ''{print $6}''`"'
# . is string concatenation. (this string has 3 pieces)
# to get a single quote inside a single quoted string
# you need to double. e.g. 'a''b' yields the string a'b
data=#cmd
To address your question 2, it is a good idea to familiarize yourself with shell utilities -- sed and awk can both do it. I'll show a combination of head/tail:
cmd='"`head -2 ' . file . ' | tail -1 | awk ''{print $6}''`"'
should work.
EDIT 3
I recently learned that in gnuplot, system is a function as well as a command. To do the above without all the backtic gymnastics,
data=system("head -1 " . file . " | awk '{print $6}'")
Wow, much better.

This is a very old question, but here's a nice way to get access to a single value anywhere in your data file and save it as a gnuplot-accessible variable:
set term unknown #This terminal will not attempt to plot anything
plot 'myfile.dat' index 0 every 1:1:0:0:0:0 u (var=$1):1
The index number allows you to address a particular dataset (separated by two carriage returns), while every allows you to specify a particular line.
The colon-separated numbers after every should be of the form 1:1:<line_number>:<block_number>:<line_number>:<block_number>, where the line number is the line with the the block (starting from 0), and the block number is the number of the block (separated by a single carriage return, again starting from 0). The first and second numbers say plot every 1 lines and every one data block, and the third and fourth say start from line <line_number> and block <block_number>. The fifth and sixth say where to stop. This allows you to select a single line anywhere in your data file.
The last part of the plot command assigns the value in a particular column (in this case, column 1) to your variable (var). There needs to be two values to a plot command, so I chose column 1 to plot against my variable assignment statement.

Here is a less 'awk'-ward solution which assigns the value from the first row and 6th column of the file 'Data.txt' to the variable x16.
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
plot 'Data.txt' u 0:($0==0?(x16=$6):$6)
unset table
A more general example for storing several values is given below:
# Load data from file to variable
# Gnuplot can only access the data via the "plot" command
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
# Example: Assign all values according to: xij = Data33[i,j]; i,j = 1,2,3
plot 'Data33.txt' u 0:($0==0?(x11=$1):$1),\
'' u 0:($0==0?(x12=$2):$2),\
'' u 0:($0==0?(x13=$3):$3),\
'' u 0:($0==1?(x21=$1):$1),\
'' u 0:($0==1?(x22=$2):$2),\
'' u 0:($0==1?(x23=$3):$3),\
'' u 0:($0==2?(x31=$1):$1),\
'' u 0:($0==2?(x32=$2):$2),\
'' u 0:($0==2?(x33=$3):$3)
unset table
print x11, x12, x13 # Data from first row
print x21, x22, x23 # Data from second row
print x31, x32, x33 # Data from third row

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to awk every nth line starting from different lines each iteration - awk

Related

How to replace a value to another value in a specific column on a gzipped file using awk?

Awk failing extraction

Reading fields in previous lines for moving average

Plotting a function directly from a text file

gnuplot store one number from data file into variable

Categories

Resources