Awk failing extraction - awk

I have a huge file containing the xyz positions of some atoms from different molecules. The whole file contains ~ 10000 configurations. I have created a script that iterates over the total number of configurations and extracts the coordinates associated with a specific atomic species that is systematically repeated at a fixed position, along with each frame associated with each system. My code works perfectly, except in the case in which the atomic position coincides with the last position of the frame I have to process, skipping to grab it and print in the corresponding file.
Each frame contains 384 atoms. In the xyz format, we have to take into account two extra lines at the beginning, where the number of atoms (in this case 384, line #1) and a blank/commented line are (line #2) are located.
The awk file with the list of atoms position lines is of the form:
{n = NR%386}
n == 1 {print "24"; next}
n == 2 ||
n == 91 ||
...
n == 378 ||
n == 380 ||
n == 381 ||
n == 386
where the n=NR%386 is the number of lines that awk has to account at every iteration in order to have the correct number of frames, in
n == 1 {print "24"; next}
the code prints the number of atoms I want to extract for each frame, in this case, 24.
The problem arises with the last value, in the last position of each frame before advancing to the next frame:
n == 386
When using the command
awk -f file.awk filename.xyz >> test.txt
the code will skip reading, extracting, and printing the last coordinate.
The filename.xyz I have to process is something like:
384
i = 3171, time = 3171.000, E = -3298.3005315786
C 6.66359796 19.29831718 16.63773520
C 6.19922671 19.83243350 15.35406226
C 7.73577004 21.24303011 16.94974860
C 7.32315891 21.77975003 15.67093925
N 5.08248005 17.55384984 15.51887635
N 7.75857672 23.00895664 15.43811018
N 8.58649028 22.07495287 17.61330368
N 7.45555304 19.97249138 17.42360101
...
...
...
N 3.62924684 23.22942656 15.38486984
N 4.52670891 22.25077226 17.55981432
N 3.17369677 20.23465407 17.45881199
N 2.28230853 21.30557433 14.86646780
S 1.48394488 18.18032187 17.21253664
S 0.70072709 19.13053602 14.60582837
S 4.67511560 23.53830074 16.57005901
Currently, just trying to extract only position 386
n == 386
produces something like:
1
i = 3171, time = 3171.000, E = -3298.3005315786
1
i = 3172, time = 3172.000, E = -3298.3023115390
1
i = 3173, time = 3173.000, E = -3298.3056102462
1
i = 3174, time = 3174.000, E = -3298.3101590395
that are just the corresponding to the commented lines, apparently skipping or not correctly interpreting which line to grep.
I would like to understand why awk if not able to extract the last line properly and how to solve the problem.

This appears to be a math problem. NR%386 will never be 386 because of the way the modulus operator works (there is no remainder when you divide 386 by 386). So your n==386 will never get executed. Try using (NR-1)%386 instead of NR%386 and shift all your conditionals accordingly:
n == 0 {print "24"; next}
etc. If you need n for calculations, add one to it.

Related

How do you detect blank lines in Fortran?

Given an input that looks like the following:
123
456
789
42
23
1337
3117
I want to iterate over this file in whitespace-separated chunks in Fortran (any version is fine). For example, let's say I wanted to take the average of each chunk (e.g. mean(123, 456, 789) then mean(42, 23, 1337) then mean(31337)).
I've tried iterating through the file normally (e.g. READ), reading in each line as a string and then converting to an int and doing whatever math I want to do on each chunk. The trouble here is that Fortran "helpfully" ignores blank lines in my text file - so when I try and compare against the empty string to check for the blank line, I never actually get a .True. on that comparison.
I feel like I'm missing something basic here, since this is a typical functionality in every other modern language, I'd be surprised if Fortran didn't somehow have it.
If you're using so-called "list-directed" input (format = '*'), Fortran does special handling to spaces, commas, and blank lines.
To your point, there's a feature which is using the BLANK keyword with read
read(iunit,'(i10)',blank="ZERO",err=1,end=2) array
You can set:
blank="ZERO" will return a valid zero value if a blank is found;
blank="NULL" is the default behavior that skips blank/returns an error depending on the input format.
If all your input values are positive, you could use blank="ZERO" and then use the location of zero values to process your data.
EDIT as #vladimir-f has correctly pointed out, you not only have blanks in between lines, but also after the end of the numbers in most lines, so this strategy will not work.
You can instead load everything into an array, and process it afterwards:
program array_with_blanks
integer :: ierr,num,iunit
integer, allocatable :: array(:)
open(newunit=iunit,file='stackoverflow',form='formatted',iostat=ierr)
allocate(array(0))
do
read(iunit,'(i10)',iostat=ierr) num
if (is_iostat_end(ierr)) then
exit
else
array = [array,num]
endif
end do
close(iunit)
print *, array
end program
Just read each line as a character (but note Francescalus's comment on the format). Then read the character as an internal file.
program stuff
implicit none
integer io, n, value, sum
character (len=1000) line
n = 0
sum = 0
io = 0
open( 42, file="stuff.txt" )
do while( io == 0 )
read( 42, "( a )", iostat = io ) line
if ( io /= 0 .or. line == "" ) then
if ( n > 0 ) print *, ( sum + 0.0 ) / n
n = 0
sum = 0
else
read( line, * ) value
n = n + 1
sum = sum + value
end if
end do
close( 42 )
end program stuff
456.000000
467.333344
3117.00000

Reading fields in previous lines for moving average

Main Question
What is the correct syntax for recursively calling AWK inside of another AWK program, and then saving the output to a (numeric) variable?
I want to call AWK using 2/3 variables:
N -> Can be read from Bash or from container AWK script.
Linenum -> Read from container AWK program
J -> Field that I would like to read
This is my attempt.
Container AWk program:
BEGIN {}
{
...
# Loop in j
...
k=NR
# Call to other instance of AWK
var=(awk -f -v n="$n_steps" linenum=k input-file 'linenum-n {printf "%5.4E", $j}'
...
}
END{}
Background for more general questions:
I have a file for which I would like to calculate a moving average of n (for example 2280) steps.
Ideally, for the first n rows the average is of the values 1 to k,
where k <= n.
For rows k > n the average would be of the last n values.
I will eventually execute the code in many large files, with several columns, and thousands to millions of rows, so I'm interested in streamlining the code as much as possible.
Code Excerpt and Description
The code I'm trying to develop looks something like this:
NR>1
{
# Loop over fields
for (j in columns)
{
# Rows before full moving average is done
if ( $1 <= n )
{
cumsum[j]=cumsum[j]+$j #Cumulative sum
$j=cumsum[j]/$1 # Average
}
#moving average
if ( $1 > n )
{
k=NR
last[j]=(awk -f -v n="$n_steps" ln=k input-file 'ln-n {printf "%5.4E", $j}') # Obtain value that will get ubstracted from moving average
cumsum[j]=cumsum[j]+$j-last[j] # Cumulative sum adds last step and deleted unwanted value
$j=cumsum[j]/n # Moving average
}
}
}
My input file contains several columns. The first column contains the row number, and the other columns contain values.
For the cumulative sum of the moving average: If I am in row k, I want to add it to the cumulative sum, but also start subtracting the first value that I don't need (k-n).
I don't want to have to create an array of cumulative sums for the last steps, because I feel it could impact performance. I prefer to directly select the values that I want to substract.
For that I need to call AWK once again (but on a different line). I attempt to do it in this line:
k=NR
last[j]=(awk -f -v n="$n_steps" ln=k input-file 'ln-n {printf "%5.4E", $j}'
I am sure that this code cannot be correct.
Discussion Questions
What is the best way to obtain information about a field in a previous line to the one that AWK is working on? Can it be then saved into a variable?
Is this recursive use of AWK allowed or even recommended?
If not, what could be the most efficient way to update the cumulative sum values so that I get an efficient enough code?
Sample input and Output
Here is a sample of the input (second column) and the desired output (third column). I'm using 3 as the number of averaging steps (n)
N VAL AVG_VAL
1 1 1
2 2 1.5
3 3 2
4 4 3
5 5 4
6 6 5
7 7 6
8 8 7
9 9 8
10 10 9
11 11 10
12 12 11
13 13 12
14 14 13
14 15 14
If you want to do a running average of a single column, you can do it this way:
BEGIN{n=2280; c=7}
{ s += $c - a[NR%n]; a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
Here we store the last n values in an array a and keep track of the cumulative sum s. Every time we update the sum we correct by first removing the last value from it.
If you want to do this for a couple of columns, you have to be a bit handy with keeping track of your arrays
BEGIN{n=2280; c[0]=7; c[1]=8; c[2]=9}
{ for(i in c) { s[i] += $c[i] - a[n*i + NR%n]; a[n*i + NR%n] = $c[i] } }
{ printf $0
for(i=0;i<length(c);++i) printf OFS (s[i]/(NR < n : NR ? n))
printf ORS
}
However, you mentioned that you have to add millions of entries. That is where it becomes a bit more tricky. Summing a lot of values will introduce numeric errors as you loose precision bit by bit (when you add floats). So in this case, I would suggest implementing the Kahan summation.
For a single column you get:
BEGIN{n=2280; c=7}
{ y = $c - a[NR%n] - k; t = s + y; k = (t - s) - y; s = t; a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
or a bit more expanded as:
BEGIN{n=2280; c=7}
{ y = $c - k; t = s + y; k = (t - s) - y; s = t; }
{ y = -a[NR%n] - k; t = s + y; k = (t - s) - y; s = t; }
{ a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
For a multi-column problem, it is now straightforward to adjust the above script. All you need to know is that y and t are temporary values and k is the compensation term which needs to be stored in memory.

How to awk every nth line starting from different lines each iteration

I would like awk to print every nth line out of a file starting from line 0. Then, after awk has gone through the whole file, I would like it to print every nth line starting from line 1...then print every nth line starting from line 2...etc, up to printing every nth line starting from line n-1. My sad attempt thus far:
#!/bin/bash
rm *.sad *.sadd *.out
#Create loop index
for i in $(seq 20 1 36);
do
listm+=($i)
done
#Create input file
for j in "${listm[#]}"
do
if [ $j -eq 20 ];
then
awk 'NR % 20 == 0' vel_VMDout > atomvel.dat
awk '{print $2,$3,$4}' atomvel.dat > velocity.dat
else
awk 'NR % 20 == 1' vel_VMDout > $j.sad
egrep -v "^[[:space:]]*$|^#" $j.sad > $j.sadd
awk '{print $2, $3, $4}' $j.sadd > $j.out
paste velocity.dat $j.out > taste
fi
done
Let me try to clarify this by providing the input and what the output should look like. Th input is an xyz file of an MD simulation consisting of frames of the atoms' xyz coordinates.
INPUT:
This image shows the 1st snapshot and part of the second snapshot. Because these are snapshot, the ordering of the atoms do not change. Thus, I am trying to print the xyz coordinates from each snapshot for each specific atom in their own columns as shown below. This would eventually make a file consisting of 3N columns, where N is the number of atoms.
OUTPUT:
As you can see, the each atoms' coordinates are in their own columns and the total file is a Nx3N array. My bash script was me trying to do this, but could only do the first two atoms. I wanted to print every nth line (coordinates of the nth atom) so they look like the output. I really appreciate your patience all.
Generating sample data
This is a step that should not be necessary; the question should have included usable sample data and the required output from that sample data.
At one level, it won't help much because you don't have my random number generator program, but the script below shows how I generated the data that follows, and it illustrates the lengths to which it might be necessary to go when the question doesn't supply readable data. I generated some data that looks similar to the data in the question (at least superficially):
18
Generated by VMD in absentia
C 0.979485 -6.665347 0.575383
C 1.191999 -3.002386 2.859484
C 3.151517 -5.610077 0.429413
C 3.439828 -6.454984 1.319724
C 3.726201 -0.123038 2.096854
C 1.363325 -3.031238 0.016019
C 6.090283 -3.915340 2.396358
C 0.407755 -7.957784 -0.846842
C 0.203074 -0.796428 2.659573
O 2.600610 -2.259674 -0.260378
O 4.773839 -6.765097 0.588508
H 2.743424 -2.890016 2.906452
H 2.810233 -6.641054 -0.797672
H 6.854169 -3.191721 -0.925670
O 2.914233 -1.060001 0.776983
H 3.803923 -1.497032 2.908799
H 5.669443 -7.227666 -0.647552
H 0.092455 -5.850637 2.959987
18
Generated by VMD in absentia
C 6.042840 -7.254720 2.093573
C 2.551942 -6.044322 2.061072
C 3.523150 -6.167163 2.451689
C 5.197316 -3.429866 -0.412062
C 2.548777 -6.422851 1.282846
C 3.775197 -2.012031 1.377440
C 3.405112 -3.206415 -0.879886
C 1.448359 -5.419629 0.467291
C 3.661964 -2.789234 2.644294
O 4.214854 -2.439574 -0.951704
O 5.297609 -2.320418 2.709898
H 2.653940 -4.431080 -0.511743
H 5.040635 -0.676199 -0.590970
H 1.546725 -1.294582 2.562937
O 4.231461 -7.180908 1.629901
H 3.297836 -1.557133 -0.133280
H 3.442481 -4.489962 2.111930
H 1.423611 -7.982655 0.715618
18
Generated by VMD in absentia
C 1.432495 -7.686243 2.525734
C 5.038409 -4.976270 2.826846
C 6.184137 -7.303094 2.711561
C 3.208125 -0.606556 1.978725
C 2.171859 -6.792060 0.678988
C 6.521124 -5.622797 -0.773797
C 1.725619 -5.768633 -0.223397
C 3.602427 -2.325680 1.762008
C 1.937521 -1.686895 1.743159
O 0.745526 -0.114246 -0.949490
O 4.754360 -6.531145 1.998913
H 1.114732 -1.158810 1.486939
H 6.410490 -5.411647 0.062737
H 4.164330 -6.743763 1.802804
O 2.587841 -3.979700 2.609748
H 2.192073 -2.815376 -0.809569
H 5.501795 -2.326438 1.325829
H 3.285032 -1.212541 1.284453
18
Generated by VMD in absentia
C 3.564424 -3.117406 -0.032879
C 2.894745 -0.632591 0.532311
C 3.384916 -5.383135 1.179585
C 0.793488 -0.894539 -0.886891
C 1.348785 -6.501867 1.648604
C 2.189941 -2.438067 0.616090
C 2.043378 -4.966472 0.691603
C 3.124161 -5.792896 0.545362
C 5.741472 -0.640590 2.825374
O 0.300550 -7.149663 0.942726
O 1.344387 -0.121382 2.169401
H 4.963296 -0.964665 -0.230523
H 6.651423 -4.905053 2.509626
H 5.059694 -6.166516 0.102255
O 5.046864 -3.288883 0.853948
H 2.389007 -3.057664 1.806301
H 2.365876 -0.956860 1.458959
H 2.892502 -0.097422 -0.531714
The script I used to do it was:
random -n $((4 * 18)) -T '%8:6[0:7]F %8:6[-8:0]F %8:6[-1:3]F' |
awk 'BEGIN { n = split("CCCCCCCCCOOHHHOHHH", atoms, ""); atoms[0] = atoms[n] }
NR % n == 1 { print n; print " Generated by VMD in absentia" }
{ print "", atoms[NR%18], " ", $0 }'
The -n option to random says how many rows to generate; I chose 72. The -T option is a template, and the notation %8:6[0:7]F means use %8.6F format to print uniformly distributed random numbers between 0 and 7. The awk script takes the data that is so generated and interpolates the noise (the number of atoms and a variant on the 'generated by VMD' line), as well as tagging the lines with the appropriate atomic symbol.
Processing the sample data
Given some data, you then need to munge it to get the required output. This script more or less does the job. There are endless ways it should be improved, of course, such as taking file names as command line arguments, using temporary file names instead of fixed names, cleaning up the intermediate files, different compounds, different atoms (nitrogen, phosphorous, etc), and so on. However, it should adapt reasonably easily.
input="data"
output="output"
n=$(sed 1q "$input")
n2=$(($n+2))
for ((i = 3; i <= n2; i++))
do
colno=$(printf "%.2d" $(($i-2)))
awk -v N=$n2 -v R=$i \
' BEGIN { name["C"] = "Carbon"; name["H"] = "Hydrogen"; name["O"] = "Oxygen";
R0 = R % N }
NR > 2 && NR <= R { count[$1]++; }
NR == R { printf "%-32.32s\n", name[$1] " " count[$1]; }
NR % N == R0 { xyz = sprintf("%s %s %s", $2, $3, $4); printf "%-32.32s\n", xyz }
' "$input" > "column.$colno"
done
paste -d ' ' column.* > "$output"
The first four lines set up the control parameters, collecting the number of lines per unit of data from the input file, and adjusting things accordingly. The for loop iterates over offsets 3 to $n2 inclusive (skipping the two header lines), and runs the awk script. That encodes atom types (BEGIN), determines which atom it is processing this time (NR > 2 && NR <= R and NR == R), and then arranges to print the triplets of data for the relevant atom. The formatting is carefully organized so that the column headings and the actual xyz-triplets are uniformly spaced. These are written to a file column.$colno. When all's done, the column.* files are pasted to generate a single output file, which looks like this:
Carbon 1 Carbon 2 Carbon 3 Carbon 4 Carbon 5 Carbon 6 Carbon 7 Carbon 8 Carbon 9 Oxygen 1 Oxygen 2 Hydrogen 1 Hydrogen 2 Hydrogen 3 Oxygen 3 Hydrogen 4 Hydrogen 5 Hydrogen 6
0.979485 -6.665347 0.575383 1.191999 -3.002386 2.859484 3.151517 -5.610077 0.429413 3.439828 -6.454984 1.319724 3.726201 -0.123038 2.096854 1.363325 -3.031238 0.016019 6.090283 -3.915340 2.396358 0.407755 -7.957784 -0.846842 0.203074 -0.796428 2.659573 2.600610 -2.259674 -0.260378 4.773839 -6.765097 0.588508 2.743424 -2.890016 2.906452 2.810233 -6.641054 -0.797672 6.854169 -3.191721 -0.925670 2.914233 -1.060001 0.776983 3.803923 -1.497032 2.908799 5.669443 -7.227666 -0.647552 0.092455 -5.850637 2.959987
6.042840 -7.254720 2.093573 2.551942 -6.044322 2.061072 3.523150 -6.167163 2.451689 5.197316 -3.429866 -0.412062 2.548777 -6.422851 1.282846 3.775197 -2.012031 1.377440 3.405112 -3.206415 -0.879886 1.448359 -5.419629 0.467291 3.661964 -2.789234 2.644294 4.214854 -2.439574 -0.951704 5.297609 -2.320418 2.709898 2.653940 -4.431080 -0.511743 5.040635 -0.676199 -0.590970 1.546725 -1.294582 2.562937 4.231461 -7.180908 1.629901 3.297836 -1.557133 -0.133280 3.442481 -4.489962 2.111930 1.423611 -7.982655 0.715618
1.432495 -7.686243 2.525734 5.038409 -4.976270 2.826846 6.184137 -7.303094 2.711561 3.208125 -0.606556 1.978725 2.171859 -6.792060 0.678988 6.521124 -5.622797 -0.773797 1.725619 -5.768633 -0.223397 3.602427 -2.325680 1.762008 1.937521 -1.686895 1.743159 0.745526 -0.114246 -0.949490 4.754360 -6.531145 1.998913 1.114732 -1.158810 1.486939 6.410490 -5.411647 0.062737 4.164330 -6.743763 1.802804 2.587841 -3.979700 2.609748 2.192073 -2.815376 -0.809569 5.501795 -2.326438 1.325829 3.285032 -1.212541 1.284453
3.564424 -3.117406 -0.032879 2.894745 -0.632591 0.532311 3.384916 -5.383135 1.179585 0.793488 -0.894539 -0.886891 1.348785 -6.501867 1.648604 2.189941 -2.438067 0.616090 2.043378 -4.966472 0.691603 3.124161 -5.792896 0.545362 5.741472 -0.640590 2.825374 0.300550 -7.149663 0.942726 1.344387 -0.121382 2.169401 4.963296 -0.964665 -0.230523 6.651423 -4.905053 2.509626 5.059694 -6.166516 0.102255 5.046864 -3.288883 0.853948 2.389007 -3.057664 1.806301 2.365876 -0.956860 1.458959 2.892502 -0.097422 -0.531714
Your task is to understand why all the bits of the awk script are present. For example, why is R0 needed (hint, experiment without the R0 calculation, and use R in its place).

gnuplot store one number from data file into variable

OSX v10.6.8 and Gnuplot v4.4
I have a data file with 8 columns. I would like to take the first value from the 6th column and make it the title. Here's what I have so far:
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
K = #read in data here
graph(n) = sprintf("K=%.2e",n)
set term aqua enhanced font "Times-Roman,18"
plot file using 1:3 title graph(K)
And here is what the first few rows of my data file looks like:
1.00e-07 1.00e-07 1.00e+00 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
1.11e-06 1.00e-07 9.02e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
2.12e-06 1.00e-07 4.72e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
3.13e-06 1.00e-07 3.20e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12090.00
I don't know how to correctly read in the data or if this is even the right way to go about this.
EDIT #1
Ok, thanks to mgilson I now have
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
set term aqua enhanced font "Times-Roman,18"
K = "`head -1 datafile | awk '{print $6}'`"
print K+0
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
but I get the error: Non-numeric string found where a numeric expression was expected
EDIT #2
file = "testPlot.txt"
K = "`head -1 file | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number #this is line 9
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
This gives the error--> head: file: No such file or directory
"testPlot.gnu", line 9: Non-numeric string found where a numeric expression was expected
You have a few options...
FIRST OPTION:
use columnheader
plot file using 1:3 title columnheader(6)
I haven't tested it, but this may prevent the first row from actually being plotted.
SECOND OPTION:
use an external utility to get the title:
TITLE="`head -1 datafile | awk '{print $6}'`"
plot 'datafile' using 1:3 title TITLE
If the variable is numeric, and you want to reformat it, in gnuplot, you can cast strings to a numeric type (integer/float) by adding 0 to them (e.g).
print "36.5"+0
Then you can format it with sprintf or gprintf as you're already doing.
It's weird that there is no float function. (int will work if you want to cast to an integer).
EDIT
The script below worked for me (when I pasted your example data into a file called "datafile"):
K = "`head -1 datafile | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number
graph(n) = sprintf("K=%.2e",n)
plot "datafile" using 1:3 title graph(K)
EDIT 2 (addresses comments below)
To expand a variable in backtics, you'll need macros:
set macro
file="mydatafile.txt"
#THE ORDER OF QUOTES (' and ") IS CRUCIAL HERE.
cmd='"`head -1 ' . file . ' | awk ''{print $6}''`"'
# . is string concatenation. (this string has 3 pieces)
# to get a single quote inside a single quoted string
# you need to double. e.g. 'a''b' yields the string a'b
data=#cmd
To address your question 2, it is a good idea to familiarize yourself with shell utilities -- sed and awk can both do it. I'll show a combination of head/tail:
cmd='"`head -2 ' . file . ' | tail -1 | awk ''{print $6}''`"'
should work.
EDIT 3
I recently learned that in gnuplot, system is a function as well as a command. To do the above without all the backtic gymnastics,
data=system("head -1 " . file . " | awk '{print $6}'")
Wow, much better.
This is a very old question, but here's a nice way to get access to a single value anywhere in your data file and save it as a gnuplot-accessible variable:
set term unknown #This terminal will not attempt to plot anything
plot 'myfile.dat' index 0 every 1:1:0:0:0:0 u (var=$1):1
The index number allows you to address a particular dataset (separated by two carriage returns), while every allows you to specify a particular line.
The colon-separated numbers after every should be of the form 1:1:<line_number>:<block_number>:<line_number>:<block_number>, where the line number is the line with the the block (starting from 0), and the block number is the number of the block (separated by a single carriage return, again starting from 0). The first and second numbers say plot every 1 lines and every one data block, and the third and fourth say start from line <line_number> and block <block_number>. The fifth and sixth say where to stop. This allows you to select a single line anywhere in your data file.
The last part of the plot command assigns the value in a particular column (in this case, column 1) to your variable (var). There needs to be two values to a plot command, so I chose column 1 to plot against my variable assignment statement.
Here is a less 'awk'-ward solution which assigns the value from the first row and 6th column of the file 'Data.txt' to the variable x16.
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
plot 'Data.txt' u 0:($0==0?(x16=$6):$6)
unset table
A more general example for storing several values is given below:
# Load data from file to variable
# Gnuplot can only access the data via the "plot" command
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
# Example: Assign all values according to: xij = Data33[i,j]; i,j = 1,2,3
plot 'Data33.txt' u 0:($0==0?(x11=$1):$1),\
'' u 0:($0==0?(x12=$2):$2),\
'' u 0:($0==0?(x13=$3):$3),\
'' u 0:($0==1?(x21=$1):$1),\
'' u 0:($0==1?(x22=$2):$2),\
'' u 0:($0==1?(x23=$3):$3),\
'' u 0:($0==2?(x31=$1):$1),\
'' u 0:($0==2?(x32=$2):$2),\
'' u 0:($0==2?(x33=$3):$3)
unset table
print x11, x12, x13 # Data from first row
print x21, x22, x23 # Data from second row
print x31, x32, x33 # Data from third row

Formula in gawk

I have a problem that I’m trying to work out in gawk. This should be so simple, but my attempts ended up with a divide by zero error.
What I trying to accomplish is as follows –
maxlines = 22 (fixed value)
maxnumber = > max lines (unknown value)
Example:
maxlines=22
maxnumber=60
My output should look like the following:
print lines:
1
2
...
22
print lines:
23
24
...
45
print lines:
46 (remainder of 60 (maxnumber))
47
...
60
It's not clear what you're asking, but I assume you want to loop through input lines and print a new header (page header?) after every 22 lines. Using a simple counter and check for
count % 22 == 1
which tells you it's time to print the next page.
Or you could keep two counters, one for the absolute line number and another for the line number within the current page. When the second counter exceeds 22, reset it to zero and print the next page heading.
Worked out gawk precedence with some help and this works -
maxlines = 22
maxnumber = 60
for (i = 1; i <= maxnumber; i++){
if ( ! ( (i-1) % maxlines) ){
print "\nprint lines:"
}
print i
}