Fortran: read variables that are not present in a file

I need help understanding this 50 line program
implicit none
integer maxk, maxb, maxs
parameter (maxk=6000, maxb=1000, maxs=5)
integer nk, nspin, nband, ik, is, ib
double precision e(maxb, maxs, maxk), k(maxk)
double precision ef, kmin, kmax, emin, emax
logical overflow
read(5,*) ef
read(5,*) kmin, kmax
read(5,*) emin, emax
read(5,*) nband, nspin, nk
overflow = (nband.gt.maxb) .or. (nk.gt.maxk) .or. (nspin.gt.maxs)
if (overflow) stop 'Dimensions in gnubands too small'
write(6,"(2a)") '# GNUBANDS: Utility for SIESTA to transform ',
. 'bands output into Gnuplot format'
write(6,"(a)") '#'
write(6,"(2a)") '# ',
. ' Emilio Artacho, Feb. 1999'
write(6,"(2a)") '# ------------------------------------------',
. '--------------------------------'
write(6,"(a,f10.4)") '# E_F = ', ef
write(6,"(a,2f10.4)") '# k_min, k_max = ', kmin, kmax
write(6,"(a,2f10.4)") '# E_min, E_max = ', emin, emax
write(6,"(a,3i6)") '# Nbands, Nspin, Nk = ', nband, nspin, nk
write(6,"(a)") '#'
write(6,"(a)") '# k E'
write(6,"(2a)") '# ------------------------------------------',
. '--------------------------------'
read(5,*) (k(ik),((e(ib,is,ik),ib=1,nband), is=1,nspin), ik=1,nk)
do is = 1, nspin
do ib = 1, nband
write(6,"(2f14.6)") ( k(ik), e(ib,is,ik), ik = 1, nk)
write(6,"(/)")
enddo
enddo
This is a free format Fortran file. The program is called gnubands, and it rearranges the numbers in an input file (which the user specifies). I would like to know how this program operates. Here is what I do not understand. The program takes its input from a file and reads
ef, kmin,kmax,emin,emax,nband,nspin,nk
However, none of these variable names appear in the input file. I opened the input file in vi and searched for them with /, but found nothing. Nevertheless, the program appears to pick up all the values correctly. What is happening?
Also, I do not understand the read format
read(5,*) (k(ik),((e(ib,is,ik),ib=1,nband), is=1,nspin), ik=1,nk)
I am not familiar with the syntax and would like to know what it is saying or any references.

Some tutorial PDF of SIESTA shows that the input for gnubands.f is something like this:
whose header part is to be read by the first four read statements of gnubands.f. With this input, the variables are set as
ef = -5.018...
kmin = 0.000...
kmax = 3.338...
emin = -25.187...
emax = 143.069...
nband = 18
nspin = 1
nk = 150
by giving the input file from the standard input (assumed unit number 5) as
gfortran -o gnubands.x gnubands.f
gnubands.x < your_data_file.bands
Note that there are (and should be) no keywords like "ef" or "EF" or "Ef" (capitalization does not matter), because the numbers are directly read into the variables in gnubands.f. This is in contrast to other cases like using XML files, where (human-readable) tags or keywords are embedded in the file itself (e.g., pseudopotential files used by Quantum ESPRESSO). I guess your confusion might be coming from the use of namelist for obtaining input values, which looks like
namelist /your_inp/ a, b, c
read( funit, nml = your_inp )
with an input file
&your_inp
a = 1.0
b = "method1"
c = 77
/
In this case, the variable names (here, a, b, and c) appear literally in the input file.
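To make the positional reading concrete, and to show the order in which the implied-DO read in the question consumes values, here is a rough sketch in Python (not Fortran). The miniature input and its numbers are made up for illustration, and read_bands is a hypothetical helper, not part of SIESTA. It assumes whitespace-separated values only, whereas list-directed input also accepts commas.

```python
# Hypothetical miniature .bands-style input: 2 bands, 1 spin, 3 k-points.
# The numbers are invented for illustration only.
SAMPLE = """\
-5.0
0.0 1.0
-10.0 10.0
2 1 3
0.0 -9.0 -1.0
0.5 -8.0
-2.0
1.0 -7.0 -3.0
"""

def read_bands(text):
    lines = text.splitlines()
    # Each of the first four read(5,*) statements starts on a new line and
    # assigns the numbers it finds, in order, to the listed variables.
    ef = float(lines[0].split()[0])
    kmin, kmax = (float(x) for x in lines[1].split()[:2])
    emin, emax = (float(x) for x in lines[2].split()[:2])
    nband, nspin, nk = (int(x) for x in lines[3].split()[:3])

    # The fifth read is a single statement, so it keeps consuming values
    # across line breaks until every item in its list has been filled.
    tokens = iter(" ".join(lines[4:]).split())
    k = []
    e = {}  # keyed (ib, is_, ik), 1-based like the Fortran array
    for ik in range(1, nk + 1):              # outermost implied DO
        k.append(float(next(tokens)))        # k(ik) comes first
        for is_ in range(1, nspin + 1):      # middle implied DO
            for ib in range(1, nband + 1):   # innermost index varies fastest
                e[(ib, is_, ik)] = float(next(tokens))
    return ef, (kmin, kmax), (emin, emax), (nband, nspin, nk), k, e

ef, krange, erange, dims, k, e = read_bands(SAMPLE)
print(dims, k, e[(2, 1, 2)])
```

So for each k-point the file holds the k value followed by all band energies for each spin, and the line breaks inside that data block do not matter.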

Historically, unit 5 (as in your read(5,*)) is stdin, so either
(1) you are supplying the values yourself when you run the code, or
(2) I guess that when you run SIESTA (gnubands is a postprocessor for it), it creates a file, possibly named fort.5. Check that.

Related

How do you detect blank lines in Fortran?

Given an input that looks like the following:
123
456
789

42
23
1337

3117
I want to iterate over this file in whitespace-separated chunks in Fortran (any version is fine). For example, let's say I wanted to take the average of each chunk (e.g. mean(123, 456, 789) then mean(42, 23, 1337) then mean(3117)).
I've tried iterating through the file normally (e.g. READ), reading in each line as a string and then converting to an int and doing whatever math I want to do on each chunk. The trouble here is that Fortran "helpfully" ignores blank lines in my text file - so when I try and compare against the empty string to check for the blank line, I never actually get a .True. on that comparison.
I feel like I'm missing something basic here, since this is a typical functionality in every other modern language, I'd be surprised if Fortran didn't somehow have it.
If you're using so-called "list-directed" input (format = '*'), Fortran treats spaces, commas, and blank lines specially.
To your point, there is a feature for this: the BLANK keyword of read
read(iunit,'(i10)',blank="ZERO",err=1,end=2) array
You can set:
blank="ZERO" will return a valid zero value if a blank is found;
blank="NULL" is the default behavior that skips blank/returns an error depending on the input format.
If all your input values are positive, you could use blank="ZERO" and then use the location of zero values to process your data.
EDIT: as @vladimir-f has correctly pointed out, you have blanks not only between lines but also after the end of the numbers in most lines, so this strategy will not work.
You can instead load everything into an array, and process it afterwards:
program array_with_blanks
   implicit none
   integer :: ierr, num, iunit
   integer, allocatable :: array(:)
   open(newunit=iunit, file='stackoverflow', form='formatted', iostat=ierr)
   allocate(array(0))
   do
      ! with the i10 edit descriptor, an all-blank line is read as 0
      read(iunit, '(i10)', iostat=ierr) num
      if (is_iostat_end(ierr)) then
         exit
      else
         array = [array, num]   ! grow the array by one element
      endif
   end do
   close(iunit)
   print *, array
end program
Just read each line as a character (but note Francescalus's comment on the format). Then read the character as an internal file.
program stuff
   implicit none
   integer io, n, value, sum
   character (len=1000) line
   n = 0
   sum = 0
   io = 0
   open( 42, file="stuff.txt" )
   do while( io == 0 )
      read( 42, "( a )", iostat = io ) line
      if ( io /= 0 .or. line == "" ) then
         if ( n > 0 ) print *, ( sum + 0.0 ) / n
         n = 0
         sum = 0
      else
         read( line, * ) value
         n = n + 1
         sum = sum + value
      end if
   end do
   close( 42 )
end program stuff
456.000000
467.333344
3117.00000
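For comparison, here is the same "read each line as text, treat a blank line as a chunk boundary" idea as a small Python sketch. It is only an illustration of the logic, not a Fortran replacement. The sample string stands in for the input file, and a line containing only spaces is assumed to count as blank.

```python
from itertools import groupby

# Stand-in for the input file; the second separator line holds only spaces.
SAMPLE = "123\n456\n789\n\n42\n23\n1337\n   \n3117\n"

def chunk_means(text):
    means = []
    # groupby splits the lines into alternating runs of blank and non-blank
    # lines; strip() makes whitespace-only lines count as blank.
    for is_blank, run in groupby(text.splitlines(), key=lambda s: not s.strip()):
        if not is_blank:
            values = [int(s) for s in run]
            means.append(sum(values) / len(values))
    return means

print(chunk_means(SAMPLE))
```

The three means agree with the Fortran output above, up to the difference between single and double precision.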

Octave strread can't return parsed results to an array (?)

In Octave, I am reading very large text files from disk and parsing them. The function textread() does just what I want except for the way it is implemented. Looking at the source, textread.m pulls the entire text file into memory before attempting to parse lines. If the text file is large, it fills all my free RAM (16 GB) with text and then starts saving back to disk (virtual memory), before parsing. If I wait long enough, textread() will complete, but it takes almost forever.
Notice that after parsing into a matrix of floating point values, the same data fit into memory quite easily. So I'm using textread() in an intermediate zone, where there is enough memory for the floats, but not enough memory for the same data as text.
All of that is preparation for my question, which is about strread(). The data in my text files looks like this
0.0647148 -2.0072535 0.5644875 8.6954257
0.1294296 -8.4689583 0.6567095 144.3090450
0.1941444 -9.2658037 -1.0228742 173.8027785
0.2588593 -6.5483359 -1.5767574 90.7337329
0.3235741 -0.7646807 -0.5320896 1.7357120
... and so on. There are no header lines or comments in the file.
I wrote a function that reads the file line by line, and notice the two ways I'm attempting to use strread() to parse a line of data.
function dest = readPowerSpectrumFile(filename, dest)
  % read enough lines to fill destination array
  [rows, cols] = size(dest);
  fid = fopen(filename, 'r');
  for line = 1 : rows
    lstr = fgetl(fid);
    % this line works, but is very brittle
    [dest(line, 1), dest(line, 2), dest(line, 3), dest(line, 4)] = strread(lstr, "%f %f %f %f");
    % This line doesn't work. Or anything similar I can think of.
    % dest(line, 1:4) = strread(lstr, "%f %f %f %f");
  endfor
  fclose(fid);
endfunction
Is there an elegant way of having strread return parsed values to an array? Otherwise I'll have to write a new function any time I change the number of columns.
Thanks
Your described format is a matrix with floating point values. In this case you can just use load
d = load ("yourfile");
which is much faster than any other function. You can have a look at the used implementation in libinterp/corefcn/ls-mat-ascii.cc: read_mat_ascii_data
If you feed fprintf more values than there are conversions in its format specification, it will reapply the format until it has used them all up:
>> fprintf("%d %d \n", 1:6)
1 2
3 4
5 6
It appears this also works with strread. If you specify only one value to read, but there are multiple on the current line, it will keep reading them and add them to a column vector. All we need to do is to assign those values to the correct row of dest:
function dest = readPowerSpectrumFile(filename, dest)
  % read enough lines to fill destination array
  [rows, cols] = size(dest);
  fid = fopen(filename, 'r');
  for line = 1 : rows
    lstr = fgetl(fid);
    % read all values from current line into column vector
    % and store values into row of dest
    dest(line,:) = strread(lstr, "%f");
    % this will also work since values are assumed to be numeric by default:
    % dest(line,:) = strread(lstr);
  endfor
  fclose(fid);
endfunction
Output:
readPowerSpectrumFile(filename, zeros(5,4))
ans =
6.4715e-02 -2.0073e+00 5.6449e-01 8.6954e+00
1.2943e-01 -8.4690e+00 6.5671e-01 1.4431e+02
1.9414e-01 -9.2658e+00 -1.0229e+00 1.7380e+02
2.5886e-01 -6.5483e+00 -1.5768e+00 9.0734e+01
3.2357e-01 -7.6468e-01 -5.3209e-01 1.7357e+00
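The underlying idea (stream the file one line at a time and let the number of columns follow from the data instead of from the format string) is not specific to Octave. A minimal Python sketch of it, with a hypothetical read_rows helper and an in-memory stream standing in for the large file:

```python
import io

def read_rows(stream, max_rows):
    """Parse up to max_rows lines of whitespace-separated floats."""
    rows = []
    for line in stream:              # yields one line at a time, never the whole file
        if len(rows) == max_rows:
            break
        fields = line.split()
        if fields:                   # ignore empty lines
            rows.append([float(x) for x in fields])
    return rows

# Two lines taken from the question's sample data.
sample = io.StringIO(
    "0.0647148 -2.0072535 0.5644875 8.6954257\n"
    "0.1294296 -8.4689583 0.6567095 144.3090450\n"
)
rows = read_rows(sample, 5)
print(len(rows), len(rows[0]))
```

Only the current line is ever held as text, so memory use is dominated by the parsed floats.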

Reading, parsing and storing .txt files contents in Torch tensors efficiently

I have a huge number of .txt files (maybe around 10 million), each having the same number of rows/columns. They are actually single-channel images, and the pixel values are separated by a space. Here's the code I've written to do the work, but it's very slow. I wonder if someone can suggest a more optimized/efficient way of doing this:
require 'torch'
f = assert(io.open(txtFilePath, 'r'))
local tempTensor = torch.Tensor(1, 64, 64):fill(0)
local i = 1
for line in f:lines() do
  local l = line:split(' ')
  for key, val in ipairs(l) do
    tempTensor[{1, i, key}] = tonumber(val)
  end
  i = i + 1
end
f:close()
In brief: change your source files, if that is possible.
The only thing I can suggest is to use binary data instead of text as the source.
You are using slow methods: f:lines(), line:split(' ') and tonumber(val). All of them operate on strings.
As I understood, you have got file like this:
0 10 20
11 18 22
....
so convert your source into binary, like this:
<0><10><20><11><18><22> ...
where each <n> stands for a single byte holding the value n; in hex, <18> is the byte 12 and <22> is the byte 16, etc.
To read it:
fid = io.open(sup_filename, "rb")
while true do
  local bytes = fid:read(1)      -- read one byte as a 1-character string
  if bytes == nil then break end -- EOF
  local st = bytes:byte()        -- numeric value (0-255) of that byte
  print(st)
end
fid:close()
https://www.lua.org/pil/21.2.2.html
It would be dramatically faster.
Maybe using regular expressions (instead of :split() and lines()) could help you, but I doubt it.
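To illustrate the text-to-binary conversion suggested above, here is a rough Python sketch of a one-off converter and the matching reader. It assumes every pixel value is an integer in the range 0..255 so that it fits in one byte. The helper names are hypothetical, and the tiny 2x3 sample stands in for a 64x64 image.

```python
def text_to_bytes(text):
    """Convert whitespace-separated integers to one byte per value."""
    values = [int(tok) for tok in text.split()]
    return bytes(values)  # raises ValueError if a value is outside 0..255

def bytes_to_rows(blob, ncols):
    """Rebuild the rows from the flat byte string; no string parsing needed."""
    return [list(blob[i:i + ncols]) for i in range(0, len(blob), ncols)]

text = "0 10 20\n11 18 22\n"
blob = text_to_bytes(text)
rows = bytes_to_rows(blob, 3)
print(blob.hex(), rows)
```

The binary file stores no separators, so the reader has to know the row length in advance (64 in the question). The conversion cost is paid once per file instead of on every load.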

gnuplot store one number from data file into variable

OSX v10.6.8 and Gnuplot v4.4
I have a data file with 8 columns. I would like to take the first value from the 6th column and make it the title. Here's what I have so far:
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
K = #read in data here
graph(n) = sprintf("K=%.2e",n)
set term aqua enhanced font "Times-Roman,18"
plot file using 1:3 title graph(K)
And here is what the first few rows of my data file looks like:
1.00e-07 1.00e-07 1.00e+00 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
1.11e-06 1.00e-07 9.02e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
2.12e-06 1.00e-07 4.72e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
3.13e-06 1.00e-07 3.20e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12090.00
I don't know how to correctly read in the data or if this is even the right way to go about this.
EDIT #1
Ok, thanks to mgilson I now have
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
set term aqua enhanced font "Times-Roman,18"
K = "`head -1 datafile | awk '{print $6}'`"
print K+0
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
but I get the error: Non-numeric string found where a numeric expression was expected
EDIT #2
file = "testPlot.txt"
K = "`head -1 file | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number #this is line 9
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
This gives the error--> head: file: No such file or directory
"testPlot.gnu", line 9: Non-numeric string found where a numeric expression was expected
You have a few options...
FIRST OPTION:
use columnheader
plot file using 1:3 title columnheader(6)
I haven't tested it, but this may prevent the first row from actually being plotted.
SECOND OPTION:
use an external utility to get the title:
TITLE="`head -1 datafile | awk '{print $6}'`"
plot 'datafile' using 1:3 title TITLE
If the variable is numeric and you want to reformat it, you can cast strings to a numeric type (integer/float) in gnuplot by adding 0 to them, e.g.:
print "36.5"+0
Then you can format it with sprintf or gprintf as you're already doing.
It's weird that there is no float function. (int will work if you want to cast to an integer).
EDIT
The script below worked for me (when I pasted your example data into a file called "datafile"):
K = "`head -1 datafile | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number
graph(n) = sprintf("K=%.2e",n)
plot "datafile" using 1:3 title graph(K)
EDIT 2 (addresses comments below)
To expand a variable in backticks, you'll need macros:
set macro
file="mydatafile.txt"
#THE ORDER OF QUOTES (' and ") IS CRUCIAL HERE.
cmd='"`head -1 ' . file . ' | awk ''{print $6}''`"'
# . is string concatenation. (this string has 3 pieces)
# to get a single quote inside a single quoted string
# you need to double. e.g. 'a''b' yields the string a'b
data=@cmd
To address your question 2, it is a good idea to familiarize yourself with shell utilities -- sed and awk can both do it. I'll show a combination of head/tail:
cmd='"`head -2 ' . file . ' | tail -1 | awk ''{print $6}''`"'
should work.
EDIT 3
I recently learned that in gnuplot, system is a function as well as a command. To do the above without all the backtick gymnastics,
data=system("head -1 " . file . " | awk '{print $6}'")
Wow, much better.
This is a very old question, but here's a nice way to get access to a single value anywhere in your data file and save it as a gnuplot-accessible variable:
set term unknown #This terminal will not attempt to plot anything
plot 'myfile.dat' index 0 every 1:1:0:0:0:0 u (var=$1):1
The index number allows you to address a particular dataset (datasets are separated by two blank lines), while every allows you to specify a particular line.
The colon-separated numbers after every should be of the form 1:1:<line_number>:<block_number>:<line_number>:<block_number>, where the line number is the line within the block (starting from 0), and the block number is the number of the block (blocks are separated by a single blank line, again starting from 0). The first and second numbers say plot every 1 lines and every one data block, and the third and fourth say start from line <line_number> and block <block_number>. The fifth and sixth say where to stop. This allows you to select a single line anywhere in your data file.
The last part of the plot command assigns the value in a particular column (in this case, column 1) to your variable (var). There needs to be two values to a plot command, so I chose column 1 to plot against my variable assignment statement.
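The dataset/block/line addressing is easier to see outside gnuplot. Below is a rough Python sketch of the same selection, under the blank-line rules described above. split_datasets and pick are hypothetical helpers, the sample data is made up, and comment lines are not handled at all.

```python
def split_datasets(text):
    """Return datasets -> blocks -> lines -> fields, split on blank lines."""
    datasets, blocks, block = [], [], []
    blank_run = 0
    for line in text.splitlines():
        if not line.strip():
            blank_run += 1
            continue
        if block and blank_run >= 2:      # two blank lines: new dataset
            blocks.append(block)
            datasets.append(blocks)
            blocks, block = [], []
        elif block and blank_run == 1:    # one blank line: new block
            blocks.append(block)
            block = []
        blank_run = 0
        block.append(line.split())
    if block:
        blocks.append(block)
    if blocks:
        datasets.append(blocks)
    return datasets

def pick(datasets, index, block_no, line_no, column):
    """All indices start at 0 except column, which starts at 1 as in gnuplot."""
    return float(datasets[index][block_no][line_no][column - 1])

SAMPLE = "1 10\n2 20\n\n3 30\n4 40\n\n\n5 50\n"
data = split_datasets(SAMPLE)
print(pick(data, index=0, block_no=1, line_no=0, column=2))
```

Here index=0, block_no=1, line_no=0 corresponds to index 0 every 1:1:0:1:0:1 in the plot command, and column=2 to $2.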
Here is a less 'awk'-ward solution which assigns the value from the first row and 6th column of the file 'Data.txt' to the variable x16.
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
plot 'Data.txt' u 0:($0==0?(x16=$6):$6)
unset table
A more general example for storing several values is given below:
# Load data from file to variable
# Gnuplot can only access the data via the "plot" command
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
# Example: Assign all values according to: xij = Data33[i,j]; i,j = 1,2,3
plot 'Data33.txt' u 0:($0==0?(x11=$1):$1),\
'' u 0:($0==0?(x12=$2):$2),\
'' u 0:($0==0?(x13=$3):$3),\
'' u 0:($0==1?(x21=$1):$1),\
'' u 0:($0==1?(x22=$2):$2),\
'' u 0:($0==1?(x23=$3):$3),\
'' u 0:($0==2?(x31=$1):$1),\
'' u 0:($0==2?(x32=$2):$2),\
'' u 0:($0==2?(x33=$3):$3)
unset table
print x11, x12, x13 # Data from first row
print x21, x22, x23 # Data from second row
print x31, x32, x33 # Data from third row

How to load 2D array from a text(csv) file into Octave?

Consider the following text(csv) file:
1, Some text
2, More text
3, Text with comma, more text
How to load the data into a 2D array in Octave? The number can go into the first column, and all text to the right of the first comma (including other commas) goes into the second text column.
If necessary, I can replace the first comma with a different delimiter character.
AFAIK you cannot put strings of different sizes into an array. You need to create a so-called cell array.
A possible way to read the data from your question stored in a file Test.txt into a cell array is
t1 = textread("Test.txt", "%s", "delimiter", "\n");
for i = 1:length(t1)
j = findstr(t1{i}, ",")(1);
T{i,1} = t1{i}(1:j - 1);
T{i,2} = strtrim(t1{i}(j + 1:end));
end
Now
T{3,1} gives you 3 and
T{3,2} gives you Text with comma, more text.
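The key step in the loop above is to split each line at the first comma only, so that later commas stay in the text column. The same step as a small Python sketch, for illustration (parse_line is a hypothetical helper, and the lines are the ones from the question):

```python
LINES = [
    "1, Some text",
    "2, More text",
    "3, Text with comma, more text",
]

def parse_line(line):
    # partition splits at the first comma only; everything after it,
    # including further commas, ends up in the text field.
    number, _, text = line.partition(",")
    return int(number), text.strip()

table = [parse_line(line) for line in LINES]
print(table[2])
```

Unlike the cell array above, this sketch converts the first field to an integer instead of keeping it as a string.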
After many long hours of searching and debugging, here's how I got it to work on Octave 3.2.4. Using | as the delimiter (instead of comma).
The data file now looks like:
1|Some text
2|More text
3|Text with comma, more text
Here's how to call it: data = load_data('data/data_file.csv', NUMBER_OF_LINES);
Limitation: You need to know how many lines you want to get. If you want to get all, then you will need to write a function to count the number of lines in the file in order to initialize the cell_array. It's all very clunky and primitive. So much for "high level languages like Octave".
Note: After the unpleasant exercise of getting this to work, it seems that Octave is not very useful unless you enjoy wasting your time writing code to do the simplest things. Better choices seems to be R, Python, or C#/Java with a Machine Learning or Matrix library.
function all_messages = load_data(filename, NUMBER_OF_LINES)
  fid = fopen(filename, "r");
  all_messages = cell(NUMBER_OF_LINES, 2);
  counter = 1;
  line = fgetl(fid);
  while ischar(line)  % fgetl returns -1 (not a string) at end of file
    separator_index = index(line, '|');
    all_messages{counter, 1} = substr(line, 1, separator_index - 1); % Up to the separator
    all_messages{counter, 2} = substr(line, separator_index + 1, length(line) - separator_index); % After the separator
    counter++;
    line = fgetl(fid);
  endwhile
  fprintf("Processed %i lines.\n", counter - 1);
  fclose(fid);
end