Getting wrong zero values with numpy fromfile when reading binary files

I am trying to read a binary file with Python. This is the code I use:
fb = open(Bin_File, "r")
a = numpy.fromfile(fb, dtype=numpy.float32)
However, I get zero values at the end of the array. For example, in a case where nrows=296 and ncols=439, so that len(a) should be 296*439, I get zero values for a[-922:]. I know from a trusted piece of R code that these values should be noData (-9999 in this example). Does anybody know why I am getting these nonsense zeros?
P.S.: I am not sure whether it is related or not, but len(a) is actually nrows*ncols+2! I have to drop the two extra values using a = a[0:-2] so that reshaping into rows and columns with a_reshape = a.reshape(nrows, ncols) does not raise an error.

When opening a file for reading as binary, you should use the mode "rb" instead of "r".
Here is some background from the docs. On Linux machines running Python 2 you don't need the "b", but it won't hurt (Python 3 distinguishes text and binary mode on every platform). On Windows machines you must use "rb" for binary files.
Also note that the two extra entries you're getting are a common bug/feature of Fortran's "unformatted" binary output format. Each write statement issued in this mode produces a record that is surrounded by two 4-byte blocks.
These blocks hold integers that give the number of bytes in the record of unformatted data. For example, [223] [223 bytes of data] [223].
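The record layout described above can be sketched with Python's standard struct module. This is only an illustration under stated assumptions: the helper name read_fortran_record is made up, and the markers are assumed to be 4-byte little-endian integers, which is common but compiler-dependent. An in-memory buffer stands in for a file opened with mode "rb".

```python
import io
import struct

def read_fortran_record(f, marker_fmt="<i"):
    """Read one record of a Fortran unformatted sequential file.

    Assumes 4-byte little-endian length markers (compiler-dependent).
    Returns the payload bytes, or None at end of file.
    """
    size = struct.calcsize(marker_fmt)
    head = f.read(size)
    if not head:
        return None
    (nbytes,) = struct.unpack(marker_fmt, head)
    payload = f.read(nbytes)
    (tail,) = struct.unpack(marker_fmt, f.read(size))
    if tail != nbytes:
        raise ValueError("leading and trailing record markers differ")
    return payload

# Build a fake record: three float32 values wrapped in two length markers.
values = (1.5, -9999.0, 2.25)
body = struct.pack("<3f", *values)
marker = struct.pack("<i", len(body))
fake_file = io.BytesIO(marker + body + marker)  # stands in for open(path, "rb")

# Reading the whole buffer as float32 would yield two extra "values".
naive_count = len(fake_file.getvalue()) // 4

payload = read_fortran_record(fake_file)
decoded = struct.unpack("<%df" % (len(payload) // 4), payload)
print(naive_count, decoded)
```

Interpreting the whole buffer as float32 gives five entries instead of three, which matches the nrows*ncols+2 length seen in the question when the file holds a single record.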

Related

Fortran: How to skip many lines of data file efficiently

I have a formatted data file which is typically billions of lines long, with several lines of headers of variable length. The data file takes the form:
# header 1
# header 2
# headers are of variable length.
# data begins from next line.
1.23 4.56 7.89 0.12
2.34 5.67 8.90 1.23
:
:
# billions of lines of data, each row the same length, same format.
-- end of file --
I would like to extract a portion of data from this file, and my current code looks like:
do j=1,jmax !Suppose I want to extract jmax lines of data from the file.
   [algorithm to determine number of lines to skip, "N(j)"]
   !This determines the number of lines to skip from the previous file
   !position, when the data was read on j-1th iteration.
   !Skip N-1 lines to go to the next data line to read off:
   do i=1,N-1
      read(unit=unit,fmt='(A)')
   end do
   !Now read off the line of data I want:
   read(unit=unit,fmt='(data_format)'),data1,data2,etc.
   !Data is stored in some arrays.
end do
The problem is, N(j) can be anywhere between 1 and several billion, so it takes some time to run the code.
My question is, is there a more efficient way of skipping millions of lines of data? The only way I can think of, while sticking to Fortran, is to open the file with direct access and jump to the desired line upon opening the file.
As you suggest, direct access seems like the best option. But that requires all records to have the same length, which your headers violate. Also, why use formatted output? With a file of this length, it's hard to imagine a person reading it. If you use unformatted IO, the file will be smaller and IO will be faster. Perhaps create two files: one with the headers (metadata) in human-readable form, and the other with the data in native form. Native/binary representation means a loss of portability, which is something to consider if you want to move the files to different computer architectures or keep them usable for decades. Otherwise it's probably worth the convenience. Another option would be to use a more sophisticated file format that combines metadata and data, such as HDF5 or FITS, but for communication between two programs written by one person, that's probably excessive.
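To illustrate why equal-length records make skipping cheap, here is a minimal sketch in Python rather than Fortran. It assumes each row is stored as four native float64 values with no headers in the data file; the names RECORD_FMT, write_records and read_record are hypothetical, and an in-memory buffer stands in for the binary data file.

```python
import io
import struct

RECORD_FMT = "<4d"                          # four float64 values per row (assumed layout)
RECORD_SIZE = struct.calcsize(RECORD_FMT)   # 32 bytes per record

def write_records(f, rows):
    """Write every row as one fixed-length binary record."""
    for row in rows:
        f.write(struct.pack(RECORD_FMT, *row))

def read_record(f, index):
    """Jump straight to row `index` (0-based) without reading earlier rows."""
    f.seek(index * RECORD_SIZE)
    return struct.unpack(RECORD_FMT, f.read(RECORD_SIZE))

data_file = io.BytesIO()   # stands in for a real file opened in binary mode
rows = [
    (1.23, 4.56, 7.89, 0.12),
    (2.34, 5.67, 8.90, 1.23),
    (3.45, 6.78, 9.01, 2.34),
]
write_records(data_file, rows)

row = read_record(data_file, 2)
print(row)
```

Because the byte offset of a record is just index times record size, the cost of a jump does not depend on how many lines are skipped. Fortran direct access (access='direct' with a recl value) offers the same kind of addressing by record number.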

Fortran doesn't end when it obtains an unexpected value?

I've got a program which computes several variables and then writes them to an output file.
Is it possible that, when my program can't get a correct result for my formula, it doesn't terminate?
To clarify what I do, here is the part of my code where the variables of interest are computed:
dx=x(1,i)-x(nk,i)
dy=y(1,i)-y(nk,i)
dz=z(1,i)-z(nk,i)
call PBC(dx,dy,dz)
r2i=dx*dx+dy*dy+dz*dz
r2=r2+r2i
r2g0=0.0d0
r2gx=0.0d0
dx=x(1,i)-x(2,i)
call PBC(dx,dy,dz)
rspani=dsqrt(dx*dx)
do ii=1,nk-1
   rx=x(ii,i)
   ry=y(ii,i)
   rz=z(ii,i)
   do jj=ii+1,nk
      dx=x(jj,i)-rx
      dy=y(jj,i)-ry
      dz=z(jj,i)-rz
      call PBC(dx,dy,dz)
      r21=dx*dx+dy*dy+dz*dz
      r21x=dx*dx
      r2g=r2g+r21
      r2gx=r2gx+r21x
      r2g0=r2g0+r21
      rh=rh+1.0d0/dsqrt(r21)
      rh1=rh1+1.0d0
      ir21=dnint(dsqrt(r21)/dr)
      p(ir21)=p(ir21)+2.0D0
      dxs=dsqrt(r21x)
      if(dxs.gt.rspani) rspani=dxs
   end do
and then I just write these variables to the output:
write(12,870)r2i,sqrt(r2i),r2g0,r2gx/(nk*nk)
870 FORMAT(3(f15.7,3x),f15.7)
The x, y, z values are actually generated by a random number generator.
The problem is that my output contains correct values for, let's say, 457 lines, and then one line is just "*********" when I view it in the mc viewer. After that the output continues with correct values, but about 12 steps of the do loop that computes these variables are missing.
So my questions are basic:
Is it possible that my program can't get correct numbers, and that's why the result is not written to the output?
or
Could this be caused by wrong output formatting, or something related to formatting?
Thank you for any suggestions.
********* is almost certainly the result of trying to write a number too large for the field specified in a format string.
For example, a field specified as f15.7 will take 1 spot for the decimal point, 1 spot for a leading sign (- will always be printed if required, + may be printed if options are set), 7 for the fractional digits, leaving 6 digits for the whole part of the number. There may therefore be cases where the program won't fit the number into the field and will print 15 *s instead.
Programs compiled with an up-to-date Fortran compiler will write a string such as NaN or -Inf if they encounter a floating-point number which represents one of the IEEE special values.
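The field-width arithmetic above can be checked with a rough Python emulation of an F15.7 output field. This is a simplified sketch, not a full implementation of Fortran's F edit descriptor: the helper name f_edit is made up, an optional leading + is never printed, and IEEE special values are not handled.

```python
def f_edit(value, width=15, decimals=7):
    """Rough emulation of Fortran Fw.d output (simplified, see lead-in)."""
    text = "%.*f" % (decimals, value)
    if len(text) > width:
        # The number does not fit: the field is filled with asterisks.
        return "*" * width
    return text.rjust(width)

print(repr(f_edit(123.456)))      # fits easily
print(repr(f_edit(1234567.0)))    # exactly fills the field
print(repr(f_edit(-1234567.0)))   # the sign pushes it over the limit
```

With 7 fractional digits and the decimal point taken out of 15 characters, a positive number has room for 7 whole digits and a negative number for 6; anything larger produces a row of asterisks.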

Fortran runtime error: Bad integer for item 0 in list input?

How do I fix the Fortran runtime error: Bad integer for item 0 in list input?
Below is the Fortran program which generates a runtime error.
CHARACTER CNFILE*(*)
REAL BOX
INTEGER CNUNIT
PARAMETER ( CNUNIT = 10 )
INTEGER NN
OPEN ( UNIT = CNUNIT, FILE = CNFILE, STATUS = 'OLD' )
READ ( CNUNIT,* ) NN, BOX
The error message received from gdb is :
At line 688 of file MCNPT.f (unit = 10, file = 'LATTICE-256.txt')
Fortran runtime error: Bad integer for item 0 in list input
[Inferior 1 (process 3052) exited with code 02]
(gdb)
I am not sure what options must be specified for READ() to read two numbers from the text file. Does it matter whether the two numbers on the same line are written as an integer or as a real in the text file?
Below is the gdb execution of the program using a break point at the open call
Breakpoint 1, readcn (
cnfile=<error reading variable: Cannot access memory at address 0x7fffffffdff0>,
box=-3.37898272e+33, _cnfile=30) at MCNPT.f:686
Since you did not specify form="unformatted" on the open statement, the unit/file is opened for formatted IO. This is appropriate for a human-readable text file. ("unformatted" would be used for a non-human-readable file in computer-native format, sometimes called "binary".) Therefore you should provide a format on the read, or use a list-directed read, i.e., read(unit, *). To advise on a particular format, we would have to know the layout of the numbers in the file. A possible read with a format is: read (CNUNIT, '(I4, 2X, F6.2)' ) NN, BOX
P.S. I'm answering the question in your question and not the title, which seems unrelated.
EDIT: now that you have shown the text data file, a list-directed read looks easier, because the data doesn't line up in columns. It seems that the file has two integers on the first line, then three real numbers on each of the following lines. Most likely you need a different read for the first line. Is the code sample that you are showing us trying to read the first line, or one of the later lines? If the first line, it would seem plausible to read into two integer variables. If a later line, into two or three real variables (two if you wish to skip the third data item on the line).
EDIT 2: the question has been substantially altered several times, which is very confusing. The first line of the text file that was shown in one version of the question contained integers, with later lines having reals. Since the list-directed read is reading into an integer and a floating-point variable, it will have problems if you attempt to use it on the later lines that have two real values.
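The type mismatch described in EDIT 2 can be imitated in Python. This is only a sketch: the function name read_int_and_real is hypothetical, the sample lines are made up because the actual data file is not shown here, and real list-directed input has more rules (repeat counts, slashes, null values) than this whitespace split.

```python
def read_int_and_real(line):
    """Imitate READ(unit,*) NN, BOX for one line: an integer, then a real."""
    tokens = line.replace(",", " ").split()
    nn = int(tokens[0])      # a token such as "0.5" is rejected here
    box = float(tokens[1])   # "10" is accepted for a real item
    return nn, box

print(read_int_and_real("256 10"))

try:
    read_int_and_real("0.5 0.5 0.5")
except ValueError as exc:
    print("bad integer:", exc)
```

An integer in the text is fine for a real variable, but a value with a decimal point cannot be read into an integer variable, which is the situation the runtime error reports.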

'Bad repeat count' while inputting a file, FORTRAN

I am trying to read a file into my code.
There are two subroutines, one which writes a file and the other which reads it.
The writing part was:
write(*,*)'entered refile, shall make file'
ileunitA=int(presentstep)
write(fname,1012)ileunitA
1012 format('DATA_',i6.6,'.dat')
write(fnam,1112)index
1112 format('pp',i3.3)
open(UNIT=ileunitA,FILE=fname)
!variables from module global
write(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
write(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
write(ileunita,*) scon,smomu,smomv,smomw
...
The reading part was as follows (in another subroutine):
ileunita=25;
open(unit=ILEUNITA,file='DATA_010500.dat')
!variables from module global
read(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
read(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
read(ileunita,*) scon,smomu,smomv,smomw
...
When I run the code, it shows the following error:
At line 3682 of file bub2.f90 (unit = 25, file = 'DATA_000001.dat')
Fortran runtime error: Bad repeat count in item 1 of list input
Can anyone help me figure out what the problem could be? What is a 'repeat count', and what makes a repeat count 'bad'? Thanks
Guessing a little (you could show the text in the problematic line in your question...), but you are using list directed input (and output) with the * as the second specifier in the read (and write) statements. List directed input allows multiple fields that have the same value to be represented using the syntax r*c, where r is a numeric repeat count and c is the value to be repeated.
If any of your output items generate a field that contains a * then that could be confusing the processing of input.
(It is permissible (though rare) for a processor to represent multiple output fields that have the same value using a repeat count, for example WRITE (unit,*) 23, 23, 23, 23 could result in an input file that contains the text 4*23.)
List directed input also has some other features, such as the handling of delimiter characters, the / character causing input processing to terminate, and the possibility and handling of null values. Some of these features may surprise those not familiar with the rules (which are inspired by typical shortcuts taken when input was submitted via punched cards), which is why it is often better to avoid list directed input and output and use an explicit format instead.
If any of your data fields are of type character you should consider using a non-default DELIM mode to avoid any special characters within the character variable value from confusing the input processing.
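The r*c syntax can be illustrated with a small Python sketch that expands repeat counts in one line of list-directed input. It is deliberately simplified and the function name expand_list_directed is made up: quoted character values, null values and the / terminator are ignored.

```python
def expand_list_directed(line):
    """Expand r*c repeat counts in one line of list-directed input (simplified)."""
    fields = []
    for token in line.replace(",", " ").split():
        if "*" in token:
            count, _, value = token.partition("*")
            # The part before the first * must be a positive integer.
            if not count.isdigit() or int(count) == 0:
                raise ValueError("bad repeat count in %r" % token)
            fields.extend([value] * int(count))
        else:
            fields.append(token)
    return fields

print(expand_list_directed("4*23 1.5"))

try:
    expand_list_directed("*********")
except ValueError as exc:
    print(exc)
```

A field that contains a * without a valid integer in front of it, such as a run of asterisks, cannot be interpreted as r*c, which is the kind of situation a "bad repeat count" message points to.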

Fortran 90: How to correctly read an integer among other real

I have created a Fortran 90 code to filter the text output of another program and convert it into CSV form. The file contains a table with columns of various types (character, real, integer). There is a column that generally contains decimal values (probability values). But in some rows, where the value should be decimal "1.000", the value is actually integer "1".
I use "F5.3" specifier to read this column and I have the same format statement for every row of the table. So, when the code finds "1", it reads ".001", because it does not find a decimal point.
What ways could I use to correctly (and generally) read integers among other decimals?
Could I specify "unformatted" input only for a number of "spaces"?
On input, the data edit descriptor fw.d for floating-point values is normally used with d set to zero (d cannot be omitted). A nonzero d is used in the rare case when the floating-point data is stored as scaled integers, or when you want some unit conversion from the integer values.
You could try using list-directed input: use a * instead of a format specifier. This would be for the entire read, not selected items. Or you could read the lines into a string and test their contents to decide how to read them. If the sub-string has a decimal point: read (string(M:N), '(F5.3)') value. If it doesn't, use a different format, e.g., perhaps read as F5.0.
P.S. "unformatted" is reading binary data without conversion ... it is a direct copy of the data from the file to the data item. "list-directed" is the Fortran term for reading & converting data without using a format specification.
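The effect of a nonzero d on input, and the string-testing workaround, can be imitated in Python. This is a sketch under simplifying assumptions: the helper names read_f5_3 and read_probability are hypothetical, and blank fields and exponent notation are not handled.

```python
def read_f5_3(field):
    """Imitate reading a field with F5.3: if there is no decimal point,
    the last three digits are taken as the fractional part."""
    text = field.strip()
    if "." in text:
        return float(text)       # an explicit decimal point overrides d
    return int(text) / 1000.0    # implied decimal point, as with d=3

def read_probability(field):
    """Workaround: a field without a decimal point is read as a whole
    number, which corresponds to using F5.0."""
    return float(field.strip())

print(read_f5_3("1.000"), read_f5_3("    1"), read_probability("    1"))
```

The second value shows the problem from the question (the field "1" read as .001), and the third shows the result once the field is read with d set to zero.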
Well, here's something new to me: f90 allows a mix of comma and space delimiters for a simple list directed read:
read(unit,*)v1,v2,v3,v4
with input
1.222 2 , 3.14 , 4
yields
1.222000 2.000000 3.140000 4.000000