Formatted vs unformatted file writing - file-io

I have often observed that unformatted file writing (as shown below) is way faster than formatted file writing (also as shown below) in Fortran 90.
Unformatted file writing
OPEN ( Unit=86, File='out.dat', Form='unformatted', Action='Write')
WRITE (86) A, B, C
CLOSE (86)
Formatted file writing
OPEN ( Unit=86, File='out.dat', Form='formatted', Action='Write')
DO ii = 1,N
DO jj = 1,N
WRITE (*,86) A(ii,jj), B(ii,jj), C(ii,jj)
END DO
END DO
CLOSE (86)
Where A,B,C are two-dimensional arrays of size(N,N). I found that in this case the difference in CPU_Time was around 12s.
Why is there a huge difference? Is it just the time taken by looping in formatted procedure?

Notice that in one case you write the whole A first, then whole B then whole C. In the second you are writing A11, B11, C11, A21, B21, C21, ... That is a difference, but the impact will be small.
It takes a lot of CPU time to convert the numbers from the binary memory representation to human readable numbers. That is mostly what makes it slower in formatted case. Also the file is larger so writing it takes more time.

Related

Fortran: How to skip many lines of data file efficiently

I have a formatted data file which is typically billions of lines long, with several lines of headers of variable length. The data file takes the form:
# header 1
# header 2
# headers are of variable length.
# data begins from next line.
1.23 4.56 7.89 0.12
2.34 5.67 8.90 1.23
:
:
# billions of lines of data, each row the same length, same format.
-- end of file --
I would like to extract a portion of data from this file, and my current code looks like:
<pre>
do j=1,jmax !Suppose I want to extract jmax lines of data from the file.
[algorithm to determine number of lines to skip, "N(j)"]
!This determines the number of lines to skip from the previous file
!position, when the data was read on j-1th iteration.
!Skip N-1 lines to go to the next data line to read off:
do i=1,N-1
read(unit=unit,fmt='(A)')
end do
!Now read off the line of data I want:
read(unit=unit,fmt='(data_format)'),data1,data2,etc.
!Data is stored in some arrays.
end do
</pre>
The problem is, N(j) can be anywhere between 1 and several billion, so it takes some time to run the code.
My question is, is there a more efficient way of skipping millions of lines of data? The only way I can think of, while sticking to Fortran, is to open the file with direct access and jump to the desired line upon opening the file.
As you suggest, direct access seems like the best option. But that requires the records to all have the same length, which your headers violate. Also, why used formatted output? With a file of this length, its hard to imagine a person reading the file. If you use unformatted IO, the file will be both smaller and IO will be faster. Perhaps create two files, one with the headers (metadata) in human reader form, and the other with the data in native form. Native / binary representation means a loss of portability, which is something to consider if you want to move the files to different computer architectures or have them be useable for decades. Otherwise it's probably worth the convenience. Other options would be to use a more sophisticated file format that combines metadata and data, such as HDF5 or FITS, but for communication between two programs of one person, that's probably excessive.

idl strange symbols in file

I've written several IDL programs to analyse some data. To keep it simple the programs read in some time varying data and calculate the fourier spectrum. This spectrum is written to file using this code:
openw,3,filename
printf,3,[transpose(freq),transpose(power)],format='(e,e)'
close,3
The file is then read by another program using this code:
rdfloat,filename,freq,power,/double
The rdfloat procedure can be found here: http://idlastro.gsfc.nasa.gov/
The error i get when trying to read the a file is: "Input conversion error. Unit: 101"
When i delve in to the file being read, i notice several types of unrecognised characters. I dont know if these are a result of the writing to the file or some thing else related to the number of files being created (over 300 files)
These symbols/characters are in the place of a single number:
< dle> < dc1> < dc2> < dc3> < dc4> < can> < nak> < em> < soh> < syn>
Example of what appears in the file being read, Note they are not consecutive lines.
7.7346< dle>18165493007e+01 8.4796811549010105e+00
7.7354408697119453e+01 1.04459538071< dc2>1749e+01
7.7360701595839< can>28e+01 3.0447318983094189e+00
Whenever I run the procedures that write the files, there is always at least one file that has some or all of these characters. The file/s that contains these characters is always different.
Can anyone explain what these symbols are and what I might be doing to create them as well as how to ensure they are not written to file?
I see two things that may be causing a problem. But first, I want to suggest a few tips.
When you open a file, it is useful to use the /GET_LUN keyword because it allows IDL to find and use a logical unit number (LUN) that is available (e.g., in case you left LUN 3 open somewhere else). When you print formatted data, you should specify the total width and number of decimal places. It will make things easier because then you need not worry about changing spacings between numbers in a file.
So I would change your first set of code to the following (or some variant of the following):
OPENW,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
PRINTF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine
So the first potential issue I see is that if your variable power was calculated using something like FFT.pro, then it will be a complex float or complex double, depending on the input and keywords used.
The second potential issue may be due to an incorrect format statement. You did not tell PRINTF how many columns or rows to expect. It might not know how to handle the input properly, so it guesses and may result in those characters you show. Those characters may be spacing characters due to the vague format statement or the software you are using to look at the files (e.g., I would not recommend using Word to open text files, use a text editor).
Side Note: You can open and read the file you just wrote in a similar fashion to what I showed above, but changed to the following:
n = FILE_LINES(filename[0])
freq = DBLARR(n)
power = DBLARR(n)
OPENR,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
READF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine

Can Fortran read bytes directly from a binary file?

I have a binary file that I would like to read with Fortran. The problem is that it was not written by Fortran, so it doesn't have the record length indicators. So the usual unformatted Fortran read won't work.
I had a thought that I could be sneaky and read the file as a formatted file, byte-by-byte (or 4 bytes by 4 bytes, really) into a character array and then convert the contents of the characters into integers and floats via the transfer function or the dreaded equivalence statement. But this doesn't work: I try to read 4 bytes at a time and, according to the POS output from the inquire statement, the read skips over like 6000 bytes or so, and the character array gets loaded with junk.
So that's a no go. Is there some detail in this approach I am forgetting? Or is there just a fundamentally different and better way to do this in Fortran? (BTW, I also tried reading into an integer*1 array and a byte array. Even though these codes would compile, when it came to the read statement, the code crashed.)
Yes.
Fortran 2003 introduced stream access into the language. Prior to this most processors supported something equivalent as an extension, perhaps called "binary" or similar.
Unformatted stream access imposes no record structure on the file. As an example, to read data from the file that corresponds to a single int in the companion C processor (if any) for a particular Fortran processor:
USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_INT
INTEGER, PARAMETER :: unit = 10
CHARACTER(*), PARAMETER :: filename = 'name of your file'
INTEGER(C_INT) :: data
!***
OPEN(unit, filename, ACCESS='STREAM', FORM='UNFORMATTED')
READ (unit) data
CLOSE(unit)
PRINT "('data was ',I0)", data
You may still have issues with endianess and data type size, but those aspects are language independent.
If you are writing to a language standard prior to Fortran 2003 then unformatted direct access reading into a suitable integer variable may work - it is Fortran processor specific but works for many of the current processors.

maximum 'string' length in fortran

does fortran have a maximum 'string' length?
i am going to be reading lines from a file which could have very long lines. the one i am looking at now has around 1.3k characters per line, but it is possible that they may have much more. i am reading each line from the file to a character*5000 variable, but if i get more in the future, is it bad to make it a character*5000000 variable? is there a max? is there a better way to solve this problem than making a very large character variables?
Since the usual Fortran IO is record based, reading lines into strings implies knowing the maximum string length. Another possible design: use stream IO and Fortran will ignore the record boundaries. Read the file in fixed-length chunks that are shorter than the longest lines. The complication is handling items split across chunk boundaries. The practicality depends on details not given in the question.
P.S. From "The Fortran 2003 Handbook" by Adams et al.: "The maximum length permitted for character strings is processor-dependent." -- meaning compiler dependent.
Maximum wil be implementation dependant. For your case, I can think of something along these lines:
character(:),allocatable :: ch
l = 5
do
allocate(character(l) :: ch)
read(unit,'(a)',iostat=io) ch
if (ch(l-4:l) = ' ' .or. io/=0) exit
deallocate(ch)
l = l * 2
end do
Obviously will not work for pad='no' and if you expect long regions with spacec in your records.

Fortran: How do I read the first character from each line of a text file?

this is my first time trying to program in Fortran. I'm trying to write a program that prints the first 1476 terms of the Fibonacci sequence, then examines the first digit of each term and stores the number of 1s, 2s, 3s, ..., 9s that occur in an array.
The problem that I can't seem to figure out is how to read the first digit of each term. I've tried several things but am having difficulty with my limited knowledge of Fortran techniques. I write the terms to a text file and the idea is to read the first digit of each line and accumulate the respective number in the array. Does anyone have any suggestions of how to do this?
Here is my code so far:
(edit: I included the code I have for reading the file. Right now it just prints out 3.60772951994415996E-313,
which seems like an address of some sort, because it's not one of the Fibonacci numbers. Also, it is the only thing printed, I expected that it would print out every line of the file...)
(edit edit: After considering this, perhaps there's a way to format the writing to the text file to just the first digit. Is there a way to set the number of significant digits of a real number to one? :P)
subroutine writeFib(n)
integer :: i
real*8 :: prev, current, newFib
prev = 0
current = 1
do i = 1, n
newFib = prev + current
prev = current
current = newFib
write(7,*) newFib
end do
return
end subroutine
subroutine recordFirstDigits(a)
integer :: openStat, inputStat
real*8 :: fibNum
open(7, file = "fort.7", iostat = openStat)
if (openStat > 0) stop "*** Cannot open the file ***"
do
read(7, *, iostat = inputStat) fibNum
print *,fibNum
if (inputStat > 0) stop "*** input error ***"
if (inputStat < 0) exit ! end of file
end do
close(7)
end subroutine
program test
integer :: k, a(9)
k = 1476
call writeFib(k)
call recordFirstDigits(a)
end program
Although the suggestions were in place, there were also several things that were forgotten. Range of the REAL kind, and some formatting problems.
Anyways, here's one patched up solution, compiled and working, so try to see if this will work for you. I've took the liberty of choosing my own method for fibonacci numbers calculation.
program SO1658805
implicit none
integer, parameter :: iwp = selected_real_kind(15,310)
real(iwp) :: fi, fib
integer :: i
character(60) :: line
character(1) :: digit
integer :: n0=0, n1=0, n2=0, n3=0, n4=0, n5=0, n6=0, n7=0, n8=0, n9=0
open(unit=1, file='temp.txt', status='replace')
rewind(1)
!-------- calculating fibonacci numbers -------
fi = (1+5**0.5)/2.
do i=0,1477
fib = (fi**i - (1-fi)**i)/5**0.5
write(1,*)fib,i
end do
!----------------------------------------------
rewind(1)
do i=0,1477
read(1,'(a)')line
line = adjustl(line)
write(*,'(a)')line
read(line,'(a1)')digit
if(digit.eq.' ') n0=n0+1
if(digit.eq.'1') n1=n1+1
if(digit.eq.'2') n2=n2+1
if(digit.eq.'3') n3=n3+1
if(digit.eq.'4') n4=n4+1
if(digit.eq.'5') n5=n5+1
if(digit.eq.'6') n6=n6+1
if(digit.eq.'7') n7=n7+1
if(digit.eq.'8') n8=n8+1
if(digit.eq.'9') n9=n9+1
end do
close(1)
write(*,'("Total number of different digits")')
write(*,'("Number of digits 0: ",i5)')n0
write(*,'("Number of digits 1: ",i5)')n1
write(*,'("Number of digits 2: ",i5)')n2
write(*,'("Number of digits 3: ",i5)')n3
write(*,'("Number of digits 4: ",i5)')n4
write(*,'("Number of digits 5: ",i5)')n5
write(*,'("Number of digits 6: ",i5)')n6
write(*,'("Number of digits 7: ",i5)')n7
write(*,'("Number of digits 8: ",i5)')n8
write(*,'("Number of digits 9: ",i5)')n9
read(*,*)
end program SO1658805
Aw, ... I just read you need the number of digits stored in to an array. While I just counted them.
Oh well, ... "left as an exercise for the reader ..." :-)
Can you read with a FORMAT(A1)? It's been 20 years so I don't remember the exact syntax.
I wonder why the open statement succeeds when file 7 hasn't been closed. I think you need an endfile statement and/or a rewind statement in between writing and reading.
Paul Tomblin posted what you have to do after you solve your problem in getting reads to work in the first place.
I am getting an 'end of line' runtime error
You don't show the ! code to read here... which makes it kind of difficult to guess what you are doing wrong :-)
Perhaps you need a loop to read each line and then jump out of the loop to a continue statement when there are no more lines.
Something like this:
do
read(7,*,end=10) fibNumber
end do
10 continue
Better still - look at the more modern style used in this revcomp program.
here are some hints:
You don't need to use characters,
much less file i/o for this problem
(unless you forgot to state that a
file must be created).
Therefore, use math to find your statistics. There are lots of resources on Fibonacci numbers that might provide a simplifying insight or at least a way to independently spot check your answers.
Here is a complicated hint in non-Fortran lingo:
floor(10^(frac(log_10(7214989861293412))))
(Put this in Wolfram Alpha to see what it does.)
A simpler hint (for a different approach) is that you can do very
well in Fortran with simple
arithmetic inside of looping
constructs--at least for a first pass at the solution.
Accumulate your statistics as you
go. This advice would even apply to your character-driven approach. (This problem is ideally suited
for coming up with a cute indexing
scheme for your statistics, but some
people hate cute schemes in
programming. If you don't fear
cuteness ... then you can have associative
arrays in Fortran as long as your
keys are integers ;-)
The most important aspect of this
problem is the data type you will
use to calculate your answers. For
example, here's the last number you
will have to print.
Cheers, --Jared