I am trying to run a really large FORTRAN model, so I can't give all the code that's involved but I hope I can give enough information for this to make sense.
The code compiled fine (using Intel 2016.0.109 compiler, OpenMPI 1.10.2 and HDF5 1.8.17).
When I try to run it though, it tells me that two of my inputs (called NZG and NZS) are set to -999, hence it fails.
> >>>> opspec_grid error! in your namelist!
> ---> Reason: Too few soil layers. Set it to at least 2. Your nzg is currently set to -999...
> ---> Reason: Too few maximum # of snow layers. Set it to at least 1. Your nzs is currently set to -999.
However, in the input file, they are really specified as
NL%NZG = 9
NL%NZS = 1
I went through all the modules that somehow have something to do with these variables and cannot find anywhere where this should go off the rails.
So I am starting to think now that maybe there is something wrong with the way the values are read in. The input file is a text file. The variables are specified as integers in the module that reads them in, FYI.
Should I check for special characters or something? I know Fortran can be picky with inputs...
EDIT: Below the code sequence tracing back the error
1) The main module opens the namelist
write (unit=*,fmt='(a)') 'Reading namelist information'
call read_nl(trim(name_name))
read_nl is
subroutine read_nl(namelist_name)
use ename_coms, only : nl & ! intent(inout)
, init_ename_vars ! ! subroutine
implicit none
!----- Arguments. ----------------------------------------------------------------------!
character(len=*), intent(in) :: namelist_name
!----- Local variables. ----------------------------------------------------------------!
logical :: fexists
!----- Name lists. ---------------------------------------------------------------------!
namelist /ED_NL/ nl
!---------------------------------------------------------------------------------------!
!----- Open the namelist file. ---------------------------------------------------------!
inquire (file=trim(namelist_name),exist=fexists)
if (.not. fexists) then
call fatal_error('The namelist file '//trim(namelist_name)//' is missing.' &
,'read_nl','ed_load_namelist.f90')
end if
!----- Initialise the name list with absurd, undefined values. -------------------------!
call init_ename_vars(nl)
!----- Read grid point and options information from the namelist. ----------------------!
open (unit=10, status='OLD', file=namelist_name)
read (unit=10, nml=ED_NL)
close(unit=10)
return
end subroutine read_nl
2) Then there are specifics about how (what variables) to read from the namelist (the input)
write (unit=*,fmt='(a)') 'Copying namelist'
call copy_nl('ALL_CASES')
if (runtype == 'HISTORY') then
call copy_nl('HISTORY')
else
call copy_nl('NOT_HISTORY')
end if
My simulation is 'NOT_HISTORY': copy_nl('ALL_CASES') is specified so it reads a number of the namelist variables, including the variable 'runtype' - which is then used in that if-else statement.
3) Then copy_nl('NOT_HISTORY') is the following, this is where nzg and nzs are defined.
subroutine copy_nl(copy_type)
use grid_coms , only : time & ! intent(out)
, centlon & ! intent(out)
, centlat & ! intent(out)
, deltax & ! intent(out)
, deltay & ! intent(out)
, nnxp & ! intent(out)
, nnyp & ! intent(out)
, nstratx & ! intent(out)
, nstraty & ! intent(out)
, polelat & ! intent(out)
, polelon & ! intent(out)
, ngrids & ! intent(out)
, timmax & ! intent(out)
, time & ! intent(out)
, nzg & ! intent(out)
, nzs ! ! intent(out)
implicit none
!----- Arguments. ----------------------------------------------------------------------!
character(len=*), intent(in) :: copy_type
!----- Internal variables. -------------------------------------------------------------!
integer :: ifm
!---------------------------------------------------------------------------------------!
!---------------------------------------------------------------------------------------!
! Here we decide which variables we should copy based on the input variable. !
!---------------------------------------------------------------------------------------!
select case (trim(copy_type))
case ('NOT_HISTORY')
itimea = nl%itimea
idatea = nl%idatea
imontha = nl%imontha
iyeara = nl%iyeara
nzg = nl%nzg
nzs = nl%nzs
slz(1:nzgmax) = nl%slz(1:nzgmax)
current_time%year = iyeara
current_time%month = imontha
current_time%date = idatea
current_time%time = real(int(real(itimea) * 0.01)) * hr_sec &
+ (real(itimea) * 0.01 - real(int(real(itimea)*0.01))) &
* 100.0 * min_sec
time = 0.0d0
NOTE that I have not put all the modules and variables in use, otherwise this post would become insanely long.
FYI the module grid_coms specifies the type the variable is. See below the relevant part (the whole module is a list of variables, nothing else)
module grid_coms
integer :: nzg ! Number of soil levels
integer :: nzs ! Number of snow/surface water levels
end module grid_coms
4) Back to the main module, this then calls ed_opspec_grid to check that the variables make sense, and this is where things go wrong. Again, with use the variables are initialized at the start, I'm leaving that out here.
subroutine ed_opspec_grid
!---------------------------------------------------------------------------------------!
! Check whether soil layers are reasonable, i.e, enough layers, sorted from the !
! deepest to the shallowest. !
!---------------------------------------------------------------------------------------!
if (nzg < 2) then
write (reason,'(a,1x,i4,a)') &
'Too few soil layers. Set it to at least 2. Your nzg is currently set to' &
,nzg,'...'
call opspec_fatal(reason,'opspec_grid')
ifaterr=ifaterr+1
elseif (nzg > nzgmax) then
write (reason,'(2(a,1x,i5,a))') &
'The number of soil layers cannot be greater than ',nzgmax,'.' &
,' Your nzg is currently set to',nzg,'.'
call opspec_fatal(reason,'opspec_grid')
ifaterr=ifaterr+1
end if
do k=1,nzg
if (slz(k) > -.001) then
write (reason,'(a,1x,i4,1x,a,1x,es14.7,a)') &
'Your soil level #',k,'is not enough below ground. It is currently set to' &
,slz(k),', make it deeper than -0.001...'
call opspec_fatal(reason,'opspec_grid')
ifaterr=ifaterr+1
end if
end do
do k=1,nzg-1
if (slz(k)-slz(k+1) > .001) then
write (reason,'(2(a,1x,i4,1x),a,2x,a,1x,es14.7,1x,a,1x,es14.7,a)') &
'Soil layers #',k,'and',k+1,'are not enough apart (i.e. > 0.001).' &
,'They are currently set as ',slz(k),'and',slz(k+1),'...'
call opspec_fatal(reason,'opspec_grid')
ifaterr=ifaterr+1
end if
end do
end subroutine ed_opspec_grid
NOTE that this is not the first check in this subroutine. Other variables are checked before this part (I left those out), but this is the first error message. Which makes me think that maybe part of the input is read okay and some is not.
Finally, let me stress again that this is a very large project, I really cannot show all the code, which is why I framed the question very simply: are there any (text) input requirements for Fortran that I might have missed (special characters, returns, maybe different for different systems?).
Plus, this code is used by a lot of researchers, on different platforms, so I doubt it very much there is something wrong with the code itself... (but let me know if I'm wrong).
You are not giving us enough data to pinpoint the issue, so I'll just tell you two of the issues that I have come across using namelists:
If you read several (even different) namelists from the same file, the order counts: If in the file namelist a preceeds namelist b, but the code reads b first, then it won't find a unless you rewind the file.
If the namelist in the data file does not include one or more values, these values will simply stay the same. It is quite likely that in the code these variables are set to -999 specifically so that their absence would be noted. So double-check that the data file is correct. Look for stray / characters that might end the namelist prematurely
All in all, in order to correctly assess what's happening, we'd need at least 3 things:
The declaration block, including all the variables that form part of this namelist, and the namelist /.../ ...,...,... specification
The part of the namelist file that declares this specific namelist, and
Whether any other namelists are also in the same file and whether they're before or after this namelist (both in the file, and in the code)
Related
so I'm trying to make a Fortran subroutine that reads a matrix (or a tensor I guess) of size 125x125x125 from a file that I create in another subroutine, but for some reason it doesn't work. I have successfully done it using almost the same program but for a different size of matrix (70x200x70) and for some reason when I change the size of the arrays and a couple of other things to fit the new data, the program reads just half the data file and throws an error that says At line 133 of file Busqueda.f90 (unit = 2, file = 'test2.txt') Fortran runtime error: End of file. So my working code is:
Subroutine DataSorting(datasort,indexsort,datosden,datostemp)
use globals
implicit None
real*8, dimension(70,200,70) :: datosdentemp
real*8, dimension(70,200,70), intent(out) :: datostemp, datosden
real*8, dimension(980000), intent(out) :: datasort
integer,dimension(3,980000),intent(out) :: indexsort
integer, dimension(3) :: Result
integer :: i
rewind(2)
read(2,*) datosden
datosdentemp = datosden
rewind(3)
read(3,*) datostemp
do i = 1, 10
Result = MAXLOC(datosdentemp)
datasort(i) = datosden(Result(1),Result(2),Result(3))
indexsort(1,i) = Result(1)
indexsort(2,i) = Result(2)
indexsort(3,i) = Result(3)
datosdentemp(Result(1),Result(2),Result(3)) = 0
end do
End Subroutine DataSorting
And the code that dont work is:
Subroutine DataSorting(datasort,indexsort,datosden,datostemp)
use globals
implicit None
real*8, dimension(125,125,125) :: datosdentemp
real*8, dimension(125,125,125), intent(out) :: datostemp, datosden
real*8, dimension(1953125), intent(out) :: datasort
integer,dimension(3,1953125),intent(out) :: indexsort
integer, dimension(3) :: Result
integer :: i
rewind(2)
read(2,*) datosden
datosdentemp = datosden
rewind(3)
read(3,*) datostemp
do i = 1, 10
Result = MAXLOC(datosdentemp)
datasort(i) = datosden(Result(1),Result(2),Result(3))
indexsort(1,i) = Result(1)
indexsort(2,i) = Result(2)
indexsort(3,i) = Result(3)
datosdentemp(Result(1),Result(2),Result(3)) = 0
end do
End Subroutine DataSorting
The files from where I'm reading the data just have the data arranged in 125 columns and 15625 rows in the case of the non-working program, and in 70 columns and 14000 rows for the working one, and the files are just that, I mean they are really structured that way in both cases. I was using scratch files for the files 2 and 3, and change them to '.txt' files to see if there was a problem with the input file, but no. Then I build a new file and did the reading with a do-loop, to see if that was the problem, but no, but at least that helped me to realize that the code was reading exactly half of 15625, but i don't know why this is happening, I'm new to fortran, my thesis advisor told me to do this code in Fortran 'cause the amount of data that have to handle will make my Julia code way to slow, so honestly any help will be much appreciated.
I would like to call subroutines contained in a module. the module is saved in a separate file with my_mod.f95 filename.
module calc_mean
! this module contains two subroutines
implicit none
public :: calc_sum
public :: mean
contains
subroutine calc_sum(x,n,s)
! this subroutine calculates sum of elements of a vector
! x the vector
! n size of the vector
! s sum of elements of x
integer, intent(in):: n
real, intent(in):: x(n)
integer :: i
real, intent(out):: s
s=0
do i=1,n
s=s+x(i)
end do
end subroutine calc_sum
!
!
!
subroutine mean(x,n,xav)
! this subroutine calculates the mean of a vector
! x the vector
! n size of the vector
! xav mean of x
integer, intent(in):: n
real, intent(in):: x(n)
real, intent(out):: xav
real :: s
!
!
call calc_sum(x,n,s)
xav=s/n
end subroutine mean
end module calc_mean
I have the main program saved in a different file with 'my_program.f95'
program find_mean
! this program calculates mean of a vector
use calc_mean
implicit none
! read the vector from a file
integer, parameter ::n=200
integer :: un, ierror
character (len=25):: filename
real :: x(n), xav
un=30
filename='randn.txt'
!
OPEN (UNIT=un, FILE=filename, STATUS='OLD', ACTION='READ', IOSTAT=ierror)
read(un,*) x !
!
call mean(x,n,xav)
write (*,100) xav
100 format ('mean of x is', f15.8)
end program find_mean
when I compile the main program with geany, I got the following error message. Please, help me!
**
/usr/bin/ld: /tmp/cctnlPMO.o: in function MAIN__': my_program.f08:(.text+0x1e1): undefined reference to __calc_mean_MOD_mean'
collect2: error: ld returned 1 exit status
**
When I save both the main program and the module to the same file and run it, everything is fine.
So I am doing 2 modules which are linking to the main program. The first one has all the variables defined in it and the second one is with the functions.
Module1:
module zmienne
implicit none
integer, parameter :: ngauss = 8
integer, parameter :: out_unit=1000
integer, parameter :: out_unit1=1001
integer, parameter :: out_unit2=1002, out_unit3=1003
real(10), parameter :: error=0.000001
real(10):: total_calka, division,tot_old,blad
real(10),parameter:: intrange=7.0
real(10),dimension(ngauss),parameter::xx=(/-0.9602898565d0,&
-0.7966664774d0,-0.5255324099d0,-0.1834346425d0,&
0.1834346425d0,0.5255324099d0,0.7966664774d0,0.9602898565d0/)
real(10),Dimension(ngauss),parameter::ww=(/0.1012285363d0,&
0.2223810345d0,0.3137066459d0,0.3626837834d0,&
0.3626837834d0,0.3137066459d0,0.2223810345d0,0.1012285363d0/)
real(10) :: r, u, r6, tempred, f, r2, r1, calka,beta
real(10) :: inte
real :: start, finish
integer:: i,j,irange
real(10),dimension(ngauss)::x,w,integrand
end module zmienne
Module2
module in
implicit none
contains
real(10) function inte(y,beta,r2,r1)
real(kind=10)::r,beta,r6,r2,r1,u,y
r=(r2-r1)*y+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
if (u>100.d0) then
inte=-1.0d0
else
inte=exp(-u)-1.d0
endif
inte=r*r*inte
end function
end module in
And while im calling them like that:
use zmienne; use in
I am getting following error:
Name 'inte' at (1) is an ambiguous reference to 'inte' from module 'zmienne'
I've deleted "inte" in the module1 but now I am getting following error:
irange=inte(intrange/division)
1
Error: Missing actual argument for argument 'beta' at (1)
The main program code is:
program wykres
use zmienne; use in
implicit none
open(unit=out_unit, file='wykresik.dat', action='write', status='replace')
open(unit=out_unit1, file='wykresik1.dat', action='write')
open(unit=out_unit2, file='wykresik2.dat', action='write')
open(out_unit3, file='wykresik3.dat', action='write')
! the gaussian points (xx) and weights (ww) are for the [-1,1] interval
! for [0,1] interval we have (vector instr.)
x=0.5d0*(xx+1.0d0)
w=0.5d0*ww
! plots
tempred = 1.0
call cpu_time(start)
do i=1,1000
r=float(i)*0.01
r6=(1.0/r)**6
u=beta*r6*(r6-1.0)
f=exp(-u/tempred)-1.0
write(out_unit,*) r, u
write(out_unit1,*)r, f
write(out_unit2,*)r, r*r*f
end do
call cpu_time(finish)
print '("Time = ",f6.3," seconds.")',finish-start
! end of plots
! integration 1
calka=0.0
r1=0.0
r2=0.5
do i=1,ngauss
r=(r2-r1)*x(i)+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
! check for underflows
if (u>100.d0) then
f=-1.0d0
else
f=exp(-u)-1.d0
endif
! the array integrand is introduced in order to perform vector calculations below
integrand(i)=r*r*f
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
write(*,*)calka
! end of integration
! integration 2
calka=0.0
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
! end of integration 2
write(*,*)calka
! vector integration and analytical result
write(*,*)sum(integrand*w*(r2-r1)),-(0.5**3)/3.0
!**************************************************************
! tot_calka - the sum of integrals all integration ranges
! dividion the initial length of the integration intervals
! tot_old - we will compare the results fro two consecutive divisions.
! at the beginning we assume any big number
! blad - the difference between two consecutive integrations,
! at the beginning we assume any big number
! error - assumed precission, parameter, it is necassary for
! performing do-while loop
total_calka=0.0
division=0.5
tot_old=10000.0
blad=10000.0
do while (blad>error)
! intrange - the upper integration limit, it should be estimated
! analysing the plot of the Mayer function. Here - 7.
! irange = the number of subintegrals we have to calculate
irange=inte(intrange/division)
total_calka=-(0.5**3)/3.0
! the analytical result for the integration range [0,0.5]
! the loop over all the intervals, for each of them we calculate
! lower and upper limits, r1 and r2
do j=1,irange
r1=0.5+(j-1)*division
r2=r1+division
calka=0.0
! the integral for a given interval
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
total_calka=total_calka+calka*(r2-r1)
enddo
! aux. output: number of subintervals, old and new integrals
write(*,*) irange,division,tot_old,total_calka
division=division/2.0
blad=abs(tot_old-total_calka)
tot_old=total_calka
! and the final error
write(*,*) blad
enddo
open(1,file='calka.dat', access='append')
! the secod viarial coefficient=CONSTANT*total_calka,
! CONSTANT is omitted here
write(1,*)tempred,total_calka
close(1)
end program wykres
The inte is declared in both modules.
Upd. The inte(y,beta,r2,r1) function is defined in the module in, and is used in the main program. This function requires four arguments, but this call
irange=inte(intrange/division)
provides only one argument. I'm not sure if this function should be used in this case. Try to use long meaningful names for variables and functions to avoid similar issues.
I have written a meshing application that runs quite well except for one thing:
While writing the results to disk, the program sometimes hangs. Well effectively it is still running, I can see it produces 100% CPU load and occupying memory using top. But there is no new data written to the file and this state would probably last indefinitely.
The job can not even be killed, the only option I had so far was rebooting the machine to get rid of the job.
However, most of the time the exact same job executes until the end without any problems. It should be noted that this only happens for large jobs where the final file size is somewhere above the order of 50GB. I haven't hat this behavior with smaller jobs so far. The result file is written up to a different point each time. Sometimes its 45GB, sometimes its 60GB or something completely different.
The workstation I use operates OpenSuse 13.1, has enough RAM and Disk space available. Yet the same behavior could be observed on different machines.
What I tried so far (apart from the usual debugging) was using Gfortran and Intel Fortran compilers to no avail. I messed around with the -mcmodel=large compiler option, but as I understood this is not related to the size of files written by the program and did not help anything.
I don't know what information to provide since I am really running out of ideas what could be causing the problem. I append the routine that writes an ASCII result file here because it produces the largest file size, but I have the same issues when writing unformatted files with a similar routine.
ANY hint on what could possibly cause this behavior is highly appreciated.
SUBROUTINE write_legacymesh
use types
use parameters
use data
implicit none
INTEGER :: l, n, i, j, k, number, c, neighbor
REAL(DP), DIMENSION(3) :: pos_d
CHARACTER(LEN=100) :: dummy_char
INTEGER(I4B), DIMENSION(26) :: neighborbuffer
LOGICAL :: file_exists
call write_header
call system ('sync')
! create the mesh file
dummy_char = folder_mesh // '/' // file_mesh
write(*,'(A,A)') ' writing mesh: ', trim(adjustl(dummy_char))
open(unit=20, file=trim(adjustl(dummy_char)), status='replace', action='write')
! write the mesh file
write(20,'(A)') ' LB database'
l = level_max
! write the header for this mesh size
write(20,'(A)') ' number of collision centers'
write(20,'(I16)') nf(l)
write(20,'(A)') ' number of cut links'
write(20,'(I16)') ncut(l)
write(20,'(A)') ' number of boundary collision centers'
write(20,'(I16)') nb(l)
write(20,'(A)') ' dummy value'
write(20,'(I16)') 0
write(20,'(A)') ' number of inflow boundary collision centers'
write(20,'(I16)') nb1(l)
write(20,'(A)') ' number of pressure boundary collision centers'
write(20,'(I16)') nb2(l)
write(20,'(A)') ' number of slip boundary collision centers'
write(20,'(I16)') nb3(l)
write(20,'(A)') ' number of noslip boundary collision centers'
write(20,'(I16)') nb4(l)
write(20,'(A)') ' lattice spacing'
write(20,'(E16.8)') dx(l)*scaling
! write the coordinates
write(20,'(A)') ' cc RBT value cc coordinates'
do n=1, nf(l)
number = order_level(l)%order(n)
i = levels(l)%blocks(number)%pos(1)
j = levels(l)%blocks(number)%pos(2)
k = levels(l)%blocks(number)%pos(3)
! rescale and move the geometry to get the original size and position
pos_d(1) = (real(i,kind=dp) - 0.5_dp) * dx(l) + corner_ibc(1)
pos_d(2) = (real(j,kind=dp) - 0.5_dp) * dx(l) + corner_ibc(2)
pos_d(3) = (real(k,kind=dp) - 0.5_dp) * dx(l) + corner_ibc(3)
write(20,'(2I16,3E20.7)') n, levels(l)%blocks(number)%state, pos_d*scaling
end do
! write the link info
write(20,'(A)') ' link information'
do n=1, nf(l)
number = order_level(l)%order(n)
do c=1, nlinks
neighbor = levels(l)%blocks(number)%neighbors(c)
if(neighbor .GT. 0) then
neighborbuffer(c) = levels(l)%blocks(neighbor)%number
else
neighborbuffer(c) = neighbor
end if
end do
write(20,'(7I16)') n, neighborbuffer(1:6)
write(20,'(A,6I16)') ' ', neighborbuffer(7:12)
write(20,'(A,6I16)') ' ', neighborbuffer(13:18)
end do
! write the q-values
write(20,'(A)') ' q-values'
do n=1, ncut(l)
write(20,'(I16,E20.7)') n, max(qmin, min(qmax, aux_level(l)%q_values(n)))
end do
write(20,'(A)') ' end of data'
close(unit=20)
END SUBROUTINE write_legacymesh
Edit1: Following the suggestions of flushing write buffers, I added this to every loop that writes data:
if(mod(n,10000000) .EQ. 0) then
call flush(20)
close(unit=20)
call system('sync')
open(unit=20, file=trim(adjustl(dummy_char)), status='old', position='append', action='write')
end if
This did not change anything, the routine still hangs at a random point, this time after writing 18.5GB of data or 211'149'016 write events in the first loop.
Edit2: Deactivating buffered I/O with the Intel Fortran compiler (not using the -assume buffered_io flag) kind of "solved" the issue. But of course this leaves me with slow unbuffered writes for large files, a particularly unpleasant combination.
Edit3: Thanks for the suggestion to add manual write buffers. The file format is not well suited to make use of them efficiently, but its still better than nothing. Anyway, this is a legacy file format and I am positive that I can convince my co-workers that a different file format or even unformatted files are the better option.
While I still have no idea what causes the error, at least I have a tolerable way to avoid it now. Thanks for all the valuable suggestions.
Is there any means to accessing (reading, writing to) files that are opened in some other source code by just passing the unit number?
Yes, that is possible (for both reading and writing). Here is a short example:
module test_mod
contains
subroutine myWrite( uFile )
implicit none
integer, intent(in) :: uFile
write(uFile, *) 'Hello world'
end subroutine
end module
program test
use test_mod
implicit none
integer :: uFile, stat
open(newunit=uFile, file='test.txt', status='replace', &
action='write', iostat=stat)
if(stat.ne.0) return
call myWrite( uFile )
close (uFile)
end program
$ cat test.txt
Hello world
External file units are globally accessible. You need not even pass the unit number, though that is a better practice than using hardcoded units. This behavior defined in Fortran 2008 Cl. 9.5.1,
3 The external unit identified by a particular value of a scalar-int-expr is the same external unit in all program
units of the program.
where they provide this sample code in note 9.14:
In the example:
SUBROUTINE A
READ (6) X
...
SUBROUTINE B
N = 6
REWIND N
the value 6 used in both program units identifies the same external unit.