Optimizing a Fortran subroutine - optimization

I've written a minimal implementation for the fast xoroshiro128plus pseudo-random number generator in Fortran to replace the intrinsic random_number. This implementation is quite fast (4X faster than random_number) and the quality is good enough for my purposes, I don't use it in cryptography applications.
My question is how can I optimize this subroutine to get the last drop of performance from my compiler, even 10% improvement is appreciated. This subroutine is to be used in tight loops inside long simulations. I'm interested more in generating a single random number at a time and not big vectors or nD arrays at once.
Here is a test program to give you some context about how my subroutine is used:
program test_xoroshiro128plus
implicit none
integer, parameter :: n = 10000
real*8 :: A(n,n)
integer :: i, j, t0, t1, count_rate, count_max
call system_clock(t0, count_rate, count_max)
do j = 1,n
do i = 1,n
call drand128(A(i,j))
end do
end do
! call drand128(A) ! works also with 2D
call system_clock(t1)
print *, "Time :", real(t1-t0)/count_rate
print *, "Mean :", sum(A)/size(A), char(10), A(1:2,1:3)
contains
impure elemental subroutine drand128(r)
real*8, intent(out) :: r
integer*8 :: s0 = 113, s1 = 19937
s1 = xor(s0,s1)
s0 = xor(xor(ior(ishft(s0,55), ishft(s0,-9)),s1), ishft(s1,14))
s1 = ior(ishft(s1,36), ishft(s1,-28))
r = ishft(s0+s1, -1) / 9223372036854775808.d0
end
end program

Only now I realized you are asking about this particular PRNG. I am using it in Fortran myself https://bitbucket.org/LadaF/elmm/src/eb5b54b9a8eb6af158a38038f72d07865fe23ee3/src/rng_par_zig.f90?at=master&fileviewer=file-view-default
My code in the link is slower than yours, because it calls several subroutines and aims to be more universal. Bet let's try to condense the code I use into a single subroutine.
So let's just compare the performance of your code and the optimized version by #SeverinPappadeux and my optimized code with Gfortran 4.8.5
> gfortran -cpp -O3 -mtune=native xoroshiro.f90
Time drand128 sub: 1.80900002
Time drand128 fun: 1.80900002
Time rng_uni: 1.32900000
the code is here, remember to let the CPU spin-up, the first iteration of the k loop is just garbage!!!
program test_xoroshiro128plus
use iso_fortran_env
implicit none
integer, parameter :: n = 30000
real*8 :: A(n,n)
real*4 :: B(n,n)
integer :: i, j, k, t0, t1, count_rate, count_max
integer(int64) :: s1 = int(Z'1DADBEEFBAADD0D0', int64), s2 = int(Z'5BADD0D0DEADBEEF', int64)
!let the CPU spin-up
do k = 1, 3
call system_clock(t0, count_rate, count_max)
do j = 1,n
do i = 1,n
call drand128(A(i,j))
end do
end do
! call drand128(A) ! works also with 2D
call system_clock(t1)
print *, "Time drand128 sub:", real(t1-t0)/count_rate
call system_clock(t0, count_rate, count_max)
do j = 1,n
do i = 1,n
A(i,j) = drand128_fun()
end do
end do
! call drand128(A) ! works also with 2D
call system_clock(t1)
print *, "Time drand128 fun:", real(t1-t0)/count_rate
call system_clock(t0, count_rate, count_max)
do j = 1,n
do i = 1,n
call rng_uni(A(i,j))
end do
end do
call system_clock(t1)
print *, "Time rng_uni:", real(t1-t0)/count_rate
end do
print *, "Mean :", sum(A)/size(A), char(10), A(1:2,1:3)
contains
impure elemental subroutine drand128(r)
real*8, intent(out) :: r
integer*8 :: s0 = 113, s1 = 19937
s1 = xor(s0,s1)
s0 = xor(xor(ior(ishft(s0,55), ishft(s0,-9)),s1), ishft(s1,14))
s1 = ior(ishft(s1,36), ishft(s1,-28))
r = ishft(s0+s1, -1) / 9223372036854775808.d0
end
impure elemental real*8 function drand128_fun()
real*8, parameter :: c = 1.0d0/9223372036854775808.d0
integer*8 :: s0 = 113, s1 = 19937
s1 = xor(s0,s1)
s0 = xor(xor(ior(ishft(s0,55), ishft(s0,-9)),s1), ishft(s1,14))
s1 = ior(ishft(s1,36), ishft(s1,-28))
drand128_fun = ishft(s0+s1, -1) * c
end
impure elemental subroutine rng_uni(fn_val)
real(real64), intent(inout) :: fn_val
integer(int64) :: ival
ival = s1 + s2
s2 = ieor(s2, s1)
s1 = ieor( ieor(rotl(s1, 24), s2), shiftl(s2, 16))
s2 = rotl(s2, 37)
ival = ior(int(Z'3FF0000000000000',int64), shiftr(ival, 12))
fn_val = transfer(ival, 1.0_real64) - 1;
end subroutine
function rotl(x, k)
integer(int64) :: rotl
integer(int64) :: x
integer :: k
rotl = ior( shiftl(x, k), shiftr(x, 64-k))
end function
end program
The main difference should come from the faster and better way to convert from integers to reals http://experilous.com/1/blog/post/perfect-fast-random-floating-point-numbers#half-open-range
If you are bored, you could try to inline rotl() manually, but I trust the compiler here.

Ok, here is my attempt. First, I made it to function - in x64 or similar ABI function returning float value do in in register - much faster than parameter transfer. Second,
replaced final division by multiplication, though Intel compiler might do it for you.
Timing, Intel i7 6820, WSL, Ubuntu 18.04:
before - 0.850000024
after - 0.601000011
GNU Fortran 7.3.0, command line
gfortran -std=gnu -O3 -ffast-math -mavx2 /mnt/c/Users/kkk/Documents/CPP/a.for
Code
program test_xoroshiro128plus
implicit none
integer, parameter :: n = 10000
real*8 :: A(n,n)
integer :: i, j, t0, t1, count_rate, count_max
call system_clock(t0, count_rate, count_max)
do j = 1,n
do i = 1,n
A(i,j) = drand128()
end do
end do
A = drand128() ! works also with 2D
call system_clock(t1)
print *, "Time :", real(t1-t0)/count_rate
print *, "Mean :", sum(A)/size(A), char(10), A(1:2,1:3)
contains
impure elemental real*8 function drand128()
real*8, parameter :: c = 1.0d0/9223372036854775808.d0
integer*8 :: s0 = 113, s1 = 19937
s1 = xor(s0,s1)
s0 = xor(xor(ior(ishft(s0,55), ishft(s0,-9)),s1), ishft(s1,14))
s1 = ior(ishft(s1,36), ishft(s1,-28))
drand128 = ishft(s0+s1, -1) * c
end
end program

Related

flags, compilers and algorithm strategies to optimise Fortran loops

I'm studying around the impact of unrolling and optimisation flags on Fortran code. I come up with the following, very trivial, case:
program do_order
implicit none
integer :: j, s, n, nLoops
integer, dimension(4) :: Iv
real*8, dimension(16, 4) :: tmp, Ov
real :: start, finish
nLoops = 1000000
!! Initialize the values of Input vector;
do n = 1,4
Iv(n) = n**2
end do
!! Explicit Do-loop + implicit Do-loop working across columns (to be Fortran efficient)
call cpu_time(start)
do n = 1, nLoops
tmp = 0.d0
Ov = 0.d0
do j = 1,4
tmp(1:Iv(j),j) = Ov(1:Iv(j),j) - 10.0d0
end do
end do
call cpu_time(finish)
print '("Loop-1 Time = ",f6.3," seconds.")',finish-start
tmp = 0.d0
Ov = 0.d0
!! Un-rolled loop + implicit Do-loop
call cpu_time(start)
do n = 1, nLoops
tmp = 0.d0
Ov = 0.d0
tmp(1:Iv(1),1) = Ov(1:Iv(1),1) - 10.0d0
tmp(1:Iv(2),2) = Ov(1:Iv(2),2) - 10.0d0
tmp(1:Iv(3),3) = Ov(1:Iv(3),3) - 10.0d0
tmp(1:Iv(4),4) = Ov(1:Iv(4),4) - 10.0d0
end do
call cpu_time(finish)
print '("Loop-2 Time = ",f6.3," seconds.")',finish-start
end program
Compiled with: -O3 -mprefer-avx128 flags, gives me the following timing:
Loop-1 Time = 4.487 seconds.
Loop-2 Time = 3.657 seconds.
I've used Gfort: GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0. I have access also to
Ifort 19.1.3.304. The same test with Ifort -O3 gives me:
Loop-1 Time = 0.939 seconds.
Loop-2 Time = 0.873 seconds.
My questions:
Why the huge gap in performance between Gfort and Ifort?
did I unrolled the second loop correctly?
Are there other optimization flags to speedup the code even more (i.e. Loop unrolling & optimization, Decreasing the fortran run time by extra optimization flags)?
Are there other strategies to speedup the code even more (without moving to multithreading, for now)?

Extracting specific parameter and its amount from a text file

I have a text file (S.txt) which contains a certain parameter with its amount in each interval. Something like this:
-
-
x=a
-
-
x=b
-
-
x=c
-
-
.
.
.
I want to write Fortran code to open the text file (S.txt) and read it in order to find each 'x' and read its amount into a parameter. something like this:
a
b
c
.
.
.
I have come up with this code but it does not work:
PROGRAM deter
IMPLICIT NONE
real,Dimension(2) :: value
open(unit=40,file='D:\S.txt',action='read')
READ(40,fmt='(2X,f3.3)') value
close(40)
END PROGRAM deter
when I run this program I get NO ERROR, but it doesn't work either.
any suggestion?
program extract_value
implicit none
integer :: ios
character(len=200), allocatable :: command(:)
character(len=200), allocatable :: word(:)
real, allocatable :: x(:), deter(:)
character(len=200) :: line
integer :: n, i, j, r
character (len=5), parameter :: sstr='x='
open(unit=50, file='D:\S.txt', iostat=ios)
if ( ios /= 0 ) stop "Error opening file S.txt"
n = 0
do
read(50, '(A)', iostat=ios) line
if (ios /= 0) exit
n = n + 1
end do
allocate(command(n))
allocate(word(n))
rewind(50)
j=0
do i = 1, n
read(50,'(A)') command(i)
read (command(i),'(a2)') word(i)
if (word(i)==sstr) then
j=j+1
end if
end do
allocate(x(n))
allocate(deter(j))
x=0
do i = 1, n
if (word(i)==sstr) then
read(command(i), fmt='(2X,f5.2)') x(i)
end if
end do
deter=0
deter=pack(x, x /= 0)
close(50)
open(unit=100, file='D:\R.txt', action="write",status="replace")
WRITE(100,fmt='(2X,f5.2)')(deter(r), r=1,j)
close(100)
end program extract_value

Undefined reference to a function in a module

I was going through all the other threads about undefined reference and I was unable to figure out why this piece of codes is not working. I am basically trying to run a function from a module into a subroutine in the main program. But I keep getting an error
Main:
program main
use m1
implicit none
integer, dimension(:,:), allocatable :: A,AFun
integer :: n,i,j
print *, "enter value for n"
read *, n
do i=1,n
do j=1,n
A(i,j)=i**2-j**3
end do
end do
print *, "Matrix A: "
call prMat(A,n)
call matFun(A,AFun,n)
print *, "Matrix AFun:"
call prMat(AFun,n)
call fromAvg(AFun,n)
contains
subroutine prMat(x,n)
implicit none
integer, dimension(n,n) :: x
integer :: i,j,n
do i=1,n
write(*,*), (x(i,j))
end do
end subroutine prMat
subroutine matfun(x,y,n)
implicit none
integer, dimension(n,n) :: x,y
integer :: i,j,n,f1
do i = 1,n
do j=1,n
y(i,j)=f1(x,i,j,n)
end do
end do
end subroutine matFun
subroutine fromAvg(x,n)
integer, dimension(:,:) :: x
integer :: i,j,n
integer :: s,avg,g,b
s=0; g=0; b=0
do i=1,n
do j=1,n
s=s+x(i,j)
end do
end do
avg=s/(n*n)
do i = 1,n
do j = 1,n
if ( x(i,j) > avg ) then
g = g + 1
else
b = b + 1
end if
end do
end do
print *, "In from avg, average=", avg
print *, "Number of values greater than average is ",g
print *, "Number of values less than average is ",b
end subroutine fromAvg
integer, dimension(:,:), allocatable, intent(IN) :: x
integer :: i,j,n
integer :: s,avg,g,b
s=0; g=0; b=0
do i=1,n
do j=1,n
s=s+x(i,j)
end do
end do
avg=s/(n*n)
do i = 1,n
do j = 1,n
if ( x(i,j) > avg ) then
g = g + 1
elseif ( x(i,j) < avg ) then
b = b + 1
end if
end do
end do
print *, "In from avg, average=", avg
print *, "Number of values greater than average is ",g
print *, "Number of values less than average is ",b
end subroutine fromAvg
end program main
Module Function:
module m1
implicit none
private
public :: f1
contains
function f1(x,p,q,n)
integer, dimension(:,:) :: x
integer, intent(in) :: p,q,n
integer :: i,f1
f1=0
do i = 1,n
f1 = f1 + x(p,n)
end do
do i = 1,n
f1 = f1 + x(n,q)
end do
end function f1
end module m1
The error that I keep getting is as follows:
/tmp/ccKZHw7L.o: In function `matfun.1520':
lab4_b.f90:(.text+0x808): undefined reference to `__m1_MOD_f1'
collect2: ld returned 1 exit status
Am I missing something? I have use m1 in the beginning of my main program before the implicit none statement.
In the subroutine matfun f1 is (re-)declared to be an integer, in the line
integer :: i,j,n,f1
This effectively masks the module function from sight. Since the code is (correctly) using the module containing f1 it shouldn't also be declared inside the subroutine.
Then my compiler complains that there is no definition for fromavg but I guess you know that.

Store arithmetic operators in an array with Fortran

I would like to solve a given equation of the following kind with Fortran:
1 ? 2 ? 3 = 7
In this equation only the arithmetic operators are missing and the solution would be '+' for the first question mark and '*' for the second one. I would like to write a short script that finds the correct operators by brute force. So in this case four times four cases would have to be checked. In order to do this I would like to store the operators in an array and use them in a nested do loop:
value1=1
value2=2
value3=3
result=7
op(1)=+
op(2)=-
op(3)=/
op(4)=*
do i=1,4
do j=1,4
if(value1 op(i) value2 op(j) value3 .eq. result) then
write(*,*)'Found solution: ' ,op(i), op(j)
else
j=j+1
endif
enddo
i=i+1
enddo
Apparently this doesn't work because of the wrong interpretation of the if-statement. Any ideas how to make this work?
As Vladimir pointed out, you can't do it directly that way in Fortran, unless you use function pointers. Below you find an according example.
I made it simple by using integer operations only. Also note, that I assumed that the operations in your expression are executed from left to right (no precedence rules), otherwise the algorithm would be much more complicated. If precedence matters, you should think about using an interpreted language for the task (unless execution speed is cruical).
Here is the module defining the operations and a type containing procedure pointers:
module myfuncs
implicit none
abstract interface
function binary(i1, i2)
integer, intent(in) :: i1, i2
integer :: binary
end function binary
end interface
type :: procptr
procedure(binary), pointer, nopass :: ptr
end type procptr
contains
function add(i1, i2) result(res)
integer, intent(in) :: i1, i2
integer :: res
res = i1 + i2
end function add
function mul(i1, i2) result(res)
integer, intent(in) :: i1, i2
integer :: res
res = i1 * i2
end function mul
function sub(i1, i2) result(res)
integer, intent(in) :: i1, i2
integer :: res
res = i1 - i2
end function sub
function div(i1, i2) result(res)
integer, intent(in) :: i1, i2
integer :: res
res = i1 / i2
end function div
end module myfuncs
Here is the main program with the brute force search:
program bruteforce
use myfuncs
type(procptr) :: ops(4)
character :: opnames(4)
integer :: val1, val2, val3, res, myres
val1 = 1
val2 = 2
val3 = 3
res = 7
ops(1)%ptr => add
ops(2)%ptr => sub
ops(3)%ptr => mul
ops(4)%ptr => div
opnames = [ "+", "-", "*", "/" ]
lpi: do ii = 1, 4
lpj: do jj = 1, 4
myres = ops(jj)%ptr(ops(ii)%ptr(val1, val2), val3)
write(*,"(3(I0,1X,A,1X),I0)") val1, opnames(ii), val2, &
&opnames(jj), val3, " = ", myres
if (myres == res) then
write(*,*) "Solution found."
exit lpi
end if
end do lpj
end do lpi
end program bruteforce
This cannot be done in Fortran. What type did you declare for op? There isn't any that would fit. One thing you could do is to define some functions and store function pointers to them.

Fortran Error Meanings

I have been following books and PDFs on writing in FORTRAN to write an integration program. I compile the code with gfortran and get several copies of the following errors.
1)Unexpected data declaration statement at (1)
2)Unterminated character constant beginning at (1)
3)Unclassifiable statement at (1)
4)Unexpected STATEMENT FUNCTION statement at (1)
5)Expecting END PROGRAM statement at (1)
6)Syntax error in data declaration at (1)
7)Statement function at (1) is recursive
8)Unexpected IMPLICIT NONE statement at (1)
I do not know hat they truly mean or how to fix them, google search has proven null and the other topics on this site we about other errors. for Error 5) i put in Program main and end program main like i might in C++ but still got the same result. Error 7) makes no sense, i am trying for recursion in the program. Error 8) i read implicit none was to prevent unnecessary decelerations.
Ill post the code itself but i am more interested in the compiling errors because i still need to fine tune the array data handling, but i cant do that until i get it working.
Program main
implicit none
real, dimension(:,:), allocatable :: m, oldm
real a
integer io, nn
character(30) :: filename
real, dimension(:,:), allocatable :: alt, temp, nue, oxy
integer locationa, locationt, locationn, locationo, i
integer nend
real dz, z, integral
real alti, tempi, nuei, oxyi
integer y, j
allocate( m(0, 0) ) ! size zero to start with?
nn = 0
j = 0
write(*,*) 'Enter input file name: '
read(*,*) filename
open( 1, file = filename )
do !reading in data file
read(1, *, iostat = io) a
if (io < 0 ) exit
nn = nn + 1
allocate( oldm( size(m), size(m) ) )
oldm = m
deallocate( m )
allocate( m(nn, nn) )
m = oldm
m(nn, nn) = a ! The nnth value of m
deallocate( oldm )
enddo
! Decompose matrix array m into column arrays [1,n]
write(*,*) 'Enter Column Number for Altitude'
read(*,*) locationa
write(*,*) 'Enter Column Number for Temperature'
read(*,*) locationt
write(*,*) 'Enter Column Number for Nuetral Density'
read(*,*) locationn
write(*,*) 'Enter Column Number for Oxygen density'
read(*,*) locationo
nend = size(m, locationa) !length of column #locationa
do i = 1, nend
alt(i, 1) = m(i, locationa)
temp(i, 1) = log(m(i, locationt))
nue(i, 1) = log(m(i, locationn))
oxy(i, 1) = log(m(i, locationo))
enddo
! Interpolate Column arrays, Constant X value will be array ALT with the 3 other arrays
!real dz = size(alt)/100, z, integral = 0
!real alti, tempi, nuei, oxyi
!integer y, j = 0
dz = size(alt)/100
do z = 1, 100, dz
y = z !with chopped rounding alt(y) will always be lowest integer for smooth transition.
alti = alt(y, 1) + j*dz ! the addition of j*dz's allow for all values not in the array between two points of the array.
tempi = exp(linear_interpolation(alt, temp, size(alt), alti))
nuei = exp(linear_interpolation(alt, nue, size(alt), alti))
oxyi = exp(linear_interpolation(alt, oxy, size(alt), alti))
j = j + 1
!Integration
integral = integral + tempi*nuei*oxyi*dz
enddo
end program main
!Functions
real function linear_interpolation(x, y, n, x0)
implicit none
integer :: n, i, k
real :: x(n), y(n), x0, y0
k = 0
do i = 1, n-1
if ((x0 >= x(i)) .and. (x0 <= x(i+1))) then
k = i ! k is the index where: x(k) <= x <= x(k+1)
exit ! exit loop
end if
enddo
if (k > 0) then ! compute the interpolated value for a point not in the array
y0 = y(k) + (y(k+1)-y(k))/(x(k+1)-x(k))*(x0-x(k))
else
write(*,*)'Error computing the interpolation !!!'
write(*,*) 'x0 =',x0, ' is out of range <', x(1),',',x(n),'>'
end if
! return value
linear_interpolation = y0
end function linear_interpolation
I can provide a more detailed description of the exact errors, i was hoping that the error name would be enough since i have a few of each type.
I think I can spot a few serious errors in your code sample. The syntax error is that you have unbalanced parentheses in the exp(... statements. They should be like this:
tempi = exp(linear_interpolation(alt, temp, size(alt), alti) ) ! <- extra ")"
nuei = exp(linear_interpolation(alt, nue, size(alt), alti) )
oxyi = exp(linear_interpolation(alt, oxy, size(alt), alti) )
It's precisely things like this that can produce long strings of confusing errors like you're getting; therefore the advice Dave and Jonathan have given can't be repeated often enough.
Another error (the "unclassifiable statement") applies to your loops:
do(i=1, nend)
! ...
do(z=1, 100, dz)
! ...
These should be written without parentheses.
The "data declaration error" stems from your attempt to declare and initialise multiple variables like
real dz = size(alt)/100, z, integral = 0
Along with being positioned incorrectly in the code (as noted), this can only be done with the double colon separator:
real :: dz = size(alt)/100, z, integral = 0
I personally recommend always writing declarations like this. It must be noted, though, that initialising variables like this has the side effect of implicitly giving them the save attribute. This has no effect in a main program, but is important to know; you can avoid it by putting the initialisations on a separate line.