I'm going through a Fortran code, and one bit has me a little puzzled.
There is a subroutine, say
SUBROUTINE SSUB(X,...)
REAL*8 X(0:N1,1:N2,0:N3-1),...
...
RETURN
END
Which is called in another subroutine by:
CALL SSUB(W(0,1,0,1),...)
where W is a 'working array'. It appears that a specific value from W is passed to the X, however, X is dimensioned as an array. What's going on?
This is non-uncommon idiom for getting the subroutine to work on a (rectangular in N-dimensions) subset of the original array.
All parameters in Fortran (at least before Fortran 90) are passed by reference, so the actual array argument is resolved as a location in memory. Choose a location inside the space allocated for the whole array, and the subroutine manipulates only part of the array.
Biggest issue: you have to be aware of how the array is laid out in memory and how Fortran's array indexing scheme works. Fortran uses column major array ordering which is the opposite convention from c. Consider an array that is 5x5 in size (and index both directions from 0 to make the comparison with c easier). In both languages 0,0 is the first element in memory. In c the next element in memory is [0][1] but in Fortran it is (1,0). This affects which indexes you drop when choosing a subspace: if the original array is A(i,j,k,l), and the subroutine works on a three dimensional subspace (as in your example), in c it works on Aprime[i=constant][j][k][l], but in Fortran in works on Aprime(i,j,k,l=constant).
The other risk is wrap around. The dimensions of the (sub)array in the subroutine have to match those in the calling routine, or strange, strange things will happen (think about it). So if A is declared of size (0:4,0:5,0:6,0:7), and we call with element A(0,1,0,1), the receiving routine is free to start the index of each dimension where ever it likes, but must make the sizes (4,5,6) or else; but that means that the last element in the j direction actually wraps around! The thing to do about this is not use the last element. Making sure that that happens is the programmers job, and is a pain in the butt. Take care. Lots of care.
in fortran variables are passed by address.
So W(0,1,0,1) is value and address. so basically you pass subarray starting at W(0,1,0,1).
This is called "sequence association". In this case, what appears to be a scaler, an element of an array (actual argument in caller) is associated with an array (implicitly the first element), the dummy argument in the subroutine . Thereafter the elements of the arrays are associated by storage order, known as "sequence". This was done in Fortran 77 and earlier for various reasons, here apparently for a workspace array -- perhaps the programmer was doing their own memory management. This is retained in Fortran >=90 for backwards compatibility, but IMO, doesn't belong in new code.
Related
Preamble
On another question I understood that python constructors routines does not make a copy of the provided numpy array only if the data-type of the array is the same for all the entries. In case the constructor is fed with a structured numpy array with different types on columns, it makes a copy.
Implementation reference
df_dict = {}
for i in range(5):
obj = Object(1000000)
arr = obj.getNpArr()
print(arr[:10])
df_dict[i] = pandas.DataFrame.from_records(arr)
print("The DataFrames are :")
for i in range(5):
print(df_dict[i].head(10))
In this case Object(N) constructs an instance of Object which internally allocates and initializes a 2D array of shape (N,3), with dtypes 'f8','i4','i4' on each row. Object manages the life of these data, deallocating it on destruction. The function Object.getNpArr() returns a np.recarray pointing to the internal data and it has the above mentioned dtype. Importantly, the returned array does not own the data, it is just a view.
Problem
The DataFrame printed at the end show corrupted data (with respect to the printed array inside the first loop). I am not expecting such behaviour, since the array fed to the pandas construction function is copied (I separately checked this behaviour).
I have not many ideas about the cause and solutions to avoid data corruption. The only guess I can make is:
the constructor starts allocating the memory for its own data, which takes long because of the big size, and then copies
before/during the allocation/copy, the GIL is released and it is taken back to the for loop
for loop proceed before the copy of the array is completed, going to the next iteration
at the next iteration the obj name is moved to the new Object and the memory is deallocated, which causes data corruption in the copy of the DataFrame at the previous iteration, which is probably still running.
If this is really the cause of the issue, how can I find a workaround? Is there a way to let the GIL go through only when the copy of the array is effectively done?
Or, if my guess is wrong, what is the cause of the data corruption?
Definition:
As defined here, CGGetDisplaysWithPoint takes 4 parameters:
A CGPoint object
An int32 representing the maximum number of displays returned
A mutable array passed by reference, which will be filled with the displayIDs found.
An int32 representing the matching display count
Syntax:
CGError CGGetDisplaysWithPoint(CGPoint point, uint32_t maxDisplays, CGDirectDisplayID *displays, uint32_t *matchingDisplayCount);
This is fine and I can get this function working however I am quite confused as to how I should deal with the maxDisplays parameter?
As I understand it, if I set maxDisplays to 5 then if someone has 6 displays, there is a 1/6 chance that a randomly selected pixel will find no displays?
So do we just set maxDisplays to something unrealistic, like 99, and release the array afterwards? What's the point in this argument?
The point of the argument is to prevent the function from writing past the end of your array. You have to tell it the capacity of the array. Note that the displays parameter is neither a Cocoa nor Core Foundation mutable array object. It's a C-style array. It's "mutable" in the sense that it's not "const", but it's not an object that manages its own storage. You are responsible for managing that storage and must communicate its capacity to any function that is intended to store data in it (or otherwise guarantee that such function won't overrun it).
So, your question should really be how to decide on the capacity of the array. There are two basic approaches:
1) Call the function passing NULL for the displays parameter and any arbitrary value (best to use 0) for maxDisplays. As documented, when displays is NULL, maxDisplays is ignored and the function outputs via matchingDisplayCount the number of displays whose bounds contain the given point. Then, allocate an array with (at least) that many elements to use to receive the display IDs and call the function again, passing that array for displays and its capacity for maxDisplays.
2) Use an array with capacity of 32. It's not explicitly documented but it's implicit in the API that that's the maximum number of supported displays. A display ID can be converted to an OpenGL display mask using CGDisplayIDToOpenGLDisplayMask(). The type CGOpenGLDisplayMask is used to hold OpenGL display masks. It is defined as uint32_t, a 32-bit value. Therefore, there can be at most 32 active displays.
This technique is used in some Apple docs, like here, here, here, and here. That last one even makes a direct connection between the number of bits in CGOpenGLDisplayMask and the maximum number of displays.
I have a Fortran program which uses a routine in a module to resize a matrix like:
module resizemod
contains
subroutine ResizeMatrix(A,newSize,lindx)
integer,dimension(:,:),intent(inout),pointer :: A
integer,intent(in) :: newSize(2)
integer,dimension(:,:),allocatable :: B
integer,optional,intent(in) :: lindx(2)
integer :: i,j
allocate(B(lbound(A,1):ubound(A,1),lbound(A,2):ubound(A,2)))
forall (i=lbound(A,1):ubound(A,1),j=lbound(A,2):ubound(A,2))
B(i,j)=A(i,j)
end forall
if (associated(A)) deallocate(A)
if (present(lindx)) then
allocate(A(lindx(1):lindx(1)+newSize(1)-1,lindx(2):lindx(2)+newSize(2)-1))
else
allocate(A(newSize(1),newSize(2)))
end if
do i=lbound(B,1),ubound(B,1)
do j=lbound(B,2), ubound(B,2)
A(i,j)=B(i,j)
end do
end do
deallocate(B)
end subroutine ResizeMatrix
end module resizemod
The main program looks like:
program resize
use :: resizemod
implicit none
integer,pointer :: mtest(:,:)
allocate(mtest(0:1,3))
mtest(0,:)=[1,2,3]
mtest(1,:)=[1,4,5]
call ResizeMatrix(mtest,[3,3],lindx=[0,1])
mtest(2,:)=0
print *,mtest(0,:)
print *,mtest(1,:)
print *,mtest(2,:)
end program resize
I use ifort 14.0 to compile the codes. The issue that I am facing is that sometimes I don't get the desired result:
1 0 0
1 0 5
0 0 -677609912
Actually I couldn't reproduce the issue (which is present in my original program) using the minimal test codes. But the point that I noticed was that when I remove the compiler option -fast, this problem disappears.
Then my question would be
If the pieces of code that I use are completely legal?
If any other method for resizing the matrices would be recommended which is better than the one presented in here?
The relevance of the described issue and the compiler option "-fast".
If I've read the code right it's legal but incorrect. In your example you've resized a 2x3 array into 3x3 but the routine ResizeMatrix doesn't do anything to set the values of the extra elements. The strange values you see, such as -677609912, are the interpretation, as integers. of whatever bits were lying around in memory when the memory location corresponding to the unset array element was read (so that it's value could be written out).
The relevance of -fast is that it is common for compilers in debug or low-optimisation modes, to zero-out memory locations but not to bother when higher optimisation is switched on. The program is legal in the sense that it contains no compiler-determinable syntax errors. But it is incorrect in the sense that reading a variable whose value has not been initialised or assigned is not something you regularly ought to do; doing so makes your program, essentially, non-deterministic.
As for your question 2, it raises the possibility that you are not familiar with the intrinsic functions reshape or (F2003) move_alloc. The latter is almost certainly what you want, the former may help too.
As an aside: these days I rarely use pointer on arrays, allocatable is much more useful and generally easier and safer too. But you may have requirements of which I wot not.
I have a subroutine in a shared library:
SUBROUTINE DLLSUBR(ARR)
IMPLICIT NONE
INTEGER, PARAMETER :: N = 2
REAL ARR(0:N)
arr(0) = 0
arr(1) = 1
arr(2) = 2
END
And let's assume I will call it from executable by:
REAL ARR(0:3)
CALL DLLSUBR(ARR)
Note: The code happily compiles and runs (DLLSUBR is inside a module) without any warning or error in Debug + /check:all option switched on.
Could this lead to memory corruption or some weird behaviour? Where I can find info about passing array with different size in the Fortran specification?
It is actually allowed for explicit shape arrays by the rules of sequence association, if you make the dummy argument element count to be smaller or equal. It is prohibited when the subroutine expects more elements then it gets.
The explicit shape arrays often require the arguments to be passed by a copy. This happens when the compiler cannot prove the array is contiguous (a pointer or an assumed shape array dummy argument). If smaller number of elements was passed, the subroutine could then access some garbage after the copy of the portion of the array.
In your case everything will be OK, because you are passing more to a subroutine expecting less.
Fortran 2008 12.5.2.11.4:
4 An actual argument that represents an element sequence and
corresponds to a dummy argument that is an array is sequence
associated with the dummy argument if the dummy argument is an
explicit-shape or assumed-size array. The rank and shape of the actual
argument need not agree with the rank and shape of the dummy argument,
but the number of elements in the dummy argument shall not exceed the
number of elements in the element sequence of the actual argument. If
the dummy argument is assumed-size, the number of elements in the
dummy argument is exactly the number of elements in the element
sequence.
I've got a Lua program that seems to be slower than it ought to be. I suspect the issue is that I'm adding values to an associative array one at a time and the table has to allocate new memory each time.
There did seem to be a table.setn function, but it fails under Lua 5.1.3:
stdin:1: 'setn' is obsolete
stack traceback:
[C]: in function 'setn'
stdin:1: in main chunk
[C]: ?
I gather from the Google searching I've done that this function was depreciated in Lua 5.1, but I can't find what (if anything) replaced the functionality.
Do you know how to pre-size a table in Lua?
Alternatively, is there some other way to avoid memory allocation when you add an object to a table?
Let me focus more on your question:
adding values to an associative array
one at a time
Tables in Lua are associative, but using them in an array form (1..N) is optimized. They have double faces, internally.
So.. If you indeed are adding values associatively, follow the rules above.
If you are using indices 1..N, you can force a one-time size readjust by setting t[100000]= something. This should work until the limit of optimized array size, specified within Lua sources (2^26 = 67108864). After that, everything is associative.
p.s. The old 'setn' method handled the array part only, so it's no use for associative usage (ignore those answers).
p.p.s. Have you studied general tips for keeping Lua performance high? i.e. know table creation and rather reuse a table than create a new one, use of 'local print=print' and such to avoid global accesses.
static int new_sized_table( lua_State *L )
{
int asize = lua_tointeger( L, 1 );
int hsize = lua_tointeger( L, 2 );
lua_createtable( L, asize, hsize );
return( 1 );
}
...
lua_pushcfunction( L, new_sized_table );
lua_setglobal( L, "sized_table" );
Then, in Lua,
array = function(size) return sized_table(size,0) end
a = array(10)
As a quick hack to get this running you can add the C to lua.c.
I don't think you can - it's not an array, it's an associative array, like a perl hash or an awk array.
http://www.lua.org/manual/5.1/manual.html#2.5.5
I don't think you can preset its size meaningfully from the Lua side.
If you're allocating the array on the C side, though, the
void lua_createtable (lua_State *L, int narr, int nrec);
may be what you need.
Creates a new empty table and pushes
it onto the stack. The new table has
space pre-allocated for narr array
elements and nrec non-array elements.
This pre-allocation is useful when you
know exactly how many elements the
table will have. Otherwise you can use
the function lua_newtable.
There is still an internal luaL_setn and you can compile Lua so that
it is exposed as table.setn. But it looks like that it won't help
because the code doesn't seem to do any pre-extending.
(Also the setn as commented above the setn is related to the array part
of a Lua table, and you said that your are using the table as an associative
array)
The good part is that even if you add the elements one by one, Lua does not
increase the array that way. Instead it uses a more reasonable strategy. You still
get multiple allocations for a larger array but the performance is better than
getting a new allocation each time.
Although this doesn't answer your main question, it answers your second question:
Alternatively, is there some other way to avoid memory allocation when you add an object to a table?
If your running Lua in a custom application, as I can guess since your doing C coding, I suggest you replace the allocator with Loki's small value allocator, it reduced my memory allocations 100+ fold. This improved performance by avoiding round trips to the Kernel, and made me a much happier programmer :)
Anyways I tried other allocators, but they were more general, and provide guarantee's that don't benefit Lua applications (such as thread safety, and large object allocation, etc...), also writing your own small-object allocator can be a good week of programming and debugging to get just right, and after searching for an available solution Loki's allocator wasthe easiest and fastest I found for this problem.
If you declare your table in code with a specific amount of items, like so:
local tab = { 0, 1, 2, 3, 4, 5, ... , n }
then Lua will create the table with memory already allocated for at least n items.
However, Lua uses the 2x incremental memory allocation technique, so adding an item to a table should rarely force a reallocation.