I'm looking for a Fortran library or a preferred method of serializing data to a memory buffer in Fortran.
After researching the topic, I found examples using the EQUIVALENCE statement and the TRANSFER intrinsic function. I wrote code to test them and they both worked. In my limited testing, TRANSFER seems to be quite a bit slower than EQUIVALENCE. However, I have found several references stating that, in general, EQUIVALENCE should not be used.
So I've been trying to come up with another way to serialize data efficiently. After reading up on the Fortran 2003 spec, I discovered that I could use C_LOC and C_F_POINTER together to cast my "byte" array into the desired data type (int, real, etc.). Initial testing shows that it works and is faster than TRANSFER. An example program is listed below. I was wondering whether this is a valid use of the C_LOC and C_F_POINTER functions.
Thanks!
program main
use iso_c_binding
implicit none
real(c_float) :: a, b, c
integer(c_int8_t), target :: buf(12)
a = 12345.6789_c_float
b = 4567.89123_c_float
c = 9079.66788_c_float
call pack_float( a, c_loc(buf(1)) )
call pack_float( b, c_loc(buf(5)) )
call pack_float( c, c_loc(buf(9)) )
print '(A,12I5)', 'Bin: ', buf
contains
subroutine pack_float( src, dest )
implicit none
real(c_float), intent(in) :: src
type(c_ptr), intent(in) :: dest
real(c_float), pointer :: p
call c_f_pointer(dest, p)
p = src
end subroutine
end program
Output:
Bin: -73 -26 64 70 33 -65 -114 69 -84 -34 13 70
I also coded this up in Python to double-check the answer above. The code and output are listed below.
import struct
a = struct.pack( '3f', 12345.6789, 4567.89123, 9079.66788)
b = struct.unpack('12b', a)
print(b)
Output:
(-73, -26, 64, 70, 33, -65, -114, 69, -84, -34, 13, 70)
Practically, use of C_LOC and C_F_POINTER in this way is likely to work, but formally it is not standard conforming. The type and type parameters of the Fortran pointer passed to C_F_POINTER must either be interoperable with the object nominated by the C address or be the same as the type and type parameters of the object that was originally passed to C_LOC (see the description of the FPTR argument in F2008 15.2.3.3). Depending on what you are trying to serialize, you may also find that formal restrictions on the C_LOC argument (which are progressively less restrictive in later standards than F2003) come into play.
(The C equivalent requires use of unsigned char for this sort of trick - which is not necessarily the same thing as int8_t.)
There are constraints on the items in an EQUIVALENCE set that make that approach not generally applicable (see constraints C591 through C594 in F2008). Interpreting the internal representation of an object through equivalence is also formally subject to the rules around definition and undefinition of variables - see F2008 16.6.6 item 1 in particular.
The conforming way to access the representation of one object as a different type in Fortran is through TRANSFER. Be mindful that serialization of the internal representation of derived types with allocatable or pointer components (is that what you mean by dynamic fields?) may not be useful.
Depending on circumstance, it may be simpler and more robust to simply store your real time results in an array of the type of the thing to be stored.
This question will draw information from the draft N1570, so C11 basically.
Colloquially, to dereference a pointer means to apply the unary * operator to a pointer. There is only one place where the word "dereferencing" exists in the draft document (no instance of "dereference"), and it is in a footnote:
102) [...]
Among the invalid values for dereferencing a pointer by the unary *
operator are a null pointer, an address inappropriately aligned for
the type of object pointed to, and the address of an object after the
end of its lifetime
As far as I can see, the unary * operator is actually called the "indirection operator", as evidenced by §6.5.3.2:
6.5.3.2 Address and indirection operators
4 The unary * operator denotes indirection. [...]
Similarly, it is explicitly called the indirection operator in Annex §J.2:
— The value of an object is accessed by an array-subscript [],
member-access . or −>, address &, or indirection * operator or a
pointer cast in creating an address constant (6.6).
So is it correct to talk about "dereferencing pointers" in C, or is this being excessively pedantic? Where does the terminology come from? (I can kinda give a pass on [] being called "dereferencing" due to §6.5.2.1.)
K&R v1
If one looks at The C Programming Language, first edition (1978), the term “indirection” is used.
Examples
2.12 Precedence and Order of Evaluation
[…]
Chapter 5 discusses * (indirection) and & (address of).
7.2 Unary operators
[…]
The unary * operator means indirection: the expression must be a pointer, and the
result is an lvalue referring to the object to which the expression points.
It is also listed in INDEX as e.g.
* indirection operator 89, 187
A longer excerpt from section 5.1
5.1 Pointers and Addresses
Since a pointer contains the address of an object, it is possible to access the object “indirectly” through the pointer.
Suppose that x is a variable, say an int, and that px is a
pointer, created in some as yet unspecified way. The unary operator &
gives the address of an object, so the statement
px = &x;
assigns the address of x to the variable px; px is now said to
“point to” x. The & operator can be applied only to variables
and array elements; constructs like &(x+1 ) and &3 are illegal. It
is also illegal to take the address of a register variable.
The unary operator * treats its operand as the address of the ultimate target, and accesses that address to fetch the contents. Thus
if y is also an int,
y = *px;
assigns to y the contents of whatever px points to. So the
sequence
px = &x;
y = *px;
assigns the same value to y as does
y = x;
K&R v2
In the second edition, the term dereferencing comes in.
5.1 Pointers and Addresses
The unary operator * is the indirection or dereferencing operator; when applied to a pointer, it accesses the object the pointer points to. Suppose that x and y are integers and ip is a pointer to int. This artificial sequence shows how to declare a pointer and how to use & and *:
[…]
Prior usage
The term is however ("much") older as can be seen in e.g.
A survey of some issues concerning abstract data types, 1974, e.g. pp. 24–25, where it is used in connection with ALGOL 68, PASCAL and SIMULA 67.
The mechanism by which pointers are transformed into values by a language is
known as 'dereferencing', a form of coercion (discussed later). Consider the statement
p := q;
Depending upon the types of p and q, there are several possible interpretations.
Let '#' be a dereferencing operator (i.e. if p points to j , then #p is the same as j) and
'#' be a referencing operation (i.e. if p points to j , then p is the same as #j). The
following table indicates the possible actions a language might take to perform the
assignment:
                | type of p
type of q       | t         | ref t           | ref ref t              | ...
----------------+-----------+-----------------+------------------------+----
t               | p←q       | p←#q, #p←q      | p←##q, #p←#q, ##p←q    |
ref t           | p←#q      | p←q, #p←#q      | p←#q, #p←q, ##p←#q     |
ref ref t       | p←##q     | p←#q, #p←##q    | p←q, #p←#q, ##p←##q    |
...             |           |                 |                        |
[…]
Coining
There are several other examples of its usage. Exactly where and when it was coined I am not able to find though (at least not yet). (The 1974 paper is at least interesting.)
For fun, it can also be useful to look at mailing lists such as net.unix-wizards. An example from Peter Lamb at Melbourne Uni (11/28/83):
Dereferencing NULL pointers is yet another example of idiots who
write 'portable' code, assuming however, that THEIR machine is the
only one on which it will ever run: the same sorts of people who designed
cpio with binary headers.
Even on a VAX, dereferencing NULL will get you garbage: sure, *(char *)NULL
and *(short *)NULL return you 0, but *(int *)NULL will give you
1024528128 !!!!.
[…]
Ed1. Addition
It does not mention “dereferencing”, but an interesting read is Ritchie: The Development of the C Language.
Here, too, the term “indirection” is used consistently, and the connections between the languages are described in some detail. This makes the choice of term interesting in view of papers like the 1974 one mentioned above.
For an example of indirection as a concept and of its syntax, see e.g. p. 12 onward.
An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing.
[…]
There are two effects occurring. Most important, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C – Algol 68, for example – describe objects equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an ‘inside-out’ style that many find difficult to grasp [Anderson 80].
In this context it is also worth mentioning ANSI C89, which contains passages like:
3.1.2.5 Types
A pointer to void may not be dereferenced, although such a pointer may be converted to a normal pointer type which may be dereferenced.
Draft ANSI C Standard (ANSI X3J11/88-090), (Courtesy of Wikipedia)
Rationale for American National Standard for Information Systems – Programming Language – C
Among the invalid values for dereferencing a pointer by the unary * operator are
a null pointer, an address inappropriately aligned for the type of
object pointed to, or the address of an object that has automatic
storage duration when execution of the block in which the object is
declared and of all enclosed blocks has terminated.
(I have to re-read some of these documents now.)
Because in the good old days of K&R C, the language only passed parameters by value, so pointers were used to simulate passing parameters by reference. And people (incorrectly) spoke of taking a reference to a variable when constructing a pointer to it.
And the dereferencing of a pointer was the opposite operation.
Now C++ uses true references that are distinct from pointers, but the word dereference is still used (even if it is not really correct).
I do not know the exact etymology, but one can consider a pointer value (in the generic sense, not the C/C++-specific meaning) as "referencing" another object in memory; that is, p refers to x. When we use p to obtain the value stored in x, we are bypassing that reference, or de-referencing p.
Kernighan and Ritchie, The C Programming Language, 2nd ed., 5.1:
The unary operator * is the indirection or dereferencing operator; [...] ''pointer to void'' is used to hold any type of pointer but cannot be dereferenced itself.
I have a Fortran program which uses a routine in a module to resize a matrix like:
module resizemod
contains
subroutine ResizeMatrix(A,newSize,lindx)
integer,dimension(:,:),intent(inout),pointer :: A
integer,intent(in) :: newSize(2)
integer,dimension(:,:),allocatable :: B
integer,optional,intent(in) :: lindx(2)
integer :: i,j
allocate(B(lbound(A,1):ubound(A,1),lbound(A,2):ubound(A,2)))
forall (i=lbound(A,1):ubound(A,1),j=lbound(A,2):ubound(A,2))
B(i,j)=A(i,j)
end forall
if (associated(A)) deallocate(A)
if (present(lindx)) then
allocate(A(lindx(1):lindx(1)+newSize(1)-1,lindx(2):lindx(2)+newSize(2)-1))
else
allocate(A(newSize(1),newSize(2)))
end if
do i=lbound(B,1),ubound(B,1)
do j=lbound(B,2), ubound(B,2)
A(i,j)=B(i,j)
end do
end do
deallocate(B)
end subroutine ResizeMatrix
end module resizemod
The main program looks like:
program resize
use :: resizemod
implicit none
integer,pointer :: mtest(:,:)
allocate(mtest(0:1,3))
mtest(0,:)=[1,2,3]
mtest(1,:)=[1,4,5]
call ResizeMatrix(mtest,[3,3],lindx=[0,1])
mtest(2,:)=0
print *,mtest(0,:)
print *,mtest(1,:)
print *,mtest(2,:)
end program resize
I use ifort 14.0 to compile the code. The issue I am facing is that sometimes I don't get the desired result:
1 0 0
1 0 5
0 0 -677609912
Actually, I couldn't reproduce the issue (which is present in my original program) using the minimal test code. But I noticed that when I remove the compiler option -fast, the problem disappears.
So my questions are:
Is the code I use completely legal?
Is there a better method for resizing matrices than the one presented here?
What is the relevance of the described issue to the compiler option -fast?
If I've read the code right, it's legal but incorrect. In your example you've resized a 2x3 array into 3x3, but the routine ResizeMatrix doesn't do anything to set the values of the extra elements. The strange values you see, such as -677609912, are the interpretation, as integers, of whatever bits were lying around in memory when the memory location corresponding to the unset array element was read (so that its value could be written out).
The relevance of -fast is that it is common for compilers in debug or low-optimisation modes to zero out memory locations, but not to bother when higher optimisation is switched on. The program is legal in the sense that it contains no compiler-determinable syntax errors. But it is incorrect in the sense that reading a variable whose value has not been initialised or assigned is not something you ought to do; doing so makes your program, essentially, non-deterministic.
As for your question 2, it raises the possibility that you are not familiar with the intrinsic functions reshape or (F2003) move_alloc. The latter is almost certainly what you want, the former may help too.
As an aside: these days I rarely use pointer on arrays, allocatable is much more useful and generally easier and safer too. But you may have requirements of which I wot not.
I have done a lot of work with array-based interpreted languages, but I'm having a look at Fortran. What has just occurred to me after writing my first bit of code is the question of whether or not gfortran will optimise an expression using array syntax by placing the expression in a single loop. In most array-based interpreters an expression such as A=B/n*2*pi (where B is an array) would require 5 loops and multiple array temporaries to evaluate. Is gfortran clever enough to optimise this out, and will my code below (the line that calculates the array from 0 to 2pi) be as efficient as an explicit do loop around the expression? Is there anything I should look out for when using array syntax if I'm worried about performance?
PROGRAM Sine
IMPLICIT NONE
REAL, PARAMETER :: PI = 3.1415926535
INTEGER, PARAMETER :: z = 500
INTEGER :: ier
INTEGER, EXTERNAL :: PGBEG
REAL, DIMENSION(z) :: x,y
x=(indgen(z)-1.0)/z*(2*pi) ! This line...
y=sin(x)
CALL plot(y,x)
CONTAINS
FUNCTION indgen(n) result(i)
INTEGER :: n
INTEGER, DIMENSION(n) :: i
INTEGER :: l
DO l=1,n
i(l)=l
END DO
END FUNCTION indgen
SUBROUTINE plot(y,x)
REAL, DIMENSION(:) :: x,y
ier=PGBEG(0,'/XWINDOW',1,1)
CALL PGENV(0.0,7.0,-1.0,1.0,0,1)
CALL PGLINE(SIZE(x),x,y)
CALL PGEND()
END SUBROUTINE plot
END PROGRAM Sine
In gfortran you can use the -Warray-temporaries flag to see all array temporaries generated. When I try your example no extra array temporary is generated (other than the one necessary to store the results of indgen(z)), so I guess gfortran is clever enough.
The constant parts of the expression, z and 2*pi, are known at compile time, so they should not be evaluated at run time regardless. Additionally, virtually all modern compilers should perform one-line "elemental" array operations within a single loop, and in many cases SIMD instructions will be generated (auto-vectorization).
Whether a temporary is generated usually depends on whether or not each element can be handled independently, and whether or not the compiler can prove this. Xiaolei Zhu's suggestion of using -Warray-temporaries is a good one. Don't mix this up with -fcheck=array-temps, which I think only applies to temporaries generated for function calls.
Here's an example of such a message from gfortran:
foo.F90:4.12:
foo(1:20) = 2*foo(20:1:-1)
1
Warning: Creating array temporary at (1)
Your function call will be done in a separate loop, unless the compiler can inline it. Whether or not the compiler inlines a short function can be pretty unpredictable; it potentially depends on where that other function is defined, whether or not the function has the pure attribute (although in practice this rarely seems to matter), the vendor and version of the compiler itself, and the options you pass. Some compilers can generate a report for this; as I recall, the Intel compiler has a decent one.
Edit: It's also possible to inline the expression in this line pretty easily by hand, using an "implied do loop":
x = [ ( real(i)/z*(2*pi), i = 0, z-1) ]
Yes.
Fortran is compiled rather than interpreted.
It handles loops very well.
I am trying to use SmallCheck to test a Haskell program, but I cannot understand how to use the library to test my own data types. Apparently, I need to use the Test.SmallCheck.Series. However, I find the documentation for it extremely confusing. I am interested in both cookbook-style solutions and an understandable explanation of the logical (monadic?) structure. Here are some questions I have (all related):
If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)? What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
I managed to make quickCheck work like this without getting my hands too dirty by using judicious applications of Control.Monad.liftM to functions like makeTale. I couldn't figure out a way to do this with smallCheck (please explain it to me!).
What is the relationship between the types Serial, Series, etc.?
(optional) What is the point of coSeries? How do I use the Positive type from SmallCheck.Series?
(optional) Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
If there is any intro/tutorial to using smallCheck, I'd appreciate a link. Thank you very much!
UPDATE: I should add that the most useful and readable documentation I found for smallCheck is this paper (PDF). I could not find the answer to my questions there on the first look; it is more of a persuasive advertisement than a tutorial.
UPDATE 2: I moved my question about the weird Identity that shows up in the type of Test.SmallCheck.list and other places to a separate question.
NOTE: This answer describes pre-1.0 versions of SmallCheck. See this blog post for the important differences between SmallCheck 0.6 and 1.0.
SmallCheck is like QuickCheck in that it tests a property over some part of the space of possible values. The difference is that it tries to exhaustively enumerate a series of all the "small" values instead of an arbitrary subset of smallish values.
As I hinted, SmallCheck's Serial is like QuickCheck's Arbitrary.
Now Serial is pretty simple: a Serial type a has a way (series) to generate a Series type which is just a function from Depth -> [a]. Or, to unpack that, Serial objects are objects we know how to enumerate some "small" values of. We are also given a Depth parameter which controls how many small values we should generate, but let's ignore it for a minute.
instance Serial Bool where series _ = [False, True]
instance Serial Char where series _ = "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
series d = Nothing : map Just (series d)
In these cases we're doing nothing more than ignoring the Depth parameter and then enumerating "all" possible values for each type. We can even do this automatically for some types
instance (Enum a, Bounded a) => Serial a where series _ = [minBound .. maxBound]
This is a really simple way of testing properties exhaustively—literally test every single possible input! Obviously there are at least two major pitfalls, though: (1) infinite data types will lead to infinite loops when testing and (2) nested types lead to exponentially larger spaces of examples to look through. In both cases, SmallCheck gets really large really quickly.
So that's the point of the Depth parameter—it lets the system ask us to keep our Series small. From the documentation, Depth is the
Maximum depth of generated test values
For data values, it is the depth of nested constructor applications.
For functional values, it is both the depth of nested case analysis and the depth of results.
so let's rework our examples to keep them Small.
instance Serial Bool where
series 0 = []
series 1 = [False]
series _ = [False, True]
instance Serial Char where
series d = take d "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
-- we shrink d by one since we're adding Nothing
series d = Nothing : map Just (series (d-1))
instance (Enum a, Bounded a) => Serial a where series d = take d [minBound .. maxBound]
Much better.
So what's coseries? Like coarbitrary in the Arbitrary typeclass of QuickCheck, it lets us build a series of "small" functions. Note that we're writing the instance over the input type---the result type is handed to us in another Serial argument (that I'm below calling results).
instance Serial Bool where
  coseries results d = [\cond -> if cond then r1 else r2 |
                          r1 <- results d,
                          r2 <- results d]
These take a little more ingenuity to write, so I'll refer you to the alts methods, which I describe briefly below.
So how can we make some Series of Persons? This part is easy
instance Serial Person where
series d = SnowWhite : take (d-1) (map Dwarf [1..7])
...
But our coseries function needs to generate every possible function from Persons to something else. This can be done using the altsN series of functions provided by SmallCheck. Here's one way to write it
coseries results d = [\person ->
case person of
SnowWhite -> f 0
Dwarf n -> f n
| f <- alts1 results d ]
The basic idea is that altsN results generates a Series of N-ary functions from N values with Serial instances to values of the result type. So we use it to create a function from [0..7], a previously defined Serial value, to whatever we need, then we map our Persons to numbers and pass 'em in.
So now that we have a Serial instance for Person, we can use it to build more complex nested Serial instances. For "instance", if FairyTale is a list of Persons, we can use the Serial a => Serial [a] instance alongside our Serial Person instance to easily create a Serial FairyTale:
instance Serial FairyTale where
series = map makeFairyTale . series
coseries results = map (makeFairyTale .) . coseries results
(the (makeFairyTale .) composes makeFairyTale with each function coseries generates, which is a little confusing)
If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)?
First of all, you need to decide which values you want to generate for each depth. There's no single right answer here, it depends on how fine-grained you want your search space to be.
Here are just two possible options:
people d = SnowWhite : map Dwarf [1..7] (doesn't depend on the depth)
people d = take d $ SnowWhite : map Dwarf [1..7] (each unit of depth increases the search space by one element)
After you've decided on that, your Serial instance is as simple as
instance Monad m => Serial m Person where
series = generate people
We left m polymorphic here as we don't require any specific structure of the underlying monad.
What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
Use cons1:
instance Monad m => Serial m FairyTale where
series = cons1 makeTale
What is the relationship between the types Serial, Series, etc.?
Serial is a type class; Series is a type. You can have multiple Series of the same type — they correspond to different ways to enumerate values of that type. However, it may be arduous to specify for each value how it should be generated. The Serial class lets us specify a good default for generating values of a particular type.
The definition of Serial is
class Monad m => Serial m a where
series :: Series m a
So all it does is assigning a particular Series m a to a given combination of m and a.
What is the point of coseries?
It is needed to generate values of functional types.
How do I use the Positive type from SmallCheck.Series?
For example, like this:
> smallCheck 10 $ \n -> n^3 >= (n :: Integer)
Failed test no. 5.
there exists -2 such that
condition is false
> smallCheck 10 $ \(Positive n) -> n^3 >= (n :: Integer)
Completed 10 tests without failure.
Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
When you are writing a Serial instance (or any Series expression), you work in the Series m monad.
When you are writing tests, you work with simple functions that return Bool or Property m.
While I think that tel's answer is an excellent explanation (and I wish smallCheck actually worked the way he describes), the code he provides does not work for me (with smallCheck version 1). I managed to get the following to work...
UPDATE / WARNING: The code below is wrong for a rather subtle reason. For the corrected version, and details, please see this answer to the question mentioned below. The short version is that instead of instance Serial Identity Person one must write instance (Monad m) => Serial m Person.
... but I find the use of Control.Monad.Identity and all the compiler flags bizarre, and I have asked a separate question about that.
Note also that while Series Person (or actually Series Identity Person) is not exactly the same as a function Depth -> [Person] (see tel's answer), the function generate :: (Depth -> [a]) -> Series m a converts between them.
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, FlexibleContexts, UndecidableInstances #-}
import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Monad.Identity
data Person = SnowWhite | Dwarf Int
instance Serial Identity Person where
series = generate (\d -> SnowWhite : take (d-1) (map Dwarf [1..7]))
Not very familiar with in-line assembly to begin with, and much less with that of the blackfin processor. I am in the process of migrating a legacy C application over to C++, and ran into a problem this morning regarding the following routine:
//
void clear_buffer ( short * buffer, int len ) {
__asm__ (
"/* clear_buffer */\n\t"
"LSETUP (1f, 1f) LC0=%1;\n"
"1:\n\t"
"W [%0++] = %2;"
:: "a" ( buffer ), "a" ( len ), "d" ( 0 )
: "memory", "LC0", "LT0", "LB0"
);
}
I have a class that contains an array of shorts that is used for audio processing:
class AudProc
{
enum { buffer_size = 512 };
short M_samples[ buffer_size * 2 ];
// remaining part of class omitted for brevity
};
Within the AudProc class I have a method that calls clear_buffer, passing it the samples array:
clear_buffer ( M_samples, sizeof ( M_samples ) / 2 );
This generates a "Bus Error" and aborts the application.
I have tried making the array public, and that produces the same result. I have also tried making it static; that allows the call to go through without error, but no longer allows for multiple instances of my class as each needs its own buffer to work with. Now, my first thought is, it has something to do with where the buffer is in memory, or from where it is being accessed. Does something need to be changed in the in-line assembly to make this work, or in the way it is being called?
Thought that this was similar to what I was trying to accomplish, but it is using a different dialect of asm, and I can't figure out if it is the same problem I am experiencing or not:
GCC extended asm, struct element offset encoding
Anyone know why this is occurring and how to correct it?
Does anyone know where there is helpful documentation regarding the blackfin asm instruction set? I've tried looking on the ADSP site, but to no avail.
I would suspect that you could define your clear_buffer as
inline void clear_buffer (short * buffer, int len) {
memset (buffer, 0, sizeof(short)*len);
}
and probably GCC is able to optimize (when invoked with -O2 or -O3) that cleverly (because GCC knows about memset).
To understand assembly code, I suggest running gcc -S -O -fverbose-asm on some small C file, then to look inside the produced .s file.
I have to take a guess, because I don't know Blackfin assembler:
LC0 sounds like "loop counter", and LSETUP looks like a macro/insn which, well, sets up a loop between two labels with a certain loop counter.
The "%0" operand is apparently the address to write to, and we can safely guess it's incremented in the loop; in other words, it's both an input and an output operand and should be described as such.
Thus, I suggest describing it as an input-output operand, using the "+" constraint modifier, as follows:
void clear_buffer ( short * buffer, int len ) {
__asm__ (
"/* clear_buffer */\n\t"
"LSETUP (1f, 1f) LC0=%1;\n"
"1:\n\t"
"W [%0++] = %2;"
: "+a" ( buffer )
: "a" ( len ), "d" ( 0 )
: "memory", "LC0", "LT0", "LB0"
);
}
This is, of course, just a hypothesis, but you could disassemble the code and check if by any chance GCC allocated the same register for "%0" and "%2".
PS. Actually, only "+a" should be enough, early-clobber is irrelevant.
For anyone else who runs into a similar circumstance, the problem here was not with the in-line assembly, nor with the way it was being called: it was with the classes / structs in the program. The class that I believed to be the offender was not the problem - there was another class that held an instance of it, and due to other members of that outer class, the inner one was not aligned on a word boundary. This was causing the "Bus Error" that I was experiencing. I had not come across this before because the classes were not declared with __attribute__((packed)) in other code, but they are in my implementation.
Giving Type Attributes - Using the GNU Compiler Collection (GCC) a read was what actually sparked the answer for me. Two particular attributes that affect memory alignment (and, thus, in-line assembly such as I am using) are packed and aligned.
As taken from the aforementioned link:
aligned (alignment)
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:
struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));
force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on a 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.
Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, but the notation illustrated in the example above is a more obvious, intuitive, and readable way to request the compiler to adjust the alignment of an entire struct or union type.
As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given struct or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:
struct S { short f[3]; } __attribute__ ((aligned));
Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment that is ever used for any data type on the target machine you are compiling for. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables that have types that you have aligned this way.
In the example above, if the size of each short is 2 bytes, then the size of the entire struct S type is 6 bytes. The smallest power of two that is greater than or equal to that is 8, so the compiler sets the alignment for the entire struct S type to 8 bytes.
Note that although you can ask the compiler to select a time-efficient alignment for a given type and then declare only individual stand-alone objects of that type, the compiler's ability to select a time-efficient alignment is primarily useful only when you plan to create arrays of variables having the relevant (efficiently aligned) type. If you declare or use arrays of variables of an efficiently-aligned type, then it is likely that your program also does pointer arithmetic (or subscripting, which amounts to the same thing) on pointers to the relevant type, and the code that the compiler generates for these pointer arithmetic operations is often more efficient for efficiently-aligned types than for other types.
The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well. See below.
Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.
packed
This attribute, attached to a struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.
Specifying this attribute for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members. Specifying the -fshort-enums flag on the command line is equivalent to specifying the packed attribute on all enum definitions.
In the following example struct my_packed_struct's members are packed closely together, but the internal layout of its s member is not packed—to do that, struct my_unpacked_struct needs to be packed too.
struct my_unpacked_struct
{
char c;
int i;
};
struct __attribute__ ((__packed__)) my_packed_struct
{
char c;
int i;
struct my_unpacked_struct s;
};
You may only specify this attribute on the definition of an enum, struct or union, not on a typedef that does not also define the enumerated type, structure or union.
The problem I was experiencing was caused specifically by packed. I first tried simply adding the aligned attribute to the structs and classes, but the error persisted; only removing the packed attribute resolved it. For now, I am leaving the aligned attribute on them and testing whether having them aligned on word boundaries brings any of the efficiency improvements described above. The application uses arrays of these structures, so performance may improve, but only profiling the code will say for certain.