How to load different primitive data onto the operand stack - JVM

The JVM has two instructions for pushing small integer constants: bipush (the operand must be between Byte.MIN_VALUE and Byte.MAX_VALUE) and sipush (the operand must be between Short.MIN_VALUE and Short.MAX_VALUE). Correspondingly, MethodVisitor in ASM provides an API for these two instructions:
public void visitIntInsn(int opcode, int operand)
So how do I load the other, non-byte/short primitive values in ASM? For example, long, double, int, boolean, and float data? It seems strange that these values get wrapped as Long, Double, and Integer.

Booleans are just ints at the JVM level. Push 0 for false and 1 for true. You don't need bipush for that; you can just use iconst_0 and iconst_1.
Bytes, shorts, chars, and ints are all also just ints at the JVM level. There is no concept of short primitive types in bytecode, only truncation instructions (i2c and company).
If you want to push the int value 111, you would use bipush 111. If you want to push 1111, you would use sipush 1111. If you want to push 111111, you would use ldc 111111. They're all just ints at the bytecode level.
For floats, doubles, and longs, you'll in most cases have to use the ldc* family of instructions. For the special case of 0 or 1 constants, you can instead use lconst_0, dconst_1, etc.
You can create ldc instructions in ASM using the aptly named MethodVisitor.visitLdcInsn.
Note that ldc requires space in the constant pool, which is limited to 65534 slots. However, it's big enough that you'll never run out under practical circumstances. If you really want to, you can push larger primitive values without using the constant pool by using math to construct the values.
For example, int 111111 can be generated by iconst_3 bipush 15 ishl sipush 12807 iadd. Enjarify has code to do this for every primitive type, even doubles, though those usually take many more instructions to represent, for obvious reasons. However, if you're using ASM, you're probably not doing anything unusual and complicated enough to require this approach.
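As a quick sanity check of that decomposition, here is the same arithmetic in plain C (nothing ASM-specific; it just verifies the constant-building math above):
#include <assert.h>

int main(void) {
    /* iconst_3 pushes 3, bipush 15 pushes 15, ishl computes 3 << 15,
       sipush 12807 pushes 12807, and iadd adds them: */
    assert((3 << 15) + 12807 == 111111);
    return 0;
}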

Related

Representing objects of properties and methods in memory

Representing objects of properties and methods in memory: does anyone have a picture or drawing to explain how the computer deals with objects and stores their properties in memory?
Computers do not really store abstract information of that sort at the basic level. There, you essentially have numbers (in binary, but that is not important) and it is generally up to software to interpret these numbers.
In the von Neumann model, which close to every system is based on, you have one big address space. You can index into it, so your CPU can, for example, fetch the number that sits at a given address, or write a new number to an address, and that is mostly all there is to storing data. Usually, but not always, the addresses pick out individual bytes of your memory, but your computer could address larger or smaller word sizes; for example, you might have a computer that addresses 32-bit words instead of 8-bit words. It doesn't matter for the overall model, though. You just have a big block of memory and you can get the data at individual addresses.
How you interpret this data is up to the program. Well, almost. Picture a stretch of memory holding some data: the data is the zero-terminated string "Hello, World\n", but only if we interpret it as an ASCII-encoded string. If we interpreted it as an array of integers instead, then that is what it would be. The hardware doesn't care how you interpret the data.
What makes a computer a von Neumann machine is that both data and program are represented in the same memory. Not only can we get to any data via its address; we can get to the code we want to run as well. There isn't any difference between the two. A program, or a function, or a method, is just an address where you have a sequence of numbers, and the CPU can interpret those numbers as executable code. You can, in theory, point at "Hello, World\n" and then tell the CPU to run it as a program. (I wouldn't recommend it.)
When it comes to executable code, there is the slight difference that the CPU does the interpretation. In your own program, you can mostly choose how to represent data (although there might be some penalty if you want a different representation than what the raw hardware gives you), but the CPU will interpret particular numbers as specific instructions and execute them as such. At least, that is how it works if you run native code; if you have a virtual machine, then the virtual machine is a program that interprets your code, and its interpretation of the data can be quite different from the CPU's. The virtual machine, though, will typically run native code itself, so you are still relying on the CPU's interpretation of numbers, although indirectly.
I should also mention that modern hardware and operating systems do not usually stick with the simple von Neumann model. If you treat program and data as interchangeable, you get some massive security holes. In practice, you have some form of permissions set on different memory blocks: your code has to sit in a block that you are allowed to execute, and your data (typically) is not executable. You can switch the permissions, though, if you want to generate native executable code at runtime, and virtual machines often do this.
Anyway, for simplicity, let's just say that we have a simple von Neumann model. Then both program and data are just chunks of memory that we either interpret as program (which will be executed by the CPU when we tell it to run the code at a given address) or as data (which our software is responsible for interpreting as some higher-level data structure).
There aren't any differences between objects, properties, or other higher-level concepts at this level. Those are entirely dealt with at the level(s) above the hardware. They are simply interpretations of the raw numbers that sit in memory.
Update: a few more details...
Storing objects
The hardware doesn't know anything about objects. It has addresses, and there are numbers (or bit patterns, if you prefer) at those addresses. Most data types span more than one address. If, for example, we can address bytes, but integers take up four bytes (i.e. they are 32-bit integers), then naturally we need four bytes, at four addresses, to represent an integer. They will be represented as four contiguous bytes, and depending on the architecture you might have the most significant byte first or last (this is known as endianness). So the number 10 (which fits in a single byte, but is still a four-byte integer) might be represented as 0x00 0x00 0x00 0x0a or 0x0a 0x00 0x00 0x00. The 0x0a byte is 10, and it might come first or last.
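You can see this on your own machine with a few lines of C (a small sketch; the output depends on your platform's endianness):
#include <stdio.h>

int main(void) {
    int n = 10;
    unsigned char *p = (unsigned char *)&n;
    for (size_t i = 0; i < sizeof n; i++)
        printf("0x%02x ", p[i]);  /* 0x0a 0x00 0x00 0x00 on a little-endian machine */
    printf("\n");
    return 0;
}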
What about structures, then, which are the closest thing to what we usually think of as objects? They are larger blocks of attributes/properties/entries/whatever, and they are represented the same way. Blocks of memory are all we have.
If you have an object that contains two integers, say a representation of a rectangle, then the object sits somewhere in memory and will contain the representation of those two integers.
rect:
h, w: int
I’ve intentionally made up the syntax for this, since it isn’t language specific, and different languages and runtime systems have different variations on how they do this, but they all do something similar.
Here, one representation could be a block of 8 bytes, two 4-byte integers, where the first is h and the second is w. There might be padding between elements, so the objects are aligned the way the hardware prefers, but I will ignore that here.
If the object sits at address 0xaf0de4, that means that h also sits there (assuming that there is no extra information stored in the object), and that w sits four bytes later, at 0xaf0de8, if integers take up four bytes of space. Again, the details will differ, but this is generally how it is done if you know the layout of objects at compile time. (If you don't know the layout until runtime, you will instead have a table of attributes, and the object contains that table.)
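C exposes this layout fairly directly, so you can inspect where the fields land (a sketch; the size and offsets are implementation-defined, but this is the typical result):
#include <stdio.h>
#include <stddef.h>

struct rect { int h, w; };

int main(void) {
    printf("sizeof(struct rect) = %zu\n", sizeof(struct rect)); /* typically 8 */
    printf("offset of h = %zu\n", offsetof(struct rect, h));    /* 0 */
    printf("offset of w = %zu\n", offsetof(struct rect, w));    /* typically 4 */
    return 0;
}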
Now, what happens if an object contains other objects? Say, what if the rectangle is represented by two points instead, and the points are objects?
point:
x, y: int
rect:
p1, p2: point
In the simplest version, nothing changes. The rect object contains two points, so the points are embedded in the memory that represents the rect.
This doesn’t always work, though. If you have polymorphic types, you might not know the concrete type of a contained object, so you cannot allocate memory. In that case, instead of containing the other object, you will have a reference to it, a pointer. The rect object would hold the addresses of the two points, and the points would sit elsewhere in memory. This is also what you have to do if you want to build non-trivial data structures, so it isn’t specific to object orientation or objects.
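The two layouts look like this in C (again a sketch; real language runtimes add object headers and other bookkeeping):
#include <stdio.h>

struct point { int x, y; };
struct rect_embedded   { struct point p1, p2; };   /* points stored inline */
struct rect_referenced { struct point *p1, *p2; }; /* only addresses stored */

int main(void) {
    printf("embedded: %zu bytes, referenced: %zu bytes\n",
           sizeof(struct rect_embedded), sizeof(struct rect_referenced));
    /* On a typical 64-bit machine both happen to be 16 bytes, but the
       referenced form's points live elsewhere and can be of any subtype. */
    return 0;
}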
In an OOP context, there might be a bit more work to it, but we will get to that. First, let’s consider functions (and let’s go back to a rectangle that just holds h and w).
Representation of functions
Code is just blocks of memory as well, but where the numbers represent instructions to the CPU. Let’s say we want to multiply two numbers, then we might have an instruction that looks like
mul a, b, c
that says that the CPU should take the numbers in registers a and b, multiply them, and put the result in register c. You usually have instructions that take the input from memory or as constants or such as well, but let’s just consider a single simple instruction: multiply two numbers you have in registers and put the result in a third register.
The mul instruction has a number. Completely arbitrarily we can say that it is the byte 0xef. The three arguments specify registers, and if they are a byte each we can have up to 256 registers. The full instruction would contain four bytes, the mul instruction 0xef and the three arguments. If we want to multiply register r1 with register r2 and put the result in register r0, the instruction would be
mul r1, r2, r0
0xef 0x01 0x02 0x00
so what the computer sees is the program 0xef 0x01 0x02 0x00.
For functions, we need two things more: a way to return, and a way to handle input and output.
The return bit is easy. There will be a ret instruction that returns to where the function was called, handling stack registers and such in the process. We can pretend that ret has code 0xab.
Input and output is specified by a calling convention, and it isn’t tied to the hardware as such. You need an agreed upon way to pass arguments to functions and you need to know where the result is when the function returns, but that is all there is to it. On our imaginary architecture, we could say that input one and two will be in registers r1 and r2 and that the output should be in r0 when we return. That way, we can make a simple multiplication function
fun mult(a, b): return a * b
with the instructions
mul r1, r2, r0 ; 0xef 0x01 0x02 0x00
ret ; 0xab
and the computer will store it as the numbers 0xef 0x01 0x02 0x00 0xab. If you know where this code/data sits in memory, e.g. 0x00beef, you can call the function with call 0x00beef, using some other instruction call (which also has a number, say 0x10) followed by the address. (An address is typically 8 bytes, or 64 bits, on a desktop, so the three bytes in 0x00beef would have zeros before or after them depending on endianness; I will pretend that we have three-byte addresses to make this more readable.)
To call the function, you first need to get the arguments into the correct registers, so if you want to get the area of our rect object, you want to get h and w into registers r1 and r2.
What you want to do is call
area = mult(rect.h, rect.w)
so how do you get rect.h and rect.w into registers? You need instructions for that. Let’s say that we have a mov instruction (0x12) that looks like this:
mov adr, reg
where adr is an address (3 bytes on this imaginary architecture) and reg is a register (1 byte). The full instruction is 5 bytes (the 0x12 instruction, the 3 byte address and the 1 byte register). If your rect object sits at 0xaf0de4, then we have rect.h at 0xaf0de4 as well, and we have rect.w four bytes later, at 0xaf0de8. Calling mult(rect.h, rect.w) involves these instructions
mov 0xaf0de4, r1 ; rect.h -> r1
mov 0xaf0de8, r2 ; rect.w -> r2
call 0x00beef ; mult(rect.h, rect.w)
; now rect.h * rect.w is in r0
The actual data stored on the computer is the codes for this:
; mov 0xaf0de4, r1
0x12 0xaf 0x0d 0xe4 0x01
; mov 0xaf0de8, r2
0x12 0xaf 0x0d 0xe8 0x02
; call 0x00beef
0x10 0x00 0xbe 0xef
Everything is still just numbers that we can access through addresses.
Here, of course, the addresses we have used are hardwired into the program, and that doesn't work in real life. You don't know where all the objects will be when you compile your program. Some addresses you do know once you fire up your executable: the locations of functions, for example, will be known, and the linker can insert the correct addresses where you need them. The locations of objects, typically not. But there will be instructions like mov that take the address from a register instead of from the program. We could, for example, have an instruction
mov a[offset], b
that moves data from the address stored in register a, plus offset, into register b. It might have another number, say 0x13 instead of 0x12, but in assembly you typically write the same mnemonic, so you don't see the difference there.
You would also have an instruction for putting a constant into a register, and I wouldn’t be surprised if that is also called mov and would have the form
mov a, b
where a is now a constant, i.e. some number, and you put that number in register b. The assembly looks the same, but the instruction might have number 0x14.
Anyway, we could use that to call mult(rect.h, rect.w) instead. Then the code would be
mov 0xaf0de4, r3 ; put the address of rect in r3
; 0x14 0xaf 0x0d 0xe4 0x03
mov r3[0], r1 ; put the value at r3+0 into r1
; 0x13 0x03 0x00 0x01
mov r3[4], r2 ; put the value at r3+4 into r2
; 0x13 0x03 0x04 0x02
call 0x00beef
; 0x10 0x00 0xbe 0xef
If we have these instructions, we could also modify our function mult(a,b) to one that takes a rectangle as input and returns the area
fun area(rect): return rect.h * rect.w
The function can get the address of the object as its single argument, where it would go in register r1, and from there it could load rect.h and rect.w to multiply them.
; area(rect) -- address of rect in r1
mov r1[0], r2 ; rect.h -> r2
mov r1[4], r3 ; rect.w -> r3
mul r2, r3, r0 ; rect.h * rect.w -> r0
ret ; return rect.h * rect.w
It gets more complicated than this, but you should have the idea now. Our functions are sequences of such instructions, and the arguments to them, and the result value, are passed back and forth, usually through registers, by some calling convention. If you want to pass a value to a function, you need to put it in the right register (or on the stack, depending on the calling convention), and then the function will operate on it. What it does with the object is entirely software; the hardware doesn't care much.
Classes and polymorphism
What then if we want polymorphic methods? If we have a class hierarchy of geometric objects and rect is just one of them, and all of them should have an area method that, when called, is dispatched based on the objects’ class?
When you have polymorphic methods, what you really have is a bunch of different functions. If you call x.area() on an object x that happens to be a circle, then you are really calling circle_area(x), while if x is a rect you are calling rect_area(x). The only thing you need to make this work is having a mechanism for dispatching to the right function call.
Here, again, the details differ (a lot), but a simple solution is to put pointers to the correct function in the objects. If you call x.area() maybe you know that the first element in the memory of x is a pointer to its specific area function. So, instead of calling a function directly, you fetch the address of the function from x and then you call it.
x.area() == (x.area_func)(x)
All objects you can call area() on should have this function, and they should have it at the same offset from the address of the object, and then it can be as simple as that.
This can, of course, be wasteful in memory if your classes have lots of methods. You are storing a pointer to each method in each object (and you also have to spend time on initialising this, so there is additional overhead there as well).
Then another solution can be to add a level of indirection. If the methods are the same for all objects of a class (which they often are, but not for all languages) then you can put the table of methods in a class object and have a single pointer to the class in each object. When you need to get the right function you first get the class and then you get the function from it.
x.area() == (x.class.area_func)(x)
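Hand-rolled in C, the class-table scheme looks roughly like this (a sketch with made-up names; real object systems keep much more in the class object):
#include <stdio.h>

struct shape; /* forward declaration */

struct shape_class {
    int (*area_func)(const struct shape *); /* one table slot per method */
};

struct shape {
    const struct shape_class *class; /* single pointer to the class table */
    int h, w;                        /* rect fields, for simplicity */
};

static int rect_area(const struct shape *s) { return s->h * s->w; }
static const struct shape_class rect_class = { rect_area };

int main(void) {
    struct shape r = { &rect_class, 3, 4 };
    /* x.area() == (x.class.area_func)(x): */
    printf("%d\n", r.class->area_func(&r)); /* prints 12 */
    return 0;
}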
With single inheritance, the tables in the different classes can have different sizes, and it doesn’t get more complicated because of that. With multiple inheritance, it does get more complicated, but that is handled very differently in different languages so it is hard to say anything general about that.

Is there any reason to use NSInteger instead of uint8_t with NS_ENUM?

The general standard appears to use NS_ENUM with NSInteger as the base type. Why is this the case? Assuming less than 256 cases (which covers almost any enumeration), is there any reason to use that instead of uint8_t, which could use less memory space? Either imports into Swift fine.
This is different from NS_OPTIONS, where a larger type makes sense, since you shouldn't be doing any bit math with enumerations, and you can use every number representable by the base type as a value.
The answer to the question in the title:
Is there any reason to use NSInteger instead of uint8_t with NS_ENUM?
is probably not.
When declaring an enum in C, if no underlying type is specified, the compiler is free to choose any suitable type, from char and the signed and unsigned integer types, that can represent all the values required. The current Xcode/Clang compiler picks a 4-byte integer. One could reasonably assume the compiler writers made an informed choice: some balance of performance and storage.
Smaller types, such as uint8_t, will usually be aligned on smaller boundaries in memory (or on disc), but that is only of benefit if the adjacent field matches the alignment; e.g. if a 2-byte typed field follows a 1-byte typed field then, unless otherwise specified (e.g. with #pragma pack), there will probably be an intervening unused byte.
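A quick illustration of that padding point in C (a sketch; the exact sizes depend on the ABI, but this is the common result):
#include <stdio.h>

struct unpacked { unsigned char a; unsigned short b; };

#pragma pack(push, 1)
struct packed1 { unsigned char a; unsigned short b; };
#pragma pack(pop)

int main(void) {
    printf("unpacked: %zu bytes, packed: %zu bytes\n",
           sizeof(struct unpacked), sizeof(struct packed1));
    /* Typically prints "unpacked: 4 bytes, packed: 3 bytes": the unpacked
       struct carries one unused padding byte after a. */
    return 0;
}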
Whether any performance or storage differences are significant will be heavily dependent on the application. Follow the usual rule of thumb - don't optimise until an issue is found.
However if you find semantic benefit in limiting the size then certainly do so - there is no general reason you shouldn't. The choice is similar to picking signed vs. unsigned integers, some programmers avoid unsigned types for values that will be ≥ 0 unless absolutely required for the extra range, while others appreciate the semantic benefit.
Summary: there is no right answer; it's largely a subjective issue.
HTH
First of all: the memory footprint is close to completely meaningless here. You are talking about 1 byte vs. 4/8 bytes (if memory alignment does not force the usage of 4/8 bytes regardless of what you chose). How many NS_ENUM (C) values do you have in your running app?
I guess that the reason is pretty simple: NSInteger is the kind of "catch-all" integer type in Cocoa. That makes assignments easier; especially, you do not have to care about assigning a bigger integer type to a smaller one, which without casting would lead to warnings.
Having more than one integer type in a desktop app with a 32/64-bit model is something of an anachronism. Neither a Mac nor a MacBook nor an iPhone is an embedded microcontroller …
You can use any integer data type, including uint8_t, with NS_ENUM, as in:
typedef NS_ENUM(uint8_t, eEnumAddEditViewMode) {
eWBEnumAddMode,
eWBEnumEditMode
};
In the old C-style convention NSInteger is the default, because NSInteger is the "catch-all" integer type in Objective-C, and developers can easily box and unbox values with their own variables. It is just developer-friendly best practice.

Difference between Objective-C primitive numbers

What is the difference between Objective-C's C primitive number types? I know what they are and how to use them (somewhat), but I'm not sure what the capabilities and uses of each one are. Could anyone clear up which ones are best for some scenarios and not others?
int
float
double
long
short
What can I store with each one? I know that some can store more precise numbers and some can only store whole numbers. Say, for example, I wanted to store a latitude (possibly retrieved from a CLLocation object); which one should I use to avoid losing any data?
I also noticed that there are unsigned variants of each one. What does that mean and how is it different from a primitive number that is not unsigned?
Apple has some interesting documentation on this, however it doesn't fully satisfy my question.
Well, first off types like int, float, double, long, and short are C primitives, not Objective-C. As you may be aware, Objective-C is sort of a superset of C. The Objective-C NSNumber is a wrapper class for all of these types.
So I'll answer your question with respect to these C primitives, and how Objective-C interprets them. Basically, each numeric type can be placed in one of two categories: Integer Types and Floating-Point Types.
Integer Types
short
int
long
long long
These can only store, well, integers (whole numbers), and are characterized by two traits: size and signedness.
Size means how much physical memory in the computer a type requires for storage, that is, how many bytes. Technically, the exact memory allocated for each type is implementation-dependent, but there are a few guarantees: (1) char will always be 1 byte; (2) sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long).
Signedness simply means whether or not the type can represent negative values. So a signed integer, or int, can represent a certain range of negative and positive numbers (traditionally -2,147,483,648 to 2,147,483,647), and an unsigned integer, or unsigned int, can represent the same count of numbers, but all non-negative (0 to 4,294,967,295).
Floating-Point Types
float
double
long double
These are used to store decimal values (aka fractions) and are also categorized by size. Again the only real guarantee you have is that sizeof(float) <= sizeof(double) <= sizeof (long double). Floating-point types are stored using a rather peculiar memory model that can be difficult to understand, and that I won't go into, but there is an excellent guide here.
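If you want to see the sizes on your own machine, a few lines of C will print them (the results vary by platform; the standard only promises the orderings above):
#include <stdio.h>

int main(void) {
    printf("short=%zu int=%zu long=%zu long long=%zu\n",
           sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
    printf("float=%zu double=%zu long double=%zu\n",
           sizeof(float), sizeof(double), sizeof(long double));
    return 0;
}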
There's a fantastic blog post about C primitives in an Objective-C context over at RyPress. Lots of intro CS textbooks also have good resources.
Firstly I would like to explain the difference between an unsigned int and an int. Say that you have a very high number, and that you write a loop iterating with an unsigned int:
for(unsigned int i=0; i< N; i++)
{ ... }
If N is a number defined with #define, it may be higher than the maximum value storable in an int, though not in an unsigned int. If i overflows, it will start again from zero and you'll end up in an infinite loop; that's why I prefer to use an int for loops.
The same happens if by mistake you iterate with an int, comparing it to a long. If N is a long, you should iterate with a long; but if N is an int, you can still safely iterate with a long.
Another pitfall that may occur is using the shift operator with an integer constant and then assigning the result to an int or long. Maybe you also log sizeof(long), notice that it returns 8, decide you don't care about portability, and so think that you wouldn't lose precision here:
long i= 1 << 34;
But 1 isn't a long, so the shift overflows as an int, and by the time the result is converted to a long the precision has already been lost. Instead you should write:
long i= 1l << 34;
Newer compilers will warn you about this.
Taken from this question: Converting Long 64-bit Decimal to Binary.
About float and double there is one thing to consider: they use a mantissa and an exponent to represent the number. It's something like:
value= 2^exponent * mantissa
So the higher the exponent, the less exact the floating-point representation. It may also happen that a number is so large that its representation is inaccurate enough that, surprisingly, printing it gives you a different number:
float f = 9876543219124567;
NSLog(@"%.0f", f); // On my machine it prints 9876543585124352
If I use a double it prints 9876543219124568, and if I use a long double with the %.0Lf format it prints the correct value. Always be careful when using floating-point numbers; unexpected things may happen.
For example, it may happen that two floating-point numbers have almost the same value; you expect them to compare equal, but there is a subtle difference, so the equality comparison fails. This has been treated hundreds of times on Stack Overflow, though, so I will just post this link: What is the most effective way for float and double comparison?.
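One common pattern (a sketch, not the only approach; the right tolerance depends entirely on your data) is to compare with a relative epsilon instead of ==:
#include <math.h>
#include <stdio.h>

/* Nonzero when a and b differ by at most eps relative to their magnitude. */
static int nearly_equal(double a, double b, double eps) {
    return fabs(a - b) <= eps * fmax(fabs(a), fabs(b));
}

int main(void) {
    double x = 0.1 + 0.2;
    printf("%d %d\n", x == 0.3, nearly_equal(x, 0.3, 1e-12)); /* prints 0 1 */
    return 0;
}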

How do I access an integer array within a struct/class from in-line assembly (blackfin dialect) using gcc?

Not very familiar with in-line assembly to begin with, and much less with that of the blackfin processor. I am in the process of migrating a legacy C application over to C++, and ran into a problem this morning regarding the following routine:
//
void clear_buffer ( short * buffer, int len ) {
__asm__ (
"/* clear_buffer */\n\t"
"LSETUP (1f, 1f) LC0=%1;\n"
"1:\n\t"
"W [%0++] = %2;"
:: "a" ( buffer ), "a" ( len ), "d" ( 0 )
: "memory", "LC0", "LT0", "LB0"
);
}
I have a class that contains an array of shorts that is used for audio processing:
class AudProc
{
enum { buffer_size = 512 };
short M_samples[ buffer_size * 2 ];
// remaining part of class omitted for brevity
};
Within the AudProc class I have a method that calls clear_buffer, passing it the samples array:
clear_buffer ( M_samples, sizeof ( M_samples ) / 2 );
This generates a "Bus Error" and aborts the application.
I have tried making the array public, and that produces the same result. I have also tried making it static; that allows the call to go through without error, but no longer allows for multiple instances of my class as each needs its own buffer to work with. Now, my first thought is, it has something to do with where the buffer is in memory, or from where it is being accessed. Does something need to be changed in the in-line assembly to make this work, or in the way it is being called?
Thought that this was similar to what I was trying to accomplish, but it is using a different dialect of asm, and I can't figure out if it is the same problem I am experiencing or not:
GCC extended asm, struct element offset encoding
Anyone know why this is occurring and how to correct it?
Does anyone know where there is helpful documentation regarding the blackfin asm instruction set? I've tried looking on the ADSP site, but to no avail.
I would suspect that you could define your clear_buffer as
inline void clear_buffer (short * buffer, int len) {
memset (buffer, 0, sizeof(short)*len);
}
and GCC is probably able to optimize that cleverly (when invoked with -O2 or -O3), because GCC knows about memset.
To understand assembly code, I suggest running gcc -S -O -fverbose-asm on some small C file, then to look inside the produced .s file.
I will have to take a guess, because I don't know Blackfin assembler:
That LC0 sounds like "loop counter", and LSETUP looks like a macro/insn which, well, sets up a loop between two labels and with a certain loop counter.
The "%0" operands is apparently the address to write to and we can safely guess it's incremented in the loop, in other words it's both an input and output operand and should be described as such.
Thus, I suggest describing it as an input-output operand, using the "+" constraint modifier, as follows:
void clear_buffer ( short * buffer, int len ) {
__asm__ (
"/* clear_buffer */\n\t"
"LSETUP (1f, 1f) LC0=%1;\n"
"1:\n\t"
"W [%0++] = %2;"
: "+a" ( buffer )
: "a" ( len ), "d" ( 0 )
: "memory", "LC0", "LT0", "LB0"
);
}
This is, of course, just a hypothesis, but you could disassemble the code and check if by any chance GCC allocated the same register for "%0" and "%2".
PS. Actually, "+a" alone should be enough; an early-clobber is irrelevant here.
For anyone else who runs into a similar circumstance: the problem here was not with the in-line assembly, nor with the way it was being called; it was with the classes/structs in the program. The class that I believed to be the offender was not the problem. There was another class that held an instance of it, and due to other members of that outer class, the inner one was not aligned on a word boundary. This was causing the "Bus Error" I was experiencing. I had not come across this before because the classes were not declared with __attribute__((packed)) in other code, but they are in my implementation.
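A minimal illustration of that situation (hypothetical field names; the point is only where the inner object lands):
#include <stdio.h>
#include <stddef.h>

struct inner { short samples[4]; };

struct __attribute__((packed)) outer {
    char tag;        /* 1 byte; packed means no padding is inserted after it */
    struct inner in; /* starts at offset 1, misaligned for 2-byte shorts */
};

int main(void) {
    printf("offset of in = %zu\n", offsetof(struct outer, in)); /* prints 1 */
    return 0;
}
On a processor that requires word-aligned accesses, like the Blackfin, code that assumes the shorts are 2-byte aligned will then fault.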
Giving Type Attributes - Using the GNU Compiler Collection (GCC) a read was what actually sparked the answer for me. Two particular attributes that affect memory alignment (and, thus, in-line assembly such as I am using) are packed and aligned.
As taken from the aforementioned link:
aligned (alignment)
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:
struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));
force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on an 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.
Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, but the notation illustrated in the example above is a more obvious, intuitive, and readable way to request the compiler to adjust the alignment of an entire struct or union type.
As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given struct or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:
struct S { short f[3]; } __attribute__ ((aligned));
Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment that is ever used for any data type on the target machine you are compiling for. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables that have types that you have aligned this way.
In the example above, if the size of each short is 2 bytes, then the size of the entire struct S type is 6 bytes. The smallest power of two that is greater than or equal to that is 8, so the compiler sets the alignment for the entire struct S type to 8 bytes.
Note that although you can ask the compiler to select a time-efficient alignment for a given type and then declare only individual stand-alone objects of that type, the compiler's ability to select a time-efficient alignment is primarily useful only when you plan to create arrays of variables having the relevant (efficiently aligned) type. If you declare or use arrays of variables of an efficiently-aligned type, then it is likely that your program also does pointer arithmetic (or subscripting, which amounts to the same thing) on pointers to the relevant type, and the code that the compiler generates for these pointer arithmetic operations is often more efficient for efficiently-aligned types than for other types.
The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well. See below.
Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.
packed
This attribute, attached to struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.
Specifying this attribute for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members. Specifying the -fshort-enums flag on the command line is equivalent to specifying the packed attribute on all enum definitions.
In the following example struct my_packed_struct's members are packed closely together, but the internal layout of its s member is not packed—to do that, struct my_unpacked_struct needs to be packed too.
struct my_unpacked_struct
{
char c;
int i;
};
struct __attribute__ ((__packed__)) my_packed_struct
{
char c;
int i;
struct my_unpacked_struct s;
};
You may only specify this attribute on the definition of an enum, struct or union, not on a typedef that does not also define the enumerated type, structure or union.
The problem which I was experiencing was specifically due to the use of packed. I attempted to simply add the aligned attribute to the structs and classes, but the error persisted. Only removing the packed attribute resolved the problem. For now, I am leaving the aligned attribute on them and testing to see if I find any improvements in the efficiency of the code as mentioned above, simply due to their being aligned on word boundaries. The application makes use of arrays of these structures, so perhaps there will be better performance, but only profiling the code will say for certain.

Memcpy and Memset on structures of Short Type in C

I have a query about using memset and memcpy on structures and their reliability. For example:
I have code that looks like this:
typedef struct
{
short a[10];
short b[10];
}tDataStruct;
tDataStruct m,n;
memset(&m, 2, sizeof(m));
memcpy(&n,&m,sizeof(m));
My questions are:
1) With memset, if I set to 0 it is fine, but when setting 2 I get m.a and m.b as 514 instead of 2. When I make them char instead of short it is fine. Does this mean we cannot use memset for any initialization other than 0? Is it a limitation of short, for example?
2) Is it reliable to memcpy between the two structures above containing shorts? I have huge arrays a,b,c,d,e... and I need to make sure the copy is a perfect one-to-one.
3) Am I better off using memset and memcpy on individual arrays rather than collecting them in a structure as above?
One more query: in the structure above I have arrays of values. But what if I am passed pointers to these arrays, and I want to collect those pointers in a structure?
typedef struct
{
short *pa[10];
short *pb[10];
}tDataStruct;
tDataStruct m,n;
memset(&m, 2, sizeof(m));
memcpy(&n,&m,sizeof(m));
In this case, memset or memcpy only changes the addresses rather than the values. How do I change the values instead? Is the prototype wrong?
Please suggest. Your inputs are very important.
Thanks
dsp guy
1. memset sets bytes, not shorts. Always. 514 = (256*2) + (1*2): the 2s appear on byte boundaries (see the short sketch after this list).
1.a. This does, admittedly, lessen its usefulness for purposes such as you're attempting (array fill).
2. It is reliable as long as both structs are of the same type. Just to be clear, these structures are NOT of "type short" as you suggest; they are structures containing arrays of short.
3. If I understand your question, I don't believe it matters, as long as they are of the same type.
Just remember, these are byte-level operations, nothing more, nothing less. See also this.
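Here is the 514 behaviour in miniature (the value 0x0202 reads the same from either end, so endianness doesn't matter in this case):
#include <stdio.h>
#include <string.h>

int main(void) {
    short s;
    memset(&s, 2, sizeof s); /* writes the byte 0x02 into both bytes of s */
    printf("%d\n", s);       /* prints 514, i.e. 0x0202 */
    return 0;
}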
For the second part of your question, try
memset(m.pa, 0, sizeof(*(m.pa)));
memset(m.pb, 0, sizeof(*(m.pb)));
Note the two operations to copy to two different addresses (m.pa and m.pb are effectively addresses, as you recognized). Note also the sizeof: not the sizeof of the references, but the sizeof of what's being referenced. Similarly for memcpy.
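If the goal is instead to set the values the pointers refer to, they have to be dereferenced one at a time. A hedged sketch, assuming every entry in pa and pb points at a single valid short (the struct name here is made up):
typedef struct {
    short *pa[10];
    short *pb[10];
} tPtrStruct;

/* Zero the shorts that the collected pointers refer to. */
static void zero_pointed_to(tPtrStruct *m) {
    for (int i = 0; i < 10; i++) {
        *m->pa[i] = 0;
        *m->pb[i] = 0;
    }
}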