Memcpy and memset on structures of short type in C

I have a question about using memset and memcpy on structures and their reliability. For example, I have code that looks like this:
typedef struct
{
    short a[10];
    short b[10];
} tDataStruct;

tDataStruct m, n;
memset(&m, 2, sizeof(m));
memcpy(&n, &m, sizeof(m));
My questions are:
1): memset with 0 is fine, but when setting 2 I get m.a and m.b as 514 instead of 2. When I make them char instead of short it is fine. Does this mean we cannot use memset for any initialization other than 0? Is it a limitation of short, for example?
2): Is it reliable to memcpy between two structures of short like the ones above? I have huge arrays a, b, c, d, e..., and I need to make sure the copy is a perfect one-to-one.
3): Am I better off using memset and memcpy on the individual arrays rather than collecting them in a structure as above?
One more query:
In the structure above I have arrays of variables. But what if I am passed pointers to these arrays and I want to collect those pointers in a structure?
typedef struct
{
    short *pa[10];
    short *pb[10];
} tDataStruct;

tDataStruct m, n;
memset(&m, 2, sizeof(m));
memcpy(&n, &m, sizeof(m));
In this case, if I memset or memcpy, it only changes the addresses rather than the values. How do I change the values instead? Is the prototype wrong?
Please suggest. Your inputs are very important.
Thanks
dsp guy

memset sets bytes, not shorts. Always. 514 = (256*2) + (1*2): the 2s appear on byte boundaries.
1.a. This does, admittedly, lessen its usefulness for purposes such as you're attempting (array fill).
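To make the byte-filling concrete, here is a minimal sketch (the array name is made up) showing both the 514 effect and the plain loop that does what you actually wanted:

#include <stdio.h>
#include <string.h>

int main(void)
{
    short a[10];
    memset(a, 2, sizeof(a));          /* writes the byte 0x02 into all 20 bytes */
    printf("%d\n", a[0]);             /* prints 514, i.e. 0x0202 */

    for (int i = 0; i < 10; ++i)      /* a plain loop is the portable way to fill shorts */
        a[i] = 2;
    printf("%d\n", a[0]);             /* prints 2 */
    return 0;
}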
2. Reliable as long as both structs are of the same type. Just to be clear, these structures are NOT of "type short" as you suggest.
3. If I understand your question, I don't believe it matters as long as they are of the same type.
Just remember, these are byte-level operations, nothing more, nothing less.
For the second part of your question, try
memset(m.pa, 0, sizeof(*(m.pa)));
memset(m.pb, 0, sizeof(*(m.pb)));
Note two operations on two different addresses (m.pa and m.pb are effectively addresses, as you recognized). Note also the sizeof: not sizeof the references, but sizeof what's being referenced. Similarly for memcpy.
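If each pointer in the struct refers to a whole array, you have to clear each target yourself. A sketch, assuming (hypothetically) that every pa[i] and pb[i] points at LEN valid shorts:

#include <string.h>

#define LEN 10  /* hypothetical length of each pointed-to array */

typedef struct
{
    short *pa[10];
    short *pb[10];
} tPtrStruct;   /* renamed here to avoid clashing with tDataStruct above */

void clear_targets(tPtrStruct *p)
{
    for (int i = 0; i < 10; ++i)
    {
        memset(p->pa[i], 0, LEN * sizeof(short));  /* clears the pointed-to data, not the pointer */
        memset(p->pb[i], 0, LEN * sizeof(short));
    }
}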

Related

Is there any reason to use NSInteger instead of uint8_t with NS_ENUM?

The general standard appears to use NS_ENUM with NSInteger as the base type. Why is this the case? Assuming less than 256 cases (which covers almost any enumeration), is there any reason to use that instead of uint8_t, which could use less memory space? Either imports into Swift fine.
This is different from NS_OPTIONS, where a larger type makes sense, since you shouldn't be doing any bit math with enumerations, and you can use every number representable by the base type as a value.
The answer to the question in the title:
Is there any reason to use NSInteger instead of uint8_t with NS_ENUM?
is probably not.
When declaring an enum in C if no underlying type is specified the compiler is free to choose any suitable type from char and the signed and unsigned integer types which can at least represent all the values required. The current Xcode/Clang compiler picks a 4-byte integer. One could reasonably assume the compiler writers made an informed choice - some balance of performance and storage.
Smaller types, such as uint8_t, will usually be aligned on smaller boundaries in memory (or on disc), but that is only of benefit if the adjacent field matches the alignment, e.g. if a 2-byte field follows a 1-byte field then, unless otherwise specified (e.g. with #pragma pack), there will probably be an intervening unused byte.
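A quick sketch of that padding effect (the exact layout is implementation-defined, but this is typical):

#include <stdio.h>
#include <stdint.h>

struct pair
{
    uint8_t  small;  /* 1 byte */
    uint16_t big;    /* 2 bytes, but 2-byte aligned */
};

int main(void)
{
    printf("%zu\n", sizeof(struct pair));  /* typically prints 4: one unused padding byte after 'small' */
    return 0;
}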
Whether any performance or storage differences are significant will be heavily dependent on the application. Follow the usual rule of thumb - don't optimise until an issue is found.
However if you find semantic benefit in limiting the size then certainly do so - there is no general reason you shouldn't. The choice is similar to picking signed vs. unsigned integers, some programmers avoid unsigned types for values that will be ≥ 0 unless absolutely required for the extra range, while others appreciate the semantic benefit.
Summary: there is no right answer; it's largely a subjective issue.
HTH
First of all: the memory footprint is close to completely meaningless. You are talking about 1 byte vs. 4/8 bytes (if memory alignment does not force the usage of 4/8 bytes regardless of what you chose). How many NS_ENUM (C) objects do you want to have in your running app?
I guess that the reason is pretty simple: NSInteger is the "catch all" integer type in Cocoa. That makes assignments easier; in particular, you do not have to care about assigning a bigger integer type to a smaller one, which without casting would lead to warnings.
Having more than one integer type in a desktop app with a 32/64-bit model is something of an anachronism. Neither a Mac, nor a MacBook, nor an iPhone is an embedded microcontroller …
You can use any integer data type, including uint8_t, with NS_ENUM:
typedef NS_ENUM(uint8_t, eEnumAddEditViewMode) {
    eWBEnumAddMode,
    eWBEnumEditMode
};
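A quick way to confirm the storage claim (a sketch reusing the names above; the NSInteger figure assumes a 64-bit platform):

eEnumAddEditViewMode mode = eWBEnumAddMode;
// sizeof(mode) == 1 here, versus 8 for an NSInteger-backed enum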
In the old C-style standard, NSInteger is the default because NSInteger is the "catch all" integer type in Objective-C, and developers can easily box and unbox values with their own variables. This is just developer-friendly best practice.

How do I access an integer array within a struct/class from in-line assembly (blackfin dialect) using gcc?

Not very familiar with in-line assembly to begin with, and much less with that of the blackfin processor. I am in the process of migrating a legacy C application over to C++, and ran into a problem this morning regarding the following routine:
void clear_buffer ( short * buffer, int len ) {
    __asm__ (
        "/* clear_buffer */\n\t"
        "LSETUP (1f, 1f) LC0=%1;\n"
        "1:\n\t"
        "W [%0++] = %2;"
        :: "a" ( buffer ), "a" ( len ), "d" ( 0 )
        : "memory", "LC0", "LT0", "LB0"
    );
}
I have a class that contains an array of shorts that is used for audio processing:
class AudProc
{
    enum { buffer_size = 512 };
    short M_samples[ buffer_size * 2 ];
    // remaining part of class omitted for brevity
};
Within the AudProc class I have a method that calls clear_buffer, passing it the samples array:
clear_buffer ( M_samples, sizeof ( M_samples ) / 2 );
This generates a "Bus Error" and aborts the application.
I have tried making the array public, and that produces the same result. I have also tried making it static; that allows the call to go through without error, but no longer allows for multiple instances of my class as each needs its own buffer to work with. Now, my first thought is, it has something to do with where the buffer is in memory, or from where it is being accessed. Does something need to be changed in the in-line assembly to make this work, or in the way it is being called?
Thought that this was similar to what I was trying to accomplish, but it is using a different dialect of asm, and I can't figure out if it is the same problem I am experiencing or not:
GCC extended asm, struct element offset encoding
Anyone know why this is occurring and how to correct it?
Does anyone know where there is helpful documentation regarding the blackfin asm instruction set? I've tried looking on the ADSP site, but to no avail.
I would suspect that you could define your clear_buffer as
inline void clear_buffer (short * buffer, int len) {
    memset (buffer, 0, sizeof(short)*len);
}
and GCC is probably able to optimize that cleverly (when invoked with -O2 or -O3), because GCC knows about memset.
To understand assembly code, I suggest running gcc -S -O -fverbose-asm on some small C file, then to look inside the produced .s file.
I will have to take a guess, because I don't know Blackfin assembler:
That LC0 sounds like "loop counter", and LSETUP looks like a macro/insn which, well, sets up a loop between two labels with a certain loop counter.
The "%0" operand is apparently the address to write to, and we can safely guess it's incremented in the loop; in other words, it's both an input and an output operand and should be described as such.
Thus, I suggest describing it as an input-output operand, using the "+" constraint modifier, as follows:
void clear_buffer ( short * buffer, int len ) {
    __asm__ (
        "/* clear_buffer */\n\t"
        "LSETUP (1f, 1f) LC0=%1;\n"
        "1:\n\t"
        "W [%0++] = %2;"
        : "+a" ( buffer )
        : "a" ( len ), "d" ( 0 )
        : "memory", "LC0", "LT0", "LB0"
    );
}
This is, of course, just a hypothesis, but you could disassemble the code and check if by any chance GCC allocated the same register for "%0" and "%2".
P.S. Actually, just "+a" should be enough; early-clobber is irrelevant here.
For anyone else who runs into a similar circumstance, the problem here was not with the in-line assembly, nor with the way it was being called: it was with the classes / structs in the program. The class that I believed to be the offender was not the problem - there was another class that held an instance of it, and due to other members of that outer class, the inner one was not aligned on a word boundary. This was causing the "Bus Error" that I was experiencing. I had not come across this before because the classes were not declared with __attribute__((packed)) in other code, but they are in my implementation.
Giving Type Attributes - Using the GNU Compiler Collection (GCC) a read was what actually sparked the answer for me. Two particular attributes that affect memory alignment (and, thus, in-line assembly such as I am using) are packed and aligned.
As taken from the aforementioned link:
aligned (alignment)
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:
struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));
force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on a 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.
Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, but the notation illustrated in the example above is a more obvious, intuitive, and readable way to request the compiler to adjust the alignment of an entire struct or union type.
As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given struct or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:
struct S { short f[3]; } __attribute__ ((aligned));
Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment that is ever used for any data type on the target machine you are compiling for. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables that have types that you have aligned this way.
In the example above, if the size of each short is 2 bytes, then the size of the entire struct S type is 6 bytes. The smallest power of two that is greater than or equal to that is 8, so the compiler sets the alignment for the entire struct S type to 8 bytes.
Note that although you can ask the compiler to select a time-efficient alignment for a given type and then declare only individual stand-alone objects of that type, the compiler's ability to select a time-efficient alignment is primarily useful only when you plan to create arrays of variables having the relevant (efficiently aligned) type. If you declare or use arrays of variables of an efficiently-aligned type, then it is likely that your program also does pointer arithmetic (or subscripting, which amounts to the same thing) on pointers to the relevant type, and the code that the compiler generates for these pointer arithmetic operations is often more efficient for efficiently-aligned types than for other types.
The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well. See below.
Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.
packed
This attribute, attached to struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.
Specifying this attribute for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members. Specifying the -fshort-enums flag on the command line is equivalent to specifying the packed attribute on all enum definitions.
In the following example struct my_packed_struct's members are packed closely together, but the internal layout of its s member is not packed—to do that, struct my_unpacked_struct needs to be packed too.
struct my_unpacked_struct
{
    char c;
    int i;
};

struct __attribute__ ((__packed__)) my_packed_struct
{
    char c;
    int i;
    struct my_unpacked_struct s;
};
You may only specify this attribute on the definition of an enum, struct or union, not on a typedef that does not also define the enumerated type, structure or union.
The problem which I was experiencing was specifically due to the use of packed. I attempted to simply add the aligned attribute to the structs and classes, but the error persisted. Only removing the packed attribute resolved the problem. For now, I am leaving the aligned attribute on them and testing to see if I find any improvements in the efficiency of the code as mentioned above, simply due to their being aligned on word boundaries. The application makes use of arrays of these structures, so perhaps there will be better performance, but only profiling the code will say for certain.
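As an illustration of why packed can trigger a bus error like this, here is a minimal sketch (the names are made up) of a packed outer struct leaving an inner array of shorts on an odd address:

#include <stdio.h>
#include <stddef.h>

struct inner { short buf[4]; };   /* shorts normally need 2-byte alignment */

struct __attribute__((packed)) outer
{
    char tag;          /* 1 byte, and packed means no padding after it */
    struct inner in;   /* now begins at offset 1: misaligned for 16-bit access */
};

int main(void)
{
    printf("%zu\n", offsetof(struct outer, in));  /* prints 1, not 2 */
    return 0;
}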

How are the digits in ObjC method type encoding calculated?

This is a follow-up to my previous question:
What are the digits in an ObjC method type encoding string?
Say there is an encoding:
v24#0:4:8#12B16#20
How are those numbers calculated? B is a char so it should occupy just 1 byte (not 4 bytes). Does it have something to do with "alignment"? What is the size of void?
Is it correct to calculate the numbers as follows? Ask sizeof on every item and round up the result to multiple of 4? And the first number becomes the sum of all the other ones?
The numbers were used in the m68K days to denote stack layout. That is, you could literally decode the method signature and, for just about all types, know exactly which bytes at what offset within the stack frame you could diddle to get/set arguments.
This worked because the m68K's ABI was entirely [IIRC -- been a long long time] stack based argument/return passing. There wasn't anything shoved into registers across call boundaries.
However, as Objective-C was ported to other platforms, always-on-the-stack was no longer the calling convention. Arguments and return values are often passed in registers.
Thus, those offsets are now useless. As well, the type encoding used by the compiler is no longer complete (because it never was terribly useful) and there will be types that won't be encoded. Not to mention that encoding some C++ templatized types yields method type encoding strings that can be many kilobytes in size (I think the record I ran into was around 30K of type information).
So, no, it isn't correct to use sizeof() to generate the numbers because they are effectively meaningless to everything. The only reason why they still exist is for binary compatibility; there are bits of esoteric code here and there that still parse the type encoding string with the expectation that there will be random numbers sprinkled here and there.
Note that there are vestiges of API in the ObjC runtime that still lead one to believe that it might be possible to encode/decode stack frames on the fly. It really isn't as the C ABI doesn't guarantee that argument registers will be preserved across call boundaries in the face of optimization. You'd have to drop to assembly and things get ugly really really fast (>shudder<).
The full encoding string is constructed (in clang) by the method ASTContext::getObjCEncodingForMethodDecl, which you can find in lib/AST/ASTContext.cpp.
The method that does the size rounding is ASTContext::getObjCEncodingTypeSize, in the same file. It forces each size to be at least the size of an int. On all of Apple's current platforms, an int is 4 bytes.
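As a worked example (assuming 4-byte pointers, as on the 32-bit platforms this encoding style dates from), v24#0:4:8#12B16#20 reads: return type v (void); 24 total argument bytes; # (a Class, here the receiver) at offset 0; : (_cmd) at 4; a SEL argument at 8; a Class at 12; B (BOOL) at 16, whose single byte is rounded up to the 4-byte int size; and a final Class at 20. Six 4-byte slots account for the leading 24.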
The stack frame size and argument offsets are calculated by the compiler. I'm actually trying to track this down in the Clang source myself this week; it possibly has something to do with CodeGenTypes::arrangeObjCMessageSendSignature. (Looks like Rob just made my life a lot easier!)
The first number is the sum of the others, yes -- it's the total space occupied by the arguments. To get the size of the type represented by an ObjC type encoding in your code, you should use NSGetSizeAndAlignment().

passing primitive or struct type as function argument

I'm trying to write some reasonably generic networking code. I have several kinds of packets, each represented by a different struct. The function where all my sending occurs looks like:
- (void)sendUpdatePacket:(MyPacketType)packet {
    for (NSNetService *service in _services)
        for (NSData *address in [service addresses])
            sendto(_socket, &packet, sizeof(packet), 0, [address bytes], [address length]);
}
I would really like to be able to send this function ANY kind of packet, not just MyPacketType packets.
I thought maybe if the function def was:
- (void)sendUpdatePacket:(void*)packetRef
I could pass in any kind of pointer to a packet. But without knowing the type of packet, I can't dereference the pointer.
How do I write a function to accept any kind of primitive/struct as its argument?
What you are trying to achieve is polymorphism, which is an OO concept.
So while this would be quite easy to implement in C++ (or other OO languages), it's a bit more challenging in C.
One way you could get around it is to create a generic "packet" structure such as this:
typedef struct {
    void* messageHandler;
    int messageLength;
    int* messageData;
} packet;
Where the messageHandler member is a function pointer to a callback routine which can process the message type, and the messageLength and messageData members are fairly self-explanatory.
The idea is that the method which you pass the packet struct to would use the Tell, Don't Ask principle to invoke the specific message handler pointed to by messageHandler, passing in the messageLength and messageData without interpreting them.
The dispatch function (pointed to by messageHandler) would be message-specific and will be able to cast the messageData to the appropriate meaningful type, and then the meaningful fields can be extracted from it and processed, etc.
Of course, this is all much easier and more elegant in C++ with inheritance, virtual methods and the like.
Edit:
In response to the comment:
I'm a little unclear how "able to cast the messageData to the appropriate meaningful type, and then the meaningful fields can be extracted from it and processed, etc." would be accomplished.
You would implement a handler for a specific message type, and set the messageHandler member to be a function pointer to this handler. For example:
void messageAlphaHandler(int messageLength, int* messageData)
{
    MessageAlpha* myMessage = (MessageAlpha*)messageData;
    // Can now use MessageAlpha members...
    int messageField = myMessage->field1;
    // etc...
}
You would define messageAlphaHandler() in such a way as to allow any class to get a function pointer to it easily. You could do this on startup of the application so that the message handlers are registered from the beginning.
Note that for this system to work, all message handlers would need to share the same function signature (i.e. return type and parameters).
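For example, a minimal dispatch sketch (the names are hypothetical; this variant types the handler member as a function pointer rather than the void* shown above):

typedef void (*handler_fn)(int messageLength, int* messageData);

typedef struct
{
    handler_fn messageHandler;  /* message-specific callback */
    int        messageLength;
    int*       messageData;
} packet;

void dispatch(packet* p)
{
    /* Tell, Don't Ask: hand the payload to whatever handler the packet carries */
    p->messageHandler(p->messageLength, p->messageData);
}

/* registration might look like: myPacket.messageHandler = messageAlphaHandler; */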
Or for that matter, how messageData would be created in the first place from my struct.
How are you getting your packet data? Are you creating it manually, or reading it off a socket? Either way, you need to encode it somewhere as a string of bytes. The int* member (messageData) is merely a pointer to the start of the encoded data; the messageLength member is the length of this encoded data.
In your message handler callback, you probably don't want to continue manipulating the data as raw binary/hex data, but instead interpret the information in a meaningful fashion according to the message type.
Casting it to a struct essentially maps the raw binary information on to a meaningful set of attributes matching to the protocol of the message you are processing.
The key is that you must realize that everything in a computer is just an array of bytes (or, words, or double words).
ZEN MASTER MUSTARD is sitting at his desk staring at his monitor staring at a complex pattern of seemingly random characters. A STUDENT approaches.
Student: Master? May I interrupt?
Zen Master Mustard: You have answered your own inquiry, my son.
S: What?
ZMM: By asking your question about interrupting me, you have interrupted me.
S: Oh, sorry. I have a question about moving structures of varying size from place to place.
ZMM: If that it true, then you should consult a master who excels at such things. I suggest, you pay a visit to Master DotPuft, who has great knowledge in moving large metal structures, such as tracking radars, from place to place. Master DotPuft can also cause the slightest elements of a feather-weight strain gage to move with the force of a dove's breath. Turn right, then turn left when you reach the door of the hi-bay. There dwells Master DotPuft.
S: No, I mean moving large structures of varying sizes from place to place in the memory of a computer.
ZMM: I may assist you in that endeavor, if you wish. Describe your problem.
S: Specifically, I have a C function that I want to accept several different types of structs (they will be representing different types of packets). So my struct packets will be passed to my function as void*. But without knowing the type, I can't cast them, or really do much of anything. I know this is a solvable problem, because sendto() from socket.h does exactly that:
ssize_t sendto(int socket, const void *message, size_t length, int flags, const struct sockaddr *dest_addr,socklen_t dest_len);
where sendto would be called like:
sendto(socketAddress, &myPacket, sizeof(myPacket), Other args....);
ZMM: Did you describe your problem to Zen Master MANTAR! ?
S: Yeah, he said, "It's just a pointer. Everything in C is a pointer." When I asked him to explain, he said, "Bok, bok, get the hell out of my office."
ZMM: Truly, you have spoken to the master. Did this not help you?
S: Um, er, no. Then I asked Zen Master Max.
ZMM: Wise is he. Was his advice useful to you?
S: No. When I asked him about sendto(), he just swirled his fists in the air and said, "It's just an array of bytes."
ZMM: Indeed, Zen Master Max has tao.
S: Yeah, he has tao, but how do I deal with function arguments of type void*?
ZMM: To learn, you must first unlearn. The key is that you must realize that everything in a computer is just an array of bytes (or words, or double words). Once you have a pointer to the beginning of a buffer, and the length of the buffer, you can send it anywhere without any need to know the type of data placed in the buffer.
S: OK.
ZMM: Consider a string of man-readable text. "You plan a tower that will pierce the clouds? Lay first the foundation of humility." It is 82 bytes long. Or, perhaps, 164 if the evil Unicode is used. Guard yourself against the lies of Unicode! I can submit this text to sendto() by providing a pointer to the beginning of the buffer that contains the string, and the length of the buffer, like so:
char characterBuffer[300]; // 300 bytes
strcpy(characterBuffer, "You plan a tower that will pierce the clouds? Lay first the foundation of humility.");
// note that sizeof(characterBuffer) evaluates to 300 bytes.
sendto(socketAddress, &characterBuffer, sizeof(characterBuffer));
ZMM: Note well that the number of bytes in the character buffer is automatically calculated by the compiler. The number of bytes occupied by any variable has a type called "size_t". It is likely equivalent to the type "long" or "unsigned int", but it is compiler-dependent.
S: Well, what if I want to send a struct?
ZMM: Let us send a struct, then.
struct myStruct
{
    int integerField; // 4 bytes
    char characterField[300]; // 300 bytes
    float floatField; // 4 bytes
} myStruct;

myStruct.integerField = 8765309;
strcpy(myStruct.characterField, "Jenny, I got your number.");
myStruct.floatField = 876.5309;
// sizeof(myStruct) evaluates to 4 + 300 + 4 = 308 bytes
sendto(socketAddress, &myStruct, sizeof(myStruct));
S: Yeah, that's great at transmitting things over TCP/IP sockets. But what about the poor receiving function? How can it tell if I am sending a character array or a struct?
ZMM: One way is to enumerate the different types of data that may be sent, and then send the type of data along with the data. Zen Masters refer to this as "metadata", that is to say, "data about the data". Your receiving function must examine the metadata to determine what kind of data (struct, float, character array) is being sent, and then use this information to cast the data back into its original type. First, consider the transmitting function:
typedef enum
{
    INTEGER_IN_THE_PACKET = 0,
    STRING_IN_THE_PACKET = 1,
    STRUCT_IN_THE_PACKET = 2
} typeBeingSent;

typedef struct
{
    typeBeingSent dataType;
    char data[4096];
} Packet_struct;

Packet_struct myPacket;

myPacket.dataType = STRING_IN_THE_PACKET;
strcpy(myPacket.data, "Nothing great is ever achieved without much enduring.");
sendto(socketAddress, &myPacket, sizeof(Packet_struct));

myPacket.dataType = STRUCT_IN_THE_PACKET;
memcpy(myPacket.data, (void*)&myStruct, sizeof(myStruct));
sendto(socketAddress, &myPacket, sizeof(Packet_struct));
S: All right.
ZMM: Now, let us walk along with the receiving function. It must query the type of the data that was sent and then copy the data into a variable declared of that type. Forgive me, but I forget the exact form of the recvfrom() function.
char receivedString[300];
struct myStruct receivedStruct;

recvfrom(socketDescriptor, &myPacket, sizeof(myPacket));

switch(myPacket.dataType)
{
case STRING_IN_THE_PACKET:
    // note the cast of the void* data into type "character pointer"
    strcpy(receivedString, (char*)myPacket.data);
    printf("The string in the packet was \"%s\".\n", receivedString);
    break;
case STRUCT_IN_THE_PACKET:
    // note the cast of the void* into type "pointer to myStruct"
    memcpy(&receivedStruct, (struct myStruct *)&myPacket.data, sizeof(receivedStruct));
    break;
}
ZMM: Have you achieved enlightenment? First, one asks the compiler for the size of the data (a.k.a. the number of bytes) to be submitted to sendto(). The type of the original data is sent along as well. The receiver then queries the type of the original data and uses it to perform the correct cast from "pointer to void" (a generic pointer) over to the type of the original data (int, char[], a struct, etc.)
S: Well, I'll give it a try.
ZMM: Go in peace.

Is there a practical limit to the size of bit masks?

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.
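For example, the usual set/test idiom looks like this (the constant names are made up):

#define PERM_EXEC  1  /* 2^0 */
#define PERM_WRITE 2  /* 2^1 */
#define PERM_READ  4  /* 2^2 */

int perms = PERM_READ | PERM_WRITE | PERM_EXEC;   /* 4 + 2 + 1 == 7 */
int can_write = (perms & PERM_WRITE) != 0;        /* bitwise test for one flag */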
I use this technique in several web applications, where I'd usually store the variable into a field and give it a type of MEDIUMINT or whatever, depending on the number of different values.
What I'm interested in, is whether or not there is a practical limit to the number of values you can store like this? For example, if the number was over 64, you couldn't use (64 bit) integers any more. If this was the case, what would you use? How would it affect your program logic (ie: could you still use bitwise comparisons)?
I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.
Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):
// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid (offset is unsigned, so no lower-bound check is needed)
    if(offset >= (num_bytes << 3)) { return -1; }

    // set the right bit
    bytes[offset >> 3] |= (1 << (offset & 0x7));
    return 0; // success
}

// gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid
    if(offset >= (num_bytes << 3)) { return -1; }

    // get the right bit
    return bytes[offset >> 3] & (1 << (offset & 0x7));
}
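Usage might look like this (a quick sketch built on the two functions above):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char flags[16];                   /* 16 bytes = room for 128 flags */
    memset(flags, 0, sizeof(flags));

    set_bit(flags, sizeof(flags), 100);
    printf("%d\n", get_bit(flags, sizeof(flags), 100) != 0);  /* prints 1 */
    return 0;
}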
I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. Think of it like an "array of booleans" (journalling masks in flash memory, if you want to know).
Many compilers know how to do this for you. Add a bit of OO code to have types that operate sensibly, and then your code starts looking like its intent, not some bit-banging.
My 2 cents.
With a 64-bit integer, you can store values up to 2^64-1, i.e. 64 one-bit flags. So yes, there is a limit, but if you need more than 64 bits' worth of flags, I'd be very interested to know what they were all doing :)
How many states do you need to potentially think about? If you have 64 potential states, the number of combinations they can exist in is the full range of a 64-bit integer.
If you need to worry about 128 flags, then a pair of bit vectors would suffice (2^64 * 2^64 = 2^128 combinations).
Addition: in Programming Pearls, there is an extended discussion of using a bit array of length 10^7, implemented in integers (for recording which 800 (toll-free) numbers are in use); it's very fast, and very appropriate for the task described in that chapter.
Some languages (I believe Perl does, though I'm not sure) permit bitwise arithmetic on strings, giving you a much greater effective range: (strlen * 8) bits' worth of combinations.
However, I wouldn't use a single value for superimposition of more than one /type/ of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space efficiency reasons, but for practical development reasons.
(PHP uses this system to control its error messages, and I have already found that it's a bit over-the-top when you have to define values where PHP's constants are not resident and you have to generate the integer by hand; and to be honest, if chmod didn't support the 'ugo+rwx' style syntax, I'd never want to use it, because I can never remember the magic numbers.)
The instant you have to crack open a constants table to debug code, you know you've gone too far.
Old thread, but it's worth mentioning that there are cases requiring bloated bit masks, e.g. molecular fingerprints, which are often generated as 1024-bit arrays that we have packed into 32 bigint fields (SQL Server not supporting UInt32). Bitwise operations work fine, until your table starts to grow and you notice the sluggishness of separate function calls. The binary data type would work, were it not for T-SQL's ban on bitwise operators having two binary operands.
For example, .NET uses an array of integers as the internal storage for its BitArray class.
Practically, there's no other way around it.
That being said, in SQL you will need more than one column (or use BLOBs) to store all the states.
You tagged this question SQL, so I think you need to consult with the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.
Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's 212 bits for 64 digits.
Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.