Extract first two digits of hex (UInt32 *) and convert to int - Objective-C

I have a bunch of hex values stored as UInt32*
2009-08-25 17:09:25.597 Particle[1211:20b] 68000000
2009-08-25 17:09:25.598 Particle[1211:20b] A9000000
2009-08-25 17:09:25.598 Particle[1211:20b] 99000000
When I convert them to int as-is, I get huge values when they should be 0-255, I think. I think I just need to extract the first two hex digits. How do I do this? I tried dividing by 1000000, but I don't think that works in hex.

Since you're expecting values < 255 and only the highest byte is set in the sample data you posted, it looks like your endianness is mixed up: you loaded a big-endian number and interpreted it as little-endian, or vice versa, so the bytes end up in the wrong order.
For example, suppose we had the number 104 stored as a 32-bit value on a big-endian machine. In memory, the bytes would be: 00 00 00 68. If you read those same bytes on a little-endian machine, they would be interpreted as 0x68000000.
Where did you get the numbers from? Do you need to convert them to machine byte order?
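If the raw values did come from a big-endian (network-order) source, one minimal sketch of fixing them up is to run each value through the standard ntohl() before using it; the 0x68000000 below is just the sample value from the question:
#include <arpa/inet.h>   /* ntohl() */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t raw = 0x68000000;      /* value as it was (mis)read */
    uint32_t fixed = ntohl(raw);    /* big-endian -> host byte order */

    printf("%u\n", (unsigned)fixed);   /* prints 104 on a little-endian host */
    return 0;
}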

Objective-C is essentially C with extra stuff on top. Your usual bit-shift operations (my_int >> 24 or whatever) should work.

This absolutely sounds like an endianness issue. Whether or not it is, simple bit shifting should do the job:
uint32_t saneValue = insaneValue >> 24;

Dividing by 0x1000000 should work (that is, by 16^6 = 2^24, not 10^6). That's the same as shifting the bits right by 24 (I don't know ObjC syntax, sorry).
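A quick sketch showing that equivalence, using one of the sample values from the question (the assert is only there to demonstrate that both expressions give the same result):
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t value = 0xA9000000;

    /* Dividing by 0x1000000 (16^6 == 2^24) and shifting right by
     * 24 bits both extract the top byte. */
    assert(value / 0x1000000 == value >> 24);   /* both are 0xA9 == 169 */
    return 0;
}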

Try using the function NSSwapInt(), i.e.
int x = 0x12345678;
x = NSSwapInt(x);
NSLog(@"%x", x);
Should print “78563412”.

Related

How do I divide a large number into two smaller integers and then reassemble the large number?

I have a number that may be larger than 32 bits, so I want to store it in two 32-bit array indices. I've tried the below but do not seem to get the correct value in the end.
I broke them up like:
int[0] = lgval%(2^32);
int[1] = lgval/(2^32);
and reassembled the 64-bit value like:
lgval = ((uint64)int[0]) | (((uint64)int[1])>>32);
(Mind the shift to the right, since we're on big endian; CPU: PowerPC e500v2.) For some reason I do not get the correct value at the end. Why not? What am I doing wrong here?
The ^ operator is xor, not power.
The way you want to do this is probably:
uint32_t split[2];
uint64_t lgval;
/* ... */
split[0] = lgval & 0xffffffff;
split[1] = lgval >> 32;
/* code to operate on your 32-bit array elements goes here */
lgval = ((uint64_t)split[1] << 32) | (uint64_t)(split[0]);
As Raymond Chen has mentioned, endianness is about storage. In this case, you only need to consider endianness if you want to access the bytes in your split-32-bit-int as a single 64-bit value. This probably isn't a good idea anyway.
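As a sanity check, here is a minimal self-contained sketch of that split/reassemble round-tripping an arbitrary test value (the value itself is made up for the test):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t lgval = 0x123456789ABCDEF0ULL;   /* arbitrary test value */
    uint32_t split[2];

    split[0] = lgval & 0xffffffff;   /* low 32 bits */
    split[1] = lgval >> 32;          /* high 32 bits */

    uint64_t back = ((uint64_t)split[1] << 32) | (uint64_t)split[0];

    printf("%s\n", back == lgval ? "round-trip OK" : "mismatch");
    return 0;
}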

Does the "C" code algorithm in RFC1071 work well on big-endian machine?

As described in RFC1071, an extra zero byte should be appended after the last byte when calculating the checksum over an odd number of bytes.
But in the RFC's "C" code algorithm, only the last byte itself is added to the sum.
That code does work on a little-endian machine, where the word [Z,0] equals Z, but I think there's a problem on a big-endian one, where [Z,0] equals Z*256.
So I wonder whether the example "C" code in RFC1071 only works on little-endian machines?
----------- Update -----------
There's one more example of "breaking the sum into two groups" described in RFC1071.
We can take the data addr[] = {0x00, 0x01, 0xf2} as an example.
Here, "standard" refers to the situation described in formula [2], while "C-code" refers to the C code algorithm.
As we can see, in the "standard" situation the final sum is f201 regardless of endianness, since there's no endian issue with the abstract form [Z,0] after the swap. But it does matter in the "C-code" situation, because f2 always ends up as the low byte, whether on big-endian or little-endian.
Thus, the checksum differs for the same data (addr and count) on machines of different endianness.
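To make the discrepancy concrete, here is the arithmetic for addr[] = {0x00, 0x01, 0xf2}, worked through by hand:
Standard (formula [2]): the bytes pair up as 0x0001 and [0xf2, 0] = 0xf200, so the sum is 0x0001 + 0xf200 = 0xf201.
C code on a little-endian machine: the first word is fetched as 0x0100 and the odd byte is added as 0x00f2, giving 0x01f2; stored back to memory that is f2 01, i.e. 0xf201 in network order, which matches.
C code on a big-endian machine: the first word is fetched as 0x0001 and the odd byte is added as 0x00f2, giving 0x00f3, which does not match 0xf201.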
I think you're right. The code in the RFC adds the last byte in as the low-order byte, regardless of whether it is running on a little-endian or big-endian machine.
In these examples of code on the web, we can see that special care is taken with the last byte:
https://github.com/sjaeckel/wireshark/blob/master/epan/in_cksum.c
and in
http://www.opensource.apple.com/source/tcpdump/tcpdump-23/tcpdump/print-ip.c
it does this:
if (nleft == 1)
    sum += htons(*(u_char *)w << 8);
Which means that this text in the RFC is incorrect:
Therefore, the sum may be calculated in exactly the same way
regardless of the byte order ("big-endian" or "little-endian")
of the underlaying hardware. For example, assume a "little-
endian" machine summing data that is stored in memory in network
("big-endian") order. Fetching each 16-bit word will swap
bytes, resulting in the sum; however, storing the result
back into memory will swap the sum back into network byte order.
The following code in place of the original odd byte handling is portable (i.e. will work on both big- and little-endian machines), and doesn't depend on an external function:
if (count > 0)
{
    char buf2[2] = {*addr, 0};
    sum += *(unsigned short *)buf2;
}
(Assumes addr is char * or const char *).
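For completeness, a minimal self-contained sketch of the whole checksum computation with portable odd-byte handling might look like the following; the function name and structure are my own, not the RFC's, and it combines each byte pair explicitly instead of relying on 16-bit loads:
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Internet checksum over count bytes starting at addr.  The odd
 * trailing byte is treated as the high byte of a 16-bit word, i.e.
 * [Z,0], independent of the host's endianness. */
uint16_t in_cksum(const unsigned char *addr, size_t count)
{
    uint32_t sum = 0;

    while (count > 1) {
        sum += ((uint32_t)addr[0] << 8) | addr[1];   /* big-endian word */
        addr += 2;
        count -= 2;
    }

    if (count > 0)
        sum += (uint32_t)addr[0] << 8;               /* [Z,0] */

    while (sum >> 16)                                /* fold carries */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

int main(void)
{
    const unsigned char data[3] = { 0x00, 0x01, 0xf2 };   /* example data from the question */
    printf("%04x\n", in_cksum(data, sizeof data));        /* prints 0dfe, i.e. ~0xf201 */
    return 0;
}
Note that the result here is the checksum as an ordinary number; when writing it into a packet you still have to store it in network byte order.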

Ints to Bytes: Endianness a Concern?

Do I have to worry about endianness in this case (integers MUST be 0-127):
int a = 120;
int b = 100;
int c = 50;
char theBytes[] = {a, b, c};
I think that, since each integer sits in its own byte, I don't have to worry about endianness when passing the byte array between systems. This has also worked out empirically. Am I missing something?
Endianness only affects the ordering of bytes within an individual value. Individual bytes are not subject to endian issues, and arrays are always sequential, so byte arrays are the same on big- and little-endian architectures.
Note that this doesn't necessarily mean that only using chars will make datatypes 100% byte-portable. Structs may still include architecture-dependent padding, for example, and one system may have unsigned chars while another uses signed (though I see you sidestep this by only allowing 0-127).
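If you ever do need to move values wider than one byte between systems, the usual trick is to pick a byte order explicitly and split the value with shifts rather than copying the int's memory. A minimal sketch follows; the function names are mine, and big-endian on the wire is just an assumed choice:
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value into 4 bytes in an explicitly chosen
 * (big-endian) order, independent of the host's endianness. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (v >> 24) & 0xff;
    out[1] = (v >> 16) & 0xff;
    out[2] = (v >> 8) & 0xff;
    out[3] = v & 0xff;
}

/* The matching reader. */
uint32_t get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  | (uint32_t)in[3];
}

int main(void)
{
    unsigned char buf[4];
    put_u32_be(buf, 1000);                       /* value wider than one byte */
    printf("%u\n", (unsigned)get_u32_be(buf));   /* prints 1000 on any host */
    return 0;
}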
No, you don't need to worry; the compiler produces code that performs the conversion and assignment correctly.

Dataset's TBytes column and SQL VarBinary field combination

select convert(varbinary(8), 1) in MS SQL Server produces the output 0x00000001.
When the above query is assigned to a dataset in Delphi and the field value is accessed, we get the byte array [1, 0, 0, 0], so Bytes[0] contains 1.
When I use IntToHex() on this byte array, the result is the value "10000000".
Why is IntToHex considering it in reverse order?
I think you forgot to include a reference to the code where you're somehow calling IntToHex on a TBytes array. It's from the answer to your previous question, how to convert byte array to its hex representation in Delphi.
In my answer, I forgot to account for how a pointer to an array of bytes would have the bytes in big-endian order while IntToHex (and everything else on x86) expects them in little-endian order. The solution is to switch them around. I used this function:
function Swap32(value: Integer): Integer;
asm
bswap eax
end;
In the meantime, I fixed my answer to account for that.
This seems to be a little/big endian problem. Just reverse the byte array or the return value from IntToHex. Another way would be to do it yourself:
myInt := Bytes[0];
Inc(myInt, (Bytes[1] shl 8));
Inc(myInt, (Bytes[2] shl 16));
Inc(myInt, (Bytes[3] shl 24));
Also be careful with the sign. Is the SQL value signed or unsigned - the Delphi datatype should match this (int/longint is signed, Longword/Cardinal is unsigned - see here or in the Delphi Help).
Because the x86 CPU uses little-endian numbers, a numbering system which orders its bytes in reverse order. You'll need to swap the byte order to get the right value.
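For comparison, the same little-endian reassembly expressed in C (a sketch, independent of the Delphi code above; it just mirrors the byte layout described in the question):
#include <stdint.h>
#include <stdio.h>

/* Reassemble a 32-bit value from 4 bytes stored least-significant
 * byte first; the layout [1, 0, 0, 0] from the question gives 1. */
uint32_t from_le_bytes(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

int main(void)
{
    const unsigned char bytes[4] = { 1, 0, 0, 0 };
    printf("%u\n", (unsigned)from_le_bytes(bytes));   /* prints 1 */
    return 0;
}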

Is there a practical limit to the size of bit masks?

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.
I use this technique in several web applications, where I'd usually store the variable into a field and give it a type of MEDIUMINT or whatever, depending on the number of different values.
What I'm interested in is whether or not there is a practical limit to the number of values you can store like this. For example, if the number went over 64, you couldn't use (64-bit) integers any more. If this was the case, what would you use? How would it affect your program logic (i.e. could you still use bitwise comparisons)?
I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.
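For reference, the read/write/execute example from the first paragraph looks like this as code (a small sketch in C; the constant names are my own):
#include <stdio.h>

/* Flag values from the question: read = 4, write = 2, execute = 1. */
enum { PERM_EXECUTE = 1 << 0, PERM_WRITE = 1 << 1, PERM_READ = 1 << 2 };

int main(void)
{
    int perms = PERM_READ | PERM_WRITE | PERM_EXECUTE;   /* 4 + 2 + 1 = 7 */

    if (perms & PERM_WRITE)                /* test a single flag */
        printf("write is set\n");

    perms &= ~PERM_EXECUTE;                /* clear a flag */
    printf("perms is now %d\n", perms);    /* prints 6 */
    return 0;
}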
Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):
// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
// make sure offset is valid
if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }
//set the right bit
bytes[offset >> 3] |= (1 << (offset & 0x7));
return 0; //success
}
//gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
// make sure offset is valid
if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }
//get the right bit
return (bytes[offset >> 3] & (1 << (offset & 0x7)));
}
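Usage, together with the set_bit/get_bit functions above, might look like this (a sketch; the 1000-bit size and the bit index are arbitrary):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char bits[125];                       /* 125 bytes = 1000 flags */
    memset(bits, 0, sizeof bits);

    set_bit(bits, sizeof bits, 700);      /* turn flag 700 on */

    printf("%d\n", get_bit(bits, sizeof bits, 700) != 0);   /* 1 */
    printf("%d\n", get_bit(bits, sizeof bits, 701) != 0);   /* 0 */
    return 0;
}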
I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. Think of it like an "array of booleans"
(journalling masks in flash memory, if you want to know).
Many compilers know how to do this for you. Add a bit of OO code to have types that operate sensibly, and then your code starts expressing its intent rather than some bit-banging.
My 2 cents.
With a 64-bit integer, you can store values up to 2^64-1, but that still only gives you 64 flags (64 = 2^6). So yes, there is a limit, but if you need more than 64 bits' worth of flags, I'd be very interested to know what they were all doing :)
How many states do you potentially need to think about? If you have 64 potential states, the number of combinations they can exist in is the full range of a 64-bit integer.
If you need to worry about 128 flags, then a pair of 64-bit vectors would suffice (2 × 64 bits).
Addition: in Programming Pearls, there is an extended discussion of using a bit array of length 10^7, implemented in integers (for holding used 800 numbers) - it's very fast, and very appropriate for the task described in that chapter.
Some languages (I believe Perl does, not sure) permit bitwise arithmetic on strings, giving you a much greater effective range ((string length × 8-bit chars) combinations).
However, I wouldn't use a single value for superimposition of more than one type of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space-efficiency reasons, but for practical development reasons.
(PHP uses this system to control its error messages, and I have already found that it's a bit over-the-top when you have to define values where PHP's constants are not available and you have to generate the integer by hand; to be honest, if chmod didn't support the 'ugo+rwx' style syntax, I'd never want to use it, because I can never remember the magic numbers.)
The instant you have to crack open a constants table to debug code, you know you've gone too far.
Old thread, but it's worth mentioning that there are cases requiring bloated bit masks, e.g., molecular fingerprints, which are often generated as 1024-bit arrays that we have packed into 32 bigint fields (SQL Server not supporting UInt32). Bitwise operations work fine - until your table starts to grow and you notice the sluggishness of the separate function calls. The binary data type would work, were it not for T-SQL's ban on bitwise operators having two binary operands.
For example, .NET uses an array of integers as the internal storage for its BitArray class.
Practically, there's no way around it.
That being said, in SQL you will need more than one column (or use BLOBs) to store all the states.
You tagged this question SQL, so I think you need to consult the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.
Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's about 212 bits for 64 digits (64 × log2 10 ≈ 212.6).
Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.