Strange bitwise operation with Bitmap row width, what does it mean? (And why) - objective-c

Why is the developer adding the hex value 0x0000000F (15), then ANDing with the bitwise NOT of that value in this line:
size_t bytesPerRow = ((width * 4) + 0x0000000F) & ~0x0000000F;
He comments that "16 byte aligned is good", what does he mean?
- (CGContextRef)createBitmapContext {
CGRect boundingBox = CGPathGetBoundingBox(_mShape);
size_t width = CGRectGetWidth(boundingBox);
size_t height = CGRectGetHeight(boundingBox);
size_t bitsPerComponent = 8;
size_t bytesPerRow = ((width * 4) + 0x0000000F) & ~0x0000000F; // 16 byte aligned is good

ANDing with ~0x0000000F = 0xFFFFFFF0 (aka -16) rounds down to a multiple of 16, simply by resetting those bits that could make it anything other than a multiple of 16 (the 8's, 4's, 2's and 1's).
Adding 15 (0x0000000F) first makes it round up instead of down.

The purpose of size_t bytesPerRow = ((width * 4) + 0x0000000F) & ~0x0000000F; is to round this value up to a multiple of 16 bytes.

The goal is to set bytesPerRow to be the smallest multiple of 16 that is capable of holding a row of data. This is done so that a bitmap can be allocated where every row address is 16 byte aligned, i.e. a multiple of 16. There are many possible benefits to alignment, including optimizations that take advantage of it. Some APIs may also require alignment.

The code sets the 4 least significant bits to zero. If the value is an address, it will fall on an even 16-byte boundary, i.e. "16 byte aligned".
This is a one's complement, so
~0x0000000F
becomes
0xFFFFFFF0
and ANDing it with another value clears the 4 least significant bits.
This is the kind of thing we used to do all the time "back in the day"!

He's adding 0xf, and then masking out the lower 4 bits (& ~0xf), to make sure the value is rounded up. If he didn't add the 0xf, it would round down.

How could I increase the range of an HID axis? (steering wheel hits limit after a few degrees)

I have an encoder that gives 4300 increments per revolution, and I need at least 3 turns in either direction (for a steering wheel).
However, when I turn it just a bit, it already hits the extremes. This is after a few degrees clockwise:
This is my descriptor:
My code:
while (1)
{
    steer.direction = position - position_p; // relative movement since the last report
    position_p = position;                   // remember the current position
    USBD_CUSTOM_HID_SendReport(&hUsbDeviceFS, &steer, sizeof(steer));
    HAL_Delay(5);                            // send a report every 5 ms
}
I have tried using an absolute value. With 8 bits it just overflows after a few degrees and comes back to the opposite extremum. Maybe 16 bits could solve that but I can't get it to work that way.
I managed to use a 16 bit absolute position.
I think it didn't work because the "send" function only took 8-bit values.
So I've split the 16-bit variable into a 2x8-bit array. (I'm using CubeIDE)
steer.direction[0] = position & 0x00FF; // low byte
steer.direction[1] = position >> 8;     // high byte

Extra bytes on the end of YUV buffer - RaspberryPi

I've started editing the RaspiStillYUV.c code. I eventually want to process the image I receive, but for now, I'm just working to understand it. Why am I working with YUV instead of RGB? So I can learn something new. I've made minor changes to the function camera_buffer_callback. All I am doing is the following:
fprintf(stderr, "GREAT SUCCESS! %d\n", buffer->length);
The line this is replacing:
bytes_written = fwrite(buffer->data, 1, buffer->length, pData->file_handle);
Now, the dimensions should be 2592 x 1944 (w x h) as set in the code. Working off of Wikipedia (YUV420), I have come to the conclusion that the file size should be w * h * 1.5, since the Y component has 1 byte of data for each pixel and the U and V components each have 1 byte of data for every 4 pixels (1 + 1/4 + 1/4 = 1.5). Great. Doing the math in Python:
>>> 2592 * 1944 * 1.5
7558272.0
Unfortunately, this does not line up with the output of my program:
GREAT SUCCESS! 7589376
That leaves a difference of 31104 bytes.
I figure that the buffer is allocated in fixed size chunks (the output size is evenly divisible by 512). While I would like to understand that mystery, I'm fine with the fixed size chunk explanation.
My question is if I am missing something. Are the extra bytes beyond the expected size meaningful in this format? Should they be ignored? Are my calculations off?
The documentation at this location supports your theory on padding: http://www.raspberrypi.org/wp-content/uploads/2013/07/RaspiCam-Documentation.pdf
Specifically:
Note that the image buffers saved in raspistillyuv are padded to a
horizontal size divisible by 16 (so there may be unused bytes at the
end of each line to make the width divisible by 16). Buffers are also
padded vertically to be divisible by 16, and in the YUV mode, each
plane of Y,U,V is padded in this way.
So my interpretation of this is the following.
The width is 2592 (divisible by 16 so this is ok).
The height is 1944, which is 8 short of being divisible by 16, so an extra 8*2592 bytes are added (also multiplied by 1.5), giving your 31104 extra bytes.
Although this kind of helps with the size of the file, it doesn't properly explain the structure of the YUV output. I am having a look at this description to see if it provides a hint to start with: http://en.wikipedia.org/wiki/YUV#Y.27UV420p_.28and_Y.27V12_or_YV12.29_to_RGB888_conversion
From this I believe it is as follows:
Y Channel:
2592 * (1944+8) = 5059584
U Channel:
1296 * (972+4) = 1264896
V Channel:
1296 * (972+4) = 1264896
Giving a sum of :
5059584 + 2*1264896 = 7589376
This makes the numbers add up so only thing left is to confirm if this interpretation is correct.
I am also trying to do the YUV decode (for image comparisons) so if you can confirm if this actually does correspond to what you are reading in the YUV file this would be much appreciated.
You have to read the manual carefully. Buffers are padded to multiples of 16, but colour data is half-size, so your image size needs to be in multiples of 32 to avoid problems with padding breaking external software.

GLubyte / GLushort usage issue

I'm new to OpenGL ES. I'm trying to build a sphere without using any manuals or tutorials.
I have succeeded in achieving my goal. I can draw a sphere using TRIANGLE_STRIP, with the number of meridians/horizontals I specify before drawing.
Everything works fine when I have fewer than 256 indices for vertices. I tried to use GLushort instead of GLubyte, but the picture changed a lot.
GLubyte *Indices;
...
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(GLubyte) * (meridians * (horizontals * 2 + 2)), Indices, GL_STATIC_DRAW);
...
Indices = malloc(sizeof(GLubyte) * (meridians * (horizontals * 2 + 2)));
That's where I change byte to short.
Current project on GitHub
What should I do?
Here are the pictures where I change byte to short
Looks like you forgot to change the following line:
glDrawElements(GL_TRIANGLE_STRIP, (meridians * (horizontals * 2 + 2)), GL_UNSIGNED_BYTE, 0);
This tells OpenGL how many indices to render, and that each one is the size of an unsigned byte (almost always 8 bits; the exact size is platform-specific). However, you have filled the array with indices the size of unsigned shorts (typically 16 bits), so each of your numbers will be read twice: once for its "first" 8 bits and once for its "second" 8 bits (endianness determines whether the high- or low-order byte comes first). Since many of your indices (the majority?) are under 256, a lot of vertices will turn into 0, because the higher 8 bits are all 0. On top of that, you will only render half of your indices.
So, you need to indicate to OpenGL that it needs to draw these indices as unsigned shorts instead by changing the above line to this:
glDrawElements(GL_TRIANGLE_STRIP, (meridians * (horizontals * 2 + 2)), GL_UNSIGNED_SHORT, 0);

Working with unsigned char. How to replace elements without using loop?

I'm developing an application that should use very few resources and be very fast. In my app I use an unsigned char* rawData which contains bytes taken from an image. In this rawData array I have to keep some bytes and set the others to zero. But I'm not permitted to use any loop (otherwise I could just run through each byte and set it to zero).
So here are my questions.
Q1) Is there any method in Objective-C like ZeroMemory in C?
Q2) Is there any other way to set the necessary bytes to zero without using any loop?
Thanks in advance...
P.S. I can provide some code if necessary...
If you don't know the size of the buffer, you can't do it without a loop. Even if you don't write the loop yourself, calling something like strlen will result in a loop. I'm counting recursion as a loop here too.
How do you know which bytes to keep and which to set to zero? If these bytes are in known positions, you can use vector operations to zero out some of the bytes and not others. The following example zeros out only the even bytes over the first 64 bytes of rawData:
#include <emmintrin.h> // SSE2 intrinsics
__m128i zeros = _mm_setzero_si128();
// Only the high bit of each mask byte matters: 0x80 selects a byte, 0 skips it.
const uint8_t mask[16] = {0x80, 0, 0x80, 0, 0x80, 0, 0x80, 0,
                          0x80, 0, 0x80, 0, 0x80, 0, 0x80, 0};
__m128i sse_mask = _mm_loadu_si128((const __m128i *)mask);
_mm_maskmoveu_si128(zeros, sse_mask, (char *)&rawData[0]);
_mm_maskmoveu_si128(zeros, sse_mask, (char *)&rawData[16]);
_mm_maskmoveu_si128(zeros, sse_mask, (char *)&rawData[32]);
_mm_maskmoveu_si128(zeros, sse_mask, (char *)&rawData[48]);
If the high bit of each byte in mask is 1, the corresponding value in zeros will be copied to rawData. You can use a sequence of these masked copies to quickly replace some bytes and not others. The resulting machine code uses SSE operations, so this is actually quite fast. It's not required, but SSE operations will run much faster if rawData is 16-byte aligned.
Sorry if you're targeting ARM. I believe the NEON intrinsics are similar, but not identical.

How can I encode four unsigned bytes (0-255) to a float and back again using HLSL?

I am facing a task where one of my hlsl shaders require multiple texture lookups per pixel. My 2d textures are fixed to 256*256, so two bytes should be sufficient to address any given texel given this constraint. My idea is then to put two xy-coordinates in each float, giving me eight xy-coordinates in pixel space when packed in a Vector4 format image. These eight coordinates are then used to sample another texture(s).
The reason for doing this is to save graphics memory and to try to optimize processing time, since I then don't require as many texture lookups.
By the way: Does anyone know if encoding/decoding 16 bytes from/to 4 floats using 1 sampling is slower than 4 samplings with unencoded data?
Edit: This is for Shader Model 3
If you are targeting SM 4.0-SM 5.0 you can use Binary Casts and Bitwise operations:
uint4 myUInt4 = asuint(myFloat4);
uint x0 = myUInt4.x & 0x000000FF; //extract two xy-coordinates (x0,y0), (x1,y1)
uint y0 = (myUInt4.x & 0x0000FF00) >> 8;
uint x1 = (myUInt4.x & 0x00FF0000) >> 16;
uint y1 = (myUInt4.x & 0xFF000000) >> 24;
//repeat operation for .y .z and .w coordinates
For previous Shader Models, I think it could be more complicated since it depends on FP precision.