What mechanism detects accesses of unallocated memory? - objective-c

From time to time, I'll have an off-by-one error like the following:
unsigned int* x = calloc(2000, sizeof(unsigned int));
printf("%d", x[2000]);
I've gone one element past the end of the allocated region, so I get an EXC_BAD_ACCESS signal at runtime. My question is: how is this detected? It seems like this would just silently return garbage, since I'm only off by one element and not, say, a full page. What part of the system prevents me from just getting back the garbage value at x + 2000?

The allocator places sentinel values at the beginning and end of each block, just beyond the bytes you asked for. When you free the memory, it checks whether those values are still intact. If not, it tells you.
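Here is a minimal sketch of that sentinel (a.k.a. canary) idea in plain C. The names guarded_alloc, guarded_free and CANARY are made up for illustration; real allocators store their metadata differently and do the check inside free():

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: not a real allocator API. */
#define CANARY 0xDEADBEEFu

void *guarded_alloc(size_t n) {
    unsigned int c = CANARY;
    unsigned char *p = malloc(n + 2 * sizeof c);   /* room for two sentinels */
    if (p == NULL) return NULL;
    memcpy(p, &c, sizeof c);                 /* sentinel before the user's bytes */
    memcpy(p + sizeof c + n, &c, sizeof c);  /* sentinel after them */
    return p + sizeof c;
}

void guarded_free(void *user, size_t n) {
    unsigned char *p = (unsigned char *)user - sizeof(unsigned int);
    unsigned int front, back;
    memcpy(&front, p, sizeof front);
    memcpy(&back, p + sizeof front + n, sizeof back);
    if (front != CANARY || back != CANARY)   /* some write overran the block */
        fprintf(stderr, "heap corruption detected\n");
    free(p);
}

Note that sentinels like these catch stray writes, and only at free() time; an out-of-range read like x[2000] triggers EXC_BAD_ACCESS only when the address happens to cross into an unmapped page, which the MMU detects.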

Perhaps you are just lucky because you are using 2000 as the size. Depending on the size of int, the total size is divisible by 32 or 64, so chances are high that the end of your array coincides with the end of the "real" allocation. Try some odd number of bytes (better to use a char array for that) and see if your system still detects it.
In any case you shouldn't rely on finding these bugs this way. Always use Valgrind or a similar tool to check your memory accesses.
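For example, a tiny test of the kind suggested above (the size 2001 is arbitrary, just deliberately odd):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *x = calloc(2001, sizeof(char)); /* odd size: the real block is likely rounded up */
    printf("%d\n", x[2001]);              /* one past the end: garbage, a crash, or nothing */
    free(x);
    return 0;
}

Run under Valgrind (valgrind ./a.out), this reports an invalid read of size 1 even when the bare program runs silently.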

Related

How do I know in advance that the buffer size is enough in nanopb?

I'm trying to use nanopb, following the example:
https://github.com/nanopb/nanopb/blob/master/examples/simple/simple.c
The buffer size is initialized to 128:
uint8_t buffer[128];
My question is: how do I know (in advance) that this 128-byte buffer is enough to transmit my message? How do I decide a proper buffer size (large enough, but not wastefully large) before writing the code?
It looks like a noob question :), but thanks for your quick suggestions.
When possible, nanopb adds a define in the generated .pb.h file that has the maximum encoded size of a message. In the file examples/simple/simple.pb.h you'll find:
/* Maximum encoded size of messages (where known) */
#define SimpleMessage_size 11
You could then specify uint8_t buffer[SimpleMessage_size];.
This define is only available if all repeated and string fields have the (nanopb).max_count and (nanopb).max_size options specified.
For many practical purposes, you can pick a buffer size that you estimate will be large enough and handle the error conditions. It is also possible to use pb_get_encoded_size() to calculate the encoded size and dynamically allocate storage, but in general that is not a great solution in embedded applications. When total system memory is limited, it is often better to have a constant-sized buffer that you can test with, instead of having the available amount of dynamic memory vary at runtime.
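A sketch combining both approaches, based on the simple.c example linked above (SimpleMessage and its lucky_number field come from that example; treat the details as illustrative):

#include <pb_encode.h>
#include "simple.pb.h"

bool encode_lucky_number(void)
{
    SimpleMessage message = SimpleMessage_init_zero;
    message.lucky_number = 13;

    /* Option 1: the generated define is a guaranteed upper bound. */
    uint8_t buffer[SimpleMessage_size];
    pb_ostream_t stream = pb_ostream_from_buffer(buffer, sizeof(buffer));
    if (!pb_encode(&stream, SimpleMessage_fields, &message))
        return false;

    /* Option 2: the exact encoded size of this particular message,
     * e.g. for sizing a dynamic allocation. */
    size_t exact;
    if (!pb_get_encoded_size(&exact, SimpleMessage_fields, &message))
        return false;

    return stream.bytes_written == exact;   /* both agree on the size */
}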

Valgrind "Invalid read of size 4", but not out-of-bounds, nor "not stack'd, malloc'd or (recently) free'd"

I get many 'Invalid read of size N' errors while running my C++ program under Valgrind-3.11.0 on 64-bit Ubuntu.
The error messages look like the following, with N varying among 1, 4, and 8:
Invalid read of size N.
Address 0xblahblah is 88 bytes inside a block of size 176 alloc'd
The block of size 176 is a C++ class object allocated with the new operator, and N is small enough that this is not an out-of-bounds case.
So why did Valgrind decide this is an invalid read, when there is no explanation like 'not stack'd', 'not malloc'd' or 'recently free'd'?
N corresponds to the size of the underlying primitive data type. Roughly speaking, 1 would be a char or a bool, 4 would be an int or a float, and 8 would be a long int, a double or a pointer.
XX bytes inside a block of size YY alloc'd
First of all this means that the memory is dynamic (~on the heap) rather than automatic (~on the stack). Secondly, you know the size of the object(s) allocated. If you know the size per object (you can use sizeof to get that), then you can work out how many objects were allocated.
Are there any other serious errors before this?
Do you use any Valgrind client requests in your code?
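On that last point: a Memcheck client request is one of the few ways to get an "Invalid read ... inside a block ... alloc'd" report without a free. A minimal hypothetical reproduction (using malloc for brevity, though the real code is C++; block size and offset chosen to mimic the report above):

#include <stdlib.h>
#include <valgrind/memcheck.h>

int main(void)
{
    char *block = malloc(176);
    /* Mark 16 bytes in the middle of the block as off-limits. */
    VALGRIND_MAKE_MEM_NOACCESS(block + 80, 16);
    char c = block[88];   /* "Invalid read of size 1 ...
                             88 bytes inside a block of size 176 alloc'd" */
    free(block);
    return c;
}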

Fastest way to compare uchar arrays in OpenCL

I need to do many comparisons in an OpenCL program. Right now I do it like this:
int memcmp(__global unsigned char* a, __global unsigned char* b, int size) {
    for (int i = 0; i < size; i++) {
        if (a[i] != b[i]) return 0;
    }
    return 1;
}
How can I make it faster? Maybe using vectors like uchar4, or something else? Thanks!
I guess that your kernel processes "size" elements per thread (work-item). Your code can improve if the accesses are more coalesced. Thanks to the L1 caches of current GPUs this is not a huge problem, but it can still imply a noticeable performance penalty. For example, say you have 4 threads (work-items) and size = 128, so the buffers hold 512 uchars. In your case, thread #0 accesses a[0] and b[0], but brings a[0]...a[63] into the cache (and the same for b). Thread #1, which belongs to the same warp (aka wavefront), accesses a[128] and b[128], so it brings a[128]...a[191] into the cache, and so on. After thread #3, the whole buffer is in the cache. This is not a problem here, given the small size of this domain.
However, if at each step the threads access consecutive elements (thread #0 reads a[0], thread #1 reads a[1], and so on), only one cache line is needed at a time for your 4 threads: the accesses are coalesced. The behavior improves further as more threads per block are used. Please try it and tell me your conclusions. Thank you.
See: http://www.nvidia.com/content/cudazone/download/opencl/nvidia_opencl_programmingguide.pdf Section 3.1.2.1
It is a bit old but their concepts are not so old.
PS: By the way, after this I would try uchar4 as you suggested, and also loop unrolling; see the sketch below.
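An untested sketch of the uchar4 variant (it assumes size is a multiple of 4, with size4 = size / 4, and that the buffers are suitably aligned; a real version needs a scalar tail loop for the remainder):

int memcmp4(__global const uchar4 *a, __global const uchar4 *b, int size4)
{
    for (int i = 0; i < size4; i++) {
        /* != on vectors yields a per-component mask; any() tests it */
        if (any(a[i] != b[i]))
            return 0;
    }
    return 1;
}

Reading a uchar4 at a time quarters the number of iterations and memory transactions; combined with the interleaved indexing described above, the accesses stay coalesced as well.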

Memory addresses, pointers, variables, values - what goes on behind the scenes

This is going to be a pretty loaded question but ever since I started learning about pointers I've been very curious about what happens behind the scenes when a program is run.
As far as I know, computer memory is commonly thought of as a long strip of memory divided evenly into individual bytes; the typical textbook diagrams certainly evoke such a metaphor.
One thing I've been wondering: what do the memory addresses themselves represent? I'm sure it's no coincidence that memory addresses appear as 8-digit hexadecimal values (e.g. 00EB5748). Why is this?
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
Now suppose x is an unsigned int that occupies 2 bytes of memory (i.e. values ranging from 0 to 65535). When I declare x = 12, what is happening? What is it that I'm making equal to 12? When I draw conceptual diagrams, I usually have a box for an address (say &x) pointing to a variable (x) that occupies seemingly nothing, and I'm sure that can't be a fully accurate picture of what's going on.
And what's happening at the binary level? Is the address 00EB5748 treated as 111010110101011101001000 and storing a value of 12 somewhere, or 1100?
Mostly my confusion & curiosity stems from the relationship between memory addresses and actual values being declared (eg/ 12, 'a', -355.2). As another example, suppose our address 00EB5748 is pointing to a char 's' whose value is 115 according to ASCII charts. Is the address describing a position that stores the value 115 in 1 byte, by flipping the appropriate 1s and 0s at that position in memory?
Just open any book. You will see pages. Every page has a number. Consecutive pages are numbered with consecutive numbers. Do you have any confusion with numbered pages? I think not. Then you should have no confusion with computer memory.
Books were the main storage devices before the computer era. Computer memory borrows its basic concepts from books: a book has pages -> computer memory has memory cells; a book has page numbers -> computer memory has memory addresses.
One thing I've been wondering, what do the memory addresses themselves represent?
Numbers. Every memory cell has a number, like every page in a book.
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
The memory manager marks some memory cells as occupied and gives the address of the first reserved cell to the compiler. The compiler associates the variable's name and type with this address. (This picture is from my head; it may be inaccurate.)
When I declare x = 12, what is happening?
When you declared the variable x, memory cells were reserved for it. Now you write 12 into these memory cells. Note that 12 is binary-coded in some way, depending on the type of x. If x is an unsigned int that occupies 2 memory cells, then one cell will contain 0 and the other will contain 12, because the binary integer representation of 12 is
0000 0000   0000 1100
|_______|   |_______|
   cell        cell
If 12 were a floating-point number, it would be encoded in a different way.
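You can watch this happen from C itself. A small sketch (the exact addresses and byte order you see depend on your machine):

#include <stdio.h>

int main(void)
{
    unsigned int x = 12;
    unsigned char *cell = (unsigned char *)&x;  /* view x one cell (byte) at a time */
    for (size_t i = 0; i < sizeof x; i++)
        printf("cell %zu at %p holds 0x%02X\n", i, (void *)&cell[i], cell[i]);
    return 0;
}

On a little-endian machine the first cell prints 0x0C and the rest 0x00; on a big-endian machine it is the other way around.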
A memory address is simply the position of a given byte in memory. The zeroth byte is at 0x00000000. The tenth at 0x0000000A. The 65535th at 0x0000FFFF. And so on.
Local variables live on the stack*. When compiling a block of code, the compiler counts how many bytes are needed to hold all the local variables, and then increments the stack pointer so that all the variables can fit below it (along with some other stuff like frame pointers and return addresses and whatnot). Then it just remembers that, for example, local variable x is at an offset -2 from the stack pointer, foo is at an offset -4 and so on, and uses those addresses whenever those variables are referenced in the following code.
Since the compiler knows that x is at address (stack pointer - 2), that's the location that is set to the value 12 when you do x = 12.
Not entirely sure if I understand this question, but say you want to read the memory at address 0x00EB5748. The control unit in the CPU reads the instruction, sees that it is a load instruction, and passes the address (in binary of course) to the load/store unit, along with some other junk like how many bytes to read. Then the LSU sends that address to some memory (probably L1 cache), and after a certain time gets the value 12 back. Then this data is available to, say, put in a register, or send to the ALU to do arithmetic, or whatever.
That seems to be accurate, yes. Going back to the first question, an address simply means "byte number 0xWHATEVER in memory".
Hope this clarified things a bit at least.
*I should probably explain the stack as well. A stack is a portion of memory reserved for local variables (and some other stuff). It starts at a fixed location in memory and stops at the memory address contained in a special register called the stack pointer. To begin with, the stack is empty, so the stack pointer just contains the start of the stack. As you put more data on the stack, the SP is advanced. This means that you can always put more data on the stack simply by writing it at the address in the SP and then advancing the SP, so that once again anything past that address is free memory. (On many real machines the stack actually grows downward, so "advancing" the SP means decrementing it, but the principle is the same.)
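A quick way to see locals sitting at small fixed offsets from one another (stack layout varies by compiler and platform, so the exact gaps are not guaranteed):

#include <stdio.h>

int main(void)
{
    int x = 12;
    int foo = 7;
    /* Both locals live in the current stack frame, close together. */
    printf("&x   = %p\n", (void *)&x);
    printf("&foo = %p\n", (void *)&foo);
    return 0;
}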

Where does the limitation of 10^15 in D.J. Bernstein's 'primegen' program come from?

At http://cr.yp.to/primegen.html you can find the sources of a program that uses Atkin's sieve to generate primes. As the author says it may take a few months for him to answer an e-mail (I understand that, he is surely a busy man!), I'm posting this question here.
The page states that 'primegen can generate primes up to 1000000000000000'. I am trying to understand why that is. There is of course a limit at 2^64 ~ 2 * 10^19 (the size of unsigned long), because that is how the numbers are represented. I know for sure that if there were a huge prime gap (> 2^31), printing would fail. However, in this range I think there is no such prime gap.
Either the author understated the bound (and really it is around 10^19), or there is a place in the source code where an arithmetic operation can overflow, or something like that.
The funny thing is that you actually MAY run it for numbers > 10^15:
./primes 10000000000000000 10000000000000100
10000000000000061
10000000000000069
10000000000000079
10000000000000099
and if you believe Wolfram Alpha, it is correct.
Some facts I had "reverse-engineered":
numbers are sifted in batches of 1,920 * PRIMEGEN_WORDS = 3,932,160 numbers (see the primegen_fill function in primegen_next.c);
PRIMEGEN_WORDS controls how big a single sifting pass is - you can adjust it in primegen_impl.h to fit your CPU cache;
the implementation of the sieve itself is in the primegen.c file - I assume it is correct; what you get is a bitmask of primes in pg->buf (see the primegen_fill function);
the bitmask is analyzed and the primes are stored in the pg->p array.
I see no point where the overflow may happen.
I wish I was on my computer to look, but I suspect you would have different success if you started at 1 as your lower bound.
Just from the algorithm, I would conclude that the upper bound comes from 32-bit arithmetic.
The page mentions a Pentium-III as the CPU, so my guess is that the code is quite old and does not use 64-bit types.
2^32 is approximately 4 * 10^9. The sieve of Atkin (which the program uses) requires N^(1/2) bits (it uses a big bitfield), which means that with roughly 2^32 of addressable memory you can reach (conservatively) N approx 10^15. As this number is a rough, conservative upper bound (the system and other programs occupy memory, address ranges are reserved for I/O, ...), the real upper bound is/might be higher.
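To put numbers on that estimate: for N = 10^15 the bitfield needs N^(1/2) ≈ 3.2 * 10^7 bits ≈ 4 MB, which fits comfortably in a 32-bit address space, while pushing N toward 2^64 would need 2^32 bits = 512 MB, leaving very little room on a Pentium-III-class machine. So 10^15 is indeed a conservative choice.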