How does mmap'ed data work with object allocation? - objective-c

I'm a little confused about how mmap() works with the frameworks on iOS or OS X.
If a file is mapped to virtual memory using mmap() and data is requested from it, the kernel pages the data into RAM. How does this actually affect how one creates objects?
If one usually creates an object using alloc/init, the memory block is allocated and the object initialized. But what if the data resides in virtual memory, in an mmap'ed file? Does alloc need to be called on the object? Does the allocated memory for the object get filled with the data from virtual memory? Or does one skip the alloc call and just pass in the pointer to the data in virtual memory?
E.g. for an image or a sound file, if I know where the file is in virtual memory, how would I set up the object?
If one allocates the data, doesn't it get duplicated if the data is already paged into RAM?
I was thinking that using memory from virtual addresses would remove the need to allocate on the heap.

If you only have one object you are storing in the mmaped space, then you simply skip the alloc and use the location directly. However, normally you will have more than one object, and now you are into managing it yourself. Typically at least some portion of it will be laid out in a fixed manner, so that both processes know where to find things. Instead of pointers you get the fun of using offsets from the start of the arena, since that works in both processes' address space.
In essence, you're given a chunk of memory, as if you'd done one big malloc/alloc and you get to play around within it.
If you have, say
void *p = mmap( <appropriate arguments> );
and you want to put an object of type foo at offset 200, you would say
foo *f = (foo *)((char *)p + 200);   /* add 200 bytes; (foo *)p + 200 would advance by 200 * sizeof(foo) */
and now you can manipulate f in all the normal ways, subject to f not containing any pointers into the mmapped space. It is generally good discipline to substitute offsets for such pointers, and then when you need to follow one, you can convert it to a pointer (by adding p).
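To make the offset discipline concrete, here is a minimal sketch, assuming a hypothetical file name and record layout (error handling omitted): offsets live in the arena and are converted to pointers by adding the base address p.

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

typedef struct {
    int32_t nameOffset;   /* offset from the start of the arena, NOT a pointer */
    int32_t value;
} Record;

static void ReadRecord(void) {
    int fd = open("/tmp/arena.bin", O_RDWR);                 /* hypothetical file */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    Record *r = (Record *)((char *)p + 200);                 /* the object at byte offset 200 */
    const char *name = (const char *)p + r->nameOffset;     /* follow an "offset pointer" */
    (void)name;

    munmap(p, 4096);
    close(fd);
}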

Related

When to use dispatch_once versus reallocation? (Cocoa/CocoaTouch)

I often use simple non-compile-time immutable objects: like an array @[@"a", @"b"] or a dictionary @{@"a": @"b"}.
I struggle between reallocating them all the time:
- (void)doSomeStuff {
    NSArray<NSString *> *fileTypes = @[@"h", @"m"];
    // use fileTypes
}
And allocating them once:
- (void)doSomeStuff {
    static NSArray<NSString *> *fileTypes;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        fileTypes = @[@"h", @"m"];
    });
    // use fileTypes
}
Are there recommendations on when to use each construct? Like:
depending on the size of the allocated object
depending on the frequency of the allocation
depending on the device (iPhone 4 vs iMac 2016)
...
How do I figure it out?
Your bullet list is a good start. Memory would be another consideration. A static variable will stay in memory from the time it is actually initialized until the termination of the app. Obviously a local variable will be deallocated at the end of the method (assuming no further references).
Readability is something to consider too. The reallocation line is much easier to read than the dispatch_once setup.
For a little array, I'd reallocate as the first choice. The overhead is tiny. Unless you are creating the array in a tight loop, the performance impact will be negligible.
I would use dispatch_once and a static variable for things that take more overhead such as creating a date formatter. But then there is the overhead of reacting to the user changing the device's locale.
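As a hedged illustration of that "expensive to create" case, here is a minimal sketch of a date formatter cached behind dispatch_once (the helper name and format are made up, not from the answer):

#import <Foundation/Foundation.h>

static NSDateFormatter *SharedISOFormatter(void) {
    static NSDateFormatter *formatter;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        formatter = [[NSDateFormatter alloc] init];   // costly to create, so do it once
        formatter.dateFormat = @"yyyy-MM-dd";
        formatter.locale = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];
    });
    return formatter;   // note: does not react to locale changes, as mentioned above
}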
In the end, my thought process is to first use reallocation. Then I consider whether there is tangible benefit to using static and dispatch_once. If there isn't a worthwhile reason to use a static, I leave it a local variable.
Use static if the overhead (speed) of reallocation is too much (but not if the permanent memory hit is too large).
Your second approach is more complex, so use it only when it is needed, not as a default.
Typically that is when object creation is extremely expensive (almost never the case) or when you need a single identity for the instance (a shared instance, often incorrectly called a singleton). In such a case you will recognize that you need it.
Although this question could probably be closed as "primarily opinion-based", I think this question addresses a choice that programmers frequently make.
The obvious answer is: don't optimize, and if you do, profile first - because most of the time you'll intuitively address the wrong part of your code.
That being said, here's how I address this.
If the needed object is expensive to build (like the formatters per documentation) and lightweight on resources, reuse / cache the object.
If the object is cheap to build, create a new instance when needed.
For crucial things that run on the main thread (UI stuff) I tend to draw the line more strictly and cache earlier.
Depending on what you need those objects for, sometimes there are valid alternatives that are cheaper but offer similar programming comfort. For example, if you wanted to look up some chessboard coordinates, you could either build a dictionary @{@"a1" : @0, @"b1" : @1, ...}, use an array alone and its indices, take a plain C array (which carries a much smaller price tag), or do a small integer-based calculation, as in the sketch below.
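A hedged sketch of that integer-based alternative (the helper name is hypothetical):

#import <Foundation/Foundation.h>

// Map a coordinate such as @"b1" to a 0-based square index without any dictionary.
static NSInteger SquareIndex(NSString *coordinate) {
    unichar file = [coordinate characterAtIndex:0];   // 'a'..'h'
    unichar rank = [coordinate characterAtIndex:1];   // '1'..'8'
    return (rank - '1') * 8 + (file - 'a');           // 0..63
}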
Another point to consider is where to put the cached object. For example, you could store it statically in the method as in your example. Or you could make it an instance variable. Or a class property / static variable. Sometimes caching gets you only halfway to the goal - if you consider a table view with cells that include a date formatter, you can think about reusing the very same formatter for all your cells. Sometimes you can refactor that reused object into a helper object (be it a singleton or not), and address some issues there.
So there is really no silver bullet here, and each situation needs an individual approach. Just don't fall into the trap of premature optimization and trade clear code for bug-inviting, hardly readable stuff that might not matter to your performance but comes with sure drawbacks like an increased memory footprint.
dispatch_once_t
This provides two major benefits: 1) a block is guaranteed to be executed only once during the lifetime of an application run, and 2) it can be used to implement lazy initialization, as described in the man page below.
From the OS X Man Page for dispatch_once():
The dispatch_once() function provides a simple and efficient mechanism to run an initializer exactly once, similar to pthread_once(3). Well designed code hides the use of lazy initialization.
Some use cases for dispatch_once_t
Singleton
Initialization of a file system resource, such as a file handle
Any static variable that would be shared by a group of instances, and takes up a lot of memory
static, without dispatch_once_t
A statically declared variable that is not wrapped in a dispatch_once block still has the benefit of being shared by many instances. For example, if you have a static variable called defaultColor, all instances of the object see the same value. It is therefore class-specific, instead of instance-specific.
However, any time you need a guarantee that a block will be called only once, you will need to use dispatch_once_t.
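A minimal sketch of the static-without-dispatch_once pattern above, using a hypothetical Widget class (iOS-flavoured). The lazy initialization here is not guaranteed to happen exactly once if two threads race, which is precisely what dispatch_once would add:

#import <UIKit/UIKit.h>

@interface Widget : NSObject
+ (UIColor *)defaultColor;
@end

@implementation Widget

static UIColor *defaultColor = nil;   // one value, shared by every instance of Widget

+ (UIColor *)defaultColor {
    if (defaultColor == nil) {
        defaultColor = [UIColor blueColor];   // lazy, but not thread-safe on its own
    }
    return defaultColor;
}

@end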
Immutability
You also mentioned immutability. Immutability is independent of the concern of running something once and only once--so there are cases for both static immutable variables and instance immutable variables. For instance, there may be times when you need an immutable object initialized, but it still may be different for each instance (in cases where its value depends on other instance variables). In that case, the immutable object is not static, and may still be initialized with different values by different instances. In this case, a property is derived from other instance variables, and therefore should not be allowed to be changed externally.
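A sketch of that per-instance immutable case, assuming ARC and a hypothetical Person class: the value is readonly to the outside world, yet differs per instance because it is derived from other instance variables at init time.

#import <Foundation/Foundation.h>

@interface Person : NSObject
@property (nonatomic, copy, readonly) NSString *fullName;
- (instancetype)initWithFirstName:(NSString *)first lastName:(NSString *)last;
@end

@implementation Person
- (instancetype)initWithFirstName:(NSString *)first lastName:(NSString *)last {
    if ((self = [super init])) {
        _fullName = [NSString stringWithFormat:@"%@ %@", first, last];   // derived once, then never changed
    }
    return self;
}
@end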
A note on immutability vs mutability, from Concepts in Objective-C Programming:
Consider a scenario where all objects are capable of being mutated. In your application you invoke a method and are handed back a reference to an object representing a string. You use this string in your user interface to identify a particular piece of data. Now another subsystem in your application gets its own reference to that same string and decides to mutate it. Suddenly your label has changed out from under you. Things can become even more dire if, for instance, you get a reference to an array that you use to populate a table view. The user selects a row corresponding to an object in the array that has been removed by some code elsewhere in the program, and problems ensue. Immutability is a guarantee that an object won’t unexpectedly change in value while you’re using it.
Objects that are good candidates for immutability are ones that encapsulate collections of discrete values or contain values that are stored in buffers (which are themselves kinds of collections, either of characters or bytes). But not all such value objects necessarily benefit from having mutable versions. Objects that contain a single simple value, such as instances of NSNumber or NSDate, are not good candidates for mutability. When the represented value changes in these cases, it makes more sense to replace the old instance with a new instance.
A note on performance, from the same reference:
Performance is also a reason for immutable versions of objects representing things such as strings and dictionaries. Mutable objects for basic entities such as strings and dictionaries bring some overhead with them. Because they must dynamically manage a changeable backing store—allocating and deallocating chunks of memory as needed—mutable objects can be less efficient than their immutable counterparts.

What does "live in the heap" mean?

I'm learning Objective-C, and I hear the term "live in the heap" constantly. From what I understand it's some kind of unknown area that a pointer lives in, but I'm trying to really get my head around the exact term... like "we should make our property strong so it won't live in the heap". He said that since the property is private. I know it's a big difference. It's pretty clear that we want to make sure that we count the references to this object so the autorelease won't clean it up (we want to "retain" it, from what I know so far), but I want to make sure I understand the term since it's used pretty often.
Appreciate it
There are three major memory areas used by C (and by extension, Objective C) programs for storing the data:
The static area
The automatic area (also known as "the stack"), and
The dynamic area (also known as "the heap").
When you allocate objects by sending their class a new or alloc message, the resultant object is allocated in the dynamic storage area, so the object is said to live in the heap. All Objective-C objects are like that (although the pointers that reference these objects may be in any of the three memory data areas). In contrast, primitive local variables and arrays "live" on the stack, while global primitive variables and arrays live in the static data storage.
Only the heap objects are reference counted, although you can allocate memory from the heap using malloc/calloc/realloc, in which case the allocation would not be reference-counted: your code would be responsible for deciding when to free the allocated dynamic memory.
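A small sketch tying the three areas together (manual retain/release assumed, names made up):

#import <Foundation/Foundation.h>
#include <stdlib.h>

static int gLaunchCount;                 // static storage area: exists for the whole run

static void Demo(void) {
    int local = 42;                      // automatic storage ("the stack"): gone when the function returns
    NSString *s = [[NSString alloc] initWithFormat:@"%d", local];   // object: lives on the heap, reference counted
    char *buf = malloc(64);              // heap as well, but NOT reference counted
    free(buf);                           // so your code decides when to give it back
    [s release];                         // reference-counted release (written for you under ARC)
    (void)gLaunchCount;
}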

Can I snapshot and restore memory of Objective-C object graph?

I'm designing object persistence code.
IMO, a memory snapshot is the fastest, most reliable, and most compact persistence method, within a few limitations.
It's easy with C structs. I can lay out all the objects' memory manually. I can save all references as indices into an object collection, so references are not a problem.
Anyway, I want to try this with Objective-C objects. To do this, objects must be positioned at specific locations in memory. So, if I can specify the memory location at allocation time, I can snapshot the memory, and when restoring, I can get an object back at a specific address.
Of course, all of this is machine-specific and needs many tricks, but that's fine with me.
The only problem is that I don't know a way to specify the location of a new Objective-C object. How can I do this?
Generally people use NSCoding and NSKeyedArchiver (or some custom subclass thereof). I think your C method would have worked before the 64-bit runtime, since the data part of objects was implemented using structs, but I think the new runtime's use of nonfragile instance variables would complicate matters. In any event, the program that loads the persistent objects still has to have the class definitions for them, either hard-coded or loaded via bundles.
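For reference, a minimal NSCoding sketch along those lines, using a hypothetical Point class and the older, non-secure-coding archiver calls:

#import <Foundation/Foundation.h>

@interface Point : NSObject <NSCoding>
@property (nonatomic) double x, y;
@end

@implementation Point
- (void)encodeWithCoder:(NSCoder *)coder {
    [coder encodeDouble:self.x forKey:@"x"];
    [coder encodeDouble:self.y forKey:@"y"];
}
- (instancetype)initWithCoder:(NSCoder *)coder {
    if ((self = [super init])) {
        _x = [coder decodeDoubleForKey:@"x"];
        _y = [coder decodeDoubleForKey:@"y"];
    }
    return self;
}
@end

// Archiving and restoring (the class definition must still be present, as noted above):
// NSData *data = [NSKeyedArchiver archivedDataWithRootObject:point];
// Point *restored = [NSKeyedUnarchiver unarchiveObjectWithData:data];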

NSString pointer: theoretical questions after study

I have questions about the NSString pointer. I wanted to get to the bottom of this and actually tried to build a theory to obtain full understanding, based on multiple pieces of information retrieved from the net. Please believe me when I say that I have not been lazy, and I actually read a lot, but I was still left with uncertainties and questions. Can you please confirm or deny where I'm right or wrong? I've added additional questions and doubts, which are indicated by (?).
Here we go:
If I consider this very basic example:
NSString *sPointer = [[NSString alloc] initWithString:@"This is a pointer"];
[sPointer release];
My starting point was: the compiler reserves RAM for a pointer type, and this memory (which also has its own address) contains the memory address (hexadecimal/binary) of the memory where another variable is stored (where it points to). The actual pointer would take up about 2 bytes:
1) First some general issue - not necessarily linked to objective C. The actual questions about the NSString pointer will come in point 2.
A string is a "string of characters", where 1 character takes a fixed amount of memory space, let's say 2 bytes.
This automatically means that the memory size occupied by a string variable is defined by the length of the character string.
Then I read this on Wikipedia:
"In modern byte-addressable computers, each address identifies a single byte of storage; data too large to be stored in a single byte may reside in multiple bytes occupying a sequence of consecutive addresses. "
So in this case, the string value is actually contained in multiple addresses and not in a single one (this already differs from what I read everywhere) (?).
How are these multiple addresses contained in one pointer in reality? Will the pointer be divided into multiple addresses as well?
Do you know what component in a computer actually identifies and assigns the actual address "codes"?
And now my actual questions;
2) In my example, the code does 2 things:
It creates a pointer to the address where the string variable is stored.
It also actually saves the actual "string variable": @"This is a pointer", because otherwise there would be nothing to point to;
OK, my question; I wondered what really happens when you release the pointer [sPointer release]; are you actually releasing the pointer (containing the address), or are you also releasing the actual "string variable" from memory as well?
I have learned that when you delete the reference, the memory where the actual variable is stored will just be overwritten at the time the compiler needs memory so it does not need to be cleared at that time. Is this wrong?
If it is correct, why do they say that it's really important to release the NSString pointer for performance reasons, if you just release the pointer which will basically contain only a few bytes? Or am I wrong, and is the memory where the actual variable is stored in fact also cleared at once with the "release" message?
And finally also: primitive datatypes are not released, but they "do" take memory space at the moment of declaration (but not more than a common pointer). Why shouldn't we release them, in fact? What prevents us from doing something like int i = 5, followed by [i release]?
I'm sorry - a lot of questions at once! In practice, I never had problems with it, but in theory, I also really want to fully understand it - and I hope that I'm not the only one. Can we discuss the topic?
Thank you and sorry for the bother!
Maybe I'm wrong but I just read yesterday that pointers usually take 4 bytes. That answers none of your questions but you seem really interested in this so I figured I would mention it.
I think the source of your confusion is that you are confusing primitives with Objective-C classes. Objective-C classes (or objects to be exact, instances of classes) can accept messages (similar to method invocations in other languages). retain is one such message. This is why an Objective-C NSString object can receive the retain message but not a primitive like an integer. I think that's another one of your confusions. retain and release etc. are not Objective-C language constructs, they're actual messages (think methods) you send to objects. That is why they apply to Objective-C objects but not to primitives like integers and floats.
Another similar confusion is that what you read about how strings are stored has more to do with C-style strings, like char *name = "john". However, when you create a pointer to an NSString, it points to an NSString instance, which itself decides how to handle storing the actual string bytes/characters, which may or may not be the same way that C strings are stored.
"...data too large to be stored in a single byte may reside in multiple bytes occupying a sequence of consecutive addresses." So in this case, the string value is actually contained by multiple addresses and not by a single one (this already differs from what I read everywhere) (?). How are these multiple addresses contained in one pointer in reality?
In C for example, the pointer would point to the address of the first character in the string.
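A tiny illustration of that distinction (just a sketch):

#import <Foundation/Foundation.h>

const char *name = "john";   // C string: 'name' holds the address of 'j'; the bytes j, o, h, n, '\0' sit consecutively
NSString *s = @"john";       // Objective-C: 's' points to an NSString object, which manages its own character storage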
OK, my question; I wondered what really happens when you release the pointer [sPointer release]; are you actually releasing the pointer (containing the address), or are you also releasing the actual "string variable" from memory as well?
You are sending the release message to the NSString instance/object. This is important to note to avoid further confusion. You are not acting upon the pointer itself, but upon what the pointer is pointing to, which is the NSString object. So you are not releasing the pointer itself. After sending the object the release method, if its reference count has reached 0, then it will handle deallocating itself by deallocating everything it stores, which I imagine includes the actual character string.
If it is correct, why do they say that it's really important to release the NSString pointer for performance reasons, if you just release the pointer which will basically contain only a few bytes?
So yeah, you're actually sending the release message to the string instance and it handles how to deallocate itself if it has to. If you were to simply erase the pointer so that it no longer points at the string instance, then you simply will no longer know where/how to access the data stored at that location, but it won't make it magically disappear, the program won't automatically know that it can use that memory. What you're hinting at is garbage collection, in which, simply put, unused memory will automatically be freed for subsequent use. Objective-C 2.0 does have garbage collection but as far as I know it's not enabled on iOS devices yet. Instead, the new version of iOS will support a feature known as Automatic Reference Counting in which the compiler itself takes care of reference counting.
Sorry if I didn't answer all of your questions, you asked a ton :P If any of my information is wrong please let me know! I tried to limit my answer to what I felt I did know.
"Or am I wrong, and is the memory where the actual variable is stored in fact also cleared at once with the "release" message?" The memory IS NOT CLEARED, but goes into the free memory pool, so that it does in fact reduce the memory print of the program. If you did not release the pointer you would continue to "hog" memory until you consume all the virtual memory available and crash not only your program but potentiality the system as well.
How are these multiple addresses contained in 1 pointer in reality? Will the pointer be divided into multiple addresses as well? Do you know what component in a computer actually identifies and assigns the actual address "codes"?
From the perspective of the programmer (take note of this), the pointer itself is usually a 4-byte number that represents an offset from the start of memory (32 bits; on 64-bit systems addresses can be up to 8 bytes). The thing is that this pointer points to the start of whatever is being stored, and that's it.
In C, for example, the original strings were NULL-terminated (\0) to identify where a string ends (try doing a printf() in C with a string that isn't zero-terminated, and it will print whatever is in memory until it finds a zero). This of course is pretty dangerous, and one has to use functions like strncpy (notice the "n"), where you manually pass the number of chars to copy.
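A short, hedged example of that terminator pitfall and the strncpy workaround:

#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[6];
    strncpy(buf, "hello world", sizeof(buf) - 1);   /* copy at most 5 characters */
    buf[sizeof(buf) - 1] = '\0';                    /* strncpy does not always add the terminator itself */
    printf("%s\n", buf);                            /* prints "hello" */
    return 0;
}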
A way to circumvent this is to store the used size at the start of the allocation, something like
struct
{
    int size;
    char *string;
} string;
That stores the size to prevent any issues. Objective-C and many other, more abstract languages implement their own ways of handling memory. NSString is abstract enough that it's hard to tell what happens behind the scenes; it probably inherits its memory management from NSObject.
The whole point I'm trying to get at is that a pointer contains the starting address, and you can jump from byte to byte from there (or in jumps of a certain size), keeping in mind the total length of whatever you're storing to avoid doing nasty things like overflowing the stack memory (hence, the name of this site).
Now, how the computer assigns these addresses is entirely up to the operating system, and the logical memory addresses you use in your programs are quite different from what the underlying implementation uses (physical memory addresses). Typically, physical memory is divided into fixed-size units called "frames", the corresponding units of a process's logical memory are called "pages", and the mapping between physical and logical is done with a page table.
As you can see, the software handles pretty much everything, but that doesn't mean there isn't hardware to support it, like the TLB, a CPU-level cache that holds recently used address translations for quick access.
Also, please take my answer with a grain of salt, it's been a while since I studied these subjects.
OK, my question; I wondered what really happens when you release the pointer [sPointer release]; are you actually releasing the pointer (containing the address), or are you also releasing the actual "string variable" from memory as well? I have learned that when you delete the reference, the memory where the actual variable is stored will just be overwritten at the time the compiler needs memory so it does not need to be cleared at that time. Is this wrong? If it is correct, why do they say that it's really important to release the NSString pointer for performance reasons, if you just release the pointer which will basically contain only a few bytes? Or am I wrong, and is the memory where the actual variable is stored in fact also cleared at once with the "release" message?
When you release, you're just decreasing the reference count of the object. What you mean is what happens when it's dealloced (when the count reaches zero).
When you dealloc something, you're basically saying that the space that was reserved for it is now free to be reused by anything else requesting memory (through alloc). The variable may still point to the freed space, and this causes problems (read about dangling pointers and leaks).
The memory might be cleared, but there's no guarantee.
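A small sketch of that dangling-pointer situation (manual retain/release, as in the question):

#import <Foundation/Foundation.h>

static void DanglingExample(void) {
    NSString *s = [[NSString alloc] initWithFormat:@"count = %d", 42];
    [s release];         // reference count drops to zero; the object can be deallocated
    // 's' still holds the old address: a dangling pointer.
    // NSLog(@"%@", s);  // undefined behaviour: may crash, print garbage, or appear to "work"
    s = nil;             // common defensive habit after a final release
}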
I hope this clears all the doubts, as all of them spawn from your confusion about memory freeing.
And finally also: primitive datatypes are not released, but they "do" take memory space at the moment of declaration (but not more than a common pointer). Why shouldn't we release them in fact? What prevents us from doing something like: int i = 5, followed by [i release]?
The thing is, C has two main memory regions in play (actually, a lot more): the heap, which stores memory that has been requested with alloc (or malloc in C) and which must be freed, and the stack, which holds local variables that die when the function/block ends (the stack pops the function call).
In your example, the variable i has been declared locally within its scope, and it is contained in the stack. Trying to perform a dealloc/free won't work (also, the variable i won't respond to release, nor dealloc, as it's not an object), as it is not the type of memory that needs to be freed.
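A tiny illustration of that point (a sketch, not from the answer):

#import <Foundation/Foundation.h>

static void PrimitiveVsObject(void) {
    int i = 5;                                  // automatic (stack) storage: nothing to release, it simply goes away
    // [i release];                             // will not even compile: 'i' is not an object and cannot receive messages
    NSNumber *n = [NSNumber numberWithInt:i];   // if you need an object, box the value; the NSNumber lives on the heap
    (void)n;
}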
I suggest going back to C before trying to tackle what Objective-C does, because it's hard to get a clear idea of how the imperative programming underneath works when it's hidden behind all the nice abstractions like release and dealloc.
For the sake of the forum, I will make a brief and simplified summary of your answers as a conclusion. Thanks to all of you for this extended clarification, the mist disappeared! Don't hesitate to react in case you want to add or rectify something:
Rectification: a pointer on the Mac takes 4 bytes of memory space in a 32-bit executable (8 bytes in a 64-bit executable), not 2.
The pointer *sPointer points to an instance of the NSString class and NOT directly to the memory where the chars are saved.
The NSString instance consists of a set of ivars, among which there is a pointer ivar that points to allocated memory where the characters that make up the string are stored (set up when using the initWithString: instance method).
[sPointer release]; The release message is not sent to the pointer itself, but to the instance of the NSString Object. You are not acting on the pointer itself, but on what the pointer is pointing to (!).
When sending the alloc message, the retain count of the NSString instance is set to 1. Sending a "release" message doesn't mean that the memory concerned is literally emptied; it decreases the retain count by 1. When the retain count reaches zero, the memory manager knows that the previously allocated memory is available again for re-use.
The way memory addresses are presented is decided by the operating system. The logical memory address used in programs is different from what the underlying implementation actually uses (physical memory address).
LOCAL variables (not necessarily primitive variables) are stored in stack memory (unlike object instances, which are stored in heap memory). This means they are destroyed automatically at the end of a function (they are automatically removed from the stack). More info on the stack and heap memory constructs can be found in several threads that clarify the use and difference in their own way, e.g. What and where are the stack and heap? / http://ee.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.8.html
Before I answer the points, you start with a false premise. A pointer takes up more than 2 bytes on a non-16-bit system. On the Mac, it takes up 4 bytes in a 32-bit executable, and 8 bytes in a 64-bit executable.
Let me note that the following is not entirely accurate (for optimization and some other reasons, there are several kinds of internal representations of strings, and the initXXX methods decide which is instantiated), but for the sake of using and understanding strings, the explanation is good enough.
An NSString is a class (and a rather complicated one as well). A string, i.e. an instance of that class, contains some administrative ivars and one that is another pointer which points to a piece of allocated memory big enough to at least hold the bytes/code points that make up the string. Your code (the alloc method, to be precise) reserves enough memory to contain all the ivars of the object (including the pointer to the buffer) and returns a pointer to that memory. That is what you store in your pointer (if initWithString: doesn't change it -- but I won't go into that here, let's assume it doesn't). If necessary, initWithString: allocates the buffer large enough to hold the text of the string and stores its memory in the pointer for it, inside the NSString instance. So it is like this:
sPointer                          NSString instance         buffer
+---------------------------+     +-----------------+       +------+
| addr of NSString instance | --> | ivar            |  +--> | char |
+---------------------------+     | ivar            |  |    | char |
                                  | ivar (pointer)  | -+    | char |
                                  | ivar            |       | char |
                                  | etc...          |       | char |
                                  +-----------------+       | char |
                                                            | etc. |
                                                            +------+
In the case of a hard-coded, literal string like @"Hello", the internal pointer just points to that string, which is already stored in the program, in read-only memory. No memory has to be allocated for it, and that memory can't be freed either.
But let's assume you have a string with allocated contents. release (either coded manually or invoked by an autorelease pool) will decrement the reference count of the string object (the so-called retainCount). If that count reaches zero, your instance of the NSString class will be deallocated, and in the dealloc method of the string, the buffer holding the string text will be released. That memory is not cleared in any way; it is only marked as free by a memory manager, which means it can be re-used again for some other purpose.
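A sketch of that ownership chain under manual retain/release, using a hypothetical TextHolder class: the object owns a buffer and frees it in -dealloc, which only runs once the retain count reaches zero.

#import <Foundation/Foundation.h>
#include <string.h>
#include <stdlib.h>

@interface TextHolder : NSObject {
    char *_buffer;
}
- (instancetype)initWithCString:(const char *)text;
@end

@implementation TextHolder
- (instancetype)initWithCString:(const char *)text {
    if ((self = [super init])) {
        _buffer = strdup(text);   // allocate the backing storage on the heap
    }
    return self;
}
- (void)dealloc {
    free(_buffer);                // handed back to the memory manager: marked free, not cleared
    [super dealloc];              // pre-ARC: always call super last
}
@end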

How does it know where my value is in memory?

When I write a program and tell it int c=5, it puts the value 5 into a little bit of its memory, but how does it remember which one? The only way I could think of would be to have another bit of memory to tell it, but then it would have to remember where it kept that as well, so how does it remember where everything is?
Your code gets compiled before execution; at that step your variable will be replaced by the actual reference to the space where the value will be stored.
This at least is the general principle. In reality it will be way more complicated, but still the same basic idea.
There are lots of good answers here, but they all seem to miss one important point that I think was the main thrust of the OP's question, so here goes. I'm talking about compiled languages like C++, interpreted ones are much more complex.
When compiling your program, the compiler examines your code to find all the variables. Some variables are going to be global (or static), and some are going to be local. For the static variables, it assigns them fixed memory addresses. These addresses are likely to be sequential, and they start at some specific value. Due to the segmentation of memory on most architectures (and the virtual memory mechanisms), every application can (potentially) use the same memory addresses. Thus, if we assume the memory space programs are allowed to use starts at 0 for our example, every program you compile will put the first global variable at location 0. If that variable was 4 bytes, the next one would be at location 4, etc. These won't conflict with other programs running on your system because they're actually being mapped to an arbitrary sequential section of memory at run time. This is why it can assign a fixed address at compile time without worrying about hitting other programs.
For local variables, instead of being assigned a fixed address, they're assigned a fixed address relative to the stack pointer (which is usually a register). When a function is called that allocates variables on the stack, the stack pointer is simply moved by the required number of bytes, creating a gap in the used bytes on the stack. All the local variables are assigned fixed offsets to the stack pointer that put them into that gap. Every time a local variable is used, the real memory address is calculated by adding the stack pointer and the offset (neglecting caching values in registers). When the function returns, the stack pointer is reset to the way it was before the function was called, thus the entire stack frame including local variables is free to be overwritten by the next function call.
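A small C illustration of that stack-frame mechanism; the addresses printed on the second call are typically (though not necessarily) identical to the first, because the same frame space is reused:

#include <stdio.h>

static void frame(void) {
    int a = 1, b = 2;                                     /* both live at fixed offsets from the stack pointer */
    printf("a at %p, b at %p\n", (void *)&a, (void *)&b);
}

int main(void) {
    frame();   /* a stack frame is carved out for a and b */
    frame();   /* the frame space is reused after the first call returns */
    return 0;
}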
read Variable (programming) - Memory allocation:
http://en.wikipedia.org/wiki/Variable_(programming)#Memory_allocation
here is the text from the link (if you don't want to actually go there, but you are missing all the links within the text):
The specifics of variable allocation and the representation of their values vary widely, both among programming languages and among implementations of a given language. Many language implementations allocate space for local variables, whose extent lasts for a single function call on the call stack, and whose memory is automatically reclaimed when the function returns. (More generally, in name binding, the name of a variable is bound to the address of some particular block (contiguous sequence) of bytes in memory, and operations on the variable manipulate that block.) Referencing is more common for variables whose values have large or unknown sizes when the code is compiled. Such variables reference the location of the value instead of storing the value itself, which is allocated from a pool of memory called the heap.
Bound variables have values. A value, however, is an abstraction, an idea; in implementation, a value is represented by some data object, which is stored somewhere in computer memory. The program, or the runtime environment, must set aside memory for each data object and, since memory is finite, ensure that this memory is yielded for reuse when the object is no longer needed to represent some variable's value.
Objects allocated from the heap must be reclaimed—especially when the objects are no longer needed. In a garbage-collected language (such as C#, Java, and Lisp), the runtime environment automatically reclaims objects when extant variables can no longer refer to them. In non-garbage-collected languages, such as C, the program (and the programmer) must explicitly allocate memory, and then later free it, to reclaim its memory. Failure to do so leads to memory leaks, in which the heap is depleted as the program runs, risking eventual failure from exhausting available memory.
When a variable refers to a data structure created dynamically, some of its components may be only indirectly accessed through the variable. In such circumstances, garbage collectors (or analogous program features in languages that lack garbage collectors) must deal with a case where only a portion of the memory reachable from the variable needs to be reclaimed.
There's a multi-step dance that turns c = 5 into machine instructions to update a location in memory.
The compiler generates code in two parts. There's the instruction part (load a register with the address of C; load a register with the literal 5; store). And there's a data allocation part (leave 4 bytes of room at offset 0 for a variable known as "C").
A "linking loader" has to put this stuff into memory in a way that the OS will be able to run it. The loader requests memory and the OS allocates some blocks of virtual memory. The OS also maps the virtual memory to physical memory through an unrelated set of management mechanisms.
The loader puts the data page into one place and instruction part into another place. Notice that the instructions use relative addresses (an offset of 0 into the data page). The loader provides the actual location of the data page so that the instructions can resolve the real address.
When the actual "store" instruction is executed, the OS has to see if the referenced data page is actually in physical memory. It may be in the swap file and have to get loaded into physical memory. The virtual address being used is translated to a physical address of memory locations.
It's built into the program.
Basically, when a program is compiled into machine language, it becomes a series of instructions. Some instructions have memory addresses built into them, and this is the "end of the chain", so to speak. The compiler decides where each variable will be and burns this information into the executable file. (Remember the compiler is a DIFFERENT program to the program you are writing; just concentrate on how your own program works for the moment.)
For example,
ADD [1A56], 15
might add 15 to the value at location 1A56. (This instruction would be encoded using some code that the processor understands, but I won't explain that.)
Now, other instructions let you use a "variable" memory address - a memory address that was itself loaded from some location. This is the basis of pointers in C. You certainly can't have an infinite chain of these, otherwise you would run out of memory.
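In C terms, that "variable memory address" idea is just a pointer; a tiny sketch:

void example(void) {
    int c = 5;      /* the compiler arranges a location for c (here, on the stack) */
    int *p = &c;    /* p is itself a variable whose value is that location */
    *p = *p + 15;   /* add 15 to whatever lives at the address stored in p; c is now 20 */
}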
I hope that clears things up.
I'm going to phrase my response in very basic terminology. Please don't be insulted, I'm just not sure how proficient you already are and want to provide an answer acceptable to someone who could be a total beginner.
You aren't actually that far off in your assumption. The program you run your code through, usually called a compiler (or interpreter, depending on the language), keeps track of all the variables you use. You can think of your variables as a series of bins, and the individual pieces of data are kept inside these bins. The bins have labels on them, and when you build your source code into a program you can run, all of the labels are carried forward. The compiler takes care of this for you, so when you run the program, the proper things are fetched from their respective bin.
The variables you use are just another layer of labels. This makes things easier for you to keep track of. The way the variables are stored internally may have very complex or cryptic labels on them, but all you need to worry about is how you are referring to them in your code. Stay consistent, use good variable names, and keep track of what you're doing with your variables and the compiler/interpreter takes care of handling the low level tasks associated with that. This is a very simple, basic case of variable usage with memory.
You should study pointers.
http://home.netcom.com/~tjensen/ptr/ch1x.htm
Reduced to the bare metal, a variable lookup either reduces to an address that is some statically known offset to a base pointer held in a register (the stack pointer), or it is a constant address (global variable).
In an interpreted language, one register if often reserved to hold a pointer to a data structure (the "environment") that associates variable names with their current values.
Computers ultimately only understand on and off, which we conveniently abstract to binary. This language is the lowest level and is called machine language. I'm not sure if this is folklore, but some programmers used to (or maybe still do) program directly in machine language. Typing or reading binary would be very cumbersome, which is why hexadecimal is often used to abbreviate the actual binary.
Because most of us are not savants, machine language is abstracted into assembly language. Assembly is a very primitive language that directly controls memory. There are a very limited number of commands (push/pop/add/goto), but these ultimately accomplish everything that is programmed. Different machine architectures have different versions of assembly, but the gist is that there are a few dozen key memory registers (physically in the CPU) - in the x86 architecture they are EAX, EBX, ECX, EDX, ... These contain data or pointers that the CPU uses to figure out what to do next. The CPU can only do one thing at a time and it uses these registers to figure out what to do next. Computers seem to be able to do lots of things simultaneously because the CPU can process these instructions very quickly (millions/billions of instructions per second). Of course, multi-core processors complicate things, but let's not go there...
Because most of us are not smart or accurate enough to program in assembly, where you can easily crash the system, assembly is further abstracted into a 3rd generation language (3GL) - this is your C/C++/C#/Java etc... When you tell one of these languages to put the integer value 5 in a variable, your instructions are stored as text; the compiler turns that text into an executable; when the program is executed, its instructions are queued up for the CPU, and when it is show time for that specific line of code, the instruction is read into the CPU registers and processed.
The 'not smart enough' comments about the languages are a bit tongue-in-cheek. Theoretically, the further you get away from zeros and ones to plain human language, the more quickly and efficiently you should be able to produce code.
There is an important flaw here that a few people fall into, which is assuming that all variables are stored in memory. Well, unless you count the CPU registers as memory, that isn't completely right. Some compilers will optimize the generated code, and if they can keep a variable stored in a register, some compilers will make use of this!
Then, of course, there's the complex matter of heap and stack memory. Local variables can be located in both! The preferred location is the stack, which is accessed far more often than the heap. This is the case for almost all local variables. Global variables are often part of the data segment of the final executable and tend to become part of the heap, although you can't release these global memory areas. The heap, though, is mostly used for on-the-fly allocations of new memory blocks.
But with global variables, the code will know exactly where they are and thus writes their exact location into the code. (Well, their location relative to the beginning of the data segment, anyway.) Register variables are located in the CPU, and the compiler knows exactly which register, which is also just told to the code. Stack variables are located at an offset from the current stack pointer. This stack pointer will increase and decrease all the time, depending on the number of levels of procedures calling other procedures.
Only heap values are complex. When the application needs to store data on the heap, it needs a second variable to store its address, otherwise it could lose track. This second variable is called a pointer and is located in global data or as part of the stack. (Or, on rare occasions, in the CPU registers.)
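A short sketch of that heap case (names made up): the block itself lives on the heap, while the pointer variable, here a local and therefore on the stack, is the only record of where the block is.

#include <stdlib.h>

void heap_example(void) {
    int *values = malloc(100 * sizeof *values);   /* the block lives on the heap */
    if (values != NULL) {
        values[0] = 42;
        free(values);                             /* hand it back; 'values' would now dangle */
    }
}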
Oh, it's even a bit more complex than this, but already I can see some eyes rolling due to this information overkill. :-)
Think of memory as a drawer into which you decide how to divide things according to your spontaneous needs.
When you declare a variable of type integer or any other type, the compiler or interpreter (whichever) allocates a memory address in its data segment (the DS register in assembler) and reserves a certain number of the following addresses depending on your type's length in bits.
As per your question, an integer is 32 bits long, so, from one given address, let's say D003F8AC, the 32 bits (4 bytes) following this address will be reserved for your declared integer.
At compile time, wherever you reference your variable, the generated assembler code will replace it with its DS address. So, when you get the value of your variable c, the processor queries the address D003F8AC and retrieves it.
Hope this helps, since you already have many answers. :-)