I've just built a simple RAM in Minecraft (with redstone), with 4 bits for the address and 4 bits stored in each cell. Our next goal is to store different kinds of variables in it and to process them differently.
We are not engineers, so we don't really know what we're doing, but we have built some quite complex things and we think we can do this. The problem is that we can't figure out how to store variables with more bits than fit in a single cell. I'll give an example.
Think of a 16-bit variable. We thought there was no sense in creating big cells, so we decided to store the data 4 bits per cell. But that alone isn't enough: we had to relate those 4 cells somehow. So we thought we would create 8-bit cells, with 4 bits of content and 4 bits for the address where the next 4 bits of the variable are stored. However, 4 bits of address is nothing for a RAM; we could hardly store anything. So we would need at least 8 bits for the address. 4 bits of content also seems quite low, and we would need at least another 4 bits to store the type of the variable.
Well, in the end we decided that technique was absurd and that it couldn't be done like that in real life. And now we don't know how to do it. I've searched the web for how RAM works, and the little I found was too complex for our needs.
Could someone please explain to us how this is done in real life?
Heh, you're playing the blame game, trying to pin all the responsibility for memory management on the physical RAM implementation.
In fact, RAM is just that: a storage device (your redstone tiles). Actually storing data in it is your program's responsibility. Put another way, there doesn't need to be a standardized memory-cell "linking" strategy for RAM, because it's your program that writes to it and later reads it back, so it knows its own conventions.
With that in mind, storing values is easy. Say you want a 16-bit integer stored in your 4-bit/word RAM (so 4 words of data). Simply treat addresses 0 through 3 as your variable and that's it. No "linking" is necessary, because you are the one who both reads from it and writes to it, and you won't step on your own toes (in theory).
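If it helps, here's a minimal sketch of that convention in C++ (the 16-word RAM array, the function names, and the choice of address 0 are all made up for illustration):

#include <cstdint>
#include <cstdio>

uint8_t ram[16]; // 4-bit address space, 4-bit words (low nibble of each byte)

// Store a 16-bit value in four consecutive 4-bit words, low nibble first.
void store16(uint8_t addr, uint16_t value) {
    for (int i = 0; i < 4; ++i)
        ram[addr + i] = (value >> (4 * i)) & 0xF;
}

// Read it back by reassembling the nibbles in the same order.
uint16_t load16(uint8_t addr) {
    uint16_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= (uint16_t)(ram[addr + i] & 0xF) << (4 * i);
    return value;
}

int main() {
    store16(0, 0xBEEF);          // the "variable" lives at addresses 0..3
    printf("%04X\n", load16(0)); // prints BEEF
}

Note that the "type" information you were worried about lives in the program (it knows that slots 0..3 hold one 16-bit integer), not in the RAM itself.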
Additional thoughts for growing your build: reserve special locations for specialized registers (a stack pointer so you can use a stack for recursive computation, a program counter for a Turing machine, etc.). I had one more but forgot it while writing that one; if I remember it I'll edit.
Related
In C++, the map class is very convenient. Instead of going for a separate database, I want to store all the rows as objects and create a map object over the columns I need to search on. I am concerned about the maximum number of objects a process can handle. And is using a map to retrieve one object among, say, 10 million objects (if Linux permits it) a good choice? I'm not worried about persisting the data.
What you are looking for is std::map::max_size, quoting from the reference:
...reflects the theoretical limit on the size of the container. At runtime, the size of the container may be limited to a value smaller than max_size() by the amount of RAM available.
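For example, a quick way to see the number on your own platform (the output varies with the element type and the architecture):

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> m;
    // Theoretical element-count limit; the practical limit is how much
    // memory the allocator can actually hand out at runtime.
    std::cout << m.max_size() << '\n';
}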
No, there is no maximum number of objects per process. Objects (as in, C++ objects) are an abstraction which the OS is unaware of. The only meaningful limit in this regard is the amount of memory used.
You can completely fill your RAM using as much map as it takes, I promise.
As you can see in the reference documentation, map::max_size() will tell you that number.
This should be 2^31-1 on x86 hardware with a 32-bit OS and 2^64-1 on amd64 hardware with a 64-bit OS.
Objects are a programming-language concept; the process itself is not aware of them. With enough RAM, you can allocate as many objects as you like in your program.
About your second question: which data structure you choose depends on the problem you want to solve. A map is a suitable data structure for quickly accessing objects, testing existence, and so on, but it does not preserve insertion order.
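For a sense of scale, a sketch like the following will happily consume several hundred megabytes of RAM while still giving fast lookups (the figures in the comments are rough):

#include <cstdio>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> rows;
    for (int i = 0; i < 10000000; ++i)      // 10 million entries
        rows[i] = "row " + std::to_string(i);

    // find() is O(log n): roughly 23-24 comparisons for 10 million keys.
    auto it = rows.find(9999999);
    if (it != rows.end())
        std::puts(it->second.c_str());
}

If ordering does not matter at all, std::unordered_map trades the ordering guarantee for O(1) average-case lookups.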
Can optimizers get rid of bad uses of spatial locality? I'm maintaining some code written by somebody else; many of its arrays are declared in haphazard orders and iterated in a different order each time they are used.
Because of the complexity of the code, it would take quite a block of time to rearrange every place the arrays are cycled through. I'm not skilled enough at reading assembly to tell exactly what's different at varying levels of optimization, so my question is:
Is locality important when writing programs, or does it get optimized away so that I don't need to worry about it?
Getting locality right is important, because it can make a difference of two orders of magnitude (5-6 orders of magnitude if you hit page faults) in runtime.
Apart from the fact that real compilers usually don't handle this automatically (as Joel Falcou said), even a hypothetical compiler would have a very hard time doing such a thing. In many cases, it may not even be valid for the compiler to do such a thing, and it is very hard to predict when it is or when it is not.
Say, for example, you have vertex data that you calculate on the CPU, and which you upload to a graphics API such as OpenGL or DirectX. You've agreed with that API a certain vertex data layout. Now the compiler figures that it is more efficient to rearrange the layout in some way. Bang, you're dead.
How was the compiler supposed to know?
Say you have a few arrays and a few pointers, and some pointers alias others, or some point into the middle of an array for some reason, others point at the beginning. The compiler figures that it's more efficient to do certain operations in a different order, overwriting one result with another.
The data-corruption issue aside, let's say those arrays are "somewhat big", so they're almost certainly dynamically allocated rather than on the stack. That means their start addresses are "non-deterministic" or even "random" from the compiler's point of view. How is the compiler supposed to make decisions, at compile time, without knowing half of the details?
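To make the stakes concrete, the textbook illustration is loop order over a 2D array. This particular case is simple enough that some compilers can interchange the loops, but as soon as the aliasing and layout issues described above appear, you're on your own:

#include <cstddef>

const std::size_t N = 1024;
static double a[N][N]; // 8 MB, row-major as C/C++ arrays always are

// Cache-friendly: walks memory contiguously, row by row.
double sum_row_major() {
    double s = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            s += a[i][j];
    return s;
}

// Cache-hostile: each access strides N * sizeof(double) = 8 KB,
// so nearly every load misses the cache for large N.
double sum_col_major() {
    double s = 0.0;
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            s += a[i][j];
    return s;
}

On typical hardware the row-major version can be an order of magnitude faster, which is exactly the effect described above.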
Few to no compilers handle data layout for locality. It's still an active research area.
I'm writing an API that gets information about the CPU (using CPUID). What I'm wondering is: should I store the values from the bit field returned by CPUID in separate integer variables, or should I just store the entire bit field in one value and write functions that extract the different values on the fly?
Which is preferable in this case, memory usage or speed? If it's memory usage, I'll store the entire bit field in a single variable. If it's speed, I'll store each value in a separate variable.
You're only going to query a CPU once. With modern computers having both huge amounts of memory and processing power, it would make no difference either way.
Just do what would make more sense for the next person who reads it.
Programs must be written for people to read, and only incidentally for machines to execute.
— Structure and Interpretation of Computer Programs
I think it does not matter here, because you are not going to call your CPU-ID code 10,000 times per second... are you?
I think you can define a different interface (method) for each value; that is clearer and easier to use. A clear, accurate, easy-to-use interface should be the first consideration; performance (memory usage and speed) comes after that.
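To illustrate the "store the raw bit field, decode on the fly" option, a sketch (the bit positions follow CPUID leaf 1 as documented in the Intel/AMD manuals; double-check them for your target):

#include <cstdint>

struct CpuFeatures {
    uint32_t edx_leaf1; // raw EDX returned by CPUID leaf 1

    // Decoding on the fly costs one shift and one mask -- effectively free.
    bool has_sse()  const { return (edx_leaf1 >> 25) & 1; }
    bool has_sse2() const { return (edx_leaf1 >> 26) & 1; }
};

Either way, the cost is negligible next to the CPUID instruction itself, which is serializing and comparatively slow.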
I am working on embedded software projects in the automotive domain. In one of my projects, the application software consumes almost 99% of RAM; the actual RAM available is 12 KB. We use the TMS470R1B1 Titan F05 microcontroller. I have done some optimisation, like finding unused messages in the software and deleting them, but that still hasn't reduced RAM usage enough. Could you please suggest some good ways to reduce RAM through software optimisation?
Unlike speed optimisation, RAM optimisation might be something that requires "a little bit here, a little bit there" all through the code. On the other hand, there may turn out to be some "low hanging fruit".
Arrays and Lookup Tables
Arrays and look-up tables can be good "low-hanging fruit". If you can get a memory map from the linker, check that for large items in RAM.
Check for look-up tables that haven't used the const declaration properly, which puts them in RAM instead of ROM. Especially look out for look-up tables of pointers, which need the const on the correct side of the *, or may need two const declarations. E.g.:
const my_struct_t * param_lookup[] = {...}; // Table is in RAM!
my_struct_t * const param_lookup[] = {...}; // In ROM
const char * const strings[] = {...}; // Two const may be needed; also in ROM
Stack and heap
Perhaps your linker config reserves large amounts of RAM for heap and stack, larger than necessary for your application.
If you don't use heap, you can possibly eliminate that.
If you measure your stack usage and it's well under the allocation, you may be able to reduce the allocation. For ARM processors, there can be several stacks, for several of the operating modes, and you may find that the stacks allocated for the exception or interrupt operating modes are larger than needed.
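One common way to do that measurement is "stack painting": fill the stack region with a known pattern at startup (normally from the startup code, before main()), run the system through its worst cases, then see how much of the pattern survived. A sketch, using a stand-in array where a real system would take the addresses from the linker map:

#include <stdint.h>

#define STACK_SIZE 1024
static uint8_t stack_region[STACK_SIZE]; // stand-in for your real stack region

// Call once, before the stack region is in use.
void stack_paint(void) {
    for (unsigned i = 0; i < STACK_SIZE; ++i)
        stack_region[i] = 0xAA;
}

// Later: count untouched bytes from the far end (assumes a descending stack).
unsigned stack_unused(void) {
    unsigned i = 0;
    while (i < STACK_SIZE && stack_region[i] == 0xAA)
        ++i;
    return i; // bytes never used -- candidates for reclaiming
}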
Other
If you've checked for the easy savings, and still need more, you might need to go through your code and save "here a little, there a little". You can check things like:
Global vs local variables
Check for unnecessary use of static or global variables where a local variable (on the stack) could be used instead. I've seen code that needed a small temporary array in a function, which was declared static, evidently because "it would take too much stack space". If this happens enough times in the code, making such variables local again would actually reduce total memory usage. It might require an increase in the stack size, but saves more memory in reduced global/static variables. (As a side benefit, the functions are more likely to be re-entrant and thread-safe.)
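A before/after sketch of that change (hypothetical function; the 64 bytes are picked arbitrarily):

#include <stdint.h>

// Before: 64 bytes of RAM permanently reserved, whether or not
// the function is currently running.
void process_before(void) {
    static uint8_t scratch[64];
    scratch[0] = 1; /* ... use scratch ... */
}

// After: the 64 bytes live on the stack only while the function runs,
// and the function becomes re-entrant as a bonus.
void process_after(void) {
    uint8_t scratch[64];
    scratch[0] = 1; /* ... use scratch ... */
}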
Smaller variables
Variables that can be smaller, e.g. int16_t (short) or int8_t (char) instead of int32_t (int).
Enum variable size
Enum variables may be bigger than necessary. I can't remember what ARM compilers typically do, but some compilers I've used in the past made enum variables 2 bytes by default, even when the enum's range only required 1 byte. Check your compiler settings.
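If your toolchain supports C++11, you can pin the size explicitly instead of relying on compiler settings; a sketch:

#include <stdint.h>

// C++11: fix the underlying type so the variable is guaranteed 1 byte,
// regardless of what the compiler would pick by default.
enum class Color : uint8_t { Red, Green, Blue };

static_assert(sizeof(Color) == 1, "Color should occupy a single byte");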
Algorithm implementation
Rework your algorithms. Some algorithms have a range of possible implementations with a speed/memory trade-off. E.g. AES encryption can use on-the-fly key calculation, which means you don't have to keep the entire expanded key in memory. That saves memory, but it's slower.
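AES is too long to sketch here, but CRC-32 shows the same trade-off in a few lines: the bit-by-bit version below needs no table at all, while the usual table-driven version spends 1 KB (256 x 4 bytes) to run roughly eight times faster. Same output, different point on the speed/memory curve:

#include <stdint.h>
#include <stddef.h>

// Standard reflected CRC-32 (polynomial 0xEDB88320), computed bitwise.
uint32_t crc32_bitwise(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1));
    }
    return ~crc;
}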
Deleting unused string literals won't have any effect on RAM usage because they aren't stored in RAM but in ROM. The same goes for code.
What you need to do is cut back on actual variables and possibly the size of your stack(s). I'd look for arrays that can be resized and for unused variables. Also, it's best to avoid dynamic allocation because of the danger of memory fragmentation.
Aside from that, you'll want to make sure that constant data such as lookup tables are stored in ROM. This can usually be achieved with the const keyword.
Make sure the linker produces a MAP file; it will show you where the RAM is used. Sometimes you'll find string literals/constants that are kept in RAM. Sometimes you'll find unused arrays/variables put there by someone else.
If you have the linker map file, it's also easy to attack first the modules that use the most RAM.
Here are the tricks I've used on the Cell:
Start with the obvious: squeeze 32-bit words into 16s where possible, rearrange structures to eliminate padding, and cut down on slack in any arrays. If you've got any array of more than eight structures, it's worth using bitfields to pack them down tighter (see the sketch after this list).
Do away with dynamic memory allocation and use static pools. A constant memory footprint is much easier to optimize and you'll be sure of having no leaks.
Scope local allocations tightly so that they don't stay on the stack longer than they have to. Some compilers are very bad at recognizing when you're done with a variable and will leave it on the stack until the function returns. This can be bad with large objects in outer functions, which then eat up stack they no longer need while the outer function calls deeper into the tree.
alloca() doesn't clean up until a function returns, so can waste stack longer than you expect.
Enable function body and constant merging in the compiler, so that if it sees eight different consts with the same value, it'll put just one in the text segment and alias them with the linker.
Optimize executable code for size. If you've got a hard realtime deadline, you know exactly how fast your code needs to run, so if you've any spare performance you can make speed/size tradeoffs until you hit that point. Roll loops, pull common code into functions, etc. In some cases you may actually get a space improvement by inlining some functions, if the prolog/epilog overhead is larger than the function body.
The last one is only relevant on architectures that store code in RAM, I guess.
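Here's a sketch of the bitfield trick from the list above (hypothetical struct; the field widths come from the actual value ranges):

#include <stdint.h>

// Unpacked: 12 bytes per element after natural alignment.
struct EnemyLoose {
    uint32_t health; // actually 0..1023
    uint32_t type;   // actually 0..15
    uint32_t flags;  // three booleans
};

// Packed: 4 bytes per element -- a 3x saving that adds up fast
// across an array of a few hundred of these.
struct EnemyPacked {
    uint32_t health : 10;
    uint32_t type   : 4;
    uint32_t flags  : 3;
};

The cost is a shift-and-mask on every access, which is why it only pays off once the array is big enough.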
With respect to functions, here are some handles for optimising RAM:
Make sure the number of parameters passed to a function is carefully analysed. On ARM architectures, per the AAPCS (ARM Architecture Procedure Call Standard), a maximum of 4 parameters can be passed in registers; any further parameters are pushed onto the stack.
Also consider using a global rather than repeatedly passing the same data to a function that is most frequently called with the same parameter.
The deeper the function calls, the heavier the use of the stack. Use a static analysis tool to find the worst-case function-call path and look for avenues to reduce it. When function A calls B, B calls C, C calls D, D calls E, and so on ever deeper, registers can't carry the parameters at every level, so the stack inevitably gets used.
Look for opportunities to club two parameters into one wherever applicable; remember that all registers on ARM are 32-bit, so further optimisation is possible:
void abc(bool a, bool b, uint16_t c, uint32_t d, uint8_t e); // uses registers and the stack
void abc(uint8_t ab, uint16_t c, uint32_t d, uint8_t e); // first 2 params clubbed into one byte, so all 4 parameters can be passed in registers
Have another look at nested interrupt vectors. In any architecture we have scratch registers and preserved registers, and the preserved registers need to be saved before servicing an interrupt. With nested interrupts, a lot of stack space is needed to back up the preserved registers to and from the stack.
If objects such as structures are passed to functions by value, a lot of data is pushed (depending on the struct size), which eats up stack space quickly. This can be changed to pass by reference.
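For instance (hypothetical message type; the sizes assume a 32-bit ARM):

#include <stdint.h>

typedef struct {
    uint8_t  payload[64];
    uint32_t id;
} message_t;

uint32_t by_value(message_t m)      { return m.id; }  // copies ~68 bytes onto the stack per call
uint32_t by_ref(const message_t *m) { return m->id; } // passes a single 4-byte pointer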
Adding to the previous answers.
If you are running your program from RAM for faster execution, you can create a user-defined section containing all the initialization routines that you are sure won't run more than once after the system boots. After all the initialization functions have executed, you can reuse that region for the heap.
The same can be applied to data that is identified as no longer needed after a certain stage in your program.
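A sketch of that pattern with GCC-style section attributes (the section name is made up, and wiring the region back into the heap happens in your linker configuration, so treat this as a pattern rather than a recipe):

// Place one-shot initialization code in its own named section so the
// linker script can locate it in a region that is reclaimed afterwards.
__attribute__((section(".init_once")))
void board_init(void) {
    /* clocks, pin muxing, one-time table setup ... */
}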
I'm working on a project in Objective-C where I need to work with large quantities of data stored in an NSDictionary (around ~2 GB in RAM at most). After all the computations that I perform on it, it seems like it would be quicker to save/load the data when needed (versus re-parsing the original file).
So I started to look into saving large amounts of data. I've tried using NSKeyedArchiver and [NSDictionary writeToFile:atomically:], but both failed with malloc errors (Can not allocate ____ bytes).
I've looked around SO, Apple's dev forums and Google, but was unable to find anything. I'm wondering if it might be better to create the file bit by bit instead of all at once, but I can't find a way to append to an existing file. I'm not completely opposed to saving with a bunch of small files, but I would much rather use one big file.
Thanks!
Edited to include more information: I'm not sure how much overhead NSDictionary adds, as I don't take all the information from the text files. I have a 1.5 GB file (of which I keep about half), and it turns out to be around 900 MB to 1 GB in RAM. There will be more data that I need to add eventually, but it will be constructed with references to what's already loaded into memory; it shouldn't double the size, but it may come close.
The data is all serial and could be separated in storage, but it all needs to be in memory during execution. I currently have integer/string pairs, and will eventually end up with string/string pairs (with every value also being a key for a different set of strings, so the final storage requirement will be the same strings I currently have, plus a bunch of references).
In the end, I will need to associate ~3 million strings with some other set of strings. However, the only important thing is the relationship between those strings; I could hash all of them, but NSNumber (since NSDictionary needs objects) might give me just as much overhead.
NSDictionary isn't going to give you the scalable storage that you're looking for, at least not for persistence. You should implement your own type of data structure/serialisation process.
Have you considered using an embedded SQLite database? Then you can process the data while loading only a fragment of the data structure at a time.
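A rough sketch of that approach using the SQLite C API (the table layout and file name are made up; error handling trimmed for brevity):

#include <sqlite3.h>
#include <stdio.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("cache.db", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT);",
        NULL, NULL, NULL);

    // Pull back a single row on demand instead of holding ~2 GB in memory.
    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db, "SELECT v FROM kv WHERE k = ?;", -1, &stmt, NULL);
    sqlite3_bind_text(stmt, 1, "some-key", -1, SQLITE_STATIC);
    if (sqlite3_step(stmt) == SQLITE_ROW)
        printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
}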
If you can, rebuilding your application in 64-bit mode will give you a much larger heap space.
If that's not an option for you, you'll need to create your own data structure and define your own load/save routines that don't allocate as much memory.