How do you deal with inserting and removing data from a big buffer on the hard drive? - serialization

When you have something that's serialised on the hard drive and you want to add or remove data in the middle, how do you address the issue that the entire buffer has to be resized or moved? I'm thinking of a serialised game world which could hold hundreds of megabytes or perhaps gigabytes of information about a scene graph. If you add an object at the beginning, assuming you have the file open as a stream, are you supposed to push back all the bytes after the insertion point? The same goes for deletion. I suppose this is the same problem you face when working with a serialised buffer in RAM/heap, such as an array. The thing is that when you're working in system RAM, things usually aren't serialised; they might be nodes holding pointers to other memory.
Is there any reading you can point me to to solve this issue?

Related

Why can't I store un-serialized data structure on disk the same way I can store them in memory?

Firstly, I am assuming that data structures, like a hash-map for example, can only be stored in-memory but not on disk unless they are serialized. I want to understand why not?
What is holding us back from dumping a block of memory which stores the data structure directly into disk without any modifications?
Something like a JSON could be thought of as a "serialized" python dictionary. We can very well store JSON in files, so why not a dict?
You may say how would you represent non-string values like bool/objects on disk? I can argue "the same way you store them in memory". Am I missing something here?
To name a few problems:
Big-endian vs. little-endian byte order makes reading data from disk depend on the CPU architecture, so if you just dump memory you may not be able to read it back on a different device.
Items are not contiguous in memory. A list (or dictionary), for example, only contains pointers to things that exist "somewhere" in memory. You can only dump contiguous memory; otherwise you are only storing the memory locations the data was at, which won't be the same when you load the program again.
The way structures are laid out in memory can change between two compiled versions of the same program, so simply recompiling your application may produce different layouts, and the old dump becomes unreadable.
Different versions of the same application may want to change the shape of the structures to allow extra functionality. This won't be possible if the data shape on disk is the same as in memory. (This is one of the reasons why you shouldn't use pickle for portable data storage, even though it is a serializer for in-memory objects.)
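As a rough illustration of the byte-order point, here is a minimal Python sketch using the standard struct module. The record layout (a 32-bit id followed by a 64-bit float) and the names RECORD, obj_id and weight are made up for the example; the point is only that the on-disk format is stated explicitly instead of being whatever the CPU and compiler happen to use.

```python
import struct

# The same 32-bit integer packed with two explicit byte orders.
value = 1
little = struct.pack("<I", value)  # little-endian: least significant byte first
big = struct.pack(">I", value)     # big-endian: most significant byte first

# Reading with the wrong byte order silently yields a different number.
right = struct.unpack("<I", little)[0]
wrong = struct.unpack(">I", little)[0]

# A hypothetical record with an explicit, architecture-independent layout:
# unsigned 32-bit id, then a 64-bit float, little-endian, no padding.
RECORD = struct.Struct("<Id")
blob = RECORD.pack(7, 2.5)
obj_id, weight = RECORD.unpack(blob)
```

Because the format string fixes byte order, field sizes and padding, the bytes in blob mean the same thing on any machine and after any recompile, which a raw memory dump does not guarantee.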

labview - buffer data then save to excel file

My question is about a LabVIEW VI (2013) that I am trying to modify. (I am only just learning to use this language. I have searched the NI site and Stack Overflow for help without success; I suspect I am using the wrong keywords.)
My VI consists of a flat sequence one pane of which contains a while loop where integer data is collected from a device and displayed on a graph.
I would like to be able to buffer this data and then send it to disk when a preset number of samples has been collected. My attempts so far result in only the last record being saved.
Specifically, I need to know how to save the data in a buffer (array) and then, when the correct number of samples has been captured, save it all to disk (saving as it is captured slows the process down too much).
Hope the question is clear and thanks very much in advance for any suggestions.
Tom
Below is a simple circular-buffer that holds the most recent 100 readings. Each time the buffer is refilled, its contents are written to a text file. Drag the image onto a VI's block diagram to try it out.
As you learn more about LabVIEW and as your performance and multi-threaded needs increase, consider reading about some of the LabVIEW design patterns mentioned in the other answers:
State machine: http://www.ni.com/tutorial/7595/en/
Producer-consumer: http://www.ni.com/white-paper/3023/en/
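Since LabVIEW diagrams can't be pasted as text, here is the same buffering idea as a rough Python sketch. It is not a translation of the VI in the image; the function name save_in_batches, the batch_size parameter and the demo file path are all assumptions made for illustration. Samples accumulate in an array and are appended to the file only when the buffer is full.

```python
import os
import tempfile

def save_in_batches(samples, path, batch_size=100):
    """Collect samples in a buffer and write them out one full batch at a time.

    Returns the number of write operations performed.
    """
    buffer = []
    writes = 0
    with open(path, "w") as f:
        for sample in samples:
            buffer.append(sample)          # accumulate, do not overwrite
            if len(buffer) == batch_size:  # buffer full: flush it to disk
                f.write("\n".join(str(v) for v in buffer) + "\n")
                buffer.clear()
                writes += 1
        if buffer:                         # flush whatever is left at the end
            f.write("\n".join(str(v) for v in buffer) + "\n")
            writes += 1
    return writes

# Demo: 250 integer samples, written in batches of 100.
demo_path = os.path.join(tempfile.mkdtemp(), "samples.txt")
flushes = save_in_batches(range(250), demo_path, batch_size=100)
```

The key detail is that each new sample is appended to the buffer, and the file is opened once and written to repeatedly, so earlier records are never replaced by later ones.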
I'd suggest splitting the data acquisition and the data saving into two different loops using a producer/consumer design pattern.
Moreover, if you need very high throughput, consider using the TDMS file format.
Have a look here for an overview: http://www.ni.com/white-paper/3727/en/
A screenshot would definitely help. However, some things are clear:
Unless you are dealing with a very high volume of data, very slow hard drives, or other unusual requirements, open the file before your while loop, write to it every time you acquire a sample (leaving buffering to the OS), and close it afterwards.
If you decide you need to manage buffering on your own, you can use queues. See this example for reference: https://decibel.ni.com/content/docs/DOC-14804 (it streams data from disk, buffering it in the queue, but it is the same idea).
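For readers who think better in text than in block diagrams, the queue-based approach can be sketched in Python roughly as follows. This is only an analogy to a LabVIEW producer/consumer pair, not NI's example; the names producer, consumer and run are hypothetical, and a plain list stands in for the file writes so the sketch stays self-contained.

```python
import queue
import threading

SENTINEL = None  # marks the end of acquisition (samples are integers here)

def producer(q, samples):
    """Acquisition loop: push each sample onto the queue as it arrives."""
    for sample in samples:
        q.put(sample)
    q.put(SENTINEL)

def consumer(q, sink, batch_size):
    """Saving loop: pull samples off the queue and 'write' them in batches."""
    batch = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            sink.append(list(batch))  # stand-in for one write to disk
            batch.clear()
    if batch:
        sink.append(list(batch))      # final partial batch

def run(samples, batch_size=4):
    q = queue.Queue()
    written = []
    saver = threading.Thread(target=consumer, args=(q, written, batch_size))
    saver.start()
    producer(q, samples)
    saver.join()
    return written

batches = run(range(10), batch_size=4)
```

The acquisition side never waits for the disk; it only enqueues. The saving side drains the queue at its own pace, which is the separation the producer/consumer pattern is meant to provide.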
My VI consists of a flat sequence one pane of which
Replace the flat sequence with a finite state machine (e.g. http://forums.ni.com/t5/LabVIEW/Ending-a-Flat-Sequence-Inside-a-case-structure/td-p/3170025)

Reading specific bytes of data from a large text file... quickly

For argument's sake, let's say you have a single, enormous file to hold your map save data. The game that comes to mind as a great example is Terraria. They save all MapWidth*MapHeight tile data within a single map file (a horrible idea, really), but they render only what is visible within the camera (plus some outlying tiles for smoothness' sake) based on the camera position.
So my question is, "How can they search through all of that data in real time starting at the camera position?"
That would entail reading through potentially millions of tile records just to get to the screen coordinates. I understand you could skip bytes of data based on the x/y coordinates if the tile data had a consistent size (this is all I can find in my week or so of searching), but that is where my problem lies. The tile data is dynamic. If a tile is empty, the data beyond "isValid" is nonexistent, so there are fewer bytes to search through. If a tile has water, multiple states, a background, etc., it contains all the data and is the largest in terms of bytes. So the size is not constant at all. In that case we cannot just skip X bytes, because X changes (constantly, as tiles are modified).
My current solutions are: Read it line by line (Ugh), use chunk files, or ensure fixed line sizes (Padding? Data wasted... Ugh).
I know chunks would be the best option, but being able to reach that deep into text files quickly would still be a nice thing to know.
If you have chunk-based data, you need a chunk-based reader, simple as that.
Additionally, if you're only interested in certain parts of the data and you can process the file first, another option is to build a second file/list that stores the offset to the start of every object in the first file.
In that case, whenever you need to reference an object, you look up its offset first and then jump straight to it in your original file. It still requires you to read through the whole file at least once.
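A minimal sketch of that offset index in Python, assuming one variable-length record per line. The record contents and the names build_index and read_record are invented for the example, and an in-memory byte stream stands in for the map file.

```python
import io

def build_index(f):
    """Scan the file once and record the byte offset where each record starts."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:       # end of file
            break
        offsets.append(pos)
    return offsets

def read_record(f, offsets, i):
    """Jump straight to record i without reading the records before it."""
    f.seek(offsets[i])
    return f.readline().rstrip(b"\n")

# Three records of different lengths (a binary stream, so offsets are in bytes).
data = io.BytesIO(b"empty\nwater,depth=3,bg=stone\ndirt\n")
index = build_index(data)
third = read_record(data, index, 2)
```

Opening the file in binary mode matters here, because seek offsets are byte positions. Note also that the index is only valid while the record sizes stay the same; if a record grows or shrinks, the offsets after it have to be recomputed.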

How variables are stored in RAM memory?

I've just made a simple RAM in Minecraft (with redstone), with 4 bits for the address and 4 bits stored in each cell. Our next goal is to store different kinds of variables in it and to process them differently.
We are not engineers, so we don't really know, but we have made some quite complex things and we think we can do this. The problem is that we can't figure out how to store variables with more bits than can be stored in a single cell. I'll give an example.
Think of a 16-bit variable. We thought that there's no sense in creating big cells, so we decided to store that data 4 bits per cell. But that's not enough; we had to relate those 4 cells somehow. So we thought that we had to create 8-bit cells, with 4 bits of content and 4 bits to store the address where the next 4 bits of the variable are stored. However, 4 bits of address is nothing for RAM; we can't store anything with that. So we would need at least 8 bits for the address. 4 bits of content also seems quite low, and we also need at least another 4 bits to store the type of the variable.
Well, finally we thought that technique was absurd and that it couldn't be done like that in real life. And we don't know how to do it now. I've searched the web for how RAM works, and the little that I've found was too complex for our needs.
Could someone please explain to us how this is done in real life?
Heh you're playing the blame game, trying to pin all the responsibility of memory management on the physical RAM implementation.
In fact, RAM is just that: a storage device (your redstone tiles). Actually storing data in it is your program's responsibility. Put another way, there doesn't need to be a standardized memory cell "linking" strategy for RAM, because it's your program that writes to it and then reads it back, so it knows its own conventions.
With that in mind, storing values is easy. Say you want a 16-bit integer stored in your 4-bit/word RAM (so 4 words of data). Simply treat addresses 0 through 3 as your variable and that's it. No "linking" is necessary, because you know both how to write to it and how to read from it, and you won't step on your own toes (in theory).
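To make that concrete, here is a small Python sketch of a 16-word RAM with 4-bit words, where a 16-bit value simply occupies four consecutive addresses. The helper names store16 and load16, and the choice to put the lowest 4 bits at the lowest address, are assumptions of the sketch; either order works as long as the reader and the writer agree.

```python
WORD_BITS = 4
WORD_MASK = 0xF

# 16 cells (4-bit address), each holding one 4-bit word.
ram = [0] * 16

def store16(ram, base, value):
    """Split a 16-bit value into four 4-bit words at base .. base + 3."""
    for i in range(4):
        ram[base + i] = (value >> (WORD_BITS * i)) & WORD_MASK

def load16(ram, base):
    """Reassemble the 16-bit value from the four words starting at base."""
    value = 0
    for i in range(4):
        value |= ram[base + i] << (WORD_BITS * i)
    return value

store16(ram, 0, 0xBEEF)   # occupies addresses 0, 1, 2 and 3
result = load16(ram, 0)
```

Nothing in the RAM itself says these four cells belong together. That knowledge lives entirely in store16 and load16, that is, in the program.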
Additional thoughts for growing your construct: special locations for specialized registers (a stack pointer so you can use a stack for recursive computing, a program pointer for a Turing machine, etc.). I had one more but I forgot it while writing that one; if I remember it, I'll edit.

Saving large objects to file

I'm working on a project in Objective-C where I need to work with large quantities of data stored in an NSDictionary (at most around 2 gigs in RAM). After all the computations that I perform on it, it seems like it would be quicker to save/load the data when needed (versus re-parsing the original file).
So I started to look into saving large amounts of data. I've tried using NSKeyedArchiver and [NSDictionary writeToFile:atomically:], but both failed with malloc errors (Can not allocate ____ bytes).
I've looked around SO, Apple's Dev forums and Google, but was unable to find anything. I'm wondering if it might be better to create the file bit by bit instead of all at once, but I can't find any way to append to an existing file. I'm not completely opposed to saving a bunch of small files, but I would much rather use one big file.
Thanks!
Edited to include more information: I'm not sure how much overhead NSDictionary gives me, as I don't take all the information from the text files. I have a 1.5 gig file (of which I keep ~1/2), and it turns out to be around 900 megs through 1 gig in ram. There will be some more data that I need to add eventually, but it will be constructed with references to what's already loaded into memory - it shouldn't double the size, but it may come close.
The data is all serial, and could be separated in storage, but needs to all be in memory for execution. I currently have integer/string pairs, and will eventually end up with string/strings pairs (with all the values also being a key for a different set of strings, so the final storage requirements will be the same strings that I currently have, plus a bunch of references).
In the end, I will need to associate ~3 million strings with some other set of strings. However, the only important thing is the relationship between those strings - I could hash all of them, but NSNumber (as NSDictionary needs objects) might give me just as much overhead.
NSDictionary isn't going to give you the scalable storage that you're looking for, at least not for persistence. You should implement your own type of data structure/serialisation process.
Have you considered using an embedded SQLite database? Then you can process the data while loading perhaps only a fragment of the data structure at a time.
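As a rough idea of what that looks like, here is a sketch using Python's built-in sqlite3 module rather than Objective-C, purely to keep the example short and runnable. The table name links, the columns src and dst, and the sample strings are all made up; the same SQL would work through SQLite's C API on iOS or macOS.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_links_src ON links (src)")

# String-to-string relationships, inserted in one transaction.
pairs = [("apple", "fruit"), ("apple", "red"), ("carrot", "vegetable")]
with conn:
    conn.executemany("INSERT INTO links (src, dst) VALUES (?, ?)", pairs)

def related(conn, key):
    """Fetch only the strings associated with one key, not the whole data set."""
    rows = conn.execute(
        "SELECT dst FROM links WHERE src = ? ORDER BY dst", (key,)
    )
    return [row[0] for row in rows]

apple_links = related(conn, "apple")
```

With an index on the lookup column, each query touches only the rows it needs, so the full set of strings never has to be in memory at the same time.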
If you can, rebuilding your application in 64-bit mode will give you a much larger heap space.
If that's not an option for you, you'll need to create your own data structure and define your own load/save routines that don't allocate as much memory.