Reading specific bytes of data from a large text file... quickly - file-io

For argument's sake, let's say you have a single, enormous file to hold your map save data. The game that comes to mind as a great example is Terraria. They save all MapWidth*MapHeight tile data within a single map file (horrible idea, really), but they render only what is visible within the camera (and some outlying tiles for smoothness' sake) based on the camera position.
So my question is, "How can they search through all of that data in real time starting at the camera position?"
That would entail reading through potentially millions of tiles' worth of data just to get to the screen coordinates. I understand you could skip bytes based on the x/y coordinates if the tile records were a consistent size (this is all I can find in my week or so of searching), but that is where my problem lies. The tile data is dynamic. If one tile is empty, the data beyond "isValid" is nonexistent, so that is fewer bytes to read through. If a tile has water, multiple states, a background, etc., it contains all the data and is the largest in terms of bytes. So the record size is not constant at all, and we cannot just skip X bytes per tile because that amount changes (constantly, as tiles are modified).
My current solutions are: Read it line by line (Ugh), use chunk files, or ensure fixed line sizes (Padding? Data wasted... Ugh).
I know chunks would be the best option, but being able to reach that deep into text files quickly would still be a nice thing to know.

If you have chunk-based data, you need a chunk-based reader, simple as that.
Additionally, if you're only interested in certain parts of the data and you can preprocess it, another option is to build a second file/list that stores the offset to the start of every object in the first file.
In that case, whenever you need to reference an object, you look up its offset first and then do a straight jump to it in your original file. It still requires you to read through the whole file at least once.
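To make the offset-index idea concrete, here's a minimal C# sketch. It assumes each tile record in the map file is prefixed with its length in bytes; the record layout and the helper names (BuildOffsetIndex, ReadTile) are made up for illustration, not Terraria's actual format.

```csharp
// Minimal sketch of the offset-index idea. Assumes the map file stores
// variable-length tile records, each prefixed with an Int32 byte length.
using System;
using System.Collections.Generic;
using System.IO;

static class TileIndex
{
    // One pass over the map file: remember where every record starts.
    public static List<long> BuildOffsetIndex(string mapPath)
    {
        var offsets = new List<long>();
        using var fs = File.OpenRead(mapPath);
        using var reader = new BinaryReader(fs);

        while (fs.Position < fs.Length)
        {
            offsets.Add(fs.Position);                   // offset of this record
            int recordLength = reader.ReadInt32();      // length prefix
            fs.Seek(recordLength, SeekOrigin.Current);  // skip the payload
        }
        return offsets; // could also be written to a small side file, 8 bytes per tile
    }

    // Random access: jump straight to tile (x, y) without scanning the file.
    public static byte[] ReadTile(string mapPath, List<long> offsets,
                                  int x, int y, int mapWidth)
    {
        using var fs = File.OpenRead(mapPath);
        using var reader = new BinaryReader(fs);

        fs.Seek(offsets[y * mapWidth + x], SeekOrigin.Begin);
        int recordLength = reader.ReadInt32();
        return reader.ReadBytes(recordLength);          // the raw tile record
    }
}
```

The catch is exactly the one described in the question: as soon as a record changes size (a tile gains or loses data), every offset after it shifts and the index has to be patched or rebuilt, which is why fixed-size records or chunk files usually win in practice.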

Why do I need resources per swapchain image

I have been following different tutorials and I don't understand why I need resources per swapchain image instead of per frame in flight.
This tutorial:
https://vulkan-tutorial.com/Uniform_buffers
has a uniform buffer per swapchain image. Why would I need that if different images are not in flight at the same time? Can I not start rewriting if the previous frame has completed?
Also, the LunarG tutorial on depth buffers says:
And you need only one for rendering each frame, even if the swapchain has more than one image. This is because you can reuse the same depth buffer while using each image in the swapchain.
This doesn't explain anything; it basically says you can because you can. So why can I reuse the depth buffer but not other resources?
It is to minimize synchronization in the case of the simple Hello Cube app.
Let's say your uniforms change each frame. That means the main loop is something like:
Poll (or simulate)
Update (e.g. your uniforms)
Draw
Repeat
If step #2 did not have its own uniform, it would need to write to a uniform the previous frame is still reading. That means it has to sync with a fence, which would mean the previous frame is no longer considered "in flight".
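Here's a conceptual C# sketch of that arrangement, not tied to any particular Vulkan binding: each frame in flight owns its own uniform copy and its own fence, so step #2 only ever waits on work submitted MaxFramesInFlight frames ago. The FrameSlot type is invented for illustration, and a ManualResetEventSlim stands in for the VkFence.

```csharp
// Conceptual sketch only (not a real Vulkan binding): one set of per-frame
// resources per frame in flight, so updating this frame's uniforms never
// stalls on the frame that is still being drawn.
using System;
using System.Threading;

class FrameSlot
{
    public float[] Uniforms = new float[16];                         // stand-in for a mapped uniform buffer
    public ManualResetEventSlim Finished = new(initialState: true);  // stand-in for a VkFence (starts signalled)
}

class Renderer
{
    const int MaxFramesInFlight = 2;
    readonly FrameSlot[] slots;
    int frame;

    public Renderer()
    {
        slots = new FrameSlot[MaxFramesInFlight];
        for (int i = 0; i < MaxFramesInFlight; i++) slots[i] = new FrameSlot();
    }

    public void DrawFrame(Action<FrameSlot> submitToGpu)
    {
        FrameSlot slot = slots[frame % MaxFramesInFlight];

        // Wait only for the work that last used THIS slot (submitted
        // MaxFramesInFlight frames ago), not for the immediately previous frame.
        slot.Finished.Wait();
        slot.Finished.Reset();

        // Nothing in flight reads this copy anymore, so it is safe to overwrite.
        slot.Uniforms[0] = frame;                                    // "update uniforms"

        // Record and submit; completion of the submitted work signals slot.Finished.
        submitToGpu(slot);

        frame++;
    }
}
```

With two or three slots the wait is almost always already satisfied, which is the whole point: the CPU keeps preparing frame N while the GPU is still drawing frame N-1.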
It all depends on how you are using your resources and the performance you want to achieve.
If, after each frame, you are willing to wait for the rendering to finish and you are still happy with the final performance, you can use only one copy of each resource. Waiting is the easiest synchronization: you are sure the resources are not being used anymore, so you can reuse them for the next frame. But if you want to utilize both the CPU and the GPU efficiently, and you don't want to wait after each frame, then you need to look at how each resource is being used.
The depth buffer is usually used only temporarily. If you don't perform any postprocessing, and your render pass setup uses depth data only internally (you don't specify STORE for storeOp), then you can use a single depth buffer (depth image) all the time. This is because when rendering is done the depth data isn't used anymore and can be safely discarded. The same applies to all other resources that don't need to persist between frames.
But if different data needs to be used for each frame, or if data generated in one frame is used in the next, then you usually need another copy of the given resource. Updating data requires synchronization; to avoid waiting in such situations you need a second copy of the resource. So in the case of uniform buffers, you update the data in a given buffer and use it for a given frame. You cannot modify its contents until that frame is finished, so to prepare another frame of animation while the previous one is still being processed on the GPU, you need to use another copy.
The same goes for generated data that is required in the next frame (for example, a framebuffer used for screen-space reflections). Reusing the same resource would cause its contents to be overwritten. That's why you need another copy.
You can find more information here: https://software.intel.com/en-us/articles/api-without-secrets-the-practical-approach-to-vulkan-part-1

Labview: save controls under a certain size

I would like to log the values of all controls and indicators on a VI. I can do this by using the invoke node
ctrl val.get all
followed by saving the array of name/variant data clusters to disk using datalog vis.
However, I would now like to impose a size limit: I only want to save the data if its size is under a threshold (e.g. 100 kB) to avoid generating huge files (for instance if the front panel contains an image). I want this function to be generic, so I can't create a list of control names to exclude or sort by control data type.
It seems that one way would be to flatten the variant data to a string and then measure the size of the string, but this seems potentially problematic if the control contains an inordinately large amount of data (e.g. could end up creating a 1 GB string).
Is there a more refined way to handle this problem?
You probably want to inspect each control's type so you have a more efficient way to check the size of that type. Your problem of flattening large strings can then be avoided for any known control types you detect: arrays, images, waveforms, etc. could all be inspected for their size once you know the type, without ever having to flatten the data. This would allow you to save the small stuff, ignore known large stuff, and still flatten any unknown or unhandled types to a string to determine their size, so it stays generic and could be used for any VI. The OpenG variant tools (among others) have lots of type-inspection pieces to use on the controls, so it shouldn't be too hard to implement.
Good news (for you): LabVIEW is horrendously inefficient at rendering data onto the front panel (WRT memory).
Displaying data on a control takes something like 10x the memory it would take to flatten that same data to a string. There's not much by way of clever, lazy-loading for arrays or any of that. You can still do size filtering on the flattened string if you want to, but it's a safe bet that if the Front Panel is open, flattening control values (one at a time) to string is going to use a trivial amount of memory compared to the memory it takes just to put that stuff on the front panel.

labview - buffer data then save to excel file

My question is with respect to a LabVIEW VI (2013) that I am trying to modify. (I am only just learning to use this language. I have searched the NI site and Stack Overflow for help without success; I suspect I am using the wrong keywords.)
My VI consists of a flat sequence, one pane of which contains a while loop where integer data is collected from a device and displayed on a graph.
I would like to be able to buffer this data and then send it to disk when a preset number of samples have been collected. My attempts so far result in only the last record being saved.
Specifically, I need to know how to save the data in a buffer (array) and then, when the correct number of samples have been captured, save it all to disk (saving as it is captured slows the process down too much).
Hope the question is clear and thanks very much in advance for any suggestions.
Tom
Below is a simple circular buffer that holds the most recent 100 readings. Each time the buffer is refilled, its contents are written to a text file. Drag the image onto a VI's block diagram to try it out.
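Since the block-diagram snippet image itself isn't reproduced here, here is the same idea expressed as a rough C# sketch (the buffer size, file name, and the idea of int readings are placeholders for whatever your acquisition loop produces):

```csharp
// Rough stand-in for the block diagram described above: collect readings into
// a fixed-size buffer and only touch the disk each time the buffer fills up,
// so there is one file write per 100 samples instead of one per sample.
using System;
using System.IO;

class BufferedLogger
{
    const int BufferSize = 100;                  // most recent 100 readings
    readonly int[] buffer = new int[BufferSize];
    int count;

    public void Add(int sample)
    {
        buffer[count++] = sample;
        if (count == BufferSize)
        {
            File.AppendAllLines("log.txt",
                Array.ConvertAll(buffer, v => v.ToString()));
            count = 0;                           // start refilling the buffer
        }
    }
}
```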
As you learn more about LabVIEW and as your performance and multi-threaded needs increase, consider reading about some of the LabVIEW design patterns mentioned in the other answers:
State machine: http://www.ni.com/tutorial/7595/en/
Producer-consumer: http://www.ni.com/white-paper/3023/en/
I'd suggest splitting the data acquisition and the data saving into two different loops using a producer/consumer design pattern.
Moreover, if you need very high throughput, consider using the TDMS file format.
Have a look here for an overview: http://www.ni.com/white-paper/3727/en/
A screenshot would definitely help. However, some things are clear:
Unless you are dealing with a very high volume of data, very slow hard drives, or other unusual requirements, open the file before your while loop, write to it every time you acquire a sample (leaving buffering to the OS), and close it afterwards.
If you decide you need to manage buffering on your own, you can use queues. See this example: https://decibel.ni.com/content/docs/DOC-14804 for reference (they stream data from disk, buffering it in the queue, but it is the same idea)
My VI consists of a flat sequence one pane of which
Replace the flat sequence with a finite state machine (e.g. http://forums.ni.com/t5/LabVIEW/Ending-a-Flat-Sequence-Inside-a-case-structure/td-p/3170025)

How to detect silence and cut mp3 file without re-encoding using NAudio and .NET

I've been looking for an answer everywhere and I was only able to find some bits and pieces. What I want to do is to load multiple mp3 files (kind of temporarily merge them) and then cut them into pieces using silence detection.
My understanding is that I can use Mp3FileReader for this but the questions are:
1. How do I read, say, 20 seconds of audio from an mp3 file? Do I need to read 20 * reader.WaveFormat.AverageBytesPerSecond bytes? Or maybe keep reading frames until the sum of Mp3Frame.SampleCount / Mp3Frame.SampleRate exceeds 20 seconds?
2. How do I actually detect the silence? I would look at an appropriate number of consecutive samples and check whether they are all below some threshold. But how do I access the samples regardless of whether they are 8- or 16-bit, mono or stereo, etc.? Can I directly decode an MP3 frame?
3. After I have detected silence at, say, sample 10465, how do I map it back to the mp3 frame index to perform the cut without re-encoding?
Here's the approach I'd recommend (which does involve re-encoding):
Use AudioFileReader to get your MP3 as floating-point samples directly in the Read method
Find an open-source noise gate algorithm, port it to C#, and use that to detect silence (i.e. when the noise gate is closed, you have silence; you'll want to tweak the threshold and attack/release times. A rough sketch of this detection step follows after this list)
Create a derived ISampleProvider that uses the noise gate and, in its Read method, does not return samples that are in silence
Either: pass the output into WaveFileWriter to create a WAV file and then encode the WAV file to MP3
Or: use NAudio.Lame to encode directly without a WAV step. You'll probably need to go from the sample provider back down to a 16-bit wave provider first
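As a rough sketch of the detection step, assuming a plain amplitude threshold rather than a full noise gate: AudioFileReader hands you 32-bit float samples whatever the source bit depth, which also answers question 2 in the post. The 0.02 threshold and the 50 ms window are just starting values to tune.

```csharp
// Threshold-based silence scan over an MP3 using NAudio's AudioFileReader.
// A real noise gate with attack/release times would behave more smoothly;
// this just reports 50 ms windows whose peak level is below a threshold.
using System;
using NAudio.Wave;

class SilenceScanner
{
    public static void Scan(string mp3Path)
    {
        using var reader = new AudioFileReader(mp3Path);    // decodes to float samples
        int channels = reader.WaveFormat.Channels;
        int sampleRate = reader.WaveFormat.SampleRate;

        int windowSize = sampleRate * channels / 20;        // 50 ms across all channels
        var window = new float[windowSize];
        const float threshold = 0.02f;                      // tune to taste

        long windowIndex = 0;
        int read;
        while ((read = reader.Read(window, 0, window.Length)) > 0)
        {
            float peak = 0f;
            for (int i = 0; i < read; i++)
                peak = Math.Max(peak, Math.Abs(window[i]));

            if (peak < threshold)
                Console.WriteLine($"Silence around {windowIndex * 0.05:F2} s (peak {peak:F4})");

            windowIndex++;
        }
    }
}
```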
BEFORE READING BELOW: Mark's answer is far easier to implement, and you'll almost certainly be happy with the results. This answer is for those who are willing to spend an inordinate amount of time on it.
So with that said, cutting an MP3 file based on silence without re-encoding or full decoding is actually possible... Basically, you can look at each frame's side info and each granule's gain & Huffman data to "estimate" the silence.
Find the silence
Copy all the frames from before the silence to a new file (a sketch of this step follows after this list)
Now it gets tricky...
Pull the audio data from the frames after the silence, keeping track of which frame header goes with which audio data.
Start writing the second new file, but as you write out the frames, update the main_data_begin field so the bit reservoir stays in sync with where the audio data really is.
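A sketch of the "copy all the frames from before the silence" step, using NAudio's Mp3FileReader. The frames after the cut (the bit-reservoir / main_data_begin fix-up described above) are deliberately not handled; cutFrameIndex is assumed to come from your silence detection.

```csharp
// Copy whole MP3 frames, untouched, from the start of the file up to a chosen
// cut frame. Because main_data_begin only points backwards, the kept frames
// still find their audio data; it is the *second* file that needs rewriting.
// ID3 tags and VBR/Xing headers are not handled here either.
using System.IO;
using NAudio.Wave;

class Mp3FrameCopier
{
    public static void CopyLeadingFrames(string inputPath, string outputPath, int cutFrameIndex)
    {
        using var reader = new Mp3FileReader(inputPath);
        using var output = File.Create(outputPath);

        for (int i = 0; i < cutFrameIndex; i++)
        {
            Mp3Frame frame = reader.ReadNextFrame();
            if (frame == null) break;                          // ran out of frames early
            output.Write(frame.RawData, 0, frame.RawData.Length);
        }
    }
}
```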
MP3 is a compressed audio format. You can't just cut bits out and expect the remainder to still be a valid MP3 file. In fact, since it's based on a DCT-style transform (the MDCT), the bits are in the frequency domain instead of the time domain. There simply are no bits for sample 10465. There's a frame which contains sample 10465, and there's a set of bits describing all the frequencies in that frame.
Simply cutting the audio at sample 10465 and continuing with some random other sample will probably cause a discontinuity, which means the number of frequencies present in the resulting frame skyrockets. So that definitely means a full re-encode. The better way is to smooth the transition, but that's not a trivial operation, and the result is of course slightly different from the input, so it still means a re-encode.
I don't understand why you'd want to read 20 seconds of audio anyway. Where's that number coming from? You usually want to read everything.
Sound is a wave; it's entirely expected that it crosses zero, so a sample being close to zero isn't special by itself. For a 20 Hz wave (the threshold of hearing), zero crossings happen 40 times per second, and each time you'll have multiple samples near zero. So you basically need a run of samples that are all close to zero, on both sides of the point you're looking at. Sample values like 5, 6, 7 aren't much for 16-bit audio, but they might very well be part of a wave that peaks at 10000. You really should check for at least 0.05 seconds to catch those 20 Hz sounds.
Since you detected silence over a 50-millisecond interval, you have a "position" that's a couple of thousand samples wide (50 ms at 44.1 kHz is roughly 2,200 samples). With a bit of luck, there's a frame boundary in there. Cut there. Otherwise it's time for re-encoding.

Efficient way to handle large runtime-generated tile maps?

I am coding a 2-dimensional, tile-based (orthogonal tiles) iPhone game. All levels are procedurally generated when the app is first played, and then persist until the user wants a new map. Maps are rather large, being 1000 tiles in both width and height, and the terrain is destructible. At the moment it is rather similar to Terraria, but that will change.
To hold map/tile information I am currently using several 2-dimensional C-style arrays. This works well, but I am concerned about the amount of memory this takes up, as the arrays are all defined as short array[1000][1000], which takes up (1000 * 1000 * sizeof(short)) bytes, i.e. about 2 MB each.
This is not particularly desirable when the iPhone doesn't have an incredibly large amount of memory to work with, especially when the user is multitasking. The main problem is that there is no way that I can use a specific tile map format such as .tmx, because all the levels are procedurally generated. Performance could also be an issue, because if a tile is destroyed at index(x, y), then I need to change the data in that index. I have also thought about writing tile map data to a text file, but I think there would be difficulties or performance issues when accessing or changing data.
Keeping all this in mind, what would be an efficient and fast way to handle my tile data?
My gut feeling on this is Core Data structured such that each tile element has relationships to the tiles around it. There's some non-trivial overhead here, but the advantage is that you can release tiles that aren't onscreen from memory and fault them back when you need them. As you move in a direction, you can query for the tiles in that direction, and you can fairly cheaply dump memory when you're in the background. This would get rid of the "several" 2D arrays and move all the data into a single object. In principle, the grid could be infinite in size this way, since everything is by relationship rather than coordinate.
You could similarly approach the problem using SQLite, querying for rows and columns in a given range. You might have the tile objects adopt NSDiscardableContent and put them in an NSCache, which could dramatically improve memory performance. You could still generate an effectively infinite grid as long as you allow coordinates to be both positive and negative.
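To make the SQLite option concrete, here's a sketch of a tile table keyed by (x, y) and a range query that pulls only the tiles around the camera. It's shown with Microsoft.Data.Sqlite purely for brevity; on iOS you'd go through the C sqlite3 API or Core Data, but the query shape is the same, and the table/column names are made up.

```csharp
// One row per tile, primary-keyed on (x, y); LoadRegion fetches only the
// visible window (plus whatever margin you want) instead of the whole map.
using System;
using Microsoft.Data.Sqlite;

class TileStore
{
    readonly SqliteConnection conn;

    public TileStore(string path)
    {
        conn = new SqliteConnection($"Data Source={path}");
        conn.Open();

        var create = conn.CreateCommand();
        create.CommandText =
            @"CREATE TABLE IF NOT EXISTS tiles (
                  x    INTEGER NOT NULL,
                  y    INTEGER NOT NULL,
                  kind INTEGER NOT NULL,
                  PRIMARY KEY (x, y));";
        create.ExecuteNonQuery();
    }

    public void LoadRegion(int minX, int maxX, int minY, int maxY,
                           Action<int, int, int> onTile)
    {
        var cmd = conn.CreateCommand();
        cmd.CommandText =
            "SELECT x, y, kind FROM tiles " +
            "WHERE x BETWEEN $minX AND $maxX AND y BETWEEN $minY AND $maxY;";
        cmd.Parameters.AddWithValue("$minX", minX);
        cmd.Parameters.AddWithValue("$maxX", maxX);
        cmd.Parameters.AddWithValue("$minY", minY);
        cmd.Parameters.AddWithValue("$maxY", maxY);

        using var rows = cmd.ExecuteReader();
        while (rows.Read())
            onTile(rows.GetInt32(0), rows.GetInt32(1), rows.GetInt32(2));
    }
}
```

Because the primary key covers (x, y), the range query stays fast even for very large maps, and negative coordinates work just as well as positive ones.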