At certain points while my application is running, I process some large data in the background (so it is ready when the user needs it; it is a kind of indexing). When this background process finishes, it needs to save the data to a cache file, but since the data is really large this takes a few seconds.
At the same time, the user may open a dialog which displays images and text loaded from disk. If this happens while the background data is being saved, the user interface has to wait until the save is complete. (This is not acceptable, since the user then has to wait 3-4 seconds until the images and text are loaded from disk!)
So I am looking for a way to throttle the writing to disk. I thought of splitting the data into chunks and inserting a short delay between saving the individual chunks. During that delay the user interface would be able to load the required text and images, so the user would not notice any lag.
At the moment I am using [[array componentsJoinedByString:@"\n"] writeToFile:@"some name.dic" atomically:YES]. This is a very high-level solution which doesn't allow any customization. How can I write this large data to one file without saving all of it in one shot?
Does writeToFile:atomically: block asynchronous reading?
No. It works like writing to a temporary file and then, once the write completes successfully, renaming the temporary file to the destination (replacing the pre-existing file at the destination, if there is one).
You should consider how you can break your data up so that writing it is not so slow. If it is all divided into strings/lines and still takes seconds, an easy way to divide the database would be by first character. Of course, a better scheme can likely be devised based on how you access, search, and update the index/database.
…inserting a short delay between saving the individual chunks. During that delay the user interface would be able to load the required text and images, so the user would not notice any lag.
Don't. Just implement the move/replace part of the atomic write yourself (write to a temporary file while indexing, then move it into place). Then your app can serialize read and write operations explicitly, for fast, consistent, and correct access to these shared resources.
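A minimal sketch of that idea, in Python for illustration since the technique is language-agnostic (the file name, the lock, and save_index are placeholders, not the asker's actual code):

import os
import tempfile
import threading

cache_lock = threading.Lock()  # serializes access to the shared cache file

def save_index(lines, dest_path="some name.dic"):
    # Write the full index to a temporary file in the same directory,
    # so the final rename stays on one filesystem and is atomic.
    dir_name = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as tmp:
        tmp.write("\n".join(lines))
    with cache_lock:
        os.replace(tmp_path, dest_path)  # readers never see a half-written file

Readers take the same lock only around opening the file, so they are blocked for the duration of a rename, not a multi-second write.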
You should look at the NSFileHandle class.
Using a combination of seekToEndOfFile and writeData: you can do what you need.
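That makes the chunk-and-pause idea from the question straightforward: append one chunk, pause, append the next. A minimal sketch in Python for illustration (in Cocoa the append would go through NSFileHandle's seekToEndOfFile and writeData:; the chunk size and delay here are made-up numbers):

import time

def save_in_chunks(lines, path, chunk_size=1000, pause=0.05):
    open(path, "w").close()  # truncate any previous cache file
    for i in range(0, len(lines), chunk_size):
        chunk = lines[i:i + chunk_size]
        with open(path, "a") as f:           # seek to end and append this chunk
            f.write("\n".join(chunk) + "\n")
        time.sleep(pause)  # give the UI a window to do its own disk I/O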
Related
I have three functions that read, process, and write, respectively. Each function was optimized (to the best of my knowledge) to work independently. Now I am trying to pass each function's result to the next one in the chain as soon as it is available, instead of waiting for the entire list. I am not really sure how I can connect them. Here's what I have so far.
from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool

def main(files_to_load):
    loaded_files = load(files_to_load)
    with ThreadPool(processes=cpu_count()) as pool:
        processed_files = pool.map_async(processing_function_with_Pool, iterable=loaded_files).get()
    write(processed_files)
As you can see, my main() function waits for all the files to load (about 500 MB), stores them in memory, and sends them to processing_function_with_Pool(), which divides the files into chunks to be processed. After all the processing is done, the files start to be written to disk. I feel like there's a lot of unnecessary waiting between these three steps. How can I connect everything?
Right now your logic reads all the files sequentially (I assume) and stores them in memory all at once.
I'd recommend sending processing_function_with_Pool just a list of the file names to be processed.
processing_function_with_Pool will then take care of reading each file, processing it, and writing the results back.
In this way you take advantage of doing the I/O concurrently.
If processing_function_with_Pool is doing CPU-bound work, I'd suggest switching to a pool of processes.
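A minimal sketch of that restructuring, assuming load and write can be adapted to operate on a single file (handle_one_file and the ".out" suffix are made up for illustration):

from multiprocessing import Pool, cpu_count

def handle_one_file(path):
    # Each worker does its own read -> process -> write,
    # so I/O and CPU work on different files overlap.
    data = load(path)
    result = processing_function_with_Pool(data)
    write(result, path + ".out")
    return path

def main(files_to_load):
    # A process pool, since the processing is CPU-bound.
    with Pool(processes=cpu_count()) as pool:
        for done in pool.imap_unordered(handle_one_file, files_to_load):
            print("finished", done)

Results are written as soon as each file is done, instead of after the whole 500 MB has been loaded and processed.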
I have more of a conceptual question... hopefully that is okay.
Is AsyncStorage meant for repeated calls? For example, I have an application with a slideshow, and I want the user to be able to pick up where they were in the slideshow each time they open the app.
I was thinking of using AsyncStorage to update the stored index each time the slide changes, but I am worried that this means I am accessing it too much and constantly rewriting the index. Is that over the top, or is that within how it is intended to be used?
Thanks!
Your approach is totally okay. I also use AsyncStorage for a slider component and save its value to AsyncStorage after each change event. But I've debounced it to at most one write within 500 ms (the last write wins).
If you plan to store bigger documents, you should consider that internally AsyncStorage stores all keys and values in one huge file. Depending on the overall size, it could then become slow and battery-consuming.
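The debounce itself is framework-agnostic; here is a minimal sketch of the pattern in Python for illustration (in the real app this would be JavaScript against AsyncStorage, and persist is a stand-in for the actual save call):

import threading

class DebouncedWriter:
    # Coalesces rapid updates into one write per quiet period; the last value wins.
    def __init__(self, persist, delay=0.5):
        self.persist = persist   # function that actually writes the value
        self.delay = delay       # 500 ms, as in the answer above
        self._timer = None

    def update(self, value):
        if self._timer is not None:
            self._timer.cancel()  # drop the previously scheduled write
        self._timer = threading.Timer(self.delay, self.persist, args=(value,))
        self._timer.start()

On every slide change you would call update(current_index), and only the last value within each 500 ms window is actually persisted.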
I'm new to Redis, and I think I have a good use case for it. What I'm trying to do is cache an MP3 file for a short time. These MP3s are >2 MB in size, but I'm also only talking about maybe 5-10 stored at any moment in time. The TTL on them would be fairly short too: minutes, not hours.
(disk persistence isn't an option).
So what I'm wondering is: do I need to get fancy and Base64-encode the MP3 to store it? Or can I simply set the key's value to the raw byte array?
The Redis hit will come from a web service, which in turn gets its data from a downstream service that hits the disk. So what I'm trying to do is cache the MP3 file for a short time in my middleware, if you will. I won't need to do it for every file, just the ones >2 MB, so I don't have to keep going back to the downstream servers and requesting the file from disk again.
Thanks!
Nick
You can certainly store them; 2 MB is nothing for Redis to store.
Redis is binary-safe, and you don't need to Base64 your data; just store it as a byte array via your favorite client.
One thing I'd consider doing (it might not be worth it with 2 MB of data, but it would be if I were storing video files, for example) is to store the file as a sequence of chunks and not load everything at once. If your app won't hold many files in memory at once, and the files are not that big, it might not be worth it. But if you're expecting high concurrency, do consider this, as it will save application memory (not Redis memory).
You can do this in a number of ways:
You can store each block as an element of a sorted set with the sequence number as score, and read them one by one, so you won't have to load everything to memory at once.
You can store the file as one complete string, but read it in chunks with GETRANGE.
e.g.
GETRANGE myfile.mp3 0 99999
GETRANGE myfile.mp3 100000 199999
... #until you read nothing, of course
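A minimal sketch of the store and the chunked read with redis-py (the key name, TTL, and chunk size are made up; SETEX and GETRANGE are the underlying Redis commands):

import redis

r = redis.Redis()

def cache_mp3(key, data, ttl_seconds=300):
    # Redis is binary-safe: store the raw MP3 bytes with a short TTL.
    r.setex(key, ttl_seconds, data)

def stream_mp3(key, chunk_size=100_000):
    # Read the value back in chunks with GETRANGE (end offset is inclusive).
    offset = 0
    while True:
        chunk = r.getrange(key, offset, offset + chunk_size - 1)
        if not chunk:
            break
        yield chunk
        offset += chunk_size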
We have a tool which loads data from some optical media, and once it's all copied to the hard drive runs it through a third-party tool for processing. I would like to optimise this process so each file is processed as it is read in. Trouble is, the third-party tool (which naturally I cannot change) has a 12 second startup overhead. What is the best way I can deal with this, in terms of finishing the entire process as soon as possible? I can pass any number of files to the processing tool in each run, so I need to be able to determine exactly when to run the tool to get the fastest result overall. The data being copied could be anything from one large file (which can't be processed until it's fully copied) to hundreds of small files.
The simplest approach would be to create and run two threads: one that runs the tool and one that loads data. Start a 12-second timer and start both threads. On each file-load completion, check the elapsed time; if 12 seconds have passed, hand the accumulated data to the thread running the tool, and keep loading data in parallel with the processing of the previous batch. Once the previous batch finishes processing, restart the 12-second timer and continue checking it on every file-load completion. Repeat until no data remains.
For better results, a more complex solution might be required. You can do some benchmarking to estimate the average data-loading time. Since it may differ for small and large files, several estimates may be needed for different file-size categories. Optimal resource utilization means processing data at the same rate new data arrives, where processing time includes the 12-second startup. The benchmarking should give you a ratio of processing threads to reading threads (you can also increase or decrease the number of active reading threads according to incoming file sizes). Essentially, this is a variation of the producer-consumer problem with multiple producers and consumers.
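A minimal sketch of the simple two-thread version in Python (copy_file and run_tool stand in for the asker's copy step and third-party tool; the 12-second threshold is the tool's startup overhead):

import queue
import threading
import time

BATCH_SECONDS = 12       # amortize the tool's 12-second startup over each batch
copied = queue.Queue()   # loader thread -> processing thread

def loader(files):
    for f in files:
        copy_file(f)     # copy one file from the optical media to disk
        copied.put(f)
    copied.put(None)     # sentinel: nothing more to load

def processor():
    batch, deadline, done = [], time.monotonic() + BATCH_SECONDS, False
    while not done:
        try:
            item = copied.get(timeout=1)
            if item is None:
                done = True
            else:
                batch.append(item)
        except queue.Empty:
            pass
        if batch and (time.monotonic() >= deadline or done):
            run_tool(batch)  # one tool invocation per accumulated batch
            batch, deadline = [], time.monotonic() + BATCH_SECONDS

Start both with threading.Thread(target=loader, args=(files,)).start() and threading.Thread(target=processor).start(); the loader keeps copying while the tool is busy with the previous batch.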
I'd like to provide an accessor on a class that returns an NSInputStream for STDIN, which may be several hundred megabytes (or, less likely, gigabytes) of data.
When a caller gets this NSInputStream it should be able to read from it without worrying about exhausting the data it contains. In other words, another block of code may request the NSInputStream and will expect to be able to read from it.
Without first copying all of the data into an NSData object, which (I assume) would cause memory exhaustion, what are my options for handling this? The returned NSInputStream does not have to be the same instance; it simply needs to provide the same data.
The best I can come up with right now is to copy STDIN to a temporary file and then return NSInputStream instances using that file. Is this pretty much the only way to handle it? Is there anything I should be cautious of if I go the temporary file route?
EDIT | I should mention that it's not actually STDIN; this is a multithreaded FastCGI application, and it's the FCGX_Request.in stream, which originally came from STDIN.
When reading data from a pipe or socket, you have three options:
Process it and forget it.
Add it to a complete record in memory and process it before or after doing so.
Add it to a complete file and process it before or after doing so.
That's the complete list. There's nowhere else to record it but short-term or long-term storage, so the only other thing you can do with data you read is to not record it at all.
The only other way to get the data again is for whatever sent it to you to send it again.
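In other words, the asker's temp-file plan is the right shape: record the stream once, then hand out fresh readers over the recording. A minimal sketch of that pattern in Python for illustration (the real code would return NSInputStream instances over the temporary file; the class name and the once-only spooling are assumptions):

import shutil
import sys
import tempfile

class StdinSpool:
    # Copy the incoming stream to a temporary file once, then hand out
    # independent readers over that file, so the data is never "used up".
    def __init__(self, source=sys.stdin.buffer):
        self._file = tempfile.NamedTemporaryFile(delete=False)
        shutil.copyfileobj(source, self._file)  # spool the whole stream to disk
        self._file.flush()

    def open_reader(self):
        # Each caller gets its own file object positioned at the start.
        return open(self._file.name, "rb")

Things to be careful of with the temporary-file route: clean the file up when you are done, make sure the temp directory has enough space, and finish the spooling before any reader is handed out.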