Is there any performance difference between creating an NSFileHandle for a large versus a small file? - objective-c

This question strikes me as almost silly, but I just want to sanity check myself. For a variety of reasons, I'm welding together a bunch of files into a single megafile before packing this as a resource in my iOS app. I'm then using NSFileHandle to open the file, seek to the right place, and read out just the bytes I want.
Is there any performance difference between doing it this way and reading loose files? Or, supposing I could choose to use just one monolithic megafile, versus, say, 10 medium-sized (but still joined) files, is there any performance difference between "opening" the large versus a smaller file?
Since I know exactly where to seek to, and I'm reading just the bytes I want, I don't see how there could be a difference. But hey, stranger things have turned out to be true. Thanks in advance!

There could be a difference if you had an extremely large number of files. Every open file uses up resources in memory (file handles and the like), and on some storage devices a file occupies a whole allocation block even if it doesn't fill it, which can waste space in extreme cases. In practice, though, it probably won't be a problem. To know for sure, profile your code and see whether it's faster one way or the other, and check how much space each approach takes up on a typical device.
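For what it's worth, the seek-and-read itself is only a few lines with NSFileHandle. A minimal sketch (the path, offset, and length are placeholders for wherever your sub-file lives inside the megafile):

    #import <Foundation/Foundation.h>

    // Read `length` bytes starting at `offset` from a larger file.
    NSData *ReadChunk(NSString *path, unsigned long long offset, NSUInteger length)
    {
        NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
        if (handle == nil) {
            return nil;                                   // missing or unreadable file
        }
        [handle seekToFileOffset:offset];                 // jump straight to the sub-file
        NSData *chunk = [handle readDataOfLength:length]; // read only the bytes you want
        [handle closeFile];
        return chunk;
    }

The open and seek should cost essentially the same regardless of how big the containing file is; what you pay for is the bytes you actually read.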

Related

Why does reading a file multiple times vary in reading-time?

This is perhaps a very obvious question due to simple computer-science rules, but is there a good explanation of why it varies so much from time to time? Reading a small file can sometimes take a few milliseconds and other times a few seconds. Of course this depends on how you read the file, and also on what language you read it in (i.e. the programming language).
Maybe there isn't an obvious answer for this? I'm not sure; I haven't read much about it, which is why I'm asking the question.
One thing that can cause varying read time is whether the file is in memory or not.
Reading from disk is much slower than reading from memory. So if a file has already been read and is sitting in the OS cache, subsequent reads of that file will be much quicker, until it gets evicted from memory.
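You can see this for yourself by timing the same read twice in a row. A rough sketch (the path is a placeholder, and the exact numbers will vary with the device and file size):

    #import <Foundation/Foundation.h>

    // Read the same file twice and log how long each read takes.
    // The second read is usually much faster if the file ends up in the OS cache.
    void TimeReads(NSString *path)
    {
        for (int i = 0; i < 2; i++) {
            CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
            NSData *data = [NSData dataWithContentsOfFile:path];
            CFAbsoluteTime elapsed = CFAbsoluteTimeGetCurrent() - start;
            NSLog(@"read %d: %lu bytes in %.3f s", i, (unsigned long)data.length, elapsed);
        }
    }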

How to deal with thousands of small audio files?

I need to implement an app that has a feature to play sounds. Each sound will be a spoken word, and the expected number of sounds is about one thousand. So the simplest solution would be to store those sounds as sound files, each word in a separate file, and play them on demand. Would there be any potential problems with such a large number of files?
No problem with that many files, but they will take up more space than just the total of their sizes. Each file fills a whole number of allocation blocks on the device, so on average you will waste about half a block per file (as a rule of thumb), unless all your files are significantly smaller than one block, in which case you will always use 1,000 blocks (one per file) and waste 1000 * (blocksize - average file size).
Things you could do:
Concatenate the files into one big file, store the start and length of each subfile, and either read the chunk into memory or copy it to a temporary file (a rough sketch follows this list).
Drop the files in a database as BLOB fields for easier retrieval. This won't save space, but may make your code simpler or more reliable.
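If you go the concatenation route, the bookkeeping is just an offset/length index built while you write the big file. A rough sketch, assuming a single pass over the source files (the manifest format and naming are made up for illustration):

    #import <Foundation/Foundation.h>

    // Append each source file to one big file and record { name : { offset, length } }.
    NSDictionary *ConcatenateFiles(NSArray<NSString *> *paths, NSString *outputPath)
    {
        [[NSFileManager defaultManager] createFileAtPath:outputPath
                                                contents:nil
                                              attributes:nil];
        NSFileHandle *out = [NSFileHandle fileHandleForWritingAtPath:outputPath];
        NSMutableDictionary *index = [NSMutableDictionary dictionary];

        for (NSString *path in paths) {
            NSData *data = [NSData dataWithContentsOfFile:path];
            unsigned long long offset = [out offsetInFile];   // where this sub-file starts
            [out writeData:data];
            index[path.lastPathComponent] = @{ @"offset" : @(offset),
                                               @"length" : @(data.length) };
        }
        [out closeFile];
        return index; // persist this (e.g. as a plist) alongside the big file
    }

At playback time you look up the offset and length in the index and do a single seek-and-read.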
I don't think you need to make your own caching mechanism. Most likely iOS has a system-wide cache that does a far better job. Caching should only become relevant if you experience performance issues and need much shorter load times. In that case, perhaps consider using blocks for loading and dispatching playback, as that's an easier way to hide load latency and avoid UI freezes.
If your audio is uncompressed, the App Store will report the compressed size. If that differs a lot from the unpacked size, some (nitpicking) customers will definitely notice and complain, because they assume the advertised size is the install size. I know this from personal experience. They will generally not take a technical answer for an answer, and may even bypass talking to you and just downvote you over it. I s#it you not.
You should be fine storing 1,000 audio clip files within the IPA, but it is important to take note of the space requirements and organisation.
Also take into consideration that accessing the disk is slower than memory and drains the battery, so it may be sensible to load the most frequently used audio clips into memory.
If you can afford it, use FMOD, which I believe can extract audio from various compressed schemes. If you just want to handle all those files yourself, create a .zip file and extract them on the fly using libz (libz.dylib on iOS).

MPI-2 file format options

I am trying to speed up my file I/O using MPI-2, but there doesn't appear to be any way to read/write formatted files. Many of my I/O files are formatted for ease of pre and post-processing.
Any suggestions for an MPI-2 solution for formatted I/O?
The usual answer for using MPI-IO while generating a portable, sensible file format is to use HDF5 or NetCDF4. There's a real learning curve to both (but also lots of tutorials out there), but the result is that you have portable, self-describing files for which there are a zillion tools for accessing, manipulating, etc.
If by "formatted" output you mean plain human-readable text, then as someone who does a lot of this stuff, I wouldn't be doing my job if I didn't urge you to start moving away from that approach. We all by and large start that way, dumping plain text so we can quickly see what's going on; but it's just not a good approach for production runs. The files are bloated, the I/O is far slower (I routinely see a 6x slowdown using ASCII versus binary, partly because you're writing out small chunks at a time and partly because of the string conversions), and for what? If there's so little data being output that you can feasibly read and understand it all, you don't need parallel I/O; if there are so many numbers that you can't plausibly flip through them all and understand what's going on, then what's the point?
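To make the text-versus-binary point concrete, here's a toy comparison in plain C (not MPI-IO; the array size and format are arbitrary). The formatted path does a string conversion and a small write per value, while the binary path hands the whole buffer to one write call:

    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double data[N];                 /* stand-in for simulation output */
        for (int i = 0; i < N; i++) data[i] = i * 0.001;

        /* Formatted: one string conversion and one small write per value. */
        FILE *txt = fopen("out.txt", "w");
        for (int i = 0; i < N; i++) fprintf(txt, "%.12g\n", data[i]);
        fclose(txt);

        /* Binary: a single bulk write, no conversions. */
        FILE *bin = fopen("out.bin", "wb");
        fwrite(data, sizeof(double), N, bin);
        fclose(bin);
        return 0;
    }

The exact slowdown you see will depend on the system, but the per-value conversions and the many small writes are where the ASCII time goes.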

How can I accelerate the generation of an MD5 checksum within vb.net?

I'm working with some very large files residing on P2 (Panasonic) cards. Part of our process is to first generate a checksum of the file we are going to copy, then copy the file, then run a checksum on the copy to confirm that it copied OK. The problem is that the files are large (70 GB+) and take a long time to process. It's an issue since we will eventually be dealing with thousands of these files.
I would like to find a faster way to generate the checksum than using System.Security.Cryptography.MD5CryptoServiceProvider.
I don't care if this means using a specialized hardware card, provided it works and is not too ungodly expensive. I would prefer a method that provides some feedback on how far the process has gone, so I can display progress as I do now.
The application is written in vb.net. I would prefer to use the solution as a component, library, or reference within my application, but I'm willing to call an outside application if there is enough improvement in the speed of generating the checksum.
Needless to say, the checksum must be consistent and correct. :-)
Thank you in advance for your time and efforts,
Richard
I see one potential way to speed up this process: calculate the MD5 of the source file while performing the copy, not prior to it. This will reduce the number of times you'll need to read the entire file from 3 (source hash, copy, destination hash) to 2 (copy, destination hash).
The downside is that you'll have to write your own copying code (as opposed to just relying on System.IO.File.Copy), and there's a non-zero chance this will turn out to be slower in the end than the 3-step process anyway.
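The thread is VB.NET, but the hash-while-you-copy pattern itself is language-agnostic. As an illustration only, here's what the loop looks like in Objective-C with CommonCrypto (paths and the 1 MB buffer size are placeholders); the same shape applies with an incremental hash API in .NET:

    #import <Foundation/Foundation.h>
    #import <CommonCrypto/CommonDigest.h>

    // Copy srcPath to dstPath, hashing each buffer as it passes through,
    // so the source file is read only once.
    NSData *CopyFileHashingSource(NSString *srcPath, NSString *dstPath)
    {
        NSFileHandle *src = [NSFileHandle fileHandleForReadingAtPath:srcPath];
        [[NSFileManager defaultManager] createFileAtPath:dstPath contents:nil attributes:nil];
        NSFileHandle *dst = [NSFileHandle fileHandleForWritingAtPath:dstPath];

        CC_MD5_CTX ctx;
        CC_MD5_Init(&ctx);

        NSData *chunk;
        while ((chunk = [src readDataOfLength:1024 * 1024]).length > 0) {
            CC_MD5_Update(&ctx, chunk.bytes, (CC_LONG)chunk.length); // hash as we go
            [dst writeData:chunk];                                   // ...and copy
            // a progress callback (bytes done / total size) would slot in here
        }

        unsigned char digest[CC_MD5_DIGEST_LENGTH];
        CC_MD5_Final(digest, &ctx);
        [src closeFile];
        [dst closeFile];
        return [NSData dataWithBytes:digest length:CC_MD5_DIGEST_LENGTH];
    }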
Other than that, I don't think there's much you can do here, as the entire process is I/O bound by design. You're spending most of your time reading/writing the file, and even at 100MB/s (a respectable I/O speed for your typical SATA drive), you'll do about 5.8GB/min at best.
With a modern processor, the overhead of calculating the MD5 (or anything else) doesn't factor into things very much, so speeding it up won't improve your overall throughput. Crypto accelerators in particular won't help you here, as unless the driver implementation is very efficient, they'll add more overhead due to context switches required to feed the data to the external card than they'll save.
What you do want to improve is the I/O speed. The .NET framework is already pretty efficient when it comes to this (using nicely-sized buffers, overlapped I/O and such), but it's possible an optimized native Windows application will perform better here. My advice: Google around for a few native MD5 calculators, and see how they compare to your current .NET implementation. If the difference in hash calculation speed is >10%, it's worth switching to using said external app.
The correct answer is to avoid using MD5. MD5 is a cryptographic hash function, designed to provide certain cryptographic guarantees. For merely detecting accidental corruption it is over-engineered and slow. There are many faster checksums, whose design can be understood by examining the literature on error detection and correction. Common examples are the CRC checksums, of which CRC32 is very widely used, and you can also compute 64-bit, 128-bit, or even larger CRCs much, much faster than an MD5 hash.
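As an illustration only (again not VB.NET), here's a streaming CRC32 over a file using zlib's crc32() function, linked against libz; the path and chunk size are placeholders. Whether CRC32 is strong enough for your failure model is your call:

    #import <Foundation/Foundation.h>
    #include <zlib.h>

    // Streaming CRC32 of a file, reading it in 1 MB chunks.
    unsigned long FileCRC32(NSString *path)
    {
        NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
        uLong crc = crc32(0L, Z_NULL, 0);   // zlib's initial CRC value

        NSData *chunk;
        while ((chunk = [handle readDataOfLength:1024 * 1024]).length > 0) {
            crc = crc32(crc, (const Bytef *)chunk.bytes, (uInt)chunk.length);
        }
        [handle closeFile];
        return crc;
    }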

How important is size in an application?

When creating applications (Java, run on a normal computer). How important is program size for users? For example, would it be necessary to replace .png's with .jpg's, convert .wav's to .midi's, or strip down libraries to save space, or do users generally not care if my program is 5mb when it could be 50kb if stripped down?
Thanks.
That depends on the delivery mechanism.
Size is generally only relevant in terms of the bandwidth required to download it. If the program is downloaded often, then it matters a lot. If it's only downloaded once, it matters less, and you have to weigh the time involved in reducing the size against how much space you save.
After that, nobody cares until you get into gigabytes. Well, mobile applications will probably start caring at about 10MB+.
Users definitely care (after all, not only does space cost money, but size also affects program load time). However, the question becomes how much you optimize. I suggest the 80/20 rule: 80% of your benefit comes from the first 20% of the effort.
If you use a utility like TreePie you might be able to see which parts of a large application are consuming most of the space. If you find it's just a few large images, or one big DLL with a bunch of embedded resources, it's probably worth reducing the size, if it's easy.
But there's a cost/benefit tradeoff. I just saw a terabyte drive for $100 the other day, so saving the user 1 GB is worth about 10 cents of storage, plus some hard-to-quantify amount of time spent loading every time they run the program. If you have 100,000 users, it's probably worth your time to optimize a bit, but if you're writing custom software for one user it's probably not worth it unless they're complaining.
As mentioned by Graham Lee, a great deal of this is very dependent on your users. If you are writing something that needs to be optimized to fit on a 68000 processor, then you'd better believe that program size matters. Assuming you're not programming 30 years ago, you probably won't run across that particular issue.
But in general, you should be making your application as small as possible while still achieving the quality you want. That is to say, if your application is likely to be viewed on a 640x480 screen, you don't need hi-res 6 MB PNGs for all your images. On the other hand, if your application is designed to be blown up on a big screen at conferences, then you probably want to upsize your images.
Another very common option is creating installers with separate options ranging from full to minimal. That way you let your users decide whether size matters to them. It allows you to ship the pretty version of your app alongside a scaled-back version that doesn't include the tutorials or the mp3 files of a soothing woman's voice telling you that you've pushed the wrong button.
Know your users. And if you don't, then let them decide for themselves.
Ask yourself: what would you use? Would you rather save space with 5 KB programs or waste it with 5 MB programs?
I think that smaller is better, especially if the program doesn't use/need much graphics and can be optimized.
I would say not important at all, unless it's obscenely large.
I would argue that startup time is far more important to users than application size.
However, if you include a lot of media files with your application, it is logical to optimise this data as much as possible. But don't compromise the quality: switching to JPEG might be okay for photos, but it sucks for technical diagrams. A .wav could become an .aac or .mp3, but not if you're writing a professional audio application.