I have to save around 100 sentences in my app. Each sentence is around 50-90 characters. I can create a .plist file for this, but I don't want to read the whole .plist file, just a specific index. Is that possible?
There's no API for reading plist files partially. For a ~10 kB file that's not much of a problem: just read the whole thing and discard what you don't need.
I am not sure whether it's possible to avoid reading the whole file. But if you want to access a single entry, you can create an array from the plist and access the index (I assume you are already aware of this). It won't take much memory unless your app is already heavily loaded.
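For example, a minimal Swift sketch, assuming the sentences are a flat array of strings in a file named sentences.plist in the app bundle (the file name is an assumption):

    import Foundation

    // Read the whole plist (an array of strings) and pick out one index.
    // "sentences.plist" is an assumed file name in the main bundle.
    func sentence(at index: Int) -> String? {
        guard let url = Bundle.main.url(forResource: "sentences", withExtension: "plist"),
              let sentences = NSArray(contentsOf: url) as? [String],
              sentences.indices.contains(index) else { return nil }
        return sentences[index]
    }

For 100 short strings the whole array is under 10 kB, so loading it all is cheap.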
I am trying to wrap my head around the PDF file structure. There is a header, a body with objects, a cross-reference table, and a trailer. In the official PDF reference from Adobe, section 3.4.4 on the file trailer, it says:
The trailer of a PDF file enables an application reading the file to quickly find the cross-reference table and certain special objects. Applications should read a PDF file from its end.
This looks very inefficient to me. I can't show anything to users this way (not even the first page) before I load the whole file. Well, to be precise, I can, if my file is linearized. But that is optional and means some extra overhead both when writing and reading such a file.
Instead of that whole linearization thing, it would be easier to just put the references in front of the body (followed by the objects for page 1, page 2, page 3...). But the people at Adobe probably had their reasons to put it after the body. I just don't see them. So...
Why is the cross-reference table placed after the body?
I would agree with the two reasons already mentioned, though not so much because of hardware limitations "back in the day" as because of scale. It's easy to think an invoice with a couple of pages of text could be handled differently, but what about a book, or a PDF with 1,000 photos?
With the trailer at the end, you can write images/text/fonts to the file as they are processed and then discard them from memory, simply storing the file offset of each object for use when writing the trailer.
If the trailer had to come first, then you would have to read (or, in the case of an embedded font, even generate) all of these objects just to get their sizes so you could write out the trailer, and only then write all the objects to the file. So you would either be reading, sizing, discarding, then reading again, or trying to hold everything in RAM until you could write it to the file.
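The pattern is easy to see in a schematic sketch (Swift here just for concreteness; the object payloads and the "xref"/"startxref" lines are invented placeholders, not real PDF syntax). The writer streams each object out, remembers only its byte offset, and can emit the offset table only once everything else has been written:

    import Foundation

    var output = Data()
    var offsets: [Int] = []

    // Stream each object out as it is produced; keep only its byte offset.
    for payload in ["1 0 obj ... endobj\n", "2 0 obj ... endobj\n"] {
        offsets.append(output.count)
        output.append(Data(payload.utf8))
    }

    // Only now are all offsets known, so the xref-like table can be written last.
    let xrefStart = output.count
    for (i, offset) in offsets.enumerated() {
        output.append(Data("object \(i + 1) starts at byte \(offset)\n".utf8))
    }
    output.append(Data("startxref \(xrefStart)\n".utf8))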
Write speed and RAM are still issues we contend with today when we're running in a Docker container on a VM on shared hardware.
PDF was invented back when hard drives were slow to write files... really s-l-o-w. By putting the xref at the end, you could quickly change a file by simply appending new objects and an updated xref to the end of the file rather than rewriting the whole thing.
Not only were the drives slow (giving rise to the argument in #joelgeraci's answer), there was also much less RAM available in a typical computer. Thus, when creating a PDF, one had to write data to the file early, much earlier than one had any idea how big the file or, as a consequence, the cross references would become. Writing the cross references at the end was therefore a natural consequence.
I have several pieces of data that need to be merged into one file (an ATContentTypes blob file, Plone 4.1). The total amount of data is likely to be quite large, so I really don't want to have to load it all into memory, concatenate it, and do something like o.setFile(data). If I were writing directly to the file system I could just do open(myfile, 'a') and write to it, but it's not clear to me how I could do that with a blob-backed content type. All of the docs and tests I've been able to look at just have it being set from a str or an in-memory StringIO. Is there a way to append to this field without loading the whole thing into memory?
Similarly, I've also looked at using Dexterity with a plone.namedfile NamedBlobFile. It looks like that field just has a 'data' attribute that is basically a string. How could I append to that without loading the whole thing into memory?
It's quite old and the product has never been officially released, but it can help you: ore.bigfile.
It's well explained in this blog article: http://blog.jazkarta.com/2010/09/21/handling-large-files-in-plone-with-ore-bigfile/
I'm new to Apache Lucene.
Is it possible to store files (e.g. PDF, DOC) in Apache Lucene and retrieve them later? Or do I have to store those files somewhere else and use Lucene just for indexing?
Technically you can, of course, store the contents of a file (e.g. in a StoredField or elsewhere), but I don't see any reason why you should. It will bring no added value, only pain while serializing and deserializing file contents, and you will still have to keep the file name indexed somewhere else. Apart from the serialization/deserialization pain, your app will likely have to block longer while Lucene merges index segments.
The best approach IMO is to store the path to the file relative to some file repository root. E.g. if your file is at /home/users/bob/files/123/file.txt, you might want to store the files/123/file.txt part without tokenization (using a StringField).
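The indexing itself happens in Java through Lucene's API, but the path-relativization step is the same in any language. A minimal sketch of just that step (in Swift, reusing the example path above; the repository root is an assumption):

    import Foundation

    // Only the path handling, not the indexing. Lucene would then store
    // relativePath in an untokenized field (StringField).
    let repoRoot = URL(fileURLWithPath: "/home/users/bob")
    let file = URL(fileURLWithPath: "/home/users/bob/files/123/file.txt")

    let relativePath = file.pathComponents
        .dropFirst(repoRoot.pathComponents.count)
        .joined(separator: "/")
    print(relativePath)   // files/123/file.txt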
This is a bit of a two-part question about working with 40 MB XML files.
• What’s a reasonable size to store in memory for a program running continually in the background?
• How do I find what has changed in an XML file?
So on the first read the XML is loaded into NSData, then uploaded to the server.
Now, instead of uploading a 40 MB XML file every time it changes, I would prefer to upload a "delta" file containing only what has changed. The program would monitor the file for changes and activate when it's been modified. From what I can see, I would need to parse an old version of the XML file and the modified XML file, then compare them? Is it unreasonable to store 80 MB in memory like this every time the file is modified? I'm assuming that this has to be done with a DOM parser, because I can't see how you could compare two files like that with a SAX parser, since it only has part of the file in memory at a time.
I'm a newbie at this so any help would be appreciated!
To compare two files, there are a few options (given the file sizes involved, not all of them may be appropriate):
• sdiff file1.xml file2.xml, a Unix command. You can run this command from AppleScript (see the sketch after this list).
• -[NSFileManager contentsEqualAtPath:andPath:]. This method first checks whether the two paths refer to the same file, then compares their sizes, and finally compares their contents (also covered in the sketch below).
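A minimal Swift sketch of both options (assuming sdiff lives at /usr/bin/sdiff, as on a typical macOS install, and that both files are in the current directory; error handling elided):

    import Foundation

    // Option 1: shell out to sdiff instead of going through AppleScript.
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/sdiff")
    process.arguments = ["file1.xml", "file2.xml"]
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    process.waitUntilExit()
    let diffOutput = String(data: pipe.fileHandleForReading.readDataToEndOfFile(),
                            encoding: .utf8) ?? ""

    // Option 2: Foundation's cheap "did anything change?" check. Note that it
    // only answers yes/no; it does not tell you what changed.
    let changed = !FileManager.default.contentsEqual(atPath: "file1.xml",
                                                     andPath: "file2.xml")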
For the other part:
I don't think there is a fixed limit on how much memory a background process may use; it matters more for the application as a whole. You can also save the data into temporary files instead of holding it all in memory. Even Safari uses 130+ MB, as you can easily check in Activity Monitor.
NSXMLParser ended up being the most useful for this.
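For what it's worth, a minimal sketch of the streaming (SAX-style) approach with NSXMLParser, called XMLParser in Swift. The file is streamed, so only the pieces the delegate chooses to keep ever sit in memory (the path and the count-elements delegate are just illustrative):

    import Foundation

    // Delegate that keeps only a count per element name, not the document.
    final class ElementCollector: NSObject, XMLParserDelegate {
        var elementCounts: [String: Int] = [:]

        func parser(_ parser: XMLParser,
                    didStartElement elementName: String,
                    namespaceURI: String?,
                    qualifiedName qName: String?,
                    attributes attributeDict: [String: String]) {
            elementCounts[elementName, default: 0] += 1
        }
    }

    let collector = ElementCollector()
    // Hypothetical path; substitute the real 40 MB file.
    if let parser = XMLParser(contentsOf: URL(fileURLWithPath: "/tmp/big.xml")) {
        parser.delegate = collector
        parser.parse()
        print(collector.elementCounts)
    }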
Is there any difference between writing objects (e.g. arrays and NSObject subclasses) to a plist and writing them with NSFileManager?
If so, can I write thousands of images and MP3 songs to a plist?
As a plist file is an XML (mostly text) file, you must archive your data before writing and unarchive it after reading (see NSKeyedArchiver).
NSFileManager, however, is a wrapper around generic filesystem operations, and there is no need to marshal the data into text in order to store it. The stored data will therefore be much smaller and much quicker to read/write, and is the obvious choice.
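A short sketch of the two routes being contrasted, in Swift (the file paths and the fake MP3 bytes are invented for illustration; error handling elided):

    import Foundation

    let songData = Data([0x49, 0x44, 0x33])   // pretend these are an MP3's bytes

    // Plist route: the object graph must be archived to a serializable form first.
    let archived = try NSKeyedArchiver.archivedData(withRootObject: ["song": songData],
                                                    requiringSecureCoding: false)
    try archived.write(to: URL(fileURLWithPath: "/tmp/archive.plist"))

    // Raw-file route: the bytes go to disk as-is, with no marshalling step.
    _ = FileManager.default.createFile(atPath: "/tmp/song.mp3", contents: songData)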
NSFileManager is used to do things like copy a file, remove a file, and move a file. You wouldn't use NSFileManager to write out a new file; it's only good for working with existing files. It isn't comparable to a plist in the sense of "using a plist vs. NSFileManager".
However, if the question you're trying to answer is should I store all my data in a plist or in separate files then that depends. If you're going to store 1000s of images and mp3s, then you definitely do not want to store them all in a single plist. Plists are an inefficient format for storing and updating large amounts of information, both in terms of speed and memory. For example, if you wanted to update a single string in your plist, you have to read the entire plist into memory, update it, and then write the entire plist back to disk. You cannot update just a portion of the plist using the standard plist functions provided by Foundation. If your plist contains all your image and mp3 data, it's going to be really slow.
You may be able to get away with using a plist as a manifest for your images and mp3s which are stored as separate files on disk, but even that can get slow. I'd recommend using SQLite or Core Data instead for the manifest of files and then keeping each image and mp3 as a separate file in the file system. Or, if you don't need to store any metadata with each item, you don't need the manifest at all. If you do end up going with a plist for your manifest, make sure to save the plist as a binary plist using the NSPropertyListBinaryFormat_v1_0 option. This will make the plist take up less space on disk and de-serialize faster when you read it again.
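For example, a manifest written as a binary plist might look like this in Swift (the keys and paths are invented; .binary is the Swift spelling of NSPropertyListBinaryFormat_v1_0):

    import Foundation

    // Small manifest saved as a *binary* plist; the heavy image/MP3 bytes
    // stay in their own files and are only referenced by path.
    let manifest: [[String: Any]] = [
        ["file": "audio/track01.mp3", "title": "Track 1"],
        ["file": "images/cover01.jpg", "title": "Cover 1"],
    ]

    let data = try PropertyListSerialization.data(fromPropertyList: manifest,
                                                  format: .binary,
                                                  options: 0)
    try data.write(to: URL(fileURLWithPath: "/tmp/manifest.plist"))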