What object to hold a large amount of text in? - smalltalk

I am planning a Seaside app that holds text, a single instance of which may be up to, say, 5 MB. What kind of object is best for this?
I would also like to do some iterations over this text.
Thanks, Vince
Edit: Thanks for your replies thus far. The file is a CSV file that takes ~40 minutes to generate from a legacy finance system, so it must be pre-generated and stored. Each line is a customer record and I need to pull each one out and use the values as and when the customer logs in. Customer access is not predictable and interfacing with the legacy system to generate each line on the fly is a very last resort.

Given that the file takes that long to generate and that you need more-or-less random access to the file later on, I would opt for parsing the file and keeping the structured data in memory afterwards.
There is a CSV Parser project on Squeaksource that you can use. It will create a structured object tree of the CSV records that you can use.
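As a language-neutral illustration of "parse once, keep the structured data in memory", here is a minimal sketch in Python rather than Smalltalk. The column names (customer_id, name, balance) and the sample rows are made up for the example; the real file's layout is not given in the question.

```python
import csv
import io

def parse_customers(csv_text):
    """Parse CSV text with a header row into a list of dicts, one per customer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

# Hypothetical sample standing in for the pre-generated export.
sample = "customer_id,name,balance\nC001,Ada,120.50\nC002,Bob,-15.00\n"
customers = parse_customers(sample)
```

The parse happens once, after the export is generated; every later request works against the in-memory records instead of the raw text.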

Use an external Text file and some instance of a specific class as representation of that file. Use the oop of the object as the name of the file.

Just use a collection of customers, and fill it from the CSV, as Johan said. Depending on your accessing needs you can use a Dictionary or an OrderedCollection to hold it.
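The choice between a keyed and an ordered collection could look roughly like this. It is a Python sketch, with a dict standing in for a Smalltalk Dictionary and a list for an OrderedCollection; the field name customer_id and the records are assumptions for illustration.

```python
def index_by(records, key):
    """Build a dictionary that maps each record's key field to the record itself."""
    return {record[key]: record for record in records}

# Hypothetical records, as they might look after parsing the CSV.
customers = [
    {"customer_id": "C001", "name": "Ada"},
    {"customer_id": "C002", "name": "Bob"},
]

by_id = index_by(customers, "customer_id")

# Keyed access when a customer logs in; the list still supports ordered iteration.
record = by_id.get("C002")
```

If customers are only ever looked up by ID at login, the dictionary alone is enough; keep the ordered collection only if you also need to iterate in file order.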

5 megs is nothing. Don't worry about that.
If you can't reify those CSV records into objects (after parsing and instantiating them), then a Collection of Strings or even Streams would be just fine.
If you need keyed lookup then a Dictionary or LookupTable would do the job.
I have had 100 megs of text data in memory (1 million rows), and even persisted it in the image (image save), with no problems.
Regards.

Related

How to store huge amount of NSStrings for comparison purpose

I am writing a (linguistic) morphology Mac application. I often have to check whether the words in a given text are in a huge list of words (~1,000,000).
My question is: how do I store these lists? I see two options:
Use a .txt file to store the words and create an NSSet from this file, which lives for as long as the application is running.
Use a database like SQLite.
Some points:
I think the focus should be on speed, because the analysis is triggered by the user and these comparisons make up the largest part of the computation.
The lists may change via updates.
I have used Core Data and MySQL before, so (I think) I could implement either.
I have read a lot about the pros and cons of database vs. file, but none of it seemed to fit my use case.
I don't know whether it matters which technique I use, because the files are relatively small (~20 MB) and, even with many supported languages, only 3-4 of these files will be loaded into memory at the same time.
Thanks! Danke!

ObjC: Write to .plist, optimized way?

Reading this article for instance, writing an NSDictionary to a plist looks pretty easy. Now, if I just want to modify one single row of an existing plist: Is there any way to update the row without re-writing the whole file?
Like "when updating a row in a database table, I don't have to rewrite the whole table"?
Thanks,
J.
First: a plist is meant for very trivial data, so optimising how it is read and written should not even be a question. If you need to optimise that, you need to change your application design.
Second: a plist is basically an XML file, so of course you can use XML parsers.
Theoretically, you could do modifications faster under the condition that the string representation of your new value takes up precisely as many characters as that of the old value. However, this would be a bad case of micro-optimization: unlike a database, where rewriting an entire table may take considerable time, rewriting a plist in its entirety should go unnoticed as far as timing is concerned. Your code to locate and overwrite the old value in a file would not be pretty by any standard. It has no chance of matching the clarity of a three-line "read-modify-write" code fragment that rewrites the whole thing.
This assumes that you are using your property list for its intended purpose, which is storing relatively small amounts of data. If you are using a plist to store moderate to large amounts of data, perhaps you need to switch to using a database, where updating individual rows is permitted.
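For a sense of how small the "read-modify-write" fragment is, here is a sketch that uses Python's plistlib as a stand-in for the Objective-C calls (the thread itself is about NSDictionary). The file name and keys are invented for the demo.

```python
import plistlib
import tempfile
from pathlib import Path

def update_plist_value(path, key, value):
    """Read the whole plist, change one entry, and write the whole file back."""
    with open(path, "rb") as f:
        data = plistlib.load(f)
    data[key] = value
    with open(path, "wb") as f:
        plistlib.dump(data, f)

# Demo on a throwaway file with made-up settings.
plist_path = Path(tempfile.mkdtemp()) / "settings.plist"
with open(plist_path, "wb") as f:
    plistlib.dump({"username": "j", "volume": 3}, f)

update_plist_value(plist_path, "volume", 7)

with open(plist_path, "rb") as f:
    result = plistlib.load(f)
```

The whole file is rewritten on every update, which is the trade-off described above: trivial code, and a cost that only matters once the plist stops being small.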
Where is the sense in that?
Leaving aside what has already been said (that a plist, which uses XML, is designed for small amounts of data, and that an in-place update could only work when the new value has exactly the same length as the old one in its XML string representation), there is hardly anything to gain.
Frankly, I do not know what the cluster size is (or whatever the smallest allocatable unit of disk space is called in iOS's file system), but I would not expect it to be much smaller than the average size of a typical plist file. In all file systems that I know, a cluster is always written in one go. Even if the programmer changes a single byte and tries to write only that one byte back to the file, the OS will always write the full cluster to disk (or whatever there is that we still call a disk :).
Meaning - even if you try hard, you will hardly gain anything from that attempt.
Unless of course, your data structure is rather big and probably complex for a plist file. In that case, as it has been said already, consider a change to the data storage method that you are using.
From another answer:
You have:
a plist file
a container (for example, NSMutableArray) in which you load this plist
a table view which takes values from this container.
To modify plist through table view:
change value in container
write container back to plist file.

Store and Retrieve auto-fill text terms, iOS

I have a list of about 10,000 phrases (1-5 words each). When the user starts to type in the search bar, I want to display a table view that filters these phrases to find matches, i.e. it will function like auto-fill in your browser.
My question is: What is the best way to store this data? Should I just put it in an array that gets initialized when the user searches? Or should it be stored in an external file?
(I am working with iOS).
Thanks!
You could easily do it with an array, but the performance would be very poor.
It would be best to have it in a SQLite (or Core Data) database and search that.
I think having it in a file could be even worse performance than the array.
Save it in a SQLite or Core Data database. You could also use a .plist file, although that might take longer to read through.
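A rough sketch of the SQLite approach, written in Python with the standard sqlite3 module rather than in Objective-C. The table name, column name and sample phrases are assumptions; it also assumes the typed text contains no LIKE wildcards (% or _), which a real app would need to escape.

```python
import sqlite3

def build_phrase_db(phrases):
    """Create an in-memory database holding one phrase per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phrases (phrase TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_phrases_phrase ON phrases (phrase)")
    conn.executemany("INSERT INTO phrases (phrase) VALUES (?)",
                     [(p,) for p in phrases])
    conn.commit()
    return conn

def suggest(conn, typed, limit=10):
    """Return up to `limit` phrases that start with the typed text."""
    rows = conn.execute(
        "SELECT phrase FROM phrases WHERE phrase LIKE ? || '%' "
        "ORDER BY phrase LIMIT ?",
        (typed, limit),
    )
    return [row[0] for row in rows]

# Hypothetical data standing in for the 10,000 phrases.
conn = build_phrase_db(["apple pie", "apple tart", "banana bread", "apricot jam"])
matches = suggest(conn, "ap")
```

Calling suggest on each keystroke returns only the rows needed for the visible table, instead of filtering the full list in application code.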

plist, sqlite3, xml?

I am wondering about the pros and cons to using different datasets.
I have working code that uses all three of the following datasets.
One pulls an .xml file off of my server
One accesses a copy of a SQLite3 database from within the app's bundle (it's a copy so that I can add to it, delete from it, and save changes)
One accesses data from a .plist.
My question is, now that I have some experience creating these datasets and displaying their data in an app, why/when would I use one over the other?
xml file off your server:
Pros: You can update the XML file at any time to provide new data to the user, good to send to other platforms
Cons: Requires a network connection, have to parse the XML into Obj-C objects, there's no way to modify one value in an XML file without rewriting the entire file, XML files need extra metadata for parsing into the proper Obj-C types
sqlite file within your bundle:
Pros: Good for large datasets; you can do queries, sorts and read partial data; you can rewrite or add one row at a time; good to send to other platforms
Cons: Have to convert sqlite data into Obj-C objects (I like fmdb for this), to update the data you need to submit your app to Apple and have it approved
plist:
Pros: Good for smallish datasets, easy to read plist into an Obj-C container
Cons: Bad for large datasets (more than 1000 or so items), no way to update only one value without rewriting the entire file, hard to send to other platforms, have to submit your app to Apple and have it approved
Note:
You can also put a file (any format) within your bundle and also check your server for a more recent version.

Search plist for integer

In an iOS app, I take a four digit code from the user and give them a corresponding string in a TextView. My question is, because there are about a thousand possible codes the user would enter that I am checking for, what is an efficient way to give a result without having a huge if or switch statement? Like, using a plist, txt file, or even database.... Thanks in advance
The decision of plist, text file or database is just a matter of storage, not search. Personally, I would just use a JSON file, since it's reasonably well-supported both by the human brain and by software. For searching, just put them in an NSDictionary and do a lookup on that. Unless your items are very big, 1000 items is not really a large dataset, even on a memory-constrained iPhone. Even if each item is 1 KB (which sounds a lot larger than the dataset you're describing), you're looking at less than a megabyte for the whole set.
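The "JSON file plus dictionary lookup" idea might look like this. It is sketched in Python, with a dict playing the role of NSDictionary; the codes and strings are invented, and the JSON is inlined as a string where a real app would read it from a file.

```python
import json

# Hypothetical contents of the JSON file: code -> text to display.
codes_json = '{"1234": "Front door", "5678": "Garage"}'

# Parse once at startup into a dictionary.
messages = json.loads(codes_json)

def message_for(code):
    """Return the text for a four-digit code, or None if the code is unknown."""
    return messages.get(code)
```

One dictionary lookup replaces the thousand-branch if or switch statement, and adding a code means editing the data file rather than the program.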
If the strings happen to be long, then store the long text in a file and store the file URL in your lookup table instead of the whole string. IIRC, a URL is about 100 bytes on average, and an NSNumber is about 8, so you'd be looking at about 108 KB for the entire dataset.
Given the number of possible codes I would recommend Core Data. Alternatively you can use SQLite directly. You could use a plist, but I fear it would quickly become unmanageable as you add, remove, and update codes.