NSManagedObject as store with continuous analysis of raw data - objective-c

This is similar to a question I asked before, but now that I've come much further along I still have a question about "proper" subclassing of NSManagedObject as I was told last night that it's a "bad idea" to put lots of non-persisted properties and ivars inside one. Currently I have TONS of code inside my NSManagedObject, and Apple's docs don't really address the "rightness" of that. FYI: the code works, but I'm asking if there are pitfalls ahead, or if there are obvious improvements to doing it another way.
My "object" is a continuously growing array of incoming data, the properties/ivars that track the progress of the analysis of that data, and the processed data (output). All of this is stored in memory because it grows huge, very quickly, and would not be possible to re-generate/re-analyze continuously. The NSManagedObject properties that are actually persisted are just the raw data (regularly saved, as Core Data doesn't support NSMutableData), a few basic properties and 2 relationships to other NSManagedObjects (1 being a user, the other being a set of snapshots of the data). Only one object is being recorded to at any one time, although dozens can be opened for viewing (which may involve further processing at any time).
It's not possible to have the object that inserts the entity (data manager that manages Core Data) have all of the processing logic/variables inside it, as each object necessitates at least a handful of arrays/properties that are used as intermediaries and tracking values for the analysis. And I, personally, think that it sounds silly to create two objects for each object that is being used (the NSManagedObject that is the store, and another object that is the processing/temp store).
Basically, all of the examples I can find using NSManagedObjects have super simple objects that are things like coordinates, address book entries, pictures: stuff that is basically static. In that case I can see having all of the logic that creates/modifies them outside the object. However, my case is not that simple and I have yet to come up with an alternative that doesn't involve duplication.
Any suggestions would be appreciated.

You might use a 'wrapper', that is to say a class with a reference to one of your managed object instances, this wrapper would contain your algorithms and your non persisted algorithms.

Related

Tracking model object attributes changes (dirty) in Cocoa

I'm trying to gain insight into the least overhead solution to tracking model object changes in Cocoa.
As I see it there are 3 options:
Use Core Data – lot's of functionality exists for monitoring model object changes (Core Data NSManagedObject - tracking if attribute was changed). I don't know what the overhead of Core Data's management infrastructure is compared to other approaches but it's well established architecture for multi-threading support is a plus. For cross-platform devs there is some downside in not having a readily accessible schema but there are ways around that issue.
Write custom accessors that mark the object as dirty when updating a field with a new value. I've been using this technique with mixed success for quite some time. There are some sticky issues to deal with when sharing objects across threads. You also don't get the benefits of enhancements to automatic synthesis of attributes, etc. You do, however, have greater control of your data store than when using Core Data which can be of benefit (eg. certain operations can be done in a SQL store across many objects in a much more efficient way). Note: There could be a lot of variation here depending on how you write the accessors. For the sake of conversation let's assume setters make a check of the new value against the old one, make appropriate calls to KVO (willChange / didChange), and set a boolean flag (all within synchronization of course).
Use KVO to monitor object fields (ala keyPathsForValuesAffectingValueForKey:) and mark the object as dirty in the KVO callout. I have yet to use this method but it seems like a decent approach. The obvious downside would be the callout every time a setter is called.
I am inclined to think that option 2 has the lowest overhead (in terms of raw processing requirements) given that Core Data and KVO both have some additional overhead either in the generated accessors or in the KVO callouts. The question is, how substantial is the overhead?
And lastly, did I miss an option?
Thanks.

Whether to put method code in a VB.Net data storage class, or put it in a separate class?

TLDR summary: (a) Should I include (lengthy) method code in classes which may spawn multiple objects at runtime, (b) does doing so cause memory usage bloat, (c) if so should I "outsource" the code to a class that is loaded only once and have the class methods call that, or alternatively (d) does the code get loaded only once with the object definition anyway and I'm worrying about nothing?
........
I don't know whether there's a good answer to this but if there is I haven't found it yet by searching in the usual places.
In my VB.Net (2010 if it matters) WinForms project I have about a dozen or so class objects in an object model. Some of these are pretty simple and do little more than act as data storage repositories. The ones further up the object model, however, have an increasing number of methods. There can be a significant number of higher level objects in use though the exact number will be runtime dependent so I can't be more precise than that.
As I was writing the method code for one of the top level ones I noticed that it was starting to get quite lengthy.
Memory optimisation is something of a lost art given how much memory the average PC has these days but I don't want to make my application a resource hog. So my questions for anyone who knows .Net way better than I do (of which there will be many) are:
Is the code loaded into memory with each instance of the class that's created?
Alternatively is it loaded only once with the definition of the class, and all derived objects just refer to that definition? (I'm not really sure how that could be possible given that, for example, event handlers can be assigned dynamically, but no harm asking.)
If the answer to the first one is yes, would it be more efficient to write the code in a "utility" object which is loaded only once and called from the real class' methods?
Any thoughts appreciated.
Go with whichever is going to be the easier to maintain codebase (shorter methods, etc). That is the more important cost with anything that has increasing complexity.
Memory optimization is only a problem if its a problem. 12 classes is really nothing, when you have hundreds of instances of hundreds of classes, then it may become a problem.
The short answer, it doesn't matter. Your data is stored in memory but your code is loaded only once.
EDIT: I guess I need a longer answer.
If you have 10 instances of a class, the variables that are part of that instance all take up thier own memory space. So if you have 10 properties, variables, etc, that means you have 100(ish) items in your memory. As for your code, it was loaded just once with your assembly. If you create 10 instances of your class, your code is not in memory 10 times.

Should I create subclass NSManagedObject or not?

I have spent a few days learning and writing NSCoding and finally got it working. However, it took very long to archive and unarchive the (quite complex) object graph, which is unacceptable. After searching the internet for some time, I think the better way is to use core data.
Do you recommend that 1) I should rewrite all my classes as subclasses of NSManagedObject or 2) should I create an instance variable of NSManagedObject in each of my class so that any changes to the class also updates its core data representation? Doing either way will need significant changes to the exiting classes and I think I have to update lots of unit test cases as well if it changes the way the classes are initialized.
What do you recommend? I really don't want to head to the wrong approach again...
Thanks!
I would recommend 1), if you use Core Data.
2) doesn't make much sense. For example, say A* a1 and A* a2 refer to the same B* b. If A and B are subclasses of NSManagedObject, this relationship can be easily saved to and then be retrieved from the file. But if A and B have NSManagedObject instances as ivars, how do you maintain this relationship that two As refer to one B? You will be forced to write a whole lot of glue codes, which are basically provided by the Core Data APIs.
If you decide to use Core Data, one very important advice I can give is to read Apple's documentations very, very carefully from the start to the finish, and to resist the urge to write codes from the day one. Core Data is a rather big set of API, and a good grasp of the whole structure before starting to write codes will save you many days afterwards.

Encoding Large Object Graph Using NSKeyedArchiver Eats Memory

I'm using a NSKeyedArchiver to encode a big object graph (76295 objects.)
It takes a lot of time but even worse the NSKeyedArchiver doesn't give back all its memory.
After using the Leak check, the code doesn't appear to leak at all but for some reason the encoding doesn't give back all memory after it's done.
Calling the encode method several times does make it worse, more and more memory is being eaten away.
You got any suggestions I could like at?
P.S. A database (sqlite) or CoreData aren't alternatives because they seem to scale very poorly with an big object graph like the one mentioned above.
I would prefer a solution using NSKeyedArchiver
NSKeyedArchiver doesn't actually encode objects. It simply walks the graph and calls the encode methods of each instances in the graph. Therefore, the most likely source of a leak is inside a custom coder method that you wrote to archive one of your custom classes. You might want to do test archives of individual classes to see if one of them leaks.
The problem with using NSKeyedArchiver to archive a large complex graph is that the entire graph is piled into memory at once. One leak in one classes coder can explode if a lot of instances of that class are archived. If you have 76,000+ objects and just a couple of thousand leak a few bytes each, that will add up in a hurry.
I must add that I have never encountered or even read of a situation in which Core Data did not perform better than an archive for complex graphs regardless of size. Core Data was created specifically to handle just this sort of problem.
If you've tried Core Data and it bogged down, it is probably because you used to much entity inheritance. Since Core Data uses single table inheritance on the store side in SQL, all the descendants of an entity end up in the same table and that bogs things down. Remember, entities are separate from the classes they model so you can have class inheritance without having entity inheritance. That gives you the coding advantages of inheritance without the speed penalty of entity inheritance.
Seems the memory is given back to the system slowly at a later interval.
So no real memory leak is there.
For anyone else: try look at CoreData or sqlite3 directly.
If you use sqlite3 directly make sure you encapsulate the complete set of queries in an transaction; it will dramatically increase data throughput.
More info regarding sqlite speed optimization:
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

Reading a pointer from XML without being sure the relevant Obj-C instance exists

I have a "parent" Obj-C object containing (in a collection) a bunch of objects whose instance variables point to one another, possibly circularly (fear not, no retaining going on between these "siblings"). I write the parent object to XML, which of course involves (among other things) writing out its "children", in no particular order, and due to the possible circularity, I replace these references between the children with unique IDs that each child has.
The problem is reading this XML back in... as I create one "child", I come across an ID, but there's no guarantee the object it refers to has been created yet. Since the references are possibly circular, there isn't even an order in which to read them that solves this problem.
What do I do? My current solution is to replace (in the actual instance variables) the references with strings containing the unique IDs. This is nasty, though, because to use these instance variables, instead of something like [oneObject aSibling] I now have to do something like [theParent childWithID:[oneObject aSiblingID]]. I suppose I could create an aSibling method to simplify things, but it feels like there's a cleaner way than all this. Is there?
This sounds an awful lot like you are re-inventing NSCoding as it handles circular references, etc... Now, there might be a good reason to re-invent that wheel. Only you can answer that question.
In any case, sounds like you want a two pass unarchival process.
Pass 1: Grab all the objects out of the backing store and reconstitute. As each object comes out, shove it in a dictionary or map with the UID as the key. Whenever an object contains a UID, register the object as needing to be fixed up; add it to a set or array that you keep around during unarchival.
Pass 2: Walk the set or array of objects that need to be fixed up and fix 'em up, replacing the UIDs with objects from the map you built in pass #1.
I hit a bit of parse error on that last paragraph. Assuming your classes are sensibly declared, they ought to be able to repair themselves on the fly.
(All things considered, this is exactly the kind of data structure that is much easier to implement in a GC'd environment. If you are targeting Mac OS X, not the iPhone, turning on GC is going to make your life easier, most likely)
Java's serialization process does much the same thing. Every object it writes out, it puts in a 'previously seen objects' table. When it comes to writing out a subsequent reference, if it's seen the object before, then it writes out a code which indicates that it's a previously seen object from the list. When the reverse occurs, whenever it sees such a reference, it replaces it on the fly with the instance before.
That approach means that you don't have to use this map for all instances, but rather the substitution happens only for objects you've seen a second time. However, you still need to be able to uniquely reference the first instance you've got written, whether by some pointer to a part in the data structure or not is dependent on what you're writing.