How to model an HPC queueing system with Objective-C


I am trying to write a Mac application that queries a high-performance computing cluster about its running and queued calculation jobs. The aim is to monitor submitted jobs: whether they are still queued and waiting for execution, or running, and if so on which node (host) in the cluster.
On the GUI side I would like to display an NSTableView showing all submitted jobs and, as a second option, a view of all hosts in the cluster showing how many and which jobs are running on each node.
The model objects themselves are not so hard to write. What bothers me most is the lifecycle and the ownership relations between the host and job objects. This has to be well designed, otherwise I will run into memory management problems.
Please note that I would like to program it without using CoreData if possible.
1. Possibility
The yellow queue object is the root object of my object graph and it owns all the host objects (it has an NSArray of custom host objects). Each host object owns all the job objects that are running on this host (also by having an NSArray of custom job objects). I think there are two major problems with this approach:
Where are the job objects stored that are still queued and not yet running on a host? They lack a parent host object.
How would one implement an NSTableView containing all the job objects?
2. Possibility
The yellow root object directly holds references to all job objects by storing them in an NSArray. Each job has an instance variable retaining a host object. Again, there are some problems:
I would also like the model to include hosts that are currently idle, i.e. hosts on which no job is executing.
How would one implement the data source for an NSTableView showing all the hosts?
How does one make sure that there are no duplicate host objects, so that each host in the cluster is represented by exactly one host object?
My questions are:
1. Which of the two possibilities makes the most sense? Are there alternatives?
2. Would it be better to implement this with Core Data?
3. How would one manage the object lifecycle so that there are no retain cycles or dangling pointers?
Thank you

If you're concerned about memory management, Core Data is the way to go. It's much more efficient memory-wise and it manages memory for you in the vast majority of cases.
In general you manage memory by making sure that each individual class' instances clean up after themselves. Then you put the objects in a hierarchy such that as each level deallocates it automatically cleans up the objects under it.
As to the specific structure, it depends on the logic of the situation you are modeling. If the organization logically goes:
queue{
jobs{
host
}
}
... then you should mimic that in your data structure.
I strongly recommend you use Core Data. You will end up duplicating a lot of Core Data functionality anyway if you implement this all by hand. Core Data was designed specifically to manage object graphs like this. That is its primary role. All the database stuff was tacked on as an afterthought. There's no need to reinvent the wheel.
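For readers who still want to avoid Core Data, the "each level owns the objects under it" idea could be sketched roughly as follows. This is only a sketch under assumptions: it uses ARC with weak references, and the class names HPCQueue, HPCHost, and HPCJob as well as all method names are invented for illustration. The queue is the single owner of both jobs and hosts, so queued jobs have a home, idle hosts can exist, and each host name maps to exactly one host object.

```objc
#import <Foundation/Foundation.h>

@class HPCHost;

// A job is owned by the queue. It points at its host only weakly,
// so a job and a host can never form a retain cycle.
@interface HPCJob : NSObject
@property (nonatomic, copy, readonly) NSString *jobID;
@property (nonatomic, weak) HPCHost *host;   // nil while the job is still queued
- (instancetype)initWithJobID:(NSString *)jobID;
@end

// A host is owned by the queue. It does not store its jobs;
// the queue derives them on demand.
@interface HPCHost : NSObject
@property (nonatomic, copy, readonly) NSString *name;
- (instancetype)initWithName:(NSString *)name;
@end

// Root object: the single owner of every job and every host.
@interface HPCQueue : NSObject
@property (nonatomic, readonly) NSArray<HPCJob *> *jobs;    // rows for a "jobs" table
@property (nonatomic, readonly) NSArray<HPCHost *> *hosts;  // rows for a "hosts" table
- (HPCHost *)hostNamed:(NSString *)name;   // creates the host on first use
- (HPCJob *)addJobWithID:(NSString *)jobID hostName:(NSString *)hostName;
- (NSArray<HPCJob *> *)jobsOnHost:(HPCHost *)host;
@end

@implementation HPCJob
- (instancetype)initWithJobID:(NSString *)jobID {
    if ((self = [super init])) { _jobID = [jobID copy]; }
    return self;
}
@end

@implementation HPCHost
- (instancetype)initWithName:(NSString *)name {
    if ((self = [super init])) { _name = [name copy]; }
    return self;
}
@end

@implementation HPCQueue {
    NSMutableArray<HPCJob *> *_jobs;
    NSMutableDictionary<NSString *, HPCHost *> *_hostsByName;
}
- (instancetype)init {
    if ((self = [super init])) {
        _jobs = [NSMutableArray array];
        _hostsByName = [NSMutableDictionary dictionary];
    }
    return self;
}
- (NSArray<HPCJob *> *)jobs { return [_jobs copy]; }
- (NSArray<HPCHost *> *)hosts { return _hostsByName.allValues; }

- (HPCHost *)hostNamed:(NSString *)name {
    // Looking hosts up by name guarantees one object per cluster host.
    HPCHost *host = _hostsByName[name];
    if (host == nil) {
        host = [[HPCHost alloc] initWithName:name];
        _hostsByName[name] = host;
    }
    return host;
}
- (HPCJob *)addJobWithID:(NSString *)jobID hostName:(NSString *)hostName {
    HPCJob *job = [[HPCJob alloc] initWithJobID:jobID];
    if (hostName != nil) { job.host = [self hostNamed:hostName]; }
    [_jobs addObject:job];
    return job;
}
- (NSArray<HPCJob *> *)jobsOnHost:(HPCHost *)host {
    NSPredicate *onHost = [NSPredicate predicateWithBlock:^BOOL(id obj, NSDictionary *bindings) {
        return [(HPCJob *)obj host] == host;
    }];
    return [_jobs filteredArrayUsingPredicate:onHost];
}
@end
```

A table view data source could then read its rows from `jobs` or `hosts`. Under manual retain/release the same shape would apply, with the job's host reference declared `assign` instead of `weak`.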

Related

Get value of control refnum in one step in SubVI

I'm trying to de-spaghetti a big UI by creating SubVIs that handle only the controls that are relevant, via control refnums.
Now, when extracting the code from the main VI and re-wiring into the subVIs, things get clutter-y.
To read/write these refnums, I have to do a two-step process. First add a terminal to get the control refnum value and then another to get the value of the control.
Wiring the refnums everywhere is not really an option as that will create more spaghetti if there are more than two of them. (usually 4-10)
Is there a better way?
UPDATE
Guys, this is a low-level question about the picture above, not really a question about large-scale architecture or design patterns. I'm using QMH, classes, et al. where appropriate.
I just feel there should be a way to get the typed value from a typed control ref in one step. It feels kind of common.

In the caller VI, where the controls/indicators actually live, create all your references, then bundle them into clusters of relevant pieces. Pass the clusters into your subVIs, giving each subVI only the cluster it needs. This both keeps your conpane clean and makes clear which interface each subVI talks to. Instead of a cluster, you may want to create a LV class to further encapsulate and define the sub-UI operations, but that's generally only worthwhile on larger projects where some components of the UI will be reused in other UIs.

I'm not sure there is a low-touch way to de-spaghetti a UI with lots of controls and indicators.
My suggestion is to rework the top-level VI into a queued message handler, which would allow you to decouple the user interaction from the application's response. In other words, rather than moving both the controls and the code that handles their changes to subVIs (as you're currently doing), this would keep the controls where they are (so you don't need to use ref nums and property nodes) and only move the code to subVIs.
This design pattern is built into recent versions of LabVIEW: navigate to File » Create Project to make LabVIEW generate a project you can evaluate. For more information about how to extend and customize it, see this NI slide deck: Decisions Behind the Design of the Queued Message Handler Template.

In general, reading or writing a value through a refnum is not best practice from a performance perspective. It requires a thread swap to the UI thread each time (which is a heavy process), whereas the FP terminal is privileged to update the panel without switching execution threads and without mutex friction.
Using references to access a value:
Requires the front panel item to be updated every single time the reference is used.
References pass by reference rather than by value. This means they are essentially pointers to specific memory locations. The pointers must be dereferenced, and then the value in memory updated. The dereferencing makes them slower than controls/indicators or local variables.
Property Nodes cause the front panel of a SubVI to remain in memory, which increases memory use. If the front panel of a SubVI is not displayed, remove property nodes to decrease memory use.
If, after all this, you still want to use this method, you can use VI scripting to speed up the process: http://sine.ni.com/nips/cds/view/p/lang/en/nid/209110

NSManagedObject as store with continuous analysis of raw data

This is similar to a question I asked before, but now that I've come much further along I still have a question about "proper" subclassing of NSManagedObject as I was told last night that it's a "bad idea" to put lots of non-persisted properties and ivars inside one. Currently I have TONS of code inside my NSManagedObject, and Apple's docs don't really address the "rightness" of that. FYI: the code works, but I'm asking if there are pitfalls ahead, or if there are obvious improvements to doing it another way.
My "object" is a continuously growing array of incoming data, the properties/ivars that track the progress of the analysis of that data, and the processed data (output). All of this is stored in memory because it grows huge, very quickly, and would not be possible to re-generate/re-analyze continuously. The NSManagedObject properties that are actually persisted are just the raw data (regularly saved, as Core Data doesn't support NSMutableData), a few basic properties and 2 relationships to other NSManagedObjects (1 being a user, the other being a set of snapshots of the data). Only one object is being recorded to at any one time, although dozens can be opened for viewing (which may involve further processing at any time).
It's not possible to have the object that inserts the entity (data manager that manages Core Data) have all of the processing logic/variables inside it, as each object necessitates at least a handful of arrays/properties that are used as intermediaries and tracking values for the analysis. And I, personally, think that it sounds silly to create two objects for each object that is being used (the NSManagedObject that is the store, and another object that is the processing/temp store).
Basically, all of the examples I can find using NSManagedObjects have super simple objects that are things like coordinates, address book entries, pictures: stuff that is basically static. In that case I can see having all of the logic that creates/modifies them outside the object. However, my case is not that simple and I have yet to come up with an alternative that doesn't involve duplication.
Any suggestions would be appreciated.

You might use a 'wrapper', that is to say a class that holds a reference to one of your managed object instances. This wrapper would contain your algorithms and your non-persisted properties.
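A wrapper along those lines might look like the sketch below. Everything named here is an assumption for illustration: the class name RecordingAnalyzer, its methods, and the attribute key "rawData" do not come from the question. The point is only that the transient buffer and analysis state live in the wrapper, while the managed object stays a thin persisted store.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical wrapper: holds the persisted record plus all transient analysis state.
@interface RecordingAnalyzer : NSObject
@property (nonatomic, strong, readonly) NSManagedObject *recording; // the persisted store
@property (nonatomic, readonly) NSUInteger pendingByteCount;        // bytes not yet written back
- (instancetype)initWithRecording:(NSManagedObject *)recording;
- (void)appendSamples:(NSData *)samples;  // in-memory only
- (void)flush;                            // copies the buffer into the managed object
@end

@implementation RecordingAnalyzer {
    NSMutableData *_buffer;  // non-persisted working storage
}
- (instancetype)initWithRecording:(NSManagedObject *)recording {
    if ((self = [super init])) {
        _recording = recording;
        _buffer = [NSMutableData data];
    }
    return self;
}
- (NSUInteger)pendingByteCount { return _buffer.length; }

- (void)appendSamples:(NSData *)samples {
    [_buffer appendData:samples];
    // Analysis bookkeeping (intermediate arrays, progress counters) would live here.
}
- (void)flush {
    // "rawData" is an assumed attribute name on the entity.
    NSData *existing = [self.recording valueForKey:@"rawData"] ?: [NSData data];
    NSMutableData *combined = [existing mutableCopy];
    [combined appendData:_buffer];
    [self.recording setValue:[combined copy] forKey:@"rawData"];
    _buffer.length = 0;
}
@end
```

The managed object context still has to be saved separately after `flush`; the wrapper only moves bytes from the transient buffer into the managed object.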

Is it possible to reuse an NSKeyedUnarchiver?

I'm working on an iOS game and I'm using the NSCoding protocol to save my levels in my editor and to load them in the game. I was wondering whether it is possible to somehow reuse an NSKeyedUnarchiver after it has been used to load the level, for instance when the player wants to restart the level. I can't simply create and load a new instance of the level, because I want to keep the same objects and just reset their properties.

You can reuse the data that is passed to the decoder. You cannot "reset" the existing objects to their initial state, though.
While you can do this on your own, I'd suggest just invalidating the whole tree of objects and reloading them, possibly from cached data.
That surely depends on the number of objects, but if you have enough of them for the process to be visibly slow, I believe you have lots of other more important optimisations to do.
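Reloading from cached data could be sketched like this. The Level and LevelLoader classes are hypothetical stand-ins, and the sketch assumes the secure-coding NSKeyedArchiver/NSKeyedUnarchiver class methods available on iOS 11 / macOS 10.13 and later; on older systems `unarchiveObjectWithData:` would play the same role.

```objc
#import <Foundation/Foundation.h>

// Hypothetical level class; a real one would encode its game objects.
@interface Level : NSObject <NSSecureCoding>
@property (nonatomic) NSInteger enemiesRemaining;
@end

@implementation Level
+ (BOOL)supportsSecureCoding { return YES; }
- (void)encodeWithCoder:(NSCoder *)coder {
    [coder encodeInteger:self.enemiesRemaining forKey:@"enemiesRemaining"];
}
- (instancetype)initWithCoder:(NSCoder *)coder {
    if ((self = [super init])) {
        _enemiesRemaining = [coder decodeIntegerForKey:@"enemiesRemaining"];
    }
    return self;
}
@end

// Keeps the archived bytes in memory so a restart does not need to read the file again.
@interface LevelLoader : NSObject
- (instancetype)initWithArchivedData:(NSData *)data;
- (Level *)freshLevel;  // decodes a brand-new object tree on every call
@end

@implementation LevelLoader {
    NSData *_cachedData;
}
- (instancetype)initWithArchivedData:(NSData *)data {
    if ((self = [super init])) { _cachedData = [data copy]; }
    return self;
}
- (Level *)freshLevel {
    NSError *error = nil;
    Level *level = [NSKeyedUnarchiver unarchivedObjectOfClass:[Level class]
                                                     fromData:_cachedData
                                                        error:&error];
    if (level == nil) { NSLog(@"Could not decode level: %@", error); }
    return level;
}
@end
```

On restart the game would drop its reference to the old level and call `freshLevel` again. Note that this yields new objects rather than resetting the existing ones, which is the trade-off the answer describes.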

Copying pending changes between NSManagedObjectContexts with shared persistent store?

I have two instances of NSManagedObjectContext: one is used on the main thread and the other is used on a background thread (via an NSOperation). For thread safety, these two contexts share only an NSPersistentStoreCoordinator.
The problem I'm having is that pending changes in the first context (on the main thread) are not available to the second context until a -save is performed. This is understandable, since the shared persistent store won't have copies of the NSManagedObjects tracked by -insertedObjects, -updatedObjects, and -deletedObjects until they are persisted.
Unfortunately, this presents a problem with the user experience: any unsaved changes won't appear in the (time consuming) reports that are generated in the background thread.
The only solution I can think of is nasty: take the inserted, updated and deleted objects from the first context and graft them onto the object graph of the second context. There are some pretty complex relations in the dataset, so I'm hesitant to go in this direction. I'm hoping someone here has a better solution.

If you can target 10.7, there are some solutions. One is nested managed object contexts: you can “save” in the context being modified and it won’t save all the way to disk, but it will make the changes available to other children of the master context.
Before 10.7 you will probably have to copy the changes over yourself. This isn’t super hard, since you can have a single object listen for NSManagedObjectContextObjectsDidChangeNotification and then re-apply the changes exactly as they occurred in the main context. (It should be about 20 lines of code.) You never have to save this second context, I assume?

Not sure if you have any OS constraints, but in iOS 5 / Mac OS X 10.7 you can use nested managed object contexts to accomplish this. I believe a child context is able to pull in unsaved changes from the parent simply by doing a new fetch.
Edit: Looks like Wil beat me to it, but yeah, prior to iOS 5 / Mac OS X 10.7 you'll have to listen for the NSManagedObjectContextDidSaveNotification and look at the userInfo dictionary for the added/updated/deleted objects.
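The nested-context arrangement mentioned in these answers could be set up roughly as below. This is a sketch, not a drop-in solution: the function name ReportContextForMainContext is invented, and it assumes the main context was itself created with NSMainQueueConcurrencyType, because a parent context must use one of the queue-based concurrency types.

```objc
#import <CoreData/CoreData.h>

// Returns a background context whose parent is the UI context.
// Fetches made in the child are routed through the parent, so they
// should include the parent's unsaved changes.
NSManagedObjectContext *ReportContextForMainContext(NSManagedObjectContext *mainContext) {
    NSManagedObjectContext *child =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    child.parentContext = mainContext;  // no persistent store coordinator is set on the child
    return child;
}
```

The report code would then run inside `performBlock:` on the returned context rather than touching it directly from an NSOperation's thread.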

An alternative solution might involve using a single managed object context and providing your own thread safety around access to it, or using the context's lock and unlock methods.

I would try to make the main thread do a normal save so the second context can just merge the changes into its own context. Fighting an API's intended use is never a good idea.
You could mark the newly saved record with an attribute as intermediate and delete it later if the user finally cancels the edit.
Solving those problems with attributes in your entities and querying in the background thread with a matching predicate would be easy.
And that would be a stable solution as well. I come from a database-driven world (Oracle), where we often use such patterns (status attributes in records) to make data visible or invisible to other DB sessions (which would be the equivalent of threads in a Cocoa app). It always works without problems. Other threads/sessions only ever see committed changes; that's how most RDBMSs work.

Context pattern? Why does Core Data need it?

I'm still fairly new to Core Data and am trying to understand why it requires the passing around of a NSManagedObjectContext. As I understand it, passing the context is needed so that multiple threads don't affect the same context, but I was also under the impression that this pattern is sometimes considered an antipattern, as noted here.
Could Core Data theoretically be implemented in a thread-safe way that would avoid using this pattern? How do other ORMs (such as Ruby's ActiveRecord, for example) avoid this pattern? For example, couldn't Core Data implement a per-NSManagedObject saving method such as in this extension? This light framework doesn't handle multithreading, but couldn't NSManagedObjects use some kind of internal GCD queue(s) to support it, with an internal context they don't expose?
Sorry if I'm missing anything major.

The NSManagedObjectContext is the in-memory container of your application's object graph, just as the persistent store (XML, SQLite, etc.) usually represents the on-disk container of your object graph.
There are some advantages to this approach:
Faulting can be applied to a set of objects, or in the case of CoreData the entire object graph
It's a convenient abstraction for forcing the application to batch its I/O.
It provides a single point of contact for efficiently performing operations over the entire object graph (NSFetchRequests, etc.)
Undo can be applied to the object graph, not just individual objects.
It's also important to remember that CoreData is not an ORM framework, it's an object persistence framework. The primary responsibility of CoreData is to make accessing data stored in a persistent format on disk more efficient. However it doesn't attempt to emulate the functionality of relational databases.
To your point about concurrency, new concurrency models have been introduced in the upcoming release of Mac OS X. You can read more about that at developer.apple.com.
In the abstract, though, the concurrency model chosen for a managed object context has more to do with the specifics of an individual application than with the context pattern itself. Instances of NSManagedObjectContext should generally never be shared between threads.
In the same way that each thread requires its own instance of NSAutoreleasePool, each thread should also have its own MOC. That way, when the thread is done executing, it can commit its changes to the store on disk and then release the context, freeing up all the memory consumed by objects processed on the thread.
This is a much more efficient paradigm than allowing a single context to continuously consume system resources during the lifecycle of a given application. Of course, the same effect can be achieved by invoking -reset on the context, which causes all of the NSManagedObjects in use by the context to be turned back into faults.

You need one NSManagedObjectContext per thread. So you would have one to fill your UI on the main thread and for longer operations you would have another for each background thread. If you want results to be merged in from those other threads then there is a notification you can subscribe to that provides something to quickly merge what was changed into your main MOC.
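The notification referred to here is presumably NSManagedObjectContextDidSaveNotification, merged with `mergeChangesFromContextDidSaveNotification:`. A minimal sketch of that wiring follows. The ContextMerger class and its method names are invented for illustration, and the sketch assumes the main context was created with NSMainQueueConcurrencyType so that `performBlock:` is legal on it.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper that forwards saves from background contexts to the main context.
@interface ContextMerger : NSObject
- (instancetype)initWithMainContext:(NSManagedObjectContext *)mainContext;
- (void)startObservingContext:(NSManagedObjectContext *)backgroundContext;
@end

@implementation ContextMerger {
    NSManagedObjectContext *_mainContext;
}
- (instancetype)initWithMainContext:(NSManagedObjectContext *)mainContext {
    if ((self = [super init])) { _mainContext = mainContext; }
    return self;
}
- (void)startObservingContext:(NSManagedObjectContext *)backgroundContext {
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(backgroundContextDidSave:)
                                                 name:NSManagedObjectContextDidSaveNotification
                                               object:backgroundContext];
}
- (void)backgroundContextDidSave:(NSNotification *)notification {
    // The notification is delivered on the thread that saved,
    // so hop onto the main context's own queue before merging.
    NSManagedObjectContext *mainContext = _mainContext;
    [mainContext performBlock:^{
        [mainContext mergeChangesFromContextDidSaveNotification:notification];
    }];
}
- (void)dealloc {
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}
@end
```

The merger has to be kept alive (for example by the app delegate) for as long as background contexts may save.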
