Stamping / Tagging / Branding Object Instances - .net-4.0

I have a routine which accepts an object and does some processing on it. The objects may or may not be mutable.
void CommandProcessor(ICommand command) {
// do a lot of things
}
There is a chance that the same command instance loops back into the processor. Things turn nasty when that happens. I want to detect these return visitors and prevent them from being processed. The question is: how can I do that transparently, i.e. without disturbing the objects themselves?
Here is what I tried:
Added a Boolean Visited { get; set; } property on ICommand.
I don't like this because the logic of one module shows up in another. The ShutdownCommand is concerned with shutting down, not with bookkeeping. Also, an EatIceCreamCommand may always return False in the hope of getting more. Some immutable objects have outright problems with a setter.
Privately maintained a lookup table of all processed instances. When an object comes in, first check it against the table.
I don't like this either. (1) Performance: the lookup table grows large, and we need to do a linear search to match instances. (2) We can't rely on the hash code: the object may forge a different hash code from time to time. (3) Keeping the objects in a list prevents them from being garbage collected.
I need a way to put some invisible marker on the instance (of ICommand) which only my code can see. Currently I don't discriminate between invocations; I just pray the same instances don't come back. Does anyone have a better idea for implementing this functionality?

Assuming you can't stop this from happening just logically (try to cut out the loop) I would go for a HashSet of commands that you've already seen.
Even if the objects are violating the contracts of HashCode and Equals (which I would view as a problem to start with) you can create your own IEqualityComparer<ICommand> which uses System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode to call Object.GetHashCode non-virtually. The Equals method would just test for reference identity. So your pool would contain distinct instances without caring whether or how the commands override Equals and GetHashCode.
That just leaves the problem of accumulating garbage. Assuming you don't have the option of purging the pool periodically, you could use WeakReference<T> (or the non-generic WeakReference class for .NET 4) to avoid retaining objects. You would then find all "dead" weak references every so often to prevent even accumulating those. (Your comparer would actually be an IEqualityComparer<WeakReference<T>> in this case, comparing the targets of the weak references for identity.)
It's not particularly elegant, but I'd argue that's inherent in the design - you need processing a command to change state somewhere, and an immutable object can't change state by definition, so you need the state outside the command. A hash set seems a fairly reasonable approach for that, and hopefully I've made it clear how you can avoid all three of the problems you mentioned.
EDIT: One thing I hadn't considered is that using WeakReference<T> makes it hard to remove entries - when the original value is garbage collected, you're not going to be able to find its hash code any more. You may well need to just create a new HashSet with the still-alive entries. Or use your own LRU cache, as mentioned in comments.
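As a rough Kotlin/JVM sketch of the same idea (the answer above is .NET-specific): track processed instances by reference identity using System.identityHashCode, held via WeakReference so they can still be collected. Command, SeenCommands and markSeen are illustrative names, not anything from the question.
import java.lang.ref.WeakReference

interface Command

class SeenCommands {
    // Buckets of weak references keyed by identity hash code (identity hashes can collide).
    private val seen = HashMap<Int, MutableList<WeakReference<Command>>>()

    // Returns true the first time a given instance is offered, false for return visitors.
    fun markSeen(command: Command): Boolean {
        purgeDead()
        val bucket = seen.getOrPut(System.identityHashCode(command)) { mutableListOf() }
        if (bucket.any { it.get() === command }) return false   // this exact instance was seen before
        bucket.add(WeakReference(command))
        return true
    }

    // Drop entries whose targets have already been garbage collected.
    private fun purgeDead() {
        val iterator = seen.entries.iterator()
        while (iterator.hasNext()) {
            val entry = iterator.next()
            entry.value.removeAll { it.get() == null }
            if (entry.value.isEmpty()) iterator.remove()
        }
    }
}
Purging on every call is only to keep the sketch short; in practice you would purge periodically, as the answer suggests.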

Related

Do we need to lock the immutable list in kotlin?

var list = listOf("one", "two", "three")
fun One() {
    list.forEach { result ->
        // Does something here
    }
}
fun Two() {
    list = listOf("four", "five", "six")
}
Can function One() and Two() run simultaneously? Do they need to be protected by locks?
No, you don't need to lock the variable. Even if One() is still running while you change the variable, its forEach is iterating over the first list. What could happen is that the assignment in Two() completes before forEach is called, but forEach would loop over either one list or the other and not switch between them because of the assignment.
If you had a println(result) in your forEach, your program would output either
one
two
three
or
four
five
six
depending on whether the assignment happens first or the forEach call starts first.
What will NOT happen is something like
one
two
five
six
Can function One() and Two() run simultaneously?
There are two ways that that could happen:
One of those functions could call the other.  This could happen directly (where the code represented by // Does something here in One()⁽¹⁾ explicitly calls Two()), or indirectly (it could call something else which ends up calling Two() — or maybe the list property has a custom setter which does something that calls One()).
One thread could be running One() while a different thread is running Two().  This could happen if your program launches a new thread directly, or a library or framework could do so.  For example, GUI frameworks tend to have one thread for dispatching events, and others for doing work that could take time; and web server frameworks tend to use different threads for servicing different requests.
If neither of those could apply, then there would be no opportunity for the functions to run simultaneously.
Do they need to be protected by locks?
If there's any possibility of them being run on multiple threads, then yes, they need to be protected somehow.
99.999% of the time, the code would do exactly what you'd expect; you'd either see the old list or the new one.  However, there's a tiny but non-zero chance that it would behave strangely — anything from giving slightly wrong results to crashing.  (The risk depends on things like the OS, CPU/cache topology, and how heavily loaded the system is.)
Explaining exactly why is hard, though, because at a low level the Java Virtual Machine⁽²⁾ does an awful lot of stuff that you don't see.  In particular, to improve performance it can re-order operations within certain limits, as long as the end result is the same — as seen from that thread.  Things may look very different from other threads — which can make it really hard to reason about multi-threaded code!
Let me try to describe one possible scenario…
Suppose Thread A is running One() on one CPU core, and Thread B is running Two() on another core, and that each core has its own cache memory.⁽³⁾
Thread B will create a List instance (holding references to strings from the constant pool), and assign it to the list property; both the object and the property are likely to be written to its cache first.  Those cache lines will then get flushed back to main memory — but there's no guarantee about when, nor about the order in which that happens.  Suppose the list reference gets flushed first; at that point, main memory will have the new list reference pointing to a fresh area of memory where the new object will go — but since the new object itself hasn't been flushed yet, who knows what's there now?
So if Thread A starts running One() at that precise moment, it will get the new list reference⁽⁴⁾, but when it tries to iterate through the list, it won't see the new strings.  It might see the initial (empty) state of the list object before it was constructed, or part-way through construction⁽⁵⁾.  (I don't know whether it's possible for it to see any of the values that were in those memory locations before the list was created; if so, those might represent an entirely different type of object, or even not a valid object at all, which would be likely to cause an exception or error of some kind.)
In any case, if multiple threads are involved, it's possible for one to see list holding neither the original list nor the new one.
So, if you want your code to be robust and not fail occasionally⁽⁶⁾, then you have to protect against such concurrency issues.
Using @Synchronized and @Volatile is traditional, as is using explicit locks.  (In this particular case, I think that making list volatile would fix the problem.)
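For illustration, a minimal sketch of that volatile fix, assuming the property is moved into an object so @Volatile can apply to its backing field (that restructuring is mine, not part of the question):
object ListHolder {
    @Volatile
    var list: List<String> = listOf("one", "two", "three")
}

fun one() {
    // The volatile read guarantees we see a fully constructed list, old or new.
    ListHolder.list.forEach { result ->
        println(result)
    }
}

fun two() {
    // The volatile write publishes the new list safely to other threads.
    ListHolder.list = listOf("four", "five", "six")
}
The write in two() then happens-before any later read in one() that observes it, so a reader sees either the complete old list or the complete new one.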
But those low-level constructs are fiddly and hard to use well; luckily, in many situations there are better options.  The example in this question has been simplified too much to judge what might work well (that's the down-side of minimal examples!), but work queues, actors, executors, latches, semaphores, and of course Kotlin's coroutines are all useful abstractions for handling concurrency more safely.
Ultimately, concurrency is a hard topic, with a lot of gotchas and things that don't behave as you'd expect.
There are many sources of further information, such as:
These other questions cover some of the issues.
Chapter 17: Threads And Locks from the Java Language Specification is the ultimate reference on how the JVM behaves.  In particular, it describes what's needed to ensure a happens-before relationship that will ensure full visibility.
Oracle has a tutorial on concurrency in Java; much of this applies to Kotlin too.
The java.util.concurrent package has many useful classes, and its summary discusses some of these issues.
Concurrent Programming In Java: Design Principles And Patterns by Doug Lea was at one time the best guide to handling concurrency, and these excerpts discuss the Java memory model.
Wikipedia also covers the Java memory model.
(1) According to Kotlin coding conventions, function names should start with a lower-case letter; that makes them easier to distinguish from class/object names.
(2) In this answer I'm assuming Kotlin/JVM.  Similar risks likely apply to other platforms too, though the details differ.
(3) This is of course a simplification; there may be multiple levels of caching, some of which may be shared between cores/processors; and some systems have hardware which tries to ensure that the caches are consistent…
(4) References themselves are atomic, so a thread will either see the old reference or the new one — it can't see a bit-pattern comprising parts of the old and new ones, pointing somewhere completely random.  So that's one problem we don't have!
(5) Although the reference is immutable, the object gets mutated during construction, so it might be in an inconsistent state.
(6) And the more heavily loaded your system is, the more likely it is for concurrency issues to occur, which means that things will probably fail at the worst possible time!

Manipulating Objects in Methods instead of returning new Objects?

Let's say I have a method that populates a list with some kind of objects. What are the advantages and disadvantages of the following method designs?
void populate (ArrayList<String> list, other parameters ...)
ArrayList<String> populate(other parameters ...)
Which one I should prefer?
This looks like a general issue about method design but I couldn't find a satisfying answer on google, probably for not using the right keywords.
The second one seems more functional and thread safe to me. I'd prefer it in most cases. (Like every rule, there are exceptions.)
The owner of the populate method could return an immutable List (why ArrayList?).
It's also thread safe if there is no state modified in the populate method. Only passed in parameters are used, and these can also be immutable.
Other than what @duffymo mentioned, the second one is easier to understand, and thus to use: it is obvious what its input and output are.
Advantages to the in-out parameter:
You don't have to create as many objects. In languages like C or C++, where allocation and deallocation can be expensive, that can be a plus. In Java/C#, not so much -- GC makes allocation cheap and deallocation all but invisible, so creating objects isn't as big a deal. (You still shouldn't create them willy-nilly, but if you need one, the overhead isn't as bad as in some manual-allocation languages.)
You get to specify the type of the list. That's a potential plus if you need to pass that list to some other code you don't control later.
Disadvantages:
Readability issues.
In almost all languages that support function arguments, the first case is assumed to mean "do something with the entries in this list". Modifying args violates the Principle of Least Astonishment. The second is assumed to mean "give me a list of stuff", which is what you're after.
Every time you say "ArrayList", or even "List", you take away a bit of flexibility. You add some overhead to your API. What if I don't want to create an ArrayList before calling your method? I shouldn't have to, if the method's whole purpose in life is to return me some entries. That's the API's job.
Encapsulation issues:
The method being passed a list to fill can't assume anything about that list (even that it's a list at all; it could be null).
The method passing the list can't guarantee anything about what the method does with it. If it's working correctly, sure, the API docs can say "this method won't destroy existing entries". But considering the chance of bugs, that may not be worth trusting. At least if the method returns its own list, the caller doesn't have to worry about what was in it before. And it doesn't have to worry about a bug from a thousand miles away corrupting data it should never have affected.
Thread safety issues.
The list could be locked by another thread, meaning if we try and lock on it now it could potentially lock up the app.
Or, if not locked, it could still be modified by another thread, in which case we're no less screwed. Unless you're going to write extra code to handle concurrent-modification exceptions everywhere.
Returning a new list means every call to the method can have its own list. No thread can mess with another thread's return value, unless the class is very badly designed.
Side point: Being able to specify the type of the list often leads to dependencies on the type of the list. Notice how you're passing ArrayLists around everywhere. You're painting yourself into corners by saying "This is an ArrayList" when you don't need to, but when you're passing it to a dozen methods, that's a dozen methods you'll have to change. (Not entirely related, but only slightly tangential. You could change the types to List rather than ArrayList and get rid of this. But the more you're passing that list around, the more places you'll need to change.)
Short version: Unless you have a damn good reason, use the first syntax only if you're using the existing contents of the list in your method, i.e. if you're modifying it or doing something with the existing values. If you intend to return a list of entries, then return a List of entries.
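To make the contrast concrete, a minimal sketch of the two designs (written in Kotlin rather than Java, for consistency with the earlier example; the names and the prefix parameter are purely illustrative):
// Style 1: the caller supplies the list; only sensible when its existing contents matter.
fun populateInto(list: MutableList<String>, prefix: String) {
    list.add("$prefix-extra")              // mutates the caller's list in place
}

// Style 2: the method owns the result and hands back a read-only list.
fun populate(prefix: String): List<String> =
    listOf("$prefix-one", "$prefix-two", "$prefix-three")

fun main() {
    val mine = mutableListOf("existing")
    populateInto(mine, "a")                // mine is now [existing, a-extra]
    val fresh = populate("b")              // fresh is [b-one, b-two, b-three]
    println(mine)
    println(fresh)
}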
The second method is the preferred way for many reasons.
Primarily, the function signature is clearer and shows what its intentions are.
It is actually recommended that you NEVER change the value of a parameter that is passed in to a function unless you explicitly mark it as an "out" parameter.
It will also be easier to use in expressions.
And it will be easier to change in the future, including moving to a more functional approach (for threading, etc.) if you would like to.

When writing a game, should you make objects/enemies/etc. have unique ID numbers?

I have recently encountered some issues with merely passing references to objects/enemies in a game I am making, and am wondering if I am using the wrong approach.
The main issue I have is disposing of enemies and objects, when other enemies or players may still have links to them.
For example, if you have a Rabbit and a Wolf, the Wolf may have selected the Rabbit to be its target. What I am doing is that the Wolf has a GameObject Target = null; field, and when it decides it is hungry, the Target becomes the Rabbit.
If the Rabbit then dies, such as another wolf killing it, it cannot be removed from the game properly because this wolf still has a reference to it.
In addition, if you are using a decoupled approach, the rabbit could be hit by lightning, reducing its health to below zero. When it next updates itself, it realises it has died, and is removed from the game... but there is no way to update everything that is interested in it.
If you gave every enemy a unique ID, you could simply use references to that instead, and use a central lookup class that handled it. If the monster died, the lookup class could remove it from its own index, and subsequently anything trying to access it would be informed that it's dead, and then they could act accordingly.
Any thoughts on this?
One possible approach is to have objects register an interest with the object they're tracking. So the tracked object can inform the trackers of state changes dynamically. e.g. the Wolf registers with the Rabbit (that has a list of interested parties), and those parties are notified by the Rabbit whenever there's a state change.
This approach means that each object knows about its clients and that state is directly tied to that object (and not in some third-party manager class).
This is essentially the Observer pattern.
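A rough Kotlin sketch of that observer approach (the class and method names here are illustrative, not from the question):
interface DeathObserver {
    fun onDied(obj: GameObject)
}

open class GameObject(val name: String) {
    private val observers = mutableListOf<DeathObserver>()

    fun addObserver(o: DeathObserver) { observers.add(o) }
    fun removeObserver(o: DeathObserver) { observers.remove(o) }

    fun die() {
        // Notify every interested party so it can drop its reference to this object.
        observers.toList().forEach { it.onDied(this) }
        observers.clear()
    }
}

class Wolf(name: String) : GameObject(name), DeathObserver {
    var target: GameObject? = null
        private set

    fun selectTarget(prey: GameObject) {
        target = prey
        prey.addObserver(this)             // register interest in the prey's state changes
    }

    override fun onDied(obj: GameObject) {
        if (target === obj) target = null  // clear the now-stale reference
    }
}

fun main() {
    val rabbit = GameObject("Rabbit")
    val wolf = Wolf("Wolf")
    wolf.selectTarget(rabbit)
    rabbit.die()
    println(wolf.target)                   // null: the wolf has forgotten its dead target
}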
Your approach sounds reasonable, why not? Registering all your objects in a hash map shouldn't be too expensive. You could then have a sort of event bus where objects could register for different events.
Other than that, there is another approach coming to my mind. You could have the rabbit expose the event directly and have the wolf register on it.
The second approach is appealing for its simplicity; however, it will to some extent couple the event publishers to the subscribers. The first approach is technically more complex but has the benefit of allowing other kinds of lookups too.
In practice I hardly ever find situations where I ever need to hold a reference or pointer to game objects from other game objects. There are a few however, such as the targeting example you give, and in those situations that's where unique ID numbers work great.
I suppose you could use the observer pattern for such things to ensure that references get cleared when necessary, but I think that will start to get messy if you need more than 1 reference per object, for example. You might have a target gameobject, you might have gameobjects in your current group, you might be following a gameobject, talking to one, fighting one, etc. This probably means your observing object needs to have a monolithic clean-up function that checks all the outgoing object references and resets them.
I personally think it's easier just to use an ID and validate the object's continued existence at the point of use, although the price is a bit of boilerplate code to do that and the performance cost of the lookup each time.
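For comparison, a minimal sketch of the ID-plus-lookup approach from this answer (Enemy, EnemyRegistry and the method names are made up for illustration):
class Enemy(val name: String)

object EnemyRegistry {
    private var nextId = 1
    private val enemies = HashMap<Int, Enemy>()

    fun register(enemy: Enemy): Int {
        val id = nextId++
        enemies[id] = enemy
        return id
    }

    fun remove(id: Int) { enemies.remove(id) }

    // Returns null once the enemy has been removed (e.g. it died); callers must handle that.
    fun find(id: Int): Enemy? = enemies[id]
}

fun main() {
    val rabbitId = EnemyRegistry.register(Enemy("Rabbit"))
    val target: Int = rabbitId                 // the wolf stores only the ID
    EnemyRegistry.remove(rabbitId)             // the rabbit dies elsewhere
    val prey = EnemyRegistry.find(target)      // validate at the point of use
    println(prey?.name ?: "target is gone, pick a new one")
}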
References only work while design stays monolithic.
First, passing references to other modules (notably, scripting) leads to security and technical problems.
Second, if you want to extend an existing object by implementing some behavior and related properties in a new module, you won't have a single reference for all occasions.

Passing object references needlessly through a middleman

I often find myself needing a reference to an object that is several objects away, or so it seems. The options I see are passing a reference through a middleman or just making something available statically. I understand the danger of global scope, but passing a reference through an object that does nothing with it feels ridiculous. I'm okay with a little bit of passing around, I suppose. I suspect there's a line to be drawn somewhere.
Does anyone have insight on where to draw this line?
Or a good way to deal with the problem of distributing references amongst dependent objects?
Use the Law of Demeter (with moderation and good taste, not dogmatically). If you're coding a.b.c.d.e, something IS wrong -- you've nailed forevermore the implementation of a to have a b which has a c which... EEP!-) One or at the most two dots is the maximum you should be using. But the alternative is NOT to plump things into globals (and ensure thread-unsafe, buggy, hard-to-maintain code!), it is to have each object "surface" those characteristics it is designed to maintain as part of its interface to clients going forward, instead of just letting poor clients go through such unending chains of nested refs!
This smells of an abstraction that may need some improvement. You seem to be violating the Law of Demeter.
In some cases a global isn't too bad.
Consider, you're probably programming against an operating system's API. That's full of globals, you can probably access a file or the registry, write to the console. Look up a window handle. You can do loads of stuff to access state that is global across the whole computer, or even across the internet... and you don't have to pass a single reference to your class to access it. All this stuff is global if you access the OS's API.
So, when you consider the number of global things that often exist, a global in your own program probably isn't as bad as many people try and make out and scream about.
However, if you want to have very nice OO code that is all unit testable, I suppose you should be writing wrapper classes around any access to globals, whether they come from the OS or are declared by you, to encapsulate them. This means your class that uses this global state can get references to the wrappers, and they could be replaced with fakes.
Hmm, anyway. I'm not quite sure what advice I'm trying to give here, other than say, structuring code is all a balance! And, how to do it for your particular problem depends on your preferences, preferences of people who will use the code, how you're feeling on the day on the academic to pragmatic scale, how big the code base is, how safety critical the system is and how far off the deadline for completion is.
I believe your question is revealing something about your classes. Maybe the responsibilities could be improved? Maybe moving some code would solve problems?
Tell, don't ask.
That's how it was explained to me. There is a natural tendency to call classes to obtain some data. Taken too far, asking too much, typically leads to heavy "getter sequences". But there is another way. I must admit it is not easy to find, but improves gradually in a specific code and in the coder's habits.
Class A wants to perform a calculation and asks for B's data. Sometimes it is more appropriate for A to tell B to do the job, possibly passing some parameters. This could replace B's "getName()", used by A to check the validity of the name, with an "isValid()" method on B.
"Asking" has been replaced by "telling" (calling a method that executes the computation).
For me, this is the question I ask myself when I find too many getter calls. Gradually, the methods find their place in the correct object, and everything gets a bit simpler; I have fewer getters and fewer calls to them. I have less code, and it carries more meaning, with a better alignment to the functional requirements.
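A tiny Kotlin sketch of that getName()/isValid() example (the validity rule itself is invented just for illustration):
// "Ask": the caller pulls the data out and applies the rule itself.
class AskedPerson(val name: String)

fun isNameValid(p: AskedPerson): Boolean =
    p.name.isNotBlank() && p.name.length <= 50

// "Tell": the rule lives on the object that owns the data.
class Person(private val name: String) {
    fun isValid(): Boolean = name.isNotBlank() && name.length <= 50
}

fun main() {
    println(isNameValid(AskedPerson("")))  // the caller reaches into the data
    println(Person("Alice").isValid())     // the caller just tells the object to check itself
}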
Move the data around
There are other cases where I move some data. For example, if a field moves two objects up, the length of the "getter chain" is reduced by two.
I believe nobody can find the correct model at first.
I first think about it (using hand-written diagrams is quick and a big help), then code it, then think again facing the real thing... Then I code the rest, and any smells I feel in the code, I think again...
Split and merge objects
If a method on A needs data from C, with B as a middleman, I can check whether A and C have something in common. Possibly, A or a part of A could become part of C (a possible splitting of A, or merging of A and C)...
However, there are cases where I keep the getters of course.
But it's less likely a long chain will be created.
A long chain will probably get broken by one of the techniques above.
I have three patterns for this:
Pass the necessary reference to the object's constructor -- the reference can then be stored as a data member of the object, and doesn't need to be passed again; this implies that the object's factory has the necessary reference. For example, when I'm creating a DOM, I pass the element name to the DOM node when I construct the DOM node.
Let things remember their parent, and get references to properties via their parent; this implies that the parent or ancestor has the necessary property. For example, when I'm creating a DOM, there are various things which are stored as properties of the top-level DomDocument ancestor, and its child nodes can access those properties via the reference which each one has to its parent.
Put all the different things which are passed around as references into a single class, and then pass around just that one class instance as the only thing that's passed around. For example, there are many properties required to render a DOM (e.g. the GDI graphics handle, the viewport coordinates, callback events, etc.) ... I put all of these things into a single 'Context' instance which is passed as the only parameter to the methods of the DOM nodes to be rendered, and each method can get whichever properties it needs out of that context parameter.
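A rough Kotlin sketch of that third pattern, the single 'Context' object (the field names are placeholders for whatever rendering state you actually need):
class RenderContext(
    val graphics: Any,                      // stand-in for a graphics handle
    val viewportWidth: Int,
    val viewportHeight: Int,
    val onEvent: (String) -> Unit           // stand-in for callback events
)

abstract class DomNode {
    // Every node receives the one context parameter instead of many separate references.
    abstract fun render(ctx: RenderContext)
}

class TextNode(private val text: String) : DomNode() {
    override fun render(ctx: RenderContext) {
        // Each method pulls only what it needs out of the context.
        ctx.onEvent("rendering '$text' at ${ctx.viewportWidth}x${ctx.viewportHeight}")
    }
}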

Reading a pointer from XML without being sure the relevant Obj-C instance exists

I have a "parent" Obj-C object containing (in a collection) a bunch of objects whose instance variables point to one another, possibly circularly (fear not, no retaining going on between these "siblings"). I write the parent object to XML, which of course involves (among other things) writing out its "children", in no particular order, and due to the possible circularity, I replace these references between the children with unique IDs that each child has.
The problem is reading this XML back in... as I create one "child", I come across an ID, but there's no guarantee the object it refers to has been created yet. Since the references are possibly circular, there isn't even an order in which to read them that solves this problem.
What do I do? My current solution is to replace (in the actual instance variables) the references with strings containing the unique IDs. This is nasty, though, because to use these instance variables, instead of something like [oneObject aSibling] I now have to do something like [theParent childWithID:[oneObject aSiblingID]]. I suppose I could create an aSibling method to simplify things, but it feels like there's a cleaner way than all this. Is there?
This sounds an awful lot like you are re-inventing NSCoding as it handles circular references, etc... Now, there might be a good reason to re-invent that wheel. Only you can answer that question.
In any case, it sounds like you want a two-pass unarchival process.
Pass 1: Grab all the objects out of the backing store and reconstitute. As each object comes out, shove it in a dictionary or map with the UID as the key. Whenever an object contains a UID, register the object as needing to be fixed up; add it to a set or array that you keep around during unarchival.
Pass 2: Walk the set or array of objects that need to be fixed up and fix 'em up, replacing the UIDs with objects from the map you built in pass #1.
I hit a bit of a parse error on that last paragraph. Assuming your classes are sensibly declared, they ought to be able to repair themselves on the fly.
(All things considered, this is exactly the kind of data structure that is much easier to implement in a GC'd environment. If you are targeting Mac OS X, not the iPhone, turning on GC is going to make your life easier, most likely)
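For what it's worth, a very rough sketch of that two-pass unarchival, written in Kotlin rather than Objective-C for consistency with the earlier examples (Child, siblingUid and the record format are purely illustrative stand-ins for the objects in the question):
class Child(val uid: String, val siblingUid: String?) {
    var sibling: Child? = null              // fixed up in pass 2
}

// records: pairs of (uid, uid-of-sibling-or-null) as read from the XML
fun unarchive(records: List<Pair<String, String?>>): List<Child> {
    // Pass 1: reconstitute every object and index it by UID; note which ones need fixing up.
    val byUid = HashMap<String, Child>()
    val needsFixup = mutableListOf<Child>()
    for ((uid, siblingUid) in records) {
        val child = Child(uid, siblingUid)
        byUid[uid] = child
        if (siblingUid != null) needsFixup.add(child)
    }
    // Pass 2: replace the stored UIDs with real references; circular links are fine now.
    for (child in needsFixup) {
        child.sibling = child.siblingUid?.let { byUid[it] }
    }
    return byUid.values.toList()
}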
Java's serialization process does much the same thing. Every object it writes out, it puts in a 'previously seen objects' table. When it comes to writing out a subsequent reference, if it has seen the object before, it writes out a code indicating that this is a previously seen object from the list. When reading back in, whenever it sees such a reference, it replaces it on the fly with the instance it saw before.
That approach means you don't have to use this map for all instances; rather, the substitution happens only for objects you've seen a second time. However, you still need to be able to uniquely reference the first instance you wrote; whether that's by some pointer to a part of the data structure or not depends on what you're writing.