Difference of principles behind CopyOnWriteArrayList and ConcurrentHashMap

In the Java concurrent collections API we have CopyOnWriteArrayList and ConcurrentHashMap, yet the underlying principles of these data structures are different. For example, ConcurrentHashMap only locks the segment of the map on which a write operation is happening; this is how it prevents synchronization problems without hurting performance.
CopyOnWriteArrayList, on the other hand, prevents concurrency problems by making a duplicate of the original list. Why are these implementations so different? Is Java just testing to see which one works better?

A concurrent data structure is designed to ensure that any sequence of individual operations on the data structure will always leave it in a state which is consistent from its own point of view, but a "snapshot" formed by reading the data structure piece by piece may not represent any state the data structure ever actually held. For example, if "Zachary" is renamed to "Adam" while one is reading out a collection of users, the renamed user might get read out as "Adam", "Zachary", both, or neither. Even though the collection was never in a state where the user existed under both names, or under neither, the enumeration might make it look like it did.
A copy-on-write collection is designed to let one take a snapshot of the collection's entire state and guarantee that there was some moment in time when the collection actually had that state. The result of every action, including snapshot requests, should be consistent with each action having been performed at some discrete moment in time between when the request was issued and when it was reported to have completed. If two requests are given before either completes, the selection of which action precedes the other is arbitrary, but there must be a globally-consistent ordering.
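The snapshot guarantee is easy to demonstrate: an iterator over a CopyOnWriteArrayList keeps reading the backing array that existed when the iterator was created, no matter what writers do afterwards. A minimal sketch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotDemo {
    // Returns what an iterator created BEFORE the mutations observes.
    static List<String> snapshotSeenDuringRename() {
        CopyOnWriteArrayList<String> users = new CopyOnWriteArrayList<>();
        users.add("Adam");
        users.add("Zachary");

        // The iterator captures the backing array at this instant.
        Iterator<String> it = users.iterator();

        // A concurrent "rename": writers work on a fresh copy of the array,
        // so the iterator above keeps reading the old one.
        users.remove("Zachary");
        users.add("Aaron");

        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen; // still the pre-rename state: [Adam, Zachary]
    }

    public static void main(String[] args) {
        System.out.println(snapshotSeenDuringRename());
    }
}
```

An iterator over a ConcurrentHashMap's views, by contrast, is only weakly consistent: it never throws ConcurrentModificationException, but it may or may not reflect concurrent updates, which is exactly the piece-by-piece behavior described above.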

Related

Finite State Machine Implementation in an Entity Component System

I would like to use a Finite State Machine to handle Entity states in my game. Specifically, for the purpose of this post, I'm going to refer to the Player entity.
My Player is going to have states such as idling, running, jumping, falling, etc... and needs some way to manage these states and the transitions between them. In an OOP environment, the simplest solution was to make each state its own class, and have a method called handleInput take in input and determine if a state change should occur. For example, in the IdleState if either move_right or move_left occurred, the state would change to a new RunningState. This is easy and makes sense because the behavior of a state should be encapsulated in the state.
However, everything changes when you use a FSM in an entity component system. States are no longer objects (because that would go against the flexibility of a component system), but are instead different arrangements of components. The JumpState might have components like JumpComponent, AirbornMovementComponent, etc., whereas the AttackState might have components that represent an attack, like SwingComponent, DamageComponent, SwordComponent, etc. The idea is that by rearranging the components, new states can be created. The systems' job is simply to handle these components separately, because the systems don't care about the state; they only care about the individual components. The actual FSM sits in a FSMComponent held by the entity.
This makes a lot of sense, except for when it comes to handling state transitions. Right now I have an InputSystem that looks for Entities that have an InputComponent and a FSMComponent and attempts to update the state of the FSM based on the current input. However, this doesn't work so well.
The best way (in my opinion) for the FSM to handle input is to have each state determine how it wants to handle input and how to transition to a new state based on that input. This goes back to the OOP way of implementing a FSM, going against the design of an ECS where components are simply data bags and systems do all the logic. In an ECS, the idea would be to have a system handle state transitions, but that gets complicated because every FSM might have different conditions to transition between states.
You can't simply state in the InputSystem "if the input is to move right, then set the state to running". That would be specific to the player, but may not hold true for ALL entities. If one day I decide to make an enemy controllable, the inputs that work for a Player wouldn't be the same inputs for an Enemy.
My question: How can I let my FSM be generic enough and flexible enough in an ECS to allow for various implementations of state transitions without having to do explicit if/else checks in the systems themselves?
Am I approaching this completely the wrong way? If so, what's a better solution for implementing a FSM in an entity component system?
Just riffing off of #fallaciousreasoning's post (and its subsequent comments).
Ash actually has two FSMs that operate at different levels.
First, there is the FSM operating at the Entity level that manages (entity) state transitions by changing the composition of components on an entity as you transition from one state to another.
And secondly there is the FSM operating at the Engine level that manages (engine) state transitions by changing the composition of Systems executed by the engine.
Combined they make for a pretty robust FSM.
The trick is to determine what type of transition you need to work with; one where 'data' composition drives transitions, or one where 'logic' composition drives transitions.
So, armed with this newfound knowledge, let's reason about how this would work.
In more naive approaches to changing component composition, we'd use a number of 'tag' components, along with some lengthy switch statements (or if/else checks) to handle these changes in entity state, ultimately resulting in bloated Systems that do way more than they should. Ash's Entity FSM refines this and avoids those lengthy switch statements (or if/else clauses) by mapping a given component configuration to an identifier and providing a manager that can be used to trigger state transitions. This manager instance can be passed around as a property within a component OR it can be composed/injected as a member of a system.
Alternatively, taking the Engine FSM approach, we break each bit of state-specific logic out into its own System, and swap Systems in and out based on the given (engine) state. This method is not without its drawbacks, since the absence of a system will affect all entities associated with it. However, it's not uncommon to have Systems dedicated to a single entity instance (for example, a player character), so this can prove useful in the right context. Come to think of it, affecting entities globally via System swaps may be desirable in some cases as well.
(Note: if the scope of your logic modification needs to be narrower, you can restrict it to the System without involving the Engine FSM. This can be achieved by implementing the State pattern within a System, where the system changes its behavior by delegating to different sub-systems based on state.)
I've seen some ECS frameworks label System compositions as 'Phases', but they fundamentally act as FSMs where the ECS engine processes entities using different sets of systems associated with a given phase.
In closing, data composition is just one half of the equation; how you compose your bits (or blocks) of logic is just as important when trying to implement a FSM within an ECS.
The Ash framework has quite an interesting approach; the author writes about it here.
He takes a similar approach to the one you were suggesting, arguing that an entity's state is dependent on the components that make it up (which in turn, determine what systems process the entity).
To make managing the state transitions easier, he introduces a FiniteStateMachine class, which tracks what components are required for an entity to be in various states.
For example, an enemy AI might be in a patrol state if it has a Transform and a Patrol component, so he would register that information with the FiniteStateMachine:
var fsm = new FiniteStateMachine(enemyEntity);
fsm.CreateState("patrol")
.WithComponent(new Transform())
.WithComponent(new Patrol());
When the Finite State machine is told to change the state of the entity, it removes and adds components to get to the desired state. Any system that needs to change the state of the entity just needs a reference to the finite state machine (something like fsm.ChangeState("patrol")).
He's also released the source code for it, which you can find here (it also explains it better than I ever could) and there's a practical example in the basic asteroids game here. The code is in ActionScript (I think) but you should be able to decipher it without too much difficulty.
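The core of that state-as-component-bag idea fits in very little code. Below is a hypothetical Java sketch (the names are mine, not Ash's actual API, and the "entity" is reduced to a map from component type to instance): each named state is a bag of components, and changing state removes the outgoing state's components and adds the incoming state's.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an Ash-style entity state machine. A state is a
// bag of components; transitions swap those bags on the entity.
public class EntityStateMachine {
    // Component type -> instance; stands in for the entity's component set.
    private final Map<Class<?>, Object> entity = new HashMap<>();
    private final Map<String, Map<Class<?>, Object>> states = new HashMap<>();
    private String current;

    // Register a named state with the components that define it.
    public EntityStateMachine state(String name, Object... components) {
        Map<Class<?>, Object> bag = new HashMap<>();
        for (Object c : components) {
            bag.put(c.getClass(), c);
        }
        states.put(name, bag);
        return this; // fluent, like fsm.CreateState(...).WithComponent(...)
    }

    public void changeState(String name) {
        if (current != null) {
            // Remove the components belonging to the outgoing state.
            states.get(current).keySet().forEach(entity::remove);
        }
        entity.putAll(states.get(name)); // add the incoming state's components
        current = name;
    }

    public boolean has(Class<?> componentType) {
        return entity.containsKey(componentType);
    }
}
```

Systems never consult `current`; they keep matching on components alone, which is what keeps the ECS honest. Only the code that triggers transitions (an input handler, an AI script) needs the reference to the state machine.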

When is it useful to have broadcast data in deserialized form?

Reading the docs for Spark I see
The data broadcasted this way is cached in serialized form and deserialized before running each task. This means that explicitly creating broadcast variables is only useful when tasks across multiple stages need the same data or when caching the data in deserialized form is important.
I understand why broadcast variables are useful when re-using them in multiple tasks: you don't want to re-send them with all closures.
However, the second part, in bold, says caching the data in deserialized form can be important. When and why would that be important? If you're only going to use the data in one task, it will still get serialized/deserialized just once, no?
I think you ignored the following part:
and deserialized before running each task.
A single stage typically consists of multiple tasks (it is not common to have only a single partition, is it?), and multiple tasks belonging to the same stage can be processed by the same executor. Since deserialization can be quite expensive, you may prefer to perform it only once per executor rather than once per task.
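The cost model is easy to see with a toy sketch (this is not Spark's API, just an illustration of the caching pattern): an executor that keeps the broadcast value only in serialized form pays the deserialization cost on every task, while one that caches the deserialized object pays it once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy illustration (not Spark's API): a broadcast value shipped to an
// executor as bytes, deserialized lazily and then reused by every task.
public class BroadcastCache {
    private final byte[] serialized;           // what was shipped over the wire
    private Map<String, String> deserialized;  // built once, then shared
    private final AtomicInteger deserializations = new AtomicInteger();

    public BroadcastCache(byte[] serialized) {
        this.serialized = serialized;
    }

    // Pretend deserializer: "k=v;k=v" -> Map. Real deserialization of a
    // large lookup table is far more expensive, which is the whole point.
    public synchronized Map<String, String> value() {
        if (deserialized == null) {
            deserializations.incrementAndGet();
            Map<String, String> m = new HashMap<>();
            for (String pair : new String(serialized).split(";")) {
                String[] kv = pair.split("=");
                m.put(kv[0], kv[1]);
            }
            deserialized = m;
        }
        return deserialized;
    }

    public int deserializationCount() {
        return deserializations.get();
    }
}
```

If five tasks on the same executor each call `value()`, only the first triggers the parse; without the cache field, all five would pay it.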

Stamping / Tagging / Branding Object Instances

I have a routine which accepts an object and does some processing on it. The objects may or may not be mutable.
void CommandProcessor(ICommand command) {
// do a lot of things
}
There is a probability that the same command instance loops back into the processor. Things turn nasty when that happens. I want to detect these return visitors and prevent them from being processed. The question is how I can do that transparently, i.e. without disturbing the objects themselves.
Here is what I tried:
Added a property bool Visited { get; set; } on the ICommand.
I don't like this because the logic of one module shows up in another. The ShutdownCommand is concerned with shutting down, not with the bookkeeping. Also, an EatIceCreamCommand may always return false in the hope of getting more. Some immutable objects have outright problems with a setter.
Privately maintain a lookup table of all processed instances; when an object comes in, first check it against the list.
I don't like this either. (1) Performance: the lookup table grows large, and we need to do a linear search to match instances. (2) We can't rely on the hash code: the object may forge a different hash code from time to time. (3) Keeping the objects in a list prevents them from being garbage collected.
I need a way to put some invisible marker on the instance (of ICommand) which only my code can see. Currently I don't discriminate between the invocations; I just pray the same instances don't come back. Does anyone have a better idea to implement this functionality?
Assuming you can't stop this from happening just logically (try to cut out the loop) I would go for a HashSet of commands that you've already seen.
Even if the objects are violating the contracts of HashCode and Equals (which I would view as a problem to start with) you can create your own IEqualityComparer<ICommand> which uses System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode to call Object.GetHashCode non-virtually. The Equals method would just test for reference identity. So your pool would contain distinct instances without caring whether or how the commands override Equals and GetHashCode.
That just leaves the problem of accumulating garbage. Assuming you don't have the option of purging the pool periodically, you could use WeakReference<T> (or the non-generic WeakReference class for .NET 4) to avoid retaining objects. You would then find all "dead" weak references every so often to prevent even accumulating those. (Your comparer would actually be an IEqualityComparer<WeakReference<T>> in this case, comparing the targets of the weak references for identity.)
It's not particularly elegant, but I'd argue that's inherent in the design - you need processing a command to change state somewhere, and an immutable object can't change state by definition, so you need the state outside the command. A hash set seems a fairly reasonable approach for that, and hopefully I've made it clear how you can avoid all three of the problems you mentioned.
EDIT: One thing I hadn't considered is that using WeakReference<T> makes it hard to remove entries - when the original value is garbage collected, you're not going to be able to find its hash code any more. You may well need to just create a new HashSet with the still-alive entries. Or use your own LRU cache, as mentioned in comments.
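The answer above is C#; the same identity-based "seen" set is available off the shelf in Java, where IdentityHashMap compares keys with == and hashes with System.identityHashCode (the analogue of RuntimeHelpers.GetHashCode), so a command cannot forge its way past the check by overriding equals/hashCode. A sketch, with ICommand as a hypothetical marker interface:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical marker interface standing in for the question's ICommand.
interface ICommand {}

class CommandProcessor {
    // Identity semantics: ignores any overridden equals/hashCode on the
    // commands, so instances are tracked by reference only.
    private final Set<ICommand> seen =
            Collections.newSetFromMap(new IdentityHashMap<>());

    // Returns false (and skips processing) for a return visitor.
    public boolean process(ICommand command) {
        if (!seen.add(command)) {
            return false; // already processed this exact instance
        }
        // ... do a lot of things ...
        return true;
    }
}
```

Like the plain HashSet approach, this set retains strong references; combining identity semantics with weak references (so processed commands can still be collected) requires the extra machinery described above, as Java's WeakHashMap uses equals rather than ==.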

NSManagedObject as store with continuous analysis of raw data

This is similar to a question I asked before, but now that I've come much further along I still have a question about "proper" subclassing of NSManagedObject as I was told last night that it's a "bad idea" to put lots of non-persisted properties and ivars inside one. Currently I have TONS of code inside my NSManagedObject, and Apple's docs don't really address the "rightness" of that. FYI: the code works, but I'm asking if there are pitfalls ahead, or if there are obvious improvements to doing it another way.
My "object" is a continuously growing array of incoming data, the properties/ivars that track the progress of the analysis of that data, and the processed data (output). All of this is stored in memory because it grows huge, very quickly, and would not be possible to re-generate/re-analyze continuously. The NSManagedObject properties that are actually persisted are just the raw data (regularly saved, as Core Data doesn't support NSMutableData), a few basic properties and 2 relationships to other NSManagedObjects (1 being a user, the other being a set of snapshots of the data). Only one object is being recorded to at any one time, although dozens can be opened for viewing (which may involve further processing at any time).
It's not possible to have the object that inserts the entity (data manager that manages Core Data) have all of the processing logic/variables inside it, as each object necessitates at least a handful of arrays/properties that are used as intermediaries and tracking values for the analysis. And I, personally, think that it sounds silly to create two objects for each object that is being used (the NSManagedObject that is the store, and another object that is the processing/temp store).
Basically, all of the examples I can find using NSManagedObjects have super simple objects that are things like coordinates, address book entries, pictures: stuff that is basically static. In that case I can see having all of the logic that creates/modifies them outside the object. However, my case is not that simple and I have yet to come up with an alternative that doesn't involve duplication.
Any suggestions would be appreciated.
You might use a 'wrapper', that is to say a class holding a reference to one of your managed object instances; this wrapper would contain your algorithms and your non-persisted properties.
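The shape of that wrapper is language-agnostic, so here is a sketch in Java rather than Objective-C, with all names hypothetical: the persisted entity stays a thin data bag (the NSManagedObject role), while the wrapper owns the transient analysis state and the processing logic.

```java
import java.util.ArrayList;
import java.util.List;

// Stands in for the NSManagedObject: only this raw data is persisted.
class RawDataEntity {
    byte[] rawData = new byte[0];
}

// The wrapper: never persisted. Holds the analysis progress and output
// that would otherwise bloat the managed object subclass.
class DataAnalyzer {
    private final RawDataEntity entity;
    private final List<Double> processed = new ArrayList<>(); // transient output
    private int analyzedUpTo = 0;                             // progress tracker

    DataAnalyzer(RawDataEntity entity) {
        this.entity = entity;
    }

    // Analyze only the bytes that arrived since the last call, matching the
    // question's continuously growing buffer.
    void analyzeNewData() {
        byte[] raw = entity.rawData;
        for (; analyzedUpTo < raw.length; analyzedUpTo++) {
            processed.add((double) raw[analyzedUpTo]); // placeholder "analysis"
        }
    }

    List<Double> results() {
        return processed;
    }
}
```

When a document is closed, the wrapper (and all its intermediate arrays) can simply be released while the entity remains in the store, which is the main practical win over stuffing everything into the managed object.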

Parallelizing L2S Entity Retrieval

Assuming a typical domain entity approach with SQL Server and a dbml/L2S DAL with a logic layer on top of that:
In situations where lazy loading is not an option, I have settled on a convention where getting a list of entities does not also get each item's child entities (no loading), but getting a single entity does (eager loading).
Since getting a single entity also gets children, it causes a cascading effect in which each child then gets its children too. This sounds bad, but as long as the model is not too deep, I usually don't see performance problems that outweigh the benefits of the ease of use.
So if I want to get a list in which each of the items is fully hydrated with children, I combine the GetList and GetItem methods. So I'll get a list and then loop through it getting each item with the full cascade. Even this is generally acceptable in many of the projects I've worked on - but I have recently encountered situations with larger models and/or more data in which it needs to be more efficient.
I've found that partitioning the loop and executing it on multiple threads yields excellent results. In my first experiment with a list of 50 items from one particular project, I did 5 threads of 10 items each and got a 3X improvement in time.
Of course, the mileage will vary depending on the project but all else being equal this is clearly a big opportunity. However, before I go further, I was wondering what others have done that have already been through this. What are some good approaches to parallelizing this type of thing?
Usually it is faster to make a single database call that returns a set of records.
This recordset can "hydrate" the top-level objects, then another recordset can load the child objects. I'm not sure why your situation does not allow lazy loading, but this method is essentially lazy loading, and it will surely be faster than making multiple calls to the database that each return a single record.
You could make asynchronous calls to the database so that multiple queries are running in parallel. If you combine this with the first strategy for each "layer" of the model, and write a somewhat more complex hydration function based on multiple-record return sets, you should see that the database handles concurrent connections very well (which is why you see a performance gain from using multiple threads).
But you don't need to explicitly create threads - check out the asynchronous methods of a SqlCommand.
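The "partition the loop across threads" experiment from the question has a simple generic shape. The answer above points at .NET's asynchronous SqlCommand methods; the sketch below shows the same idea in Java with a plain thread pool, where `getItem` is a stand-in for the eager-loading single-entity fetch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelHydration {
    // Stand-in for the eager-loading GetItem call from the question.
    interface Loader<T> {
        T getItem(int id);
    }

    // Fetch every item on a fixed-size pool, returning them in id order.
    static <T> List<T> hydrateAll(List<Integer> ids, Loader<T> loader, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int id : ids) {
                futures.add(pool.submit(() -> loader.getItem(id)));
            }
            List<T> out = new ArrayList<>();
            for (Future<T> f : futures) {
                out.add(f.get()); // blocks until that item is loaded
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

The speedup comes from overlapping the per-call network latency, which is also why the set-based recordset approach above usually wins: it removes most of those round trips instead of merely overlapping them.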