ECS / CES shared and dependent components and cache locality

I have been trying to wrap my head around how ECS works when there are components which are shared or dependent. I've read numerous articles on ECS and can't seem to find a definitive answer to this.
Assume the following scenario:
I have an entity which has a ModelComponent (or MeshComponent), a PositionComponent and a ParticlesComponent (or EmitterComponent).
The ModelRenderSystem needs both the ModelComponent and the PositionComponent.
The ParticleRenderSystem needs ParticlesComponent and the PositionComponent.
In the ModelRenderSystem, for cache efficiency / locality, I would like to run through all the ModelComponents, which are in a compact array, and render them. However, for each model I need to pull the PositionComponent. I haven't even started thinking about how to deal with the textures, shaders, etc. for each model (which will definitely blow the cache).
There is a similar issue with the ParticleRenderSystem: I need both the ParticlesComponent and the PositionComponent, and I want to be able to run through all ParticlesComponents in a cache-friendly manner.
I considered having ModelComponent and ParticlesComponent each hold their own position, but they would need to be synced every time the model's position changes (imagine a particle effect on a character). This adds another entity or component which needs to track and sync components or values (and potentially negates any cache efficiency).
How does everyone else handle these kinds of dependency issues?

One way to reduce the complexity could be to invert the flow of data.
Consider that your ModelRenderSystem has a listener callback that allows the entity framework to inform it that an entity has been added to the simulation that contains both a position and model component. During this callback, the system could register a callback on the position component or the system that owns that component allowing the ModelRenderSystem to be informed when that position object changes.
As the change events from the position changes come in, the ModelRenderSystem can queue up a list of modifications it must replicate during its update phase. Then, during update, it is a simple matter of looking up each modification's model and setting its position to the value in the event.
The benefit is that, per frame, you only replicate positions that actually changed during that frame, and you minimize the lookups needed to replicate the data. While propagating position updates to the various interested systems may not be as cache friendly, the gains you observe elsewhere outweigh that cost.
Lastly, don't forget that systems do not necessarily need to iterate over the components proper. The components in your entity system exist to allow you to toggle pluggable behavior easily. The systems can always manage a more cache-friendly data structure of their own, and the callback approach above allows you to do that and to manage data replication easily with minimal coupling.
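To make the callback idea above concrete, here is a minimal sketch. All names (PositionStore, RenderItem, on_entity_added and so on) are made up for illustration and do not come from any particular framework. Python is used only to show the data flow; the locality benefit itself would come from a contiguous array of structs in a language such as C++.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
PositionListener = Callable[[int, Vec3], None]


class PositionStore:
    """Hypothetical owner of the authoritative positions; notifies on change."""

    def __init__(self) -> None:
        self._positions: Dict[int, Vec3] = {}
        self._listeners: List[PositionListener] = []

    def subscribe(self, listener: PositionListener) -> None:
        self._listeners.append(listener)

    def set(self, entity: int, position: Vec3) -> None:
        if self._positions.get(entity) == position:
            return  # unchanged, so there is nothing to replicate
        self._positions[entity] = position
        for listener in self._listeners:
            listener(entity, position)


@dataclass
class RenderItem:
    """Everything the renderer needs for one model, stored side by side."""
    mesh: str
    position: Vec3 = (0.0, 0.0, 0.0)


class ModelRenderSystem:
    def __init__(self, positions: PositionStore) -> None:
        self._items: List[RenderItem] = []   # compact array the system iterates
        self._slot_of: Dict[int, int] = {}   # entity id -> index into _items
        self._pending: Dict[int, Vec3] = {}  # latest queued change per entity
        positions.subscribe(self._on_position_changed)

    def on_entity_added(self, entity: int, mesh: str, position: Vec3) -> None:
        """Assumed framework callback: entity has both a model and a position."""
        self._slot_of[entity] = len(self._items)
        self._items.append(RenderItem(mesh, position))

    def _on_position_changed(self, entity: int, position: Vec3) -> None:
        if entity in self._slot_of:
            self._pending[entity] = position  # later changes overwrite earlier ones

    def update(self) -> List[Tuple[str, Vec3]]:
        # Replicate only the positions that changed since the last update.
        for entity, position in self._pending.items():
            self._items[self._slot_of[entity]].position = position
        self._pending.clear()
        # Linear pass over the system's own array; a real system would draw here.
        return [(item.mesh, item.position) for item in self._items]
```

Queuing changes in a dictionary keyed by entity means that several moves of the same entity within one frame collapse into a single write during update.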


How do you handle systems accessing multiple components (without dropping your cache load) in an entity component system?

I'm currently writing my own Entity Component System, and I'm having a little bit of trouble optimizing my cache load when it comes to systems that look at multiple components.
I'm taking a very pure ECS approach: components of the same type are stored in a (sparse) vector, and entities are identified by their index in each component vector. I understand that this is good for performance, because a system working on a single component can fill the cache load fully with components that it needs.
But an issue arises when a system uses multiple components, and I don't know what I'm missing; for each entity, a system will need to drop its cache load of one component to look at the next one. Each component type a system looks at means a cache drop for each entity it looks at.
Is my problem clear? Does anyone have any solutions to this? What am I missing?

In ECS (Entity Component System) what is the difference between Component-Manager and a System?

I'm trying to understand ECS. So a component is just plain data, and some manager holds these components in a container and loops through all of them to act on this data, to "update" them.
Is this manager what people call a "component-manager", or is it a "system"? Or do they mean the same thing? If not, what do a component-manager and a system each do?
ECS means different things to different people. There are a large number of approaches when it comes to implementation but I personally go by the following rules:
A Component is just plain data, typically a structure or some object with no logic associated with it whatsoever.
An Entity is a collection of components. It is defined by an identifier, typically an integer, that can be used to look up components like an index.
A System is where all the game logic lives. Each System has an archetype, that is, a specific set of components that it operates on. Systems have an update function which, when invoked, accesses the specific set of components it's interested in (its archetype) for all entities that have that collection of components. This update function is triggered externally (by what? See the next paragraph).
Now, here's the bit that addresses your question directly (or at least attempts to). Video games are simulations, and they are typically driven by what's called an update loop (typically synced to a monitor's refresh rate). In an ECS architecture, there is typically dedicated code that strings your systems together in a queue and, on each time-step of the update loop, executes those systems in sequence (i.e. calls their update functions). That dedicated code not only manages the system update loop but is also responsible for managing components (stored as lists/arrays that can be indexed by an entity id) and a myriad of other tasks. In many implementations it's referred to as the "Engine". This is what I take to be a "component-manager", but that could mean something else in another ECS approach. Just my two cents. Hope it helped.
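As a rough illustration of the rules above, here is a small sketch of how the pieces could fit together. It is only one of many possible layouts. The class names (Engine, MovementSystem, Position, Velocity) and the dictionary-based component storage are assumptions made for the example, not part of any specific ECS library.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Tuple, Type


@dataclass
class Position:  # component: plain data, no logic
    x: float
    y: float


@dataclass
class Velocity:  # component: plain data, no logic
    dx: float
    dy: float


class Engine:
    """Owns component storage and runs the systems in order each time-step."""

    def __init__(self) -> None:
        self._next_id = 0
        self._stores: Dict[Type, Dict[int, Any]] = {}  # component type -> {entity id: component}
        self._systems: List["System"] = []

    def create_entity(self, *components: Any) -> int:
        entity = self._next_id  # an entity is just an identifier
        self._next_id += 1
        for component in components:
            self._stores.setdefault(type(component), {})[entity] = component
        return entity

    def add_system(self, system: "System") -> None:
        self._systems.append(system)

    def query(self, *types: Type) -> Iterator[Tuple[Any, ...]]:
        """Yield component tuples for every entity that has all the given types."""
        stores = [self._stores.get(t, {}) for t in types]
        if not stores:
            return
        for entity in stores[0]:
            if all(entity in store for store in stores[1:]):
                yield tuple(store[entity] for store in stores)

    def update(self, dt: float) -> None:
        for system in self._systems:  # executed in registration order
            system.update(self, dt)


class System:
    def update(self, engine: Engine, dt: float) -> None:
        raise NotImplementedError


class MovementSystem(System):
    archetype = (Position, Velocity)  # the set of components this system needs

    def update(self, engine: Engine, dt: float) -> None:
        for position, velocity in engine.query(*self.archetype):
            position.x += velocity.dx * dt
            position.y += velocity.dy * dt
```

An entity created with only a Position is simply skipped by MovementSystem, because it does not match the system's archetype.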

What is a proper way to separate data structure logic from its graphical representation?

It's more of a software design question, than strictly programming, so I'll paste UML diagrams instead of code for everyone's convenience.
Language is Java, so variables are implied references.
I'm writing an app, that helps edit a very simple data structure, that looks like this:
On my first trial run I've included all the drawing-related and optimization code into the data structure, that is, every Node knew how to draw itself and kept a reference to one of shared cached bitmaps. UML:
It was easy to add (fetch a corresponding bitmap and you're done) and remove (paint background color over previously mentioned bitmap). Performance-wise it was nice, but code-wise it was messy.
So on the next iteration I decided to split things, but I may have gone too far and things got messy yet again:
Here data structure and its logic is completely separated, which is nice. I can easily load it from file or manipulate in some way before it needs to be drawn, but when it comes to drawing things get uncomfortable.
The classic way would be to change the data and then call invalidate() on the drawing wrapper, but that's inefficient for many small changes. So to, say, delete one Tile I'd have to either make the Drawn representation independent of Data and call deleteTile() on both separately, or funnel all commands to Data through the Drawing class. Things get even messier when I try to add different drawing methods via the Strategy pattern or some other way. The horror:
What is a clean, efficient way to organize interactions between Model and View?
First, definitely decouple the app logic from the UI. Make a model for your schematic. That will also make the app model easier to unit test, as you already said. Then I would try the Observer pattern. But given that a schematic can have lots and lots of graphical components (your Tiles), I would change the usual setup, where every observer is notified whenever anything in the model changes, so that only the corresponding GraphicalComponent (Tile) is notified when a Component changes in the Model. Your UI asks the Model to do things and gets called back in the relevant parts to update. This is automatic, with no duplicated calls, just the initial observer registration when a GraphicalComponent is created.
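A minimal sketch of that per-component Observer setup is below. It is written in Python for brevity, though the same shape carries over to the Java code in the question. SchematicModel, TileView, and the method names are hypothetical stand-ins for the classes in the UML diagrams.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

# Called with the tile id and its new kind, or None if the tile was deleted.
TileListener = Callable[[int, Optional[str]], None]


class SchematicModel:
    """Pure data and logic; knows nothing about drawing."""

    def __init__(self) -> None:
        self._tiles: Dict[int, str] = {}
        self._listeners: DefaultDict[int, List[TileListener]] = defaultdict(list)

    def observe(self, tile_id: int, listener: TileListener) -> None:
        # Register interest in one tile only, not in the whole model.
        self._listeners[tile_id].append(listener)

    def set_tile(self, tile_id: int, kind: str) -> None:
        self._tiles[tile_id] = kind
        self._notify(tile_id, kind)

    def delete_tile(self, tile_id: int) -> None:
        if self._tiles.pop(tile_id, None) is not None:
            self._notify(tile_id, None)

    def _notify(self, tile_id: int, kind: Optional[str]) -> None:
        for listener in list(self._listeners.get(tile_id, ())):
            listener(tile_id, kind)


class TileView:
    """Graphical counterpart of one tile; registers itself once on creation."""

    def __init__(self, model: SchematicModel, tile_id: int) -> None:
        self.tile_id = tile_id
        self.redraws: List[Optional[str]] = []  # stands in for actual painting
        model.observe(tile_id, self._on_changed)

    def _on_changed(self, tile_id: int, kind: Optional[str]) -> None:
        # A real view would repaint only this tile's rectangle here.
        self.redraws.append(kind)
```

With this arrangement the caller only ever talks to the model (for example model.delete_tile(1)), and only the view for tile 1 is called back.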

Use cases of Event Sourcing, when we don't care about past states

I have been reading about Event Sourcing pattern, I have seen it used in the projects I have worked on, but I am still yet to see any benefit of it, while it makes the design much more complicated.
That is, many sources mention that Event Sourcing is good if you want to see Audit Log, be able to reconstruct the state of 15 days ago and I see that Event Sourcing solves all of that beautifully. But apart from that, what is the point?
Yes, I can imagine that if you are in the relational world, then writes are comparatively slow as they lock the data and so on. But it is much easier to solve this problem by going NoSQL and using something like Cassandra. Cassandra's writes are super fast, as they are append-only (kind of a temporary event source), and it scales beautifully as well. Sources also mention that Event Sourcing helps scaling. How on earth can it help you scale when, instead of storing ~1 row of data per user, you now have 9000, and instead of retrieving that single row, you are now replaying 9000 rows (or fewer, if you complicate the design even more by adding temporal snapshots of state and replaying the current state from the last snapshot)?
Any examples of real life problems that Event Sourcing solves or links would be much appreciated.
While I haven't implemented a distributed, event-sourced sub-system as yet (so I'm no expert), I have been researching and evaluating the approach. Event sourcing provides a number of key benefits:
I'm sure there are more. To a large extent, the benefits of event sourcing depend on the baseline you are comparing it against (CRUD, event-driven DDD, CQRS, or whatever), and the domain.
Let's look at each of those in turn:
With event driven systems that fire events whenever the system is updated, you often have a problem: how do you both update the system state and fire the event in one go? If the 2nd operation fails, your system is in a broken, inconsistent state. Event sourcing provides a neat solution to this, since the system only requires a single operation for the state change, which will either succeed or fail atomically: the writing of the event. Other solutions tend to be more complex and less scalable - 2 phase commit, etc.
This is a big benefit in a large, high transaction system, where components are failing, being updated or replaced all the time while transactions are going on. The ability to terminate a process at any time without any worry about data corruption or consistency is a big benefit and helps you sleep at night.
In many domains you won't have concurrent writes to the same entities, or you won't require events since a state change has no knock-on effects, in which case event sourcing is unlikely to be a good approach, and simpler approaches like CRUD may be fine.
First of all, event streams make consistent writes very efficient - it's just an append only log, which makes replication and 'compare and set' simple to optimise. Something like Cassandra is quite slow in the scenario where you need to protect your invariants - that is, you need to validate a command against the current state of a 'row', and reject the update if the row changes before you have a chance to update it. You either need to use 'lightweight transactions' to ensure consistency, or have a single writer thread per partition, so that you can be sure that you can successfully validate a command against the current state of the system before allowing the update. Of course you can implement an event store in Cassandra, using either of these approaches (single thread/lightweight transactions).
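The 'compare and set' append mentioned above can be sketched as follows. This is an in-memory toy under stated assumptions: the class and function names are invented, and the expected-version check is one common way to express the idea rather than the API of any particular event store.

```python
import threading
from typing import Dict, List


class ConcurrencyError(Exception):
    """Raised when the stream changed after the caller read it."""


class InMemoryEventStore:
    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}
        self._lock = threading.Lock()

    def read(self, stream_id: str) -> List[dict]:
        with self._lock:
            return list(self._streams.get(stream_id, []))

    def append(self, stream_id: str, expected_version: int, event: dict) -> int:
        """Append only if the stream still has expected_version events."""
        with self._lock:
            stream = self._streams.setdefault(stream_id, [])
            if len(stream) != expected_version:
                raise ConcurrencyError(
                    f"expected version {expected_version}, found {len(stream)}"
                )
            stream.append(event)  # the single write that changes state
            return len(stream)


def balance(events: List[dict]) -> int:
    """Fold the events of one account into its current balance."""
    total = 0
    for event in events:
        if event["type"] == "Deposited":
            total += event["amount"]
        elif event["type"] == "Withdrawn":
            total -= event["amount"]
    return total


def withdraw(store: InMemoryEventStore, account: str, amount: int) -> int:
    events = store.read(account)
    if balance(events) < amount:  # validate the command against current state
        raise ValueError("insufficient funds")
    # If another writer appended in the meantime, this raises ConcurrencyError
    # and the caller can re-read and retry.
    return store.append(account, len(events), {"type": "Withdrawn", "amount": amount})
```

The invariant (no overdraft) is protected because the validation and the write are tied together by the version number: a stale writer is rejected instead of silently overwriting.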
Read scalability is the biggest performance benefit though - since you can build as many different eventually consistent projections (views) on the data as you want by reading from event streams, and horizontally scale query services on these views as much as you want. These views can use custom databases (Cassandra, graph databases) as necessary to allow queries to be optimised as much as you want. They can store denormalised data, to allow all required data to be fetched in a single (non-joined) database query. They can even store the projected state in memory, for maximum performance. While this can potentially be achieved without event sourcing, it is much more complex to implement.
If you don't have complex querying and high scalability requirements, event sourcing may not be the right solution.
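To illustrate the projections described above, here is a small sketch in which two independent read views are built from the same event stream. The event shapes, field names, and sample values are invented for the example. A real system would feed the views from a persistent event store and keep them in whatever database suits the query.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable


class OrdersPerCustomer:
    """Read view: how many orders each customer has placed."""

    def __init__(self) -> None:
        self.counts: DefaultDict[str, int] = defaultdict(int)

    def apply(self, event: dict) -> None:
        if event["type"] == "OrderPlaced":
            self.counts[event["customer"]] += 1


class RevenuePerDay:
    """Read view: net revenue per calendar day, using the event timestamps."""

    def __init__(self) -> None:
        self.totals: DefaultDict[str, int] = defaultdict(int)

    def apply(self, event: dict) -> None:
        day = event["at"][:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
        if event["type"] == "OrderPlaced":
            self.totals[day] += event["total"]
        elif event["type"] == "OrderRefunded":
            self.totals[day] -= event["total"]


def rebuild(projection, events: Iterable[dict]):
    """Build (or rebuild) a view by replaying events from the start of the stream."""
    for event in events:
        projection.apply(event)
    return projection


# Made-up sample stream, oldest event first.
log = [
    {"type": "OrderPlaced", "customer": "alice", "total": 20, "at": "2024-01-01T10:00:00"},
    {"type": "OrderPlaced", "customer": "bob", "total": 15, "at": "2024-01-01T12:30:00"},
    {"type": "OrderPlaced", "customer": "alice", "total": 5, "at": "2024-01-02T09:00:00"},
    {"type": "OrderRefunded", "customer": "bob", "total": 15, "at": "2024-01-02T11:00:00"},
]

orders = rebuild(OrdersPerCustomer(), log)
revenue = rebuild(RevenuePerDay(), log)
```

The same apply method serves both for the initial rebuild and for keeping the view current as new events arrive, so a view added later needs no separate migration code.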
If you need to look at your data in a new way, say you create a new client app or screen in an app, it's very easy to add new projections of the event streams as new, independent services. If you need to add some data to an existing read view that you missed, or fix a bug in the read view, you can just rebuild the views using the event streams and throw away the old ones. The advantages here vs. the non-event sourced case are:
You don't need to write both DB migration code and then code to keep the view up to date as events come in. Instead, you just write the code to keep it up to date, and run it on the events from the start of time.
Related to this, you can do the update without having to bring down the query service to do a schema change - instead, just leave the old service version running against the old DB, generate a new DB with the new service version, and when it's caught up with the event streams, just atomically switch over then clean up the old service and DB once you're happy the new one is stable (noting that the old service will be keeping itself up to date in the meantime, if you need to roll back!). This is likely to be extremely difficult to achieve without event sourcing.
If you need any temporal information to be added to your views (e.g. when was the last update, when was this created), that's already available and easy to add, but impossible to add retrospectively without event sourcing.
Note that the above isn't about modifying event streams (which is trickier, see my comment on challenges below). It's about using the existing event streams to enhance a view or create a new one.
There are simple ways to do this without event sourcing, such as using database views (with an RDBMS), but they aren't as scalable.
Event sourcing also has some challenges for evolvability - you need to take care of event versioning, probably using a combination of weak event schema (so you can add properties with default values) and stream replacement (when you want to do a bigger change to your events). Greg Young is writing a good book on this.
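As a small illustration of the weak-schema idea, the sketch below fills in default values for properties that older events were written without. The event type, the property names, and the defaults are all hypothetical. Bigger changes (stream replacement) are not covered here.

```python
from typing import Any, Dict

# Hypothetical defaults for properties added after the first events were written.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "CustomerRegistered": {"newsletter": False, "country": "unknown"},
}


def read_event(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return the event with any missing newer properties set to their defaults.

    Values present in the stored event always win over the defaults, and the
    stored event itself is left untouched.
    """
    return {**DEFAULTS.get(raw["type"], {}), **raw}
```

Handlers and projections can then be written against the newest shape of the event, while old events in the stream stay exactly as they were stored.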
As you mentioned, you're not interested in this.

When does application state belong in Core Data instead of NSUserDefaults?

I'm trying to decide whether some application state, specifically the selected item in a list, should be stored in Core Data or NSUserDefaults.
Generally I believe that application preferences and state should persist in NSUserDefaults, and model-level data should persist elsewhere, say in Core Data. My model so far is:
Should the data be stored at all? If the user wouldn't reasonably expect it to be, then throw it out (for example, the cursor position is not saved in TextEdit).
NSUserDefaults:
If the application were multi-document, the setting would apply to all documents
It's conceivable that the data would be configured in preferences
Having the data outside of the model makes sense for testing (swapping several defaults with one model store)
Core Data:
The data clearly belongs as an attribute of a model-level object
The data is sufficiently large that storing it in NSUserDefaults would cause performance problems
It would be difficult or time-intensive for the user to re-create the data (they would definitely consider the loss of this information "data loss")
I plan to store the sort order of some entities in Core Data. Without this information (i.e. a "sortIndex" or "order" attribute) each entity instance would have to be augmented with data from the user defaults.
However, storing state in the model seems like a slippery slope. If I store sort order then it also seems appropriate to store selection since they are both the state of a list. The selection data for my use case may actually be quite large. Specifically, the icons in one list depend on the selection in each of their sub-lists.
Does anyone have a hard line they draw with respect to NSUserDefaults vs. data model?
You didn't mention whether this is a document-based app (like say, TextEdit) or a library-based one (like say, AddressBook).
That may help you decide where such information should go: assume a document-based app. Assume its documents get placed under version-control (this is actually feasible when using Core Data's XML data store type). Open the app, change the doc's sort orders. Does this dirty the document? Would this change be worth a check-in? Would the change be valuable to other users of this repository?
Typically, sort orderings aren't valuable enough to warrant document-based storage (à la NSTableView's Auto Save Name in Interface Builder). But your app may place a priority on sorting (it sounds like it does).
So, there is no hard-and-fast rule. But I think the idea of having a document under version control, potentially shared with others, provides a good intellectual framework to make your case for either side.
I agree with rentzsch, but another way to view it:
Is the selection part of the data or is it metadata? If metadata, is it metadata about a single document or is it state that should apply to any document that happens to be opened next?
Document-specific metadata might want to be stored as an extended attribute. For example, TextMate stores the selection for a document this way, much as BBEdit, MPW, and others used to store tab settings, window size, etc. as a resource in the resource fork. Metadata is considered optional and the document is intact if it is stripped away.
If the selection is an integral part of the data, then by all means store it in the data, using Core Data if you happen to swing that way.
And if it's not a document-based app, then NSUserDefaults is the simplest path since support for it is generally built into common NSView subclasses via bindings.
I personally don't have a hard line between saving preferences in the file itself or in NSUserDefaults.
But I have always tended towards the obvious:
Application preferences = NSUserDefaults
Document preferences = in the file itself
For selection state specifically, I would judge if keeping that is important enough to the user. If it is, and important enough to move with the document to another computer, I would keep it in the document itself.
If it isn't important (or applicable) I wouldn't bother with saving it at all.