So in OOP, objects send messages to other objects. This is a pretty simple concept, and as long as all the objects live in memory, it's easy to implement, e.g. by calling methods.
But in real life, we persist objects into the database or elsewhere, because there isn't enough RAM to hold all of them. How do you call a method on an object that is currently persisted?
OK, so maybe unpersisting one object can be encapsulated into its Factory. But what if I want to send messages to a lot of objects, e.g. in a loop? Unpersisting them one by one is a classic N+1 issue.
OK, I can have a Repository that'll unpersist all necessary objects in one shot. But doesn't it break encapsulation to ask a Repository to get my objects?
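To make the contrast concrete, here's a minimal sketch (Guy, IGuyFactory and IGuyRepository are hypothetical names I'm making up for illustration):

    using System.Collections.Generic;

    public class Guy
    {
        public void ReceiveMessage(string message) { /* react to the message */ }
    }

    public interface IGuyFactory    { Guy Unpersist(int id); }                           // one DB hit per call
    public interface IGuyRepository { IEnumerable<Guy> GetByIds(IEnumerable<int> ids); }  // one DB hit total

    public static class Messaging
    {
        // N+1: one query for the id list, plus one query per Guy.
        public static void BroadcastNaively(IGuyFactory factory, IEnumerable<int> ids, string msg)
        {
            foreach (var id in ids)
                factory.Unpersist(id).ReceiveMessage(msg);
        }

        // One shot: a single query materializes every Guy first.
        public static void Broadcast(IGuyRepository repository, IEnumerable<int> ids, string msg)
        {
            foreach (var guy in repository.GetByIds(ids))
                guy.ReceiveMessage(msg);
        }
    }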
What about patterns like Observer? Is it possible to have an object subscribe to anything, knowing that it's going to be persisted?
Are there transparent implementations of this in any language?
I don't think there are. RAM is critical to OOP. Once you persist something outside of RAM, it's not OOP anymore.
An object is actually the total of all the objects it's in contact with, plus all the objects they're in contact with, plus.... To work, an object needs almost instant communications with many other objects. Without that, performance will lurch to a halt. (Think virtual memory: if you've ever used 1.5GB on a machine with 0.5GB, you'll know the problem.)
Storage based on slow memory tends to use either relational storage or big blocks of sequential data, each block accessed randomly by key. I have written large-data, highly interactive (and so heavily OO) systems that persisted their data in a SQL database. I had to organize the DB so it could be divided into manageable blocks, each able to stand on its own, several of which would be loaded into a session. Then I had to write vast amounts of code to turn the tables, rows, and columns into useful objects. To write to the DB, I had to reverse the whole process.
So to start the program, the DB data went into a blender and came out something completely different. When the user was done and saved to the database, the data got unblended and put back. The in-memory objects had no obvious relationship with the DB tables (though obviously one could be created from the other with enough work). And that's as close as I've been able to get to usable persistence. The atomic units were much bigger than objects or rows, and each transformation took a while (whole seconds).
I've worked a (very) little bit with ORM. The trick here is that OO is used to mimic relational storage. You end up not with the objects you'd get if you started with a good OO design, but with something that looks just like a relational database. In a lot of cases this is just what you want, and it works very well. But it's not real OOP and probably not what you're looking for.
So I've found no general answer. Fast memory and OO, and slow memory and relational/non-relational, are two different things, and bridging that gap takes careful study of specific cases and a bit of genius.
I'm working on my first JSON-RPC/JSON-REST API. One of the conveniences of JSON is that it can easily represent structured data (a user may have multiple email addresses, multiple addresses, etc.).
For example, the Facebook Graph API nicely represents the kind of thing that's handy to return as JSON objects:
https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851559_339008529558010_1864655268_n.png
However, in implementing an API such as this with a relational database, we end up shattering structured objects into very many tables (at least one for each list in the JSON object), and un-shattering them when responding to requests. So:
it requires a lot of modelling (separate models for the JSON objects and the SQL tables)
inconsistencies creep in between the models: e.g. user_id (in SQL) vs. userID (in JSON)
marshaling stuff between one model and the other is very time-consuming (tedious, error-prone, and pointless boilerplate).
What design-patterns exist to help in this situation?
I'm not sure you are looking for design patterns. I would look for tools that handle this better.
I assume that you want to be able to query these objects, and not just store them in TEXT fields. Many databases support XML fairly well, so I would convert the JSON to XML (with a library) and then store that in the database.
You may also want to consider a JSON document based database. That will definitely get you where you want to go.
If you don't need to be able to query these, or only need to query a very small subset of fields, just store the objects as text, and extract those queryable fields into actual columns. This way you don't need to touch the majority of the data, but you can still query the few fields you care about. (Plus you can index them for speedier lookup.)
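A sketch of that hybrid layout (the users table, its columns, and the field choices are all made up for illustration):

    public static class UserSql
    {
        // Full JSON stored verbatim; the one field we query gets a real,
        // indexable column of its own.
        public const string CreateTable = @"
            CREATE TABLE users (
                user_id   INTEGER PRIMARY KEY,
                email     VARCHAR(255),  -- extracted, queryable, indexed
                json_body TEXT           -- the whole object, untouched
            );
            CREATE INDEX idx_users_email ON users (email);";

        // Writes extract the queryable fields; reads deserialize json_body.
        public const string Insert = @"
            INSERT INTO users (user_id, email, json_body)
            VALUES (@id, @email, @json);";
    }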
I have always chosen to implement this functionality in a facade pattern. Since the point of the facade is to simplify (abstract) an underlying complexity as a boundary between two or more systems, it seemed like the perfect place to handle this.
I realize however that this does not quite answer the question. I am talking about the container for the marshalling while the question is about how to better manage the contents (the code that does the job).
My approach here is somewhat old-fashioned, but since this is an old question maybe that's okay. I employ (as much as possible) stored procedures in the DB. This promotes better encapsulation than one typically finds with a code layer outside of the DB that has to "know about" the DB structure. What inevitably happens in the latter case is that more than one system will be written to do this (one large company I worked at had at least 6 competing ESBs) and there will be conflicts. Also, stored procedure scripting will usually benefit from some sort of IDE that helps maintain contextual awareness of the DB structure.
So this approach - even though it is not a pattern per se - makes managing the ORM a lot easier.
For a hobby project I am building an application to keep track of my money: registering everything that comes in and goes out. I am using SQLite as the database backend.
I have two data access models in mind.
Creating one master object as a sort of database connector, which contains methods that execute the queries and provide the required sets of data as lists of objects
Have the objects that need data execute the queries themselves
Which one of these is 'the best' and why? Or are there different, better models out there?
The latter option is better. In the first option, you would end up having to touch your universal data access object for just about any update to the code that wasn't purely a change in display logic. If you have different data access objects, then you will have much more testable, maintainable code.
I suggest you read up a bit on the model-view-controller paradigm. The wikipedia article on it is a good start: http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller.
Also, you didn't say which language/platform you were coding in, but most platforms have numerous options for auto-generating a starting point for your data access classes from your database. You may find something like that useful.
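If you do hand-roll it, here is a sketch of the second option built on the standard ADO.NET interfaces, so the SQLite details stay in one testable class (Entry and the entries table are hypothetical):

    using System;
    using System.Collections.Generic;
    using System.Data;

    public class Entry
    {
        public int Id;
        public decimal Amount;   // positive = money in, negative = money out
        public DateTime Date;
    }

    // One focused data access object: testable on its own, and the only
    // place that knows about the entries table.
    public class EntryRepository
    {
        private readonly IDbConnection connection;
        public EntryRepository(IDbConnection connection) { this.connection = connection; }

        public IList<Entry> GetBetween(DateTime from, DateTime to)
        {
            using (var cmd = connection.CreateCommand())
            {
                cmd.CommandText = "SELECT id, amount, date FROM entries " +
                                  "WHERE date BETWEEN @from AND @to";
                AddParam(cmd, "@from", from);
                AddParam(cmd, "@to", to);

                var entries = new List<Entry>();
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        entries.Add(new Entry
                        {
                            Id = reader.GetInt32(0),
                            Amount = reader.GetDecimal(1),
                            Date = reader.GetDateTime(2)
                        });
                return entries;
            }
        }

        private static void AddParam(IDbCommand cmd, string name, object value)
        {
            var p = cmd.CreateParameter();
            p.ParameterName = name;
            p.Value = value;
            cmd.Parameters.Add(p);
        }
    }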
Much of a muchness, really; the thing to avoid is having the "same" SQL sprinkled all over your code base.
The key point is this: you've just added a new column to Table1. When you do Find In Files for "Table1", how many hits are you going to get, and where?
If you use one class and there are a lot of DB operations, it's going to get very messy very quickly; but if you have one interface (say IModel) with one implementation, you can swap backends very easily.
So: how many DB operations are there, and how likely is it that you will move away from SQLite?
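Sketched out, the single seam looks like this (IModel is the name suggested above; Entry is a hypothetical record type for the money tracker):

    using System;
    using System.Collections.Generic;

    public class Entry { public decimal Amount; public DateTime Date; }

    // All SQL lives behind this one interface, so Find In Files has exactly
    // one implementation to hit, and moving away from SQLite means writing
    // one new class rather than touching every caller.
    public interface IModel
    {
        IList<Entry> EntriesBetween(DateTime from, DateTime to);
        void Save(Entry entry);
    }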
In one of my processes I have a SQL query that takes 10-20% of the total execution time. This SQL query filters my database and loads a list of PricingGrid objects.
So I want to improve its performance.
So far I have come up with 2 possible solutions:
Use a NoSQL solution; AFAIK these are good for improving read performance.
But the migration seems hard and needs a lot of work (like importing the data from SQL Server to NoSQL on a regular basis).
I don't have any experience with them; I don't even know which one I should use (the first I'd try is RavenDB, because I follow Ayende and it's made by the .NET community).
I might have some stuff to change in my model to make my objects OK for a NoSQL database.
Load all my PricingGrid objects in memory (in a static IEnumerable).
This might be a problem if my server doesn't have enough memory to load everything.
I might reinvent the wheel (indexes...) that the NoSQL providers have already invented.
I'm surely not the first one to wonder about this, so what would be the best solution? Are there any tools that could help me?
.NET 3.5, SQL Server 2005, Windows Server 2005
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution, it is more likely that you can speed up your queries with database optimizations and still retain all the added value of a RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If you prefer to use a NoSQL solution, perhaps you should try Memcached+MySQL (InnoDB). This will allow you to get the speed benefits of an in-memory cache (in the form of a memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
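The read path of that combination is the classic cache-aside pattern. A sketch, with a generic ICache interface standing in for whatever memcached client you pick (PricingGrid is the question's class):

    using System;

    public class PricingGrid { /* the question's object */ }

    public interface ICache
    {
        bool TryGet(string key, out PricingGrid value);
        void Set(string key, PricingGrid value, TimeSpan timeToLive);
    }

    public static class PricingGrids
    {
        public static PricingGrid Get(int id, ICache cache, Func<int, PricingGrid> loadFromSql)
        {
            PricingGrid grid;
            if (cache.TryGet("grid:" + id, out grid))
                return grid;                                   // served from memory

            grid = loadFromSql(id);                            // MySQL stays the source of truth
            cache.Set("grid:" + id, grid, TimeSpan.FromMinutes(10));
            return grid;
        }
    }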
I myself have never used it; I find that I either need NoSQL for the reasons I stated above, or that I can optimize the RDBMS using stored procedures, indexes, and views in a way which is sufficient for my needs.
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data. This will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc.
Concurrency/Transactions - Transactions are possible in both Mongo and Raven, they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its RavenSession objects. Yes, your data validation needs to be done in code, but you already should be doing it there anyway. In my experience this is an over-hyped con.
How:
Install Raven or Mongo on a primary server, run it as a service.
Create an application (or extend an existing one) that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface from this, and use it to build another repository class for your Raven/Mongo persistence. Both DBs have plenty of good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data in memory. Your repository is probably already doing this (e.g. when fetching an Order object, ensure not only its properties but also associated collections like Items are loaded in memory).
c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top level documents" or "root aggregates" serialized in JSON, with their collections' data nested within. Save changes and close the repository. Note: you may break this step down into as many little pieces as your data requires.
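For step c, the push itself is short. A sketch against RavenDB's documented session (unit of work) API; Order and Item are hypothetical aggregates, and client setup details vary by version:

    using System.Collections.Generic;
    using Raven.Client;   // IDocumentStore, IDocumentSession

    public class Order { public string Id; public List<Item> Items = new List<Item>(); }
    public class Item  { public string Sku; public int Quantity; }

    public static class Migration
    {
        // ordersWithItems comes from step b: fully-loaded C# objects.
        public static void PushOrders(IEnumerable<Order> ordersWithItems, IDocumentStore store)
        {
            using (var session = store.OpenSession())
            {
                foreach (var order in ordersWithItems)
                    session.Store(order);   // Items nest inside the Order document as JSON
                session.SaveChanges();      // one call commits the whole batch
            }
        }
    }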
Once your data is migrated, play around with it and ensure you are satisfied. You may want to modify your application Models a little to adjust the way they are persisted to Raven/Mongo - for instance you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMS systems). Watch out here though, as doing so sort of goes against the principle (and the performance benefits) behind NoSQL, since now you have to tap the DB twice to get the Order and the Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use-cases. It is an amazing tool, but not a golden solution for all data persistence. In your case, try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit".
This is a generic question, I don't know if it belongs to Programming or StackOverflow.
I'm writing a little simulation. Without going very deep into its details, consider that many kinds of entities are involved. They correspond to objects, since I'm using an OOP language.
There are Guys that inhabit the simulated world
There are Maps
A map has many Lots, that are pieces of land with some characteristics
There are Tribes (guys belong to tribes)
There is a generic class called Position to locate the elements
There are Bots in control of tribes that move guys around
There is a World that represents the simulated world
and so on.
If the simulated world were laid out as a database, the objects would be tables with lots of references, but in memory I have to use a different strategy. So, for example, a Tribe has an array of Guys as a property. The World has an array of Bots, of Tribes, of Maps. A Map has a Dictionary whose key is a Position and whose value is a Lot. A Guy has a Position that is where he stands.
The way I lay down such connections is pretty much arbitrary. For example, I could have an array of Guys in the World, or an array of Guys per Lot (the guys standing on a piece of land), or an array of Guys per Bot (the Guys controlled by the bot).
Doing so, I also have to pass around a lot of objects. For example, a Bot must have information about the Map and opponent Guys to decide how to move its own Guys.
As said, in a database I'd have a Guys table connected to the Lots table (indicating its position), to the Tribe table (indicating which Tribe it belongs to) and so it would also be easy to query "All the guys in Position [1, 5]". "All the Guys of Tribe 123". "All the Guys controlled by Bot B standing on the Lot b34 not belonging to the Tribe 456" and so on.
I've worked with APIs where to get the simplest information you had to make an instance of the CustomerContextCollection and pass it to CustomerQueryFactory to get back a CustomerInPlaceQuery to... When people criticize OOP and cite verbose abstractions that soon smell ridiculous, that's what I mean. I want to avoid such things and having to rely on deep abstractions and (anti-pattern) abstract contexts.
The question is: what is the preferred, clean way to manage entities and collections of entities that are deeply linked in multiple ways?
It depends on your definition of "clean". In my case, I define clean as: I can implement desired behavior in an obvious, efficient manner.
Building OOP software is not a data modeling exercise. I'd suggest stepping back a little. What does each one of those objects actually do? What methods are you going to implement?
Just because "guys are in a lot" doesn't mean that the lot object needs a collection of guys; it only needs one if there are operations on a lot that affect all the guys in it. And even then, it doesn't necessarily need a collection of guys - it needs a way to get the guys in the lot. This may be an internally stored collection, but it could also be a simple method that calls back into the world to find guys matching a criteria. The implementation of that lookup should be transparent to anyone.
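For example (a sketch reusing the question's names; the predicate callback is just one possible shape for the lookup):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Guy { public string Name; public string LotId; }

    public class World
    {
        private readonly List<Guy> guys = new List<Guy>();
        public void Add(Guy guy) { guys.Add(guy); }
        public IEnumerable<Guy> GuysWhere(Func<Guy, bool> criteria) { return guys.Where(criteria); }
    }

    public class Lot
    {
        private readonly World world;
        public readonly string Id;
        public Lot(World world, string id) { this.world = world; Id = id; }

        // Callers can't tell whether this is a stored collection or a query
        // against the world, and they shouldn't care.
        public IEnumerable<Guy> GuysHere() { return world.GuysWhere(g => g.LotId == Id); }
    }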
From the tenor of your questions, it seems like you're thinking of this from a "how do I generate reports" perspective. Step back and think of the behaviors you're trying to implement first.
Another thing I find extremely valuable is to differentiate between Entities and Values. Entities are objects where identity matters - you may have two guys, both named "Chris", but they are two different objects and remain distinct despite having the same "key". Values, on the other hand, act like ints. From your above list, Position sounds a lot like a value - Position(0,0) is Position(0,0) regardless of which chunk of memory (identity) those bits are stored in. The distinction has a big effect on how you compare and store values vs. entities. For example, your Guy objects (entities) would store their Position as a simple member variable.
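In code, the split looks something like this (a sketch; the names follow the question):

    using System;

    // A value: equality is by contents, so Position(0,0) equals Position(0,0)
    // no matter where the bits live.
    public struct Position
    {
        public readonly int X, Y;
        public Position(int x, int y) { X = x; Y = y; }

        public override bool Equals(object obj)
        {
            if (!(obj is Position)) return false;
            var other = (Position)obj;
            return other.X == X && other.Y == Y;
        }
        public override int GetHashCode() { return (X * 397) ^ Y; }
    }

    // An entity: identity matters, so two guys named "Chris" stay distinct.
    public class Guy
    {
        public readonly Guid Id = Guid.NewGuid();
        public string Name;
        public Position Position;   // the value is a plain member variable
    }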
I've found that a great reference for how to think about such things is Eric Evans's "Domain-Driven Design" book. He's focused on business systems, but the discussions are very valuable for how you think about building OO systems in general.
I would say that no 'true' answer exists to your core question of the best way to manage collections of entities that are linked in multiple ways. It really depends on the kind of application (simulation); here are some thoughts:
Is execution time important?
If this is the case, there is really no way around analyzing how your simulator will iterate over (query) the objects from the pool: sketch out the basic simulation loop and check what kinds of events will require iterating over what kinds of model entities (I assume you are developing a discrete-event simulation?). Then you should organize the data structures in a way that optimizes the most frequent/time-consuming events (as opposed to "laying down the connections arbitrarily"). Additionally, you may want to use special data structures (such as k-d trees) to organize entities with properties that you need to query often (e.g., position data). For some typical problems, e.g. collision detection, there are also a whole lot of approaches for solving them efficiently (so look for suitable libraries/frameworks, e.g. for multi-agent simulation).
How flexible do you want to make it?
If you really want to make it super-flexible and really don't want to decide on the hierarchy of the model entities, why not just use an in-memory database? As you already said, databases are easily applicable to your problem (and you can easily save the model state, which may also be useful).
How clean is clean enough?
If you want to be absolutely sure that the rest of your simulator is not affected by the design choices you make regarding your model representation, hide it behind an interface (say, ModelWorld) which defines methods for all the types of queries your simulator may invoke (this is orthogonal to the second point and may help with the first, i.e. figuring out what kind of access pattern your simulator exhibits). This allows you to change implementations easily, without affecting any other parts of the simulator code.
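A sketch of that boundary (ModelWorld is the name used above; the query methods are examples of what a simulation loop might need, with empty stubs standing in for the question's model classes):

    using System.Collections.Generic;

    public class Guy { }
    public struct Position { }
    public class Tribe { }
    public class Bot { }

    // A naive first implementation can scan plain lists; a later one can use
    // a k-d tree for positions or per-tribe dictionaries. The simulator,
    // coded against this interface, never notices the change.
    public interface ModelWorld
    {
        IEnumerable<Guy> GuysAt(Position position);
        IEnumerable<Guy> GuysOfTribe(Tribe tribe);
        IEnumerable<Guy> GuysControlledBy(Bot bot);
    }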
Most of what I hear about NHibernate's lazy loading is that it's better to use it than not to use it. It seems to make sense to minimize database access in an effort to reduce bottlenecks. But few things come without trade-offs; it certainly limits design slightly by forcing you to have virtual properties. And I've also noticed that some developers turn lazy loading off on certain often-used objects.
This makes me wonder if there are some definite situations where data-access performance is hurt by using lazy-loading.
So I wonder, when and in what situations should I avoid lazy-loading one of my NHibernate-persisted objects?
Is the downside to lazy-loading merely additional processing time, or can NHibernate lazy-loading also increase the data-access time (for instance, by making additional round-trips to the database)?
Thanks!
There are clear performance tradeoffs between eager and lazy loading objects from a database.
If you use eager loading, you suck a ton of data in a single query, which you can then cache. This is most common on application startup. You are trading memory consumption for database round trips.
If you use lazy loading, you suck in a minimal amount of data with each query, but any time you need more information related to that initial data, it requires more queries to the database, and database hits are very often the major performance bottleneck in most applications.
So, in general, you always want to retrieve exactly the data you will need for the entire "unit of work", no more, no less. In some cases, you may not know exactly what you need (because the user is working through a wizard or something similar) and in that case it probably makes sense to lazy load as you go.
If you are using an ORM and focused on adding features quickly and will come back and optimize performance later (which is extremely common and a good way to do things), having lazy loading being the default is the correct way to go. If you later find (through performance profiling/analysis) that you have one query to get an object and then N queries to get the N objects related to that original object, you can change that piece of code to use eager loading to only hit the database once instead of N+1 times (the N+1 problem is a well known downside of using lazy loading).
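For example, with NHibernate's LINQ provider (available from NHibernate 3.0; Order/Items are hypothetical mapped classes, and session is an open ISession):

    using NHibernate.Linq;   // brings in Query<T>() and the Fetch/FetchMany extensions

    // Lazy (the default): 1 query here, then 1 more per order the first
    // time order.Items is touched - the N+1 pattern.
    var orders = session.Query<Order>().ToList();

    // Eager, for this one hot path: a single round trip brings each
    // order's Items along.
    var ordersWithItems = session.Query<Order>()
                                 .FetchMany(o => o.Items)
                                 .ToList();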
The usual tradeoff for lazy loading is that you make a smaller hit on the database up front, but you end up making more hits on it long-term.
Without lazy loading, you'll grab an entire object graph up front, sucking down a large chunk of data at once. This could, potentially, cause lag in your UI, and so it is often discouraged. However, if you have a common object graph (not just a single object - otherwise it wouldn't matter!) that you know will be accessed frequently, and top to bottom, then it makes sense to pull it down at once.
As an example, if you're doing an order management system, you probably won't pull down all the lines of every order, or all the customer information, on a summary screen. Lazy loading prevents this from happening.
I can't think of a good example for not using it offhand, but I'm sure there are cases where you'd want to do a big load of an object graph, say, on application initialization, in order to avoid lags in processing further down the line.
The short version is this:
Development is simpler if you use lazy loading. You just traverse object relationships in a natural OO way, and you get what you need when you ask for it.
Performance is generally better if you figure out what you need before you ask for it, and ask for it in one trip to the database.
For the past few years we've been focusing on quick development times. Now that we have a solid app and userbase, we're optimizing our data access.
If you are using a web service between the client and the server to handle database access with NHibernate, lazy loading can be problematic: the object will be serialized and sent over the web service, and any subsequent use of "objects" further down the object graph needs a new trip to the database server through additional web service calls. In such a case it might not be a good idea to use lazy loading. A word of caution: be careful about what you fetch if you turn lazy loading off - it's way too easy not to think it through and end up fetching almost the whole database...
I have seen many performance problems arising from the wrong loading-behaviour configuration in Hibernate. The situation is much the same with NHibernate, I think. My recommendation is to always map relations as lazy, and then use eager fetching statements in your queries, like fetch joins. This ensures you are not loading too much data, and you can avoid too many SQL queries.
It is easy to make a lazy relation eager in a query. It is nearly impossible the other way round.
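For example, keeping the relation lazy in the mapping and opting into eagerness per query with an HQL fetch join (Order/Items are hypothetical mapped classes; session is an open ISession):

    using NHibernate.Transform;   // Transformers.DistinctRootEntity

    // The mapping stays lazy; only this query pulls the Items eagerly.
    // DistinctRootEntity collapses the duplicate Order rows that a
    // collection fetch join produces.
    var orders = session
        .CreateQuery("from Order o left join fetch o.Items")
        .SetResultTransformer(Transformers.DistinctRootEntity)
        .List<Order>();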