Dealing with large class inheritance hierarchies in NHibernate - nhibernate

My model looks like this:
InsurancePolicy
    VehicleInsurancePolicy
        AbcInsurancePolicy
        DefInsurancePolicy
    HomeInsurancePolicy
        GhiInsurancePolicy
        PqrInsurancePolicy
    SomeOtherInsurancePolicy
    ... etc
where InsurancePolicy is an abstract class which is the base class for all concrete implementations of insurance policies. AbcInsurancePolicy, DefInsurancePolicy, etc. are implementations which correspond to specific insurance products. Sometimes I define other abstract classes for subgroups of policies with a subset of common fields (like VehicleInsurancePolicy).
I mapped these classes using a "Table per subclass, using a discriminator" strategy. The InsurancePolicy table contains about 60 fields, and each joined table adds from 10 to 30 fields. I used this strategy because:
I have a lot of subclasses with a lot of fields. A table-per-class-hierarchy strategy would end up with a single table with a lot of null columns.
I want to be able to extend the application by adding other subclasses without changing the schema of the InsurancePolicy table.
The InsurancePolicy is used often as a many-to-one relationship in other entities like Payment, Document etc.
NHibernate generates a lot of left outer joins when querying for InsurancePolicy because it doesn't know the concrete type. This is very inefficient as I have a lot of tables to join. The problem becomes even worse when lazy-loading many-to-one properties containing an InsurancePolicy, because it is used quite a lot in my model. The concrete implementations are used rarely, only in edit/details scenarios where the actual type is specified and only the needed tables are joined.
Then I used a combination of discriminator + join. Thus the InsurancePolicy table contains the information about the type. Unfortunately a "join" mapping doesn't support lazy-loading. I tried setting fetch="select"; however, this generates N+1 selects when querying for multiple insurance policies.
// select from 1 table, "join" class must be lazy-loaded on access
Session.Get<InsurancePolicy>(5)
// select includes a join, since we explicitly specified a concrete type
Session.Get<SomeConcreteInsurancePolicy>(5)
So my questions are:
Is there a way to extend NHibernate to make it work like above?
Is there another way of mapping these large / complex class hierarchies?

Based on this:
The concrete implementations are used rarely, only in edit/details scenarios
I recommend that you break up InsurancePolicy in two:
InsurancePolicy, containing only the properties from the current base class
PolicyDetails, an abstract base class for the hierarchy.
There's a one-to-one relationship between those two classes.
The beauty of this is that you don't have to change anything else (except for a minor change in the policy edit views, to point them to the new relationship).
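As a rough illustration of the split suggested above, here is a minimal sketch that uses SQLite from Python in place of the NHibernate mappings. The table and column names (insurance_policy, policy_details, abc_policy_details, policy_number, vehicle_plate) are hypothetical and not taken from the question. The point is only that loading a policy touches one table, while the details hierarchy is joined on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Slim base data that lists and many-to-one references need.
    CREATE TABLE insurance_policy (
        id INTEGER PRIMARY KEY,
        policy_number TEXT NOT NULL
    );
    -- Root of the details hierarchy, one row per policy (one-to-one).
    CREATE TABLE policy_details (
        policy_id INTEGER PRIMARY KEY REFERENCES insurance_policy(id),
        details_type TEXT NOT NULL
    );
    -- One joined table per concrete details type (hypothetical example).
    CREATE TABLE abc_policy_details (
        policy_id INTEGER PRIMARY KEY REFERENCES policy_details(policy_id),
        vehicle_plate TEXT
    );
    INSERT INTO insurance_policy VALUES (5, 'P-0005');
    INSERT INTO policy_details VALUES (5, 'abc');
    INSERT INTO abc_policy_details VALUES (5, 'XYZ-123');
""")

def get_policy(policy_id):
    """Load the slim policy row; touches a single table, no joins."""
    return conn.execute(
        "SELECT id, policy_number FROM insurance_policy WHERE id = ?",
        (policy_id,),
    ).fetchone()

def get_abc_details(policy_id):
    """Load details only for edit/details screens, where the type is known."""
    return conn.execute(
        """SELECT d.details_type, a.vehicle_plate
           FROM policy_details d
           JOIN abc_policy_details a ON a.policy_id = d.policy_id
           WHERE d.policy_id = ?""",
        (policy_id,),
    ).fetchone()

policy = get_policy(5)
details = get_abc_details(5)
```

Entities such as Payment or Document would reference insurance_policy only, so resolving those references never needs the joined detail tables.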

Related

Core Data ordered many-to-many relationships

Using Core Data, I have two entities that have many-to-many relationships. So:
Class A <<---->> Class B
Both relationships are set up as 'ordered' so I can track their order in a UITableView. That works fine, no problem.
I am about to try to implement iCloud with this Core Data model, and I found out that iCloud doesn't support ordered relationships, so I need to reimplement the ordering somehow.
I've done this with another entity that has a one-to-many relationship with no problem: I add an 'order' attribute to the entity and store its order information there. But with a many-to-many relationship I need an unknown number of order attributes.
I can think of two solutions, neither of which seems ideal to me, so maybe I'm missing something:
Option 1. I add an intermediary entity. This entity has a one-to-many relationship with both entities like so:
Class A <<--> Class C <-->> Class B
That means I can have the single order attribute in this helper entity.
Option 2. Instead of an order attribute that stores a single order number, I store a dictionary that holds as many order numbers as I need, probably with the corresponding object (ID?) as the key and the order number as the value.
I'm not necessarily looking for any code so any thoughts or suggestions would be appreciated.
I think your option 1, employing a "join table" with an order attribute is the most feasible solution for this problem. Indeed, this has been done many times in the past. This is exactly the case for which you would use a join table in Core Data although the framework already gives you many-to-many relationships: if you want to store information about the relationship itself, which is precisely your case. Often these are timestamps, in your case it is a sequence number.
You state: "...solutions, neither of which seem ideal to me". To me, the above seems indeed "ideal". I have used this scheme repeatedly with great performance and maintainability.
The only problem (though it is the same as with a to-one relationship) is that when inserting an item out of sequence you have to update many entities to get the order right. That seems cumbersome and could potentially harm performance. In practice, however, it is quite manageable and performs rather well.
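To make the renumbering cost concrete, here is a small sketch in plain Python rather than Core Data. The Membership class, its field names and the insert_at helper are hypothetical stand-ins for the join entity and its order attribute. It assumes only one side of the relationship is ordered; ordering both sides would need a second order attribute on the same entity.

```python
from dataclasses import dataclass

@dataclass
class Membership:
    """Hypothetical join entity between A and B that carries the order."""
    a_id: str
    b_id: str
    order: int

def insert_at(memberships, a_id, b_id, position):
    """Insert b_id at `position` in a_id's list, shifting later items."""
    for m in memberships:
        if m.a_id == a_id and m.order >= position:
            m.order += 1  # every later item has to be renumbered
    memberships.append(Membership(a_id, b_id, position))

def ordered_b_ids(memberships, a_id):
    """Return a_id's related B ids sorted by the stored order."""
    rows = [m for m in memberships if m.a_id == a_id]
    return [m.b_id for m in sorted(rows, key=lambda m: m.order)]

memberships = []
insert_at(memberships, "a1", "b1", 0)
insert_at(memberships, "a1", "b2", 1)
insert_at(memberships, "a1", "b3", 1)  # out of sequence: b2 moves to 2
insert_at(memberships, "a2", "b2", 0)  # b2 has its own position under a2
```

Inserting in the middle updates every later membership of the same A object, which is the cost mentioned above; memberships of other A objects are untouched.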
NB: As for arrays or dictionaries to be stored with the entity to keep track of ordering information: this is possible via so-called "transformable" attributes, but the overhead is daunting. These attributes have to be serialized and deserialized, and in order to retrieve one sequence number you have to get all of them. Hardly an attractive design choice.
For more than 10 years before we had ordered relationships, everyone used a "helper" entity. So that is what you should do.
Additional note 1: This is not really a "helper" entity. It is an entity that models a fact in your model. In my books I always use the same example:
You have a group entity with members. Every member can belong to many groups. The "helper" entity is nothing other than the membership.
Additional note 2: It is hard to synchronize such an ordered relationship. This is why it is not done automatically. However, you have to do it. Since CD and synchronizing is no fun, CD and synchronizing a model with ordered relationships is less than no fun.

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
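The all-or-some matching described above can be sketched independently of the storage choice. This is a minimal in-memory illustration, assuming each object is a plain dictionary of attributes and each filter is an exact attribute/value pair; the function names and sample data are hypothetical.

```python
def match_score(obj, filters):
    """Count how many filters the object satisfies."""
    return sum(1 for attr, wanted in filters.items() if obj.get(attr) == wanted)

def search(objects, filters, require_all=True):
    """Return (score, object) pairs, best matches first."""
    scored = [(match_score(o, filters), o) for o in objects]
    minimum = len(filters) if require_all else 1
    hits = [(s, o) for s, o in scored if s >= minimum]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)

objects = [
    {"name": "x", "color": "red", "size": "L"},
    {"name": "y", "color": "red"},              # has no "size" attribute
    {"name": "z", "color": "blue", "size": "S"},
]
filters = {"color": "red", "size": "L"}
strict = search(objects, filters)                      # must match all filters
partial = search(objects, filters, require_all=False)  # at least one match
```

An object that lacks a filtered attribute simply scores lower, so common and subset attributes are treated the same way.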
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a reasonably well-designed database schema in a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response times from storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have a few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is to find a base (parent) object type for it that has a subset of its attributes, and then add the remaining attributes on top (extending the base object type).
Once you get used to thinking this way, the next thing to consider is the inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe them here.
You can hold inheritance chain in the database by following one of the 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, and it must be faster than a join for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table so easy to search all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
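The three layouts can be compared side by side with a small SQLite sketch. The item/car/house types and the st_, ct_ and cl_ table prefixes are made up for illustration; the queries show what a search over all types looks like in each layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1. Single table inheritance: one table, a type column, nullable extras.
    CREATE TABLE st_item (id INTEGER PRIMARY KEY, type TEXT, name TEXT,
                          wheels INTEGER, rooms INTEGER);
    INSERT INTO st_item VALUES (1, 'car', 'Mini', 4, NULL),
                               (2, 'house', 'Villa', NULL, 6);

    -- 2. Concrete table inheritance: one full table per type.
    CREATE TABLE ct_car (id INTEGER PRIMARY KEY, name TEXT, wheels INTEGER);
    CREATE TABLE ct_house (id INTEGER PRIMARY KEY, name TEXT, rooms INTEGER);
    INSERT INTO ct_car VALUES (1, 'Mini', 4);
    INSERT INTO ct_house VALUES (2, 'Villa', 6);

    -- 3. Class table inheritance: base table plus one table per child type.
    CREATE TABLE cl_item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cl_car (id INTEGER PRIMARY KEY REFERENCES cl_item(id),
                         wheels INTEGER);
    CREATE TABLE cl_house (id INTEGER PRIMARY KEY REFERENCES cl_item(id),
                           rooms INTEGER);
    INSERT INTO cl_item VALUES (1, 'Mini'), (2, 'Villa');
    INSERT INTO cl_car VALUES (1, 4);
    INSERT INTO cl_house VALUES (2, 6);
""")

def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# Searching all types by a common attribute:
single = names("SELECT name FROM st_item ORDER BY name")
concrete = names("SELECT name FROM ct_car UNION ALL "
                 "SELECT name FROM ct_house ORDER BY name")
class_table = names("SELECT name FROM cl_item ORDER BY name")

# Searching on a child attribute under class table inheritance needs a join.
four_wheeled = names("SELECT i.name FROM cl_item i "
                     "JOIN cl_car c ON c.id = i.id WHERE c.wheels = 4")
```

All three return the same rows; what differs is whether the query needs a type column, a UNION, or a join.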
Which one to choose?
It's a trade-off obviously. If you expect to have many types of objects added, I would go with Concrete table inheritance that gives reasonable query and scaling options. Class table inheritance seems to be not very friendly with fast queries and scalability. Single table inheritance seems to work with small number of types better.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value (or name-value pair) logic on the web... it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes... the same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least one third of them are null). Adding a new attribute means altering the table to add another column and coming up with a script to populate it for existing rows, or just leaving it null for all of them. Not the most fun, and it can be a headache.
The alternative to this is a name-value pair setup. You want a 'header' table to hold the common values amongst your products (like name or price... things that all products always have). In our example above, you will notice that attribute 'a' is used on every record... this means attribute 'a' can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product and assigns an ID to each. We'll call the table 'attribute', with attr_id for a key. It is rather straightforward: each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
I believe there is a wiki article for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply a matter of figuring out the best way to pivot out your data (I'd recommend Postgres as an open-source DB option here).
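The three-table layout described above can be sketched with SQLite as follows. The table names follow the answer (header, attribute, detail); the value column, the find_products and pivot helpers, and the filter query are assumptions added for illustration. The filter query keeps only products whose detail rows satisfy every requested attribute/value pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE header (header_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE attribute (attr_id INTEGER PRIMARY KEY,
                            attribute_name TEXT NOT NULL, notes TEXT);
    CREATE TABLE detail (
        header_id INTEGER NOT NULL REFERENCES header(header_id),
        attr_id INTEGER NOT NULL REFERENCES attribute(attr_id),
        value TEXT,
        PRIMARY KEY (header_id, attr_id)
    );
    INSERT INTO header VALUES (1, 'prd1'), (2, 'prd2'), (3, 'prd3');
    INSERT INTO attribute VALUES
        (1, 'b', 'the length of time the product takes to install'),
        (2, 'c', 'spare part required'),
        (3, 'd', NULL);
    INSERT INTO detail VALUES
        (1, 1, '5 mins'), (1, 2, 'needs spare jack'),
        (2, 3, 'misc text'), (3, 1, '15 mins');
""")

def find_products(filters):
    """Return names of products matching every (attribute, value) filter."""
    if not filters:
        return [r[0] for r in conn.execute("SELECT name FROM header ORDER BY name")]
    clause = " OR ".join(["(a.attribute_name = ? AND d.value = ?)"] * len(filters))
    params = [p for pair in filters.items() for p in pair]
    sql = f"""
        SELECT h.name
        FROM header h
        JOIN detail d ON d.header_id = h.header_id
        JOIN attribute a ON a.attr_id = d.attr_id
        WHERE {clause}
        GROUP BY h.header_id, h.name
        HAVING COUNT(*) = ?
        ORDER BY h.name
    """
    return [r[0] for r in conn.execute(sql, params + [len(filters)])]

def pivot(header_id):
    """Pivot one product's detail rows into an attribute -> value dict."""
    rows = conn.execute(
        """SELECT a.attribute_name, d.value
           FROM detail d JOIN attribute a ON a.attr_id = d.attr_id
           WHERE d.header_id = ?""",
        (header_id,),
    )
    return dict(rows)
```

For example, find_products({"b": "5 mins", "c": "needs spare jack"}) selects prd1 only, and pivot(1) turns prd1's rows back into one record. Dropping the HAVING clause and selecting COUNT(*) instead would give a partial-match score per product.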

Is there a general term for objects that map exactly to data tables?

I'm wondering if there's a general term used for objects that map exactly to data tables? E.g., user and article objects could map directly to user and article tables in a DB, with each DB field corresponding to a class variable...
They are referred to as entities in the JPA specification.
They are usually called entities, but entities in general don't need to map 1:1 to DB tables. However, what you describe is known as the Active Record pattern.
Also, please note that there is very rarely an exact 1:1 mapping between object model and DB:
many-to-many relationships are usually implemented with a third table in the DB but are usually mapped to only 2 classes with direct associations in the object model (if the relation doesn't have additional attributes)
class inheritance can be modeled in 3 different ways in the DB, with 1, N or N + 1 tables
ternary relationships use 3 tables in the DB, but can be modeled with parameterized properties in the object model

Table Per Subclass Vs Table Per concrete class in hibernate?

In most web applications, I have seen one base class containing the common properties and a number of subclasses extending the base class. So my question here is which strategy we should go for: Table Per Subclass or Table Per Concrete Class. I personally feel we should go for table per subclass because in future if we want to introduce the common column we can do it at one place but in case of concrete class we have to do it in multiple tables. Right?
But if we want to fetch all details from all child tables, I think table per concrete class will be helpful, because we simply have to union the records from all tables, whereas with table per subclass we also have to join with the parent table, which is costlier. Right?
You might be interested in Section 2.12 "Inheritance Mapping Strategies" of the JPA 2.0 specification, as it sums up all possible inheritance types as well as their advantages and drawbacks. Let me pull out just the most interesting fragments:
2.12.1 Single Table per Class Hierarchy Strategy
This mapping strategy provides good support for polymorphic
relationships between entities and for queries that range over the
class hierarchy. It has the drawback, however, that it requires that
the columns that correspond to state specific to the subclasses be
nullable.
2.12.3 Table per Concrete Class Strategy
This strategy has the following drawbacks:
- It provides poor support for polymorphic relationships.
- It typically requires that SQL UNION queries (or a separate SQL query per subclass) be issued for queries that are intended to range over the class hierarchy.
2.12.2 Joined Subclass Strategy
It has the drawback that it requires that one or more join operations
be performed to instantiate instances of a subclass. In deep class
hierarchies, this may lead to unacceptable performance. Queries that
range over the class hierarchy likewise require joins.
Also, if you're planning to be JPA-compatible, remember that the JPA-provider doesn't have to support TABLE_PER_CLASS strategy type.
I personally feel we should go for table per subclass because in
future if we want to introduce the common column we can do it at one
place but in case of concrete class we have to do it in multiple
tables.
True, but the JOINED strategy also gives you the same benefit and lets you specify common properties in one table.
Hope that helps!
The Object-Relational Impedance Mismatch
In the object model, we may need to use inheritance (generalization) when designing classes.
In the relational model, this generalization (as opposed to an association such as one-to-one or many-to-many) can be achieved in Hibernate ORM with the following three inheritance mapping strategies:
Table Per Class Hierarchy, i.e. only one table for the whole hierarchy
Table Per Concrete Class, i.e. one table for each concrete class, none for the superclass
Table Per Subclass, i.e. one table for each class
Table Per Class Hierarchy: in this strategy, we map the whole hierarchy into a single table, using one additional discriminator column, e.g. TYPE.
Table Per Concrete Class: in this strategy, a table is created for each concrete class, and each table repeats the columns inherited from the superclass.
Table Per Subclass: in this strategy, tables are created per class but related by foreign key, so there are no duplicate columns.

ORM question - JPA

I'm reading Pro JPA 2. The book begins by talking about ORM in the first few pages.
It talks about mapping a single Java class named Employee with the following instance variables: id, name, startDate, salary.
It then goes on to the issue of how this class can be represented in a relational database and suggests the following scheme.
table A: emp
    id - primary key
    startDate
table B: emp_sal
    id - primary key in this table, which is also a foreign key referencing the 'id' column in table A.
It thus seems to suggest that persisting an Employee instance to the database would require operations on two (multiple) tables.
Should the Employee class have an instance variable 'salary' in the first place?
I think it should possibly belong to a separate class (Class salary maybe?) representing salary and thus the example doesn't seem very intuitive.
What am I missing here?
First, the author explains that there are multiple ways to represent a class in a database: sometimes the mapping of a class to a table is straightforward, sometimes you don't have a direct correspondence between attributes and columns, and sometimes a single class is represented by multiple tables:
In scenario (C), the EMP table has
been split so that the salary
information is stored in a separate
EMP_SAL table. This allows the
database administrator to restrict
SELECT access on salary information to
those users who genuinely require it.
With such a mapping, even a single
store operation for the Employee class
now requires inserts or updates to two
different tables.
So even storing the data from a single class in a database can be a challenging exercise.
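Scenario (C) can be illustrated with a small sketch in Python and SQLite rather than JPA. The emp and emp_sal tables follow the quoted scheme; the name, start_date and salary columns, the save and load helpers, and the sample data are assumptions added for illustration.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Employee:
    id: int
    name: str
    start_date: str
    salary: int

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, start_date TEXT);
    CREATE TABLE emp_sal (id INTEGER PRIMARY KEY REFERENCES emp(id),
                          salary INTEGER);
""")

def save(employee):
    """Persist one object; the mapping forces writes to two tables."""
    with conn:  # one transaction, so both rows are stored or neither is
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)",
                     (employee.id, employee.name, employee.start_date))
        conn.execute("INSERT INTO emp_sal VALUES (?, ?)",
                     (employee.id, employee.salary))

def load(employee_id):
    """Rebuild the single object by joining the two tables."""
    row = conn.execute(
        """SELECT e.id, e.name, e.start_date, s.salary
           FROM emp e JOIN emp_sal s ON s.id = e.id
           WHERE e.id = ?""",
        (employee_id,),
    ).fetchone()
    return Employee(*row) if row else None

save(Employee(1, "Ann", "2010-01-15", 50000))
loaded = load(1)
```

The class keeps salary as an ordinary field; only the mapping layer knows that it lives in a separate table, which is the kind of work an ORM takes over.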
Then, he describes how relationships are different. At the object level model, you traverse objects via their relations. At the relational model level, you use foreign keys and joins (sometimes via a join table that doesn't even exist at the object model level).
Inheritance is another "problem" and can be "simulated" in various ways at the relational model level: you can map an entire hierarchy into a single table, you can map each concrete class to its own table, you can map each class to its own table.
In other words, there is no direct and unique correspondence between an object model and a relational model. The two rely on different paradigms and the fit is not perfect. The difference between them is known as the impedance mismatch, which is something ORMs have to deal with (allowing the mapping between an object model and the many possible representations in a relational model). And this is what the whole section you're reading is about. This is also what you missed :)