NHibernate queries return multiple copies of the target objects

I have two classes mapped in NHibernate: Dragon and its subclass FierceDragon, with a few FierceDragons stored in a table called Dragons. When I run an HQL query like "from Dragon" I get back two objects per row: the expected FierceDragon and an ordinary Dragon that's a copy of the FierceDragon (insofar as is possible; naturally it lacks the FierceDragon's extra Ferocity and TimeSinceLastMeal properties). In particular, their IDs are identical. When I do "from FierceDragon" I get only FierceDragons, with no extra copies, but that won't work for me in general.
Why does this happen, and how can I prevent it?

If you create your mapping correctly, that should not pose a problem.
There are 3 different ways of mapping a class-hierarchy to the DB using NHibernate.
You can map both Dragon and FierceDragon to the same table, but in that case your table needs some nullable columns to be able to store the additional properties of FierceDragon. Since you're talking about one table, I suppose you want to use the 'table per class hierarchy' mapping strategy?
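For that strategy, NHibernate needs a discriminator column so it can tell which concrete class each row holds. A rough hbm.xml sketch (the Name property and the DragonType column are illustrative guesses, not taken from the question):

<class name="Dragon" table="Dragons" discriminator-value="Dragon">
  <id name="Id">
    <generator class="native" />
  </id>
  <!-- tells NH which class each row belongs to -->
  <discriminator column="DragonType" type="String" />
  <property name="Name" />
  <subclass name="FierceDragon" discriminator-value="Fierce">
    <property name="Ferocity" />
    <property name="TimeSinceLastMeal" />
  </subclass>
</class>

With something like this in place, "from Dragon" returns each row exactly once, instantiated as the class named by its DragonType value.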

The mystery is solved; I thought I was only mapping FierceDragon, but no, I was mapping Dragon too, both to the table Dragons. Not sure why NH did that particular thing in this case, but clearly the fix is to, you know, not map separate classes to the same table. Or if you do, at least give NH some way of distinguishing between the two in the DB (such as a discriminator column).

Related

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is well outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database, but the more I learn about NoSQL, the more I believe it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a relatively well-designed database schema on a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead, look into using Solr/Lucene as your full-text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data, provided you have designed your Solr schema correctly.
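To give a rough idea of what that looks like (these are the stock Solr example field names, not anything from your project), schema.xml can declare wildcard dynamic fields, so a document can gain a new attribute such as color_s or weight_i without any schema change:

<!-- any incoming field ending in _s is indexed as a string, _i as an integer -->
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<dynamicField name="*_i" type="int" indexed="true" stored="true" />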
I'm not an expert in NoSQL, so I will not be advocating it. However, I have few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base/super (parent) object type for it that has a subset of the attributes, and then add on top of them (extending the base object type).
Once you get used to thinking like that, the next thing is inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe them here.
You can hold an inheritance chain in the database in one of three ways:
1 - Single table inheritance: the whole inheritance chain is in one table, so all new types of objects go into the same table.
Advantages: your search query has only one table to search, which should be faster than a join, for example.
Disadvantages: the table grows faster than with option 2, for example; you have to add a type column that says what type of object each row is; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: a separate table for each new type of object.
Advantages: if a search affects only one type, you search only one table at a time; each table grows more slowly than in option 1, for example.
Disadvantages: you need a union of queries if searching several types at the same time.
3 - Class table inheritance: one table for the base type with its attributes only, plus an additional table of extra attributes for each child object type. The child tables refer to the base table with PK/FK relations.
Advantages: all types are present in the one base table, so it's easy to search them all together on common attributes.
Disadvantages: the base table grows fast because it holds part of every child's data too; you need joins to search all types of objects with all their attributes.
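To make the trade-offs concrete, here is a rough SQL sketch of the three layouts (all table and column names are illustrative only):

-- 1. Single table inheritance: one table, a type column, nullable subtype columns
CREATE TABLE objects (
    id          INT PRIMARY KEY,
    object_type VARCHAR(30) NOT NULL,  -- says which type each row is
    name        VARCHAR(100),          -- common attribute
    attr_b      VARCHAR(100),          -- NULL unless the row is of type B
    attr_c      VARCHAR(100)           -- NULL unless the row is of type C
);

-- 2. Concrete table inheritance: one complete table per concrete type
CREATE TABLE objects_b (id INT PRIMARY KEY, name VARCHAR(100), attr_b VARCHAR(100));
CREATE TABLE objects_c (id INT PRIMARY KEY, name VARCHAR(100), attr_c VARCHAR(100));

-- 3. Class table inheritance: a base table plus per-subtype extension tables
CREATE TABLE objects_base (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE objects_b_ext (id INT PRIMARY KEY REFERENCES objects_base (id), attr_b VARCHAR(100));
CREATE TABLE objects_c_ext (id INT PRIMARY KEY REFERENCES objects_base (id), attr_c VARCHAR(100));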
Which one to choose?
It's a trade-off, obviously. If you expect many new types of objects to be added, I would go with concrete table inheritance, which gives reasonable query and scaling options. Class table inheritance doesn't seem very friendly to fast queries or to scaling. Single table inheritance seems to work better with a small number of types.
Your call, my friend!
May as well make this an answer. I should say up front that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value logic on the web... it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes... the same theory works for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (and in this setup at least a third of them are null). Adding a new attribute means altering the table to add another column, then either coming up with a script to populate existing rows or leaving the new column null for all of them. Not the most fun; it can be a headache.
The alternative to this is a name-value pair setup. You want a 'header' table to hold the values common amongst your products (like name or price... things that all products always have). In our example above, you'll notice that attribute 'a' is used on every record... that means attribute 'a' can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product, assigning an ID to each. We'll call the table 'attribute', with 'attr_id' for a key. Rather straightforward; each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
The final table is a mapping table that actually holds the info. It has your product ID, the attribute ID, and then the value. It's normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, attribute label, value? Any future product can have any combination of attributes stored in this table. Adding a new attribute means adding one line to the attribute table and then populating the detail table as needed.
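Pulling the three tables together, a rough DDL sketch (names follow the example above; the data types are guesses):

CREATE TABLE header (
    header_id INT PRIMARY KEY,
    name      VARCHAR(100),
    price     DECIMAL(10, 2),
    attr_a    VARCHAR(100)     -- 'a' is common to every product, so it lives here
);

CREATE TABLE attribute (
    attr_id        INT PRIMARY KEY,
    attribute_name VARCHAR(50),
    notes          VARCHAR(255)
);

CREATE TABLE detail (
    header_id INT REFERENCES header (header_id),
    attr_id   INT REFERENCES attribute (attr_id),
    value     VARCHAR(255),    -- values like '5 mins' force a string type here
    PRIMARY KEY (header_id, attr_id)
);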
I believe there is a Wikipedia article on it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply a matter of figuring out the best methodology to pivot out your data (I'd recommend Postgres as an open-source DB option here).

How can one delete an entity in nhibernate having only its id and type?

I am wondering how can one delete an entity having just its ID and type (as in mapping) using NHibernate 2.1?
If you are using lazy loading, Load only creates a proxy.
session.Delete(session.Load(type, id));
With NH 2.1 you can use HQL. I'm not sure exactly what it looks like, but something like this. Note that this is subject to SQL injection; if possible, use parameterized queries with SetParameter() instead:
session.Delete(string.Format("from {0} where id = {1}", type, id));
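If bulk HQL (ExecuteUpdate) is available in your NH 2.1 build, a sketch of the parameterized form might look like this. The entity name cannot be bound as a parameter, but the id can, which removes the injection risk, and ExecuteUpdate runs the delete without loading the entity:

// entity name still has to be concatenated; only values can be parameters
session.CreateQuery("delete from " + type + " where id = :id")
       .SetParameter("id", id)
       .ExecuteUpdate();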
Edit:
For Load, you don't need to know the name of the Id column.
If you need to know it, you can get it by the NH metadata:
sessionFactory.GetClassMetadata(type).IdentifierPropertyName
Another edit.
session.Delete() instantiates the entity
When using session.Delete(), NH loads the entity anyway. At first I didn't like that; then I realized the advantages. If the entity is part of a complex structure using inheritance, collections, or "any"-references, it is actually more efficient.
For instance, if classes A and B both inherit from Base, NH doesn't try to delete data in table B when the actual entity is of type A. This wouldn't be possible without loading the actual object, and it is particularly important when there are many inherited types which each consist of many additional tables.
The same situation arises when you have a collection of Bases which happen to all be instances of A. When loading the collection into memory, NH knows that it doesn't need to remove any B-stuff.
If entity A has a collection of Bs, which contain Cs (and so on), NH doesn't try to delete any Cs when the collection of Bs is empty. This is only possible by reading the collection, and is particularly important when C is complex in its own right, aggregating even more tables.
The more complex and dynamic the structure is, the more efficient it is to load actual data rather than delete it "blindly".
HQL Deletes have pitfalls
HQL deletes do not load data into memory, but they aren't that smart. They basically translate the entity name to the corresponding table name and delete from that table, plus some aggregated collection data.
In simple structures this may work well and efficiently. In complex structures, not everything is deleted, leading to constraint violations or "database memory leaks".
Conclusion
I also tried to optimize deletion with NH. I gave up in most cases, because NH is still smarter: it "just works" and is usually fast enough. One of the most complex deletion algorithms I wrote analyzed NH mapping definitions and built delete statements from them. And, no surprise, it was not possible without reading data from the database before deleting. (I did reduce it to loading only primary keys.)

Inheritance in Database Design

I am designing a new laboratory database with MANY types of my main entities.
The table for each entity will hold fields common to ALL types of that entity (entity_id, created_on, created_by, etc). I will then use concrete inheritance (separate table for each unique set of attributes) to store all remaining fields.
I believe that this is the best design for the standard types of data which come through the laboratory daily. However, we often have special samples which are accompanied by specific values the originator wants stored.
Question: How should I model special (non-standard) types of entities?
Option 1: Use entity-value for special fields
One table (entity_id, attribute_name, numerical_value) would hold all data for any special entity.
+ Fewer tables.
- Cannot enforce requiring a particular attribute.
- Must convert (pivot) rows to columns, which is inefficient (see the sketch after this list).
Option 2: Strict concrete inheritance.
Create a separate table for each special case.
+ Follows the same rules as everything else.
- Overhead of many tables with only a few rows.
Option 3: Concrete inheritance with special tables under a different user.
Put all special tables under a different user.
+ Keeps all special and standard tables separate.
+ Easier to search for common standard table in a list without searching through all special tables.
- Overhead of many tables with only a few rows.
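To illustrate the pivot cost called out under Option 1, here is a rough sketch of reassembling one row per entity from (entity_id, attribute_name, numerical_value) using conditional aggregation; the table name special_values and the attribute names are purely illustrative:

SELECT entity_id,
       MAX(CASE WHEN attribute_name = 'ph' THEN numerical_value END) AS ph,
       MAX(CASE WHEN attribute_name = 'viscosity' THEN numerical_value END) AS viscosity
FROM special_values
GROUP BY entity_id;

Every new attribute means another CASE branch, and the whole table is scanned to rebuild what a conventional layout stores as a single row.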
Actually the design you described (common table plus subtype-specific tables) is called Class Table Inheritance.
Concrete Table Inheritance would have all the common attributes duplicated in the subtype tables, and you'd have no supertype table as you do now.
I'm strongly against EAV. I consider it an SQL antipattern. It may seem like an elegant solution because it requires fewer tables, but you're setting yourself up for a lot of headache later. You identified a couple of the disadvantages, but there are many others. IMHO, EAV is used appropriately only if you absolutely must not create a new table when you introduce a new subtype, or if you have an unbounded number of subtypes (e.g. users can define new attributes ad hoc).
You have many subtypes, but still a finite number of them, so if I were doing this project I'd stick with Class Table Inheritance. You may have few rows of each subtype, but at least you have some assurance that all rows in each subtype have the same columns, you can use NOT NULL if you need to, you can use SQL data types, you can use referential integrity constraints, etc. From a relational perspective, it's a better design than EAV.
One more option that you didn't mention is called Serialized LOB. That is, add a BLOB column for a semi-structured collection of custom attributes. Store XML, YAML, JSON, or your own DSL in that column. You won't be able to parse individual attributes out of that BLOB easily with SQL, you'll have to fetch the whole BLOB back into your application and extract individual attributes in code. So in some ways it's less convenient. But if that satisfies your usage of the data, then there's nothing wrong with that.
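As a rough sketch of that option, reusing the common columns from the question (sample_type and extra_attrs are illustrative names):

CREATE TABLE sample (
    entity_id   INT PRIMARY KEY,
    created_on  TIMESTAMP,
    created_by  VARCHAR(50),
    sample_type VARCHAR(30),
    extra_attrs TEXT   -- serialized XML/JSON/YAML bag of custom attributes
);

The database sees only an opaque value in extra_attrs; the application fetches the row and parses individual attributes out in code.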
I think it depends mostly on how you want to use this data.
First of all, I don't really see the benefit of option 3 over option 2. I think separating the special tables into another schema will make your application harder to maintain, especially if commonalities are later found between 'special values'.
As another option I would say:
- Store the special values in an XML fragment (or blob). Most databases can query XML structures these days, so without needing many extra tables you keep your flexibility, at a small performance cost.
If you put all the special values in one table, you get a very sparse table. Most normal DBMSes cannot handle this very well, but some implementations specialize in it; you could benefit from that.
Do you often need to query the key-value pairs? If you basically access that table through its entity_id, I think a key-value table is not a bad design. An extra index on the key column might even help when you do need to query for special values. If you build an application layer on top of your database, the key-value table maps naturally onto a Map or Hash structure, which is also easy to use.
It also depends on the different types of values you want to store. If there are many different types that need to be easily accessed (instead of being serialized/deserialized to XML or character strings), you may want to store the type in a separate column, but that usually leads to a very complicated design.
Hope this helps (a little bit).
-Maarten
http://en.wikipedia.org/wiki/Entity-Attribute-Value_model
I suggest you read about the problems with entity-attribute-value tables before deciding to use them.
Oracle can deal with sparsely filled tables quite well. I think you can use an approach similar to the one Salesforce uses: tables with a lot of columns, where columns are created when needed. You can index those columns much better than in an EAV model.
So it is flexible, and it performs better than an EAV model.
Read: Ask Tom 1, Ask Tom 2, High Scalability and SalesForce.
The "Option 1" patterns is also called the "Universal Relation" At first look it seems like a short cut to not doing potentially difficult data modeling. It trades effortless data modeling for not being able to do simple select, update, delete without dramatically more effort than it would take on more usual looking data model with multiple tables.

Fluent Nhibernate and Dynamic Table Name

I've got a parent and child object. A value in the parent object determines which table the child object is stored in. For example, if the parent object has reference "01", the child is looked up in the table "Child01", whereas if the reference is "02" it is looked up in "Child02". All the child tables are identical in number of columns, names, etc.
My question is: how can I tell Fluent NHibernate (or NHibernate) which table to look at, given that each parent object is unique and can reference a number of different child tables?
I've looked at IClassConvention in Fluent, but it seems to be called only when the session is created, rather than each time an object is created.
I found only two ways to do this.
The first is to close and recreate the NHibernate session every time another dynamic table needs to be looked at, using IClassConvention on session creation to calculate the name dynamically from user data. I found this very expensive: it's a large database, and creating the session each time is a costly operation.
The second is to use POCO objects for these tables with custom data access.
As statichippo suggested, I could use a BaseChild object and have multiple child objects. Due to the database size and the number of dynamic tables, this wasn't really a valid option.
I wasn't particularly happy with either solution, but the POCOs seemed the best way to handle my problem.
NHibernate is intended to be an object-relational mapper. It sounds like you're working in more of a scripting style and hoping to map your data, rather than working in an OOP manner.
It sounds like you have the makings of a class hierarchy, though. What it sounds like you're trying to create in your code (and then map accordingly) is a hierarchy of different kinds of children:
BaseChild
--> SmartChild
--> DumbChild
Each child is either smart or dumb, but since they all have a FirstName, LastName, Age, etc., they are all instances of the BaseChild class, which defines those. The only differences might be that the SmartChild has an IQ and the DumbChild has a FavoriteFootballTeam (this is just an example, no offense to anyone of course ;).
NHibernate will let you map this sort of relationship in many ways. There could be 1 table that encompasses all classes or (what it sounds like you want in your case), one table per class.
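For instance, the one-table-per-class flavor could be mapped with Fluent NHibernate roughly like this (a sketch only; it assumes the BaseChild/SmartChild/DumbChild classes from the diagram above, and the table names are illustrative):

using FluentNHibernate.Mapping;

public class BaseChildMap : ClassMap<BaseChild>
{
    public BaseChildMap()
    {
        Id(x => x.Id);
        Map(x => x.FirstName);
        Map(x => x.LastName);
        Map(x => x.Age);
    }
}

// With no discriminator on the parent, each SubclassMap becomes a
// joined subclass with its own table.
public class SmartChildMap : SubclassMap<SmartChild>
{
    public SmartChildMap()
    {
        Table("SmartChild");
        Map(x => x.IQ);
    }
}

public class DumbChildMap : SubclassMap<DumbChild>
{
    public DumbChildMap()
    {
        Table("DumbChild");
        Map(x => x.FavoriteFootballTeam);
    }
}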
Did I understand the issue/what you're looking for?

Optimal DB structure for additional fields entity

I have a table in a DB (Postgres-based) which acts like a superclass in object-oriented programming. It has a column 'type' which determines which additional columns should be present in the table (subclass properties). But I don't want the table to include all possible columns (all properties of all possible types).
So I decided to make a table containing 'key' and 'value' columns (e.g. 'filename' = '/file', or 'some_value' = '5') to hold any property of the object not included in the superclass table, plus a related table containing the available 'key' values.
But there is a problem with this architecture: the 'value' column has to be of a string data type by default to be able to contain anything, and I don't think converting to and from strings is a good decision. What is the best way around this limitation?
The design you're experimenting with is a variation of Entity-Attribute-Value, and it comes with a whole lot of problems and inefficiencies. It's not a good solution for what you're doing, except as a last resort.
What could be a better solution is what fallen888 describes: create a "subtype" table for each of your subtypes. This is okay if you have a finite number of subtypes, which sounds like what you have. Then your subtype-specific attributes can have data types, and also a NOT NULL constraint if appropriate, which is impossible if you use the EAV design.
One remaining weakness of the subtype-table design is that you can't enforce that a row exists in the subtype table just because the main row in the superclass table says it should. But that's a milder weakness than those introduced by the EAV design.
Edit: Regarding your additional information about comments-on-any-entity: yes, this is a pretty common pattern. But beware of a broken solution called "polymorphic association", a technique many people use in this situation.
How about this instead... each subtype gets its own DB table, and the base/super table just has a varchar column that holds the name of the subtype's DB table. Then you can have something like this...
Entity
------
ID
Name
Type
SubTypeName (value of this column will be 'Dog')
Dog
---
VetName
VetNumber
etc
If you don't want your (sub-)table names to be varchar values in the base table, you can instead have a SubType table whose primary key appears in the base table.
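A rough DDL rendering of that sketch (the data types are guesses; the Dog columns come from the example above):

CREATE TABLE entity (
    id            INT PRIMARY KEY,
    name          VARCHAR(100),
    type          VARCHAR(30),
    sub_type_name VARCHAR(30)   -- e.g. 'Dog': names the table holding the extra fields
);

CREATE TABLE dog (
    entity_id  INT PRIMARY KEY REFERENCES entity (id),
    vet_name   VARCHAR(100),
    vet_number VARCHAR(30)
);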
The only workaround (while retaining your structure) is to have separate tables:
create table IntProps(...);
create table StringProps(...);
create table CurrencyProps(...);
But I do not think that this is a good idea...
One common approach is having the key-value table contain multiple columns, one for each data type, i.e. StringValue, DecimalValue, etc.
Just know you're trading queryability and performance for a database schema you don't need to change. You could also consider ORM mapping or an object database.
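A rough sketch of that multi-column variant (names and types are illustrative; the idea is that exactly one value column is non-NULL per row):

CREATE TABLE entity_props (
    entity_id     INT,
    key           VARCHAR(50),
    string_value  VARCHAR(255),
    int_value     INT,
    decimal_value DECIMAL(19, 4),
    PRIMARY KEY (entity_id, key)
);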
You could have a per-type key/value table. The available-keys table would then need to encode the availability of a specific key/type pair, pointing at the correctly typed key/value table.
This seems like a highly inefficient architecture for row-based relational databases, however.
Perhaps you should take a look at a column-oriented relational database?
Thanks for the answers. I'll explain a little more specifically what I need.
There's a need to program a blog+forum website, and I've been looking at the WordPress DB structure.
There's a strong need for the ability to attach comments to any kind of 'object', like a blog entry or a video file attached to one. The above DB structure was chosen because it is very easy to scale and fulfills all our needs.
But it's not too late to change, because this is still at the early engineering stage. Also, our model now smells like a completely tree-hierarchy-based DB. For now I'll accept Bill Karwin's and fallen888's answers, but maybe I'm going in a totally wrong direction?
about the user being able to add a new field to the table:
I admire all these people making comments.
I used to be interested in this kind of thing a few years ago, but have written little code recently (apart from a little PHP and MySQL).
I think it's fine if you want to keep going - you may end up with something new.
Sorry to pour any cold water on the scheme - I admire your efforts. My personal belief is that if you go far enough in this direction, you will end up with a system that interprets more of natural language than SQL does. (Around 1970, SQL was actually spelt SEQUEL, and it stood for "Structured English Query Language", but after it was standardized in the 1970s the "English" got dropped, I guess because they decided it covered only a tiny subset of English. I think someone said that Oracle shipped the first commercial implementation, in 1979.)
I have run out of steam in this area, because I haven't got a job. Without an easy job that pays the bills, where I can experiment with these ideas, it's a bit hard to concentrate on this area.
Best wishes to all.
To add to the above: Oracle got its first contract writing a database for the CIA.