Accessing the join table in an HQL query for a many-to-many relationship in Grails

I have two domain classes with a many-to-many relationship in Grails: decks and cards.
The setup looks like this:
class Deck {
    static hasMany = [cards: Card]
}
class Card {
    static hasMany = [decks: Deck]
    static belongsTo = Deck
}
After I delete a deck, I want to also delete all cards which no longer belong to any deck. The easiest way to accomplish this is to write something like the following SQL:
delete from card where card.id not in (select card_id from deck_cards);
However, I can't figure out how to write an HQL query which will resolve to this SQL, because the join table, deck_cards, does not have a corresponding Grails domain class. I can't write this statement using normal joins because HQL doesn't let you use joins in delete statements, and if I use a subquery to get around this restriction, MySQL complains because you're not allowed to refer to the table you're deleting from in the subquery's FROM clause.
I also tried using the hibernate "delete-orphan" cascade option but that results in all cards being deleted when a deck is deleted even if those cards also belong to other decks. I'm going crazy - this seems like it should be a simple task.
Edit:
There seems to be some confusion about this specific use of "decks" and "cards". In this application, the "cards" are flashcards and there can be tens of thousands of them in a deck. Also, it is sometimes necessary to make a copy of a deck so that users can edit it as they see fit. In this scenario, rather than copying all the cards over, the new deck will just reference the same cards as the old deck, and only if a card is changed will a new card be created. Also, while I can do this delete in a loop in Groovy, it will be very slow and resource-intensive since it will generate tens of thousands of SQL delete statements rather than just one (using the above SQL). Is there no way to access a property of the join table in HQL?
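For reference, here is a sketch (untested) of two single-statement approaches, written against Hibernate's Java Session API; in Grails the session is available from an injected sessionFactory bean:

import org.hibernate.Session;

public class OrphanCardCleanup {
    public static void deleteOrphanCards(Session session) {
        // 1) HQL bulk delete: "is empty" should compile to a NOT EXISTS
        //    subquery against the deck_cards join table, so the card table is
        //    never re-referenced in the subquery's FROM clause (the MySQL
        //    restriction described above).
        session.createQuery("delete from Card c where c.decks is empty")
               .executeUpdate();

        // 2) Or fall back to the native SQL from the question:
        // session.createSQLQuery(
        //         "delete from card where id not in (select card_id from deck_cards)")
        //     .executeUpdate();
    }
}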

First, I don't see the point of your entities.
It seems illogical for a card to belong to more than one deck, and it is illogical to have both belongsTo and hasMany.
Anyway, don't use HQL for the delete.
If you actually need a one-to-many, use session.remove(deck) and set the cascade on cards to REMOVE or ALL.
If you really want a many-to-many, do the checks manually on the entities. In pseudocode (since I don't know Grails):
for (Card card : deck.cards) {
    card.decks.remove(deck);       // detach from the deck being deleted
    if (card.decks.isEmpty()) {    // no other deck references this card
        session.remove(card);
    }
}

I won't be answering the technical side, but challenging the model. I hope this will also be valuable to you :-)
Functionally, it seems to me that your two objects don't have the same lifecycle:
Decks are changing: they are created, filled with Cards, modified, and deleted. They certainly need to be persisted to your database, because you wouldn't be able to recreate them in code otherwise.
Cards are constant: the set of all cards is known from the beginning, and they keep existing. If you delete a Card from the database, you will need to recreate the same Card later when someone puts it in a Deck, so in any case you will have a data structure that is responsible for providing the list of possible Cards. If they are not saved in your database, you could recreate them...
In the model you give, each card holds the set of Decks that contain it. But that information has the same lifecycle as the Decks' (changing), so I suggest holding the association only on the Deck's side (a unidirectional many-to-many relationship).
Once you've done that, your Cards are really constant information, so they don't even need to be persisted to the database. You would still have a second table (in addition to the Deck), but that Card table would only contain the identifying information for the Card (it could be a simple integer 1 to 52, or two values, depending on what you need to "select" in your queries), and no other fields (an image, the strength, some points, etc.).
In Hibernate, these choices turn the many-to-many relationship into a collection of values (see the Hibernate reference documentation).
With a collection of values, Card is not an Entity but a Component. And you don't have to delete the cards; everything is taken care of automatically by Hibernate.
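For illustration, a minimal JPA-style sketch of that idea, assuming a Card really can be reduced to a simple value (here just a string code); Hibernate then owns the join table and no explicit card deletion is ever needed:

import java.util.HashSet;
import java.util.Set;
import javax.persistence.*;

// Hypothetical sketch: cards as a collection of values rather than entities.
@Entity
public class Deck {
    @Id
    @GeneratedValue
    private Long id;

    // The values live in the join table itself; deleting a Deck deletes its rows.
    @ElementCollection
    @CollectionTable(name = "deck_cards",
                     joinColumns = @JoinColumn(name = "deck_id"))
    @Column(name = "card")
    private Set<String> cards = new HashSet<String>();
}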

Related

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is well outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database, but the more I learn about NoSQL, the more I believe it might be the better option. I was hoping to use this question to describe the project at a high level and get some feedback on the pros and cons of each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a relatively well-designed database schema in a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead, look into using Solr/Lucene as your full-text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data, provided you have designed your Solr schema correctly.
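As a rough sketch of what that buys you, here is how indexing an object with ad-hoc attributes might look with the SolrJ client. The core name and the *_s / *_f suffixes are illustrative assumptions; they only work if the Solr schema declares matching dynamic fields:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ObjectIndexer {
    public static void main(String[] args) throws Exception {
        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/objects").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "obj-42");       // common attribute
        doc.addField("name", "Widget");     // common attribute
        doc.addField("color_s", "red");     // subset attribute via a *_s dynamic field
        doc.addField("weight_f", 1.5f);     // subset attribute via a *_f dynamic field

        solr.add(doc);                      // new properties need no schema change
        solr.commit();
        solr.close();
    }
}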
I'm not an expert in NoSQL, so I will not be advocating it. However, I have a few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base/super (parent) object type for it that has a subset of the attributes, and then add on top of them (extending the base object type).
Once you get used to thinking in those terms, the next step is the inheritance mapping patterns for relational databases. I'll borrow Martin Fowler's terms to describe them here.
You can store an inheritance chain in the database in one of three ways:
1 - Single table inheritance: the whole inheritance chain is in one table, so all new types of objects go into the same table.
Advantages: your search query has only one table to search, which is typically faster than a join, for example.
Disadvantages: the table grows faster than with option 2, for example; you have to add a type column that says what type of object each row is; and some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: a separate table for each new type of object.
Advantages: if a search affects only one type, you search only one table at a time; each table grows more slowly than in option 1, for example.
Disadvantages: you need a union of queries if searching several types at the same time.
3 - Class table inheritance: one table for the base object type with its attributes only, plus additional tables with the additional attributes for each child object type. The child tables refer to the base table with PK/FK relations.
Advantages: all types are present in the base table, so it is easy to search them all together using the common attributes.
Disadvantages: the base table grows fast because it contains part of the child tables too; you need joins to search all types of objects with all their attributes.
Which one to choose?
It's a trade-off, obviously. If you expect many types of objects to be added, I would go with concrete table inheritance, which gives reasonable query and scaling options. Class table inheritance does not seem very friendly to fast queries and scalability. Single table inheritance seems to work better with a small number of types. (A sketch of all three patterns follows below.)
Your call, my friend!
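For concreteness, a hedged JPA sketch of the three patterns; all class and column names are illustrative, and each strategy is shown on its own small hierarchy:

import javax.persistence.*;

// 1 - Single table inheritance: all types share one table; a discriminator
//     column records each row's type, and unused columns stay null.
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "object_type")
class StObject {
    @Id @GeneratedValue Long id;
    String name;                 // attribute common to all objects
}

@Entity
class StSpecialObject extends StObject {
    String extraAttribute;       // null for rows of every other type
}

// 2 - Concrete table inheritance: one complete table per concrete type;
//     searching several types at once needs a UNION.
@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
abstract class CtObject {
    @Id Long id;
    String name;
}

// 3 - Class table inheritance: a base table with the common attributes plus
//     one PK/FK-joined table per subtype; searching all attributes needs joins.
@Entity
@Inheritance(strategy = InheritanceType.JOINED)
class JtObject {
    @Id @GeneratedValue Long id;
    String name;
}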
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value logic on the web... it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes... the same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (and in this setup at least one third of them are null). Adding a new attribute means altering the table to add another column, and either coming up with a script to populate it for existing rows or just leaving it null for all of them. Not the most fun; it can be a headache.
The alternative to this is a name-value-pair setup. You want a 'header' table to hold the values common among your products (like name, or price... things that all products always have). In our example above, you will notice that attribute 'a' is used on every record... this means attribute 'a' can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product and assigns an ID to each. We'll call the table 'attribute', with 'attr_id' for a key. Rather straightforward: each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1, b, the length of time the product takes to install
2, c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, attribute label, value? Any future product added can have any combination of any attributes stored in this table. Adding a new attribute means adding a new line to the attribute table and then populating the detail table as needed.
I believe there is a Wikipedia article on it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply a matter of figuring out the best methodology to pivot out your data (I'd recommend Postgres as an open-source DB option here).
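To make the three-table shape concrete, here is a minimal sketch as JPA entities, matching the header / attribute / detail split above; every name is illustrative:

import java.math.BigDecimal;
import javax.persistence.*;

// 'Header' table: values common to all products.
@Entity
class Product {
    @Id @GeneratedValue Long id;
    String name;
    BigDecimal price;
}

// Reference table: the catalogue of possible attributes.
@Entity
class Attribute {
    @Id @GeneratedValue Long attrId;
    String attributeName;
    String notes;   // e.g. "the length of time the product takes to install"
}

// Detail (mapping) table: one row per product/attribute pair, holding the value.
@Entity
class ProductAttribute {
    @Id @GeneratedValue Long id;
    @ManyToOne Product product;
    @ManyToOne Attribute attribute;
    @Column(name = "attr_value")    // "value" is a reserved word in some databases
    String value;                   // "5 mins", "needs spare jack", ...
}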

Is it acceptable to have multiple aggregations that can theoretically be inconsistent?

I have a question about the modelling of classes and the underlying database design.
Simply put, the situation is as follows: at the moment we have Positions and Accounts objects and tables and the relationship between them is that a Position 'has an' Account (an Account can have multiple Positions). This is simple aggregation and is handled in the DB by the Position table holding an Account ID as a foreign key.
We now need to extend this 'downwards' with Trades and Portfolios. One or more Trades make up a Position (but a Trade is not a Position in itself) and one or more Portfolios make up an Account (but a Portfolio is not an Account in itself). Trades are associated with Portfolios just like Positions are associated with Accounts ('has a'). Note that it is still possible to have a Position without Trades and an Account without Portfolios (i.e. it is not mandatory to have all the existing objects broken down in subcomponents).
My first idea was to go simply for the following (the first two classes already exist):
class Account;
class Position {
    Account account;
}
class Portfolio {
    Account account;
}
class Trade {
    Position position;
    Portfolio portfolio;
}
I think the (potential) problem is clear: starting from a Trade, you might end up in different Accounts depending on whether you take the Position route or the Portfolio route. Of course this is never supposed to happen, and the code that creates and stores the objects should never be able to create such an inconsistency. I wonder, though, whether the fact that it is theoretically possible to have an inconsistent database implies a flawed design?
Looking forward to your feedback.
The design is not flawed just because there are two ways to get from class A to class D, one over B and one over C. Such "squares" appear often in OOP class models, sometimes not so obviously, especially when more classes lie along the paths. But as Dan mentioned, the business semantics always determine whether such a square must commute (in the mathematical sense) or not.
Personally, I draw an = sign inside such a square in the UML diagram to indicate that it must commute. I also note the precise formula in a UML comment; in my example it would be:
For every object a of class A: a.B.D = a.C.D
If such a predicate holds, then you basically have two options:
Trust all programmers not to break the rule in any code, since it is very well documented.
Implement some error handling (as Dan and algirdas mentioned) or, if you don't want such code in your model, create a Checker controller which checks all conditions in a given model instance. (A sketch follows below.)
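A minimal sketch of such a checker in Java, using the (simplified, field-access) classes from the question:

class Account { }
class Position  { Account account; }
class Portfolio { Account account; }
class Trade     { Position position; Portfolio portfolio; }

// The square "commutes" iff both routes from a Trade reach the same Account.
final class ConsistencyChecker {
    static void check(Trade t) {
        // Reference equality is enough if both routes resolve to the same
        // in-memory object (e.g. within one ORM session); compare ids otherwise.
        if (t.position.account != t.portfolio.account) {
            throw new IllegalStateException(
                "Trade reaches different Accounts via Position and Portfolio");
        }
    }
}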

How to fix my m:n relationship in NoSQL (MongoDB)?

First off, I'm trying to build a rally database (you know, cars with drivers...). I have two collections: drivers { name, address, sex, ... } and tournaments { name, location, price, ... }.
I'm trying to keep it simple. In a tournament there should be drivers (because a tournament without drivers... well, it's not nice ^^). And there is my problem: in a normal SQL database I could select two primary keys (let's say name in drivers and name in tournaments, just to keep it simple; I know name as a primary key is not nice). And because it's an m:n relationship (is that right?), I would make a third table with the two primary keys. OK, that would be easy. But how should I solve this problem in MongoDB? I thought of something like: tournaments { name, location, price, ... drivers { driver_1, ..., driver_n } }, but I'm not sure. I'm using Java, so I could write some special classes to handle this relationship problem? I don't understand the other MongoDB tutorials. Any ideas? Thank you for any help!
There are a few ways to do this:
As @Gianluca describes, you can perform this linking manually by adding a driver's _id ObjectId, or another identifying property (probably one you have a unique index on), to a "drivers" array in a tournament document, e.g. tournament : { ... drivers : ["6019235867192384", "73510945093", ...] }
Another option built specifically for this kind of referencing is the DBRef specification, which provides a more formal method, probably more similar to what you're familiar with from the SQL world. DBRef is supported by the Java driver and allows you to scope your reference to a collection (basically saying where the reference comes from). I wouldn't be surprised if future versions of MongoDB supported cross-collection queries, although they do not currently.
More information here.
Also, if you aren't using a DAO framework, I would suggest Morphia, which supports DBRef with a nice @Reference annotation.
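Since the question mentions Java, a hedged sketch with the MongoDB Java driver; the database, collection, and field names come from the question, the rest (names, connection string) is illustrative:

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.push;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.types.ObjectId;

public class RallyLinker {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create("mongodb://localhost");
        MongoDatabase db = client.getDatabase("rally");

        // Insert a driver; the Document gains a generated _id on insert.
        Document driver = new Document("name", "Jane Doe").append("sex", "f");
        db.getCollection("drivers").insertOne(driver);
        ObjectId driverId = driver.getObjectId("_id");

        // Link: push the driver's _id into the tournament's drivers array.
        db.getCollection("tournaments")
          .updateOne(eq("name", "Monte Carlo"), push("drivers", driverId));

        client.close();
    }
}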
I solved the problem using the _id field, which every document has and which is unique.
So in your case you just need to create a collection that has the ObjectId of the tournament and some ObjectIds from the drivers collection. Or you can just put the ObjectId of the driver directly in the tournaments collection. Probably not the best solution, but it works.
Gianluca
Add an array field drivers in the tournaments type and put the _ids of the drivers in there.
To add/remove drivers, just update the field. There is no need for an intermediary N:M mapping table unless the array gets really huge.
If it gets huge, the usual solution is to cut the array into several smaller ones and save them in several documents that you can look up quickly by using the _id of the container (the tournament). Removing and sorting is then a pain, of course.

NHibernate: how to make a Criteria inner join without hydrating objects?

A quick NHibernate problem:
I have SQL tables:
Item { Id, Name }
ItemRange { Id, Name }
ItemHasItemRange { Id, ItemId, ItemRangeId }
The mappings are simple, so I will not paste them; ItemId and ItemRangeId are foreign keys, and the Item class has an ItemHasItemRanges collection mapped as a lazy bag.
I want all Items which are in a particular ItemRange, but I do not want to retrieve the associated ItemRange objects; I just want to do an inner join to narrow the results.
When I do it like that:
c.CreateCriteria("Item", "i")
.CreateAlias("ItemHasItemRanges", "ihpr", JoinType.InnerJoin)
.Add(Restrictions.Eq("ihpr.ItemRange.Id", I18nHelper.CurrentItemRange.Id));
It works fine, but all the ItemHasItemRange objects are fetched into the Item.ItemHasItemRanges collection as well (which is mapped as lazy).
I do not want to fetch Item.ItemHasItemRanges, because it takes time. I just want to do an inner join to limit the result set. Is that possible in NHibernate?
So I think that you just want to retrieve those objects in order to show an overview/list, and you are not going to actually 'do' something with them (except perhaps load one of them)?
In that case, I think that it is better for you to work with 'projections'.
Here's the scenario:
You'll have to create a (simple) class that just contains the properties that you want to show (the ones you're interested in).
You'll have to 'import' that class into NHibernate, so that NHibernate knows of its existence.
Next, you can create your Criteria statement like you have it now (working with your domain classes).
Then, you should specify what the projection should look like; that is, how the properties of your Item entity map to the properties of your 'DTO'/view class (the simple class you just created).
Specify that an AliasToBean ResultTransformer should be used.
Then, execute your Criteria query. NHibernate will produce the simplest possible query needed to retrieve all the necessary data.
I've explained something similar here
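A hedged sketch of those steps, written with Hibernate's Java Criteria API, which NHibernate's Criteria API mirrors almost method for method; the Item class is the one from the question, while the view class and property names are illustrative:

import java.util.List;
import org.hibernate.Session;
import org.hibernate.criterion.Projections;
import org.hibernate.criterion.Restrictions;
import org.hibernate.sql.JoinType;
import org.hibernate.transform.Transformers;

public class ItemQueries {

    // 1) The simple view class: only the properties the overview needs.
    public static class ItemView {
        private Long id;
        private String name;
        public void setId(Long id) { this.id = id; }            // set by the transformer
        public void setName(String name) { this.name = name; }
    }

    @SuppressWarnings("unchecked")
    public static List<ItemView> findInRange(Session session, Long rangeId) {
        return session.createCriteria(Item.class, "i")
            // inner join purely to narrow the result set...
            .createAlias("i.itemHasItemRanges", "ihpr", JoinType.INNER_JOIN)
            .add(Restrictions.eq("ihpr.itemRange.id", rangeId))
            // ...while the projection keeps the collection out of the SELECT list
            .setProjection(Projections.projectionList()
                .add(Projections.property("i.id"), "id")
                .add(Projections.property("i.name"), "name"))
            .setResultTransformer(Transformers.aliasToBean(ItemView.class))
            .list();
    }
}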
I found out the problem was somewhere else. The ItemHasItemRange table did not have a composite index on ItemId and ItemRangeId; it only had separate indexes on each field. That's why performance was so poor.
But the NHibernate question is still valid: is it possible to create an inner join for a Criteria query only to narrow the results, without fetching all the joined objects which are normally lazy?

How can one delete an entity in nhibernate having only its id and type?

I am wondering how can one delete an entity having just its ID and type (as in mapping) using NHibernate 2.1?
If you are using lazy loading, Load only creates a proxy.
session.Delete(session.Load(type, id));
With NH 2.1 you can use HQL. I'm not sure exactly what it looks like, but something like this (note that this is subject to SQL injection; if possible, use parameterized queries with SetParameter() instead):
session.Delete(string.Format("from {0} where id = {1}", type, id));
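For the parameterized form, a sketch using Hibernate's Java API (NHibernate's ISession is analogous, with SetParameter()); only the id can be bound as a parameter:

// Sketch: bind the id as a parameter; the entity name cannot be a parameter,
// so it must come from trusted code (e.g. type metadata), never from user input.
int deleted = session
    .createQuery("delete from " + entityName + " where id = :id")
    .setParameter("id", id)
    .executeUpdate();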
Edit:
For Load, you don't need to know the name of the Id column.
If you need to know it, you can get it from the NH metadata:
sessionFactory.GetClassMetadata(type).IdentifierPropertyName
Another edit.
session.Delete() instantiates the entity
When using session.Delete(), NH loads the entity anyway. At first I didn't like it; then I realized the advantages. If the entity is part of a complex structure using inheritance, collections or "any" references, this is actually more efficient.
For instance, if classes A and B both inherit from Base, NH doesn't try to delete data in table B when the actual entity is of type A. This wouldn't be possible without loading the actual object. This is particularly important when there are many inherited types, each of which also consists of many additional tables.
The same applies when you have a collection of Bases which happen to all be instances of A. When loading the collection into memory, NH knows that it doesn't need to remove any B-stuff.
If entity A has a collection of Bs, which contains Cs (and so on), it doesn't try to delete any Cs when the collection of Bs is empty. This is only possible by reading the collection. This is particularly important when C is complex in its own right, aggregating even more tables, and so on.
The more complex and dynamic the structure is, the more efficient it is to load the actual data instead of "blindly" deleting it.
HQL Deletes have pitfalls
HQL deletes do not load data into memory. But HQL deletes aren't that smart. They basically translate the entity name to the corresponding table name and remove the rows from the database. Additionally, they delete some aggregated collection data.
In simple structures this may work well and efficiently. In complex structures, not everything is deleted, leading to constraint violations or "database memory leaks".
Conclusion
I also tried to optimize deletion with NH. I gave up in most cases, because NH is still smarter; it "just works" and is usually fast enough. One of the most complex deletion algorithms I wrote analyzes NH mapping definitions and builds delete statements from them. And, no surprise, it is not possible without reading data from the database before deleting. (I just reduced it to loading only the primary keys.)