Is it acceptable to have multiple aggregations that can theoretically be inconsistent? - oop

I have a question about the modelling of classes and the underlying database design.
Simply put, the situation is as follows: at the moment we have Positions and Accounts objects and tables and the relationship between them is that a Position 'has an' Account (an Account can have multiple Positions). This is simple aggregation and is handled in the DB by the Position table holding an Account ID as a foreign key.
We now need to extend this 'downwards' with Trades and Portfolios. One or more Trades make up a Position (but a Trade is not a Position in itself) and one or more Portfolios make up an Account (but a Portfolio is not an Account in itself). Trades are associated with Portfolios just like Positions are associated with Accounts ('has a'). Note that it is still possible to have a Position without Trades and an Account without Portfolios (i.e. it is not mandatory to have all the existing objects broken down in subcomponents).
My first idea was to go simply for the following (the first two classes already exist):
class Account;
class Position {
Account account;
}
class Portfolio {
Account account;
}
class Trade {
Position position;
Portfolio portfolio;
}
I think the (potential) problem is clear: starting from Trade, you might end up in different Accounts depending on whether you take the Position route or the Portfolio route. Of course this is never supposed to happen, and the code that creates and stores the objects should never be able to create such an inconsistency. I wonder, though, whether the fact that it is theoretically possible to have an inconsistent database implies a flawed design?
Looking forward to your feedback.

The design is not flawed just because there are two ways to get from class A to class D, one via B and one via C. Such "squares" appear often in OOP class models, sometimes not so obviously, especially if more classes lie on the paths. But as Dan mentioned, the business semantics always determine whether such a square must commute (in the mathematical sense) or not.
Personally, I draw an = sign inside such a square in the UML diagram to indicate that it must commute. I also note the precise formula in a UML comment; in my example it would be
For every object a of class A: a.B.D = a.C.D
If such a predicate holds, then you have basically two options:
Trust all programmers to not break the rule in any code, since it is very well documented
Implement some error handling (like Dan and algirdas mentioned) or, if you don't want to have such code in your model, create a Checker controller, which checks all conditions in a given model instance.
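As a rough illustration of the second option, here is a minimal sketch of such a checker for the Trade/Position/Portfolio/Account square from the question. The class bodies, the SquareCheckSketch wrapper and the findInconsistentTrades method are assumptions made up for this example; only the four class names and their references come from the question.

```java
import java.util.ArrayList;
import java.util.List;

class SquareCheckSketch {
    // Hypothetical minimal versions of the four classes from the question.
    static final class Account {
        final String name;
        Account(String name) { this.name = name; }
    }

    static final class Position {
        final Account account;
        Position(Account account) { this.account = account; }
    }

    static final class Portfolio {
        final Account account;
        Portfolio(Account account) { this.account = account; }
    }

    static final class Trade {
        final Position position;
        final Portfolio portfolio;
        Trade(Position position, Portfolio portfolio) {
            this.position = position;
            this.portfolio = portfolio;
        }
    }

    // Returns every trade for which trade.position.account != trade.portfolio.account,
    // i.e. every trade for which the square does not commute.
    static List<Trade> findInconsistentTrades(List<Trade> trades) {
        List<Trade> broken = new ArrayList<>();
        for (Trade t : trades) {
            // Reference comparison: both routes must end at the very same Account object.
            if (t.position.account != t.portfolio.account) {
                broken.add(t);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        Account a1 = new Account("A1");
        Account a2 = new Account("A2");
        Trade good = new Trade(new Position(a1), new Portfolio(a1));
        Trade bad = new Trade(new Position(a1), new Portfolio(a2));
        System.out.println("Inconsistent trades: "
                + findInconsistentTrades(List.of(good, bad)).size());
    }
}
```

A checker like this can run in tests or as a periodic audit; it documents the rule in executable form without putting validation code into the model classes themselves.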

Related

How to decide when to abstract common properties?

Imagine we have two types of requests, an InvoiceRequest and a QuoteRequest. How would you design the object model (classes) and the database model? Which of the following two makes more sense?
InvoiceRequest:
- id
- amount
- discount
- date
- invoiceSpecificFieldHere
QuoteRequest:
- id
- amount
- discount
- date
- quoteSpecificFieldHere.
Or does this one make more sense?
RequestData:
- amount
- discount
- date
InvoiceRequest:
- id
- requestData: <RequestData>
- invoiceSpecificProperty
QuoteRequest:
- id
- requestData: <RequestData>
- quoteSpecificProperty.
I'm deliberately not presenting a third option that uses inheritance.
The question behind this question is the following: if we go with design 2, we reduce redundancy, but something about it doesn't feel right. I think discount should be at the same level as quoteSpecificProperty, and putting it inside the requestData object doesn't model this correctly.
My impression is that you are mixing concepts from object-oriented modeling and relational data modeling. I say this because your second solution is not correct from a relational data modeling point of view.
Since I do not know your exact needs in terms of implementing the model, I'll try to propose a solution for each situation.
If you want to use a pure Object-Oriented Model, implemented with an object-oriented language, you should obviously define a superclass Request, with two subclasses InvoiceRequest and QuoteRequest, both of them with the specific properties.
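A minimal sketch of that class hierarchy might look like the following. The field types, the RequestModelSketch wrapper and the netAmount helper (which treats discount as an absolute amount) are assumptions for illustration; the question only gives the field names.

```java
import java.math.BigDecimal;
import java.time.LocalDate;

class RequestModelSketch {
    // Common fields live in the superclass, so in each subclass they sit
    // at the same level as the subclass-specific property.
    abstract static class Request {
        final long id;
        final BigDecimal amount;
        final BigDecimal discount;
        final LocalDate date;

        Request(long id, BigDecimal amount, BigDecimal discount, LocalDate date) {
            this.id = id;
            this.amount = amount;
            this.discount = discount;
            this.date = date;
        }

        // Hypothetical shared behaviour; assumes discount is an absolute amount.
        BigDecimal netAmount() {
            return amount.subtract(discount);
        }
    }

    static final class InvoiceRequest extends Request {
        final String invoiceSpecificProperty;

        InvoiceRequest(long id, BigDecimal amount, BigDecimal discount,
                       LocalDate date, String invoiceSpecificProperty) {
            super(id, amount, discount, date);
            this.invoiceSpecificProperty = invoiceSpecificProperty;
        }
    }

    static final class QuoteRequest extends Request {
        final String quoteSpecificProperty;

        QuoteRequest(long id, BigDecimal amount, BigDecimal discount,
                     LocalDate date, String quoteSpecificProperty) {
            super(id, amount, discount, date);
            this.quoteSpecificProperty = quoteSpecificProperty;
        }
    }

    public static void main(String[] args) {
        Request r = new QuoteRequest(1L, new BigDecimal("100.00"),
                new BigDecimal("10.00"), LocalDate.of(2020, 1, 1), "valid 30 days");
        System.out.println(r.netAmount());
    }
}
```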
If you want to implement your situation in a pure relational model, with a relational database, you should define three tables:
Requests:
- id (Primary Key)
- amount
- discount
- date
InvoiceRequests:
- id (Primary Key) (Foreign Key for Requests)
- invoiceSpecificProperty
QuoteRequests:
- id (Primary Key) (Foreign Key for Requests)
- quoteSpecificProperty.
Finally, if you want to use an Object-Relational Mapping, you should design a superclass Request, with two subclasses InvoiceRequest and QuoteRequest, both of them with the specific properties, and then you can map it onto a relational database with a model like the previous one.
Of course there is another possibility in relational modeling: a single Requests table with all the attributes, including the quote-specific and invoice-specific ones, plus an attribute that distinguishes which kind of request each row is.
The second one makes a lot more sense, because when you design your objects and their fields you are making an abstraction of the real world: how each thing looks and how it behaves. What you are dealing with here is called normalization:
Database Normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.
That structure does not always match the real world perfectly, but you must abstract from the real world and model the data according to how the pieces relate to each other.
I will share with you some information I collected this week.
Maybe the SOLID principles would help you to decide that.
SOLID = (Single responsibility principle, Open/closed principle,
Liskov substitution principle, Interface segregation principle,
Dependency inversion principle).
Admittedly, that covers much more than property abstraction. Let's see some examples:
S
According to Wikipedia, the Single responsibility principle means
One class shall have only one reason that justifies changing its
implementation;
Classes shall have few dependencies on other classes;
Classes shall be abstract from the particular layer they are running.
O
When you define a class or a unit, keep in mind:
They shall be open for extension;
But closed for modification.
As for modification: when a bug forces you to change the code, a change to the common fields is easier in the second model, because they are defined in one place.
First model
InvoiceRequest:
- id
- amount
- discount
- date
- invoiceSpecificFieldHere
QuoteRequest:
- id
- amount
- discount
- date
- quoteSpecificFieldHere.
Second model-Common fields
QuoteRequest:
- id
- requestData: <RequestData>
- quoteSpecificProperty.
L
According to Barbara Liskov's substitution principle, if TChild is a subtype of TParent, then objects of type TParent may be replaced with objects of type TChild without altering any of the desirable properties of the program (correctness, task performed, etc.).
That is, the objects (instances) of TParent are what get replaced, not the TParent class itself.
This is an interesting point to think about if you want to implement this example using interfaces. The next two principles are also relevant:
I
Interface segregation principle
D
Dependency Inversion Principle
Another form of decoupling is to invert the dependency between high and low level of a software design:
- High-level modules should not depend on low-level modules. Both
should depend on abstractions;
- Abstractions should not depend upon details. Details should depend
upon abstractions.
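To make the two rules above a bit more concrete, here is a small sketch. RequestStore, InMemoryRequestStore and RequestService are hypothetical names invented for this example; they do not come from the question or from the linked article.

```java
import java.util.ArrayList;
import java.util.List;

class DependencyInversionSketch {
    // The abstraction that both the high-level and the low-level module depend on.
    interface RequestStore {
        void save(String request);
        int count();
    }

    // Low-level detail: one possible storage mechanism. It depends on the abstraction.
    static final class InMemoryRequestStore implements RequestStore {
        private final List<String> saved = new ArrayList<>();

        @Override
        public void save(String request) {
            saved.add(request);
        }

        @Override
        public int count() {
            return saved.size();
        }
    }

    // High-level module: knows only the RequestStore interface, never a concrete store.
    static final class RequestService {
        private final RequestStore store;

        RequestService(RequestStore store) {
            this.store = store;
        }

        void submit(String request) {
            store.save(request);
        }
    }

    public static void main(String[] args) {
        RequestStore store = new InMemoryRequestStore();
        new RequestService(store).submit("quote-1");
        System.out.println("Stored requests: " + store.count());
    }
}
```

Swapping the in-memory store for a database-backed one would then only require a new RequestStore implementation; RequestService would stay unchanged.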
To learn more about the SOLID principles, read http://blog.synopse.info/post/2011/11/27/SOLID-design-principles
In summary, watch for three symptoms of a poorly designed object model:
Rigidity – Hard to change something because every change affects too
many other parts of the system;
Fragility – When you make a change, unexpected parts of the system
break;
Immobility – Hard to reuse in another application because it cannot
be disentangled from the current application.
Special thanks to A. Bouchez; source: http://blog.synopse.info/post/2011/11/27/SOLID-design-principles
Invoice and quote are two completely different things even if they look similar. It's better to keep them separate, because changes to one might produce unwanted side effects in the other.

DDD: Do item counts belong in domain model?

Say you are modeling a forum and you are doing your best to make use of DDD and CQRS (just the separate read model part). You have:
Category {
int id;
string name;
}
Post {
int id;
int categoryId;
string content;
}
Every time that a new post has been created a domain event PostCreated is raised.
Now, our view wants to project count of posts for each category. My domain doesn't care about count. I think I have two options:
Listen for PostCreated on the read model side and increment the count using something like CategoryQueryHandler.incrementCount(categoryId).
Listen for PostCreated on domain side and increment the count using something like CategoryRepo.incrementCount(categoryId).
The same question goes for all the other counts like number of posts by user, number of comments in a post, etc. If I don't use these counts anywhere except my views should I just have my query handlers take care of persisting them?
And finally if one of my domain services will ever want to have a count of posts in category do I have to implement the count property onto the category domain model or can that service simply use read model query to get that count or alternatively a repository query such as CategoryRepo.getPostCount(categoryId).
My domain doesn't care about count.
This is equivalent to saying that you don't have any invariant that requires or manages the count. Which means that there isn't an aggregate where count makes sense, so the count shouldn't be in your domain model.
Implement it as a count of PostCreated events, as you suggest, or by running a query against the Post store, or.... whatever works for you.
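For the "count of PostCreated events" variant, a read-model projection could be sketched roughly as below. The PostCreated field, the CategoryPostCounts class and the in-memory map are assumptions for illustration; a real read model would persist the counts in whatever store the views query.

```java
import java.util.HashMap;
import java.util.Map;

class PostCountProjectionSketch {
    // Domain event named in the question; its field is an assumption.
    static final class PostCreated {
        final int categoryId;
        PostCreated(int categoryId) { this.categoryId = categoryId; }
    }

    // Hypothetical read-side projection: it only listens to events and never
    // touches the Category or Post domain objects.
    static final class CategoryPostCounts {
        private final Map<Integer, Integer> counts = new HashMap<>();

        // Called once for every PostCreated event delivered to the read side.
        void on(PostCreated event) {
            counts.merge(event.categoryId, 1, Integer::sum);
        }

        // Query used by the view; categories without posts report zero.
        int countFor(int categoryId) {
            return counts.getOrDefault(categoryId, 0);
        }
    }

    public static void main(String[] args) {
        CategoryPostCounts projection = new CategoryPostCounts();
        projection.on(new PostCreated(7));
        projection.on(new PostCreated(7));
        projection.on(new PostCreated(8));
        System.out.println("Posts in category 7: " + projection.countFor(7));
    }
}
```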
If I don't use these counts anywhere except my views should I just have my query handlers take care of persisting them?
That, or anything else in the read model -- but you don't even need that much if your read model supports something like select categoryId, count(*) from posts...
domain services will ever want to have a count of posts in category
That's a pretty strange thing for a domain service to want to do. Domain services are generally stateless query support - typically they are used by an aggregate to answer some question during command processing. They don't actually enforce any business invariant themselves, they just support an aggregate in doing so.
Querying the read model for counts to be used by the write model doesn't make sense, on two levels. First, that the data in the read model is stale - any answer you get from that query can change between the moment that you complete the query and the moment when you attempt to commit the current transaction. Second, once you've determined that stale data is useful, there's no particular reason to prefer the stale data observed during the transaction to stale data prior. Which is to say, if the data is stale anyway, you might as well pass it to the aggregate as a command argument, rather than hiding it in a domain service.
OTOH, if your domain needs it -- if there is some business invariant that constrains the count, or one that uses the count to constrain something else -- then that invariant needs to be captured in some aggregate that controls the count state.
Edit
Consider two transactions running concurrently. In transaction A, Aggregate id:1 is running a command that requires the count of objects, but the aggregate doesn't control that count. In transaction B, Aggregate id:2 is being created, which changes the count.
Simple case, the two transactions happen by luck to occur in contiguous blocks
A: beginTransaction
A: aggregate(id:1).validate(repository.readCount())
A: repository.save(aggregate(id:1))
A: commit
// aggregate(id:1) is currently valid
B: beginTransaction
B: aggregate(id:2) = aggregate.new
B: repository.save(aggregate(id:2))
B: commit
// Is aggregate(id:1) still in a valid state?
I contend that, if aggregate(id:1) is still in a valid state, then its validity doesn't depend on the timeliness of the repository.readCount() -- using the count prior to the beginning of the transaction would have been just as good.
If aggregate(id:1) is not in a valid state, then its validity depends on data outside its own boundary, which means that the domain model is wrong.
In the more complicated case, the two transactions can be running concurrently, which means that we might see the save of aggregate(id:2) happen between the read of the count and the save of aggregate(id:1), like so
A: beginTransaction
A: aggregate(id:1).validate(repository.readCount())
// aggregate(id:1) is valid
B: beginTransaction
B: aggregate(id:2) = aggregate.new
B: repository.save(aggregate(id:2))
B: commit
A: repository.save(aggregate(id:1))
A: commit
It may be useful to consider also why having a single aggregate that controls the state fixes the problem. Let's change this example up, so that we have a single aggregate with two entities....
A: beginTransaction
A: aggregate(version:0).entity(id:1).validate(aggregate(version:0).readCount())
// entity(id:1) is valid
B: beginTransaction
B: entity(id:2) = entity.new
B: aggregate(version:0).add(entity(id:2))
B: repository.save(aggregate(version:0))
B: commit
A: repository.save(aggregate(version:0))
A: commit
// throws VersionConflictException
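The last trace relies on optimistic concurrency: the save succeeds only if the stored version still equals the version that was loaded. Below is a minimal in-memory sketch of that check. The Aggregate, Repository and withAddedEntity names are assumptions for illustration, and a real repository would perform the compare-and-update atomically inside the system of record.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockSketch {
    static final class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) { super(message); }
    }

    // Immutable snapshot of an aggregate as seen by one transaction.
    static final class Aggregate {
        final String id;
        final int version;      // version observed when the snapshot was loaded
        final int entityCount;  // the count that the aggregate itself controls

        Aggregate(String id, int version, int entityCount) {
            this.id = id;
            this.version = version;
            this.entityCount = entityCount;
        }

        Aggregate withAddedEntity() {
            return new Aggregate(id, version, entityCount + 1);
        }
    }

    // Hypothetical repository: rejects a save that is based on a stale version.
    static final class Repository {
        private final Map<String, Aggregate> stored = new HashMap<>();

        void create(String id) {
            stored.put(id, new Aggregate(id, 0, 0));
        }

        Aggregate load(String id) {
            return stored.get(id);
        }

        void save(Aggregate changed) {
            Aggregate current = stored.get(changed.id);
            if (current.version != changed.version) {
                throw new VersionConflictException(
                        "expected version " + changed.version + " but found " + current.version);
            }
            // Accept the write and bump the version so concurrent stale writers fail.
            stored.put(changed.id,
                    new Aggregate(changed.id, changed.version + 1, changed.entityCount));
        }
    }

    public static void main(String[] args) {
        Repository repository = new Repository();
        repository.create("agg");
        Aggregate seenByA = repository.load("agg");   // A reads version 0
        Aggregate seenByB = repository.load("agg");   // B reads version 0
        repository.save(seenByB.withAddedEntity());   // B commits first
        try {
            repository.save(seenByA);                 // A is now stale
        } catch (VersionConflictException e) {
            System.out.println("Conflict: " + e.getMessage());
        }
    }
}
```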
Edit
The notion that the commit (or the save, if you prefer) can throw is an important one. It highlights that the model is a separate entity from the system of record. In the easy cases, the model prevents invalid writes and the system of record prevents conflicting writes.
The pragmatic answer may be to allow this distinction to blur. Trying to apply a constraint to the count is an example of Set Validation. The domain model is going to have trouble with that unless a representation of the set lies within an aggregate boundary. But relational databases tend to be good at sets - if your system of record happens to be a relational store, you may be able to maintain the integrity of the set by using database constraints/triggers.
Greg Young on Set Validation and Eventual Consistency
How you approach any problem like this should be based on an understanding of the business impact of the particular failure. Mitigation, rather than prevention, may be more appropriate.
When it comes to counts of things I think one has to consider if you actually need to save the count to the DB or not.
In my view in most cases you do not need to save counts unless their calculation is very expensive. So I would not have a CategoryQueryHandler.incrementCount or CategoryRepo.incrementCount.
I would just have a PostService.getPostCount(categoryId) that runs a query like
SELECT COUNT(*)
FROM Post
WHERE CategoryId = :categoryId
and then call it when your PostCreated event fires.

Is 'identity' an optional characteristic for real world objects?

Somewhere I've read
An object has three characteristics:
state (e.g. name)
behavior (e.g. reading)
identity (e.g. unique id number of a student)
As per this information, every object will have unique identification, so that all objects of a class will be different from each other.
But in many other places I've read that objects have two characteristics:
state
behavior
Question:
Which one is true? Do objects have 2 characteristics or 3?
Suppose there are two erasers of the same brand, look, shape, size and color.
Should these two objects be treated as 'equal objects', since there is nothing to uniquely identify them?
Explicit identity is optional.
However, the culmination of an object's states and behaviors is an implicit identity; thus,
Implicit identity is required.
Two objects can have the exact same intrinsic characteristics (eg. color, size, shape), but differ in their extrinsic characteristics (eg. location, owner).
In this way, two objects may be considered equivalent when compared by a selection of their properties, but would be considered distinct in terms of the culmination of all intrinsic and extrinsic states and behaviors.
In your provided analogy, you bring up two identical erasers with the same characteristics. If you show them to us, and ask us, are they different, we will say "No. They are the same."
However, if you were to ask us if these two erasers are actually the same singular eraser, we'd wonder how we got to Philosophy SE.
Identity does not have to be explicitly defined. Take the case of String for example.
If I do:
String a = "ABCDE";
String b = "ABCDEFG".substring(0, 5); //turns into ABCDE
We have two Strings storing identical information ABCDE
We can do two comparisons:
a == b //false
a.equals(b) //true
These two Strings are like your two erasers. They are equal in that they both consist of ABCDE, but they aren't actually the same singular String, but two separate sets of characters that are coincidentally the same thing.
Here a and b are two distinct references, each pointing to its own String object that contains "ABCDE". We never assigned an explicit identity, but because the references are distinct, the language knows, "Hey, these are two different Strings."
Now, let's return to the eraser example. In this case, we haven't been provided any sort of way to differentiate the two, but we can still differentiate the two.
One eraser is on the left, one is on the right (or however they're arranged)
These erasers have implicitly, by us, been given an identity so we recognize that they're two different erasers with identical properties.
We can explicitly define an identity by giving erasers serial numbers or names. They may look the same, but they now have explicit identities rather than the ones we have made up in our head.
Value Objects (for instance date, color, etc.) by nature have no notion of identity.
Two distinct Color objects may be equal as long as their properties (values) are equal (checked by hashCode/equals).
It is quite possible that your design leads you to define Color as an Entity, but only in very few cases.
By contrast, each entity (user123, car456, etc.) has a unique id, and comparisons are usually made through this id. hashCode/equals only take the id into account.
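The difference can be sketched in Java as follows. The Color and User classes and their fields are assumptions made up for this example: Color compares all of its values, while User compares only its id.

```java
import java.util.Objects;

class EqualitySketch {
    // Value object: no identity of its own, equality is defined by all of its values.
    static final class Color {
        final int red;
        final int green;
        final int blue;

        Color(int red, int green, int blue) {
            this.red = red;
            this.green = green;
            this.blue = blue;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Color)) return false;
            Color other = (Color) o;
            return red == other.red && green == other.green && blue == other.blue;
        }

        @Override
        public int hashCode() {
            return Objects.hash(red, green, blue);
        }
    }

    // Entity: equality is defined by the id alone; other state may change over time.
    static final class User {
        final long id;
        String name;

        User(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof User)) return false;
            return id == ((User) o).id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Color(255, 0, 0).equals(new Color(255, 0, 0)));
        System.out.println(new User(123, "old name").equals(new User(123, "new name")));
        System.out.println(new User(123, "same").equals(new User(456, "same")));
    }
}
```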
So if I want to make a rule for both, it would be:
object has two characteristics: state and behavior
I would argue that you can regard Identity as being part of State... The vast majority of classes will have some form of identity stored with them, but it's not a hard and fast rule. Consider, for instance, strings...
Out in the real world though, the vast majority of what you do will involve talking to databases, and joining information together. Keys are crucial to that, and your identity is basically your key... In the old days just because it was in your database didn't mean you would have it in your class, but in this era of Object Relational Mappers I'd get used to the idea if I were you...
If two objects are completely the same in every way, there's no way to distinguish between them, so they're the same object. Distinct objects — those that are not the same object — are distinct because they differ in some way. Your two erasers may have the same brand, look, shape, size, and color, but they differ in their position: they don't physically overlap in space. You can put them on the table next to each other and see them in their distinct locations, so you know they're distinct objects.
You may find it useful to consider two distinct objects equivalent even if they differ in some of their properties — two erasers in different physical locations but the same in other characteristics, or two data structures at different memory addresses but containing the same data. This is the difference between Java's equals() method and its == operator.
You are describing the difference between Reference Equality (literally the same object) and Value Equality (have the same properties). Whether or not a given object should have Reference Equality or Value Equality is NOT dependent solely on the object itself; it depends on the context. In most contexts, a dollar bill has value equality: one is just as good as another, we interchange them freely, and we don't care at all about the "identity" of each individual bill. However, to a counterfeit specialist working for the treasury department, all dollar bills are not the same, and the identity of the individual bill matters a lot.
Another example might be an airplane. If I am getting on an American Airlines flight, whether or not the airplane is a 737 or an S80 matters to me: one has 6 seats in a row while the other has 5, one has AC power in every seat while the other does not. But I only care about the properties of the plane, not the identity; one 737 is just as good as another to me. But to the mechanics who maintain the planes, the identity matters a great deal. One plane has just been serviced, while another is approaching its service deadline; keeping track of which one is which is extremely important.
So before you decide how to model your object, consider the context in which it will be used.
I endorse much of @Wyzard's answer, but it doesn't go far enough. To be clear:
YES! Identity is optional for real-world objects.
Even objects that are not truly identical in every way--two potato chips, for example--are for all practical intents and purposes identical. They are identical in use. No one cares about their individuality. There is no naming scheme, no serial numbers, no hash values to distinguish them in any way. Your two erasers or two oysters you just bought at the local raw seafood bar may have some different characteristics, and they have a physics identity (i.e. they can't occupy the same space at the same time), but there is no meaningful way to talk about them other than pointing at them ("this one is bigger than that one!") or describing them ("the blue eraser--no, that other one that looks like it's been slightly used"). Their identity is only transient, and only what individual users assign. Very few people bother naming their erasers or their oysters. ("Go to work, Fred! Erase!" or "Francine was delicious! Bring me another! I shall name her Sally Wellfleet!")
This no-real-identity is true of many manufactured products, such as nails, pieces of lumber, ball bearings, Ibuprofen caplets, or bottles of Ibuprofen. Manufacturers often track cohort identity--the batch number of which they were a part, for example. But that is the finest granularity for which true identity information is created or considered.
Now, this isn't generally true of electronic devices. Even the humblest Ethernet adaptor, Bluetooth transponder, or RFID tag has an elaborate identification system. It has a manufacturer, a model/part number, a serial number, and often a designed identity (device id). There may also be a "current address" like a MAC address or other "I am operating at/as" identity. Many of these pieces of identifying information are available via reflection. Individual chips that make up the device may use a manufacturing "batch id" system, but the operational device has a more overt individual identity.
But most real-world objects are not electronic transponders, and they have no meaningful identity other than what we assign.

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a reasonably well-designed database schema in a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response times out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have a few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base (parent) object type for it that has a subset of its attributes, and then add the new attributes on top (extending the base object type).
Once you get used to thinking like said above, next thing is about inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe it here.
You can hold inheritance chain in the database by following one of the 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, and it must be faster than a join for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table so easy to search all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
Which one to choose?
It's a trade-off obviously. If you expect to have many types of objects added, I would go with Concrete table inheritance that gives reasonable query and scaling options. Class table inheritance seems to be not very friendly with fast queries and scalability. Single table inheritance seems to work with small number of types better.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value (or name-value pair) logic on the web... it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes... the same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least one third of them are null). Adding a new attribute means altering the table to add another column and coming up with a script to populate it for existing rows, or just leaving it null for all of them. Not the most fun, and it can be a headache.
The alternative to this is a name-value pair setup. You want a 'header' table to hold the values common to all your products (like name or price... things that all products always have). In our example above, you will notice that attribute 'a' is used on every record, which means attribute a can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product and assigns an ID to each. We'll call the table attribute, with attr_id as the key. Rather straightforward: each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
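To show how filtering could work over such detail rows, here is a small in-memory sketch that counts, per product, how many of the requested attribute/value filters match (the kind of match score mentioned in the question). The Detail class and the score method are assumptions for illustration, and the sketch assumes each product has at most one row per attribute; in a real system this would be a query against the detail table.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EavFilterSketch {
    // One row of the detail table: product key, attribute label, value.
    static final class Detail {
        final String productId;
        final String attribute;
        final String value;

        Detail(String productId, String attribute, String value) {
            this.productId = productId;
            this.attribute = attribute;
            this.value = value;
        }
    }

    // Returns, for every product with at least one match, the number of filters it satisfies.
    // filters maps an attribute label to the value the user asked for.
    static Map<String, Integer> score(List<Detail> details, Map<String, String> filters) {
        Map<String, Integer> scores = new HashMap<>();
        for (Detail d : details) {
            // filters.get returns null for attributes that are not filtered on,
            // and equals(null) is false, so such rows are simply skipped.
            if (d.value.equals(filters.get(d.attribute))) {
                scores.merge(d.productId, 1, Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Detail> details = List.of(
                new Detail("prd1", "b", "5 mins"),
                new Detail("prd1", "c", "needs spare jack"),
                new Detail("prd2", "d", "misc text"),
                new Detail("prd3", "b", "15 mins"));
        Map<String, String> filters = Map.of("b", "5 mins", "c", "needs spare jack");
        System.out.println(score(details, filters));
    }
}
```

A product that matches all filters has a score equal to the number of filters; lower scores indicate partial matches.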
I believe there is a wiki page for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply figuring out the best methodology to pivot out your data (I'd recommend Postgres as an opensource db option here)

Accessing the join table in a hql query for a many-to-many relationship in grails

I have 2 domain classes with a many-to-many relationship in grails: decks and cards.
The setup looks like this:
class Deck {
static hasMany = [cards: Card]
}
class Card {
static hasMany = [decks: Deck]
static belongsTo = Deck
}
After I delete a deck, I want to also delete all cards which no longer belong to a deck. The easiest way to accomplish this is to write something like the following sql:
delete from card where card.id not in(select card_id from deck_cards);
However, I can't figure out how to write an HQL query which will resolve to this SQL, because the join table, deck_cards, does not have a corresponding grails domain class. I can't write this statement using normal joins because HQL doesn't let you use joins in delete statements, and if I use a subquery to get around this restriction MySQL complains because you're not allowed to refer to the table you're deleting from in the "from" section of the subquery.
I also tried using the hibernate "delete-orphan" cascade option but that results in all cards being deleted when a deck is deleted even if those cards also belong to other decks. I'm going crazy - this seems like it should be a simple task.
edit
There seems to be some confusion about this specific use of "decks" and "cards". In this application, the "cards" are flashcards and there can be tens of thousands of them in a deck. Also, it is sometimes necessary to make a copy of a deck so that users can edit it as they see fit. In this scenario, rather than copying all the cards over, the new deck will just reference the same cards as the old deck, and if a card is changed only then will a new card be created. Also, while I can do this delete in a loop in groovy, it will be very slow and resource-intensive since it will generate tens of thousands of sql delete statements rather than just 1 (using the above sql). Is there no way to access a property of the join table in HQL?
First, I don't see the point in your entities.
It is illogical to make a card belong to more than one deck. And it is illogical to have both belongsTo and hasMany.
Anyway, Don't use HQL for delete.
If you actually need a OneToMany, use session.remove(deck) and set the cascade of cards to REMOVE or ALL.
If you really want ManyToMany, do the checks manually on the entities. In pseudocode (since I don't know grails):
for (Card card : deck.cards) {
if (card.decks.size == 0) {
session.remove(card);
}
}
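The same check can be sketched with plain in-memory objects, without any Hibernate or Grails calls. The Card and Deck classes and the removeDeck helper below are assumptions for illustration; the sketch first detaches the deck from each card and only then tests whether the card has any decks left.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrphanCardSketch {
    static final class Card {
        final String name;
        final Set<Deck> decks = new HashSet<>();
        Card(String name) { this.name = name; }
    }

    static final class Deck {
        final String name;
        final Set<Card> cards = new HashSet<>();
        Deck(String name) { this.name = name; }

        // Keeps both sides of the many-to-many association in sync.
        void add(Card card) {
            cards.add(card);
            card.decks.add(this);
        }
    }

    // Detaches the deck from all of its cards and returns the cards that are
    // left without any deck; those are the ones that would be deleted.
    static List<Card> removeDeck(Deck deck) {
        List<Card> orphans = new ArrayList<>();
        for (Card card : deck.cards) {
            card.decks.remove(deck);
            if (card.decks.isEmpty()) {
                orphans.add(card);
            }
        }
        deck.cards.clear();
        return orphans;
    }

    public static void main(String[] args) {
        Deck original = new Deck("original");
        Deck copy = new Deck("copy");
        Card shared = new Card("shared");
        Card onlyInOriginal = new Card("only in original");
        original.add(shared);
        copy.add(shared);
        original.add(onlyInOriginal);
        for (Card orphan : removeDeck(original)) {
            System.out.println("Would delete: " + orphan.name);
        }
    }
}
```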
I won't be answering the technical side, but challenging the model. I hope this will also be valuable to you :-)
Functionally, it seems to me that your two objects don't have the same lifecycle:
Decks are changing : they are created, filled with Cards, modified, and deleted. They certainly need to be persisted to your database, because you wouldn't be able to recreate them using code otherwise.
Cards are constant : the set of all cards is known from the beginning, they keep existing. If you delete a Card once in the database, then you will need to recreate the same Card later when someone needs to put it in a Deck, so in all cases you will have a data structure that is responsible for providing the list of possible Cards. If they are not saved in your database, you could recreate them...
In the model you give, the cards have a set of Decks that hold them. But that information has the same lifecycle as the Decks (changing), so I suggest holding the association only on the Deck's side (a uni-directional Many-To-Many relationship).
Once you've done that, your Cards are really constant information, so they don't even need to be persisted in the database. You would still have a second table (in addition to the Deck table), but that Card table would only contain the identifying information for the Card (it could be a simple integer 1 to 52, or two values, depending on what you need to "select" in your queries), and no other fields (an image, the strength, some points, etc.).
In Hibernate, these choices turn the Many-To-Many relationship into a Collection of values (see the Hibernate reference).
With a Collection of Values, Card is not an Entity but a Component. And you don't have to delete them, everything is automatically taken care by Hibernate.