ORM and many-to-many relationships

ORM and many-to-many relationships - nhibernate

This is more or less a general question and not about any specific ORM or language in particular: this question comes up regardless of your ORM preference.
When mapping a many-to-many relationship it is possible to obscure the intermediary table or to make the intermediary table a part of your model. In the case that the intermediary table has valuable data beyond the relationship, how do you handle the mapping?
Consider the following tables:
CaseWorker (id, first_name, last_name)
CaseWorkerCases (case_worker_id, case_id, date_opened, date_closed)
Case (id, client_id, field_a, field_b)
As a programmer I would really rather be able to do:
CaseWorker.Cases
than
CaseWorker.CaseWorkerCases.Cases
On the one hand, the table CaseWorkerCases contains useful data and hiding the intermediary table makes accessing that data less than convenient. On the other, having to navigate through the intermediary table makes the common task of accessing Cases seem awkward.
I supose one solution could be to expose the intermediate table in the model and then give the CaseWork object a wrapper property could work. Something like:
public IEnumerable<Case> Cases
{
get{return (from caseWorkerCase in this.CaseWorkerCases
select caseWorkerCase.Case);}
}
But that also seems wrong.

I regard many-to-many mappings as just a notational abbreviation for two one-to-many mappings with the intermediate table, as you call it, enabling simplification of the relationships. It only works where the relationships do not have attributes of their own. However, as understanding of the particular domain improves, I usually find that many-to-many mappings usually need to be broken down to allow attributes to be attached. So my usual approach these days is to always simply use one-to-many mappings to start with.

I don't think your workaround is wrong. The complexities of these models have to be coded somewhere.
I have a blog post about this exact topic: Many-to-many relationships with properties

Related

Core Data ordered many-to-many relationships

Using Core Data, I have two entities that have many-to-many relationships. So:
Class A <<---->> Class B
Both relationships are set up as 'ordered' so I can track they're order in a UITableView. That works fine, no problem.
I am about to try and implement iCloud with this Core Data model, and find out that iCloud doesn't support ordered relationships, so I need to reimplement the ordering somehow.
I've done this with another entity that has a one-to-many relationship with no problem, I add an 'order' attribute to the entity and store it's order information there. But with a many-to-many relationship I need an unknown number of order attributes.
I can think of two solutions, neither of which seem ideal to me so maybe I'm missing something;
Option 1. I add an intermediary entity. This entity has a one-to-many relationship with both entities like so:
Class A <<--> Class C <-->> Class B
That means I can have the single order attribute in this helper entity.
Option 2. Instead of an order attribute that stores a single order number, I store a dictionary that I can store as many order numbers as I need, probably with the corresponding object (ID?) as the key and the order number as the value.
I'm not necessarily looking for any code so any thoughts or suggestions would be appreciated.

I think your option 1, employing a "join table" with an order attribute is the most feasible solution for this problem. Indeed, this has been done many times in the past. This is exactly the case for which you would use a join table in Core Data although the framework already gives you many-to-many relationships: if you want to store information about the relationship itself, which is precisely your case. Often these are timestamps, in your case it is a sequence number.
You state: "...solutions, neither of which seem ideal to me". To me, the above seems indeed "ideal". I have used this scheme repeatedly with great performance and maintainability.
The only problem (though it is the same as with a to-one relationship) is that when inserting an item out of sequence you have to update many entities to get the order right. That seems cumbersome and could potentially harm performance. In practice, however, it is quite manageable and performs rather well.
NB: As for arrays or dictionaries to be stored with the entity to keep track of ordering information: this is possible via so-called "transformable" attributes, but the overhead is daunting. These attributes have to be serialized and deserialized, and in order to retrieve one sequence number you have to get all of them. Hardly an attractive design choice.

Before we had ordered relationships for more than 10 years, everyone used a "helper" entity. So that is the thing that you should do.
Additional note 1: This is no "helper" entity. It is a entity that models a fact in your model. In my books I always had the same example:
You have a group entity with members. Every member can belong to many groups. The "helper" entity is nothing else than membership.
Additional note 2: It is hard to synchronize such an ordered relationship. This is why it is not done automatically. However, you have to do it. Since CD and synchronizing is no fun, CD and synchronizing a model with ordered relationship is less than no fun.

Why no many-to-many relationships?

I am learning about databases and SQL for the first time. In the text I'm reading (Oracle 11g: SQL by Joan Casteel), it says that "many-to-many relationships can't exist in a relational database." I understand that we are to avoid them, and I understand how to create a bridging entity to eliminate them, but I am trying to fully understand the statement "can't exist."
Is it actually physically impossible to have a many-to-many relationship represented?
Or is it just very inefficient since it leads to a lot of data duplication?
It seems to me to be the latter case, and the bridging entity minimizes the duplicated data. But maybe I'm missing something? I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship, either in the text or anywhere else I've searched. I've been searching all day and only finding the same information repeated: "don't do it, and use a bridging entity instead." But I like to ask why. :-)
Thanks!

Think about a simple relationship like the one between Authors and Books. An author can write many books. A book could have many authors. Now, without a bridge table to resolve the many-to-many relationship, what would the alternative be? You'd have to add multiple Author_ID columns to the Books table, one for each author. But how many do you add? 2? 3? 10? However many you choose, you'll probably end up with a lot of sparse rows where many of the Author_ID values are NULL and there's a good chance that you'll run across a case where you need "just one more." So then you're either constantly modifying the schema to try to accommodate or you're imposing some artificial restriction ("no book can have more than 3 authors") to force things to fit.

A true many-to-many relationship involving two tables is impossible to create in a relational database. I believe that is what they refer to when they say that it can't exist. In order to implement a many to many you need an intermediary table with basically 3 fields, an ID, an id attached to the first table and an id atached to the second table.
The reason for not wanting many-to-many relationships, is like you said they are incredibly inefficient and managing all the records tied to each side of the relationship can be tough, for instance if you delete a record on one side what happens to the records in the relational table and the table on the other side? Cascading deletes is a slippery slope, at least in my opinion.

Normally (pun intended) you would use a link table to establish many-to-many
Like described by Joe Stefanelli, let's say you had Authors and Books
SELECT * from Author
SELECT * from Books
you would create a JOIN table called AuthorBooks
Then,
SELECT *
FROM Author a
JOIN AuthorBooks ab
on a.AuthorId = ab.AuthorId
JOIN Books b
on ab.BookId = b.BookId
hope that helps.

it says that "many-to-many relationships can't exist in a relational database."
I suspect the author is just being controversial. Technically, in the SQL language, there is no means to explicitly declare a M-M relationship. It is an emergent result of declaring multiple 1-M relations to the table. However, it is a common approach to achieve the result of a M-M relationship and it is absolutely used frequently in databases designed on relational database management systems.
I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship,
They should be used where they are appropriate to be used would be a more accurate way of saying this. There are times, such as the books and authors example given by Joe Stafanelli, where any other solution would be inefficient and introduce other data integrity problems. However, M-M relationships are more complicated to use. They add more work on the part of the GUI designer. Thus, they should only be used where it makes sense to use them. If you are highly confident that one entity should never be associated with more than one of some other entity, then by all means restrict it to a 1-M. For example, if you were tracking the status of a shipment, each shipment can have only a single status at any given time. It would over complicate the design and not make logical sense to allow a shipment to have multiple statuses.

Of course they can (and do) exist. That sounds to me like a soapbox statement. They are required for a great many business applications.
Done properly, they are not inefficient and do not have duplicate data either.
Take a look at FaceBook. How many many-to-many relationships exist between friends and friends of friends? That is a well-defined business need.
The statement that "many-to-many relationships can't exist in a relational database." is patently false.

Many-to-many relationships are in fact very useful, and also common. For example, consider a contact management system which allows you to put people in groups. One person can be in many groups, and each group can have many members.
Representation of these relations requires an extra table--perhaps that's what your book is really saying? In the example I just gave, you'd have a Person table (id, name, address etc) and a Group table (id, group name, etc). Neither contains information about who's in which group; to do that you have a third table (call it PersonGroup) in which each record contains a Person ID and a Group ID--that record represents the relation between the person and the group.
Need to find the members of a group? Your query might look like this (for the group with ID=1):
SELECT Person.firstName, Person.lastName
FROM Person JOIN PersonGroup JOIN Group
ON (PersonGroup.GroupID = 1 AND PersonGroup.PersonID = Person.ID);

It is correct. The Many to Many relationship is broken down into several One to Many relationships. So essentially, NO many to many relationship exists!

Well, of course M-M relationship does exist in relational databases and they also have capability of handling at some level through bridging tables, however as the degree of M-M relationship increases it also increases complexity which results in slow R-W cycles and latency.
It is recommended to avoid such complex M-M relationships in a Relational Database. Graph Databases are the best alternative and good at handling Many to Many relationship between objects and that's why social networking sites uses Graph databases for handling M-M relationship between User and Friends, Users and Events etc.

Let's invent a fictional relationship (many to many relationship) between books and sales table. Suppose you are buying books and for each book you buy needs to generate an invoice number for that book. Suppose also that the invoice number for a book can represent multiple sales to the same customer (not in reality but let's assume). We have a many to many relationship between books and sales entities.
Now if that's the case, how can we get information about only 1 book given that we have purchased 3 books since all books would in theory have the same invoice number? That introduces the main problem of using a many to many relationship I guess. Now if we add a bridging entity between Books and sales such that each book sold have only 1 invoice number, no matter how many books are purchases we can still correctly identify each books.

In a many-to-many relationship there is obvious redundancy as well as insert, update and delete anomaly which should be eliminated by converting it to 2 one-to-many relationship via a bridge table.

M:N relationships should not exist in database design. They are extremely inefficient and do not make for functional databases. Two tables (entities) with a many-to-many relationship (aircraft, airport; teacher, student) cannot both be children of each other, there would be no where to put foreign keys without an intersecting table. aircraft-> flight <- airport; teacher <- class -> student.
An intersection table provides a place for an entity that is dependent on two other tables, for example, a grade needs both a class and a student, a flight needs both an aircraft and an airport. Many-to-many relationships conceal data. Intersection tables reveal this data and create one-to-many relationships that can be more easily understood and worked with. So, the question arises, what table should the flight be in--aircraft or airport. Neither, they should be foreign keys in the intersection table, Flight.

Can I (theoretically) use a Collection (e.g., Array, List) as a foreign key in a relational Database schema?

Is is possible to use a Collection such as a Multiset or Array as the foreign key in a database scheme?
Background: A student proposes to use such a construct (don't use a join table for the n:m association) to store the following object structure in a database
public class Person {
String name;
List<Resource> res;
…
}
public class Resource {
int id;
List<Person> prs;
…
}
SQL:2003

IMHO, the student didn't understand relational concepts. I don't know how collection types are implemented in todays databases, but they most probably store them in separate tables.
Edit
If it would be technically possible, I doubt that it would be useful. Consider the query language. Sql is designed for relational structures, I doubt that you could really have the same flexibility and possibilities using collection types. If you had it, you couldn't read it anymore. Consider indexes. etc. etc.
Relational structures are primitive, but very powerful and fast. You can do (almost) everything with them. Collection type are actually not needed, although they may be useful in certain cases. Using collections (for relational stuff) just would be more complex and less pure.

As David pointed out, theory allows attribute values to be of a collection type.
However, in your case, which is just to model n:m relationships (am I right about that), it simply does not apply.
If a Person P1 has associated resources R1 and R2, the row for this person would be like {P1, {R1, R2}}. If that collection-typed column were a foreign key referencing some other table, it would mean that there had to be another table in which a row appeared with the collection value {R1, R2} in some column. Which table would that be in your example ?
Collection-typed attributes are mostly useful if you have a need for dealing with empty collections alongside non-empty ones. There is no relational join in the world that will do its equivalent for you.

Simply put, I would have said no. I don't think that it is possible in SQL2003 and in any case it would couple the code and the database structure too closely. Remember good practice of structuring code so that a change to your database doesn't require a change to your code and vice versa.
As Stefan said you need separate tables for Resource and Person with Foreign Key links to the indexes between them.
So based on the classes shown each table would need 3 coloumns.
You would then obtain your class data by using an appropriate query to the database.

In principle, yes you can implement such a referential constraint. That's assuming your RDBMS allows a suitable type for the set of values. For instance it could be a relation value if relation-valued attributes (RVA) are supported.
If it was a RVA then the constraint could easily be expressed in the relational algebra / calculus or its equivalent. For instance you can do it in a RDBMS like Rel which supports the Tutorial D language. Doing it in SQL is probably going to be a lot harder - but then SQL is not a real relational language.
Of course, the fact that you can do it relationally does not necessarily make it a good idea...

Three Entity Intersection Table "many-many-many"

I'm currently working on a data model design (which will be backed by an RDMS). It is based around 3 entities relating to benefits for a member of the organization. The entities are the Benefit, Membership Type, and Provider. I've created a standard many-to-many intersection table to relate two entities, but never 3. I'm wondering if anybody has done such a thing and if there are any potential pitfalls I should keep out for. The table would be like the following:
create table BENEFIT_MEM_TYPE_PROVIDER
(
BENEFIT_ID reference BENEFITS not null,
MEMBERSHIP_TYPE_ID reference MEMBERSHIP_TYPES not null,
PROVIDER_ID reference PROVIDERS not null,
primary key (BENEFIT_ID, MEMBERSHIP_TYPE_ID, PROVIDER_ID)
)
Something about this relationship just doesn't sit right with me, so I thought I'd ask the smart folks for any advice.
Thanks

Yes.
Ternary relationships are quite widely used, although not as widely used as binary relationships. This can even be extended to quaternary or in general n-ary relationships.
One place where you can see n-ary relationships in database design is star schema. The fact table at the center of a star is an n-ary relationship, where n is the number of dimension tables referenced by the fact table.
The normalization process consists of detecting departures from some normal form, and decomposing tables in order to yield a more normalized equivalent. Decomposition sometimes results lowering the order of n-ary relationships.
Star schema deliberately goes the other way. That is why star schema and normalization are so frequently seen as being at odds with each other.

anybody has done such a thing
Yes.
if there are any potential pitfalls I should keep out for.
This can lead to issues if there are relationships among Benefit-Membership, Benefit-Provider and Membership-Provider that might appear true from data in this table, but shouldn't actually be true.
For example, a Member-Provider-Benefit might have a "restriction" that the Benefit only applies to the Member when a specific Provider is involved. It's possible that the relationship isn't generally true, but is true as a special case because of some qualification.
In this case, you'll need to carry that qualification in this table to be sure that these exceptional cases can be discovered properly by simple SQL queries.
In short, when you have more than 1 relationship, be sure the pair-wise relationships in each row are also true.

You should verify whether your table satisfies Fifth Normal Form (5NF). If it does not then it's probably better to normalize it so that it is in 5NF.

How to model a mutually exclusive relationship in SQL Server

I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am being restricted to the creation of new tables and code. If I need to alter the existing structure I think my client may reject the proposal.. although if its the only way to get it right this is what I will have to do.
I have an Item table that can me link to any number of tables, and these tables may increase over time. The Item can only me linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Example of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration"
For instance a Poster may be allocated to a Person or Office or Building. In the future if they add a Conference Room table it may also be added to that.
My intital thoughts are:
Item
{
ID,
Name
}
LinkedItem
{
ItemID,
LinkedToTableName,
LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!

It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is, that all of these entities - Person, Vehicle, whatever - are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.

Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.

your link table is ok.
the trouble you will have is that you will need to generate dynamic sql at runtime. parameterized sql does not typically allow the objects inthe FROM list to be parameters.
i fyou want to avoid this, you may be able to denormalize a little - say by creating a table to hold the id (assuming the ids are unique across the other tables) and the type_id representing which table is the source, and a generated description - e.g. the name value from the inital record.
you would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries - and then resort to your dynamic queries when needed at runtime.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas