I have a table, A, whose rows have a many-to-one relationship with several other tables. The rows in the other tables--B, C, D--each have many As. Each A relates exclusively to another table. For example you never have an A that relates to both a B and C.
Currently, the A table is represented as a "flattened" sum-type where each row has a nullable AId, BId, CId, & DId. The "parent" of any given row in A is determined by which one of these FK's in non-null.
This has been working fine so far. However, I have several new features to be implemented down the road which will also have many As.
My question is: is there a more extensible design than simply adding more columns to store FK's to these other tables?
You have a type of MUCK (massive unified code key)table, but only worse.
Instead of a table like this:
MUCK(aid, bid, cid, did, value1)
It would work better like this:
MUCK(table, id, value1)
The table is a,b,c, or d (or so on) and the id is the id within that table.
However, I really suggest you do some research on why MUCK and EAV tables can be a nightmare as you add requirements and features.
Here is a question I recently answered and a link to another that discusses why EAV (and even more, MUCK) is not always the best idea.
How to store custom entity properties in a relational database
Related
Let's say I have a database in which one entity (i.e. table) inherits from another one, for example:
Table 1, named person: (name,surname)
Table 2, named car_owner
In this case, car_owner inherits from person, i.e. a car-owner IS a person.
I'm now in a point where I have to decide whether I should:
create the table car_owner, even though it has no extra columns except the ones in person, although in the future this might change => doing this results in car_owner = table with columns (id,person_id), where person_id is FK to person
or
leave only the person table for now and only do (1) when/if extra information regarding a car-owner will appear => note that if I do this FKs to a car-owner from other tables would actually be FKs to the person table
The tables I'm dealing with have different names and semantics and the choice between (1) and (2) is not clear, because the need for extra columns in car_owner might never pop-up.
Conceptually, (1) seems to be the right choice, but I guess what I'm asking is if there are any serious issues I might run into later if I instead resort to (2)
I would suggest that option 1 is the better answer. While it creates more work to join the tables for queries, it is neater to put "optional" data in it's own table. And if more types of persons are required (pedestrian, car_driver, car_passenger) they can be accommodated with more child tables. You can always use a view to make them look like one table.
BTW for databases, we say Parent and Child, not "inherets".
To answer the part about problems/consequences of option 2 - well, none too serious. This is a database, so you can always re-arrange things later, but there will be a price to pay in rewriting queries and code if you restructure tables. Why I don't like Option 2 is because it can lead to extra tables not re-using the person part. If it looks like that table is for car_owners, I might make an entirely new table for car_passengers with duplication of all the person columns.
In short, nothing too tragic should happen with either approach, they are each preferable for different reasons and the drawbacks are mainly inconvenience and potential future messiness.
I'm working on a database structure and trying to imagine the best way to split up a host of related records into tables. Records all have the same base type they inherit from, but each then expands on it for their particular use.
These 4 properties are present for every type.
id, name, groupid, userid
Here are the types that expand off those 4 properties.
"Static": value
"Increment": currentValue, maxValue, overMaxAllowed, underNegativeAllowed
"Target": targetValue, result, lastResult
What I tried initially was to create a "records" table with the 4 base properties in it. I then created 3 other tables named "records_static/increment/target", each with their specific properties as columns. I then forged relationships between a "rowID" column in each of these secondary tables with the main table's "id".
Populating the tables with dummy data, I am now having some major problems attempting to extract the data with a query. The only parameter is the userid, beyond that what I need is a table with all of the columns and data associated with the userid.
I am unsure if I should abandon that table design, or if I just am going about the query incorrectly.
I hope I explained that well enough, please let me know if you need additional detail.
Make the design as simple as possible.
First I'd try a single table that contains all attributes that might apply to a record. Irrelevant attributes can be null. You can enforce null values for a specific type with a check constraint.
If that doesn't work out, you can create three tables for each record type, without a common table.
If that doesn't work out, you can create a base table with 1:1 extension tables. Be aware that querying that is much harder, requiring join for every operation:
select *
from fruit f
left join
apple a
on a.fruit_id = f.id
left join
pear p
on p.fruit_id = f.id
left join
...
The more complex the design, the more room for an inconsistent database state. The second option you could have a pear and an apple with the same id. In the third option you can have missing rows in either the base or the extension table. Or the tables can contradict each other, for example a base row saying "pear" with an extension row in the Apple table. I fully trust end users to find a way to get that into your database :)
Throw out the complex design and start with the simplest one. Your first attempt was not a failure: you now know the cost of adding relations between tables. Which can look deceptively trivial (or even "right") at design time.
This is a typical "object-oriented to relational" mapping problem. You can find books about this. Also a lot of google hits like
http://www.ibm.com/developerworks/library/ws-mapping-to-rdb/
The easiest for you to implement is to have one table containing all columns necessary to store all your types. Make sure you define them as nullable. Only the common columns can be not null if necessary.
Just because object share some of the same properties does not mean you need to have one table for both objects. That leads to unnecessary right outer joins that have a 1 to 1 relationship which is not what I think of as good database design.
but...
If you want to continue in your fashion I think all you need is a primary key in the table with common columns "id, name, groupid, userid" (I assume ID) then that would be the foreign key to your table with currentValue, maxValue, overMaxAllowed, underNegativeAllowed
I am learning about databases and SQL for the first time. In the text I'm reading (Oracle 11g: SQL by Joan Casteel), it says that "many-to-many relationships can't exist in a relational database." I understand that we are to avoid them, and I understand how to create a bridging entity to eliminate them, but I am trying to fully understand the statement "can't exist."
Is it actually physically impossible to have a many-to-many relationship represented?
Or is it just very inefficient since it leads to a lot of data duplication?
It seems to me to be the latter case, and the bridging entity minimizes the duplicated data. But maybe I'm missing something? I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship, either in the text or anywhere else I've searched. I've been searching all day and only finding the same information repeated: "don't do it, and use a bridging entity instead." But I like to ask why. :-)
Thanks!
Think about a simple relationship like the one between Authors and Books. An author can write many books. A book could have many authors. Now, without a bridge table to resolve the many-to-many relationship, what would the alternative be? You'd have to add multiple Author_ID columns to the Books table, one for each author. But how many do you add? 2? 3? 10? However many you choose, you'll probably end up with a lot of sparse rows where many of the Author_ID values are NULL and there's a good chance that you'll run across a case where you need "just one more." So then you're either constantly modifying the schema to try to accommodate or you're imposing some artificial restriction ("no book can have more than 3 authors") to force things to fit.
A true many-to-many relationship involving two tables is impossible to create in a relational database. I believe that is what they refer to when they say that it can't exist. In order to implement a many to many you need an intermediary table with basically 3 fields, an ID, an id attached to the first table and an id atached to the second table.
The reason for not wanting many-to-many relationships, is like you said they are incredibly inefficient and managing all the records tied to each side of the relationship can be tough, for instance if you delete a record on one side what happens to the records in the relational table and the table on the other side? Cascading deletes is a slippery slope, at least in my opinion.
Normally (pun intended) you would use a link table to establish many-to-many
Like described by Joe Stefanelli, let's say you had Authors and Books
SELECT * from Author
SELECT * from Books
you would create a JOIN table called AuthorBooks
Then,
SELECT *
FROM Author a
JOIN AuthorBooks ab
on a.AuthorId = ab.AuthorId
JOIN Books b
on ab.BookId = b.BookId
hope that helps.
it says that "many-to-many relationships can't exist in a relational database."
I suspect the author is just being controversial. Technically, in the SQL language, there is no means to explicitly declare a M-M relationship. It is an emergent result of declaring multiple 1-M relations to the table. However, it is a common approach to achieve the result of a M-M relationship and it is absolutely used frequently in databases designed on relational database management systems.
I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship,
They should be used where they are appropriate to be used would be a more accurate way of saying this. There are times, such as the books and authors example given by Joe Stafanelli, where any other solution would be inefficient and introduce other data integrity problems. However, M-M relationships are more complicated to use. They add more work on the part of the GUI designer. Thus, they should only be used where it makes sense to use them. If you are highly confident that one entity should never be associated with more than one of some other entity, then by all means restrict it to a 1-M. For example, if you were tracking the status of a shipment, each shipment can have only a single status at any given time. It would over complicate the design and not make logical sense to allow a shipment to have multiple statuses.
Of course they can (and do) exist. That sounds to me like a soapbox statement. They are required for a great many business applications.
Done properly, they are not inefficient and do not have duplicate data either.
Take a look at FaceBook. How many many-to-many relationships exist between friends and friends of friends? That is a well-defined business need.
The statement that "many-to-many relationships can't exist in a relational database." is patently false.
Many-to-many relationships are in fact very useful, and also common. For example, consider a contact management system which allows you to put people in groups. One person can be in many groups, and each group can have many members.
Representation of these relations requires an extra table--perhaps that's what your book is really saying? In the example I just gave, you'd have a Person table (id, name, address etc) and a Group table (id, group name, etc). Neither contains information about who's in which group; to do that you have a third table (call it PersonGroup) in which each record contains a Person ID and a Group ID--that record represents the relation between the person and the group.
Need to find the members of a group? Your query might look like this (for the group with ID=1):
SELECT Person.firstName, Person.lastName
FROM Person JOIN PersonGroup JOIN Group
ON (PersonGroup.GroupID = 1 AND PersonGroup.PersonID = Person.ID);
It is correct. The Many to Many relationship is broken down into several One to Many relationships. So essentially, NO many to many relationship exists!
Well, of course M-M relationship does exist in relational databases and they also have capability of handling at some level through bridging tables, however as the degree of M-M relationship increases it also increases complexity which results in slow R-W cycles and latency.
It is recommended to avoid such complex M-M relationships in a Relational Database. Graph Databases are the best alternative and good at handling Many to Many relationship between objects and that's why social networking sites uses Graph databases for handling M-M relationship between User and Friends, Users and Events etc.
Let's invent a fictional relationship (many to many relationship) between books and sales table. Suppose you are buying books and for each book you buy needs to generate an invoice number for that book. Suppose also that the invoice number for a book can represent multiple sales to the same customer (not in reality but let's assume). We have a many to many relationship between books and sales entities.
Now if that's the case, how can we get information about only 1 book given that we have purchased 3 books since all books would in theory have the same invoice number? That introduces the main problem of using a many to many relationship I guess. Now if we add a bridging entity between Books and sales such that each book sold have only 1 invoice number, no matter how many books are purchases we can still correctly identify each books.
In a many-to-many relationship there is obvious redundancy as well as insert, update and delete anomaly which should be eliminated by converting it to 2 one-to-many relationship via a bridge table.
M:N relationships should not exist in database design. They are extremely inefficient and do not make for functional databases. Two tables (entities) with a many-to-many relationship (aircraft, airport; teacher, student) cannot both be children of each other, there would be no where to put foreign keys without an intersecting table. aircraft-> flight <- airport; teacher <- class -> student.
An intersection table provides a place for an entity that is dependent on two other tables, for example, a grade needs both a class and a student, a flight needs both an aircraft and an airport. Many-to-many relationships conceal data. Intersection tables reveal this data and create one-to-many relationships that can be more easily understood and worked with. So, the question arises, what table should the flight be in--aircraft or airport. Neither, they should be foreign keys in the intersection table, Flight.
I have a SQL database with multiple tables: A, B, C, D. Entities in those tables are quite different things, with different columns, and different kind of relations between them.
However, they all share one little thing: the need for a comment system which, in this case, would have an identical structure: author_id, date, content, etc.
I wonder which strategy would be the best for this schema to have A,..D tables use a comment system. In a classical 'blog' web site I would use a one-to-many relationship with a post_id inside the 'comments' table.
Here it looks like I need an A_comments, B_comments, etc tables to handle this problem, which looks a little bit weird.
Is there a better way?
Create a comment table with a comment_id primary key and the various attributes of a comment.
Additionally, create A_comment thus:
CREATE TABLE A_comment (
comment_id PRIMARY KEY REFERENCES comment(comment_id),
A_id REFERENCES A(A_id)
)
Do likewise for B, C and D. This ensures referential integrity between comment and all the other tables, which you can't do if you store the ids to A, B, C and D directly in comment.
Declaring A_comment.comment_id as the primary key ensures that a comment can only belong to one entry in A. It doesn't prevent a comment from belonging to an entry in A and an entry in B, but there's only so much you can achieve with foreign keys; this would require database-level constraints, which no database I know of supports.
This design also doesn't prevent orphaned comments, but I can't think of any way to prevent this in SQL, except, of course, to do the very thing you wanted to avoid: create multiple comment tables.
I had a similar "problem" with comments for more different object types (e.g. articles and shops in my case, each type has it's own table).
What I did: in my comments table I have two columns that manage the linking:
object_type (ENUM type) that determines the object/table we are linking to, and
object_id (unsigned integer type that matches the primary key of your other tables (or the biggest of them)) that point to the exact row in the particular table.
The column structure is then: id, object_type, object_id, author_id, date, content, etc.
Important thing here is to have an index on both of the columns, (object_type, object_id), to make indexing fast.
I presume that you are talking of a single comments table with a foreign key to "exactly one of" A, B, C or D.
The fact that SQL cannot handle this is one of its fundamental weaknesses. The question gets asked over and over and over again.
See, e.g.
What is the best way to enforce a 'subset' relationship with integrity constraints
Your constraint is a "foreign key" from your "single comments" table into a view, which is the union of the identifiers in A, B, C and D. SQL supports only foreign keys into base tables.
Observe that SQL as a language does have support for your situation, in the form of CREATE ASSERTION. I know of no SQL products that support that statement, however.
EDIT
You should also keep in mind that with a 'single' comments table, you might need to enforce disjointness between the keys in A,B,C and D, for otherwise it might happen some time that a comment automatically gets "shared" between entity occurrences from different tables, which might not be desirable.
You could have a single comments table and within that table have a column that contains a value differentiating which table the comment belongs to - ie a 1 in that column means it's a comment for table A, 2 for table B, and so on. If you didn't want to have "magic numbers" in the comments table, you could have another table that has just two columns: one with the number and another detailing which table the number represents.
You don't need a separate comment table for every other, one is enough. Every comment will have a unique ID, so you don't have to worry about conflicts.
I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am being restricted to the creation of new tables and code. If I need to alter the existing structure I think my client may reject the proposal.. although if its the only way to get it right this is what I will have to do.
I have an Item table that can me link to any number of tables, and these tables may increase over time. The Item can only me linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Example of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration"
For instance a Poster may be allocated to a Person or Office or Building. In the future if they add a Conference Room table it may also be added to that.
My intital thoughts are:
Item
{
ID,
Name
}
LinkedItem
{
ItemID,
LinkedToTableName,
LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!
It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is, that all of these entities - Person, Vehicle, whatever - are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.
Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.
your link table is ok.
the trouble you will have is that you will need to generate dynamic sql at runtime. parameterized sql does not typically allow the objects inthe FROM list to be parameters.
i fyou want to avoid this, you may be able to denormalize a little - say by creating a table to hold the id (assuming the ids are unique across the other tables) and the type_id representing which table is the source, and a generated description - e.g. the name value from the inital record.
you would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries - and then resort to your dynamic queries when needed at runtime.