I am a bit confused with the partial keys. 'Database System Concepts by Korth' says the following:
Although the weak entity set does not have a primary key, we
nevertheless need a means of distinguishing among all those entities in
the weak entity set that depend on one particular strong entity. The
discriminator of a weak entity set is a set of attributes that allows
this distinction to be made. The discriminator of a weak entity set is
also called the partial key of the entity set.
My confusion is that if the discriminator/partial keys of weak entities are able to uniquely identify the set of attributes, then it should be called primary key, instead of partial keys, as primary keys are those which can uniquely identify all the attributes of a relation.
Also, while surfing the web, I came across a definition of partial key, which says:
'A partial key is a key using which all the records of the table can not be identified uniquely'
It raises a question in my mind, that suppose if a table consists of a primary key which is made up of two or more attributes, then if we pick up a single attribute from this, then will it be called partial key, as that attribute is part of a primary key, but by itself it can't uniquely identify all attributes in a relation.
The definition doesn't say that "the discriminator/partial keys of weak entities are able to uniquely identify" within a table. It says that one identifies a weak entity within a particular strong entity.
Technical terms only mean what they are defined to mean in a certain context of assumptions, including other definitions. You can't expect the same term to mean the same thing everywhere. You can't just look at the text of a definition & make assumptions about what situations it applies to & what its technical terms mean or even whether a word is used in a technical or everyday meaning. When someone uses a term you have to make sure that you know what they mean by it.
A relational superkey uniquely identifies a row. A CK (candidate key) is a superkey that contains no smaller superkey. A PK (primary key) is just some CK you decided to call the PK.) So being unique is not a reason to call something a PK or CK. (An SQL PK/UNIQUE is analogous to a relational superkey.)
The book method generates discriminators that are not superkeys. So we can say that it agrees with the web definition--for cases that come up in that method. But if a method allowed generation of discriminators that were CKs or PKs then its use of that textbook wording would define "partial key" to be a different sort of thing than the web definition. Such a method couldn't use (relational) "PK" for a strong id plus discriminator, because it would be a superkey but not a CK or PK. (But it could still use SQL "PK" since that approximately means primary superkey.)
I really think this type of descriptions stems from the very first step in any modelling process, and one which anyone with any data modelling experience would just fix without even thinking about it.
The wiki page on "Weak Entity" gives the classic example of a Header/Detail pair, where the detail by itself doesn't have a reference to the header. Think of a two page document where page one is the header, page two is the details.
By itself, page two can not uniquely identify a row, but of course anyone would automatically add the header FK so we can uniquely identify a row.
Haven't seen the book you are reading, but I think that's what its getting at. So I think all your subsequent reasoning is correct. Have a look at the wiki page for more info.
Related
I'm having a hard time converting my database tables and foreign keys to a class diagram with classes and associations.
My question is:
"In in a composition relationship, should a child class always should have an ID field?".
In my CD, there are 2 compositor classes: PurchaseItem and PurchaseFinisher, which composite Purchase class. PurchaseItem already comes with an ID field from its table but, PurchaseFinisher doesn't because it is filtered by the id_purchase and id_payment_method foreign keys.
thanks in advance.
This is my DB diagram:
I can't see redundancy in between Purchase or Product, as you said. Could you, please, show me that based on my DB diagram? My tables are well modeled (hope so). My fault is in the classes definition.
In a class diagram, no class requires an id property: each class instance (aka object) has its own identity with or without explicit id property.
In a database, you need of course an explicit id property to uniquely identify the object among others in the database and find it back. By the way, you may annotate such properties with a trailing {id} . UML does not define any semantic for it, but it is in general sufficiently expressive to help database designers.
In the case of composition, the main question is whether a composed object can easily be identified by alternate means. There are several related ORM database techniques, for example:
you can use the owning object’s id together with another property if this is sufficient to identify the element. The two together would make a composite primary key in database.
you can use a unique id to identify the object (surrogate primary key) and use the id of the owning object as foreign key.
For PurchaseItem you have everything that is needed, although the diagram does not tell which of the two approaches you’ll use (e.g is the id unique globally, or unique within the purchase?).
But for PurchaseFinisher it is unclear if you could uniquely identify an occurence. If a payment method can only be used once per purchase, it’s fine as it may be used to identify the object.
If it would be allowed to pay two times the same amount (half of the overall price) in the same currency with the same payment methods, you’d have undistinguishable duplicates. So, some kind of identifier will be needed from the database point of view.
I have the following problem that I have multiple scenarios that might be right or wrong, I've been searching on this for a while and I didn't find a specific answer for my problem:
Doctor Clinic Example:
We have doctor, patient, treatment, treatment-type
Doctor: id, name....
Patient: id, name...
Treatment: date, cost
Treatment-Type: id, name
Doctor can do multiple treatments, and Patient can also do multiple treatments, so they are connected with Treatment with(1-N) relationship.
Treatment entity is a weak entity, as it cannot be defined in the absence of Doctor or Patient, so my question is, when we convert this ERD to actual tables, which is the correct (or the best-practice) scenario?
1 - doctor-id, patient-id cannot define the Treatment table uniquely, so we add to Treatment table the treatment-id field, and the PK is (doctor-id, patient-id, treatment-id).
2 - We add treatment-id field, and the PK is(treatment-id).
3 - The PK will be (doctor-id, patient-id, date).
I struggled finding if 'date' can be part of PK or not, and also I struggled if I can create an unique ID for weak entity
Thanks in advance.
Weak entity sets are entity sets that are partially identified by a parent entity set's primary key. A weak entity set necessarily depends on its parent entity set for existence (we say it participates totally in its identifying relationship), but not everything with an existence dependency is a weak entity set. Regular entity sets can also participate totally in one or more relationships. So, it depends on how you identify an entity set. See also my answer to the question "is optionality (mandatory, optional) and participation (total, partial) are same?"
An entity set that is uniquely identified by its own attributes is a regular entity set. An entity set that is partially identified by a parent entity set's primary key is a weak entity set. An entity set that is fully identified by a parent entity set's primary key is a subtype.
You should also note that weak entity sets can only have one parent entity set according to the entity-relationship model as Chen described it. Being identified by multiple parent entity sets would make it a relationship rather than an entity set.
In some schema design tools, a different interpretation is used where tables are equated to entity sets and relationships equated to FK constraints, and an identifying relationship would be an FK that is part of the PK of a table. This approach is closer to the network data model than the entity-relationship model, despite having adopted much of ER's terminology.
Let's take a look at your examples:
In example 1, we should consider whether treatment-id is identifying on its own (i.e. a surrogate key) or only in combination with doctor-id and patient-id (i.e. an ordinal number). If it's a surrogate key, it would be a mistake to include doctor-id and patient-id in the PK, example 2 would be the right way of handling it. If it's an ordinal number, then it's basically the same as example 3 - two foreign entity keys and a value set in a primary key. I'll say more about that in my comments on example 3.
In example 2, treatment-id is a surrogate key which means Treatment is a regular entity set which participates totally in its relationships with Patient and Doctor. This would be my recommended solution, since it's the simplest.
In example 3, you have a primary key consisting of two foreign entity keys and a value set.
The entity-relationship model doesn't cover such relations - relations with a single entity key are called entity relations, and relations with multiple entity keys are called relationships relations. Value sets are only described as the codomains of attributes, not the domains. The ER model's inability to handle arbitrary relations are a consequence of artificial distinctions between entity sets vs value sets, and between attributes vs relationships. Other data modeling disciplines like the relational model and object-role modeling are complete and can handle any kinds of relations.
Back to example 3, despite the ER model's shortcomings, it's not invalid to create such a table/relation in an actual database. However, think about what the primary key means - can a patient receive only one treatment per day from the same doctor? I would think multiple treatments should be possible, in which case you might need to add another ordinal number, e.g. (doctor-id, patient-id, date, treatment-id). In that case, it might be simpler just to do (doctor-id, patient-id, treatment-id).
One argument against such composite/natural keys is that they add up - a many-to-many association between two relations, each with 3 columns in their primary keys, could have up to 6 columns in its primary key! That gets inconvenient quickly, but on the other hand, those columns are relevant related info that would otherwise need to be retrieved from joined tables if the association was identified by a surrogate key.
Sorry about the long answer, but I hope this covers all the fine points. Let me know if you have any questions.
I'm trying to build a table for an inbox app that stores its messages within inbox table.
The structure of the table is as follows:
inbox_id|sender_id|receiver_id|subject|message
The columns sender_id and receiver_id are FOREIGN KEYS and can reference multiple tables.
There are currently 3 types of users within the database and they all can send messages to each other. A user of type UserType1 can send a message to UserType2 and vice versa, or UserType1 can send messages to UserType1. So receiver and sender can reference one of these 3 tables.
My solution to this problem is to build a inbox_user table containing columns for each user type and to have sender_id and inbox_user reference it.
My main concern is limited flexibility of the solution and wasteful usage of resources. I would, at all times, always have 2 empty columns per row. And that would become even worse if I introduced more user types.
Would this be considered a bad practice? What are some more flexible and intelligent designs?
From your description it sounds like the best practice should be applied to your user table and not this inbox table. Of course, I don't know your constraints, but if you have 2 or 3 types of users with each type of user in its own table, that is a poor design (again, not knowing your constraints). The preference is to store all users in one table with a column to indicate their type. Then the reference to your inbox table becomes straightforward with both sender and receiver FKs pointing back to the same user table.
Otherwise, you're going to end up using multiple columns to reference each table like you said (UserTypeASenderID, UserTypeBSenderID, UserTypeCSenderID, etc). My preference is to have null FK columns and gain the referential integrity than to implement some other solution and lose the constraints.
3 different types of users is a classic type/subtype situation. (or, if you prefer, class/subclass). There are several ways to design for a class/subclass situation. Two that look good to me are "Class Table Inheritance" and "Single Table Inheritance" as explained by Martin fowler. You can find a synopsis online. You can also visit tags by these names in here, read up on the info presented, and look at the tagged questions.
Single table inheritance suffers from a lot of NULLs for fields that are not applicable some of the time, as you pointed out in the Q. This may or may not be a problem for you, depending on your case.
Class table inheritance involves a little programming when a new entry is made in a subclass (subtype). It also involves more joining, but it isn't very expensive joining. Class table inheritance is frequently combined with a technique called shared primary key. In this technique, the subclass tables end up with a primary key that is a duplicate of the primary key in the superclass table. It's also a foreign key to the superclass table. This makes joining subclass data and superclass data simple, easy, and fast.
Shared primary key resolves the quandary you stated in your Q, namely how to reference more than one table with one foreign key. A foreign key reference in some other table to the superclass table will also be a reference to at least one of the subclass tables. This seems like magic. Try it, see if you like it.
Suppose I have a set of 'model' entities, and a set of difficulty levels. Each model has a certain percentage success rate for a given day on a given difficulty level.
A model entity has a name, which is both unique and immutable under any circumstances, so it makes a natural primary key. A difficulty level is described by its name only (easy, normal, etc). Difficulty levels are very, very unlikely to change, though it's possible a new one could be added. A success rate record is uniquely identified by the model it pertains to, the difficulty level, and the date.
Here's the most trivial db design for this scenario:
In this design, 'name' is the primary key for the table 'models' and is represented by a VARCHAR(20) field. Likewise, a VARCHAR(20) field 'name' is the primary key for the table 'difficulty_levels' (a lookup table). In the 'success_rates' table, 'model_name' is a foreign key referencing the 'name' field in the 'model' table, and 'difficulty_level' is a foreign key referencing the 'name' field in the 'difficulty_levels' table. The fields 'model_name', 'difficulty_level' and 'date' make up a composite primary key for the 'success_rates' table.
The most used queries would be:
getting all success rates for a certain model, difficulty level, and date period
getting the most/least successful models for a certain period and difficulty level.
Now, my question is - do I need to add surrogate primary keys to the 'models' and 'difficulty_levels' tables? I guess storing int values as opposed to varchar values in the foreign key fields of 'success_rates' takes up less space, and maybe the queries will be faster (just my wild guess, not sure about that)?
The problem I see with surrogate keys is that they have zero relevance to the business logic itself. I'm planning on using a mini-ORM (most likely Dapper), and without the surrogate keys I can operate on classes that very cleanly represent the entities I'm working with. Now, if I add surrogate keys, I'll have to add 'Id' properties to my classes, and I'm really against adding a database storage implementation like that to a class that can be used anywhere in the app, not even in connection with a database storage. I could add proxy storage classes with an Id property, but that adds another level of complexity. Plus the fact that the 'Id' property won't be readonly (so the ORM can set the ids after saving the entity to the database) means that it would be possible to accidentally set it to a random/invalid value.
I'm not very familiar with ORM's and I have zero knowledge of Dapper, so correct me if I was wrong in any of these points.
What would be the best approach here?
The problem I see with surrogate keys is that they have zero relevance to the business logic itself.
That is actually the benefit, not the problem. The key's value is the immutable identifier of the entity, regardless of whatever else changes. I really doubt that there is any widely used ORM that cannot easily make use of it, as the alternative (cascading changes to a key value to every child record) is so much more difficult to implement.
I'd also suggest that you add a value to your difficulty levels that allows the hierarchy of increasing difficulty to be represented in the data model, otherwise "harder than" or "easier than" cannot be robustly represented.
I'm trying to understand a concept rather than fixing a piece of code that won't work.
I'll take a general example of a form (parent table) and a form field (child table). Logically, this would be an identifying relationship, since a form field cannot exist without a form.
This would make me think that in order to translate the logical relationship into the technical relationship, a simple NOT NULL for the form_id field in the form_field table would suffice. (See the left part of above screenshot.)
However, when I add an identifying relationship using MySQL Workbench, form_id is not only NOT NULL but also part of the primary key. (See the right part of above screenshot.) And when I add a non-identifying relationship, NOT NULL is still applied so logically it would actually be an identifying relationship as well.
I guess this confuses me a little, as well as the fact that until now I always simply used the id field as primary key.
So I understand the logical concept of identifying vs. non-identifying relationships, but I don't understand the technical part.
Why is it, as this answer states, 'the "right" way to make the foreign key part of the child's primary key'?
What is the benefit of these composite primary keys?
Logically, this would be an identifying relationship, since a form field cannot exist without a form.
No, identifying relationship is about identification, not existence.
Any X:Y relationship where X >= 1 guarantees existence of the left side, whether identifying or not. In your case, a 1:N relationship guarantees existence of form for any given form_field. You could make it identifying or non-identifying and it would still guarantee the same.
Remarks:
You would model an identifying relationship by making form_field.form_id part of a key. For example form_field PK could look like: {form_id, label}, which BTW would be quite beneficial for proper clustering of your data (InnoDB tables are always clustered).
Just making a PK: {id, form_id} would be incorrect, since this superkey is not a candidate key (i.e. it is not minimal - we could remove form_id from it and still retain the uniqueness).
You would model a 0..1:N relationship by making the form_field.form_id NULL-able (but then you wouldn't be able to make it identifying as well - see below).
There are two definitions of the "identifying relationship":
Strict definition: A relationship that migrates parent key into child primary key1.
Loose definition: A relationship that migrates parent key into child key.
In other words, the loose definition allows migration into alternate key as well (and not just primary).
Most tools2 seem to use the strict definition though, so if you mark the relationship as identifying, that will automatically make the migrated attributes part of the child PK, and none of the PK attributes can be NULL.
1 Which is then either completely comprised from migrated attributes, or is a combination of migrated attributes and some additional attributes.
2 ERwin and Visio do. I haven't used MySQL Workbench for modeling yet, but your description seems to suggest it behaves the same.
An identifying relationship is supposed to be one where the primary key includes foreign key attributes. That's why when you designate a relationship as identifying the posted foreign key is deemed to be part of the primary key.
The difference between an "identifying" relationship and a non-identifying one is purely informational or diagrammatic if the same key constraints and nullability constraints apply in each case. The concept is analogous to and a consequence of designating a "primary" key. If a table has more than one candidate key then all other things being equal it doesn't matter from a logical perspective which key is designated the primary one - the form, function and (presumably) the business meaning of the table is the same.
In your example however, the keys in the two tables are NOT the same. In the first case ID is unique in the form_field table while in the second case it apparently isn't. I expect that's not what you intended.