Why is the foreign key part of the primary key in an identifying relationship?

I'm trying to understand a concept rather than fixing a piece of code that won't work.
I'll take a general example of a form (parent table) and a form field (child table). Logically, this would be an identifying relationship, since a form field cannot exist without a form.
This would make me think that in order to translate the logical relationship into the technical relationship, a simple NOT NULL for the form_id field in the form_field table would suffice. (See the left part of above screenshot.)
However, when I add an identifying relationship using MySQL Workbench, form_id is not only NOT NULL but also part of the primary key. (See the right part of above screenshot.) And when I add a non-identifying relationship, NOT NULL is still applied so logically it would actually be an identifying relationship as well.
I guess this confuses me a little, as well as the fact that until now I always simply used the id field as primary key.
So I understand the logical concept of identifying vs. non-identifying relationships, but I don't understand the technical part.
Why is it, as this answer states, 'the "right" way to make the foreign key part of the child's primary key'?
What is the benefit of these composite primary keys?

"Logically, this would be an identifying relationship, since a form field cannot exist without a form."
No, an identifying relationship is about identification, not existence.
Any X:Y relationship where X >= 1 guarantees existence of the left side, whether identifying or not. In your case, a 1:N relationship guarantees existence of form for any given form_field. You could make it identifying or non-identifying and it would still guarantee the same.
Remarks:
You would model an identifying relationship by making form_field.form_id part of a key. For example form_field PK could look like: {form_id, label}, which BTW would be quite beneficial for proper clustering of your data (InnoDB tables are always clustered).
Just making a PK: {id, form_id} would be incorrect, since this superkey is not a candidate key (i.e. it is not minimal - we could remove form_id from it and still retain the uniqueness).
You would model a 0..1:N relationship by making the form_field.form_id NULL-able (but then you wouldn't be able to make it identifying as well - see below).
There are two definitions of the "identifying relationship":
Strict definition: A relationship that migrates the parent key into the child primary key [1].
Loose definition: A relationship that migrates the parent key into a child key.
In other words, the loose definition allows migration into an alternate key as well (and not just the primary key).
Most tools [2] seem to use the strict definition though, so if you mark the relationship as identifying, that will automatically make the migrated attributes part of the child PK, and none of the PK attributes can be NULL.
[1] Which is then either composed entirely of migrated attributes, or is a combination of migrated attributes and some additional attributes.
[2] ERwin and Visio do. I haven't used MySQL Workbench for modeling yet, but your description seems to suggest it behaves the same.
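To make the difference concrete, here is a minimal sketch of the two ways of modelling form_field discussed in the remarks above. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the table names form_field_nonidentifying and form_field_identifying, as well as the helpers build and try_insert, are made up for illustration. The label column is the one suggested in the answer.

```python
import sqlite3

# Hypothetical schema: column names follow the question, table names are invented.
SCHEMA = """
CREATE TABLE form (
    form_id INTEGER PRIMARY KEY
);

-- Non-identifying: the child has its own key; form_id is only a NOT NULL FK.
CREATE TABLE form_field_nonidentifying (
    id      INTEGER PRIMARY KEY,
    form_id INTEGER NOT NULL REFERENCES form (form_id),
    label   TEXT    NOT NULL
);

-- Identifying: the migrated form_id is part of the child's primary key.
CREATE TABLE form_field_identifying (
    form_id INTEGER NOT NULL REFERENCES form (form_id),
    label   TEXT    NOT NULL,
    PRIMARY KEY (form_id, label)
);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO form (form_id) VALUES (1), (2)")
    return conn


def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Both variants reject a field without an existing form, so existence is guaranteed either way. Only the identifying variant additionally uses form_id to tell fields apart: the same label may appear in two different forms, but not twice in the same form.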

An identifying relationship is supposed to be one where the primary key includes foreign key attributes. That's why when you designate a relationship as identifying the posted foreign key is deemed to be part of the primary key.
The difference between an "identifying" relationship and a non-identifying one is purely informational or diagrammatic if the same key constraints and nullability constraints apply in each case. The concept is analogous to and a consequence of designating a "primary" key. If a table has more than one candidate key then all other things being equal it doesn't matter from a logical perspective which key is designated the primary one - the form, function and (presumably) the business meaning of the table is the same.
In your example however, the keys in the two tables are NOT the same. In the first case ID is unique in the form_field table while in the second case it apparently isn't. I expect that's not what you intended.

Related

Can one field in a composite key be dependent on the other?

I am thinking about making a composite key for a table of mine (which would be composed of two fields, fields A and B). However, field B is dependent on field A. Would this composite key violate any database design principles?
Well, yes. It does violate database design principles. Why not just use A? That is, you can always look up the value of B using a JOIN, so a composite foreign key reference is unnecessary. Storing the value of B in referring tables is redundant and inefficient (takes up space in both data pages and index pages).
There are some cases where such a foreign key is useful. You have not provided enough information to know if you have such a case. So, as a general design principle this doesn't sound right. There may be exceptions, so it is not always a bad idea.

Partial Keys in a Weak Entity Set

I am a bit confused with the partial keys. 'Database System Concepts by Korth' says the following:
"Although the weak entity set does not have a primary key, we nevertheless need a means of distinguishing among all those entities in the weak entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made. The discriminator of a weak entity set is also called the partial key of the entity set."
My confusion is that if the discriminator/partial keys of weak entities are able to uniquely identify the set of attributes, then it should be called primary key, instead of partial keys, as primary keys are those which can uniquely identify all the attributes of a relation.
Also, while surfing the web, I came across a definition of partial key, which says:
'A partial key is a key using which all the records of the table can not be identified uniquely'
It raises a question in my mind, that suppose if a table consists of a primary key which is made up of two or more attributes, then if we pick up a single attribute from this, then will it be called partial key, as that attribute is part of a primary key, but by itself it can't uniquely identify all attributes in a relation.
The definition doesn't say that "the discriminator/partial keys of weak entities are able to uniquely identify" within a table. It says that one identifies a weak entity within a particular strong entity.
Technical terms only mean what they are defined to mean in a certain context of assumptions, including other definitions. You can't expect the same term to mean the same thing everywhere. You can't just look at the text of a definition & make assumptions about what situations it applies to & what its technical terms mean or even whether a word is used in a technical or everyday meaning. When someone uses a term you have to make sure that you know what they mean by it.
A relational superkey uniquely identifies a row. A CK (candidate key) is a superkey that contains no smaller superkey. (A PK (primary key) is just some CK you decided to call the PK.) So being unique is not a reason to call something a PK or CK. (An SQL PK/UNIQUE is analogous to a relational superkey.)
The book method generates discriminators that are not superkeys. So we can say that it agrees with the web definition--for cases that come up in that method. But if a method allowed generation of discriminators that were CKs or PKs then its use of that textbook wording would define "partial key" to be a different sort of thing than the web definition. Such a method couldn't use (relational) "PK" for a strong id plus discriminator, because it would be a superkey but not a CK or PK. (But it could still use SQL "PK" since that approximately means primary superkey.)
I really think this type of description stems from the very first step in any modelling process, one which anyone with any data modelling experience would just fix without even thinking about it.
The wiki page on "Weak Entity" gives the classic example of a Header/Detail pair, where the detail by itself doesn't have a reference to the header. Think of a two page document where page one is the header, page two is the details.
By itself, page two cannot uniquely identify a row, but of course anyone would automatically add the header FK so we can uniquely identify a row.
I haven't seen the book you are reading, but I think that's what it's getting at. So I think all your subsequent reasoning is correct. Have a look at the wiki page for more info.
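As a rough illustration of the header/detail case, the sketch below assumes SQLite via Python's sqlite3 module and invented names (header, detail, line_no, count_rows). The line_no column plays the role of the discriminator: it only distinguishes details within one header, and becomes a key once the header's id is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (header_id INTEGER PRIMARY KEY);
CREATE TABLE detail (
    header_id INTEGER NOT NULL REFERENCES header (header_id),
    line_no   INTEGER NOT NULL,      -- discriminator / partial key
    note      TEXT,
    PRIMARY KEY (header_id, line_no)
);
INSERT INTO header VALUES (1), (2);
INSERT INTO detail VALUES (1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c');
""")


def count_rows(where, params):
    """Count detail rows matching a WHERE clause."""
    sql = "SELECT COUNT(*) FROM detail WHERE " + where
    return conn.execute(sql, params).fetchone()[0]


# The discriminator alone matches one row per header that has a line 1.
rows_for_line_1 = count_rows("line_no = ?", (1,))
# Parent key plus discriminator pins down exactly one row.
rows_for_full_key = count_rows("header_id = ? AND line_no = ?", (2, 1))
```

Here line_no = 1 matches a row under each header, which is why the discriminator on its own is not a key of the table.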

Weak Entity in ERD

I have several candidate designs for the following problem and I don't know which of them is right. I've been searching for a while and haven't found a specific answer:
Doctor Clinic Example:
We have doctor, patient, treatment, treatment-type
Doctor: id, name....
Patient: id, name...
Treatment: date, cost
Treatment-Type: id, name
A Doctor can perform multiple treatments, and a Patient can also receive multiple treatments, so both are connected to Treatment with a (1-N) relationship.
Treatment entity is a weak entity, as it cannot be defined in the absence of Doctor or Patient, so my question is, when we convert this ERD to actual tables, which is the correct (or the best-practice) scenario?
1 - doctor-id, patient-id cannot define the Treatment table uniquely, so we add to Treatment table the treatment-id field, and the PK is (doctor-id, patient-id, treatment-id).
2 - We add a treatment-id field, and the PK is (treatment-id).
3 - The PK will be (doctor-id, patient-id, date).
I struggled to find out whether 'date' can be part of a PK or not, and also whether I can create a unique ID for a weak entity.
Thanks in advance.
Weak entity sets are entity sets that are partially identified by a parent entity set's primary key. A weak entity set necessarily depends on its parent entity set for existence (we say it participates totally in its identifying relationship), but not everything with an existence dependency is a weak entity set. Regular entity sets can also participate totally in one or more relationships. So, it depends on how you identify an entity set. See also my answer to the question "is optionality (mandatory, optional) and participation (total, partial) are same?"
An entity set that is uniquely identified by its own attributes is a regular entity set. An entity set that is partially identified by a parent entity set's primary key is a weak entity set. An entity set that is fully identified by a parent entity set's primary key is a subtype.
You should also note that weak entity sets can only have one parent entity set according to the entity-relationship model as Chen described it. Being identified by multiple parent entity sets would make it a relationship rather than an entity set.
In some schema design tools, a different interpretation is used where tables are equated to entity sets and relationships equated to FK constraints, and an identifying relationship would be an FK that is part of the PK of a table. This approach is closer to the network data model than the entity-relationship model, despite having adopted much of ER's terminology.
Let's take a look at your examples:
In example 1, we should consider whether treatment-id is identifying on its own (i.e. a surrogate key) or only in combination with doctor-id and patient-id (i.e. an ordinal number). If it's a surrogate key, it would be a mistake to include doctor-id and patient-id in the PK, example 2 would be the right way of handling it. If it's an ordinal number, then it's basically the same as example 3 - two foreign entity keys and a value set in a primary key. I'll say more about that in my comments on example 3.
In example 2, treatment-id is a surrogate key which means Treatment is a regular entity set which participates totally in its relationships with Patient and Doctor. This would be my recommended solution, since it's the simplest.
In example 3, you have a primary key consisting of two foreign entity keys and a value set.
The entity-relationship model doesn't cover such relations - relations with a single entity key are called entity relations, and relations with multiple entity keys are called relationship relations. Value sets are only described as the codomains of attributes, not the domains. The ER model's inability to handle arbitrary relations is a consequence of artificial distinctions between entity sets vs value sets, and between attributes vs relationships. Other data modeling disciplines like the relational model and object-role modeling are complete and can handle any kind of relation.
Back to example 3, despite the ER model's shortcomings, it's not invalid to create such a table/relation in an actual database. However, think about what the primary key means - can a patient receive only one treatment per day from the same doctor? I would think multiple treatments should be possible, in which case you might need to add another ordinal number, e.g. (doctor-id, patient-id, date, treatment-id). In that case, it might be simpler just to do (doctor-id, patient-id, treatment-id).
One argument against such composite/natural keys is that they add up - a many-to-many association between two relations, each with 3 columns in their primary keys, could have up to 6 columns in its primary key! That gets inconvenient quickly, but on the other hand, those columns are relevant related info that would otherwise need to be retrieved from joined tables if the association was identified by a surrogate key.
Sorry about the long answer, but I hope this covers all the fine points. Let me know if you have any questions.
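For what it's worth, the practical difference between examples 2 and 3 can be sketched like this. It assumes SQLite via Python's sqlite3 module; the table names treatment_by_date and treatment_surrogate, the sample values and the helper same_day_rows_accepted are invented for the illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE doctor  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);

-- Example 3: two foreign keys plus the date form the primary key.
CREATE TABLE treatment_by_date (
    doctor_id  INTEGER NOT NULL REFERENCES doctor (id),
    patient_id INTEGER NOT NULL REFERENCES patient (id),
    date       TEXT    NOT NULL,
    cost       NUMERIC,
    PRIMARY KEY (doctor_id, patient_id, date)
);

-- Example 2: surrogate key; the foreign keys are mandatory but not identifying.
CREATE TABLE treatment_surrogate (
    treatment_id INTEGER PRIMARY KEY,
    doctor_id    INTEGER NOT NULL REFERENCES doctor (id),
    patient_id   INTEGER NOT NULL REFERENCES patient (id),
    date         TEXT    NOT NULL,
    cost         NUMERIC
);
"""


def same_day_rows_accepted(table):
    """Try to store two treatments for the same doctor, patient and day."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO doctor (id, name) VALUES (1, 'Dr. A')")
    conn.execute("INSERT INTO patient (id, name) VALUES (1, 'P. B')")
    accepted = 0
    for cost in (50, 80):
        try:
            conn.execute(
                f"INSERT INTO {table} (doctor_id, patient_id, date, cost) "
                "VALUES (1, 1, '2024-01-15', ?)",
                (cost,),
            )
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the primary key rejected the second same-day treatment
    return accepted
```

With the (doctor-id, patient-id, date) key the second same-day treatment is rejected, while the surrogate-key table accepts both, which is the question to settle before choosing example 3.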

Identifying relationships losing meaning in relating entity

So as you can see I have an Identifying 1 to many relationship in the tables above.
If I were to change this relationship to an identifying 1 to 1 relationship, the auto_leads table would still contain the two-column composite primary key migrated from its parent leads table. In other words, nothing would change.
Does an identifying relationship have any meaning in the context of relational models? It doesn't appear to change its effect with respect to relationships.
Identifying relationship is an ER-modelling concept which arises because ER modelling assumes there is some semantic significance to having a primary key for each entity. Primary keys have no special role in relational database design and therefore the concept of an identifying relationship is usually of no great importance.
Consider the example of a table with two candidate keys, A and B. A is also a foreign key. According to ER-modelling convention if A is chosen as a primary key then the foreign key relationship is an identifying one. If A is an alternate key then the relationship is deemed to be non-identifying. Yet the form, function, integrity constraints and presumably the business meaning is exactly the same in both cases. The concept of identifying relationships is only as important as you want it to be.
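A small sketch of that two-candidate-key example, assuming SQLite via Python's sqlite3 module. The table names child_pk_a and child_pk_b, the ATTEMPTS list and the helper outcomes are hypothetical. The two variants differ only in which key carries the PRIMARY KEY label.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parent (a TEXT NOT NULL PRIMARY KEY);

-- Variant 1: A is the primary key, so ER tools call the FK "identifying".
CREATE TABLE child_pk_a (
    a TEXT NOT NULL PRIMARY KEY REFERENCES parent (a),
    b TEXT NOT NULL UNIQUE
);

-- Variant 2: B is the primary key, A is an alternate key ("non-identifying").
CREATE TABLE child_pk_b (
    a TEXT NOT NULL UNIQUE REFERENCES parent (a),
    b TEXT NOT NULL PRIMARY KEY
);
"""

# (a, b) rows tried in order against each variant.
ATTEMPTS = [("x", "1"), ("x", "2"), ("y", "1"), ("z", "3"), ("y", "2")]


def outcomes(table):
    """Return, per attempted row, whether the table accepted it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO parent (a) VALUES ('x'), ('y')")
    results = []
    for a, b in ATTEMPTS:
        try:
            conn.execute(f"INSERT INTO {table} (a, b) VALUES (?, ?)", (a, b))
            results.append(True)
        except sqlite3.IntegrityError:
            results.append(False)
    return results
```

Both variants accept and reject exactly the same rows, so in this sketch the identifying/non-identifying label changes nothing about the integrity constraints.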

What are the down sides of using a composite/compound primary key?

What are the down sides of using a composite/compound primary key?
Could cause more problems for normalisation (2NF, "Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF")
More unnecessary data duplication. If your composite key consists of 3 columns, you will need to create the same 3 columns in every table, where it is used as a foreign key.
Generally avoidable with the help of surrogate keys (read about their advantages and disadvantages)
I can imagine a good scenario for composite key -- in a table representing a N:N relation, like Students - Classes, and the key in the intermediate table will be (StudentID, ClassID). But if you need to store more information about each pair (like a history of all marks of a student in a class) then you'll probably introduce a surrogate key.
There's nothing wrong with having a compound key per se, but a primary key should ideally be as small as possible (in terms of number of bytes required). If the primary key is long then this will cause non-clustered indexes to be bloated.
Bear in mind that the order of the columns in the primary key is important. The first column should be as selective as possible i.e. as 'unique' as possible. Searches on the first column will be able to seek, but searches just on the second column will have to scan, unless there is also a non-clustered index on the second column.
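The seek-versus-scan point can be observed with a query plan. The sketch below uses SQLite via Python's sqlite3 module purely as a stand-in (the answer does not name a product, and other engines report plans differently). The enrolment table and the helper plan are invented names. SQLite's exact plan wording varies between versions, so only the first word is worth relying on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL,
        class_id   INTEGER NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, class_id)
    )
""")


def plan(where):
    """Return the planner's description of how the WHERE clause is evaluated."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT grade FROM enrolment WHERE " + where
    ).fetchall()
    return rows[0][3]  # the last column holds the human-readable detail


leading = plan("student_id = 1")  # filter on the first key column
trailing = plan("class_id = 1")   # filter on the second key column only
```

On the SQLite builds I would expect, leading describes a SEARCH using the primary key index and trailing describes a SCAN of the table. A separate index on class_id would be needed to turn the second query into a search as well.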
I think this is a specialisation of the synthetic key debate (whether to use meaningful keys or an arbitrary synthetic primary key). I come down almost completely on the synthetic key side of this debate for a number of reasons. These are a few of the more pertinent ones:
You have to keep dependent child tables on the end of a foreign key up to date. If you change the value of one of the primary key fields (which can happen - see below) you have to somehow change all of the dependent tables where their PK value includes these fields. This is a bit tricky because changing key values will invalidate FK relationships with child tables, so you may (depending on the constraint validation options available on your platform) have to resort to tricks like copying the record to a new one and deleting the old records.
On a deep schema the keys can get quite wide - I've seen 8 columns once.
Changes in primary key values can be troublesome to identify in ETL processes loading off the system. The example I once had occasion to see was an MIS application extracting from an insurance underwriting system. On some occasions a policy entry would be re-used by the customer, changing the policy identifier. This was a part of the primary key of the table. When this happens the warehouse load is not aware of what the old value was so it cannot match the new data to it. The developer had to go searching through audit logs to identify the changed value.
Most of the issues with non-synthetic primary keys revolve around issues when PK values of records change. The most useful applications of non-synthetic values are where a database schema is intended to be used, such as an M.I.S. application where report writers are using the tables directly. In this case short values with fixed domains such as currency codes or dates might reasonably be placed directly on the table for convenience.
I would recommend a generated primary key in those cases with a unique not null constraint on the natural composite key.
If you use the natural key as primary then you will most likely have to reference both values in foreign key references to make sure you are identifying the correct record.
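A minimal sketch of that recommendation, assuming SQLite via Python's sqlite3 module. The policy and claim tables, their columns and the helper demo are made-up names. The surrogate policy_id is the primary key, while the natural composite key (branch, policy_no) stays enforced through a UNIQUE constraint on NOT NULL columns.

```python
import sqlite3

SCHEMA = """
CREATE TABLE policy (
    policy_id INTEGER PRIMARY KEY,   -- generated surrogate key
    branch    TEXT NOT NULL,
    policy_no TEXT NOT NULL,
    UNIQUE (branch, policy_no)       -- natural composite key still enforced
);
CREATE TABLE claim (
    claim_id  INTEGER PRIMARY KEY,
    policy_id INTEGER NOT NULL REFERENCES policy (policy_id)
);
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO policy (branch, policy_no) VALUES ('N', 'P-1')")
    policy_id = conn.execute("SELECT policy_id FROM policy").fetchone()[0]
    conn.execute("INSERT INTO claim (policy_id) VALUES (?)", (policy_id,))
    try:
        conn.execute("INSERT INTO policy (branch, policy_no) VALUES ('N', 'P-1')")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    # Changing the natural key does not touch the child row at all.
    conn.execute(
        "UPDATE policy SET policy_no = 'P-2' WHERE policy_id = ?", (policy_id,)
    )
    joined = conn.execute(
        "SELECT p.policy_no FROM claim c JOIN policy p USING (policy_id)"
    ).fetchall()
    return duplicate_rejected, joined
```

The child table references a single column, and a change to the natural key value needs no update in claim.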
Take the example of a table with two candidate keys: one simple (single-column) and one compound (multi-column). Your question in that context seems to be, "What disadvantage may I suffer if I choose to promote one key to be 'primary' and I choose the compound key?"
First, consider whether you actually need to promote a key at all: "the very existence of the PRIMARY KEY in SQL seems to be an historical accident of some kind. According to author Chris Date the earliest incarnations of SQL didn't have any key constraints and PRIMARY KEY was only later added to the SQL standards. The designers of the standard obviously took the term from E.F.Codd who invented it, even though Codd's original notion had been abandoned by that time! (Codd originally proposed that foreign keys must only reference one key - the primary key - but that idea was forgotten and ignored because it was widely recognised as a pointless limitation)." [source: David Portas' Blog: Down with Primary Keys?]
Second, what criteria would you apply to choose which key in a table should be 'primary'?
In SQL, the choice of which key to declare as PRIMARY KEY is arbitrary and product specific. In ACE/Jet (a.k.a. MS Access) the two main and often competing factors are whether you want to use PRIMARY KEY to favour clustering on disk or whether you want the columns comprising the key to appear in bold in the 'Relationships' picture in the MS Access user interface; I'm in the minority by thinking that index strategy trumps pretty picture :) In SQL Server, you can specify the clustered index independently of the PRIMARY KEY and there seems to be no product-specific advantage afforded. The only remaining advantage seems to be the fact that you can omit the columns of the PRIMARY KEY when creating a foreign key in SQL DDL, which is SQL-92 Standard behaviour and anyhow doesn't seem such a big deal to me (perhaps another one of the things they added to the Standard because it was a feature already widespread in SQL products?) So, it's not a case of looking for drawbacks; rather, you should be looking to see what advantage, if any, your SQL product gives the PRIMARY KEY. Put another way, the only drawback to choosing the wrong key is that you may be missing out on a given advantage.
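The "omit the columns" shorthand looks roughly like this. The sketch assumes SQLite via Python's sqlite3 module, which accepts a REFERENCES clause without a column list and resolves it to the parent's PRIMARY KEY. The parent and child tables and the helper accepts are invented for the example, and other products may or may not support the shorthand.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parent (
    a INTEGER NOT NULL,
    b INTEGER NOT NULL,
    PRIMARY KEY (a, b)
);
CREATE TABLE child (
    a INTEGER NOT NULL,
    b INTEGER NOT NULL,
    FOREIGN KEY (a, b) REFERENCES parent   -- no column list: targets the PRIMARY KEY
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(SCHEMA)
conn.execute("INSERT INTO parent (a, b) VALUES (1, 2)")


def accepts(a, b):
    """Return True if child accepts the row, False if the FK rejects it."""
    try:
        conn.execute("INSERT INTO child (a, b) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False
```

A child row matching the parent's (a, b) is accepted and any other pair is rejected, just as if the referenced columns had been spelled out.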
Third, are you rather alluding to using an artificial/synthetic/surrogate key to implement in your physical model a candidate key from your logical model because you are concerned there will be performance penalties if you use the natural key in foreign keys and table joins? That's an entirely different question and largely depends on your 'religious' stance on the issue of natural keys in SQL.
Need more specificity.
Taken too far, it can overcomplicate Inserts (Every key MUST exist) and documentation and your joined reads could be suspect if incomplete.
Sometimes it can indicate a flawed data model (is a composite key REALLY what's described by the data?)
I don't believe there is a performance cost...it just can go really wrong really easily.
When you see it on a diagram, it is less readable.
When you use it in a query join, it is less readable.
When you use it in a foreign key, you have to add a check constraint so that the attributes are either all null or all not null (if only one is null, the key is not checked).
It usually needs more storage when used as a foreign key.
Some tools don't handle composite keys.
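The point about partially null foreign keys can be sketched as follows, assuming SQLite via Python's sqlite3 module, which skips the foreign key check as soon as any child key column is NULL. The tables child_unchecked and child_checked and the helper accepts are hypothetical names, and the CHECK expression is just one way to write "all null or all not null" for two columns.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parent (
    a INTEGER NOT NULL,
    b INTEGER NOT NULL,
    PRIMARY KEY (a, b)
);
CREATE TABLE child_unchecked (
    a INTEGER,
    b INTEGER,
    FOREIGN KEY (a, b) REFERENCES parent (a, b)
);
CREATE TABLE child_checked (
    a INTEGER,
    b INTEGER,
    FOREIGN KEY (a, b) REFERENCES parent (a, b),
    -- both columns NULL or both NOT NULL
    CHECK ((a IS NULL) = (b IS NULL))
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(SCHEMA)
conn.execute("INSERT INTO parent (a, b) VALUES (1, 2)")


def accepts(table, a, b):
    """Return True if the table accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(f"INSERT INTO {table} (a, b) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False
```

Without the CHECK, a row such as (99, NULL) slips in even though no parent has a = 99. With it, only fully null or fully populated (and matching) keys are accepted.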
The main downside of using a compound primary key is that you will confuse the hell out of typical ORM code generators.