From UML to SQL (PostgreSQL)

I am preparing for an upcoming exam and have just finished this (simple) exercise.
I just want to be sure that I implemented everything correctly, especially the composition with the multiplicities 1 and 0..*.
My Answer:
CREATE TABLE exam.A(
idA integer,
b text NOT NULL,
c float DEFAULT -1.0 CONSTRAINT negative_c CHECK (c < 0.0),
PRIMARY KEY(idA));
CREATE TABLE exam.B(
idB integer,
c integer,
PRIMARY KEY(idB));
CREATE TABLE exam.RelationAandB(
idA integer NOT NULL,
idB integer,
b integer,
c text,
-- ON DELETE CASCADE belongs to the FOREIGN KEY clause, not to the column definition
FOREIGN KEY (idA) REFERENCES exam.A(idA) ON DELETE CASCADE,
FOREIGN KEY (idB) REFERENCES exam.B(idB),
PRIMARY KEY (idA, idB));

Your SQL code is pretty good, but I see the following issues:
In UML class diagrams, attributes are mandatory by default. They are optional only when qualified with the multiplicity expression [0..1]. Consequently, all attributes would need to be coded as NOT NULL columns. Possibly, however, your instructor has not been aware of this or is using a non-standard reading of "UML data models".
The string-valued attribute A::b has a "{not empty}" property modifier, which reads as a constraint requiring non-empty strings. Notice that having a non-empty string value is not the same as being mandatory (NOT NULL), because the empty string "" satisfies the NOT NULL constraint; you therefore need an additional check such as CHECK (b <> '').
You also need an ON DELETE CASCADE rule on the idB foreign key in the RelationAandB table, because no matter whether an A or a B is deleted, the associated tuples in RelationAandB have to be deleted as well.
I think, for readability, it is preferable to add the PRIMARY KEY declaration to the column definition if the key is non-composite (has just one column). The same holds for single-column FOREIGN KEY declarations.
Many people think that a composition implies a deletion dependency, although this is not warranted by the UML semantics (see Aggregation versus Composition), and it is also not based on common sense (see my remarks below). In your SQL code, you did not implement such a dependency ("whenever an A is deleted, all dependent Bs have to be deleted as well"). That is correct according to the UML semantics of the class diagram, but such a dependency may have been the intention of your instructor, especially since he made it mandatory for a B component to have an A composite (by the multiplicity 1 at the composite side). Such a mandatory composite constraint implies that, when their composite is deleted, components either have to be deleted as well or have to be re-assigned to another (A) composite. If your instructor's intention was that there should be a deletion dependency, then you should add a corresponding foreign key declaration in exam.B from idB to RelationAandB with an ON DELETE CASCADE rule: idB integer REFERENCES exam.RelationAandB(idB) ON DELETE CASCADE. For this reference to be accepted, idB must be declared UNIQUE in RelationAandB (which the multiplicity 1 at the composite side implies anyway), and since RelationAandB.idB in turn references exam.B, one of the two foreign keys will typically have to be DEFERRABLE so that a B and its link tuple can be inserted in the same transaction.
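The remarks above can be tried out in a minimal sketch. It runs the DDL in SQLite through Python's standard sqlite3 module so that it is executable; SQLite has no schemas, so the exam. prefix is dropped (the PostgreSQL version would presumably just add it back). The sketch makes every attribute NOT NULL, adds CHECK (b <> '') for {not empty}, puts ON DELETE CASCADE on both foreign keys of the association table, and declares the single-column keys inline. It also shows that deleting an A removes its association tuples but leaves the B rows alone, i.e. no deletion dependency between A and B is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
    CREATE TABLE A (
        idA integer PRIMARY KEY,
        b   text NOT NULL CHECK (b <> ''),              -- {not empty}: NOT NULL alone admits ''
        c   float NOT NULL DEFAULT -1.0 CHECK (c < 0.0)
    );
    CREATE TABLE B (
        idB integer PRIMARY KEY,
        c   integer NOT NULL
    );
    CREATE TABLE RelationAandB (
        idA integer NOT NULL REFERENCES A(idA) ON DELETE CASCADE,
        idB integer NOT NULL REFERENCES B(idB) ON DELETE CASCADE,
        b   integer NOT NULL,
        c   text NOT NULL,
        PRIMARY KEY (idA, idB)
    );
""")

def rejected(sql, params=()):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

empty_b_rejected = rejected("INSERT INTO A (idA, b) VALUES (1, '')")
null_b_rejected = rejected("INSERT INTO A (idA, b) VALUES (1, NULL)")

conn.execute("INSERT INTO A (idA, b) VALUES (1, 'x')")
conn.execute("INSERT INTO B VALUES (10, 5)")
conn.execute("INSERT INTO RelationAandB VALUES (1, 10, 7, 'link')")

conn.execute("DELETE FROM A WHERE idA = 1")  # cascades to RelationAandB only
links_left = conn.execute("SELECT COUNT(*) FROM RelationAandB").fetchone()[0]
bs_left = conn.execute("SELECT COUNT(*) FROM B").fetchone()[0]
print(empty_b_rejected, null_b_rejected, links_left, bs_left)  # True True 0 1
```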
Concerning the question whether a composition implies a lifecycle dependency between a composite and its components, we have to distinguish between three levels of abstraction: 1) the purely conceptual (philosophical) level, which should be the common sense of a data modeler, 2) the UML semantics, which is often not precisely defined, and 3) the level of (e.g., SQL) code. At the conceptual level, it should be clear that there are compositions with and without such a lifecycle dependency, so the very fact that there is a composition does not imply a lifecycle dependency.
Unfortunately, UML does not provide any means to declare that a composition has existentially dependent components. In my SO answer Aggregation versus Composition, I have proposed using a stereotype "inseparable" for such a composition.

Related

TPH - how to satisfy FK constraint when FK is on derived class?

Let's say I have TPH for abstract class Person. Then I have Girl and Boy derived from this class. Girl has a relationship with FavoriteMakeup that Boy does not have. How do I satisfy the FK constraint on Makeup when inserting a Boy record? Or is TPH incompatible with FKs limited to derived types?
TPH: Table Per Hierarchy
FK: Foreign Key
There are two separate aspects here. Let's examine both of them.
The first is whether a foreign key can contain NULL values in a relational DBMS. In general this is allowed: you can have both a foreign key constraint and a NOT NULL constraint, or a foreign key constraint on an attribute that is, at the same time, nullable. This is because the two constraints are usually considered independent. For instance, a NULL value could mean that the value is unknown for a particular object. So in a TPH table, the makeup foreign key can simply be left NULL in Boy rows.
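A minimal sketch of this in SQLite via Python's sqlite3. The table and column names (person, person_type, favorite_makeup_id, makeup) are hypothetical stand-ins for the question's TPH mapping. Boy rows leave the nullable foreign key NULL, which is never checked against makeup; the last CHECK is an optional extra, not something the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE makeup (
        makeup_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- One table for the whole Person hierarchy (TPH) with a discriminator column.
    CREATE TABLE person (
        person_id          INTEGER PRIMARY KEY,
        person_type        TEXT NOT NULL CHECK (person_type IN ('Girl', 'Boy')),
        -- Nullable FK: NULL is not checked against makeup, so Boy rows pass.
        favorite_makeup_id INTEGER REFERENCES makeup(makeup_id),
        -- Optional extra rule: only Girl rows may carry a makeup reference.
        CHECK (person_type = 'Girl' OR favorite_makeup_id IS NULL)
    );
""")
conn.execute("INSERT INTO makeup VALUES (1, 'Lipstick')")
conn.execute("INSERT INTO person VALUES (1, 'Girl', 1)")
conn.execute("INSERT INTO person VALUES (2, 'Boy', NULL)")  # accepted

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dangling_rejected = rejected("INSERT INTO person VALUES (3, 'Girl', 99)")   # no makeup 99
boy_with_makeup_rejected = rejected("INSERT INTO person VALUES (4, 'Boy', 1)")
```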
The second aspect is related to modeling: usually, in cases like your example, while you can use the “Table per Hierarchy” approach, this is not considered a good modeling practice. You should use the “Table per Type” (or TPT, to use the jargon of Microsoft documentation) approach, since all Girl entities have an association with the entity Makeup, while Boy entities do not, and this fact is part of the meaning of the entity, not a special case for some entity (like an unknown value).
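For comparison, a Table per Type sketch under the same assumptions (SQLite via Python's sqlite3, hypothetical table and column names): the makeup reference lives only in the girl table, where it can even be NOT NULL, and the boy table has no makeup column to satisfy at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE makeup (makeup_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Table per Type: each subtype table shares the person's key.
    CREATE TABLE girl (
        person_id          INTEGER PRIMARY KEY REFERENCES person(person_id),
        -- Only girls have this association, so it can be mandatory here.
        favorite_makeup_id INTEGER NOT NULL REFERENCES makeup(makeup_id)
    );
    CREATE TABLE boy (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id)
    );
""")
conn.execute("INSERT INTO makeup VALUES (1, 'Lipstick')")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 'Ann'), (2, 'Bob')])
conn.execute("INSERT INTO girl VALUES (1, 1)")
conn.execute("INSERT INTO boy VALUES (2)")  # nothing makeup-related to satisfy

try:
    conn.execute("INSERT INTO person VALUES (3, 'Cat')")
    conn.execute("INSERT INTO girl VALUES (3, NULL)")  # a girl without makeup
    girl_without_makeup_rejected = False
except sqlite3.IntegrityError:
    girl_without_makeup_rejected = True
```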

REFERENCES keyword in SQLite3

I was hoping someone could explain to me the purpose of the SQL keyword REFERENCES.
CREATE TABLE wizards(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
age INTEGER,
color TEXT);
CREATE TABLE powers(
id INTEGER PRIMARY KEY AUTOINCREMENT,
name STRING,
damage INTEGER,
wizard_id INTEGER REFERENCES wizards(id)
);
I've spent a lot of time trying to look this up, and I initially thought that it would constrain the data you can enter into the powers table (based on whether the wizard_id exists in the wizards table). However, I am still able to insert data into both columns without any constraint that I have noticed.
So, is the keyword REFERENCES just for increasing querying speed? What is its true purpose?
Thanks
It creates a foreign key to the other table. This can have performance benefits, but foreign keys are mostly about data integrity. It means that (in your case) the wizard_id field of powers must have a value that exists in the id field of the wizards table (or be NULL, since the column is nullable). In other words, a power must refer to a valid wizard. Many databases also use this information to propagate deletions or other changes, so the tables stay in sync.
Just noticed this. A reason that you're able to bypass the key constraint may be that foreign keys aren't enabled. See Enabling foreign keys in the SQLite3 documentation.
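A small sketch of that pitfall, using Python's built-in sqlite3 module with the question's schema (trimmed): the same orphan insert succeeds with foreign key enforcement off and fails with it on. PRAGMA foreign_keys is a per-connection setting and is a no-op inside an open transaction, so it is issued right after connecting.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE wizards (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT
    );
    CREATE TABLE powers (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        name      TEXT,
        wizard_id INTEGER REFERENCES wizards(id)
    );
"""

def orphan_insert_allowed(enforce: bool) -> bool:
    """Try to insert a power whose wizard does not exist."""
    conn = sqlite3.connect(":memory:")
    # Must run before any transaction is opened on this connection.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.executescript(SCHEMA)
    try:
        conn.execute("INSERT INTO powers (name, wizard_id) VALUES ('Fireball', 42)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

print(orphan_insert_allowed(enforce=False))  # True: REFERENCES is parsed but not enforced
print(orphan_insert_allowed(enforce=True))   # False: FOREIGN KEY constraint failed
```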
From what I've gathered, there are two main benefits of using REFERENCES, and an important distinction to be made between its use with and without FOREIGN KEY.
It gives the DBMS room to optimize
Without using REFERENCES, SQLite would not know that attribute id and attribute wizard_id are functionally equivalent. The more known constraints you can define for the Database Management System (SQLite in this case), the more freedom it has to optimize the way it handles your data under the hood.
It can enforce or encourage good practice
Reference declaration can also be useful for enforcement and warning provision. For example, say you have two tables, A and B, and you assume that A.name is functionally equivalent to B.name, so you attempt a join: SELECT * FROM A, B WHERE A.name = B.name. If REFERENCES was never used to indicate functional equivalency between these two attributes, the DBMS could warn you when you make the join, which would be helpful in case these attributes only happen to have the same name but are not actually meant to represent the same thing.
REFERENCES does not always create a "foreign key"
Contrary to what has already been suggested, references and foreign keys are not the same thing. A reference declares functional equivalency between attributes. A foreign key refers to the primary key of another table.
EDIT: @IanMcLaird has corrected me: the use of REFERENCES does always create a foreign key of some kind, although this conflicts with the popular definition of foreign key as "a set of attributes in a table that refers to the primary key of another table" (Wikipedia).
Using REFERENCES without FOREIGN KEY creates a "column-level" foreign key constraint, i.e. one declared as part of the column definition rather than as a separate table constraint.
Compare the following statements.
driver_id INT REFERENCES Drivers
driver_id INT REFERENCES Drivers(id)
driver_id INT,
FOREIGN KEY(driver_id) REFERENCES Drivers(id)
The first statement references the primary key of Drivers, since no column is specified. The second and third declare the same foreign key, once as a column constraint and once as a table constraint; "column-level" versus "table-level" only describes where the constraint is written, not how it behaves.
In all three cases, the referenced column must be the primary key of Drivers or be covered by a UNIQUE constraint. So id does not have to be the primary key, but it does have to be unique; SQLite, for example, reports a "foreign key mismatch" error otherwise when rows are written to the child table with foreign key enforcement on. This is somewhat unintuitive for those who first learn the popular definition of "foreign key" as something that refers to a primary key.
Some people use these three statements as if they were the same. The second and third always are; the first is equivalent to them only when id is the primary key of Drivers.
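A sketch to check this in SQLite via Python's sqlite3 (the Drivers/Cars tables and the name column are hypothetical): all three spellings reject a car whose driver does not exist, and referencing a column that is neither the primary key nor UNIQUE is reported as a "foreign key mismatch" error once rows are written to the child table. Other DBMSs report that last case differently; PostgreSQL, for example, already rejects the CREATE TABLE.

```python
import sqlite3

FORMS = {
    "column, no column list": "driver_id INT REFERENCES Drivers",
    "column, explicit column": "driver_id INT REFERENCES Drivers(id)",
    "table constraint": "driver_id INT, FOREIGN KEY (driver_id) REFERENCES Drivers(id)",
}

def orphan_rejected(child_column_sql: str) -> bool:
    """Create Cars with the given FK spelling and try to insert an orphan row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Drivers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(f"CREATE TABLE Cars (car_id INTEGER PRIMARY KEY, {child_column_sql})")
    try:
        conn.execute("INSERT INTO Cars (car_id, driver_id) VALUES (1, 99)")
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()

results = {label: orphan_rejected(sql) for label, sql in FORMS.items()}

# Referencing a non-unique column is a schema error, not another kind of FK.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Drivers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Cars (car_id INTEGER PRIMARY KEY, "
             "driver_name TEXT REFERENCES Drivers(name))")
try:
    conn.execute("INSERT INTO Cars VALUES (1, 'Ann')")
    mismatch_error = None
except sqlite3.DatabaseError as exc:  # base class of the sqlite3 error types
    mismatch_error = str(exc)
conn.close()
```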
All that said, this is just my understanding. I am not an expert in this subject and would greatly appreciate additions, corrections, and affirmations.

Creating PostgreSQL tables + relationships - PROBLEMS with relationships - ONE TO ONE

So I am supposed to create this schema + relationships exactly the way this ERD depicts it. Here I only show the tables that I am having problems with:
So I am trying to make it one to one but for some reason, no matter what I change, I get one to many on whatever table has the foreign key.
This is my sql for these two tables.
CREATE TABLE lab4.factory(
factory_id INTEGER UNIQUE,
address VARCHAR(100) NOT NULL,
PRIMARY KEY ( factory_id )
);
CREATE TABLE lab4.employee(
employee_id INTEGER UNIQUE,
employee_name VARCHAR(100) NOT NULL,
factory_id INTEGER REFERENCES lab4.factory(factory_id),
PRIMARY KEY ( employee_id )
);
Here I get the same thing. I am not getting the one to one relationship but one to many. Invoiceline is a weak entity.
And here is my code for the second image.
CREATE TABLE lab4.product(
product_id INTEGER PRIMARY KEY,
product_name VARCHAR(100) NOT NULL
);
CREATE TABLE lab4.invoiceLine(
line_number INTEGER NOT NULL,
quantity INTEGER NOT NULL,
curr_price INTEGER NOT NULL,
inv_no INTEGER REFERENCES lab4.invoice,
product_id INTEGER REFERENCES lab4.product(product_id),
PRIMARY KEY ( inv_no, line_number )
);
I would appreciate any help. Thanks.
One-to-one isn't well represented as a first-class relationship type in standard SQL. Much like many-to-many, which is achieved using a connector table and two one-to-many relationships, there's no true "one to one" in SQL.
There are a couple of options:
Create an ordinary foreign key constraint ("one to many" style) and then add a UNIQUE constraint on the referring FK column. This means that each referred-to value may appear at most once in the referring column, making the relationship an optional one-to-one. This is a fairly simple and quite forgiving approach that works well.
Use a normal FK relationship that could model 1:m, and let your app ensure it's only ever 1:1 in practice. I do not recommend this: there's only a small write performance downside to adding the unique index on the FK column, and it helps ensure data validity, find app bugs, and avoid confusing someone else who needs to modify the schema later.
Create reciprocal foreign keys, which is possible only if your database supports deferrable foreign key constraints. This is a bit more complex to code, but allows you to implement one-to-one mandatory relationships. Each entity has a foreign key reference to the other's PK in a unique column. One or both of the constraints must be DEFERRABLE and either INITIALLY DEFERRED or used with a SET CONSTRAINTS call, since you must defer one of the constraint checks to set up the circular dependency. This is a fairly advanced technique that is not necessary for the vast majority of applications.
Use pre-commit triggers if your database supports them, so you can verify that when entity A is inserted exactly one entity B is also inserted and vice versa, with corresponding checks for updates and deletes. This can be slow and is usually unnecessary, plus many database systems don't support pre-commit triggers.
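A sketch of options 1 and 3 in SQLite via Python's sqlite3, reusing the question's factory and employee tables (the factory.employee_id column in the second design is a hypothetical addition). Option 1 rejects a second employee for the same factory; option 3 accepts a factory and its employee inserted in one transaction but rejects a lone factory at COMMIT. As described, option 3 does not by itself guarantee that the two references point at each other.

```python
import sqlite3

def unique_fk_demo() -> bool:
    """Option 1: ordinary FK plus UNIQUE on the referring column.
    Returns True if a second employee could be attached to the same factory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE factory (
            factory_id INTEGER PRIMARY KEY,
            address    VARCHAR(100) NOT NULL
        );
        CREATE TABLE employee (
            employee_id   INTEGER PRIMARY KEY,
            employee_name VARCHAR(100) NOT NULL,
            -- UNIQUE: each factory_id may appear at most once here
            factory_id    INTEGER UNIQUE REFERENCES factory(factory_id)
        );
    """)
    conn.execute("INSERT INTO factory VALUES (1, 'Main St')")
    conn.execute("INSERT INTO employee VALUES (10, 'Ann', 1)")
    try:
        conn.execute("INSERT INTO employee VALUES (11, 'Bob', 1)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

def reciprocal_fk_demo() -> tuple:
    """Option 3: reciprocal, deferred FKs (mandatory on both sides).
    Returns (whether a lone factory could be committed, factories left afterwards)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT below
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE factory (
            factory_id  INTEGER PRIMARY KEY,
            employee_id INTEGER NOT NULL UNIQUE
                REFERENCES employee(employee_id) DEFERRABLE INITIALLY DEFERRED
        );
        CREATE TABLE employee (
            employee_id INTEGER PRIMARY KEY,
            factory_id  INTEGER NOT NULL UNIQUE
                REFERENCES factory(factory_id) DEFERRABLE INITIALLY DEFERRED
        );
    """)
    # Both sides inserted in one transaction: the deferred checks pass at COMMIT.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO factory VALUES (1, 10)")
    conn.execute("INSERT INTO employee VALUES (10, 1)")
    conn.execute("COMMIT")
    # A factory without its employee is only rejected when the transaction commits.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO factory VALUES (2, 20)")
    try:
        conn.execute("COMMIT")
        committed = True
    except sqlite3.IntegrityError:
        committed = False
        if conn.in_transaction:
            conn.execute("ROLLBACK")
    count = conn.execute("SELECT COUNT(*) FROM factory").fetchone()[0]
    conn.close()
    return committed, count

print(unique_fk_demo())      # False
print(reciprocal_fk_demo())  # (False, 1)
```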

Why is the foreign key part of the primary key in an identifying relationship?

I'm trying to understand a concept rather than fixing a piece of code that won't work.
I'll take a general example of a form (parent table) and a form field (child table). Logically, this would be an identifying relationship, since a form field cannot exist without a form.
This would make me think that in order to translate the logical relationship into the technical relationship, a simple NOT NULL for the form_id field in the form_field table would suffice. (See the left part of above screenshot.)
However, when I add an identifying relationship using MySQL Workbench, form_id is not only NOT NULL but also part of the primary key. (See the right part of above screenshot.) And when I add a non-identifying relationship, NOT NULL is still applied so logically it would actually be an identifying relationship as well.
I guess this confuses me a little, as well as the fact that until now I always simply used the id field as primary key.
So I understand the logical concept of identifying vs. non-identifying relationships, but I don't understand the technical part.
Why is it, as this answer states, 'the "right" way to make the foreign key part of the child's primary key'?
What is the benefit of these composite primary keys?
Logically, this would be an identifying relationship, since a form field cannot exist without a form.
No, identifying relationship is about identification, not existence.
Any X:Y relationship where X >= 1 guarantees existence of the left side, whether identifying or not. In your case, a 1:N relationship guarantees existence of form for any given form_field. You could make it identifying or non-identifying and it would still guarantee the same.
Remarks:
You would model an identifying relationship by making form_field.form_id part of a key. For example form_field PK could look like: {form_id, label}, which BTW would be quite beneficial for proper clustering of your data (InnoDB tables are always clustered).
Just making a PK: {id, form_id} would be incorrect, since this superkey is not a candidate key (i.e. it is not minimal - we could remove form_id from it and still retain the uniqueness).
You would model a 0..1:N relationship by making the form_field.form_id NULL-able (but then you wouldn't be able to make it identifying as well - see below).
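A sketch contrasting the two designs in SQLite via Python's sqlite3 (the title column and sample data are hypothetical): the identifying form_field has PK {form_id, label}, the non-identifying variant has a surrogate id and a NOT NULL form_id. Both refuse a field without a form, which is the existence guarantee described above; only the identifying one also makes label unique per form. The explicit NOT NULL on the key columns is there because SQLite, unlike standard SQL, tolerates NULL in primary key columns of ordinary tables (other than INTEGER PRIMARY KEY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE form (
        form_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    -- Identifying: the migrated form_id is part of the child's primary key.
    CREATE TABLE form_field (
        form_id INTEGER NOT NULL REFERENCES form(form_id),
        label   TEXT    NOT NULL,
        PRIMARY KEY (form_id, label)
    );
    -- Non-identifying but mandatory (1:N): surrogate key plus NOT NULL FK.
    CREATE TABLE form_field_surrogate (
        id      INTEGER PRIMARY KEY,
        form_id INTEGER NOT NULL REFERENCES form(form_id),
        label   TEXT    NOT NULL
    );
""")
conn.executemany("INSERT INTO form VALUES (?, ?)", [(1, 'Signup'), (2, 'Survey')])
# The same label is fine on different forms.
conn.executemany("INSERT INTO form_field VALUES (?, ?)", [(1, 'Name'), (2, 'Name')])

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

checks = {
    "duplicate label within a form": rejected("INSERT INTO form_field VALUES (1, 'Name')"),
    "identifying child without parent": rejected("INSERT INTO form_field VALUES (99, 'Age')"),
    "non-identifying child without parent": rejected(
        "INSERT INTO form_field_surrogate (form_id, label) VALUES (99, 'Age')"),
    "non-identifying child with NULL parent": rejected(
        "INSERT INTO form_field_surrogate (form_id, label) VALUES (NULL, 'Age')"),
}
```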
There are two definitions of the "identifying relationship":
Strict definition: A relationship that migrates the parent key into the child primary key [1].
Loose definition: A relationship that migrates the parent key into a child key.
In other words, the loose definition allows migration into an alternate key as well (and not just the primary key).
Most tools [2] seem to use the strict definition though, so if you mark the relationship as identifying, that will automatically make the migrated attributes part of the child PK, and none of the PK attributes can be NULL.
[1] Which is then either composed entirely of migrated attributes, or is a combination of migrated attributes and some additional attributes.
[2] ERwin and Visio do. I haven't used MySQL Workbench for modeling yet, but your description seems to suggest it behaves the same.
An identifying relationship is supposed to be one where the primary key includes foreign key attributes. That's why when you designate a relationship as identifying the posted foreign key is deemed to be part of the primary key.
The difference between an "identifying" relationship and a non-identifying one is purely informational or diagrammatic if the same key constraints and nullability constraints apply in each case. The concept is analogous to and a consequence of designating a "primary" key. If a table has more than one candidate key then, all other things being equal, it doesn't matter from a logical perspective which key is designated the primary one: the form, function and (presumably) the business meaning of the table are the same.
In your example however, the keys in the two tables are NOT the same. In the first case ID is unique in the form_field table while in the second case it apparently isn't. I expect that's not what you intended.

Maintaining subclass integrity in a relational database

Let's say I have a table that represents a super class, students. And then I have N tables that represent subclasses of that object (athletes, musicians, etc). How can I express a constraint such that a student must be modeled in one (not more, not less) subclass?
Clarifications regarding comments:
This is being maintained manually, not through an ORM package.
The project this relates to sits atop SQL Server (but it would be nice to see a generic solution)
This may not have been the best example. There are a couple scenarios we can consider regarding subclassing, and I just happened to invent this student/athlete example.
A) In true object-oriented fashion, it's possible that the superclass can exist by itself and need not be modeled in any subclasses.
B) In real life, any object or student can have multiple roles.
C) The particular scenario I was trying to illustrate was requiring that every object be implemented in exactly one subclass. Think of the superclass as an abstract implementation, or just commonalities factored out of otherwise disparate object classes/instances.
Thanks to all for your input, especially Bill.
Each Student record will have a SubClass column (assume for the sake of argument it's a CHAR(1)): A = Athlete, M = Musician, ...
Now create your Athlete and Musician tables. They should also have a SubClass column, but there should be a check constraint hard-coding the value for the type of table they represent. For example, you should put a default of 'A' and a CHECK constraint of 'A' for the SubClass column on the Athlete table.
Link your Musician and Athlete tables to the Student table using a COMPOSITE foreign key of StudentID AND Subclass. And you're done! Go enjoy a nice cup of coffee.
CREATE TABLE Student (
StudentID INT NOT NULL IDENTITY PRIMARY KEY,
SubClass CHAR(1) NOT NULL,
Name VARCHAR(200) NOT NULL,
CONSTRAINT UQ_Student UNIQUE (StudentID, SubClass)
);
CREATE TABLE Athlete (
StudentID INT NOT NULL PRIMARY KEY,
SubClass CHAR(1) NOT NULL,
Sport VARCHAR(200) NOT NULL,
CONSTRAINT CHK_Jock CHECK (SubClass = 'A'),
CONSTRAINT FK_Student_Athlete FOREIGN KEY (StudentID, Subclass) REFERENCES Student(StudentID, Subclass)
);
CREATE TABLE Musician (
StudentID INT NOT NULL PRIMARY KEY,
SubClass CHAR(1) NOT NULL,
Instrument VARCHAR(200) NOT NULL,
CONSTRAINT CHK_Band_Nerd CHECK (SubClass = 'M'),
CONSTRAINT FK_Student_Musician FOREIGN KEY (StudentID, Subclass) REFERENCES Student(StudentID, Subclass)
);
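The same design ported to SQLite and run through Python's sqlite3, as a sketch (IDENTITY becomes INTEGER PRIMARY KEY, and the DEFAULT values recommended above are added): once student 1 is recorded as an athlete, a Musician row for the same student is rejected by the composite foreign key. The final query shows what this design does not enforce: a student with no subclass row at all is still accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,                     -- SQLite stand-in for IDENTITY
        SubClass  CHAR(1) NOT NULL,
        Name      VARCHAR(200) NOT NULL,
        CONSTRAINT UQ_Student UNIQUE (StudentID, SubClass)  -- target of the composite FKs
    );
    CREATE TABLE Athlete (
        StudentID INTEGER NOT NULL PRIMARY KEY,
        SubClass  CHAR(1) NOT NULL DEFAULT 'A',
        Sport     VARCHAR(200) NOT NULL,
        CONSTRAINT CHK_Jock CHECK (SubClass = 'A'),
        CONSTRAINT FK_Student_Athlete FOREIGN KEY (StudentID, SubClass)
            REFERENCES Student (StudentID, SubClass)
    );
    CREATE TABLE Musician (
        StudentID  INTEGER NOT NULL PRIMARY KEY,
        SubClass   CHAR(1) NOT NULL DEFAULT 'M',
        Instrument VARCHAR(200) NOT NULL,
        CONSTRAINT CHK_Band_Nerd CHECK (SubClass = 'M'),
        CONSTRAINT FK_Student_Musician FOREIGN KEY (StudentID, SubClass)
            REFERENCES Student (StudentID, SubClass)
    );
""")
conn.execute("INSERT INTO Student VALUES (1, 'A', 'Ann')")
conn.execute("INSERT INTO Athlete (StudentID, Sport) VALUES (1, 'Rowing')")

try:
    # Student 1 is an athlete, so the FK (1, 'M') has no matching Student row.
    conn.execute("INSERT INTO Musician (StudentID, Instrument) VALUES (1, 'Oboe')")
    second_subclass_rejected = False
except sqlite3.IntegrityError:
    second_subclass_rejected = True

conn.execute("INSERT INTO Student VALUES (2, 'M', 'Bob')")  # no Musician row: still accepted
unsubclassed = conn.execute("""
    SELECT s.StudentID FROM Student s
    WHERE NOT EXISTS (SELECT 1 FROM Athlete a WHERE a.StudentID = s.StudentID)
      AND NOT EXISTS (SELECT 1 FROM Musician m WHERE m.StudentID = s.StudentID)
""").fetchall()
```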
Here are a couple of possibilities. One is a CHECK in each table that the student_id does not appear in any of the other sister subtype tables. This is probably expensive and every time you need a new subtype, you need to modify the constraint in all the existing tables.
CREATE TABLE athletes (
student_id INT NOT NULL PRIMARY KEY,
FOREIGN KEY (student_id) REFERENCES students(student_id),
CHECK (student_id NOT IN (SELECT student_id FROM musicians
UNION SELECT student_id FROM slackers
UNION ...))
);
edit: @JackPDouglas correctly points out that the above form of CHECK constraint is not supported by Microsoft SQL Server. Nor, in fact, is it valid per the SQL-99 standard to reference another table (see http://kb.askmonty.org/v/constraint_type-check-constraint).
SQL-99 defines a metadata object for multi-table constraints. This is called an ASSERTION; however, I don't know any RDBMS that implements assertions.
Probably a better way is to make the primary key in the students table a compound primary key whose second column denotes the subtype. Then restrict that column in each child table to the single value corresponding to the subtype the table represents. edit: there is no need to make the PK a compound key in the child tables.
CREATE TABLE athletes (
student_id INT NOT NULL PRIMARY KEY,
student_type CHAR(4) NOT NULL CHECK (student_type = 'ATHL'),
FOREIGN KEY (student_id, student_type) REFERENCES students(student_id, student_type)
);
Of course student_type could just as easily be an integer, I'm just showing it as a char for illustration purposes.
If you don't have support for CHECK constraints (e.g. MySQL before 8.0.16, which parsed but ignored them), then you can do something similar in a trigger.
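A sketch of the trigger alternative, written for SQLite via Python's sqlite3 so it can run. SQLite does enforce CHECK, so this only illustrates the shape of the idea; MySQL trigger syntax differs (it raises errors with SIGNAL SQLSTATE, for instance). The table layout follows the compound-key design above, and the trigger names and error message are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE students (
        student_id   INTEGER NOT NULL,
        student_type CHAR(4) NOT NULL,
        PRIMARY KEY (student_id, student_type)
    );
    CREATE TABLE athletes (
        student_id   INTEGER NOT NULL PRIMARY KEY,
        student_type CHAR(4) NOT NULL,   -- no CHECK here: the triggers enforce 'ATHL'
        FOREIGN KEY (student_id, student_type) REFERENCES students (student_id, student_type)
    );
    CREATE TRIGGER athletes_type_insert BEFORE INSERT ON athletes
    WHEN NEW.student_type <> 'ATHL'
    BEGIN
        SELECT RAISE(ABORT, 'athletes.student_type must be ATHL');
    END;
    CREATE TRIGGER athletes_type_update BEFORE UPDATE OF student_type ON athletes
    WHEN NEW.student_type <> 'ATHL'
    BEGIN
        SELECT RAISE(ABORT, 'athletes.student_type must be ATHL');
    END;
""")
conn.execute("INSERT INTO students VALUES (1, 'ATHL')")
conn.execute("INSERT INTO students VALUES (2, 'MUSC')")
conn.execute("INSERT INTO athletes VALUES (1, 'ATHL')")  # passes the trigger and the FK

try:
    conn.execute("INSERT INTO athletes VALUES (2, 'MUSC')")  # a musician in athletes
    trigger_error = None
except sqlite3.DatabaseError as exc:
    trigger_error = str(exc)

try:
    conn.execute("UPDATE athletes SET student_type = 'MUSC' WHERE student_id = 1")
    update_error = None
except sqlite3.DatabaseError as exc:
    update_error = str(exc)
```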
I read your followup about making sure a row exists in some subclass table for every row in the superclass table. I don't think there's a practical way to do this with SQL metadata and constraints. The only option I can suggest to meet this requirement is to use Single-Table Inheritance. Otherwise you need to rely on application code to enforce it.
edit: JackPDouglas also suggests using a design based on Class Table Inheritance. See his example or my examples of the similar technique here or here or here.
If you are interested in data modeling, in addition to object modeling, I suggest you look up "relational modeling generalization specialization" on the web.
There used to be some good resources out there that explain this kind of pattern quite well.
I hope those resources are still there.
Here's a simplified view of what I hope you'll find.
Before you begin designing a database, it's useful to come up with a conceptual data model that connects the values stored in the database back to the subject matter. Making a conceptual data model is really data analysis, not database design. Sometimes it's difficult to keep analysis and design separate.
One way of modeling data at the conceptual level is the Entity-Relationship (ER) model. There are well known patterns for modeling the specialization-generalization situation. Converting those ER patterns to SQL tables (called logical design) is pretty straightforward, although you do have to make some design choices.
The case you gave of a student having possibly several roles like musician probably doesn't illustrate the case you are interested in, if I read you right. You seem to be interested in the case where the subclasses are mutually exclusive. Perhaps the case where a vehicle might be an auto, a truck, or a motorcycle might be easier to discuss.
One difference you are likely to encounter is that the general table for the superclass doesn't really need the type code column. The type of a single superclass instance can be derived from the presence or absence of foreign keys in the various subclass tables. Whether it's smarter to include or omit the type code depends on how you intend to use the data.
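A sketch of deriving the type without a type code, using the vehicle example and SQLite via Python's sqlite3 (the subclass columns such as seats and payload_kg are hypothetical): each vehicle's type is read off which subclass table holds its key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle    (vehicle_id INTEGER PRIMARY KEY, vin TEXT NOT NULL);
    CREATE TABLE auto       (vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id), seats INTEGER);
    CREATE TABLE truck      (vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id), payload_kg INTEGER);
    CREATE TABLE motorcycle (vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id), has_sidecar INTEGER);
    INSERT INTO vehicle VALUES (1, 'V1'), (2, 'V2'), (3, 'V3'), (4, 'V4');
    INSERT INTO auto VALUES (1, 5);
    INSERT INTO truck VALUES (2, 12000);
    INSERT INTO motorcycle VALUES (3, 0);
""")

# No type-code column: the subclass follows from which subclass table has the key.
rows = conn.execute("""
    SELECT v.vehicle_id,
           CASE
               WHEN a.vehicle_id IS NOT NULL THEN 'auto'
               WHEN t.vehicle_id IS NOT NULL THEN 'truck'
               WHEN m.vehicle_id IS NOT NULL THEN 'motorcycle'
               ELSE 'unclassified'
           END AS vehicle_type
    FROM vehicle v
    LEFT JOIN auto a       ON a.vehicle_id = v.vehicle_id
    LEFT JOIN truck t      ON t.vehicle_id = v.vehicle_id
    LEFT JOIN motorcycle m ON m.vehicle_id = v.vehicle_id
    ORDER BY v.vehicle_id
""").fetchall()
print(rows)  # [(1, 'auto'), (2, 'truck'), (3, 'motorcycle'), (4, 'unclassified')]
```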
Interesting problem. Of course, the FK constraints on the subtables ensure that there has to be a student for each subtable row.
The main problem is trying to check as each row is inserted. The student has to be inserted first so that you don't violate an FK constraint in a subtable, so a trigger that checks for the subclass row when the student is inserted wouldn't work.
You could write an app that checks now and then if you are really concerned about this. I think the biggest fear though would be deletions. Someone could delete a subtable entry but not the student. You could have triggers to check when items are deleted from the subtables since that is probably the biggest problem.
I have a db with a table-per-subclass hierarchy like this as well. I use Hibernate and it's mapped properly, so it deletes everything automatically. If doing this by 'hand', then I would make sure to always delete the parent with proper cascades hehe :)
Thanks, Bill. You got me thinking...
The superclass table has a subclass code column. Each of the subclass tables has a foreign key constraint, as well as one that dictates that the id exist with a subset of the superclass table (where code = athlete).
The only missing part here is that it's possible to model a superclass without a subclass. Even if you make the code column mandatory, it could just be an empty join. That can be fixed by adding a constraint that the superclass's ids exist in a union of the ids in the subclass tables. Insertion gets a little hairy with these two constraints if constraints are enforced in the middle of transactions. That or just don't worry about unsubclassed objects.
Edit: Bleh, such a good sounding idea... But impeded by the fact that subqueries that refer to other tables aren't supported. At least not in SQL Server.
That can be fixed by adding a constraint that the superclass's ids exist in a union of the ids in the subclass tables.
Depending on how much intelligence you want to put into your schema (and how much MS SQL Server lets you put there), you wouldn't actually need to do a union of the subclass tables, since you know that, if the id exists in any subclass table, it must exist in the same subclass as the one identified by the subclass code column.
I would possibly add a CHECK constraint.
Create the foreign keys as nullable.
Add a CHECK to make sure they aren't both NULL and they aren't both set:
CONSTRAINT [CK_HasOneForeignKey] CHECK ((FK_First IS NOT NULL OR FK_Second IS NOT NULL) AND NOT (FK_First IS NOT NULL AND FK_Second IS NOT NULL))
I am not sure but I believe this would allow you to set only one key at a time.
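Whatever the exact spelling, a comparison such as FK_First != NULL is never true or false, only unknown, and a CHECK constraint rejects a row only when its condition is false. That is standard three-valued logic, and SQL Server behaves the same with its default ANSI_NULLS ON. A sketch in SQLite via Python's sqlite3 (table t and its column names are hypothetical) inserts all four NULL/non-NULL combinations under both spellings:

```python
import sqlite3

def accepted_rows(check_sql: str) -> list:
    """Insert all four NULL/non-NULL combinations and return those the CHECK accepts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, fk_first INTEGER, "
                 f"fk_second INTEGER, CHECK ({check_sql}))")
    accepted = []
    for first, second in [(None, None), (1, None), (None, 2), (1, 2)]:
        try:
            conn.execute("INSERT INTO t (fk_first, fk_second) VALUES (?, ?)", (first, second))
            accepted.append((first, second))
        except sqlite3.IntegrityError:
            pass  # CHECK condition evaluated to false
    conn.close()
    return accepted

# Every '!= NULL' yields NULL, so the whole condition is NULL and never rejects.
broken = ("(fk_first != NULL OR fk_second != NULL) "
          "AND NOT (fk_first != NULL AND fk_second != NULL)")
# IS [NOT] NULL always yields true or false.
fixed = ("(fk_first IS NOT NULL OR fk_second IS NOT NULL) "
         "AND NOT (fk_first IS NOT NULL AND fk_second IS NOT NULL)")

print(accepted_rows(broken))  # [(None, None), (1, None), (None, 2), (1, 2)]
print(accepted_rows(fixed))   # [(1, None), (None, 2)]
```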