I have the following tables:
Section and Content
And I want to relate them.
My current approach is the following table:
In which I would store
Section to Section
Section to Content
Content to Section
Content to Content
Now, while I clearly can do that by adding a pair of fields that indicate whether the source is a section or a content, and whether the target is a section or a content, I'd like to know if there's a cleaner way to do this. and if possible using just one table for the relationship, which would be the cleanest in my opinion. I'd also like the table to be somehow related to the Section and Content tables so I can avoid manually adding constraints, or triggers that delete the relationships when a Section or Content is deleted...
Thanks as usual for the input! <3
Here's how I would design it:
CREATE TABLE Pairables (
PairableID INT IDENTITY PRIMARY KEY,
...other columns common to both Section and Content...
);
CREATE TABLE Sections (
SectionID INT PRIMARY KEY,
...other columns specific to sections...
FOREIGN KEY (SectionID) REFERENCES Pairables(PairableID)
);
CREATE TABLE Contents (
ContentID INT PRIMARY KEY,
...other columns specific to contents...
FOREIGN KEY (ContentID) REFERENCES Pairables(PairableID)
);
CREATE TABLE Pairs (
PairID INT NOT NULL,
PairableId INT NOT NULL,
IsSource BIT NOT NULL,
PRIMARY KEY (PairID, PairableID),
FOREIGN KEY (PairableID) REFERENCES Pairables(PairableID)
);
You would insert two rows in Pairs for each pair.
Now it's easy to search for either type of pairable entity, you can search for either source or target in the same column, and you still only need one many-to-many intersection table.
Yes, there is a much cleaner way to do this:
one table tracks the relations from Section to Section and enforces them as foreign key constraints
one table tracks the relations from Section to Content and enforces them as foreign key constraints
one table tracks the relations from Content to Section and enforces them as foreign key constraints
one table tracks the relations from Content to Content and enforces them as foreign key constraints
This is much cleaner than a single table with overloaded IDs that cannot be enforced by foreign key constraints. The fact that the data modeling, nor the domain modeling patterns, never mention a pattern like the one you describe should be your first alarm bell. The second alarm should be that the engine cannot enforce the constraints you envision and you have to dwell into triggers.
Having four distinct relationships modeled in one table brings no elegance to the model, it only adds obfuscation. Relational model is not C++: it has no inheritance, it has no polymorphism, it has no overloading. Trying to enforce a OO mind set into data modeling has led many a fine developers into a mud of unmaintainable trigger mesh of on-disk table-like bits vaguely resembling 'data'.
Related
Here is a picture of the schema for a contrived example that demonstrates the problem I am facing.
My question is: In SQL, how do I make sure that the parent records to Door (House and Door Blueprint) each both have the same House Blueprint parent? In other words, how do I make sure that every Door record only has one House Blueprint grandparent?
I am already using best practices for foreign keys to designate the one to many relationships. I need the Door table, because the door instance could be painted any color based on the house. I need the Door Blueprint table, because I want to track who designed the door. Also, in this contrived example, the Door Blueprint can only have a single parent House Blueprint (I know this isn't realistic, so just ignore the possibility that Door Blueprints could be used in multiple House Blueprints).
The problem I am running into is that I sometimes get Door records with a House record attached to one House Blueprint and a Door Blueprint attached to a different House Blueprint record. This should never happen. And I could probably prevent this in my record insertion logic, but that is not at the SQL level.
It makes me uneasy that I have two different paths back to House Blueprint from Door, but I don't see any other way of doing it.
I'm not really looking for a bunch of code snippets, because I can figure out the syntax myself. Rather, I'm looking for a high-level approach to solving the problem in SQL. Also, I am using SQLite3, but I imagine this problem can be solved in any RDBMS.
Thanks in advance for any help with this!
In a relational database, you'd use assertions. In a SQL database, use overlapping foreign key references to unique constraints. (Not foreign key references to candidate keys.)
create table house_blueprints (
house_blueprint_id integer primary key
);
create table houses (
house_id integer primary key,
house_blueprint_id integer not null,
foreign key (house_blueprint_id)
references house_blueprints (house_blueprint_id),
unique (house_id, house_blueprint_id)
);
create table door_blueprints (
door_blueprint_id integer primary key,
house_blueprint_id integer not null,
foreign key (house_blueprint_id)
references house_blueprints (house_blueprint_id),
unique (door_blueprint_id, house_blueprint_id)
);
create table doors (
door_id integer primary key,
house_id integer not null,
house_blueprint_id integer not null,
foreign key (house_id, house_blueprint_id)
references houses (house_id, house_blueprint_id),
door_blueprint_id integer not null,
foreign key (door_blueprint_id, house_blueprint_id)
references door_blueprints (door_blueprint_id, house_blueprint_id)
);
The unique constraints aren't candidate keys, because they're not minimal. But they're necessary to provide targets for the overlapping foreign keys.
The table "doors" has one column for house_blueprint_id, and two overlapping foreign key constraints that use it. No way those foreign key constraints can have different values for house_blueprint_id.
I know how to convert an entity set, relationship, etc. into the relational model but what i wonder is that what should we do when an entire diagram is given? How do we convert it? Do we create a separate table for each relationship, and for each entity set? For example, if we are given the following ER diagram:
My solution to this is like the following:
//this part includes the purchaser relationship and policies entity set
CREATE TABLE Policies (
policyid INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (policyid).
FOREIGN KEY (ssn) REFERENCES Employees,
ON DELETE CASCADE)
//this part includes the dependents weak entity set and beneficiary relationship
CREATE TABLE Dependents (
pname CHAR(20),
age INTEGER,
policyid INTEGER,
PRIMARY KEY (pname, policyid).
FOREIGN KEY (policyid) REFERENCES Policies,
ON DELETE CASCADE)
//This part includes Employees entity set
CREATE TABLE Employees(
ssn Char(11),
name char (20),
lot INTEGER,
PRIMARY KEY (ssn) )
My questions are:
1)Is my conversion true?
2)What are the steps for converting a complete diagram into relational model.
Here are the steps that i follow, is it true?
-I first look whether there are any weak entities or key constraints. If there
are one of them, then i create a single table for this entity set and the related
relationship. (Dependents with beneficiary, and policies with purchaser in my case)
-I create a separate table for the entity sets, which do not have any participation
or key constraints. (Employees in my case)
-If there are relationships with no constraints, I create separate table for them.
-So, in conclusion, every relationship and entity set in the diagram are included
in a table.
If my steps are not true or there is something i am missing, please can you write the steps for conversion? Also, what do we do if there is only participation constraint for a relationship, but no key constraint? Do we again create a single table for the related entity set and relationship?
I appreciate any help, i am new to databases and trying to learn this conversion.
Thank you
Hi #bigO I think it is safe to say that your conversion is true and the steps that you have followed are correct. However from an implementation point of view, there may be room for improvement. What you have implemented is more of a logical model than a physical model
It is common practice to add a Surrogate Instance Identifier to a physical table, this is a general requirement for most persistence engines, and as pointed out by #Pieter Geerkens, aids database efficiency. The value of the instance id for example EmployeeId (INT) would be automatically generated by the database on insert. This would also help with the issue that #Pieter Geerkens has pointed out with the SSN. Add the Id as the first column of all your tables, I follow a convention of tablenameId. Make your current primary keys into secondary keys ( the natural key).
Adding the Ids then makes it necessary to implement a DependentPolicy intersection table
DependentPolicyId, (PK)
PolicyId,
DependentId
You may then need to consider as to what is natural key of the Dependent table.
I notice that you have age as an attribute, you should consider whether this the age at the time the policy is created or the actual age of the dependent, I which case you should be using date of birth.
Other ornamentations you could consider are creation and modified dates.
I also generally favor using the singular for a table ie Employee not Employees.
Welcome to the world of data modeling and design.
So I am supposed to create this schema + relationships exactly the way this ERD depicts it. Here I only show the tables that I am having problems with:
So I am trying to make it one to one but for some reason, no matter what I change, I get one to many on whatever table has the foreign key.
This is my sql for these two tables.
CREATE TABLE lab4.factory(
factory_id INTEGER UNIQUE,
address VARCHAR(100) NOT NULL,
PRIMARY KEY ( factory_id )
);
CREATE TABLE lab4.employee(
employee_id INTEGER UNIQUE,
employee_name VARCHAR(100) NOT NULL,
factory_id INTEGER REFERENCES lab4.factory(factory_id),
PRIMARY KEY ( employee_id )
);
Here I get the same thing. I am not getting the one to one relationship but one to many. Invoiceline is a weak entity.
And here is my code for the second image.
CREATE TABLE lab4.product(
product_id INTEGER PRIMARY KEY,
product_name INTEGER NOT NULL
);
CREATE TABLE lab4.invoiceLine(
line_number INTEGER NOT NULL,
quantity INTEGER NOT NULL,
curr_price INTEGER NOT NULL,
inv_no INTEGER REFERENCES invoice,
product_id INTEGER REFERENCES lab4.product(product_id),
PRIMARY KEY ( inv_no, line_number )
);
I would appreciate any help. Thanks.
One-to-one isn't well represented as a first-class relationship type in standard SQL. Much like many-to-many, which is achieved using a connector table and two one-to-many relationships, there's no true "one to one" in SQL.
There are a couple of options:
Create an ordinary foreign key constraint ("one to many" style) and then add a UNIQUE constraint on the referring FK column. This means that no more than one of the referred-to values may appear in the referring column, making it one-to-one optional. This is a fairly simple and quite forgiving approach that works well.
Use a normal FK relationship that could model 1:m, and let your app ensure it's only ever 1:1 in practice. I do not recommend this, there's only a small write performance downside to adding the FK unique index and it helps ensure data validity, find app bugs, and avoid confusing someone else who needs to modify the schema later.
Create reciprocal foreign keys - possible only if your database supports deferrable foreign key constraints. This is a bit more complex to code, but allows you to implement one-to-one mandatory relationships. Each entity has a foreign key reference to the others' PK in a unique column. One or both of the constraints must be DEFERRABLE and either INITIALLY DEFERRED or used with a SET CONSTRAINTS call, since you must defer one of the constraint checks to set up the circular dependency. This is a fairly advanced technique that is not necessary for the vast majority of applications.
Use pre-commit triggers if your database supports them, so you can verify that when entity A is inserted exactly one entity B is also inserted and vice versa, with corresponding checks for updates and deletes. This can be slow and is usually unnecessary, plus many database systems don't support pre-commit triggers.
Setup
So here's a scenario which I'm finding is rather common once you decide to play with STI (Single Table Inheritance).
You have some base type with various subtypes.
Person < (Teacher,Student,Staff,etc)
User < (Member,Admin)
Member < (Buyer,Seller)
Vehicle < (Car,Boat,Plane)
etc.
There are two major approaches to modelling that in the database:
Single Table Inheritance
One big table with a type field and a bunch of nullable fields
Class Table Inheritance
One table per type with shared PK (FK'd from the children to the parent)
While there are several issues with STI, I do like how it manages to cut down on the number of joins you have to make, as well as some of the support in frameworks like Rails, but I am running into an issue on how to relate subclass-specific tables.
For example:
Certifications should only reference Teacher-Persons
Profiles should only reference Member-Users
WingInformation should be not be related to a car or boat (unless you are Batman maybe)
Advertisements are owned by Seller-Members not Buyer-Members
With CTI, these relationships are trivial - just slap a Foreign Key on the related table and you're done:
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id) REFERENCES sellers (id)
But with STI, the similar thing wouldn't capture the subtype restriction.
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id) REFERENCES members (id)
What I would like to see is something like:
* Does not work in most (all?) databases *
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id, 'seller') REFERENCES members (id, type)
All I have been able to find is a dirty hack requiring adding a computed column to the related table:
ALTER TABLE advertisements
ADD seller_type VARCHAR(20) NOT NULL DEFAULT 'seller'
ALTER TABLE advertisements
FOREIGN KEY (seller_id, seller-type) REFERENCES members (id, type)
This strikes me as odd (not to mention inelegant).
The real questions
Is there a RDBMS out there which will allow me to do this?
Is there a reason why this isn't even possible?
Is this just one more reason why NOT to use STI except in the most trivial of cases?
There's no standard way to declare a constant in the foreign key declaration. You have to name columns.
But you could force the column to have a fixed value, using one of the following methods:
Computed column
CHECK constraint
Trigger before INSERT/UPDATE to overwrite any user-supplied value with the default value.
I am trying to so something like Database Design for Tagging, except each of my tags are grouped into categories.
For example, let's say I have a database about vehicles. Let's say we actually don't know very much about vehicles, so we can't specify the columns all vehicles will have. Therefore we shall "tag" vehicles with information.
1. manufacture: Mercedes
model: SLK32 AMG
convertible: hardtop
2. manufacture: Ford
model: GT90
production phase: prototype
3. manufacture: Mazda
model: MX-5
convertible: softtop
Now as you can see all cars are tagged with their manufacture and model, but the other categories don't all match. Note that a car can only have one of each category. IE. A car can only have one manufacturer.
I want to design a database to support a search for all Mercedes, or to be able to list all manufactures.
My current design is something like this:
vehicles
int vid
String vin
vehicleTags
int vid
int tid
tags
int tid
String tag
int cid
categories
int cid
String category
I have all the right primary and foreign keys in place, except I can't handle the case where each car can only have one manufacturer. Or can I?
Can I add a foreign key constraint to the composite primary key in vehicleTags? IE. Could I add a constraint such that the composite primary key (vid, tid) can only be added to vehicleTags only if there isn't already a row in vehicleTags such that for the same vid, there isn't already a tid in the with the same cid?
My guess is no. I think the solution to this problem is add a cid column to vehicleTags, and make the new composite primary key (vid, cid). It would look like:
vehicleTags
int vid
int cid
int tid
This would prevent a car from having two manufacturers, but now I have duplicated the information that tid is in cid.
What should my schema be?
Tom noticed this problem in my database schema in my previous question, How do you do many to many table outer joins?
EDIT
I know that in the example manufacture should really be a column in the vehicle table, but let's say you can't do that. The example is just an example.
This is yet another variation on the Entity-Attribute-Value design.
A more recognizable EAV table looks like the following:
CREATE TABLE vehicleEAV (
vid INTEGER,
attr_name VARCHAR(20),
attr_value VARCHAR(100),
PRIMARY KEY (vid, attr_name),
FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.
What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER,
tid INTEGER,
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid),
FOREIGN KEY (tid) REFERENCES tags(tid)
);
CREATE TABLE categories (
cid INTEGER PRIMARY KEY,
category VARCHAR(20) -- "attr_name"
);
CREATE TABLE tags (
tid INTEGER PRIMARY KEY,
tag VARCHAR(100) -- "attr_value"
);
If you're going to use the EAV design, you only need the vehicleTags and categories tables.
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER, -- reference to "attr_name" lookup table
tag VARCHAR(100, -- "attr_value"
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid)
);
But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.
How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.
SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you have need to have "subtypes" then define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have an unlimited variation in the attributes per entity, then use Serialized LOB.
Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is RDF4J (formerly Sesame).
I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
What you describe are not tags, tags are only values, they do not have an associated key.
Tags are normally implemented as a string column, the value being a list of values delimited.
For example #1, a tag field would contain a value such as:
"manufacture_Mercedes,model_SLK32 AMG,convertible_hardtop"
The user then would normally be able to easily filter entries, by the existence of one or more tags. It is essentially schemaless data from a database perspective. There are downsides to tags, but they also avoid the extreme complications that come from using an EAV model. If you really need an EAV model, it also might be worth considering an attributes field, which contains JSON data. It's more painful to query, but still not as horrible as querying EAV across multiple tables.
I think your solution is to simply add a manufacturer column to your vehicles table. It's an attribute that you know all the vehicles will have (i.e. cars don't spontaneously appear by themselves) and by making it a column in your vehicle table you solve the issue of having one and only one manufacturer for each vehicle. This approach would apply to any attributes that you know will be shared by all vehicles. You can then implement the tagging system for the other attributes that aren't universal.
So taking from your example the vehicle table would be something like:
vehicle
vid
vin
make
model
One way would be to slightly rethink your schema, normalising tag keys away from values:
vehicles
int vid
string vin
tags
int tid
int cid
string key
categories
int cid
string category
vehicleTags
int vid
int tid
string value
Now all you need is a unique constraint on vehicleTags(vid, tid).
Alternatively, there are ways to create constraints beyond simple foreign keys: depending on your database, can you write a custom constraint or an insert/update trigger to enforce vehicle-tag uniqueness?
I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
These days, I almost never end up building a database-backed web app that doesn't involve a full-text indexer. This problem and the general issue of search just come up way too often to omit indexers from your toolbox.