Converting an ER diagram to relational model - sql

I know how to convert an entity set, relationship, etc. into the relational model but what i wonder is that what should we do when an entire diagram is given? How do we convert it? Do we create a separate table for each relationship, and for each entity set? For example, if we are given the following ER diagram:
My solution to this is like the following:
//this part includes the purchaser relationship and policies entity set
CREATE TABLE Policies (
policyid INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (policyid).
FOREIGN KEY (ssn) REFERENCES Employees,
ON DELETE CASCADE)
//this part includes the dependents weak entity set and beneficiary relationship
CREATE TABLE Dependents (
pname CHAR(20),
age INTEGER,
policyid INTEGER,
PRIMARY KEY (pname, policyid).
FOREIGN KEY (policyid) REFERENCES Policies,
ON DELETE CASCADE)
//This part includes Employees entity set
CREATE TABLE Employees(
ssn Char(11),
name char (20),
lot INTEGER,
PRIMARY KEY (ssn) )
My questions are:
1)Is my conversion true?
2)What are the steps for converting a complete diagram into relational model.
Here are the steps that i follow, is it true?
-I first look whether there are any weak entities or key constraints. If there
are one of them, then i create a single table for this entity set and the related
relationship. (Dependents with beneficiary, and policies with purchaser in my case)
-I create a separate table for the entity sets, which do not have any participation
or key constraints. (Employees in my case)
-If there are relationships with no constraints, I create separate table for them.
-So, in conclusion, every relationship and entity set in the diagram are included
in a table.
If my steps are not true or there is something i am missing, please can you write the steps for conversion? Also, what do we do if there is only participation constraint for a relationship, but no key constraint? Do we again create a single table for the related entity set and relationship?
I appreciate any help, i am new to databases and trying to learn this conversion.
Thank you

Hi #bigO I think it is safe to say that your conversion is true and the steps that you have followed are correct. However from an implementation point of view, there may be room for improvement. What you have implemented is more of a logical model than a physical model
It is common practice to add a Surrogate Instance Identifier to a physical table, this is a general requirement for most persistence engines, and as pointed out by #Pieter Geerkens, aids database efficiency. The value of the instance id for example EmployeeId (INT) would be automatically generated by the database on insert. This would also help with the issue that #Pieter Geerkens has pointed out with the SSN. Add the Id as the first column of all your tables, I follow a convention of tablenameId. Make your current primary keys into secondary keys ( the natural key).
Adding the Ids then makes it necessary to implement a DependentPolicy intersection table
DependentPolicyId, (PK)
PolicyId,
DependentId
You may then need to consider as to what is natural key of the Dependent table.
I notice that you have age as an attribute, you should consider whether this the age at the time the policy is created or the actual age of the dependent, I which case you should be using date of birth.
Other ornamentations you could consider are creation and modified dates.
I also generally favor using the singular for a table ie Employee not Employees.
Welcome to the world of data modeling and design.

Related

get confused in primary key concept and nomalisation

Let's take an example of a login table with the columns: id (primary key),username,password,status.
id is primary key but still we authenticate user by searching table through username+password. Doesn't it violate normalisation rule?
Another example: suppose we have two tables, employer and job
employer table's id is used injob table as foreign key but job table itself has its own id
job table
---------
id (primary key) || employer_id (foreign key) || etc etc
Now when we will search job posted by a employer we use employer_id but this table has its primary key?
Primary key on any table is an unique identifier and indexed by default. It does not mean that records will always be searched through that field itself. Now, when you are to link a parent record to a child record, you can use the primary key to establish the relationships.
While trying to get the records, you will make use of primary key as a link between different tables. For example, you have employers and employees. A search might look like: "Get me all the employees for this employer". Now, employer is the main entity here and we are looking for linked employee records for it. Here, employer id will help us fetch related records. A query for the same might appear something like this:
SELECT [COLUMN NAMES HERE]
FROM EMPLOYER INNER JOIN EMPLOYEE ON
EMPLOYER.ID = EMPLOYEE.EMPLOYERID
I suggest you to read some tutorial, but, shortly ... a primary key is an id that is unique and not null and identifies an entry in your table. A foreign key is a reference to an id in another table. As you said, employee and job tables. An id is in most cases generated by a sequence, you don't know its value before inserting the record.
You usually perform seraches through username, name, ... datas that are user-known. In your case, when you search for a job, you will probably perform a Join. A join is a relation (and there are more types) between tables. In your case you will do
select *
from employer emp inner join job jjj on emp.id = jjj.employer _id
The ids are usefull, code-wise, when you have to update/delete a record. In this case usually you know everything about your record, id included, then you will use the id (also because ids usually have indexes, so the queries are faster). You can usually create indexes in the columns you use in filters, in order to reduce the execution time of your queries.
We need to distinguish between business (or candidate) keys and technical (or surrogate) keys. The business key is the data item(s) which uniquely identify this row in the real world. The technical key is a convenience for data management, generated by some computer process such as a sequence or sys_guid().
Yes, the use of technical keys means storing redundant information but this is a case where practicality trumps theory. Primary keys should not change but in real life they can (e.g. people change their name due to various life events). Technical keys have no meaning so do not change. Business keys are often compound keys, which is often inconvenient for enforcing foreign keys (and occasionally highly undesirable, for instance when the business key is a sensitive data item such as Social Security Number).
So, it is common for tables to have an ID column as the primary key, which is used for foreign key relationships, and a unique constraint to enforce the business key.
In your first example username is a business key and id is a technical key. This is one reason why we should have two data models. The Logical Data Model has an Entity called user with a candidate key of username. The physical data model has a Table called user with a primary key of id and a unique key of username.
For your second example, it seems you're modelling a Situations Vacant job board. The relationship between Employer and Job is one to many, that is an employer can advertise many jobs. So the Job table has its own primary key, id, plus a foreign key employer_id which references the primary key of the Employer table. This means we can find all the jobs for a specific employer. But because the job table has its own primary key we can identify each job so that we can distinguish the Janitor job at Harrisons Pharmaceuticals from the Janitor job at Ravi's Cash'n'Carry. (Which incidentally shows why we need technical keys: imagine if the employer_id foreign key was a string of varchar2(128) for each row on the Job table. )
In a Logical Data Model the employer_id would be implied on the Entity job by a link to the Entity employee but would be shown (actually it might, it depends on the tool). In the physical model the column must be added to the dependent table, because database constraints (and joins!) physically need the column in order to work.
So we can see that Normalisation applies to the representation and storage of business data but the practicalities of database engines mean that we need additional columns to manage the data.

Creating PostgreSQL tables + relationships - PROBLEMS with relationships - ONE TO ONE

So I am supposed to create this schema + relationships exactly the way this ERD depicts it. Here I only show the tables that I am having problems with:
So I am trying to make it one to one but for some reason, no matter what I change, I get one to many on whatever table has the foreign key.
This is my sql for these two tables.
CREATE TABLE lab4.factory(
factory_id INTEGER UNIQUE,
address VARCHAR(100) NOT NULL,
PRIMARY KEY ( factory_id )
);
CREATE TABLE lab4.employee(
employee_id INTEGER UNIQUE,
employee_name VARCHAR(100) NOT NULL,
factory_id INTEGER REFERENCES lab4.factory(factory_id),
PRIMARY KEY ( employee_id )
);
Here I get the same thing. I am not getting the one to one relationship but one to many. Invoiceline is a weak entity.
And here is my code for the second image.
CREATE TABLE lab4.product(
product_id INTEGER PRIMARY KEY,
product_name INTEGER NOT NULL
);
CREATE TABLE lab4.invoiceLine(
line_number INTEGER NOT NULL,
quantity INTEGER NOT NULL,
curr_price INTEGER NOT NULL,
inv_no INTEGER REFERENCES invoice,
product_id INTEGER REFERENCES lab4.product(product_id),
PRIMARY KEY ( inv_no, line_number )
);
I would appreciate any help. Thanks.
One-to-one isn't well represented as a first-class relationship type in standard SQL. Much like many-to-many, which is achieved using a connector table and two one-to-many relationships, there's no true "one to one" in SQL.
There are a couple of options:
Create an ordinary foreign key constraint ("one to many" style) and then add a UNIQUE constraint on the referring FK column. This means that no more than one of the referred-to values may appear in the referring column, making it one-to-one optional. This is a fairly simple and quite forgiving approach that works well.
Use a normal FK relationship that could model 1:m, and let your app ensure it's only ever 1:1 in practice. I do not recommend this, there's only a small write performance downside to adding the FK unique index and it helps ensure data validity, find app bugs, and avoid confusing someone else who needs to modify the schema later.
Create reciprocal foreign keys - possible only if your database supports deferrable foreign key constraints. This is a bit more complex to code, but allows you to implement one-to-one mandatory relationships. Each entity has a foreign key reference to the others' PK in a unique column. One or both of the constraints must be DEFERRABLE and either INITIALLY DEFERRED or used with a SET CONSTRAINTS call, since you must defer one of the constraint checks to set up the circular dependency. This is a fairly advanced technique that is not necessary for the vast majority of applications.
Use pre-commit triggers if your database supports them, so you can verify that when entity A is inserted exactly one entity B is also inserted and vice versa, with corresponding checks for updates and deletes. This can be slow and is usually unnecessary, plus many database systems don't support pre-commit triggers.

Can a foreign key have a constant instead of a field name? Relate FK to STI subclass

Setup
So here's a scenario which I'm finding is rather common once you decide to play with STI (Single Table Inheritance).
You have some base type with various subtypes.
Person < (Teacher,Student,Staff,etc)
User < (Member,Admin)
Member < (Buyer,Seller)
Vehicle < (Car,Boat,Plane)
etc.
There are two major approaches to modelling that in the database:
Single Table Inheritance
One big table with a type field and a bunch of nullable fields
Class Table Inheritance
One table per type with shared PK (FK'd from the children to the parent)
While there are several issues with STI, I do like how it manages to cut down on the number of joins you have to make, as well as some of the support in frameworks like Rails, but I am running into an issue on how to relate subclass-specific tables.
For example:
Certifications should only reference Teacher-Persons
Profiles should only reference Member-Users
WingInformation should be not be related to a car or boat (unless you are Batman maybe)
Advertisements are owned by Seller-Members not Buyer-Members
With CTI, these relationships are trivial - just slap a Foreign Key on the related table and you're done:
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id) REFERENCES sellers (id)
But with STI, the similar thing wouldn't capture the subtype restriction.
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id) REFERENCES members (id)
What I would like to see is something like:
* Does not work in most (all?) databases *
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id, 'seller') REFERENCES members (id, type)
All I have been able to find is a dirty hack requiring adding a computed column to the related table:
ALTER TABLE advertisements
ADD seller_type VARCHAR(20) NOT NULL DEFAULT 'seller'
ALTER TABLE advertisements
FOREIGN KEY (seller_id, seller-type) REFERENCES members (id, type)
This strikes me as odd (not to mention inelegant).
The real questions
Is there a RDBMS out there which will allow me to do this?
Is there a reason why this isn't even possible?
Is this just one more reason why NOT to use STI except in the most trivial of cases?
There's no standard way to declare a constant in the foreign key declaration. You have to name columns.
But you could force the column to have a fixed value, using one of the following methods:
Computed column
CHECK constraint
Trigger before INSERT/UPDATE to overwrite any user-supplied value with the default value.

database design pattern: many to many relationship across tables?

I have the following tables:
Section and Content
And I want to relate them.
My current approach is the following table:
In which I would store
Section to Section
Section to Content
Content to Section
Content to Content
Now, while I clearly can do that by adding a pair of fields that indicate whether the source is a section or a content, and whether the target is a section or a content, I'd like to know if there's a cleaner way to do this. and if possible using just one table for the relationship, which would be the cleanest in my opinion. I'd also like the table to be somehow related to the Section and Content tables so I can avoid manually adding constraints, or triggers that delete the relationships when a Section or Content is deleted...
Thanks as usual for the input! <3
Here's how I would design it:
CREATE TABLE Pairables (
PairableID INT IDENTITY PRIMARY KEY,
...other columns common to both Section and Content...
);
CREATE TABLE Sections (
SectionID INT PRIMARY KEY,
...other columns specific to sections...
FOREIGN KEY (SectionID) REFERENCES Pairables(PairableID)
);
CREATE TABLE Contents (
ContentID INT PRIMARY KEY,
...other columns specific to contents...
FOREIGN KEY (ContentID) REFERENCES Pairables(PairableID)
);
CREATE TABLE Pairs (
PairID INT NOT NULL,
PairableId INT NOT NULL,
IsSource BIT NOT NULL,
PRIMARY KEY (PairID, PairableID),
FOREIGN KEY (PairableID) REFERENCES Pairables(PairableID)
);
You would insert two rows in Pairs for each pair.
Now it's easy to search for either type of pairable entity, you can search for either source or target in the same column, and you still only need one many-to-many intersection table.
Yes, there is a much cleaner way to do this:
one table tracks the relations from Section to Section and enforces them as foreign key constraints
one table tracks the relations from Section to Content and enforces them as foreign key constraints
one table tracks the relations from Content to Section and enforces them as foreign key constraints
one table tracks the relations from Content to Content and enforces them as foreign key constraints
This is much cleaner than a single table with overloaded IDs that cannot be enforced by foreign key constraints. The fact that the data modeling, nor the domain modeling patterns, never mention a pattern like the one you describe should be your first alarm bell. The second alarm should be that the engine cannot enforce the constraints you envision and you have to dwell into triggers.
Having four distinct relationships modeled in one table brings no elegance to the model, it only adds obfuscation. Relational model is not C++: it has no inheritance, it has no polymorphism, it has no overloading. Trying to enforce a OO mind set into data modeling has led many a fine developers into a mud of unmaintainable trigger mesh of on-disk table-like bits vaguely resembling 'data'.

How to design a database schema to support tagging with categories?

I am trying to so something like Database Design for Tagging, except each of my tags are grouped into categories.
For example, let's say I have a database about vehicles. Let's say we actually don't know very much about vehicles, so we can't specify the columns all vehicles will have. Therefore we shall "tag" vehicles with information.
1. manufacture: Mercedes
model: SLK32 AMG
convertible: hardtop
2. manufacture: Ford
model: GT90
production phase: prototype
3. manufacture: Mazda
model: MX-5
convertible: softtop
Now as you can see all cars are tagged with their manufacture and model, but the other categories don't all match. Note that a car can only have one of each category. IE. A car can only have one manufacturer.
I want to design a database to support a search for all Mercedes, or to be able to list all manufactures.
My current design is something like this:
vehicles
int vid
String vin
vehicleTags
int vid
int tid
tags
int tid
String tag
int cid
categories
int cid
String category
I have all the right primary and foreign keys in place, except I can't handle the case where each car can only have one manufacturer. Or can I?
Can I add a foreign key constraint to the composite primary key in vehicleTags? IE. Could I add a constraint such that the composite primary key (vid, tid) can only be added to vehicleTags only if there isn't already a row in vehicleTags such that for the same vid, there isn't already a tid in the with the same cid?
My guess is no. I think the solution to this problem is add a cid column to vehicleTags, and make the new composite primary key (vid, cid). It would look like:
vehicleTags
int vid
int cid
int tid
This would prevent a car from having two manufacturers, but now I have duplicated the information that tid is in cid.
What should my schema be?
Tom noticed this problem in my database schema in my previous question, How do you do many to many table outer joins?
EDIT
I know that in the example manufacture should really be a column in the vehicle table, but let's say you can't do that. The example is just an example.
This is yet another variation on the Entity-Attribute-Value design.
A more recognizable EAV table looks like the following:
CREATE TABLE vehicleEAV (
vid INTEGER,
attr_name VARCHAR(20),
attr_value VARCHAR(100),
PRIMARY KEY (vid, attr_name),
FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.
What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER,
tid INTEGER,
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid),
FOREIGN KEY (tid) REFERENCES tags(tid)
);
CREATE TABLE categories (
cid INTEGER PRIMARY KEY,
category VARCHAR(20) -- "attr_name"
);
CREATE TABLE tags (
tid INTEGER PRIMARY KEY,
tag VARCHAR(100) -- "attr_value"
);
If you're going to use the EAV design, you only need the vehicleTags and categories tables.
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER, -- reference to "attr_name" lookup table
tag VARCHAR(100, -- "attr_value"
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid)
);
But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.
How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.
SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you have need to have "subtypes" then define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have an unlimited variation in the attributes per entity, then use Serialized LOB.
Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is RDF4J (formerly Sesame).
I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
What you describe are not tags, tags are only values, they do not have an associated key.
Tags are normally implemented as a string column, the value being a list of values delimited.
For example #1, a tag field would contain a value such as:
"manufacture_Mercedes,model_SLK32 AMG,convertible_hardtop"
The user then would normally be able to easily filter entries, by the existence of one or more tags. It is essentially schemaless data from a database perspective. There are downsides to tags, but they also avoid the extreme complications that come from using an EAV model. If you really need an EAV model, it also might be worth considering an attributes field, which contains JSON data. It's more painful to query, but still not as horrible as querying EAV across multiple tables.
I think your solution is to simply add a manufacturer column to your vehicles table. It's an attribute that you know all the vehicles will have (i.e. cars don't spontaneously appear by themselves) and by making it a column in your vehicle table you solve the issue of having one and only one manufacturer for each vehicle. This approach would apply to any attributes that you know will be shared by all vehicles. You can then implement the tagging system for the other attributes that aren't universal.
So taking from your example the vehicle table would be something like:
vehicle
vid
vin
make
model
One way would be to slightly rethink your schema, normalising tag keys away from values:
vehicles
int vid
string vin
tags
int tid
int cid
string key
categories
int cid
string category
vehicleTags
int vid
int tid
string value
Now all you need is a unique constraint on vehicleTags(vid, tid).
Alternatively, there are ways to create constraints beyond simple foreign keys: depending on your database, can you write a custom constraint or an insert/update trigger to enforce vehicle-tag uniqueness?
I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
These days, I almost never end up building a database-backed web app that doesn't involve a full-text indexer. This problem and the general issue of search just come up way too often to omit indexers from your toolbox.