ERD--> sql transformation - sql

I am having issues with translating my ER diagram into tables. In a ternary relationship with weak entities, according to the requirements:
A supplier supplies certain number parts for a project
A project uses the parts from the different suppliers.
The same kind partsfrom different suppliers are used by different
projects.
There is a name for a supplier and the city where the supplier
locates.
There is a name, color, and weight for a part.
Do I create a fourth table for supplies containing: projectNO, supplierName, City, Partname, color and weight? 6 attributes that makes up the PK for that table?

I don't think your relationship between Project and Supplies is right. Similarly, your relationships between Supplies and each of Supplier and Part are all backwards.
The crow's foot goes at the many end of the relationship. Supplies should be the ternary relationship table that you're talking about. If you are using natural keys then all of the key columns from Project, Supplier and Part should appear in Supplies as both FK to their respective tables and all together as PK.
However, your natural keys look like things that could change (e.g. supplier moves cities, part changes colour or weight). I think you might want to consider using surrogate keys to avoid future update anomalies.

Related

Right way to design database

Country table
country_id,country_name
State table
state_id,state_name,state_country_id
City table
city_id,city_name,city_state_id
Product table
product_id,product_name,product_city_id
Is the above design good?
If i want to retrieve product country id then i have to write sub queries or inner join.
or can i modify product table as below?
Product table
product_id,product_name,product_city_id,product_state_id,product_country_id
Your suggestion violates 3NF (assuming your candidate keys are the obvious ones). 3NF states (given wikipedias definition, http://en.wikipedia.org/wiki/Normal_forms#Normal_forms) that:
"Every non-prime attribute is non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the description of the primary key are removed from the table. In other words, no transitive dependency is allowed."
In your proposed modification product_id is the candidate key, but product_country_id is transitively dependent of product_id
That said there are situations where you for practical reasons may choose to violate normal forms, but you should be aware of that you are doing that, and take extra precaution against update anomalies.
Side note: IMO it is also better to use standard identifiers such as country_code instead of country_id, see ISO 3166 ( http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3 )
The second design (single table) allows you to create a product record against a city that is in a totally unrelated state, i.e. Moscow, Hawaii.
Therefore the second design is bad (is not normalised)
You should not be avoiding using joins in a relational database. Thats the whole point.

Compound primary key table with subtypes

Me and a database architect were having argument over if a table with a compound primary key with subtypes made sense relationally and if it was a good practice.
Say we have two tables Employee and Project. We create a composite table Employee_Project with a composite primary key back to Employee and Project.
Is there a valid way for Employee_Project to have subtypes? Or can you think of any scenario where a composite key table can have subtypes?
To me a composite key relationship is a 'Is A' relationship (Employee_Project is a Employee and a Project). Subtypes are also a 'Is A' relationship. So if you have a composite key with a subtype its two 'Is A' relationships in one sentence which makes me believe this is a bad practice.
Employee-project is a bit hard, but one can imagine something like this -- although I'm not much of a chemist.
Or something like this, which would require different legal forms (fields) for single person ownership vs joint (time-share).
Or like this, providing that different forms are needed for full time and temp.
Employee projects have subtypes if the candidate subtypes are
not utterly different, but
not exactly alike
That means that
Every employee project has some
attributes (columns) in common. So they're not utterly different.
Some employee projects have different
attributes than others. So they're not exactly alike.
The determination has to do with common and distinct attributes. It doesn't have anything to do with the number of columns in a candidate key. Do you have employee projects that are not utterly different, but not exactly alike?
The most common business supertype/subtype example concerns organizations and individuals. They're not utterly different.
Both have addresses.
Both have phone numbers.
Both can be plaintiffs and defendants
in court.
But they're not exactly alike.
Individuals can go to college.
Organizations can have a CEO.
Individuals can get married.
Individuals can have children.
Organizations (in the USA) can be liquidated.
So you can express individuals and organizations as subtypes of a supertype called, say, "Parties". The attributes all the subtypes have in common relate to the supertype.
Parties have addresses.
Parties have phone numbers.
Parties can be plaintiffs and defendants
in court.
Again, this has to do with attributes that are held in common, and attributes that are distinct. It has nothing to do with the number of columns in a candidate key.
To me a composite key relationship is
a 'Is A' relationship
(Employee_Project is a Employee and a
Project).
Database designers don't think that way. We think in terms of a table's predicate.
If an employee can have many projects and a project can have many employees it is a many-to-many join that RDBM's can only represent easily in one way (the way you have outlined above.) You can see in the ER diagram below (employee / departments is one of the classic many-to-many examples) that it does not have a separate ER component. The separate table is a leaky abstraction of RDBMS's (which is probably why you are having a hard time modeling it).
http://www.library.cornell.edu/elicensestudy/dlfdeliverables/fallforum2003/ERD_final.doc
Bridge Entities
When an instance of an entity may be related to multiple instances of another entity and vice versa, that is called a “many-to-many relationship.” In the example below, a supplier may provide many different products, and each type of product may be offered by many suppliers:
While this relationship model is perfectly valid, it cannot be translated directly into a relational database design. In a relational database, relationships are expressed by keys in a table column that point to the correct instance in the related table. A many-to-many relationship does not allow this relationship expression, because each record in each table might have to point to multiple records in the other table.
http://users.csc.calpoly.edu/~jdalbey/205/Lectures/ERD_image004.gif
Here they do not event bother with a separate box although they add in later (at this step it is a 'pure' ER diagram). It can also be explicitly represented with a box and a diamond superimposed on each other.

Difference between one-to-many and many-to-one relationship

What is the real difference between one-to-many and many-to-one relationship? It is only reversed, kind of?
I can't find any 'good-and-easy-to-understand' tutorial about this topic other than this one: SQL for Beginners: Part 3 - Database Relationships
Yes, it is vice versa. It depends on which side of the relationship the entity is present on.
For example, if one department can employ several employees then department to employee is a one-to-many relationship (1 department employs many employees), while employee to department relationship is many-to-one (many employees work in one department).
More info on the relationship types:
Database Relationships - IBM DB2 documentation
From this page about Database Terminology
Most relations between tables are one-to-many.
Example:
One area can be the habitat of many readers.
One reader can have many subscriptions.
One newspaper can have many subscriptions.
A Many to One relation is the same as one-to-many, but from a different viewpoint.
Many readers live in one area.
Many subscriptions can be of one and the same reader.
Many subscriptions are for one and the same newspaper.
What is the real difference between one-to-many and many-to-one relationship?
There are conceptual differences between these terms that should help you visualize the data and also possible differences in the generated schema that should be fully understood. Mostly the difference is one of perspective though.
In a one-to-many relationship, the local table has one row that may be associated with many rows in another table. In the example from SQL for beginners, one Customer may be associated to many Orders.
In the opposite many-to-one relationship, the local table may have many rows that are associated with one row in another table. In our example, many Orders may be associated to one Customer. This conceptual difference is important for mental representation.
In addition, the schema which supports the relationship may be represented differently in the Customer and Order tables. For example, if the customer has columns id and name:
id,name
1,Bill Smith
2,Jim Kenshaw
Then for a Order to be associated with a Customer, many SQL implementations add to the Order table a column which stores the id of the associated Customer (in this schema customer_id:
id,date,amount,customer_id
10,20160620,12.34,1
11,20160620,7.58,1
12,20160621,158.01,2
In the above data rows, if we look at the customer_id id column, we see that Bill Smith (customer-id #1) has 2 orders associated with him: one for $12.34 and one for $7.58. Jim Kenshaw (customer-id #2) has only 1 order for $158.01.
What is important to realize is that typically the one-to-many relationship doesn't actually add any columns to the table that is the "one". The Customer has no extra columns which describe the relationship with Order. In fact the Customer might also have a one-to-many relationship with ShippingAddress and SalesCall tables and yet have no additional columns added to the Customer table.
However, for a many-to-one relationship to be described, often an id column is added to the "many" table which is a foreign-key to the "one" table -- in this case a customer_id column is added to the Order. To associated order #10 for $12.34 to Bill Smith, we assign the customer_id column to Bill Smith's id 1.
However, it is also possible for there to be another table that describes the Customer and Order relationship, so that no additional fields need to be added to the Order table. Instead of adding a customer_id field to the Order table, there could be Customer_Order table that contains keys for both the Customer and Order.
customer_id,order_id
1,10
1,11
2,12
In this case, the one-to-many and many-to-one is all conceptual since there are no schema changes between them. Which mechanism depends on your schema and SQL implementation.
Hope this helps.
SQL
In SQL, there is only one kind of relationship, it is called a Reference. (Your front end may do helpful or confusing things [such as in some of the Answers], but that is a different story.)
A Foreign Key in one table (the referencing table)
References
a Primary Key in another table (the referenced table)
In SQL terms, Bar references Foo
Not the other way around
CREATE TABLE Foo (
Foo CHAR(10) NOT NULL, -- primary key
Name CHAR(30) NOT NULL
CONSTRAINT PK -- constraint name
PRIMARY KEY (Foo) -- pk
)
CREATE TABLE Bar (
Bar CHAR(10) NOT NULL, -- primary key
Foo CHAR(10) NOT NULL, -- foreign key to Foo
Name CHAR(30) NOT NULL
CONSTRAINT PK -- constraint name
PRIMARY KEY (Bar), -- pk
CONSTRAINT Foo_HasMany_Bars -- constraint name
FOREIGN KEY (Foo) -- fk in (this) referencing table
REFERENCES Foo(Foo) -- pk in referenced table
)
Since Foo.Foo is a Primary Key, it is unique, there is only one row for any given value of Foo
Since Bar.Foo is a Reference, a Foreign Key, and there is no unique index on it, there can be many rows for any given value of Foo
Therefore the relation Foo::Bar is one-to-many
Now you can perceive (look at) the relation the other way around, Bar::Foo is many-to-one
But do not let that confuse you: for any one Bar row, there is just one Foo row that it References
In SQL, that is all we have. That is all that is necessary.
What is the real difference between one to many and many to one relationship?
There is only one relation, therefore there is no difference. Perception (from one "end" or the other "end") or reading it backwards, does not change the relation.
Cardinality
Cardinality is declared first in the data model, which means Logical and Physical (the intent), and then in the implementation (the intent realised).
One to zero-to-many
In SQL that (the above) is all that is required.
One to one-to-many
You need a Transaction to enforce the one in the Referencing table.
One to zero-to-one
You need in Bar:
CONSTRAINT AK -- constraint name
UNIQUE (Foo) -- unique column, which makes it an Alternate Key
One to one
You need a Transaction to enforce the one in the Referencing table.
Many-to-Many
There is no such thing at the Physical level (recall, there is only one type of relation in SQL).
At the early Logical levels during the modelling exercise, it is convenient to draw such a relation. Before the model gets close to implementation, it had better be elevated to using only things that can exist. Such a relation is resolved by implementing an Associative Table at the physical [DDL] level.
There is no difference. It's just a matter of language and preference as to which way round you state the relationship.
Answer to your first question is : both are similar,
Answer to your second question is: one-to-many --> a MAN(MAN table) may have more than one wife(WOMEN table) many-to-one --> more than one women have married one MAN.
Now if you want to relate this relation with two tables MAN and WOMEN, one MAN table row may have many relations with rows in the WOMEN table. hope it clear.
One-to-Many and Many-to-One are similar in Multiplicity but not Aspect (i.e. Directionality).
The mapping of Associations between entity classes and the Relationships between tables. There are two categories of Relationships:
Multiplicity (ER term: cardinality)
One-to-one relationships (abbreviated 1:1): Example Husband and Wife
One-to-Many relationships (abbreviated 1:N): Example Mother and Children
Many-to-Many relationships (abbreviated M:N): Example Student and Subject
Directionality : Not affect on mapping but makes difference on how we can access data.
Uni-directional relationships: A relationship field or property that refers to the other entity.
Bi-directional relationships: Each entity has a relationship field or property that refers to the other entity.
This is an excellent question, according to my experience, in ERD diagrams and relational databases direction is implied. In RDBMS you always define Many-To->One (trivial case One-To->One) relationships. The Many side of the relationship, a.k.a children, references the One side, a.k.a parent and you implement this with a Foreign Key constraint. Technically speaking you have to access an index, fetch the Primary Key record of the One side and then visit this record to get more information.
You cannot do this the other way around unless we are speaking about Object-Relational DBMS such as Postgres, Intersystems Cache, etc. These DBMS allow you to define a bi-directional relationship between the two entities (tables). In that case accessing records the other way around, i.e. One--To-->Many is achieved by using an array of references (children). In ORMs you have classes that reference each other the same way we described here.
WARNING: Most RDBMS in the IT market are NOT relational database management systems in the strict sense, think about null values, duplicate records etc, many of these allowed features break the definition of what a Relation is.
There's no practical difference. Just use the relationship which makes the most sense given the way you see your problem as Devendra illustrated.
One-to-many and Many-to-one relationship is talking about the same logical relationship, eg an Owner may have many Homes, but a Home can only have one Owner.
So in this example Owner is the One, and Homes are the Many.
Each Home always has an owner_id (eg the Foreign Key) as an extra column.
The difference in implementation between these two, is which table defines the relationship.
In One-to-Many, the Owner is where the relationship is defined. Eg, owner1.homes lists all the homes with owner1's owner_id
In Many-to-One, the Home is where the relationship is defined. Eg, home1.owner lists owner1's owner_id.
I dont actually know in what instance you would implement the many-to-one arrangement, because it seems a bit redundant as you already know the owner_id. Perhaps its related to cleanness of deletions and changes.
---One to Many--- A Parent can have two or more children.
---Many to one--- Those 3 children can have a single Parent
Both are similar. This can be used according to the need. If you want to find children for a particular parent, then you can go with One-To-Many. Or else, want to find parents for twins, you may go with Many-To-One.
The easiest explanation I can give for this relationship is by piggybacking on evendra D. Chavan'sanswer.
Using the department and employee relationship
A department can have multiple employees, so from the employee side, it's one-to-many relationship, and from the department side it's many-to-one relationship
But if an employee can also belong to more than one department, we can also say from the employee side it's now many as opposed to one, so the relationship becomes many-to-many
In order words, a simple understanding would be, we can state that a relationship is many-to-many if one-to-many can be viewed from both sides
that is if;
one employee can belong to many departments (one-to-many)
one department can have many employees (one-to-many)
I am new to SQL and only have experience using SQLAlchemy. The documentation on relationships in SQLAlchemy does a good job explaining this, in my opinion.
You may find some clarity by reading this part
Also, I had to come up with my own example to think through this. I'll try to explain without writing a bunch of code for simplicity.
table Vehicle
column (name)
table Manufacturer
column (name)
A Vehicle can only have One manufacturer (Ford, Tesla, BMW etc.)
Manufacturers can make many Vehicles
Ford
Ford makes Mustang
Ford makes F-150
Ford makes Focus
Tesla
Tesla makes Model S
Tesla makes Model X
Tesla makes Roadster
When looking at the database rows you will want to decide if you want a column that references the other side of the relationship. This is where the SQLAlchemy documentation brings in the difference between backref vs. back_populates. I understand that is the difference between having a column in the table to reference the other side of the relationship or not having a column to reference the other side.
I hope this helps, and even more so, I hope I am accurate in the way I learned and understand this.
I have read most of the answer. The problem is not the relationship here at all. If you look at One to Many or Many to One conceptually, it is just a reversible relationship. HOWEVER, while implementing the concept in your software or application it differs a lot.
In case of Many to One, we often desire the table that has Many aspect to be written first and we desire it to associate with the table containing One aspect. If you convert Many to One concept into One to Many, you will have hard time writing the One aspect table first in your code. Since, the relationship is defined while you engineer the database, Many aspect table will seek for the One aspect table data for integrity. So if you are planning to do it by using foreign key -> unique key or foreign key -> primary key, Many to One implementation will be different even if you consider it as a One to Many.
I personally make associations without using actual relationship concepts in many cases. There is no such boundaries as to use Database concept to form relationship every time. Just make sure that your database integrity is maintained as you want, it is indexed properly for your search needs and is decently normalized.
one-to-many has parent class contains n number of childrens so it is a collection mapping.
many-to-one has n number of childrens contains one parent so it is a object mapping

Do these database design styles (or anti-pattern) have names?

Consider a database with tables Products and Employees. There is a new requirement to model current product managers, being the sole employee responsible for a product, noting that some products are simple or mature enough to require no product manager. That is, each product can have zero or one product manager.
Approach 1: alter table Product to add a new NULLable column product_manager_employee_ID so that a product with no product manager is modelled by the NULL value.
Approach 2: create a new table ProductManagers with non-NULLable columns product_ID and employee_ID, with a unique constraint on product_ID, so that a product with no product manager is modelled by the absence of a row in this table.
There are other approaches but these are the two I seem to encounter most often.
Assuming these are both legitimate design choices (as I'm inclined to believe) and merely represent differing styles, do they have names? I prefer approach 2 and find it hard to convey the difference in style to someone who prefers approach 1 without employing an actual example (as I have done here!) I'd would be nice if I could say, "I'm prefer the inclination-towards-6NF (or whatever) style myself."
Assuming one of these approaches is in fact an anti-pattern (as I merely suspect may be the case for approach 1 by modelling a relationship between two entities as an attribute of one of those entities) does this anti-pattern have a name?
Well the first is nothing more than a one-to-many relationship (one employee to many products). This is sometimes referred to as a O:M relationship (zero to many) because it's optional (not every product has a product manager). Also not every employee is a product manager so its optional on the other side too.
The second is a join table, usually used for a many-to-many relationship. But since one side is only one-to-one (each product is only in the table once) it's really just a convoluted one-to-many relationship.
Personally I prefer the first one but neither is wrong (or bad).
The second would be used for two reasons that come to mind.
You envision the possibility that a product will have more than one manager; or
You want to track the history of who the product manager is for a product. You do this with, say a current_flag column set to 'Y' (or similar) where only one at a time can be current. This is actually a pretty common pattern in database-centric applications.
It looks to me like the two model different behaviour. In the first example, you can have one product manager per product and one employee can be product manager for more than one product (one to many). The second appears to allow for more than one product manager per product (many to many). This would suggest the two solutions are equally valid in different situations and which one you use would depend on the business rule.
There is a flaw in the first approach. Imagine for a second, that the business requirements have changed and now you need to be able to set 2 Product Manager to a product. What will you do? Add another column to the table Product? Yuck. This obviously violates 1NF then.
Another option the second approach gives is an ability to store some attributes for a certain Product Manager <-> Product relation. Like, if you have two Product Manager for a product, then you can set one of them as a primary...
Or, for example, an employee can have a phone number, but as a product manager he/she can have another phone number... This also goes to the special table then.
Approach 1)
Slows down the use of the Product table with the additional Product Manager field (maybe not for all databases but for some).
Linking from the Product table to the Employee table is simple.
Approach 2)
Existing queries using the Product table are not affected.
Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table.
Linking from the Product table to the Employee table is more cumbersome and costly as you have to ink to the intermediate table first.
How often must you link between the two tables?
How many other queries use the Product table?
How many records in the Product table?
in the particular case you give, i think the main motivation for two tables is avoiding nulls for missing data and that's how i would characterise the two approaches.
there's a discussion of the pros and cons on wikipedia.
i am pretty sure that, given c date's dislike of this, he defines relational theory so that only the multiple table solution is "valid". for example, you could call the single table approach "poorly typed" (since the type of null is unclear - see quote on p4).

"One-to-many" modeling question

What is the best way to model the following...
Assume I have two objects: Agency and Publisher, and both have a 1-to-n relationship to Employee. This is a true 1-to-n relationship, as each Employee can only work for one Agency or one Publisher. Let's assume further that I cannot introduce a
supertype (e.g. Employer) which holds the 1-to-n relationship.
My preferred solution is to have a foreign key in Employee that can either link to a primary key of Agency or Publisher (all my primary keys are 64-bit IDs that are unique across the database). However, now I won't be able to map a bi-directional association, without indicating in Employee whether this is an Agency or Publisher relationship.
My other option is to use two tables, AgencyEmployee and PublisherEmployee, which can then be linked as traditional 1-to-n bidirectional associations.
What do you consider best practice in this situation?
UPDATE: Thanks for the great responses in such short amount of time! What do you think of the following solution: Foreign keys in Employee for both Agency and Publisher, such as agency_id and publisher_id?
The best practice would probably be introducing an Employer class.
Dividing Employee into AgencyEmployee and PublisherEmployee, on the other hand, would be a Bad Thing™ to do.
If introducing Employer was out of the question, I would indicate the employer type (Agency or Publisher) in Employee.
Response to the updated question:
That is an option, but employer_type is better because:
You might want to add other employer types in the future
There will be an empty field in each row of the Employees table.
Sounds like this could be modeled as a ternary relationship. See if you like the approach taken in this MSDN article.
I have used this approach:
Employee.EmployerType ('AGENCY' or 'PUBLISHER')
Employee.EmployerID (AgencyID or PublisherID)
It works ok, but you need a case statement in a lot of your SQL. Also you if you are using an ORM you loose the FK relationship.
I am moving most of my code to the n:m relationships so that I can take advantage of our ORMs FK support.
Clearly an Employer entity would be preferred, but otherwise...
Having AgencyEmployee & PublisherEmployee tables means it will be difficult to determine if an employee belongs to one category or another.
So instead I would add and EmployerType column to the Employees table; it adds a little extra complexity to your subsequent SQL queries, but it works within the constraints you have given.
Out of interest, why would you be able to add AgencyEmployee & PublisherEmployee tables, but not add an Employer table? Just curious...
A +1 for Can Berk Güder.
Technically, introducing an Employer supertype means you introduce an Employer relation with all the attributes common to Agency and Publisher, an AgencySpecific (not sure how to call it) relation with the Employer key and all Agency-specific attributes and the Employer key as a foreign key, then Agency as a query or view that just joins AgencySpecific with Employer, and likewise for Publisher.
It may be a little cumbersome to have Agency and Publisher data spread across two tables that you have to join every time, but the alternative is to write most of your queries related to employees twice, once using Agency and once using Publisher, and then handcraft their unions to combine the results. (I haven't seen a query builder that helps out with basic stuff like this.) I've had to do it for a project that used the "no supertypes" approach and I don't want to do it again.
BTW ternary relationships and even n-to-n relationships aren't good design practice if you ask me. Using only functions (i.e. 1-to-n and 1-to-1) e.g. constraints become a lot easier to formulate.
My personal preference would be to have a table you expressly assume can not be used (the SuperType of Agency and Publisher; Employer)
A simple alternative would be to simply not enforce the foreign key constraint. This is not necessarily problematic, provided all modifications to the data structure is done through Stored Procedures which you control.
My final preference would be similar to your suggestion, of having one table for AgencyEmployees and one for PublisherEmployees. But I would separate this out one step further...
Table1: Agency (id, name, etc)
Table2: Publisher (id, name, etc)
Table3: Employee (id, name, etc)
Table4: AgencyEmployees (agency_id, employee_id)
Table5: PublisherEmployees (publisher_id, employee_id)
One reason I would do this is that it becomes trivial to make this time dependent. Employees start, leave and move around between companies. The following slight amendment to the above would allow you to track that...
Table1: Agency (id, name, etc)
Table2: Publisher (id, name, etc)
Table3: Employee (id, name, etc)
Table4: AgencyEmployees (agency_id, employee_id, start_date, leave_date)
Table5: PublisherEmployees (publisher_id, employee_id, start_date, leave_date)
This does not enforce 1 employer per employee at any one time, but that can be enforced by your stored procedures, GUI, etc...