SQL schema many to many relationships - sql

I am working on a school project and am running into some trouble with my understanding of building SQL schema ER diagrams.
I have 4 entities; a customer table in a one to many relationship with my order table, an order table with a one to many relationship with my line item table, a line item table that is currently in a one to many relationship with a burger table.
The problem I'm running into is that my line item table has a composite primary key of order id, burger id and I believe that line item->burger should be in a many to many relationship because one or more line items can be associated with one or more burger and vice-a-versa but my understanding is that you do not want any many to many relationships in a normalized tabled. Are my assumptions that it's supposed to be many to many correct and why am I allowed to have a many to many relationship?
For reference my table is currently in 3NF form
Ignoring modality and assuming it's always a minimum of one, if my Burger entity has a one to many relationship with line item then I'll have the following
Which means that one burger has one or more line items associated with it, however my thought process tells me that this isn't the only case, and in fact one line item can have more than one burger but from everything im reading this is not the case. The composite keys have me really confused, and I do see why it may be in a burger->line item one->many due to the burger count field which pertains to only a single burger type (it holds the value of how many burgers are in the line item).
What I believe it should look like:
Edit:
Please ignore the customer and order tables, they were only included to provide a full view and I did notice that I forgot to include foreign keys just now.

I believe that line item->burger should be in a many to many
Why would a single line item be related to multiple burgers? It wouldn't. An order has many line items. Each line item refers to a single burger. The top diagram is correct.
Note that normally you sell things other than burgers, and you have line items that don't refer to a product, like taxes, discounts, etc. So often the PK of the items table is just (OrderId,OrderItemId), and the ProductID is a non-key attribute.

Related

Confused on SQL Relationships

I am confused in SQL Relationships, specially with One to One and One to Many. I have read many articles like this and this
For an example, below are some statements from those articles and some other articles
The relationship between customer and address tables are one to one, that is because one address is belong to one customer.
The relationship between customer and order tables are one to many, that is because one customer can make many orders.
customer and contactData tables are one to one because one contactData is belong to one customer
Yes, true, but take my argument now.
In point 2 it says the relationship is one to many, but one order is belong to exact one customer right? So you can consider it as one to one as well.
In point 3, contactData and customer relationship said to be one to one, but one customer can have many contactData right? So it is one to many
With all of these, I am seriously confused. Can someone give me a clear definition about how to find the relationship please?
The relationship between customer and order tables are one to many, that is because one customer can make many orders.
This should better read
a customer can place many orders
an order belongs to one customer
So it is a one to many relationship between the tables.
customer and contactData tables are one to one because one contactData is belong to one customer
This should better read
a contactData belongs to one customer
a customer can only have one contactData
So it is a one to one relationship between the tables.
Yes, in reality a customer of some company may have more than one contact, but in this particular database they are talking about, it is defined that a customer can have only one contactData.
EDIT: I should add that you can say "there is a one to many relationship between tables a and b", which doesn't specify which is which, so you can be more precise, saying "there is a one to many relationship from table a to b" or "there is a many to one relationship from table b to a". With one to one, you don't have such problem of course. (Adding to this, there are even more precise definitions, such as a "one to one or zero" etc. where you specify if zero records are allowed, such as a person without contact data.)
Point 2 is clearly one to many because one customer can have multiple orders but every order belongs to exactly one customer.
You have to read the relations in both directions.
Imagine this relation was one to one. You would need to create a new customer account every time you want to place a new order.
Point 3 is probably a design question. If you want to allow multiple contactData there is no reason to keep it one to one.
Dongle. You are making a mistake of looking at the subject symmetrically. Relationship is considered to be one-directional, that's why you felt confused.
Let's take for example this Customer <-> Order relation. Like you observed, customer -> order is something completely different than order -> customer. They are actually independent, however weird it may be. It's because in the real code you're never actually dealing with two objects at once, rather than "acting" from perspective one of the two. E.g. you fetched order object from the db and only then you're who is the customer that made this order. Or you have the customer object and you want to check what are his orders.
As you see, this is very uni-directional (we call relations uni- or bi-directional). You have object X and you want to "enhance" your knowledge of it, by joining it with the other table, object Y. So it's X->Y. Or Y->X. But never actually it is X<->Y. That's why you always have to look at the relations as if they were independent.
Apart from that, you're right and it usually is so, that one relation is One to One and the other One to Many.
Your "point 2" got answered by Rav.
"point 3": I found in one of your first links the one to one example with "Addresses" table. In this example a customer can have one address only. So if your customers can have more then one you will need one to many (as in the example Orders).

Database Design related to Categories and Subcategories

I have researched this to no end. I am not the only person who has asked this question... but I would like your thoughts regarding the best practice.
I'm trying to design a Database that will track financial transactions. For the sake of simplicity, each transaction can only have one Category, and each category can only have one Sub-category.
I have a self-referencing table, like this:
Table: Categories
ID, int, primary key
parentID, int, foreign key
description, text
Long story short, you end up with data like this:
1 Auto [null]
2 Bills [null]
3 Healthcare [null]
4 Maintenance 1
5 Gasoline 1
6 Cell Phone 2
7 Rent 2
8 Prescriptions 3
9 Dentist 3
So far, so good. Here is my problem:
I don't know the proper way I'm supposed to relate this all back to my Transactions table. 'Transactions' has a column for 'Category' and 'Subcategory'. Transaction.ID would be the PK, and Categories.ID would be the FK.
With Transactions related to Categories in the manner specified above, that means any value from Categories could be written to Category or Subcategory...
Is it my responsibility as the programmer to control access to the table via a form? In other words, is my only option 'programmatically controlling' what goes into the Category and Subcategory columns?
Remember, each Category can only have one Subcategory. The selected Category should only allow that Category's children...
Am I making sense?
GOOD: Auto -- Maintenance
BAD: Healthcare -- Gasoline
The case you pose is subset of the more general problem of encoding hierarchical data, tree structures, in relational tables. This case has been studied in great detail ever since relational databases first made the scene in the late 1970s.
In bookkeeping systems in particular, the idea of subcategories and categories comes up, every single time. Larger scale industrial systems tend to have a four level system, with overall account type (Expenses), Category (Transportation), Subcategory (Automotive), and sub-subcategory (Gasoline).
Your research might be more productive if you used the following search terms: "Tree structure in relational design". That search yielded the following Wikipedia summary:
http://en.wikipedia.org/wiki/Hierarchical_database_model
You can find lots of related questions and answers here in SO. Search under "nested sets" or "adjancency lists" for a couple of techniques.
Your problem is going to be to simplify the answers you will find down to the case where there are only two levels: category and subcategory.
I think whatever design you choose will want to make the following rule explicit: Subcategory determines category. and you will, IMO, want the DBMS to enforce this rule so that no transaction ends up with a subcategory that is inconsistent with its category.
So your categorizations are not orthogonal and independent (such as gender and city), but rather hierarchical (such as State and County).
In order to enforce a hierarchical categorizations, use a single categorization table with a ID column as primary key, referenced as a foreign key in the data table, and two descriptive fields Category and Subcategory.
To facilitate data entry you might supply a combo box Category which filters the available subcategories. However the actual foreign key reference is supplied by the selection made in the Subcategory combo box, which must list both fields, Category and Subcategory. It would be usual to concatenate these two fields with a delimiter such as dash (-) or pipe(|).

database design, items and orders tables

I was just after some input on database design. I have two tables, Orders and Items.
The items table is going to be a list of items that can be used on multiple orders, each item has an id
The way i thought to do it at the moment, was in the order to put an array of comma seperated ids for each item in the order.
does that sound like the best way?
also im using linq to entity framework and i dont think id be able to create a relationship between the tables, but i dont think one is needed anyway is there, since the items are not unique to an order
Thanks for any advice
The way I thought to do it at the moment, was in the order to put an array of comma separated ids for each item in the order. Does that sound like the best way?
Absolutely not - It will be MUCH more difficult in SQL to determine which orders contain a particular item, enumerate the items (to get a total, for example), and to add/remove items from an order.
A much better way would be to create an OrderItem table, which has a foreign key back to Order and Item and any other attributes relating to the item in that order - quantity, discount, comments, etc.
As far as EF goes, it will probably create a third entity (OrderItem) that will "link" the two tables. If you don't add any extra properties (which you probably should) then EF will probably create it as a many-to-many relationship between the Order and Item entities.
As far as I have understood from your question (it is not very clear), every Order can have multiple Items and every Item can be used in multiple orders. If this is what you want, you have a many to many relationship, that must be resolved using an intersection entity. This intersection entity has 2 foreign keys, one for item and one for order. Using it, you can identify what items are in a certain order and what orders need a certain item.
As my explanation is very short and very sloppy, I will recommend you the following references:
http://sd271.k12.id.us/lchs/faculty/bkeylon/Oracle/database_design/section5/dd_s05_l03.pdf
Resolve many to many relationship
Also, you proposed design is very bad, as it breaks the first normal form: no attribute can have multiple values. You shoud try to build databases at least in third normal form.
Regarding the database design, you would usually create a third table - ORDER_ITEMS - linking the two tables, containing columns (foreign keys) for order id and item id. You might also want to include a column for quantity.

Many to many relationship self join SQL Server 2008

Are there any hard and fast rules against creating a junction table out of a table's primary key? Say I have a table with a structure similar to:
In this instance there are a list of items that can be sold together, but should be marked as dangerous. Any one item can have multiple other items with which it is dangerous. All of the items are uniquely identified using their itemId. Is it OK to refer a table to itself in a many-to-many relationship? I've seen other examples on SO, but they weren't SQL specific really.
This is the correct design for your problem, as long as your combinations can only be two-item combos.
In database design a conceptual design that renders a relation in many-to-many form is converted into 2 one-to-many in physical design. Say for example a Student could take one or many courses and a course could have many students, so that's many to many. So, in actual design it would be a Student table, Course table then CourseTaken table that has both the primary key of Student and Course table thus creating 2 one to many relayionship. In your case altough the two tables are one and the same but you have the virtual third table to facilitate the 2 one to many relationship so that to me is still very viable approach.

Do these database design styles (or anti-pattern) have names?

Consider a database with tables Products and Employees. There is a new requirement to model current product managers, being the sole employee responsible for a product, noting that some products are simple or mature enough to require no product manager. That is, each product can have zero or one product manager.
Approach 1: alter table Product to add a new NULLable column product_manager_employee_ID so that a product with no product manager is modelled by the NULL value.
Approach 2: create a new table ProductManagers with non-NULLable columns product_ID and employee_ID, with a unique constraint on product_ID, so that a product with no product manager is modelled by the absence of a row in this table.
There are other approaches but these are the two I seem to encounter most often.
Assuming these are both legitimate design choices (as I'm inclined to believe) and merely represent differing styles, do they have names? I prefer approach 2 and find it hard to convey the difference in style to someone who prefers approach 1 without employing an actual example (as I have done here!) I'd would be nice if I could say, "I'm prefer the inclination-towards-6NF (or whatever) style myself."
Assuming one of these approaches is in fact an anti-pattern (as I merely suspect may be the case for approach 1 by modelling a relationship between two entities as an attribute of one of those entities) does this anti-pattern have a name?
Well the first is nothing more than a one-to-many relationship (one employee to many products). This is sometimes referred to as a O:M relationship (zero to many) because it's optional (not every product has a product manager). Also not every employee is a product manager so its optional on the other side too.
The second is a join table, usually used for a many-to-many relationship. But since one side is only one-to-one (each product is only in the table once) it's really just a convoluted one-to-many relationship.
Personally I prefer the first one but neither is wrong (or bad).
The second would be used for two reasons that come to mind.
You envision the possibility that a product will have more than one manager; or
You want to track the history of who the product manager is for a product. You do this with, say a current_flag column set to 'Y' (or similar) where only one at a time can be current. This is actually a pretty common pattern in database-centric applications.
It looks to me like the two model different behaviour. In the first example, you can have one product manager per product and one employee can be product manager for more than one product (one to many). The second appears to allow for more than one product manager per product (many to many). This would suggest the two solutions are equally valid in different situations and which one you use would depend on the business rule.
There is a flaw in the first approach. Imagine for a second, that the business requirements have changed and now you need to be able to set 2 Product Manager to a product. What will you do? Add another column to the table Product? Yuck. This obviously violates 1NF then.
Another option the second approach gives is an ability to store some attributes for a certain Product Manager <-> Product relation. Like, if you have two Product Manager for a product, then you can set one of them as a primary...
Or, for example, an employee can have a phone number, but as a product manager he/she can have another phone number... This also goes to the special table then.
Approach 1)
Slows down the use of the Product table with the additional Product Manager field (maybe not for all databases but for some).
Linking from the Product table to the Employee table is simple.
Approach 2)
Existing queries using the Product table are not affected.
Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table.
Linking from the Product table to the Employee table is more cumbersome and costly as you have to ink to the intermediate table first.
How often must you link between the two tables?
How many other queries use the Product table?
How many records in the Product table?
in the particular case you give, i think the main motivation for two tables is avoiding nulls for missing data and that's how i would characterise the two approaches.
there's a discussion of the pros and cons on wikipedia.
i am pretty sure that, given c date's dislike of this, he defines relational theory so that only the multiple table solution is "valid". for example, you could call the single table approach "poorly typed" (since the type of null is unclear - see quote on p4).