using one to many over many to many sql - sql

I'm designing a database model where there are agents that can have customers.
From a best practice standpoint, I'd like to know what is best relationship to use.
The thing is, a customer could be working with multiple agents. If we want to consider that the customer should be treated as if they are being worked from a different angle, is it best practice to design the model as a one to many relationship instead of a many to many?
In otherwords, if Agent A and Agent B are working with John Doe, should we treate John Doe as separate entities for each agent, even though the record of John Doe may be the same (think like contact details).

It sounds like you simply want a junction/association table with columns such as:
CustomerId
AgentId
You may also want dates, descriptions and other information describing the relationship being worked on.

You mean to have agent_customer table? Which has the auto increment ID as PK, agentID and customerID so one agent can have multiple customers and a customer can have multiple agents. Make sure to make agentID and customerID unique to avoid redundancy.

Related

Database Design for Contacts

So I have 3 main entities. Airports, Customers, and Vendor.
Each of these will have multiple contacts I need to relate to each.
So they way I set it up currently.
I have the following tables..
Airport
Customer
Vendor
I then have one Contacts table and a xref for Airport, Customer, Vendor...
I am questioning that and was thinking a contacts table for each ..
Airport and AirportContacts
Customer and CustomerContacts
Vendor and VendorContacts
Any drawbacks to either of these designs?
To me, the deciding factor is duplication of entities vs "one version of the truth". If a single real-world person can be a contact for more than one of the other entities, then you don't want to store that single person in multiple contact tables, because then you have to maintain any changes to his/her properties in multiple places.
If you put the same "Joe Smith" in both AirportContacts and VendorContacts, then one day when you look and see his city is "Denver" in one table and "Boston" in another table, which one will you consider to be the truth?
But as someone mentioned in comments, if a contact can only be associated with one of the three other entities ("types" as you call them), then putting them in separate tables makes the most sense.
And there's yet a third scenario. Say "Joe Smith" can be a contact for both Airports and Vendors. But say that he has some properties, like his gender and age, which are the same regardless of which "type" he is being considered, but there might be some properties, like phone number, or position/job title, which could depend on the "type". Maybe he uses one phone in his capacity as an Airport Vendor, and a different phone as a Vendor Contact. Moreover, maybe there are some properties that apply to one type of contact that don't apply to the others. In these cases, I would look at some kind of hybrid approach where you keep common properties in a single Contact table, and "Type"-specific properties in their own type-related tables. These type-related tables would be bridge tables that have FKs back to the Contact table and to the main entity table of the "Type" they are related to (Vendor, Customer or Airport).
What I have so far ... Dont mind some of the data types.. just placed quick placeholders..

Link table(s) or redundant columns, SQL optimisation

I have two tables, Users and People, both of which share a common attribute, email address, of which they should be allowed to have many email addresses.
I can see three options myself:
One link table with redundant columns:
Users [id,email_id] and People [id,email_id]
EmailAddress [id,user_id,person_id,email_id]
Emails [id,address,type]
Two link tables without redundancies:
Users [id,email_id] and People [id,email_id]
PersonEmail [id,person_id,email_id]
UserEmail [id,user_id,email_id]
Emails [id,address,type]
No link tables with redundant columns:
Users [id] and People [id]
Emails [id,address,type,user_id,person_id]
Does anyone have any idea what would be the best option, or if there is any other ways? Also, if anyone knows how to implement or feel it is better to have link tables without the generated id column please also specify.
Update: a User has many People, a person belongs to a User
First off, the relationship between user and e-mail is 1:N, not M:N, so in any case you don't need the "link" table EmailAddress.
You need to decide which of these possibilities is true for your application:
User is always person.
Person is always user.
There can be a person that is not user and there can be a user that is not person.
Option 1:
Assuming the option (1) is the correct one, the logical model should look like this:
The symbol between Person and User is "category", which at the level of the physical database can be implemented either:
as a "1 to 0 or 1" relationship between separate tables Person and User,
or a single table containing both person and user fields, where user fields are NULL for persons that are not also users.
If you have...
many user-specific fields,
there are user-specific foreign keys,
new kinds of persons could be added in the future
and you don't need to squeeze-out every last drop of performance,
...choose the implementation strategy with two tables.
If there are:
relatively few user-specific fields,
there are no user-specific relationships,
low "evolvability" is acceptable
and performance is of high importance,
...choose the implementation strategy with the single table.
Similar analysis can be done for each of the remaining possibilities...
Option 2:
Option 3:
If the two entities are conceptually related, then it might make sense to have one table. But if they are two different concepts, then in my experience it is best to have separate tables in order to avoid future confusion. And you're not going to take a big hit anywhere by doing so.
Isn't the User a Person (People)?
That would solve the redundant field issue right away.
----------
| Person |
----------
|
--------
| User |
--------
The User should have the single e-mail field, or mantain the relation with the e-mails table, since Person is an abstract concept not related to any application.
I would say start thinking about (re)modeling your schema, so you won't have problems like this.
Read the Multiple Table Inheritance in Rails guide, that should get you started.

Database design....storing relationship in one table, and the data in another table?

I'm looking at this company's database design, and would like to know the purpose of their design, ie store relationship in one table and the data in another, why do this?
They have this,
EMPLOYEE
Id (PK)
DepartmentId
EMPLOYEE_DATA
EmployeeId (PK)
First Name
Last Name
Position
etc...
Rather than this...
EMPLOYEE
Id (PK)
DepartmentId
First Name
Last Name
Position
etc...
...OR this...(employee can belong to many departments)
EMPLOYEE
Id (PK)
First Name
Last Name
etc...
EMPLOYEE_DEPARTMENT
Id
EmployeeId
DepartmentId
Position
That's a link table, or join table, or cross table.. lots of different names.
How would you assign an employee to two different departments with your design? You can't. You can only assign them to one.
With their design, they can assign the same ID to multiple departments by creating multiple records with the employee ID and different department ID's.
EDIT:
You need to be more specific about what you're asking. Your first question seemed to be asking what the purpose of mapping table was. Then you changed it, then you changed it again.. none of which makes much sense.
It seems now that you are asking what the better design is, which is a totally different question than what you originally asked. Please state specifically what question you want answered so we don't have to guess.
EDIT2:
Upon re-reading, if this is the actual design, then no.. It does not support multiple department id's. Such a design makes little sense, except for one situation. If the original design did not include a department, this would allow them to add a department ID without modifying the original EMPLOYEE_DATA table.
They may not have wanted to update legacy code to support the Employee id, so they added it this way.
Purpose of design is determined by business rules.
Business rules dictate entity (logical model perspective) / table (physical model perspective) design. No design is "bad" if it is built according to the requirements that were determined based on business rules. Those rules can however change over time -- foreseeing such changes and building to accommodate/future-proof the data model can really save time, effort and ultimately money.
The first and third example are the same -- the third example has an extraneous column (EMPLOYEE_DEPARTMENT.id). ORMs don't like composite keys, so a single column is used in place of.
The second example is fine if:
employees will never work for more than one department
there's no need for historical department tracking
Conclusion:
The first/third example is more realistic for the majority of real-world situations, and can be easily customized to provide more value without major impact (re-writing entire table structure). It uses what is most commonly referred to as a many-to-many relationship to allow many employees to relate to many departments.
If an employee can be in more than one department, then you would need a mapping table but I'd do it like the following:
EMPLOYEE
Id (PK)
First Name
Last Name
DEPARTMENT
Id (PK)
Name
EMPLOYEE_DEPARTMENT
EmployeeId_fk (PK)
DepartmentId_fk (PK)
Position
This would allow for multiple positions in multiple departments.
You would do this if an employee can be a member of multiple departments. With the latter table, each employee can only belong to one department.
The only remotely good reason for doing this is to implement an extension model where the master table identifying unique customers does not include all the data for customers that is not always necessary. Instead, you create one core table with the core employee data and and extension table with all the supplementary fields. I've seen people take this approach to avoid creating large tables with many columns that are rarely needed. However, in my experience it's typically premature optimization, and I wouldn't recommend it.
In contrast to many responses, the model included does not support multiple departments per employee - it is not a many to many mapping approach.

What to do if 2 (or more) relationship tables would have the same name?

So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it more clear, say my site is an auction site (it isn't but the principle is similar). I have registered users and I have items, a user does not have to be registered to post an item but they do need to be to do anything else. I have table User which has info on registered users and Items which has info on posted items. Now a user can bid on an item, but they can also report a item (spam, etc.), both of these are M-M relationships. All that happens when either event occurs is that an email is generated, in my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid/reported on what.
I think you should name tables after their function. Lets say we have Cars and People tables. Car has owners and car has assigned drivers. Driver can have more than one car. One of the tables you could call CarsDrivers, second CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that report requires additional dictinary (spam, illegal item,...) and bid requires other parameters like price, bid date. So having two tables is justified. You will propably be more often accessing bids than reports. Sending email will be slightly more complicated then when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make alot of sense to an outside observer.
There's no official or unofficial convention on relations or tables names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
UserID
EmailAddress
Table Item
ItemID
ItemDescription
Table UserItem_SpamReport
UserID
ItemID
ReportDate
Table UserItem_Post
UserID -- can be (NULL, -1, '', ...)
ItemID
PostDate
Table UserItem_Bid
UserId
ItemId
BidDate
BidAmount
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could either put each role in a separate table that you name with the role (IE Stock_Issuer, Stock_Buyer etc, both have a relationship one - many to company - stock)
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
This also depends on the quality of the developers having to work with your model. The column approach is a bit more abstract... but if they don't get it maybe they'd better stay away from databases altogether..
Both will work fine I guess.
Good luck, GJ
GJ

Do these database design styles (or anti-pattern) have names?

Consider a database with tables Products and Employees. There is a new requirement to model current product managers, being the sole employee responsible for a product, noting that some products are simple or mature enough to require no product manager. That is, each product can have zero or one product manager.
Approach 1: alter table Product to add a new NULLable column product_manager_employee_ID so that a product with no product manager is modelled by the NULL value.
Approach 2: create a new table ProductManagers with non-NULLable columns product_ID and employee_ID, with a unique constraint on product_ID, so that a product with no product manager is modelled by the absence of a row in this table.
There are other approaches but these are the two I seem to encounter most often.
Assuming these are both legitimate design choices (as I'm inclined to believe) and merely represent differing styles, do they have names? I prefer approach 2 and find it hard to convey the difference in style to someone who prefers approach 1 without employing an actual example (as I have done here!) I'd would be nice if I could say, "I'm prefer the inclination-towards-6NF (or whatever) style myself."
Assuming one of these approaches is in fact an anti-pattern (as I merely suspect may be the case for approach 1 by modelling a relationship between two entities as an attribute of one of those entities) does this anti-pattern have a name?
Well the first is nothing more than a one-to-many relationship (one employee to many products). This is sometimes referred to as a O:M relationship (zero to many) because it's optional (not every product has a product manager). Also not every employee is a product manager so its optional on the other side too.
The second is a join table, usually used for a many-to-many relationship. But since one side is only one-to-one (each product is only in the table once) it's really just a convoluted one-to-many relationship.
Personally I prefer the first one but neither is wrong (or bad).
The second would be used for two reasons that come to mind.
You envision the possibility that a product will have more than one manager; or
You want to track the history of who the product manager is for a product. You do this with, say a current_flag column set to 'Y' (or similar) where only one at a time can be current. This is actually a pretty common pattern in database-centric applications.
It looks to me like the two model different behaviour. In the first example, you can have one product manager per product and one employee can be product manager for more than one product (one to many). The second appears to allow for more than one product manager per product (many to many). This would suggest the two solutions are equally valid in different situations and which one you use would depend on the business rule.
There is a flaw in the first approach. Imagine for a second, that the business requirements have changed and now you need to be able to set 2 Product Manager to a product. What will you do? Add another column to the table Product? Yuck. This obviously violates 1NF then.
Another option the second approach gives is an ability to store some attributes for a certain Product Manager <-> Product relation. Like, if you have two Product Manager for a product, then you can set one of them as a primary...
Or, for example, an employee can have a phone number, but as a product manager he/she can have another phone number... This also goes to the special table then.
Approach 1)
Slows down the use of the Product table with the additional Product Manager field (maybe not for all databases but for some).
Linking from the Product table to the Employee table is simple.
Approach 2)
Existing queries using the Product table are not affected.
Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table.
Linking from the Product table to the Employee table is more cumbersome and costly as you have to ink to the intermediate table first.
How often must you link between the two tables?
How many other queries use the Product table?
How many records in the Product table?
in the particular case you give, i think the main motivation for two tables is avoiding nulls for missing data and that's how i would characterise the two approaches.
there's a discussion of the pros and cons on wikipedia.
i am pretty sure that, given c date's dislike of this, he defines relational theory so that only the multiple table solution is "valid". for example, you could call the single table approach "poorly typed" (since the type of null is unclear - see quote on p4).