SQL join to either table: best way or alternative design?

I am designing a database for a system and I came up with the following three tables.
My problem is that an Address can belong to either a Person or a Company (or other things in the future). So how do I model this?
I discarded putting the address information in both tables (Person and Company) because it would be repeated.
I thought of adding two columns (PersonId and CompanyId) to the Address table and keeping one of them null, but then I would need to add one column for every future relationship like this that appears (for example, an asset can have an address where it is located).
The last option that occurred to me was to create two columns, one called Type and the other Id, so that a pair of values would represent a single record in the target table, for example Type=Person,Id=5 and Type=Company,Id=9. This way I can join the right table using the type, and it will only be two columns no matter how many tables relate to this table. But then I cannot have foreign key constraints, which reduces data integrity.
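To make the integrity concern concrete, here is a minimal sketch using SQLite through Python's `sqlite3` module as a stand-in for the real database. The column names `OwnerType` and `OwnerId` and the sample rows are hypothetical. Because `OwnerId` points into a different table depending on `OwnerType`, no `FOREIGN KEY` can be declared on it, so a dangling reference is accepted and can only be found afterwards with one outer join per target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person  (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Company (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Hypothetical "Type + Id" design: OwnerType says which table OwnerId points into.
CREATE TABLE Address (
    Id        INTEGER PRIMARY KEY,
    Street    TEXT NOT NULL,
    OwnerType TEXT NOT NULL CHECK (OwnerType IN ('Person', 'Company')),
    OwnerId   INTEGER NOT NULL  -- no REFERENCES clause is possible here
);
INSERT INTO Person  (Id, Name) VALUES (5, 'Ann');
INSERT INTO Company (Id, Name) VALUES (9, 'Acme');
""")

conn.execute("INSERT INTO Address (Street, OwnerType, OwnerId) VALUES ('1 Main St', 'Person', 5)")
# Nothing stops a reference to a person that does not exist:
conn.execute("INSERT INTO Address (Street, OwnerType, OwnerId) VALUES ('2 Side St', 'Person', 999)")

# Orphans can only be detected after the fact, with one outer join per target table.
orphans = conn.execute("""
SELECT a.Street
FROM Address a
LEFT JOIN Person  p ON a.OwnerType = 'Person'  AND p.Id = a.OwnerId
LEFT JOIN Company c ON a.OwnerType = 'Company' AND c.Id = a.OwnerId
WHERE p.Id IS NULL AND c.Id IS NULL
""").fetchall()
print(orphans)  # [('2 Side St',)]
```

Every new owner table (an Asset, say) would need another `LEFT JOIN` in that check.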
I don't know if I am designing this properly. I think this should be a common issue (I've faced it at least three times during this small design, in objects like contact information, etc.), but I could not find much information or many examples that resemble mine.
Thanks for any guidance that you can give me.

There are several basic approaches you could take, depending on how much you want to future-proof your system.
In general, has-one relationships are modeled by a foreign key on the owning entity, pointing to the primary key of the owned entity. So you would have an AddressId on both Company and Person, which would be a foreign key to Address.Id. The complexity in your case is how to handle the fact that a person can have multiple addresses. If you are 100% sure that there will only ever be a home and a work address, you could put two foreign key columns on Person, but this becomes a big problem if there's a third, fourth, fifth, etc. address. The other option is to create a join table, PersonAddress, with three columns: a PersonId, an AddressId and an AddressType, to indicate whether it's a home, work or other address.
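A minimal sketch of the join-table option, using SQLite through Python's `sqlite3` module with an in-memory database as a stand-in for SQL Server (so the DDL is SQLite syntax). The table and column names come from the answer; the sample rows are made up. A CompanyAddress table would have the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person  (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Address (Id INTEGER PRIMARY KEY, Street TEXT NOT NULL);
-- One row per (person, address, type); a new type is just a new value, not a new column.
CREATE TABLE PersonAddress (
    PersonId    INTEGER NOT NULL REFERENCES Person(Id),
    AddressId   INTEGER NOT NULL REFERENCES Address(Id),
    AddressType TEXT    NOT NULL,
    PRIMARY KEY (PersonId, AddressId, AddressType)
);
INSERT INTO Person  VALUES (1, 'Ann');
INSERT INTO Address VALUES (10, '1 Main St'), (11, '99 Office Park');
INSERT INTO PersonAddress VALUES (1, 10, 'Home'), (1, 11, 'Work');
""")

rows = conn.execute("""
SELECT pa.AddressType, a.Street
FROM PersonAddress pa JOIN Address a ON a.Id = pa.AddressId
WHERE pa.PersonId = ?
ORDER BY pa.AddressType
""", (1,)).fetchall()
print(rows)  # [('Home', '1 Main St'), ('Work', '99 Office Park')]

# Unlike a Type/Id pair, the foreign key rejects an address that does not exist.
try:
    conn.execute("INSERT INTO PersonAddress VALUES (1, 42, 'Vacation')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```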


Adding an artificial primary key versus using a unique field

Recently I inherited a huge app from somebody who left the company.
This app used a SQL Server DB.
The developer always defined an int-based primary key on tables. For example, even if the Users table has a unique UserName field, he always added an integer identity primary key.
This is done for every table, no matter whether other fields could be unique and define the primary key.
Do you see any benefits whatsoever in this? Using UserName as the primary key vs adding UserID (an identity column) and setting that as the primary key?
I feel like I have to add another element to my comments, which have started to turn into an essay, so I think it is better that I post it all as an answer instead.
Sometimes there are domain specific reasons why a candidate key is not a good candidate for joins (maybe people change user names so often that the required cascades start causing performance problems). But another reason to add an ever-increasing surrogate is to make it the clustered index. A static and ever-increasing clustered index alleviates a high-cost IO operation known as a page split. So even with a good natural candidate key, it can be useful to add a surrogate and cluster on that. Read this for further details.
But if you add such a surrogate, recognise that the surrogate is purely internal, it is there for performance reasons only. It does not guarantee the integrity of your data. It has no meaning in the model, unless it becomes part of the model. For example, if you are generating invoice numbers as an identity column, and sending those values out into the real world (on invoice documents/emails/etc), then it's not a surrogate, it's part of the model. It can be meaningfully referenced by the customer who received the invoice, for example.
One final thing that is typically left out of this discussion is one particular aspect of join performance. It is often said that the primary key should also be narrow, because it can make joins more performant, as well as reducing the size of non-clustered indexes. And that's true.
But a natural primary key can eliminate the need for a join in the first place.
Let's put all this together with an example:
create table Countries
(
    countryCode char(2) not null primary key clustered,
    countryName varchar(64) not null
);
insert Countries values
    ('AU', 'Australia'),
    ('FR', 'France');
create table TourLocations
(
    tourLocationName varchar(64) not null,
    tourLocationId int identity(1,1) unique clustered,
    countryCode char(2) not null foreign key references Countries(countryCode),
    primary key (countryCode, tourLocationName)
);
insert TourLocations (tourLocationName, countryCode) values
    ('Bondi Beach', 'AU'),
    ('Eiffel Tower', 'FR');
I did not add a surrogate key to Countries, because there aren't many rows and we're not going to be constantly inserting new rows. I already know what all the countries are, and they don't change very often.
On the TourLocations table I have added an identity and clustered on it. There could be very many tour locations, changing all the time.
But I still must have a natural key on TourLocations. Otherwise I could insert the same tour location name with the same country twice. Sure, the IDs will be different. But the IDs don't mean anything. As far as any real human is concerned, two tour locations with the same name and country code are completely indistinguishable. Do you intend to have actual users using the system? Then you've got a problem.
By putting the same country and location name in twice I haven't created two facts in my database. I have created the same fact twice! No good. The natural key is necessary. In this sense, The Impaler's answer is strictly, necessarily, wrong. You cannot not have a natural key. If the natural key can't be defined as anything other than "every meaningful column in the table" (that is to say, excluding the surrogate), so be it.
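A minimal sketch of why the natural key is still needed, using SQLite through Python's `sqlite3` module. SQLite only auto-numbers an `INTEGER PRIMARY KEY` column, so unlike the T-SQL above the surrogate is declared as the primary key and (countryCode, tourLocationName) is enforced with a `UNIQUE` constraint; either way, both keys are enforced. The `add_location` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Countries (
    countryCode TEXT NOT NULL PRIMARY KEY,
    countryName TEXT NOT NULL
);
INSERT INTO Countries VALUES ('AU', 'Australia'), ('FR', 'France');

-- Surrogate as the declared PK (SQLite limitation); natural key as UNIQUE.
CREATE TABLE TourLocations (
    tourLocationId   INTEGER PRIMARY KEY,
    tourLocationName TEXT NOT NULL,
    countryCode      TEXT NOT NULL REFERENCES Countries(countryCode),
    UNIQUE (countryCode, tourLocationName)
);
INSERT INTO TourLocations (tourLocationName, countryCode)
VALUES ('Bondi Beach', 'AU'), ('Eiffel Tower', 'FR');
""")

def add_location(name, country):
    """Try to insert a tour location; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO TourLocations (tourLocationName, countryCode) VALUES (?, ?)",
                (name, country),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(add_location("Bondi Beach", "AU"))  # False: the same fact, rejected
print(add_location("Uluru", "AU"))        # True: a new fact
```

Without the `UNIQUE` constraint, the first call would succeed and silently store the same fact twice under a new tourLocationId.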
OK, now let's investigate the claim that an int identity key is advantageous because it helps with joins. Well, in this case my char(2) country code is narrower than an int would have been.
But even if it weren't (maybe we think we can get away with a tinyint), those country codes are meaningful to real people, which means a lot of the time I don't have to do the join at all.
Suppose I gave the results of this query to my users:
select countryCode, tourLocationName
from TourLocations
order by 1, 2;
Very many people will not need me to provide the countries.countryName column for them to know which country is represented by the code in each of those rows. I don't have to do the join.
When you're dealing with a specific business domain that becomes even more likely. Meaningful codes are understood by the domain users. They often don't need to see the long description columns from the key table. So in many cases no join is required to give the users all of the information they need.
If I had foreign keyed to an identity surrogate I would have to do the join, because the identity surrogate doesn't mean anything to anyone.
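A minimal sketch of the join-avoidance point, using SQLite through Python's `sqlite3` module. Variant B, with a surrogate countryId, is hypothetical and exists only for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: the child table carries the natural key.
CREATE TABLE Countries (countryCode TEXT PRIMARY KEY, countryName TEXT NOT NULL);
INSERT INTO Countries VALUES ('AU', 'Australia'), ('FR', 'France');
CREATE TABLE TourLocationsNatural (
    tourLocationName TEXT NOT NULL,
    countryCode      TEXT NOT NULL REFERENCES Countries(countryCode)
);
INSERT INTO TourLocationsNatural VALUES ('Bondi Beach', 'AU'), ('Eiffel Tower', 'FR');

-- Variant B (hypothetical): countries get a surrogate and children point at it.
CREATE TABLE CountriesSurrogate (countryId INTEGER PRIMARY KEY, countryCode TEXT UNIQUE NOT NULL);
INSERT INTO CountriesSurrogate (countryCode) VALUES ('AU'), ('FR');
CREATE TABLE TourLocationsSurrogate (
    tourLocationName TEXT NOT NULL,
    countryId        INTEGER NOT NULL REFERENCES CountriesSurrogate(countryId)
);
INSERT INTO TourLocationsSurrogate VALUES ('Bondi Beach', 1), ('Eiffel Tower', 2);
""")

# A: the codes are readable as they are, no join needed.
natural = conn.execute(
    "SELECT countryCode, tourLocationName FROM TourLocationsNatural ORDER BY 1, 2"
).fetchall()

# B: without a join the user only sees meaningless numbers...
raw = conn.execute(
    "SELECT countryId, tourLocationName FROM TourLocationsSurrogate ORDER BY 1, 2"
).fetchall()
# ...so producing the same output requires joining back to the key table.
joined = conn.execute("""
SELECT c.countryCode, t.tourLocationName
FROM TourLocationsSurrogate t JOIN CountriesSurrogate c ON c.countryId = t.countryId
ORDER BY 1, 2
""").fetchall()

print(natural)  # [('AU', 'Bondi Beach'), ('FR', 'Eiffel Tower')]
print(raw)      # [(1, 'Bondi Beach'), (2, 'Eiffel Tower')]
```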
You are talking about the difference between synthetic and natural keys.
In my [very] personal opinion, I would recommend always using synthetic keys (and always calling them id). The main problem is that natural keys are never unique; they are unique in theory, yes, but in the real world there are myriad unexpected and inexorable events that will make this false.
In database design:
Natural keys correspond to values present in the domain model. For example, UserName, SSN and VIN can be considered natural keys.
Synthetic keys are values not present in the domain model. They are just numeric/string/UUID values that have no relationship with the actual data. They only serve as unique identifiers for the rows.
I would say, stick to synthetic keys and sleep well at night. You never know what the Marketing Department will come up with on Monday, and suddenly "the username is not unique anymore".
Yes, having a dedicated int is a good thing for PK use.
You may have multiple alternate keys; that's OK too.
Two great reasons for it:
it is performant
it protects against key mutation (editing a name, etc.)
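To see what key mutation costs, here is a minimal sketch using SQLite through Python's `sqlite3` module; the tables UsersN/CommentsN (natural key) and UsersS/CommentsS (surrogate key) are hypothetical. Fixing a misspelled user name forces `ON UPDATE CASCADE` to rewrite every child row in the natural-key design, while the surrogate-key design changes one row and leaves the children untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade to fire
conn.executescript("""
-- Natural key: comments store the user name itself.
CREATE TABLE UsersN (username TEXT PRIMARY KEY);
CREATE TABLE CommentsN (
    commentId INTEGER PRIMARY KEY,
    username  TEXT NOT NULL REFERENCES UsersN(username) ON UPDATE CASCADE
);
-- Surrogate key: comments store a meaningless integer.
CREATE TABLE UsersS (userId INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL);
CREATE TABLE CommentsS (
    commentId INTEGER PRIMARY KEY,
    userId    INTEGER NOT NULL REFERENCES UsersS(userId)
);
INSERT INTO UsersN VALUES ('jsmiht');
INSERT INTO CommentsN (username) VALUES ('jsmiht'), ('jsmiht'), ('jsmiht');
INSERT INTO UsersS (username) VALUES ('jsmiht');
INSERT INTO CommentsS (userId) VALUES (1), (1), (1);
""")

# Fix the misspelled name in both designs.
conn.execute("UPDATE UsersN SET username = 'jsmith' WHERE username = 'jsmiht'")
conn.execute("UPDATE UsersS SET username = 'jsmith' WHERE userId = 1")

# Natural key: the cascade rewrote every child row.
natural_children = conn.execute("SELECT DISTINCT username FROM CommentsN").fetchall()
# Surrogate key: the child rows were not touched at all.
surrogate_children = conn.execute("SELECT DISTINCT userId FROM CommentsS").fetchall()
print(natural_children, surrogate_children)  # [('jsmith',)] [(1,)]
```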
A username or any such unique field that holds meaningful data is subject to change. A name may have been misspelled, or you might want to edit a name to choose a better one, etc.
Primary keys are used to identify records and, in conjunction with foreign keys, to connect records in different tables. They should never change. Therefore, it is better to use a meaningless int field as primary key.
By meaningless I mean that apart from being the primary key it has no meaning to the users.
An int identity column has other advantages over a text field as primary key:
It is generated by the database engine and is guaranteed to be unique in multi-user scenarios.
Comparing and joining on it is faster than on a text column.
Text can have leading spaces, hidden characters and other oddities.
There are multiple kinds of text data types, multiple character sets and culture-dependent behaviors, so text comparisons do not always work as expected.
int primary keys generated in ascending order perform better in conjunction with clustered primary keys (which are a SQL Server specialty).
Note that I am talking from a database point of view. In the user interface, users will prefer identifying entries by name, e-mail address, etc.
But commands like SELECT, INSERT, UPDATE or DELETE issued by the application will always identify records by the primary key.
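The text-comparison point is easy to demonstrate. The sketch below uses SQLite through Python's `sqlite3` module, whose rules are only one example: SQL Server, for instance, ignores trailing spaces when comparing with `=` and takes case sensitivity from the collation, so the same pair of strings can compare differently on another engine. The `same` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def same(a, b, collation="BINARY"):
    """Ask SQLite whether two strings compare equal under a given collation."""
    # The collation name is interpolated; it comes from this script, never from users.
    return conn.execute(f"SELECT ? = ? COLLATE {collation}", (a, b)).fetchone()[0] == 1

print(same("Alice", "alice"))            # False: BINARY is case-sensitive
print(same("Alice", "alice", "NOCASE"))  # True: NOCASE folds ASCII letters
print(same("bob", "bob "))               # False: a trailing space matters in SQLite
print(same("ZOË", "zoë", "NOCASE"))      # False: NOCASE does not fold non-ASCII letters
```

An integer key has none of these ambiguities: 42 equals 42 on every engine and under every collation.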
This subject is quite a lot like Gulliver's Travels and the wars fought over which end of the egg you are supposed to crack open to eat it.
However, using the SAME "id" name for all tables, with autonumber? Yes, it is a LONG-established choice.
There are of course MANY different views on this subject, and many advantages and disadvantages.
Regardless of which choice one prefers (or even needs), this is a long-established concept in our industry. In fact, SharePoint lists use "ID" and autonumber by default. So does MS Access, and there are probably more that do this.
The simple concept?
You build your tables with the PK, and child tables with foreign keys.
At that point you set up the relationships between the tables.
Now, you might decide to add, say, an invoice number or whatever. Rules might mean that such an invoice number must not be duplicated.
But WHY do we care whether you have some "user" name, or some "invoice" number or whatever? Why should that fact affect your relational database model?
You mean I don't have a user name, or don't have an invoice number, and the whole database and its relationships don't work anymore? We don't care!
The concept of data, even required fields, or even a column having to be unique?
That has ZERO to do with a working relational data model.
And maybe you decide that the invoice number is not generated until, say, it is sent to the customer. So, the fact of some user name, invoice number or whatever? Don't care. You can have all kinds of business rules for those numbers, but they have ZERO to do with the fact that you designed a working relational data model based on so-called "surrogate" (sometimes called synthetic) keys.
So, once you build that data model, even with JUST the PK "id" and the FKs (foreign keys), you are NOW free to start adding columns and to define what type of data you are going to put in each table. But what you shove into each table has ZERO to do with that working relational data model. They are to be thought of as separate concepts.
So, if you have a user name, add that column to the table. If you don't want user names, remove the column. The data you store in the table has ZERO to do with the automatic PK ID you are using; it is not really any different than, say, which area of memory the computer is going to allocate to load that data. The basic data operations of the system depend only on having built a database with relationships that simply exist. And the data columns you add after having built those relationships are up to you, but they will not, and should not, affect the operation of the database and the relationships you built and set up. Not only are these two concepts separate, but keeping them separate frees the developer from having to worry about the part that maintains the relationships as opposed to the data columns you add to such tables to store user data.
I mean, in JSON or XML data we often have a master + child relationship. We don't care how that relationship is maintained, only that it exists.
Thus yes, all tables have that PK "ID". Even better? In code, you NEVER have to guess what the PK id is; it is always the same!
So, the data and columns you put and toss into a table? Those columns and data have zero to do with the PK id, and while it is the database generating that PK, it could just as well be a web service call to some monkeys living in a faraway jungle eating bananas, who give you a PK value based on how many bananas they have eaten. We just really don't care about that number; it is just an internal housekeeping number, one that we don't see or even care about in most code. And thus the number one rule for such automatic PK values?
You NEVER give that automatic PK number any meaning from a user or application point of view.
In summary:
Yes, using a PK called "id" for all tables is common, and in SharePoint and many other systems it is not only the default but is in fact required for such systems to operate.
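A small illustration of the "you never have to guess what the PK is" point, assuming every table follows the `id` convention. The `get_by_id` helper and the Customers/Invoices tables are hypothetical, and the sketch uses SQLite through Python's `sqlite3` module. Table names cannot be passed as query parameters, so the helper checks them against a whitelist before building the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Invoices (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customers(id),
    invoice_no  TEXT UNIQUE  -- business data; may stay NULL until the invoice is sent
);
INSERT INTO Customers (name) VALUES ('Contoso');
INSERT INTO Invoices (customer_id) VALUES (1);
""")

ALLOWED_TABLES = {"Customers", "Invoices"}

def get_by_id(table, row_id):
    """Fetch one row from any table, relying on every table having an `id` PK."""
    if table not in ALLOWED_TABLES:  # identifiers cannot be bound as parameters
        raise ValueError(f"unknown table {table!r}")
    return conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,)).fetchone()

print(get_by_id("Customers", 1))  # (1, 'Contoso')
print(get_by_id("Invoices", 1))   # (1, 1, None)
```

The invoice row is fully related to its customer even though `invoice_no` has not been assigned yet, which is the separation between the relational model and the business data that the answer describes.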
It's better to use userid. The user table is referenced by many other tables.
Each referencing table would contain the primary key of the user table as a foreign key.
It's better to use userid since it's an integer value:
it takes less space than string values of username, and
searches by the database engine would be faster.
user(userid, username, name)
comments(commentid, comment, userid) would be better than
comments(commentid, comment, username)

A different way to Model a portion of this ERD

I have a very simple table diagram for modeling my application. The problem is I am second-guessing the relationship between Vendor and VendorOrder. The VendorOrders table should store all vendor orders in the system. To get all orders for a certain apartment, you would just use the PK and FK relationship to gather that data. Is there anything I should improve in the overall design?
Diagram:
There are three things I see that you could improve:
Create an intersection table between your Apartment and Resident tables called ApartmentResidents, which references both tables, so that each of them has a one-to-many relationship with it. In this ERD, only one resident can be registered to an apartment. If a resident lives in more than one apartment over the lifetime of this database, you'll need to register them as an entirely new resident.
Intersection table example
In your Vendor table, instead of using a name as your primary key, I would create an id. Using things that have a real-world value as your primary key can get messy for a number of reasons:
If two vendors have the same name, like "Johnson's Repair", you'll need to misspell one of them for it to be a valid key.
If you make a typo in a vendor's name, the foreign key tables will also contain a reference to that typo (which might also keep it from showing in results if you run a select query for the correct spelling).
Placing an index on a string is less performant than placing it on an auto-incrementing integer key.
(Optional) I usually pluralize my database table names, like "Apartments" or "Vendors". It makes the SQL syntax read more like a sentence inside the query. If I remember right, that's also one of the things SQL's creators were going for with the syntax design.
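A minimal sketch of the first two suggestions, using SQLite through Python's `sqlite3` module. It assumes VendorOrders carries an ApartmentId (the original diagram is not available); MoveInDate is a hypothetical column added so the same resident can be linked to two apartments over time, and the pluralized table names follow the optional third suggestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Apartments (ApartmentId INTEGER PRIMARY KEY, Unit TEXT NOT NULL);
CREATE TABLE Residents  (ResidentId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Intersection table: each row links one resident to one apartment.
CREATE TABLE ApartmentResidents (
    ApartmentId INTEGER NOT NULL REFERENCES Apartments(ApartmentId),
    ResidentId  INTEGER NOT NULL REFERENCES Residents(ResidentId),
    MoveInDate  TEXT    NOT NULL,
    PRIMARY KEY (ApartmentId, ResidentId, MoveInDate)
);
-- Surrogate id instead of the name as the key, so duplicate names are allowed.
CREATE TABLE Vendors (VendorId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE VendorOrders (
    VendorOrderId INTEGER PRIMARY KEY,
    VendorId      INTEGER NOT NULL REFERENCES Vendors(VendorId),
    ApartmentId   INTEGER NOT NULL REFERENCES Apartments(ApartmentId)
);

INSERT INTO Apartments VALUES (1, '101'), (2, '202');
INSERT INTO Residents  VALUES (1, 'Dana');
-- The same resident moves from 101 to 202 without being registered twice.
INSERT INTO ApartmentResidents VALUES (1, 1, '2020-01-01'), (2, 1, '2023-06-01');
INSERT INTO Vendors VALUES (1, 'Johnson''s Repair'), (2, 'Johnson''s Repair');
INSERT INTO VendorOrders VALUES (1, 1, 2), (2, 2, 2);
""")

dana_units = conn.execute("""
SELECT a.Unit FROM ApartmentResidents ar
JOIN Apartments a ON a.ApartmentId = ar.ApartmentId
WHERE ar.ResidentId = 1 ORDER BY ar.MoveInDate
""").fetchall()
orders_for_202 = conn.execute(
    "SELECT COUNT(*) FROM VendorOrders WHERE ApartmentId = 2"
).fetchone()[0]
print(dana_units, orders_for_202)  # [('101',), ('202',)] 2
```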

Can multiple relationships occur between two entities?

I have the following two entities, where a person's address is kept in a separate table because a person can have multiple addresses.
The mailing address, however, is one of the multiple addresses stored in the address table, and it is to be referenced from the person table.
Can the following relationship exist?
The answer to your direct question is "Yes". However, you cannot declare the model using only create table statements, because:
Declaring the foreign key address.person_id requires that the person table already exist.
Declaring the foreign key person.mailing_address requires that the address table already exist.
Hence, to implement the model, you need to use alter table to add one or both of the constraints after both tables are created.
Is this the model you want? One feature of an address is that multiple people can have the same address. Your model does not allow that. To handle this, you would typically have three tables:
Person
Address
PersonAddress
The third table has one row for each person/address pair. It can also have other information such as:
Type ("mailing" versus other types)
Effective and end dates.
Perhaps other information.
If you want to guarantee uniqueness of the "mailing" address in such a model, many databases support filtered unique indexes, which can ensure that each person has at most one mailing address.
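A minimal sketch of the three-table model with a filtered unique index, using SQLite (which calls these partial indexes) through Python's `sqlite3` module; SQL Server's filtered indexes use the same `CREATE UNIQUE INDEX ... WHERE` form. The column names, the 'mailing' value and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name   TEXT NOT NULL);
CREATE TABLE address (address_id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE person_address (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    address_id INTEGER NOT NULL REFERENCES address(address_id),
    type       TEXT    NOT NULL,  -- 'mailing', 'home', ...
    start_date TEXT,
    end_date   TEXT,
    PRIMARY KEY (person_id, address_id, type)
);
-- Filtered (partial) unique index: at most one mailing address per person.
CREATE UNIQUE INDEX one_mailing_address
    ON person_address (person_id) WHERE type = 'mailing';

INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO address VALUES (10, '1 Main St'), (11, 'PO Box 7');
-- Two people can share one address.
INSERT INTO person_address (person_id, address_id, type)
VALUES (1, 10, 'mailing'), (2, 10, 'mailing'), (1, 11, 'home');
""")

try:
    conn.execute("INSERT INTO person_address (person_id, address_id, type) "
                 "VALUES (1, 11, 'mailing')")
    second_mailing_allowed = True
except sqlite3.IntegrityError:
    second_mailing_allowed = False
print(second_mailing_allowed)  # False
```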
I'm not sure why you have person_id as an FK on address, but that doesn't look correct. There are lots of correct ways to model this, and the best one for you will depend on your particular circumstances, but a couple of options are:
If you know all the types of addresses there can be, then add multiple address FK fields to the person, e.g. billing address, shipping address, etc. This makes querying quick and simple but is inflexible: adding a new address type in the future is relatively complex to implement.
Add an intersection table with FKs for person and address and an address type field. This has a slight overhead when it comes to querying but has the advantage of being very flexible: adding a new address type is trivial.

Setup Many-to-Many tables that share a common type

I'm preparing a legacy Microsoft SQL Server database so that I can interface with it through an ORM such as Entity Framework, and my question revolves around handling the setup of some of my many-to-many associations that share a common type. Specifically, should a common type be shared among master types, or should each master type have its own linked table?
For example, here is a simple example I concocted that shows how the tables of interest are currently set up:
Notice that there are two types, Teachers and Students, and both can contain zero, one, or many PhoneNumbers. The two tables, Teachers and Students, actually share an association table (PeoplePhoneNumbers). The field FKID is either a TeacherId or a StudentId.
The way I think it ought to be set up is like this:
This way, the Teachers table and the Students table each get their own PhoneNumbers table.
My gut tells me the second way is the proper way. Is this true? What about if the PhoneNumbers tables contain several fields? My object-oriented programmer brain is telling me that it would be wrong to have several identical tables, each containing a dozen or so fields, if the only difference between these tables is which master table they are linked to. For example:
Here we have two tables that contain the same information, and the only difference is that one table holds addresses for Teachers and the other for Students. This feels redundant to me, and it seems they should really be one table, but then I lose the ability for the database to constrain them (right?) and also make it messier for myself when I try to apply an ORM to this.
Should this type of common type be merged or should it stay separated for each master type?
Update
The answers below have directed me to the following solution, which is based on subclassing tables in the database. One of my original problems was that I had a common table shared among multiple other tables because that entity type was common to both the other tables. The proper way to handle that is to subclass the shared tables and essentially descend them from a common parent AND link the common data type to this new parent. Here's an example (keep in mind my actual database has nothing to do with Teachers and Students, so this example is highly manufactured but the concepts are valid):
Since Teachers and Students both required PhoneNumbers, the solution is to create a superclass, Party, and FK PhoneNumbers to the Party table. Also note that you can still FK tables that only have to do with Teachers or only have to do with Students. In this example I also subclassed Students and PartTimeStudents one more level down and descended them from Learners.
Where this solution is very satisfactory is when I implement it in an ORM, such as Entity Framework.
The queries are easy. I can query all Teachers AND Students with a particular phone number:
var partiesWithPhoneNumber = from p in dbContext.Parties
                             where p.PhoneNumbers.Any(x => x.PhoneNumber1.Contains(phoneNumber))
                             select p;
And it's just as easy to do a similar query, but only for PhoneNumbers belonging to Teachers:
var teachersWithPhoneNumber = from t in dbContext.Teachers
                              where t.Party.PhoneNumbers.Any(x => x.PhoneNumber1.Contains(phoneNumber))
                              select t;
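For readers working without an ORM, here is a rough SQL equivalent of the two LINQ queries, as a sketch only. It uses SQLite through Python's `sqlite3` module, links PhoneNumbers directly to Parties instead of going through a PartyPhoneNumbers join table, and uses `instr()` where Entity Framework would generate its own substring predicate for `Contains`. All table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Parties  (PartyId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Subtypes share the supertype's key (PK and FK at the same time).
CREATE TABLE Teachers (PartyId INTEGER PRIMARY KEY REFERENCES Parties(PartyId));
CREATE TABLE Students (PartyId INTEGER PRIMARY KEY REFERENCES Parties(PartyId));
CREATE TABLE PhoneNumbers (
    PhoneNumberId INTEGER PRIMARY KEY,
    PartyId       INTEGER NOT NULL REFERENCES Parties(PartyId),
    PhoneNumber   TEXT    NOT NULL
);
INSERT INTO Parties  VALUES (1, 'Ms. Reed'), (2, 'Sam');
INSERT INTO Teachers VALUES (1);
INSERT INTO Students VALUES (2);
INSERT INTO PhoneNumbers (PartyId, PhoneNumber) VALUES (1, '555-0100'), (2, '555-0100');
""")

phone = "555-01"

# All parties (teachers and students) with a matching number.
parties = conn.execute("""
SELECT p.Name FROM Parties p
WHERE EXISTS (SELECT 1 FROM PhoneNumbers n
              WHERE n.PartyId = p.PartyId AND instr(n.PhoneNumber, ?) > 0)
ORDER BY p.PartyId
""", (phone,)).fetchall()

# The same filter, restricted to teachers by joining the subtype table.
teachers = conn.execute("""
SELECT p.Name FROM Teachers t JOIN Parties p ON p.PartyId = t.PartyId
WHERE EXISTS (SELECT 1 FROM PhoneNumbers n
              WHERE n.PartyId = t.PartyId AND instr(n.PhoneNumber, ?) > 0)
""", (phone,)).fetchall()
print(parties, teachers)  # [('Ms. Reed',), ('Sam',)] [('Ms. Reed',)]
```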
Teacher and Student are both subclasses of a more general concept (a Person). If you create a Person table that contains the general data that is shared for all people in your database and then create Student and Teacher tables that link to Person and contain any additional details you will find that you have an appropriate point to link in any other tables.
If there is data that is common for all people (such as zero to many phone numbers) then you can link to the Person table. When you have data that is only appropriate for a Student you link it to the Student ID. You gain the additional advantage that Student Instructors are simply a Person with both a Student and Teacher record.
Some ORMs support the concept of subclass tables directly. LLBLGen does so in the way I describe so you can make your data access code work with higher level concepts (Teacher and Student) and the Person table will be managed on your behalf in the low level data access code.
Edit
Some commentary on the current diagram (which may not be relevant in the source domain this was translated from, so a pinch of salt is advised).
Party, Teachers and Learners look good. Salaries looks good if you add start and end dates for the rate so you can track salary history. Also keep in mind it may make sense to use PartyID (instead of TeacherID) if you end up with multiple entities that have a Salary.
PartyPhoneNumbers looks like you might be able to hang the phone number off of that directly. This would depend on whether you expect to change the phone number for multiple people (n:m) at once or whether a phone number is owned by each Party independently. (I would expect the latter, because you might have a student who is a (real-world) child of a teacher and thus they share a phone number. I wouldn't want an update to the student's phone number to impact the teacher, so the join table seems odd here.)
Learners to PaymentHistories seems right, but the Students vs PartTimeStudents distinction seems artificial. (It seems like PartTimeStudents is more like AttendanceDays, which in turn would be the result of a LearnerClasses join.)
I think you should look into the supertype/subtype pattern. Add a Party or Person table that has one row for every teacher or student. Then use the PartyID in the Teacher and Student tables as both the PK and the FK back to Party (but name them TeacherID and StudentID). This establishes a "one-to-zero-or-one" relationship between the supertype table and each of the subtype tables.
Note that if you have identity columns in the subtype tables, they will need to be removed. When creating those entities going forward, you will first have to insert into the supertype and then use that row's ID in either subtype.
To maintain consistency you will also have to renumber one of your subtype tables so its IDs do not conflict with the other's. After that, you can use SET IDENTITY_INSERT on the supertype table (for example SET IDENTITY_INSERT Party ON) to create the missing supertype rows with those IDs.
The beauty of all this is that when you have a table that must allow only one type, such as Student, you can FK to that table, but when you need an FK that can be either, as with your Address table, you FK to the Party table instead.
A final point is to move all the common columns into the supertype table and put only columns in the subtypes that must be different between them.
Your single Phone table now is easily linked to PartyID as well.
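A minimal sketch of the insert order this answer describes, using SQLite through Python's `sqlite3` module. SQLite auto-generates PartyID because it is an `INTEGER PRIMARY KEY`; in SQL Server you would read the new identity value with `SCOPE_IDENTITY()` or an `OUTPUT` clause instead. The HireDate and Grade columns and the `add_teacher` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Party (PartyID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- TeacherID/StudentID are both the PK and an FK to Party: one-to-zero-or-one.
CREATE TABLE Teacher (TeacherID INTEGER PRIMARY KEY REFERENCES Party(PartyID), HireDate TEXT);
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY REFERENCES Party(PartyID), Grade INTEGER);
-- The single Phone table links to the supertype.
CREATE TABLE Phone (
    PhoneID INTEGER PRIMARY KEY,
    PartyID INTEGER NOT NULL REFERENCES Party(PartyID),
    Number  TEXT NOT NULL
);
""")

def add_teacher(name, hire_date):
    """Insert the supertype row first, then reuse its generated ID for the subtype."""
    with conn:  # both inserts commit together or not at all
        party_id = conn.execute("INSERT INTO Party (Name) VALUES (?)", (name,)).lastrowid
        conn.execute("INSERT INTO Teacher (TeacherID, HireDate) VALUES (?, ?)",
                     (party_id, hire_date))
    return party_id

t = add_teacher("Ms. Reed", "2019-08-15")
conn.execute("INSERT INTO Phone (PartyID, Number) VALUES (?, ?)", (t, "555-0100"))

# A subtype row without a matching supertype row is rejected.
try:
    conn.execute("INSERT INTO Student (StudentID, Grade) VALUES (999, 7)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(t, orphan_rejected)  # 1 True
```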
For a much more detailed explanation, please see this answer to a similar question.
The problem that you have is an example of a "one-of" relationship. A person is a teacher or a student (or possibly both).
I think the existing structure captures this information best.
The person has a phone number. Then, some people are teachers and some are students. The additional information about each entity is stored in either the teacher or the student table. Common information, such as name, is in the person table.
Splitting the phone numbers into two separate tables is rather confusing. After all, a phone number does not know whether it is for a student or a teacher. In addition, you don't have space for other phone numbers, such as for administrative staff. You also have a challenge for students who may sometimes teach or help teach a class.
Reading your question, it looks like you are asking for a common database schema for your situation. I've seen several in the past, some easier to work with than others.
One option is having a Student_Address table and a Teacher_Address table that both use the same Address table. This way, if you have entity-specific fields to store, you have that capability. But this can be slightly (although not significantly) harder to query against.
Another option is how you suggested above; I would probably just add a primary key to the table. However, you'd want to add a PersonTypeId field to that table (linking to a PersonType table). This way you'd know which entity each record belongs to.
I would not suggest having two PhoneNumber tables. I think you'll find it much easier to maintain with all in the same table. I prefer keeping same entities together, meaning Students are a single entity, Teachers are a single entity, and PhoneNumbers are the same thing.
Good luck.

Two tables or one table?

A quick question regarding table design.
Let's say I am designing a loan application database.
As it is right now, I will have 2 tables:
Applicant (ApplicantID, FirstName, LastName, SSN, Email, ...)
and
Co-Applicant (CoApplicantID, FirstName, LastName, SSN, Email, ..., ApplicantID)
Should I consider having just one table, because all the fields are the same?
Person (PersonID, FirstName, LastName, SSN, Email, ..., ParentID (this determines if it is a co-applicant))
What are the pros and cons of these two approaches ?
I suggest the following data model:
PERSON table
  PERSON_ID, pk
LOAN_APPLICATIONS table
  APPLICATION_ID, pk
APPLICANT_TYPE_CODE table
  APPLICANT_TYPE_CODE, pk
  APPLICANT_TYPE_CODE_DESCRIPTION
LOAN_APPLICANTS table
  APPLICATION_ID, pk, fk
  PERSON_ID, pk, fk
  APPLICANT_TYPE_CODE, fk
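The model above, sketched in SQLite through Python's `sqlite3` module. The NAME column, the 'APP'/'CO' codes and the sample rows are assumptions added for illustration; the point is that one PERSON row can play different roles on different applications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PERSON (PERSON_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
CREATE TABLE LOAN_APPLICATIONS (APPLICATION_ID INTEGER PRIMARY KEY);
CREATE TABLE APPLICANT_TYPE_CODE (
    APPLICANT_TYPE_CODE             TEXT PRIMARY KEY,
    APPLICANT_TYPE_CODE_DESCRIPTION TEXT NOT NULL
);
CREATE TABLE LOAN_APPLICANTS (
    APPLICATION_ID      INTEGER NOT NULL REFERENCES LOAN_APPLICATIONS(APPLICATION_ID),
    PERSON_ID           INTEGER NOT NULL REFERENCES PERSON(PERSON_ID),
    APPLICANT_TYPE_CODE TEXT    NOT NULL REFERENCES APPLICANT_TYPE_CODE(APPLICANT_TYPE_CODE),
    PRIMARY KEY (APPLICATION_ID, PERSON_ID)
);
INSERT INTO APPLICANT_TYPE_CODE VALUES ('APP', 'Applicant'), ('CO', 'Co-applicant');
INSERT INTO PERSON VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO LOAN_APPLICATIONS VALUES (100), (200);
-- Bob is the co-applicant on loan 100 and the main applicant on loan 200,
-- but his details are stored only once.
INSERT INTO LOAN_APPLICANTS VALUES (100, 1, 'APP'), (100, 2, 'CO'), (200, 2, 'APP');
""")

bobs_roles = conn.execute("""
SELECT la.APPLICATION_ID, t.APPLICANT_TYPE_CODE_DESCRIPTION
FROM LOAN_APPLICANTS la
JOIN APPLICANT_TYPE_CODE t ON t.APPLICANT_TYPE_CODE = la.APPLICANT_TYPE_CODE
WHERE la.PERSON_ID = 2
ORDER BY la.APPLICATION_ID
""").fetchall()
print(bobs_roles)  # [(100, 'Co-applicant'), (200, 'Applicant')]
```

Adding a new role later (a guarantor, say) is one more row in APPLICANT_TYPE_CODE, with no schema change.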
Person( PersonID, FirstName, LastName , SSN, Email...,ParentID (This determines if it is a co-applicant))
That works if a person will only ever exist in your system as either an applicant or a co-applicant. A person could be a co-applicant to numerous loans and/or an applicant themselves - you don't want to be re-entering their details every time.
This is the benefit of how and why things are normalized. Based on the business rules and the inherent reality of usage, the tables are set up to stop redundant data from being stored. This is for the following reasons:
Redundant data is a waste of space and resources to support and maintain
The action of duplicating the data means it can also be different in subtle ways (capitalization, spaces, etc.), which can all lead to complications when isolating the real data
Data incorrectly stored due to oversight when creating the data model
Foresight and flexibility. Currently there isn't any option other than applicant or co-applicant for an APPLICANT_TYPE_CODE value, so it could be stored without using another table and foreign key. But this setup supports adding different applicant codes in the future, as needed, without any harm to the data model.
There's no performance benefit when you risk bad data. Whatever you gain will be eaten by the hacks you have to perform to get things to work.
If the domain model determines that both people are applicants and that they are related, then they belong in the same table with a self-referential foreign key.
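A minimal sketch of the self-referential foreign key, reusing the ParentID column from the question's single-table design (SQLite through Python's `sqlite3` module; the sample people are made up). A NULL ParentID marks a main applicant, and the foreign key rejects a co-applicant whose ParentID points nowhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Applicants (
    ApplicantID INTEGER PRIMARY KEY,
    FirstName   TEXT NOT NULL,
    LastName    TEXT NOT NULL,
    -- NULL for a main applicant; otherwise the main applicant's ID.
    ParentID    INTEGER REFERENCES Applicants(ApplicantID)
);
INSERT INTO Applicants VALUES (1, 'Ann', 'Lee', NULL), (2, 'Bob', 'Lee', 1);
""")

# Pair each co-applicant with their main applicant via a self-join.
pairs = conn.execute("""
SELECT main.FirstName, co.FirstName
FROM Applicants co JOIN Applicants main ON main.ApplicantID = co.ParentID
""").fetchall()

try:
    conn.execute("INSERT INTO Applicants VALUES (3, 'Cy', 'Ng', 42)")
    dangling_allowed = True
except sqlite3.IntegrityError:
    dangling_allowed = False
print(pairs, dangling_allowed)  # [('Ann', 'Bob')] False
```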
You may want to read up on database normalization.
I think you should have two tables, but not those two. Have a table "loans" which has foreign keys to an applicants table, or just have records in applicants reference the same table.
The advantages:
- Makes searching easier: if you only have a phone number or a name, you can still search in a single table and find the corresponding person regardless of whether he/she is a co-applicant or a main applicant. Otherwise you'd need to use a UNION construct. (Yet when you know that you are searching for a particular type of applicant, you add a filter on the type and you only get such applicants.)
- Generally easier to maintain. Say tomorrow you need to add the Twitter ID of the applicant ;-); there's only one place to change.
- Also allows inputting persons with, say, an "open/undefined" type, and assigning them as applicant or otherwise at a later date.
- Allows introducing new types of applicants (say a collateral warrantor... whatever)...
The disadvantages:
- With really huge tables (multi-million person records), there could be a slight performance gain with a two-table approach (depending on indexes and various other things).
- SQL queries can be a bit more complicated, for example with two separate joins to the person table, one for the applicant and the other for the co-applicant. (Nothing intractable, but a bit more complexity.)
On the whole, the proper design is in all likelihood the one with a single table. The only possible exception is if over time the info kept for one type of applicant starts to diverge significantly from the other type(s) of applicant. (And even then we can deal with this situation in different ways, including the introduction of a related table for these extra fields, as it may make more sense; yes, a two-table system again, but one where the extra fields fit "naturally" together in terms of their semantics, usage, etc.)
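The searching advantage can be sketched like this (SQLite through Python's `sqlite3` module; the Phone and ApplicantType columns and the 'main'/'co' values are assumptions): the two-table design needs a UNION for a lookup by phone number, while the single table answers it with one filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two-table design from the question.
CREATE TABLE Applicant   (ApplicantID INTEGER PRIMARY KEY, FirstName TEXT, Phone TEXT);
CREATE TABLE CoApplicant (CoApplicantID INTEGER PRIMARY KEY, FirstName TEXT, Phone TEXT,
                          ApplicantID INTEGER REFERENCES Applicant(ApplicantID));
INSERT INTO Applicant   VALUES (1, 'Ann', '555-0101');
INSERT INTO CoApplicant VALUES (1, 'Bob', '555-0102', 1);

-- Single-table design with a type column.
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, FirstName TEXT, Phone TEXT,
                     ApplicantType TEXT NOT NULL CHECK (ApplicantType IN ('main', 'co')));
INSERT INTO Person VALUES (1, 'Ann', '555-0101', 'main'), (2, 'Bob', '555-0102', 'co');
""")

phone = "555-0102"

# Two tables: every "find this person" query needs a UNION.
two_tables = conn.execute("""
SELECT FirstName, 'main' FROM Applicant   WHERE Phone = ?
UNION ALL
SELECT FirstName, 'co'   FROM CoApplicant WHERE Phone = ?
""", (phone, phone)).fetchall()

# One table: a single query, optionally filtered on ApplicantType.
one_table = conn.execute(
    "SELECT FirstName, ApplicantType FROM Person WHERE Phone = ?", (phone,)
).fetchall()
print(two_tables, one_table)  # [('Bob', 'co')] [('Bob', 'co')]
```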
Both of your variants have one disadvantage: a person who is an applicant or co-applicant twice or more has to be stored again each time. So you should use a table Person (PersonID, FirstName, LastName, SSN, Email, ...) and a table Co-Applicants (PersonID as Applicant, PersonID as CoApplicant).
How about this: since each Applicant can have a Co-Applicant, just go with one table in total. So you'd have Applicants, which has an optional foreign key field 'coapplicant' (or similar).
If the fields between applicant and co-applicant are identical, then I would suggest that you put them in the same table and use an "applicant type" field to indicate main or co-applicant. If there's some information special to the co-applicant (such as relationship to the main applicant, extra phone numbers, other stuff), you might want to normalize it into a separate table and refer from there back to the co-applicant (by (co-)applicant ID) in the applicant table.
Keep two tables:
1st, a user type code table: in this table you can keep the user type, i.e. applicant and co-applicant.
2nd, a User table: here you can keep all the fields (the columns are the same for both) with the user type code as a foreign key.
This way you can easily distinguish between the two kinds of user.
I know, I'm too late on this... The loan application is your primary entity. You will have one or more applicants for the loan. Drop the idea of Person: you're creating something that you can't control. I've been there, done that and got the T-shirt.