SQL Database Design - sql

I have a design dilemma; On one hand (option 1) I only create address table one time but on the other hand (option 2) seems like it would have performance advantages...less joins and less data. What do you guys think?

Option 1 for sure. All your addresses should be in one spot. This shouldn't hinder performance either. You can limit the addresses returned by the WHERE statement when you query the data.
Three customer addresses in one table or in separate tables?
Referencing the aforementioned link: Since you're saying your customer address relationship is modeled as one-to-many, I would use the following example of one address table, an AddressType and a EntityId FK. Using this method would allow for an Entity (client/employee/contact) to have many addresses.
CREATE TABLE Entity
(
ID int not null IDENTITY(1,1) PRIMARY KEY,
Name varchar(60) not null,
EntityType int not null
-- etc
)
CREATE TABLE Address
(
ID int not null IDENTITY(1,1) PRIMARY KEY,
EntityID int not null
CONSTRAINT FK_Entity_EntityID FOREIGN KEY References Entity(ID),
Street varchar(120) not null,
AddressType int not null
-- etc
)

Option 1 is better from a design perspective. One table should have all the addresses - and kept updated accordingly. This way you can easily create relationships between addresses and Client, Employee and Contact (and everything else really) and do queries like: "select everything that lives in X address". Keep in mind that this way you will have to properly maintain your Address table - manage duplicate addresses, make sure the address is correct etc.
On the other hand, in a very simplistic Use-Case where you just want to display a Client, Employee and Contact info, an Address can be a free-text column. This will save you the trouble of properly maintaining addresses.
It all comes down to what you need in the end.

Related

Can I use identity for primary key in more than one table in the same ER model

As it is said in the title, my question is can I use int identity(1,1) for primary key in more than one table in the same ER model? I found on Internet that Primary Key need to have unique value and row, for example if I set int identity (1,1) for table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID have the same value and because of that cause an error?
Constraints in general are defined and have a scope of one table (object) in the database. The only exception is the FOREIGN KEY which usually has a REFERENCE to another table.
The PRIMARY KEY (or any UNIQUE key) sets a constraint only on the table it is defined on and is not affecting or is not affected by other constraints on other tables.
The PRIMARY KEY defines a column or a set of columns which can be used to uniquely identify one record in one table (and none of the columns can hold NULL, UNIQUE on the other hand allows NULLs and how it is treated might differ in different database engines).
So yes, you might have the same value for PersonID and JobID, but their meaning is different. (And to select the one unique record, you will need to tell SQL Server in which table and in which column of that table you are looking for it, this is the table list and the WHERE or JOIN conditions in the query).
The query SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Person WHERE PersonID = 1; have a different meaning even when the value you are searching for is the same.
You will define the IDENTITY on the table (the table can have only one IDENTITY column). You don't need to have an IDENTITY definition on a column to have the value 1 in it, the IDENTITY just gives you an easy way to generate unique values per table.
You can share sequences across tables by using a SEQUENCE, but that will not prevent you to manually insert the same values into multiple tables.
In short, the value stored in the column is just a value, the table name, the column name and the business rules and roles will give it a meaning.
To the notion "every table needs to have a PRIMARY KEY and IDENTITY, I would like to add, that in most cases there are multiple (independent) keys in the table. Usually every entity has something what you can call business key, which is in loose terms the key what the business (humans) use to identify something. This key has very similar, but usually the same characteristics as a PRIMARY KEY with IDENTITY.
This can be a product's barcode, or the employee's ID card number, or something what is generated in another system (say HR) or a code which is assigned to a customer or partner.
These business keys are useful for humans, but not always useful for computers, but they could serve as PRIMARY KEY.
In databases we (the developers, architects) like simplicity and a business key can be very complex (in computer terms), can consist of multiple columns, and can also cause performance issues (comparing a strings is not the same as comparing numbers, comparing multiple columns is less efficient than comparing one column), but the worst, it might change over time. To resolve this, we tend to create our own technical key which then can be used by computers more easily and we have more control over it, so we use things like IDENTITYs and GUIDs and whatnot.

How to design entity tables for entities with multiple names

I want to create a table structure to store customers and I am facing a challenge: for each customer I can have multiple names, one being the primary one and the others being the alternative names.
The initial take on the tables looks like this:
CREATE TABLE dbo.Customer (
CustomerId INT IDENTITY(1,1) NOT NULL --PK
-- other fields below )
CREATE TABLE dbo.CustomerName (
CustomerNameId INT IDENTITY(1,1) NOT NULL -- PK
,CustomerId INT -- FK to Customer
,CustomerName VARCHAR(30)
,IsPrimaryName BIT)
Though, the name of the customer is part of the Customer entity and I feel that it belongs to the Customer table.
Is there a better design for this situation?
Thank you
Personally, I would keep the Primary name in the Customer table and create an "AlternateNames" table with a zero-to-many relationship to Customer.
This is because presumably most of the time when you are returning customer data, you are only going to be interested in returning the Primary Name. And probably the main (if not only) reason you want the alternate names is for looking up customers when an alternate name has been supplied.
Unfortunately, this is too long for a comment.
Before figuring this out, more information is needed.
Is additional information needed for names? For instance, language or title or date created?
Are the names unique? Is the uniqueness within a customer or over all names?
Are the primary names unique?
Does every customer have to have a primary name?
How often does the primary name change to an alternate name? (As opposed to just having the name updated.)
When querying the data, will you know if the name is a primary or alternate name? (Or do they all need to be compared?)
Depending on the answer to this question, the appropriate data structure can have some tricky nuances. For instance, if you have a flag to identify the primary name, it can be tricky to ensure that exactly one row has this value set -- particularly when updating rows.
Note: If you update the question with the answers, I'll delete this.

One Address Table for Many entities?

Conceptual stage question:
I have several Tables (Person, Institution, Factory) each has many kinds of Addresses (Mailing, Physical)
Is there a way to create a single Address table that contains all the addresses of all the Entities?
I'd rather not have a PersonAddress and FactoryAddress etc set of tables.
Is there another option?
The amount of data will only be several thousand addresses at most, so light in impact.
My proposal relies on the principle that one entity (person, Institution, Factory, etc) can have multiple adresses, which is usually the case (home, business, etc), and that one adress can be shared by entities of different nature:
CREATE TABLE ADDRESS
(
ID INT IDENTITY PRIMARY KEY NOT NULL,
.... (your adress fields here)
id_Person ... NULL,
id_Institution ... NULL,
id_Factory ... NULL
)
The main limit is that 2 different persons cannot share the same adress. In such a situation, you'll have to go with an additional "EntityAddress" table, like this:
CREATE TABLE ADDRESS
(
ID INT IDENTITY PRIMARY KEY NOT NULL,
.... (your adress fields here)
)
CREATE TABLE ENTITY_ADDRESS
(
ID INT IDENTITY PRIMARY KEY NOT NULL
id_Address .... NOT NULL,
id_Person .... NULL,
id_Institution ... NULL,
id_Factory .... NULL
)
The last model allows you to share for example one adress for multiple persons working in the same institution.
BUT: according to me, the 'better' solution would be to merge your different entities into one table. You will then need:
An Entity Table, made for all entities
An Entity Type table, that will contain the different entity types.
In your case you have at least 3 rows: persons, factories,
institution
If one adress per entity is enough, you could go for the address details as properties of the Entity table.
If you need multiple addresses by entity, you'll have to go with the Addresses Table with an Id_Entity as a foreign key.
If you want to share one adress among multiple entities, each entity having potentially multiple adresses (a many-to-many relation between entities and adresses), then you will need to go for the EntityAddres table in addition to the Entity and Address Tables.
Your choice between these models will depend on your needs and your businness rules.
You need to use abstraction and inheritance.
An individual and institution (I'd call it organization) are really just concrete representations of an abstract legal party.
A mailing or physical address is the concretion of an abstract address, which could also be an email address, telephone number, or web address.
A legal party can be have zero or more addresses.
An address can be belong to zero or more legal parties.
A party could use the same address for multiple roles, such as 'Home' address and 'Work' address.
If a factory is big enough, sub-facilities in the factory might have their own addresses, so you might want to consider a hierarchical relationship there. For example, each apartment in a condo has one address each. Each building in a large factory might have their own address.
create table party (
party_id identity primary key
);
create table individual (
individual_id int primary key references party(party_id),
...
);
create table organization (
organization_id int primary key references party(party_id),
...
);
create table address (
address_id identity primary key,
...
);
create table mailing_address (
address_id int primary key references address(address_id),
...
);
create table party_address (
party_id int references party(party_id),
address_id int references address(address_id),
role varchar(255), --this should really point to a role table
primary key (party_id, address_id, role)
);
create table facility (
facility_id identity primary key,
address_id int not null references address(address_id),
parent_id int null references facility(facility_id)
);
in my opinion ,you should create a pivot table to link Entity with her Address
for exampleinstitution_addresses(id, id_institution,id_address), person_addresses(id,id_person,id_address) etc...
You could very definitely do this. You could have the Address table that has an ID, then Person, Institution and Factory could all have foreign keys to the Address table.
If you need to be able to distinguish what kind of Address it is at the Address level, you could consider adding an AddressType table and having a foreign key to that on the Address table
Example:
CREATE TABLE ADDRESS
(
ID INT IDENTITY PRIMARY KEY NOT NULL,
City VARCHAR(50) NOT NULL,
State VARCHAR(2) NOT NULL,
Zip VARCHAR(10) NOT NULL,
AddressLine1 VARCHAR(200) NOT NULL,
AddressLine2 VARCHAR(200) NOT NULL,
)
CREATE TABLE Person
(
ID INT IDENTITY PRIMARY KEY NOT NULL,
AddressID INT FOREIGN KEY REFERENCES Address(ID)
)
CREATE TABLE Institution
(
ID INT IDENTITY PRIMARY KEY NOT NULL,
AddressID INT FOREIGN KEY REFERENCES Address(ID)
)
...etc
Another basic \ bullet proof system would be to organise your model around:
An Entity Table, made for all entities
An Entity Type table, that will contain the different entity types. In your case you have at least 3 rows: persons, factories, institution
If one adress per entity is enough, you could go for the address details as properties of the Entity table.
If you need multiple addresses by entity, you'll have to go with the Addresses Table with an Id_Entity as a foreign key.
If you want to share one adress among multiple entities, each entity having potentially multiple adresses (a many-to-many relation between entities and adresses), then you will need to go for the EntityAddres table in addition to the Entity and Address Tables.
EDIT: this answer was also merged with the other answer I gave here ... so I do not know if it deserves an upvote!

How to enforce uniques across multiple tables

I have the following tables in MySQL server:
Companies:
- UID (unique)
- NAME
- other relevant data
Offices:
- UID (unique)
- CompanyID
- ExternalID
- other data
Employees:
- UID (unique)
- OfficeID
- ExternalID
- other data
In each one of them the UID is unique identifier, created by the database.
There are foreign keys to ensure the links between Employee -> Office -> Company on the UID.
The ExternalID fields in Offices and Employees is the ID provided to my application by the Company (my client(s) actually). The clients does not have (and do not care) about my own IDs, and all the data my application receives from them is identified solely based on their IDs (i.e. ExternalID in my tables).
I.e. a request from the client in pseudo-language is like "I'm Company X, update the data for my employee Y".
I need to enforce uniqueness on the combination of CompanyID and Employees.ExternalID, so in my database there will be no duplicate ExternalID for the employees of the same company.
I was thinking about 3 possible solutions:
Change the schema for Employees to include CompanyID, and create unique constrain on the two fields.
Enforce a trigger, which upon update/insert in Employees validates the uniqueness.
Enforce the check on application level (i.e. my receiving service).
My alternative-dbadmin-in-me sais that (3) is the worst solution, as it does not protect the database of inconsistency in case of application bug or something else, and most probably will be the slowest one.
The trigger solution may be what I want, but it may become complicated, especially if a multiple inserts/updates need to be performed in a single statement, and I'm not sure about the performance vs. (1).
And (1) looks the fastest and easiest approach, but kind of goes against my understanding of relational model.
What SO DB experts opinion is about pros and cons of each of the approaches, especially if there is a possibility for adding an additional level of indirection - i.e. Company -> Office -> Department -> Employee, and the same uniqueness needs to be preserved (Company/Employee).
You're right - #1 is the best option.
Granted, I would question it at first glance (because of shortcutting) but knowing the business rule to ensure an employee is only related to one company - it makes sense.
Additionally, I'd have a foreign key relating the companyid in the employee table to the companyid in the office table. Otherwise, you allow an employee to be related to a company without an office. Unless that is acceptable...
Triggers are a last resort if the relationship can not be demonstrated in the data model, and servicing the logic from the application means the logic is centralized - there's no opportunity for bad data to occur, unless someone drops constraints (which means you have bigger problems).
Each of your company-provided tables should include CompanyID into the `UNIQUE KEY' over the company-provided ids.
Company-provided referential integrity should use company-provided ids:
CREATE TABLE company (
uid INT NOT NULL PRIMARY KEY,
name TEXT
);
CREATE TABLE office (
uid INT NOT NULL PRIMARY KEY,
companyID INT NOT NULL,
externalID INT NOT NULL,
UNIQIE KEY (companyID, externalID),
FOREIGN KEY (companyID) REFERENCES company (uid)
);
CREATE TABLE employee (
uid INT NOT NULL PRIMARY KEY,
companyID INT NOT NULL,
officeID INT NOT NULL,
externalID INT NOT NULL,
UNIQIE KEY (companyID, externalID),
FOREIGN KEY (companyID) REFERENCES company(uid)
FOREIGN KEY (companyID, officeID) REFERENCES office (companyID, externalID)
);
etc.
Set auto_increment_increment to the number of table you have.
SET auto_increment_increment = 3; (you might want to set this in your my.cnf)
Then manually set the starting auto_increment value of each table to different values
first table to 1, second table to 2, third table to 3
Table 1 will have values like 1,4,7,10,13,etc
Table 2 will have values like 2,5,8,11,14,etc
Table 3 will have values like 3,6,9,12,15,etc
Of course this is just ONE option, personally I'd just make it a combo value. Could be as simple as TableID, AutoincrementID, Where the TableID is constant in all rows.

SQL Server 2008 - Table - Clarifications

I am new to SQL Server 2008 database development.
Here I have a master table named ‘Student’ and a child table named ‘Address’. The common column between these tables is ‘Student ID’.
My doubts are:
Do we need to put ‘Address Id’ in the ‘Address’ table and make it primary key? Is it mandatory? ( I won’t be using this ‘Address Id’ in any of my reports )
Is Primary key column a must in any table?
Would you please help me on these.
Would you please also refer best links/tutorials for SQL Server 2008 database design practices (If you are aware of) which includes naming conventions, best practices, SQL optimizations etc. etc.
1) Yes, having an ADDRESS_ID column as the primary key of the ADDRESS table is a good idea.
But having the STUDENT_ID as a foreign key in the ADDRESS table is not a good idea. This means that an address record can only be associated to one student. Students can have roommates, so they'd have identical addresses. Which comes back to why it's a good idea to have the ADDRESS_ID column as a primary key, as it will indicate a unique address record.
Rather than have the STUDENT_ID column in the ADDRESS table, I'd have a corrollary/xref/lookup table between the STUDENT and ADDRESS tables:
STUDENT_ADDRESSES_XREF
STUDENT_ID, pk, fk to STUDENTS table
ADDRESS_ID, pk, fk to ADDRESS table
EFFECTIVE_DATE, date, not null
EXPIRY_DATE, date, not null
This uses a composite primary key, so that only one combination of the student & address exist. I added the dates in case there was a need to know when exactly, because someone could move back home/etc after all.
Most importantly, this works off the ADDRESS_ID column to allow for a single address to be associated to multiple people.
2) Yes, defining a primary key is frankly a must for any table.
In most databases, the act also creates an index - making searching more efficient. That's on top of the usual things like making sure a record is a unique entry...
Every table should have a way to uniquely and unambiguously identify a record. Make AddressID the primary key for the address table.
Without a primary key, the database will allow duplicate records; possibly creating join problems or trigger problems (if you implement them) down the road.