SQL separate table for relationship pros & cons - sql

I am working on a database design and want to understand the pros and cons of having a separate table just for the relationship.
The first option looks like this:
Customer [Customer Detail] (CustomerID AS PK)
Address [Address Detail] (AddressID as PK)
CustomerAddress [CustomerID FK, AddressID FK]
Or
Customer [Customer Detail] (CustomerID AS PK)
Address [Address Detail, CustomerID FK]
A customer can have more than one address.
What are the advantages and disadvantages?

This is a reasonable question.
Basically, it boils down to a single question: "Do you want two identical addresses to always have the same key or not?".
In the first version, the "Address Detail" can be unique across the database. So, two room-mates could have the same AddressId. When the Smiths move out and the Joneses move in, they could have the same AddressId.
In the second version, each person would have one or more address records. However, the details for a given address could be repeated.
Which is better depends on your application. Often, the first method is preferable when you are contacting people at the address, because "de-duplication" is built into the data model.
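To make the comparison concrete, the two layouts from the question could be sketched in DDL roughly as follows. This is only an illustration: the detail columns (Name, Street, City), the data types, and the table name CustomerOwnedAddress for the second version are placeholders that do not appear in the original post.

```sql
-- Version 1: addresses are shared; a junction table links them to customers.
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       VARCHAR(100) NOT NULL        -- placeholder for "Customer Detail"
);

CREATE TABLE Address (
    AddressID INTEGER PRIMARY KEY,
    Street    VARCHAR(200) NOT NULL,        -- placeholder for "Address Detail"
    City      VARCHAR(100) NOT NULL,
    UNIQUE (Street, City)                   -- identical addresses share one row
);

CREATE TABLE CustomerAddress (
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
    AddressID  INTEGER NOT NULL REFERENCES Address (AddressID),
    PRIMARY KEY (CustomerID, AddressID)     -- each pairing is stored once
);

-- Version 2: each address row belongs to exactly one customer,
-- so the same street and city may be repeated for different customers.
CREATE TABLE CustomerOwnedAddress (
    AddressID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
    Street     VARCHAR(200) NOT NULL,
    City       VARCHAR(100) NOT NULL
);
```

The UNIQUE constraint in version 1 is what gives two identical addresses the same key; version 2 has no such constraint because duplicates across customers are expected there.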

Related

How to invert an incorrectly modeled 1:1 relationship?

I have an incorrectly modeled 1:1 relationship between two tables:
Table: Customer
* id (bigint)
* ...
Table: Address
* id
* customer_id (bigint) <--- FOREIGN KEY
* street (varchar)
* ...
In the real world, a customer has either one address or none. However, the current data model makes it possible to assign multiple addresses to a customer. We do not do this at the moment, so the data could be migrated to this:
Table: Customer
* id (bigint)
* address_id (nullable bigint)
* ...
Is it possible to make this migration in one transaction, using pure SQL? I would like to avoid an intermediate state where we have both relationships and migrate the customers one by one, which is the best idea I have come up with so far.
As I understand it, you want to add an address_id column to the Customer table. If a customer has more than one address, you need to pick just one of them. Here I take the last address, meaning the one with the highest id:
update Customer
set address_id=(select max(id) from Address a where a.customer_id=Customer.id)
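The update above is only one step of the migration. A fuller single-transaction sketch might look like the following, assuming a DBMS with transactional DDL (PostgreSQL, for example; MySQL commits implicitly on DDL statements, so this would not be atomic there) and assuming Address.id is the primary key. The constraint name fk_customer_address is made up for illustration.

```sql
BEGIN;

-- 1. Add the new nullable reference on the customer side.
ALTER TABLE Customer ADD COLUMN address_id BIGINT;

-- 2. Copy the relationship over, keeping the highest address id per customer.
UPDATE Customer
SET address_id = (SELECT MAX(a.id)
                  FROM Address a
                  WHERE a.customer_id = Customer.id);

-- 3. Enforce the new relationship (hypothetical constraint name).
ALTER TABLE Customer
    ADD CONSTRAINT fk_customer_address
    FOREIGN KEY (address_id) REFERENCES Address (id);

-- 4. Remove the old relationship. Address rows that were not picked in
--    step 2 are left without an owner and may need separate cleanup.
ALTER TABLE Address DROP COLUMN customer_id;

COMMIT;
```

Customers without any address simply end up with a NULL address_id, which matches the "one address or not" requirement.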
You can actually leave your data model as is and just add a unique constraint to address:
alter table address add constraint unq_address_customer unique (customer_id);
This is not ideal, but it does enforce the unique constraint with minimal changes to the data model.
That said, I would question why you want only one address per customer. Have you considered these situations?
Customers whose "delivery" address and whose "billing" address are different.
Customers who move.
Customers whose address changes although they do not move, say due to postal code reassignments or street name changes.

PK for a table that does not have unique data

I have tables like these:
Company( #id_company, ... )
addresses( address, *id_company*, *id_city* )
cities( #id_city, name_city, *id_country* )
countries( #id_country, name_country )
What I want to know is:
Is this a good design? (A company can have many addresses.)
The important thing, as you may notice, is that I didn't add a PK to the addresses table, because every address of a company will be different. Am I right?
Also, I will never have a WHERE clause in a SELECT that specifies an address.
First of all we should distinguish natural keys and technical keys. As to natural keys:
A country is uniquely identified by its name.
A city can be uniquely identified by its country and a unique name. For instance, there are two Frankfurts in Germany. To be clear about which one we mean, we either use the distinct names Frankfurt/Main and Frankfurt/Oder or use the city name with its zip code range.
A company gets identified by its full name usually. Or use some tax id, code, whatever.
To uniquely identify a company address we would take the company plus country, city and address in the city (street name and number usually).
You've decided to use technical keys. That's okay. But you should still make sure that names are unique. You don't want France and France in your table, it must be there just once. You don't want Frankfurt and Frankfurt without any distinction in your city table for Germany either. And you don't want to have the same address twice entered for one company.
company( #id_company, name_company, ... ) plus a unique constraint on name_company or whatever makes a company unique
countries( #id_country, name_country ) plus a unique constraint on name_country
cities( #id_city, name_city, id_country ) plus a unique constraint on name_city, id_country
addresses( address, id_company, id_city ) with a unique constraint on all three columns
From what you say, it looks like you want the addresses only for lookup. You don't want to use them in any other table, not now and not in the future. Well, then you are done. As you need a unique constraint on all three columns, you could just as well declare this as your primary key, but you don't have to.
Keep in mind, that to reference a company address in any other future table, you would have to store address + id_company + id_city in that table. At that point you would certainly like to have an address id instead. But you can add that when needed. For now you can do without.
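The constraints described in this answer could be declared roughly like this. The data types and lengths are assumptions, and the three-column key on addresses is declared as the primary key here, which the answer presents as optional.

```sql
CREATE TABLE countries (
    id_country   INTEGER PRIMARY KEY,
    name_country VARCHAR(100) NOT NULL UNIQUE    -- "France" only once
);

CREATE TABLE cities (
    id_city    INTEGER PRIMARY KEY,
    name_city  VARCHAR(100) NOT NULL,
    id_country INTEGER NOT NULL REFERENCES countries (id_country),
    UNIQUE (name_city, id_country)               -- one "Frankfurt/Main" per country
);

CREATE TABLE company (
    id_company   INTEGER PRIMARY KEY,
    name_company VARCHAR(200) NOT NULL UNIQUE    -- or whatever makes a company unique
);

CREATE TABLE addresses (
    address    VARCHAR(200) NOT NULL,
    id_company INTEGER NOT NULL REFERENCES company (id_company),
    id_city    INTEGER NOT NULL REFERENCES cities (id_city),
    PRIMARY KEY (id_company, id_city, address)   -- same address never entered twice
);
```

Putting id_company first in the key means the index backing it can usually also serve lookups of all addresses for one company.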
It's okay. You might want to add a non-unique index on id_company so that company address queries are sped up. Another option would be a joining table between Company and Address, but that would probably only be justified if Address stored more data (so searches would be slower).
This design is fine.
A (relational) table always has a (candidate) key. (One of which you can choose as the primary key, but candidate keys, aka keys, are what matter.) Because if no subset of columns smaller than set of all columns is unique then the key is the set of all columns.
Since every table has one, in SQL you should declare it. Eg in SQL if you want to declare a FOREIGN KEY constraint to the key of this table then you have to declare that column set a key via PRIMARY KEY, KEY or UNIQUE. Also, telling the DBMS what you know helps optimize your use of it.
What matters to determining keys are subsets of columns that are unique that don't have smaller subsets that are unique. Those are the keys.
A company, address or city is not unique since you are going to have multiple of each.
A (city,address) is not unique normally.
A (city,company) is not unique normally.
A (company,address) is not unique normally.
So (company,address,city) is the (only) (candidate) key.
Note that if there were only ever one city, then (company,address) would be the key. And if there were only ever one company, then (address,city) would be the key. So your stated reason, "because every address[+city?] of a company [?] will be different", isn't sound unless we're supposed to assume other things.
I'm making this an answer instead of a comment because of length. As to the address table having a defined primary key, the answer is yes. There are several good reasons but just consider this one.
Suppose a company had several addresses and a move required you to delete one of the addresses. You can't just delete where comp_id = x as that would delete all the addresses for that company. You have to have where comp_id = x and something_else where the something else must differentiate the one address from all the others for that company. So you have to have someone look at the different addresses to see how they differ and select the one difference that correctly identifies the one address and then write that correctly into the where clause.
That's a lot of work to do every time you want to delete (or update) an address.
It also means it's more difficult to write a parameterized delete statement that can be used to delete any address. Suppose a company has several locations in the same building: Shipping in Suite 101, Marketing in Suite 202 and IT in (of course) the basement. So the street, city, state, everything is the same, different only in Suite_No or whatever is used to refine the address.
Then consider your user. Most of the time, a user isn't going to be interested in seeing every single address you have listed for a company. He's only interested in Product Testing. You should be able to give them Product Testing's address and no other. Users are not known for their patience when presented with a data dump every time they do a query and it's up to them to select the one they're looking for.
It just solves so many problems to be able to specify where addr_id = x.
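As a small illustration of that difference, here is a sketch using a variant of the addresses table with a surrogate key. The addr_id column comes from this answer; the other column names follow the question, and the types and sample values are invented.

```sql
-- Hypothetical variant of the addresses table with a surrogate key.
CREATE TABLE addresses (
    addr_id    INTEGER PRIMARY KEY,
    id_company INTEGER NOT NULL,
    id_city    INTEGER NOT NULL,
    address    VARCHAR(200) NOT NULL,
    UNIQUE (id_company, id_city, address)
);

INSERT INTO addresses (addr_id, id_company, id_city, address) VALUES
    (1, 42, 7, '1 Main Street, Suite 101'),
    (2, 42, 7, '1 Main Street, Suite 202');

-- Without the surrogate key, every distinguishing column has to be spelled out.
DELETE FROM addresses
WHERE id_company = 42
  AND id_city = 7
  AND address = '1 Main Street, Suite 202';

-- With it, a single predicate identifies the row.
DELETE FROM addresses WHERE addr_id = 1;
```

The second form is also the easier one to turn into a parameterized statement, since it needs only one parameter regardless of how the addresses differ.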
An address is a thing and should have its own table.
An address can exist without a company, therefore it should not have a foreign key to company. Also, what if you start selling to/buying from individuals?
A company can have zero, one, or many addresses.
Two or more companies can have the exact same address. Your assumption is flawed.
Use a junction table:
company -< company_address >- address

sql many to many relationship where do I put type field

If I have the following tables listed below, would I want to put the AddressTypeID in the Address table or the CompanyAddress table?
COMPANY
CompanyID
COMPANYADDRESS
CompanyAddressID
CompanyID
AddressID
ADDRESS
AddressID
ADDRESSTYPE
AddressTypeID
First, stop to think about whether this is really a many-to-many relationship. Will you ever really assign the exact same address record to more than one company? You may be able to simplify your design by eliminating CompanyAddress and adding a CompanyID column directly to Address.
If this truly is a many-to-many relationship, then to answer your original question, keep the AddressTypeID in Address, not in CompanyAddress, since it should be the same type for every company that uses it.
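For the many-to-many case, the placement recommended here could be sketched as follows. Table and key names are taken from the question; the Description column, the data types, and the UNIQUE constraint on the pairing are assumptions added for illustration.

```sql
CREATE TABLE AddressType (
    AddressTypeID INTEGER PRIMARY KEY,
    Description   VARCHAR(50) NOT NULL UNIQUE    -- assumed column
);

CREATE TABLE Company (
    CompanyID INTEGER PRIMARY KEY
);

CREATE TABLE Address (
    AddressID     INTEGER PRIMARY KEY,
    -- The type lives on the address itself, so every company
    -- linked to this address sees the same type.
    AddressTypeID INTEGER NOT NULL REFERENCES AddressType (AddressTypeID)
);

CREATE TABLE CompanyAddress (
    CompanyAddressID INTEGER PRIMARY KEY,
    CompanyID        INTEGER NOT NULL REFERENCES Company (CompanyID),
    AddressID        INTEGER NOT NULL REFERENCES Address (AddressID),
    UNIQUE (CompanyID, AddressID)                -- assumed: no duplicate pairings
);
```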

Database structure for storing historical data

Preface:
I was thinking the other day about a new database structure for a new application and realized that we needed a way to store historical data in an efficient way. I was wanting someone else to take a look and see if there are any problems with this structure. I realize that this method of storing data may very well have been invented before (I am almost certain it has) but I have no idea if it has a name and some google searches that I tried didn't yield anything.
Problem:
Let's say you have a table for orders, and orders are related to a customer table for the customer that placed the order. In a normal database structure you might expect something like this:
orders
------
orderID
customerID
customers
---------
customerID
address
address2
city
state
zip
Pretty straightforward: the orders table has a foreign key, customerID, which is the primary key of the customers table. But if we were to run a report over the orders table, we would join the customers table to the orders table, which brings back the current record for that customer ID. What if the customer's address was different when the order was placed and has subsequently been changed? Now our order no longer reflects the customer's address at the time the order was placed. Basically, by changing the customer record, we just changed all history for that customer.
Now there are several ways around this, one of which would be to copy the record when an order was created. What I have come up with though is what I think would be an easier way to do this that is perhaps a little more elegant, and has the added bonus of logging anytime a change is made.
What if I did a structure like this instead:
orders
------
orderID
customerID
customerHistoryID
customers
---------
customerID
customerHistoryID
customerHistory
--------
customerHistoryID
customerID
address
address2
city
state
zip
updatedBy
updatedOn
Please forgive the formatting, but I think you can see the idea. Basically, any time a customer is changed (insert or update), a new customerHistory row is written with an incremented customerHistoryID, and the customers table is updated with the latest customerHistoryID. The orders table now not only points to the customerID (which allows you to see all revisions of the customer record), but also to the customerHistoryID, which points to a specific revision of the record. Now the order reflects the state of the data at the time the order was created.
By adding an updatedby and updatedon column to the customerHistory table, you can also see an "audit log" of the data, so you could see who made the changes and when.
One potential downside could be deletes, but I am not really worried about that for this need as nothing should ever be deleted. But even still, the same effect could be achieved by using an activeFlag or something like it depending on the domain of the data.
My thought is that all tables would use this structure. Anytime historical data is being retrieved, it would be joined against the history table using the customerHistoryID to show the state of data for that particular order.
Retrieving a list of customers is easy: it just takes a join from the customers table to the customerHistory table on customerHistoryID.
Can anyone see any problems with this approach, either from a design standpoint or for performance reasons? Remember, no matter what I do I need to make sure that the historical data is preserved so that subsequent updates to records do not change history. Is there a better way? Is this a known idea that has a name, or is there any documentation on it?
Thanks for any help.
Update:
This is a very simple example of what I am really going to have. My real application will have "orders" with several foreign keys to other tables. Origin/destination location information, customer information, facility information, user information, etc. It has been suggested a couple of times that I could copy the information into the order record at that point, and I have seen it done this way many times, but this would result in a record with hundreds of columns, which really isn't feasible in this case.
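The versioned structure described in this question could be sketched like this. Data types are assumptions, and the foreign key from customers.customerHistoryID to customerHistory is left undeclared here only to avoid a circular table definition in a short example.

```sql
CREATE TABLE customers (
    customerID        INTEGER PRIMARY KEY,
    customerHistoryID INTEGER               -- points at the current revision
);

CREATE TABLE customerHistory (
    customerHistoryID INTEGER PRIMARY KEY,
    customerID        INTEGER NOT NULL REFERENCES customers (customerID),
    address           VARCHAR(200),
    address2          VARCHAR(200),
    city              VARCHAR(100),
    state             VARCHAR(50),
    zip               VARCHAR(20),
    updatedBy         VARCHAR(50),
    updatedOn         TIMESTAMP
);

CREATE TABLE orders (
    orderID           INTEGER PRIMARY KEY,
    customerID        INTEGER NOT NULL REFERENCES customers (customerID),
    customerHistoryID INTEGER NOT NULL REFERENCES customerHistory (customerHistoryID)
);

-- Customer data as it was when each order was placed.
SELECT o.orderID, h.address, h.city, h.state, h.zip
FROM orders o
JOIN customerHistory h ON h.customerHistoryID = o.customerHistoryID;

-- Current data for every customer.
SELECT c.customerID, h.address, h.city, h.state, h.zip
FROM customers c
JOIN customerHistory h ON h.customerHistoryID = c.customerHistoryID;
```

Both queries have the same shape; the only difference is which table supplies the customerHistoryID, the order (frozen revision) or the customer (latest revision).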
When I've encountered such problems, one alternative is to make the order the history table. It functions the same, but it's a little easier to follow:
orders
------
orderID
customerID
address
City
state
zip
customers
---------
customerID
address
City
state
zip
EDIT: if the number of columns gets too high for your liking, you can separate it out however you like.
If you do go with the other option and use history tables, you should consider using bitemporal data, since you may have to deal with the possibility that historical data needs to be corrected. For example, a customer changes his current address from A to B, but you also have to correct the address on an existing order that is currently being fulfilled.
Also if you are using MS SQL Server you might want to consider using indexed views. That will allow you to trade a small incremental insert/update perf decrease for a large select perf increase. If you're not using MS SQL server you can replicate this using triggers and tables.
When you are designing your data structures, be very careful to store the correct relationships, not something that is similar to the correct relationships. If the address for an order needs to be maintained, then that is because the address is part of the order, not the customer. Likewise, unit prices are part of the order, not the product, and so on.
Try an arrangement like this:
Customer
--------
CustomerId (PK)
Name
AddressId (FK)
PhoneNumber
Email
Order
-----
OrderId (PK)
CustomerId (FK)
ShippingAddressId (FK)
BillingAddressId (FK)
TotalAmount
Address
-------
AddressId (PK)
AddressLine1
AddressLine2
City
Region
Country
PostalCode
OrderLineItem
-------------
OrderId (PK) (FK)
OrderItemSequence (PK)
ProductId (FK)
UnitPrice
Quantity
Product
-------
ProductId (PK)
Price
etc.
If you truly need to store history for something, like tracking changes to an order over time, then you should do that with a log or audit table, not with your transaction tables.
Normally orders simply store the information as it is at the time of the order. This is especially true of things like part numbers, part names and prices, as well as customer address and name. Then you don't have to join to five or six tables to get the information that can be stored in one. This is not denormalization, as you actually need to have the information as it existed at the time of the order. I think having this information in the order and order detail tables (the latter stores the individual items ordered) is also less risky in terms of accidental changes to the data.
Your order table would not have hundreds of columns. You would have an order table and an order detail table due to the one-to-many relationship. The order table would include order number, customer id (so you can search for everything this customer has ever ordered even if the name changed), customer name, customer address (note you don't need city, state, zip etc.; put the address in one field), order date, and possibly a few other fields that relate directly to the order at a top level. Then you have an order detail table that has order number, detail_id, part number, part description (this can be a consolidation of a bunch of fields like size, color etc., or you can separate out the most common), number of items, unit type, price per unit, taxes, total price, ship date, and status. You put in one entry for each item ordered.
If you are genuinely interested in such problems, I can only suggest you take a serious look at "Temporal Data and the Relational Model".
Warning1 : there is no SQL in there and almost anything you think you know about the relational model will be claimed a falsehood. With good reason.
Warning2 : you are expected to think, and think hard.
Warning3 : the book is about what the solution for this particular family of problems ought to look like, but as the introduction says, it is not about any technology available today.
That said, the book is genuine enlightenment. At the very least, it helps to make it clear that the solution for such problems will not be found in SQL as it stands today, or in ORMs as those stand today, for that matter.
What you want is called a data warehouse. Since data warehouses are OLAP and not OLTP, it is recommended to have as many columns as you need in order to achieve your goals. In your case the orders table in the data warehouse will have 11 fields, giving a 'snapshot' of each order as it comes in, regardless of later updates to user accounts.
The Data Warehouse Toolkit, Second Edition (Wiley)
It's a good start.
Our payroll system uses effective dates in many tables. The ADDRESSES table is keyed on EMPLID and EFFDT. This allows us to track every time an employee's address changes. You could use the same logic to track historical addresses for customers. Your queries would simply need to include a clause that compares the order date to the customer address date that was in effect at the time of the order. For example
select o.orderID, c.customerID, c.address, c.city, c.state, c.zip
from orders o, customers c
where c.customerID = o.customerID
and c.effdt = (
select max(c1.effdt) from customers c1
where c1.customerID = c.customerID and c1.effdt <= o.orderdt
)
The objective is to select the most recent row in customers having an effective date that is on or before the date of the order. This same strategy could be used to keep historical information on product prices.
I myself like to keep it simple. I would use two tables: a customer table and a customer history table. If you have the key (e.g. CustomerID) in the history table there is no reason to make a joining table, a select on that key will give you all records.
You also don't have audit information (e.g. date modified, who modified etc) in the history table as you show it, I expect you want this.
So mine would look something like this:
CustomerTable (this contains current customer information)
CustomerID (distinct non null)
...all customer information fields
CustomerHistoryTable
CustomerID (not distinct non null)
...all customer information fields
DateOfChange
WhoChanged
The DateOfChange field is the date on which the customer's data was changed from the values in this record to newer values, which are held either in a more recent history record or in the CustomerTable.
Your orders table just needs a CustomerID. If you need to find the customer information at the time of the order, it is a simple select.
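One way that select could look is sketched below, under several assumptions: the orders table has an OrderDate column (not shown in the answer), a single Address column stands in for "all customer information fields", and the key on the history table is invented. Because a history row holds the values that were valid until its DateOfChange, the row to use is the earliest one changed after the order date; if there is none, the current values still apply.

```sql
CREATE TABLE CustomerTable (
    CustomerID INTEGER PRIMARY KEY,
    Address    VARCHAR(200)                     -- stands in for all customer fields
);

CREATE TABLE CustomerHistoryTable (
    CustomerID   INTEGER NOT NULL REFERENCES CustomerTable (CustomerID),
    Address      VARCHAR(200),
    DateOfChange DATE NOT NULL,
    WhoChanged   VARCHAR(50),
    PRIMARY KEY (CustomerID, DateOfChange)      -- assumed key
);

CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES CustomerTable (CustomerID),
    OrderDate  DATE NOT NULL                    -- assumed column
);

SELECT o.OrderID,
       CASE WHEN h.CustomerID IS NOT NULL
            THEN h.Address                      -- values replaced after the order
            ELSE c.Address                      -- never changed since the order
       END AS AddressAtOrderTime
FROM Orders o
JOIN CustomerTable c ON c.CustomerID = o.CustomerID
LEFT JOIN CustomerHistoryTable h
       ON h.CustomerID = o.CustomerID
      AND h.DateOfChange = (SELECT MIN(h2.DateOfChange)
                            FROM CustomerHistoryTable h2
                            WHERE h2.CustomerID = o.CustomerID
                              AND h2.DateOfChange > o.OrderDate);
```

With plain dates, a change made on the same day as an order is treated as having happened before the order; timestamps would remove that ambiguity.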

How to Set Customer Table with Multiple Phone Numbers? - Relational Database Design

CREATE TABLE Phone
(
phoneID - PK
.
.
.
);
CREATE TABLE PhoneDetail
(
phoneDetailID - PK
phoneID - FK points to Phone
phoneTypeID ...
phoneNumber ...
.
.
.
);
CREATE TABLE Customer
(
customerID - PK
firstName
phoneID - Unique FK points to Phone
.
.
.
);
A customer can have multiple phone numbers e.g. Cell, Work, etc.
phoneID in Customer table is unique and points to PhoneID in Phone table.
If customer record is deleted, phoneID in Phone table should also be deleted.
Do you have any concerns about my design? Is it designed properly? My problem is that phoneID in the Customer table is a child, and if the child record is deleted I cannot delete the parent (Phone) record automatically.
I think you've overdesigned it. I see no use for a separate Phone + PhoneDetail table. Typically there are two practical approaches.
1) Simplicity - Put all of the phones in the Customer record itself. Yes, it breaks normalization rules, but it's very simple in practice and usually works as long as you provide a fixed set (Work, Home, Mobile, Fax, Emergency). The upside is that the code is simple to write and time to implementation is shorter. Retrieving all the phones with a customer record is simple, and so is using a specific type of phone (Customer.Fax).
The downsides : adding additional phone types later is a little more painful, and searching for phone numbers is kludgy. You have to write SQL like "select * from customer where cell = ? or home = ? or work = ? or emergency = ?". Assess your design up front. If either of these issues is a concern, or you don't know if it may be a concern, go with the normalized approach.
2) Extensibility - Go the route you are going. Phone types can be added later, no DDL changes. Customer -> CustomerPhone
Customer (
customerId
)
CustomerPhone (
customerId references Customer(customerId)
phoneType references PhoneTypes(phoneTypeId)
phoneNumber
)
PhoneTypes (
phoneTypeId (H, W, M, F, etc.)
phoneTypeDescription
)
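The normalized layout above could be written out as runnable DDL along these lines. The data types, the composite primary key on CustomerPhone, and the sample phone number are assumptions, not part of the answer.

```sql
CREATE TABLE PhoneTypes (
    phoneTypeId          CHAR(1) PRIMARY KEY,    -- 'H', 'W', 'M', 'F', ...
    phoneTypeDescription VARCHAR(50) NOT NULL
);

CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY
);

CREATE TABLE CustomerPhone (
    customerId  INTEGER NOT NULL REFERENCES Customer (customerId),
    phoneType   CHAR(1) NOT NULL REFERENCES PhoneTypes (phoneTypeId),
    phoneNumber VARCHAR(30) NOT NULL,
    PRIMARY KEY (customerId, phoneType, phoneNumber)   -- assumed key
);

-- A new phone type is just a new row; no DDL change is needed.
INSERT INTO PhoneTypes (phoneTypeId, phoneTypeDescription)
VALUES ('E', 'Emergency');

-- Searching by number needs one predicate, whatever the phone type.
SELECT customerId, phoneType
FROM CustomerPhone
WHERE phoneNumber = '555-0100';
```

Compare the final query with the four-way OR needed in the single-table approach: here the search does not change when phone types are added.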
As mrjoltcola already addressed the normalization, I'll tackle the problem of having a record in phone and no record in phone detail.
If that is your only problem there are three approaches:
1) do not delete from detail table but from phone with CASCADE DELETE - gives a delete from two tables with single SQL statement and keeps data consistent
2) have triggers on the detail table that will delete the parent automatically when last record for a parent is deleted from the child (this will not perform well and will slow down all deletes on the table. and it is ugly. still it is possible to do it)
3) do it in the business logic layer of the application - if this layer is properly separated and if users(applications) will be modifying data only through this layer you might reach desired level of consistency guarantee
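The first approach could be sketched as follows, using the Phone and PhoneDetail tables from the question with assumed data types. Note that some systems need foreign key enforcement switched on first; in SQLite, for example, that is done with PRAGMA foreign_keys = ON.

```sql
CREATE TABLE Phone (
    phoneID INTEGER PRIMARY KEY
);

CREATE TABLE PhoneDetail (
    phoneDetailID INTEGER PRIMARY KEY,
    phoneID       INTEGER NOT NULL
                  REFERENCES Phone (phoneID) ON DELETE CASCADE,
    phoneNumber   VARCHAR(30) NOT NULL
);

INSERT INTO Phone (phoneID) VALUES (1);
INSERT INTO PhoneDetail (phoneDetailID, phoneID, phoneNumber) VALUES
    (10, 1, '555-0100'),
    (11, 1, '555-0101');

-- Deleting the parent row removes its detail rows in the same statement.
DELETE FROM Phone WHERE phoneID = 1;
```

The cascade runs from parent to child only, so the delete has to be issued against Phone; deleting the last PhoneDetail row leaves the Phone row in place, which is the case the trigger approach tries to cover.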