How to invert an incorrectly modeled 1:1 relationship? - sql

I have an incorrectly modeled 1:1 relationship between two tables:
Table: Customer
* id (bigint)
* ...
Table: Address
* id
* customer_id (bigint) <--- FOREIGN KEY
* street (varchar)
* ...
The real-world relationship is such that a customer may have one address or none. However, the current data model makes it possible to assign multiple addresses to a customer. We do not do this at the moment, so the data could be migrated to this:
Table: Customer
* id (bigint)
* address_id (nullable bigint)
* ...
Is it possible to make this migration in one transaction, using purely SQL code? I would like to avoid an intermediate state where we have both relationships and migrate the customers one by one, which is the best idea I have come up with so far.

As I understand it, you want to add an address_id column to the Customer table. If a customer has more than one address, you need to pick just one of them; here I take the latest address (the one with the highest id).
update Customer
set address_id=(select max(id) from Address a where a.customer_id=Customer.id)
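The update above can be wrapped in a single transaction together with the schema change. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS; the sample rows, the `migrate` and `demo` names, and the `name` column are assumptions for illustration only. Whether DDL such as ALTER TABLE can be rolled back inside a transaction depends on the DBMS, so check that for your system. Dropping the old Address.customer_id column is not shown.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Move the link from Address.customer_id to Customer.address_id."""
    conn.execute("BEGIN")
    try:
        # Step 1: add the new nullable column on the customer side.
        conn.execute("ALTER TABLE Customer ADD COLUMN address_id INTEGER")
        # Step 2: copy the link; MAX(id) picks the latest address if several exist.
        conn.execute(
            "UPDATE Customer SET address_id = "
            "(SELECT MAX(a.id) FROM Address a WHERE a.customer_id = Customer.id)"
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def demo() -> list:
    # isolation_level=None gives manual control over BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Address (id INTEGER PRIMARY KEY,
                              customer_id INTEGER REFERENCES Customer(id),
                              street TEXT);
        INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO Address VALUES (10, 1, 'Main St 1');
    """)
    migrate(conn)
    return conn.execute(
        "SELECT id, address_id FROM Customer ORDER BY id"
    ).fetchall()
```

Customers without an address simply end up with a NULL address_id, which matches the nullable column in the target model.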

You can actually leave your data model as is and just add a unique constraint to address:
alter table address add constraint unq_address_customer unique (customer_id);
This is not ideal, but it does enforce the unique constraint with minimal changes to the data model.
That said, I would question why you want only one address per customer. Have you considered these situations?
Customers whose "delivery" address and whose "billing" address are different.
Customers who move.
Customers whose address changes although they do not move, say due to postal code reassignments or street name changes.
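As a rough illustration of the unique-constraint suggestion, here is a small sketch using Python's built-in sqlite3 module. SQLite has no ALTER TABLE ... ADD CONSTRAINT, so a unique index stands in for the constraint here; the sample rows and the function name are made up for the example.

```python
import sqlite3


def second_address_rejected() -> bool:
    """Return True if a second address for the same customer is refused."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE address (id INTEGER PRIMARY KEY,
                              customer_id INTEGER,
                              street TEXT);
        -- Stand-in for: alter table address add constraint ... unique (customer_id)
        CREATE UNIQUE INDEX unq_address_customer ON address (customer_id);
        INSERT INTO address VALUES (1, 100, 'Main St 1');
    """)
    try:
        # Same customer_id as the existing row, so this should violate uniqueness.
        conn.execute("INSERT INTO address VALUES (2, 100, 'Side St 2')")
    except sqlite3.IntegrityError:
        return True
    return False
```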

Related

Is this database in third normal form/3NF?

I know this is probably a stupid question to some but I'm required to have this database in 3NF but know very little about normalisation as our teacher has not covered it. Could someone give me a simple yes or no answer as to whether it is in 3NF and if it is not, suggest any changes. Thanks.
Simple answer: no. Search for "transitive dependencies", or even just "3NF".
Why is this the case? Because you have some columns that are dependent on other columns in the same table, where those columns aren't part of the primary key.
For example, in your Customer table you have Postcode and Town, but there is a relationship between the two, i.e. you couldn't have a Postcode for Paris without also having a Town of Paris. This is a very weak transitive dependency, and most databases would have it without it being considered bad practice, but I think it is enough to break 3NF.
There's another place where it's a little less clear, but I am pretty sure you break 3NF. In your Payment Table you have Deposit Paid, Total Price, Amount Still To Pay, and Fully Paid. There's an argument that given Total Price and Deposit Paid you could determine Amount Still to Pay. There's a very strong argument that you could always determine Fully Paid from the other three "paid" columns.
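One way to avoid storing the derivable payment columns is to compute them in a view. The sketch below uses Python's built-in sqlite3 module and assumes, purely for illustration, that the deposit is the only payment made so far and that amounts are stored as whole numbers; the table, view, and column names are hypothetical.

```python
import sqlite3


def payment_status() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Only the independent facts are stored.
        CREATE TABLE payment (id INTEGER PRIMARY KEY,
                              total_price INTEGER NOT NULL,
                              deposit_paid INTEGER NOT NULL);
        -- The dependent values are derived on demand.
        CREATE VIEW payment_status AS
        SELECT id,
               total_price,
               deposit_paid,
               (total_price - deposit_paid) AS amount_still_to_pay,
               ((total_price - deposit_paid) <= 0) AS fully_paid
        FROM payment;
        INSERT INTO payment VALUES (1, 500, 100), (2, 300, 300);
    """)
    return conn.execute(
        "SELECT id, amount_still_to_pay, fully_paid "
        "FROM payment_status ORDER BY id"
    ).fetchall()
```

SQLite represents the boolean as 0 or 1; other systems may offer a proper boolean type or computed columns instead of a view.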
You can create a Person table with id, title, firstname, lastname.
You can add person_id to customerTable and employeeTable, and remove the title, firstname, lastname fields from those tables.
You can create TownTable with columns id, name and then add town_id to customerTable and employeeTable. Remove the town column from those tables.
Create contactInfoTable with columns id, contact_type_id, contact_info.
Add a contact_info_id column to employeeTable and customerTable. Delete the other contact columns (phoneNo, email) from those tables.
Create a contactType table with columns id, name. Insert two rows into that table, with the names phone and email.
Create a personAddress table with columns id, address, town_id.
Add personAddress_id to customerTable and employeeTable. Remove address and town from those tables.
Create TownTable with columns id, name.
You can create userTable with columns id, employee_id, username.
You can create passwordTable with id and user_id.
Create a user_role table with id, user_id, role_id.
Create role_table with id, name.
Also add create_date, end_date (Date) and active (nvarchar2(1) or integer) to all your tables, and use the condition active=1 in your selects.

What kind of relationship should I create? One-To-One or One-To-Many?

I am using MS-Access database.
I am trying to create a relationship between two tables: the old Customer table, which already has data, and the newly added Coupon table.
My client wants to introduce coupons, where a customer presents a coupon instead of paying cash.
I have bulk-inserted the coupon codes into the Coupon table.
Now I am confused about what kind of relationship I should create between these two tables.
I have to consider the following:
A customer can pay with either cash or a coupon.
If the customer shows a coupon, there will be an entry in the CouponID column as well as in the Cash column (to record the value of that coupon).
The CouponID should be unique in the customer table; a coupon code must not be repeated.
I am confused whether it should be one-to-one or one-to-many.
This image will help you to understand the problem.
I would not include "CouponID" in the customer table at all (nor "Cash" for that matter). The customer table models a customer, the coupon table models a coupon.
You need another table to model the transaction:
[CustomerTransaction]
id
date
customer_id
coupon_id
etc...
Every type of independent "thing" should be modeled by a discrete table, and "things" should be related to each other by other tables that create the 1:N relationships.
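The transaction-table idea could look roughly like the sketch below, written with Python's built-in sqlite3 module rather than MS Access. The `amount` column, the sample rows, and the function names are assumptions added for illustration; the UNIQUE on coupon_id relies on SQLite treating NULLs as distinct, so many cash transactions (NULL coupon) can coexist.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE coupon (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
        CREATE TABLE customer_transaction (
            id INTEGER PRIMARY KEY,
            date TEXT NOT NULL,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            coupon_id INTEGER UNIQUE REFERENCES coupon(id),  -- NULL means cash
            amount INTEGER NOT NULL                          -- hypothetical column
        );
        INSERT INTO customer VALUES (1, 'Ann');
        INSERT INTO coupon VALUES (1, 'SPRING-001');
        INSERT INTO customer_transaction VALUES (1, '2024-01-05', 1, NULL, 50);
        INSERT INTO customer_transaction VALUES (2, '2024-01-09', 1, NULL, 20);
        INSERT INTO customer_transaction VALUES (3, '2024-02-01', 1, 1, 30);
    """)
    return conn


def coupon_reuse_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if redeeming an already used coupon is refused."""
    try:
        conn.execute(
            "INSERT INTO customer_transaction VALUES (4, '2024-02-02', 1, 1, 30)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

With this layout the same customer can come back any number of times, and each coupon can appear in at most one transaction.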
The relationship of customer to coupon is an optional (ie nullable) one-to-one; your data model looks good.
Some other comments:
The table would be better named sale rather than customer, since if the same customer comes back again, there will be a new row (but with the same name)
You could create a unique index on couponID that ignores nulls
You could rename Cash to Amount; the amount is either "cash" or coupon - the couponID column tells you the type of the amount
To create a unique index that ignores nulls:
CREATE UNIQUE INDEX idx1 ON customer (couponID) WITH IGNORE NULL;

PK for a table that does not have unique data

I have tables like these:
Company( #id_company, ... )
addresses( address, *id_company*, *id_city* )
cities( #id_city, name_city, *id_country* )
countries( #id_country, name_country )
What I want to know is:
Is this a good design? (A company can have many addresses.)
The important thing is that, as you may notice, I didn't add a PK to the addresses table, because every address of a company will be different. Am I right?
Also, I will never have a WHERE clause in a SELECT that specifies an address.
First of all we should distinguish natural keys and technical keys. As to natural keys:
A country is uniquely identified by its name.
A city can be uniquely identified by its country and a unique name. For instance there are two Frankfurt in Germany. To make sure what we are talking about we either use the distinct names Frankfurt/Main and Frankfurt/Oder or use the city name with its zip codes range.
A company gets identified by its full name usually. Or use some tax id, code, whatever.
To uniquely identify a company address we would take the company plus country, city and address in the city (street name and number usually).
You've decided to use technical keys. That's okay. But you should still make sure that names are unique. You don't want France and France in your table, it must be there just once. You don't want Frankfurt and Frankfurt without any distinction in your city table for Germany either. And you don't want to have the same address twice entered for one company.
company( #id_company, name_company, ... ) plus a unique constraint on name_company or whatever makes a company unique
countries( #id_country, name_country ) plus a unique constraint on name_country
cities( #id_city, name_city, id_country ) plus a unique constraint on name_city, id_country
addresses( address, id_company, id_city ) with a unique constraint on all three columns
From what you say, it looks like you want the addresses only for lookup. You don't want to use them in any other table, not now and not in the future. Well, then you are done. As you need a unique constraint on all three columns, you could just as well declare this as your primary key, but you don't have to.
Keep in mind, that to reference a company address in any other future table, you would have to store address + id_company + id_city in that table. At that point you would certainly like to have an address id instead. But you can add that when needed. For now you can do without.
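The constraints listed above might be declared along these lines. This is a sketch using Python's built-in sqlite3 module; the column types, the sample rows, and the company name are assumptions, and the three-column key on addresses is declared as the primary key, which the answer describes as optional.

```python
import sqlite3

SCHEMA = """
CREATE TABLE countries (id_country INTEGER PRIMARY KEY,
                        name_country TEXT NOT NULL UNIQUE);
CREATE TABLE cities (id_city INTEGER PRIMARY KEY,
                     name_city TEXT NOT NULL,
                     id_country INTEGER NOT NULL REFERENCES countries(id_country),
                     UNIQUE (name_city, id_country));
CREATE TABLE company (id_company INTEGER PRIMARY KEY,
                      name_company TEXT NOT NULL UNIQUE);
CREATE TABLE addresses (address TEXT NOT NULL,
                        id_company INTEGER NOT NULL REFERENCES company(id_company),
                        id_city INTEGER NOT NULL REFERENCES cities(id_city),
                        PRIMARY KEY (address, id_company, id_city));
"""


def is_duplicate_rejected() -> bool:
    """Return True if entering the same company address twice is refused."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO countries VALUES (1, 'Germany');
        INSERT INTO cities VALUES (1, 'Frankfurt/Main', 1);
        INSERT INTO company VALUES (1, 'Acme');
        INSERT INTO addresses VALUES ('Hauptstr. 1', 1, 1);
    """)
    try:
        conn.execute("INSERT INTO addresses VALUES ('Hauptstr. 1', 1, 1)")
    except sqlite3.IntegrityError:
        return True
    return False
```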
The design is okay. You might want to add a non-unique index on id_company so that queries for a company's addresses are faster. Another option would be a joining table between Company and Address, but that would probably only be justified if Address stored more data (so searches would be slower).
This design is fine.
A (relational) table always has a (candidate) key, because if no subset of columns smaller than the set of all columns is unique, then the key is the set of all columns. (You can choose one of the candidate keys as the primary key, but candidate keys, aka keys, are what matter.)
Since every table has one, in SQL you should declare it. Eg in SQL if you want to declare a FOREIGN KEY constraint to the key of this table then you have to declare that column set a key via PRIMARY KEY, KEY or UNIQUE. Also, telling the DBMS what you know helps optimize your use of it.
What matters to determining keys are subsets of columns that are unique that don't have smaller subsets that are unique. Those are the keys.
A company, address or city is not unique since you are going to have multiple of each.
A (city,address) is not unique normally.
A (city,company) is not unique normally.
A (company,address) is not unique normally.
So (company,address,city) is the (only) (candidate) key.
Note that if there were only ever one city, then (company,address) would be the key. And if there were only ever one company, then (address,city) would be the key. So your stated reason, "because every address[+city?] of a company [?] will be different", isn't sound unless we're supposed to assume other things.
I'm making this an answer instead of a comment because of length. As to the address table having a defined primary key, the answer is yes. There are several good reasons but just consider this one.
Suppose a company had several addresses and a move required you to delete one of the addresses. You can't just delete where comp_id = x as that would delete all the addresses for that company. You have to have where comp_id = x and something_else where the something else must differentiate the one address from all the others for that company. So you have to have someone look at the different addresses to see how they differ and select the one difference that correctly identifies the one address and then write that correctly into the where clause.
That's a lot of work to do every time you want to delete (or update) an address.
It also means it's more difficult to write a parameterized delete statement that can be used to delete any address. Suppose a company has several locations in the same building: Shipping in Suite 101, Marketing in Suite 202 and IT in (of course) the basement. So the street, city, state, everything is the same, different only in Suite_No or whatever is used to refine the address.
Then consider your user. Most of the time, a user isn't going to be interested in seeing every single address you have listed for a company. He's only interested in Product Testing. You should be able to give them Product Testing's address and no other. Users are not known for their patience when presented with a data dump every time they do a query and it's up to them to select the one they're looking for.
It just solves so many problems to be able to specify where addr_id = x.
An address is a thing and should have its own table.
An address can exist without a company, therefore it should not have a foreign key to company. Also, what if you start selling to/buying from individuals?
A company can have zero, one, or many addresses.
Two or more companies can have the exact same address. Your assumption is flawed.
Use a junction table:
company -< company_address >- address
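The junction-table layout can be sketched as follows with Python's built-in sqlite3 module. The column names and the sample companies and address are invented for illustration; the point is only that one address row can be linked to several companies.

```python
import sqlite3


def shared_address_companies() -> list:
    """List the companies linked to one shared address."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE address (id INTEGER PRIMARY KEY,
                              street TEXT NOT NULL,
                              city TEXT NOT NULL);
        -- Junction table: one row per (company, address) pairing.
        CREATE TABLE company_address (
            company_id INTEGER NOT NULL REFERENCES company(id),
            address_id INTEGER NOT NULL REFERENCES address(id),
            PRIMARY KEY (company_id, address_id)
        );
        INSERT INTO company VALUES (1, 'Alpha'), (2, 'Beta');
        INSERT INTO address VALUES (1, '1 Tower Plaza', 'Springfield');
        INSERT INTO company_address VALUES (1, 1), (2, 1);
    """)
    rows = conn.execute("""
        SELECT c.name
        FROM company c
        JOIN company_address ca ON ca.company_id = c.id
        WHERE ca.address_id = 1
        ORDER BY c.name
    """).fetchall()
    return [name for (name,) in rows]
```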

How to Set Customer Table with Multiple Phone Numbers? - Relational Database Design

CREATE TABLE Phone
(
phoneID - PK
.
.
.
);
CREATE TABLE PhoneDetail
(
phoneDetailID - PK
phoneID - FK points to Phone
phoneTypeID ...
phoneNumber ...
.
.
.
);
CREATE TABLE Customer
(
customerID - PK
firstName
phoneID - Unique FK points to Phone
.
.
.
);
A customer can have multiple phone numbers e.g. Cell, Work, etc.
phoneID in Customer table is unique and points to PhoneID in Phone table.
If customer record is deleted, phoneID in Phone table should also be deleted.
Do you have any concerns about my design? Is it designed properly? My problem is
that phoneID in the Customer table is a child, and if the child record is deleted
I cannot delete the parent (Phone) record automatically.
I think you've overdesigned it. I see no use for a separate Phone + PhoneDetail table. Typically there are two practical approaches.
1) Simplicity - Put all of the phones in the Customer record itself. Yes, it breaks normalization rules, but it's very simple in practice and usually works as long as you provide (Work, Home, Mobile, Fax, Emergency). The upside is that the code is simple to write and time to implementation is shorter. Retrieving all the phones with a customer record is simple, and so is using a specific type of phone (Customer.Fax).
The downsides : adding additional phone types later is a little more painful, and searching for phone numbers is kludgy. You have to write SQL like "select * from customer where cell = ? or home = ? or work = ? or emergency = ?". Assess your design up front. If either of these issues is a concern, or you don't know if it may be a concern, go with the normalized approach.
2) Extensibility - Go the route you are going. Phone types can be added later, no DDL changes. Customer -> CustomerPhone
Customer (
customerId
)
CustomerPhone (
customerId references Customer(customerId)
phoneType references PhoneTypes(phoneTypeId)
phoneNumber
)
PhoneTypes (
phoneTypeId (H, W, M, F, etc.)
phoneTypeDescription
)
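To make the normalized layout concrete, here is a sketch using Python's built-in sqlite3 module. The column types, the sample data, and the primary key on (customerId, phoneType), which assumes at most one number per type and customer, are illustrative choices and not part of the outline above.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, firstName TEXT);
        CREATE TABLE PhoneTypes (phoneTypeId TEXT PRIMARY KEY,
                                 phoneTypeDescription TEXT NOT NULL);
        CREATE TABLE CustomerPhone (
            customerId INTEGER NOT NULL REFERENCES Customer(customerId),
            phoneType TEXT NOT NULL REFERENCES PhoneTypes(phoneTypeId),
            phoneNumber TEXT NOT NULL,
            PRIMARY KEY (customerId, phoneType)  -- assumption: one number per type
        );
        INSERT INTO Customer VALUES (1, 'Ann');
        INSERT INTO PhoneTypes VALUES ('H', 'Home'), ('M', 'Mobile');
        INSERT INTO CustomerPhone VALUES (1, 'H', '555-0100'), (1, 'M', '555-0101');
    """)
    return conn


def find_customers_by_phone(conn: sqlite3.Connection, number: str) -> list:
    """Search every phone type with a single predicate."""
    return conn.execute(
        "SELECT customerId, phoneType FROM CustomerPhone "
        "WHERE phoneNumber = ? ORDER BY customerId",
        (number,),
    ).fetchall()
```

Compared with the one-column-per-phone design, the search needs only one condition, and a new phone type is just a new row in PhoneTypes.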
As mrjoltcola already addressed the normalization, I'll tackle the problem of having a record in phone and no record in phone detail.
If that is your only problem there are three approaches:
1) do not delete from detail table but from phone with CASCADE DELETE - gives a delete from two tables with single SQL statement and keeps data consistent
2) have triggers on the detail table that delete the parent automatically when the last child record for that parent is deleted (this will not perform well, it will slow down all deletes on the table, and it is ugly, but it is possible)
3) do it in the business logic layer of the application - if this layer is properly separated and users (applications) modify data only through this layer, you might reach the desired level of consistency guarantee
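The first approach (delete from the parent and let the delete cascade) can be sketched like this with Python's built-in sqlite3 module. The table layout follows the question's Phone and PhoneDetail tables in simplified form; the sample rows and the function name are made up, and the exact cascade syntax may differ between database systems.

```python
import sqlite3


def remaining_details_after_delete() -> int:
    """Delete one Phone row and count the PhoneDetail rows that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Phone (phoneID INTEGER PRIMARY KEY);
        CREATE TABLE PhoneDetail (
            phoneDetailID INTEGER PRIMARY KEY,
            phoneID INTEGER NOT NULL
                REFERENCES Phone(phoneID) ON DELETE CASCADE,
            phoneNumber TEXT NOT NULL
        );
        INSERT INTO Phone VALUES (1);
        INSERT INTO PhoneDetail VALUES (1, 1, '555-0100'), (2, 1, '555-0101');
    """)
    # A single statement removes the parent and, via the cascade, its details.
    conn.execute("DELETE FROM Phone WHERE phoneID = 1")
    return conn.execute("SELECT COUNT(*) FROM PhoneDetail").fetchone()[0]
```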

How to enforce DB integrity with non-unique foreign keys?

I want to have a database table that keeps data with revision history (like pages on Wikipedia). I thought that a good idea would be to have two columns that identify the row: (name, version). So a sample table would look like this:
TABLE PERSONS:
id: int,
name: varchar(30),
version: int,
... // some data assigned to that person.
So if users want to update person's data, they don't make an UPDATE -- instead, they create a new PERSONS row with the same name but different version value. Data shown to the user (for given name) is the one with highest version.
I have a second table, say, DOGS, that references persons in PERSONS table:
TABLE DOGS:
id: int,
name: varchar(30),
owner_name: varchar(30),
...
Obviously, owner_name is a reference to PERSONS.name, but I cannot declare it as a Foreign Key (in MS SQL Server), because PERSONS.name is not unique!
Question: How, then, in MS SQL Server 2008, should I ensure database integrity (i.e., that for each DOG, there exists at least one row in PERSONS such that its PERSON.name == DOG.owner_name)?
I'm looking for the most elegant solution -- I know I could use triggers on PERSONS table, but this is not as declarative and elegant as I want it to be. Any ideas?
Additional Information
The design above has the following advantage that if I need to, I can "remember" a person's current id (or (name, version) pair) and I'm sure that data in that row will never be changed. This is important e.g. if I put this person's data as part of a document that is then printed and in 5 years someone might want to print a copy of it exactly unchanged (e.g. with the same data as today), then this will be very easy for them to do.
Maybe you can think of a completely different design that achieves the same purpose and its integrity can be enforced easier (preferably with foreign keys or other constraints)?
Edit: Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, which I posted as answers. Please vote which one you like better.
In your parent table, create a unique constraint on (id, version). Add version column to your child table, and use a check constraint to make sure that it is always 0. Use a FK constraint to map (parentid, version) to your parent table.
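A possible reading of this suggestion, adapted to the question's (name, version) pair instead of (id, version), is sketched below with Python's built-in sqlite3 module rather than SQL Server. It assumes every person has a version 0 row that is never deleted; the `owner_version` column name and the sample rows are hypothetical.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE persons (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            version INTEGER NOT NULL,
            UNIQUE (name, version)          -- target of the composite foreign key
        );
        CREATE TABLE dogs (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            owner_name TEXT NOT NULL,
            -- Pinned to 0 so the pair always points at the person's first version.
            owner_version INTEGER NOT NULL DEFAULT 0 CHECK (owner_version = 0),
            FOREIGN KEY (owner_name, owner_version)
                REFERENCES persons (name, version)
        );
        INSERT INTO persons (name, version) VALUES ('alice', 0), ('alice', 1);
        INSERT INTO dogs (name, owner_name) VALUES ('Rex', 'alice');
    """)
    return conn


def orphan_dog_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if a dog whose owner has no persons row is refused."""
    try:
        conn.execute("INSERT INTO dogs (name, owner_name) VALUES ('Fido', 'bob')")
    except sqlite3.IntegrityError:
        return True
    return False
```

The foreign key guarantees that at least one persons row exists for each owner name, while newer versions can still be added freely.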
Alternatively you could maintain a person history table for the data that has historic value. This way you keep your Persons and Dogs table tidy and the references simple but also have access to the historically interesting information.
Okay, first thing is that you need to normalize your tables. Google "database normalization" and you'll come up with plenty of reading. The PERSONS table, in particular, needs attention.
Second thing is that when you're creating foreign key references, 99.999% of the time you want to reference an ID (numeric) value. I.e., [DOGS].[owner] should be a reference to [PERSONS].[id].
Edit: Adding an example schema (forgive the loose syntax). I'm assuming each dog has only a single owner. This is one way to implement Person history. All columns are not-null.
Persons Table:
int Id
varchar(30) name
...
PersonHistory Table:
int Id
int PersonId (foreign key to Persons.Id)
int Version (auto-increment)
varchar(30) name
...
Dogs Table:
int Id
int OwnerId (foreign key to Persons.Id)
varchar(30) name
...
The latest version of the data would be stored in the Persons table directly, with older data stored in the PersonHistory table.
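An update under this schema would archive the current row before changing it. The sketch below uses Python's built-in sqlite3 module; the `rename_person` helper, the sample rows, and the per-person version numbering (computed with MAX rather than a true auto-increment) are assumptions made for illustration.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Persons (Id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE PersonHistory (
            Id INTEGER PRIMARY KEY,
            PersonId INTEGER NOT NULL REFERENCES Persons(Id),
            Version INTEGER NOT NULL,
            name TEXT NOT NULL
        );
        CREATE TABLE Dogs (Id INTEGER PRIMARY KEY,
                           OwnerId INTEGER NOT NULL REFERENCES Persons(Id),
                           name TEXT NOT NULL);
        INSERT INTO Persons VALUES (1, 'Ann Smith');
        INSERT INTO Dogs VALUES (1, 1, 'Rex');
    """)
    return conn


def rename_person(conn: sqlite3.Connection, person_id: int, new_name: str) -> None:
    """Copy the current row into PersonHistory, then update Persons."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "INSERT INTO PersonHistory (PersonId, Version, name) "
            "SELECT Id, "
            "       (SELECT COALESCE(MAX(Version), 0) + 1 "
            "        FROM PersonHistory WHERE PersonId = ?), "
            "       name "
            "FROM Persons WHERE Id = ?",
            (person_id, person_id),
        )
        conn.execute(
            "UPDATE Persons SET name = ? WHERE Id = ?", (new_name, person_id)
        )
```

Dogs keep pointing at Persons.Id throughout, so the foreign key stays a plain single-column reference.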
I would use an association table to link the many versions to the one pk.
A project I have worked on addressed a similar problem. It was a biological records database where species names can change over time as new research improved understanding of taxonomy.
However old records needed to remain related to the original species names. It got complicated but the basic solution was to have a NAME table that just contained all unique species names, a species table that represented actual species and a NAME_VERSION table that linked the two together. At any one time there would be a preferred name (ie the currently accepted scientific name for the species) which was a boolean field held in name_version.
In your example this would translate to a Details table (detailsid, otherdetails columns) a link table called DetailsVersion (detailsid, personid) and a Person Table (personid, non-changing data). Relate dogs to Person.
Persons
id (int),
name,
.....
activeVersion (this will be UID from personVersionInfo)
note: Above table will have 1 row for each person. will have original info with which person was created.
PersonVersionInfo
UID (unique identifier to identify person + version),
id (int),
name,
.....
versionId (this will be generated for each person)
Dogs
DogID,
DogName
......
PersonsWithDogs
UID,
DogID
EDIT: You will have to join PersonsWithDogs, PersonVersionInfo, Dogs to get the full picture (as of today). This kind of structure will help you link a Dog to the Owner (with a specific version).
In case the Person's info changes and you wish to have the latest info associated with the Dog, you will have to update the PersonsWithDogs table to have the required UID (of the person) for the given Dog.
You can have restrictions such as DogID should be unique in PersonsWithDogs.
And in this structure, a UID (person) can have many Dogs.
Your scenarios (what can change/restrictions etc) will help in designing the schema better.
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the first of them. Please vote which one you like better.
Solution 1
In PERSONS table, we leave only the name (unique identifier) and a link to current person's data:
TABLE PERSONS:
name: varchar(30),
current_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGE: for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
DISADVANTAGE: if you want to change a person's data, you have to:
add a new PERSONS_DATA row
update PERSONS entry for this person to point to the new PERSONS_DATA row.
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the second of them. Please vote which one you like better.
Solution 2
In PERSONS table, we leave only the name (unique identifier) and a link to the first (not current!) person's data:
TABLE PERSONS:
name: varchar(30),
first_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
name: varchar(30)
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGES:
for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
if I want to change a person's data, I don't have to update the PERSONS row, only add a new PERSONS_DATA row
DISADVANTAGE: to retrieve current person's data, I have to either:
choose PERSONS_DATA with given name and highest version (may be expensive)
choose PERSONS_DATA with special version, e.g. "-1", but then I would have to update two PERSONS_DATA rows each time I add new PERSONS_DATA, and in this solution I wanted to avoid having to update 2 rows...
What do you think?
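For the "highest version" lookup in Solution 2, a query along the following lines is one option. It is sketched with Python's built-in sqlite3 module, so it uses LIMIT 1 where SQL Server would use TOP 1; the `address` column, the sample rows, and the function names are assumptions. A unique index on (name, version) typically lets the DBMS find the highest version without scanning every row, though that is worth verifying on your own system.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PERSONS_DATA (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            version INTEGER NOT NULL,
            address TEXT,                   -- hypothetical data column
            UNIQUE (name, version)          -- also serves as the lookup index
        );
        INSERT INTO PERSONS_DATA (name, version, address) VALUES
            ('alice', 1, 'Old Rd 1'),
            ('alice', 2, 'New Rd 2'),
            ('bob', 1, 'Elm St 3');
    """)
    return conn


def current_data(conn: sqlite3.Connection, name: str):
    """Return the row with the highest version for the given name, or None."""
    return conn.execute(
        "SELECT id, version, address FROM PERSONS_DATA "
        "WHERE name = ? ORDER BY version DESC LIMIT 1",
        (name,),
    ).fetchone()
```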