What would be the best schema to store the 'address' for different entities?

Suppose we're making a system where we have to store the address for buildings,
persons, cars, etc.
The address 'format' should be something like:
State (From a State list)
County (From a County List)
Street (free text, like '5th Avenue')
Number (free text, like 'Chrysler Building, Floor 10, Office No. 10')
(Yes I don't live in U.S.A)
What would be the best way to store that info:
Should I have a Person_Address, Car_Address, ...
Or should the address info be in columns on each entity?
Could we have just one address table and try to link each row to a different entity?
Or is there another 'better' way to handle this type of scenario?
How would you do it?

I have seen scenarios where the Address is stored in an Address table and then there are many-to-many link tables which store links to addresses from People - there is a separate table for each so that foreign keys can be enforced. Sometimes the link table stores information about the relationship, like primary, ship-to, etc.
I've also seen it where the address is stored in the row of a customer. This results in effectively arrays of addresses for bill-to, ship-to, etc, and it's fine that way. Having dealt with both, I think I prefer having them in their own entities, it allows you to keep history of old inactive addresses pretty easily.
We've used this same technique for phone numbers, where people need to store varying numbers of phone numbers.
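As a rough illustration of the first approach (one Address table plus one link table per entity type, so that every foreign key can be enforced), here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table and column names (person_address, car_address, role) are made up for the example, and the SQL is SQLite's dialect rather than any particular server's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE address (
    id     INTEGER PRIMARY KEY,
    state  TEXT NOT NULL,
    county TEXT NOT NULL,
    street TEXT NOT NULL,
    number TEXT NOT NULL
);
CREATE TABLE person (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE car    (id INTEGER PRIMARY KEY, plate TEXT NOT NULL);

-- One link table per entity type, so each foreign key can be enforced.
CREATE TABLE person_address (
    person_id  INTEGER NOT NULL REFERENCES person(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    role       TEXT    NOT NULL,  -- hypothetical values: 'primary', 'ship-to'
    PRIMARY KEY (person_id, address_id, role)
);
CREATE TABLE car_address (
    car_id     INTEGER NOT NULL REFERENCES car(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    role       TEXT    NOT NULL,
    PRIMARY KEY (car_id, address_id, role)
);

INSERT INTO address VALUES (1, 'NY', 'New York', '5th Avenue', 'Floor 10, Office No. 10');
INSERT INTO person  VALUES (1, 'Alice');
INSERT INTO car     VALUES (1, 'ABC-123');
INSERT INTO person_address VALUES (1, 1, 'primary');
INSERT INTO car_address    VALUES (1, 1, 'garage');
""")

# A link to an address that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO person_address VALUES (1, 99, 'ship-to')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# People and cars that share an address.
shared = conn.execute("""
    SELECT p.name, c.plate
    FROM person_address pa
    JOIN car_address ca ON ca.address_id = pa.address_id
    JOIN person p ON p.id = pa.person_id
    JOIN car c ON c.id = ca.car_id
""").fetchall()
```

The price of this layout is one extra table per entity type; what you get in return is that the database itself can refuse dangling links.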

I would highly recommend reading 'Data Model Patterns - Conventions of Thought' by David C. Hay. This issue is discussed in depth by the author.
What you have in your design are two broad entities.
Address of a geographical location
A person/object that resides/belongs to the address
In general, it is not good practice to combine the address with a person's or object's details in the same table, as below:
Person(personID, name, gender, addressline1, addressline2)
You could have the following entities in your design
Address(number, street, countyID,stateID)
Party(PartyID, Type)
Person(PersonID, name, dob, gender,...,primaryPartyID)
Car(carID, make, model, ...,primaryPartyID)
The Party is a link between a person/car and an address. The primaryPartyID columns in the Person and Car tables are foreign keys to the Party table. This way, you can share an address between a car and a person. In the event you want to store multiple addresses for each person, you could add a separate m:n table between Person and Party. The Type attribute for Party can take the following values: 'Person', 'Vehicle', etc.

I'd say to have an AddressType field that is a lookup from a Drop-Down list

Related

Database Design for Contacts

So I have 3 main entities. Airports, Customers, and Vendor.
Each of these will have multiple contacts I need to relate to each.
So the way I have it set up currently, I have the following tables:
Airport
Customer
Vendor
I then have one Contacts table and a xref for Airport, Customer, Vendor...
I am questioning that and was thinking a contacts table for each ..
Airport and AirportContacts
Customer and CustomerContacts
Vendor and VendorContacts
Any drawbacks to either of these designs?
To me, the deciding factor is duplication of entities vs "one version of the truth". If a single real-world person can be a contact for more than one of the other entities, then you don't want to store that single person in multiple contact tables, because then you have to maintain any changes to his/her properties in multiple places.
If you put the same "Joe Smith" in both AirportContacts and VendorContacts, then one day when you look and see his city is "Denver" in one table and "Boston" in another table, which one will you consider to be the truth?
But as someone mentioned in comments, if a contact can only be associated with one of the three other entities ("types" as you call them), then putting them in separate tables makes the most sense.
And there's yet a third scenario. Say "Joe Smith" can be a contact for both Airports and Vendors. But say that he has some properties, like his gender and age, which are the same regardless of which "type" he is being considered, but there might be some properties, like phone number, or position/job title, which could depend on the "type". Maybe he uses one phone in his capacity as an Airport contact, and a different phone as a Vendor contact. Moreover, maybe there are some properties that apply to one type of contact that don't apply to the others. In these cases, I would look at some kind of hybrid approach where you keep common properties in a single Contact table, and "Type"-specific properties in their own type-related tables. These type-related tables would be bridge tables that have FKs back to the Contact table and to the main entity table of the "Type" they are related to (Vendor, Customer or Airport).
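A minimal sketch of that hybrid approach, assuming invented table names (contact, airport_contact, vendor_contact) and invented sample data, and using Python's built-in sqlite3 module so it can be run as-is. Common properties live once in contact; the phone and job title live on the bridge rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Properties that are the same whatever role the contact plays.
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gender TEXT);
CREATE TABLE airport (id INTEGER PRIMARY KEY, code TEXT NOT NULL);
CREATE TABLE vendor  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Bridge tables carry the properties that depend on the role.
CREATE TABLE airport_contact (
    airport_id INTEGER NOT NULL REFERENCES airport(id),
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    phone      TEXT,
    job_title  TEXT,
    PRIMARY KEY (airport_id, contact_id)
);
CREATE TABLE vendor_contact (
    vendor_id  INTEGER NOT NULL REFERENCES vendor(id),
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    phone      TEXT,
    job_title  TEXT,
    PRIMARY KEY (vendor_id, contact_id)
);

INSERT INTO contact VALUES (1, 'Joe Smith', 'M');
INSERT INTO airport VALUES (1, 'DEN');
INSERT INTO vendor  VALUES (1, 'Acme Fuel');
INSERT INTO airport_contact VALUES (1, 1, '555-0100', 'Ramp Manager');
INSERT INTO vendor_contact  VALUES (1, 1, '555-0199', 'Account Rep');
""")

# Joe exists once, but has a different phone in each capacity.
contact_count = conn.execute("SELECT COUNT(*) FROM contact").fetchone()[0]
phones = conn.execute("""
    SELECT 'airport' AS kind, phone FROM airport_contact WHERE contact_id = 1
    UNION ALL
    SELECT 'vendor', phone FROM vendor_contact WHERE contact_id = 1
    ORDER BY kind
""").fetchall()
```

If Joe's name changes, there is exactly one row to update, and neither phone number is affected.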
What I have so far... Don't mind some of the data types; I just placed quick placeholders.

composite key lookup in MS access

So I am working on a database for my company, and I have the following tables:
Countries, States, Cities, Vendors
The Cities table has an autonumbered ID, text Name, State (based on an ID from the states table), and a country (based on an ID from the countries table). The reason it has both a state AND a country is that some cities don't have a state (e.g. Dubai, Mumbai, etc.), so they are related to the states table by a state called "No State". Because of this, the states table does not relate directly to the countries table, but can be related through the cities table.
Every vendor should belong to a city/state/country, related by that cities ID. I want to create a composite ID with the CityID, CityState and CityCountry. Then, within my vendor table, I want to have a lookup that puts a dropdown box of all unique city names to select, then once selected, ONLY states with that city should be allowed in the vendor state box, then ONLY countries with the given state and city would be allowed in the country box. I want the user to see the name, but each table, including the vendor table, is actually referring to an ID. Is this possible? Is there a better way to structure the data to avoid this?
The company is international, and I want to be able to analyze our company's vendors at city, state, country and regional level (region to country relationship is pretty easy, so I left that out).
You can create a drop-down for your columns in Access. Search for something like 'how to create lookup in Access'. This is very common in Access and is called a 'lookup'.
As for the filtering you are trying to accomplish, how is your data configured? Do you have any relationship information for city to state or state to country?
There are data sets of cities, states and countries that may help you build these relationships (though I have no experience with these data sets...try your Google Fu to find some options). It sounds pretty easy with the state to country relationship, but when you add cities, the list is pretty big. And many city names show up under many different state names.
I suspect that a dynamic filtering system like what I understand you to be asking for may be complex beyond the needs of the project.
You may want to explain your requirements/objectives a little more to give me an opportunity to give you a better answer.

database optimization issue

I have a table Students which contains the following columns:
Id, FirstName, LastName, Adress.
The column Adress will contain just the street address.
The question is: would it be better for database optimization to move the column Adress into a separate table?
Yes. If you separate it into another table, you can have more than one address per person. If you separate it into two different tables, an Address table and a StudentAddress table to map the two together, you can make sure that a single address is shared between people or even track a history of addresses for one person. Further, in a separate table you can break the address down into columns so that you can easily search by City or Province or Country.
You can't do any of that putting an Address into a single column with the Student table.
It depends on how you are going to treat that Address. If you need to treat it as a separate entity, e.g. link a single address to several Students or vice versa, then you should normalize.
If the address is only an attribute of the Student entity, then leave it as is.
For full, proper data structures to manage addresses, see The Data Model Resource Book, Volume 1. It is a LOT more complicated to get right than you think.

SQL Database Design Best Practice (Addresses)

Of course I realize that there's no one "right way" to design a SQL database, but I wanted to get some opinions on what is better or worse in my particular scenario.
Currently, I'm designing an order entry module (Windows .NET 4.0 application with SQL Server 2008) and I'm torn between two design decisions when it comes to data that can be applied in more than one spot. In this question I'll refer specifically to Addresses.
Addresses can be used by a variety of objects (orders, customers, employees, shipments, etc..) and they almost always contain the same data (Address1/2/3, City, State, Postal Code, Country, etc). I was originally going to include each of these fields as a column in each of the related tables (e.g. Orders will contain Address1/2/3, City, State, etc.. and Customers will also contain this same column layout). But a part of me wants to apply DRY/Normalization principles to this scenario, i.e. have a table called "Addresses" which is referenced via Foreign Key in the appropriate table.
CREATE TABLE DB.dbo.Addresses
(
    Id         INT          NOT NULL IDENTITY(1, 1) PRIMARY KEY CHECK (Id > 0),
    Address1   VARCHAR(120) NOT NULL,
    Address2   VARCHAR(120),
    Address3   VARCHAR(120),
    City       VARCHAR(100) NOT NULL,
    State      CHAR(2)      NOT NULL,
    Country    CHAR(2)      NOT NULL,
    PostalCode VARCHAR(16)  NOT NULL
);
CREATE TABLE DB.dbo.Orders
(
    Id      INT NOT NULL IDENTITY(1000, 1) PRIMARY KEY
            CHECK (Id >= 1000), -- >= rather than >, so the first identity value (1000) passes
    Address INT NOT NULL
            CONSTRAINT fk_Orders_Address FOREIGN KEY REFERENCES Addresses(Id)
            CHECK (Address > 0),
    -- other columns....
);
CREATE TABLE DB.dbo.Customers
(
    Id      INT NOT NULL IDENTITY(1000, 1) PRIMARY KEY
            CHECK (Id >= 1000), -- >= rather than >, so the first identity value (1000) passes
    Address INT NOT NULL
            CONSTRAINT fk_Customers_Address FOREIGN KEY REFERENCES Addresses(Id)
            CHECK (Address > 0),
    -- other columns....
);
From a design standpoint I like this approach because it creates a standard address format that is easily changeable, i.e. if I ever needed to add Address4 I would just add it in one place rather than to every table. However, I can see the number of JOINs required to build queries might get a little insane.
I guess I'm just wondering if any enterprise-level SQL architects out there have ever used this approach successfully, or if the number of JOINs that this creates would create a performance issue?
You're on the right track by breaking address out into its own table. I'd add a couple of additional suggestions.
Consider taking the Address FK columns out of the Customers/Orders tables and creating junction tables instead. In other words, treat Customers/Addresses and Orders/Addresses as many-to-many relationships in your design now so you can easily support multiple addresses in the future. Yes, this means introducing more tables and joins, but the flexibility you gain is well worth the effort.
Consider creating lookup tables for city, state and country entities. The city/state/country columns of the address table then consist of FKs pointing to these lookup tables. This allows you to guarantee consistent spellings across all addresses and gives you a place to store additional metadata (e.g., city population) if needed in the future.
I just have some cautions. For each of these, there's more than one way to fix the problem.
First, normalization doesn't mean "replace text with an id number".
Second, you don't have a key. I know, you have a column declared "PRIMARY KEY", but that's not enough.
insert into Addresses
(Address1, Address2, Address3, City, State, Country, PostalCode)
values
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500');
select * from Addresses;
1;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
2;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
3;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
4;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
In the absence of any other constraints, your "primary key" identifies a row; it doesn't identify an address. Identifying a row is usually not good enough.
Third, "Address1", "Address2", and "Address3" aren't attributes of addresses. They're attributes of mailing labels. (Lines on a mailing label.) That distinction might not be important to you. It's really important to me.
Fourth, addresses have a lifetime. Between birth and death, they sometimes change. They change when streets get re-routed, buildings get divided, buildings get undivided, and sometimes (I'm pretty sure) when a city employee has a pint too many. Natural disasters can eliminate whole communities. Sometimes buildings get renumbered. In our database, which is tiny compared to most, about 1% per year change like that.
When an address dies, you have to do two things.
Make sure nobody uses that address to mail, ship, or whatever.
Make sure its death doesn't affect historical data.
When an address itself changes, you have to do two things.
Some data must reflect that change. Make sure it does.
Some data must not reflect that change. Make sure it doesn't.
Fifth, DRY doesn't apply to foreign keys. Their whole purpose is to be repeated. The only question is how wide a key? An id number is narrow, but requires a join. (10 id numbers might require 10 joins.) An address is wide, but requires no joins. (I'm talking here about a proper address, not a mailing label.)
That's all I can think of off the top of my head.
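The second caution above (a surrogate id identifies a row, not an address) can be made concrete with a small sketch. One of the "more than one way" fixes is a UNIQUE constraint over the columns that actually describe the address. The snippet below uses Python's built-in sqlite3 module rather than SQL Server, and it stores missing lines as empty strings instead of NULL, because SQLite treats NULLs as distinct inside a UNIQUE constraint; both choices are assumptions of the sketch, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE addresses (
    id          INTEGER PRIMARY KEY,
    address1    TEXT NOT NULL,
    address2    TEXT NOT NULL DEFAULT '',
    address3    TEXT NOT NULL DEFAULT '',
    city        TEXT NOT NULL,
    state       TEXT NOT NULL,
    country     TEXT NOT NULL,
    postal_code TEXT NOT NULL,
    -- The real key: the attributes that identify the address itself.
    UNIQUE (address1, address2, address3, city, state, country, postal_code)
)""")

insert_sql = """
INSERT INTO addresses
    (address1, address2, address3, city, state, country, postal_code)
VALUES (?, ?, ?, ?, ?, ?, ?)"""
row = ("President Obama", "1600 Pennsylvania Avenue NW", "",
       "Washington", "DC", "US", "20500")

conn.execute(insert_sql, row)
try:
    conn.execute(insert_sql, row)  # exact duplicate of the first row
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
```

This only stops exact duplicates. Spelling variants ("Ave" vs "Avenue") still slip through, so it is a floor, not a complete answer.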
I think there is a problem you are not aware of: some of this data is time-sensitive. You do not want your records to show that you shipped an order to 35 State St, Chicago, IL when you actually sent it to 10 King Street, Martinsburg, WV, just because the customer moved two years after the order was shipped. So yes, build an address table, but make sure it gives you the address as of that moment in time: any change to the address for someone like a customer should result in a new address ID, not an update to the current address row, which would break the history on an order.
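Here is a small sketch of that rule, assuming a simplified schema (a single line column and a hypothetical move_customer helper) and Python's built-in sqlite3 module. A move inserts a new address row and repoints the customer; the old row is never updated, so the order keeps its history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address  (id INTEGER PRIMARY KEY, line TEXT NOT NULL);
CREATE TABLE customer (
    id                 INTEGER PRIMARY KEY,
    current_address_id INTEGER NOT NULL REFERENCES address(id)
);
CREATE TABLE orders (
    id              INTEGER PRIMARY KEY,
    customer_id     INTEGER NOT NULL REFERENCES customer(id),
    ship_address_id INTEGER NOT NULL REFERENCES address(id)
);
INSERT INTO address  VALUES (1, '10 King Street, Martinsburg, WV');
INSERT INTO customer VALUES (1, 1);
INSERT INTO orders   VALUES (100, 1, 1);
""")

def move_customer(conn, customer_id, new_line):
    """Record a move as a NEW address row; existing rows stay untouched."""
    cur = conn.execute("INSERT INTO address (line) VALUES (?)", (new_line,))
    conn.execute(
        "UPDATE customer SET current_address_id = ? WHERE id = ?",
        (cur.lastrowid, customer_id),
    )
    return cur.lastrowid

new_address_id = move_customer(conn, 1, "35 State St, Chicago, IL")

shipped_to = conn.execute("""
    SELECT a.line FROM orders o
    JOIN address a ON a.id = o.ship_address_id
    WHERE o.id = 100
""").fetchone()[0]
current = conn.execute("""
    SELECT a.line FROM customer c
    JOIN address a ON a.id = c.current_address_id
    WHERE c.id = 1
""").fetchone()[0]
```

The order still reports the Martinsburg address after the move, while the customer's current address is the Chicago one.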
You would want the addresses to be in a separate table only if they were entities in their own right. Entities have identity (meaning it matters if two objects pointed to the same address or to different ones), and they have their own lifecycle apart from other entities. If this was the case with your domain, I think it would be totally apparent and you wouldn't have a need to ask this question.
Cade's answer explains the mutability of addresses: something like a shipping address is part of an order and shouldn't be able to change out from under the order it belongs to. This shows that the shipping address doesn't have its own lifecycle. Handling it as if it were a separate entity can only lead to more opportunities for error.
"Normalization" specifically refers to removing redundancies from data so you don't have the same item represented in different places. Here the only redundancy is in the DDL, it's not in the data, so "normalization" is not relevant here. (JPA has the concept of embedded classes that can address the redundancy).
TLDR: Use a separate table if the address is truly an Entity, with its own distinct identity and its own lifecycle. Otherwise don't.
What you have to answer for yourself is the question whether the same address in everyday language is actually the same address in your database. If somebody "changes his address" (colloquially), he really links himself to another address. The address per se only changes when a street is renamed, a zip-code reform takes place or a nuke hits. And those are rare events (hopefully for the most part). There goes your main profit: change in one place for multiple rows (of multiple tables).
If you should actually change an address for that in your model - in the sense of an UPDATE on table address - that may or may not work for other rows that link to it. Also, in my experience, even the exact same address has to look different for different purposes. Understand the semantic differences and you will arrive at the right model that represents your real world best.
I have a number of databases where I use a common table of streets (which uses a table of cities (which uses a table of countries, ...)). In combination with a street number think of it as geocodes (lat/lon), not "street names". Addresses are not shared among different tables (or rows). Changes to street names and zip codes cascade, other changes don't.
You would normally normalise the data as far as possible, so use the table 'Addresses'.
You can use views to de-normalise the data afterwards which use indexes and should give a method to access data with easy references, whilst leaving the underlying structure normalised fully.
The number of joins shouldn't be a major issue, index based joins aren't too much of an overhead.
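To show what "normalise, then de-normalise with a view" might look like, here is a minimal sketch using Python's built-in sqlite3 module. The view name customer_with_address and the trimmed-down columns are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (
    id       INTEGER PRIMARY KEY,
    address1 TEXT NOT NULL,
    city     TEXT NOT NULL
);
CREATE TABLE customers (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES addresses(id)
);

-- The view hides the join; the base tables stay normalised.
CREATE VIEW customer_with_address AS
SELECT c.id AS customer_id, c.name, a.address1, a.city
FROM customers c
JOIN addresses a ON a.id = c.address_id;

INSERT INTO addresses VALUES (1, '1 Main St', 'Springfield');
INSERT INTO customers VALUES (1, 'Acme', 1);
""")

rows = conn.execute(
    "SELECT name, address1, city FROM customer_with_address"
).fetchall()
```

Callers query the view as if the address columns lived on the customer; the join runs on the primary key of addresses, which is indexed.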
It's fine to have a split out addresses table.
However, you have to avoid the temptation of allowing multiple rows to refer to the same address unless you have an appropriate way for the user to decide whether and how an address change splits off a new row. For example, say you have the same address for billing and ship-to, and then a user says their address is changing. To start with, old orders probably need their ship-to addresses retained, so you can't change the address in place. But the user might also need to say that the change applies only to the ship-to address.
One should maintain some master tables for City, State and Country. This way one can avoid the different spellings for these entities which might end up with mapping same city with some different state/country.
One can simply map the CityId in the address table as foreign key as shown below, instead of having all the three fields separately (City, State and Country) as plain text in address table itself.
Address: {
CityId
// With other fields
}
City: {
CityId
StateId
// Other fields
}
State: {
StateId
CountryId
// Other fields
}
Country: {
CountryId
// Other fields
}
If one maintains all three IDs (CityId, StateId and CountryId) in the address table, you still have to join against those tables in the end. Hence my suggestion would be to have only CityId and then retrieve the rest of the required information through joins with the above table structure.
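The structure above translates fairly directly into tables. The following is a minimal sketch using Python's built-in sqlite3 module; the snake_case names and the sample data are assumptions for the example. The address row stores only city_id, and state and country are recovered by walking the chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE state (
    state_id   INTEGER PRIMARY KEY,
    country_id INTEGER NOT NULL REFERENCES country(country_id),
    name       TEXT NOT NULL
);
CREATE TABLE city (
    city_id  INTEGER PRIMARY KEY,
    state_id INTEGER NOT NULL REFERENCES state(state_id),
    name     TEXT NOT NULL
);
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    city_id    INTEGER NOT NULL REFERENCES city(city_id),
    street     TEXT NOT NULL
);

INSERT INTO country VALUES (1, 'United States');
INSERT INTO state   VALUES (1, 1, 'Illinois');
INSERT INTO city    VALUES (1, 1, 'Chicago');
INSERT INTO address VALUES (1, 1, '35 State St');
""")

# Only city_id is stored on the address; the rest comes from the joins.
full_address = conn.execute("""
    SELECT a.street, ci.name, s.name, co.name
    FROM address a
    JOIN city    ci ON ci.city_id    = a.city_id
    JOIN state   s  ON s.state_id    = ci.state_id
    JOIN country co ON co.country_id = s.country_id
    WHERE a.address_id = 1
""").fetchone()
```

Because a city belongs to exactly one state and a state to exactly one country here, an address cannot end up pairing a city with the wrong state.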
I prefer to use an XREF table that contains a FK reference to the person/business table, a FK reference to the address table and, generally, a FK reference to a role table (HOME, OFFICE, etc) to delineate the actual type of address. I also include an ACTIVE flag to allow me to choose to ignore old address while preserving the ability to maintain an address history.
This approach allows me to maintain multiple addresses of varying types for each primary entity

Best way to model Customer <--> Address

Every Customer has a physical address and an optional mailing address. What is your preferred way to model this?
Option 1. Customer has foreign key to Address
Customer (id, phys_address_id, mail_address_id)
Address (id, street, city, etc.)
Option 2. Customer has one-to-many relationship to Address, which contains a field
to describe the address type
Customer (id)
Address (id, customer_id, address_type, street, city, etc.)
Option 3. Address information is de-normalized and stored in Customer
Customer (id, phys_street, phys_city, etc. mail_street, mail_city, etc.)
One of my overriding goals is to simplify the object-relational mappings, so I'm leaning towards the first approach. What are your thoughts?
I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.
If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach
Customer (id, phys_address_id)
Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
Address (id, street, city, etc.)
One important fact you may need to consider (depending on your problem domain) is that people change addresses, and may want to let you know in advance of their address change; this is certainly true for utility companies, telcos, etc.
In this case you need to have a way to store multiple addresses for the customer with validity dates, so that the address can be set up in advance and automatically switch at the correct point. If this is a requirement, then a variation on (2) is the only sensible way to model it, e.g.
Customer (id, ...)
Address (id, customer_id, address_type, valid_from, valid_to)
On the other hand, if you don't need to cater for this (and you're sure you won't in the future) then probably (1) is simpler to manage because it's much easier to maintain data integrity as there's no issues with ensuring only one address of the same type exists, and the joins become simpler as they're only on one field.
So either (1) or (2) are fine depending on whether you need house-moves, but I'd steer clear of (3) because you're then repeating the definition of what an address is in the table, and you'll have to add multiple columns if you change what an address looks like. It's possibly slightly more performant, but to be honest when you're dealing with properly indexed joins in a relational database there isn't a lot to be gained, and it's likely to be slower in some scenarios where you don't need the address as the record size for a customer will be larger.
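For the validity-date variation on (2), a lookup "as of" a given day might look like the sketch below. It uses Python's built-in sqlite3 module; the street column, the address_on helper and the choice of half-open intervals (valid_from inclusive, valid_to exclusive, NULL meaning open-ended) are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE address (
    id           INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(id),
    address_type TEXT NOT NULL,
    street       TEXT NOT NULL,
    valid_from   TEXT NOT NULL,  -- ISO-8601 dates compare correctly as text
    valid_to     TEXT            -- NULL means "still valid"
);
INSERT INTO customer VALUES (1);
-- The move is entered in advance; it takes effect on 2024-07-01.
INSERT INTO address VALUES (1, 1, 'physical', '1 Old Road',   '2020-01-01', '2024-07-01');
INSERT INTO address VALUES (2, 1, 'physical', '2 New Street', '2024-07-01', NULL);
""")

def address_on(conn, customer_id, address_type, day):
    """Return the street valid on `day` (YYYY-MM-DD), or None if there is none."""
    row = conn.execute("""
        SELECT street FROM address
        WHERE customer_id = ? AND address_type = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
    """, (customer_id, address_type, day, day)).fetchone()
    return row[0] if row else None

before_move = address_on(conn, 1, "physical", "2024-06-30")
after_move = address_on(conn, 1, "physical", "2024-07-01")
```

Nothing in this sketch stops two rows of the same type from overlapping in time; that is the integrity cost mentioned above, and it has to be checked by the application or by a trigger.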
We are moving forward with a model like this:
Person (id, given_name, family_name, title, suffix, birth_date)
Address (id, culture_id, line1, line2, city, state, zipCode, province, postalCode)
AddressType (id, descriptiveName)
PersonAddress (person_id, address_id, addressType_id, activeDates)
Most may consider this excessive. However, an undeniable common theme amongst the apps we develop is that they will have some of these fundamental entities - People, Organizations, Addresses, Phone Numbers, etc.. - and they all want to combine them in different ways. So, we're building in some generalization up-front that we are 100% certain we have use cases for.
The Address table will follow a table-per-hierarchy inheritance scheme to differentiate addresses based on culture; so a United States address will have a state and zip field, but Canadian addresses will have a province and postal code.
We use a separate connecting table to "give" a person an address. This keeps our other entities - Person & Address - free from ties to other entities when our experience is this tends to complicate matters down the road. It also makes it far simpler to connect Address entities to many other types of entities (People, Organizations, etc.) and with different contextual information associated with the link (like activeDates in my example).
The second option would probably be the way I would go. It would also let users add additional addresses (if you wanted to let them do that) that they could switch between at will for shipping and such.
I'd prefer #1. Good normalization and communicates intent clearly. This model also allows the same address object (row) to be used for both addresses, something I have found to be quite valuable. It's far too easy to get lost in duplicating this information too much.
When answering those kinds of questions I like to use the classifications of DDD. If it's an Entity it should have a separate ID; if it's a value object it should not.
Option 3 is too restrictive, and option 1 cannot be extended to allow for other address types without changing the schema.
Option 2 is clearly the most flexible and therefore the best choice.
In most code I write nowadays every customer has one and only one physical location. This is the legal entity being our business partner. Therefore I put street, city etc. in the customer object/table. Often this is the simplest possible thing that works, and it works.
When an additional mailing address is needed, I put it in a separate object/table so as not to clutter the customer object too much.
Earlier in my career I normalized like mad, having an order reference a customer which references a shipping address. This made things "clean" but slow and inelegant to use. Nowadays I use an order object which just contains all the address information. I actually consider this more natural, since a customer might change his (default?) address, but the address of a shipment sent in 2007 should always stay the same, even if the customer moves in 2008.
We currently implement the VerySimpleAddressProtocol in our project to standardize the fields used.
I'd go for the first option. In these situations I'm very wary of YAGNI (you aren't going to need it). I can't count the number of times I've looked at schemas that have had one-to-many tables "just in case" that are many years old. If you only need two, just use the first option; if the requirement changes in the future, change it then.
Like in many cases: It depends.
If your customers deal with multiple addresses then a to-many relationship would be appropriate. You could introduce a flag on address that signals if an address is for shipment or bill, etc. Or you store the different address types in different tables and have multiple to-one relationships on a customer.
In cases where you only need to know one address of a customer why would you model that to-many? A to-one relationship would satisfy your needs here.
Important: Denormalize only if you encounter performance issues.
I would go with option 1. If you want to, you could even modify it a little bit to keep an address history:
Customer (id, phys_address_id, mail_address_id)
Address (id, customer_id, start_dt, end_dt, street, city, etc.)
If the address changes, just end date the current address and add a new record in the Address table. The phys_address_id and mail_address_id always point to the current address.
That way you can keep a history of addresses, you could have multiple mailing addresses stored in the database (with the default in mail_address_id), and if the physical address and mailing address are identical you'll just point phys_address_id and mail_address_id at the same record.
Good thread. I have spent a while contemplating the most suitable schema and I have concluded that quentin-starin's solution is the best except I have added start_date and end_date fields to what would be his PersonAddress table. I have also decided to add notes, active and deleted.
deleted is for soft-delete functionality, as I do not want to lose track of previous addresses simply by deleting the record from the junction table. I think that is quite wise and something others may want to consider. Otherwise, tracing old address information could mean going back through paper or electronic documents (something best avoided).
notes I think of as something of a requirement, but that might just be preference. I've spent time in backfill exercises verifying addresses in databases, and some addresses (such as rural addresses) can be so vague that I think it is very useful to at least allow notes about the address to be held in the address record.
One thing I would like to hear opinions on is the unique indexing of the address table (again, referring to the table of the same name in quentin-starin's example). Do you think a unique index should be enforced (presumably as a compound index across all not-null/required fields)? This would seem sensible, but it might still be hard to stop duplicate data, as postal/zip codes are not always unique to a single property. Even if the country, province and city fields are populated from reference data (which they are in my model), spelling differences in the address lines may mean duplicates do not match up. The best way to avoid this might be to run one or more DB queries against the incoming form fields to see whether a possible duplicate exists. Another safety measure would be to give the user the option of selecting from addresses in the database already linked to that person and using that to auto-populate. I think this might be a case where you can only be sensible and take precautions to stop duplication, but just accept that it can (and probably will) happen sooner or later.
The other very important aspect of this for me is future editing of the address table records. Let's say you have 2 people both listed at:
11 Whatever Street
Whatever City
Z1P C0D3
Should it not be considered dangerous to allow the same address table record to be assigned to different entities (person, company)? Then let's say the user realises one of these people lives at 111 Whatever Street and there is a typo. If you change that address, it will change it for both of the entities. I would like to avoid that. My suggestion would be to have the model in the MVC (in my case, PHP Yii2) look for existing address records when a new address is being created known to be related to that customer (SELECT * FROM address INNER JOIN personaddress ON personaddress.address_id = address.id WHERE personaddress.person_id = {current person being edited ID}) and provide the user the option of using that record instead (as was essentially suggested above).
I feel linking the same address to multiple different entities is just asking for trouble: you either have to refuse later editing of the address record (impractical) or risk that a future edit corrupts data related to entities other than the one whose address record is being edited.
I would love to hear people's thoughts.
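One possible way to get the safety described above without forbidding shared rows is a copy-on-write edit: if the address row is linked to more than one person, the edit creates a new row for the person being edited instead of updating in place. This is only a sketch under assumptions: the edit_address_for helper and the two-column address table are invented for the example, and it uses Python's built-in sqlite3 module rather than PHP/Yii2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    id    INTEGER PRIMARY KEY,
    line1 TEXT NOT NULL,
    city  TEXT NOT NULL
);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE personaddress (
    person_id  INTEGER NOT NULL REFERENCES person(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (person_id, address_id)
);
INSERT INTO address VALUES (1, '11 Whatever Street', 'Whatever City');
INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO personaddress VALUES (1, 1), (2, 1);
""")

def edit_address_for(conn, person_id, address_id, new_line1):
    """Change line1 for one person only; copy the row if it is shared."""
    (links,) = conn.execute(
        "SELECT COUNT(*) FROM personaddress WHERE address_id = ?",
        (address_id,),
    ).fetchone()
    if links <= 1:
        # Nobody else uses this row, so an in-place update is safe.
        conn.execute(
            "UPDATE address SET line1 = ? WHERE id = ?", (new_line1, address_id)
        )
        return address_id
    # Shared row: copy it with the correction and repoint only this person.
    cur = conn.execute(
        "INSERT INTO address (line1, city) SELECT ?, city FROM address WHERE id = ?",
        (new_line1, address_id),
    )
    conn.execute(
        "UPDATE personaddress SET address_id = ? "
        "WHERE person_id = ? AND address_id = ?",
        (cur.lastrowid, person_id, address_id),
    )
    return cur.lastrowid

lookup = """
    SELECT a.line1 FROM address a
    JOIN personaddress pa ON pa.address_id = a.id
    WHERE pa.person_id = ?"""

ann_address_id = edit_address_for(conn, 1, 1, "111 Whatever Street")
ann_line = conn.execute(lookup, (1,)).fetchone()[0]
bob_line = conn.execute(lookup, (2,)).fetchone()[0]
```

After the edit, Ann is at 111 Whatever Street and Bob is still at 11 Whatever Street. Whether a correction should instead apply to everyone linked to the row is a question for the user interface, not something the schema can decide.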