Is it best to store address information with the order table or in it's own table? - sql

I am designing a database and initially I setup addresses separate from orders so the orders table just has a billing id and address id reference and the actual address are stored in the address table.
However the more I think about this i am not sure there is any real advantage to putting address into their own table.
The main reason is that a order is self contained meaning that the address is only valid for that order.
Having address linked instead of on the order record leads to some complications when updating addresses. Say if we have two orders from one customer linked to the same address and he wants the address changed on one order but not the other a new address. The application now needs to create a new address in the address table and change the linked address on the order (you can't just change the address because it would change it for both linked orders).
You also have to have code in your application that links new orders to addresses already in the database (and that only works for orders where the addresses are identical or nearly identical).
The only advantage I see right now to having order addresses in their own table is so the billing address and shipping address are not duplicated for every order.
On the surface it seems like preventing billing and shipping address data duplication isn't worth the extra code needed to make storing addresses in their own table work.
Any recommendations on how to handle this? Is there some big advantage to handling the addresses in separate tables that I am missing?

It sounds like what you want (ideally) is for the order to be linked to the address as it was at the time of the order.
There are two main ways to implement this - either the order includes all of the address information within its own structure, or you implement the address table as a temporal table, and ensure the order has appropriate datetime information to locate the correct address rows when queried.
Which one you implement depends on how you wish to handle updates - both models may have issues, depending on whether in-flight orders should/should not be updated.

If the address is linked to that order only, then it should be part of the order, not in its own table.
Unless your database has logic relating to an actual address entity, it doesn't need to be in its own table even if you'll want to add the ability to store address information for other sorts of entities.

Related

Can multiple relationship occur between two entities?

I have following two entities, where a person address is kept in a separate table because a person can have multiple addresses.
The mailing address however is one of the multiple addresses stored in address table, which is to be referred in person table.
Can following relationship exist?
The answer to your direct question is "Yes". However, you cannot declare the model only using create table because:
Declaring the foreign key address.person_id requires that the person table already exist.
Declaring the foreign key person.mailing_address requires that the address table already exist.
Hence, to implement the model, you need to use alter table to add one or both of the constraints after both tables are created.
Is this the model you want? One feature of an address is that multiple people can have the same address. Your model does not allow that. To handle this, you would typically have three tables:
Person
Address
PersonAddress
The third table has one row for each person/address pair. It can also have other information such as:
Type ("mailing" versus other types)
Effective and end dates.
Perhaps other information.
If you want to guarantee uniqueness of the "mailing" address in such a model, many databases support filtered unique indexes, to ensure there are no duplicate mailing addresses.
I'm not sure why you have person_id as an fk on address but that doesn't look correct. There are lots of correct ways to model this and the best one for you will depend on you particular circumstances - but a couple of options are:
If you know all the types of addresses there can be then add multiple address fk fields to the person e.g. billing address, shipping address, etc. This makes querying quick and simple but is inflexible: adding a new address type in the future is relatively complex to implement
Add an intersection table with fks for person and address and an address type field. This has a slight overhead when it comes to querying but has the advantage if being very flexible: adding a new address type is trivial

Do two tables with the same content break data normalization?

[Assuming there is a one to many relationship between an individual and an address, and assuming there is a one to many relationship between an agency and an address.]
Given the following table structure:
Wouldn't you want to merge the two address tables together and instead of using a foreign key within each one use a tie table?
Like this:
Are they both valid for normalization or only one?
Depends what you want to do.
In your second example with the tie tables, if I want to do a mailshot to my customers then my query has to go out to the agency tie table to exclude any agency addresses.
Of course you could have an address type column to differentiate but then you have a more complex query for your insert statement.
So although "address" is a global idea, sometimes it is easier to have it segregated by context.
Secondly, your customer data would usually be changing much more than your agency data. There may also be organisational and legal requirements around storage of personal data that make it better to separate the two.
e.g. in a health records system I want to be able to easily extract / restrict client data and to keep my configuration or commissioning data separate.
Thus in all the client systems I have used, the model tends to be the first one you describe rather than the second.

SQL table design: Should I store addresses with orders or in separate table?

I am busy creating a basic ecommerce website and would like to know what is the best of the following two options regarding the way I store the billing and delivery addresses. I am open to any other suggestions.
I can included the billing address and delivery address in the Order table:
order
-------
billing_name
billing_address
billing_state
shipping_name
shipping_address
shipping_state
Otherwise I can create another table that will just store addresses for orders:
order
-------
billing_address_id
shipping_address_id
order_address
-------
address_id
name
address
state
I would usually choose the second. This will let you have many different addresses for a customer of different types. But I would normally address this at the customer level first, then address the orders and invoices.
However, you may need to address the nature of your order workflow/business rules.
Once an order is completed, is it a document (like an invoice)? If so, then the address should be locked in at that time and cannot be altered, otherwise you may not be able re-present the original document.
When a customer changes their billing address, does the billing address of an old order even matter anymore? In this case, the billing address does not even need to be linked from the order, only from the customer. If you were to re-present the orders for payment, you would present them to their current billing address.
Personally, I like neither of your solutions although the second solution is "righter" in terms of database theory. If you have repeating addresses you should store them once.
The problem comes in implementation. When an order is placed, you are going to have to make a decision whether you want to use an existing address, update an existing address (for instance, with a newly-added apartment number) or create a new address (the customer has moved, has a new summer address, whatever).
In order to do this, someone (an employee for direct or phone sales, the customer or the program for on-line sales) will have to make a decision as to whether you're performing an address update or address addition operation. It's very difficult to get users to make this kind of decision accurately. If an update is performed when an addition was really needed, you've corrupted your order history (the older orders point to the new address). If an addition is performed when an update was the correct choice, you've eliminated the value of the normalized structure.
In situations like this I've come, not entirely happily, to the conclusion that the best option is to store one or more addresses for the customer and then copy the address information into address fields in the order itself.
If you choose your second option, you need to plan on writing a really good user interface to the address system to avoid the kind of problems I mentioned above. And remember that not only you, but every programmer who works on the project in the future is going to have to understand and agree on the management of that address table.
Pulling the addresses into separate tables is more normalized, but be careful. If you allow the addresses to be updated, you can lose track of where the order was originally intended to be billed to / shipped to.
Edit: Based on the new comment, where you are going to copy from an address book, to an order_address table, it might keep your order table cleaner to do so, but if you are going to duplicate the data anyway, I would say copy it to the record it belongs too.
--
Both, denormalize the shipping, and keep them with the order. Storage is cheap, and it's easier than managing a bunch of extra data in the address table. But keep the addresses split out so customers don't have to enter them over again. If you denormalize, then you don't have to keep explicit records in your addresses table for them, and worry about doing soft deletes.
Don't underestimate the complexity of managing addresses. If I enter an address into your system, and associate it with my account, and then realize some part of it is wrong, then you need to remove the old one and create a new one. The removal can be a soft delete, but it needs to be removed. You can try to decide if I was putting in a new address, or dramatically changing the old one. Or you can only allow adding and removing addresses. But when operations happen with addresses, the previous orders need to maintain the data that was assigned to it in the first place. Editing an address that is already associated with an order, would modify where the order says it was sent to, after it has already been sent. Make sure you think through these scenarios. There are several ways to solve the potential problems, but your decision is really based on how you want to handle these situations. if you denormalize and copy the address information to the order once it is placed, then editing addresses in the address table becomes less of an issue. Decide how to handle these situations, and your database schema just needs to support that. Either choice works.
I would keep the addresses in a separate table and reference them from the orders. I would include a "CurrentAddress" attribute, so that an end user can "delete" that address from their list of current address. The value would still exist in the table so previous orders could reference the address for historical purposes, but it would no longer be a selectable address at order time.
The second method has a couple advantages over the first. It is easier simply to have the addresses be the same, as they often will be, with less possibility of error. Also, if you ever save addresses within an account, the second method will give you an easier time. That said, you need to verify that a given address actually belongs to the same account as the order, which you might do by including a customer_id field in both the order and order_address tables, then including customer_id in the primary key of order_address and the foreign key from order to order_address.
It depends whether addresses are re-used.
If you have a "registered customer" table, you should definitely go for the option with "delivery_adress", "billing_adress", etc. tables, each record of them being linked to a customer.
A table with the billing and shipping address id will serve better if you plan to register users in your site, then you can place the order in other table and use the ids you already have to correlate data between order info <=> billing/shipping addresses

Best way to model Customer <--> Address

Every Customer has a physical address and an optional mailing address. What is your preferred way to model this?
Option 1. Customer has foreign key to Address
Customer (id, phys_address_id, mail_address_id)
Address (id, street, city, etc.)
Option 2. Customer has one-to-many relationship to Address, which contains a field
to describe the address type
Customer (id)
Address (id, customer_id, address_type, street, city, etc.)
Option 3. Address information is de-normalized and stored in Customer
Customer (id, phys_street, phys_city, etc. mail_street, mail_city, etc.)
One of my overriding goals is to simplify the object-relational mappings, so I'm leaning towards the first approach. What are your thoughts?
I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.
If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach
Customer (id, phys_address_id)
Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
Address (id, street, city, etc.)
One important fact you may need to consider (depending on your problem domain) is that people change addresses, and may want to let you know in advance of their address change; this is certainly true for utility companies, telcos, etc.
In this case you need to have a way to store multiple addresses for the customer with validity dates, so that the address can be set up in advance and automatically switch at the correct point. If this is a requirement, then a variation on (2) is the only sensible way to model it, e.g.
Customer (id, ...)
Address (id, customer_id, address_type, valid_from, valid_to)
On the other hand, if you don't need to cater for this (and you're sure you won't in the future) then probably (1) is simpler to manage because it's much easier to maintain data integrity as there's no issues with ensuring only one address of the same type exists, and the joins become simpler as they're only on one field.
So either (1) or (2) are fine depending on whether you need house-moves, but I'd steer clear of (3) because you're then repeating the definition of what an address is in the table, and you'll have to add multiple columns if you change what an address looks like. It's possibly slightly more performant, but to be honest when you're dealing with properly indexed joins in a relational database there isn't a lot to be gained, and it's likely to be slower in some scenarios where you don't need the address as the record size for a customer will be larger.
We are moving forward with a model like this:
Person (id, given_name, family_name, title, suffix, birth_date)
Address (id, culture_id, line1, line2, city, state, zipCode, province, postalCode)
AddressType (id, descriptiveName)
PersonAddress (person_id, address_id, addressType_id, activeDates)
Most may consider this excessive. However, an undeniable common theme amongst the apps we develop is that they will have some of these fundamental entities - People, Organizations, Addresses, Phone Numbers, etc.. - and they all want to combine them in different ways. So, we're building in some generalization up-front that we are 100% certain we have use cases for.
The Address table will follow a table-per-hierarchy inheritance scheme to differentiate addresses based on culture; so a United States address will have a state and zip field, but Canadian addresses will have a province and postal code.
We use a separate connecting table to "give" a person an address. This keeps our other entities - Person & Address - free from ties to other entities when our experience is this tends to complicate matters down the road. It also makes it far simpler to connect Address entities to many other types of entities (People, Organizations, etc.) and with different contextual information associated with the link (like activeDates in my example).
The second option would probably be the way I would go. And on the off-chance it would let users add additional address' (If you wanted to let them do that), that they could switch between at will for shipping and such.
I'd prefer #1. Good normalization and communicates intent clearly. This model also allows the same address object (row) to be used for both addresses, something I have found to be quite valuable. It's far too easy to get lost in duplicating this information too much.
When answering those kinds of questions I like to use the classifications of DDD. If it's a Entity it should have a separate ID, if it's a value object it should not.
Option 3 is too restrictive, and option 1 cannot be extended to allow for other address types without changing the schema.
Option 2 is clearly the most flexible and therefore the best choice.
In most code I write nowadays every customer has one and only one physical location. This is the legal entity beeing our business partner. Therefore I put street, city etc in the customer object/table. Often this is the possible simplest thing that works and it works.
When an additional mailing address is needed, I put it in a separate object/table to not clutter the customer object to much.
Earlier in my career I normalized like mad having an order referencing a customer which references a shipping address. This made things "clean" but slow and inelegant to use. Nowadays I use an order object which just contains all the address information. I actually consider this more natural since a customer might change his (default?) address, but the address of a shipment send in 2007 should always stay the same - even if the customer moves in 2008.
We currently implement the VerySimpleAddressProtocol in out project to standardize the fields used.
I'd go for the first option. In these situations I'm very weary of YAGNI (you aren't going to need it). I can't count the number of times I've looked at schemas that've had one-to-many tables "just incase" that are many years old. If you only need two, just use the first option; if the requirement changes in the future, change it then.
Like in many cases: It depends.
If your customers deal with multiple addresses then a to-many relationship would be appropriate. You could introduce a flag on address that signals if an address is for shipment or bill, etc. Or you store the different address types in different tables and have multiple to-one relationships on a customer.
In cases where you only need to know one address of a customer why would you model that to-many? A to-one relationship would satisfy your needs here.
Important: Denormalize only if you encounter performance issues.
I would go with option 1. If you want to, you could even modify it a little bit to keep an address history:
Customer (id, phys_address_id, mail_address_id)
Address (id, customer_id, start_dt, end_dt, street, city, etc.)
If the address changes, just end date the current address and add a new record in the Address table. The phys_address_id and mail_address_id always point to the current address.
That way you can keep a history of addresses, you could have multiple mailing addresses stored in the database (with the default in mail_address_id), and if the physical address and mailing address are identical you'll just point phys_address_id and mail_address_id at the same record.
Good thread. I have spent a while contemplating the most suitable schema and I have concluded that quentin-starin's solution is the best except I have added start_date and end_date fields to what would be his PersonAddress table. I have also decided to add notes, active and deleted.
deleted is for soft delete functionality as I think I do not want to lose trace of previous addresses simply by deleting the record from the junction table. I think that is quite wise and something others may want to consider. If not done this way, it could be left to revision of paper or electronic documents to try to trace address information (something best avoided).
notes I think of being something of a requirement but that might just be preference. I've spent time in backfill exercises verifying addresses in databases and some addresses can be very vague (such as rural addresses) that I think it is very useful to at least allow notes about that address to be held in the record address.
One thing i would like to hear opinions on is the unique indexing of the address table (again, referring to the table of the same name in quentin-starin's example. Do you think it should be unique index should be enforced (as a compound index presumably across all not-null/required fields)? This would seem sensible but it might still be hard to stop duplicate data regardless as postal/zip codes are not always unique to a single property. Even if the country, province and city fields are populated from reference data (which they are in my model), spelling differences in the address lines may not match up. The only way to best avoid this might be to run one or a number of DB queries from the incoming form fields to see if a possible duplicate has been found. Another safety measure would be give the user the option of selecting from address in the database already linked to that person and use that to auto-populate. I think this might be a case where you can only be sensible and take precautions to stop duplication but just accept it can (and probably will) happen sooner or later.
The other very important aspect of this for me is future editing of the address table records. Lets say you have 2 people both listed at: -
11 Whatever Street
Whatever City
Z1P C0D3
Should it not be considered dangerous to allow the same address table record to be assigned to different entities (person, company)? Then let's say the user realises one of these people lives at 111 Whatever Street and there is a typo. If you change that address, it will change it for both of the entities. I would like to avoid that. My suggestion would be to have the model in the MVC (in my case, PHP Yii2) look for existing address records when a new address is being created known to be related to that customer (SELECT * FROM address INNER JOIN personaddress ON personaddress.address_id = address.id WHERE personaddress.person_id = {current person being edited ID}) and provide the user the option of using that record instead (as was essentially suggested above).
I feel linking the same address to multiple different entities is just asking for trouble as it might be a case of refusing later editing of the address record (impractical) or risking that the future editing of the record may corrupt data related to other entities outside of the one who's address record is being edited.
I would love to hear people's thoughts.

Is this a good way to model address information in a relational database?

I'm wondering if this is a good design. I have a number of tables that require address information (e.g. street, post code/zip, country, fax, email). Sometimes the same address will be repeated multiple times. For example, an address may be stored against a supplier, and then on each purchase order sent to them. The supplier may then change their address and any subsequent purchase orders should have the new address. It's more complicated than this, but that's an example requirement.
Option 1
Put all the address columns as attributes on the various tables. Copy the details down from the supplier to the PO as it is created. Potentially store multiple copies of the
Option 2
Create a separate address table. Have a foreign key from the supplier and purchase order tables to the address table. Only allow insert and delete on the address table as updates could change more than you intend. Then I would have some scheduled task that deletes any rows from the address table that are no longer referenced by anything so unused rows were not left about. Perhaps also have a unique constraint on all the non-pk columns in the address table to stop duplicates as well.
I'm leaning towards option 2. Is there a better way?
EDIT: I must keep the address on the purchase order as it was when sent. Also, it's a bit more complicated that I suggested as there may be a delivery address and a billing address (there's also a bunch of other tables that have address information).
After a while, I will delete old purchase orders en-masse based on their date. It is after this that I was intending on garbage collecting any address records that are not referenced anymore by anything (otherwise it feels like I'm creating a leak).
I actually use this as one of my interview questions. The following is a good place to start:
Addresses
---------
AddressId (PK)
Street1
... (etc)
and
AddressTypes
------------
AddressTypeId
AddressTypeName
and
UserAddresses (substitute "Company", "Account", whatever for Users)
-------------
UserId
AddressTypeId
AddressId
This way, your addresses are totally unaware of how they are being used, and your entities (Users, Accounts) don't directly know anything about addresses either. It's all up to the linking tables you create (UserAddresses in this case, but you can do whatever fits your model).
One piece of somewhat contradictory advice for a potentially large database: go ahead and put a "primary" address directly on your entities (in the Users table in this case) along with a "HasMoreAddresses" field. It seems icky compared to just using the clean design above, but can simplify coding for typical use cases, and the denormalization can make a big difference for performance.
Option 2, without a doubt.
Some important things to keep in mind: it's an important aspect of design to indicate to the users when addresses are linked to one another. I.e. corporate address being the same as shipping address; if they want to change the shipping address, do they want to change the corporate address too, or do they want to specify a new loading dock? This sort of stuff, and the ability to present users with this information and to change things with this sort of granularity is VERY important. This is important, too, about the updates; give the user the granularity to "split" entries. Not that this sort of UI is easy to design; in point of fact, it's a bitch. But it's really important to do; anything less will almost certainly cause your users to get very frustrated and annoyed.
Also; I'd strongly recommend keeping around the old address data; don't run a process to clean it up. Unless you have a VERY busy database, your database software will be able to handle the excess data. Really. One common mistake I see about databases is attempting to overoptimize; you DO want to optimize the hell out of your queries, but you DON'T want to optimize your unused data out. (Again, if your database activity is VERY HIGH, you may need to have something does this, but it's almost a certainty that your database will work well with still having excess data around in the tables.) In most situations, it's actually more advantageous to simply let your database grow than it is to attempt to optimize it. (Deletion of sporadic data from your tables won't cause a significant reduction in the size of your database, and when it does... well, the reindexing that that causes can be a gigantic drain on the database.)
Do you want to keep a historical record of what address was originally on the purchase order?
If yes go with option 1, otherwise store it in the supplier table and link each purchase order to the supplier.
BTW: A sure sign of a poor DB design is the need for an automated job to keep the data "cleaned up" or in synch. Option 2 is likely a bad idea by that measure
I think I agree with JohnFx..
Another thing about (snail-)mail addresses, since you want to include country I assume you want to ship/mail internationally, please keep the address field mostly freeform text. It's really annoying having to make up an 5 digit zip code when Norway don't have zip-codes, we have 4 digit post-numbers.
The best fields would be:
Name/Company
Address (multiline textarea)
Country
This should be pretty global, if the US-postal system require zip-codes in a specific format, then include that too but make it optional unless USA is selected as country. Everyone know how to format the address in their country, so as long as you keep the linebreaks it should be okay...
Why would any of the rows on the address table become unused? Surely they would still be pointed at by the purchase order that used them?
It seems to me that stopping the duplicates should be the priority, thus negating the need for any cleanup.
In the case of orders, you would never want to update the address as the person (or company) address changed if the order has been sent. You meed the record of where the order was actually sent if there is an issue with the order.
The address table is a good idea. Make a unique constraint on it so that the same entity cannot have duplicate addresses. You may still get them as users may add another one instead of looking them up and if they spell things slightly differently (St. instead of Street) the unique constraint won't prevent that. Copy the data at the time the order is created to the order. This is one case where you want the multiple records because you need a historical record of what you sent where. Only allowing inserts and deletes to the table makes no sense to me as they aren't any safer than updates and involve more work for the database. An update is done in one call to the database. If an address changes in your idea, then you must first delete the old address and then insert the new one. Not only more calls to the databse but twice the chance of making a code error.
I've seen every system using option 1 get into data quality trouble. After 5 years 30% of all addresses will no longer be current.