SQL Database Design Best Practice (Addresses)

Of course I realize that there's no one "right way" to design a SQL database, but I wanted to get some opinions on what is better or worse in my particular scenario.
Currently, I'm designing an order entry module (Windows .NET 4.0 application with SQL Server 2008) and I'm torn between two design decisions when it comes to data that can be applied in more than one spot. In this question I'll refer specifically to Addresses.
Addresses can be used by a variety of objects (orders, customers, employees, shipments, etc.) and they almost always contain the same data (Address1/2/3, City, State, Postal Code, Country, etc.). I was originally going to include each of these fields as a column in each of the related tables (e.g. Orders will contain Address1/2/3, City, State, etc., and Customers will also contain this same column layout). But a part of me wants to apply DRY/normalization principles to this scenario, i.e. have a table called "Addresses" which is referenced via foreign key in the appropriate tables.
CREATE TABLE DB.dbo.Addresses
(
    Id INT
        NOT NULL
        IDENTITY(1, 1)
        PRIMARY KEY
        CHECK (Id > 0),
    Address1 VARCHAR(120)
        NOT NULL,
    Address2 VARCHAR(120),
    Address3 VARCHAR(120),
    City VARCHAR(100)
        NOT NULL,
    State CHAR(2)
        NOT NULL,
    Country CHAR(2)
        NOT NULL,
    PostalCode VARCHAR(16)
        NOT NULL
);
CREATE TABLE DB.dbo.Orders
(
    Id INT
        NOT NULL
        IDENTITY(1000, 1)
        PRIMARY KEY
        CHECK (Id >= 1000), -- >= so that the IDENTITY seed value 1000 itself passes
    Address INT
        CONSTRAINT fk_Orders_Address
        FOREIGN KEY REFERENCES Addresses(Id)
        CHECK (Address > 0)
        NOT NULL,
    -- other columns....
);
CREATE TABLE DB.dbo.Customers
(
    Id INT
        NOT NULL
        IDENTITY(1000, 1)
        PRIMARY KEY
        CHECK (Id >= 1000), -- >= so that the IDENTITY seed value 1000 itself passes
    Address INT
        CONSTRAINT fk_Customers_Address
        FOREIGN KEY REFERENCES Addresses(Id)
        CHECK (Address > 0)
        NOT NULL,
    -- other columns....
);
From a design standpoint I like this approach because it creates a standard address format that is easily changeable, i.e. if I ever needed to add Address4 I would just add it in one place rather than to every table. However, I can see the number of JOINs required to build queries might get a little insane.
I guess I'm just wondering if any enterprise-level SQL architects out there have ever used this approach successfully, or if the number of JOINs that this creates would create a performance issue?

You're on the right track by breaking address out into its own table. I'd add a couple of additional suggestions.
Consider taking the Address FK columns out of the Customers/Orders tables and creating junction tables instead. In other words, treat Customers/Addresses and Orders/Addresses as many-to-many relationships in your design now so you can easily support multiple addresses in the future. Yes, this means introducing more tables and joins, but the flexibility you gain is well worth the effort.
Consider creating lookup tables for city, state and country entities. The city/state/country columns of the address table then consist of FKs pointing to these lookup tables. This allows you to guarantee consistent spellings across all addresses and gives you a place to store additional metadata (e.g., city population) if needed in the future.
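The junction-table suggestion above could look roughly like the following. This is only a minimal sketch: it uses SQLite through Python's `sqlite3` module so that it runs on its own (SQL Server syntax differs in places), and the `CustomerAddresses` table name, the trimmed-down columns and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables are cut down to a few columns.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Addresses (
    Id       INTEGER PRIMARY KEY,
    Address1 TEXT NOT NULL,
    City     TEXT NOT NULL
);
CREATE TABLE Customers (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
-- Hypothetical junction table: one row per (customer, address) pair,
-- so a customer can have any number of addresses.
CREATE TABLE CustomerAddresses (
    CustomerId INTEGER NOT NULL REFERENCES Customers(Id),
    AddressId  INTEGER NOT NULL REFERENCES Addresses(Id),
    PRIMARY KEY (CustomerId, AddressId)
);
INSERT INTO Addresses (Id, Address1, City) VALUES
    (1, '10 King Street', 'Martinsburg'),
    (2, '35 State St', 'Chicago');
INSERT INTO Customers (Id, Name) VALUES (1000, 'Example Customer');
INSERT INTO CustomerAddresses (CustomerId, AddressId) VALUES (1000, 1), (1000, 2);
""")

# Two joins are needed to get from a customer to its addresses.
customer_cities = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.City
        FROM Customers AS c
        JOIN CustomerAddresses AS ca ON ca.CustomerId = c.Id
        JOIN Addresses AS a ON a.Id = ca.AddressId
        WHERE c.Id = ?
        ORDER BY a.Id
        """,
        (1000,),
    )
]
```

The extra join is the cost of the flexibility: adding a second or third address for a customer becomes an INSERT into the junction table rather than a schema change.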

I just have some cautions. For each of these, there's more than one way to fix the problem.
First, normalization doesn't mean "replace text with an id number".
Second, you don't have a key. I know, you have a column declared "PRIMARY KEY", but that's not enough.
insert into Addresses
    (Address1, Address2, Address3, City, State, Country, PostalCode)
values
    ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
    ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
    ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
    ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500');
select * from Addresses;

1;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
2;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
3;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
4;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
In the absence of any other constraints, your "primary key" identifies a row; it doesn't identify an address. Identifying a row is usually not good enough.
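One of the possible fixes is a uniqueness constraint over the address columns, so that the surrogate id is no longer the only thing telling rows apart. The sketch below is an illustration under assumptions, not the answerer's own solution: it runs on SQLite via Python's `sqlite3`, and it stores missing address lines as empty strings because engines differ in how UNIQUE treats NULL (SQLite considers NULLs distinct from one another).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Addresses (
    Id         INTEGER PRIMARY KEY,
    Address1   TEXT NOT NULL,
    Address2   TEXT NOT NULL DEFAULT '',  -- '' instead of NULL so UNIQUE compares it
    Address3   TEXT NOT NULL DEFAULT '',
    City       TEXT NOT NULL,
    State      TEXT NOT NULL,
    Country    TEXT NOT NULL,
    PostalCode TEXT NOT NULL,
    -- The constraint below is what makes the columns identify an address.
    UNIQUE (Address1, Address2, Address3, City, State, Country, PostalCode)
)
""")

insert_sql = """
INSERT INTO Addresses (Address1, Address2, Address3, City, State, Country, PostalCode)
VALUES (?, ?, ?, ?, ?, ?, ?)
"""
row = ("President Obama", "1600 Pennsylvania Avenue NW", "",
       "Washington", "DC", "US", "20500")

conn.execute(insert_sql, row)
try:
    conn.execute(insert_sql, row)  # exact duplicate of the first row
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

address_count = conn.execute("SELECT COUNT(*) FROM Addresses").fetchone()[0]
```

This only catches exact duplicates; spelling variants of the same address still slip through, which is a separate data-cleansing problem.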
Third, "Address1", "Address2", and "Address3" aren't attributes of addresses. They're attributes of mailing labels. (Lines on a mailing label.) That distinction might not be important to you. It's really important to me.
Fourth, addresses have a lifetime. Between birth and death, they sometimes change. They change when streets get re-routed, buildings get divided, buildings get undivided, and sometimes (I'm pretty sure) when a city employee has a pint too many. Natural disasters can eliminate whole communities. Sometimes buildings get renumbered. In our database, which is tiny compared to most, about 1% per year change like that.
When an address dies, you have to do two things.
Make sure nobody uses that address to mail, ship, or whatever.
Make sure its death doesn't affect historical data.
When an address itself changes, you have to do two things.
Some data must reflect that change. Make sure it does.
Some data must not reflect that change. Make sure it doesn't.
Fifth, DRY doesn't apply to foreign keys. Their whole purpose is to be repeated. The only question is how wide a key? An id number is narrow, but requires a join. (10 id numbers might require 10 joins.) An address is wide, but requires no joins. (I'm talking here about a proper address, not a mailing label.)
That's all I can think of off the top of my head.

I think there is a problem you are not aware of, which is that some of this data is time sensitive. You do not want your records to show that you shipped an order to 35 State St, Chicago, IL, when you actually sent it to 10 King Street, Martinsburg, WV, just because the customer moved two years after the order was shipped. So yes, build an address table to capture the address at that moment in time, as long as any change to the address for someone like a customer results in a new address id rather than a change to the existing address row, which would break the history on an order.
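A minimal sketch of that rule, assuming SQLite via Python's `sqlite3` and a deliberately simplified schema: the `CurrentAddressId` and `ShipToAddressId` column names and the `move_customer` helper are hypothetical, introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Addresses (
    Id   INTEGER PRIMARY KEY,
    Line TEXT NOT NULL          -- whole address in one column, for brevity only
);
CREATE TABLE Customers (
    Id               INTEGER PRIMARY KEY,
    CurrentAddressId INTEGER NOT NULL REFERENCES Addresses(Id)
);
CREATE TABLE Orders (
    Id              INTEGER PRIMARY KEY,
    CustomerId      INTEGER NOT NULL REFERENCES Customers(Id),
    ShipToAddressId INTEGER NOT NULL REFERENCES Addresses(Id)
);
INSERT INTO Addresses (Id, Line) VALUES (1, '10 King Street, Martinsburg, WV');
INSERT INTO Customers (Id, CurrentAddressId) VALUES (1000, 1);
INSERT INTO Orders (Id, CustomerId, ShipToAddressId) VALUES (5000, 1000, 1);
""")


def move_customer(conn, customer_id, new_line):
    """Record a move by inserting a NEW address row; the old row is never updated."""
    cur = conn.execute("INSERT INTO Addresses (Line) VALUES (?)", (new_line,))
    conn.execute(
        "UPDATE Customers SET CurrentAddressId = ? WHERE Id = ?",
        (cur.lastrowid, customer_id),
    )
    return cur.lastrowid


new_address_id = move_customer(conn, 1000, "35 State St, Chicago, IL")

# The old order still points at the address it was actually shipped to.
shipped_to = conn.execute(
    "SELECT a.Line FROM Orders AS o JOIN Addresses AS a ON a.Id = o.ShipToAddressId "
    "WHERE o.Id = ?",
    (5000,),
).fetchone()[0]
current = conn.execute(
    "SELECT a.Line FROM Customers AS c JOIN Addresses AS a ON a.Id = c.CurrentAddressId "
    "WHERE c.Id = ?",
    (1000,),
).fetchone()[0]
```

The key design choice is that address rows are treated as immutable once something references them; a "change of address" is a re-link, not an UPDATE of the address text.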

You would want the addresses to be in a separate table only if they were entities in their own right. Entities have identity (meaning it matters if two objects pointed to the same address or to different ones), and they have their own lifecycle apart from other entities. If this was the case with your domain, I think it would be totally apparent and you wouldn't have a need to ask this question.
Cade's answer explains the mutability of addresses: something like a shipping address is part of an order and shouldn't be able to change out from under the order it belongs to. This shows that the shipping address doesn't have its own lifecycle. Handling it as if it were a separate entity can only lead to more opportunities for error.
"Normalization" specifically refers to removing redundancies from data so you don't have the same item represented in different places. Here the only redundancy is in the DDL, it's not in the data, so "normalization" is not relevant here. (JPA has the concept of embedded classes that can address the redundancy).
TLDR: Use a separate table if the address is truly an Entity, with its own distinct identity and its own lifecycle. Otherwise don't.

What you have to answer for yourself is the question whether the same address in everyday language is actually the same address in your database. If somebody "changes his address" (colloquially), he really links himself to another address. The address per se only changes when a street is renamed, a zip-code reform takes place or a nuke hits. And those are rare events (hopefully for the most part). There goes your main profit: change in one place for multiple rows (of multiple tables).
If you do actually change an address in your model (in the sense of an UPDATE on the address table), that may or may not be correct for the other rows that link to it. Also, in my experience, even the exact same address has to look different for different purposes. Understand the semantic differences and you will arrive at the model that best represents your real world.
I have a number of databases where I use a common table of streets (which uses a table of cities (which uses a table of countries, ...)). In combination with a street number think of it as geocodes (lat/lon), not "street names". Addresses are not shared among different tables (or rows). Changes to street names and zip codes cascade, other changes don't.

You would normally normalise the data as far as possible, so use the table 'Addresses'.
You can then use views to de-normalise the data afterwards. The views can make use of the underlying indexes and give you an easy way to reference the data, whilst leaving the underlying structure fully normalised.
The number of joins shouldn't be a major issue; index-based joins aren't much of an overhead.
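As a rough illustration of the view idea, here is a sketch on SQLite via Python's `sqlite3`. The `OrdersWithAddress` view name and the sample row are assumptions for the example; the tables are cut-down versions of the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Addresses (
    Id         INTEGER PRIMARY KEY,
    Address1   TEXT NOT NULL,
    City       TEXT NOT NULL,
    PostalCode TEXT NOT NULL
);
CREATE TABLE Orders (
    Id      INTEGER PRIMARY KEY,
    Address INTEGER NOT NULL REFERENCES Addresses(Id)
);
-- Hypothetical view: presents each order with its address columns inlined,
-- while the stored tables stay normalised.
CREATE VIEW OrdersWithAddress AS
SELECT o.Id AS OrderId, a.Address1, a.City, a.PostalCode
FROM Orders AS o
JOIN Addresses AS a ON a.Id = o.Address;

INSERT INTO Addresses (Id, Address1, City, PostalCode)
VALUES (1, '1600 Pennsylvania Avenue NW', 'Washington', '20500');
INSERT INTO Orders (Id, Address) VALUES (1000, 1);
""")

# Callers query the view as if the address lived on the order row.
flat_row = conn.execute(
    "SELECT OrderId, Address1, City, PostalCode FROM OrdersWithAddress WHERE OrderId = ?",
    (1000,),
).fetchone()
```

The join in the view goes through the primary key of Addresses, which is the kind of index-based join the answer has in mind.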

It's fine to have a split out addresses table.
However, you have to avoid the temptation of allowing multiple rows to refer to the same address unless you have an appropriate system that lets the user decide whether and how changing an address splits out a new row for the changed address. For example, you have the same address for billing and ship-to, and then a user says their address is changing. To start with, old orders might (should?) need their ship-to addresses retained, so you can't change it in place. But the user might also need to say that the address change only applies to the ship-to.

One should maintain master tables for City, State and Country. This way one can avoid different spellings for these entities, which might otherwise end up mapping the same city to a different state/country.
One can simply map the CityId in the address table as a foreign key as shown below, instead of having all three fields (City, State and Country) as plain text in the address table itself.
Address: {
    CityId
    // With other fields
}
City: {
    CityId
    StateId
    // Other fields
}
State: {
    StateId
    CountryId
    // Other fields
}
Country: {
    CountryId
    // Other fields
}
If one maintains all three ids (CityId, StateId and CountryId) in the address table, one still has to join against those tables in the end. Hence my suggestion would be to store only CityId and retrieve the rest of the required information through joins with the above table structure.
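The structure above might translate into something like the following. It is a sketch only, run on SQLite through Python's `sqlite3`; the `Name` and `Street` columns and the sample data are assumptions added to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Country (
    CountryId INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL UNIQUE
);
CREATE TABLE State (
    StateId   INTEGER PRIMARY KEY,
    CountryId INTEGER NOT NULL REFERENCES Country(CountryId),
    Name      TEXT NOT NULL,
    UNIQUE (CountryId, Name)      -- one spelling per state within a country
);
CREATE TABLE City (
    CityId  INTEGER PRIMARY KEY,
    StateId INTEGER NOT NULL REFERENCES State(StateId),
    Name    TEXT NOT NULL,
    UNIQUE (StateId, Name)        -- one spelling per city within a state
);
CREATE TABLE Address (
    AddressId INTEGER PRIMARY KEY,
    Street    TEXT NOT NULL,
    CityId    INTEGER NOT NULL REFERENCES City(CityId)  -- only the city is stored
);
INSERT INTO Country (CountryId, Name) VALUES (1, 'US');
INSERT INTO State (StateId, CountryId, Name) VALUES (1, 1, 'DC');
INSERT INTO City (CityId, StateId, Name) VALUES (1, 1, 'Washington');
INSERT INTO Address (AddressId, Street, CityId)
VALUES (1, '1600 Pennsylvania Avenue NW', 1);
""")

# State and country are derived from the city through the chain of joins.
full_address = conn.execute(
    """
    SELECT a.Street, ci.Name, s.Name, co.Name
    FROM Address AS a
    JOIN City    AS ci ON ci.CityId    = a.CityId
    JOIN State   AS s  ON s.StateId    = ci.StateId
    JOIN Country AS co ON co.CountryId = s.CountryId
    WHERE a.AddressId = ?
    """,
    (1,),
).fetchone()
```

Because an address carries only CityId, it cannot end up with a city in one state and a StateId pointing at another.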

I prefer to use an XREF table that contains a FK reference to the person/business table, a FK reference to the address table and, generally, a FK reference to a role table (HOME, OFFICE, etc) to delineate the actual type of address. I also include an ACTIVE flag to allow me to choose to ignore old address while preserving the ability to maintain an address history.
This approach allows me to maintain multiple addresses of varying types for each primary entity.
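Sketched out, such an XREF table could look like this. The table and column names (`PersonAddressXref`, `AddressRole`, `Active`) and the sample rows are assumptions for illustration; the sketch uses SQLite via Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person      (PersonId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Address     (AddressId INTEGER PRIMARY KEY, Line TEXT NOT NULL);
CREATE TABLE AddressRole (RoleId    INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
-- Hypothetical XREF table: who uses which address, in which role, and
-- whether that link is still current.
CREATE TABLE PersonAddressXref (
    PersonId  INTEGER NOT NULL REFERENCES Person(PersonId),
    AddressId INTEGER NOT NULL REFERENCES Address(AddressId),
    RoleId    INTEGER NOT NULL REFERENCES AddressRole(RoleId),
    Active    INTEGER NOT NULL DEFAULT 1 CHECK (Active IN (0, 1)),
    PRIMARY KEY (PersonId, AddressId, RoleId)
);
INSERT INTO Person (PersonId, Name) VALUES (1, 'Example Person');
INSERT INTO Address (AddressId, Line) VALUES
    (1, 'Old Home'), (2, 'New Home'), (3, 'Office');
INSERT INTO AddressRole (RoleId, Name) VALUES (1, 'HOME'), (2, 'OFFICE');
INSERT INTO PersonAddressXref (PersonId, AddressId, RoleId, Active) VALUES
    (1, 1, 1, 0),   -- previous home address, kept for history
    (1, 2, 1, 1),
    (1, 3, 2, 1);
""")

# Current addresses only: filter on the Active flag.
active_addresses = conn.execute(
    """
    SELECT r.Name, a.Line
    FROM PersonAddressXref AS x
    JOIN Address     AS a ON a.AddressId = x.AddressId
    JOIN AddressRole AS r ON r.RoleId    = x.RoleId
    WHERE x.PersonId = ? AND x.Active = 1
    ORDER BY r.Name
    """,
    (1,),
).fetchall()

# Full history: ignore the flag.
history_count = conn.execute(
    "SELECT COUNT(*) FROM PersonAddressXref WHERE PersonId = ?", (1,)
).fetchone()[0]
```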

Related

SQL Join to either table, Best way or alternative design

I am designing a database for a system and I came up with the following three tables
My problem is that an Address can belong to either a Person or a Company (or other things in the future). So how do I model this?
I discarded putting the address information in both tables (Person and Company) because it would be repeated.
I thought of adding two columns (PersonId and CompanyId) to the Address table and keeping one of them null, but then I would need to add one column for every future relation like this that appears (for example, an asset can have an address where it is located).
The last option that occurred to me was to create two columns, one called Type and the other Id, so that a pair of values would represent a single record in the target table, for example Type=Person,Id=5 and Type=Company,Id=9. This way I can join the right table using the type, and it will only be two columns no matter how many tables relate to this table. But I cannot have foreign key constraints, which reduces data integrity.
I don't know if I am designing this properly. I think this should be a common issue (I've faced it at least three times during this small design, in objects like contact information, etc.), but I could not find much information or examples that resemble mine.
Thanks for any guidance that you can give me.
There are several basic approaches you could take, depending on how much you want to future proof your system.
In general, has-one relationships are modeled by a foreign key on the owning entity, pointing to the primary key on the owned entity. So you would have an AddressId on both Company and Person, which would be a foreign key to Address.Id. The complexity in your case is how to handle the fact that a person can have multiple addresses. If you are 100% sure that there will only ever be a home and a work address, you could put two foreign key columns on Person, but this becomes a big problem if there's a third, fourth, fifth, etc. address. The other option is to create a join table, PersonAddress, with three columns: a PersonId, an AddressId and an AddressType that indicates whether it's a home, work or other kind of address.

Storing entities with dynamic set of properties in a table (and using fixed columns or key-value tables)

I would like to store people's information in the database (table). Every person can have a different set of properties. I would like to store all these properties. But creating fixed number of columns does not make my application scalable.
So, another approach is to store these values inside key-value tables which leads to tables with a few columns but a huge number of rows.
So I am wondering if there is another way of storing this information which is also easy and fast to query.
What database are you using?
One solution (if your DB supports it) could be storing the person's information in an XML format.
If you are stuck using SQL Server for this task, you can leverage its XML support. Notably, you can use XPath.
Using XPath Queries in SQLXML 4.0
No matter how your data is organized conceptually, it's the DBMS's job to make it fast, so usually you shouldn't have to worry about whether your query is fast.
The first and most obvious, but also the least recommended, way is using ALTER TABLE, which will allow you to modify the table later on.
The more recommended route is decomposing your table into many tables and relating them with primary keys.
So your person might have social security number as a primary key, and attributes first name, last name, street address, street name, postal code.
Then you would decompose the table into (in pseudocode)
Table Person: social
Table Address: social, street_address, street_name, postal_code
Table Info: social, first_name, last_name
Then you can join them with
Select Person.social, first_name, last_name [...etc...] from Person, Address, Info where Person.social = Address.social and Person.social = Info.social
Then you can keep adding more tables as needed. There's no problem with having lots of rows in the other tables and only a few in Person. This is the recommended way because it reduces the number of empty fields (for example, when a person doesn't have an address but does have a first name and last name), among many other reasons.

What would be the best schema to store the 'address' for different entities?

Suppose we're making a system where we have to store the address for buildings, persons, cars, etc.
The address 'format' should be something like:
State (From a State list)
County (From a County List)
Street (free text, like '5th Avenue')
Number (free text, like 'Chrysler Building, Floor 10, Office No. 10')
(Yes I don't live in U.S.A)
What would be the best way to store that info:
Should I have a Person_Address, Car_Address, ...
Or the address info should be in columns on each entity,
Could we have just one address table and try to link each row to a different entity?
Or are there another 'better' way to handle this type of scenario?
How would you do it?
I have seen scenarios where the Address is stored in an Address table and then there are many-to-many link tables which store links to addresses from People - there is a separate table for each so that foreign keys can be enforced. Sometimes the link table stores information about the relationship, like primary, ship-to, etc.
I've also seen it where the address is stored in the row of a customer. This results in effectively arrays of addresses for bill-to, ship-to, etc, and it's fine that way. Having dealt with both, I think I prefer having them in their own entities, it allows you to keep history of old inactive addresses pretty easily.
We've used this same technique for phone numbers, where people need to store varying numbers of phone numbers.
I would highly recommend reading 'Data Model Patterns - Conventions of Thought' by David C. Hay. This issue is discussed in depth by the author.
What you have in your design are two broad entities.
Address of a geographical location
A person/object that resides/belongs to the address
In general, it is not a good practice to combine the address with a person or objects' details in the same table like below
Person(personID, name, gender, addressline1, addressline2)
You could have the following entities in your design
Address(number, street, countyID,stateID)
Party(PartyID, Type)
Person(PersonID, name, dob, gender,...,primaryPartyID)
Car(carID, make, model, ...,primaryPartyID)
The Party is a link between a person/car and an address. The primaryPartyID columns in the Person and Car tables are foreign keys to the Party table. This way, you can share an address between a car and a person. In the event you want to store multiple addresses for each person, you could add a separate m:n table between Person and Party. The Type attribute of Party can take values such as 'Person', 'Vehicle', etc.
I'd say to have an AddressType field that is a lookup from a Drop-Down list

Best way to model Customer <--> Address

Every Customer has a physical address and an optional mailing address. What is your preferred way to model this?
Option 1. Customer has foreign key to Address
Customer (id, phys_address_id, mail_address_id)
Address (id, street, city, etc.)
Option 2. Customer has one-to-many relationship to Address, which contains a field to describe the address type
Customer (id)
Address (id, customer_id, address_type, street, city, etc.)
Option 3. Address information is de-normalized and stored in Customer
Customer (id, phys_street, phys_city, etc. mail_street, mail_city, etc.)
One of my overriding goals is to simplify the object-relational mappings, so I'm leaning towards the first approach. What are your thoughts?
I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.
If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach
Customer (id, phys_address_id)
Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
Address (id, street, city, etc.)
One important fact you may need to consider (depending on your problem domain) is that people change addresses, and may want to let you know in advance of their address change; this is certainly true for utility companies, telcos, etc.
In this case you need to have a way to store multiple addresses for the customer with validity dates, so that the address can be set up in advance and automatically switch at the correct point. If this is a requirement, then a variation on (2) is the only sensible way to model it, e.g.
Customer (id, ...)
Address (id, customer_id, address_type, valid_from, valid_to)
On the other hand, if you don't need to cater for this (and you're sure you won't in the future) then probably (1) is simpler to manage because it's much easier to maintain data integrity as there's no issues with ensuring only one address of the same type exists, and the joins become simpler as they're only on one field.
So either (1) or (2) are fine depending on whether you need house-moves, but I'd steer clear of (3) because you're then repeating the definition of what an address is in the table, and you'll have to add multiple columns if you change what an address looks like. It's possibly slightly more performant, but to be honest when you're dealing with properly indexed joins in a relational database there isn't a lot to be gained, and it's likely to be slower in some scenarios where you don't need the address as the record size for a customer will be larger.
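The validity-date variation of option (2) described above can be sketched as follows. This is an illustration under assumptions: SQLite via Python's `sqlite3`, ISO-8601 date strings (which compare correctly as text), a half-open interval where `valid_to` is exclusive and NULL means "until further notice", and a hypothetical `address_on` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (id INTEGER PRIMARY KEY);
CREATE TABLE Address (
    id           INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES Customer(id),
    address_type TEXT NOT NULL,
    street       TEXT NOT NULL,
    valid_from   TEXT NOT NULL,   -- ISO-8601 date, inclusive
    valid_to     TEXT             -- ISO-8601 date, exclusive; NULL = open-ended
);
INSERT INTO Customer (id) VALUES (1);
INSERT INTO Address (id, customer_id, address_type, street, valid_from, valid_to) VALUES
    (1, 1, 'physical', '10 King Street', '2007-01-01', '2008-06-01'),
    (2, 1, 'physical', '35 State St',    '2008-06-01', NULL);
""")


def address_on(conn, customer_id, address_type, on_date):
    """Return the street valid for the customer on the given date, or None."""
    row = conn.execute(
        """
        SELECT street
        FROM Address
        WHERE customer_id = ?
          AND address_type = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        """,
        (customer_id, address_type, on_date, on_date),
    ).fetchone()
    return row[0] if row else None


before_move = address_on(conn, 1, "physical", "2008-01-15")
on_move_day = address_on(conn, 1, "physical", "2008-06-01")
before_any = address_on(conn, 1, "physical", "2006-01-01")
```

The second row can be inserted well before the move happens; the switch then takes effect purely through the date comparison. Nothing in this sketch prevents overlapping validity ranges for the same customer and type, which is the integrity concern the answer raises about option (2).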
We are moving forward with a model like this:
Person (id, given_name, family_name, title, suffix, birth_date)
Address (id, culture_id, line1, line2, city, state, zipCode, province, postalCode)
AddressType (id, descriptiveName)
PersonAddress (person_id, address_id, addressType_id, activeDates)
Most may consider this excessive. However, an undeniable common theme amongst the apps we develop is that they will have some of these fundamental entities - People, Organizations, Addresses, Phone Numbers, etc.. - and they all want to combine them in different ways. So, we're building in some generalization up-front that we are 100% certain we have use cases for.
The Address table will follow a table-per-hierarchy inheritance scheme to differentiate addresses based on culture; so a United States address will have a state and zip field, but Canadian addresses will have a province and postal code.
We use a separate connecting table to "give" a person an address. This keeps our other entities - Person & Address - free from ties to other entities when our experience is this tends to complicate matters down the road. It also makes it far simpler to connect Address entities to many other types of entities (People, Organizations, etc.) and with different contextual information associated with the link (like activeDates in my example).
The second option would probably be the way I would go. On the off-chance that you want to let users add additional addresses, it would also let them switch between those addresses at will for shipping and such.
I'd prefer #1. Good normalization and communicates intent clearly. This model also allows the same address object (row) to be used for both addresses, something I have found to be quite valuable. It's far too easy to get lost in duplicating this information too much.
When answering those kinds of questions I like to use the classifications of DDD. If it's an Entity it should have a separate ID; if it's a value object it should not.
Option 3 is too restrictive, and option 1 cannot be extended to allow for other address types without changing the schema.
Option 2 is clearly the most flexible and therefore the best choice.
In most code I write nowadays, every customer has one and only one physical location. This is the legal entity being our business partner. Therefore I put street, city, etc. in the customer object/table. Often this is the simplest possible thing that works, and it works.
When an additional mailing address is needed, I put it in a separate object/table so as not to clutter the customer object too much.
Earlier in my career I normalized like mad, having an order reference a customer which references a shipping address. This made things "clean" but slow and inelegant to use. Nowadays I use an order object which just contains all the address information. I actually consider this more natural, since a customer might change his (default?) address, but the address of a shipment sent in 2007 should always stay the same, even if the customer moves in 2008.
We currently implement the VerySimpleAddressProtocol in our project to standardize the fields used.
I'd go for the first option. In these situations I'm very mindful of YAGNI (you aren't going to need it). I can't count the number of times I've looked at schemas, many years old, that have had one-to-many tables "just in case". If you only need two, just use the first option; if the requirement changes in the future, change it then.
Like in many cases: It depends.
If your customers deal with multiple addresses then a to-many relationship would be appropriate. You could introduce a flag on address that signals if an address is for shipment or bill, etc. Or you store the different address types in different tables and have multiple to-one relationships on a customer.
In cases where you only need to know one address of a customer why would you model that to-many? A to-one relationship would satisfy your needs here.
Important: Denormalize only if you encounter performance issues.
I would go with option 1. If you want to, you could even modify it a little bit to keep an address history:
Customer (id, phys_address_id, mail_address_id)
Address (id, customer_id, start_dt, end_dt, street, city, etc.)
If the address changes, just end date the current address and add a new record in the Address table. The phys_address_id and mail_address_id always point to the current address.
That way you can keep a history of addresses, you could have multiple mailing addresses stored in the database (with the default in mail_address_id), and if the physical address and mailing address are identical you'll just point phys_address_id and mail_address_id at the same record.
Good thread. I have spent a while contemplating the most suitable schema and I have concluded that quentin-starin's solution is the best except I have added start_date and end_date fields to what would be his PersonAddress table. I have also decided to add notes, active and deleted.
deleted is for soft delete functionality as I think I do not want to lose trace of previous addresses simply by deleting the record from the junction table. I think that is quite wise and something others may want to consider. If not done this way, it could be left to revision of paper or electronic documents to try to trace address information (something best avoided).
notes I think of being something of a requirement but that might just be preference. I've spent time in backfill exercises verifying addresses in databases and some addresses can be very vague (such as rural addresses) that I think it is very useful to at least allow notes about that address to be held in the record address.
One thing I would like to hear opinions on is the unique indexing of the address table (again, referring to the table of the same name in quentin-starin's example). Do you think a unique index should be enforced (presumably as a compound index across all not-null/required fields)? This would seem sensible, but it might still be hard to stop duplicate data regardless, as postal/zip codes are not always unique to a single property. Even if the country, province and city fields are populated from reference data (which they are in my model), spelling differences in the address lines may stop duplicates from matching. The best way to avoid this might be to run one or more DB queries against the incoming form fields to see whether a possible duplicate exists. Another safety measure would be to give the user the option of selecting from addresses in the database already linked to that person and using that to auto-populate. I think this might be a case where you can only be sensible and take precautions to stop duplication, but just accept it can (and probably will) happen sooner or later.
The other very important aspect of this for me is future editing of the address table records. Let's say you have 2 people both listed at:
11 Whatever Street
Whatever City
Z1P C0D3
Should it not be considered dangerous to allow the same address table record to be assigned to different entities (person, company)? Then let's say the user realises one of these people lives at 111 Whatever Street and there is a typo. If you change that address, it will change it for both of the entities. I would like to avoid that. My suggestion would be to have the model in the MVC (in my case, PHP Yii2) look for existing address records when a new address is being created known to be related to that customer (SELECT * FROM address INNER JOIN personaddress ON personaddress.address_id = address.id WHERE personaddress.person_id = {current person being edited ID}) and provide the user the option of using that record instead (as was essentially suggested above).
I feel linking the same address to multiple different entities is just asking for trouble: you either have to refuse later editing of the address record (impractical) or risk that a future edit corrupts data related to entities other than the one whose address record is being edited.
I would love to hear people's thoughts.

Address book DB schema

I need to store contact information for users. I want to present this data on the page as an hCard and downloadable as a vCard. I'd also like to be able to search the database by phone number, email, etc.
What do you think is the best way to store this data? Since users could have multiple addresses, etc complete normalization would be a mess. I'm thinking about using XML, but I'm not familiar with querying XML db fields. Would I still be able to search for users by contact info?
I'm using SQL Server 2005, if that matters.
Consider two tables for People and their addresses:
People (pid, prefix, firstName, lastName, suffix, DOB, ... primaryAddressTag )
AddressBook (pid, tag, address1, address2, city, stateProv, postalCode, ... )
The Primary Key (that uniquely identifies each and every row) of People is pid. The PK of AddressBook is the composition of pid and tag (pid, tag).
Some example data:
People
1, Kirk
2, Spock
AddressBook
1, home, '123 Main Street', Iowa
1, work, 'USS Enterprise NCC-1701'
2, other, 'Mt. Selaya, Vulcan'
In this example, Kirk has two addresses: one 'home' and one 'work'. One of those two can (and should) be noted as a foreign key (like a cross-reference) in People in the primaryAddressTag column.
Spock has a single address with the tag 'other'. Since that is Spock's only address, the value 'other' ought to go in the primaryAddressTag column for pid=2.
This schema has the nice effect of preventing the same person from duplicating any of their own addresses by accidentally reusing tags, while at the same time allowing all other people to use any address tags they like.
Further, with FK references in primaryAddressTag, the database system itself will enforce the validity of the primary address tag (via something we database geeks call referential integrity) so that your -- or any -- application need not worry about it.
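To make the composite key and the primaryAddressTag reference concrete, here is a small sketch. It runs on SQLite via Python's `sqlite3` rather than SQL Server 2005, keeps only a few of the columns, and the `is_rejected` helper is hypothetical. The primary tag is left NULL until the person's addresses exist, which is one way (an assumption of this sketch) to handle the circular reference between the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE People (
    pid               INTEGER PRIMARY KEY,
    lastName          TEXT NOT NULL,
    primaryAddressTag TEXT,   -- NULL until the person has an address
    FOREIGN KEY (pid, primaryAddressTag) REFERENCES AddressBook (pid, tag)
);
CREATE TABLE AddressBook (
    pid      INTEGER NOT NULL REFERENCES People (pid),
    tag      TEXT NOT NULL,
    address1 TEXT NOT NULL,
    PRIMARY KEY (pid, tag)    -- a tag is unique per person, not globally
);
INSERT INTO People (pid, lastName) VALUES (1, 'Kirk'), (2, 'Spock');
INSERT INTO AddressBook (pid, tag, address1) VALUES
    (1, 'home',  '123 Main Street'),
    (1, 'work',  'USS Enterprise NCC-1701'),
    (2, 'other', 'Mt. Selaya, Vulcan');
UPDATE People SET primaryAddressTag = 'home'  WHERE pid = 1;
UPDATE People SET primaryAddressTag = 'other' WHERE pid = 2;
""")


def is_rejected(sql):
    """Return True if the statement violates a key or foreign key constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Kirk cannot reuse his own 'home' tag ...
duplicate_tag_rejected = is_rejected(
    "INSERT INTO AddressBook (pid, tag, address1) VALUES (1, 'home', 'Somewhere else')"
)
# ... Spock's primary tag must name one of Spock's own addresses ...
dangling_primary_rejected = is_rejected(
    "UPDATE People SET primaryAddressTag = 'work' WHERE pid = 2"
)
# ... but Spock is free to use the tag 'home' for himself.
other_person_same_tag_rejected = is_rejected(
    "INSERT INTO AddressBook (pid, tag, address1) VALUES (2, 'home', 'Another address')"
)
```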
Why would complete normalization "be a mess"? This is exactly the kind of thing that normalization makes less messy.
Don't be afraid of normalizing your data. Normalization, like John mentions, is the solution not the problem. If you try to denormalize your data just to avoid a couple joins, then you're going to cause yourself serious trouble in the future. Trying to refactor this sort of data down the line after you have a reasonable size dataset WILL NOT BE FUN.
I strongly suggest you check out Highrise from 37signals. It was recently recommended to me when I was looking for an online contact manager. It does so much right. Actually, my only objection so far with the service is that I think the paid versions are too expensive, and that's all.
As things stand today, I do not fit into a flat address profile. I have 4-5 e-mail addresses that I use regularly, 5 phone numbers, 3 addresses, several websites and IM profiles, all of which I would include in my contact profile. If you're starting to build a contact management system now and you're unencumbered by architectural limitations (think Gmail contacts being keyed to a single email address), then do your users a favor and make your contact structure as flexible (normalized) as possible.
Cheers, -D.
I'm aware of SQLite, but that doesn't really help - I'm talking about figuring out the best schema (regardless of the database) for storing this data.
Per John, I don't see what the problem with a classic normalised schema would be. You haven't given much information to go on, but you say that there's a one-to-many relationship between users and addresses, so I'd plump for a bog standard solution with a foreign key to the user in the address relation.
If you assume each user has one or more addresses, telephone numbers, etc., you could have a 'Users' table and an 'Addresses' table (containing a primary key and a non-unique reference to Users), and the same for phone numbers. Allowing multiple rows with the same UserID foreign key makes querying 'all addresses for user X' quite simple.
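As a rough illustration of that layout, here is a small sketch using Python's sqlite3 module as a throwaway database. Only the 'Users' and 'Addresses' table names and the UserID key come from the text above; PhoneNumbers, the other columns, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Users (
    UserID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE Addresses (
    AddressID INTEGER PRIMARY KEY,
    UserID    INTEGER NOT NULL REFERENCES Users (UserID),  -- non-unique: many rows per user
    Street    TEXT NOT NULL,
    City      TEXT NOT NULL
);
CREATE TABLE PhoneNumbers (
    PhoneID INTEGER PRIMARY KEY,
    UserID  INTEGER NOT NULL REFERENCES Users (UserID),
    Number  TEXT NOT NULL
);
""")

# Hypothetical sample data: one user with two addresses and two phone numbers.
conn.execute("INSERT INTO Users VALUES (1, 'User X')")
conn.executemany(
    "INSERT INTO Addresses (UserID, Street, City) VALUES (?, ?, ?)",
    [(1, "1 Home St", "Springfield"), (1, "2 Work Ave", "Shelbyville")],
)
conn.executemany(
    "INSERT INTO PhoneNumbers (UserID, Number) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)
conn.commit()

def addresses_for(user_id):
    """Return every (Street, City) row belonging to one user."""
    rows = conn.execute(
        "SELECT Street, City FROM Addresses WHERE UserID = ? ORDER BY AddressID",
        (user_id,),
    )
    return rows.fetchall()

print(addresses_for(1))
```

'All addresses for user X' is then a single-table filter on the foreign key; a join is only needed when user columns are wanted in the same result.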
I don't have a script, but I do have MySQL that you can use. Before that I should mention that there seem to be two logical approaches to storing vCards in SQL:
Store the whole card, let the database search (possibly) huge text strings, and process them in another part of your code or even client-side, e.g.
CREATE TABLE IF NOT EXISTS vcards (
name_or_letter varchar(250) NOT NULL,
vcard text NOT NULL,
timestamp timestamp default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
PRIMARY KEY (name_or_letter)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
This is probably easy to implement (depending on what you are doing with the data), though your searches are going to be slow if you have many entries.
If this is just for you then this might work (if it is any good then it is never just for you). You can then process the vCard client-side or server-side using some beautiful module that you share (or that someone else shared with you).
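To make the whole-card approach concrete, here is a hedged sketch in Python with sqlite3 standing in for MySQL. The sample card, the search helper, and the deliberately naive parse function are all assumptions for illustration; a real vCard parser also has to handle line folding, escaping, and encodings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vcards (name_or_letter TEXT PRIMARY KEY, vcard TEXT NOT NULL)"
)

# Hypothetical card stored as one opaque text blob.
card = "\n".join([
    "BEGIN:VCARD",
    "VERSION:3.0",
    "FN:Jane Example",
    "EMAIL;TYPE=work:jane@example.com",
    "END:VCARD",
])
conn.execute("INSERT INTO vcards VALUES (?, ?)", ("jane", card))
conn.commit()

def search(term):
    """Substring search over the raw card text; a leading wildcard forces a full scan."""
    rows = conn.execute(
        "SELECT name_or_letter, vcard FROM vcards WHERE vcard LIKE ?",
        (f"%{term}%",),
    )
    return rows.fetchall()

def parse(vcard_text):
    """Naive client-side parse: split each 'NAME;params:value' line into (NAME, value)."""
    fields = []
    for line in vcard_text.splitlines():
        head, _, value = line.partition(":")
        fields.append((head.split(";")[0].upper(), value))
    return fields

for name, text in search("example.com"):
    print(name, parse(text))
```

The database only ever sees one text column, which is why this is simple to set up and why every search has to read every card.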
I've watched vCard evolve and know that there is going to be some change at /some/ time in the future, so I use three tables.
The first is the card, (this mostly links back to my existing tables - if you don't need this then yours can be a cut down version).
The second are the card definitions, (which seem to be called profile in vCard speak).
The last is all the actual data for the cards.
Because I let DBIx::Class (yes, I'm one of those) do all of the database work, this three-table layout seems to work rather well for me (though obviously you can tighten up the types to match RFC 2426 more closely, but for the most part each piece of data is just a text string).
The reason that I don't normalize out the address from the person is that I already have an address table in my database and these three are just for non-user contact details.
CREATE TABLE `vCards` (
`card_id` int(255) unsigned NOT NULL AUTO_INCREMENT,
`card_peid` int(255) DEFAULT NULL COMMENT 'link back to user table',
`card_acid` int(255) DEFAULT NULL COMMENT 'link back to account table',
`card_language` varchar(5) DEFAULT NULL COMMENT 'en en_GB',
`card_encoding` varchar(32) DEFAULT 'UTF-8' COMMENT 'why use anything else?',
`card_created` datetime NOT NULL,
`card_updated` datetime NOT NULL,
PRIMARY KEY (`card_id`) )
ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='These are the contact cards';
create table vCard_profile (
vcprofile_id int(255) unsigned auto_increment NOT NULL,
vcprofile_version enum('rfc2426') DEFAULT "rfc2426" COMMENT "defaults to vCard 3.0",
vcprofile_feature char(16) COMMENT "FN to CATEGORIES",
vcprofile_type enum('text','bin') DEFAULT "text" COMMENT "if it is too large for vcd_value then use vcd_bin",
PRIMARY KEY (`vcprofile_id`)
) COMMENT "These are the valid types of card entry";
-- Name the columns and let AUTO_INCREMENT fill vcprofile_id (an empty string for the id is rejected in strict SQL mode).
INSERT INTO vCard_profile (vcprofile_version, vcprofile_feature, vcprofile_type) VALUES
('rfc2426','FN','text'),('rfc2426','N','text'),('rfc2426','NICKNAME','text'),('rfc2426','PHOTO','bin'),
('rfc2426','BDAY','text'),('rfc2426','ADR','text'),('rfc2426','LABEL','text'),('rfc2426','TEL','text'),
('rfc2426','EMAIL','text'),('rfc2426','MAILER','text'),('rfc2426','TZ','text'),('rfc2426','GEO','text'),
('rfc2426','TITLE','text'),('rfc2426','ROLE','text'),('rfc2426','LOGO','bin'),('rfc2426','AGENT','text'),
('rfc2426','ORG','text'),('rfc2426','CATEGORIES','text'),('rfc2426','NOTE','text'),('rfc2426','PRODID','text'),
('rfc2426','REV','text'),('rfc2426','SORT-STRING','text'),('rfc2426','SOUND','bin'),('rfc2426','UID','text'),
('rfc2426','URL','text'),('rfc2426','VERSION','text'),('rfc2426','CLASS','text'),('rfc2426','KEY','bin');
create table vCard_data (
vcd_id int(255) unsigned auto_increment NOT NULL,
vcd_card_id int(255) NOT NULL,
vcd_profile_id int(255) NOT NULL,
vcd_prof_detail varchar(255) COMMENT "work,home,preferred,order for e.g. multiple email addresses",
vcd_value varchar(255),
vcd_bin blob COMMENT "for when varchar(255) is too small",
PRIMARY KEY (`vcd_id`)
) COMMENT "The actual vCard data";
This isn't the best SQL but I hope that helps.
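For what it's worth, here is one way the three tables might be read back into vCard-style lines. This is a sketch only: it uses Python's sqlite3 with simplified column types rather than the MySQL DDL above, keeps just a subset of the columns, and assumes vcd_prof_detail holds a single TYPE value. The sample card and the card_lines helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vCards (card_id INTEGER PRIMARY KEY, card_language TEXT);
CREATE TABLE vCard_profile (
    vcprofile_id      INTEGER PRIMARY KEY,
    vcprofile_feature TEXT NOT NULL,
    vcprofile_type    TEXT NOT NULL DEFAULT 'text'
);
CREATE TABLE vCard_data (
    vcd_id          INTEGER PRIMARY KEY,
    vcd_card_id     INTEGER NOT NULL,
    vcd_profile_id  INTEGER NOT NULL,
    vcd_prof_detail TEXT,
    vcd_value       TEXT
);
INSERT INTO vCard_profile (vcprofile_id, vcprofile_feature) VALUES (1, 'FN'), (2, 'EMAIL');
INSERT INTO vCards (card_id, card_language) VALUES (1, 'en_GB');
INSERT INTO vCard_data (vcd_card_id, vcd_profile_id, vcd_prof_detail, vcd_value) VALUES
    (1, 1, NULL,   'Jane Example'),
    (1, 2, 'work', 'jane@work.example'),
    (1, 2, 'home', 'jane@home.example');
""")

def card_lines(card_id):
    """Join data rows to their profile names and render 'NAME[;TYPE=x]:value' lines."""
    rows = conn.execute(
        """
        SELECT p.vcprofile_feature, d.vcd_prof_detail, d.vcd_value
        FROM vCard_data AS d
        JOIN vCard_profile AS p ON p.vcprofile_id = d.vcd_profile_id
        WHERE d.vcd_card_id = ?
        ORDER BY d.vcd_id
        """,
        (card_id,),
    )
    lines = []
    for feature, detail, value in rows:
        name = f"{feature};TYPE={detail}" if detail else feature
        lines.append(f"{name}:{value}")
    return lines

print("\n".join(card_lines(1)))
```

Because every value is its own row, a repeated property such as EMAIL needs no schema change, and a new vCard feature is just another row in vCard_profile.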