db design critic /suggestions - sql

I am redesigning a pharmacy db system and need inputs to see if the new design is optimal or requires tweaking.
Here's a snapshot of the old system..
As can be seen, the pharmacies table stores pharmacy information, along with its address and contact information. Pharmacies are grouped together for invoicing purposes(pharmacygroup) or for sales, advertsing other purposes (banner group). The invoice group may have a different physical address, different contact information.
Here's my new design. I have split the address from both the pharmacy and pharmacygroup table into a table of its own and made a new table for contacts. Their could be technical contacts, account contacts, owner contacts etc, hence the contacttypes table. The pharmacy and the pharmacygroup can have separate contact info, I thought of making a single contact table and have a 'linktype' and 'linkid' column to indicate if its a pharmacy contact or pharmacy group contact, but I am not sure if this is a right approach. Is this a good design or will it be costly in terms of data retrieval because of the number of joins?? Another thing I noted that , is in the old design , they didn't create any foreign key constraints, although the pharmacy table had groupid and bannergroupid references for pharmacygroup and bannergroup, possibly to save time for data retrieval. Is this a good approach?

Your design looks good to me. I always prefer to have a couple of extra joins on the design step over spending time reorganizing data after system went into production. You never know in advance what kind of reports will be requested by management/sales/financial people, and proper relational design will give you more freedom.
Also, you cannot blame only a couple of extra JOINs for your performance issues. You should always look at:
data volumes (and physical data layout),
transaction amount and density,
I/O, CPU, memory usage,
your RDBMS configuration,
SQL queries quality.
In my view, JOINs will be on the bottom of this list.
As to the RI constraints (Referential Integrity), I've seen a couple of projects that had been running without any Primary/Foreign keys for increased performance. The main excuse was: we have all checks embedded into the Application and Application is the only source of any changes in the system. On the other hand, they agreed, that it is not known, whether systems were in a consistent state (in fact, analysis showed they were not).
I always stick to creating all possible keys/constraints on the design state, as there always will be some “cowboys” around, who will dig into your database and “adjust” data they seem fits better. Still, you might want to temporarily disable or even drop some constraints/indexes for the bulk data manipulations, which is also an official recommendation.
If uncertain, create 2 test databases, one with and another without constraints. Load some data and compare query performance. I think it will be similar.
And here my comments on your sketches, decisions are all yours.
You might want to create a common contacts table the same way you did for addresses, i.e. add contact_id, owner_contact_id, etc. columns to the target relations instead of referencing relations from contacts table;
As you have only one column in contacttype table (and in case you'll have a common contacts), it's better to move the only field away and avoid this table;
You seem to have mixture of singular/plural names for your tables, better to stick to a common pattern here. I personally prefer singular;
In pharmacygroup your PK is named id, while all the rest PKs follow tableid pattern, it will be easier to write scripts later if you'll use a common pattern here;
In addresses table you have fields with underscores, like street_name, while elsewhere you avoid _ — consider making it common;
References are named differently. Although it is not so highly important, I do have a couple of systems where I have to rely on the constraints' names, so it's better to use some pattern here. I use the following one:
prefix p_, f_, c_, t_, u_ or i_ for primary, foreign keys, check constraints, triggers, unique and other indexes;
name of the table;
name of the column constraint/index/trigger refers to.
Why I prefer naming tables in singular form? Because I always name PK using table_id pattern, and IMHO pharmacy_id looks better then pharmacies_id. I use this approach as I have a bunch of general-purpose scripts which relies on this pattern when performing data consistency checks prior to loading it into the main tables.
EDIT:
More on contacts.
You can use contact_id in all your tables, making it a primary contact, whatever this might mean in your application. Should you need more contacts to be there for some relations, then you can go with different prefixes, like owner_contact_id, sales_contact_id, etc.
In case you expect a huge number of contacts to be there for some relations, like pharmacygroup, then you will can add an extra table like this:
CREATE TABLE pharmacygroupcontact (
contactid int4,
groupid int4,
contact_desc text
);
It partially copies your initial groupcontacts, but consists of two FKs and a description.
Which approach is better I cannot tell as I'm not aware how Application is designed.

You have 2 contact tables, I would create one, then use linking tables to link groupcontacts and pharmacycontacts. I would definitely want to have the FK and PK relationships setup to.

Related

Should I use one-to-one relationships to avoid repetition even if the new table wasn't really needed?

I'm designing a database from scratch and I'm wondering if the way I'm using one-to-one relationships is correct.
Imagine I have a table that needs the columns city and country_id, the first being alphanumeric and the second being a foreign key to another table. Should I place these in a locations table and use a one-to-one relationship?
Another example:
I have a table with the factory information of a device like the serial number and other fields. These will later be used to register a device in another table. Of course this is a one-to-one relationship, but should the columns of the first table be in the second table instead? Have in mind that the registrations table has another 4-5 columns.
I've read a lot of times that these relationships can often be omitted. However, I like the separation of concerns that creating a new table can give, in some cases.
Thanks in advance!
This may be a duplicate question, e.g. see:
SQL one to one relationship vs. single table
There's no perfect answer to this question as it depends on use case, but here's the rule of thumb I recommend abiding by: if you can already envision the potential need for separate tables, then I would err on the side of splitting them and using a 1:1 relationship. For example, imagine in the future that you want to have some kind of one-to-many relationship between some new table and the country table, or between some new table and the device table. In these cases you probably don't want city information mixed up with the former, and you probably don't want device registration information mixed up with the latter. By keeping your DB schema normalized, you can better future-proof it and you can mitigate the need for (likely extremely painful) updates that may have otherwise cropped up.

SQL one to one relationship vs. single table

Consider a data structure such as the below where the user has a small number of fixed settings.
User
[Id] INT IDENTITY NOT NULL,
[Name] NVARCHAR(MAX) NOT NULL,
[Email] VNARCHAR(2034) NOT NULL
UserSettings
[SettingA],
[SettingB],
[SettingC]
Is it considered correct to move the user's settings into a separate table, thereby creating a one-to-one relationship with the users table? Does this offer any real advantage over storing it in the same row as the user (the obvious disadvantage being performance).
You would normally split tables into two or more 1:1 related tables when the table gets very wide (i.e. has many columns). It is hard for programmers to have to deal with tables with too many columns. For big companies such tables can easily have more than 100 columns.
So imagine a product table. There is a selling price and maybe another price which was used for calculation and estimation only. Wouldn't it be good to have two tables, one for the real values and one for the planning phase? So a programmer would never confuse the two prices. Or take logistic settings for the product. You want to insert into the products table, but with all these logistic attributes in it, do you need to set some of these? If it were two tables, you would insert into the product table, and another programmer responsible for logistics data would care about the logistic table. No more confusion.
Another thing with many-column tables is that a full table scan is of course slower for a table with 150 columns than for a table with just half of this or less.
A last point is access rights. With separate tables you can grant different rights on the product's main table and the product's logistic table.
So all in all, it is rather rare to see 1:1 relations, but they can give a clearer view on data and even help with performance issues and data access.
EDIT: I'm taking Mike Sherrill's advice and (hopefully) clarify the thing about normalization.
Normalization is mainly about avoiding redundancy and relateded lack of consistence. The decision whether to hold data in only one table or more 1:1 related tables has nothing to do with this. You can decide to split a user table in one table for personal information like first and last name and another for his school, graduation and job. Both tables would stay in the normal form as the original table, because there is no data more or less redundant than before. The only column used twice would be the user id, but this is not redundant, because it is needed in both tables to identify a record.
So asking "Is it considered correct to normalize the settings into a separate table?" is not a valid question, because you don't normalize anything by putting data into a 1:1 related separate table.
Creating a new table with 1-1 relationships is not a reasonable solution. You might need to do it sometimes, but there would typically be no reason to have two tables where the user id is the primary key.
On the other hand, splitting the settings into a separate table with one row per user/setting combination might be a very good idea. This would be a three-table solution. One for users, one for all possible settings, and one for the junction table between them.
The junction table can be quite useful. For instance, it might contain the effective and end dates of the setting.
However, this assumes that the settings are "similar" to each other, in a SQL sense. If the settings are different such as:
Preferred location as latitude/longitude
Preferred time of day to receive an email
Flag to be excluded from certain contacts
Then you have a data-type problem when storing them in a table. So, the answer is "it depends". A lot of the answer depends on what the settings look like, how they will be used, and the type of constraints on them.
You're all wrong :) Just kidding.
On a very high load, high volume, heavily updated system splitting a table by 1:1 helps optimize I/O.
For example, this way you can place heavily read columns onto separate physical hard-drives to speed-up parallel reads (the 1-1 tables have to be in different "filegroups" for this). Or you can optimize table-level locks. Etc. Etc.
But this type of optimization usually does not happen until you have millions of rows and huge read/write concurrency
Splitting tables into distinct tables with 1:1 relationships between them is usually not practiced, because :
If the relationship is really 1:1, then integrity enforcement boils down to "inserts being done in all concerned tables, or none at all". Achieving this on the server side requires systems that support deferred constraint checking, and AFAIK that's a feature of the rather high-end systems. So in many cases the 1:1 enforcement is pushed over to the application side, and that approach has its own obvious downsides.
A case when splitting tables is nonetheless advisable, is when there are security perspectives, i.e. when not all columns can be updated by one user. But note that by definition, in such cases the relationship between the tables can never be strictly 1:1 .
(I also suggest you read carefully the discussion between Thorsten/Mike. You used the word 'normalization' but normalization has very little to do with your scenario - except if you were considering 6NF, which I think is rather unlikely.)
It makes more sense that your settings are not only in a separate table, but also use a on-to-many relationship between the ID and Settings. This way, you could potentially have a as many (or as few) setting as required.
UserSettings
[Settings_ID]
[User_ID]
[Settings]
In fact, one could make the same argument for the [Email] field.

Should I include user_id in multiple tables?

I'm at the planning stages of a multi-user application where each user will only have access their own data. There'll be a few tables that relate to each other, so I could use JOINs to ensure they're accessing only their data, but should I include user_id in each table? Would this be faster? It would certainly make some of the queries easier in the long run.
Specifically, the question is about multiple tables containing the user_id field.
For example, each user can configure categories, items (in those categories), and sub-items against those items. There's a logical path from user, to sub-items through the other tables, but it would require 3 JOINs. Should I just include user_id in all the tables?
Thanks!
This is a design decision in multi-tenant databases. With "root" tables, obviously you have to have the user_id. But in the non-"root" tables, you do have a choice when you are using surrogate PKs.
Say you have users with projects and projects with actions. Projects obviously has to have a user_id, but if actions are tied to one and only one project, then the user_id is redundant, and also violates normal form, since if it was to move to another user's project (probably not likely in your use cases), both the project FK and the user FK would have to be updated. Typically in multi-tenant scenarios, this isn't really a possible scenario, and so the primary key of every table is really a combination of tenant and a unique primary key "within" the tenant (which may also happen to be globally unique).
If you use natural keys extensively in your design, then clearly tenant+natural key is necessary so that each tenant's natural keys can be used. It's only when using surrogates like IDENTITY or GUIDs or sequences, that this becomes an issue, since it is tempting to make the IDENTITY the PK, after all, it is unique by definition.
Having the user_id in all tables does allow you to do certain things in views to enhance security (defense in depth), giving you a little bit of defensive programming (in SQL Server you can restrict all access through inline table valued function - essentially parametrized views - which require the app to specify user_id on every "table" access), and also allows you to easily scale out to multiple databases by forklifting everything on shared keys.
See this article for some interesting insights.
(In a massively multi-parallel paradigm like Teradata, the PRIMARY INDEX determines the amp on which the data lives, so I would think that this is a must to stop redistribution of rows to the other amps.)
In general, I would say you have a tenantid in each table, it should be the first column in the table, in most indexes and should be part of the primary key in most cases, unless otherwise justified. Where possible, it should be a required parameter in most stored procedures.
Generally, you use foreign keys to relate data between tables. In many cases, this foreign key is the user id. For example:
users
id
name
phonenumbers
user_id
phonenumber
So yes, that'd make perfect sense.
If a category can only belong to one user then yes, you need to include the user_id in the category table. If a category can belong to multiple people then you would have a separate table that maps category IDs to user IDs. You can still do this if you have a one to one mapping between the two, but there is no real reason for it.
You don't need to include the user_id in further tables if you can guarantee that those child tables will always be accessed via joining to the category table. If there is a chance that you will access them independantly of the category table then you should also have the user_id on those tables.
The extent to which to normalize can be a difficult decision. One of the best StackOverflow answers on this topic (Database Development Mistakes Made by App Developers) warns against both (1) failing to normalize, and (2) over-normalizing.
You mention that it might be easier "in the long run" to repeat the same data in multiple tables (that is, not to normalize that data). Look at the "Not simplifying complex queries through views" topic in the previous link. If you use views effectively, you will only have to do the 3 join query once when writing the view and then you can use a query with no joins for most purposes.
Most developers tend to under-normalize because it seems simpler. Go ahead and normalize. Use views to simplify your daily queries. When your requiremens get more complex or you decide to add features, you will be glad that you put time into a relational database design.
Alternatively, depending on your toolset, you may want to use a database abstraction layer that does the relational design under the covers while you manipulate higher level data object.
if it is Oracle, then you would probably set up a fine grained security rule to do the joins and prevent certain activities based on the existence of the original user id... (SELECT INSERT UPDATE DELETE etc)
You would need a map between the logged in user and the user_id. You could use uid, but then remember this umber may change if the database is reconstructed after some disaster...

Hierarchical Database, multiple tables or column with parent id?

I need to store info about county, municipality and city in Norway in a mysql database. They are related in a hierarchical manner (a city belongs to a municipality which again belongs to a county).
Is it best to store this as three different tables and reference by foreign key, or should I store them in one table and relate them with a parent_id field?
What are the pros and cons of either solution? (both structural end efficiency wise)
If you've really got a limit of these three levels (county, municipality, city), I think you'll be happiest with three separate tables with foreign keys reaching up one level each. This will make queries almost trivial to write.
Using a single table with a parent_id field referencing the same table allows you to represent arbitrary tree structures, but makes querying to extract the full path from node to root an iterative process best handled in your application code.
The separate table solution will be much easier to use.
three different tables:
more efficient, if your application mostly accesses information about only one entity (county, municipality, city)
owner-member-relationship is a clear and elegant model ;)
County, Municipality, and City don't sound like they are the same kind of data ; so, I would use three different tables : one per data-type.
And, then, I would indeed use foreign keys between those.
Efficiency-speaking, not sure it'll change much :
you'll do joins on 3 tables instead of joining 3 times on the same table ; I suppose it's quite the same.
it might make a little difference when you need to work on only one of those three type of data ; but with the right indexes, the differences should be minimal.
But, structurally speaking, if those are three different kind of entities, it makes sense to use three different tables.
I would recommend for using three different tables as they are three different entities.
I would use only one table in those cases you don´t know the depth of the hierarchy, but it is not case.
I would put them in three different tables, just on the grounds that it is 3 different concepts. This will hamper speed and will complicate your queries. However given that MySQL does not have any special support for hirachical queries (like Oracle's connect by statement) these would be complicated anyway.
Different tables: it's just "right". I doubt you'll see any performance gains/losses either way but this is one where modelling it properly up-front will probably save you lots of headaches later on. For one thing it'll make SQL SELECTs easier to write and read.
You'll get different opinions coming back to you on this but my personal preference would be to have separate tables because they are separate entities.
In reality you need to think about the queries you will doing on this data and usually your answer will come from that. With separate tables your queries will look much cleaner and in the end your not saving yourself anything because you'll still be joining tables together, even if they are the same table.
I would use three separate tables, since you know exactly what categories of information you are working with, and won't need to dynamically alter the 'depth' of your hierarchy.
It'll also make the data simpler to manage, as you'll be able to tell if the data is for a city, municipality or a county just by knowing the table (and without having to discern the 'depth' of a record in the hierarchy first!).
Since you'll probably be doing self joins anyway to get the hierarchy to work, I'd doubt there would be any benefits from having all the data in a single table.
In dataware housing applications, adherents of the Kimball methodology might place these fields in the same attribute table:
create table city (
id int not null,
county varchar(50) not null,
municipality varchar(50),
city varchar(50),
primary key(id)
);
The idea being that attibutes should never be more than l join away from the fact table.
I just state this as an alternative view. I would go with the 3 table design personally.
This is a case of ‘Database Normalization’, which is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. The purpose is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Multiple tables will help in the situation if the task has been distributed among different developers, or users at different levels require different rights to view and change the data or the small tables help when you need this data for other purposes as well or so.
My vote would be for multiple tables - with data appropriately distributed.

Is there ever a time where using a database 1:1 relationship makes sense?

I was thinking the other day on normalization, and it occurred to me, I cannot think of a time where there should be a 1:1 relationship in a database.
Name:SSN? I'd have them in the same table.
PersonID:AddressID? Again, same table.
I can come up with a zillion examples of 1:many or many:many (with appropriate intermediate tables), but never a 1:1.
Am I missing something obvious?
A 1:1 relationship typically indicates that you have partitioned a larger entity for some reason. Often it is because of performance reasons in the physical schema, but it can happen in the logic side as well if a large chunk of the data is expected to be "unknown" at the same time (in which case you have a 1:0 or 1:1, but no more).
As an example of a logical partition: you have data about an employee, but there is a larger set of data that needs to be collected, if and only if they select to have health coverage. I would keep the demographic data regarding health coverage in a different table to both give easier security partitioning and to avoid hauling that data around in queries unrelated to insurance.
An example of a physical partition would be the same data being hosted on multiple servers. I may keep the health coverage demographic data in another state (where the HR office is, for example) and the primary database may only link to it via a linked server... avoiding replicating sensitive data to other locations, yet making it available for (assuming here rare) queries that need it.
Physical partitioning can be useful whenever you have queries that need consistent subsets of a larger entity.
One reason is database efficiency. Having a 1:1 relationship allows you to split up the fields which will be affected during a row/table lock. If table A has a ton of updates and table b has a ton of reads (or has a ton of updates from another application), then table A's locking won't affect what's going on in table B.
Others bring up a good point. Security can also be a good reason depending on how applications etc. are hitting the system. I would tend to take a different approach, but it can be an easy way of restricting access to certain data. It's really easy to just deny access to a certain table in a pinch.
My blog entry about it.
Sparseness. The data relationship may be technically 1:1, but corresponding rows don't have to exist for every row. So if you have twenty million rows and there's some set of values that only exists for 0.5% of them, the space savings are vast if you push those columns out into a table that can be sparsely populated.
Most of the highly-ranked answers give very useful database tuning and optimization reasons for 1:1 relationships, but I want to focus on nothing but "in the wild" examples where 1:1 relationships naturally occur.
Please note one important characteristic of the database implementation of most of these examples: no historical information is retained about the 1:1 relationship. That is, these relationships are 1:1 at any given point in time. If the database designer wants to record changes in the relationship participants over time, then the relationships become 1:M or M:M; they lose their 1:1 nature. With that understood, here goes:
"Is-A" or supertype/subtype or inheritance/classification relationships: This category is when one entity is a specific type of another entity. For example, there could be an Employee entity with attributes that apply to all employees, and then different entities to indicate specific types of employee with attributes unique to that employee type, e.g. Doctor, Accountant, Pilot, etc. This design avoids multiple nulls since many employees would not have the specialized attributes of a specific subtype. Other examples in this category could be Product as supertype, and ManufacturingProduct and MaintenanceSupply as subtypes; Animal as supertype and Dog and Cat as subtypes; etc. Note that whenever you try to map an object-oriented inheritance hierarchy into a relational database (such as in an object-relational model), this is the kind of relationship that represents such scenarios.
"Boss" relationships, such as manager, chairperson, president, etc., where an organizational unit can have only one boss, and one person can be boss of only one organizational unit. If those rules apply, then you have a 1:1 relationship, such as one manager of a department, one CEO of a company, etc. "Boss" relationships don't only apply to people. The same kind of relationship occurs if there is only one store as the headquarters of a company, or if only one city is the capital of a country, for example.
Some kinds of scarce resource allocation, e.g. one employee can be assigned only one company car at a time (e.g. one truck per trucker, one taxi per cab driver, etc.). A colleague gave me this example recently.
Marriage (at least in legal jurisdictions where polygamy is illegal): one person can be married to only one other person at a time. I got this example from a textbook that used this as an example of a 1:1 unary relationship when a company records marriages between its employees.
Matching reservations: when a unique reservation is made and then fulfilled as two separate entities. For example, a car rental system might record a reservation in one entity, and then an actual rental in a separate entity. Although such a situation could alternatively be designed as one entity, it might make sense to separate the entities since not all reservations are fulfilled, and not all rentals require reservations, and both situations are very common.
I repeat the caveat I made earlier that most of these are 1:1 relationships only if no historical information is recorded. So, if an employee changes their role in an organization, or a manager takes responsibility of a different department, or an employee is reassigned a vehicle, or someone is widowed and remarries, then the relationship participants can change. If the database does not store any previous history about these 1:1 relationships, then they remain legitimate 1:1 relationships. But if the database records historical information (such as adding start and end dates for each relationship), then they pretty much all turn into M:M relationships.
There are two notable exceptions to the historical note: First, some relationships change so rarely that historical information would normally not be stored. For example, most IS-A relationships (e.g. product type) are immutable; that is, they can never change. Thus, the historical record point is moot; these would always be implemented as natural 1:1 relationships. Second, the reservation-rental relationship store dates separately, since the reservation and the rental are independent events, each with their own dates. Since the entities have their own dates, rather than the 1:1 relationship itself having a start date, these would remain as 1:1 relationships even though historical information is stored.
Your question can be interpreted in several ways, because of the way you worded it. The responses show this.
There can definitely be 1:1 relationships between data items in the real world. No question about it. The "is a" relationship is generally one to one. A car is a vehicle.
One car is one vehicle. One vehicle might be one car. Some vehicles are trucks, in which case one vehicle is not a car. Several answers address this interpretation.
But I think what you really are asking is... when 1:1 relationships exist, should tables ever be split? In other words, should you ever have two tables that contain exactly the same keys? In practice, most of us analyze only primary keys, and not other candidate keys, but that question is slightly diferent.
Normalization rules for 1NF, 2NF, and 3NF never require decomposing (splitting) a table into two tables with the same primary key. I haven't worked out whether putting a schema in BCNF, 4NF, or 5NF can ever result in two tables with the same keys. Off the top of my head, I'm going to guess that the answer is no.
There is a level of normalization called 6NF. The normalization rule for 6NF can definitely result in two tables with the same primary key. 6NF has the advantage over 5NF that NULLS can be completely avoided. This is important to some, but not all, database designers. I've never bothered to put a schema into 6NF.
In 6NF missing data can be represent by an omitted row, instead of a row with a NULL in some column.
There are reasons other than normalization for splitting tables. Sometimes split tables result in better performance. With some database engines, you can get the same performance benefits by partitioning the table instead of actually splitting it. This can have the advantage of keeping the logical design easy to understand, while giving the database engine the tools needed to speed things up.
I use them primarily for a few reasons. One is significant difference in rate of data change. Some of my tables may have audit trails where I track previous versions of records, if I only care to track previous versions of 5 out of 10 columns splitting those 5 columns onto a separate table with an audit trail mechanism on it is more efficient. Also, I may have records (say for an accounting app) that are write only. You can not change the dollar amounts, or the account they were for, if you made a mistake then you need to make a corresponding record to write adjust off the incorrect record, then create a correction entry. I have constraints on the table enforcing the fact that they cannot be updated or deleted, but I may have a couple of attributes for that object that are malleable, those are kept in a separate table without the restriction on modification. Another time I do this is in medical record applications. There is data related to a visit that cannot be changed once it is signed off on, and other data related to a visit that can be changed after signoff. In that case I will split the data and put a trigger on the locked table rejecting updates to the locked table when signed off, but allowing updates to the data the doctor is not signing off on.
Another poster commented on 1:1 not being normalized, I would disagree with that in some situations, especially subtyping. Say I have an employee table and the primary key is their SSN (it's an example, let's save the debate on whether this is a good key or not for another thread). The employees can be of different types, say temporary or permanent and if they are permanent they have more fields to be filled out, like office phone number, which should only be not null if the type = 'Permanent'. In a 3rd normal form database the column should depend only on the key, meaning the employee, but it actually depends on employee and type, so a 1:1 relationship is perfectly normal, and desirable in this case. It also prevents overly sparse tables, if I have 10 columns that are normally filled, but 20 additional columns only for certain types.
The most common scenario I can think of is when you have BLOB's. Let's say you want to store large images in a database (typically, not the best way to store them, but sometimes the constraints make it more convenient). You would typically want the blob to be in a separate table to improve lookups of the non-blob data.
In terms of pure science, yes, they are useless.
In real databases it's sometimes useful to keep a rarely used field in a separate table: to speed up queries using this and only this field; to avoid locks, etc.
Rather than using views to restrict access to fields, it sometimes makes sense to keep restricted fields in a separate table to which only certain users have access.
I can also think of situations where you have an OO model in which you use inheritance, and the inheritance tree has to be persisted to the DB.
For instance, you have a class Bird and Fish which both inherit from Animal.
In your DB you could have an 'Animal' table, which contains the common fields of the Animal class, and the Animal table has a one-to-one relationship with the Bird table, and a one-to-one relationship with the Fish table.
In this case, you don't have to have one Animal table which contains a lot of nullable columns to hold the Bird and Fish-properties, where all columns that contain Fish-data are set to NULL when the record represents a bird.
Instead, you have a record in the Birds-table that has a one-to-one relationship with the record in the Animal table.
1-1 relationships are also necessary if you have too much information. There is a record size limitation on each record in the table. Sometimes tables are split in two (with the most commonly queried information in the main table) just so that the record size will not be too large. Databases are also more efficient in querying if the tables are narrow.
In SQL it is impossible to enforce a 1:1 relationship between two tables that is mandatory on both sides (unless the tables are read-only). For most practical purposes a "1:1" relationship in SQL really means 1:0|1.
The inability to support mandatory cardinality in referential constraints is one of SQL's serious limitations. "Deferrable" constraints don't really count because they are just a way of saying the constraint is not enforced some of the time.
It's also a way to extend a table which is already in production with less (perceived) risk than a "real" database change. Seeing a 1:1 relationship in a legacy system is often a good indicator that fields were added after the initial design.
Most of the time, designs are thought to be 1:1 until someone asks "well, why can't it be 1:many"? Divorcing the concepts from one another prematurely is done in anticipation of this common scenario. Person and Address don't bind so tightly. A lot of people have multiple addresses. And so on...
Usually two separate object spaces imply that one or both can be multiplied (x:many). If two objects were truly, truly 1:1, even philosophically, then it's more of an is-relationship. These two "objects" are actually parts of one whole object.
If you're using the data with one of the popular ORMs, you might want to break up a table into multiple tables to match your Object Hierarchy.
I have found that when I do a 1:1 relationship its totally for a systemic reason, not a relational reason.
For instance, I've found that putting the reserved aspects of a user in 1 table and putting the user editable fields of the user in a different table allows logically writing those rules about permissions on those fields much much easier.
But you are correct, in theory, 1:1 relationships are completely contrived, and are almost a phenomenon. However logically it allows the programs and optimizations abstracting the database easier.
extended information that is only needed in certain scenarios. in legacy applications and programming languages (such as RPG) where the programs are compiled over the tables (so if the table changes you have to recompile the program(s)). Tag along files can also be useful in cases where you have to worry about table size.
Most frequently it is more of a physical than logical construction. It is commonly used to vertically partition a table to take advantage of splitting I/O across physical devices or other query optimizations associated with segregating less frequently accessed data or data that needs to be kept more secure than the rest of the attributes on the same object (SSN, Salary, etc).
The only logical consideration that prescribes a 1-1 relationship is when certain attributes only apply to some of the entities. However, in most cases there is a better/more normalized way to model the data through entity extraction.
The best reason I can see for a 1:1 relationship is a SuperType SubType of database design. I created a Real Estate MLS data structure based on this model. There were five different data feeds; Residential, Commercial, MultiFamily, Hotels & Land.
I created a SuperType called property that contained data that was common to each of the five separate data feeds. This allowed for very fast "simple" searches across all datatypes.
I create five separate SubTypes that stored the unique data elements for each of the five data feeds. Each SuperType record had a 1:1 relationship to the appropriate SubType record.
If a customer wanted a detailed search they had to select a Super-Sub type for example PropertyResidential.
In my opinion a 1:1 relationship maps a class Inheritance on a RDBMS.
There is a table A that contains the common attributes, i.e. the partent class status
Each inherited class status is mapped on the RDBMS with a table B with a 1:1 relationship
to A table, containing the specialized attributes.
The table namend A contain also a "type" field that represents the "casting" functionality
Bye
Mario
You can create a one to one relationship table if there is any significant performance benefit. You can put the rarely used fields into separate table.
1:1 relationships don't really make sense if you're into normalization as anything that would be 1:1 would be kept in the same table.
In the real world though, it's often different. You may want to break your data up to match your applications interface.
Possibly if you have some kind of typed objects in your database.
Say in a table, T1, you have the columns C1, C2, C3… with a one to one relation. It's OK, it's in normalized form. Now say in a table T2, you have columns C1, C2, C3, … (the names may differ, but say the types and the role is the same) with a one to one relation too. It's OK for T2 for the same reasons as with T1.
In this case however, I see a fit for a separate table T3, holding C1, C2, C3… and a one to one relation from T1 to T3 and from T2 to T3. I even more see a fit if there exist another table, with which there already exist a one to multiple C1, C2, C3… say from table A to multiple rows in table B. Then, instead of T3, you use B, and have a one to one relation from T1 to B, the same for from T2 to B, and still the same one to multiple relation from A to B.
I believe normalization do not agree with this, and that may be an idea outside of it: identifying object types and move objects of a same type to their own storage pool, using a one to one relation from some tables, and a one to multiple relation from some other tables.
It is unnecessary great for security purposes but there better ways to perform security checks. Imagine, you create a key that can only open one door. If the key can open any other door, you should ring the alarm. In essence, you can have "CitizenTable" and "VotingTable". Citizen One vote for Candidate One which is stored in the Voting Table. If citizen one appear in the voting table again, then their should be an alarm. Be advice, this is a one to one relationship because we not refering to the candidate field, we are refering to the voting table and the citizen table.
Example:
Citizen Table
id = 1, citizen_name = "EvryBod"
id = 2, citizen_name = "Lesly"
id = 3, citizen_name = "Wasserman"
Candidate Table
id = 1, citizen_id = 1, candidate_name = "Bern Nie"
id = 2, citizen_id = 2, candidate_name = "Bern Nie"
id = 3, citizen_id = 3, candidate_name = "Hill Arry"
Then, if we see the voting table as so:
Voting Table
id = 1, citizen_id = 1, candidate_name = "Bern Nie"
id = 2, citizen_id = 2, candidate_name = "Bern Nie"
id = 3, citizen_id = 3, candidate_name = "Hill Arry"
id = 4, citizen_id = 3, candidate_name = "Hill Arry"
id = 5, citizen_id = 3, candidate_name = "Hill Arry"
We could say that citizen number 3 is a liar pants on fire who cheated Bern Nie. Just an example.
When you are dealing with a database from a third party product, then you probably don't want to alter their database as to prevent tight coupling. but you may have data that corresponds 1:1 with their data
Anywhere were two entirely independent entities share a one-to-one relationship. There must be lots of examples:
person <-> dentist (its 1:N, so its wrong!)
person <-> doctor (its 1:N, so it's also wrong!)
person <-> spouse (its 1:0|1, so its mostly wrong!)
EDIT: Yes, those were pretty bad examples, particularly if I was always looking for a 1:1, not a 0 or 1 on either side. I guess my brain was mis-firing :-)
So, I'll try again. It turns out, after a bit of thought, that the only way you can have two separate entities that must (as far as the software goes) be together all of the time is for them to exist together in higher categorization. Then, if and only if you fall into a lower decomposition, the things are and should be separate, but at the higher level they can't live without each other. Context, then is the key.
For a medical database you may want to store different information about specific regions of the body, keeping them as a separate entity. In that case, a patient has just one head, and they need to have it, or they are not a patient. (They also have one heart, and a number of other necessary single organs). If you're interested in tracking surgeries for example, then each region should be a unique separate entity.
In a production/inventory system, if you're tracking the assembly of vehicles, then you certainly want to watch the engine progress differently from the car body, yet there is a one to one relationship. A care must have an engine, and only one (or it wouldn't be a 'car' anymore). An engine belongs to only one car.
In each case you could produce the separate entities as one big record, but given the level of decomposition, that would be wrong. They are, in these specific contexts, truly independent entities, although they might not appear so at a higher level.
Paul.