How many foreign keys is too many? - sql

After running across this article:
http://diovo.com/2008/08/are-foreign-keys-really-necessary-in-a-database-design/
It seems like a good idea to use foreign keys when designing a database. But when are you using too many?
For example, suppose I have a main table used to store a list of machinery part information that other programs make reference to with the following columns:
ID
Name
Colour
Price
Measurement Units
Category
etc...
Should I be making tables containing a list of all possible colours, units and categories and then setting these as foreign keys to the corresponding columns in my machine part info table? At what point would the benefit of using foreign keys out weight the fact that I'm making all these extra tables and relationships?

Any attribute for which you want to be able to state, with certainty, that there are only known, valid values present in the database should be protected with a foreign key. Otherwise, you can only hope to catch invalid values in your application code and whatever interfaces are created in the future.
It is NOT a bad thing to have more tables and relationships. The only issue -- and it usually is not one -- has to do with the overhead of maintaining the indexes that are used in enforcing those relationships. Until you experience performance issues you should create a foreign key relationship for every column that "should" have one (because the values need to be validated against a list).
The performance considerations would have to be pretty dire before I would be willing to sacrifice correctness for performance.

Every Design is a compromise of competing goals, so there are very few simple answers (except the wrong ones).
I would certainly put discrete measures such as name, color, category, measure unit, etc.. in their own key tables. Variable measures (cost, number of units ,etc..) not so much, unless you have units in standard size packages (i.e. only 1, 6, 12, etc..)

The simplest way to design a database is to start with the requirements. In one classical methodology, the requirements are summarized in an ER (Entity-Relationship) model. In this model, relationships between entities are not invented, they are discovered. If they lie within the scope of the information the database is supposed to cover, then they are part of the model. Period.
From there, when you turn to database design, you already know what relationships you need. You have a few decisions to make about the structure of your tables, but almost all the foreign keys that reference a primary key are a direct consequence of the requirements.
Of course, if you are at liberty to change the requirements as you go through the design process, then you can do anything you want.

Dimensional modelling covers all points of your question well. Having too many foreign key relationships can make query performance suffer. Kimball's Group Reader is a great introduction to Dimensional Design and how to translate customer requirements to a schema.
http://en.wikipedia.org/wiki/Dimensional_modeling
The main question to ask is 'how constrained does the data need to be?' Concerning the color of machine parts, I'd assume, it would be in everyone's best interest not to have burnt ciena and camomille as color options. So, a look up table for these would be best.

Related

Database table, Should all tables require a foreign key from the user table without over normalising

I am trying to design a database for a garage management system. And I am struggling with over normalising, Would it be best to have User_ID as a foreign key within all tables to make querying faster or does this design make sense?
Any help is appreciated
As for ORM, your design is absolutely correct. There is no redundancy and data is easily reached by transitivity. Nevertheless, sometimes, development urges for simplification and better performance. Performance must prevail over design.
I would not recommend adding user_id to the other tables. That just seems like a maintenance nightmare -- ensuring that the user_id is consistent across the tables.
Instead, you might want to borrow a technique which is associated with dimensional modeling (which is described in Wikipedia). That is flattening dimensions.
This means that you would have one table apart from users at the most detailed level. All the information about spaces, zones, boxes, and items would be in the same table.
This works assuming that the dimension values do not really change over time -- think time dimensions or geographic dimensions.

need help answering question about sql data integrity

The question is:
For your final designed database, find a scenario in which a relatively prominent business data
integrity can not be ensured by your current primary keys and foreign keys, nor by adding directly
more of such keys or check clauses in the created tables.
In other words, the data integrity ensured by
the keys within the database may not be enough to ensure all the data integrity within the business
context.
Write a SQL statement that will determine if such a problem exists or not, and where, for any
given state of the database.
I am not too sure what this question is asking or how to approach it.
Need help writing a sql code for this question.
I think the question is asking you to define some business logic that cannot be encoded in the database. However, it then wants you to find conflicts that could occur because the business logic is not encoded in the database. This second part seems to be in conflict with the first, but not necessarily.
An example based on your previous assignment would be if a coach is suddenly sick and there are too few additional coaches to cover the booked clients, or some coaches are not qualified to replace the sick coach, or had previous conflicts with certain clients and therefore can't be assigned to those clients. Therefore, some training bookings must be cancelled.
The decision on which are best to cancel may be difficult or impossible to code in SQL, but you can use SQL to verify that all of the sick coach's slots have either been filled by others or cancelled after the external business logic is applied.
EDIT: I think the above scenario fits the question's requirement that you can't find the conflicts (such as clients that don't like certain coaches) in your existing foreign key relationships, but you can verify that the external logic is consistent with the final requirements (all slots accounted for).
Perhaps a better example is the traveling salesman problem: It is difficult to code the least cost routing in SQL, but it's easy to verify that all cities have been visited.
The scenarios where every row can have variable number of attributes and each attribute can have variable number of datatypes, is generally modeled using EAV datamodel. EAV wikipedia.
Here, attribute can have variety of values and so, we cannot enforce check constraint always. In some scenarios, if attributes finite list is not available, we cannot have foreign key for attributeID.
This datamodel is popular in the medical history datamodel, where every patient can have different kinds of symptoms.
May be this can be an example for a scenario, where data integrity cannot be completely enforced.

Is it acceptable to have 2 one to many relationships between 2 tables?

I have a database for utility structures (poles, towers) with wire between them. Each span of wire is connected to two structures. I need to define the relationship between the structures and the spans.
I came up with a relationship between structure A to a "from" foreign key field in the span table and another between structure B and a "To" foreign key field in the span table. Is this acceptable database design?
Here's a diagram:
Your model indicates that a span is related to exactly 2 structures. So I think you're good, there. My only comment is that with the names, "fkFromStructureID" and, "fkToStructureID" you imply an orientation that may not exist. It's nit-picky, I admit. But if you do not have a concept of "direction" in your real-world model, you might want to consider different names. If orientation does exist, then this works fine.
Note: If you plan to be able to "walk" the chain of spans and structures using these FK fields, then you'll want to be certain that you manage your orientation on the way into the database. One span that has the From and To backwards will ruin the ability to walk the chain.
Yes, this is quite a common requirement when designing databases and your approach is probably the best way of fulfilling this requirement.

Database Design Without Foreign Keys

After having worked at various employers I've noticed a trend of "bad" database design with some of these companies - primarily the exclusion of Foreign Keys Constraints. It has always bugged me that these transactional systems didn't have FK's, which would've promoted referential integrity.
Are there any scenarios, in transactional systems, whereby the omission of FK's would be beneficial?
Has anyone else experienced this, if so what was the outcome?
What should one do if they're presented with this scenario and their asked to maintain/enhance the system?
I cannot think of any scenario where, if two columns have a dependency, they should not have a FK constraint set up between them. Removing referential integrity may certainly speed up database operations but there's a pretty high cost to pay for that.
I have experienced such systems and the usual outcome is corrupted data, in the sense that records exists that shouldn't exist (or vice versa). These are the sort of systems where people believe they're okay because the application takes care of it, not caring that:
Every application has to take care of it, rather than one DB server.
It only takes one bug, or malignant app, to screw it up for everyone.
It is the responsibility of the database to protect itself! That is one of its best features.
As to what you should do, I simply put forward the possible things that can go wrong and how using FKs will prevent that (often with a cost/benefit analysis "skewed" toward my viewpoint, if necessary). Then let the company decide - it is their database, after all.
There is a school of thought that a well-written application does not need referential integrity. If the application does things right, the thinking goes, there's no need for constraints.
Such thinking is akin to not doing defensive programming because if you write the code correctly, you won't have bugs. While true, it simply won't happen. Not using appropriate constraints is asking for data corruption.
As for what you should do, you should encourage the company to add constraints at every opportunity. You don't want to push it to the point of getting in trouble or making a bad name for yourself, but as long as the environment is appropriate, keep pushing for it. Everyone's life will be better in the long run.
Personally, I have no problem with a database not having explicit declarations for foreign keys. But, it depends on how the database is being used.
Most of the databases that I work with are relatively static data derived from one or more transactional systems. I am not particularly concerned with rogue updates affecting the database, so an explicit definition of a foreign key relationship is not particularly important.
One thing that I do have is very consistent naming. Basically, every table has a first column called ID, which is exactly how the column is refered to in other tables (or, sometimes with a prefix, when there are multiple relationships between two entities). I also try to insist that every column in such a database has a unique name that describes the attribute (so "CustomerStartDate" is different from "ProductStartDate").
If I were dealing with data that had more "cooks in the pot", then I would want to be more explicit about the foreign key relationships. And, I then I am more willing to have the overhead of foreign key definitions.
This overhead arises in many places. When creating a new table, I may want to use use "create table as" or "select into" and not worry about the particulars of constraints. When running update or insert queries, I may not want the database overhead of checking things that I know are ok. However, I must emphasize that consistent naming greatly increases my confidence that things are ok.
Clearly, my perspective is not one of a DBA but of a practitioner. However, invalid relationships between tables are something I -- or the rest of my team -- almost never has to deal with.
As long as there's a single point of entry into the database it ultimately doesn't matter which "layer" is maintaining referential integrity. Using the "built-in layer" of foreign key constraints seems to make the most sense, but if you have a rock solid service layer responsible for the same thing then it has freedom to break the rules if necessary.
Personally I use foreign key constraints and engineer my apps so they don't have to break the rules. Relational data with guaranteed referential integrity is just easier to work with.
The performance gained is probably equivalent to the performance lost from having to maintain integrity outside of the db.
In an OLTP database, the only reason I can think of is if you care about performance more than data integrity. Enforcing a FK when row is inserted to the child table requires an index seek on the parent table and I can imagine there may be extreme situations where even this relatively quick index seek is too much. For example, some kind of very intensive logging where you can live with incorrect log entries and the application doing the writing is simple and unlikely to have bugs.
That being said, if you can live with corrupt data, you can probably live without a database in the first place.
Defensive Programming withot foreign keys works if you primarily use stored procedures and every application uses those stored procedures, instead of writing their own queries. Then you can control it quite easily and more flexible than the standard foreign keys.
One situation I can think of off the top of my head where foreign key constraints are not readily usable is a permissions module where permissions can be applied per user or per group, determined by a Boolean. So some of the records in the permissions table have a user id and others have a group id. If you still wanted foreign key constraints, you would have to have two different fields for the same mutally exclusive information and allow them to be null. Meaning adding another constraint saying that one is allowed to be null but they can't both be null, as well as a combination of 3 fields must be unique instead of a combination of 2 fields (user/group id and permission id). And the alternative is two separate tables containing the same data, meaning maintaining both tables separately.
But perhaps in that scenario, it's best to separate the data. Anything where you need the same field to connect to different tables based on other data in that record, you cannot use foreign field constraints, and it becomes best to keep the constraints in the stored procedures and views instead.

What are common pitfalls when developing a new SQL database?

I know, I quite dislike the catch-all survey type questions, but I couldn't think of a better way to find out what I think I need to know. I'm very green in the world of database development, having only worked on a small number of projects that merely interacted with the database rather than having to actually create a new one from scratch. However, things change and now I am faced with creating my own database.
So far, I have created the tables I need and added the columns that I think I need, including any link tables for many-many relationships and columns for one-to-many relationships. I have some specific questions on this, but I felt that rather than get just these answered, it would make more sense to ask about things I may not even know, which I should address now rather than 6 months from now when we have a populated database and client tools using it.
First the questions on my database which have led me to realise I don't know enough:
How do I ensure my many-to-many link tables and my one-to-many columns are up-to-date when changes are made to the referenced tables? What problems may I encounter?
I am using nvarchar(n) and nvarchar(MAX) for various text fields. Should I use varchar equivalents instead (I had read there may be performance risks in using nvarchar)? Are there any other gotchas regarding the selection of datatypes besides being wary of using fixed length char arrays to hold variable length information? Any rules on how to select the appropriate datatype?
I use int for the ID column of each table, which is my primary key in all but the link tables (where I have two primary keys, the IDs of the referenced table rows). This ID is set as the identity. Are there pitfalls to this approach?
I have created metadata tables for things like unit types and statuses, but I don't know if this was the correct thing to do or not. Should you create new tables for things like enumerated lists or is there a better way?
I understand that databases are complex and the subject of many many worthy tomes, but I suspect many of you have some tips and tricks to augment such reading material (though tips for essential reading would also be welcome).
Community wiki'd due to the rather subjective nature of these kinds of posts. Apologies if this is a duplicate, I've conducted a number of searches for something like this but couldn't find any, though this one is certainly related. Thanks.
Update
I just found this question, which is very similar in a roundabout way.
Not normalising
Not using normalisation
Trying to implement a denormalised schema from the start
Seriously:
Foreign keys will disallow deletes or updates from the parent tables. Or they can be cascaded.
Small as possible: 2 recent SO questions datatypes and (n)varchar
May not be portable and your "natural key" (say "product name") still needs a unique constraint. Otherwise no, but remember that an IDENTITY column is a "surrogate key"
Edit: Say you expect to store fruit with columns FruitID and FruitName. You have no way to restrict to one occurence of "Apple" or "Orange" because although this is your "natural key", you are using a surrogate key (FruitID). So, to maintain integrity, you need a unique constraint on FruitName
Not sure or your meaning, sorry. Edit: Don't do it. Ye olde "One true lookup table" idea.
I'll reply to your subjective query with some vague generalities. :)
The most common pitfall of designing a database is the same pitfall of any programming solution, not fully understanding the problem being solved. In the case of a database, it is understanding the nature of the data. How big it is, how it comes and goes, what business rules must it adhere to.
Here are some questions to ponder.
What is updated the most frequently? Is keeping that table write-locked going to lock up queries? Will it become a hot spot? Even a seemingly well normalized schema can be a poor performer if you don't understand your read versus write ratios.
What are your external interface needs? I've been on projects where the dotted line to "that other system" nearly scuttled the whole project because implementing it was delayed until everything else was in place, that is to say, everything else was inflexible.
Any other unspoken requirements? My favorite is date sensitivity. All the data is there, your reports are beautiful, the boss looks them over and asks, when did that datum change? Who did it and when? Is the database supposed to track itself and its users, or just the data? Will your front end do it for you?
Just some things to think about.
It does sounds like you've got a good grasp on what you're meant to be doing, and indeed there isn't "one true path" to doing databases.
Have you set up cascades for your hierarchical objects (i.e., a single delete at the 'head' of your object in the database will delete all entries in tables relating to that entry)?
Your link tables and 1:n columns should be foreign keys, so there isn't much to worry about if the data changes. By "two primary keys" here, did you mean indexes?
As for metadata tables, I've done them in the past, and I've not done them. A single char status with SQL comment can suffice for a limited set of statuses, but beyond a certain amount, or where you can think of adding more in the future, you might want to have a reference to another table of metadata, or maybe a char(8ish). E.g., I've seen user tables have "NORMAL", "ADMIN", "SUPER", "GUEST", etc for user type, which could have been 1,2,3,4,5 fkeys to a "UserType" table, but with such a restricted enumeration does it matter? Other people have a table of permissions (booleans of what a user can do) instead - many ways to skin a cat.
You might find some usable stuff in these slides:
[http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back][1]
I also am a beginner to database design, but I found this online tutorial very, very helpful:
Database design with UML and SQL, 3rd edition
The author explains all the fundamental design aspects of database, and in a very clear manner. Before I found this online guide I did a lot of wikipedia reading about normalization. While that helped, this author explains the exact same stuff (through 3rd normal form, at least) but in a much, much easier to read way. It pretty much addresses all your questions as well.
I'd suggest a good book. The best IMO is this:
http://www.amazon.com/Server-2005-Database-Design-Optimization/dp/1590595297/ref=ntt_at_ep_dpt_1
In addition to not normalizing, a common problem I see is overindexing, done before there are performance measurements that take into account your in-production mix of reads vs. writes.
It's really, really easy to add an index to speed up a query, and harder to figure out which one to remove when you have several that are getting updated during an INSERT or UPDATE.
The middle ground is to go after obvious secondary indexes (e.g., for common, frequent lookups by name on large tables), deferring other candidate indexes until you have reasonable performance tests in place.
Among other things, not using primary keys, not thinking ahead about whether you'll be using indexed views (and designing tables accordingly; I once had to drop and recreate a large table at my site to change its ANSI_NULL attribute to ON so that I could then use it with an indexed view), using indices.