So I am interested in the "use percentage" of FKs in the industry. Do you use FKs in your projects, and are there cases when you don't use them? I have worked at two companies so far; one of them is quite a leader in the industry, and they handle that part of "integrity" in their app layer, so I was interested in the "ratio" of DBs with and without FKs.
I don't have a use percentage for you, but I can tell you it's too low. Yes, I do use FK constraints where possible, and I design my databases to support their use. In cases where I didn't or couldn't use them, I frequently ended up having to repair data. Integrity in the app isn't good enough: the app usually isn't the only user of the database, there are often external conversion/integration apps, and don't forget the DBA, who needs to be able to modify data and schema without breaking it. Even if the app were the only user, I'd rather let the DBMS handle integrity than complicate my code by reinventing that functionality.
Below is a link to a horror story of a company that did NOT bother to use foreign keys on (one of) its databases :
Find GUID in database
After having worked at various employers I've noticed a trend of "bad" database design at some of these companies, primarily the exclusion of foreign key constraints. It has always bugged me that these transactional systems didn't have FKs, which would have enforced referential integrity.
Are there any scenarios, in transactional systems, where the omission of FKs would be beneficial?
Has anyone else experienced this, and if so, what was the outcome?
What should one do if they're presented with this scenario and they're asked to maintain/enhance the system?
I cannot think of any scenario where, if two columns have a dependency, they should not have a FK constraint set up between them. Removing referential integrity may certainly speed up database operations but there's a pretty high cost to pay for that.
I have experienced such systems and the usual outcome is corrupted data, in the sense that records exist that shouldn't exist (or vice versa). These are the sort of systems where people believe they're okay because the application takes care of it, not caring that:
Every application has to take care of it, rather than one DB server.
It only takes one bug, or malignant app, to screw it up for everyone.
It is the responsibility of the database to protect itself! That is one of its best features.
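To make the "database protects itself" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer/orders schema and the try_sql helper are made up for illustration; other engines differ in syntax and in whether enforcement is on by default (SQLite needs the PRAGMA shown).

```python
import sqlite3

# Hypothetical two-table schema; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")  # parent exists


def try_sql(sql):
    """Run one statement; return True if accepted, False on an integrity error."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A buggy or rogue client tries to create an orphan, then to delete a parent in use.
orphan_insert_ok = try_sql("INSERT INTO orders (id, customer_id) VALUES (11, 999)")
parent_delete_ok = try_sql("DELETE FROM customer WHERE id = 1")
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Both bad statements are refused by the database itself, no matter which application sent them, so only the one valid order remains.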
As to what you should do, I simply put forward the possible things that can go wrong and how using FKs will prevent that (often with a cost/benefit analysis "skewed" toward my viewpoint, if necessary). Then let the company decide - it is their database, after all.
There is a school of thought that a well-written application does not need referential integrity. If the application does things right, the thinking goes, there's no need for constraints.
Such thinking is akin to not doing defensive programming because if you write the code correctly, you won't have bugs. While true, it simply won't happen. Not using appropriate constraints is asking for data corruption.
As for what you should do, you should encourage the company to add constraints at every opportunity. You don't want to push it to the point of getting in trouble or making a bad name for yourself, but as long as the environment is appropriate, keep pushing for it. Everyone's life will be better in the long run.
Personally, I have no problem with a database not having explicit declarations for foreign keys. But, it depends on how the database is being used.
Most of the databases that I work with are relatively static data derived from one or more transactional systems. I am not particularly concerned with rogue updates affecting the database, so an explicit definition of a foreign key relationship is not particularly important.
One thing that I do have is very consistent naming. Basically, every table has a first column called ID, which is exactly how the column is referred to in other tables (or, sometimes with a prefix, when there are multiple relationships between two entities). I also try to insist that every column in such a database has a unique name that describes the attribute (so "CustomerStartDate" is different from "ProductStartDate").
If I were dealing with data that had more "cooks in the pot", then I would want to be more explicit about the foreign key relationships, and I would be more willing to accept the overhead of foreign key definitions.
This overhead arises in many places. When creating a new table, I may want to use "create table as" or "select into" and not worry about the particulars of constraints. When running update or insert queries, I may not want the database overhead of checking things that I know are ok. However, I must emphasize that consistent naming greatly increases my confidence that things are ok.
Clearly, my perspective is not one of a DBA but of a practitioner. However, invalid relationships between tables are something I -- or the rest of my team -- almost never have to deal with.
As long as there's a single point of entry into the database it ultimately doesn't matter which "layer" is maintaining referential integrity. Using the "built-in layer" of foreign key constraints seems to make the most sense, but if you have a rock solid service layer responsible for the same thing then it has freedom to break the rules if necessary.
Personally I use foreign key constraints and engineer my apps so they don't have to break the rules. Relational data with guaranteed referential integrity is just easier to work with.
The performance gained is probably equivalent to the performance lost from having to maintain integrity outside of the db.
In an OLTP database, the only reason I can think of is if you care about performance more than data integrity. Enforcing a FK when a row is inserted into the child table requires an index seek on the parent table, and I can imagine there may be extreme situations where even this relatively quick index seek is too much. For example, some kind of very intensive logging where you can live with incorrect log entries and the application doing the writing is simple and unlikely to have bugs.
That being said, if you can live with corrupt data, you can probably live without a database in the first place.
Defensive programming without foreign keys works if you primarily use stored procedures and every application uses those stored procedures instead of writing its own queries. Then you can control integrity quite easily, and more flexibly than with standard foreign keys.
One situation I can think of off the top of my head where foreign key constraints are not readily usable is a permissions module where permissions can be applied per user or per group, determined by a Boolean. Some of the records in the permissions table hold a user id and others hold a group id. If you still wanted foreign key constraints, you would need two different nullable fields for the same mutually exclusive information. That means adding another constraint saying that either one may be null but not both, and requiring a combination of 3 fields to be unique instead of a combination of 2 fields (user/group id and permission id). The alternative is two separate tables containing the same data, meaning both tables have to be maintained separately.
But perhaps in that scenario, it's best to separate the data. Anywhere you need the same field to connect to different tables based on other data in that record, you cannot use foreign key constraints, and it becomes best to keep the constraints in the stored procedures and views instead.
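For what it's worth, the two-nullable-columns variant described above can be sketched roughly like this in SQLite via Python's sqlite3 module. All table and index names are hypothetical. I've also swapped the three-column unique constraint for two partial unique indexes, because SQLite treats NULLs as distinct in a UNIQUE constraint, and the CHECK here requires exactly one owner, since the two are meant to be mutually exclusive. Treat it as a sketch under those assumptions, not the one true design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE app_user   (id INTEGER PRIMARY KEY);
CREATE TABLE app_group  (id INTEGER PRIMARY KEY);
CREATE TABLE permission (id INTEGER PRIMARY KEY);

CREATE TABLE permission_grant (
    user_id       INTEGER REFERENCES app_user(id),
    group_id      INTEGER REFERENCES app_group(id),
    permission_id INTEGER NOT NULL REFERENCES permission(id),
    -- exactly one of the two owner columns must be filled in
    CHECK ((user_id IS NULL) <> (group_id IS NULL))
);
-- one grant per (owner, permission); partial indexes sidestep the NULL columns
CREATE UNIQUE INDEX ux_grant_user  ON permission_grant (user_id, permission_id)
    WHERE user_id IS NOT NULL;
CREATE UNIQUE INDEX ux_grant_group ON permission_grant (group_id, permission_id)
    WHERE group_id IS NOT NULL;

INSERT INTO app_user (id) VALUES (1);
INSERT INTO app_group (id) VALUES (1);
INSERT INTO permission (id) VALUES (1);
""")


def accepted(user_id, group_id, permission_id):
    """Try to insert one grant; True if the database accepts it."""
    try:
        conn.execute(
            "INSERT INTO permission_grant (user_id, group_id, permission_id)"
            " VALUES (?, ?, ?)",
            (user_id, group_id, permission_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "user grant": accepted(1, None, 1),
    "group grant": accepted(None, 1, 1),
    "both owners": accepted(1, 1, 1),
    "no owner": accepted(None, None, 1),
    "duplicate user grant": accepted(1, None, 1),
    "unknown user": accepted(99, None, 1),
}
```

Only the first two inserts go through; the rest are stopped by the CHECK, the unique indexes, or the foreign key respectively.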
So far I have always enforced my DB with FK relationships. Things changed yesterday while mapping some classes with FluentNhibernate. My mapping didn't work, and I discovered that the issue was caused by the order in which FN creates the queries.
Now a question arises: should I keep enforcing data with FKs, or is it better to avoid them since I focus on domain classes instead of SQL queries?
Thanks
To my knowledge, it is far better to keep your database consistent, because you may not be the only one who works on this DB in the future. Someone else may have access to the DB and do something that corrupts your data consistency, and as a result your application won't behave the way you expect, because the conditions it assumes no longer hold.
Letting Fluent/NH create your database during development is fine, but when it goes into production you really should check all the foreign keys, indexes, etc., and from then on only make scripted changes.
Keep your database consistent, maintain referential integrity.
If a tool you are using breaks as a result there is bound to be a workaround. However if you lose referential integrity to use nhibernate - what happens if you decide to use a different ORM? You will have a dodgy database and who's to say that the next ORM in line will like that?
It's like a separation-of-concerns question: each chunk of your application should be designed to be robust enough to survive if another chunk is changed or removed, so don't change good database practice simply to make a product that is layered above it play nicely.
Using a domain-driven approach, or a model-oriented approach where the DB is merely seen as an 'implementation detail', does not mean that you should ignore the integrity of your data.
I see no reason why you should drop foreign-key (and other) constraints from your database.
The database is more than just storage for your data. Its task is also to guard the integrity of that data.
It is perfectly possible to combine the 2 worlds (domain driven and relational database) with NHibernate. Make sure that the 2 areas focus on what they're best at. And the database is best at storing data and making sure that the data remains valid and intact.
Currently we are using check constraints for business rules implementation, but I want to know if we should implement business rules in SQL or in the business logic layer (C#). I have searched on the net and found that check constraints are good to use.
Please let me know if someone knows more detailed information about it. One more thing is that the data can be pumped into my database using a mobile application and also using a web application.
YES it is good!
You should really always check your business rules both in your app code (in the business layer), but if ever possible also in your database.
Why? Imagine someone manages to submit some data to your database without using your app - if you have your checks only in the app, those checks are not being applied.
If you have your checks on the database as well, you can make sure the data in the database conforms to at least those simple checks that can be formulated in SQL CHECK CONSTRAINTS.
Definitely use those! You need to try and keep your data quality as high as possible - adding referential integrity, check constraints and unique constraints and so forth on the database helps you do that.
Do not rely on your app alone!
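As a small illustration of the kind of simple rule a CHECK constraint can hold, here is a sketch in SQLite through Python's sqlite3 module. The cart_item table, its columns, and the add_item helper are invented for the example; the rules themselves are only assumptions about what a shop might want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cart_item ("
    " id INTEGER PRIMARY KEY,"
    " sku TEXT NOT NULL,"
    " quantity INTEGER NOT NULL CHECK (quantity > 0),"
    " unit_price_cents INTEGER NOT NULL CHECK (unit_price_cents >= 0))"
)


def add_item(sku, quantity, unit_price_cents):
    """Insert one cart line; True if the database accepts it."""
    try:
        conn.execute(
            "INSERT INTO cart_item (sku, quantity, unit_price_cents) VALUES (?, ?, ?)",
            (sku, quantity, unit_price_cents),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# The same rules apply whether the row comes from the web app, the mobile app,
# or someone typing SQL by hand.
good_row = add_item("ABC-1", 2, 1999)
zero_quantity = add_item("ABC-1", 0, 1999)
negative_price = add_item("ABC-1", 1, -5)
row_count = conn.execute("SELECT COUNT(*) FROM cart_item").fetchone()[0]
```

The app layer can still run the same checks earlier to give friendlier error messages; the constraint is the backstop.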
Yes, check constraints are a valid tool for business rules.
But are you sure you need to use check constraints, or use a supporting table with a foreign key relationship? If you find yourself defining similar check constraints in various places - the answer is yes, this should definitely be a supporting table.
Data integrity is key; there's not much value to a system that will allow a person to store something that is not per business rules if the application is circumvented. It also makes life a lot easier if the logic is in the database for situations where the original app is in C# and the higher-ups decided the market needs a Java/Ruby/Python/etc version.
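The supporting-table idea mentioned above might look like the following sketch (SQLite via Python's sqlite3; the currency, invoice and payment tables and the accepted helper are hypothetical). Instead of repeating CHECK (currency IN (...)) on every table, both tables reference one lookup table, so allowing a new value is an INSERT rather than a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE currency (code TEXT PRIMARY KEY);
INSERT INTO currency (code) VALUES ('USD'), ('EUR');

-- both tables share the lookup instead of each repeating a CHECK list
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    currency TEXT NOT NULL REFERENCES currency(code)
);
CREATE TABLE payment (
    id INTEGER PRIMARY KEY,
    currency TEXT NOT NULL REFERENCES currency(code)
);
""")


def accepted(sql, params):
    """Run one insert; True if the database accepts it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


before = accepted("INSERT INTO invoice (currency) VALUES (?)", ("GBP",))
conn.execute("INSERT INTO currency (code) VALUES ('GBP')")  # data change, not DDL
after_invoice = accepted("INSERT INTO invoice (currency) VALUES (?)", ("GBP",))
after_payment = accepted("INSERT INTO payment (currency) VALUES (?)", ("GBP",))
```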
You should definitely use CHECK constraints where possible, but I also wouldn't over do it. If there is no possibility of getting data into your database without using your applications, you can be safe with minimal CHECK constraints and heavy business validation.
It can be fairly difficult to define strict business rules in SQL. Stick to data validation in the database, and actual business rules in your application.
Also, try to arrange your schema in such a way that makes it difficult to enter bad data with foreign keys and the like.
The more "intelligent" your database is, the more secure the integrity of the data it contains. So yes, I think it is good and important to implement this.
This brings lots of advantages: you can ensure that your data stays safe when more than one application modifies it (e.g. C# app + Web app + Mobile app ...), and it lets you do less work in those "secondary" applications. If the database does all the work, the apps are only a frontend for the database.
It will be easier to migrate the applications in the future, but more difficult to migrate the database. This is an important decision.
It depends on the constraints
You should also try to avoid (if possible) having the same constraint checked in 2 places - this would imply there is duplicated code in your system, leading to unnecessary complexity.
There are some constraints that can and should be applied in the database, for example foreign key constraints and uniqueness. The database will be able to apply these quickly and efficiently.
Other more complex "business" constraints are better applied in the business logic layer. Examples of these might be "customer must have a validated email address before allowing a purchase". These would be complicated and onerous to apply in the database - you'd run the risk of coding your system in SQL which is A Bad Idea.
C#. In my experience it's much easier to reuse logic in C# than in SQL, and generally easier to maintain it.
I understand the need to have referential integrity for limiting specific values on entry or possibly preventing them from removal upon a request of deletion. However, I am unclear as to a valid use case which would exclude this mechanism from always being used.
I guess this would fall into several sub-questions:
When is referential integrity not appropriate?
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
Thoughts?
When is referential integrity not appropriate?
Referential integrity is typically not used on data warehouses where the data is a read-only copy of a transactional database. Another example of when you'd not need RI is when you want to log information which includes row ids; maintaining referential integrity for a read-only log table is a waste of database overhead.
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Sometimes you care more about capturing data than about data quality. Imagine you are aggregating a large amount of data from disparate systems, each of which suffers from data quality issues in its own right. Sometimes you are after the greater good of data quality, and having everything in one place, even with broken keys etc., represents a starting point for moving towards true data quality. It's not ideal, but it does happen, as the benefits can outweigh the tradeoffs.
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
Everything about systems development is centered around information security, and a key element of that is data integrity. The database structure should lean towards enforcing these things when possible, however you often are not dealing with modern database systems. Sometimes your data source is an old school AS400 with long-antiquated apps. Sometimes you have to build a data and business layer which provide for data integrity.
Just my thoughts.
The only case I have heard of is if you are going to load a vast amount of data into your database; in that case, it may make sense to turn referential integrity off, as long as you know for certain that the data is valid. Once your loading/migration is complete, referential integrity should be turned back on.
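A rough sketch of that load-then-re-enable workflow in SQLite (through Python's sqlite3; the parent/child tables and the sample rows are made up). One SQLite-specific detail worth knowing: switching enforcement back on does not re-validate rows that are already there, so the sketch runs PRAGMA foreign_key_check to find leftovers first. Other engines have their own mechanisms for this.

```python
import sqlite3

# isolation_level=None gives autocommit; SQLite ignores PRAGMA foreign_keys
# while a transaction is open, so this avoids a silent no-op.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL REFERENCES parent(id))"
)

# 1. Switch enforcement off and load in whatever order the dump provides.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executemany(
    "INSERT INTO child (id, parent_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 7)],  # child 3 points at a parent that never arrives
)
conn.executemany("INSERT INTO parent (id) VALUES (?)", [(1,), (2,)])

# 2. Verify rather than trust: each result row is
#    (child table, rowid, parent table, fk index).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()

# 3. Switch enforcement back on for normal operation.
conn.execute("PRAGMA foreign_keys = ON")
fk_enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]
```

Here the check reports the one orphaned child row, which would need to be repaired or removed before calling the load done.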
There are arguments about putting data validation rules in programming code vs. the database, and I think it depends on the use cases of your software. If a single application is the only path to the database, you could put validation into the program itself and probably be alright. But if several different programs are using the database at the same time (e.g. your application and your friend's application), you'll want business rules in the database so that your data is always valid.
By 'validation rules', I am talking about rules such as 'items in cart > 0'. You may or may not want validation rules. But I think that primary/foreign keys are always important (or you could find later on that you wish you had them). I think they are required if you want to do replication at some point.
When is referential integrity not appropriate?
Sometimes when you are copying lots of records in bulk, or restoring data from some sort of backup, it is convenient to temporarily turn off the constraints of referential integrity.
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Duplicating data in this way goes against the concept of normalization. There are advantages and disadvantages to this approach.
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
I would consider it a schema design decision. Think about the best way to model your problem in relational terms. Use the database in the way it was intended.
Referential integrity would always be appropriate if it didn't come at the cost of performance, scalability, and/or other features.
In some applications, referential integrity may be traded for something more important than the quality of the data.
Never, though a few people in the NoSQL, the multi-value, and oo-db realms will feel differently. Don't listen to them, they're wrong.
Yes. For example, if a vehicle is identified uniquely as (lotid,vin) then lotid is a foreign key to the lot table. If you want to find all pictures for a lot you can join the vehicle_pictures table right to the lot table, by using a subset of the vehicle_pictures key (lotid in (lotid,vin)). Or, am I not understanding you?
Schema, interface comes second. If the schema is bad, having a nice interface is not a long term goal.
does setting up proper relationships in a database help with anything else other than data integrity?
do they improve or hinder performance?
As long as you have the obvious indexes in place corresponding to the foreign keys, there should be no perceptible negative effect on performance. It's one of the more foolproof database features you have to work with.
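To illustrate the "obvious indexes" remark: in SQLite (used here through Python's sqlite3 as a stand-in; other engines behave differently, and some create such indexes for you), declaring a foreign key does not index the child column, so you add the index yourself. The table and index names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
# The parent side is already covered by its primary key. The child side is not,
# so index the referencing column explicitly.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Ask the planner how it would look up a customer's orders; the last column of
# each EXPLAIN QUERY PLAN row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM orders WHERE customer_id = ?", (1,)
).fetchall()
plan_detail = " ".join(row[3] for row in plan)
```

The plan text should mention idx_orders_customer, meaning the lookup (and the check the engine does when a customer row is deleted) can use the index instead of scanning the whole orders table.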
I'd have to say that proper relationships will help people to understand the data (or the intention of the data) better than if omitting them, especially as the overall cost is quite low in maintaining them.
Their presence doesn't hinder performance except in terms of architecture (as others have pointed out, data integrity will occasionally cause foreign key violations which may have some effect) but IMHO is outweighed by the many benefits (if used correctly).
I know you weren't asking whether to use FKs or not, but I thought I'd just add a couple of viewpoints about why to use them (and have to deal with the consequences):
There are other considerations too, such as if you ever plan to use an ORM (perhaps later on) you'll require foreign keys. They can also be very helpful for ETL/Data Import and Export and later for reporting and data warehousing.
It's also helpful if other applications will make use of the schema - since Foreign Keys implement a basic business logic. So your application (and any others) only need to be aware of the relationships (and honour them). It'll keep the data consistent and most likely reduce the number of data errors in any consuming applications.
Lastly, it gives you a pretty decent hint as to where to put indexes - since it's likely you'll lookup table data by an FK value.
It neither helps nor hurts performance in any significant way. The only hindrance is the check for integrity when inserting/updating/deleting.
Foreign keys are an important part of database design because they ensure consistency. You should use them because they offer the lowest level of protection against data screw-ups that can wreck your applications. Another benefit is that database tools (visualization/analysis/code generation) use foreign keys to relate data.
Do relationships in databases improve or hinder performance?
Like any tool in your toolbox, the results you'll get depend on how you use it. Properly specified relationships and a well-designed logical database can be an enormous boon to performance -- consider the difference between searching through normalized and denormalized data, for example.
Depending on your database engine, relationships defined through foreign key constraints can benefit performance. The constraint allows the engine to make certain assumptions about the existence of data in tables on the parent side of the key.
A brief explanation for MS SQL Server can be found at http://www.microsoft.com/technet/abouttn/flash/tips/tips_122104.mspx. I don't know about other engines, but the concept would make sense in other platforms.
Relationships in the data exist whether you declare them or not. Declaring and enforcing the relationships via FK constraints will prevent certain kinds of errors in the data, at a small cost of checking data when inserts/updates/deletes occur.
Declaring cascading deletes via relationships helps prevent certain kinds of errors when deleting data.
Knowing the relationships helps to make flexible and correct use of the data when forming queries.
Designing the tables well can make the relationships more obvious and more useful. Using relationships in the data is the primary power behind using relational databases in the first place.
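The cascading-delete point above can be sketched like this (SQLite via Python's sqlite3; the author/book schema is invented for the example). With ON DELETE CASCADE declared on the relationship, removing a parent row removes its children in the same statement, so no client can forget the cleanup step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # cascades only fire when FKs are enforced
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id) ON DELETE CASCADE,
    title TEXT NOT NULL
);
INSERT INTO author (id, name) VALUES (1, 'First Author'), (2, 'Second Author');
INSERT INTO book (id, author_id, title) VALUES
    (1, 1, 'Book A'),
    (2, 1, 'Book B'),
    (3, 2, 'Book C');
""")

# Deleting the parent also deletes its books; nothing is left pointing at author 1.
conn.execute("DELETE FROM author WHERE id = 1")
remaining_book_ids = [row[0] for row in conn.execute("SELECT id FROM book ORDER BY id")]
```

Whether cascading is the right behaviour is a design choice; the default (refusing the delete while children exist) is often the safer one.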
About impact on performance: In my experience with MS Access 2003, if you have a multi-user application and use Relationships to enforce a lot of referential integrity, you can take a big hit in terms of response time for the end-user.
There are different ways to take care of enforcing referential integrity. I decided to take out some rules in Relationships, build more enforcement into the front-end and live with some loss of RI. Of course in the multi-user environment, you want to be very careful with that bit of liberty.
In my experience building performance-sensitive databases, Foreign Keys hurt performance pretty significantly, since they have to be checked every time the referring record is inserted/updated or master record is deleted. If you need a proof, just look at the execution plan.
I still keep them for documentation and for tools to use but I usually disable them, especially in high-performance systems where access to DB is only through the application layer.