Index on every Foreign Key?

Index on every Foreign Key? - sql

Does Index on every Foreign Key makes queries optimized ??

Typically it's considered good practice to place indexes on foreign keys. This is done b/c it helps with join performance when linking the FK table to the table that contains the definition of the key.
This doesn't magically make your entire query optimized, but it will definitely help to improve the join performance between the FK and it's Primary Key counter-part.

It might be a seen as a good practice to add an index on every foreign key, but you should be warned that if you have a large database, the more index you have, the more heavy you system will become. There are always extra maintenance and system resource cost required when adding an index.
I personally would add indexes only on the foreign keys that are used in queries that needs optimization. Be sure to keep your indexes up to date by occasionally running a profiler to monitor your system.

i did a little bit of testing on this, and i didn't find any performance enhancement, but SQLMenace will tell you otherwise. My opinion is to try it and see if it works for you.

Related

Database Design Without Foreign Keys

After having worked at various employers I've noticed a trend of "bad" database design with some of these companies - primarily the exclusion of Foreign Keys Constraints. It has always bugged me that these transactional systems didn't have FK's, which would've promoted referential integrity.
Are there any scenarios, in transactional systems, whereby the omission of FK's would be beneficial?
Has anyone else experienced this, if so what was the outcome?
What should one do if they're presented with this scenario and their asked to maintain/enhance the system?

I cannot think of any scenario where, if two columns have a dependency, they should not have a FK constraint set up between them. Removing referential integrity may certainly speed up database operations but there's a pretty high cost to pay for that.
I have experienced such systems and the usual outcome is corrupted data, in the sense that records exists that shouldn't exist (or vice versa). These are the sort of systems where people believe they're okay because the application takes care of it, not caring that:
Every application has to take care of it, rather than one DB server.
It only takes one bug, or malignant app, to screw it up for everyone.
It is the responsibility of the database to protect itself! That is one of its best features.
As to what you should do, I simply put forward the possible things that can go wrong and how using FKs will prevent that (often with a cost/benefit analysis "skewed" toward my viewpoint, if necessary). Then let the company decide - it is their database, after all.

There is a school of thought that a well-written application does not need referential integrity. If the application does things right, the thinking goes, there's no need for constraints.
Such thinking is akin to not doing defensive programming because if you write the code correctly, you won't have bugs. While true, it simply won't happen. Not using appropriate constraints is asking for data corruption.
As for what you should do, you should encourage the company to add constraints at every opportunity. You don't want to push it to the point of getting in trouble or making a bad name for yourself, but as long as the environment is appropriate, keep pushing for it. Everyone's life will be better in the long run.

Personally, I have no problem with a database not having explicit declarations for foreign keys. But, it depends on how the database is being used.
Most of the databases that I work with are relatively static data derived from one or more transactional systems. I am not particularly concerned with rogue updates affecting the database, so an explicit definition of a foreign key relationship is not particularly important.
One thing that I do have is very consistent naming. Basically, every table has a first column called ID, which is exactly how the column is refered to in other tables (or, sometimes with a prefix, when there are multiple relationships between two entities). I also try to insist that every column in such a database has a unique name that describes the attribute (so "CustomerStartDate" is different from "ProductStartDate").
If I were dealing with data that had more "cooks in the pot", then I would want to be more explicit about the foreign key relationships. And, I then I am more willing to have the overhead of foreign key definitions.
This overhead arises in many places. When creating a new table, I may want to use use "create table as" or "select into" and not worry about the particulars of constraints. When running update or insert queries, I may not want the database overhead of checking things that I know are ok. However, I must emphasize that consistent naming greatly increases my confidence that things are ok.
Clearly, my perspective is not one of a DBA but of a practitioner. However, invalid relationships between tables are something I -- or the rest of my team -- almost never has to deal with.

As long as there's a single point of entry into the database it ultimately doesn't matter which "layer" is maintaining referential integrity. Using the "built-in layer" of foreign key constraints seems to make the most sense, but if you have a rock solid service layer responsible for the same thing then it has freedom to break the rules if necessary.
Personally I use foreign key constraints and engineer my apps so they don't have to break the rules. Relational data with guaranteed referential integrity is just easier to work with.
The performance gained is probably equivalent to the performance lost from having to maintain integrity outside of the db.

In an OLTP database, the only reason I can think of is if you care about performance more than data integrity. Enforcing a FK when row is inserted to the child table requires an index seek on the parent table and I can imagine there may be extreme situations where even this relatively quick index seek is too much. For example, some kind of very intensive logging where you can live with incorrect log entries and the application doing the writing is simple and unlikely to have bugs.
That being said, if you can live with corrupt data, you can probably live without a database in the first place.

Defensive Programming withot foreign keys works if you primarily use stored procedures and every application uses those stored procedures, instead of writing their own queries. Then you can control it quite easily and more flexible than the standard foreign keys.
One situation I can think of off the top of my head where foreign key constraints are not readily usable is a permissions module where permissions can be applied per user or per group, determined by a Boolean. So some of the records in the permissions table have a user id and others have a group id. If you still wanted foreign key constraints, you would have to have two different fields for the same mutally exclusive information and allow them to be null. Meaning adding another constraint saying that one is allowed to be null but they can't both be null, as well as a combination of 3 fields must be unique instead of a combination of 2 fields (user/group id and permission id). And the alternative is two separate tables containing the same data, meaning maintaining both tables separately.
But perhaps in that scenario, it's best to separate the data. Anything where you need the same field to connect to different tables based on other data in that record, you cannot use foreign field constraints, and it becomes best to keep the constraints in the stored procedures and views instead.

Primary Key Change Force Foreign Key Changes

I have an issue I am working with an existing SQL Server 2008 database: I need to occasionally change the primary key value for some existing records in a table. Unfortunately, there are about 30 other tables with foreign key references to this table.
What is the most elegant way to change a primary key and related foreign keys?
I am not in a situation where I can change the existing key structure, so this is not an option. Additionally, as the system is expanded, more tables will be related to this table, so maintainability is very important. I am looking for the most elegant and maintainable solution, and any help is greatly appreciated. I so far have thought about using Stored Procedures or Triggers, but I wanted some advice before heading in the wrong direction.
Thanks!

When you say "I am not in a situation where I can change the existing key structure" are you able to add the ON UPDATE CASCADE option to the foreign keys? That is the easiest way to handle this situation — no programming required.

As Larry said, On Update Cascade will work, however, it can cause major problems in a production database and most dbas are not too thrilled with letting you use it. For instance, suppose you have a customer who changes his company name (and that is the PK) and there are two million related records in various tables. On UPDATE Cascade will do all the updates in one transaction which could lock up your major tables for several hours. This is one reason why it is a very bad idea to have a PK that will need to be changed. A trigger would be just as bad and if incorrectly written, it could be much worse.
If you do the changes in a stored proc you can put each part in a separate transaction, so at least you aren't locking everything up. You can also update records in batches so that if you have a million records to update in a table, you can do them in smaller batches which will will run faster and have fewer locks. The best way to do this is to create a new record in the primary table with the new PK and then move the old records to the new one in batches and then delete the old record once all related records are moved. If you do this sort of thing, it is best to have audit tables so you can easily revert the data if there is a problem since you will want to do this in multiple transactions to avoid locking the whole database. Now this is harder to maintain, you have to remember to add to the proc when you add an FK (but you would have to remember to do on UPDATE CASCADE as well). On the other hand if it breaks due to a problem with a new FK, it is an easy fix, you know right what the problems is and can easily put a change to prod relatively quickly.
There are no easy solutions to this problem because the basic problem is poor design. You'll have to look over the pros and cons of all solutions (I would throw out the trigger idea as Cascade Update will perform better and be less subject to bugs) and decide what works best in your case. Remember data integrity and performance are critical to enterprise databases and may be more important than maintainability (heresy, I know).

If you have to update your primary key regularly then something is wrong there. :)
I think the simplest way to do it is add another column and make it the primary key. This would allow you to change the values easily and also related the foreign keys. Besides, I do not understand why you cannot change the existing key structure.
But, as you pointed in the question (and Larry Lustig commented) you cannot change the existing structure. But, I am afraid if it is a column which requires frequent updates then use of triggers could affect the performance adversely. And, you also say that as the system expands, more tables will be related to this table so maintainability is very important. But, a quick fix now will only worsen the problem.

Does introducing foreign keys to MySQL reduce performance

I'm building Ruby on Rails 2.3.5 app. By default, Ruby on Rails doesn't provide foreign key contraints so I have to do it manually. I was wondering if introducing foreign keys reduces query performance on the database side enough to make it not worth doing. Performance in this case is my first priority as I can check for data consistency with code. What is your recommendation in general? do you recommend using foreign keys? and how do you suggest I should measure this?

Assuming:
You are already using a storage engine that supports FKs (ie: InnoDB)
You already have indexes on the columns involved
Then I would guess that you'll get better performance by having MySQL enforce integrity. Enforcing referential integrity, is, after all, something that database engines are optimized to do. Writing your own code to manage integrity in Ruby is going to be slow in comparison.
If you need to move from MyISAM to InnoDB to get the FK functionality, you need to consider the tradeoffs in performance between the two engines.
If you don't already have indicies, you need to decide if you want them. Generally speaking, if you're doing more reads than writes, you want (need, even) the indicies.
Stacking an FK on top of stuff that is currently indexed should cause less of an overall performance hit than implementing those kinds of checks in your application code.

Generally speaking, more keys (foreign or otherwise) will reduce INSERT/UPDATE performance and increase SELECT performance.
The added benefit of data integrity, is likely just about always worth the small performance decrease that comes with adding your foreign keys. What good is a fast app if the data within it is junk (missing parts or etc)?
Found a similar query here: Does Foreign Key improve query performance?

You should define foreign keys. In general (though I do not know the specifics about mySQL), there is no effect on queries (and when there is an optimizer, like the Cost based optimizer in Oracle, it may even have a positive effects since the optimizer can rely on the foreign key information to choose better access plans).
As per the effect on insert and update, there may be an impact, but the benefits that you get (referential integrity and data consistency) far outweight the performance impact. Of course, you can design a system that will not perform at all, but the main reason will not be because you added the foreign keys. And the impact on maintaining your code when you decide to use some other language, or because the business rules have slightly changed, or because a new programmer joins your team, etc., is far more expensive than the performance impact.
My recommendation, then, is yes, go and define the foreign keys. Your end product will be more robust.

It is a good idea to use foreign keys because that assures you of data consistency ( you do not want orphan rows and other inconsistent data problems).
But at the same time adding a foreign key does introduce some performance hit. Assuming you are using INNODB as the storage engine, it uses clustered index for PK's where essentially data is stored along with the PK. For accessing data using secondary index requires a pass over the secondary index tree ( where nodes contain the PK) and then a second pass over the clustered index to actually fetch the data. So any DML on the parent table which involves the FK in question, will require two passes over the index in the child table. Ofcourse, the impact of the performance hit depends on the amount of data, your disk performance, your memory constraints ( data/index cached). So it is best to measure it with your target system in mind. I would say the best way to measure it is with your sample target data, or atleast some representative target data for your system. Then try to run some benchmarks with and without FK constraints. Write client side scripts which generate the same load in both cases.
Though, if you are manually checking for FK constraints, I would recommend that you leave it upto mysql and let mysql handle it.

Two points:
1. are you sure that checking integrity at the application level would be better in terms of performance?
2. run your own test - testing if FKs have positive or negative influence on performance should be almost trivial.

Is there a severe performance hit for using Foreign Keys in SQL Server?

I'm trying my best to persuade my boss into letting us use foreign keys in our databases - so far without luck.
He claims it costs a significant amount of performance, and says we'll just have jobs to cleanup the invalid references now and then.
Obviously this doesn't work in practice, and the database is flooded with invalid references.
Does anyone know of a comparison, benchmark or similar which proves there's no significant performance hit to using foreign keys? (Which I hope will convince him)

There is a tiny performance hit on inserts, updates and deletes because the FK has to be checked. For an individual record this would normally be so slight as to be unnoticeable unless you start having a ridiculous number of FKs associated to the table (Clearly it takes longer to check 100 other tables than 2). This is a good thing not a bad thing as databases without integrity are untrustworthy and thus useless. You should not trade integrity for speed. That performance hit is usually offset by the better ability to optimize execution plans.
We have a medium sized database with around 9 million records and FKs everywhere they should be and rarely notice a performance hit (except on one badly designed table that has well over 100 foreign keys, it is a bit slow to delete records from this as all must be checked). Almost every dba I know of who deals with large, terabyte sized databases and a true need for high performance on large data sets insists on foreign key constraints because integrity is key to any database. If the people with terabyte-sized databases can afford the very small performance hit, then so can you.
FKs are not automatically indexed and if they are not indexed this can cause performance problems.
Honestly, I'd take a copy of your database, add properly indexed FKs and show the time difference to insert, delete, update and select from those tables in comparision with the same from your database without the FKs. Show that you won't be causing a performance hit. Then show the results of queries that show orphaned records that no longer have meaning because the PK they are related to no longer exists. It is especially effective to show this for tables which contain financial information ("We have 2700 orders that we can't associate with a customer" will make management sit up and take notice).

From Microsoft Patterns and Practices: Chapter 14 Improving SQL Server Performance:
When primary and foreign keys are
defined as constraints in the database
schema, the server can use that
information to create optimal
execution plans.

This is more of a political issue than a technical one. If your project management doesn't see any value in maintaining the integrity of your data, you need to be on a different project.
If your boss doesn't already know or care that you have thousands of invalid references, he isn't going to start caring just because you tell him about it. I sympathize with the other posters here who are trying to urge you to do the "right thing" by fighting the good fight, but I've tried it many times before and in actual practice it doesn't work. The story of David and Goliath makes good reading, but in real life it's a losing proposition.

It is OK to be concerned about performance, but making paranoid decisions is not.
You can easily write benchmark code to show results yourself, but first you'll need to find out what performance your boss is concerned about and detail exactly those metrics.
As far as the invalid references ar concerned, if you don't allow nulls on your foreign keys, you won't get invalid references. The database will esception if you try to assign an invalid foreign key that does not exist. If you need "nulls", assign a key to be "UNDEFINED" or something like that, and make that the default key.
Finally, explain database normalisation issues to your boss, because I think you will quickly find that this issue will be more of a problem than foreign key performance ever will.

Does anyone know of a comparison, benchmark or similar which proves there's no significant performance hit to using foreign keys ? (Which I hope will convince him)
I think you're going about this the wrong way. Benchmarks never convince anyone.
What you should do, is first uncover the problems that result from not using foreign key constraints. Try to quantify how much work it costs to "clean out invalid references". In addition, try and gauge how many errors result in the business process because of these errors. If you can attach a dollar amount to that - even better.
Now for a benchmark - you should try and get insight into your workload, identify which type of operations are done most often. Then set up a testing environment, and replay those operations with foreign keys in place. Then compare.
Personally I would not claim right away without knowledge of the applications that are running on the database that foreign keys don't cost performance. Especially if you have cascading deletes and/or updates in combination with composite natural primary keys, then I personally would have some fear of performance issues, especially timed-out or deadlocked transactions due to side-effects of cascading operations.
But no-one can tell you- you have to test yourself, with your data, your workload, your number of concurrent users, your hardware, your applications.

A significant factor in the cost would be the size of the index the foreign key references - if it's small and frequently used, the performance impact will be negligible, large and less frequently used indexes will have more impact, but if your foreign key is against a clustered index, it still shouldn't be a huge hit, but #Ronald Bouman is right - you need to test to be sure.

i know that this is a decade post.
But database primitives are always on demand.
I will refer to my own experience.
In one of the projects that i have worked has to deal with a telecommunication switch database. They have developed a database with no FKs, the reason was because they wanted as much faster inserts they could have. Because sy system it self it have to deal with calls, it make some sense.
Before, there was no need for any intensive queries and if you wanted any report, you could use the GUI software of the switch. After some time you could have some basic reports.
But when i was involved they wanted to develop and AI thus to be able to create smart reports and have something like an automatic troubleshooting.
It was completely a nightmare, having millions of records, you couldn't execute any long query and many times facing sql server timeout. And don't even think using Entity Framework.
It is much difference when you have to face a situation like this instead of describing.
My advice is that you have to be very specific on your design and having a very good reason why not using FKs.

Identity Column as Primary Key

Could you please opine if having identity column as primary key is a good practise?
For ORM tools, having identity column on tables helps. But there are other side effects such as accidental duplicate insertion.
Thanks
Nayn

Yes, using a INT (or BIGINT) IDENTITY is very good practice for SQL Server.
SQL Server uses the primary key as its default clustering key, and the clustering key should always have these properties:
narrow
static
unique
ever-increasing
INT IDENTITY fits the bill perfectly!
For more background info, and especially some info why a GUID as your primary (and thus clustering key) is a bad idea, see Kimberly Tripp's excellent posts:
GUIDs as PRIMARY KEYs and/or the clustering key
The Clustered Index Debate Continues...
Ever-increasing clustering key - the Clustered Index Debate..........again!
If you have reasons to use a GUID as primary key (e.g. replication), then by all means make sure to have a INT IDENTITY as your clustering key on those tables!
Marc

IDENTITY keys are a good practice for server-side generated keys, in environments where you don't have replication or heavy data merging. The way they're implemented, they don't allow duplicates in the same table, so don't worry about that. They also have the advantage of minimizing fragmentation in tables that don't have lots of DELETEs.
GUIDs are the usual alternative. They have the advantage that you can create them at the web tier, without requiring a DB round-trip. However, they're larger than IDENTITIES, and they can cause extreme table fragmentation. Since they're (semi) random, inserts are spread through the entire table, rather than being focused in one page at the end.

I use a Guid because it really helps when I am dealing with distributed applications. Especially when all the distributed instances also need to create new data.
Nevertheless, I don't see any problems with autoincrement integer primary keys in simple situations. I would prefer them actually because it is easier to work directly with SQL queries because it is easier to remember.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas