As I understood from this post, there are some scenarios where foreign keys can improve query performance.
I've heard the opposite claim though, that because of referential integrity checks, foreign keys can actually hurt query performance. Under which conditions (if at all) is this true?
1) The term query seems to be misleading. I am interested in all kinds of performance penalties.
2) Does anyone have any real-world numbers about the negative impact on INSERT, DELETE or UPDATE statements (I know it depends on the specific system, but nevertheless any kind of real-world measurements would be appreciated)?
if a foreign key is needed for referential integrity then the presence of the foreign key should form the baseline for performance
you might as well ask if a car can go faster if you don't put seats in - a well formed car includes seats as much as a well formed database includes foreign keys
I'm assuming that for INSERT queries, constraints - including foreign key constraints - will slow performance somewhat. The database has to check that whatever you've told it to insert is something that your constraints allow it to insert.
For SELECT queries, foreign key constraints shouldn't make any changes to performance.
Since INSERTS are almost always very quick, the small amount of extra time won't be noticeable, except in fringe cases. (Building a several gigabyte database, you might want to disable constraints and then re-enable later, as long as you're sure the data is good.)
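To see the check that the database performs on each insert, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is just my choice for a runnable illustration; the `customers` and `orders` tables and the `insert_order` helper are made up, and other engines will differ in syntax and cost.

```python
import sqlite3

def insert_order(conn, customer_id):
    """Try to insert a child row; return False if the FK check rejects it."""
    try:
        conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")

ok_existing = insert_order(conn, 1)   # parent row exists, insert is accepted
ok_missing = insert_order(conn, 99)   # no such parent, insert is rejected
```

The extra work on the second insert is exactly the lookup into `customers` that the constraint requires.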
In theory, yes: data writes need to validate the constraints.
In practice, seldom: unless measured and proved otherwise, you can assume there is no performance impact. Overwhelmingly, performance problems occur due to other problems:
bad schema design (missing indexes, bad clustered index choice)
contention (blocking), again due to bad schema design (table scans guarantee lock conflicts)
bad query design
On a well designed schema and good queries the cost of constraints will start to show up at very high throughput. When this happens, there are preventive measures.
My 2c: Never sacrifice correctness constraints for some elusive performance goals. In the very rare case when the constraints are indeed the problem there are measurements to show that's the case, and as the saying goes: if you have to ask how much it costs, you can't afford it. If you have to ask if constraints can be a problem, you can't remove them (no offence intended).
Foreign keys can cause inserts (or some updates) in child tables, or deletes from parent tables, to take longer. This is a good thing, however, because it means the database is making sure that data integrity is preserved. There is no reason whatsoever to skip using foreign keys unless you don't want to have useful data. You will not normally notice much difference unless you have many foreign keys related to the same parent table or you are inserting or deleting many records in one batch. Further, I have noticed that users are more tolerant of a couple of extra seconds in an insert or delete than in a select. Users are also not tolerant at all of unreliable data, which is what you have without foreign key constraints.
You will need to index them to improve performance on select queries.
For INSERT/UPDATE/DELETE the short answer is, "Yes". The database will need to check that the referential integrity is intact and the creation/modification is allowed. Or in DELETE's case, there may be some cascading to be done.
For SELECTs, it's actually quite the contrary. Foreign Keys have a secret added benefit of showing you exactly where you're most likely to be doing complex JOINs and have very commonly used fields. This makes the job of indexing much easier, and you can pretty much guarantee that all of your FK fields should be indexed. This makes SELECTs much faster.
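As a rough illustration of the indexing point, the sketch below asks SQLite for its query plan before and after an index is added on the FK column. It uses Python's sqlite3 module; the table names, the index name `idx_orders_customer` and the `plan_for` helper are all hypothetical, and the exact plan text varies between SQLite versions.

```python
import sqlite3

def plan_for(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(id),"
    " total REAL)"
)

query = "SELECT id, total FROM orders WHERE customer_id = 1"

# Declaring the foreign key does not create an index on orders.customer_id.
plan_before = plan_for(conn, query)

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_for(conn, query)

print(plan_before)
print(plan_after)
```

The first plan is a scan of `orders`; the second one names the new index. The foreign key itself only told us which column was worth indexing.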
Foreign key checking takes more time than most people think. A recent test with Oracle 11g and a table with two foreign keys showed that an insert of about 800,000 rows took 60 seconds with the foreign keys enabled but only 20 seconds without them.
(The foreign key columns were indexed, of course)
Anyway, I agree with all the other posters, that integrity constraints are not an option, but the only way to keep data consistent. However, for imports, especially into empty tables, it could be an option to disable the foreign key for the time of the import, if time is critical.
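The import pattern looks roughly like this in SQLite, via Python's sqlite3 module. This is only a sketch with made-up `parent` and `child` tables; Oracle and SQL Server have their own commands for disabling and re-validating constraints, which I am not showing here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES parent(id))"
)

# Turn enforcement off for the bulk load (must happen outside a transaction).
conn.execute("PRAGMA foreign_keys = OFF")
conn.executemany("INSERT INTO parent (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO child (id, parent_id) VALUES (?, ?)",
    [(10, 1), (11, 2), (12, 3)],  # row 12 points at a parent that does not exist
)
conn.commit()

# After the load, ask SQLite to list every row that violates a foreign key.
# Each result row is (table, rowid, referenced table, fk index).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
orphan_rowids = [row[1] for row in violations]

# Re-enable enforcement for normal operation.
conn.execute("PRAGMA foreign_keys = ON")
```

The verification step is the important part: switching checks back on does not by itself validate what was loaded while they were off.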
If foreign keys had any impact in that way, it would be on INSERTS. The database does the referential checking on foreign keys when records are created/modified, not SELECTed.
Foreign keys will not adversely affect query performance in most cases, and are strongly recommended. By aiding normalization, they help you eliminate redundant data, and if you follow up by adding underlying indexes (for the appropriate foreign key) you will get good performance on your queries.
Foreign-keys can help the query optimizer get the best query plans for a given query.
Foreign-key checking is a factor when you update your data (which is a separate consideration - I assume here your concern is query - unless by the word query you imply both).
Foreign keys slow down insertions and alterations, because each foreign key reference must be verified. Foreign keys can either not affect a selection, or make it go faster, depending on if the DBMS uses foreign key indexing.
Foreign keys have a complex effect on deletion. If you're deleting the row that holds the foreign key, nothing else is affected, but if the row you're deleting is referenced by a foreign key in another row/table, the delete will generally either be rejected or have to cascade, depending on how the constraint is defined.
Foreign keys can cause a minor performance degradation in table creations and alterations.
Of course, this all assumes foreign key verification is in use.
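The deletion cases can be sketched like this with Python's sqlite3 module. The `authors`, `books` and `reviews` tables are invented for the example, and the default action when a referenced row is deleted may be named or configured differently in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY)")
# Default action: deleting a referenced author is rejected.
conn.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id))"
)
# Cascading action: deleting an author also deletes that author's reviews.
conn.execute(
    "CREATE TABLE reviews ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO authors (id) VALUES (1)")
conn.execute("INSERT INTO authors (id) VALUES (2)")
conn.execute("INSERT INTO books (id, author_id) VALUES (100, 1)")
conn.execute("INSERT INTO reviews (id, author_id) VALUES (200, 2)")

# Case 1: the parent row is still referenced by a book, so the delete fails.
try:
    conn.execute("DELETE FROM authors WHERE id = 1")
    parent_delete_blocked = False
except sqlite3.IntegrityError:
    parent_delete_blocked = True

# Case 2: author 2 is referenced only through the cascading key.
conn.execute("DELETE FROM authors WHERE id = 2")
reviews_left = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]

# Case 3: deleting the referencing (child) row needs no lookup elsewhere,
# and afterwards the parent can go too.
conn.execute("DELETE FROM books WHERE id = 100")
conn.execute("DELETE FROM authors WHERE id = 1")
authors_left = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
```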
I believe the noted post pointed out that putting an index on FK fields improved performance, not simply that a FK relationship improved performance. The existence of a FK on a table should not have any effect on a SELECT query, unless JOIN operations are being done, at which point, the FK relationship AND index on FK fields would improve performance.
If you're enforcing referential integrity, INSERTs, and UPDATEs that affect the FK field, will be slower. However, it's usually not much to worry about, especially as a lot of DBs are 80% read/20% write. It's also a price worth paying.
Creating an index on a foreign key is often beneficial, though obviously how much depends on what SELECT statements you're running.
Generally, you need foreign keys due to normalisation (which avoids duplicate data and synchronisation problems). Normalise to the 3rd degree, and then after analysing real world performance can you consider de-normalising.
What are the core differences between a table created with constraints such as a PK and a table that only has indexes added to it (with no PK)? Which route would you prefer when creating a table? I have worked with both, but I am curious to know what separates them. Thank you.
They are quite different.
Constraints check the data that is being inserted and updated meet some criteria (for example "not null"). If the data does not meet the criteria, the INSERT or UPDATE is rejected, and fails. Constraints help you to maintain the quality of the data.
Indexes improve (most of the time) the speed of a query, and are usually beneficial to SELECT, UPDATE, and DELETE operations. Indexes improve database performance.
An index has no effect on how a query behaves or on the schema definition. It only affects performance, although some SQL servers implement features using indexes, particularly the unique constraint. The SQL standard doesn't even mention indexes because they're considered an implementation issue.
A primary key constraint very much does have an effect on behavior and schema definition. It says this column must be unique and not null. Most databases also happen to index it for obvious performance reasons.
Declaring a primary key rather than manually saying unique not null also lets the person reading your schema know that this is the primary key. They will know what its purpose is. It also lets the database know this is the primary key, which might allow it to do some extra optimizations.
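A small sketch of the behavioural difference, using Python's sqlite3 module with hypothetical tables `indexed_only` and `with_pk`. One SQLite-specific assumption is flagged in the comments: I spell out NOT NULL on the primary key because SQLite does not imply it for non-integer keys, whereas most other engines do.

```python
import sqlite3

def try_insert(conn, table, code):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(f"INSERT INTO {table} (code) VALUES (?)", (code,))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")

# A plain index: speeds up lookups, places no restriction on the data.
conn.execute("CREATE TABLE indexed_only (code TEXT)")
conn.execute("CREATE INDEX idx_indexed_only_code ON indexed_only (code)")

# A primary key: changes what data the table will accept.
# NOT NULL is written explicitly because SQLite does not imply it here.
conn.execute("CREATE TABLE with_pk (code TEXT NOT NULL PRIMARY KEY)")

index_results = [
    try_insert(conn, "indexed_only", "A"),
    try_insert(conn, "indexed_only", "A"),  # duplicate is accepted
]
pk_results = [
    try_insert(conn, "with_pk", "A"),
    try_insert(conn, "with_pk", "A"),       # duplicate is rejected
]
pk_null_accepted = try_insert(conn, "with_pk", None)  # NULL is rejected
```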
I use foreign keys at work. But we pretty much manually manage our tables and we always make sure that we always have a parent entry in another table for a child entry that references it by its Id. We insert, update and delete the parent and child entities in the table in the same transaction.
So why should we still keep those foreign keys? They slow the database down when inserting new entities in the database and may be one of the reasons we get deadlocks from time to time.
Are they actually used by Sql Server for other things? Like gathering better statistics or is their only purpose to keep data integrity?
You shouldn't drop constraints along with their foreign keys.
Checks at the database level are the last integrity barrier protecting your data.
You might want to remove foreign keys for performance reasons, but you might end up having to maintain a partially corrupted DB, which ends up being a nightmare.
Can a foreign key improve performance?
A foreign key constraint improves performance when reading data, but at the same time it slows down inserting, modifying, and deleting data.
When reading, the optimizer can use foreign key constraints to create more efficient query plans, because foreign key constraints are pre-declared rules. This usually involves skipping some part of the query plan: for example, the optimizer can see that, because of a foreign key constraint, it is unnecessary to execute that particular part of the plan.
I have large table on my database (50 columns and might get to 100,000,000 rows).
Right now my primary key is 8 columns.
Is it better to make a single primary key (automatic number) and add unique constraints on the columns?
The structure of a database should be driven by how the data is going to be used. One can make a judgement that the structure is normalized or denormalized, for instance. But each of those methodologies is appropriate in different circumstances.
That said, I am heavily biased to having an auto incrementing ("identity") primary key in all tables. This is beneficial in many circumstances. Here are three reasons:
For knowing the insertion order of rows (to a close approximation).
For creating foreign key relationships.
To ensure that you can uniquely identify each row for updates and deletes.
However, such a column occupies more storage. And, a single primary key index is more efficient (space-wise) than having multiple indexes.
This isn't a direct answer to your question, but it does at least give you some parameters for thinking about the issues.
Usually when you don't know any better, it's best to follow the standard guidelines in DB design. Because they work for a clear majority of the cases, and there's less chance in going wrong with them.
In this case, it means that you would very likely benefit from using an auto-incrementing ID column as a clustered primary key index. As well as adding foreign keys and nonclustered indexes to all referencing columns, in case you haven't already.
Finally, you might even look at adding a few custom (even multicolumn) indexes specifically designed for your slowest queries... basically, if you have a few problem queries, you want to index the columns in their WHERE clauses.
In addition to all this, you should of course look to see that the table and DB in general is normalized. To learn about normalization, it's best for you to google for it. It's not a complex or difficult concept, but there are a thousand tutorials far better suited for explaining the concept than us here.
By doing this you will very likely be moving in the correct direction. And if it doesn't work, then that's so much more information people here can use to determine the best alternative.
We are having a rather long discussion in our company about whether or not to put an autoincrement key on EVERY table in our database.
I can understand putting one on tables that would have a FK reference to, but I kind-of dislike putting such keys on each and every one of our tables, even though the keys would never be used.
Please help with pros and cons for putting autoincrement keys on every table apart from taking extra space and slowing everything a little bit (we have some tables with hundreds of millions of records).
Thanks
I'm assuming that almost all tables will have a primary key - and it's just a question of whether that key consists of one or more natural keys or a single auto-incrementing surrogate key. If you aren't using primary keys, then you will generally gain a lot from using them on almost all tables.
So, here are some pros & cons of surrogate keys. First off, the pros:
Most importantly: they allow the natural keys to change. Trivial example, a table of persons should have a primary key of person_id rather than last_name, first_name.
Read performance - very small indexes are faster to scan. However, this is only helpful if you're actually constraining your query by the surrogate key. So, good for lookup tables, not so good for primary tables.
Simplicity - if named appropriately, it makes the database easy to learn & use.
Capacity - if you're designing something like a data warehouse fact table - surrogate keys on your dimensions allow you to keep a very narrow fact table - which results in huge capacity improvements.
And cons:
They don't prevent duplicates of the natural values. So, you'll still usually want a unique constraint (index) on the logical key.
Write performance. With an extra index you're going to slow down inserts, updates and deletes that much more.
Simplicity - for small tables of data that almost never changes they are unnecessary. For example, if you need a list of countries you can use the ISO list of countries. It includes meaningful abbreviations. This is better than a surrogate key because it's both small and useful.
In general, surrogate keys are useful, just keep in mind the cons and don't hesitate to use natural keys when appropriate.
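The first con above (duplicates of the natural values) is easy to demonstrate. Here is a minimal sketch with Python's sqlite3 module; the `products` table, the `sku` column and the helper function are made up for illustration.

```python
import sqlite3

def count_after_duplicate_insert(create_sql):
    """Create the table, insert the same SKU twice, return how many rows exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    for _ in range(2):
        try:
            conn.execute("INSERT INTO products (sku) VALUES ('ABC-1')")
        except sqlite3.IntegrityError:
            pass  # the second insert is rejected when sku is UNIQUE
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# Surrogate key only: both rows get distinct ids, so the duplicate slips in.
surrogate_only = count_after_duplicate_insert(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " sku TEXT NOT NULL)"
)

# Surrogate key plus a unique constraint on the natural key.
surrogate_plus_unique = count_after_duplicate_insert(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " sku TEXT NOT NULL UNIQUE)"
)
```

The unique constraint costs an extra index, which is the write-performance con from the same list.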
You need primary keys on these tables. You just don't know it yet.
If you use small keys like this for Clustered Indexes, then there are quite significant advantages.
Like:
Inserts will always go at the end of pages.
Non-Clustered Indexes (which need a reference to the CIX key(s)) won't have long row addresses to consider.
And more... Kimberly Tripp's stuff is the best resource for this. Google her...
Also - if you have nothing else ensuring uniqueness, you have a hook into each row that you wouldn't otherwise have. You should still put unique indexes on fields that should be unique, and use FKs onto appropriate fields.
But... please consider the overhead of creating such things on existing tables. It could be quite scary. You can put unique indexes on tables without needing to create extra fields. Those unique indexes can then be used for FKs.
I'm not a fan of auto-increment primary keys on every table. The ideas that these give you fast joins and fast row inserts are really not true. My company calls this meatloaf thinking after the story about the woman who always cut the ends off her meatloaf just because her mother always did it. Her mother only did it because the pan was too short--the tradition keeps going even though the reason no longer exists.
When the driving table in a join has an auto-increment key, the joined table frequently shouldn't because it must have the FK to the driving table. It's the same column type, but not auto-increment. You can use the FK as the PK or part of a composite PK.
Adding an auto-increment key to a table with a naturally unique key will not always speed things up--how can it? You are adding more work by maintaining an extra index. If you never use the auto-increment key, this is completely wasted effort.
It's very difficult to predict optimizer performance--and impossible to predict future performance. On some databases, compressed or clustered indexes will decrease the costs of naturally unique PKs. On some parallel databases, auto-increment keys are negotiated between nodes and that increases the cost of auto-increment. You can only find out by profiling, and it really sucks to have to change Company Policy just to change how you create a table.
Having autoincrementing primary keys may make it easier for you to switch ORM layers in the future, and doesn't cost much (assuming you retain your logical unique keys).
You add surrogate auto increment primary keys as part of the implementation after logical design to respect the physical, on-disk architecture of the db engine.
That is, they have physical properties (narrow, numeric, strictly monotonically increasing) that suit use as clustered keys, in joins etc.
Example: If you're modelling your data, then "product SKU" is your key. "product ID" is added afterwards, (with a unique constraint on "product SKU") when writing your "CREATE TABLE" statements because you know SQL Server.
This is the main reason.
The other reason is a brain-dead ORM that can't work without one...
Many tables are better off with a compound PK, composed of two or more FKs. These tables correspond to relationships in the Entity-Relationship (ER) model. The ER model is useful for conceptualizing a schema and understanding the requirements, but it should not be confused with a database design.
The tables that represent entities from an ER model should have a simple PK. You use a surrogate PK when none of the natural keys can be trusted. The decision about whether a key can be trusted or not is not a technical decision. It depends on the data you are going to be given, and what you are expected to do with it.
If you use a surrogate key that's autoincremented, you now have to make sure that duplicate references to the same entity don't creep into your databases. These duplicates would show up as two or more rows with a distinct PK (because it's been autoincremented), but otherwise duplicates of each other.
If you let duplicates into your database, eventually your use of the data is going to be a mess.
The simplest approach is to always use surrogate keys that are auto-incremented either by the db or via an ORM. And on every table. This is because they are generally the fastest method for joins, and they also make learning the database extremely simple, i.e. none of this "what's my key for this table" nonsense, as they all use the same kind of key. Yes, they can be slower, but in truth the most important part of design is something that won't break over time. This is proven for surrogate keys. Remember, maintenance of the system goes on a lot longer than development. Plan for a system that can be maintained. Also, with current hardware the potential performance loss is really negligible.
Consider this:
A record is deleted in one table that has a relationship with another table. The corresponding record in the second table cannot be deleted for auditing reasons. This record becomes orphaned from the first table. If a new record is inserted into the first table, and a sequential primary key is used, this record is now linked to the orphan. Obviously, this is bad. By using an auto incremented PK, an id that has never been used before is always guaranteed. This means that orphans remain orphans, which is correct.
I would never use natural keys as a PK. A numeric PK, such as an auto-increment column, is the ideal choice the majority of the time, because it can be indexed efficiently. Auto-increments are guaranteed to be unique, even when records are deleted, creating trusted data relationships.
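Whether an id can be reused after a delete depends on the engine, so it is worth checking rather than assuming. As one data point, here is a sketch for SQLite using Python's sqlite3 module (the tables `plain_ids` and `auto_ids` and the helper are hypothetical). In SQLite a plain INTEGER PRIMARY KEY picks one more than the largest id currently in the table, while AUTOINCREMENT never goes back to an id that has already been used.

```python
import sqlite3

def id_after_deleting_last(conn, table):
    """Insert three rows, delete the newest one, insert again, return the new id."""
    for label in ("a", "b", "c"):
        conn.execute(f"INSERT INTO {table} (label) VALUES (?)", (label,))
    conn.execute(f"DELETE FROM {table} WHERE id = 3")
    cur = conn.execute(f"INSERT INTO {table} (label) VALUES ('d')")
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plain_ids (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute("CREATE TABLE auto_ids (id INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT)")

reused_id = id_after_deleting_last(conn, "plain_ids")  # id 3 is handed out again
fresh_id = id_after_deleting_last(conn, "auto_ids")    # id 4, never used before
```

With the first table, an orphaned audit row that still says "3" would silently attach to the new record, which is the scenario described above.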
We're designing a database in which I need to consider some FK (foreign key) constraints. But it is not limited to formal structuring and normalization. We go for it only if it provides any performance or scalability benefits.
I've been going thru some interesting articles and googling for practical benefits. Here are some links:
http://www.mssqltips.com/tip.asp?tip=1296
I wanted to know more about the benefits of FK (apart from the formal structuring and the famous cascaded delete\update).
FK are not 'indexed' by default so what are the considerations while indexing an FK?
How to handle nullable fields which are mapped as foreign key - is this allowed?
Apart from indexing, does this help in optimizing query-execution plans in SQL-Server?
I know there's more but I'd prefer experts speaking on this. Please guide me.
Foreign keys provide no performance or scalability benefits.
Foreign keys enforce referential integrity. This can provide a practical benefit by raising an error if someone attempted to delete rows from the parent table in error.
Foreign keys are not indexed by default. You should index your foreign keys columns, as this avoids a table scan on the child table when you delete/update your parent row.
You can make a foreign key column nullable and insert null.
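A quick sketch of the nullable case with Python's sqlite3 module (the `departments` and `employees` tables are invented for the example). A NULL in the FK column is accepted, while a non-NULL value still has to match a parent row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY,"
    " department_id INTEGER REFERENCES departments(id))"  # nullable FK column
)

# NULL means "no department yet"; the constraint does not check it.
conn.execute("INSERT INTO employees (id, department_id) VALUES (1, NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE department_id IS NULL"
).fetchone()[0]

# A non-NULL value must still point at an existing department.
try:
    conn.execute("INSERT INTO employees (id, department_id) VALUES (2, 42)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
```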
The main benefit is that your database will not end up inconsistent if your buggy client code tries to do something wrong. Foreign keys are a type of 'constraint', so that's how you should use them.
They do not have any "functional" benefit, they will not optimize anything. You still have to create indexes yourself, etc. And yes, you can have NULL values in a column that is a foreign key.
FK constraints keep your data consistent. That's it. This is the main benefit.
FK constraints will not provide you with any performance gain.
But unless you have deliberately denormalized your db structure, I'd recommend that you use FK constraints. The main reason is consistency.
I have read at least one example on the net where it was shown that foreign keys do improve performance, because the optimiser does not have to do additional checks across tables: it already knows the data meets certain criteria due to the FK. Sorry, I don't have a link, but the blog gave detailed output of the query plans to prove it.
As mentioned, they are for data integrity. Any performance "loss" would be utterly wiped out by the time required to fix broken data.
However, there could be an indirect performance benefit.
For SQL Server at least, the columns in the FK must have the same datatype on each side. Without an FK, you could have an nvarchar parent and a varchar child, for example. When you join the two tables, you'll get datatype conversions, which can kill performance.
Example: different varchar lengths causing an issue