Is it beneficial to add an index to a column that is part of a foreign key relationship? I have two columns that will be queried frequently and already have foreign keys on them, but I wasn't sure whether I should index them as well, or whether the foreign key creates an index behind the scenes.
SQL Server does not create an index behind the scenes, so creating an index on all foreign key fields is advisable to improve lookup performance.
Details and additional benefits: http://technet.microsoft.com/en-us/library/ms175464.aspx
It is definitely advisable to add an index to your FK column if you query it often.
In your situation, it is probably even better if you create a composite index which spans your 2 columns.
A composite index is only advisable, however, if you often execute queries that filter or order on both of these columns.
If you decide that a composite index is appropriate, then you should pay attention to the order in which you put an index on those columns.
Creating an index for your FK fields is strongly advisable - this will almost definitely improve performance. Note that this applies to selects; inserts may be slower, as the index itself will also have to be updated.
SQL Server will sometimes create an FK index 'on the fly', but that only exists for the lifetime of your query - you can look at the SQL execution plan to see if that is occurring. Either way, creating your own index, under your control, for FK fields is the way to go.
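To see the general effect without a SQL Server instance at hand, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (it, too, does not index the referencing column of a foreign key), and the customers/orders tables and the index name are made up for illustration. The plan text is SQLite's, not SQL Server's.

```python
import sqlite3

# Hypothetical schema: orders.customer_id references customers.id.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL
    );
""")

def index_names(table):
    # PRAGMA index_list returns one row per index; column 1 is the index name.
    return [row[1] for row in conn.execute(f"PRAGMA index_list('{table}')")]

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN is a readable description of a plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT total FROM orders WHERE customer_id = 42"

indexes_before = index_names("orders")  # the foreign key alone adds no index
plan_before = plan(query)               # full scan of orders

conn.execute("CREATE INDEX ix_orders_customer_id ON orders (customer_id)")

indexes_after = index_names("orders")
plan_after = plan(query)                # search through the new index

print(indexes_before, plan_before)
print(indexes_after, plan_after)
```

On SQL Server you would check the same thing by comparing the execution plan before and after creating the index.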
Related
I have a table that requires two columns to be unique. They are likely to be joined regularly, so I probably need an index on these columns.
I checked the Infocenter on the Unique constraint and on the Unique Index.
I'm wondering about the difference and the performance impact. Both seem to create an index. The unique index allows one null value. Are there other important differences?
Do these indexes improve query performance or are they just enforcing uniqueness? Should I add an additional index for performance reasons or will the unique index be good enough? Unfortunately I don't have enough test data yet to try it out.
There is no performance difference between a unique constraint and a unique index, and either one would suffice. During query processing, the DB2 optimizer automatically picks up indexes created to maintain a unique constraint.
You will find an explanation in this topic: http://bytes.com/topic/db2/answers/185707-difference-between-unique-constraint-unique-index
The explanation in one quote:
A unique index is a physical thing whereas a unique constraint is a data
modeling construct. As was already stated, unique constraint are
implemented by adding a unique index (and additionally requiring the NOT
NULL condition).
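If you want to convince yourself that both forms end up backed by an index the optimizer can use, here is a minimal sketch with Python's built-in sqlite3 module. SQLite is an assumption standing in for DB2 (its NULL handling for unique columns differs, so that part is not shown), and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: the same uniqueness rule, declared two ways.
    CREATE TABLE by_constraint (a INTEGER, b INTEGER, payload TEXT, UNIQUE (a, b));
    CREATE TABLE by_index (a INTEGER, b INTEGER, payload TEXT);
    CREATE UNIQUE INDEX ux_by_index_a_b ON by_index (a, b);
""")

def unique_indexes(table):
    # PRAGMA index_list columns start with: seq, name, unique flag.
    rows = conn.execute(f"PRAGMA index_list('{table}')")
    return [row[1] for row in rows if row[2] == 1]

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN is a readable description of a plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def rejects_duplicate(table):
    # Insert the same (a, b) pair twice; the second insert should fail.
    conn.execute(f"INSERT INTO {table} (a, b) VALUES (1, 2)")
    try:
        conn.execute(f"INSERT INTO {table} (a, b) VALUES (1, 2)")
    except sqlite3.IntegrityError:
        return True
    return False

constraint_indexes = unique_indexes("by_constraint")  # index created by the constraint
explicit_indexes = unique_indexes("by_index")         # index created by hand
constraint_plan = plan("SELECT payload FROM by_constraint WHERE a = 1 AND b = 2")
explicit_plan = plan("SELECT payload FROM by_index WHERE a = 1 AND b = 2")
constraint_rejects = rejects_duplicate("by_constraint")
explicit_rejects = rejects_duplicate("by_index")

print(constraint_indexes, constraint_plan)
print(explicit_indexes, explicit_plan)
```

In this sketch both tables reject the duplicate row and both lookups go through a unique index, so no additional index is needed for this particular query.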
Let's say we have a table to store users' favourite pictures, with a composite primary key pair (UserId, PictureId). Books normally say that in this case you need a composite index based on (UserId, PictureId), which normally appears in the WHERE clause as (UserId=103 AND PictureId=1234). But I think the database engine should be smart enough to use two individual indexes based on the two columns separately: just get the set of row numbers from each of the indexes and find the ones that are present in both sets. That way, a composite index is not necessary.
So, in reality can database engines do that?
There'd be no advantage to using the two separate single-column indexes; the engine would be better off doing a table scan.
The point of using an index is to make access faster. If the engine used two indexes, it would have to sort at least one set of data from one of the indexes and merge the results from the two indexes. That would be a lot more work than reading just one composite index, especially since the composite index allows for an index-only scan.
Most database engines will require the composite index to enforce the primary key. As such, it's a "free" index that you're going to have anyway - why worry about it?
There may be some benefit (if the index is on UserID,PictureID) to adding a second index just on PictureID. Any query on just UserID will be able to use the composite index, whereas a query just using PictureID would be unable to do so.
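The leading-column behaviour is easy to check. Below is a minimal sketch using Python's built-in sqlite3 module; SQLite is an assumption standing in for whichever engine you use, and the table, the added_on column and the index name are hypothetical. Other optimizers may pick different plans, especially once statistics are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE favourite (
        user_id INTEGER NOT NULL,
        picture_id INTEGER NOT NULL,
        added_on TEXT,                      -- hypothetical extra column
        PRIMARY KEY (user_id, picture_id)   -- backed by a composite index
    );
""")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN is a readable description of a plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

picture_query = "SELECT added_on FROM favourite WHERE picture_id = 1234"

by_both = plan("SELECT added_on FROM favourite WHERE user_id = 103 AND picture_id = 1234")
by_user = plan("SELECT added_on FROM favourite WHERE user_id = 103")  # leading column only
by_picture = plan(picture_query)                                      # second column only

# A separate single-column index serves queries on picture_id alone.
conn.execute("CREATE INDEX ix_favourite_picture ON favourite (picture_id)")
by_picture_indexed = plan(picture_query)

for label, steps in [("both", by_both), ("user", by_user),
                     ("picture", by_picture), ("picture+index", by_picture_indexed)]:
    print(label, steps)
```

Here the composite key serves the first two queries, while the query on picture_id alone falls back to a scan until the second index exists.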
I think in the use case you describe, the composite index is not necessary. That would be useful if you were doing a query on, say, a given set of user IDs plus a given set of picture IDs. But when would you ever need that? You'd be more likely to query all of a user's pictures in a given date range, or look up a specific picture by ID. This would suggest an index structure of one composite index on user id + date, and another index on picture id only.
It always depends on the distribution of records in your database, and the types of queries you will be running most frequently.
PRIMARY KEY or UNIQUE constraints are abstract, theoretical concepts.
An INDEX is a practical, physical thing that lives in the real world.
In practice, indexes can be used to enforce PK or UNIQUE constraints. But other techniques could also be used (e.g. a bitmap for a small domain).
What you describe would be significantly more expensive than using composite index.
First a set of rows would need to be identified from the first index, then a set of rows from the second and finally the set-intersection performed between the two.
--- UPDATE ---
Note that this is the price you would pay for every INSERT/UPDATE and every foreign key check, not just SELECT.
Also, there may be concurrency issues involved - depending on how the DBMS is implemented, enforcing uniqueness through a single unique composite index might require less/simpler locking than enforcing uniqueness through two non-unique, non-composite indexes.
And of course, if you intend to cluster your table, the primary index will typically also be the clustering index, and contain all columns anyway, so there isn't much purpose in leaving anything out from the "sorting" portion of the index.
I have a table where I haven't explicitly defined a primary key, it's not really required for functionality... however a coworker suggested I add a column as a unique primary key to improve performance as the database grows...
Can anyone explain how this improves performance?
There is no indexing being used (I know I could add indexes to improve performance, what's not clear is how a primary key would also improve performance.)
The specifics
The main table is a log of user activity. It has an auto-incrementing column for each entry, so it's already unique, but it isn't set as a primary key.
This log table references activity tables which detail the specific activity, referenced by that auto-incrementing entry in the main table. So the value is only unique in the main log table; there could be 100 entries in an activity table that reference that value as an identifier (i.e. for session 212, Niall did these 500 things).
As you might guess the bulk of data is in the activity tables.
As Kimberly Tripp (the Queen of Indexing) clearly shows in her excellent blog post, The Clustered Index Debate Continues..., having a clustered index on your SQL Server table is beneficial - for all operations - yes, even for inserts!
To quote Kimberly:
Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is
that lookups in the IAM/PFS to determine the insert location in a heap
are slower than in a clustered table (where insert location is known,
defined by the clustered key). Inserts are faster when inserted into a
table where order is defined (CL) and where that order is
ever-increasing.
Since your primary key will by default automatically create a clustered index on that column you define, I would argue that yes, having a primary (clustering) key on your SQL Server table - even a log table - does have positive performance effects.
Primary keys can help performance - a primary key tells SQL Server something important about that field: that it's unique and NOT NULL. This can help create more efficient execution plans.
This MSDN reference on Improving SQL Server Performance is worth a read.
Quote:
When primary and foreign keys are defined as constraints in the
database schema, the server can use that information to create optimal
execution plans.
A primary key automatically creates an index on the primary key column. Adding indexes to your table will improve the performance of queries that can use them.
You don't need to set a primary key to speed things up, but you should add the indexes that your queries will benefit from.
It depends on your queries and table what indexes make sense and which don't.
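As a rough illustration of why the key matters for lookups, here is a minimal sketch with Python's built-in sqlite3 module. SQLite is an assumption standing in for SQL Server: its INTEGER PRIMARY KEY is the table's row id, which is only loosely comparable to a clustered index, and the two log tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical log tables: identical apart from the primary key.
    CREATE TABLE log_no_pk (entry_id INTEGER, user_name TEXT);
    CREATE TABLE log_with_pk (entry_id INTEGER PRIMARY KEY, user_name TEXT);
""")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN is a readable description of a plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without a key or index, finding one entry means reading every row.
no_pk_plan = plan("SELECT user_name FROM log_no_pk WHERE entry_id = 212")

# With the primary key, the engine can seek directly to the entry.
with_pk_plan = plan("SELECT user_name FROM log_with_pk WHERE entry_id = 212")

print(no_pk_plan)
print(with_pk_plan)
```

The same reasoning applies to joins from the activity tables back to the log table, which look up log entries by that identifier.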
To add to the above - generally if you frequently search on a field, it is a good candidate for an index. Also, searching on an integer ID is usually faster than a string, for example.
Indexes take more storage space, but can increase search performance on that field.
I have a table A and a table B. A has a foreign key to B on B's primary key, B_ID.
For some reason (I know there are legitimate reasons) it is not using an index when I join these two tables on the key.
Do I need to separately create an index on A.B_ID or should the existence of a foreign key provide that?
The foreign key constraint alone does not provide the index on Oracle - one must be created separately (and it should be).
Creating a foreign key does not automatically create an index on A.B_ID. So it would generally make sense from a query performance perspective to create a separate index on A.B_ID.
If you ever delete rows in B, you definitely want A.B_ID to be indexed. Otherwise, Oracle will have to do a full table scan on A every time you delete a row from B to make sure that there are no orphaned records (depending on the Oracle version, there may be additional locking implications as well, but those are diminished in more recent Oracle versions).
Just for more info: Oracle doesn't create an index automatically (as it does for unique constraints) because (a) it is not required to enforce the constraint, and (b) in some cases you don't need one.
Most of the time, however, you will want to create an index (in fact, in Oracle Apex there's a report of "unindexed foreign keys").
Whenever the application needs to be able to delete a row in the parent table, or update the PK value (which is rarer), the DML will suffer if no index exists, because it will have to lock the entire child table.
A case where I usually choose not to add an index is where the FK is to a "static data" table that defines the domain of a column (e.g. a table of status codes), where updates and deletes on the parent table are never done directly by the application. However, if adding an index on the column gives benefits to important queries in the application, then the index will still be a good idea.
SQL Server has never put indexes onto foreign key columns automatically - check out Kim Tripp's excellent blog post on the background and history of this urban myth.
It's usually a good idea to index your foreign key columns, however - so yes, I would recommend making sure each FK column is backed up by an index; not necessarily on that one column alone - maybe it can make sense to create an index on two or three columns with the FK column as the first one in there. Depends on your scenario and your data.
For performance reasons, an index should be created. It is used in delete operations on the primary table (to check that the record you are deleting is not referenced) and in joins, which usually involve a foreign key. Only a few tables might not need the index (I do not create one on log tables), but in those cases you probably don't need the foreign key constraint either.
However, some databases already create indexes on foreign keys automatically:
Jet Engine (Microsoft Access files)
Firebird
MySQL
SQL Server and Oracle, for sure, do not.
As with anything relating to performance, it depends on many factors and there is no silver bullet, e.g. in a very high activity environment the maintenance of an index may be unacceptable.
Most salient here would seem to be selectivity: if the values in the index would be highly duplicated then it may give better performance to drop the index (if possible) and allow a table scan.
UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints generate indexes that enforce or "back" the constraint (and are sometimes called backing indexes). PRIMARY KEY constraints generate unique indexes. FOREIGN KEY constraints generate non-unique indexes. UNIQUE constraints generate unique indexes if all the columns are non-nullable, and they generate non-unique indexes if one or more columns are nullable. Therefore, if a column or set of columns has a UNIQUE, PRIMARY KEY, or FOREIGN KEY constraint on it, you do not need to create an index on those columns for performance.
When do you use each MySQL index type?
PRIMARY - Primary key columns?
UNIQUE - Foreign keys?
INDEX - ??
For really large tables, do indexed columns improve performance?
Primary
The primary key is - as the name suggests - the main key of a table and should be a column which is commonly used to select the rows of this table. The primary key is always a unique key (unique identifier). The primary key is not limited to one column, for example in reference tables (many-to-many) it often makes sense to have a primary key including two or more columns.
Unique
A unique index makes sure your DBMS doesn't accept duplicate entries for this column. You ask 'Foreign keys?' NO! That would not be useful, since foreign keys are by definition prone to be duplicates (one-to-many, many-to-many).
Index
Additional indexes can be placed on columns which are often used for SELECTS (and JOINS) which is often the case for foreign keys. In many cases SELECT (and JOIN) queries will be faster, if the foreign keys are indexed.
Note however that - as SquareCog has clarified - Indexes get updated on any modifications to the data, so yes, adding more indexes can lead to degradation in INSERT/UPDATE performance. If indexes didn't get updated, you would get different information depending on whether the optimizer decided to run your query on an index or the raw table -- a highly undesirable situation.
This means you should carefully assess the usage of indices. One thing is sure on the basis of that: unused indices should be avoided or removed!
I'm not that familiar with MySQL, however I believe the following to be true across most database servers. An index is a balanced tree that lets the database locate given data without scanning the whole table. For example, say you have the following table.
CREATE TABLE person (
id SERIAL,
name VARCHAR(20),
dob VARCHAR(20)
);
If you created an index on the 'name' field, this would build a balanced tree over the values in the name column. Balanced tree data structures allow for faster searching of results (see http://www.solutionhacker.com/tag/balanced-tree/).
You should note, however, that indexing a column only allows you to search on the data as it is stored in the database. For example:
This would not be able to search on the index and would instead do a sequential scan on the table, calling UPPER() on the name column of each row in the table.
select *
from person
where UPPER(name) = 'BOB';
The following query would have the same effect, because the index is sorted starting with the first letter, so a leading wildcard gives it no prefix to seek on. Replacing the search term with 'B%' would, however, allow the index to be used.
select *
from person
where name like '%B';
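The difference between searching the stored value and searching a function of it can be checked with a minimal sketch like the one below, using Python's built-in sqlite3 module. SQLite is an assumption standing in for your server, the index name is made up, and id is declared as INTEGER PRIMARY KEY because SQLite has no SERIAL type. Only the equality case is shown, since how LIKE prefixes use an index depends on the engine and its collation settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name VARCHAR(20),
        dob VARCHAR(20)
    );
    CREATE INDEX ix_person_name ON person (name);  -- hypothetical index name
""")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN is a readable description of a plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Comparing the column as stored lets the engine seek in the index.
plain_plan = plan("SELECT * FROM person WHERE name = 'Bob'")

# Wrapping the column in a function hides the stored value from the index,
# so every row has to be read and UPPER() evaluated for each one.
upper_plan = plan("SELECT * FROM person WHERE UPPER(name) = 'BOB'")

print(plain_plan)
print(upper_plan)
```

Several engines let you index the expression itself, or use a case-insensitive collation, if case-insensitive lookups are what you need.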
Indexes will improve performance on larger tables. Normally, the primary key has an index based on the key, and that index is usually unique.
It is also useful to add indexes to fields that are searched on a lot, such as Street Name or Surname, as again this will improve performance. These indexes don't need to be unique.
Foreign Keys and Unique Keys are more for keeping your data integrity in order. So that you cannot have duplicate primary keys and so that your child tables don't have data for a parent that has been deleted.
PRIMARY defines a primary key, yes.
UNIQUE simply defines that the specified field has to be unique, it has nothing to do with foreign keys.
INDEX creates an index for the specified column and, yes, it improves performance for large tables, sorting and finding something in this column can be much faster if you use indexing.
The bigger the table, the bigger the gain from using an index. Do note that indexes make insert (and probably update) operations slower, so make sure you don't index too many fields.