Indexes In Postgres - sql

So i am trying to wrap my head around what indexes in postgresql, I have understood so far that indexes help for faster querying and that all primary keys have an index, I was wondering if all foreign keys(referring to a primary key of another table) should also have an index or is it just the attributes in a table that we are querying that need indexes?
Thanks.

Indexes should be based primarily on queries that are being used. If you are going to be filtering or aggregating or sorting by the foreign key, then it is sensible as an index.
The other use is using indexes to enforce unique constraints.
In other words, there is no rule that foreign keys should always be indexed. It is often a good idea, but that depends on the queries you want to run.

It is not mandatory to index the foreign key. But if you ever delete from the referenced table, or update it in a way that changes the primary key, then it will have to search for rows in the referencing table by the foreign key column to make sure it will be left with no violations (or to set them to null, or delete them, depending on what action was defined in the foreign key). Without an index on the foreign key column this will likely be slow.
Also, it is likely you will want to run queries like "Show me all the detail rows of this master row", and without a index that will also be slow.
If you never delete from the referenced table and never run that kind of query, then you probably do not need the index.

Related

Composite Index primary key vs Unique auto increment primary key

We have a transaction table of over 111m rows that has a clustered composite primary key of...
RevenueCentreID int
DateOfSale smalldatetime
SaleItemID int
SaleTypeID int
...in a SQL 2008 R2 database.
We are going to be truncating and refilling the table soon for an archiving project, so the opportunity to get the indexes right will be once the table has been truncated.
Would it be better to keep the composite primary key or should we move to a unique auto increment primary key?
Most searches on the table are done using the DateOfSale and RevenueCentreID columns. We also often join to the SaleItemID column. We hardly ever use the SaleType column, in fact it is only included in the primary key for uniqueness. We dont care about how long it takes to insert & delete new sales figures(done over night) but rather the speed of returning reports.
A surrogate key serves no purpose here. I suggest a clustered primary key on the columns as listed, and an index on SaleItemID.
In have learned you want and need both a natural key and a surrogate key.
The natural key keeps the business keys unique and is prefect for indexing. where the surrogate key will help with queries and development.
So in your case a surrogate auto incrementing key is good in the fact it will help keep all the rows of data in tact. And a natural key of DateOfSale, RevenueID and maybe ClientID would make a great way of ensuring no duplicate records are being stored and speed up querying because you can index the natural key.
If you don't care about the speed of inserts and deletions, then you probably want multiple indexes which target the queries precisely.
You could create an auto increment primary key as you suggest, but also create indexes as required to cover the reporting queries. Create a unique constraint on the columns you currently have in the key to enforce uniqueness.
Index tuning wizard will help with defining the optimum set of indexes, but it's better to create your own.
Rule of thumb - you can define columns to index, and also "include" columns.
If your report has an OrderBy or a Where clause on a column then you need the index to be defined against these. Any other fields returned in the select should be included columns.

Do I need to create indexes on foreign keys on Oracle?

I have a table A and a table B. A has a foreign key to B on B's primary key, B_ID.
For some reason (I know there are legitimate reasons) it is not using an index when I join these two tables on the key.
Do I need to separately create an index on A.B_ID or should the existence of a foreign key provide that?
The foreign key constraint alone does not provide the index on Oracle - one must (and should) be created.
Creating a foreign key does not automatically create an index on A.B_ID. So it would generally make sense from a query performance perspective to create a separate index on A.B_ID.
If you ever delete rows in B, you definitely want A.B_ID to be indexed. Otherwise, Oracle will have to do a full table scan on A every time you delete a row from B to make sure that there are no orphaned records (depending on the Oracle version, there may be additional locking implications as well, but those are diminished in more recent Oracle versions).
Just for more info: Oracle doesn't create an index automatically (as it does for unique constraints) because (a) it is not required to enforce the constraint, and (b) in some cases you don't need one.
Most of the time, however, you will want to create an index (in fact, in Oracle Apex there's a report of "unindexed foreign keys").
Whenever the application needs to be able to delete a row in the parent table, or update the PK value (which is rarer), the DML will suffer if no index exists, because it will have to lock the entire child table.
A case where I usually choose not to add an index is where the FK is to a "static data" table that defines the domain of a column (e.g. a table of status codes), where updates and deletes on the parent table are never done directly by the application. However, if adding an index on the column gives benefits to important queries in the application, then the index will still be a good idea.
SQL Server has never put indexes onto foreign key columns automatically - check out Kim Tripp's excellent blog post on the background and history of this urban myth.
It's usually a good idea to index your foreign key columns, however - so yes, I would recommend making sure each FK column is backed up by an index; not necessarily on that one column alone - maybe it can make sense to create an index on two or three columns with the FK column as the first one in there. Depends on your scenario and your data.
For performance reasons an index should be created. Is used in delete operations on primary table (to check that the record you are deleting is not used) and in joins that usually a foreign key is involved. Only few tables (I do not create them in logs) could be that do not need the index but probably, in this cases probably you don't need the foreign key constraint as well.
BUT
There are some databases that already automatically create indexes on foreign Keys.
Jet Engine (Microsoft Access Files)
Firebird
MySQL
FOR SURE
SQL Server
Oracle
DOES NOT
As with anything relating to performance, it depends on many factors and there is no silve bullet e.g. in a very high activilty environment the maintainance of an index may be unacceptable.
Most salient here would seem to be selectivity: if the values in the index would be highly duplicated then it may give better performance to drop the index (if possible) and allow a table scan.
UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints generate indexes that enforce or "back" the constraint (and are sometimes called backing indexes). PRIMARY KEY constraints generate unique indexes. FOREIGN KEY constraints generate non-unique indexes. UNIQUE constraints generate unique indexes if all the columns are non-nullable, and they generate non-unique indexes if one or more columns are nullable. Therefore, if a column or set of columns has a UNIQUE, PRIMARY KEY, or FOREIGN KEY constraint on it, you do not need to create an index on those columns for performance.

Many-to-many link table design : two foreign keys only or an additional primary key?

this is undoubtedly a newbie question, but I haven't been able
to find a satisfactory answer.
When creating a link table for many-to-many relationships, is it better to
create a unique id or only use two foreign keys of the respective tables (compound key?).
Looking at different diagrams of the Northwind database for example, I've come across
both 'versions'.
That is: a OrderDetails table with fkProductID and fkOrderID and also versions
with an added OrderDetailsID.
What's the difference? (does it also depend on the DB engine?).
What are the SQL (or Linq) advantages/disadvantages?
Thanks in advance for an explanation.
Tom
ORMs have been mandating the use of non-composite primary keys to simplify queries...
But it Makes Queries Easier...
At first glance, it makes deleting or updating a specific order/etc easier - until you realize that you need to know the applicable id value first. If you have to search for that id value based on an orders specifics then you'd have been better off using the criteria directly in the first place.
But Composite keys are Complex...
In this example, a primary key constraint will ensure that the two columns--fkProductID and fkOrderID--will be unique and indexed (most DBs these days automatically index primary keys if the clustered index doesn't already exist) using the best index possible for the table.
The lone primary key approach means the OrderDetailsID is indexed with the best index for the table (SQL Server & MySQL call them clustered indexes, to Oracle they're all just indexes), and requires an additional composite unique constraint/index. Some databases might require additional indexing beyond the unique constraint... So this makes the data model more involved/complex, and for no benefit:
Some databases, like MySQL, put a limit on the amount of space you can use for indexes.
the primary key is getting the most ideal index yet the value has no relevance to the data in the table, so making use of the index related to the primary key will be seldom if ever.
Conclusion
I don't see the benefit in a single column primary key over a composite primary key. More work for additional overhead with no net benefit...
I'm used to use PrimaryKey column. It's because the primary key uniquely identify the record.
If you have a cascade-update settings on table relations, the values of foreign keys can be changed between "SELECT" and "UPDATE/DELETE" commands sent from application.

When should I use primary key or index?

When should I use a primary key or an index?
What are their differences and which is the best?
Basically, a primary key is (at the implementation level) a special kind of index. Specifically:
A table can have only one primary key, and with very few exceptions, every table should have one.
A primary key is implicitly UNIQUE - you cannot have more than one row with the same primary key, since its purpose is to uniquely identify rows.
A primary key can never be NULL, so the row(s) it consists of must be NOT NULL
A table can have multiple indexes, and indexes are not necessarily UNIQUE. Indexes exist for two reasons:
To enforce a uniquness constraint (these can be created implicitly when you declare a column UNIQUE)
To improve performance. Comparisons for equality or "greater/smaller than" in WHERE clauses, as well as JOINs, are much faster on columns that have an index. But note that each index decreases update/insert/delete performance, so you should only have them where they're actually needed.
Differences
A table can only have one primary key, but several indexes.
A primary key is unique, whereas an index does not have to be unique. Therefore, the value of the primary key identifies a record in a table, the value of the index not necessarily.
Primary keys usually are automatically indexed - if you create a primary key, no need to create an index on the same column(s).
When to use what
Each table should have a primary key. Define a primary key that is guaranteed to uniquely identify each record.
If there are other columns you often use in joins or in where conditions, an index may speed up your queries. However, indexes have an overhead when creating and deleting records - something to keep in mind if you do huge amounts of inserts and deletes.
Which is best?
None really - each one has its purpose. And it's not that you really can choose the one or the other.
I recommend to always ask yourself first what the primary key of a table is and to define it.
Add indexes by your personal experience, or if performance is declining. Measure the difference, and if you work with SQL Server learn how to read execution plans.
This might help Back to the Basics: Difference between Primary Key and Unique Index
The differences between the two are:
Column(s) that make the Primary Key of a table cannot be NULL since by definition, the Primary Key cannot be NULL since it helps uniquely identify the record in the table. The column(s) that make up the unique index can be nullable. A note worth mentioning over here is that different RDBMS treat this differently –> while SQL Server and DB2 do not allow more than one NULL value in a unique index column, Oracle allows multiple NULL values. That is one of the things to look out for when designing/developing/porting applications across RDBMS.
There can be only one Primary Key defined on the table where as you can have many unique indexes defined on the table (if needed).
Also, in the case of SQL Server, if you go with the default options then a Primary Key is created as a clustered index while the unique index (constraint) is created as a non-clustered index. This is just the default behavior though and can be changed at creation time, if needed.
Keys and indexes are quite different concepts that achieve different things. A key is a logical constraint which requires tuples to be unique. An index is a performance optimisation feature of a database and is therefore a physical rather than a logical feature of the database.
The distinction between the two is sometimes blurred because often a similar or identical syntax is used for specifying constraints and indexes. Many DBMSs will create an index by default when key constraints are created. The potential for confusion between key and index is unfortunate because separating logical and physical concerns is a highly important aspect of data management.
As regards "primary" keys. They are not a "special" type of key. A primary key is just any one candidate key of a table. There are at least two ways to create candidate keys in most SQL DBMSs and that is either using the PRIMARY KEY constraint or using a UNIQUE constraint on NOT NULL columns. It is a very widely observed convention that every SQL table has a PRIMARY KEY constraint on it. Using a PRIMARY KEY constraint is conventional wisdom and a perfectly reasonable thing to do but it generally makes no practical or logical difference because most DBMSs treat all keys as equal. Certainly every table ought to enforce at least one candidate key but whether those key(s) are enforced by PRIMARY KEY or UNIQUE constraints doesn't usually matter. In principle it is candidate keys that are important, not "primary" keys.
The primary key is by definition unique: it identifies each individual row. You always want a primary key on your table, since it's the only way to identify rows.
An index is basically a dictionary for a field or set of fields. When you ask the database to find the record where some field is equal to some specific value, it can look in the dictionary (index) to find the right rows. This is very fast, because just like a dictionary, the entries are sorted in the index allowing for a binary search. Without the index, the database has to read each row in the table and check the value.
You generally want to add an index to each column you need to filter on. If you search on a specific combination of columns, you can create a single index containing all of those columns. If you do so, the same index can be used to search for any prefix of the list of columns in your index. Put simply (if a bit inaccurately), the dictionary holds entries consisting of the concatenation of the values used in the columns, in the specified order, so the database can look for entries which start with a specific value and still use efficient binary search for this.
For example, if you have an index on the columns (A, B, C), this index can be used even if you only filter on A, because that is the first column in the index. Similarly, it can be used if you filter on both A and B. It cannot, however, be used if you only filter on B or C, because they are not a prefix in the list of columns - you need another index to accomodate that.
A primary key also serves as an index, so you don't need to add an index convering the same columns as your primary key.
Every table should have a PRIMARY KEY.
Many types of queries are sped up by the judicious choice of an INDEX. It may be that the best index is the primary key. My point is that the query is the main factor in whether to use the PK for its index.

SQL: what exactly do Primary Keys and Indexes do?

I've recently started developing my first serious application which uses a SQL database, and I'm using phpMyAdmin to set up the tables. There are a couple optional "features" I can give various columns, and I'm not entirely sure what they do:
Primary Key
Index
I know what a PK is for and how to use it, but I guess my question with regards to that is why does one need one - how is it different from merely setting a column to "Unique", other than the fact that you can only have one PK? Is it just to let the programmer know that this value uniquely identifies the record? Or does it have some special properties too?
I have no idea what "Index" does - in fact, the only times I've ever seen it in use are (1) that my primary keys seem to be indexed, and (2) I heard that indexing is somehow related to performance; that you want indexed columns, but not too many. How does one decide which columns to index, and what exactly does it do?
edit: should one index colums one is likely to want to ORDER BY?
Thanks a lot,
Mala
Primary key is usually used to create a numerical 'id' for your records, and this id column is automatically incremented.
For example, if you have a books table with an id field, where the id is the primary key and is also set to auto_increment (Under 'Extra in phpmyadmin), then when you first add a book to the table, the id for that will become 1'. The next book's id would automatically be '2', and so on. Normally, every table should have at least one primary key to help identifying and finding records easily.
Indexes are used when you need to retrieve certain information from a table regularly. For example, if you have a users table, and you will need to access the email column a lot, then you can add an index on email, and this will cause queries accessing the email to be faster.
However there are also downsides for adding unnecessary indexes, so add this only on the columns that really do need to be accessed more than the others. For example, UPDATE, DELETE and INSERT queries will be a little slower the more indexes you have, as MySQL needs to store extra information for each indexed column. More info can be found at this page.
Edit: Yes, columns that need to be used in ORDER BY a lot should have indexes, as well as those used in WHERE.
The primary key is basically a unique, indexed column that acts as the "official" ID of rows in that table. Most importantly, it is generally used for foreign key relationships, i.e. if another table refers to a row in the first, it will contain a copy of that row's primary key.
Note that it's possible to have a composite primary key, i.e. one that consists of more than one column.
Indexes improve lookup times. They're usually tree-based, so that looking up a certain row via an index takes O(log(n)) time rather than scanning through the full table.
Generally, any column in a large table that is frequently used in WHERE, ORDER BY or (especially) JOIN clauses should have an index. Since the index needs to be updated for evey INSERT, UPDATE or DELETE, it slows down those operations. If you have few writes and lots of reads, then index to your hear's content. If you have both lots of writes and lots of queries that would require indexes on many columns, then you have a big problem.
The difference between a primary key and a unique key is best explained through an example.
We have a table of users:
USER_ID number
NAME varchar(30)
EMAIL varchar(50)
In that table the USER_ID is the primary key. The NAME is not unique - there are a lot of John Smiths and Muhammed Khans in the world. The EMAIL is necessarily unique, otherwise the worldwide email system wouldn't work. So we put a unique constraint on EMAIL.
Why then do we need a separate primary key? Three reasons:
the numeric key is more efficient
when used in foreign key
relationships as it takes less space
the email can change (for example
swapping provider) but the user is
still the same; rippling a change of
a primary key value throughout a schema
is always a nightmare
it is always a bad idea to use
sensitive or private information as
a foreign key
In the relational model, any column or set of columns that is guaranteed to be both present and unique in the table can be called a candidate key to the table. "Present" means "NOT NULL". It's common practice in database design to designate one of the candidate keys as the primary key, and to use references to the primary key to refer to the entire row, or to the subject matter item that the row describes.
In SQL, a PRIMARY KEY constraint amounts to a NOT NULL constraint for each primary key column, and a UNIQUE constraint for all the primary key columns taken together. In practice many primary keys turn out to be single columns.
For most DBMS products, a PRIMARY KEY constraint will also result in an index being built on the primary key columns automatically. This speeds up the systems checking activity when new entries are made for the primary key, to make sure the new value doesn't duplicate an existing value. It also speeds up lookups based on the primary key value and joins between the primary key and a foreign key that references it. How much speed up occurs depends on how the query optimizer works.
Originally, relational database designers looked for natural keys in the data as given. In recent years, the tendency has been to always create a column called ID, an integer as the first column and the primary key of every table. The autogenerate feature of the DBMS is used to ensure that this key will be unique. This tendency is documented in the "Oslo design standards". It isn't necessarily relational design, but it serves some immediate needs of the people who follow it. I do not recommend this practice, but I recognize that it is the prevalent practice.
An index is a data structure that allows for rapid access to a few rows in a table, based on a description of the columns of the table that are indexed. The index consists of copies of certain table columns, called index keys, interspersed with pointers to the table rows. The pointers are generally hidden from the DBMS users. Indexes work in tandem with the query optimizer. The user specifies in SQL what data is being sought, and the optimizer comes up with index strategies and other strategies for translating what is being sought into a stategy for finding it. There is some kind of organizing principle, such as sorting or hashing, that enables an index to be used for fast lookups, and certain other uses. This is all internal to the DBMS, once the database builder has created the index or declared the primary key.
Indexes can be built that have nothing to do with the primary key. A primary key can exist without an index, although this is generally a very bad idea.