SQL Relationships and indexes

SQL Relationships and indexes - sql

I have an MS SQL server application where I have defined my relationships and primary keys.
However do I need to further define indexes on relationship fields which are sometimes not used in joins and just as part of a where clause?
I am working on the assumption that defining a relationship creates an index, which the sql engine can reuse.

Some very thick books have been written on this subject!
Here are some ruiles of thumb:-
Dont bother indexing (apart from PK) any table with < 1000 rows.
Otherwise index all your FKs.
Examine your SQL and look for the where clauses that will most reduce your result sets and index that columun.
eg. given:
SELECT OWNER FROM CARS WHERE COLOUR = 'RED' AND MANUFACTURER = "BMW" AND ECAP = "2.0";
You may have 5000 red cars out of 20,000 so indexing this wont help much.
However you may only have 100 BMWs so indexing MANUFACURER will immediatly reduce you result set to 100 and you can eliminate the the blue and white cars by simply scanning through the hundred rows.
Generally the dbms will pick one or two of the indexes available based on cardinality so it pays to second guess and define only those indexes that are likely to be used.

No indexes will be automatically created on foreign keys constraint. But unique and primary key constraints will create theirs.
Creating indexes on the queries you use, be it on joins or on the WHERE clause is the way to go.

Like everything in the programming world, it depends. You obviously want to create indexes and relationships to preserve normalization and speed up database lookups. But you also want to balance that by not having too many indexes that it will take SQL Server more time to build every index. Also the more indexes you have the more fragmentation that can occur in your database.
So what I do is put in the obvious indexes and relationships and then optimize after the application is build on the possible slow queries.

Defining a relationship does not create the index.
Usually in places where you have a where clause against some field you want an index but be careful not to just throw indexes out all over the place because they can and do have an effect on insert/update performance.

I would start by making sure that every PK and FK has an index.
Further to that, I have found that using the Index Tuning Wizard in SSMS provides excellent recommendations when you feed it the right information.

Database Considerations
When you design an index, consider the following database guidelines:
Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes.
Avoid over-indexing heavily updated tables and keep indexes narrow,
that is, with as few columns as possible.
Use many indexes to improve query performance on tables with low
update requirements, but large volumes of data. Large numbers of
indexes can help the performance of queries that do not modify data,
such as SELECT statements, because the query optimizer has more
indexes to choose from to determine the fastest access method.
Indexing small tables may not be optimal because it can take the
query optimizer longer to traverse the index searching for data than
to perform a simple table scan. Therefore, indexes on small tables
might never be used, but must still be maintained as data in the
table changes.
Indexes on views can provide significant performance gains when the
view contains aggregations, table joins, or a combination of
aggregations and joins. The view does not have to be explicitly
referenced in the query for the query optimizer to use it.
--Stay_Safe--

Indexes aren't very expensive, and speed up queries more than you realize. I would recommend adding indexes to all key and non-key fields that are often used in queries. You can even use the execution plan to recommend additional indexes that would speed up your queries.
The only point where indexes aren't in your favour is when you're doing large amounts of data inserts. Each insert requires each index in a table to be updated along with the table's data.
You can opt to wait until the application is running and you have some known queries against the database that you want to improve, or you could do it now, if you have a good idea.

Related

oracle - two unrelated categories on row - how to index?

I have an OLTP application with three tables
Item Table - ItemId, CategoryId, AgeGroupId, ... 100K rows.
CategoryTable - CategoryId, ... (only 5-10 rows)
AgeGroupTable - AgeGroupId, ... (only 4-5 rows)
What is appropriate index for CategoryId and AgeGroupId for Item table? It would be nice to query items by Category or Agegroup or both of them!
I was thinking that a bitmap index might work due to low cardinality, but I don't know how exactly they work with multiple bitmap indexes per table? How would horizontal partitioning help, if at all?

Since this is an OLTP application, you almost certainly don't want to use a bitmap index. Bitmap indexes tend not to work well with OLTP applications. They tend to grow in size very rapidly when you do a lot of single-row operations on the data (though this effect is lessened in more recent versions). But more importantly, the locking impact tends to radically reduce the scalability of an application. If you had a bitmap index on CategoryID, for example, updating a single row's CategoryID would effectively require locking every row in the table that has a CategoryID of either the source or target value.
It sounds like, at most, you need composite indexes on (AgeGroupID, CategoryID) and (CategoryID, AgeGroupID). Potentially, you could use just the composite index on (AgeGroupID, CategoryID) and let Oracle use an index skip scan if only CategoryID is specified. It depends on the trade-offs you want to make-- multiple indexes will make queries just on CategoryID more efficient at the expense of additional index maintenance on DML operations and additional disk space usage.
Are you licensed to use partitioning? That is an extra cost option on top of the enterprise edition license. Potentially, I suppose, you could partition the table. A table with just 100,000 rows is pretty small to consider partitioning, though. And whatever you partition by would tend to make queries that don't use the partition key less efficient. That might make sense if you know that queries that specify AgeGroupID are much more common than CategoryID (or vice versa) but that doesn't sound like what you are describing.

This started off as a comment, but it's getting too long.
What is appropriate index for CategoryId and AgeGroupId?
In what context? Both data domains appear as primary and foreign keys in your example schema. However this beside the point.
You should only add indices where they are actually going to add value, and with less than 10 rows in each table, unless the data is very skewed, there's no benefit to indexing either domain at all. Inserts/updates will be slower and accessing the data via such an index will be slower than performing a full table scan on each of the 3 tables.
There may implicit relationships between other attributes in the item table whereby it makes sense to add the domains to other indexes (but not at the front) but without knowing a lot more about the data and the queries being run against it, I'd ignore this for now.

It really depends on what your queries look like. If you are always going to be filtering on or joining on only one column at a time, then bitmap indexes will work fine. If you will be filtering or joining based on both columns, a composite index can work as well.
In my experience, the best way to know for sure is to test both options. I have had success putting multiple bitmap indexes on a table, as well as using composite indexes. With only 100K rows in a table, you should be able to create and drop indexes very quickly. Then you can test your most common queries on with different sets of indexes.

Index all columns

Knowing that an indexed column leads to a better performance, is it worthy to indexes all columns in all tables of the database? What are the advantages/disadvantages of such approach?
If it is worthy, is there a way to auto create indexes in SQL Server? My application dynamically adds tables and columns (depending on the user configuration) and I would like to have them auto indexed.

It is difficult to imagine real-world scenarios where indexing every column would be useful, for the reasons mentioned above. The type of scenario would require a bunch of different queries, all accessing exactly one column of the table. Each query could be accessing a different column.
The other answers don't address the issues during the select side of the query. Obviously, maintaining indexes is an issue, but if you are creating the table/s once and then reading many, many times, the overhead of updates/inserts/deletes is not a consideration.
An index contains the original data along with points to records/pages where the data resides. The structure of an index makes it fast to do things like: find a single value, retrieve values in order, count the number of distinct values, and find the minimum and maximum values.
An index does not only take space up on disk. More importantly, it occupies memory. And, memory contention is often the factor that determines query performance. In general, building an index on every column will occupy more space than then original data. (One exception would be a column that is relative wide and has relatively few values.)
In addition, to satisfy many queries you may need one or more indexes plus the original data. Your page cache gets rather filled with data, which can increase the number of cache misses, which in turn incurs more overhead.
I wonder if your question is really a sign that you have not modelled your data structures adequately. There are few cases where you want users to build ad hoc permanent tables. More typically, their data would be stored in a pre-defined format, which you can optimize for the access requirements.

No because you have to take in consideration that every time you add or update a record, you have to recalculate your indexes and having indexes on all columns would take a lot of time and lead to bad performance.
So databases like data warehouses where there use only select queries is a good idea but on normal database it's a bad idea.
Also, it's not because you are using a column in a where clause that you have to add an index on it.
Try to find a column where the record will be almost all unique like a primary key and that you don't edit often.
A bad idea would be to index the sex of a person cause there are only 2 possible values and the result of the index would only split the data then it will search in almost every records.

No, you should not index all of your columns, and there's several reasons for this:
There is a cost to maintain each index during an insert, update or delete statement, that will cause each of those transactions to take longer.
It will increase the storage required since each index takes up space on disk.
If the column values are not disperse, the index will not be used/ignored (ex: A gender flag).
Composite indexes (indexes with more than one column) can greatly benefit performance for frequently run WHERE, GROUP BY, ORDER BY or JOIN clauses, and multiple single indexes cannot be combined.
You are much better off using Explain plans and data access and adding indexes when necessary (and only when necessary, IMHO), rather than creating them all up front.

No, there is overhead in maintaining the indexes, so indexing all columns would slow down all of your insert, update and delete operations. You should index the columns that you are frequently referencing in WHERE clauses, and you will see a benefit.

Indexes take up space. And they take up time to create, rebuild, maintain, etc. So there's not a guaranteed return on performance for indexing just any old column. You should index the columns that give the performance for the operations you'll use. Indexes help reads, so if you're mostly reading, index columns that will be searched on, sorted by, or joined to other tables relationally. Otherwise, it's more expensive than what benefit you may see.

Every index requires additional CPU time and disk I/O overhead during
inserts and deletions.
Indies on non-primary keys might have to be hanged on updates, although an index on the primary key might not (this is beause updates typially do not modify the primary-key attributes).
Each extra index requires additional storage spae.
For queries whih involve onditions on several searh keys, e ieny
might not be bad even if only some of the keys have indies on them.
Therefore, database performane is improved less by adding indies when
many indies already exist.

Database indexes: A good thing, a bad thing, or a waste of time?

Adding indexes is often suggested here as a remedy for performance problems.
(I'm talking about reading & querying ONLY, we all know indexes can make writing slower).
I have tried this remedy many times, over many years, both on DB2 and MSSQL, and the result were invariably disappointing.
My finding has been that no matter how 'obvious' it was that an index would make things better, it turned out that the query optimiser was smarter, and my cleverly-chosen index almost always made things worse.
I should point out that my experiences relate mostly to small tables (<100'000 rows).
Can anyone provide some down-to-earth guidelines on choices for indexing?
The correct answer would be a list of recommendations something like:
Never/always index a table with less than/more than NNNN records
Never/always consider indexes on multi-field keys
Never/always use clustered indexes
Never/always use more than NNN indexes on a single table
Never/always add an index when [some magic condition I'm dying to learn about]
Ideally, the answer will give some instructive examples.

Indexes are kind of like chemotherapy...too much and it kills you...too little and you die...do it the wrong way and you die. You gotta know just how much, how often, and what kind to make it not kill you.
Your hardware, platform, environment, load all play a role. So to answer your questions..
Yes, possibly sometimes.

As a rule of thumb, primary keys and foreign keys need to be indexed. Usually primary key are indexed just by defining them as such, but FKs are not in every database (they definitely are not in SQL Server, I can't really speak for other dbs). You will be using these in joins, so it is generally critical to performance to define these.
Now if you have fields you often use in where clauses, they can benefit from indexes as well providing several things:
First the field must have a range of
values. A bit field or a field with
only 2 or 3 values will almost never
use an index.
Second the queries you write must be sargable. That is they must be designed to use indexes. I suspect if you never get performance improvements from what look like likely candidates for indexes, then you probably have queries that are not sargable. For instance take "WHERE Name like '%Smith'" as a where clause. Without knowing the first characters, the optimizer can't use the index.
Small tables rarely benefit much from indexes. If the optimizer can hold the whole thing in memory, then it is often faster to do so. If you were working with multimillion record tables, you would see that indexes are critical.
Indexing can be very complex and if you are interested in the subject, I suggest you get a good book on performance tuning your particular database and read in depth about them.

An index that's never used is a waste of disk space, as well as adding to the insert/update/delete time. It's probably best to define the clustering index first, then define
additional indexes as you find yourself writing WHERE clauses.
One common index mistake I see is people wondering why a select on col2 (or col3) takes so long when the index is defined as col1 ASC, col2 ASC, col3 ASC. When you have a multiple column index, your WHERE clause must use the first column in the index, or the first and second column in the index, and so forth.
If you need to access the data by col2, then you need an additional index that's defined as col2 ASC.
With small domain tables, it's sometimes faster to do a table scan than it is to read rows from the table using an index. This depends on the speed of your database machine and the speed of the network.

You need indexes. Only with indexes you can access data fast enough.
To make it as short as possible:
add indexes for columns you are frequently filtering (or grouping) for. (eg. a state or name)
like and sql functions could make the DBMS not use indexes.
add indexes only on columns which have many different values (eg. no boolean fields)
It is common to add indexes to foreign keys, but it is not always needed.
don't add indexes in very short tables
never add indexes when you don't know how they should enhance performance.
Finally: look into execution plans to decide how to optimize queries.
You'll add indexes just for a single, critical query. In this case, you'll add exactly the indexes that are needed in the query in question (multi-column indexes).

Basically when DB is collecting data and it's alive indexes have to go and evolve with that flow. There maybe really good index on table but after growing beyond of XXX records the same index in the same table is useless and in that case it should be refactored.
To have optimized and fast DB the only way is to monitor it all the time and refactor it over the time as records come in.
Real life example i got some time ago was super fast query restricted by some time range (created_at between A and B) and super slow query where time range was different. Same query, same database, same application and only one difference on time range.

Always use clustered indexes.
In fact you can't help but using them. The data in a table will be laid out on disk in some particular order anyway, it can't be save as a pile or something. You have the chance of specifying how exactly this data will be laid out. Why burn it?
When you have a table which gets new records appended and you observe that some value in those records always grow (like StackOverflow question number), make a clustered index out of it. Then the new data will not be inserted in the middle but will basically be appended to a file on disk which is a relatively cheap operation.

If a table is expected to be the target of a join then it is best to have a clustered index on that table so that the joins can be performed sequentially through the data pages. The columns in the clustered index will (on some DB systems) be included in all of the other indexes on that table, since those are the values that the indexes will use to reference the table data. To keep the other indexes from getting too large, the columns in the clustered index should be as narrow as possible, so it is best to use only numeric—rather than character—data types in the clustered index. In general, fewer columns are better than more columns, but notice that three int columns (12 bytes per row) are much better than one nvarchar(32) column (potentially 64 bytes per row).
If the clustered index is narrow, then a few additional indexes should not negatively impact performance very much even on very large tables.

Seems you are confusing two concepts here.
Adding indices *generally can only make a read query faster, very very rarely (almost never) slower. Adding an index never forces the query optimizer to use it. It will only use it if it thinks it can benefit from it, and it is generally very smart about those decisions.
For inserts/updates, of course, every index hurts performance a bit more... But at the other end of the spectrum, for, say a read only database, (like a USPS address database which is distributed monthly), in operational use there would ne no inserts/updates, so the only negative impact of additional indices is the disk space they take up.
This is entirely different that specifying that the query optimizer USE an index, in effect overriding what it would do on it's own... That can potentially make a query slower.
EDIT: Edited to eliminate opportunity for misinterpretation by overly literal readers.

mysql too many indexes?

I am spending some time optimizing our current database.
I am looking at indexes specifically.
There are a few questions:
Is there such a thing as too many indexes?
What will indexes speed up?
What will indexes slow down?
When is it a good idea to add an index?
When is it a bad idea to add an index?
Pro's and Con's of multiple indexes vs multi-column indexes?

What will indexes speed up?
Data retrieval -- SELECT statements.
What will indexes slow down?
Data manipulation -- INSERT, UPDATE, DELETE statements.
When is it a good idea to add an index?
If you feel you want to get better data retrieval performance.
When is it a bad idea to add an index?
On tables that will see heavy data manipulation -- insertion, updating...
Pro's and Con's of multiple indexes vs multi-column indexes?
Queries need to address the order of columns when dealing with a covering index (an index on more than one column), from left to right in index column definition. The column order in the statement doesn't matter, only that of columns 1, 2 and 3 - a statement needs have a reference to column 1 before the index can be used. If there's only a reference to column 2 or 3, the covering index for 1/2/3 could not be used.
In MySQL, only one index can be used per SELECT/statement in the query (subqueries/etc are seen as a separate statement). And there's a limit to the amount of space per table that MySQL allows. Additionally, running a function on an indexed column renders the index useless - IE:
WHERE DATE(datetime_column) = ...

I disagree with some of the answers on this question.
Is there such a thing as too many indexes?
Of course. Don't create indexes that aren't used by any of your queries. Don't create redundant indexes. Use tools like pt-duplicate-key-checker and pt-index-usage to help you discover the indexes you don't need.
What will indexes speed up?
Search conditions in the WHERE clause.
Join conditions.
Some cases of ORDER BY.
Some cases of GROUP BY.
UNIQUE constraints.
FOREIGN KEY constraints.
FULLTEXT search.
Other answers have advised that INSERT/UPDATE/DELETE are slower the more indexes you have. That's true, but consider that many uses of UPDATE and DELETE also have WHERE clauses and in MySQL, UPDATE and DELETE support JOINs too. Indexes may benefit these queries more than making up for the overhead of updating indexes.
Also, InnoDB locks rows affected by an UPDATE or DELETE. They call this row-level locking, but it's really index-level locking. If there's no index to narrow down the search, InnoDB has to lock a lot more rows than the specific row you're changing. It can even lock all the rows in the table. These locks block changes made by other clients, even if they don't logically conflict.
When is it a good idea to add an index?
If you know you need to run a query that would benefit from an index in one of the above cases.
When is it a bad idea to add an index?
If the index is a left-prefix of another existing index, or the index doesn't help any of the queries you need to run.
Pro's and Con's of multiple indexes vs multi-column indexes?
In some cases, MySQL can perform index-merge optimization, and either union or intersect the results from independent index searches. But it gives better performance to define a single index so the index-merge doesn't need to be done.
For one of my consulting customers, I defined a multi-column index on a many-to-many table where there was no index, and improved their join query by a factor of 94 million!
Designing the right indexes is a complex process, based on the queries you need to optimize. You shouldn't make broad rules like "index everything" or "index nothing to avoid slowing down updates."
See also my presentation How to Design Indexes, Really.

Is there such a thing as too many indexes?
Indexes should be informed by the problem at hand: the tables, the queries your application will run, etc.
What will indexes speed up?
SELECTs.
What will indexes slow down?
INSERTs will be slower, because you have to update the index.
When is it a good idea to add an index?
When your application needs another WHERE clause.
When is it a bad idea to add an index?
When you don't need it to query or enforce uniqueness constraints.
Pros and Cons of multiple indexes vs multi-column indexes?
I don't understand the question. If you have a uniqueness constraint that includes multiple columns, by all means model it as such.

Is there such a thing as too many indexes?
Yes. Don't go out looking to create indexes, create them as necessary.
What will indexes speed up?
Any queries against the indexes table/view.
What will indexes slow down?
Any INSERT statements against the indexed table will be slowed down, because each new record will need to be indexed.
When is it a good idea to add an index?
When a query is not running at an acceptable speed. You may be filtering on records that are not part of the clustered PK, in which case you should add indexes based on the filters you are searching upon (if the performance deems fit).
When is it a bad idea to add an index?
When you do it for the sake of it - i.e over-optimization.
Pro's and Con's of multiple indexes vs multi-column indexes?
Depends on the queries you are trying to improve.

Is there such a thing as too many indexes?
Yup, like all things, too many indexes will slow down data manipulation.
When is it a good idea to add an index?
A good idea to add an index is when your queries are too slow (i.e. you have too many joins in your queries). You should use this optimization only after you built a solid model, to tweak the performance.

When should you consider indexing your sql tables?

How many records should there be before I consider indexing my sql tables?

There's no good reason to forego obvious indexes (FKs, etc.) when you're creating the table. It will never noticeably affect performance to have unnecessary indexes on tiny tables, and it's good to take a first cut when your mind is into schema design. Also, some indexes serve to prevent duplicates, which can be useful regardless of table size.
I guess the proper answer to your question is that the number of records in the table should have nothing to do with when to create indexes.

I would create the index entries when I create my table. If you decide to create indices after the table has grown to 100, 1000, 100000 entries it can just take alot of time and perhaps make your database unavailable while you are doing it.
Think about the table first, create the indices you think you'll need, and then move on.
In some cases you will discover that you should have indexed a column, if thats the case, fix it when you discover it.
Creating an index on a searched field is not a pre-optimization, its just what should be done.

When the query time is unacceptable. Better yet, create a few indexes now that are likely to be useful, and run an EXPLAIN or EXPLAIN ANALYZE on your queries once your database is populated by representative data. If the indexes aren't helping, drop them. If there are slow queries that could benefit from more or different indexes, change the indexes.
You are not going to be locked in to an initial choice of indexes. Experiment, and make sure you measure performance!

In general I agree with the previous advice.
Always declare the referential integrity for the tables (Primary Key, Foreign Keys), column constraints (not null, check). Saves you from nightmares when apps put bad data into the tables (even in development).
I'd consider adding indexes for the common access columns (columns in your where clauses which are used in =, <> tests), as well.
Most of the modern RDBMS implementations are quite good at keeping you indexes up to date, without hitting your performance. So, the cost of having indexes is minimal.
Also, most RDBMS's have query plan evaluators which look at the relative costs going to the data rows via the index, or using some sort of table scan. So, again the performance hits are minimal.

Two.
I'm serious. If there are two rows now, and there will always be two rows, the cost of indexing is almost zero. It's quicker to index than to ponder whether you should. It won't take the optimizer very long to figure out that scanning the table is quicker than using the index.
If there are two rows now, but there will be 200,000 in the near future, the cost of not indexing could become prohibitively high. The right time to consider indexing is now.
Having said this, remember that you get an index automatically when you declare a primary key. Creating a table with no primary key is asking for trouble in most cases. So the only time you really need to consider indexing is when you want an index other than the index on the primary key. You need to know the traffic, and the anticipated volume to make this call. If you get it wrong, you'll know, and you can reverse the decision.
I once saw a reference table that had been created with no index when it contained 20 rows. Due to a business change, this table had grown to about 900 rows, but the person who should have noticed the absence of an index didn't. The time to insert a new order had grown from about 10 seconds to 15 minutes.

As a matter of routine I perform the following on read heavy tables:
Create indexes on common join fields such as Foreign Keys when I create the table.
Check the query plan for Views or Stored Procedures and add indexes wherever a table scan is indicated.
Check the query plan for queries by my application and add indexes wherever a table scan is indicated. (and often try to make them into Stored Procedures)
On write heavy tables (like activity logs) I avoid indexes unless they are absolutely necessary. I also tend to archive such data into indexed tables at regular intervals.

It depends.
How much data is in the table? How often is data inserted? A lot of indexes can slow down insertion time. Do you always query all the rows of the table? In this case indexes probably won't help much.
Those aren't common usages though. In most cases, you know you're going to be querying a subset of data. ON what fields? Are there common fields that are always joined on? Look at query plans for common or typical queries, it will generally show you where it's spending all of its time.

If there's a unique constraint on the table (and there should be at least one), then that will usually be enforced by a unique index.
Otherwise, you add indexes when the query performance is bad and adding the index will demonstrably improve the performance. There are books on the subject of how to create good sets of indexes on tables, including Relational Database Index Design and the Optimizers. It will give you a lot of ideas and the reasons why they are good.
See also:
No indexes on small tables
When to create a new SQL Server index
Best Practices and Anti-Patterns in Creating Indexes
and, no doubt, a host of others.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas