mysql too many indexes? - sql

I am spending some time optimizing our current database.
I am looking at indexes specifically.
There are a few questions:
Is there such a thing as too many indexes?
What will indexes speed up?
What will indexes slow down?
When is it a good idea to add an index?
When is it a bad idea to add an index?
Pro's and Con's of multiple indexes vs multi-column indexes?

What will indexes speed up?
Data retrieval -- SELECT statements.
What will indexes slow down?
Data manipulation -- INSERT, UPDATE, DELETE statements.
When is it a good idea to add an index?
If you feel you want to get better data retrieval performance.
When is it a bad idea to add an index?
On tables that will see heavy data manipulation -- insertion, updating...
Pro's and Con's of multiple indexes vs multi-column indexes?
Queries need to address the order of columns when dealing with a covering index (an index on more than one column), from left to right in index column definition. The column order in the statement doesn't matter, only that of columns 1, 2 and 3 - a statement needs have a reference to column 1 before the index can be used. If there's only a reference to column 2 or 3, the covering index for 1/2/3 could not be used.
In MySQL, only one index can be used per SELECT/statement in the query (subqueries/etc are seen as a separate statement). And there's a limit to the amount of space per table that MySQL allows. Additionally, running a function on an indexed column renders the index useless - IE:
WHERE DATE(datetime_column) = ...

I disagree with some of the answers on this question.
Is there such a thing as too many indexes?
Of course. Don't create indexes that aren't used by any of your queries. Don't create redundant indexes. Use tools like pt-duplicate-key-checker and pt-index-usage to help you discover the indexes you don't need.
What will indexes speed up?
Search conditions in the WHERE clause.
Join conditions.
Some cases of ORDER BY.
Some cases of GROUP BY.
UNIQUE constraints.
FOREIGN KEY constraints.
FULLTEXT search.
Other answers have advised that INSERT/UPDATE/DELETE are slower the more indexes you have. That's true, but consider that many uses of UPDATE and DELETE also have WHERE clauses and in MySQL, UPDATE and DELETE support JOINs too. Indexes may benefit these queries more than making up for the overhead of updating indexes.
Also, InnoDB locks rows affected by an UPDATE or DELETE. They call this row-level locking, but it's really index-level locking. If there's no index to narrow down the search, InnoDB has to lock a lot more rows than the specific row you're changing. It can even lock all the rows in the table. These locks block changes made by other clients, even if they don't logically conflict.
When is it a good idea to add an index?
If you know you need to run a query that would benefit from an index in one of the above cases.
When is it a bad idea to add an index?
If the index is a left-prefix of another existing index, or the index doesn't help any of the queries you need to run.
Pro's and Con's of multiple indexes vs multi-column indexes?
In some cases, MySQL can perform index-merge optimization, and either union or intersect the results from independent index searches. But it gives better performance to define a single index so the index-merge doesn't need to be done.
For one of my consulting customers, I defined a multi-column index on a many-to-many table where there was no index, and improved their join query by a factor of 94 million!
Designing the right indexes is a complex process, based on the queries you need to optimize. You shouldn't make broad rules like "index everything" or "index nothing to avoid slowing down updates."
See also my presentation How to Design Indexes, Really.

Is there such a thing as too many indexes?
Indexes should be informed by the problem at hand: the tables, the queries your application will run, etc.
What will indexes speed up?
SELECTs.
What will indexes slow down?
INSERTs will be slower, because you have to update the index.
When is it a good idea to add an index?
When your application needs another WHERE clause.
When is it a bad idea to add an index?
When you don't need it to query or enforce uniqueness constraints.
Pros and Cons of multiple indexes vs multi-column indexes?
I don't understand the question. If you have a uniqueness constraint that includes multiple columns, by all means model it as such.

Is there such a thing as too many indexes?
Yes. Don't go out looking to create indexes, create them as necessary.
What will indexes speed up?
Any queries against the indexes table/view.
What will indexes slow down?
Any INSERT statements against the indexed table will be slowed down, because each new record will need to be indexed.
When is it a good idea to add an index?
When a query is not running at an acceptable speed. You may be filtering on records that are not part of the clustered PK, in which case you should add indexes based on the filters you are searching upon (if the performance deems fit).
When is it a bad idea to add an index?
When you do it for the sake of it - i.e over-optimization.
Pro's and Con's of multiple indexes vs multi-column indexes?
Depends on the queries you are trying to improve.

Is there such a thing as too many indexes?
Yup, like all things, too many indexes will slow down data manipulation.
When is it a good idea to add an index?
A good idea to add an index is when your queries are too slow (i.e. you have too many joins in your queries). You should use this optimization only after you built a solid model, to tweak the performance.

Related

Improving efficiency of SQL statement

This is a theoretical example - a question I've been posed as a way of helping me learn. I'd appreciate any pointers and explanation, as when it comes to large scale databases, I am more than amateur but keen to learn more.
The following SQL is about to be run on a database of 8 million records, what can be done to improve efficiency?
SELECT
email_subscribers_extra.sendscont,
email_list_subscribers_new.subscriberid,
email_subscribers_extra.lastopen 
FROM
email_list_subscribers_new
LEFT JOIN
email_subscribers_extra ON email_subscribers_extra.subscriberid = email_list_subscribers_new.subscriberid
WHERE
email_list_subscribers_new.partner = 'AJ'
ORDER BY
email_subscribers_extra.lastsend ASC LIMIT 0, 40000
EDIT:
Lets presume that it is a Linux apache, mysql server as that is what I typically use. There are no indexing set at present.
Improving efficiency is a loaded phrase in SQL tuning. You usually add indexes to speed up SELECT performance, but adding indexes usually slows down updates. Every INSERT and DELETE statement requires rewriting indexes. Many UPDATE statements will require rewriting indexes, too.
Without looking at the execution plan, and without considering the effect on INSERT, UPDATE, and DELETE statements, I'd suggest adding indexes on
every column used in the WHERE clause,
every column used in the JOIN condition,
every column used in the ORDER BY.
Most dbms will automatically add an index to columns used in a foreign key reference. At least some of them will let you drop that index later, though.
So add an index, or make sure the index supporting a foreign key constraint exists, on these columns.
email_list_subscribers_new.subscriberid
email_list_subscribers_new.partner
email_subscribers_extra.subscriberid
email_subscribers_extra.lastsend
In some cases, one multi-column index will give you better SELECT performance than two single-column indexes.

Database indexes: A good thing, a bad thing, or a waste of time?

Adding indexes is often suggested here as a remedy for performance problems.
(I'm talking about reading & querying ONLY, we all know indexes can make writing slower).
I have tried this remedy many times, over many years, both on DB2 and MSSQL, and the result were invariably disappointing.
My finding has been that no matter how 'obvious' it was that an index would make things better, it turned out that the query optimiser was smarter, and my cleverly-chosen index almost always made things worse.
I should point out that my experiences relate mostly to small tables (<100'000 rows).
Can anyone provide some down-to-earth guidelines on choices for indexing?
The correct answer would be a list of recommendations something like:
Never/always index a table with less than/more than NNNN records
Never/always consider indexes on multi-field keys
Never/always use clustered indexes
Never/always use more than NNN indexes on a single table
Never/always add an index when [some magic condition I'm dying to learn about]
Ideally, the answer will give some instructive examples.
Indexes are kind of like chemotherapy...too much and it kills you...too little and you die...do it the wrong way and you die. You gotta know just how much, how often, and what kind to make it not kill you.
Your hardware, platform, environment, load all play a role. So to answer your questions..
Yes, possibly sometimes.
As a rule of thumb, primary keys and foreign keys need to be indexed. Usually primary key are indexed just by defining them as such, but FKs are not in every database (they definitely are not in SQL Server, I can't really speak for other dbs). You will be using these in joins, so it is generally critical to performance to define these.
Now if you have fields you often use in where clauses, they can benefit from indexes as well providing several things:
First the field must have a range of
values. A bit field or a field with
only 2 or 3 values will almost never
use an index.
Second the queries you write must be sargable. That is they must be designed to use indexes. I suspect if you never get performance improvements from what look like likely candidates for indexes, then you probably have queries that are not sargable. For instance take "WHERE Name like '%Smith'" as a where clause. Without knowing the first characters, the optimizer can't use the index.
Small tables rarely benefit much from indexes. If the optimizer can hold the whole thing in memory, then it is often faster to do so. If you were working with multimillion record tables, you would see that indexes are critical.
Indexing can be very complex and if you are interested in the subject, I suggest you get a good book on performance tuning your particular database and read in depth about them.
An index that's never used is a waste of disk space, as well as adding to the insert/update/delete time. It's probably best to define the clustering index first, then define
additional indexes as you find yourself writing WHERE clauses.
One common index mistake I see is people wondering why a select on col2 (or col3) takes so long when the index is defined as col1 ASC, col2 ASC, col3 ASC. When you have a multiple column index, your WHERE clause must use the first column in the index, or the first and second column in the index, and so forth.
If you need to access the data by col2, then you need an additional index that's defined as col2 ASC.
With small domain tables, it's sometimes faster to do a table scan than it is to read rows from the table using an index. This depends on the speed of your database machine and the speed of the network.
You need indexes. Only with indexes you can access data fast enough.
To make it as short as possible:
add indexes for columns you are frequently filtering (or grouping) for. (eg. a state or name)
like and sql functions could make the DBMS not use indexes.
add indexes only on columns which have many different values (eg. no boolean fields)
It is common to add indexes to foreign keys, but it is not always needed.
don't add indexes in very short tables
never add indexes when you don't know how they should enhance performance.
Finally: look into execution plans to decide how to optimize queries.
You'll add indexes just for a single, critical query. In this case, you'll add exactly the indexes that are needed in the query in question (multi-column indexes).
Basically when DB is collecting data and it's alive indexes have to go and evolve with that flow. There maybe really good index on table but after growing beyond of XXX records the same index in the same table is useless and in that case it should be refactored.
To have optimized and fast DB the only way is to monitor it all the time and refactor it over the time as records come in.
Real life example i got some time ago was super fast query restricted by some time range (created_at between A and B) and super slow query where time range was different. Same query, same database, same application and only one difference on time range.
Always use clustered indexes.
In fact you can't help but using them. The data in a table will be laid out on disk in some particular order anyway, it can't be save as a pile or something. You have the chance of specifying how exactly this data will be laid out. Why burn it?
When you have a table which gets new records appended and you observe that some value in those records always grow (like StackOverflow question number), make a clustered index out of it. Then the new data will not be inserted in the middle but will basically be appended to a file on disk which is a relatively cheap operation.
If a table is expected to be the target of a join then it is best to have a clustered index on that table so that the joins can be performed sequentially through the data pages. The columns in the clustered index will (on some DB systems) be included in all of the other indexes on that table, since those are the values that the indexes will use to reference the table data. To keep the other indexes from getting too large, the columns in the clustered index should be as narrow as possible, so it is best to use only numeric—rather than character—data types in the clustered index. In general, fewer columns are better than more columns, but notice that three int columns (12 bytes per row) are much better than one nvarchar(32) column (potentially 64 bytes per row).
If the clustered index is narrow, then a few additional indexes should not negatively impact performance very much even on very large tables.
Seems you are confusing two concepts here.
Adding indices *generally can only make a read query faster, very very rarely (almost never) slower. Adding an index never forces the query optimizer to use it. It will only use it if it thinks it can benefit from it, and it is generally very smart about those decisions.
For inserts/updates, of course, every index hurts performance a bit more... But at the other end of the spectrum, for, say a read only database, (like a USPS address database which is distributed monthly), in operational use there would ne no inserts/updates, so the only negative impact of additional indices is the disk space they take up.
This is entirely different that specifying that the query optimizer USE an index, in effect overriding what it would do on it's own... That can potentially make a query slower.
EDIT: Edited to eliminate opportunity for misinterpretation by overly literal readers.

Non-clustered index in sql server

can i specify more then one column as Non-cluster index, what will it affect?
Like other people have said, this will have an overhead in terms of maintaining the table, it's just a case of whether the benefits outweigh the cost.
The only reasons I can think of for doing adding a multiple-column non-clustered index are to speed up queries where you regularly search based on a combination of fields, or to make a combination of fields unique in the table. If those are your aims then generally I'd say go for it.
Assuming you are asking about how to define multiple indexes, you can define as many non-clustered indexes as you want. It's the clustered index that you can only have one of, since it determines how rows are clumped together as records on disk.
The more indexes you create, the longer it will take to perform insert, update and delete operations.
Yes, you can have multiple columns in an index, be it clustered or non-clustered. It is very common. It's effect depends entirely on the data in your table.

When should you consider indexing your sql tables?

How many records should there be before I consider indexing my sql tables?
There's no good reason to forego obvious indexes (FKs, etc.) when you're creating the table. It will never noticeably affect performance to have unnecessary indexes on tiny tables, and it's good to take a first cut when your mind is into schema design. Also, some indexes serve to prevent duplicates, which can be useful regardless of table size.
I guess the proper answer to your question is that the number of records in the table should have nothing to do with when to create indexes.
I would create the index entries when I create my table. If you decide to create indices after the table has grown to 100, 1000, 100000 entries it can just take alot of time and perhaps make your database unavailable while you are doing it.
Think about the table first, create the indices you think you'll need, and then move on.
In some cases you will discover that you should have indexed a column, if thats the case, fix it when you discover it.
Creating an index on a searched field is not a pre-optimization, its just what should be done.
When the query time is unacceptable. Better yet, create a few indexes now that are likely to be useful, and run an EXPLAIN or EXPLAIN ANALYZE on your queries once your database is populated by representative data. If the indexes aren't helping, drop them. If there are slow queries that could benefit from more or different indexes, change the indexes.
You are not going to be locked in to an initial choice of indexes. Experiment, and make sure you measure performance!
In general I agree with the previous advice.
Always declare the referential integrity for the tables (Primary Key, Foreign Keys), column constraints (not null, check). Saves you from nightmares when apps put bad data into the tables (even in development).
I'd consider adding indexes for the common access columns (columns in your where clauses which are used in =, <> tests), as well.
Most of the modern RDBMS implementations are quite good at keeping you indexes up to date, without hitting your performance. So, the cost of having indexes is minimal.
Also, most RDBMS's have query plan evaluators which look at the relative costs going to the data rows via the index, or using some sort of table scan. So, again the performance hits are minimal.
Two.
I'm serious. If there are two rows now, and there will always be two rows, the cost of indexing is almost zero. It's quicker to index than to ponder whether you should. It won't take the optimizer very long to figure out that scanning the table is quicker than using the index.
If there are two rows now, but there will be 200,000 in the near future, the cost of not indexing could become prohibitively high. The right time to consider indexing is now.
Having said this, remember that you get an index automatically when you declare a primary key. Creating a table with no primary key is asking for trouble in most cases. So the only time you really need to consider indexing is when you want an index other than the index on the primary key. You need to know the traffic, and the anticipated volume to make this call. If you get it wrong, you'll know, and you can reverse the decision.
I once saw a reference table that had been created with no index when it contained 20 rows. Due to a business change, this table had grown to about 900 rows, but the person who should have noticed the absence of an index didn't. The time to insert a new order had grown from about 10 seconds to 15 minutes.
As a matter of routine I perform the following on read heavy tables:
Create indexes on common join fields such as Foreign Keys when I create the table.
Check the query plan for Views or Stored Procedures and add indexes wherever a table scan is indicated.
Check the query plan for queries by my application and add indexes wherever a table scan is indicated. (and often try to make them into Stored Procedures)
On write heavy tables (like activity logs) I avoid indexes unless they are absolutely necessary. I also tend to archive such data into indexed tables at regular intervals.
It depends.
How much data is in the table? How often is data inserted? A lot of indexes can slow down insertion time. Do you always query all the rows of the table? In this case indexes probably won't help much.
Those aren't common usages though. In most cases, you know you're going to be querying a subset of data. ON what fields? Are there common fields that are always joined on? Look at query plans for common or typical queries, it will generally show you where it's spending all of its time.
If there's a unique constraint on the table (and there should be at least one), then that will usually be enforced by a unique index.
Otherwise, you add indexes when the query performance is bad and adding the index will demonstrably improve the performance. There are books on the subject of how to create good sets of indexes on tables, including Relational Database Index Design and the Optimizers. It will give you a lot of ideas and the reasons why they are good.
See also:
No indexes on small tables
When to create a new SQL Server index
Best Practices and Anti-Patterns in Creating Indexes
and, no doubt, a host of others.

SQL Relationships and indexes

I have an MS SQL server application where I have defined my relationships and primary keys.
However do I need to further define indexes on relationship fields which are sometimes not used in joins and just as part of a where clause?
I am working on the assumption that defining a relationship creates an index, which the sql engine can reuse.
Some very thick books have been written on this subject!
Here are some ruiles of thumb:-
Dont bother indexing (apart from PK) any table with < 1000 rows.
Otherwise index all your FKs.
Examine your SQL and look for the where clauses that will most reduce your result sets and index that columun.
eg. given:
SELECT OWNER FROM CARS WHERE COLOUR = 'RED' AND MANUFACTURER = "BMW" AND ECAP = "2.0";
You may have 5000 red cars out of 20,000 so indexing this wont help much.
However you may only have 100 BMWs so indexing MANUFACURER will immediatly reduce you result set to 100 and you can eliminate the the blue and white cars by simply scanning through the hundred rows.
Generally the dbms will pick one or two of the indexes available based on cardinality so it pays to second guess and define only those indexes that are likely to be used.
No indexes will be automatically created on foreign keys constraint. But unique and primary key constraints will create theirs.
Creating indexes on the queries you use, be it on joins or on the WHERE clause is the way to go.
Like everything in the programming world, it depends. You obviously want to create indexes and relationships to preserve normalization and speed up database lookups. But you also want to balance that by not having too many indexes that it will take SQL Server more time to build every index. Also the more indexes you have the more fragmentation that can occur in your database.
So what I do is put in the obvious indexes and relationships and then optimize after the application is build on the possible slow queries.
Defining a relationship does not create the index.
Usually in places where you have a where clause against some field you want an index but be careful not to just throw indexes out all over the place because they can and do have an effect on insert/update performance.
I would start by making sure that every PK and FK has an index.
Further to that, I have found that using the Index Tuning Wizard in SSMS provides excellent recommendations when you feed it the right information.
Database Considerations
When you design an index, consider the following database guidelines:
Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes.
Avoid over-indexing heavily updated tables and keep indexes narrow,
that is, with as few columns as possible.
Use many indexes to improve query performance on tables with low
update requirements, but large volumes of data. Large numbers of
indexes can help the performance of queries that do not modify data,
such as SELECT statements, because the query optimizer has more
indexes to choose from to determine the fastest access method.
Indexing small tables may not be optimal because it can take the
query optimizer longer to traverse the index searching for data than
to perform a simple table scan. Therefore, indexes on small tables
might never be used, but must still be maintained as data in the
table changes.
Indexes on views can provide significant performance gains when the
view contains aggregations, table joins, or a combination of
aggregations and joins. The view does not have to be explicitly
referenced in the query for the query optimizer to use it.
--Stay_Safe--
Indexes aren't very expensive, and speed up queries more than you realize. I would recommend adding indexes to all key and non-key fields that are often used in queries. You can even use the execution plan to recommend additional indexes that would speed up your queries.
The only point where indexes aren't in your favour is when you're doing large amounts of data inserts. Each insert requires each index in a table to be updated along with the table's data.
You can opt to wait until the application is running and you have some known queries against the database that you want to improve, or you could do it now, if you have a good idea.