Improving efficiency of SQL statement - sql

This is a theoretical example - a question I've been posed as a way of helping me learn. I'd appreciate any pointers and explanation, as when it comes to large scale databases, I am more than amateur but keen to learn more.
The following SQL is about to be run on a database of 8 million records, what can be done to improve efficiency?
SELECT
email_subscribers_extra.sendscont,
email_list_subscribers_new.subscriberid,
email_subscribers_extra.lastopen 
FROM
email_list_subscribers_new
LEFT JOIN
email_subscribers_extra ON email_subscribers_extra.subscriberid = email_list_subscribers_new.subscriberid
WHERE
email_list_subscribers_new.partner = 'AJ'
ORDER BY
email_subscribers_extra.lastsend ASC LIMIT 0, 40000
EDIT:
Lets presume that it is a Linux apache, mysql server as that is what I typically use. There are no indexing set at present.

Improving efficiency is a loaded phrase in SQL tuning. You usually add indexes to speed up SELECT performance, but adding indexes usually slows down updates. Every INSERT and DELETE statement requires rewriting indexes. Many UPDATE statements will require rewriting indexes, too.
Without looking at the execution plan, and without considering the effect on INSERT, UPDATE, and DELETE statements, I'd suggest adding indexes on
every column used in the WHERE clause,
every column used in the JOIN condition,
every column used in the ORDER BY.
Most dbms will automatically add an index to columns used in a foreign key reference. At least some of them will let you drop that index later, though.
So add an index, or make sure the index supporting a foreign key constraint exists, on these columns.
email_list_subscribers_new.subscriberid
email_list_subscribers_new.partner
email_subscribers_extra.subscriberid
email_subscribers_extra.lastsend
In some cases, one multi-column index will give you better SELECT performance than two single-column indexes.

Related

How to set the right indexes on a sql table?

How can I identify the indexes that are worth to set on a sql table?
Take the following as an example:
select *
from products
where name = 'car'
and type = 'vehicle'
and availability > 3
and insertion_date > '2015-10-10'
order by price asc
limit 1
Imagine a database with a few million entries.
Would there be benefits if I set an index on the combination of all attributes that occur in the WHERE and ORDER BY clause?
For the example:
create index i_my_idx on products
(name, type, availability, insertion_date, price)
There are a few rules of thumb that can be useful when deciding which columns to index:
Make sure there's a unique index on the primary key - this is done automatically when you specify a PK in most RDBMSs including postgresql.
Add indexes for each foreign key. These are created automatically in some RDBMSs when you specify a FK but not in postgresql.
If a PK is a compound key, consider adding indexes on each FK making up the PK (except for the first, which is covered by the PK index). As in 2, some RDBMSs (e.g. MySQL with ISAM) add these indexes automatically when the FKs are specified.
Usually, but not always, table joins in queries will be PF to FK and by having indexes on both keys, the query optimizer of the RDBMS has flexibility in determining the optimum plan for maximum performance. This won't always be the best though, and experienced programmers will often format the SQL for a database query to influence the execution plan for best performance, or decide to omit indexes they know are not needed. It's worth noting that an SQL query that is optimal on one RDBMS is not necessarily optimal on another, or on future versions of the DB server, or as the database grows. The latter is important as in some RDBMSs such as postgres and Oracle, the query execution plans are dependent on the data in the tables (this is known as cost-based optimisation).
Once you've got these out of the way a lot comes down to experience and a knowledge of your data, and importantly, how the data is going to be accessed.
Generally you will be looking to index those columns which are best at filtering the data. In your query above, the obvious one is name. This might be enough to make that query run fast enough (unless all your products are cars).
Other than that it's worth making a list of the common ways the data is likely to be accessed e.g.
Get a list of products that are in a category - an index on category will probably help
However, get a list of products that are currently available - an index on availability will probably not help because a large proportion of products are likely to satisfy this condition.
Unless you are dealing with large amounts of data this can often be all you need to do, and it's not generally a good idea to add indexes "just in case" as there are overheads in maintaining them. But if your system does has performance issues, then it's worth considering how combinations of columns are being used in queries, reading up about the postgres query optimizer etc.
And to answer your last question - possibly, but it's far from the first thing to consider.
Well the way you are setting indexes is absolutely correct. Indexes has nothing to do with order by clause.
Some important points while designing SQL query
Always put the condition first in WHERE clause which will filter maximum rows for eg above query name ='car' will filter maximum records in products.
Do not use ">=" use ">" only because greater or equal to will always end up in checking greater first if failed equals as well which will reduce performance of query.
Create a single index in same order your where clause is arranged in.
Try minimizing IN clause use ANY instead.
Thanks
Anant

mysql too many indexes?

I am spending some time optimizing our current database.
I am looking at indexes specifically.
There are a few questions:
Is there such a thing as too many indexes?
What will indexes speed up?
What will indexes slow down?
When is it a good idea to add an index?
When is it a bad idea to add an index?
Pro's and Con's of multiple indexes vs multi-column indexes?
What will indexes speed up?
Data retrieval -- SELECT statements.
What will indexes slow down?
Data manipulation -- INSERT, UPDATE, DELETE statements.
When is it a good idea to add an index?
If you feel you want to get better data retrieval performance.
When is it a bad idea to add an index?
On tables that will see heavy data manipulation -- insertion, updating...
Pro's and Con's of multiple indexes vs multi-column indexes?
Queries need to address the order of columns when dealing with a covering index (an index on more than one column), from left to right in index column definition. The column order in the statement doesn't matter, only that of columns 1, 2 and 3 - a statement needs have a reference to column 1 before the index can be used. If there's only a reference to column 2 or 3, the covering index for 1/2/3 could not be used.
In MySQL, only one index can be used per SELECT/statement in the query (subqueries/etc are seen as a separate statement). And there's a limit to the amount of space per table that MySQL allows. Additionally, running a function on an indexed column renders the index useless - IE:
WHERE DATE(datetime_column) = ...
I disagree with some of the answers on this question.
Is there such a thing as too many indexes?
Of course. Don't create indexes that aren't used by any of your queries. Don't create redundant indexes. Use tools like pt-duplicate-key-checker and pt-index-usage to help you discover the indexes you don't need.
What will indexes speed up?
Search conditions in the WHERE clause.
Join conditions.
Some cases of ORDER BY.
Some cases of GROUP BY.
UNIQUE constraints.
FOREIGN KEY constraints.
FULLTEXT search.
Other answers have advised that INSERT/UPDATE/DELETE are slower the more indexes you have. That's true, but consider that many uses of UPDATE and DELETE also have WHERE clauses and in MySQL, UPDATE and DELETE support JOINs too. Indexes may benefit these queries more than making up for the overhead of updating indexes.
Also, InnoDB locks rows affected by an UPDATE or DELETE. They call this row-level locking, but it's really index-level locking. If there's no index to narrow down the search, InnoDB has to lock a lot more rows than the specific row you're changing. It can even lock all the rows in the table. These locks block changes made by other clients, even if they don't logically conflict.
When is it a good idea to add an index?
If you know you need to run a query that would benefit from an index in one of the above cases.
When is it a bad idea to add an index?
If the index is a left-prefix of another existing index, or the index doesn't help any of the queries you need to run.
Pro's and Con's of multiple indexes vs multi-column indexes?
In some cases, MySQL can perform index-merge optimization, and either union or intersect the results from independent index searches. But it gives better performance to define a single index so the index-merge doesn't need to be done.
For one of my consulting customers, I defined a multi-column index on a many-to-many table where there was no index, and improved their join query by a factor of 94 million!
Designing the right indexes is a complex process, based on the queries you need to optimize. You shouldn't make broad rules like "index everything" or "index nothing to avoid slowing down updates."
See also my presentation How to Design Indexes, Really.
Is there such a thing as too many indexes?
Indexes should be informed by the problem at hand: the tables, the queries your application will run, etc.
What will indexes speed up?
SELECTs.
What will indexes slow down?
INSERTs will be slower, because you have to update the index.
When is it a good idea to add an index?
When your application needs another WHERE clause.
When is it a bad idea to add an index?
When you don't need it to query or enforce uniqueness constraints.
Pros and Cons of multiple indexes vs multi-column indexes?
I don't understand the question. If you have a uniqueness constraint that includes multiple columns, by all means model it as such.
Is there such a thing as too many indexes?
Yes. Don't go out looking to create indexes, create them as necessary.
What will indexes speed up?
Any queries against the indexes table/view.
What will indexes slow down?
Any INSERT statements against the indexed table will be slowed down, because each new record will need to be indexed.
When is it a good idea to add an index?
When a query is not running at an acceptable speed. You may be filtering on records that are not part of the clustered PK, in which case you should add indexes based on the filters you are searching upon (if the performance deems fit).
When is it a bad idea to add an index?
When you do it for the sake of it - i.e over-optimization.
Pro's and Con's of multiple indexes vs multi-column indexes?
Depends on the queries you are trying to improve.
Is there such a thing as too many indexes?
Yup, like all things, too many indexes will slow down data manipulation.
When is it a good idea to add an index?
A good idea to add an index is when your queries are too slow (i.e. you have too many joins in your queries). You should use this optimization only after you built a solid model, to tweak the performance.

How to know when to use indexes and which type?

I've searched a bit and didn't see any similar question, so here goes.
How do you know when to put an index in a table? How do you decide which columns to include in the index? When should a clustered index be used?
Can an index ever slow down the performance of select statements? How many indexes is too many and how big of a table do you need for it to benefit from an index?
EDIT:
What about column data types? Is it ok to have an index on a varchar or datetime?
Well, the first question is easy:
When should a clustered index be used?
Always. Period. Except for a very few, rare, edge cases. A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:
narrow
static (never changes)
unique
if ever possible: ever increasing
INT IDENTITY fulfills this perfectly - GUID's do not. See GUID's as Primary Key for extensive background info.
Why narrow? Because the clustering key is added to each and every index page of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....
Why unique?? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to your keys. Be careful of that!
Next: non-clustered indices. Basically there's one rule: any foreign key in a child table referencing another table should be indexed, it'll speed up JOINs and other operations.
Furthermore, any queries that have WHERE clauses are a good candidate - pick those first which are executed a lot. Put indices on columns that show up in WHERE clauses, in ORDER BY statements.
Next: measure your system, check the DMV's (dynamic management views) for hints about unused or missing indices, and tweak your system over and over again. It's an ongoing process, you'll never be done! See here for info on those two DMV's (missing and unused indices).
Another word of warning: with a truckload of indices, you can make any SELECT query go really really fast. But at the same time, INSERTs, UPDATEs and DELETEs which have to update all the indices involved might suffer. If you only ever SELECT - go nuts! Otherwise, it's a fine and delicate balancing act. You can always tweak a single query beyond belief - but the rest of your system might suffer in doing so. Don't over-index your database! Put a few good indices in place, check and observe how the system behaves, and then maybe add another one or two, and again: observe how the total system performance is affected by that.
Rule of thumb is primary key (implied and defaults to clustered) and each foreign key column
There is more but you could do worse than using SQL Server's missing index DMVs
An index may slow down a SELECT if the optimiser makes a bad choice, and it is possible to have too many. Too many will slow writes but it's also possible to overlap indexes
Answering the ones I can I would say that every table, no matter how small, will always benefit from at least one index as there has to be at least one way in which you are interested in looking up the data; otherwise why store it?
A general rule for adding indexes would be if you need to find data in the table using a particular field, or set of fields. This leads on to how many indexes are too many, generally the more indexes you have the slower inserts and updates will be as they also have to modify the indexes but it all depends on how you use your data. If you need fast inserts then don't use too many. In reporting "read only" type data stores you can have a number of them to make all your lookups faster.
Unfortunately there is no one rule to guide you on the number or type of indexes to use, although the query optimiser of your chosen DB can give hints based on the queries you are executing.
As to clustered indexes they are the Ace card you only get to use once, so choose carefully. It's worth calculating the selectivity of the field you are thinking of putting it on as it can be wasted to put it on something like a boolean field (contrived example) as the selectivity of the data is very low.
This is really a very involved question, though a good starting place would be to index any column that you will filter results on. ie. If you often break products into groups by sale price, index the sale_price column of the products table to improve scan times for that query, etc.
If you are querying based on the value in a column, you probably want to index that column.
i.e.
SELECT a,b,c FROM MyTable WHERE x = 1
You would want an index on X.
Generally, I add indexes for columns which are frequently queried, and I add compound indexes when I'm querying on more than one column.
Indexes won't hurt the performance of a SELECT, but they may slow down INSERTS (or UPDATES) if you have too many indexes columns per table.
As a rule of thumb - start off by adding indexes when you find yourself saying WHERE a = 123 (in this case, an index for "a").
You should use an index on columns that you use for selection and ordering - i.e. the WHERE and ORDER BY clauses.
Indexes can slow down select statements if there are many of them and you are using WHERE and ORDER BY on columns that have not been indexed.
As for size of table - several thousands rows and upwards would start showing real benefits to index usage.
Having said that, there are automated tools to do this, and SQL server has an Database Tuning Advisor that will help with this.

When should you consider indexing your sql tables?

How many records should there be before I consider indexing my sql tables?
There's no good reason to forego obvious indexes (FKs, etc.) when you're creating the table. It will never noticeably affect performance to have unnecessary indexes on tiny tables, and it's good to take a first cut when your mind is into schema design. Also, some indexes serve to prevent duplicates, which can be useful regardless of table size.
I guess the proper answer to your question is that the number of records in the table should have nothing to do with when to create indexes.
I would create the index entries when I create my table. If you decide to create indices after the table has grown to 100, 1000, 100000 entries it can just take alot of time and perhaps make your database unavailable while you are doing it.
Think about the table first, create the indices you think you'll need, and then move on.
In some cases you will discover that you should have indexed a column, if thats the case, fix it when you discover it.
Creating an index on a searched field is not a pre-optimization, its just what should be done.
When the query time is unacceptable. Better yet, create a few indexes now that are likely to be useful, and run an EXPLAIN or EXPLAIN ANALYZE on your queries once your database is populated by representative data. If the indexes aren't helping, drop them. If there are slow queries that could benefit from more or different indexes, change the indexes.
You are not going to be locked in to an initial choice of indexes. Experiment, and make sure you measure performance!
In general I agree with the previous advice.
Always declare the referential integrity for the tables (Primary Key, Foreign Keys), column constraints (not null, check). Saves you from nightmares when apps put bad data into the tables (even in development).
I'd consider adding indexes for the common access columns (columns in your where clauses which are used in =, <> tests), as well.
Most of the modern RDBMS implementations are quite good at keeping you indexes up to date, without hitting your performance. So, the cost of having indexes is minimal.
Also, most RDBMS's have query plan evaluators which look at the relative costs going to the data rows via the index, or using some sort of table scan. So, again the performance hits are minimal.
Two.
I'm serious. If there are two rows now, and there will always be two rows, the cost of indexing is almost zero. It's quicker to index than to ponder whether you should. It won't take the optimizer very long to figure out that scanning the table is quicker than using the index.
If there are two rows now, but there will be 200,000 in the near future, the cost of not indexing could become prohibitively high. The right time to consider indexing is now.
Having said this, remember that you get an index automatically when you declare a primary key. Creating a table with no primary key is asking for trouble in most cases. So the only time you really need to consider indexing is when you want an index other than the index on the primary key. You need to know the traffic, and the anticipated volume to make this call. If you get it wrong, you'll know, and you can reverse the decision.
I once saw a reference table that had been created with no index when it contained 20 rows. Due to a business change, this table had grown to about 900 rows, but the person who should have noticed the absence of an index didn't. The time to insert a new order had grown from about 10 seconds to 15 minutes.
As a matter of routine I perform the following on read heavy tables:
Create indexes on common join fields such as Foreign Keys when I create the table.
Check the query plan for Views or Stored Procedures and add indexes wherever a table scan is indicated.
Check the query plan for queries by my application and add indexes wherever a table scan is indicated. (and often try to make them into Stored Procedures)
On write heavy tables (like activity logs) I avoid indexes unless they are absolutely necessary. I also tend to archive such data into indexed tables at regular intervals.
It depends.
How much data is in the table? How often is data inserted? A lot of indexes can slow down insertion time. Do you always query all the rows of the table? In this case indexes probably won't help much.
Those aren't common usages though. In most cases, you know you're going to be querying a subset of data. ON what fields? Are there common fields that are always joined on? Look at query plans for common or typical queries, it will generally show you where it's spending all of its time.
If there's a unique constraint on the table (and there should be at least one), then that will usually be enforced by a unique index.
Otherwise, you add indexes when the query performance is bad and adding the index will demonstrably improve the performance. There are books on the subject of how to create good sets of indexes on tables, including Relational Database Index Design and the Optimizers. It will give you a lot of ideas and the reasons why they are good.
See also:
No indexes on small tables
When to create a new SQL Server index
Best Practices and Anti-Patterns in Creating Indexes
and, no doubt, a host of others.

SQL Relationships and indexes

I have an MS SQL server application where I have defined my relationships and primary keys.
However do I need to further define indexes on relationship fields which are sometimes not used in joins and just as part of a where clause?
I am working on the assumption that defining a relationship creates an index, which the sql engine can reuse.
Some very thick books have been written on this subject!
Here are some ruiles of thumb:-
Dont bother indexing (apart from PK) any table with < 1000 rows.
Otherwise index all your FKs.
Examine your SQL and look for the where clauses that will most reduce your result sets and index that columun.
eg. given:
SELECT OWNER FROM CARS WHERE COLOUR = 'RED' AND MANUFACTURER = "BMW" AND ECAP = "2.0";
You may have 5000 red cars out of 20,000 so indexing this wont help much.
However you may only have 100 BMWs so indexing MANUFACURER will immediatly reduce you result set to 100 and you can eliminate the the blue and white cars by simply scanning through the hundred rows.
Generally the dbms will pick one or two of the indexes available based on cardinality so it pays to second guess and define only those indexes that are likely to be used.
No indexes will be automatically created on foreign keys constraint. But unique and primary key constraints will create theirs.
Creating indexes on the queries you use, be it on joins or on the WHERE clause is the way to go.
Like everything in the programming world, it depends. You obviously want to create indexes and relationships to preserve normalization and speed up database lookups. But you also want to balance that by not having too many indexes that it will take SQL Server more time to build every index. Also the more indexes you have the more fragmentation that can occur in your database.
So what I do is put in the obvious indexes and relationships and then optimize after the application is build on the possible slow queries.
Defining a relationship does not create the index.
Usually in places where you have a where clause against some field you want an index but be careful not to just throw indexes out all over the place because they can and do have an effect on insert/update performance.
I would start by making sure that every PK and FK has an index.
Further to that, I have found that using the Index Tuning Wizard in SSMS provides excellent recommendations when you feed it the right information.
Database Considerations
When you design an index, consider the following database guidelines:
Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes.
Avoid over-indexing heavily updated tables and keep indexes narrow,
that is, with as few columns as possible.
Use many indexes to improve query performance on tables with low
update requirements, but large volumes of data. Large numbers of
indexes can help the performance of queries that do not modify data,
such as SELECT statements, because the query optimizer has more
indexes to choose from to determine the fastest access method.
Indexing small tables may not be optimal because it can take the
query optimizer longer to traverse the index searching for data than
to perform a simple table scan. Therefore, indexes on small tables
might never be used, but must still be maintained as data in the
table changes.
Indexes on views can provide significant performance gains when the
view contains aggregations, table joins, or a combination of
aggregations and joins. The view does not have to be explicitly
referenced in the query for the query optimizer to use it.
--Stay_Safe--
Indexes aren't very expensive, and speed up queries more than you realize. I would recommend adding indexes to all key and non-key fields that are often used in queries. You can even use the execution plan to recommend additional indexes that would speed up your queries.
The only point where indexes aren't in your favour is when you're doing large amounts of data inserts. Each insert requires each index in a table to be updated along with the table's data.
You can opt to wait until the application is running and you have some known queries against the database that you want to improve, or you could do it now, if you have a good idea.