I am tuning my SQL Server, and when I display the execution plan for one of my queries, the top of it reads:
"Missing Index (Impact 99.7782): CREATE NONCLUSTERED INDEX..."
So I looked at the missing index details and it is showing this:
/*
Missing Index Details from ExecutionPlan1.sqlplan
The Query Processor estimates that implementing the following index could improve the query cost by 99.7782%.
*/
/*
USE [phsprod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[address] ([userid])
GO
*/
I have only been working with SQL for about a month now and I have never done anything with this as all my tables have been built for me already. Can anyone help explain/give me any ideas on what to do with this? Thanks.
That means SQL Server is suggesting that your query could run faster with an index. Indexes add overhead and disk storage, so you should ignore this hint unless the query is giving performance problems in production.
To create the index, uncomment the statement after the USE line, replace [<Name of Missing Index, sysname,>] with a real name, and run it:
USE [phsprod]
GO
CREATE NONCLUSTERED INDEX IX_Address_UserId
ON [dbo].[address] ([userid])
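Before keeping the index, it may be worth measuring whether it actually helps. This is a minimal sketch, assuming the slow query looks up rows by userid; the SELECT and the value 42 are made up for illustration and should be replaced with the real query from the execution plan.

```sql
USE [phsprod];
GO
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- Hypothetical query shape (assumes userid is numeric); substitute your own query.
SELECT * FROM [dbo].[address] WHERE [userid] = 42;
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Run it once before and once after creating the index, then compare the logical reads shown in the Messages tab.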
That means SQL Server is suggesting that your query could run faster with this index.
It can mean that your current indexes are not the greatest for the query you are running. Maybe your query could be optimised. Or maybe you COULD add the index. But if you decide to do this, you have to analyse carefully.
Indeed, indexes add overhead and use disk storage. But they can also improve performance.
For instance, if you always search your table by "userid", then it may pay off to add an index on that column, since SQL Server will be able to search using this index.
Think of it a little like searching for a word in a dictionary.
If you're looking for the word "dog", you go to "d", then to the words that begin with "do", and finally find the word "dog".
If the words in the dictionary were not in alphabetical order, you would have to search the whole dictionary to find the word "dog"!
A clustered index (which is what a primary key creates by default in SQL Server) determines the order in which the rows of your table are stored.
Right now, it seems that you don't have an index on the column "userid". So SQL Server (probably) has to scan the entire table to find the matching userid.
If you add a nonclustered index, it will not reorder your table, but it will tell SQL Server in which range it should search to find the userid you want (like "in the dictionary, between pages 20 and 30"). So it will not have to search the whole table to find it.
But it also means that when you add, remove, or modify data in the table, SQL Server needs to keep the index up to date. Generally, a few indexes don't hurt, but you need to be sure they are needed. Too many indexes can hurt performance.
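One way to check whether an index earns its maintenance cost is to compare how often it is read with how often it is written. This is a rough sketch using the index usage DMV against the dbo.address table from the question. The counters reset when SQL Server restarts, and querying the DMV requires the VIEW SERVER STATE permission.

```sql
-- Reads vs. writes per index on dbo.address since the last SQL Server restart.
SELECT i.name AS index_name,
       i.type_desc,
       s.user_seeks, s.user_scans, s.user_lookups,  -- reads that used the index
       s.user_updates                               -- writes that had to maintain it
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.address');
```

An index with many user_updates but almost no seeks, scans, or lookups is a candidate for removal.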
And if your table contains only a few hundred rows, it may not show a big performance improvement. But over time, as your table grows, it may make a difference.
Hope that helps!
I am setting up some SQL views to be used as Power BI data sources. However, one of the queries constantly takes a long time to run, and I want to figure out what the best way to resolve this is. I am in the finance department at my employer, so SQL query tuning is not really what I do day to day, but I am trying to learn.
The execution plan is here:
https://www.brentozar.com/pastetheplan/?id=BJGHe1W0H
I can see the execution plan is asking me to add some indexes, but I am not sure whether I should do that. I have read that these index suggestions should not be followed blindly, as they could cause other issues.
The query is:
select ansapbicalls.*
from ansapbicalls
inner join ANSAPBIStatus
    on ansapbicalls.[Call Status] = ansapbistatus.[Status ID]
inner join ansapbifault
    on ansapbicalls.Fault = ANSAPBIFault.[Fault ID]
where ANSAPBIStatus.[Status Type] = 'Operations'
  and ANSAPBIFault.[Job Type] = 'RR'
And the missing index warnings are:
Missing Index (Impact 39.0531):
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[MXMSERVCALLAUDIT] ([TYPE],[DATAAREAID]) INCLUDE ([JOBID],[RECID])
Missing Index (Impact 51.6627):
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[MXMSERVCALLAUDIT] ([DATAAREAID],[RECID]) INCLUDE ([JOBID],[USERID],[DATE])
In my experience it is best to just try to apply the index.
Create the index by copying the code and giving the index a name, usually something like IX_Tablename_Columnnames.
Creating the index will take some time, but afterwards the query should run a lot faster. If this isn't the case (which I doubt), you can always remove the index again.
So run this code:
CREATE NONCLUSTERED INDEX [IX_MXMSERVCALLAUDIT_TYPE_DATAAREAID_JOBID_RECID]
ON [dbo].[MXMSERVCALLAUDIT] ([TYPE],[DATAAREAID]) INCLUDE ([JOBID],[RECID])
CREATE NONCLUSTERED INDEX [IX_MXMSERVCALLAUDIT_DATAAREAID_RECID_JOBID_USERID_DATE]
ON [dbo].[MXMSERVCALLAUDIT] ([DATAAREAID],[RECID]) INCLUDE ([JOBID],[USERID],[DATE])
And then run the original query again.
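If the query does not get faster, the indexes can be removed again. A minimal sketch, using the index names chosen in the statements above:

```sql
-- Drop the two indexes again if they did not improve the query.
DROP INDEX [IX_MXMSERVCALLAUDIT_TYPE_DATAAREAID_JOBID_RECID]
    ON [dbo].[MXMSERVCALLAUDIT];
DROP INDEX [IX_MXMSERVCALLAUDIT_DATAAREAID_RECID_JOBID_USERID_DATE]
    ON [dbo].[MXMSERVCALLAUDIT];
```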
How can I fix my performance issue? I have a SQL query with 'IN', and I guess the 'IN' is causing the costly performance issue. Do I need an index for my SQL query?
My SQL query:
SELECT [p].[ReferencedxxxId]
FROM [Common].[xxxReference] AS [p]
WHERE ([p].[IsDeleted] = 0)
AND (([p].[ReferencedxyzType] = @__refxyzType_0)
AND [p].[ReferencedxxxId] IN ('42342','ffsdfd','5345345345'))
My solution (but I need your help for better advice): which one is correct, a clustered or a nonclustered index?
USE [xxx]
GO
CREATE NONCLUSTERED INDEX IX_NonClusteredIndexDemo_xxxId
ON [Common].[xxxReference](xxxId)
INCLUDE ([ID],[ReferencedxxxId])
WITH (DROP_EXISTING=ON, ONLINE=ON, FILLFACTOR=90)
GO
Second:
CREATE INDEX xxxReference_ReferencedxxxId_index
ON [Common].[xxxReference] (ReferencedxxxId)
Which one is correct, or do you have a better solution?
The performance problem of this query is not the result of using the IN operator.
This operator performs very well with small lists (say, fewer than 1000 members).
The performance bottleneck here is that SQL Server performs an index scan (which is very costly) instead of an index seek, plus a key lookup, which is 20% of the query cost.
To avoid both problems, you can add an index on IsDeleted, ReferencedxyzType and ReferencedxxxId - probably in this exact order.
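As a sketch, that index could look like the following. The index name is made up, and the key order simply follows the suggestion above. Because the query selects only ReferencedxxxId, which is part of the key, the index covers the query and should remove the key lookup.

```sql
-- Hypothetical index name; the key order follows the suggestion above.
CREATE NONCLUSTERED INDEX IX_xxxReference_IsDeleted_Type_RefId
ON [Common].[xxxReference] ([IsDeleted], [ReferencedxyzType], [ReferencedxxxId]);
```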
SQL performance tuning is a science that tends to look a little like art or magic. Either way you look at it, it requires a good knowledge of both the theory and the practice of index settings, and of the relevant system's requirements.
Therefore, my suggestion is this: Do not attempt to solve it yourself with the help of strangers on the internet. Get an expert for a consulting job for a couple of hours/days to analyze the system and help you fine-tune it.
Learn whatever you can during this process. Ask questions about everything that is not trivial. This will be money well spent.
Couple of things:
If you have a SELECT statement inside the IN, that should be avoided
and should be replaced with an EXISTS clause. But in your above
example, that is not relevant as you have direct values inside IN.
Using EXISTS and NOT EXISTS instead of IN and NOT IN can spare SQL
Server from scanning every value of the column for each value
inside the IN / NOT IN, because it can short-circuit the search as
soon as a match or non-match is found.
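The rewrite described above can be sketched as follows. The table variables and their contents are hypothetical stand-ins. Note that the optimizer may well produce the same plan for both forms, so it is worth comparing the actual plans before rewriting anything.

```sql
-- Hypothetical stand-in tables, for illustration only.
DECLARE @Reference TABLE (ReferencedId varchar(20) NOT NULL);
DECLARE @Wanted    TABLE (Id varchar(20) NOT NULL);
INSERT INTO @Reference VALUES ('42342'), ('ffsdfd'), ('zzz');
INSERT INTO @Wanted    VALUES ('42342'), ('ffsdfd');

-- IN with a subquery.
SELECT r.ReferencedId
FROM @Reference AS r
WHERE r.ReferencedId IN (SELECT w.Id FROM @Wanted AS w);

-- The same filter expressed with EXISTS.
SELECT r.ReferencedId
FROM @Reference AS r
WHERE EXISTS (SELECT 1 FROM @Wanted AS w WHERE w.Id = r.ReferencedId);
```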
Avoid implicit conversions. They degrade performance for several
reasons, including: i) SQL Server cannot use the statistics on an
index and hence cannot seek on that index, so it falls back to
scanning the clustered index of the table; ii) the query may not be
granted the right amount of memory during the memory allocation
phase; iii) cardinality estimation becomes wrong, because SQL Server
has statistics on the column itself but not on the converted value
of that column.
If you look at your execution plan posted above, you will see a
yellow mark in your 'SELECT'. If you hover over it, you will see
one/more warning messages. If your warning is related to implicit
conversion, try to use proper datatypes during comparison.
Eg. What is the datatype of the column '[ReferencedxxxId]'? If it
is not an NVARCHAR and is rather a VARCHAR, then I would suggest:
Pass the values inside the IN as VARCHAR (currently you are passing them as NVARCHAR). This way you can still take full advantage of the rowstore index created on the [ReferencedxxxId] column.
If you must keep the values inside the IN clause as NVARCHAR, then you should:
CONVERT/CAST the column [ReferencedxxxId] explicitly in your IN predicate. This gets rid of the implicit conversion, but you will no longer be able to take full advantage of the rowstore index on the [ReferencedxxxId] column.
In addition, create a clustered/nonclustered columnstore index on the table covering the columns used in the query, rather than relying on the rowstore index. This should significantly improve the performance of your SELECT query.
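The difference between the two kinds of literal can be sketched like this. The temp table is a hypothetical stand-in for [Common].[xxxReference], and the varchar column type is an assumption. How badly the conversion hurts depends on the column's collation, so check the plan warnings on the real table.

```sql
-- Hypothetical stand-in table; the varchar column type is an assumption.
CREATE TABLE #Reference (ReferencedId varchar(20) NOT NULL);
CREATE INDEX IX_Reference_ReferencedId ON #Reference (ReferencedId);
INSERT INTO #Reference VALUES ('42342'), ('ffsdfd'), ('5345345345');

-- N'...' literals are nvarchar, so the varchar column is implicitly converted;
-- this can prevent an efficient index seek.
SELECT ReferencedId FROM #Reference WHERE ReferencedId IN (N'42342', N'ffsdfd');

-- Plain '...' literals are varchar and match the column type directly.
SELECT ReferencedId FROM #Reference WHERE ReferencedId IN ('42342', 'ffsdfd');

DROP TABLE #Reference;
```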
If you decide to go the rowstore route by correcting the values inside the IN, make sure that you create a clustered/nonclustered index that covers the query. That means the index keys cover the columns you search on ([ReferencedxxxId], [ReferencedxyzType], [IsDeleted]), and the columns used in the SELECT list go under the INCLUDE clause (if it is a nonclustered index).
Also, when you create a composite rowstore index, try to order its columns from high cardinality to low cardinality, left to right, to make the best use of that index.
Assuming an OLTP system rather than OLAP, my first pass would be a nonclustered index on ReferencedxyzType, ReferencedxxxId, IsDeleted. Since IsDeleted is likely to have the least selectivity, I would place it last.
I might even be tempted in a higher volume scenario to move the IsDeleted out of the index onto an include instead, since it provides so little selectivity to the index itself.
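The two variants could be sketched as follows. The index names are made up, and only one of the two would be created in practice.

```sql
-- Variant 1: IsDeleted as the last key column.
CREATE NONCLUSTERED INDEX IX_xxxReference_Type_RefId_IsDeleted
ON [Common].[xxxReference] ([ReferencedxyzType], [ReferencedxxxId], [IsDeleted]);

-- Variant 2: IsDeleted moved out of the key into an INCLUDE column.
CREATE NONCLUSTERED INDEX IX_xxxReference_Type_RefId
ON [Common].[xxxReference] ([ReferencedxyzType], [ReferencedxxxId])
INCLUDE ([IsDeleted]);
```

Either variant still covers the query, since IsDeleted remains available at the leaf level of the index.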
There is clearly already a clustered index in place on the table (we can see it in the query plan), but we don't have the details of what is in it.
The question around clustered vs non-clustered is more complex and requires a lot more knowledge of the system and usage.
I'm running a query that takes 2 seconds but should perform better, so I looked at the execution plan details in SQL Server Management Studio and found a "step" in the process whose cost is 70%.
Then I right-clicked on the item and found an option that says "Missing Index Details". After I clicked it, a query with a recommendation was generated:
/*
Missing Index Details from SQLQuery15.sql - (local).application_prod (appprod (58))
The Query Processor estimates that implementing the following index could improve the query cost by 68.8518%.
*/
/*
USE [application_prod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[cloud_document] ([isactivedocument])
INCLUDE ([objectuid])
GO
*/
So my question is: what exactly happens if I execute the query? Is it going to affect my database? Are there any drawbacks or side effects after applying it?
Thanks a lot and appreciate in advance.
Running the query will create an index on the specified table (cloud_document).
This should improve read performance and reduce query time.
It also affects the performance of INSERT/UPDATE/DELETE statements, as the indexes need to be maintained during these statements.
Deciding whether to use indexes, how many, and what each index consists of is more an art than an exact science.
The actual maintenance of indexes (defragmenting, updating statistics) is something that can be automated, but it should be left until you have a better understanding of what indexes are and what they do.
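For orientation, the maintenance mentioned above might look roughly like this when done by hand. It is only a sketch against the cloud_document table from the question. Whether and when to reorganize is a judgment call that depends on the workload.

```sql
USE [application_prod];
GO
-- Fragmentation per index on dbo.cloud_document ('LIMITED' is the cheapest scan mode).
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.cloud_document'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id;
GO
-- Defragment all indexes on the table in place, then refresh its statistics.
ALTER INDEX ALL ON [dbo].[cloud_document] REORGANIZE;
UPDATE STATISTICS [dbo].[cloud_document];
GO
```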
I would recommend that you start by reading some documentation on indexing.
You could start with Stairway to SQL Server Indexes.
The literal meaning is telling you to build an index on isactivedocument of table [dbo].[cloud_document], from which I assume you are using isactivedocument as a condition to filter the table, and select the column objectuid. Something like
select objectuid, ... from [dbo].[cloud_document] where isactivedocument = 0/1
Note that the "Clustered Index Scan (70%)" doesn't mean there is a problem with the clustered index. It means you don't have an index on isactivedocument, so the SQL engine has to scan the clustered index to find what you want. That's why this step is so costly. Once you create the index on isactivedocument, check the plan again, and you should see the node become an "index seek", which is a much faster way to find what you want.
Usually, if the stress on your database mainly comes from queries, a new index doesn't do much harm to your system. Just go ahead and create the index. But of course you should keep the number of indexes as small as possible.
If I have a field in a table of some date type and I know that I will always be searching it using comparisons like between, > or < and never = could there be a good reason not to add an index for it?
The only reason not to add an index on a field you are going to search on is that the cost of maintaining the index outweighs its benefits.
This may happen if:
You have really heavy DML on your table,
The existence of the index makes it intolerably slow, and
It's more important to have fast DML than fast queries.
If it's not the case, then just create the index. The optimizer just won't use it if it thinks it's not needed.
There are far more bad reasons.
However, an index on the search column may not be enough if the index is nonclustered and non-covering. Queries like this are often good candidates for clustered indexes, however a covering index is just as good.
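A covering index for a date-range query could be sketched like this. The table, columns, and index name are all hypothetical and serve only to show the shape.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #Orders (
    OrderId    int IDENTITY(1,1) PRIMARY KEY,  -- clustered index on the ID
    OrderDate  date NOT NULL,
    CustomerId int NOT NULL,
    Total      decimal(10, 2) NOT NULL
);

-- Nonclustered index keyed on the date, carrying the selected columns along.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON #Orders (OrderDate)
INCLUDE (CustomerId, Total);

-- Every column the query touches is in the index, so a range seek on OrderDate
-- can answer it without lookups into the clustered index.
SELECT CustomerId, Total
FROM #Orders
WHERE OrderDate >= '20240101' AND OrderDate < '20240201';

DROP TABLE #Orders;
```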
This is a great example of why this is as much art as science. Some considerations:
How often is data added to this table? If there is far more reading/searching than adding/changing (the whole point of some tables is to have data dumped into them for reporting), then you can go crazy with indexes. Your clustered index might be needed more for the ID field, but you can have plenty of multi-column indexes (where the date field comes later, and the columns listed earlier in the index do a good job of reducing the result set) and covering indexes (where all returned values are in the index, so it's very fast, as if you were searching on the clustered index to begin with).
If the table is edited/added to often, or you have limited storage space and hence can't have tons of indexes, then you have to be more careful with your indexes. If your date criteria typically give a wide range of data, and you don't search often on other fields, then you could give the clustered index over to this date field, but think several times before you do that. Having your clustered index on a simple autonumber field is a bonus for all your indexes. Non-covering indexes use the clustered index to zip to the records for the result set. Don't move the clustered index to a date field unless the vast majority of your searching is on that date field. It's the nuclear option.
If you can't have a lot of covering indexes (data changes a lot on the table, there's limited space, your result sets are large and varied), and/or you really need the clustered index for another column, and the typical date criteria give a wide range of records, and you have to search a lot, you've got problems. If you can dump data to a reporting table, do that. If you can't, then you'll have to balance all these competing factors carefully. Maybe for the top 2-3 searches you minimize the result-set columns as much as you can and configure covering indexes for them, and you let the rest make do with a simple nonclustered index.
You can see why good db people should be paid well. I know a lot of the factors, but I envy people who can balance all these things quickly and correctly without having to do a lot of profiling.
Don't index it IF you want to scan the entire table every time. I would want the database to try to do a range scan, so I'd add the index; I use SQL Server, and it will use the index in most cases. However, different databases may not use the index.
Depending on the data, I'd go further than that, and suggest it could be a clustered index if you're going to be doing BETWEEN queries, to avoid the table scan.
While an index helps for querying the table, it will also slow down inserts, updates and deletes somewhat. If you have a lot more changes in the table than queries, an index can hurt the overall performance.
If the table is small, it might never use the indexes, so adding them may just waste resources.
There are datatypes (like image in SQL Server) and data distributions where indexes are unlikely to be used or can't be used. For instance, in SQL Server it is usually pointless to index a bit field on its own, as there is not enough variability in the data for an index to do any good.
If you usually query with a LIKE clause and a wildcard as the first character, the index cannot be used for a seek, so creating one is another waste of resources.
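The wildcard point can be illustrated with a small sketch. The table and data are hypothetical, and on a table this tiny the optimizer may choose a scan either way, so the difference only shows up in the plans of realistically sized tables.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #Person (
    PersonId int IDENTITY(1,1) PRIMARY KEY,
    LastName varchar(50) NOT NULL
);
CREATE INDEX IX_Person_LastName ON #Person (LastName);
INSERT INTO #Person (LastName) VALUES ('Smith'), ('Smythe'), ('Jones');

-- Prefix search: the optimizer can seek on the index range starting with 'Sm'.
SELECT LastName FROM #Person WHERE LastName LIKE 'Sm%';

-- Leading wildcard: there is no usable range, so at best the whole index is scanned.
SELECT LastName FROM #Person WHERE LastName LIKE '%th';

DROP TABLE #Person;
```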
I have inherited a database where there are clustered indexes and an additional duplicate index for each of the clustered indexes.
i.e.
IX_PrimaryKey is a clustered index on the column ID.
IX_ID is a non clustered index on the column ID.
I want to clean up these duplicate nonclustered indexes, but first I wanted to check whether anyone could think of a reason for having them.
Can anyone think of a performance benefit of having such duplicates?
For exact same indexes, there's no performance gain. Actually, it incurs performance loss in insertion and updates. However, if there are multicolumn indexes with different column order, there might be a valid reason for them.
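To see which indexes really are exact duplicates and which differ in columns or column order, the key columns of each index can be listed side by side. This is a rough sketch: dbo.YourTable is a placeholder name, and STRING_AGG assumes SQL Server 2017 or later.

```sql
-- Key columns of every index on one table, in key order.
-- dbo.YourTable is a placeholder; STRING_AGG requires SQL Server 2017+.
SELECT i.name AS index_name,
       i.type_desc,
       STRING_AGG(c.name, ', ') WITHIN GROUP (ORDER BY ic.key_ordinal) AS key_columns
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.YourTable')
  AND ic.is_included_column = 0
GROUP BY i.name, i.type_desc
ORDER BY key_columns;
```

Indexes that show the same key_columns value are the candidates to review.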
Maybe I'm not thinking hard enough, but I can't see any reason to do this; the nature of the clustered index is that the data is organized in the order of the index. It seems that the extra index is a complete waste.
Digging through BOL and watching this question, though ...
There seems no sensible reason for doing this, and there is a performance hit.
The only reason I could think of for doing this is to create an index with an incredibly narrow row width, so that the number of rows per page is very high, making it very quick to scan / seek. But since it contains no other fields (except the clustered key, which is the same value), I still cannot see a reason for it.
It's quite possible the original creator was not aware that the PK was defaulting to a clustered index and created an NC index without realising it was a duplicate.
I presume that SQL Server automatically created the clustered index when the primary key constraint was specified (this happens if the table does not already have a clustered index), and then someone created a nonclustered index on the primary key column.
Such a scenario would:
Have some adverse effect on performance, as indexes are updated when inserts/deletes/updates happen.
Use additional disk space.
Possibly lead to deadlocks.
Add to the time needed to back up and restore the database.
cheers
It would be a waste to create a clustered primary key unless you have queries that search for records using WHERE ID = 10.
You may want to create the clustered index on a column that is frequently queried, e.g. WHERE City = 'Sydney'. Clustered means that SQL Server will group the data in the table based on the clustered index. Grouping the City values in the table means SQL Server can search for the data quicker.
Storing two indexes over the same data is a waste of disk space and the processing needed to maintain the data.
However, I can imagine a product that depends on the existence of an index named IX_PrimaryKey, e.g.:
string queryPattern = "select * from {0} as t with (index(IX_PrimaryKey))";
You can make the argument that the clustered index itself occupies much less space than the others, since the leaf is the actual data. On the other hand, the clustered index can be more susceptible to page splitting, and some indexes are better non-clustered.
Putting this together, I can definitely think of scenarios where removing the duplicate indexes would be a Bad Thing:
Code like above which depends on a known index name.
Code which can alter the clustered index to any of the non-clustered indexes.
Code which uses the presence/absence of IX_PrimaryKey to treat the table in a certain way.
I don't consider any of these good design, but I can definitely imagine someone doing it. (Have you posted this to DailyWTF?)
There are cases where it makes sense to have overlapping indexes which are not identical:
create index IX_1 on table1 (ID)
create index IX_2 on table1 (ID, TYPE, ORDER_DATE, TOTAL_CHARGES)
If you are looking up strictly by ID, SQL can optimize and use IX_1. If you are running a query based on ID, TYPE, ORDER_DATE and summing up TOTAL_CHARGES, SQL can use IX_2 as a "covering index", satisfying all the query details from the index without ever touching the table. Generally this is something you add in the course of performance tuning, after extensive testing.
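The two kinds of query described above might look like this. It is only a sketch: the literal values and the column data types are assumptions, and table1 is the illustrative table from the index definitions above.

```sql
-- Lookup strictly by ID: the narrow IX_1 is enough.
SELECT ID
FROM table1
WHERE ID = 42;

-- Every referenced column is a key of IX_2, so the index covers the query
-- and the base table does not need to be touched.
SELECT SUM(TOTAL_CHARGES) AS total
FROM table1
WHERE ID = 42
  AND [TYPE] = 'A'
  AND ORDER_DATE >= '20240101';
```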
Looking at your given example of two indexes on exactly the same field, I don't see a great fit. Perhaps SQL can use IX_ID as a "covering index" when checking for the existence of a value and bypass some blocking on IX_PrimaryKey?