SQL STORED PROCEDURE OPTIMIZATION

I have a table with a nonclustered index on PFX, EFF_DT, TERM_DT. The execution plan shows a RID Lookup (Heap) at 99% of the cost, instead of an index scan. I want to know why an index scan is not in the execution plan, and whether a RID Lookup is a good approach.
SELECT DISTINCT
ID
,PFX
,EFF_DT
,ID1
,TERM_DT
,RULE
,EXP_CAT
,ACCT_CAT
,OPTS
,RULE_ALT
,RULE_ALT_COND
FROM TempMaster
WHERE PFX = 'I004'
ORDER BY EFF_DT DESC

I want to know why an index scan is not in the execution plan
SQL Server uses a cost-based optimizer, and it tries to choose a good plan in a reasonable amount of time. Here the index can locate the rows with PFX = 'I004', but the query also selects columns that are not in the index, so each matching row has to be fetched from the heap with a RID lookup.
Is a RID Lookup a good approach
A RID lookup is not always a good approach, since RID lookups are random seeks and they increase I/O activity.
I would not worry if this query executes once a day. If it runs more frequently, I would avoid the RID lookup by including those columns in the nonclustered index as well.
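As a rough sketch of that last suggestion, a covering index for the query in the question might look like the following. The index name is made up for illustration, and the column list is simply taken from the SELECT in the question; whether the wider index is worth its extra storage and write cost depends on the actual table.

```sql
-- Hypothetical index name; key columns match the existing index,
-- and the remaining selected columns are added as included columns.
-- RULE is a reserved word in T-SQL, so it is bracketed here.
CREATE NONCLUSTERED INDEX IX_TempMaster_PFX_Covering
ON TempMaster (PFX, EFF_DT, TERM_DT)
INCLUDE (ID, ID1, [RULE], EXP_CAT, ACCT_CAT, OPTS, RULE_ALT, RULE_ALT_COND);
```

With every selected column present in the index, the query should be answerable from the index alone, without going back to the heap.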

Related

How to choose index scan or table scan when query in SequoiaDB?

There are the following scenarios:
Using PG to execute the query as follows:
Select count(*) from t where DATETIME >'2018-07-27 10.12.12.000000' and DATETIME < '2018-07-28 10.12.12.000000'
It returns 22 indexes and executes quickly.
When the query condition includes "=":
Select count(*) from t where DATETIME >='2018-07-27 10.12.12.000000' and DATETIME <= '2018-07-28 10.12.12.000000'
It returns 22 indexes but takes 20s.
I find that the query without "=" chooses an index scan, whereas the query with "=" partly chooses a table scan.
According to your question:
The current indexing mechanism is that the optimizer matches the first available index: the query selects the first index that was created, so the choice of index depends on the order in which the indexes were created. When an index is available, the query will take the index scan first.
Make sure that the nodes in each data group contain the index; otherwise the data nodes without the index will use a table scan.
Run analyze to optimize the query. Analyze is a new feature in SequoiaDB v3.0. It analyzes collections and index data, collects statistics, and lets the optimizer decide between an index scan and a table scan. For usage details, see: http://doc.sequoiadb.com/cn/index-cat_id-1496923440-edition_id-300
View the access plan with find.explain() to check the query cost.

Optimize the Clustered Index Scan into Clustered Index Seek

Here is the scenario: I have a table with 40 columns and I have to select all the data in the table (including all columns). I have created a clustered index on the table, and the plan shows a Clustered Index Scan when fetching the full data set from the table.
I know that without any filter or join key, SQL Server will choose a Clustered Index Scan instead of a Clustered Index Seek. But I want to optimize the execution plan by turning the Clustered Index Scan into a Clustered Index Seek. Is there any way to achieve this? Please share.
Below is the screenshot of the execution plan:
Something is not quite right in the question / request, because what you are asking for will perform badly. I suspect it comes from misunderstanding what a clustered index is.
The clustered index - which is perhaps better described as a clustered table - is the table of data. It's not separate from the table, it is the table. If the data in the table is already ordered by ITEM ID, then the scan is the most efficient access method for your query (especially given the select *) - you do not want a seek in this scenario at all - and I don't believe that is your scenario, given the sort operator in the plan.
If the clustered table is ordered by another field, then you would need an additional non-clustered index to provide the correct order. You would then try to force a plan consisting of a non-clustered index scan with a nested loop join to a clustered index seek. That can be achieved using query hints: most likely an INNER LOOP JOIN would cause the seek, but a FORCESEEK hint also exists and can be used.
Performance-wise this second option is never going to win - you are in effect looking at the tipping point notion (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/)
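For illustration only, the forced plan described above could be sketched like this. The table dbo.Items, its clustered key Id, the column ItemId and the index IX_Items_ItemId are all hypothetical names, since the question does not show the schema. The table is joined to itself so that the non-clustered index supplies the order and the clustered index is reached through a loop join.

```sql
-- Hypothetical schema: dbo.Items clustered on Id,
-- with a non-clustered index IX_Items_ItemId on ItemId.
SELECT c.*
FROM dbo.Items AS k WITH (INDEX (IX_Items_ItemId))  -- outer side: non-clustered index
INNER LOOP JOIN dbo.Items AS c                      -- join hint forces nested loops
    ON c.Id = k.Id                                  -- inner side can seek on the clustering key
ORDER BY k.ItemId;
```

Comparing this against the plain scan with SET STATISTICS IO ON would be one way to confirm that the seek-per-row plan reads far more pages when the whole table is returned.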
Well, I was trying to achieve the same thing: I wanted an index seek instead of an index scan on my TOP query.
SELECT TOP 5 id FROM mytable
Here is the execution plan shown for the query:
I even tried the OFFSET ... FETCH NEXT approach; the plan was the same.
To avoid an index scan, I included a fake primary key filter like below:
SELECT TOP 5 id FROM mytable where id != 0
I know I won't have a 0 value in my primary key, so I added the filter to the TOP query, which resulted in an index seek instead of an index scan:
Even so, the query plan comparison shows a similar operation cost for the index seek and the index scan here. And I think getting an index seek this way adds an extra operation for the database, because it has to check whether each id is 0 or not, which is entirely unnecessary if we only want the top few records.
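One way to check whether the seek actually does less work than the scan is to compare the logical reads of the two statements, rather than the estimated cost percentages. A minimal sketch, using the same table and column names as above:

```sql
SET STATISTICS IO ON;

SELECT TOP (5) id FROM mytable;               -- index scan that stops after 5 rows
SELECT TOP (5) id FROM mytable WHERE id != 0; -- index seek with the fake filter

SET STATISTICS IO OFF;
```

The logical reads for each statement appear in the Messages tab. If the numbers are about the same, the seek is only a cosmetic change to the plan.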

On what basis indexes are selected

I have a question: I have a clustered index on orderid, productid and a non-clustered index on productid. When I use the following query, it uses a nonclustered index seek on productid, which I expected:
select orderid, productid
from Sales.OrderDetails
where productid =1
order by productid
However, without changing the search arguments, I added the qty column:
select orderid, productid, qty
from Sales.OrderDetails
where productid =1
order by productid
Now it uses a clustered index scan, and when I force it to use the nonclustered index (productid), the performance drops.
Every nonclustered index will contain the clustering key in its leaf level nodes. So your nonclustered index on productid really contains productid and orderid in its leaf level nodes.
So your first query can be satisfied by just looking up the value in the nonclustered index - the leaf level node found will contain both columns that your SELECT requires.
This is NOT the case when you add another column, like qty - now, once a row is found, a key lookup into the actual data page is necessary to get all the columns your SELECT query returns. So a clustered index scan may now perform better than a nonclustered index seek plus a key lookup for every matching row.
I'm pretty sure the second query would use the nonclustered index again once you include the qty column in your index:
CREATE NONCLUSTERED INDEX ix_productId
ON Sales.OrderDetails(productId)
INCLUDE (qty)
because again: now once the leaf-level page in the non-clustered index is found, all necessary columns are present on that page and can be returned to your second query.
It makes sense: once you added the qty column, using the clustered index is ideal, because the non-indexed (data) columns are stored together with the clustered index keys.
Index: "The right index will help the optimizer find the best execution plan."
Selectivity: Density of an index = 1 divided by the number of distinct values. The lower the density, the higher the selectivity, and the higher the chance that the optimizer chooses the index. Every index also has a statistics object. With the following DBCC command, we can find the density of a statistics object, and thereby of the index.
DBCC SHOW_STATISTICS('table name','statistics name')
It also displays the density of the column when combined with the clustered index columns. Low selectivity will make the optimizer ignore the index. A query may still perform poorly even after an index is created on the search column; the reason could be low selectivity.
Eg:
SELECT sod.OrderQty ,
sod.SalesOrderID ,
sod.SalesOrderDetailID ,
sod.LineTotal
FROM Sales.SalesOrderDetail sod
WHERE sod.OrderQty = 10;
When a WHERE condition is involved, the chances of a seek are high. But for the above query, the execution plan shows a clustered index scan. Even after creating a nonclustered index on OrderQty, the optimizer still ignores the index. The reason is that the density of OrderQty is 1/41 distinct values = ~0.024, so on average each value matches about 2.4% of the rows. With such low selectivity, the optimizer finds the index seek plus lookups comparable in cost to the index scan itself.
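To see these numbers for yourself, the density and the fraction of rows matched by the predicate can be computed directly from the table. This is only a sketch against the same Sales.SalesOrderDetail table used above; the results depend on the data in your copy of the database.

```sql
-- Density of OrderQty: 1 / number of distinct values.
SELECT COUNT(DISTINCT sod.OrderQty)       AS distinct_values,
       1.0 / COUNT(DISTINCT sod.OrderQty) AS density
FROM Sales.SalesOrderDetail AS sod;

-- Fraction of the table that the predicate OrderQty = 10 actually matches.
SELECT SUM(CASE WHEN sod.OrderQty = 10 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS fraction_matched
FROM Sales.SalesOrderDetail AS sod;
```

The density is only an average over all values. The statistics histogram is what tells the optimizer how many rows a specific value such as 10 is expected to return.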
Outdated statistics can also make the optimizer ignore the index.
There are different things that a DBA can do to help the optimizer find the best execution plan. One of them is query hints. Introducing query hints could help, but could also hurt performance. An index-related hint is WITH (INDEX()).
We can tell the optimizer to use a specific index.
Eg:
SELECT * FROM tableName WITH (INDEX (0)) -- by index ID; 0 forces a scan of the heap or clustered index
SELECT * FROM tableName WITH (INDEX (indexName)) -- by index name
These things may help you.

Getting RID Lookup instead of Table Scan?

SQL Fiddle: http://sqlfiddle.com/#!3/23cf8
In this query, when I have an In clause on an Id, and then also select other columns, the In is evaluated first, and then the Details column and other columns are pulled in via a RID Lookup:
--In production and in SQL Fiddle, Details is grabbed via a RID Lookup after the In clause is evaluated
SELECT [Id]
,[ForeignId]
,Details
--Generate a numbering(starting at 1)
--,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where foreignId In (1,2,3,5)
With this query, the Details are being pulled in via a Table Scan.
With NumberedContacts AS
(
SELECT [Id]
,[ForeignId]
--Generate a numbering(starting at 1)
,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where ForeignId In (1,2,3,5)
)
Select nc.[Id]
,nc.[ForeignId]
,sc.[Details]
From NumberedContacts nc
Inner Join SupportContacts sc on nc.Id = sc.Id
Where nc.ContactNumber <= 2 --Only grab the last 2 contacts per ForeignId
;
In SQL Fiddle, the second query actually gets a RID Lookup, whereas in production with a million records it produces a Table Scan (the IN clause eliminates 99% of the rows).
Otherwise the query plan shown in SQL Fiddle is identical; the only difference is that the RID Lookup in SQL Fiddle for the second query is a Table Scan in production :(
I would like to understand what could cause this behavior. What kinds of things would you look at to determine why it is using a table scan here?
How can I influence it to use a RID Lookup there?
From looking at operation costs in the actual execution plan, I believe I can get the second query very close in performance to the first query if I can get it to use a RID Lookup. If I don't select the Details column, then the performance of both queries is very close in production. It is only after adding other columns like Details that performance degrades significantly for the second query. When I put it in SQL Fiddle and saw that the execution plan used a RID Lookup, I was surprised but slightly confused...
It doesn't have a clustered index because in testing with different clustered indexes, there was slightly worse performance for this and other queries. That was before I began adding other columns like Details though, and I can experiment with that more, but I would like to understand what is going on now before I start shooting in the dark with random indexes.
What if you changed your main index to include the Details column?
If you use:
CREATE NONCLUSTERED INDEX [IX_SupportContacts_ForeignIdAsc_IdDesc]
ON SupportContacts ([ForeignId] ASC, [Id] DESC)
INCLUDE (Details);
then neither a RID lookup nor a table scan would be needed, since your query could be satisfied from just the index itself.
The differences in the query plans will be dependent on the types of indexes that exist and the statistics of the data for those tables in the different environments.
The optimiser uses the statistics (histograms of data frequency, mostly) and the available indexes to decide which execution plan is going to be the quickest.
So, for example, you have noticed that the performance degrades when the 'Details' column is included. This is an almost sure sign that either the 'Details' column is not part of an index, or if it is part of an index, the data in that column is mostly unique such that the index accesses would be equivalent (or almost equivalent) to a table scan.
Often when this situation arises, the optimiser will choose a table scan over the index access, as it can take advantage of things like block reads to access the table records faster than perhaps a fragmented read of an index.
To influence the path that will be chosen by the optimiser, you would need to look at possible indexes that could be added or modified to make an index access more efficient, but this should be done with care, as it can adversely affect other queries as well as possibly degrading insert performance.
The other important thing you can do to help the optimiser is to make sure the table statistics are kept up to date and refreshed at a frequency that is appropriate to the rate of change of the frequency distribution in the table data.
If it's true that 99% of the rows would be omitted if it performed the query using the relevant index + RID then the likeliest problem in your production environment is that your statistics are out of date and the optimiser doesn't realise that ForeignID in (1,2,3,5) would limit the result set to 1% of the total data.
Here's a good link for discovering more about statistics from Pinal Dave: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
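As a quick sketch of what that check could look like for this table (the dbo schema is an assumption; adjust it to wherever SupportContacts lives):

```sql
-- When was each statistics object on the table last updated?
SELECT s.name                              AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.SupportContacts');

-- Refresh all statistics on the table by reading every row.
UPDATE STATISTICS dbo.SupportContacts WITH FULLSCAN;
```

FULLSCAN can be slow on a large table. Leaving it off lets SQL Server sample the data instead, which is often good enough.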
As for forcing the optimiser to follow the correct path WITHOUT updating the statistics, you could use a table hint - if you know the index that your plan should be using which contains the ID and ForeignID columns then stick that in your query as a hint and force SQL optimiser to use the index:
http://msdn.microsoft.com/en-us/library/ms187373.aspx
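A minimal sketch of such a hint, assuming the index is named IX_SupportContacts_ForeignIdAsc_IdDesc as in the earlier answer (substitute the real name of the index in your production database):

```sql
-- Index name is an assumption; it must exist on the table or the query fails.
SELECT sc.Id, sc.ForeignId, sc.Details
FROM SupportContacts AS sc WITH (INDEX (IX_SupportContacts_ForeignIdAsc_IdDesc))
WHERE sc.ForeignId IN (1, 2, 3, 5);
```

The hint stays in effect even when the data distribution changes later, so it is worth re-testing periodically.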
FYI, if you want the best performance from your second query, use this index and avoid the headache you're experiencing altogether:
create index ix1 on SupportContacts(ForeignID, Id DESC) include (Details);

Sql Server execution plan, cost of non-clustered index scan

Is it a bad sign that a non-clustered index scan accounts for 53% of the cost?
That depends on your query. The total query always costs 100%. So if you have a query like
SELECT Name from Customers WHERE ID = 3
then the index scan or seek may even cost 100%. That doesn't mean it's a bad thing. If you want a clear answer about your query, then you should at least post the query itself.
SQL Server will not use a non-clustered index with a key/bookmark lookup if it expects the iterator to return more than a few percent of the total rows in the table.