Eliminate Clustered Index Scan in query - sql

I have this query involving two joined tables.
Select q.id,
q.LastUpdatedTicksSinceEpoch,
q.[Type] [QuoteType],
q.LatestFormData [FormDataJson],
q.QuoteNumber,
q.QuoteState,
q.policyId,
p.CustomerId,
p.CustomerFullName,
p.CustomerAlternativeEmail,
p.CustomerHomePhone,
p.CustomerMobilePhone,
p.CustomerWorkPhone,
p.CustomerPreferredName,
p.ProductId
from Quotes q
INNER JOIN PolicyReadModels p on q.PolicyId=p.id
where
p.TenantId = @TenantId
and p.Environment = @Environment
and q.LastUpdatedTicksSinceEpoch > @LastUpdatedTicksSinceEpoch
and q.QuoteState <> 'Nascent'
and q.QuoteNumber is not null
and q.IsDiscarded = 0
ORDER BY q.LastUpdatedTicksSinceEpoch
When I run it and get the execution plan, I see a Clustered Index Scan, which I want to eliminate in favour of an Index Seek.
How can I eliminate the clustered index scan here? How should I construct a new index? For more context, this query will be called by 12 parallel threads (12 connections), so I need it fast and optimized.
Here is my Query and execution plan:
https://www.brentozar.com/pastetheplan/?id=HkWjdvKDD

Create an index on:
create index someIndex on
Quotes (LastUpdatedTicksSinceEpoch)
include ([Type], LatestFormData, QuoteNumber, QuoteState, PolicyId, IsDiscarded)
where QuoteNumber is not null and IsDiscarded = 0

This:
Filters by your two fixed predicates (QuoteNumber IS NOT NULL and IsDiscarded = 0).
Sorts the data by LastUpdatedTicksSinceEpoch (you compare a variable against it in your WHERE clause, and it appears to be the column you want to sort on).
Includes all other relevant fields, so the query doesn't need to go back to the clustered index (i.e., it becomes a covering index).
Note that Quotes.Id is not included, as it is your PK (from what I can tell in the execution plan) and is therefore implicitly included in the index. You can include it anyway - no harm done at all (index size and performance will be exactly the same).
CREATE INDEX IX_q ON Quotes
(
LastUpdatedTicksSinceEpoch
)
INCLUDE
(
PolicyId,
[Type],
QuoteState,
LatestFormData,
QuoteNumber,
IsDiscarded
)
WHERE
(
QuoteNumber is not null
AND IsDiscarded = 0
)
Note: I think this is probably the fastest you can get, but be careful: more indexes on a table mean slower inserts, updates and deletes.
Often I find that a much shorter index which can be used in many places is preferable to a very specific index like this (remember, if some fields like LatestFormData are large, this index can get bloated and be less useful to other queries).
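For example, a slimmer, more general-purpose alternative (a sketch only: it drops the large LatestFormData column and the INCLUDE list entirely, so each matching row costs a key lookup, but the index stays small and reusable by other queries that filter or sort on the same column):
-- Hypothetical slimmer index: narrow key, no large INCLUDE columns.
-- Matching rows pay a key lookup into the clustered index instead.
CREATE INDEX IX_Quotes_LastUpdatedTicksSinceEpoch
ON Quotes (LastUpdatedTicksSinceEpoch);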

Related

SQL SERVER generates different and unoptimized execution plan for two similar queries

The following two queries are run against SQL Server:
SELECT TOP(10) [c].[Name] AS [I0], [c].[Surname] AS [I1],(
SELECT MAX([r].[Date])
FROM [Presences].[Regs] AS [r]
WHERE ([c].[Id] = [r].[ColId]) AND [r].[ColId] IS NOT NULL) AS [I7], [c].[Id] AS [I10]
FROM [Presences].[Cols] AS [c]
SELECT TOP(10) [c].[Code] AS [I0], [c].[Description] AS [I1], (
SELECT MAX([r].[Date])
FROM [Presences].[Regs] AS [r]
WHERE ([c].[Id] = [r].[CantId]) AND [r].[CantId] IS NOT NULL) AS [I9], [c].[Id] AS [I10]
FROM [Presences].[Cants] AS [c]
The second query is evaluated in less than 1 second but the first one takes more than 20 seconds.
In fact, the generated execution plan for the first one is very different:
Bad execution plan of the first query
if compared with the second one:
Good execution plan of the second query
It is unclear to me why an index seek is not chosen.
Those are the indexes declared on the table [Regs]:
CREATE NONCLUSTERED INDEX [IX_Regs_Date] ON [Presences].[Regs]
(
[Date] ASC
)
CREATE NONCLUSTERED INDEX [IX_Regs_ColId] ON [Presences].[Regs]
(
[ColId] ASC
)
CREATE NONCLUSTERED INDEX [IX_Regs_CantId] ON [Presences].[Regs]
(
[CantId] ASC
)
Table [Cols] has around 600 rows and table [Cants] 21000.
The interesting fact is that the following query (using FORCESEEK) generates the correct execution plan:
SELECT TOP(10) [c].[Name] AS [I0], [c].[Surname] AS [I1],(
SELECT MAX([r].[Date])
FROM [Presences].[Regs] AS [r]
WITH (FORCESEEK)
WHERE ([c].[Id] = [r].[ColId]) AND [r].[ColId] IS NOT NULL) AS [I8]
FROM [Presences].[Cols] AS [c]
ORDER BY [c].[Name], [c].[Surname]
but I can't specify this hint because the queries are generated using an ORM.
If you need further information, I will be glad to provide it.
Your problem query is
SELECT TOP(10) c.Name AS I0,
c.Surname AS I1,
(SELECT MAX(r.Date)
FROM Presences.Regs AS r
WHERE ( c.Id = r.ColId )) AS I7,
c.Id AS I10
FROM Presences.Cols AS c
ORDER BY c.Name, c.Surname
In both the good and the bad plan, the top of the plan is the same.
It scans the Cols table, sorts by Name, Surname and then proceeds to calculate the MAX(Date) in Presences.Regs where r.ColId matches the corresponding c.Id from the outer row.
The ideal index for this would be one on ColId, Date - so it can seek into the index and just read the last one for that ColId.
You don't have this index, though, so it has two choices:
Scan IX_Regs_Date - this is in date order so it can stop the scan as soon as it finds the first row matching c.Id = r.ColId
Seek into IX_Regs_ColId. Get all rows matching the predicate and then aggregate them to find the MAX.
For option 1 it estimates that it will only have to read 283 rows per scan on average before finding the first one matching c.Id = r.ColId. In reality it reads 1,358,719 per execution (and there are 10 executions so this is a corresponding 13.58 million key lookups). There are 1,517,230 rows in the table so it looks as though for some reason all the ones matching the join are congregated towards later dates in the table.
The easiest way of fixing this would be to give it the ideal index of ColId, Date - this covers the query so removes the lookup that is present even in the "good" plan and is an obvious best choice so will remove the temptation for SQL Server to select the "bad" plan.
--Suggested replacement for IX_Regs_ColId
CREATE NONCLUSTERED INDEX [IX_Regs_ColId_Date] ON [Presences].[Regs]
(
[ColId] ASC,
[Date] ASC
)
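Since the leading column of the new index covers everything IX_Regs_ColId offered, the old index becomes redundant and can be dropped once the replacement is in place:
-- IX_Regs_ColId is fully contained in the new (ColId, Date) index.
DROP INDEX [IX_Regs_ColId] ON [Presences].[Regs];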

Query not using indexes when using table function

I have the following query:
select i.pkey as instrument_pkey,
p.asof,
p.price,
p.lastprice as lastprice,
p.settlementprice as settlement_price,
p.snaptime,
p.owner as source_id,
i.type as instrument_type
from quotes_mxsequities p,
instruments i,
(select instrument, maxbackdays
from TABLE(cast (:H_ARRAY as R_TAB))) lbd
where p.asof between :ASOF - lbd.maxbackdays and :ASOF
and p.instrument = lbd.instrument
and p.owner = :SOURCE_ID
and p.instrument = i.pkey
Since I started using the table function, the query has started doing a full table scan on quotes_mxsequities, which is a large table.
Earlier, when I used an IN clause instead of the table function, the index was being used.
Any suggestion on how to enforce index usage?
EDIT:
I will try to get the explain plan, but just to add: H_ARRAY is expected to have around 10k entries. quotes_mxsequities is a large table with millions of rows. instruments is again a large table, but has fewer rows than quotes_mxsequities.
The full table scan is happening on quotes_mxsequities, while instruments is using an index.
Quite difficult to answer with no explain plan and no information about table structure, number of rows, etc.
As a general, simplified approach, you could try to force the use of an index with the INDEX hint.
Your problem can also be due to a wrong order of table processing; you can try to make Oracle follow the right order (I suppose LBD first) with the LEADING hint.
Another point could be the full access path where you probably need a NESTED LOOP; in this case you can try the USE_NL hint.
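A sketch of how those hints might be combined (hint placement is illustrative; Oracle is free to ignore an INDEX hint if no suitable index exists, so verify the effect against the actual plan):
-- LEADING drives lbd first; USE_NL asks for a nested loop into p;
-- INDEX without an index name lets the optimizer pick any index on p.
select /*+ LEADING(lbd) USE_NL(p) INDEX(p) */
       i.pkey as instrument_pkey, p.asof, p.price, p.owner as source_id
from   (select instrument, maxbackdays
        from TABLE(cast(:H_ARRAY as R_TAB))) lbd,
       quotes_mxsequities p,
       instruments i
where  p.asof between :ASOF - lbd.maxbackdays and :ASOF
and    p.instrument = lbd.instrument
and    p.owner = :SOURCE_ID
and    p.instrument = i.pkey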
It's hard to be sure from the limited information provided, but it looks like this is an issue with the optimiser not being able to establish the cardinality of the table collection expression, since its contents aren't known at parse time. With a stored nested table the statistics would be available, but here there are none for it to use.
Without that information the optimiser defaults to guessing your table collection will have 8K entries, and uses that as the cardinality estimate; if that is a significant proportion of the number of rows in quotes_mxsequities then it will decide the index isn't going to be efficient, and will use a full table scan.
You can use the undocumented cardinality hint to tell the optimiser roughly how many elements you actually expect in the collection; you presumably won't know exactly, but you might know you usually expect around 10. So you could add a hint:
select /*+ CARDINALITY(lbd, 10) */ i.pkey as instrument_pkey,
You may also find the dynamic sampling hint useful here; but without your real data to look at, the cardinality hint is easier to demonstrate, since it applies to the basic execution plan and its effect is easy to see.
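For reference, the dynamic sampling variant would swap the hint like this, with the rest of the query unchanged (a sketch; level 2 is a common starting point, and the right level depends on your data):
select /*+ DYNAMIC_SAMPLING(lbd 2) */ i.pkey as instrument_pkey,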
Incidentally, you don't need the subquery on the table expression, you can simplify slightly to:
from TABLE(cast (:H_ARRAY as R_TAB)) lbd,
quotes_mxsequities p,
instruments i
or even better use modern join syntax:
select /*+ CARDINALITY(lbd, 10) */ i.pkey as instrument_pkey,
p.asof,
p.price,
p.lastprice as lastprice,
p.settlementprice as settlement_price,
p.snaptime,
p.owner as source_id,
i.type as instrument_type
from TABLE(cast (:H_ARRAY as R_TAB)) lbd
join quotes_mxsequities p
on p.asof between :ASOF - lbd.maxbackdays and :ASOF
and p.instrument = lbd.instrument
join instruments i
on i.pkey = p.instrument
where p.owner = :SOURCE_ID;

Join on columns with different indexes

I am using a query that joins on two columns: one has a clustered index and the other has a non-clustered index. The query is taking a long time. Is the cause that I am using different types of indexes?
SELECT @NoOfOldBills = COUNT(*)
FROM Billing_Detail D, Meter_Info m, Meter_Reading mr
WHERE D.Meter_Reading_ID = mr.id
AND m.id = mr.Meter_Info_ID
AND m.id = @Meter_Info_ID
AND BillType = 'Meter'
IF (@NoOfOldBills > 0) BEGIN
SELECT TOP 1 @PReadingDate = Bill_Date
FROM Billing_Detail D, Meter_Info m, Meter_Reading mr
WHERE D.Meter_Reading_ID = mr.id
AND m.id = mr.Meter_Info_ID
AND m.id = @Meter_Info_ID
AND billtype = 'Meter'
ORDER BY Bill_Date DESC
END
Without knowing more details and the context, it's tricky to advise - but it looks like you are trying to count the old bills and find the most recent bill date. You could probably rewrite those two queries as one, which would improve the performance significantly (assuming that there are some old bills).
I would suggest something like this - which in addition to probably performing better, is a little easier to read!
SELECT COUNT(*) AS NoOfOldBills, MAX(d.Bill_Date) AS LatestBillDate FROM Billing_Detail d
INNER JOIN Meter_Reading mr ON mr.id=d.Meter_Reading_ID
INNER JOIN Meter_Info m ON m.Id=mr.Meter_Info_ID
WHERE
m.id = @Meter_Info_ID AND billtype = 'Meter'
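If you need the results back in the original variables, the same statement can assign both in one pass (a sketch; the variable declarations are assumed to exist in the surrounding procedure):
-- One round trip replacing both original statements.
SELECT @NoOfOldBills = COUNT(*),
       @PReadingDate = MAX(d.Bill_Date)
FROM Billing_Detail d
INNER JOIN Meter_Reading mr ON mr.id = d.Meter_Reading_ID
INNER JOIN Meter_Info m ON m.Id = mr.Meter_Info_ID
WHERE m.id = @Meter_Info_ID
AND billtype = 'Meter'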
The reason is not that you have different types of indexes. Since you have clustered indexes on all primary keys, you should be fine there. To support this query, you would also need a two-column index on BillType and Bill_Date to cut down the time.
Hopefully they are in the same table. Otherwise, you may need a few indexes before you arrive at one that covers both columns.
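A minimal sketch of that index, assuming both columns live on Billing_Detail (the name is hypothetical, and the INCLUDE of the join key is optional but lets the index cover this query):
-- Hypothetical index; adjust names to your schema.
CREATE NONCLUSTERED INDEX IX_Billing_Detail_BillType_BillDate
ON Billing_Detail (BillType, Bill_Date)
INCLUDE (Meter_Reading_ID);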
OK, there are a number of things here. First, are the indexes you have relevant (or as relevant as they can be)? There should be a clustered index on each table - non-clustered indexes use the clustered index key to identify rows, so they work less effectively on a table without one.
Does your index cover only one column? The order of the columns in an index is very important to SQL Server. The leftmost column of the index should (in general) be the most selective one (the column that divides the data into the smallest groups). Does either of the indexes cover all the columns referred to in the query? (This is known as a covering index; Google it for more info.)
In general, indexes should be wide: if SQL Server has an index on col1, col2, col3, col4 and another on col1, col2, the latter is redundant, as its information is fully contained in the first, and SQL Server understands this.
Are your statistics up to date? SQL Server can and will choose a bad execution plan if the statistics are stale. What does the actual plan show (SSMS: Query | Include Actual Execution Plan)?
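If the statistics are suspect, refreshing them is cheap to try (standard T-SQL, using the table names from the question):
-- Refresh statistics on the tables involved in the query.
UPDATE STATISTICS Billing_Detail;
UPDATE STATISTICS Meter_Reading;
UPDATE STATISTICS Meter_Info;
-- Or refresh every table in the database:
EXEC sp_updatestats;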

How do I go about optimizing an Oracle query?

I was given a SQL query and told that I have to optimize it.
I came across EXPLAIN PLAN. So, in SQL Developer, I ran explain plan for the query.
It divided the query into different parts and showed the cost for each of them.
How do I go about optimizing the query? What do I look for? Elements with high costs?
I am a bit new to DB, so if you need more information, please ask me, and I will try to get it.
I am trying to understand the process rather than just posting the query itself and getting the answer.
The query in question:
SELECT cr.client_app_id,
cr.personal_flg,
r.requestor_type_id
FROM credit_request cr,
requestor r,
evaluator e
WHERE cr.evaluator_id = 96 AND
cr.request_id = r.request_id AND
cr.evaluator_id = e.evaluator_id AND
cr.request_id != 143462 AND
((r.soc_sec_num_txt = 'xxxxxxxxx' AND
r.soc_sec_num_txt IS NOT NULL) OR
(lower(r.first_name_txt) = 'test' AND
lower(r.last_name_txt) = 'newprogram' AND
to_char(r.birth_dt, 'MM/DD/YYYY') = '01/02/1960' AND
r.last_name_txt IS NOT NULL AND
r.first_name_txt IS NOT NULL AND
r.birth_dt IS NOT NULL))
On running explain plan, I got the following output (transcribed from the screenshot):
OPERATION OBJECT_NAME OPTIONS COST
SELECT STATEMENT 15
NESTED LOOPS
NESTED LOOPS 15
HASH JOIN 12
Access Predicates
CR.EVALUATOR_ID=E.EVALUATOR_ID
INDEX EVALUATOR_PK UNIQUE SCAN 0
Access Predicates
E.EVALUATOR_ID=96
TABLE ACCESS CREDIT_REQUEST BY INDEX ROWID 11
INDEX CRDRQ_DONE_EVAL_TASK_REQ_NDX SKIP SCAN 10
Access Predicates
CR.EVALUATOR_ID=96
Filter Predicates
AND
CR.EVALUATOR_ID=96
CR.REQUEST_ID<>143462
INDEX REQUESTOR_PK RANGE SCAN 1
Access Predicates
CR.REQUEST_ID=R.REQUEST_ID
Filter Predicates
R.REQUEST_ID<>143462
TABLE ACCESS REQUESTOR BY INDEX ROWID 3
Filter Predicates
OR
R.SOC_SEC_NUM_TXT='XXXXXXXX'
AND
R.BIRTH_DT IS NOT NULL
R.LAST_NAME_TXT IS NOT NULL
R.FIRST_NAME_TXT IS NOT NULL
LOWER(R.FIRST_NAME_TXT)='test'
LOWER(R.LAST_NAME_TXT)='newprogram'
TO_CHAR(INTERNAL_FUNCTION(R.BIRTH_DT),'MM/DD/YYYY')='01/02/1960'
As a quick update to your query, you're going to want to refactor it to something like this:
SELECT
cr.client_app_id,
cr.personal_flg,
r.requestor_type_id
FROM
credit_request cr
inner join requestor r on
cr.request_id = r.request_id
inner join evaluator e on
cr.evaluator_id = e.evaluator_id
WHERE
cr.evaluator_id = 96
and cr.request_id != 143462
and (r.soc_sec_num_txt = 'xxxxxxxxx'
or (
lower(r.first_name_txt) = 'test'
and lower(r.last_name_txt) = 'newprogram'
and r.birth_dt = date '1960-01-02'
)
)
Firstly, joining by commas creates a cross join, which you want to avoid. Luckily, Oracle's smart enough to do it as an inner join since you specified join conditions, but you want to be explicit so you don't accidentally miss something.
Secondly, your IS NOT NULL checks are pointless: if a column is null, any = check against it will return false for that row. In fact, any comparison with a null column, even null = null, evaluates to unknown, which behaves as false in a WHERE clause. You can try this with select 1 from dual where null = null and select 1 from dual where null is null. Only the second one returns a row.
Thirdly, Oracle's smart enough to compare dates with the ISO format (at least the last time I used it, it was). You can just do r.birth_dt = date '1960-01-02' and avoid doing a string format on that column.
That being said, your query isn't exactly poorly written in terms of egregious performance mistakes. What you want to look for are indexes. Does evaluator have one on evaluator_id? Does credit_request? What types are they? Typically, evaluator will have one on the PK evaluator_id, and credit_request will have one for that column as well. The same goes for requestor and the request_id columns.
Other indices you may want to consider are all the fields you're using to filter. In this case, soc_sec_num_txt, first_name_txt, last_name_txt, birth_dt. Consider putting a multi-column index on the latter three, and a single column index on the soc_sec_num_txt column.
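A sketch of those filtering indexes (names are hypothetical; note that because the query compares lower(first_name_txt) and lower(last_name_txt), the multi-column index needs to be function-based for those predicates to use it):
-- Hypothetical names; adjust to your conventions.
CREATE INDEX idx_requestor_ssn ON requestor (soc_sec_num_txt);
-- Function-based index so the lower(...) comparisons can seek into it.
CREATE INDEX idx_requestor_name_dob
ON requestor (lower(first_name_txt), lower(last_name_txt), birth_dt);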
After refactoring the query comes indexes, so following on from #eric's post:
credit_request:
You're joining this onto requestor on request_id, which I hope is unique. In your where clause you then have a condition on evaluator_id, and you select client_app_id and personal_flg in the query. So you probably need a unique index on credit_request of (request_id, evaluator_id, client_app_id, personal_flg); see the sketch below.
By putting the columns you're selecting into the index, you avoid the TABLE ACCESS BY INDEX ROWID step, which means the database has picked up your values from the index and then re-entered the table for more information. If that information is already in the index, there's no need to go back.
You're joining it onto evaluator on evaluator_id, which is included in the first index.
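In DDL form, that suggestion is roughly (a sketch; the name is hypothetical, and UNIQUE is only valid if the key columns really are unique):
CREATE UNIQUE INDEX crdrq_req_eval_cover
ON credit_request (request_id, evaluator_id, client_app_id, personal_flg);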
requestor:
This is being joined onto on request_id, and your where clause includes soc_sec_num_txt, lower(first_name_txt), lower(last_name_txt) and birth_dt. So you need an index, unique if possible, on (request_id, soc_sec_num_txt). Because of the OR, this is further complicated: you should really have an index on as many of the conditions as possible. You're also selecting requestor_type_id.
In this case, to avoid a functional index with many columns, I'd index on (request_id, soc_sec_num_txt, birth_dt); see the sketch below. If you have the space, time and inclination, then adding lower(first_name_txt) etc. may improve the speed, depending on how selective those columns are. That is, if there are far more distinct values in, for instance, first_name_txt than in birth_dt, you'd be better off putting it in front of birth_dt in the index, so your query has less to scan if it's a non-unique index.
Notice that I haven't added the selected column to this index: you're already going to have to go into the table, so you gain nothing by adding it.
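A sketch of that index (hypothetical name, column order as reasoned above):
CREATE INDEX rqst_req_ssn_dob
ON requestor (request_id, soc_sec_num_txt, birth_dt);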
evaluator:
This is only being joined on evaluator_id, so you need an index, unique if possible, on this column.

Oracle : Indexes not being used

I have a query which is not using my indexes. Can someone say why?
explain plan set statement_id = 'bad8' for
select
g1.g1id,a.a1id from atable a,
(
select
phone,address,g1id from gtable g
where
g.active = 0 and
(g.name is not null) AND
(SYSDATE - g.CTIME <= 2*365)
) g1
where
(
(a.phone.ph1 = g1.phone.ph1 and
a.phone.ph2 = g1.phone.ph2 and
a.phone.ph3 = g1.phone.ph3
)
OR
(a.address.ad1 = g1.address.ad1 and a.address.ad2 = g1.address.ad2)
)
In both tables, atable and gtable, I have these indexes:
1. On phone.ph1, phone.ph2, phone.ph3
2. On address.ad1, address.ad2
phone and address are of custom data types.
Using Oracle 11g.
Here is the explain plan query and output :
SELECT cardinality "Rows",
lpad(' ',level-1)||operation||' '||
options||' '||object_name "Plan"
FROM PLAN_TABLE
CONNECT BY prior id = parent_id
AND prior statement_id = statement_id
START WITH id = 0
AND statement_id = 'bad8'
ORDER BY id;
Result:
Rows       Plan
490191190  SELECT STATEMENT
null       CONCATENATION
490190502  HASH JOIN
511841     TABLE ACCESS FULL gtable
41332965   PARTITION LIST ALL
41332965   TABLE ACCESS FULL atable
688        HASH JOIN
376893     TABLE ACCESS FULL gtable
41332965   PARTITION LIST ALL
41332965   TABLE ACCESS FULL atable
Both atable,gtable have more than 10 million rows each.
Most values in columns phone and address don't repeat.
Which indexes Oracle chooses depends on many factors, including things you haven't mentioned in your question, such as the number of rows in the tables, the frequency of values within a column, and whether you have separate or combined indexes when more than one column is indexed.
Having said that, I suppose the main reasons your indexes aren't used are:
You don't join directly with gtable. Instead, you join with an inline view that has three additional WHERE clauses that aren't part of the index, which makes the index less effective in this situation.
The JOIN condition includes an OR, which makes it difficult to use indexes.
Update:
If Oracle used your indexes to do the join - which is already very difficult due to the OR condition - it would end up with a huge number of ROWIDs. For each ROWID, it would then have to fetch the full row. Since a full table scan can easily be up to 50 times faster than fetching row by row via ROWID (I don't know the exact factor Oracle assumes), it will only use the indexes if it reliably knows that the join will reduce the number of rows to fetch by a corresponding factor.
In your case, the remaining WHERE conditions (g.active = 0, g.name is not null, SYSDATE - g.CTIME <= 2*365) aren't represented in the indexes, so they have to be applied after the join, once the gtable rows have been fetched. This makes it even harder for an index-driven plan to come out ahead of the full table scans.
So I'm pretty sure the Oracle cost estimate is correct, i.e. using the indexes would result in a more expensive query and even longer execution time.
We can say "your query does not use your indexes because does not need them". A hash join is better. To use your indexes, oracle need to full scan them(4 indexes), make two joins, make a rowid or, and after that read from tables probably many blocks. If he belives that the result has many rows, the CBO coose the full scans, because is faster.
There are no conditions that reduce the number of rows taken from tables. There is no range scan. It must do full scans.
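If you did want to give the optimizer a chance to use the phone and address indexes, the usual rewrite is to split the OR into a UNION, so each branch has a single indexable join condition. A sketch (whether it actually beats the hash joins depends on how selective each branch is, and note that UNION also de-duplicates result rows that the OR form would return once per matching pair):
select g1.g1id, a.a1id
from atable a, gtable g1
where g1.active = 0
and g1.name is not null
and (SYSDATE - g1.CTIME <= 2*365)
and a.phone.ph1 = g1.phone.ph1
and a.phone.ph2 = g1.phone.ph2
and a.phone.ph3 = g1.phone.ph3
union
select g1.g1id, a.a1id
from atable a, gtable g1
where g1.active = 0
and g1.name is not null
and (SYSDATE - g1.CTIME <= 2*365)
and a.address.ad1 = g1.address.ad1
and a.address.ad2 = g1.address.ad2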