How do I go about optimizing an Oracle query?

I was given a SQL query and told that I have to optimize it.
I came across explain plan. So, in SQL Developer, I ran explain plan for the query.
It divided the query into different parts and showed the cost for each of them.
How do I go about optimizing the query? What do I look for? Elements with high costs?
I am a bit new to DB, so if you need more information, please ask me, and I will try to get it.
I am trying to understand the process rather than just posting the query itself and getting the answer.
The query in question:
SELECT cr.client_app_id,
cr.personal_flg,
r.requestor_type_id
FROM credit_request cr,
requestor r,
evaluator e
WHERE cr.evaluator_id = 96 AND
cr.request_id = r.request_id AND
cr.evaluator_id = e.evaluator_id AND
cr.request_id != 143462 AND
((r.soc_sec_num_txt = 'xxxxxxxxx' AND
r.soc_sec_num_txt IS NOT NULL) OR
(lower(r.first_name_txt) = 'test' AND
lower(r.last_name_txt) = 'newprogram' AND
to_char(r.birth_dt, 'MM/DD/YYYY') = '01/02/1960' AND
r.last_name_txt IS NOT NULL AND
r.first_name_txt IS NOT NULL AND
r.birth_dt IS NOT NULL))
On running explain plan, here is the output (I tried to upload a screenshot; the text version is below).
OPERATION OBJECT_NAME OPTIONS COST
SELECT STATEMENT 15
NESTED LOOPS
NESTED LOOPS 15
HASH JOIN 12
Access Predicates
CR.EVALUATOR_ID=E.EVALUATOR_ID
INDEX EVALUATOR_PK UNIQUE SCAN 0
Access Predicates
E.EVALUATOR_ID=96
TABLE ACCESS CREDIT_REQUEST BY INDEX ROWID 11
INDEX CRDRQ_DONE_EVAL_TASK_REQ_NDX SKIP SCAN 10
Access Predicates
CR.EVALUATOR_ID=96
Filter Predicates
AND
CR.EVALUATOR_ID=96
CR.REQUEST_ID<>143462
INDEX REQUESTOR_PK RANGE SCAN 1
Access Predicates
CR.REQUEST_ID=R.REQUEST_ID
Filter Predicates
R.REQUEST_ID<>143462
TABLE ACCESS REQUESTOR BY INDEX ROWID 3
Filter Predicates
OR
R.SOC_SEC_NUM_TXT='XXXXXXXX'
AND
R.BIRTH_DT IS NOT NULL
R.LAST_NAME_TXT IS NOT NULL
R.FIRST_NAME_TXT IS NOT NULL
LOWER(R.FIRST_NAME_TXT)='test'
LOWER(R.LAST_NAME_TXT)='newprogram'
TO_CHAR(INTERNAL_FUNCTION(R.BIRTH_DT),'MM/DD/YYYY')='01/02/1960'

As a quick update to your query, you're going to want to refactor it to something like this:
SELECT
cr.client_app_id,
cr.personal_flg,
r.requestor_type_id
FROM
credit_request cr
inner join requestor r on
cr.request_id = r.request_id
inner join evaluator e on
cr.evaluator_id = e.evaluator_id
WHERE
cr.evaluator_id = 96
and cr.request_id != 143462
and (r.soc_sec_num_txt = 'xxxxxxxxx'
or (
lower(r.first_name_txt) = 'test'
and lower(r.last_name_txt) = 'newprogram'
and r.birth_dt = date '1960-01-02'
)
)
Firstly, joining by commas creates a cross join, which you want to avoid. Luckily, Oracle's smart enough to do it as an inner join since you specified join conditions, but you want to be explicit so you don't accidentally miss something.
Secondly, your is not null checks are pointless--if a column is null, any = check you do will not be true for that row. In fact, any comparison with a null, even null = null, evaluates to unknown rather than true. You can see this with the two queries below; only the second one returns a row.
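For example, against Oracle's DUAL table (a minimal demonstration, nothing here depends on your schema):

select 1 from dual where null = null;   -- no rows: the comparison is not true
select 1 from dual where null is null;  -- one row: IS NULL is the correct test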
Thirdly, Oracle's smart enough to compare dates with the ISO format (at least the last time I used it, it was). You can just do r.birth_dt = date '1960-01-02' and avoid doing a string format on that column.
That being said, your query isn't exactly poorly written in terms of egregious performance mistakes. What you want to look for are indexes. Does evaluator have one on evaluator_id? Does credit_request? What types are they? Typically, evaluator will have one on the PK evaluator_id, and credit_request will have one for that column as well. The same for requestor and the request_id columns.
Other indexes you may want to consider are on the fields you're using to filter: in this case, soc_sec_num_txt, first_name_txt, last_name_txt, and birth_dt. Consider putting a multi-column index on the latter three, and a single-column index on the soc_sec_num_txt column.
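A rough sketch of what that could look like (the index names are made up, and you should check selectivity before adding them):

create index reqstr_ssn_idx on requestor (soc_sec_num_txt);
create index reqstr_name_dob_idx on requestor (first_name_txt, last_name_txt, birth_dt);
-- note: the query compares lower(first_name_txt) and lower(last_name_txt), so for the
-- second index to be usable you may instead need a function-based index, e.g.
-- create index reqstr_name_dob_fbi on requestor (lower(first_name_txt), lower(last_name_txt), birth_dt);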

After refactoring the query comes indexes, so following on from @eric's post:
credit_request:
You're joining this onto requestor on request_id, which I hope is unique. In your where clause you then have a condition on evaluator_id, and you select client_app_id and personal_flg in the query. So, you probably need a unique index on credit_request of (request_id, evaluator_id, client_app_id, personal_flg).
By putting the columns you're selecting into the index you avoid the TABLE ACCESS BY INDEX ROWID step, which means that you have selected your values from the index and then re-entered the table to pick up more information. If this information is already in the index then there's no need.
You're joining it onto evaluator on evaluator_id, which is included in the first index.
requestor:
This is being joined onto on request_id, and your where clause includes soc_sec_num_txt, lower(first_name_txt), lower(last_name_txt) and birth_dt. So you need a unique, if possible, index on (request_id, soc_sec_num_txt). Because of the or, this is further complicated: you should really have an index on as many of the conditions as possible. You're also selecting requestor_type_id.
In this case, to avoid a functional index with many columns, I'd index on (request_id, soc_sec_num_txt, birth_dt). If you have the space, time and inclination, then adding lower(first_name_txt) etc. to this may improve the speed, depending on how selective the column is. This means that if there are far more distinct values in, for instance, first_name_txt than in birth_dt, you'd be better off putting it in front of birth_dt in the index so your query has less to scan if it's a non-unique index.
You'll notice that I haven't added the selected column to this index: you're already going to have to go into the table, so you gain nothing by adding it.
evaluator:
This is only being joined on evaluator_id so you need a unique, if possible, index on this column.
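Putting that together, the DDL might look roughly like this (index names are invented, and whether the first one can actually be UNIQUE depends on your data):

create unique index crdrq_req_eval_app_ndx
  on credit_request (request_id, evaluator_id, client_app_id, personal_flg);

create index reqstr_req_ssn_dob_ndx
  on requestor (request_id, soc_sec_num_txt, birth_dt);

-- evaluator probably already has a unique index on evaluator_id via EVALUATOR_PK
-- (it shows up in your plan), so nothing new should be needed there.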

Related

Eliminate Clustered Index Scan in query

I have this query involving 2 tables joined.
Select q.id,
q.LastUpdatedTicksSinceEpoch,
q.[Type] [QuoteType],
q.LatestFormData [FormDataJson],
q.QuoteNumber,
q.QuoteState,
q.policyId,
p.CustomerId,
p.CustomerFullName,
p.CustomerAlternativeEmail,
p.CustomerHomePhone,
p.CustomerMobilePhone,
p.CustomerWorkPhone,
p.CustomerPreferredName,
p.ProductId
from Quotes q
INNER JOIN PolicyReadModels p on q.PolicyId=p.id
where
p.TenantId = @TenantId
and p.Environment = @Environment
and q.LastUpdatedTicksSinceEpoch > @LastUpdatedTicksSinceEpoch
and q.QuoteState <> 'Nascent'
and q.QuoteNumber is not null
and q.IsDiscarded = 0
ORDER BY q.LastUpdatedTicksSinceEpoch
When I run it and get the execution plan, I see a Clustered Index Scan, which I want to eliminate and replace with an Index Seek.
How can I eliminate the clustered index scan here? How can I construct a new INDEX? To add more context, this query will be called by 12 parallel threads (12 connections), so I need it fast and optimized.
Here is my Query and execution plan:
https://www.brentozar.com/pastetheplan/?id=HkWjdvKDD
Create an index on:
create index someIndex on
Quotes (LastUpdatedTicksSinceEpoch)
include (Id, Type, LatestFormData, QuoteNumber, QuoteState, PolicyId, IsDiscarded)
where IsDiscarded = 0 and QuoteNumber is not null
This:
Filters by your two fixed values (QuoteNumber NOT NULL and IsDiscarded = 0).
Sorts the data by LastUpdatedTicksSinceEpoch (as you are passing a variable to that in your WHERE clause, and it appears to be the most likely one you want to sort on).
Includes all other relevant fields so it doesn't need to go back to the clustered index (i.e., it becomes a covering index).
Note that quote.ID is not included below as it is your PK (from what I can tell in the execution plan) and therefore implicitly included in the index. You can include it though - no harm done at all (i.e., index size and performance will be exactly the same).
CREATE INDEX IX_q ON Quotes
(
LastUpdatedTicksSinceEpoch
)
INCLUDE
(
PolicyId,
[Type],
QuoteState,
LatestFormData,
QuoteNumber,
IsDiscarded
)
WHERE
(
QuoteNumber is not null
AND IsDiscarded = 0
)
Note: I think this is probably the fastest you can get, but you need to be careful because more indexes on a table mean slower inserts, updates and deletes.
Often I find a much shorter index that can be used in many places is preferable to a very specific index like this (remember, if some fields like LatestFormData are large, then this index can get bloated and not be as useful to other queries).

Do I need to filter all subqueries before joining very large tables for optimization

I have three tables that I'm trying to join with over a billion rows per table. Each table has an index on the billing date column. If I just filter the left table and do the joins, will the query run efficiently or do I need to put the same date filter in each subquery?
I.E. will the first query run much slower than the second query?
select item, billing_dollars, IC_billing_dollars
from billing
left join IC_billing on billing.invoice_number = IC_billing.invoice_number
where billing.date = '2019-09-24'
select item, billing_dollars, IC_billing_dollars
from billing
left join (select * from IC_billing where IC_billing.date = '2019-09-24') IC_billing on billing.invoice_number = IC_billing.invoice_number
where billing.date = '2019-09-24'
I don't want to run this without knowing whether the query will perform well as there aren't many safeguards for poor performing queries. Also, if I need to write the query in the second way, is there a way to only have the date filter in one location rather than having it show up multiple times in the query?
That depends.
Consider your query:
select b.item, b.billing_dollars, icb.IC_billing_dollars
from billing b left join
IC_billing icb
on b.invoice_number = icb.invoice_number
where b.date = '2019-09-24';
(Assuming I have the columns coming from the correct tables.)
The optimal strategy is an index on billing(date, invoice_number) -- perhaps also adding item and billing_dollars to the index; and ic_billing(invoice_number) -- perhaps with IC_billing_dollars.
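As a sketch (index names are placeholders, item is assumed to live in billing as in the query above, and the date column may need quoting depending on your database):

create index billing_date_inv_idx on billing (date, invoice_number, item, billing_dollars);
create index ic_billing_inv_idx on IC_billing (invoice_number, IC_billing_dollars);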
I can think of two situations where filtering on date in ic_billing would be useful.
First, if there is an index on (invoice_date, invoice_number), particularly a primary key definition. Then using this index is usually preferred, even if another index is available.
Second, if ic_billing is partitioned by invoice_date. In that case, you will want to specify the partition for performance.
Generally, though, the additional restriction on the invoice date does not help. In some databases, it might even hurt performance (particularly if the subquery is materialized and the outer query does not use an appropriate index).

Horrible Oracle update performance

I am performing an update with a query like this:
UPDATE (SELECT h.m_id,
m.id
FROM h
INNER JOIN m
ON h.foo = m.foo)
SET m_id = id
WHERE m_id IS NULL
Some info:
Table h is roughly ~5 million rows
All rows in table h have NULL values for m_id
Table m is roughly ~500 thousand rows
m_id on table h is an indexed foreign key pointing to id on table m
id on table m is the primary key
There are indexes on m.foo and h.foo
The EXPLAIN PLAN for this query indicated a hash join and full table scans, but I'm no DBA, so I can't really interpret it very well.
The query itself ran for several hours and did not complete. I would have expected it to complete in no more than a few minutes. I've also attempted the following query rewrite:
UPDATE h
SET m_id = (SELECT id
FROM m
WHERE m.foo = h.foo)
WHERE m_id IS NULL
The EXPLAIN PLAN for this mentioned ROWID lookups and index usage, but it also went on for several hours without completing. I've also always been under the impression that queries like this would cause the subquery to be executed for every result from the outer query's predicate, so I would expect very poor performance from this rewrite anyway.
Is there anything wrong with my approach, or is my problem related to indexes, tablespace, or some other non-query-related factor?
Edit:
I'm also having abysmal performance from simple count queries like this:
SELECT COUNT(*)
FROM h
WHERE m_id IS NULL
These queries are taking anywhere from ~30 seconds to sometimes ~30 minutes(!).
I am noticing no locks, but the tablespace for these tables is sitting at 99.5% usage (only ~6MB free) right now. I've been told that this shouldn't matter as long as indexes are being used, but I don't know...
Some points:
Oracle does not index NULL values (it will index a NULL that is part of a globally non-null tuple, but that's about it).
Oracle is going for a HASH JOIN because of the size of both h and m. This is likely the best option performance-wise.
The second UPDATE might get Oracle to use indexes, but then Oracle is usually smart about merging subqueries. And it would be a worse plan anyway.
Do you have recent, reasonable statistics for your schema? Oracle really needs decent statistics.
In your execution plan, which is the first table in the HASH JOIN? For best performance it should be the smaller table (m in your case). If you don't have good cardinality statistics, Oracle will get messed up. You can force Oracle to assume fixed cardinalities with the cardinality hint; it may help Oracle get a better plan.
For example, in your first query:
UPDATE (SELECT /*+ cardinality(h 5000000) cardinality(m 500000) */
h.m_id, m.id
FROM h
INNER JOIN m
ON h.foo = m.foo)
SET m_id = id
WHERE m_id IS NULL
In Oracle, FULL SCAN reads not only every record in the table, it basically reads all storage allocated up to the maximum used (the high water mark in Oracle documentation). So if you have had a lot of deleted rows your tables might need some cleaning up. I have seen a SELECT COUNT(*) on an empty table consume 30+ seconds because the table in question had like 250 million deleted rows. If that is the case, I suggest analyzing your specific case with a DBA, so he/she can reclaim space from deleted rows and lower the high water mark.
As far as I remember, a WHERE m_id IS NULL performs a full-table scan, since NULL values cannot be indexed.
Full-table scan means that the engine needs to read every record in the table to evaluate the WHERE condition, and cannot use an index.
You could try adding a virtual column that is set to a not-null value when m_id IS NULL, indexing this column, and using it in the WHERE condition.
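A rough sketch of that idea in Oracle (11g or later; the column and index names are made up):

alter table h add (m_id_missing as (case when m_id is null then 1 end));
create index h_m_id_missing_idx on h (m_id_missing);
-- rows where m_id is not null get a NULL virtual value, so they take no space in this index;
-- the update (or the counts) can then filter on: where m_id_missing = 1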
Then you could also move the WHERE condition from the UPDATE statement to the sub-select, which will probably make the statement faster.
Since JOINs are expensive, rewriting INNER JOIN m ON h.foo = m.foo as
WHERE h.foo IN (SELECT m.foo FROM m WHERE m.foo IS NOT NULL)
may also help.
For large tables, MERGE is often much faster than UPDATE. Try this (untested):
MERGE INTO h USING
(SELECT h.h_id,
m.id as new_m_id
FROM h
INNER JOIN m
ON h.foo = m.foo
WHERE h.m_id IS NULL
) new_data
ON (h.h_id = new_data.h_id)
WHEN MATCHED THEN
UPDATE SET h.m_id = new_data.new_m_id;
Try the undocumented hint /*+ BYPASS_UJVC */. If it works, add a UNIQUE/PK constraint on m.foo.
I would update the table in iterations: for example, add a condition such as where h.date_created > sysdate-30, and after it finishes run the same query with the condition changed to where h.date_created between sysdate-60 and sysdate-30, and so on. If you don't have a column like date_created, maybe there's another column you can filter by, for example: WHERE m.foo = h.foo AND m.foo between 1 and 10.
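As a sketch, applied to the correlated-subquery version of the update above (date_created is just the hypothetical batching column from this suggestion):

update h
   set m_id = (select id from m where m.foo = h.foo)
 where m_id is null
   and h.date_created > sysdate - 30;
-- next run: ... and h.date_created between sysdate - 60 and sysdate - 30, and so on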
Only the explain plan output can tell why the cost of this update is high, but an educated guess would be that both tables are very big, that there are many NULL values, and that there is a lot of matching on (m.foo = h.foo)...

sql, query optimisation with an inner join?

I'm trying to optimise my query; it has an inner join and a coalesce.
The join table is simply a table with one integer field, and I've added a unique key.
For my where clause I've created a key for the three fields.
But when I look at the plan it still says it's using a table scan.
Where am I going wrong?
Here's my query
select date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) as due
from billsndeposits a
inner join util_nums b on date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) <= coalesce(a.enddate, date('2013-02-26'))
where not (intervaltype = 'once' or interval = 0) and factid = 1
order by due, pid;
Most likely your JOIN expression cannot use any index, and it is calculated by doing a NATURAL scan and computing date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) for every row.
BTW: That is a really weird join condition in itself. I suggest you find a better way to join billsndeposits to util_nums (if that is actually needed).
I think I understand what you are trying to achieve. But this kind of join is a recipe for slow performance. Even if you remove date computations and the coalesce (i.e. compare one date against another), it will still be slow (compared to integer joins) even with an index. And because you are creating new dates on the fly you cannot index them.
I suggest creating a temp table with 2 columns (1) pid (or whatever id you use in billsndeposits) and (2) recurrence_dt
populate the new table using this query:
INSERT INTO TEMP
SELECT PID, date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype)
FROM billsndeposits a, util_nums b;
Then create an index on the recurrence_dt column and run runstats. Now your select statement can look like this:
SELECT recurrence_dt
FROM temp t, billsndeposits a
WHERE t.pid = a.pid
AND recurrence_dt <= coalesce(a.enddate, date('2013-02-26'))
You can add an exp_ts column to this new table, and expire temporary data afterwards.
I know this adds more work to your original query, but this is a guaranteed performance improvement, and should fit naturally in a script that runs frequently.
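Spelled out, the supporting objects might look something like this (table and index names follow the answer; adjust the column types to your schema, and rename or quote temp if it is reserved in your engine):

create table temp (
  pid           integer,
  recurrence_dt date
);
-- populate it with the INSERT shown above, then:
create index temp_recurrence_dt_idx on temp (recurrence_dt);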
Edit
Another thing I would do is give enddate a default value of date('2013-02-26'), unless that would affect other code and/or not make business sense. This way you don't have to work with coalesce.

Oracle: Indexes not being used

I have a query which is not using my indexes. Can someone say why?
explain plan set statement_id = 'bad8' for
select
g1.g1id,a.a1id from atable a,
(
select
phone,address,g1id from gtable g
where
g.active = 0 and
(g.name is not null) AND
(SYSDATE - g.CTIME <= 2*365)
) g1
where
(
(a.phone.ph1 = g1.phone.ph1 and
a.phone.ph2 = g1.phone.ph2 and
a.phone.ph3 = g1.phone.ph3
)
OR
(a.address.ad1 = g1.address.ad1 and a.address.ad2 = g1.address.ad2)
)
In both tables (atable, gtable) I have these indexes:
1. On phone.ph1,phone.ph2,phone.ph3
2. On address.ad1,address.ad2
phone,address are of custom data types.
Using Oracle 11g.
Here is the explain plan query and output :
SELECT cardinality "Rows",
lpad(' ',level-1)||operation||' '||
options||' '||object_name "Plan"
FROM PLAN_TABLE
CONNECT BY prior id = parent_id
AND prior statement_id = statement_id
START WITH id = 0
AND statement_id = 'bad8'
ORDER BY id;
Result:
Rows        Plan
490191190   SELECT STATEMENT
null        CONCATENATION
490190502   HASH JOIN
511841      TABLE ACCESS FULL gtable
41332965    PARTITION LIST ALL
41332965    TABLE ACCESS FULL atable
688         HASH JOIN
376893      TABLE ACCESS FULL gtable
41332965    PARTITION LIST ALL
41332965    TABLE ACCESS FULL atable
Both atable,gtable have more than 10 million rows each.
Most values in columns phone and address don't repeat.
Which indices Oracle chooses depends on many factors, including things you haven't mentioned in your question, such as the number of rows in the table, the frequency of values within a column, and whether you have separate or combined indices when more than one column is indexed.
Having said that, I suppose the main reasons your indices aren't used are:
You don't join directly with GTABLE / GLOBAL. Instead you join with a view that has three additional WHERE clauses that aren't part of the index and thus make it less effective in this constellation.
The JOIN condition includes an OR, which makes it difficult to use indices.
Update:
If Oracle used your indices to do the join - which is already very difficult due to the OR condition - it would end up with a huge number of ROWIDs. For each ROWID, it would then have to fetch the full row. Since a full table scan can easily be up to 50 times faster than a fetch by ROWID (I don't know what value Oracle uses), it will only use the indices if it reliably knows that the join will reduce the number of rows to fetch by a factor of 50.
In your case, there are the remaining WHERE conditions (g.active = 0, g.name is not null, SYSDATE - g.CTIME <= 2*365), which aren't represented in the indices. So they have to be applied after the join and after the GTABLE rows have been fetched. This makes it even more difficult to reach a 50 times smaller result set than a full table scan.
So I'm pretty sure the Oracle cost estimate is correct, i.e. using the indices would result in a more expensive query and even longer execution time.
We can say "your query does not use your indexes because does not need them". A hash join is better. To use your indexes, oracle need to full scan them(4 indexes), make two joins, make a rowid or, and after that read from tables probably many blocks. If he belives that the result has many rows, the CBO coose the full scans, because is faster.
There are no conditions that reduce the number of rows taken from tables. There is no range scan. It must do full scans.