Validating optimization of an Oracle query - sql

Ok, so I'm working on this (rather old) project at work, which runs loads of queries against an Oracle database. I recently stumbled upon this gem, which takes about 6-7 hours to run and returns ~1400 rows. The table/view in question contains ~200'000 rows. That felt like rather longer than reasonable, so I started having a closer look at it. For security/proprietary reasons I can't share the exact query, but this should show what it does in more general terms:
SELECT
    some_field,
    some_other_field
FROM (
    SELECT *
    FROM some_view a
    WHERE some_criteria
      AND a.client_no || ':' || a.engagement_no || ':' || a.registered_date = (
        SELECT b.client_no || ':' || b.engagement_no || ':' || MAX(b.registered_date)
        FROM some_view b
        JOIN some_engagement_view e
          ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
        JOIN some_client_view c
          ON c.client_no = b.client_no
        WHERE some_other_criteria
          AND b.client_no = a.client_no
          AND b.engagement_no = a.engagement_no
        GROUP BY b.client_no, b.engagement_no
      )
);
Basically, what it is supposed to do, as far as I've managed to figure out, is to fetch from some_view (which contains evaluations of customers/engagements) the latest evaluation for every unique client/engagement.
The two joins are there to ensure that the client and engagement exist in another system, where they are primarily handled after you have done the evaluation in this system.
Notice how it concatenates two numbers and a date, and then compares that to a sub-query? "Interesting" design choice. I figured that replacing the concatenation with a proper tuple comparison might at least yield some performance gain. Please note that I primarily develop in .NET and for the web, and am far from an expert when it comes to databases, but I rewrote it as follows:
SELECT
    some_field,
    some_other_field
FROM some_view a
WHERE some_criteria
  AND (a.client_no, a.engagement_no, a.registered_date) = (
    SELECT b.client_no, b.engagement_no, MAX(b.registered_date)
    FROM some_view b
    JOIN some_engagement_view e
      ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
    JOIN some_client_view c
      ON c.client_no = b.client_no
    WHERE some_other_criteria
      AND b.client_no = a.client_no
      AND b.engagement_no = a.engagement_no
    GROUP BY b.client_no, b.engagement_no
  );
Now, if I replace the fields in the very first SELECT with COUNT(1), I get exactly the same number of rows with both queries, so a good start. The new query fetches data just as fast as it counts, < 10 seconds. The old query gets the count in ~20 seconds and, as I mentioned before, takes close to 6-7 hours for the data. It is currently running so that I can do some kind of analysis to see if the new query is valid, but I thought I'd ask here as well in case there is anything obviously wrong with what I have done.
EDIT: I also removed the outermost query, which did not seem to serve any purpose, except maybe making the query look cooler.. or something.. I dunno..
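For anyone who wants the same sanity check on a toy dataset: here is a sketch of the COUNT comparison using Python's built-in sqlite3 (not Oracle, and with invented table contents; sqlite has supported row-value comparisons like `(a, b, c) = (SELECT ...)` since 3.15). Both forms should agree on the row count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE some_view (client_no INT, engagement_no INT,
                            registered_date TEXT, score INT);
    INSERT INTO some_view VALUES
        (1, 10, '2020-01-01', 5), (1, 10, '2020-06-01', 7),
        (2, 20, '2019-03-01', 3), (2, 20, '2021-03-01', 9);
""")

# Original style: concatenate the key columns and compare the strings.
concat = con.execute("""
    SELECT COUNT(*) FROM some_view a
    WHERE a.client_no || ':' || a.engagement_no || ':' || a.registered_date = (
        SELECT b.client_no || ':' || b.engagement_no || ':' || MAX(b.registered_date)
        FROM some_view b
        WHERE b.client_no = a.client_no AND b.engagement_no = a.engagement_no
        GROUP BY b.client_no, b.engagement_no)
""").fetchone()[0]

# Rewritten style: compare the columns as a tuple (row value).
tuple_cmp = con.execute("""
    SELECT COUNT(*) FROM some_view a
    WHERE (a.client_no, a.engagement_no, a.registered_date) = (
        SELECT b.client_no, b.engagement_no, MAX(b.registered_date)
        FROM some_view b
        WHERE b.client_no = a.client_no AND b.engagement_no = a.engagement_no
        GROUP BY b.client_no, b.engagement_no)
""").fetchall()[0][0]

print(concat, tuple_cmp)  # 2 2 -- one "latest" row per client/engagement either way
```

Matching counts on a toy like this proves shape, not equivalence on real data, so the full-result comparison against the old query is still worth doing.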

Expanding on my comment... if I try to replicate your query structure using built-in views it also runs for a long time. For example, getting the most recently created table for each owner (purely for demo purposes, it can be done more simply) like this takes several minutes, with either version:
SELECT
    owner,
    object_name
FROM all_objects a
WHERE (a.owner, a.object_type, TRUNC(a.created)) = (
    SELECT b.owner, b.object_type, TRUNC(MAX(b.created))
    FROM all_objects b
    JOIN all_tables e
      ON e.owner = b.owner AND e.table_name = b.object_name
    JOIN all_users c
      ON c.username = b.owner
    WHERE b.owner = a.owner
      AND b.object_type = a.object_type
    GROUP BY b.owner, b.object_type
);
If I rewrite that to avoid the self-join on all_objects (equivalent to some_view in your example) by using an analytic function instead:
SELECT
    owner,
    object_name
FROM (
    SELECT
        a.owner,
        a.object_name,
        row_number() over (partition by a.owner, a.object_type
                           order by a.created desc) as rn
    FROM all_objects a
    JOIN all_tables e
      ON e.owner = a.owner AND e.table_name = a.object_name
    JOIN all_users c
      ON c.username = a.owner
)
WHERE rn = 1;
... then it takes a few seconds.
Now, in this case I don't get exactly the same output because I have multiple objects created at the same time (within the same second as far as created is concerned).
I don't know the precision of the values stored in your registered_date of course. So you might need to look at different functions, possibly rank rather than row_number, or adjust the ordering to deal with ties if necessary.
rank() over (partition by a.owner, a.object_type
order by trunc(a.created) desc) as rn
...
WHERE
rn = 1;
gives me the same results (well, almost; the join to all_tables is also skewing things, as I seem to have tables listed in all_objects that aren't in all_tables, but that's a side issue). Or max could work too:
max(created) over (partition by a.owner, a.object_type) as mx
...
WHERE
TRUNC(created) = TRUNC(mx)
In both of those I'm using trunc to get everything on the same day; you may not need to if your registered_date doesn't have a time component.
But of course, check you do actually get the same results.
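To make the tie behaviour concrete, here is a tiny illustration (sketched in Python's sqlite3, which also implements these window functions; the data is invented): with two rows sharing the same latest timestamp in a group, row_number keeps exactly one row per group while rank keeps all the tied rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE evals (client INT, eng INT, registered TEXT);
    INSERT INTO evals VALUES
        (1, 1, '2021-01-01'),
        (1, 1, '2021-05-01'),
        (1, 1, '2021-05-01'),   -- tie on the latest timestamp
        (2, 7, '2020-02-02');
""")

# row_number: arbitrary winner among ties, one row per partition.
rn = con.execute("""
    SELECT COUNT(*) FROM (
        SELECT row_number() OVER (PARTITION BY client, eng
                                  ORDER BY registered DESC) AS rn
        FROM evals)
    WHERE rn = 1
""").fetchone()[0]

# rank: all tied rows share rank 1, so ties survive the filter.
rk = con.execute("""
    SELECT COUNT(*) FROM (
        SELECT rank() OVER (PARTITION BY client, eng
                            ORDER BY registered DESC) AS rk
        FROM evals)
    WHERE rk = 1
""").fetchone()[0]

print(rn, rk)  # 2 3
```

Which of the two is "correct" depends on whether registered_date can have duplicates and whether you want one row or all of them in that case.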

Related

How to run SQL queries with multiple WITH clauses (sub-query refactoring)?

I have a code block that has 7 or 8 WITH clauses (sub-query refactoring). I'm getting 'SQL compilation error' messages when I try to run them in Snowflake. For example:
with valid_Cars_Stock as (
select car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
)
, car_sale_hist as (
select vw.issue_id, vw.delivery_effective_ts, bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
)
,
So how do I run these two queries, separately or together? I'm new to SQL as well; any help would be ideal.
So let's just reformat that code as it stands:
with valid_Cars_Stock as (
select
car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
), car_sale_hist as (
select
vw.issue_id,
vw.delivery_effective_ts,
bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw
on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b
on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
),
These are clearly part of a larger block of code.
As an aside on CTEs: you should 100% ignore anything anyone (including me) says about their performance. They are two things: syntactic sugar, and a way to avoid repetition, hence the name Common Table Expression. Anyway, they CAN perform better than temp tables, AND they CAN perform worse than just repeating the same SQL many times in the same block. There is no one rule; testing is the only way to find what is fastest for your SQL, and it can and does change as updates/releases are made. So I'm ignoring the performance comments.
If I am trying to run a chain like this to debug it, I alter it at the point where I would like to stop, normally like so:
with valid_Cars_Stock as (
select
car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
)--, car_sale_hist as (
select
vw.issue_id,
vw.delivery_effective_ts,
bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw
on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b
on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
;), NEXT_AWESOME_CTE_THAT_TOTALLY_MAKES_SENSE (
-- .....
Now the result of car_sale_hist will be returned, because we "completed" the CTE chain by not "starting another", and the semicolon ended the SQL block.
Then, once you have that step working nicely, remove the semicolon and the end-of-line comment, and get on with the real work.
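The trick is easier to see on a runnable toy (Python's sqlite3 here, with an invented, much-simplified table): comment out the start of the next CTE, select from the one you want to inspect, and the statement simply ends there.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE car_issue_dime (car_id INT, car_type_name TEXT)")
con.executemany("INSERT INTO car_issue_dime VALUES (?, ?)",
                [(1102423975, 'hatchback'), (2, 'suv')])

# The full chain would continue with more CTEs; to debug, we "complete"
# the chain early by commenting out the start of the next CTE and
# selecting from the one we want to inspect.
rows = con.execute("""
    WITH valid_cars_stock AS (
        SELECT car_id
        FROM car_issue_dime
        WHERE car_type_name IN ('hatchback')
    ) --, car_sale_hist AS ( ...
    SELECT * FROM valid_cars_stock
""").fetchall()

print(rows)  # [(1102423975,)]
```

Once the intermediate output looks right, restore the comment and move the debug SELECT one CTE further down the chain.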

optimizing a large "distinct" select in postgres

I have a rather large dataset (millions of rows). I'm having trouble introducing a "distinct" concept to a certain query. (I'm putting "distinct" in quotes, because this could be provided by the Postgres keyword DISTINCT or a GROUP BY form.)
A non-distinct search takes 1ms - 2ms ; all attempts to introduce a "distinct" concept have grown this to the 50,000ms - 90,000ms range.
My goal is to show the latest resources based on their most recent appearance in an event stream.
My non-distinct query is essentially this:
SELECT
resource.id AS resource_id,
stream_event.event_timestamp AS event_timestamp
FROM
resource
JOIN
resource_2_stream_event ON (resource.id = resource_2_stream_event.resource_id)
JOIN
stream_event ON (resource_2_stream_event.stream_event_id = stream_event.id)
WHERE
stream_event.viewer = 47
ORDER BY event_timestamp DESC
LIMIT 25
;
I've tried many different forms of queries (and subqueries) using DISTINCT, GROUP BY and MAX(event_timestamp). The issue isn't getting a query that works, it's getting one that works in a reasonable execution time. Looking at the EXPLAIN ANALYZE output for each one, everything is running off of indexes. The problem seems to be that with any attempt to deduplicate my results, Postgres must assemble the entire result set on disk; since each table has millions of rows, this becomes a bottleneck.
UPDATE: here's a working group-by query:
EXPLAIN ANALYZE
SELECT
resource.id AS resource_id,
max(stream_event.event_timestamp) AS stream_event_event_timestamp
FROM
resource
JOIN resource_2_stream_event ON (resource_2_stream_event.resource_id = resource.id)
JOIN stream_event ON stream_event.id = resource_2_stream_event.stream_event_id
WHERE (
(stream_event.viewer_id = 57) AND
(resource.condition_1 IS NOT True) AND
(resource.condition_2 IS NOT True) AND
(resource.condition_3 IS NOT True) AND
(resource.condition_4 IS NOT True) AND
(
(resource.condition_5 IS NULL) OR (resource.condition_6 IS NULL)
)
)
GROUP BY (resource.id)
ORDER BY stream_event_event_timestamp DESC LIMIT 25;
Looking at the query planner (via EXPLAIN ANALYZE), it seems that adding in the MAX + GROUP BY clause (or a DISTINCT) forces a sequential scan, and that is taking about half the time to compute. There is already an index that contains every "condition", and I tried creating a set of indexes (one for each element); none helped.
In any event, the difference is between 2ms and 72,000ms.
Often, distinct on is the most efficient way to get one row per something. I would suggest trying:
SELECT DISTINCT ON (r.id) r.id AS resource_id, se.event_timestamp
FROM resource r JOIN
resource_2_stream_event r2se
ON r.id = r2se.resource_id JOIN
stream_event se
ON r2se.stream_event_id = se.id
WHERE se.viewer = 47
ORDER BY r.id, se.event_timestamp DESC
LIMIT 25;
An index on resource(id, event_timestamp) might help performance.
EDIT:
You might try using a CTE to get what you want:
WITH CTE as (
SELECT r.id AS resource_id,
se.event_timestamp AS stream_event_event_timestamp
FROM resource r JOIN
resource_2_stream_event r2se
ON r2se.resource_id = r.id JOIN
stream_event se
ON se.id = r2se.stream_event_id
WHERE ((se.viewer_id = 57) AND
(r.condition_1 IS NOT True) AND
(r.condition_2 IS NOT True) AND
(r.condition_3 IS NOT True) AND
(r.condition_4 IS NOT True) AND
( (r.condition_5 IS NULL) OR (r.condition_6 IS NULL)
)
)
)
SELECT resource_id, max(stream_event_event_timestamp) as stream_event_event_timestamp
FROM CTE
GROUP BY resource_id
ORDER BY stream_event_event_timestamp DESC
LIMIT 25;
Postgres materializes the CTE. So, if there are not that many matches, this may speed the query by using indexes for the CTE.
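As a shape check only (this sketch uses Python's sqlite3 with invented rows, so it says nothing about the Postgres planner or CTE materialization), the CTE + GROUP BY form returns one latest timestamp per resource:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE resource (id INT);
    CREATE TABLE stream_event (id INT, viewer INT, event_timestamp TEXT);
    CREATE TABLE resource_2_stream_event (resource_id INT, stream_event_id INT);
    INSERT INTO resource VALUES (1), (2);
    INSERT INTO stream_event VALUES
        (10, 47, '2024-01-01'), (11, 47, '2024-03-01'), (12, 47, '2024-02-01');
    INSERT INTO resource_2_stream_event VALUES (1, 10), (1, 11), (2, 12);
""")

# One row per resource: the max event_timestamp among its viewer-47 events.
rows = con.execute("""
    WITH cte AS (
        SELECT r.id AS resource_id, se.event_timestamp
        FROM resource r
        JOIN resource_2_stream_event r2se ON r2se.resource_id = r.id
        JOIN stream_event se ON se.id = r2se.stream_event_id
        WHERE se.viewer = 47)
    SELECT resource_id, MAX(event_timestamp) AS ts
    FROM cte
    GROUP BY resource_id
    ORDER BY ts DESC
    LIMIT 25
""").fetchall()

print(rows)  # [(1, '2024-03-01'), (2, '2024-02-01')]
```

Whether this is fast on millions of rows still comes down to what EXPLAIN ANALYZE shows for the CTE scan.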

SQL query gets data very slowly from different tables

I am writing a SQL query to get data from different tables, but it is retrieving the data very slowly: it takes over 2 minutes to complete.
What I am doing here is:
1. I am getting date differences, and based on the date difference I am getting account numbers.
2. I am comparing tables to get the exact data I need.
Here is my query:
select T.accountno,
MAX(T.datetxn) as MxDt,
datediff(MM,MAX(T.datetxn), '2011-6-30') as Diffs,
max(P.Name) as POName
from Account_skd A,
AccountTxn_skd T,
POName P
where A.AccountNo = T.AccountNo and
GPOCode = A.OfficeCode and
Code = A.POCode and
A.servicecode = T.ServiceCode
group by T.AccountNo
order by len(T.AccountNo) DESC
Please help me with how I can use joins, or any other way, to get the data in much less time, say 5-10 seconds.
Since it appears you are getting EVERY account, and performance is slow, I would try creating a prequery aggregated by account alone, then doing a single join to the other tables, something like this:
select
    T.AccountNo,
    T.MxDt,
    datediff(MM, T.MxDt, '2011-6-30') as Diffs,
    P.Name as POName
from
    ( select T1.AccountNo,
             Max( T1.DateTxn ) MxDt
      from AccountTxn_skd T1
      group by T1.AccountNo ) T
JOIN Account_skd A
    on T.AccountNo = A.AccountNo
JOIN POName P
    on A.POCode = P.Code            -- GUESSING, as you didn't qualify
    AND A.OfficeCode = P.GPOCode    -- alias.field for these two fields
order by
    len(T.AccountNo) DESC
You had other elements based on the T.ServiceCode matching, but since you are only grouping on the account number anyhow, does it matter which service code was used? Otherwise, you would need to group by both the account AND service code (in which case I would add the service code into the prequery and as a join condition to the account table too).
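Here is the shape of that prequery approach on a toy dataset (sketched with Python's sqlite3; the schema is guessed from the question and the column names are assumptions): aggregate to one row per account first, then join each lookup table exactly once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE AccountTxn_skd (AccountNo TEXT, DateTxn TEXT);
    CREATE TABLE Account_skd (AccountNo TEXT, POCode TEXT, OfficeCode TEXT);
    CREATE TABLE POName (Code TEXT, GPOCode TEXT, Name TEXT);
    INSERT INTO AccountTxn_skd VALUES
        ('A1', '2011-01-05'), ('A1', '2011-03-09'), ('B2', '2010-12-01');
    INSERT INTO Account_skd VALUES ('A1', 'P1', 'O1'), ('B2', 'P2', 'O2');
    INSERT INTO POName VALUES ('P1', 'O1', 'Alpha'), ('P2', 'O2', 'Beta');
""")

# Aggregate once per account in a derived table, THEN join the
# lookup tables a single time each.
rows = con.execute("""
    SELECT t.AccountNo, t.MxDt, p.Name AS POName
    FROM (SELECT AccountNo, MAX(DateTxn) AS MxDt
          FROM AccountTxn_skd
          GROUP BY AccountNo) t
    JOIN Account_skd a ON a.AccountNo = t.AccountNo
    JOIN POName p ON p.Code = a.POCode AND p.GPOCode = a.OfficeCode
    ORDER BY t.AccountNo
""").fetchall()

print(rows)  # [('A1', '2011-03-09', 'Alpha'), ('B2', '2010-12-01', 'Beta')]
```

The point of the shape is that the MAX/GROUP BY happens on the transaction table alone, before the joins multiply any rows.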

Complicated Calculation Using Oracle SQL

I have created a database for an imaginary solicitors' firm, and my last query is driving me insane. I need to work out the total a solicitor has made in their career with the company: I have time_spent and rate to multiply, and special_rate to add. (The special rate is a one-off charge for corporate contracts, so not many cases have one.) The best I could come up with is the code below. It does what I want, but it only displays solicitors working on a case with a special rate applied to it.
I essentially want it to display the result of the query even if the special rate is NULL.
I have ordered the table to show the highest amount first, so I can use ROWNUM to show only the top 10% of earners.
CREATE VIEW rich_solicitors AS
SELECT notes.time_spent * rate.rate_amnt + special_rate.s_rate_amnt AS solicitor_made,
notes.case_id
FROM notes,
rate,
solicitor_rate,
solicitor,
case,
contract,
special_rate
WHERE notes.solicitor_id = solicitor.solicitor_id
AND solicitor.solicitor_id = solicitor_rate.solicitor_id
AND solicitor_rate.rate_id = rate.rate_id
AND notes.case_id = case.case_id
AND case.contract_id = contract.contract_id
AND contract.contract_id = special_rate.contract_id
ORDER BY -solicitor_made;
Query:
SELECT *
FROM rich_solicitors
WHERE ROWNUM <= (SELECT COUNT(*)/10
FROM rich_solicitors)
I'm suspicious of your use of ROWNUM in your example query...
Oracle 9i+ supports analytic functions, like ROW_NUMBER and NTILE, to make queries like your example easier. Analytic functions are also ANSI, so the syntax is consistent where implemented (i.e. not on MySQL or SQLite). I re-wrote your query as:
SELECT x.*
  FROM (SELECT n.time_spent * r.rate_amnt + COALESCE(spr.s_rate_amnt, 0) AS solicitor_made,
               n.case_id,
               NTILE(10) OVER (ORDER BY n.time_spent * r.rate_amnt
                                        + COALESCE(spr.s_rate_amnt, 0) DESC) AS rank
          FROM NOTES n
          JOIN SOLICITOR s ON s.solicitor_id = n.solicitor_id
          JOIN SOLICITOR_RATE sr ON sr.solicitor_id = s.solicitor_id
          JOIN RATE r ON r.rate_id = sr.rate_id
          JOIN CASE c ON c.case_id = n.case_id
          JOIN CONTRACT cntrct ON cntrct.contract_id = c.contract_id
          LEFT JOIN SPECIAL_RATE spr ON spr.contract_id = cntrct.contract_id) x
 WHERE x.rank = 1
If you're new to SQL, I recommend using ANSI-92 join syntax. Your example uses the older ANSI-89 comma-join style, which doesn't support OUTER JOINs and is considered deprecated. I used a LEFT OUTER JOIN against the SPECIAL_RATE table because not all jobs are likely to have a special rate attached to them.
It's also not recommended to include an ORDER BY in views, because views encapsulate the query -- no one will know what the default ordering is, and will likely include their own (waste of resources potentially).
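To show what NTILE(10) does here (a sketch in Python's sqlite3, which also implements NTILE; the earnings figures are invented): with 20 rows ordered descending, bucket 1 is exactly the top two earners, i.e. the top 10%.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE earnings (solicitor TEXT, made INT)")
con.executemany("INSERT INTO earnings VALUES (?, ?)",
                [(f"s{i}", i * 100) for i in range(1, 21)])  # s1..s20

# NTILE(10) splits the ordered rows into ten equal buckets;
# with a descending order, bucket 1 holds the top 10% of earners.
top = con.execute("""
    SELECT solicitor FROM (
        SELECT solicitor,
               NTILE(10) OVER (ORDER BY made DESC) AS decile
        FROM earnings)
    WHERE decile = 1
""").fetchall()

print(sorted(n for (n,) in top))  # ['s19', 's20']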
You need to left join in the special rate.
If I recall, the Oracle syntax is like:
AND contract.contract_id = special_rate.contract_id (+)
but now special_rate.* can be null so:
+ special_rate.s_rate_amnt
will need to be:
+ coalesce(special_rate.s_rate_amnt,0)
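The reason the COALESCE matters (a minimal illustration in Python's sqlite3, with invented rows): once the outer join produces a NULL, any arithmetic involving it is NULL, and the whole computed amount disappears.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE contract (contract_id INT);
    CREATE TABLE special_rate (contract_id INT, s_rate_amnt INT);
    INSERT INTO contract VALUES (1), (2);
    INSERT INTO special_rate VALUES (1, 50);  -- contract 2 has no special rate
""")

rows = con.execute("""
    SELECT c.contract_id,
           100 + s.s_rate_amnt              AS broken,  -- NULL for contract 2
           100 + COALESCE(s.s_rate_amnt, 0) AS fixed
    FROM contract c
    LEFT JOIN special_rate s ON s.contract_id = c.contract_id
    ORDER BY c.contract_id
""").fetchall()

print(rows)  # [(1, 150, 150), (2, None, 100)]
```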

Optimise a Query Which is taking too long to Run

I'm using Toad for Oracle to run a query which is taking much too long to run, sometimes over 15 minutes.
The query pulls memos which are left to be approved by managers. It does not bring back a lot of rows; typically it returns about 30 or 40. The query needs to access a few tables for its information, so I'm using a lot of joins to get it.
I have attached my query below.
If anyone can help with optimising this query I would be very grateful.
Query:
SELECT (e.error_Description || DECODE(t.trans_Comment, 'N', '', '','', ' - ' || t.trans_Comment)) AS Title,
t.Date_Time_Recorded AS Date_Recorded,
DECODE(t.user_ID,0,'System',(SELECT Full_Name FROM employee WHERE t.user_Id = user_id)) AS Recorded_by,
DECODE(t.user_ID,0, Dm_General.getCalendarShiftName(t.Date_Time_Recorded), (SELECT shift FROM employee WHERE t.user_Id = user_id)) AS Shift,
l.Lot_Number AS entity_number,
ms.Line_Num,
'L' AS Entity_Type,
t.entity_id, l.lot_Id AS Lot_Id
FROM DAT_TRANSACTION t
JOIN ADM_ERRORCODES e ON e.error_id = t.error_id
JOIN ADM_ACTIONS a ON a.action_id = t.action_id,
DAT_LOT l
INNER JOIN Status s ON l.Lot_Status_ID = s.Status_ID,
DAT_MASTER ms
INNER JOIN ADM_LINE LN ON ms.Line_Num = LN.Line_Num
WHERE
(e.memo_req = 'Y' OR a.memo_req = 'Y')
AND ms.Run_type_Id = Constants.Runtype_Production_Run --Production Run type
AND s.completed_type NOT IN ('D', 'C', 'R') -- Destroyed /closed / Released
AND LN.GEN = '2GT'
AND (NOT EXISTS (SELECT 1 FROM LNK_MEMO_TRANS lnk, DAT_MEMO m
                 WHERE lnk.Trans_ID = t.trans_id AND lnk.Memo_ID = m.Memo_ID
                 AND NVL(m.approve, 'Y') = 'Y')) -- if it's null, it's been created and is awaiting approval
AND l.Master_ID = ms.Master_ID
AND t.Entity_ID = l.Lot_ID
AND t.Entity_Type IN ('L', 'G');
The usual cause for bad performance of queries is that Oracle can't find an appropriate index. Use EXPLAIN PLAN with TOAD so Oracle can tell you what it thinks the best way to execute the query. That should give you some idea when it uses indexes and when not.
For general pointers, see http://www.orafaq.com/wiki/Oracle_database_Performance_Tuning_FAQ
See here for EXPLAIN PLAN.
You have some function calls in your SQL:
dm_general.getcalendarshiftname(t.date_time_recorded)
constants.runtype_production_run
Function calls are slow in SQL, and depending on the query plan may get called redundantly many times - e.g. computing dm_general.getcalendarshiftname for rows that end up being filtered out of the results.
To see if this is a significant factor, try replacing the function calls with literal constants temporarily and see if the performance improves.
The number of function calls can sometimes be reduced by restructuring the query like this:
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
This ensures that myfunction is only called for rows that will appear in the results.
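You can see the effect by counting calls to a stand-in function (sketched in Python's sqlite3 with a user-defined function; Oracle's no_merge hint has no sqlite equivalent, so this only illustrates why deferring the call helps): when the function sits in the select list above a filtering subquery, it only runs for the rows that survive the filter.

```python
import sqlite3

calls = {"n": 0}

def myfunction(x):  # stand-in for a slow PL/SQL function
    calls["n"] += 1
    return x * 2

con = sqlite3.connect(":memory:")
con.create_function("myfunction", 1, myfunction)
con.execute("CREATE TABLE my_table (a INT, d INT)")
con.executemany("INSERT INTO my_table VALUES (?, ?)",
                [(i, i) for i in range(1, 101)])  # 100 rows

# Filter in the inner query; call the function only on what's left.
rows = con.execute("""
    SELECT a, myfunction(d)
    FROM (SELECT a, d FROM my_table WHERE a <= 2)
""").fetchall()

print(len(rows), calls["n"])  # 2 2 -- two rows out, two calls, not 100
```

If the function instead appeared in a predicate evaluated against every row, the call counter would climb toward the full table size, which is the redundant work the restructuring avoids.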
I have replaced the function calls with literal constants and this speeds it up by only a second or two. The query still takes about 50 seconds to run.
Is there anything I can do around the joins to help speed this up? Have I used INNER JOIN correctly here?
I'm not really sure I understand what you mean by the below, or how to use it.
I get the error "d": invalid identifier when I try to call the function in the second select:
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
Any other views would be greatly appreciated
Before we can say anything sensible, we have to take a look at where time is being spent. And that means you have to collect some information first.
Therefore, my standard reaction to a question like this, is this one: http://forums.oracle.com/forums/thread.jspa?threadID=501834
Regards,
Rob.