I'm trying to optimize this query however possible. In my test tables it does exactly what I want it to, but on the live tables it takes a VERY long time to run.
select THING_,
count(case STATUS_ when '_Good_' then 1 end) as GOOD,
count(case STATUS_ when '_Bad_' then 1 end) as BAD,
count(case STATUS_ when '_Bad_' then 1 end) / count(case STATUS_ when '_Good_' then 1 end) * 100 as FAIL_PERCENT
from
(
select THING_,
STATUS_
from <good table>
where TIMESTAMP_ > (sysdate - 1) and
STATUS_ = '_Good_' and
upper(THING_) like '%TEST%'
UNION ALL
select THING_,
STATUS_
from <bad table>
where TIMESTAMP_ > (sysdate - 1) and
STATUS_ = '_Bad_' and
upper(THING_) like '%TEST%'
) u
group by THING_
I think the query should make it self-explanatory what I want to do, but if not, or if additional info is needed, please let me know and I will post some sample tables.
Thanks!
Create composite indexes on (STATUS_, TIMESTAMP_) in both tables.
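For example (a sketch; substitute your real table names):
create index good_status_ts_ix on <good table> (status_, timestamp_);
create index bad_status_ts_ix on <bad table> (status_, timestamp_);
With STATUS_ leading, the equality filter plus the TIMESTAMP_ range in your WHERE clause can both use the index.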
(1) Looking at the execution plan should always be your first step in diagnosing SQL performance issues.
(2) A possible problem with the query as written is that, because SYSDATE is a function that is not evaluated until execution time (i.e. after the execution plan is determined), the optimizer cannot make use of histograms on the timestamp column to evaluate the utility of an index. I have seen that lead to bad optimizer decisions. If you can work out a way to calculate the date first and then feed it into the query as a bind variable or a literal, that may help (see the sketch at the end of this answer), although this is really just a guess.
(3) Maybe a better overall way to structure the query would be as a join (possibly a full outer join) between aggregate queries on each of the tables:
SELECT COALESCE(g.thing_,b.thing_), COALESCE(good_count,0), COALESCE(bad_count,0)
FROM (SELECT thing_,count(*) good_count from good_table WHERE ... GROUP BY thing_) g
FULL OUTER JOIN
(SELECT thing_,count(*) bad_count from bad_table WHERE ... GROUP BY thing_) b
ON b.thing_ = g.thing_
(Have to say, it seems kind of weird that you have two separate tables when you also have a status column to indicate "good" or "bad". But maybe I am overinterpreting.)
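Regarding point (2), a minimal sketch of the bind-variable idea in SQL*Plus (the variable name is illustrative):
-- compute the cutoff date once, outside the query
variable cutoff varchar2(19)
exec :cutoff := to_char(sysdate - 1, 'YYYY-MM-DD HH24:MI:SS')
-- then filter with the bind variable instead of calling sysdate inside the query:
-- where timestamp_ > to_date(:cutoff, 'YYYY-MM-DD HH24:MI:SS')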
Have you tried using analytic functions? They might reduce the execution time. Here is an example:
select distinct col1, col2, col3
from (select col1,
             count(col2) over (partition by col1) col2,
             count(col3) over (partition by col1) col3
      from your_table
     ) t
It's something like that.
I have a query that is really slow. I will post pseudo-code here.
SELECT
    ListOfDates.Date as Event,
    (SELECT COUNT(DISTINCT TableOfExtensiveJoins1.ID)
     FROM TableOfExtensiveJoins1
     WHERE Event = TableOfExtensiveJoins1.Date AND Condition1),
    (SELECT COUNT(DISTINCT TableOfExtensiveJoins2.ID)
     FROM TableOfExtensiveJoins2
     WHERE Event = TableOfExtensiveJoins2.Date AND Condition2),
    (SELECT COUNT(DISTINCT TableOfExtensiveJoins3.ElementID)
     FROM TableOfExtensiveJoins3
     WHERE Event = TableOfExtensiveJoins3.Date AND Condition3)
FROM
    ListOfDates
One thing to notice here is that TableOfExtensiveJoins1, 2 and 3 are exactly the same query; only the WHERE condition differs in each one. Running the same query three times just to filter it three different ways seems a bit excessive, but as you can see it is necessary because I want to count rows in the table, filtered differently each time. Because of the COUNT, I fear that SQL evaluates the table all over again every time.
I have that fear because the query runs exceptionally long, and the subqueries are really complicated themselves. To give you an example: getting just one record from the main query takes around 15 seconds, and the subquery alone takes 5 seconds, which would explain the 15 seconds (3*5=15). The full main query would likely return a few thousand records. I let it run for 50 minutes one day and it didn't finish. Obviously it's not linear, but that is beside the point; I just wanted to stress how bad the query is.
So obviously I need to improve the performance of that query. For the sake of the optimization, let's say I cannot create new tables in the database; otherwise it would be too easy, I guess. Let's also assume that TableOfExtensiveJoins is already optimized.
So my question is: how can I rewrite the query to run faster, evaluating the table only once and then running the filters on that result? The query is run in Microsoft SQL Server Reporting Services, so there might be limitations on what kinds of queries are runnable, but I'm not 100% sure about this.
Edit: A description of the desired result might be helpful for finding the right answer.
TableOfExtensiveJoins is basically an event table. Every time something specific happens (it doesn't matter what), a new entry is created.
I now want, for any given date, to count the number of events that meet certain conditions. ListOfDates takes the first occurrence of the event and builds a list of dates from it, which is then filtered with Day(Date) % 5 = 1, so every 5th date.
Try conditional aggregation, something like:
SELECT lod.Date as Event,
       COUNT(DISTINCT CASE WHEN Condition1 THEN tej.ID END) cnt1,
       COUNT(DISTINCT CASE WHEN Condition2 THEN tej.ID END) cnt2,
       COUNT(DISTINCT CASE WHEN Condition3 THEN tej.ID END) cnt3
from ListOfDates lod
left join TableOfExtensiveJoins tej on lod.Date = tej.Date
group by lod.Date
The below should perform better, as it only evaluates TableOfExtensiveJoins once and needs only one pass to get the distinct counts:
WITH DistCounts
AS (SELECT COUNT(DISTINCT ID) AS DistCount,
condition_flag,
Date
FROM TableofExtensiveJoins
CROSS APPLY (SELECT 1 WHERE Condition1
UNION ALL
SELECT 2 WHERE Condition2
UNION ALL
SELECT 3 WHERE Condition3) CA(condition_flag)
GROUP BY condition_flag,
Date),
Pivoted
AS (SELECT Date,
MAX(CASE WHEN condition_flag = 1 THEN DistCount END) AS DistCount1,
MAX(CASE WHEN condition_flag = 2 THEN DistCount END) AS DistCount2,
MAX(CASE WHEN condition_flag = 3 THEN DistCount END) AS DistCount3
FROM DistCounts
GROUP BY Date)
SELECT lod.Date as Event,
DistCount1,
DistCount2,
DistCount3
from ListOfDates lod
left join Pivoted p on lod.Date=p.Date
I think you want OUTER APPLY:
SELECT lod.Date as Event, tej.*
From ListOfDates lod OUTER APPLY
(SELECT SUM(CASE WHEN <condition 1> THEN 1 ELSE 0 END) as col1,
SUM(CASE WHEN <condition 2> THEN 1 ELSE 0 END) as col2,
SUM(CASE WHEN <condition 3> THEN 1 ELSE 0 END) as col3
FROM TableofExtensiveJoins tej
WHERE lod.Date = tej.Date
) tej;
Assuming that tej.ID is unique, you don't need the COUNT(DISTINCT). However, if you do:
SELECT lod.Date as Event, tej.*
From ListOfDates lod OUTER APPLY
(SELECT COUNT(DISTINCT CASE WHEN <condition 1> THEN tej.ID END) as col1,
COUNT(DISTINCT CASE WHEN <condition 2> THEN tej.ID END) as col2,
COUNT(DISTINCT CASE WHEN <condition 3> THEN tej.ID END) as col3
FROM TableofExtensiveJoins tej
WHERE lod.Date = tej.Date
) tej;
This generalizes to whatever conditions you might have in the subqueries. As a bonus, lateral joins (the technical term for what APPLY is doing in this case) often have the best performance in SQL Server.
I have survey responses in a SQL database. Scores are 1-5.
Current format of the data table is this:
Survey_id, Question_1, Question_2, Question_3
383838, 1,1,1
392384, 1,5,4
393894, 4,3,5
I'm running a new query where I need the % of 4s, the % of 5s, etc. The question doesn't matter, just the overall rate.
At first glance I'm thinking
sum(iif(Question_1 =5,1,0)) + sum(iif(Question_2=5,1,0)) .... as total5s
sum(iif(Question_1=4,1,0)) + sum(iif(Question_2=4,1,0)) .... as total4s
But I am unsure if this is the quickest or most elegant way to achieve this.
EDIT: Hmm on first test this query already appears not to work correctly
EDIT2: I think I need sum instead of count in my example, will edit.
You have to unpivot the data and calculate the % responses thereafter. Because there are a limited number of questions, you can use union all to unpivot the data.
select 100.0*count(case when question=4 then 1 end)/count(*) as pct_4s
from (select survey_id,question_1 as question from tablename
union all
select survey_id,question_2 from tablename
union all
select survey_id,question_3 from tablename
) responses
Another way to do this could be
select 100.0*(count(case when question_1=4 then 1 end)
+count(case when question_2=4 then 1 end)
+count(case when question_3=4 then 1 end))
/(3*count(*))
from tablename
With UNPIVOT, as @Dudu suggested:
with unpivoted as (select *
from tablename
unpivot (response for question in (question_1,question_2,question_3)) u
)
select 100.0*count(case when response=4 then 1 end)/count(*)
from unpivoted
Here is my query:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
SHIPMENT_ITEMS is a very large table (10.1 TB); id_map is a very small table (12 rows and 3 columns). This query goes through a HASH JOIN RIGHT SEMI and takes a very long time. SHIPMENT_ITEMS is partitioned on the ID column.
If I replace the subquery with hard-coded values, it performs a lot better:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN (1,2,3 )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
I cannot remove the subquery, as that would mean hard-coding the values.
Given that id_map is a very small table, I expect both queries to perform very similarly. Why is the first one taking so much longer?
I'm actually trying to understand why this performs so badly. I expect dynamic partition pruning to happen here, and I cannot come up with a reason why it's not happening:
https://docs.oracle.com/cd/E11882_01/server.112/e25523/part_avail.htm#BABHDCJG
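For reference, this is how I look at the plan (Pstart/Pstop showing KEY would indicate dynamic pruning):
EXPLAIN PLAN FOR
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD');

SELECT * FROM table(DBMS_XPLAN.DISPLAY);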
Try the NO_UNNEST hint:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT /*+ NO_UNNEST */ ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
With this hint, the CBO will not try to unnest and join the subquery; it will use it as a filter instead.
Instead of using the IN operator, use EXISTS and check the query performance:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE EXISTS ( SELECT 1 FROM id_map map WHERE map.code = 'A' and map.ID = si.ID )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
I have been trying to optimize the performance of the following query, and would appreciate a hand and any suggestions from the experts in this field.
I have approx. 70k records, and my requirement is to remove duplicates. I need to improve the performance of the query below:
select *
from x.vw_records
where id not in
(select distinct id
from x.vw_datarecords
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords))
union
select distinct id
from x.vw_historyrecords
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords)
union
select distinct id
from x.vw_transactiondata
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords)
union
select distinct id
from x.vw_cashdata
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords);
Currently it takes ten minutes just to count the number of rows with count(*). Any ideas for tuning the performance of this query are welcome.
Thanks in advance.
I've always found better performance swapping out a NOT IN (subquery) for a LEFT JOIN plus WHERE ... IS NULL.
For example, instead of:
select *
from x.vw_records
where id not in (
select distinct id
from x.vw_datarecords
where effective_date >= trunc(sysdate - 30)
and book in (
select book_shortname from x.vw_datarecords
)
)
use:
select *
from x.vw_records vr
left join x.vw_datarecords vdr on vr.id = vdr.id
and effective_date >= trunc(sysdate - 30)
and book in (
select book_shortname from x.vw_datarecords
)
where vdr.id IS NULL
Additionally, you can sometimes get noticeably better performance by doing a GROUP BY rather than DISTINCT.
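For example, one of the DISTINCT branches rewritten with GROUP BY (a sketch against the views from the question):
select id
from x.vw_historyrecords
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords)
group by id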
I suspect you need indexes.
What indexes do you have on the tables involved in your query?
Also, time to learn how to use an "explain plan", which is an essential tool for query optimization. It isn't that hard to get one; the output may be a bit harder to understand, however. Please include the explain plan output with your question:
EXPLAIN PLAN FOR
<<Your SQL_Statement here>>
;
SET LINESIZE 130
SET PAGESIZE 0
SELECT * FROM table(DBMS_XPLAN.DISPLAY);
There is absolutely zero benefit to using SELECT DISTINCT together with UNION: UNION already removes duplicates, so do one or the other, not both.
Also, try using an EXISTS/NOT EXISTS clause in place of IN/NOT IN (http://www.techonthenet.com/sql/exists.php). That generally runs much faster.
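For example, the NOT IN branch of your query could become something like this (a sketch; it assumes vw_datarecords exposes an id column, as your original query implies):
select r.*
from x.vw_records r
where not exists (
    select 1
    from x.vw_datarecords d
    where d.id = r.id
    and d.effective_date >= trunc(sysdate - 30)
    and d.book in (select book_shortname from x.vw_datarecords)
)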
Sometimes when I write SQL I encounter the following situation:
select A = (
select sum(A)
--... big query using abc
),
B = (
select sum(B)
--... same big query using abc
)
from abc
Maybe it doesn't look great, but it's the only way I can think of in some situations. So the question is: the big query is repeated; perhaps there is a cleaner way to write the same thing?
Clarifications: abc is a bunch of joins. Using abc means using the current abc row's data. The big query is not the same as abc.
Outer apply will help here:
select *
from abc
outer apply (
select sum(a) as sumA, sum(b) as sumB
-- big query using abc
) sums
If the 'big query' is the same in all the subselects, can't you just do:
select sum(a), sum(b)
from abc
where ...big query
Can't be more helpful without a decent set of example data and a corresponding query.
Your query could be simplified to:
SELECT sum(a) as A, sum(b) as B
FROM abc
although I suspect you've oversimplified your situation.
It's hard to say what to do without seeing the actual query and what you are trying to achieve, but there are some approaches that might be useful:
1. Use a CTE or derived table for your big query (see the sketch below).
2. In some cases it can be replaced with a number of SUM(CASE WHEN [condition] THEN field END) expressions.
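A minimal sketch of option 1 (all names are placeholders):
with big_query as (
    -- the big query, written once
    select a, b
    from abc
    -- ... joins and filters here
)
select sum(a) as A, sum(b) as B
from big_query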
If A and B are fields, you can just put both sums in the query:
select sum(a), sum(b) from abc
If what you want to do is to aggregate the same rows depending on different conditions, you can often use case. Imagine you have a table TASKS with fields STATUS and EFFORT, and you want to count both ACTIVE and PASSIVE tasks, and get the total effort of each aggregate. You could do:
select
count(case when status = 'ACTIVE' then 1 end) active_nr,
sum(case when status = 'ACTIVE' then effort else 0 end) active_effort,
count(case when status = 'PASSIVE' then 1 end) passive_nr,
sum(case when status = 'PASSIVE' then effort else 0 end) passive_effort
from tasks;
This is a simple example, the predicates tested by case can be as complex as you need, involving multiple fields, etc. As a bonus, this approach will usually be nicer to the database.
select sum(A),sum(B)
--... big query using abc
from abc
No need to split it up.