Oracle - NVL(col1,col2) Order By slowness

The SELECT clause contains the column NVL(b.name, a.name), and I use this column in the ORDER BY, which has made the Oracle query slow.
I tried creating an index on the NAME column, but it did not help.
SELECT
*
FROM
(
SELECT
nvl(b.name,a.name) AS b_a_name, -- Order by is using this column and hence the slowness. Index is present on NAME column but of no use
b.name b_name,
a.name a_name
FROM
employee a
LEFT JOIN employee b ON a.parent_id = b.child_id
)
ORDER BY b_a_name --- this Order By is taking time
;
How can I tune the ORDER BY clause, or rewrite the query so that it returns the same output with better performance?

You can write your query without the subquery, though I am not sure whether it will improve performance:
SELECT nvl(b.name,a.name) AS b_a_name,
b.name b_name,
a.name a_name
FROM
employee a
LEFT JOIN employee b ON a.parent_id = b.child_id
order by b_a_name

How about removing NVL from ORDER BY?
SELECT NVL(b.name, a.name) AS b_a_name,
b.name a_name,
a.name b_name
FROM employee a
LEFT JOIN employee b ON a.parent_id = b.child_id
ORDER BY b.name, a.name;
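Note that this does not sort the rows the same way as ordering by NVL(b.name, a.name): rows where b.name is NULL are grouped together (last, with Oracle's default ascending order) instead of being interleaved by a.name.
Anyway, ORDER BY will slow things down. An unordered set is generally retrieved faster because the sort step is skipped.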
By the way, why did you do that with column aliases? To confuse the enemy? Well, you confused me.
b.name a_name --> shouldn't that be b_name
a.name b_name --> a_name
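Sorting by b.name, a.name is not guaranteed to return rows in the same order as sorting by NVL(b.name, a.name). The following is a minimal sketch of that difference, not a benchmark. It assumes made-up employee rows, uses Python's sqlite3 module so it can run anywhere, substitutes COALESCE for Oracle's NVL, and adds the term `b.name IS NULL` to imitate Oracle's default of placing NULLs last in ascending order.

```python
import sqlite3

# Hypothetical sample data: Miller has no parent, so b.name is NULL for that row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, parent_id INTEGER, child_id INTEGER);
    INSERT INTO employee VALUES
        ('Miller', NULL, 1),
        ('Adams',  1,    2),
        ('Young',  2,    3),
        ('Clark',  3,    4);
""")

# COALESCE stands in for Oracle's NVL (assumption: same two-argument semantics).
query = """
    SELECT COALESCE(b.name, a.name) AS b_a_name
    FROM employee a
    LEFT JOIN employee b ON a.parent_id = b.child_id
    ORDER BY {order_by}
"""

def names(order_by):
    """Run the query with the given ORDER BY list and return b_a_name values."""
    return [row[0] for row in conn.execute(query.format(order_by=order_by))]

# Order by the expression itself (a.name only breaks ties deterministically).
by_expression = names("COALESCE(b.name, a.name), a.name")

# Order by the two columns; "b.name IS NULL" pushes NULLs last, as Oracle does by default.
by_columns = names("b.name IS NULL, b.name, a.name")

print(by_expression)
print(by_columns)
```

With this data the two lists differ, because the row whose b.name is NULL is moved to the end instead of being sorted by its a.name value. Whether that matters depends on whether the caller needs the combined-name order.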

It turned out that the time was spent on I/O, not on the ORDER BY (CPU time). I confirmed this by loading the query's output into a table and then applying the ORDER BY to that table: almost all of the time went into writing the data to the table. Response time is 0.2 seconds but I/O time is 6 seconds, which can't be reduced, although a parallel hint may help.

Related

How is my SQL subquery evaluating to more than one row? When run independently, it works fine, but doesn't work in a SELECT subquery

I am obtaining values from two tables. When I run the subquery in its own PROC SQL statement in SAS, it runs fine, with the count of citations for each ID. When I input the subquery into my SELECT outer query, it gives me ERROR: Subquery evaluated to more than one row. I am having a hard time determining the cause of this issue.
The subquery should result in one row of count of citations per ID. I am trying to get the count of citations (per ID) into my outer query. Not all items from B will be in A (hence the left join on B).
SELECT
A.AREA
,A.NAME
,B.ID
,(
SELECT
COUNT(B.TYPE)
FROM
EVAL.CITATIONS AS B
GROUP BY
B.ID
)
AS COUNT_CITATIONS
FROM
EVAL.OCT AS A
LEFT JOIN EVAL.CITATIONS
ON A.DBA = B.NAME
ORDER BY A.NAME ASC
;
I expected the outer query to pull the counts for the citations per ID. The citations are coming from table B (which I'm using to left join into table A). I have been searching forums for this error and I understand that my query is resulting in more than one row, but I can't figure out why the outer query is not simply pulling the counts I need from ID when the left join completes.
After researching some similar questions, I also tried adding this WHERE clause to the subquery, to no avail.
WHERE FACID = CDPH_CITATIONS.FACID
You need to use a correlated subquery, where your subquery references your main query, e.g.:
SELECT
A.AREA
,A.NAME
,B.ID
,(
SELECT
COUNT(C.TYPE)
FROM
EVAL.CITATIONS AS C
WHERE B.ID = C.ID
GROUP BY
C.ID
)
AS COUNT_CITATIONS
FROM
EVAL.OCT AS A
LEFT JOIN EVAL.CITATIONS AS B
ON A.DBA = B.NAME
ORDER BY A.NAME ASC
However, I don't think you need a subquery at all: you can just count the B records and group by the other columns in your main query.
Your query fails because the subquery in the SELECT clause returns more than one row: it returns the count for each and every ID in the citations table, while you want just the count for the "current" ID. There is also a problem with the scoping of the table aliases (e.g. B is defined in the subquery but used in the outer query).
To avoid that, we can correlate the subquery with the outer query (and fix the table aliases):
SELECT o.AREA, o.NAME, c.ID,
(SELECT COUNT(c1.TYPE) FROM EVAL.CITATIONS AS c1 WHERE c1.ID = c.ID) AS COUNT_CITATIONS
FROM EVAL.OCT AS o
LEFT JOIN EVAL.CITATIONS AS c ON c.NAME = o.DBA
ORDER BY o.NAME ASC
This can be optimized further with window functions: we do not need to reopen the citations table in a subquery, because we can perform a window count instead. This is usually more efficient since the table is scanned only once. So:
SELECT o.AREA, o.NAME, c.ID,
COUNT(c.TYPE) OVER(PARTITION BY c.ID) AS COUNT_CITATIONS
FROM EVAL.OCT AS o
LEFT JOIN EVAL.CITATIONS AS c ON o.DBA = c.NAME
ORDER BY o.NAME ASC
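To check that the correlated subquery and the window count return the same numbers, here is a small sketch under stated assumptions. The rows are invented, the tables are created without the EVAL schema, and the queries run in Python's sqlite3 module (which needs SQLite 3.25 or later for window functions) rather than in SAS or another engine.

```python
import sqlite3

# Hypothetical data: 'Gamma Inn' has no citations at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oct (area TEXT, name TEXT, dba TEXT);
    CREATE TABLE citations (id INTEGER, name TEXT, type TEXT);
    INSERT INTO oct VALUES
        ('North', 'Alpha Care', 'ALPHA'),
        ('South', 'Beta Home',  'BETA'),
        ('East',  'Gamma Inn',  'GAMMA');
    INSERT INTO citations VALUES
        (1, 'ALPHA', 'A'),
        (1, 'ALPHA', 'B'),
        (2, 'BETA',  'A');
""")

# Correlated subquery: the inner query is restricted to the current outer ID.
correlated = conn.execute("""
    SELECT o.area, o.name, c.id,
           (SELECT COUNT(c1.type) FROM citations AS c1 WHERE c1.id = c.id) AS count_citations
    FROM oct AS o
    LEFT JOIN citations AS c ON c.name = o.dba
    ORDER BY o.name
""").fetchall()

# Window count: counts the joined rows that share the same citation ID.
windowed = conn.execute("""
    SELECT o.area, o.name, c.id,
           COUNT(c.type) OVER (PARTITION BY c.id) AS count_citations
    FROM oct AS o
    LEFT JOIN citations AS c ON o.dba = c.name
    ORDER BY o.name
""").fetchall()

for row in windowed:
    print(row)
```

The two results match here because each citation name joins to exactly one row of the outer table. The window version counts joined rows, so if several outer rows matched the same citation, its counts would be multiplied while the correlated subquery's would not.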
You are getting multiple values because you did not tell it to only run the subquery for the single value of ID that matched value on the current observation of the outer query.
But in SAS there is no need to get so tricky. SAS will happily re-merge the aggregate values back onto all observations that share the group by variable values.
proc sql;
SELECT
A.NAME
,A.AREA
,B.ID
,COUNT(B.TYPE) AS COUNT_CITATIONS
FROM EVAL.OCT AS A
LEFT JOIN EVAL.CITATIONS B
ON A.DBA = B.NAME
GROUP BY ID
ORDER BY NAME
;
quit;

Optimize a Group By that is done on a joined table in Azure SQL Data Warehouse

I've been working on optimizing a query that I've been given, and I'm curious whether there is any way to speed it up.
This isn't the exact query that I am using but I think it serves to show what I am working with.
(I'm using a data warehouse for the following)
SELECT sum(a), b, c, d FROM
(
SELECT a, tb.b, tc.c, td.d FROM table_a ta
LEFT JOIN table_b tb on ta.g = tb.g
LEFT JOIN table_c tc on ta.f = tc.f
LEFT JOIN table_d td on ta.e = td.e
) AS t
GROUP BY b, c, d
I have the inner query quite optimized, so I don't think there is any issue there. The problem is that when I attempt a GROUP BY on these values, it slows down quite significantly. I'm pretty sure the query is having trouble distributing the work for the GROUP BY clause, since nothing is indexed within the subquery.
Any suggestions?

SQL Help- in rewriting a query

How can we reduce the execution time of the query below?
I need help rewriting it in a more efficient way.
SELECT A.*, C.*, F.*, D.*
FROM TABLE1 A INNER JOIN
TABLE2 C
ON A.CODE = C.CODE INNER JOIN
TABLE3 D
ON A.CODE = D.CODE INNER JOIN
TABLE4 F
ON A.CODE = F.CODE
WHERE D.IND1 = 'N' AND
D.IND2 = 'N' AND
D.EFF_DATE = (SELECT MAX(X.EFF_DATE)
FROM TABLE3 X
WHERE X.CODE = D.CODE AND X.EFF_DATE <= A.EFFECTIVE_DATE
) AND
F.EFF_DATE = (SELECT MAX(Z.EFF_DATE)
FROM TABLE4 Z
WHERE Z.DETAIL_CODE = F.DETAIL_CODE AND Z.EFF_DATE <= A.EFFECTIVE_DATE
)
For performance, I would start with indexes on:
TABLE3(IND1, IND2, CODE, EFF_DATE)
TABLE3(CODE, EFF_DATE)
TABLE1(CODE, EFFECTIVE_DATE)
TABLE2(CODE)
TABLE4(CODE)
TABLE4(DETAIL_CODE, EFF_DATE)
If you have performance issues, though, I suspect your code may be generating unexpected Cartesian products. Debugging that requires much more information. I might suggest asking another question.
If you can find out where the bottlenecks in your query are -- i.e. sub-queries, joins -- that will give you a better idea of what to look at. In the absence of that, take a look at:
modify your column projections (i.e. A.*, C.*, F.*, D.*) to only return the columns you need
look at table partitioning for the queries accessing rows based on DATE values (TABLE3.EFF_DATE, TABLE4.EFF_DATE) (http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56partition-090450.html)
look at adding materialized view(s) either on the entire query OR the sub-queries (https://oracle-base.com/articles/misc/materialized-views)
look at statistic generation if the query plan is not optimal (https://docs.oracle.com/cd/A97630_01/server.920/a96533/stats.htm#26713)
If you can provide an EXPLAIN plan (or Oracle's equivalent), that would be helpful.
Note that because of the conditions in the two subqueries, all the records in your result will have D.EFF_DATE <= A.EFFECTIVE_DATE and F.EFF_DATE <= A.EFFECTIVE_DATE, so I would suggest putting those conditions in the JOIN clauses.
Secondly, analytic functions may give better performance than subqueries:
SELECT *
FROM (
SELECT A.*,C.*,F.*,D.*,
RANK() OVER (PARTITION BY D.CODE
ORDER BY D.EFF_DATE DESC) AS D_RANK,
RANK() OVER (PARTITION BY F.DETAIL_CODE
ORDER BY F.EFF_DATE DESC) AS F_RANK
FROM TABLE1 A
INNER JOIN TABLE2 C
ON A.CODE = C.CODE
INNER JOIN TABLE3 D
ON A.CODE = D.CODE
AND D.EFF_DATE <= A.EFFECTIVE_DATE
INNER JOIN TABLE4 F
ON A.CODE = F.CODE
AND F.EFF_DATE <= A.EFFECTIVE_DATE
WHERE D.IND1 = 'N'
AND D.IND2 = 'N'
)
WHERE D_RANK = 1 AND F_RANK = 1
Evidently you need to have the right indexes to optimise the execution plan.
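The "latest row on or before a date" pattern behind both versions can be tried on a cut-down example. This is only a sketch under assumptions: it keeps two of the four tables, drops the IND1/IND2 filters, uses invented rows with ISO date strings, and runs in Python's sqlite3 module (SQLite 3.25 or later) instead of Oracle.

```python
import sqlite3

# Hypothetical data: one TABLE1 row per code, several dated TABLE3 rows per code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (code TEXT, effective_date TEXT);
    CREATE TABLE table3 (code TEXT, eff_date TEXT);
    INSERT INTO table1 VALUES ('X', '2020-06-01'), ('Y', '2020-06-01');
    INSERT INTO table3 VALUES
        ('X', '2020-01-01'),
        ('X', '2020-05-01'),
        ('X', '2020-09-01'),
        ('Y', '2019-01-01');
""")

# Correlated MAX subquery, as in the original query.
with_subquery = conn.execute("""
    SELECT a.code, d.eff_date
    FROM table1 a
    INNER JOIN table3 d ON a.code = d.code
    WHERE d.eff_date = (SELECT MAX(x.eff_date)
                        FROM table3 x
                        WHERE x.code = d.code
                          AND x.eff_date <= a.effective_date)
    ORDER BY a.code
""").fetchall()

# Analytic version: date condition moved into the join, newest row ranked first.
with_rank = conn.execute("""
    SELECT code, eff_date
    FROM (
        SELECT a.code AS code, d.eff_date AS eff_date,
               RANK() OVER (PARTITION BY d.code ORDER BY d.eff_date DESC) AS d_rank
        FROM table1 a
        INNER JOIN table3 d
            ON a.code = d.code
           AND d.eff_date <= a.effective_date
    ) AS ranked
    WHERE d_rank = 1
    ORDER BY code
""").fetchall()

print(with_subquery)
print(with_rank)
```

Both queries pick the 2020-05-01 row for X and the 2019-01-01 row for Y. The match relies on TABLE1 having a single row per code: if several TABLE1 rows shared a CODE with different EFFECTIVE_DATE values, partitioning by D.CODE alone would rank their joined rows together, and the partition would also need a key from TABLE1.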
Another thing to consider is the total number of columns your query returns: you seem to be selecting all the columns from 4 tables.
We found that our complex queries ran in under a second when selecting only a few columns but took orders of magnitude longer when selecting many columns.
Ask yourself why you need so many columns in your result set.

SQL Server query performance tuning with group by and join clause

We have been experiencing performance problems with a job, and fortunately I was able to find the query causing the slowness.
select name from Student a, Student_Temp b
where a.id = b.id and
a.name in (select name from Student
group by name having count(*) = #sno)
group by a.name having count(*) = #sno
OPTION (MERGE JOIN, LOOP JOIN)
This particular query is called iteratively many times, which slows the job down.
The Student table has 8 million records, and Student_Temp receives 5-20 records in each iteration.
The Student table has a composite primary key on (id, name),
and sno = number of records in Student_Temp.
My questions are:
1) Why does this query perform poorly?
2) Is there a more efficient way of writing it?
Thanks in advance!
It's repeating the same logic unnecessarily. You really just want:
Of the Student(s) who also exist in Student_temp
what names exist #sno times?
Try this:
SELECT
name
FROM
Student a JOIN
Student_Temp b ON a.id = b.id
GROUP BY
name
HAVING
count(*) = #sno
Your query returns the following result: give me all names that occur #sno times in the table Student and exactly once in Student_temp.
You can rewrite the query like this:
SELECT a.name
FROM Student a
INNER JOIN Student_temp b
ON a.id = b.id
GROUP BY a.name
HAVING COUNT(*) = #sno
You should omit the query hint unless you are absolutely sure that the query optimizer screws up.
EDIT: There is of course a difference between the queries: if for instance #sno=2, then a name that shows up once in Student but twice in Student_temp would be included in my query but not in the original. It depends on what you really want to achieve whether that needs to be addressed or not.
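The difference described in the EDIT can be reproduced on a few rows. This is a sketch under assumptions: the data is invented, the #sno placeholder is replaced by a bound parameter named :sno, the OPTION hint is dropped, and the queries run in Python's sqlite3 module rather than SQL Server.

```python
import sqlite3

# Hypothetical data: 'Ann' appears once in Student, 'Bob' appears twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (id INTEGER, name TEXT, PRIMARY KEY (id, name));
    CREATE TABLE Student_Temp (id INTEGER);
    INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Bob');
""")

# Original query from the question (without the join hint).
ORIGINAL = """
    SELECT a.name
    FROM Student a, Student_Temp b
    WHERE a.id = b.id
      AND a.name IN (SELECT name FROM Student
                     GROUP BY name HAVING COUNT(*) = :sno)
    GROUP BY a.name
    HAVING COUNT(*) = :sno
"""

# Simplified rewrite: join, group, and filter once.
REWRITE = """
    SELECT a.name
    FROM Student a
    INNER JOIN Student_Temp b ON a.id = b.id
    GROUP BY a.name
    HAVING COUNT(*) = :sno
"""

def compare(temp_ids):
    """Reload Student_Temp with temp_ids and return (original, rewrite) results."""
    conn.execute("DELETE FROM Student_Temp")
    conn.executemany("INSERT INTO Student_Temp VALUES (?)", [(i,) for i in temp_ids])
    params = {"sno": len(temp_ids)}  # sno = number of rows in Student_Temp
    original = [row[0] for row in conn.execute(ORIGINAL, params)]
    rewrite = [row[0] for row in conn.execute(REWRITE, params)]
    return original, rewrite

same = compare([2, 3])    # distinct ids: both queries return Bob
differ = compare([1, 1])  # repeated id: only the rewrite returns Ann
print(same)
print(differ)
```

When Student_Temp holds distinct ids the two queries agree. When the same id is repeated, the rewrite counts the duplicated join rows and returns a name that the original rejects, which is the case the EDIT warns about.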
Here you go
select name
from Student a
inner join Student_Temp b
on a.id = b.id
group by a.name
HAVING COUNT(*) = #sno

Rewrite SQL and use of group by

I have written the SQL below for one of our requirements, and it fetches the results I need. But I am wondering whether there is a better way of writing this query than using the aliased subquery A.
SELECT A.*,B.OPRDEFNDESC FROM
( select OPRID_ENTERED_BY ,COUNT(*)
from ps_req_hdr
where entered_dt > '01-JUL-2012'
GROUP BY OPRID_ENTERED_BY
ORDER BY COUNT(*) DESC) A, PSOPRDEFN B
WHERE A.OPRID_ENTERED_BY=B.OPRID
You may be able to use a simple INNER JOIN to do the same thing...
SELECT A.OPRID_ENTERED_BY, COUNT(*), B.OPRDEFNDESC
FROM ps_req_hdr A
JOIN PSOPRDEFN B ON A.OPRID_ENTERED_BY = B.OPRID
WHERE A.entered_dt > '01-JUL-2012'
GROUP BY A.OPRID_ENTERED_BY, B.OPRDEFNDESC
ORDER BY COUNT(*) DESC
NOTE
As per the comments below, the COUNT(*) result for this query will NOT include records that don't have corresponding matches in table B, and it will inflate for non-unique matches in table B. What this means is: if B.OPRID is not a unique field or if A.OPRID_ENTERED_BY is not a foreign key for B.OPRID then this answer will not yield the same results as the original query.
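The inflation mentioned in the note can be seen on a tiny data set. This is a sketch under assumptions: the rows are invented, OPRID is deliberately duplicated in the lookup table, dates are stored as ISO strings instead of Oracle dates, the ORDER BY is changed to the operator ID for a stable output, and the queries run in Python's sqlite3 module.

```python
import sqlite3

# Hypothetical data: U2 appears twice in psoprdefn, U3 does not appear at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ps_req_hdr (oprid_entered_by TEXT, entered_dt TEXT);
    CREATE TABLE psoprdefn (oprid TEXT, oprdefndesc TEXT);
    INSERT INTO ps_req_hdr VALUES
        ('U1', '2012-01-01'),
        ('U1', '2012-08-01'),
        ('U1', '2012-09-01'),
        ('U2', '2012-08-15'),
        ('U3', '2012-08-20');
    INSERT INTO psoprdefn VALUES
        ('U1', 'User One'),
        ('U2', 'User Two'),
        ('U2', 'User Two');
""")

# Original shape: aggregate first, then join to the lookup table.
aggregate_then_join = conn.execute("""
    SELECT a.oprid_entered_by, a.cnt, b.oprdefndesc
    FROM (SELECT oprid_entered_by, COUNT(*) AS cnt
          FROM ps_req_hdr
          WHERE entered_dt > '2012-07-01'
          GROUP BY oprid_entered_by) a
    JOIN psoprdefn b ON a.oprid_entered_by = b.oprid
    ORDER BY a.oprid_entered_by
""").fetchall()

# Rewritten shape: join first, then aggregate.
join_then_aggregate = conn.execute("""
    SELECT a.oprid_entered_by, COUNT(*), b.oprdefndesc
    FROM ps_req_hdr a
    JOIN psoprdefn b ON a.oprid_entered_by = b.oprid
    WHERE a.entered_dt > '2012-07-01'
    GROUP BY a.oprid_entered_by, b.oprdefndesc
    ORDER BY a.oprid_entered_by
""").fetchall()

print(aggregate_then_join)
print(join_then_aggregate)
```

For U2 the first query returns two rows with a count of 1, while the second returns a single row with a count of 2. If OPRID is unique in the lookup table, the two shapes return the same rows.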