Avoiding Correlated Subquery in Oracle - sql

In Oracle 9.2.0.8, I need to return a record set where a particular field (LAB_SEQ) is at a maximum (it is a sequential VARCHAR array '0001', '0002', etc.) for each of another field (WO_NUM). To select the maximum, I am attempting to order in descending order and select the first row. Everything I can find on StackOverflow suggests that the only way to do this is with a correlated subquery. Then I use this maximum in the WHERE clause of the outer query to get the row I want for each WO_NUM:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT LAB_SEQ FROM (
SELECT lab.LAB_SEQ FROM LAB_TIM lab WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM ORDER BY ROWNUM DESC
) WHERE ROWNUM=1
)
However, this returns an invalid identifier for lt.WO_NUM error. Research suggests that ORacle 8 only allows correlated subqueries one level deep, and suggests rewriting to avoid the subquery - something which discussion of selecting maximums suggests can't be done. Any help getting this statement to execute would be greatly appreciated.

Your correlated subquery would need to be something like
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT max(lab.LAB_SEQ)
FROM LAB_TIM lab
WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM
)
Since you are on Oracle 9.2, it will probably be more efficient to use a correlated subquery. I'm not sure what the predicates lab.CCN='1' AND MAS_LOC='1' are doing in your current query so I'm not quite sure how to translate them into the analytic function approach. Is the combination of LAB_SEQ and WO_NUM not unique in LAB_TIM? Do you need to add in the predicates on CCN and MAS_LOC in order to get a single unique row for every WO_NUM? Or are you using those predicates to decrease the number of rows in your output? The basic approach will be something like
SELECT *
FROM (SELECT lt.WO_NUM,
lt.EMP_NUM,
lt.LAB_END_DATE,
lt.LAB_END_TIME,
rank() over (partition by wo_num
order by lab_seq desc) rnk
FROM LAB_TIM lt)
WHERE rnk = 1
but it's not clear to me whether CCN and MAS_LOC need to be added to the ORDER BY clause in the analytic function or whether they need to be added to the WHERE clause.

This is one case where a correlated subquery is better, particularly if you have indexes on the table. However, it should be possible to rewrite correlated subqueries as joins.
I think the following is equivalent, without the correlated subquery:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM (select *, rownum as r
from LAB_TIM lt
) lt join
(select wo_num, max(r) as maxrownum
from (select LAB_SEQ, wo_num, rownum as r
from LAB_TIM lt
where lab.CCN = '1' AND MAS_LOC = '1'
)
) ltsum
on lt.wo_num = ltsum.wo_num and
lt.r = ltsum.maxrownum
I'm a little unsure about how Oracle works with rownums in things like ORDER BY.

Related

Get minimum without using row number/window function in Bigquery

I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see just the two columns (subject_id and id) is enough to group the items together. They will help differentiate the group. But why am I not able to use the other columns in select clause. If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(1)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After you concatenate the arrays, you want the first element. The .* transforms the record referred to by a to the component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you an easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not required to select Time_1 column's value, this following query will work (As I can see values in column min_time and max_time is same for the same group)-
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach is if you can apply something like CAST(Time_1 AS DATE) on your time column. This will consider only the date part regardless of the time part. The query will be
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Make sure the syntax of CAST AS DATE
-- in BigQuery is as I written here or bit different.
Below is for BigQuery Standard SQL and is most efficient way for such cases like in your question
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases lead to Resources exceeded error.
Note: self join is also very ineffective way of achieving your objective
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id

Optimize Query (remove subquery)

Can you help me to optimize this Query ?. I need to remove the subquery because the performance is awful.
select LICENSE,
(select top 1 SERVICE_KEY
from SERVICES
where SERVICES.LICENSE = VEHICLE.LICENSE
order by DATE desc, HOUR desc)
from VEHICLE
The problem is that I can have two SERVICES on the same DATE and HOUR, so I haven't been able to code an equivalent SQL avoiding the subquery.
The query runs on a Legacy database where I can't modify its metadata, and it doesn't have any index at all. That's the reason to look for a solution that can avoid a correlated query.
Thank you.
You can express your query using ROW_NUMBER() without the need for a correlated subquery. Try the following query and see how the peformance is:
SELECT t.LICENSE, t.SERVICE_KEY
FROM
(
SELECT t1.LICENSE, t1.SERVICE_KEY
ROW_NUMBER() OVER (PARTITION BY t1.LICENSE
ORDER BY t2.DATE DESC, t2.HOUR DESC) rn
FROM VEHICLE t1
INNER JOIN SERVICES t2
ON t1.LICENSE = t2.LICENSE
) t
WHERE t.rn = 1
The performance of this query would depend, among other things, on having indices on the join columns of your two tables.

PostgreSQL: Select only the first record per id based on sort order

For the following query I need to select only the first record with the lowest shape_type value (ranges from 1 to 10). If you have any knowledge on how to easily do this is postgresql, please help. Thanks for your time.
select g.geo_id, gs.shape_type
from schema.geo g
join schema.geo_shape gs on (g.geo_id=gs.geo_id)
order by gs.shape_type asc;
PostgreSQL have very nice syntax for this types of queries - distinct on:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the "first row" of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first.
So your query becomes:
select distinct on(g.geo_id)
g.geo_id, gs.shape_type
from schema.geo g
join schema.geo_shape gs on (g.geo_id=gs.geo_id)
order by g.geo_id, gs.shape_type asc;
In general ANSI-SQL syntax for this (in any RDBMS with window functions and common table expression, which could be switched to subquery) would be:
with cte as (
select
row_number() over(partition by g.geo_id order by gs.shape_type) as rn,
g.geo_id, gs.shape_type
from schema.geo g
join schema.geo_shape gs on (g.geo_id=gs.geo_id)
)
select
geo_id, shape_type
from cte
where rn = 1

Order by not working in Oracle subquery

I'm trying to return 7 events from a table, from todays date, and have them in date order:
SELECT ID
FROM table
where ID in (select ID from table
where DATEFIELD >= trunc(sysdate)
order by DATEFIELD ASC)
and rownum <= 7
If I remove the 'order by' it returns the IDs just fine and the query works, but it's not in the right order. Would appreciate any help with this since I can't seem to figure out what I'm doing wrong!
(edit) for clarification, I was using this before, and the order returned was really out:
select ID
from TABLE
where DATEFIELD >= trunc(sysdate)
and rownum <= 7
order by DATEFIELD
Thanks
The values for the ROWNUM "function" are applied before the ORDER BY is processed. That why it doesn't work the way you used it (See the manual for a similar explanation)
When limiting a query using ROWNUM and an ORDER BY is involved, the ordering must be done in an inner select and the limit must be applied in the outer select:
select *
from (
select *
from table
where datefield >= trunc(sysdate)
order by datefield ASC
)
where rownum <= 7
You cannot use order by in where id in (select id from ...) kind of subquery. It wouldn't make sense anyway. This condition only checks if id is in subquery. If it affects the order of output, it's only incidental. With different data query execution plan might be different and output order would be different as well. Use explicit order by at the end of the main query.
It is well known 'feature' of Oracle that rownum doesn't play nice with order by. See http://www.adp-gmbh.ch/ora/sql/examples/first_rows.html for more information. In your case you should use something like:
SELECT ID
FROM (select ID, row_number() over (order by DATEFIELD ) r
from table
where DATEFIELD >= trunc(sysdate))
WHERE r <= 7
See also:
http://www.orafaq.com/faq/how_does_one_select_the_top_n_rows_from_a_table
http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
http://asktom.oracle.com/pls/asktom/f?p=100:11:507524690399301::::P11_QUESTION_ID:127412348064
See also other similar questions on SO, eg.:
Oracle SELECT TOP 10 records
Oracle/SQL - Select specified range of sequential records
Your outer query cant "see" the ORDER in the inner query and in this case the order in the inner doesn't make sense because it (the inner) is only being used to create a subset of data that will be used on the WHERE of the outer one, so the order of this subset doesn't matter.
maybe if you explain better what you want to do, we can help you
ORDER BY CLAUSE IN Subqueries:
the order by clause is not allowed inside a subquery, with the exception of the inline views. If attempt to include an ORDER BY clause, you receive an error message
An inline View is a query at the from clause.
SELECT t.*
FROM (SELECT id, name FROM student) t

Optimize sql query with the rank function

This query gets the top item in each group using the ranking function.
I want to reduce the number of inner selects down to two instead of three. I tried using the rank() function in the innermost query, but couldn't get it working along with an aggregate function. Then I couldn't use a where clause on 'itemrank' without wrapping it in yet another select statement.
Any ideas?
select *
from (
select
tmp.*,
rank() over (partition by tmp.slot order by slot, itemcount desc) as itemrank
from (
select
i.name,
i.icon,
ci.slot,
count(i.itemid) as itemcount
from items i
inner join citems ci on ci.itemid = i.itemid
group by i.name, i.icon, ci.slot
) as tmp
) as popularitems
where itemrank = 1
EDIT: using sql server 2008
In Oracle and Teradata (and perhaps others too), you can use QUALIFY itemrank = 1 to get rid of the outer select. This is not part of the ANSI standard.
You can use Common Table Expressions in Oracle or in SQL Server.
Here is the syntax:
WITH expression_name [ ( column_name [,...n] ) ]
AS
( CTE_query_definition )
The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
The statement to run the CTE is:
SELECT <column_list>
FROM expression_name;