Oracle SQL ANY comparison with subquery raises right parenthesis missing error

The query works fine with the ANY operator when comparing against a list of literal values:
SELECT Name, ID
from tblABC
where ID = ANY (1,2,3,4,5 )
But when a subquery is used for the ANY comparison, a "right parenthesis missing" error occurs:
SELECT Name, ID
from tblABC
where ID = ANY (select ID from tblXYZ where ROWNUM <= 10 order by ID desc )
The subquery is only meant to give the ten most recent ID entries from the selected table. Should there be a conversion to number, or is a condition missing in this query?

The reason is the ORDER BY, which is not necessary anyway: it is evaluated after COUNT STOPKEY (the ROWNUM < <constant> filter), so it would not give you the ten largest IDs even if it worked, as the plan below shows.
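For illustration, the plan was generated by first running a statement of the same shape against a demo table t (the statement is the one echoed in the plan output) and then displaying its cursor plan:
select /*+ gather_plan_statistics */ *
from t
where rownum < 5
order by 1 asc;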
select *
from table(dbms_xplan.display_cursor(format => 'BASIC +PREDICATE'));
| PLAN_TABLE_OUTPUT |
| :----------------------------------------------------------------------- |
| EXPLAINED SQL STATEMENT: |
| ------------------------ |
| select /*+ gather_plan_statistics */ * from t where rownum < 5 order by |
| 1 asc |
| |
| Plan hash value: 846588679 |
| |
| ------------------------------------ |
| | Id | Operation | Name | |
| ------------------------------------ |
| | 0 | SELECT STATEMENT | | |
| | 1 | SORT ORDER BY | | |
| |* 2 | COUNT STOPKEY | | |
| | 3 | TABLE ACCESS FULL| T | |
| ------------------------------------ |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 2 - filter(ROWNUM<5) |
| |
If you are on Oracle 12c+, then you may use fetch first:
select *
from dual
where 1 = any(select l from t order by 1 asc fetch first 4 rows only)
| DUMMY |
| :---- |
| X |
Or row_number() for older versions:
select *
from dual
where 1 = any (
    select l
    from (
        select l, row_number() over(order by l asc) as rn
        from t
    )
    where rn < 5
)
| DUMMY |
| :---- |
| X |
db<>fiddle here

It is the ORDER BY part; it is not supported within subqueries like this.
Just remove it. You don't need it for the comparison anyway.
SELECT Name, ID
from tblABC
where ID = ANY (select ID from tblXYZ where ROWNUM <= 10 )
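Note that if the intent really is the ten largest IDs, ROWNUM alone will not do it, because ROWNUM is assigned before any ordering. On versions without FETCH FIRST, a sketch of the classic workaround is to order in an inline view first:
SELECT Name, ID
from tblABC
where ID = ANY (select ID
                from (select ID from tblXYZ order by ID desc)
                where ROWNUM <= 10)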

You can use FETCH FIRST <n> ROWS ONLY instead of using the old ROWNUM in the subquery.
For example:
SELECT Name, ID
from tblABC
where ID = ANY (select ID
                from tblXYZ
                order by ID desc
                fetch first 10 rows only)
See running example at db<>fiddle.

Related

SQL Server: How to select the latest record in each group? [duplicate]

| ID | TimeStamp | Item |
|----|-----------|------|
| 1 | 0:00:20 | 0 |
| 1 | 0:00:40 | 1 |
| 1 | 0:01:00 | 1 |
| 2 | 0:01:20 | 1 |
| 2 | 0:01:40 | 0 |
| 2 | 0:02:00 | 1 |
| 3 | 0:02:20 | 1 |
| 3 | 0:02:40 | 1 |
| 3 | 0:03:00 | 0 |
I have this table and I would like to turn it into:
| ID | TimeStamp | Item |
|----|-----------|------|
| 1 | 0:01:00 | 1 |
| 2 | 0:02:00 | 1 |
| 3 | 0:03:00 | 0 |
Please advise, thank you!
A correlated subquery is often the fastest method:
select t.*
from t
where t.timestamp = (select max(t2.timestamp)
                     from t t2
                     where t2.id = t.id
                    );
For this, you want an index on (id, timestamp).
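A sketch of that index, assuming the table is literally named t as in the query above (the index name is arbitrary):
create index t_id_timestamp_ix on t (id, timestamp);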
You can also use row_number():
select t.*
from (select t.*,
             row_number() over (partition by id order by timestamp desc) as seqnum
      from t
     ) t
where seqnum = 1;
This is typically a wee bit slower because it needs to assign the row number to every row, even those not being returned.
You need to partition by ID and order the TimeStamp values descending, so that the latest record of each group comes out ranked first (with value 1) in the subquery, with the help of an analytic function:
SELECT *
FROM
(
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY ID ORDER BY TimeStamp DESC) AS dr
    FROM t
) t
WHERE t.dr = 1
where the DENSE_RANK() analytic function is used so that records with ties are also included.
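To illustrate the ties point: if ID 3 hypothetically had two rows sharing the latest timestamp, both would come back with dr = 1, whereas the ROW_NUMBER() variant above would keep only one of them. A self-contained sketch with made-up data:
WITH t AS (
    SELECT 3 AS ID, '0:03:00' AS TimeStamp, 0 AS Item
    UNION ALL
    SELECT 3, '0:03:00', 1
)
SELECT *
FROM (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY ID ORDER BY TimeStamp DESC) AS dr
    FROM t
) t
WHERE t.dr = 1   -- returns both tied rows; ROW_NUMBER() would return only one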

Associate usage records with the corresponding usage plan in BigQuery

Customer's resource usage:
+-------+-------------+-----------------------+
| usage | customer_id | timestamp |
+-------+-------------+-----------------------+
| 10 | 1 | 2019-01-12T01:00:00 |
| 16 | 1 | 2019-02-12T02:00:00 |
| 26 | 1 | 2019-03-12T03:00:00 |
| 24 | 1 | 2019-04-12T04:00:00 |
| 4 | 1 | 2019-05-15T01:00:00 |
+-------+-------------+-----------------------+
This table shows the usage reported every hour for every customer. Minutes and seconds are always zero.
Customer's plan change log:
+--------+-------------+-----------------------+
| plan | customer_id | timestamp |
+--------+-------------+-----------------------+
| A | 1 | 2018-12-12T01:24:00 |
| B | 1 | 2019-01-12T02:31:00 |
| C | 1 | 2019-03-12T03:53:00 |
+--------+-------------+-----------------------+
When a customer changes his usage plan, the action is stored in a change log.
Result: associate each usage record with a usage plan.
+-------+-------------+--------+-----------------------+
| usage | customer_id | plan | timestamp |
+-------+-------------+--------+-----------------------+
| 10 | 1 | A | 2019-01-05T01:00:00 |
| 16 | 1 | B | 2019-02-12T02:00:00 |
| 26 | 1 | C | 2019-03-10T03:00:00 |
| 24 | 1 | C | 2019-04-12T04:00:00 |
| 4 | 1 | C | 2019-05-15T01:00:00 |
+-------+-------------+--------+-----------------------+
What I have tried: to determine the plan for a specific usage record, I take the timestamp of that record and look for the most recent plan change record in the usage plan log:
SELECT
    customer_id,
    plan,
    timestamp,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) seqnum
FROM
    `project.dataset.table`
WHERE seqnum = 1 AND timestamp <= timestamp_of_the_usage_record
However, I am not sure how to combine that with the usage table. I tried:
WITH log AS (
    SELECT
        customer_id,
        plan,
        timestamp,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) seqnum
    FROM
        `project.dataset.plan_change_log`
)
SELECT
    t1.customer_id,
    log.plan,
    t1.usage,
    t1.timestamp
FROM
    `project.dataset.usage` t1
FULL JOIN log
    ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp AND seqnum = 1
The result table has fewer rows than the original usage table because of the join condition. However, the number of rows should stay the same. Any ideas how to solve this problem?
You were on the right track, although the data in your example is a bit off for the first and third line of the end result.
with data as (
    SELECT
        t1.customer_id,
        log.plan,
        t1.usage,
        t1.timestamp,
        log.timestamp as logt,
        ROW_NUMBER() OVER (PARTITION BY t1.customer_id, t1.timestamp ORDER BY log.timestamp DESC) seqnum
    FROM
        resource t1
    FULL JOIN log
        ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp
)
select * from data where seqnum = 1
You want to create the sequence number on the result of the join, not before.
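As a side note, BigQuery also supports a QUALIFY clause, so the same idea can be sketched without the wrapping CTE, using the same table names as above and a LEFT JOIN so that every usage row is kept:
SELECT
    t1.customer_id,
    log.plan,
    t1.usage,
    t1.timestamp
FROM resource t1
LEFT JOIN log
    ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp
WHERE TRUE  -- BigQuery requires a WHERE/GROUP BY/HAVING clause alongside QUALIFY
QUALIFY ROW_NUMBER() OVER (PARTITION BY t1.customer_id, t1.timestamp
                           ORDER BY log.timestamp DESC) = 1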

How do I select rows with maximum value?

Given this table, I want to retrieve, for each distinct url, the row with the maximum count. For this table the output should be: 'dell.html' 3, 'lenovo.html' 4, 'toshiba.html' 5.
+----------------+-------+
| url | count |
+----------------+-------+
| 'dell.html' | 1 |
| 'dell.html' | 2 |
| 'dell.html' | 3 |
| 'lenovo.html' | 1 |
| 'lenovo.html' | 2 |
| 'lenovo.html' | 3 |
| 'lenovo.html' | 4 |
| 'toshiba.html' | 1 |
| 'toshiba.html' | 2 |
| 'toshiba.html' | 3 |
| 'toshiba.html' | 4 |
| 'toshiba.html' | 5 |
+----------------+-------+
What SQL query do I need to write to do this?
Try to use this query:
select url, max(count) as count
from table_name
group by url;
Use an aggregate function:
select max(count), url from table_name group by url
From your comments it seems you need a correlated subquery:
select t1.* from table_name t1
where t1.count = (select max(count) from table_name t2 where t2.url = t1.url)
If row_number() is supported by your SQLite version,
then you can write the query like below:
select * from
(
    select *, row_number() over(partition by url order by count desc) rn
    from table_name
) a where a.rn = 1

Optimization by replacing subquery with named subquery having INLINE hint

Let's have these two tables:
create table table_x(
    x_id varchar2(100) primary key
);
create table table_y(
    x_id varchar2(100) references table_x(x_id),
    stream varchar2(10),
    val_a number,
    val_b number
);
create index table_y_idx on table_y (x_id, stream);
Suppose we have millions of rows in each table and that table_y contains 0 to 10 rows per x_id.
The queries in the following examples return 200 rows with the filter substr(x_id, 2, 1) = 'B'.
The task is to optimize this query:
QUERY 1
select
    x.x_id,
    y.val_a,
    y.val_b
from table_x x
left join (select
               x_id,
               max(val_a) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_a,
               max(val_b) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_b
           from table_y
           group by x_id
          ) y on x.x_id = y.x_id
where substr(x.x_id, 2, 1) = 'B'; -- intentionally not using the primary key filter
------
PLAN 1
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 2400000 | 22698 | 00:04:33 |
| * 1 | HASH JOIN OUTER | | 10000 | 2400000 | 22698 | 00:04:33 |
| * 2 | TABLE ACCESS FULL | TABLE_X | 10000 | 120000 | 669 | 00:00:09 |
| 3 | VIEW | | 10692 | 2437776 | 22029 | 00:04:25 |
| 4 | SORT GROUP BY | | 10692 | 245916 | 22029 | 00:04:25 |
| 5 | TABLE ACCESS FULL | TABLE_Y | 1069200 | 24591600 | 19359 | 00:03:53 |
----------------------------------------------------------------------------------
* 1 - access("X"."X_ID"="Y"."X_ID"(+))
* 2 - filter(SUBSTR("X"."X_ID", 2, 1)='B')
There is a way to optimize this significantly: QUERY 2 returns rows 2-3 times faster than QUERY 1. The INLINE hint is critically important, because without it the second query performs as slowly as the first one.
QUERY 2
with
table_y_total as (
    select --+ INLINE
        x_id,
        max(val_a) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_a,
        max(val_b) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_b
    from table_y
    group by x_id
)
select
    x.x_id,
    (select val_a from table_y_total y where y.x_id = x.x_id) as val_a,
    (select val_b from table_y_total y where y.x_id = x.x_id) as val_b
from table_x x
where substr(x.x_id, 2, 1) = 'B';
------
PLAN 2
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 120000 | 669 | 00:00:09 |
| 1 | SORT GROUP BY NOSORT | | 1 | 19 | 103 | 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID | TABLE_Y | 100 | 1900 | 103 | 00:00:02 |
| * 3 | INDEX RANGE SCAN | TABLE_Y_IDX | 100 | | 3 | 00:00:01 |
| 4 | SORT GROUP BY NOSORT | | 1 | 20 | 103 | 00:00:02 |
| 5 | TABLE ACCESS BY INDEX ROWID | TABLE_Y | 100 | 2000 | 103 | 00:00:02 |
| * 6 | INDEX RANGE SCAN | TABLE_Y_IDX | 100 | | 3 | 00:00:01 |
| * 7 | TABLE ACCESS FULL | TABLE_X | 10000 | 120000 | 669 | 00:00:09 |
-----------------------------------------------------------------------------------------
* 3 - access("X_ID"=:B1)
* 6 - access("X_ID"=:B1)
* 7 - filter(SUBSTR("X"."X_ID", 2, 1)='B')
Since the first query has less code duplication, I would prefer to keep it.
Is there a hint or another trick to meet both of the following conditions?
keep the first query's code (QUERY 1)
force the optimizer to use the second plan (PLAN 2)
Perhaps you have oversimplified your code, but doesn't this do what you want:
select y.x_id,
       max(y.val_a) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_a,
       max(y.val_b) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_b
from table_y y
where substr(y.x_id, 2, 1) = 'B'
group by x_id;
I don't think the join to table_x is necessary, as you have framed the question.
Use an index hint:
select /*+ index(table_name index_name) */ * from table_name
Since the full scan on table_x is the cheapest part of the plan, there is an approach that filters it before joining table_y. Although the optimizer decides on a full scan of table_y by default, hinting with index(y) helps reduce the run time to about 110% of QUERY 2's.
with
table_x_filtered as (
    select x_id
    from table_x
    where substr(x_id, 2, 1) = 'B'
)
select /*+ index(y table_y_idx) */
    x.x_id,
    max(val_a) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_a,
    max(val_b) KEEP (DENSE_RANK FIRST ORDER BY stream) as val_b
from table_x_filtered x
left join table_y y on y.x_id = x.x_id
group by x.x_id;

SELECT only latest record of an ID from given rows

I have the table shown below... How do I select only the latest data for each id based on changeno?
+----+-------+----------+
| id | data  | changeno |
+----+-------+----------+
| 1  | Yes   | 1        |
| 2  | Yes   | 2        |
| 2  | Maybe | 3        |
| 3  | Yes   | 4        |
| 3  | Yes   | 5        |
| 3  | No    | 6        |
| 4  | No    | 7        |
| 5  | Maybe | 8        |
| 5  | Yes   | 9        |
+----+-------+----------+
I would want this result...
+----+-------+----------+
| id | data  | changeno |
+----+-------+----------+
| 1  | Yes   | 1        |
| 2  | Maybe | 3        |
| 3  | No    | 6        |
| 4  | No    | 7        |
| 5  | Yes   | 9        |
+----+-------+----------+
I currently have this SQL statement...
SELECT id, data, MAX(changeno) as changeno FROM Table1 GROUP BY id;
and clearly it doesn't return what I want. It should actually return an error because of the aggregate function. If I add the other fields to the GROUP BY clause it runs, but it still doesn't return what I want. This SQL statement is by far the closest I could think of. I'd appreciate it if anybody could help me on this. Thank you in advance :)
This is typically referred to as the "greatest-n-per-group" problem. One way to solve this in SQL Server 2005 and higher is to use a CTE with a calculated ROW_NUMBER() based on the grouping of the id column, and sorting those by largest changeno first:
;WITH cte AS
(
    SELECT id, data, changeno,
           rn = ROW_NUMBER() OVER (PARTITION BY id ORDER BY changeno DESC)
    FROM dbo.Table1
)
SELECT id, data, changeno
FROM cte
WHERE rn = 1
ORDER BY id;
You want to use row_number() for this:
select id, data, changeno
from (SELECT t.*,
             row_number() over (partition by id order by changeno desc) as seqnum
      FROM Table1 t
     ) t
where seqnum = 1;
Not a well-formed or performance-optimized query, but for small tasks it works fine.
SELECT * FROM TEST
WHERE changeno IN (SELECT MAX(changeno)
                   FROM TEST
                   GROUP BY id)
For other alternatives:
DECLARE @Table1 TABLE
(
    id INT, data VARCHAR(5), changeno INT
);
INSERT INTO @Table1
SELECT 1,'Yes',1
UNION ALL
SELECT 2,'Yes',2
UNION ALL
SELECT 2,'Maybe',3
UNION ALL
SELECT 3,'Yes',4
UNION ALL
SELECT 3,'Yes',5
UNION ALL
SELECT 3,'No',6
UNION ALL
SELECT 4,'No',7
UNION ALL
SELECT 5,'Maybe',8
UNION ALL
SELECT 5,'Yes',9

SELECT Y.id, Y.data, Y.changeno
FROM @Table1 Y
INNER JOIN (
    SELECT id, changeno = MAX(changeno)
    FROM @Table1
    GROUP BY id
) X ON X.id = Y.id
WHERE X.changeno = Y.changeno
ORDER BY Y.id