Oracle SQL Query taking longer with OR statement

I have a query that takes no time at all:
select count(*) from mytable where processed_status = 0 and tid not in
(select max(tid) from mytable group by userid)
tid is an auto-incremented unique identifier. I'm grabbing all the rows from mytable that are not the latest row based on userid. These are duplicate rows and I'm discarding them. Now I'm adding another filter to grab a specific row as well as all of the rows from the above query. I run the following query and it runs for 10 min before I kill it.
select count(*) from mytable where processed_status = 0 and (tid = 5 or tid not in
(select max(tid) from mytable group by userid))
If there is a better way to grab all the duplicate rows, I would be interested in some ideas as well.

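You can use the ROW_NUMBER() analytic function. Number the rows over the whole table and apply the processed_status filter in the outer query, so that the latest row per userid is determined the same way as in your MAX(tid) subquery:
SELECT COUNT(*)
FROM (
SELECT tid,
processed_status,
ROW_NUMBER() OVER ( PARTITION BY userid ORDER BY tid DESC ) AS rn
FROM mytable
)
WHERE processed_status = 0
AND ( tid = 5 OR rn > 1 )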
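As a rough check of the idea, the sketch below runs the NOT IN query and a ROW_NUMBER() version against a tiny in-memory SQLite table through Python's sqlite3 module. SQLite (3.25 or later for window functions) stands in for Oracle here, and the sample rows are made up. The row numbers are computed over all rows and processed_status is filtered outside, which mirrors the original subquery taking MAX(tid) over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (tid INTEGER PRIMARY KEY, userid INTEGER, processed_status INTEGER)"
)
# Hypothetical sample data: (tid, userid, processed_status)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 0), (2, 10, 0), (3, 10, 1), (4, 20, 0), (5, 20, 0), (6, 30, 0)],
)

# Original formulation from the question
not_in_sql = """
SELECT COUNT(*) FROM mytable
WHERE processed_status = 0
  AND (tid = 5 OR tid NOT IN (SELECT MAX(tid) FROM mytable GROUP BY userid))
"""

# Window-function formulation: rn = 1 marks the latest row per userid
window_sql = """
SELECT COUNT(*)
FROM (
    SELECT tid, processed_status,
           ROW_NUMBER() OVER (PARTITION BY userid ORDER BY tid DESC) AS rn
    FROM mytable
)
WHERE processed_status = 0 AND (tid = 5 OR rn > 1)
"""

not_in_count = conn.execute(not_in_sql).fetchone()[0]
window_count = conn.execute(window_sql).fetchone()[0]
print(not_in_count, window_count)
```

Both counts agree on this data; whether the window version is actually faster on Oracle depends on the execution plan, so it is worth comparing plans on the real table.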

Related

How do I select 1 [oldest] row per group of rows, given multiple groups?

Let's say we have the database table below, called USER_JOBS.
I'd like to write an SQL query that reflects this algorithm:
Divide the whole table in groups of rows defined by a common USER_ID (in the example table, the 2 resulting groups are colored yellow & green)
From each group, select the oldest row (according to SCHEDULE_TIME)
From this example table, the desired SQL query would return these 2 rows:
You can use a ranking function (supported in most RDBMSs). Order ascending so that row 1 is the oldest row in each group:
SELECT *
FROM
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY USER_ID ORDER BY SCHEDULE_TIME ASC) AS RowID
FROM [table]
) t
WHERE RowID = 1
Or, with a common table expression and RANK():
WITH Ranked AS (
SELECT
RANK() OVER (PARTITION BY USER_ID ORDER BY SCHEDULE_TIME ASC) AS Ranking,
*
FROM [table_name]
)
SELECT Status, Sob_Type, User_ID, TimeStamp FROM Ranked WHERE Ranking = 1;
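To illustrate the oldest-row-per-group pattern, here is a minimal sketch using SQLite through Python's sqlite3 module. The example table from the question is not available, so the job_id column and the sample rows are invented for illustration; only USER_ID and SCHEDULE_TIME come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# job_id is a hypothetical extra column; dates are stored as ISO text so they sort correctly
conn.execute("CREATE TABLE user_jobs (job_id INTEGER, user_id INTEGER, schedule_time TEXT)")
conn.executemany(
    "INSERT INTO user_jobs VALUES (?, ?, ?)",
    [
        (1, 100, "2020-01-03"),
        (2, 100, "2020-01-01"),
        (3, 200, "2020-01-05"),
        (4, 200, "2020-01-02"),
        (5, 200, "2020-01-04"),
    ],
)

# Ascending order puts the oldest row of each user at rn = 1
oldest = conn.execute(
    """
    SELECT job_id, user_id, schedule_time
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY schedule_time ASC) AS rn
        FROM user_jobs
    )
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()
print(oldest)
```

With ROW_NUMBER() exactly one row per user comes back even when two rows share the oldest time; RANK() would return all tied rows.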

How to group and pick only certain values based on a field using select query SQL

I have a table as follows:
ID | ORDERNO
1 | 123
1 | 123
2 | 456
2 | 456
During every select query done via application using JDBC, only the grouped records based on ORDERNO should be picked.
That means, for example, that the first select query should return only the rows with ID = 1. We cannot specify the ID in the WHERE clause, because we do not know how many IDs there will be in future. So each query should yield only one set of records; the application deletes those records after picking them, so the next select query picks up the next set. How can this be achieved?
You can use TOP WITH TIES for this
SELECT TOP (1) WITH TIES
t.ID,
t.ORDERNO
FROM YourTable t
ORDER BY
t.ID;
If you want to select and delete at the same time you could delete using an OUTPUT clause
WITH cte AS (
SELECT TOP (1) WITH TIES
t.ID,
t.ORDERNO
FROM YourTable t
ORDER BY
t.ID
)
DELETE cte
OUTPUT deleted.*;
As one option you could select on the MIN(ID) like:
SELECT *
FROM yourtable
WHERE ID = (SELECT MIN(ID) FROM yourtable);
You could also use window functions to do this:
SELECT ID, ORDERNO
FROM
(
SELECT ID, ORDERNO,
DENSE_RANK() OVER (ORDER BY ID ASC) AS dr
FROM yourtable
)dt
WHERE dr = 1;
Order your rows and select the top rows, including ties:
select top (1) with ties ID, ORDERNO
from tablename
order by ID asc
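The pick-then-delete cycle described in the question can be sketched as follows. SQLite has no TOP ... WITH TIES, so this sketch uses the DENSE_RANK() variant, run through Python's sqlite3 module. The helper name pick_next_batch is hypothetical, and the select and delete are two separate statements here, so a real application would need to wrap them in a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ID INTEGER, ORDERNO INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(1, 123), (1, 123), (2, 456), (2, 456)],
)


def pick_next_batch(conn):
    """Return every row that shares the lowest ID, then delete those rows."""
    batch = conn.execute(
        """
        SELECT ID, ORDERNO
        FROM (
            SELECT ID, ORDERNO, DENSE_RANK() OVER (ORDER BY ID ASC) AS dr
            FROM yourtable
        ) dt
        WHERE dr = 1
        """
    ).fetchall()
    if batch:
        # All rows in the batch have the same ID, so one delete removes the set
        conn.execute("DELETE FROM yourtable WHERE ID = ?", (batch[0][0],))
    return batch


first = pick_next_batch(conn)
second = pick_next_batch(conn)
third = pick_next_batch(conn)
print(first, second, third)
```

Each call returns the next set without the caller ever naming an ID, and an empty list signals that nothing is left.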

sql: Select count(*) - nth record from each group

I'm grouping by tenant_id. I want to select the count() - 1000th record (ordered by _updated time) from each GROUPBY group, for the groups where count() is greater than 1000. As follows:
select t1.tenant_id,
(select temp._updated
from trace temp
where temp.tenant_id = t1.tenant_id
order by _updated limit 1 offset
count(*) - 1000
) as timekey
from fgc.trace as t1
group by tenant_id
having count(*) > 1000;
But this is not allowed as count(*) cannot be used inside the subquery.
So I tried the following, which still doesn't work as I don't have access to t1 since this is not a join.
select t1.tenant_id,
(select temp._updated
from trace temp
where temp.tenant_id = t1.tenant_id
order by _updated limit 1 offset
(select count(*)-1000
from trace t2
group by tenant_id
having t2.tenant_id = t1.tenant_id)
) as timekey
from fgc.trace as t1
group by tenant_id
having count(*) > 1000;
So how can I get the following?
tenant_id | timekey
+-----------+----------------------------------+
n7ia6ryc | 2019-07-23 23:09:49.951406+00:00
You seem to want ROW_NUMBER(). CockroachDB supports window functions, so:
SELECT tenant_id, _updated AS timekey
FROM (
SELECT
tenant_id,
_updated,
ROW_NUMBER() OVER(PARTITION BY tenant_id ORDER BY _updated DESC) rn
FROM trace
) x WHERE rn = 1001
For each tenant_id, this returns the timestamp of the 1001st most recent record. If a given tenant has fewer than 1001 records, it will not appear in the results.
select x.tenant_id
from (
select t.tenant_id,
row_number() over (partition by t.tenant_id order by t._updated) as tenant_number
from fgc.trace as t
) x
where x.tenant_number > 1000
group by x.tenant_id
Just the one timestamp would look like this:
select min(x._updated) as min_timestamp
from (
select t.tenant_id, t._updated,
row_number() over (partition by t.tenant_id order by t._updated) as tenant_number
from fgc.trace as t
) x
where x.tenant_number > 1000
Note that there is no GROUP BY here, so this returns a single timestamp. That is fine when only one tenant has more than 1000 rows; otherwise select x.tenant_id as well and group by it.
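A small sketch of the "count - N th record per group" idea, using SQLite through Python's sqlite3 module in place of CockroachDB. N is set to 3 instead of 1000 so the made-up sample data stays short; the table and column names follow the question.

```python
import sqlite3

N = 3  # stands in for the 1000 used in the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trace (tenant_id TEXT, _updated INTEGER)")
# Hypothetical data: tenant a has 5 rows, b has 2, c has 4 (integers stand in for timestamps)
data = (
    [("a", t) for t in range(1, 6)]
    + [("b", t) for t in range(1, 3)]
    + [("c", t) for t in range(10, 14)]
)
conn.executemany("INSERT INTO trace VALUES (?, ?)", data)

# Counting from the newest row, position N + 1 is the (count - N)th row from the oldest
result = conn.execute(
    """
    SELECT tenant_id, _updated
    FROM (
        SELECT tenant_id, _updated,
               ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY _updated DESC) AS rn
        FROM trace
    ) x
    WHERE rn = ?
    ORDER BY tenant_id
    """,
    (N + 1,),
).fetchall()
print(result)
```

Tenant b has no row at position N + 1, so it drops out without needing a HAVING clause.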

Delete Duplicate Rows in SQL

I have a table with unique id but duplicate row information.
I can find the rows with duplicates using this query
SELECT
PersonAliasId, StartDateTime, GroupId, COUNT(*) as Count
FROM
Attendance
GROUP BY
PersonAliasId, StartDateTime, GroupId
HAVING
COUNT(*) > 1
I can manually delete the rows while keeping the 1 I need with this query
Delete
From Attendance
Where Id IN(SELECT
Id
FROM
Attendance
Where PersonAliasId = 15
and StartDateTime = '9/24/2017'
and GroupId = 1429
Order By ModifiedDateTIme Desc
Offset 1 Rows)
I am not versed enough in SQL to figure out how to use the rows from the first query to delete the duplicates while leaving behind the most recent. The first query returns over 3,481 records, far too many to do one by one manually.
How can I find the duplicate rows like the first query and delete all but the most recent like the second?
You can use a Common Table Expression to delete the duplicates:
WITH Cte AS(
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;
This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.
Use the MAX aggregate function to identify the latest startdatetime for each group/person combination. Then delete records which do not have that latest time.
DELETE a
FROM attendance as a
INNER JOIN (
SELECT
PersonAliasId, MAX(StartDateTime) AS LatestTime, GroupId
FROM
Attendance
GROUP BY
PersonAliasId, GroupId
HAVING
COUNT(*) > 1
) as b
on a.personaliasid=b.personaliasid and a.groupid=b.groupid and a.startdatetime < b.latesttime
Same as the CTE answer - give Felix the check
delete tt
from ( SELECT rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
) tt
where tt.rn > 1
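For engines that cannot delete through a CTE or derived table, the same ROW_NUMBER() ranking can feed an Id IN (...) filter instead. The sketch below shows that swapped-in variant on SQLite through Python's sqlite3 module; it is not the T-SQL syntax from the answers, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Attendance (
        Id INTEGER PRIMARY KEY,
        PersonAliasId INTEGER,
        StartDateTime TEXT,
        GroupId INTEGER,
        ModifiedDateTime TEXT
    )
    """
)
# Ids 1-3 are duplicates of each other; Id 2 was modified most recently
conn.executemany(
    "INSERT INTO Attendance VALUES (?, ?, ?, ?, ?)",
    [
        (1, 15, "2017-09-24", 1429, "2017-09-24 10:00"),
        (2, 15, "2017-09-24", 1429, "2017-09-25 09:00"),
        (3, 15, "2017-09-24", 1429, "2017-09-24 12:00"),
        (4, 16, "2017-09-24", 1429, "2017-09-24 08:00"),
        (5, 15, "2017-10-01", 1429, "2017-10-01 08:00"),
    ],
)

# Rank rows inside each duplicate group, newest first, and delete everything after rank 1
conn.execute(
    """
    DELETE FROM Attendance
    WHERE Id IN (
        SELECT Id
        FROM (
            SELECT Id,
                   ROW_NUMBER() OVER (
                       PARTITION BY PersonAliasId, StartDateTime, GroupId
                       ORDER BY ModifiedDateTime DESC
                   ) AS rn
            FROM Attendance
        )
        WHERE rn > 1
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT Id FROM Attendance ORDER BY Id")]
print(remaining)
```

Running the inner SELECT on its own first is a cheap way to preview which rows would be removed.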

oracle sql with rownum <=

Why does the query below return no results if I remove the < sign, i.e. use rownum = 2 instead of rownum <= 2? Even without the <, it should still match a row, shouldn't it?
Query used to get second max id value:
select min(id)
from(
select distinct id
from student
order by id desc
)
where rownum <=2
student id
1
2
3
4
ROWNUM has a special meaning in Oracle. It is incremented for every row that is returned, and the optimizer knows that it increases continuously, so all consecutive rows must meet the ROWNUM condition. If you specify rownum = 2, the condition can never be satisfied, because the first row is already rejected.
You can see this clearly if you run an explain plan on your query. It will show something like:
Plan for rownum <=:
COUNT STOPKEY
Plan for rownum =:
FILTER
A ROWNUM value is not assigned permanently to a row (this is a common misconception). A row in a table does not have a number; you cannot ask for row 2 or 3 from a table.
Click here for more info.
This is from the link provided:
Also confusing to many people is when a ROWNUM value is actually assigned. A ROWNUM value is assigned to a row after it passes the predicate phase of the query but before the query does any sorting or aggregation. Also, a ROWNUM value is incremented only after it is assigned, which is why the following query will never return a row:
select *
from t
where ROWNUM > 1;
Because ROWNUM > 1 is not true for the first row, ROWNUM does not advance to 2. Hence, no ROWNUM value ever gets to be greater than 1. Consider a query with this structure:
select ..., ROWNUM
from t
where <where clause>
group by <columns>
having <having clause>
order by <columns>;
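The assignment rule quoted above can be mimicked with a few lines of plain Python. This is only a simplified model of the described behaviour, not how Oracle is implemented; the function name rownum_filter is hypothetical, and it does not model the early stop that COUNT STOPKEY performs.

```python
def rownum_filter(rows, predicate):
    """Simplified model of ROWNUM: each candidate row is offered the current
    number, and the counter advances only when the row passes the predicate."""
    rownum = 1
    kept = []
    for row in rows:
        if predicate(rownum):
            kept.append((rownum, row))
            rownum += 1  # incremented only after a row is accepted
    return kept


ids = [4, 3, 2, 1]  # e.g. the student ids already sorted descending by a subquery

le_two = rownum_filter(ids, lambda rn: rn <= 2)  # models: where rownum <= 2
eq_two = rownum_filter(ids, lambda rn: rn == 2)  # models: where rownum = 2
gt_one = rownum_filter(ids, lambda rn: rn > 1)   # models: where rownum > 1

print(le_two, eq_two, gt_one)
```

In this model the first candidate always sees the value 1, so any predicate that rejects 1 rejects every row, which is why rownum = 2 and rownum > 1 come back empty.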
I think this is the query you are looking for:
select id
from (select distinct id
from student
order by id desc
) t
where rownum <= 2;
Oracle processes the rownum before the order by, so you need a subquery to get the first two rows. The min() was forcing an aggregation that returned only one result, but before the rownum was applied.
If you actually want only the second value, you need an additional layer of subqueries:
select min(id)
from (select id
from (select distinct id
from student
order by id desc
) t
where rownum <= 2
) t;
However, I would do the following, ordering descending so that rank 2 is the second-highest id:
select id
from (select id, dense_rank() over (order by id desc) as seqnum
from student
) t
where seqnum = 2;
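A quick sketch of the DENSE_RANK() approach for the second-highest id, again on SQLite through Python's sqlite3 module rather than Oracle. The duplicate 4 in the sample data is an addition for illustration, to show that tied values share a rank; DISTINCT in the outer query collapses any repeats of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER)")
# ids from the question plus one hypothetical duplicate of the highest value
conn.executemany("INSERT INTO student VALUES (?)", [(1,), (2,), (3,), (4,), (4,)])

# Descending order: both 4s get rank 1, so rank 2 is the second-highest distinct id
second_max = conn.execute(
    """
    SELECT DISTINCT id
    FROM (
        SELECT id, DENSE_RANK() OVER (ORDER BY id DESC) AS seqnum
        FROM student
    ) t
    WHERE seqnum = 2
    """
).fetchall()
print(second_max)
```

ROW_NUMBER() would number the two 4s as 1 and 2 and so report 4 as the "second" value, which is the reason for preferring DENSE_RANK() when duplicates are possible.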
Order asc instead of desc
select id from student where rownum <=2 order by id asc;
Why not just use
select id
from ( select distinct id
, row_number() over (order by id desc) x
from student
)
where x = 2
Or even really bad. Getting the count and index :)
select id
from ( select id
, row_number() over (order by id desc) idx
, sum(1) over (order by null) cnt
from student
group
by id
)
where idx = cnt - 1 -- get the pre-last
Or
where idx = cnt - 2 -- get the 2nd-last
Or
where idx = 3 -- get the 3rd
Try this
SELECT *
FROM (
SELECT id, row_number() over (order by id asc) row_num
FROM student
) T
WHERE row_num = 2 -- or 3 ... n