sql: Select count(*) - nth record from each group - sql

I'm grouping by tenant_id. I want to select the count() - 1000th record (ordered by _updated time) from each GROUPBY group, for the groups where count() is greater than 1000. As follows:
select t1.tenant_id,
(select temp._updated
from trace temp
where temp.tenant_id = t1.tenant_id
order by _updated limit 1 offset
count(*) - 1000
) as timekey
from fgc.trace as t1
group by tenant_id
having count(*) > 1000;
But this is not allowed as count(*) cannot be used inside the subquery.
So I tried the following, which still doesn't work as I don't have access to t1 since this is not a join.
select t1.tenant_id,
(select temp._updated
from trace temp
where temp.tenant_id = t1.tenant_id
order by _updated limit 1 offset
(select count(*)-1000
from trace t2
group by tenant_id
having t2.tenant_id = t1.tenant_id)
) as timekey
from fgc.trace as t1
group by tenant_id
having count(*) > 1000;
So how can I get the following?
tenant_id | timekey
+-----------+----------------------------------+
n7ia6ryc | 2019-07-23 23:09:49.951406+00:00

You seem to want ROW_NUMBER(). Cockroach supports windows functions, so:
SELECT updated
FROM (
SELECT
tenant_id,
updated,
ROW_NUMBER() OVER(PARTITION BY tenant_id ORDER BY updated DESC) rn
FROM trace
) x WHERE rn = 1001
For each tenant_id, this will return the timestamp of the 1001th less recent record. If a given tenant has less than 1000 records, it will not appear in the results.

select x.tenant_id
from (
select t.tenant_id,
row_number() over (partition by t.tenant_id order by t.timekey) as tenant_number
from fgc.trace as t
) x
where x.tenant_number > 1000
group by x.tenant_id
just the one timestamp would look like this:
select min(x.timekey) as min_timestamp
from (
select t.tenant_id, t.timekey,
row_number() over (partition by t.tenant_id order by t.timekey) as tenant_number
from fgc.trace as t
) x
where x.tenant_number > 1000
note that grouping does not matter here because each row can only be in one group and you are only looking at one row.

Related

Exclude records where count > 5 and select top 1 of it

I want to exclude records where id > 5 then select the top 1 of it order by date. How can I achieve this? Each record has audit_line which is unique field for each record. Recent SQL script is on below:
SELECT *
FROM db.table
HAVING COUNT(id) > 5
If you want id > 5 then you want where:
select top (1) t.*
from db.table t
where id > 5
order by date;
You can use row-numbering for this.
Note that if you have no other column to order by, you can do ORDER BY (SELECT NULL), but then you may get different results on each run.
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY some_other_column) rn
FROM table
) t
WHERE rn = 5;

Delete Duplicate Rows in SQL

I have a table with unique id but duplicate row information.
I can find the rows with duplicates using this query
SELECT
PersonAliasId, StartDateTime, GroupId, COUNT(*) as Count
FROM
Attendance
GROUP BY
PersonAliasId, StartDateTime, GroupId
HAVING
COUNT(*) > 1
I can manually delete the rows while keeping the 1 I need with this query
Delete
From Attendance
Where Id IN(SELECT
Id
FROM
Attendance
Where PersonAliasId = 15
and StartDateTime = '9/24/2017'
and GroupId = 1429
Order By ModifiedDateTIme Desc
Offset 1 Rows)
I am not versed in SQL enough to figure out how to use the rows in the first query to delete the duplicates leaving behind the most recent. There are over 3481 records returned by the first query to do this one by one manually.
How can I find the duplicate rows like the first query and delete all but the most recent like the second?
You can use a Common Table Expression to delete the duplicates:
WITH Cte AS(
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;
This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.
Use the MAX aggregate function to identify the latest startdatetime for each group/person combination. Then delete records which do not have that latest time.
DELETE a
FROM attendance as a
INNER JOIN (
SELECT
PersonAliasId, MAX(StartDateTime) AS LatestTime, GroupId,
FROM
Attendance
GROUP BY
PersonAliasId, GroupId
HAVING
COUNT(*) > 1
) as b
on a.personaliasid=b.personaliasid and a.groupid=b.groupid and a.startdatetime < b.latesttime
Same as the CTE answer - give Felix the check
delete
from ( SELECT rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
) tt
where tt.rn > 1

Aggregate function like MAX for most common cell in column?

Group by the highest Number in a column worked great with MAX(), but what if I would like to get the cell that is at most common.
As example:
ID
100
250
250
300
200
250
So I would like to group by ID and instead of get the lowest (MIN) or highest (MAX) number, I would like to get the most common one (that would be 250, because there 3x).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the id's with the highest counts. This would handle cases when there are ties for the highest counts as well.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;

How to Select Top 100 rows in Oracle?

My requirement is to get each client's latest order, and then get top 100 records.
I wrote one query as below to get latest orders for each client. Internal query works fine. But I don't know how to get first 100 based on the results.
SELECT * FROM (
SELECT id, client_id, ROW_NUMBER() OVER(PARTITION BY client_id ORDER BY create_time DESC) rn
FROM order
) WHERE rn=1
Any ideas? Thanks.
Assuming that create_time contains the time the order was created, and you want the 100 clients with the latest orders, you can:
add the create_time in your innermost query
order the results of your outer query by the create_time desc
add an outermost query that filters the first 100 rows using ROWNUM
Query:
SELECT * FROM (
SELECT * FROM (
SELECT
id,
client_id,
create_time,
ROW_NUMBER() OVER(PARTITION BY client_id ORDER BY create_time DESC) rn
FROM order
)
WHERE rn=1
ORDER BY create_time desc
) WHERE rownum <= 100
UPDATE for Oracle 12c
With release 12.1, Oracle introduced "real" Top-N queries. Using the new FETCH FIRST... syntax, you can also use:
SELECT * FROM (
SELECT
id,
client_id,
create_time,
ROW_NUMBER() OVER(PARTITION BY client_id ORDER BY create_time DESC) rn
FROM order
)
WHERE rn = 1
ORDER BY create_time desc
FETCH FIRST 100 ROWS ONLY)
you should use rownum in oracle to do what you seek
where rownum <= 100
see also those answers to help you
limit in oracle
select top in oracle
select top in oracle 2
As Moneer Kamal said, you can do that simply:
SELECT id, client_id FROM order
WHERE rownum <= 100
ORDER BY create_time DESC;
Notice that the ordering is done after getting the 100 row. This might be useful for who does not want ordering.
Update:
To use order by with rownum you have to write something like this:
SELECT * from (SELECT id, client_id FROM order ORDER BY create_time DESC) WHERE rownum <= 100;
First 10 customers inserted into db (table customers):
select * from customers where customer_id <=
(select min(customer_id)+10 from customers)
Last 10 customers inserted into db (table customers):
select * from customers where customer_id >=
(select max(customer_id)-10 from customers)
Hope this helps....
To select top n rows updated recently
SELECT *
FROM (
SELECT *
FROM table
ORDER BY UpdateDateTime DESC
)
WHERE ROWNUM < 101;
Try this:
SELECT *
FROM (SELECT * FROM (
SELECT
id,
client_id,
create_time,
ROW_NUMBER() OVER(PARTITION BY client_id ORDER BY create_time DESC) rn
FROM order
)
WHERE rn=1
ORDER BY create_time desc) alias_name
WHERE rownum <= 100
ORDER BY rownum;
Or TOP:
SELECT TOP 2 * FROM Customers; //But not supported in Oracle
NOTE: I suppose that your internal query is fine. Please share your output of this.

oracle sql wih rownum <=

why below query is not giving results if I remove the < sign from query.Because even without < it must match with results?
Query used to get second max id value:
select min(id)
from(
select distinct id
from student
order by id desc
)
where rownum <=2
student id
1
2
3
4
Rownum has a special meaning in Oracle. It is increased with every row, but the optimizer knows that is increasing continuously and all consecutive rows must met the rownum condition. So if you specify rownum = 2 it will never occur since the first row is already rejected.
You can see this very nice if you do an explain plan on your query. It will show something like:
Plan for rownum <=:
COUNT STOPKEY
Plan for rownum =:
FILTER
A ROWNUM value is not assigned permanently to a row (this is a common misconception). A row in a table does not have a number; you cannot ask for row 2 or 3 from a table
click Here for more Info.
This is from the link provided:
Also confusing to many people is when a ROWNUM value is actually assigned. A ROWNUM value is assigned to a row after it passes the predicate phase of the query but before the query does any sorting or aggregation. Also, a ROWNUM value is incremented only after it is assigned, which is why the following query will never return a row:
select *
from t
where ROWNUM > 1;
Because ROWNUM > 1 is not true for the first row, ROWNUM does not advance to 2. Hence, no ROWNUM value ever gets to be greater than 1. Consider a query with this structure:
select ..., ROWNUM
from t
where <where clause>
group by <columns>
having <having clause>
order by <columns>;
I think this is the query you are looking for:
select id
from (select distinct id
from student
order by id desc
) t
where rownum <= 2;
Oracle processes the rownum before the order by, so you need a subquery to get the first two rows. The min() was forcing an aggregation that returned only one result, but before the rownum was applied.
If you actually want only the second value, you need an additional layer of subqueries:
select min(id)
from (select id
from (select distinct id
from student
order by id desc
) t
where rownum <= 2
) t;
However, I would do:
select id
from (select id, dense_rank() over (order by id) as seqnum
from student
) t
where seqnum = 2;
Order asc instead of desc
select id from student where rownum <=2 order by id asc;
Why not just use
select id
from ( select distinct id
, row_number() over (order by id desc) x
from student
)
where x = 2
Or even really bad. Getting the count and index :)
select id
from ( select id
, row_number() over (order by id desc) idx
, sum(1) over (order by null) cnt
from student
group
by id
)
where idx = cnt - 1 -- get the pre-last
Or
where idx = cnt - 2 -- get the 2nd-last
Or
where idx = 3 -- get the 3rd
Try this
SELECT *
FROM (
SELECT id, row_number() over (order by id asc) row_num
FROM student
) AS T
WHERE row_num = 2 -- or 3 ... n
ROW_NUMBER