Get the id based on a condition in GROUP BY - SQL

I'm trying to create a SQL query that merges rows with equal dates. The merge should be based on the highest number of hours, so that in the end I get, for each date, the id of the row with the highest number of hours. I've been trying to do this with a simple GROUP BY, but it doesn't seem to work, since I can't just put an aggregate function on the id column: the id has to be chosen based on the hours condition.
+----+------------+-------+
| id | date       | hours |
+----+------------+-------+
| 1  | 2012-01-01 | 37    |
| 2  | 2012-01-01 | 10    |
| 3  | 2012-01-01 | 5     |
| 4  | 2012-01-02 | 37    |
+----+------------+-------+
Desired result:
+----+------------+-------+
| id | date       | hours |
+----+------------+-------+
| 1  | 2012-01-01 | 37    |
| 4  | 2012-01-02 | 37    |
+----+------------+-------+

If you want exactly one row per date, even if there are ties, then use row_number():
select t.*
from (select t.*, row_number() over (partition by date order by hours desc) as seqnum
from t
) t
where seqnum = 1;
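A minimal sketch of the row_number() approach above, run against the question's sample rows with Python's built-in sqlite3 module. SQLite stands in for Postgres or Oracle here, the table name t is an assumption, and window functions need SQLite 3.25 or later. The extra id sort key is an addition that makes the pick deterministic when two rows tie on hours.

```python
import sqlite3

# In-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2012-01-01", 37), (2, "2012-01-01", 10),
     (3, "2012-01-01", 5), (4, "2012-01-02", 37)],
)

# Number the rows within each date, highest hours first, and keep row 1.
# Sorting by id as a second key breaks ties on hours deterministically.
rows = conn.execute("""
    SELECT id, date, hours
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY date
                                    ORDER BY hours DESC, id) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY date
""").fetchall()
print(rows)  # [(1, '2012-01-01', 37), (4, '2012-01-02', 37)]
```

Dropping the id sort key leaves the choice among tied rows up to the database, which is usually fine but can vary between runs.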
Ironically, both Postgres and Oracle (the original tags) have what I would consider to be better ways of doing this, but they are quite different.
Postgres:
select distinct on (date) t.*
from t
order by date, hours desc;
Oracle:
select date, max(hours) as hours,
max(id) keep (dense_rank first order by hours desc) as id
from t
group by date;

Here's one approach using row_number:
select id, dt, hours
from (
select id, dt, hours, row_number() over (partition by dt order by hours desc) rn
from yourtable
) t
where rn = 1

You can use a correlated subquery:
select t.*
from mytable t
where id = (select t1.id
from mytable t1
where t1.date = t.date
order by t1.hours desc
limit 1);
In Oracle 12c and later, you can use fetch first 1 row only in the subquery instead of the LIMIT clause.
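A quick check of the correlated-subquery approach with Python's sqlite3 module. The table name mytable is hypothetical, and the query assumes id is unique, because the outer filter matches on id alone. No window functions are involved, so this also runs on older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the question's sample rows; id is unique.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "2012-01-01", 37), (2, "2012-01-01", 10),
     (3, "2012-01-01", 5), (4, "2012-01-02", 37)],
)

# For each outer row, the subquery returns the id of the row with the most
# hours on the same date; only the row whose id matches it survives.
rows = conn.execute("""
    SELECT t.*
    FROM mytable t
    WHERE t.id = (SELECT t1.id
                  FROM mytable t1
                  WHERE t1.date = t.date
                  ORDER BY t1.hours DESC, t1.id
                  LIMIT 1)
    ORDER BY t.date
""").fetchall()
print(rows)  # [(1, '2012-01-01', 37), (4, '2012-01-02', 37)]
```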

Related

How can I filter duplicates/repeated fields in bigquery?

I have a table without a primary key, and I am trying to get the events with the earliest date for each id.
This is what a small piece of mytable looks like:
+----+---------------------+-------------+
| id | date                | events      |
+----+---------------------+-------------+
| 1  | 2020-04-11 03:44:20 | call        |
| 3  | 2020-04-21 07:59:06 | appointment |
| 1  | 2020-04-17 01:14:32 | appointment |
| 2  | 2020-04-10 03:41:17 | feedback    |
| 1  | 2020-04-23 01:36:13 | appointment |
| 3  | 2020-04-12 04:55:38 | call        |
+----+---------------------+-------------+
This is the result I am looking for:
+----+---------------------+-------------+
| id | date                | events      |
+----+---------------------+-------------+
| 1  | 2020-04-11 03:44:20 | call        |
| 2  | 2020-04-10 03:41:17 | feedback    |
| 3  | 2020-04-12 04:55:38 | call        |
+----+---------------------+-------------+
I am trying to get the events for each id only at its MIN(date). The problem is that I have to SELECT events, which means adding events to the GROUP BY, so I can't GROUP BY id alone as I would like to.
I have tried a lot of different versions, but here is one:
SELECT id, MIN(date), events
FROM mydataset.mytable
GROUP BY id, events
Please keep in mind that my table is much larger than this.
Any help would be very much appreciated.
You can use aggregation:
select array_agg(t order by date asc limit 1)[ordinal(1)].*
from mydataset.mytable t
group by t.id;
Or the more traditional method of using row_number():
select t.* except (seqnum)
from (select t.*, row_number() over (partition by id order by date) as seqnum
from mydataset.mytable t
) t
where seqnum = 1;
You could rewrite what you have as an uncorrelated subquery:
select *
from mytable
where (id, date) in (select id, min(date)
from mytable
group by id);
If your DB supports window functions, you could also do:
select distinct id,
min(date) over(partition by id) date,
first_value(events) over (partition by id order by date asc) events
from mytable;
Output:
+----+---------------------+----------+
| id | date | events |
+----+---------------------+----------+
| 1 | 2020-04-11 03:44:20 | call |
| 2 | 2020-04-10 03:41:17 | feedback |
| 3 | 2020-04-12 04:55:38 | call |
+----+---------------------+----------+
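A sketch that runs both the uncorrelated-subquery and the window-function versions in Python's sqlite3 module and checks that they agree on the sample data. Storing dates as zero-padded ISO-8601 text (so string order matches time order) is an assumption of this sketch; the row-value IN needs SQLite 3.15+ and the window functions need 3.25+. The output aliases first_date and first_event are renamed from the answer's date and events so the ORDER BY inside the window cannot be confused with an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates as zero-padded ISO-8601 text, so string order matches time order.
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, events TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "2020-04-11 03:44:20", "call"),
     (3, "2020-04-21 07:59:06", "appointment"),
     (1, "2020-04-17 01:14:32", "appointment"),
     (2, "2020-04-10 03:41:17", "feedback"),
     (1, "2020-04-23 01:36:13", "appointment"),
     (3, "2020-04-12 04:55:38", "call")],
)

# Uncorrelated subquery: keep rows whose (id, date) is an id's earliest date.
via_in = conn.execute("""
    SELECT id, date, events
    FROM mytable
    WHERE (id, date) IN (SELECT id, MIN(date) FROM mytable GROUP BY id)
    ORDER BY id
""").fetchall()

# Window version: every row of an id carries the same MIN(date) and the
# event of its earliest row, so DISTINCT collapses them to one row per id.
via_window = conn.execute("""
    SELECT DISTINCT id,
           MIN(date) OVER (PARTITION BY id) AS first_date,
           FIRST_VALUE(events) OVER (PARTITION BY id ORDER BY date) AS first_event
    FROM mytable
    ORDER BY id
""").fetchall()

print(via_in == via_window)  # True
```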
A join to a derived table might perform better, especially if id and date are indexed:
select m.*
from mytable m
join (select id, min(date) date
from mytable
group by id ) x
on m.id = x.id
and m.date = x.date
;
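A minimal sketch of the derived-table join in Python's sqlite3 module, with a composite index on (id, date). The index name idx_mytable_id_date is hypothetical, and whether the planner actually uses it is up to SQLite; prefixing the query with EXPLAIN QUERY PLAN shows what it chose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, events TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "2020-04-11 03:44:20", "call"),
     (3, "2020-04-21 07:59:06", "appointment"),
     (1, "2020-04-17 01:14:32", "appointment"),
     (2, "2020-04-10 03:41:17", "feedback"),
     (1, "2020-04-23 01:36:13", "appointment"),
     (3, "2020-04-12 04:55:38", "call")],
)
# Composite index that the planner may use both for MIN(date) ... GROUP BY id
# and for looking rows back up by (id, date) in the join.
conn.execute("CREATE INDEX idx_mytable_id_date ON mytable (id, date)")

# Derived table x holds each id's earliest date; joining on both columns
# keeps only the matching full rows.
rows = conn.execute("""
    SELECT m.*
    FROM mytable m
    JOIN (SELECT id, MIN(date) AS date
          FROM mytable
          GROUP BY id) x
      ON m.id = x.id
     AND m.date = x.date
    ORDER BY m.id
""").fetchall()
print(rows)
```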
To build on Gordon's answer with Jones' comment:
The version below does not require an alias and lets you use just id in the GROUP BY:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY date LIMIT 1)[ORDINAL(1)]
FROM `project.dataset.table` t
GROUP BY id

Order By Id and Limit Offset By Id from a table

I have an issue similar to the following query:
select name, number, id
from tableName
order by id
limit 10 offset 5
But this takes only 10 rows from the whole result set, starting at offset 5.
Is there a way to apply the limit and offset per id?
For example, if I have this set:
| name | number | id                                   |
|------|--------|--------------------------------------|
| Ana  | 1      | 589d0011-ef54-4708-a64a-f85228149651 |
| Jana | 2      | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan  | 3      | 589d0011-ef54-4708-a64a-f85228149651 |
| Joe  | 2      | 64ed0011-ef54-4708-a64a-f85228149651 |
and I skip 1 row per id, I should get:
| name | number | id                                   |
|------|--------|--------------------------------------|
| Jana | 2      | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan  | 3      | 589d0011-ef54-4708-a64a-f85228149651 |
I think that you want to filter by row_number():
select name, number, id
from (
select t.*, row_number() over(partition by id order by number) rn
from mytable t
) t
where
rn > :number_of_records_per_group_to_skip
and rn <= :number_of_records_per_group_to_skip + :number_of_records_per_group_to_keep
The query ranks records by number within groups of records having the same id (row_number() starts at 1), and then filters using two parameters:
:number_of_records_per_group_to_skip: how many records per group should be skipped
:number_of_records_per_group_to_keep: how many records per group should be kept (after skipping :number_of_records_per_group_to_skip records)
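A minimal sketch of per-group paging with ROW_NUMBER() in Python's sqlite3 module, using the question's sample rows. The table name mytable, ranking by number within each id, and the short parameter names skip and keep are assumptions. Because row_number() starts at 1, skipping skip rows means keeping ranks skip + 1 through skip + keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, number INTEGER, id TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Ana", 1, "589d0011-ef54-4708-a64a-f85228149651"),
     ("Jana", 2, "589d0011-ef54-4708-a64a-f85228149651"),
     ("Jan", 3, "589d0011-ef54-4708-a64a-f85228149651"),
     ("Joe", 2, "64ed0011-ef54-4708-a64a-f85228149651")],
)

def page_per_group(skip, keep):
    # Rank rows inside each id group by number, then keep ranks
    # skip + 1 .. skip + keep (hypothetical helper for illustration).
    return conn.execute("""
        SELECT name, number, id
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY number) AS rn
              FROM mytable t) AS ranked
        WHERE rn > :skip AND rn <= :skip + :keep
        ORDER BY id, rn
    """, {"skip": skip, "keep": keep}).fetchall()

print(page_per_group(skip=1, keep=10))
```

With skip=1, the single-row group for Joe is skipped entirely, which matches the expected output in the question.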
This might not be the answer you are looking for, but it gives you the results your example shows:
select name, number, id
from (
select * from tableName
order by id
limit 3 offset 0
) d
where number > 1;
Best regards,
Bjarni

Select Top N rows plus another Select based on previous result

I am quite new to SQL.
I have an MS SQL Server database where I would like to fetch the top 3 rows with a datetime at or after a specific input, plus all the rows whose datetime value equals that of the last row from the first fetch.
| rowId | Timestamp | data |
|-------|--------------------------|------|
| rsg | 2019-01-01T00:00:00.000Z | 120 |
| zqd | 2020-01-01T00:00:00.000Z | 36 |
| ylp | 2020-01-01T00:00:00.000Z | 48 |
| abt | 2022-01-01T00:00:00.000Z | 53 |
| zio | 2022-01-01T00:00:00.000Z | 12 |
Here is my current query to fetch the 3 rows:
SELECT
TOP 3 *
FROM
Table
WHERE
Timestamp >= '2020-01-01T00:00:00.000Z'
ORDER BY
Timestamp ASC
Here I would like to get the last 4 rows in a single query.
Thanks for your help
One possibility, using ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY Timestamp) rn
FROM yourTable
WHERE Timestamp >= '2020-01-01T00:00:00.000Z'
)
SELECT *
FROM cte
WHERE
Timestamp <= (SELECT Timestamp FROM cte WHERE rn = 3);
Matching records should either be in the first 3 rows or have a timestamp equal to the timestamp of the third row. We can combine these conditions by restricting to timestamps equal to or before the timestamp of the third row.
Or maybe use TOP 3 WITH TIES:
SELECT TOP 3 WITH TIES *
FROM yourTable
WHERE Timestamp >= '2020-01-01T00:00:00.000Z'
ORDER BY Timestamp;
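SQLite has no TOP ... WITH TIES, so this sketch (Python's sqlite3 module, SQLite 3.25+) exercises only the ROW_NUMBER cut-off idea on the question's sample rows, with the lower bound applied before ranking. The helper top_n_with_ties is hypothetical, the rowId column is renamed row_id to keep it apart from SQLite's built-in rowid, and Timestamp is double-quoted as a plain identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# rowId renamed to row_id so it is not confused with SQLite's built-in rowid.
conn.execute('CREATE TABLE yourTable (row_id TEXT, "Timestamp" TEXT, data INTEGER)')
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("rsg", "2019-01-01T00:00:00.000Z", 120),
     ("zqd", "2020-01-01T00:00:00.000Z", 36),
     ("ylp", "2020-01-01T00:00:00.000Z", 48),
     ("abt", "2022-01-01T00:00:00.000Z", 53),
     ("zio", "2022-01-01T00:00:00.000Z", 12)],
)

def top_n_with_ties(start, n):
    # Rank only rows at or after `start`, then keep every row up to and
    # including the timestamp of the n-th ranked row, so ties come along.
    return conn.execute("""
        WITH cte AS (
            SELECT *, ROW_NUMBER() OVER (ORDER BY "Timestamp") AS rn
            FROM yourTable
            WHERE "Timestamp" >= :start
        )
        SELECT row_id, "Timestamp", data
        FROM cte
        WHERE "Timestamp" <= (SELECT "Timestamp" FROM cte WHERE rn = :n)
        ORDER BY "Timestamp", row_id
    """, {"start": start, "n": n}).fetchall()

print([r[0] for r in top_n_with_ties("2020-01-01T00:00:00.000Z", 3)])
# ['ylp', 'zqd', 'abt', 'zio']
```

One behavioural difference from TOP 3 WITH TIES: if fewer than n rows qualify, the scalar subquery returns NULL, the comparison is never true, and the query returns no rows at all.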

How to SELECT in SQL based on a value from the same table column?

I have the following table:
| id | date | team |
|----|------------|------|
| 1 | 2019-01-05 | A |
| 2 | 2019-01-05 | A |
| 3 | 2019-01-01 | A |
| 4 | 2019-01-04 | B |
| 5 | 2019-01-01 | B |
How can I query the table to receive the most recent values for the teams?
For example, the result for the above table would be ids 1,2,4.
In this case, you can use window functions:
select t.*
from (select t.*, rank() over (partition by team order by date desc) as seqnum
from t
) t
where seqnum = 1;
In some databases a correlated subquery is faster with the right indexes (I haven't tested this with Postgres):
select t.*
from t
where t.date = (select max(t2.date) from t t2 where t2.team = t.team);
And if you wanted only one row per team, then the canonical answer is:
select distinct on (t.team) t.*
from t
order by t.team, t.date desc;
However, that doesn't work in this case because you want all rows from the most recent date.
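A sketch in Python's sqlite3 module (SQLite 3.25+ for window functions) contrasting the answers above on the question's data: rank() keeps both rows tied on the latest date, the correlated MAX subquery returns the same ids, and row_number() keeps only one of the tied rows. The table name t and the helper latest_ids are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2019-01-05", "A"), (2, "2019-01-05", "A"), (3, "2019-01-01", "A"),
     (4, "2019-01-04", "B"), (5, "2019-01-01", "B")],
)

def latest_ids(ranking_fn):
    # ranking_fn is interpolated into the SQL text, so only pass fixed
    # names such as "RANK" or "ROW_NUMBER", never user input.
    sql = f"""
        SELECT id
        FROM (SELECT t.*,
                     {ranking_fn}() OVER (PARTITION BY team ORDER BY date DESC) AS seqnum
              FROM t) AS ranked
        WHERE seqnum = 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]

# Correlated subquery: keep rows whose date equals their team's maximum.
correlated = [row[0] for row in conn.execute("""
    SELECT id FROM t
    WHERE t.date = (SELECT MAX(t2.date) FROM t t2 WHERE t2.team = t.team)
    ORDER BY id
""")]

print(latest_ids("RANK"))             # [1, 2, 4]
print(correlated)                     # [1, 2, 4]
print(len(latest_ids("ROW_NUMBER")))  # 2: only one of ids 1 and 2 survives
```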
If your dataset is large, consider the max analytic function in a subquery:
with cte as (
select
id, date, team,
max (date) over (partition by team) as max_date
from t
)
select id
from cte
where date = max_date
Notionally, max is O(n), so it should be pretty efficient, although I don't know how PostgreSQL actually implements it.
One more possibility, generic:
select * from t join (select max(date) date,team from t
group by team) tt
using(date,team)
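A sketch checking the two queries above (the analytic MAX in a CTE and the generic join with USING) against the question's data in Python's sqlite3 module; the table name t is an assumption, and the window version needs SQLite 3.25+. Both keep every row tied on its team's latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2019-01-05", "A"), (2, "2019-01-05", "A"), (3, "2019-01-01", "A"),
     (4, "2019-01-04", "B"), (5, "2019-01-01", "B")],
)

# Analytic MAX: each row carries its team's latest date alongside it.
with_max = [row[0] for row in conn.execute("""
    WITH cte AS (
        SELECT id, date, team,
               MAX(date) OVER (PARTITION BY team) AS max_date
        FROM t
    )
    SELECT id FROM cte WHERE date = max_date ORDER BY id
""")]

# Generic version: join back to each team's maximum date on both columns.
with_join = [row[0] for row in conn.execute("""
    SELECT id
    FROM t
    JOIN (SELECT MAX(date) AS date, team FROM t GROUP BY team) tt
      USING (date, team)
    ORDER BY id
""")]

print(with_max, with_join)  # [1, 2, 4] [1, 2, 4]
```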
A window function is the best solution for you.
select id
from (
select team, id, rank() over (partition by team order by date desc) as row_num
from t
) t
where row_num = 1
That query will return this table:
| id |
|----|
| 1 |
| 2 |
| 4 |
If you want one row per team, use the array_agg function:
select team, array_agg(id) ids
from (
select team, id, rank() over (partition by team order by date desc) as row_num
from t
) t
where row_num = 1
group by team
That query will return this table:
| team | ids |
|------|--------|
| A | [1, 2] |
| B | [4] |

How to keep the first row of a certain group based on some condition on Teradata SQL?

I have a table in Teradata that looks like this:
ID | Date | Values
------------------------
abc | 1Jan2015 | 1
abc | 1Dec2015 | 0
def | 2Feb2015 | 0
def | 2Jul2015 | 0
I want to write a piece of SQL that keeps only the row with the earliest date for each ID, so the result I want is:
ID | Date | Values
------------------------
abc | 1Jan2015 | 1
def | 2Feb2015 | 0
I know there is TOP n syntax, but it only seems to work on the whole table, not within groups.
Basically, how do I do a TOP n within groups?
TOP can be easily rewritten using ROW_NUMBER:
select *
from tab
qualify
row_number() over (partition by id order by date) = 1
You can do this using row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by date) as seqnum
from tab t
) t
where seqnum = 1;
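SQLite has no QUALIFY clause, so this sketch (Python's sqlite3 module, SQLite 3.25+) exercises the subquery form of the row_number() filter on the question's sample data. Converting the dates from the 1Jan2015 style to ISO text so they sort correctly is an assumption of the sketch, and "Values" is double-quoted because VALUES is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates converted to ISO text so that string order is chronological;
# "Values" is quoted because VALUES is an SQL keyword.
conn.execute('CREATE TABLE tab (id TEXT, date TEXT, "Values" INTEGER)')
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [("abc", "2015-01-01", 1), ("abc", "2015-12-01", 0),
     ("def", "2015-02-02", 0), ("def", "2015-07-02", 0)],
)

# Number rows per id by ascending date and keep the first row of each id.
rows = conn.execute("""
    SELECT id, date, "Values"
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS seqnum
          FROM tab t) AS ranked
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()
print(rows)  # [('abc', '2015-01-01', 1), ('def', '2015-02-02', 0)]
```

Changing seqnum = 1 to seqnum <= n gives a top n per group, which is what the question ultimately asks for.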