Get most recent measurement - sql

I have a table with some measurements, an ID and a date.
The table looks like this:
ID DATE M1 M2
1 2020 1 NULL
1 2020 NULL 15
1 2018 2 NULL
2 2019 1 NULL
2 2019 NULL 1
I would like to end up with a table that has one row per ID with the most recent measurements:
ID M1 M2
1 1 15
2 1 1
Any ideas?

You can use a correlated subquery with aggregation:
select id, max(m1), max(m2)
from t
where t.date = (select max(t1.date) from t t1 where t1.id = t.id)
group by id;
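You can check this query by running it against the sample rows. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the asker's database. The table name t comes from the answer, and the ORDER BY id is added only to make the output order predictable.

```python
import sqlite3

# In-memory copy of the sample table from the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, date INTEGER, m1 INTEGER, m2 INTEGER)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 2020, 1, None), (1, 2020, None, 15), (1, 2018, 2, None),
     (2, 2019, 1, None), (2, 2019, None, 1)],
)

# Correlated subquery: keep only the rows at each id's latest date,
# then collapse them with MAX(), which ignores NULLs.
rows = con.execute("""
    SELECT id, MAX(m1), MAX(m2)
    FROM t
    WHERE t.date = (SELECT MAX(t1.date) FROM t t1 WHERE t1.id = t.id)
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 1, 15), (2, 1, 1)]
```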

Use RANK combined with an aggregation:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY ID ORDER BY DATE DESC) rn
FROM yourTable
)
SELECT ID, MAX(M1) AS M1, MAX(M2) AS M2
FROM cte
WHERE rn = 1
GROUP BY ID;
RANK() gives every record tied on the most recent date for an ID the value 1, so all of them pass the rn = 1 filter; ROW_NUMBER() would keep only one of the two 2020 rows for ID 1. Then we aggregate to find the max values for M1 and M2, which skips the NULLs.
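ID 1 has two rows dated 2020 (one holding M1, the other M2), so the ranking function decides whether both rows survive the rn = 1 filter. This sketch uses Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or later) and runs the same CTE with RANK() and with ROW_NUMBER(). The name yourTable is the placeholder from the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (id INTEGER, date INTEGER, m1 INTEGER, m2 INTEGER)")
con.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [(1, 2020, 1, None), (1, 2020, None, 15), (1, 2018, 2, None),
     (2, 2019, 1, None), (2, 2019, None, 1)],
)

def latest(ranking_fn):
    # ranking_fn is "RANK" or "ROW_NUMBER"; only RANK gives every row
    # tied on the latest date the value 1.
    return con.execute(f"""
        WITH cte AS (
            SELECT *, {ranking_fn}() OVER (PARTITION BY id ORDER BY date DESC) AS rn
            FROM yourTable
        )
        SELECT id, MAX(m1), MAX(m2)
        FROM cte
        WHERE rn = 1
        GROUP BY id
        ORDER BY id
    """).fetchall()

with_rank = latest("RANK")
with_row_number = latest("ROW_NUMBER")
print(with_rank)  # [(1, 1, 15), (2, 1, 1)]
print(with_row_number)  # which tied row wins is arbitrary
```

With ROW_NUMBER(), one of the two latest rows for each ID gets rn = 2 and is filtered out, so one of M1/M2 comes back as NULL.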

In standard SQL, you can use lag() with IGNORE NULLS (not every database supports this option):
select id, coalesce(m1, prev_m1), coalesce(m2, prev_m2)
from (select t.*,
lag(m1 ignore nulls) over (partition by id order by date) as prev_m1,
lag(m2 ignore nulls) over (partition by id order by date) as prev_m2,
row_number() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum = 1;
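Evaluated on each ID's latest row, coalesce(m, lag(m ignore nulls)) amounts to "the most recent non-NULL value of m". The plain-Python sketch below mimics that logic instead of running the SQL. It assumes each row's list position acts as a tiebreaker for rows sharing a date (hypothetical; a real table would need a column for this), because without one the "latest" row among ties is not well defined. Unlike the MAX()-based answers, this approach can return a value from an older date when the latest date has none.

```python
from itertools import groupby

# (id, date, m1, m2) rows from the question; the list position is a
# hypothetical tiebreaker for rows that share a date.
rows = [(1, 2020, 1, None), (1, 2020, None, 15), (1, 2018, 2, None),
        (2, 2019, 1, None), (2, 2019, None, 1)]

def latest_non_null(rows):
    """Per id, the most recent non-NULL m1 and m2, like
    coalesce(m, lag(m ignore nulls)) evaluated on the latest row."""
    ordered = sorted(enumerate(rows), key=lambda p: (p[1][0], p[1][1], p[0]))
    result = {}
    for id_, group in groupby((r for _, r in ordered), key=lambda r: r[0]):
        last_m1 = last_m2 = None
        for _, _, m1, m2 in group:
            # Carry the previous non-NULL value forward past NULLs.
            if m1 is not None:
                last_m1 = m1
            if m2 is not None:
                last_m2 = m2
        result[id_] = (last_m1, last_m2)
    return result

print(latest_non_null(rows))  # {1: (1, 15), 2: (1, 1)}
```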

Related

Selecting rows that have row_number more than 1

I have the following table (using BigQuery):
id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2
112 2020 11 3000 1
113 2020 11 1000 1
Is there a way to select all the rows of each id that has a row_number greater than 1?
For example, my desired output is:
id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2
I don't want to select only the rows with row_number = 2; I also want the row_number = 1 row for the same id.
The query I used to produce the first table is:
SELECT
id,
year,
month,
SUM(sales) AS sales,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC) AS row_number
FROM
table
GROUP BY
id, year, month
You can use window functions:
select t.* except (cnt)
from (select t.*,
count(*) over (partition by id) as cnt
from t
) t
where cnt > 1;
As applied to your aggregation query:
SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month,
SUM(sales) as sales,
ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number,
COUNT(*) OVER(Partition by id ORDER BY id ASC) AS cnt
FROM table
GROUP BY id, year, month
) iym
WHERE cnt > 1;
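Here is a sketch of the aggregated version run in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). Assumptions: the table is renamed to the hypothetical sales_table because TABLE is a reserved word, the outer query lists its columns because SQLite has no BigQuery-style SELECT * EXCEPT, and ROW_NUMBER() orders by year, month instead of id so the numbering is deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_table (id INTEGER, year INTEGER, month INTEGER, sales INTEGER)")
con.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?, ?)",
    [(111, 2020, 11, 1000), (111, 2020, 12, 2000),
     (112, 2020, 11, 3000), (113, 2020, 11, 1000)],
)

# COUNT(*) OVER (PARTITION BY id) counts the grouped rows per id;
# ids with more than one row are kept in full.
rows = con.execute("""
    SELECT id, year, month, sales, rn
    FROM (SELECT id, year, month,
                 SUM(sales) AS sales,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY year, month) AS rn,
                 COUNT(*) OVER (PARTITION BY id) AS cnt
          FROM sales_table
          GROUP BY id, year, month) iym
    WHERE cnt > 1
    ORDER BY id, rn
""").fetchall()
print(rows)  # [(111, 2020, 11, 1000, 1), (111, 2020, 12, 2000, 2)]
```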
You can wrap your query as in the example below:
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (YOUR_ORIGINAL_QUERY)
)
where flag
so it can look like this:
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (
SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month
)
)
where flag
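SQLite has no COUNTIF, but a comparison there evaluates to 0 or 1, so SUM(rn > 1) counts the same rows. The sketch below uses that as a stand-in for BigQuery's countif(row_number > 1), running through Python's sqlite3 module with the hypothetical table sales_table, the row number column named rn, and numbering ordered by year, month.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute("CREATE TABLE sales_table (id INTEGER, year INTEGER, month INTEGER, sales INTEGER)")
con.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?, ?)",
    [(111, 2020, 11, 1000), (111, 2020, 12, 2000),
     (112, 2020, 11, 3000), (113, 2020, 11, 1000)],
)

# Stand-in for YOUR_ORIGINAL_QUERY.
original_query = """
    SELECT id, year, month, SUM(sales) AS sales,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY year, month) AS rn
    FROM sales_table
    GROUP BY id, year, month
"""

# flag is true for every row of an id that has at least one rn > 1.
rows = con.execute(f"""
    SELECT id, year, month, sales, rn
    FROM (SELECT *, SUM(rn > 1) OVER (PARTITION BY id) > 0 AS flag
          FROM ({original_query}))
    WHERE flag
    ORDER BY id, rn
""").fetchall()
print(rows)  # [(111, 2020, 11, 1000, 1), (111, 2020, 12, 2000, 2)]
```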
When applied to the sample data in your question, it returns the two rows for id 111, which is your desired output.
Try this:
with tmp as (SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)
It's a straightforward EXISTS query.
This is what I use. It's similar to ElapsedSoul's answer, but my understanding is that IN can be preferable to EXISTS for a static list; I'm not sure whether the performance difference, if any, is significant:
Difference between EXISTS and IN in SQL?
WITH T1 AS
(
SELECT
id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);
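The EXISTS and IN versions select the same rows. This sketch runs both against SQLite through Python's sqlite3 module (hypothetical table sales_table, row number column rn ordered by year, month) and checks that the results match; it says nothing about their relative performance.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute("CREATE TABLE sales_table (id INTEGER, year INTEGER, month INTEGER, sales INTEGER)")
con.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?, ?)",
    [(111, 2020, 11, 1000), (111, 2020, 12, 2000),
     (112, 2020, 11, 3000), (113, 2020, 11, 1000)],
)

cte = """
    WITH tmp AS (
        SELECT id, year, month, SUM(sales) AS sales,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY year, month) AS rn
        FROM sales_table
        GROUP BY id, year, month)
"""
# EXISTS: keep a row if its id also has a row numbered 2.
exists_rows = con.execute(cte + """
    SELECT * FROM tmp a
    WHERE EXISTS (SELECT 1 FROM tmp b WHERE a.id = b.id AND b.rn = 2)
    ORDER BY id, rn
""").fetchall()
# IN: keep a row if its id is among the ids with a row number above 1.
in_rows = con.execute(cte + """
    SELECT * FROM tmp
    WHERE id IN (SELECT id FROM tmp WHERE rn > 1)
    ORDER BY id, rn
""").fetchall()
print(exists_rows == in_rows)  # True
```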

Delete duplicated record

I have a table which contains a lot of duplicated rows like this:
id_emp id date ch_in ch_out
1 34103 2019-09-01
1 34193 2019-09-01 17:00
1 34194 2019-09-02 07:03:21 16:59:26
1 34104 2019-09-02 07:03:21 16:59:26
1 33361 2019-09-02 NULL NULL
I want to keep just one row for each date and delete the others, so that the output is:
id_emp id date ch_in ch_out
1 34193 2019-09-01 17:00
1 34104 2019-09-02 07:03:21 16:59:26
I tried DISTINCT, but it didn't work:
select distinct id_emp, id, date_1, ch_in,ch_out
from ch_inout
where id_emp=1 order by date_1 asc
I also tried this query to find the rows to delete:
select *
from (
select *, rn=row_number() over (partition by date_1 order by id)
from ch_inout
) x
where rn > 1;
But nothing works; the result is empty.
You can use aggregation:
select id_emp, max(id) as id, date, min(ch_in), max(ch_out)
from ch_inout
group by id_emp, date;
This returns the maximum id for each group of rows. That is not exactly what is returned in the question, but you don't specify the logic.
EDIT:
If you want to delete all but the largest id for each id_emp/date combination, you can use:
delete c from ch_inout c
where id < (select max(c2.id)
from ch_inout c2
where c2.id_emp = c.id_emp and c2.date = c.date
);
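A sketch of this DELETE in SQLite via Python's sqlite3 module. Only id_emp, id and date are loaded, since ch_in and ch_out do not affect which rows are kept. SQLite does not accept the DELETE c FROM ch_inout c form, so the correlated subquery refers to the outer table by its own name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Only the columns the DELETE looks at; ch_in/ch_out are left out.
con.execute("CREATE TABLE ch_inout (id_emp INTEGER, id INTEGER, date TEXT)")
con.executemany(
    "INSERT INTO ch_inout VALUES (?, ?, ?)",
    [(1, 34103, "2019-09-01"), (1, 34193, "2019-09-01"),
     (1, 34194, "2019-09-02"), (1, 34104, "2019-09-02"),
     (1, 33361, "2019-09-02")],
)

# Delete every row whose id is below the largest id for its id_emp/date.
con.execute("""
    DELETE FROM ch_inout
    WHERE id < (SELECT MAX(c2.id)
                FROM ch_inout c2
                WHERE c2.id_emp = ch_inout.id_emp AND c2.date = ch_inout.date)
""")
remaining = con.execute("SELECT id_emp, id, date FROM ch_inout ORDER BY date").fetchall()
print(remaining)  # [(1, 34193, '2019-09-01'), (1, 34194, '2019-09-02')]
```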
You can use ROW_NUMBER() to identify the records you want to delete. Assuming that you want to keep the record with the lowest id on each date:
SELECT *
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY date ORDER BY id) rn
FROM ch_inout t
) x
WHERE rn > 1
You can easily turn this into a DELETE statement:
WITH cte AS (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY date ORDER BY id) rn
FROM ch_inout t
)
DELETE FROM cte WHERE rn > 1
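Deleting through a CTE is SQL Server behavior; SQLite does not support it, so this sketch (Python's sqlite3 module, SQLite 3.25+ for window functions) deletes by id instead. It assumes id is unique and, like the answer, keeps the lowest id per date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ch_inout (id_emp INTEGER, id INTEGER, date TEXT)")
con.executemany(
    "INSERT INTO ch_inout VALUES (?, ?, ?)",
    [(1, 34103, "2019-09-01"), (1, 34193, "2019-09-01"),
     (1, 34194, "2019-09-02"), (1, 34104, "2019-09-02"),
     (1, 33361, "2019-09-02")],
)

# Number the rows per date by ascending id and delete all but the first.
con.execute("""
    DELETE FROM ch_inout
    WHERE id IN (SELECT id
                 FROM (SELECT id,
                              ROW_NUMBER() OVER (PARTITION BY date ORDER BY id) AS rn
                       FROM ch_inout)
                 WHERE rn > 1)
""")
remaining = con.execute("SELECT id_emp, id, date FROM ch_inout ORDER BY date").fetchall()
print(remaining)  # [(1, 34103, '2019-09-01'), (1, 33361, '2019-09-02')]
```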

Select TOP 2 values for each group

I'm having a problem getting only the top 2 values for each group (the groups are in a column).
Example:
ID Group Value
1 A 30
2 A 150
3 A 40
4 A 70
5 B 0
6 B 100
7 B 90
I expect my output to be
ID Group Value
1 A 150
2 A 70
3 B 100
4 B 90
Simply put, for each group I want just the 2 rows with the highest Value.
Most databases support the ANSI standard row_number() function. You would use it as:
select "group", value -- group is a reserved word, so it must be quoted
from (select t.*,
row_number() over (partition by "group" order by value desc) as seqnum
from t
) t
where seqnum <= 2;
To set the id you can use row_number() in the outer query:
select row_number() over (order by "group", value desc) as id,
"group", value
from (select t.*,
row_number() over (partition by "group" order by value desc) as seqnum
from t
) t
where seqnum <= 2;
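This sketch runs the renumbering query in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name t is from the answer; group is double-quoted because it is a reserved word, and the outer ROW_NUMBER() orders by value DESC within each group to reproduce the expected IDs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE t (id INTEGER, "group" TEXT, value INTEGER)')
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 30), (2, "A", 150), (3, "A", 40), (4, "A", 70),
     (5, "B", 0), (6, "B", 100), (7, "B", 90)],
)

# Inner query ranks values within each group; the WHERE keeps the top 2,
# and the outer ROW_NUMBER() assigns fresh ids to the surviving rows.
rows = con.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY "group", value DESC) AS id,
           "group", value
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY value DESC) AS seqnum
          FROM t) t
    WHERE seqnum <= 2
    ORDER BY 1
""").fetchall()
print(rows)  # [(1, 'A', 150), (2, 'A', 70), (3, 'B', 100), (4, 'B', 90)]
```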
However, changing the id seems suspicious.
You can use a CTE with the ranking function ROW_NUMBER().
Here is a query to get your result:
;WITH cte AS
( SELECT [Group], value,
ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY value DESC) AS rn
FROM test
)
SELECT [Group], value FROM cte
WHERE rn <= 2
ORDER BY [Group], value DESC

Comparing row values in oracle

I have Table1 with three columns:
Key | Date | Price
----------------------
1 | 26-May | 2
1 | 25-May | 2
1 | 24-May | 2
1 | 23-May | 3
1 | 22-May | 4
2 | 26-May | 2
2 | 25-May | 2
2 | 24-May | 2
2 | 23-May | 3
2 | 22-May | 4
I want to select the row where the price last changed to 2 (24-May). I ordered the dates using the RANK function.
I am not able to get the desired results. Any help will be appreciated.
SELECT *
FROM (SELECT key, DATE, price,
RANK() over (partition BY key order by DATE DESC) AS r2
FROM Table1 ORDER BY DATE DESC) temp;
Another way of looking at the problem is that you want to find the most recent record with a price different from the last price. Then you want the next record.
with lastprice as (
select t.*
from (select t.*
from table1 t
order by date desc
) t
where rownum = 1
)
select t.*
from (select t.*
from table1 t
where date > (select max(date)
from table1 t2
where t2.price <> (select price from lastprice)
)
order by date asc
) t
where rownum = 1;
This query looks complicated, but it is structured so it can take advantage of an index on table1(date). The subqueries are necessary in Oracle versions before 12c; from 12c on, you can use FETCH FIRST 1 ROW ONLY instead.
EDIT:
Another solution is to use lag() and find the most recent time when the value changed:
select t1.*
from (select t1.*
from (select t1.*,
lag(price) over (order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where rownum = 1;
Under many circumstances, I would expect the first version to perform better, because the only heavy work is done in the innermost subquery to get the max(date). This version has to calculate the lag() as well as do the order by. However, if performance is an issue, you should test on your data in your environment.
EDIT II:
My best guess is that you want this per key. Your original question says nothing about key, but:
select t1.*
from (select t1.*,
row_number() over (partition by key order by date desc) as seqnum
from (select t1.*,
lag(price) over (partition by key order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where seqnum = 1;
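A sketch of the per-key query in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). Assumptions: key is double-quoted because KEY is an SQLite keyword, and the date is stored in a hypothetical day column holding the day of May, since the question gives no year.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE table1 ("key" INTEGER, day INTEGER, price INTEGER)')
prices = [(26, 2), (25, 2), (24, 2), (23, 3), (22, 4)]
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                [(k, d, p) for k in (1, 2) for d, p in prices])

# Keep rows where the price differs from the previous day's price,
# then take the most recent such row per key.
rows = con.execute("""
    SELECT "key", day, price
    FROM (SELECT t1.*,
                 ROW_NUMBER() OVER (PARTITION BY "key" ORDER BY day DESC) AS seqnum
          FROM (SELECT t1.*,
                       LAG(price) OVER (PARTITION BY "key" ORDER BY day) AS prev_price
                FROM table1 t1) t1
          WHERE prev_price IS NULL OR prev_price <> price) t1
    WHERE seqnum = 1
    ORDER BY "key"
""").fetchall()
print(rows)  # [(1, 24, 2), (2, 24, 2)]
```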
You can try this:
SELECT Date FROM Table1
WHERE Price = 2
AND PrimaryKey = (SELECT MAX(PrimaryKey) FROM Table1
WHERE Price = 2)
This is very similar to the second option by Gordon Linoff but introduces a second windowed function row_number() to locate the most recent row that changed the price. This will work for all or a range of keys.
select
*
from (
select
t.*
, row_number() over(partition by Key order by date DESC) rn
from (
select
t1.*
, NVL(lag(Price) over(partition by Key order by date),0) prevPrice
from table1 t1
where Key IN (1,2,3,4,5) -- as an example
) t
where Price <> prevPrice
)
where rn = 1
Apologies, but I haven't been able to test this at all.

How to filter out the first and last entry from a table using RANK?

I have this data:
Id Date Value
'a' 2000 55
'a' 2001 3
'a' 2012 2
'a' 2014 5
'b' 1999 10
'b' 2014 110
'b' 2015 8
'c' 2011 4
'c' 2012 33
I want to filter out the first and the last value for each Id (when the table is sorted on the Date column) and keep only the other values. If an Id has only two entries, nothing is returned for it (for example, Id = 'c').
ID Date Value
'a' 2001 3
'a' 2012 2
'b' 2014 110
I tried to use order by (RANK() OVER (PARTITION BY [Id] ORDER BY Date ...)) in combination with this article (http://blog.sqlauthority.com/2008/03/02/sql-server-how-to-retrieve-top-and-bottom-rows-together-using-t-sql/), but I can't get it to work.
[UPDATE]
All three answers seem fine, but I'm not a SQL expert, so my question is: which one performs best if the table has around 800,000 rows and there are no indexes on any column?
You can use row_number twice to determine the min and max dates and then filter accordingly:
with cte as (
select id, [date], value,
row_number() over (partition by id order by [date]) minrn,
row_number() over (partition by id order by [date] desc) maxrn
from data
)
select id, [date], value
from cte
where minrn != 1 and maxrn != 1
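This sketch runs the query unchanged in SQLite via Python's sqlite3 module (SQLite accepts the [date] bracket quoting and needs version 3.25 or later for window functions); only the ORDER BY is added for a predictable result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id TEXT, [date] INTEGER, value INTEGER)")
con.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
     ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
     ("c", 2011, 4), ("c", 2012, 33)],
)

# minrn = 1 marks the earliest row per id, maxrn = 1 the latest.
rows = con.execute("""
    WITH cte AS (
        SELECT id, [date], value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) minrn,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date] DESC) maxrn
        FROM data
    )
    SELECT id, [date], value
    FROM cte
    WHERE minrn != 1 AND maxrn != 1
    ORDER BY id, [date]
""").fetchall()
print(rows)  # [('a', 2001, 3), ('a', 2012, 2), ('b', 2014, 110)]
```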
Here's another approach that uses min and max instead of a ranking function:
with cte as (
select id, min([date]) mindate, max([date]) maxdate
from data
group by id
)
select *
from data d
where not exists (
select 1
from cte c
where d.id = c.id and d.[date] in (c.mindate, c.maxdate))
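The same check for the min/max version, which needs no window functions. This sketch uses Python's sqlite3 module and the table name data from the answer; the ORDER BY is added only for a predictable result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id TEXT, [date] INTEGER, value INTEGER)")
con.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
     ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
     ("c", 2011, 4), ("c", 2012, 33)],
)

# Drop any row whose date equals its id's earliest or latest date.
rows = con.execute("""
    WITH cte AS (
        SELECT id, MIN([date]) mindate, MAX([date]) maxdate
        FROM data
        GROUP BY id
    )
    SELECT *
    FROM data d
    WHERE NOT EXISTS (
        SELECT 1
        FROM cte c
        WHERE d.id = c.id AND d.[date] IN (c.mindate, c.maxdate))
    ORDER BY d.id, d.[date]
""").fetchall()
print(rows)  # [('a', 2001, 3), ('a', 2012, 2), ('b', 2014, 110)]
```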
Here is a similar solution with row_number and count:
SELECT id,
dat,
value
FROM (SELECT *,
ROW_NUMBER()
OVER(
partition BY id
ORDER BY dat) rnk,
COUNT(*)
OVER (
partition BY id) cnt
FROM #table) t
WHERE rnk NOT IN( 1, cnt )
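A sketch of the row_number/count version in SQLite via Python's sqlite3 module (3.25 or later). Assumption: the answer's #table and dat column are renamed to the data table and [date] column used by the other answers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id TEXT, [date] INTEGER, value INTEGER)")
con.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
     ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
     ("c", 2011, 4), ("c", 2012, 33)],
)

# rnk = 1 is the first row per id and rnk = cnt the last one.
rows = con.execute("""
    SELECT id, [date], value
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) rnk,
                 COUNT(*) OVER (PARTITION BY id) cnt
          FROM data) t
    WHERE rnk NOT IN (1, cnt)
    ORDER BY id, [date]
""").fetchall()
print(rows)  # [('a', 2001, 3), ('a', 2012, 2), ('b', 2014, 110)]
```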
You can do this with EXISTS:
SELECT *
FROM Table1 a
WHERE EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date < a.Date
)
AND EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date > a.Date
)
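The EXISTS version keeps a row only if its Id has both an earlier and a later date, which excludes the first and last rows without any window functions. This sketch uses Python's sqlite3 module; the answer's Table1 is renamed to the data table used by the other answers (an assumption), and the ORDER BY is added for a predictable result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id TEXT, [date] INTEGER, value INTEGER)")
con.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
     ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
     ("c", 2011, 4), ("c", 2012, 33)],
)

rows = con.execute("""
    SELECT *
    FROM data a
    WHERE EXISTS (SELECT 1 FROM data b WHERE a.id = b.id AND b.[date] < a.[date])
      AND EXISTS (SELECT 1 FROM data b WHERE a.id = b.id AND b.[date] > a.[date])
    ORDER BY a.id, a.[date]
""").fetchall()
print(rows)  # [('a', 2001, 3), ('a', 2012, 2), ('b', 2014, 110)]
```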