How to get columns from multiple rows in a single row in SQL

I want to get 2 columns col_a and col_b's values for min and max of some other column. For example:
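| id | last_updated | col_a | col_b |
|----|--------------|-------|-------|
| 1 | 2021-01-01 | abc | xyz |
| 1 | 2021-01-02 | abc_0 | xyz_0 |
| 1 | 2021-01-03 | abc_1 | xyz_1 |
| 1 | 2021-01-04 | abc_2 | xyz_2 |
| 2 | 2021-01-01 | abc | xyz |
| 2 | 2021-01-01 | abc | xyz |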
...
I want to get the result:
|1|abc|abc_2|xyz|xyz_2|
That is, group by id and return the values of these two columns from the rows where another column (last_updated) is at its minimum and at its maximum.
I came up with the following query:
select id, max(last_updated), min(last_updated)
from my_table
group by id
This gives me the id and min and max dates but not the other 2 columns. I'm not sure how to get the values for the other 2 columns for both dates in same query.

You can use the MIN and MAX analytic (window) functions as follows:
select id,
max(case when mindt = last_updated then col_a end) as min_col_a,
max(case when maxdt = last_updated then col_a end) as max_col_a,
max(case when mindt = last_updated then col_b end) as min_col_b,
max(case when maxdt = last_updated then col_b end) as max_col_b
from
(select t.*,
min(last_updated) over (partition by id) as mindt,
max(last_updated) over (partition by id) as maxdt
from your_table t) t
group by id
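
To check the query above against the question's sample rows, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whatever database the asker uses (an assumption), and window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER, last_updated TEXT, col_a TEXT, col_b TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, "2021-01-01", "abc", "xyz"),
        (1, "2021-01-02", "abc_0", "xyz_0"),
        (1, "2021-01-03", "abc_1", "xyz_1"),
        (1, "2021-01-04", "abc_2", "xyz_2"),
        (2, "2021-01-01", "abc", "xyz"),
        (2, "2021-01-01", "abc", "xyz"),
    ],
)

# Tag every row with its group's min and max date, then pick the
# col_a / col_b values from the rows that carry those dates.
query = """
SELECT id,
       MAX(CASE WHEN mindt = last_updated THEN col_a END) AS min_col_a,
       MAX(CASE WHEN maxdt = last_updated THEN col_a END) AS max_col_a,
       MAX(CASE WHEN mindt = last_updated THEN col_b END) AS min_col_b,
       MAX(CASE WHEN maxdt = last_updated THEN col_b END) AS max_col_b
FROM (SELECT t.*,
             MIN(last_updated) OVER (PARTITION BY id) AS mindt,
             MAX(last_updated) OVER (PARTITION BY id) AS maxdt
      FROM your_table t) t
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For id 1 this returns abc, abc_2, xyz, xyz_2, which matches the result requested in the question.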

We can use ROW_NUMBER, twice, to find the first and last rows, as ordered by last_updated, for each id group of records. Then, aggregate by id and pivot out columns for the various col_a and col_b values.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated) rn_min,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated DESC) rn_max
FROM yourTable
)
SELECT
id,
MAX(CASE WHEN rn_min = 1 THEN col_a END) AS col_a_min,
MAX(CASE WHEN rn_max = 1 THEN col_a END) AS col_a_max,
MAX(CASE WHEN rn_min = 1 THEN col_b END) AS col_b_min,
MAX(CASE WHEN rn_max = 1 THEN col_b END) AS col_b_max
FROM cte
GROUP BY id;

Not the neatest solution, but it demonstrates another way to obtain the data you want. We join the table to itself, since we want data from 2 rows, then use cross apply to restrict those rows to the first and last for each id.
select T1.id, T2.col_a as min_col_a, T1.col_a as max_col_a, T2.col_b as min_col_b, T1.col_b as max_col_b
from #my_table T1
inner join #my_table T2 on T1.id = T2.id
cross apply (
select id, max(last_updated) MaxLastUpdated, min(last_updated) MinLastUpdated
from #my_table
where id = T1.id -- correlate with the outer row so only this id's dates are compared
group by id
) X
where T1.last_updated = X.MaxLastUpdated and T2.last_updated = X.MinLastUpdated;
With the sample data provided, this appears to perform worse than the row_number() solution. The fastest of the three is the analytic-function solution.

select id,max(last_updated) last_updatedMax, min(last_updated) last_updatedMin,max(col_aMax) col_aMax, max(col_aMin) col_aMin,max(col_bMax) col_bMax, max(col_bMin) col_bMin
from
(
select
*
, first_value(col_a) OVER (PARTITION BY id ORDER BY last_updated desc) as col_aMax
, first_value(col_a) OVER (PARTITION BY id ORDER BY last_updated asc) as col_aMin
, first_value(col_b) OVER (PARTITION BY id ORDER BY last_updated desc) as col_bMax
, first_value(col_b) OVER (PARTITION BY id ORDER BY last_updated asc) as col_bMin
from my_table
) t
group by id

Related

How to find the highest and second highest entry in SQL in a single query using `GROUP BY`?

Let this be the table that is provided.
| PID | TID | Type | Freq |
|-----|-----|------|------|
| 1 | 1 | A | 3 |
| 1 | 1 | A | 2 |
| 1 | 1 | A | 1 |
| 1 | 1 | B | 3 |
| 1 | 2 | A | 4 |
| 1 | 2 | B | 5 |
I want to write a query to get an output like this.
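| PID | TID | Type | Max_Freq_1 | Max_Freq_2 |
|-----|-----|------|------------|------------|
| 1 | 1 | A | 3 | 2 |
| 1 | 1 | B | 3 | NULL |
| 1 | 2 | A | 4 | NULL |
| 1 | 2 | B | 5 | NULL |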
That is, given a combination of PID, TID and Type, what are the highest and second-highest frequencies? If there aren't enough entries in the table, the second-highest value should be NULL.
If your database can use the window functions, then the top 2 Freq can be calculated via the DENSE_RANK function.
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT PID, TID, Type, Freq
, DENSE_RANK() OVER (PARTITION BY PID, TID, Type ORDER BY Freq DESC) AS Rnk
FROM YourTable t
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
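| pid | tid | type | max_freq_1 | max_freq_2 |
|-----|-----|------|------------|------------|
| 1 | 1 | A | 3 | 2 |
| 1 | 1 | B | 3 | null |
| 1 | 2 | A | 4 | null |
| 1 | 2 | B | 5 | null |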
If window functions such as DENSE_RANK aren't available, then try this.
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT t1.PID, t1.TID, t1.Type, t1.Freq
, COUNT(DISTINCT t2.Freq) AS Rnk
FROM YourTable t1
LEFT JOIN YourTable t2
ON t2.PID = t1.PID
AND t2.TID = t1.TID
AND t2.Type = t1.Type
AND t2.Freq >= t1.Freq
GROUP BY t1.PID, t1.TID, t1.Type, t1.Freq
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
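
The self-join works because counting the distinct Freq values that are greater than or equal to a row's own Freq reproduces that row's dense rank. Here is a small sketch that checks this on the question's data, using Python's sqlite3 as a stand-in database (an assumption; SQLite 3.25 or later is needed for the DENSE_RANK half).

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (PID INTEGER, TID INTEGER, Type TEXT, Freq INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [(1, 1, "A", 3), (1, 1, "A", 2), (1, 1, "A", 1),
     (1, 1, "B", 3), (1, 2, "A", 4), (1, 2, "B", 5)],
)

# Variant 1: rank with the DENSE_RANK window function.
window_sql = """
SELECT PID, TID, Type,
       MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1,
       MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM (SELECT PID, TID, Type, Freq,
             DENSE_RANK() OVER (PARTITION BY PID, TID, Type ORDER BY Freq DESC) AS Rnk
      FROM YourTable) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
"""

# Variant 2: emulate the rank by counting distinct Freq values >= this row's Freq.
self_join_sql = """
SELECT PID, TID, Type,
       MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1,
       MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM (SELECT t1.PID, t1.TID, t1.Type, t1.Freq,
             COUNT(DISTINCT t2.Freq) AS Rnk
      FROM YourTable t1
      LEFT JOIN YourTable t2
        ON t2.PID = t1.PID AND t2.TID = t1.TID
       AND t2.Type = t1.Type AND t2.Freq >= t1.Freq
      GROUP BY t1.PID, t1.TID, t1.Type, t1.Freq) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
"""

window_rows = conn.execute(window_sql).fetchall()
self_join_rows = conn.execute(self_join_sql).fetchall()
print(window_rows)
print(window_rows == self_join_rows)
```

Both variants return the same four rows on this data, with NULL (None in Python) wherever a group has only one distinct frequency.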
This is what I came up with on PostgreSQL. Using a window function such as row_number is the easiest way to get the result you want.
with t as (
select *, row_number() over (partition by pid, tid, "type" order by freq desc) as r
from test_so
)
select pid, tid, "type",
max(case when r = 1 then freq end) as "highest",
max(case when r = 2 then freq end) as "second_highest"
from t
group by pid, tid, "type"

Get most recent measurement

I have a table that has some measurements, an ID and a date.
The table is built like so
ID DATE M1 M2
1 2020 1 NULL
1 2020 NULL 15
1 2018 2 NULL
2 2019 1 NULL
2 2019 NULL 1
I would like to end up with a table that has one row per ID with the most recent measurement
ID M1 M2
1 1 15
2 1 1
Any ideas?
You can use a correlated subquery with aggregation:
select id, max(m1), max(m2)
from t
where t.date = (select max(t1.date) from t t1 where t1.id = t.id)
group by id;
Use RANK combined with an aggregation:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY ID ORDER BY DATE DESC) rnk
FROM yourTable
)
SELECT ID, MAX(M1) AS M1, MAX(M2) AS M2
FROM cte
WHERE rnk = 1
GROUP BY ID;
The rank lets us restrict to only the records for each ID having the most recent date. RANK is used rather than ROW_NUMBER because several rows can share that date, and ROW_NUMBER would keep just one of them. Then, we aggregate to find the max values for M1 and M2.
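
One caveat is worth checking against the sample data, where the two 2020 rows for ID 1 share the most recent date. The sketch below uses Python's sqlite3 as a stand-in database (an assumption; the table name t and the lowercase column names are illustrative) to compare ROW_NUMBER, which keeps a single row per ID, with RANK, which keeps every row tied for the most recent date.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date INTEGER, m1 INTEGER, m2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 2020, 1, None), (1, 2020, None, 15), (1, 2018, 2, None),
     (2, 2019, 1, None), (2, 2019, None, 1)],
)

# Same query shape for both functions; only the numbering function differs.
template = """
WITH cte AS (
    SELECT *, {func}() OVER (PARTITION BY id ORDER BY date DESC) AS pos
    FROM t
)
SELECT id, MAX(m1) AS m1, MAX(m2) AS m2
FROM cte
WHERE pos = 1
GROUP BY id
ORDER BY id
"""

# RANK gives every row on the latest date the value 1, so both measurements survive.
rank_rows = conn.execute(template.format(func="RANK")).fetchall()

# ROW_NUMBER gives the value 1 to exactly one row per id, so one measurement is lost.
# Which of the tied rows it picks is not defined, so no fixed output is shown here.
row_number_rows = conn.execute(template.format(func="ROW_NUMBER")).fetchall()

print(rank_rows)
print(row_number_rows)
```

With RANK the result matches the table the question asks for. With ROW_NUMBER each ID comes back with one of its two measurements missing.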
In standard SQL, you can use lag() with ignore nulls:
select id, coalesce(m1, prev_m1) as m1, coalesce(m2, prev_m2) as m2
from (select t.*,
lag(m1) ignore nulls over (partition by id order by date) as prev_m1,
lag(m2) ignore nulls over (partition by id order by date) as prev_m2,
row_number() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum = 1;

First value in DATE minus 30 days SQL

I have a bunch of data, from which I'm showing the ID, the max date and its corresponding values (user id, type, ...). Then I need to take the MAX date for each ID, subtract 30 days and show the first date within this period along with its corresponding values.
Example:
ID Date Name
1 01.05.2018 AAA
1 21.04.2018 CCC
1 05.04.2018 BBB
1 28.03.2018 AAA
expected:
ID max_date max_name previous_date previous_name
1 01.05.2018 AAA 05.04.2018 BBB
I have a working solution using subselects, but as my WHERE clause is quite large, the refresh takes ages.
The subselect looks like this:
(SELECT MIN(N.name)
FROM t1 N
WHERE N.ID = T.ID
AND (N.date < MAX(T.date) AND N.date >= (MAX(T.date)-30))
AND (...)) AS PreviousName
How'd you write the select?
I'm using TSQL
Thanks
I can do this with 2 CTEs to build up the dates and names.
MS SQL Server 2017 Schema Setup:
CREATE TABLE t1 (ID int, theDate date, theName varchar(10)) ;
INSERT INTO t1 (ID, theDate, theName)
VALUES
( 1,'2018-05-01','AAA' )
, ( 1,'2018-04-21','CCC' )
, ( 1,'2018-04-05','BBB' )
, ( 1,'2018-03-27','AAA' )
, ( 2,'2018-05-02','AAA' )
, ( 2,'2018-05-21','CCC' )
, ( 2,'2018-03-03','BBB' )
, ( 2,'2018-01-20','AAA' )
;
Main Query:
;WITH cte1 AS (
SELECT t1.ID, t1.theDate, t1.theName
, DATEADD(day,-30,t1.theDate) AS dMinus30
, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate DESC) AS rn
FROM t1
)
, cte2 AS (
SELECT c2.ID, c2.theDate, c2.theName
, ROW_NUMBER() OVER (PARTITION BY c2.ID ORDER BY c2.theDate) AS rn
, COUNT(*) OVER (PARTITION BY c2.ID) AS theCount
FROM cte1
INNER JOIN cte1 c2 ON cte1.ID = c2.ID
AND c2.theDate >= cte1.dMinus30
WHERE cte1.rn = 1
GROUP BY c2.ID, c2.theDate, c2.theName
)
SELECT cte1.ID, cte1.theDate AS max_date, cte1.theName AS max_name
, cte2.theDate AS previous_date, cte2.theName AS previous_name
, cte2.theCount
FROM cte1
INNER JOIN cte2 ON cte1.ID = cte2.ID
AND cte2.rn=1
WHERE cte1.rn = 1
Results:
| ID | max_date | max_name | previous_date | previous_name |
|----|------------|----------|---------------|---------------|
| 1 | 2018-05-01 | AAA | 2018-04-05 | BBB |
| 2 | 2018-05-21 | CCC | 2018-05-02 | AAA |
cte1 lists every row together with its date minus 30 days and uses a ROW_NUMBER() window function, partitioned by ID and ordered by date descending, so that rn = 1 marks the most recent row for each ID. cte2 joins back to that most recent row to collect all rows whose dates fall within the 30 days up to cte1's max date, then numbers them by date ascending so that rn = 1 marks the earliest row in that window. The outer query joins the two results, keeping only the most recent row from cte1 and the earliest in-window row from cte2.
I'm not sure how well it will scale with your data, but using the CTEs should optimize pretty well.
EDIT: For the additional requirement, I just added in another COUNT() window function to cte2.
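
For readers without SQL Server at hand, here is a rough sketch of the same idea run in SQLite through Python's sqlite3 module. It is not the query above: it swaps DATEADD for SQLite's date() function and replaces the CTE self-join with a MAX() window plus conditional aggregation. The CTE names with_max and in_window are made up for this illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INTEGER, theDate TEXT, theName TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "2018-05-01", "AAA"), (1, "2018-04-21", "CCC"),
     (1, "2018-04-05", "BBB"), (1, "2018-03-27", "AAA"),
     (2, "2018-05-02", "AAA"), (2, "2018-05-21", "CCC"),
     (2, "2018-03-03", "BBB"), (2, "2018-01-20", "AAA")],
)

query = """
WITH with_max AS (
    -- attach each ID's most recent date to every one of its rows
    SELECT ID, theDate, theName,
           MAX(theDate) OVER (PARTITION BY ID) AS maxDate
    FROM t1
),
in_window AS (
    -- keep only rows within 30 days of that date, numbered from both ends
    SELECT ID, theDate, theName,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY theDate DESC) AS rn_desc,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY theDate) AS rn_asc
    FROM with_max
    WHERE theDate >= date(maxDate, '-30 days')
)
SELECT ID,
       MAX(CASE WHEN rn_desc = 1 THEN theDate END) AS max_date,
       MAX(CASE WHEN rn_desc = 1 THEN theName END) AS max_name,
       MAX(CASE WHEN rn_asc = 1 THEN theDate END) AS previous_date,
       MAX(CASE WHEN rn_asc = 1 THEN theName END) AS previous_name
FROM in_window
GROUP BY ID
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the setup data this gives the same two rows as the Results table. As in the original query, an ID whose only in-window row is the max row would report that row as its own previous row.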
I would do:
select id,
max(case when seqnum = 1 then date end) as max_date,
max(case when seqnum = 1 then name end) as max_name,
max(case when seqnum = 2 then date end) as prev_date,
max(case when seqnum = 2 then name end) as prev_name
from (select e.*, row_number() over (partition by id order by date desc) as seqnum
from example e
) e
group by id;

TSQL getting max and min date with a separate but not unique record

example table:
test_date | test_result | unique_ID
12/25/15 | 100 | 50
12/01/15 | 150 | 75
10/01/15 | 135 | 75
09/22/14 | 99 | 50
04/10/13 | 125 | 50
I need to find the first and last test date as well as the test result to match said date by user. So, I can group by ID, but not test result.
SELECT MAX(test_date)[need matching test_result],
MIN(test_date) [need matching test_result],
unique_id
from [table]
group by unique_id
THANKS!
Create TABLE #t
(
test_date date ,
Test_results int,
Unique_id int
)
INSERT INTO #t
VALUES ( '12/25/15',100,50 ),
( '12/01/15',150,75 ),
( '10/01/15',135,75 ),
( '09/22/14',99,50 ),
( '04/10/13',125,50 )
select 'MinTestDate' as Type, a.test_date, a.Test_results, a.Unique_id
from #t a inner join (
select min(test_date) as test_datemin, max(test_date) as test_datemax, unique_id from #t
group by unique_ID) b
on a.test_date = b.test_datemin and a.Unique_id = b.unique_id -- match the date within the same ID only
union all
select 'MaxTestDate' as Type, a.test_date, a.Test_results, a.Unique_id from #t a
inner join (
select min(test_date) as test_datemin, max(test_date) as test_datemax, unique_id from #t
group by unique_ID) b
on a.test_date = b.test_datemax and a.Unique_id = b.unique_id -- match the date within the same ID only
I would recommend window functions. The following returns the information on 2 rows per id:
select t.*
from (select t.*,
row_number() over (partition by unique_id order by test_date) as seqnum_asc,
row_number() over (partition by unique_id order by test_date desc) as seqnum_desc
from table t
) t;
For one row, use conditional aggregation (or pivot if you prefer):
select unique_id,
min(test_date), max(case when seqnum_asc = 1 then test_result end),
max(test_date), max(case when seqnum_desc = 1 then test_result end)
from (select t.*,
row_number() over (partition by unique_id order by test_date) as seqnum_asc,
row_number() over (partition by unique_id order by test_date desc) as seqnum_desc
from table t
) t
group by unique_id;
Consider using a combination of self-joins and derived tables:
SELECT t1.unique_id, minTable.MinOftest_date, t1.test_result As Mintestdate_result,
maxTable.MaxOftest_date, t2.test_result As Maxtestdate_result
FROM TestTable AS t1
INNER JOIN
(
SELECT Min(TestTable.test_date) AS MinOftest_date,
TestTable.unique_ID
FROM TestTable
GROUP BY TestTable.unique_ID
) As minTable
ON (t1.test_date = minTable.MinOftest_date
AND t1.unique_id = minTable.unique_id)
INNER JOIN TestTable As t2
INNER JOIN
(
SELECT Max(TestTable.test_date) AS MaxOftest_date,
TestTable.unique_ID
FROM TestTable
GROUP BY TestTable.unique_ID
) AS maxTable
ON t2.test_date = maxTable.MaxOftest_date
AND t2.unique_ID = maxTable.unique_ID
ON minTable.unique_id = maxTable.unique_id;
OUTPUT
unique_id MinOftest_date Mintestdate_result MaxOftest_date Maxtestdate_result
50 4/10/2013 125 12/25/2015 100
75 10/1/2015 135 12/1/2015 150

How to filter out the first and last entry from a table using RANK?

I've this data:
Id Date Value
'a' 2000 55
'a' 2001 3
'a' 2012 2
'a' 2014 5
'b' 1999 10
'b' 2014 110
'b' 2015 8
'c' 2011 4
'c' 2012 33
I want to filter out the first and the last value (when the table is sorted on the Date column), and only keep the other values. In case there are only two entries, nothing is returned. (Example for Id = 'c')
ID Date Value
'a' 2001 3
'a' 2012 2
'b' 2014 110
I tried to use order by (RANK() OVER (PARTITION BY [Id] ORDER BY Date ...)) in combination with this article (http://blog.sqlauthority.com/2008/03/02/sql-server-how-to-retrieve-top-and-bottom-rows-together-using-t-sql/) but I can't get it to work.
[UPDATE]
All 3 answers seem fine. But I'm not a SQL expert, so my question is which one performs fastest if the table has around 800000 rows and there are no indexes on any column.
You can use row_number twice to determine the min and max dates and then filter accordingly:
with cte as (
select id, [date], value,
row_number() over (partition by id order by [date]) minrn,
row_number() over (partition by id order by [date] desc) maxrn
from data
)
select id, [date], value
from cte
where minrn != 1 and maxrn != 1
Here's another approach using min and max for this without needing to use a ranking function:
with cte as (
select id, min([date]) mindate, max([date]) maxdate
from data
group by id
)
select *
from data d
where not exists (
select 1
from cte c
where d.id = c.id and d.[date] in (c.mindate, c.maxdate))
Here is a similar solution with row_number and count:
SELECT id,
dat,
value
FROM (SELECT *,
ROW_NUMBER()
OVER(
partition BY id
ORDER BY dat) rnk,
COUNT(*)
OVER (
partition BY id) cnt
FROM #table) t
WHERE rnk NOT IN( 1, cnt )
You can do this with EXISTS:
SELECT *
FROM Table1 a
WHERE EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date < a.Date
)
AND EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date > a.Date
)
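
To see that the EXISTS filter and the row_number/count filter agree on the question's data, here is a small sketch using Python's sqlite3 as a stand-in for SQL Server (an assumption; the table name data and its lowercase columns are illustrative).

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT, date INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
     ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
     ("c", 2011, 4), ("c", 2012, 33)],
)

# Keep a row only if the same id has both an earlier and a later row.
exists_sql = """
SELECT id, date, value
FROM data a
WHERE EXISTS (SELECT 1 FROM data b WHERE b.id = a.id AND b.date < a.date)
  AND EXISTS (SELECT 1 FROM data b WHERE b.id = a.id AND b.date > a.date)
ORDER BY id, date
"""

# Number rows per id and drop position 1 and position cnt (the last one).
row_number_sql = """
SELECT id, date, value
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rnk,
             COUNT(*) OVER (PARTITION BY id) AS cnt
      FROM data) t
WHERE rnk NOT IN (1, cnt)
ORDER BY id, date
"""

exists_rows = conn.execute(exists_sql).fetchall()
row_number_rows = conn.execute(row_number_sql).fetchall()
print(exists_rows)
print(exists_rows == row_number_rows)
```

Both queries return the three middle rows and nothing for id 'c'. They can differ when an id has duplicate dates at either end: EXISTS drops every row on the first or last date, while row_number drops exactly one row from each end.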