How to filter out the first and last entry from a table using RANK? - sql

I've this data:
Id Date Value
'a' 2000 55
'a' 2001 3
'a' 2012 2
'a' 2014 5
'b' 1999 10
'b' 2014 110
'b' 2015 8
'c' 2011 4
'c' 2012 33
I want to filter out the first and the last value (when the table is sorted on the Date column), and only keep the other values. In case there are only two entries, nothing is returned. (Example for Id = 'c')
ID Date Value
'a' 2001 3
'a' 2012 2
'b' 2014 110
I tried to use order by (RANK() OVER (PARTITION BY [Id] ORDER BY Date ...)) in combination with this article (http://blog.sqlauthority.com/2008/03/02/sql-server-how-to-retrieve-top-and-bottom-rows-together-using-t-sql/) but I can't get it to work.
[UPDATE]
All the 3 answers seem fine. But I'm not a SQL expert, so my question is which one has the fastest performance if the table has around 800000 rows and there a no indexes on any column.

You can use row_number twice to determine the min and max dates and then filter accordingly:
with cte as (
select id, [date], value,
row_number() over (partition by id order by [date]) minrn,
row_number() over (partition by id order by [date] desc) maxrn
from data
)
select id, [date], value
from cte
where minrn != 1 and maxrn != 1
SQL Fiddle Demo
Here's another approach using min and max for this without needing to use a ranking function:
with cte as (
select id, min([date]) mindate, max([date]) maxdate
from data
group by id
)
select *
from data d
where not exists (
select 1
from cte c
where d.id = c.id and d.[date] in (c.mindate, c.maxdate))
More Fiddle

Here is a similar solution with row_number and count :
SELECT id,
dat,
value
FROM (SELECT *,
ROW_NUMBER()
OVER(
partition BY id
ORDER BY dat) rnk,
COUNT(*)
OVER (
partition BY id) cnt
FROM #table) t
WHERE rnk NOT IN( 1, cnt )

You can do this with EXISTS:
SELECT *
FROM Table1 a
WHERE EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date < a.Date
)
AND EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date > a.Date
)
Demo: SQL Fiddle

Related

How to get columns from multiple rows in a single row in SQL

I want to get 2 columns col_a and col_b's values for min and max of some other column. For example:
id
last_updated
col_a
col_b
1
2021-01-01
abc
xyz
1
2021-01-02
abc_0
xyz_0
1
2021-01-03
abc_1
xyz_1
1
2021-01-04
abc_2
xyz_2
2
2021-01-01
abc
xyz
2
2021-01-01
abc
xyz
...
I want to get the result:
|1|abc|abc_2|xyz|xyz_2|
That is the result of grouping by id, and getting the values of these columns while putting the condition of min and max on some other column(last_updated).
I came up with the following query:
select id, max(last_updated), min(last_updated)
from my_table
group by id
This gives me the id and min and max dates but not the other 2 columns. I'm not sure how to get the values for the other 2 columns for both dates in same query.
You can use MIN and MAX analytical function as follows:
select id,
max(case when mindt = last_updated then col_a end) as min_col_a,
max(case when maxdt = last_updated then col_a end) as max_col_a,
max(case when mindt = last_updated then col_b end) as min_col_b,
max(case when maxdt = last_updated then col_b end) as max_col_b
from
(select t.*,
min(last_updated) over (partition by id) as mindt,
max(last_updated) over (partition by id) as maxdt
from your_table t) t
group by id
We can use ROW_NUMBER, twice, to find the first and last rows, as ordered by last_updated, for each id group of records. Then, aggregate by id and pivot out columns for the various col_a and col_b values.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated) rn_min,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated DESC) rn_max
FROM yourTable
)
SELECT
id,
MAX(CASE WHEN rn_min = 1 THEN col_a END) AS col_a_min,
MAX(CASE WHEN rn_max = 1 THEN col_a END) AS col_a_max,
MAX(CASE WHEN rn_min = 1 THEN col_b END) AS col_b_min,
MAX(CASE WHEN rn_max = 1 THEN col_b END) AS col_b_max
FROM cte
GROUP BY id;
Not the neatest solution but demonstrates another way to obtain the data you want. We join the table on itself as we normally want data from 2 rows, then we use cross apply to restrict it to first and last.
select T1.id, T2.col_a, T1.col_a, T2.col_b, T1.col_b
from #my_table T1
inner join #my_table T2 on T1.id = T2.id
cross apply (
select id, max(last_updated) MaxLastUpdated, min(last_updated) MinLastUpdated
from #my_table
group by id
) X
where T1.last_updated = X.MaxLastUpdated and T2.last_updated = X.MinLastUpdated;
With the sample data provided this appear to perform worse than the row_number() solution. The fastest solution is the analytical functions.
select id,max(last_updated) last_updatedMax, min(last_updated) last_updatedMin,max(col_aMax) col_aMax, max(col_aMin) col_aMin,max(col_bMax) col_bMax, max(col_bMin) col_bMin
from
(
select
*
, first_value(col_a) OVER (PARTITION BY id ORDER BY last_updated desc) as col_aMax
, first_value(col_a) OVER (PARTITION BY id ORDER BY last_updated asc) as col_aMin
, first_value(col_b) OVER (PARTITION BY id ORDER BY last_updated desc) as col_bMax
, first_value(col_b) OVER (PARTITION BY id ORDER BY last_updated asc) as col_bMin
from my_table
) t
group by id

PostgreSQL Pivot by Last Date

I need to make a PIVOT table from Source like this table
FactID UserID Date Product QTY
1 11 01/01/2020 A 600
2 11 02/01/2020 A 400
3 11 03/01/2020 B 500
4 11 04/01/2020 B 200
6 22 06/01/2020 A 1000
7 22 07/01/2020 A 200
8 22 08/01/2020 B 300
9 22 09/01/2020 B 100
Need Pivot Like this where Product QTY is QTY by Last Date
UserID A B
11 400 200
22 200 100
My try PostgreSQL
Select
UserID,
MAX(CASE WHEN Product='A' THEN 'QTY' END) AS 'A',
MAX(CASE WHEN Product='B' THEN 'QTY' END) AS 'B'
FROM table
GROUP BY UserID
And Result
UserID A B
11 600 500
22 1000 300
I mean I get a result by the maximum QTY and not by the maximum date!
What do I need to add to get results by the maximum (last) date ??
Postgres doesn't have "first" and "last" aggregation functions. One method for doing this (without a subquery) uses arrays:
select userid,
(array_agg(qty order by date desc) filter (where product = 'A'))[1] as a,
(array_agg(qty order by date desc) filter (where product = 'B'))[1] as b
from tab
group by userid;
Another method uses select distinct with first_value():
select distinct userid,
first_value(qty) over (partition by userid order by product = 'A' desc, date desc) as a,
first_value(qty) over (partition by userid order by product = 'B' desc, date desc) as b
from tab;
With the appropriate indexes, though, distinct on might be the fastest approach:
select userid,
max(qty) filter (where product = 'A') as a,
max(qty) filter (where product = 'B') as b
from (select distinct on (userid, product) t.*
from tab t
order by userid, product, date desc
) t
group by userid;
In particular, this can use an index on userid, product, date desc). The improvement in performance will be most notable if there are many dates for a given user.
You can use DENSE_RANK() window function in order to filter by the last date per each product and UserID before applying conditional aggregation such as
SELECT UserID,
MAX(CASE WHEN Product='A' THEN QTY END) AS "A",
MAX(CASE WHEN Product='B' THEN QTY END) AS "B"
FROM
(
SELECT t.*, DENSE_RANK() OVER (PARTITION BY Product,UserID ORDER BY Date DESC) AS rn
FROM tab t
) q
WHERE rn = 1
GROUP BY UserID
Demo
presuming all date values are distinct(no ties occur for dates)

Selecting rows that have row_number more than 1

I have a table as following (using bigquery):
id
year
month
sales
row_number
111
2020
11
1000
1
111
2020
12
2000
2
112
2020
11
3000
1
113
2020
11
1000
1
Is there a way in which I can select rows that have row numbers more than one?
For example, my desired output is:
id
year
month
sales
row_number
111
2020
11
1000
1
111
2020
12
2000
2
I don't want to just exclusively select rows with row_number = 2 but also row_number = 1 as well.
The original code block I used for the first table result is:
SELECT
id,
year,
month,
SUM(sales) AS sales,
ROW_NUMBER() OVER (PARTITIONY BY id ORDER BY id ASC) AS row_number
FROM
table
GROUP BY
id, year, month
You can use window functions:
select t.* except (cnt)
from (select t.*,
count(*) over (partition by id) as cnt
from t
) t
where cnt > 1;
As applied to your aggregation query:
SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month,
SUM(sales) as sales,
ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number
COUNT(*) OVER(Partition by id ORDER BY id ASC) AS cnt
FROM table
GROUP BY id, year, month
) iym
WHERE cnt > 1;
You can wrap your query as in below example
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (YOUR_ORIGINAL_QUERY)
)
where flag
so it can look as
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (
SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month
)
)
where flag
so when applied to sample data in your question - it will produce below output
Try this:
with tmp as (SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)
It's a so clearly exists statement SQL
This is what I use, it's similar to #ElapsedSoul answer but from my understanding for static list "IN" is better than using "EXISTS" but I'm not sure if the performance difference, if any, is significant:
Difference between EXISTS and IN in SQL?
WITH T1 AS
(
SELECT
id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);

Get most recent measurement

I have a table that has has some measurements, ID and date.
The table is built like so
ID DATE M1 M2
1 2020 1 NULL
1 2020 NULL 15
1 2018 2 NULL
2 2019 1 NULL
2 2019 NULL 1
I would like to end up with a table that has one row per ID with the most recent measurement
ID M1 M2
1 1 15
2 1 1
Any ideas?
You can use correlated sub-query with aggregation :
select id, max(m1), max(m2)
from t
where t.date = (select max(t1.date) from t t1 where t1.id = t.id)
group by id;
Use ROW_NUMBER combined with an aggregation:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE DESC) rn
FROM yourTable
)
SELECT ID, MAX(M1) AS M1, MAX(M2) AS M2
FROM cte
WHERE rn = 1
GROUP BY ID;
The row number lets us restrict to only records for each ID having the most recent year date. Then, we aggregate to find the max values for M1 and M2.
In standard SQL, you can use lag(ignore nulls):
select id, coalesce(m1, prev_m1), coalesce(m2, prev_m2)
from (select t.*,
lag(m1 ignore nulls) over (partition by id order by date) as prev_m1,
lag(m2 ignore nulls) over (partition by id order by date) as prev_m2,
row_number() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum = 1;

get maximum date in separated columns (month and year)

I have this table :
Month Year Provider Number
1 2015 1 345
2 2015 1 345
3 2015 1 345
12 2015 2 444
1 2016 2 444
Let's say I want to get all different numbers by provider but only the max month and max year, something like this:
Month Year Provider Number
3 2015 1 345
1 2016 2 444
I have this ugly query that I would like to improve :
SELECT (SELECT max([Month])
FROM dbo.Info b
WHERE b.Provider = a.Provider
AND b.Number = a.Number
AND [Year] = (SELECT max([Year])
FROM dbo.Info c
WHERE c.Provider = a.Provider
AND c.Number = a.Number)) AS [Month],
(SELECT max([Year])
FROM dbo.Info d
WHERE d.Provider = a.Provider
AND d.Number = a.Number)) AS [Year],
a.Provider,
a.Number
FROM dbo.Info a
You could use a row_number and cte
;WITH cte AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Provider ORDER BY [Year] DESC, [Month] DESC) as rNum
FROM Info)
SELECT *
FROM cte where rNum = 1
If you want to create a view then
CREATE VIEW SomeViewName
AS
WITH cte AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Provider ORDER BY [Year] DESC, [Month] DESC) as rNum
FROM Info)
SELECT *
FROM cte where rNum = 1
One option is to use row_number:
select *
from (
select *, row_number() over (partition by provider
order by [year] desc, [month] desc) rn
from dbo.Info
) t
where rn = 1
This assumes the number and provider fields are the same. If not, you may need to also partition by the number field.