get maximum date in separated columns (month and year) - sql

I have this table :
Month Year Provider Number
1 2015 1 345
2 2015 1 345
3 2015 1 345
12 2015 2 444
1 2016 2 444
Let's say I want to get all different numbers by provider but only the max month and max year, something like this:
Month Year Provider Number
3 2015 1 345
1 2016 2 444
I have this ugly query that I would like to improve :
SELECT (SELECT max([Month])
FROM dbo.Info b
WHERE b.Provider = a.Provider
AND b.Number = a.Number
AND [Year] = (SELECT max([Year])
FROM dbo.Info c
WHERE c.Provider = a.Provider
AND c.Number = a.Number)) AS [Month],
(SELECT max([Year])
FROM dbo.Info d
WHERE d.Provider = a.Provider
AND d.Number = a.Number) AS [Year],
a.Provider,
a.Number
FROM dbo.Info a

You could use a row_number and cte
;WITH cte AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Provider ORDER BY [Year] DESC, [Month] DESC) as rNum
FROM Info)
SELECT *
FROM cte where rNum = 1
If you want to create a view then
CREATE VIEW SomeViewName
AS
WITH cte AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Provider ORDER BY [Year] DESC, [Month] DESC) as rNum
FROM Info)
SELECT *
FROM cte where rNum = 1

One option is to use row_number:
select *
from (
select *, row_number() over (partition by provider
order by [year] desc, [month] desc) rn
from dbo.Info
) t
where rn = 1
This assumes each provider has only one number. If a provider can have several numbers, you may need to also partition by the number field.
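As a rough, runnable sketch of the row_number approach with the extra partition column, here is the same query run against the sample rows from the question. SQLite (3.25 or newer, for window functions) stands in for SQL Server, which is an assumption made only so the example can be executed; the table and column names are taken from the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo);
# the rows mirror the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Info ("Month" INT, "Year" INT, Provider INT, Number INT)')
conn.executemany(
    "INSERT INTO Info VALUES (?, ?, ?, ?)",
    [(1, 2015, 1, 345), (2, 2015, 1, 345), (3, 2015, 1, 345),
     (12, 2015, 2, 444), (1, 2016, 2, 444)],
)

# Number the rows of each (Provider, Number) pair from newest to oldest,
# then keep only the newest one (rn = 1).
latest = conn.execute(
    """
    SELECT "Month", "Year", Provider, Number
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY Provider, Number
                   ORDER BY "Year" DESC, "Month" DESC
               ) AS rn
        FROM Info
    )
    WHERE rn = 1
    ORDER BY Provider
    """
).fetchall()

print(latest)  # [(3, 2015, 1, 345), (1, 2016, 2, 444)]
```

Ordering by year first and month second is what makes (1, 2016) beat (12, 2015); taking max(month) and max(year) independently would not.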


Selecting rows that have row_number more than 1

I have a table as following (using bigquery):
id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2
112 2020 11 3000 1
113 2020 11 1000 1
Is there a way in which I can select rows that have row numbers more than one?
For example, my desired output is:
id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2
I don't want to just exclusively select rows with row_number = 2 but also row_number = 1 as well.
The original code block I used for the first table result is:
SELECT
id,
year,
month,
SUM(sales) AS sales,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC) AS row_number
FROM
table
GROUP BY
id, year, month
You can use window functions:
select t.* except (cnt)
from (select t.*,
count(*) over (partition by id) as cnt
from t
) t
where cnt > 1;
As applied to your aggregation query:
SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month,
SUM(sales) as sales,
ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number,
COUNT(*) OVER(Partition by id ORDER BY id ASC) AS cnt
FROM table
GROUP BY id, year, month
) iym
WHERE cnt > 1;
You can wrap your query as in below example
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (YOUR_ORIGINAL_QUERY)
)
where flag
so it can look as
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (
SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month
)
)
where flag
When applied to the sample data in your question, it returns only the two rows for id 111.
Try this:
with tmp as (SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)
It is a straightforward EXISTS query.
This is what I use. It is similar to @ElapsedSoul's answer, but as I understand it, IN is better than EXISTS for a static list. I'm not sure whether the performance difference, if any, is significant:
Difference between EXISTS and IN in SQL?
WITH T1 AS
(
SELECT
id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);
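To make the window-count idea concrete, here is a small sketch that runs it on the sample rows. SQLite (3.25 or newer) stands in for BigQuery, so `SELECT * EXCEPT` is replaced by an explicit column list; the table name `sales_by_month` and the alias `row_num` are made up for the example, and the row number is ordered by year and month (rather than by id) so the numbering is deterministic.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_by_month (id INT, year INT, month INT, sales INT)")
conn.executemany(
    "INSERT INTO sales_by_month VALUES (?, ?, ?, ?)",
    [(111, 2020, 11, 1000), (111, 2020, 12, 2000),
     (112, 2020, 11, 3000), (113, 2020, 11, 1000)],
)

# COUNT(*) OVER (PARTITION BY id) attaches the group size to every row,
# so filtering on cnt > 1 keeps all rows of ids that occur more than once.
multi_row_ids = conn.execute(
    """
    SELECT id, year, month, sales, row_num
    FROM (
        SELECT id, year, month, sales,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY year, month) AS row_num,
               COUNT(*)     OVER (PARTITION BY id) AS cnt
        FROM sales_by_month
    )
    WHERE cnt > 1
    ORDER BY id, row_num
    """
).fetchall()

print(multi_row_ids)  # [(111, 2020, 11, 1000, 1), (111, 2020, 12, 2000, 2)]
```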

Get most recent measurement

I have a table that has some measurements, ID and date.
The table is built like so
ID DATE M1 M2
1 2020 1 NULL
1 2020 NULL 15
1 2018 2 NULL
2 2019 1 NULL
2 2019 NULL 1
I would like to end up with a table that has one row per ID with the most recent measurement
ID M1 M2
1 1 15
2 1 1
Any ideas?
You can use correlated sub-query with aggregation :
select id, max(m1), max(m2)
from t
where t.date = (select max(t1.date) from t t1 where t1.id = t.id)
group by id;
Use RANK combined with an aggregation:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY ID ORDER BY DATE DESC) rn
FROM yourTable
)
SELECT ID, MAX(M1) AS M1, MAX(M2) AS M2
FROM cte
WHERE rn = 1
GROUP BY ID;
RANK assigns 1 to every record sharing the most recent date for an ID, so all of those records pass the filter (ROW_NUMBER would keep only one of them and lose the other measurement). Then we aggregate to find the max values for M1 and M2.
In standard SQL, you can use lag(ignore nulls):
select id, coalesce(m1, prev_m1), coalesce(m2, prev_m2)
from (select t.*,
lag(m1 ignore nulls) over (partition by id order by date) as prev_m1,
lag(m2 ignore nulls) over (partition by id order by date) as prev_m2,
row_number() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum = 1;
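Here is a minimal sketch of the rank-then-aggregate idea on the sample rows, using SQLite (3.25 or newer) as a stand-in engine; the table name `measurements` is made up for the example. It uses RANK rather than ROW_NUMBER so that both rows sharing the latest date for an ID survive the filter before MAX collapses them.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INT, date INT, m1 INT, m2 INT)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [(1, 2020, 1, None), (1, 2020, None, 15), (1, 2018, 2, None),
     (2, 2019, 1, None), (2, 2019, None, 1)],
)

# RANK gives 1 to every row tied on the latest date within an id;
# MAX then ignores the NULLs and merges those rows into one.
most_recent = conn.execute(
    """
    WITH ranked AS (
        SELECT id, m1, m2,
               RANK() OVER (PARTITION BY id ORDER BY date DESC) AS rnk
        FROM measurements
    )
    SELECT id, MAX(m1) AS m1, MAX(m2) AS m2
    FROM ranked
    WHERE rnk = 1
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(most_recent)  # [(1, 1, 15), (2, 1, 1)]
```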

First value in DATE minus 30 days SQL

I have a bunch of data, out of which I'm showing ID, max date and its corresponding values (user id, type, ...). Then I need to take the MAX date for each ID, subtract 30 days, and show the first date within that period and its corresponding values.
Example:
ID Date Name
1 01.05.2018 AAA
1 21.04.2018 CCC
1 05.04.2018 BBB
1 28.03.2018 AAA
expected:
ID max_date max_name previous_date previous_name
1 01.05.2018 AAA 05.04.2018 BBB
I have a working solution using subselects, but as I have quite a large WHERE clause, the refresh takes ages.
The subselect looks like this:
(SELECT MIN(N.name)
FROM t1 N
WHERE N.ID = T.ID
AND (N.date < MAX(T.date) AND N.date >= (MAX(T.date)-30))
AND (...)) AS PreviousName
How'd you write the select?
I'm using TSQL
Thanks
I can do this with 2 CTEs to build up the dates and names.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE t1 (ID int, theDate date, theName varchar(10)) ;
INSERT INTO t1 (ID, theDate, theName)
VALUES
( 1,'2018-05-01','AAA' )
, ( 1,'2018-04-21','CCC' )
, ( 1,'2018-04-05','BBB' )
, ( 1,'2018-03-27','AAA' )
, ( 2,'2018-05-02','AAA' )
, ( 2,'2018-05-21','CCC' )
, ( 2,'2018-03-03','BBB' )
, ( 2,'2018-01-20','AAA' )
;
Main Query:
;WITH cte1 AS (
SELECT t1.ID, t1.theDate, t1.theName
, DATEADD(day,-30,t1.theDate) AS dMinus30
, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate DESC) AS rn
FROM t1
)
, cte2 AS (
SELECT c2.ID, c2.theDate, c2.theName
, ROW_NUMBER() OVER (PARTITION BY c2.ID ORDER BY c2.theDate) AS rn
, COUNT(*) OVER (PARTITION BY c2.ID) AS theCount
FROM cte1
INNER JOIN cte1 c2 ON cte1.ID = c2.ID
AND c2.theDate >= cte1.dMinus30
WHERE cte1.rn = 1
GROUP BY c2.ID, c2.theDate, c2.theName
)
SELECT cte1.ID, cte1.theDate AS max_date, cte1.theName AS max_name
, cte2.theDate AS previous_date, cte2.theName AS previous_name
, cte2.theCount
FROM cte1
INNER JOIN cte2 ON cte1.ID = cte2.ID
AND cte2.rn=1
WHERE cte1.rn = 1
Results:
| ID | max_date | max_name | previous_date | previous_name |
|----|------------|----------|---------------|---------------|
| 1 | 2018-05-01 | AAA | 2018-04-05 | BBB |
| 2 | 2018-05-21 | CCC | 2018-05-02 | AAA |
cte1 numbers each ID's rows from newest to oldest with a ROW_NUMBER() window function, so rn = 1 marks the row holding max_date and max_name; it also computes the date 30 days earlier. cte2 joins back to cte1 to collect all rows dated within 30 days of cte1's max date, then numbers them from oldest to newest, so its rn = 1 is the earliest date in that window. The outer query joins the two results to get the columns needed, keeping only the most recent row from cte1 and the least recent row from cte2.
I'm not sure how well it will scale with your data, but using the CTEs should optimize pretty well.
EDIT: For the additional requirement, I just added in another COUNT() window function to cte2.
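For a quick check of the two-CTE idea against the question's own sample rows, here is a hedged sketch using SQLite (3.25 or newer) instead of SQL Server, so `DATEADD(day, -30, ...)` becomes SQLite's `date(..., '-30 days')`. Unlike cte2 above, it excludes the max date itself from the window, following the `N.date < MAX(T.date)` condition in the question; the CTE names `latest` and `windowed` are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for the demo).
# Dates are ISO strings, so plain string comparison orders them correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INT, theDate TEXT, theName TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "2018-05-01", "AAA"), (1, "2018-04-21", "CCC"),
     (1, "2018-04-05", "BBB"), (1, "2018-03-28", "AAA")],
)

rows = conn.execute(
    """
    WITH latest AS (        -- rn = 1 is the newest row per ID
        SELECT ID, theDate, theName,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY theDate DESC) AS rn
        FROM t1
    ),
    windowed AS (           -- rn = 1 is the oldest row inside the 30-day window
        SELECT t1.ID, t1.theDate, t1.theName,
               ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate) AS rn
        FROM t1
        JOIN latest ON latest.ID = t1.ID AND latest.rn = 1
        WHERE t1.theDate >= date(latest.theDate, '-30 days')
          AND t1.theDate <  latest.theDate
    )
    SELECT latest.ID, latest.theDate, latest.theName,
           windowed.theDate, windowed.theName
    FROM latest
    JOIN windowed ON windowed.ID = latest.ID AND windowed.rn = 1
    WHERE latest.rn = 1
    """
).fetchall()

print(rows)  # [(1, '2018-05-01', 'AAA', '2018-04-05', 'BBB')]
```

With the inner join, an ID that has no earlier row inside the window drops out of the result; a LEFT JOIN to `windowed` would keep it with NULLs instead.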
I would do:
select id,
max(case when seqnum = 1 then date end) as max_date,
max(case when seqnum = 1 then name end) as max_name,
max(case when seqnum = 2 then date end) as prev_date,
max(case when seqnum = 2 then name end) as prev_name
from (select e.*, row_number() over (partition by id order by date desc) as seqnum
from example e
) e
group by id;

Determine first year of minimum consecutive year range and count of consecutive years

Given the following table,
PersonID Year
---------- ----------
1 1991
1 1992
1 1993
1 1994
1 1996
1 1997
1 1998
1 1999
1 2000
1 2001
1 2002
1 2003
2 1999
2 2000
... ...
Is there a way with a SQL select query to get the first year of the most recent range of consecutive years meeting a minimum number, as well as the total consecutive years? In this case, for 4 year minimum, for personID 1, it would return 1996 and 8.
This will be joined to another table on personID, so the personID is not specific.
Thanks for your help.
You can create islands of years in the cte and check your conditions:
declare @PersonId int = 1, @cnt int = 4
;with cte_numbered as (
select
PersonID,
[Year],
row_number() over(partition by PersonID order by [Year]) as rn
from Table1
), cte_grouped as (
select
PersonID, min([Year]) as [Year], count(*) as cnt
from cte_numbered
group by PersonID, [Year] - rn
)
select top 1 *
from cte_grouped
where PersonId = @PersonId and cnt >= @cnt
order by [Year] desc
sql fiddle demo
You also could do something more optimized, like this
declare @PersonId int = 1, @cnt int = 4
;with cte_numbered as (
select
PersonID,
[Year],
row_number() over(partition by PersonID order by [Year]) as rn
from Table1
where personId = @personId
), cte_grouped as (
select
row_number() over(partition by [year] - rn order by year) as cnt, year
from cte_numbered
)
select top 1 cnt, year - cnt + 1
from cte_grouped
where cnt >= @cnt
order by [Year] desc, cnt desc
sql fiddle demo
Using two CTEs to create row number groupings allows you to group by PersonID and display all personIDs that it applies to:
Declare @MinimumConsecutiveYears int=4
;With YearGroupings as (
Select
PersonID
,year
,row_number() over(partition by personid order by year asc) rown
From #years
)
, ConsecutiveYears as (
Select
PersonID
,min(year) as MinYear
,count(rown) as ConsecutiveYears
,row_number() over(partition by PersonID order by count(rown) desc) rown
From YearGroupings
Group By PersonID,year-rown
Having Count(rown)>@MinimumConsecutiveYears
)
Select PersonID,MinYear,ConsecutiveYears
From ConsecutiveYears
Where Rown=1
Alternatively, without CTEs:
Declare @MinimumConsecutiveYears int=4
Select
PersonID
,year
,row_number() over(partition by personid order by year asc) rown
Into #YearGroupings
From #years
Select
PersonID
,min(year) as MinYear
,count(rown) as ConsecutiveYears
,row_number() over(partition by PersonID order by count(rown) desc) rown
Into #ConsecutiveYears
From #YearGroupings
Group By PersonID,year-rown
Having Count(rown)>@MinimumConsecutiveYears
Select PersonID,MinYear,ConsecutiveYears
From #ConsecutiveYears
Where Rown=1
try this:
declare @minnumber int
set @minnumber = 4
declare @personid int
set @personid = 0
select orig.[PersonID], min(orig.[Year]) as FirstYear ,count(*) as TCYears
from --add rownumber, sorted by year column
(
SELECT ROW_NUMBER()
OVER (Partition by [PersonID] ORDER BY [Year]) AS Row,*
from Table1
where PersonID = @personid
) orig
where orig.PersonID = @personid
and orig.Row > @minnumber --
group by orig.PersonID
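The islands trick used in the answers above (year minus row number is constant within a run of consecutive years) can be sketched end to end as follows. SQLite (3.25 or newer) stands in for SQL Server, the `@` variable becomes a named parameter `:min_years`, and the CTE names are made up for the example. It returns one row per person, so it can be joined to another table on PersonID.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (PersonID INT, Year INT)")
years = [(1, y) for y in (1991, 1992, 1993, 1994, *range(1996, 2004))]
years += [(2, 1999), (2, 2000)]
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", years)

sql = """
WITH numbered AS (
    SELECT PersonID, Year,
           ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY Year) AS rn
    FROM Table1
),
islands AS (        -- Year - rn is the same for every year in one consecutive run
    SELECT PersonID, MIN(Year) AS FirstYear, COUNT(*) AS ConsecutiveYears
    FROM numbered
    GROUP BY PersonID, Year - rn
    HAVING COUNT(*) >= :min_years
),
ranked AS (         -- recency = 1 is the latest qualifying run per person
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY FirstYear DESC) AS recency
    FROM islands
)
SELECT PersonID, FirstYear, ConsecutiveYears
FROM ranked
WHERE recency = 1
ORDER BY PersonID
"""

runs = conn.execute(sql, {"min_years": 4}).fetchall()
print(runs)  # [(1, 1996, 8)]
```

Person 2 has only a two-year run, so it is filtered out by the HAVING clause; person 1 has two qualifying runs (1991 to 1994 and 1996 to 2003) and the later one wins.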

How to filter out the first and last entry from a table using RANK?

I've this data:
Id Date Value
'a' 2000 55
'a' 2001 3
'a' 2012 2
'a' 2014 5
'b' 1999 10
'b' 2014 110
'b' 2015 8
'c' 2011 4
'c' 2012 33
I want to filter out the first and the last value (when the table is sorted on the Date column), and only keep the other values. In case there are only two entries, nothing is returned. (Example for Id = 'c')
ID Date Value
'a' 2001 3
'a' 2012 2
'b' 2014 110
I tried to use order by (RANK() OVER (PARTITION BY [Id] ORDER BY Date ...)) in combination with this article (http://blog.sqlauthority.com/2008/03/02/sql-server-how-to-retrieve-top-and-bottom-rows-together-using-t-sql/) but I can't get it to work.
[UPDATE]
All 3 answers seem fine. But I'm not a SQL expert, so my question is which one performs fastest if the table has around 800000 rows and there are no indexes on any column.
You can use row_number twice to determine the min and max dates and then filter accordingly:
with cte as (
select id, [date], value,
row_number() over (partition by id order by [date]) minrn,
row_number() over (partition by id order by [date] desc) maxrn
from data
)
select id, [date], value
from cte
where minrn != 1 and maxrn != 1
SQL Fiddle Demo
Here's another approach using min and max for this without needing to use a ranking function:
with cte as (
select id, min([date]) mindate, max([date]) maxdate
from data
group by id
)
select *
from data d
where not exists (
select 1
from cte c
where d.id = c.id and d.[date] in (c.mindate, c.maxdate))
More Fiddle
Here is a similar solution with row_number and count :
SELECT id,
dat,
value
FROM (SELECT *,
ROW_NUMBER()
OVER(
partition BY id
ORDER BY dat) rnk,
COUNT(*)
OVER (
partition BY id) cnt
FROM #table) t
WHERE rnk NOT IN( 1, cnt )
You can do this with EXISTS:
SELECT *
FROM Table1 a
WHERE EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date < a.Date
)
AND EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date > a.Date
)
Demo: SQL Fiddle
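To check the double row_number approach against the sample data, here is a small runnable sketch. SQLite (3.25 or newer) stands in for SQL Server, which is an assumption for the demo; it says nothing about which variant is fastest on 800000 unindexed rows, which would need to be measured on the real table.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT, date INT, value INT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
     ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
     ("c", 2011, 4), ("c", 2012, 33)],
)

# minrn = 1 marks the first row per id, maxrn = 1 marks the last one;
# dropping both leaves only the interior rows.
middle_rows = conn.execute(
    """
    WITH cte AS (
        SELECT id, date, value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY date)      AS minrn,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS maxrn
        FROM data
    )
    SELECT id, date, value
    FROM cte
    WHERE minrn <> 1 AND maxrn <> 1
    ORDER BY id, date
    """
).fetchall()

print(middle_rows)  # [('a', 2001, 3), ('a', 2012, 2), ('b', 2014, 110)]
```

Id 'c' has only two rows, one first and one last, so nothing is returned for it.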