Rank the dates in a table for each month - sql

I need to find the last three distinct loaddates for each month in various tables for reporting purposes. Example: if I have data from February 2021 to today, I need the last three loaddates of Feb 2021, March 2021, and so on until Dec 2022.
So far, I have written the query below in SQL Server, which gives me the result for a particular month that I pass in the WHERE condition:
SELECT ROW_NUMBER() OVER (ORDER BY loaddate desc) AS myrank, loaddate
FROM <tablename>
where year(loaddate) = 2022 and month(loaddate) = 8
group by loaddate
It gives me:
myrank loaddate
1 2022-08-29 00:00:00.000
2 2022-08-25 00:00:00.000
3 2022-08-18 00:00:00.000
4 2022-08-17 00:00:00.000
5 2022-08-11 00:00:00.000
From this I can easily select the top three dates with the query below:
SELECT myrank, loaddate
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY loaddate desc) AS myrank, loaddate
FROM <tablename>
where year(loaddate) = 2022 and month(loaddate) = 8
group by loaddate
) as daterank
WHERE daterank.myrank <= 3
which outputs:
myrank loaddate
1 2022-08-29 00:00:00.000
2 2022-08-25 00:00:00.000
3 2022-08-18 00:00:00.000
But this only works for one month: I'm manually passing the month number in the WHERE condition. How can I make this ranking query return the last 3 distinct loaddates for each month of data that exists in the table?
Also, how do I run such a generic query on a list of 400+ tables instead of changing the table name manually for each table in the list?

You just need to add a PARTITION BY clause to ROW_NUMBER() and partition by month (and year, since your data might cross a year boundary). Grouping by loaddate, as in your query, keeps the dates distinct:
WITH cte AS (
SELECT loaddate
, ROW_NUMBER() OVER (PARTITION BY DATEPART(year, loaddate), DATEPART(month, loaddate) ORDER BY loaddate DESC) AS myrank
FROM #MyTable
GROUP BY loaddate -- one row per distinct loaddate
)
SELECT *
FROM cte
WHERE myrank <= 3
ORDER BY loaddate;
Note: The CTE is doing the same thing as your sub-query - don't let that confuse you - I just prefer it for neatness.
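As a quick way to check the PARTITION BY idea without a SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python's standard sqlite3 module. It assumes SQLite 3.25 or later (for window functions), a hypothetical table named loads, and made-up dates; since SQLite has no DATEPART, strftime('%Y-%m', loaddate) stands in for the year-plus-month partition key.

```python
import sqlite3

# Made-up sample data; the repeated dates are collapsed by GROUP BY.
SAMPLE_DATES = [
    "2021-08-02", "2021-08-30",
    "2022-07-05", "2022-07-12", "2022-07-12", "2022-07-19", "2022-07-26",
    "2022-08-11", "2022-08-17", "2022-08-18", "2022-08-25", "2022-08-29", "2022-08-29",
]

QUERY = """
WITH cte AS (
    SELECT loaddate,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y-%m', loaddate)  -- one partition per year+month
               ORDER BY loaddate DESC) AS myrank
    FROM loads
    GROUP BY loaddate                                   -- one row per distinct date
)
SELECT loaddate, myrank
FROM cte
WHERE myrank <= 3
ORDER BY loaddate
"""


def last_three_per_month(dates):
    """Return (loaddate, myrank) for the 3 latest distinct dates of every month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE loads (loaddate TEXT)")  # hypothetical table name
    con.executemany("INSERT INTO loads VALUES (?)", [(d,) for d in dates])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

Calling last_three_per_month(SAMPLE_DATES) returns two rows for August 2021 (it only has two dates) and three each for July and August 2022, with the duplicated 2022-08-29 appearing once.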

If I understand your request correctly, I think this would help you (the partition includes the year so that, e.g., February 2021 and February 2022 are ranked separately):
SELECT myrank, loaddate, monthofyear
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY year(loaddate), month(loaddate) ORDER BY loaddate DESC) AS myrank
, loaddate, month(loaddate) as monthofyear
FROM Db15.dbo.mytable
GROUP BY loaddate
) AS daterank
WHERE daterank.myrank <= 3

Related

How to select only rows with 2 consecutive "Yes" values ordered by year in SQL?

I have a query to return sample values for each employee per calendar year, and a column that checks (yes/no) if the sample value is >= 60,000.
My initial data:
Employee_ID Calendar_Year Sample_Value Sample_Check
1234 2020 55,000 No
1234 2021 70,000 Yes
1234 2022 50,000 No
3456 2020 80,000 Yes
3456 2021 40,000 No
3456 2022 65,000 Yes
5678 2020 30,000 No
5678 2021 70,000 Yes
5678 2022 90,000 Yes
I would like to get this result, because this employee is the only one with "yes" for 2 consecutive calendar years.
Employee_ID Calendar_Year Sample_Value Sample_Check
5678 2022 90,000 Yes
I have looked up similar questions but could not find something that solves my issue. I have also looked into LAG and LEAD but need help in understanding if they can give me the result I want.
I would tend towards using a correlated query to find qualifying rows, followed by a row_number window to select the greatest/least of each group you require:
with v as (
select *,
case when exists (
select * from t t2
where t2.Employee_ID = t.Employee_ID
and t.Sample_Check = 'Yes'
and t2.Sample_Check = 'Yes'
and t2.Calendar_Year = t.Calendar_Year - 1
) then 1 else 0 end valid
from t
), s as (
select *,
Row_Number() over(partition by Employee_ID, valid order by Calendar_Year desc) rn
from v
)
select Employee_Id, Calendar_Year, Sample_Value, Sample_Check
from s
where valid = 1 and rn = 1;
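To see the correlated-EXISTS approach run end to end, here is a sketch against an in-memory SQLite database via Python's sqlite3 (window functions need SQLite 3.25+). The table t and its columns mirror the question; the outer table is aliased t1 for clarity. The extra 2023 row used in the second test is the hypothetical record added in another answer in this thread, included to show that rn = 1 keeps only the latest qualifying year.

```python
import sqlite3

BASE_ROWS = [
    (1234, 2020, 55000, "No"), (1234, 2021, 70000, "Yes"), (1234, 2022, 50000, "No"),
    (3456, 2020, 80000, "Yes"), (3456, 2021, 40000, "No"), (3456, 2022, 65000, "Yes"),
    (5678, 2020, 30000, "No"), (5678, 2021, 70000, "Yes"), (5678, 2022, 90000, "Yes"),
]

QUERY = """
WITH v AS (
    SELECT t1.*,
           CASE WHEN EXISTS (
                    SELECT 1 FROM t AS t2
                    WHERE t2.Employee_ID = t1.Employee_ID
                      AND t1.Sample_Check = 'Yes'
                      AND t2.Sample_Check = 'Yes'
                      AND t2.Calendar_Year = t1.Calendar_Year - 1)  -- previous year was Yes
                THEN 1 ELSE 0 END AS valid
    FROM t AS t1
), s AS (
    SELECT v.*,
           ROW_NUMBER() OVER (PARTITION BY Employee_ID, valid
                              ORDER BY Calendar_Year DESC) AS rn
    FROM v
)
SELECT Employee_ID, Calendar_Year, Sample_Value, Sample_Check
FROM s
WHERE valid = 1 AND rn = 1   -- latest qualifying year per employee
ORDER BY Employee_ID
"""


def consecutive_yes(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Employee_ID INTEGER, Calendar_Year INTEGER, "
                "Sample_Value INTEGER, Sample_Check TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```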
I'm not sure how bulletproof this is. I used the LAG function over a window partition to get the prior Sample_Check, then matched in the outer query to get the records where the current value equals the prior one (basically Yes = Yes). If you had 3 Yes values in a row, it would pull back 2 rows. You might be able to use some conditional logic to offset the rows if you ran into that scenario.
SELECT
*
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE
) X
WHERE Sample_Check = 'Yes' AND LagSampleCheck1 = 'Yes'
ORDER BY Employee_ID ASC, Calendar_Year ASC
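Below is a small sketch of the LAG approach in SQLite (3.25+, via Python's sqlite3), mainly to show why the outer filter should test for 'Yes' explicitly: comparing Sample_Check = LagSampleCheck1 alone also matches two consecutive No values. Employee 9999 is a hypothetical addition for that purpose, and Sample_Value is omitted for brevity.

```python
import sqlite3

ROWS = [
    (1234, 2020, "No"), (1234, 2021, "Yes"), (1234, 2022, "No"),
    (3456, 2020, "Yes"), (3456, 2021, "No"), (3456, 2022, "Yes"),
    (5678, 2020, "No"), (5678, 2021, "Yes"), (5678, 2022, "Yes"),
    (9999, 2021, "No"), (9999, 2022, "No"),  # hypothetical: two No values in a row
]

QUERY = """
SELECT Employee_ID, Calendar_Year, Sample_Check
FROM (
    SELECT t.*,
           LAG(Sample_Check) OVER (PARTITION BY Employee_ID
                                   ORDER BY Calendar_Year) AS LagSampleCheck1
    FROM t
) AS x
WHERE {condition}
ORDER BY Employee_ID, Calendar_Year
"""


def run(condition):
    """Run the LAG query with the given (fixed, trusted) WHERE condition."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Employee_ID INTEGER, Calendar_Year INTEGER, Sample_Check TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY.format(condition=condition)).fetchall()
    con.close()
    return rows


equal_only = run("Sample_Check = LagSampleCheck1")
yes_only = run("Sample_Check = 'Yes' AND LagSampleCheck1 = 'Yes'")
```

Here equal_only also contains the 9999/2022 "No" row, while yes_only keeps just employee 5678 in 2022.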
I also created this version, which adds another ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY Calendar_Year DESC) so that it picks up only the latest year when more than one row meets the criteria. To test it, I added another record to your original data set (Employee_ID 5678, Calendar_Year 2023, Sample_Value 90,000, Sample_Check Yes) to create two qualifying records.
Employee_ID Calendar_Year Sample_Value Sample_Check
1234 2020 55,000 No
1234 2021 70,000 Yes
1234 2022 50,000 No
3456 2020 80,000 Yes
3456 2021 40,000 No
3456 2022 65,000 Yes
5678 2020 30,000 No
5678 2021 70,000 Yes
5678 2022 90,000 Yes
5678 2023 90,000 Yes
SELECT
*
FROM
(
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY CALENDAR_YEAR DESC) AS ROWCOUNTER
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE
) X
WHERE Sample_Check = 'Yes' AND LagSampleCheck1 = 'Yes'
) Z
WHERE ROWCOUNTER = 1
ORDER BY Employee_ID ASC, Calendar_Year ASC
This is the most straightforward solution: just join the table with itself (assuming calendar year is numeric).
SELECT t1.*, t2.sample_check
FROM data AS t1, data AS t2
WHERE t1.emp_id = t2.emp_id
AND t1.calendar_year = t2.calendar_year + 1
AND t1.sample_check = t2.sample_check
AND t1.sample_check = 'Yes'
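Here is the self-join run against SQLite through Python's sqlite3, written with an explicit JOIN ... ON instead of the comma join (same result). The second test adds the hypothetical 2023 row from earlier in this thread to show that the self-join on its own returns every qualifying year, so a ROW_NUMBER (or MAX) step is still needed if only the latest year is wanted.

```python
import sqlite3

DATA = [
    (1234, 2020, 55000, "No"), (1234, 2021, 70000, "Yes"), (1234, 2022, 50000, "No"),
    (3456, 2020, 80000, "Yes"), (3456, 2021, 40000, "No"), (3456, 2022, 65000, "Yes"),
    (5678, 2020, 30000, "No"), (5678, 2021, 70000, "Yes"), (5678, 2022, 90000, "Yes"),
]

QUERY = """
SELECT t1.emp_id, t1.calendar_year, t1.sample_value, t1.sample_check
FROM data AS t1
JOIN data AS t2
  ON t1.emp_id = t2.emp_id
 AND t1.calendar_year = t2.calendar_year + 1   -- t2 is the previous year
WHERE t1.sample_check = t2.sample_check
  AND t1.sample_check = 'Yes'
ORDER BY t1.emp_id, t1.calendar_year
"""


def consecutive_yes_join(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (emp_id INTEGER, calendar_year INTEGER, "
                "sample_value INTEGER, sample_check TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```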
You can also get the same result with the LAG function:
WITH temp AS (SELECT emp_id
, calendar_year
, sample_value
, sample_check
, lag( CASE WHEN sample_check = 'Yes' THEN 1 ELSE 0 END, 1 )
OVER (PARTITION BY emp_id ORDER BY calendar_year) AS prevcheck
FROM data)
SELECT *
FROM temp
WHERE prevcheck = 1
AND sample_check = 'Yes'
Both give the same result:
emp_id calendar_year sample_value sample_check prevcheck
5678 2022 90 Yes 1

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using lag(price) over (partition by id order by date), but I can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate row numbers for each id (assuming multiple id values) in the source data, ordered by date. The rows with row number 1 (the minimum date for the corresponding id) are the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To use the price values from the original data, you can use a subquery to get the price for this new date, and if there is no such value (no row for this date), use the previous price value from the CTE (use the COALESCE function for this).
For SQL Server, the query can look like this:
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) so that the recursion runs through all the steps: the default limit is 100 recursion levels, which is not enough to cover the whole date range.
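For a quick runnable check of the recursive-CTE idea, here is a sketch in SQLite via Python's sqlite3 (SQLite 3.25+ for ROW_NUMBER). Assumptions: SQLite's date(x, '+1 day') replaces DATEADD, and a fixed end date passed as a named parameter replaces GETDATE() so the output is reproducible; the structure (anchor row per id, one-day step, COALESCE fallback to the previous price) follows the SQL Server query.

```python
import sqlite3

PRICES = [(1, 2, "2022-05-21"), (1, 3, "2022-06-15"), (1, 2.5, "2022-06-19")]

QUERY = """
WITH RECURSIVE cte(id, date, price) AS (
    -- anchor rows: the earliest price row of every id
    SELECT id, date, price
    FROM (SELECT tbl.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
          FROM tbl) AS t
    WHERE rn = 1
    UNION ALL
    -- recursive step: the next day, with a new price if one exists, else the old one
    SELECT cte.id,
           date(cte.date, '+1 day'),
           COALESCE((SELECT tbl.price FROM tbl
                     WHERE tbl.id = cte.id
                       AND tbl.date = date(cte.date, '+1 day')),
                    cte.price)
    FROM cte
    WHERE date(cte.date, '+1 day') <= :end_date
)
SELECT id, date, price FROM cte ORDER BY id, date
"""


def fill_daily(rows, end_date):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, {"end_date": end_date}).fetchall()
    con.close()
    return result
```

For example, fill_daily(PRICES, "2022-06-21") yields one row per day from 2022-05-21 to 2022-06-21, with the price switching to 3 on 2022-06-15 and to 2.5 on 2022-06-19.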
The same approach for MySQL (you need MySQL 8.0 or later to use CTEs):
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
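Both queries produce the same results; the only difference is the use of each engine's specific date functions. Note that MySQL caps recursive CTE depth with the cte_max_recursion_depth system variable (default 1000), so a date range longer than roughly 1000 days needs a higher setting.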
For MySQL versions before 8.0, which have no CTE support and so cannot generate the required date range on the fly, you can use a calendar table instead.
Assuming the calendar table has a column storing date values (let's call it date for simplicity), you can CROSS JOIN it with the distinct id values from your table to generate, for each id, the dates from its first price date up to today. Then you can use a subquery to get the latest price value stored for that id on or before the corresponding date.
So the query would look like this:
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so:
id price date
1 2 2022-05-21
1 3 2022-05-25
1 2.5 2022-05-28
2 10 2022-05-25
2 100 2022-05-30
the query produces the following results:
id date price
1 2022-05-21 2
1 2022-05-22 2
1 2022-05-23 2
1 2022-05-24 2
1 2022-05-25 3
1 2022-05-26 3
1 2022-05-27 3
1 2022-05-28 2.5
1 2022-05-29 2.5
1 2022-05-30 2.5
2 2022-05-25 10
2 2022-05-26 10
2 2022-05-27 10
2 2022-05-28 10
2 2022-05-29 10
2 2022-05-30 100
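The calendar-table query can be checked against the sample data above with a sketch like this one, which uses an in-memory SQLite database via Python's sqlite3. Assumptions: the calendar table is filled from Python for 2022-05-20 to 2022-05-30, and a bound :today parameter replaces NOW() so the run is reproducible; the correlated subquery with ORDER BY ... LIMIT 1 works the same way in SQLite as in MySQL.

```python
import sqlite3
from datetime import date, timedelta

TBL = [(1, 2, "2022-05-21"), (1, 3, "2022-05-25"), (1, 2.5, "2022-05-28"),
       (2, 10, "2022-05-25"), (2, 100, "2022-05-30")]

QUERY = """
SELECT d.id, d.date,
       (SELECT price FROM tbl
        WHERE tbl.id = d.id AND tbl.date <= d.date
        ORDER BY tbl.date DESC
        LIMIT 1) AS price                      -- latest price on or before the day
FROM (
    SELECT t.id, c.date
    FROM calendar AS c
    CROSS JOIN (SELECT DISTINCT id FROM tbl) AS t
    WHERE c.date BETWEEN (SELECT MIN(date) FROM tbl WHERE tbl.id = t.id)
                     AND :today                 -- stands in for NOW()
) AS d
ORDER BY d.id, d.date
"""


def daily_prices(rows, cal_start, cal_end, today):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    con.execute("CREATE TABLE calendar (date TEXT PRIMARY KEY)")
    day, last = date.fromisoformat(cal_start), date.fromisoformat(cal_end)
    while day <= last:  # one calendar row per day, inclusive
        con.execute("INSERT INTO calendar VALUES (?)", (day.isoformat(),))
        day += timedelta(days=1)
    result = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return result
```

Calling daily_prices(TBL, "2022-05-20", "2022-05-30", "2022-05-30") returns the same 16 rows as the results table above.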

How to merge rows startdate enddate based on column values using Lag Lead or window functions?

I have a table with 4 columns: ID, STARTDATE, ENDDATE and BADGE. I want to merge rows based on ID and BADGE values but make sure that only consecutive rows will get merged.
For example, if the input is:
the output will be:
I have tried LAG/LEAD and bounded/unbounded PRECEDING window frames, but I'm unable to achieve the output:
SELECT ID,
STARTDATE,
MAX(ENDDATE),
NAME
FROM (SELECT USERID,
IFF(LAG(NAME) over(Partition by USERID Order by STARTDATE) = NAME,
LAG(STARTDATE) over(Partition by USERID Order by STARTDATE),
STARTDATE) AS STARTDATE,
ENDDATE,
NAME
from myTable )
GROUP BY USERID,
STARTDATE,
NAME
We have to make sure that we merge only consecutive rows having the same ID and Badge.
Any help will be appreciated. Thanks.
You can split the problem into two steps:
creating the right partitions
aggregating on the partitions with direct aggregation functions (MIN and MAX)
You can approach the first step using a boolean field that is 1 when there's no consecutive date match (previous row's ENDDATE + 1 day = current row's STARTDATE). This value indicates when a new partition should start, so if you compute a running sum over it, you get correctly numbered partitions.
WITH cte AS (
SELECT *,
IFF(DATEADD(day, 1, LAG(ENDDATE) OVER(PARTITION BY ID, Badge ORDER BY STARTDATE)) = STARTDATE, 0, 1) AS boolval
FROM tab
)
SELECT *,
SUM(COALESCE(boolval, 0)) OVER(ORDER BY ID DESC, STARTDATE) AS rn
FROM cte
The second step is then just a direct aggregation of STARTDATE and ENDDATE using the MIN and MAX functions respectively, grouping on your ranking value. For syntax correctness, you also need to add ID and Badge to the GROUP BY clause, even though they are already captured by the computed ranking value.
WITH cte AS (
SELECT *,
IFF(DATEADD(day, 1, LAG(ENDDATE) OVER(PARTITION BY ID, Badge ORDER BY STARTDATE)) = STARTDATE, 0, 1) AS boolval
FROM tab
), cte2 AS (
SELECT *,
SUM(COALESCE(boolval, 0)) OVER(ORDER BY ID DESC, STARTDATE) AS rn
FROM cte
)
SELECT ID,
MIN(STARTDATE) AS STARTDATE,
MAX(ENDDATE) AS ENDDATE,
Badge
FROM cte2
GROUP BY ID,
Badge,
rn
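Here is a runnable sketch of the two steps in SQLite via Python's sqlite3 (3.25+), using a tiny hypothetical data set. SQLite has no IFF, so a CASE expression is used, and date(x, '+1 day') stands in for adding one day. The fourth row repeats badge A after a B row, so it must start a new island rather than merge with the first two rows.

```python
import sqlite3

# Hypothetical rows: (ID, STARTDATE, ENDDATE, BADGE)
ROWS = [
    (1, "2020-01-01", "2020-01-10", "A"),
    (1, "2020-01-11", "2020-01-20", "A"),  # starts the day after the previous A row ends
    (1, "2020-01-21", "2020-01-31", "B"),
    (1, "2020-02-01", "2020-02-10", "A"),  # A again, but not adjacent to the last A row
]

QUERY = """
WITH cte AS (
    SELECT tab.*,
           CASE WHEN date(LAG(ENDDATE) OVER (PARTITION BY ID, BADGE
                                             ORDER BY STARTDATE), '+1 day') = STARTDATE
                THEN 0 ELSE 1 END AS boolval      -- 1 marks the start of a new island
    FROM tab
), cte2 AS (
    SELECT cte.*,
           SUM(boolval) OVER (ORDER BY ID DESC, STARTDATE) AS rn  -- running island number
    FROM cte
)
SELECT ID, MIN(STARTDATE) AS STARTDATE, MAX(ENDDATE) AS ENDDATE, BADGE
FROM cte2
GROUP BY ID, BADGE, rn
ORDER BY ID, MIN(STARTDATE)
"""


def merge_consecutive(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (ID INTEGER, STARTDATE TEXT, ENDDATE TEXT, BADGE TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

The first LAG in each (ID, BADGE) partition is NULL, so the comparison is not true and boolval is 1, which is why no COALESCE is needed in this version.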
In Snowflake, this kind of gaps-and-islands problem can be solved using the CONDITIONAL_TRUE_EVENT function, as in the query below.
The first CTE creates a column flagging a change event (true or false) whenever the value of the badge column changes.
The next CTE (cte_1) applies CONDITIONAL_TRUE_EVENT to this change-event column, producing another column (incremented whenever the flag is TRUE) that is used for grouping in the final query.
The final query is just a MIN/MAX with GROUP BY.
with cte as (
select
m.*,
case when badge <> lag(badge) over (partition by id order by startdate)
then true
else false end flag
from merge_tab m
), cte_1 as (
select c.*,
conditional_true_event(flag) over (partition by id order by startdate) cn
from cte c
)
select id,min(startdate) ms, max(enddate) me, badge
from cte_1
group by id,badge,cn
order by id desc, ms asc, me asc, badge asc;
Final output:
ID MS ME BADGE
51 1985-02-01 2019-04-28 1
51 2019-04-29 2020-08-16 2
51 2020-08-17 2021-04-03 3
51 2021-04-04 2021-04-05 1
51 2021-04-06 2022-08-20 2
51 2022-08-21 9999-12-31 3
10 2020-02-06 9999-12-31 3
With data:
select * from merge_tab;
ID STARTDATE ENDDATE BADGE
51 1985-02-01 2019-04-28 1
51 2019-04-29 2019-04-28 2
51 2019-09-16 2019-11-16 2
51 2019-11-17 2020-08-16 2
51 2020-08-17 2021-04-03 3
51 2021-04-04 2021-04-05 1
51 2021-04-06 2022-05-05 2
51 2022-05-06 2022-08-20 2
51 2022-08-21 9999-12-31 3
10 2020-02-06 2019-04-28 3
10 2021-03-21 9999-12-31 3
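CONDITIONAL_TRUE_EVENT is Snowflake-specific; outside Snowflake, a running SUM of the change flag gives the same numbering (starting at 0 and increasing by one whenever the flag is true). The sketch below reproduces the final output above from the merge_tab data in SQLite via Python's sqlite3 (3.25+), ordering each window by startdate and assuming startdate is unique within each id.

```python
import sqlite3

MERGE_TAB = [
    (51, "1985-02-01", "2019-04-28", 1), (51, "2019-04-29", "2019-04-28", 2),
    (51, "2019-09-16", "2019-11-16", 2), (51, "2019-11-17", "2020-08-16", 2),
    (51, "2020-08-17", "2021-04-03", 3), (51, "2021-04-04", "2021-04-05", 1),
    (51, "2021-04-06", "2022-05-05", 2), (51, "2022-05-06", "2022-08-20", 2),
    (51, "2022-08-21", "9999-12-31", 3),
    (10, "2020-02-06", "2019-04-28", 3), (10, "2021-03-21", "9999-12-31", 3),
]

QUERY = """
WITH cte AS (
    SELECT m.*,
           CASE WHEN badge <> LAG(badge) OVER (PARTITION BY id ORDER BY startdate)
                THEN 1 ELSE 0 END AS flag          -- 1 when the badge changes
    FROM merge_tab AS m
), cte_1 AS (
    SELECT c.*,
           SUM(flag) OVER (PARTITION BY id ORDER BY startdate
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cn
    FROM cte AS c
)
SELECT id, MIN(startdate) AS ms, MAX(enddate) AS me, badge
FROM cte_1
GROUP BY id, badge, cn
ORDER BY id DESC, ms, me, badge
"""


def merge_islands(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE merge_tab (id INTEGER, startdate TEXT, enddate TEXT, badge INTEGER)")
    con.executemany("INSERT INTO merge_tab VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Like the Snowflake query, this merges rows purely on badge changes and does not check that the dates are contiguous.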

Find the start and end date of stock difference

Please suggest a good SQL query to find the start and end date of each period with the same stock value.
Imagine I have data in a table like the one below:
Sample_table
transaction_date stock
2018-12-01 10
2018-12-02 10
2018-12-03 20
2018-12-04 20
2018-12-05 20
2018-12-06 20
2018-12-07 20
2018-12-08 10
2018-12-09 10
2018-12-10 30
Expected result should be
Start_date end_date stock
2018-12-01 2018-12-02 10
2018-12-03 2018-12-07 20
2018-12-08 2018-12-09 10
2018-12-10 null 30
This is a gaps-and-islands problem. You can use ROW_NUMBER and GROUP BY for this:
select t.stock, min(transaction_date) as start_date, max(transaction_date) as end_date
from (
select row_number() over (order by transaction_date) -
row_number() over (partition by stock order by transaction_date) grp,
transaction_date,
stock
from data
) t
group by t.grp, t.stock
order by start_date
In the linked DBFIDDLE demo I also handle the NULL end date of the last group, but the main idea of finding consecutive rows is built on the query above.
You may check this for an explanation of this solution.
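Here is the ROW_NUMBER-difference idea run on the sample data in SQLite via Python's sqlite3 (3.25+). Within a run of equal stock values both row numbers increase by one per row, so their difference stays constant and labels the run; runs that reuse a stock value (10 here) get different labels. This sketch returns MAX(transaction_date) for the last run instead of the NULL the question asks for.

```python
import sqlite3

SAMPLE = [
    ("2018-12-01", 10), ("2018-12-02", 10), ("2018-12-03", 20), ("2018-12-04", 20),
    ("2018-12-05", 20), ("2018-12-06", 20), ("2018-12-07", 20), ("2018-12-08", 10),
    ("2018-12-09", 10), ("2018-12-10", 30),
]

QUERY = """
SELECT MIN(transaction_date) AS start_date,
       MAX(transaction_date) AS end_date,
       stock
FROM (
    SELECT transaction_date, stock,
           ROW_NUMBER() OVER (ORDER BY transaction_date)
         - ROW_NUMBER() OVER (PARTITION BY stock ORDER BY transaction_date) AS grp
    FROM sample_table
) AS t
GROUP BY grp, stock          -- grp alone can collide across different stock values
ORDER BY start_date
"""


def stock_runs(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample_table (transaction_date TEXT, stock INTEGER)")
    con.executemany("INSERT INTO sample_table VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```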
You can try the query below using row_number():
select stock,min(transaction_date) as start_date,
case when min(transaction_date)=max(transaction_date) then null else max(transaction_date) end as end_date
from
(
select *,row_number() over(order by transaction_date)-
row_number() over(partition by stock order by transaction_date) as rn
from t1
)A group by stock,rn
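Try to use GROUP BY with MIN and MAX (note that this only works if each stock value occurs in a single consecutive run; in the sample data stock 10 appears in two separate runs, which this query merges into one row):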
SELECT
stock,
MIN(transaction_date) Start_date,
CASE WHEN COUNT(*)>1 THEN MAX(transaction_date) END end_date
FROM Sample_table
GROUP BY stock
ORDER BY stock
You can try with the LEAD and LAG functions as below; taking LEAD of the previous row's date gives the last date of each run:
select currentStockDate as startDate,
LEAD(prevStockDate,1) over(order by currentStockDate) as EndDate,
currentStock
from
(select *
from
(select
LAG(transaction_date,1) over(order by transaction_date) as prevStockDate,
transaction_date as currentStockDate,
LAG(stock,1) over(order by transaction_date) as prevStock,
stock as currentStock
from sample_table) as t
where (prevStock <> currentStock) or (prevStock is null)
) as t2
order by startDate
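A runnable sketch of the LAG/LEAD idea in SQLite via Python's sqlite3 (3.25+). Keeping only the rows where the stock differs from the previous row leaves the first date of each run; the previous-row date carried by the next kept row is the last date of the current run, so LEAD over that column yields the end date, and NULL (None in Python) for the final run, as in the expected result.

```python
import sqlite3

SAMPLE = [
    ("2018-12-01", 10), ("2018-12-02", 10), ("2018-12-03", 20), ("2018-12-04", 20),
    ("2018-12-05", 20), ("2018-12-06", 20), ("2018-12-07", 20), ("2018-12-08", 10),
    ("2018-12-09", 10), ("2018-12-10", 30),
]

QUERY = """
SELECT currentStockDate AS startDate,
       LEAD(prevStockDate) OVER (ORDER BY currentStockDate) AS endDate,
       currentStock
FROM (
    SELECT LAG(transaction_date) OVER (ORDER BY transaction_date) AS prevStockDate,
           transaction_date AS currentStockDate,
           LAG(stock) OVER (ORDER BY transaction_date) AS prevStock,
           stock AS currentStock
    FROM sample_table
) AS t
WHERE prevStock <> currentStock OR prevStock IS NULL   -- first row of each run
ORDER BY startDate
"""


def stock_runs_lead(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample_table (transaction_date TEXT, stock INTEGER)")
    con.executemany("INSERT INTO sample_table VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

The outer LEAD is evaluated after the WHERE filter, which is what makes it look at the next kept row rather than the next day.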

Finding the interval between dates in SQL Server

I have a table with more than 5 million rows of sales transactions. I would like to find the sum of the date intervals between each customer's three most recent purchases.
Suppose my table looks like this:
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result I'm looking for might look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the interval between the StartDate and ExpiryDate of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
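To check the LEAD version against the sample data, here is a sketch in SQLite via Python's sqlite3 (3.25+). SQLite has no DATEDIFF, so the day difference is computed as julianday(ServiceStartDate) - julianday(prevEnd); SUM skips the NULL gap of each customer's oldest row. It returns 805 days for A and 138 for B with 3 purchases each, and subtracting purchases gives the 802 and 135 from the question.

```python
import sqlite3

TAB = [
    ("A", "X1", "2010-01-01", "2010-06-01"),
    ("A", "X2", "2010-08-12", "2010-12-30"),
    ("B", "X4", "2011-10-01", "2012-01-15"),
    ("B", "X3", "2012-04-01", "2012-06-01"),
    ("B", "X7", "2012-08-01", "2013-10-01"),
    ("A", "X5", "2013-01-01", "2015-06-01"),
]

QUERY = """
WITH cte AS (
    SELECT tab.*,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY ServiceStartDate DESC) AS rn,
           LEAD(ServiceExpiryDate) OVER (PARTITION BY CustomerID
                                         ORDER BY ServiceStartDate DESC) AS prevEnd
    FROM tab
)
SELECT CustomerID,
       CAST(SUM(julianday(ServiceStartDate) - julianday(prevEnd)) AS INTEGER) AS IntervalDays,
       COUNT(*) AS purchases
FROM cte
WHERE rn <= 3                 -- three most recent purchases
GROUP BY CustomerID
ORDER BY CustomerID
"""


def interval_days(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (CustomerID TEXT, ProductID TEXT, "
                "ServiceStartDate TEXT, ServiceExpiryDate TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```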
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But since you only sum two differences, that doesn't seem right to me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from yourTable t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
WHERE dt.rn <= 3 -- only the three most recent purchases
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138