SQL Row Number with Grouping - sql

I am not sure whether this can be done. I want to number the rows so that consecutive rows (ordered by date) with the same company share a number, and the number increases each time the company changes. Below is the desired result I am attempting in SQL.
EmpNo Company StartDt                 EndDt                   Desired Result
----- ------- ----------------------- ----------------------- --------------
0003  C01     2021-01-01 00:00:00.000 2021-01-10 00:00:00.000 1
0003  C02     2021-01-11 00:00:00.000 2021-01-15 00:00:00.000 2
0003  C02     2021-01-16 00:00:00.000 2021-01-20 00:00:00.000 2
0003  C01     2021-01-21 00:00:00.000 2021-01-31 00:00:00.000 3

You can use lag() to detect when a company changes and then a cumulative sum:
select t.*,
       sum(case when company = prev_company then 0 else 1 end) over (partition by empno order by startdt) as desired_result
from (select t.*,
             lag(company) over (partition by empno order by startdt) as prev_company
      from t
     ) t
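To check the lag() and cumulative-sum query against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for the asker's database, and the table name t and the shortened date strings are assumptions made for the demo.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (empno TEXT, company TEXT, startdt TEXT, enddt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("0003", "C01", "2021-01-01", "2021-01-10"),
        ("0003", "C02", "2021-01-11", "2021-01-15"),
        ("0003", "C02", "2021-01-16", "2021-01-20"),
        ("0003", "C01", "2021-01-21", "2021-01-31"),
    ],
)

# The inner query fetches the previous row's company; the outer query adds 1
# to a running total every time the company differs from the previous one.
query = """
SELECT empno, company, startdt,
       SUM(CASE WHEN company = prev_company THEN 0 ELSE 1 END)
           OVER (PARTITION BY empno ORDER BY startdt) AS desired_result
FROM (SELECT t.*,
             LAG(company) OVER (PARTITION BY empno ORDER BY startdt) AS prev_company
      FROM t) AS t
ORDER BY empno, startdt
"""
company_groups = conn.execute(query).fetchall()
for row in company_groups:
    print(row)
```

The first row has no previous company, so the comparison with NULL falls through to the ELSE branch and the numbering starts at 1.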

Something like:
SELECT * FROM `<your-table>`
GROUP BY `Company`
ORDER BY `StartDt` DESC

Related

SQL calculate the day difference between current row and next row based on GroupId

Original table is as below, the table has been ordered by GroupId and Date in ascending order. I'd like to calculate the day difference between current and next row for the same GroupId:
GroupId Date
------- -----------------------
1       2022-02-20 00:00:00.000
1       2022-02-27 00:00:00.000
1       2022-03-02 00:00:00.000
2       2022-02-03 00:00:00.000
2       2022-02-17 00:00:00.000
The target output should be like this:
GroupId Date                    Previous_Date           Day_Difference
------- ----------------------- ----------------------- --------------
1       2022-02-20 00:00:00.000 null                    null
1       2022-02-27 00:00:00.000 2022-02-20 00:00:00.000 7
1       2022-03-02 00:00:00.000 2022-02-27 00:00:00.000 3
2       2022-02-03 00:00:00.000 null                    null
2       2022-02-17 00:00:00.000 2022-02-03 00:00:00.000 14
I have the script below, but it takes Previous_Date from the last row of the previous group and calculates the difference from it. I would like Previous_Date and Day_Difference to stay NULL for the first row of each group, as in the target table above.
Can someone please help?
My script is:
SELECT
GroupId,
[Date],
LAG([Date]) OVER (ORDER BY [Date]) AS Previous_Date,
DATEDIFF(DAY, LAG([Date]) OVER (ORDER BY [Date]), [Date]) AS Day_Difference
FROM
TestTable
ORDER BY
GroupId
You can use PARTITION BY GroupId in the OVER clause. PARTITION BY divides the query result set into partitions, and the window function is applied to each partition separately.
SELECT
GroupId,
[Date],
lag([Date]) OVER (PARTITION BY GroupId ORDER BY [Date]) as Previous_Date,
DATEDIFF(day, lag([Date]) OVER (PARTITION BY GroupId ORDER BY [Date]), [Date]) AS Day_Difference
FROM TestTable
order by GroupId
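The effect of PARTITION BY can be checked on the question's sample data with a small sketch using Python's sqlite3 module. SQLite is an assumption here, standing in for SQL Server: it has no DATEDIFF, so the day difference is computed with julianday() instead, while the LAG ... OVER (PARTITION BY ...) part is the same.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (GroupId INTEGER, [Date] TEXT)")
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?)",
    [
        (1, "2022-02-20"),
        (1, "2022-02-27"),
        (1, "2022-03-02"),
        (2, "2022-02-03"),
        (2, "2022-02-17"),
    ],
)

# LAG restarts in every GroupId partition, so the first row of each group
# has no previous date and both derived columns come back as NULL.
query = """
SELECT GroupId, [Date],
       LAG([Date]) OVER (PARTITION BY GroupId ORDER BY [Date]) AS Previous_Date,
       CAST(julianday([Date])
            - julianday(LAG([Date]) OVER (PARTITION BY GroupId ORDER BY [Date]))
            AS INTEGER) AS Day_Difference
FROM TestTable
ORDER BY GroupId, [Date]
"""
day_diffs = conn.execute(query).fetchall()
for row in day_diffs:
    print(row)
```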

Using RANK() in SQL as ID Number for Groups of Records

This is my table:
employeeid workdate workstatus
----------- ----------------------- ----------
1 2020-09-01 00:00:00.000 ON
1 2020-09-02 00:00:00.000 ON
1 2020-09-03 00:00:00.000 ON
1 2020-09-04 00:00:00.000 OFF
1 2020-09-05 00:00:00.000 OFF
2 2020-09-01 00:00:00.000 ON
2 2020-09-02 00:00:00.000 ON
2 2020-09-03 00:00:00.000 OFF
2 2020-09-04 00:00:00.000 OFF
2 2020-09-05 00:00:00.000 ON
And I am executing this query:
select employeeid, workdate, workstatus, rank() over(partition by employeeid, workstatus order by workdate) as cycle
from #workstatus
order by 1, 2
With this result:
employeeid workdate workstatus cycle
----------- ----------------------- ---------- --------------------
1 2020-09-01 00:00:00.000 ON 1
1 2020-09-02 00:00:00.000 ON 2
1 2020-09-03 00:00:00.000 ON 3
1 2020-09-04 00:00:00.000 OFF 1
1 2020-09-05 00:00:00.000 OFF 2
2 2020-09-01 00:00:00.000 ON 1
2 2020-09-02 00:00:00.000 ON 2
2 2020-09-03 00:00:00.000 OFF 1
2 2020-09-04 00:00:00.000 OFF 2
2 2020-09-05 00:00:00.000 ON 3
My goal is to have the "cycle" of on/off work be identified by a unique number per employee. So the three ON days for employee 1 would be cycle 1, then the two OFF days would be cycle 2.
The first two ON days for employee 2 would be cycle 1, then the two OFF days would be cycle 2, and the final ON day would be cycle 3.
I'm not sure if I can use RANK() for this, or if there is a better solution. Thanks!
This is a type of gaps-and-islands problem. For this version, use lag() and a cumulative sum:
select t.*,
       sum(case when prev_ws = workstatus then 0 else 1 end) over
           (partition by employeeid order by workdate) as ranking
from (select t.*,
             lag(workstatus) over (partition by employeeid order by workdate) as prev_ws
      from t
     ) t;
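As a quick check of the lag() and cumulative-sum approach, the sketch below runs it on the question's ten rows using Python's sqlite3 module. SQLite (3.25 or later) and the table name t are assumptions for the demo; the output column is named cycle here to match the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (employeeid INTEGER, workdate TEXT, workstatus TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2020-09-01", "ON"), (1, "2020-09-02", "ON"), (1, "2020-09-03", "ON"),
        (1, "2020-09-04", "OFF"), (1, "2020-09-05", "OFF"),
        (2, "2020-09-01", "ON"), (2, "2020-09-02", "ON"),
        (2, "2020-09-03", "OFF"), (2, "2020-09-04", "OFF"),
        (2, "2020-09-05", "ON"),
    ],
)

# Each change of workstatus (including the first row, where the previous
# status is NULL) adds 1 to the running total for that employee.
query = """
SELECT employeeid, workdate, workstatus,
       SUM(CASE WHEN prev_ws = workstatus THEN 0 ELSE 1 END)
           OVER (PARTITION BY employeeid ORDER BY workdate) AS cycle
FROM (SELECT t.*,
             LAG(workstatus) OVER (PARTITION BY employeeid ORDER BY workdate) AS prev_ws
      FROM t) AS t
ORDER BY employeeid, workdate
"""
cycles = conn.execute(query).fetchall()
for row in cycles:
    print(row)
```

Employee 2's final ON day gets cycle 3 rather than rejoining cycle 1, which is the behaviour the question asks for.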
Use dense_rank instead of rank
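You can use window functions to solve this gaps-and-islands problem. One approach is to take the difference between two row numbers to build groups of "adjacent" records: rn1 - rn2 is constant within each run of consecutive rows that share a workstatus, so (employeeid, workstatus, rn1 - rn2) identifies a group. As written, the outer row_number() numbers the rows within each group: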
select employeeid, workdate, workstatus,
row_number() over(partition by employeeid, workstatus, rn1 - rn2 order by workdate) cycle
from (
select t.*,
row_number() over(partition by employeeid order by workdate) rn1,
row_number() over(partition by employeeid, workstatus order by workdate) rn2
from mytable t
) t

how to make table 1 look like table 2? I tried some group by functions but I don't know how to compare one record vs the other

I want a select query that makes table 1 look like table 2. That is, I want the records that match on mem_no, join_date, end_date, and product_id but differ in Indicator. The indicator combinations I am looking for are (PR and SN) and (SR and SN): records with the same mem_no, join_date, end_date, and product_id where one row has 'PR' as indicator and the next has 'SN', or one row has 'SR' and the next has 'SN'.
Table 1:
mem_no join_date end_date product_id Indicator
1 2/11/2018 12/12/2018 1 PR
2 2/11/2018 12/12/2018 1 PR
2 2/11/2018 12/12/2018 1 SN
3 3/5/2017 12/12/2018 8 SR
3 3/5/2017 12/12/2018 8 SN
4 3/5/2017 12/12/2018 86 PR
4 3/5/2017 12/12/2018 86 PR
4 3/5/2017 12/12/2018 87 SR
4 3/5/2017 12/12/2018 87 SN
Table 2:
mem_no join_date end_date product_id Indicator
2 2/11/2018 12/12/2018 1 PR
2 2/11/2018 12/12/2018 1 SN
4 3/5/2017 12/12/2018 87 SR
4 3/5/2017 12/12/2018 87 SN
Perhaps you want EXISTS:
select t1.*
from table1 t1
where t1.Indicator in ('PR', 'SN', 'SR') and
      exists (select 1
              from table1 t11
              where t11.mem_no = t1.mem_no and t11.join_date = t1.join_date and t11.end_date = t1.end_date and
                    t11.product_id = t1.product_id and t11.Indicator <> t1.Indicator
             );
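To see what the EXISTS approach returns on Table 1, here is a minimal sketch using Python's sqlite3 module. SQLite as the engine and the alias names t1 and t2 are assumptions; the dates are kept as the plain strings from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (mem_no INTEGER, join_date TEXT, end_date TEXT,"
    " product_id INTEGER, Indicator TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2/11/2018", "12/12/2018", 1, "PR"),
        (2, "2/11/2018", "12/12/2018", 1, "PR"),
        (2, "2/11/2018", "12/12/2018", 1, "SN"),
        (3, "3/5/2017", "12/12/2018", 8, "SR"),
        (3, "3/5/2017", "12/12/2018", 8, "SN"),
        (4, "3/5/2017", "12/12/2018", 86, "PR"),
        (4, "3/5/2017", "12/12/2018", 86, "PR"),
        (4, "3/5/2017", "12/12/2018", 87, "SR"),
        (4, "3/5/2017", "12/12/2018", 87, "SN"),
    ],
)

# Keep a row only if another row has the same key columns but a different Indicator.
query = """
SELECT t1.mem_no, t1.product_id, t1.Indicator
FROM table1 AS t1
WHERE t1.Indicator IN ('PR', 'SN', 'SR')
  AND EXISTS (SELECT 1
              FROM table1 AS t2
              WHERE t2.mem_no = t1.mem_no
                AND t2.join_date = t1.join_date
                AND t2.end_date = t1.end_date
                AND t2.product_id = t1.product_id
                AND t2.Indicator <> t1.Indicator)
ORDER BY t1.mem_no, t1.product_id, t1.Indicator
"""
matches = conn.execute(query).fetchall()
for row in matches:
    print(row)
```

The single row for mem_no 1 and the duplicate PR rows for product 86 are dropped. The SR/SN pair for mem_no 3 is returned, even though Table 2 in the question omits it.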
Analytic functions are your friends
with data as (
select
mem_no, join_date, end_date, product_id, Indicator,
count(*) over (partition by mem_no, join_date, end_date, product_id) as cnt,
listagg(Indicator, ',') WITHIN GROUP (ORDER BY Indicator)
over (partition by mem_no, join_date, end_date, product_id) Indicator_lst
from tab)
select mem_no, join_date, end_date, product_id, Indicator
from data
where cnt = 2 and
Indicator_lst in ('PR,SN','SN,SR');
The CTE counts the rows that share the same mem_no, join_date, end_date, and product_id, and listagg concatenates their indicators in sorted order.
The main query filters on the two combinations you want.
On your sample data you get:
MEM_NO JOIN_DATE END_DATE PRODUCT_ID INDICATOR
---------- ------------------- ------------------- ---------- ---------
2 02-11-2018 00:00:00 12-12-2018 00:00:00 1 PR
2 02-11-2018 00:00:00 12-12-2018 00:00:00 1 SN
3 03-05-2017 00:00:00 12-12-2018 00:00:00 8 SN
3 03-05-2017 00:00:00 12-12-2018 00:00:00 8 SR
4 03-05-2017 00:00:00 12-12-2018 00:00:00 87 SN
4 03-05-2017 00:00:00 12-12-2018 00:00:00 87 SR

SQL Dates Selection

I have an OPL_Dates table with start and end dates, as below:
dbo.OPL_Dates
ID Start_date End_date
--------------------------------------
12345 1975-01-01 2001-12-31
12345 1989-01-01 2004-12-31
12345 2005-01-01 NULL
12345 2007-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2013-02-07 NULL
12377 2010-01-01 2012-01-01
12489 2011-12-31 NULL
12489 2012-03-01 2012-04-01
The Output I am looking for is:
ID Start_date End_date
-------------------------------------
12345 1975-01-01 2004-12-31
12345 2005-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2010-01-01 2012-01-01
12377 2013-02-07 NULL
12489 2011-12-31 NULL
Basically, I want to show the gaps between the OPL periods (if any); otherwise I need the min of Start_date and the max of End_date for a particular ID. NULL means an open-ended date, which can be converted to "9999-12-31".
The following pretty much does what you want:
with p as (
select v.*, sum(inc) over (partition by v.id order by v.dte) as running_inc
from t cross apply
(values (id, start_date, 1),
(id, coalesce(end_date, '2999-12-31'), -1)
) v(id, dte, inc)
)
select id, min(dte), max(dte)
from (select p.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by id order by dte desc) as grp
from p
) p
group by id, grp;
Note that it changes the "infinite" end date from NULL to 2999-12-31. This is a convenience, because NULL orders first in SQL Server ascending sorts.
What is this doing? It is unpivoting the dates into a single column, with a 1/-1 flag (inc) indicating whether the record is a start or end. The running sum of this flag then indicates the groups that should be combined. When the running sum is 0, then a group has ended. To include the end date in the right group, a reverse running sum is needed -- but that's a detail.
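The unpivot-and-running-sum idea can be traced on the question's data with the sketch below, which uses Python's sqlite3 module. SQLite is an assumption and has no CROSS APPLY, so the unpivot is written as a UNION ALL; the CTE names events and g are hypothetical, and the two running sums follow the query above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OPL_Dates (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO OPL_Dates VALUES (?, ?, ?)",
    [
        (12345, "1975-01-01", "2001-12-31"),
        (12345, "1989-01-01", "2004-12-31"),
        (12345, "2005-01-01", None),
        (12345, "2007-01-01", None),
        (12377, "2009-06-01", "2009-12-31"),
        (12377, "2013-02-07", None),
        (12377, "2010-01-01", "2012-01-01"),
        (12489, "2011-12-31", None),
        (12489, "2012-03-01", "2012-04-01"),
    ],
)

query = """
WITH events AS (            -- unpivot: +1 for each start, -1 for each end
    SELECT id, start_date AS dte, 1 AS inc FROM OPL_Dates
    UNION ALL
    SELECT id, COALESCE(end_date, '2999-12-31'), -1 FROM OPL_Dates
),
p AS (                      -- running_inc = number of periods open at dte
    SELECT id, dte, inc,
           SUM(inc) OVER (PARTITION BY id ORDER BY dte) AS running_inc
    FROM events
),
g AS (                      -- reverse running count of "all closed" points
    SELECT p.*,
           SUM(CASE WHEN running_inc = 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY dte DESC) AS grp
    FROM p
)
SELECT id, MIN(dte) AS start_date, MAX(dte) AS end_date
FROM g
GROUP BY id, grp
ORDER BY id, start_date
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

Open-ended periods come back with the 2999-12-31 placeholder rather than NULL, as noted above.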

Calculate discount between weeks

I have a table containing product price data, like that:
ProductId RecordDate Price
46 2015-01-17 14:35:05.533 112.00
47 2015-01-17 14:35:05.533 88.00
45 2015-01-17 14:35:05.533 134.00
I have been able to group data by week and product, with this query:
SET DATEFIRST 1;
SELECT DATEADD(WEEK, DATEDIFF(WEEK, 0, [RecordDate]), 0) AS [Week], ProductId, MIN([Price]) AS [MinimumPrice]
FROM [dbo].[ProductPriceHistory]
GROUP BY DATEADD(WEEK, DATEDIFF(WEEK, 0, [RecordDate]), 0), ProductId
ORDER BY ProductId, [Week]
obtaining this result:
Week Product Price
2015-01-12 00:00:00.000 1 99.00
2015-01-19 00:00:00.000 1 98.00
2015-01-26 00:00:00.000 1 95.00
2015-02-02 00:00:00.000 1 95.00
2015-02-09 00:00:00.000 1 95.00
2015-02-16 00:00:00.000 1 95.00
2015-02-23 00:00:00.000 1 80.00
2015-03-02 00:00:00.000 1 97.00
2015-03-09 00:00:00.000 1 85.00
2015-01-12 00:00:00.000 2 232.00
2015-01-19 00:00:00.000 2 233.00
2015-01-26 00:00:00.000 2 194.00
2015-02-02 00:00:00.000 2 194.00
2015-02-09 00:00:00.000 2 199.00
2015-02-16 00:00:00.000 2 199.00
2015-02-23 00:00:00.000 2 199.00
2015-03-02 00:00:00.000 2 214.00
Now for each product I'd like to get the difference between the last two week values, so that I can calculate the discount. I don't know how to write this as a SQL Query!
EDIT:
Expected output would be something like that:
Product Price
1 -12.00
2 15.00
Thank you!
Since you are using SQL Server 2014, you can use the LAG or LEAD window function to do this.
Generate a row number to find the last two weeks for each product.
;WITH cte
     AS (SELECT *,
                ROW_NUMBER() OVER (PARTITION BY product ORDER BY weeks DESC) AS rn
         FROM   Yourtable)
SELECT product,
       price
FROM   (SELECT product,
               price - LEAD(price) OVER (PARTITION BY product ORDER BY rn) AS price
        FROM   cte a
        WHERE  a.rn <= 2) A
WHERE  price IS NOT NULL
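To confirm that the row-number and LEAD combination gives the expected -12.00 and 15.00, here is a minimal sketch using Python's sqlite3 module. SQLite is an assumption standing in for SQL Server, the table holds only the last three weekly minimum prices per product from the question, and the output column is named diff for clarity.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Yourtable (weeks TEXT, product INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO Yourtable VALUES (?, ?, ?)",
    [
        ("2015-02-23", 1, 80.00), ("2015-03-02", 1, 97.00), ("2015-03-09", 1, 85.00),
        ("2015-02-16", 2, 199.00), ("2015-02-23", 2, 199.00), ("2015-03-02", 2, 214.00),
    ],
)

# rn = 1 is the latest week and rn = 2 the week before it. LEAD over rn reads
# the previous week's price, so diff is latest minus previous.
query = """
WITH cte AS (
    SELECT product, weeks, price,
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY weeks DESC) AS rn
    FROM Yourtable
)
SELECT product, diff
FROM (SELECT product,
             price - LEAD(price) OVER (PARTITION BY product ORDER BY rn) AS diff
      FROM cte
      WHERE rn <= 2) AS a
WHERE diff IS NOT NULL
ORDER BY product
"""
discounts = conn.execute(query).fetchall()
for row in discounts:
    print(row)
```

The WHERE rn <= 2 filter runs before LEAD is evaluated, so the rn = 2 row has no following row, gets NULL, and is dropped by the outer filter.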
A traditional solution that can also be used on versions before SQL Server 2012:
;WITH cte
AS (SELECT *,
Row_number()OVER(partition BY product
ORDER BY weeks DESC)rn
FROM Yourtable)
SELECT a.Product,
b.Price - a.Price
FROM cte a
LEFT JOIN cte b
ON a.Product = b.Product
AND a.rn = b.rn + 1
WHERE a.rn <= 2
AND b.Product IS NOT NULL