I'd like to create and populate the No. of Entries in Curr.Status field shown below using SQL (SQL Server).
ID Sequence Prev.Status Curr.Status No. of Entries in Curr.Status
9-9999-9 1 Status D Status A 1
9-9999-9 2 Status A Status A 2
9-9999-9 3 Status A Status A 3
9-9999-9 4 Status A Status A 4
9-9999-9 5 Status A Status B 1
9-9999-9 6 Status B Status B 2
9-9999-9 7 Status B Status B 3
9-9999-9 8 Status B Status A 1
9-9999-9 9 Status A Status A 2
9-9999-9 10 Status A Status C 1
9-9999-9 11 Status C Status C 2
Is there a quick way, using something like row_number() (which alone doesn't appear to be sufficient), to create the field I'm looking for?
Thanks!
This appears to be a Gaps and Islands problem. There are plenty of examples out there on how to achieve this; however, here is one:
WITH VTE AS(
SELECT *
FROM (VALUES('9-9999-9',1 ,'Status D','Status A'),
('9-9999-9',2 ,'Status A','Status A'),
('9-9999-9',3 ,'Status A','Status A'),
('9-9999-9',4 ,'Status A','Status A'),
('9-9999-9',5 ,'Status A','Status B'),
('9-9999-9',6 ,'Status B','Status B'),
('9-9999-9',7 ,'Status B','Status B'),
('9-9999-9',8 ,'Status B','Status A'),
('9-9999-9',9 ,'Status A','Status A'),
('9-9999-9',10,'Status A','Status C'),
('9-9999-9',11,'Status C','Status C')) V(ID, Sequence, PrevStatus,CurrStatus)),
CTE AS(
SELECT ID,
[Sequence],
PrevStatus,
CurrStatus,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Sequence]) -
ROW_NUMBER() OVER (PARTITION BY ID,CurrStatus ORDER BY [Sequence]) AS Grp
FROM VTE V)
SELECT ID,
[Sequence],
PrevStatus,
CurrStatus,
ROW_NUMBER() OVER (PARTITION BY ID, CurrStatus, Grp ORDER BY [Sequence]) AS Entries
FROM CTE;
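One thing to watch with the row-number difference: the Grp value is only guaranteed to be unique within a given CurrStatus, so the final numbering is safest when it is partitioned by ID, CurrStatus and Grp together. The sketch below is a plain-Python imitation of the two ROW_NUMBER() calls (the helper names island_keys and number_within are made up for illustration); it shows a three-row case, A, B, A, where Grp alone collides.

```python
from collections import defaultdict

def island_keys(statuses):
    """Return (status, grp) per row, where grp is the overall row number
    minus the row number within that status (both ordered by sequence)."""
    per_status = defaultdict(int)
    keys = []
    for overall, status in enumerate(statuses, start=1):
        per_status[status] += 1
        keys.append((status, overall - per_status[status]))
    return keys

def number_within(partition_keys):
    """Imitate ROW_NUMBER() OVER (PARTITION BY key ORDER BY sequence)."""
    seen = defaultdict(int)
    numbers = []
    for key in partition_keys:
        seen[key] += 1
        numbers.append(seen[key])
    return numbers

# Hypothetical three-row sequence: A, B, A.
keys = island_keys(["A", "B", "A"])
grp_only = number_within([grp for _, grp in keys])  # partition by Grp alone
status_and_grp = number_within(keys)                # partition by status and Grp

# The status sequence from the question: A x4, B x3, A x2, C x2.
question_entries = number_within(island_keys(list("AAAABBBAACC")))

print(keys, grp_only, status_and_grp, question_entries)
```

In the A, B, A case the second and third rows both get Grp = 1, so numbering by Grp alone counts them as one run, while numbering by (status, Grp) restarts at 1 for each row as expected.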
You can mark the rows where the status changes using the LAG function, and use SUM() OVER () to assign a unique number to each group. Numbering within each group is then trivial:
CREATE TABLE #t (ID VARCHAR(100), Sequence INT, PrevStatus VARCHAR(100), CurrStatus VARCHAR(100));
INSERT INTO #t VALUES
('9-9999-9', 1, 'Status D', 'Status A'),
('9-9999-9', 2, 'Status A', 'Status A'),
('9-9999-9', 3, 'Status A', 'Status A'),
('9-9999-9', 4, 'Status A', 'Status A'),
('9-9999-9', 5, 'Status A', 'Status B'),
('9-9999-9', 6, 'Status B', 'Status B'),
('9-9999-9', 7, 'Status B', 'Status B'),
('9-9999-9', 8, 'Status B', 'Status A'),
('9-9999-9', 9, 'Status A', 'Status A'),
('9-9999-9', 10, 'Status A', 'Status C'),
('9-9999-9', 11, 'Status C', 'Status C');
WITH cte1 AS (
SELECT *, CASE WHEN LAG(CurrStatus) OVER(ORDER BY Sequence) = CurrStatus THEN 0 ELSE 1 END AS chg
FROM #t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY Sequence) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY Sequence) AS SeqInGroup
FROM cte2
)
SELECT *
FROM cte3
ORDER BY Sequence
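To try the LAG plus running SUM idea without a SQL Server instance, here is a minimal sketch using Python's standard sqlite3 module. It assumes a SQLite build with window functions (3.25 or later), uses a made-up table name t, and adds PARTITION BY ID on the assumption that the real table holds more than one ID; none of that comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID TEXT, Sequence INTEGER, CurrStatus TEXT)")

# CurrStatus values from the question, in Sequence order.
statuses = ["A", "A", "A", "A", "B", "B", "B", "A", "A", "C", "C"]
conn.executemany(
    "INSERT INTO t VALUES ('9-9999-9', ?, ?)",
    [(i, "Status " + s) for i, s in enumerate(statuses, start=1)],
)

query = """
WITH cte1 AS (
    -- chg = 1 on the first row of every run of identical statuses
    SELECT *, CASE WHEN LAG(CurrStatus) OVER (PARTITION BY ID ORDER BY Sequence) = CurrStatus
                   THEN 0 ELSE 1 END AS chg
    FROM t
), cte2 AS (
    -- running total of the change flags gives one number per run
    SELECT *, SUM(chg) OVER (PARTITION BY ID ORDER BY Sequence) AS grp
    FROM cte1
)
SELECT Sequence, chg, grp,
       ROW_NUMBER() OVER (PARTITION BY ID, grp ORDER BY Sequence) AS SeqInGroup
FROM cte2
ORDER BY Sequence
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The first row has no previous status, so LAG returns NULL, the comparison is not true, and the CASE falls through to 1, which starts the first group.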
If Sequence is a gap-free identity column then you can do:
select t.*,
row_number() over (partition by [Curr.Status], (Sequence - seq) order by Sequence) as [No. of Entries in Curr.Status]
from (select t.*,
row_number() over (partition by [Curr.Status] order by Sequence) as seq
from your_table t
) t;
Otherwise you need to generate two row numbers:
select t.*,
row_number() over (partition by id, [Curr.Status], (seq1 - seq2) order by Sequence) as [No. of Entries in Curr.Status]
from (select t.*,
row_number() over (partition by id order by Sequence) as seq1,
row_number() over (partition by id, [Curr.Status] order by Sequence) as seq2
from your_table t
) t;
I have to build the Exceptions Report to catch Overlaps or Gaps. The dataset has clients and assigned supervisors with start and end dates of supervision.
CREATE TABLE Report
(Id INT, ClientId INT, ClientName VARCHAR(30), SupervisorId INT, SupervisorName
VARCHAR(30), SupervisionStartDate DATE, SupervisionEndDate DATE);
INSERT INTO Report
VALUES
(1, 22, 'Client A', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(2, 22, 'Client A', 44, 'Supervisor B', '2022-05-01', '2022-08-23'),
(3, 22, 'Client A', 55, 'Supervisor C', '2022-08-24', NULL),
(4, 23, 'Client B', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(5, 23, 'Client B', 44, 'Supervisor B', '2022-04-30', '2022-08-23'),
(6, 24, 'Client C', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(7, 24, 'Client C', 44, 'Supervisor B', '2022-05-01', '2022-08-23'),
(8, 24, 'Client C', 55, 'Supervisor C', '2022-07-22', '2022-10-25'),
(9, 25, 'Client D', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(10, 25, 'Client D', 44, 'Supervisor B', '2022-07-23', NULL)
SELECT * FROM Report
'Valid' status should be assigned to all rows associated with Client if no Gaps or Overlaps present, for example:
Client A has 3 Supervisors - Supervisor A (01/01/2022 - 04/30/2022), Supervisor B (05/01/2022 - 08/23/2022) and Supervisor C (08/24/2022 - Present).
'Issue Found' status should be assigned to all rows associated with Client if any Gaps or Overlaps present, for example:
Client B has 2 Supervisors - Supervisor A (01/01/2022 - 04/30/2022) and Supervisor B (04/30/2022 - 08/23/2022).
Client C has 3 Supervisors - Supervisor A (01/01/2022 - 04/30/2022), Supervisor B (05/01/2022 - 08/23/2022) and Supervisor C (07/22/2022 - 10/25/2022).
These are examples of an Overlap.
Client D has 2 Supervisors - Supervisor A (01/01/2022 - 04/30/2022) and Supervisor B (07/23/2022 - Present).
This is an example of a Gap.
The Output I need:
I added some columns that might be helpful, but I don't know how to accomplish the main goal.
However, I noticed that if the first record in the [Diff Between PreviousEndDate And SupervisionStartDate] column is NULL and all the others = 1, then the client is Valid.
SELECT
Report.*,
ROW_NUMBER() OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS [ClientRecordNumber],
COUNT(*) OVER (PARTITION BY Report.ClientId) AS [TotalNumberOfClientRecords],
DATEDIFF(DAY, Report.SupervisionStartDate, Report.SupervisionEndDate) AS SupervisionAging,
LAG(Report.SupervisionStartDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS PreviousStartDate,
LAG(Report.SupervisionEndDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS PreviousEndDate,
LEAD(Report.SupervisionStartDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS NextStartDate,
LEAD(Report.SupervisionEndDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS NextEndDate,
DATEDIFF(dd, LAG(Report.SupervisionEndDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)), Report.SupervisionStartDate) AS [Diff Between PreviousEndDate And SupervisionStartDate]
FROM Report
One approach:
Use the additional LAG parameters to provide a default value for when it's null, and make that default a valid value, i.e. 1 day before the StartDate.
Use a CTE to calculate the difference in days between the StartDate and previous EndDate.
Then use a second CTE to determine for any given client whether there is an issue.
Finally display your desired results.
WITH cte1 AS (
SELECT
R.*
, DATEDIFF(day, LAG(R.SupervisionEndDate,1,dateadd(day,-1,R.SupervisionStartDate)) OVER (PARTITION BY R.ClientId ORDER BY COALESCE(R.SupervisionStartDate, R.SupervisionEndDate)), R.SupervisionStartDate) AS Diff
FROM Report R
), cte2 AS (
SELECT *
, MAX(COALESCE(Diff,0)) OVER (PARTITION BY ClientId) MaxDiff
, MIN(COALESCE(Diff,0)) OVER (PARTITION BY ClientId) MinDiff
FROM cte1
)
SELECT Id, ClientId, ClientName, SupervisorId, SupervisorName, SupervisionStartDate, SupervisionEndDate
--, Diff, MaxDiff, MinDiff -- Debug
, CASE WHEN MaxDiff = 1 AND MinDiff = 1 THEN 'Valid' ELSE 'Issue Found' END [Status]
FROM cte2
ORDER BY Id;
Notes:
Use the full name of the datepart you are diff-ing; it's much clearer and easier to maintain.
Use short, relevant, table aliases to reduce the code.
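As a cross-check of the rule the query encodes (every period must start exactly one day after the previous one ends), here is a small Python sketch run against the sample rows from the question. The function name client_status is made up, and the sketch assumes every row has a start date, which holds for the sample data but is not guaranteed in general.

```python
from datetime import date, timedelta

# (ClientName, SupervisionStartDate, SupervisionEndDate) from the question's sample data.
report = [
    ("Client A", date(2022, 1, 1), date(2022, 4, 30)),
    ("Client A", date(2022, 5, 1), date(2022, 8, 23)),
    ("Client A", date(2022, 8, 24), None),
    ("Client B", date(2022, 1, 1), date(2022, 4, 30)),
    ("Client B", date(2022, 4, 30), date(2022, 8, 23)),
    ("Client C", date(2022, 1, 1), date(2022, 4, 30)),
    ("Client C", date(2022, 5, 1), date(2022, 8, 23)),
    ("Client C", date(2022, 7, 22), date(2022, 10, 25)),
    ("Client D", date(2022, 1, 1), date(2022, 4, 30)),
    ("Client D", date(2022, 7, 23), None),
]

def client_status(periods):
    """periods: list of (start, end) for one client; end may be None (open-ended)."""
    ordered = sorted(periods, key=lambda p: p[0])
    for (_, prev_end), (start, _) in zip(ordered, ordered[1:]):
        # An open-ended period followed by another one is treated as an issue,
        # in the same spirit as COALESCE(Diff, 0) in the query.
        if prev_end is None or start - prev_end != timedelta(days=1):
            return "Issue Found"
    return "Valid"

statuses = {}
for client in sorted({name for name, _, _ in report}):
    periods = [(s, e) for name, s, e in report if name == client]
    statuses[client] = client_status(periods)

print(statuses)
```

Client B fails with a difference of 0 days (overlap), Client C with a negative difference (overlap), and Client D with a difference well above 1 day (gap).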
I am trying to find all records that exist within a date range prior to an event occurring. In my table below, I want to pull all records that are 3 days or less from when the switch field changes from 0 to 1, ordered by date, partitioned by product. My solution does not work: it includes the first record, which should be skipped because it is outside the 3-day window. I am scanning a table with millions of records; is there a way to reduce the complexity/cost while maintaining my desired results?
http://sqlfiddle.com/#!18/eebe7
CREATE TABLE productlist
([product] varchar(13), [switch] int, [switchday] date)
;
INSERT INTO productlist
([product], [switch], [switchday])
VALUES
('a', 0, '2019-12-28'),
('a', 0, '2020-01-02'),
('a', 1, '2020-01-03'),
('a', 0, '2020-01-06'),
('a', 0, '2020-01-07'),
('a', 1, '2020-01-09'),
('a', 1, '2020-01-10'),
('a', 1, '2020-01-11'),
('b', 1, '2020-01-01'),
('b', 0, '2020-01-02'),
('b', 0, '2020-01-03'),
('b', 1, '2020-01-04')
;
my solution:
with switches as (
SELECT
*,
case when lead(switch) over (partition by product order by switchday)=1
and switch=0 then 'first day switch'
else null end as leadswitch
from productlist
),
switchdays as (
select * from switches
where leadswitch='first day switch'
)
select pl.*
,'lead'
from productlist pl
left join switchdays ss
on pl.product=ss.product
and pl.switchday = ss.switchday
and datediff(day, pl.switchday, ss.switchday)<=3
where pl.switch=0
desired output, capturing records that occur within 3 days of a switch going from 0 to 1, for each product, ordered by date:
product switch switchday
a 0 2020-01-02 lead
a 0 2020-01-06 lead
a 0 2020-01-07 lead
b 0 2020-01-02 lead
b 0 2020-01-03 lead
If I understand correctly, you can just use lead() twice:
select pl.*
from (select pl.*,
lead(switch) over (partition by product order by switchday) as next_switch_1,
lead(switch, 2) over (partition by product order by switchday) as next_switch_2
from productlist pl
) pl
where switch = 0 and
1 in (next_switch_1, next_switch_2);
EDIT (based on comment):
select pl.*
from (select pl.*,
min(case when switch = 1 then switchday end) over (partition by product order by switchday desc) as next_switch_1_day
from productlist pl
) pl
where switch = 0 and
next_switch_1_day <= dateadd(day, 3, switchday);
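To sanity-check the date-window logic against the desired output, here is a small Python sketch over the question's sample rows. It assumes "3 days or less" means the next switch = 1 date for the same product is at most 3 calendar days after the row's date; the function name rows_before_switch is made up, and the nested scan is quadratic, so it is only meant for verifying results, not for a table with millions of rows.

```python
from datetime import date, timedelta

# (product, switch, switchday) rows from the question.
productlist = [
    ("a", 0, date(2019, 12, 28)),
    ("a", 0, date(2020, 1, 2)),
    ("a", 1, date(2020, 1, 3)),
    ("a", 0, date(2020, 1, 6)),
    ("a", 0, date(2020, 1, 7)),
    ("a", 1, date(2020, 1, 9)),
    ("a", 1, date(2020, 1, 10)),
    ("a", 1, date(2020, 1, 11)),
    ("b", 1, date(2020, 1, 1)),
    ("b", 0, date(2020, 1, 2)),
    ("b", 0, date(2020, 1, 3)),
    ("b", 1, date(2020, 1, 4)),
]

def rows_before_switch(rows, window_days=3):
    """Return switch = 0 rows whose next switch = 1 day (same product)
    is at most window_days later."""
    kept = []
    for product, switch, day in rows:
        if switch != 0:
            continue
        # Dates on or after this row where the same product has switch = 1.
        later_on = [d for p, s, d in rows if p == product and s == 1 and d >= day]
        if later_on and min(later_on) - day <= timedelta(days=window_days):
            kept.append((product, switch, day))
    return sorted(kept)

lead_rows = rows_before_switch(productlist)
for row in lead_rows:
    print(row)
```

The 2019-12-28 row for product a is dropped because its next switch date, 2020-01-03, is 6 days away.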
Below is my example table:
Create Table #A
(
Time nvarchar(70),
Trader nvarchar(30),
Product nvarchar(30),
[Buy/Sell] nvarchar(30)
)
Insert into #A Values
('2019-03-01T14:22:29z', 'Jhon', 'Apple', 'Buy'),
('2019-03-01T12:35:09z', 'Jhon', 'Orange', 'Sell'),
('2019-03-01T12:35:09z', 'Mary', 'Milk', 'Buy'),
('2019-03-01T12:35:10z', 'Susan', 'Milk', 'Buy'),
('2019-03-01T12:35:23z', 'Tom', 'Bread', 'Sell'),
('2019-03-01T14:15:52z', 'Jhon', 'Apple', 'Sell'),
('2019-03-01T14:15:53z', 'Tom', 'Orange', 'Sell'),
('2019-03-01T14:22:33z', 'Mary', 'Apple', 'Buy'),
('2019-03-01T14:22:37z', 'Mary', 'Orange', 'Sell'),
('2019-03-01T12:37:41z', 'Susan', 'Milk', 'Buy'),
('2019-03-01T12:37:41z', 'Susan', 'Milk', 'Buy')
Select * from #A
Basically, I want to find cases where the same Trader buys and sells the same Product within 3 minutes.
Below is what I've tried, but it is not correct and does not work:
;With DateTimeTbl
as
(
select SUBSTRING(a.Time,1,10) date, SUBSTRING(a.Time,12,8) Time1, a.*
-- lead(Time) over(order by time) cnt
from #A a
),
DataTbl
as
(
Select d.*, row_number() over(Partition by d.Trader,d.product order by d.time1) CntSrs
from DateTimeTbl d
--where [buy/sell] = 'Sell'
)
Select lag(Time1) over(order by time) cnt, d.* from DataTbl d where CntSrs>1
I would suggest lead(). To get the first record:
select a.*
from (select a.*,
lead(time) over (partition by trader, product order by time) as next_time,
lead(buy_sell) over (partition by trader, product order by time) as next_buy_sell
from #a a
) a
where next_time < dateadd(minute, 3, time) and
buy_sell <> next_buy_sell;
Note: This assumes that buy_sell takes on only two values, which is consistent with your sample data.
Here is a db<>fiddle. Note that it fixes the data types to be appropriate (for the time column) and renames the last column so it does not need to be escaped.
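The lead() comparison can be imitated in a few lines of Python to see which rows it keeps. This is only a sketch: the function name quick_reversals and the six sample trades (traders T1 to T3) are hypothetical, chosen so that exactly one pair falls inside the 3-minute window, and times are assumed to be real datetime values rather than strings.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical trades: (time, trader, product, side).
trades = [
    (datetime(2019, 3, 1, 12, 0, 0), "T1", "Apple", "Buy"),
    (datetime(2019, 3, 1, 12, 2, 0), "T1", "Apple", "Sell"),  # 2 minutes later, opposite side
    (datetime(2019, 3, 1, 12, 0, 0), "T2", "Milk", "Buy"),
    (datetime(2019, 3, 1, 12, 5, 0), "T2", "Milk", "Sell"),   # 5 minutes later, too late
    (datetime(2019, 3, 1, 12, 0, 0), "T3", "Bread", "Buy"),
    (datetime(2019, 3, 1, 12, 1, 0), "T3", "Bread", "Buy"),   # same side, not a reversal
]

def quick_reversals(rows, window=timedelta(minutes=3)):
    """Return the first row of each consecutive pair (same trader and product)
    that switches side in less than `window`."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    hits = []
    for _, group in groupby(ordered, key=lambda r: (r[1], r[2])):
        group = list(group)
        # zip(group, group[1:]) pairs each row with the next one, like lead().
        for current, following in zip(group, group[1:]):
            if following[0] - current[0] < window and following[3] != current[3]:
                hits.append(current)
    return hits

hits = quick_reversals(trades)
print(hits)
```

As with the query, only the earlier row of each matching pair is returned.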
I am a bit stuck! I have data, like the below.
I need to calculate the sum of frequency between each customer. In the above, FROM customer1 TO customer2 should be summed with FROM customer2 TO customer1 - like below.
It doesn't matter which direction the message went in; I just need to sum all communication between customer1 and customer2.
You can use the greatest and least functionality as follows:
select least(from,to) as from, greatest(from,to) as to, sum(frequency) as freq
from your_Table
group by least(from,to), greatest(from,to)
If greatest and least are not supported in your version, you can use CASE WHEN instead:
select case when from > to then to else from end as from,
case when from > to then from else to end as to,
sum(frequency) as freq
from your_Table
group by case when from > to then to else from end,
case when from > to then from else to end
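The idea behind both queries is to put each pair into a canonical order before grouping. A minimal Python sketch of the same normalisation, using the three sample rows that appear elsewhere in this thread (the function name undirected_totals is made up):

```python
from collections import defaultdict

# (from, to, frequency) sample rows.
messages = [
    ("Customer 1", "Customer 2", 2),
    ("Customer 2", "Customer 1", 4),
    ("Customer 3", "Customer 1", 4),
]

def undirected_totals(rows):
    """Sum frequency per unordered pair, keyed by (smaller name, larger name)."""
    totals = defaultdict(int)
    for sender, receiver, frequency in rows:
        # min/max play the role of least/greatest in the SQL.
        key = (min(sender, receiver), max(sender, receiver))
        totals[key] += frequency
    return dict(totals)

totals = undirected_totals(messages)
print(totals)
```

Both directions between Customer 1 and Customer 2 land on the same key, so their frequencies are added together.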
You can try sorting the From and To values:
WITH tab AS
(SELECT * FROM (VALUES ('Customer 1', 'Customer 2', 2)
, ('Customer 2', 'Customer 1', 4)
, ('Customer 3', 'Customer 1', 4)
) a ([From], [To], [Frequency])
)
SELECT IIF([From] > [To], [To], [From]) [From]
, IIF([From] > [To], [From], [To]) [To]
, SUM([Frequency]) Frequency
From tab
GROUP BY IIF([From] > [To], [To], [From])
, IIF([From] > [To], [From], [To])
You can group by a sorted array:
select
sort_array(array(`from`, `to`))[0] `from`,
sort_array(array(`from`, `to`))[1] `to`,
sum(frequency)
from mytable
group by sort_array(array(`from`, `to`));
Taking into account that two map() with the same key-value pairs but in different order are equal, because maps are unordered by definition, you can exploit this property to aggregate frequency.
Demo with your data example:
with mytable as(
select stack (3,
'Customer 1', 'Customer 2', 2,
'Customer 2', 'Customer 1', 4,
'Customer 3', 'Customer 1', 4
) as (`from`, `to` , frequency)
)
select map_keys(vmap)[0] as `from`, map_keys(vmap)[1] as `to`, frequency
from
(
select map(`from`, 1, `to`, 1) vmap, sum(frequency) frequency
from mytable group by map(`from`, 1, `to`, 1)
)s;
Result:
from to frequency
Customer 2 Customer 1 6
Customer 3 Customer 1 4
I'd like a running distinct count with a partition by year for the following data:
DROP TABLE IF EXISTS #FACT;
CREATE TABLE #FACT("Year" INT,"Month" INT, "Acc" varchar(5));
INSERT INTO #FACT
values
(2015, 1, 'A'),
(2015, 1, 'B'),
(2015, 1, 'B'),
(2015, 1, 'C'),
(2015, 2, 'D'),
(2015, 2, 'E'),
(2015, 3, 'E'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C');
SELECT * FROM #FACT;
The following returns the correct answer but is there a more concise way that is also performant?
WITH
dnsRnk AS
(
SELECT
"Year"
, "Month"
, DenseR = DENSE_RANK() OVER(PARTITION BY "Year", "Month" ORDER BY "Acc")
FROM #FACT
),
mxPerMth AS
(
SELECT
"Year"
, "Month"
, RunningTotal = MAX(DenseR)
FROM dnsRnk
GROUP BY
"Year"
, "Month"
)
SELECT
"Year"
, "Month"
, X = SUM(RunningTotal) OVER (PARTITION BY "Year" ORDER BY "Month")
FROM mxPerMth
ORDER BY
"Year"
, "Month";
The above returns the following - the answer should also return exactly the same table:
If you want a running count of distinct accounts:
SELECT f.*,
sum(case when seqnum = 1 then 1 else 0 end) over (partition by year order by month) as cume_distinct_acc
FROM (
SELECT
f.*
,row_number() over (partition by acc order by year, month) as seqnum
FROM #fact f
) f;
This counts each account during the first month when it appears.
EDIT:
Oops. The above doesn't aggregate by year and month and then start over for each year. Here is the correct solution:
SELECT
year
,month
,sum( sum(case when seqnum = 1 then 1 else 0 end)
) over (partition by year order by month) as cume_distinct_acc
FROM (
SELECT
f.*
,row_number() over (partition by acc, year order by month) as seqnum
FROM #fact f
) f
group by year, month
order by year, month;
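It may help to see the two possible readings of "running distinct count" side by side, since they differ on the sample data. The Python sketch below (function names are made up) computes the cumulative number of distinct accounts seen so far in each year, which is what the seqnum = 1 approach counts, and the running sum of per-month distinct counts, which is what the DENSE_RANK query in the question adds up.

```python
from collections import defaultdict

# (Year, Month, Acc) rows from the question.
fact = [
    (2015, 1, "A"), (2015, 1, "B"), (2015, 1, "B"), (2015, 1, "C"),
    (2015, 2, "D"), (2015, 2, "E"), (2015, 3, "E"),
    (2016, 1, "A"), (2016, 1, "A"), (2016, 2, "B"), (2016, 2, "C"),
]

def running_distinct(rows):
    """Per (year, month): distinct accounts seen so far within the year."""
    seen = defaultdict(set)  # year -> accounts seen so far
    result = {}
    for year, month, acc in sorted(rows):
        seen[year].add(acc)
        result[(year, month)] = len(seen[year])
    return [(y, m, n) for (y, m), n in sorted(result.items())]

def running_sum_of_monthly_distinct(rows):
    """Per (year, month): running sum of each month's own distinct count."""
    monthly = defaultdict(set)
    for year, month, acc in rows:
        monthly[(year, month)].add(acc)
    out, total, current_year = [], 0, None
    for year, month in sorted(monthly):
        if year != current_year:
            total, current_year = 0, year  # restart for each year
        total += len(monthly[(year, month)])
        out.append((year, month, total))
    return out

distinct_so_far = running_distinct(fact)
summed_monthly = running_sum_of_monthly_distinct(fact)
print(distinct_so_far)
print(summed_monthly)
```

The two only disagree for 2015 month 3, where account E was already counted in month 2: the first reading stays at 5 and the second moves on to 6.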
And, SQL Fiddle isn't working but the following is an example:
with FACT as (
SELECT yyyy, mm, account
FROM (values
(2015, 1, 'A'),
(2015, 1, 'B'),
(2015, 1, 'B'),
(2015, 1, 'C'),
(2015, 2, 'D'),
(2015, 2, 'E'),
(2015, 3, 'E'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C')) v(yyyy, mm, account)
)
SELECT
yyyy
,mm
,sum(sum(case when seqnum = 1 then 1 else 0 end)) over (partition by yyyy order by mm) as cume_distinct_acc
FROM (
SELECT
f.*
,row_number() over (partition by account, yyyy order by mm) as seqnum
FROM fact f
) f
group by yyyy, mm
order by yyyy, mm;
Demo Here:
;with cte as (
SELECT [Year], [Month], count(distinct Acc) as cnt
FROM #fact
GROUP BY [Year], [Month]
)
SELECT
[Year]
,[Month]
,sum(cnt) over (partition by [Year] order by [Month] rows unbounded preceding) as x
FROM cte