Using a (Recursive?) CTE + Window Functions to zero out sales orders? - sql

I am trying to use a recursive CTE + window functions to find the last outcome of a series of buy/sell orders.
First, here's some nomenclature:
field_id is the store's ID.
field_number is an order number, but it can be reused by the same person.
field_date is the date of the initial order.
field_inserted is when this specific transaction occurred.
field_sale is whether we bought or returned it.
Unfortunately, because of the way the systems work, I do NOT get the cost when an item is returned, so figuring out the last outcome for an order (did we wind up selling any?) is complicated. I need to match the purchase with the sale, which normally works pretty well. However, there are cases such as the ones below where it fails, and I'm trying to find a way to do this in one pass, possibly using a recursive CTE.
Here's some code.
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:00', 'Buy'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:01', 'buy'),
(2, 100, '20170311','20170311 01:03:00', 'REtu'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
Now to remove the buys that were then returned. The ISNULL is there because otherwise the NOT (...) conditions evaluate to unknown, and so drop every row that has NULL for the _lead/_lag values.
WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)
And I felt pretty smug and thought I had it. However, that's the simple case: Buy, Return, Buy, Return. Let's try another case, Buy, Buy, Return, Return, which is still valid, but obviously would result in a net of 0.
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)
When you do this, though, you realize that it found direct matches, but now there's still a Buy/Return pair, and I'd like to cancel that out.
It's at this point I'm stuck. I've done Recursive CTEs before, but for whatever reason I can't figure out how to recurse and make it cancel out 1/1/100 and 4/1/100. All I've managed to do is have it choke on the recursion.
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
field_inserted,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
--)
--SELECT * FROM cte
--WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
--AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)
UNION ALL
SELECT
ROW_NUMBER() OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS row_num,
cte.field_id,
cte.field_number,
cte.field_date,
cte.field_sale,
cte.field_inserted,
lead(cte.field_sale) OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lead,
lag(cte.field_sale) OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lag
FROM @tablea AS t INNER JOIN cte ON cte.field_date = t.field_date AND cte.field_id = t.field_id AND cte.field_number = t.field_number
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)

If I am understanding your question correctly, you want to remove sales that have been returned, and each 'retu' should remove the most recent 'buy'. We can tackle this without loops or recursion by using a common table expression and row_number().
First we add id using row_number() to our rowset so we can uniquely identify each row.
Next, we add br_rn (short for Buy/Return RowNumber), partitioned by field_id, field_number, field_date, and also field_sale, and ordered by field_inserted desc.
This numbers the buys and the returns separately, newest first, so the n-th most recent 'retu' lines up with the n-th most recent 'buy'. Once we can do that, we can eliminate all of the pairs with not exists():
;with cte as (
select
id = row_number() over (
order by field_id, field_number, field_date, field_inserted asc
)
, field_id
, field_number
, field_date
, field_inserted
, field_sale
, br_rn = row_number() over (
partition by field_id, field_number, field_date, field_sale
order by field_inserted desc
)
from @tablea
)
select
id
, field_id
, field_number
, field_date
, field_inserted
, field_sale
from cte
where not exists (
select 1
from cte as i
where i.field_id = cte.field_id
and i.field_number = cte.field_number
and i.field_date = cte.field_date
and i.br_rn = cte.br_rn
and i.id <> cte.id
)
order by id
rextester demo: http://rextester.com/TKXOC61533
For this input:
(1, 100, '20170311','20170311 01:00:00', 'Buy')
, (1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Retu')
, (1, 100, '20170311','20170311 01:03:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');
returns:
+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date | field_inserted | field_sale |
+----+----------+--------------+------------+---------------------+------------+
| 5 | 1 | 110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
| 6 | 2 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
+----+----------+--------------+------------+---------------------+------------+
and for this input:
(1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Buy')
, (1, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 100, '20170311','20170311 01:04:00', 'Retu')
, (1, 100, '20170311','20170311 01:05:00', 'Buy')
, (1, 100, '20170311','20170311 01:06:00', 'Retu')
, (1, 100, '20170311','20170311 01:07:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');
returns:
+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date | field_inserted | field_sale |
+----+----------+--------------+------------+---------------------+------------+
| 1 | 1 | 100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy |
| 8 | 1 | 110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
| 9 | 2 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
+----+----------+--------------+------------+---------------------+------------+
for this input:
(1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Buy')
, (1, 100, '20170311','20170311 01:04:00', 'Retu')
, (1, 100, '20170311','20170311 01:05:00', 'Retu')
, (1, 100, '20170312','20170311 01:06:00', 'Buy')
, (1, 100, '20170312','20170311 01:07:00', 'Buy')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy')
returns:
+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date | field_inserted | field_sale |
+----+----------+--------------+------------+---------------------+------------+
| 5 | 1 | 100 | 2017-03-12 | 2017-03-11 01:06:00 | Buy |
| 6 | 1 | 100 | 2017-03-12 | 2017-03-11 01:07:00 | Buy |
| 7 | 1 | 110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
| 8 | 2 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
+----+----------+--------------+------------+---------------------+------------+
To illustrate what we are doing, it may help to look at what the cte returns before we eliminate any pairs.
Here is just the set that needs filtering (the second input above), before we filter it:
+----+----------+--------------+------------+---------------------+------------+-------+
| id | field_id | field_number | field_date | field_inserted | field_sale | br_rn |
+----+----------+--------------+------------+---------------------+------------+-------+
| 1 | 1 | 100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy | 4 |
| 2 | 1 | 100 | 2017-03-11 | 2017-03-11 01:02:00 | Buy | 3 |
| 3 | 1 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy | 2 |
| 4 | 1 | 100 | 2017-03-11 | 2017-03-11 01:04:00 | Retu | 3 |
| 5 | 1 | 100 | 2017-03-11 | 2017-03-11 01:05:00 | Buy | 1 |
| 6 | 1 | 100 | 2017-03-11 | 2017-03-11 01:06:00 | Retu | 2 |
| 7 | 1 | 100 | 2017-03-11 | 2017-03-11 01:07:00 | Retu | 1 |
+----+----------+--------------+------------+---------------------+------------+-------+
Looking at it like this, we can easily see that the 'buy' order id 1 has a br_rn of 4 and there is no associated 'retu'.
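The rank-matching idea above can be tried outside SQL Server as well. The following is a minimal sketch, assuming Python's bundled sqlite3 module with SQLite 3.25 or later (for window functions); the table name tablea and the helper unmatched_orders are illustrative choices, not part of the original answer, and the rows are the second sample input.

```python
import sqlite3

# Second sample input from the answer above (dates written in ISO form for SQLite).
ROWS = [
    (1, 100, "2017-03-11", "2017-03-11 01:01:00", "Buy"),
    (1, 100, "2017-03-11", "2017-03-11 01:02:00", "Buy"),
    (1, 100, "2017-03-11", "2017-03-11 01:03:00", "Buy"),
    (1, 100, "2017-03-11", "2017-03-11 01:04:00", "Retu"),
    (1, 100, "2017-03-11", "2017-03-11 01:05:00", "Buy"),
    (1, 100, "2017-03-11", "2017-03-11 01:06:00", "Retu"),
    (1, 100, "2017-03-11", "2017-03-11 01:07:00", "Retu"),
    (2, 100, "2017-03-11", "2017-03-11 01:03:00", "Buy"),
    (1, 110, "2017-03-11", "2017-03-11 01:03:00", "Buy"),
]

QUERY = """
WITH cte AS (
    SELECT
        ROW_NUMBER() OVER (
            ORDER BY field_id, field_number, field_date, field_inserted
        ) AS id,
        field_id, field_number, field_date, field_inserted, field_sale,
        -- rank buys and returns separately, newest first
        ROW_NUMBER() OVER (
            PARTITION BY field_id, field_number, field_date, field_sale
            ORDER BY field_inserted DESC
        ) AS br_rn
    FROM tablea
)
SELECT id, field_id, field_number, field_inserted, field_sale
FROM cte
WHERE NOT EXISTS (          -- keep only rows with no partner of equal rank
    SELECT 1
    FROM cte AS i
    WHERE i.field_id = cte.field_id
      AND i.field_number = cte.field_number
      AND i.field_date = cte.field_date
      AND i.br_rn = cte.br_rn
      AND i.id <> cte.id
)
ORDER BY id
"""


def unmatched_orders(rows):
    """Return the rows that are left after pairing buys with returns by rank."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablea (field_id INTEGER, field_number INTEGER, "
        "field_date TEXT, field_inserted TEXT, field_sale TEXT)"
    )
    conn.executemany("INSERT INTO tablea VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in unmatched_orders(ROWS):
        print(row)
```

One thing to keep in mind: the pairing is purely by rank within the group, so it does not check that a return was inserted after the buy it cancels. If returns can precede buys in your data, that case needs separate handling.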

One thing I can suggest is to keep deleting pairs of sequential buy/return rows for as long as any exist. Try:
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Buy'),
(1, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 100, '20170311','20170311 01:04:00', 'Retu'),
(1, 100, '20170311','20170311 01:05:00', 'Buy'),
(1, 100, '20170311','20170311 01:06:00', 'Retu'),
(1, 100, '20170311','20170311 01:07:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
select * from @tablea
order by field_id,
field_number,
field_inserted
declare @eoj int =1;
-- loop until a pass deletes nothing
while @eoj > 0
begin
WITH cte AS
(
SELECT
-- a Buy looks at the next row, a Retu looks at the previous row
case field_sale when 'Buy' then
lead (field_sale) OVER (PARTITION BY field_id, field_number ORDER BY field_inserted)
when 'Retu' then
lag (field_sale) OVER (PARTITION BY field_id, field_number ORDER BY field_inserted)
end nbr_type,
field_id,
field_number,
field_date,
field_sale,
field_inserted
FROM @tablea
)
-- delete both halves of every adjacent Buy/Retu pair
delete
from cte
where nbr_type is not null and nbr_type <> field_sale;
set @eoj = @@rowcount;
-- check it
select * from @tablea
order by field_id,
field_number,
field_inserted;
end;
It will be repeated N+1 times where N is the length of the longest sequence of returns. N=2 in the above example.
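The pass count can be checked with a small experiment. This is a sketch only, assuming Python's sqlite3 module with SQLite 3.25 or later; because SQLite cannot delete through a CTE the way T-SQL can, the delete targets rowid instead, and the names tablea, cancel_pairs and DELETE_ADJACENT_PAIRS are hypothetical.

```python
import sqlite3

# Same sample rows as in the T-SQL above (field_date omitted; it is not used here).
ROWS = [
    (1, 100, "2017-03-11 01:01:00", "Buy"),
    (1, 100, "2017-03-11 01:02:00", "Buy"),
    (1, 100, "2017-03-11 01:03:00", "Buy"),
    (1, 100, "2017-03-11 01:04:00", "Retu"),
    (1, 100, "2017-03-11 01:05:00", "Buy"),
    (1, 100, "2017-03-11 01:06:00", "Retu"),
    (1, 100, "2017-03-11 01:07:00", "Retu"),
    (2, 100, "2017-03-11 01:03:00", "Buy"),
    (1, 110, "2017-03-11 01:03:00", "Buy"),
]

# A Buy looks at the next row, a Retu at the previous row; when the neighbour
# is of the other type, both rows form an adjacent pair and are deleted.
DELETE_ADJACENT_PAIRS = """
WITH cte AS (
    SELECT
        rowid AS rid,
        field_sale,
        CASE field_sale
            WHEN 'Buy'  THEN LEAD(field_sale) OVER w
            WHEN 'Retu' THEN LAG(field_sale)  OVER w
        END AS nbr_type
    FROM tablea
    WINDOW w AS (PARTITION BY field_id, field_number ORDER BY field_inserted)
)
DELETE FROM tablea
WHERE rowid IN (
    SELECT rid FROM cte
    WHERE nbr_type IS NOT NULL AND nbr_type <> field_sale
)
"""


def cancel_pairs(rows):
    """Delete adjacent Buy/Retu pairs until a pass removes nothing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablea (field_id INTEGER, field_number INTEGER, "
        "field_inserted TEXT, field_sale TEXT)"
    )
    conn.executemany("INSERT INTO tablea VALUES (?, ?, ?, ?)", rows)
    passes = 0
    while True:
        passes += 1
        before = conn.total_changes
        conn.execute(DELETE_ADJACENT_PAIRS)
        if conn.total_changes == before:  # nothing deleted in this pass
            break
    remaining = conn.execute(
        "SELECT field_id, field_number, field_inserted, field_sale "
        "FROM tablea ORDER BY field_id, field_number, field_inserted"
    ).fetchall()
    conn.close()
    return passes, remaining


if __name__ == "__main__":
    passes, remaining = cancel_pairs(ROWS)
    print(passes, remaining)
```

On the sample data the first pass removes two pairs, the second removes one, and the third finds nothing, which agrees with the N+1 estimate for N=2.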

Related

risk_score result for each of month

I want to generate the highest risk_score result for each month (Jan, Feb & Mar),
displaying the following columns: Firm_id_1, risk_score_Jan, risk_score_Feb, risk_score_Mar
CREATE table firm_risk (
firm_id_1 INT,
assessment_date DATE,
risk_score FLOAT
);
INSERT INTO firm_risk (firm_id_1, assessment_date, risk_score)
VALUES (123, '1/01/2018', 0.43),
(123, '1/28/2018', 0.80),
(123, '2/11/2018', 0.28),
(123, '2/23/2018', 0.91),
(123, '3/11/2018', 0.08),
(123, '3/31/2018', 0.60),
(456, '1/4/2018', 0.87),
(456, '1/6/2018', 0.02),
(456, '1/20/2018', 0.39),
(456, '2/3/2018', 0.10),
(456, '3/1/2018', 0.12),
(789, '1/1/2018', 0.20),
(789, '3/1/2018', 0.17);
SELECT * FROM firm_risk;
SELECT firm_id_1, date_part('month', assessment_date) AS AD
FROM firm_risk
WHERE assessment_date = (SELECT MAX (assessment_date) FROM firm_risk)
GROUP BY firm_id_1, risk_score, assessment_date;
CREATE table latest_risk_score (
firm_id_2 integer,
latest_risk_score_Jan float,
latest_risk_score_Feb float,
latest_risk_score_Mar float
);
SELECT * FROM latest_risk_score;
INSERT INTO latest_risk_score (firm_id_2)
VALUES (123),
(456),
(789);
SELECT firm_risk.firm_id_1, date_part('month', assessment_date), firm_risk.risk_score
FROM firm_risk
INNER JOIN latest_risk_score
ON firm_risk.firm_id_1 = latest_risk_score.firm_id_2
GROUP BY firm_risk.firm_id_1, firm_risk.risk_score, assessment_date;
SELECT firm_risk.firm_id_1, date_part('month', assessment_date), firm_risk.risk_score
FROM firm_risk
WHERE assessment_date = (SELECT MAX (assessment_date) FROM firm_risk)
AND assessment_date LIKE '_%-01-2018%';
SELECT firm_risk.firm_id_1, date_part('month', assessment_date)
FROM firm_risk
WHERE assessment_date >= date_part('month', assessment_date - '3 months')
GROUP BY firm_risk.firm_id_1, ('month', assessment_date);
UPDATE latest_risk_score SET latest_risk_score_Jan = (SELECT Risk_Score FROM firm_risk.firm_id_1 WHERE Assessment_Date = (SELECT MAX(Assessment_Date)
FROM firm_risk.firm_id_1 WHERE firm_id_1 = 123 AND Assessment_Date LIKE "2018-01-%" ORDER BY Assessment_Date))
WHERE firm_id_1 = 123;
update latest_risk_score
set latest_risk_score_Feb = (select Risk_Score from firm_risk.firm_id_1 where Assessment_Date = (select max(Assessment_Date)
from firm_risk.firm_id_1 where firm_id_1 = 123 and Assessment_Date like "2018-02-%" order by Assessment_Date))
where firm_id_1 = 123;
update latest_risk_score
set latest_risk_score_Mar = (select Risk_Score from firm_risk.firm_id_1 where Assessment_Date = (select max(Assessment_Date)
from firm_risk.firm_id_1 where firm_id_1 = 123 and Assessment_Date like "2018-03-%" order by Assessment_Date))
where firm_id_1 = 123;
select * from latest_risk_score;
Assuming Postgres is relevant (because "date_part" appears in the question), you can pivot with conditional aggregation:
CREATE table firm_risk (
firm_id_1 INT,
assessment_date DATE,
risk_score FLOAT
);
INSERT INTO firm_risk (firm_id_1, assessment_date, risk_score)
VALUES (123, '2018-01-01', 0.43),
(123, '2018-01-28', 0.80),
(123, '2018-02-11', 0.28),
(123, '2018-02-23', 0.91),
(123, '2018-03-11', 0.08),
(123, '2018-03-31', 0.60),
(456, '2018-01-04', 0.87),
(456, '2018-01-06', 0.02),
(456, '2018-01-20', 0.39),
(456, '2018-02-03', 0.10),
(456, '2018-03-01', 0.12),
(789, '2018-01-01', 0.20),
(789, '2018-03-01', 0.17);
SELECT
firm_risk.firm_id_1
, max(case when date_part('month',assessment_date) = 1 then firm_risk.risk_score end) jan_risk
, max(case when date_part('month',assessment_date) = 2 then firm_risk.risk_score end) feb_risk
, max(case when date_part('month',assessment_date) = 3 then firm_risk.risk_score end) mar_risk
FROM firm_risk
WHERE date_part('month',assessment_date) in (1,2,3)
GROUP BY
firm_risk.firm_id_1
firm_id_1 | jan_risk | feb_risk | mar_risk
--------: | :------- | :------- | :-------
789 | 0.2 | null | 0.17
456 | 0.87 | 0.1 | 0.12
123 | 0.8 | 0.91 | 0.6
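The same conditional-aggregation pivot can be reproduced in a scratch database. A minimal sketch, assuming Python's sqlite3 module; SQLite has no date_part, so strftime('%m', ...) stands in for it here, and the helper name monthly_max_risk is illustrative.

```python
import sqlite3

ROWS = [
    (123, "2018-01-01", 0.43), (123, "2018-01-28", 0.80),
    (123, "2018-02-11", 0.28), (123, "2018-02-23", 0.91),
    (123, "2018-03-11", 0.08), (123, "2018-03-31", 0.60),
    (456, "2018-01-04", 0.87), (456, "2018-01-06", 0.02),
    (456, "2018-01-20", 0.39), (456, "2018-02-03", 0.10),
    (456, "2018-03-01", 0.12),
    (789, "2018-01-01", 0.20), (789, "2018-03-01", 0.17),
]

# One MAX(CASE ...) per output column: each CASE yields NULL outside its
# month, and MAX ignores NULLs, so every column sees only its own month.
QUERY = """
SELECT
    firm_id_1,
    MAX(CASE WHEN strftime('%m', assessment_date) = '01' THEN risk_score END) AS jan_risk,
    MAX(CASE WHEN strftime('%m', assessment_date) = '02' THEN risk_score END) AS feb_risk,
    MAX(CASE WHEN strftime('%m', assessment_date) = '03' THEN risk_score END) AS mar_risk
FROM firm_risk
WHERE strftime('%m', assessment_date) IN ('01', '02', '03')
GROUP BY firm_id_1
ORDER BY firm_id_1
"""


def monthly_max_risk(rows):
    """Return one row per firm with the highest score for Jan, Feb and Mar."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE firm_risk (firm_id_1 INTEGER, assessment_date TEXT, risk_score REAL)"
    )
    conn.executemany("INSERT INTO firm_risk VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_max_risk(ROWS):
        print(row)
```

A firm with no assessment in a month (789 in February) gets NULL, returned as None in Python, for that column.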

SQL Server Query for average value over a date period

DECLARE @SampleOrderTable TABLE
(
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '12/10/2019', '762.84'),
(2, '11/10/2019', '886.32'),
(3, '11/9/2019', '10245.00')
How do I select the last 4 days up to and including OrderDate, and the average Amount over that period?
So result data would be:
pkPersonID Date Amount
------------------------------------
1 '12/7/2019' 190.71
1 '12/8/2019' 190.71
1 '12/9/2019' 190.71
1 '12/10/2019' 190.71
2 '12/7/2019' 221.58
2 '12/8/2019' 221.58
2 '12/9/2019' 221.58
2 '12/10/2019' 221.58
3 '11/6/2019' 2561.25
3 '11/7/2019' 2561.25
3 '11/8/2019' 2561.25
3 '11/9/2019' 2561.25
You may try with the following approach, using DATEADD(), windowed COUNT() and VALUES() table value constructor:
Table:
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '20191210', '762.84'),
(2, '20191210', '886.32'),
(3, '20191109', '10245.00')
Statement:
SELECT
t.pkPersonID,
DATEADD(day, -v.Day, t.OrderDate) AS [Date],
CONVERT(numeric(18, 6), Amount / COUNT(Amount) OVER (PARTITION BY t.pkPersonID)) AS Amount
FROM @SampleOrderTable t
CROSS APPLY (VALUES (0), (1), (2), (3)) v(Day)
ORDER BY t.pkPersonID, [Date]
Result:
pkPersonID Date Amount
1 07/12/2019 00:00:00 190.710000
1 08/12/2019 00:00:00 190.710000
1 09/12/2019 00:00:00 190.710000
1 10/12/2019 00:00:00 190.710000
2 07/12/2019 00:00:00 221.580000
2 08/12/2019 00:00:00 221.580000
2 09/12/2019 00:00:00 221.580000
2 10/12/2019 00:00:00 221.580000
3 06/11/2019 00:00:00 2561.250000
3 07/11/2019 00:00:00 2561.250000
3 08/11/2019 00:00:00 2561.250000
3 09/11/2019 00:00:00 2561.250000
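To see the row-expansion step in isolation, here is a minimal sketch, assuming Python's sqlite3 module with SQLite 3.25 or later. SQLite has no CROSS APPLY or DATEADD, so a CROSS JOIN against a small offsets CTE and the date() function are swapped in; the names orders, offsets, order_day and spread_over_days are illustrative.

```python
import sqlite3

ROWS = [
    (1, "2019-12-10", 762.84),
    (2, "2019-12-10", 886.32),
    (3, "2019-11-09", 10245.00),
]

# Each order row is joined to the offsets 0..3, giving four rows per order.
# The windowed COUNT(*) then equals the number of generated days (4), so
# Amount / COUNT(*) spreads the amount evenly across them.
QUERY = """
WITH offsets(d) AS (VALUES (0), (1), (2), (3))
SELECT
    t.pkPersonID,
    date(t.OrderDate, '-' || o.d || ' days') AS order_day,
    t.Amount / COUNT(*) OVER (PARTITION BY t.pkPersonID) AS Amount
FROM orders AS t
CROSS JOIN offsets AS o
ORDER BY t.pkPersonID, order_day
"""


def spread_over_days(rows):
    """Return (person, day, amount per day) for the 4 days ending on OrderDate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (pkPersonID INTEGER, OrderDate TEXT, Amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for person, day, amount in spread_over_days(ROWS):
        print(person, day, round(amount, 2))
```

Changing the window length is then only a matter of changing the list of offsets, since the divisor follows the number of generated rows.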
You can use SQL functions like AVG, DATEADD and GETDATE. Note that this averages over the last 4 days counted from today, not from each OrderDate:
SELECT AVG(Amount) as AverageAmount
FROM @SampleOrderTable
WHERE OrderDate >= DATEADD(DAY, -4, GETDATE())
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
);
INSERT INTO @SampleOrderTable
(pkPersonID, OrderDate, Amount)
VALUES
(1, '12/20/2019', 762.84),
(2, '12/20/2019', 886.32),
(3, '12/20/2019', 10245.00),
(4, '12/19/2019', 50.00),
(5, '12/19/2019', 100.00),
(6, '09/01/2019', 200.00),
(7, '09/01/2019', 300.00),
(8, '12/15/2019', 400.00),
(9, '12/15/2019', 500.00),
(10, '09/02/2019', 150.00),
(11, '09/02/2019', 1100.00),
(12, '09/02/2019', 1200.00),
(13, '09/02/2019', 1300.00),
(14, '09/02/2019', 1400.00),
(15, '09/02/2019', 1500.00);
SELECT OrderDate,AVG(Amount) AS Average_Value
FROM @SampleOrderTable
WHERE DATEDIFF(DAY, CAST(OrderDate AS DATETIME), CAST(GETDATE() AS Datetime)) <= 4
GROUP BY OrderDate;

Average Duration in Status - Gaps and Islands

I'm trying to calculate the average turnover time of a piece of equipment in REPAIR status.
I was able to create a query that lists each piece of equipment with its snapshotted status on each day.
+-----------------+--------------+--------+----------------------+------------+------------------+
| equipmentNumber | snapshotDate | status | previousSnapshotDate | prevStatus | statusChangeFlag |
+-----------------+--------------+--------+----------------------+------------+------------------+
| 123456 | 2018-04-29 | ONHIRE | 2018-04-28 | AVAILABLE | 1 |
| 123456 | 2018-04-30 | ONHIRE | 2018-04-29 | ONHIRE | 0 |
| 123456 | 2018-05-01 | ONHIRE | 2018-04-30 | ONHIRE | 0 |
| 123456 | 2018-05-02 | REPAIR | 2018-05-01 | ONHIRE | 1 |
| 123456 | 2018-05-03 | REPAIR | 2018-05-02 | REPAIR | 0 |
| 123456 | 2018-05-04 | ONHIRE | 2018-05-03 | REPAIR | 1 |
| 654321 | 2018-04-30 | REPAIR | 2018-04-29 | AVAILABLE | 1 |
| 654321 | 2018-05-01 | REPAIR | 2018-04-30 | REPAIR | 0 |
| 654321 | 2018-05-02 | REPAIR | 2018-05-01 | REPAIR | 0 |
+-----------------+--------------+--------+----------------------+------------+------------------+
So, in this example, we have 2 pieces of equipment: "123456" was in REPAIR status for 2 days (5/2 and 5/3), and "654321" was in REPAIR status for 3 days (4/30, 5/1, and 5/2). That would be an average repair turnaround time of (2+3) / 2 = 2.5 days.
I tried this algorithm (Detect consecutive dates ranges using SQL) but it doesn't seem to be quite working for my needs.
I approach gaps-and-islands problems with an incrementing ID column (creating one with ROW_NUMBER if none exists) together with the ROW_NUMBER window function:
CREATE TABLE T1
([equipmentNumber] int, [snapshotDate] datetime, [status] varchar(6), [previousSnapshotDate] datetime, [prevStatus] varchar(9), [statusChangeFlag] int)
;
INSERT INTO T1
([equipmentNumber], [snapshotDate], [status], [previousSnapshotDate], [prevStatus], [statusChangeFlag])
VALUES
(123456, '2018-04-29 00:00:00', 'ONHIRE', '2018-04-28 00:00:00', 'AVAILABLE', 1),
(123456, '2018-04-30 00:00:00', 'ONHIRE', '2018-04-29 00:00:00', 'ONHIRE', 0),
(123456, '2018-05-01 00:00:00', 'ONHIRE', '2018-04-30 00:00:00', 'ONHIRE', 0),
(123456, '2018-05-02 00:00:00', 'REPAIR', '2018-05-01 00:00:00', 'ONHIRE', 1),
(123456, '2018-05-03 00:00:00', 'REPAIR', '2018-05-02 00:00:00', 'REPAIR', 0),
(123456, '2018-05-04 00:00:00', 'ONHIRE', '2018-05-03 00:00:00', 'REPAIR', 1),
(654321, '2018-04-30 00:00:00', 'REPAIR', '2018-04-29 00:00:00', 'AVAILABLE', 1),
(654321, '2018-05-01 00:00:00', 'REPAIR', '2018-04-30 00:00:00', 'REPAIR', 0),
(654321, '2018-05-02 00:00:00', 'REPAIR', '2018-05-01 00:00:00', 'REPAIR', 0)
;
;WITH cteX
AS(
SELECT
Id = ROW_NUMBER()OVER(ORDER BY T.equipmentNumber, T.snapshotDate)
,T.equipmentNumber
,T.snapshotDate
,T.[status]
,T.previousSnapshotDate
,T.prevStatus
,T.statusChangeFlag
FROM dbo.T1 T
),cteIsland
AS(
SELECT
Island = X.Id - ROW_NUMBER()OVER(ORDER BY X.Id)
,*
FROM cteX X
WHERE X.[status] = 'REPAIR'
)
SELECT * FROM cteIsland
Note the Island Column
Island Id equipmentNumber status
3 4 123456 REPAIR
3 5 123456 REPAIR
4 7 654321 REPAIR
4 8 654321 REPAIR
4 9 654321 REPAIR
Using the Island Column you can get the answer you need with this TSQL
;WITH cteX
AS(
SELECT
Id = ROW_NUMBER()OVER(ORDER BY T.equipmentNumber, T.snapshotDate)
,T.equipmentNumber
,T.snapshotDate
,T.[status]
,T.previousSnapshotDate
,T.prevStatus
,T.statusChangeFlag
FROM dbo.T1 T
),cteIsland
AS(
SELECT
Island = X.Id - ROW_NUMBER()OVER(ORDER BY X.Id)
,*
FROM cteX X
WHERE X.[status] = 'REPAIR'
)
SELECT
AvgDuration =SUM(Totals.IslandCounts) / (COUNT(Totals.IslandCounts) * 1.0)
FROM
(
SELECT
IslandCounts = COUNT(I.Island)
,I.equipmentNumber
FROM cteIsland I
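-- group by Island as well, so that separate repair periods on the same equipment are counted separately
GROUP BY I.equipmentNumber, I.Island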
) Totals
Answer
AvgDuration
2.50000000000000
That method should work to identify the repair periods:
select equipmentNumber, min(snapshotDate), max(snapshotDate)
from (select t.*,
row_number() over (partition by equipmentNumber order by snapshotDate) as seqnum
from t
) t
where status = 'REPAIR'
group by equipmentNumber, dateadd(day, - seqnum, snapshotDate);
You can get the average using a subquery:
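-- + 1 because datediff() counts day boundaries: a repair from 5/2 to 5/3 would otherwise count as 1 day, not 2
select avg((datediff(day, minsd, maxsd) + 1) * 1.0)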
from (select equipmentNumber, min(snapshotDate) as minsd, max(snapshotDate) as maxsd
from (select t.*,
row_number() over (partition by equipmentNumber order by snapshotDate) as seqnum
from t
) t
where status = 'REPAIR'
group by equipmentNumber, dateadd(day, - seqnum, snapshotDate)
) e;
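Both answers rely on the same trick: within a run of consecutive daily snapshots, the date minus a row number stays constant. A minimal sketch of that idea, assuming Python's sqlite3 module with SQLite 3.25 or later; julianday() stands in for T-SQL date arithmetic, and the names snapshots and repair_periods are illustrative.

```python
import sqlite3

# (equipmentNumber, snapshotDate, status) from the question's sample data.
ROWS = [
    (123456, "2018-04-29", "ONHIRE"), (123456, "2018-04-30", "ONHIRE"),
    (123456, "2018-05-01", "ONHIRE"), (123456, "2018-05-02", "REPAIR"),
    (123456, "2018-05-03", "REPAIR"), (123456, "2018-05-04", "ONHIRE"),
    (654321, "2018-04-30", "REPAIR"), (654321, "2018-05-01", "REPAIR"),
    (654321, "2018-05-02", "REPAIR"),
]

# For consecutive days, julianday(date) and ROW_NUMBER() both grow by 1, so
# their difference is constant inside one repair period and changes after a gap.
QUERY = """
WITH repair AS (
    SELECT
        equipmentNumber,
        snapshotDate,
        julianday(snapshotDate) - ROW_NUMBER() OVER (
            PARTITION BY equipmentNumber ORDER BY snapshotDate
        ) AS island
    FROM snapshots
    WHERE status = 'REPAIR'
)
SELECT
    equipmentNumber,
    MIN(snapshotDate) AS first_day,
    MAX(snapshotDate) AS last_day,
    COUNT(*) AS days          -- number of daily snapshots in the period
FROM repair
GROUP BY equipmentNumber, island
ORDER BY equipmentNumber, first_day
"""


def repair_periods(rows):
    """Return one row per repair period and the average period length in days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshots (equipmentNumber INTEGER, snapshotDate TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO snapshots VALUES (?, ?, ?)", rows)
    periods = conn.execute(QUERY).fetchall()
    conn.close()
    average = sum(p[3] for p in periods) / len(periods) if periods else None
    return periods, average


if __name__ == "__main__":
    periods, average = repair_periods(ROWS)
    print(periods, average)
```

Counting snapshot rows gives inclusive day counts (2 and 3 here), which is what the question's 2.5-day average expects.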

SQL query to compare time difference

I used the query below and got the output shown. Now I would like to query the data as described below. How should I do it?
For each row with code 2, check whether a row with code 1 comes after it within the same ItemID. If so, compare the timestamps; if the time difference is less than 10 seconds, display the two compared rows.
SELECT [Date]
,[Code]
,[ItemId]
,[ItemName]
FROM [dbo].[Log] as t
join Item as d
on t.ItemId = d.Id
where ([Code] = 2 or [Code] = 1) and ([ItemId] > 97 and [ItemId] < 100)
order by [ItemId], [Date]
Output from the above query
Date Code ItemName ItemID
2017-01-06 11:00:49.000 2 B 98
2017-01-06 11:00:49.000 1 A 98
2017-01-06 11:00:55.000 2 B 98
2017-01-06 12:01:56.000 1 A 98
2017-01-06 12:02:37.000 2 B 98
2017-01-06 12:03:49.000 1 A 98
2017-01-06 12:05:44.000 2 B 98
2017-01-06 20:24:32.000 1 A 98
2017-01-06 20:24:55.000 2 B 98
2017-03-14 16:37:42.000 2 B 99
2017-03-14 17:40:24.000 1 A 99
2017-03-14 17:40:25.000 2 B 99
2017-03-14 21:28:46.000 1 A 99
2017-03-15 08:03:07.000 2 B 99
2017-03-15 10:43:00.000 1 A 99
2017-03-15 12:01:17.000 2 B 99
2017-03-15 14:18:19.000 2 B 99
Expected Result
Date Code ItemName ItemID
2017-01-06 11:00:49.000 2 B 98
2017-01-06 11:00:49.000 1 A 98
create table results ([Date] datetime, Code int, ItemName char(1), ItemID int);
insert into results values
('2017-01-06 11:00:49', 2, 'B', 98),
('2017-01-06 11:00:49', 1, 'A', 98),
('2017-01-06 11:00:55', 2, 'B', 98),
('2017-01-06 12:01:56', 1, 'A', 98),
('2017-01-06 12:01:58', 1, 'A', 98),
('2017-01-06 12:02:37', 2, 'B', 98),
('2017-01-06 12:03:49', 1, 'A', 98),
('2017-01-06 12:05:44', 2, 'B', 98),
('2017-01-06 20:24:32', 1, 'A', 98),
('2017-01-06 20:24:55', 2, 'B', 98),
('2017-03-07 00:02:27', 1, 'A', 91),
('2017-03-07 00:02:27', 1, 'A', 58),
('2017-03-14 16:37:42', 2, 'B', 99),
('2017-03-14 17:40:24', 1, 'A', 99),
('2017-03-14 17:40:38', 2, 'B', 99),
('2017-03-14 21:28:46', 1, 'A', 99),
('2017-03-15 08:03:07', 2, 'B', 99),
('2017-03-15 10:43:00', 1, 'A', 99),
('2017-03-15 12:01:17', 2, 'B', 99),
('2017-03-15 14:18:19', 1, 'A', 99);
--= set a reset point when ItemId changes, or there is no correlative (2,1) couples
--= keep in mind this solution assumes that first Code must be 2
--
WITH SetReset AS
(
SELECT [Date], Code, ItemName, ItemId,
CASE WHEN LAG([ItemId]) OVER (PARTITION BY ItemId ORDER BY [Date]) IS NULL
OR ([Code] = 2)
OR ([Code] = COALESCE(LAG([Code]) OVER (PARTITION BY ItemId ORDER BY [Date]), [Code]))
THEN 1 END is_reset
FROM results
)
--
--= set groups according to reset points
--
, SetGroup AS
(
SELECT [Date], Code, ItemName, ItemId,
COUNT(is_reset) OVER (ORDER BY [ItemId], [Date]) grp
FROM SetReset
)
--
--= calcs diff date for each group
, CalcSeconds AS
(
SELECT [Date], Code, ItemName, ItemId,
DATEDIFF(SECOND, MIN([Date]) OVER (PARTITION BY grp), MAX([Date]) OVER (PARTITION BY grp)) dif_sec,
COUNT(*) OVER (PARTITION BY grp) num_items
FROM SetGroup
)
--
--= selects those rows with 2 items by group and date diff less than 10 sec
SELECT [Date], Code, ItemName, ItemId
FROM CalcSeconds
WHERE dif_sec < 10
AND num_items = 2
;
GO
Date | Code | ItemName | ItemId
:------------------ | ---: | :------- | -----:
06/01/2017 11:00:49 | 2 | B | 98
06/01/2017 11:00:49 | 1 | A | 98
Warning: Null value is eliminated by an aggregate or other SET operation.
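A shorter way to express the same check is to compare each row directly with its neighbour using LAG and LEAD. This is only a sketch, assuming Python's sqlite3 module with SQLite 3.25 or later and a subset of the sample rows; the column name log_date, the helper close_pairs, and the tie-break ORDER BY log_date, Code DESC (so that code 2 sorts before code 1 at the same timestamp) are assumptions that the original answer does not make.

```python
import sqlite3

# Subset of the sample rows above: (log_date, Code, ItemName, ItemID).
ROWS = [
    ("2017-01-06 11:00:49", 2, "B", 98),
    ("2017-01-06 11:00:49", 1, "A", 98),
    ("2017-01-06 11:00:55", 2, "B", 98),
    ("2017-01-06 12:01:56", 1, "A", 98),
    ("2017-03-14 16:37:42", 2, "B", 99),
    ("2017-03-14 17:40:24", 1, "A", 99),
    ("2017-03-14 17:40:38", 2, "B", 99),
]

# A code 2 row qualifies when the next row is code 1 less than 10 s later;
# the matching code 1 row qualifies by the mirrored condition on LAG.
QUERY = """
WITH ordered AS (
    SELECT
        log_date, Code, ItemName, ItemID,
        LAG(Code)      OVER w AS prev_code,
        LAG(log_date)  OVER w AS prev_date,
        LEAD(Code)     OVER w AS next_code,
        LEAD(log_date) OVER w AS next_date
    FROM results
    WINDOW w AS (PARTITION BY ItemID ORDER BY log_date, Code DESC)
)
SELECT log_date, Code, ItemName, ItemID
FROM ordered
WHERE (Code = 2 AND next_code = 1
       AND CAST(strftime('%s', next_date) AS INTEGER)
         - CAST(strftime('%s', log_date) AS INTEGER) < 10)
   OR (Code = 1 AND prev_code = 2
       AND CAST(strftime('%s', log_date) AS INTEGER)
         - CAST(strftime('%s', prev_date) AS INTEGER) < 10)
ORDER BY ItemID, log_date, Code DESC
"""


def close_pairs(rows):
    """Return code 2 / code 1 row pairs that are less than 10 seconds apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (log_date TEXT, Code INTEGER, ItemName TEXT, ItemID INTEGER)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in close_pairs(ROWS):
        print(row)
```

Without some tie-break, two rows that share a timestamp (as the first two do) can come back in either order, so the result of any ORDER BY [Date] approach is not fully deterministic.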

T-SQL: Conditional NULL removal

I need to select only the Room_IDs that have no instances where the Status is NULL.
For example here :
TABLE_A
Room_Id Status Inspection_Date
-----------------------------------
1 NULL 5/15/2015
2 occupied 5/21/2015
2 NULL 1/19/2016
1 occupied 12/16/2015
4 NULL 3/25/2016
3 vacant 8/27/2015
1 vacant 4/17/2016
3 vacant 12/12/2015
3 vacant 3/22/2016
4 vacant 2/2/2015
4 vacant 3/24/2015
My result should look like this:
Room_Id Status Inspection_Date
-----------------------------------
3 vacant 8/27/2015
3 vacant 12/12/2015
3 vacant 3/22/2016
Because Room_ID '3' has no instances where the Status is NULL
Quick example of how to do it:
DECLARE @tTable TABLE(
Room_Id INT,
Status VARCHAR(20),
Inspection_Date DATETIME)
INSERT INTO @tTable VALUES
(1, NULL, '5/15/2015'),
(1,NULL, '5/15/2015'),
(2,'occupied', '5/21/2015'),
(2,NULL, '1/19/2016'),
(1,'occupied', '12/16/2015'),
(4,NULL, '3/25/2016'),
(3,'vacant', '8/27/2015'),
(1,'vacant', '4/17/2016'),
(3,'vacant', '12/12/2015'),
(3,'vacant', '3/22/2016'),
(4,'vacant', '2/2/2015'),
(4,'vacant', '3/24/2015')
SELECT * FROM @tTable T1
WHERE Room_Id NOT IN (SELECT Room_ID FROM @tTable WHERE Status IS NULL)
Gives :
Room_Id | Status | Inspection_Date |
-------------------------------------------------
3 | vacant | 2015-08-27 00:00:00.000
3 | vacant | 2015-12-12 00:00:00.000
3 | vacant | 2016-03-22 00:00:00.000
Try this out:
SELECT *
FROM Table1
WHERE Room_ID NOT IN
(
SELECT DISTINCT Room_ID
FROM Table1
WHERE Status IS NULL
)
The subquery returns a list of unique room IDs that, at one time or another, had a NULL status. The outer query looks at that list and says "return * where the Room_ID is not one of those in the subquery."
If you want to try it in SQL Fiddle, here is the Schema:
CREATE TABLE Table1
(Room_ID int, Status varchar(8), Inspection_Date datetime)
;
INSERT INTO Table1
(Room_ID, Status, Inspection_Date)
VALUES
(1, NULL, '2015-05-15 00:00:00'),
(2, 'occupied', '2015-05-21 00:00:00'),
(2, NULL, '2016-01-19 00:00:00'),
(1, 'occupied', '2015-12-16 00:00:00'),
(4, NULL, '2016-03-25 00:00:00'),
(4, 'vacant', '2015-08-27 00:00:00'),
(1, 'vacant', '2016-04-17 00:00:00'),
(3, 'vacant', '2015-12-12 00:00:00'),
(3, 'vacant', '2016-03-22 00:00:00'),
(4, 'vacant', '2015-02-02 00:00:00'),
(4, 'vacant', '2015-03-24 00:00:00'),
(2, NULL, '2015-05-22 00:00:00')
;
As an alternative to Hashman's answer: I prefer to use not exists over not in for these types of queries.
Creating some test data
Note that I just kept the same date for everything since it's not imperative to the question.
create table #table_a (
Room_Id int,
Status varchar(32),
Inspection_Date date);
insert #table_a (Room_Id, Status, Inspection_Date)
values
(1, null, getdate()),
(2, 'occupied', getdate()),
(2, null, getdate()),
(1, 'occupied', getdate()),
(4, null, getdate()),
(3, 'vacant', getdate()),
(1, 'vacant', getdate()),
(3, 'vacant', getdate()),
(3, 'vacant', getdate()),
(4, 'vacant', getdate()),
(4, 'vacant', getdate());
The query
select *
from #table_a t1
where not exists (
select *
from #table_a t2
where t1.Room_Id = t2.Room_Id
and Status is null);
The results
Room_Id Status Inspection_Date
----------- -------------------------------- ---------------
3 vacant 2016-06-17
3 vacant 2016-06-17
3 vacant 2016-06-17
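A common reason for that preference is how NOT IN behaves once a NULL shows up in the subquery's result. The sketch below, assuming Python's sqlite3 module, shows both forms agreeing on the question's data and then NOT IN returning nothing when a NULL is added to the list on purpose; the query constant names and the helper run are illustrative.

```python
import sqlite3

# Rows from the question: (Room_Id, Status, Inspection_Date).
ROWS = [
    (1, None, "2015-05-15"), (2, "occupied", "2015-05-21"),
    (2, None, "2016-01-19"), (1, "occupied", "2015-12-16"),
    (4, None, "2016-03-25"), (3, "vacant", "2015-08-27"),
    (1, "vacant", "2016-04-17"), (3, "vacant", "2015-12-12"),
    (3, "vacant", "2016-03-22"), (4, "vacant", "2015-02-02"),
    (4, "vacant", "2015-03-24"),
]

NOT_EXISTS = """
SELECT Room_Id, Status, Inspection_Date FROM table_a AS t1
WHERE NOT EXISTS (
    SELECT 1 FROM table_a AS t2
    WHERE t2.Room_Id = t1.Room_Id AND t2.Status IS NULL
)
ORDER BY Inspection_Date
"""

NOT_IN = """
SELECT Room_Id, Status, Inspection_Date FROM table_a
WHERE Room_Id NOT IN (SELECT Room_Id FROM table_a WHERE Status IS NULL)
ORDER BY Inspection_Date
"""

# Same as NOT_IN, but the list deliberately contains a NULL: "x NOT IN (..., NULL)"
# is never true, so no row can pass the filter.
NOT_IN_WITH_NULL = """
SELECT Room_Id, Status, Inspection_Date FROM table_a
WHERE Room_Id NOT IN (
    SELECT Room_Id FROM table_a WHERE Status IS NULL
    UNION ALL
    SELECT NULL
)
ORDER BY Inspection_Date
"""


def run(query, rows=ROWS):
    """Load the rows into a scratch table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (Room_Id INTEGER, Status TEXT, Inspection_Date TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(NOT_EXISTS))
    print(run(NOT_IN))
    print(run(NOT_IN_WITH_NULL))
```

In this question Room_Id is never NULL, so both forms return the same three rows; the difference only matters when the compared column can contain NULLs.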
You can also use a CTE with NOT EXISTS, as in the code below:
WITH bt
AS ( SELECT Room_Id ,
Status,
Inspection_Date
FROM dbo.Table_1
)
SELECT *
FROM bt AS a
WHERE NOT EXISTS ( SELECT 1
FROM bt
WHERE bt.Room_Id = a.Room_Id
AND bt.Status IS NULL );