Count days using the DENSE_RANK function - SQL

I have a table that contains data for all the actions performed on a particular object. It looks something like this:
ActionId  ProductName  ProductPart  ActionDate                ActionStatusId
1         Bike         abc123       3/24/2013 12:00:00 -4:00  7
2         Bike         abc123       3/25/2013 12:00:00 -4:00  3
3         Bike         abc123       3/25/2013 15:00:00 -4:00  1
4         Bike         abc123       3/26/2013 16:00:00 -4:00  3
5         Bike         abc123       3/26/2013 16:00:00 -4:00  3
6         Bike         abc123       4/26/2013 15:00:00 -4:00  3
7         Bicycle      def432       4/27/2013 12:00:00 -4:00  1
8         Bicycle      def432       4/26/2013 12:00:00 -4:00  4
9         Bicycle      def432       4/27/2013 12:00:00 -4:00  3
10        Bicycle      def432       4/28/2013 12:00:00 -4:00  1
Now I need to get ProductName, ProductPart, the last ActionStatusId (only if it is 3 or 1), and [No. of days since StatusId = 3].
In other words, if the last status (based on the latest ActionDate) is not 3 or 1, I don't need that row. I can already get this part with the ROW_NUMBER function.
After that I need to count the number of days when the last status is 3. I don't need to count days if the last ActionStatusId is 1.
The problem is that when the last status is 3, I need to count the days not from that last action, but from the action at which the product entered that status, up to today.
So I should get the following result:
ProductName  ProductPart  ActionStatusId  [No. of Days Since StatusId = 3]
Bike         abc123       3               34 (i.e. GETDATE() - 3/26/2013: it has been in status 3 since 3/26/2013, so the last ActionDate alone is not enough)
Bicycle      def432       1               -
I tried using the ROW_NUMBER and DENSE_RANK functions but was not able to achieve it. Is there a way to do this?
I am working with SQL Server 2012.

Perhaps this will be helpful:
DECLARE @temp TABLE
(
ActionId INT
, ProductName VARCHAR(50)
, ProductPart VARCHAR(50)
, ActionDate DATETIME
, ActionStatusId TINYINT
)
INSERT INTO @temp (ActionId, ProductName, ProductPart, ActionDate, ActionStatusId)
VALUES
(1, 'Bike', 'abc123', '20130324 12:00:00', 7),
(2, 'Bike', 'abc123', '20130325 12:00:00', 3),
(3, 'Bike', 'abc123', '20130325 15:00:00', 1),
(4, 'Bike', 'abc123', '20130326 16:00:00', 3),
(5, 'Bike', 'abc123', '20130326 16:00:00', 3),
(6, 'Bike', 'abc123', '20130426 15:00:00', 3),
(7, 'Bicycle', 'def432', '20130427 12:00:00', 1),
(8, 'Bicycle', 'def432', '20130426 12:00:00', 4),
(9, 'Bicycle', 'def432', '20130427 12:00:00', 3),
(10, 'Bicycle', 'def432', '20130428 12:00:00', 1)
DECLARE @Date DATE = GETDATE()
SELECT
ProductName
, ProductPart
, ActionStatusId
-- days since the earliest action with this status; 0 for status 1
, [Count] = CASE WHEN ActionStatusId = 3
THEN MAX(DATEDIFF(DAY, ActionDate, @Date))
ELSE 0
END
FROM @temp
WHERE ActionStatusId IN (1, 3)
GROUP BY
ProductName
, ProductPart
, ActionStatusId
Output:
ProductName ProductPart ActionStatusId Count
------------- ------------ -------------- -----------
Bicycle def432 1 0
Bicycle def432 3 2
Bike abc123 1 0
Bike abc123 3 35
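The query above returns a row for every product and status. For status 3 it counts the days since the earliest action with that status, which gives 35 for Bike. It also does not check which status is the latest. The question asks for something different: the days since the product most recently entered status 3, and only when 3 is still the last status. Below is a minimal sketch of one way to do that. LAG finds the rows where the status changed into 3, and the latest such change marks the start of the current run. It runs on Python's built-in sqlite3 module so it can be tried without SQL Server. Assumptions: SQLite 3.25 or later for window functions; the table name actions and the function days_in_current_status are hypothetical; the -4:00 offsets are dropped; and a fixed "today" of 2013-04-29 reproduces the 34 days from the question. LAG is available in SQL Server 2012, so the same CTEs should carry over to T-SQL with DATEDIFF(DAY, ..., GETDATE()) in place of the julianday arithmetic.

```python
import sqlite3

# Sample rows from the question (ISO text dates, UTC offset dropped).
ROWS = [
    (1, "Bike", "abc123", "2013-03-24 12:00:00", 7),
    (2, "Bike", "abc123", "2013-03-25 12:00:00", 3),
    (3, "Bike", "abc123", "2013-03-25 15:00:00", 1),
    (4, "Bike", "abc123", "2013-03-26 16:00:00", 3),
    (5, "Bike", "abc123", "2013-03-26 16:00:00", 3),
    (6, "Bike", "abc123", "2013-04-26 15:00:00", 3),
    (7, "Bicycle", "def432", "2013-04-27 12:00:00", 1),
    (8, "Bicycle", "def432", "2013-04-26 12:00:00", 4),
    (9, "Bicycle", "def432", "2013-04-27 12:00:00", 3),
    (10, "Bicycle", "def432", "2013-04-28 12:00:00", 1),
]

QUERY = """
WITH ordered AS (
    SELECT *,
           -- status of the previous action for the same product
           LAG(ActionStatusId) OVER (
               PARTITION BY ProductName, ProductPart
               ORDER BY ActionDate, ActionId) AS prev_status,
           -- 1 = most recent action for the product
           ROW_NUMBER() OVER (
               PARTITION BY ProductName, ProductPart
               ORDER BY ActionDate DESC, ActionId DESC) AS rn_desc
    FROM actions
),
last_action AS (
    SELECT ProductName, ProductPart, ActionStatusId
    FROM ordered
    WHERE rn_desc = 1 AND ActionStatusId IN (1, 3)
),
entered_3 AS (
    -- rows where the status changed into 3; the latest one starts the current run
    SELECT ProductName, ProductPart, MAX(ActionDate) AS since_date
    FROM ordered
    WHERE ActionStatusId = 3 AND (prev_status IS NULL OR prev_status <> 3)
    GROUP BY ProductName, ProductPart
)
SELECT l.ProductName, l.ProductPart, l.ActionStatusId,
       CASE WHEN l.ActionStatusId = 3
            -- whole calendar days, like DATEDIFF(DAY, since_date, today)
            THEN CAST(julianday(:today) - julianday(date(e.since_date)) AS INTEGER)
       END AS days_in_status_3
FROM last_action AS l
LEFT JOIN entered_3 AS e
       ON e.ProductName = l.ProductName AND e.ProductPart = l.ProductPart
ORDER BY l.ProductName
"""


def days_in_current_status(today):
    """Return (ProductName, ProductPart, last status, days in status 3 or None)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE actions (ActionId INTEGER, ProductName TEXT, "
        "ProductPart TEXT, ActionDate TEXT, ActionStatusId INTEGER)"
    )
    con.executemany("INSERT INTO actions VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    # A fixed "today" keeps the result reproducible; use date('now') in practice.
    for row in days_in_current_status("2013-04-29"):
        print(row)
```

With the fixed date, Bike is reported as status 3 for 34 days, counted from 3/26/2013. Bicycle, whose last status is 1, gets no day count.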

Related

Function that returns MAX OR MIN dates based on ID count

I have a task in SQL Server where I need to return a RESULT_DATE column computed from the ID, PRODUCT_ID and DATE columns. The criteria are:
If the DATE column is filled only once for a PRODUCT_ID, I need to return that single date (as for PRODUCT_ID 1 and 3). Let's say it's the MIN date.
If the DATE column is filled more than once (as for PRODUCT_ID 2), I need to return the next filled DATE.
Data:
CREATE TABLE #temp (
ID INT,
PRODUCT_ID INT,
[DATE] DATETIME
)
INSERT #temp (ID, PRODUCT_ID, DATE) VALUES
(1, 1, '2008-04-24 00:00:00.000'),
(2, 1, NULL),
(3, 2, '2015-12-09 00:00:00.000'),
(4, 2, NULL),
(5, 2, NULL),
(6, 2, '2022-01-01 13:06:45.253'),
(7, 2, NULL),
(8, 2, '2022-01-19 13:06:45.253'),
(9, 3, '2018-04-25 00:00:00.000'),
(10,3, NULL),
(11,3, NULL)
Expected result:
ID  PRODUCT_ID  DATE                     RESULT_DATE
1   1           2008-04-24 00:00:00.000  2008-04-24 00:00:00.000
2   1           NULL                     2008-04-24 00:00:00.000
3   2           2015-12-09 00:00:00.000  2022-01-01 13:06:45.253
4   2           NULL                     2022-01-01 13:06:45.253
5   2           NULL                     2022-01-01 13:06:45.253
6   2           2022-01-01 13:06:45.253  2022-01-19 13:06:45.253
7   2           NULL                     2022-01-19 13:06:45.253
8   2           2022-01-19 13:06:45.253  2022-01-19 13:06:45.253
9   3           2018-04-25 00:00:00.000  2018-04-25 00:00:00.000
10  3           NULL                     2018-04-25 00:00:00.000
11  3           NULL                     2018-04-25 00:00:00.000
I have tried different techniques, for example combinations of the LEAD and LAG functions. My latest script (still not working):
SELECT
COALESCE(DATE,
CAST(
SUBSTRING(
MAX(CAST(DATE AS BINARY(4)) + CAST(DATE AS BINARY(4))) OVER ( PARTITION BY PRODUCT_ID ORDER BY DATE ROWS UNBOUNDED PRECEDING)
,5,4)
AS INT)
) AS RESULT_DATE,
*
FROM TABLE
You can use a CTE: select all rows with a non-NULL date and give each one a ROW_NUMBER. Then use a second CTE to fetch, per PRODUCT_ID, the row from the first CTE with the largest row number that is less than 3 (i.e. the second date if there is one, otherwise the first). Finally, join this CTE to the original table to supply that date to each row:
Set Up
CREATE TABLE #temp (
ID INT,
PRODUCT_ID INT,
MyDATE DATETIME
)
INSERT #temp (ID, PRODUCT_ID, MyDate)
VALUES
(1, 1, '2008-04-24 00:00:00.000'),
(2, 1, NULL),
(3, 2, '2015-12-09 00:00:00.000'),
(4, 2, NULL),
(5, 2, NULL),
(6, 2, '2022-01-01 13:06:45.253'),
(7, 2, NULL),
(8, 2, '2022-01-19 13:06:45.253'),
(9, 3, '2018-04-25 00:00:00.000'),
(10,3, NULL),
(11,3, NULL);
Query:
;WITH CTE
AS
(
SELECT ID, Product_ID, MyDate,
ROW_NUMBER() OVER (PARTITION BY Product_ID ORDER BY Id) AS rn
from #temp
WHERE MyDate IS NOT NULL
),
CTE2
AS
(
SELECT *
FROM CTE C1
WHERE C1.rn < 3
AND
C1.rn =
(SELECT MAX(rn) FROM CTE WHERE Product_Id = C1.Product_Id AND rn<3)
)
SELECT T.Id, T.Product_Id, T.MyDate, C.MyDate As Result_date
FROM #temp T
INNER JOIN CTE2 C
ON T.Product_Id = C.Product_Id
ORDER BY T.Id;
Results:
Id Product_Id MyDate Result_Date
1 1 2008-04-24 00:00:00.000 2008-04-24 00:00:00.000
2 1 NULL 2008-04-24 00:00:00.000
3 2 2015-12-09 00:00:00.000 2022-01-01 13:06:45.253
4 2 NULL 2022-01-01 13:06:45.253
5 2 NULL 2022-01-01 13:06:45.253
6 2 2022-01-01 13:06:45.253 2022-01-01 13:06:45.253
7 2 NULL 2022-01-01 13:06:45.253
8 2 2022-01-19 13:06:45.253 2022-01-01 13:06:45.253
9 3 2018-04-25 00:00:00.000 2018-04-25 00:00:00.000
10 3 NULL 2018-04-25 00:00:00.000
11 3 NULL 2018-04-25 00:00:00.000
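The result above differs from the expected output in the question for rows 6 to 8. It returns 2022-01-01 where 2022-01-19 is expected. Below is a sketch that reproduces the expected RESULT_DATE column. A running COUNT of non-NULL dates puts every NULL row in the same group as the last dated row before it. LEAD over the dated rows then supplies the next date, and the last group falls back to its own date. It uses Python's sqlite3 (SQLite 3.25 or later for window functions). The table name products and the function result_dates are hypothetical, and the column is named MyDate as in the setup above. A running COUNT(...) OVER (... ORDER BY ...) and LEAD both exist in SQL Server 2012 and later, so the query should translate with little change.

```python
import sqlite3

ROWS = [
    (1, 1, "2008-04-24 00:00:00.000"),
    (2, 1, None),
    (3, 2, "2015-12-09 00:00:00.000"),
    (4, 2, None),
    (5, 2, None),
    (6, 2, "2022-01-01 13:06:45.253"),
    (7, 2, None),
    (8, 2, "2022-01-19 13:06:45.253"),
    (9, 3, "2018-04-25 00:00:00.000"),
    (10, 3, None),
    (11, 3, None),
]

QUERY = """
WITH grouped AS (
    -- COUNT ignores NULLs, so grp increases only on rows that carry a date:
    -- every NULL row lands in the group of the last dated row before it
    SELECT ID, PRODUCT_ID, MyDate,
           COUNT(MyDate) OVER (PARTITION BY PRODUCT_ID ORDER BY ID) AS grp
    FROM products
),
anchors AS (
    -- one row per dated row, plus the next non-NULL date of the same product
    SELECT PRODUCT_ID, grp, MyDate AS anchor_date,
           LEAD(MyDate) OVER (PARTITION BY PRODUCT_ID ORDER BY grp) AS next_date
    FROM grouped
    WHERE MyDate IS NOT NULL
)
SELECT g.ID, g.PRODUCT_ID, g.MyDate,
       -- the next date if there is one, otherwise the group's own date
       COALESCE(a.next_date, a.anchor_date) AS RESULT_DATE
FROM grouped AS g
LEFT JOIN anchors AS a  -- LEFT JOIN keeps leading NULL rows (grp = 0)
       ON a.PRODUCT_ID = g.PRODUCT_ID AND a.grp = g.grp
ORDER BY g.ID
"""


def result_dates():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (ID INTEGER, PRODUCT_ID INTEGER, MyDate TEXT)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in result_dates():
        print(row)
```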

Histogram of orders by range of dates

I'm trying to create a histogram of the total number of orders over date intervals, but I'm having a hard time binning it in SQL.
A simplified table is shown below:
customer_id  Date        count_orders
1            01-01-2020  5
1            01-13-2020  26
1            02-06-2020  11
2            01-17-2020  9
3            02-04-2020  13
3            03-29-2020  24
4            04-05-2020  1
5            02-23-2020  10
6            03-15-2020  7
6            04-18-2020  32
...          ...         ...
I'm thinking of binning it into 20-day intervals, but the only approach I can think of is a
SUM(CASE WHEN Date BETWEEN <interval1_startdate> AND <interval1_enddate> ...)
per interval, which is exhausting to write for the actual data (millions of rows). So I need help automating the binning.
The desired output would be either:
1)
interval                 total_count
01-01-2020 - 01-20-2020  31
01-21-2020 - 02-10-2020  24
02-10-2020 - 03-01-2020  10
...                      ...
or 2)
start       end         total_count
01-01-2020  01-20-2020  31
01-21-2020  02-10-2020  24
02-10-2020  03-01-2020  10
...         ...         ...
Do you have any ideas?
You can group by (date - minimum date) / 20. For Presto, something like this:
WITH dataset (customer_id, Date, count_orders) AS (
VALUES (1, date_parse('01-01-2020', '%m-%d-%Y'), 5),
(1, date_parse('01-13-2020', '%m-%d-%Y'), 26),
(1, date_parse('02-06-2020', '%m-%d-%Y'), 11),
(2, date_parse('01-17-2020', '%m-%d-%Y'), 9),
(3, date_parse('02-04-2020', '%m-%d-%Y'), 13),
(3, date_parse('03-29-2020', '%m-%d-%Y'), 24),
(4, date_parse('04-05-2020', '%m-%d-%Y'), 1),
(5, date_parse('02-23-2020', '%m-%d-%Y'), 10),
(6, date_parse('03-15-2020', '%m-%d-%Y'), 7),
(6, date_parse('04-18-2020', '%m-%d-%Y'), 32)
)
SELECT date_add('day', 20 * grp, min(min_date)) interval_start,
date_add('day', 20 * (grp + 1) - 1, min(min_date)) interval_end,
sum(count_orders) total_count
FROM (
SELECT *,
-- bigint division: index of the 20-day bin counted from the earliest date
date_diff('day', min(date) over (), date) / 20 as grp,
min(date) over () min_date
FROM dataset
)
group by grp
order by 1
Output:
interval_start           interval_end             total_count
2020-01-01 00:00:00.000  2020-01-20 00:00:00.000  40
2020-01-21 00:00:00.000  2020-02-09 00:00:00.000  24
2020-02-10 00:00:00.000  2020-02-29 00:00:00.000  10
2020-03-01 00:00:00.000  2020-03-20 00:00:00.000  7
2020-03-21 00:00:00.000  2020-04-09 00:00:00.000  25
2020-04-10 00:00:00.000  2020-04-29 00:00:00.000  32
You can generate the intervals with a recursive CTE and then get the totals with CROSS APPLY.
Drop Table If Exists Tbl -- requires SQL Server 2016 or later
Create Table Tbl (customer_id Int, [date] Date, count_orders Int)
Insert Into Tbl (customer_id, [date], count_orders)
Values (1,'2020-01-01', 5),
(1,'2020-01-13',26),
(1,'2020-02-06',11),
(2,'2020-01-17',9),
(3,'2020-02-04',13),
(3,'2020-03-29',24),
(4,'2020-04-05',1),
(5,'2020-02-23',10),
(6,'2020-03-15',7),
(6,'2020-04-18',32)
;With A As (
Select Min([date]) As start, DateAdd(dd,19,Min([date])) As [end], Max([date]) As [max]
From Tbl
Union All
Select DateAdd(dd,1,[end]) As start, DateAdd(dd,20,[end]) As [end], [max]
From A
Where [end]<[max])
Select A.[start], A.[end], T.total_count
From A Cross Apply (Select SUM(count_orders) As total_count
From Tbl Where [date] between A.[start] And A.[end]) As T
Option (MaxRecursion 0) -- lift the default limit of 100 recursion levels for long date ranges
Result:
start end total_count
---------- ---------- -----------
2020-01-01 2020-01-20 40
2020-01-21 2020-02-09 24
2020-02-10 2020-02-29 10
2020-03-01 2020-03-20 7
2020-03-21 2020-04-09 25
2020-04-10 2020-04-29 32
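For comparison, the bin arithmetic from the Presto answer can be run in a single pass, with no recursion and no per-interval lookup. The sketch below uses Python's sqlite3. The table orders, the parameter name bin and the function histogram are hypothetical. It computes the bin index as the whole number of days since the earliest date, integer-divided by the bin width, and derives each bin's start and end dates from that index.

```python
import sqlite3

# Sample rows from the question, with ISO dates.
ROWS = [
    (1, "2020-01-01", 5), (1, "2020-01-13", 26), (1, "2020-02-06", 11),
    (2, "2020-01-17", 9), (3, "2020-02-04", 13), (3, "2020-03-29", 24),
    (4, "2020-04-05", 1), (5, "2020-02-23", 10), (6, "2020-03-15", 7),
    (6, "2020-04-18", 32),
]

QUERY = """
WITH bounds AS (
    SELECT MIN(order_date) AS min_date FROM orders
),
binned AS (
    -- days since the first order, integer-divided by the bin width
    SELECT CAST(julianday(o.order_date) - julianday(b.min_date) AS INTEGER) / :bin AS grp,
           b.min_date,
           o.count_orders
    FROM orders AS o CROSS JOIN bounds AS b
)
SELECT date(min_date, '+' || (grp * :bin) || ' days')            AS start_date,
       date(min_date, '+' || (grp * :bin + :bin - 1) || ' days') AS end_date,
       SUM(count_orders)                                          AS total_count
FROM binned
GROUP BY grp, min_date
ORDER BY grp
"""


def histogram(bin_days=20):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, count_orders INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY, {"bin": bin_days}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in histogram():
        print(row)
```

One design difference is worth noting. This approach only groups the rows that exist, so an interval with no orders produces no row at all. The recursive CTE, in contrast, emits every interval, with a NULL total when the interval is empty.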

SQL Server Query for average value over a date period

DECLARE @SampleOrderTable TABLE
(
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '12/10/2019', '762.84'),
(2, '11/10/2019', '886.32'),
(3, '11/9/2019', '10245.00')
How do I select the 4 days up to and including OrderDate, with the Amount averaged over that period?
So the result would be:
pkPersonID Date Amount
------------------------------------
1 '12/7/2019' 190.71
1 '12/8/2019' 190.71
1 '12/9/2019' 190.71
1 '12/10/2019' 190.71
2 '12/7/2019' 221.58
2 '12/8/2019' 221.58
2 '12/9/2019' 221.58
2 '12/10/2019' 221.58
3 '11/6/2019' 2561.25
3 '11/7/2019' 2561.25
3 '11/8/2019' 2561.25
3 '11/9/2019' 2561.25
You may try the following approach, using DATEADD(), a windowed COUNT() and a VALUES table value constructor:
Table:
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '20191210', '762.84'),
(2, '20191210', '886.32'),
(3, '20191109', '10245.00')
Statement:
SELECT
t.pkPersonID,
DATEADD(day, -v.Day, t.OrderDate) AS [Date],
-- each order is repeated once per day offset, so COUNT gives the number of days
CONVERT(numeric(18, 6), Amount / COUNT(Amount) OVER (PARTITION BY t.pkPersonID)) AS Amount
FROM @SampleOrderTable t
CROSS APPLY (VALUES (0), (1), (2), (3)) v(Day)
ORDER BY t.pkPersonID, [Date]
Result:
pkPersonID Date Amount
1 07/12/2019 00:00:00 190.710000
1 08/12/2019 00:00:00 190.710000
1 09/12/2019 00:00:00 190.710000
1 10/12/2019 00:00:00 190.710000
2 07/12/2019 00:00:00 221.580000
2 08/12/2019 00:00:00 221.580000
2 09/12/2019 00:00:00 221.580000
2 10/12/2019 00:00:00 221.580000
3 06/11/2019 00:00:00 2561.250000
3 07/11/2019 00:00:00 2561.250000
3 08/11/2019 00:00:00 2561.250000
3 09/11/2019 00:00:00 2561.250000
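The VALUES list above hard-codes four days. Below is a minimal sketch that turns the number of days into a parameter instead, generating the offsets 0 to n-1 with a recursive CTE. It uses Python's sqlite3. SQLite has no CROSS APPLY, so a plain CROSS JOIN is used. The names orders, n_days and spread_amounts are hypothetical. The sketch uses the answer's corrected dates and divides by n_days directly rather than with a windowed COUNT. That gives the same value here, because each order produces exactly n_days rows.

```python
import sqlite3

# The answer's corrected sample data, with ISO dates.
ROWS = [(1, "2019-12-10", 762.84), (2, "2019-12-10", 886.32), (3, "2019-11-09", 10245.00)]

QUERY = """
WITH RECURSIVE offsets(n) AS (
    -- 0, 1, ..., :n_days - 1 (replaces the hard-coded VALUES (0), (1), (2), (3))
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM offsets WHERE n + 1 < :n_days
)
SELECT o.pkPersonID,
       date(o.OrderDate, '-' || f.n || ' days') AS order_day,
       ROUND(o.Amount / :n_days, 2)             AS amount
FROM orders AS o CROSS JOIN offsets AS f
ORDER BY o.pkPersonID, order_day
"""


def spread_amounts(n_days=4):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (pkPersonID INTEGER, OrderDate TEXT, Amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY, {"n_days": n_days}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in spread_amounts():
        print(row)
```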
You can use SQL functions like AVG, DATEADD and GETDATE:
SELECT AVG(Amount) as AverageAmount
FROM @SampleOrderTable
WHERE OrderDate >= DATEADD(DAY, -4, GETDATE())
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
);
INSERT INTO @SampleOrderTable
(pkPersonID, OrderDate, Amount)
VALUES
(1, '12/20/2019', 762.84),
(2, '12/20/2019', 886.32),
(3, '12/20/2019', 10245.00),
(4, '12/19/2019', 50.00),
(5, '12/19/2019', 100.00),
(6, '09/01/2019', 200.00),
(7, '09/01/2019', 300.00),
(8, '12/15/2019', 400.00),
(9, '12/15/2019', 500.00),
(10, '09/02/2019', 150.00),
(11, '09/02/2019', 1100.00),
(12, '09/02/2019', 1200.00),
(13, '09/02/2019', 1300.00),
(14, '09/02/2019', 1400.00),
(15, '09/02/2019', 1500.00);
SELECT OrderDate, AVG(Amount) AS Average_Value
FROM @SampleOrderTable
WHERE DATEDIFF(DAY, OrderDate, GETDATE()) <= 4
GROUP BY OrderDate;

SQL Server - Selecting periods without changes in data

I am trying to select periods of time during which one column stayed constant, and to check whether a second column changed during each such period.
Table:
create table #stable_periods
(
[Date] date,
[Car_Reg] nvarchar(10),
[Internal_Damages] int,
[External_Damages] int
)
insert into #stable_periods
values ('2015-08-19', 'ABC123', 10, 10),
('2015-08-18', 'ABC123', 9, 10),
('2015-08-17', 'ABC123', 8, 9),
('2015-08-16', 'ABC123', 9, 9),
('2015-08-15', 'ABC123', 10, 10),
('2015-08-14', 'ABC123', 10, 10),
('2015-08-19', 'ABC456', 5, 3),
('2015-08-18', 'ABC456', 5, 4),
('2015-08-17', 'ABC456', 8, 4),
('2015-08-16', 'ABC456', 9, 4),
('2015-08-15', 'ABC456', 10, 10),
('2015-01-01', 'ABC123', 1, 1),
('2015-01-01', 'ABC456', NULL, NULL);
--select * from #stable_periods
What I would like to get is:
Car_Reg FromDate ToDate External_Damages Have internal damages changed in this period?
ABC123 2015-08-18 2015-08-19 10 Yes
ABC123 2015-08-16 2015-08-17 9 Yes
ABC123 2015-08-14 2015-08-15 10 No
ABC123 2015-01-01 2015-01-01 1 No
ABC456 2015-08-19 2015-08-19 3 No
ABC456 2015-08-16 2015-08-18 4 Yes
ABC456 2015-08-15 2015-08-15 10 No
ABC456 2015-01-01 2015-01-01 NULL NULL
Basically, I want to build periods in which [External_Damages] stayed constant and check whether [Internal_Damages] changed within the same period (no matter how many times).
I have spent a lot of time trying, but I am afraid my abstract thinking isn't up to it...
Any suggestions would be great.
Thanks,
Bartosz
I believe this is a form of the islands problem (often called gaps and islands).
Here is a solution using ROW_NUMBER and GROUP BY:
WITH CTE AS(
SELECT *,
-- consecutive dates with the same Car_Reg and External_Damages get the same RN value
RN = DATEADD(DAY, - ROW_NUMBER() OVER(PARTITION BY Car_reg, External_Damages ORDER BY [Date]), [Date])
FROM #stable_periods
)
SELECT
Car_Reg,
FromDate = MIN([Date]),
ToDate = MAX([Date]) ,
External_Damages,
Change =
CASE
WHEN MAX(External_Damages) IS NULL THEN NULL
WHEN COUNT(DISTINCT Internal_Damages) > 1 THEN 'Yes'
ELSE 'No'
END
FROM CTE c
GROUP BY Car_Reg, External_Damages, RN
ORDER BY Car_Reg, ToDate DESC
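Here is why DATEADD(DAY, -ROW_NUMBER(), [Date]) identifies the periods. Within one Car_Reg and External_Damages partition, the date and the row number both grow by one per consecutive day. Their difference therefore stays constant across a run and jumps when a day is skipped. Below is a sketch of the same query in Python's sqlite3, where julianday minus ROW_NUMBER plays the role of DATEADD, checked against the expected output from the question. The function find_stable_periods and the Changed alias are hypothetical. Like the original, it assumes at most one row per car per day.

```python
import sqlite3

# Sample data from the question: (Date, Car_Reg, Internal_Damages, External_Damages).
ROWS = [
    ("2015-08-19", "ABC123", 10, 10),
    ("2015-08-18", "ABC123", 9, 10),
    ("2015-08-17", "ABC123", 8, 9),
    ("2015-08-16", "ABC123", 9, 9),
    ("2015-08-15", "ABC123", 10, 10),
    ("2015-08-14", "ABC123", 10, 10),
    ("2015-08-19", "ABC456", 5, 3),
    ("2015-08-18", "ABC456", 5, 4),
    ("2015-08-17", "ABC456", 8, 4),
    ("2015-08-16", "ABC456", 9, 4),
    ("2015-08-15", "ABC456", 10, 10),
    ("2015-01-01", "ABC123", 1, 1),
    ("2015-01-01", "ABC456", None, None),
]

QUERY = """
WITH cte AS (
    SELECT *,
           -- constant across a run of consecutive days with the same
           -- Car_Reg and External_Damages; changes when a day is skipped
           julianday("Date") - ROW_NUMBER() OVER (
               PARTITION BY Car_Reg, External_Damages
               ORDER BY "Date") AS island
    FROM stable_periods
)
SELECT Car_Reg,
       MIN("Date") AS FromDate,
       MAX("Date") AS ToDate,
       External_Damages,
       CASE
           WHEN MAX(External_Damages) IS NULL THEN NULL
           WHEN COUNT(DISTINCT Internal_Damages) > 1 THEN 'Yes'
           ELSE 'No'
       END AS Changed
FROM cte
GROUP BY Car_Reg, External_Damages, island
ORDER BY Car_Reg, ToDate DESC
"""


def find_stable_periods():
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE stable_periods ("Date" TEXT, Car_Reg TEXT, '
        "Internal_Damages INTEGER, External_Damages INTEGER)"
    )
    con.executemany("INSERT INTO stable_periods VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in find_stable_periods():
        print(row)
```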

SQL - Setting Value From Hierarchical Children

I am writing an application which gets task data from a project planning MS SQL table (let's call the table tasks). For simplicity the table fields can be thought of as follows:
task_id, parent_id, name, start_date, end_date
All parent tasks have NULL start and end dates. Only the leaf tasks (children with no children of their own) have a start and end date.
I want to get the task data and, in the process, set the start date of each parent to the earliest start date among all its children and recursive grandchildren, and set its end date to the latest end date among them. Is this possible?
I assume from your question that you use SQL Server. I think this is what you want. It is done with a recursive common table expression that starts with the leaf children and walks up to the topmost parents:
DECLARE @t TABLE(id INT, pid INT, sd DATE, ed DATE)
INSERT INTO @t VALUES
(1, NULL, NULL, NULL),
(2, 1, NULL, NULL),
(3, 2, '20150201', '20150215'),
(4, 2, '20150101', '20150201'),
(5, 1, NULL, NULL),
(6, 5, '20150301', '20150401'),
(7, 1, NULL, NULL),
(8, 7, NULL, NULL),
(9, 8, '20140101', '20141230'),
(10, 8, '20140102', '20141231')
;WITH cte AS(
-- anchor: leaf tasks, the only rows that have dates
SELECT * FROM @t WHERE sd IS NOT NULL
UNION ALL
-- step: copy each row's dates up to its parent
SELECT t.id, t.pid, c.sd, c.ed FROM @t t
JOIN cte c ON c.pid = t.id
)
SELECT id, pid, MIN(sd) AS sd, MAX(ed) AS ed
FROM cte
GROUP BY id, pid
ORDER BY id
Output:
id pid sd ed
1 NULL 2014-01-01 2015-04-01
2 1 2015-01-01 2015-02-15
3 2 2015-02-01 2015-02-15
4 2 2015-01-01 2015-02-01
5 1 2015-03-01 2015-04-01
6 5 2015-03-01 2015-04-01
7 1 2014-01-01 2014-12-31
8 7 2014-01-01 2014-12-31
9 8 2014-01-01 2014-12-30
10 8 2014-01-02 2014-12-31
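Below is a sketch of the same recursive CTE in Python's sqlite3, checked against the output above. SQLite requires WITH RECURSIVE where T-SQL uses plain WITH. The table name tasks follows the question, while rollup_dates is a hypothetical helper name. ISO date strings compare correctly as text, so MIN and MAX work on them directly.

```python
import sqlite3

# (id, pid, sd, ed) rows from the answer; only leaf tasks carry dates.
ROWS = [
    (1, None, None, None),
    (2, 1, None, None),
    (3, 2, "2015-02-01", "2015-02-15"),
    (4, 2, "2015-01-01", "2015-02-01"),
    (5, 1, None, None),
    (6, 5, "2015-03-01", "2015-04-01"),
    (7, 1, None, None),
    (8, 7, None, None),
    (9, 8, "2014-01-01", "2014-12-30"),
    (10, 8, "2014-01-02", "2014-12-31"),
]

QUERY = """
WITH RECURSIVE cte AS (
    -- anchor: leaf tasks, the only rows with dates
    SELECT id, pid, sd, ed FROM tasks WHERE sd IS NOT NULL
    UNION ALL
    -- step: copy each row's dates up to its parent
    SELECT t.id, t.pid, c.sd, c.ed
    FROM tasks AS t
    JOIN cte AS c ON c.pid = t.id
)
SELECT id, pid, MIN(sd) AS sd, MAX(ed) AS ed
FROM cte
GROUP BY id, pid
ORDER BY id
"""


def rollup_dates():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tasks (id INTEGER, pid INTEGER, sd TEXT, ed TEXT)")
    con.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in rollup_dates():
        print(row)
```

The recursion only starts from dated leaves. A parent with no dated descendants would therefore not appear in the result at all.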