Create new date ranges from overlapping date ranges and assign an ID

I have the following table:
ID | START_DATE | END_DATE | FEATURE
---------------------------------------
001 | 1995-08-01 | 1997-12-31 | 1
001 | 1998-01-01 | 2017-03-31 | 4
001 | 2000-06-14 | 2017-03-31 | 5
001 | 2013-04-01 | 2017-03-31 | 8
002 | 1929-10-01 | 2006-05-25 | 1
002 | 2006-05-26 | 2016-11-10 | 4
002 | 2006-05-26 | 2016-11-10 | 7
002 | 2013-04-01 | 2016-11-10 | 8
I want to convert this table into a consolidated table: look for overlapping date ranges and split them into new rows, producing a non-overlapping set of date ranges.
The part I need the most help with is consolidating the 'feature' column, which should concatenate every feature active in a range, in the format below.
ID | START_DATE | END_DATE | FEATURE
---------------------------------------
001 | 1995-08-01 | 1997-12-31 | 1
001 | 1998-01-01 | 2000-06-13 | 4
001 | 2000-06-14 | 2013-03-31 | 45
001 | 2013-04-01 | 2017-03-31 | 458
002 | 1929-10-01 | 2006-05-25 | 1
002 | 2006-05-26 | 2013-03-31 | 47
002 | 2013-04-01 | 2016-11-10 | 478
I've used the following to create the test data.
CREATE TABLE #TEST (
[ID] [varchar](10) NULL,
[START_DATE] [date] NULL,
[END_DATE] [date] NULL,
[FEATURE] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO #TEST
VALUES
('001','1998-01-01','2017-03-31',4),
('001','2000-06-14','2017-03-31',5),
('001','2013-04-01','2017-03-31',8),
('001','1995-08-01','1997-12-31',1),
('002','2006-05-26','2016-11-10',4),
('002','2006-05-26','2016-11-10',7),
('002','2013-04-01','2016-11-10',8),
('002','1929-10-01','2006-05-25',1)

You can use apply:
select distinct t.id, t.START_DATE, t.END_DATE, coalesce(tt.feature, t.feature) as feature
from #test t outer apply
( select ' '+t1.feature
from #test t1
where t1.id = t.id and t1.end_date = t.end_date and t1.start_date <= t.start_date
order by t1.start_date
for xml path('')
) tt(feature)
order by t.id, t.START_DATE;

Here is a query that sets END_DATE. It looks like you are using SQL Server, but with little or no modification it should run on almost any database.
with grouped_data as
(
select ID, START_DATE, END_DATE from #TEST group by ID, START_DATE, END_DATE
)
,cte as
(
select
*,
ROW_NUMBER() over (partition by ID order by start_date) as nr
from grouped_data
)
select
c1.ID
,c1.START_DATE
,case when c1.nr <> 1 then isnull(DATEADD(DAY, -1, c2.START_DATE), c1.END_DATE) ELSE c1.END_DATE end as END_DATE
from cte as c1
left join cte as c2
on c1.ID = c2.ID
and c1.nr = c2.nr -1
order by c1.ID
If you have SQL Server 2017, you can easily build FEATURE using STRING_AGG.
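To make the whole transformation concrete (splitting the ranges and concatenating the features), here is a minimal runnable sketch. It is not the query from either answer: it uses Python's sqlite3 module and SQLite's date() function instead of T-SQL, the table name test stands in for #TEST, and the final concatenation is done in Python where STRING_AGG would be used on SQL Server 2017. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id TEXT, start_date TEXT, end_date TEXT, feature INTEGER)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        ("001", "1998-01-01", "2017-03-31", 4),
        ("001", "2000-06-14", "2017-03-31", 5),
        ("001", "2013-04-01", "2017-03-31", 8),
        ("001", "1995-08-01", "1997-12-31", 1),
        ("002", "2006-05-26", "2016-11-10", 4),
        ("002", "2006-05-26", "2016-11-10", 7),
        ("002", "2013-04-01", "2016-11-10", 8),
        ("002", "1929-10-01", "2006-05-25", 1),
    ],
)

# bounds: every date on which the set of active features can change
#         (each start date, and the day after each end date).
# segs:   consecutive boundaries turned into non-overlapping ranges.
SPLIT_SQL = """
WITH bounds AS (
    SELECT id, start_date AS d FROM test
    UNION
    SELECT id, date(end_date, '+1 day') FROM test
),
segs AS (
    SELECT id,
           d AS seg_start,
           date(LEAD(d) OVER (PARTITION BY id ORDER BY d), '-1 day') AS seg_end
    FROM bounds
)
SELECT s.id, s.seg_start, s.seg_end, t.feature
FROM segs AS s
JOIN test AS t
  ON t.id = s.id
 AND t.start_date <= s.seg_start
 AND t.end_date >= s.seg_end
WHERE s.seg_end IS NOT NULL
ORDER BY s.id, s.seg_start, t.feature
"""
rows = conn.execute(SPLIT_SQL).fetchall()

# One output row per segment; features are concatenated in ascending order.
consolidated = [
    (seg_id, seg_start, seg_end, "".join(str(r[3]) for r in group))
    for (seg_id, seg_start, seg_end), group in groupby(rows, key=lambda r: r[:3])
]
for row in consolidated:
    print(" | ".join(row))
```

The inner join also drops any segment that no original range covers, so gaps between ranges do not produce empty rows.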

Related

Difference between consecutive rows in SQL, inclusive of the first and last row?

I'm trying to get the successive differences of rows of data in SQL, including differences between first and last row and 0.
I have two tables that look like this
+------------+-------+ +------------+-------+
| Date | Name | | Date | Value |
+------------+-------+ +------------+-------+
| 2019-10-10 | AAA | | 2019-10-11 | 100 |
| 2019-10-11 | BBB | | 2019-10-12 | 150 |
| 2019-10-12 | CCC | | 2019-10-14 | 300 |
| 2019-10-13 | DDD | +------------+-------+
| 2019-10-14 | EEE |
| 2019-10-15 | FFF |
+------------+-------+
The end result I'm looking for is
+------------+-------+-------+---------------+------------+
| Date | Name | Value | PreviousValue | Difference |
+------------+-------+-------+---------------+------------+
| 2019-10-11 | BBB | 100 | 0 | 100 |
| 2019-10-12 | CCC | 150 | 100 | 50 |
| 2019-10-14 | EEE | 300 | 150 | 150 |
| 2019-10-15 | FFF | 0 | 300 | -300 |
+------------+-------+-------+---------------+------------+
I can get the first row by using LAG, but I don't quite know how to get the last row at the same time.
SELECT
d.[Date],
d.[Name],
v.[Value],
[PreviousValue] = COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0),
[PreviousLossAmount] = v.[Value] - COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0)
FROM
[Dates] d
LEFT JOIN
[Values] v
ON
d.[Date] = v.[Date]
Note that in reality, my tables are more complex, and I'd need to group and partition by multiple columns.

In a subquery, you can LEFT JOIN, and use ROW_NUMBER() to identify the last record. Then in the outer query you can filter out rows that did not match in the LEFT JOIN while allowing the last record.
Other considerations:
You need to order LAG() by d.[Date] instead of v.[Date]. This properly handles the case when the left join comes up empty (i.e. when v.[Date] is null).
You also want to use COALESCE() on v.[Value], since this may come up null too.
Query:
SELECT
[Date],
[Name],
[Value],
[PreviousValue],
[PreviousLossAmount]
FROM (
SELECT
d.[Date],
d.[Name],
[Value] = COALESCE(v.[Value], 0),
[PreviousValue] = COALESCE(LAG(v.[Value]) OVER (ORDER BY d.[Date]), 0),
[PreviousLossAmount] = COALESCE(v.[Value], 0)
- COALESCE(LAG(v.[Value]) OVER (ORDER BY d.[Date]), 0),
rn = ROW_NUMBER() OVER(ORDER BY d.[Date] DESC),
vDate = v.[Date]
FROM [Dates] d
LEFT JOIN [Values] v ON d.[Date] = v.[Date]
) t
WHERE vDate IS NOT NULL OR RN = 1
ORDER BY [Date]
Demo on DB Fiddle:
Date | Name | Value | PreviousValue | PreviousLossAmount
:------------------ | :--- | ----: | ------------: | -----------------:
11/10/2019 00:00:00 | BBB | 100 | 0 | 100
12/10/2019 00:00:00 | CCC | 150 | 100 | 50
14/10/2019 00:00:00 | EEE | 300 | 0 | 300
15/10/2019 00:00:00 | FFF | 0 | 300 | -300
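Note that in the demo output above, EEE gets PreviousValue 0 rather than the 150 the question asks for: LAG is evaluated in the subquery, before the outer filter, so it looks at DDD, which has no value. One possible variation, sketched below under a few assumptions, is to filter first and apply LAG in the outer query, so that it only sees the rows that are kept. The sketch runs on SQLite through Python's sqlite3 module rather than SQL Server, and the table name vals is a stand-in for [Values].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (date TEXT, name TEXT)")
conn.execute("CREATE TABLE vals (date TEXT, value INTEGER)")  # stand-in for [Values]
conn.executemany(
    "INSERT INTO dates VALUES (?, ?)",
    [
        ("2019-10-10", "AAA"),
        ("2019-10-11", "BBB"),
        ("2019-10-12", "CCC"),
        ("2019-10-13", "DDD"),
        ("2019-10-14", "EEE"),
        ("2019-10-15", "FFF"),
    ],
)
conn.executemany(
    "INSERT INTO vals VALUES (?, ?)",
    [("2019-10-11", 100), ("2019-10-12", 150), ("2019-10-14", 300)],
)

# The inner query marks matched rows (v_date) and the last date (rn = 1).
# The WHERE clause runs before the outer window functions, so LAG only
# sees the rows that survive the filter.
DIFF_SQL = """
SELECT date, name, value,
       COALESCE(LAG(value) OVER (ORDER BY date), 0) AS previous_value,
       value - COALESCE(LAG(value) OVER (ORDER BY date), 0) AS difference
FROM (
    SELECT d.date, d.name,
           COALESCE(v.value, 0) AS value,
           v.date AS v_date,
           ROW_NUMBER() OVER (ORDER BY d.date DESC) AS rn
    FROM dates AS d
    LEFT JOIN vals AS v ON v.date = d.date
) AS t
WHERE v_date IS NOT NULL OR rn = 1
ORDER BY date
"""
diff_rows = conn.execute(DIFF_SQL).fetchall()
for row in diff_rows:
    print(row)
```

With grouping columns, the same PARTITION BY list would go into both the ROW_NUMBER and the LAG window.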

You seem to want:
select d.[Date], d.[Name], v.[Value],
PreviousValue = COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0),
PreviousLossAmount = v.[Value] - COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0)
from dates d join
[Values] v
on d.[Date] = v.[Date]
union all
select d.[Date], d.[Name], 0 as [Value], v.[Value], - v.[Value]
from (select top (1) d.*
from dates d
order by d.[Date] desc
) d cross join
(select top (1) v.*
from [Values] v
order by v.[Date] desc
) v
order by [Date];

SQL - Get predicted dates if not available

I got a table #a as follows:
ID | TYPE_ID | CREATED_DT
============================
001 | 111 | 2019-08-28
001 | 111 | 2018-08-12
001 | 111 | 2017-08-23
001 | 111 | 2016-08-14
001 | 111 | 2015-08-17
001 | 111 | 2014-08-11
001 | 112 | 2019-05-31
001 | 112 | 2018-05-28
I would like to get my final output as follows:
ID | TYPE_ID | CREATED_DT
============================
001 | 111 | 2019-08-28
001 | 111 | 2018-08-12
001 | 111 | 2017-08-23
001 | 111 | 2016-08-14
001 | 111 | 2015-08-17
001 | 111 | 2014-08-11
001 | 112 | 2019-05-31
001 | 112 | 2018-05-28
001 | 112 | 2017-05-31 --Predict YEAR end dates if not available
001 | 112 | 2016-05-31
001 | 112 | 2015-05-31
001 | 112 | 2014-05-31
The final result set should contain up to 6 dates per TYPE_ID, predicting month-end dates where the dates are not available (TYPE_ID = 112 has only 2 dates available). I'm sure we can do this using the DATEADD and DATEDIFF functions to predict the dates, but it is a bit too complicated for my knowledge. Any help?
Query that I'm trying, but not exactly there:
select *,
ROW_NUMBER() OVER(PARTITION BY ID, TYPE_ID ORDER BY CREATED_DT DESC) AS RN
INTO #B
from #a;
;WITH CTE(ID, TYPE_ID, CREATED_DT, RN)
AS(
SELECT
ID,
TYPE_ID,
CREATED_DT,
RN
FROM #B
WHERE RN = 1 --Instead of RN = 1 I would like to get this till all
--available dates, so that I can go to recursive part for
--predicting non-available dates
UNION ALL
SELECT
A.ID,
A.TYPE_ID,
DATEADD(yy, -1, CTE.CREATED_DT)AS CREATED_DT,
CTE.RN +1 AS RN
FROM #B AS A
INNER JOIN CTE ON CTE.ID = A.ID
AND CTE.TYPE_ID = A.TYPE_ID
AND CTE.RN < 6
AND A.RN = 1
)

Because there isn't an id to identify each row, you could use a rank window function to find the last row in this table. Then, starting from that row's date, you can use DATEADD to subtract one year per step based on the rank. At the end, UNION the initial CTE with the predictive CTE.
;WITH CTE
AS (
SELECT d.ID
,d.Type_ID
,d.CREATED_DT
,RANK() OVER (
ORDER BY Type_ID
,Created_DT
) AS OrderOf
FROM datetable d
)
,CTE2
AS (
SELECT M.ID
,m.Type_ID
,DATEADD(Year, - 1, m.CREATED_DT) AS Created_DT
,M.OrderOf + 1 AS OrderOf
FROM CTE M
WHERE OrderOf = 8
)
,CTE3 (
n
,ID
,Type_Id
,Created_DT
,OrderOf
)
AS (
SELECT 0
,M.ID
,m.Type_ID
,m.CREATED_DT
,M.OrderOf AS OrderOf
FROM CTE2 M
UNION ALL
SELECT n + 1
,T.ID
,T.Type_ID
,DATEADD(YEAR, - 1, T.CREATED_DT)
,T.OrderOf + 1 AS OrderOf
FROM CTE3 T
WHERE n < 4
)
SELECT ID
,Type_ID
,Created_DT
FROM CTE3
UNION
SELECT ID
,Type_ID
,Created_DT
FROM CTE
ORDER BY Type_Id
,Created_DT DESC;
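The query above hard-codes the rank of the last row (OrderOf = 8). As a rough sketch of a version that works per group, the example below pads every (ID, TYPE_ID) group to 6 rows. It makes some assumptions that are not stated in the question: the predicted date is taken to be the month end of the earliest available date moved back one year per missing row (inferred from the expected output), the table is called a instead of #a, and the SQL is SQLite run through Python's sqlite3 module, so date() modifiers replace DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id TEXT, type_id INTEGER, created_dt TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        ("001", 111, "2019-08-28"),
        ("001", 111, "2018-08-12"),
        ("001", 111, "2017-08-23"),
        ("001", 111, "2016-08-14"),
        ("001", 111, "2015-08-17"),
        ("001", 111, "2014-08-11"),
        ("001", 112, "2019-05-31"),
        ("001", 112, "2018-05-28"),
    ],
)

# grp:  earliest date and row count per group.
# pred: one row per missing date; n is the number of years to go back,
#       total is the group size once that row is added.
PAD_SQL = """
WITH RECURSIVE grp AS (
    SELECT id, type_id, MIN(created_dt) AS first_dt, COUNT(*) AS cnt
    FROM a
    GROUP BY id, type_id
),
pred(id, type_id, first_dt, n, total) AS (
    SELECT id, type_id, first_dt, 1, cnt + 1 FROM grp WHERE cnt < 6
    UNION ALL
    SELECT id, type_id, first_dt, n + 1, total + 1 FROM pred WHERE total < 6
)
SELECT id, type_id, created_dt FROM a
UNION ALL
SELECT id, type_id,
       date(first_dt, '-' || n || ' years',
            'start of month', '+1 month', '-1 day')
FROM pred
ORDER BY type_id, created_dt DESC
"""
padded = conn.execute(PAD_SQL).fetchall()
for row in padded:
    print(row)
```

Groups that already have 6 rows, such as TYPE_ID = 111 here, produce no predicted rows at all.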

select datetime closest to a specified value for every day in a year

I have 2 simple tables defined as:
CREATE TABLE [dbo].[shop](
[id] [uniqueidentifier] NOT NULL,
[name] [ntext] NOT NULL,
[brand] [ntext] NULL
)
CREATE TABLE [dbo].[shop_history](
[id] [int] NOT NULL,
[shopid] [uniqueidentifier] NOT NULL, -- references shop.id
[totalstockval] [int] NOT NULL,
[date] [datetime2](0) NOT NULL
)
With data:
**dbo.shop**
id | name | brand
--------------------------
1 | Bow Rd | Tesco
2 | Wren Rd | Tesco
3 | Skye Rd | Safeway
**dbo.shop_history**
id | shopid | totalstockval | date
----------------------------------------------
997 | 1 | 19923031 | 2017-02-01 08:00
998 | 1 | 19323322 | 2017-02-01 08:30
999 | 1 | 19283873 | 2017-02-01 09:45
1000 | 2 | 14949321 | 2017-02-01 07:00
1001 | 2 | 12312312 | 2017-02-01 09:30
1002 | 3 | 12232344 | 2017-01-31 23:45
1003 | 3 | 12999222 | 2017-02-01 09:45
I have a full year's worth of similar data. I want to query the data to find the latest stock value each day BEFORE 09:00, even if that occurred the previous day.
The resultset I'm trying to achieve would look like:
shop.id | name | brand | totalstockval | date
---------------------------------------------------------------
1 | Bow Rd | Tesco | 19323322 | 2017-02-01 08:30
2 | Wren Rd | Tesco | 14949321 | 2017-02-01 07:00
3 | Skye Rd | Safeway | 12232344 | 2017-01-31 23:45
repeated for each day of the year. If there's no value row on a particular day, use the latest available value.
I have a feeling I would need a tally table containing every date (or datetime) that I want a price for, but I'm not sure of the query. How can I achieve a resultset similar to the above example?

It could be that you need a join with a subselect for the max date before 09:00:
select t.shopid, b.name, b.brand, a.totalstockval, t.max_date
from shop_history a
inner join (
select shopid, max([date]) as max_date
from shop_history
where cast([date] as time) < '09:00'
group by shopid ) t on a.shopid = t.shopid and a.[date] = t.max_date
inner join shop b on a.shopid = b.id

This is a tricky question. You want the latest shop_history record for each shop where the day ends at 9:00 a.m. One method is to subtract 9 hours and do the calculation based on the resulting date:
select sh.*
from (select sh.*,
row_number() over (partition by shopid,
cast(dateadd(hour, -9, date) as date)
order by date desc
) as seqnum
from shop_history sh
) sh
where seqnum = 1
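To see how the 9-hour shift groups the rows, here is a small runnable sketch using the sample data from the question. It is an approximation rather than the exact query above: it runs on SQLite through Python's sqlite3 module, so date(ts, '-9 hours') stands in for cast(dateadd(hour, -9, date) as date), the timestamp column is renamed to ts, the shop ids are plain integers, and report_day is a made-up helper column that maps each shifted day to the calendar day whose 09:00 cut-off it belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shop_history "
    "(id INTEGER, shopid INTEGER, totalstockval INTEGER, ts TEXT)"
)
conn.executemany(
    "INSERT INTO shop_history VALUES (?, ?, ?, ?)",
    [
        (997, 1, 19923031, "2017-02-01 08:00"),
        (998, 1, 19323322, "2017-02-01 08:30"),
        (999, 1, 19283873, "2017-02-01 09:45"),
        (1000, 2, 14949321, "2017-02-01 07:00"),
        (1001, 2, 12312312, "2017-02-01 09:30"),
        (1002, 3, 12232344, "2017-01-31 23:45"),
        (1003, 3, 12999222, "2017-02-01 09:45"),
    ],
)

# Shifting by 9 hours makes each "day" run from 09:00 to 08:59 the next
# morning, so the newest row per shifted day is the last one before 09:00.
LATEST_SQL = """
SELECT shopid, totalstockval, ts
FROM (
    SELECT shopid, totalstockval, ts,
           date(ts, '-9 hours', '+1 day') AS report_day,
           ROW_NUMBER() OVER (
               PARTITION BY shopid, date(ts, '-9 hours')
               ORDER BY ts DESC
           ) AS seqnum
    FROM shop_history
) AS sh
WHERE seqnum = 1 AND report_day = ?
ORDER BY shopid
"""
latest = conn.execute(LATEST_SQL, ("2017-02-01",)).fetchall()
for row in latest:
    print(row)
```

This only returns shops that have at least one row in the 24 hours before the cut-off; carrying the latest available value forward over empty days would still need a calendar table, as the question suspects.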

Stored procedure to find Highest Scores in SQL table

+-------+-----------+-------+
| Name | Date | Score |
+-------+-----------+-------+
| Name1 | 1/3/2016 | 80 |
| Name2 | 1/5/2016 | 76 |
| Name3 | 1/29/2016 | 77 |
| Name4 | 1/30/2016 | 40 |
| Name4 | 1/17/2016 | 79 |
| Name5 | 1/1/2016 | 90 |
| Name2 | 1/3/2016 | 79 |
| Name5 | 1/27/2016 | 92 |
| Name2 | 1/27/2016 | 99 |
| Name1 | 1/21/2016 | 93 |
| Name2 | 1/3/2016 | 70 |
| Name1 | 2/15/2016 | 80 |
| Name3 | 3/31/2016 | 84 |
+-------+-----------+-------+
I have this table and need to find the highest score for each name in a given period of time (i.e. between 01/01/2016 and 01/31/2016) and display Name, Date and Highest Score.
Please help! Thank you - Humberto Goez

You're going to have a problem with duplicate rows, as you don't have a primary key shown. This query would work, but it would be better to employ a primary key. The date filter is applied inside the subquery as well, so the maximum is taken over the period only.
Of course, this is just the SQL...
SELECT Name, [Date], Score
FROM MyTable T1
WHERE [Date] >= @StartDate
AND [Date] <= @EndDate
AND T1.Score = (SELECT MAX(T2.Score)
FROM MyTable T2
WHERE T2.Name = T1.Name
AND T2.[Date] >= @StartDate
AND T2.[Date] <= @EndDate)

If you are using SQL Server then you could use:
DECLARE @start_date DATE = '2016-01-01T00:00:00'
,@end_date DATE = '2016-01-31T00:00:00';
SELECT TOP 1 WITH TIES [Name],[Date],[Score]
FROM tab_name t
WHERE [Date] BETWEEN @start_date AND @end_date
ORDER BY RANK() OVER(PARTITION BY Name ORDER BY Score DESC);

A good approach is a CTE. A sample query looks like this:
declare @StartDate datetime = '2016-01-01',
@EndDate datetime = '2016-06-01'
;with scores as (
SELECT Name, [Date], Score,
row_number() over(partition by name /*start over each name*/
order by Score desc /*top first*/,[Date] /*earlier first*/) rn
FROM MyTable
WHERE [Date] BETWEEN @StartDate AND @EndDate /*only the requested period*/
)
select * from scores
where rn = 1
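Whichever variant is used, the period filter has to be applied before the top row per name is picked; otherwise a name whose best score falls outside the period (Name3 in the sample data) is either dropped or reported with the wrong score. The sketch below checks this on the question's data. It is only an illustration: it uses SQLite through Python's sqlite3 module, a table named scores, and ISO-formatted dates instead of the M/D/YYYY strings shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, date TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("Name1", "2016-01-03", 80),
        ("Name2", "2016-01-05", 76),
        ("Name3", "2016-01-29", 77),
        ("Name4", "2016-01-30", 40),
        ("Name4", "2016-01-17", 79),
        ("Name5", "2016-01-01", 90),
        ("Name2", "2016-01-03", 79),
        ("Name5", "2016-01-27", 92),
        ("Name2", "2016-01-27", 99),
        ("Name1", "2016-01-21", 93),
        ("Name2", "2016-01-03", 70),
        ("Name1", "2016-02-15", 80),
        ("Name3", "2016-03-31", 84),
    ],
)

# Filter on the period first, then number the rows per name from the
# highest score down (earlier date wins a tie) and keep the first one.
TOP_SQL = """
SELECT name, date, score
FROM (
    SELECT name, date, score,
           ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY score DESC, date
           ) AS rn
    FROM scores
    WHERE date BETWEEN ? AND ?
) AS ranked
WHERE rn = 1
ORDER BY name
"""
top_scores = conn.execute(TOP_SQL, ("2016-01-01", "2016-01-31")).fetchall()
for row in top_scores:
    print(row)
```

Wrapped in a stored procedure, the two bound parameters would become the procedure's start and end date arguments.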

SQL - Grouping with aggregation

I have a table (TABLE1) that lists all employees with their Dept IDs, the date they started and the date they were terminated (NULL means they are current employees).
I would like to have a resultset (TABLE2) in which every row represents a day (the DATE field), starting from the day the first employee started (in the sample table below, that date is 20090101) until today. I would like to group the employees by DeptID and calculate the total number of employees for each row of TABLE2.
How do I write this query? Thanks in advance for your help.
TABLE1
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20090101 NULL
001 234 20110101 20120101
TABLE2
DeptID Date EmployeeCount
-----------------------------------
001 20090101 1
001 20090102 1
... ... 1
001 20100101 2
001 20100102 2
... ... 2
001 20110101 3
001 20110102 3
... ... 3
001 20120101 1
001 20120102 1
001 20120103 1
... ... 1

This will work if you have a date lookup table. You will need to specify the department ID.
Query
SELECT d.dt, SUM(e.ecount) AS RunningTotal
FROM dates d
INNER JOIN
(SELECT b.dt,
CASE
WHEN c.ecount IS NULL THEN 0
ELSE c.ecount
END AS ecount
FROM dates b
LEFT JOIN
(SELECT a.DeptID, a.dt, SUM([count]) AS ecount
FROM
(SELECT DeptID, EmployeeID, 1 AS [count], StartDate AS dt FROM TABLE1
UNION ALL
SELECT DeptID, EmployeeID,
CASE
WHEN EndDate IS NOT NULL THEN -1
ELSE 0
END AS [count], EndDate AS dt FROM TABLE1) a
WHERE a.dt IS NOT NULL AND DeptID = 1
GROUP BY a.DeptID, a.dt) c ON c.dt = b.dt) e ON e.dt <= d.dt
GROUP BY d.dt
Result
| DT | RUNNINGTOTAL |
-----------------------------
| 2009-01-01 | 1 |
| 2009-02-01 | 1 |
| 2009-03-01 | 1 |
| 2009-04-01 | 1 |
| 2009-05-01 | 1 |
| 2009-06-01 | 1 |
| 2009-07-01 | 1 |
| 2009-08-01 | 1 |
| 2009-09-01 | 1 |
| 2009-10-01 | 1 |
| 2009-11-01 | 1 |
| 2009-12-01 | 1 |
| 2010-01-01 | 2 |
| 2010-02-01 | 2 |
| 2010-03-01 | 2 |
| 2010-04-01 | 2 |
| 2010-05-01 | 2 |
| 2010-06-01 | 2 |
| 2010-07-01 | 2 |
| 2010-08-01 | 2 |
| 2010-09-01 | 2 |
| 2010-10-01 | 2 |
| 2010-11-01 | 2 |
| 2010-12-01 | 2 |
| 2011-01-01 | 3 |
| 2011-02-01 | 3 |
| 2011-03-01 | 3 |
| 2011-04-01 | 3 |
| 2011-05-01 | 3 |
| 2011-06-01 | 3 |
| 2011-07-01 | 3 |
| 2011-08-01 | 3 |
| 2011-09-01 | 3 |
| 2011-10-01 | 3 |
| 2011-11-01 | 3 |
| 2011-12-01 | 3 |
| 2012-01-01 | 1 |
Schema
CREATE TABLE TABLE1 (
DeptID tinyint,
EmployeeID tinyint,
StartDate date,
EndDate date)
INSERT INTO TABLE1 VALUES
(1, 123, '2010-01-01', '2012-01-01'),
(1, 124, '2009-01-01', NULL),
(1, 234, '2011-01-01', '2012-01-01')
CREATE TABLE dates (
dt date)
INSERT INTO dates VALUES
('2009-01-01'), ('2009-02-01'), ('2009-03-01'), ('2009-04-01'), ('2009-05-01'),
('2009-06-01'), ('2009-07-01'), ('2009-08-01'), ('2009-09-01'), ('2009-10-01'),
('2009-11-01'), ('2009-12-01'), ('2010-01-01'), ('2010-02-01'), ('2010-03-01'),
('2010-04-01'), ('2010-05-01'), ('2010-06-01'), ('2010-07-01'), ('2010-08-01'),
('2010-09-01'), ('2010-10-01'), ('2010-11-01'), ('2010-12-01'), ('2011-01-01'),
('2011-02-01'), ('2011-03-01'), ('2011-04-01'), ('2011-05-01'), ('2011-06-01'),
('2011-07-01'), ('2011-08-01'), ('2011-09-01'), ('2011-10-01'), ('2011-11-01'),
('2011-12-01'), ('2012-01-01')

You need something along these lines.
SELECT *
, ( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE t.[Date] BETWEEN f.StartDate AND f.EndDate
) AS EmployeeCount
FROM ( SELECT DeptID
, StartDate AS [Date]
FROM TABLE1
UNION
SELECT DeptID
, EndDate AS [Date]
FROM TABLE1
) AS t
EDIT: since the OP clarified that he wants all the dates, here is the updated solution.
I have excluded an employee from the count if their job ends on that date. If you want to include them, change t.[Date] < f.EndDate to t.[Date] <= f.EndDate in the solution below. I also assume that a NULL value in EndDate means the employee still works for the department.
DECLARE @StartDate DATE = (SELECT MIN(StartDate) FROM Table1)
,@EndDate DATE = (SELECT MAX(EndDate) FROM Table1)
;WITH CTE AS
(
SELECT DISTINCT DeptID,@StartDate AS [Date] FROM Table1
UNION ALL
SELECT c.DeptID, DATEADD(dd,1,c.[Date]) AS [Date] FROM CTE AS c
WHERE c.[Date]<=@EndDate
)
)
SELECT * ,
EmployeeCount=( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE f.DeptID=t.DeptID AND t.[Date] >= f.StartDate
AND ( t.[Date] < f.EndDate OR f.EndDate IS NULL )
)
FROM CTE AS t
ORDER BY 1
OPTION ( MAXRECURSION 0 )
Here is a SQL Fiddle demo. I have added another department and added an employee to it.
http://sqlfiddle.com/#!3/5c4ec/1
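For a version that can be run locally, the recursive-calendar idea can be sketched roughly as follows. This is not the T-SQL above: it uses SQLite through Python's sqlite3 module, snake_case column names, ISO dates, and a fixed last day (2012-01-03) passed as a parameter in place of "today". As in the answer, an employee is not counted on their end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 "
    "(dept_id TEXT, employee_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("001", 123, "2010-01-01", "2012-01-01"),
        ("001", 124, "2009-01-01", None),  # NULL end date: still employed
        ("001", 234, "2011-01-01", "2012-01-01"),
    ],
)

# calendar: one row per day from the first start date to the given last day.
# Each day is paired with every department, then employees active on that
# day are counted (started on or before it, not yet ended).
HEADCOUNT_SQL = """
WITH RECURSIVE calendar(d) AS (
    SELECT MIN(start_date) FROM table1
    UNION ALL
    SELECT date(d, '+1 day') FROM calendar WHERE d < ?
)
SELECT dept.dept_id, c.d, COUNT(e.employee_id) AS employee_count
FROM calendar AS c
CROSS JOIN (SELECT DISTINCT dept_id FROM table1) AS dept
LEFT JOIN table1 AS e
  ON e.dept_id = dept.dept_id
 AND e.start_date <= c.d
 AND (e.end_date IS NULL OR c.d < e.end_date)
GROUP BY dept.dept_id, c.d
ORDER BY dept.dept_id, c.d
"""
headcount = {
    (dept_id, day): count
    for dept_id, day, count in conn.execute(HEADCOUNT_SQL, ("2012-01-03",))
}
for day in ("2009-01-01", "2010-01-01", "2011-01-01", "2012-01-01"):
    print("001", day, headcount[("001", day)])
```

SQLite places no default cap on recursion depth, which is why nothing like OPTION (MAXRECURSION 0) appears in this sketch.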
