I have the following table:
ID | START_DATE | END_DATE | FEATURE
---------------------------------------
001 | 1995-08-01 | 1997-12-31 | 1
001 | 1998-01-01 | 2017-03-31 | 4
001 | 2000-06-14 | 2017-03-31 | 5
001 | 2013-04-01 | 2017-03-31 | 8
002 | 1929-10-01 | 2006-05-25 | 1
002 | 2006-05-26 | 2016-11-10 | 4
002 | 2006-05-26 | 2016-11-10 | 7
002 | 2013-04-01 | 2016-11-10 | 8
I want to convert this table into a consolidated table that finds overlapping date ranges and combines them into new rows, creating a non-overlapping set of date ranges.
The part I need the most help with is consolidating the FEATURE column, which should concatenate the features that apply to each range, in the format below:
ID | START_DATE | END_DATE | FEATURE
---------------------------------------
001 | 1995-08-01 | 1997-12-31 | 1
001 | 1998-01-01 | 2000-06-13 | 4
001 | 2000-06-14 | 2013-03-31 | 45
001 | 2013-04-01 | 2017-03-31 | 458
002 | 1929-10-01 | 2006-05-25 | 1
002 | 2006-05-26 | 2013-03-31 | 47
002 | 2013-04-01 | 2016-11-10 | 478
I've used the following to create the test data.
CREATE TABLE #TEST (
[ID] [varchar](10) NULL,
[START_DATE] [date] NULL,
[END_DATE] [date] NULL,
[FEATURE] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO #TEST
VALUES
('001','1998-01-01','2017-03-31',4),
('001','2000-06-14','2017-03-31',5),
('001','2013-04-01','2017-03-31',8),
('001','1995-08-01','1997-12-31',1),
('002','2006-05-26','2016-11-10',4),
('002','2006-05-26','2016-11-10',7),
('002','2013-04-01','2016-11-10',8),
('002','1929-10-01','2006-05-25',1)
You can use APPLY:
select distinct t.id, t.START_DATE, t.END_DATE,
coalesce(tt.feature, cast(t.feature as varchar(10))) as feature
from #test t outer apply
( select cast(t1.feature as varchar(10))
from #test t1
where t1.id = t.id and t1.end_date = t.end_date and t1.start_date <= t.start_date
order by t1.start_date, t1.feature -- deterministic order for rows with the same start date
for xml path('')
) tt(feature)
order by t.id, t.START_DATE;
Here is a db<>fiddle.
Here is a query that sets END_DATE so that the ranges no longer overlap. It looks like you are using SQL Server; apart from DATEADD and ISNULL, it should run with little or no modification on almost any database.
with grouped_data as
(
select ID, START_DATE, END_DATE from #TEST group by ID, START_DATE, END_DATE
)
,cte as
(
select
*,
ROW_NUMBER() over (partition by ID order by start_date) as nr
from grouped_data
)
select
c1.ID
,c1.START_DATE
,case when c1.nr <> 1 then isnull(DATEADD(DAY, -1, c2.START_DATE), c1.END_DATE) ELSE c1.END_DATE end as END_DATE
from cte as c1
left join cte as c2
on c1.ID = c2.ID
and c1.nr = c2.nr -1
order by c1.ID, c1.START_DATE
If you have SQL Server 2017 or later, you can easily build the FEATURE column with STRING_AGG.
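Here is a plain-Python sketch (not T-SQL) that checks the expected output end to end. It assumes the features should be concatenated in ascending numeric order with no separator, as in the expected table. For each ID, the timeline is split at every START_DATE and at the day after every END_DATE. The features of the rows covering each piece are then concatenated, which is what STRING_AGG would do per piece in SQL. `consolidate` and `ROWS` are hypothetical names.

```python
from datetime import date, timedelta

# Sample rows from #TEST: (ID, START_DATE, END_DATE, FEATURE); END_DATE is inclusive.
ROWS = [
    ("001", date(1998, 1, 1), date(2017, 3, 31), 4),
    ("001", date(2000, 6, 14), date(2017, 3, 31), 5),
    ("001", date(2013, 4, 1), date(2017, 3, 31), 8),
    ("001", date(1995, 8, 1), date(1997, 12, 31), 1),
    ("002", date(2006, 5, 26), date(2016, 11, 10), 4),
    ("002", date(2006, 5, 26), date(2016, 11, 10), 7),
    ("002", date(2013, 4, 1), date(2016, 11, 10), 8),
    ("002", date(1929, 10, 1), date(2006, 5, 25), 1),
]


def consolidate(rows):
    """Return non-overlapping (ID, START_DATE, END_DATE, FEATURE) rows."""
    one_day = timedelta(days=1)
    result = []
    for id_ in sorted({r[0] for r in rows}):
        own = [r for r in rows if r[0] == id_]
        # A piece can only begin on a START_DATE or on the day after an END_DATE.
        bounds = sorted({s for _, s, _, _ in own} | {e + one_day for _, _, e, _ in own})
        for lo, nxt in zip(bounds, bounds[1:]):
            hi = nxt - one_day
            # Every input row either covers the whole piece or misses it entirely.
            active = sorted({f for _, s, e, f in own if s <= lo and e >= hi})
            if active:  # skip gaps that no input row covers
                result.append((id_, lo, hi, "".join(str(f) for f in active)))
    return result


for row in consolidate(ROWS):
    print(*row, sep=" | ")
```

With multi-digit features, a result like "458" becomes ambiguous. In that case, joining with a separator (for example ",") would be safer.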
I'm trying to get the successive differences between rows of data in SQL, including the difference between the first row and 0 and between the last row and 0.
I have two tables that look like this:
+------------+-------+ +------------+-------+
| Date | Name | | Date | Value |
+------------+-------+ +------------+-------+
| 2019-10-10 | AAA | | 2019-10-11 | 100 |
| 2019-10-11 | BBB | | 2019-10-12 | 150 |
| 2019-10-12 | CCC | | 2019-10-14 | 300 |
| 2019-10-13 | DDD | +------------+-------+
| 2019-10-14 | EEE |
| 2019-10-15 | FFF |
+------------+-------+
The end result I'm looking for is
+------------+-------+-------+---------------+------------+
| Date | Name | Value | PreviousValue | Difference |
+------------+-------+-------+---------------+------------+
| 2019-10-11 | BBB | 100 | 0 | 100 |
| 2019-10-12 | CCC | 150 | 100 | 50 |
| 2019-10-14 | EEE | 300 | 150 | 150 |
| 2019-10-15 | FFF | 0 | 300 | -300 |
+------------+-------+-------+---------------+------------+
I can get the first row by using LAG, but I don't quite know how to get the last row at the same time.
SELECT
d.[Date],
d.[Name],
v.[Value],
[PreviousValue] = COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0),
[PreviousLossAmount] = v.[Value] - COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0)
FROM
[Dates] d
LEFT JOIN
[Values] v
ON
d.[Date] = v.[Date]
Note that in reality, my tables are more complex, and I'd need to group and partition by multiple columns.
In a subquery, you can LEFT JOIN and use ROW_NUMBER() to identify the last record. Then, in the outer query, filter out the rows that did not match in the LEFT JOIN while keeping the last record.
Other considerations:
You just need to order LAG() by d.[Date] instead of v.[Date]. This properly handles the case when the LEFT JOIN comes up empty (i.e. when v.[Date] is NULL).
You also want to use COALESCE() on v.[Value], since it may be NULL too.
Query:
SELECT
[Date],
[Name],
[Value],
[PreviousValue],
[PreviousLossAmount]
FROM (
SELECT
d.[Date],
d.[Name],
[Value] = COALESCE(v.[Value], 0),
[PreviousValue] = COALESCE(LAG(v.[Value]) OVER (ORDER BY d.[Date]), 0),
[PreviousLossAmount] = COALESCE(v.[Value], 0)
- COALESCE(LAG(v.[Value]) OVER (ORDER BY d.[Date]), 0),
rn = ROW_NUMBER() OVER(ORDER BY d.[Date] DESC),
vDate = v.[Date]
FROM [Dates] d
LEFT JOIN [Values] v ON d.[Date] = v.[Date]
) t
WHERE vDate IS NOT NULL OR RN = 1
ORDER BY [Date]
Demo on DB Fiddle:
Date | Name | Value | PreviousValue | PreviousLossAmount
:------------------ | :--- | ----: | ------------: | -----------------:
11/10/2019 00:00:00 | BBB | 100 | 0 | 100
12/10/2019 00:00:00 | CCC | 150 | 100 | 50
14/10/2019 00:00:00 | EEE | 300 | 0 | 300
15/10/2019 00:00:00 | FFF | 0 | 300 | -300
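In the demo output above, EEE gets PreviousValue 0 instead of the expected 150. The reason is that LAG is ordered over every Dates row, so it sees DDD, whose joined value is NULL. Below is a plain-Python sketch (not SQL) of the rule the question's expected output implies, taking that table as the spec. First drop the unmatched dates except the last one, and only then take the previous value. In SQL the analogue would be to apply LAG in a query layer above the `WHERE vDate IS NOT NULL OR RN = 1` filter. `DATES`, `VALUES` and `differences` are hypothetical names.

```python
from datetime import date

# Sample data from the question: Dates(Date, Name) and Values(Date, Value).
DATES = [(date(2019, 10, day), name) for day, name in
         [(10, "AAA"), (11, "BBB"), (12, "CCC"), (13, "DDD"), (14, "EEE"), (15, "FFF")]]
VALUES = {date(2019, 10, 11): 100, date(2019, 10, 12): 150, date(2019, 10, 14): 300}


def differences(dates, values):
    """Rows of (Date, Name, Value, PreviousValue, Difference)."""
    last = max(d for d, _ in dates)
    out, prev = [], 0  # PreviousValue is 0 before the first kept row
    for d, name in sorted(dates):
        if d not in values and d != last:
            continue  # drop unmatched dates *before* looking at the previous value
        value = values.get(d, 0)  # the unmatched last date counts as 0
        out.append((d, name, value, prev, value - prev))
        prev = value
    return out


for row in differences(DATES, VALUES):
    print(*row, sep=" | ")
```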
You seem to want:
select d.[Date], d.[Name], v.[Value],
[PreviousValue] = COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0),
[PreviousLossAmount] = v.[Value] - COALESCE(LAG(v.[Value]) OVER (ORDER BY v.[Date]), 0)
from [Dates] d join
[Values] v
on d.[Date] = v.[Date]
union all
select d.[Date], d.[Name], 0 as [Value], v.[Value], - v.[Value]
from (select top (1) d.*
from [Dates] d
order by d.[Date] desc
) d cross join
(select top (1) v.*
from [Values] v
order by v.[Date] desc
) v
order by [Date];
I have a table #a as follows:
ID | TYPE_ID | CREATED_DT
============================
001 | 111 | 2019-08-28
001 | 111 | 2018-08-12
001 | 111 | 2017-08-23
001 | 111 | 2016-08-14
001 | 111 | 2015-08-17
001 | 111 | 2014-08-11
001 | 112 | 2019-05-31
001 | 112 | 2018-05-28
I would like to get my final output as follows:
ID | TYPE_ID | CREATED_DT
============================
001 | 111 | 2019-08-28
001 | 111 | 2018-08-12
001 | 111 | 2017-08-23
001 | 111 | 2016-08-14
001 | 111 | 2015-08-17
001 | 111 | 2014-08-11
001 | 112 | 2019-05-31
001 | 112 | 2018-05-28
001 | 112 | 2017-05-31 --Predict YEAR end dates if not available
001 | 112 | 2016-05-31
001 | 112 | 2015-05-31
001 | 112 | 2014-05-31
The final result set should have 6 dates per TYPE_ID. When fewer are available (TYPE_ID = 112 has only 2), the missing ones should be predicted as month-end dates, one year apart. I'm sure this can be done with the DATEADD and DATEDIFF functions, but it's a bit beyond my knowledge. Any help?
Here's the query I'm trying, but it's not quite there:
select *,
ROW_NUMBER() OVER(PARTITION BY ID, TYPE_ID ORDER BY CREATED_DT DESC) AS RN
INTO #B
from #a;
;WITH CTE(ID, TYPE_ID, CREATED_DT, RN)
AS(
SELECT
ID,
TYPE_ID,
CREATED_DT,
RN
FROM #B
WHERE RN = 1 --Instead of RN = 1 I would like to get this till all
--available dates, so that I can go to recursive part for
--predicting non-available dates
UNION ALL
SELECT
A.ID,
A.TYPE_ID,
DATEADD(yy, -1, CTE.CREATED_DT)AS CREATED_DT,
CTE.RN +1 AS RN
FROM #B AS A
INNER JOIN CTE ON CTE.ID = A.ID
AND CTE.TYPE_ID = A.TYPE_ID
AND CTE.RN < 6
AND A.RN = 1
)
Because there is no ID that identifies each row, you could use the RANK window function to number the rows and pick the oldest row of the type that is missing dates. Then, starting from that row's date, subtract one year per step with DATEADD, using EOMONTH (SQL Server 2012 and later) to snap each predicted date to the end of its month. Finally, UNION the initial CTE with the predictive CTE.
;WITH CTE
AS (
SELECT d.ID
,d.Type_ID
,d.CREATED_DT
,RANK() OVER (
ORDER BY Type_ID
,Created_DT
) AS OrderOf
FROM datetable d
)
,CTE2
AS (
SELECT M.ID
,m.Type_ID
,EOMONTH(DATEADD(YEAR, -1, m.CREATED_DT)) AS Created_DT -- month end one year earlier
,M.OrderOf + 1 AS OrderOf
FROM CTE M
WHERE OrderOf = 7 -- oldest TYPE_ID 112 row (2018-05-28); hard-coded for this sample
)
,CTE3 (
n
,ID
,Type_Id
,Created_DT
,OrderOf
)
AS (
SELECT 0
,M.ID
,m.Type_ID
,m.CREATED_DT
,M.OrderOf AS OrderOf
FROM CTE2 M
UNION ALL
SELECT n + 1
,T.ID
,T.Type_ID
,EOMONTH(DATEADD(YEAR, -1, T.CREATED_DT))
,T.OrderOf + 1 AS OrderOf
FROM CTE3 T
WHERE n < 3 -- 2 known dates + 4 predicted = 6 per TYPE_ID
)
SELECT ID
,Type_ID
,Created_DT
FROM CTE3
UNION
SELECT ID
,Type_ID
,Created_DT
FROM CTE
ORDER BY Type_Id
,Created_DT DESC;
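The expected output uses month-end dates (2017-05-31, not 2017-05-28), so each predicted date has to be snapped to the end of its month. In T-SQL, EOMONTH does that. Here is a plain-Python sketch (not SQL) of the fill rule, assuming "6 dates per TYPE_ID" and "step back one year from the oldest known date" as the spec. `month_end` and `fill_yearly` are hypothetical names.

```python
import calendar
from datetime import date


def month_end(year, month):
    """Last day of the given month (what EOMONTH returns in T-SQL)."""
    return date(year, month, calendar.monthrange(year, month)[1])


def fill_yearly(dates, total=6):
    """Newest-first list of `total` dates: the known ones, then month-end dates
    one year apart, going back from the oldest known date."""
    known = sorted(dates, reverse=True)[:total]
    if not known:
        return []  # nothing to extrapolate from
    oldest = known[-1]
    out, year = list(known), oldest.year
    while len(out) < total:
        year -= 1
        out.append(month_end(year, oldest.month))
    return out


# TYPE_ID 112 from the question has only two known dates.
print(fill_yearly([date(2019, 5, 31), date(2018, 5, 28)]))
```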
I have 2 simple tables defined as:
CREATE TABLE [dbo].[shop](
[id] [uniqueidentifier] NOT NULL,
[name] [ntext] NOT NULL,
[brand] [ntext] NULL
)
CREATE TABLE [dbo].[shop_history](
[id] [int] NOT NULL,
[shopid] [uniqueidentifier] NOT NULL, -- references shop.id
[totalstockval] [int] NOT NULL,
[date] [datetime2](0) NOT NULL
)
With data:
**dbo.shop**
id | name | brand
--------------------------
1 | Bow Rd | Tesco
2 | Wren Rd | Tesco
3 | Skye Rd | Safeway
**dbo.shop_history**
id | shopid | totalstockval | date
----------------------------------------------
997 | 1 | 19923031 | 2017-02-01 08:00
998 | 1 | 19323322 | 2017-02-01 08:30
999 | 1 | 19283873 | 2017-02-01 09:45
1000 | 2 | 14949321 | 2017-02-01 07:00
1001 | 2 | 12312312 | 2017-02-01 09:30
1002 | 3 | 12232344 | 2017-01-31 23:45
1003 | 3 | 12999222 | 2017-02-01 09:45
I have a full year's worth of similar data. I want to query it to find the latest stock value each day BEFORE 09:00, even if that reading occurred on the previous day.
The resultset I'm trying to achieve would look like:
shop.id | name | brand | totalstockval | date
---------------------------------------------------------------
1 | Bow Rd | Tesco | 19323322 | 2017-02-01 08:30
2 | Wren Rd | Tesco | 14949321 | 2017-02-01 07:00
3 | Skye Rd | Safeway | 12232344 | 2017-01-31 23:45
This should repeat for each day of the year. If there is no row for a particular day, use the latest available value.
I have a feeling I would need a tally table containing every date (or datetime) that I want a price for, but I'm not sure of the query. How can I achieve a resultset similar to the above example?
You could join to a subquery that finds, for each shop, the latest date whose time of day is before 09:00:
select t.shopid, b.name, b.brand, t.max_date, a.totalstockval
from shop_history a
inner join (
select shopid, max(date) as max_date
from shop_history
where cast(date as time) < '09:00'
group by shopid ) t on a.shopid = t.shopid and a.date = t.max_date
inner join shop b on a.shopid = b.id
This is a tricky question. You want the latest shop_history record for each shop where the day ends at 9:00 a.m. One method is to subtract 9 hours and do the calculation based on the resulting date:
select sh.*
from (select sh.*,
row_number() over (partition by shopid,
cast(dateadd(hour, -9, date) as date)
order by date desc
) as seqnum
from shop_history sh
) sh
where seqnum = 1
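The question asks for one value per calendar day with carry-forward, which is where the calendar (tally) table comes in. Here is a plain-Python sketch (not SQL) of that rule. It assumes integer shop ids as in the sample rows, even though the real column is a uniqueidentifier. For each shop and day, it binary-searches the shop's sorted timestamps for the last reading strictly before 09:00 on that day. In T-SQL the analogue would roughly be a calendar table CROSS APPLYed to a `SELECT TOP (1) ... ORDER BY [date] DESC` over earlier readings. `HISTORY`, `DAYS` and `stock_at_cutoff` are hypothetical names.

```python
from bisect import bisect_left
from datetime import date, datetime, time, timedelta

# shopid -> [(date, totalstockval)] from the sample shop_history rows.
HISTORY = {
    1: [(datetime(2017, 2, 1, 8, 0), 19923031),
        (datetime(2017, 2, 1, 8, 30), 19323322),
        (datetime(2017, 2, 1, 9, 45), 19283873)],
    2: [(datetime(2017, 2, 1, 7, 0), 14949321),
        (datetime(2017, 2, 1, 9, 30), 12312312)],
    3: [(datetime(2017, 1, 31, 23, 45), 12232344),
        (datetime(2017, 2, 1, 9, 45), 12999222)],
}
DAYS = [date(2017, 2, 1) + timedelta(days=k) for k in range(2)]  # a tiny calendar table


def stock_at_cutoff(history, days, cutoff=time(9, 0)):
    """Map (shop, day) to the latest (timestamp, value) strictly before
    day + cutoff, carrying older readings forward; None if there is none."""
    result = {}
    for shop, rows in history.items():
        rows = sorted(rows)
        stamps = [ts for ts, _ in rows]
        for day in days:
            # Index of the first reading at or after the cutoff on this day.
            i = bisect_left(stamps, datetime.combine(day, cutoff))
            result[(shop, day)] = rows[i - 1] if i else None
    return result


for key, reading in sorted(stock_at_cutoff(HISTORY, DAYS).items()):
    print(key, reading)
```

On 2017-02-02 there are no new rows before 09:00, so each shop carries forward its latest earlier reading (for shop 1, the 09:45 reading from the previous day).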
+-------+-----------+-------+
| Name | Date | Score |
+-------+-----------+-------+
| Name1 | 1/3/2016 | 80 |
| Name2 | 1/5/2016 | 76 |
| Name3 | 1/29/2016 | 77 |
| Name4 | 1/30/2016 | 40 |
| Name4 | 1/17/2016 | 79 |
| Name5 | 1/1/2016 | 90 |
| Name2 | 1/3/2016 | 79 |
| Name5 | 1/27/2016 | 92 |
| Name2 | 1/27/2016 | 99 |
| Name1 | 1/21/2016 | 93 |
| Name2 | 1/3/2016 | 70 |
| Name1 | 2/15/2016 | 80 |
| Name3 | 3/31/2016 | 84 |
+-------+-----------+-------+
I have this table and need to find the highest score for each name in a given period of time (e.g. between 01/01/2016 and 01/31/2016) and display Name, Date and the highest score.
Please help! Thank you - Humberto Goez
You're going to have a problem with duplicate rows, as you don't have a primary key shown. This query would work, but it would be better to add a primary key.
Of course, this is just the SQL...
SELECT Name, [Date], Score
FROM MyTable T1
WHERE T1.Score = (SELECT MAX(T2.Score)
FROM MyTable T2
WHERE T2.Name = T1.Name
AND T2.[Date] >= @StartDate
AND T2.[Date] <= @EndDate)
AND T1.[Date] >= @StartDate
AND T1.[Date] <= @EndDate
If you are using SQL Server then you could use:
DECLARE @start_date DATE = '2016-01-01T00:00:00'
,@end_date DATE = '2016-01-31T00:00:00';
SELECT TOP 1 WITH TIES [Name],[Date],[Score]
FROM tab_name t
WHERE [Date] BETWEEN @start_date AND @end_date
ORDER BY RANK() OVER(PARTITION BY Name ORDER BY Score DESC);
A good approach is a CTE. A sample query looks like this:
declare @StartDate datetime = '2016-01-01',
@EndDate datetime = '2016-06-01'
;with scores as (
SELECT Name, [Date], Score,
row_number() over(partition by name /*start over each name*/
order by Score desc /*top first*/,[Date] /*earlier first*/) rn
FROM MyTable
WHERE [Date] BETWEEN @StartDate AND @EndDate /*only the requested period*/
)
select * from scores
where rn = 1
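Here is a plain-Python sketch of the same rule as the ROW_NUMBER query, run over the question's sample data for January 2016. It keeps the highest score per name inside the period, and ties go to the earlier date. `SCORES` and `best_per_name` are hypothetical names. The TOP 1 WITH TIES answer differs here: it would return every tied row rather than one.

```python
from datetime import date

# Sample rows from the question: (Name, Date, Score).
SCORES = [
    ("Name1", date(2016, 1, 3), 80), ("Name2", date(2016, 1, 5), 76),
    ("Name3", date(2016, 1, 29), 77), ("Name4", date(2016, 1, 30), 40),
    ("Name4", date(2016, 1, 17), 79), ("Name5", date(2016, 1, 1), 90),
    ("Name2", date(2016, 1, 3), 79), ("Name5", date(2016, 1, 27), 92),
    ("Name2", date(2016, 1, 27), 99), ("Name1", date(2016, 1, 21), 93),
    ("Name2", date(2016, 1, 3), 70), ("Name1", date(2016, 2, 15), 80),
    ("Name3", date(2016, 3, 31), 84),
]


def best_per_name(rows, start, end):
    """(Name, Date, Score) with the highest score per name in [start, end];
    ties go to the earlier date, like ORDER BY Score DESC, [Date] with rn = 1."""
    best = {}
    for name, day, score in rows:
        if not (start <= day <= end):
            continue
        cur = best.get(name)
        # Higher score wins; on equal scores the earlier date wins.
        if cur is None or score > cur[1] or (score == cur[1] and day < cur[0]):
            best[name] = (day, score)
    return [(name, day, score) for name, (day, score) in sorted(best.items())]


for row in best_per_name(SCORES, date(2016, 1, 1), date(2016, 1, 31)):
    print(*row)
```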
I have a table (TABLE1) that lists all employees with their Dept IDs, the date they started and the date they were terminated (NULL means they are current employees).
I would like a result set (TABLE2) in which every row represents a day (the Date field), from the day the first employee started (20090101 in the sample table below) until today. I would like to group the employees by DeptID and calculate the number of employees for each row of TABLE2.
How do I write this query? Thanks in advance for your help.
TABLE1
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20090101 NULL
001 234 20110101 20120101
TABLE2
DeptID Date EmployeeCount
-----------------------------------
001 20090101 1
001 20090102 1
... ... 1
001 20100101 2
001 20100102 2
... ... 2
001 20110101 3
001 20110102 3
... ... 3
001 20120101 1
001 20120102 1
001 20120103 1
... ... 1
This will work if you have a date lookup table. You will need to specify the department ID.
Query
SELECT d.dt, SUM(e.ecount) AS RunningTotal
FROM dates d
INNER JOIN
(SELECT b.dt,
CASE
WHEN c.ecount IS NULL THEN 0
ELSE c.ecount
END AS ecount
FROM dates b
LEFT JOIN
(SELECT a.DeptID, a.dt, SUM([count]) AS ecount
FROM
(SELECT DeptID, EmployeeID, 1 AS [count], StartDate AS dt FROM TABLE1
UNION ALL
SELECT DeptID, EmployeeID,
CASE
WHEN EndDate IS NOT NULL THEN -1
ELSE 0
END AS [count], EndDate AS dt FROM TABLE1) a
WHERE a.dt IS NOT NULL AND DeptID = 1
GROUP BY a.DeptID, a.dt) c ON c.dt = b.dt) e ON e.dt <= d.dt
GROUP BY d.dt
Result
| DT | RUNNINGTOTAL |
-----------------------------
| 2009-01-01 | 1 |
| 2009-02-01 | 1 |
| 2009-03-01 | 1 |
| 2009-04-01 | 1 |
| 2009-05-01 | 1 |
| 2009-06-01 | 1 |
| 2009-07-01 | 1 |
| 2009-08-01 | 1 |
| 2009-09-01 | 1 |
| 2009-10-01 | 1 |
| 2009-11-01 | 1 |
| 2009-12-01 | 1 |
| 2010-01-01 | 2 |
| 2010-02-01 | 2 |
| 2010-03-01 | 2 |
| 2010-04-01 | 2 |
| 2010-05-01 | 2 |
| 2010-06-01 | 2 |
| 2010-07-01 | 2 |
| 2010-08-01 | 2 |
| 2010-09-01 | 2 |
| 2010-10-01 | 2 |
| 2010-11-01 | 2 |
| 2010-12-01 | 2 |
| 2011-01-01 | 3 |
| 2011-02-01 | 3 |
| 2011-03-01 | 3 |
| 2011-04-01 | 3 |
| 2011-05-01 | 3 |
| 2011-06-01 | 3 |
| 2011-07-01 | 3 |
| 2011-08-01 | 3 |
| 2011-09-01 | 3 |
| 2011-10-01 | 3 |
| 2011-11-01 | 3 |
| 2011-12-01 | 3 |
| 2012-01-01 | 1 |
Schema
CREATE TABLE TABLE1 (
DeptID tinyint,
EmployeeID tinyint,
StartDate date,
EndDate date)
INSERT INTO TABLE1 VALUES
(1, 123, '2010-01-01', '2012-01-01'),
(1, 124, '2009-01-01', NULL),
(1, 234, '2011-01-01', '2012-01-01')
CREATE TABLE dates (
dt date)
INSERT INTO dates VALUES
('2009-01-01'), ('2009-02-01'), ('2009-03-01'), ('2009-04-01'), ('2009-05-01'),
('2009-06-01'), ('2009-07-01'), ('2009-08-01'), ('2009-09-01'), ('2009-10-01'),
('2009-11-01'), ('2009-12-01'), ('2010-01-01'), ('2010-02-01'), ('2010-03-01'),
('2010-04-01'), ('2010-05-01'), ('2010-06-01'), ('2010-07-01'), ('2010-08-01'),
('2010-09-01'), ('2010-10-01'), ('2010-11-01'), ('2010-12-01'), ('2011-01-01'),
('2011-02-01'), ('2011-03-01'), ('2011-04-01'), ('2011-05-01'), ('2011-06-01'),
('2011-07-01'), ('2011-08-01'), ('2011-09-01'), ('2011-10-01'), ('2011-11-01'),
('2011-12-01'), ('2012-01-01')
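The query's core idea is a sweep. Each StartDate contributes +1, each non-NULL EndDate contributes -1, and the head count on a date is the running total of those events up to and including that date. Here is a plain-Python sketch of the same idea, with the DeptID passed in (the query hard-codes it). `EMPLOYEES` and `headcount` are hypothetical names, and `None` stands in for a NULL EndDate.

```python
from collections import defaultdict
from datetime import date

# TABLE1 sample rows: (DeptID, EmployeeID, StartDate, EndDate); None stands for NULL.
EMPLOYEES = [
    (1, 123, date(2010, 1, 1), date(2012, 1, 1)),
    (1, 124, date(2009, 1, 1), None),
    (1, 234, date(2011, 1, 1), date(2012, 1, 1)),
]


def headcount(employees, dept, days):
    """Running total of +1 (StartDate) and -1 (EndDate) events up to each day."""
    delta = defaultdict(int)
    for dept_id, _, start, end in employees:
        if dept_id != dept:
            continue
        delta[start] += 1
        if end is not None:  # NULL EndDate: still employed, no -1 event
            delta[end] -= 1
    events = sorted(delta.items())
    counts, total, i = [], 0, 0
    for day in sorted(days):
        # Apply every event dated on or before this day.
        while i < len(events) and events[i][0] <= day:
            total += events[i][1]
            i += 1
        counts.append((day, total))
    return counts


print(headcount(EMPLOYEES, 1, [date(2009, 1, 1), date(2010, 1, 1), date(2011, 6, 1), date(2012, 1, 1)]))
```

Because the events are sorted once and consumed in order, the sweep touches each event only once, however many dates are requested.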
You need something along these lines:
SELECT t.DeptID, t.[Date]
, ( SELECT COUNT(f.EmployeeID)
FROM TABLE1 AS f
WHERE f.DeptID = t.DeptID
AND t.[Date] >= f.StartDate
AND ( t.[Date] <= f.EndDate OR f.EndDate IS NULL )
) AS EmployeeCount
FROM ( SELECT DeptID
, StartDate AS [Date]
FROM TABLE1
UNION
SELECT DeptID
, EndDate AS [Date]
FROM TABLE1
WHERE EndDate IS NOT NULL
) AS t
EDIT: Since the OP clarified that he wants all the dates, here is the updated solution.
I have excluded an employee from the count if their job ends on that date. If you want to include them, change t.[Date] < f.EndDate to t.[Date] <= f.EndDate in the solution below. I also assume that a NULL EndDate means the employee still works for the department.
DECLARE @StartDate DATE = (SELECT MIN(StartDate) FROM Table1)
,@EndDate DATE = (SELECT MAX(EndDate) FROM Table1)
;WITH CTE AS
(
SELECT DISTINCT DeptID,@StartDate AS [Date] FROM Table1
UNION ALL
SELECT c.DeptID, DATEADD(dd,1,c.[Date]) AS [Date] FROM CTE AS c
WHERE c.[Date]<=@EndDate
)
SELECT * ,
EmployeeCount=( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE f.DeptID=t.DeptID AND t.[Date] >= f.StartDate
AND ( t.[Date] < f.EndDate OR f.EndDate IS NULL )
)
FROM CTE AS t
ORDER BY t.DeptID, t.[Date]
OPTION ( MAXRECURSION 0 )
Here is a SQL Fiddle demo. I have added another department and added an employee to it.
http://sqlfiddle.com/#!3/5c4ec/1
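Here is a plain-Python sketch of the updated solution's rule, evaluated for every day and department. An employee counts from StartDate up to, but not including, EndDate, and a NULL EndDate (`None` here) means the employee is still there. `EMPLOYEES` and `daily_headcount` are hypothetical names, and the rows mirror TABLE1. Like the correlated subquery, this is a brute-force count per day, which is fine for small tables.

```python
from datetime import date, timedelta

# TABLE1 sample rows: (DeptID, EmployeeID, StartDate, EndDate); None stands for NULL.
EMPLOYEES = [
    (1, 123, date(2010, 1, 1), date(2012, 1, 1)),
    (1, 124, date(2009, 1, 1), None),
    (1, 234, date(2011, 1, 1), date(2012, 1, 1)),
]


def daily_headcount(employees, first, last):
    """Yield (DeptID, day, count) for every day from first to last inclusive.
    An employee counts on days with StartDate <= day < EndDate; None = current."""
    depts = sorted({dept for dept, _, _, _ in employees})
    day = first
    while day <= last:
        for dept in depts:
            count = sum(1 for d, _, start, end in employees
                        if d == dept and start <= day and (end is None or day < end))
            yield dept, day, count
        day += timedelta(days=1)


counts = {(dept, day): n for dept, day, n in
          daily_headcount(EMPLOYEES, date(2009, 1, 1), date(2012, 1, 3))}
print(counts[(1, date(2011, 12, 31))], counts[(1, date(2012, 1, 1))])
```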