Convert start and end dates to normalized table - sql

I have a table with customer records, with each customer having a start date and end date. I'm looking for the most efficient way to convert this into a table that counts the number of active customers for each day. For example:
Existing table (Table A):
Customer - Start Date - End Date
A - 1/1/2017 - 1/3/2017
B - 1/2/2017 - 1/5/2017
What I need (Table B):
Date - Customer_Count
1/1/2017 - 1
1/2/2017 - 2
1/3/2017 - 2
1/4/2017 - 1
1/5/2017 - 1
The method I'm using right now is simply joining a date reference table to the customer table, and then grouping by the reference date column. While this method works, the customer table is very large, and there are additional conditions I want to be able to apply (e.g., the geography of the customer, the product, etc.), which will further impact performance.
Appreciate the help!

You can generate the dates from a custom table and then use CROSS APPLY, as below:
select RowN as [Date], count(*) as Customer_Count from #yourcust cross apply
(
select top (datediff(day, startdate, enddate)+1) rowN = dateadd(day, row_number() over (order by s1.name) -1 , startdate) from master..spt_values s1,master..spt_values s2
) a
group by RowN
Output
+------------+----------------+
| Date | Customer_Count |
+------------+----------------+
| 2017-01-01 | 1 |
| 2017-01-02 | 2 |
| 2017-01-03 | 2 |
| 2017-01-04 | 1 |
| 2017-01-05 | 1 |
+------------+----------------+

A tally/calendar table would do the trick, but an ad-hoc tally table in concert with a Cross Apply may help as well
Example
Select Date
,Customer_Count = count(*)
From YourTable A
Cross Apply (
Select Top (DateDiff(DD,[Start Date] ,[End Date] )+1) Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),[Start Date])
From master..spt_values
) B
Group By Date
Order By Date
Returns
Date Customer_Count
2017-01-01 1
2017-01-02 2
2017-01-03 2
2017-01-04 1
2017-01-05 1
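The question also mentions extra conditions such as geography. A minimal sketch of how the CROSS APPLY approach might carry such a column through to the grouping follows; the Region column, its values, and the @cust table variable are hypothetical and only stand in for the real customer table.

```sql
-- Hypothetical sample data: Region is an assumed extra attribute, not from the question.
DECLARE @cust TABLE (Customer char(1), Region varchar(10), StartDate date, EndDate date);
INSERT INTO @cust VALUES
('A', 'East', '2017-01-01', '2017-01-03'),
('B', 'West', '2017-01-02', '2017-01-05');

SELECT d.[Date], c.Region, COUNT(*) AS Customer_Count
FROM @cust AS c
CROSS APPLY (
    -- One row per day in the customer's active range (inclusive of both ends).
    SELECT TOP (DATEDIFF(day, c.StartDate, c.EndDate) + 1)
           DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, c.StartDate) AS [Date]
    FROM master..spt_values
) AS d
-- WHERE c.Region = 'East'   -- an optional filter would go here, before the grouping
GROUP BY d.[Date], c.Region
ORDER BY d.[Date], c.Region;
```

Filtering the customer rows before they are expanded into days keeps the number of generated rows down, which is likely where most of the cost sits on a very large table.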

Related

How to make entries from column appear as row title

I have a hospital database which looks something like this
id | patient_name | admitDate | DischargeDate |RoomCategory
1 | john |3/01/2011 | 5/01/2011 |Category1
2 | lisa |3/01/2011 | 4/01/2011 |Category2
3 | ron |5/01/2011 | 10/01/2011 |Category1
4 | howard |6/01/2012 | 10/01/2012 |Category3
5 | john |6/05/2011 | 7/05/2011 |Category4
6 | rammy |6/02/2011 | 7/03/2011 |Category4
I have to calculate the number of patients in hospital on each day (both admit and discharge date to be counted) and group them by category
Suppose on 3/01/2011 we have 2 patients, one in Category1 and one in Category2. On 4/01/2011 we again have the same 2 patients. On 5/01/2011 lisa (id 2) has been discharged, so only 1 of those patients remains in Category1, but ron (id 3) is admitted that day, so he has to be counted as well.
The output should look something like this
Date | Category1 | Category2 | Category3 |Category4
3/01/2011 | 1 | 1 | 0 | 0
4/01/2011 | 1 | 1 | 0 | 0
5/01/2011 | 2 | 0 | 0 | 0
I am not able to figure out how to list all the dates which might have a patient, because the actual table is huge and a lot of dates don't have any patients. I also can't work out how to count separately to get the count under each category.
I have 15 categories in total in my actual table so using where for each one of them separately wouldn't be very efficient.
You have 2 problems here. 1 you need a calendar table, and then 2 a pivot. I suggest, if I am honest, you invest in creating a calendar table firstly, but I use an inline one here. Then you can use pivoting to convert the values to columns. I use conditional aggregation here, as it is transferable and less restrictive.
SELECT *
INTO dbo.YourTable
FROM (VALUES(1,'john ',CONVERT(date,'3/01/2011'),CONVERT(date,'5/01/2011 '),'Category1'),
(2,'lisa ',CONVERT(date,'3/01/2011'),CONVERT(date,'4/01/2011 '),'Category2'),
(3,'ron ',CONVERT(date,'5/01/2011'),CONVERT(date,'10/01/2011'),'Category1'),
(4,'howard',CONVERT(date,'6/01/2012'),CONVERT(date,'10/01/2012'),'Category3'),
(5,'john ',CONVERT(date,'6/05/2011'),CONVERT(date,'7/05/2011 '),'Category4'),
(6,'rammy ',CONVERT(date,'6/02/2011'),CONVERT(date,'7/03/2011 '),'Category4'))V(id,patient_name,admitDate,DischargeDate,RoomCategory)
GO
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT 0 AS I
UNION ALL
SELECT TOP (SELECT DATEDIFF(DAY, MIN(admitDate), MAX(DischargeDate)) FROM dbo.YourTable)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3), --UP to 1000 days. Add more cross joins for more days
Calendar AS(
SELECT DATEADD(DAY, T.I, YT.MinAdmitDate) AS D
FROM Tally T
CROSS APPLY (SELECT MIN(admitDate) AS MinAdmitDate FROM dbo.YourTable) YT)
SELECT C.D AS [Date],
COUNT(CASE YT.RoomCategory WHEN 'Category1' THEN 1 END) AS Category1,
COUNT(CASE YT.RoomCategory WHEN 'Category2' THEN 1 END) AS Category2,
COUNT(CASE YT.RoomCategory WHEN 'Category3' THEN 1 END) AS Category3,
COUNT(CASE YT.RoomCategory WHEN 'Category4' THEN 1 END) AS Category4
FROM Calendar C
LEFT JOIN dbo.YourTable YT ON C.D >= YT.admitDate
AND C.D <= DischargeDate
GROUP BY C.D;
GO
DROP TABLE dbo.YourTable;
db<>fiddle. Note that the results might not be what you expect: db<>fiddle defaults to American, the dates you provided are in an ambiguous format, and I don't pass an explicit style to the CONVERT functions.
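To illustrate the point about styles, here is a minimal sketch. Styles 103 (dd/mm/yyyy) and 101 (mm/dd/yyyy) are standard CONVERT styles; which of them matches the real data is an assumption only the asker can confirm.

```sql
-- Same literal, two explicit styles: the result no longer depends on the session's language settings.
SELECT CONVERT(date, '3/01/2011', 103) AS AsDayMonthYear,  -- 3 January 2011
       CONVERT(date, '3/01/2011', 101) AS AsMonthDayYear;  -- 1 March 2011
```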

Get a list of dates between few dates

There are some quite similar questions, but not the same.
I have to solve the next problem:
From table with such structure
| DATE_FROM | DATE_TO |
|------------|------------|
| 2010-05-17 | 2010-05-19 |
| 2017-01-02 | 2017-01-04 |
| 2017-05-01 | NULL |
| 2017-06-12 | NULL |
I need to get a list like the one below
| DATE_LIST |
|------------|
| 2010-05-17 |
| 2010-05-18 |
| 2010-05-19 |
| 2017-01-02 |
| 2017-01-03 |
| 2017-01-04 |
| 2017-05-01 |
| 2017-06-12 |
How can I get it with SQL? SQL Server 2016.
Another option is with a CROSS APPLY and an ad-hoc tally table
Select Date_List=B.D
from YourTable A
Cross Apply (
Select Top (DateDiff(DAY,[DATE_FROM],IsNull([DATE_TO],[DATE_FROM]))+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),[DATE_FROM])
From master..spt_values n1,master..spt_values n2
) B
Returns
Date_List
2010-05-17
2010-05-18
2010-05-19
2017-01-02
2017-01-03
2017-01-04
2017-05-01
2017-06-12
One method uses a recursive CTE:
with cte as (
select date_from as date_list, date_to
from t
union all
select dateadd(day, 1, date_list), date_to
from cte
where date_list < date_to
)
select date_list
from cte;
By default, the recursive CTE is limited to a recursive depth of 100 (and then it returns an error). That works for spans of up to 100 days. You can remove the limit with OPTION (MAXRECURSION 0).
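As a sketch of where the hint goes, here is a self-contained version run against the question's sample rows. The @t table variable is only an illustrative stand-in for the real table, and the recursive member steps on the CTE's own date_list column.

```sql
-- Sample data from the question, held in a table variable for illustration.
DECLARE @t TABLE (date_from date NOT NULL, date_to date NULL);
INSERT INTO @t VALUES
('2010-05-17', '2010-05-19'),
('2017-01-02', '2017-01-04'),
('2017-05-01', NULL),
('2017-06-12', NULL);

WITH cte AS (
    SELECT date_from AS date_list, date_to
    FROM @t
    UNION ALL
    SELECT DATEADD(day, 1, date_list), date_to
    FROM cte
    WHERE date_list < date_to   -- a NULL date_to never satisfies this, so those rows stop at one date
)
SELECT date_list
FROM cte
ORDER BY date_list
OPTION (MAXRECURSION 0);        -- the hint is attached to the outer statement, not the CTE
```

With MAXRECURSION 0 there is no safety net, so the termination condition in the recursive member has to be correct.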
Although you could create the date range on the fly in your query, consider creating a permanent calendar table. This will provide better performance and can be extended with other attributes like day of week, fiscal quarter, etc. You can find many examples of loading such a table with an internet search.
Below is an example with 40 years of dates.
--example calendar table load script
CREATE TABLE dbo.Calendar(
CalendarDate date NOT NULL
CONSTRAINT PK_Calendar PRIMARY KEY
);
WITH
t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
,t64k AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) AS num FROM t256 AS a CROSS JOIN t256 AS b)
INSERT INTO dbo.Calendar WITH(TABLOCKX)
SELECT DATEADD(day, num - 1, '20000101') --num starts at 1, so subtract 1 to include 2000-01-01
FROM t64k
WHERE DATEADD(day, num - 1, '20000101') < '20400101'
GO
CREATE TABLE #example(
DATE_FROM date NOT NULL
,DATE_TO date NULL
);
GO
--example query
INSERT INTO #example VALUES
('2010-05-17', '2010-05-19')
, ('2017-01-02', '2017-01-04')
, ('2017-05-01', NULL)
, ('2017-06-12', NULL)
SELECT
c.CalendarDate
FROM #example AS e
JOIN dbo.Calendar AS c ON
c.CalendarDate BETWEEN e.DATE_FROM AND COALESCE(e.DATE_TO, e.DATE_FROM);
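The extra attributes mentioned above could be added in several ways. One possible sketch uses computed columns, assuming the dbo.Calendar table from the load script already exists; the column names here are hypothetical, and a fiscal quarter would need its own rule, since DATEPART(quarter, ...) gives the calendar quarter.

```sql
-- Assumes dbo.Calendar(CalendarDate) from the load script above already exists.
-- The added column names are hypothetical.
ALTER TABLE dbo.Calendar ADD
    CalendarYear    AS YEAR(CalendarDate),
    CalendarQuarter AS DATEPART(quarter, CalendarDate),
    WeekdayName     AS DATENAME(weekday, CalendarDate);  -- depends on session language, so left non-persisted
GO
SELECT CalendarDate, CalendarYear, CalendarQuarter, WeekdayName
FROM dbo.Calendar
WHERE CalendarDate BETWEEN '20170101' AND '20170107';
```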

Grouping rows if their dates overlap, and ranking them

My situation is I have a table of transactions, with start and end dates. The problem is that often times these transaction dates overlap with each other, and I want to group these scenarios together.
For example in the case below, transaction #1 is the "root" transaction, while #2-#4 are overlapping with #1 and/or with each other. However, transaction #5 is not overlapping with anything, hence it is a new "root" transaction.
+----------------+-----------+-----------+----------------------------------+
| Transaction ID | StartDate | EndDate | |
+----------------+-----------+-----------+----------------------------------+
| 1 | 1/1/2017 | 1/3/2017 | root transaction |
| 2 | 1/2/2017 | 1/6/2017 | overlaps with #1 |
| 3 | 1/5/2017 | 1/10/2017 | overlaps with #2 |
| 4 | 1/3/2017 | 1/13/2017 | overlaps with #2 and #3 |
| 5 | 1/15/2017 | 1/20/2017 | no overlap, new root transaction |
+----------------+-----------+-----------+----------------------------------+
Below is how I want the output to look. I want to
Identify the root transaction (column 4)
Rank the transactions in a chain by EndDate, so that the root is always = 1
+----------------+-----------+-----------+------------------+------+
| Transaction ID | Start | End | Root Transaction | Rank |
+----------------+-----------+-----------+------------------+------+
| 1 | 1/1/2017 | 1/3/2017 | 1 | 1 |
| 2 | 1/2/2017 | 1/6/2017 | 1 | 2 |
| 3 | 1/5/2017 | 1/10/2017 | 1 | 3 |
| 4 | 1/3/2017 | 1/13/2017 | 1 | 4 |
| 5 | 1/15/2017 | 1/20/2017 | 5 | 1 |
+----------------+-----------+-----------+------------------+------+
How would I go about this in SQL?
Here is one method using an OUTER APPLY
Create Table #YourTable ([Transaction ID] int,StartDate date,EndDate date)
Insert Into #YourTable values
(1,'1/1/2017','1/3/2017'),
(2,'1/2/2017','1/6/2017'),
(3,'1/5/2017','1/10/2017'),
(4,'1/3/2017','1/13/2017'),
(5,'1/15/2017','1/20/2017')
Select [Transaction ID]
,[Start] = StartDate
,[End] = EndDate
,[Root Transaction]=Grp
,[Rank] = Row_Number() over (Partition By Grp Order by [Transaction ID])
From (
Select A.*
,Grp = max(Flag*[Transaction ID]) over (Order By [Transaction ID])
From (
Select A.*,Flag = IsNull(B.Flg,1)
From #YourTable A
Outer Apply (
Select Top 1 Flg=0
From #YourTable
Where (StartDate between A.StartDate and A.EndDate
or EndDate between A.StartDate and A.EndDate )
and [Transaction ID]<A.[Transaction ID]
) B
) A
) A
Returns
EDIT - Some Commentary
In the OUTER APPLY, Flag is set to 1 or 0: 1 indicates a new group, 0 indicates that the record overlaps with an existing range.
In the next query "up", we use a window function to apply a Grp code (Flag*[Transaction ID]). Remember that a new group is 1 and an existing one is 0.
The window function takes the running max of this product as it traverses the transactions.
The final query just applies the Rank using a window function partitioned by Grp and ordered by [Transaction ID].
If it helps with the visualization:
The 1st sub-query (outer apply) generates
The 2nd sub-query generates
This is an example of "gaps-and-islands". For your data, you can determine the "island"s by determining where each starts -- that is, where a record does not overlap with the previous record. You can then get the rank using row_number().
So, here is a method:
select t.*,
min(transactionId) over (partition by island) as start,
row_number() over (partition by island order by endDate) as rnk
from (select t.*,
sum(startIslandFlag) over (order by startDate) as island
from (select t.*,
(case when not exists (select 1
from t t2
where t2.startdate < t.startdate and
t2.enddate >= t.startdate
)
then 1 else 0
end) as startIslandFlag
from t
) t
) t;
Notes:
In the event that the lowest transaction id is not the "root", then a tweak may be needed to the code to get the transaction id with the minimum start date.
If there are duplicate start dates in the data, a tweak may be needed with the cumulative sums (using an explicit range window).
Identify the root transactions:
with roots as (
select *
from [tran] as t1 --TRAN is a reserved keyword in T-SQL, so the table name is bracketed
where not exists (
select 1
from [tran] as t2
where t2.Transaction_ID < t1.Transaction_ID
and (
t1.StartDate between t2.StartDate and t2.EndDate
or
t1.EndDate between t2.StartDate and t2.EndDate
)
)
)
Create a two root system to capture all the overlaps in between them
select t.Transaction_ID,
t.StartDate as [Start],
t.EndDate as [End],
r1.Transaction_ID as Root_Transaction,
row_number() over (partition by r1.Transaction_ID order by t.EndDate) as [Rank]
from roots as r1
inner join roots as r2
on r2.Transaction_ID > r1.Transaction_ID
inner join [tran] as t
on t.Transaction_ID >= r1.Transaction_ID
and t.Transaction_ID < r2.Transaction_ID
where not exists ( --this "not exists" makes sure r1 and r2 are consecutive roots
select 1
from roots as r3
where r3.Transaction_ID > r1.Transaction_ID
and r3.Transaction_ID < r2.Transaction_ID
)

Get Monthly Totals from Running Totals

I have a table in a SQL Server 2008 database with two columns that hold running totals called Hours and Starts. Another column, Date, holds the date of a record. The dates are sporadic throughout any given month, but there's always a record for the last hour of the month.
For example:
ContainerID | Date | Hours | Starts
1 | 2010-12-31 23:59 | 20 | 6
1 | 2011-01-15 00:59 | 23 | 6
1 | 2011-01-31 23:59 | 30 | 8
2 | 2010-12-31 23:59 | 14 | 2
2 | 2011-01-18 12:59 | 14 | 2
2 | 2011-01-31 23:59 | 19 | 3
How can I query the table to get the total number of hours and starts for each month between two specified years? (In this case 2011 and 2013.) I know that I need to take the values from the last record of one month and subtract the values from the last record of the previous month. However, I'm having a hard time coming up with a good way to do this in SQL.
As requested, here are the expected results:
ContainerID | Date | MonthlyHours | MonthlyStarts
1 | 2011-01-31 23:59 | 10 | 2
2 | 2011-01-31 23:59 | 5 | 1
Try this:
SELECT c1.ContainerID,
c1.Date,
c1.Hours-c3.Hours AS "MonthlyHours",
c1.Starts - c3.Starts AS "MonthlyStarts"
FROM Containers c1
LEFT OUTER JOIN Containers c2 ON
c1.ContainerID = c2.ContainerID
AND datediff(MONTH, c1.Date, c2.Date)=0
AND c2.Date > c1.Date
LEFT OUTER JOIN Containers c3 ON
c1.ContainerID = c3.ContainerID
AND datediff(MONTH, c1.Date, c3.Date)=-1
LEFT OUTER JOIN Containers c4 ON
c3.ContainerID = c4.ContainerID
AND datediff(MONTH, c3.Date, c4.Date)=0
AND c4.Date > c3.Date
WHERE
c2.ContainerID is null
AND c4.ContainerID is null
AND c3.ContainerID is not null
ORDER BY c1.ContainerID, c1.Date
Using a recursive CTE and a somewhat 'creative' JOIN condition, you can fetch the next month's value for each ContainerID:
WITH CTE_PREP AS
(
--RN will be 1 for last row in each month for each container
--MonthRank will be sequential number for each subsequent month (to increment easier)
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY ContainerID, YEAR(Date), MONTH(DATE) ORDER BY Date DESC) RN
,DENSE_RANK() OVER (ORDER BY YEAR(Date),MONTH(Date)) MonthRank
FROM Table1
)
, RCTE AS
(
--"Zero row", last row in December 2010 for each container
SELECT *, Hours AS MonthlyHours, Starts AS MonthlyStarts
FROM CTE_Prep
WHERE YEAR(date) = 2010 AND MONTH(date) = 12 AND RN = 1
UNION ALL
--for each next row just join on MonthRank + 1
SELECT t.*, t.Hours - r.Hours, t.Starts - r.Starts
FROM RCTE r
INNER JOIN CTE_Prep t ON r.ContainerID = t.ContainerID AND r.MonthRank + 1 = t.MonthRank AND t.Rn = 1
)
SELECT ContainerID, Date, MonthlyHours, MonthlyStarts
FROM RCTE
WHERE Date >= '2011-01-01' --to eliminate "zero row"
ORDER BY ContainerID
SQLFiddle DEMO (I have added some data for February and March in order to test on different lengths of months)

SQL Calculate Days between two dates in one table

I have a table dbo.Trans which contains an id called bd_id (varchar), a transfer_date (datetime), and an identifier member_id. The primary key is trns_id, which is sequential.
Duplicates of bd_id and member_id exist in the table.
transfer_date |bd_id| member_id | trns_id
2008-01-01 00:00:00 | 432 | 111 | 1
2008-01-03 00:00:00 | 123 | 111 | 2
2008-01-08 00:00:00 | 128 | 111 | 3
2008-02-04 00:00:00 | 123 | 432 | 4
.......
For each member_id, I want to get the number of days between consecutive dates, for each bd_id.
E.g., member 111 used 432 from 2008-01-01 until 2008-01-03, so the return should be 2.
The next would then be 5.
I know the DATEDIFF() function exists but I am not sure how to get the difference when dates are in the same table.
Any help appreciated.
You could try something like this.
select T1.member_id,
datediff(day, T1.transfer_date, T3.transfer_date) as DD
from YourTable as T1
cross apply (select top 1 T2.transfer_date
from YourTable as T2
where T2.transfer_date > T1.transfer_date and
T2.member_id = T1.member_id
order by T2.transfer_date) as T3
You must select the 1st and 2nd records that you want, then take their dates and get the DATEDIFF of those two dates.
DATEDIFF(day, date1, date2);
Your problem is getting the next member date.
Here is an example using a correlated subquery to get the next date:
select t.*, datediff(day, t.transfer_date, nextdate) as Days_Between
from (select t.*,
(select min(transfer_date)
from trans t2
where t.bd_id = t2.bd_id and
t.member_id = t2.member_id and
t.transfer_date < t2.transfer_date
) as NextDate
from trans t
) t
SQL Server 2012 introduced the lag() and lead() window functions, which make this a bit easier to express.
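A minimal sketch of the window-function version, assuming SQL Server 2012 or later. It uses LEAD, the forward-looking counterpart of LAG, because the goal is the next row's date. The @Trans table variable holds the question's sample rows for illustration, and partitioning by member_id alone is an assumption based on the expected values of 2 and 5; add bd_id to the PARTITION BY to mirror the correlated subquery above.

```sql
-- Sample rows from the question; unseparated date literals avoid language-dependent parsing of datetime.
DECLARE @Trans TABLE (transfer_date datetime, bd_id varchar(10), member_id int, trns_id int);
INSERT INTO @Trans VALUES
('20080101', '432', 111, 1),
('20080103', '123', 111, 2),
('20080108', '128', 111, 3),
('20080204', '123', 432, 4);

SELECT t.member_id, t.bd_id, t.transfer_date,
       -- LEAD returns NULL for the last row of each member, so Days_Between is NULL there.
       DATEDIFF(day, t.transfer_date,
                LEAD(t.transfer_date) OVER (PARTITION BY t.member_id
                                            ORDER BY t.transfer_date)) AS Days_Between
FROM @Trans AS t
ORDER BY t.member_id, t.transfer_date;
```

For member 111 this yields 2 and 5 for the first two rows, matching the values in the question.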