SQL JOIN combining columns to a single column - sql

I've written the following query for Microsoft SQL Server 2008 R2 ...
with
downloads as
(
select convert(varchar(10), timestamp, 112) as downloadDate, COUNT(*) as counter
from <download_table>
group by convert(varchar(10), timestamp,112)
),
uploads as
(
select CONVERT(varchar(10), dateadded, 112) as uploadDate, COUNT(*) as counter
from <upload_table>
group by CONVERT(varchar(10), dateadded, 112)
)
select
downloads.downloadDate,
uploads.uploadDate,
downloads.counter as dCount,
uploads.counter as uCount
from downloads
full join uploads on uploads.uploadDate = downloads.downloadDate
order by downloadDate desc;
which returns the following table...
downloadDate uploadDate dCount uCount
20121211 NULL 40 NULL
20121210 NULL 238 NULL
20121207 20121207 526 4
20121206 20121206 217 12
20121205 NULL 108 NULL
20121204 20121204 190 13
20121203 NULL 141 NULL
20121130 20121130 248 187
20121129 NULL 134 NULL
20121128 NULL 102 NULL
20121127 20121127 494 57
20121126 NULL 153 NULL
20121119 20121119 319 20
20121118 NULL 4 NULL
20121116 20121116 215 16
20121112 20121112 431 144
20121109 20121109 168 48
20121108 20121108 132 181
NULL 20121125 NULL 3
but I can't get the two dates to combine into a single 'date' column without getting some NULL entries, nor can I get the NULL values in the dCount or uCount to display 0 instead of NULL.
Can somebody help me with this, please?

In SQL Server, you can use COALESCE around the date fields, which returns the first non-null value, and ISNULL around the count totals to replace the null values with zero:
with
downloads as
(
select convert(varchar(10), timestamp, 112) as downloadDate, COUNT(*) as counter
from download_table
group by convert(varchar(10), timestamp,112)
),
uploads as
(
select CONVERT(varchar(10), dateadded, 112) as uploadDate, COUNT(*) as counter
from upload_table
group by CONVERT(varchar(10), dateadded, 112)
)
select
coalesce(downloads.downloadDate, uploads.uploadDate) as dDate,
isnull(downloads.counter, 0) as dCount,
isnull(uploads.counter, 0) as uCount
from downloads
full join uploads
on uploads.uploadDate = downloads.downloadDate
order by downloadDate desc;
See SQL Fiddle with Demo
Result:
| DDATE | DCOUNT | UCOUNT |
------------------------------
| 20121211 | 2 | 0 |
| 20121210 | 1 | 1 |
| 20121207 | 1 | 0 |
| 20121206 | 2 | 1 |
| 20121208 | 0 | 1 |
| 20121209 | 0 | 1 |
| 20121204 | 0 | 1 |
| 20121205 | 0 | 1 |

Depending on your SQL dialect, something like IFNULL(), NVL(), COALESCE(), IIF() etc. will help you get rid of the NULLs in favour of a date in the past, such as '18000101'.
After having done that, you can use MAX(), SWITCH(), IIF(), IF() or friends to make a single "last usage date" column.
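As a rough illustration of that idea (a sketch only, reusing the downloads/uploads CTEs and the '18000101' sentinel from this thread; SQL Server 2008 has no two-argument MAX(), so a CASE expression stands in for it):

```sql
-- Sketch: replace missing dates with a sentinel far in the past,
-- then take the greater of the two as a single "last usage date".
-- Column and CTE names are assumed from the question above.
select
coalesce(downloads.downloadDate, uploads.uploadDate) as usageDate,
case
when isnull(downloads.downloadDate, '18000101') >= isnull(uploads.uploadDate, '18000101')
then isnull(downloads.downloadDate, '18000101')
else isnull(uploads.uploadDate, '18000101')
end as lastUsageDate
from downloads
full join uploads on uploads.uploadDate = downloads.downloadDate;
```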

You can use coalesce and nvl like this (note that NVL is Oracle's function; on SQL Server use ISNULL instead):
with
downloads as
(
select convert(varchar(10), timestamp, 112) as downloadDate, COUNT(*) as counter
from <download_table>
group by convert(varchar(10), timestamp,112)
),
uploads as
(
select CONVERT(varchar(10), dateadded, 112) as uploadDate, COUNT(*) as counter
from <upload_table>
group by CONVERT(varchar(10), dateadded, 112)
)
select
coalesce(downloads.downloadDate, uploads.uploadDate) as dDate,
nvl(downloads.counter, 0) as dCount,
nvl(uploads.counter, 0) as uCount
from downloads
full join uploads on uploads.uploadDate = downloads.downloadDate
order by downloadDate desc;

Firstly, it is much better to cast a datetime to a date if you want to remove the time element, rather than converting to varchar. In SQL Server 2008 you can simply use:
CAST(DateAdded AS DATE)
Then, rather than using a FULL JOIN, I would do this using UNION ALL; it should perform better (although I can't say for certain without testing on your actual data).
WITH Data AS
( SELECT [Date] = CAST(Timestamp AS DATE),
[Downloads] = 1,
[Uploads] = 0
FROM Download_Table
UNION ALL
SELECT [Date] = CAST(DateAdded AS DATE),
[Downloads] = 0,
[Uploads] = 1
FROM Upload_Table
)
SELECT [Date],
[Downloads] = SUM(Downloads),
[Uploads] = SUM(Uploads)
FROM Data
GROUP BY [Date]
ORDER BY [Date];

Related

Return Min-Max value of dates into one column with condition

I have the data shown below:
ID_DATA | DATE
--------------------
1101 | 2020-02-01
1101 | 2020-02-02
1101 | 2020-02-03
1102 | 2020-02-01
1102 | 2020-02-01
What I want is: for rows sharing the same ID_DATA, a single column showing the date range as the string 'MIN(date) - MAX(date)'. But if all the dates for an ID_DATA are the same, it should just show that DATE as-is.
Note that there might be more than 2 rows of DATE for a single ID_DATA. I'm hoping to use CASE WHEN.
Following expected result:
ID_DATA DATE
1101 2020-02-01 - 2020-02-03
1102 2020-02-01
Try this option:
SELECT
ID_DATA,
CONVERT(varchar, MIN(DATE), 120) +
CASE WHEN MIN(DATE) < MAX(DATE)
THEN ' - ' + CONVERT(varchar, MAX(DATE), 120)
ELSE '' END AS DATE
FROM yourTable
GROUP BY
ID_DATA;
The logic here is to use a CASE expression to check whether the minimum date for an ID is smaller than the maximum date. If so, we display the upper range as well; otherwise we just report the minimum date.
You can try the below using the min() and max() functions.
WITH cte (
ID_DATA
,MinDate
,MaxDate
)
AS (
SELECT ID_DATA
,Min(DtDATE) AS MinDate
,Max(DtDATE) AS MaxDate
FROM TblData
GROUP BY ID_DATA
)
SELECT ID_DATA
,CASE
WHEN MinDate = MaxDate
THEN Convert(VARCHAR(10), MinDate)
ELSE Convert(VARCHAR(10), MinDate) + ' - ' + Convert(VARCHAR(10), MaxDate)
END AS DatePeriod
FROM cte
Here is the live db<>fiddle demo.

How to get Cumulative data from the same table using SQL?

I have this table
table1
eventid entityid eventdate
----------------------------------------
123 xyz Jan-02-2019
541 xyz Jan-02-2019
234 xyz Jan-03-2019
432 xyz Jan-04-2019
111 xyz Jan-05-2019
124 xyz Jan-06-2019
123 xyz Jan-07-2019
234 xyz Jan-08-2019
432 xyz Jan-09-2019
111 xyz Jan-12-2019
I want to show final result as
entityid interval1 interval2
------------------------------
xyz 2 4
here intervals are in days.
Logic to calculate intervals are :
E.g. events 123 and 234 happen multiple times, so the date difference between each occurrence, as shown below, would be summed into interval1.
Please note - it's not necessarily the case that 234 is always in the row immediately after 123; there could be other events in between.
The formula is
interval1 = datediff(day, eventdate of 123, eventdate of 234) + datediff(day, eventdate of 123, eventdate of 234) + and so on (one term per 123/234 pair)
Same for interval2 but for event 432 & 111.
entityid eventid1 eventid2 event_date_diff
--------------------------------------------
xyz 123 234 1
xyz 123 234 1
xyz 432 111 1
xyz 432 111 3
The challenge here is to find out whether event 123 has a 234 event in the upcoming rows (not necessarily the immediate next row), and if it's there, find the date difference. If there are any other events between 123 and 234, we need to ignore those in-between events. Also, if 123 appears twice, we need the latest eventdate for 123.
Let's go over this in terms of your requirements, and build up the necessary pieces. This won't be approached in the order you stated them in, but in an order that makes them easier to understand.
Also if 123 appears twice then need latest eventdate for 123.
This means we need to create range bounds. This is pretty easy:
NextOccurence AS (SELECT eventId, entityId, eventDate,
LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
FROM Table1)
... this will give us every occurrence of an event, with the next one, if present (these can be limited to just your "source" events, but I'm not bothering with that here).
The challenge here is to find out if event 123 has 234 event or not in upcoming rows (not necessarily in immediate next row) and if its there then find the date difference. If there are any other events between 123-234 then we need to ignore those in between events.
(and you previously mentioned it should be the minimum following date, if there were multiple following events).
For this we need to first map events:
EventMap AS (SELECT 123 AS original, 234 AS follow
UNION ALL
SELECT 432, 111)
... and use this to get the "next" following event in range, in what is partially a greatest-n-per-group query:
SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
FROM NextOccurence
JOIN EventMap
ON EventMap.original = NextOccurence.eventId
CROSS APPLY (SELECT TOP 1 Table1.eventDate
FROM Table1
WHERE Table1.entityId = NextOccurence.entityId
AND Table1.eventId = EventMap.follow
AND Table1.eventDate >= NextOccurence.eventDate
AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
ORDER BY Table1.eventDate) AS Table1
... at this point, we have something close to your intermediate results table:
| entityId | eventId | diff |
|----------|---------|------|
| xyz | 123 | 1 |
| xyz | 123 | 1 |
| xyz | 432 | 1 |
| xyz | 432 | 3 |
... and what follows afterwards would be a standard PIVOT query to aggregate the results.
The final query ends up looking like this:
WITH NextOccurence AS (SELECT eventId, entityId, eventDate,
LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
FROM Table1),
EventMap AS (SELECT 123 AS original, 234 AS follow
UNION ALL
SELECT 432, 111)
SELECT entityId, [123] AS '123-234', [432] AS '432-111'
FROM (SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
FROM NextOccurence
JOIN EventMap
ON EventMap.original = NextOccurence.eventId
CROSS APPLY (SELECT TOP 1 Table1.eventDate
FROM Table1
WHERE Table1.entityId = NextOccurence.entityId
AND Table1.eventId = EventMap.follow
AND Table1.eventDate >= NextOccurence.eventDate
AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
ORDER BY Table1.eventDate) AS Table1) AS d
PIVOT (SUM(diff)
FOR eventId IN ([123], [432])
) AS pvt
Fiddle example
...which generates the expected results:
| entityId | 123-234 | 432-111 |
|----------|---------|---------|
| xyz | 2 | 4 |
From what I understood of the question, we are asked to provide occurrences of each eventid per date, represented in columns rather than rows.
My approach to this problem is first to pivot the data within a CTE and then to select the unique value from each column via the CROSS APPLY operator of a query. There may be better ways of doing it, but this made the most sense to me.
DECLARE @T TABLE
(
EventId INT,
EntityId NVARCHAR(3),
EventDate DATETIME
);
INSERT INTO @T (EventId, EntityId, EventDate)
SELECT * FROM (VALUES
(123, 'xyz', '2019-01-02'),
(234, 'xyz', '2019-01-03'),
(432, 'xyz', '2019-01-04'),
(111, 'xyz', '2019-01-05'),
(124, 'xyz', '2019-01-06'),
(123, 'xyz', '2019-01-07'),
(234, 'xyz', '2019-01-08'),
(432, 'xyz', '2019-01-09'),
(111, 'xyz', '2019-01-12')
) X (EVENTID, ENTITYID, EVENTDATE);
with cte as (
select EntityId, [123] as Interval1, [234] as Interval2, [432] as Interval3, [111] as Interval4, [124] as Interval5
from
(
select top 5 EntityId, EventId, min(eventdate) as ordering, count(distinct EventDate) as vol
from @T
group by EntityId, EventId
order by ordering
) src
PIVOT
(
max(vol)
for EventId in ([123], [234], [432], [111], [124])
) as pvt)
select distinct EntityId, Interval1, Interval2, Interval3, Interval4, Interval5
from (select EntityId from cte) a
cross apply
(select Interval1 from cte where Interval1 is not null) b
cross apply
(select Interval2 from cte where Interval2 is not null) c
cross apply
(select Interval3 from cte where Interval3 is not null) d
cross apply
(select Interval4 from cte where Interval4 is not null) e
cross apply
(select Interval5 from cte where Interval5 is not null) f;
You can use lead() and conditional aggregation for this:
select sum(case when eventid = 123 and next_eventid = 234
then datediff(day, eventdate, next_eventdate)
end) as interval1,
sum(case when eventid = 432 and next_eventid = 111
then datediff(day, eventdate, next_eventdate)
end) as interval2
from (select t.*,
lead(eventid) over (partition by entityid order by eventdate) as next_eventid,
lead(eventdate) over (partition by entityid order by eventdate) as next_eventdate
from t
) t;
Probably the simplest way to handle intervening events is conditional cumulative arithmetic:
select sum(case when eventid = 123
then datediff(day, eventdate, next_eventdate_234)
end) as interval1,
sum(case when eventid = 432
then datediff(day, eventdate, next_eventdate_111)
end) as interval2
from (select t.*,
min(case when eventid = 234 then eventdate end) over (order by eventdate desc) as next_eventdate_234,
min(case when eventid = 111 then eventdate end) over (order by eventdate desc) as next_eventdate_111
from t
where eventid in (123, 234, 432, 111)
) t
where eventid in (123, 432);

SQL: Generate Record Per Month In Date Range

I have a table which describes a value which is valid for a certain period of days / months.
The table looks like this:
+----+------------+------------+-------+
| Id | From | To | Value |
+----+------------+------------+-------+
| 1 | 2018-01-01 | 2018-03-31 | ValA |
| 2 | 2018-01-16 | NULL | ValB |
| 3 | 2018-04-01 | 2018-05-12 | ValC |
+----+------------+------------+-------+
As you can see, the only value still valid on this day is ValB (To is nullable, From isn't).
I am trying to achieve a view on this table like this (assuming I render this view someday in July 2018):
+----------+------------+------------+-------+
| RecordId | From | To | Value |
+----------+------------+------------+-------+
| 1 | 2018-01-01 | 2018-01-31 | ValA |
| 1 | 2018-02-01 | 2018-02-28 | ValA |
| 1 | 2018-03-01 | 2018-03-31 | ValA |
| 2 | 2018-01-16 | 2018-01-31 | ValB |
| 2 | 2018-02-01 | 2018-02-28 | ValB |
| 2 | 2018-03-01 | 2018-03-31 | ValB |
| 2 | 2018-04-01 | 2018-04-30 | ValB |
| 2 | 2018-05-01 | 2018-05-31 | ValB |
| 2 | 2018-06-01 | 2018-06-30 | ValB |
| 3 | 2018-04-01 | 2018-04-30 | ValC |
| 3 | 2018-05-01 | 2018-05-12 | ValC |
+----------+------------+------------+-------+
This view basically creates a record for each record in the table, but split by month, using the correct dates (especially minding start and end dates that are not on the first or last day of the month).
The one record without a To date (so it's still valid to this day) is rendered until the last day of the month in which I render the view; at the time of writing, this is July 2018.
This is a simple example, but a solution will seriously help me along. I'll need this for multiple calculations, including proration of amounts.
Here's a table script and some insert statements that you can use:
CREATE TABLE [dbo].[Test]
(
[Id] INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
[From] SMALLDATETIME NOT NULL,
[To] SMALLDATETIME NULL,
[Value] NVARCHAR(100) NOT NULL
)
INSERT INTO dbo.Test ([From],[To],[Value])
VALUES
('2018-01-01','2018-03-31','ValA'),
('2018-01-16',null,'ValB'),
('2018-04-01','2018-05-12','ValC');
Thanks in advance!
Generate all months that might appear on your values (with start and end), then join where each month overlaps the period of your values. Change the result so if a month doesn't overlap fully, you just display the limits of your period.
DECLARE @StartDate DATE = '2018-01-01'
DECLARE @EndDate DATE = '2020-01-01'
;WITH GeneratedMonths AS
(
SELECT
StartDate = @StartDate,
EndDate = EOMONTH(@StartDate)
UNION ALL
SELECT
StartDate = DATEADD(MONTH, 1, G.StartDate),
EndDate = EOMONTH(DATEADD(MONTH, 1, G.StartDate))
FROM
GeneratedMonths AS G
WHERE
DATEADD(MONTH, 1, G.StartDate) < @EndDate
)
SELECT
T.Id,
[From] = CASE WHEN T.[From] >= G.StartDate THEN T.[From] ELSE G.StartDate END,
[To] = CASE WHEN G.EndDate >= T.[To] THEN T.[To] ELSE G.EndDate END,
T.Value
FROM
dbo.Test AS T
INNER JOIN GeneratedMonths AS G ON
G.EndDate >= T.[From] AND
G.StartDate <= ISNULL(T.[To], GETDATE())
ORDER BY
T.Id,
G.StartDate
OPTION
(MAXRECURSION 3000)
A recursive CTE is a very simple way if you don't have a large dataset:
with t as (
select id, [from], [to], Value
from Test
union all
select id, dateadd(mm, 1, [from]), [to], value
from t
where dateadd(mm, 1, [from]) < coalesce([to], getdate())
)
select id, [from], (case when eomonth([from]) <= coalesce([to], cast(getdate() as date))
then eomonth([from]) else coalesce([to], eomonth([from]))
end) as [To],
Value
from t
order by id;
By using date functions and recursive CTE.
with cte as
(
Select Id, Cast([From] as date) as [From], EOMONTH([from]) as [To1],
COALESCE([To],EOMONTH(GETDATE())) AS [TO],Value from test
UNION ALL
Select Id, DATEADD(DAY,1,[To1]),
CASE when EOMONTH(DATEADD(DAY,1,[To1])) > [To] THEN CAST([To] AS DATE)
ELSE EOMONTH(DATEADD(DAY,1,[To1])) END as [To1],
[To],Value from cte where TO1 <> [To]
)
Select Id, [From],[To1] as [To], Value from cte order by Id
@EzLo, your solution is good but requires setting 2 variables with fixed values.
To avoid this you can run the recursive CTE on the real data:
WITH A AS(
SELECT
T.Id, CAST(T.[From] AS DATE) AS [From], CASE WHEN T.[To]<EOMONTH(T.[From], 0) THEN T.[To] ELSE EOMONTH(T.[From], 0) END AS [To], T.Value, CAST(0 AS INTEGER) AS ADD_M
FROM
TEST T
UNION ALL
SELECT
T.Id, DATEADD(DAY, 1, EOMONTH(T.[From], -1+(A.ADD_M+1))), CASE WHEN T.[To]<EOMONTH(T.[From], A.ADD_M+1) THEN T.[To] ELSE EOMONTH(T.[From], A.ADD_M+1) END AS [To], T.Value, A.ADD_M+1
FROM
TEST T
INNER JOIN A ON T.Id=A.Id AND DATEADD(MONTH, A.ADD_M+1, T.[From]) < CASE WHEN T.[To] IS NULL THEN CAST(GETDATE() AS DATE) ELSE T.[To] END
)
SELECT
A.[Id], A.[From], A.[To], A.[Value]
FROM
A
ORDER BY A.[Id], A.[From]

SQL how to count census points occurring between date records

I’m using MS-SQL-2008 R2 trying to write a script that calculates the Number of Hospital Beds occupied on any given day, at 2 census points: midnight, and 09:00.
I’m working from a data set of patient Ward Stays. Basically, each row in the table is a record of an individual patient's stay on a single ward, and records the date/time the patient is admitted onto the ward, and the date/time the patient leaves the ward.
A sample of this table is below:
Ward_Stay_Primary_Key | Ward_Start_Date_Time | Ward_End_Date_Time
1 | 2017-09-03 15:04:00.000 | 2017-09-27 16:55:00.000
2 | 2017-09-04 18:08:00.000 | 2017-09-06 18:00:00.000
3 | 2017-09-04 13:00:00.000 | 2017-09-04 22:00:00.000
4 | 2017-09-04 20:54:00.000 | 2017-09-08 14:30:00.000
5 | 2017-09-04 20:52:00.000 | 2017-09-13 11:50:00.000
6 | 2017-09-05 13:32:00.000 | 2017-09-11 14:49:00.000
7 | 2017-09-05 13:17:00.000 | 2017-09-12 21:00:00.000
8 | 2017-09-05 23:11:00.000 | 2017-09-06 17:38:00.000
9 | 2017-09-05 11:35:00.000 | 2017-09-14 16:12:00.000
10 | 2017-09-05 14:05:00.000 | 2017-09-11 16:30:00.000
The key thing to note here is that a patient’s Ward Stay can span any length of time, from a few hours to many days.
The following code enables me to calculate the number of beds at both census points for any given day, by specifying the date in the case statement:
SELECT
'05/09/2017' [Date]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 00:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 00:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 00:00]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 09:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 09:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 09:00]
FROM
WardStaysTable
And, based on the sample 10 records above, generates this output:
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
05/09/2017 | 4 | 4
To perform this for any number of days is obviously onerous, so what I’m looking to create is a query where I can specify a start/end date parameter (e.g. 1st-5th Sept), and for the query to then evaluate the Ward_Start_Date_Time and Ward_End_Date_Time variables for each record, and – grouping by the dates defined in the date parameter – count each time the 00:00:00.000 and 09:00:00.000 census points fall between these 2 variables, to give an output something along these lines (based on the above 10 records):
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
01/09/2017 | 0 | 0
02/09/2017 | 0 | 0
03/09/2017 | 0 | 0
04/09/2017 | 1 | 1
05/09/2017 | 4 | 4
I’ve approached this (perhaps naively) thinking that if I use a cte to create a table of dates (defined by the input parameters), along with associated midnight and 9am census date/time points, then I could use these variables to group and evaluate the dataset.
So, this code generates the grouping dates and census date/time points:
DECLARE
@StartDate DATE = '01/09/2017'
,@EndDate DATE = '05/09/2017'
,@0900 INT = 540
SELECT
DATEADD(DAY, nbr - 1, @StartDate) [Date]
,CONVERT(DATETIME,(DATEADD(DAY, nbr - 1, @StartDate))) [MidnightDate]
,DATEADD(mi, @0900,(CONVERT(DATETIME,(DATEADD(DAY, nbr - 1, @StartDate))))) [0900Date]
FROM
(
SELECT
ROW_NUMBER() OVER ( ORDER BY c.object_id ) AS nbr
FROM sys.columns c
) nbrs
WHERE nbr - 1 <= DATEDIFF(DAY, @StartDate, @EndDate)
The stumbling block I’ve hit is how to join the cte to the WardStays dataset, because there’s no appropriate key… I’ve tried a few iterations of using a subquery to make this work, but either I’m taking the wrong approach or I’m getting my syntax in a mess.
In simple terms, the logic I’m trying to create to get the output is something like:
SELECT
[Date]
,SUM (case when WST.Ward_Start_Date_Time <= [MidnightDate] AND (WST.Ward_End_Date_Time >= [MidnightDate] OR WST.Ward_End_Date_Time IS NULL) then 1 else 0 end) [No. Beds Occupied at 00:00]
,SUM (case when WST.Ward_Start_Date_Time <= [0900Date] AND (WST.Ward_End_Date_Time >= [0900Date] OR WST.Ward_End_Date_Time IS NULL) then 1 else 0 end) [No. Beds Occupied at 09:00]
FROM WardStaysTable WST
GROUP BY [Date]
Is the above somehow possible, or am I barking up the wrong tree and need to take a different approach altogether? Appreciate any advice.
I would expect something like this:
WITH dates as (
SELECT CAST(@StartDate as DATETIME) as dte
UNION ALL
SELECT DATEADD(DAY, 1, dte)
FROM dates
WHERE dte < @EndDate
)
SELECT dates.dte [Date],
SUM(CASE WHEN Ward_Start_Date_Time <= dte AND
Ward_END_Date_Time >= dte
THEN 1 ELSE 0
END) as num_beds_0000,
SUM(CASE WHEN Ward_Start_Date_Time <= dte + CAST('09:00' as DATETIME) AND
Ward_END_Date_Time >= dte + CAST('09:00' as DATETIME)
THEN 1 ELSE 0
END) as num_beds_0900
FROM dates LEFT JOIN
WardStaysTable wt
ON wt.Ward_Start_Date_Time <= DATEADD(day, 1, dates.dte) AND
wt.Ward_END_Date_Time >= dates.dte
GROUP BY dates.dte
ORDER BY dates.dte;
The cte is just creating the list of dates.
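One caveat worth noting about the recursive dates CTE (not mentioned in the answer itself): SQL Server stops recursive CTEs after 100 recursions by default, so a range longer than about 100 days will fail unless the query ends with a MAXRECURSION hint, along these lines:

```sql
-- Same query tail as above, with the default 100-recursion cap lifted.
GROUP BY dates.dte
ORDER BY dates.dte
OPTION (MAXRECURSION 0);  -- 0 = no recursion limit
```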
What a cool exercise. Here is what I came up with:
CREATE TABLE #tmp (ID int, StartDte datetime, EndDte datetime)
INSERT INTO #tmp values(1,'2017-09-03 15:04:00.000','2017-09-27 06:55:00.000')
INSERT INTO #tmp values(2,'2017-09-04 08:08:00.000','2017-09-06 18:00:00.000')
INSERT INTO #tmp values(3,'2017-09-04 13:00:00.000','2017-09-04 22:00:00.000')
INSERT INTO #tmp values(4,'2017-09-04 20:54:00.000','2017-09-08 14:30:00.000')
INSERT INTO #tmp values(5,'2017-09-04 20:52:00.000','2017-09-13 11:50:00.000')
INSERT INTO #tmp values(6,'2017-09-05 13:32:00.000','2017-09-11 14:49:00.000')
INSERT INTO #tmp values(7,'2017-09-05 13:17:00.000','2017-09-12 21:00:00.000')
INSERT INTO #tmp values(8,'2017-09-05 23:11:00.000','2017-09-06 07:38:00.000')
INSERT INTO #tmp values(9,'2017-09-05 11:35:00.000','2017-09-14 16:12:00.000')
INSERT INTO #tmp values(10,'2017-09-05 14:05:00.000','2017-09-11 16:30:00.000')
DECLARE
@StartDate DATE = '09/01/2017'
,@EndDate DATE = '10/01/2017'
, @nHours INT = 9
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, @StartDate)
FROM (SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate) + 1)
ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_objects) AS x(n)
)
, CTE AS(
select OrderDate, t2.*
from #tmp t2
cross apply(select orderdate from d ) d
where StartDte >= @StartDate and EndDte <= @EndDate)
select OrderDate,
SUM(CASE WHEN OrderDate >= StartDte and OrderDate <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 00:00],
SUM(CASE WHEN StartDTE <= DateAdd(hour,@nHours,CAST(OrderDate as datetime)) and DateAdd(hour,@nHours,CAST(OrderDate as datetime)) <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 09:00]
from CTE
GROUP BY OrderDate
This should allow you to check for any hour of the day using the @nHours parameter if you so choose. If you only want to see records that actually fall within your date range, you can filter the cross apply on start and end dates.

Delete rows in single table in SQL Server where timestamp column differs

I have a table of employee timeclock punches that looks something like this:
| EmployeeID | PunchDate | PunchTime | PunchType | Sequence |
|------------|------------|-----------|-----------|----------|
| 5386 | 12/27/2016 | 03:57:42 | On Duty | 552 |
| 5386 | 12/27/2016 | 09:30:00 | Off Duty | 563 |
| 5386 | 12/27/2016 | 10:02:00 | On Duty | 564 |
| 5386 | 12/27/2016 | 12:10:00 | Off Duty | 570 |
| 5386 | 12/27/2016 | 12:22:00 | On Duty | 571 |
| 5386 | 12/27/2016 | 05:13:32 | Off Duty | 578 |
What I need to do is delete any rows where the difference in minutes between an Off Duty punch and the following On Duty punch is less than, say, 25 minutes. In the example above, I would want to remove Sequence 570 and 571.
I'm already creating this table by pulling all Off Duty punches from another table and using this query to pull all On Duty punches that follow an Off Duty punch:
INSERT INTO [dbo].[UpdatePunches] (EmployeeID,PunchDate,PunchTime,PunchType,Sequence)
SELECT * FROM [dbo].[Punches]
WHERE Sequence IN (
SELECT Sequence + 1
FROM [dbo].[Punches]
WHERE PunchType LIKE 'Off Duty%') AND
PunchType LIKE 'On Duty%'
I have been trying to fit some sort of DATEDIFF query both in this code and as a separate step to weed these out, but have not had any luck. I can't use specific Sequence numbers because those are going to change for every punch.
I'm using SQL Server 2008.
Any suggestions would be much appreciated.
You can assign rownumbers per employee based on punchdate and punchtime and join each row with the next based on ascending order of date and time.
Thereafter, get the rownumbers of those rows where the difference is less than 25 minutes and finally delete those rows.
with rownums as
(select t.*,row_number() over(partition by employeeid
order by cast(punchdate +' '+punchtime as datetime) ) as rn
from t)
,rownums_to_delete as
(
select r1.rn,r1.employeeid
from rownums r1
join rownums r2 on r1.employeeid=r2.employeeid and r1.rn=r2.rn+1
where dateadd(minute,25,cast(r2.punchdate +' '+r2.punchtime as datetime)) > cast(r1.punchdate +' '+r1.punchtime as datetime)
and r1.punchtype <> r2.punchtype
union all
select r2.rn, r2.employeeid
from rownums r1
join rownums r2 on r1.employeeid=r2.employeeid and r1.rn=r2.rn+1
where dateadd(minute,25,cast(r2.punchdate +' '+r2.punchtime as datetime)) > cast(r1.punchdate +' '+r1.punchtime as datetime)
and r1.punchtype <> r2.punchtype
)
delete r
from rownums_to_delete rd
join rownums r on rd.employeeid=r.employeeid and r.rn=rd.rn
Sample Demo
If date and time columns are not varchar but actual date and time datatype, use punchdate+punchtime in the query.
Edit: An easier version of the query would be
with todelete as (
select t1.employeeid,cast(t2.punchdate+' '+t2.punchtime as datetime) as punchtime,
t2.punchtype,t2.sequence,
cast(t1.punchdate+' '+t1.punchtime as datetime) next_punchtime,
t1.punchtype as next_punchtype,t1.sequence as next_sequence
from t t1
join t t2 on t1.employeeid=t2.employeeid
and cast(t2.punchdate+' '+t2.punchtime as datetime) between dateadd(minute,-25,cast(t1.punchdate+' '+t1.punchtime as datetime)) and cast(t1.punchdate+' '+t1.punchtime as datetime)
where t2.punchtype <> t1.punchtype
)
delete t
from t
join todelete td on t.employeeid = td.employeeid
and cast(t.punchdate+' '+t.punchtime as datetime) in (td.punchtime,td.next_punchtime)
;
SQL Server has a nice ability called updatable CTEs. Using lead() and lag(), you can do exactly what you want. The following assumes that the date is actually stored as a datetime -- this is just for the convenience of adding the date and time together (you can also use explicit conversion):
with todelete as (
select tcp.*,
(punchdate + punchtime) as punchdatetime,
lead(punchtype) over (partition by employeeid order by punchdate, punchtime) as next_punchtype,
lag(punchtype) over (partition by employeeid order by punchdate, punchtime) as prev_punchtype,
lead(punchdate + punchtime) over (partition by employeeid order by punchdate, punchtime) as next_punchdatetime,
lag(punchdate + punchtime) over (partition by employeeid order by punchdate, punchtime) as prev_punchdatetime
from timeclockpunches tcp
)
delete from todelete
where (punchtype = 'Off Duty' and
next_punchtype = 'On Duty' and
punchdatetime > dateadd(minute, -25, next_punchdatetime)
) or
(punchtype = 'On Duty' and
prev_punchtype = 'Off Duty' and
prev_punchdatetime > dateadd(minute, -25, punchdatetime)
);
EDIT:
In SQL Server 2008, you can use the same idea, just not as efficiently:
delete t
from t outer apply
(select top 1 tprev.*
from t tprev
where tprev.employeeid = t.employeeid and
(tprev.punchdate < t.punchdate or
(tprev.punchdate = t.punchdate and tprev.punchtime < t.punchtime)
)
order by tprev.punchdate desc, tprev.punchtime desc
) tprev outer apply
(select top 1 tnext.*
from t tnext
where tnext.employeeid = t.employeeid and
(t.punchdate < tnext.punchdate or
(t.punchdate = tnext.punchdate and t.punchtime < tnext.punchtime)
)
order by tnext.punchdate asc, tnext.punchtime asc
) tnext
where (t.punchtype = 'Off Duty' and
tnext.punchtype = 'On Duty' and
(t.punchdate + t.punchtime) > dateadd(minute, -25, (tnext.punchdate + tnext.punchtime))
) or
(t.punchtype = 'On Duty' and
tprev.punchtype = 'Off Duty' and
(tprev.punchdate + tprev.punchtime) > dateadd(minute, -25, (t.punchdate + t.punchtime))
);
You could create a DateTime from the Date and Time fields in a CTE and then lookup the next On Duty Time after the Off Duty Time like below:
;
WITH OnDutyDateTime AS
(
SELECT
EmployeeID,
Sequence,
DutyDateTime = DATEADD(ms, DATEDIFF(ms, '00:00:00', PunchTime), CONVERT(DATETIME, PunchDate))
FROM
#TempEmployeeData
where PunchType = 'On Duty'
),
OffDutyDateTime As
(
SELECT
EmployeeID,
Sequence,
DutyDateTime = DATEADD(ms, DATEDIFF(ms, '00:00:00', PunchTime), CONVERT(DATETIME, PunchDate))
FROM
#TempEmployeeData
where PunchType = 'Off Duty'
)
SELECT
OffDutyDateTime = DutyDateTime,
OnDutyDateTime = (SELECT TOP 1 DutyDateTime FROM OnDutyDateTime WHERE EmployeeID = A.EmployeeID AND Sequence > A.Sequence ORDER BY Sequence ASC ),
DiffInMinutes = DATEDIFF(minute,DutyDateTime,(SELECT TOP 1 DutyDateTime FROM OnDutyDateTime WHERE EmployeeID = A.EmployeeID AND Sequence > A.Sequence ORDER BY Sequence ASC ))
FROM
OffDutyDateTime A
OffDutyDateTime OnDutyDateTime DiffInMinutes
----------------------- ----------------------- -------------
2016-12-27 09:30:00.000 2016-12-27 10:02:00.000 32
2016-12-27 12:10:00.000 2016-12-27 12:22:00.000 12
2016-12-28 05:13:32.000 NULL NULL
(3 row(s) affected)
Maybe something like this would be easy to slap in there... This simply uses a subquery to find the next 'On Duty' punch and compare it in the main query to the 'Off Duty' punch.
Delete p
FROM [dbo].[Punches] p
where p.PunchTime >=
dateadd(minute, -25, isnull (
(select top 1 p2.PunchTime from [dbo].[Punches] p2 where
p2.EmployeeID=p.EmployeeID and p2.PunchType='On Duty'
and p.Sequence < p2.Sequence and p2.PunchDate=p.PunchDate
order by p2.Sequence asc)
,'2500-01-01'))
and p.PunchType='Off Duty'