Get total working hours from SQL table - sql

I have an attendance SQL table that stores the start and end day's punch of employee. Each punch (punch in and punch out) is in a separate record.
I want to calculate the total working hour of each employee for a requested month.
I tried to make a scalar function that takes two dates and employee ID and return the calculation of the above task, but it calculate only the difference of one date between all dates.
The data is like this:
000781 2015-08-14 08:37:00 AM EMPIN 539309898
000781 2015-08-14 08:09:48 PM EMPOUT 539309886
My code is:
#FromDate NVARCHAR(10)
,#ToDate NVARCHAR(10)
,#EmpID NVARCHAR(6)
CONVERT(NVARCHAR,DATEDIFF(HOUR
,(SELECT Time from PERS_Attendance att where attt.date between convert(date,#fromDate) AND CONVERT(Date,#toDate)
AND (EmpID= #EmpID OR ISNULL(#EmpID, '') = '') AND Funckey = 'EMPIN')
,(SELECT Time from PERS_Attendance att where attt.date between convert(date,#fromDate) AND CONVERT(Date,#toDate)
AND (EmpID= #EmpID OR ISNULL(#EmpID, '') = '') AND Funckey = 'EMPOUT') ))
FROM PERS_Attendance attt

One more approach that I think is simple and efficient.
It doesn't require modern functions like LEAD
it works correctly if the same person goes in and out several times during the same day
it works correctly if the person stays in over the midnight or even for several days in a row
it works correctly if the period when person is "in" overlaps the start OR end date-time.
it does assume that data is correct, i.e. each "in" is matched by "out", except possibly the last one.
Here is an illustration of a time-line. Note that start time happens when a person was "in" and end time also happens when a person was still "in":
All we need to do it calculate a plain sum of time differences between each event (both in and out) and start time, then do the same for end time. If event is in, the added duration should have a positive sign, if event is out, the added duration should have a negative sign. The final result is a difference between sum for end time and sum for start time.
summing for start:
|---| +
|----------| -
|-----------------| +
|--------------------------| -
|-------------------------------| +
--|====|--------|======|------|===|=====|---|==|---|===|====|----|=====|--- time
in out in out in start out in out in end out in out
summing for end:
|---| +
|-------| -
|----------| +
|--------------| -
|------------------------| +
|-------------------------------| -
|--------------------------------------| +
|-----------------------------------------------| -
|----------------------------------------------------| +
I would recommend to calculate durations in minutes and then divide result by 60 to get hours, but it really depends on your requirements. By the way, it is a bad idea to store dates as NVARCHAR.
DECLARE #StartDate datetime = '2015-08-01 00:00:00';
DECLARE #EndDate datetime = '2015-09-01 00:00:00';
DECLARE #EmpID nvarchar(6) = NULL;
WITH
CTE_Start
AS
(
SELECT
EmpID
,SUM(DATEDIFF(minute, (CAST(att.[date] AS datetime) + att.[Time]), #StartDate)
* CASE WHEN Funckey = 'EMPIN' THEN +1 ELSE -1 END) AS SumStart
FROM
PERS_Attendance AS att
WHERE
(EmpID = #EmpID OR #EmpID IS NULL)
AND att.[date] < #StartDate
GROUP BY EmpID
)
,CTE_End
AS
(
SELECT
EmpID
,SUM(DATEDIFF(minute, (CAST(att.[date] AS datetime) + att.[Time]), #StartDate)
* CASE WHEN Funckey = 'EMPIN' THEN +1 ELSE -1 END) AS SumEnd
FROM
PERS_Attendance AS att
WHERE
(EmpID = #EmpID OR #EmpID IS NULL)
AND att.[date] < #EndDate
GROUP BY EmpID
)
SELECT
CTE_End.EmpID
,(SumEnd - ISNULL(SumStart, 0)) / 60.0 AS SumHours
FROM
CTE_End
LEFT JOIN CTE_Start ON CTE_Start.EmpID = CTE_End.EmpID
OPTION(RECOMPILE);
There is LEFT JOIN between sums for end and start times, because there can be EmpID that has no records before the start time.
OPTION(RECOMPILE) is useful when you use Dynamic Search Conditions in T‑SQL. If #EmpID is NULL, you'll get results for all people, if it is not NULL, you'll get result just for one person.
If you need just one number (a grand total) for all people, then wrap the calculation in the last SELECT into SUM(). If you always want a grand total for all people, then remove #EmpID parameter altogether.
It would be a good idea to have an index on (EmpID,date).

My approach would be as follows:
CREATE FUNCTION [dbo].[MonthlyHoursByEmpID]
(
#StartDate Date,
#EndDate Date,
#Employee NVARCHAR(6)
)
RETURNS FLOAT
AS
BEGIN
DECLARE #TotalHours FLOAT
DECLARE #In TABLE ([Date] Date, [Time] Time)
DECLARE #Out TABLE ([Date] Date, [Time] Time)
INSERT INTO #In([Date], [Time])
SELECT [Date], [Time]
FROM PERS_Attendance
WHERE [EmpID] = #Employee AND [Funckey] = 'EMPIN' AND ([Date] > #StartDate AND [Date] < #EndDate)
INSERT INTO #Out([Date], [Time])
SELECT [Date], [Time]
FROM PERS_Attendance
WHERE [EmpID] = #Employee AND [Funckey] = 'EMPOUT' AND ([Date] > #StartDate AND [Date] < #EndDate)
SET #TotalHours = (SELECT SUM(CONVERT([float],datediff(minute,I.[Time], O.[Time]))/(60))
FROM #in I
INNER JOIN #Out O
ON I.[Date] = O.[Date])
RETURN #TotalHours
END

Assuming the entries are properly paired (in -> out -> in -> out -> in etc).
SQL Server 2012 and later:
DECLARE #Year int = 2015
DECLARE #Month int = 8
;WITH
cte AS (
SELECT EmpID,
InDate = LAG([Date], 1) OVER (PARTITION BY EmpID ORDER BY [Date]),
OutDate = [Date],
HoursWorked = DATEDIFF(hour, LAG([Date], 1) OVER (PARTITION BY EmpID ORDER BY [Date]), [Date]),
Funckey
FROM PERS_Attendance
)
SELECT EmpID,
TotalHours = SUM(HoursWorked)
FROM cte
WHERE Funckey = 'EMPOUT'
AND YEAR(InDate) = #Year
AND MONTH(InDate) = #Month
GROUP BY EmpID
SQL Server 2005 and later:
;WITH
cte1 AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY [Date])
FROM PERS_Attendance
),
cte2 AS (
SELECT a.EmpID, b.[Date] As InDate, a.[Date] AS OutDate,
HoursWorked = DATEDIFF(hour, b.[Date], a.[Date])
FROM cte1 a
LEFT JOIN cte1 b ON a.EmpID = b.EmpID and a.rn = b.rn + 1
WHERE a.Funckey = 'EMPOUT'
)
SELECT EmpID,
TotalHours = SUM(HoursWorked)
FROM cte2
WHERE YEAR(InDate) = #Year
AND MONTH(InDate) = #Month
GROUP BY EmpID

Related

my end goal is to see end of month data for previous month

My end goal is to see end of month data for previous month.
Our processing is a day behind so if today is 7/28/2021 our Process date is 7/27/2021
So, I want my data to be grouped.
DECLARE
#ProcessDate INT
SET #ProcessDate = (SELECT [PrevMonthEnddatekey] FROM dbo.dimdate WHERE datekey = (SELECT [datekey] FROM sometable [vwProcessDate]))
SELECT
ProcessDate
, LoanOrigRiskGrade
,SUM(LoanOriginalBalance) AS LoanOrigBalance
,Count(LoanID) as CountofLoanID
FROM SomeTable
WHERE
ProcessDate in (20210131, 20210228,20210331, 20210430, 20210531, 20210630)
I do not want to hard code these dates into my WHERE statement. I have attached a sample of my results.
I am GROUPING BY ProcessDate, LoanOrigRiskGrade
Then ORDERING BY ProcessDate, LoanOrigIRskGrade
It looks like you want the last day of the month for months within a specified range. You can parameterize that.
For SQL Server:
DECLARE #ProcessDate INT
SET #ProcessDate = (
SELECT [PrevMonthEnddatekey]
FROM dbo.dimdate
WHERE datekey = (
SELECT [datekey]
FROM sometable [vwProcessDate]
)
)
DECLARE #startDate DATE
DECLARE #endDate DATE
SET #startDate = '2021-01-01'
SET #endDate = '2021-06-30'
;
with d (dt, eom) as (
select #startDate
, convert(int, replace(convert(varchar(10), eomonth(#startDate), 102), '.', ''))
union all
select dateadd(month, 1, dt)
, eomonth(dateadd(month, 1, dt))
from d
where dateadd(month, 1, dt) < #endDate
)
SELECT ProcessDate
, LoanOrigRiskGrade
, SUM(LoanOriginalBalance) AS LoanOrigBalance
, Count(LoanID) as CountofLoanID
FROM SomeTable
inner join d on d.eom = SomeTable.ProcessDate
Difficult to check without sample data.

how to get the list from calendar table and merge to attendance table and leave table

I am using SQL Server 2008 R2. I have a problem in merging tables. We have 500+ employees
I have the following tables:
Calendar table - holds the dates from 1/1/2005 to 12/31/2016
Attendance table - for the attendance of employees
LeaveHistory table - for the leave history
LeaveBreakDown table -for leave break down
Holiday table - for holidays
Our goal, with date range from calendar (11/1/2015 - 11/30/2015)
we want to show the complete days even if the attendance is not equal to the total numbers of days.
Here's my first solution but too slow and without calendar table
FETCH NEXT FROM Employees INTO #EmployeeID,#BranchCode,#IsOfficer, #FirstName, #MiddleName, #LastName, #RankCode;
WHILE ##FETCH_STATUS = 0
BEGIN
WHILE #StartDate <= #EndDate
BEGIN
INSERT INTO #tblData(ActualDate,EmployeeID,BranchCode,IsOfficer, FirstName, MiddleName, LastName, RankCode, LeaveCode, Gender, ShiftCode,ShiftIn,ShiftOut,IsRestDay)
SELECT
#StartDate
, #EmployeeID
, #BranchCode
, #IsOfficer
, #FirstName
, #MiddleName
, #LastName
, #RankCode
, LB.LeaveBreakDownCode
, E.Gender
,ShiftCode = SS.ShiftCode
,ShiftIn = CONVERT(DATETIME, CONVERT(VARCHAR(20), DATEADD(day,0,#StartDate), 101) + ' ' + CONVERT(VARCHAR(20), SS.ShiftIn,108))
,ShiftOut = CASE
WHEN SS.ShiftOut < SS.ShiftIn THEN CONVERT(DATETIME, CONVERT(VARCHAR(20), DATEADD(day,1,#StartDate), 101) + ' ' + CONVERT(VARCHAR(20), SS.ShiftOut ,108))
ELSE CONVERT(DATETIME, CONVERT(VARCHAR(20), DATEADD(day,0,#StartDate), 101) + ' ' + CONVERT(VARCHAR(20), SS.ShiftOut ,108))
END
,IsRestDay = CASE
WHEN SS.Sunday = 1 AND DATEPART(weekday, #StartDate) = 1 THEN 1
WHEN SS.Monday = 1 AND DATEPART(weekday, #StartDate) = 2 THEN 1
WHEN SS.Tuesday = 1 AND DATEPART(weekday, #StartDate) = 3 THEN 1
WHEN SS.Wednesday = 1 AND DATEPART(weekday, #StartDate) = 4 THEN 1
WHEN SS.Thursday = 1 AND DATEPART(weekday, #StartDate) = 5 THEN 1
WHEN SS.Friday = 1 AND DATEPART(weekday, #StartDate) = 6 THEN 1
WHEN SS.Saturday = 1 AND DATEPART(weekday, #StartDate) = 7 THEN 1
ELSE 0
END
FROM Employees E
LEFT JOIN (
SELECT
LB.LeaveBreakDownCode
, LB.EmployeeID
FROM LeaveBreakDown LB
INNER JOIN LeaveHistory LH ON LH.LeaveHistoryCode = LB.LeaveHistoryCode AND LB.DateLeave = #StartDate AND LH.Status IN ('0','1')
WHERE LB.EmployeeID = #EmployeeID
) LB ON LB.EmployeeID = E.EmployeeID
LEFT JOIN ShiftSchedule SS ON SS.EmployeeID = E.EmployeeID AND #StartDate BETWEEN SS.EffectivityDate AND SS.EndDate
WHERE E.Status='1' AND E.ResignedDate IS NULL AND E.EmployeeID = #EmployeeID
SET #StartDate = DATEADD(day,1,#StartDate)
END
SET #StartDate = #InitialDate -- Reinitialize Start Date
FETCH NEXT FROM Employees INTO #EmployeeID, #BranchCode,#IsOfficer,#FirstName, #MiddleName, #LastName, #RankCode;
END;
CLOSE Employees;
DEALLOCATE Employees;
With this solution, if we are going to run the script. it took 3 minutes and sometimes 6minutes.
Might be the structure
Dates ('11/1/2015' - '11/30/2015')
-> Attendance
-> LeaveHistory
All dates from Dates table in date range will be filled with values from different table.
Don't use cursor.In fact it can be done without using RBAR/cursor.
You need to explain little more your table structure.Or start slowly with fewer column and table and ask where you are struck.
Calender Table is not require.What is the purpose of "ShiftSchedule" ?
You can do so by using recursive CTE.
Tell us more then we help you accordingly.
Recursive CTE sample.you can implemented your example here with fewer table and column then gradually increase.
DECLARE #StartDate DATETIME = '11/1/2015'
DECLARE #EndDate DATETIME = '11/30/2015';
WITH CTE
AS (
SELECT #StartDate DT
UNION ALL
SELECT DATEADD(day, 1, DT)
FROM CTE
WHERE DT < #EndDate
)
SELECT *
FROM CTE

How to find the total playing time per day for all the users in my sql server database

I have a table which contains following columns
userid,
game,
gameStarttime datetime,
gameEndtime datetime,
startdate datetime,
currentdate datetime
I can retrieve all the playing times but I want to count the total playing time per DAY and 0 or null if game not played on a specific day.
Take a look at DATEDIFF to do the time calculations. Your requirements are not very clear, but it should work for whatever you're looking to do.
Your end result would probably look something like this:
SELECT
userid,
game,
DATEDIFF(SS, gameStarttime, gameEndtime) AS [TotalSeconds]
FROM [source]
GROUP BY
userid,
game
In the example query above, the SS counts the seconds between the 2 dates (assuming both are not null). If you need just minutes, then MI will provide the total minutes. However, I imagine total seconds is best so that you can convert to whatever unit of measure you need accurate, such as hours that might be "1.23" or something like that.
Again, most of this is speculation based on assumptions and what you seem to be looking for. Hope that helps.
MSDN Docs for DATEDIFF: https://msdn.microsoft.com/en-us/library/ms189794.aspx
You may also look up DATEPART if you want the minutes and seconds separately.
UPDATED BASED ON FEEDBACK
The query below breaks out the hour breakdowns by day, splits time across multiple days, and shows "0" for days where no games are played. Also, for your output, I have to assume you have a separate table of users (so you can show users who have no time in your date range).
-- Define start date
DECLARE #BeginDate DATE = '4/21/2015'
-- Create sample data
DECLARE #Usage TABLE (
userid int,
game nvarchar(50),
gameStartTime datetime,
gameEndTime datetime
)
DECLARE #Users TABLE (
userid int
)
INSERT #Users VALUES (1)
INSERT #Usage VALUES
(1, 'sample', '4/25/2015 10pm', '4/26/2015 2:30am'),
(1, 'sample', '4/22/2015 4pm', '4/22/2015 4:30pm')
-- Generate list of days in range
DECLARE #DayCount INT = DATEDIFF(DD, #BeginDate, GETDATE()) + 1
;WITH CTE AS (
SELECT TOP (225) [object_id] FROM sys.all_objects
), [Days] AS (
SELECT TOP (#DayCount)
DATEADD(DD, ROW_NUMBER() OVER (ORDER BY x.[object_id]) - 1, #BeginDate) AS [Day]
FROM CTE x
CROSS JOIN CTE y
ORDER BY
[Day]
)
SELECT
[Days].[Day],
Users.userid,
SUM(COALESCE(CONVERT(MONEY, DATEDIFF(SS, CASE WHEN CONVERT(DATE, Usage.gameStartTime) < [Day] THEN [Day] ELSE Usage.gameStartTime END,
CASE WHEN CONVERT(DATE, Usage.gameEndTime) > [Day] THEN DATEADD(DD, 1, [Days].[Day]) ELSE Usage.gameEndTime END)) / 3600, 0)) AS [Hours]
FROM [Days]
CROSS JOIN #Users Users
LEFT OUTER JOIN #Usage Usage
ON Usage.userid = Users.userid
AND [Days].[Day] BETWEEN CONVERT(DATE, Usage.gameStartTime) AND CONVERT(DATE, Usage.gameEndTime)
GROUP BY
[Days].[Day],
Users.userid
The query above yields the output below for the sample data:
Day userid Hours
---------- ----------- ---------------------
2015-04-21 1 0.00
2015-04-22 1 0.50
2015-04-23 1 0.00
2015-04-24 1 0.00
2015-04-25 1 2.00
2015-04-26 1 2.50
2015-04-27 1 0.00
I've edited my sql on sql fiddle and I think this might get you what you asked for. to me it looks a little more simple then the answer you've accepted.
DECLARE #FromDate datetime, #ToDate datetime
SELECT #Fromdate = MIN(StartDate), #ToDate = MAX(currentDate)
FROM Games
-- This recursive CTE will get you all dates
-- between the first StartDate and the last CurrentDate on your table
;WITH AllDates AS(
SELECT #Fromdate As TheDate
UNION ALL
SELECT TheDate + 1
FROM AllDates
WHERE TheDate + 1 <= #ToDate
)
SELECT UserId,
TheDate,
COALESCE(
SUM(
-- When the game starts and ends in the same date
CASE WHEN DATEDIFF(DAY, GameStartTime, GameEndTime) = 0 THEN
DATEDIFF(HOUR, GameStartTime, GameEndTime)
ELSE
-- when the game starts in the current date
CASE WHEN DATEDIFF(DAY, GameStartTime, TheDate) = 0 THEN
DATEDIFF(HOUR, GameStartTime, DATEADD(Day, 1, TheDate))
ELSE -- meaning the game ends in the current date
DATEDIFF(HOUR, TheDate, GameEndTime)
END
END
),
0) As HoursPerDay
FROM (
SELECT DISTINCT UserId,
TheDate,
CASE
WHEN CAST(GameStartTime as Date) = TheDate
THEN GameStartTime
ELSE NULL
END As GameStartTime, -- return null if no game started that day
CASE
WHEN CAST(GameEndTime as Date) = TheDate
THEN GameEndTime
ELSE NULL
END As GameEndTime -- return null if no game ended that day
FROM Games CROSS APPLY AllDates -- This is where the magic happens :-)
) InnerSelect
GROUP BY UserId, TheDate
ORDER BY UserId, TheDate
OPTION (MAXRECURSION 0)
Play with it your self on sql fiddle.

get best sales rep weekly SQL

I need a bit of help with a SQL Server issue.
I have 2 tables:
complete_sales_raw
(
Id int Identity(1,1) PK,
RepId int FK in sale_reps,
Revenue decimal(15,2),
Sale_date datetime2(7)
)
and
sale_reps
(
Id int Identity(1,1) PK,
RepName nvarchar(50)
)
What I need to do is get best sales rep based on the total revenue for each week, starting with 2014-06-01 and ending at current date.
Each week has 7 days and the first day is 2014-06-01.
So far I got to here:
SELECT TOP(1)
sr.RepName as RepName,
SUM(csr.Revenue) as Revenue
INTO #tmp1
FROM complete_sales_raw csr
JOIN sale_reps sr on csr.RepId = sr.Id
WHERE DATEDIFF( d,'2014-06-01', Sale_date ) BETWEEN 0 and 6
GROUP BY sr.RepName
ORDER BY 2 desc
But this only returns the best sale rep for the first week and I need it for each week.
All help is appreciated.
ok so, I created a week table like so
IF ( OBJECT_ID('dbo.tmp4') IS NOT NULL )
DROP TABLE dbo.tmp4
GO
Create Table tmp4(
StartDate datetime,Enddate datetime,WeekNo varchar(20)
)
DECLARE
#start_date DATETIME,
#end_date DATETIME,
#start_date1 DATETIME,
#end_date1 DATETIME
DECLARE #Table table(StartDate datetime,Enddate datetime,WeekNo varchar(20))
Declare #WeekDt as varchar(10)
SET #start_date = '2014-06-01'
SET #end_date = '2015-01-03'
Set #WeekDt = DATEPART(WEEK,#start_date)
SET #start_date1 = #start_date
While #start_date<=#end_date
Begin
--Select #start_date,#start_date+1
IF #WeekDt<>DATEPART(WEEK,#start_date)
BEGIN
Set #WeekDt = DATEPART(WEEK,#start_date)
SET #end_date1=#start_date-1
INSERT INTO tmp4 Values(#start_date1,#end_date1,DATEPART(WEEK,#start_date1))
SET #start_date1 = #start_date
END
set #start_date = #start_date+1
END
GO
and then I used Gordon's answer and made this:
SELECT t.StartDate as StartDate, sr.RepName as RepName, SUM(csr.Revenue) as Revenue,
RANK() OVER (PARTITION BY (t.StartDate) ORDER BY SUM(csr.Revenue) desc) as seqnum into tmp1
FROM tmp4 t,
complete_sales_raw csr
JOIN sale_reps sr on csr.RepId = sr.Id
WHERE DATEDIFF( d,t.StartDate, MAS_PostDate ) BETWEEN 0 and 6
GROUP BY sr.RepName, t.StartDate
SELECT * FROM tmp1
WHERE seqnum = 1
ORDER BY StartDate
which returns the best sales_rep for each week
You can do an aggregation to get the total sales by week. This requires some manipulation of the dates to calculate the number of weeks -- basically dividing the days by 7.
Then, use rank() (or row_number() if you only want one when there are ties) to get the top value:
SELECT s.*
FROM (SELECT tsr.RepName as RepName,
(DATEDIFF(day, '2014-06-01', MAS_PostDate ) - 1) / 7 as weeknum,
SUM(csr.Revenue) as Revenue,
RANK() OVER (PARTITION BY (DATEDIFF(day, '2014-06-01', MAS_PostDate ) - 1) / 7 ORDER BY SUM(csr.Revenue)) as seqnum
FROM complete_sales_raw csr JOIN
sale_reps sr
on csr.RepId = sr.Id
WHERE DATEDIFF(day, '2014-06-01', MAS_PostDate ) BETWEEN 0 and 6
GROUP BY sr.RepName, (DATEDIFF(day, '2014-06-01', MAS_PostDate ) - 1) / 7
) s
WHERE seqnum = 1;

T-SQL function loop [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 9 years ago.
I was wondering if I could get some help on a T-SQL function I am trying to create:
Here is some sample data that needs to be queried:
Simplified table:
ID|PersonID|ValueTypeID|ValueTypeDescription|Value
1|ZZZZZ000L6|ZZZZZ00071|Start Prison Date|3/28/2012
2|ZZZZZ000L6|ZZZZZ00071|Start Prison Date|10/10/2012
3|ZZZZZ000L6|ZZZZZ00072|End Prison Date |3/29/2012
4|ZZZZZ000MD|ZZZZZ00071|Start Prison Date|1/15/2012
5|ZZZZZ000MD|ZZZZZ00072|End Prison Date |2/15/2012
6|ZZZZZ000MD|ZZZZZ00071|Start Prison Date|4/1/2012
7|ZZZZZ000MD|ZZZZZ00072|End Prison Date |4/5/2012
8|ZZZZZ000MD|ZZZZZ00071|Start Prison Date|9/3/2012
9|ZZZZZ000MD|ZZZZZ00072|End Prison Date |12/1/2012
What I need is a T-SQL function that accepts the PersonID and the Year (#PID, #YR) and returns the number of days that person has been in prison for that year.
dbo.NumDaysInPrison(#PID, #YR) as int
Example:
dbo.NumDaysInPrison('ZZZZZ000L6', 2012) returns 84
dbo.NumDaysInPrison('ZZZZZ000MD', 2012) returns 124
So far, I have come up with this query that gives me the answer sometimes.
DECLARE #Year int
DECLARE #PersonID nvarchar(50)
SET #Year = 2012
SET #PersonID = 'ZZZZZ000AA'
;WITH StartDates AS
(
SELECT
Value,
ROW_NUMBER() OVER(ORDER BY Value) AS RowNumber
FROM Prisoners
WHERE ValueTypeDescription = 'Start Prison Date' AND PersonID = #PersonID AND YEAR(Value) = #Year
), EndDates AS
(
SELECT
Value,
ROW_NUMBER() OVER(ORDER BY Value) AS RowNumber
FROM Prisoners
WHERE ValueTypeDescription = 'End Prison Date' AND PersonID = #PersonID AND YEAR(Value) = #Year
)
SELECT
SUM(DATEDIFF(d, s.Value, ISNULL(e.Value, cast(str(#Year*10000+12*100+31) as date)))) AS NumDays
FROM StartDates s
LEFT OUTER JOIN EndDates e ON s.RowNumber = e.RowNumber
This fails to capture if a record earlier in the year was left without an end date:
for example if a person has only two records:
ID|PersonID|ValueTypeID|ValueTypeDescription|Value
1|ZZZZZ000AA|ZZZZZ00071|Start Prison Date|3/28/2012
2|ZZZZZ000AA|ZZZZZ00071|Start Prison Date|10/10/2012
(3/28/2012 -> End of Year)
(10/10/2012 -> End of Year)
will returns 360, not 278.
So it seems that you have the data that you need to split out your 'start date' values and your 'end date' values. You don't really need to loop through anything, you can just pull out your start values then your end values based on your person and compare them.
The important thing is to pull out all you need to begin with and then compare the appropriate values.
Here's an example based on your data above. It would need some heavy tweaking to work with production data; it makes assumptions about the Value data. It's also a bad idea to hard-code valuetypeid as I have here; if you're making a function, you'd want to handle that, I think.
DECLARE #pid INT, #yr INT;
WITH startdatecalc AS
(
SELECT personid, CAST([value] AS date) AS startdate, DATEPART(YEAR, CAST([value] AS date)) AS startyear
FROM incarctbl
WHERE valuetypeid = 'ZZZZZ00071'
),
enddatecalc AS
(
SELECT personid, CAST([value] AS date) AS enddate, DATEPART(YEAR, CAST([value] AS date)) AS endyear
FROM incarctbl
WHERE valuetypeid = 'ZZZZZ00072'
)
SELECT CASE WHEN startyear < #yr THEN DATEDIFF(day, CAST(CAST(#yr AS VARCHAR(4)) + '-01-01' AS date), ISNULL(enddatecalc.enddate, CURRENT_TIMESTAMP))
ELSE DATEDIFF(DAY, startdate, ISNULL(enddatecalc.enddate, CURRENT_TIMESTAMP)) END AS NumDaysInPrison
FROM startdatecalc
LEFT JOIN enddatecalc
ON startdatecalc.personid = enddatecalc.personid
AND enddatecalc.enddate >= startdatecalc.startdate
AND NOT EXISTS
(SELECT 1 FROM enddatecalc xref
WHERE xref.personid = enddatecalc.personid
AND xref.enddate < enddatecalc.enddate
AND xref.enddate >= startdatecalc.startdate
AND xref.endyear < #yr)
WHERE startdatecalc.personid = #pid
AND startdatecalc.startyear <= #yr
AND (enddatecalc.personid IS NULL OR endyear >= #yr);
EDIT: Added existence check to attempt to handle if the same personid was used multiple times in the same year.
Here's my implementation with test tables and data. You'll have to change where appropriate. NOTE: i take datediff + 1 for days in prison, so if you go in on monday and leave on tuesday, that counts as two days. if you want it to count as one day, remove the "+ 1"
create table PrisonRegistry
(
id int not null identity(1,1) primary key
, PersonId int not null
, ValueTypeId int not null
, Value date
)
-- ValueTypeIDs: 1 = start prison date, 2 = end prison date
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 1, 1, '2012-03-28' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 1, 1, '2012-10-12' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 1, 2, '2012-03-29' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 1, '2012-01-15' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 2, '2012-02-15' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 1, '2012-04-01' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 2, '2012-04-05' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 1, '2012-09-03' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 2, '2012-12-1' )
go
create function dbo.NumDaysInPrison(
#personId int
, #year int
)
returns int
as
begin
declare #retVal int
set #retVal = 0
declare #valueTypeId int
declare #value date
declare #startDate date
declare #noDates bit
set #noDates = 1
set #startDate = DATEFROMPARTS( #year, 1, 1 )
declare prisonCursor cursor for
select
pr.ValueTypeId
, pr.Value
from
PrisonRegistry pr
where
DATEPART( yyyy, pr.Value ) = #year
and pr.ValueTypeId in (1,2)
and PersonId = #personId
order by
pr.Value
open prisonCursor
fetch next from prisonCursor
into #valueTypeId, #value
while ##FETCH_STATUS = 0
begin
set #noDates = 0
-- if end date, add date diff to retVal
if 2 = #valueTypeId
begin
--if #startDate is null
--begin
-- -- error: two end dates in a row
-- -- handle
--end
set #retVal = #retVal + DATEDIFF( dd, #startDate, #value ) + 1
set #startDate = null
end
else if 1 = #valueTypeId
begin
set #startDate = #value
end
fetch next from prisonCursor
into #valueTypeId, #value
end
close prisonCursor
deallocate prisonCursor
if #startDate is not null and 0 = #noDates
begin
set #retVal = #retVal + DATEDIFF( dd, #startDate, DATEFROMPARTS( #year, 12, 31 ) ) + 1
end
return #retVal
end
go
select dbo.NumDaysInPrison( 1, 2012 )
select dbo.NumDaysInPrison( 2, 2012 )
select dbo.NumDaysInPrison( 2, 2011 )
This is a complicated question. It is not so much "asking for a function" as it is dealing with two competing problems. The first is organizing the data, which is transaction-based, into records with start and stop dates for the prison period. The second is summarizing this for time spent within another given span of time (a year).
I think you need to spend some time investigating the data to understand the anomalies in it, before progressing to writing a function. The following query should help you. It does the calculate for all prisoners for a given year (which is the year in the first CTE):
with vals as (
select 2012 as yr
),
const as (
select cast(CAST(yr as varchar(255))+'-01-01' as DATE) as periodstart,
cast(CAST(yr as varchar(255))+'-12-31' as DATE) as periodend
from vals
)
select t.personId, SUM(datediff(d, (case when StartDate < const.periodStart then const.periodStart else StartDate end),
(case when EndDate > const.PeriodEnd or EndDate is NULL then const.periodEnd, else EndDate end)
)
) as daysInYear
from (select t.*, t.value as StartDate,
(select top 1 value
from t t2
where t.personId = t2.personId and t2.Value >= t.Value and t2.ValueTypeDescription = 'End Prison Date'
order by value desc
) as EndDate
from t
where valueTypeDescription = 'Start Prison Date'
) t cross join
const
where StartDate <= const.periodend and (EndDate >= const.periodstart or EndDate is NULL)
group by t.PersonId;
This query can be adapted as a function. But, I would encourage you to investigate the data before going there. Once you wrap things up in a function, it will be much more difficult to find and understand anomalies -- why did someone go in and out on the same day? How has the longest periods in prison? And so on.