In SQL 2005 stored proc I need to run a query that contains a 1-M. I need to return only 1 of the Many table the one with the earliest date.
I have looked at In SQL how do I write a query to return 1 record from a 1 to many relationship?
and SQL conundrum, how to select latest date for part, but only 1 row per part (unique)
But I am not sure what's the best solution in my case as I am also doing a Insert Into temp table and using dynamic sorting and paging.
Here is my SQL. What I want is to return many rows of Foo, but only the earliest b.CreatedDate between the start and end data paramaters I pass in where there is normally about 5 rows in Bar for each Foo.
DECLARE #StartDate datetime
DECLARE #EndDate datetime
INSERT INTO #Results
SELECT distinct
f.Name,
f.Price
b.CreatedDate ,
// loads more columns removed for brevity
FROM
foo f
join bar b on f.Id = b.fooId
// loads more table removed for brevity
WHERE
(#x is null OR f.Id = #x)
AND (#Deal is null OR f.IsDeal = #Deal)
AND (#StartDate is null OR sd.SailingDate >= #StartDate)
AND (#EndDate is null OR sd.SailingDate <= #EndDate)
// loads more filters removed for brevity
declare #firstResult int, #lastResult int
set #firstResult = ((#PageNumber-1) * #ItemsPerPage) + 1;
set #lastResult = #firstResult + #ItemsPerPage;
select #TotalResults = count(1) from #Results;
WITH ResultItems AS
(
SELECT *, ROW_NUMBER() OVER (
ORDER BY
CASE WHEN #SortBy = 'priceLow' THEN Price END ASC,
CASE WHEN #SortBy = 'Soonest' THEN CreatedDate END ASC,
CASE WHEN #SortBy = 'priceHigh' THEN Price END DESC
) As RowNumber
FROM #Results r
)
SELECT * from ResultItems
WHERE RowNumber >= #firstResult AND RowNumber < #lastResult
ORDER BY
CASE
WHEN #SortBy = 'priceHigh' THEN (RANK() OVER (ORDER BY Price desc))
WHEN #SortBy = 'priceLow' THEN (RANK() OVER (ORDER BY Price))
WHEN #SortBy = 'Soonest' THEN (RANK() OVER (ORDER BY CreatedDate ))
END
This query as is will return multiple 'b.CreatedDate' instead of just the earliest one between my Filters
Update
So I want to See
If my source data is:
Foo
___
1 , Hello
2 , There
Boo
___
1, 1, 2011-2-4
2, 1, 2011-3-6
3, 1, 2012-12-21
4, 2, 2012-11-2
The result would be
1, Hello,2011-2-4
2, There, 2012-11-2
I think I just got it working by adding a CTE to the top of my query
;with cteMinDate as (
select FooId, min(CreatedDate) As CreatedDate
from Bar WHERE
(#StartDate is null OR CreatedDate>= #StartDate)
AND (#EndDate is null OR CreatedDate<= #EndDate)
group by FooId
)
Same as shown here SQL conundrum, how to select latest date for part, but only 1 row per part (unique). Doing this allows me to remove the date query part from my main query and only do it once in the CTE
Related
I'm trying to determine the number of records with consecutive dates (previous record ends on the same date as the start date of the next record) before and after a specified date, and ignore any consecutive records as soon as there is a break in the chain.
If I have the following data:
-- declare vars
DECLARE #dateToCheck date = '2020-09-20'
DECLARE #numRecsBefore int = 0
DECLARE #numRecsAfter int = 0
DECLARE #tempID int
-- temp table
CREATE TABLE #dates
(
[idx] INT IDENTITY(1,1),
[startDate] DATETIME ,
[endDate] DATETIME,
[prevEndDate] DATETIME
)
-- insert temp table
INSERT INTO #dates
( [startDate], [endDate] )
VALUES ( '2020-09-01', '2020-09-04' ),
( '2020-09-04', '2020-09-10' ),
( '2020-09-10', '2020-09-16' ),
( '2020-09-17', '2020-09-19' ),
( '2020-09-19', '2020-09-20' ),
--
( '2020-09-20', '2020-09-23' ),
( '2020-09-25', '2020-09-26' ),
( '2020-09-27', '2020-09-28' ),
( '2020-09-28', '2020-09-30' ),
( '2020-10-01', '2020-09-05' )
-- update with previous records endDate
DECLARE #maxRows int = (SELECT MAX(idx) FROM #dates)
DECLARE #intCount int = 0
WHILE #intCount <= #maxRows
BEGIN
UPDATE #dates SET prevEndDate = (SELECT endDate FROM #dates WHERE idx = (#intCount - 1) ) WHERE idx=#intCount
SET #intCount = #intCount + 1
END
-- clear any breaks in the chain?
-- number of consecutive records before this date
SET #numRecsBefore = (SELECT COUNT(idx) FROM #dates WHERE startDate = prevEndDate AND endDate <= #dateToCheck)
-- number of consecutive records after this date
SET #numRecsAfter = (SELECT COUNT(idx) FROM #dates WHERE startDate = prevEndDate AND endDate >= #dateToCheck)
-- return & clean up
SELECT * FROM #dates
SELECT #numRecsBefore AS numBefore, #numRecsAfter AS numAfter
DROP TABLE #dates
With the specified date being '2020-09-20, I would expect #numRecsBefore = 2 and #numRecsAfter = 1. That is not what I am getting, as its counting all the consecutive records.
There has to be a better way to do this. I know the loop isn't optimal, but I couldn't get LAG() or LEAD() to work. I've spend all morning trying different methods and searching, but everything I find doesn't deal with two dates, or breaks in the chain.
This reads like a gaps-and-island problem. Islands represents rows whose date ranges are adjacent, and you want to count how many records preceed of follow a current date in the same island.
You could do:
select
max(case when #dateToCheck > startdate and #dateToCheck <= enddate then numRecsBefore end) as numRecsBefore,
max(case when #dateToCheck >= startdate and #dateToCheck < enddate then numRecsAfter end) as numRecsAfter
from (
select d.*,
count(*) over(partition by grp order by startdate) as numRecsBefore,
count(*) over(partition by grp order by startdate desc) as numRecsAfter
from (
select d.*,
sum(case when startdate = lag_enddate then 0 else 1 end) over(order by startdate) as grp
from (
select d.*,
lag(enddate) over(order by startdate) as lag_enddate
from #dates d
) d
) d
) d
This uses lag() and a cumulative sum() to define the islands. The a window count gives the number and preceding and following records on the same island. The final step is conditional aggrgation; extra care needs to be taken on the inequalities to take in account various possibilites (typically, the date you search for might not always match a range bound).
Demo on DB Fiddle
I think this is what you are after, however, this does not give the results in your query; I suspect that is because they aren't the expected results? One of the conditional aggregated may also want to be a >= or <=, but I don't know which:
WITH CTE AS(
SELECT startDate,
endDate,
CASE startDate WHEN LAG(endDate) OVER (ORDER BY startDate ASC) THEN 1 END AS IsSame
FROM #dates d)
SELECT COUNT(CASE WHEN startDate < #dateToCheck THEN IsSame END) AS numBefore,
COUNT(CASE WHEN startDate > #dateToCheck THEN IsSame END) AS numAfter
FROM CTE;
I have a very large SQL database that I am pulling data to a web page. Instead of pulling every value, I want to take every 12th value. Is there a way to modify my current select statement?
SELECT *
FROM (
SELECT CAST(DateTimeUTC as SmallDateTime) as [DateTime],
CASE When DataValue = '-9999' Then null
When DataValue < '-60' Then null
Else DataValue
End DataValue, VariableID
FROM DataValues
WHERE SiteID = #siteID and VariableID IN(9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
) TableDate
PIVOT (SUM(DataValue) FOR VariableID IN ([9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30])) PivotTable ORDER BY [DateTime]
END
This works except the data is staggered from one column to the next. I am not sure why all the data points don't start at the same location.
See the screen shot below.
=
Using this theory (SQL Server) -
with rNum As(
SELECT t.*,RowNum = row_number() over (order by date)
FROM testdb.dbo.testtable t
)
select * from rNum where (RowNum % 12) = 0
Something like this -
with dVal As(
Select RowNum = row_number() over (order by datetime),DataValues.*
from datavalues)
SELECT *
FROM (
SELECT CAST(DateTimeUTC as SmallDateTime) as [DateTime],
CASE When DataValue = '-9999' Then null
When DataValue < '-60' Then null
Else DataValue
End DataValue, VariableID
FROM dVal
WHERE
/* divide by 12 has no remainder */
(RowNum % 12) = 0 and
SiteID = #siteID and VariableID IN(9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
) TableDate
PIVOT (SUM(DataValue) FOR VariableID IN ([9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30])) PivotTable ORDER BY [DateTime]
or,
Select * from DataValues d
where (Select count(*) from datavalues
where DateTimeUTC < d.DateTimeUTC) % 12 = 0
to start at the 12th row, instead of the first row,
Select * from DataValues d
where (Select count(*) from datavalues
where DateTimeUTC <= d.DateTimeUTC) % 12 = 0
or
Select * from DataValues d
where (Select count(*) from datavalues
where DateTimeUTC < d.DateTimeUTC) % 12 = 11
I have a table called test.
In test I have An ID, a value and a date.
The dates are ordered for each ID.
I want to select rows for an ID, before and after a change of value, so the following example table.
RowNum--------ID------- Value -------- Date
1------------------001 ---------1----------- 01/01/2015
2------------------001 ---------1----------- 02/01/2015
3------------------001 ---------1----------- 04/01/2015
4------------------001 ---------1----------- 05/01/2015
5------------------001 ---------1----------- 06/01/2015
6------------------001 ---------1----------- 08/01/2015
7------------------001 ---------0----------- 09/01/2015
8------------------001 ---------0----------- 10/01/2015
9------------------001 ---------0----------- 11/01/2015
10-----------------001 ---------1----------- 12/01/2015
11-----------------001 ---------1----------- 14/01/2015
12------------------002 ---------1----------- 01/01/2015
13------------------002 ---------1----------- 04/01/2015
14------------------002 ---------0----------- 05/01/2015
15------------------002 ---------0----------- 07/01/2015
The result would return rows 6, 7, 9, 10, 13, 14
You could use analytic functions LAG() and LEAD() to access value in preceding and following rows, then check that it does not match value in current row.
SELECT *
FROM (
SELECT RowNum,
ID,
Value,
Date,
LAG(VALUE, 1, VALUE) OVER(ORDER BY RowNum) PrevValue,
LEAD(VALUE, 1, VALUE) OVER(ORDER BY RowNum) NextValue
FROM test)
WHERE PrevValue <> Value
OR NextValue <> Value
Params passed to this functions are
some scalar expression (column name in this case);
offset (1 row before or after);
default value (LAG() will return NULL for first row and LEAD() will return NULL for last row, but they don't seem special in your question, so I used column value as default).
Refer the below one for without using LEAD and LAG:
DECLARE #i INT = 1,
#cnt INT,
#dstvalue INT,
#srcvalue INT
CREATE TABLE #result
(
id INT,
mydate DATE
)
CREATE TABLE #temp1
(
rn INT IDENTITY(1, 1),
id INT,
mydate DATE
)
INSERT INTO #temp1
(id,
mydate)
SELECT id,
mydate
FROM table
ORDER BY id,
mydate
SELECT #cnt = Count(*)
FROM #temp1
SELECT #srcvalue = value
FROM #temp1
WHERE rn = #i
WHILE ( #i <= #cnt )
BEGIN
SELECT #dstvalue = value
FROM #temp1
WHERE rn = #i
IF( #srcvalue = #dstvalue )
BEGIN
SET #i = #i + 1
CONTINUE;
END
ELSE
BEGIN
SET #srcvalue = #dstvalue
INSERT INTO #result
(id,
mydate)
SELECT id,
mydate
FROM #temp
WHERE rn = #i - 1
UNION ALL
SELECT id,
mydate
FROM #temp
WHERE rn = #i
END
SET #i = #i + 1
END
SELECT *
FROM #result
The answer using lag() and lead() is the right answer. If you are using a pre-SQL Server 2012 version, then you can do essentially the same thing using cross apply or a correlated subquery:
select t.*
from test t cross apply
(select top 1 tprev.*
from test tprev
where tprev.date < t.date
order by date desc
) tprev cross apply
(select top 1 tnext.*
from test tnext
where tnext.date > t.date
order by date asc
) tnext
where tprev.value <> tnext.value;
I want to populate a datetime column on the fly within a stored procedure. below is the query that I currently have that does same but slows down query performance.
CREATE TABLE #TaxVal
(
ID INT
, PaidDate DATETIME
, CustID INT
, CompID INT
)
INSERT INTO #TaxVal(ID, PaidDate, CustID, CompID)
VALUES(01, '20150201',12, 100)
, (03,'20150301', 18,101)
, (10,'20150401',19,22)
, (17,'20150401',02,11)
, (11,'20150411',18,201)
, (78,'20150421',18,299)
, (133,'20150407',18,101)
-- SELECT * FROM #TaxVal
DECLARE #StartDate DATETIME = '20150101'
, #EndDate DATETIME = '20150501'
DECLARE #Tab TABLE
(
CompID INT
, DateField DATETIME
)
DECLARE #T INT
SET #T = 0
WHILE #EndDate >= #StartDate + #T
BEGIN
INSERT INTO #Tab
SELECT CompID
, #StartDate + #T AS DateField
FROM #TaxVal
WHERE CustID = 18
AND CompID = 101
ORDER BY DateField DESC
SET #T = #T + 1
END
SELECT DISTINCT * FROM #Tab
DROP TABLE #TaxVal
Which is the best way to write this query for better performance?
Change this:
DECLARE #T INT
SET #T = 0
WHILE #EndDate >= #StartDate + #T
BEGIN
INSERT INTO #Tab
SELECT CompID
, #StartDate + #T AS DateField
FROM #TaxVal
WHERE CustID = 18
AND CompID = 101
ORDER BY DateField DESC
SET #T = #T + 1
END
to this:
;with cte as(
select cast('20150101' as date) as d
union all
select dateadd(dd, 1, d) as d from cte where d < '20150501'
)
INSERT INTO #Tab
SELECT CompID, d
FROM #TaxVal
cross join cte
WHERE CustID = 18 AND CompID = 101
Option(maxrecursion 0)
Here is recursive common table expression to get all dates in range. Then you do a cross join and insert. Notice that there is no sense to order set while inserting.
Giorgi's answer of
;with cte as(
select cast('20150101' as date) as d
union all
select dateadd(dd, 1, d) as d from cte where d < '20150501'
)
INSERT INTO #Tab
SELECT CompID, d
FROM #TaxVal
cross join cte
WHERE CustID = 18 AND CompID = 101
will work, but be careful with recursive CTE's. If the date range is large, you'll quickly hit your maximum recursion level. Often, a numbers table is used much like HABO mentioned. This is simply a table with a single column that only a integer so the rows would be 1, 2, 3, 4, 5, etc. You can then join the Numbers table (outer apply works well for this) and use the numbers with dateadd to get your incremental dates. Also note that you can run into an issue where the Numbers table doesn't contain enough rows for you date range.
This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 9 years ago.
I was wondering if I could get some help on a T-SQL function I am trying to create:
Here is some sample data that needs to be queried:
Simplified table:
ID|PersonID|ValueTypeID|ValueTypeDescription|Value
1|ZZZZZ000L6|ZZZZZ00071|Start Prison Date|3/28/2012
2|ZZZZZ000L6|ZZZZZ00071|Start Prison Date|10/10/2012
3|ZZZZZ000L6|ZZZZZ00072|End Prison Date |3/29/2012
4|ZZZZZ000MD|ZZZZZ00071|Start Prison Date|1/15/2012
5|ZZZZZ000MD|ZZZZZ00072|End Prison Date |2/15/2012
6|ZZZZZ000MD|ZZZZZ00071|Start Prison Date|4/1/2012
7|ZZZZZ000MD|ZZZZZ00072|End Prison Date |4/5/2012
8|ZZZZZ000MD|ZZZZZ00071|Start Prison Date|9/3/2012
9|ZZZZZ000MD|ZZZZZ00072|End Prison Date |12/1/2012
What I need is a T-SQL function that accepts the PersonID and the Year (#PID, #YR) and returns the number of days that person has been in prison for that year.
dbo.NumDaysInPrison(#PID, #YR) as int
Example:
dbo.NumDaysInPrison('ZZZZZ000L6', 2012) returns 84
dbo.NumDaysInPrison('ZZZZZ000MD', 2012) returns 124
So far, I have come up with this query that gives me the answer sometimes.
DECLARE #Year int
DECLARE #PersonID nvarchar(50)
SET #Year = 2012
SET #PersonID = 'ZZZZZ000AA'
;WITH StartDates AS
(
SELECT
Value,
ROW_NUMBER() OVER(ORDER BY Value) AS RowNumber
FROM Prisoners
WHERE ValueTypeDescription = 'Start Prison Date' AND PersonID = #PersonID AND YEAR(Value) = #Year
), EndDates AS
(
SELECT
Value,
ROW_NUMBER() OVER(ORDER BY Value) AS RowNumber
FROM Prisoners
WHERE ValueTypeDescription = 'End Prison Date' AND PersonID = #PersonID AND YEAR(Value) = #Year
)
SELECT
SUM(DATEDIFF(d, s.Value, ISNULL(e.Value, cast(str(#Year*10000+12*100+31) as date)))) AS NumDays
FROM StartDates s
LEFT OUTER JOIN EndDates e ON s.RowNumber = e.RowNumber
This fails to capture if a record earlier in the year was left without an end date:
for example if a person has only two records:
ID|PersonID|ValueTypeID|ValueTypeDescription|Value
1|ZZZZZ000AA|ZZZZZ00071|Start Prison Date|3/28/2012
2|ZZZZZ000AA|ZZZZZ00071|Start Prison Date|10/10/2012
(3/28/2012 -> End of Year)
(10/10/2012 -> End of Year)
will returns 360, not 278.
So it seems that you have the data that you need to split out your 'start date' values and your 'end date' values. You don't really need to loop through anything, you can just pull out your start values then your end values based on your person and compare them.
The important thing is to pull out all you need to begin with and then compare the appropriate values.
Here's an example based on your data above. It would need some heavy tweaking to work with production data; it makes assumptions about the Value data. It's also a bad idea to hard-code valuetypeid as I have here; if you're making a function, you'd want to handle that, I think.
DECLARE #pid INT, #yr INT;
WITH startdatecalc AS
(
SELECT personid, CAST([value] AS date) AS startdate, DATEPART(YEAR, CAST([value] AS date)) AS startyear
FROM incarctbl
WHERE valuetypeid = 'ZZZZZ00071'
),
enddatecalc AS
(
SELECT personid, CAST([value] AS date) AS enddate, DATEPART(YEAR, CAST([value] AS date)) AS endyear
FROM incarctbl
WHERE valuetypeid = 'ZZZZZ00072'
)
SELECT CASE WHEN startyear < #yr THEN DATEDIFF(day, CAST(CAST(#yr AS VARCHAR(4)) + '-01-01' AS date), ISNULL(enddatecalc.enddate, CURRENT_TIMESTAMP))
ELSE DATEDIFF(DAY, startdate, ISNULL(enddatecalc.enddate, CURRENT_TIMESTAMP)) END AS NumDaysInPrison
FROM startdatecalc
LEFT JOIN enddatecalc
ON startdatecalc.personid = enddatecalc.personid
AND enddatecalc.enddate >= startdatecalc.startdate
AND NOT EXISTS
(SELECT 1 FROM enddatecalc xref
WHERE xref.personid = enddatecalc.personid
AND xref.enddate < enddatecalc.enddate
AND xref.enddate >= startdatecalc.startdate
AND xref.endyear < #yr)
WHERE startdatecalc.personid = #pid
AND startdatecalc.startyear <= #yr
AND (enddatecalc.personid IS NULL OR endyear >= #yr);
EDIT: Added existence check to attempt to handle if the same personid was used multiple times in the same year.
Here's my implementation with test tables and data. You'll have to change where appropriate. NOTE: i take datediff + 1 for days in prison, so if you go in on monday and leave on tuesday, that counts as two days. if you want it to count as one day, remove the "+ 1"
create table PrisonRegistry
(
id int not null identity(1,1) primary key
, PersonId int not null
, ValueTypeId int not null
, Value date
)
-- ValueTypeIDs: 1 = start prison date, 2 = end prison date
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 1, 1, '2012-03-28' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 1, 1, '2012-10-12' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 1, 2, '2012-03-29' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 1, '2012-01-15' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 2, '2012-02-15' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 1, '2012-04-01' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 2, '2012-04-05' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 1, '2012-09-03' )
insert PrisonRegistry( PersonId, ValueTypeId, Value ) values ( 2, 2, '2012-12-1' )
go
create function dbo.NumDaysInPrison(
#personId int
, #year int
)
returns int
as
begin
declare #retVal int
set #retVal = 0
declare #valueTypeId int
declare #value date
declare #startDate date
declare #noDates bit
set #noDates = 1
set #startDate = DATEFROMPARTS( #year, 1, 1 )
declare prisonCursor cursor for
select
pr.ValueTypeId
, pr.Value
from
PrisonRegistry pr
where
DATEPART( yyyy, pr.Value ) = #year
and pr.ValueTypeId in (1,2)
and PersonId = #personId
order by
pr.Value
open prisonCursor
fetch next from prisonCursor
into #valueTypeId, #value
while ##FETCH_STATUS = 0
begin
set #noDates = 0
-- if end date, add date diff to retVal
if 2 = #valueTypeId
begin
--if #startDate is null
--begin
-- -- error: two end dates in a row
-- -- handle
--end
set #retVal = #retVal + DATEDIFF( dd, #startDate, #value ) + 1
set #startDate = null
end
else if 1 = #valueTypeId
begin
set #startDate = #value
end
fetch next from prisonCursor
into #valueTypeId, #value
end
close prisonCursor
deallocate prisonCursor
if #startDate is not null and 0 = #noDates
begin
set #retVal = #retVal + DATEDIFF( dd, #startDate, DATEFROMPARTS( #year, 12, 31 ) ) + 1
end
return #retVal
end
go
select dbo.NumDaysInPrison( 1, 2012 )
select dbo.NumDaysInPrison( 2, 2012 )
select dbo.NumDaysInPrison( 2, 2011 )
This is a complicated question. It is not so much "asking for a function" as it is dealing with two competing problems. The first is organizing the data, which is transaction-based, into records with start and stop dates for the prison period. The second is summarizing this for time spent within another given span of time (a year).
I think you need to spend some time investigating the data to understand the anomalies in it, before progressing to writing a function. The following query should help you. It does the calculate for all prisoners for a given year (which is the year in the first CTE):
with vals as (
select 2012 as yr
),
const as (
select cast(CAST(yr as varchar(255))+'-01-01' as DATE) as periodstart,
cast(CAST(yr as varchar(255))+'-12-31' as DATE) as periodend
from vals
)
select t.personId, SUM(datediff(d, (case when StartDate < const.periodStart then const.periodStart else StartDate end),
(case when EndDate > const.PeriodEnd or EndDate is NULL then const.periodEnd, else EndDate end)
)
) as daysInYear
from (select t.*, t.value as StartDate,
(select top 1 value
from t t2
where t.personId = t2.personId and t2.Value >= t.Value and t2.ValueTypeDescription = 'End Prison Date'
order by value desc
) as EndDate
from t
where valueTypeDescription = 'Start Prison Date'
) t cross join
const
where StartDate <= const.periodend and (EndDate >= const.periodstart or EndDate is NULL)
group by t.PersonId;
This query can be adapted as a function. But, I would encourage you to investigate the data before going there. Once you wrap things up in a function, it will be much more difficult to find and understand anomalies -- why did someone go in and out on the same day? How has the longest periods in prison? And so on.