Calculating a running total of when a value changes over a partition - sql

I am having trouble figuring out how to write a window function that solves my problem. I am quite the novice at window functions, but I think one could be written to meet my needs.
Problem Statement:
I want to calculate a transfer sequence showing when a person has changed locations, based on the corresponding location ID over time.
Sample Data (Table1)
+----------+------------+-----------+---------+
| PersonID | LocationID | Date | Time |
+----------+------------+-----------+---------+
| 12 | A | 6/17/2020 | 12:00PM |
+----------+------------+-----------+---------+
| 12 | A | 6/18/2020 | 1:00PM |
+----------+------------+-----------+---------+
| 12 | B | 6/18/2020 | 6:00AM |
+----------+------------+-----------+---------+
| 12 | C | 6/19/2020 | 3:00PM |
+----------+------------+-----------+---------+
| 13 | A | 6/16/2020 | 8:00AM |
+----------+------------+-----------+---------+
| 13 | A | 6/16/2020 | 11:00AM |
+----------+------------+-----------+---------+
| 13 | A | 6/16/2020 | 12:00AM |
+----------+------------+-----------+---------+
| 13 | B | 6/16/2020 | 4:00PM |
+----------+------------+-----------+---------+
Expected Results
+----------+------------+-----------+---------+-------------------+
| PersonID | LocationID | Date | Time | Transfer Sequence |
+----------+------------+-----------+---------+-------------------+
| 12 | A | 6/17/2020 | 12:00PM | 1 |
+----------+------------+-----------+---------+-------------------+
| 12 | A | 6/18/2020 | 1:00PM | 1 |
+----------+------------+-----------+---------+-------------------+
| 12 | B | 6/18/2020 | 6:00AM | 2 |
+----------+------------+-----------+---------+-------------------+
| 12 | C | 6/19/2020 | 3:00PM | 3 |
+----------+------------+-----------+---------+-------------------+
| 13 | A | 6/16/2020 | 8:00AM | 1 |
+----------+------------+-----------+---------+-------------------+
| 13 | A | 6/16/2020 | 11:00AM | 1 |
+----------+------------+-----------+---------+-------------------+
| 13 | A | 6/16/2020 | 12:00AM | 1 |
+----------+------------+-----------+---------+-------------------+
| 13 | B | 6/16/2020 | 4:00PM | 2 |
+----------+------------+-----------+---------+-------------------+
What I Tried
SELECT
[t1].[PersonID]
,[t1].[LocationID]
,[t1].[Date]
,[t1].[Time]
,DENSE_RANK()
OVER(
partition BY [t1].[PersonID], [t1].[LocationID]
ORDER BY [t1].[Date] ASC, [t1].[Time] ASC) AS
[Transfer Sequence]
FROM Table1 [t1]
Unfortunately, I believe DENSE_RANK() is assigning a rank regardless of whether the value of LocationID has changed. I need a function that will only add one to the sequence when the LocationID has changed.
Any help would be greatly appreciated.
Thank you!

You want to put "adjacent" rows in the same group. Straight window functions cannot do that for you; you would need to use a gaps-and-islands technique:
select
t.*,
sum(case when locationID = lagLocationID then 0 else 1 end)
over(partition by personID order by date, time)
as transfert_sequence
from (
select
t.*,
lag(locationID)
over(partition by personID order by date, time)
as lagLocationID
from mytable t
) t
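The idea is to compute a window sum that increments every time the locationID changes. For the first row of each person, lag() returns null, so the equality test is not true and the sequence starts at 1.
Note that this would properly handle the case when a person comes back to a location they have already been to before.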
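To make the revisit case concrete, here is a minimal self-contained sketch of the same technique. The table variable `@visits`, the single `VisitedAt` column (instead of separate date and time columns), and the sample rows are assumptions made for illustration; they are not the asker's actual schema.

```sql
-- Hypothetical sample data: person 12 goes A -> A -> B -> back to A.
DECLARE @visits TABLE (PersonID int, LocationID char(1), VisitedAt datetime2(0));
INSERT INTO @visits VALUES
    (12, 'A', '2020-06-17 12:00'),
    (12, 'A', '2020-06-17 13:00'),
    (12, 'B', '2020-06-18 06:00'),
    (12, 'A', '2020-06-19 15:00'),
    (13, 'A', '2020-06-16 08:00'),
    (13, 'B', '2020-06-16 16:00');

SELECT v.PersonID, v.LocationID, v.VisitedAt,
       -- Running count of "location changed" flags within each person.
       SUM(CASE WHEN v.LocationID = v.PrevLocationID THEN 0 ELSE 1 END)
           OVER (PARTITION BY v.PersonID
                 ORDER BY v.VisitedAt
                 ROWS UNBOUNDED PRECEDING) AS TransferSequence
FROM (
    SELECT PersonID, LocationID, VisitedAt,
           -- Location of the previous row for the same person (NULL on the first row).
           LAG(LocationID) OVER (PARTITION BY PersonID ORDER BY VisitedAt) AS PrevLocationID
    FROM @visits
) AS v
ORDER BY v.PersonID, v.VisitedAt;
-- TransferSequence for person 12: 1, 1, 2, 3 (the return to A starts a new group)
-- TransferSequence for person 13: 1, 2
```

A DENSE_RANK() partitioned by person and location cannot produce the 3 for the second stay at A, because both stays at A fall into the same partition.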

What I would do (and I'm sure it's not the best way) is create a second table ordered by PersonID, LocationID, Date, Time, with an empty field for the transfer sequence (sequence), then a cursor:
-- TRANSACTION is a reserved word in T-SQL, so the cursor gets another name
DECLARE transfer_cursor CURSOR
FOR SELECT PersonID, LocationID, [Date], [Time] FROM table1
ORDER BY PersonID, LocationID, [Date], [Time]
Then a loop:
-- Variable types are assumptions; adjust them to match table1
DECLARE @person int, @location varchar(10), @date date, @time time
DECLARE @count int = 0, @person_saved int = NULL, @location_saved varchar(10) = NULL
OPEN transfer_cursor
FETCH NEXT FROM transfer_cursor INTO @person, @location, @date, @time
WHILE @@FETCH_STATUS = 0
BEGIN
IF @person_saved IS NULL OR @person_saved <> @person -- changing PersonID, reset count
BEGIN
SET @count = 0
SET @person_saved = @person
SET @location_saved = NULL
END
IF @location_saved IS NULL OR @location_saved <> @location -- changing location, add count
BEGIN
SET @count = @count + 1
SET @location_saved = @location
END
UPDATE table1 SET sequence = @count WHERE PersonID = @person AND LocationID = @location AND [Date] = @date AND [Time] = @time
FETCH NEXT FROM transfer_cursor INTO @person, @location, @date, @time
END
CLOSE transfer_cursor
DEALLOCATE transfer_cursor

Related

SQL Server - Counting total number of days user had active contracts

I want to count the number of days during which a user had an active contract, based on a table with start and end dates for each service contract. I want to count any time with activity, no matter whether the customer had 1 or 5 contracts active at the same time.
+---------+-------------+------------+------------+
| USER_ID | CONTRACT_ID | START_DATE | END_DATE |
+---------+-------------+------------+------------+
| 1 | 14 | 18.02.2021 | 18.04.2022 |
| 1 | 13 | 02.01.2019 | 02.01.2020 |
| 1 | 12 | 01.01.2018 | 01.01.2019 |
| 1 | 11 | 13.02.2017 | 13.02.2019 |
| 2 | 23 | 19.06.2021 | 18.04.2022 |
| 2 | 22 | 01.07.2019 | 01.07.2020 |
| 2 | 21 | 19.01.2019 | 19.01.2020 |
+---------+-------------+------------+------------+
As a result I want a table:
+---------+--------------------+
| USER_ID | DAYS_BEEING_ACTIVE |
+---------+--------------------+
| 1 | 1477 |
| 2 | 832 |
+---------+--------------------+
Where
1477 stands for 1053 (days from 13.02.2017 to 02.01.2020; the user had active contracts during this whole time) + 424 (days from 18.02.2021 to 18.04.2022)
832 stands for 529 (days from 19.01.2019 to 01.07.2020) + 303 (days from 19.06.2021 to 18.04.2022).
I tried some queries with joins, datediff's, case when conditions but nothing worked. I'll be grateful for any help.
If you don't have a Tally/Numbers table (highly recommended), you can use an ad-hoc tally/numbers table
Example or dbFiddle
Select User_ID
,Days = count(DISTINCT dateadd(DAY,N,Start_Date))
from YourTable A
Join ( Select Top 10000 N=Row_Number() Over (Order By (Select NULL))
From master..spt_values n1, master..spt_values n2
) B
On N<=DateDiff(DAY,Start_Date,End_Date)
Group By User_ID
Results
User_ID Days
1 1477
2 832
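The query works by expanding every contract into one row per covered day and then counting each calendar day only once per user, so overlapping contracts do not inflate the total. A self-contained sketch of the same idea follows; the table variable `@contracts` and the digit-based tally CTE are assumptions for illustration (the tally replaces `master..spt_values` so the example does not depend on system tables). Because N starts at 1, the start day itself is not counted, which matches the DATEDIFF-style numbers in the question.

```sql
-- Sample rows copied from the question, with dates rewritten in ISO format.
DECLARE @contracts TABLE ([User_ID] int, Contract_ID int, Start_Date date, End_Date date);
INSERT INTO @contracts VALUES
    (1, 14, '2021-02-18', '2022-04-18'),
    (1, 13, '2019-01-02', '2020-01-02'),
    (1, 12, '2018-01-01', '2019-01-01'),
    (1, 11, '2017-02-13', '2019-02-13'),
    (2, 23, '2021-06-19', '2022-04-18'),
    (2, 22, '2019-07-01', '2020-07-01'),
    (2, 21, '2019-01-19', '2020-01-19');

WITH digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
tally AS (
    -- Ad-hoc numbers 1..10000; contracts longer than 10000 days would be truncated.
    SELECT a.d + 10 * b.d + 100 * c.d + 1000 * e.d + 1 AS N
    FROM digits a CROSS JOIN digits b CROSS JOIN digits c CROSS JOIN digits e
)
SELECT c.[User_ID],
       -- Each calendar day is counted once, however many contracts cover it.
       COUNT(DISTINCT DATEADD(DAY, t.N, c.Start_Date)) AS Days
FROM @contracts AS c
JOIN tally AS t
  ON t.N <= DATEDIFF(DAY, c.Start_Date, c.End_Date)
GROUP BY c.[User_ID];
-- User 1: 1477, User 2: 832 (the figures given in the question)
```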

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId) as total_days
from data
https://rextester.com/HCKOR53440
You need a subquery that computes the sum grouped by Client_id, and a join between your table and that subquery, e.g.:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column, I suspect they're being grouped somehow. And of course there are several other columns missing, but since I can't put a large scope in the outer query with this as inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
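As a cross-check against the sample data, here is a self-contained sketch. It is an alternative to the UNION ALL form above, not the answerer's method: it unpivots the two columns with CROSS APPLY (VALUES ...) so the table is read once. The table variable `@t` and the column types are assumptions; the rows are copied from the question.

```sql
DECLARE @t TABLE (ID int, [Time] char(2), Total int, [Weekday] int, [Weekend] int);
INSERT INTO @t VALUES
    (1001, 'AM', 5, 5, 0),
    (1001, 'AM', 2, 0, 2),
    (1001, 'AM', 4, 1, 3),
    (1001, 'AM', 5, 3, 2),
    (1001, 'PM', 5, 3, 2),
    (1001, 'PM', 5, 5, 0),
    (1002, 'PM', 4, 2, 2),
    (1002, 'PM', 3, 3, 0),
    (1002, 'PM', 1, 0, 1);

SELECT t.ID, v.DayType, t.[Time], SUM(v.Tasks) AS Tasks
FROM @t AS t
-- Turn each input row into two rows: one for Weekday, one for Weekend.
CROSS APPLY (VALUES ('Weekday', t.[Weekday]),
                    ('Weekend', t.[Weekend])) AS v(DayType, Tasks)
GROUP BY t.ID, t.[Time], v.DayType
ORDER BY t.ID, t.[Time], v.DayType;
-- 1001 Weekday AM 9
-- 1001 Weekend AM 7
-- 1001 Weekday PM 8
-- 1001 Weekend PM 2
-- 1002 Weekday PM 5
-- 1002 Weekend PM 3
```

Either form aggregates with SUM and GROUP BY. The plain UNION in the question looked arbitrary because UNION removes duplicate rows and nothing was being summed.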
Try this:
select ID, 'Weekday' as DayType, Time, sum(Weekday) as Tasks
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by 1, 3, 2
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;

Get employees still employed in August

I am trying to get records of employees who are still working and those who resigned in August. Here is my query:
DECLARE @MontStart datetime,@MonthEnd datetime
set @MontStart =cast('8/1/2015' as datetime)
set @MonthEnd = cast('8/31/2015' as datetime)
Select * from ( Select EmployeeNo, (Select LastName+','+FirstName from EmpPersonalInfo where EmployeeNo=s1.EmployeeNo) as EmployeeName,(Select Classtitle from Classification where ClassCode=s1.ClassCode) as Classification,Status,s1.EffectivityDateFrom,s1.EffectivityDateTo,
ROW_NUMBER() OVER (PARTITION BY EmployeeNo ORDER BY Status desc,cast(EffectivityDateTo as date) desc) AS Priority
FROM Employmenthistory s1)S2 where Priority=1 and (LTRIM(RTRIM(EmployeeNo))<>'' and NOT(EmployeeNo=''))
AND (@MontStart >= cast(EffectivityDateFrom as datetime) and (cast(EffectivityDateTo as datetime)>=@MontStart or (cast(EffectivityDateTo as datetime)<=@MonthEnd OR EffectivityDateTo is null)))
order by EmployeeName
But this query also returns employees who resigned in previous months and years.
Here is the result.
NULL value of EffectivityDateTo column means that employee is still employed (Status=1).
Status 1 = Employed/Active.
Status 0 = Inactive
Employee 901790 is still active though his EmployedTo is year 2010, he is still set as Active
EmployeeNo | EmployeeName | Status | EmployedFrom | EmployedTo
901790 | EmpName1 | 1 | 2008-07-28 | 2010-07-31
902566 | EmpName2 | 1 | 2013-01-25 | 2013-12-13
902502 | EmpName3 | 1 | 2012-08-15 | NULL
902309 | EmpName4 | 0 | 2011-07-12 | 2015-08-14
902575 | EmpName5 | 0 | 2013-03-11 | 2015-08-14
902706 | EmpName6 | 1 | 2014-03-24 | 2015-10-10
Expected result is this:
EmployeeNo | EmployeeName | Status | EmployedFrom | EmployedTo
902502 | EmpName3 | 1 | 2012-08-15 | NULL
902309 | EmpName4 | 0 | 2011-07-12 | 2015-08-14
902575 | EmpName5 | 0 | 2013-03-11 | 2015-08-14
902706 | EmpName6 | 1 | 2014-03-24 | 2015-10-10
I think your query could be a lot simpler. I'm not sure what the query would be to get to your first table, but I'll assume that's the Employmenthistory table
SELECT * FROM Employmenthistory
WHERE
(Status = 1 AND EmployedFrom <= @MonthEnd)
--Get all active employees that started before the end of the specific month
OR (Status = 0 AND EmployedTo >= @MonthStart AND EmployedTo <= @MonthEnd)
--Get all employees who stopped working in the given timeframe.
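Note that the expected result in the question leaves out employees 901790 and 902566, whose Status is 1 but whose EmployedTo lies well before August 2015. One hedged sketch that reproduces that expected result is to ignore Status and test only whether the employment period overlaps the month. The table variable `@EmploymentHistory`, its column types, and the decision to ignore Status are assumptions for illustration; the rows are copied from the question.

```sql
DECLARE @MonthStart date = '2015-08-01',
        @MonthEnd   date = '2015-08-31';

DECLARE @EmploymentHistory TABLE (
    EmployeeNo   varchar(10),
    EmployeeName varchar(50),
    [Status]     bit,
    EmployedFrom date,
    EmployedTo   date NULL
);
INSERT INTO @EmploymentHistory VALUES
    ('901790', 'EmpName1', 1, '2008-07-28', '2010-07-31'),
    ('902566', 'EmpName2', 1, '2013-01-25', '2013-12-13'),
    ('902502', 'EmpName3', 1, '2012-08-15', NULL),
    ('902309', 'EmpName4', 0, '2011-07-12', '2015-08-14'),
    ('902575', 'EmpName5', 0, '2013-03-11', '2015-08-14'),
    ('902706', 'EmpName6', 1, '2014-03-24', '2015-10-10');

SELECT EmployeeNo, EmployeeName, [Status], EmployedFrom, EmployedTo
FROM @EmploymentHistory
WHERE EmployedFrom <= @MonthEnd                            -- started on or before the month's end
  AND (EmployedTo IS NULL OR EmployedTo >= @MonthStart)    -- still employed, or left on/after the month's start
ORDER BY EmployeeName;
-- Returns EmpName3, EmpName4, EmpName5 and EmpName6, as in the expected result.
```

If Status should still be trusted for rows with a stale EmployedTo, this condition would need to be combined with the Status checks above rather than replace them.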

SQL Server: how do I get data from a history table?

Can you please help me build an SQL query to retrieve data from a history table?
I'm a newbie with only one week of coding experience. I've been trying simple SELECT statements so far but have hit a stumbling block.
My football club's database has three tables. The first one links balls to players:
BallDetail
| BallID | PlayerID | TeamID |
|--------|----------|--------|
| 1 | 11 | 21 |
| 2 | 12 | 22 |
The second one lists things that happen to the balls:
BallEventHistory
| BallID | Event | EventDate |
|--------|-------|------------|
| 1 | Pass | 2012-01-01 |
| 1 | Shoot | 2012-02-01 |
| 1 | Miss | 2012-03-01 |
| 2 | Pass | 2012-01-01 |
| 2 | Shoot | 2012-02-01 |
And the third one is a history change table. After a ball changes hands, history is recorded:
HistoryChanges
| BallID | ColumnName | ValueOld | ValueNew |
|--------|------------|----------|----------|
| 2 | PlayerID | 11 | 12 |
| 2 | TeamID | 21 | 22 |
I'm trying to obtain a table that would list all passes and shoots Player 11 had done to all balls before the balls went to other players. Like this:
| PlayerID | BallID | Event | Month |
|----------|--------|-------|-------|
| 11 | 1 | Pass | Jan |
| 11 | 1 | Shoot | Feb |
| 11 | 2 | Pass | Jan |
I started like this:
SELECT PlayerID, BallID, Event, DateName(month, EventDate)
FROM BallDetail bd INNER JOIN BallEventHistory beh ON bd.BallID = beh.BallID
WHERE PlayerID = 11 AND Event IN (Pass, Shoot) ...
But how to make sure that Ball 2 also gets included despite being with another player now?
Select PlayerID,BallID,Event,datename(month,EventDate) as Month,Count(*) as cnt from
(
Select
Coalesce(
(Select ValueNew from #HistoryChanges where ChangeDate=(Select max(ChangeDate) from #HistoryChanges h2 where h2.BallID=h.BallID and ColumnName='PlayerID' and ChangeDate<=EventDate) and BallID=h.BallID and ColumnName='PlayerID')
,(Select PlayerID from #BallDetail where BallID=h.BallID)
) as PlayerID,
h.BallID,h.Event,EventDate
from #BallEventHistory h
) a
Group by PlayerID, BallID, Event,datename(month,EventDate)
SELECT d.PlayerID, d.BallID, h.Event, DATENAME(mm, h.EventDate) AS Month
FROM BallDetail d JOIN BallEventHistory h ON d.BallID = h.BallID
WHERE h.Event IN ('Pass', 'Shoot') AND d.PlayerID = 11
OR EXISTS (SELECT 1
FROM dbo.HistoryChanges c
WHERE c.ValueOld = 11 AND c.ValueNew = d.PlayerID AND c.ColumnName = 'PlayerID' and c.ChangeDate = h.EventDate)