Conditional SUM in SQL Server 2014 - sql

I am using SQL Server 2014. When I was testing my code I noticed a problem.
Assume that the maximum personal time is 80 hours.
SELECT
lsm.EmployeeName,
pd.absenceDate,
pd.amountInDays * 8 AS [HoursReported],
pd.status,
(SUM(CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END) OVER (partition by lsm.[EmployeeName] order by pd.absenceDate)) AS [TotalUsedHours],
( #maxPSHours ) - (sum(
CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END)
over (
partition by lsm.[EmployeeName] order by pd.absenceDate)) AS [TotalRemainingHours]
FROM
[LocationStaffMembers] lsm
INNER JOIN
[PersonalDays] pd ON lsm.staffMemberId = pd.staffMemberId
This query returns these results:
EmployeeName AbsenceDate HoursReported Status TotalUsedHours TotalRemainingHours
X 11/11/2015 4 approved 4 76
X 11/15/2015 8 approved 12 68
X 11/20/2015 2 decline 14 66
X 11/20/2015 2 approved 14 66
So the query works fine across different statuses, and the first 2 rows are correct. But when an employee has more than one action on the same day (decline, approved, etc.), every row for that day shows the total used and total remaining for the whole day.
Here is the expected result.
EmployeeName AbsenceDate HoursReported Status TotalUsedHours TotalRemainingHours
X 11/11/2015 4 approved 4 76
X 11/15/2015 8 approved 12 68
X 11/20/2015 2 decline 12 68
X 11/20/2015 2 approved 14 66

You are doing a cumulative sum that returns results based on the order of AbsenceDate (sum(...) over (partition by ... order by pd.absenceDate)). But your last 2 records have the exact same date (11/20/2015), at least according to what you are showing us. This creates an ambiguity.
With only an order by in the over clause, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on AbsenceDate are peers and each one receives the running total through the end of that date. That explains your current results, even though the rows themselves are returned to you in a different order (BTW, consider adding an order by clause to the query; otherwise, the order of the returned rows is not guaranteed).
If the 2 rows do in fact share the exact same date, you'll have to find a 2nd column to remove the ambiguity and add that to the order by clause in the cumulative sum window function. Maybe you could add a timestamp field that you can order by.
Or maybe you always want the declined status to be considered ahead of the approved status when the AbsenceDate is the same. Here is an example of a query that would do exactly that (notice the changes in the order by clauses):
SELECT
lsm.EmployeeName,
pd.absenceDate,
pd.amountInDays * 8 AS [HoursReported],
pd.status,
(SUM(CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END) OVER (partition by lsm.[EmployeeName] order by pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end)) AS [TotalUsedHours],
( #maxPSHours ) - (sum(
CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END)
over (
partition by lsm.[EmployeeName] order by pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end)) AS [TotalRemainingHours]
FROM
[LocationStaffMembers] lsm
INNER JOIN
[PersonalDays] pd ON lsm.staffMemberId = pd.staffMemberId
ORDER BY lsm.[EmployeeName],
pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end
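To see the effect of the tiebreaker in isolation, here is a minimal sketch that runs both window definitions side by side. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later), and the table name personal_days, its columns, and the status strings are assumptions made up for this illustration, loosely following the sample rows in the question.

```python
import sqlite3

# Hypothetical single-table version of the question's data (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personal_days "
    "(employee TEXT, absence_date TEXT, hours INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO personal_days VALUES (?, ?, ?, ?)",
    [
        ("X", "2015-11-11", 4, "approved"),
        ("X", "2015-11-15", 8, "approved"),
        ("X", "2015-11-20", 2, "decline"),
        ("X", "2015-11-20", 2, "approved"),
    ],
)

QUERY = """
SELECT absence_date,
       status,
       -- ordered by date only: rows sharing a date are peers and get the same total
       SUM(CASE WHEN status = 'approved' THEN hours ELSE 0 END)
           OVER (PARTITION BY employee ORDER BY absence_date) AS tied_total,
       -- date plus a status tiebreaker: declined rows sort ahead of approved ones
       SUM(CASE WHEN status = 'approved' THEN hours ELSE 0 END)
           OVER (PARTITION BY employee
                 ORDER BY absence_date,
                          CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS fixed_total
FROM personal_days
ORDER BY employee, absence_date,
         CASE WHEN status = 'approved' THEN 1 ELSE 0 END
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In this sketch the tied_total column repeats 14 for both rows dated 2015-11-20, while fixed_total shows 12 for the declined row and 14 for the approved one.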

Related

Snowflake SQL: trying to calculate time difference between subsets of subsequent rows

I have some data like the following in a Snowflake database
DEVICE_SERIAL  REASON_CODE  VERSION  MESSAGE_CREATED_AT   NEXT_REASON_CODE
BA1254862158   1            4        2022-06-23 02:06:03  4
BA1254862158   4            4        2022-06-23 02:07:07  1
BA1110001111   1            5        2022-06-16 16:19:04  4
BA1110001111   4            5        2022-06-16 17:43:04  1
BA1110001111   5            5        2022-06-20 14:37:45  4
BA1110001111   4            5        2022-06-20 17:31:12  1
That's the result of a previous query. I'm trying to get the difference between message_created_at timestamps for pairs of consecutive rows with the same device_serial, where the first row of the pair has a reason_code of 1 or 5 and the second row has a reason_code of 4.
For this example, my desired output would be
DEVICE_SERIAL  VERSION  DELTA_SECONDS
BA1254862158   4        64
BA1110001111   5        5040
BA1110001111   5        10407
It's easy to calculate the time difference between every pair of rows (just lead or lag + datediff). But I'm not sure how to structure a query to select only the desired rows so that I can get a datediff between them, without calculating spurious datediffs.
My ultimate goal is to see how these datediffs change between versions. I am but a lowly C programmer, my SQL-fu is weak.
with data as (
select *,
count(case when reason_code in (1, 5) then 1 end)
over (partition by device_serial order by message_created_at) as grp
/* or alternately bracket by the end code */
-- count(case when reason_code = 4 then 1 end)
-- over (partition by device_serial order by message_created_at desc) as grp
from T
)
select device_serial, min(version) as version,
datediff(second, min(message_created_at), max(message_created_at)) as delta_seconds
from data
group by device_serial, grp
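The grouping trick above can be tried on the sample rows with a small sketch. It runs the same idea through Python's built-in sqlite3 module rather than Snowflake, so datediff is replaced by a difference of strftime('%s', ...) values; the table name t comes from the answer, while the in-memory setup is an assumption for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (device_serial TEXT, reason_code INTEGER, "
    "version INTEGER, message_created_at TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("BA1254862158", 1, 4, "2022-06-23 02:06:03"),
        ("BA1254862158", 4, 4, "2022-06-23 02:07:07"),
        ("BA1110001111", 1, 5, "2022-06-16 16:19:04"),
        ("BA1110001111", 4, 5, "2022-06-16 17:43:04"),
        ("BA1110001111", 5, 5, "2022-06-20 14:37:45"),
        ("BA1110001111", 4, 5, "2022-06-20 17:31:12"),
    ],
)

QUERY = """
WITH data AS (
    SELECT *,
           -- the running count goes up by one at every start code (1 or 5),
           -- so each start row and the rows after it share one grp value
           COUNT(CASE WHEN reason_code IN (1, 5) THEN 1 END)
               OVER (PARTITION BY device_serial
                     ORDER BY message_created_at) AS grp
    FROM t
)
SELECT device_serial,
       MIN(version) AS version,
       -- SQLite stand-in for datediff(second, min(...), max(...))
       CAST(strftime('%s', MAX(message_created_at)) AS INTEGER)
         - CAST(strftime('%s', MIN(message_created_at)) AS INTEGER) AS delta_seconds
FROM data
GROUP BY device_serial, grp
ORDER BY device_serial, grp
"""

deltas = conn.execute(QUERY).fetchall()
for row in deltas:
    print(row)
```

On this sample each group holds exactly one start row and one row with reason_code 4, so the min/max difference is the wanted delta; groups of other shapes would need an extra filter.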

Calculate moving average with null values

I have a school graduation data set by year and subgroup. I have been provided the numerator, the denominator, and the single-year graduation rate, but I also need to calculate a 3-year moving average. A statistician who no longer works with us advised me that to do this I need the running total of the numerator over 3 years and the running total of the denominator over 3 years. I understand the math behind it and have checked my work by hand and in Excel with a few subgroups. I have also calculated this in T-SQL with no problem as long as there are no null records, but I'm struggling with the calculation when there are nulls or 0.
I have tried running the query accounting for null by using NULLIF:
SELECT
ID,
Bldg,
GradClass,
Sbgrp ,
TGrads,
TStus,
Rate,
/*Numerator Running total*/
SUM (TGrads) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) AS NumSum,
/*Denominator Running Total*/
SUM ( TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) AS DenSum,
/*Moving Year Average*/
(
( SUM ( TGrads) OVER ( partition BY DistrictID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ) / NULLIF ( ( SUM ( TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ), 0 ) * 100
) AS [3yrAvg]
FROM
KResults.DGSRGradBldg
First question, I was provided a record for all subgroups even if they didn’t have students in the subgroup. I want to keep the record so that all subgroups are accounted for within the district and since I know that they didn’t have data, can I substitute the Null values in Tgrads, TStus with a 0? If I do substitute those values with a 0 how can I show the rate as null?
Second question: how can I compute the rate with either a null or 0 denominator? I understand you can't divide by 0, but I want to maintain the record so it's easy and clear to see that they had no data. How can I do this? When I try to calculate this without accounting for Null I get two errors: 1) Divide by zero error encountered. (8134) and 2) Null value is eliminated by an aggregate or other SET operation. (8153).
Knowing I can’t divide by 0 or Null I modified my query to include NULLIF and when I do that the query runs with no errors but I don’t get accurate percentage for rates that are below 100%. All my rates are now either 100% or 0 - note the last row, the moving average of 2/3 is not 0.
Here’s what the data looks like if I try to account for nulls my Moving three year average shows as 0. Note the Moving three year Avg Column shows all 0.
ID Bldg GradClass Sbgrp TGrads TStus Rate NumSum DenSum 3yrAvg
A 1 2014 A1 46 49 93.9 46 49 0
A 1 2015 A1 41 46 89.1 87 95 0
A 1 2016 A1 47 49 95.9 134 144 0
A 1 2017 A1 38 40 95.0 126 135 0
A 1 2018 A1 59 59 98.3 143 148 0
A 1 2014 A2 1 1 100 1 1 100
A 1 2015 A2 1 1 100
A 1 2016 A2 1 1 100
A 1 2017 A2 2 3 66.7 2 3 0
A 1 2018 A2 2 2 100 4 5 0
Any advice would be appreciated but please provide suggestions kindly to this newbie.
Thanks for your time and help.
Answer to question 1: put this in the SELECT list:
ISNULL(TGrads,0) AS TGRADS,
ISNULL(TStus,0) AS TSTUS,
Answer to question 2: I'd do this
(CASE WHEN SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) IS NOT NULL
AND SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) <>0
THEN (SUM(TGrads) OVER ( partition BY DistrictID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) / (SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ) ) * 100
ELSE NULL END
) AS [3yrAvg]
I put NULL after ELSE; you can choose your own default value.
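One possible reason for the all-0 and all-100 averages described in the question is integer division: if TGrads and TStus are integer columns (an assumption, since the column types are not shown), the quotient is truncated before it is multiplied by 100. The sketch below illustrates this with Python's built-in sqlite3 module, which truncates integer division the same way; the table name grads, the lowercase column names, and the no-data subgroup A9 are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grads "
    "(sbgrp TEXT, gradclass INTEGER, tgrads INTEGER, tstus INTEGER)"
)
conn.executemany(
    "INSERT INTO grads VALUES (?, ?, ?, ?)",
    [
        ("A1", 2014, 46, 49),
        ("A1", 2015, 41, 46),
        ("A1", 2016, 47, 49),
        ("A9", 2014, None, None),  # hypothetical subgroup with no students
    ],
)

QUERY = """
SELECT sbgrp,
       gradclass,
       -- integer / integer truncates, so any rate below 100% collapses to 0
       SUM(tgrads) OVER (PARTITION BY sbgrp ORDER BY gradclass
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
         / NULLIF(SUM(tstus) OVER (PARTITION BY sbgrp ORDER BY gradclass
                                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 0)
         * 100 AS int_avg,
       -- multiplying by 100.0 first forces decimal arithmetic;
       -- NULLIF turns a 0 denominator into NULL, so the rate stays NULL
       ROUND(100.0
             * SUM(tgrads) OVER (PARTITION BY sbgrp ORDER BY gradclass
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
             / NULLIF(SUM(tstus) OVER (PARTITION BY sbgrp ORDER BY gradclass
                                       ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 0),
             1) AS avg_3yr
FROM grads
ORDER BY sbgrp, gradclass
"""

avg_rows = conn.execute(QUERY).fetchall()
for row in avg_rows:
    print(row)
```

In this sketch the int_avg column is 0 for every A1 row, while avg_3yr carries the decimal rate; the A9 row keeps NULL in both columns, which keeps the record visible without a divide-by-zero error.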

Finding first sighting in SQL

We have a time series in a Spark SQL table which describes every time a user does an event.
However, users tend to do many events in a burst. I want to find the min time for every one of these bursts.
Unfortunately this is historical data, so I can't change how the table was created. So I essentially want a select min(time_), user from my_table group by user, but for each burst. Any help would be much appreciated!
EDIT:
Some example data would be:
user time_
0 10
0 11
2 12
0 12
2 13
2 15
0 83
0 84
0 85
so for example in the above data I would like to find (0, 10), (2, 12) and (0, 83). We can say that a burst occurs if it is within 1 hour (that would be 60 in the above example data).
If this is the only information you need:
select user
,time_
from (select user
,time_
,case when time_ - lag (time_,1,time_-60) over (partition by user order by time_) >= 60 then 'Y' else null end as burst
from my_table
) t
where burst = 'Y'
;
user time_
0 10
0 83
2 12
If you'll need to gather some additional information on each burst:
select user
,burst_seq
,min (time_) as min_time_
,max (time_) as max_time_
,count (*) as events_num
from (select user
,time_
,count(burst) over
(
partition by user
order by time_
rows unbounded preceding
) + 1 as burst_seq
from (select user
,time_
,case when time_ - lag (time_) over (partition by user order by time_) >= 60 then 'Y' else null end as burst
from my_table
) t
) t
group by user
,burst_seq
;
user burst_seq min_time_ max_time_ events_num
0 1 10 12 3
0 2 83 85 3
2 1 12 15 3
P.S.
There seems to be a bug with the CASE expression.
case when ... then 'Y' end yields FAILED: IndexOutOfBoundsException Index: 2, Size: 2, although it is legal syntax.
Adding else null solved it.
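The gap test behind both queries can be checked on the sample data with a short sketch. It uses Python's built-in sqlite3 module instead of Spark SQL, calls LAG without a default value, and treats a missing previous row as the start of a burst; the column name user_id (in place of user) and the table setup are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, time_ INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(0, 10), (0, 11), (2, 12), (0, 12), (2, 13), (2, 15), (0, 83), (0, 84), (0, 85)],
)

QUERY = """
SELECT user_id, time_
FROM (
    SELECT user_id,
           time_,
           -- distance to the same user's previous event (NULL for the first one)
           time_ - LAG(time_) OVER (PARTITION BY user_id ORDER BY time_) AS gap
    FROM my_table
) AS t
-- a burst starts at a user's first event or after a gap of 60 or more
WHERE gap IS NULL OR gap >= 60
ORDER BY time_, user_id
"""

burst_starts = conn.execute(QUERY).fetchall()
print(burst_starts)
```

The threshold of 60 mirrors the question's example units; with real timestamps it would be whatever one hour is in the stored unit.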

Count of distinct values per day, excluding reoccurring until value changes

I'm really struggling with how to explain this so I'll try and give you the format of the table below, and the desired outcome.
I have a table which contains a uniqueID, date, userID and result. I'm trying to count the number of results that are 'Correct' per day, but I only want to count unique occurrences based on the userID column. I then want to exclude any further occurrences of 'Correct' for that particular userID, until the result for the userID changes to 'Success'.
UID Date UserID Result
1 01/01/2014 5 Correct
2 01/01/2014 5 Correct
3 02/01/2014 4 Correct
4 03/01/2014 4 Correct
5 03/01/2014 5 Incorrect
6 03/01/2014 4 Incorrect
7 05/01/2014 5 Correct
8 07/01/2014 4 Correct
9 08/01/2014 5 Success
10 08/01/2014 4 Success
Based on the above data, I'd expect to see the below:
Date Correct Success
01/01/2014 1 0
02/01/2014 1 0
03/01/2014 0 0
05/01/2014 0 0
07/01/2014 0 0
08/01/2014 0 2
Can anyone help? I'm using SQL Server 2008
Use count(distinct) with case:
select date,
count(distinct case when result = 'Correct' then UserId end) as Correct,
count(distinct case when result = 'Success' then UserId end) as Success
from data d
group by date
order by date;
EDIT:
The above counts correct on all occurrences. If you only want the first one to be counted:
select date,
count(case when result = 'Correct' and seqnum = 1 then UserId end) as Correct,
count(case when result = 'Success' and seqnum = 1 then UserId end) as Success
from (select d.*,
row_number() over (partition by UserId, result order by Uid) as seqnum
from data d
) d
group by date
order by date;
In this case, the distinct is unnecessary.
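As a quick check of the row_number() approach against the sample table, here is a sketch using Python's built-in sqlite3 module in place of SQL Server. The column name event_date, the ISO date strings, and the explicit group by are assumptions added so the query runs on its own; the counting logic follows the second query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (uid INTEGER, event_date TEXT, userid INTEGER, result TEXT)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "2014-01-01", 5, "Correct"),
        (2, "2014-01-01", 5, "Correct"),
        (3, "2014-01-02", 4, "Correct"),
        (4, "2014-01-03", 4, "Correct"),
        (5, "2014-01-03", 5, "Incorrect"),
        (6, "2014-01-03", 4, "Incorrect"),
        (7, "2014-01-05", 5, "Correct"),
        (8, "2014-01-07", 4, "Correct"),
        (9, "2014-01-08", 5, "Success"),
        (10, "2014-01-08", 4, "Success"),
    ],
)

QUERY = """
SELECT event_date,
       COUNT(CASE WHEN result = 'Correct' AND seqnum = 1 THEN userid END) AS correct,
       COUNT(CASE WHEN result = 'Success' AND seqnum = 1 THEN userid END) AS success
FROM (
    SELECT d.*,
           -- seqnum = 1 marks the first row per user and result value
           ROW_NUMBER() OVER (PARTITION BY userid, result ORDER BY uid) AS seqnum
    FROM data d
) AS d
GROUP BY event_date
ORDER BY event_date
"""

daily = conn.execute(QUERY).fetchall()
for row in daily:
    print(row)
```

Note that this counts only the first 'Correct' per user over the whole table, which matches the expected output shown in the question; it does not restart the count after a 'Success'.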

Referencing the value of the previous calculated value in Oracle

How can one reference a calculated value from the previous row in a SQL query? In my case each row is an event that somehow manipulates the same value from the previous row.
The raw data looks like this:
Eventno Eventtype Totalcharge
3 ACQ 32
2 OUT NULL
1 OUT NULL
Let's say each Eventtype=OUT should halve the previous row's totalcharge in a column called Remaincharge:
Eventno Eventtype Totalcharge Remaincharge
3 ACQ 32 32
2 OUT NULL 16
1 OUT NULL 8
I've already tried the LAG analytic function but that does not allow me to get a calculated value from the previous row. Tried something like this:
LAG(remaincharge, 1, totalcharge) OVER (PARTITION BY ...) as remaincharge
But this didn't work because remaincharge could not be found.
Any ideas how to achieve this? I would need an analytic function like the cumulative sum, but one that takes a custom function with access to the previous calculated value.
Thank you in advance!
Update problem description
I'm afraid my example problem was too general; here is a better problem description:
What remains of totalcharge is decided by the ratio of outqty/(previous remainqty).
Eventno Eventtype Totalcharge Remainqty Outqty
4 ACQ 32 100 0
3 OTHER NULL 100 0
2 OUT NULL 60 40
1 OUT NULL 0 60
Eventno Eventtype Totalcharge Remainqty Outqty Remaincharge
4 ACQ 32 100 0 32
3 OTHER NULL 100 0 32 - (0/100 * 32) = 32
2 OUT NULL 60 40 32 - (40/100 * 32) = 12.8
1 OUT NULL 0 60 12.8 - (60/60 * 12.8) = 0
In your case you could work out the first value using the FIRST_VALUE() analytic function and the power of 2 that you have to divide by with RANK() in a sub-query and then use that. It's very specific to your example but should give you the general idea:
select eventno, eventtype, totalcharge
, case when eventtype <> 'OUT' then firstcharge
else firstcharge / power(2, "rank" - 1)
end as remaincharge
from ( select a.*
, first_value(totalcharge) over
( partition by 1 order by eventno desc ) as firstcharge
, rank() over ( partition by 1 order by eventno desc ) as "rank"
from the_table a
)
Here's a SQL Fiddle to demonstrate. I haven't partitioned by anything because you've got nothing in your raw data to partition by...
A variation on Ben's answer to use a windowing clause, which seems to take care of your updated requirements:
select eventno, eventtype, totalcharge, remainingqty, outqty,
initial_charge - case when running_outqty = 0 then 0
else (running_outqty / 100) * initial_charge end as remainingcharge
from (
select eventno, eventtype, totalcharge, remainingqty, outqty,
first_value(totalcharge) over (partition by null
order by eventno desc) as initial_charge,
sum(outqty) over (partition by null
order by eventno desc
rows between unbounded preceding and current row)
as running_outqty
from t42
);
Except it gives 19.2 instead of 12.8 for the third row, but that's what your formula suggests it should be:
EVENTNO EVENT TOTALCHARGE REMAININGQTY OUTQTY REMAININGCHARGE
---------- ----- ----------- ------------ ---------- ---------------
4 ACQ 32 100 0 32
3 OTHER 100 0 32
2 OUT 60 40 19.2
1 OUT 0 60 0
If I add another split so it goes from 60 to zero in two steps, with another non-OUT record in the mix too:
EVENTNO EVENT TOTALCHARGE REMAININGQTY OUTQTY REMAININGCHARGE
---------- ----- ----------- ------------ ---------- ---------------
6 ACQ 32 100 0 32
5 OTHER 100 0 32
4 OUT 60 40 19.2
3 OUT 30 30 9.6
2 OTHER 30 0 9.6
1 OUT 0 30 0
There's an assumption that the remaining quantity is consistent and you can effectively track a running total of what has gone before, but from the data you've shown that looks plausible. The inner query calculates that running total for each row, and the outer query does the calculation; that could be condensed but is hopefully clearer like this...
Ben's answer is the better one (will probably perform better) but you can also do it like this:
select t.*, (connect_by_root Totalcharge) / power (2,level-1) Remaincharge
from the_table t
start with EVENTTYPE = 'ACQ'
connect by prior eventno = eventno + 1;
I think it's easier to read.
Here is a demo
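For the updated requirement, where each row depends on the previously calculated remaincharge and the previous remainqty, another option is to walk the rows one at a time with a recursive query. This is not taken from any of the answers above; it is a minimal sketch using a recursive CTE in SQLite through Python's built-in sqlite3 module, with a made-up table name events, and it assumes event numbers are consecutive. The same idea would need adapting to Oracle's recursive subquery factoring syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (eventno INTEGER, eventtype TEXT, "
    "totalcharge REAL, remainqty REAL, outqty REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (4, "ACQ", 32, 100, 0),
        (3, "OTHER", None, 100, 0),
        (2, "OUT", None, 60, 40),
        (1, "OUT", None, 0, 60),
    ],
)

QUERY = """
WITH RECURSIVE walk(eventno, remainqty, remaincharge) AS (
    -- anchor row: the acquisition carries the full charge
    SELECT eventno, remainqty, totalcharge
    FROM events
    WHERE eventtype = 'ACQ'
    UNION ALL
    -- each step sees the previous row's computed remaincharge and remainqty
    SELECT e.eventno,
           e.remainqty,
           w.remaincharge - (e.outqty / w.remainqty) * w.remaincharge
    FROM walk AS w
    JOIN events AS e ON e.eventno = w.eventno - 1
)
SELECT eventno, remaincharge
FROM walk
ORDER BY eventno DESC
"""

charges = conn.execute(QUERY).fetchall()
for row in charges:
    print(row)
```

Applying the stated formula step by step gives roughly 19.2 for event 2 (32 minus 40/100 of 32), which agrees with the 19.2 noted in the windowing answer rather than the 12.8 in the question's table.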