Aggregating data by groups

Aggregating data by groups - sql

I have this table:
Location
PT_ID
Visit_DT
Discharge_DT
InjuryLevel
InjuryCode
Claim_ID
Cost
Ab1
0001
01-01-2021
01-03-2021
7
I03
clm078
-400
Ab1
0001
01-01-2021
01-03-2021
1
I03
clm079
400
Ab1
0001
01-01-2021
01-03-2021
3
I03
clm068
500
Ab1
0001
01-01-2021
01-03-2021
3
I03
clm008
75
Ab2
0002
04-11-2021
04-12-2021
5
I03
clm111
1000
Ab2
0002
05-01-2021
05-03-2021
5
I03
clm176
900
Ab2
0002
08-08-2021
08-09-2021
6
I03
clm187
2000
Whats happening:
PT 001 visits the hospital on 01-01-2021 and there are three claims that occur on that day, all for the same visit with different injury codes recorded. I would like to pick the max injurylevel for that patient on that day (7) and indicate that they had 1 visit that was equal to InjuryLevel6to10. For patient 002, they have 3 different visits, 2 that fall under InjuryLevel1to5 and 1 in InjuryLevel6to10 (as shown below).
For both patients I would also like to add up their total cost.
Desired Output:
Location
PT_ID
InjuryLevel1to5
InjuryLevel6to10
TotalCost
Ab1
0001
0
1
575
Ab2
0002
2
1
3900
Any help would be appreciated

Admittedly I'm not entirely sure of your criteria and your sample data needs a bit more variety to show each expected case.
However does the following work for you, or get you close?
select location, pt_id,
Sum(case when il between 1 and 5 then ct else 0 end) InjuryLevel1to5,
Sum(case when il between 6 and 10 then ct else 0 end) InjuryLevel6to10,
Sum(Cost) totalCost
from (
select location, pt_id,Visit_DT,
Max(InjuryLevel) il,
Count(distinct Visit_DT) ct,
Sum(cost) cost
from t
group by location, pt_id,Visit_DT
)t
group by PT_ID, Location;

Related

Rolling On-Hand Remainder column?

CONumber, LineNumber, PartNumber, OrderQty, ScheduleDate, OnHandQty columns are a pure SELECT query with no transformations. I am trying to recreate the RollingOnHand column in SQL.
The rules are
If a part only has one row, report the real [OnHandQty]
If a part has multiple rows, the oldest order consumes its [OrderQty] from [OnHandQty]
The next oldest order pulls its [OrderQty] from the remaining [OnHandQty], repeat until final row of the matching part
The last row of a given part will display the remaining [OnHandQty]
Is this possible to accomplish in an SQL query?
CONumber
LineNumber
PartNumber
OrderQty
ScheduleDate
OnHandQty
RollingOnHand
C02959
00002
Part 01
102
2022-04-01
0
0
C04017
00001
Part 02
2007
2022-04-01
5099
5099
C04107
00001
Part 03
1
2022-03-09
0
0
C04106
00001
Part 04
1
2022-03-09
0
0
C04108
00001
Part 05
1
2022-03-09
0
0
C03514
00002
Part 06
250
2022-03-11
310
250
C03514
00003
Part 06
250
2022-03-18
310
60
C03757
00001
Part 06
250
2022-04-06
310
0
C04225
00002
Part 07
40
2022-03-31
53
53
C03965
00002
Part 08
24
2022-04-04
0
0
C04034
00001
Part 09
88
2022-03-18
128
128
C04144
00002
Part 10
22
2022-04-04
0
0
C04141
00001
Part 10
100
2022-04-04
0
0
C03734
00003
Part 11
116
2022-03-29
103
103
C03379
00001
Part 12
128
2022-03-07
19
19
C03344
00003
Part 13
40
2022-03-11
5
5
C04058
00001
Part 14
407
2022-03-25
0
0
C03697
00002
Part 15
436
2022-04-04
235
235
C03689
00002
Part 16
111
2022-03-16
87
87
C03690
00001
Part 16
250
2022-03-23
87
0
C03690
00002
Part 16
250
2022-04-06
87
0
C03240
00004
Part 17
3
2022-03-16
30
3
C03725
00001
Part 17
250
2022-03-16
30
27
C03725
00002
Part 17
250
2022-03-23
30
0
C03726
00001
Part 17
250
2022-04-01
30
0
C03726
00002
Part 17
250
2022-04-06
30
0
C03596
00017
Part 18
56
2022-04-06
344
344
C03927
00001
Part 19
600
2022-04-04
1800
600
C03927
00002
Part 19
1000
2022-04-06
1800
1200

I think this basically does what you need (Fiddle)
WITH T AS
(
SELECT *,
AlreadyConsumed = SUM(OrderQty) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
PrevLineNumber = LAG([LineNumber]) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC),
NextLineNumber = LEAD([LineNumber]) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC)
FROM Demo
)
SELECT CONumber,
LineNumber,
PartNumber,
OrderQty,
ScheduleDate,
OnHandQty,
RollingOnHand = CASE
--If a part only has one row, report the real [OnHandQty]
WHEN PrevLineNumber IS NULL
AND NextLineNumber IS NULL THEN OnHandQty
--Not the last row and won't use all the remainder up
WHEN NextLineNumber IS NOT NULL AND Remainder > OrderQty THEN OrderQty
--otherwise use what's left
ELSE Remainder
END
FROM T
CROSS APPLY (SELECT CASE WHEN AlreadyConsumed > OnHandQty THEN 0 ELSE OnHandQty - ISNULL(AlreadyConsumed,0) END) C(Remainder)
The
SUM ... PARTITION BY [PartNumber] ... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING computes the cumulative OrderQty for all rows before the current row (not including it)
The LAG/ LEAD results are used as indicators to determine whether we are in the first/last rows of a partition and special logic is needed.
I didn't quite follow the rationale behind the business logic so I may have made some invalid simplifications but it returns the desired results with the sample data and anyway the query should be easy to tweak if needed.

BD2: SQL _CASE with group by

I have the following tables
---SALARY_ITEMS---
PERSONID | EMPLOYMENT _REF | GROUP1 | CODE | FROM | END | QUANTI
000101 XYX 400 11101 2020-02-12 2020-02-12 12
000101 XYX 300 1100 2020-01-29 2020-02-29 1
000102 XYY 450 11111 2020-02-01 2020-02-12 19
000102 XYY 400 11101 2020-02-02 2020-02-12 82
000103 XYA 500 1100 2020-02-10 2020-02-12 11
000104 XYB 700 1100 2020-01-12 2020-02-12 24
---PERSON ---
PERSONID NAME
000101 Carolina
000102 Helen
000103 Jack
000104 Anna
---EMPLOYMENT---
PERSONID EMPLOYMENT _REF POSITION
000101 XYX doctor
000102 XYY nurse
000103 XYA nurse
000104 XYB Proffesor
----absent---
PERSONID CODE2 FROM END
000101 123 2020-03-01 2020-06-30
000102 120 2020-02-05 2020-02-13
000102 123 2020-03-01 2020-03-28
000103 115 2020-05-05 2020-06-30
000104 123 2020-02-01 2020-05-30
What I tried to do: get all employee that they are doctor and nurse and have certain group with certain code and works over 100 hours in a 2020 -Feb.
The following SQL query give me what i want But i want to add something to my query that is :
create a new column to see if the employee was absent in the same period 2020-feb with absent code 120 or 119 or both.
If he was I will get the 'CODE2' ELSE 'NOTHING'.
How can I do this in DB2?
This is the result I need to get:
PERSONID | NAME | POSITION | QUANTITY |ABSENT (this what i want to have)
000102 Helen NURSE 101 120
Query:
SELECT
S.PERSONID, P.NAME,E.POSTION , sum(S.QUANTITY) as QUANTITY
FROM
SALARY_ITEMS S
LEFT JOIN
PERSON P ON S.PERSONID = P.PERSONID
LEFT JOIN
EMPLOYMENT E ON E.EMPLOYMENT_REF = S.EMPLOYMENT _REF
WHERE
S.group1 IN ('400', '440', '450', '470', '640')
AND S.code IN ('11101', '11111', '11121', '11131', '11141')
AND S.from >= '2020-02-01'
AND S.end <= '2020-02-29'
AND E.POSTION IN ('nurse', 'doctor')
AND (SELECT SUM(S2.QUANTITY) AS QUANTITY2
FROM SALARY_ITEMS S2
WHERE S2.group1 IN ('400', '440', '450', '470', '640')
AND S2.code IN ('11101', '11111', '11121', '11131', '11141')
AND S2.from >= '2020-02-01'
AND S2.end <= '2020-02-29'
AND S.PERSONID = S2.PERSONID) >= '100'
GROUP BY
S.PERSONID, P.NAME, E.POSTION

Match group of variables and values with the nearest datetime

I have a transaction table that looks like that:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:11:00 001 101 -45 Y
2021-03-01 10:11:00 001 105 -25 Y
2021-03-01 10:11:00 001 109 -40 Y
The table does not have an id column; the transaction_start for a given store_no will never be the same.
Whenever a transaction is post voided, the transaction is then repeated with the same store_no, item_no but with a negative/minus amount and an equal or higher transaction_start. Also, the column post_voided is then equal to 'Y'.
In the example above, the rows 1-3 have the same transaction_start and store_no, thus belonging to the same receipt, containing three different items (101, 105, 109). The same logic is applied to the other rows: rows 4-5 belong to a same receipt, and so on. In the example, 4 different receipts can be seen. The last receipt, given by the last three rows, is a post voided of the first receipt (rows 1-3).
What I want to do is to change the transaction_start for the post_voided = 'Y' transactions (in my example, only one receipt - represented by the last three rows - has it) to the next/closest datetime of a similar receipt that has the variables store_no, item_no and (negative) amount (but post_voided = 'N') (in my example, the similar ticket is given by the first three rows - store_no, all item_no and (positive) amount match). The transaction_start for the post voided receipt is always equal or higher than the "original" receipt.
Desired output:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:00:00 001 101 -45 Y
2021-03-01 10:00:00 001 105 -25 Y
2021-03-01 10:00:00 001 109 -40 Y
Here a link of the table: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=26142fa24e46acb4213b96c86f4eb94b
Thanks in advance!

Consider below
select a.* replace(ifnull(b.transaction_start, a.transaction_start) as transaction_start)
from `project.dataset.table` a
left join (
select * replace(-amount as amount)
from `project.dataset.table`
where post_voided = 'N'
) b
using (store_no, item_no)
if applied to sample data in your question - output is
Consider below for new / extended example (https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=91f9f180fd672e7c357aa48d18ced5fd)
select x.* replace(ifnull(y.original_transaction_start, x.transaction_start) as transaction_start)
from `project.dataset.table` x
left join (
select b.transaction_start, b.store_no, b.item_no, b.amount amount,
max(a.transaction_start) original_transaction_start
from `project.dataset.table` a
join `project.dataset.table` b
on a.store_no = b.store_no
and a.item_no = b.item_no
and a.amount = -b.amount
and a.post_voided = 'N'
and b.post_voided = 'Y'
and a.transaction_start < b.transaction_start
group by b.transaction_start, b.store_no, b.item_no, b.amount
) y
using (store_no, item_no, amount, transaction_start)
with output

Query to find active days per year to find revenue per user per year

I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id
user_name
user_joining_date
1
Steve
2013-01-04
2
Adam
2012-11-01
3
John
2013-05-05
4
Tony
2012-01-01
5
Dan
2010-01-01
6
Alex
2019-01-01
7
Kim
2019-01-01
bundle_dim
bundle_id
bundle_name
bundle_type
bundle_cost_per_day
101
movies and TV
prime
5.5
102
TV and sports
prime
6.5
103
Cooking
prime
7
104
Sports and news
prime
5
105
kids movie
extra
2
106
kids educative
extra
3.5
107
spanish news
extra
2.5
108
Spanish TV and sports
extra
3.5
109
Travel
extra
2
plans_fact
user_id
bundle_id
bundle_start_date
bundle_end_date
1
101
2019-10-10
2020-10-10
2
107
2020-01-15
(null)
2
106
2020-01-15
2020-12-31
2
101
2020-01-15
(null)
2
103
2020-01-15
2020-02-15
1
101
2020-10-11
(null)
1
107
2019-10-10
2020-10-10
1
105
2019-10-10
2020-10-10
4
101
2021-01-01
2021-02-01
3
104
2020-02-17
2020-03-17
2
108
2020-01-15
(null)
4
102
2021-01-01
(null)
4
103
2021-01-01
(null)
4
108
2021-01-01
(null)
5
103
2020-01-15
(null)
5
101
2020-01-15
2020-02-15
6
101
2021-01-01
2021-01-17
6
101
2021-01-20
(null)
6
108
2021-01-01
(null)
7
104
2020-02-17
(null)
7
103
2020-01-17
2020-01-18
1
102
2020-12-11
(null)
2
106
2021-01-01
(null)
7
107
2020-01-15
(null)
note: NULL bundle_end_date refers to active subscription.
user active days can be calculated as: bundle_end_date - bundle_start_date (for the given bundle)
total revenue per user could be calculated as : total no. of active days * bundle rate per day
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
, sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.price_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;

You need a 'year' table to help parse out each multi-year spanning row into it's seperate years. For each year, you need to also recalculate the start and end dates. That's what I do in the yearParsed cte in the code below. I hard code the years into the join statement that creates y. You probably will do it different but however you get those values will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic to the cte to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
Now, if this is common, you should probably create a base-table for your 'year' table. But if it's not common, but for this report you don't want to have to keep coming back to hard-code the year information into the y table, you can do this:
declare #yearTable table (
year int,
startDt char(10),
endDt char(10)
);
with y as (
select year = year(min(pf.bundle_start_date))
from #plans_fact pf
union all
select year + 1
from y
where year < year(getdate())
)
insert #yearTable
select year,
startDt = convert(char(4),year) + '-01-01',
endDt = convert(char(4),year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.

Incremental count in SQL Server 2005

I am working with a Raiser's Edge database using SQL Server 2005. I have written SQL that will produce a temporary table containing details of direct debit instalments. Below is a small table containing the key variables for the question I'm going to ask, with some fictional data:
Donor_ID Instalment_ID Instalment_Date Amount
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00
1234 1116 01/06/2011 £0.00
2345 2111 01/01/2011 £0.00
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00
2345 2115 01/05/2011 £0.00
2345 2116 01/06/2011 £0.00
As you will see, some of the values in the Amount column are £0.00. This can occur when a donor has insufficient funds in their account, for example.
What I'd like to do is write a SQL query that will create a field containing an incremental count of consecutive £0.00 payments that resets after a non-£0.00 payment or after a change in Donor_ID. I have reproduced the above data below, with the field I'd like to see.
Donor_ID Instalment_ID Instalment_Date Amount New_Field
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00 1
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00 1
1234 1116 01/06/2011 £0.00 2
2345 2111 01/01/2011 £0.00 1
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00 1
2345 2115 01/05/2011 £0.00 2
2345 2116 01/06/2011 £0.00 3
To help clarify what I'm looking for, I think what I'm looking to do would be similar to a winning streak field on a list of a football team's results. For example:
Opponent Score Winning_Streak
Arsenal 1-0 1
Liverpool 0-0
Swansea 3-1 1
Chelsea 2-1 2
Fulham 4-0 3
Stoke 0-0
Man Utd 1-3
Reading 2-1 1
I've considered various options, but have made no progress. Unless I've missed something obvious, I think that a solution more advanced than my current SQL programming level might be required.

If I am thinking about this problem correctly, I believe that you want a row number when the Amount is 0.00 pounds.
Select 0 as As InsufficientCount
, Donor_ID
, Installment_ID
, Amount
From [Table]
Where Amount > 0.00
Union
Select Row_Number() Over (Partition By Donor_ID Order By Installment_ID)
, Donor_ID
, Installment_ID
, Amount
From [Table]
Where Amount = 0.00
This union select should only give you 'ranks' where the Amount equals 0.

Am calling your new field streakAmount
ALTER TABLE instalments ADD streakAmount int NULL;
Then, to update the value:
UPDATE instalments
SET streakAmount =
(SELECT
COUNT(*)
FROM
instalments streak
WHERE
streak.donor_id = instalments.donor_id
AND
streak.instalment_date <= instalments.instalment_date
AND
(streak.instalment_date >
-- find previous instalment date, if any exists
COALESCE(
(
SELECT
MAX(instalment_date)
FROM
instalments prev
WHERE
prev.donor_id = instalments.donor_id
AND
prev.amount > 0
AND
prev.instalment_date < instalments.instalment_date
)
-- otherwise min date
, cast('1753-1-1' AS date))
)
)
WHERE
amount = 0;
http://sqlfiddle.com/#!6/a571f/18

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Aggregating data by groups - sql

Related

Rolling On-Hand Remainder column?

BD2: SQL _CASE with group by

Match group of variables and values with the nearest datetime

Query to find active days per year to find revenue per user per year

Incremental count in SQL Server 2005

Categories

Resources