I've spent a lot of time trying to see if this exists elsewhere, but unfortunately it doesn't. I think I've solved it, but am looking for any advice on how to make this a bit more elegant/streamlined. Hopefully this will help someone else!
CMS publishes a Risk Adjustment model where diseases are grouped into hierarchies. Only the most severe form of the disease is counted toward a patient's Risk Adjustment Score. CMS does publish the model in SAS, but not in SQL. Every other aspect of the model is straightforward apart from applying the hierarchy / trumping logic below.
There are two tables, one containing member/patient IDs and their hierarchical condition categories (HCCs). The other table is the hierarchy table, where only the most severe form of the HCC is meant to be kept:
Members:
memberId HCC
A 17
A 18
A 19
B 18
B 19
C 19
Hierarchy:
HCC dropHCC
17 18
17 19
18 19
If a member has 17, 18, and 19, only 17 would be kept as a result. If a member only has 19, then 19 would remain. 17 is considered a more severe form of the condition category which includes 18 and 19, but for scoring purposes we'd only want to count 17.
So, applying the Hierarchy to the Members table, the results should be:
memberId HCC
A 17
B 18
C 19
As mentioned, I've already solved this. I'm wondering if there are any other ways that are more efficient/elegant?
;with members as (
select 123456 as memberID, 17 as hcc
UNION
select 123456 as memberID, 18 as hcc
UNION
select 123456 as memberID, 19 as hcc
UNION
select 2222222 as memberID, 19 as hcc
UNION
select 9999999 as memberID, 18 as hcc
UNION
select 9999999 as memberID, 19 as hcc
)
, Hierarchy as
(
Select 17 as hcc, 18 as dropHCC, 'diabetes1' as hccCategory
UNION
Select 17 as hcc, 19 as dropHCC, 'diabetes1' as hccCategory
UNION
Select 18 as hcc, 19 as dropHCC, 'diabetes2' as hccCategory
)
select m.* --, h2.dropHCC as hccRemovedBy
from members m
left join (
    select m.*, r.dropHCC
    from members m
    inner join (
        select m.memberID, m.hcc, h.dropHCC
        from members m
        inner join hierarchy h on h.hcc = m.hcc
    ) r on r.memberID = m.memberID
       and r.dropHCC = m.hcc
) h2 on h2.memberID = m.memberID
    and h2.hcc = m.hcc
where h2.dropHCC is null -- remove this criterion if you want to see what was dropped
Why can't you just use MIN(HCC)?
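MIN(HCC) does give the expected rows for the sample data, but only because every HCC in the sample belongs to the same hierarchy. A rough sketch of a more general alternative is an anti-join: keep an HCC unless the same member also holds an HCC that trumps it. The sketch below runs the SQL through Python's sqlite3 module purely so it is self-contained (the original is SQL Server), and HCC 85 is a made-up code added to stand in for an unrelated hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (memberId TEXT, hcc INTEGER);
    CREATE TABLE hierarchy (hcc INTEGER, dropHcc INTEGER);
    INSERT INTO members VALUES
        ('A', 17), ('A', 18), ('A', 19),
        ('B', 18), ('B', 19),
        ('C', 19),
        ('C', 85);  -- hypothetical HCC outside the 17/18/19 hierarchy
    INSERT INTO hierarchy VALUES (17, 18), (17, 19), (18, 19);
""")

# Keep a row unless the same member also has an HCC that drops it.
kept = conn.execute("""
    SELECT m.memberId, m.hcc
    FROM members AS m
    WHERE NOT EXISTS (
        SELECT 1
        FROM hierarchy AS h
        JOIN members AS sev
          ON sev.hcc = h.hcc
         AND sev.memberId = m.memberId
        WHERE h.dropHcc = m.hcc
    )
    ORDER BY m.memberId, m.hcc
""").fetchall()

# For contrast: MIN(hcc) per member loses the unrelated HCC 85 for member C.
min_only = conn.execute("""
    SELECT memberId, MIN(hcc) FROM members GROUP BY memberId ORDER BY memberId
""").fetchall()

print(kept)      # [('A', 17), ('B', 18), ('C', 19), ('C', 85)]
print(min_only)  # [('A', 17), ('B', 18), ('C', 19)]
```

The NOT EXISTS form should port to SQL Server unchanged apart from the table setup, and it avoids the nested derived tables of the original query.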
If this is simplified data and the real HCC codes are not integers but are truly hierarchical, then you might need to look at a recursive CTE. These can be tricky to write, but basically it is a CTE with two parts: the first part generates a base dataset (the top of the hierarchy), and a second query that references the CTE you are creating is then UNIONed onto it, normally carrying some sort of additive value (hierarchy depth or a concatenated string). You can then match the correct hierarchy record to the rows in your dataset and use it as a filter (e.g. MIN(HierarchyDepth) or LEFT(HierarchyString, LEN(HCC)) = HCC).
Depending on the context in which you use it, there are recursion limits. In SQL Server the default limit is 100 levels, and you can explicitly set an option (MAXRECURSION) to change that, BUT NOT IF YOU ARE USING IT IN A TABLE FUNCTION! It is computationally quite expensive, so if the data is slow to change you may want to persist the result to the database so it can be indexed.
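As a rough illustration of the two-part shape described above, here is a minimal sketch that expands a hierarchy stored only as direct parent-to-child links into the full set of "drops" pairs, tracking depth as the additive value. The direct_edges table and its rows are assumptions for the example, and the SQL is run through Python's sqlite3 module, which needs the RECURSIVE keyword; SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE direct_edges (hcc INTEGER, dropHcc INTEGER);
    -- hypothetical: only the immediate parent -> child links are stored
    INSERT INTO direct_edges VALUES (17, 18), (18, 19);
""")

closure = conn.execute("""
    WITH RECURSIVE closure (hcc, dropHcc, depth) AS (
        -- anchor part: the direct links, at depth 1
        SELECT hcc, dropHcc, 1 FROM direct_edges
        UNION ALL
        -- recursive part: extend each known path by one more link
        SELECT c.hcc, e.dropHcc, c.depth + 1
        FROM closure AS c
        JOIN direct_edges AS e ON e.hcc = c.dropHcc
    )
    SELECT hcc, dropHcc, depth FROM closure ORDER BY hcc, dropHcc
""").fetchall()

print(closure)  # [(17, 18, 1), (17, 19, 2), (18, 19, 1)]
```

The derived (17, 19) pair is the one the question's Hierarchy table lists explicitly; persisting this output is one way to follow the advice about indexing slow-changing data.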
I'm very much new to SQL and I'm trying to use CROSS APPLY, something I know very little about.
I'm trying to pull two SUMs of items, sorted by an ID, from two different tables: one SUM of all items dispensed by a cartridge, and one SUM of all items refilled into a cartridge. The dispenses and refills are in separate tables. Sample 1 shows code that works for one of these two SUMs. Currently it's for the dispensed SUM, but it also works if I change everything over to the refilled SUM. The point is that I can only do one SUM in this CROSS APPLY, regardless of which one.
It goes wrong when I try to pull both SUMs in this one CROSS APPLY, probably because I don't really know what I'm doing. I try to do this with the code in Sample 2 (which is pretty much the same code).
Some extra context:
There are two IDs here that are important:
CartridgeRefill.FK_CartridgeRegistration_Id (or ID) is the ID for a cartridge itself. FK_CartridgeRefill_Id is the ID for a refill. A cartridge can go through multiple refills, and dispenses are registered by the refill they were dispensed from. That's why you can see the same ID multiple times in the output.
Sample 1:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
Sample 2:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed, Sums.Refilled
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
,SUM(CartridgeRefillItem.Amount) AS Refilled
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
When I run sample 1 I get this output:
ID Dispensed
10 95
8 143
6 143
11 70
11 312
11 354
8 19
8 24
8 3
8 33
This output is correct, it displays the number of dispensed items next to the ID it belongs to.
This is the error I get when I run sample 2:
Msg 4101, Level 15, State 1, Line 15
Aggregates on the right side of an APPLY cannot reference columns from the left side.
But what I want to see is:
ID Dispensed Refilled (example)
10 95 143
8 143 12
6 143 etc...
11 70
11 312
11 354
8 19
8 24
8 3
8 33
I think it has something to do with CROSS APPLY running line by line? But again, I still don't exactly know what I'm doing yet. Any help would be really appreciated and please ask whatever you need to know :)
The error is quite self-explanatory: an aggregate inside the CROSS APPLY cannot reference a column that comes from outside it. You'll need to rewrite your query by adding an additional subquery to calculate the SUM, or use a GROUP BY clause. I've quickly sketched this:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed, SUM(CartridgeRefillItem.Amount) AS Refilled
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
-- every non-aggregated column in the SELECT must appear in the GROUP BY
GROUP BY CartridgeRefill.FK_CartridgeRegistration_Id, CartridgeRefillItem.FK_CartridgeRefill_Id, Sums.Dispensed;
Hopefully this works.
You may not want aggregation at all. The number of rows is not being reduced, so this may be what you want:
SELECT cr.FK_CartridgeRegistration_Id AS ID,
d.Dispensed, cr.Amount AS Refilled
FROM CartridgeRefillItem cr CROSS APPLY
(SELECT SUM(cd.Amount) AS Dispensed
FROM CartridgeDispenseAttempt cd
WHERE cr.FK_CartridgeRefill_Id = cd.FK_CartridgeRefill_Id
) d;
I would expect that you want separate totals for each id. If so, then your sample results are not sensible because ids are repeated. But this would seem to do something useful:
select id, sum(refill_amount) as refill_amount,
sum(dispensed_amount) as dispensed_amount
from ((select cr.FK_CartridgeRegistration_Id as id,
cr.Amount as refill_amount,
0 as dispensed_amount
from CartridgeRefillItem cr
) union all
(select cd.FK_CartridgeRegistration_Id as id,
0, cd.Amount
from CartridgeDispenseAttempt cd
)
) c
group by id
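Another way to get both totals side by side, sketched here under assumptions, is to aggregate each table on its own by refill ID and then join the two summaries to CartridgeRefill. Neither aggregate then has to reach across an APPLY boundary. The column layout and the sample rows below are guesses based on the question, and the SQL is run through Python's sqlite3 module only so the example is self-contained; the query itself uses nothing SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CartridgeRefill (
        FK_CartridgeRefill_Id INTEGER, FK_CartridgeRegistration_Id INTEGER);
    CREATE TABLE CartridgeRefillItem (FK_CartridgeRefill_Id INTEGER, Amount INTEGER);
    CREATE TABLE CartridgeDispenseAttempt (FK_CartridgeRefill_Id INTEGER, Amount INTEGER);
    -- made-up rows: cartridge 8 has two refills, cartridge 10 has one
    INSERT INTO CartridgeRefill VALUES (1, 10), (2, 8), (3, 8);
    INSERT INTO CartridgeRefillItem VALUES (1, 100), (2, 100), (2, 50), (3, 20);
    INSERT INTO CartridgeDispenseAttempt VALUES (1, 90), (1, 5), (2, 143), (3, 19);
""")

# Each derived table is summed independently, then joined on the refill ID.
rows = conn.execute("""
    SELECT r.FK_CartridgeRegistration_Id AS ID,
           COALESCE(d.Dispensed, 0) AS Dispensed,
           COALESCE(f.Refilled, 0)  AS Refilled
    FROM CartridgeRefill AS r
    LEFT JOIN (SELECT FK_CartridgeRefill_Id, SUM(Amount) AS Dispensed
               FROM CartridgeDispenseAttempt
               GROUP BY FK_CartridgeRefill_Id) AS d
           ON d.FK_CartridgeRefill_Id = r.FK_CartridgeRefill_Id
    LEFT JOIN (SELECT FK_CartridgeRefill_Id, SUM(Amount) AS Refilled
               FROM CartridgeRefillItem
               GROUP BY FK_CartridgeRefill_Id) AS f
           ON f.FK_CartridgeRefill_Id = r.FK_CartridgeRefill_Id
    ORDER BY r.FK_CartridgeRefill_Id
""").fetchall()

print(rows)  # [(10, 95, 100), (8, 143, 150), (8, 19, 20)]
```

This yields one row per refill, so a cartridge ID repeats once per refill, which matches the shape of the desired output in the question.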
I'm playing with BigQuery and nested tables, and SQL is not my strong suit. I have a real problem with actual production data that I'm trying to solve, and at the same time trying to break-in some SQL/BQ concepts into my head.
My query is similar to some of what's on the Working with Arrays in Standard SQL page, but similar is not close enough for me just yet.
Let me throw you some example data structured very similarly to my real data, then describe what I need out of it.
Basically, I have two tables, and I want to use one to filter the other.
Table 1 has some two-level nesting and can be built like this:
WITH data AS (
SELECT "Test 1" AS name, [STRUCT(1 AS id, [20, 21] AS results), STRUCT(2 AS id, [22, 23] AS results)] AS resultset
UNION ALL
SELECT "Test 2" AS name, [STRUCT(1 AS id, [23, 24] AS results), STRUCT(2 AS id, [25, 26] AS results)] AS resultset
UNION ALL
SELECT "Test 3" AS name, [STRUCT(1 AS id, [26, 27] AS results), STRUCT(2 AS id, [28, 29] AS results)] AS resultset
)
SELECT * FROM data
What the numbers mean is irrelevant. What's important is that table 2 contains ranges that I want to use to filter table 1. Table 2 can be built like this:
ranges AS (
SELECT "Range 1" AS title, 24.0 AS min, 25.0 AS max
UNION ALL
SELECT "Range 2" AS title, 26.0 AS min, 27.0 AS max
)
SELECT * from ranges
What I want to end up with is rows from the first table where any result matches one or more of the ranges in the second table, but none of the rows with no matches.
I know I can juggle some UNNEST()ing and JOINing of the two tables to get at a result which is filtered, but which will contain duplicates because of the unnesting:
WITH data AS (
SELECT "Test 1" as name, [STRUCT(1 as id, [20, 21] as results), STRUCT(2 as id, [22, 23] as results)] as resultset
UNION ALL
SELECT "Test 2" as name, [STRUCT(1 as id, [23, 24] as results), STRUCT(2 as id, [25, 26] as results)] as resultset
UNION ALL
SELECT "Test 3" as name, [STRUCT(1 as id, [26, 27] as results), STRUCT(2 as id, [28, 29] as results)] as resultset
),
ranges AS (
SELECT "Range 1" AS title, 24.0 as min, 25.0 as max
UNION ALL
SELECT "Range 2" AS title, 26.0 as min, 27.0 as max
)
SELECT data.*
FROM data, UNNEST(resultset), UNNEST(results) r
JOIN ranges
ON r BETWEEN min AND max
So this is what I have:
Row name resultset.id resultset.results
1 Test 2 1 23
24
2 25
26
2 Test 2 1 23
24
2 25
26
3 Test 2 1 23
24
2 25
26
4 Test 3 1 26
27
2 28
29
5 Test 3 1 26
27
2 28
29
What I want is to call DISTINCT data.* in the SELECT to reduce this back down to the two unique rows and be done with it.
In other words, this is what I want:
Row name resultset.id resultset.results
1 Test 2 1 23
24
2 25
26
2 Test 3 1 26
27
2 28
29
But I cannot do that with nested data.
So, I have two questions:
How do I collapse identical rows in this case?
Have I led myself up the wrong path, and is there a better way to achieve this?
Regarding the data: I cannot change the first table. The second table I can screw around with, if it leads to a simple solution.
Below is for BigQuery Standard SQL
The simplest solution (without actually changing the core of the query you already have) is to add a GROUP BY, as below:
#standardSQL
SELECT ANY_VALUE(data).*
FROM data, UNNEST(resultset), UNNEST(results) r
JOIN ranges ON r BETWEEN min AND max
GROUP BY TO_JSON_STRING(data)
This works! But I don't understand why. Can you elaborate?
Sure.
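SELECT DISTINCT ... FROM ... is conceptually equivalent to SELECT ... FROM ... GROUP BY ...
So the task was to find an appropriate value to GROUP BY, plus a matching aggregation function (required because of the GROUP BY).
Nested rows cannot be grouped directly, so TO_JSON_STRING(data) supplies a string key to group on, and ANY_VALUE(data) picks one of the identical rows out of each group.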
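The same idea can be sketched outside BigQuery. The Python below is only an analogy, not BigQuery code: the list comprehension plays the role of the UNNEST plus JOIN, json.dumps stands in for TO_JSON_STRING, and keeping the first row seen per key stands in for ANY_VALUE.

```python
import json

data = [
    {"name": "Test 1", "resultset": [{"id": 1, "results": [20, 21]}, {"id": 2, "results": [22, 23]}]},
    {"name": "Test 2", "resultset": [{"id": 1, "results": [23, 24]}, {"id": 2, "results": [25, 26]}]},
    {"name": "Test 3", "resultset": [{"id": 1, "results": [26, 27]}, {"id": 2, "results": [28, 29]}]},
]
ranges = [("Range 1", 24.0, 25.0), ("Range 2", 26.0, 27.0)]

# Analogue of the UNNEST + JOIN: one output row per matching (result, range) pair.
joined = [
    row
    for row in data
    for rs in row["resultset"]
    for r in rs["results"]
    for _title, lo, hi in ranges
    if lo <= r <= hi
]

# Analogue of GROUP BY TO_JSON_STRING(data) with ANY_VALUE(data):
# serialise each nested row to a string key and keep one row per key.
groups = {}
for row in joined:
    groups.setdefault(json.dumps(row, sort_keys=True), row)
unique_rows = list(groups.values())

print(len(joined))                       # 5
print([r["name"] for r in unique_rows])  # ['Test 2', 'Test 3']
```

The five joined rows correspond to the five duplicated rows in the question, and grouping on the serialised form collapses them to the two wanted rows.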
Try selecting just the data you want from the datasets. This query returns unique but unnested results:
SELECT data.name, rs.id, r
FROM data
left join UNNEST(resultset) rs
left join UNNEST(results) as r
JOIN ranges ON r BETWEEN min AND max
I have been trying for several days to figure out a solution to this issue but have not been able to come up with an answer. What I have is a data set that looks like this:
Id ParentId Name
16 NULL i_ss_16_Grommets
25 16 ss_25_Grommets
26 NULL inactive_Grommets Clone
27 NULL inactive_Grommets Clone Clone
46 25 ss_46_Grommets
47 46 ss_47_Grommets
48 47 Grommets
What I need to come up with is a function where I can pass an Id and then get the correct Name. The way that I need to find the name involves a sort of reverse hierarchy, since it is the youngest child in a branch that will be used. For example, if I pass in Id 46, I need the function to return 'Grommets'. If I pass in Id 47, I need to see 'Grommets'. If I pass in Id 26, I would see 'inactive_Grommets Clone', since there are no descendants.
Even though it looks like I could just strip off anything with an underscore after it, I would not be able to since there is no guarantee that the child will be named the same.
Hopefully this makes sense. Any help would be greatly appreciated.
Option with recursive CTE
DECLARE @Id int = 46
;WITH cte AS
(
SELECT Id, ParentId, Name
FROM dbo.test60
WHERE Id = @Id
UNION ALL
SELECT t.Id, t.ParentId, t.Name
FROM dbo.test60 t JOIN cte c ON t.ParentId = c.Id
)
SELECT TOP 1 *
FROM cte
ORDER BY Id DESC
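Note that ORDER BY Id DESC relies on descendants always having higher Ids than their ancestors. If that is not guaranteed, one variation is to carry a depth counter through the recursion and pick the deepest row instead. A minimal sketch, run through Python's sqlite3 module for self-containment (so it uses WITH RECURSIVE and LIMIT rather than SQL Server's WITH and TOP); the youngest_name helper is a hypothetical name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test60 (Id INTEGER PRIMARY KEY, ParentId INTEGER, Name TEXT);
    INSERT INTO test60 VALUES
        (16, NULL, 'i_ss_16_Grommets'),
        (25, 16,   'ss_25_Grommets'),
        (26, NULL, 'inactive_Grommets Clone'),
        (27, NULL, 'inactive_Grommets Clone Clone'),
        (46, 25,   'ss_46_Grommets'),
        (47, 46,   'ss_47_Grommets'),
        (48, 47,   'Grommets');
""")

def youngest_name(start_id):
    """Return the Name of the deepest descendant of start_id (or of start_id itself)."""
    row = conn.execute("""
        WITH RECURSIVE cte (Id, Name, depth) AS (
            -- anchor: the row that was asked for
            SELECT Id, Name, 0 FROM test60 WHERE Id = ?
            UNION ALL
            -- recursive step: children of the rows found so far
            SELECT t.Id, t.Name, c.depth + 1
            FROM test60 AS t JOIN cte AS c ON t.ParentId = c.Id
        )
        SELECT Name FROM cte ORDER BY depth DESC LIMIT 1
    """, (start_id,)).fetchone()
    return row[0] if row else None

print(youngest_name(46))  # Grommets
print(youngest_name(26))  # inactive_Grommets Clone
```

If a node can have several children, more than one row may share the greatest depth, and this sketch would return an arbitrary one of them.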
Demo on SQLFiddle
I have a single table that lists dependencies and I can't figure out how I can sort this in the actual order the diagram is displayed (using DB2 SQL)
Diagram (Lists out the GROUP)
34 -> 23 -> 65 ->....
The goal is to sort in the order of the diagram
The table has two fields, GROUP and DEPEND. The ideal first row would be 34,0 (0 since it depends on nothing), followed by 23,34 (dependent on GROUP 34), followed by 65,23 (dependent on GROUP 23), following the pattern GROUP, DEPEND.
So the results would be as follows:
GROUP DEPEND
34 0
23 34
65 23
Is it possible to use a variable or something to view the previous record's GROUP to determine the next row?
Thanks so much for any assistance or ideas
Current versions of DB2 support recursive queries, so the following should work (at least it does with my DB2 9.7 Express-C on Windows):
with dep_tree (groupno, depend, group_order) as (
select groupno, depend, 1 as group_order
from group_list
where depend = 0
union all
select c.groupno, c.depend, p.group_order + 1
from group_list c,
dep_tree p
where p.groupno = c.depend
)
select group_order, groupno, depend
from dep_tree
order by group_order;
Note that I used groupno instead of GROUP as the column name, because GROUP is a reserved word and should not be used as a column name.
I hope that someone can help me with my issue. I need to produce the following in a single SELECT statement (the system that we use has some pivot tables in Excel that can only handle a single SELECT):
I have an INL (Invoice Lines) table that has a lot of fields, but the important one is the date.
INL_ID DATE
19 2004-03-15 00:00:00.000
20 2004-03-15 00:00:00.000
21 2004-03-15 00:00:00.000
22 2004-03-16 00:00:00.000
23 2004-03-16 00:00:00.000
24 2004-03-16 00:00:00.000
I also have an ILD (Invoice Line Details) table that is related to the INL table by an ID field. From this second table I need to use the scu_qty field to "repeat" values from the first one in my results sheet.
The ILD table values that we need are:
INL_ID scu_qty
19 1
20 1
21 1
22 4
23 4
Using scu_qty, I need to repeat the value from the first table and add one day for each repeated record; scu_qty is the number of days of the service that we sell in the ILD table.
So I need to get something like the following (I'm going to show INL_ID 22, which as you can see has an SCU_QTY value other than 1). The SELECT has to give me something like:
INL_ID DATE
22 2004-03-15 0:00:00
22 2004-03-16 0:00:00
22 2004-03-17 0:00:00
22 2004-03-18 0:00:00
Here I only wrote the fields that need to be repeated and calculated. Of course I will need more fields, but they will just be repeated from the INL table, so I left them out to avoid confusion.
I hope that someone can help me with this; this report is very important for us. Thanks a lot in advance.
(Sorry for my English, that isn't my first language)
SELECT INL_ID, scu_qty, CalculatedDATE ...
FROM INL
INNER JOIN ILD ON ...
INNER JOIN SequenceTable ON SequenceTable.seqNo <= ILD.scu_qty
ORDER BY INL_ID, SequenceTable.seqNo
Depending on your SQL flavour, you will need to look up the date manipulation functions to do:
CalculatedDATE = {INL.DATE + SequenceTable.seqNo (days)}
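To make the outline above concrete, here is a minimal sketch that fills in the elided parts for one specific engine. It assumes SQLite, run through Python's sqlite3 module, where date(value, '+N days') does the date arithmetic; other RDBMSs use different functions. SequenceTable is a plain numbers table starting at 1, so the offset is seqNo - 1, and only two invoice lines from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE INL (INL_ID INTEGER, "DATE" TEXT);
    CREATE TABLE ILD (INL_ID INTEGER, scu_qty INTEGER);
    CREATE TABLE SequenceTable (seqNo INTEGER);
    INSERT INTO INL VALUES (21, '2004-03-15'), (22, '2004-03-16');
    INSERT INTO ILD VALUES (21, 1), (22, 4);
    -- assumed helper table: must hold at least as many rows as the largest scu_qty
    INSERT INTO SequenceTable VALUES (1), (2), (3), (4), (5);
""")

# The join on seqNo <= scu_qty repeats each invoice line scu_qty times;
# seqNo - 1 is the number of days to add to the line's own date.
rows = conn.execute("""
    SELECT INL.INL_ID,
           date(INL."DATE", '+' || (s.seqNo - 1) || ' days') AS CalculatedDATE
    FROM INL
    JOIN ILD ON ILD.INL_ID = INL.INL_ID
    JOIN SequenceTable AS s ON s.seqNo <= ILD.scu_qty
    ORDER BY INL.INL_ID, s.seqNo
""").fetchall()

for r in rows:
    print(r)
```

With these rows, INL_ID 21 appears once and INL_ID 22 appears four times, on consecutive days starting from its own date in the INL table.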
select INL.INL_ID, `DATE`
from
INL
inner join
ILD on INL.INL_ID = ILD.INL_ID
inner join (
select 1 as qty union select 2 union select 3 union select 4
) s on s.qty <= ILD.scu_qty
order by INL.INL_ID
Instead of that subselect you will need a table if the quantity is any bigger. Or tell us which RDBMS you are using, as there may be an easier way.