SQL select multiple values present in multiple columns - sql

I have two tables DiagnosisCodes and DiagnosisConditions as shown below. I need to find the members(IDs) who have a combination of Hypertension and Diabetes. The problem here is the DiagnosisCodes are spread across 10 columns. How do I check if the member qualifies for both conditions
DiagnosisCodes
+----+-------+-------+-------+-----+--------+
| ID | Diag1 | Diag2 | Diag3 | ... | Diag10 |
+----+-------+-------+-------+-----+--------+
| A | 2502 | 2593 | NULL | ... | NULL |
| B | 2F93 | 2509 | 2593 | ... | NULL |
| C | C257 | 2509 | C6375 | ... | NULL |
+----+-------+-------+-------+-----+--------+
DiagnosisConditions
+------+--------------+
| Code | Condition |
+------+--------------+
| 2502 | Hypertension |
| 2593 | Diabetes |
| 2509 | Diabetes |
| 2F93 | Hypertension |
| 2673 | HeartFailure |
+------+--------------+
Expected Result
+---------+
| Members |
+---------+
| A |
| B |
+---------+
How do I query to check Mulitple values which are present in Multiple columns. Do you suggest to use EXISTS?
SELECT DISTINCT id
FROM diagnosiscodes
WHERE ( diag1, diag2...diag10 ) IN (SELECT code
FROM diagnosiscondition
WHERE condition IN ( 'Hypertension','Diabetes' )
)

I would do this using group by and having:
select dc.id
from diagnosiscodes dc join
diagnosiscondistions dcon
on dcon.code in (dc.diag1, dc.diag2, . . . )
group by id
having sum(case when dcon.condition = 'diabetes' then 1 else 0 end) > 0 and
sum(case when dcon.condition = 'Hypertension' then 1 else 0 end) > 0;
Then, you should fix your data structure. Having separate columns with the same information distinguished by a number is usually a sign of a poor data structure. You should have a table, called somethhing like PatientDiagnoses with one row per patient and diagnosis.

Here is one way by unpivoting the data
SELECT DISTINCT id
FROM yourtable
CROSS apply (VALUES (Diag1),(Diag2),..(Diag10))tc(Diag)
WHERE Diag IN (SELECT code
FROM diagnosiscondition
WHERE condition IN ( 'Hypertension', 'Diabetes' ) group by code having count(distinct condition)=2)

Related

Selecting the two most common attribute pairings from a Entity-Attribute Table?

I have a simple Entity-Attribute table in my database describing simply if an Entity has some Attribute by the existance of a row consisting of (Entity, Attribute).
I want to find out, of all the Entities with two and only two Attributes, what are the most common Attribute pairs
For example, if my table looked like:
+--------+-----------+
| Entity | Attribute |
+--------+-----------+
| Bob | A |
| Sally | B |
| Terry | C |
| Bob | B |
| Sally | A |
| Terry | D |
| Larry | C |
+--------+-----------+
I would want it to return
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| C | D | 1 |
+-------------+-------------+-------+
I currently have a short query that looks like:
WITH TwoAtts (
SELECT entity
FROM table
GROUP BY entity
HAVING COUNT(att) = 2
)
SELECT t1.att, t2.att, COUNT(entity)
FROM table t1
JOIN table t2
ON t1.entity = t2.entity
WHERE t1.entity IN (SELECT * FROM TwoAtts)
AND t1.att != t2.att
GROUP BY t1.att, t2.att
ORDER BY COUNT(entity) DESC
but is only capable of producing "duplicate" results like
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| B | A | 2 |
| D | C | 1 |
| C | D | 1 |
+-------------+-------------+-------+
In a sense I would like to be able to run a unordered DISTINCT / set operator over the two attribute columns, but I am not sure how to acheive this functionality in SQL?
Hmmm, I think you want two levels of aggregation, with some filtering:
select attribute_1, attribute_2, count(*)
from (select min(ea.attribute) as attribute_1, max(ea.attribute) as attribute_2
from entity_attribute ea
group by entity
having count(*) = 2
) aa
group by attribute_1, attribute_2;
Here is a db<>fiddle

SQL Group rows for every ID using left outer join

I have a table with almost a million records of claims for 6 different conditions like Diabetes, Hypertension, Heart Failure etc. Every member has a number of claims. He might have claims with the condition as Diabetes or Hypertension or anything else. My goal is to group the conditions they have(number of claims) per every member row.
Existing table
+--------------+---------------+------+------------+
| Conditions | ConditionCode | ID | Member_Key |
+--------------+---------------+------+------------+
| DM | 3001 | 1212 | A1528 |
| HTN | 5001 | 1213 | A1528 |
| COPD | 6001 | 1214 | A1528 |
| DM | 3001 | 1215 | A1528 |
| CAD | 8001 | 1823 | B4354 |
| HTN | 5001 | 3458 | B4354 |
+--------------+---------------+------+------------+
Desired Result
+------------+------+-----+----+----+-----+-----+
| Member_Key | COPD | CAD | DM | HF | CHF | HTN |
+------------+------+-----+----+----+-----+-----+
| A1528 | 1 | | 2 | | | 1 |
| B4354 | | 1 | | | | 1 |
+------------+------+-----+----+----+-----+-----+
Query
select distinct tr.Member_Key,C.COPD,D.CAD,DM.DM,HF.HF,CHF.CHF,HTN.HTN
FROM myTable tr
--COPD
left outer join (select Member_Key,'X' as COPD
FROM myTable
where Condition=6001) C
on C.Member_Key=tr.Member_Key
--CAD
left outer join ( ....
For now I'm just using 'X'. But i'm trying to get the number of claims in place of X based on condition. I don't think using a left outer join is efficient when you are searching 1 million rows and doing a distinct. Do you have any other approach in solving this
You don't want so many sub-queries, this is easy with group by and case statements:
SELECT Member_Key
SUM(CASE WHEN Condition=6001 THEN 1 ELSE 0 END) AS COPD,
SUM(CASE WHEN Condition=3001 THEN 1 ELSE 0 END) AS DM,
SUM(CASE WHEN Condition=5001 THEN 1 ELSE 0 END) AS HTN,
SUM(CASE WHEN Condition=8001 THEN 1 ELSE 0 END) AS CAD
FROM myTable
GROUP BY Member_Key
This is an ideal situation for CASE statments:
SELECT tr.Member_Key,
SUM(CASE WHEN Condition=6001 THEN 1 ELSE 0 END) as COPD,
SUM(CASE WHEN Condition=6002 THEN 1 ELSE 0 END) as OtherIssue,
SUM(CASE etc.)
FROM myTable tr
GROUP BY tr.Member_Key
This should be done with a PIVOT, like:
SELECT *
FROM
(SELECT conditions, member_key
FROM t) src
PIVOT
(COUNT (conditions)
for conditions in ([COPD], [CAD], [DM], [HF], [CHF], [HTN])) pvt

Merging multiple rows according to an order

Suppose there are the following rows
| Id | MachineName | WorkerName | MachineState |
|----------------------------------------------|
| 1 | Alpha | Young | RUNNING |
| 1 | Beta | | STOPPED |
| 1 | Gamma | Foo | READY |
| 1 | Zeta | Zatta | |
| 2 | Guu | Niim | RUNNING |
| 2 | Yuu | Jaam | STOPPED |
| 2 | Nuu | | READY |
| 2 | Faah | Siim | |
| 3 | Iem | | RUNNING |
| 3 | Nyt | Fish | READY |
| 3 | Qwe | Siim | |
We want to merge these rows according to following priority :
STOPPED > RUNNING > READY > (null or empty)
If a row has a value for greatest priority, then value from that row should be used (only if it is not null). If it is null, a value from any other row should be used. The rows should be grouped by id
The correct output for the above input is :
| Id | MachineName | WorkerName | MachineState |
|----------------------------------------------|
| 1 | Beta | Foo | STOPPED |
| 2 | Yuu | Jaam | STOPPED |
| 3 | Iem | Fish | RUNNING |
What would be a good sql query to accomplish this? I tried using joins, but it did not work out.
You can view this as a case of the group-wise maximum problem, provided you can obtain a suitable ordering over your MachineState column—e.g. by using a CASE expression:
SELECT a.Id,
COALESCE(a.MachineName, t.MachineName) MachineName,
COALESCE(a.WorkerName , t.WorkerName ) WorkerName,
a.MachineState
FROM myTable a JOIN (
SELECT Id,
MIN(MachineName) AS MachineName,
MIN(WorkerName ) AS WorkerName,
MAX(CASE MachineState
WHEN 'READY' THEN 1
WHEN 'RUNNING' THEN 2
WHEN 'STOPPED' THEN 3
END) AS MachineState
FROM myTable
GROUP BY Id
) t ON t.Id = a.Id AND t.MachineState = CASE a.MachineState
WHEN 'READY' THEN 1
WHEN 'RUNNING' THEN 2
WHEN 'STOPPED' THEN 3
END
See it on sqlfiddle:
| id | machinename | workername | machinestate |
|----|-------------|------------|--------------|
| 1 | Beta | Foo | STOPPED |
| 2 | Yuu | Jaam | STOPPED |
| 3 | Iem | Fish | RUNNING |
You could save yourself the pain of using CASE if MachineState was an ENUM type column (defined in the appropriate order). It so happens in this case that a simple lexicographic ordering over the string value will yield the same result, but that's a coincidence on which you really shouldn't rely as it's bound to slip under the radar when someone tries to maintain this code in the future.
This is a prioritization query. One method uses variables. Another uses union all . . . this works if the states are not repeated for a given id:
select t.*
from table t
where machinestate = 'STOPPED'
union all
select t.*
from table t
where machinestate = 'RUNNING' and
not exists (select 1 from table t2 where t2.id = t.id and t2.machinestate in ('STOPPED'))
union all
select t.*
from table t
where machinestate = 'READY' and
not exists (select 1 from table t2 where t2.id = t.id and t2.machinestate in ('STOPPED', 'RUNNING'));
change MachineState as enum:
`MachineState` enum('READY','RUNNING','STOPPED') DEFAULT NULL
and sql is simple:
select t.id,state.machinename,state.workername,t.mstate from state,(select id,max(MachineState) mstate from state group by Id) t where t.mstate=state.machinestate and t.id=state.id;

PostgreSQL select all from one table and join count from table relation

I have two tables, post_categories and posts. I'm trying to select * from post_categories;, but also return a temporary column with the count for each time a post category is used on a post.
Posts
| id | name | post_category_id |
| 1 | test | 1 |
| 2 | nest | 1 |
| 3 | vest | 2 |
| 4 | zest | 3 |
Post Categories
| id | name |
| 1 | cat_1 |
| 2 | cat_2 |
| 3 | cat_3 |
Basically, I'm trying to do this without subqueries and with joins instead. Something like this, but in real psql.
select * from post_categories some-type-of-join posts, count(*)
Resulting in this, ideally.
| id | name | count |
| 1 | cat_1 | 2 |
| 2 | cat_2 | 1 |
| 3 | cat_3 | 1 |
Your help is greatly appreciated :D
You can use a derived table that contains the counts per post_category_id and left join it to the post_categories table
select p.*, coalesce(t1.p_count,0)
from post_categories p
left join (
select post_category_id, count(*) p_count
from posts
group by post_category_id
) t1 on t1.post_category_id = p.id
select post_categories.id, post_categories.name , count(posts.id)
from post_categories
inner join posts
on post_category_id = post_categories.id
group by post_categories.id, post_categories.name

How to apply a SUM operation without grouping the results in SQL?

I have a table like this one:
+----+---------+----------+
| id | group | value |
+----+---------+----------+
| 1 | GROUP A | 0.641028 |
| 2 | GROUP B | 0.946927 |
| 3 | GROUP A | 0.811552 |
| 4 | GROUP C | 0.216978 |
| 5 | GROUP A | 0.650232 |
+----+---------+----------+
If I perform the following query:
SELECT `id`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`;
I, obviously, get:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 4 | 0.216977506875992 |
+----+-------------------+
But I need a table like this one:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 3 | 2.10281205177307 |
| 4 | 0.216977506875992 |
| 5 | 2.10281205177307 |
+----+-------------------+
Where summed rows are explicitly repeated.
Is there a way to obtain this result without using multiple (nested) queries?
IT would depend on your SQL server, in Postgres/Oracle I'd use Window Functions. In MySQL... not possible afaik.
Perhaps you can fake it like this:
SELECT a.id, SUM(b.value) AS `sum`
FROM test AS a
JOIN test AS b ON a.`group` = b.`group`
GROUP BY a.id, b.`group`;
No there isn't AFAIK. You will have to use a join like
SELECT t.`id`, tsum.sum AS `sum`
FROM `test` as t GROUP BY `group`
JOIN (SELECT `id`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`) AS tsum
ON tsum.id = t.id