Pulling distinct values for individuals - sql

I am trying to get a readout of all patients who have a result in any of the four axis categories on their most recent date (proc_chron), where the patient is active ('a') under case_status, while excluding patients for whom all four categories are null. This is on SQL Server 2005.
select
pct.patient_id,
pct.clinic_id,
pct.axis_I_II_1,
pct.axis_I_II_2,
pct.axis_I_II_3,
pct.axis_III_1,
pct.proc_chron
from patient_clin_tran pct
join patient p
on p.patient_id = pct.patient_id
group by pct.patient_id, pct.clinic_id, pct.axis_I_II_1,pct.axis_I_II_2, pct.axis_I_II_3, pct.axis_III_1, p.case_status, pct.proc_chron
having p.case_status = 'a' and pct.proc_chron = (select max(pct.proc_chron))
order by pct.patient_id

select
pct.patient_id,
pct.clinic_id,
pct.axis_I_II_1,
pct.axis_I_II_2,
pct.axis_I_II_3,
pct.axis_III_1,
pct.proc_chron
from patient_clin_tran pct
join patient p
on p.patient_id = pct.patient_id
where p.case_status = 'a'
and pct.proc_chron = (select max(pct2.proc_chron) from patient_clin_tran pct2 where pct2.patient_id = p.patient_id)
and (pct.axis_I_II_1 is not null or pct.axis_I_II_2 is not null or pct.axis_I_II_3 is not null or pct.axis_III_1 is not null)
order by pct.patient_id
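A minimal sketch of the "latest row per patient" pattern above, run against an in-memory SQLite database from Python rather than SQL Server 2005. The sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER, case_status TEXT);
CREATE TABLE patient_clin_tran (
    patient_id INTEGER, clinic_id INTEGER,
    axis_I_II_1 TEXT, axis_I_II_2 TEXT, axis_I_II_3 TEXT, axis_III_1 TEXT,
    proc_chron TEXT
);
-- Hypothetical sample data: patient 3 is inactive.
INSERT INTO patient VALUES (1, 'a'), (2, 'a'), (3, 'i');
INSERT INTO patient_clin_tran VALUES
    (1, 10, 'x',  NULL, NULL, NULL, '2010-01-01'),
    (1, 10, NULL, 'y',  NULL, NULL, '2010-06-01'),
    (2, 10, 'z',  NULL, NULL, NULL, '2010-01-01'),
    (2, 10, NULL, NULL, NULL, NULL, '2010-03-01'),
    (3, 10, 'w',  NULL, NULL, NULL, '2010-02-01');
""")

# Correlated subquery: keep only the row whose proc_chron is the
# patient's maximum, then drop rows where all four axes are NULL.
latest_rows = conn.execute("""
    SELECT pct.patient_id, pct.proc_chron
    FROM patient_clin_tran pct
    JOIN patient p ON p.patient_id = pct.patient_id
    WHERE p.case_status = 'a'
      AND pct.proc_chron = (SELECT MAX(pct2.proc_chron)
                            FROM patient_clin_tran pct2
                            WHERE pct2.patient_id = p.patient_id)
      AND (pct.axis_I_II_1 IS NOT NULL OR pct.axis_I_II_2 IS NOT NULL
           OR pct.axis_I_II_3 IS NOT NULL OR pct.axis_III_1 IS NOT NULL)
    ORDER BY pct.patient_id
""").fetchall()
print(latest_rows)
```

Note that in this sketch patient 2 drops out entirely, because their most recent row has all four axes null; an older non-null row is not substituted.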

Related

SQL : Percentage Completed

I need a SQL query to calculate the percentage of courses completed per location; locations, candidates, and courses are stored in different tables.
The Courses table has a Status column, where 'C' means completed.
select Locations.Name, ( ??? ) as PercentCompleted
from Locations inner join Candidates ON Locations.Id = Candidates.SpecifiedLocation
inner join Courses on Candidates.Id = Courses.CandidateId
Group By Locations.Name
I want the results to be:
Location PercentCompleted
Loc1 10
Loc2 50
Loc3 75
where 10, 50 and 75 are percentages of courses completed per location.
Can this be achieved with a single SQL query?
If I understand correctly, I think you can do:
select l.Name,
avg(case when co.status = 'C' then 100.0 else 0 end) as PercentCompleted
from Locations l inner join
Candidates c
on l.Id = c.SpecifiedLocation inner join
Courses co
on c.Id = co.CandidateId
group by l.name;
Alternatively, count the completed courses and divide by the total number of courses per location. Multiplying by 100.0 first avoids integer division, which would otherwise truncate the result to 0:
select Locations.Name,
100.0 * sum(case when Courses.Status = 'C' then 1 else 0 end) / count(*) as PercentCompleted
from Locations inner join Candidates ON Locations.Id = Candidates.SpecifiedLocation
inner join Courses on Candidates.Id = Courses.CandidateId
Group By Locations.Name
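To see why the decimal literal matters, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for the real database, and the sample rows are made up. It compares the avg(case ...) form with a naive integer sum/count version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations  (Id INTEGER, Name TEXT);
CREATE TABLE Candidates (Id INTEGER, SpecifiedLocation INTEGER);
CREATE TABLE Courses    (CandidateId INTEGER, Status TEXT);
-- Hypothetical data: Loc1 has 1 of 4 courses completed, Loc2 has 1 of 2.
INSERT INTO Locations  VALUES (1, 'Loc1'), (2, 'Loc2');
INSERT INTO Candidates VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO Courses    VALUES (1, 'C'), (1, 'N'), (2, 'N'), (2, 'N'),
                              (3, 'C'), (3, 'N');
""")

percent_rows = conn.execute("""
    SELECT l.Name,
           -- 100.0 forces floating-point arithmetic
           AVG(CASE WHEN co.Status = 'C' THEN 100.0 ELSE 0 END) AS PercentCompleted,
           -- integer / integer truncates before the multiplication
           SUM(CASE WHEN co.Status = 'C' THEN 1 ELSE 0 END) / COUNT(*) * 100 AS Truncated
    FROM Locations l
    JOIN Candidates c ON l.Id = c.SpecifiedLocation
    JOIN Courses co   ON c.Id = co.CandidateId
    GROUP BY l.Name
    ORDER BY l.Name
""").fetchall()
print(percent_rows)
```

The third column comes out as 0 for both locations, which is the same truncation SQL Server applies when both operands are integers.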

Pull top percent of IDs from a list based off of another CTE percentage

I have the CTEs below to pull in membership and then all claims for those members (if they had a claim within the date parameters). From the total membership (1961) I need to pull the top 3% (0.03) from the claims CTE. 1961 * 0.03 rounds to 59, so I need the top 59 Medicaid IDs from Claims with the highest total claims utilization.
For example, the number_to_pull CTE gives the number of rows to pull (3% of membership), and in the sum_of_claims CTE I want to pull ONLY the top 3% of Medicaid IDs from the Claims CTE. Since the membership can change depending on the date parameters, I want sum_of_claims to look something like the query below, but I am not sure how to get started.
The end result will be a list of the top 3% of Medicaid IDs with the most claims for the date span.
I need something like this, but it should pull whatever number number_to_pull returns, ranked by the sum of claims:
Select Top ( select
round(count(mt.medicaid_no)*0.03) as percentt
from membership mt)
cll.medicaid_no
,count(distinct claim_number) as sum_of_claims
from claims cll
Group by cll.medicaid_no
) select * from sum_of_claims
This is what my code actually looks like:
WITH
DATES AS
(
select TRUNC(TRUNC(SYSDATE,'y')-1,'y') as startdate,
TRUNC(SYSDATE,'y')-1 as enddate
from dual
),
membership as (
select Distinct
mbr.medicaid_no
,mbd.memb_dim_id
,mbd.memb_demographics_full_date
from dw.fact_member_demographics mbd
inner join dates d
on 1=1
inner join dw.DIM_MEMBER mbr
on mbd.memb_dim_id = mbr.memb_dim_id
Where EXTRACT(YEAR FROM mbd.memb_demographics_full_date)= extract(year from d.startdate)
and mbd.company_dim_id in ('575')
and mbd.age > 18
) ---select * from membership
,number_to_pull as (
select
round(count(mt.medicaid_no)*0.03) as percentt
from membership mt
) ---select * from top_number
,Claims as (
select
mbdd.medicaid_no
,mbdd.memb_dim_id
,dc.company_desc
,cl.primary_svc_date
,cl.claim_number
,case when cl.io_flag_dim_id = '1' then 'Inpatient'
when cl.io_flag_dim_id = '2' then 'Outpatient' else 'false' end as In_Op
,cl.admit_type
,proc.procedure_code
,dx1.diagnosis_code as dx1
,dx1.diagnosis_short_desc as dx1desc
,dx2.diagnosis_code as dx2
,dx2.diagnosis_short_desc as dx2desc
,dx3.diagnosis_code as dx3
,dx3.diagnosis_short_desc as dx3desc
,dx4.diagnosis_code as dx4
,dx4.diagnosis_short_desc as dx4desc
,dx5.diagnosis_code as dx5
,dx5.diagnosis_short_desc as dx5desc
,bt.inp_outp_ind
from membership mbdd
left join dw.fact_claim cl
on mbdd.memb_dim_id = cl.memb_dim_id
inner join dates d
on 1=1
inner join dw.DIM_PROCEDURE_CODE proc
on cl.cpt_code_dim_id = proc.procedure_dim_id
inner join dw.DIM_DIAGNOSIS dx1
on cl.diagnosis_1_dim_id = dx1.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx2
on cl.diagnosis_2_dim_id = dx2.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx3
on cl.diagnosis_3_dim_id = dx3.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx4
on cl.diagnosis_4_dim_id = dx4.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx5
on cl.diagnosis_5_dim_id = dx5.diagnosis_dim_id
inner join dw.DIM_BILL_TYPE bt
on cl.bill_type_dim_id = bt.bill_type_dim_id
inner join dw.DIM_COMPANY dc
on cl.company_dim_id = dc.company_dim_id
Where cl.primary_svc_date between d.startdate and d.enddate
and cl.company_dim_id in ('575')
and CL.WHOLE_CLAIM_STATUS_DIM_ID IN (1,2)
and cl.io_flag_dim_id in ('1','2')
) ---select * from claims
,sum_of_claims AS (
Select
---- this is where I want to pull in the top 3% based on membership and sum of claims per Medicaid ID
cll.medicaid_no
,count(distinct claim_number) as sum_of_claims
from claims cll
Group by cll.medicaid_no
) select * from sum_of_claims
The end result I want is a list of Medicaid IDs and their total sum of claims, limited to the top 59 rows (3%):
MEDICAID_NO SUM_OF_CLAIMS
111111 $12,439.61
333333 $5,315.57
444444 $2,007.00
555555 $1,823.98
888888 $1,770.00
777777 $1,211.47
9999999 $1,157.61
6666666 $1,068.76
If I read your question correctly, you want to get the top 3% here. This is your final query:
select * from sum_of_claims;
I think you want to replace this with something like the following:
SELECT medicaid_no, sum_of_claims FROM (
SELECT medicaid_no, sum_of_claims, COUNT(*) OVER () AS total_cnt
, ROW_NUMBER() OVER ( ORDER BY sum_of_claims DESC ) AS rn
FROM sum_of_claims
) WHERE rn <= 0.03 * total_cnt;
This will get the "top" 3% of records (where "top" is defined as those with the greatest claim amounts).
By the way, I find it hard to believe that this is what you want:
,count(distinct claim_number) as sum_of_claims
That wouldn't give a sum at all!
Hope this helps.
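The ROW_NUMBER() / COUNT(*) OVER () filter can be tried out in isolation. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of Oracle, and a plain table named sum_of_claims filled with 100 invented rows stands in for the CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sum_of_claims (medicaid_no INTEGER, sum_of_claims INTEGER)")
# Hypothetical data: member i has a claim total of i * 10.
conn.executemany(
    "INSERT INTO sum_of_claims VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 101)],
)

# Rank members by claim total and keep the rows whose rank falls
# within 3% of the total row count (here 0.03 * 100 = 3 rows).
top_rows = conn.execute("""
    SELECT medicaid_no, sum_of_claims
    FROM (
        SELECT medicaid_no, sum_of_claims,
               COUNT(*) OVER () AS total_cnt,
               ROW_NUMBER() OVER (ORDER BY sum_of_claims DESC) AS rn
        FROM sum_of_claims
    ) ranked
    WHERE rn <= 0.03 * total_cnt
    ORDER BY rn
""").fetchall()
print(top_rows)
```

Here the 3% is taken from the number of rows in sum_of_claims itself. If it should instead be 3% of the membership count, as in the question, the right-hand side of the comparison would be a scalar subquery against membership.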
Thanks, David F. I was able to pull in the top 3% of members from claims. Below is my code.
WITH
DATES AS
(
select TRUNC(TRUNC(SYSDATE,'y')-1,'y') as startdate,
TRUNC(SYSDATE,'y')-1 as enddate
from dual
),
membership as (
select Distinct
mbr.medicaid_no
,mbd.memb_dim_id
,mbd.memb_demographics_full_date
from dw.fact_member_demographics mbd
inner join dates d
on 1=1
inner join dw.DIM_MEMBER mbr
on mbd.memb_dim_id = mbr.memb_dim_id
Where EXTRACT(YEAR FROM mbd.memb_demographics_full_date)= extract(year from d.startdate)
and mbd.company_dim_id in ('575')
and mbd.age > 18
) ---select * from membership
,Claims as (
select
mbdd.medicaid_no
,mbdd.memb_dim_id
,dc.company_desc
,cl.primary_svc_date
,cl.claim_number
,case when cl.io_flag_dim_id = '1' then 'Inpatient'
when cl.io_flag_dim_id = '2' then 'Outpatient' else 'false' end as In_Op
,cl.admit_type
,proc.procedure_code
,dx1.diagnosis_code as dx1
,dx1.diagnosis_short_desc as dx1desc
,dx2.diagnosis_code as dx2
,dx2.diagnosis_short_desc as dx2desc
,dx3.diagnosis_code as dx3
,dx3.diagnosis_short_desc as dx3desc
,dx4.diagnosis_code as dx4
,dx4.diagnosis_short_desc as dx4desc
,dx5.diagnosis_code as dx5
,dx5.diagnosis_short_desc as dx5desc
,bt.inp_outp_ind
,cl.net_amt
from membership mbdd
left join dw.fact_claim cl
on mbdd.memb_dim_id = cl.memb_dim_id
inner join dates d
on 1=1
inner join dw.DIM_PROCEDURE_CODE proc
on cl.cpt_code_dim_id = proc.procedure_dim_id
inner join dw.DIM_DIAGNOSIS dx1
on cl.diagnosis_1_dim_id = dx1.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx2
on cl.diagnosis_2_dim_id = dx2.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx3
on cl.diagnosis_3_dim_id = dx3.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx4
on cl.diagnosis_4_dim_id = dx4.diagnosis_dim_id
inner join dw.DIM_DIAGNOSIS dx5
on cl.diagnosis_5_dim_id = dx5.diagnosis_dim_id
inner join dw.DIM_BILL_TYPE bt
on cl.bill_type_dim_id = bt.bill_type_dim_id
inner join dw.DIM_COMPANY dc
on cl.company_dim_id = dc.company_dim_id
Where cl.primary_svc_date between d.startdate and d.enddate
and cl.company_dim_id in ('575')
and CL.WHOLE_CLAIM_STATUS_DIM_ID IN (1,2) -- pulling in only paid claims -- use whole claim
and cl.io_flag_dim_id in ('1','2')
) ---select * from claims
,sum_of_claims AS (
Select
cll.medicaid_no
,sum(distinct cll.net_amt) as sum_of_claims
from claims cll
Group by cll.medicaid_no
)
----- this is the new part added below
SELECT medicaid_no, sum_of_claims FROM (
SELECT sh.medicaid_no, sh.sum_of_claims, COUNT(*) OVER () AS total_cnt
, ROW_NUMBER() OVER ( ORDER BY sh.sum_of_claims DESC ) AS rn
FROM sum_of_claims sh
)
Where rn <= (SELECT round(count(medicaid_no)*0.03) as percentt
from membership)

Filter between dates grouping 3 tables in SQL Server

I have this SQL in SQL Server:
SELECT
Itens.Mercadoria, Mercadoria.Nome, Cabecalho.Data,
SUM(ValorUnitario) AS Total,
SUM(Quantidade) AS Quantidade
FROM
Itens
INNER JOIN
Mercadoria ON Itens.Mercadoria = Mercadoria.Codigo
INNER JOIN
Cabecalho ON Cabecalho.Codigo = Itens.Cabecalho
WHERE
Cabecalho.Data >= '2016-01-01'
AND Cabecalho.Data <= '2018-12-31'
GROUP BY
Itens.Mercadoria, Mercadoria.Nome, Cabecalho.Data
ORDER BY
4 DESC
It is returning the following result.
The highlighted values are repeated. I do not want repeats: each item should appear only once, with the Quantidade and Total fields summed.
For example:
`Camisa Polo` -> **Quantidade = 23**
`Calça Jeans` -> **Quantidade = 15**
`Camiseta Estampada` -> **Quantidade = 21**
Assuming that the relation between Sales and SaleItems is based on IdSale, you can use BETWEEN after assigning proper values to your_start_date and your_end_date:
select Products.ProductName
, sum(SaleItems.Price)
, sum(SaleItems.Quantity)
from Products
inner join SaleItems on SaleItems.IdProduct = Products.IdProduct
inner join Sales on Sales.IdSale = SaleItems.IdSale
where SaleDate between your_start_date and your_end_date
group by Products.ProductName
In your case, either remove the Cabecalho.Data column or aggregate it, e.g.:
SELECT Itens.Mercadoria
, Mercadoria.Nome
, SUM(ValorUnitario) AS Total
, SUM(Quantidade) AS Quantidade
FROM Itens INNER JOIN Mercadoria ON Itens.Mercadoria = Mercadoria.Codigo
INNER JOIN Cabecalho ON Cabecalho.Codigo = Itens.Cabecalho
WHERE Cabecalho.Data between '2016-01-01' AND '2018-12-31'
GROUP BY Itens.Mercadoria, Mercadoria.Nome
ORDER BY Total DESC
or
SELECT Itens.Mercadoria
, Mercadoria.Nome
, max(Cabecalho.Data)
, SUM(ValorUnitario) AS Total
, SUM(Quantidade) AS Quantidade
FROM Itens INNER JOIN Mercadoria ON Itens.Mercadoria = Mercadoria.Codigo
INNER JOIN Cabecalho ON Cabecalho.Codigo = Itens.Cabecalho
WHERE Cabecalho.Data between '2016-01-01' AND '2018-12-31'
GROUP BY Itens.Mercadoria, Mercadoria.Nome
ORDER BY Total DESC
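The effect of dropping Cabecalho.Data from the GROUP BY can be checked with a small sketch. It uses Python's sqlite3 module rather than SQL Server, and the rows are invented so that Camisa Polo sells on two different dates inside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Mercadoria (Codigo INTEGER, Nome TEXT);
CREATE TABLE Cabecalho  (Codigo INTEGER, Data TEXT);
CREATE TABLE Itens (Mercadoria INTEGER, Cabecalho INTEGER,
                    ValorUnitario REAL, Quantidade INTEGER);
-- Hypothetical data; the 2019 sale falls outside the date filter.
INSERT INTO Mercadoria VALUES (1, 'Camisa Polo'), (2, 'Calça Jeans');
INSERT INTO Cabecalho  VALUES (1, '2016-05-01'), (2, '2017-03-10'), (3, '2019-01-01');
INSERT INTO Itens VALUES (1, 1, 50.0, 10), (1, 2, 50.0, 13),
                         (2, 1, 80.0, 15), (1, 3, 50.0, 99);
""")

base = """
    FROM Itens
    JOIN Mercadoria ON Itens.Mercadoria = Mercadoria.Codigo
    JOIN Cabecalho  ON Cabecalho.Codigo = Itens.Cabecalho
    WHERE Cabecalho.Data BETWEEN '2016-01-01' AND '2018-12-31'
"""

# Grouping by the date yields one row per item per date.
per_date = conn.execute(
    "SELECT Itens.Mercadoria, Mercadoria.Nome, Cabecalho.Data, SUM(Quantidade)"
    + base + "GROUP BY Itens.Mercadoria, Mercadoria.Nome, Cabecalho.Data"
).fetchall()

# Without the date in GROUP BY, each item collapses to a single row.
per_item = conn.execute(
    "SELECT Itens.Mercadoria, Mercadoria.Nome,"
    " SUM(ValorUnitario) AS Total, SUM(Quantidade) AS Quantidade"
    + base + "GROUP BY Itens.Mercadoria, Mercadoria.Nome ORDER BY Total DESC"
).fetchall()
print(len(per_date), per_item)
```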

Aggregation for join ON clause

I have a table item_table like this:
item age
--------------
1 1
1 6
2 2
I have the other table price_table like this:
item pricetype price
--------------------------
1 O 5
1 P 6
1 V 7
2 O 8
2 P 9
2 V 10
So, I want to inner join above two tables.
select *
from item_table i
inner join price_table p
on ...
The ON clause depends on a condition:
If the average age of an item is bigger than 3, then I do: inner join price_table on pricetype = 'O' or pricetype = 'P'
If not, then I do: inner join price_table on pricetype = 'O' or pricetype = 'P' or pricetype = 'V'
So the join condition itself is conditional.
I then wrote the query like this:
select i.item, i.type, p.pricetype, p.price
from item_table i
inner join price_table p on i.item = p.item
and (avg(i.age) >= 3 and p.pricetype in ('O', 'P'))
or (avg(i.age) < 3 and p.pricetype in ('O', 'P', 'V'))
This gives the error: An aggregate cannot appear in an ON clause unless it is in a subquery contained in a HAVING clause or select list, and the column being aggregated is an outer reference.
I can't move the avg to HAVING because other conditions depend on the avg.
How can I write the select query?
select *
from (
select item, avg(age) as AvgAge
from item_table
group by item
) ia
inner join price_table p on ia.item = p.item
and ((ia.AvgAge >= 3 and p.pricetype in ('O', 'P'))
or (ia.AvgAge < 3 and p.pricetype in ('O', 'P', 'V')))
This can be simplified to:
select *
from (
select item, avg(age) as AvgAge
from item_table
group by item
) ia
inner join price_table p on ia.item = p.item
and (p.pricetype in ('O', 'P')
or (ia.AvgAge < 3 and p.pricetype = 'V'))
Did you try placing the aggregation in a subquery? Then you have the avg() value available for use in the JOIN clause:
select i.item, i.type, p.pricetype, p.price
from
(
select avg(i.age) age, i.item, i.type -- not sure where type is coming from in your OP as it is not in the table you showed
from item_table i
group by i.item, i.type
) i
inner join price_table p
on i.item = p.item
and ((i.age>= 3 and p.pricetype in ('O', 'P'))
or (i.age < 3 and p.pricetype in ('O', 'P', 'V')))
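The derived-table approach can be checked against the sample data from the question. This is a sketch using Python's sqlite3 module in place of SQL Server; the type column is left out because it does not appear in the tables shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_table  (item INTEGER, age INTEGER);
CREATE TABLE price_table (item INTEGER, pricetype TEXT, price INTEGER);
INSERT INTO item_table  VALUES (1, 1), (1, 6), (2, 2);
INSERT INTO price_table VALUES (1, 'O', 5), (1, 'P', 6), (1, 'V', 7),
                               (2, 'O', 8), (2, 'P', 9), (2, 'V', 10);
""")

# Aggregate first in a derived table, then use the plain column
# AvgAge in the ON clause, where an aggregate itself is not allowed.
joined = conn.execute("""
    SELECT ia.item, p.pricetype, p.price
    FROM (SELECT item, AVG(age) AS AvgAge
          FROM item_table
          GROUP BY item) ia
    JOIN price_table p
      ON ia.item = p.item
     AND (p.pricetype IN ('O', 'P')
          OR (ia.AvgAge < 3 AND p.pricetype = 'V'))
    ORDER BY ia.item, p.pricetype
""").fetchall()
print(joined)
```

Item 1 averages 3.5, so its 'V' price is filtered out; item 2 averages 2, so all three price types survive the join.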

Query for logistic regression, multiple where exists

The input to a logistic regression is composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria, the query takes longer to run than I would expect. Is there a more efficient approach than the one below? Note: tblicd is a large lookup table of text descriptions with 15k+ rows. The query makes no real sense; it is just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc
I would start by using subqueries in the from clause:
select p.patientid,
coalesce(q.moreThan1000, 0) as moreThan1000,
coalesce(q.moreThan1500, 0) as moreThan1500,
(case when d.patientid is not null then 1 else 0 end) as diabetes,
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.
Example of left joins as requested...
SELECT
patient.patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your condition queries return at most one row per patient, you can drop the DISTINCT and simplify them to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA
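A minimal, runnable sketch of the left-join approach, using Python's sqlite3 module rather than SQL Server. The tables are cut down to the columns needed, the sample rows are invented, and the threshold is lowered to 2 (the hypothetical flag moreThan2) so the tiny data set can trigger it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient    (patientid INTEGER);
CREATE TABLE tblclaims  (patientid INTEGER);
CREATE TABLE tblicd     (icd_id INTEGER, descrip TEXT);
CREATE TABLE patienticd (patientid INTEGER, icd_id INTEGER);
-- Hypothetical data: patient 1 has three claims, patient 2 has diabetes.
INSERT INTO patient    VALUES (1), (2), (3);
INSERT INTO tblclaims  VALUES (1), (1), (1), (2);
INSERT INTO tblicd     VALUES (100, 'Type 2 diabetes'), (200, 'Asthma');
INSERT INTO patienticd VALUES (2, 100), (3, 200);
""")

# Each condition is computed once in its own derived table and
# left-joined; a missing match becomes 0 instead of NULL.
flags = conn.execute("""
    SELECT p.patientid,
           COALESCE(q.moreThan2, 0) AS moreThan2,
           CASE WHEN d.patientid IS NOT NULL THEN 1 ELSE 0 END AS diabetes
    FROM patient p
    LEFT JOIN (SELECT patientid,
                      CASE WHEN COUNT(*) > 2 THEN 1 ELSE 0 END AS moreThan2
               FROM tblclaims
               GROUP BY patientid) q ON q.patientid = p.patientid
    LEFT JOIN (SELECT DISTINCT picd.patientid
               FROM patienticd picd
               JOIN tblicd t ON t.icd_id = picd.icd_id
               WHERE t.descrip LIKE '%diabetes%') d ON d.patientid = p.patientid
    ORDER BY p.patientid
""").fetchall()
print(flags)
```

Whether this outperforms the correlated EXISTS version on real data depends on the optimizer and indexes, so it is worth comparing execution plans.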