Pick up only the most recent object associated with a person - sql

I'm having trouble with an exam question. I didn't get it right on the exam, but I want to know what I am missing.
The Question:
List the first diagnosis for each patient showing the patient's name,
diagnosis code and diagnosis date. If the patient has two or more
diagnoses on the earliest date, it's okay to just show one of those
diagnoses. Tables needed: encounters, patients, encounter_diagnoses,
diagnoses. Your result set should have four rows.
Here is what I had:
select p.patient_nm, max(e.start_dts)
from edw_emr_ods.patients p
join edw_emr_ods.encounters e
on p.patient_id = e.patient_id
join edw_emr_ods.encounter_diagnoses ed
on e.encounter_id = ed.encounter_id
left join edw_emr_ods.diagnoses d
on ed.encounter_diagnoses_id = d.diagnosis_id
group by p.patient_nm
order by p.patient_nm asc
As you can see, I did not include the diagnosis code (more on that below). This returned 4 rows.
When I attempt to add the Diagnosis code I get
"Column 'edw_emr_ods.diagnoses.code' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
The only way I could figure out to remove this is to add code in the group by, since it is unable to sort the code and name together without it. But this returns too many rows per patient.
So my question is this "How do I only pick up the name, date, and code of the most recent diagnosis?"

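Why not use ROW_NUMBER() OVER (...) to answer your question, "How do I only pick up the name, date, and code of the most recent diagnosis?" Partition by patient only (partitioning by the code as well would return one row per patient and code) and order by the date descending. For the exam's "first diagnosis", order ascending instead:
SELECT * FROM (
select p.patient_nm, d.code, e.start_dts, row_number() over(partition by p.patient_id order by e.start_dts desc) as rn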
from edw_emr_ods.patients p
join edw_emr_ods.encounters e
on p.patient_id = e.patient_id
join edw_emr_ods.encounter_diagnoses ed
on e.encounter_id = ed.encounter_id
left join edw_emr_ods.diagnoses d
on ed.encounter_diagnoses_id = d.diagnosis_id ) AS T
WHERE rn =1
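To see what the partition and the sort direction do, here is a minimal sketch run against an in-memory SQLite database from Python. The sample rows, the schema-less table names, and the ed.diagnosis_id join column are assumptions made for illustration, not the exam schema, and SQLite 3.25 or newer is assumed for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the exam tables; column names are assumptions.
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, patient_nm TEXT);
CREATE TABLE encounters (encounter_id INTEGER PRIMARY KEY, patient_id INTEGER, start_dts TEXT);
CREATE TABLE encounter_diagnoses (encounter_id INTEGER, diagnosis_id INTEGER);
CREATE TABLE diagnoses (diagnosis_id INTEGER PRIMARY KEY, code TEXT);
INSERT INTO patients VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO encounters VALUES (10, 1, '2020-01-05'), (11, 1, '2020-03-01'), (12, 2, '2020-02-10');
INSERT INTO encounter_diagnoses VALUES (10, 100), (11, 101), (12, 100);
INSERT INTO diagnoses VALUES (100, 'A01'), (101, 'B02');
""")


def pick(direction):
    """Return one diagnosis per patient: 'ASC' = earliest, 'DESC' = most recent."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    sql = f"""
    SELECT patient_nm, code, start_dts
    FROM (
        SELECT p.patient_nm, d.code, e.start_dts,
               ROW_NUMBER() OVER (PARTITION BY p.patient_id
                                  ORDER BY e.start_dts {direction}) AS rn
        FROM patients p
        JOIN encounters e ON p.patient_id = e.patient_id
        JOIN encounter_diagnoses ed ON e.encounter_id = ed.encounter_id
        LEFT JOIN diagnoses d ON ed.diagnosis_id = d.diagnosis_id
    ) AS t
    WHERE rn = 1
    ORDER BY patient_nm
    """
    return conn.execute(sql).fetchall()


first = pick("ASC")    # earliest diagnosis per patient
latest = pick("DESC")  # most recent diagnosis per patient
print(first)
print(latest)
```

Because the code is selected inside the numbered subquery rather than grouped on, each patient keeps exactly one row and no GROUP BY error arises.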

Related

SQL Join from Two Tables showing only maximum date in one table

I have two tables, visits and encounters. Each visit by a student may have several encounters, at different times. I would like a query with visitid, encounterid, and encounterdate showing ONLY the latest encounter for each visit. My results MUST include visits with no encounters.
My tables:
Visits
visit_id
studenti_id
Encounters
encounter_id
visit_id
encounter_datetime
What I have tried
select
Visits.visit_id,
Encounters.encounter_id,
Encounters.encounter_datetime
FRom Visits
LEFT OUTER JOIN Encounters
ON Visits.visit_id = Encounters.visit_id
INNER JOIN (
select Encounters.visit_id, MAX(Encounters.encounter_datetime)as Latest
from Encounters
group by Encounters.visit_id
) as NewEncounters
ON Encounters.visit_id = NewEncounters.visit_id
AND Encounters.encounter_datetime = NewEncounters.Latest
This returns the results I want, HOWEVER, Visits without encounters are not in the results.
I actually don't see a clean way to salvage your direct join approach, but if your database supports ROW_NUMBER, it is an easy option:
WITH cte AS (
SELECT v.visit_id, e.encounter_id, e.encounter_datetime,
ROW_NUMBER() OVER (PARTITION BY v.visit_id ORDER BY e.encounter_datetime DESC) rn
FROM Visits v
LEFT JOIN Encounters e ON v.visit_id = e.visit_id
)
SELECT visit_id, encounter_id, encounter_datetime
FROM cte
WHERE rn = 1;
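As a quick check that the LEFT JOIN plus ROW_NUMBER pattern keeps visits that have no encounters, here is a small sketch using Python's sqlite3 module. The sample rows are made up for illustration, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: visit 1 has two encounters, visit 2 has one, visit 3 has none.
conn.executescript("""
CREATE TABLE Visits (visit_id INTEGER PRIMARY KEY, studenti_id INTEGER);
CREATE TABLE Encounters (encounter_id INTEGER PRIMARY KEY, visit_id INTEGER,
                         encounter_datetime TEXT);
INSERT INTO Visits VALUES (1, 7), (2, 8), (3, 9);
INSERT INTO Encounters VALUES
    (100, 1, '2021-05-01 09:00'),
    (101, 1, '2021-05-01 11:30'),
    (102, 2, '2021-05-02 10:00');
""")

rows = conn.execute("""
WITH cte AS (
    SELECT v.visit_id, e.encounter_id, e.encounter_datetime,
           ROW_NUMBER() OVER (PARTITION BY v.visit_id
                              ORDER BY e.encounter_datetime DESC) AS rn
    FROM Visits v
    LEFT JOIN Encounters e ON v.visit_id = e.visit_id
)
SELECT visit_id, encounter_id, encounter_datetime
FROM cte
WHERE rn = 1
ORDER BY visit_id
""").fetchall()

for row in rows:
    print(row)
```

Visit 3 survives because its single NULL-extended row is still numbered 1 within its own partition.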
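To get the latest of several dates, here is an (untested, sorry) code example that shows the line of approach. Each encounter a is paired with every encounter b of the same visit that is at least as late. The latest encounter is paired only with itself, so it is the one whose group contains exactly one row.
Select
Visits.visit_id,
a.encounter_id,
max(a.encounter_datetime) as Max_Datetime
FROM Visits
LEFT OUTER JOIN Encounters a
ON Visits.visit_id = a.visit_id
inner join
Encounters b
on a.visit_id=b.visit_id
and
a.encounter_datetime<=b.encounter_datetime
group by
Visits.visit_id,
a.encounter_id,
a.encounter_datetime
having count(*) = 1;
The inner join drops visits without encounters. For those you can union a second query with a where clause using Is Null. Note that if two encounters of a visit share the latest datetime, neither forms a group of one, so that visit is missed.
Your database may need minor syntax adjustments.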

Filtering SELECT TOP WITH TIES When No Records Exist for a Specific Column

Question: How can I filter my results (see below) to exclude erroneous data? I'm guessing my problem is somewhere in the WHERE clause but for the life of me, I can't figure it out.
End Goal: Return NULL values for the CDA_Orientation column where no values exist in the portfolio and e_component tables (e.g. employee has not had Orientation yet)
DB Schema:
Result Set with Errors:
NOTE: The Orientation dates for Eastman, DeLuca, and Fontano are the same date and represent the TOP 1 result from the course_startdate column of the portfolio table.
What I Want the Results to Look Like:
If I've done my JOINS correctly, the CDA_Orientation column should show NULL because there is no entry in the portfolio table (and accordingly, the e_component table) for these three individuals. The entry is only created by the front end when the Employee is assigned to a course.
Here is My Code:
SELECT TOP (1) WITH TIES
P.lastname+', '+P.firstname AS Employee,
P.person_id,
CONVERT(DATE,PC.CDAI_EXP_DATE) AS CDA_Infant,
CONVERT(DATE,PC.CDAP_EXP_DATE) AS CDA_Preschool,
CONVERT(DATE,PO.course_startdate) AS CDA_Orientation
FROM person P
JOIN person_custom PC ON PC.person_id=P.person_id
LEFT JOIN portfolio PO ON P.person_id=PO.person_id
FULL JOIN e_component EC ON PO.component_id=EC.component_id
WHERE (cdai_exp_date IS NOT NULL OR cdap_exp_date IS NOT NULL)
AND PO.course_startdate IN (SELECT course_startdate
FROM portfolio PO
LEFT JOIN e_component EC ON PO.component_id=EC.component_id
WHERE (EC.userdefined_id LIKE '000150%' AND PO.status=11))
ORDER BY ROW_NUMBER() OVER(PARTITION BY P.lastname+', '+P.firstname
ORDER BY PO.person_id)
NOTE: The TOP (1) WITH TIES has successfully pulled the most recent orientation date (employees can have more than one) from the portfolio table for Tarkin and Rust. I've cut out any and all unnecessary JOINS and caveats.
Thanks in advance!
I believe the joins are the issue. Using WITH TIES in that way is also confusing if you're just trying to get a record for each person; I would use a GROUP BY. If you wanted to do it without a sub-query you could do:
SELECT
P.lastname+', '+P.firstname AS Employee,
P.person_id,
CONVERT(DATE,PC.CDAI_EXP_DATE) AS CDA_Infant,
CONVERT(DATE,PC.CDAP_EXP_DATE) AS CDA_Preschool,
MAX(CONVERT(DATE,PO.course_startdate)) AS CDA_Orientation
FROM #person P
JOIN #person_custom PC
ON PC.person_id=P.person_id
LEFT JOIN
(#portfolio PO
JOIN #e_component EC
ON PO.component_id=EC.component_id
AND EC.userdefined_id LIKE '000150%'
AND PO.status=11)
ON P.person_id=PO.person_id
WHERE (cdai_exp_date IS NOT NULL OR cdap_exp_date IS NOT NULL)
GROUP BY P.lastname, P.firstname, P.person_id,PC.CDAI_EXP_DATE,PC.CDAP_EXP_DATE
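One reason the joins matter here: a filter on an outer-joined table behaves differently in WHERE than in ON. The sketch below, using Python's sqlite3 module, is a simplified illustration with made-up rows and only two tables (e_component and person_custom are left out), not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical data: Tarkin has two qualifying orientation rows,
# Eastman has no portfolio rows, DeLuca has only a non-qualifying row.
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, lastname TEXT);
CREATE TABLE portfolio (person_id INTEGER, status INTEGER, course_startdate TEXT);
INSERT INTO person VALUES (1, 'Tarkin'), (2, 'Eastman'), (3, 'DeLuca');
INSERT INTO portfolio VALUES
    (1, 11, '2019-01-10'),
    (1, 11, '2019-06-01'),
    (3, 5,  '2019-03-15');
""")

# Filter in WHERE: NULL-extended rows fail the test, so the LEFT JOIN
# effectively turns into an inner join.
where_rows = conn.execute("""
SELECT p.lastname, MAX(po.course_startdate)
FROM person p
LEFT JOIN portfolio po ON po.person_id = p.person_id
WHERE po.status = 11
GROUP BY p.person_id, p.lastname
ORDER BY p.person_id
""").fetchall()

# Filter in ON: people without a qualifying row are kept with NULL.
on_rows = conn.execute("""
SELECT p.lastname, MAX(po.course_startdate)
FROM person p
LEFT JOIN portfolio po ON po.person_id = p.person_id AND po.status = 11
GROUP BY p.person_id, p.lastname
ORDER BY p.person_id
""").fetchall()

print(where_rows)
print(on_rows)
```

With the condition in the ON clause, MAX returns NULL for employees who have no matching orientation row, which is the result the question asks for.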

Oracle Sql Duplicate rows when joining new table

I am using oracle sql to join tables. I use the following code:
SELECT
T.TRANSACTION_KEY,
PR.ACCOUNT_KEY,
T.ACCT_CURR_AMOUNT,
T.EXECUTION_LOCAL_DATE_TIME,
TC.DESCRIPTION,
T.OPP_ACCOUNT_NAME,
T.OPP_COUNTRY,
PT.PARTY_TYPE_DESC,
P.PARTY_NAME,
P.CUSTOM_SMALL_STRING_02,
CO.COUNTRY_NAME,
LE.LIST_CD
FROM TRANSACTIONS T
LEFT JOIN TRANSACTION_CODE TC
ON T.TRANSACTION_CODE = TC.ENTITY
LEFT JOIN PARTY_ACCOUNT_RELATION PR
ON T.ACCOUNT = PR.ACCOUNT
LEFT JOIN PARTY P
ON PR.PARTY_KEY = P.PARTY_KEY
LEFT JOIN PARTY_TYPE PT
ON P.PARTY_TYPE = PT.ENTITY
LEFT JOIN COUNTRY CO
ON T.OPP_COUNTRY = CO.ENTITY
LEFT JOIN LISTED_ENTITY LE
ON CO.COUNTRY = LE.ENTITY_KEY
WHERE
PR.PARTY_KEY = '111111111' and T.EXECUTION_LOCAL_DATE_TIME>'2017-01-01';
It has worked fine so far, but I want to join another table which has a column (ENTITY_KEY) in common with the PARTY_ACCOUNT_RELATION table (ACCOUNT_KEY). I want to include some of the new table's columns, but when I do that, the rows become duplicated. I am adding the following lines before the WHERE clause:
LEFT JOIN EVALUATE_RULE ER
ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
Does anyone know where the problem is?
If joining another table into an existing query causes the existing rows to be duplicated, it is because the table being joined in has duplicate values in the columns that are being used as keys for the join.
In your case, if you do
SELECT ENTITY_KEY FROM EVALUATE_RULE GROUP BY ENTITY_KEY HAVING COUNT(*) > 1
You'll see which entity_keys are duplicated. When these duplicates are joined to the existing data, the existing data has to be doubled up to permit both rows from EVALUATE_RULE with the same ENTITY_KEY to exist in the result set.
You must either de-dupe the table, or put other clauses into your ON condition to further restrict the rows coming from EVALUATE_RULE.
For example, after adding EVALUATE_RULE and putting ER.* in your SELECT list, imagine that you can see that the rows from ER have status = 'old' and status = 'current', but you know you only want the current ones. In that case, put AND er.status = 'current' in your ON clause.
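The effect is easy to reproduce on a toy dataset. The sketch below uses Python's sqlite3 module rather than Oracle, with two cut-down tables and invented rows. The status column is the hypothetical one from the example above, not a known column of EVALUATE_RULE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows: entity_key 'A1' appears twice in evaluate_rule.
conn.executescript("""
CREATE TABLE party_account_relation (account_key TEXT, party_key TEXT);
CREATE TABLE evaluate_rule (entity_key TEXT, status TEXT);
INSERT INTO party_account_relation VALUES ('A1', 'P1'), ('A2', 'P1');
INSERT INTO evaluate_rule VALUES ('A1', 'old'), ('A1', 'current'), ('A2', 'current');
""")

# Plain join: the 'A1' row is doubled because two rule rows match it.
joined = conn.execute("""
SELECT pr.account_key, er.status
FROM party_account_relation pr
LEFT JOIN evaluate_rule er ON pr.account_key = er.entity_key
ORDER BY pr.account_key, er.status
""").fetchall()

# Diagnostic query: which join keys occur more than once?
dup_keys = conn.execute("""
SELECT entity_key FROM evaluate_rule
GROUP BY entity_key HAVING COUNT(*) > 1
""").fetchall()

# Extra ON condition: back to one row per account.
restricted = conn.execute("""
SELECT pr.account_key, er.status
FROM party_account_relation pr
LEFT JOIN evaluate_rule er
       ON pr.account_key = er.entity_key AND er.status = 'current'
ORDER BY pr.account_key
""").fetchall()

print(joined)
print(dup_keys)
print(restricted)
```

Two account rows become three after the plain join, and the extra ON condition brings the count back to two.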
Your comment indicates that multiple records differ by some column you don't care about, so this technique will just select only one row:
LEFT JOIN
(SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e) er
ON
er.entity_key = pr.account_key and
er.rown = 1
If you want info on why this works, run that sql in isolation:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e
ORDER BY e.entity_key -- i added this to make it more clear what is going on. You don't need it in your main query
It just assigns a number to each row in the table, the number restarts at 1 every time entity_key changes, so we can then select all those with rown = 1
If it turns out you DO want something specific like "the latest row from evaluate_rule", you can use something like this:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.created_date DESC) as rown FROM evaluate_rule e
Now the latest created_date row will always have rown = 1
As far as I can understand from your description, table EVALUATE_RULE has multiple records whose ENTITY_KEY matches the same ACCOUNT_KEY.
You can change your query section:
LEFT JOIN EVALUATE_RULE ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
to
LEFT JOIN (SELECT DISTINCT ENTITY_KEY FROM EVALUATE_RULE) ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
If you post the structure of EVALUATE_RULE (indicating the PK columns), I can change my answer to let you include EVALUATE_RULE columns in the final query.

Specified number of equal rows in oracle

I'm working on a medical database with oracle where I'm trying to find randomly matched samples. I have created a table with all patients and after that a table with the patients which have the illness I am looking for. Now I am wondering if it is possible to match exactly 3 randomly chosen patients to my target group (so each patient from the target group gets 3 randomly chosen patients from the table with all patients) compared on the basis of gender and year of birth.
SELECT A.PATIENTID
FROM ALLPATIENTS A,
DIAGNOSES B
WHERE A.YEAROFBIRTH = B.YEAROFBIRTH
AND A.GENDER = B.GENDER
AND A.PATIENTID NOT IN (SELECT PATIENTID
FROM DIAGNOSES);
My query shows me all patients who have a match in the diagnoses group. That means that patients from the diagnoses group with, for example, a more common year of birth are overrepresented. That's why I want only 3 samples for each patient from my diagnoses group. I hope you get an idea of what I'm talking about.
Thanks so much
You could solve the problem by using weights in your analysis. However, that is not your question. Here is a way to get three randomly selected rows:
select *
from (select d.PATIENTID as TARGET_PATIENTID, p.PATIENTID as MATCHED_PATIENTID,
ROW_NUMBER() OVER (PARTITION BY d.PATIENTID ORDER BY dbms_random.value) as seqnum
from ALLPATIENTS p join
DIAGNOSES d
on p.YEAROFBIRTH = d.YEAROFBIRTH AND
p.GENDER = d.GENDER
where p.PATIENTID NOT IN (select d2.PATIENTID from DIAGNOSES d2)
) dp
where seqnum <= 3;
This enumerates all the matching rows and then randomly chooses three. Note: this is with replacement, so a patient can appear in more than one cohort. Without replacement is more challenging, but possible.
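For anyone who wants to try the pattern without an Oracle instance, here is a rough equivalent using Python's sqlite3 module. It swaps dbms_random.value for SQLite's random() function, and the patient rows are invented for illustration. SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE allpatients (patientid INTEGER PRIMARY KEY, yearofbirth INTEGER, gender TEXT);
CREATE TABLE diagnoses (patientid INTEGER PRIMARY KEY, yearofbirth INTEGER, gender TEXT);
""")

# Invented data: two target patients, ten possible controls for the first
# and only two for the second.
targets = [(1, 1980, "F"), (2, 1975, "M")]
controls = [(100 + i, 1980, "F") for i in range(10)] + [(200, 1975, "M"), (201, 1975, "M")]
conn.executemany("INSERT INTO diagnoses VALUES (?, ?, ?)", targets)
conn.executemany("INSERT INTO allpatients VALUES (?, ?, ?)", targets + controls)

rows = conn.execute("""
SELECT target_id, matched_id
FROM (
    SELECT d.patientid AS target_id, p.patientid AS matched_id,
           ROW_NUMBER() OVER (PARTITION BY d.patientid ORDER BY random()) AS seqnum
    FROM allpatients p
    JOIN diagnoses d
      ON p.yearofbirth = d.yearofbirth AND p.gender = d.gender
    WHERE p.patientid NOT IN (SELECT d2.patientid FROM diagnoses d2)
) dp
WHERE seqnum <= 3
ORDER BY target_id, matched_id
""").fetchall()

# Number of matched controls per target patient.
counts = dict(Counter(target for target, _ in rows))
print(rows)
print(counts)
```

A target with fewer than three eligible controls simply gets all of them, as target 2 does here, so it may be worth checking the per-target counts before the analysis.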

Joining two most recent events from two tables

I'm trying to build a report in SQL that shows when a patient last received a particular lab service and the facility at which they received that service. Unfortunately, the lab procedure and facility are in different tables. Here is what I have now (apologies in advance for my weird aliasing; it makes better sense with the actual table names):
;with temp as (Select distinct flow.pid, flow.labdate as obsdate, flow.labvalue as obsvalue
From labstable as flow
Where flow.name = 'lab name'
)
Select distinct p.patientid, MAX(temp.obsdate) [Last Reading], COUNT(temp.obsdate) [Number of Readings],
Case
When count(temp.obsdate) > 2 then 'Active'Else 'Inactive' End [Status], facility.NAME [Facility]
From Patientrecord as p
Join temp on temp.pid = p.PId
Join (Select loc.name, MAX(a.apptstart)[Last appt], a.patientid
From Appointmentstable as a
Join Facility as loc on loc.facilityid = a.FacilityId
Where a.ApptStart = (Select MAX(appointments.apptstart) from Appointments where appointments.patinetId = a.patientid)
Group by loc.NAME, a.patientId
) facility on facility.patientId = p.PatientId
Group by p.PatientId, facility.NAME
Having MAX(temp.obsdate) between DATEADD(yyyy, -1, GETDATE()) and GETDATE()
Order by [Last Reading] asc
My problem with this is that if the patient has been to more than one facility within the time frame, the subquery selects each facility into the join, inflating the results by approximately 4000. I need to find a way to select ONLY the VERY MOST RECENT facility from the appointments list, then join it back to the lab. Labs do not have a visitID (that would make too much sense). I'm fairly confident that I'm missing something in either my subquery select or the corresponding join, but after four days I think I need professional help.
Suggestions are much appreciated and please let me know where I can clarify. Thank you in advance!
Change your subquery with alias "facility" to something like this:
...
join (
select patientid, loc_name, last_appt
from (
select patientid, loc_name=loc.name, last_appt=apptstart,
seqnum=row_number() over (partition by patientid order by apptstart desc)
from AppointmentsTable a
inner join Facility loc on loc.facilityid = a.facilityid
) x
where seqnum = 1
) facility
on ...
...
The key difference is the use of the row_number() windowing function. The "partition by" and "order by" clauses guarantee you'll get one set of row numbers per patient and the row with the most recent date will be assigned row number 1. The filter of "seqnum = 1" makes sure you get only the one row you want for each patient.