Specified number of equal rows in oracle - sql

I'm working on a medical database in Oracle where I'm trying to find randomly matched samples. I have created a table with all patients and, after that, a table with the patients who have the illness I am looking for. Now I am wondering if it is possible to match exactly 3 randomly chosen patients to each patient in my target group (so each patient from the target group gets 3 randomly chosen patients from the table with all patients), matched on gender and year of birth.
SELECT A.PATIENTID
FROM ALLPATIENTS A,
DIAGNOSES B
WHERE A.YEAROFBIRTH = B.YEAROFBIRTH
AND A.GENDER = B.GENDER
AND A.PATIENTID NOT IN (SELECT PATIENTID
FROM DIAGNOSES);
My query shows me all patients that have a match in the diagnoses group. That means patients from the diagnoses group with, for example, a more common year of birth are overrepresented. That's why I want only 3 samples for each patient from my diagnoses group. I hope you get an idea of what I'm talking about.
Thanks so much

You could solve the problem by using weights in your analysis. However, that is not your question. Here is a way to get three randomly selected rows:
select dp.*
from (select d.PATIENTID as DIAGNOSIS_PATIENTID, p.PATIENTID as MATCHED_PATIENTID,
ROW_NUMBER() OVER (PARTITION BY d.PATIENTID ORDER BY dbms_random.value) as seqnum
from ALLPATIENTS p join
DIAGNOSES d
on p.YEAROFBIRTH = d.YEAROFBIRTH AND
p.GENDER = d.GENDER
where p.PATIENTID NOT IN (select d2.PATIENTID from DIAGNOSES d2)
) dp
where seqnum <= 3;
This enumerates all the matching rows and then randomly chooses three. Note: this is with replacement, so a patient can appear in more than one cohort. Without replacement is more challenging, but possible.
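To see the idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The table contents are made up, SQLite's random() stands in for dbms_random.value, and window functions need SQLite 3.25 or later, so treat it as an illustration of the technique and not as the Oracle query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE allpatients (patientid INTEGER PRIMARY KEY, gender TEXT, yearofbirth INTEGER);
CREATE TABLE diagnoses   (patientid INTEGER PRIMARY KEY, gender TEXT, yearofbirth INTEGER);
""")

# Hypothetical data: patients 1 and 2 are the diagnosed cases.
patients = [(1, "F", 1970), (2, "M", 1980)]
patients += [(i, "F", 1970) for i in range(3, 9)]    # six possible controls for case 1
patients += [(i, "M", 1980) for i in range(9, 11)]   # only two possible controls for case 2
patients += [(11, "M", 1970), (12, "F", 1980)]       # these match neither case
conn.executemany("INSERT INTO allpatients VALUES (?, ?, ?)", patients)
conn.executemany("INSERT INTO diagnoses VALUES (?, ?, ?)", patients[:2])

query = """
SELECT case_id, control_id
FROM (SELECT d.patientid AS case_id,
             p.patientid AS control_id,
             -- number the candidates of each case in random order
             ROW_NUMBER() OVER (PARTITION BY d.patientid ORDER BY random()) AS seqnum
      FROM allpatients p
      JOIN diagnoses d
        ON p.yearofbirth = d.yearofbirth AND p.gender = d.gender
      WHERE p.patientid NOT IN (SELECT patientid FROM diagnoses)) dp
WHERE seqnum <= 3
ORDER BY case_id, control_id
"""
matches = conn.execute(query).fetchall()
for case_id, control_id in matches:
    print(case_id, control_id)
```

A case with fewer than three candidates (case 2 here) simply gets all of them, so it is worth checking the per-case counts before analysing the sample.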


Microsoft Access SQL - Counting Number of foreign key records across 3 related tables

I am experienced at VBA but new to SQL.
I am developing a test sheet program in MS Access for the plant that I work at. This test sheet program will be used across 3 product lines.
When an order is created, it can contain up to all 3 products. The products are unique enough that I cannot put them all into a single table. So I have broken the test sheets up into 3 tables, each table representing its respective product test sheet. Please see the image below for my relationship setup.
What I am trying to do:
I am trying to design a query that will be my master list (outputting to a continuous form). The master list will show all orders, and also show how many units have been tested in each order. See below for my desired output.
My Issue:
It is not properly counting the number of related records. See the linked photo.
I know my key field is Order Number but I am searching by Job name. Originally my key field was job name, but I then switched it to order number.
Thank you for your time. I am happy to provide more information if needed.
Consider joining three aggregate saved queries:
QryCassetteTestSheet
SELECT OrderNumber
, COUNT(*) AS CountofCassetteTests
FROM TblCassetteTestSheet
GROUP BY OrderNumber;
QrySentinelTestSheet
SELECT OrderNumber
, COUNT(*) AS CountofSentinelTests
FROM TblSentinelTestSheet
GROUP BY OrderNumber;
QryHUVTestSheet
SELECT OrderNumber
, COUNT(*) AS CountofHUVTests
FROM TblHUVTestSheet
GROUP BY OrderNumber;
Then join to TblJobName in your final query (parentheses are required):
SELECT j.OrderNumber, j.JobName
, c.CountofCassetteTests AS [# Of Cassettes Tested]
, s.CountofSentinelTests AS [# Of Sentinels Tested]
, h.CountofHUVTests AS [# Of HUV Tested]
, j.JobEndDate, j.SPONumber
FROM ((TblJobName j
LEFT JOIN QryCassetteTestSheet c
ON j.OrderNumber = c.OrderNumber)
LEFT JOIN QrySentinelTestSheet s
ON j.OrderNumber = s.OrderNumber)
LEFT JOIN QryHUVTestSheet h
ON j.OrderNumber = h.OrderNumber
Conceivably you can run all in one query using subqueries (and maybe one day even Common Table Expressions, CTEs, if the MS Access team ever enhances its older SQL dialect):
SELECT j.OrderNumber, j.JobName
, c.CountofCassetteTests AS [# Of Cassettes Tested]
, s.CountofSentinelTests AS [# Of Sentinels Tested]
, h.CountofHUVTests AS [# Of HUV Tested]
, j.JobEndDate, j.SPONumber
FROM ((TblJobName j
LEFT JOIN
(SELECT OrderNumber, COUNT(*) AS CountofCassetteTests
FROM TblCassetteTestSheet
GROUP BY OrderNumber) c
ON j.OrderNumber = c.OrderNumber)
LEFT JOIN
(SELECT OrderNumber, COUNT(*) AS CountofSentinelTests
FROM TblSentinelTestSheet
GROUP BY OrderNumber) s
ON j.OrderNumber = s.OrderNumber)
LEFT JOIN
(SELECT OrderNumber, COUNT(*) AS CountofHUVTests
FROM TblHUVTestSheet
GROUP BY OrderNumber) h
ON j.OrderNumber = h.OrderNumber
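Why aggregate before joining? One common cause of wrong counts (possibly, though not certainly, what happened in the question) is joining the raw child tables directly, which pairs every row of one test sheet with every row of the other. The sketch below reproduces that with Python's built-in sqlite3 module instead of Access; the two-table data set is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TblJobName (OrderNumber INTEGER PRIMARY KEY, JobName TEXT);
CREATE TABLE TblCassetteTestSheet (OrderNumber INTEGER);
CREATE TABLE TblSentinelTestSheet (OrderNumber INTEGER);
INSERT INTO TblJobName VALUES (100, 'Job A'), (200, 'Job B');
INSERT INTO TblCassetteTestSheet VALUES (100), (100), (100);   -- 3 cassette tests
INSERT INTO TblSentinelTestSheet VALUES (100), (100);          -- 2 sentinel tests
""")

# Joining the raw tables pairs each cassette row with each sentinel row (3 x 2 = 6).
naive = conn.execute("""
SELECT j.OrderNumber, COUNT(c.OrderNumber), COUNT(s.OrderNumber)
FROM TblJobName j
LEFT JOIN TblCassetteTestSheet c ON j.OrderNumber = c.OrderNumber
LEFT JOIN TblSentinelTestSheet s ON j.OrderNumber = s.OrderNumber
GROUP BY j.OrderNumber
ORDER BY j.OrderNumber
""").fetchall()

# Reducing each table to one row per order first keeps the counts independent.
pre_aggregated = conn.execute("""
SELECT j.OrderNumber, c.n, s.n
FROM TblJobName j
LEFT JOIN (SELECT OrderNumber, COUNT(*) AS n
           FROM TblCassetteTestSheet GROUP BY OrderNumber) c
       ON j.OrderNumber = c.OrderNumber
LEFT JOIN (SELECT OrderNumber, COUNT(*) AS n
           FROM TblSentinelTestSheet GROUP BY OrderNumber) s
       ON j.OrderNumber = s.OrderNumber
ORDER BY j.OrderNumber
""").fetchall()

print(naive)           # [(100, 6, 6), (200, 0, 0)]
print(pre_aggregated)  # [(100, 3, 2), (200, None, None)]
```

Orders with no tests come back as NULL (None in Python) from the pre-aggregated joins, so the master list may want to display those as 0.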
I have created 3 queries. Each query counts the number of tests for that product that are associated with an order number. Below you will see my code for
QryCountofTestedHUV
SELECT TblJobName.OrderNumber, TblJobName.JobName,
(
SELECT
Count(*)
FROM
TblHUVTestSheet
Where TblHUVTestSheet.OrderNumber=TblJobName.OrderNumber
) AS CountofHUVTests
FROM TblJobName LEFT JOIN TblHUVTestSheet ON TblJobName.OrderNumber = TblHUVTestSheet.OrderNumber
GROUP BY TblJobName.OrderNumber, TblJobName.JobName;
From there I have gone into the relationships view, and linked QryCountofTestedHUV.OrderNumber to TblJobName.OrderNumber. Now I can add CountofHUVTests from query QryCountofTestedHUV as a field to my query, QryAllJobs! QryAllJobs is my master list I showed earlier.
I have repeated this 3 times for all 3 product lines and it works!

Pick up only the most recent object associated with a person

I'm having trouble with an exam question. I didn't get it correct on the exam, but I want to know what I am missing.
The Question:
List the first diagnosis for each patient showing the patient's name,
diagnosis code and diagnosis date. If the patient has two or more
diagnoses on the earliest date, it's okay to just show one of those
diagnoses. Tables needed: encounters, patients, encounter_diagnoses,
diagnoses. Your result set should have four rows.
Here is what I had:
select p.patient_nm, max(e.start_dts)
from edw_emr_ods.patients p
join edw_emr_ods.encounters e
on p.patient_id = e.patient_id
join edw_emr_ods.encounter_diagnoses ed
on e.encounter_id = ed.encounter_id
left join edw_emr_ods.diagnoses d
on ed.encounter_diagnoses_id = d.diagnosis_id
group by p.patient_nm
order by p.patient_nm asc
As you can see I did not include the Diagnosis code. (more on that later) This returned 4 rows:
When I attempt to add the Diagnosis code I get
"Column 'edw_emr_ods.diagnoses.code' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
The only way I could figure out to remove this is to add code in the group by, since it is unable to sort the code and name together without it. But this returns too many rows per patient.
So my question is this "How do I only pick up the name, date, and code of the most recent diagnosis?"
Why not use row_number() over (...)? It answers your question, "How do I only pick up the name, date, and code of the most recent diagnosis?" Partition by patient only, so that each patient gets a single row numbered 1 (for the exam's first diagnosis, order by e.start_dts asc instead of desc):
SELECT * FROM (
select p.patient_nm, d.code, e.start_dts, row_number() over(partition by p.patient_id order by e.start_dts desc) as rn
from edw_emr_ods.patients p
join edw_emr_ods.encounters e
on p.patient_id = e.patient_id
join edw_emr_ods.encounter_diagnoses ed
on e.encounter_id = ed.encounter_id
left join edw_emr_ods.diagnoses d
on ed.encounter_diagnoses_id = d.diagnosis_id ) AS T
WHERE rn = 1
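For anyone who wants to try the pattern without the exam database, here is a small sketch using Python's built-in sqlite3 module. The tables mirror the names in the question, but the data is invented, and the encounter_diagnoses.diagnosis_id link column is my assumption about the schema. It orders ascending, so it returns the first diagnosis per patient, as the exam question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, patient_nm TEXT);
CREATE TABLE encounters (encounter_id INTEGER PRIMARY KEY, patient_id INTEGER, start_dts TEXT);
CREATE TABLE diagnoses (diagnosis_id INTEGER PRIMARY KEY, code TEXT);
-- diagnosis_id as the link column is an assumption made for this sketch
CREATE TABLE encounter_diagnoses (encounter_id INTEGER, diagnosis_id INTEGER);

INSERT INTO patients VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO encounters VALUES (10, 1, '2020-01-05'), (11, 1, '2021-03-01'), (12, 2, '2019-07-10');
INSERT INTO diagnoses VALUES (1, 'A01'), (2, 'B02'), (3, 'C03');
INSERT INTO encounter_diagnoses VALUES (10, 1), (11, 2), (12, 3);
""")

query = """
SELECT patient_nm, code, start_dts
FROM (SELECT p.patient_nm, d.code, e.start_dts,
             -- one numbering per patient, earliest encounter first
             ROW_NUMBER() OVER (PARTITION BY p.patient_id ORDER BY e.start_dts ASC) AS rn
      FROM patients p
      JOIN encounters e ON p.patient_id = e.patient_id
      JOIN encounter_diagnoses ed ON e.encounter_id = ed.encounter_id
      JOIN diagnoses d ON ed.diagnosis_id = d.diagnosis_id) t
WHERE rn = 1
ORDER BY patient_nm
"""
first_diagnoses = conn.execute(query).fetchall()
print(first_diagnoses)  # [('Ann', 'A01', '2020-01-05'), ('Bob', 'C03', '2019-07-10')]
```

Because the code column travels inside the numbered subquery, there is no GROUP BY and therefore no "not contained in either an aggregate function or the GROUP BY clause" error.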

issue formatting a sql query to only display unique values of 4 tables each containing a different number of results

I have 4 tables: employees, skills, interests, and goals. I'm trying to display all skills, interests, and goals for one employee, but there are a different number of values within each of the skills, interests, and goals tables. So, for example, employee 1 has 3 interests, 5 skills, and 4 goals. What I'm trying to do is display 5 rows and 5 columns. Column 1 would have the name of employee 1 listed 5 times. Column 2 would have the skills of the employee. Column 3 would have the interests of the employee with 2 nulls. Column 4 would have the goals of the employee with one null. Like below
I have tried a number of different joins but I keep getting all possible combinations as the output.
Any help on this would be greatly appreciated.
Okay, this is an ugly solution, but it works for me, at least in SQL Server:
WITH employees_temp as (
SELECT employees.first_name, row_number() over (ORDER BY id) as RowNum
FROM employees
WHERE employees.first_name LIKE 'will'
),
skills_temp as (
SELECT skills.skill, row_number() over (ORDER BY skills.employee_id) as RowNum
FROM skills
INNER JOIN employees
ON skills.employee_id = employees.id
WHERE employees.first_name LIKE 'will'
),
goals_temp as (
SELECT goals.goal, row_number() over (ORDER BY goals.employee_id) as RowNum
FROM goals
INNER JOIN employees
ON goals.employee_id = employees.id
WHERE employees.first_name LIKE 'will'
),
interests_temp as (
SELECT interests.interest, row_number() over (ORDER BY interests.employee_id) as RowNum
FROM interests
INNER JOIN employees
ON interests.employee_id = employees.id
WHERE employees.first_name LIKE 'will'
)
select employees_temp.first_name, skills_temp.skill, goals_temp.goal, interests_temp.interest
from employees_temp
full outer join skills_temp
on employees_temp.RowNum = skills_temp.RowNum
full outer join goals_temp
on coalesce(employees_temp.RowNum, skills_temp.RowNum) = goals_temp.RowNum
full outer join interests_temp
on coalesce(employees_temp.RowNum, skills_temp.RowNum, goals_temp.RowNum) = interests_temp.RowNum
What this is doing is selecting the data you need for each of the four queries, and adding in a row number. Then, we join all four of those together, joining on the row number. We need a row number to join on, or else it will, as you saw, create every possible combination. The row number functions as a sort of dummy ID for us to join on in order to prevent that.
A couple caveats:
Your database needs to support FULL OUTER JOIN in order for this to work.
As a matter of principle, I would filter based on employee ID when possible, rather than pattern matching the first name. This would also simplify the query by removing the need for multiple joins in the first half.
If you do so, and it's feasible in the situation you're working in, you're probably best off declaring a variable at the very beginning of the query to hold the employee ID and then using that variable throughout, so you don't have to update it in multiple places.
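If FULL OUTER JOIN is not available, the same row-number trick can be done with LEFT JOINs against a "spine" of row numbers. That is a swapped-in variant, not the query above. The sketch below uses Python's built-in sqlite3 module with invented data, filters by employee ID as suggested in the caveats, and cross joins the employee so the name repeats on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE skills (employee_id INTEGER, skill TEXT);
CREATE TABLE interests (employee_id INTEGER, interest TEXT);
CREATE TABLE goals (employee_id INTEGER, goal TEXT);
INSERT INTO employees VALUES (1, 'will');
INSERT INTO skills VALUES (1, 'sql'), (1, 'python'), (1, 'excel');
INSERT INTO interests VALUES (1, 'chess');
INSERT INTO goals VALUES (1, 'promotion'), (1, 'certification');
""")

query = """
WITH s AS (SELECT skill, ROW_NUMBER() OVER (ORDER BY skill) AS rn
           FROM skills WHERE employee_id = :emp),
     i AS (SELECT interest, ROW_NUMBER() OVER (ORDER BY interest) AS rn
           FROM interests WHERE employee_id = :emp),
     g AS (SELECT goal, ROW_NUMBER() OVER (ORDER BY goal) AS rn
           FROM goals WHERE employee_id = :emp),
     -- the spine: every row number that occurs in any of the three lists
     nums AS (SELECT rn FROM s UNION SELECT rn FROM i UNION SELECT rn FROM g)
SELECT e.first_name, s.skill, i.interest, g.goal
FROM nums
JOIN employees e ON e.id = :emp
LEFT JOIN s ON s.rn = nums.rn
LEFT JOIN i ON i.rn = nums.rn
LEFT JOIN g ON g.rn = nums.rn
ORDER BY nums.rn
"""
rows = conn.execute(query, {"emp": 1}).fetchall()
for row in rows:
    print(row)
# ('will', 'excel', 'chess', 'certification')
# ('will', 'python', None, 'promotion')
# ('will', 'sql', None, None)
```

The number of output rows equals the length of the longest list, and the shorter lists are padded with NULLs.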

How do I join with the latest update?

I have two tables:
delivery with columns uid, dtime, candy which records which candy was given to which user when
lookup with columns uid and ltime which records when the user's pocket was examined
I need to know the result of the lookup, i.e., the result table should have columns uid, ltime, candy, telling me what was found in the user's pocket (assume the user eats the old candy when given the new one).
There were several deliveries before each lookup.
I need only the latest one.
E.g.,
select l.uid, l.ltime,
d.candy /* ... for max(d.dtime):
IOW, I want to sort (d.dtime, d.candy)
by the first field in decreasing order,
then take the second field in the first element */
from delivery d
join lookup l
on d.uid = l.uid
and d.dtime <= l.ltime
group by l.uid, l.ltime
So, how do I know what was found by the lookup?
Use Top 1 with Ties to keep, for each lookup, only the latest delivery made at or before it:
Select Top 1 with Ties l.uid, l.ltime, d.candy
From lookup l
Inner Join delivery d
on d.uid = l.uid and d.dtime <= l.ltime
Order by row_number() over (partition by l.uid, l.ltime order by d.dtime desc)
I would suggest outer apply:
select l.*, d.candy
from lookup l outer apply
(select top 1 d.*
from delivery d
where d.uid = l.uid and d.dtime <= l.ltime
order by d.dtime desc
) d;
That answers your question. But, wouldn't the user have all the candies since the last lookup? Or, are we assuming that the user eats the candy on hand when the user is given another? Perhaps the pocket only holds one candy.
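Here is a small runnable sketch of the "latest delivery at or before each lookup" logic, using Python's built-in sqlite3 module. SQLite has no OUTER APPLY, so this swaps in a LEFT JOIN plus row_number() to get the same result; the integer timestamps and candy names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE delivery (uid INTEGER, dtime INTEGER, candy TEXT);
CREATE TABLE lookup (uid INTEGER, ltime INTEGER);
INSERT INTO delivery VALUES (1, 10, 'mint'), (1, 20, 'toffee'), (1, 40, 'fudge'), (2, 15, 'gum');
INSERT INTO lookup VALUES (1, 25), (1, 50), (2, 5), (2, 30);
""")

query = """
SELECT uid, ltime, candy
FROM (SELECT l.uid, l.ltime, d.candy,
             -- rank the earlier deliveries of each lookup, newest first
             ROW_NUMBER() OVER (PARTITION BY l.uid, l.ltime ORDER BY d.dtime DESC) AS rn
      FROM lookup l
      LEFT JOIN delivery d ON d.uid = l.uid AND d.dtime <= l.ltime) ranked
WHERE rn = 1
ORDER BY uid, ltime
"""
found = conn.execute(query).fetchall()
print(found)  # [(1, 25, 'toffee'), (1, 50, 'fudge'), (2, 5, None), (2, 30, 'gum')]
```

Like outer apply, the LEFT JOIN keeps lookups that happened before any delivery (user 2 at time 5) and reports NULL for the candy.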

Joining two most recent events from two tables

I'm trying to build a report in SQL that shows when a patient last received a particular lab service and the facility at which they received that service. Unfortunately, the lab procedure and facility are in different tables. Here is what I have now (apologies in advance for my weird aliasing, it makes better sense with the actual table names):
;with temp as (Select distinct flow.pid, flow.labdate as obsdate, flow.labvalue as obsvalue
From labstable as flow
Where flow.name = 'lab name'
)
Select distinct p.patientid, MAX(temp.obsdate) [Last Reading], COUNT(temp.obsdate) [Number of Readings],
Case
When count(temp.obsdate) > 2 then 'Active' Else 'Inactive' End [Status], facility.NAME [Facility]
From Patientrecord as p
Join temp on temp.pid = p.PId
Join (Select loc.name, MAX(a.apptstart)[Last appt], a.patientid
From Appointmentstable as a
Join Facility as loc on loc.facilityid = a.FacilityId
Where a.ApptStart = (Select MAX(appointments.apptstart) from Appointmentstable as appointments where appointments.patientId = a.patientid)
Group by loc.NAME, a.patientId
) facility on facility.patientId = p.PatientId
Group by p.PatientId, facility.NAME
Having MAX(temp.obsdate) between DATEADD(yyyy, -1, GETDATE()) and GETDATE()
Order by [Last Reading] asc
My problem with this is that if the patient has been to more than one facility within the time frame, the subquery selects each facility into the join, inflating the results by approximately 4000 rows. I need to find a way to select ONLY the VERY MOST RECENT facility from the appointments list, then join it back to the lab. Labs do not have a visitID (that would make too much sense). I'm fairly confident that I'm missing something in either my subquery select or the corresponding join, but after four days I think I need professional help.
Suggestions are much appreciated and please let me know where I can clarify. Thank you in advance!
Change your subquery with alias "facility" to something like this:
...
join (
select patientid, loc_name, last_appt
from (
select patientid, loc_name=loc.name, last_appt=apptstart,
seqnum=row_number() over (partition by patientid order by apptstart desc)
from AppointmentsTable a
inner join Facility loc on loc.facilityid = a.facilityid
) x
where seqnum = 1
) facility
on ...
...
The key difference is the use of the row_number() window function. The "partition by" clause restarts the numbering for each patient, and the "order by ... desc" clause assigns row number 1 to the row with the most recent date. The filter "seqnum = 1" then keeps only that one row per patient. Because the subquery now exposes the facility name as loc_name, the outer query has to reference facility.loc_name instead of facility.NAME.
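A minimal runnable sketch of the whole shape, using Python's built-in sqlite3 module with invented data: the labs are aggregated to one row per patient, the appointments are reduced to the single most recent facility per patient, and only then are the two joined. The table names follow the question, but the columns are trimmed down and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Facility (facilityid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE AppointmentsTable (patientid INTEGER, facilityid INTEGER, apptstart TEXT);
CREATE TABLE labstable (pid INTEGER, labdate TEXT);
INSERT INTO Facility VALUES (1, 'North Clinic'), (2, 'South Clinic');
INSERT INTO AppointmentsTable VALUES
    (7, 1, '2023-01-10'), (7, 2, '2023-06-01'), (8, 1, '2023-03-15');
INSERT INTO labstable VALUES (7, '2023-02-01'), (7, '2023-05-20'), (8, '2023-03-16');
""")

query = """
SELECT labs.pid, labs.last_reading, labs.readings, facility.loc_name
FROM (SELECT pid, MAX(labdate) AS last_reading, COUNT(*) AS readings
      FROM labstable
      GROUP BY pid) labs
JOIN (SELECT patientid, loc_name
      FROM (SELECT a.patientid, loc.name AS loc_name,
                   -- newest appointment of each patient gets seqnum 1
                   ROW_NUMBER() OVER (PARTITION BY a.patientid
                                      ORDER BY a.apptstart DESC) AS seqnum
            FROM AppointmentsTable a
            JOIN Facility loc ON loc.facilityid = a.facilityid) x
      WHERE seqnum = 1) facility
  ON facility.patientid = labs.pid
ORDER BY labs.pid
"""
report = conn.execute(query).fetchall()
print(report)
# [(7, '2023-05-20', 2, 'South Clinic'), (8, '2023-03-16', 1, 'North Clinic')]
```

Patient 7 visited two facilities but appears once, with the reading count unaffected by the number of appointments.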