Joining two most recent events from two tables

I'm trying to build a report in SQL that shows when a patient last received a particular lab service and the facility at which they received that service. Unfortunately, the lab procedure and facility are in different tables. Here is what I have now (apologies in advance for my weird aliasing; it makes better sense with the actual table names):
;with temp as (
    Select distinct flow.pid, flow.labdate as obsdate, flow.labvalue as obsvalue
    From labstable as flow
    Where flow.name = 'lab name'
)
Select distinct p.patientid,
       MAX(temp.obsdate) [Last Reading],
       COUNT(temp.obsdate) [Number of Readings],
       Case
           When count(temp.obsdate) > 2 then 'Active'
           Else 'Inactive'
       End [Status],
       facility.NAME [Facility]
From Patientrecord as p
Join temp on temp.pid = p.PId
Join (
    Select loc.name, MAX(a.apptstart) [Last appt], a.patientid
    From Appointmentstable as a
    Join Facility as loc on loc.facilityid = a.FacilityId
    Where a.ApptStart = (Select MAX(appointments.apptstart) from Appointments where appointments.patientId = a.patientid)
    Group by loc.NAME, a.patientId
) facility on facility.patientId = p.PatientId
Group by p.PatientId, facility.NAME
Having MAX(temp.obsdate) between DATEADD(yyyy, -1, GETDATE()) and GETDATE()
Order by [Last Reading] asc
My problem with this is that if the patient has been to more than one facility within the time frame, the subquery selects each facility into the join, inflating the results by approximately 4000. I need to find a way to select ONLY the VERY MOST RECENT facility from the appointments list, then join it back to the lab. Labs do not have a visitID (that would make too much sense). I'm fairly confident that I'm missing something in either my subquery select or the corresponding join, but after four days I think I need professional help.
Suggestions are much appreciated and please let me know where I can clarify. Thank you in advance!

Change your subquery with alias "facility" to something like this:
...
join (
    select patientid, loc_name, last_appt
    from (
        select patientid, loc_name=loc.name, last_appt=apptstart,
               seqnum=row_number() over (partition by patientid order by apptstart desc)
        from AppointmentsTable a
        inner join Facility loc on loc.facilityid = a.facilityid
    ) x
    where seqnum = 1
) facility
on ...
...
The key difference is the use of the row_number() windowing function. The "partition by" and "order by" clauses guarantee you'll get one set of row numbers per patient and the row with the most recent date will be assigned row number 1. The filter of "seqnum = 1" makes sure you get only the one row you want for each patient.
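To see the effect of the seqnum = 1 filter on a small data set, here is a minimal sketch using Python's built-in sqlite3 module (which needs SQLite 3.25 or newer for window functions). The table contents and facility names are made up for illustration, and the T-SQL "alias = expression" syntax is swapped for the portable "expression AS alias" form; only the row_number() pattern itself comes from the answer.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Facility (facilityid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE AppointmentsTable (patientid INTEGER, facilityid INTEGER, apptstart TEXT);
    INSERT INTO Facility VALUES (1, 'North Clinic'), (2, 'South Clinic');
    INSERT INTO AppointmentsTable VALUES
        (100, 1, '2023-01-05'),
        (100, 2, '2023-03-10'),  -- patient 100 visited two facilities
        (200, 1, '2023-02-20');
""")

# Number each patient's appointments from newest to oldest, keep only row 1.
latest_facility = conn.execute("""
    SELECT patientid, loc_name, last_appt
    FROM (
        SELECT a.patientid,
               loc.name    AS loc_name,
               a.apptstart AS last_appt,
               ROW_NUMBER() OVER (
                   PARTITION BY a.patientid
                   ORDER BY a.apptstart DESC
               ) AS seqnum
        FROM AppointmentsTable AS a
        INNER JOIN Facility AS loc ON loc.facilityid = a.facilityid
    ) AS x
    WHERE seqnum = 1
    ORDER BY patientid
""").fetchall()

print(latest_facility)
```

Patient 100 has appointments at two facilities but contributes only one row, so joining this derived table back to the patient and lab data no longer multiplies the results.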

Related

How to avoid duplicates between two tables on using join?

I have two tables work_table and progress_table.
work_table has following columns:
id[primary key],
department,
dept_name,
dept_code,
created_time,
updated_time
progress_table has following columns:
id[primary key],
project_id,
progress,
progress_date
I need only the last updated progress value for each project, but right now I am getting duplicates.
Here is the code I tried:
select
row_number() over (order by a.dept_code asc) AS sno,
a.dept_name,
b.project_id,
p.physical_progress,
DATE(b.updated_time) as updated_date,
b.created_time
from
masters.dept_users as a,
work_table as b
LEFT JOIN
progress as p on b.id = p.project_id
order by
a.dept_name asc
It shows duplicate progress values for the same id. How can I resolve this? (The progress values are integers that are entered through a form.)

Having reformatted your query, some things become clear...
You've mixed , and JOIN syntax (why!?)
You start with the masters.dept_users table, but don't mention it in your description
You have no join predicate between dept_users and work_table
You calculate an sno, but have no partition by and never use it
Your query includes columns not mentioned in the table descriptions above
And to top it off, you use meaningless aliases like a and b? Please, for the love of others and of your future self (who will try to read this one day), make the aliases meaningful in some way.
You possibly want something like...
WITH
  sorted_progress AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY project_id
          ORDER BY progress_date DESC   -- This may need to be updated_time, your question is very unclear
    )
      AS seq_num
  FROM
    progress
)
SELECT
  <whatever>
FROM
  masters.dept_users   AS u
INNER JOIN
  work_table           AS w
    ON w.user_id = u.id   -- This is a GUESS, but you need to do SOMETHING here
LEFT JOIN
  sorted_progress      AS p
    ON  p.project_id = w.id   -- Even this looks suspect, are you SURE that w.id is the project_id?
    AND p.seq_num    = 1
That at least shows how to get that latest progress record (p.seq_num = 1), but whether the other joins are correct is something you'll have to double (and triple) check for yourself.
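One detail worth seeing in action is that the seq_num = 1 filter sits in the ON clause of the LEFT JOIN, not in a WHERE clause, so projects with no progress rows at all are still returned. Below is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ for window functions). The column lists and sample rows are assumptions for illustration, and the dept_users join is left out because its key is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables in the question.
    CREATE TABLE work_table (id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE progress (
        id INTEGER PRIMARY KEY,
        project_id INTEGER,
        progress INTEGER,
        progress_date TEXT
    );
    INSERT INTO work_table VALUES (1, 'Roads'), (2, 'Water'), (3, 'Parks');
    INSERT INTO progress (project_id, progress, progress_date) VALUES
        (1, 10, '2024-01-01'),
        (1, 40, '2024-02-01'),  -- latest entry for project 1
        (2, 75, '2024-01-15');  -- project 3 has no progress rows
""")

rows = conn.execute("""
    WITH sorted_progress AS (
        SELECT project_id,
               progress,
               ROW_NUMBER() OVER (
                   PARTITION BY project_id
                   ORDER BY progress_date DESC
               ) AS seq_num
        FROM progress
    )
    SELECT w.id, w.dept_name, p.progress
    FROM work_table AS w
    LEFT JOIN sorted_progress AS p
           ON p.project_id = w.id
          AND p.seq_num = 1      -- filter inside ON keeps unmatched work rows
    ORDER BY w.id
""").fetchall()

print(rows)
```

Moving p.seq_num = 1 into a WHERE clause would drop project 3, because its p.seq_num is NULL after the outer join.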

Pick up only the most recent object associated with a person

I'm having trouble with an exam question. I didn't get it correct on the exam, but I want to know what I am missing.
The Question:
List the first diagnosis for each patient showing the patient's name,
diagnosis code and diagnosis date. If the patient has two or more
diagnoses on the earliest date, it's okay to just show one of those
diagnoses. Tables needed: encounters, patients, encounter_diagnoses,
diagnoses. Your result set should have four rows.
Here is what I had:
select p.patient_nm, max(e.start_dts)
from edw_emr_ods.patients p
join edw_emr_ods.encounters e
on p.patient_id = e.patient_id
join edw_emr_ods.encounter_diagnoses ed
on e.encounter_id = ed.encounter_id
left join edw_emr_ods.diagnoses d
on ed.encounter_diagnoses_id = d.diagnosis_id
group by p.patient_nm
order by p.patient_nm asc
As you can see I did not include the Diagnosis code. (more on that later) This returned 4 rows:
When I attempt to add the Diagnosis code I get
"Column 'edw_emr_ods.diagnoses.code' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
The only way I could figure out to remove this is to add code in the group by, since it is unable to sort the code and name together without it. But this returns too many rows per patient.
So my question is this "How do I only pick up the name, date, and code of the most recent diagnosis?"

Why not use row_number() over (...) to answer your question, "How do I only pick up the name, date, and code of the most recent diagnosis?":
SELECT * FROM (
    select p.patient_nm, d.code, e.start_dts,
           -- one numbering per patient, newest first (use asc for the earliest diagnosis)
           row_number() over (partition by p.patient_id order by e.start_dts desc) as rn
    from edw_emr_ods.patients p
    join edw_emr_ods.encounters e
        on p.patient_id = e.patient_id
    join edw_emr_ods.encounter_diagnoses ed
        on e.encounter_id = ed.encounter_id
    left join edw_emr_ods.diagnoses d
        on ed.encounter_diagnoses_id = d.diagnosis_id
) AS T
WHERE rn = 1

Bigquery SQL code to pull earliest contact

I have a copy of our Salesforce data in BigQuery, and I'm trying to join the contact table together with the account table.
I want to return every account in the dataset, but I only want the contact that was created first for each account.
I've gone around and around in circles today googling and trying to cobble a query together, but all roads lead to either no accounts, a single account, or loads of contacts per account (ignoring the earliest requirement).
Here's the latest query, which produces no results. I think I'm nearly there but still struggling. Any help would be most appreciated.
SELECT distinct
c.accountid as Acct_id
,a.id as a_Acct_ID
,c.id as Cont_ID
,a.id AS a_CONT_ID
,c.email
,c.createddate
FROM `sfdcaccounttable` a
INNER JOIN `sfdccontacttable` c
ON c.accountid = a.id
INNER JOIN
(SELECT a2.id, c2.accountid, c2.createddate AS MINCREATEDDATE
FROM `sfdccontacttable` c2
INNER JOIN `sfdcaccounttable` a2 ON a2.id = c2.accountid
GROUP BY 1,2,3
ORDER BY c2.createddate asc LIMIT 1) c3
ON c.id = c3.id
ORDER BY a.id asc
LIMIT 10

The solution shared above is very BigQuery specific: it does have some quirks you need to work around like the memory error you got.
I once answered a similar question here that is more portable and easier to maintain.
Essentially, you need to create a smaller table (even better, make it a view) with the ID and its first transaction. It's similar to what you shared but slightly different, as you need to group ONLY in the topmost query.
It looks something like this
select
    # contact ids that are first time contacts
    b.id as cont_id,
    b.accountid
from `sfdccontacttable` as b
inner join (
    select accountid,
           min(createddate) as first_tx_time
    from `sfdccontacttable`
    group by 1
) as a on (a.accountid = b.accountid and b.createddate = a.first_tx_time)
group by 1, 2
You need to do it this way because otherwise you can end up with multiple IDs per account (if there are any other dimensions associated with it). This approach is also fairly future-proof: you can add more dimensions to the underlying tables without affecting the result, and you can use a where clause in the inner query to define a "valid" contact, and so on. You can then save it as a view and simply reference it in any subquery or join operation.
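For a quick check of this MIN-then-join pattern outside BigQuery, here is a minimal sketch with Python's sqlite3 module. The contact rows are invented for illustration, the positional "group by 1" is spelled out with column names, and the table is reduced to the three columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample contacts; only the columns used by the query.
    CREATE TABLE sfdccontacttable (id TEXT PRIMARY KEY, accountid TEXT, createddate TEXT);
    INSERT INTO sfdccontacttable VALUES
        ('c1', 'A', '2020-01-01'),
        ('c2', 'A', '2020-06-01'),
        ('c3', 'B', '2021-03-03');
""")

# Inner query: earliest createddate per account.
# Outer query: the contact row(s) that carry that date.
first_contacts = conn.execute("""
    SELECT b.id AS cont_id, b.accountid
    FROM sfdccontacttable AS b
    INNER JOIN (
        SELECT accountid, MIN(createddate) AS first_tx_time
        FROM sfdccontacttable
        GROUP BY accountid
    ) AS a
      ON a.accountid = b.accountid
     AND b.createddate = a.first_tx_time
    GROUP BY b.id, b.accountid
    ORDER BY b.accountid
""").fetchall()

print(first_contacts)
```

Note that if two contacts of the same account share the exact same createddate, both match the join and both are returned, so a tie-breaker is still needed when exactly one row per account is required.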

Set up a view/subquery for client_first or client_last as:
SELECT * except(_rank) from (
select rank() over (partition by accountid order by createddate ASC) as _rank,
*
FROM `prj.dataset.sfdccontacttable`
) where _rank=1
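Basically, it uses a window function to rank the rows and returns the first one: with ASC that is the first client entry, with DESC the last. Note that rank() returns every row tied for first place; use row_number() instead if you need exactly one row per account.
You can do the same for accounts as well; then the join between the two is simple, because there will be exactly one record for each entity (provided ties are handled as noted).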
UPDATE
You could also try using ARRAY_AGG, which has a smaller memory footprint.
#standardSQL
SELECT e.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.createddate ASC LIMIT 1
)[OFFSET(0)] e
FROM `dataset.sfdccontacttable` t
GROUP BY t.accountid
)

How to group my table for latest date and ID?

I have a table like this:
I need to group this table by the latest date for every ID.
I mean, I want to get the last row for every ID. Here is my query:
SELECT DISTINCT ch.Date,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, [Date] from tblCommonHistory ) ch
ON ch.TableIDentity = rk.ID order by ID
How can I do what I want?
EDIT: This query worked for me:
SELECT DISTINCT ch.dt,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, max([Date]) as dt from tblCommonHistory group by TableIdentity) ch ON ch.TableIDentity = rk.ID order by ID

Just use aggregation:
select TableIdentity, max([date])
from tblCommonHistory
group by TableIdentity;
Your question only mentions one table. Your query has two; I don't understand the discrepancy.

It's strange that you have duplicated TableIdentity in tblCommonHistory, but otherwise you should not be getting multiple dates for the same ID from your query.
Also, the only reason to join the 2 tables seems to be that you need to skip those IDs that are not present in tblrisk (is that what you need to do?).
In that case, I'd suggest
SELECT max(ch.Date) AS [Date],ID FROM dbo.tblrisk AS rk
inner join tblCommonHistory AS ch ON ch.TableIDentity = rk.ID
group by ID order by ID

How do I select the Max in this query? Help for exam

So, I'm going through a lot of exercises for a final SQL exam I have on Thursday, and I came across another query I'm having doubts about.
The tables in the exercise are supposed to be from a hotel DB. You have three tables involved:
STAY            ROOM                  ROOM_TYPE
===========     ============          ============
PK ID_STAY      PK ID_ROOM            PK ID_ROOM_TYPE
   DAYS_QUANT      ID_ROOM_TYPE FK       DESCRIPTION
   DATE            PRICE
   ID_ROOM FK
The query they are asking me to do is "Show all data for the Room that has been rented for the highest amount of days (in total) in 2011, by room type (you have to show ID Room Type and Description)"
This is the way I solved it, I don't know if it's ok:
SELECT RT.ID_ROOM_TYPE, RT.DESCRIPTON, R.*, SUM(S.DAYS_QUANT)
FROM STAY S, ROOM R, ROOM_TYPE RT
WHERE YEAR(S.DATE) = '2011'
GROUP BY RT.ID_ROOM_TYPE, RT.DESCRIPTON, R.*
ORDER BY SUM(S.DAYS_QUANT) DESC
LIMIT 1
So, the first thing I'm not sure about, is that R.* I included. Can I put it like that in a SELECT? Can it also be included like that in a GROUP BY?
The other thing I'm not sure about is whether I will be allowed to use LIMIT or SELECT TOP 1 statements in the exam. Can anyone think of a way to solve this without using those? Like with a MAX() statement or something?

I believe that you are not allowed to use CTEs, so I expanded the last part of Steve Kass's answer. You can get the desired results without TOP or LIMIT clauses by comparing the total days a room was occupied with the maximum total number of days any room of the same type was occupied. To do so, you first sum the days by room and then, using an identical derived table, get the maximum of days per room type. Joining the two by room type and days isolates the most used rooms. Then you join the starting tables to show all the data. Unlike TOP or LIMIT, this will produce more than one record per room type in case of a tie.
P.S. this is NOT tested. I believe it will work, but there might be a typo.
select r.*, rt.*, roomDays.TotalDays
from Room r
inner join Room_type rt
    on r.id_room_type = rt.id_room_type
inner join
    -- total days per room in 2011 (columns qualified to avoid ambiguity)
    (select Room.id_room, Room.id_room_type, sum(days_quant) TotalDays
     from Stay
     inner join Room
         on Stay.id_room = Room.id_room
     where year(Date) = 2011
     group by Room.id_room, Room.id_room_type) roomDays
    on r.id_room = roomDays.id_room
inner join
    -- maximum of those totals per room type
    (select id_room_type, max(TotalDays) TotalDays
     from
        (select Room.id_room, Room.id_room_type, sum(days_quant) TotalDays
         from Stay
         inner join Room
             on Stay.id_room = Room.id_room
         where year(Date) = 2011
         group by Room.id_room, Room.id_room_type) roomDaysHelper
     group by id_room_type) roomTypeDays
    on r.id_room_type = roomTypeDays.id_room_type
    and roomDays.TotalDays = roomTypeDays.TotalDays

select r.*, t.*
from room r
join room_type t on t.id_room_type = r.id_room_type
where r.id_room in
    (select
        -- most rented room of each type (inner aliases renamed to avoid shadowing)
        (select r2.id_room
         from room r2
         join stay s on s.id_room = r2.id_room
         where year(s.date) = 2011
           and r2.id_room_type = t2.id_room_type
         group by r2.id_room
         order by sum(s.days_quant) desc
         limit 1) room_id
     from room_type t2)

It's always possible to avoid LIMIT 1 or SELECT TOP. One way is to express the top row as the row for which there is no higher row. WHERE NOT EXISTS expresses the idea of "for which there is no."
One way to think of this is as follows: Select those rooms (along with their total days and type information) for which there is no room of the same type with a greater number of total days. That gives you this query (not carefully proofread):
with StayTotals as (
select
STAY.ID_ROOM,
ROOM_TYPE.ID_ROOM_TYPE,
ROOM_TYPE.DESCRIPTION,
SUM(STAY.DAYS_QUANT) AS TotalDays2011
from STAY join ROOM on STAY.ID_ROOM = ROOM.ID_ROOM
join ROOM_TYPE on ROOM.ID_ROOM_TYPE = ROOM_TYPE.ID_ROOM_TYPE
where YEAR(STAY.DATE) = 2011
group by STAY.ID_ROOM, ROOM_TYPE.ID_ROOM_TYPE, ROOM_TYPE.DESCRIPTION
)
select *
from StayTotals as T1
where not exists (
select *
from StayTotals as T2
where T2.ID_ROOM_TYPE = T1.ID_ROOM_TYPE
and T2.TotalDays2011 > T1.TotalDays2011
);
If you can't use CTEs (the WITH clause), you can rewrite it using subqueries, but it's awkward.
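To try the NOT EXISTS version above on a few rows, here is a minimal sketch using Python's sqlite3 module. The sample rooms and stays are invented, the date column is renamed STAY_DATE for the sketch, and YEAR(...) is replaced by SQLite's strftime('%Y', ...) because SQLite has no YEAR function; the CTE and the NOT EXISTS filter follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; STAY_DATE stands in for the DATE column.
    CREATE TABLE ROOM_TYPE (ID_ROOM_TYPE INTEGER PRIMARY KEY, DESCRIPTION TEXT);
    CREATE TABLE ROOM (ID_ROOM INTEGER PRIMARY KEY, ID_ROOM_TYPE INTEGER, PRICE REAL);
    CREATE TABLE STAY (ID_STAY INTEGER PRIMARY KEY, DAYS_QUANT INTEGER,
                       STAY_DATE TEXT, ID_ROOM INTEGER);
    INSERT INTO ROOM_TYPE VALUES (1, 'Single'), (2, 'Double');
    INSERT INTO ROOM VALUES (10, 1, 50.0), (11, 1, 55.0), (20, 2, 80.0), (21, 2, 85.0);
    INSERT INTO STAY VALUES
        (1, 3,  '2011-02-01', 10),
        (2, 2,  '2011-07-15', 10),
        (3, 4,  '2011-03-03', 11),
        (4, 10, '2010-12-01', 11),  -- outside 2011, ignored
        (5, 6,  '2011-05-05', 20),
        (6, 6,  '2011-09-09', 21);  -- ties with room 20
""")

top_rooms = conn.execute("""
    WITH StayTotals AS (
        SELECT STAY.ID_ROOM,
               ROOM_TYPE.ID_ROOM_TYPE,
               ROOM_TYPE.DESCRIPTION,
               SUM(STAY.DAYS_QUANT) AS TotalDays2011
        FROM STAY
        JOIN ROOM ON STAY.ID_ROOM = ROOM.ID_ROOM
        JOIN ROOM_TYPE ON ROOM.ID_ROOM_TYPE = ROOM_TYPE.ID_ROOM_TYPE
        WHERE strftime('%Y', STAY.STAY_DATE) = '2011'
        GROUP BY STAY.ID_ROOM, ROOM_TYPE.ID_ROOM_TYPE, ROOM_TYPE.DESCRIPTION
    )
    SELECT ID_ROOM, ID_ROOM_TYPE, DESCRIPTION, TotalDays2011
    FROM StayTotals AS T1
    WHERE NOT EXISTS (          -- no room of the same type has more days
        SELECT 1
        FROM StayTotals AS T2
        WHERE T2.ID_ROOM_TYPE = T1.ID_ROOM_TYPE
          AND T2.TotalDays2011 > T1.TotalDays2011
    )
    ORDER BY ID_ROOM
""").fetchall()

print(top_rooms)
```

Rooms 20 and 21 tie at 6 days, so both are returned for the Double type, which is the behavior that distinguishes this approach from LIMIT 1 or TOP 1.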
Ranking functions have been part of the SQL standard for quite a while. If you can use them, this may also work:
with StayTotals as (
select
STAY.ID_ROOM,
ROOM_TYPE.ID_ROOM_TYPE,
ROOM_TYPE.DESCRIPTION,
SUM(STAY.DAYS_QUANT) AS TotalDays2011
from STAY join ROOM on STAY.ID_ROOM = ROOM.ID_ROOM
join ROOM_TYPE on ROOM.ID_ROOM_TYPE = ROOM_TYPE.ID_ROOM_TYPE
where YEAR(STAY.DATE) = 2011
group by STAY.ID_ROOM, ROOM_TYPE.ID_ROOM_TYPE, ROOM_TYPE.DESCRIPTION
), StayTotalsRankedByType as (
select
ID_ROOM,
ID_ROOM_TYPE,
DESCRIPTION,
TotalDays2011,
RANK() OVER (
PARTITION BY ID_ROOM_TYPE
ORDER BY TotalDays2011 DESC
) as RankInRoomType
from StayTotals
)
select
ID_ROOM,
ID_ROOM_TYPE,
DESCRIPTION,
TotalDays2011
from StayTotalsRankedByType
where RankInRoomType = 1;
Finally, one other way to pull in additional columns to describe the grouped MAX results is to use a "carryalong" sort, which was a handy technique before ranking functions were available. Adam Machanic gives an example here, and there are useful threads on the topic from Usenet, such as this one.

How about this?
select room.id_room, room_type.description, room.price
from room inner join room_type
    on room.id_room_type = room_type.id_room_type
where room.id_room = (select id_room from stay
                      where year (date) = 2011
                      group by id_room
                      order by sum (days_quant) desc);
Unfortunately, this query (as it is now) doesn't show for how many days the most popular room had been rented. But there's no 'limit 1'!

Thank you all! With all the ideas you gave me, I came up with this. Please let me know if you think it's OK!
SELECT R.ID_ROOM, R.ID_ROOM_TYPE, T.DESCRIPTION, SUM(S.DAYS_QUANT)
FROM ROOM R, ROOM_TYPE T, STAY S
(SELECT ID_STAY, SUM(S.DAYS_QUANT) TOTALDAYS
FROM STAY S
WHERE YEAR(S.DATE) = 2011
GROUP BY S.ID_STAY) STAYHELPER
WHERE YEAR(S.DATE) = 2011
GROUP BY R.ID_ROOM, R.ID_ROOM_TYPE, T.DESCRIPTION
HAVING SUM(S.DAYS_QUANT) >= ALL STAYHELPER.TOTALDAYS
