SQL Join from Two Tables showing only maximum date in one table

I have two tables, visits and encounters. Each visit by a student may have several encounters at different times. I would like a query returning visit_id, encounter_id, and encounter_datetime that shows ONLY the latest encounter for each visit. My results MUST include visits with no encounters.
My tables:
Visits (visit_id, studenti_id)
Encounters (encounter_id, visit_id, encounter_datetime)
What I have tried:
SELECT
    Visits.visit_id,
    Encounters.encounter_id,
    Encounters.encounter_datetime
FROM Visits
LEFT OUTER JOIN Encounters
    ON Visits.visit_id = Encounters.visit_id
INNER JOIN (
    SELECT Encounters.visit_id, MAX(Encounters.encounter_datetime) AS Latest
    FROM Encounters
    GROUP BY Encounters.visit_id
) AS NewEncounters
    ON Encounters.visit_id = NewEncounters.visit_id
    AND Encounters.encounter_datetime = NewEncounters.Latest
This returns the results I want; however, visits without encounters are not in the results.

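Your INNER JOIN compares Encounters columns that are NULL for visits without encounters, so it throws away exactly the rows the LEFT JOIN was meant to keep. I don't see a clean way to salvage your direct join approach, but if your database supports ROW_NUMBER, it is an easy option: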
WITH cte AS (
SELECT v.visit_id, e.encounter_id, e.encounter_datetime,
ROW_NUMBER() OVER (PARTITION BY v.visit_id ORDER BY e.encounter_datetime DESC) rn
FROM Visits v
LEFT JOIN Encounters e ON v.visit_id = e.visit_id
)
SELECT visit_id, encounter_id, encounter_datetime
FROM cte
WHERE rn = 1;
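To check that the ROW_NUMBER version really keeps visits without encounters, here is a small sketch that runs the same query against an in-memory SQLite database from Python (window functions need SQLite 3.25 or newer). The sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with the two tables from the question (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Visits (visit_id INTEGER PRIMARY KEY, studenti_id INTEGER);
    CREATE TABLE Encounters (
        encounter_id INTEGER PRIMARY KEY,
        visit_id INTEGER,
        encounter_datetime TEXT
    );
    INSERT INTO Visits VALUES (1, 100), (2, 200), (3, 300);  -- visit 3 has no encounters
    INSERT INTO Encounters VALUES
        (10, 1, '2020-01-01 09:00'),
        (11, 1, '2020-01-01 11:30'),
        (12, 2, '2020-02-03 08:15');
""")

# Number the encounters of each visit, newest first; the LEFT JOIN keeps visit 3 with NULLs.
LATEST_PER_VISIT = """
    WITH cte AS (
        SELECT v.visit_id, e.encounter_id, e.encounter_datetime,
               ROW_NUMBER() OVER (PARTITION BY v.visit_id
                                  ORDER BY e.encounter_datetime DESC) AS rn
        FROM Visits v
        LEFT JOIN Encounters e ON v.visit_id = e.visit_id
    )
    SELECT visit_id, encounter_id, encounter_datetime
    FROM cte
    WHERE rn = 1
    ORDER BY visit_id
"""

rows = conn.execute(LATEST_PER_VISIT).fetchall()
for row in rows:
    print(row)
```

Because rn = 1 is applied after the LEFT JOIN, a visit with no encounters still produces one row (its only row, with NULL encounter columns).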

For the problem of getting the maximum of several dates, here is an (untested, sorry!) code example that at least shows the line of approach:
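SELECT
    Visits.visit_id,
    a.encounter_id,
    MAX(a.encounter_datetime) AS Max_Datetime
FROM Visits
LEFT OUTER JOIN Encounters a
    ON Visits.visit_id = a.visit_id
-- pair each encounter a with every encounter b of the same visit that is not earlier
INNER JOIN Encounters b
    ON a.visit_id = b.visit_id
    AND a.encounter_datetime <= b.encounter_datetime
GROUP BY
    Visits.visit_id,
    a.encounter_id,
    a.encounter_datetime
-- a is the latest encounter when no b is later than it
HAVING MAX(b.encounter_datetime) = a.encounter_datetime;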
For visits without encounters, you can UNION in a second query whose WHERE clause uses IS NULL.
Depending on your database, you may need some minor syntax adjustments (statement terminators and the like).
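The self-join idea plus the UNION for visits without encounters can be sketched as follows, again with SQLite from Python and made-up data. An encounter is kept when no encounter of the same visit is later than it (checked with HAVING MAX(b.encounter_datetime) = a.encounter_datetime); this is one way to complete the approach, not necessarily the exact query the answer's author had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Visits (visit_id INTEGER PRIMARY KEY, studenti_id INTEGER);
    CREATE TABLE Encounters (
        encounter_id INTEGER PRIMARY KEY,
        visit_id INTEGER,
        encounter_datetime TEXT
    );
    INSERT INTO Visits VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Encounters VALUES
        (10, 1, '2020-01-01 09:00'),
        (11, 1, '2020-01-01 11:30'),
        (12, 2, '2020-02-03 08:15');
""")

QUERY = """
    -- Visits with encounters: pair each encounter a with every encounter b of the
    -- same visit that is not earlier; a is the latest when the largest b equals a.
    SELECT a.visit_id, a.encounter_id, a.encounter_datetime AS max_datetime
    FROM Encounters a
    INNER JOIN Encounters b
        ON a.visit_id = b.visit_id
       AND a.encounter_datetime <= b.encounter_datetime
    GROUP BY a.visit_id, a.encounter_id, a.encounter_datetime
    HAVING MAX(b.encounter_datetime) = a.encounter_datetime

    UNION ALL

    -- Visits without any encounter.
    SELECT v.visit_id, NULL, NULL
    FROM Visits v
    LEFT JOIN Encounters e ON v.visit_id = e.visit_id
    WHERE e.visit_id IS NULL

    ORDER BY 1
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

If two encounters of a visit share the latest timestamp, both satisfy the HAVING condition and both are returned, the same behavior as the question's MAX-and-join-back query.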

Related

Bigquery SQL code to pull earliest contact

I have a copy of our Salesforce data in BigQuery, and I'm trying to join the contact table with the account table.
I want to return every account in the dataset, but only the contact that was created first for each account.
I've gone around in circles today googling and trying to cobble a query together, but every attempt returns either no accounts, a single account, or lots of contacts per account (ignoring the earliest requirement).
Here's my latest query, which produces no results. I think I'm nearly there but am still struggling. Any help would be most appreciated.
SELECT distinct
c.accountid as Acct_id
,a.id as a_Acct_ID
,c.id as Cont_ID
,a.id AS a_CONT_ID
,c.email
,c.createddate
FROM `sfdcaccounttable` a
INNER JOIN `sfdccontacttable` c
ON c.accountid = a.id
INNER JOIN
(SELECT a2.id, c2.accountid, c2.createddate AS MINCREATEDDATE
FROM `sfdccontacttable` c2
INNER JOIN `sfdcaccounttable` a2 ON a2.id = c2.accountid
GROUP BY 1,2,3
ORDER BY c2.createddate asc LIMIT 1) c3
ON c.id = c3.id
ORDER BY a.id asc
LIMIT 10
The solution shared above is very BigQuery-specific, and it has some quirks you need to work around, like the memory error you got.
I once answered a similar question with an approach that is more portable and easier to maintain.
Essentially, you need to create a smaller table (or, even better, a view) with each ID and its first transaction. It's similar to what you shared, but slightly different, because you need to group ONLY in the topmost query.
It looks something like this:
select
# contact ids that are first time contacts
b.id as cont_id,
b.accountid
from `sfdccontacttable` as b inner join
( select accountid,
min(createddate) as first_tx_time
FROM `sfdccontacttable`
group by 1) as a on (a.accountid = b.accountid and b.createddate = a.first_tx_time)
group by 1, 2
You need to do it this way because otherwise you can end up with multiple IDs per account (if there are any other dimensions associated with it). This approach is also fairly future-proof: you can add dimensions to the underlying tables without affecting the result, and you can use a WHERE clause in the inner query to define what counts as a "valid" contact. You can then save it as a view and simply reference it in any subquery or join.
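As a portable illustration of this "first contact per account" view, here is the same MIN-and-join-back pattern run in SQLite from Python. The table name sfdccontacttable is reused, but it is a simplified stand-in with made-up rows, and BigQuery's `#` comments become `--` comments because SQLite does not accept `#`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sfdccontacttable (id TEXT, accountid TEXT, createddate TEXT);
    INSERT INTO sfdccontacttable VALUES
        ('c1', 'A', '2019-05-01'),
        ('c2', 'A', '2018-03-10'),   -- earliest contact for account A
        ('c3', 'B', '2020-07-22'),
        ('c4', 'B', '2020-07-22');   -- tie: two contacts share B's earliest date
""")

FIRST_CONTACTS = """
    SELECT b.id AS cont_id, b.accountid
    FROM sfdccontacttable AS b
    INNER JOIN (
        -- one row per account: its earliest createddate
        SELECT accountid, MIN(createddate) AS first_tx_time
        FROM sfdccontacttable
        GROUP BY accountid
    ) AS a
      ON a.accountid = b.accountid
     AND b.createddate = a.first_tx_time
    GROUP BY b.id, b.accountid
    ORDER BY b.accountid, b.id
"""

rows = conn.execute(FIRST_CONTACTS).fetchall()
print(rows)
```

Note the tie for account B: joining back on the minimum date returns every contact that shares that date, so the view yields exactly one row per account only when createddate values are unique within an account.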
Set up a view/subquery for client_first or client_last as:
SELECT * except(_rank) from (
select rank() over (partition by accountid order by createddate ASC) as _rank,
*
FROM `prj.dataset.sfdccontacttable`
) where _rank=1
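Basically, it uses a window function to number the rows and returns the first row: with ASC that is the first client entry, with DESC the last.
You can do the same for accounts as well; then you can simply join the two, since there will be one record per entity. Note that RANK() gives tied rows the same number, so if two contacts share the same createddate, both are returned; use ROW_NUMBER() (ideally with a tie-breaking column in the ORDER BY) if you need exactly one.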
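The difference between RANK() and ROW_NUMBER() only shows up when two contacts of one account share a createddate. This sketch uses SQLite from Python (3.25 or newer for window functions) with made-up rows; SQLite has no `SELECT * EXCEPT(...)`, so the outer query lists its columns explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sfdccontacttable (id TEXT, accountid TEXT, createddate TEXT);
    INSERT INTO sfdccontacttable VALUES
        ('c1', 'A', '2019-05-01'),
        ('c2', 'A', '2018-03-10'),
        ('c3', 'B', '2020-07-22'),
        ('c4', 'B', '2020-07-22');   -- same createddate as c3
""")

# RANK(): tied rows get the same number, so _rank = 1 can match several rows.
RANKED = """
    SELECT id, accountid FROM (
        SELECT RANK() OVER (PARTITION BY accountid ORDER BY createddate ASC) AS _rank, *
        FROM sfdccontacttable
    ) WHERE _rank = 1
    ORDER BY accountid, id
"""

# ROW_NUMBER() with id as a tie-breaker: always exactly one row per account.
NUMBERED = """
    SELECT id, accountid FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY accountid
                                  ORDER BY createddate ASC, id ASC) AS _rank, *
        FROM sfdccontacttable
    ) WHERE _rank = 1
    ORDER BY accountid, id
"""

by_rank = conn.execute(RANKED).fetchall()
by_row_number = conn.execute(NUMBERED).fetchall()
print(by_rank, by_row_number)
```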
UPDATE
You could also try using ARRAY_AGG, which has a smaller memory footprint:
#standardSQL
SELECT e.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.createddate ASC LIMIT 1
)[OFFSET(0)] e
FROM `dataset.sfdccontacttable` t
GROUP BY t.accountid
)
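The ARRAY_AGG query gathers each account's complete rows into an array ordered by createddate, keeps only the first element (LIMIT 1, then [OFFSET(0)]), and expands it back with e.*. The plain-Python sketch below mimics that semantics on made-up dicts; it is an analogy to explain what the query does, not BigQuery code. If ties on createddate are possible, adding a second key to the ORDER BY makes the chosen row predictable.

```python
from itertools import groupby
from operator import itemgetter

# Made-up contact rows; in BigQuery each would be a row of sfdccontacttable.
contacts = [
    {"id": "c1", "accountid": "A", "createddate": "2019-05-01"},
    {"id": "c2", "accountid": "A", "createddate": "2018-03-10"},
    {"id": "c3", "accountid": "B", "createddate": "2020-07-22"},
]


def first_row_per_account(rows):
    """Analogue of ARRAY_AGG(t ORDER BY t.createddate ASC LIMIT 1)[OFFSET(0)] ... GROUP BY accountid."""
    by_account = sorted(rows, key=itemgetter("accountid"))  # groupby needs sorted input
    result = []
    for _, group in groupby(by_account, key=itemgetter("accountid")):
        ordered = sorted(group, key=itemgetter("createddate"))  # ORDER BY createddate ASC
        result.append(ordered[0])                               # LIMIT 1, then [OFFSET(0)]
    return result


first = first_row_per_account(contacts)
print(first)
```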

Oracle SQL Duplicate rows when joining new table

I am using Oracle SQL to join tables, with the following code:
SELECT
T.TRANSACTION_KEY,
PR.ACCOUNT_KEY,
T.ACCT_CURR_AMOUNT,
T.EXECUTION_LOCAL_DATE_TIME,
TC.DESCRIPTION,
T.OPP_ACCOUNT_NAME,
T.OPP_COUNTRY,
PT.PARTY_TYPE_DESC,
P.PARTY_NAME,
P.CUSTOM_SMALL_STRING_02,
CO.COUNTRY_NAME,
LE.LIST_CD
FROM TRANSACTIONS T
LEFT JOIN TRANSACTION_CODE TC
ON T.TRANSACTION_CODE = TC.ENTITY
LEFT JOIN PARTY_ACCOUNT_RELATION PR
ON T.ACCOUNT = PR.ACCOUNT
LEFT JOIN PARTY P
ON PR.PARTY_KEY = P.PARTY_KEY
LEFT JOIN PARTY_TYPE PT
ON P.PARTY_TYPE = PT.ENTITY
LEFT JOIN COUNTRY CO
ON T.OPP_COUNTRY = CO.ENTITY
LEFT JOIN LISTED_ENTITY LE
ON CO.COUNTRY = LE.ENTITY_KEY
WHERE
PR.PARTY_KEY = '111111111' and T.EXECUTION_LOCAL_DATE_TIME>'2017-01-01';
So far this works fine, but I want to join another table that has a column (ENTITY_KEY) in common with the PARTY_ACCOUNT_RELATION table (ACCOUNT_KEY), and include some of the new table's columns. When I do that, the rows become duplicated. I am adding the following lines before the WHERE clause:
LEFT JOIN EVALUATE_RULE ER
ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
Does anyone know where the problem is?
If joining another table into an existing query causes the existing rows to be duplicated, it is because the table being joined in has duplicate values in the columns that are being used as keys for the join.
In your case, if you run
SELECT ENTITY_KEY FROM EVALUATE_RULE GROUP BY ENTITY_KEY HAVING COUNT(*) > 1
you'll see which ENTITY_KEY values are duplicated. When these duplicates are joined to the existing data, each existing row has to be repeated once for every EVALUATE_RULE row with the same ENTITY_KEY, so that all of them can appear in the result set.
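A small sketch of the effect, using SQLite from Python with simplified, made-up versions of the two tables (the NAME column is hypothetical): one account key appears twice in EVALUATE_RULE, so the duplicate-finding query reports it and the LEFT JOIN returns that account twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTY_ACCOUNT_RELATION (ACCOUNT_KEY TEXT, PARTY_KEY TEXT);
    CREATE TABLE EVALUATE_RULE (ENTITY_KEY TEXT, NAME TEXT);
    INSERT INTO PARTY_ACCOUNT_RELATION VALUES ('ACC1', 'P1'), ('ACC2', 'P1');
    INSERT INTO EVALUATE_RULE VALUES
        ('ACC1', 'rule_a'),
        ('ACC1', 'rule_b'),   -- second row for the same key
        ('ACC2', 'rule_c');
""")

# Keys that occur more than once in the joined-in table.
duplicated_keys = conn.execute(
    "SELECT ENTITY_KEY FROM EVALUATE_RULE GROUP BY ENTITY_KEY HAVING COUNT(*) > 1"
).fetchall()

# Two relation rows before the join ...
before = conn.execute("SELECT COUNT(*) FROM PARTY_ACCOUNT_RELATION").fetchone()[0]

# ... three after it, because ACC1 matches two EVALUATE_RULE rows.
after = conn.execute("""
    SELECT COUNT(*)
    FROM PARTY_ACCOUNT_RELATION PR
    LEFT JOIN EVALUATE_RULE ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
""").fetchone()[0]

print(duplicated_keys, before, after)
```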
You must either de-dupe the table, or put other clauses into your ON condition to further restrict the rows coming from EVALUATE_RULE.
For example, suppose that after adding EVALUATE_RULE and putting ER.* in your SELECT list, you can see that the rows from ER have status = 'old' and status = 'current', but you know you only want the current ones. Then put AND er.status = 'current' in your ON clause.
Your comment indicates that the duplicate records differ only in some column you don't care about, so this technique selects just one row per key:
LEFT JOIN
(SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e) er
ON
er.entity_key = pr.account_key and
er.rown = 1
To see why this works, run the inner query on its own:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e
ORDER BY e.entity_key -- added only to make the numbering easier to follow; not needed in the main query
It assigns a number to each row in the table; the numbering restarts at 1 every time entity_key changes, so selecting rown = 1 keeps exactly one row per entity_key.
If it turns out you DO want something specific like "the latest row from evaluate_rule", you can use something like this:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.created_date DESC) as rown FROM evaluate_rule e
Now the row with the latest created_date will always have rown = 1.
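To see the ROW_NUMBER() filter working inside the join, this sketch (SQLite 3.25+ via Python) uses a simplified EVALUATE_RULE whose NAME and CREATED_DATE columns are hypothetical, with made-up rows. It keeps only the latest rule per key, so each account appears once, and an account without rules still appears with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTY_ACCOUNT_RELATION (ACCOUNT_KEY TEXT, PARTY_KEY TEXT);
    CREATE TABLE EVALUATE_RULE (ENTITY_KEY TEXT, NAME TEXT, CREATED_DATE TEXT);
    INSERT INTO PARTY_ACCOUNT_RELATION VALUES ('ACC1', 'P1'), ('ACC2', 'P1'), ('ACC3', 'P1');
    INSERT INTO EVALUATE_RULE VALUES
        ('ACC1', 'rule_a', '2017-01-05'),
        ('ACC1', 'rule_b', '2017-03-01'),   -- latest row for ACC1
        ('ACC2', 'rule_c', '2017-02-11');   -- ACC3 has no rules at all
""")

rows = conn.execute("""
    SELECT pr.account_key, er.name, er.created_date
    FROM party_account_relation pr
    LEFT JOIN (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY e.entity_key
                                  ORDER BY e.created_date DESC) AS rown
        FROM evaluate_rule e
    ) er
      ON er.entity_key = pr.account_key
     AND er.rown = 1           -- filter in ON, not WHERE, to keep unmatched accounts
    ORDER BY pr.account_key
""").fetchall()

print(rows)
```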
As far as I can understand from your description, table EVALUATE_RULE has more than one record with ENTITY_KEY = ACCOUNT_KEY.
You can change this part of your query:
LEFT JOIN EVALUATE_RULE ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
to
LEFT JOIN (SELECT DISTINCT ENTITY_KEY FROM EVALUATE_RULE) ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
If you post the structure of EVALUATE_RULE (indicating the PK columns), I can change my answer so that you can include EVALUATE_RULE columns in the final query.
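The DISTINCT subquery works because it collapses EVALUATE_RULE to one row per ENTITY_KEY before the join. A quick check with SQLite from Python on made-up rows: the row count stays at two, and the trade-off is that only ENTITY_KEY survives the subquery, so no other EVALUATE_RULE columns can be selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTY_ACCOUNT_RELATION (ACCOUNT_KEY TEXT, PARTY_KEY TEXT);
    CREATE TABLE EVALUATE_RULE (ENTITY_KEY TEXT, NAME TEXT);
    INSERT INTO PARTY_ACCOUNT_RELATION VALUES ('ACC1', 'P1'), ('ACC2', 'P1');
    INSERT INTO EVALUATE_RULE VALUES ('ACC1', 'rule_a'), ('ACC1', 'rule_b'), ('ACC2', 'rule_c');
""")

# One row per ENTITY_KEY on the right-hand side, so no fan-out on the left.
rows = conn.execute("""
    SELECT PR.ACCOUNT_KEY, ER.ENTITY_KEY
    FROM PARTY_ACCOUNT_RELATION PR
    LEFT JOIN (SELECT DISTINCT ENTITY_KEY FROM EVALUATE_RULE) ER
      ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
    ORDER BY PR.ACCOUNT_KEY
""").fetchall()

print(rows)
```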

How to group my table for latest date and ID?

I have a table like this:
I need to group this table to get the latest date for every ID.
In other words, I want to get the last row for every ID. Here is my query:
SELECT DISTINCT ch.Date,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, [Date] from tblCommonHistory ) ch
ON ch.TableIDentity = rk.ID order by ID
How can I do what I want?
EDIT: This query worked for me:
SELECT DISTINCT ch.dt,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, max([Date]) as dt from tblCommonHistory group by TableIdentity) ch ON ch.TableIDentity = rk.ID order by ID
Just use aggregation:
select TableIdentity, max([date])
from tblCommonHistory
group by TableIdentity;
Your question only mentions one table. Your query has two; I don't understand the discrepancy.
It's strange that TableIdentity is duplicated in tblCommonHistory; if it weren't, your query would not return multiple dates for the same ID.
Also, the only reason to join the two tables seems to be that you need to skip IDs that are not present in tblrisk (is that what you need?).
In that case, I'd suggest:
SELECT max(ch.Date) AS [Date],ID FROM dbo.tblrisk AS rk
inner join tblCommonHistory AS ch ON ch.TableIDentity = rk.ID
group by ID order by ID
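Both suggestions can be compared on a tiny made-up data set using SQLite from Python. SQLite accepts the [Date] bracket quoting, but it has no dbo schema, so the tables are created without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblrisk (ID INTEGER PRIMARY KEY);
    CREATE TABLE tblCommonHistory (TableIdentity INTEGER, [Date] TEXT);
    INSERT INTO tblrisk VALUES (1), (2);
    INSERT INTO tblCommonHistory VALUES
        (1, '2021-01-10'), (1, '2021-03-02'),
        (2, '2021-02-14'),
        (3, '2021-04-01');   -- ID 3 is not in tblrisk
""")

# Aggregation alone: latest date for every TableIdentity in the history table.
latest_all = conn.execute("""
    SELECT TableIdentity, MAX([Date])
    FROM tblCommonHistory
    GROUP BY TableIdentity
    ORDER BY TableIdentity
""").fetchall()

# Joining to tblrisk first drops history rows whose ID is not in tblrisk.
latest_risk = conn.execute("""
    SELECT rk.ID, MAX(ch.[Date]) AS [Date]
    FROM tblrisk AS rk
    INNER JOIN tblCommonHistory AS ch ON ch.TableIdentity = rk.ID
    GROUP BY rk.ID
    ORDER BY rk.ID
""").fetchall()

print(latest_all)
print(latest_risk)
```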

Joining two most recent events from two tables

I'm trying to build a report in SQL that shows when a patient last received a particular lab service and the facility at which they received that service. Unfortunately, the lab procedure and facility are in different tables. Here is what I have now (apologies in advance for my weird aliasing; it makes more sense with the actual table names):
;with temp as (Select distinct flow.pid, flow.labdate as obsdate, flow.labvalue as obsvalue
From labstable as flow
Where flow.name = 'lab name'
)
Select distinct p.patientid, MAX(temp.obsdate) [Last Reading], COUNT(temp.obsdate) [Number of Readings],
Case
When count(temp.obsdate) > 2 then 'Active' Else 'Inactive' End [Status], facility.NAME [Facility]
From Patientrecord as p
Join temp on temp.pid = p.PId
Join (Select loc.name, MAX(a.apptstart)[Last appt], a.patientid
From Appointmentstable as a
Join Facility as loc on loc.facilityid = a.FacilityId
Where a.ApptStart = (Select MAX(appointments.apptstart) from Appointments where appointments.patientId = a.patientid)
Group by loc.NAME, a.patientId
) facility on facility.patientId = p.PatientId
Group by p.PatientId, facility.NAME
Having MAX(temp.obsdate) between DATEADD(yyyy, -1, GETDATE()) and GETDATE()
Order by [Last Reading] asc
My problem with this is that if the patient has been to more than one facility within the time frame, the subquery brings every one of those facilities into the join, inflating the results by approximately 4000. I need to find a way to select ONLY the VERY MOST RECENT facility from the appointments list, then join it back to the lab. Labs do not have a visitID (that would make too much sense). I'm fairly confident that I'm missing something in either my subquery select or the corresponding join, but after four days I think I need professional help.
Suggestions are much appreciated and please let me know where I can clarify. Thank you in advance!
Change your subquery with alias "facility" to something like this:
...
join (
select patientid, loc_name, last_appt
from (
select patientid, loc_name=loc.name, last_appt=apptstart,
seqnum=row_number() over (partition by patientid order by apptstart desc)
from AppointmentsTable a
inner join Facility loc on loc.facilityid = a.facilityid
) x
where seqnum = 1
) facility
on ...
...
The key difference is the use of the row_number() windowing function. The "partition by" and "order by" clauses guarantee you'll get one set of row numbers per patient and the row with the most recent date will be assigned row number 1. The filter of "seqnum = 1" makes sure you get only the one row you want for each patient.
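Here is the facility subquery on its own, run against made-up rows in SQLite 3.25+ from Python. SQLite does not accept T-SQL's `alias = expression` column syntax, so the sketch uses `AS` aliases; table and column names are simplified stand-ins for the question's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Facility (facilityid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE AppointmentsTable (patientid INTEGER, facilityid INTEGER, apptstart TEXT);
    INSERT INTO Facility VALUES (1, 'North Clinic'), (2, 'South Clinic');
    INSERT INTO AppointmentsTable VALUES
        (100, 1, '2023-01-15'),
        (100, 2, '2023-06-01'),   -- most recent visit for patient 100
        (200, 1, '2023-03-20');
""")

# Row number 1 per patient is the appointment with the latest apptstart.
last_facility = conn.execute("""
    SELECT patientid, loc_name, last_appt
    FROM (
        SELECT a.patientid,
               loc.name    AS loc_name,
               a.apptstart AS last_appt,
               ROW_NUMBER() OVER (PARTITION BY a.patientid
                                  ORDER BY a.apptstart DESC) AS seqnum
        FROM AppointmentsTable a
        INNER JOIN Facility loc ON loc.facilityid = a.facilityid
    ) x
    WHERE seqnum = 1
    ORDER BY patientid
""").fetchall()

print(last_facility)
```

Each patient now contributes exactly one facility row, so joining this result to the lab data cannot multiply the lab rows.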

How to select DISTINCT results in an SQL JOIN query

This is my query; please check it and tell me what's wrong. The query executes successfully, but DISTINCT is not working:
SELECT
DISTINCT(ticket_message.ticket_id),
support_ticket.user_id,
support_ticket.priority,
support_ticket.subject,
support_ticket.status,
ticket_message.message
FROM
support_ticket
LEFT OUTER JOIN ticket_message ON support_ticket.ticket_id = ticket_message.ticket_id
LEFT OUTER JOIN assign_ticket ON ticket_message.ticket_id = assign_ticket.ticket_id
The word distinct is a modifier to the keyword SELECT. So you need to think of it as SELECT DISTINCT and it ALWAYS operates across the entire row. It simply ignores the parentheses seen in the following:
select distinct(ticket_message.ticket_id)
because distinct is NOT a function.
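A quick way to convince yourself, using SQLite from Python with made-up rows: wrapping the first column in parentheses changes nothing, and the result still has one row per distinct (ticket_id, message) combination, not one per ticket_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket_message (ticket_id INTEGER, message TEXT);
    INSERT INTO ticket_message VALUES
        (1, 'Printer is broken'),
        (1, 'Still broken'),
        (2, 'Password reset');
""")

# The parentheses only wrap the first column expression; DISTINCT still covers the whole row.
with_parens = conn.execute(
    "SELECT DISTINCT(ticket_id), message FROM ticket_message ORDER BY ticket_id, message"
).fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT ticket_id, message FROM ticket_message ORDER BY ticket_id, message"
).fetchall()

print(with_parens)
```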
So, what we appear to have is a support ticket with associated messages. There are usually multiple messages per support ticket, so I suspect what you want is more complex. For example, you might want just the most recent message for each support ticket.
To get the most recent one we need a timestamp (or "datetime") column, and we also need to know whether your database supports "window functions". Let's assume you have a timestamp column called message_at and your database does support window functions; then this would reduce the number of rows:
SELECT
support_ticket.ticket_id
, support_ticket.user_id
, support_ticket.support_section
, support_ticket.priority
, support_ticket.subject
, support_ticket.status
, tm.file
, tm.message
, assign_ticket.section_id
, assign_ticket.section_admin_id
FROM support_ticket
LEFT OUTER JOIN (
SELECT
ticket_id
, file
, message
, ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY message_at DESC) AS row_num
FROM ticket_message
) tm ON support_ticket.ticket_id = tm.ticket_id
AND tm.row_num = 1
LEFT OUTER JOIN assign_ticket ON tm.ticket_id = assign_ticket.ticket_id
ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY message_at DESC) assigns the number 1 to the most recent message of each ticket, and the join condition tm.row_num = 1 then ignores all rows numbered above 1, removing the unwanted repetition from the results.
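This sketch runs a cut-down version of that query in SQLite 3.25+ from Python. Like the answer, it assumes a message_at timestamp column; the other columns and all rows are made up. Because tm.row_num = 1 sits in the ON clause rather than in WHERE, a ticket with no messages still appears once, with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE support_ticket (ticket_id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE ticket_message (ticket_id INTEGER, message TEXT, message_at TEXT);
    INSERT INTO support_ticket VALUES (1, 'Printer'), (2, 'Login'), (3, 'New laptop');
    INSERT INTO ticket_message VALUES
        (1, 'Printer is broken', '2024-05-01 09:00'),
        (1, 'Still broken',      '2024-05-02 10:30'),
        (2, 'Password reset',    '2024-05-03 08:00');
""")

# Latest message per ticket; ticket 3 has no messages and keeps a NULL message.
latest = conn.execute("""
    SELECT support_ticket.ticket_id, support_ticket.subject, tm.message
    FROM support_ticket
    LEFT OUTER JOIN (
        SELECT ticket_id, message,
               ROW_NUMBER() OVER (PARTITION BY ticket_id
                                  ORDER BY message_at DESC) AS row_num
        FROM ticket_message
    ) tm ON support_ticket.ticket_id = tm.ticket_id
        AND tm.row_num = 1
    ORDER BY support_ticket.ticket_id
""").fetchall()

print(latest)
```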
So, we really need to know much more about your actual data, the database (and version) you are using, and your real needs. It is almost certain that SELECT DISTINCT is NOT the right technique for what you are trying to achieve.
I suggest you read these: "Provide a Minimal Complete Verifiable Example (MCVE)" and "Why should I provide an MCVE".
Use this statement:
SELECT DISTINCT
ticket_message.ticket_id
FROM
support_ticket
LEFT OUTER JOIN ticket_message ON
support_ticket.ticket_id = ticket_message.ticket_id
LEFT OUTER JOIN assign_ticket ON
ticket_message.ticket_id = assign_ticket.ticket_id
As soon as you add more columns to your query, DISTINCT takes them into account as well.