Query that merges latest reference in multiple tables into shared rows

I have the following schema, which is pretty simple and straightforward:
I want to write an optimized query that returns a list of all phones with their latest message and latest picture taken. "Latest" here means the row with the most recent "CreatedAt" in each table:
Example expected data-set:
-------------------------------------------
| Phone Id | Message Id | Picture Id |
-------------------------------------------
| 1 | 3 | 4 |
| 2 | 4 | 5 |
| 3 | 5 | 6 |
-------------------------------------------
Right now I'm not sure how to write such a query, so I just grab everything and then programmatically filter it with server-side code, i.e.:
SELECT * FROM Phones
LEFT OUTER JOIN Messages ON Messages.PhoneId = Phones.Id
LEFT OUTER JOIN Photos ON Photos.PhoneId = Phones.Id
--and then code that filters the CreatedAt in another language
How can I write such a query?

OUTER APPLY seems useful here:
SELECT p.*, m.*, ph.*
FROM Phones p OUTER APPLY
(SELECT TOP (1) m.*
FROM Messages m
WHERE m.PhoneId = p.Id
ORDER BY m.CreatedAt DESC
) m OUTER APPLY
(SELECT TOP (1) ph.*
FROM Photos ph
WHERE ph.PhoneId = p.Id
ORDER BY ph.CreatedAt DESC
) ph;
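OUTER APPLY is not available in every engine. As a minimal sketch of the same "latest row per phone" idea, the snippet below runs against an in-memory SQLite database from Python and swaps OUTER APPLY for correlated scalar subqueries with ORDER BY ... LIMIT 1. The column names (Id, PhoneId, CreatedAt) follow the question; the sample rows and dates are made up for illustration, and phone 3 is deliberately left without messages or photos to show the NULL case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Phones   (Id INTEGER PRIMARY KEY);
CREATE TABLE Messages (Id INTEGER PRIMARY KEY, PhoneId INTEGER, CreatedAt TEXT);
CREATE TABLE Photos   (Id INTEGER PRIMARY KEY, PhoneId INTEGER, CreatedAt TEXT);
INSERT INTO Phones VALUES (1), (2), (3);
-- hypothetical sample data
INSERT INTO Messages VALUES (1, 1, '2020-01-01'), (3, 1, '2020-01-05'), (4, 2, '2020-01-02');
INSERT INTO Photos   VALUES (2, 1, '2020-01-01'), (4, 1, '2020-01-03'), (5, 2, '2020-01-04');
""")

# One correlated subquery per child table; each picks the newest row for the phone.
rows = conn.execute("""
SELECT p.Id,
       (SELECT m.Id FROM Messages m
         WHERE m.PhoneId = p.Id
         ORDER BY m.CreatedAt DESC LIMIT 1) AS MessageId,
       (SELECT ph.Id FROM Photos ph
         WHERE ph.PhoneId = p.Id
         ORDER BY ph.CreatedAt DESC LIMIT 1) AS PictureId
FROM Phones p
ORDER BY p.Id
""").fetchall()

for row in rows:
    print(row)
```

A scalar subquery can only return one column, so this form suits the "ids only" result in the question; OUTER APPLY (or a lateral join) is the better fit when whole rows are needed.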

Related

T-SQL Left Outer Join Select Top 1 - MAX

I have data from a table that looks like this:
encounter | prov_id_name
---------------------------
12345678 | 123456ProviderA
I then want to match it up on the provider id from a dimensional table instead of pulling in the substring if there is a match in the dim table.
The dimensional table looks like the following:
orgz_cd | src_pract_no | pract_rpt_name
----------------------------------------
0002 | 123456 | PROVIDER A X
1234 | 123456 | Provider A
4321 | 123456 | Provider A
The following SQL worked to get what I needed:
LEFT OUTER JOIN (
SELECT ZZZ.src_pract_no
, MAX(ZZZ.pract_rpt_name) PRACT_RPT_NAME
FROM smsdss.pract_dim_v AS ZZZ
GROUP BY src_pract_no
) AS MD
ON LEFT(HL7.PRIM_CARE_PROV_NAME_ID, 6) = MD.SRC_PRACT_NO
My question is why the following, which is what I originally tried, did not work. It gave no results at all:
LEFT OUTER JOIN (
SELECT TOP 1 ZZZ.src_pract_no
, MAX(ZZZ.pract_rpt_name) PRACT_RPT_NAME
FROM smsdss.pract_dim_v AS ZZZ
) AS MD
ON LEFT(HL7.PRIM_CARE_PROV_NAME_ID, 6) = MD.SRC_PRACT_NO
I also tried:
LEFT OUTER JOIN smsdss.pract_dim_v AS MD
ON LEFT(HL7.PRIM_CARE_PROV_NAME_ID, 6) = (
SELECT TOP 1 SRC_PRACT_NO
, PRACT_RPT_NAME
FROM SMSDSS.PRACT_DIM_V
)
I suspect it did not work as I expected because the subquery is evaluated only once: it returns a single row, that row does not match, and nothing else is tried. I'm not sure, though.
I think you want OUTER APPLY:
OUTER APPLY
(SELECT TOP 1 pd.pract_rpt_name
FROM smsdss.pract_dim_v pd
WHERE LEFT(HL7.PRIM_CARE_PROV_NAME_ID, 6) = pd.SRC_PRACT_NO
-- ORDER BY ?
) MD
Use an ORDER BY if you want a particular name (such as the longest or most recent) when there are multiple matches.
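The asker's guess is essentially right: a derived table that is not correlated to the outer row is evaluated once, so TOP 1 leaves a single row for the whole join. A rough sketch of the difference, using SQLite from Python (LIMIT 1 stands in for TOP 1, substr for LEFT, and a correlated scalar subquery for OUTER APPLY). The table names hl7 and pract_dim and the extra '999999' provider are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hl7 (encounter TEXT, prov_id_name TEXT);
CREATE TABLE pract_dim (orgz_cd TEXT, src_pract_no TEXT, pract_rpt_name TEXT);
INSERT INTO hl7 VALUES ('12345678', '123456ProviderA');
INSERT INTO pract_dim VALUES
  ('0002', '123456', 'PROVIDER A X'),
  ('1234', '123456', 'Provider A'),
  ('9999', '999999', 'Someone Else');
""")

# Uncorrelated: the derived table collapses to ONE row before the join happens.
uncorrelated = conn.execute("""
SELECT h.encounter, md.pract_rpt_name
FROM hl7 h
LEFT JOIN (SELECT src_pract_no, pract_rpt_name
           FROM pract_dim
           ORDER BY src_pract_no DESC
           LIMIT 1) md
  ON substr(h.prov_id_name, 1, 6) = md.src_pract_no
""").fetchall()

# Correlated: the subquery is evaluated per outer row, then limited to one match.
correlated = conn.execute("""
SELECT h.encounter,
       (SELECT pd.pract_rpt_name
        FROM pract_dim pd
        WHERE pd.src_pract_no = substr(h.prov_id_name, 1, 6)
        ORDER BY length(pd.pract_rpt_name) DESC   -- "longest name" tie-breaker
        LIMIT 1) AS pract_rpt_name
FROM hl7 h
""").fetchall()

print(uncorrelated)
print(correlated)
```

In the first query the single surviving row belongs to provider 999999, so the encounter gets no name; in the second the filter runs before the limit, so a matching name is found.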

GROUP BY with SUM without removing empty (null) values

TABLES:
Players
player_no | transaction_id
----------------------------
1 | 11
2 | 22
3 | (null)
1 | 33
Transactions
id | value |
-----------------------
11 | 5
22 | 10
33 | 2
My goal is to fetch all data, keeping all the players, even those with null values, in the following query:
SELECT p.player_no, COUNT(p.player_no), SUM(t.value) FROM Players p
INNER JOIN Transactions t ON p.transaction_id = t.id
GROUP BY p.player_no
Nevertheless, the results omit the player with the null value, for example:
player_no | count | sum
------------------------
1 | 2 | 7
2 | 1 | 10
What I would like is a row for the player with the empty value as well:
player_no | count | sum
------------------------
1 | 2 | 7
2 | 1 | 10
3 | 0 | 0
What am I missing here?
Actually I use QueryDSL for that, but translated example into pure SQL since it behaves in the same manner.
Use a LEFT JOIN and the COALESCE function:
SELECT p.player_no, COUNT(p.player_no), coalesce(SUM(t.value),0)
FROM Players p
LEFT JOIN Transactions t ON p.transaction_id = t.id
GROUP BY p.player_no
Change your JOIN to a LEFT JOIN, then add IFNULL(value, 0) in your SUM()
A LEFT JOIN keeps all the rows from the left table:
SELECT p.player_no
, COUNT(*) as count
, SUM(isnull(t.value,0))
FROM Players p
LEFT JOIN Transactions t
ON p.transaction_id = t.id
GROUP BY p.player_no
You might be looking for count(t.value) rather than count(*)
I'm just offering this so you have a correct answer:
SELECT p.player_no, COUNT(t.id) as [count], COALESCE(SUM(t.value), 0) as [sum]
FROM Players p LEFT JOIN
Transactions t
ON p.transaction_id = t.id
GROUP BY p.player_no;
You need to pay attention to the aggregation functions as well as the JOIN.
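To see why the choice of aggregate matters, here is a small sketch that runs the question's data through SQLite from Python and puts COUNT(*) next to COUNT(t.id). The alias names count_star, count_matched and total are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (player_no INTEGER, transaction_id INTEGER);
CREATE TABLE Transactions (id INTEGER PRIMARY KEY, value INTEGER);
INSERT INTO Players VALUES (1, 11), (2, 22), (3, NULL), (1, 33);
INSERT INTO Transactions VALUES (11, 5), (22, 10), (33, 2);
""")

rows = conn.execute("""
SELECT p.player_no,
       COUNT(*)                  AS count_star,     -- counts the player row itself
       COUNT(t.id)               AS count_matched,  -- counts only matched transactions
       COALESCE(SUM(t.value), 0) AS total
FROM Players p
LEFT JOIN Transactions t ON p.transaction_id = t.id
GROUP BY p.player_no
ORDER BY p.player_no
""").fetchall()

for row in rows:
    print(row)
```

Player 3 gets COUNT(*) = 1 because the LEFT JOIN still produces one (unmatched) row, while COUNT(t.id) skips the NULL and gives the 0 the question asks for.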
Please Try This:
SELECT P.player_no,
COUNT(*) as count,
SUM(isnull(T.value,0))
FROM Players P
LEFT JOIN Transactions T
ON P.transaction_id = T.id
GROUP BY P.player_no
Hope this helps.

Count / sum values in subquery and order by it

I have tables like below:
user
id | status
1 | 0
gallery
id | status | create_by_user_id
1 | 0 | 1
2 | 0 | 1
3 | 0 | 1
media
id | status
1 | 0
2 | 0
3 | 0
gallery_media
fk gallery.id fk media.id
id | gallery_id | media_id | sequence
1 | 1 | 1 | 1
2 | 2 | 2 | 1
3 | 2 | 3 | 2
monitor_traffic
1:gallery 2:media
id | anonymous_id | user_id | endpoint_code | endpoint_id
1 | 1 | | 1 | 2 gallery.id 2
2 | 2 | | 1 | 2 gallery.id 2
3 | | 1 | 2 | 3 media.id 3 include in gallery.id 2
This means gallery.id 2 accounts for 3 rows in monitor_traffic (2 for the gallery itself, 1 for its media).
gallery_information
fk gallery.id
id | gallery_id
gallery includes media.
monitor_traffic.endpoint_code: 1 .. gallery; 2 .. media
If 1 then monitor_traffic.endpoint_id references gallery.id
monitor_traffic.user_id, monitor_traffic.anonymous_id integer or null
Objective
I want to output gallery rows sorted by traffic: count the rows in monitor_traffic for each gallery, count the rows in monitor_traffic for the media belonging to that gallery, and finally sum the two counts.
The query I provide only counts media in monitor_traffic without summing them and also does not count gallery in monitor_traffic.
How to do this?
This is part of a function that takes options as input and builds the query as output. I hope to find a solution (maybe with a subquery) that does not require changing other parts of the query.
Query:
SELECT
g.*,
row_to_json(gi.*) as gallery_information
FROM gallery g
LEFT JOIN gallery_information gi ON gi.gallery_id = g.id
LEFT JOIN "user" u ON u.id = g.create_by_user_id
-- start
LEFT JOIN gallery_media gm ON gm.gallery_id = g.id
LEFT JOIN (
SELECT
endpoint_id,
COUNT(*) as mt_count
FROM monitor_traffic
WHERE endpoint_code = 2
GROUP BY endpoint_id
) mt ON mt.endpoint_id = gm.media_id
-- end
ORDER BY mt.mt_count desc NULLS LAST;
I suggest a CTE to count both types in one aggregation and join to it two times in the FROM clause:
WITH mt AS ( -- count once for both media and gallery
SELECT endpoint_code, endpoint_id, count(*) AS ct
FROM monitor_traffic
GROUP BY 1, 2
)
SELECT g.*, row_to_json(gi.*) AS gallery_information
FROM gallery g
LEFT JOIN mt ON mt.endpoint_id = g.id -- 1st join to mt
AND mt.endpoint_code = 1 -- gallery
LEFT JOIN (
SELECT gm.gallery_id, sum(ct) AS ct
FROM gallery_media gm
JOIN mt ON mt.endpoint_id = gm.media_id -- 2nd join to mt
AND mt.endpoint_code = 2 -- media
GROUP BY 1
) mmt ON mmt.gallery_id = g.id
LEFT JOIN gallery_information gi ON gi.gallery_id = g.id
ORDER BY mt.ct DESC NULLS LAST -- count of galleries
, mmt.ct DESC NULLS LAST; -- count of "gallery related media"
Or, to order by the sum of both counts:
...
ORDER BY COALESCE(mt.ct, 0) + COALESCE(mmt.ct, 0) DESC;
Aggregate first, then join. That prevents complications with "proxy-cross joins" that multiply rows:
Two SQL LEFT JOINS produce incorrect result
The LEFT JOIN to "user" seems to be dead freight. Remove it:
LEFT JOIN "user" u ON u.id = g.create_by_user_id
Don't use reserved words like "user" as identifiers, even if that's allowed as long as you double-quote them. Very error-prone.
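A rough, runnable sketch of the "aggregate first, then join" idea, using the question's sample rows in an in-memory SQLite database from Python. It keeps only the columns needed for the counts, gives the first join the alias gmt (an assumption, not in the answer above), and returns the summed count as a column named total so the ordering can be checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gallery (id INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE gallery_media (id INTEGER PRIMARY KEY, gallery_id INTEGER,
                            media_id INTEGER, sequence INTEGER);
CREATE TABLE monitor_traffic (id INTEGER PRIMARY KEY, anonymous_id INTEGER,
                              user_id INTEGER, endpoint_code INTEGER,
                              endpoint_id INTEGER);
INSERT INTO gallery VALUES (1, 0), (2, 0), (3, 0);
INSERT INTO gallery_media VALUES (1, 1, 1, 1), (2, 2, 2, 1), (3, 2, 3, 2);
INSERT INTO monitor_traffic VALUES
  (1, 1, NULL, 1, 2),   -- hit on gallery 2
  (2, 2, NULL, 1, 2),   -- hit on gallery 2
  (3, NULL, 1, 2, 3);   -- hit on media 3, which belongs to gallery 2
""")

rows = conn.execute("""
WITH mt AS (            -- count once for both galleries and media
  SELECT endpoint_code, endpoint_id, count(*) AS ct
  FROM monitor_traffic
  GROUP BY endpoint_code, endpoint_id
)
SELECT g.id,
       COALESCE(gmt.ct, 0) + COALESCE(mmt.ct, 0) AS total
FROM gallery g
LEFT JOIN mt gmt ON gmt.endpoint_id = g.id
                AND gmt.endpoint_code = 1          -- gallery hits
LEFT JOIN (
  SELECT gm.gallery_id, sum(mt.ct) AS ct          -- media hits, rolled up per gallery
  FROM gallery_media gm
  JOIN mt ON mt.endpoint_id = gm.media_id
         AND mt.endpoint_code = 2
  GROUP BY gm.gallery_id
) mmt ON mmt.gallery_id = g.id
ORDER BY total DESC, g.id
""").fetchall()

for row in rows:
    print(row)
```

Because both counts are reduced to at most one row per gallery before they are joined, the outer query never multiplies rows, and gallery 2 ends up with the total of 3 described in the question.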

Postgres: Getting the highest matching log associated with a record?

I store a log as follows:
LOG
ID | MODELID | EVENT
1 | 1 | Upped
2 | 1 | Downed
3 | 2 | Downed
4 | 1 | Upped
5 | 2 | Multiplexed
6 | 1 | Removed
Then I have the models as:
MODEL
ID | NAME
1 | Model 1
2 | Model 2
I want to end up with the LOG entry with the HIGHEST ID in LOG associated with a model as a result:
NAME | EVENT
Model 1 | Removed
Model 2 | Multiplexed
A simple join gives me all the results:
SELECT * FROM MODEL AS M LEFT JOIN LOG AS L
ON L.MODELID = M.ID
But this gives me all the records. What am I missing?
Try this
SELECT M.NAME,L.EVENT FROM LOG L INNER JOIN MODEL M
ON L.MODELID = M.ID
WHERE L.ID IN
(
SELECT MAX(ID) FROM LOG GROUP BY MODELID
)
Maybe you need a subselect. Let's start by breaking down the problem.
First you want the HIGHEST ID for a given MODELID in the LOG table.
SELECT
MODELID
,MAX(ID)
FROM
LOG
GROUP BY
MODELID
Now if we use this as a subselect (virtual table) then you can also get the model name.
E.g.
SELECT
M.NAME
,L.EVENT
FROM
MODEL M
,(
SELECT
MODELID AS MODELID
,MAX(ID) AS MAXID
FROM
LOG
GROUP BY
MODELID
) S
,LOG L
WHERE
M.ID = S.MODELID
AND L.ID = S.MAXID
Give that a go (I haven't tested it myself).
with cte as (
select M.NAME,L.EVENT, row_number() over(partition by M.ID order by L.ID desc) as row_num
from MODEL as M
inner join LOG as L on L.MODELID = M.ID
)
select NAME, EVENT from cte
where row_num = 1
I'm not sure about PostgreSQL, but I think that this one should be faster than computing MAX, because it's just one pass instead of two. Also, this one generalizes better: you could order by more than one column.
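As a quick check that the MAX(ID) approach and the row_number() approach agree on the sample data, here is a small sketch run through SQLite from Python (window functions need SQLite 3.25 or newer). It says nothing about which one is faster on PostgreSQL; that would need EXPLAIN ANALYZE on real data. The CTE name ranked is made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE log (id INTEGER PRIMARY KEY, modelid INTEGER, event TEXT);
INSERT INTO model VALUES (1, 'Model 1'), (2, 'Model 2');
INSERT INTO log VALUES
  (1, 1, 'Upped'), (2, 1, 'Downed'), (3, 2, 'Downed'),
  (4, 1, 'Upped'), (5, 2, 'Multiplexed'), (6, 1, 'Removed');
""")

# Variant 1: keep only log rows whose id is the maximum for their model.
max_rows = conn.execute("""
SELECT m.name, l.event
FROM log l
JOIN model m ON l.modelid = m.id
WHERE l.id IN (SELECT max(id) FROM log GROUP BY modelid)
ORDER BY m.name
""").fetchall()

# Variant 2: number the log rows per model, newest first, and keep number 1.
ranked_rows = conn.execute("""
WITH ranked AS (
  SELECT m.name, l.event,
         row_number() OVER (PARTITION BY m.id ORDER BY l.id DESC) AS row_num
  FROM model m
  JOIN log l ON l.modelid = m.id
)
SELECT name, event FROM ranked WHERE row_num = 1
ORDER BY name
""").fetchall()

print(max_rows)
print(ranked_rows)
```

Both variants use inner joins, so a model without any log rows would be dropped; switching to a LEFT JOIN from model would keep it with a NULL event.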

Subquery to return the latest entry for each parent ID

I have a parent table with entries for documents and I have a history table which logs an audit entry every time a user accesses one of the documents.
I'm writing a search query to return a list of documents (filtered by various criteria) with the latest user id to access each document returned in the result set.
Thus for
DOCUMENTS
ID | NAME
1 | Document 1
2 | Document 2
3 | Document 3
4 | Document 4
5 | Document 5
HISTORY
DOC_ID | USER_ID | TIMESTAMP
1 | 12345 | TODAY
1 | 11111 | IN THE PAST
1 | 11111 | IN THE PAST
1 | 12345 | IN THE PAST
2 | 11111 | TODAY
2 | 12345 | IN THE PAST
3 | 12345 | IN THE PAST
I'd be looking to get a return from my search like
ID | NAME | LAST_USER_ID
1 | Document 1 | 12345
2 | Document 2 | 11111
3 | Document 3 | 12345
4 | Document 4 |
5 | Document 5 |
Can I easily do this with one SQL query and a join between the two tables?
Revising what Andy White produced, and replacing square brackets (MS SQL Server notation) with DB2 (and ISO standard SQL) "delimited identifiers":
SELECT d.id, d.name, h.last_user_id
FROM Documents d LEFT JOIN
(SELECT r.doc_id AS id, user_id AS last_user_id
FROM History r JOIN
(SELECT doc_id, MAX("timestamp") AS "timestamp"
FROM History
GROUP BY doc_id
) AS l
ON r."timestamp" = l."timestamp"
AND r.doc_id = l.doc_id
) AS h
ON d.id = h.id
I'm not absolutely sure whether "timestamp" or "TIMESTAMP" is correct - probably the latter.
The advantage of this is that it replaces the inner correlated sub-query in Andy's version with a simpler non-correlated sub-query, which has the potential to be (radically?) more efficient.
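A minimal sketch of the non-correlated version above, run against SQLite from Python to confirm it yields the result set in the question. The dates are invented stand-ins for "TODAY" and "IN THE PAST", and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Documents (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE History (doc_id INTEGER, user_id INTEGER, "timestamp" TEXT);
INSERT INTO Documents VALUES
  (1, 'Document 1'), (2, 'Document 2'), (3, 'Document 3'),
  (4, 'Document 4'), (5, 'Document 5');
-- hypothetical dates
INSERT INTO History VALUES
  (1, 12345, '2009-03-05'), (1, 11111, '2009-03-01'),
  (1, 11111, '2009-02-01'), (1, 12345, '2009-01-01'),
  (2, 11111, '2009-03-05'), (2, 12345, '2009-02-01'),
  (3, 12345, '2009-02-01');
""")

rows = conn.execute("""
SELECT d.id, d.name, h.last_user_id
FROM Documents d
LEFT JOIN (
  SELECT r.doc_id AS id, r.user_id AS last_user_id
  FROM History r
  JOIN (SELECT doc_id, MAX("timestamp") AS "timestamp"
        FROM History
        GROUP BY doc_id) AS l          -- latest access time per document
    ON r."timestamp" = l."timestamp"
   AND r.doc_id = l.doc_id
) AS h ON d.id = h.id
ORDER BY d.id
""").fetchall()

for row in rows:
    print(row)
```

One caveat: if two History rows for the same document share the maximum timestamp, this join returns both, so the document appears twice.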
I couldn't get the "HAVING MAX(TIMESTAMP)" to run in SQL Server - I guess having requires a boolean expression like "having max(TIMESTAMP) > 2009-03-05" or something, which doesn't apply in this case. (I might be doing something wrong...)
Here is something that seems to work - note the join has 2 conditions (not sure if this is good or not):
select
d.ID,
d.NAME,
h."USER_ID" as "LAST_USER_ID"
from Documents d
left join History h
on d.ID = h.DOC_ID
and h."TIMESTAMP" =
(
select max("TIMESTAMP")
from "HISTORY"
where "DOC_ID" = d.ID
)
This doesn't use a join, but for some queries like this I like to inline the select for the field. If you want to catch the situation when no user has accessed you can wrap it with an NVL().
select a.ID, a.NAME,
(select x.user_id
from HISTORY x
where x.doc_id = a.id
and x.timestamp = (select max(x1.timestamp)
from HISTORY x1
where x1.doc_id = x.doc_id)) as LAST_USER_ID
from DOCUMENTS a
where <your criteria here>
I think it should be something like this:
SELECT ID, Name, b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a LEFT JOIN
( SELECT DOC_ID, USER_ID
FROM HISTORY
GROUP BY DOC_ID, USER_ID
HAVING MAX( TIMESTAMP )) as b
ON a.ID = b.DOC_ID
this might work also:
SELECT ID, Name, b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a
LEFT JOIN HISTORY b ON a.ID = b.DOC_ID
GROUP BY DOC_ID, USER_ID
HAVING MAX( TIMESTAMP )
Select ID, Name, User_ID
From Documents Left Outer Join
History a on ID = DOC_ID
Where ( TimeStamp = ( Select Max(TimeStamp)
From History b
Where a.DOC_ID = b.DOC_ID ) OR
TimeStamp Is NULL ) /* this accommodates the Left Join */