SQL subquery with latest record - sql

I've read just about every question on here that I can find that is referencing getting the latest record from a subquery, but I just can't work out how to make it work in my situation.
I'm creating an SSRS report for use on SQL Server 2008.
The database has a Contacts table and a DBSdata table. I want to pull up a list of contacts, each with many of the fields from its latest DBSdata row (the one whose expiry date is furthest in the future).
Contacts
========
PKContactID ContactName
----------- -----------
1 JONES Chris
2 SMITH Mary
3 GREY Jean
DBSdata
=======
Ordinal FKContactID ExpiryDate IssueDate DBSType
------- ----------- ---------- --------- -------
3 1 2021-09-01 2019-09-01 Internal
2 1 2019-08-31 2017-08-31 External
1 1 2017-07-01 2015-07-01 Internal
2 2 2021-04-15 2019-04-15 Internal
1 2 2019-05-05 2017-05-06 External
1 3 2018-01-03 2016-03-02 External
And the result I'd like is:
Latest DBS
==========
PKContactID ContactName ExpiryDate IssueDate DBSType
-------------------------------------------------------------------
3 GREY Jean 2018-01-03 2016-03-02 External
1 JONES Chris 2021-09-01 2019-09-01 Internal
2 SMITH Mary 2021-04-15 2019-04-15 Internal
[The DBSData table doesn't have its own primary key field; that's not something I have control over, unfortunately. The ordinal increases per contact, so FKContactID + Ordinal is unique.]
This is the code I've kind of got to, but it isn't working. The system I'm uploading the SSRS to doesn't give me any useful error message at all, so I can't be more specific about what isn't working I'm afraid. I get none of the SSRS report displayed, just an error saying the dataset source isn't working.
SELECT
c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType
FROM
Contacts c
LEFT JOIN (
SELECT TOP 1 FKContactID, ExpiryDate, IssueDate, DBSType
FROM DBSData
WHERE FKContactID = c.PKContactID
ORDER BY ExpiryDate DESC
) d ON c.PKContactID = d.FKContactID
ORDER BY
c.ContactName
I suspect it's something to do with that WHERE in the subquery, but if I leave it out, the subquery runs against the WHOLE table and returns a single row, not the top 1 for each contact.

Your method would work using APPLY, instead of JOIN:
SELECT c.PKContactID, c.ContactName,
d.ExpiryDate, d.IssueDate, d.DBSType
FROM Contacts c OUTER APPLY
(SELECT TOP 1 d.*
FROM DBSData d
WHERE d.FKContactID = c.PKContactID
ORDER BY d.ExpiryDate DESC
) d
ORDER BY c.ContactName;
Technically APPLY implements something called a lateral join. This is like a correlated subquery, but it can return multiple rows and multiple columns. Lateral joins are very powerful, and this is a good example for using them.
For performance, you want indexes on DBSData(FKContactID, ExpiryDate DESC) (perhaps including the other columns you want as well) and Contacts(ContactName).
With the right indexes, I would expect this to have performance at least as good as other methods.
An alternative that also typically has good performance is using a correlated subquery for filtering:
SELECT c.PKContactID, c.ContactName,
d.ExpiryDate, d.IssueDate, d.DBSType
FROM Contacts c LEFT JOIN
DBSData d
ON d.FKContactID = c.PKContactID AND
d.ExpiryDate = (SELECT MAX(d2.ExpiryDate)
FROM DBSData d2
WHERE d2.FKContactID = d.FKContactID
);
Note that to match the LEFT JOIN, the correlation condition needs to be in the ON clause, not the WHERE clause.
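To see the effect of keeping the correlation condition in the ON clause, here is a minimal sketch that runs the same query shape against the question's sample data. It uses Python's sqlite3 module as a stand-in for SQL Server, and the fourth contact ("DOE Alex") is a hypothetical addition with no DBSData rows. Dates are stored as ISO strings, which compare correctly as text.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (PKContactID INTEGER PRIMARY KEY, ContactName TEXT);
CREATE TABLE DBSData (Ordinal INTEGER, FKContactID INTEGER,
                      ExpiryDate TEXT, IssueDate TEXT, DBSType TEXT);
INSERT INTO Contacts VALUES (1,'JONES Chris'),(2,'SMITH Mary'),(3,'GREY Jean');
INSERT INTO Contacts VALUES (4,'DOE Alex');  -- hypothetical contact with no DBS rows
INSERT INTO DBSData VALUES
 (3,1,'2021-09-01','2019-09-01','Internal'),
 (2,1,'2019-08-31','2017-08-31','External'),
 (1,1,'2017-07-01','2015-07-01','Internal'),
 (2,2,'2021-04-15','2019-04-15','Internal'),
 (1,2,'2019-05-05','2017-05-06','External'),
 (1,3,'2018-01-03','2016-03-02','External');
""")

# The MAX() filter lives in the ON clause, so contacts without any
# DBSData rows still appear, with NULLs in the DBS columns.
latest = conn.execute("""
SELECT c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType
FROM Contacts c
LEFT JOIN DBSData d
  ON d.FKContactID = c.PKContactID
 AND d.ExpiryDate = (SELECT MAX(d2.ExpiryDate)
                     FROM DBSData d2
                     WHERE d2.FKContactID = d.FKContactID)
ORDER BY c.ContactName
""").fetchall()

for row in latest:
    print(row)
```

The contact without DBS records comes back with None in the three DBS columns; moving the MAX condition into a WHERE clause would drop that row.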
Finally, if you do use window functions, I would recommend a subquery for getting the first row:
SELECT c.PKContactID, c.ContactName,
d.ExpiryDate, d.IssueDate, d.DBSType
FROM Contacts c LEFT JOIN
(SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY d.FKContactID ORDER BY d.ExpiryDate DESC) as seqnum
FROM DBSData d
) d
ON d.FKContactID = c.PKContactID AND
d.seqnum = 1;
Doing the subquery before the JOIN gives more opportunities for the optimizer to produce a better execution plan.

Here's one option using row_number():
SELECT *
FROM (
SELECT
c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType,
row_number() over (partition by c.PKContactID order by d.ExpiryDate desc) rn
FROM
Contacts c
LEFT JOIN DBSData d ON d.FKContactID = c.PKContactID
) t
WHERE rn = 1
ORDER BY ContactName
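One property of the row_number() approach worth checking is what happens when two records tie on ExpiryDate. The sketch below is an illustration only: it uses Python's sqlite3 (window functions need SQLite 3.25 or later) instead of SQL Server, and the tied rows are hypothetical data, not part of the question. Ordinal is added as a tie-breaker, which is an assumption about which row should win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (PKContactID INTEGER PRIMARY KEY, ContactName TEXT);
CREATE TABLE DBSData (Ordinal INTEGER, FKContactID INTEGER,
                      ExpiryDate TEXT, DBSType TEXT);
INSERT INTO Contacts VALUES (1, 'JONES Chris');
-- Hypothetical tie: two records for the same contact share an expiry date.
INSERT INTO DBSData VALUES (1, 1, '2021-09-01', 'External'),
                           (2, 1, '2021-09-01', 'Internal');
""")

# ROW_NUMBER() numbers the rows 1, 2, ... within each contact, so rn = 1
# keeps exactly one row. Ordinal DESC makes the choice deterministic.
by_row_number = conn.execute("""
SELECT PKContactID, ContactName, ExpiryDate, DBSType
FROM (
    SELECT c.PKContactID, c.ContactName, d.ExpiryDate, d.DBSType,
           ROW_NUMBER() OVER (PARTITION BY c.PKContactID
                              ORDER BY d.ExpiryDate DESC, d.Ordinal DESC) AS rn
    FROM Contacts c
    LEFT JOIN DBSData d ON d.FKContactID = c.PKContactID
) t
WHERE rn = 1
""").fetchall()

# Matching on MAX(ExpiryDate) keeps every row that ties for the maximum.
by_max = conn.execute("""
SELECT c.PKContactID, d.DBSType
FROM Contacts c
LEFT JOIN DBSData d
  ON d.FKContactID = c.PKContactID
 AND d.ExpiryDate = (SELECT MAX(d2.ExpiryDate) FROM DBSData d2
                     WHERE d2.FKContactID = d.FKContactID)
""").fetchall()

print(by_row_number)
print(len(by_max))
```

With tied dates, the row_number() query returns one row per contact while the MAX-equality join returns both tied rows, so the choice between them depends on whether duplicates are acceptable in the report.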

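This solution gives the result you expect and typically performs well. It relies on the highest Ordinal for each contact being the latest record, and because it uses an inner join it leaves out contacts that have no DBSdata rows.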
select c.PKContactID,c.ContactName,d.ExpiryDate, d.IssueDate, d.DBSType from Contacts c
inner join DBSdata d
on c.PKContactID=d.FKContactID
where d.Ordinal = (select max(d2.Ordinal) from DBSdata d2 where d2.FKContactID=c.PKContactID)
order by c.ContactName

Related

How to join three tables and set blank fields to null?

I used a full join and a left join to join the Person, Tasks and Task tables, but the result has more than six rows. Unset fields have the value NULL, and that's good.
I believe the expected output can be obtained with joins like these, but they need conditions that match on the common fields. How can I do this?
A CROSS JOIN produces all the combinations you want. Then a simple outer join can retrieve the related rows (should they exist).
You don't mention the database you are using, so a fairly standard query will do. For example (in PostgreSQL):
select
row_number() over(order by p.id, t.id) as id,
p.name,
case when x.st is not null then t.hr end,
x.st
from person p
cross join tasks t
left join task x on x.personid_fk = p.id and x.taskid_fk = t.id
order by p.id, t.id;
Result:
id name case st
--- ----- ------ -----
1 Anna null null
2 Anna null null
3 Luo 13:00 true
4 Luo 14:00 false
5 John null null
6 John null null
See running example at DB Fiddle.
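The CROSS JOIN plus LEFT JOIN pattern can be tried locally with a small sketch. The table contents below are hypothetical (the question's data is not shown in the text); they were chosen so the output has the same shape as the result above. SQLite is used through Python's sqlite3 module, with 1 and 0 standing in for true and false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tasks  (id INTEGER PRIMARY KEY, hr TEXT);
CREATE TABLE task   (personid_fk INTEGER, taskid_fk INTEGER, st INTEGER);
-- Hypothetical rows: three people, two task slots, and only Luo has entries.
INSERT INTO person VALUES (1,'Anna'),(2,'Luo'),(3,'John');
INSERT INTO tasks  VALUES (1,'13:00'),(2,'14:00');
INSERT INTO task   VALUES (2,1,1),(2,2,0);
""")

# CROSS JOIN builds every person/task combination (3 x 2 = 6 rows);
# the LEFT JOIN then fills in st where a matching task row exists.
grid = conn.execute("""
SELECT p.name,
       CASE WHEN x.st IS NOT NULL THEN t.hr END AS hr,
       x.st
FROM person p
CROSS JOIN tasks t
LEFT JOIN task x ON x.personid_fk = p.id AND x.taskid_fk = t.id
ORDER BY p.id, t.id
""").fetchall()

for row in grid:
    print(row)
```

The row count is fixed by the cross join, so it stays at six no matter how many rows the task table holds for a given person and slot pair (assuming at most one each).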

Finding max of a column while doing inner join of two tables

I have two tables as follows:
Table A
=====================
student_id test_week
-------- ---------
s1 2018-12-01
s1 2018-12-08
Table B
======================
student_id last_updated remarks
-------- ------------ --------
s1 2018-12-06 Fail
s1 2018-12-10 Pass
From the two tables above, I want to fetch the following columns:
student_id, last(test_week) and remarks such that
last_updated >= test_week + 1 and last_updated <= test_week + 15,
i.e. last_updated should fall within about two weeks after last(test_week), so the following will be the result for the above entries:
s1 2018-12-08 Pass
I have written like following:
select a.student_id, test_week, remarks
from A inner join B
on A.student_id = B.student_id
and DATEDIFF(last_updated, test_week)>=1
and DATEDIFF(last_updated, test_week)<=15;
But I can't work out how to handle the last(test_week) part.
If I understood correctly and you need only the record for the last test_week, you can do the following. The two-argument DATEDIFF in the question appears to be MySQL syntax, so this uses LIMIT 1 rather than TOP 1:
select a.student_id, test_week, remarks
from A inner join B
on A.student_id = B.student_id
and DATEDIFF(last_updated, test_week)>=1
and DATEDIFF(last_updated, test_week)<=15
order by test_week desc
limit 1;
You can try the window function row_number(). The following query returns the row with the latest test_week for every student_id.
select * from (
select student_id, test_week, remarks, row_number()
over (partition by student_id order by test_week desc) as rn
from (
select a.student_id, test_week, remarks from A join B on A.student_id = B.student_id and last_updated - test_week >=1 and last_updated - test_week <=15)tb1
)tb2 where rn=1;
Note: the above query works in PostgreSQL; you may need to convert it into the equivalent MySQL query.
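As a quick check of the row_number() approach against the question's sample rows, here is a sketch using Python's sqlite3 module (SQLite 3.25 or later for window functions). SQLite has neither date subtraction nor DATEDIFF, so the day difference is computed with julianday(); that substitution is specific to this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (student_id TEXT, test_week TEXT);
CREATE TABLE B (student_id TEXT, last_updated TEXT, remarks TEXT);
INSERT INTO A VALUES ('s1','2018-12-01'),('s1','2018-12-08');
INSERT INTO B VALUES ('s1','2018-12-06','Fail'),('s1','2018-12-10','Pass');
""")

# Inner query: keep pairs where last_updated is 1 to 15 days after test_week.
# Outer query: rank the pairs per student by test_week and keep the latest.
last_week = conn.execute("""
SELECT student_id, test_week, remarks
FROM (
    SELECT a.student_id, a.test_week, b.remarks,
           ROW_NUMBER() OVER (PARTITION BY a.student_id
                              ORDER BY a.test_week DESC) AS rn
    FROM A a
    JOIN B b
      ON a.student_id = b.student_id
     AND julianday(b.last_updated) - julianday(a.test_week) BETWEEN 1 AND 15
) ranked
WHERE rn = 1
""").fetchall()

print(last_week)
```

Three of the four A/B pairs pass the date filter here, and only the one for 2018-12-08 survives the rn = 1 filter.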

How to count all subquery results that returns only the most recent item?

In this post (Adding a Query to a Subquery then produces no results) @D-Shih provided a great solution, which I would like to extend.
How do I add to the results returned, the count of reports by that teacher, even if the subquery is only finding the last one?
I'm trying to solve the <???> AS CountOfReports, line below, but my SQL skills are not that great.
SELECT
t.NAME,
t1.REPORTINGTYPE,
<???> AS CountOfReports, <<<< ****
t1.REPORTINGPERIOD
FROM
teachers AS t
INNER JOIN
(SELECT
*,
(SELECT COUNT(*) FROM REPORTS tt
WHERE tt.TEACHER_ID = t1.TEACHER_ID
AND tt.REPORTINGPERIOD >= t1.REPORTINGPERIOD) rn
FROM
REPORTS t1) AS t1 ON t1.TEACHER_ID = t.id AND rn = 1
ORDER BY
t.NAME
You can compute the count with a correlated subquery:
SELECT t.Name,
r.ReportingType,
max(r.ReportingPeriod),
(SELECT count(*)
FROM Reports r2
WHERE r2.Teacher_ID = r.Teacher_ID
) AS Reports
FROM Teachers t
JOIN Reports r ON t.ID = r.Teacher_ID
GROUP BY r.Teacher_ID;
NAME REPORTINGTYPE max(r.ReportingPeriod) Reports
-------------- ------------- ---------------------- ----------
Mr John Smith Final 2017-03 3
Ms Janet Smith Draft 2018-07 2
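The same idea can be written without relying on how a particular database treats non-aggregated columns under GROUP BY. The sketch below is one possible variant, not the answer's exact query: it picks each teacher's latest report with a correlated MAX and counts all of that teacher's reports with a second correlated subquery. The rows are hypothetical, chosen to reproduce the output above, and it runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teachers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Reports (Teacher_ID INTEGER, ReportingType TEXT, ReportingPeriod TEXT);
-- Hypothetical rows: three reports for one teacher, two for the other.
INSERT INTO Teachers VALUES (1,'Mr John Smith'),(2,'Ms Janet Smith');
INSERT INTO Reports VALUES
 (1,'Draft','2016-10'),(1,'Interim','2017-01'),(1,'Final','2017-03'),
 (2,'Interim','2018-04'),(2,'Draft','2018-07');
""")

# r is restricted to each teacher's latest report; the scalar subquery
# counts every report for that teacher, not just the latest one.
summary = conn.execute("""
SELECT t.Name,
       r.ReportingType,
       r.ReportingPeriod,
       (SELECT COUNT(*) FROM Reports r2
        WHERE r2.Teacher_ID = r.Teacher_ID) AS CountOfReports
FROM Teachers t
JOIN Reports r ON r.Teacher_ID = t.ID
WHERE r.ReportingPeriod = (SELECT MAX(r3.ReportingPeriod) FROM Reports r3
                           WHERE r3.Teacher_ID = r.Teacher_ID)
ORDER BY t.Name
""").fetchall()

for row in summary:
    print(row)
```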

Grouping the data and showing 1 row per group in postgres

I have two tables which look like this :-
Component Table
Revision Table
I want to get the name,model_id,rev_id from this table such that the result set has the data like shown below :-
name model_id rev_id created_at
ABC 1234 2 23456
ABC 5678 2 10001
XYZ 4567
Here the data is grouped by name,model_id and only 1 data for each group is shown which has the highest value of created_at.
I am using the below query but it is giving me incorrect result.
SELECT cm.name,cm.model_id,r.created_at from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by cm.name,cm.model_id,r.created_at
ORDER BY cm.name asc,
r.created_at DESC;
Result :-
Anyone's help will be highly appreciated.
Use MAX and a subquery, then join back to revision on both model_id and created_at to pick up rev_id:
select T1.name,T1.model_id,r.rev_id,T1.created_at from
(
select cm.name,
cm.model_id,
MAX(r.created_at) As created_at from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by cm.name,cm.model_id
) T1
left join dummy.revision r
on T1.model_id = r.model_id
and T1.created_at = r.created_at
http://www.sqlfiddle.com/#!17/68cb5/4
name model_id rev_id created_at
ABC 1234 2 23456
ABC 5678 2 10001
xyz 4567
In your SELECT you're missing rev_id
Try this:
SELECT
cm.name,
cm.model_id,
MAX(r.rev_id) AS rev_id,
MAX(r.created_at) As created_at
from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by 1,2
ORDER BY cm.name asc,
created_at DESC;
What you were missing was a way to say that you only want the latest record from the joined table. The join brings in every matching row from revision; grouping by the two component columns and taking the MAX of rev_id and created_at collapses them to one row per component. Note that the two MAX values are computed independently, so they come from the same revision row only if rev_id and created_at increase together.
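I would use distinct on, ordering by created_at descending so that the first row kept for each component is its latest revision:
select distinct on (cm.name, cm.model_id) cm.name, cm.model_id, r.rev_id, r.created_at
from dummy.component cm left join
dummy.revision r
on cm.model_id = r.model_id
order by cm.name, cm.model_id, r.created_at desc nulls last;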

Allow nulls / de-duplicate within multi-table join? T-SQL

I was wondering if there is a way in either SSIS or T-SQL (SQL Server 2012) to easily return non-duplicate data when doing a multi-table join (per-column, not per row)
I am trying to denormalize / flatten a bunch of data for conversion into a warehouse and I am winding up duplicating a ton of data. I'm hoping there is a sort of rollup/summary function or a design concept I am missing that can help me when merging multiple tables to a single destination.
Example
Let's say for example I have three tables: CUSTOMERS, CUSTOMER_ADDRESSES and CUSTOMER_ACCOUNTS. They and their data look like this:
CUSTOMERS
CUST_ID NAME
1 Burton Guster
CUSTOMER_ADDRESSES
CUST_ID ADDR_SEQ ADDRESS
1 1 123 Awesome St
1 2 456 Fake St
CUSTOMER_ACCOUNTS
CUST_ID ACCT_SEQ ACCT_TYPE ACCOUNT_OPEN_DT
1 1 TAP 1/1/1989
1 2 PHARMA 1/1/2010
I join them using a query like this:
SELECT a.CUST_ID, a.NAME, b.ADDRESS, c.ACCT_TYPE, c.ACCOUNT_OPEN_DT
FROM CUSTOMERS a
JOIN CUSTOMER_ADDRESSES b on a.CUST_ID = b.CUST_ID
JOIN CUSTOMER_ACCOUNTS c on a.CUST_ID = c.CUST_ID
Obviously each row joins to each row and as expected my output looks like this:
ID NAME ADDRESS ACCT_TYPE ACCT_OPEN_DT
1 Burton Guster 123 Awesome St TAP 1/1/1989
1 Burton Guster 123 Awesome St PHARMA 1/1/2010
1 Burton Guster 456 Fake St TAP 1/1/1989
1 Burton Guster 456 Fake St PHARMA 1/1/2010
Is there any way for me to get something like this instead?:
ID NAME ADDRESS ACCT_TYPE ACCT_OPEN_DT
1 Burton Guster 123 Awesome St TAP 1/1/1989
1 NULL 456 Fake St PHARMA 1/1/2010
The goal being to group each column, returning the distinct value per column only once. The larger set would be grouped by the customer ID.
Thank you
Sure, it can be done, although it's kinda awkward to do... :-)
You can use ROW_NUMBER() to get a running row number per customer from each table independently. Then you can use these row numbers to bring the data together:
;WITH custCTE AS (
SELECT CUST_ID, NAME, 1 AS CUST_ROW_N
FROM CUSTOMERS
),
addrCTE AS (
SELECT CUST_ID, ADDRESS, ROW_NUMBER() OVER(PARTITION BY CUST_ID ORDER BY ADDR_SEQ) CUST_ROW_N
FROM CUSTOMER_ADDRESSES
),
acctCTE AS (
SELECT CUST_ID, ACCT_TYPE, ACCOUNT_OPEN_DT, ROW_NUMBER() OVER(PARTITION BY CUST_ID ORDER BY ACCT_SEQ) CUST_ROW_N
FROM CUSTOMER_ACCOUNTS
)
SELECT COALESCE(a.CUST_ID, b.CUST_ID, c.CUST_ID), a.NAME, b.ADDRESS, c.ACCT_TYPE, c.ACCOUNT_OPEN_DT
FROM custCTE a FULL JOIN addrCTE b ON
a.CUST_ID = b.CUST_ID AND a.CUST_ROW_N = b.CUST_ROW_N FULL JOIN acctCTE c ON
(b.CUST_ID = c.CUST_ID AND b.CUST_ROW_N = c.CUST_ROW_N) OR (a.CUST_ID = c.CUST_ID AND a.CUST_ROW_N = c.CUST_ROW_N)
Here's an SQLFiddle
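The row-number alignment can also be pictured outside SQL: number each customer's addresses and accounts independently, then pair them up position by position and pad the shorter list with NULLs. The following is only an illustration of that idea in plain Python using the question's sample data. It iterates over CUSTOMERS only, so unlike the FULL JOIN it would not surface addresses or accounts whose customer row is missing.

```python
from itertools import zip_longest

# Per-customer lists, already ordered by their sequence columns.
customers = {1: ["Burton Guster"]}                       # one name per customer
addresses = {1: ["123 Awesome St", "456 Fake St"]}       # ordered by ADDR_SEQ
accounts = {1: [("TAP", "1/1/1989"), ("PHARMA", "1/1/2010")]}  # ordered by ACCT_SEQ

flattened = []
for cust_id in sorted(customers):
    # zip_longest pairs the n-th name, address and account, filling gaps
    # with None, which plays the role of matching on CUST_ROW_N.
    for name, address, account in zip_longest(
        customers[cust_id],
        addresses.get(cust_id, []),
        accounts.get(cust_id, []),
    ):
        acct_type, open_dt = account if account is not None else (None, None)
        flattened.append((cust_id, name, address, acct_type, open_dt))

for row in flattened:
    print(row)
```

The second output row has None for the name, matching the shape of the result the question asks for.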