Subselect in the ON clause - sql

Please don't bash me if there are already answers to this question, but I found none.
Basically, I want to use a subselect in the ON clause of a LEFT JOIN to get the newest entry within a time frame.
(starttime and endtime are timestamps, either hardcoded or held in local variables or host variables in a COBOL program; to simplify, I used integers in this question.)
Select * from table1 as t1
left join table2 as t2 on
t1.primary = t2.secondary
and t2.timestamp = (
select max(t2a.timestamp) from table2 as t2a
where t2a.secondary = t2.secondary
and t2a.timestamp > starttime
and t2a.timestamp < endtime
)
This does not work; I get the following error:
AN ON CLAUSE IS INVALID. SQLCODE=-338
because, according to the documentation:
The ON clause cannot contain a subquery.
One workaround is to join an already-filtered subquery instead of joining table2 directly. But that hampers the query optimizer, which literally kills the performance:
Select * from table1 as t1
left join (
select t2a.primary, t2a.secondary, t2a.timestamp from table2 as t2a
where t2a.timestamp = (
select max(t2b.timestamp)
from table2 as t2b
where t2b.secondary = t2a.secondary
and t2b.timestamp > starttime
and t2b.timestamp < endtime
)
) as t2
on t1.primary = t2.secondary
Any idea how to solve this?
Example data table1:
t1.primary
1
2
3
Example data table2:
t2.primary t2.secondary t2.timestamp
1 1 4
2 1 5
3 1 10
4 2 4
5 2 5
Variables:
starttime = 3
endtime = 6
Expected result:
t1.primary t2.primary t2.secondary t2.timestamp
1 2 1 5 -- left-joined the newest entry in range
2 5 2 5
3 NULL NULL NULL

This should work
select *
from table1 t1
left join (
select t2.primary, t2.secondary, t2.timestamp,
row_number() over (partition by t2.secondary order by t2.timestamp desc) rn
from table2 t2
where t2.timestamp > starttime and t2.timestamp < endtime
) t on t1.primary = t.secondary and t.rn = 1
If you have an index on table2(timestamp, secondary, primary), or at least on table2(timestamp, secondary), it should run really fast. Without the indexes it still performs quite well, since it needs only one sequential scan of the tables.
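To check the ROW_NUMBER() answer against the sample data, here is a minimal sketch that runs the query through Python's built-in sqlite3 module instead of DB2. It assumes SQLite 3.25 or later, which is needed for window functions. The columns primary and timestamp are double-quoted to avoid clashing with SQL keywords, and the range uses the question's strict bounds (> starttime, < endtime).

```python
import sqlite3

def sample_db():
    """Build the question's table1/table2 sample data in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 ("primary" INTEGER);
        CREATE TABLE table2 ("primary" INTEGER, secondary INTEGER, "timestamp" INTEGER);
        INSERT INTO table1 VALUES (1), (2), (3);
        INSERT INTO table2 VALUES (1,1,4), (2,1,5), (3,1,10), (4,2,4), (5,2,5);
    """)
    return conn

def newest_in_range(conn, start, end):
    """Left-join each table1 row to the newest table2 row (per secondary) inside the range."""
    sql = """
        SELECT t1."primary", t."primary", t.secondary, t."timestamp"
        FROM table1 t1
        LEFT JOIN (
            SELECT t2."primary", t2.secondary, t2."timestamp",
                   -- rn = 1 marks the newest in-range row for each secondary
                   ROW_NUMBER() OVER (PARTITION BY t2.secondary
                                      ORDER BY t2."timestamp" DESC) AS rn
            FROM table2 t2
            WHERE t2."timestamp" > ? AND t2."timestamp" < ?
        ) t ON t1."primary" = t.secondary AND t.rn = 1
        ORDER BY t1."primary"
    """
    return conn.execute(sql, (start, end)).fetchall()

rows = newest_in_range(sample_db(), 3, 6)
print(rows)
```

Filtering on rn = 1 inside the ON clause, rather than in a WHERE clause, keeps table1 rows that have no match, such as row 3.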

Something like this. I typed it in just before lunch, so don't bash me if it doesn't work.
select * from table1 a left join
(select t2b.secondary, max(t2b.timestamp) mxts
from table2 t2b
where t2b.timestamp > mystartts and t2b.timestamp < myendts
group by t2b.secondary
) as b on a.primary = b.secondary
left join table2 on b.secondary = table2.secondary and
table2.timestamp = b.mxts
Note: don't assume that timestamps are unique and can be used to extract the last entry from a table, because that will undoubtedly fail.
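The warning about non-unique timestamps is easy to reproduce. The sketch below uses Python's sqlite3 module and a join-on-MAX query that computes the maximum per secondary inside the range. On the sample data it returns one row per table1 row. After adding a hypothetical second entry for secondary 1 with the same timestamp 5, table1 row 1 comes back twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 ("primary" INTEGER);
    CREATE TABLE table2 ("primary" INTEGER, secondary INTEGER, "timestamp" INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1,1,4), (2,1,5), (3,1,10), (4,2,4), (5,2,5);
""")

# Join back to table2 on the in-range MAX(timestamp) for each secondary.
MAX_JOIN = """
    SELECT a."primary", t2."primary", t2.secondary, t2."timestamp"
    FROM table1 a
    LEFT JOIN (
        SELECT secondary, MAX("timestamp") AS mxts
        FROM table2
        WHERE "timestamp" > ? AND "timestamp" < ?
        GROUP BY secondary
    ) b ON a."primary" = b.secondary
    LEFT JOIN table2 t2 ON t2.secondary = b.secondary AND t2."timestamp" = b.mxts
    ORDER BY a."primary", t2."primary"
"""

before = conn.execute(MAX_JOIN, (3, 6)).fetchall()

# Hypothetical tie: another row for secondary 1 that also has timestamp 5.
conn.execute("INSERT INTO table2 VALUES (6, 1, 5)")
after = conn.execute(MAX_JOIN, (3, 6)).fetchall()
print(before)
print(after)
```

The ROW_NUMBER() version still returns exactly one row per table1 row when timestamps tie, although which of the tied rows it picks is arbitrary unless you add a tiebreaker to the ORDER BY.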

Related

counting totals after left join and requiring 0 for a NULL variable - SQL Server

I am using SQL Server Management Studio 2012 and I am running the following query:
SELECT T1.ID, COUNT(DISTINCT T2.APPOINTMENT_DATE) AS [TOTAL_APPOINTMENTS]
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
WHERE T2.APPOINTMENT_DATE > '2019-01-01' AND T2.APPOINTMENT_DATE < '2020-01-01'
AND (T1.ID = 1 OR T1.ID = 2 OR T1.ID = 3)
GROUP BY T1.ID
I would like the total number of appointments for these 3 individuals for now; later I will include everyone in Table 1. Table 1 gives me the ID (one row per individual), and Table 2 gives me all appointments across different days for each individual.
The results I get are:
ID TOTAL_APPOINTMENTS
1 12
2 3
But I would like:
ID TOTAL_APPOINTMENTS
1 12
2 3
3 0
Can you please advise?
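Conditions on T2 in the WHERE clause discard the NULL-extended rows that the LEFT JOIN produces for IDs with no matching appointment, which effectively turns it into an inner join. Move the conditions on the second table to the ON clause: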
SELECT T1.ID, COUNT(DISTINCT T2.APPOINTMENT_DATE) AS [TOTAL_APPOINTMENTS]
FROM T1 LEFT JOIN
T2
ON T1.ID = T2.ID AND
T2.APPOINTMENT_DATE > '2019-01-01' AND
T2.APPOINTMENT_DATE < '2020-01-01'
WHERE T1.ID IN (1, 2, 3)
GROUP BY T1.ID;
Note that the conditions on the first table remain in the WHERE clause. Also, IN is simpler than a bunch of OR conditions.
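To see the difference between filtering in WHERE and filtering in ON, here is a small sketch using Python's sqlite3 module. The appointment rows are hypothetical: ID 3 has an appointment only in 2018. With the date filter in WHERE, ID 3 disappears. With the filter in ON, ID 3 is kept with a count of 0, because COUNT(DISTINCT ...) ignores the NULL that the LEFT JOIN supplies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER);
    CREATE TABLE T2 (ID INTEGER, APPOINTMENT_DATE TEXT);
    INSERT INTO T1 VALUES (1), (2), (3);
    -- Hypothetical data: ID 3 has no appointment in 2019
    INSERT INTO T2 VALUES (1, '2019-03-01'), (1, '2019-04-01'),
                          (2, '2019-05-01'), (3, '2018-06-01');
""")

# Date filter in WHERE: rows for ID 3 are removed after the join.
in_where = conn.execute("""
    SELECT T1.ID, COUNT(DISTINCT T2.APPOINTMENT_DATE)
    FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
    WHERE T2.APPOINTMENT_DATE > '2019-01-01' AND T2.APPOINTMENT_DATE < '2020-01-01'
      AND T1.ID IN (1, 2, 3)
    GROUP BY T1.ID ORDER BY T1.ID
""").fetchall()

# Date filter in ON: ID 3 survives as a NULL-extended row and counts as 0.
in_on = conn.execute("""
    SELECT T1.ID, COUNT(DISTINCT T2.APPOINTMENT_DATE)
    FROM T1 LEFT JOIN T2
      ON T1.ID = T2.ID
     AND T2.APPOINTMENT_DATE > '2019-01-01' AND T2.APPOINTMENT_DATE < '2020-01-01'
    WHERE T1.ID IN (1, 2, 3)
    GROUP BY T1.ID ORDER BY T1.ID
""").fetchall()
print(in_where, in_on)
```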

Querying two tables to filter data using select case

I have two tables
Table 1 looks like this
ID Repeats
-----------
A 1
A 1
A 0
B 2
B 2
C 2
D 1
Table 2 looks like this
ID values
-----------
A 100
B 200
C 100
D 300
Using a view I need a result like this
ID values Repeats
-------------------
A 100 NA
B 200 2
C 100 2
D 300 1
That is, I want each unique ID with its value and Repeats. Repeats should display NA when an ID has more than one distinct Repeats value, and should display the Repeats value when there is only one.
Initially I needed to display the max value of Repeats, so I tried the following view:
ALTER VIEW [dbo].[BookingView1]
AS
SELECT bv.*, bd2.Repeats FROM Table2 bv
JOIN
(
SELECT distinct bd.id, bd.Repeats FROM Table1 bd
JOIN
(
SELECT Id, MAX(Repeats) AS MaxRepeatCount
FROM Table1
GROUP BY Id
) bd1
ON bd.Id = bd1.Id
AND bd.Repeats = bd1.MaxRepeatCount
) bd2
ON bv.Id = bd2.Id;
This returns the correct result, but when I try to add the CASE, it no longer returns one row per ID. Please help!
One method uses outer apply:
select t2.*, t1.repeats
from table2 t2 outer apply
(select (case when max(repeats) = min(repeats) then max(repeats)
else 'NA'
end) as repeats
from table1 t1
where t1.id = t2.id
) t1;
Two notes:
This assumes that repeats is a string. If it is a number, you need to cast it to a string.
It also assumes that repeats is not NULL.
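Here is a minimal sketch of the MAX = MIN idea using Python's sqlite3 module. SQLite has no OUTER APPLY, so the same CASE is placed in a correlated scalar subquery. Repeats is numeric here, so it is cast to text as the first note says. The second run adds a hypothetical ID 'E' with one NULL Repeats. MAX and MIN both ignore NULLs, so the CASE reports '1' instead of 'NA', which is why the not-NULL assumption matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID TEXT, Repeats INTEGER);
    CREATE TABLE table2 (ID TEXT, "values" INTEGER);
    INSERT INTO table1 VALUES ('A',1), ('A',1), ('A',0), ('B',2), ('B',2), ('C',2), ('D',1);
    INSERT INTO table2 VALUES ('A',100), ('B',200), ('C',100), ('D',300);
""")

# Correlated scalar subquery standing in for OUTER APPLY.
QUERY = """
    SELECT t2.ID, t2."values",
           (SELECT CASE WHEN MAX(t1.Repeats) = MIN(t1.Repeats)
                        THEN CAST(MAX(t1.Repeats) AS TEXT)
                        ELSE 'NA' END
            FROM table1 t1
            WHERE t1.ID = t2.ID) AS repeats
    FROM table2 t2
    ORDER BY t2.ID
"""
rows = conn.execute(QUERY).fetchall()

# Hypothetical NULL case: the NULL is ignored by MAX and MIN.
conn.executescript("""
    INSERT INTO table1 VALUES ('E', 1), ('E', NULL);
    INSERT INTO table2 VALUES ('E', 400);
""")
rows_with_null = conn.execute(QUERY).fetchall()
print(rows)
print(rows_with_null)
```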
For the sake of completeness, I'm including another approach that also works when repeats can be NULL. However, Gordon's answer has a much simpler query plan and should be preferred.
Option 1 (Works with NULLs):
SELECT
t1.ID, t2.[Values],
CASE
WHEN COUNT(*) > 1 THEN 'NA'
ELSE CAST(MAX(Repeats) AS VARCHAR(2))
END Repeats
FROM (
SELECT DISTINCT t1.ID, t1.Repeats
FROM #table1 t1
) t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values]
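Option 1 can be checked the same way. The sketch below uses Python's sqlite3 module, plain tables instead of the #temp tables, and the same hypothetical ID 'E' with Repeats values 1 and NULL. Because SELECT DISTINCT keeps the NULL as its own row, COUNT(*) sees two distinct values for 'E' and returns 'NA'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID TEXT, Repeats INTEGER);
    CREATE TABLE table2 (ID TEXT, "values" INTEGER);
    INSERT INTO table1 VALUES ('A',1), ('A',1), ('A',0), ('B',2), ('B',2), ('C',2), ('D',1),
                              ('E',1), ('E',NULL);  -- 'E' is a hypothetical NULL case
    INSERT INTO table2 VALUES ('A',100), ('B',200), ('C',100), ('D',300), ('E',400);
""")

rows = conn.execute("""
    SELECT t1.ID, t2."values",
           -- more than one distinct (ID, Repeats) pair means NA
           CASE WHEN COUNT(*) > 1 THEN 'NA'
                ELSE CAST(MAX(t1.Repeats) AS TEXT) END AS Repeats
    FROM (SELECT DISTINCT ID, Repeats FROM table1) t1
    LEFT OUTER JOIN table2 t2 ON t1.ID = t2.ID
    GROUP BY t1.ID, t2."values"
    ORDER BY t1.ID
""").fetchall()
print(rows)
```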
Option 2 (does not contain explicit subqueries, but does not work with NULLs):
SELECT DISTINCT
t1.ID,
t2.[Values],
CASE
WHEN COUNT(t1.Repeats) OVER (PARTITION BY COUNT(DISTINCT t1.Repeats), t1.ID) > 1 THEN 'NA'
ELSE CAST(t1.Repeats AS VARCHAR(2))
END Repeats
FROM #table1 t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values], t1.Repeats
NOTE:
This may not give desired results if table2 has different values for the same ID.

SQL Server Return Rows Where Field Changed

I have a table with 3 columns.
ID AuditDateTime UpdateType
12 12-15-2015 18:09 1
45 12-04-2015 17:41 0
75 12-21-2015 04:26 0
12 12-17-2015 07:43 0
35 12-01-2015 05:36 1
45 12-15-2015 04:35 0
I'm trying to return only the records where an ID's UpdateType has changed compared with that ID's previous entry, ordered by AuditDateTime. In this example, ID 12 changes between the 12-15 entry and the 12-17 entry, so I want the 12-17 record returned. There will be multiple entries for ID 12, and I need every record where an ID's UpdateType differs from its previous entry. I tried adding a row number, but it was not assigned chronologically because the records are not stored in the table in order. I've done a ton of searching with no luck. Any help would be greatly appreciated.
By using a CTE, it is possible to find the previous record based on the order of AuditDateTime:
WITH CTEData AS
(SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AuditDateTime) [ROWNUM], *
FROM #tmpTable)
SELECT A.ID, A.AuditDateTime, A.UpdateType
FROM CTEData A INNER JOIN CTEData B
ON (A.ROWNUM - 1) = B.ROWNUM AND
A.ID = B.ID
WHERE A.UpdateType <> B.UpdateType
The inner join back onto the CTE gives you, in one query, both the current record (table alias A) and the previous record (table alias B).
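Here is a minimal sketch of the CTE approach using Python's sqlite3 module, which needs SQLite 3.25 or later for ROW_NUMBER(). It assumes the dates are stored as ISO 8601 text ('2015-12-15 18:09'). With that format, text order matches chronological order. The question's MM-DD-YYYY strings would not sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmpTable (ID INTEGER, AuditDateTime TEXT, UpdateType INTEGER);
    INSERT INTO tmpTable VALUES
        (12, '2015-12-15 18:09', 1), (45, '2015-12-04 17:41', 0),
        (75, '2015-12-21 04:26', 0), (12, '2015-12-17 07:43', 0),
        (35, '2015-12-01 05:36', 1), (45, '2015-12-15 04:35', 0);
""")

changes = conn.execute("""
    WITH CTEData AS (
        -- number each ID's rows chronologically
        SELECT ID, AuditDateTime, UpdateType,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AuditDateTime) AS ROWNUM
        FROM tmpTable
    )
    SELECT A.ID, A.AuditDateTime, A.UpdateType
    FROM CTEData A
    JOIN CTEData B ON A.ROWNUM - 1 = B.ROWNUM AND A.ID = B.ID  -- B is A's predecessor
    WHERE A.UpdateType <> B.UpdateType
""").fetchall()
print(changes)
```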
I believe this should do what you're trying to do:
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
LEFT OUTER JOIN dbo.My_Table T3 ON
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
WHERE
T3.ID IS NULL
Alternatively:
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
WHERE
NOT EXISTS
(
SELECT *
FROM
dbo.My_Table T3
WHERE
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
)
The basic gist of both queries is that you're looking for rows where an earlier row had a different type and no other rows exist between the two rows (hence, they're sequential). Both queries are logically identical, but might have differing performance.
Also, these queries assume that no two rows will have identical audit times. If that's not the case then you'll need to define what you expect to get when that happens.
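Here is a minimal sketch of the NOT EXISTS variant using Python's sqlite3 module, with the same ISO 8601 date assumption so that text comparison matches chronological order. T2 is an earlier row of the same ID with a different type. The NOT EXISTS clause rules out any row T3 that falls between T2 and T1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE My_Table (ID INTEGER, AuditDateTime TEXT, UpdateType INTEGER);
    INSERT INTO My_Table VALUES
        (12, '2015-12-15 18:09', 1), (45, '2015-12-04 17:41', 0),
        (75, '2015-12-21 04:26', 0), (12, '2015-12-17 07:43', 0),
        (35, '2015-12-01 05:36', 1), (45, '2015-12-15 04:35', 0);
""")

changes = conn.execute("""
    SELECT T1.ID, T1.AuditDateTime, T1.UpdateType
    FROM My_Table T1
    JOIN My_Table T2                       -- an earlier row with a different type
      ON T2.ID = T1.ID
     AND T2.UpdateType <> T1.UpdateType
     AND T2.AuditDateTime < T1.AuditDateTime
    WHERE NOT EXISTS (                     -- ...with nothing in between
        SELECT 1 FROM My_Table T3
        WHERE T3.ID = T1.ID
          AND T3.AuditDateTime < T1.AuditDateTime
          AND T3.AuditDateTime > T2.AuditDateTime
    )
""").fetchall()
print(changes)
```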
You can use the lag() window function to find the previous value for the same ID. Now you can pick only those rows that introduce a change:
select *
from (
select lag(UpdateType) over (
partition by ID
order by AuditDateTime) as prev_updatetype
, *
from YourTable
) sub
where prev_updatetype <> updatetype
Example at SQL Fiddle.
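The lag() version can be reproduced locally with Python's sqlite3 module. This needs SQLite 3.25 or later and again assumes ISO 8601 date strings. The first row of each ID has a NULL prev_updatetype. Because NULL <> x is not true, the WHERE clause drops that row automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (ID INTEGER, AuditDateTime TEXT, UpdateType INTEGER);
    INSERT INTO YourTable VALUES
        (12, '2015-12-15 18:09', 1), (45, '2015-12-04 17:41', 0),
        (75, '2015-12-21 04:26', 0), (12, '2015-12-17 07:43', 0),
        (35, '2015-12-01 05:36', 1), (45, '2015-12-15 04:35', 0);
""")

changes = conn.execute("""
    SELECT ID, AuditDateTime, UpdateType
    FROM (
        SELECT ID, AuditDateTime, UpdateType,
               -- previous UpdateType for the same ID, in time order
               LAG(UpdateType) OVER (PARTITION BY ID ORDER BY AuditDateTime) AS prev_updatetype
        FROM YourTable
    ) sub
    WHERE prev_updatetype <> UpdateType
""").fetchall()
print(changes)
```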

Self join for two same rows with one different column

Hi, I have a table with the following data:
A B bid status
10 20 1 SUCCESS_1
10 20 1 SUCCESS_2
10 30 2 SUCCESS_1
10 30 2 SUCCESS_2
Now I want to print or count the rows above based on SUCCESS_1 and SUCCESS_2. I wrote the following query, but it does not work: it just returns one row that combines the two rows.
select * from tbl t1 join tbl t2
on (t1.A=t2.A and t1.B=t2.B and
t1.Status = 'SUCCESS_1' and t2.Status = 'SUCCESS_2')
where t1.bid= 1
I want output as the following for the above query
A B bid status
10 20 1 SUCCESS_1
10 20 1 SUCCESS_2
I am new to SQL; please guide me. Thanks in advance.
If you need to do the join for some reason (e.g. your database does not let you select every column when you group by one column, because it requires every projected column to be either grouped or aggregated), you could do the following:
select t1.*
from tbl t1 join tbl t2
on (t1.A=t2.A and t1.B=t2.B and t1.Status = 'SUCCESS_1' and t2.Status = 'SUCCESS_2')
where t1.bid= 1
union all select t2.*
from tbl t1 join tbl t2
on (t1.A=t2.A and t1.B=t2.B and t1.Status = 'SUCCESS_1' and t2.Status = 'SUCCESS_2')
where t1.bid= 1
order by 1,2,3,4
Your original query returns all the data in one combined row; this one returns, as separate rows, the two rows that make up that joined row.
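Here is a quick sketch, using Python's sqlite3 module, that shows the difference on the question's data. The self-join produces one 8-column row. The UNION ALL of t1.* and t2.* splits that row back into the two original 4-column rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (A INTEGER, B INTEGER, bid INTEGER, status TEXT);
    INSERT INTO tbl VALUES
        (10, 20, 1, 'SUCCESS_1'), (10, 20, 1, 'SUCCESS_2'),
        (10, 30, 2, 'SUCCESS_1'), (10, 30, 2, 'SUCCESS_2');
""")

# Shared FROM/WHERE: pair each SUCCESS_1 row with its SUCCESS_2 partner.
JOIN = """
    FROM tbl t1 JOIN tbl t2
      ON t1.A = t2.A AND t1.B = t2.B
     AND t1.status = 'SUCCESS_1' AND t2.status = 'SUCCESS_2'
    WHERE t1.bid = 1
"""

combined = conn.execute("SELECT * " + JOIN).fetchall()
split = conn.execute(
    "SELECT t1.* " + JOIN + " UNION ALL SELECT t2.* " + JOIN + " ORDER BY 1, 2, 3, 4"
).fetchall()
print(combined)
print(split)
```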
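If your database lets you select non-aggregated columns with GROUP BY (as MySQL does when ONLY_FULL_GROUP_BY is disabled), this is simpler:
SELECT * FROM `tbl` WHERE `bid`=1 GROUP BY `status`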

How can I avoid a sub-query?

This is my table:
ID KEY VALUE
1 alpha 100
2 alpha 500
3 alpha 22
4 beta 60
5 beta 10
I'm trying to retrieve each KEY with its latest VALUE (the row where ID is at its maximum):
ID KEY VALUE
3 alpha 22
5 beta 10
In MySQL I'm using this query, which is not efficient:
SELECT temp.* FROM
(SELECT * FROM t ORDER BY id DESC) AS temp
GROUP BY `key`
Is it possible to avoid a sub-query in this case?
Use an INNER JOIN to join with your max IDs.
SELECT t.*
FROM t
INNER JOIN (
SELECT MAX(ID) AS ID
FROM t
GROUP BY
`key`
) tm ON tm.ID = t.ID
Assuming the ID column is indexed, this is likely as fast as it's going to get.
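Here is a minimal sketch of the join-to-MAX(ID) approach, run with Python's sqlite3 module on the question's data. KEY is double-quoted because it is an SQL keyword. Because ID is unique, joining on MAX(ID) per key returns exactly one row per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ID INTEGER PRIMARY KEY, "KEY" TEXT, VALUE INTEGER);
    INSERT INTO t VALUES (1,'alpha',100), (2,'alpha',500), (3,'alpha',22),
                         (4,'beta',60), (5,'beta',10);
""")

latest = conn.execute("""
    SELECT t.*
    FROM t
    JOIN (SELECT MAX(ID) AS ID FROM t GROUP BY "KEY") tm  -- newest ID per key
      ON tm.ID = t.ID
    ORDER BY t.ID
""").fetchall()
print(latest)
```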
Here is the MySQL documentation page that discusses this topic.
It presents three distinct options.
The only one that doesn't involve a subquery is:
SELECT t1.id, t1.`key`, t1.value
FROM t t1
LEFT JOIN t t2 ON t1.`key` = t2.`key` AND t1.id < t2.id
WHERE t2.`key` IS NULL;
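The LEFT JOIN anti-join works because, for the newest row of each key, no row t2 with the same key and a larger id exists, so t2 comes back NULL. Here is a small sketch of this with Python's sqlite3 module, using the question's column names (KEY double-quoted because it is an SQL keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ID INTEGER PRIMARY KEY, "KEY" TEXT, VALUE INTEGER);
    INSERT INTO t VALUES (1,'alpha',100), (2,'alpha',500), (3,'alpha',22),
                         (4,'beta',60), (5,'beta',10);
""")

latest = conn.execute("""
    SELECT t1.ID, t1."KEY", t1.VALUE
    FROM t t1
    LEFT JOIN t t2 ON t1."KEY" = t2."KEY" AND t1.ID < t2.ID  -- any newer row, same key
    WHERE t2.ID IS NULL                                      -- keep rows with none
    ORDER BY t1.ID
""").fetchall()
print(latest)
```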
There's a page in the manual explaining how to do this.