What is the alternative to OUTER APPLY? - sql

I recently added OUTER APPLY to one of my queries, and since then the query takes forever to run. One reason I know of is that the table it applies to is now the biggest table in the database.
select
a.*,
b.*,
BTab.*,
BTTab.*
from
tableA a
join tableB b ON a.ID = b.UID
join *****
left join *******
....
....
....
outer apply
(SELECT TOP 1 *
FROM
biggestTable bt
WHERE
bt.id = a.id
and a.id <> 100
ORDER BY bt.datetime desc) BTab -- sort on the inner table; sorting on a.datetime makes TOP 1 arbitrary
Outer apply
(SELECT TOP 1 *
FROM
biggestTable btt
WHERE
btt.id = a.id
AND btt.DateTime <> '1948-01-01 00:00:00.000'
and btt.value = 0
order by btt.datetime desc) BTTab
where
..................
.................
....................
.................
Is there any better solution than using outer apply?

Here's an alternative, though I can't say whether it's better or not. You may simply need better indexes on your big table.
WITH BTab AS
(
    SELECT bt.*,
           ROW_NUMBER() OVER (PARTITION BY bt.id ORDER BY bt.datetime DESC) AS rn
    FROM biggestTable bt
    WHERE bt.id <> 100   -- same filter as "a.id <> 100" in the original OUTER APPLY
),
BTTab AS
(
    SELECT btt.*,
           ROW_NUMBER() OVER (PARTITION BY btt.id ORDER BY btt.datetime DESC) AS rn
    FROM biggestTable btt
    WHERE btt.DateTime <> '1948-01-01 00:00:00.000'
      AND btt.value = 0
)
select
a.*,
b.*,
BTab.*,
BTTab.*
from
tableA a
join tableB b ON a.ID = b.UID
join *****
left join BTab ON a.ID = BTab.ID
and BTab.rn = 1    -- keep only the latest row per ID
left join BTTab ON a.ID = BTTab.ID
and BTTab.rn = 1
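To see the ROW_NUMBER() alternative run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. It assumes SQLite 3.25 or later, which is needed for window functions. SQLite has no OUTER APPLY or TOP, so the window-function form is the natural fit there. The sample rows are made up. The composite index on (id, datetime) only illustrates the "better indexes" suggestion and is not a tuned recommendation.

```python
import sqlite3

def latest_per_id(conn):
    """Return the latest biggestTable row per id, LEFT JOINed to tableA on rn = 1."""
    query = """
    WITH BTab AS (
        SELECT bt.*,
               ROW_NUMBER() OVER (PARTITION BY bt.id ORDER BY bt.datetime DESC) AS rn
        FROM biggestTable bt
    )
    SELECT a.id, BTab.value, BTab.datetime
    FROM tableA a
    LEFT JOIN BTab ON a.id = BTab.id AND BTab.rn = 1
    ORDER BY a.id
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY);
CREATE TABLE biggestTable (id INTEGER, datetime TEXT, value INTEGER);
-- Illustrative index matching PARTITION BY id ORDER BY datetime
CREATE INDEX ix_big_id_dt ON biggestTable (id, datetime);
INSERT INTO tableA (id) VALUES (1), (2), (3);
INSERT INTO biggestTable VALUES
  (1, '2020-01-01 00:00:00', 10),
  (1, '2021-06-01 00:00:00', 11),
  (2, '2019-03-01 00:00:00', 20);
""")
rows = latest_per_id(conn)
print(rows)
```

Id 3 has no rows in biggestTable. The LEFT JOIN still keeps it, with NULLs, just as OUTER APPLY would.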

+1 for Conrad as his answer might be all you need and I reused some of his syntax.
The problem with APPLY and the CTE is that they can be evaluated for each row in the a-b join.
I would create two temporary tables to hold the latest rows, each with a primary key. The benefit is that these two expensive queries run only once, and the main query then joins to a primary key, which is a big win. I often accept the overhead of a #temp table to get a single evaluation and a PK join.
CREATE TABLE #Btab (ID int PRIMARY KEY, ...);   -- remaining columns of biggestTable, plus rn (SELECT * includes it)

WITH BTab AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateTime DESC) AS rn
    FROM biggestTable
    WHERE ID <> 100
)
INSERT INTO #Btab
SELECT * FROM BTab
WHERE rn = 1;

CREATE TABLE #Bttab (ID int PRIMARY KEY, ...);  -- same column layout as #Btab

WITH BTTab AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateTime DESC) AS rn
    FROM biggestTable
    WHERE DateTime <> '1948-01-01 00:00:00.000' AND value = 0
)
INSERT INTO #Bttab
SELECT * FROM BTTab   -- select from BTTab, not BTab
WHERE rn = 1;
select
a.*,
b.*,
#Btab.*,
#Bttab.*
from
tableA a
join tableB b ON a.ID = b.UID
join *****
left join *******
....
....
....
left outer join #Btab
on #Btab.ID = a.ID
left outer join #Bttab
on #Bttab.ID = a.ID
where
..................
.................
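The temp-table approach can be sketched the same way, again with Python's sqlite3 standing in for SQL Server (SQLite 3.25+ assumed). SQLite's CREATE TEMP TABLE plays the role of #Btab and #Bttab here. The sample rows and the shortened '1948-01-01' date literal are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY);
CREATE TABLE biggestTable (id INTEGER, datetime TEXT, value INTEGER);
INSERT INTO tableA (id) VALUES (1), (2), (100);
INSERT INTO biggestTable VALUES
  (1, '2020-01-01', 0),
  (1, '2021-01-01', 5),
  (2, '1948-01-01', 0),
  (2, '2018-01-01', 0),
  (100, '2022-01-01', 0);

-- Evaluate each expensive "latest row" query once into a keyed temp table.
CREATE TEMP TABLE Btab (id INTEGER PRIMARY KEY, datetime TEXT, value INTEGER);
INSERT INTO Btab (id, datetime, value)
SELECT id, datetime, value FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime DESC) AS rn
    FROM biggestTable
    WHERE id <> 100
) WHERE rn = 1;

CREATE TEMP TABLE Bttab (id INTEGER PRIMARY KEY, datetime TEXT, value INTEGER);
INSERT INTO Bttab (id, datetime, value)
SELECT id, datetime, value FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime DESC) AS rn
    FROM biggestTable
    WHERE datetime <> '1948-01-01' AND value = 0
) WHERE rn = 1;
""")

# The main query now joins to primary keys instead of re-running the subqueries.
rows = conn.execute("""
    SELECT a.id, Btab.datetime, Bttab.datetime
    FROM tableA a
    LEFT JOIN Btab ON Btab.id = a.id
    LEFT JOIN Bttab ON Bttab.id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)
```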
P.S. I am exploring a TVP instead of a #temp table for this. A TVP supports a PK and has less overhead than a #temp table, but I have not compared them head to head in this type of application.
I tested a TVP against the #temp table and got a half-second improvement (about the time it takes to create and drop a temporary table).

Related

How to join a large subset of data with a smaller subset of data

I have three tables in SQL Server
TABLE_A - contains 500 rows
TABLE_B - contains 1 million rows
TABLE_C - contains 1 million rows
I want to select rows from TABLE_B and TABLE_C and join them to TABLE_A, based on a row-number position within TABLE_B and TABLE_C.
Below is my sample query:
SELECT TOP (50) TABLE_A.*, TABLE_INC.rowNum   -- SELECT * INTO would fail on the duplicate memberID column
INTO ##tempResult
FROM TABLE_A
LEFT JOIN
(SELECT *
FROM
(SELECT
TABLE_B.memberID,
ROW_NUMBER() OVER (PARTITION BY TABLE_B.memberID ORDER BY EM.UTupdateDate DESC) AS rowNum   -- alias EM is not defined in the original query
FROM
TABLE_B
JOIN
TABLE_C ON TABLE_B.memberID = TABLE_C.memberID
) AS TABLE_subset
WHERE
TABLE_subset.rowNum <= 2
) AS TABLE_INC ON TABLE_A.memberID = TABLE_INC.memberID
WHERE TABLE_A.colA = 'XYZ'
Here TABLE_subset joins all the records in TABLE_B and TABLE_C, but I want to join only the top 50 records to TABLE_A.
Is there any way to achieve this?
Your question and query don't match exactly, but CROSS APPLY is probably your friend here.
The general idea is:
select TOP 50 *
from tableA a
CROSS APPLY (
SELECT TOP 2 b.id, c.otherid
from tableB b
inner join tableC c
ON c.id = b.id
where b.id = a.id -- Here you match field between A and B
order by b.date DESC -- order by something
) data
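Now you just need to adapt it to your needs. Note that CROSS APPLY drops rows of tableA that have no match; use OUTER APPLY if you need to keep them, as the LEFT JOIN in the question does.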
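SQLite has no CROSS APPLY, but the same "top 2 per outer row" idea can be sketched with ROW_NUMBER() <= 2, restricting the B/C join to the selected rows of tableA first. The sketch assumes SQLite 3.25+ through Python's sqlite3. The colA filter and LIMIT 50 mirror the question, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY, colA TEXT);
CREATE TABLE tableB (id INTEGER, date TEXT);
CREATE TABLE tableC (id INTEGER, otherid INTEGER);
INSERT INTO tableA VALUES (1, 'XYZ'), (2, 'XYZ'), (3, 'ABC');
INSERT INTO tableB VALUES (1, '2020-01-01'), (1, '2021-01-01'), (1, '2022-01-01'),
                          (2, '2020-05-05'), (3, '2020-01-01');
INSERT INTO tableC VALUES (1, 101), (2, 201), (3, 301);
""")

rows = conn.execute("""
WITH top_a AS (                      -- pick the (at most) 50 outer rows first
    SELECT * FROM tableA WHERE colA = 'XYZ' ORDER BY id LIMIT 50
),
ranked AS (                          -- rank B/C rows only for those outer rows
    SELECT b.id, b.date, c.otherid,
           ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.date DESC) AS rn
    FROM tableB b
    JOIN tableC c ON c.id = b.id
    WHERE b.id IN (SELECT id FROM top_a)
)
SELECT a.id, r.date, r.otherid
FROM top_a a
JOIN ranked r ON r.id = a.id AND r.rn <= 2   -- inner join, like CROSS APPLY
ORDER BY a.id, r.date DESC
""").fetchall()
print(rows)
```

Switching the final JOIN to a LEFT JOIN gives OUTER APPLY semantics instead.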

Exposing more fields on group by sql

I know that in a GROUP BY query you can't select a field that is not in an aggregate function or the GROUP BY clause.
However, there must be a workaround using joins or something else.
I have two tables, BMP_VISITS_SITES and BMP_VISITS_COMMENTS, which are connected by StationID in a one-to-many relationship: one site can have many comments.
I'm trying to write a query that returns all sites and only the latest comment for each. I have a "working" query, but it only returns columns that are in either an aggregate function or the GROUP BY clause.
Here is my "working" query:
select a.StationID,
MAX(b.[dateobserved]) as LastDateObserved,
a.Status
from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
group by a.StationID, a.Status;
But how can I access all the columns in both tables?
I've tried inner joins with partial success. When I join BMP_VISITS_SITES to the above query, I get all the fields of the table (t1). Great, but as soon as I try joining on BMP_VISITS_COMMENTS (t3), I get more results than I should.
select t1.*, t2.*
--,t3.*
from BMP_VISITS_SITES t1
inner join (
select a.StationID, MAX(b.[dateobserved]) as LastDateObserved from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
group by a.StationID
) t2 on t2.StationID = t1.StationID
--inner join sde.BMP_VISITS_COMMENTS t3 on t3.StationID = t2.StationID;
SELECT a.*, b.* FROM
BMP_VISITS_SITES a
OUTER APPLY
(
SELECT TOP 1 *
FROM BMP_VISITS_COMMENTS c
WHERE c.StationID = a.StationID
ORDER BY c.[dateobserved] DESC   -- LastDateObserved is only an alias in the GROUP BY query
) b
You can use OUTER APPLY to get the latest comment record and return all fields from both sides of the query. Sites with no comments are kept, with NULLs in the comment columns.
Use row_number()
select *
from
(
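select a.StationID as SiteStationID,   -- b.* also contains StationID; a duplicate name is not allowed in derived table v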
a.Status,
b.*,
row_number() over (partition by a.stationid, a.status order by b.[dateobserved] desc) as rn
from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
) v
where rn = 1
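Here is a minimal runnable sketch of the ROW_NUMBER() version using Python's sqlite3 (SQLite 3.25+ assumed). The question does not list the comment table's columns, so the CommentID and Comment columns and all rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BMP_VISITS_SITES (StationID INTEGER PRIMARY KEY, Status TEXT);
CREATE TABLE BMP_VISITS_COMMENTS (CommentID INTEGER PRIMARY KEY, StationID INTEGER,
                                  dateobserved TEXT, Comment TEXT);
INSERT INTO BMP_VISITS_SITES VALUES (1, 'Active'), (2, 'Closed');
INSERT INTO BMP_VISITS_COMMENTS VALUES
  (10, 1, '2023-01-05', 'first visit'),
  (11, 1, '2023-03-10', 'follow-up');
""")

rows = conn.execute("""
SELECT SiteStationID, Status, dateobserved, Comment
FROM (
    SELECT a.StationID AS SiteStationID, a.Status, b.dateobserved, b.Comment,
           ROW_NUMBER() OVER (PARTITION BY a.StationID ORDER BY b.dateobserved DESC) AS rn
    FROM BMP_VISITS_SITES a
    LEFT OUTER JOIN BMP_VISITS_COMMENTS b ON a.StationID = b.StationID
) v
WHERE rn = 1          -- one row per site: the latest comment, or NULLs if none
ORDER BY SiteStationID
""").fetchall()
print(rows)
```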

LEFT JOIN - How to join tables and include an extra row even if there is a match on the right

I have two tables
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables:
SELECT * FROM
(SELECT * FROM A) A
LEFT JOIN
(SELECT * FROM B) B
ON A.ID = B.ProductID
This is easy: I get every row from A repeated once for each matching row in B, and NULL fields when there is no match.
But here comes the tricky part: how can I get every row from A with NULL fields for table B even when there is a match, so that I get one extra row with NULL values in addition to all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Live example at SQL Fiddle.
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
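Here is a quick check of the outer-level UNION ALL idea using Python's sqlite3. The products, IDs, and sizes are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE B (ID INTEGER PRIMARY KEY, ProductID INTEGER, Size TEXT);
INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
""")

rows = conn.execute("""
SELECT A.ID, A.ProductName, B.ID AS B_ID, B.Size   -- every real match
FROM A
INNER JOIN B ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL             -- plus one all-NULL row per A
FROM A
ORDER BY 1, 3                                      -- SQLite sorts NULLs first
""").fetchall()
print(rows)
```

Shirt appears once with NULLs and once per matching size. Hat has no sizes, so it appears only once.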
Since every join is going to be successful, we can switch to an inner join:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly rather than using SELECT *.
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID

Select DISTINCT or UNIQUE records or rows in Oracle

I want to select distinct or unique records from a database I am querying. How can I do this but at the same time select the entire record instead of just the column that I am distinguishing as unique? Do I have to do unruly joins?
Depending on the database that you are using, you can use window functions. If you want only rows that never repeat:
select t.*
from (select t.*,
count(*) over (partition by <id>) as numdups
from t
) t
where numdups = 1
If you want one example of each row:
select t.*
from (select t.*,
row_number() over (partition by <id> order by <id>) as seqnum
from t
) t
where seqnum = 1
If you don't have window functions, you can get the same thing done with "unruly joins".
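Both window-function variants can be checked with Python's sqlite3 (SQLite 3.25+ assumed). The table t and its payload column are hypothetical stand-ins for `<id>` and the rest of the record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, payload TEXT);
INSERT INTO t VALUES (1, 'a'), (2, 'b'), (2, 'c'), (3, 'd');
""")

# Rows whose id never repeats.
never_repeated = conn.execute("""
SELECT id, payload FROM (
    SELECT t.*, COUNT(*) OVER (PARTITION BY id) AS numdups FROM t
) WHERE numdups = 1 ORDER BY id
""").fetchall()

# One full row per id; ordering by payload makes the pick deterministic.
one_per_id = conn.execute("""
SELECT id, payload FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY payload) AS seqnum FROM t
) WHERE seqnum = 1 ORDER BY id
""").fetchall()
print(never_repeated, one_per_id)
```

Ordering by the partition key itself, as in `order by <id>`, leaves the choice of row arbitrary. Ordering by another column, such as payload here, makes the result repeatable.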
If you want only one column out of several to be unique and you have joins that might return multiple records, then you have to determine which of the two or more values you want the query to return. This can be done with aggregate functions, correlated subqueries, derived tables, or CTEs (SQL Server has CTEs, and Oracle supports them too through its WITH clause).
But you have to determine which value you want before you write the query. Once you know that then you probably know how to get it.
Here are some quick examples (I used SQL Server coding conventions but most of this should make sense in Oracle as it is all basic SQL, Oracle may have a different way of declaring a parameter):
select a.a_id, max(b.test) as max_test, min(c.test2) as min_test2
from tablea a
join tableb b on a.a_id = b.a_id
join tablec c on a.a_id = c.a_id
group by a.a_id
order by max_test, min_test2   -- ungrouped b.test / c.test2 cannot be used here
Select a.a_id, (select top 1 b.test from tableb b where a.a_id = b.a_id order by test2),
(select top 1 b.test2 from tableb b where a.a_id = b.a_id order by test2),
(select top 1 c.test3 from tablec c where a.a_id = c.a_id order by test4)
from tablea a
declare @a_id int
set @a_id = 189
select a.a_id, b.test, b.test4
from tablea a
join tableb b on a.a_id = b.a_id
join (select min(b2.b_id) as b_id from tableb b2 where b2.a_id = @a_id) c on c.b_id = b.b_id  -- a derived table needs named columns and cannot use ORDER BY without TOP
where a.a_id = @a_id
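Without window functions, one of the "unruly joins" is to join to a derived table that picks one key per group (here MIN(b_id)), then join back to the base table on that key to get the whole row. Here is a sketch with Python's sqlite3; the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (a_id INTEGER PRIMARY KEY);
CREATE TABLE tableb (b_id INTEGER PRIMARY KEY, a_id INTEGER, test TEXT);
INSERT INTO tablea VALUES (1), (2);
INSERT INTO tableb VALUES (5, 1, 'x'), (7, 1, 'y'), (9, 2, 'z');
""")

rows = conn.execute("""
SELECT a.a_id, b.b_id, b.test
FROM tablea a
JOIN (SELECT a_id, MIN(b_id) AS b_id     -- one chosen key per group
      FROM tableb GROUP BY a_id) pick
  ON pick.a_id = a.a_id
JOIN tableb b ON b.b_id = pick.b_id      -- fetch the entire chosen row
ORDER BY a.a_id
""").fetchall()
print(rows)
```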
In the second example
select t.*
from (select t.*,
row_number() over (partition by id order by id ) as seqnum
from t
) t
where seqnum = 1
the row_number() call must not have a star inside the parentheses.

SQL: Turn a subquery into a join: How to refer to outside table in nested join where clause?

I am trying to change my subquery into a join that selects only one record in the subquery. It seems to run the subquery for each record found, taking over a minute to execute:
select afield1, afield2, (
select top 1 b.field1
from anothertable as b
where b.aForeignKey = a.id
order by field1
) as bfield1
from sometable as a
If I try to only select related records, it doesn't know how to bind a.id in the nested select.
select afield1, afield2, bfield1
from sometable a left join (
select top 1 id, bfield, aForeignKey
from anothertable
where anothertable.aForeignKey = a.id
order by bfield) b on
b.aForeignKey = a.id
-- Results in the multi-part identifier "a.id" could not be bound
If I hard-code values in the nested WHERE clause, the select duration drops from 60 seconds to under five. Does anyone have suggestions on how to join the two tables without processing every record in the inner table?
EDIT:
I ended up adding:
left outer join (
select *, row_number() over (partition by aForeignKey order by field1) as rank
from anothertable) b on
b.aforeignkey = a.id and b.rank = 1
This went from about 50 seconds to 8 seconds for 22M rows.
Try this:
WITH qry AS
(
SELECT afield1,
afield2,
b.field1 AS bfield1,
ROW_NUMBER() OVER(PARTITION BY a.id ORDER BY field1) rn
FROM sometable a LEFT JOIN anothertable b
ON b.aForeignKey = a.id
)
SELECT *
FROM qry
WHERE rn = 1
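The asker's fix and this CTE both replace the per-row subquery with a ROW_NUMBER() join. The sketch below uses Python's sqlite3, with SQLite 3.25+ assumed and LIMIT standing in for TOP. On a tiny hypothetical dataset, it checks that the correlated form and the windowed join return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sometable (id INTEGER PRIMARY KEY, afield1 TEXT, afield2 TEXT);
CREATE TABLE anothertable (id INTEGER PRIMARY KEY, aForeignKey INTEGER, field1 TEXT);
INSERT INTO sometable VALUES (1, 'p', 'q'), (2, 'r', 's'), (3, 't', 'u');
INSERT INTO anothertable VALUES (10, 1, 'm'), (11, 1, 'c'), (12, 2, 'k');
""")

# Original shape: a scalar subquery evaluated per outer row.
correlated = conn.execute("""
SELECT a.afield1, a.afield2,
       (SELECT b.field1 FROM anothertable b
        WHERE b.aForeignKey = a.id ORDER BY b.field1 LIMIT 1) AS bfield1
FROM sometable a ORDER BY a.id
""").fetchall()

# Rewritten shape: rank once, then LEFT JOIN on rn = 1.
windowed = conn.execute("""
SELECT afield1, afield2, field1 AS bfield1
FROM sometable a
LEFT JOIN (
    SELECT aForeignKey, field1,
           ROW_NUMBER() OVER (PARTITION BY aForeignKey ORDER BY field1) AS rn
    FROM anothertable
) b ON b.aForeignKey = a.id AND b.rn = 1
ORDER BY a.id
""").fetchall()
print(correlated == windowed, windowed)
```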
Try this (a derived table in a LEFT JOIN cannot see a.id, but OUTER APPLY can):
select afield1,
afield2,
b.bfield as bfield1
from sometable a
outer apply
(select top 1 id, bfield, aForeignKey from anothertable where aForeignKey = a.id order by bfield) b