Avoiding subquery

Fellows,
I have a query as follows:
SELECT A.ID, B.ID, (HUGE SUBQUERY) as HS
FROM TABLE_A JOIN TABLE_B ON A.ID = B.ID
WHERE (HUGE SUBQUERY) > 0
I'd like to avoid repeating the subquery.
Is there any way to rewrite my WHERE as something like
WHERE HS > 0
Or I must turn my subquery into a join?

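Look at the WITH clause. Wrap the whole query in a CTE so that the alias HS is visible to the outer WHERE:
WITH Q AS (
SELECT A.ID AS A_ID, B.ID AS B_ID, (HUGE SUBQUERY) AS HS
FROM TABLE_A A JOIN TABLE_B B ON A.ID = B.ID
)
SELECT A_ID, B_ID, HS
FROM Q
WHERE HS > 0
Or use a derived table:
SELECT *
FROM
(
SELECT A.ID AS A_ID, B.ID AS B_ID, (HUGE SUBQUERY) AS HS
FROM TABLE_A A JOIN TABLE_B B ON A.ID = B.ID
) Q
WHERE HS > 0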

You could use a CTE, provided the subquery is not correlated (it does not reference A or B):
WITH cteHS AS (
SELECT xxx AS Value
FROM Huge Subquery
)
SELECT A.ID, B.ID, cteHS.Value AS HS
FROM TABLE_A A
JOIN TABLE_B B ON A.ID = B.ID
CROSS JOIN cteHS
WHERE cteHS.Value > 0
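
To make the idea concrete, here is a minimal runnable sketch using Python's built-in sqlite3 module. The schema is an assumption made up for illustration: table_a, table_b and an items table stand in for whatever the real "huge subquery" reads, and the SUM over items plays the role of the subquery. The point is only that the expression is written once and then filtered by its alias.

```python
import sqlite3

# Hypothetical schema: "items" stands in for the tables behind the huge subquery.
SETUP = """
CREATE TABLE table_a (id INTEGER PRIMARY KEY);
CREATE TABLE table_b (id INTEGER PRIMARY KEY);
CREATE TABLE items (a_id INTEGER, qty INTEGER);
INSERT INTO table_a VALUES (1), (2), (3);
INSERT INTO table_b VALUES (1), (2), (3);
INSERT INTO items VALUES (1, 5), (1, 7), (3, 0);
"""

# The correlated subquery appears once, inside the CTE.
# The outer query can then filter on its alias hs.
QUERY = """
WITH q AS (
    SELECT a.id AS a_id,
           b.id AS b_id,
           (SELECT COALESCE(SUM(i.qty), 0)
              FROM items AS i
             WHERE i.a_id = a.id) AS hs
      FROM table_a AS a
      JOIN table_b AS b ON a.id = b.id
)
SELECT a_id, b_id, hs
  FROM q
 WHERE hs > 0
 ORDER BY a_id
"""


def rows_with_positive_hs():
    """Build the toy database in memory and return the filtered rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(rows_with_positive_hs())  # [(1, 1, 12)]
```

Only id 1 survives: id 2 has no items (sum 0) and id 3 sums to 0. Whether the engine evaluates the subquery once or twice per row is up to the optimizer; the rewrite mainly removes the duplication from the source text.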

Related

SQL - Get a reference to a table in a subquery

Is it possible to access a table from a subquery?
Select d.table_c.*
from (with table_c as (select *
from table_a)
select *
from table_b
where table_a.id = table_b.id) as d
table_c is inside the subquery of d, I've tried to access it using d.table_c, but it doesn't seem to work.

In SQL Server you cannot define a CTE inside a subquery. Define it at the top of the statement and join to it instead:
;WITH table_c
as
(SELECT * FROM table_a)
SELECT *
from table_b b
INNER JOIN table_c c on c.id = b.id
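
A small runnable sketch of the same shape, using Python's sqlite3 module with made-up columns (name and detail are assumptions, not from the question). It shows that the CTE's columns are reached through the CTE's own alias in the outer query, not through a wrapping derived table.

```python
import sqlite3


def join_through_cte():
    """Declare the CTE at the top of the statement, then join to it by alias."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical data: only id 2 exists in both tables.
        conn.executescript("""
        CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table_b (id INTEGER PRIMARY KEY, detail TEXT);
        INSERT INTO table_a VALUES (1, 'one'), (2, 'two');
        INSERT INTO table_b VALUES (2, 'b-two'), (3, 'b-three');
        """)
        return conn.execute("""
        WITH table_c AS (SELECT * FROM table_a)
        SELECT c.id, c.name, b.detail
          FROM table_b AS b
          JOIN table_c AS c ON c.id = b.id
        """).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(join_through_cte())  # [(2, 'two', 'b-two')]
```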

SQL summations with multiple outer joins

I have tables a, b, c, and d whereby:
There are 0 or more b rows for each a row
There are 0 or more c rows for each a row
There are 0 or more d rows for each a row
If I try a query like the following:
SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
FROM a
LEFT JOIN b on a.id = b.a_id
LEFT JOIN c on a.id = c.a_id
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
I notice that I have created a cartesian product and therefore my sums are incorrect (much too large).
I see that there are other SO questions and answers, however I'm still not grasping how I can accomplish what I want to do in a single query. Is it possible in SQL to write a query which aggregates all of the following data:
SELECT a.id, SUM(b.debit)
FROM a
LEFT JOIN b on a.id = b.a_id
GROUP BY a.id
SELECT a.id, SUM(c.credit)
FROM a
LEFT JOIN c on a.id = c.a_id
GROUP BY a.id
SELECT a.id, SUM(d.other)
FROM a
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
in a single query?

Your analysis is correct. Joining several unrelated child tables to the same parent creates a cartesian product for each parent row.
You have to compute the sums separately and then combine them. This is doable in one query, and you have several options:
- Subqueries in your SELECT: SELECT a.id, (SELECT SUM(b.debit) FROM b WHERE b.a_id = a.id) + ...
- CROSS APPLY with a query similar to the first bullet, then SELECT a.id, b_sum + c_sum + d_sum
- UNION ALL as you suggested, with an outer SUM and GROUP BY on top of that.
- LEFT JOIN to pre-aggregated subqueries similar to the above.
And probably more. The performance of the various solutions might differ slightly depending on how many rows of a you want to select.

SELECT a.ID, b.debit, c.credit, d.other
FROM a
LEFT JOIN (SELECT a_id, SUM(debit) AS debit
FROM b
GROUP BY a_id) b ON a.ID = b.a_id
LEFT JOIN (SELECT a_id, SUM(credit) AS credit
FROM c
GROUP BY a_id) c ON a.ID = c.a_id
LEFT JOIN (SELECT a_id, SUM(other) AS other
FROM d
GROUP BY a_id) d ON a.ID = d.a_id

Can also be done with correlated subqueries:
SELECT a.id
, (SELECT SUM(debit) FROM b WHERE a.id = b.a_id)
, (SELECT SUM(credit) FROM c WHERE a.id = c.a_id)
, (SELECT SUM(other) FROM d WHERE a.id = d.a_id)
FROM a
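
To see the inflation with actual numbers, here is a sketch using Python's sqlite3 module. The data is invented for illustration: one parent row with 2 debit rows, 3 credit rows and 1 other row, plus a parent with no children. The naive triple join produces 2 x 3 x 1 = 6 rows for the first parent, so every sum is multiplied by the row counts of the other tables; joining to pre-aggregated subqueries gives the real totals.

```python
import sqlite3

# Hypothetical data: a.id = 1 has 2 b rows, 3 c rows, 1 d row; a.id = 2 has none.
SETUP = """
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (a_id INTEGER, debit INTEGER);
CREATE TABLE c (a_id INTEGER, credit INTEGER);
CREATE TABLE d (a_id INTEGER, other INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1, 10), (1, 20);
INSERT INTO c VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO d VALUES (1, 100);
"""

# The query from the question: child rows multiply each other.
NAIVE = """
SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
  FROM a
  LEFT JOIN b ON a.id = b.a_id
  LEFT JOIN c ON a.id = c.a_id
  LEFT JOIN d ON a.id = d.a_id
 GROUP BY a.id
 ORDER BY a.id
"""

# Each child table is collapsed to one row per a_id before joining.
PRE_AGGREGATED = """
SELECT a.id, b.debit, c.credit, d.other
  FROM a
  LEFT JOIN (SELECT a_id, SUM(debit)  AS debit  FROM b GROUP BY a_id) AS b
         ON a.id = b.a_id
  LEFT JOIN (SELECT a_id, SUM(credit) AS credit FROM c GROUP BY a_id) AS c
         ON a.id = c.a_id
  LEFT JOIN (SELECT a_id, SUM(other)  AS other  FROM d GROUP BY a_id) AS d
         ON a.id = d.a_id
 ORDER BY a.id
"""


def run(sql):
    """Run one query against a fresh in-memory copy of the toy data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(NAIVE))           # [(1, 90, 12, 600), (2, None, None, None)]
    print(run(PRE_AGGREGATED))  # [(1, 30, 6, 100), (2, None, None, None)]
```

The parent without children comes back with NULL sums in both versions; wrap the columns in COALESCE(..., 0) if zeros are preferred.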

SQL Server Function for Maximum Date in Different Fields

I have multiple related tables in a database and each can be updated separately and have their own LastUpdated Date field. In one of these tables there is more than one LastUpdated field each indicating from which source that record was updated. We commonly query these multiple tables as a single item with joins and thus I would like to know for each record what the most recent LastUpdated record is across all joins. I know this can be achieved with many sub-queries but I was wondering whether there was something along the lines of the Coalesce function into which you can pass many fields and it returns the first non-null value.
So it would read something like:
SELECT a.a_id,
a.name,
b.b_id,
b.detail,
c.c_id,
c.otherfield,
Maxdate(a.lastupdated, a.lastupdatedfromweb, b.lastupdated,
c.lastupdated) AS
LastUpdatedDate
FROM a
INNER JOIN b
ON a.a_id = b.a_id
INNER JOIN c
ON b.b_id = c.b_id
Any ideas? Could this be written as a custom function, or does it exist out of the box? I am working on SQL Server 2005 and 2008, if that helps.

In 2008+ you can use CROSS APPLY to create a row for each of your dates, then select the Max:
SELECT
a.a_Id,
a.Name,
b.b_Id,
b.Detail,
c.c_Id,
c.OtherField,
ld.LastUpdatedDate
FROM
a INNER JOIN b ON a.a_Id = b.a_Id
INNER JOIN c ON b.b_Id = c.b_Id
CROSS APPLY
( SELECT LastUpdatedDate = MAX(LastUpdatedDate)
FROM (VALUES
(a.LastUpdated),
(a.LastUpdatedFromWeb),
(b.LastUpdated),
(c.LastUpdated)
) d (LastUpdatedDate)
) ld
You could also do this as a correlated subquery. During optimisation, however, SQL Server will rewrite correlated subqueries as OUTER APPLY, so I prefer to cut out a step and write the APPLY myself; I have found some unusual behaviour when SQL Server deconstructs a correlated subquery and rewrites it as an APPLY.
For 2005, I think you will need to use a slightly different method, as it doesn't support table value constructors:
SELECT
a.a_Id,
a.Name,
b.b_Id,
b.Detail,
c.c_Id,
c.OtherField,
( SELECT MAX(CASE Number
WHEN 1 THEN a.LastUpdated
WHEN 2 THEN a.LastUpdatedFromWeb
WHEN 3 THEN b.LastUpdated
WHEN 4 THEN c.LastUpdated
END)
FROM (SELECT TOP 4 Number = ROW_NUMBER() OVER(ORDER BY object_id)
FROM sys.all_objects) n
) AS LastUpdated
FROM
a INNER JOIN b ON a.a_Id = b.a_Id
INNER JOIN c ON b.b_Id = c.b_Id;
Or:
SELECT
a.a_Id,
a.Name,
b.b_Id,
b.Detail,
c.c_Id,
c.OtherField,
( SELECT MAX(LastUpdated)
FROM ( SELECT a.LastUpdated UNION ALL
SELECT a.LastUpdatedFromWeb UNION ALL
SELECT b.LastUpdated UNION ALL
SELECT c.LastUpdated
) d
) AS LastUpdated
FROM
a INNER JOIN b ON a.a_Id = b.a_Id
INNER JOIN c ON b.b_Id = c.b_Id;
Unfortunately I no longer have any 2005 instances installed so I can't test this.
EDIT
Just realised that you wanted the source of the field too, and also remembered that 2005 does support the use of APPLY, so for 2008+:
SELECT
a.a_Id,
a.Name,
b.b_Id,
b.Detail,
c.c_Id,
c.OtherField,
ld.LastUpdatedDate,
ld.FieldName
FROM
a INNER JOIN b ON a.a_Id = b.a_Id
INNER JOIN c ON b.b_Id = c.b_Id
CROSS APPLY
( SELECT TOP 1 LastUpdatedDate, FieldName
FROM (VALUES
(a.LastUpdated, 'a.LastUpdated'),
(a.LastUpdatedFromWeb, 'a.LastUpdatedFromWeb'),
(b.LastUpdated, 'b.LastUpdated'),
(c.LastUpdated, 'c.LastUpdated')
) d (LastUpdatedDate, FieldName)
ORDER BY LastUpdatedDate DESC
) ld
For 2005:
SELECT
a.a_Id,
a.Name,
b.b_Id,
b.Detail,
c.c_Id,
c.OtherField,
ld.LastUpdatedDate,
ld.FieldName
FROM
a INNER JOIN b ON a.a_Id = b.a_Id
INNER JOIN c ON b.b_Id = c.b_Id
CROSS APPLY
( SELECT TOP 1 LastUpdatedDate = LastUpdated, FieldName
FROM (
SELECT a.LastUpdated, FieldName = 'a.LastUpdated' UNION ALL
SELECT a.LastUpdatedFromWeb, 'a.LastUpdatedFromWeb' UNION ALL
SELECT b.LastUpdated, 'b.LastUpdated' UNION ALL
SELECT c.LastUpdated, 'c.LastUpdated'
) d
ORDER BY LastUpdated DESC
) ld

SELECT
a.a_Id,
a.Name,
b.b_Id,
b.Detail,
c.c_Id,
c.OtherField,
( SELECT MAX(LastUpdated) FROM ( SELECT a.LastUpdated UNION ALL SELECT a.LastUpdatedFromWeb UNION ALL SELECT b.LastUpdated UNION ALL SELECT c.LastUpdated ) d ) AS LastUpdatedDate
FROM
a INNER JOIN b ON a.a_Id = b.a_Id
INNER JOIN c ON b.b_Id = c.b_Id
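
For readers without a SQL Server instance, here is a portable sketch of the same "turn the date columns into rows, then take MAX" idea, run through Python's sqlite3 module. It is a swapped-in variant, not the CROSS APPLY query above: SQLite has no APPLY, so the columns are unpivoted with UNION ALL over the joined rows and then grouped. The table layout, the snake_case column names and the ISO-8601 text dates are assumptions for illustration.

```python
import sqlite3

# Hypothetical two-table version of the question's schema, dates stored as ISO text.
SETUP = """
CREATE TABLE a (a_id INTEGER PRIMARY KEY,
                last_updated TEXT,
                last_updated_from_web TEXT);
CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER, last_updated TEXT);
INSERT INTO a VALUES (1, '2012-01-05', '2012-03-01'),
                     (2, '2012-02-10', NULL);
INSERT INTO b VALUES (10, 1, '2012-02-20'),
                     (20, 2, '2012-02-01');
"""

QUERY = """
WITH joined AS (
    SELECT a.a_id, b.b_id,
           a.last_updated          AS a_updated,
           a.last_updated_from_web AS a_web_updated,
           b.last_updated          AS b_updated
      FROM a
      JOIN b ON a.a_id = b.a_id
),
unpivoted AS (
    -- one row per (joined row, date column)
    SELECT a_id, b_id, a_updated AS updated FROM joined
    UNION ALL
    SELECT a_id, b_id, a_web_updated FROM joined
    UNION ALL
    SELECT a_id, b_id, b_updated FROM joined
)
SELECT a_id, b_id, MAX(updated) AS last_updated_date
  FROM unpivoted
 GROUP BY a_id, b_id
 ORDER BY a_id, b_id
"""


def latest_update_per_row():
    """Return (a_id, b_id, most recent of the three date columns)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(latest_update_per_row())
    # [(1, 10, '2012-03-01'), (2, 20, '2012-02-10')]
```

Because the aggregate MAX skips NULLs, the missing last_updated_from_web on the second row does not wipe out the result, which is the COALESCE-like behaviour the question asks for.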

sql - multiple layers of correlated subqueries

I have table A, B and C
I want to return all entries in table A that do not exist in table B and of that list do not exist in table C.
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
This gives me the first result: the entries in A that are not in B. But now I want only those entries of this result that are also not in C.
I tried flavours of:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
where not exists (select 1 from table_C as c
where a.id = c.id)
But that isn't the correct logic. Is there a way to store the results of the first query and then select from that result only the rows that do not exist in table C? I'm not sure how to do that. I appreciate the help.

Try this:
select * from (
select a.*, b.id as b_id, c.id as c_id
from table_A as a
left outer join table_B as b on a.id = b.id
left outer join table_C as c on c.id = a.id
) T
where b_id is null
and c_id is null
Another implementation is this:
select a1.*
from table_A as a1
inner join (
select id from table_A
except
select id from table_B
except
select id from table_C
) as a2 on a1.id = a2.id
Note the restrictions on the form of the sub-query: EXCEPT requires every SELECT to return the same number of columns, with compatible types in the same order. The second implementation describes the desired operation to SQL Server succinctly and clearly, so it may well be the most efficient, but compare the execution plans to be sure.

You have two WHERE clauses in (the outer part of) your second query. That is not valid SQL. If you remove the second WHERE, it should work as expected:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
not exists (select 1 from table_C as c -- WHERE removed
where a.id = c.id) ;
Tested in SQL-Fiddle (thanks @Alexander)

How about using LEFT JOIN:
SELECT a.*
FROM TableA a
LEFT JOIN TableB b
ON a.ID = b.ID
LEFT JOIN TableC c
ON a.ID = c.ID
WHERE b.ID IS NULL AND
c.ID IS NULL

One more option with NOT EXISTS operator
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM (SELECT b.ID
FROM dbo.test72 b
UNION ALL
SELECT c.ID
FROM dbo.test73 c) x
WHERE a.ID = x.ID
)
Option from @ypercube. Thanks for the present ;)
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM dbo.test72 b
WHERE a.ID = b.ID
UNION ALL
SELECT 1
FROM dbo.test73 c
WHERE a.ID = c.ID
);

I do not like "not exists", but if for some reason it seems more logical to you, you can give your first query an alias and then apply another "not exists" clause to it. Something like:
SELECT * FROM
( select * from tableA as a
where not exists (select 1 from tableB as b
where a.id = b.id) )
AS A_NOT_IN_B
WHERE NOT EXISTS (
SELECT 1 FROM tableC as c
WHERE c.id = A_NOT_IN_B.id
)
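
As a quick sanity check that the approaches in this thread agree, here is a sketch with Python's sqlite3 module that runs three of them (two NOT EXISTS predicates, LEFT JOIN ... IS NULL, and chained EXCEPT) against the same toy data. The ids are invented; with A = 1..5, B = {2, 3} and C = {3, 4}, only 1 and 5 are in neither B nor C.

```python
import sqlite3

# Hypothetical data: ids 2, 3 are in B; ids 3, 4 are in C.
SETUP = """
CREATE TABLE table_a (id INTEGER PRIMARY KEY);
CREATE TABLE table_b (id INTEGER PRIMARY KEY);
CREATE TABLE table_c (id INTEGER PRIMARY KEY);
INSERT INTO table_a VALUES (1), (2), (3), (4), (5);
INSERT INTO table_b VALUES (2), (3);
INSERT INTO table_c VALUES (3), (4);
"""

QUERIES = {
    # One WHERE, two NOT EXISTS predicates combined with AND.
    "not_exists": """
        SELECT a.id FROM table_a AS a
         WHERE NOT EXISTS (SELECT 1 FROM table_b AS b WHERE b.id = a.id)
           AND NOT EXISTS (SELECT 1 FROM table_c AS c WHERE c.id = a.id)
         ORDER BY a.id
    """,
    # Anti-join: keep rows where neither outer join found a match.
    "left_join": """
        SELECT a.id FROM table_a AS a
          LEFT JOIN table_b AS b ON b.id = a.id
          LEFT JOIN table_c AS c ON c.id = a.id
         WHERE b.id IS NULL AND c.id IS NULL
         ORDER BY a.id
    """,
    # Set difference, applied left to right.
    "except": """
        SELECT id FROM table_a
        EXCEPT
        SELECT id FROM table_b
        EXCEPT
        SELECT id FROM table_c
        ORDER BY id
    """,
}


def ids_only_in_a():
    """Return {query name: list of ids} for each formulation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return {name: [row[0] for row in conn.execute(sql)]
                for name, sql in QUERIES.items()}
    finally:
        conn.close()


if __name__ == "__main__":
    for name, ids in ids_only_in_a().items():
        print(name, ids)  # each prints [1, 5]
```

One caveat: EXCEPT also removes duplicate rows, so it only matches the other two when the compared column is unique in table_a, as it is here.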

Aliasing derived table which is a union of two selects

I can't get the syntax right for aliasing the derived table correctly:
SELECT * FROM
(SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
I'm getting a Duplicate column name of B_id. Any suggestions?

The problem isn't the union, it's the select a.*, b.* in each of the inner select statements - since a and b both have B_id columns, that means you have two B_id cols in the result.
You can fix that by changing the selects to something like:
select a.*, b.col_1, b.col_2 -- repeat for columns of b you need
In general, I'd avoid using select table1.* in queries you're using from code (rather than just interactive queries). If someone adds a column to the table, various queries can suddenly stop working.
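
A small sketch of where the duplicate comes from, using Python's sqlite3 module and a cut-down, assumed version of the two tables. SQLite itself tolerates the repeated name in a plain SELECT, so this only lists the result column names rather than reproducing MySQL's "Duplicate column name" error, which MySQL raises once such a SELECT is used as a derived table.

```python
import sqlite3


def column_names(conn, sql):
    """Return the result-set column names of a query."""
    return [col[0] for col in conn.execute(sql).description]


def compare_select_lists():
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical cut-down schema: both tables carry a B_id column.
        conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, B_id INTEGER, flag INTEGER);
        CREATE TABLE b (B_id INTEGER PRIMARY KEY, date TEXT);
        """)
        star = column_names(
            conn, "SELECT a.*, b.* FROM a JOIN b ON a.B_id = b.B_id")
        explicit = column_names(
            conn,
            "SELECT a.*, b.date AS b_date FROM a JOIN b ON a.B_id = b.B_id")
        return star, explicit
    finally:
        conn.close()


if __name__ == "__main__":
    star, explicit = compare_select_lists()
    print(star)      # ['id', 'B_id', 'flag', 'B_id', 'date']
    print(explicit)  # ['id', 'B_id', 'flag', 'b_date']
```

Listing b's columns explicitly (and aliasing any that clash) leaves every name in the derived table unique.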

In your derived table you are retrieving the column B_id, which exists in both table a and table b, so you need to select only one of them or give one of them an alias:
SELECT * FROM
(SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1

First, you could use UNION ALL instead of UNION. The two subqueries will have no rows in common because of the mutually exclusive conditions on a.flag.
Another way you could write it is:
SELECT a.*, b.*
FROM a
INNER JOIN b
ON a.B_id = b.B_id
WHERE ( a.flag IS NULL
AND b.date < NOW()
)
OR
( a.flag IS NOT NULL
AND EXISTS
( SELECT *
FROM c
WHERE a.C_id = c.C_id
AND c.date < NOW()
)
)
ORDER BY RAND()
LIMIT 1
