SQL Server - Subquery in Join vs Subquery in Where clause

I have a situation where I have source tables that get dumped with all the historic data on a daily basis. The way for me to extract the latest dump is by filtering the records using a date field.
Now I have scenarios where I may need to fetch data from about 4-5 tables in the same query. In that case, which of the options below would be better for tables that have a large number of records?
SELECT A.col1,
B.col2,
C.col3
FROM (SELECT col1, x
FROM tableA
WHERE posting_date = (SELECT max(posting_date) from tableA)
) A
JOIN
(SELECT col2, y, z
FROM tableB
WHERE posting_date = (SELECT max(posting_date) from tableB)
) B
ON B.y = A.x
JOIN
(SELECT col3, w
FROM tableC
WHERE posting_date = (SELECT max(posting_date) from tableC)
) C
ON C.w = B.z
Or should I use simple subqueries in the WHERE clause?
SELECT A.col1,
B.col2,
C.col3
FROM tableA A,
tableB B,
tableC C
WHERE A.posting_date = (SELECT max(posting_date) from tableA)
AND B.posting_date = (SELECT max(posting_date) from tableB)
AND C.posting_date = (SELECT max(posting_date) from tableC)
AND A.x = B.y
AND B.z = C.w
From the readability perspective, I find the second option better. But I am not too sure of the performance when there will be a lot of records in all the required tables.

I, personally, think that using the ANSI-92 JOIN syntax and then putting the clauses in the WHERE would be the most readable though.
SELECT A.col1,
B.col2,
C.col3
FROM dbo.tableA A
JOIN dbo.tableB B ON A.x = B.y
JOIN dbo.tableC C ON B.z = C.w
WHERE A.posting_date = (SELECT MAX(sq.posting_date) from tableA sq)
AND B.posting_date = (SELECT MAX(sq.posting_date) from tableB sq)
AND C.posting_date = (SELECT MAX(sq.posting_date) from tableC sq);
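For what it's worth, the derived-table form and the JOIN-plus-WHERE form are logically equivalent, which is easy to check on a toy data set. Below is a minimal sketch that runs both against an in-memory SQLite database; the sample rows are invented for illustration, and SQLite stands in for SQL Server only to confirm the result sets match. It says nothing about performance, which you would need to judge from the actual execution plans on your own tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (col1 TEXT, x INTEGER, posting_date TEXT);
CREATE TABLE tableB (col2 TEXT, y INTEGER, z INTEGER, posting_date TEXT);
CREATE TABLE tableC (col3 TEXT, w INTEGER, posting_date TEXT);
-- Hypothetical data: each table holds an older dump and the latest dump.
INSERT INTO tableA VALUES ('a_old', 1, '2024-01-01'), ('a_new', 1, '2024-01-02');
INSERT INTO tableB VALUES ('b_old', 1, 10, '2024-01-01'), ('b_new', 1, 10, '2024-01-02');
INSERT INTO tableC VALUES ('c_old', 10, '2024-01-01'), ('c_new', 10, '2024-01-02');
""")

# Option 1: filter each table inside a derived table, then join.
derived_tables = """
SELECT A.col1, B.col2, C.col3
FROM (SELECT col1, x FROM tableA
      WHERE posting_date = (SELECT MAX(posting_date) FROM tableA)) A
JOIN (SELECT col2, y, z FROM tableB
      WHERE posting_date = (SELECT MAX(posting_date) FROM tableB)) B ON B.y = A.x
JOIN (SELECT col3, w FROM tableC
      WHERE posting_date = (SELECT MAX(posting_date) FROM tableC)) C ON C.w = B.z
"""

# Option 2: explicit JOINs with the date filters in the WHERE clause.
where_filters = """
SELECT A.col1, B.col2, C.col3
FROM tableA A
JOIN tableB B ON A.x = B.y
JOIN tableC C ON B.z = C.w
WHERE A.posting_date = (SELECT MAX(sq.posting_date) FROM tableA sq)
  AND B.posting_date = (SELECT MAX(sq.posting_date) FROM tableB sq)
  AND C.posting_date = (SELECT MAX(sq.posting_date) FROM tableC sq)
"""

rows_derived = sorted(conn.execute(derived_tables).fetchall())
rows_where = sorted(conn.execute(where_filters).fetchall())
print(rows_derived)
print(rows_where)
```

Both queries return only the combination built from the latest dump of each table.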

I wouldn't do either of them. Window functions will serve you better.
Obviously use proper join syntax, not those awful, deprecated comma-joins.
SELECT A.col1,
B.col2,
C.col3
FROM (
SELECT *, maxdate = MAX(a.posting_date) OVER ()
FROM dbo.tableA a
) A
JOIN (
SELECT *, maxdate = MAX(b.posting_date) OVER ()
FROM dbo.tableB b
) B ON A.x = B.y
JOIN (
SELECT *, maxdate = MAX(c.posting_date) OVER ()
FROM dbo.tableC c
) C ON B.z = C.w
WHERE A.posting_date = A.maxdate
AND B.posting_date = B.maxdate
AND C.posting_date = C.maxdate;
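To see what the window-function version does, here is a small sketch on two tables using an in-memory SQLite database (window functions need SQLite 3.25 or later). The data is made up, and the T-SQL `maxdate = ...` alias style is swapped for the portable `... AS maxdate`, since SQLite does not accept the former. Treat it as an illustration of the filtering logic only, not as evidence about how SQL Server will plan the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (col1 TEXT, x INTEGER, posting_date TEXT);
CREATE TABLE tableB (col2 TEXT, y INTEGER, posting_date TEXT);
-- Hypothetical data: an older dump and the latest dump per table.
INSERT INTO tableA VALUES ('a_old', 1, '2024-01-01'), ('a_new', 1, '2024-01-02');
INSERT INTO tableB VALUES ('b_old', 1, '2024-01-01'), ('b_new', 1, '2024-01-02');
""")

# MAX(...) OVER () attaches the table-wide maximum date to every row,
# so the outer WHERE can keep only rows from the latest dump.
window_query = """
SELECT A.col1, B.col2
FROM (SELECT t.*, MAX(t.posting_date) OVER () AS maxdate FROM tableA t) A
JOIN (SELECT t.*, MAX(t.posting_date) OVER () AS maxdate FROM tableB t) B
  ON A.x = B.y
WHERE A.posting_date = A.maxdate
  AND B.posting_date = B.maxdate
"""

rows_window = conn.execute(window_query).fetchall()
print(rows_window)
```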

Related

conditional select inside select

select t.col1,b.somecolumn,t.col2,a.col1,VQ.a,VQ.b,VQ.e,VQ.d,VQ.f,
(select t.status as a, p.id as b,p.permit as c, p.des as d, p.error_code as e, p.cause as f
from table_A t
inner join table_B p on t.a = p.a
where p.c = 'license' and t.status = 'Fail') as VQ
from table_A t
join table_C a on t.col1 = a.asset_id
join table_B b on t.somecolumn = b.somecolumn ;
When I execute the above code, I encounter the error
SQL Error [42601]: ERROR: navigation on column "vq" is not allowed as it is not SUPER type
I am trying to do a select inside select.
It looks like you want to return multiple columns from a correlated subquery. If so, you can do so with a lateral join:
select t.col1, b.somecolumn, t.col2, a.col1, vq.*
from table_A t join
table_C a
on t.col1 = a.asset_id join
table_B b
on t.somecolumn = b.somecolumn left join lateral
(select t.status as a, p.id as b,p.permit as c, p.des as d, p.error_code as e, p.cause as f
from table_A t join
table_B p on t.a = p.a
where p.c = 'license' and t.status = 'Fail'
) vq
on 1=1;

Combine two sql queries into a single table

I have two SQL queries: the first uses an inner join to match on a condition, and the other does not. Ultimately, I would like the difference between the columns created by each query. How can I do this?
I have tried unioning and joining the queries as in some similar posts, but it won't work. I wonder if the issue is around the joins within each query.
Query 1 :
SELECT A.date, COUNT(DISTINCT A.id)
FROM A
INNER JOIN B
ON A.id = B.id AND A.date = B.date
AND B.col1 = 'value1'
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1
ORDER BY 1 DESC
Query 2 :
SELECT A.date, COUNT(DISTINCT A.id)
FROM A
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1
ORDER BY 1 DESC
Your left join of c is effectively turned into an inner join because c is used in a NULL-excluding expression in the WHERE clause. So you can inner join c directly and left join b. Then you can use a CASE inside one count() to count only the rows where a row from b was joined. Subtract that value from another count() that counts all occurrences to get the difference.
SELECT a.date,
count(DISTINCT a.id)
-
count(DISTINCT CASE
WHEN b.id IS NOT NULL THEN
a.id
END)
FROM a
INNER JOIN c
ON c.key = a.key
AND c.category = 'cat1'
LEFT JOIN b
ON a.id = b.id
AND a.date = b.date
AND b.col1 = 'value1'
WHERE a.col1 = 'value2'
GROUP BY 1
ORDER BY 1 DESC;
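Both points above can be checked on toy data. The sketch below uses an in-memory SQLite database with invented rows: it first shows that `LEFT JOIN c ... WHERE c.category = 'cat1'` returns the same ids as an inner join (the unmatched row has a NULL category and is filtered out), then runs the conditional-count query to get the per-date difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, date TEXT, key INTEGER, col1 TEXT);
CREATE TABLE b (id INTEGER, date TEXT, col1 TEXT);
CREATE TABLE c (key INTEGER, category TEXT);
-- Hypothetical rows; id 5 has no match in c, id 3 matches the wrong category.
INSERT INTO a VALUES
  (1, '2024-01-01', 100, 'value2'),
  (2, '2024-01-01', 100, 'value2'),
  (3, '2024-01-01', 200, 'value2'),
  (4, '2024-01-02', 100, 'value2'),
  (5, '2024-01-02', 300, 'value2');
INSERT INTO b VALUES (1, '2024-01-01', 'value1');
INSERT INTO c VALUES (100, 'cat1'), (200, 'cat2');
""")

# A WHERE filter on c discards the NULL-extended rows a LEFT JOIN produces.
left_ids = [r[0] for r in conn.execute(
    "SELECT a.id FROM a LEFT JOIN c ON c.key = a.key "
    "WHERE c.category = 'cat1' ORDER BY a.id")]
inner_ids = [r[0] for r in conn.execute(
    "SELECT a.id FROM a INNER JOIN c ON c.key = a.key "
    "AND c.category = 'cat1' ORDER BY a.id")]

# All ids per date minus the ids that also have a matching row in b.
diff_rows = conn.execute("""
SELECT a.date,
       COUNT(DISTINCT a.id)
       - COUNT(DISTINCT CASE WHEN b.id IS NOT NULL THEN a.id END) AS diff
FROM a
INNER JOIN c ON c.key = a.key AND c.category = 'cat1'
LEFT JOIN b ON a.id = b.id AND a.date = b.date AND b.col1 = 'value1'
WHERE a.col1 = 'value2'
GROUP BY 1
ORDER BY 1 DESC
""").fetchall()

print(left_ids, inner_ids)
print(diff_rows)
```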
SELECT A.date, COUNT(DISTINCT A.id)
FROM A
INNER JOIN B
ON A.id = B.id AND A.date = B.date
AND B.col1 = 'value1'
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1
ORDER BY 1 DESC
UNION
SELECT A.date, COUNT(DISTINCT A.id)
FROM A
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1
ORDER BY 1 DESC
A simple way is to JOIN the two queries, using the date column, which is available in both queries :
SELECT x.date, x.cnt, y.cnt, y.cnt - x.cnt
FROM
(
SELECT A.date, COUNT(DISTINCT A.id) AS cnt
FROM A
INNER JOIN B ON A.id = B.id AND A.date = B.date AND B.col1 = 'value1'
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2' AND C.category = 'cat1'
GROUP BY 1
) AS x
INNER JOIN (
SELECT A.date, COUNT(DISTINCT A.id) AS cnt
FROM A
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2' AND C.category = 'cat1'
GROUP BY 1
) AS y ON x.date = y.date
ORDER BY 1 DESC
You might want to adapt the join type according to your data layout :
LEFT JOIN if all dates are available in the first subquery but may be missing in the second subquery
RIGHT JOIN if the situation is the other way around
FULL OUTER JOIN if you want all available dates from both ends
If you choose any of the above options, you will need to use COALESCE to prevent the subtraction from returning NULL when one of the terms is NULL.
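As a quick illustration of the COALESCE point, the sketch below uses two small tables, `x` and `y`, as hypothetical stand-ins for the results of the two aggregate subqueries (the counts are invented). With a LEFT JOIN, a date missing from `y` yields NULL for the plain subtraction, while the COALESCE version treats the missing count as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-ins for the per-date counts produced by the two subqueries.
CREATE TABLE x (date TEXT, cnt INTEGER);
CREATE TABLE y (date TEXT, cnt INTEGER);
INSERT INTO x VALUES ('2024-01-01', 3), ('2024-01-02', 5);
INSERT INTO y VALUES ('2024-01-01', 4);  -- no row for 2024-01-02
""")

coalesce_rows = conn.execute("""
SELECT x.date,
       x.cnt,
       y.cnt,
       y.cnt - x.cnt              AS raw_diff,   -- NULL when y has no row
       COALESCE(y.cnt, 0) - x.cnt AS safe_diff   -- missing count treated as 0
FROM x
LEFT JOIN y ON x.date = y.date
ORDER BY 1 DESC
""").fetchall()

for row in coalesce_rows:
    print(row)
```

Whether 0 is the right default for a missing date depends on what the counts mean in your data, so that choice is an assumption here.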

Query Logic best approach

I'm after the data obtained by my two queries plus any other data from the driving table. I'm using the following code but have a feeling my results are wrong.
select * from(
select * from tbl_a a
inner join tbl_b b on (a.id = b.id and a.col_a = b.col_b and a.col_c = '1')
union all
select * from tbl_a a
inner join tbl_b b on (a.col_a = b.col_b and a.col_c = '1')
where (1=1)
and a.id <> b.id
and a.start_time <= b.u_start_time
and a.end_time >= b.u_end_time
union all
select * from tbl_a a
where a.another_id
NOT IN ( -- either query above)
) results;
I'd just like to know if this makes sense or how I could possibly simplify some of this...
Here is a query for the first two unions; it is not clear what the condition for the third union is.
SELECT *
FROM
tbl_a a
left join tbl_b b on b.id = a.id and b.col_b = a.col_a
left join tbl_b b1 on a.col_a= b1.col_b and a.id<>b1.id and a.start_time<=b1.u_start_time and a.end_time>=b1.u_end_time
WHERE
a.col_c = '1'
and COALESCE(b.id,b1.id) is not null

sql - multiple layers of correlated subqueries

I have table A, B and C
I want to return all entries in table A that do not exist in table B and of that list do not exist in table C.
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
This gives me the first result: entries in A that are not in B. But now I want only those entries of this result that are also not in C.
I tried flavours of:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
where not exists (select 1 from table_C as c
where a.id = c.id)
But that isn't the correct logic. Is there a way to store the results from the first query and then select from that result the rows that do not exist in table C? I'm not sure how to do that. I appreciate the help.
Try this:
select * from (
select a.*, b.id as b_id, c.id as c_id
from table_A as a
left outer join table_B as b on a.id = b.id
left outer join table_C as c on c.id = a.id
) T
where b_id is null
and c_id is null
Another implementation is this:
select a1.*
from table_A as a1
inner join (
select a.id from table_A as a
except
select b.id from table_B as b
except
select c.id from table_C as c
) as a2 on a1.id = a2.id
Note the restrictions on the form of the sub-query as described here. The second implementation, by most succinctly and clearly describing the desired operation to SQL Server, is likely to be the most efficient.
You have two WHERE keywords in the outer part of your second query, which is not valid SQL. If you remove the second one, it should work as expected:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
not exists (select 1 from table_C as c -- WHERE removed
where a.id = c.id) ;
Tested in SQL-Fiddle (thanks, @Alexander)
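If you want to convince yourself that the approaches in this thread agree, here is a minimal sketch that runs three of them (double NOT EXISTS, LEFT JOIN ... IS NULL, and chained EXCEPT) against an in-memory SQLite database. The table contents are invented, and the ids are assumed to be unique in table_A; EXCEPT removes duplicates, so the forms could differ on duplicated ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A (id INTEGER);
CREATE TABLE table_B (id INTEGER);
CREATE TABLE table_C (id INTEGER);
-- Hypothetical data: id 2 is also in B, id 3 is also in C.
INSERT INTO table_A VALUES (1), (2), (3), (4);
INSERT INTO table_B VALUES (2);
INSERT INTO table_C VALUES (3);
""")

not_exists_sql = """
SELECT a.id FROM table_A AS a
WHERE NOT EXISTS (SELECT 1 FROM table_B AS b WHERE a.id = b.id)
  AND NOT EXISTS (SELECT 1 FROM table_C AS c WHERE a.id = c.id)
ORDER BY a.id
"""

left_join_sql = """
SELECT a.id FROM table_A AS a
LEFT JOIN table_B AS b ON a.id = b.id
LEFT JOIN table_C AS c ON a.id = c.id
WHERE b.id IS NULL AND c.id IS NULL
ORDER BY a.id
"""

# Evaluated left to right: (A EXCEPT B) EXCEPT C.
except_sql = """
SELECT id FROM table_A
EXCEPT
SELECT id FROM table_B
EXCEPT
SELECT id FROM table_C
ORDER BY id
"""

ids_not_exists = [r[0] for r in conn.execute(not_exists_sql)]
ids_left_join = [r[0] for r in conn.execute(left_join_sql)]
ids_except = [r[0] for r in conn.execute(except_sql)]
print(ids_not_exists, ids_left_join, ids_except)
```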
How about using LEFT JOIN?
SELECT a.*
FROM TableA a
LEFT JOIN TableB b
ON a.ID = b.ID
LEFT JOIN TableC c
ON a.ID = c.ID
WHERE b.ID IS NULL AND
c.ID IS NULL
One more option with NOT EXISTS operator
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM (SELECT b.ID
FROM dbo.test72 b
UNION ALL
SELECT c.ID
FROM dbo.test73 c) x
WHERE a.ID = x.ID
)
Option from @ypercube. Thanks for the present ;)
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM dbo.test72 b
WHERE a.ID = b.ID
UNION ALL
SELECT 1
FROM dbo.test73 c
WHERE a.ID = c.ID
);
I do not like "not exists", but if for some reason it seems more logical to you, you can alias your first query as a derived table and then apply another "not exists" clause to it. Something like:
SELECT * FROM
( select * from tableA as a
where not exists (select 1 from tableB as b
where a.id = b.id) )
AS A_NOT_IN_B
WHERE NOT EXISTS (
SELECT 1 FROM tableC as c
WHERE c.id = A_NOT_IN_B.id
)

Aliasing derived table which is a union of two selects

I can't get the syntax right for aliasing the derived table correctly:
SELECT * FROM
(SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
I'm getting a Duplicate column name of B_id. Any suggestions?
The problem isn't the union, it's the select a.*, b.* in each of the inner select statements - since a and b both have B_id columns, that means you have two B_id cols in the result.
You can fix that by changing the selects to something like:
select a.*, b.col_1, b.col_2 -- repeat for columns of b you need
In general, I'd avoid using select table1.* in queries you're using from code (rather than just interactive queries). If someone adds a column to the table, various queries can suddenly stop working.
In your derived table, you are retrieving the column B_id, which exists in both table a and table b, so you need to choose one of them or give them aliases:
SELECT * FROM
(SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
First, you could use UNION ALL instead of UNION. The two subqueries will have no common rows because of the mutually exclusive conditions on a.flag.
Another way you could write it is:
SELECT a.*, b.*
FROM a
INNER JOIN b
ON a.B_id = b.B_id
WHERE ( a.flag IS NULL
AND b.date < NOW()
)
OR
( a.flag IS NOT NULL
AND EXISTS
( SELECT *
FROM c
WHERE a.C_id = c.C_id
AND c.date < NOW()
)
)
ORDER BY RAND()
LIMIT 1
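As a sanity check, the UNION ALL form and the OR/EXISTS form can be compared on toy data. The sketch below uses an in-memory SQLite database with invented rows; because SQLite has no NOW(), a fixed cutoff date is passed in as a parameter instead, and the random ORDER BY ... LIMIT 1 is left out so the full result sets can be compared. It also selects explicit columns rather than a.*, b.*, and assumes C_id is unique in c (with duplicates, the inner join to c could repeat rows where EXISTS would not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (A_id INTEGER, B_id INTEGER, C_id INTEGER, flag TEXT);
CREATE TABLE b (B_id INTEGER, date TEXT);
CREATE TABLE c (C_id INTEGER, date TEXT);
-- Hypothetical data: one past and one future date in each of b and c.
INSERT INTO b VALUES (1, '2024-01-01'), (2, '2025-01-01');
INSERT INTO c VALUES (10, '2024-01-01'), (20, '2025-01-01');
INSERT INTO a VALUES
  (1, 1, 10, NULL),  -- flag NULL, b.date in the past       -> kept
  (2, 2, 10, NULL),  -- flag NULL, b.date in the future     -> dropped
  (3, 1, 10, 'x'),   -- flag set,  c.date in the past       -> kept
  (4, 2, 20, 'x'),   -- flag set,  c.date in the future     -> dropped
  (5, 2, 10, 'x');   -- flag set,  c.date in the past       -> kept
""")

params = {"cutoff": "2024-06-01"}  # stands in for NOW()

union_sql = """
SELECT a.A_id, b.B_id
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < :cutoff
UNION ALL
SELECT a.A_id, b.B_id
FROM a INNER JOIN b ON a.B_id = b.B_id
       INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < :cutoff
ORDER BY 1
"""

or_exists_sql = """
SELECT a.A_id, b.B_id
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE (a.flag IS NULL AND b.date < :cutoff)
   OR (a.flag IS NOT NULL
       AND EXISTS (SELECT 1 FROM c
                   WHERE a.C_id = c.C_id AND c.date < :cutoff))
ORDER BY 1
"""

rows_union = conn.execute(union_sql, params).fetchall()
rows_or_exists = conn.execute(or_exists_sql, params).fetchall()
print(rows_union)
print(rows_or_exists)
```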