How good is it to write a query like this?

select
a,
b,
(select x from table3 where id = z.id) as c,
d
from
table1 z, table2 zz
where z.id = zz.id;
I know that the query can be simplified easily like below:
select a,
b,
c.x,
d
from
table1 z, table2 zz, table3 c
where z.id = zz.id and z.id = c.id;
but I want to know what the performance impact is, or whether any extra execution happens in case 1, or whether they both perform the same. Asking just for knowledge.

If you want to use a correlated subquery (which is fine), then you should do:
select a, b,
(select t3.x from table3 t3 where t3.id = z.id) as c,
d
from table1 z join
table2 zz
on z.id = zz.id;
Important changes:
Qualify all column names (I don't know where a, b and d come from).
Use explicit join.
You can also write this query as:
select a, b, t3.x, d
from table1 z join
table2 zz
on z.id = zz.id left join
table3 t3
on t3.id = z.id;
This query is subtly different from the previous one. The previous one will return an error if the subquery returns more than one row; this one will put each such value in a separate row.
That said, the Oracle optimizer is quite good. I would be surprised if there were any noticeable performance difference.
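To make the difference concrete, here is a minimal sketch using SQLite (illustrative schema and data, not from the question). Note one caveat: where Oracle raises ORA-01427 when a scalar subquery returns more than one row, SQLite silently takes the first row, so the row-count difference is what this demo shows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1(id INTEGER, a TEXT);
CREATE TABLE table2(id INTEGER, b TEXT);
CREATE TABLE table3(id INTEGER, x TEXT);
INSERT INTO table1 VALUES (1, 'a1'), (2, 'a2');
INSERT INTO table2 VALUES (1, 'b1'), (2, 'b2');
-- id 1 has TWO matching rows in table3
INSERT INTO table3 VALUES (1, 'x1'), (1, 'x1bis'), (2, 'x2');
""")

# Correlated scalar subquery: always exactly one row per outer row.
scalar = con.execute("""
SELECT z.a, zz.b,
       (SELECT t3.x FROM table3 t3 WHERE t3.id = z.id) AS c
FROM table1 z JOIN table2 zz ON z.id = zz.id
""").fetchall()

# LEFT JOIN: duplicates in table3 multiply the result rows.
joined = con.execute("""
SELECT z.a, zz.b, t3.x AS c
FROM table1 z
JOIN table2 zz ON z.id = zz.id
LEFT JOIN table3 t3 ON t3.id = z.id
""").fetchall()

print(len(scalar), len(joined))  # 2 3
```

The scalar form stays at two rows; the join form grows to three because of the duplicate id in table3.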

The first query, with a correlated sub-query, will always return rows even if table3 is empty (c will simply be NULL). You need an outer join to get the same result:
select a,
b,
c.x,
d
from table1 z
join table2 zz on z.id = zz.id
left join table3 c on z.id = c.id;

Using explicit joins makes the query more readable, but the performance is the same:
select a,
b,
c.x,
d
from table1 z
join table2 zz on z.id = zz.id
join table3 c on z.id = c.id;

If your subquery is returning a single value based on a single input, it is a scalar subquery. A scalar subquery MIGHT improve performance for your query. It will do so under a couple of basic conditions. First, if z.id has a relatively low number of possible values. Scalar subquery processing will cache up to 254 values, if I recall. Second, if the rest of the query is returning a relatively high number of rows. In this case, if you only return a few rows, then the caching will not have an opportunity to help. But if you are returning a lot of rows, the caching benefits will build up.
Others have already highlighted how your original queries are not quite equivalent.
See more on scalar subqueries here -> Scalar Subqueries

Related

Postgresql LATERAL vs INNER JOIN

JOIN
SELECT *
FROM a
INNER JOIN (
SELECT b.id, Count(*) AS Count
FROM b
GROUP BY b.id ) AS b ON b.id = a.id;
LATERAL
SELECT *
FROM a,
LATERAL (
SELECT Count(*) AS Count
FROM b
WHERE a.id = b.id ) AS b;
I understand that the JOIN version computes the aggregate once and then merges it with the main query, whereas LATERAL runs the subquery once per row of the FROM clause.
It seems to me that if the join collapses several rows into one group it will be more efficient, but if the relationship is 1-to-1 then LATERAL will be. Is that right?
If I understand you right you are asking which of the two statements is more efficient.
You can test that yourself using EXPLAIN (ANALYZE), and I guess that the answer depends on the data:
If there are few rows in a, the LATERAL join will probably be more efficient if there is an index on b(id).
If there are many rows in a, the first query will probably be more efficient, because it can use a hash or merge join.
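The two forms do return the same result, which is what makes the comparison purely a performance question. SQLite has no LATERAL keyword, but a correlated scalar subquery computes the same per-row count, so a small sketch (illustrative tables `a` and `b`) can show the equivalence:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a(id INTEGER);
CREATE TABLE b(id INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1), (1), (2);
""")

# Derived-table form: aggregate b once, then join.
join_form = con.execute("""
SELECT a.id, b.cnt
FROM a
JOIN (SELECT b.id, COUNT(*) AS cnt FROM b GROUP BY b.id) AS b
  ON b.id = a.id
ORDER BY a.id
""").fetchall()

# Per-row form (what LATERAL computes): one correlated count per row of a.
lateral_like = con.execute("""
SELECT a.id, (SELECT COUNT(*) FROM b WHERE b.id = a.id) AS cnt
FROM a ORDER BY a.id
""").fetchall()

print(join_form)  # [(1, 2), (2, 1)]
```

Same rows either way; which plan wins depends on the row counts and indexes, exactly as described above.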

SQL Server double left join counts are different

Code:
Select a.x,
a.y,
b.p,
c.i
from table1 a left join table2 b on a.z=b.z
left join table3 c on a.z=c.z;
When I use the above code I am not getting the correct counts:
Table1 has 30 records.
After the first left join I get 30 records, but after the second left join I get 33 records.
I am having a hard time figuring out why the counts differ. According to my understanding, I should still get 30 records after the second left join.
Can anyone help me understand this difference?
I am using SQL Server 2012.
There are multiple rows in table3 with the same z value.
You can find them by doing:
select z, count(*)
from table3
group by z
having count(*) >= 2
order by count(*) desc;
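A small SQLite sketch (three rows instead of thirty, illustrative data) reproduces the inflation: one duplicated key in table3 is enough to push the count past the row count of table1:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1(z INTEGER);
CREATE TABLE table2(z INTEGER);
CREATE TABLE table3(z INTEGER);
INSERT INTO table1 VALUES (1), (2), (3);
INSERT INTO table2 VALUES (1), (2), (3);
INSERT INTO table3 VALUES (1), (1), (2);  -- z = 1 is duplicated
""")

after_first = con.execute("""
SELECT COUNT(*) FROM table1 a
LEFT JOIN table2 b ON a.z = b.z
""").fetchone()[0]

after_second = con.execute("""
SELECT COUNT(*) FROM table1 a
LEFT JOIN table2 b ON a.z = b.z
LEFT JOIN table3 c ON a.z = c.z
""").fetchone()[0]

print(after_first, after_second)  # 3 4
```

Three rows become four after the second join, the same way 30 became 33 in the question.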
If you want at most one match, then outer apply can be useful:
Select a.x, a.y, b.p, c.i
from table1 a outer apply
(select top 1 b.*
from table2 b
where a.z = b.z
) b outer apply
(select top 1 c.*
from table3 c
where a.z = c.z
) c;
Of course, top 1 should be used with order by, but I don't know which row you want. And, this is probably a stop-gap; you should figure out why there are duplicates.
Your table3 contains more than one row per matching row in table1. Check for a z value that occurs multiple times.
You can use GROUP BY with an aggregate function such as MAX to reduce table3 to one row per z before joining.
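OUTER APPLY is SQL Server-specific; a portable variant of the same idea is to left join a grouped derived table, which also caps the match at one row per key. A minimal SQLite sketch (illustrative columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1(z INTEGER, x TEXT);
CREATE TABLE table3(z INTEGER, i TEXT);
INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
INSERT INTO table3 VALUES (1, 'i1'), (1, 'i2');  -- duplicate z = 1
""")

# Collapse table3 to one row per z before joining,
# so the outer row count cannot be inflated.
rows = con.execute("""
SELECT a.x, c.i
FROM table1 a
LEFT JOIN (SELECT z, MAX(i) AS i FROM table3 GROUP BY z) c
  ON a.z = c.z
ORDER BY a.z
""").fetchall()

print(rows)  # [('a', 'i2'), ('b', None)]
```

Like TOP 1 without ORDER BY, MAX here picks an arbitrary-but-deterministic representative; the real fix is still to find out why the duplicates exist.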

How inefficient are virtual table JOINs?

Say I have a query like this, where I join a number of virtual tables:
SELECT table1.a, tbl2.a, tbl3.b, tbl4.c, tbl5.a, tbl6.a
FROM table1
JOIN (SELECT x, a, b, c FROM table2 WHERE foo='bar') tbl2 ON table1.x = tbl2.x
JOIN (SELECT x, a, b, c FROM table3 WHERE foo='bar') tbl3 ON table1.x = tbl3.x
JOIN (SELECT x, a, b, c FROM table4 WHERE foo='bar') tbl4 ON table1.x = tbl4.x
JOIN (SELECT x, a, b, c FROM table5 WHERE foo='bar') tbl5 ON table1.x = tbl5.x
JOIN (SELECT x, a, b, c FROM table6 WHERE foo='bar') tbl6 ON table1.x = tbl6.x
WHERE anotherconstraint='value'
In my real query, each JOIN has its own JOINs, aggregate functions, and WHERE constraints.
How well/poorly would a query like this run? Also, what is the impact difference between this and running all of the individual virtual tables as their own query and linking the results together outside of SQL?
There's nothing inherently bad about using inline views (which is AFAIK the correct term for what you call "virtual tables"). I do recommend learning to view and understand execution plans so you can investigate specific performance issues.
In general, I think it's a very bad idea to execute multiple single-table queries and then essentially join the results together in your front-end code. Doing joins is what an RDBMS is designed for, why re-write it?
Why not just:
SELECT table1.a, tbl2.a, tbl3.b, tbl4.c, tbl5.a, tbl6.a
FROM table1 JOIN table2 on table1.x = table2.x AND table2.foo = 'bar'
JOIN table3 on table1.x = table3.x AND table3.foo = 'bar'
JOIN table4 on table1.x = table4.x AND table4.foo = 'bar'
JOIN table5 on table1.x = table5.x AND table5.foo = 'bar'
JOIN table6 on table1.x = table6.x AND table6.foo = 'bar'
WHERE anotherconstraint='value';
EDIT:
How well would it run? Who knows? As @Vinko states, the answer lies in looking at the execution plan, perhaps supplying hints where appropriate. Something this complex cannot be answered by looking at a contrived example.
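On most engines the optimizer merges such inline views into the outer query anyway, so the two spellings are equivalent. A quick SQLite sketch (illustrative schema, pared down to two of the tables) showing both forms return the same rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1(x INTEGER, a TEXT);
CREATE TABLE table2(x INTEGER, a TEXT, foo TEXT);
INSERT INTO table1 VALUES (1, 't1'), (2, 't1b');
INSERT INTO table2 VALUES (1, 't2', 'bar'), (2, 't2b', 'baz');
""")

# Inline-view form: filter table2 in a derived table.
inline = con.execute("""
SELECT table1.a, tbl2.a
FROM table1
JOIN (SELECT x, a FROM table2 WHERE foo = 'bar') tbl2
  ON table1.x = tbl2.x
""").fetchall()

# Flattened form: same filter moved into the join condition.
flat = con.execute("""
SELECT table1.a, table2.a
FROM table1
JOIN table2 ON table1.x = table2.x AND table2.foo = 'bar'
""").fetchall()

print(inline == flat)  # True
```

Identical results; whether the plans differ is exactly what EXPLAIN would tell you on your engine.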

Join a table only if result set > 0

I have a table A joined with a table B which give me a result set.
I want to join a table C to the previous ones in order to restrict the result set. But in case there is no result with this join, I would like to have the same result set than before (without taking care of C).
Can you think of way to do that in SQL ?
SELECT *
FROM TableA
INNER JOIN TableB
ON TableA.ID = TableB.TableAID
LEFT JOIN TableC
ON TableC.ID = TableB.TableCID
This will return all rows from Tables A & B but only the rows from TableC where the ON criteria match.
Otherwise conditional joins don't really apply in standard SQL. If you are using SQL Server you can use some stored-procedure logic to check the results from TableC and, if there are none, only get data from Tables A & B. But this approach will be provider-specific.
Not possible with regular SQL alone, since it involves conditional logic.
Your best bet is to make a small script, e.g. (in pseudo code)
select * into #tmp from x inner join y inner join z where blabla;
if (exists (select * from #tmp))
BEGIN
select * from #tmp
END
else
BEGIN
select * from x inner join y where blabla;
END
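The same fallback logic can live in application code instead of a temp table. A minimal Python/SQLite sketch (hypothetical tables A, B, C; C deliberately has no matches so the fallback fires):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A(id INTEGER, v TEXT);
CREATE TABLE B(aid INTEGER, cid INTEGER);
CREATE TABLE C(id INTEGER);
INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
INSERT INTO B VALUES (1, 10), (2, 20);
-- C is empty, so the restricted query returns nothing
""")

# Try the restricted query (A join B join C) first.
restricted = con.execute("""
SELECT A.v FROM A
JOIN B ON A.id = B.aid
JOIN C ON C.id = B.cid
ORDER BY A.id
""").fetchall()

if restricted:
    result = restricted
else:  # fall back to the unrestricted A-B result set
    result = con.execute("""
    SELECT A.v FROM A JOIN B ON A.id = B.aid ORDER BY A.id
    """).fetchall()

print(result)  # [('a1',), ('a2',)]
```

This runs two queries in the worst case, which is the price of the conditional; the LEFT JOIN flag approach below avoids that at the cost of changing the reading code.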
Edit:
But if I were you, I would just always join with C using a LEFT JOIN, so you can see if the result was in one or the other result set...
e.g.
select x.*, y.*, case when z.id is null then 0 else 1 end from x inner join y left join z on blabla where blabla;
But that of course assumes you are able to alter the code path that reads the result.
I see a problem with the LEFT/OUTER JOIN methods: you could get some results that are in A and B but not in C. If I understand correctly, the purpose is to join AB with C, meaning the result when crossing with C must satisfy all three restrictions. So @Cine's solution is the appropriate one for this case.

can this be written with an outer join

The requirement is to copy rows from Table B into Table A. Only rows with an id that doesn't already exist, need to be copied over:
INSERT INTO A(id, x, y)
SELECT id, x, y
FROM B b
WHERE b.id NOT IN (SELECT id FROM A WHERE x='t');
^^^^^^^^^^^
Now, I was trying to write this with an outer join to compare the explain paths, but I can't write this (efficiently at least).
Note that the sql highlighted with ^'s make this tricky.
try
INSERT INTO A(id, x, y)
SELECT b.id, b.x, b.y
FROM TableB b
Left Join TableA a
On a.Id = b.Id
And a.x = 't'
Where a.Id Is Null
But I prefer the subquery representation as I think it more clearly expresses what you are doing.
Why are you not happy with what you have? If you check your explain plan, I promise you it says that an anti-join is performed, if the optimizer thinks that is the most efficient way (which it most likely will).
For everyone who reads this: SQL is not what actually is executed. SQL is a way of telling the database what you want, not what to do. All decent databases will be able to treat NOT EXISTS and NOT IN as equal (when they are, ie. there are no null values) and perform an anti-join. The trick with an outer join and an IS NULL condition doesn't work on SQL Server, though (SQL Server is not clever enough to transform it to an antijoin).
Your query will perform better than the query with outer join.
I guess the following query will do the job:
INSERT INTO A(id, x, y)
SELECT b.id, b.x, b.y
FROM B b
LEFT JOIN A a
ON b.id = a.id AND a.x = 't'
WHERE a.id IS NULL;
INSERT INTO A (id, x, y)
SELECT
B.id, B.x, B.y
FROM
B
WHERE
NOT EXISTS (SELECT * FROM A WHERE B.id = A.id AND A.x = 't');
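All three antijoin spellings return the same rows when there are no NULLs, and the NULL caveat mentioned above is easy to trip over with NOT IN. A SQLite sketch (illustrative data; id 1 exists in A with x = 't', id 2 does not):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A(id INTEGER, x TEXT, y TEXT);
CREATE TABLE B(id INTEGER, x TEXT, y TEXT);
INSERT INTO A VALUES (1, 't', 'ya1');
INSERT INTO B VALUES (1, 'bx', 'by'), (2, 'bx2', 'by2');
""")

not_in = con.execute("""
SELECT id FROM B
WHERE id NOT IN (SELECT id FROM A WHERE x = 't')
""").fetchall()

not_exists = con.execute("""
SELECT id FROM B b
WHERE NOT EXISTS (SELECT 1 FROM A a WHERE a.id = b.id AND a.x = 't')
""").fetchall()

left_join = con.execute("""
SELECT b.id FROM B b
LEFT JOIN A a ON a.id = b.id AND a.x = 't'
WHERE a.id IS NULL
""").fetchall()

print(not_in, not_exists, left_join)  # all [(2,)]

# The NULL trap: one NULL id in the subquery empties NOT IN,
# while NOT EXISTS is unaffected.
con.execute("INSERT INTO A VALUES (NULL, 't', NULL)")
not_in_with_null = con.execute("""
SELECT id FROM B
WHERE id NOT IN (SELECT id FROM A WHERE x = 't')
""").fetchall()
print(not_in_with_null)  # []
```

This is exactly why NOT IN and NOT EXISTS are only interchangeable when the subquery column is known NOT NULL.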