Why does an INSERT take time when using subqueries - sql

INSERT INTO TableA
SELECT
x,
y,
z
FROM TableB
WHERE x IN
(select DISTINCT x
FROM TableC
WHERE x NOT IN
(SELECT DISTINCT x from TableD)
)
This query takes forever and it doesn't complete.
When I run the each select query it works fine but when I run it all it takes forever? Can you see the reason?

Try this query :
insert into TableA
select b.*
from TableB b --with(nolock)
left outer join TableC c --with(nolock)
on b.x = c.x
left outer join TableD d --with(nolock)
on c.x = d.x
where c.x is not null and d.x is null
if it is also running infinite then uncomment with(nolock) and try again. if it does not work then check estimated execution plan.

Firstly you need to look at the execution plan for the query - it might tell you where the bottlenecks are or if there are missing indexes that would significantly speed up your query - I think this is likely as your query is simple so I don't see why it would take so long;
I think you could also restructure you query so that it uses Joins instead of the not in - it would help if i knew the data to see if this produced the same results, But i think it should;
SELECT B.x,
B.y,
B.Z
FROM TableB B
INNER JOIN --where in
(
SELECT DISTINCT x
FROM TableC c
LEFT JOIN TableD d
ON c.x = d.x
WHERE d.x IS NULL -- c x not in d x
) sub
on B.x = sub.x

Subqueries and DISTINCT when not needed are notoriously poor for performance. You can accomplish what you need using JOINs.
SELECT b.x, b.y, b.z
FROM TableB b
INNER JOIN TableC c ON c.x=b.x
LEFT JOIN TableD d ON d.x=b.x
WHERE d.x IS NULL
GROUP BY b.x, b.y, b.z -- only if you have duplicates and need unique records
The INNER JOIN on TableC fixes your 1st "IN", then the LEFT JOIN and d.x IS NULL fixes your "NOT IN" clause.
Lastly, make sure that you have indexes on the "x" column in each table.
CREATE INDEX IX_TableB_X ON TableB (X);
CREATE INDEX IX_TableC_X ON TableC (X);
CREATE INDEX IX_TableD_X ON TableD (X);

Related

Does the order of columns in a SQL join statement matter? [duplicate]

Disregarding performance, will I get the same result from query A and B below? How about C and D?
----- Scenario 1:
-- A (left join)
select *
from a left join b
on <blahblah>
left join c
on <blahblan>
-- B (left join)
select *
from a left join c
on <blahblah>
left join b
on <blahblan>
----- Scenario 2:
-- C (inner join)
select *
from a join b
on <blahblah>
join c
on <blahblan>
-- D (inner join)
select *
from a join c
on <blahblah>
join b
on <blahblan>
For INNER joins, no, the order doesn't matter. The queries will return same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.*.
For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters - and (updated) things are much more complicated.
First, outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a
Outer joins are not associative either, so in your examples which involve both (commutativity and associativity) properties:
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
is equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
but:
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id
is not equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id
Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:
a LEFT JOIN b
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
This is equivalent to a LEFT JOIN (b LEFT JOIN c):
a LEFT JOIN
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition
only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.
You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.
If however, any of these involved IS NULL or a function that is related to nulls like COALESCE(), for example if the condition was b.ab_id IS NULL, then the two queries would not be equivalent.
If you try joining C on a field from B before joining B, i.e.:
SELECT A.x,
A.y,
A.z
FROM A
INNER JOIN C
on B.x = C.x
INNER JOIN B
on A.x = B.x
your query will fail, so in this case the order matters.
for regular Joins, it doesn't. TableA join TableB will produce the same execution plan as TableB join TableA (so your C and D examples would be the same)
for left and right joins it does. TableA left Join TableB is different than TableB left Join TableA, BUT its the same than TableB right Join TableA
Oracle optimizer chooses join order of tables for inner join.
Optimizer chooses the join order of tables only in simple FROM clauses .
U can check the oracle documentation in their website.
And for the left, right outer join the most voted answer is right.
The optimizer chooses the optimal join order as well as the optimal index for each table. The join order can affect which index is the best choice. The optimizer can choose an index as the access path for a table if it is the inner table, but not if it is the outer table (and there are no further qualifications).
The optimizer chooses the join order of tables only in simple FROM clauses. Most joins using the JOIN keyword are flattened into simple joins, so the optimizer chooses their join order.
The optimizer does not choose the join order for outer joins; it uses the order specified in the statement.
When selecting a join order, the optimizer takes into account:
The size of each table
The indexes available on each table
Whether an index on a table is useful in a particular join order
The number of rows and pages to be scanned for each table in each join order

T-SQL: Wild Card as table name alias in select statement possible?

This is more of a curiosity than an actual applied question. Say you have a statement with multiple joins such as:
SELECT
a.name,
b.salary,
c.x
FROM
[table1] a
INNER JOIN [table2] b
ON a.key = b.key
INNER JOIN [table3] c
ON b.key = c.key
Now, say you were to make several more joins to other tables whose schema was unfamiliar, however you know:
the keys on which to make the join
that several of those tables has a column with the the name 'x'.
Is it possible to select 'x' from all tables that contain it, without explicitly referring to the table alias. So it would ave a similar results as this (if it were possible)
SELECT
a.name,
b.salary,
*.x
...
No this isn't possible.
You can use a.* to get all columns from a but it is not valid to use a wildcard as the table name.
#Martin Smith is correct that you can't use *.x and refer to columns from multiple tables. There is however a way to write a query that shows all columns x from tables where they exist without breaking if one or more of the tables do not have such column. It's a rather complicated way that (mis)uses scope resolution.
Lets say that some of the tables (b and d in the example) have a column named x, while some others (c here) do not have such column. Then you can replace INNER joins with CROSS APPLY and LEFT joins with OUTER APPLY and a query with:
SELECT
a.name,
a.salary,
b.x AS bx,
'WITHOUT column x' AS cx,
d.x AS dx
FROM
a
INNER JOIN b
ON a.aid = b.aid
LEFT JOIN c
ON a.aid = c.aid
LEFT JOIN d
ON a.aid = d.aid ;
would be written as:
SELECT
a.name,
a.salary,
bx,
cx,
dx
FROM
( SELECT a.*,
'WITHOUT column x' AS x
FROM a
) a
CROSS APPLY
( SELECT x AS bx
FROM b
WHERE a.aid = b.aid
) b
OUTER APPLY
( SELECT x AS cx
FROM c
WHERE a.aid = c.aid
) c
OUTER APPLY
( SELECT x AS dx
FROM d
WHERE a.aid = d.aid
) d ;
Tested at SQL-Server 2008: SQL-Fiddle

SQL Case With Many Columns

Hi I have looked through the "Case with multiple columns" questions and don't see something the same as this so I think I should ask.
Basically I have two tables (both are the result of a subquery) which I want to join. They have the same column names. If I join them on their ids and SELECT * I get each row being something like this:
A.id, A.x, A.y, A.z, A.num, B.id, B.x, B.y, B.z, B.num
What I want is a way to only select the columns of the table with the lower value of num. So in this case the result table would always have 5 columns, id, x, y, z, num, and I don't care which table id, x, y, z, num came from after the fact. Also either table result is fine if they are equal.
SELECT CASE WHEN A.num < B.num THEN A.* ELSE B.* END FROM A JOIN B ON A.id=B.id
would be perfect but you can only return one column in a CASE statement, and I could use a CASE for every column but that seems so wasteful (there are 8 in each table in my actual database so I would have 8 CASE statements).
This is SQLite btw. Any help would be appreciated!
Edit for more info on A and B:
A and B come from Queries like this
SELECT "thought case statement might go here" FROM
(SELECT id, x, y, z, num FROM Table1 a JOIN Table2 b ON a.id=b.id AND (y BETWEEN (53348574-3593) AND (53348574+3593)) AND (z BETWEEN (-6259973-6027) AND (-6259973+6027)) JOIN Table3 c ON c.id= b.id GROUP BY a.id, c.r) A
JOIN
(SELECT id, x, y, z, num FROM Table1 a JOIN Table2 b ON a.id=b.id AND (y BETWEEN (53401007-3593) AND (53401007+3593)) AND (z BETWEEN (-6397286-6027) AND (-6397286+6027)) JOIN Table3 c ON c.id= b.id GROUP BY a.id, c.r ) B ON A.id=B.id
So it joins two tables, made based on geolocation if you're wondering why the big numbers, and needs to decide which of the tables to take its data from based on attributes of what it finds in either of the locations.
try
select A.id, A.x, A.y, A.z, A.num from A JOIN B ON A.id=B.id where a.num<b.num
union
select b.id, b.x, b.y, b.z, b.num from A JOIN B ON A.id=B.id where b.num<a.numhere
I don't know of any RDBMS supporting what you want. You will have to write 8 CASE statements. But why is it so wasteful? Or are you just lazy? :)
Edit:
See, when you write SELECT * ... what your RDBMS is doing is to query system tables (information_schema and so on) and get the list of columns in the table.
So when you write
SELECT CASE WHEN A.num < B.num THEN A.* ELSE B.* END ...
you basically write
SELECT CASE WHEN A.num < B.num THEN A.num, A.whatever, A.more, ... ELSE B.num, B.whatever, B.more... END FROM A JOIN B ON A.id=B.id
and this is unfortunately wrong syntax.

Does the join order matter in SQL?

Disregarding performance, will I get the same result from query A and B below? How about C and D?
----- Scenario 1:
-- A (left join)
select *
from a left join b
on <blahblah>
left join c
on <blahblan>
-- B (left join)
select *
from a left join c
on <blahblah>
left join b
on <blahblan>
----- Scenario 2:
-- C (inner join)
select *
from a join b
on <blahblah>
join c
on <blahblan>
-- D (inner join)
select *
from a join c
on <blahblah>
join b
on <blahblan>
For INNER joins, no, the order doesn't matter. The queries will return same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.*.
For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters - and (updated) things are much more complicated.
First, outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a
Outer joins are not associative either, so in your examples which involve both (commutativity and associativity) properties:
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
is equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
but:
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id
is not equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id
Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:
a LEFT JOIN b
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
This is equivalent to a LEFT JOIN (b LEFT JOIN c):
a LEFT JOIN
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition
only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.
You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.
If however, any of these involved IS NULL or a function that is related to nulls like COALESCE(), for example if the condition was b.ab_id IS NULL, then the two queries would not be equivalent.
If you try joining C on a field from B before joining B, i.e.:
SELECT A.x,
A.y,
A.z
FROM A
INNER JOIN C
on B.x = C.x
INNER JOIN B
on A.x = B.x
your query will fail, so in this case the order matters.
for regular Joins, it doesn't. TableA join TableB will produce the same execution plan as TableB join TableA (so your C and D examples would be the same)
for left and right joins it does. TableA left Join TableB is different than TableB left Join TableA, BUT its the same than TableB right Join TableA
Oracle optimizer chooses join order of tables for inner join.
Optimizer chooses the join order of tables only in simple FROM clauses .
U can check the oracle documentation in their website.
And for the left, right outer join the most voted answer is right.
The optimizer chooses the optimal join order as well as the optimal index for each table. The join order can affect which index is the best choice. The optimizer can choose an index as the access path for a table if it is the inner table, but not if it is the outer table (and there are no further qualifications).
The optimizer chooses the join order of tables only in simple FROM clauses. Most joins using the JOIN keyword are flattened into simple joins, so the optimizer chooses their join order.
The optimizer does not choose the join order for outer joins; it uses the order specified in the statement.
When selecting a join order, the optimizer takes into account:
The size of each table
The indexes available on each table
Whether an index on a table is useful in a particular join order
The number of rows and pages to be scanned for each table in each join order

can this be written with an outer join

The requirement is to copy rows from Table B into Table A. Only rows with an id that doesn't already exist, need to be copied over:
INSERT INTO A(id, x, y)
SELECT id, x, y
FROM B b
WHERE b.id IS NOT IN (SELECT id FROM A WHERE x='t');
^^^^^^^^^^^
Now, I was trying to write this with an outer join to compare the explain paths, but I can't write this (efficiently at least).
Note that the sql highlighted with ^'s make this tricky.
try
INSERT INTO A(id, x, y)
SELECT id, x, y
FROM TableB b
Left Join TableA a
On a.Id = b.Id
And a.x = 't'
Where a.Id Is Null
But I prefer the subquery representation as I think it more clearly expresses what you are doing.
Why are you not happy with what you have? If you check your explain plan, I promise you it says that an anti-join is performed, if the optimizer thinks that is the most efficient way (which it most likely will).
For everyone who reads this: SQL is not what actually is executed. SQL is a way of telling the database what you want, not what to do. All decent databases will be able to treat NOT EXISTS and NOT IN as equal (when they are, ie. there are no null values) and perform an anti-join. The trick with an outer join and an IS NULL condition doesn't work on SQL Server, though (SQL Server is not clever enough to transform it to an antijoin).
Your query will perform better than the query with outer join.
I guess the following query will do the job:
INSERT INTO A(id, x, y)
SELECT id, x, y
FROM B b
LEFT JOIN A a
ON b.id = a.id AND NOT a.x='t'
INSERT INTO A (id, x, y)
SELECT
B.id, B.x, B.y
FROM
B
WHERE
NOT EXISTS (SELECT * FROM A WHERE B.id = A.id AND A.x = 't')