How inefficient are virtual table JOINs? - sql

Say I have a query like this, where I join a number of virtual tables:
SELECT table1.a, tbl2.a, tbl3.b, tbl4.c, tbl5.a, tbl6.a
FROM table1
JOIN (SELECT x, a, b, c FROM table2 WHERE foo='bar') tbl2 ON table1.x = tbl2.x
JOIN (SELECT x, a, b, c FROM table3 WHERE foo='bar') tbl3 ON table1.x = tbl3.x
JOIN (SELECT x, a, b, c FROM table4 WHERE foo='bar') tbl4 ON table1.x = tbl4.x
JOIN (SELECT x, a, b, c FROM table5 WHERE foo='bar') tbl5 ON table1.x = tbl5.x
JOIN (SELECT x, a, b, c FROM table6 WHERE foo='bar') tbl6 ON table1.x = tbl6.x
WHERE anotherconstraint='value'
In my real query, each JOIN has its own JOINs, aggregate functions, and WHERE constraints.
How well/poorly would a query like this run? Also, what is the impact difference between this and running all of the individual virtual tables as their own query and linking the results together outside of SQL?

There's nothing inherently bad about using inline views (which is AFAIK the correct term for what you call "virtual tables"). I do recommend learning to view and understand execution plans so you can investigate specific performance issues.
In general, I think it's a very bad idea to execute multiple single-table queries and then essentially join the results together in your front-end code. Doing joins is what an RDBMS is designed for, so why rewrite it?

Why not just:
SELECT table1.a, table2.a, table3.b, table4.c, table5.a, table6.a
FROM table1 JOIN table2 on table1.x = table2.x AND table2.foo = 'bar'
JOIN table3 on table1.x = table3.x AND table3.foo = 'bar'
JOIN table4 on table1.x = table4.x AND table4.foo = 'bar'
JOIN table5 on table1.x = table5.x AND table5.foo = 'bar'
JOIN table6 on table1.x = table6.x AND table6.foo = 'bar'
WHERE anotherconstraint='value';
EDIT:
How well would it run? Who knows? As @Vinko states, the answer lies in looking at the execution plan, perhaps supplying hints where appropriate. Something this complex cannot be answered by looking at a contrived example.
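As a rough illustration of the two answers above, here is a minimal sketch that runs the inline-view form and the plain-join form against the same data and prints each execution plan. It uses SQLite through Python's sqlite3 module purely as a stand-in, since the question does not name the RDBMS; the table contents are made up, and the plan text will look different on other engines and versions.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, a TEXT);
    CREATE TABLE table2 (x INTEGER, a TEXT, foo TEXT);
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO table2 VALUES (1, 'p', 'bar'), (2, 'q', 'baz'), (3, 'r', 'bar');
""")

# Form 1: join against an inline view, as in the question.
inline_view_sql = """
    SELECT table1.a, tbl2.a
    FROM table1
    JOIN (SELECT x, a FROM table2 WHERE foo = 'bar') tbl2 ON table1.x = tbl2.x
    ORDER BY table1.x
"""

# Form 2: plain join with the filter moved into the ON clause.
plain_join_sql = """
    SELECT table1.a, table2.a
    FROM table1
    JOIN table2 ON table1.x = table2.x AND table2.foo = 'bar'
    ORDER BY table1.x
"""

inline_rows = conn.execute(inline_view_sql).fetchall()
plain_rows = conn.execute(plain_join_sql).fetchall()
print(inline_rows == plain_rows)  # both forms return the same rows here

# Print each plan; the exact wording depends on the SQLite version.
for label, sql in (("inline view", inline_view_sql), ("plain join", plain_join_sql)):
    print(label)
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", step)
```

Comparing the two printed plans on your own schema and data is the only reliable way to tell whether the inline views cost anything.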

Related

How good is it to write a query like this?

select
a,
b,
(select x from table3 where id = z.id) as c,
d
from
table1 z, table2 zz
where z.id = zz.id;
I know that the query can be simplified easily like below:
select a,
b,
c.x,
d
from
table1 z, table2 zz, table3 c
where z.id = zz.id and z.id = c.id;
But I want to know: is there a performance impact or extra execution in the first case, or do they both have the same performance? Asking just for knowledge.
If you want to use a correlated subquery (which is fine), then you should do:
select a, b,
(select t3.x from table3 t3 where t3.id = z.id) as c,
d
from table1 z join
table2 zz
on z.id = zz.id;
Important changes:
Qualify all column names (I don't know where a, b and d come from).
Use explicit join.
You can also write this query as:
select a, b, t3.x, d
from table1 z join
table2 zz
on z.id = zz.id left join
table3 t3
on t3.id = z.id;
This query is subtly different from the previous one. The previous one will return an error if the subquery returns more than one row. This one will put each such value in a different row.
That said, the Oracle optimizer is quite good. I would be surprised if there were any noticeable performance difference.
The first query, with a correlated sub-query, will always return data even if table3 is empty. You need an outer join to get the same result:
select a,
b,
c.x,
d
from table1 z
join table2 zz on z.id = zz.id
left join table3 c on z.id = c.id
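The difference between the three forms can be checked on a tiny data set. This is a sketch using SQLite via Python's sqlite3 module as a stand-in for Oracle, with made-up rows where table3 has no match for id 2. It covers only the missing-match case; how an engine reacts to a scalar subquery that returns several rows varies, so that case is left out.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, a TEXT);
    CREATE TABLE table2 (id INTEGER, b TEXT);
    CREATE TABLE table3 (id INTEGER, x TEXT);
    INSERT INTO table1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO table2 VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO table3 VALUES (1, 'x1');  -- deliberately no row for id = 2
""")

# Correlated scalar subquery: every joined row survives, c is NULL when unmatched.
subquery_rows = conn.execute("""
    SELECT z.a, zz.b, (SELECT t3.x FROM table3 t3 WHERE t3.id = z.id) AS c
    FROM table1 z JOIN table2 zz ON z.id = zz.id
    ORDER BY z.id
""").fetchall()

# Inner join to table3: rows without a match are dropped.
inner_rows = conn.execute("""
    SELECT z.a, zz.b, c.x
    FROM table1 z
    JOIN table2 zz ON z.id = zz.id
    JOIN table3 c ON z.id = c.id
    ORDER BY z.id
""").fetchall()

# Left join to table3: same rows as the scalar subquery version.
left_rows = conn.execute("""
    SELECT z.a, zz.b, c.x
    FROM table1 z
    JOIN table2 zz ON z.id = zz.id
    LEFT JOIN table3 c ON z.id = c.id
    ORDER BY z.id
""").fetchall()

print(subquery_rows)  # [('a1', 'b1', 'x1'), ('a2', 'b2', None)]
print(inner_rows)     # [('a1', 'b1', 'x1')]
print(left_rows)      # [('a1', 'b1', 'x1'), ('a2', 'b2', None)]
```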
Using a join makes the query more readable, but the performance is the same:
select a,
b,
c.x,
d
from table1 z
join table2 zz on z.id = zz.id
join table3 c on z.id = c.id;
If your subquery is returning a single value based on a single input, it is a scalar subquery. A scalar subquery MIGHT improve performance for your query. It will do so under a couple of basic conditions. First, if z.id has a relatively low number of possible values. Scalar subquery processing will cache up to 254 values, if I recall. Second, if the rest of the query is returning a relatively high number of rows. In this case, if you only return a few rows, then the caching will not have an opportunity to help. But if you are returning a lot of rows, the caching benefits will build up.
Others have already highlighted how your original queries are not quite equivalent.
See more on scalar subqueries here -> Scalar Subqueries

Use Temporary Tables or Nested Select in retrieving data from multiple table?

I've got a query similar to the one below, where data is retrieved from multiple tables. The problem is that when it has to retrieve a lot of data, the process will definitely be slow. Would it be better or more efficient to use nested selects or a temp table to optimize my select statement? And how should I be grouping my joins?
Select a.Name,
b.type,
c.color,
d.group,
e.location,
f.quantity,
g.cost
from Table1 a
INNER JOIN Table2 b ON a.ID=b.ItemCode
INNER JOIN TABLE3 c ON b.ItemCOde = c.groupID
INNER JOIN TABLE4 d ON c.groupID = d.batchID
LEFT JOIN TABLE5 e ON d.batchID = e.PostalID
LEFT JOIN TABLE6 f ON e.PostalID = f.CountID
LEFT JOIN TABLE7 g ON f.CountID = g.InventoryNo
The order of the joins could be important: start with the most selective table(s) and continue with the least selective table(s).
Nested queries vs. temp tables: it's an old dilemma and there is no "magic" solution. In some cases a temp table can improve performance. The truth is that every query is a different story. Try both solutions and analyze the query execution plans.
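To try both approaches side by side, something like the following sketch can serve as a starting point. It uses SQLite through Python's sqlite3 module as a stand-in for the real database, and the items, stock, and stock_totals tables are hypothetical. It only shows that the two forms return the same rows; which one is faster has to be measured on the real data.

```python
import sqlite3

# Assumption: hypothetical items/stock tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, name TEXT);
    CREATE TABLE stock (item_id INTEGER, quantity INTEGER);
    INSERT INTO items VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO stock VALUES (1, 5), (1, 7), (2, 3);
""")

# Option 1: nested select (inline view) computed as part of the main query.
nested_rows = conn.execute("""
    SELECT i.name, t.total
    FROM items i
    JOIN (SELECT item_id, SUM(quantity) AS total
          FROM stock
          GROUP BY item_id) t ON t.item_id = i.id
    ORDER BY i.id
""").fetchall()

# Option 2: materialize the intermediate result in a temp table, then join to it.
conn.execute("""
    CREATE TEMP TABLE stock_totals AS
    SELECT item_id, SUM(quantity) AS total
    FROM stock
    GROUP BY item_id
""")
temp_rows = conn.execute("""
    SELECT i.name, t.total
    FROM items i
    JOIN stock_totals t ON t.item_id = i.id
    ORDER BY i.id
""").fetchall()

print(nested_rows)  # [('bolt', 12), ('nut', 3)]
print(temp_rows)    # [('bolt', 12), ('nut', 3)]
```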
This might work. Note that it treats every join as an inner join, unlike the LEFT JOINs in the question:
Select a.Name, b.type, c.color, d.group, e.location, f.quantity, g.cost
from Table1 a, Table2 b, TABLE3 c, TABLE4 d, TABLE5 e, TABLE6 f, TABLE7 g
where a.ID = b.ItemCode and b.ItemCode = c.groupID and c.groupID = d.batchID
and d.batchID = e.PostalID
and e.PostalID = f.CountID and f.CountID = g.InventoryNo;

Is it possible to use IF or CASE in sql FROM statement

I have a long stored procedure and I would like to make a slight modification to the procedure without having to create a new one(for maintenance purposes).
Is it possible to use an IF or CASE in the FROM clause of the select statement to join other tables?
Like this:
from tableA a
join tableB b on a.indexed = b.indexed
IF @Param='Y'
BEGIN
join tableC c on a.indexed = c.indexed
END
It didn't seem to work for me. But I am wondering if this is even possible and/or if this even makes sense to do.
Thanks.
No, it is not possible. You can only accomplish this through the use of dynamic SQL.
The Curse and Blessings of Dynamic SQL
An Intro to Dynamic SQL
I would not advise using dynamic SQL. There are most likely better ways to perform this operation, but you would have to provide more info.
You can achieve something like it if you have a left outer join
Consider
declare @param bit = 1
select a.*, b.*, c.* from a
inner join b on a.id = b.a_id
left outer join c on b.id = c.b_id and @param = 1
This will return all columns from a, b, c.
Now try with
declare @param bit = 0
This will return all columns from a and b, and nulls for columns of c.
It won't work if both joins are inner.
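The behaviour described above can be reproduced with a small sketch. It uses SQLite via Python's sqlite3 module rather than SQL Server, so the T-SQL variable becomes a bound parameter named :param; the single row in each table and the run helper are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; :param replaces the T-SQL variable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, a_id INTEGER);
    CREATE TABLE c (id INTEGER, b_id INTEGER);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (10, 1);
    INSERT INTO c VALUES (100, 10);
""")

SQL = """
    SELECT a.id, b.id, c.id
    FROM a
    INNER JOIN b ON a.id = b.a_id
    LEFT OUTER JOIN c ON b.id = c.b_id AND :param = 1
"""

def run(param):
    """Run the query with the switch on (1) or off (0)."""
    return conn.execute(SQL, {"param": param}).fetchall()

with_c = run(1)     # the join to c matches, so c.id is filled in
without_c = run(0)  # the ON condition is never true, so c.id is NULL

print(with_c)     # [(1, 10, 100)]
print(without_c)  # [(1, 10, None)]
```

The row from a and b survives in both cases because the join to c is an outer join; with an inner join the second call would return no rows at all.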
No, this is not possible. Your best bet would probably be to select from both tables and only include the data you care about. If you provide an example of what you are trying to do, I can provide a better answer.
Attempt at an example:
SELECT t1.id, COALESCE(t2.name, t3.name)
FROM Table1 as t1
LEFT JOIN Table2 as t2
ON t1.id = t2.id
LEFT JOIN Table3 as t3
ON t1.id = t3.id
While what you proposed is not possible, you can play with your join conditions:
from tableA a
inner join tableB b ON a.indexed = b.indexed
left join tableC c ON a.indexed = c.indexed AND 1 = CASE @Param WHEN 'Y' THEN 1 ELSE 0 END
It would be more performant to just use one big IF:
IF @Param = 'Y'
from tableA a
inner join tableB b ON a.indexed = b.indexed
left join tableC c ON a.indexed = c.indexed
ELSE
from tableA a
inner join tableB b ON a.indexed = b.indexed
You haven't revealed your SELECT clause. The essence of what you want is as follows:
SELECT indexed
FROM tableA
INTERSECT
SELECT indexed
FROM tableB
INTERSECT
SELECT indexed
FROM tableC
WHERE @Param = 'Y'
Then use this table expression as dictated by your SELECT clause e.g. say you only want to project tableA:
WITH T
AS
(
SELECT indexed
FROM tableA
INTERSECT
SELECT indexed
FROM tableB
INTERSECT
SELECT indexed
FROM tableC
WHERE @Param = 'Y'
)
SELECT *
FROM tableA
WHERE indexed IN ( SELECT indexed FROM T );

SQL style question: INNER JOIN in FROM clause or WHERE clause?

If you are going to join multiple tables in a SQL query, where do you think is a better place to put the join statement: in the FROM clause or the WHERE clause?
If you are going to do it in the FROM clause, how do you format it so that it is clear and readable? (I'm talking about indents, newlines, whitespace in general.)
Are there any advantages/disadvantages to each?
I tend to use the FROM clause, or rather the JOIN clause itself, indenting like this (and using aliases):
SELECT t1.field1, t2.field2, t3.field3
FROM table1 t1
INNER JOIN table2 t2
ON t1.id1 = t2.id1
INNER JOIN table3 t3
ON t1.id1 = t3.id3
This keeps the join condition close to where the join is made. I find it easier to understand this way than trying to look through the WHERE clause to figure out what exactly is joined how.
When making OUTER JOINs (ANSI-89 or ANSI-92), filtration location matters because criteria specified in the ON clause is applied before the JOIN is made. Criteria against an OUTER JOINed table provided in the WHERE clause is applied after the JOIN is made. This can produce very different result sets.
In comparison, it doesn't matter for INNER JOINs if the criteria is provided in the ON or WHERE clauses -- the result will be the same. That said, I strive to keep the WHERE clause clean -- anything related to JOINed tables will be in their respective ON clause. Saves hunting through the WHERE clause, which is why ANSI-92 syntax is more readable.
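A small sketch makes the ON versus WHERE difference concrete. It uses SQLite through Python's sqlite3 module as a stand-in for whatever database is in use; the products and pricing tables and their rows are hypothetical.

```python
import sqlite3

# Assumption: hypothetical products/pricing tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER, name TEXT);
    CREATE TABLE pricing (product_id INTEGER, price INTEGER);
    INSERT INTO products VALUES (1, 'pen'), (2, 'desk');
    INSERT INTO pricing VALUES (1, 5), (2, 200);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# OUTER JOIN, filter in ON: unmatched products are kept with a NULL price.
outer_on = rows("""
    SELECT p.name, pp.price
    FROM products p
    LEFT JOIN pricing pp ON p.id = pp.product_id AND pp.price > 10
    ORDER BY p.id
""")

# OUTER JOIN, filter in WHERE: applied after the join, so the NULL row is removed.
outer_where = rows("""
    SELECT p.name, pp.price
    FROM products p
    LEFT JOIN pricing pp ON p.id = pp.product_id
    WHERE pp.price > 10
    ORDER BY p.id
""")

# INNER JOIN: the placement of the filter does not change the result.
inner_on = rows("""
    SELECT p.name, pp.price
    FROM products p
    JOIN pricing pp ON p.id = pp.product_id AND pp.price > 10
    ORDER BY p.id
""")
inner_where = rows("""
    SELECT p.name, pp.price
    FROM products p
    JOIN pricing pp ON p.id = pp.product_id
    WHERE pp.price > 10
    ORDER BY p.id
""")

print(outer_on)     # [('pen', None), ('desk', 200)]
print(outer_where)  # [('desk', 200)]
print(inner_on == inner_where)  # True
```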
I prefer the FROM clause, if for no other reason than that it distinguishes between filtering results (from a Cartesian product) merely on foreign key relationships and filtering them on a logical restriction. For example:
SELECT * FROM Products P JOIN ProductPricing PP ON P.Id = PP.ProductId
WHERE PP.Price > 10
As opposed to
SELECT * FROM Products P, ProductPricing PP
WHERE P.Id = PP.ProductID AND Price > 10
I can look at the first one and instantly know that the only logical restriction I'm placing is the price, as opposed to the implicit machinery of joining tables together on the relationship key.
I almost always use the ANSI 92 joins because it makes it clear that these conditions are for JOINING.
Typically I write it this way
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
Sometimes I write it this way when it's trivial
FROM
foo f
INNER JOIN bar b ON f.id = b.id
INNER JOIN baz b2 ON b.id = b2.id
When it's not trivial, I do it the first way
e.g.
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
and b.type = 1
or
FROM
foo f
INNER JOIN (
SELECT max(date) date, id
FROM foo
GROUP BY
id) lastF
ON f.id = lastF.id
and f.date = lastF.Date
Or the really weird case (not sure if I got the parens right, but it's supposed to be a LEFT join to table bar, where bar needs an inner join to baz)
FROM
foo f
LEFT JOIN (bar b
INNER JOIN baz b2
ON b.id = b2.id
)ON f.id = b.id
You should put joins in Join clauses which means the From clause. A different question could be had about where to put filtering statements.
With respect to indenting, there are many styles. My preference is to indent related joins and keep main clauses like Select, From, Where, Group By, Having and Order By indented at the same level. In addition, I put each of these main attributes and the first line of an On clause on its own line.
Select ..
From Table1
Join Table2
On Table2.FK = Table1.PK
And Table2.OtherCol = '12345'
And Table2.OtherCol2 = 9876
Left Join (Table3
Join Table4
On Table4.FK = Table3.PK)
On Table3.FK = Table2.PK
Where ...
Group By ...
Having ...
Order By ...
Use the FROM clause to be compliant with ANSI-92 standards.
This:
select *
from a
inner join b
on a.id = b.id
where a.SomeColumn = 'x'
Not this:
select *
from a, b
where a.id = b.id
and a.SomeColumn = 'x'
I definitely always do my JOINS (of whatever type) in my FROM clause.
The way I indent them is this:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
In fact, I'll generally go a step farther and move as much of my constraining logic from the WHERE clause to the FROM clause, because this (at least in MS SQL) front-loads the constraint, meaning that it reduces the size of the recordset sooner in the query construction (I've seen documentation that contradicts this, but my execution plans are invariably more efficient when I do it this way).
For example, if I wanted to only select things in the above query where t3.id = 3, I could put that in the WHERE clause, or I could do it this way:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
AND
t3.id = 3
I personally find queries laid out in this way to be very readable and maintainable, but this is certainly a matter of personal preference, so YMMV.
Regardless, I hope this helps.
ANSI joins. I omit any optional keywords from the SQL as they only add noise to the equation. There's no such thing as a left inner join, is there? And by default, a simple join is an inner join, so there's no particular point to saying 'inner join'.
Then I column align things as much as possible.
The point being that a large complex SQL query can be very difficult to comprehend, so the more order that is imposed on it to make it more readable, the better. Anybody looking at the query to fix, modify, or tune it needs to be able to answer a few things right off the bat:
what tables/views are involved in the query?
what are the criteria for each join? What's the cardinality of each join?
what/how many columns are returned by the query?
I like to write my queries so they look something like this:
select PatientID = rpt.ipatientid ,
EventDate = d.dEvent ,
Side = d.cSide ,
OutsideHistoryDate = convert(nchar, d.devent,112) ,
Outcome = p.cOvrClass ,
ProcedureType = cat.ctype ,
ProcedureCategoryMajor = cat.cmajor ,
ProcedureCategoryMinor = cat.cminor
from dbo.procrpt rpt
join dbo.procd d on d.iprocrptid = rpt.iprocrptid
join dbo.proclu lu on lu.iprocluid = d.iprocluid
join dbo.pathlgy p on p.iProcID = d.iprocid
left join dbo.proccat cat on cat.iproccatid = lu.iproccatid
where rpt.ipatientid = @iPatientID

Is there a way to do multiple left outer joins in Oracle?

Why won't this work in Oracle?
Is there a way to make this work?
FROM table1 a,
table2 b,
table3 c
WHERE a.some_id = '10'
AND a.other_id (+)= b.other_id
AND a.other_id (+)= c.other_id
I want table1 to be left outer joined on multiple tables...
If I try to change it to use an ANSI join, I get compilation errors. I did the following:
FROM table2 b, table3 c
LEFT JOIN table1 a ON a.other_id = b.other_id and a.other_id = c.other_id
OK, looking at the examples from the Oracle docs, my recollection of the syntax was correct, so I'm turning my comment into an answer. Assuming that your goal is a left outer join where A is the base table, and you join matching rows from B and C, rewrite your query as follows (note that I'm just changing the prefixes; I like to have the source rowset on the right).
FROM table1 a,
table2 b,
table3 c
WHERE a.some_id = '10'
AND b.other_id (+)= a.other_id
AND c.other_id (+)= a.other_id
If that's not what you're trying to do, then the query is borked: you're doing a cartesian join of B and C, and then attempting an outer join from that partial result to A, with an additional predicate on A. Which doesn't make a lot of sense.
Use ANSI joins. They are way clearer IMO. But for some reason they don't work with materialized views...
You can do something like this.
FROM table1 a, table2 b, table3 c
WHERE a.some_id = '10'
AND a.other_id = b.other_id(+)
AND a.other_id = c.other_id(+)
I wanted to address separately this part of your question:
If I try to change it to ANSI join I get compilation errors. I did the following:
FROM table2 b, table3 c
LEFT JOIN table1 a ON a.other_id = b.other_id and a.other_id = c.other_id
In an ANSI join, at least in Oracle, you are operating on exactly two row sources. The LEFT JOIN operator in your example has table3 and table1 as its operands, so you cannot reference "b.other_id" in the ON clause. You need a new join operator for each additional table.
I believe what you are trying to do is outer join table 2 and table 3 to table 1. So what you should be doing is this:
FROM table1 a LEFT JOIN table2 b ON b.other_id = a.other_id
LEFT JOIN table3 c ON c.other_id = a.other_id
or Henry Gao's query if you want to use Oracle-specific syntax.
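The ANSI form above can be tried out on a few rows. This sketch uses SQLite via Python's sqlite3 module as a stand-in for Oracle, so it only covers the ANSI syntax (SQLite has no (+) operator); the b_val and c_val columns and all the data are made up.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; b_val and c_val are hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (some_id TEXT, other_id INTEGER);
    CREATE TABLE table2 (other_id INTEGER, b_val TEXT);
    CREATE TABLE table3 (other_id INTEGER, c_val TEXT);
    INSERT INTO table1 VALUES ('10', 1), ('10', 2), ('99', 3);
    INSERT INTO table2 VALUES (1, 'b1');
    INSERT INTO table3 VALUES (2, 'c2');
""")

# table1 is the base row source; each additional table gets its own LEFT JOIN.
joined_rows = conn.execute("""
    SELECT a.other_id, b.b_val, c.c_val
    FROM table1 a
    LEFT JOIN table2 b ON b.other_id = a.other_id
    LEFT JOIN table3 c ON c.other_id = a.other_id
    WHERE a.some_id = '10'
    ORDER BY a.other_id
""").fetchall()

print(joined_rows)  # [(1, 'b1', None), (2, None, 'c2')]
```

Both table1 rows with some_id '10' come back, each with NULL for whichever of the two outer-joined tables has no match.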
In Oracle you cannot outer join the same table to more than one other table. You can create views that have joins in them, then join to that view. As a side note, you also cannot outer join to a subselect, so that is not an option here either.
You could of course try the following (tables b and c being the base):
FROM (SELECT other_id FROM table2
UNION
SELECT other_id FROM table3) b
LEFT JOIN table1 a ON b.other_id = a.other_id
But then again I am an Oracle Nono