Performance comparison in PostgreSQL: LEFT JOIN vs UNION ALL

I want to know which is the better and faster solution: a LEFT OUTER JOIN or a UNION ALL.
The database is PostgreSQL.
Query with a UNION ALL:
SELECT * FROM element, "user" WHERE elm_usr_id = usr_id
UNION ALL
SELECT * FROM element WHERE elm_usr_id IS NULL;
Query with a LEFT OUTER JOIN:
SELECT * FROM element LEFT OUTER JOIN "user" ON elm_usr_id = usr_id;

Your two queries may not produce the same result.
Your UNION ALL query returns the rows that match, plus the rows that do not match because elm_usr_id is NULL.
The LEFT JOIN query (LEFT JOIN is the same as LEFT OUTER JOIN), on the other hand, returns the rows that match, plus every row that does not match for any reason. That includes a non-NULL elm_usr_id that has no corresponding usr_id.
In that respect, the LEFT JOIN query is the safer choice if you expect to see all rows.
Back to your original question: the LEFT JOIN query is the better one for taking advantage of indexes. For example, if you want a sorted result, the UNION ALL query can be far slower. Or, if your query is a subquery inside a main query, the UNION ALL can prevent the use of the indexes on table [element], so a JOIN or WHERE against such a subquery will be slow.
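The following sketch shows the difference described above. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL: the join semantics are the same, but the query plans are not. The table contents and the usr_name column are hypothetical, including the orphan element 12, which points at a user 99 that does not exist. The branch of the UNION ALL that has no join pads the user column with NULL. Both branches of a UNION ALL must return the same number of columns, and the two `SELECT *` forms in the question would not satisfy that.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE "user"  (usr_id INTEGER PRIMARY KEY, usr_name TEXT);
    CREATE TABLE element (elm_id INTEGER PRIMARY KEY, elm_usr_id INTEGER);
    INSERT INTO "user" VALUES (1, 'alice');
    -- 10 has a user, 11 has none, 12 points at a user that does not exist
    INSERT INTO element VALUES (10, 1), (11, NULL), (12, 99);
""")

# UNION ALL form: matched rows + rows whose elm_usr_id IS NULL.
# The second branch pads usr_name with NULL so both branches have two columns.
union_all = con.execute("""
    SELECT e.elm_id, u.usr_name FROM element e JOIN "user" u ON e.elm_usr_id = u.usr_id
    UNION ALL
    SELECT e.elm_id, NULL FROM element e WHERE e.elm_usr_id IS NULL
    ORDER BY 1
""").fetchall()

# LEFT JOIN form: every element row, matched or not.
left_join = con.execute("""
    SELECT e.elm_id, u.usr_name
    FROM element e LEFT JOIN "user" u ON e.elm_usr_id = u.usr_id
    ORDER BY 1
""").fetchall()
con.close()

print(union_all)  # [(10, 'alice'), (11, None)]
print(left_join)  # [(10, 'alice'), (11, None), (12, None)]
```

Only the LEFT JOIN keeps element 12. The UNION ALL silently drops it.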

I would suggest LEFT OUTER JOIN over UNION ALL in this particular scenario,
because with UNION ALL you have to read the element table twice, whereas with LEFT OUTER JOIN you read it only once.

Probably the LEFT JOIN, but you can check the query plan by running EXPLAIN ANALYZE SELECT .... The UNION ALL form might be clearer if you were modifying columns based on whether elm_usr_id is NULL, but you can always use CASE to do column modifications with a LEFT JOIN.
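Here is a minimal sketch of the CASE approach. It runs in SQLite through Python's sqlite3 rather than in PostgreSQL, and the data and the '(no user)' label are hypothetical. The CASE expression does the job of the UNION ALL's two branches inside a single LEFT JOIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE "user"  (usr_id INTEGER PRIMARY KEY, usr_name TEXT);
    CREATE TABLE element (elm_id INTEGER PRIMARY KEY, elm_usr_id INTEGER);
    INSERT INTO "user" VALUES (1, 'alice');
    INSERT INTO element VALUES (10, 1), (11, NULL);
""")
rows = con.execute("""
    SELECT e.elm_id,
           CASE WHEN e.elm_usr_id IS NULL THEN '(no user)'  -- plays the UNION ALL's second branch
                ELSE u.usr_name                              -- plays the UNION ALL's first branch
           END AS owner
    FROM element e
    LEFT JOIN "user" u ON e.elm_usr_id = u.usr_id
    ORDER BY e.elm_id
""").fetchall()
con.close()

print(rows)  # [(10, 'alice'), (11, '(no user)')]
```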


Terminology for when a join propagates out additional rows

When joining between two tables/queries:
with
cte1 (id) as (
select 1 from dual),
cte2 (id) as (
select 1 from dual union all
select 1 from dual)
select
cte1.id as cte1_id,
cte2.id as cte2_id
from
cte1
left join
cte2
on cte1.id = cte2.id
CTE1_ID CTE2_ID
1 1
1 1
Unsurprisingly, that join propagates out additional rows. The query on the left side of the join only had one row. But the resultset has two rows due to the join.
I suspect “propagate” isn’t quite the right word for describing that scenario.
What’s the proper term?
For example, when talking to people who are new to SQL, I often say, “Be careful with that join. It looks like you’re accidentally propagating out additional rows, since the join is 1:many.”
In this example you are not propagating rows (at least as I understand it). You have two rows in the table on the right side of the join, and you have two rows in the result.
However, if you had this:
WITH cte AS
(
SELECT 1 AS id FROM dual
UNION ALL
SELECT 1 FROM dual
)
SELECT x.id, y.id
FROM cte x
INNER JOIN cte y ON x.id = y.id;
You would start with 2 rows and the query would return 4, because each of the two rows on the left matches both rows on the right. To me, this is propagating data.
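The row counts from both examples can be checked with a quick sketch in SQLite, run through Python's sqlite3. SQLite has no DUAL table, so `SELECT 1` without a FROM clause stands in for `select 1 from dual`.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One row on the left, two matching rows on the right (1:many).
left_join = con.execute("""
    WITH cte1(id) AS (SELECT 1),
         cte2(id) AS (SELECT 1 UNION ALL SELECT 1)
    SELECT cte1.id, cte2.id FROM cte1 LEFT JOIN cte2 ON cte1.id = cte2.id
""").fetchall()

# Two rows joined to the same two rows: every row matches every row (2 x 2).
self_join = con.execute("""
    WITH cte(id) AS (SELECT 1 UNION ALL SELECT 1)
    SELECT x.id, y.id FROM cte x INNER JOIN cte y ON x.id = y.id
""").fetchall()
con.close()

print(len(left_join), len(self_join))  # 2 4
```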
When every row from one side of the join is joined with every row on the other, the term you are looking for is "cartesian product", which SQL produces with a CROSS JOIN. In cases where the join is not unique but is partially limited, you could say "partial cartesian product" (though I don't recommend it) or, more commonly, "partial cross join". I think the latter is more readily understood by SQL developers.
In either case, there are times when both are appropriate, but a lot of the time they are the result of an error in a join clause.
What’s the proper term?
"Cartesian Product" could be one term you can use.
I.e. "Be careful of that join. It looks like you are accidentally returning the cartesian product of the two tables."
A CROSS JOIN will return the cartesian product of the two joined tables; it is also called a "Cartesian Join".
An INNER JOIN will return the cartesian product of the two joined tables, filtered by some relationship (the join condition(s)) between columns of the two tables. When that condition uses only equality comparisons, it is also called an "equi-join".
An OUTER JOIN is similar to the INNER JOIN but will also return the non-matched rows on one (for LEFT or RIGHT joins) or both (for FULL joins) sides of the join condition.
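This sketch runs the three join types described above on two hypothetical single-column tables in SQLite, via Python's sqlite3. It also checks that the INNER JOIN returns the same number of rows as the CROSS JOIN filtered by the join condition.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (1), (4);
""")

def count(join_sql):
    """Count the rows a join returns by wrapping it in a derived table."""
    sql = f"SELECT COUNT(*) FROM (SELECT a.id AS a_id, b.id AS b_id {join_sql})"
    return con.execute(sql).fetchone()[0]

cross    = count("FROM a CROSS JOIN b")                    # cartesian product: 3 x 3
inner    = count("FROM a INNER JOIN b ON a.id = b.id")     # only the matching pairs
filtered = count("FROM a CROSS JOIN b WHERE a.id = b.id")  # same pairs, filtered by hand
left     = count("FROM a LEFT JOIN b ON a.id = b.id")      # matches + unmatched rows of a
con.close()

print(cross, inner, filtered, left)  # 9 2 2 4
```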
Why a LEFT JOIN? An INNER JOIN would do the same here!
And no, it doesn't propagate. cte2 has two rows with id 1, which is exactly what the UNION ALL produces. So when you join the two tables on that id, you receive 2 rows in the joined result set.
A LEFT JOIN also takes all rows of the left table and tries to join them (in your case by id). If it doesn't find a match, it adds the row with the right table's columns set to NULL.
So no wonders and no miracles, just simple SQL.

A few questions about the SELECT statement

I have a few questions about the SELECT statement.
First of all, I have normalized 15 tables for this SELECT query.
The problem is not visible yet because there is not much data right now.
However, since I'm trying to process many tables in one SELECT query, it seems likely to cause problems later.
So I'm thinking of splitting it into several SELECT statements that each search fewer tables, but I want to know how different that is from doing it all at once.
Secondly, if I use joins, I will use outer joins. When joining multiple tables with outer joins, I'm not sure when to use LEFT OUTER JOIN and when to use RIGHT OUTER JOIN.
The current SELECT query refers to 8 tables and contains one join.
That is, the rest of the tables are read through subqueries, and the remaining eight tables will probably need joins.
I would appreciate it if you could point me in the right direction for writing multiple outer joins.
Here is a brief excerpt of the current SELECT query:
select
a.cal1,a.cal2,a.cal3,...,
(select b.cal1 from b
where a.cal4=b.cal2)
as "bcals",
(select c.cal1 from c
where a.cal5=c.cal2)
as "ccals",
....,
(select e.cal1 from e
where a.caln=e.cal2)
as "ecals",
(select sum(extract(year from age(f.endday,f.startday)))
from f
where f.cal1=a.cal1)
as "fcals",
g.cal1,g.cal2,g.cal3,...,
(select h.cal1 from h
where g.cal4=h.cal2)
as "hcals"
from a left outer join g on a.cal1=g.cal5
where a.cal1=?;
Result:
a.cal1|a.cal2|a.cal3|...|hcals
var1 |var2 |var3 |...|varn
After this, I wonder how to join the rest of the tables.
To sum up:
If many tables need to be included in one SELECT statement, how does performance differ between one complex query and the same query split into multiple SELECT statements?
If we write everything in a single SELECT statement, how should the outer joins be written?
Is there a problem with the query?
Actually, your code is correct, but it looks very complex, and people will find it difficult to understand. Using joins, you can reduce the amount of code and make it more readable.
SELECT
TBL1.AMOUNT T1,
TBL2.AMOUNT T2,
TBL3.AMOUNT T3
FROM TBL1
LEFT JOIN TBL2 ON TBL2.ID = TBL1.ID
LEFT JOIN TBL3 ON TBL3.ID = TBL1.ID
In the above code, there are three tables and two joins. It is easy to understand, debug, and change. Please try this approach with your code.
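The sketch below applies the same idea to the question's query. It runs in SQLite via Python's sqlite3, with hypothetical data for tables a, b and g. Each scalar subquery becomes one more LEFT JOIN from the driving table a. Every row of a therefore survives even when a lookup finds nothing, and no RIGHT JOIN is needed as long as you always start from a. The sketch assumes b.cal2 is unique. In PostgreSQL, a scalar subquery that finds two rows raises an error, whereas the LEFT JOIN would repeat the row of a instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (cal1 INTEGER PRIMARY KEY, cal4 INTEGER);
    CREATE TABLE b (cal1 TEXT, cal2 INTEGER UNIQUE);  -- lookup table, one row per key
    CREATE TABLE g (cal1 TEXT, cal5 INTEGER);
    INSERT INTO a VALUES (1, 100), (2, 200);
    INSERT INTO b VALUES ('b-100', 100);              -- no row for 200
    INSERT INTO g VALUES ('g-1', 1);                  -- no row for a.cal1 = 2
""")

# Original style: a correlated scalar subquery per lookup table.
subquery_version = con.execute("""
    SELECT a.cal1,
           (SELECT b.cal1 FROM b WHERE a.cal4 = b.cal2) AS bcals,
           g.cal1 AS gcal1
    FROM a LEFT OUTER JOIN g ON a.cal1 = g.cal5
    ORDER BY a.cal1
""").fetchall()

# Join style: every lookup is another LEFT JOIN from a.
join_version = con.execute("""
    SELECT a.cal1, b.cal1 AS bcals, g.cal1 AS gcal1
    FROM a
    LEFT JOIN b ON b.cal2 = a.cal4
    LEFT JOIN g ON g.cal5 = a.cal1
    ORDER BY a.cal1
""").fetchall()
con.close()

print(join_version)  # [(1, 'b-100', 'g-1'), (2, None, None)]
```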

SQL: Except Join on multiple queries

Getting a bit stuck trying to build this query (SQL Server).
I'm trying to join two tables on matching rows, but then also stack the unmatched rows from both table 1 and table 2 onto the result set. I first tried a full outer join, but it leaves my key fields blank when the data comes from only one of the tables.
Example: Full Outer Join
Here's what I would like for the query to be able to do:
Essentially, I would like a result table where the key fields (Part and Operation) are all returned in two columns (like a union), but the Estimated Rate and Actual Rate columns are returned side by side wherever there is a matching row between table 1 and table 2.
I've also been trying to inner join the two tables in a subquery, use that inner join in an EXCEPT clause against each of the tables, and then stack the original inner join with the two EXCEPT results.
Current Attempt: One Join, Two Excepts, Two Unions
UPDATE: I got the current attempt to return values! It's a bit complicated, though, so I'd appreciate any advice or feedback. Great answers below, thanks; I will need to do some comparisons.
Thanks
SELECT ISNULL(t1.part,t2.part) AS Part,
ISNULL(t1.operation,t2.operation) AS Operation,
ISNULL(t1.[Estimated Rate],0) AS [Estimated Rate],
ISNULL(t2.[Actual Rate],0) AS [Actual Rate]
FROM table1 t1
FULL OUTER JOIN table2 t2
ON t1.part = t2.part
AND t1.operation = t2.operation
I would do this as a union all and group by:
select part, operation,
sum(estimatedrate) as estimatedrate, sum(actualrate) as actualrate
from ((select part, operation, estimatedrate, 0 as actualrate
from table1
) union all
(select part, operation, 0 as estimatedrate, actualrate
from table2
)
) er
group by part, operation;
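Here is a runnable sketch of the UNION ALL + GROUP BY approach in SQLite (via Python's sqlite3), with hypothetical data. The second branch reads actualrate from table2. SQLite does not accept parentheses around the branches of a UNION ALL, so they are dropped here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (part TEXT, operation INTEGER, estimatedrate REAL);
    CREATE TABLE table2 (part TEXT, operation INTEGER, actualrate REAL);
    INSERT INTO table1 VALUES ('A', 10, 5.0), ('A', 20, 3.0);  -- ('A', 20) only in table1
    INSERT INTO table2 VALUES ('A', 10, 4.5), ('B', 10, 7.0);  -- ('B', 10) only in table2
""")
rows = con.execute("""
    SELECT part, operation,
           SUM(estimatedrate) AS estimatedrate, SUM(actualrate) AS actualrate
    FROM (SELECT part, operation, estimatedrate, 0 AS actualrate FROM table1
          UNION ALL
          SELECT part, operation, 0 AS estimatedrate, actualrate FROM table2) er
    GROUP BY part, operation
    ORDER BY part, operation
""").fetchall()
con.close()

print(rows)
```

The padding value is 0 rather than NULL. A key that appears in only one table therefore gets 0 for the other table's rate, for example ('A', 20, 3.0, 0).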

INNER JOIN with complex condition dramatically increases the execution time

I have 2 tables with several identical fields that need to be linked in the JOIN condition. E.g. each table has the fields P1 and P2. I want to write the following join query:
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P1
OR Table1.P2 = Table2.P2
OR Table1.P1 = Table2.P2
OR Table1.P2 = Table2.P1
With huge tables, this query takes a very long time to execute.
I tried to test how long a query with only one condition would take. First, I modified the tables so that all data from P1 and P2 was copied as new rows into Table1 and Table2. So my query is simple:
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P = Table2.P
The result was more than surprising: the execution time dropped from many hours (the first case) to 2-3 seconds!
Why is it so different? Does it mean complex conditions always reduce performance? How can I improve this? Would indexing P1 and P2 help? I want to keep the original DB schema and not move to a single field P.
The queries perform differently because the optimizer uses different join strategies for them. There are basically four ways that two tables can be joined:
1. "Hash join": builds a hash table on one of the tables and uses it to look up the values from the second.
2. "Merge join": sorts both tables on the key and then reads the results sequentially for the join.
3. "Index lookup": uses an index to look up values in one table.
4. "Nested loop": compares each value in one table to all the values in the other table.
(And there are variations on these, such as using an index instead of a table, working with partitions, and handling multiple processors.) Unfortunately, SQL Server Management Studio shows both (3) and (4) as nested loop joins. If you look more closely, you can tell them apart from the parameters in the node.
In any case, your fast, single-equality join uses one of the first three, and that is why it is fast. These joins can basically only be used on "equi-joins", that is, when the condition joining the two tables is an equality.
When you switch from a single equality to an IN or a set of OR conditions, the join condition changes from an equijoin to a non-equijoin. In my experience, SQL Server does a poor job of optimizing this case (and, to be fair, I think other databases behave much the same way). Your performance hit comes from going from a good join algorithm to the nested loops algorithm.
Without testing, I might suggest some of the following strategies.
Build an index on P1 and P2 in both tables. SQL Server might use the index even for a non-equijoin.
Use the UNION query suggested in another answer. Each query should be optimized correctly.
Assuming these are 1-1 joins, you can also do this as a set of multiple joins:
from table1 t1 left outer join
table2 t2_11
on t1.p1 = t2_11.p1 left outer join
table2 t2_12
on t1.p1 = t2_12.p2 left outer join
table2 t2_21
on t1.p2 = t2_21.p1 left outer join
table2 t2_22
on t1.p2 = t2_22.p2
And then use case/coalesce logic in the SELECT to get the value that you actually want. Although this may look more complicated, it should be quite efficient.
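This sketch tries the multiple-join idea in SQLite via Python's sqlite3. The id column in each table and all the data are hypothetical. COALESCE picks the first alias that matched, and the WHERE clause drops the table1 rows that matched nothing, so the result behaves like the INNER JOIN. The sketch relies on the same 1-1 assumption. If a value matched several table2 rows, each LEFT JOIN could multiply the table1 row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER, p1 INTEGER, p2 INTEGER);
    CREATE TABLE table2 (id TEXT,    p1 INTEGER, p2 INTEGER);
    INSERT INTO table1 VALUES (1, 10, 11), (2, 20, 21), (3, 30, 31);
    INSERT INTO table2 VALUES ('x', 10, 99),  -- matches table1 row 1 on table1.p1 = table2.p1
                              ('y', 88, 20),  -- matches table1 row 2 on table1.p1 = table2.p2
                              ('z', 77, 66);  -- matches nothing
""")
rows = con.execute("""
    SELECT t1.id,
           COALESCE(t2_11.id, t2_12.id, t2_21.id, t2_22.id) AS t2_id
    FROM table1 t1
    LEFT OUTER JOIN table2 t2_11 ON t1.p1 = t2_11.p1
    LEFT OUTER JOIN table2 t2_12 ON t1.p1 = t2_12.p2
    LEFT OUTER JOIN table2 t2_21 ON t1.p2 = t2_21.p1
    LEFT OUTER JOIN table2 t2_22 ON t1.p2 = t2_22.p2
    -- keep only rows where some join matched, like the original INNER JOIN
    WHERE COALESCE(t2_11.id, t2_12.id, t2_21.id, t2_22.id) IS NOT NULL
    ORDER BY t1.id
""").fetchall()
con.close()

print(rows)  # [(1, 'x'), (2, 'y')]
```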
You can use 4 queries and UNION their results:
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P1
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P2
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P2 = Table2.P1
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P2 = Table2.P2
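The following sketch checks, in SQLite via Python's sqlite3, that the four-branch UNION returns the same pairs as the OR join. The data and the id columns are hypothetical. Using UNION rather than UNION ALL matters here. Row 1 matches table2 row 100 on two conditions, so UNION ALL would return that pair twice, while the OR join returns it once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER, p1 INTEGER, p2 INTEGER);
    CREATE TABLE table2 (id INTEGER, p1 INTEGER, p2 INTEGER);
    INSERT INTO table1 VALUES (1, 10, 11), (2, 20, 21), (3, 30, 31);
    INSERT INTO table2 VALUES (100, 10, 11),  -- matches row 1 on two conditions
                              (200, 21, 5),   -- matches row 2 on table1.p2 = table2.p1
                              (300, 7, 8);    -- matches nothing
""")

or_join = con.execute("""
    SELECT table1.id, table2.id FROM table1 INNER JOIN table2
      ON table1.p1 = table2.p1 OR table1.p2 = table2.p2
      OR table1.p1 = table2.p2 OR table1.p2 = table2.p1
""").fetchall()

union_join = con.execute("""
    SELECT table1.id, table2.id FROM table1 INNER JOIN table2 ON table1.p1 = table2.p1
    UNION
    SELECT table1.id, table2.id FROM table1 INNER JOIN table2 ON table1.p1 = table2.p2
    UNION
    SELECT table1.id, table2.id FROM table1 INNER JOIN table2 ON table1.p2 = table2.p1
    UNION
    SELECT table1.id, table2.id FROM table1 INNER JOIN table2 ON table1.p2 = table2.p2
""").fetchall()
con.close()

print(sorted(or_join), sorted(union_join))  # [(1, 100), (2, 200)] [(1, 100), (2, 200)]
```

UNION also removes duplicates that the OR join would keep if the tables themselves contained duplicate rows. The selected columns should therefore identify each matched pair.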
Does using CTEs help performance?
;WITH Table1_cte
AS
(
SELECT
...
[P] = P1
FROM Table1
UNION
SELECT
...
[P] = P2
FROM Table1
)
, Table2_cte
AS
(
SELECT
...
[P] = P1
FROM Table2
UNION
SELECT
...
[P] = P2
FROM Table2
)
SELECT ... FROM Table1_cte x
INNER JOIN
Table2_cte y
ON x.P = y.P
I suspect that, as far as the query processor is concerned, the above is just different syntax for the same complex conditions.

How to do a full outer join without having full outer join available

Last week I was surprised to find out that Sybase 12 doesn't support full outer joins.
But it occurred to me that a full outer join should be the same as a left outer join unioned with a right outer join of the same sql.
Can anybody think of a reason this would not hold true?
UNION ALL the left join with the right join, but limit the right join to the rows that have no match in the left table. For those rows, the left table's columns come back NULL from the join, which could not happen if a matching row existed.
For this code you will need to create two tables, t1 and t2. t1 should have one column named c1 with five rows containing the values 1-5. t2 should also have a c1 column with five rows containing the values 2-6.
Full Outer Join:
select * from t1 full outer join t2 on t1.c1=t2.c1 order by 1, 2;
Full Outer Join Equivalent:
select t1.c1, t2.c1 from t1 left join t2 on t1.c1=t2.c1
union all
select t1.c1, t2.c1 from t1 right join t2 on t1.c1=t2.c1
where t1.c1 is null
order by 1, 2;
Note the where clause on the right joined select that limits the results to only those that would not be duplicates.
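Here is a sketch of this full-outer-join equivalent in SQLite via Python's sqlite3, using the t1/t2 data described above. SQLite only added RIGHT JOIN in version 3.39.0, so the right join is written as `t2 LEFT JOIN t1`, which returns the same rows. The rows are sorted in Python because SQLite would sort the NULL key first.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (c1 INTEGER);
    CREATE TABLE t2 (c1 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO t2 VALUES (2), (3), (4), (5), (6);
""")
rows = con.execute("""
    SELECT t1.c1, t2.c1 FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1
    UNION ALL
    -- "t1 RIGHT JOIN t2" written as "t2 LEFT JOIN t1", columns kept in the same order
    SELECT t1.c1, t2.c1 FROM t2 LEFT JOIN t1 ON t1.c1 = t2.c1
    WHERE t1.c1 IS NULL  -- only the t2 rows the first branch did not already return
""").fetchall()
con.close()

# Sort on whichever key is not NULL so the padded rows land in key order.
rows.sort(key=lambda r: r[0] if r[0] is not None else r[1])
print(rows)  # [(1, None), (2, 2), (3, 3), (4, 4), (5, 5), (None, 6)]
```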
UNION-ing two OUTER JOIN statements should produce duplicate rows for the data you'd get from an INNER JOIN. You'd probably have to do a SELECT DISTINCT on the data set produced by the UNION. Generally, if you have to use SELECT DISTINCT, the query is not well designed (or so I've heard).
If you union them with UNION ALL, you'll get duplicates. If you just use UNION without the ALL, it will filter duplicates and therefore be equivalent to a full join, but the query will also be a lot more expensive because it has to perform a distinct sort.
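This sketch compares the two UNION variants on the same t1/t2 data (SQLite via Python's sqlite3, with the right join written as a swapped left join). The four matched pairs appear in both outer joins, so UNION ALL returns each of them twice. One caveat with plain UNION is that it also collapses rows that were genuinely duplicated in the source tables, which a real FULL OUTER JOIN would keep.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (c1 INTEGER);
    CREATE TABLE t2 (c1 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO t2 VALUES (2), (3), (4), (5), (6);
""")
left_sql = "SELECT t1.c1 AS a, t2.c1 AS b FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1"
# Right join written with the tables swapped (RIGHT JOIN needs SQLite >= 3.39.0).
right_sql = "SELECT t1.c1 AS a, t2.c1 AS b FROM t2 LEFT JOIN t1 ON t1.c1 = t2.c1"

union_all = con.execute(f"{left_sql} UNION ALL {right_sql}").fetchall()  # keeps duplicates
union = con.execute(f"{left_sql} UNION {right_sql}").fetchall()          # removes duplicates
con.close()

print(len(union_all), len(union))  # 10 6
```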
Well, first, I don't know why you are using 12.x. It reached End of Life on 31 Dec 2009, after the EOL notice on 3 Apr 2007. 15.0.2 (the first solid version) came out in Jan 2009. 15.5 is much better and was available on 02 Dec 2009, so you are two major releases, and at least 13 months, out of date.
ASE 12.5.4 has the new join syntax (you have not specified your version; you may be on 12.5.0.3, the release before that).
DB2 and Sybase did not implement FULL OUTER JOIN for precisely the reason you have identified: it is covered by LEFT ... UNION ... RIGHT without ALL. It is not a case of "not supporting" a FOJ; it is a case of the keyword being missing.
And then there is the issue that Sybase and DB2 people would generally never use outer joins, let alone FOJs, because their databases tend to be more normalised, etc.
Lastly, there is perfectly ordinary SQL that you can use in any version of Sybase to supply the function of a FOJ, and it will be distinctly faster on 12.x and only marginally faster on 15.x. It is rather like the RANK() function: quite unnecessary if you can write a subquery.
The second reason it does not need FULL OUTER JOIN, whereas some of the low-end engines do, is that the new optimiser is extremely fast and the query is fully normalised, i.e. it performs the LEFT and the RIGHT in a single pass.
Depending on your SARGs, datatype mismatches, etc., it may still have to sort-merge, but that too is streamed at all three levels: the disk I/O subsystem, the engine(s), and the network handler. If your tables are partitioned, it is additionally parallelised at that level.
If your server is not configured for this and your result set is very large, you may need to increase the procedure cache size and the number of sort buffers. That's all.