Full outer join tables on one column or the other using OR in the ON clause - SQL

I want to do a full outer join on 2 tables - Table_A and Table_B - on 2 columns, Unique_ID1 OR Unique_ID2, as some rows may match on one and others on the other, and I have no way to determine this in advance.
I tried this -
Select *
from Table_A
full outer join Table_B on Table_A.Unique_ID1 = Table_A.Unique_ID1
OR Table_A.Unique_ID2 = Table_A.Unique_ID2
While this gives me no error, the query runs forever. What is the best way to re-structure this to get the desired output?

I think this is the query you were actually intending to run:
SELECT *
FROM Table_A a
FULL OUTER JOIN Table_B b
ON a.Unique_ID1 = b.Unique_ID1 OR a.Unique_ID2 = b.Unique_ID2;
That being said, if you still have performance problems with the above, then adding indices might be in order here. The ON clause of your query is actually a bit tricky to index. You could rewrite the query as a union:
SELECT *
FROM Table_A a
FULL OUTER JOIN Table_B b ON a.Unique_ID1 = b.Unique_ID1
UNION
SELECT *
FROM Table_A a
FULL OUTER JOIN Table_B b ON a.Unique_ID2 = b.Unique_ID2;
Then add the following indices:
CREATE INDEX idx_a1 ON Table_A (Unique_ID1);
CREATE INDEX idx_a2 ON Table_A (Unique_ID2);
CREATE INDEX idx_b1 ON Table_B (Unique_ID1);
CREATE INDEX idx_b2 ON Table_B (Unique_ID2);
While my union version might not logically be identical to your current query, you can probably tweak it to make it so. And, it is likely to be sargable, with both halves of the union using an index.
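For example, here is one way such a tweak could look. This is only a sketch: the first two branches collect the pairs that match on either key, and the last two add back the rows from each side that match on neither key, so it behaves like the OR join apart from the UNION removing exact duplicate rows. The LEFT JOIN ... ON 1 = 0 is just a trick to pad the missing side's columns with NULLs.
-- 1) Pairs that match on Unique_ID1
SELECT a.*, b.*
FROM Table_A a
JOIN Table_B b ON a.Unique_ID1 = b.Unique_ID1
UNION
-- 2) Pairs that match on Unique_ID2
SELECT a.*, b.*
FROM Table_A a
JOIN Table_B b ON a.Unique_ID2 = b.Unique_ID2
UNION ALL
-- 3) Table_A rows with no partner on either key (Table_B columns come back as NULL)
SELECT a.*, b.*
FROM Table_A a
LEFT JOIN Table_B b ON 1 = 0
WHERE NOT EXISTS (SELECT 1 FROM Table_B x
                  WHERE x.Unique_ID1 = a.Unique_ID1
                     OR x.Unique_ID2 = a.Unique_ID2)
UNION ALL
-- 4) Table_B rows with no partner on either key (Table_A columns come back as NULL)
SELECT a.*, b.*
FROM Table_B b
LEFT JOIN Table_A a ON 1 = 0
WHERE NOT EXISTS (SELECT 1 FROM Table_A x
                  WHERE x.Unique_ID1 = b.Unique_ID1
                     OR x.Unique_ID2 = b.Unique_ID2);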

Your condition always evaluates to true, since you have columns from Table_A on both sides of each equality check, producing a full Cartesian product. Replace one side of each equality with Table_B and you should be OK:
SELECT *
FROM Table_A
FULL OUTER JOIN Table_B ON Table_A.Unique_ID1 = Table_B.Unique_ID1 OR
-- Here ----------------------------------------------^
Table_A.Unique_ID2 = Table_B.Unique_ID2
-- And here ------------------------------------------^

In SQL is there a way to use select * on a join?

Using Snowflake, I have 2 tables, one with many columns and the other with a few. Trying to select * on their join, I get the following error:
SQL compilation error: duplicate column name
which makes sense, because my joining columns are in both tables. I could probably use a select with column names instead of *, but is there a way I could avoid that? Or at least have the query infer the column names dynamically from whatever table it gets?
I am quite sure Snowflake will let you select all columns from both sides of a join of two (or more) tables via
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
What you will not be able to do is refer to the names of the columns indirectly in ORDER BY (or GROUP BY), thus this will not work:
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY x
Even though some databases work out that, because you have JOIN ON a.x = b.x, there is only one x, Snowflake will not allow it (well, it didn't last time I tried this).
But with the above you can use the alias-qualified name or the output column position, so both of the following will work:
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY a.x
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY 1 -- assuming x is the first column
In general the * and a.* forms are super convenient, but they are actually bad for performance.
When selecting, you are now at risk of getting the columns back in a different order if the table has been recreated, which makes code that reads the results unstable. This also impacts VIEWs.
It also means all the metadata for the table needs to be loaded just to know what the complete shape of the output will be, whereas if you ask for x, y, z only, the query plan can be compiled faster, even if a w column is later added to the table.
Lastly, if you write SELECT * FROM table in a sub-select and only a subset of those columns is actually needed, the execution compiler has to do the work of pruning them. And if every column reference is attached to a correctly aliased table, then a second table later adding a column with the same name does not make naked column references ambiguous. That error would only surface when the SQL is next run, which might be an "annual report" that doesn't happen very often. Wow, what a long use-aliases rant.
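To make that concrete, the shape I would aim for instead of the * forms is an explicit, aliased column list; the y and z columns here are made up purely for illustration:
SELECT a.x AS a_x,   -- every column named and aliased: stable order,
       a.y AS a_y,   -- no ambiguity if table_b later gains its own y,
       b.z AS b_z    -- and less metadata for the compiler to load
FROM table_a AS a
JOIN table_b AS b
  ON a.x = b.x
ORDER BY a.x;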
You can prefix the name of the column with the name of the table:
select table_a.id, table_b.name from table_a join table_b using (id)
The same works in combination with *:
select table_a.id, table_b.* from table_a join table_b using (id)
It works in "join" and "where" parts of the statement as well
select table_a.id, table_b.* from table_a join table_b
on table_a.id = table_b.id where table_b.name LIKE 'b%'
You can use table aliases to make the statement shorter:
select a.id, b.* from table_a a join table_b b
on a.id = b.id
Aliases can also be applied to fields, for use in subqueries, client software and (depending on the SQL server) in other parts of the statement, for example 'order by':
select a.id as a_id, b.* from table_a a join table_b b
on a.id = b.id order by a_id
If you're after a result that includes all the distinct non-join columns from each table in the join, with the join columns included in the output only once (given they will be identical for an inner join), you can use NATURAL JOIN.
e.g.
select * from d1 natural inner join d2 order by id;
See examples: https://docs.snowflake.com/en/sql-reference/constructs/join.html#examples

Oracle SQL - Select not using index as expected

So I haven't used Oracle in more than 5 years and I'm out of practice. I've been on SQL Server all that time.
I'm looking at some of the existing queries and trying to improve them, but they're reacting really weirdly. According to the explain plan instead of going faster they're instead doing full table scans and not using the indexes.
In the original query, there is an equijoin between two tables, done in the WHERE clause. We'll call them Table A and Table B. I used an explain plan followed by SELECT * FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE')); and it tells me that Table A is queried by local index:
TABLE ACCESS BY LOCAL INDEX ROWID
SELECT A.*
FROM TableA A, TableB B
WHERE A.SecondaryID = B.ID;
I tried to change the query and join TableA with a new table (Table C). Table C is a subset of Table B with 700 records instead of 100K. However the explain plan tells me that Table A is now queried with a full lookup.
CREATE TABLE TableC AS
SELECT * FROM TableB WHERE Active = 'Y';
SELECT A.*
FROM TableA A, TableC C
WHERE A.SecondaryID = C.ID;
Next step, I kept the join between tables A & C, but used a hint to tell it to use the index on Table A. However it still does a full lookup.
SELECT /*+ INDEX (A_NDX01) */ A.*
FROM TableA A, TableC C
WHERE A.SecondaryID = C.ID;
So I tried to change from a join to a simple Select of table A and use an IN statement to compare to table C. Still a full table scan.
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC);
Lastly, I took the previous statement and changed the subselect to pull the top 1000 records, and it used the index. The odd thing is that there are only 700 records in Table C.
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC WHERE rownum < 1000)
I was wondering if someone could help me figure out what's happening?
My best guess is that since TableC is a new table, maybe the optimizer doesn't know how many records are in it, and that's why it will only use the index if it knows that there are fewer than 1000 records?
I tried to run dbms_stats.gather_schema_stats on my schema though and it did not help.
Thank you for your help.
As a general rule, using an index will not ALWAYS make your query go faster.
Hints are directives suggesting that the optimizer use a particular access path; it doesn't mean the optimizer will choose to obey the hint. In this case, the optimizer would have considered an index lookup on TableA more expensive than a full scan in these queries:
SELECT A.*
FROM TableA A, TableB B
WHERE A.SecondaryID = B.ID;
SELECT /*+ INDEX (A_NDX01) */ A.*
FROM TableA A, TableC C
WHERE A.SecondaryID = C.ID;
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC);
Internally it might have converted all of these statements (including the IN) into a join, and, considering the data in TableA and TableC, decided to make use of a full table scan.
When you added the rownum condition, this plan conversion was not done. This is because view merging will not happen when rownum appears in the query block.
I believe this is what is happening when you did
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC WHERE rownum < 1000)
Have a look at the following link
Oracle. Preventing merge subquery and main query conditions
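One more thing worth double-checking, separate from the above: Oracle's INDEX hint expects the table (or its alias) as the first argument, so with only an index name inside the parentheses the hint is normally ignored. If you retest the hinted version, it would look something like this:
SELECT /*+ INDEX(A A_NDX01) */ A.*   -- alias first, then the index name
FROM TableA A, TableC C
WHERE A.SecondaryID = C.ID;
-- The optimizer can still reject the index path if it estimates it to be
-- more expensive, but at least the hint is now well-formed.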

SQL Full Outer Join: What's the difference between these queries?

Let's suppose I have a full outer join query written in these two styles:
SELECT * FROM Table_A
FULL OUTER JOIN Table_B ON (Table_A.Col1 = Table_B.Col1 AND Table_B.iscurrent=1)
Versus
SELECT * FROM Table_A
FULL OUTER JOIN (Select * FROM Table_B Where iscurrent=1) AS Table_B
ON (Table_A.Col1 = Table_B.Col1)
Both are producing different results in my database (Azure SQL DB).
How come?
Why are they returning different results? Because they are different queries. FULL OUTER JOIN is very tricky. Let me explain.
The result set from the first query has a row for every row in both tables, even those where Table_B.iscurrent <> 1. Where the ON condition is not satisfied, the corresponding columns from the other table will be NULL, but the row will still be there.
The result set from the second query will have no rows where Table_B.iscurrent <> 1. These are filtered out before the FULL OUTER JOIN, so they are not among the rows being joined.
In general, I find that FULL OUTER JOIN is very rarely needed. I do use it, but quite rarely. Typically LEFT JOIN or UNION ALL does what I really want.
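For what it's worth, a full outer join can also be emulated with exactly that combination. Here is a sketch using the pre-filtered form from the question; it assumes Table_A.Col1 is never NULL:
-- All Table_A rows, with matching "current" Table_B rows where they exist
SELECT *
FROM Table_A
LEFT JOIN (SELECT * FROM Table_B WHERE iscurrent = 1) AS B
       ON Table_A.Col1 = B.Col1
UNION ALL
-- "Current" Table_B rows that have no Table_A partner at all
SELECT *
FROM Table_A
RIGHT JOIN (SELECT * FROM Table_B WHERE iscurrent = 1) AS B
        ON Table_A.Col1 = B.Col1
WHERE Table_A.Col1 IS NULL;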
In the second, only rows where Table_B.iscurrent = 1 are included.
In the first, you have all the rows from Table_B, but they just don't connect to Table_A if iscurrent <> 1.
SELECT *
FROM Table_A
FULL OUTER JOIN Table_B
ON Table_A.Col1 = Table_B.Col1
AND Table_B.iscurrent = 1
SELECT *
FROM Table_A
FULL OUTER JOIN ( SELECT *
                  FROM Table_B
                  WHERE iscurrent = 1
                ) AS Table_B
ON Table_A.Col1 = Table_B.Col1
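A tiny made-up example you can paste into Azure SQL DB to see the difference:
-- One row in each table, and the only Table_B row has iscurrent = 0
WITH Table_A AS (SELECT 1 AS Col1),
     Table_B AS (SELECT 1 AS Col1, 0 AS iscurrent)
SELECT *
FROM Table_A
FULL OUTER JOIN Table_B
  ON Table_A.Col1 = Table_B.Col1 AND Table_B.iscurrent = 1;
-- 2 rows: (1, NULL, NULL) and (NULL, 1, 0) -- the iscurrent = 0 row survives

WITH Table_A AS (SELECT 1 AS Col1),
     Table_B AS (SELECT 1 AS Col1, 0 AS iscurrent)
SELECT *
FROM Table_A
FULL OUTER JOIN (SELECT * FROM Table_B WHERE iscurrent = 1) AS Table_B
  ON Table_A.Col1 = Table_B.Col1;
-- 1 row: (1, NULL, NULL) -- the iscurrent = 0 row is gone entirely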

Get rid of matching records and add non-matching data

I have following tables:
Table a:
Name
T1
T2
T3
T4
Table b:
Name
T1
T2
T3
T4
T5
T6
I need to select all from table a and add what is not in table a from table b, result below:
T1
T2
T3
T4
T5
T6
Thanks for help
If you want all unique names from both the tables, use UNION:
select name from table_a
union
select name from table_b;
Here is another way:
select ta.name from ta
union all
select tb.name from tb
left join ta
on tb.name = ta.name
where ta.name is null
I would do this with an anti-join (a NOT IN condition). As written below, it will not work correctly if NULL is possible in that column in table a (in that case, the anti-join should be written with a NOT EXISTS condition). I assume the column is NOT NULL.
An anti-join is faster than a join because, as soon as a value from table b is also found in a, the joining for that row of table b stops and processing moves on to the next row. In a join, the joining continues; there is no such short-circuiting.
Oto's solution uses a join rather than an anti-join. However, I believe the Oracle query optimizer recognizes, in this simple case, that an anti-join is sufficient, and it will rewrite the query to use an anti-join. This is something you can verify by running Explain Plan on both queries. With that said, in a similar but much more complicated problem, the optimizer may not be able to "see" this shortcut; this is why I believe it's best to write anti-joins (and semi-joins, where we use IN or EXISTS conditions) explicitly, rather than rely on the optimizer.
The query should be
select name from a
union all
select name from b where name not in ( select name from a );
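And, as noted above, if NULLs can in fact appear in a.name, the NULL-safe version would use NOT EXISTS instead of NOT IN; a sketch:
select name from a
union all
select name
from b
where not exists ( select 1 from a where a.name = b.name );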
Here's one way to do that:
Select distinct Name
from (
    select Name from a
    UNION ALL
    select Name from b
) t

SQL filter LEFT TABLE before left join

I have read a number of posts from SO and I understand the differences between filtering in the where clause and on clause. But most of those examples are filtering on the RIGHT table (when using left join). If I have a query such as below:
select * from tableA A left join tableB B on A.ID = B.ID and A.ID = 20
The return values are not what I expected. I would have thought it first filters the left table and fetches only rows with ID = 20 and then do a left join with tableB.
Of course, this should be technically the same as doing:
select * from tableA A left join tableB B on A.ID = B.ID where A.ID = 20
But I thought the performance would be better if you could filter the table before doing a join. Can someone enlighten me on how this SQL is processed and help me understand this thoroughly.
A left join follows a simple rule. It keeps all the rows in the first table. The values of the second table's columns depend on the ON clause: if there is no match, then the second table's columns are NULL.
So, for this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID and A.ID = 20;
All the rows in A are in the result set, regardless of whether or not there is a match. When the id is not 20, then the rows and columns are still taken from A. However, the condition is false so the columns in B are NULL. This is a simple rule. It does not depend on whether the conditions are on the first table or the second table.
For this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID
where A.ID = 20;
The from clause keeps all the rows in A. But then the where clause has its effect, and it filters the rows so that only id 20 rows are in the result set.
When using a left join:
Filter conditions on the first table go in the where clause.
Filter conditions on subsequent tables go in the on clause.
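Putting both rules into a single query (B.Status is a made-up column, just to stand in for a filter on the second table):
select *
from tableA A
left join tableB B
       on A.ID = B.ID
      and B.Status = 'ACTIVE'   -- filter on the second table: keep it in ON
where A.ID = 20;                 -- filter on the first table: keep it in WHERE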
Where you have FROM tableA, you could put a subquery there instead, like FROM (select x.* from tableA x where x.ID = 20) TA.
Then refer to TA like you did tableA previously.
Likely the query optimizer would do this for you anyway.
Oracle should have a way to show the query plan: put EXPLAIN PLAN FOR before the SQL statement. Look at the plan both ways and see what it does.
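In Oracle that would look something like this (a sketch):
EXPLAIN PLAN FOR
SELECT *
FROM tableA A
LEFT JOIN tableB B ON A.ID = B.ID AND A.ID = 20;

-- then read the plan that was just stored
SELECT * FROM table(DBMS_XPLAN.DISPLAY);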
In your first SQL statement, A.ID = 20 is technically not joining anything. Joins are used to connect two separate tables together, with the ON clause matching columns by treating them as keys.
The WHERE clause, on the other hand, filters the data, reducing the number of rows returned to only those where that value is found in that particular column.