Oracle - MINUS-operator with different results than OUTER JOIN - sql

In Oracle the following MINUS SQL statement returns results, while the allegedly equivalent OUTER JOIN statement doesn't return any.
Results:
SELECT
/*+parallel (8)*/
pd.item_id
FROM MY_TABLE#DB_LINK_PROD_ENV pd
WHERE pd.valid_to='09.09.9999'
MINUS
SELECT
/*+parallel (8)*/
it.item_id
FROM MY_TABLE#DB_LINK_TEST_ENV it
WHERE it.valid_to='09.09.9999' ;
No results:
SELECT
/*+parallel (8)*/
pd.item_id,
it.item_id
FROM MY_TABLE#DB_LINK_PROD_ENV pd
LEFT OUTER JOIN MY_TABLE#DB_LINK_TEST_ENV it
ON pd.item_id = it.item_id
WHERE it.valid_to ='09.09.9999'
AND pd.valid_to ='09.09.9999'
AND it.item_id IS NULL;
Without knowing the data, what could be the reason?

The first query uses MINUS, so it returns every item_id that exists in DB_LINK_PROD_ENV with valid_to='09.09.9999' but does not exist in DB_LINK_TEST_ENV with valid_to='09.09.9999'.
The second query is a LEFT JOIN whose WHERE clause requires both
it.valid_to ='09.09.9999'
AND pd.valid_to ='09.09.9999'
Suppose DB_LINK_PROD_ENV has records with valid_to='09.09.9999' that have no counterpart in DB_LINK_TEST_ENV with valid_to='09.09.9999'. MINUS returns those records.
In the LEFT JOIN, a PROD row without a match gets NULL in every it column, so it.valid_to ='09.09.9999' is not true for that row and it is dropped. A row that does match has a non-NULL it.item_id and fails it.item_id IS NULL. No row can satisfy both conditions, so the second query returns nothing.
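A minimal sketch of this effect, using Python's built-in sqlite3 module in place of Oracle. The tables prod_items and test_items and their rows are made up for illustration, and SQLite spells MINUS as EXCEPT. The third query moves the it.valid_to test into the ON clause, which is one way (an assumption here, not something stated in the answer) to make the join agree with the set difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the PROD and TEST tables.
conn.executescript("""
    CREATE TABLE prod_items (item_id INTEGER, valid_to TEXT);
    CREATE TABLE test_items (item_id INTEGER, valid_to TEXT);
    INSERT INTO prod_items VALUES
        (1, '09.09.9999'), (2, '09.09.9999'), (3, '09.09.9999');
    INSERT INTO test_items VALUES
        (1, '09.09.9999'), (2, '01.01.2020');
""")

# Set difference: items valid in PROD but not valid in TEST.
minus_rows = conn.execute("""
    SELECT item_id FROM prod_items WHERE valid_to = '09.09.9999'
    EXCEPT
    SELECT item_id FROM test_items WHERE valid_to = '09.09.9999'
    ORDER BY 1
""").fetchall()

# Same shape as the question: the it.valid_to test sits in WHERE,
# so NULL-extended rows are rejected before IS NULL can keep them.
where_rows = conn.execute("""
    SELECT pd.item_id
    FROM prod_items pd
    LEFT OUTER JOIN test_items it ON pd.item_id = it.item_id
    WHERE it.valid_to = '09.09.9999'
      AND pd.valid_to = '09.09.9999'
      AND it.item_id IS NULL
""").fetchall()

# The it.valid_to test moved into ON: unmatched rows survive the join.
on_rows = conn.execute("""
    SELECT pd.item_id
    FROM prod_items pd
    LEFT OUTER JOIN test_items it
      ON pd.item_id = it.item_id AND it.valid_to = '09.09.9999'
    WHERE pd.valid_to = '09.09.9999'
      AND it.item_id IS NULL
    ORDER BY pd.item_id
""").fetchall()

print(minus_rows)  # [(2,), (3,)]
print(where_rows)  # []
print(on_rows)     # [(2,), (3,)]
```

Note that EXCEPT also removes duplicates, so the two forms only agree row for row when item_id is unique on the PROD side, as it is in this sample data.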

Related

Minus operation gives wrong answer

I am trying to compute the MINUS of two joins, that is, to find the unmatched records.
I tried:
SELECT count(*) FROM mp_v1 mp
left join cm_v1 sop
on mp.study_name=sop.study_name and
sop.site_id=sop.site_id
--where mp.study_name='1101'
MINUS
SELECT count(*) FROM iv_mpv1 mp
inner join cm sop
on mp.study_name=sop.study_name and
sop.site_id=sop.site_id
--where mp.study_name='1101'
Output: this gives me a count of 171183251.
But when I run the queries individually I get 171183251 for the left outer join and 171070345 for the inner join, so the output should be 112906. I am not sure where my query is wrong. Could anyone please give their opinion?
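If you want unmatched records, you wouldn't use MINUS on the counts. Each SELECT COUNT(*) returns a single row, and MINUS compares rows rather than subtracting numbers: since 171183251 and 171070345 are different values, the first row survives unchanged. The query would look more like: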
SELECT COUNT(*)
FROM ((SELECT *
FROM mp_v1 mp LEFT JOIN
cm_v1 sop
USING (study_name, site_id)
) MINUS
(SELECT *
FROM iv_mpv1 mp LEFT JOIN
iv_cmv1 sop
USING (study_name, site_id)
)
) x;
Also note that MINUS removes duplicates, so if you have duplicates within each set of tables, then they only count as one row.
The SELECT * assumes that the tables have the same columns and compatible types -- which makes sense given the gist of the question. You may need to list the particular columns you care about.
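To see why MINUS on two counts returns the first count rather than a difference, here is a small sketch using Python's built-in sqlite3 module. The tables left_rows and right_rows are invented for the demonstration, and SQLite's EXCEPT stands in for MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: five ids on the left, three of them on the right.
conn.executescript("""
    CREATE TABLE left_rows (id INTEGER);
    CREATE TABLE right_rows (id INTEGER);
    INSERT INTO left_rows VALUES (1), (2), (3), (4), (5);
    INSERT INTO right_rows VALUES (1), (2), (3);
""")

# EXCEPT between two one-row results: the row (5) is not in the set {(3)},
# so it is returned as is. No arithmetic happens.
count_minus = conn.execute("""
    SELECT COUNT(*) FROM left_rows
    EXCEPT
    SELECT COUNT(*) FROM right_rows
""").fetchall()

# EXCEPT on the rows themselves, counted afterwards.
unmatched = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT id FROM left_rows
        EXCEPT
        SELECT id FROM right_rows
    ) AS x
""").fetchone()[0]

# Plain subtraction of the two counts, if a numeric difference is all you need.
difference = conn.execute("""
    SELECT (SELECT COUNT(*) FROM left_rows) - (SELECT COUNT(*) FROM right_rows)
""").fetchone()[0]

print(count_minus)  # [(5,)]
print(unmatched)    # 2
print(difference)   # 2
```

The last two numbers agree here only because the sample ids are unique and every right-hand id also appears on the left; with duplicates they can differ.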

How join two query by removing inner query name in MS Access

I have two tables. One table holds floor numbers (tb_FloorNumber.FloorNumber, for example 1 to 15), and the other holds a floor number and a User_Id column (tb_Emp_Master.FloorNumber, tb_Emp_Master.User_Id). I want all the records from tb_FloorNumber and only those records from tb_Emp_Master where User_Id = "fat35108".
I know I can do this with two queries like this :
Query 1:
SELECT DISTINCT tb_Emp_Master.FloorNumber
FROM tb_Emp_Master
WHERE (((tb_Emp_Master.User_Id)="fat35108"));
Query2:
SELECT DISTINCT tb_FloorNumber.FloorNumber, Query1.FloorNumber
FROM tb_FloorNumber LEFT JOIN Query1 ON tb_FloorNumber.FloorNumber = Query1.FloorNumber;
But I want to write this as a single query instead of using Query1 inside Query 2.
I have tried like this:
SELECT DISTINCT tb_FloorNumber.FloorNumber, tb_Emp_Master.FloorNumber
FROM tb_FloorNumber LEFT JOIN tb_Emp_Master ON tb_FloorNumber.FloorNumber = tb_Emp_Master.FloorNumber
WHERE (((tb_Emp_Master.User_Id)="fat35108"));
But it returns only one record (for instance, 8).
Please help me write this.
If you set the condition:
tb_Emp_Master.User_Id = "fat35108"
in the WHERE clause, then you actually get an INNER JOIN instead of a LEFT JOIN because you filter only the matched rows from tb_Emp_Master.
Use tb_Emp_Master in the LEFT JOIN instead of Query1 and set the condition in the ON clause:
SELECT DISTINCT
tb_FloorNumber.FloorNumber,
tb_Emp_Master.FloorNumber
FROM tb_FloorNumber LEFT JOIN tb_Emp_Master
ON tb_FloorNumber.FloorNumber = tb_Emp_Master.FloorNumber AND tb_Emp_Master.User_Id = "fat35108";
I don't know why you need DISTINCT, so I kept it.

Impala SQL LEFT ANTI JOIN

The goal is to find the empids for a given time range that are present in the LEFT table but not in the RIGHT table.
I ran the following two Impala queries and got different results.
QUERY 1: select count(dbonetable.empid), COUNT(DISTINCT dbtwotable.empid) from
(select distinct dbonetable.empid
from dbonedbtable dbonetable
WHERE (dbonetable.expiration_dt >= '2009-01-01' OR dbonetable.expiration_dt IS NULL) AND dbonetable.effective_dt <= '2019-01-01' AND dbonetable.empid IS NOT NULL) dbonetable
LEFT join dbtwodbtable dbtwotable ON dbonetable.empid = dbtwotable.empid
--43324489 43270569
QUERY 2: select count(*) from (
select distinct dbonetable.empid from dbonedbtable dbonetable
LEFT ANTI join dbtwodbtable dbtwotable ON dbonetable.empid = dbtwotable.empid
AND (dbonetable.expiration_dt >= '2009-01-01' OR dbonetable.expiration_dt IS NULL) AND dbonetable.effective_dt <= '2019-01-01' AND dbonetable.empid IS NOT NULL) tab
--19088973
--For LEFT ANTI JOIN, this clause returns those values from the left-hand table that have no matching value in the right-hand table.
To explain the context:
Query 2: I am trying to find all the empids that are in dbonetable and not in dbtwotable using LEFT ANTI JOIN, which I learned from here:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_joins.html
--For LEFT ANTI JOIN, this clause returns those values from the left-hand table that have no matching value in the right-hand table.
And in Query 1:
dbonetable is computed from the WHERE clause, and its results are LEFT OUTER joined with dbtwotable. On top of that result I take count(dbonetable.empid) and COUNT(DISTINCT dbtwotable.empid), which gave me 43324489 and 43270569, a difference of 53,920.
My question: which is right, the Query 1 result of 43324489 - 43270569 = 53,920 or the Query 2 result of 19088973?
What could be missing here? Is my Query 1 incorrect, or is my LEFT ANTI JOIN misleading?
Thank you all in advance.
It's different because you forgot to specify "where dbtwotable.empid is null" in query 1.
Additionally, your query 2 is logically different from query 1: in query 1 you join only on equality of the two empid columns, while in query 2 the join has many more conditions, so far fewer rows count as matches than in query 1 and, as a result, the final count is much larger.
If you make the join condition in query 2 the same as in query 1 and put everything else into the WHERE clause, you will get the same count as in the updated query 1, which is 53920. That's the count you need.
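The effect of putting the left-table filter inside the anti-join condition can be sketched with Python's built-in sqlite3 module. SQLite has no LEFT ANTI JOIN, so NOT EXISTS is used here as a stand-in with the same meaning; the tables emp_one and emp_two and the active flag (a simplified replacement for the date filters) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: emp_one has a filter column, emp_two has duplicate empids.
conn.executescript("""
    CREATE TABLE emp_one (empid INTEGER, active INTEGER);
    CREATE TABLE emp_two (empid INTEGER);
    INSERT INTO emp_one VALUES (1, 1), (2, 0), (3, 1), (4, 0);
    INSERT INTO emp_two VALUES (1), (1), (2);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Query 1 with the missing IS NULL test added: filter first, then keep
# only the left rows that found no partner.
left_join_null = scalar("""
    SELECT COUNT(*)
    FROM (SELECT empid FROM emp_one WHERE active = 1) o
    LEFT JOIN emp_two t ON o.empid = t.empid
    WHERE t.empid IS NULL
""")

# Anti-join with the filter applied to the left rows (outside the match test).
filter_in_where = scalar("""
    SELECT COUNT(*)
    FROM emp_one o
    WHERE o.active = 1
      AND NOT EXISTS (SELECT 1 FROM emp_two t WHERE t.empid = o.empid)
""")

# Anti-join with the filter folded into the match test, as in query 2:
# rows that fail the filter can never match, so they are all returned.
filter_in_match = scalar("""
    SELECT COUNT(*)
    FROM emp_one o
    WHERE NOT EXISTS (
        SELECT 1 FROM emp_two t
        WHERE t.empid = o.empid AND o.active = 1
    )
""")

print(left_join_null, filter_in_where, filter_in_match)  # 1 1 3
```

Only empid 3 is both active and absent from emp_two, which the first two queries report. The third also counts the inactive empids 2 and 4, which mirrors why the LEFT ANTI JOIN in the question returns a much larger number.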

Why is LEFT JOIN deleting rows?

I have been using sql for a long time, but I am now working in Databricks and I am getting a very strange result. I have a table called block_durations with a set of ids (called block_ts), and I have another table called mergetable, which I want to left join to that table. Mergetable is indexed by acct_id and block_ts, so it has many different records for each block_ts. I want to keep the rows in block_durations that don't match, and if there are multiple matches in mergetable I want there to be multiple corresponding entries in the resulting join, as you would expect from a left join.
But this is not happening. In order to demonstrate this, I am showing the result of joining mergetable, after filtering for a single acct_id so that there is at most one match per block_ts.
select count(*) from mergetable where acct_id = '0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
16579
select count(*) from block_durations
82817
select count(*) from
(
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN mergetable mt
ON mt.block_ts = bd.block_ts
where acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
) countTable
16579
As you can see, even though there are >80000 records in block_durations, most of them are getting lost in the left join. Why is this happening? I thought the whole point of a left join is that the non-matching rows of the left table are kept. This is exactly the behavior I would expect from an inner join -- and indeed when I switch to an inner join nothing changes.
Could someone please help me figure out what's going on?
-Paul
All rows from the left side of the join are preserved, but the WHERE condition then runs on the joined result and removes every row that does not satisfy it. For the unmatched rows acct_id is NULL, so they are filtered out.
Merge your WHERE condition into the JOIN condition:
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN mergetable mt
ON mt.block_ts = bd.block_ts AND acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
You can also filter mergetable before you run JOIN on the results:
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN (SELECT * FROM mergetable WHERE acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98') mt
ON mt.block_ts = bd.block_ts
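As a rough check that the two rewrites behave the same way, the sketch below runs all three variants with Python's built-in sqlite3 module. The table and column names follow the question, but the rows are invented and SQLite stands in for Databricks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample: five blocks, and two accounts with a few entries each.
conn.executescript("""
    CREATE TABLE block_durations (block_ts INTEGER, block_duration INTEGER);
    CREATE TABLE mergetable (acct_id TEXT, block_ts INTEGER);
    INSERT INTO block_durations VALUES (1, 10), (2, 11), (3, 12), (4, 13), (5, 14);
    INSERT INTO mergetable VALUES ('a', 1), ('a', 2), ('b', 2), ('b', 3);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Original form: the WHERE test discards the NULL-extended rows.
where_count = scalar("""
    SELECT COUNT(*)
    FROM block_durations bd
    LEFT OUTER JOIN mergetable mt ON mt.block_ts = bd.block_ts
    WHERE mt.acct_id = 'a'
""")

# Filter merged into the join condition.
on_count = scalar("""
    SELECT COUNT(*)
    FROM block_durations bd
    LEFT OUTER JOIN mergetable mt
      ON mt.block_ts = bd.block_ts AND mt.acct_id = 'a'
""")

# Filter applied to mergetable before the join.
subquery_count = scalar("""
    SELECT COUNT(*)
    FROM block_durations bd
    LEFT OUTER JOIN (SELECT * FROM mergetable WHERE acct_id = 'a') mt
      ON mt.block_ts = bd.block_ts
""")

print(where_count, on_count, subquery_count)  # 2 5 5
```

With at most one match per block_ts for the chosen account, both rewrites return one row per row of block_durations, while the WHERE form returns only the matched rows.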

What's the difference between filtering in the WHERE clause compared to the ON clause?

I would like to know if there is any difference in using the WHERE clause or using the matching in the ON of the inner join.
The result in this case is the same.
First query:
with Catmin as
(
select categoryid, MIN(unitprice) as mn
from production.Products
group by categoryid
)
select p.productname, mn
from Catmin
inner join Production.Products p
on p.categoryid = Catmin.categoryid
and p.unitprice = Catmin.mn;
Second query:
with Catmin as
(
select categoryid, MIN(unitprice) as mn
from production.Products
group by categoryid
)
select p.productname, mn
from Catmin
inner join Production.Products p
on p.categoryid = Catmin.categoryid
where p.unitprice = Catmin.mn; -- this is changed
Result both queries:
My answer may be a bit off-topic, but I would like to highlight a problem that may occur when you turn your INNER JOIN into an OUTER JOIN.
In this case, the most important difference between putting predicates (test conditions) in the ON or the WHERE clause is that you can turn LEFT or RIGHT OUTER JOINs into INNER JOINs without noticing it, if the WHERE clause tests fields of the optional table (the one whose rows may be missing).
For example, in a LEFT JOIN between tables A and B, if you include a condition that involves fields of B on the WHERE clause, there's a good chance that there will be no null rows returned from B in the result set. Effectively, and implicitly, you turned your LEFT JOIN into an INNER JOIN.
On the other hand, if you include the same test in the ON clause, null rows will continue to be returned.
For example, take the query below:
SELECT * FROM A
LEFT JOIN B
ON A.ID=B.ID
The query will also return rows from A that do not match any of B.
Take this second query:
SELECT * FROM A
LEFT JOIN B
ON 1=1 -- most databases require an ON clause here; the real test sits in WHERE
WHERE A.ID=B.ID
This second query won't return any rows from A that don't match B, even though you think it will because you specified a LEFT JOIN. That's because the test A.ID=B.ID will leave out of the result set any rows with B.ID that are null.
That's why I favor putting predicates in the ON clause rather than in the WHERE clause.
The results are exactly the same.
The ON clause is often suggested for performance, on the idea that the first data set is filtered before it is joined to the other tables, so there is less data to match.
For an inner join, though, query optimizers are generally free to apply the predicate at the same point in either form, so compare the execution plans before assuming one is faster.
There is no difference between the outputs of the two queries above; both return the same result.
Logically, with the ON clause the join keeps only the rows that match the condition specified in ON.
With the WHERE clause, the join logically combines all the rows first and then filters them by the WHERE condition.
That is why the ON clause is often preferred, although for an inner join the optimizer will usually produce the same plan for both forms.
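The claim that both forms return the same rows for an inner join can be checked with a small sketch using Python's built-in sqlite3 module. The products table and its rows are invented stand-ins for Production.Products, and the check covers results only, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical product data: two categories, with a tie for the minimum in one.
conn.executescript("""
    CREATE TABLE products (productname TEXT, categoryid INTEGER, unitprice INTEGER);
    INSERT INTO products VALUES
        ('A', 1, 10), ('B', 1, 12),
        ('C', 2, 5),  ('D', 2, 5), ('E', 2, 9);
""")

# Shared CTE: the minimum unit price per category.
CATMIN = """
    WITH Catmin AS (
        SELECT categoryid, MIN(unitprice) AS mn
        FROM products
        GROUP BY categoryid
    )
"""

# Price test inside the ON clause.
on_rows = conn.execute(CATMIN + """
    SELECT p.productname, mn
    FROM Catmin
    INNER JOIN products p
      ON p.categoryid = Catmin.categoryid
     AND p.unitprice = Catmin.mn
    ORDER BY p.productname
""").fetchall()

# Price test inside the WHERE clause.
where_rows = conn.execute(CATMIN + """
    SELECT p.productname, mn
    FROM Catmin
    INNER JOIN products p
      ON p.categoryid = Catmin.categoryid
    WHERE p.unitprice = Catmin.mn
    ORDER BY p.productname
""").fetchall()

print(on_rows)     # [('A', 10), ('C', 5), ('D', 5)]
print(where_rows)  # [('A', 10), ('C', 5), ('D', 5)]
```

The equivalence holds because an inner join never produces NULL-extended rows; with LEFT JOIN in place of INNER JOIN the two forms can return different results.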