PostgreSQL LEFT OUTER JOIN Conditionals not working - sql

This LEFT OUTER JOIN with several conditionals is not working, it's probably something obvious. It is returning the result of all distinct sid and not performing conditionals at all.
SELECT
count(distinct student_status.sid)
FROM studentcoursedb.student_status
LEFT OUTER JOIN studentcoursedb.student_status AS t0
ON t0.sid = student_status.sid
AND t0.term < student_status.term
AND student_status.major LIKE 'ABC%';
The result, 32684 is the count of total distinct sids, the same value returned by this query:
select count(distinct sid)
from studentcoursedb.student_status;

The two query
SELECT
count(distinct student_status.sid)
FROM studentcoursedb.student_status
LEFT OUTER JOIN studentcoursedb.student_status AS t0
ON t0.sid = student_status.sid
AND t0.term < student_status.term
AND student_status.major LIKE 'ABC%';
select count(distinct sid)
from studentcoursedb.student_status;
return the same number of rows correctly because
You are left joining (left join or left outer join is the same) the same table this mean that the resulting number of rows is ever the same number of the main table
If you want a subset matching you should use inner join (or other join relation)

You are counting a column from the left table that might have duplicate rows as a result of the LEFT JOIN, but certainly no filtered rows.

A LEFT OUTER JOIN keeps all rows in the first table along with matching rows in the second. Hence, it does not filter the first table. You are counting a column from the first table. So, the LEFT OUTER JOIN does not affect the distinct count.
If you want to filter rows, then use INNER JOIN instead. I would also move the conditions to the WHERE clause:
SELECT count(distinct ss.sid)
FROM studentcoursedb.student_status ss INNER JOIN
studentcoursedb.student_status ss2
ON ss2.sid = ss.sid
WHERE ss2.term < ss.term AND ss.major LIKE 'ABC%';
I should note that I don't think you need a self join. Have you considered:
select dense_rank(ss.term) over (order by term)
from studentcoursedb.student_status ss
where ss.major like 'ABC%';
Much simpler and should have better performance.

Related

Joined query producing more results compared to solo query

I am performing the following query which has an inner join against another table.
select count(myTable.name)
from sch2.sample_detail as myTable
inner join sch1.otherTable as otherTable on myTable.name = otherTable.name
where otherTable.is_valid = 1
and myTable.name IS NOT NULL;
This produces a count of 4912304.
The following is a query just on a single table (my table).
SELECT COUNT(myTable.name)
from sch2.sample_detail as myTable
where myTable.name IS NOT NULL;
This produces a count of 2864654.
But how is this possible? Both queries have the clause where myTable.name IS NOT NULL.
Shouldn't the second query produce same results or if not even more cos the second query doesn't have the otherTable.is_valid = 1 clause?
Why does the inner join produces a higher count of result?
Please advice if there is something I should amend in the 1st query, thanks.
Inner, left or cross join can duplicate rows. sch1.otherTable.name is not unique and this causing rows duplication because for each row in left table all corresponding rows from right table are being selected, this is normal join behavior.
To get duplicate names list use this query and decide how to remove duplicated rows: filter or distinct or filter by row_number, etc.
select count(*) cnt,
name
from sch1.otherTable
having count(*)>1
order by cnt desc;
If you need EXISTS (and do not need to select columns from otherTable), use left semi join.
Also subquery with distinct can be used to pre-aggregate name before join and filter:
select count(myTable.name)
from sch2.sample_detail as myTable
LEFT SEMI JOIN (select distinct name from sch1.otherTable otherTable where otherTable.is_valid = 1 ) as otherTable on myTable.name = otherTable.name
where myTable.name IS NOT NULL;

Why an 'ON' clause is required in a left outer join

As far as I understand in a left outer join between two tables (say a & b) all the rows of the table on the left side of the join are retrieved regardless of the values in the rows on the right table. Then why do we need an 'ON' clause specifying a condition, something like this:
select * from a LEFT OUTER JOIN b on a.some_column1 = b.some_column2;
Why is there a need for the statement "a.some_column1 = b.some_column2".
A left join would return all the rows from table a, and for each row the matching row in table b, if it exists - if it doesn't, nulls would be returned instead of b's columns. The on clause defines how this matching is done.
An on clause is required since you are "joining", and you need to tell which columns you want to join by. Otherwise you would use traditional from without any where condition to all possible row combinations. But you wanted a join, right?
Yeah, that's pretty much it is.
As far as I understand in a left outer join between two tables (say a & b) all the rows of the table on the left side of the join are retrieved regardless of the values in the rows on the right table.
That is correct in the sense that it says something about what left join on returns, but it isn't a definition of what it returns. left join on returns inner join on rows plus (union all) unmatched left table rows extended by nulls.
inner join on returns the rows of cross join that satisfy the on condition--which could be any condition on the columns. cross join returns every combination of a row from the left table & a row from the right table.
What do you expect outer join without on to mean? In standard SQL outer & inner join have to have an on. inner join on a true condition is the same as cross join. Which has no unmatched left table rows. So if you want outer join with no on to mean outer join on a true condition then, since there are no unmatched rows in the inner join on that condition, the result is also just cross join. (MySQL allows inner join to be used without an on, but it just means cross join.)

NOT IN converted to LEFT JOIN giving different result

please help on below query
select * from processed_h where c_type not in (select convert(int,n_index) from index_m where n_index <>'0') **-- 902 rows**
select * from processed_h where c_type not in (2001,2002,2003) **-- 902 rows**
select convert(int,n_index) from index_m where n_index <>'0' **--- 2001,2002,2003**
I tried to convert the not in to LEFT JOIN as below but it is giving me 40,000 rows returned what I am doing wrong
select A.* from processed_h A LEFT JOIN index_m B on A.c_type <> convert(int,B.n_index) and B.n_index <>'0' --40,000 + rows
A LEFT JOIN returns ALL rows from the "left-hand" table regardless of whether the condition matches or not, which is why you are getting the "extra" rows.
An INNER JOIN might give you the same number of rows, but if there are multiple matches in the "right-hand" table then you'll still get more rows than you expect.
If NOT IN gives you the expected results then I'd stick with that. You probably aren;t going to see significant improvements with a join. The only reason I would change to an INNER JOIN is if I needed columns from the joined table in my output.
For the equivalent of a NOT IN using a left join, you need to link the tables as though the results in the linked table should be IN the resultset, then select only those records where the outer joined table did not return a record - like so:
select A.* from processed_h A
LEFT JOIN index_m B on A.c_type = convert(int,B.n_index) and B.n_index <>'0'
WHERE B.n_index IS NULL
However, you might get better performance using a NOT EXISTS query instead:
select A.* from processed_h A
where not exists
(select 1 from index_m B where B.n_index <>'0' and A.c_type = convert(int,B.n_index) )

Where Exists query returning incorrect results

The inner query here returns values that only appear in one of the tables. The outer query is supposed to return a count of those. Instead, it returns the entire table, not just the NULL values.
select count(*) from tblicd
where exists
(
select i.icd_id
from tblicd i left outer join icd_jxn on icd_jxn.icd_id=i.icd_id
where icd_jxn.icd_id is null
)
The inner query
select i.icd_id
from tblicd i left outer join icd_jxn on icd_jxn.icd_id=i.icd_id
where icd_jxn.icd_id is null
works and does what I want. I'd like (using a sub query method like this) to use the outer query to just return the number of rows that the inner query returns.
You need to join the two (outer and inner) tblicd tables in the subquery:
and i.icd_id = tblicd.icd_id
(or whatever the id of the tblicd table is)
The query you posted doesn't make any sense. However, from your description, it sounds like you've got two tables and you're trying to find any IDs that don't exist in both tables. If that's correct, you should try something like this:
select count(*) as cnt
from table1 t1
full outer join
table2 t2
on t1.id = t2.id
where t1.id is null
or t2.id is null
This may not work in the database you're using, but since you didn't tell us that, we can't tailor the solution to fit your dialect of SQL.
Based on the revised question, you could simplify this a number of ways:
select count(*)
from tblicd
where not exists (select i.icd_id
from icd_jxn
where icd_jxn.icd_id = tblicd)
select count(tblicd.icd_id)
from tblicd
left join
icd_jxn
on tblicd.icd_id = icd_jxn.icd_id
where icd_jxn.icd_id is null
select count(tblicd.icd_id)
from tblicd
where icd_id not in (select icd_id
from icd_jxn)
Basically, there's no reason to select from tblicd twice.

Is inner join the same as equi-join?

Can you tell me if inner join and equi-join are the same or not ?
An 'inner join' is not the same as an 'equi-join' in general terms.
'equi-join' means joining tables using the equality operator or equivalent. I would still call an outer join an 'equi-join' if it only uses equality (others may disagree).
'inner join' is opposed to 'outer join' and determines how to join two sets when there is no matching value.
Simply put: an equi-join is a possible type of inner-joins
For a more in-depth explanation:
An inner-join is a join that returns only rows from joined tables where a certain condition is met. This condition may be of equality, which means we would have an equi-join; if the condition is not that of equality - which may be a non-equality, greater than, lesser than, between, etc. - we have a nonequi-join, called more precisely theta-join.
If we do not want such conditions to be necessarily met, we can have
outer joins (all rows from all tables returned), left join (all rows
from left table returned, only matching for right table), right join
(all rows from right table returned, only matching for left table).
The answer is NO.
An equi-join is used to match two columns from two tables using explicit operator =:
Example:
select *
from table T1, table2 T2
where T1.column_name1 = T2.column_name2
An inner join is used to get the cross product between two tables, combining all records from both tables. To get the right result you can use a equi-join or one natural join (column names between tables must be the same)
Using equi-join (explicit and implicit)
select *
from table T1 INNER JOIN table2 T2
on T1.column_name = T2.column_name
select *
from table T1, table2 T2
where T1.column_name = T2.column_name
Or Using natural join
select *
from table T1 NATURAL JOIN table2 T2
The answer is No,here is the short and simple for readers.
Inner join can have equality (=) and other operators (like <,>,<>) in the join condition.
Equi join only have equality (=) operator in the join condition.
Equi join can be an Inner join,Left Outer join, Right Outer join
If there has to made out a difference then ,I think here it is .I tested it with DB2.
In 'equi join'.you have to select the comparing column of the table being joined , in inner join it is not compulsory you do that . Example :-
Select k.id,k.name FROM customer k
inner join dealer on(
k.id =dealer.id
)
here the resulted rows are only two columns rows
id name
But I think in equi join you have to select the columns of other table too
Select k.id,k.name,d.id FROM customer k,dealer d
where
k.id =d.id
and this will result in rows with three columns , there is no way you cannot have the unwanted compared column of dealer here(even if you don't want it) , the rows will look like
id(from customer) name(from Customer) id(from dealer)
May be this is not true for your question.But it might be one of the major difference.
The answer is YES, But as a resultset. So here is an example.
Consider three tables:
orders(ord_no, purch_amt, ord_date, customer_id, salesman_id)
customer(customer_id,cust_name, city, grade, salesman_id)
salesman(salesman_id, name, city, commission)
Now if I have a query like this:
Find the details of an order.
Using INNER JOIN:
SELECT * FROM orders a INNER JOIN customer b ON a.customer_id=b.customer_id
INNER JOIN salesman c ON a.salesman_id=c.salesman_id;
Using EQUI JOIN:
SELECT * FROM orders a, customer b,salesman c where
a.customer_id=b.customer_id and a.salesman_id=c.salesman_id;
Execute both queries. You will get the same output.
Coming to your question There is no difference in output of equijoin and inner join. But there might be a difference in inner executions of both the types.