In SQL, a Join is actually an Intersection? And it is also a linkage or a "Sideway Union"?

I always thought of a Join in SQL as some kind of linkage between two tables.
For example,
select e.name, d.name from employees e, departments d
where e.deptID = d.deptID
In this case, it is linking two tables to show each employee with a department name instead of a department ID. It is kind of like a "linkage" or a "sideway Union".
But, after learning about inner join vs outer join, it shows that a Join (Inner join) is actually an intersection.
For example, when one table has the ID 1, 2, 7, 8, while another table has the ID 7 and 8 only, the way we get the intersection is:
select * from t1, t2 where t1.ID = t2.ID
to get the two records of "7 and 8". So it is actually an intersection.
So we have the "Intersection" of 2 tables. Compare this with the "Union" operation on 2 tables. Can a Join be thought of as an "Intersection"? But what about the "linking" or "sideway union" aspect of it?

You're on the right track; the rows returned by an INNER JOIN are those that satisfy the join conditions. But this is like an intersection only because you're using equality in your join condition, applied to columns from each table.
Also be aware that INTERSECT is already an SQL operation and it has another meaning -- it's not the same as JOIN.
An SQL JOIN can produce a new type of row, which has all the columns from both joined tables. For example: col4, col5, and col6 don't exist in table A, but they do exist in the result of a join with table B:
SELECT a.col1, a.col2, a.col3, b.col4, b.col5, b.col6
FROM A INNER JOIN B ON a.col2=b.col5;
An SQL INTERSECT returns rows that are common to two separate tables, which must already have the same columns.
SELECT col1, col2, col3 FROM A
INTERSECT
SELECT col1, col2, col3 FROM B;
This happens to produce the same result as the following join (provided the rows are unique and the compared columns contain no NULLs):
SELECT a.col1, a.col2, a.col3
FROM A INNER JOIN B ON a.col1=b.col1 AND a.col2=b.col2 AND a.col3=b.col3;
Not every brand of database supports the INTERSECT operator.
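As a rough illustration of the difference described above, here is a minimal sketch using Python's built-in sqlite3 module and the IDs from the question (1, 2, 7, 8 versus 7, 8). The table names t1 and t2 come from the question; the in-memory database is only an assumption for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (ID INTEGER);
    CREATE TABLE t2 (ID INTEGER);
    INSERT INTO t1 VALUES (1), (2), (7), (8);
    INSERT INTO t2 VALUES (7), (8);
""")

# INNER JOIN: each result row carries columns from BOTH tables.
join_rows = conn.execute(
    "SELECT t1.ID, t2.ID FROM t1 INNER JOIN t2 ON t1.ID = t2.ID ORDER BY t1.ID"
).fetchall()

# INTERSECT: each result row has the same shape as the input rows.
intersect_rows = conn.execute(
    "SELECT ID FROM t1 INTERSECT SELECT ID FROM t2 ORDER BY ID"
).fetchall()

print(join_rows)       # [(7, 7), (8, 8)]
print(intersect_rows)  # [(7,), (8,)]
```

Both queries pick out IDs 7 and 8, but the join widens the row while INTERSECT keeps the original column list.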

A join 'links' or erm... joins the rows from two tables. I think that's what you mean by 'sideways union' although I personally think that is a terrible way to phrase it. But there are different types of joins that do slightly different things:
An inner join is indeed an intersection.
A full outer join is a union.
This page on Jeff Atwood's blog describes other possibilities.

An outer join is not related to UNION or UNION ALL.
For example, a NULL would not occur as a result of a UNION or UNION ALL operation, but it can result from an outer join.
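A small sketch of that point, assuming Python's sqlite3 module and a hypothetical two-row employees table with one unmatched department ID: the outer join pads the unmatched row with NULL (None in Python), while UNION only stacks the values that are already stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, deptID INTEGER);
    CREATE TABLE departments (deptID INTEGER, name TEXT);
    INSERT INTO employees VALUES ('Ann', 10), ('Bob', 99);  -- 99 has no department
    INSERT INTO departments VALUES (10, 'Sales');
""")

# The outer join keeps Bob and fills the missing department with NULL.
outer_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e LEFT OUTER JOIN departments d ON e.deptID = d.deptID
    ORDER BY e.name
""").fetchall()

# UNION stacks rows vertically; it never invents a NULL.
union_rows = conn.execute("""
    SELECT name FROM employees
    UNION
    SELECT name FROM departments
    ORDER BY name
""").fetchall()

print(outer_rows)  # [('Ann', 'Sales'), ('Bob', None)]
print(union_rows)  # [('Ann',), ('Bob',), ('Sales',)]
```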

INNER JOIN treats two NULLs as two different values, because the comparison NULL = NULL is never true. So, if you join based on a nullable column, and if both tables have NULL values in that column, then INNER JOIN will ignore those rows.
Therefore, to correctly retrieve all common rows between two tables, INTERSECT should be used. INTERSECT treats two NULLs as the same value.
Example (SQLite):
Create two tables with nullable columns:
CREATE TABLE Table1 (id INT, firstName TEXT);
CREATE TABLE Table2 (id INT, firstName TEXT);
Insert NULL values:
INSERT INTO Table1 VALUES (1, NULL);
INSERT INTO Table2 VALUES (1, NULL);
Retrieve common rows using INNER JOIN (This shows no output):
SELECT * FROM Table1 INNER JOIN Table2 ON
Table1.id=Table2.id AND Table1.firstName=Table2.firstName;
Retrieve common rows using INTERSECT (This correctly shows the common row):
SELECT * FROM Table1 INTERSECT SELECT * FROM Table2;
Conclusion:
Even though INTERSECT and INNER JOIN can often be used to get the same results, they are not the same and should be picked depending on the situation.
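The SQLite example above can be checked end to end with a short script. This is a sketch using Python's sqlite3 module; the last query is an addition not in the answer, and relies on SQLite's IS operator, which compares two NULLs as equal (other databases spell a NULL-safe comparison differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INT, firstName TEXT);
    CREATE TABLE Table2 (id INT, firstName TEXT);
    INSERT INTO Table1 VALUES (1, NULL);
    INSERT INTO Table2 VALUES (1, NULL);
""")

# NULL = NULL is not true, so the equality join finds no match.
join_rows = conn.execute("""
    SELECT * FROM Table1 INNER JOIN Table2
    ON Table1.id = Table2.id AND Table1.firstName = Table2.firstName
""").fetchall()

# INTERSECT treats the two NULLs as the same value.
intersect_rows = conn.execute(
    "SELECT * FROM Table1 INTERSECT SELECT * FROM Table2"
).fetchall()

# SQLite-specific: IS gives a NULL-safe comparison inside a join.
null_safe_rows = conn.execute("""
    SELECT * FROM Table1 INNER JOIN Table2
    ON Table1.id = Table2.id AND Table1.firstName IS Table2.firstName
""").fetchall()

print(join_rows)       # []
print(intersect_rows)  # [(1, None)]
print(null_safe_rows)  # [(1, None, 1, None)]
```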

Related

How to compare two tables in Postgresql?

I have two identical tables:
A : id1, id2, qty, unit
B: id1, id2, qty, unit
The set of (id1,id2) is identifying each row and it can appear only once in each table.
I have 140 rows in table A and 141 rows in table B.
I would like to find all the keys (id1,id2) that are not appearing in both tables. There is 1 for sure but there can be more (for example if each table has wholly different data).
I wrote this query:
(TABLE a EXCEPT TABLE b)
UNION ALL
(TABLE b EXCEPT TABLE a) ;
But it's not working. It compares the whole row, whereas I don't care if qty or unit are different; I only care about id1,id2.
Use a full outer join:
select a.*,b.*
from a full outer join b
on a.id1=b.id1 and a.id2=b.id2
This shows both tables side by side, with gaps where there is an unmatched row.
select a.*,b.*
from a full outer join b
on a.id1=b.id1 and a.id2=b.id2
where a.id1 is null or b.id1 is null;
That will only show unmatched rows.
Or you can use NOT IN:
select * from a
where (id1,id2) not in
( select id1,id2 from b )
That will show rows from a not matched by b.
Or get the same result using a join:
select a.*
from a left outer join b
on a.id1=b.id1 and a.id2=b.id2
where b.id1 is null
Sometimes the join is faster than the NOT IN.
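Another option, close to the question's own attempt, is to run EXCEPT on the key columns only. The following is a sketch using Python's sqlite3 module with made-up sample rows; SQLite does not accept parentheses around the members of a compound SELECT, so each EXCEPT is wrapped in a subquery here, and the src column is a hypothetical label added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id1 INT, id2 INT, qty INT, unit TEXT);
    CREATE TABLE b (id1 INT, id2 INT, qty INT, unit TEXT);
    INSERT INTO a VALUES (1, 1, 5, 'kg'), (1, 2, 3, 'kg');
    -- (1,1) differs in qty/unit only; (2,1) exists only in b.
    INSERT INTO b VALUES (1, 1, 9, 'lb'), (1, 2, 3, 'kg'), (2, 1, 4, 'kg');
""")

# Compare only the key columns, in both directions.
only_in_one = conn.execute("""
    SELECT 'a' AS src, id1, id2
    FROM (SELECT id1, id2 FROM a EXCEPT SELECT id1, id2 FROM b)
    UNION ALL
    SELECT 'b' AS src, id1, id2
    FROM (SELECT id1, id2 FROM b EXCEPT SELECT id1, id2 FROM a)
""").fetchall()

# The anti-join from the answer, here for keys present in b but not in a.
anti_join = conn.execute("""
    SELECT b.id1, b.id2
    FROM b LEFT OUTER JOIN a ON a.id1 = b.id1 AND a.id2 = b.id2
    WHERE a.id1 IS NULL
""").fetchall()

print(only_in_one)  # [('b', 2, 1)]
print(anti_join)    # [(2, 1)]
```

The row (1, 1) is not reported even though qty and unit differ, because only the keys are compared.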
Here is an example of using EXCEPT to see which records are different. Swap the two SELECT statements to see the differences in the other direction: a EXCEPT s, then s EXCEPT a.
SELECT
a.address_entrytype,
a.address_street,
a.address_city,
a.address_state,
a.address_postal_code,
a.company_id
FROM
prospects.address a
except
SELECT
s.address_entrytype,
s.address_street,
s.address_city,
s.address_state,
s.address_postal_code,
s.company_id
FROM
prospects.address_short s

Correct way to select from two tables in SQL Server with no common field to join on

Back in the old days, I used to write select statements like this:
SELECT
table1.columnA, table2.columnA
FROM
table1, table2
WHERE
table1.columnA = 'Some value'
However I was told that having comma separated table names in the "FROM" clause is not ANSI92 compatible. There should always be a JOIN statement.
This leads to my problem.... I want to do a comparison of data between two tables but there is no common field in both tables with which to create a join. If I use the 'legacy' method of comma separated table names in the FROM clause (see code example), then it works perfectly fine. I feel uncomfortable using this method if it is considered wrong or bad practice.
Anyone know what to do in this situation?
Extra Info:
Table1 contains a list of locations in 'geography' data type
Table2 contains a different list of 'geography' locations
I am writing a select statement to compare the distances between the locations. As far as I know, you can't do a JOIN on a geography column?
You can (should) use CROSS JOIN. The following query will be equivalent to yours:
SELECT
table1.columnA
, table2.columnA
FROM table1
CROSS JOIN table2
WHERE table1.columnA = 'Some value'
Or you can even use INNER JOIN with some always-true condition:
FROM table1
INNER JOIN table2 ON 1=1
A cross join will help to join multiple tables with no common fields. But be careful while joining, as this join will give the Cartesian product of the two tables.
Query:
SELECT
table1.columnA
, table2.columnA
FROM table1
CROSS JOIN table2
An alternative is to join on some condition that is always true, like:
SELECT
table1.columnA
, table2.columnA
FROM table1
INNER JOIN table2 ON 1=1
But this type of query should be avoided for performance as well as coding standards.
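To make the Cartesian-product warning concrete, here is a minimal sketch with Python's sqlite3 module and hypothetical sample rows (2 rows in table1, 3 in table2). It also checks that the legacy comma syntax and CROSS JOIN return the same rows once the WHERE filter is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (columnA TEXT);
    CREATE TABLE table2 (columnA TEXT);
    INSERT INTO table1 VALUES ('Some value'), ('Other value');
    INSERT INTO table2 VALUES ('x'), ('y'), ('z');
""")

# Without a filter, every row of table1 pairs with every row of table2.
all_pairs = conn.execute(
    "SELECT table1.columnA, table2.columnA FROM table1 CROSS JOIN table2"
).fetchall()

cross_join = conn.execute("""
    SELECT table1.columnA, table2.columnA
    FROM table1 CROSS JOIN table2
    WHERE table1.columnA = 'Some value'
    ORDER BY table2.columnA
""").fetchall()

# Legacy comma-separated FROM clause, as in the question.
legacy = conn.execute("""
    SELECT table1.columnA, table2.columnA
    FROM table1, table2
    WHERE table1.columnA = 'Some value'
    ORDER BY table2.columnA
""").fetchall()

print(len(all_pairs))  # 6  (2 rows x 3 rows)
print(cross_join)      # [('Some value', 'x'), ('Some value', 'y'), ('Some value', 'z')]
```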
A suggestion: when using a cross join, please take care of the duplicate scenarios. For example, in your case:
Table 1 may have more than one column as part of its primary key (say table1_id, id2, id3, table2_id)
Table 2 may have more than one column as part of its primary key (say table2_id, id3, id4)
Since there are common keys between these two tables (i.e. foreign keys in one or the other), we will end up with duplicate results. Hence using the following form is good:
WITH data_mined_table (col1, col2, col3 /* etc. */) AS (
SELECT DISTINCT col1, col2, col3 /* etc. */
FROM table_1 (NOLOCK), table_2 (NOLOCK))
SELECT * FROM data_mined_table WHERE data_mined_table.col1 = :my_param_value

Left join on dates all dates

I have several tables with dates that I'm trying to join to make a large table where the data is grouped by date.
I'm accomplishing this right now by LEFT JOINing to subselects generated from the tables that I need to join to (a lot of them are the same table with different where clauses and involve SUM and COUNT, so I think I have to use subselects). The problem that I'm having is that if one of the dates doesn't exist in the first table, then it doesn't show up in the result even if there are rows with that date in subsequent tables that it's joined to. I'm joining based upon DATE(datetime_column).
So it's like
SELECT a.date, col1
FROM a
LEFT JOIN (SELECT date, col2 FROM a1) a2 ON DATE(a.date)=DATE(a2.date)
LEFT JOIN (SELECT date, col3 FROM a3) a4 ON DATE(a.date)=DATE(a4.date)
Make sense? Probably not..
There are basically two ways to do so:
You can use a FULL OUTER JOIN
Full outer join
Conceptually, a full outer join combines the effect of applying both
left and right outer joins. Where records in the FULL OUTER JOINed
tables do not match, the result set will have NULL values for every
column of the table that lacks a matching row. For those records that
do match, a single row will be produced in the result set (containing
fields populated from both tables).
...
Some database systems do not support the full outer join functionality
directly, but they can emulate it through the use of an inner join and
UNION ALL selects of the "single table rows" from left and right
tables respectively. The same example can appear as follows:
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName, department.DepartmentID
FROM employee
INNER JOIN department ON employee.DepartmentID = department.DepartmentID
UNION ALL
SELECT employee.LastName, employee.DepartmentID, CAST(NULL AS VARCHAR(20)), CAST(NULL AS INTEGER)
FROM employee
WHERE NOT EXISTS (SELECT * FROM department WHERE employee.DepartmentID = department.DepartmentID)
UNION ALL
SELECT CAST(NULL AS VARCHAR(20)), CAST(NULL AS INTEGER),
department.DepartmentName, department.DepartmentID
FROM department
WHERE NOT EXISTS (SELECT * FROM employee WHERE employee.DepartmentID = department.DepartmentID)
Otherwise, you can make a master view, which contains all the distinct keys of all the tables, and LEFT JOIN it with all the tables.
select *
from (
SELECT date
FROM a
union
SELECT date
FROM a1
union
SELECT date
FROM a3
) AS d
LEFT JOIN a using (date)
LEFT JOIN a1 using (date)
LEFT JOIN a3 using (date)
Sometimes I prefer the second way to the FULL OUTER JOIN, because FULL OUTER JOIN is not supported by many RDBMSs, and because many of those that do support it do not optimize it well. Oracle's current version, for example, just treats a full outer join as the equivalent query shown in the citation, which is very costly for performance.
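A runnable sketch of the master-list approach, using Python's sqlite3 module with made-up rows for tables a, a1 and a3. The alias d and the explicit ON conditions (instead of USING) are choices made for this illustration; the column names col1, col2 and col3 are assumed from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a  (date TEXT, col1 INT);
    CREATE TABLE a1 (date TEXT, col2 INT);
    CREATE TABLE a3 (date TEXT, col3 INT);
    INSERT INTO a  VALUES ('2020-01-01', 1);
    INSERT INTO a1 VALUES ('2020-01-01', 10), ('2020-01-02', 20);
    INSERT INTO a3 VALUES ('2020-01-03', 300);
""")

# Build the list of all dates first, then LEFT JOIN every table to it,
# so a date missing from table a still appears in the result.
rows = conn.execute("""
    SELECT d.date, a.col1, a1.col2, a3.col3
    FROM (SELECT date FROM a
          UNION SELECT date FROM a1
          UNION SELECT date FROM a3) AS d
    LEFT JOIN a  ON a.date  = d.date
    LEFT JOIN a1 ON a1.date = d.date
    LEFT JOIN a3 ON a3.date = d.date
    ORDER BY d.date
""").fetchall()

for row in rows:
    print(row)
# ('2020-01-01', 1, 10, None)
# ('2020-01-02', None, 20, None)
# ('2020-01-03', None, None, 300)
```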
Try using the OUTER JOIN to fetch all the records from main table and only matching records from the sub/child table.
SELECT a.Col1, b.Col1 FROM a LEFT OUTER JOIN b ON a.Col2=b.Col2
Refer to Join (SQL) for details on joins.
You have another option for this, which is not using a join at all. You can bring the results together using unions and aggregations:
SELECT date, max(col1) as col1, max(col2) as col2, max(col3) as col3
FROM ((select date, col1, NULL as col2, NULL as col3 from a1) union all
(SELECT date, NULL, col2, NULL FROM a2) union all
(SELECT date, NULL, NULL, col3 FROM a3)
) t
group by date
Often the solution is the second one given by Alessandro (the first version is very cumbersome). One caveat. His solution pulls the dates from the data. Sometimes you want to generate the master list, perhaps from a calendar table or perhaps by generating the list of dates (the specifics for that depend entirely on the database).
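The UNION ALL plus aggregation variant can be tried the same way. This is a sketch with Python's sqlite3 module and hypothetical sample rows; MAX ignores NULLs, so each date collapses to one row holding whichever values exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a1 (date TEXT, col1 INT);
    CREATE TABLE a2 (date TEXT, col2 INT);
    CREATE TABLE a3 (date TEXT, col3 INT);
    INSERT INTO a1 VALUES ('2020-01-01', 1);
    INSERT INTO a2 VALUES ('2020-01-01', 10), ('2020-01-02', 20);
    INSERT INTO a3 VALUES ('2020-01-03', 300);
""")

# Stack the three tables into a common shape, then fold by date.
rows = conn.execute("""
    SELECT date, MAX(col1) AS col1, MAX(col2) AS col2, MAX(col3) AS col3
    FROM (
        SELECT date, col1, NULL AS col2, NULL AS col3 FROM a1
        UNION ALL
        SELECT date, NULL, col2, NULL FROM a2
        UNION ALL
        SELECT date, NULL, NULL, col3 FROM a3
    ) AS t
    GROUP BY date
    ORDER BY date
""").fetchall()

for row in rows:
    print(row)
# ('2020-01-01', 1, 10, None)
# ('2020-01-02', None, 20, None)
# ('2020-01-03', None, None, 300)
```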

Comparing two datasets SQL SSRS 2005

I have two datasets on two separate servers. They both pull one column of information each.
I would like to build a report showing the values of the rows that only appear in one of the datasets.
From what I have read, it seems I would like to do this on the SQL side, not the reporting side; I am not sure how to do that.
If someone could shed some light on how that is possible, I would really appreciate it.
You can use the NOT EXISTS clause to get the differences between the two tables.
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
This will show the rows in Table1 that don't appear in Table2. You can reverse the table names to find the rows in Table2 that don't appear in Table1.
Edit: Made an edit to show what the case would be in the event of linked servers. Also, if you wanted to see all of the rows that are not shared in both tables at the same time, you can try something as in the below.
SELECT
Column, 'Table1' TableName
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
UNION
SELECT
Column, 'Table2' TableName
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
NOT EXISTS
(
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
Table1.Column = Table2.Column
)
You can also use a left join:
select a.* from tableA a
left join tableB b
on a.PrimaryKey = b.ForeignKey
where b.ForeignKey is null
This query will return all records from tableA that do not have corresponding records in tableB.
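As a quick check that the LEFT JOIN form and the NOT EXISTS form agree, here is a sketch using Python's sqlite3 module. The label column and the sample rows are hypothetical; a single local database stands in for the two servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (PrimaryKey INT, label TEXT);
    CREATE TABLE tableB (ForeignKey INT);
    INSERT INTO tableA VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO tableB VALUES (1), (3), (3);
""")

# Anti-join: keep rows of tableA for which the join found nothing.
left_join = conn.execute("""
    SELECT a.* FROM tableA a
    LEFT JOIN tableB b ON a.PrimaryKey = b.ForeignKey
    WHERE b.ForeignKey IS NULL
""").fetchall()

# The same question asked with NOT EXISTS.
not_exists = conn.execute("""
    SELECT a.* FROM tableA a
    WHERE NOT EXISTS (
        SELECT 1 FROM tableB b WHERE b.ForeignKey = a.PrimaryKey
    )
""").fetchall()

print(left_join)   # [(2, 'two')]
print(not_exists)  # [(2, 'two')]
```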
If you want rows that appear in exactly one data set and you have a matching key on each table, then you can use a full outer join:
select *
from table1 t1 full outer join
table2 t2
on t1.key = t2.key
where t1.key is null and t2.key is not null or
t1.key is not null and t2.key is null
The where condition chooses the rows where exactly one side has a match.
The problem with this query, though, is that you get lots of columns with nulls. One way to fix this is by going through the columns one by one in the SELECT clause.
select coalesce(t1.key, t2.key) as key, . . .
Another way to solve this problem is to use a union with a window function. This version brings together all the rows and counts the number of times that key appears:
select t.*
from (select t.*, count(*) over (partition by key) as keycnt
from ((select 'Table1' as which, t.*
from table1 t
) union all
(select 'Table2' as which, t.*
from table2 t
)
) t
) t
where keycnt = 1
This has the additional column, which, specifying which table the value comes from. It also has an extra column, keycnt, with the value 1. If you have a composite key, you would just replace key with the list of columns specifying a match between the two tables.

Is inner join the same as equi-join?

Can you tell me if inner join and equi-join are the same or not ?
An 'inner join' is not the same as an 'equi-join' in general terms.
'equi-join' means joining tables using the equality operator or equivalent. I would still call an outer join an 'equi-join' if it only uses equality (others may disagree).
'inner join' is opposed to 'outer join' and determines how to join two sets when there is no matching value.
Simply put: an equi-join is one possible type of inner join.
For a more in-depth explanation:
An inner join is a join that returns only rows from joined tables where a certain condition is met. This condition may be one of equality, which means we would have an equi-join; if the condition is not one of equality (which may be inequality, greater than, less than, between, etc.), we have a non-equi-join, a kind of theta-join.
If we do not want such conditions to be necessarily met, we can have
outer joins (all rows from all tables returned), left join (all rows
from left table returned, only matching for right table), right join
(all rows from right table returned, only matching for left table).
The answer is NO.
An equi-join is used to match two columns from two tables using explicit operator =:
Example:
select *
from table1 T1, table2 T2
where T1.column_name1 = T2.column_name2
An inner join returns the rows of the cross product between two tables that satisfy the join condition, combining matching records from both tables. To get the right result you can use an equi-join or a natural join (column names between tables must be the same).
Using equi-join (explicit and implicit)
select *
from table1 T1 INNER JOIN table2 T2
on T1.column_name = T2.column_name
select *
from table1 T1, table2 T2
where T1.column_name = T2.column_name
Or Using natural join
select *
from table1 T1 NATURAL JOIN table2 T2
The answer is no. Here is the short and simple version for readers.
An inner join can have equality (=) and other operators (like <, >, <>) in the join condition.
An equi-join has only the equality (=) operator in the join condition.
An equi-join can be an inner join, a left outer join, or a right outer join.
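The three statements above can be seen side by side in a small sketch, written with Python's sqlite3 module and two hypothetical single-column tables: an inner join with equality, an inner join with a non-equality operator, and an outer join that is still an equi-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (n INT);
    CREATE TABLE t2 (n INT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# Inner join that is also an equi-join.
equi_inner = conn.execute(
    "SELECT t1.n, t2.n FROM t1 INNER JOIN t2 ON t1.n = t2.n ORDER BY t1.n"
).fetchall()

# Inner join that is NOT an equi-join: the condition uses <.
non_equi_inner = conn.execute(
    "SELECT t1.n, t2.n FROM t1 INNER JOIN t2 ON t1.n < t2.n ORDER BY t1.n, t2.n"
).fetchall()

# Equi-join that is NOT an inner join: unmatched t1 rows are kept.
equi_outer = conn.execute(
    "SELECT t1.n, t2.n FROM t1 LEFT OUTER JOIN t2 ON t1.n = t2.n ORDER BY t1.n"
).fetchall()

print(equi_inner)      # [(2, 2), (3, 3)]
print(non_equi_inner)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(equi_outer)      # [(1, None), (2, 2), (3, 3)]
```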
If a difference has to be made out, then I think here it is. I tested it with DB2.
In an 'equi join', you have to select the comparing column of the table being joined; in an inner join it is not compulsory that you do that. Example:
Select k.id,k.name FROM customer k
inner join dealer on(
k.id =dealer.id
)
Here the resulting rows have only two columns:
id name
But I think in an equi join you have to select the columns of the other table too:
Select k.id,k.name,d.id FROM customer k,dealer d
where
k.id =d.id
and this will result in rows with three columns; there is no way to avoid having the unwanted compared column of dealer here (even if you don't want it). The rows will look like
id(from customer) name(from Customer) id(from dealer)
Maybe this is not true for your question, but it might be one of the major differences.
The answer is YES, but only in terms of the result set. So here is an example.
Consider three tables:
orders(ord_no, purch_amt, ord_date, customer_id, salesman_id)
customer(customer_id,cust_name, city, grade, salesman_id)
salesman(salesman_id, name, city, commission)
Now if I have a query like this:
Find the details of an order.
Using INNER JOIN:
SELECT * FROM orders a INNER JOIN customer b ON a.customer_id=b.customer_id
INNER JOIN salesman c ON a.salesman_id=c.salesman_id;
Using EQUI JOIN:
SELECT * FROM orders a, customer b,salesman c where
a.customer_id=b.customer_id and a.salesman_id=c.salesman_id;
Execute both queries. You will get the same output.
Coming to your question: there is no difference in the output of an equi-join and an inner join. But there might be a difference in the internal execution of the two types.
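A minimal way to run both forms and compare them, sketched with Python's sqlite3 module. The tables are cut down to a few of the columns listed above and the rows are made up; order 3 refers to a customer that does not exist, so neither query returns it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ord_no INT, customer_id INT, salesman_id INT);
    CREATE TABLE customer (customer_id INT, cust_name TEXT);
    CREATE TABLE salesman (salesman_id INT, name TEXT);
    INSERT INTO orders VALUES (1, 100, 7), (2, 101, 8), (3, 999, 7);
    INSERT INTO customer VALUES (100, 'Acme'), (101, 'Globex');
    INSERT INTO salesman VALUES (7, 'Kim'), (8, 'Lee');
""")

# Explicit INNER JOIN ... ON syntax.
explicit = conn.execute("""
    SELECT a.ord_no, b.cust_name, c.name
    FROM orders a
    INNER JOIN customer b ON a.customer_id = b.customer_id
    INNER JOIN salesman c ON a.salesman_id = c.salesman_id
    ORDER BY a.ord_no
""").fetchall()

# Implicit (comma) syntax with the conditions in WHERE.
implicit = conn.execute("""
    SELECT a.ord_no, b.cust_name, c.name
    FROM orders a, customer b, salesman c
    WHERE a.customer_id = b.customer_id AND a.salesman_id = c.salesman_id
    ORDER BY a.ord_no
""").fetchall()

print(explicit)              # [(1, 'Acme', 'Kim'), (2, 'Globex', 'Lee')]
print(explicit == implicit)  # True
```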