How does JOIN work exactly in SQL - sql

I know that joins work by combining two or more tables by their attributes, so if you have two tables that both have three columns and both have column INDEX, if you use table1 JOIN table2 you will get a new table with 5 columns, but what if you do not have a column that is shared by both table1 and table2? Can you still use JOIN or do you have to use TIMES?

Join is not a method for combining tables. It is a method to select records (and selected fields) from 2 or more tables where every table in the query must carry a field that can be matched to a field in another table in the query. The matched fields need not have the same name, but must carry the same type of data. Lacking this would be like trying to create meaning from joining a list of license plates of cars in NYC, with height data from lumberjacks in Washington state -- not meaningful.
Ex:)
Select h.name, h.home_address, h.home_phone, w.work_address,
w.department
from home h, work w
where h.employee_id = w.emp_id
As long as both columns: employee_id and emp_id carry the same information this query will work

In Microsoft Access, to get five rows from a three column table joined to a two column table, you'd use:
SELECT Table1.*, Table2.* FROM Table1 INNER JOIN Table2 ON Table1.Field1 = Table2.Field1;
You can query whatever you want, and join whatever you want, though.
If your one table is a list of people, and your other is a list of cars, and you want to see what people have names that are also models of cars, you can do:
SELECT Table1.Name, Table1.Age, Table2.Make, Table2.Year
FROM Table1 INNER JOIN Table2 ON Table1.Name = Table2.Model;
Only when Name is the same as Model will it show a record.
This is the same idea for joining tables in any relational DBMS I've used.

You are right you can join two tables even if they do not have shared column.
Join uses primary to prevent mistakes on inserting or deleting when user trying to insert record that does not has a parent one or some thing like this.
join methods has many types you can view them here:
http://dev.mysql.com/doc/refman/5.7/en/join.html
LEFT JOIN: select all records from first table, then selecting all records from second table that fulfilling the condition after ON clause.

you can't join the tables if they do not share a common column. If you can find a 3rd table that has common columns with table1 and table2 you can get them to join that way. so join table2 and tabl3 on a common column and than join table3 back to table1 on a common column.

Related

Best way to combine two tables, remove duplicates, but keep all other non-duplicate values in SQL

I am looking for the best way to combine two tables in a way that will remove duplicate records based on email with a priority of replacing any duplicates with the values in "Table 2", I have considered full outer join and UNION ALL but Union all will be too large as each table has several 1000 columns. I want to create this combination table as my full reference table and save as a view so I can reference it without always adding a union or something to that effect in my already complex statements. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A left-join will not work as both tables have unique records that I want to retain and I would like all 1000+ columns to be retained from each table
I don't know how feasible this even is but thank you so much for any answers!
If I understand your question correctly you want to join two large tables with thousands of columns that (hopefully) are the same between the two tables using the email column as the join condition and replacing duplicate records between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1
If the columns/fields aren't the same between tables you can use a full outer join on only_in_table_1 and table_2
try using a FULL OUTER JOIN between the two tables and then a COALESCE function on each resultset column to determine from which table/column the resultset column is populated

How to drop one join key when joining two tables

I have two tables. Both have lot of columns. Now I have a common column called ID on which I would join.
Now since this variable ID is present in both the tables if I do simply this
select a.*,b.*
from table_a as a
left join table_b as b on a.id=b.id
This will give an error as id is duplicate (present in both the tables and getting included for both).
I don't want to write down separately each column of b in the select statement. I have lots of columns and that is a pain. Can I rename the ID column of b in the join statement itself similar to SAS data merge statements?
I am using Postgres.
Postgres would not give you an error for duplicate output column names, but some clients do. (Duplicate names are also not very useful.)
Either way, use the USING clause as join condition to fold the two join columns into one:
SELECT *
FROM tbl_a a
LEFT JOIN tbl_b b USING (id);
While you join the same table (self-join) there will be more duplicate column names. The query would make hardly any sense to begin with. This starts to make sense for different tables. Like you stated in your question to begin with: I have two tables ...
To avoid all duplicate column names, you have to list them in the SELECT clause explicitly - possibly dealing out column aliases to get both instances with different names.
Or you can use a NATURAL join - if that fits your unexplained use case:
SELECT *
FROM tbl_a a
NATURAL LEFT JOIN tbl_b b;
This joins on all columns that share the same name and folds those automatically - exactly the same as listing all common column names in a USING clause. You need to be aware of rules for possible NULL values ...
Details in the manual.

SQL Select - Fetching two different values from another table based on two different IDs

I have two tables
Table 1 has five columns
EmployeeID,
EmployeeCarModelID,
EmployeeCarModelName,
SpouseCarModelID,
SpouseCarModelName
Table 2 has two columns
CarModelID,
CarModelName
How would I construct a select statement which will pull through the CarModelName to both the EmployeeCarModelName and the SpouseCarModelName based on their respective IDs? I'm not sure I can use a JOIN statement to do this as we are looking at two different id columns within the same table.
You need two joins to do this. I think you want:
select t1.EmployeeId, t1.EmployeeCarModelID, t2emp.CarModelName as EmployeeCarModelName,
t1.SpouseCarModelID, t2sp.CarModelName as SpouseCarModelName
from table1 t1 left join
table2 t2emp
on t1.EmployeeCarModelID = t2emp.CarModelId left join
table2 t2sp
on t1.SpouseCarModelId = t2sp.CarModelId;

SQL Join for cell content, not column name

I read up on SQL Join but as far as I understand it, you can only join tables which have a column name in common.
I have information in two different tables, but the column name is different in each. I need to pull information on something which is only in one of the tables, but also need information from the other. So was looking to join/merge them.
Here is what I mean..
TABLE1:
http://postimg.org/image/hnd63c2f5/
The cell content 18599 in column from_pin_id also pertains to content in another table:
TABLE2:
http://postimg.org/image/apmu26l5z/
My question is how do I merge the two table details so that it recognizes 18599 is referring to the same thing, so that I can pull content on it from other columns in TABLE2?
I've looked through the codes on W3 but cannot find anything to what I need, as mentioned above, it seems to be just for joining tables with a common column:
SELECT column_name(s)
FROM table1
JOIN table2
ON table1.column_name=table2.column_name;
You can write as :
select * from table1
where from_pin_id in
(
select from_pin_id
from table1
intersect
select id
from table2
)
Intersect operator selects all elements that belong to both of the sets.
Change the table names and the columns that you select as needed.
SELECT table1.id, table1.owner_user_id, table1.from_pin_id, table2.board_id
FROM table1
JOIN table2 ON table1.from_pin_id = table2.id
GROUP BY id, owner_user_id, from_pin_id, board_id

Difference between "and" and "where" in joins

Whats the difference between
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
and table2.Id IN (2728)
and
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
where table2.Id IN (2728)
both return the same result and both have the same explain output
Firstly there is a semantic difference. When you have a join, you are saying that the relationship between the two tables is defined by that condition. So in your first example you are saying that the tables are related by cd.Company = table2.Name AND table2.Id IN (2728). When you use the WHERE clause, you are saying that the relationship is defined by cd.Company = table2.Name and that you only want the rows where the condition table2.Id IN (2728) applies. Even though these give the same answer, it means very different things to a programmer reading your code.
In this case, the WHERE clause is almost certainly what you mean so you should use it.
Secondly there is actually difference in the result in the case that you use a LEFT JOIN instead of an INNER JOIN. If you include the second condition as part of the join, you will still get a result row if the condition fails - you will get values from the left table and nulls for the right table. If you include the condition as part of the WHERE clause and that condition fails, you won't get the row at all.
Here is an example to demonstrate this.
Query 1 (WHERE):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
WHERE table2.Id IN (2728);
Result:
field1
200
Query 2 (AND):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
AND table2.Id IN (2728);
Result:
field1
100
200
Test data used:
CREATE TABLE table1 (Company NVARCHAR(100) NOT NULL, Field1 INT NOT NULL);
INSERT INTO table1 (Company, Field1) VALUES
('FooSoft', 100),
('BarSoft', 200);
CREATE TABLE table2 (Id INT NOT NULL, Name NVARCHAR(100) NOT NULL);
INSERT INTO table2 (Id, Name) VALUES
(2727, 'FooSoft'),
(2728, 'BarSoft');
SQL comes from relational algebra.
One way to look at the difference is that JOINs are operations on sets that can produce more records or less records in the result than you had in the original tables. On the other side WHERE will always restrict the number of results.
The rest of the text is extra explanation.
For overview of join types see article again.
When I said that the where condition will always restrict the results, you have to take into account that when we are talking about queries on two (or more) tables you have to somehow pair records from these tables even if there is no JOIN keyword.
So in SQL if the tables are simply separated by a comma, you are actually using a CROSS JOIN (cartesian product) which returns every row from one table for each row in the other.
And since this is a maximum number of combinations of rows from two tables then the results of any WHERE on cross joined tables can be expressed as a JOIN operation.
But hold, there are exceptions to this maximum when you introduce LEFT, RIGHT and FULL OUTER joins.
LEFT JOIN will join records from the left table on a given criteria with records from the right table, BUT if the join criteria, looking at a row from the left table is not satisfied for any records in the right table the LEFT JOIN will still return a record from the left table and in the columns that would come from the right table it will return NULLs (RIGHT JOIN works similarly but from the other side, FULL OUTER works like both at the same time).
Since the default cross join does NOT return those records you can not express these join criteria with WHERE condition and you are forced to use JOIN syntax (oracle was an exception to this with an extension to SQL standard and to = operator, but this was not accepted by other vendors nor the standard).
Also, joins usually, but not always, coincide with existing referential integrity and suggest relationships between entities, but I would not put as much weight into that since the where conditions can do the same (except in the before mentioned case) and to a good RDBMS it will not make a difference where you specify your criteria.
The join is used to reflect the entity relations
the where clause filters down results.
So the join clauses are 'static' (unless the entity relations change), while the where clauses are use-case specific.
There is no difference. "ON" is like a synonym for "WHERE", so t he second kind of reads like:
JOIN table2 WHERE cd.Company = table2.Name AND table2.Id IN (2728)
There is no difference when the query optimisation engine breaks it down to its relevant query operators.