Explanation of self-joins - sql

I don't understand the need for self-joins. Can someone please explain them to me?
A simple example would be very helpful.

You can view self-join as two identical tables. But in normalization, you cannot create two copies of the table so you just simulate having two tables with self-join.
Suppose you have two tables:
Table emp1
Id Name Boss_id
1 ABC 3
2 DEF 1
3 XYZ 2
Table emp2
Id Name Boss_id
1 ABC 3
2 DEF 1
3 XYZ 2
Now, if you want to get the name of each employee with his or her boss' names:
select c1.Name , c2.Name As Boss
from emp1 c1
inner join emp2 c2 on c1.Boss_id = c2.Id
Which will output the following table:
Name Boss
ABC XYZ
DEF ABC
XYZ DEF

It's quite common when you have a table that references itself. Example: an employee table where every employee can have a manager, and you want to list all employees and the name of their manager.
SELECT e.name, m.name
FROM employees e LEFT OUTER JOIN employees m
ON e.manager = m.id

A self join is a join of a table with itself.
A common use case is when the table stores entities (records) which have a hierarchical relationship between them. For example a table containing person information (Name, DOB, Address...) and including a column where the ID of the Father (and/or of the mother) is included. Then with a small query like
SELECT Child.ID, Child.Name, Child.PhoneNumber, Father.Name, Father.PhoneNumber
FROM myTableOfPersons As Child
LEFT OUTER JOIN myTableOfPersons As Father ON Child.FatherId = Father.ID
WHERE Child.City = 'Chicago' -- Or some other condition or none
we can get info about both child and father (and mother, with a second self join etc. and even grand parents etc...) in the same query.

Let's say you have a table users, set up like so:
user ID
user name
user's manager's ID
In this situation, if you wanted to pull out both the user's information and the manager's information in one query, you might do this:
SELECT users.user_id, users.user_name, managers.user_id AS manager_id, managers.user_name AS manager_name INNER JOIN users AS manager ON users.manager_id=manager.user_id

Imagine a table called Employee as described below. All employees have a manager which is also an employee (maybe except for the CEO, whose manager_id would be null)
Table (Employee):
int id,
varchar name,
int manager_id
You could then use the following select to find all employees and their managers:
select e1.name, e2.name as ManagerName
from Employee e1, Employee e2 where
where e1.manager_id = e2.id

They are useful if your table is self-referential. For example, for a table of pages, each page may have a next and previous link. These would be the IDs of other pages in the same table. If at some point you want to get a triple of successive pages, you'd do two self-joins on the next and previous columns with the same table's id column.

Without the ability for a table to reference itself, we'd have to create as many tables for hierarchy levels as the number of layers in the hierarchy. But since that functionality is available, you join the table to itself and sql treats it as two separate tables, so everything is stored nicely in one place.

Apart from the answers mentioned above (which are very well explained), I would like to add one example so that the use of Self Join can be easily shown.
Suppose you have a table named CUSTOMERS which has the following attributes:
CustomerID, CustomerName, ContactName, City, Country.
Now you want to list all those who are from the "same city" .
You will have to think of a replica of this table so that we can join them on the basis of CITY. The query below will clearly show what it means:
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2,
A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;

There are many correct answers here, but there is a variation that is equally correct. You can place your join conditions in the join statement instead of the WHERE clause.
SELECT e1.emp_id AS 'Emp_ID'
, e1.emp_name AS 'Emp_Name'
, e2.emp_id AS 'Manager_ID'
, e2.emp_name AS 'Manager_Name'
FROM Employee e1 RIGHT JOIN Employee e2 ON e1.emp_id = e2.emp_id
Keep in mind sometimes you want e1.manager_id > e2.id
The advantage to knowing both scenarios is sometimes you have a ton of WHERE or JOIN conditions and you want to place your self join conditions in the other clause to keep your code readable.
No one addressed what happens when an Employee does not have a manager. Huh? They are not included in the result set. What if you want to include employees that do not have managers but you don't want incorrect combinations returned?
Try this puppy;
SELECT e1.emp_id AS 'Emp_ID'
, e1.emp_name AS 'Emp_Name'
, e2.emp_id AS 'Manager_ID'
, e2.emp_name AS 'Manager_Name'
FROM Employee e1 LEFT JOIN Employee e2
ON e1.emp_id = e2.emp_id
AND e1.emp_name = e2.emp_name
AND e1.every_other_matching_column = e2.every_other_matching_column

Self-join is useful when you have to evaluate the data of the table with itself. Which means it'll correlate the rows from the same table.
Syntax: SELECT * FROM TABLE t1, TABLE t2 WHERE t1.columnName = t2.columnName
For example, we want to find the names of the employees whose Initial Designation equals to current designation. We can solve this using self join in following way.
SELECT NAME FROM Employee e1, Employee e2 WHERE e1.intialDesignationId = e2.currentDesignationId

One use case is checking for duplicate records in a database.
SELECT A.Id FROM My_Bookings A, My_Bookings B
WHERE A.Name = B.Name
AND A.Date = B.Date
AND A.Id != B.Id

SELF JOIN:
Joining a table by itself is called as Self Join.
We can perform operations on a single table.
When we use self join we should create alias names on a table otherwise we cannot implement self join.
When we create alias name on a table internally system is preparing virtual table on each alias name of a table.
We can create any number of alias names on a table but each alias name should be different.
Basic Rules of self join:
CASE-I: Comparing a single column values by itself with in the table
CASE-II: Comparing two different columns values to each other with in the table.
Example:
SELECT * from TEST;
ENAME
RICHARD
JOHN
MATHEW
BENNY
LOC
HYDRABAD
MUMBAI
HYDRABAD
CHENNAI
SELECT T1. ENAME, T1. LOC FROM TEST.T1, TEST T2 WHERE T1.LOC=T2.LOC AND T2.ENAME='RICHARD';

It's the database equivalent of a linked list/tree, where a row contains a reference in some capacity to another row.

Here is the exaplanation of self join in layman terms. Self join is not a different type of join. If you have understood other types of joins (Inner, Outer, and Cross Joins), then self join should be straight forward. In INNER, OUTER and CROSS JOINS, you join 2 or more different tables. However, in self join you join the same table with itslef. Here, we don't have 2 different tables, but treat the same table as a different table using table aliases. If this is still not clear, I would recomend to watch the following youtube videos.
Self Join with an example

Related

Join a table in SQL based off two columns?

I have two tables:
Employees (columns: ID, Name)
and
employee partners (EmployeeID1, EmployeeID2, Time)
I want to output EmployeName1, EmployeeName2, Time instead of imployee ids.
(In other words, replace the ids with names, but in two columns at a time)
How would I do this? Would JOIN be the appropriate command?
you need to join the employee table 2 times as the employee partners table acts as many to many connection.
The select should be:
SELECT emp1.name, emp2.name, em.time
FROM Employees emp1
JOIN employee_partners em ON emp1.id = EmployeeID1
JOIN Employees emp2 on emp2.id = EmployeeID2
Often in these situations, you want to use LEFT JOIN:
SELECT e1.name as name1, e2.name as name2, em.time
FROM employee_partners ep LEFT JOIN
Employees e1
ON e1.id = ep.EmployeeID1 LEFT JOIN
Employees e2
ON e2.id = ep.EmployeeID2;
Notes:
The LEFT JOINs ensure that you do not lose rows if either of the employee columns is NULL.
Use tables aliases; they make the query easier to write and to read.
Qualify all columns names; that is, include the table name so you know where the column is coming from.
I also added column aliases so you can distinguish between the names.

sql server - how to modify values in a query statement?

I have a statement like this:
select lastname,firstname,email,floorid
from employee
where locationid=1
and (statusid=1 or statusid=3)
order by floorid,lastname,firstname,email
The problem is the column floorid. The result of this query is showing the id of the floors.
There is this table called floor (has like 30 rows), which has columns id and floornumber. The floorid (in above statement) values match the id of the table floor.
I want the above query to switch the floorid values into the associated values of the floornumber column in the floor table.
Can anyone show me how to do this please?
I am using Microsoft sql server 2008 r2.
I am new to sql and I need a clear and understandable method if possible.
select lastname,
firstname,
email,
floor.floornumber
from employee
inner join floor on floor.id = employee.floorid
where locationid = 1
and (statusid = 1 or statusid = 3)
order by floorid, lastname, firstname, email
You have to do a simple join where you check, if the floorid matches the id of your floor table. Then you use the floornumber of the table floor.
select a.lastname,a.firstname,a.email,b.floornumber
from employee a
join floor b on a.floorid = b.id
where a.locationid=1 and (a.statusid=1 or a.statusid=3)
order by a.floorid,a.lastname,a.firstname,a.email
You need to use a join.
This will join the two tables on a certain field.
This way you can SELECTcolumns from more than one table at the time.
When you join two tables you have to specify on which column you want to join them.
In your example, you'd have to do this:
from employee join floor on employee.floorid = floor.id
Since you are new to SQL you must know a few things. With the other enaswers you have on this question, people use aliases instead of repeating the table name.
from employee a join floor b
means that from now on the table employee will be known as a and the table floor as b. This is really usefull when you have a lot of joins to do.
Now let's say both table have a column name. In your select you have to say from which table you want to pick the column name. If you only write this
SELECT name from Employee a join floor b on a.id = b.id
the compiler won't understand from which table you want to get the column name. You would have to specify it like this :
SELECT Employee.name from Employee a join floor b on a.id = b.id or if you prefer with aliases :
SELECT a.name from Employee a join floor b on a.id = b.id
Finally there are many type of joins.
Inner join ( what you are using because simply typing Join will refer to an inner join.
Left outer join
Right outer join
Self join
...
To should refer to this article about joins to know how to use them correctly.
Hope this helps.

Three tables join given me the all combination of records

When i written the query like the following.. It's written the combination of all the records.
What's the mistake in the query?
SELECT ven.vendor_code, add.address1
FROM vendor ven INNER JOIN employee emp
ON ven.emp_fk = emp.id
INNER JOIN address add
ON add.emp_name = emp.emp_name;
Using inner join, you've to put all the links (relations) between two tables in the ON clause.
Assuming the relations are good, you may test the following queries to see if they really make the combination of all records:
SELECT count(*)
from vendor ven
inner join employee emp on ven.emp_fk = emp.id
inner join address add on add.emp_name = emp.emp_name;
SELECT count(*)
add.address1
from vendor ven, employee emp, address add
If both queries return the same result (which I doubt), you really have what you say.
If not, as I assume, maybe you are missing a relation or a restriction to filter the number of results.

How to select records with no matches in the foreign table (Left outer join)

I have one table that holds my ressources:
Ressource | Ressource-ID
And a table that holds the associations
Ressource-ID | Employee-ID
How to select the ressources of an Employee that are available, i.e. not in the association table?
I've tried this, but it's not working:
select r.ress, r.ress_id
FROM Ressource r
LEFT outer JOIN Ressource_Employee_Association a ON r.ress_id = a.ress_id
WHERE a.emp_id = 'ID00163efea66b' and a.ress_id IS NULL
Any ideas?
Thanks
Thomas
After writing my above comments, and looking at the proposed solutions: I think I've got some more understanding of what you are trying to do.
Assuming you have unlimited quantity of resources in your resources table, you want to select the un-assigned resources per employee (based on their non-existence for any specific employee in the resource association table).
In order to accomplish this (and get a comprehensive list of employees) you'll need a 3rd table, in order to reference the complete list of employees. You'll also need to CROSS JOIN all of the resources onto the list of employees (assuming every employee has access to every resource), then you LEFT JOIN (LEFT OUTER JOIN whatever) your association list onto the query where the resource_id and employee_id match the resource_id in the resources table, and the employee_id in the employees table (respectively). THEN you add your where clause that filters out all the records that assign an employee to a resource. This leaves you with the resources that are available to the employee, which they also do not have signed out. This is convoluted to say, so hopefully the query sheds more light:
SELECT e.employee_id, e.employee, r.res_id, r.res
FROM employees e
CROSS JOIN resources r
LEFT JOIN assigned_resources ar
ON ar.employee_id = e.employee_id AND r.res_id = ar.res_id
WHERE ar.res_id IS NULL
If you don't have an employees table, you can accomplish the same by using the assigned resources table, but you will be limited to selecting employees who already have some resources allocated. You'll need to add a GROUP BY query because of the possible existence of multiple employee definitions in the association table. Here's what that query would look like:
SELECT e.employee_id, r.res_id, r.res
FROM assigned_resources e
CROSS JOIN resources r
LEFT JOIN assigned_resources ar
ON ar.employee_id = e.employee_id AND r.res_id = ar.res_id
WHERE ar.res_id IS NULL
GROUP BY e.employee_id, r.res_id
Does this work?
select r.ress, r.ress_id
from resource r
where not exists
(
select 1 from ressource_emplyee_association a
where a.emp_id = '...' and a.ress_id = r.ress_id
)
EDIT
Before that I had the following, but changed it according to the comments below:
select r.ress, r.ress_id
from resource r
where not exists
(
select top 1 1 from ressource_emplyee_association a
where a.emp_id = '...' and a.ress_id = r.ress_id
)
The WHERE clause is applied after the LEFT JOIN. This means that you are currently trying to get results where there is NO matching record in Ressource_Employee_Association, but where the emp_id equals 'ID00163efea66b'.
But if there is no matching record, how can emp_id be anything other than NULL?
One option is to move part of the WHERE clause into the join...
SELECT
r.ress, r.ress_id
FROM
Ressource r
LEFT OUTER JOIN
Ressource_Employee_Association a
ON r.ress_id = a.ress_id
AND a.emp_id = 'ID00163efea66b'
WHERE
a.ress_id IS NULL
This will list all resources that are not associated to employee 'ID00163efea66b'.
EDIT
Your comment implies that what you want is...
- A view listing all employees
- For each employee list each resource that they DON'T have
This requires an extra table listing all of your employees.
SELECT
*
FROM
Employee
CROSS JOIN
Ressource
WHERE
NOT EXISTS (SELECT * FROM Ressource_Employee_Association
WHERE emp_id = Employee.id
AND ress_id = Ressource.id)
SELECT *
FROM Ressource
WHERE ress_id IN (
SELECT ress_id,
FROM Ressource
MINUS
SELECT ress_id
FROM Ressource_Employee_Association
WHERE emp_id = 'ID00163efea66b'
);

What is SELF JOIN and when would you use it? [duplicate]

This question already has answers here:
Explanation of self-joins
(14 answers)
Closed 3 years ago.
What is self join and when would you use it? I don't understand self joins so a layman explanation with an example would be great.
You use a self join when a table references data in itself.
E.g., an Employee table may have a SupervisorID column that points to the employee that is the boss of the current employee.
To query the data and get information for both people in one row, you could self join like this:
select e1.EmployeeID,
e1.FirstName,
e1.LastName,
e1.SupervisorID,
e2.FirstName as SupervisorFirstName,
e2.LastName as SupervisorLastName
from Employee e1
left outer join Employee e2 on e1.SupervisorID = e2.EmployeeID
Well, one classic example is where you wanted to get a list of employees and their immediate managers:
select e.employee as employee, b.employee as boss
from emptable e, emptable b
where e.manager_id = b.empolyee_id
order by 1
It's basically used where there is any relationship between rows stored in the same table.
employees.
multi-level marketing.
machine parts.
And so on...
A self join is simply when you join a table with itself. There is no SELF JOIN keyword, you just write an ordinary join where both tables involved in the join are the same table. One thing to notice is that when you are self joining it is necessary to use an alias for the table otherwise the table name would be ambiguous.
It is useful when you want to correlate pairs of rows from the same table, for example a parent - child relationship. The following query returns the names of all immediate subcategories of the category 'Kitchen'.
SELECT T2.name
FROM category T1
JOIN category T2
ON T2.parent = T1.id
WHERE T1.name = 'Kitchen'
SQL self-join simply is a normal join which is used to join a table to itself.
Example:
Select *
FROM Table t1, Table t2
WHERE t1.Id = t2.ID
You'd use a self-join on a table that "refers" to itself - e.g. a table of employees where managerid is a foreign-key to employeeid on that same table.
Example:
SELECT E.name, ME.name AS manager
FROM dbo.Employees E
LEFT JOIN dbo.Employees ME
ON ME.employeeid = E.managerid