what is difference between self join and inner join
I find it helpful to think of all of the tables in a SELECT statement as representing their own data sets.
Before you've applied any conditions you can think of each data set as being complete (the entire table, for instance).
A join is just one of several ways to begin refining those data sets to find the information that you really want.
Though a database schema may be designed with certain relationships in mind (Primary Key <-> Foreign Key) these relationships really only exist in the context of a particular query. The query writer can relate whatever they want to whatever they want. I'll give an example of this later...
An INNER JOIN relates two tables to each other. There are often multiple JOIN operations in one query to chain together multiple tables. It can get as complicated as it needs to. For a simple example, consider the following three tables...
STUDENT
| STUDENTID | LASTNAME | FIRSTNAME |
------------------------------------
1 | Smith | John
2 | Patel | Sanjay
3 | Lee | Kevin
4 | Jackson | Steven
ENROLLMENT
| ENROLLMENT ID | STUDENTID | CLASSID |
---------------------------------------
1 | 2 | 3
2 | 3 | 1
3 | 4 | 2
CLASS
| CLASSID | COURSE | PROFESSOR |
--------------------------------
1 | CS 101 | Smith
2 | CS 201 | Ghandi
3 | CS 301 | McDavid
4 | CS 401 | Martinez
The STUDENT table and the CLASS table were designed to relate to each other through the ENROLLMENT table. This kind of table is called a Junction Table.
To write a query to display all students and the classes in which they are enrolled one would use two inner joins...
SELECT stud.LASTNAME, stud.FIRSTNAME, class.COURSE, class.PROFESSOR
FROM STUDENT stud
INNER JOIN ENROLLMENT enr
ON stud.STUDENTID = enr.STUDENTID
INNER JOIN CLASS class
ON class.CLASSID = enr.CLASSID;
Read the above closely and you should see what is happening. What you will get in return is the following data set...
| LASTNAME | FIRSTNAME | COURSE | PROFESSOR |
---------------------------------------------
Patel | Sanjay | CS 301 | McDavid
Lee | Kevin | CS 101 | Smith
Jackson | Steven | CS 201 | Ghandi
Using the JOIN clauses we've limited the data sets of all three tables to only those that match each other. The "matches" are defined using the ON clauses. Note that if you ran this query you would not see the CLASSID 4 row from the CLASS table or the STUDENTID 1 row from the STUDENT table because those IDs don't exist in the matches (in this case the ENROLLMENT table). Look into "LEFT"/"RIGHT"/"FULL OUTER" JOINs for more reading on how to make that work a little differently.
Please note, per my comments on "relationships" earlier, there is no reason why you couldn't run a query relating the STUDENT table and the CLASS table directly on the LASTNAME and PROFESSOR columns. Those two columns match in data type and, well look at that! They even have a value in common! This would probably be a weird data set to get in return. My point is it can be done and you never know what needs you might have in the future for interesting connections in your data. Understand the design of the database but don't think of "relationships" as being rules that can't be ignored.
In the meantime... SELF JOINS!
Consider the following table...
PERSON
| PERSONID | FAMILYID | NAME |
--------------------------------
1 | 1 | John
2 | 1 | Brynn
3 | 2 | Arpan
4 | 2 | Steve
5 | 2 | Tim
6 | 3 | Becca
If you felt so inclined as to make a database of all the people you know and which ones are in the same family this might be what it looks like.
If you wanted to return one person, PERSONID 4, for instance, you would write...
SELECT * FROM PERSON WHERE PERSONID = 4;
You would learn that he is in the family with FAMILYID 2. Then to find all of the PERSONs in his family you would write...
SELECT * FROM PERSON WHERE FAMILYID = 2;
Done and done! SQL, of course, can accomplish this in one query using, you guessed it, a SELF JOIN.
What really triggers the need for a SELF JOIN here is that the table contains a unique column (PERSONID) and a column that serves as sort of a "Category" (FAMILYID). This concept is called Cardinality and in this case represents a one to many or 1:M relationship. There is only one of each PERSON but there are many PERSONs in a FAMILY.
So, what we want to return is all of the members of a family if one member of the family's PERSONID is known...
SELECT fam.*
FROM PERSON per
JOIN PERSON fam
ON per.FamilyID = fam.FamilyID
WHERE per.PERSONID = 4;
Here's what you would get...
| PERSONID | FAMILYID | NAME |
--------------------------------
3 | 2 | Arpan
4 | 2 | Steve
5 | 2 | Tim
Let's note a couple of things. The words SELF JOIN don't occur anywhere. That's because a SELF JOIN is just a concept. The word JOIN in the query above could have been a LEFT JOIN instead and different things would have happened. The point of a SELF JOIN is that you are using the same table twice.
Consider my soapbox from before on data sets. Here we have started with the data set from the PERSON table twice. Neither instance of the data set affects the other one unless we say it does.
Let's start at the bottom of the query. The per data set is being limited to only those rows where PERSONID = 4. Knowing the table we know that will return exactly one row. The FAMILYID column in that row has a value of 2.
In the ON clause we are limiting the fam data set (which at this point is still the entire PERSON table) to only those rows where the value of FAMILYID matches one or more of the FAMILYIDs of the per data set. As we discussed we know the per data set only has one row, therefore one FAMILYID value. Therefore the fam data set now contains only rows where FAMILYID = 2.
Finally, at the top of the query we are SELECTing all of the rows in the fam data set.
Voila! Two queries in one.
In conclusion, an INNER JOIN is one of several kinds of JOIN operations. I would strongly suggest reading further into LEFT, RIGHT and FULL OUTER JOINs (which are, collectively, called OUTER JOINs). I personally missed a job opportunity for having a weak knowledge of OUTER JOINs once and won't let it happen again!
A SELF JOIN is simply any JOIN operation where you are relating a table to itself. The way you choose to JOIN that table to itself can use an INNER JOIN or an OUTER JOIN. Note that with a SELF JOIN, so as not to confuse your SQL engine you must use table aliases (fam and per from above. Make up whatever makes sense for your query) or there is no way to differentiate the different versions of the same table.
Now that you understand the difference open your mind nice and wide and realize that one single query could contain all different kinds of JOINs at once. It's just a matter of what data you want and how you have to twist and bend your query to get it. If you find yourself running one query and taking the result of that query and using it as the input of another query then you can probably use a JOIN to make it one query instead.
To play around with SQL try visiting W3Schools.com There is a locally stored database there with a bunch of tables that are designed to relate to each other in various ways and it's filled with data! You can CREATE, DROP, INSERT, UPDATE and SELECT all you want and return the database back to its default at any time. Try all sorts of SQL out to experiment with different tricks. I've learned a lot there, myself.
Sorry if this was a little wordy but I personally struggled with the concept of JOINs when I was starting to learn SQL and explaining a concept by using a bunch of other complex concepts bogged me down. Best to start at the bottom sometimes.
I hope it helps. If you can put JOINs in your back pocket you can work magic with SQL!
Happy querying!
A self join joins a table to itself. The employee table might be joined to itself in order to show the manager name and the employee name in the same row.
An inner join joins any two tables and returns rows where the key exists in both tables. A self join can be an inner join (most joins are inner joins and most self joins are inner joins). An inner join can be a self join but most inner joins involve joining two different tables (generally a parent table and a child table).
An inner join (sometimes called a simple join) is a join of two or more tables that returns only those rows that satisfy the join condition.
A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition. To perform a self join, Oracle Database combines and returns rows of the table that satisfy the join condition.
Related
Imagine I have four tables:
Agents
| agent_id | agent_name |
Teams
| team_id | team_name |agent_id |
Menu
| menu_id | menu_name |
Team_assignment
| menu_id | team_id|
I need to write a query that selects all agents that are assigned to all teams and all queues and disregard the ones that are not assigned to a queue. Note that every agent is always assigned to a team but it's not necessary that the agent is assigned to a queue.
Since you stated that this is for a school project, I'll try to stay within the guidelines mentioned here: How do I ask and answer homework questions?
From what I can make up from your question you basically want to select all the data from the different tables joining them on one of the columns in the first table a equals = a column from the second table b. Most commonly where the primary key from one table equals the foreign key from another table. Then you want to add conditions to your query where for example some column from table 1 equals = some value.
Do you catch my drift? 😏
No?
You want to SELECT a.*, b.* everything FROM table Agents a JOINing table Teams b ON column a.agent_id being equal to = column b.agent_id
You probably want to JOIN another table, lets say Team_assignment c ON column c.team_id being equal to = b.team_id.
You can JOIN more tables in the same way.
Sadly, I do not understand what you mean by the ones that are not assigned to a queue but it sounds like a condition that your query needs to match, so WHERE the potential column a.is_assigned_to_queue equals = true AND for example a.agent_name IS NOT NULL
If you got this far you should have been able to catch onto my drift 😎, congrats. This way hopefully you also got a better understanding of how building query works, instead of me just blatantly giving you the answer and you learn nothing from it. Like this:
SELECT a.*, b.*, c.*, d.* FROM Agents a
JOIN Teams b ON a.agent_id = b.agent_id
JOIN Team_assignment c ON c.team_id = b.team_id
JOIN Menu d ON d.menu_id = c.menu_id
WHERE a.is_assigned_to_queue = true
AND a.agent_name IS NOT NULL;
Now it is possible copy and pasting the snippet above will not work, that is because I'm not an SQL expert and I had to refresh my old memories about SQL myself by googling it. But that's the nice part of actually learning it. Being able to explain it to someone else :)
I have the following SQL:
SELECT j.AssocJobKey
, COUNT(DISTINCT o.ID) AS SubjectsOrdered
, COUNT(DISTINCT s.ID) AS SubjectsShot
FROM Jobs j
LEFT JOIN Orders o ON o.AssocJobKey = j.AssocJobKey
LEFT JOIN Subjects s ON j.AssocJobKey = s.AssocJobKey
GROUP BY
j.AssocJobKey
,j.JobYear
The basic structure is a Job is the parent that is unique by AssocJobKey and has a one to many relationships with Subjects and Orders.
The query gives me what I want, the output looks like this:
| AssocJobKey | SubjectsOrdered | SubjectsShot |
|-----------------------|------------------------|---------------------|
| BAT-H181 | 107 | 830 |
|--------------------- |------------------------|---------------------|
| BAT-H131 | 226 | 1287 |
The problem is the query is way to heavy and my memory is spiking, there's no way I could run this on a large dataset. If I remove one of the LEFT JOINs on the corresponding count the query executes instantly and theres no problem. So somehow things are bouncing around between the two left joins more than they should, but I don't understand why they would.
Really hoping to avoid joining on sub selects if at all possible.
Your query is generating a Cartesian product for each job. And this is big -- your second row has about 500k rows being generated. COUNT(DISTINCT) then has to figure out the unique ids among this Cartesian product.
The solution is simple: pre-aggregate:
SELECT j.AssocJobKey, o.SubjectsOrdered, s.SubjectsShot
FROM Jobs j LEFT JOIN
(SELECT o.AssocJobKey, COUNT(*) as SubjectsOrdered
FROM Orders o
GROUP BY o.AssocJobKey
) o
ON o.AssocJobKey = j.AssocJobKey LEFT JOIN
(SELECT j.AssocJobKey, COUNT(s.ID) AS SubjectsShot
FROM Subjects s
GROUP BY j.AssocJobKey
) s
ON j.AssocJobKey = s.AssocJobKey;
This makes certain assumptions that I think are reasonable:
The ids in the orders and subjects table are unique and non-NULL.
jobs.AssocJobKey is unique.
The query can be easily adapted if either of these are not true, but they seem like reasonable assumptions.
Often for these types of joins over different dimensions, COUNT(DISTINCT) is a reasonable solution (the queries are certainly simpler). This is true when there are at most a handful of values.
I have a database in MS Access, and I ran into a problem with empty values. I have 3 tables that are connected to eachother. Lets say Table1 contains people, Table 2 contains Phone numbers, and Table 3 connects table 1 and 2, having both their ID's so I could later see what person has what numbers by using the IDs.
What I want from access is that it would display a person even if he/she doesn't have a number assigned, and also a number when there are no people assigned to it.
Something like this:
Persons_name |Phone_number
--------------------------
Fred | 123
| 222
Anna |
The tables look something like this:
People People_phones Phones
------------- -------------- ------------
ID ID ID
Persons_name People_ID Phone_number
Phones_ID
So far I've managed to get access to show either table 1's null values or table 2's null values, but not both.
As E Mett indicated above, your looking for a full outer join which doesn't handle directly. Here is an example of what he's suggesting:
How do I write a full outer join query in access
JB
In sql jargon what you are looking for is an outer join.
This is unfortunately not available in Ms Access because it is rarely needed.
You should create two queries, one using a left join and the other with a right join.
Then use the UNION keyword to combine the results
I have a table IDs, first names, and last names. I also have a table of children and parents. I want to translate the children/parent table from int to nvarchar using the ID table. In this example, we have two people, Mark and Ray Smith. Ray is Mark's parent. Ray has no parent.
People_Table
ID|First|Last
--+-----+----
0 |NULL|NULL
10|Mark|Smith
15|Ray |Smith
Parent_Child_Table
Child|Parent
-----+------
10 | 15
15 | 0
I want to get
First|Last|First|Last
-----+-----+----+-----
Mark|Smith|Ray|Smith
Ray|Smith|NULL|NULL
I've tried using INNER JOIN but that only gives me the name of the children, for example. Using two select statements gets me one list of all the people, but doesn't maintain the structure. Any ideas?
Sorry for the ugly editing (or lack thereof), this is my first time posting here, and I'm rushing. I hope it's not confusing.
select p.first, p.last, c.first, c.last
from parent_child_table pc
inner join people_table p on p.id = pc.parent
inner join people_table c on c.id = pc.child
I'm trying to extract info from a table in my database based on a persons job. In one table i have all the clients info, in another table linked by ID_no their job title and the branches theyre associated with. the problem I'm having is when i join both tables I'm returning some duplicates because a person can be associated with more than one branch.
I would like to know how to return the duplicated values only once, because all I care about for the moment is the persons id number and what their job title is.
SELECT *
FROM dbo.employeeinfo AS ll
LEFT OUTER JOIN employeeJob AS lly
ON ll.id_no = lly.id_no
WHERE lly.job_category = 'cle'
I know Select Distinct will not work in this situation since the duplicated values return different branches.
Any help would be appreciated. Thanks
I'm using sql server 2008 by the way
*edit to show result i would like
------ ll. ll. lly. lly.
rec_ID --employeeID---Name-----JobTitle---Branch------
1 JX100 John cle london
2 JX100 John cle manchester
3 JX690 Matt 89899 london
4 JX760 Steve 12345 london
I would like the second record to not display because i'm not interested in the branch. i just need to know the employee id and his job title, but because of how the tables are structured it's returning JX100 twice because he's recorded as working in 2 different branches
You must use SELECT DISTINCT and specify you ONLY want person id number and job title.
I don't know exactly your fields name, but I think something like this could work.
SELECT DISTINCT ll.id_no AS person_id_number,
lly.job AS person_job
FROM dbo.employeeinfo AS ll LEFT OUTER JOIN
employeeJob AS lly ON ll.id_no = lly.id_no
WHERE lly.job_category = 'cle'