Joins explained by Venn Diagram with more than one join - sql

http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
and
https://web.archive.org/web/20120621231245/http://www.khankennels.com/blog/index.php/archives/2007/04/20/getting-joins/
have been very helpful in learning the basics of joins using Venn diagrams. But I am wondering how you would apply that same thinking to a query that has more than one join.
Let's say I have 3 tables:
Employees
EmployeeID
FullName
EmployeeTypeID
EmployeeTypes (full time, part time, etc.)
EmployeeTypeID
TypeName
InsuranceRecords
InsuranceRecordID
EmployeeID
HealthInsuranceNumber
Now, I want my final result set to include data from all three tables, in this format:
EmployeeID | FullName | TypeName | HealthInsuranceNumber
Using what I learned from those two sites, I can use the following joins to get all employees, regardless of whether or not their insurance information exists or not:
SELECT
Employees.EmployeeID, FullName, TypeName, HealthInsuranceNumber
FROM Employees
INNER JOIN EmployeeTypes ON Employees.EmployeeTypeID = EmployeeTypes.EmployeeTypeID
LEFT OUTER JOIN InsuranceRecords ON Employees.EmployeeID = InsuranceRecords.EmployeeID
My question is, using the same kind of Venn diagram pattern, how would the above query be represented visually? Is this picture accurate?

I think it is not quite possible to map your example onto these types of diagrams for the following reason:
The diagrams here are diagrams used to describe intersections and unions in set theory. For the purpose of having an overlap as depicted in the diagrams, all three diagrams need to contain elements of the same type which by definition is not possible if we are dealing with three different tables where each contains a different type of (row-)object.
If all three tables would be joined on the same key then you could identify the values of this key as the elements the sets contain but since this is not the case in your example these kind of pictures are not applicable.
If we do assume that in your example both joins use the same key, then only the green area would be the correct result since the first join restricts you to the intersection of Employees and Employee types and the second join restricts you the all of Employees and since both join conditions must be true you would get the intersection of both of the aforementioned sections which is the green area.
Hope this helps.

That is not an accurate set diagram (either Venn or Euler). There are no entities which are members of both Employees and Employee Types. Even if your table schema represented some kind of table-inheritance, all the entities would still be in a base table.
Jeff's example on the Coding Horror blog only works with like entities i.e. two tables containing the same entities - technically a violation of normalization - or a self-join.
Venn diagrams can accurately depict scenarios like:
-- The intersection lens
SELECT *
FROM table
WHERE condition1
AND condition2
-- One of the lunes
SELECT *
FROM table
WHERE condition1
AND NOT condition2
-- The union
SELECT *
FROM table
WHERE condition1
OR condition2

Related

How to check if table relationship is one to many in SSMS

I'm writing some SQL code based on my tables but don't want to miss any edge cases. I'm wondering how do you check if there's a one to many relationship between two tables in SSMS
SELECT *
FROM Houses
JOIN Addresses
on Houses.Id = Addresses.HouseId
Unfortunately the data when queried doesn't give me any insight.
What I tried to do:
Checked table dependencies but that didn't give me any insight. It shows addresses are dependencies but no relationship details.
May I ask, is it possible to determine if one to one via SSMS?
You can run a query to determine how many values each house in the two tables. Well, in this case, we'll assume it is the primary key of houses and the values of HouseId are valid ids:
SELECT num_addresses, COUNT(*)
FROM (SELECT h.Id, COUNT(a.HouseId) as num_addresses
FROM Houses h LEFT JOIN
Addresses a
ON h.Id = a.HouseId
GROUP BY h.Id
) ha
GROUP BY num_addresses;
Then interpret the results:
If the only row returned has num_addresses of 1, then you have a 1-1 relationship.
If two rows are returns with values of 0 and 1, then you have a 1/0-1 relationship.
If multiple rows are returned and the minimum is 1 then you have a 1-n.
If multiple rows are returned and the minimum value is 0, then you have a 0-n.
You could extend this for more general relationships, but this answers the question you asked here.
The relationship between tables is shown by things called 'DATABASE - DIAGRAM'
Here is document about how to create database - diagram in SSMS.
How to create database diagram SSMS
Gordon Linoff response is perfect for the helicopter view -
To get all records side by side simply select all columns from both tables side by side with the left join
SELECT Houses.*, Address.*
FROM Houses
left JOIN Addresses on Addresses.HouseId = Houses.Id
union all
SELECT Houses.*, Address.*
FROM Addresses
left JOIN Houses on Houses.Id = Addresses.HouseId
Microsoft have implemented a reduced code method to do the same but I have not explored it so could not comment
https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/outer-joins?view=sql-server-ver15
What will be apparent is you will see Houses.ID and Addresses.HouseID side by side - An empty value in either means that the relevant table does not have a record for the relevant HouseID or ID
You can then cherry pick those edge case records as you see fit

SQL Multiple Joins - How do they work exactly?

I'm pretty sure this works universally across various SQL implementations. Suppose I have many-to-many relationship between 2 tables:
Customer: id, name
has many:
Order: id, description, total_price
and this relationship is in a junction table:
Customer_Order: order_date, customer_id, order_id
Now I want to write SQL query to join all of these together, mentioning the customer's name, the order's description and total price and the order date:
SELECT name, description, total_price FROM Customer
JOIN Customer_Order ON Customer_Order.customer_id = Customer.id
JOIN Order = Order.id = Customer_Order.order_id
This is all well and good. This query will also work if we change the order so it's FROM Customer_Order JOIN Customer or put the Order table first. Why is this the case? Somewhere I've read that JOIN works like an arithmetic operator (+, * etc.) taking 2 operands and you can chain operator together so you can have: 2+3+5, for example. Following this logic, first we have to calculate 2+3 and then take that result and add 5 to it. Is it the same with JOINs?
Is it that behind the hood, the first JOIN must first be completed in order for the second JOIN to take place? So basically, the first JOIN will create a table out of the 2 operands left and right of it. Then, the second JOIN will take that resulting table as its left operand and perform the usual joining. Basically, I want to understand how multiple JOINs work behind the hood.
In many ways I think ORMs are the bane of modern programming. Unleashing a barrage of underprepared coders. Oh well diatribe out of the way, You're asking a question about set theory. THere are potentially other options that center on relational algebra but SQL is fundamentally set theory based. here are a couple of links to get you started
Using set theory to understand SQL
A visual explanation of SQL

Same number of columns for Union Operation

I was going through union and union all logic and trying examples. What has puzzled me is why is it necessary to have same number of columns in both the tables to perform a union or union all operation?
Forgive me if my question's silly, but i couldn't get the exact answer anywhere. And conceptually thinking, as long as one common column is present, merging two tables should be easy right?(like a join operation). But this is not the case, and I want to know why?
JOIN operations do NOT require the same number of columns be selected in both tables. UNION operations are different that joins. Think of it as two separate lists of data that can be "pasted" together in one big block. You can't have columns that don't match.
Another way to look at it is a JOIN does this:
TableA.Col1, TableA.Col2 .... TableB.Col1, TableB.Col2
A UNION does this:
TableA.Col1, TableA.Col2
TableB.Col1, TableB.Col2
JOINS add columns to rows, UNIONS adds more rows to existing rows. That's why they must "match".
Join Operation combines columns from two tables.
Where as Union and Union all combines two results sets, so In order to combine two results you need to have same number of columns with compatible data types.
In real world e.g. In order to play a game of cricket you need 11 players in both team, 7 in one team and 11 in opposite team not allowed.
Lets say you have EMPTable with columns and values as below:
id, name, address, salary, DOB
1, 'SAM', '2 Merck Ln', 100000, '08/18/1980'
IF you want to UNION with only the column name (lets say value is 'TOBY'), it means you have to default other values to NULL (a smart software or db can implicitly do it for you), which in essence translates to below (to prevent the integrity of a relational table) ->
1, 'SAM', '2 Merck Ln', 100000, '08/18/1980'
UNION
NULL,'TOBY', NULL, NULL, NULL
A "union" by definition is a merger of (different) values of THE same class or type.
You need to research the difference between UNION and JOIN.
Basically, you use UNION when you want to get records from one source (table, view, group of tables, etc) and combine those results with records from another source. They have to have the same columns to get a common set of results.
You use JOIN when you want to join records from one source with another source. There are several types of JOINs (INNER, LEFT, RIGHT, etc) depending on your needs. When using JOINs, you can specify whichever columns you'd like.
Good luck.
join are basically used when we want to get data from two or more tables, and for this there is no need to fetch same number of columns, consider a situation where by using normalization concepts we have divided the table into two parts,
emp [eid, ename,edob,esal]
emp_info [eid,emob_no]
here i we want to know name and mobile no. of all employees then we will use the concept of join, because here information needed by us can't be provided by same table.
so we will use..
SELECT E.ENAME,EI.EMOB_NO FROM EMP E, EMP_INFO EI WHERE E.EID = EI.EID AND LOWER(E.ENAME)='john' ;
now consider about situation where we want to find employees those are having saving account and loan account in a bank.. here we want to find common tuples from two tables results. for this we will use set operations. [ intersection ]
for set operations 2 conditions MUST BE satisfied.
same no. of attributes must be fetches from each table.
domain of each attribute must be same of compatible to the higher one...

How to design the database schema to link two tables via lots of other tables

Although I'm using Rails, this question is more about database design. I have several entities in my database, with the schema a bit like this: http://fishwebby.posterous.com/40423840
If I want to get a list of people and order it by surname, that's no problem. However, if I want to get a list of people, ordered by surname, enrolled in a particular group, I have to use an SQL statement that includes several joins across four tables, something like this:
SELECT group_enrolment.*, person.*
FROM person INNER JOIN member ON person.id = member.person_id
INNER JOIN enrolment ON member.id = enrolment.member_id
INNER JOIN group_enrolment ON enrolment.id = group_enrolment.enrolment_id
WHERE group_enrolment.id = 123
ORDER BY person.surname;
Although this works, it strikes me as a bit inefficient, and potentially as my schema grows, these queries could get more and more complicated.
Another option could be to join the person table to all the other tables in the query by including person_id in the other tables, then it would just be one single join, for example
SELECT group_enrolment.*, person.*
FROM person INNER JOIN group_enrolment ON group_enrolment.person_id
WHERE group_enrolment.id = 123
ORDER BY person.surname;
But this would mean that in my schema, the person table is joined to a lot of other tables. Aside from a complicated schema diagram, does anyone see any disadvantages to this?
I'd be very grateful for any comments on this - whether what I'm doing now (the many table join) or the second solution or another one that hasn't occurred to me is the best way to go.
Many thanks in advance
Well, joins are what databases do. Having said that, you may consider propagating natural keys in your model, which would then allow you to skip over some tables in joins. Take a look at this example.
EDIT
I'm not saying that this will match your model (problem), but just for fun try similar queries on something like this:

Storing & Querying Heirarchical Data with Multiple Parent Nodes

I've been doing quite a bit of searching, but haven't been able to find many resources on the topic. My goal is to store scheduling data like you would find in a Gantt chart. So one example of storing the data might be:
Task Id | Name | Duration
1 Task A 1
2 Task B 3
3 Task C 2
Task Id | Predecessors
1 Null
2 Null
3 1
3 2
Which would have Task C waiting for both Task A and Task B to complete.
So my question is: What is the best way to store this kind of data and efficiently query it? Any good resources for this kind of thing? There is a ton of information about tree structures, but once you add in multiple parents it becomes hard to find info. By the way, I'm working with SQL Server and .NET for this task.
Your problem is related to the concept of relationship cardinality. All relationships have some cardinality, which expresses the potential number of instances on each side of the relationship that are members of it, or can participate in a single instance of the relationship. As an example, for people, (for most living things, I guess, with rare exceptions), the Parent-Child relationship has a cardinality of 2 to zero or many, meaning it takes two parents on the parent side, and there can be zero or many children (perhaps it should be 2 to 1 or many)
In database design, generally, anything that has a 1(one), (or a zero or one), on one side can be easily represented with just two tables, one for each entity, (sometimes only one table is needed see note**) and a foreign key column in the table representing the "many" side, that points to the other table holding the entity on the "one" side.
In your case you have a many to many relationship. (A Task can have multiple predecessors, and each predecessors can certainly be the predecessor for multiple tasks) In this case a third table is needed, where each row, effectively, represents an association between 2 tasks, representing that one is the predecessor to the other. Generally, This table is designed to contain only all the columns of the primary keys of the two parent tables, and it's own primary key is a composite of all the columns in both parent Primary keys. In your case it simply has two columns, the taskId, and the PredecessorTaskId, and this pair of Ids should be unique within the table so together they form the composite PK.
When querying, to avoid double counting data columns in the parent tables when there are multiple joins, simply base the query on the parent table... e.g., to find the duration of the longest parent,
Assuming your association table is named TaskPredecessor
Select TaskId, Max(P.Duration)
From Task T Join Task P
On P.TaskId In (Select PredecessorId
From TaskPredecessor
Where TaskId = T.TaskId)
** NOTE. In cases where both entities in the relationship are of the same entity type, they can both be in the same table. The canonical (luv that word) example is an employee table with the many to one relationship of Worker to Supervisor... Since the supervisor is also an employee, both workers and supervisors can be in the same [Employee] table, and the realtionship can gbe modeled with a Foreign Key (called say SupervisorId) that points to another row in the same table and contains the Id of the employee record for that employee's supervisor.
Use adjacency list model:
chain
task_id predecessor
3 1
3 2
and this query to find all predecessors of the given task:
WITH q AS
(
SELECT predecessor
FROM chain
WHERE task_id = 3
UNION ALL
SELECT c.predecessor
FROM q
JOIN chain c
ON c.task_id = q.predecessor
)
SELECT *
FROM q
To get the duration of the longest parent for each task:
WITH q AS
(
SELECT task_id, duration
FROM tasks
UNION ALL
SELECT t.task_id, t.duration
FROM q
JOIN chain с
ON c.task_id = q.task_id
JOIN tasks t
ON t.task_id = c.predecessor
)
SELECT task_id, MAX(duration)
FROM q
Check "Hierarchical Weighted total" pattern in "SQL design patterns" book, or "Bill Of Materials" section in "Trees and Hierarchies in SQL".
In a word, graphs feature double aggregation. You do one kind of aggregation along the nodes in each path, and another one across alternative paths. For example, find a minimal distance between the two nodes is minimum over summation. Hierarchical weighted total query (aka Bill Of Materials) is multiplication of the quantities along each path, and summation along each alternative path:
with TCAssembly as (
select Part, SubPart, Quantity AS factoredQuantity
from AssemblyEdges
where Part = ‘Bicycle’
union all
select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
from TCAssembly te, AssemblyEdges e
where te.SubPart = e.Part
) select SubPart, sum(Quantity) from TCAssembly
group by SubPart