Recursive SQL Query

I have the following relation:
CompanyInfo(company, role, employee)
What I'm trying to do is to find the shortest "path" between two employees.
Example
I need to find the distance between Joe and Peter.
Joe is the CEO of Company A, and a person named Alex is a board member.
Alex is the CEO of Company B, and Peter is a vice president at Company B. Then, the distance between Joe and Peter will be 2. If Joe and Peter had roles in the same company, it would be 1.
I need to solve this using recursive SQL. So far I've come up with the base case and the final select string, but I can't for the life of me figure out the recursive part.
WITH RECURSIVE shortest_path(c1,p1,c2,p2, path) AS (
-- Basecase --
SELECT c1.company, c1.person, c2.company, c2.person, array[c1.person, c2.person]
FROM CompanyInfo c1
INNER JOIN CompanyInfo c2 ON c1.company = c2.company
WHERE c1.person = 'Joe'
AND c1.person <> c2.person
UNION ALL
-- Recursive --
-- This is where I'm stuck.
)
SELECT p1, p2, array_length(path,1) -1 as distance
FROM shortest_path
WHERE p2 = 'Peter'
ORDER BY distance
LIMIT 1;
Sample Data
CREATE TABLE CompanyInfo (
company text,
role text,
employee text,
primary key (company, role, employee)
);
insert into CompanyInfo values('Company A', 'CEO', 'Joe');
insert into CompanyInfo values('Company A', 'Board member', 'Alex');
insert into CompanyInfo values('Company B', 'CEO', 'Alex');
insert into CompanyInfo values('Company B', 'Board member', 'Peter');
Expected Output
person 1 | person 2 | distance
Joe Peter 2

Try this. The recursion keeps running as long as a new employee can be added to the path.
CREATE TABLE CompanyInfo (
company text,
role text,
employee text,
primary key (company, role, employee)
);
insert into CompanyInfo values('Company A', 'CEO', 'Joe');
insert into CompanyInfo values('Company A', 'Board member', 'Alex');
insert into CompanyInfo values('Company B', 'CEO', 'Alex');
insert into CompanyInfo values('Company B', 'Board member', 'Peter');
WITH RECURSIVE shortest_path(c1,p1,c2,p2, path) AS (
-- Basecase --
SELECT c1.company, c1.employee, c2.company, c2.employee, array[c1.employee, c2.employee]
FROM CompanyInfo c1
JOIN CompanyInfo c2 ON c1.company = c2.company
AND c1.employee = 'Joe'
AND c1.employee <> c2.employee
UNION ALL
-- Recursive --
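-- sp is the path found so far; c2 finds the companies of its last employee,
-- c3 finds colleagues in those companies who are not yet on the path
SELECT sp.c1, sp.p1, c3.company, c3.employee, sp.path || c3.employee
FROM shortest_path sp
JOIN CompanyInfo c2 ON sp.p2 = c2.employee
JOIN CompanyInfo c3 ON c3.company = c2.company
AND NOT c3.employee = ANY (sp.path)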
)
SELECT *, array_length(path,1) -1 as distance
FROM shortest_path
WHERE p2 = 'Peter'
ORDER BY distance
LIMIT 1;
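As a cross-check on the recursive query, the same distance can be computed with a plain breadth-first search. This is only a sketch in Python, not part of the SQL solution: the function name `shortest_distance` and the in-memory `company_info` list are made up for illustration and mirror the sample rows from the question.

```python
from collections import defaultdict, deque


def shortest_distance(rows, start, goal):
    """Return the number of company hops between two employees, or None.

    rows is an iterable of (company, role, employee) tuples, mirroring
    the CompanyInfo relation from the question.
    """
    members = defaultdict(set)    # company -> employees in it
    companies = defaultdict(set)  # employee -> companies they belong to
    for company, _role, employee in rows:
        members[company].add(employee)
        companies[employee].add(company)

    if start == goal:
        return 0

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        # Everyone sharing a company with `person` is one step further away.
        for company in companies[person]:
            for colleague in members[company]:
                if colleague == goal:
                    return dist + 1
                if colleague not in seen:
                    seen.add(colleague)
                    queue.append((colleague, dist + 1))
    return None  # no chain of shared companies connects the two


# Sample rows copied from the question.
company_info = [
    ("Company A", "CEO", "Joe"),
    ("Company A", "Board member", "Alex"),
    ("Company B", "CEO", "Alex"),
    ("Company B", "Board member", "Peter"),
]

distance = shortest_distance(company_info, "Joe", "Peter")
print(distance)
```

The `seen` set plays the same role as the `NOT ... = ANY (path)` check in the recursive part: it stops the search from walking back to someone already visited.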

Related

How can I get certain columns from a table only if they meet a condition in SQL?

I have the following tables:
CREATE TABLE books
(
codBook INTEGER PRIMARY KEY,
title CHAR(20) NOT NULL
);
INSERT INTO books
VALUES (1, 'Book 1'), (2, 'Book 2'), (3, 'Book 3');
CREATE TABLE people
(
name CHAR(10) PRIMARY KEY,
address VARCHAR(50),
CP NUMERIC(5)
);
INSERT INTO people
VALUES ('Carl', 'C/X nº 1', '12345'), ('Louis', 'C/X nº 2', '12345'),
('Joseph', 'C/Y nº 3', '12346'), ('Anna', 'C/Z nº 4', '12347');
CREATE TABLE lends
(
codBook INTEGER REFERENCES books,
member CHAR(10) REFERENCES people,
date DATE,
PRIMARY KEY (codBook, member, date)
);
INSERT INTO lends
VALUES (1, 'Joseph', CURRENT_DATE - 10),
(1, 'Carl', CURRENT_DATE - 9),
(1, 'Louis', CURRENT_DATE - 8),
(2, 'Joseph', CURRENT_DATE - 10);
I am trying to get one row per loan with the book's title and the borrower's address and CP, but only for loans made to members in CP 12345. Books that have no loans in CP 12345 should still appear, but without the address and the CP. Book 1 was lent in both CP 12345 and CP 12346, so I only want it to appear with its CP 12345 rows.
My expected solution is:
"Book 1";"C/X nº 1";12345
"Book 1";"C/X nº 2";12345
"Book 2";null;null
"Book 3";null;null
I tried joining all the tables using 2 left joins:
SELECT title, address, CP
FROM books
LEFT JOIN lends USING (codBook)
LEFT JOIN people ON (name = member)
WHERE CP = 12345;
But I only get the rows with CP=12345 and if I remove WHERE CP=12345 I obtain all the rows, even the book 1 with CP 12346. I am looking for a way to solve this.
If you first INNER JOIN lends and people, with the CP condition in the ON clause, and then LEFT JOIN that result to books, you get your result:
SELECT title , address, CP
FROM books
LEFT JOIN (lends
INNER JOIN people ON (name = member AND CP = 12345)) USING (codBook)
title    address    cp
--------------------------
Book 1   C/X nº 2   12345
Book 1   C/X nº 1   12345
Book 2   null       null
Book 3   null       null
I hope this query will solve your problem:
select books.title, sub.address, sub.CP
from books
left join (
SELECT address, CP, codbook
FROM books
LEFT JOIN lends USING (codBook)
JOIN people ON (name = member and CP = 12345)
) as sub on books.codbook = sub.codbook
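To see why the placement of the CP filter matters, here is a small sketch that runs both variants against an in-memory SQLite database through Python's `sqlite3` module. SQLite is an assumption made purely so the example is self-contained (the question does not name SQLite), the column types are simplified, and the loan dates are placeholder literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (codBook INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE people (name TEXT PRIMARY KEY, address TEXT, CP INTEGER);
    CREATE TABLE lends (codBook INTEGER, member TEXT, date TEXT);

    INSERT INTO books VALUES (1, 'Book 1'), (2, 'Book 2'), (3, 'Book 3');
    INSERT INTO people VALUES
        ('Carl', 'C/X nº 1', 12345), ('Louis', 'C/X nº 2', 12345),
        ('Joseph', 'C/Y nº 3', 12346), ('Anna', 'C/Z nº 4', 12347);
    -- Placeholder dates; the question uses CURRENT_DATE offsets.
    INSERT INTO lends VALUES
        (1, 'Joseph', '2024-01-01'), (1, 'Carl', '2024-01-02'),
        (1, 'Louis', '2024-01-03'), (2, 'Joseph', '2024-01-01');
""")

# Filter in WHERE: applied after the outer joins, so every row whose CP
# is NULL or different is discarded and Book 2 / Book 3 disappear.
where_rows = conn.execute("""
    SELECT title, address, CP
    FROM books
    LEFT JOIN lends USING (codBook)
    LEFT JOIN people ON name = member
    WHERE CP = 12345
    ORDER BY title, address
""").fetchall()

# Filter inside the joined subquery: only matching loans are kept,
# and the outer LEFT JOIN still preserves every book.
on_rows = conn.execute("""
    SELECT b.title, s.address, s.CP
    FROM books b
    LEFT JOIN (
        SELECT l.codBook, p.address, p.CP
        FROM lends l
        JOIN people p ON p.name = l.member
        WHERE p.CP = 12345
    ) s ON s.codBook = b.codBook
    ORDER BY b.title, s.address
""").fetchall()

print(where_rows)
print(on_rows)
```

The first query returns only the two Book 1 rows, while the second returns those two rows plus Book 2 and Book 3 with NULL address and CP.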

Calculations through a hierarchy in SQL

I am trying to perform some calculation by navigating through a hierarchy. In the simple example below, where organizations have a headcount and can be associated with parent organizations, the headcount is only specified for "leaf" organizations. I want to calculate the headcount all the way up the hierarchy using the simple rule: parent_headcount = sum(children_headcount).
I liked the idea of using SQL Common Table Expression for this, but this does not quite work. The determination of the level works (as it follows the natural top-down order of navigation), but not the headcount determination.
How would you fix this, or is there a better way to perform this calculation bottom-up?
-- Define the hierarchical table Org
drop table if exists Org
create table Org (
ID int identity (1,1) not null, Name nvarchar(50), parent int null, employees int,
constraint [PK_Org] primary key clustered (ID),
constraint [FK_Parent] foreign key (parent) references Org(ID)
);
-- Fill it in with a simple example
insert into Org (name, parent, employees) values ('ACME', NULL, 0);
insert into Org (name, parent, employees) values ('ACME France', (select Org.ID from Org where Name = 'ACME'), 0);
insert into Org (name, parent, employees) values ('ACME UK', (select Org.ID from Org where Name = 'ACME'), 0);
insert into Org (name, parent, employees) values ('ACME Paris', (select Org.ID from Org where Name = 'ACME France'), 200);
insert into Org (name, parent, employees) values ('ACME Lyons', (select Org.ID from Org where Name = 'ACME France'), 100);
insert into Org (name, parent, employees) values ('ACME London', (select Org.ID from Org where Name = 'ACME UK'), 150);
select * from Org;
-- Try to determine the total number of employees at any level of the hierarchy
with Orgs as (
select
ID, name, parent, 0 as employees, 0 as level from Org where parent is NULL
union all
select
child.ID, child.name, child.parent, Orgs.employees + child.employees, level + 1 from Org child
join Orgs on child.parent = Orgs.ID
)
select * from Orgs;
This query returns:
The determination of the level is correct, but the calculation of the headcount is not (UK should be 150, France 300, and 450 at the top of the hierarchy). It seems that a CTE is suitable for top-down navigation, but not bottom-up?
Just another option using the datatype hierarchyid
Note: the @Top and Nesting is optional
Example
Declare @Top int = null
;with cteP as (
Select ID
,Parent
,Name
,HierID = convert(hierarchyid,concat('/',ID,'/'))
,employees
From Org
Where IsNull(@Top,-1) = case when @Top is null then isnull(Parent ,-1) else ID end
Union All
Select ID = r.ID
,Parent = r.Parent
,Name = r.Name
,HierID = convert(hierarchyid,concat(p.HierID.ToString(),r.ID,'/'))
,r.employees
From Org r
Join cteP p on r.Parent = p.ID)
Select Lvl = A.HierID.GetLevel()
,A.ID
,A.Parent
,Name = Replicate('|---',A.HierID.GetLevel()-1) + A.Name
,Employees = sum(B.Employees)
From cteP A
Join cteP B on B.HierID.ToString() like A.HierID.ToString()+'%'
Group By A.ID,A.Parent,A.Name,A.HierID
Order By A.HierID
Returns
You need to traverse the hierarchy from every non-leaf node and then sum up all the paths from that node.
with Orgs as (
select
id as [top], ID, name, parent, 0 as employees, 0 as level
from Org g
where exists (select 1 from Org g2 where g.ID = g2.parent)
union all
select
orgs.[top], child.ID, child.name, child.parent, Orgs.employees + child.employees, level + 1 from Org child
join Orgs on child.parent = Orgs.ID
)
select [top] as id, sum(employees) employees
from Orgs
group by [top];
Try this:
/***** DATA *************/
-- Define the hierarchical table Org
drop table Org
create table Org (
ID int identity (1,1) not null, Name nvarchar(50), parent int null, employees int,
constraint [PK_Org] primary key clustered (ID),
constraint [FK_Parent] foreign key (parent) references Org(ID)
);
-- Fill it in with a simple example
insert into Org (name, parent, employees) values ('ACME', NULL, 0);
insert into Org (name, parent, employees) values ('ACME France', (select Org.ID from Org where Name = 'ACME'), 0);
insert into Org (name, parent, employees) values ('ACME UK', (select Org.ID from Org where Name = 'ACME'), 0);
insert into Org (name, parent, employees) values ('ACME Paris', (select Org.ID from Org where Name = 'ACME France'), 200);
insert into Org (name, parent, employees) values ('ACME Lyons', (select Org.ID from Org where Name = 'ACME France'), 100);
insert into Org (name, parent, employees) values ('ACME London', (select Org.ID from Org where Name = 'ACME UK'), 150);
select * from Org;
/******** END DATA ***********/
/******** QUERY ******/
-- Try to determine the total number of employees at any level of the hierarchy
with Orgs as (
select
ID, name, parent, employees, ID as RootID, 0 as level from Org
union all
select
child.ID , child.name, child.parent, child.employees, Orgs.RootID, level + 1 from Org child
join Orgs on child.parent = Orgs.ID
)
select Org.Id,
Org.Parent,
Org.Name,
Org.employees,
(select max(level) from Orgs a where a.Id = Org.Id) as [Level],
S.ProductCountIncludingChildren
from Org
inner join (
select RootID,
sum(employees) as ProductCountIncludingChildren
from Orgs
group by RootID
) as S
on Org.Id = S.RootID
left join Org Org2 on Org2.ID = Org.Parent
order by Org.Id
/**** END QUERY ******/
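The "start a traversal at every node and sum per root" idea used above can be tried out without SQL Server. The sketch below is an assumption-laden port to SQLite via Python's `sqlite3`: the table is simplified, the ids are inserted explicitly instead of using `identity`, and the CTE name `tree` and column `root_id` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent INTEGER REFERENCES org(id),
        employees INTEGER
    );
    INSERT INTO org VALUES
        (1, 'ACME', NULL, 0),
        (2, 'ACME France', 1, 0),
        (3, 'ACME UK', 1, 0),
        (4, 'ACME Paris', 2, 200),
        (5, 'ACME Lyons', 2, 100),
        (6, 'ACME London', 3, 150);
""")

# Every organization starts its own traversal (root_id = id), and the
# recursive step walks down to all of its descendants. Summing per
# root_id then gives the headcount of the whole subtree.
rows = conn.execute("""
    WITH RECURSIVE tree(root_id, id, employees) AS (
        SELECT id, id, employees FROM org
        UNION ALL
        SELECT tree.root_id, child.id, child.employees
        FROM org AS child
        JOIN tree ON child.parent = tree.id
    )
    SELECT o.name, SUM(t.employees)
    FROM tree AS t
    JOIN org AS o ON o.id = t.root_id
    GROUP BY o.id, o.name
    ORDER BY o.id
""").fetchall()

totals = dict(rows)
print(totals)
```

Because each row carries its own headcount rather than a running sum, this version also stays correct if an intermediate organization has employees of its own.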

how to get employee name using corresponding id from lookup table using sql

I have a sql query as following:
table1 has:
name1, name2, date .....
user_table has:
employee_id, employee_name
table1 has id values under column name1 and name2 and user_table has id and corresponding name.
It would have been a straightforward join. But after 2019-03-20, name1 and name2 in table1 hold ids such as 100101 and 100102, whereas before 2019-03-20 they hold names such as tom, dick, harry, etc.
The goal here is obvious, to replace the id values with employee names in table1
My initial idea is to do a UNION between two segments of the table1, before 2019-03-20 and after 2019-03-20.
select t.*, u.employee_name as name1a, u1.employee_name as name2a
from table1 t
left join user_table u on t.name1= u.employee_id
left join user_table u1 on t.name2 = u1.employee_id
where
cast(t.approvedate as date) > '2019-03-20';
Question 1: Is there a better solution than doing a UNION?
Question 2: To do a UNION, both sides must have the same number of columns, but the query above produces two additional columns, name1a and name2a. I can list the column names explicitly to avoid that issue, but what if I have too many columns to list in the select statement?
Updated with sample table and desired result. I have following tables:
test_sales
CREATE TABLE test_sales (product varchar(20) ,sales_date varchar(20), person_1 varchar(20) , person_2 varchar(20) ) ;
INSERT INTO test_sales (product, sales_date, person_1, person_2) VALUES ('abc', '2019-04-01', '101', '110'), ('abc', '2019-04-10', '102', '111'),('abc', '2019-03-15', 'tom', 'john'), ('xyz', '2019-03-21', 'tom', 'dick'), ('xyz', '2019-03-29', 'harry', 'josh'), ('xyz', '2019-04-05', '102', '110'), ('xyz', '2019-03-29', 'harry', 'josh'), ('pqr', '2019-04-02', '101', '111');
test_user
CREATE TABLE test_user (employee_id varchar(10) ,employee_name varchar(20));
INSERT INTO test_user (employee_id, employee_name) VALUES ('101', 'john'),('102', 'josh'), ('110', 'tom'), ('111', 'dick');
And I want to get following output where the blank cells will also have names.
Right now I have this query which produces the result with blank cells.
select s.product, s.sales_date, u.employee_name as person_1, u1.employee_name as person_2 from test_sales s left join test_user u on s.person_1 = u.employee_id left join test_user u1 on s.person_2 =u1.employee_id;
You could join using an OR operator:
select t.*, u.employee_name as name1a, u1.employee_name as name2a
from table1 t
left join user_table u
on t.name1= u.employee_id
OR t.name2 = u.employee_id
OR t.name1 = u.employee_name
OR t.name2 = u.employee_name
It's not going to be fast, but it may do the trick. Optionally you can do 4 joins (name1 and name2 for id, and then name1 and name2 for name) and use coalesce:
select t.*, COALESCE(u.employee_name, u2.employee_name, u3.employee_name, u4.employee_name) AS employee_name
from table1 t
left join user_table u
on t.name1= u.employee_id
left join user_table u2
ON t.name2 = u2.employee_id
left join user_table u3
ON t.name1 = u3.employee_name
left join user_table u4
ON t.name2 = u4.employee_name
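A different option, not taken from the answer above, is to join on the id only and fall back to the stored value when no id matches, since the older rows already contain the name. The sketch below tries this on the question's sample tables using SQLite through Python's `sqlite3`; the choice of SQLite is an assumption for the sake of a runnable example, and the approach itself assumes that a stored name never happens to equal some employee_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_sales (product TEXT, sales_date TEXT,
                             person_1 TEXT, person_2 TEXT);
    INSERT INTO test_sales (product, sales_date, person_1, person_2) VALUES
        ('abc', '2019-04-01', '101', '110'),
        ('abc', '2019-04-10', '102', '111'),
        ('abc', '2019-03-15', 'tom', 'john'),
        ('xyz', '2019-03-21', 'tom', 'dick'),
        ('xyz', '2019-03-29', 'harry', 'josh'),
        ('xyz', '2019-04-05', '102', '110'),
        ('xyz', '2019-03-29', 'harry', 'josh'),
        ('pqr', '2019-04-02', '101', '111');

    CREATE TABLE test_user (employee_id TEXT, employee_name TEXT);
    INSERT INTO test_user (employee_id, employee_name) VALUES
        ('101', 'john'), ('102', 'josh'), ('110', 'tom'), ('111', 'dick');
""")

# COALESCE keeps the looked-up name when the id matches and otherwise
# falls back to the original column, which already holds a name.
rows = conn.execute("""
    SELECT s.product,
           s.sales_date,
           COALESCE(u1.employee_name, s.person_1) AS person_1,
           COALESCE(u2.employee_name, s.person_2) AS person_2
    FROM test_sales AS s
    LEFT JOIN test_user AS u1 ON s.person_1 = u1.employee_id
    LEFT JOIN test_user AS u2 ON s.person_2 = u2.employee_id
    ORDER BY s.rowid
""").fetchall()

for row in rows:
    print(row)
```

This avoids both the UNION and the date cutoff, and it keeps person_1 and person_2 as separate columns.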

PostgreSQL: Get an entity with all his relationships

I have a table "Cars" and a table "Person". A Person drives many Cars and a Car can be driven by many People so I have another table "Person_Car" which has both id's per row.
Car(id, name)
Person(id, name)
Person_Car(car_id, person_id)
How can I get a list of all people with the cars they drive (car names concatenated), something like this:
("John", "Car 1, Car 2, Car 3")
("Kate", "Car 2, Car 4, Car 5")
Example is here: http://sqlfiddle.com/#!15/ba949/1
Test data:
Create table Car(id int, name text);
Create table Person(id int, name text);
Create table Person_Car(car_id int, person_id int);
INSERT INTO Car VALUES (1, 'Car 1'),
(2, 'Car 2'),
(3, 'Car 3'),
(4, 'Car 4'),
(5, 'Car 5');
INSERT INTO Person VALUES(1, 'John'), (2, 'Kate');
INSERT INTO Person_Car VALUES (1,1), (2,1), (3,1), (2,2), (4,2), (5,2);
Your desired code:
SELECT p.name, array_to_string(array_agg(c.name), ',') FROM Person p
INNER JOIN Person_Car pc ON p.id=pc.person_id
INNER JOIN Car c ON c.id=pc.car_id
GROUP by p.name
Output:
John Car 1,Car 2,Car 3
Kate Car 2,Car 4,Car 5
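The GROUP BY aggregation above can be reproduced outside PostgreSQL as well. The sketch below uses SQLite through Python's `sqlite3`, which is an assumption made only to keep the example runnable; SQLite's `group_concat` stands in for `array_to_string(array_agg(...))`, and the dictionary name `cars_by_person` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (id INTEGER, name TEXT);
    CREATE TABLE Person (id INTEGER, name TEXT);
    CREATE TABLE Person_Car (car_id INTEGER, person_id INTEGER);

    INSERT INTO Car VALUES
        (1, 'Car 1'), (2, 'Car 2'), (3, 'Car 3'), (4, 'Car 4'), (5, 'Car 5');
    INSERT INTO Person VALUES (1, 'John'), (2, 'Kate');
    INSERT INTO Person_Car VALUES (1,1), (2,1), (3,1), (2,2), (4,2), (5,2);
""")

# One output row per person; group_concat joins that person's car names.
# Grouping by p.id as well keeps two people with the same name apart.
rows = conn.execute("""
    SELECT p.name, group_concat(c.name, ', ')
    FROM Person AS p
    JOIN Person_Car AS pc ON p.id = pc.person_id
    JOIN Car AS c ON c.id = pc.car_id
    GROUP BY p.id, p.name
    ORDER BY p.name
""").fetchall()

cars_by_person = {name: cars for name, cars in rows}
print(cars_by_person)
```

The order of the names inside each concatenated string is not guaranteed here. In PostgreSQL, `string_agg(c.name, ', ' ORDER BY c.name)` builds the string directly and makes the order explicit.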
Just in case you want to avoid the GROUP BY
Option 1
WITH forienKeyTable AS
(
SELECT pc.person_id, c.name
FROM Car c
JOIN Person_Car pc ON pc.car_id = c.id
)
SELECT p.name
, array_to_string
(ARRAY(
SELECT fkt.name
FROM forienKeyTable fkt
WHERE fkt.person_id = p.id
)::text[], ','::text, 'empty'::text)
FROM Person p;
Option 2
SELECT p.name
, array_to_string
(ARRAY(
SELECT c.name
FROM Car c
JOIN Person_Car pc ON pc.car_id = c.id
WHERE pc.person_id = p.id
)::text[], ','::text, 'empty'::text)
FROM Person p;

SQL Query JOIN or IN operator?

I have two tables,
PERSON
and FRIENDS.
FRIENDS has the fields NAME and SURNAME.
A person has N friends.
I want to retrieve all the PERSONs that have at least two FRIENDs, one with name ="mark", and the other with name="rocco" and surname ="siffredi".
Example: if I have a person that has 5 friends, one of them is called mark and no one is called rocco siffredi, no rows are returned.
I was thinking about:
SELECT * FROM person p
JOIN friends AS f ON p.ID=f.personID
WHERE f.name ="mark" AND f IN
( SELECT * from FRIENDS WHERE name="rocco" and surname="siffredi")
or
SELECT * FROM person p
JOIN friends AS f1 ON p.ID=f1.personID
JOIN friends AS f2 ON p.ID=f2.personID
WHERE f1.name="mark" AND f2.name="rocco" AND f2.surname="siffredi"
What is the best way? I mean the fastest way to execute it.
I don't care about readability.
Is there any other way to execute this query?
Ty.
EDIT: added the join on the ID...
I had to guess your column names and make up a table:
Use EXISTS:
CREATE table FRIENDS(person_id INT, friend_id INT)
go
SELECT *
FROM person
WHERE
EXISTS
(SELECT *
FROM friends f
JOIN person per
ON f.friend_id = per.id
WHERE
per.name ='mark' AND
person.id = f.person_id) AND
EXISTS
(SELECT *
FROM friends f
JOIN person per
ON f.friend_id = per.id
WHERE
per.name = 'rocco' AND
per.surname='siffredi' AND
person.id = f.person_id)
Your schema design isn't very good for what you are trying to do... I would have a Person table as you have, which would also contain a unique identifier called PersonId. I would then have a Friends table which took two fields - Person1Id and Person2Id.
This gives you a couple of important advantages - first of all your system is able to handle more than one bloke called John Smith (because we join on Ids rather than Names...). Secondly, a person's details are only ever recorded in the Person table. One definition of truth...
With these data as input:
INSERT INTO Person VALUES
(1, 'Bob', 'Smith'),
(2, 'Jim', 'Jones')
INSERT INTO Friends VALUES
(1, 1, 'Mark', 'Tally'),
(2, 1, 'John', 'Smith'),
(3, 1, 'Jack', 'Pollock'),
(4, 2, 'Mark', 'Rush'),
(5, 2, 'Rocco', 'Siffredi'),
(6, 2, 'Mark', 'Bush')
you can use this query:
SELECT PersonId, COUNT(*) AS NoOfFriends
FROM (
SELECT DISTINCT PersonId, Name,
Surname = CASE WHEN Name = 'Mark' THEN NULL
ELSE Surname
END
FROM Friends
WHERE Name = 'Mark' OR (Name = 'Rocco' AND Surname = 'Siffredi') ) t
GROUP BY PersonId
to get the distinct number of required friends per PersonID:
PersonId NoOfFriends
------------------------
1 1
2 2
You can now join with the above table expression on PersonId and filter it by NoOfFriends:
SELECT p.*
FROM Person AS p
INNER JOIN (
SELECT PersonId, COUNT(*) AS NoOfFriends
FROM (
SELECT DISTINCT PersonId, Name,
Surname = CASE WHEN Name = 'Mark' THEN NULL
ELSE Surname
END
FROM Friends
WHERE Name = 'Mark' OR (Name = 'Rocco' AND Surname = 'Siffredi') ) t
GROUP BY PersonId ) s ON s.PersonId = p.ID
WHERE s.NoOfFriends = 2
so as to get only persons having the required combination of associated friends:
ID Name Surname
-------------------
2 Jim Jones
P.S. I have completely re-written my answer after @t-clausen.dk's comment.
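For comparison, the two-EXISTS approach can be run against the same sample rows. The sketch below uses SQLite through Python's `sqlite3`, which is an assumption made for the sake of a runnable example. The Friends columns (Id, PersonId, Name, Surname) are inferred from the INSERT statements above rather than stated anywhere in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT, Surname TEXT);
    -- Column layout inferred from the sample INSERTs (an assumption).
    CREATE TABLE Friends (Id INTEGER PRIMARY KEY, PersonId INTEGER,
                          Name TEXT, Surname TEXT);

    INSERT INTO Person VALUES (1, 'Bob', 'Smith'), (2, 'Jim', 'Jones');
    INSERT INTO Friends VALUES
        (1, 1, 'Mark', 'Tally'),
        (2, 1, 'John', 'Smith'),
        (3, 1, 'Jack', 'Pollock'),
        (4, 2, 'Mark', 'Rush'),
        (5, 2, 'Rocco', 'Siffredi'),
        (6, 2, 'Mark', 'Bush');
""")

# A person qualifies only if both correlated subqueries find a row:
# one friend named Mark and one friend named Rocco Siffredi.
matches = conn.execute("""
    SELECT p.ID, p.Name, p.Surname
    FROM Person AS p
    WHERE EXISTS (SELECT 1 FROM Friends AS f
                  WHERE f.PersonId = p.ID AND f.Name = 'Mark')
      AND EXISTS (SELECT 1 FROM Friends AS f
                  WHERE f.PersonId = p.ID
                    AND f.Name = 'Rocco' AND f.Surname = 'Siffredi')
    ORDER BY p.ID
""").fetchall()

print(matches)
```

Each person is returned at most once, however many friends named Mark they have, so no DISTINCT or COUNT step is needed.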