Join repeatedly until a string is present? - sql

While querying a subset of our employees, I'm trying to add a field for the SVP they "roll up" to.
Employees may have anywhere from 1 to 5 or 6 degrees of separation from their SVP. The trouble is, we don't have any specific hierarchical indicator to reference. I have to do this by walking up through the employee's manager repeatedly, until some manager's manager has "SVP" in their title.
How could I write a query to do this?
From the opposite direction, I've found the employees of a specific SVP (named BM for the example) by saying, 'Employee's manager is BM, OR Employee's manager's manager is BM, OR Employee's manager's manager's manager is BM' and so on...
For my instance, I suspect I'd only use the same sys_user table over and over again, following the manager field each time until I reach a user with SVP in the title.
+--------+-------------------+------------+---------+
| sys_id | name              | title      | manager |
+--------+-------------------+------------+---------+
| 555789 | Tina Belcher      | Contractor | 123456  |
| 123456 | Bob Belcher       | Manager    | 654321  |
| 654321 | Calvin Fischoeder | SVP        | 997755  |
+--------+-------------------+------------+---------+
SELECT su.Name
, su.Title
, dp.name
, mg.name
FROM sys_user su
LEFT JOIN cmn_department dp
ON dp.sys_id = su.department
LEFT JOIN sys_user mg
ON mg.sys_id = su.manager
WHERE su.Title like ('%contractor%')
I appreciate any help or tips that can be offered. Thanks for looking and have a great day.

Your sys_user table is an adjacency list: it only records each employee and who they report to directly. Adjacency lists are one way to encode hierarchical data. They're nice because they're relatively fast and compact, but they aren't the only way to encode hierarchical relationships.
To answer the questions you are asking, you will benefit from re-encoding the data into a closure table, which maps each employee to all of their direct and indirect managers/reportees along with the degree of separation and any other pertinent info. However, since it represents a many-to-many relationship, you don't want to overload it with too much additional data. Fortunately, thanks to recursive queries, you can create one fairly easily on the fly.
To create a closure table you start by populating it with the degree 0 relationships, where every employee is considered their own manager/reportee. The reasoning for it is a bit beyond my ken, but it has something to do with the mathematics behind the concept of transitive closure (hence the name closure table). After that you iteratively (recursively) add each additional reporting degree. You can do that either from the top down or from the bottom up.
Here's the Top Down version:
with closure(manager_id, report_id, degree, is_managing_SVP, is_reporting_svp) as (
select sys_id
, sys_id
, 0
, case when title like '%SVP%' then 1 else 0 end
, case when title like '%SVP%' then 1 else 0 end
from sys_user
union all
select cur.manager_id
, nxt.sys_id
, cur.degree+1
, cur.is_managing_SVP
, case when nxt.title like '%SVP%' then 1 else 0 end
from closure cur
join sys_user nxt
on nxt.manager = cur.report_id
and nxt.sys_id <> nxt.manager
)
select * from closure
And here's the Bottom Up version:
with closure(manager_id, report_id, degree, is_managing_SVP, is_reporting_svp) as (
select sys_id
, sys_id
, 0
, case when title like '%SVP%' then 1 else 0 end
, case when title like '%SVP%' then 1 else 0 end
from sys_user
union all
select nxt.manager
, cur.report_id
, cur.degree+1
, case when mgr.title like '%SVP%' then 1 else 0 end
, cur.is_reporting_SVP
from closure cur
join sys_user nxt
on nxt.sys_id = cur.manager_id
and nxt.sys_id <> nxt.manager
join sys_user mgr
on mgr.sys_id = nxt.manager
)
select * from closure
It doesn't matter much which version you use if you are going to generate the entire closure table. If you want to optimize your query and only generate a partial closure table, the choice depends on whether you need to walk up or down the tree.
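As a rough illustration of a partial closure table, here is a minimal sketch that walks up the tree only from the contractors and stops at the first SVP. It runs the SQL through Python's built-in sqlite3 module so it is self-contained. That is an assumption made for the demo: the original queries target a different engine, SQLite needs the RECURSIVE keyword, and its LIKE is case-insensitive for ASCII by default. The sample rows are the three from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sys_user (sys_id INTEGER PRIMARY KEY, name TEXT, title TEXT, manager INTEGER);
    INSERT INTO sys_user VALUES
        (555789, 'Tina Belcher',      'Contractor', 123456),
        (123456, 'Bob Belcher',       'Manager',    654321),
        (654321, 'Calvin Fischoeder', 'SVP',        997755);
""")

PARTIAL_CLOSURE = """
WITH RECURSIVE closure(manager_id, report_id, degree, is_managing_svp) AS (
    -- Seed only with the rows of interest (contractors), not every user.
    SELECT sys_id, sys_id, 0,
           CASE WHEN title LIKE '%SVP%' THEN 1 ELSE 0 END
    FROM sys_user
    WHERE title LIKE '%contractor%'
    UNION ALL
    -- Walk one step up the manager chain; stop once an SVP has been reached.
    SELECT mgr.sys_id, cur.report_id, cur.degree + 1,
           CASE WHEN mgr.title LIKE '%SVP%' THEN 1 ELSE 0 END
    FROM closure cur
    JOIN sys_user nxt ON nxt.sys_id = cur.manager_id
    JOIN sys_user mgr ON mgr.sys_id = nxt.manager
    WHERE cur.is_managing_svp = 0
      AND nxt.sys_id <> nxt.manager
)
SELECT r.name, m.name, c.degree
FROM closure c
JOIN sys_user r ON r.sys_id = c.report_id
JOIN sys_user m ON m.sys_id = c.manager_id
WHERE c.is_managing_svp = 1
"""

rows = conn.execute(PARTIAL_CLOSURE).fetchall()
print(rows)
```

Seeding with the filtered rows keeps the recursion proportional to the contractors' reporting chains instead of the whole table, which is the point of building only a partial closure.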
Once generated you can use the closure table to answer your questions about SVPs such as who each contractor's SVP is:
select r.sys_id, r.name, r.title, c.degree
, c.manager_id SVP_ID
, m.name SVP_name
, m.title SVP_title
from sys_user r
join closure c
on c.report_id = r.sys_id
join sys_user m
on m.sys_id = c.manager_id
where r.title like '%contractor%'
and c.is_managing_svp = 1
sys_id | name | title | degree | SVP_ID | SVP_name | SVP_title
-----: | :----------- | :--------- | -----: | -----: | :---------------- | :--------
555789 | Tina Belcher | Contractor | 2 | 654321 | Calvin Fischoeder | SVP
Or every direct and indirect report to the SVP named Calvin Fischoeder:
select m.sys_id manager_id
, m.name
, m.title
, c.degree
, r.sys_id report_id
, r.name report_name
, r.title report_title
from sys_user m
join closure c
on c.manager_id = m.sys_id
join sys_user r
on r.sys_id = c.report_id
where m.name = 'Calvin Fischoeder'
order by degree, report_name
manager_id | name | title | degree | report_id | report_name | report_title
---------: | :---------------- | :---- | -----: | --------: | :---------------- | :-----------
654321 | Calvin Fischoeder | SVP | 0 | 654321 | Calvin Fischoeder | SVP
654321 | Calvin Fischoeder | SVP | 1 | 123456 | Bob Belcher | Manager
654321 | Calvin Fischoeder | SVP | 2 | 555789 | Tina Belcher | Contractor
To see all queries in action, check out this db<>fiddle

You are looking for a recursive CTE:
with cte as (
select su.sys_id, su.name, su.title, su.manager, 1 as lev, 0 as hit_svp
from sys_user su
where su.Title like '%contractor%'
union all
select su.sys_id, su.name, su.title, su.manager, lev + 1,
(case when su.title like '%SVP%' then 1 else 0 end) as hit_svp
from sys_user su join
cte
on cte.manager = su.sys_id
where cte.hit_svp = 0
)
select . . . -- whatever columns you want
from cte; -- you may want additional joins here for other columns

Related

How to return a single value built from values stored in multiple records?

I am an application developer unfortunately put in the position of needing to write (/update) the SQL statement in order to return data for the application. My experience with SQL is limited, so would appreciate any help.
We have an Oracle Database 11g (11.2.0.4.0)
Example Tables
I've created the following example which replicates our set-up. It consists of:
A main table which contains records of trips around different cities. (MAIN_TRIP_TABLE)
Various additional tables which contain additional properties linked to these trips via INNER JOINs. (ADDITIONAL_TABLE)
A separate table showing the steps taken along the journey (i.e. interim locations visited). A value of STEP_NUM = 1 is always the final destination, and thus there is always at least 1 record in this table per trip in the main table. If there were any interim stops made on the journey, they are listed in this table as separate records with STEP_NUM iterating upwards. (JOURNEY_STEPS_TABLE)
MAIN_TRIP_TABLE
RECORD_ID | PROP_1 | PROP_2 | FINAL_DEST | ...
-------------------------------------------------
10001 | A | 1 | London | ...
10002 | A | 0 | Reading | ...
10003 | B | 1 | Leeds | ...
10004 | B | 0 | York | ...
ADDITIONAL_TABLE
RECORD_ID | PROP_3 | ...
------------------------
10001 | X | ...
10002 | Y | ...
10003 | Y | ...
10004 | X | ...
JOURNEY_STEPS_TABLE
RECORD_ID | STEP_NUM | LOCATION | ...
--------------------------------------
10001 | 1 | London | ...
10002 | 1 | Reading | ...
10002 | 2 | Bath | ...
10003 | 1 | Leeds | ...
10003 | 2 | York | ...
10003 | 3 | Bristol | ...
10004 | 1 | York | ...
10004 | 2 | Cardiff | ...
10004 | 3 | Oxford | ...
10004 | 4 | London | ...
Issue
I want to retrieve something that looks like:
SELECT
MAIN_TRIP_TABLE.RECORD_ID
, MAIN_TRIP_TABLE.PROP_1
, MAIN_TRIP_TABLE.PROP_2
, ADDITIONAL_TABLE.PROP_3
, <Concatenation/Array of JOURNEY_STEPS_TABLE> as "InterimStops"
FROM MAIN_TRIP_TABLE
INNER JOIN ADDITIONAL_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = ADDITIONAL_TABLE.RECORD_ID
LEFT OUTER JOIN JOURNEY_STEPS_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = JOURNEY_STEPS_TABLE.RECORD_ID
Where the "InterimStops" value above is some sort of concatenation of any and all values found in the JOURNEY_STEPS_TABLE for that particular RECORD_ID, in order of increasing STEP_NUM, with some sort of delimiter (e.g. for '10001' I would want just "London", and for '10004' I would want "York,Cardiff,Oxford,London").
If I get something like this, I can then separate these out into a JSON array within the application I'm developing.
Note: The actual SQL SELECT query is already significantly more complex with other fields and tables, so changing the query away from 1 SELECT query (i.e. instead using multiple queries) is something I'd like to avoid unless absolutely necessary.
Things I've tried
After some Googling, I started to build a SQL statement using LISTAGG, and to begin with it looked promising:
SELECT
MAIN_TRIP_TABLE.RECORD_ID
, LISTAGG(JOURNEY_STEPS_TABLE.LOCATION, ',') WITHIN GROUP (ORDER BY JOURNEY_STEPS_TABLE.STEP_NUM) "InterimStops"
FROM MAIN_TRIP_TABLE
LEFT OUTER JOIN JOURNEY_STEPS_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = JOURNEY_STEPS_TABLE.RECORD_ID
GROUP BY MAIN_TRIP_TABLE.RECORD_ID
This returned exactly the sort of value I was looking for, but it failed as soon as I tried to bring back in the other values from both the main table and additional tables (e.g. MAIN_TRIP_TABLE.PROP_1, MAIN_TRIP_TABLE.PROP_2, ADDITIONAL_TABLE.PROP_3). This gave me an "ORA-00979: not a GROUP BY expression" error.
I then tried to get this data via a subquery but struggled to get anything working.
Any help, insight, or pointing in the right direction would be very much appreciated.
Many Thanks
It's easier to do this with a subquery, so you don't have to group the data on the joined set of columns (as you already tried):
SELECT MAIN_TRIP_TABLE.RECORD_ID
, (SELECT LISTAGG(JOURNEY_STEPS_TABLE.LOCATION, ',') WITHIN GROUP (ORDER BY JOURNEY_STEPS_TABLE.STEP_NUM)
FROM JOURNEY_STEPS_TABLE
WHERE JOURNEY_STEPS_TABLE.RECORD_ID = MAIN_TRIP_TABLE.RECORD_ID) "InterimStops"
FROM MAIN_TRIP_TABLE
The other possibility is to LEFT JOIN the grouped data:
SELECT MAIN_TRIP_TABLE.RECORD_ID
, JOURNEY_STEPS_TABLE."InterimStops"
FROM MAIN_TRIP_TABLE
LEFT JOIN (SELECT RECORD_ID
, LISTAGG(LOCATION, ',') WITHIN GROUP (ORDER BY STEP_NUM) "InterimStops"
FROM JOURNEY_STEPS_TABLE
GROUP BY RECORD_ID) JOURNEY_STEPS_TABLE
ON JOURNEY_STEPS_TABLE.RECORD_ID = MAIN_TRIP_TABLE.RECORD_ID
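To show why the subquery form avoids the GROUP BY error once the other columns come back in, here is a minimal sketch run through Python's built-in sqlite3 module. Several things here are assumptions for the demo: SQLite's group_concat stands in for Oracle's LISTAGG ... WITHIN GROUP, the table names are shortened, and only two of the sample trips are loaded. Feeding group_concat from an ordered inner subquery is a commonly relied-upon SQLite idiom rather than a documented guarantee; in Oracle the WITHIN GROUP clause does the ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_trip (record_id INTEGER PRIMARY KEY, prop_1 TEXT, prop_2 INTEGER);
    CREATE TABLE additional (record_id INTEGER, prop_3 TEXT);
    CREATE TABLE journey_steps (record_id INTEGER, step_num INTEGER, location TEXT);
    INSERT INTO main_trip VALUES (10001, 'A', 1), (10004, 'B', 0);
    INSERT INTO additional VALUES (10001, 'X'), (10004, 'X');
    INSERT INTO journey_steps VALUES
        (10001, 1, 'London'),
        (10004, 1, 'York'), (10004, 2, 'Cardiff'),
        (10004, 3, 'Oxford'), (10004, 4, 'London');
""")

QUERY = """
SELECT m.record_id, m.prop_1, m.prop_2, a.prop_3,
       -- The aggregation lives in a correlated scalar subquery, so the outer
       -- query needs no GROUP BY and can select any columns it likes.
       (SELECT group_concat(location, ',')
        FROM (SELECT s.location
              FROM journey_steps s
              WHERE s.record_id = m.record_id
              ORDER BY s.step_num)) AS interim_stops
FROM main_trip m
JOIN additional a ON a.record_id = m.record_id
ORDER BY m.record_id
"""

trips = conn.execute(QUERY).fetchall()
for trip in trips:
    print(trip)
```

Because the aggregate is computed per outer row, adding more joined tables or columns to the outer SELECT does not change the subquery at all.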

Find uncovered periods without exploding each combination

I have the following two tables
People
+--------+---------------+-------------+
| Name | ContractStart | ContractEnd |
+--------+---------------+-------------+
| Kate | 20180101 | 20181231 |
| Sawyer | 20180101 | 20181231 |
| Ben | 20170601 | 20181231 |
+--------+---------------+-------------+
Shifts
+---------+--------+------------+----------+
| Station | Name | ShiftStart | ShiftEnd |
+---------+--------+------------+----------+
| Swan | Kate | 20180101 | 20180131 |
| Arrow | Kate | 20180301 | 20180331 |
| Arrow | Kate | 20180401 | 20181231 |
| Flame | Sawyer | 20180101 | 20181231 |
| Swan | Ben | 20180101 | 20181231 |
+---------+--------+------------+----------+
It means that, for example, Kate will be available from 20180101 to 20181231. In this period of time she will work at station Swan from 20180101 to 20180131, at station Arrow from 20180301 to 20180331 and from 20180401 to 20181231.
My goal is to come to the following table
+------+---------------+-------------+
| | VacationStart | VacationEnd |
+------+---------------+-------------+
| Kate | 20180201 | 20180228 |
| Ben | 20170601 | 20171231 |
+------+---------------+-------------+
that means that Kate will be free from 20180201 to 20180228.
My first idea was to create a table with every day of 2017 and 2018, let's say a CalTable, then JOIN it with People to find every day on which each person should be available. Then JOIN the resulting table with Shifts to find the days NOT BETWEEN ShiftStart AND ShiftEnd.
These steps give me correct results but are very slow, considering that I have almost 1,000,000 people and there are usually 10-20 years between ContractStart and ContractEnd.
What could be a correct approach to get the results in a more clever and fast way?
Thanks.
This is the data of the example on db<>Fiddle
For @A_Name_Does_Not_Matter, this is my attempt:
CREATE TABLE #CalTable([ID] VARCHAR(8) NOT NULL)
DECLARE @num int
SET @num = 20170101
WHILE (@num <= 20181231)
BEGIN
INSERT INTO #CalTable([ID])
SELECT @num AS [ID]
SET @num = @num + 1
END
SELECT X.[Name], X.[TIMEID]
FROM (
-- All day availables
SELECT DISTINCT A.[Name],B.[ID] AS [TIMEID]
FROM #People A INNER JOIN #CalTable B
ON B.[ID] BETWEEN A.[ContractStart] AND A.[ContractEnd]
) X
LEFT JOIN (
-- Working day
SELECT DISTINCT A.[Name],B.[ID] AS [TIMEID]
FROM #People A INNER JOIN #CalTable B
ON B.[ID] BETWEEN A.[ContractStart] AND A.[ContractEnd]
INNER JOIN #Shifts C ON A.[Name]=C.[Name] AND B.[ID] BETWEEN C.[ShiftStart] AND C.[ShiftEnd]
) Z
ON X.[Name]=Z.[Name] AND X.[TIMEID]=Z.[TIMEID]
WHERE Z.[Name] IS NULL
ORDER BY X.[Name],X.[TIMEID]
and then aggregate the dates with this query.
A person's contract start date can be the start of a vacation. You can find the end of that vacation by taking the date of their first shift minus 1 day, using CROSS APPLY to get the TOP 1 shift ordered by ShiftStart.
In the unusual situation that they have no shifts, their vacation ends on their contract end date.
Later vacations start the day after a shift ends and end the day before the next shift starts (found with OUTER APPLY), defaulting to the contract end date if there is no further shift.
SELECT p.name, p.contractStart vacationstart, p.ContractEnd vacationend from people p WHERE not exists(select 1 from shifts s where p.name = s.name)
UNION
SELECT p2.name,
p2.contractStart vacationstart,
dateadd(day,-1,DQ.ShiftStart) as vacationend
from PEOPLE P2
CROSS APPLY
(SELECT TOP 1 s2.ShiftStart FROM shifts s2 WHERE p2.name = s2.name order by s2.ShiftStart) DQ
WHERE DQ.ShiftStart > p2.contractstart
UNION
select P3.NAME,
dateadd(day,1,s3.ShiftEnd) vacationstart,
COALESCE(dateadd(day,-1, DQ2.shiftStart),P3.ContractEnd) --you might have to add handling yourself for removing a case where they work on their contract end date
FROM people p3 JOIN shifts s3 on p3.name = s3.name
OUTER APPLY (SELECT TOP 1 s4.shiftStart
from shifts s4
where s4.name = p3.name
and
s4.shiftstart > s3.shiftstart
order by s4.shiftstart) DQ2
It's hard for me to verify without test data.
For each employee, what I'm looking for is:
Contract Start, Shift1Start - 1
Shift1End + 1, Shift2Start - 1
Shift2End + 1, Shift3Start - 1
Shift3End + 1, ContractEnd
Then add the case with no shifts.
Finally, shifts may be contiguous, which produces vacations with a duration of zero or less. You could remove these by wrapping the query in a subquery and filtering out rows where vacationend is earlier than vacationstart.
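The pairing of "shift end + 1" with "next shift start - 1" described above can be sketched outside the database as a sorted sweep over one person's shifts. This is only an illustration of the logic in Python, not a replacement for the set-based APPLY query, and the function and variable names are hypothetical. It assumes dates arrive as YYYYMMDD integers, as in the sample tables, and it skips contiguous or overlapping shifts so that no zero-length vacations are produced.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def parse(yyyymmdd):
    """Turn an integer such as 20180131 into a date."""
    return datetime.strptime(str(yyyymmdd), "%Y%m%d").date()


def fmt(d):
    """Turn a date back into a YYYYMMDD integer."""
    return int(d.strftime("%Y%m%d"))


def uncovered_periods(contract_start, contract_end, shifts):
    """Return the (start, end) gaps inside the contract not covered by any shift."""
    end = parse(contract_end)
    cursor = parse(contract_start)  # first day not yet known to be covered
    gaps = []
    for s, e in sorted((parse(s), parse(e)) for s, e in shifts):
        if s > cursor:
            # Days from the cursor up to the day before this shift are free.
            gaps.append((cursor, min(s - ONE_DAY, end)))
        cursor = max(cursor, e + ONE_DAY)
        if cursor > end:
            break
    if cursor <= end:
        # Nothing covers the tail of the contract.
        gaps.append((cursor, end))
    return [(fmt(a), fmt(b)) for a, b in gaps]


people = {
    "Kate": (20180101, 20181231),
    "Sawyer": (20180101, 20181231),
    "Ben": (20170601, 20181231),
}
shifts = {
    "Kate": [(20180101, 20180131), (20180301, 20180331), (20180401, 20181231)],
    "Sawyer": [(20180101, 20181231)],
    "Ben": [(20180101, 20181231)],
}

vacations = {
    name: uncovered_periods(start, end, shifts.get(name, []))
    for name, (start, end) in people.items()
}
print(vacations)
```

The work per person is proportional to the number of shifts rather than the number of calendar days, which is what makes this shape cheaper than joining against a calendar table.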

Count in SQL server

select
hotel.hotel_name 'Hotel Name',
Address 'Address',
Description 'Description',
Contact_No 'Contact Number'
from hotel, customer_profile, booking_details where
customer_profile.email = booking_details.email
and hotel.hotel_name = booking_details.hotel_name
and booking_details.email = 'davidho@yahoo.com'
order by 1
I can't add images /:
So for example currently my result is
+-----------------------+
| Name | Address |
+-----------------------+
| Hotel1 | Beach Road1 |
| Hotel2 | Beach Road2 |
| Hotel2 | Beach Road2 |
+-----------------------+
I want it to remove the duplicates, and add a new column "Number of times" which indicate how many times it appeared.
I want it to be
+------------------------------------+
| Name | Address | No of Times |
+------------------------------------+
| Hotel1 | Beach Road1 | 1 |
| Hotel2 | Beach Road2 | 2 |
+------------------------------------+
For your example, you should use a GROUP BY clause, as below:
select
hotel.hotel_name "Hotel Name",
Address, count(1) "No of Times"
from hotel, customer_profile, booking_details where
customer_profile.email = booking_details.email
and hotel.hotel_name = booking_details.hotel_name
and booking_details.email = 'davidho@yahoo.com'
group by hotel.hotel_name, Address
order by 1
First, learn proper join syntax, don't use single quotes for column names, and use column aliases for all columns. Here is what I think your current query is doing.
select h.hotel_name as HotelName, h.Address, h.Description,
cp.Contact_No
from booking_details bd join
hotel h
on h.hotel_name = bd.hotel_name join
customer_profile cp
on cp.email = bd.email
where bd.email = 'davidho@yahoo.com'
order by 1;
Next, to solve your question, just use group by. You can also eliminate the join to customer_profile, because I don't think that table gets used:
select h.hotel_name as HotelName, h.Address, count(*) as NumberOfTimes
from booking_details bd join
hotel h
on h.hotel_name = bd.hotel_name
where bd.email = 'davidho@yahoo.com'
group by h.hotel_name, h.Address
order by 1;
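As a quick check of the GROUP BY approach, here is a minimal sketch run through Python's built-in sqlite3 module. The schema is cut down to the columns the count needs, and the sample bookings (including the second customer's address) are made up for illustration. Every non-aggregated column in the SELECT list also appears in the GROUP BY, which is what collapses the duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel (hotel_name TEXT PRIMARY KEY, address TEXT);
    CREATE TABLE booking_details (email TEXT, hotel_name TEXT);
    INSERT INTO hotel VALUES ('Hotel1', 'Beach Road1'), ('Hotel2', 'Beach Road2');
    INSERT INTO booking_details VALUES
        ('davidho@yahoo.com', 'Hotel1'),
        ('davidho@yahoo.com', 'Hotel2'),
        ('davidho@yahoo.com', 'Hotel2'),
        ('someone.else@example.com', 'Hotel1');  -- hypothetical second customer
""")

QUERY = """
SELECT h.hotel_name, h.address, COUNT(*) AS number_of_times
FROM booking_details bd
JOIN hotel h ON h.hotel_name = bd.hotel_name
WHERE bd.email = ?
GROUP BY h.hotel_name, h.address   -- one output row per hotel
ORDER BY h.hotel_name
"""

counts = conn.execute(QUERY, ("davidho@yahoo.com",)).fetchall()
print(counts)
```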

Multiple select or distinct

I am new to sql so looking for a little help - got the first part down however I am having issues with the second part.
I got the three tables tied together. First I needed to tie tblPatient.ID = tblPatientVisit.PatientID together to eliminate dups which works
Now I need to take those results and eliminate dups in the MRN but my query is only returning one result which is WRONG - LOL
Query
select
tblPatient.id,
tblPatient.firstname,
tblPatient.lastname,
tblPatient.dob,
tblPatient.mrn,
tblPatientSmokingScreenOrder.SmokeStatus,
tblPatientVisit.VisitNo
from
tblPatient,
tblPatientSmokingScreenOrder,
tblPatientVisit
Where
tblPatient.ID = tblPatientVisit.PatientID
and tblPatientVisit.ID = tblPatientSmokingScreenOrder.VisitID
and tblPatient.ID in(
Select Distinct
tblPatient.mrn
From
tblPatient
where
isdate(DOB) = 1
and Convert(date,DOB) <'12/10/2000'
and tblPatientVisit.PatientType = 'I')
Actual Results:
ID | firstName | LastName | DOB | MRN | SmokeStatus | VisitNO
12 | Test Guy | Today | 12/12/1023 | 0015396 | Never Smoker | 0013957431
Desired Results:
90 | BOB | BUILDER | 02/24/1974 | 0015476 | Former Smoker | 0015476001
77 | DORA | EXPLORER | 06/04/1929 | 0015463 | Never Smoker | 0015463001
76 | MELODY | VALENTINE | 09/17/1954 | 0015461 | Current | 0015461001
32 | STRAWBERRY | SHORTCAKE | 07/06/1945 | 0015415 | Current | 0015415001
32 | STRAWBERRY | SHORTCAKE | 07/06/1945 | 0015415 | Never Smoker | 0015415001
32 | STRAWBERRY | SHORTCAKE | 07/06/1945 | 0015415 | Former Smoker | 0015415001
12 | Test Guy | Today | 12/12/1023 | 0015345 | Never Smoker | 0013957431
Anyone have any suggestions on how I go down to the next level and get all the rows with one unique MRN. From the data above I should have 5 in my list. Any help would be appreciated.
Thanks
If I had to guess -- The only thing that looks odd (but maybe it's OK) is that you're comparing patient.ID from your parent query to patient.mrn in the subquery.
Beyond that -- things to check:
(1)
Do you get all your patients with the inner query?
Select Distinct
tblPatient.mrn
From
tblPatient
where
isdate(DOB) = 1
and Convert(date,DOB) <'12/10/2000'
and tblPatientVisit.PatientType = 'I'
(2)
What is the patient type for the missing records? (You're filtering on tblPatientVisit.PatientType = 'I'. Do the missing records have that patient type as well?)
Perhaps you need to invert the logic here. As in,
Select Distinct
patients.mrn1
From (select
tblPatient.id as id1,
tblPatient.firstname as firstname1,
tblPatient.lastname as lastname1,
tblPatient.dob as DOB1,
tblPatient.mrn as mrn1,
tblPatientSmokingScreenOrder.SmokeStatus as SmokeStatus1,
tblPatientVisit.VisitNo as VisitNo1,
tblPatientVisit.PatientType as PatientType1
from
tblPatient,
tblPatientSmokingScreenOrder,
tblPatientVisit
Where
tblPatient.ID = tblPatientVisit.PatientID
and tblPatientVisit.ID = tblPatientSmokingScreenOrder.VisitID
) as patients
where
isdate(patients.DOB1) = 1
and Convert(date,patients.DOB1) <'12/10/2000'
and patients.PatientType1 = 'I';
I cleaned it up a bit, and I think they were right: MRN won't match patient ID, at least not in your example data. You should not need an inner query. This query should give you what you want.
SELECT DISTINCT
p.id,
p.firstname,
p.lastname,
p.dob,
p.mrn,
s.SmokeStatus,
v.VisitNo
FROM
tblPatient p
JOIN tblPatientVisit v ON p.id = v.patientId
JOIN tblPatientSmokingScreenOrder s ON v.id = s.visitId
WHERE
isdate(p.DOB) = 1
AND CONVERT(date,p.DOB) <'12/10/2000'
AND v.PatientType = 'I'

Generating a hierarchy

I got the following question at a job interview and it completely stumped me, so I'm wondering if anybody out there can help explain it to me. Say I have the following table:
employees
--------------------------
id | name | reportsTo
--------------------------
1 | Alex | 2
2 | Bob | NULL
3 | Charlie | 5
4 | David | 2
5 | Edward | 8
6 | Frank | 2
7 | Gary | 8
8 | Harry | 2
9 | Ian | 8
The question was to write a SQL query that returned a table with a column for each employee's name and a column showing how many people are above that employee in the organization: i.e.,
hierarchy
--------------------------
name | hierarchyLevel
--------------------------
Alex | 1
Bob | 0
Charlie | 3
David | 1
Edward | 2
Frank | 1
Gary | 2
Harry | 1
Ian | 2
I can't even figure out where to begin writing this as a SQL query (a cursor, maybe?). Can anyone help me out in case I get asked a similar question to this again? Thanks.
The simplest example would be to use a (real or temporary) table, and add one level at a time (fiddle):
INSERT INTO hierarchy
SELECT id, name, 0
FROM employees
WHERE reportsTo IS NULL;
WHILE ((SELECT COUNT(1) FROM employees) <> (SELECT COUNT(1) FROM hierarchy))
BEGIN
INSERT INTO hierarchy
SELECT e.id, e.name, h.hierarchylevel + 1
FROM employees e
INNER JOIN hierarchy h ON e.reportsTo = h.id
AND NOT EXISTS(SELECT 1 FROM hierarchy hh WHERE hh.id = e.id)
END
Other solutions will be slightly different for each RDBMS. As one example, in SQL Server, you can use a recursive CTE to expand it (fiddle):
;WITH expanded AS
(
SELECT id, name, 0 AS level
FROM employees
WHERE reportsTo IS NULL
UNION ALL
SELECT e.id, e.name, level + 1 AS level
FROM expanded x
INNER JOIN employees e ON e.reportsTo = x.id
)
SELECT *
FROM expanded
ORDER BY id
Other solutions include recursive stored procedures, or even using dynamic SQL to iteratively increase the number of joins until everybody is accounted for.
Of course, all these examples assume there are no cycles and that everyone can be traced up the chain to a head honcho (reportsTo IS NULL).
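One way to check that assumption is sketched below, using Python's built-in sqlite3 module (SQLite needs the RECURSIVE keyword, unlike the SQL Server version above). Because each employee has a single reportsTo value, anyone caught in a reporting cycle, or pointing at a manager who does not exist, is never reached when expanding down from the head honcho, so they can be listed with an anti-join. Jack and Kim are hypothetical rows added here only to create a cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, reportsTo INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alex', 2), (2, 'Bob', NULL), (3, 'Charlie', 5),
        (4, 'David', 2), (5, 'Edward', 8), (6, 'Frank', 2),
        (7, 'Gary', 8), (8, 'Harry', 2), (9, 'Ian', 8),
        (10, 'Jack', 11), (11, 'Kim', 10);  -- hypothetical cycle: Jack <-> Kim
""")

EXPANDED_CTE = """
WITH RECURSIVE expanded(id, name, level) AS (
    -- Level 0: everyone with no manager.
    SELECT id, name, 0 FROM employees WHERE reportsTo IS NULL
    UNION ALL
    -- Each pass adds the direct reports of the previous level.
    SELECT e.id, e.name, x.level + 1
    FROM expanded x
    JOIN employees e ON e.reportsTo = x.id
)
"""

levels = conn.execute(
    EXPANDED_CTE + "SELECT name, level FROM expanded ORDER BY id"
).fetchall()

# Employees the expansion never reached: in a cycle or under a missing manager.
unreached = conn.execute(
    EXPANDED_CTE
    + "SELECT name FROM employees"
    + " WHERE id NOT IN (SELECT id FROM expanded) ORDER BY id"
).fetchall()

print(levels)
print(unreached)
```

Note that the WHILE loop version shown earlier would never terminate on this data, since the row counts of the two tables can never become equal; a check like this could be run first.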