SQL Route Finder in Oracle - Recursion?

I am trying to build a simple route finder that calculates and stores the nodes a route traverses to get from A to B. I have two tables: a stage table (the nodes and their 'next possible hops') and a route_stage table, which should store each calculated route under a unique route ID.
Stage Table
STAGEID START_STATION NEXT_HOP_STATION LENGTH
---------- ------------------------------ ------------------------------ ----------
1 Penzance Plymouth 78
2 Plymouth Exeter 44.8
3 Exeter Taunton 36.6
4 Exeter Salisbury 96.6
5 Salisbury Basingstoke 38.2
6 Basingstoke Southampton 52.7
7 Southampton Poole 37
8 Poole Weymouth 31.6
9 Taunton Reading 99.5
10 Reading Basingstoke 18
11 Reading Paddington 40.9
12 Taunton Bristol 48.8
13 Bristol Bath 13
14 Bath Swindon 37.5
15 Swindon Reading 39.8
Route_Stage Table
ROUTEID STAGEID
---------- ----------
1 1
1 2
1 3
1 9
1 11
2 6
2 7
2 8
2 10
2 11
In the example above, the route with ID 1 starts at Penzance, traverses Plymouth, Exeter, Taunton and Reading, and terminates at Paddington. Ideally I want to create a stored procedure that takes a start and an end station as parameters and calculates a suitable route between them.
I've had a look at recursion but got a bit lost, as I am not sure how the code should react when there are multiple potential paths from a node. How would it know which one is the correct one to go down?
Any help is greatly appreciated. Thanks!

For a single given starting station, this should (I think; sorry, typing by hand on an iPad) return one row for each stage on every route that leaves that starting point.
SELECT
  LEVEL AS route_step,
  t1.next_hop_station AS next_station,
  t1.stageid
FROM
  stage t1
START WITH
  t1.start_station = 'your start station'
CONNECT BY
  -- a child stage starts where its parent (PRIOR) stage ends
  PRIOR t1.next_hop_station = t1.start_station
So, for start station Penzance:
Route_Step  Next_Station  StageID
1           Plymouth      1
2           Exeter        2
3           Taunton       3
4           Reading       9
5           Basingstoke   10
6           Southampton   6
7           Poole         7
8           Weymouth      8
5           Paddington    11
3           Salisbury     4
4           Basingstoke   5
5           Southampton   6
6           Poole         7
7           Weymouth      8
(The rows for the branch from Taunton through Bristol, stages 12 to 15 and onward, are not listed here.)
Wrapping that with a join on your distinct starting stations (and removing the explicit START WITH clause so that you get routes from all stations, not just a single station) will give you what you need for your output table (although as per previous comments, I'm not sure what use that structure is to you, as you lose pertinent detail):
SELECT
  first_stage.stageid AS routeid,
  q.stageid
FROM
  (
    SELECT
      LEVEL AS route_step,
      -- stage at which this branch of the hierarchy started
      CONNECT_BY_ROOT t1.stageid AS first_stageid,
      t1.next_hop_station AS next_station,
      t1.stageid
    FROM
      stage t1
    CONNECT BY
      PRIOR t1.next_hop_station = t1.start_station
  ) q
  -- Oracle does not accept AS before a table alias
  INNER JOIN stage first_stage
    ON first_stage.stageid = q.first_stageid
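On the question of which branch to follow when a node has several onward hops: one common choice is to explore all of them and keep the route with the smallest total length. The sketch below does that with Dijkstra's algorithm over the Stage rows from the question. It is written in Python rather than PL/SQL purely for illustration, and both the `find_route` name and the "shortest total length is the suitable route" rule are assumptions, not something the question specifies.

```python
import heapq

# (stageid, start_station, next_hop_station, length) copied from the Stage table
STAGES = [
    (1, "Penzance", "Plymouth", 78.0), (2, "Plymouth", "Exeter", 44.8),
    (3, "Exeter", "Taunton", 36.6), (4, "Exeter", "Salisbury", 96.6),
    (5, "Salisbury", "Basingstoke", 38.2), (6, "Basingstoke", "Southampton", 52.7),
    (7, "Southampton", "Poole", 37.0), (8, "Poole", "Weymouth", 31.6),
    (9, "Taunton", "Reading", 99.5), (10, "Reading", "Basingstoke", 18.0),
    (11, "Reading", "Paddington", 40.9), (12, "Taunton", "Bristol", 48.8),
    (13, "Bristol", "Bath", 13.0), (14, "Bath", "Swindon", 37.5),
    (15, "Swindon", "Reading", 39.8),
]


def find_route(start, end, stages=STAGES):
    """Return (total_length, [stageid, ...]) for the shortest route, or None.

    Hypothetical helper: "suitable" is assumed to mean smallest total length.
    """
    # Build an adjacency list: station -> [(next station, length, stageid)]
    graph = {}
    for stageid, src, dst, length in stages:
        graph.setdefault(src, []).append((dst, length, stageid))

    # Priority queue of (distance so far, station, stage ids used so far)
    queue = [(0.0, start, [])]
    visited = set()
    while queue:
        dist, station, path = heapq.heappop(queue)
        if station == end:
            return dist, path
        if station in visited:
            continue
        visited.add(station)
        for dst, length, stageid in graph.get(station, []):
            if dst not in visited:
                heapq.heappush(queue, (dist + length, dst, path + [stageid]))
    return None  # no route exists in this direction


if __name__ == "__main__":
    print(find_route("Penzance", "Paddington"))
```

The returned stage IDs are what would be written to route_stage under a new route ID. A different rule (fewest hops, for example) only changes the value pushed onto the queue.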

Related

Select maximum value where another column is used for the grouping

I'm trying to join several tables, where one of the tables is acting as a
key-value store, and then after the joins find the maximum value in a
column less than another column. As a simplified example, I have the following three tables:
Documents:
DocumentID  Filename      LatestRevision
1           D1001.SLDDRW  18
2           P5002.SLDPRT  10
Variables:
VariableID  VariableName
1           DateReleased
2           Change
3           Description
VariableValues:
DocumentID  VariableID  Revision  Value
1           2           1         Created
1           3           1         Drawing
1           2           3         Changed Dimension
1           1           4         2021-02-01
1           2           11        Corrected typos
1           1           16        2021-02-25
2           3           1         Generic part
2           3           5         Screw
2           2           4         2021-02-24
I can use the LEFT JOIN/IS NULL thing to get the latest version of
variables relatively easily (see http://sqlfiddle.com/#!7/5982d/3/0).
What I want is the latest version of variables that are less than or equal
to a revision which has a DateReleased, for example:
DocumentID  Filename      Variable     Value              VariableRev  DateReleased  ReleasedRev
1           D1001.SLDDRW  Change       Changed Dimension  3            2021-02-01    4
1           D1001.SLDDRW  Description  Drawing            1            2021-02-01    4
1           D1001.SLDDRW  Description  Drawing            1            2021-02-25    16
1           D1001.SLDDRW  Change       Corrected Typos    11           2021-02-25    16
2           P5002.SLDPRT  Description  Generic Part       1            2021-02-24    4
How do I do this?
I figured this out. Add another JOIN at the start that brings in a second copy of the VariableValues table restricted to the DateReleased variable, then require that every selected VariableValues revision is less than or equal to that released revision. I think the LEFT JOIN (the one used for the IS NULL check) has to come after this join.
The example at http://sqlfiddle.com/#!9/bd6068/3/0 shows this better.
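For readers who cannot open the fiddle, here is a minimal sketch of that join order, run against an in-memory SQLite database from Python. The aliases `rel`, `vv` and `newer` are made up for this example, and one assumption is made about the data: the last VariableValues row for document 2 is stored with VariableID 1 (DateReleased), which is what the expected output implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Documents (DocumentID INTEGER, Filename TEXT, LatestRevision INTEGER);
CREATE TABLE Variables (VariableID INTEGER, VariableName TEXT);
CREATE TABLE VariableValues (DocumentID INTEGER, VariableID INTEGER,
                             Revision INTEGER, Value TEXT);
INSERT INTO Documents VALUES (1, 'D1001.SLDDRW', 18), (2, 'P5002.SLDPRT', 10);
INSERT INTO Variables VALUES (1, 'DateReleased'), (2, 'Change'), (3, 'Description');
INSERT INTO VariableValues VALUES
  (1, 2, 1, 'Created'), (1, 3, 1, 'Drawing'), (1, 2, 3, 'Changed Dimension'),
  (1, 1, 4, '2021-02-01'), (1, 2, 11, 'Corrected typos'), (1, 1, 16, '2021-02-25'),
  (2, 3, 1, 'Generic part'), (2, 3, 5, 'Screw'),
  (2, 1, 4, '2021-02-24');  -- assumption: VariableID 1, see the note above
""")

QUERY = """
SELECT d.DocumentID, d.Filename, v.VariableName, vv.Value,
       vv.Revision AS VariableRev, rel.Value AS DateReleased,
       rel.Revision AS ReleasedRev
FROM Documents d
-- one row per released revision of each document
JOIN VariableValues rel
  ON rel.DocumentID = d.DocumentID AND rel.VariableID = 1
-- candidate values at or before that released revision
JOIN VariableValues vv
  ON vv.DocumentID = d.DocumentID AND vv.VariableID <> 1
 AND vv.Revision <= rel.Revision
JOIN Variables v ON v.VariableID = vv.VariableID
-- LEFT JOIN / IS NULL: drop a candidate if a newer one exists within the bound
LEFT JOIN VariableValues newer
  ON newer.DocumentID = vv.DocumentID AND newer.VariableID = vv.VariableID
 AND newer.Revision > vv.Revision AND newer.Revision <= rel.Revision
WHERE newer.DocumentID IS NULL
ORDER BY d.DocumentID, rel.Revision, v.VariableName
"""

rows = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

The upper bound `newer.Revision <= rel.Revision` in the LEFT JOIN is what keeps "Changed Dimension" as the latest Change for the revision 4 release even though revision 11 exists.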

R or postgres SQL code : How to identify all connected values in a table to identify unique networks

I have a problem that is somewhat similar to a social network. I need to identify all the candidates in one network of friends and give that network of friends a network name or number. I would have to write this in SQL (Postgres) or in R.
"Createdcolumn(network)" is what I need to create at my end. Column1 and Column2 are already in my data.
S.no/lineno Column1 Column2 Createdcolumn(network)
1 Peet Jackson 1
2 Jason Filip 2
3 Luke Filip 2
4 Jason Becky 2
5 Aron Chris 3
6 Maron Cheese 4
7 Matt Brooklyn 5
8 Brooklyn Federer 5
9 Ruselle Federer 5
Little more info about the first figure to understand created column:
Lines 2,3,4 are together in network 2 because that is one circle of friends, here is the logic:
lines 2 and 3 are connected because of Filip
lines 4 and 2 are connected because of Jason
(so now all lines 2,3 & 4 are actually one network since they are all connected in someway; maybe a friend of friend, or friend of friend of friend, or n times a friend of a friend)
Likewise 7,8,9 are one network, here is the logic:
lines 7 and 8 are one network because of Brooklyn
lines 8 and 9 are one network because of Federer
(so now all lines 7,8 & 9 are actually one network since they are connected in someway; maybe a friend of friend, or friend of friend of friend, or n times a friend of a friend)
Line 1: Peet and Jackson have no other network of friends, so that line is one network on its own
Line 5: Aron and Chris have no other network of friends, so that line is one network on its own
Now
S.no/lineno Column1 Column2 Createdcolumn(network)
1 Peet Jackson 1
2 Jason Filip 2
3 Luke Filip 2
4 Jason Becky 2
5 Aron Chris 3
6 Maron Cheese 4
7 Matt Brooklyn 5
8 Brooklyn Federer 5
9 Ruselle Federer 5
10 Aron Ruselle 5
now explaining fig 2 for better understanding:
in Fig2 I added "Aron" and "Ruselle" to line 10. So now line 5 changed from network 3 to network 5 since all of them are connected:
lines 7 and 8 are connected because of Brooklyn
lines 8 and 9 are connected because of Federer
lines 9 and 10 are connected because of Ruselle
lines 5 and 10 are connected because of Aron
(so now all lines 5,7,8,9 and 10 are actually one network since they are connected in someway)
Callouts:
1) Network 5 in Fig 2 can be renamed network "3" too, no issues. The main idea is to have all connected people as part of ONE network
2) My list is NOT dynamic and will NOT grow from list in Fig 1 to Fig 2, so i only need a solution which will get my work done in short term. Scaling solution is not required right now
3) My table has the same two columns but almost 40K unique names, so I CANNOT hardcode any names into my code
4) The names can be alphabetic, numeric or alphanumeric
I tried using complex full and cross joins but that was quite tedious. I read about the igraph package on R which might be what I am looking for
Not sure if I explained my question clearly. Apologies in case of any confusion.
Thanks
**EDIT - Converted it to PostgreSQL syntax
Here is a solution using a recursive CTE (originally written in SQL Server syntax):
First create the table:
create table net(s int, c1 varchar(20), c2 varchar(20))
Next Populate with your data:
insert into net values
(1,'Peet','Jackson')
,(2,'Jason','Filip')
,(3,'Luke','Filip')
,(4,'Jason','Becky')
,(5,'Aron','Chris')
,(6,'Maron','Cheese')
,(7,'Matt','Brooklyn')
,(8,'Brooklyn','Federer')
,(9,'Ruselle','Federer')
,(10,'Aron','Ruselle')
Now the CTE:
;with recursive cte as (
select *, ','||c1||','||c2 as network, s as MaxS from net
union all
select net.*, cte.network||','
||case when cte.network like '%'||net.c1||'%' then net.c2 else net.c1 end,net.s
from net
join cte on cte.network like '%'||','||net.c1||'%' or cte.network like '%'||','||net.c2||'%'
where net.s>cte.MaxS
)
, groups as (
select net.*, network, MaxS,
row_number() over (partition by net.s order by length(network) desc) as longest
from net
join cte on cte.network like '%'||','||net.c1||'%'
)
select s,c1,c2,
dense_rank() over (order by MaxS) as groupno
from groups where longest=1
The result:
s c1 c2 groupno
1 Peet Jackson 1
2 Jason Filip 2
3 Luke Filip 2
4 Jason Becky 2
6 Maron Cheese 3
7 Matt Brooklyn 4
8 Brooklyn Federer 4
9 Ruselle Federer 4
10 Aron Ruselle 4
5 Aron Chris 4
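What the question describes is finding connected components of a graph whose edges are the rows. Outside the database this is often done with a union-find (disjoint-set) structure. The sketch below is in Python rather than R or Postgres, so it is a swapped-in technique and not the CTE above. The function name `label_networks` is made up for the example, and networks are numbered in order of first appearance, which the question says is acceptable.

```python
def label_networks(pairs):
    """Return one network number per input row (hypothetical helper)."""
    parent = {}

    def find(name):
        # Follow parent links to the representative of this name's set
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # First pass: every row links its two names into one set
    for a, b in pairs:
        union(a, b)

    # Second pass: number the sets in order of first appearance
    labels = {}
    result = []
    for a, _ in pairs:
        root = find(a)
        if root not in labels:
            labels[root] = len(labels) + 1
        result.append(labels[root])
    return result


FIG2 = [
    ("Peet", "Jackson"), ("Jason", "Filip"), ("Luke", "Filip"),
    ("Jason", "Becky"), ("Aron", "Chris"), ("Maron", "Cheese"),
    ("Matt", "Brooklyn"), ("Brooklyn", "Federer"), ("Ruselle", "Federer"),
    ("Aron", "Ruselle"),
]

if __name__ == "__main__":
    for (c1, c2), network in zip(FIG2, label_networks(FIG2)):
        print(c1, c2, network)
```

Because names are compared for exact equality, "Aron" and "Maron" cannot be confused the way they can with substring matching, and nothing is hardcoded, so the same function should cope with roughly 40K names.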

In a game show database scenario, how do I fetch the average total episode score per season in a single query?

Pardon the title gore. I'm having trouble finding a good way to express my question, which is endemic to the problem.
The Tables
season
id name
---- ------
1 Season 1
2 Season 2
3 Season 3
episode
id season_id number title
---- ----------- -------- ---------------------------------------
1 1 1 Pilot
2 1 2 1x02 - We Got Picked Up
3 1 3 1x03 - This is the Third Episode
4 2 1 2x01 - We didn't get cancelled.
5 2 2 2x02 - We're running out of ideas!
6 3 1 3x01 - We're still here.
7 3 2 3x02 - Okay, this game show is dying.
8 3 3 3x03 - Untitled
score
id episode_id score contestant_id (table not given)
---- ------------ ------- ---------------------------------
1 1 35 1
2 1 -12 2
3 1 8 3
4 1 5 4
5 2 13 1
6 2 -2 5
7 2 3 3
8 2 -14 6
9 3 -14.5 1
10 3 -3 2
11 3 1.5 7
12 3 9.5 5
13 4 22.8 1
14 4 -3 8
15 5 2 1
16 5 13.5 9
17 5 7 3
18 6 13 1
19 6 -84 10
20 6 12 11
21 7 3 1
22 7 10 2
23 8 29 1
24 8 1 5
As you can see, you have multiple episodes per season, and multiple scores per episode (one score per contestant). Contestants can reappear in later episodes (irrelevant), scores are floating point values, and there can be an arbitrary number of scores per episode.
So what am I looking for?
I'd like to get the average total episode score per season, where the total episode score is the sum of all the scores in an episode. Mathematically, this comes out to be the sum of all scores in a season divided by the number of episodes. Easy enough to comprehend, but I have had trouble doing it in a single query and getting the correct result. I'd like an output like the following:
name average_total_episode_score
---------- -----------------------------
Season 1 9.83
Season 2 21.15
Season 3 -5.33
The top-level query needs to be on the season table, as it will be combined with other, similar queries on the same table. It's easy enough to do this with an aggregate in a subquery, but that subquery runs as a separate query, which fails my single-query requirement. Can this be done in a single query?
Hope this should work
Select s.id, avg(score)
FROM Season S,
Episode e,
Score sc
WHERE s.id = e.season_id
AND e.id = sc.episode_id
Group by s.id
Okay, just figured it out. As usual, I had to write and post a whole book before the simple solution descended upon me.
The problem in my query (which I didn't give in the question) was the lack of a DISTINCT count. Here is a working query:
SELECT
"season"."id",
"season"."name",
(SUM("score"."score") / COUNT(DISTINCT "episode"."id")) AS "average_total_episode_score"
FROM "season"
LEFT OUTER JOIN "episode"
ON ("season"."id" = "episode"."season_id")
LEFT OUTER JOIN "score"
ON ("episode"."id" = "score"."episode_id")
GROUP BY "season"."id"
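To check the DISTINCT-count query against the expected figures, here is a small sketch that loads the question's data into an in-memory SQLite database from Python and runs the same SUM / COUNT(DISTINCT) expression. SQLite stands in for whatever database the question uses, and the contestant column is left out because it plays no part in the result.

```python
import sqlite3

# (episode id, season id) and the scores per episode, copied from the question
EPISODES = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3), (8, 3)]
SCORES = {
    1: [35, -12, 8, 5], 2: [13, -2, 3, -14], 3: [-14.5, -3, 1.5, 9.5],
    4: [22.8, -3], 5: [2, 13.5, 7],
    6: [13, -84, 12], 7: [3, 10], 8: [29, 1],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE season (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE episode (id INTEGER PRIMARY KEY, season_id INTEGER);
CREATE TABLE score (episode_id INTEGER, score REAL);
INSERT INTO season VALUES (1, 'Season 1'), (2, 'Season 2'), (3, 'Season 3');
""")
conn.executemany("INSERT INTO episode VALUES (?, ?)", EPISODES)
conn.executemany(
    "INSERT INTO score VALUES (?, ?)",
    [(episode_id, s) for episode_id, scores in SCORES.items() for s in scores],
)

# The join yields one row per score, so COUNT(*) would count scores;
# COUNT(DISTINCT episode.id) counts each episode once.
QUERY = """
SELECT season.name,
       SUM(score.score) / COUNT(DISTINCT episode.id) AS average_total_episode_score
FROM season
LEFT OUTER JOIN episode ON season.id = episode.season_id
LEFT OUTER JOIN score ON episode.id = score.episode_id
GROUP BY season.id
ORDER BY season.id
"""
averages = dict(conn.execute(QUERY).fetchall())

if __name__ == "__main__":
    for name, value in averages.items():
        print(name, round(value, 2))
```

A plain AVG(score) over the same join divides by the number of scores rather than the number of episodes, which is why it gives a different figure.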
SELECT se.id AS season_id, SUM(score) AS season_score, AVG(score)
FROM score s
JOIN episode e ON s.episode_id = e.id
JOIN season se ON se.id = e.season_id
GROUP BY se.id

Query: Employee Training Schedules Based on Position/Workrole

My company sends folks to training. Based on projected new hires/transfers, I was asked to generate a report that estimates the number of seats we need in each course broken out by quarter.
Question: My question is two-fold:
What is the best way to represent a sequence of courses (i.e. prerequisites) in a relational DB?
How do I create the query(-ies) necessary to produce the following desired output:
Desired Output:
ID PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 1 1 1/14/2017 1/14/2017
2 2 1 2/17/2017 2/17/2017
3 2 2 2/18/2017 2/19/2017
4 2 3 2/20/2017 2/20/2017
5 3 49 1/18/2017 2/03/2017
6 …
Background Info: The courses are taken in-sequence: the first few courses are orientation courses for the company, and later courses are more specific to the employee's workrole. There are over 50 different courses, 40 different workroles and we're projecting ~1k new hires/transfers. Each work role must take a sequence of courses in a prescribed order, but I'm having trouble representing this ordering and subsequently writing the necessary query.
Existing Tables:
I have several tables that store the data: Personnel, LnkPersonnelToWorkroles, Workroles, LnkWorkrolesToCourses, and Courses (there are many others as well, but I omit them to keep this question in scope). Here's some notional data from these tables:
Personnel (These are the projected new hires and their estimated arrival date.)
ID DisplayName RequiredCompletionDate
1 Kristel Bump 10/1/2016
2 Shelton Franke 3/11/2017
3 Shaunda Launer 4/16/2017
4 Clarinda Kestler 3/13/2017
5 My Wimsatt 6/6/2017
6 Gillian Bramer 10/25/2016
7 ...
Workroles (These are the positions in the company)
ID Workrole
1 Manager
2 Secretary
3 Admin Asst.
4 ...
LnkPersonnelToWorkroles (Links projected new hires to their projected workrole)
ID PersonnelID WorkroleID
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 ...
Courses (All courses available)
ID CourseName LengthInDays
1 Orientation 1
2 Email Etiquette 2
3 Workplace Safety 1
4 ...
LnkWorkrolesToCourses
(Links workroles to their required courses in a Many-to-Many relationship)
ID WorkroleID CourseID
1 1 1
2 2 1
3 2 2
4 2 3
5 3 49
6 ...
Thoughts: My approach is to first develop a person-by-person schedule based upon the new hire's target completion date and workrole. Then for each class, I could sum the number of new hires starting in that quarter.
I've considered trying to represent the courses in the most general way I could think of (i.e. using a directed acyclic graph), but since most of the courses have only a single prerequisite course, I think it's much easier to represent the prerequisites using the Prerequisites table below; however, I don't know how I would use this in a query.
Prerequisites (Is this a good idea?)
ID CourseID PrereqCourseID
1 2 1
2 3 1
3 4 1
4 5 4
5 ...
Note: I am not currently concerned with whether or not the courses are actually offered on those days; we will figure out the course schedules once we know approximately how many we need each quarter. Right now, we're trying to estimate the demand for each course.
Edit 1: To clarify the Desired Output table: if the person begins course 1 on day D, then they can't start course 2 until after they finish course 1, i.e. until the next day. For a course with a length of L > 1 days, the start date of the subsequent course is delayed by L days. Notice this effect playing out for workrole ID 2 in the Desired Output table: he is expected to arrive on 2/17, start and complete course 1 the same day, begin course 2 the next day (on 2/18), and finish course 2 the day after that (on 2/19).
I'm posting this answer because it gives me an approximate solution; other answers are still welcome.
I avoided a prerequisite table altogether and opted for a simpler approach: a partial ordering of the courses.
First, I drew the course prerequisite tree; it looked similar to this image:
I defined a partial ordering of the courses based on their depth in the prerequisite tree. In the picture above, CHM124 and High School Chem w/ Lab are priority 1, CHM152 is priority 2, CHM 153 is priority 3, CHM260 and CHM 270 are priority 4, and so on... This partial ordering was stored in the CoursePriority table:
CoursePriority:
ID CourseID Priority
1 1 1
2 2 2
3 3 3
4 4 3
5 5 4
6 6 3
7 ...
So that no two courses would ever be taken at the same time, I perturbed each course's priority by a small random number using the following Update query:
UPDATE CoursePriority SET CoursePriority.Priority = [Priority]+Rnd([ID])/1000;
(I used [ID] as input to the Rnd method to ensure each course was perturbed by a different random number.) I ended up with this:
ID CourseID Priority
1 1 1.000005623
2 2 2.000094955
3 3 3.000036401
4 4 3.000052486
5 5 4.000076711
6 6 3.00000535
7 ...
The approach above answers my first question "What is the best [sensible] way to represent a sequence of courses (i.e. prerequisites) in a relational DB?" Now as for generating the course schedule...
First, I created a query qryLnkCoursePriorities to link Courses to the CoursePriority table:
SELECT Courses.ID AS CourseID, Courses.DurationInDays, CoursePriority.Priority
FROM Courses INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID;
Result:
CourseID DurationInDays Priority
1 35 1.000076177
2 21 2.000148297
3 28 3.000094352
4 14 3.000081442
5...
Second, I created the qryWorkrolePriorityDelay query:
SELECT LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.CourseID AS CourseID, qryLnkCoursePriorities.Priority, qryLnkCoursePriorities.DurationInDays, ([DurationInDays]+Nz(DSum("DurationInDays","qryLnkCoursePriorities","[Priority]>" & [Priority] & ""))) AS LeadTimeInDays
FROM LnkWorkrolesToCourses INNER JOIN qryLnkCoursePriorities ON LnkWorkrolesToCourses.CourseID = qryLnkCoursePriorities.CourseID
ORDER BY LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.Priority;
Simply put: The qryWorkrolePriorityDelay query tells me how many days in advance each course should be taken to ensure the new hire can complete all subsequent courses prior to their required training completion deadline. It looks like this:
WorkroleID CourseID Priority DurationInDays LeadTimeInDays
1 7 1.000060646 7 147
1 1 1.000076177 35 140
1 2 2.000148297 21 105
1 4 3.000081442 14 84
1 6 3.000082824 14 70
1 3 3.000094352 28 56
1 5 4.000106905 28 28
2...
Finally, I was able to bring this all together to create the qryCourseSchedule query:
SELECT Personnel.ID AS PersonnelID, LnkWorkrolesToCourses.CourseID, [ProjectedHireDate]-[leadTimeInDays] AS ProjectedStartDate, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays] AS ProjectedEndDate
FROM Personnel INNER JOIN (((LnkWorkrolesToCourses INNER JOIN (Courses INNER JOIN qryWorkrolePriorityDelay ON Courses.ID = qryWorkrolePriorityDelay.CourseID) ON (Courses.ID = LnkWorkrolesToCourses.CourseID) AND (LnkWorkrolesToCourses.WorkroleID = qryWorkrolePriorityDelay.WorkroleID)) INNER JOIN LnkPersonnelToWorkroles ON LnkWorkrolesToCourses.WorkroleID = LnkPersonnelToWorkroles.WorkroleID) INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID) ON Personnel.ID = LnkPersonnelToWorkroles.PersonnelID
ORDER BY Personnel.ID, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays];
This query gives me the following output:
PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 7 5/7/2016 5/14/2016
1 1 5/14/2016 6/18/2016
1 2 6/18/2016 7/9/2016
1 4 7/9/2016 7/23/2016
1 6 7/23/2016 8/6/2016
1 3 8/6/2016 9/3/2016
1 5 9/3/2016 10/1/2016
2...
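The lead-time and date arithmetic behind these two tables can be reproduced outside Access. The sketch below is a Python illustration, not the author's queries: it takes the workrole 1 rows from the qryWorkrolePriorityDelay output, accumulates durations from the last course backwards, and subtracts each lead time from the deadline. It assumes the deadline is personnel 1's RequiredCompletionDate (10/1/2016) and that lead time is accumulated over the workrole's own course list. The `schedule` function name is made up for the example.

```python
from datetime import date, timedelta

# (course_id, priority, duration_in_days) for workrole 1,
# copied from the qryWorkrolePriorityDelay output
COURSES = [
    (7, 1.000060646, 7),
    (1, 1.000076177, 35),
    (2, 2.000148297, 21),
    (4, 3.000081442, 14),
    (6, 3.000082824, 14),
    (3, 3.000094352, 28),
    (5, 4.000106905, 28),
]


def schedule(courses, required_completion):
    """Return [(course_id, lead_time, start, end)] in the order courses are taken."""
    ordered = sorted(courses, key=lambda c: c[1])  # lowest priority value first
    rows = []
    lead = 0
    # Walk backwards from the last course: its lead time is its own duration,
    # and each earlier course adds its duration on top.
    for course_id, _priority, duration in reversed(ordered):
        lead += duration
        start = required_completion - timedelta(days=lead)
        rows.append((course_id, lead, start, start + timedelta(days=duration)))
    rows.reverse()
    return rows


if __name__ == "__main__":
    for row in schedule(COURSES, date(2016, 10, 1)):
        print(row)
```

Counting starts per quarter is then a matter of grouping the start dates by `(start.year, (start.month - 1) // 3 + 1)`.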
With this output, I created a pivot table, where course start dates were grouped by quarter and counted. This gave me exactly what I needed.

How to implement Relay Teams in a Track & Field Database

I have a track and field database with these tables (simplified):
Performance Table
Row Athlete Event Mark Meet
1 1 3 0:55 A
2 2 2 2:25 A
3 3 3 0:54 A
4 4 4 4:10 A
5 2 2 2:11 A
6 3 2 2:12 B
7 1 1 10 C
Athlete Table
Row Name Age Sex
1 Joe 13 M
2 Amy 15 F
3 John 16 M
4 Tim 17 M
So I understand how to implement this for an event with only 1 athlete (e.g. the 100 m dash), but how would I include a relay event with 4 athletes? For example, a 4x400 relay needs 4 athletes. In other words, some events have only 1 athlete and some have more than one. I am not sure which of these options I should use:
1) A linking table
2) Add 4 columns
3) A table like the one below
4) Something else
Option 3 Table
Performance Table (Event 5 is a relay)
Row Athlete Event Mark Meet
1 1 3 0:55 A
2 2 2 2:25 A
3 3 3 0:54 A
4 4 4 4:10 A
5 2 2 2:11 A
6 3 2 2:12 B
7 1 5 9:34 C
8 2 5 9:34 C
9 3 5 9:34 C
10 4 5 9:34 C
Are you going to have events in the system before they are finished? For example, today's meet will include a 4x400 and here are the runners...
If that's the case then you'll need the linking table that you referred to because you want to be able to have that data stand on its own. It would just have the event_id and athlete_id in it so that you could have that set up. That would also be the PK (Primary Key) for the table and you would then use those two columns as the FK (Foreign Key) to the Performance table that you have at the end. If the data will never exist without times then you could just skip the link table and have the Performance table, although having the link table still wouldn't hurt in that case.
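A minimal sketch of that linking-table layout, using SQLite from Python. The table name `entry`, the column names and the `athletes_in_event` helper are assumptions made for illustration; the answer only specifies that the link table holds event_id and athlete_id, that the pair is its primary key, and that Performance references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
-- Link table: who is entered in which event; can exist before any marks do
CREATE TABLE entry (
    event_id   INTEGER NOT NULL,
    athlete_id INTEGER NOT NULL,
    PRIMARY KEY (event_id, athlete_id)
);
-- One performance row per athlete per result, as in the Option 3 table
CREATE TABLE performance (
    row_id     INTEGER PRIMARY KEY,
    athlete_id INTEGER NOT NULL,
    event_id   INTEGER NOT NULL,
    mark       TEXT,
    meet       TEXT,
    FOREIGN KEY (event_id, athlete_id) REFERENCES entry (event_id, athlete_id)
);
""")

# Enter the four relay runners for event 5 before the race is run
relay_team = [(5, athlete_id) for athlete_id in (1, 2, 3, 4)]
conn.executemany("INSERT INTO entry (event_id, athlete_id) VALUES (?, ?)", relay_team)

# After the race, each runner gets the shared relay mark
conn.executemany(
    "INSERT INTO performance (athlete_id, event_id, mark, meet) VALUES (?, ?, '9:34', 'C')",
    [(athlete_id, event_id) for event_id, athlete_id in relay_team],
)


def athletes_in_event(event_id):
    """Return the athlete ids entered in an event (hypothetical helper)."""
    rows = conn.execute(
        "SELECT athlete_id FROM entry WHERE event_id = ? ORDER BY athlete_id",
        (event_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(athletes_in_event(5))
```

With the foreign key in place, a performance row for an athlete who was never entered in the event is rejected, which is the integrity benefit the link table buys.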