I have limited experience with SQL, and I am trying to build a query that 'automatically' uses the xyzDesc value in place of the xyzID references in the query result.
I've included a sample of what I am looking for. Keep in mind that the recordset I am trying to produce has ~35 columns (it initially requires outer joining 3 very large tables), and ~10 of those columns need to be cross-referenced this way, as I hope the example demonstrates. Additionally, the underlying tables in the database I am querying contain millions of rows.
Project table:
projectID projectDesc capitalSpend regionID statusID
-------------------------------------------------------------
1 Project A 200 1 7
2 Project B 300 1 2
3 Project C 200 1 5
4 Project D 100 2 4
5 Project E 300 2 3
6 Project F 500 3 1
7 Project G 400 3 1
StatusXref table:
statusID statusDesc
------------------------
1 Proposed
2 Prelim
3 Scheduled
4 Execute
5 Completed
6 On Hold
7 Decline
RegionXref table:
regionID regionDesc
------------------------
1 New York
2 Houston
3 Los Angeles
4 Chicago
5 Denver
6 Dallas
7 Boston
Expected results when executing query:
projectID projectDesc capitalSpend Region Status
---------------------------------------------------------------
1 Project A 200 New York Decline
2 Project B 300 New York Prelim
3 Project C 200 New York Completed
4 Project D 100 Houston Execute
5 Project E 300 Houston Scheduled
6 Project F 500 Los Angeles Proposed
7 Project G 400 Los Angeles Proposed
This seems like it should be 'easy', since it would be a simple VLOOKUP in Excel, but I'm reluctant to pull all the data into Excel and do the lookups there, because Excel's row limit can't hold the full dataset.
Try something like this:
select p.projectID, p.projectDesc, p.capitalSpend, r.regionDesc, s.statusDesc
from Project p
inner join StatusXref s on p.statusID = s.statusID
inner join RegionXref r on p.regionID = r.regionID
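To check the query above against the sample data, here is a minimal sketch that loads the three sample tables into an in-memory SQLite database (an assumption standing in for the real server) and runs the same joins, with an ORDER BY added so the row order is deterministic.

```python
import sqlite3

# In-memory database holding only the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (projectID INTEGER, projectDesc TEXT, capitalSpend INTEGER,
                      regionID INTEGER, statusID INTEGER);
CREATE TABLE StatusXref (statusID INTEGER, statusDesc TEXT);
CREATE TABLE RegionXref (regionID INTEGER, regionDesc TEXT);
INSERT INTO Project VALUES
  (1,'Project A',200,1,7),(2,'Project B',300,1,2),(3,'Project C',200,1,5),
  (4,'Project D',100,2,4),(5,'Project E',300,2,3),(6,'Project F',500,3,1),
  (7,'Project G',400,3,1);
INSERT INTO StatusXref VALUES
  (1,'Proposed'),(2,'Prelim'),(3,'Scheduled'),(4,'Execute'),
  (5,'Completed'),(6,'On Hold'),(7,'Decline');
INSERT INTO RegionXref VALUES
  (1,'New York'),(2,'Houston'),(3,'Los Angeles'),(4,'Chicago'),
  (5,'Denver'),(6,'Dallas'),(7,'Boston');
""")

# Each xref table is joined once; its Desc column replaces the ID in the output.
rows = conn.execute("""
select p.projectID, p.projectDesc, p.capitalSpend, r.regionDesc, s.statusDesc
from Project p
inner join StatusXref s on p.statusID = s.statusID
inner join RegionXref r on p.regionID = r.regionID
order by p.projectID
""").fetchall()

for row in rows:
    print(row)
```

With ~10 ID columns to translate, the pattern simply repeats: one join per xref table, each with its own alias.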
A JOIN is exactly what you are looking for. There are several types of joins, but the most common one is an inner join.
Example query:
SELECT P.projectID, P.projectDesc, P.capitalSpend, R.regionDesc, S.statusDesc
FROM Project P
JOIN StatusXref S ON P.statusID = S.statusID
JOIN RegionXref R ON P.regionID = R.regionID;
This SQLFiddle lets you run queries against your small sample dataset. Be sure to select the correct SQL version in the top-left corner.
http://sqlfiddle.com/#!3/0ebc89
P.S. When querying your large dataset, you may find the LIMIT clause (SELECT TOP n in SQL Server) useful for checking that your query works properly without running it across millions of rows.
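Because the question mentions outer joins, it may matter that an inner join silently drops any project whose lookup ID has no match (for example a NULL statusID). The sketch below, again on in-memory SQLite with a hypothetical "Project H" row that has no status, compares JOIN with LEFT JOIN and caps the result with LIMIT as the P.S. suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (projectID INTEGER, projectDesc TEXT, regionID INTEGER, statusID INTEGER);
CREATE TABLE StatusXref (statusID INTEGER, statusDesc TEXT);
CREATE TABLE RegionXref (regionID INTEGER, regionDesc TEXT);
INSERT INTO Project VALUES (1,'Project A',1,7),(2,'Project B',1,2),
                           (8,'Project H',1,NULL);  -- hypothetical row, no status
INSERT INTO StatusXref VALUES (2,'Prelim'),(7,'Decline');
INSERT INTO RegionXref VALUES (1,'New York');
""")

inner_sql = """
SELECT P.projectID, R.regionDesc, S.statusDesc
FROM Project P
JOIN StatusXref S ON P.statusID = S.statusID
JOIN RegionXref R ON P.regionID = R.regionID
ORDER BY P.projectID
"""
# LEFT JOIN keeps every Project row; unmatched descriptions come back as NULL.
left_sql = """
SELECT P.projectID, R.regionDesc, S.statusDesc
FROM Project P
LEFT JOIN StatusXref S ON P.statusID = S.statusID
LEFT JOIN RegionXref R ON P.regionID = R.regionID
ORDER BY P.projectID
LIMIT 100  -- SQL Server would use SELECT TOP 100 instead
"""
inner_rows = conn.execute(inner_sql).fetchall()
left_rows = conn.execute(left_sql).fetchall()
print(inner_rows)
print(left_rows)
```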
I'm studying SQL and somehow I'm stuck with a question. I have 2 tables ('users' and 'follows').
Follows Table
user_id  follows  date
1        2        1993-09-01
2        1        1989-01-01
3        1        1993-07-01
2        3        1994-10-10
3        2        1995-03-01
4        2        1988-08-08
4        1        1988-08-08
1        4        1994-04-02
1        5        2000-01-01
5        1        2000-01-02
5        6        1986-01-10
7        1        1990-02-02
1        7        1996-10-01
1        8        1993-09-03
8        1        1995-09-01
8        9        1995-09-01
9        8        1996-01-10
7        8        1993-09-01
3        9        1996-05-30
4        9        1996-05-30
Users Table
user_id  first_name  last_name  school
1        Harry       Potter     Gryffindor
2        Ron         Weasley    Gryffindor
3        Hermione    Granger    Gryffindor
4        Ginny       Weasley    Gryffindor
5        Draco       Malfoy     Slytherin
6        Tom         Riddle     Slytherin
7        Luna        Lovegood   Ravenclaw
8        Cho         Chang      Ravenclaw
9        Cedric      Diggory    Hufflepuff
I need to list all rows from follows where someone from one house follows someone from a different house. I tried to write 2 queries, one to get the houses for follows.user_id and another to get the houses for follows.follows, and then "merge" them:
select a.nome_id, a.user_id_house, b.follows_id, b.follows_house
from ( select follows.user_id as nome_id
, users.house as user_id_house
from follows inner join users
ON users.user_id = follows.user_id
) as a,
( select follows.follows as follows_id
, users.house as follows_house
from follows inner join users
ON follows.user_id = users.user_id
) as b
where a.user_id_house <> b.follows_house;
The problem is that the result has about 400 rows, which is not right. I have no idea how to solve this.
Try this
SELECT follows.user_id, users.school, followers.user_id, followers.school FROM follows
JOIN users ON follows.user_id=users.user_id
JOIN users as followers ON follows.follows=followers.user_id
WHERE users.school <> followers.school
Note: pay attention to the table aliases in my answer.
Thanks to Thorsten Kettner for the correction.
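The ~400 rows in the question come from the comma between the two derived tables: that is a cross join, which pairs every follows row with every follows row (20 × 20 = 400 combinations) before the WHERE filter runs. The sketch below, on in-memory SQLite with only the user_id and school columns, shows that count and then runs the answer's query, which instead joins users twice, once for each side of the follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, school TEXT);
CREATE TABLE follows (user_id INTEGER, follows INTEGER, date TEXT);
INSERT INTO users VALUES
  (1,'Gryffindor'),(2,'Gryffindor'),(3,'Gryffindor'),(4,'Gryffindor'),
  (5,'Slytherin'),(6,'Slytherin'),(7,'Ravenclaw'),(8,'Ravenclaw'),
  (9,'Hufflepuff');
INSERT INTO follows VALUES
  (1,2,'1993-09-01'),(2,1,'1989-01-01'),(3,1,'1993-07-01'),(2,3,'1994-10-10'),
  (3,2,'1995-03-01'),(4,2,'1988-08-08'),(4,1,'1988-08-08'),(1,4,'1994-04-02'),
  (1,5,'2000-01-01'),(5,1,'2000-01-02'),(5,6,'1986-01-10'),(7,1,'1990-02-02'),
  (1,7,'1996-10-01'),(1,8,'1993-09-03'),(8,1,'1995-09-01'),(8,9,'1995-09-01'),
  (9,8,'1996-01-10'),(7,8,'1993-09-01'),(3,9,'1996-05-30'),(4,9,'1996-05-30');
""")

# Comma-separated sources form a cross join: every row paired with every row.
pairs = conn.execute("SELECT COUNT(*) FROM follows a, follows b").fetchone()[0]

# The answer's query: users joined once for the follower, once for the followed.
cross_house = conn.execute("""
SELECT follows.user_id, users.school, followers.user_id, followers.school
FROM follows
JOIN users ON follows.user_id = users.user_id
JOIN users AS followers ON follows.follows = followers.user_id
WHERE users.school <> followers.school
ORDER BY follows.user_id, followers.user_id
""").fetchall()

print(pairs)             # 400
print(len(cross_house))  # 10
```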
I have the following SQL tables and I'm basically trying to pull a table of every game that Ralph played in for 2018, and the amount of points scored.
Ralph has a unique_id, but may play on multiple teams, or in different positions. Each year that he plays has a new record entered into the player info table for each of those teams and/or positions.
The player_id in the Game Data table may reference either of Ralph's Player Info records; for instance, Game Data records 1 and 2 are both for Ralph, and his actual total points scored is 18 (12 + 6). I don't need those points added together, as that is easier to do in PHP, but I do need both records pulled.
------------------------------
Player Info as pi
------------------------------
id | unique_id | year | name | team | pos
1 5000 2018 Ralph 5 F
2 5000 2018 Ralph 5 C
3 5600 2018 Bill 5 G
4 5000 2017 Ralph 4 F
5 2688 2016 Mike 6 G
------------------------------
Game Info as gi
------------------------------
id | team 1 | team 2
1 5 6
2 6 5
3 8 3
4 6 2
------------------------------
Game Data as gd
------------------------------
id | game_info_id | player_id | Points
1 1 1 12
2 1 2 6
3 2 1 4
4 4 5 6
The result should show pi.id, pi.unique_id, gi.id and gd.*, WHERE gd.player_id is any of Ralph's pi.id values AND pi.year = 2018.
Any help here is appreciated, this seems a bit out of my wheelhouse.
Join the tables like this:
select
pi.id, pi.unique_id, gi.id, gd.*
from playerinfo pi
inner join gameinfo gi on pi.team in (gi.team1, gi.team2)
inner join gamedata gd on gd.game_info_id = gi.id and gd.player_id = pi.id
where pi.name = 'Ralph' and pi.year = 2018
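Here is a minimal sketch that runs the answer's query unchanged on in-memory SQLite. The table names playerinfo, gameinfo and gamedata and the columns team1/team2 follow the answer rather than the question's "team 1"/"team 2" headings; an ORDER BY is added for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playerinfo (id INTEGER, unique_id INTEGER, year INTEGER,
                         name TEXT, team INTEGER, pos TEXT);
CREATE TABLE gameinfo (id INTEGER, team1 INTEGER, team2 INTEGER);
CREATE TABLE gamedata (id INTEGER, game_info_id INTEGER, player_id INTEGER,
                       points INTEGER);
INSERT INTO playerinfo VALUES
  (1,5000,2018,'Ralph',5,'F'),(2,5000,2018,'Ralph',5,'C'),
  (3,5600,2018,'Bill',5,'G'),(4,5000,2017,'Ralph',4,'F'),
  (5,2688,2016,'Mike',6,'G');
INSERT INTO gameinfo VALUES (1,5,6),(2,6,5),(3,8,3),(4,6,2);
INSERT INTO gamedata VALUES (1,1,1,12),(2,1,2,6),(3,2,1,4),(4,4,5,6);
""")

# Both of Ralph's 2018 playerinfo rows match, so each one pulls its own
# gamedata rows; the 2017 row (id 4) is filtered out by year.
rows = conn.execute("""
select pi.id, pi.unique_id, gi.id, gd.*
from playerinfo pi
inner join gameinfo gi on pi.team in (gi.team1, gi.team2)
inner join gamedata gd on gd.game_info_id = gi.id and gd.player_id = pi.id
where pi.name = 'Ralph' and pi.year = 2018
order by gd.id
""").fetchall()

for row in rows:
    print(row)
```

Game 1 appears twice (once per player-info record, 12 and 6 points), which is the shape the question wants to sum later in PHP.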
My company sends folks to training. Based on projected new hires/transfers, I was asked to generate a report that estimates the number of seats we need in each course broken out by quarter.
Question: My question is two-fold:
What is the best way to represent a sequence of courses (i.e. prerequisites) in a relational DB?
How do I create the query(-ies) necessary to produce the following desired output:
Desired Output:
ID PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 1 1 1/14/2017 1/14/2017
2 2 1 2/17/2017 2/17/2017
3 2 2 2/18/2017 2/19/2017
4 2 3 2/20/2017 2/20/2017
5 3 49 1/18/2017 2/03/2017
6 …
Background Info: The courses are taken in-sequence: the first few courses are orientation courses for the company, and later courses are more specific to the employee's workrole. There are over 50 different courses, 40 different workroles and we're projecting ~1k new hires/transfers. Each work role must take a sequence of courses in a prescribed order, but I'm having trouble representing this ordering and subsequently writing the necessary query.
Existing Tables:
I have several tables that I've used to store the data: Personnel, LnkPersonnelToWorkroles, Workroles, LnkWorkrolesToCourses, and Courses (there are many others as well, but I omit them to keep this question narrowly scoped). Here's some notional data from these tables:
Personnel (These are the projected new hires and their estimated arrival date.)
ID DisplayName RequiredCompletionDate
1 Kristel Bump 10/1/2016
2 Shelton Franke 3/11/2017
3 Shaunda Launer 4/16/2017
4 Clarinda Kestler 3/13/2017
5 My Wimsatt 6/6/2017
6 Gillian Bramer 10/25/2016
7 ...
Workroles (These are the positions in the company)
ID Workrole
1 Manager
2 Secretary
3 Admin Asst.
4 ...
LnkPersonnelToWorkroles (Links projected new hires to their projected workrole)
ID PersonnelID WorkroleID
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 ...
Courses (All courses available)
ID CourseName LengthInDays
1 Orientation 1
2 Email Etiquette 2
3 Workplace Safety 1
4 ...
LnkWorkrolesToCourses
(Links workroles to their required courses in a Many-to-Many relationship)
ID WorkroleID CourseID
1 1 1
2 2 1
3 2 2
4 2 3
5 3 49
6 ...
Thoughts: My approach is to first develop a person-by-person schedule based upon the new hire's target completion date and workrole. Then for each class, I could sum the number of new hires starting in that quarter.
I've considered trying to represent the courses in the most general way I could think of (i.e. using a directed acyclic graph), but since most of the courses have only a single prerequisite course, I think it's much easier to represent the prerequisites using the Prerequisites table below; however, I don't know how I would use this in a query.
Prerequisites (Is this a good idea?)
ID CourseID PrereqCourseID
1 2 1
2 3 1
3 4 1
4 5 4
5 ...
Note: I am not currently concerned with whether or not the courses are actually offered on those days; we will figure out the course schedules once we know approximately how many we need each quarter. Right now, we're trying to estimate the demand for each course.
Edit 1: To clarify the Desired Output table: if the person begins course 1 on day D, they can't start course 2 until after they finish course 1, i.e. until the next day. For a course with a length L > 1 days, the start of the subsequent course is delayed by L days. Notice this effect playing out for workrole ID 2 in the Desired Output table: he is expected to arrive on 2/17, start and complete course 1 the same day, begin course 2 the next day (2/18), and finish course 2 the day after that (2/19).
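The rule in Edit 1 can be checked with a short forward-scheduling sketch: a course of length L that starts on day D ends on D + L - 1, and the next course starts the day after. The function name forward_schedule is hypothetical; the course lengths (1, 2, 1) are taken from the Courses table above.

```python
from datetime import date, timedelta

def forward_schedule(arrival, courses):
    """courses: list of (course_id, length_in_days) in prerequisite order.
    Returns (course_id, start, end) with inclusive end dates."""
    schedule = []
    start = arrival
    for course_id, length in courses:
        end = start + timedelta(days=length - 1)  # a 1-day course ends the day it starts
        schedule.append((course_id, start, end))
        start = end + timedelta(days=1)           # next course begins the following day
    return schedule

# Person 2 from the Desired Output: arrives 2/17/2017, takes courses 1, 2, 3.
plan = forward_schedule(date(2017, 2, 17), [(1, 1), (2, 2), (3, 1)])
for course_id, start, end in plan:
    print(course_id, start, end)
```

This reproduces rows 2 to 4 of the Desired Output table (2/17, 2/18 to 2/19, 2/20).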
I'm posting this answer because it gives me an approximate solution; other answers are still welcome.
I avoided a prerequisite table altogether and opted for a simpler approach: a partial ordering of the courses.
First, I drew the course prerequisite tree; it looked similar to this image (not reproduced here):
I defined a partial ordering of the courses based on their depth in the prerequisite tree. In that picture, CHM124 and High School Chem w/ Lab are priority 1, CHM152 is priority 2, CHM153 is priority 3, CHM260 and CHM270 are priority 4, and so on. This partial ordering was stored in the CoursePriority table:
CoursePriority:
ID CourseID Priority
1 1 1
2 2 2
3 3 3
4 4 3
5 5 4
6 6 3
7 ...
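If the Prerequisites table from the question is kept, the depth-based priorities could also be derived in SQL instead of by hand, using a recursive common table expression. This is a swapped-in technique, not the approach used here, and it needs an engine with recursive CTE support (SQLite, PostgreSQL and SQL Server have it; Access SQL has no WITH clause). The sketch uses the question's notional Prerequisites rows and an ID-only Courses table as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (ID INTEGER PRIMARY KEY);
CREATE TABLE Prerequisites (ID INTEGER, CourseID INTEGER, PrereqCourseID INTEGER);
INSERT INTO Courses (ID) VALUES (1),(2),(3),(4),(5);
INSERT INTO Prerequisites VALUES (1,2,1),(2,3,1),(3,4,1),(4,5,4);
""")

depths = conn.execute("""
WITH RECURSIVE depth(course_id, lvl) AS (
  -- Anchor: courses with no prerequisite sit at depth 1.
  SELECT ID, 1 FROM Courses
  WHERE ID NOT IN (SELECT CourseID FROM Prerequisites)
  UNION ALL
  -- Step: a course is one level deeper than each of its prerequisites.
  SELECT p.CourseID, d.lvl + 1
  FROM Prerequisites p
  JOIN depth d ON p.PrereqCourseID = d.course_id
)
-- With several prerequisites, keep the longest chain so the course
-- lands after all of them.
SELECT course_id, MAX(lvl) AS priority
FROM depth
GROUP BY course_id
ORDER BY course_id
""").fetchall()
print(depths)  # [(1, 1), (2, 2), (3, 2), (4, 2), (5, 3)]
```

The recursion terminates because the prerequisite graph is acyclic; a cycle in the data would make it loop.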
So that no two courses would ever be taken at the same time, I perturbed each course's priority by a small random number using the following Update query:
UPDATE CoursePriority SET CoursePriority.Priority = [Priority]+Rnd([ID])/1000;
(I used [ID] as input to the Rnd method to ensure each course was perturbed by a different random number.) I ended up with this:
ID CourseID Priority
1 1 1.000005623
2 2 2.000094955
3 3 3.000036401
4 4 3.000052486
5 5 4.000076711
6 6 3.00000535
7 ...
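The reason dividing by 1000 is safe: Access's Rnd returns a value in [0, 1), so each added amount is below 0.001 and can only break ties within a priority level, never move a course into the next level. The sketch below uses Python's random.random() (also [0, 1)) as a stand-in for Rnd, seeded only so it is repeatable, with the notional CoursePriority values.

```python
import random

# Depth-based priorities from the CoursePriority table (CourseID -> Priority).
priorities = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 3}

rng = random.Random(42)  # seeded only to make the sketch repeatable
# Each bump is < 0.001, so integer levels stay intact and ties become a total order.
perturbed = {cid: p + rng.random() / 1000 for cid, p in priorities.items()}

order = sorted(perturbed, key=perturbed.get)
print(order)
```

Courses 3, 4 and 6 share level 3, so their relative order depends on the random draw; everything else is fixed by depth.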
The approach above answers my first question "What is the best [sensible] way to represent a sequence of courses (i.e. prerequisites) in a relational DB?" Now as for generating the course schedule...
First, I created a query qryLnkCoursePriorities to link Courses to the CoursePriority table:
SELECT Courses.ID AS CourseID, Courses.DurationInDays, CoursePriority.Priority
FROM Courses INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID;
Result:
CourseID DurationInDays Priority
1 35 1.000076177
2 21 2.000148297
3 28 3.000094352
4 14 3.000081442
5...
Second, I created the qryWorkrolePriorityDelay query:
SELECT LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.CourseID AS CourseID, qryLnkCoursePriorities.Priority, qryLnkCoursePriorities.DurationInDays, ([DurationInDays]+Nz(DSum("DurationInDays","qryLnkCoursePriorities","[Priority]>" & [Priority] & ""))) AS LeadTimeInDays
FROM LnkWorkrolesToCourses INNER JOIN qryLnkCoursePriorities ON LnkWorkrolesToCourses.CourseID = qryLnkCoursePriorities.CourseID
ORDER BY LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.Priority;
Simply put: The qryWorkrolePriorityDelay query tells me how many days in advance each course should be taken to ensure the new hire can complete all subsequent courses prior to their required training completion deadline. It looks like this:
WorkroleID CourseID Priority DurationInDays LeadTimeInDays
1 7 1.000060646 7 147
1 1 1.000076177 35 140
1 2 2.000148297 21 105
1 4 3.000081442 14 84
1 6 3.000082824 14 70
1 3 3.000094352 28 56
1 5 4.000106905 28 28
2...
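One caveat about qryWorkrolePriorityDelay: its DSum sums DurationInDays over every course in qryLnkCoursePriorities with a higher priority, with no filter on WorkroleID, so a workrole that skips some courses gets lead times that are too long. A correlated subquery limited to the same workrole avoids this. The sketch below runs on in-memory SQLite; the table CoursePriorities stands in for qryLnkCoursePriorities with the values shown above, and workrole 2 (courses 1 and 5 only) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CoursePriorities (CourseID INTEGER, DurationInDays INTEGER, Priority REAL);
CREATE TABLE LnkWorkrolesToCourses (WorkroleID INTEGER, CourseID INTEGER);
INSERT INTO CoursePriorities VALUES
  (7,7,1.000060646),(1,35,1.000076177),(2,21,2.000148297),(4,14,3.000081442),
  (6,14,3.000082824),(3,28,3.000094352),(5,28,4.000106905);
-- Workrole 1 takes all seven courses; workrole 2 (hypothetical) takes two.
INSERT INTO LnkWorkrolesToCourses VALUES
  (1,7),(1,1),(1,2),(1,4),(1,6),(1,3),(1,5),(2,1),(2,5);
""")

lead_times = conn.execute("""
SELECT w.WorkroleID, c.CourseID,
       c.DurationInDays + COALESCE((
         -- Only later courses that belong to the same workrole.
         SELECT SUM(c2.DurationInDays)
         FROM LnkWorkrolesToCourses w2
         JOIN CoursePriorities c2 ON c2.CourseID = w2.CourseID
         WHERE w2.WorkroleID = w.WorkroleID AND c2.Priority > c.Priority
       ), 0) AS LeadTimeInDays
FROM LnkWorkrolesToCourses w
JOIN CoursePriorities c ON c.CourseID = w.CourseID
ORDER BY w.WorkroleID, c.Priority
""").fetchall()

for row in lead_times:
    print(row)
```

Workrole 1 reproduces the LeadTimeInDays column above (147 down to 28). For the hypothetical workrole 2, course 1 gets 63 days (35 + 28), whereas the all-course DSum would give it 140.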
Finally, I was able to bring this all together to create the qryCourseSchedule query:
SELECT Personnel.ID AS PersonnelID, LnkWorkrolesToCourses.CourseID, [ProjectedHireDate]-[leadTimeInDays] AS ProjectedStartDate, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays] AS ProjectedEndDate
FROM Personnel INNER JOIN (((LnkWorkrolesToCourses INNER JOIN (Courses INNER JOIN qryWorkrolePriorityDelay ON Courses.ID = qryWorkrolePriorityDelay.CourseID) ON (Courses.ID = LnkWorkrolesToCourses.CourseID) AND (LnkWorkrolesToCourses.WorkroleID = qryWorkrolePriorityDelay.WorkroleID)) INNER JOIN LnkPersonnelToWorkroles ON LnkWorkrolesToCourses.WorkroleID = LnkPersonnelToWorkroles.WorkroleID) INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID) ON Personnel.ID = LnkPersonnelToWorkroles.PersonnelID
ORDER BY Personnel.ID, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays];
This query gives me the following output:
PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 7 5/7/2016 5/14/2016
1 1 5/14/2016 6/18/2016
1 2 6/18/2016 7/9/2016
1 4 7/9/2016 7/23/2016
1 6 7/23/2016 8/6/2016
1 3 8/6/2016 9/3/2016
1 5 9/3/2016 10/1/2016
2...
With this output, I created a pivot table, where course start dates were grouped by quarter and counted. This gave me exactly what I needed.
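The backward scheduling and the quarterly pivot can be sketched outside the database as well. This assumes the same conventions as qryCourseSchedule (start = deadline minus LeadTimeInDays, end = start + DurationInDays) and counts course starts per (course, year, quarter) in place of the pivot table; the function names are hypothetical. The deadline 10/1/2016 is the date Personnel 1's schedule above works back from.

```python
from collections import Counter
from datetime import date, timedelta

def backward_schedule(deadline, courses):
    """courses: (course_id, duration_in_days) sorted by ascending priority.
    Lead time = own duration + durations of every later course."""
    schedule = []
    lead = sum(duration for _, duration in courses)
    for course_id, duration in courses:
        start = deadline - timedelta(days=lead)
        schedule.append((course_id, start, start + timedelta(days=duration)))
        lead -= duration  # the next course needs less lead time
    return schedule

def seats_per_quarter(schedules):
    """Count course starts per (course_id, year, quarter), like the pivot table."""
    counts = Counter()
    for schedule in schedules:
        for course_id, start, _ in schedule:
            counts[(course_id, start.year, (start.month - 1) // 3 + 1)] += 1
    return counts

plan = backward_schedule(date(2016, 10, 1),
                         [(7, 7), (1, 35), (2, 21), (4, 14), (6, 14), (3, 28), (5, 28)])
for course_id, start, end in plan:
    print(course_id, start, end)
print(seats_per_quarter([plan]))
```

With ~1k people, passing every person's schedule to seats_per_quarter gives the per-course demand by quarter.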
Given the following SQL tables:
Administrators:
id Name rating
1 Jeff 48
2 Albert 55
3 Ken 35
4 France 56
5 Samantha 52
6 Jeff 50
Meetings:
id originatorid Assistantid
1 3 5
2 6 3
3 1 2
4 6 4
I would like to generate a table from Ken's point of view (id = 3), so his id can appear in either of two different columns of the Meetings table. (The IN statement does not work, since two different columns are involved.)
Thus the output would be:
id originatorid Assistantid
1 3 5
2 6 3
If you really just need the meetings where Ken's id appears in either column, you only need an OR. The following produces your example output exactly.
SELECT * FROM Meetings WHERE originatorid = 3 OR Assistantid = 3;
If you need to take the complex route and list names along with meetings, an OR in your join's ON clause should work here:
SELECT
Administrators.name,
Administrators.id,
Meetings.originatorid,
Meetings.Assistantid
FROM Administrators
JOIN Meetings
ON Administrators.id = Meetings.originatorid
OR Administrators.id = Meetings.Assistantid
Where Administrators.name = 'Ken'
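Both queries can be checked on the sample data with a short in-memory SQLite sketch (an assumption standing in for the real database); the column is spelled Assistantid as in the answer, and ORDER BY is added for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Administrators (id INTEGER, name TEXT, rating INTEGER);
CREATE TABLE Meetings (id INTEGER, originatorid INTEGER, Assistantid INTEGER);
INSERT INTO Administrators VALUES (1,'Jeff',48),(2,'Albert',55),(3,'Ken',35),
  (4,'France',56),(5,'Samantha',52),(6,'Jeff',50);
INSERT INTO Meetings VALUES (1,3,5),(2,6,3),(3,1,2),(4,6,4);
""")

# Simple version: match Ken's id in either column.
meetings = conn.execute(
    "SELECT * FROM Meetings WHERE originatorid = 3 OR Assistantid = 3 ORDER BY id"
).fetchall()

# Join version: the OR in the ON clause lets one Administrators row match
# a meeting through either column.
named = conn.execute("""
SELECT Administrators.name, Administrators.id,
       Meetings.originatorid, Meetings.Assistantid
FROM Administrators
JOIN Meetings
  ON Administrators.id = Meetings.originatorid
  OR Administrators.id = Meetings.Assistantid
WHERE Administrators.name = 'Ken'
ORDER BY Meetings.id
""").fetchall()

print(meetings)  # [(1, 3, 5), (2, 6, 3)]
print(named)     # [('Ken', 3, 3, 5), ('Ken', 3, 6, 3)]
```

Filtering on the name works here because only one administrator is called Ken; with duplicate names (there are two Jeffs) filtering on id is safer.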
I have the following 3 tables:
1) Sweetness Table
FruitIndex CountryIndex Sweetness
1 1 10
1 2 20
1 3 400
2 1 50
2 2 123
2 3 1
3 1 49
3 2 40
3 3 2
2) Fruit Name Table
FruitIndex FruitName
1 Apple
2 Orange
3 Peaches
3) Country Name Table
CountryIndex CountryName
1 UnitedStates
2 Canada
3 Mexico
I'm trying to perform a CrossTab SQL query to end up with:
Fruit\Country UnitedStates Canada Mexico
Apple 10 20 400
Orange 50 123 1
Peaches 49 40 2
The challenging part is to label the rows/columns with the relevant names from the Name tables.
I can use MS Access to design 2 queries,
join the fruit and country name tables to the Sweetness table
perform the crosstab query
However I'm having trouble doing this in a single query. I've attempted nesting the 1st query's SQL into the 2nd, but it doesn't seem to work.
Unfortunately, my solution needs to be wholly SQL, as it is an embedded SQL query (I cannot rely on the query designer in MS Access, etc.).
Any help greatly appreciated.
Prembo.
How about:
TRANSFORM First(Sweetness.Sweetness) AS FirstOfSweetness
SELECT Fruit.FruitName
FROM (Sweetness
INNER JOIN Fruit
ON Sweetness.FruitIndex = Fruit.FruitIndex)
INNER JOIN Country
ON Sweetness.CountryIndex = Country.CountryIndex
GROUP BY Fruit.FruitName
PIVOT Country.CountryName IN ('UnitedStates', 'Canada', 'Mexico');
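TRANSFORM ... PIVOT is specific to Access. On engines without it, a common substitute (a swapped-in technique, not the answer's) is conditional aggregation: one MAX(CASE ...) column per country, which also fixes the column order but means listing the countries by hand. A minimal sketch on in-memory SQLite, with table names Sweetness, Fruit and Country taken from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sweetness (FruitIndex INTEGER, CountryIndex INTEGER, Sweetness INTEGER);
CREATE TABLE Fruit (FruitIndex INTEGER, FruitName TEXT);
CREATE TABLE Country (CountryIndex INTEGER, CountryName TEXT);
INSERT INTO Sweetness VALUES (1,1,10),(1,2,20),(1,3,400),(2,1,50),(2,2,123),
                             (2,3,1),(3,1,49),(3,2,40),(3,3,2);
INSERT INTO Fruit VALUES (1,'Apple'),(2,'Orange'),(3,'Peaches');
INSERT INTO Country VALUES (1,'UnitedStates'),(2,'Canada'),(3,'Mexico');
""")

# Each CASE keeps the value only for its country; MAX collapses the group to
# that single value (each fruit/country cell has exactly one row).
grid = conn.execute("""
SELECT f.FruitName,
       MAX(CASE WHEN c.CountryName = 'UnitedStates' THEN s.Sweetness END) AS UnitedStates,
       MAX(CASE WHEN c.CountryName = 'Canada'       THEN s.Sweetness END) AS Canada,
       MAX(CASE WHEN c.CountryName = 'Mexico'       THEN s.Sweetness END) AS Mexico
FROM Sweetness s
JOIN Fruit f ON s.FruitIndex = f.FruitIndex
JOIN Country c ON s.CountryIndex = c.CountryIndex
GROUP BY f.FruitName
ORDER BY f.FruitName
""").fetchall()

for row in grid:
    print(row)
```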
I hate to rely on an outside post and present it as my answer, but this is a pretty steep topic and I can't do it justice. So I suggest you look at this article.