I have the following 3 tables:
1) Sweetness Table
FruitIndex CountryIndex Sweetness
1 1 10
1 2 20
1 3 400
2 1 50
2 2 123
2 3 1
3 1 49
3 2 40
3 3 2
2) Fruit Name Table
FruitIndex FruitName
1 Apple
2 Orange
3 Peaches
3) Country Name Table
CountryIndex CountryName
1 UnitedStates
2 Canada
3 Mexico
I'm trying to perform a CrossTab SQL query to end up with:
Fruit\Country UnitedStates Canada Mexico
Apple 10 20 400
Orange 50 123 1
Peaches 49 40 2
The challenging part is to label the rows/columns with the relevant names from the Name tables.
I can use MS Access to design 2 queries,
create the joins the fruit/country names table with the Sweetness table
perform crosstab query
However I'm having trouble doing this in a single query. I've attempted nesting the 1st query's SQL into the 2nd, but it doesn't seem to work.
Unfortunately, my solution needs to be be wholly SQL, as it is an embedded SQL query (cannot rely on query designer in MS Access, etc.).
Any help greatly appreciated.
Prembo.
How about:
TRANSFORM First(Sweetness.Sweetness) AS FirstOfSweetness
SELECT Fruit.FruitName
FROM (Sweetness
INNER JOIN Fruit
ON Sweetness.FruitIndex = Fruit.FruitIndex)
INNER JOIN Country
ON Sweetness.CountryIndex = Country.CountryIndex
GROUP BY Fruit.FruitName
PIVOT Country.CountryName;
I hate to rely on an outside post and present it as my answer, but this is a pretty steep topic and I can't do it justice. So I suggest you look at this article.
Related
My company sends folks to training. Based on projected new hires/transfers, I was asked to generate a report that estimates the number of seats we need in each course broken out by quarter.
Question: My question is two-fold:
What is the best way to represent a sequence of courses (i.e. prerequisites) in a relational DB?
How do I create the query(-ies) necessary to produce the following desired output:
Desired Output:
ID PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 1 1 1/14/2017 1/14/2017
2 2 1 2/17/2017 2/17/2017
3 2 2 2/18/2017 2/19/2017
4 2 3 2/20/2017 2/20/2017
5 3 49 1/18/2017 2/03/2017
6 …
Background Info: The courses are taken in-sequence: the first few courses are orientation courses for the company, and later courses are more specific to the employee's workrole. There are over 50 different courses, 40 different workroles and we're projecting ~1k new hires/transfers. Each work role must take a sequence of courses in a prescribed order, but I'm having trouble representing this ordering and subsequently writing the necessary query.
Existing Tables:
I have several tables that I've used to store the data: Personnel, LnkPersonnelToWorkroles,Workroles, LnkWorkrolesToCourses, and Courses (there's many others as well, but I omit them for the sake of scoping this question down). Here's some notional data from these tables:
Personnel (These are the projected new hires and their estimated arrival date.)
ID DisplayName RequiredCompletionDate
1 Kristel Bump 10/1/2016
2 Shelton Franke 3/11/2017
3 Shaunda Launer 4/16/2017
4 Clarinda Kestler 3/13/2017
5 My Wimsatt 6/6/2017
6 Gillian Bramer 10/25/2016
7 ...
Workroles (These are the positions in the company)
ID Workrole
1 Manager
2 Secretary
3 Admin Asst.
4 ...
LnkPersonnelToWorkroles (Links projected new hires to their projected workrole)
ID PersonnelID WorkroleID
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 ...
Courses (All courses available)
ID CourseName LengthInDays
1 Orientation 1
2 Email Etiquette 2
3 Workplace Safety 1
4 ...
LnkWorkrolesToCourses
(Links workroles to their required courses in a Many-to-Many relationship)
ID WorkroleID CourseID
1 1 1
2 2 1
3 2 2
4 2 3
5 3 49
6 ...
Thoughts: My approach is to first develop a person-by-person schedule based upon the new hire's target completion date and workrole. Then for each class, I could sum the number of new hires starting in that quarter.
I've considered trying to represent the courses in the most general way I could think of (i.e. using a directed acyclic graph), but since most of the courses have only a single prerequisite course, I think it's much easier to represent the prerequisites using the Prerequisites table below; however, I don't know how I would use this in a query.
Prerequisites (Is this a good idea?)
ID CourseID PrereqCourseID
1 2 1
2 3 1
3 4 1
4 5 4
5 ...
Note: I am not currently concerned with whether or not the courses are actually offered on those days; we will figure out the course schedules once we know approximately how many we need each quarter. Right now, we're trying to estimate the demand for each course.
Edit 1: To clarify the Desired Output table: if the person begins course 1 on day D, then they can't start course 2 until after they finish course 1, i.e. until the next day. For courses with a length L >1 days, the start date for a subsequent courses is delayed L days. Notice this effect playing out for workrole ID 2 in the Desired Output table: He is expected to arrive on 2/17, start and complete course 1 the same day, begin course 2 the next day (on 2/18), and finish course 2 the day after that (on 2/19).
I'm posting this answer because it gives me an approximate solution; other answers are still welcome.
I avoided a prerequisite table altogether and opted for a simpler approach: a partial ordering of the courses.
First, I drew the course prerequisite tree; it looked similar to this image:
I defined a partial ordering of the courses based on their depth in the prerequisite tree. In the picture above, CHM124 and High School Chem w/ Lab are priority 1, CHM152 is priority 2, CHM 153 is priority 3, CHM260 and CHM 270 are priority 4, and so on... This partial ordering was stored in the CoursePriority table:
CoursePriority:
ID CourseID Priority
1 1 1
2 2 2
3 3 3
4 4 3
5 5 4
6 6 3
7 ...
So that no two courses would every be taken at the same time, I perturbed each course's priority by a small random number using the following Update query:
UPDATE CoursePriority SET CoursePriority.Priority = [Priority]+Rnd([ID])/1000;
(I used [ID] as input to the Rnd method to ensure each course was perturbed by a different random number.) I ended up with this:
ID CourseID Priority
1 1 1.000005623
2 2 2.000094955
3 3 3.000036401
4 4 3.000052486
5 5 4.000076711
6 6 3.00000535
7 ...
The approach above answers my first question "What is the best [sensible] way to represent a sequence of courses (i.e. prerequisites) in a relational DB?" Now as for generating the course schedule...
First, I created a query qryLnkCoursesPriorities to link Courses to the CoursePriority table:
SELECT Courses.ID AS CourseID, Courses.DurationInDays, CoursePriority.Priority
FROM Courses INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID;
Result:
CourseID DurationInDays Priority
1 35 1.000076177
2 21 2.000148297
3 28 3.000094352
4 14 3.000081442
5...
Second, I created the qryWorkrolePriorityDelay query:
SELECT LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.CourseID AS CourseID, qryLnkCoursePriorities.Priority, qryLnkCoursePriorities.DurationInDays, ([DurationInDays]+Nz(DSum("DurationInDays","qryLnkCoursePriorities","[Priority]>" & [Priority] & ""))) AS LeadTimeInDays
FROM LnkWorkrolesToCourses INNER JOIN qryLnkCoursePriorities ON LnkWorkrolesToCourses.CourseID = qryLnkCoursePriorities.CourseID
ORDER BY LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.Priority;
Simply put: The qryWorkrolePriorityDelay query tells me how many days in advance each course should be taken to ensure the new hire can complete all subsequent courses prior to their required training completion deadline. It looks like this:
WorkroleID CourseID Priority DurationInDays LeadTimeInDays
1 7 1.000060646 7 147
1 1 1.000076177 35 140
1 2 2.000148297 21 105
1 4 3.000081442 14 84
1 6 3.000082824 14 70
1 3 3.000094352 28 56
1 5 4.000106905 28 28
2...
Finally, I was able to bring this all together to create the qryCourseSchedule query:
SELECT Personnel.ID AS PersonnelID, LnkWorkrolesToCourses.CourseID, [ProjectedHireDate]-[leadTimeInDays] AS ProjectedStartDate, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays] AS ProjectedEndDate
FROM Personnel INNER JOIN (((LnkWorkrolesToCourses INNER JOIN (Courses INNER JOIN qryWorkrolePriorityDelay ON Courses.ID = qryWorkrolePriorityDelay.CourseID) ON (Courses.ID = LnkWorkrolesToCourses.CourseID) AND (LnkWorkrolesToCourses.WorkroleID = qryWorkrolePriorityDelay.WorkroleID)) INNER JOIN LnkPersonnelToWorkroles ON LnkWorkrolesToCourses.WorkroleID = LnkPersonnelToWorkroles.WorkroleID) INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID) ON Personnel.ID = LnkPersonnelToWorkroles.PersonnelID
ORDER BY Personnel.ID, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays];
This query gives me the following output:
PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 7 5/7/2016 5/14/2016
1 1 5/14/2016 6/18/2016
1 2 6/18/2016 7/9/2016
1 4 7/9/2016 7/23/2016
1 6 7/23/2016 8/6/2016
1 3 8/6/2016 9/3/2016
1 5 9/3/2016 10/1/2016
2...
With this output, I created a pivot table, where course start dates were grouped by quarter and counted. This gave me exactly what I needed.
I have limited experience with SQL and I am trying to build a query that 'automatically' uses the xyzDesc record in place of xyzID references in the result of the query.
I've included a sample of what I am looking for. It is important to keep in mind, that the recordset I am trying to produce has ~ 35 columns (where it is necessary to initially outer join 3 very large tables) where ~10 columns need to be xref as I hope will be demonstrated by the example. Additionally, the database I am querying has the underlining tables containing millions of rows.
Project table:
projectID projectDesc capitalSpend regionID statusID
-------------------------------------------------------------
1 Project A 200 1 7
2 Project B 300 1 2
3 Project C 200 1 5
4 Project D 100 2 4
5 Project E 300 2 3
6 Project F 500 3 1
7 Project G 400 3 1
StatusXref table
statusID statusDesc
------------------------
1 Proposed
2 Prelim
3 Scheduled
4 Execute
5 Completed
6 On Hold
7 Decline
RegionXref table:
regionID regionDesc
------------------------
1 New York
2 Houston
3 Los Angeles
4 Chicago
5 Denver
6 Dallas
7 Boston
Expected results when executing query:
projectID projectDesc capitalSpend Region Status
---------------------------------------------------------------
1 Project A 200 New York Decline
2 Project B 300 New York Prelim
3 Project C 200 New York Completed
4 Project D 100 Houston Execute
5 Project E 300 Houston Scheduled
6 Project F 500 Los Angeles Proposed
7 Project G 400 Los Angeles Proposed
This seems like it should be 'easy' as it would be a simple vlookup in excel but I'm reluctant to pull all the data into excel and then do these lookups as excel row limitations prevent full data.
Try something like this:
select p.projectID, p.projectDesc, p.capitalSpend, r.regionDesc, s.statusDesc
from Project p
inner join StatusXref s on p.statusID = s.statusID
inner join RegionXref r on p.regionID = r.regionID
A JOIN is exactly what you are looking for. There are several types of joins, but the most common one is an inner join.
Example query:
SELECT p.projectID, P.projectDesc, P.capitalSpend, R.regionDesc, S.statusDesc
FROM Project P
JOIN StatusXref S ON P.statusID = S.statusID
JOIN RegionXref R ON P.regionID = R.regionID;
This SQLFiddle will let you run queries on your small dataset. Be sure to use the select the correct version of SQL in the top left corner.
http://sqlfiddle.com/#!3/0ebc89
P.S. When querying on your large dataset, you may find the LIMIT clause to be useful to test and see if your query is working properly without running across the millions of rows.
I'm trying to create a select statement which will return all linked rows based on a "LINK_SEQUENCE" column. The column however, only links two rows together. A simple query to combine the two makes perfect sense to me. However, in the case that three or more links in the "chain" are present, I want to make sure that all codes are returned.
How would I go about returning something similar to this when only handing the query one of the codes involved? EX: 3245.
Not sure if it much matters in this situation, but this is for an Oracle database. Thank you all very much!
Source data from SQL Fiddle:
ID CODE LINK_SEQUENCE NAME
1 3267 1 Potato
2 3245 1 Potato
3 3245 2 Potato
4 3975 2 Potato
5 3975 3 Potato
6 5478 3 Potato
7 2368 4 Apricot
8 4748 4 Apricot
9 8957 (null) Carrot
SELECT * FROM LinkedTable lt
WHERE ft.link_sequence IN
( SELECT link_sequence FROM LinkedTable WHERE code = 3245 AND link_sequence IS NOT NULL )
ORDER BY ft.ID;
See my SQL Fiddle DEMO.
SECOND ATTEMPT:
SELECT DISTINCT *
FROM LinkedTable
START WITH code = 3245
CONNECT BY NOCYCLE
PRIOR code = code AND PRIOR link_sequence+1 = link_sequence OR
PRIOR code <> code AND PRIOR link_sequence = link_sequence
ORDER BY link_sequence, code
;
Updated SQL Fiddle with this code. Please try to break it.
Based on your data (starting with 3245) it gives the following chain:
ID CODE LINK_SEQUENCE NAME
2 3245 1 Potato
1 3267 1 Potato
3 3245 2 Potato
4 3975 2 Potato
5 3975 3 Potato
6 5478 3 Potato
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Oracle: how to “group by” over a range?
Let's say I have data that looks like this:
Item Count
======== ========
1 123
2 1
3 47
4 117
5 18
6 466
7 202
I want to create a query that gives me this:
Count Start Count End Occurrences
=========== =========== ===========
0 100 3
101 200 2
201 300 1
301 400 0
401 500 1
Basically, I want to take a bunch of counts and group them into ranges for statistical rollups. I don't think I'm using the right keywords to find the answer to this. I am going against Oracle, though if there is an ANSI SQL answer I'd love to have it.
select
a.mini,
a.maxi,
count(a.item)
from
(
select
table.item,
case (table.counter)
when counter>=0 and counter<=100 then 0
when counter>100 and counter<200 then 101
when ....
end as mini
table.item,
case (table.counter)
when counter>=0 and counter<=100 then 100
when counter>100 and counter<200 then 201
when ....
end as maxi
from
table
) a
group by
a.mini,
a.maxi
One way is using CASE statements. If you want it to be scalable, try having the range in a separate table and use JOIN to count the occurence.
Given the following SQL tables:
Administrators:
id Name rating
1 Jeff 48
2 Albert 55
3 Ken 35
4 France 56
5 Samantha 52
6 Jeff 50
Meetings:
id originatorid Assitantid
1 3 5
2 6 3
3 1 2
4 6 4
I would like to generate a table from Ken's point of view (id=3) therefore his id could be possibly present in two different columns in the meetings' table. (The statement IN does not work since I introduce two different field columns).
Thus the ouput would be:
id originatorid Assitantid
1 3 5
2 6 3
If you really just need to see which column Ken's id is in, you only need an OR. The following will produce your example output exactly.
SELECT * FROM Meetings WHERE originatorid = 3 OR Assistantid = 3;
If you need to take the complex route and list names along with meetings, an OR in your join's ON clause should work here:
SELECT
Administrators.name,
Administrators.id,
Meetings.originatorid,
Meetings.Assistantid
FROM Administrators
JOIN Meetings
ON Administrators.id = Meetings.originatorid
OR Administrators.id = Meetings.Assistantid
Where Administrators.name = 'Ken'