Scenario:
Identify a student's first class with the university and determine whether, after passing that first class, they passed a second (consecutive) class within 1 year of ending the first class. If the student did not pass a consecutive second class within 1 year of ending the first class, did they pass any other class within the same timeframe, e.g. a third, fourth, or fifth class?
Questions stemming from the first portion of the scenario were easy enough to answer with the use of the lead() function to pull up the next consecutive class information to the same row as the first class. However, I am having trouble finding the best way to determine if the student passed any classes within the designated timeframe, i.e. within 1 year of ending the first class.
My Question:
Is there a way to perform a lookup/search within the partition created by the lead() function?
OR
Is it better to create an additional aggregated query based on passing grades and join back to the primary table based on the aforementioned date range and appropriate key(s) using WHERE EXISTS?
Thanks for taking a look...
Indeed the question isn't very clear. So I guessed and came up with this.
CREATE TABLE Class (
Id int NOT NULL PRIMARY KEY IDENTITY(1, 1),
ExamDate date NOT NULL,
PassMark float NOT NULL
)
CREATE TABLE Thing (
Id int NOT NULL PRIMARY KEY IDENTITY(1, 1),
StudentId int NOT NULL,
ClassId int NOT NULL,
Score float NULL,
CONSTRAINT Thing_FK_Class FOREIGN KEY (ClassId) REFERENCES Class(Id)
)
;
WITH ctePasses AS (
-- Get the passes and order them.
SELECT t.*, c.ExamDate,
ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY c.ExamDate) AS n
FROM Thing t
INNER JOIN Class c ON t.ClassId = c.Id
WHERE ISNULL(t.Score, 0) >= c.PassMark
),
cteIntermediateFails AS (
-- For every pass, count the fails that come after it, up to either the next pass or the end of the student's records.
SELECT t.StudentId, p.n, COUNT(*) AS IntermediateFails
FROM Thing t
INNER JOIN Class c ON t.ClassId = c.Id
INNER JOIN ctePasses p ON t.StudentId = p.StudentId
LEFT JOIN ctePasses q ON p.StudentId = q.StudentId AND p.n + 1 = q.n
-- q is NULL when there is no later pass; without the IS NULL test the LEFT JOIN would behave like an INNER JOIN.
WHERE c.ExamDate > p.ExamDate AND (q.ExamDate IS NULL OR c.ExamDate < q.ExamDate)
GROUP BY t.StudentId, p.n
)
SELECT p1.StudentId, p1.ExamDate, p2.ExamDate, f.IntermediateFails
FROM ctePasses p1
LEFT JOIN ctePasses p2 ON p1.StudentId = p2.StudentId AND p2.n = p1.n + 1
LEFT JOIN cteIntermediateFails f ON p1.StudentId = f.StudentId AND p1.n = f.n
WHERE p1.n = 1
Using CTEs rather than subqueries lets me use the first one in the second one.
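For the second half of the question (did the student pass any class within one year of the first class ending), an EXISTS check against the first class is one option. Below is a minimal sketch, run through Python's sqlite3 module so that it is self-contained. It reuses the Class and Thing tables above, treats ExamDate as the class end date, and uses SQLite's date(..., '+1 year') where T-SQL would use DATEADD. The sample rows are made up, and the sketch does not check that the first class itself was passed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Class (Id INTEGER PRIMARY KEY, ExamDate TEXT NOT NULL, PassMark REAL NOT NULL);
CREATE TABLE Thing (Id INTEGER PRIMARY KEY, StudentId INTEGER NOT NULL,
                    ClassId INTEGER NOT NULL REFERENCES Class(Id), Score REAL);
INSERT INTO Class VALUES (1, '2020-01-31', 50), (2, '2020-06-30', 50),
                         (3, '2020-12-15', 50), (4, '2021-06-30', 50);
-- Student 1: first class passed, second failed, third passed within the year.
-- Student 2: first class passed, the next pass is more than a year later.
INSERT INTO Thing (StudentId, ClassId, Score) VALUES
    (1, 1, 70), (1, 2, 30), (1, 3, 80),
    (2, 1, 60), (2, 4, 90);
""")

query = """
WITH Results AS (
    -- One row per class taken, flagged 1 (passed) or 0 (failed).
    SELECT t.StudentId, c.ExamDate,
           COALESCE(t.Score, 0) >= c.PassMark AS Passed
    FROM Thing t JOIN Class c ON c.Id = t.ClassId
),
FirstClass AS (
    -- End date of each student's earliest class.
    SELECT StudentId, MIN(ExamDate) AS FirstEnd FROM Results GROUP BY StudentId
)
SELECT f.StudentId,
       EXISTS (SELECT 1 FROM Results r
               WHERE r.StudentId = f.StudentId
                 AND r.Passed
                 AND r.ExamDate > f.FirstEnd
                 AND r.ExamDate <= date(f.FirstEnd, '+1 year')) AS PassedWithinYear
FROM FirstClass f
ORDER BY f.StudentId
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each row pairs a StudentId with 1 or 0, so the flag can sit next to the lead() columns from the first half of the question without changing the row count.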
I'm storing the records in SQL that represent a multiple inheritance relationship similar to the one in C++. Like that:
CREATE TABLE Classes
(
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE Inheritance
(
class_id INTEGER NOT NULL,
base_class_id INTEGER NOT NULL,
FOREIGN KEY (class_id) REFERENCES Classes(id),
FOREIGN KEY (base_class_id) REFERENCES Classes(id)
);
The classes have properties of two types. These properties are inherited by the classes, but in different ways. The first type of property, whenever defined for a class, overrides the value of the same property in any of its base classes. The other type accumulates values: the property is actually a set of values, and each class inherits all values of its base classes and may add one additional (single) value to this set:
CREATE TABLE OverridableValues
(
class_id INTEGER PRIMARY KEY,
value TEXT NOT NULL,
FOREIGN KEY (class_id) REFERENCES Classes(id)
);
CREATE TABLE AccumulableValues
(
class_id INTEGER PRIMARY KEY,
value TEXT NOT NULL,
FOREIGN KEY (class_id) REFERENCES Classes(id)
);
The caveat with OverridableValues: there are no cases when the same property is overridden on different paths of multiple inheritance.
I'm trying to design queries using common table expressions that would return the value/values for a given property and class.
The approach that I'm trying to use is to start from the root (assume for simplicity that there is a single root class), and then to build the tree of paths from the root to every other class. The problem is how to pass the information about properties from the parents to children. For example below is an incorrect attempt to do that:
WITH ParentProperty (id, value) AS
(
SELECT c.id, a.value
FROM Classes c
LEFT JOIN AccumulableValues a
ON a.class_id = c.id
WHERE c.id = 1 --This is the root
UNION ALL
SELECT i.class_id, IFNULL(a.value, ba.value)
FROM ParentProperty p
JOIN Inheritance i
ON i.base_class_id = p.id
LEFT JOIN AccumulableValues a
ON a.class_id = i.class_id
LEFT JOIN AccumulableValues ba
ON ba.class_id = i.base_class_id
)
SELECT id, value
FROM ParentProperty;
I feel like I need one more UNION ALL inside the CTE, which is not allowed. But without it I either miss proper values or inherited ones. So far I've failed to design the query for both types of properties.
I'm using SQLite as my database engine.
Finally I've found a solution. I'm describing it below, but more efficient ones are still welcomed.
Let's start with the accumulable property. My problem was that I tried to put more than one UNION ALL into a single CTE. I solved that by adding an additional CTE (see AcquiresFrom):
WITH AcquiresFrom (class_id, from_class_id, value) AS
(
SELECT a.class_id, a.class_id, a.value
FROM AccumulableValues a
UNION ALL
SELECT i.class_id, i.base_class_id, NULL
FROM Inheritance i
),
ClassProperty (class_id, value) AS
(
SELECT c.id, NULL
FROM Classes c
LEFT JOIN Inheritance i
ON i.class_id = c.id
WHERE i.base_class_id IS NULL
UNION ALL
SELECT a.class_id, IFNULL(a.value, p.value)
FROM ClassProperty p
JOIN AcquiresFrom a
ON (a.from_class_id = p.class_id AND a.from_class_id != a.class_id) OR
(a.class_id = p.class_id AND a.class_id = a.from_class_id AND p.value IS NULL)
)
SELECT DISTINCT class_id, value
FROM ClassProperty
WHERE value IS NOT NULL
ORDER BY class_id;
AcquiresFrom describes how a class acquires a value: the class either introduces a new value (the first clause) or inherits it (the second clause). ClassProperty incrementally propagates the values from base classes to derived ones. The only thing left to do is to eliminate duplicates and NULL values (the final SELECT DISTINCT / WHERE value IS NOT NULL).
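Since more efficient variants were asked for, here is one possible alternative for the accumulable property. It is a different technique from the query above, not a rewrite of it: first build the set of (class, ancestor) pairs with a recursive CTE, then join the values onto it. The CTE name Lineage and the diamond-shaped sample data are my own assumptions; the sketch runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Inheritance (class_id INTEGER NOT NULL, base_class_id INTEGER NOT NULL);
CREATE TABLE AccumulableValues (class_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
-- Diamond: 1 is the root, 2 and 3 derive from 1, 4 derives from both 2 and 3.
INSERT INTO Classes VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO Inheritance VALUES (2, 1), (3, 1), (4, 2), (4, 3);
INSERT INTO AccumulableValues VALUES (1, 'a'), (3, 'c'), (4, 'd');
""")

query = """
WITH RECURSIVE Lineage (class_id, ancestor_id) AS (
    SELECT id, id FROM Classes          -- treat every class as its own ancestor
    UNION                               -- UNION (not UNION ALL) drops pairs reached via several paths
    SELECT l.class_id, i.base_class_id
    FROM Lineage l JOIN Inheritance i ON i.class_id = l.ancestor_id
)
SELECT l.class_id, a.value
FROM Lineage l JOIN AccumulableValues a ON a.class_id = l.ancestor_id
ORDER BY l.class_id, a.value
"""
accumulated = conn.execute(query).fetchall()
print(accumulated)
```

Because the recursion walks upwards from each class and UNION removes duplicate pairs, no DISTINCT or NULL filtering is needed at the end. Whether it is actually faster on real data would need measuring.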
The overridable property is more complex.
WITH Roots (id, value) AS
(
SELECT c.id, o.value
FROM Classes c
LEFT JOIN Inheritance i
ON i.class_id = c.id
LEFT JOIN OverridableValues o
ON o.class_id = c.id
WHERE i.base_class_id IS NULL
),
PossibleValues (id, acquired_from_id, value) AS
(
SELECT r.id, r.id, r.value
FROM Roots r
UNION ALL
SELECT i.class_id, CASE WHEN o.value IS NULL THEN p.acquired_from_id ELSE i.class_id END, IFNULL(o.value, p.value)
FROM PossibleValues p
JOIN Inheritance i
ON i.base_class_id = p.id
LEFT JOIN OverridableValues o
ON o.class_id = i.class_id
),
Split (class_id, base_class_id, direct) AS (
SELECT i.class_id, i.base_class_id, 1
FROM Inheritance i
UNION ALL
SELECT i.class_id, i.base_class_id, 0
FROM Inheritance i
),
Ancestors (id, ancestor_id) AS (
SELECT r.id, NULL
FROM Roots r
UNION ALL
SELECT s.class_id, CASE WHEN s.direct == 1 THEN a.id ELSE a.ancestor_id END
FROM Ancestors a
JOIN Split s
ON s.base_class_id = a.id
)
SELECT DISTINCT p.id, p.value
FROM PossibleValues p
WHERE p.acquired_from_id NOT IN
(
SELECT a.ancestor_id
FROM PossibleValues p1
JOIN PossibleValues p2
ON p2.id = p1.id
JOIN Ancestors a
ON a.id = p1.acquired_from_id AND a.ancestor_id = p2.acquired_from_id
WHERE p1.id = p.id
);
Roots is simply the list of classes that have no parents. The PossibleValues CTE propagates/overrides the values from the roots to the final classes and breaks multiple-inheritance cycles, making the structure tree-like. All valid id/value pairs are present in the result of this query, but some invalid values are present as well: values that were overridden on one branch while that fact is not known on another branch. The acquired_from_id column lets us reconstruct which class first introduced a value (useful whenever two different classes introduce the same value).
The last thing left is to resolve the ambiguity caused by multiple inheritance. Knowing the class and two possible values we need to know whether one value overrides the other. That is resolved with the Ancestors expression.
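The same ambiguity can also be resolved with a set of (class, ancestor) pairs. This is a sketch of an alternative, not the query above: a value defined on an ancestor wins unless another value-defining ancestor of the same class derives from it. The names Lineage and Candidates and the sample data are hypothetical, and the sketch relies on the stated caveat that a property is never overridden on two different inheritance paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Inheritance (class_id INTEGER NOT NULL, base_class_id INTEGER NOT NULL);
CREATE TABLE OverridableValues (class_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
-- Diamond: 1 is the root, 2 and 3 derive from 1, 4 derives from both 2 and 3.
INSERT INTO Classes VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO Inheritance VALUES (2, 1), (3, 1), (4, 2), (4, 3);
-- The root defines the value and class 2 overrides it.
INSERT INTO OverridableValues VALUES (1, 'root'), (2, 'two');
""")

query = """
WITH RECURSIVE Lineage (class_id, ancestor_id) AS (
    SELECT id, id FROM Classes          -- treat every class as its own ancestor
    UNION
    SELECT l.class_id, i.base_class_id
    FROM Lineage l JOIN Inheritance i ON i.class_id = l.ancestor_id
),
Candidates (class_id, from_id, value) AS (
    -- Every value visible to a class, with the class that defines it.
    SELECT l.class_id, o.class_id, o.value
    FROM Lineage l JOIN OverridableValues o ON o.class_id = l.ancestor_id
)
SELECT c.class_id, c.value
FROM Candidates c
WHERE NOT EXISTS (
    -- Drop a candidate if another candidate's defining class derives from its defining class.
    SELECT 1
    FROM Candidates d
    JOIN Lineage l ON l.class_id = d.from_id AND l.ancestor_id = c.from_id
    WHERE d.class_id = c.class_id AND d.from_id <> c.from_id
)
ORDER BY c.class_id
"""
effective = conn.execute(query).fetchall()
print(effective)
```

In the sample data class 4 sees 'root' through class 3 and 'two' through class 2; because class 2 derives from class 1, the 'root' candidate is dropped for class 4.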
I'm working on a project right now and I need to do some request to my DB via SQL *PLUS.
Here is what I'm trying to do.
I want to get a table containing the first and last names of professors who meet the following conditions (I have to check the first condition, and then the other):
(First) In a session (let's say 12004), a prof did teach those two courses, INF3180 and INF2110
(Second) In another session, 32003, a prof did teach those two courses, INF1130 and INF1110
Here is the code that created the DB:
CREATE TABLE Professor
(professorCode CHAR(5) NOT NULL,
lastName VARCHAR(10) NOT NULL,
firstName VARCHAR(10) NOT NULL,
CONSTRAINT PrimaryKeyProfessor PRIMARY KEY (professorCode)
)
;
CREATE TABLE Group
(sigle CHAR(7) NOT NULL,
noGroup INTEGER NOT NULL,
sessionCode INTEGER NOT NULL,
maxInscriptions INTEGER NOT NULL,
professorCode CHAR(5) NOT NULL,
CONSTRAINT PrimaryKeyGroup PRIMARY KEY
(sigle,noGroup,sessionCode),
CONSTRAINT CESigleGroupeRefCours FOREIGN KEY (sigle) REFERENCES Cours,
CONSTRAINT CECodeSessionRefSession FOREIGN KEY (sessionCode) REFERENCES
Session,
CONSTRAINT CEcodeProfRefProfessor FOREIGN KEY(professorCode) REFERENCES
Professor
)
;
And here is my current not working request :
SELECT DISTINCT Professor.firstName, Professor.lastName
FROM Professor, Group
WHERE Group.professorCode = Professor.professorCode
AND Group.sessionCode = 32003
AND (Group.sigle = 'INF1130' AND
Group.sigle = 'INF1110')
OR Group.sessionCode = 12004
AND (Group.sigle = 'INF3180' AND
Group.sigle = 'INF2110')
I know there is a way to combine both results, but I can't seem to find it.
There is only one match possible in that case :
Only one match with 32003 : INF1130, INF1110
No match with 12004 : INF3180, INF2110
The resulting table is supposed to look like this :
--------------------------
First Name Last Name
--------------------------
Denis Tremblay
The solution proposed by Gordon Linoff looks very good, except that it returns no rows for me: as written, a professor must have taught all 4 courses across both session codes to be included. What I need is to check each condition separately and append the results. Say the condition for session 12004 matches nothing; I can treat that as NULL. The second condition, for session 32003, gives one match. Appending both results should give the table shown above.
I want to do one request only for this.
Thanks A LOT!
EDIT : Reformulated
EDIT2 : Gave an example of a known match
EDIT3 : Further explanation why the proposed solution isn't working
Think: group by and having. More importantly, think JOIN, JOIN, JOIN. Never use commas in the from clause.
SELECT p.firstName, p.lastName
FROM Professor p JOIN
Group g
ON g.professorCode = p.professorCode
WHERE (g.sessionCode, g.sigle) IN ( (32003, 'INF1130'), (32003, 'INF1110'),
(12004, 'INF3180'), (12004, 'INF2110')
)
GROUP BY p.firstName, p.lastName
HAVING COUNT(DISTINCT g.sigle) = 4; -- has all four
It seems like you want to list any professor who either taught INF1130 and INF1110 in 32003; or taught INF3180 and INF2110 in 12004. Unfortunately you've presented that as AND (i.e. they have to have taught all four courses - one pair of courses AND the other), not OR (one set of courses OR the other).
As a long-winded way of expanding what I think you want:
SELECT p.firstName, p.lastName
FROM Professor p
WHERE (
EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 32003
AND sigle = 'INF1130'
)
AND EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 32003
AND sigle = 'INF1110'
)
)
OR (
EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 12004
AND sigle = 'INF3180'
)
AND EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 12004
AND sigle = 'INF2110'
)
);
Four subqueries aren't going to be terribly efficient. You could do multiple joins instead.
If you will always be looking for two sigle values per sessionCode then you could modify Gordon's answer to count how many matches each session has, by adding sessionCode to the group-by clause:
SELECT p.firstName, p.lastName
FROM GroupX g
JOIN Professor p
ON p.professorCode = g.professorCode
WHERE (g.sessionCode, g.sigle) IN ( (32003, 'INF1130'), (32003, 'INF1110'),
(12004, 'INF3180'), (12004, 'INF2110')
)
GROUP BY p.firstName, p.lastName, g.sessionCode
HAVING COUNT(*) = 2;
If you did have a professor who taught all four then you would get them listed twice; if that can happen you could add your DISTINCT back in, though that feels a bit wrong. You could also use a subquery and IN to avoid that:
SELECT p.firstName, p.lastName
FROM Professor p
WHERE ProfessorCode IN (
SELECT professorCode
FROM GroupX
WHERE (sessionCode, sigle) IN ( (32003, 'INF1130'), (32003, 'INF1110'),
(12004, 'INF3180'), (12004, 'INF2110')
)
GROUP BY professorCode, sessionCode
HAVING COUNT(*) = 2
)
(I've changed Group to GroupX because Group is a reserved keyword and so isn't a valid unquoted identifier. I assume you've changed your real names - maybe from another language?)
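To see why the original WHERE clause finds nothing and how the per-session grouping behaves, here is a small sketch run through Python's sqlite3 module. The professor codes, the second professor and the group numbers are made up; only Denis Tremblay comes from the question. It uses COUNT(DISTINCT sigle) rather than COUNT(*), an assumption on my part so that a professor teaching two groups of the same course is not counted twice, and spells the filter out with AND/OR instead of the tuple IN list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Professor (professorCode TEXT PRIMARY KEY, lastName TEXT, firstName TEXT);
CREATE TABLE GroupX (sigle TEXT, noGroup INTEGER, sessionCode INTEGER, professorCode TEXT);
INSERT INTO Professor VALUES ('TREM1', 'Tremblay', 'Denis'), ('ROY01', 'Roy', 'Anne');
-- Tremblay taught both 32003 courses; Roy taught one course from each pair.
INSERT INTO GroupX VALUES
    ('INF1130', 10, 32003, 'TREM1'), ('INF1110', 20, 32003, 'TREM1'),
    ('INF3180', 10, 12004, 'ROY01'), ('INF1130', 30, 32003, 'ROY01');
""")

# A single row has a single sigle, so this condition can never be true.
impossible = conn.execute(
    "SELECT COUNT(*) FROM GroupX WHERE sigle = 'INF1130' AND sigle = 'INF1110'"
).fetchone()[0]

query = """
SELECT DISTINCT p.firstName, p.lastName
FROM Professor p
JOIN GroupX g ON g.professorCode = p.professorCode
WHERE (g.sessionCode = 32003 AND g.sigle IN ('INF1130', 'INF1110'))
   OR (g.sessionCode = 12004 AND g.sigle IN ('INF3180', 'INF2110'))
GROUP BY p.professorCode, p.firstName, p.lastName, g.sessionCode
HAVING COUNT(DISTINCT g.sigle) = 2   -- both courses of the pair, within one session
"""
professors = conn.execute(query).fetchall()
print(impossible, professors)
```

Grouping by sessionCode is what turns "all four courses" into "either pair": each session is counted on its own, and DISTINCT collapses a professor who happens to satisfy both.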
Use modern join syntax:
SELECT Professor.firstName, Professor.lastName
FROM Professor join "Group" g on
g.professorCode = Professor.professorCode
where g.sessionCode in( 32003,12004 )
AND g.sigle in( 'INF1130', 'INF1110','INF3180','INF2110')
group by Professor.firstName, Professor.lastName
having count( distinct sigle )=4
I have some database tables containing some documents that people need to sign. The tables are defined (somewhat simplified) as follows.
create table agreement (
id integer NOT NULL,
name character varying(50) NOT NULL,
org_id integer NOT NULL,
CONSTRAINT agreement_pkey PRIMARY KEY (id),
CONSTRAINT org FOREIGN KEY (org_id) REFERENCES org (id) MATCH SIMPLE
)
create table version (
id integer NOT NULL,
content text NOT NULL,
publish_date timestamp NOT NULL,
agreement_id integer NOT NULL,
CONSTRAINT version_pkey PRIMARY KEY (id),
CONSTRAINT agr FOREIGN KEY (agreement_id) REFERENCES agreement (id) MATCH SIMPLE
)
I skipped the org table, to reduce clutter. I have been trying to write a query that would give me all the right agreement information for a given org. So far, I can do
SELECT a.id, a.name FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.name = $1
GROUP BY a.id
This seems to give me a single record for each agreement that belongs to the org I want and has at least one version. But I need to also include content and date published of the latest version available. How do I do that?
Also, I have a separate table called signatures that links to a user and a version. If possible, I would like to extend this query to only include agreements where a given user didn't yet sign the latest version.
Edit: reflected the need for the org join, since I select orgs by name rather than by id
You can use a correlated subquery:
SELECT a.id, a.name, v.*
FROM agreement a JOIN
version v
ON a.id = v.agreement_id
WHERE a.org_id = $1 AND
v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2 WHERE v2.agreement_id = v.agreement_id);
Notes:
The org table is not needed because agreement has an org_id.
No aggregation is needed for this query. You are filtering for the most recent record.
The correlated subquery is one method that retrieves the most recent version.
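The second part of the question (only agreements whose latest version a given user has not signed yet) can be layered on top of the same correlated subquery with NOT EXISTS. This is a sketch only: the signatures table is not shown in the question, so its columns (user_id, version_id) are assumed, the sample rows are invented, and it runs on SQLite through Python's sqlite3 module, which uses ? placeholders where PostgreSQL uses $1 and $2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agreement (id INTEGER PRIMARY KEY, name TEXT NOT NULL, org_id INTEGER NOT NULL);
CREATE TABLE version (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                      publish_date TEXT NOT NULL, agreement_id INTEGER NOT NULL);
-- Assumed shape of the signatures table mentioned in the question.
CREATE TABLE signatures (user_id INTEGER NOT NULL, version_id INTEGER NOT NULL);
INSERT INTO agreement VALUES (1, 'NDA', 7), (2, 'Terms', 7);
INSERT INTO version VALUES (10, 'nda v1', '2020-01-01', 1), (11, 'nda v2', '2021-01-01', 1),
                           (20, 'terms v1', '2020-05-01', 2);
-- User 42 signed the old NDA version and the current Terms version.
INSERT INTO signatures VALUES (42, 10), (42, 20);
""")

query = """
SELECT a.id, a.name, v.content, v.publish_date
FROM agreement a
JOIN version v ON v.agreement_id = a.id
WHERE a.org_id = ?
  -- keep only the most recently published version of each agreement
  AND v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2
                        WHERE v2.agreement_id = v.agreement_id)
  -- and only if this user has no signature on that version
  AND NOT EXISTS (SELECT 1 FROM signatures s
                  WHERE s.version_id = v.id AND s.user_id = ?)
"""
unsigned = conn.execute(query, (7, 42)).fetchall()
print(unsigned)
```

The NDA is returned because the user signed only its older version; Terms is filtered out because its latest version is already signed.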
PostgreSQL has window functions.
Window functions let you compute a value over a set of rows that is partitioned and sorted by a specific column or set of columns. The rank() function returns each row's position within its partition for that sort. If you filter to rows where the rank is 1, you get the top-sorted row of each partition (exactly one row as long as the sort column has no ties, which holds for v.id here).
select u.id, u.name, u.content, u.publish_date from (
SELECT a.id, a.name, v.content, v.publish_date, rank() over (partition by a.id order by v.id desc) as pos
FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
) as u
where pos = 1
SELECT a.id, a.name, max(v.publish_date) publish_date FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
GROUP BY a.id, a.name
I have 3 tables, match, player, and match_player - the third allowing a many-to-many relationship as each player has many matches, and a match consists of many players.
CREATE TABLE IF NOT EXISTS match
(id integer primary key,
date text,
winning_side integer)
CREATE TABLE IF NOT EXISTS player
(id integer primary key,
name text)
CREATE TABLE IF NOT EXISTS match_player
(id integer primary key autoincrement,
match_id integer,
player_id integer,
side integer,
foreign key(match_id) references match(id),
foreign key(player_id) references player(id))
The side can be 1 or 0.
I want to see what % of games a player wins.
I figure a good way of doing this is a SELECT statement on the match_player and then counting the number of entries where side == winning_side. But of course winning_side is in a separate table.
I've been googling on how to easily do this, but can't fathom it.
select
mp.player_id,
min(p.name) as name,
sum(case when m.winning_side = mp.side then 1 else 0 end) * 100.00
/ count(*) as win_percentage
from
match m
inner join match_player mp
on mp.match_id = m.id
inner join player p
on p.id = mp.player_id
group by mp.player_id
If you're really only interested in a single player it's pretty much the same thing. It wouldn't hurt anything but you no longer need the group by:
select
min(p.name) as name,
sum(case when m.winning_side = mp.side then 1 else 0 end) * 100.00
/ count(*) as win_percentage
from
match m
inner join match_player mp
on mp.match_id = m.id
inner join player p
on p.id = mp.player_id
where mp.player_id = ???
And if you don't even care about the player's name then you can eliminate one of the joins:
select
sum(case when m.winning_side = mp.side then 1 else 0 end) * 100.00
/ count(*) as win_percentage
from
match m
inner join match_player mp
on mp.match_id = m.id
where mp.player_id = ???
I don't know how SQLite handles numbers, but usually you need to be careful to make sure your division is not integer division. So you probably don't want to wait until the end of the expression to multiply by 100.00.
Hopefully that gives you a good start.
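The integer-division concern does apply to SQLite: dividing one integer by another truncates. A minimal sketch with made-up rows, run through Python's sqlite3 module (the table name is quoted only as a precaution, since MATCH is also an SQL keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "match" (id INTEGER PRIMARY KEY, date TEXT, winning_side INTEGER);
CREATE TABLE match_player (id INTEGER PRIMARY KEY AUTOINCREMENT, match_id INTEGER,
                           player_id INTEGER, side INTEGER);
INSERT INTO "match" VALUES (1, '2020-01-01', 1), (2, '2020-01-02', 0), (3, '2020-01-03', 1);
-- Player 5 is on the winning side in matches 1 and 2 but not in match 3.
INSERT INTO match_player (match_id, player_id, side) VALUES (1, 5, 1), (2, 5, 0), (3, 5, 0);
""")

wins = "sum(case when m.winning_side = mp.side then 1 else 0 end)"
tail = 'from "match" m join match_player mp on mp.match_id = m.id where mp.player_id = ?'

# Integer / integer happens first and truncates 2/3 to 0 before the multiplication.
truncated = conn.execute(f"select {wins} / count(*) * 100 {tail}", (5,)).fetchone()[0]

# Multiplying by 100.0 first makes the whole expression floating point.
percentage = conn.execute(f"select {wins} * 100.0 / count(*) {tail}", (5,)).fetchone()[0]

print(truncated, round(percentage, 2))
```

This is why the queries above multiply by 100.00 before dividing by count(*).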
Assuming that each match appears in match_player twice (or n times), once for each player, the player table is not actually needed. You can then just use avg() to get the value you want, because MySQL has the convenient functionality of treating boolean values as integers (SQLite, which the autoincrement keyword in the question suggests, likewise evaluates a comparison to 0 or 1):
select mp.player_id,
avg(mp.side = m.winning_side) as proportion_winning
from match_player mp join
match m
on mp.match_id = m.id
group by mp.player_id;
If you want a value between 0 and 100, then multiply by 100:
select mp.player_id,
100 * avg(mp.side = m.winning_side) as proportion_winning
. . .
Please don't downgrade this as it is bit complex for me to explain. I'm working on data migration so some of the structures look weird because it was designed by someone like that.
For ex, I have a table Person with PersonID and PersonName as columns. I have duplicates in the table.
I have Details table where I have PersonName stored in a column. This PersonName may or may not exist in the Person table. I need to retrieve PersonID from the matching records otherwise put some hardcode value in PersonID.
I can't write below query because PersonName is duplicated in Person Table, this join doubles the rows if there is a matching record due to join.
SELECT d.Fields, PersonID
FROM Details d
JOIN Person p ON d.PersonName = p.PersonName
The below query works but I don't know how to replace "NULL" with some value I want in place of NULL
SELECT d.Fields, (SELECT TOP 1 PersonID FROM Person where PersonName = d.PersonName )
FROM Details d
So, there are some PersonNames in the Details table which are not existent in Person table. How do I write CASE WHEN in this case?
I tried below but it didn't work
SELECT d.Fields,
CASE WHEN (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) = null
THEN 123
ELSE (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) END Name
FROM Details d
This query is still showing the same output as 2nd query. Please advise me on this. Let me know, if I'm unclear anywhere. Thanks
Well, I figured out that I can wrap the subquery in ISNULL to make it work. Note that the subquery needs its own pair of parentheses:
SELECT d.Fields,
ISNULL((SELECT TOP 1 p.PersonID
FROM Person p WHERE p.PersonName = d.PersonName), 124) id
FROM Details d
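The CASE attempt failed because a comparison with NULL using = yields NULL rather than true; the test has to be IS NULL, or the default can be supplied with ISNULL/COALESCE as above. A small sketch of both points using Python's sqlite3 module: the rows are invented, COALESCE stands in for T-SQL's ISNULL, and MIN(PersonID) replaces TOP 1 so that the chosen duplicate is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER, PersonName TEXT);
CREATE TABLE Details (Fields TEXT, PersonName TEXT);
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');            -- 'Ann' is duplicated
INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Zed');  -- 'Zed' has no Person row
""")

# A comparison with NULL yields NULL (None in Python), so "= null" never matches.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

rows = conn.execute("""
SELECT d.Fields,
       COALESCE((SELECT MIN(p.PersonID) FROM Person p
                 WHERE p.PersonName = d.PersonName), 123) AS PersonID
FROM Details d
ORDER BY d.Fields
""").fetchall()
print(null_equals_null, rows)
```

Each Details row appears exactly once, and the unmatched name gets the default id instead of NULL.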
A simple left outer join to pull back all persons with an optional match on the details table should work with a case statement to get your desired result.
SELECT
*
FROM
(
SELECT
Instance=ROW_NUMBER() OVER (PARTITION BY p.PersonName ORDER BY p.PersonID),
PersonID=CASE WHEN d.PersonName IS NULL THEN 'XXXX' ELSE p.PersonID END,
d.Fields
FROM
Person p
LEFT OUTER JOIN Details d on d.PersonName=p.PersonName
)AS X
WHERE
Instance=1
Ooh goody, a chance to use two LEFT JOINs. The first will list the IDs where they exist, and insert a default otherwise; the second will eliminate the duplicates.
SELECT d.Fields, ISNULL(p1.PersonID, 123)
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
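To check how the second LEFT JOIN removes the duplicates, here is the same query against a few invented rows, run through Python's sqlite3 module with IFNULL in place of T-SQL's ISNULL. The plain LEFT JOIN is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER, PersonName TEXT);
CREATE TABLE Details (Fields TEXT, PersonName TEXT);
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');            -- 'Ann' is duplicated
INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Zed');  -- 'Zed' has no Person row
""")

# A plain LEFT JOIN repeats 'd1' once per duplicate Person row.
naive_count = conn.execute("""
SELECT COUNT(*) FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
""").fetchone()[0]

# p2 looks for a Person with the same name and a smaller id; keeping only rows
# where no such p2 exists leaves the lowest PersonID for each name.
rows = conn.execute("""
SELECT d.Fields, IFNULL(p1.PersonID, 123) AS PersonID
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
                   AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
ORDER BY d.Fields
""").fetchall()
print(naive_count, rows)
```

Unmatched Details rows survive the WHERE clause because both p1 and p2 are NULL for them, which is where the default id comes in.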
You could use common table expressions to build up the missing datasets, i.e. your complete Person table, then join that to your Details table as follows:
declare @n int;
-- set your default PersonID here;
set @n = 123;
-- Make sure the previous SQL statement is terminated with a semicolon so that the with clause parses successfully.
-- First build our unique list of names from table Details.
with cteUniqueDetailPerson
(
[PersonName]
)
as
(
select distinct [PersonName]
from [Details]
)
-- Second get unique Person entries and record the most recent PersonID value as the active Person.
, cteUniquePersonPerson
(
[PersonID]
, [PersonName]
)
as
(
select
max([PersonID]) -- if you wanted the original Person record instead of the last, change this to min.
, [PersonName]
from [Person]
group by [PersonName]
)
-- Third join unique datasets to get the PersonID when there is a match, otherwise use our default id @n.
-- NB, this would also include records when a Person exists with no Detail rows (they are filtered out with the final inner join)
, cteSudoPerson
(
[PersonID]
, [PersonName]
)
as
(
select
coalesce(upp.[PersonID],@n) as [PersonID]
, coalesce(upp.[PersonName],udp.[PersonName]) as [PersonName]
from cteUniquePersonPerson upp
full outer join cteUniqueDetailPerson udp
on udp.[PersonName] = upp.[PersonName]
)
-- Fourth, join Details to the pseudo-person table (cteSudoPerson) that includes either the original ID or our default ID.
select
d.[Fields]
, sp.[PersonID]
from [Details] d
inner join cteSudoPerson sp
on sp.[PersonName] = d.[PersonName];