Concerned with query size using non-unique join conditions - sql

I have a situation at work. I work in housing. We raise orders to houses (so our contractors can go out and repair the houses).
Orders contain one or more jobs. A dwelling has zero, one or more orders raised against it.
This is a brief data definition. I've simplified the tables - but hopefully you get the idea. An order can contain many jobs, and a property can have many orders.
CREATE TABLE dwellings (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
address VARCHAR2(100) NOT NULL
);
CREATE TABLE orders (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
created_by VARCHAR2(10) NOT NULL,
created_on DATE NOT NULL,
dwelling_id VARCHAR2(10) NOT NULL REFERENCES dwellings(id)
);
CREATE TABLE jobs (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
sor_id VARCHAR2(10) NOT NULL,
order_id VARCHAR2(10) NOT NULL REFERENCES orders(id)
);
And populated:
INSERT INTO dwellings VALUES ('00ABC', '2 The Mews House Little Boston London E1 1EE');
INSERT INTO dwellings VALUES ('5H88H', '3 Electric House Snodsbury S1 1IT');
INSERT INTO orders VALUES ('000001-A', 'CSMITH', DATE '2016-03-10', '00ABC');
INSERT INTO orders VALUES ('000002-A', 'CSMITH', DATE '2016-03-11', '00ABC');
INSERT INTO orders VALUES ('000003-A', 'AJONES', DATE '2016-03-16', '00ABC');
INSERT INTO orders VALUES ('000004-A', 'CSMITH', DATE '2016-03-16', '5H88H');
INSERT INTO jobs VALUES ('001', '000AA0', '000001-A');
INSERT INTO jobs VALUES ('002', '123BB0', '000001-A');
INSERT INTO jobs VALUES ('003', '000AA0', '000002-A');
INSERT INTO jobs VALUES ('004', '787XD7', '000003-A');
INSERT INTO jobs VALUES ('005', '000AA0', '000003-A');
INSERT INTO jobs VALUES ('006', '787XD7', '000004-A');
An analyst wants to identify agents who are raising orders that are similar to previous orders. The thing under scrutiny is the SOR_ID, which denotes the type of job. Remember, each order has one or more jobs associated with it. So the task is: produce a report showing orders that contain one or more job types that duplicate those on previous orders at the same property.
The report I'm building will have these column headings.
Agent Name
Order Id
Address
Previous Order Id
Duplicate Job Types
Here is the start of a query that gets there. I haven't executed it against the database because there are 50,000 properties, 100,000 orders and 200,000 jobs. I'm concerned about the size of the result set because I'm joining on columns that are not unique.
select * from orders ord
join orders ord2 on ord.dwelling_id = ord2.dwelling_id --shaky
and ord.id <> ord2.id
and ord.created_on - ord2.created_on between 0 and 90
join jobs job on job.order_id = ord.id
join jobs job2 on job2.order_id = ord2.id
where job.sor_id = job2.sor_id
I'm looking for recommendations on how you might refactor this query into something more manageable (without PL/SQL). Note that I haven't used LAG / LEAD and I haven't yet used LISTAGG to collapse the job type codes. That will come later. I'm concerned about how expensive the query is at the moment.

Query:
SELECT o.created_by AS agent_name,
d.address,
LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
GROUP BY o.created_by, d.address, j.sor_id
HAVING COUNT(1) > 1;
Output:
AGENT_NAME ADDRESS ORDER_IDS JOB_TYPE
---------- -------------------------------------------- ----------------- ----------
CSMITH 2 The Mews House Little Boston London E1 1EE 000001-A,000002-A 000AA0
This lists each job type that the same agent raised more than once at the same address, together with the ids of the orders involved. The orders are listed in chronological order within the comma-separated list.
However, if you want it with your headings then you could do:
SELECT *
FROM (
SELECT o.created_by AS agent_name,
o.id,
d.address,
LAG( o.id ) OVER ( PARTITION BY o.created_by, d.address, j.sor_id
ORDER BY o.created_on
) AS previous_order_id,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
)
WHERE previous_order_id IS NOT NULL;
Which would output:
AGENT_NAME ID ADDRESS PREVIOUS_ORDER_ID JOB_TYPE
---------- ---------- -------------------------------------------- ----------------- ----------
CSMITH 000002-A 2 The Mews House Little Boston London E1 1EE 000001-A 000AA0
If you want to consider multiple agents then you can remove o.created_by from the GROUP BY or PARTITION BY clauses. For the top query you would then need to use LISTAGG to get all the agents. Like this:
SELECT LISTAGG( o.created_by, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS agent_name,
d.address,
LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
GROUP BY d.address, j.sor_id
HAVING COUNT(1) > 1;
Or, for the second query, like this:
SELECT *
FROM (
SELECT o.created_by AS agent_name,
o.id,
d.address,
LAG( o.id ) OVER ( PARTITION BY d.address, j.sor_id
ORDER BY o.created_on
) AS previous_order_id,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
)
WHERE previous_order_id IS NOT NULL;
Both the queries would then also output the order with id 000003-A placed by AJONES.
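To check that claim against the sample data, here is a minimal sketch that runs the LAG version in an in-memory SQLite database from Python. SQLite is only a stand-in for Oracle here (an assumption on my part): dates are stored as ISO text, VARCHAR2 becomes TEXT, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; needs SQLite 3.25+ for LAG().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dwellings (id TEXT PRIMARY KEY, address TEXT NOT NULL);
CREATE TABLE orders (
    id TEXT PRIMARY KEY, created_by TEXT NOT NULL, created_on TEXT NOT NULL,
    dwelling_id TEXT NOT NULL REFERENCES dwellings(id));
CREATE TABLE jobs (
    id TEXT PRIMARY KEY, sor_id TEXT NOT NULL,
    order_id TEXT NOT NULL REFERENCES orders(id));
INSERT INTO dwellings VALUES
    ('00ABC', '2 The Mews House Little Boston London E1 1EE'),
    ('5H88H', '3 Electric House Snodsbury S1 1IT');
INSERT INTO orders VALUES
    ('000001-A', 'CSMITH', '2016-03-10', '00ABC'),
    ('000002-A', 'CSMITH', '2016-03-11', '00ABC'),
    ('000003-A', 'AJONES', '2016-03-16', '00ABC'),
    ('000004-A', 'CSMITH', '2016-03-16', '5H88H');
INSERT INTO jobs VALUES
    ('001', '000AA0', '000001-A'), ('002', '123BB0', '000001-A'),
    ('003', '000AA0', '000002-A'), ('004', '787XD7', '000003-A'),
    ('005', '000AA0', '000003-A'), ('006', '787XD7', '000004-A');
""")

# Same shape as the second query above, partitioned by address and job type only.
rows = conn.execute("""
SELECT agent_name, id, previous_order_id, job_type
FROM (
    SELECT o.created_by AS agent_name,
           o.id,
           LAG(o.id) OVER (PARTITION BY d.address, j.sor_id
                           ORDER BY o.created_on) AS previous_order_id,
           j.sor_id AS job_type
    FROM dwellings d
    JOIN orders o ON o.dwelling_id = d.id
    JOIN jobs j ON j.order_id = o.id
) t
WHERE previous_order_id IS NOT NULL
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this returns two rows: 000002-A (CSMITH) with previous order 000001-A, and 000003-A (AJONES) with previous order 000002-A, both for job type 000AA0.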

Changes I would try out:
ord.id <> ord2.id : ord2.id < ord.id (not sure if that's applicable for you)
ord.created_on - ord2.created_on between 0 and 90 : ord2.created_on <= ord.created_on and ord2.created_on >= ord.created_on - 90 (not sure if the RDBMS can do that optimization)
Move job.sor_id = job2.sor_id into the ON clause (But the RDBMS will probably do that for you)
select * from orders ord
join orders ord2
on ord2.dwelling_id = ord.dwelling_id
and ord2.id < ord.id
and ord2.created_on <= ord.created_on
and ord2.created_on >= ord.created_on - 90
join jobs job on job.order_id = ord.id
join jobs job2
on job2.order_id = ord2.id
and job2.sor_id = job.sor_id;
Indexes you will need:
orders(dwelling_id, created_on, id)
jobs(order_id, sor_id)
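As a rough sanity check of the rewritten join, here is a sketch that runs it against the question's sample rows in an in-memory SQLite database. SQLite is an assumption standing in for Oracle: dates are ISO text, and `date(x, '-90 days')` replaces Oracle's `x - 90`. The index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, created_by TEXT,
                     created_on TEXT, dwelling_id TEXT);
CREATE TABLE jobs (id TEXT PRIMARY KEY, sor_id TEXT, order_id TEXT);
INSERT INTO orders VALUES
    ('000001-A', 'CSMITH', '2016-03-10', '00ABC'),
    ('000002-A', 'CSMITH', '2016-03-11', '00ABC'),
    ('000003-A', 'AJONES', '2016-03-16', '00ABC'),
    ('000004-A', 'CSMITH', '2016-03-16', '5H88H');
INSERT INTO jobs VALUES
    ('001', '000AA0', '000001-A'), ('002', '123BB0', '000001-A'),
    ('003', '000AA0', '000002-A'), ('004', '787XD7', '000003-A'),
    ('005', '000AA0', '000003-A'), ('006', '787XD7', '000004-A');
-- The two suggested indexes (names are hypothetical).
CREATE INDEX orders_dwelling_ix ON orders (dwelling_id, created_on, id);
CREATE INDEX jobs_order_ix ON jobs (order_id, sor_id);
""")

# Each row pairs an order with an earlier order at the same dwelling
# (within 90 days) that shares a job type.
pairs = conn.execute("""
SELECT ord.id, ord2.id, job.sor_id
FROM orders ord
JOIN orders ord2
  ON ord2.dwelling_id = ord.dwelling_id
 AND ord2.id < ord.id
 AND ord2.created_on <= ord.created_on
 AND ord2.created_on >= date(ord.created_on, '-90 days')
JOIN jobs job ON job.order_id = ord.id
JOIN jobs job2
  ON job2.order_id = ord2.id
 AND job2.sor_id = job.sor_id
ORDER BY ord.id, ord2.id
""").fetchall()

print(len(pairs), pairs)
```

Note that the self-join reports every earlier matching order, so 000003-A is paired with both 000001-A and 000002-A. The result grows with the number of order pairs per dwelling, not with the product of the full tables, which is why the dwelling_id index matters.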

Related

Summarize the Table in Sql

I want to join four tables and get total sales(Value*Quantity) for each month.
each transaction have to get monthly wise(July2018)
Example:
Agent_ID Agent Name Total sales(monthly wise)
Agent table
----------
Agent_ID
Agent Name
Agent address
Transaction table
-----------------
Transaction_ID
Transaction_Date(12/7/2018)
Agent_ID
Transation_Status
Transaction Detal table
-----------------------
Transaction_ID
Item_code
Quantity
Item Table
----------
Item_code
Item_name
Value
Pls support for this scenario
Supposing that Transaction_Date is a DATETIME or TIMESTAMP field, here is a MySQL query that could give you the desired result.
SELECT Agent_ID, SUM(Value * Quantity) as total, DATE_FORMAT(Transaction_Date, '%Y-%M') date
FROM Agent JOIN Transaction USING (Agent_ID)
JOIN Transaction_Detail USING (Transaction_ID)
JOIN Item USING (Item_code)
GROUP BY Agent_ID, date;
Here's your query. You are trying to group by month-year.
select concat(DATENAME(month, t2.Transaction_Date), year(t2.Transaction_Date)) as month_year, sum(t4.Value * t3.Quantity) as total_sales
from agent_table t1
inner join transaction_table t2 on t2.Agent_ID = t1.Agent_ID
inner join transaction_detail_table t3 on t3.Transaction_ID = t2.Transaction_ID
inner join item_table t4 on t4.Item_code = t3.Item_code
group by concat(DATENAME(month, t2.Transaction_Date), year(t2.Transaction_Date))
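For anyone who wants to try the per-agent monthly total without a MySQL or SQL Server instance, here is a small sketch using SQLite from Python. Everything in it is an assumption made for illustration: the table names (`txn` instead of `transaction`, which is a keyword in SQLite), the sample rows, and `strftime('%Y-%m', ...)` standing in for DATE_FORMAT / DATENAME.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified versions of the four tables in the question.
CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, agent_name TEXT);
CREATE TABLE txn (transaction_id INTEGER PRIMARY KEY,
                  transaction_date TEXT, agent_id INTEGER);
CREATE TABLE txn_detail (transaction_id INTEGER, item_code TEXT, quantity INTEGER);
CREATE TABLE item (item_code TEXT PRIMARY KEY, value REAL);

INSERT INTO agent VALUES (1, 'A'), (2, 'B');
INSERT INTO item VALUES ('X', 10.0), ('Y', 5.0);
INSERT INTO txn VALUES (1, '2018-07-12', 1), (2, '2018-08-01', 1), (3, '2018-07-20', 2);
INSERT INTO txn_detail VALUES (1, 'X', 2), (1, 'Y', 1), (2, 'X', 3), (3, 'Y', 4);
""")

# One row per agent per month, with total sales = SUM(value * quantity).
totals = conn.execute("""
SELECT a.agent_id,
       a.agent_name,
       strftime('%Y-%m', t.transaction_date) AS sales_month,
       SUM(i.value * d.quantity) AS total_sales
FROM agent a
JOIN txn t ON t.agent_id = a.agent_id
JOIN txn_detail d ON d.transaction_id = t.transaction_id
JOIN item i ON i.item_code = d.item_code
GROUP BY a.agent_id, a.agent_name, sales_month
ORDER BY a.agent_id, sales_month
""").fetchall()

for row in totals:
    print(row)
```

Grouping by the agent as well as the month is what gives one total per agent per month; grouping by the month alone, as in the SQL Server query above, gives one total per month across all agents.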

Combine rows from Multiple tables into single table

I have one parent table, Products, with multiple child tables: Hoses, Steeltubes, ElectricCables, FiberOptics.
ProductId is the primary key field in the Products table.
ProductId is a foreign key field in Hoses, Steeltubes, ElectricCables and FiberOptics.
The Products table has a one-to-many relationship with each child table.
I want to combine the results of all the tables.
For example, product P1 has the PK field ProductId, which is used in all child tables as the FK.
If the Hoses table has 4 records with ProductId 50 and the Steeltubes table has 2 records with ProductId 50, then a left join produces the cartesian product of those records and returns 8 rows, but I expect 4.
;with HOSESTEELCTE
as
(
select '' as ModeType, '' as FiberOpticQty , '' as NumberFibers, '' as FiberLength, '' as CableType , '' as Conductorsize , '' as Voltage,'' as ElecticCableLength , s.TubeMaterial , s.TubeQty, s.TubeID , s.WallThickness , s.DWP ,s.Length as SteelLength , h.HoseSeries, h.HoseLength ,h.ProductId
from Hoses h
left join
(
--'' as HoseSeries,'' as HoseLength ,
select TubeMaterial , TubeQty, TubeID , WallThickness , DWP , Length,ProductId from SteelTubes
) s on (s.ProductId = h.ProductId)
) select * from HOSESTEELCTE
Assuming there are no relationships between the child tables and you simply want a list of all the child entities that make up a product, you could generate a CTE with one row per product for each position, up to the largest number of entries in any child table. In the example below I have used a dates table (dimdate) as the row source to keep the example simple.
So for this data:
create table products(pid int);
insert into products values
(1),(2);
create table hoses (pid int,descr varchar(2));
insert into hoses values (1,'h1'),(1,'h2'),(1,'h3'),(1,'h4');
create table steeltubes (pid int,descr varchar(2));
insert into steeltubes values (1,'t1'),(1,'t2');
create table electriccables(pid int,descr varchar(2));
truncate table electriccables;
insert into electriccables values (1,'e1'),(1,'e2'),(1,'e3'),(2,'e1');
this cte
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050105)
select * from cte
creates a cartesian join (one of the rare occasions where an implicit join helps) of pid to rn
result
rn pid
-------------------- -----------
1 1
2 1
3 1
4 1
1 2
2 2
3 2
4 2
And if we add the child tables
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050106)
select c.pid,h.descr hoses,s.descr steeltubes,e.descr electriccables from cte c
left join (select h.*, row_number() over(partition by h.pid order by h.descr) rn from hoses h) h on h.rn = c.rn and h.pid = c.pid
left join (select s.*, row_number() over(partition by s.pid order by s.descr) rn from steeltubes s) s on s.rn = c.rn and s.pid = c.pid
left join (select e.*, row_number() over(partition by e.pid order by e.descr) rn from electriccables e) e on e.rn = c.rn and e.pid = c.pid
where h.rn is not null or s.rn is not null or e.rn is not null
order by c.pid,c.rn
we get this
pid hoses steeltubes electriccables
----------- ----- ---------- --------------
1 h1 t1 e1
1 h2 t2 e2
1 h3 NULL e3
1 h4 NULL NULL
2 NULL NULL e1
In fact, 8 rows is exactly the result you should expect: each of your four Hoses records is joined with the first SteelTubes record and then again with the second, giving 4 + 4 = 8.
The very fact that you expect 4 records to be in the result instead of 8 shows that you want to use some kind of grouping. You can group your inner query issued for SteelTubes by ProductId, but then you will need to use aggregate functions for the other columns. Since you have only explained the structure of the desired output, but not the semantics, I am not able with my current knowledge about your problem to determine what aggregations you need.
Once you find out the answer for the first table, you will be able to easily add the other tables into the selection as well, but in case of large data you might get some scaling problems, so you might want to have a table where you store these groups, maintain it when something changes and use it for these selections.
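The row counts are easy to reproduce. The sketch below uses SQLite from Python with made-up `hoses` and `steeltubes` tables (4 and 2 rows for product 50, as in the question) and compares the plain left join with a join against a pre-grouped subquery. COUNT(*) is just one possible aggregate, picked for illustration, since the question does not say which one is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical cut-down child tables: 4 hoses and 2 steel tubes for product 50.
CREATE TABLE hoses (pid INTEGER, descr TEXT);
CREATE TABLE steeltubes (pid INTEGER, descr TEXT);
INSERT INTO hoses VALUES (50, 'h1'), (50, 'h2'), (50, 'h3'), (50, 'h4');
INSERT INTO steeltubes VALUES (50, 't1'), (50, 't2');
""")

# Plain left join: every hose row matches every steel tube row (4 x 2 = 8).
plain = conn.execute("""
SELECT h.descr, s.descr
FROM hoses h
LEFT JOIN steeltubes s ON s.pid = h.pid
""").fetchall()

# Grouping the child table first leaves one row per product to join to,
# so the hose rows are no longer multiplied.
grouped = conn.execute("""
SELECT h.descr, s.tube_count
FROM hoses h
LEFT JOIN (SELECT pid, COUNT(*) AS tube_count
           FROM steeltubes
           GROUP BY pid) s ON s.pid = h.pid
ORDER BY h.descr
""").fetchall()

print(len(plain))   # 8
print(grouped)      # four rows, each with tube_count 2
```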

Find the latest or earliest date

I have a table with a foreign key called team_ID, a date column called game_date, and a single char column called result. I need to find when the next volleyball game happens. I have successfully narrowed the game dates down to all the volleyball games that have not happened yet because the result IS NULL. I have all the select in line, I just need to find the earliest date.
Here is what I've got:
SELECT game.game_date, team.team_name
FROM game
JOIN team
ON team.team_id = game.team_id
WHERE team.sport_id IN
(SELECT sport.sport_id
FROM sport
WHERE UPPER(sport.sport_type_code) IN
(SELECT UPPER(sport_type.sport_type_code)
FROM sport_type
WHERE UPPER(sport_type_name) like UPPER('%VOLLEYBALL%')
)
)
AND game.result IS NULL;
I'm a time traveler so don't mind the old dates.
When I run it, I get this:
GAME_DATE TEAM_NAME
----------- ----------
11-NOV-1998 BEars
13-NOV-1998 BEars
13-NOV-1998 WildCats
14-NOV-1998 BEars
How do I set it up so I get only the MIN(DATE) and the TEAM_NAME playing on that date?
I've tried AND game.game_date = MIN(game.game_date) but it simply tells me that a group function is not allowed here. There has to be a way to retrieve the MIN(game_date) and use it as a condition to be met.
I'm using Oracle 11g pl/sql.
This should be the final working code.
SELECT *
FROM
(
SELECT g.game_date, t.team_name
FROM game g
JOIN team t
ON t.team_id = g.team_id
JOIN sport s
ON t.sport_id = s.sport_id
JOIN sport_type st
ON UPPER(s.sport_type_code) = UPPER(st.sport_type_code)
WHERE UPPER(st.sport_type_name) like UPPER('%VOLLEYBALL%')
AND g.result IS NULL
ORDER BY g.game_date
)
WHERE ROWNUM = 1;
The ROWNUM pseudocolumn is generated before any ORDER BY clause is applied to the query. If you just do WHERE ROWNUM <= X then you will get X rows in whatever order Oracle produces the data from the datafiles and not the X minimum rows. To guarantee getting the minimum row you need to use ORDER BY first and then filter on ROWNUM like this:
SELECT *
FROM (
SELECT g.game_date, t.team_name
FROM game g
JOIN team t
ON t.team_id = g.team_id
INNER JOIN sport s
ON t.sport_id = s.sport_id
INNER JOIN sport_type y
ON UPPER( s.sport_type_code ) = UPPER( y.sport_type_code )
WHERE UPPER( y.sport_type_name) LIKE UPPER('%VOLLEYBALL%')
AND g.result IS NULL
ORDER BY game_date ASC -- You need to do the ORDER BY in an inner query
)
WHERE ROWNUM = 1; -- Then filter on ROWNUM in an outer query.
If you want to return multiple rows with the minimum date then:
SELECT game_date,
team_name
FROM (
SELECT g.game_date,
t.team_name,
RANK() OVER ( ORDER BY g.game_date ASC ) AS rnk
FROM game g
JOIN team t
ON t.team_id = g.team_id
INNER JOIN sport s
ON t.sport_id = s.sport_id
INNER JOIN sport_type y
ON UPPER( s.sport_type_code ) = UPPER( y.sport_type_code )
WHERE UPPER( y.sport_type_name) LIKE UPPER('%VOLLEYBALL%')
AND g.result IS NULL
)
WHERE rnk = 1;
Could you make it simple and order by date and SELECT TOP 1? I think this is the syntax in Oracle:
WHERE ROWNUM <= number;
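select query1.game_date, query1.team_name from (
-- rank() is partitioned by team here, so this returns each team's earliest open game; drop the partition by to get only the overall earliest date
SELECT game.game_date, team.team_name, rank() over (partition by team.team_name order by game.game_date asc) T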
FROM game
JOIN team
ON team.team_id = game.team_id
WHERE team.sport_id IN
(SELECT sport.sport_id
FROM sport
WHERE UPPER(sport.sport_type_code) IN
(SELECT UPPER(sport_type.sport_type_code)
FROM sport_type
WHERE UPPER(sport_type_name) like UPPER('%VOLLEYBALL%')
)
)
AND game.result IS NULL
) query1 where query1.T=1;

How to display the record with the highest value in Oracle?

I have 4 tables with the following structure:
Table artist:
artistID lastname firstname nationality dateofbirth datedcease
Table work:
workId title copy medium description artist ID
Table Trans:
TransactionID Date Acquired Acquistionprice datesold askingprice salesprice customerID workID
Table Customer:
customerID lastname Firstname street city state zippostalcode country areacode phonenumber email
The first question is: which artist has the most works of art sold, and how many of that artist's works have been sold?
My SQL query is this:
SELECT * From dtoohey.artist A1
INNER JOIN
(
SELECT COUNT(W1.ArtistID) AS COUNTER, artistID FROM dtoohey.trans T1
INNER JOIN dtoohey.work W1
ON W1.workid = T1.Workid
GROUP BY W1.artistID
) TEMP1
ON TEMP1.artistID = A1.artistID
WHERE A1.artistID = TEMP1.artistId
ORDER BY COUNTER desc;
I get the whole table, but I only want to show the first row, which has the highest count. How do I do that?
I have tried inserting WHERE ROWNUM <= 1, but it shows the artist with ID 1.
The second question is: sales of which artist's work have resulted in the highest average profit (i.e. the average of the profits made on each sale of works by an artist), and what is that amount?
My SQL query is:
SELECT A1.artistid, A1.firstname FROM
(
SELECT
(salesPrice - AcquisitionPrice) as profit,
w1.artistid as ArtistID
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
on W1.workid = T1.workid
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artistID = TEMP1.artistID
GROUP BY A1.artistid
HAVING MAX(PROFIT) = AVG(PROFIT);
I'm not able to execute it.
I have tried the query below, but I keep getting the error "missing right parenthesis":
SELECT A1.artistid, A1.firstname, TEMP1.avgProfit
FROM
(
SELECT
AVG(salesPrice - AcquisitionPrice) as avgProfit,
W1.artistid as artistid
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
ON W1.workid = T1.workid
GROUP BY artistid
ORDER BY avgProfit DESC
LIMIT 1
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artisid = TEMP1.artistid
Sometimes ORA-00907: missing right parenthesis means exactly that: we have a left bracket without a matching right one. But it can also be thrown by a syntax error in a part of a statement bounded by parentheses.
It's that second cause here: LIMIT is a MySQL clause which Oracle does not recognise. You can use an analytic function here:
SELECT A1.artistid, A1.firstname, TEMP1.avgProfit
FROM
(
select artistid
, avgProfit
, rank() over (order by avgProfit desc) as rnk
from (
SELECT
AVG(salesPrice - AcquisitionPrice) as avgProfit,
W1.artistid as artistid
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
ON W1.workid = T1.workid
GROUP BY artistid
)
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artistid = TEMP1.artistid
where TEMP1.rnk = 1
This uses the RANK() function which will return more than one row if several artists achieve the same average profit. You might want to use ROW_NUMBER() instead. Analytic functions can be very powerful.
You can apply ROW_NUMBER(), RANK() and DENSE_RANK() to any top-n problem. You can use one of them to solve your first problem too.
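To see how the three functions differ on ties, here is a small sketch run in SQLite from Python (an assumption standing in for Oracle; window functions need SQLite 3.25 or later). The `artist_profit` table and its values are invented so that two artists share the top average profit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical pre-aggregated data: artists 1 and 2 tie for the best average.
CREATE TABLE artist_profit (artistid INTEGER PRIMARY KEY, avgprofit REAL);
INSERT INTO artist_profit VALUES (1, 300.0), (2, 300.0), (3, 120.0);
""")

ranked = conn.execute("""
SELECT artistid,
       RANK()       OVER (ORDER BY avgprofit DESC)           AS rnk,
       DENSE_RANK() OVER (ORDER BY avgprofit DESC)           AS dense_rnk,
       ROW_NUMBER() OVER (ORDER BY avgprofit DESC, artistid) AS rn
FROM artist_profit
ORDER BY artistid
""").fetchall()

for row in ranked:
    print(row)
# (1, 1, 1, 1)
# (2, 1, 1, 2)
# (3, 3, 2, 3)
```

Filtering on `rnk = 1` keeps both tied artists, while filtering on `rn = 1` keeps exactly one. The extra `artistid` in the ROW_NUMBER() ordering is only there to make the tie-break deterministic.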
"however the avg profit is null."
That's probably a data issue. If one of the numbers in (salesPrice - AcquisitionPrice) is null the result will be null, and won't be included in the average. If all the rows for an artist are null the AVG() will be null.
By default Oracle sorts NULLs last in ascending order, which means they come first when sorting descending. Because the ORDER BY in the analytic clause sorts by avgProfit DESC, the NULL results end up at rank 1. The solution is to use NULLS LAST in that ORDER BY:
, rank() over (order by avgProfit desc nulls last) as rnk
This will guarantee you a non-null result at the top (providing at least one of your artists has values in both columns).
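The NULL behaviour of AVG() described above can be checked with a few invented rows. This sketch uses SQLite from Python; the `sales` table and its values are assumptions for illustration. It only demonstrates the aggregate, not the sort order, because SQLite's default NULL ordering differs from Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sales: artist 1 has one sale with no sales price,
-- artist 2 has no complete sale at all.
CREATE TABLE sales (artistid INTEGER, salesprice REAL, acquisitionprice REAL);
INSERT INTO sales VALUES
    (1, 500.0, 400.0),
    (1, NULL,  300.0),
    (1, 700.0, 400.0),
    (2, NULL,  250.0);
""")

# salesprice - acquisitionprice is NULL whenever either side is NULL,
# and AVG() skips those rows.
averages = conn.execute("""
SELECT artistid, AVG(salesprice - acquisitionprice) AS avgprofit
FROM sales
GROUP BY artistid
ORDER BY artistid
""").fetchall()

print(averages)  # [(1, 200.0), (2, None)]
```

Artist 1 averages over the two complete sales only (100 and 300), and artist 2 gets a NULL average, which is the row that NULLS LAST pushes out of rank 1.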
1st question - Oracle does not guarantee the order by which rows are retrieved. Hence you must first order and then limit the ordered set.
SELECT * from (
SELECT A1.* From dtoohey.artist A1
INNER JOIN
(
SELECT COUNT(W1.ArtistID) AS COUNTER, artistID FROM dtoohey.trans T1
INNER JOIN dtoohey.work W1
ON W1.workid = T1.Workid
GROUP BY W1.artistID
) TEMP1
ON TEMP1.artistID = A1.artistID
WHERE A1.artistID = TEMP1.artistId
ORDER BY COUNTER desc
) WHERE ROWNUM = 1
2nd question: I believe (haven't tested) that LIMIT 1 is the problem. In Oracle that keyword is used with BULK COLLECT in PL/SQL, not in a plain SQL query.

SQL - Find duplicates with equivalencies

I'm having trouble wrapping my mind around developing this SQL query. Given the following two tables:
ACADEMIC_HISTORY ( STUDENT_ID, TERM, COURSE_ID, COURSE_GRADE )
COURSE_EQUIVALENCIES ( COURSE_ID, COURSE_ID_EQUIVALENT )
What would be the best way to detect if students have taken the same (or an equivalent) course in the past with a passing grade (C or better)?
Example
Student #1 took the course ABC001 and received a grade of C. Ten years later, the course was renamed ABC011 and the appropriate entry was made in COURSE_EQUIVALENCIES. The student retook the course under this new name and received a grade of B. How can I construct a SQL query that will detect the duplicate courses and only count the first passing grade?
(The actual case is significantly more complicated, but this should get me started.)
Thanks in advance.
EDIT:
It's not even necessary to keep or discard any information. A query that simply shows classes with duplicates will be sufficient.
You could use something like:
SELECT
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVELANCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVELANT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID
http://sqlfiddle.com/#!3/d608f/20
Sorry, I originally posted this with a bug: it preferred the grade of the actual course requested over any equivalencies. That is fixed now.
This only looks for one level of equivalencies, but maybe you want to enforce that and make it part of the data entry process: review all possible equivalencies and enter the valid ones.
EDIT: for the first pass of a qualifying course (using numbered terms):
SELECT TOP 1
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.TERM
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVELANCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVELANT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID, TERM
ORDER BY TERM ASC
http://sqlfiddle.com/#!3/fdded/6
(Note: TOP is a T-SQL keyword; for MySQL you need LIMIT.)
The data (in LOWERCASE)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp;
SET search_path='tmp';
CREATE TABLE academic_history
( student_id INTEGER NOT NULL
, course_id CHAR(6)
, course_grade CHAR(1)
, PRIMARY KEY(student_id,course_id)
);
INSERT INTO academic_history ( student_id,course_id,course_grade) VALUES
(1, 'ABC001' , 'C' )
, (1, 'ABC011' , 'B' )
, (2, 'ABC011' , 'A' )
;
CREATE TABLE course_equivalencies
( course_id CHAR(6)
, course_id_equivalent CHAR(6)
);
INSERT INTO course_equivalencies(course_id,course_id_equivalent) VALUES
( 'ABC011' , 'ABC001' )
;
The query:
-- EXPLAIN ANALYZE
WITH canon AS (
SELECT ah.student_id AS student_id
, ah.course_id AS course_id
, COALESCE (eq.course_id_equivalent,ah.course_id) AS course_id_equivalent
FROM academic_history ah
LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
)
SELECT h.student_id
, c.course_id_equivalent
, MIN(h.course_grade) AS the_grade
FROM academic_history h
JOIN canon c ON c.student_id = h.student_id AND c.course_id = h.course_id
GROUP BY h.student_id, c.course_id_equivalent
ORDER BY h.student_id, c.course_id_equivalent
;
The output:
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to table tmp.academic_history
drop cascades to table tmp.course_equivalencies
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "academic_history_pkey" for table "academic_history"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 1
student_id | course_id_equivalent | the_grade
------------+----------------------+-----------
1 | ABC001 | B
2 | ABC001 | A
(2 rows)
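Since the question's edit says that simply showing the duplicates would be enough, the same canonical-course mapping can be reused with a HAVING filter. The sketch below is one possible variant, not part of the original answer: it runs in SQLite from Python rather than PostgreSQL, uses the answer's sample rows, and assumes letter grades where anything up to 'C' counts as passing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE academic_history (
    student_id INTEGER NOT NULL, course_id TEXT, course_grade TEXT,
    PRIMARY KEY (student_id, course_id));
CREATE TABLE course_equivalencies (course_id TEXT, course_id_equivalent TEXT);
INSERT INTO academic_history VALUES
    (1, 'ABC001', 'C'), (1, 'ABC011', 'B'), (2, 'ABC011', 'A');
INSERT INTO course_equivalencies VALUES ('ABC011', 'ABC001');
""")

# Map every course to its canonical id, then keep only the students who
# passed the same canonical course more than once.
duplicates = conn.execute("""
WITH canon AS (
    SELECT ah.student_id,
           ah.course_grade,
           COALESCE(eq.course_id_equivalent, ah.course_id) AS canonical_id
    FROM academic_history ah
    LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
)
SELECT student_id, canonical_id, COUNT(*) AS passing_attempts
FROM canon
WHERE course_grade <= 'C'   -- assumption: grades A, B and C are passing
GROUP BY student_id, canonical_id
HAVING COUNT(*) > 1
ORDER BY student_id, canonical_id
""").fetchall()

print(duplicates)  # [(1, 'ABC001', 2)]
```

Like the query above, this handles only one level of equivalency; a chain of renames would need a recursive CTE or a fully resolved equivalency table.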