SQL - Find duplicates with equivalencies - sql

I'm having trouble wrapping my mind around developing this SQL query. Given the following two tables:
ACADEMIC_HISTORY ( STUDENT_ID, TERM, COURSE_ID, COURSE_GRADE )
COURSE_EQUIVALENCIES ( COURSE_ID, COURSE_ID_EQUIVALENT )
What would be the best way to detect if students have taken the same (or an equivalent) course in the past with a passing grade (C or better)?
Example
Student #1 took the course ABC001 and received a grade of C. Ten years later, the course was renamed ABC011 and the appropriate entry was made in COURSE_EQUIVALENCIES. The student retook the course under this new name and received a grade of B. How can I construct a SQL query that will detect the duplicate courses and only count the first passing grade?
(The actual case is significantly more complicated, but this should get me started.)
Thanks in advance.
EDIT:
It's not even necessary to keep or discard any information. A query that simply shows classes with duplicates will be sufficient.

You could use something like:
SELECT
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT STUDENT_ID, COURSE_ID, COURSE_GRADE FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVALENCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVALENT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID
http://sqlfiddle.com/#!3/d608f/20
Sorry, I originally posted this with a bug: it preferred the grade for the requested course over any equivalencies. That is fixed now.
This only looks at one level of equivalencies. You may want to enforce that as part of the data entry process: review all possible equivalencies and enter the valid ones.
EDIT: to get the first pass of a qualifying course (using numbered terms):
SELECT TOP 1
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT STUDENT_ID, COURSE_ID, TERM, COURSE_GRADE FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.TERM
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVALENCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVALENT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID, TERM
ORDER BY TERM ASC
http://sqlfiddle.com/#!3/fdded/6
(Note: TOP is T-SQL syntax; in MySQL you need LIMIT instead.)

The data (in LOWERCASE)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp;
SET search_path='tmp';
CREATE TABLE academic_history
( student_id INTEGER NOT NULL
, course_id CHAR(6)
, course_grade CHAR(1)
, PRIMARY KEY(student_id,course_id)
);
INSERT INTO academic_history ( student_id,course_id,course_grade) VALUES
(1, 'ABC001' , 'C' )
, (1, 'ABC011' , 'B' )
, (2, 'ABC011' , 'A' )
;
CREATE TABLE course_equivalencies
( course_id CHAR(6)
, course_id_equivalent CHAR(6)
);
INSERT INTO course_equivalencies(course_id,course_id_equivalent) VALUES
( 'ABC011' , 'ABC001' )
;
The query:
-- EXPLAIN ANALYZE
WITH canon AS (
SELECT ah.student_id AS student_id
, ah.course_id AS course_id
, COALESCE (eq.course_id_equivalent,ah.course_id) AS course_id_equivalent
FROM academic_history ah
LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
)
SELECT h.student_id
, c.course_id_equivalent
, MIN(h.course_grade) AS the_grade
FROM academic_history h
JOIN canon c ON c.student_id = h.student_id AND c.course_id = h.course_id
GROUP BY h.student_id, c.course_id_equivalent
ORDER BY h.student_id, c.course_id_equivalent
;
The output:
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to table tmp.academic_history
drop cascades to table tmp.course_equivalencies
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "academic_history_pkey" for table "academic_history"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 1
student_id | course_id_equivalent | the_grade
------------+----------------------+-----------
1 | ABC001 | B
2 | ABC001 | A
(2 rows)
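
Since the asker's edit says a query that simply shows the duplicated classes would be enough, here is a minimal sketch of that variant. It runs against an in-memory SQLite database through Python's sqlite3 module purely so it is self-contained; the numbered term column and its values are assumptions, and like the queries above it only follows one level of equivalency and assumes each course has at most one equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE academic_history (
    student_id   INTEGER NOT NULL,
    term         INTEGER NOT NULL,   -- assumption: terms are numbered
    course_id    TEXT    NOT NULL,
    course_grade TEXT    NOT NULL
);
CREATE TABLE course_equivalencies (
    course_id            TEXT NOT NULL,
    course_id_equivalent TEXT NOT NULL
);
INSERT INTO academic_history VALUES
    (1, 1, 'ABC001', 'C'),
    (1, 9, 'ABC011', 'B'),
    (2, 9, 'ABC011', 'A');
INSERT INTO course_equivalencies VALUES ('ABC011', 'ABC001');
""")

DUPLICATES_SQL = """
WITH canon AS (
    -- Map every attempt to a canonical course id (one level only).
    SELECT ah.student_id,
           ah.term,
           COALESCE(eq.course_id_equivalent, ah.course_id) AS canonical_id
    FROM academic_history AS ah
    LEFT JOIN course_equivalencies AS eq ON eq.course_id = ah.course_id
    WHERE ah.course_grade IN ('A', 'B', 'C')   -- passing grades only
)
SELECT student_id,
       canonical_id,
       COUNT(*)  AS passing_attempts,
       MIN(term) AS first_passing_term
FROM canon
GROUP BY student_id, canonical_id
HAVING COUNT(*) > 1
ORDER BY student_id, canonical_id
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
print(duplicates)
```

Only student 1 is reported, because student 2 passed the course once. The first_passing_term column identifies the attempt that should count.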

Related

Make string_agg() return unique values only [duplicate]

I am working in SQL Server 2017 and I have the following two tables:
create table Computer (
Id int Identity(1, 1) not null,
Name varchar(100) not null,
constraint pk_computer primary key (Id)
);
create table HardDisk (
Id int Identity(1, 1) not null,
Interface varchar(100) not null,
ComputerId int not null,
constraint pk_harddisk primary key (Id),
constraint fk_computer_harddisk foreign key (ComputerId) references Computer(Id)
);
I have data such as:
Query
My current query is the following:
-- select query
select c.Id as computer_id,
string_agg(cast(hd.Interface as nvarchar(max)), ' ') as hard_disk_interfaces
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
group by c.Id;
This gets me the following:
computer_id | hard_disk_interfaces
-------------+----------------------
1 | SATA SAS
2 | SATA SAS SAS SAS SATA
However, I only want the distinct values. I'd like to end up with:
computer_id | hard_disk_interfaces
-------------+----------------------
1 | SATA SAS
2 | SATA SAS
I tried to put distinct in front of the string_agg, but that didn't work:
Incorrect syntax near the keyword 'distinct'.
Here's a db-fiddle.
string_agg is missing that feature, so you have to prepare the distinct list you want and then aggregate it:
select id , string_agg(interface,' ') hard_disk_interfaces
from (
select distinct c.id, interface
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
) t group by id
For your original query:
select *
from ....
join (
<query above> ) as temp
...
group by ... , hard_disk_interfaces
A couple of other ways:
;WITH cte AS
(
SELECT c.Id, Interface = CONVERT(varchar(max), hd.Interface)
FROM dbo.Computer AS c
LEFT OUTER JOIN dbo.HardDisk AS hd ON c.Id = hd.ComputerId
GROUP BY c.Id, hd.Interface
)
SELECT Id, STRING_AGG(Interface, ' ')
FROM cte
GROUP BY Id;
or
SELECT c.Id, STRING_AGG(x.Interface, ' ')
FROM dbo.Computer AS c
OUTER APPLY
(
SELECT Interface = CONVERT(varchar(max), Interface)
FROM dbo.HardDisk WHERE ComputerID = c.Id
GROUP BY Interface
) AS x
GROUP BY c.Id;
Example db<>fiddle
If you are getting duplicates in a larger query with more joins, I would argue those duplicates are not duplicates coming out of STRING_AGG(), but rather duplicate rows coming from one or more of your 47 joins, not from this portion of the query. And I would guess that you still get those duplicates even if you leave out this join altogether.
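
The "deduplicate first, aggregate second" pattern from the answers above can be tried out in a small self-contained sketch. This one uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, with group_concat playing the role of STRING_AGG; the computer names and disk rows are hypothetical, chosen to reproduce the duplicated output from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Computer (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE HardDisk (
    Id INTEGER PRIMARY KEY,
    Interface TEXT NOT NULL,
    ComputerId INTEGER NOT NULL REFERENCES Computer(Id)
);
-- Hypothetical rows: computer 2 has several disks with repeated interfaces.
INSERT INTO Computer (Id, Name) VALUES (1, 'pc-1'), (2, 'pc-2');
INSERT INTO HardDisk (Interface, ComputerId) VALUES
    ('SATA', 1), ('SAS', 1),
    ('SATA', 2), ('SAS', 2), ('SAS', 2), ('SAS', 2), ('SATA', 2);
""")

rows = conn.execute("""
SELECT Id, group_concat(Interface, ' ') AS hard_disk_interfaces
FROM (
    -- Step 1: collapse to one row per (computer, interface).
    SELECT DISTINCT c.Id, hd.Interface
    FROM Computer AS c
    LEFT JOIN HardDisk AS hd ON hd.ComputerId = c.Id
) AS t
GROUP BY Id          -- Step 2: aggregate the already-distinct rows.
ORDER BY Id
""").fetchall()

# The order inside each concatenated string is not guaranteed, so compare sets.
interfaces = {computer_id: set((text or "").split()) for computer_id, text in rows}
print(interfaces)
```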

Names of nodes at depth d for every descendant leaf

I have a category hierarchy that products are attached to. That category hierarchy is saved as an adjacency list. Products can be attached to any category nodes at any level. The category hierarchy is a tree.
I would like to...
get the name of every level 3 category...
per product...
where that product is attached to any level 3 category node...
or a descendant of a level 3 node.
I know I can materialize the hierarchy, and from that I've been able to satisfy all requirements but the last. I always lose some products or categories.
Given
CREATE TABLE product (p_id varchar PRIMARY KEY);
CREATE TABLE category (c_id varchar PRIMARY KEY, parent_c_id varchar);
CREATE TABLE product_category (
p_id varchar,
c_id varchar,
PRIMARY KEY (p_id, c_id),
FOREIGN KEY (p_id) REFERENCES product (p_id)
ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY (c_id) REFERENCES category (c_id)
ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO product (p_id) VALUES
('p_01'),
('p_02'),
('p_03'),
('p_04'),
('p_05');
INSERT INTO category (c_id, parent_c_id) VALUES
('c_0_1', NULL),
-- L1
('c_1_1', 'c_0_1'),
('c_1_2', 'c_0_1'),
('c_1_3', 'c_0_1'),
-- L2
('c_2_1', 'c_1_1'),
('c_2_2', 'c_1_1'),
('c_2_3', 'c_1_2'),
('c_2_4', 'c_1_3'),
-- L3
('c_3_1', 'c_2_1'),
('c_3_2', 'c_2_2'),
('c_3_3', 'c_2_3'),
('c_3_4', 'c_2_4'),
-- L4
('c_4_1', 'c_3_1'),
('c_4_2', 'c_3_2'),
('c_4_3', 'c_3_3'),
('c_4_4', 'c_3_4');
INSERT INTO product_category (p_id, c_id) VALUES
-- p_01 explicitly attached to every level in path 1; include.
('p_01', 'c_0_1'),
('p_01', 'c_2_1'),
('p_01', 'c_3_1'),
('p_01', 'c_4_1'),
-- p_02 explicitly attached to desired level in paths 1 and 3; include both.
('p_02', 'c_3_3'),
('p_02', 'c_3_4'),
-- p_03 explicitly attached to super-level in path 3; exclude.
('p_03', 'c_2_4'),
-- p_04 explicitly attached to sub-level in path 1,
-- transitively to desired level in path 1; include.
('p_04', 'c_4_2');
-- p_05 not attached at all.
I would like to end up with something like
p_id | c_id
------+----------------
p_01 | {c_3_1}
p_02 | {c_3_3, c_3_4}
p_04 | {c_3_2}
(3 rows)
but the closest I have gotten is
WITH RECURSIVE category_tree (c_id, parent_c_id, depth, path) AS (
SELECT c_id, parent_c_id, 0 AS depth, ARRAY[]::varchar[]
FROM category
WHERE parent_c_id IS NULL
UNION ALL
SELECT c.c_id, c.parent_c_id, ct.depth + 1, path || c.c_id
FROM category_tree AS ct
INNER JOIN category AS c ON c.parent_c_id = ct.c_id
)
SELECT *
INTO TEMP TABLE t_category_path
FROM category_tree;
SELECT p.p_id, ARRAY_AGG(c_id) category_names
FROM product AS p,
(SELECT DISTINCT t1.c_id, p_id
FROM product_category AS pc
INNER JOIN t_category_path AS t1 ON pc.c_id = t1.c_id
WHERE t1.depth = 3
ORDER BY c_id) x
WHERE p.p_id = x.p_id
GROUP BY p.p_id;
p_id | category_names
------+----------------
p_01 | {c_3_1}
p_02 | {c_3_4,c_3_3}
(2 rows)
The order of categories is irrelevant (I want a set, not a list).
I can tolerate duplicate categories far better than missing categories or products.
I have some liberty to adjust the schema.
> select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 10.12 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
step-by-step demo:db<>fiddle
WITH RECURSIVE cte AS (
SELECT c_id, parent_c_id, 0 as level, NULL AS level3_category
FROM category
WHERE parent_c_id IS NULL
UNION
SELECT
c.c_id,
c.parent_c_id,
cte.level + 1,
CASE -- 1
WHEN cte.level + 1 = 3 THEN c.c_id
ELSE cte.level3_category
END
FROM
category c
JOIN
cte
ON c.parent_c_id = cte.c_id
)
SELECT
p_id,
ARRAY_AGG(DISTINCT level3_category) as c_id -- 2
FROM
cte
JOIN
product_category pc
ON cte.c_id = pc.c_id AND cte.level3_category IS NOT NULL
GROUP BY p_id
1. The CASE clause stores the current category name if and only if the row is at level 3. Above level 3 it stays NULL; below level 3 it carries the level 3 ancestor's value down.
2. DISTINCT is allowed inside aggregates such as ARRAY_AGG to eliminate duplicate values.
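
To check the idea against the sample data from the question, here is a minimal sketch of the same "carry the level 3 ancestor down the tree" technique. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, so the set-building happens in Python instead of ARRAY_AGG(DISTINCT ...); the CTE name tree and the column name level3 are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (c_id TEXT PRIMARY KEY, parent_c_id TEXT);
CREATE TABLE product_category (p_id TEXT, c_id TEXT, PRIMARY KEY (p_id, c_id));
INSERT INTO category VALUES
    ('c_0_1', NULL),
    ('c_1_1', 'c_0_1'), ('c_1_2', 'c_0_1'), ('c_1_3', 'c_0_1'),
    ('c_2_1', 'c_1_1'), ('c_2_2', 'c_1_1'), ('c_2_3', 'c_1_2'), ('c_2_4', 'c_1_3'),
    ('c_3_1', 'c_2_1'), ('c_3_2', 'c_2_2'), ('c_3_3', 'c_2_3'), ('c_3_4', 'c_2_4'),
    ('c_4_1', 'c_3_1'), ('c_4_2', 'c_3_2'), ('c_4_3', 'c_3_3'), ('c_4_4', 'c_3_4');
INSERT INTO product_category VALUES
    ('p_01', 'c_0_1'), ('p_01', 'c_2_1'), ('p_01', 'c_3_1'), ('p_01', 'c_4_1'),
    ('p_02', 'c_3_3'), ('p_02', 'c_3_4'),
    ('p_03', 'c_2_4'),
    ('p_04', 'c_4_2');
""")

rows = conn.execute("""
WITH RECURSIVE tree (c_id, depth, level3) AS (
    SELECT c_id, 0, NULL FROM category WHERE parent_c_id IS NULL
    UNION ALL
    SELECT c.c_id,
           tree.depth + 1,
           -- At depth 3 remember this node; deeper nodes inherit it.
           CASE WHEN tree.depth + 1 = 3 THEN c.c_id ELSE tree.level3 END
    FROM category AS c
    JOIN tree ON c.parent_c_id = tree.c_id
)
SELECT pc.p_id, tree.level3
FROM product_category AS pc
JOIN tree ON tree.c_id = pc.c_id
WHERE tree.level3 IS NOT NULL
""").fetchall()

# Collect a set of level 3 categories per product (order is irrelevant).
level3_by_product = {}
for p_id, level3 in rows:
    level3_by_product.setdefault(p_id, set()).add(level3)
print(level3_by_product)
```

p_03 drops out because it is only attached above level 3, and p_04 is kept through its level 4 attachment.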
You can use exists and not exists with self-joins on category to pick out a particular depth; each extra join climbs one level toward the root. As written, this keeps categories that have a grandparent but no great-grandparent, and it only considers products attached directly at that depth:
select p.p_id, array_agg(pc.c_id)
from product p join
product_category pc
on p.p_id = pc.p_id
where exists (select 1
from category ct join
category ctp
on ct.parent_c_id = ctp.c_id join
category ctp2
on ctp.parent_c_id = ctp2.c_id
where ct.c_id = pc.c_id
) and
not exists (select 1
from category ct join
category ctp
on ct.parent_c_id = ctp.c_id join
category ctp2
on ctp.parent_c_id = ctp2.c_id join
category ctp3
on ctp2.parent_c_id = ctp3.c_id
where ct.c_id = pc.c_id
)
group by p.p_id;

How to differentiate between “no child rows exist” and “no parent row exists” in one SELECT query?

Say I have a table C that references rows from tables A and B:
id, a_id, b_id, ...
and a simple query:
SELECT * FROM C WHERE a_id=X AND b_id=Y
I would like to differentiate between the following cases:
No row exists in A where id = X
No row exists in B where id = Y
Both such rows in A and B exist, but no rows in C exist where a_id = X and b_id = Y
The above query will return empty result in all those cases.
In case of one parent table I could do a LEFT JOIN like:
SELECT * FROM A LEFT JOIN C ON a.id = c.a_id WHERE c.a_id = X
and then check if the result is empty (no row in A exists), has one row with NULL c.id (row in A exists, but no rows in C exist) or 1+ rows with non-NULL c.id (row in A exists and at least one row in C exists). A bit messy but it works, but I was wondering if there is a better way of doing this, especially if there is more than one parent table?
For example:
C is "things owned by people", A is "people", B is "types of things". When someone asks "give me a list of games owned by Bill", and there are no such records in C, I would like to return an empty list only if both "Bill" and "games" exist in their corresponding tables, but an error code if either of them doesn't.
So if there are no records matching "Bill" and "games" in table C, I would like to say "I don't know who Bill is" instead of "Bill has no games" if I don't have a record about Bill in table A.
create table a(a_id integer not null primary key);
create table b(b_id integer not null primary key);
create table c(a_id integer not null references a(a_id)
, b_id integer not null references b(b_id)
, primary key (a_id,b_id)
);
insert into a(a_id) values(0),(2),(4),(6);
insert into b(b_id) values(0),(3),(6);
insert into c(a_id,b_id) values(6,6);
PREPARE omg(integer,integer) AS
SELECT EXISTS(SELECT * FROM a where a.a_id = $1) AS a_exists
, EXISTS(SELECT * FROM b where b.b_id = $2) AS b_exists
, EXISTS(SELECT * FROM c where c.a_id = $1 and c.b_id = $2) AS c_exists
;
EXECUTE omg(1,1);
EXECUTE omg(2,1);
EXECUTE omg(1,3);
EXECUTE omg(6,6);
-- with optional payload:
PREPARE omg2(integer,integer) AS
SELECT val.a_id AS va_id
, val.b_id AS vb_id
, EXISTS(SELECT * FROM a WHERE a.a_id = $1) AS a_exists
, EXISTS(SELECT * FROM b WHERE b.b_id = $2) AS b_exists
, EXISTS(select * FROM c WHERE c.a_id = val.a_id AND c.b_id = val.b_id ) AS c_exists
, a.*
, b.*
, c.*
FROM (values ($1,$2)) val(a_id,b_id)
LEFT JOIN a ON a.a_id = val.a_id
LEFT JOIN b ON b.b_id = val.b_id
LEFT JOIN c ON c.a_id = val.a_id AND c.b_id = val.b_id
;
EXECUTE omg2(1,1);
EXECUTE omg2(2,1);
EXECUTE omg2(1,3);
EXECUTE omg2(6,6);
I think I managed to get a satisfactory solution using the following two features:
Subselect bound to a column, which allows me to check if a row exists and (importantly) get a NULL value otherwise (e.g. SELECT (SELECT id FROM a WHERE id = 1) AS a_id)
Common Table Expressions
Initial data:
CREATE TABLE people
(
id integer not null primary key,
name text not null
);
CREATE TABLE thing_types
(
id integer not null primary key,
name text not null
);
CREATE TABLE things
(
id integer not null primary key,
person_id integer not null references people(id),
thing_type_id integer not null references thing_types(id),
name text not null
);
INSERT INTO people VALUES (1, 'Bill');
INSERT INTO thing_types VALUES (1, 'game');
INSERT INTO things VALUES (1, 1, 1, 'Duke Nukem');
INSERT INTO things VALUES (2, 1, 1, 'Warcraft 2');
And the query:
WITH v AS (
SELECT (SELECT id FROM people WHERE id=<person_id_param>) AS person_id,
(SELECT id FROM thing_types WHERE id=<thing_type_param>) AS thing_type_id
)
SELECT v.person_id, v.thing_type_id, things.name
FROM
v LEFT JOIN things
ON v.person_id = things.person_id AND v.thing_type_id = things.thing_type_id
This query will always return at least one row, and I just need to check which, if any, of the three columns of the first row are NULLs.
If both parent table ids are valid and there are some records, none of them will be NULL:
person_id thing_type_id name
-------------------------------------
1 1 Duke Nukem
1 1 Warcraft 2
If either person_id or thing_type_id is invalid, I get one row where name is NULL and either person_id or thing_type_id is NULL:
person_id thing_type_id name
-------------------------------------
NULL 1 NULL
If both person_id and thing_type_id are valid but there are no records in things, I get one row where both person_id and thing_type_id are not NULL, but the name is NULL:
person_id thing_type_id name
-------------------------------------
1 1 NULL
Since I have a NOT NULL constraint on things.name, I know that this case can only mean that there are no matching records in things. If NULLs were allowed in things.name, I could include things.id instead and check that for NULLness.
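
As a sketch of how calling code might interpret that first row, here is the same query wrapped in a small Python function over an in-memory SQLite database. The function name list_things, the status strings, and the extra person 'Ann' are hypothetical; the query also selects things.id, as suggested above, so the check does not depend on the NOT NULL constraint on name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE thing_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE things (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES people(id),
    thing_type_id INTEGER NOT NULL REFERENCES thing_types(id),
    name TEXT NOT NULL
);
INSERT INTO people VALUES (1, 'Bill'), (2, 'Ann');  -- 'Ann' is a hypothetical extra row
INSERT INTO thing_types VALUES (1, 'game');
INSERT INTO things VALUES (1, 1, 1, 'Duke Nukem'), (2, 1, 1, 'Warcraft 2');
""")

QUERY = """
WITH v AS (
    SELECT (SELECT id FROM people WHERE id = :person_id) AS person_id,
           (SELECT id FROM thing_types WHERE id = :thing_type_id) AS thing_type_id
)
SELECT v.person_id, v.thing_type_id, things.id, things.name
FROM v
LEFT JOIN things
       ON things.person_id = v.person_id
      AND things.thing_type_id = v.thing_type_id
ORDER BY things.id
"""


def list_things(person_id, thing_type_id):
    """Return (status, names); status tells the missing-parent cases apart."""
    rows = conn.execute(
        QUERY, {"person_id": person_id, "thing_type_id": thing_type_id}
    ).fetchall()
    # The CTE always yields one row, so the LEFT JOIN returns at least one row.
    found_person, found_type, first_thing_id, _ = rows[0]
    if found_person is None:
        return "unknown person", []
    if found_type is None:
        return "unknown thing type", []
    if first_thing_id is None:
        return "ok", []  # both parents exist, but there are no child rows
    return "ok", [name for _, _, _, name in rows]


print(list_things(1, 1))
print(list_things(2, 1))
print(list_things(99, 1))
```

If both ids are invalid, this sketch reports the unknown person first; reporting both would need a third status.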
You have three cases. The third is a bit more complex, but it can be handled with a cross join between a and b. All three cases combined in a union could look like this:
select a_id, b_id , 'case 1' from c
where not exists (select 1 from a where a.a_id=c.a_id)
union all
select a_id, b_id ,'case 2' from c
where not exists (select 1 from b where b.b_id=c.b_id)
union all
select a_id, b_id, 'case 3' from a cross join b
where exists (select 1 from c where c.a_id=a.a_id)
and exists (select 1 from c where c.b_id=b.b_id)
and not exists (select 1 from c where c.b_id=b.b_id and c.a_id=a.a_id)

Join on resultant table of another join without using subquery,CTE or temp tables

My question is can we join a table A to resultant table of inner join of table A and B without using subquery, CTE or temp tables ?
I am using SQL Server.
I will explain the situation with an example
There are two tables, GoalScorers and GoalScoredDetails.
GoalScorers
gid Name
-----------
1 A
2 B
3 A
GoalScoredDetails
DetailId gid stadium goals Cards
---------------------------------------------
1 1 X 2 1
2 2 Y 5 2
3 3 Y 2 1
If I select a stadium 'X' (or 'Y'), I expect to get the names of all scorers, whether or not they scored there, along with the aggregated total goals and total cards.
NULL totals are acceptable for names with no goals or no cards.
I can get the result I am expecting with the below query
SELECT
gs.name,
SUM(goals) as TotalGoals,
SUM(cards) as TotalCards
FROM
(SELECT
gid, stadium, goals, cards
FROM
GoalScoredDetails
WHERE
stadium = 'Y') AS vtable
RIGHT OUTER JOIN
GoalScorers AS gs ON vtable.gid = gs.gid
GROUP BY
gs.name
My question is can we get the above result without using a subquery or CTE or temp table ?
Basically what we need to do is OUTER JOIN GoalScorers to resultant virtual table of INNER JOIN OF GoalScorers and GoalScoredDetails.
But I am always faced with ambiguous column name error as "gid" column is present in GoalScorers and also in resultant table. Error persists even if I try to use alias for column names.
I have created a SQL fiddle for this here: http://sqlfiddle.com/#!3/40162/8
SELECT gs.name, SUM(gsd.goals) AS totalGoals, SUM(gsd.cards) AS totalCards
FROM GoalScorers gs
LEFT JOIN GoalScoredDetails gsd ON gsd.gid = gs.gid AND
gsd.Stadium = 'Y'
GROUP BY gs.name;
In other words, you can move your WHERE criteria into the join condition.
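
To see why the filter has to live in the ON clause rather than in WHERE, here is a small side-by-side sketch using the sample rows from the question. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server; the helper name totals is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GoalScorers (gid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE GoalScoredDetails (
    DetailId INTEGER PRIMARY KEY, gid INTEGER, stadium TEXT, goals INTEGER, cards INTEGER
);
INSERT INTO GoalScorers VALUES (1, 'A'), (2, 'B'), (3, 'A');
INSERT INTO GoalScoredDetails VALUES
    (1, 1, 'X', 2, 1), (2, 2, 'Y', 5, 2), (3, 3, 'Y', 2, 1);
""")

# Filter inside ON: unmatched scorers survive the outer join with NULL totals.
FILTER_IN_ON = """
SELECT gs.name, SUM(gsd.goals), SUM(gsd.cards)
FROM GoalScorers AS gs
LEFT JOIN GoalScoredDetails AS gsd
       ON gsd.gid = gs.gid AND gsd.stadium = ?
GROUP BY gs.name
ORDER BY gs.name
"""

# Filter inside WHERE: rows with a NULL stadium are discarded after the join,
# which silently turns the outer join back into an inner join.
FILTER_IN_WHERE = """
SELECT gs.name, SUM(gsd.goals), SUM(gsd.cards)
FROM GoalScorers AS gs
LEFT JOIN GoalScoredDetails AS gsd ON gsd.gid = gs.gid
WHERE gsd.stadium = ?
GROUP BY gs.name
ORDER BY gs.name
"""


def totals(sql, stadium):
    return conn.execute(sql, (stadium,)).fetchall()


in_on = totals(FILTER_IN_ON, "X")
in_where = totals(FILTER_IN_WHERE, "X")
print(in_on)
print(in_where)
```

With the filter in ON, scorer B still appears for stadium 'X' with NULL totals; with the filter in WHERE, B disappears.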
The error Ambiguous column name 'ColumnName' occurs when SQL Server encounters two or more columns with the same name and hasn't been told which to use. You can avoid the error by prefixing your column names with either the full table name, or an alias if provided. For the examples below use the following data:
Sample Data
DECLARE @GoalScorers TABLE
(
gid INT,
Name VARCHAR(1)
)
;
DECLARE @GoalScoredDetails TABLE
(
DetailId INT,
gid INT,
stadium VARCHAR(1),
goals INT,
Cards INT
)
;
INSERT INTO @GoalScorers
(
gid,
Name
)
VALUES
(1, 'A'),
(2, 'B'),
(3, 'A')
;
INSERT INTO @GoalScoredDetails
(
DetailId,
gid,
stadium,
goals,
Cards
)
VALUES
(1, 1, 'x', 2, 1),
(2, 2, 'y', 5, 2),
(3, 3, 'y', 2, 1)
;
In this first example we receive the error, because more than one column is called gid and SQL Server cannot tell which to use.
Failed Example
SELECT
gid
FROM
@GoalScoredDetails AS gsd
RIGHT OUTER JOIN @GoalScorers as gs ON gs.gid = gsd.gid
;
This example works because we explicitly tell SQL Server which gid to return:
Working Example
SELECT
gs.gid
FROM
@GoalScoredDetails AS gsd
RIGHT OUTER JOIN @GoalScorers as gs ON gs.gid = gsd.gid
;
You can, of course, return both:
Example
SELECT
gs.gid,
gsd.gid
FROM
@GoalScoredDetails AS gsd
RIGHT OUTER JOIN @GoalScorers as gs ON gs.gid = gsd.gid
;
In multi table queries I would always recommend prefixing every column name with a table/alias name. This makes the query easier to follow, and reduces the likelihood of this sort of error.

How to make a txt/doc/pdf/html file as we want to view in PostgreSQL

I have the tables and data below.
What I want is to generate a report of all the students, sorted by their names.
--student table
create table student(
sid int,
sname text
);
--student details
insert into student(sid,sname)
values(101,'John'),
(102,'barbie'),
(103,'britney'),
(104,'jackson'),
(105,'abraham')
;
--questions table all the questions for the test
create table questions(
questionid serial,
question text
);
--i have the questions in my table
insert into questions(question)
values('How much is 1+1'),('What is the value of PI'),('Whose dimensions are all equal');
--the test table contains the details of the test attended by every student
create table test(
sno serial,
sid int,
questionid int,
answer text,
marks int
);
--insert into test table the answers and the marks ..should be updated here..
insert into test(sid,questionid,answer,marks)
values(101,1,'2',10),
(102,2,' 3 ',0),
(103,3,' ring ',0),
(104,1,' 1 ',0),
(105,1,' 1 ',0),
(101,2,'3.7',0),
(101,3,' square',10);
My Requirement:
The generated txt/doc/pdf/html file should look like the view below.
Live SQL fiddle demo I tried
Maybe something like this:
copy(
with cte as (
select
s.sid, s.sname,
q.question, t.answer, t.marks,
row_number() over(partition by s.sid order by t.sno) as row_num
from student as s
left outer join test as t on t.sid = s.sid
left outer join questions as q on q.questionid = t.questionid
)
select
case when c.row_num = 1 then c.sid else null end as sid,
case when c.row_num = 1 then c.sname else null end as sname,
c.question, c.answer, c.marks
from cte as c
order by c.sname asc, c.row_num asc
) to 'e:\sample.csv' delimiter ',' csv header;
sql fiddle demo
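
If server-side COPY is not an option (it writes to the database server's file system), the same report shape can be produced client-side. The following is a minimal sketch under a few assumptions: it uses SQLite through Python's sqlite3 module instead of PostgreSQL, a trimmed subset of the sample data, and a case-insensitive sort on the name, which is my own choice. It writes the CSV into an in-memory buffer; swapping the buffer for an open file would produce the actual report file.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER, sname TEXT);
CREATE TABLE questions (questionid INTEGER PRIMARY KEY, question TEXT);
CREATE TABLE test (
    sno INTEGER PRIMARY KEY, sid INTEGER, questionid INTEGER, answer TEXT, marks INTEGER
);
-- Trimmed subset of the sample data from the question.
INSERT INTO student VALUES (101, 'John'), (105, 'abraham');
INSERT INTO questions (question) VALUES
    ('How much is 1+1'), ('What is the value of PI');
INSERT INTO test (sid, questionid, answer, marks) VALUES
    (101, 1, '2', 10), (105, 1, '1', 0), (101, 2, '3.7', 0);
""")

rows = conn.execute("""
WITH cte AS (
    SELECT s.sid, s.sname, q.question, t.answer, t.marks,
           row_number() OVER (PARTITION BY s.sid ORDER BY t.sno) AS row_num
    FROM student AS s
    LEFT JOIN test AS t ON t.sid = s.sid
    LEFT JOIN questions AS q ON q.questionid = t.questionid
)
-- Show sid and sname only on each student's first row.
SELECT CASE WHEN row_num = 1 THEN sid END   AS report_sid,
       CASE WHEN row_num = 1 THEN sname END AS report_sname,
       question, answer, marks
FROM cte
ORDER BY lower(cte.sname), row_num
""").fetchall()

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["sid", "sname", "question", "answer", "marks"])
writer.writerows(rows)   # None values are written as empty fields
report = buffer.getvalue()
print(report)
```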