SQL - EXIST OR ALL? - sql

I have two tables, Student and Grade.
The Grade table has an attribute Student_Id which references the ID column of the Student table.
How do I find which students have every grade that exists?
In case this is not clear, here is some sample data:
Student ID Name
1 1 John
2 2 Paul
3 3 George
4 4 Mike
5 5 Lisa
Grade Student_Id Course Grade
1 1 Math A
2 1 English B
3 1 Physics C
4 2 Math A
5 2 English A
6 2 Physics B
7 3 Economics A
8 4 Art C
9 5 Biology A
Assume the only grades are A, B and C (no D, E or fail).
I want to find only John, because he has grades A, B and C, while
other students like Paul (2) should not be selected because he does not have grade C. It does not matter which courses a student took; I just need to find whether they have every grade that is available.
I feel like I should use something like EXISTS or ALL in SQL, but I'm not sure.
Please help. Thank you in advance.

I would use GROUP BY and HAVING, but like this:
SELECT s.Name
FROM Student s JOIN
Grade g
ON s.ID = g.Student_Id
GROUP BY s.id, s.Name
HAVING COUNT(DISTINCT g.Grade) = (SELECT COUNT(DISTINCT g2.grade) FROM grade g2);
You say "all the grades out there", so the query should not use a constant for that.
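To try the GROUP BY / HAVING approach against the sample data from the question, here is a minimal sketch that wraps the same query in Python's built-in sqlite3 module. The helper names (build_sample_db, students_with_every_grade) and the column types are assumptions made for illustration; only the query itself comes from the answer above.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Grade (
            Student_Id INTEGER NOT NULL REFERENCES Student(ID),
            Course TEXT NOT NULL,
            Grade TEXT NOT NULL
        );
        INSERT INTO Student (ID, Name) VALUES
            (1, 'John'), (2, 'Paul'), (3, 'George'), (4, 'Mike'), (5, 'Lisa');
        INSERT INTO Grade (Student_Id, Course, Grade) VALUES
            (1, 'Math', 'A'), (1, 'English', 'B'), (1, 'Physics', 'C'),
            (2, 'Math', 'A'), (2, 'English', 'A'), (2, 'Physics', 'B'),
            (3, 'Economics', 'A'), (4, 'Art', 'C'), (5, 'Biology', 'A');
        """
    )
    return conn


def students_with_every_grade(conn):
    """Names of students whose distinct grades cover every grade present in Grade."""
    rows = conn.execute(
        """
        SELECT s.Name
        FROM Student s
        JOIN Grade g ON s.ID = g.Student_Id
        GROUP BY s.ID, s.Name
        HAVING COUNT(DISTINCT g.Grade) =
               (SELECT COUNT(DISTINCT g2.Grade) FROM Grade g2)
        ORDER BY s.Name
        """
    ).fetchall()
    return [name for (name,) in rows]


print(students_with_every_grade(build_sample_db()))
```

Because the right-hand side of the HAVING comparison is computed from the data, the query keeps working if a grade D shows up later.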

You can use HAVING COUNT(DISTINCT Grade) = 3 to check that the student has all 3 grades:
SELECT Name
FROM Student S
JOIN Grade G ON S.ID = G.Student_Id
GROUP BY Name
HAVING COUNT(DISTINCT Grade) = 3
Guessing at S.ID vs S.Student on the join. Not sure what the difference is there.

By using exists
select * from student s
where exists ( select 1
from grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Example
with Student as
(
select 1 as id,'John' as person
union all
select 2 as id,'Paul' as person
union all
select 3 as id,'jorge'
),
Grades as
(
select 1 as Graden, 1 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 2 as Graden, 1 as Student_Id, 'English' as Course, 'B' as Grade
union all
select 3 as Graden, 1 as Student_Id, 'Physics' as Course, 'C' as Grade
union all
select 4 as Graden, 2 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 5 as Graden, 2 as Student_Id, 'English' as Course, 'A' as Grade
union all
select 6 as Graden, 2 as Student_Id, 'Physics' as Course, 'B' as Grade
)
select * from Student s
where exists ( select 1
from Grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Note: I used having count(distinct Grade)=3 because your sample data contains 3 distinct grades.
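Since the question asks about EXISTS specifically, another way to phrase "has every grade" is a double NOT EXISTS: keep a student when there is no grade that the student lacks. This is a different technique from the counting queries above, sketched here with Python's sqlite3 module; the helper names and the letter alias are assumptions for illustration.

```python
import sqlite3

# Keep a student if no distinct grade exists that the student does not have.
EVERY_GRADE_QUERY = """
SELECT s.Name
FROM Student s
WHERE NOT EXISTS (
    SELECT 1
    FROM (SELECT DISTINCT Grade AS letter FROM Grade) AS all_grades
    WHERE NOT EXISTS (
        SELECT 1
        FROM Grade g
        WHERE g.Student_Id = s.ID
          AND g.Grade = all_grades.letter
    )
)
ORDER BY s.Name
"""


def build_sample_db():
    """In-memory copy of the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Grade (Student_Id INTEGER NOT NULL, Course TEXT NOT NULL, Grade TEXT NOT NULL);
        INSERT INTO Student (ID, Name) VALUES
            (1, 'John'), (2, 'Paul'), (3, 'George'), (4, 'Mike'), (5, 'Lisa');
        INSERT INTO Grade (Student_Id, Course, Grade) VALUES
            (1, 'Math', 'A'), (1, 'English', 'B'), (1, 'Physics', 'C'),
            (2, 'Math', 'A'), (2, 'English', 'A'), (2, 'Physics', 'B'),
            (3, 'Economics', 'A'), (4, 'Art', 'C'), (5, 'Biology', 'A');
        """
    )
    return conn


def students_with_every_grade(conn):
    """Run the double NOT EXISTS query and return the matching names."""
    return [name for (name,) in conn.execute(EVERY_GRADE_QUERY)]


print(students_with_every_grade(build_sample_db()))
```

Like the subquery-count version, this needs no hard-coded 3, so it adapts if more grades are added.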

Before delving into the answer, here's a working SQL Fiddle Example so you can see this in action.
As Gordon Linoff points out in his excellent answer, you should use GroupBy and Having Count(Distinct ... ) ... as an easy way to check.
However, I'd recommend changing your design to ensure that you have tables for each concern.
Currently your Grade table holds each student's grade per course. So it's more of a StudentCourse table (i.e. it's the combination of student and course that's unique / gives you that table's natural key). You should have an actual Grade table to give you the list of available grades; e.g.
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
This then allows your query to keep working if you decide to include grades D and E, without having to amend any code. It also means you only have to query a small table to get the complete list of grades, rather than a potentially huge one, which should give better performance. Finally, it helps you maintain good data: because the validation/constraints exist in the database, you can't accidentally end up with students with grade X due to a typo.
select Name from Student s
where s.Id in
(
select sc.StudentId
from StudentCourse sc
group by sc.StudentId
having count(distinct sc.Grade) = (select count(Code) from Grade)
)
order by s.Name
Likewise, it's sensible to create a Course table holding an Id for each course, since holding the full course name in your StudentCourse table (as we're now calling it) uses a lot more space and again lacks validation / constraints. As such, I'd propose amending your database schema to look like this:
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
create table Course
(
Id bigint not null identity(1,1) constraint PK_Course primary key clustered
, Name nvarchar(128) not null constraint UK_Course_Name unique
)
insert Course (Name) values ('Math'),('English'),('Physics'),('Economics'),('Art'),('Biology')
create table Student
(
Id bigint not null identity(1,1) constraint PK_Student primary key clustered
,Name nvarchar(128) not null constraint UK_Student_Name unique
)
set identity_insert Student on --inserting with IDs to ensure the ids of these students match data from your question
insert Student (Id, Name)
values (1, 'John')
, (2, 'Paul')
, (3, 'George')
, (4, 'Mike')
, (5, 'Lisa')
set identity_insert Student off
create table StudentCourse
(
Id bigint not null identity(1,1) constraint PK_StudentCourse primary key
, StudentId bigint not null constraint FK_StudentCourse_StudentId foreign key references Student(Id)
, CourseId bigint not null constraint FK_StudentCourse_CourseId foreign key references Course(Id)
, Grade char /* allow null in case we use this table for pre-results; otherwise make non-null */ constraint FK_StudentCourse_Grade foreign key references Grade(Code)
, Constraint UK_StudentCourse_StudentAndCourse unique clustered (StudentId, CourseId)
)
insert StudentCourse (StudentId, CourseId, Grade)
select s.Id, c.Id, x.Grade
from (values
('John', 'Math', 'A')
,('John', 'English', 'B')
,('John', 'Physics', 'C')
,('Paul', 'Math', 'A')
,('Paul', 'English', 'A')
,('Paul', 'Physics', 'B')
,('George', 'Economics','A')
,('Mike', 'Art', 'C')
,('Lisa', 'Biology', 'A')
) x(Student, Course, Grade)
inner join Student s on s.Name = x.Student
inner join Course c on c.Name = x.Course

Related

Accessing to total number in each second level(Postgres Hierarchical Query Practice)

I was practicing with Postgres and got stuck on something I couldn't find a way to achieve. I have a simple database with the following table:
CREATE TABLE public.department
(
"deptId" integer NOT NULL PRIMARY KEY,
name character varying(30) COLLATE pg_catalog."default" NOT NULL,
"parentId" integer,
"numEmpl" integer NOT NULL,
CONSTRAINT "department_parentId_fkey" FOREIGN KEY ("parentId")
REFERENCES public.department ("deptId") MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
and then I have some data in the table. Short example is
insert into department values (1, 'Headquarter', 1, 10);
insert into department values (2, 'Sales', 1, 15);
insert into department values (3, 'Logistics', 1, 25);
...
I am trying to get the total number of people employed in each second-level department.
I am able to get the number of people employed in each individual department, but according to my search on the internet, what I want is possible with "Hierarchical Queries". Currently, I am using
parentId=1
while querying.
Any solutions for this? Thank you.
Here is one option:
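with recursive cte as (
-- anchor: every second-level department (direct child of the root, department 1)
select "deptId" as rootid, "deptId" as deptid from department where "parentId" = 1 and "deptId" <> 1
union all
-- recursive step: walk down to the descendants of each second-level department
select c.rootid, d."deptId"
from cte c
inner join department d on d."parentId" = c.deptid and d."deptId" <> 1
)
select rootid, count(*) cnt from cte group by rootid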
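The query above counts the departments in each second-level subtree. If the goal is the total number of employees, one possible variant carries numEmpl through the recursion and sums it. The sketch below runs that variant on SQLite through Python's sqlite3 module (SQLite also supports WITH RECURSIVE); departments 4 to 7 are hypothetical rows added so the hierarchy has more than two levels, and the helper names are assumptions.

```python
import sqlite3

# Carry numEmpl through the recursion, then sum it per second-level root.
HEADCOUNT_QUERY = """
WITH RECURSIVE cte (rootid, deptid, numempl) AS (
    SELECT deptId, deptId, numEmpl
    FROM department
    WHERE parentId = 1 AND deptId <> 1
    UNION ALL
    SELECT c.rootid, d.deptId, d.numEmpl
    FROM cte c
    JOIN department d ON d.parentId = c.deptid AND d.deptId <> 1
)
SELECT rootid, SUM(numempl) AS total_empl
FROM cte
GROUP BY rootid
ORDER BY rootid
"""


def build_sample_db():
    """Rows 1-3 come from the question; rows 4-7 are made up for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (
            deptId INTEGER NOT NULL PRIMARY KEY,
            name TEXT NOT NULL,
            parentId INTEGER REFERENCES department (deptId),
            numEmpl INTEGER NOT NULL
        );
        INSERT INTO department VALUES (1, 'Headquarter', 1, 10);
        INSERT INTO department VALUES (2, 'Sales', 1, 15);
        INSERT INTO department VALUES (3, 'Logistics', 1, 25);
        INSERT INTO department VALUES (4, 'Sales East', 2, 5);       -- hypothetical
        INSERT INTO department VALUES (5, 'Sales West', 2, 7);       -- hypothetical
        INSERT INTO department VALUES (6, 'Warehouse', 3, 4);        -- hypothetical
        INSERT INTO department VALUES (7, 'Warehouse Night', 6, 2);  -- hypothetical
        """
    )
    return conn


def headcount_per_second_level(conn):
    """Return (second-level deptId, employees in that department and all below it)."""
    return conn.execute(HEADCOUNT_QUERY).fetchall()


print(headcount_per_second_level(build_sample_db()))
```

The deptId <> 1 filters matter because the root row lists itself as its own parent; without them the recursion would pick up the root again.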

Check for uniqueness within update statement

I have a table that links a class to the students that are in that class:
create table class_student
(class_id int,
student_id int,
constraint class_student_u unique nonclustered (class_id, student_id))
If I want to transfer all the classes from one student to another (remove one student from all the classes he/she is enrolled in and add another student to each of the classes the old student was enrolled in), I use the following query:
update class_student
set student_id = @newStudent
where student_id = @oldStudent
and class_id not in (select class_id
from class_student
where student_id = @newStudent)
delete from class_student
where student_id = @oldStudent
How can I transfer the classes from more than one student to the new student? I can't just put where student_id in (@oldStudent1, @oldStudent2) because if both old students are in the same class, running the above query would violate the unique constraint. Also, I'd like to do the update in as few queries as possible (I could just run the above queries twice, but I'd like to do it in fewer).
I'm using SQL Server 2008 R2.
Edit: To clarify, here's an example:
class_id student_id
===================
1 1
1 2
2 3
3 1
3 3
4 2
4 3
This means that student 1 is in class 1 and 3, student 2 is in class 1 and 4, and student 3 is in class 2, 3, and 4. If I want to transfer all the classes from student 1 to student 3, I would run the following query:
update class_student
set student_id = 3
where student_id = 1
and class_id not in (select class_id
from class_student
where student_id = 3)
delete from class_student
where student_id = 1
Our data would look like this:
class_id student_id
===================
1 3
1 2
2 3
3 3
4 2
4 3
If, instead, I had run this query:
update class_student
set student_id = 3
where student_id in (1, 2)
and class_id not in (select class_id
from class_student
where student_id = 3)
delete from class_student
where student_id in (1, 2)
Ignoring the unique constraint on the table, the data would look like this:
class_id student_id
===================
1 3
1 3
2 3
3 3
4 3
The double (1, 3) record is what I'm trying to avoid, because it will cause a unique constraint violation in the table.
When setting up the original table you should always include a unique row id with which to reference any specific row (please see below the 'identity' column called row_id):
DROP TABLE class_student
create table class_student
(
row_id int identity(1,1),
class_id int,
student_id int,
constraint class_student_u unique nonclustered (class_id, student_id)
)
insert class_student (class_id,student_id) values (1,1)
insert class_student (class_id,student_id) values (1,2)
insert class_student (class_id,student_id) values (2,3)
insert class_student (class_id,student_id) values (3,1)
insert class_student (class_id,student_id) values (3,3)
insert class_student (class_id,student_id) values (4,2)
insert class_student (class_id,student_id) values (4,3)
In a situation where students 1 and 2 are leaving and you are passing any classes they were taking to student 3 (unless student 3 is already attending those classes), the code could look something like this:
WITH CTE
AS
(
SELECT row_id, class_id, student_id,
RN = ROW_NUMBER() OVER (PARTITION BY class_id ORDER BY class_id)
FROM class_student
WHERE student_id in (1,2,3)
)
DELETE FROM class_student
WHERE class_id in (select class_id from class_student group by class_id having count(class_id) > 1)
and student_id <> 3
and row_id not in (select row_id from cte where student_id <> 3 and rn >= 2)
Update class_student set student_id = 3
I am using a 'common table expression' with 'ROW_NUMBER' to number the rows within each class_id. To see this you can run the code below after creating the class_student table and inserting the data (see top) but before you run the CTE code above:
WITH CTE
AS
(
SELECT row_id, class_id, student_id,
RN = ROW_NUMBER() OVER (PARTITION BY class_id ORDER BY class_id)
FROM class_student
WHERE student_id in (1,2,3)
)
SELECT * FROM CTE
Because class_id 1,3 and 4 are duplicated, they have a value of 2 in the RN (Row Number) column.
I'm using this result in the CTE to delete the rows we don't need from the class_student table and this is where the importance of always having a unique row_id can be seen.
The Delete query deletes rows in the class_student table which are Class ID duplicates. In the case of a class attended by both student 3 and one or both of the other students it takes the rows where the Student ID is not 3 (because Student 3 is not leaving).
To do this successfully (without taking rows that we want to retain to be assigned to student 3), it requires (by comparing row_id's) that rows where RN = 2 (i.e. class_id is duplicated) and student_id is not 3 are retained so that we keep one of the rows for Classes that both student 1 and 2 were doing but student 3 was not (i.e. where neither student_id was 3).
Finally, update all remaining rows in the table to a student_id of 3 so that Student 3 gets all the courses.
To see the result you can run:
select * from class_student
I think you'll need at least 2 DML statements to accomplish your goal. And if you really need it to happen in one go, then you can wrap the statements in a stored procedure.
insert into class_student (class_id, student_id)
select distinct class_id, @newStudent
from class_student
where student_id in (@oldStudent1, @oldStudent2)
and class_id not in (select class_id
from class_student
where student_id = @newStudent);
delete from class_student
where student_id in (@oldStudent1, @oldStudent2);
EDIT: Fixed insert to include the "not in" clause.
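To check the insert-then-delete approach against the sample data from the question, here is a small sketch using Python's sqlite3 module instead of SQL Server. The transfer_classes helper and its parameter names are hypothetical, and both statements run in one transaction so the table is never left half-updated.

```python
import sqlite3


def build_sample_db():
    """In-memory copy of the class_student rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE class_student (
            class_id INTEGER,
            student_id INTEGER,
            CONSTRAINT class_student_u UNIQUE (class_id, student_id)
        );
        INSERT INTO class_student (class_id, student_id) VALUES
            (1, 1), (1, 2), (2, 3), (3, 1), (3, 3), (4, 2), (4, 3);
        """
    )
    return conn


def transfer_classes(conn, old_students, new_student):
    """Give new_student every class of old_students, then remove the old students."""
    if not old_students:
        return
    marks = ",".join("?" for _ in old_students)  # one placeholder per old student
    with conn:  # commit both statements together, or roll both back on error
        conn.execute(
            f"""
            INSERT INTO class_student (class_id, student_id)
            SELECT DISTINCT class_id, ?
            FROM class_student
            WHERE student_id IN ({marks})
              AND class_id NOT IN (SELECT class_id
                                   FROM class_student
                                   WHERE student_id = ?)
            """,
            (new_student, *old_students, new_student),
        )
        conn.execute(
            f"DELETE FROM class_student WHERE student_id IN ({marks})",
            tuple(old_students),
        )


conn = build_sample_db()
transfer_classes(conn, [1, 2], 3)
print(conn.execute(
    "SELECT class_id, student_id FROM class_student ORDER BY class_id"
).fetchall())
```

The DISTINCT is what prevents the duplicate (1, 3) row when both old students share a class.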

Postgresql aggregate array

I have two tables
Student
--------
Id Name
1 John
2 David
3 Will
Grade
---------
Student_id Mark
1 A
2 B
2 B+
3 C
3 A
Is it possible to write a native PostgreSQL SELECT that returns results like below:
Name Array of marks
-----------------------
'John', {'A'}
'David', {'B','B+'}
'Will', {'C','A'}
But not like below
Name Mark
----------------
'John', 'A'
'David', 'B'
'David', 'B+'
'Will', 'C'
'Will', 'A'
Use array_agg: http://www.sqlfiddle.com/#!1/5099e/1
SELECT s.name, array_agg(g.Mark) as marks
FROM student s
LEFT JOIN Grade g ON g.Student_id = s.Id
GROUP BY s.Id
By the way, if you are using Postgres 9.1 or later, you don't need to repeat the SELECT columns in GROUP BY, e.g. you don't need to repeat the student name in GROUP BY. You can simply GROUP BY the primary key. If you remove the primary key on student, you need to repeat the student name in GROUP BY.
CREATE TABLE grade
(Student_id int, Mark varchar(2));
INSERT INTO grade
(Student_id, Mark)
VALUES
(1, 'A'),
(2, 'B'),
(2, 'B+'),
(3, 'C'),
(3, 'A');
CREATE TABLE student
(Id int primary key, Name varchar(5));
INSERT INTO student
(Id, Name)
VALUES
(1, 'John'),
(2, 'David'),
(3, 'Will');
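For readers without a Postgres instance at hand, the same grouping idea can be tried on SQLite, with the caveat that SQLite has no array type: the sketch below swaps array_agg for SQLite's group_concat and splits the joined string into a Python list. The marks_per_student helper is an assumption, and the marks are sorted in Python because group_concat does not promise an order.

```python
import sqlite3


def build_sample_db():
    """In-memory copy of the student and grade tables from the answer above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE grade (Student_id INTEGER, Mark TEXT);
        INSERT INTO student (Id, Name) VALUES (1, 'John'), (2, 'David'), (3, 'Will');
        INSERT INTO grade (Student_id, Mark) VALUES
            (1, 'A'), (2, 'B'), (2, 'B+'), (3, 'C'), (3, 'A');
        """
    )
    return conn


def marks_per_student(conn):
    """Map each student name to a sorted list of marks (empty if there are none)."""
    rows = conn.execute(
        """
        SELECT s.Name, group_concat(g.Mark, ',') AS marks
        FROM student s
        LEFT JOIN grade g ON g.Student_id = s.Id
        GROUP BY s.Id, s.Name
        ORDER BY s.Id
        """
    ).fetchall()
    # group_concat yields NULL for a student with no grades (LEFT JOIN).
    return {name: sorted(marks.split(",")) if marks else [] for name, marks in rows}


print(marks_per_student(build_sample_db()))
```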
As I understand it, you can do something like this:
SELECT Student.Name,
STRING_AGG(Grade.Mark, ',' ORDER BY Grade.Mark) As marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
EDIT
I am not sure. But maybe something like this then:
SELECT Student.Name,
    array_to_string(ARRAY_AGG(Grade.Mark),';') As marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
Reference here
You could use the following:
SELECT Student.Name as Name,
(SELECT array(SELECT Mark FROM Grade WHERE Grade.Student_id = Student.Id))
AS ArrayOfMarks
FROM Student
As described here: http://www.mkyong.com/database/convert-subquery-result-to-array/
Michael Buen got it right. I got what I needed using array_agg.
Here just a basic query example in case it helps someone:
SELECT directory, ARRAY_AGG(file_name)
FROM table
WHERE type = 'ZIP'
GROUP BY directory;
And the result was something like:
| parent_directory | array_agg |
+-------------------------+----------------------------------------+
| /home/postgresql/files | {zip_1.zip,zip_2.zip,zip_3.zip} |
| /home/postgresql/files2 | {file1.zip,file2.zip} |
This post also helped me a lot: "Group By" in SQL and Python Pandas.
It basically says that it is more convenient to use only PSQL when possible, but that Python Pandas can be useful to achieve extra functionalities in the filtering process.

How to Insert data into ODD/EVEN rows only in SQL

I have a table with gender as one of its columns.
Only M or F are allowed in the gender column.
Now I want to sort the table so that, when it is displayed, M and F alternate in the gender field.
What I have tried:
I created a new table with the same structure as my existing table.
Using a high-level insert, I want to insert M into the odd rows and F into the even rows.
After that I want to join those two statements using the UNION operator.
I am able to insert only males or only females into the new table, but not into just the even or odd rows.
Can anybody help me with this?
Thanks in advance.
Don't consider a table to be "sorted". The SQL server may return the rows in any order depending on execution plan, index, joins etc. If you want a strict order you need to have an ordered column, like an identity column. Usually it is better to apply the desired sorting when selecting data.
However the interleaving of M and F is a little bit tricky, you need to use the ROW_NUMBER function.
Valid SQL Server code:
CREATE TABLE #GenderTable(
[Name] [nchar](10) NOT NULL,
[Gender] [char](1) NOT NULL
)
-- Create sample data
insert into #GenderTable (Name, Gender) values
('Adam', 'M'),
('Ben', 'M'),
('Casesar', 'M'),
('Alice', 'F'),
('Beatrice', 'F'),
('Cecilia', 'F')
SELECT * FROM #GenderTable
SELECT * FROM #GenderTable
order by ROW_NUMBER() over (partition by gender order by name), Gender
DROP TABLE #GenderTable
This gives the output
Name Gender
Adam M
Ben M
Casesar M
Alice F
Beatrice F
Cecilia F
and
Name Gender
Alice F
Adam M
Beatrice F
Ben M
Cecilia F
Casesar M
If you use another DBMS the syntax may differ.
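As one example of another DBMS, the same ROW_NUMBER idea can be tried on SQLite (window functions need SQLite 3.25 or later) through Python's sqlite3 module. This is only a sketch: the table is a regular table rather than a temp table, the row number is computed in a subquery and then used for sorting, and the helper names are assumptions.

```python
import sqlite3


def build_sample_db():
    """In-memory copy of the sample rows from the SQL Server example above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE GenderTable (Name TEXT NOT NULL, Gender TEXT NOT NULL);
        INSERT INTO GenderTable (Name, Gender) VALUES
            ('Adam', 'M'), ('Ben', 'M'), ('Casesar', 'M'),
            ('Alice', 'F'), ('Beatrice', 'F'), ('Cecilia', 'F');
        """
    )
    return conn


def interleaved_by_gender(conn):
    """Number the rows within each gender, then sort by that number so F and M alternate."""
    return conn.execute(
        """
        SELECT Name, Gender
        FROM (
            SELECT Name, Gender,
                   ROW_NUMBER() OVER (PARTITION BY Gender ORDER BY Name) AS rn
            FROM GenderTable
        )
        ORDER BY rn, Gender
        """
    ).fetchall()


for name, gender in interleaved_by_gender(build_sample_db()):
    print(name, gender)
```

The alternation only holds while both genders still have rows left; once one group runs out, the remaining rows of the other group follow one after another.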
I think the best way to do it would be to have two queries (one for M, one for F) and then join them together. The catch would be you would have to calculate the "rank" of each query and then sort accordingly.
Something like the following should do what you need:
select * from
(select
@rownum:=@rownum+1 rank,
t.*
from people_table t,
(SELECT @rownum:=0) r
where t.gender = 'M'
union
select
@rownum:=@rownum+1 rank,
t.*
from people_table t,
(SELECT @rownum:=0) r
where t.gender = 'F') joined
order by joined.rank, joined.gender;
If you are using SQL Server, you can seed your two tables with an IDENTITY column as follows. Make one odd and one even and then union and sort by this column.
Note that you can only truly alternate if there are the same number of male and female records. If there are more of one than the other, you will end up with non-alternating rows at the end.
CREATE TABLE MaleTable(Id INT IDENTITY(1,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
CREATE TABLE FemaleTable(Id INT IDENTITY(2,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
SELECT u.Id
,u.Gender
FROM (
SELECT Id, Gender
FROM FemaleTable
UNION
SELECT Id, Gender
FROM MaleTable
) u
ORDER BY u.Id ASC
See here for a working example

Add or delete repeated row

I have an output like this:
id name date school school1
1 john 11/11/2001 nyu ucla
1 john 11/11/2001 ucla nyu
2 paul 11/11/2011 uft mit
2 paul 11/11/2011 mit uft
I would like to achieve this:
id name date school school1
1 john 11/11/2001 nyu ucla
2 paul 11/11/2011 mit uft
I am using direct join as in:
select distinct
a.id, a.name,
b.date,
c.school,
a1.id, a1.name,
b1.date,
c1.school
from table a, table b, table c,table a1, table b1, table c1
where
a.id=b.id
and...
Any ideas?
We will need more information such as what your tables contain and what you are after.
One thing I noticed is that you have a school and then a school1. Normalization rules (first normal form) say that you should never duplicate fields and append numbers to them to hold more information, even if you think the relationship will only ever have 1 or 2 additional items. You need to create a second table that associates a user with one to many schools.
I agree with everyone else that both your source table and your desired output are poor design. While you probably can't do anything about your source table, I recommend the following code and output:
Select id, name, date, school from MyTable
union
Select id, name, date, school1 from MyTable;
(repeat as necessary)
This will give you results in the format:
id name date school
1 john 11/11/2001 nyu
1 john 11/11/2001 ucla
2 paul 11/11/2011 mit
2 paul 11/11/2011 uft
(Note: in my version of SQL, union queries automatically select distinct records so the distinct flag isn't needed)
With this format, you could easily count the number of schools per student, number of students per school, etc.
If processing time and/or storage space is a factor here, you could then split this into 2 tables, 1 with the id,name & date, the other with the id & school (basically what JonH just said). But if you're just working up some simple statistics, this should suffice.
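The UNION approach can be checked against the four rows from the question with a short sketch using Python's sqlite3 module. MyTable stands in for the asker's joined output, the column types are assumptions, and an ORDER BY is added only to make the printed order predictable.

```python
import sqlite3


def build_sample_db():
    """MyTable holds the duplicated output rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE MyTable (id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT);
        INSERT INTO MyTable VALUES
            (1, 'john', '11/11/2001', 'nyu', 'ucla'),
            (1, 'john', '11/11/2001', 'ucla', 'nyu'),
            (2, 'paul', '11/11/2011', 'uft', 'mit'),
            (2, 'paul', '11/11/2011', 'mit', 'uft');
        """
    )
    return conn


def one_row_per_school(conn):
    """Stack school and school1 into one column; UNION drops the duplicate rows."""
    return conn.execute(
        """
        SELECT id, name, date, school FROM MyTable
        UNION
        SELECT id, name, date, school1 FROM MyTable
        ORDER BY id, school
        """
    ).fetchall()


for row in one_row_per_school(build_sample_db()):
    print(row)
```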
This problem was just too irresistible, so I took a guess at the data structures that we are dealing with. The technology wasn't specified in the question. This is in Transact-SQL.
create table student
(
id int not null primary key identity,
name nvarchar(100) not null default '',
graduation_date date not null default getdate(),
)
go
create table school
(
id int not null primary key identity,
name nvarchar(100) not null default ''
)
go
create table student_school_asc
(
student_id int not null foreign key references student (id),
school_id int not null foreign key references school (id),
primary key (student_id, school_id)
)
go
insert into student (name, graduation_date) values ('john', '2001-11-11')
insert into student (name, graduation_date) values ('paul', '2011-11-11')
insert into school (name) values ('nyu')
insert into school (name) values ('ucla')
insert into school (name) values ('uft')
insert into school (name) values ('mit')
insert into student_school_asc (student_id, school_id) values (1,1)
insert into student_school_asc (student_id, school_id) values (1,2)
insert into student_school_asc (student_id, school_id) values (2,3)
insert into student_school_asc (student_id, school_id) values (2,4)
select
s.id,
s.name,
s.graduation_date as [date],
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s1 where s1.rank_num = 1) as school,
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s2 where s2.rank_num = 2) as school1
from
student s
Result:
id name date school school1
--- ----- ---------- ------- --------
1 john 2001-11-11 nyu ucla
2 paul 2011-11-11 mit uft
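If the schema cannot be changed and the duplicated rows are always exact mirror images of each other (A/B together with B/A), a much smaller filter may be enough: keep only the row of each pair where school sorts before school1. This is not taken from any of the answers above; it is a sketch under that mirror-pair assumption, run through Python's sqlite3 module, with MyTable as a hypothetical name for the asker's joined output.

```python
import sqlite3


def build_sample_db():
    """MyTable holds the duplicated output rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE MyTable (id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT);
        INSERT INTO MyTable VALUES
            (1, 'john', '11/11/2001', 'nyu', 'ucla'),
            (1, 'john', '11/11/2001', 'ucla', 'nyu'),
            (2, 'paul', '11/11/2011', 'uft', 'mit'),
            (2, 'paul', '11/11/2011', 'mit', 'uft');
        """
    )
    return conn


def drop_mirrored_rows(conn):
    """Keep one row per mirrored pair: the one whose school sorts before school1."""
    return conn.execute(
        """
        SELECT id, name, date, school, school1
        FROM MyTable
        WHERE school < school1
        ORDER BY id
        """
    ).fetchall()


for row in drop_mirrored_rows(build_sample_db()):
    print(row)
```

On the sample rows this keeps nyu/ucla for john and mit/uft for paul, which is the output the question asks for. Rows where both schools are equal, or where only one direction of a pair exists, would need separate handling.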