How to do a narrow to wide transformation of data when there are no key names - sql

I have a data set that looks like this:
school_id | class_id | recess_num | student_id
----------------------------------------------
27 | 6 | 2 | 12
27 | 6 | 2 | 53
27 | 6 | 2 | 23
27 | 6 | 2 | 47
27 | 14 | 2 | 6
27 | 14 | 2 | 51
27 | 14 | 2 | 42
27 | 14 | 2 | 60
The idea is that certain students from different classes go out for recess at the same time. A couple of important points:
The same number of students from each class go out at the same time
The number of students from each class that go out each time is always the same (let's say 4 at a time)
I would like to create a wide table representation of this data where all the students that are out at each recess are captured in a single row. Since the number of students is always the same, I want to create new columns for each of the students:
school_id | class_id | recess_num | student_1 | student_2 | student_3 | student_4
---------------------------------------------------------------------------------
27 | 6 | 2 | 12 | 53 | 23 | 47
27 | 14 | 2 | 6 | 51 | 42 | 60
What is the best way to accomplish this using only a SQL query?

You can do conditional aggregation:
select
school_id,
class_id,
recess_num,
max(case when rn = 1 then student_id end) student_1,
max(case when rn = 2 then student_id end) student_2,
max(case when rn = 3 then student_id end) student_3,
max(case when rn = 4 then student_id end) student_4
from (
select
t.*,
row_number()
over(partition by school_id, class_id, recess_num order by student_id) rn
from mytable t
) t
group by
school_id,
class_id,
recess_num
The inner query ranks students within groups of school/class/recess, ordered by increasing id. Then the outer query pivots the data, using conditional aggregation.
Note that this does not produce exactly the same ordering of students in the columns as your expected result. However, ordering the students by id is a more consistent rule (your expected result does not follow any obvious ordering).
Demo on DB Fiddle:
school_id | class_id | recess_num | student_1 | student_2 | student_3 | student_4
--------: | -------: | ---------: | --------: | --------: | --------: | --------:
27 | 6 | 2 | 12 | 23 | 47 | 53
27 | 14 | 2 | 6 | 42 | 51 | 60
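The pivot above can be tried locally. The following is a minimal sketch that assumes Python's bundled sqlite3 module with SQLite 3.25 or later (needed for window functions); the table name mytable comes from the answer, while pivot_recess and ROWS are hypothetical names introduced only for this illustration.

```python
import sqlite3

# Sample rows from the question: (school_id, class_id, recess_num, student_id)
ROWS = [
    (27, 6, 2, 12), (27, 6, 2, 53), (27, 6, 2, 23), (27, 6, 2, 47),
    (27, 14, 2, 6), (27, 14, 2, 51), (27, 14, 2, 42), (27, 14, 2, 60),
]

PIVOT_SQL = """
select school_id, class_id, recess_num,
       max(case when rn = 1 then student_id end) as student_1,
       max(case when rn = 2 then student_id end) as student_2,
       max(case when rn = 3 then student_id end) as student_3,
       max(case when rn = 4 then student_id end) as student_4
from (
    select t.*,
           row_number() over (
               partition by school_id, class_id, recess_num
               order by student_id
           ) as rn
    from mytable t
) t
group by school_id, class_id, recess_num
order by class_id
"""


def pivot_recess(rows):
    """Load rows into an in-memory table and return the wide representation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable "
        "(school_id int, class_id int, recess_num int, student_id int)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


wide = pivot_recess(ROWS)
for row in wide:
    print(row)
```

Because the slot number comes from row_number(), the query needs one max(case ...) column per slot; it fits here only because the number of students per group is fixed at four.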

-- the student ids are hardcoded, so this only fills the row for class 6
select
school_id,
class_id,
recess_num,
max(case when student_id=12 then student_id end) as student1,
max(case when student_id=53 then student_id end) as student2,
max(case when student_id=23 then student_id end) as student3,
max(case when student_id=47 then student_id end) as student4
from mytable
group by
school_id,
class_id,
recess_num

Related

Write an SQL query to report the students (student_id, student_name) being “quiet” in ALL exams

-- A "quiet" student is one who took at least one exam and scored
neither the high score nor the low score.
-- Write an SQL query to report the students (student_id, student_name)
being "quiet" in ALL exams.
-- Don't return the student who has never taken any exam. Return the result
table ordered by student_id.
-- The query result format is in the following example.
Student table:
-- +-------------+---------------+
-- | student_id | student_name |
-- +-------------+---------------+
-- | 1 | Daniel |
-- | 2 | Jade |
-- | 3 | Stella |
-- | 4 | Jonathan |
-- | 5 | Will |
-- +-------------+---------------+
Exam table:
-- +------------+--------------+-----------+
-- | exam_id | student_id | score |
-- +------------+--------------+-----------+
-- | 10 | 1 | 70 |
-- | 10 | 2 | 80 |
-- | 10 | 3 | 90 |
-- | 20 | 1 | 80 |
-- | 30 | 1 | 70 |
-- | 30 | 3 | 80 |
-- | 30 | 4 | 90 |
-- | 40 | 1 | 60 |
-- | 40 | 2 | 70 |
-- | 40 | 4 | 80 |
-- +------------+--------------+-----------+
Result table:
-- +-------------+---------------+
-- | student_id | student_name |
-- +-------------+---------------+
-- | 2 | Jade |
-- +-------------+---------------+
Is my solution correct?
--My Solution
Select Student_id, Student_name
From (
Select
B.Student_id,
A.Student_name,
Score,
Max(Score) Over (Partition by Exam_id) score_max,
Max(Score) Over (Partition by Exam_id) score_min
From
Student A, Exam B
Where
A.Student_ID = B.Student_ID
) T
Where
Score != Max_score or Score != Min_Score
Group by
student_id, student_name
Having
Count(*) = (Select distinct count(exam_id) from exam)
Order by
A.student_id
Your approach is on the right track, but the query needs two changes.
Change Max to Min in score_min:
// ...
min(score) over (partition by exam_id) score_min,
max(score) over (partition by exam_id) score_max
// ...
The HAVING clause should compare against the number of exams that student took, like this:
having count(1) =
(select count(distinct exam_id) from exam t2
where t1.student_id = t2.student_id)
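A minimal sketch of the corrected query, run against the sample data with Python's sqlite3 module (assuming SQLite 3.25+ for window functions). Two things here go beyond the answer and are my own assumptions: the column names in the outer filter are aligned with the aliases score_min and score_max, and the two comparisons are joined with AND, since a score has to differ from both extremes to count as "quiet". The names quiet_students, STUDENTS and EXAMS are hypothetical.

```python
import sqlite3

STUDENTS = [(1, "Daniel"), (2, "Jade"), (3, "Stella"), (4, "Jonathan"), (5, "Will")]
# (exam_id, student_id, score)
EXAMS = [
    (10, 1, 70), (10, 2, 80), (10, 3, 90),
    (20, 1, 80),
    (30, 1, 70), (30, 3, 80), (30, 4, 90),
    (40, 1, 60), (40, 2, 70), (40, 4, 80),
]

QUIET_SQL = """
select student_id, student_name
from (
    select e.student_id, s.student_name, e.score,
           min(e.score) over (partition by e.exam_id) as score_min,
           max(e.score) over (partition by e.exam_id) as score_max
    from student s
    join exam e on s.student_id = e.student_id
) t
-- keep only the exams where the student was neither lowest nor highest
where score <> score_min and score <> score_max
group by student_id, student_name
-- the student must have been "quiet" in every exam they took
having count(*) = (select count(distinct e2.exam_id)
                   from exam e2
                   where e2.student_id = t.student_id)
order by student_id
"""


def quiet_students(students, exams):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table student (student_id int, student_name text)")
    conn.execute("create table exam (exam_id int, student_id int, score int)")
    conn.executemany("insert into student values (?, ?)", students)
    conn.executemany("insert into exam values (?, ?, ?)", exams)
    result = conn.execute(QUIET_SQL).fetchall()
    conn.close()
    return result


quiet = quiet_students(STUDENTS, EXAMS)
print(quiet)
```

Students with no exam rows never reach the outer query because of the inner join, which covers the "never taken any exam" requirement.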

Grouping when using analytic functions

Let's suppose we have a table that looks like this:
Level|Depth|Descrip|
0 | 0 | Base |
1 | 50 | Level_1 |
2 | 53 | Level_2 |
3 | 60 | Level_3 |
8 | 80 | Level_8 |
10 | 81 | Level_10|
15 | 101 | Level_15|
16 | 102 | Level_16|
17 | 102 | Level_16_bis|
18 | 103 | Level_17|
First, I need to get the rows that represent a significant depth jump (more than 15 m) relative to the previous row. I get those rows with something like this:
Select level,depth, descrip from(
Select level
, depth
,lag(depth) over (order by level asc) as prev_depth
, descrip
from ground_levels
)
Where abs(depth-prev_depth) > 15 and depth > 0
Which gives me a table like this:
Level|Depth|Descrip|
1 | 50 | Level_1|
8 | 80 | Level_8|
15 | 101 | Level_15|
Now I need to collect the levels that fall between the jumps. So I need something like this:
Level|Depth| Descrip | Equivalent_levels |
1 | 50 | Level_1 | 2,3 |
8 | 80 | Level_8 | 10 |
15 | 101 | Level_15| 16,17,18 |
I have been searching for how to use listagg, rank() and other analytic functions, but I'm stuck with the script :(
In addition, it would be great if I could start a new group whenever this condition is met: abs(depth-prev_depth) > 15, so that I get something like this:
Level|Depth|Descrip | Group_ID
1 | 50 | Level_1 | 1 |
2 | 53 | Level_2 | 1 |
3 | 60 | Level_3 | 1 |
8 | 80 | Level_8 | 2 |
10 | 81 | Level_10| 2 |
15 | 101 | Level_15| 3 |
16 | 102 | Level_16| 3 |
17 | 102 | Level_16_bis| 3 |
18 | 103 | Level_17| 3 |
Any ideas ??
P.S: Sorry my bad english...
You can use a cumulative sum to define the groups. And then aggregation:
Select min(level) as level,
min(depth) keep (dense_rank first order by level) as depth,
min(descrip) keep (dense_rank first order by level) as descrip,
listagg(level, ',') within group (order by level) as levels
from (select gl.*,
sum(case when abs(prev_depth - depth) > 15 and depth > 0 then 1 else 0 end) over (order by level) as grp
from (select gl.*, lag(depth) over (order by level asc) as prev_depth
from ground_levels gl
) gl
) gl
group by grp;
This actually keeps the starting level in the list. It can be removed, but that requires a bit more work.
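The cumulative-sum step is the core of this answer, so here is a small sketch of just that part, producing the Group_ID table from the question. It assumes Python's sqlite3 module with SQLite 3.25+ rather than Oracle, so keep (dense_rank first ...) and listagg are left out; the where grp > 0 filter that drops the Base row and the name group_levels are my own additions.

```python
import sqlite3

# (level, depth, descrip) rows from the question
LEVELS = [
    (0, 0, "Base"), (1, 50, "Level_1"), (2, 53, "Level_2"), (3, 60, "Level_3"),
    (8, 80, "Level_8"), (10, 81, "Level_10"), (15, 101, "Level_15"),
    (16, 102, "Level_16"), (17, 102, "Level_16_bis"), (18, 103, "Level_17"),
]

GROUP_SQL = """
with lagged as (
    select level, depth, descrip,
           lag(depth) over (order by level) as prev_depth
    from ground_levels
),
grouped as (
    select level, depth, descrip,
           -- each jump of more than 15 adds 1 to the running total,
           -- so all rows up to the next jump share the same grp value
           sum(case when abs(depth - prev_depth) > 15 and depth > 0
                    then 1 else 0 end) over (order by level) as grp
    from lagged
)
select level, depth, descrip, grp
from grouped
where grp > 0          -- rows before the first jump (Base) have grp = 0
order by level
"""


def group_levels(levels):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table ground_levels (level int, depth int, descrip text)")
    conn.executemany("insert into ground_levels values (?, ?, ?)", levels)
    result = conn.execute(GROUP_SQL).fetchall()
    conn.close()
    return result


groups = group_levels(LEVELS)
for row in groups:
    print(row)
```

Aggregating this result by grp (with listagg in Oracle, or group_concat in SQLite) then gives one row per jump.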

Only selecting certain rows based on calculated value

I have a table similar to:
-----------------------
|Student| Month| GPA |
---------------------
| 1 | 1 | 70 |
| 1 | 2 | 70 |
| 1 | 3 | 75 |
| 2 | 1 | 80 |
| 2 | 2 | 72 |
| 2 | 3 | 72 |
What I want, is to calculate the GPA change, per month, per student - only selecting rows where an actual change was observed. My desired output is:
-----------------------
|Student| Month| GPA |
---------------------
| 1 | 3 | 1.071|
| 2 | 2 | 0.9 |
So far I have the following query (simplified, but similar):
SELECT
Student,
Month,
GPA,
Change =
CASE
WHEN LAG(GPA, 1) OVER (ORDER BY Student, Month) !> 0
THEN 1
WHEN Student != LAG(Student, 1) OVER (ORDER BY Student, Month)
THEN 1
ELSE GPA/LAG(GPA, 1) OVER (ORDER BY Student, Month)
END
FROM students
ORDER BY Student, Month;
The output I receive from this is:
---------------------------------
|Student| Month| GPA | Change|
---------------------------------
| 1 | 1 | 70 | 1 |
| 1 | 2 | 70 | 1 |
| 1 | 3 | 75 | 1.071|
| 2 | 1 | 80 | 1 |
| 2 | 2 | 72 | 0.9 |
| 2 | 3 | 72 | 1 |
I believe a sub-query is needed to only select rows where Change != 1, but I'm unsure how to implement this correctly here.
You seem to want:
select s.*,
gpa / nullif(prev_gpa, 0) -- I suppose a 0 gpa is possible
from (select s.*,
lag(gpa) over (partition by student order by month) as prev_gpa
from s
) s
where prev_gpa is not null and prev_gpa <> gpa;
Very similar to Gordon's, but takes advantage of the optional 3rd parameter to LAG to use the current row's GPA when there is no previous (to yield no change).
SELECT *
FROM (
SELECT Student, Month, GPA
, Change = GPA / LAG(GPA, 1, GPA) OVER (PARTITION BY Student ORDER BY Month)
FROM students
) AS subQ
WHERE Change != 1.0
ORDER BY Student, Month
;
Edit: I'm not sure what the minimum GPA value could be, but it is best to be aware that a previous GPA of 0 would cause a divide by zero error.
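Both answers rely on computing the previous GPA per student in a subquery and filtering on it in the outer query. A minimal sketch of that pattern against the sample data, assuming Python's sqlite3 module with SQLite 3.25+; the multiplication by 1.0 (to avoid integer division on integer columns), the explicit prev_gpa <> 0 guard, and the name gpa_changes are assumptions of this sketch.

```python
import sqlite3

# (student, month, gpa) rows from the question
GPAS = [(1, 1, 70), (1, 2, 70), (1, 3, 75), (2, 1, 80), (2, 2, 72), (2, 3, 72)]

CHANGE_SQL = """
select student, month, gpa * 1.0 / prev_gpa as ratio
from (
    select student, month, gpa,
           -- previous month's GPA for the same student; NULL for the first month
           lag(gpa) over (partition by student order by month) as prev_gpa
    from students
) s
where prev_gpa is not null
  and prev_gpa <> 0        -- skip rows that would divide by zero
  and prev_gpa <> gpa      -- keep only months where the GPA actually changed
order by student, month
"""


def gpa_changes(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table students (student int, month int, gpa int)")
    conn.executemany("insert into students values (?, ?, ?)", rows)
    result = conn.execute(CHANGE_SQL).fetchall()
    conn.close()
    return result


changes = gpa_changes(GPAS)
for student, month, ratio in changes:
    print(student, month, round(ratio, 3))
```

The window function cannot be referenced directly in a WHERE clause at the same query level, which is why the subquery is needed.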

How to print the students name in this query?

The concerned tables are as follows:
students(rollno, name, deptcode)
depts(deptcode, deptname)
course(crs_rollno, crs_name, marks)
The query is
Find the name and roll number of the students from each department who obtained
highest total marks in their own department.
Consider:
i) Courses of different department are different.
ii) All students of a particular department take same number and same courses.
Then only the query makes sense.
I wrote a successful query for displaying the maximum total marks by a student in each department.
select do.deptname, max(x.marks) from students so
inner join depts do
on do.deptcode=so.deptcode
inner join(
select s.name as name, d.deptname as deptname, sum(c.marks) as marks from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name and x.deptname=do.deptname group by do.deptname;
But as mentioned I need to display the name as well. Accordingly if I include so.name in select list, I need to include it in group by clause and the output is as below:
Kendra Summers Computer Science 274
Stewart Robbins English 80
Cole Page Computer Science 250
Brian Steele English 83
expected output:
Kendra Summers Computer Science 274
Brian Steele English 83
Where is the problem?
This can be achieved easily with a window function:
select name, deptname, marks
from (select s.name as name, d.deptname as deptname, sum(c.marks) as marks,
row_number() over(partition by d.deptname order by sum(c.marks) desc) rn
from students s
inner join crs_regd c on s.rollno=c.crs_rollno
inner join depts d on d.deptcode=s.deptcode
group by s.name,d.deptname) x
where rn = 1;
To solve the problem with a readable query I had to define a couple of views:
total_marks: For each student the sum of their marks
create view total_marks as select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno;
dept_max: For each department the highest total score by a single student of that department
create view dept_max as select deptcode, max(total) max_total from total_marks group by deptcode;
So I can get the desired output with the query
select a.deptcode, a.rollno, a.name from total_marks a join dept_max b on a.deptcode = b.deptcode and a.total = b.max_total
If you don't want to use views you can replace their selects on the final query, which will result in this:
select a.deptcode, a.rollno, a.name
from
(select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a
join (select deptcode, max(total) max_total from (select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a_ group by deptcode) b
on a.deptcode = b.deptcode and a.total = b.max_total
I'm sure someone more skilled than me could easily improve its performance...
If you (and anybody else) want to try it the way I did, here is the schema:
create table depts ( deptcode int primary key auto_increment, deptname varchar(20) );
create table students ( rollno int primary key auto_increment, name varchar(20) not null, deptcode int, foreign key (deptcode) references depts(deptcode) );
create table course ( crs_rollno int, crs_name varchar(20), marks int, foreign key (crs_rollno) references students(rollno) );
And here are all the entries I inserted:
insert into depts (deptname) values ("Computer Science"),("Biology"),("Fine Arts");
insert into students (name,deptcode) values ("Turing",1),("Jobs",1),("Tanenbaum",1),("Darwin",2),("Mendel",2),("Bernard",2),("Picasso",3),("Monet",3),("Van Gogh",3);
insert into course (crs_rollno,crs_name,marks) values
(1,"Algorithms",25),(1,"Database",28),(1,"Programming",29),(1,"Calculus",30),
(2,"Algorithms",24),(2,"Database",22),(2,"Programming",28),(2,"Calculus",19),
(3,"Algorithms",21),(3,"Database",27),(3,"Programming",23),(3,"Calculus",26),
(4,"Zoology",22),(4,"Botanics",28),(4,"Chemistry",30),(4,"Anatomy",25),(4,"Pharmacology",27),
(5,"Zoology",29),(5,"Botanics",27),(5,"Chemistry",26),(5,"Anatomy",25),(5,"Pharmacology",24),
(6,"Zoology",18),(6,"Botanics",19),(6,"Chemistry",22),(6,"Anatomy",23),(6,"Pharmacology",24),
(7,"Sculpture",26),(7,"History",25),(7,"Painting",30),
(8,"Sculpture",29),(8,"History",24),(8,"Painting",30),
(9,"Sculpture",21),(9,"History",19),(9,"Painting",25) ;
Those inserts will load these data:
select * from depts;
+----------+------------------+
| deptcode | deptname |
+----------+------------------+
| 1 | Computer Science |
| 2 | Biology |
| 3 | Fine Arts |
+----------+------------------+
select * from students;
+--------+-----------+----------+
| rollno | name | deptcode |
+--------+-----------+----------+
| 1 | Turing | 1 |
| 2 | Jobs | 1 |
| 3 | Tanenbaum | 1 |
| 4 | Darwin | 2 |
| 5 | Mendel | 2 |
| 6 | Bernard | 2 |
| 7 | Picasso | 3 |
| 8 | Monet | 3 |
| 9 | Van Gogh | 3 |
+--------+-----------+----------+
select * from course;
+------------+--------------+-------+
| crs_rollno | crs_name | marks |
+------------+--------------+-------+
| 1 | Algorithms | 25 |
| 1 | Database | 28 |
| 1 | Programming | 29 |
| 1 | Calculus | 30 |
| 2 | Algorithms | 24 |
| 2 | Database | 22 |
| 2 | Programming | 28 |
| 2 | Calculus | 19 |
| 3 | Algorithms | 21 |
| 3 | Database | 27 |
| 3 | Programming | 23 |
| 3 | Calculus | 26 |
| 4 | Zoology | 22 |
| 4 | Botanics | 28 |
| 4 | Chemistry | 30 |
| 4 | Anatomy | 25 |
| 4 | Pharmacology | 27 |
| 5 | Zoology | 29 |
| 5 | Botanics | 27 |
| 5 | Chemistry | 26 |
| 5 | Anatomy | 25 |
| 5 | Pharmacology | 24 |
| 6 | Zoology | 18 |
| 6 | Botanics | 19 |
| 6 | Chemistry | 22 |
| 6 | Anatomy | 23 |
| 6 | Pharmacology | 24 |
| 7 | Sculpture | 26 |
| 7 | History | 25 |
| 7 | Painting | 30 |
| 8 | Sculpture | 29 |
| 8 | History | 24 |
| 8 | Painting | 30 |
| 9 | Sculpture | 21 |
| 9 | History | 19 |
| 9 | Painting | 25 |
+------------+--------------+-------+
I'll take this chance to point out that this database is badly designed. This becomes evident with the course table, for these reasons:
The name is singular
This table does not represent courses, but rather exams or scores
crs_name should be a foreign key referencing the primary key of another table (that would actually represent the courses)
There are no constraints to limit the marks to a range or to prevent a student from taking the same exam twice
I find it more logical to associate courses with departments, instead of students with departments (this would also make these queries easier)
I tell you this because I understood you are learning from a book, so unless the book at one point says "this database is poorly designed", do not take this exercise as an example for designing your own!
Anyway, if you manually resolve the query with my data you will get these results:
+----------+--------+---------+
| deptcode | rollno | name |
+----------+--------+---------+
| 1 | 1 | Turing |
| 2 | 4 | Darwin |
| 3 | 8 | Monet |
+----------+--------+---------+
As further reference, here are the contents of the views I needed to define:
select * from total_marks;
+----------+-----------+--------+-------+
| deptcode | name | rollno | total |
+----------+-----------+--------+-------+
| 1 | Turing | 1 | 112 |
| 1 | Jobs | 2 | 93 |
| 1 | Tanenbaum | 3 | 97 |
| 2 | Darwin | 4 | 132 |
| 2 | Mendel | 5 | 131 |
| 2 | Bernard | 6 | 106 |
| 3 | Picasso | 7 | 81 |
| 3 | Monet | 8 | 83 |
| 3 | Van Gogh | 9 | 65 |
+----------+-----------+--------+-------+
select * from dept_max;
+----------+-----------+
| deptcode | max_total |
+----------+-----------+
| 1 | 112 |
| 2 | 132 |
| 3 | 83 |
+----------+-----------+
Hope I helped!
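For anyone who wants to check the view-based approach without a MySQL server, here is a minimal sketch using Python's sqlite3 module. It assumes the same schema and inserts as above, expresses the two views as common table expressions with the same names, and groups by all selected columns instead of only rollno; top_per_department, STUDENTS and COURSE are hypothetical names.

```python
import sqlite3

# (rollno, name, deptcode)
STUDENTS = [
    (1, "Turing", 1), (2, "Jobs", 1), (3, "Tanenbaum", 1),
    (4, "Darwin", 2), (5, "Mendel", 2), (6, "Bernard", 2),
    (7, "Picasso", 3), (8, "Monet", 3), (9, "Van Gogh", 3),
]
# rollno -> marks of that student, in the order of the inserts above
MARKS = {
    1: [25, 28, 29, 30], 2: [24, 22, 28, 19], 3: [21, 27, 23, 26],
    4: [22, 28, 30, 25, 27], 5: [29, 27, 26, 25, 24], 6: [18, 19, 22, 23, 24],
    7: [26, 25, 30], 8: [29, 24, 30], 9: [21, 19, 25],
}
COURSE = [(rollno, m) for rollno, marks in MARKS.items() for m in marks]

TOP_SQL = """
with total_marks as (      -- sum of marks per student
    select s.deptcode, s.rollno, s.name, sum(c.marks) as total
    from course c
    join students s on s.rollno = c.crs_rollno
    group by s.deptcode, s.rollno, s.name
),
dept_max as (              -- best total per department
    select deptcode, max(total) as max_total
    from total_marks
    group by deptcode
)
select a.deptcode, a.rollno, a.name, a.total
from total_marks a
join dept_max b on a.deptcode = b.deptcode and a.total = b.max_total
order by a.deptcode
"""


def top_per_department(students, course):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table students (rollno int, name text, deptcode int)")
    conn.execute("create table course (crs_rollno int, marks int)")
    conn.executemany("insert into students values (?, ?, ?)", students)
    conn.executemany("insert into course values (?, ?)", course)
    result = conn.execute(TOP_SQL).fetchall()
    conn.close()
    return result


top = top_per_department(STUDENTS, COURSE)
for row in top:
    print(row)
```

Joining on the maximum returns every student who ties for the top total in a department, whereas a row_number() filter would keep only one of them.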
Try the following query
select t.name, t.deptname, t.marks
from (select s.name as name
, d.deptname as deptname
, sum(c.marks) as marks
from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) t
where (t.deptname,t.marks) in (select do.deptname, max(x.marks)
from students so
inner join depts do
on do.deptcode=so.deptcode
inner join (select s.name as name
, d.deptname as deptname
, sum(c.marks) as marks
from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name
and x.deptname=do.deptname
group by do.deptname
)
The subquery fetches each department name with its maximum total marks, and the outer query compares every student's total against it to get the name of the corresponding student.
Try it and let me know if you get the desired result.
The Dense_Rank() function would be helpful in this scenario:
SELECT subquery.*
FROM (SELECT Student_Total_Marks.rollno,
Student_Total_Marks.name,
Student_Total_Marks.deptcode, depts.deptname,
dense_rank() over (partition by Student_Total_Marks.deptcode order by Student_Total_Marks.total_marks desc) Student_Rank
FROM (SELECT Stud.rollno,
Stud.name,
Stud.deptcode,
sum(course.marks) total_marks
FROM students stud inner join course course on stud.rollno = course.crs_rollno
GROUP BY stud.rollno,Stud.name,Stud.deptcode) Student_Total_Marks,
depts
WHERE Student_Total_Marks.deptcode = depts.deptcode) subquery
WHERE subquery.student_rank = 1

How to apply TOP statement to only 1 column while selecting multiple columns from a table?

I am trying to select multiple columns from a table, but I want to select top certain number of records based on one column. I tried this :
select roll_no ,marks as Percentage
from database
where marks in (select top (3) *
from database
where subject = ''
order by marks desc) order by percentage desc
and I am getting the error:
Only one expression can be specified in the select list when the
sub-query is not introduced with EXISTS or more than specified number
of records.
I also tried :
select roll_no ,marks as Percentage
from database
where marks in (select top (3) marks
from database
where subject = ''
order by marks desc) order by percentage desc
which returns the right result for some subjects, but for others it also displays top marks from other subjects.
e.g.:
+---------+-------+
| roll_no | marks |
+---------+-------+
|10003 | 87 |
|10006 | 72 |
|10003 | 72 |
|10002 | 67 |
|10004 | 67 |
+---------+-------+
How to frame the query correctly?
sample data :
+---------+-------+---------+
| roll_no | marks |subject |
+---------+-------+---------+
|10001 | 45 | Maths |
|10001 | 72 | Science |
|10001 | 64 | English |
|10002 | 52 | Maths |
|10002 | 35 | Science |
|10002 | 75 | English |
|10003 | 52 | Maths |
|10003 | 35 | Science |
|10003 | 75 | English |
|10004 | 52 | Maths |
|10004 | 35 | Science |
|10004 | 75 | English |
+---------+-------+---------+
If I'm right and you are looking for the best 3 marks for each subject, then you can get it with the following:
DECLARE @SelectedSubject VARCHAR(50) = 'Maths'
;WITH FilteredSubjectMarks AS
(
SELECT
D.Subject,
D.Roll_no,
D.Marks,
MarksRanking = DENSE_RANK() OVER (ORDER BY D.Marks DESC)
FROM
[Database] AS D
WHERE
D.Subject = @SelectedSubject
)
SELECT
F.*
FROM
FilteredSubjectMarks AS F
WHERE
F.MarksRanking <= 3
You can use window functions to rank your marks column (specifically dense_rank, which allows duplicate rankings whilst retaining sequential numbering) and then return all rows with a rank of 3 or less:
declare @t table(roll_no int identity(1,1),marks int);
insert into @t(marks) values(2),(4),(5),(8),(6),(1),(3),(2),(1),(8);
with t as
(
select roll_no
,marks
,dense_rank() over (order by marks desc) as r
from @t
)
select *
from t
where r <= 3;
Output:
+---------+-------+---+
| roll_no | marks | r |
+---------+-------+---+
| 4 | 8 | 1 |
| 10 | 8 | 1 |
| 5 | 6 | 2 |
| 3 | 5 | 3 |
+---------+-------+---+
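If the top marks are wanted for every subject at once rather than for one selected subject, the ranking can be partitioned by subject. This is a hedged sketch, not part of either answer: it uses Python's sqlite3 module (SQLite 3.25+) instead of SQL Server, and the table name marks_table and the function top_marks are hypothetical.

```python
import sqlite3

# (roll_no, marks, subject) rows from the question's sample data
SAMPLE = [
    (10001, 45, "Maths"), (10001, 72, "Science"), (10001, 64, "English"),
    (10002, 52, "Maths"), (10002, 35, "Science"), (10002, 75, "English"),
    (10003, 52, "Maths"), (10003, 35, "Science"), (10003, 75, "English"),
    (10004, 52, "Maths"), (10004, 35, "Science"), (10004, 75, "English"),
]

TOP_N_SQL = """
select subject, roll_no, marks
from (
    select subject, roll_no, marks,
           -- rank restarts at 1 for every subject; tied marks share a rank
           dense_rank() over (partition by subject order by marks desc) as r
    from marks_table
) ranked
where r <= ?
order by subject, marks desc, roll_no
"""


def top_marks(rows, n):
    """Return rows whose mark is among the n highest distinct marks of its subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table marks_table (roll_no int, marks int, subject text)")
    conn.executemany("insert into marks_table values (?, ?, ?)", rows)
    result = conn.execute(TOP_N_SQL, (n,)).fetchall()
    conn.close()
    return result


best = top_marks(SAMPLE, 1)
for row in best:
    print(row)
```

Because the filter is on the rank within each subject and not on the mark value itself, a high mark in one subject can no longer pull in rows from another subject, which was the problem with the IN (select top (3) marks ...) attempt.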