How would I make this join on this statistic? - sql

Firstly, sorry about the question title. I'm not up on statistics parlance, or on whatever this kind of join is called.
I have a query* that essentially generates three things: a random_sex, a random_first, and a random_last. I'm now trying to join its output against census data.
random_sex | random_first | random_last
------------+------------------+------------------
male | 47.7101715711225 | 24.3833348881337
male | 72.8463141907472 | 28.3560050522089
female | 72.8617294209544 | 33.3203859277759
male | 39.3406164890062 | 26.3352867371729
female | 28.6855500966031 | 65.8870893270099
female | 35.5960198949557 | 83.1188118207422
male | 11.5711074977927 | 10.544433838184
male | 15.6900786811765 | 18.7324617852545
male | 24.9860797089245 | 8.98265511383023
female | 80.4563122882508 | 35.594445341751
(10 rows)
Essentially the census data sits in a table like this...
name | freq | cumfreq | rank | name_type
------------+-------+---------+------+-----------
SMITH | 1.006 | 1.006 | 1 | LAST
JOHNSON | 0.81 | 1.816 | 2 | LAST
WILLIAMS | 0.699 | 2.515 | 3 | LAST
JONES | 0.621 | 3.136 | 4 | LAST
BROWN | 0.621 | 3.757 | 5 | LAST
DAVIS | 0.48 | 4.237 | 6 | LAST
MILLER | 0.424 | 4.66 | 7 | LAST
WILSON | 0.339 | 5 | 8 | LAST
MOORE | 0.312 | 5.312 | 9 | LAST
TAYLOR | 0.311 | 5.623 | 10 | LAST
ANDERSON | 0.311 | 5.934 | 11 | LAST
THOMAS | 0.311 | 6.245 | 12 | LAST
JACKSON | 0.31 | 6.554 | 13 | LAST
WHITE | 0.279 | 6.834 | 14 | LAST
HARRIS | 0.275 | 7.109 | 15 | LAST
MARTIN | 0.273 | 7.382 | 16 | LAST
THOMPSON | 0.269 | 7.651 | 17 | LAST
GARCIA | 0.254 | 7.905 | 18 | LAST
MARTINEZ | 0.234 | 8.14 | 19 | LAST
And, in this case..
random_sex | random_first | random_last
male | 47.7101715711225 | 24.3833348881337
I want it to be joined like this (procedurally):
=# select * from census.names where cumfreq > 47.7101715711225 AND name_type = 'MALE_FIRST' order by cumfreq asc limit 1;
name | freq | cumfreq | rank | name_type
--------+-------+---------+------+------------
SILVER | 0.009 | 47.717 | 1424 | MALE_FIRST
=# select * from census.names where cumfreq > 24.3833348881337 AND name_type = 'LAST' order by cumfreq asc limit 1;
name | freq | cumfreq | rank | name_type
--------+-------+---------+------+-----------
HARPER | 0.054 | 24.408 | 185 | LAST
So this gent's name would be Silver Harper. I've never met one in my life, but they do exist.
I'd like to return "Silver" "Harper" in the above query rather than random numbers. How can I make it work like this?
FOOTNOTE
*: Just to keep it simple:
SELECT
CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
, RANDOM() * 90.020 AS random_first -- dataset is 90% of most popular
, RANDOM() * 90.483 AS random_last
FROM generate_series(1,10,1);

I don't know much about statistics either, but I think this is what you want.
Let's call the query that returns the random columns RANDOMS:
WITH RANDOMS AS
(
SELECT
CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
, RANDOM() * 90.020 AS random_first
, RANDOM() * 90.483 AS random_last
FROM generate_series(1,10,1)
)
SELECT (
SELECT A.NAME
FROM census.names A
WHERE A.cumfreq > R.random_first
AND A.name_type = UPPER(R.random_sex) || '_FIRST'
order by A.cumfreq asc limit 1
) AS FIRST_NAME,
(
SELECT A.NAME
FROM census.names A
WHERE A.cumfreq > R.random_last
AND A.name_type = 'LAST'
order by A.cumfreq asc limit 1
) AS LAST_NAME
FROM RANDOMS R ;

Correlated sub-queries?
SELECT
*
FROM
yourRandomTable
INNER JOIN
census.names AS first_name
ON first_name.cumfreq = (SELECT MIN(cumfreq)
FROM census.names
WHERE cumfreq > yourRandomTable.random_first
AND name_type = UPPER(yourRandomTable.random_sex) || '_FIRST')
AND first_name.name_type = UPPER(yourRandomTable.random_sex) || '_FIRST'
INNER JOIN
census.names AS last_name
ON last_name.cumfreq = (SELECT MIN(cumfreq)
FROM census.names
WHERE cumfreq > yourRandomTable.random_last
AND name_type = 'LAST')
AND last_name.name_type = 'LAST'
You can vary this pattern quite a lot. Exactly how you choose to do it depends on how you have set up your indexes.

EXPLAIN ANALYZE SELECT
r.sex
, COALESCE(
(SELECT name FROM census.names AS mf WHERE r.sex = 'male' AND mf.name_type = 'MALE_FIRST' AND mf.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
, (SELECT name FROM census.names AS ff WHERE r.sex = 'female' AND ff.name_type = 'FEMALE_FIRST' AND ff.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
) AS first
, (SELECT name FROM census.names AS l WHERE l.name_type = 'LAST' AND l.cumfreq > r.last ORDER BY cumfreq LIMIT 1) AS last
FROM (
SELECT
RANDOM() * 90.020 AS first
, RANDOM() * 90.483 AS last
, CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS sex
FROM generate_series(1,10,1)
) AS r;
This is actually what I ended up going with.
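All of the answers above rely on the same lookup: take a uniform random draw and pick the first row whose cumulative frequency exceeds it. The following is a minimal sketch of just that lookup, assuming SQLite through Python's standard sqlite3 module instead of PostgreSQL, a simplified `names` table holding the first five LAST rows from the question, and a hypothetical helper called `pick_name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT, cumfreq REAL, name_type TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        ("SMITH", 1.006, "LAST"),
        ("JOHNSON", 1.816, "LAST"),
        ("WILLIAMS", 2.515, "LAST"),
        ("JONES", 3.136, "LAST"),
        ("BROWN", 3.757, "LAST"),
    ],
)


def pick_name(conn, name_type, draw):
    """Return the first name whose cumulative frequency exceeds `draw`."""
    row = conn.execute(
        "SELECT name FROM names"
        " WHERE name_type = ? AND cumfreq > ?"
        " ORDER BY cumfreq LIMIT 1",
        (name_type, draw),
    ).fetchone()
    # No row means the draw is beyond the largest cumfreq in the table.
    return row[0] if row else None


print(pick_name(conn, "LAST", 0.5))  # SMITH
print(pick_name(conn, "LAST", 2.0))  # WILLIAMS
print(pick_name(conn, "LAST", 4.0))  # None
```

The last call shows why the question scales RANDOM() by the largest cumfreq of each name type: a draw above that value would match nothing.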

Related

Write an SQL query to report the students (student_id, student_name) being “quiet” in ALL exams

-- A "quiet" student is one who took at least one exam and scored
neither the high score nor the low score.
-- Write an SQL query to report the students (student_id, student_name)
being "quiet" in ALL exams.
-- Don't return the student who has never taken any exam. Return the result
table ordered by student_id.
-- The query result format is in the following example.
Student table:
-- +-------------+---------------+
-- | student_id | student_name |
-- +-------------+---------------+
-- | 1 | Daniel |
-- | 2 | Jade |
-- | 3 | Stella |
-- | 4 | Jonathan |
-- | 5 | Will |
-- +-------------+---------------+
Exam table:
-- +------------+--------------+-----------+
-- | exam_id | student_id | score |
-- +------------+--------------+-----------+
-- | 10 | 1 | 70 |
-- | 10 | 2 | 80 |
-- | 10 | 3 | 90 |
-- | 20 | 1 | 80 |
-- | 30 | 1 | 70 |
-- | 30 | 3 | 80 |
-- | 30 | 4 | 90 |
-- | 40 | 1 | 60 |
-- | 40 | 2 | 70 |
-- | 40 | 4 | 80 |
-- +------------+--------------+-----------+
Result table:
-- +-------------+---------------+
-- | student_id | student_name |
-- +-------------+---------------+
-- | 2 | Jade |
-- +-------------+---------------+
Is my solution correct?
--My Solution
Select Student_id, Student_name
From (
Select
B.Student_id,
A.Student_name,
Score,
Max(Score) Over (Partition by Exam_id) score_max,
Max(Score) Over (Partition by Exam_id) score_min
From
Student A, Exam B
Where
A.Student_ID = B.Student_ID
) T
Where
Score != Max_score or Score != Min_Score
Group by
student_id, student_name
Having
Count(*) = (Select distinct count(exam_id) from exam)
Order by
A.student_id
Your approach is on the right track, but you need two changes in your query.
First, change Max to Min in your score_min:
-- ...
min(score) over (partition by exam_id) score_min,
max(score) over (partition by exam_id) score_max
-- ...
Second, the Having clause should compare against the number of exams that student actually took:
having count(1) =
(select count(distinct exam_id) from exam t2
where T.student_id = t2.student_id)
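To check the whole thing end to end, here is a sketch that loads the sample tables from the question into SQLite (an assumption made only so the example is self-contained; the question does not name its database) and runs a complete query. Besides the min/max and Having points, this version assumes the filter should join its two conditions with AND: with OR, Stella's top score in exam 10 would still pass because it differs from the minimum. It also uses the column aliases consistently and orders by the outer alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (student_id INTEGER, student_name TEXT);
CREATE TABLE Exam (exam_id INTEGER, student_id INTEGER, score INTEGER);
INSERT INTO Student VALUES
  (1,'Daniel'),(2,'Jade'),(3,'Stella'),(4,'Jonathan'),(5,'Will');
INSERT INTO Exam VALUES
  (10,1,70),(10,2,80),(10,3,90),
  (20,1,80),
  (30,1,70),(30,3,80),(30,4,90),
  (40,1,60),(40,2,70),(40,4,80);
""")

QUIET_SQL = """
SELECT t.student_id, t.student_name
FROM (
    SELECT b.student_id, a.student_name, b.score,
           MIN(b.score) OVER (PARTITION BY b.exam_id) AS score_min,
           MAX(b.score) OVER (PARTITION BY b.exam_id) AS score_max
    FROM Student a
    JOIN Exam b ON a.student_id = b.student_id
) t
-- keep only exam rows where the student hit neither extreme
WHERE t.score <> t.score_max AND t.score <> t.score_min
GROUP BY t.student_id, t.student_name
-- "quiet in ALL exams": every exam the student took must have survived
HAVING COUNT(*) = (SELECT COUNT(DISTINCT exam_id)
                   FROM Exam t2
                   WHERE t2.student_id = t.student_id)
ORDER BY t.student_id
"""

quiet = conn.execute(QUIET_SQL).fetchall()
print(quiet)  # [(2, 'Jade')]
```

Students who never sat an exam (Will) drop out at the inner join, so no extra condition is needed for them.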

How to print the students name in this query?

The concerned tables are as follows:
students(rollno, name, deptcode)
depts(deptcode, deptname)
course(crs_rollno, crs_name, marks)
The query is
Find the name and roll number of the students from each department who obtained
highest total marks in their own department.
Consider:
i) Courses of different department are different.
ii) All students of a particular department take same number and same courses.
Then only the query makes sense.
I wrote a successful query for displaying the maximum total marks by a student in each department.
select do.deptname, max(x.marks) from students so
inner join depts do
on do.deptcode=so.deptcode
inner join(
select s.name as name, d.deptname as deptname, sum(c.marks) as marks from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name and x.deptname=do.deptname group by do.deptname;
But as mentioned I need to display the name as well. Accordingly if I include so.name in select list, I need to include it in group by clause and the output is as below:
Kendra Summers Computer Science 274
Stewart Robbins English 80
Cole Page Computer Science 250
Brian Steele English 83
expected output:
Kendra Summers Computer Science 274
Brian Steele English 83
Where is the problem?
I think this can be achieved easily with a window function:
select name, deptname, marks
from (select s.name as name, d.deptname as deptname, sum(c.marks) as marks,
row_number() over(partition by d.deptname order by sum(c.marks) desc) rn
from students s
inner join crs_regd c on s.rollno=c.crs_rollno
inner join depts d on d.deptcode=s.deptcode
group by s.name,d.deptname) x
where rn = 1;
To solve the problem with a readable query I had to define a couple of views:
total_marks: For each student the sum of their marks
create view total_marks as select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno;
dept_max: For each department the highest total score by a single student of that department
create view dept_max as select deptcode, max(total) max_total from total_marks group by deptcode;
So I can get the desired output with the query
select a.deptcode, a.rollno, a.name from total_marks a join dept_max b on a.deptcode = b.deptcode and a.total = b.max_total
If you don't want to use views you can replace their selects on the final query, which will result in this:
select a.deptcode, a.rollno, a.name
from
(select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a
join (select deptcode, max(total) max_total from (select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a_ group by deptcode) b
on a.deptcode = b.deptcode and a.total = b.max_total
I'm sure someone more skilled than me could easily improve its performance...
If you (and anybody else) want to try it the way I did, here is the schema:
create table depts ( deptcode int primary key auto_increment, deptname varchar(20) );
create table students ( rollno int primary key auto_increment, name varchar(20) not null, deptcode int, foreign key (deptcode) references depts(deptcode) );
create table course ( crs_rollno int, crs_name varchar(20), marks int, foreign key (crs_rollno) references students(rollno) );
And here all the entries I inserted:
insert into depts (deptname) values ("Computer Science"),("Biology"),("Fine Arts");
insert into students (name,deptcode) values ("Turing",1),("Jobs",1),("Tanenbaum",1),("Darwin",2),("Mendel",2),("Bernard",2),("Picasso",3),("Monet",3),("Van Gogh",3);
insert into course (crs_rollno,crs_name,marks) values
(1,"Algorithms",25),(1,"Database",28),(1,"Programming",29),(1,"Calculus",30),
(2,"Algorithms",24),(2,"Database",22),(2,"Programming",28),(2,"Calculus",19),
(3,"Algorithms",21),(3,"Database",27),(3,"Programming",23),(3,"Calculus",26),
(4,"Zoology",22),(4,"Botanics",28),(4,"Chemistry",30),(4,"Anatomy",25),(4,"Pharmacology",27),
(5,"Zoology",29),(5,"Botanics",27),(5,"Chemistry",26),(5,"Anatomy",25),(5,"Pharmacology",24),
(6,"Zoology",18),(6,"Botanics",19),(6,"Chemistry",22),(6,"Anatomy",23),(6,"Pharmacology",24),
(7,"Sculpture",26),(7,"History",25),(7,"Painting",30),
(8,"Sculpture",29),(8,"History",24),(8,"Painting",30),
(9,"Sculpture",21),(9,"History",19),(9,"Painting",25) ;
Those inserts will load these data:
select * from depts;
+----------+------------------+
| deptcode | deptname |
+----------+------------------+
| 1 | Computer Science |
| 2 | Biology |
| 3 | Fine Arts |
+----------+------------------+
select * from students;
+--------+-----------+----------+
| rollno | name | deptcode |
+--------+-----------+----------+
| 1 | Turing | 1 |
| 2 | Jobs | 1 |
| 3 | Tanenbaum | 1 |
| 4 | Darwin | 2 |
| 5 | Mendel | 2 |
| 6 | Bernard | 2 |
| 7 | Picasso | 3 |
| 8 | Monet | 3 |
| 9 | Van Gogh | 3 |
+--------+-----------+----------+
select * from course;
+------------+--------------+-------+
| crs_rollno | crs_name | marks |
+------------+--------------+-------+
| 1 | Algorithms | 25 |
| 1 | Database | 28 |
| 1 | Programming | 29 |
| 1 | Calculus | 30 |
| 2 | Algorithms | 24 |
| 2 | Database | 22 |
| 2 | Programming | 28 |
| 2 | Calculus | 19 |
| 3 | Algorithms | 21 |
| 3 | Database | 27 |
| 3 | Programming | 23 |
| 3 | Calculus | 26 |
| 4 | Zoology | 22 |
| 4 | Botanics | 28 |
| 4 | Chemistry | 30 |
| 4 | Anatomy | 25 |
| 4 | Pharmacology | 27 |
| 5 | Zoology | 29 |
| 5 | Botanics | 27 |
| 5 | Chemistry | 26 |
| 5 | Anatomy | 25 |
| 5 | Pharmacology | 24 |
| 6 | Zoology | 18 |
| 6 | Botanics | 19 |
| 6 | Chemistry | 22 |
| 6 | Anatomy | 23 |
| 6 | Pharmacology | 24 |
| 7 | Sculpture | 26 |
| 7 | History | 25 |
| 7 | Painting | 30 |
| 8 | Sculpture | 29 |
| 8 | History | 24 |
| 8 | Painting | 30 |
| 9 | Sculpture | 21 |
| 9 | History | 19 |
| 9 | Painting | 25 |
+------------+--------------+-------+
I'll take the chance to point out that this database is badly designed. This becomes evident with the course table, for these reasons:
The name is singular
This table does not represent courses, but rather exams or scores
crs_name should be a foreign key referencing the primary key of another table (one that would actually represent the courses)
There are no constraints to limit the marks to a range or to prevent a student from taking the same exam twice
I find it more logical to associate courses with departments, instead of students with departments (this would also make these queries easier)
I tell you this because I understood you are learning from a book, so unless the book at some point says "this database is poorly designed", do not take this exercise as an example for designing your own!
Anyway, if you work through the query by hand with my data, you will get these results:
+----------+--------+---------+
| deptcode | rollno | name |
+----------+--------+---------+
| 1 | 1 | Turing |
| 2 | 4 | Darwin |
| 3 | 8 | Monet |
+----------+--------+---------+
As a further reference, here are the contents of the views I needed to define:
select * from total_marks;
+----------+-----------+--------+-------+
| deptcode | name | rollno | total |
+----------+-----------+--------+-------+
| 1 | Turing | 1 | 112 |
| 1 | Jobs | 2 | 93 |
| 1 | Tanenbaum | 3 | 97 |
| 2 | Darwin | 4 | 132 |
| 2 | Mendel | 5 | 131 |
| 2 | Bernard | 6 | 106 |
| 3 | Picasso | 7 | 81 |
| 3 | Monet | 8 | 83 |
| 3 | Van Gogh | 9 | 65 |
+----------+-----------+--------+-------+
select * from dept_max;
+----------+-----------+
| deptcode | max_total |
+----------+-----------+
| 1 | 112 |
| 2 | 132 |
| 3 | 83 |
+----------+-----------+
Hope I helped!
Try the following query
select a.name, c.deptname, sum(b.marks) as marks
from students a
inner join crs_regd b
on a.rollno = b.crs_rollno
inner join depts c
on a.deptcode = c.deptcode
group by a.name, c.deptname
having (c.deptname, sum(b.marks)) in (select do.deptname, max(x.marks)
from students so
inner join depts do
on do.deptcode=so.deptcode
inner join (select s.name as name
, d.deptname as deptname
, sum(c.marks) as marks
from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name
and x.deptname=do.deptname
group by do.deptname
)
The inner query fetches each department name together with its highest total marks, and the outer query gets the name of the corresponding student.
Try it and let me know whether you get the desired result.
The Dense_Rank() function would be helpful in this scenario:
SELECT subquery.*
FROM (SELECT Student_Total_Marks.rollno,
Student_Total_Marks.name,
Student_Total_Marks.deptcode, depts.deptname,
dense_rank() over (partition by Student_Total_Marks.deptcode order by Student_Total_Marks.total_marks desc) Student_Rank
FROM (SELECT Stud.rollno,
Stud.name,
Stud.deptcode,
sum(course.marks) total_marks
FROM students stud inner join course course on stud.rollno = course.crs_rollno
GROUP BY stud.rollno,Stud.name,Stud.deptcode) Student_Total_Marks
inner join depts on Student_Total_Marks.deptcode = depts.deptcode) subquery
WHERE subquery.student_rank = 1
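The answers in this thread use row_number() in one place and a rank function in another, and the difference only shows up when two students tie for the top total. The sketch below illustrates it on made-up data: the `totals` table, the student names, and the helper `top_per_dept` are all hypothetical, and SQLite is assumed purely so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (name TEXT, deptname TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO totals VALUES (?, ?, ?)",
    [
        ("Ann", "Computer Science", 274),
        ("Bob", "Computer Science", 274),  # ties with Ann
        ("Cy", "Computer Science", 250),
        ("Dee", "English", 83),
        ("Eli", "English", 80),
    ],
)


def top_per_dept(ranking_function):
    # ranking_function is only ever one of the two fixed names used below,
    # never user input, so formatting it into the SQL text is acceptable here.
    sql = f"""
        SELECT name, deptname, total
        FROM (SELECT name, deptname, total,
                     {ranking_function}() OVER (
                         PARTITION BY deptname ORDER BY total DESC) AS rn
              FROM totals) AS ranked
        WHERE rn = 1
        ORDER BY deptname, name
    """
    return conn.execute(sql).fetchall()


with_ties = top_per_dept("DENSE_RANK")  # keeps both Ann and Bob
one_each = top_per_dept("ROW_NUMBER")   # exactly one row per department
print(with_ties)
print(one_each)
```

With ROW_NUMBER the database is free to pick either tied student, so if every top scorer should be listed, a rank function (or the join on the maximum total) is the safer choice.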

How to apply TOP statement to only 1 column while selecting multiple columns from a table?

I am trying to select multiple columns from a table, but I want to select top certain number of records based on one column. I tried this :
select roll_no ,marks as Percentage
from database
where marks in (select top (3) *
from database
where subject = ''
order by marks desc) order by percentage desc
and I am getting the error:
Only one expression can be specified in the select list when the
sub-query is not introduced with EXISTS or more than specified number
of records.
I also tried :
select roll_no ,marks as Percentage
from database
where marks in (select top (3) marks
from database
where subject = ''
order by marks desc) order by percentage desc
This returns the right result for some subjects, but for others it also displays top marks from other subjects.
eg :
+---------+-------+
| roll_no | marks |
+---------+-------+
|10003 | 87 |
|10006 | 72 |
|10003 | 72 |
|10002 | 67 |
|10004 | 67 |
+---------+-------+
How to frame the query correctly?
sample data :
+---------+-------+---------+
| roll_no | marks |subject |
+---------+-------+---------+
|10001 | 45 | Maths |
|10001 | 72 | Science |
|10001 | 64 | English |
|10002 | 52 | Maths |
|10002 | 35 | Science |
|10002 | 75 | English |
|10003 | 52 | Maths |
|10003 | 35 | Science |
|10003 | 75 | English |
|10004 | 52 | Maths |
|10004 | 35 | Science |
|10004 | 75 | English |
+---------+-------+---------+
If I'm right and you are looking for the best 3 marks for each subject, then you can get it with the following:
DECLARE @SelectedSubject VARCHAR(50) = 'Maths'
;WITH FilteredSubjectMarks AS
(
SELECT
D.Subject,
D.Roll_no,
D.Marks,
MarksRanking = DENSE_RANK() OVER (ORDER BY D.Marks DESC)
FROM
[Database] AS D
WHERE
D.Subject = @SelectedSubject
)
SELECT
F.*
FROM
FilteredSubjectMarks AS F
WHERE
F.MarksRanking <= 3
You can use window functions to rank your marks column (specifically dense_rank, which allows duplicate rankings whilst retaining sequential numbering) and then return all rows with a rank of 3 or less:
declare @t table(roll_no int identity(1,1),marks int);
insert into @t(marks) values(2),(4),(5),(8),(6),(1),(3),(2),(1),(8);
with t as
(
select roll_no
,marks
,dense_rank() over (order by marks desc) as r
from @t
)
select *
from t
where r <= 3;
Output:
+---------+-------+---+
| roll_no | marks | r |
+---------+-------+---+
| 4 | 8 | 1 |
| 10 | 8 | 1 |
| 5 | 6 | 2 |
| 3 | 5 | 3 |
+---------+-------+---+
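The second query in the question leaks marks from other subjects because only the subquery filters on subject: the outer query then matches any row in the table with one of those mark values. Ranking within each subject avoids this. The sketch below runs on the question's sample data and assumes SQLite instead of SQL Server, a hypothetical table name `subject_marks` in place of `database`, and a hypothetical helper `top_marks`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subject_marks (roll_no INTEGER, marks INTEGER, subject TEXT)"
)
conn.executemany(
    "INSERT INTO subject_marks VALUES (?, ?, ?)",
    [
        (10001, 45, "Maths"), (10001, 72, "Science"), (10001, 64, "English"),
        (10002, 52, "Maths"), (10002, 35, "Science"), (10002, 75, "English"),
        (10003, 52, "Maths"), (10003, 35, "Science"), (10003, 75, "English"),
        (10004, 52, "Maths"), (10004, 35, "Science"), (10004, 75, "English"),
    ],
)

TOP_SQL = """
SELECT subject, roll_no, marks
FROM (SELECT subject, roll_no, marks,
             -- PARTITION BY restarts the ranking for every subject
             DENSE_RANK() OVER (
                 PARTITION BY subject ORDER BY marks DESC) AS r
      FROM subject_marks) AS ranked
WHERE r <= ?
ORDER BY subject, marks DESC, roll_no
"""


def top_marks(n):
    """Rows holding one of the n highest distinct marks of their subject."""
    return conn.execute(TOP_SQL, (n,)).fetchall()


for row in top_marks(1):
    print(row)
```

Because the ranking is computed per subject, a mark of 72 in Science can never pull in a 72 from another subject, which is what went wrong with the IN version.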

Combining SQL grouped and ungrouped results with a cross join?

I have inherited two tables, where the data for one is in hours, and the data for the other is in days.
One table has planned resource use, the other holds actual hours spent
Internal_Resources
| PeopleName | NoOfDays | TaskNo |
|------------|----------|--------|
| Fred | 1 | 100 |
| Bob | 3 | 100 |
| Mary | 2 | 201 |
| Albert | 10 | 100 |
TimeSheetEntries
| UserName | PaidHours | TaskNumber |
|----------|-----------|------------|
| Fred | 7 | 100 |
| Fred | 14 | 100 |
| Fred | 7 | 100 |
| Bob | 7 | 100 |
| Bob | 21 | 100 |
| Mary | 7 | 201 |
| Mary | 14 | 100 |
What I need is a comparison of time planned vs time spent.
| name | PlannedDays | ActualDays |
|--------|-------------|------------|
| Albert | 10 | NULL |
| Bob | 3 | 4.00 |
| Fred | 1 | 4.00 |
| Mary | NULL | 2.00 |
I've cobbled together something that almost does the trick:
SELECT
UserName,
( SELECT
NoOfDays FROM Internal_Resources as r
WHERE r.PeopleName = e.UserName AND r.TaskNo = ? ) AS PlannedDays,
SUM ( Round( PaidHours / 7 , 2 ) ) as ActualDays
FROM TimeSheetEntries e WHERE TaskNumber = ?
GROUP BY UserName
Which for task 100 gives me back something like:
| UserName | PlannedDays | ActualDays |
|----------|-------------|------------|
| Bob | 3 | 4 |
| Fred | 1 | 4 |
| Mary | 0 | 2 |
but lazy Albert doesn't feature! I'd like:
| UserName | PlannedDays | ActualDays |
|----------|-------------|------------|
| Albert | 10 | 0 |
| Bob | 3 | 4 |
| Fred | 1 | 4 |
| Mary | 0 | 2 |
I've tried using variations on
SELECT * FROM ( SELECT ... ) AS plan
INNER JOIN ( [second-query] ) AS actual
ON plan.PeopleName = actual.UserName
What should I be doing? I suspect I need to squeeze a cross-join in there somewhere, but I'm getting nowhere...
( This is going to be run inside a FileMaker ExecuteSQL() call, so I need pretty vanilla SQL... And no, I don't have control over the column or table names :-( )
EDIT:
To be clear, I need the result set to include both users who had planned days and haven't worked on a task, as well as those who have worked on a task without having planned days...
EDIT 2:
I can kind of get what I want manually, but can't see how to combine the statements below:
SELECT people.name, PlannedDays, ActualDays FROM
( SELECT PeopleName as name FROM Internal_Resources WHERE TaskNo = 100
UNION
SELECT DISTINCT UserName as name FROM TimeSheetEntries WHERE TaskNumber = 100
ORDER BY Name) AS people
gets me:
+--------+
| name |
+--------+
| Albert |
| Bob |
| Fred |
| Mary |
+--------+
and:
( SELECT PeopleName AS name, NoOfDays AS PlannedDays
FROM Internal_Resources WHERE TaskNo = 100 ) AS actual
gets me:
+--------+-------------+
| name | PlannedDays |
+--------+-------------+
| Fred | 1 |
| Bob | 3 |
| Albert | 10 |
+--------+-------------+
and finally,
( SELECT UserName AS name, SUM( Round( PaidHours / 7, 2 ) ) AS ActualDays
FROM TimeSheetEntries
WHERE TaskNumber = 100 GROUP BY UserName ) AS planned
gets me:
+------+------------+
| name | ActualDays |
+------+------------+
| Bob | 4.00 |
| Fred | 4.00 |
| Mary | 2.00 |
+------+------------+
Now all (All! ha!) I want is to combine these into this:
+--------+-------------+------------+
| name | PlannedDays | ActualDays |
+--------+-------------+------------+
| Albert | 10 | NULL |
| Bob | 3 | 4.00 |
| Fred | 1 | 4.00 |
| Mary | NULL | 2.00 |
+--------+-------------+------------+
EDIT 3:
I've tried combining it with something along the lines of:
SELECT people.name, PlannedDays, ActualDays
FROM ( SELECT PeopleName as name FROM Internal_Resources WHERE TaskNo = 100
UNION
SELECT DISTINCT UserName as name FROM TimeSheetEntries WHERE TaskNumber = 100
ORDER BY Name) AS people
LEFT JOIN ( SELECT PeopleName AS name, NoOfDays AS PlannedDays FROM Internal_Resources WHERE TaskNo = 100 ) AS actual,
ON people.name = actual.name
LEFT JOIN ( SELECT UserName AS name, SUM( Round( PaidHours / 7, 2 ) ) AS ActualDays FROM TimeSheetEntries WHERE TaskNumber = 100 GROUP BY UserName ) AS planned
ON people.name = planned.name;
but the syntax is clearly wonky.
Okay - this works:
SELECT people.name, COALESCE(PlannedDays, 0) as planned, COALESCE(ActualDays, 0) as actual
FROM ( SELECT PeopleName as name FROM Internal_Resources WHERE TaskNo = 100
UNION
SELECT DISTINCT UserName as name FROM TimeSheetEntries WHERE TaskNumber = 100
ORDER BY Name) AS people
LEFT JOIN ( SELECT PeopleName AS name, NoOfDays AS PlannedDays FROM Internal_Resources WHERE TaskNo = 100 ) AS ir
ON people.name = ir.name
LEFT JOIN ( SELECT UserName AS name, SUM( Round( PaidHours / 7, 2 ) ) AS ActualDays FROM TimeSheetEntries WHERE TaskNumber = 100 GROUP BY UserName ) AS ts
ON people.name = ts.name;
Giving:
+--------+---------+--------+
| name | planned | actual |
+--------+---------+--------+
| Albert | 10 | 0.00 |
| Bob | 3 | 4.00 |
| Fred | 1 | 4.00 |
| Mary | 0 | 2.00 |
+--------+---------+--------+
I thought there must be an easier way, and this looks simpler:
SELECT name, SUM(x) AS planned, SUM(y) AS actual
FROM (
SELECT PeopleName AS name, NoOfDays AS x, 0 AS y
FROM Internal_Resources WHERE TaskNo = 100
UNION
SELECT UserName AS name, 0 AS x, SUM( PaidHours / 7 ) AS y
FROM TimeSheetEntries WHERE TaskNumber = 100 GROUP BY UserName) AS source
GROUP BY name;
But frustratingly, both work in MySQL and both FAIL in FileMaker's cut-down SQL version: SELECTing from a derived table doesn't appear to be supported.
Finally, the trick to getting it to work in FileMaker SQL: subqueries are supported for IN and NOT IN, so use a union of three queries: people who have planned days and have done some work, people who have done unplanned work, and people who have planned days but haven't done any work:
SELECT PeopleName as name, NoOfDays as planned, Sum( PaidHours / 7 ) as actual
FROM Internal_Resources
JOIN TimeSheetEntries
ON PeopleName = UserName
WHERE TaskNumber = 100 AND TaskNo = 100 GROUP BY PeopleName
UNION
SELECT UserName as name, 0 as planned, Sum( PaidHours / 7 ) as actual
FROM TimeSheetEntries
WHERE TaskNumber = 100
AND UserName NOT IN (
SELECT PeopleName FROM Internal_Resources WHERE TaskNo = 100
)
GROUP BY UserName
UNION
SELECT PeopleName as name, NoOfDays as planned, 0 as actual
FROM Internal_Resources WHERE TaskNo = 100
AND PeopleName NOT IN (
SELECT PeopleName as name
FROM Internal_Resources JOIN TimeSheetEntries
ON PeopleName = UserName
WHERE TaskNumber = 100 AND TaskNo = 100
GROUP BY PeopleName
)
ORDER BY name;
Hope this helps someone.
Doesn't FileMaker support LEFT OUTER JOINs?
SELECT
PeopleName,
NoOfDays AS PlannedDays,
ROUND(SUM(PaidHours) / 7, 2) AS ActualDays
FROM
Internal_Resources AS planned
-- left join should not discard Albert's record from Internal_Resources
LEFT JOIN TimeSheetEntries AS actual
ON planned.PeopleName = actual.UserName
AND planned.TaskNo = actual.TaskNumber
WHERE
planned.TaskNo = ?
GROUP BY PeopleName, NoOfDays
Invert the logic to read from Internal_resources in the outer query:
SELECT ir.PeopleName, NoOfDays as PlannedDays,
(SELECT SUM ( Round( PaidHours / 7 , 2 ))
FROM TimeSheetEntries e
WHERE e.TaskNumber = ? AND ir.PeopleName = e.UserName
) as ActualDays
FROM Internal_Resources ir
WHERE ir.TaskNo = ?
GROUP BY ir.PeopleName, NoOfDays;
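What the question asks for amounts to a full outer join between planned and actual time, and the "stack both sources, then sum" query from the self-answer is one way to emulate it. Here is a sketch of that idea on the question's sample data. It assumes SQLite (so it says nothing about what FileMaker's ExecuteSQL accepts), uses UNION ALL instead of UNION so that two identical timesheet rows cannot be merged before summing, and divides by 7.0 to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Internal_Resources (PeopleName TEXT, NoOfDays INTEGER, TaskNo INTEGER);
CREATE TABLE TimeSheetEntries (UserName TEXT, PaidHours INTEGER, TaskNumber INTEGER);
INSERT INTO Internal_Resources VALUES
  ('Fred',1,100),('Bob',3,100),('Mary',2,201),('Albert',10,100);
INSERT INTO TimeSheetEntries VALUES
  ('Fred',7,100),('Fred',14,100),('Fred',7,100),
  ('Bob',7,100),('Bob',21,100),
  ('Mary',7,201),('Mary',14,100);
""")

PLAN_VS_ACTUAL = """
SELECT name, SUM(planned) AS planned, SUM(actual) AS actual
FROM (
    -- one row per planned assignment, contributing nothing to actual
    SELECT PeopleName AS name, NoOfDays AS planned, 0 AS actual
    FROM Internal_Resources WHERE TaskNo = :task
    UNION ALL
    -- one row per timesheet entry, contributing nothing to planned
    SELECT UserName AS name, 0 AS planned, PaidHours / 7.0 AS actual
    FROM TimeSheetEntries WHERE TaskNumber = :task
) AS source
GROUP BY name
ORDER BY name
"""

rows = conn.execute(PLAN_VS_ACTUAL, {"task": 100}).fetchall()
for row in rows:
    print(row)
```

For task 100 this yields one row each for Albert, Bob, Fred and Mary, with zeros standing in for the missing side. That matches the COALESCE version of the output above; swap the zeros for NULLs if the distinction between "zero" and "nothing recorded" matters.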

Crosstab Query with Dynamic Columns in SQL Server 2005 up

I'm having a problem with Crosstab query in SQL Server.
Suppose that I have data as below:
| ScoreID | StudentID | Name | Sex | SubjectName | Score |
------------------------------------------------------------------
| 1 | 1 | Student A | Male | C | 100 |
| 2 | 1 | Student A | Male | C++ | 40 |
| 3 | 1 | Student A | Male | English | 60 |
| 4 | 1 | Student A | Male | Database | 15 |
| 5 | 1 | Student A | Male | Math | 50 |
| 6 | 2 | Student B | Male | C | 77 |
| 7 | 2 | Student B | Male | C++ | 12 |
| 8 | 2 | Student B | Male | English | 56 |
| 9 | 2 | Student B | Male | Database | 34 |
| 10 | 2 | Student B | Male | Math | 76 |
| 11 | 3 | Student C | Female | C | 24 |
| 12 | 3 | Student C | Female | C++ | 10 |
| 13 | 3 | Student C | Female | English | 15 |
| 14 | 3 | Student C | Female | Database | 40 |
| 15 | 3 | Student C | Female | Math | 21 |
| 16 | 4 | Student D | Female | C | 17 |
| 17 | 4 | Student D | Female | C++ | 34 |
| 18 | 4 | Student D | Female | English | 24 |
| 19 | 4 | Student D | Female | Database | 56 |
| 20 | 4 | Student D | Female | Math | 43 |
I want to make query which show the result as below:
| StuID| Name | Sex | C | C++ | Eng | DB | Math | Total | Average |
| 1 | Student A | Male | 100| 40 | 60 | 15 | 50 | 265 | 53 |
| 2 | Student B | Male | 77 | 12 | 56 | 34 | 76 | 255 | 51 |
| 3 | Student C | Female | 24 | 10 | 15 | 40 | 21 | 110 | 22 |
| 4 | Student D | Female | 17 | 34 | 24 | 56 | 43 | 174 | 34.8 |
How could I query to show output like this?
Note:
The subject names:
C
C++
English
Database
Math
will change depending on which subjects the students take.
Please go to http://sqlfiddle.com/#!6/2ba07/1 to test this query.
There are two ways to perform a PIVOT: static, where you hard-code the values, and dynamic, where the columns are determined when you execute.
Even though you will want a dynamic version, sometimes it is easier to start with a static PIVOT and then work towards a dynamic one.
Static Version:
SELECT studentid, name, sex,[C], [C++], [English], [Database], [Math], total, average
from
(
select s1.studentid, name, sex, subjectname, score, total, average
from Score s1
inner join
(
select studentid, sum(score) total, avg(score) average
from score
group by studentid
) s2
on s1.studentid = s2.studentid
) x
pivot
(
min(score)
for subjectname in ([C], [C++], [English], [Database], [Math])
) p
See SQL Fiddle with demo
Now, if you do not know the values that will be transformed then you can use Dynamic SQL for this:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(SubjectName)
from Score
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT studentid, name, sex,' + @cols + ', total, average
from
(
select s1.studentid, name, sex, subjectname, score, total, average
from Score s1
inner join
(
select studentid, sum(score) total, avg(score) average
from score
group by studentid
) s2
on s1.studentid = s2.studentid
) x
pivot
(
min(score)
for subjectname in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo
Both versions will yield the same results.
Just to round out the answer, if you do not have a PIVOT function, then you can get this result using CASE and an aggregate function:
select s1.studentid, name, sex,
min(case when subjectname = 'C' then score end) C,
min(case when subjectname = 'C++' then score end) [C++],
min(case when subjectname = 'English' then score end) English,
min(case when subjectname = 'Database' then score end) [Database],
min(case when subjectname = 'Math' then score end) Math,
total, average
from Score s1
inner join
(
select studentid, sum(score) total, avg(score) average
from score
group by studentid
) s2
on s1.studentid = s2.studentid
group by s1.studentid, name, sex, total, average
See SQL Fiddle with Demo
You need to use SQL PIVOT in this case. Please refer to the following links:
Pivot on Unknown Number of Columns
Pivot two or more columns in SQL Server
Pivots with Dynamic Columns in SQL Server
This requires building a SQL query string at runtime. Column names, counts and data-types in SQL Server are always static (the most important reason for that is that the optimizer must know the query data flow at optimization time).
So I recommend that you build a PIVOT-query at runtime and run it through sp_executesql. Note that you have to hardcode the pivot-column values. Be careful to escape them properly. You cannot use parameters for them.
Alternatively you can build one such query per column-count and use parameters just for the pivot values. You would have to assign some dummy column names like Pivot0, Pivot1, .... Still you need one query template per count of columns. Except if you are willing to hard-code the maximum number of pivot-columns into the query (say 20). In this case you actually could use static SQL.
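The same build-the-query-at-runtime idea can be sketched outside T-SQL. The example below is not SQL Server code: it assumes SQLite driven from Python, uses the CASE-plus-aggregate form instead of PIVOT, and loads only a six-row subset of the question's data into a simplified `Score` table. The helper `quote_ident` is hypothetical. Each subject name is used in two places: as a value, which can be passed as an ordinary parameter in this form, and as a column alias, which cannot be a parameter and therefore has to be escaped by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Score (StudentID INTEGER, Name TEXT,"
    " SubjectName TEXT, Score INTEGER)"
)
conn.executemany(
    "INSERT INTO Score VALUES (?, ?, ?, ?)",
    [
        (1, "Student A", "C", 100), (1, "Student A", "C++", 40),
        (1, "Student A", "Math", 50),
        (2, "Student B", "C", 77), (2, "Student B", "C++", 12),
        (2, "Student B", "Math", 76),
    ],
)


def quote_ident(name):
    """Escape a value for use as a double-quoted SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: discover the pivot columns at run time.
subjects = [
    r[0] for r in conn.execute(
        "SELECT DISTINCT SubjectName FROM Score ORDER BY SubjectName")
]

# Step 2: one aggregate per subject; the value is a bound parameter (?),
# while the alias is an escaped identifier.
pivot_cols = ", ".join(
    f"MIN(CASE WHEN SubjectName = ? THEN Score END) AS {quote_ident(s)}"
    for s in subjects
)
sql = (
    f"SELECT StudentID, Name, {pivot_cols},"
    " SUM(Score) AS Total, AVG(Score) AS Average"
    " FROM Score GROUP BY StudentID, Name ORDER BY StudentID"
)

# Step 3: run the generated statement, binding the subjects in order.
cur = conn.execute(sql, subjects)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
for row in rows:
    print(row)
```

The result has one column per distinct subject found in the table, so adding a subject to the data changes the output shape without touching the code.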