Join results of two queries in SQL and produce a result given some condition - sql

I've never used SQL until now, so please bear with me. I have a table of departments:
I have written two queries as follows:
-- nbr of staff associated with each dept.
SELECT count(departmentId) as freq
FROM staff
GROUP BY departmentId
-- nbr of students associated with each dept.
SELECT count(departmentId) as freq
FROM StudentAssignment
GROUP BY departmentId
These produce the following two tables:
For each department id 1 to 5, I need to divide the studentFreq by the staffFreq and show the department id and description if the result is greater than 2.
If the staffFreq i.e. number of staff, for a department id is zero then I need to show that department id and description too.
So for example, in this case I want to produce a table with the department ids of 1, 4 and 5 and their corresponding descriptions: Computing, Classics and Mechanical Engineering.
Computing because 7 / 2 > 2. Classics and ME because 0 staff are assigned to those depts.

One method is a left join, starting with the departments table:
SELECT d.*,
s.freq as as num_staff, sa.freq as num_students,
sa.freq * 1.0 / s.freq as student_staff_ratio
FROM deptartments d LEFT JOIN
(SELECT departmentId, count(*) as freq
FROM staff
GROUP BY departmentId
) s
ON s.departmentId = d.department_id LEFT JOIN
(SELECT departmentId, count(*) as freq
FROM StudentAssignment
GROUP BY departmentId
) sa
ON sa.departmentId = d.departmentId;
Notes:
This should missing values as NULL rather than 0. You can assign 0 instead using COALESCE(): COALESCE(s.freq, 0) as num_staff.
SQL Server does integer division, so 7 / 2 = 3, not 3.5. I think you would typically want the fractional component.

Related

GROUP BY one column, then GROUP BY another column

I have a database table t with a sales table:
ID
TYPE
AGE
1
B
20
1
BP
20
1
BP
20
1
P
20
2
B
30
2
BP
30
2
BP
30
3
P
40
If a person buys a bundle it appears the bundle sale (TYPE B) and the different bundle products (TYPE BP), all with the same ID. So a bundle with 2 products appears 3 times (1x TYPE B and 2x TYPE BP) and has the same ID.
A person can also buy any other product in that single sale (TYPE P), which has also the same ID.
I need to calculate the average/min/max age of the customers but the multiple entries per sale tamper with the correct calculation.
The real average age is
(20 + 30 + 40) / 3 = 30
and not
(20+20+20+20 + 30+30+30 + 40) / 8 = 26,25
But I don't know how I can reduce the sales to a single row entry AND get the 4 needed values?
Do I need to GROUP BY twice (first by ID, then by AGE?) and if yes, how can I do it?
My code so far:
SELECT
AVERAGE(AGE)
, MIN(AGE)
, MAX(AGE)
, MEDIAN(AGE)
FROM t
but that does count every row.
Assuming the age is the same for all rows with the same ID (which in itself indicates a normalisation problem), you can use nest aggregation:
select avg(min(age)) from sales
group by id
AVG(MIN(AGE))
-------------
30
SQL Fiddle
The example in the documentation is very similar; and is explained as:
This calculation evaluates the inner aggregate (MAX(salary)) for each group defined by the GROUP BY clause (department_id), and aggregates the results again.
So for your version:
This calculation evaluates the inner aggregate (MIN(age)) for each group defined by the GROUP BY clause (id), and aggregates the results again.
It doesn't really matter whether the inner aggregate is min or max - again, assuming they are all the same - it's just to get a single value per ID, which can then be averaged.
You can do the same for the other values in your original query:
select
avg(min(age)) as avg_age,
min(min(age)) as min_age,
max(min(age)) as max_age,
median(min(age)) as med_age
from sales
group by id;
AVG_AGE MIN_AGE MAX_AGE MED_AGE
------- ------- ------- -------
30 20 40 30
Or if you prefer you could get the one-age-per-ID values once ina CTE or subquery and apply the second layer of aggregation to that:
select
avg(age) as avg_age,
min(age) as min_age,
max(age) as max_age,
median(age) as med_age
from (
select min(age) as age
from sales
group by id
);
which gets the same result.
SQL Fiddle

Is this SQL (oracle) query correct?

Diagram of Tables:
SELECT DISTINCT s.location, tb.num_students_enrolled, tb2.num_sections
FROM section s
JOIN
(SELECT location, COUNT(DISTINCT student_id) num_students_enrolled
FROM section sec JOIN enrollment e ON(sec.section_id = e.section_id)
GROUP BY location) tb
ON(s.location = tb.location)
JOIN
(SELECT location, COUNT(section_id) num_sections
FROM section sect
GROUP BY location) tb2
ON(s.location = tb2.location)
ORDER BY s.location
Question:
Create a query that lists location, number of sections taught in that location and number of students enrolled in courses at that location. Sort by location.
Enrollment table - many to many relationship between section_id and student_id. A student can be enrolled in multiple sections, and there can be multiple students enrolled in a section.
Section Table
Section table - section_id is unique primary key. There are a total of 12 locations.
Query Results
location nm_st_en nm_sects
H310 1 1
L206 8 1
L210 29 10
L211 10 3
L214 33 15
L500 14 2
L507 33 15
L509 64 25
L511 3 1
M200 1 1
M311 11 3
M500 5 1
Whether the query is correct requires data, and is up to you to figure out. Write some queries to verify your results on a subset of the data.
But I think you might be over-complicating this.
Something like:
select sec.location,
count(distinct sec.section_id) as number_of_sections,
count(distinct enr.student_id) as number_of_students
from Section sec join Enrollment enr on sec.section_id = enr.section_id
group by sec.location
order by sec.location;
As a general pointer, you should avoid joining query results, as it makes things a bit difficult to understand. Also, parenthesis around join clauses are unnecessary clutter.

group by in sql to find max count

Update
I have a query like this
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) from StudentList sl
group by sl.College_ID,sl.Department_ID
order by sl.College_ID,sl.Department_ID asc
abouve query gives the below result
and i have 200 - college id and each college have 6 department_id i could get the count [No.of student ] in each department
College_Id Dept_Id count
1 1 100
1 2 210
2 3 120
2 6 80
3 1 340
but my question is i need to display the maximum count[student] for each department
some thing like this
college_ID Dept_Id count
3 1 340
26 2 250
and i tried this out but getting error
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) from StudentList sl
group by sl.College_ID,sl.Department_ID
having COUNT(sl.RegisterNumber)=max(COUNT(sl.RegisterNumber))
order by sl.College_ID,sl.Department_ID asc
what went wrong can some one help me
Maybe something like this?
SELECT sl.College_ID, sl.Department_ID, COUNT(sl.RegisterNumber) As StudentCount, s2.MaxCount
FROM StudentList sl
INNER JOIN (
SELECT Department_ID, MAX(StudentCount) AS MaxCount
FROM (
SELECT College_ID, Department_ID, COUNT(*) As StudentCount
FROM StudentList
GROUP BY College_ID, Department_ID
) s1
GROUP BY Department_ID
) s2 ON sl.Department_ID = s2.Department_ID
GROUP BY sl.College_ID, sl.Department_ID, s2.MaxCount
HAVING COUNT(sl.RegisterNumber) = s2.MaxCount
ORDER BY sl.College_ID, sl.Department_ID ASC
EDIT: I've updated the query to more accurately answer your question, I missed the part where you want the College_ID with the max count.
EDIT 2: Okay, this should work now, I needed a second nested subquery for aggregating the aggregates. I don't know of a better way to compare the aggregates of different groups.
The result you want, which is group on college_ID and you not really care about college_ID, since from you example, with Dept_Id=1 will can not sure which college_ID is.
In that case, you can remove college_ID from your select statement and do a SUM GROUP BY
base on your query, something like:
SELECT t.Department_ID, SUM(t.c)
FROM (
select sl.College_ID,sl.Department_ID,COUNT(sl.RegisterNumber) c from StudentList sl
group by sl.College_ID,sl.Department_ID
) t
GROUP BY t.Department_ID
ORDER BY SUM(t.c)
Note: If you really want college_ID in your result, you can do a JOIN to get your college_ID

SELECT COUNT(*) - return 0 along with grouped fields if there are no matching rows

I have the following query:
SELECT employee,department,count(*) AS sum FROM items
WHERE ((employee = 1 AND department = 2) OR
(employee = 3 AND department = 4) OR
(employee = 5 AND department = 6) OR
([more conditions with the same structure]))
AND available = true
GROUP BY employee, department;
If there are no items for a pair "employee-department", then the query returns nothing. I'd like it to return zero instead:
employee | department | sum
---------+------------+--------
1 | 2 | 0
3 | 4 | 12
5 | 6 | 1234
EDIT1
Looks like this is not possible, as Matthew PK explains in his answer to a similar question. I was mistakenly assuming Postgres could extract missing values from WHERE clause somehow.
EDIT2
It is possible with some skills. :) Thanks to Erwin Brandstetter!
Not possible? Challenge accepted. :)
WITH x(employee, department) AS (
VALUES
(1::int, 2::int)
,(3, 4)
,(5, 6)
-- ... more combinations
)
SELECT x.employee, x.department, count(i.employee) AS ct
FROM x
LEFT JOIN items i ON i.employee = x.employee
AND i.department = x.department
AND i.available
GROUP BY x.employee, x.department;
This will give you exactly what you are asking for. If employee and department aren't integer, cast to the matching type.
Per comment from #ypercube: count() needs to be on a non-null column of items, so we get 0 for non-existing critera, not 1.
Also, pull up additional criteria into the LEFT JOIN condition (i.available in this case), so you don't exclude non-existing criteria.
Performance
Addressing additional question in comment.
This should perform very well. With longer lists of criteria, (LEFT) JOIN is probably the fastest method.
If you need it as fast as possible, be sure to create a multicolumn index like:
CREATE INDEX items_some_name_idx ON items (employee, department);
If (employee, department) should be the PRIMARY KEY or you should have a UNIQUE constraint on the two columns, that would do the trick, too.
select employee, department,
count(
(employee = 1 and department = 2) or
(employee = 3 and department = 4) or
(employee = 5 and department = 6) or
null
) as sum
from items
where available = true
group by employee, department;
Building on Erwin's join suggestion, this one really works:
with x(employee, department) as (
values (1, 2)
)
select
coalesce(i.employee, x.employee) as employee,
coalesce(i.department, x.department) as department,
count(available or null) as ct
from
x
full join
items i on
i.employee = x.employee
and
i.department = x.department
group by 1, 2
order by employee, department

How do I write a standard SQL GROUP BY that includes columns not in the GROUP BY clause

Let's say I have a table called Customer, defined like this:
Id Name DepartmentId Hired
1 X 101 2001/01/01
2 Y 102 2002/01/01
3 Z 102 2003/01/01
And I want to retrieve the date of the last hiring in each department.
Obviously I would do this
SELECT c.DepartmentId, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
Which returns:
101 2001/01/01
102 2003/01/01
But what do I do if I want to return the name of the guy hired? I.e. I would want this result set:
101 2001/01/01 X
102 2003/01/01 Z
Note that the following does not work, as it would return three rows rather than the two I'm looking for:
SELECT c.DepartmentId, c.Name, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
I can't remember seeing a query that achieves this.
NOTE: It's not acceptable to join on the Hired field, as that would not be guaranteed to be accurate.
A subselect would do the job and would handle the case where more than one person was hired in the same department on the same day:
SELECT c.DepartmentId, c.Name, c.Hired from Customer c,
(SELECT DepartmentId, MAX(Hired) as MaxHired
FROM Customer
GROUP BY DepartmentId) as sub
WHERE c.DepartmentId = sub.DepartmentId AND c.Hired = sub.MaxHired
Standard Sql:
select *
from Customer C
where exists
(
-- Linq to Sql put NULL instead ;-)
-- In fact, you can even put 1/0 here and would not cause division by zero error
-- An RDBMS do not parse the select clause of correlated subquery
SELECT NULL
FROM Customer
where c.DepartmentId = DepartmentId
GROUP BY DepartmentId
having c.Hired = MAX(Hired)
)
If Sql Server happens to support tuple testing, this is the most succint:
select *
from Customer
where (DepartmentId, Hired) in
(select DepartmentId, MAX(Hired)
from Customer
group by DepartmentId)
SELECT a.*
FROM Customer AS a
JOIN
(SELECT DepartmentId, MAX(Hired) AS Hired
FROM Customer GROUP BY DepartmentId) AS b
USING (DepartmentId,Hired);