SQL query for the given table - sql

I have 2 tables, Student and Supervisor:
STUDENT(supervisorid(pk),name,email....)
SUPERVISOR(supervisorid(pk),name,email....)
Now I need to print each supervisor's name and email along with the number of students under that supervisor (the students share the same supervisor id). Something like:
select supervisorname,
supervisoremail,
tot_stud as (select count(*)
Phd_Student s
where s.supervisor_id = r.supervisor_id)
from Phd_Supervisor r
Can you please tell me the SQL query for this?

You can use a GROUP BY clause for this query: join the two tables on the supervisor id, select the fields you want to display together with count(*), and list all of the displayed fields (but not the count) in the GROUP BY clause, since those are the fields the students are grouped by. Alternatively, a correlated subquery returns the same count without a join:

select supervisorname,
supervisoremail,
(select count(*)
from Phd_Student s
where s.supervisor_id = r.supervisor_id) as tot_stud
from Phd_Supervisor r
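
The GROUP BY approach described above can be sketched as follows. This is only a minimal illustration, run through Python's built-in sqlite3 module with made-up sample rows; the table definitions (including the student_id column) are assumptions, and the column names follow the query above. A LEFT JOIN is used here so that a supervisor without students still appears with a count of 0.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Phd_Supervisor (
        supervisor_id INTEGER PRIMARY KEY,
        supervisorname TEXT,
        supervisoremail TEXT
    );
    CREATE TABLE Phd_Student (
        student_id INTEGER PRIMARY KEY,  -- assumed column, not in the question
        name TEXT,
        supervisor_id INTEGER
    );
    INSERT INTO Phd_Supervisor VALUES
        (1, 'Ann', 'ann@example.com'),
        (2, 'Bob', 'bob@example.com');
    INSERT INTO Phd_Student VALUES
        (10, 'S1', 1),
        (11, 'S2', 1);
""")

# Join, then group by every displayed supervisor column.
# COUNT(s.supervisor_id) ignores the NULLs produced by the LEFT JOIN.
group_by_sql = """
    SELECT r.supervisorname,
           r.supervisoremail,
           COUNT(s.supervisor_id) AS tot_stud
    FROM Phd_Supervisor r
    LEFT JOIN Phd_Student s
           ON s.supervisor_id = r.supervisor_id
    GROUP BY r.supervisor_id, r.supervisorname, r.supervisoremail
    ORDER BY r.supervisor_id
"""
rows = conn.execute(group_by_sql).fetchall()
for row in rows:
    print(row)
```

With an INNER JOIN instead of the LEFT JOIN, supervisors that have no students would be dropped from the result.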

Related

Count Distinct values in one column based on other column

I am trying to count the distinct values of Z_l for a given value of X by using a WITH clause. A sample data exercise is included below.
Please look at the picture: it shows the distinct values of Z_l for X='ny'.
with distincz_l as (select ny.X, ny.z_l o.cnt From HOPL ny join (select X, count(*) as cnt from HOPL group by X) o on (ny.X = o.Z_l)) select * from HOPL;
You don't even need a WITH clause, since a single statement is enough:
SELECT z_l, count(1)
FROM hopl
WHERE x='ny'
GROUP BY z_l
;

Aggregating based on GROUPING of multiple columns

I am trying to subquery and aggregate in SQL after doing an initial query with multiple joins. My ultimate goal is to get a count (or a sum) of specimens tested based on a grouping of multiple columns. This is slightly different from SQL Server query - Selecting COUNT(*) with DISTINCT and SQL Server: aggregate error on grouping.
The three tables that I use (PERSON, SPECIMEN, TEST), have 1-many relationships. So PERSON has many SPECIMENS and those SPECIMENS have many TESTS. I did three inner joins to combine these tables plus an additional table (ANALYSIS).
WITH TALLY as (
SELECT PERSON.NAME, PERSON.PHASE, TEST.DATE_STARTED, TEST.ANALYSIS, SPECIMEN.GROUP, TEST.STATUS,
ANALYSIS.ANALYSIS_TYPE, SPECIMEN.SPECIMEN_NUMBER
FROM DB.TEST
INNER JOIN
DB.SAMPLE ON
TEST.SPECIMEN_NUMBER = SPECIMEN.SPECIMEN_NUMBER
INNER JOIN
DB.PRODUCT ON
SPECIMEN.PERSON = PERSON.NAME
INNER JOIN
DB.ANALYSIS ON
TEST.ANALYSIS = ANALYSIS.NAME
WHERE PERSON.NAME = 'Joe'
AND TEST.DATE_STARTED >= '20-DEC-16' AND TEST.DATE_STARTED <='01-APR-18'
AND PERSON.PHASE = 'PHASE1'
ORDER BY TEST.DATE_STARTED)
SELECT COUNT(DISTINCT ANALYSIS) as SPECIMEN_COUNT, DATE_STARTED, ANALYSIS, STATUS, GROUP, ANALYSIS_TYPE
FROM TALLY
GROUP BY DATE_STARTED, ANALYSIS, STATUS, GROUP, ANALYSIS_TYPE
ORDER BY DATE_STARTED;
This gives me the repeated columns: first grouping repeated 4 times
What I am trying to see is: aggregated first grouping with total count
Any thoughts as to what is missing? SUM instead of COUNT or in addition to COUNT creates an error. Thanks in advance!
9/17/2020 Update: I have tried adding a subquery because I also need to use a new column of metadata (ANALYSIS_TYPE_ALIAS) which is created in the first query through a CASE STATEMENT(...). I have also tried using another subquery with inner join to count based on those conditions to a temp table, but still cannot seem to aggregate to flatten the table. Here is my current attempt:
WITH TALLY as (
SELECT PERSON.NAME, PERSON.PHASE, TEST.DATE_STARTED, TEST.ANALYSIS, SPECIMEN.GROUP, TEST.STATUS,
ANALYSIS.ANALYSIS_TYPE...
FROM DB.TEST
INNER JOIN
DB.SAMPLE ON
TEST.SPECIMEN_NUMBER = SPECIMEN.SPECIMEN_NUMBER
INNER JOIN
DB.PRODUCT ON
SPECIMEN.PERSON = PERSON.NAME
INNER JOIN
DB.ANALYSIS ON
TEST.ANALYSIS = ANALYSIS.NAME
WHERE PERSON.NAME = 'Joe'
AND TEST.DATE_STARTED >= '20-DEC-16' AND TEST.DATE_STARTED <='01-APR-18'
AND PERSON.PHASE = 'PHASE1'
ORDER BY TEST.DATE_STARTED),
SUMMARY_COMBO AS (SELECT DISTINCT(CONCAT(CONCAT(CONCAT(CONCAT(ANALYSIS, DATE_STARTED),STATUS), GROUP), ANALYSIS_TYPE_ALIAS))AS UUID,
TALLY.NAME, TALLY.PHASE, TALLY.DATE_STARTED, TALLY.ANALYSIS, TALLY.GROUP, TALLY.STATUS, TALLY.ANALYSIS_TYPE_ALIAS
FROM TALLY)
SELECT SUMMARY_COMBO.NAME, SUMMARY_COMBO.PHASE, SUMMARY_COMBO.DATE_STARTED, SUMMARY_COMBO.ANALYSIS,SUMMARY_COMBO.GROUP, SUMMARY_COMBO.STATUS, SUMMARY_COMBO.ANALYSIS_TYPE_ALIAS,
COUNT(SUMMARY_COMBO.ANALYSIS) OVER (PARTITION BY SUMMARY_COMBO.UUID) AS SPECIMEN_COUNT
FROM SUMMARY_COMBO
ORDER BY SUMMARY_COMBO.DATE_STARTED;
This gave me the following table Shows aggregated counts, but doesn't aggregate based on unique UUID. Is there a way to take the sum of the count? I've tried to do this by storing count to a subquery and then referencing that count variable, but I am missing something in how to group the 8 columns of data that I want to show + the count of that combination of columns.
Thanks!
Just remove analysis from the group by clause, since that's the column whose distinct values you want to count. Otherwise, the query generates more groups than what you need (and the count of distinct analysis values in each group is always 1).
WITH TALLY as ( ...)
SELECT COUNT(DISTINCT ANALYSIS) as SPECIMEN_COUNT, DATE_STARTED, STATUS, GROUP, ANALYSIS_TYPE
FROM TALLY
GROUP BY DATE_STARTED, STATUS, GROUP, ANALYSIS_TYPE
ORDER BY DATE_STARTED;
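
The effect of grouping by the counted column can be checked on a small example. The sketch below uses Python's built-in sqlite3 module and a hypothetical tally table with invented rows; the column is named grp rather than GROUP because GROUP is a reserved word, which is an assumption made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tally (date_started TEXT, analysis TEXT, status TEXT, "
    "grp TEXT, analysis_type TEXT)"
)
# Four rows that differ only in the analysis column (made-up data).
conn.executemany(
    "INSERT INTO tally VALUES (?, ?, ?, ?, ?)",
    [("2017-01-05", a, "DONE", "G1", "CHEM") for a in ("A", "B", "C", "D")],
)

# Grouping by analysis as well: one group per analysis, each count is 1.
with_analysis = conn.execute("""
    SELECT COUNT(DISTINCT analysis) AS specimen_count, date_started
    FROM tally
    GROUP BY date_started, analysis, status, grp, analysis_type
""").fetchall()

# Without analysis in the GROUP BY: one group, counting all distinct analyses.
without_analysis = conn.execute("""
    SELECT COUNT(DISTINCT analysis) AS specimen_count,
           date_started, status, grp, analysis_type
    FROM tally
    GROUP BY date_started, status, grp, analysis_type
""").fetchall()

print(with_analysis)
print(without_analysis)
```

The first query yields four rows with a count of 1 each, which matches the repeated grouping described in the question; the second collapses them into a single row with the total.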

Show unique ID's in a table with all extra info

SELECT Personeelsnummer, Achternaam, Voornaam, Departement, SubDep, SubSubDep, FTE, RedenUitDienst, Anciennitëitsdatum, GeldigOp, Schrapping, Ancienniteit, Positie, Nieveau, OmschrijfingStatuut
FROM tbl_Worker
GROUP BY Personeelsnummer
OR
SELECT (DISTINCT Personeelsnummer), Achternaam, Voornaam, Departement, SubDep, SubSubDep, FTE, RedenUitDienst, Anciennitëitsdatum, GeldigOp, Schrapping, Ancienniteit, Positie, Nieveau, OmschrijfingStatuut
FROM tbl_Worker
GROUP BY Personeelsnummer
I have a worker table with 49,000 records; it contains a 'snapshot' of all workers for EVERY month. What I need is a table listing every employee the company has 'ever' had, but only once each. I tried to write the queries shown above, but they do not work.
So what I need is a query that shows all unique 'Personeelsnummers' with all the extra information about these persons.
What does work is this: SELECT DISTINCT Personeelsnummer FROM tbl_Worker. This gives me a table with 1200 records, but it contains only the numbers, and I need all the extra information.
Instead of GROUP BY, use WHERE to get the first or last record:
SELECT w.*
FROM tbl_Worker as w
WHERE monthcol = (SELECT MAX(w2.monthcol)
FROM tbl_Worker as w2
WHERE w2.Personeelsnummer = w.Personeelsnummer
);
You would use MIN() to get the first month's record. My Dutch is a bit weak, so I don't know which column refers to the date for the record.
For performance, you want an index on tbl_Worker(Personeelsnummer, GeldigOp):
create index idx_tbl_worker_Personeelsnummer_GeldigOp on tbl_Worker(Personeelsnummer, GeldigOp);
EDIT:
Or you can do it with a JOIN:
SELECT w.*
FROM tbl_Worker as w INNER JOIN
(SELECT Personeelsnummer, MAX(GeldigOp) as max_GeldigOp
FROM tbl_Worker
GROUP BY Personeelsnummer
) as ww
ON ww.Personeelsnummer = w.Personeelsnummer and ww.max_GeldigOp = w.GeldigOp;
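
As a rough check of the JOIN version, the sketch below runs it through Python's built-in sqlite3 module on a reduced, hypothetical tbl_Worker with only three columns and invented rows. It assumes GeldigOp is the snapshot date and stores it as ISO-formatted text so that MAX() picks the latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table: only three of the original columns, for illustration.
conn.execute(
    "CREATE TABLE tbl_Worker (Personeelsnummer INTEGER, Achternaam TEXT, GeldigOp TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_Worker VALUES (?, ?, ?)",
    [
        (1, "Jansen", "2020-01-01"),   # older snapshot of worker 1
        (1, "Jansen", "2020-02-01"),   # latest snapshot of worker 1
        (2, "Peeters", "2020-01-01"),  # only snapshot of worker 2
    ],
)

# Keep, per worker, only the row whose GeldigOp equals that worker's maximum.
latest_sql = """
    SELECT w.Personeelsnummer, w.Achternaam, w.GeldigOp
    FROM tbl_Worker AS w
    INNER JOIN (
        SELECT Personeelsnummer, MAX(GeldigOp) AS max_GeldigOp
        FROM tbl_Worker
        GROUP BY Personeelsnummer
    ) AS ww
      ON ww.Personeelsnummer = w.Personeelsnummer
     AND ww.max_GeldigOp = w.GeldigOp
    ORDER BY w.Personeelsnummer
"""
latest_rows = conn.execute(latest_sql).fetchall()
for row in latest_rows:
    print(row)
```

Each worker appears once, with the values from the most recent snapshot.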
You're looking for a group by:
select *
from table
group by field1
In PostgreSQL, this can also be written with DISTINCT ON:
select distinct on (field1) *
from table
As seen in this topic.

How can I COUNT rows from another table using a SELECT statement when joining?

This is the first time I've tried including a row count within a select statement. I've tried the following, but including COUNT(other row) is apparently not allowed in the way I'm trying to do it.
How can I include a row count from another table in a select statement, mainly consisting of objects from the first table?
-Thanks
...
SELECT
Reports.ReportID,
EmployeeADcontext,
ReportName,
CreatedDate,
COUNT(Expenses.ExpID) AS ExpCount,
ReportTotal,
Status
FROM
[dbo].[Reports]
INNER JOIN
[dbo].[Expenses]
ON
[dbo].[Expenses].ReportID = [dbo].[Reports].ReportID
WHERE EmployeeADcontext = @rptEmployeeADcontext
You are missing your GROUP BY. Whenever you mix aggregates (SUM, COUNT, MAX, etc.) with non-aggregated columns, you need a GROUP BY clause that lists every non-aggregated column in the SELECT list. So your code should read:
SELECT
Reports.ReportID,
EmployeeADcontext,
ReportName,
CreatedDate,
COUNT(Expenses.ExpID) AS ExpCount,
ReportTotal,
Status
FROM
[dbo].[Reports]
INNER JOIN
[dbo].[Expenses]
ON
[dbo].[Expenses].ReportID = [dbo].[Reports].ReportID
WHERE EmployeeADcontext = @rptEmployeeADcontext
GROUP BY Reports.ReportID, EmployeeADcontext, ReportName, CreatedDate,
ReportTotal, Status
Here is some additional documentation on T-SQL GROUP BY.
You need a group by clause.
Add:
GROUP BY
Reports.ReportID,
EmployeeADcontext,
ReportName,
CreatedDate,
ReportTotal,
Status
You could use a sub-query to return the count. That way you don't need any joins. For example:
SELECT
r.ReportID,
r.EmployeeADcontext,
r.ReportName,
r.CreatedDate,
(select COUNT(e1.ExpID) FROM Expenses e1 where e1.ReportID = r.ReportId) AS ExpCount,
r.ReportTotal,
r.Status
FROM Reports r
WHERE r.EmployeeADcontext = @rptEmployeeADcontext
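
The join-based and subquery-based versions are not fully interchangeable: an INNER JOIN drops reports that have no expenses, while the correlated subquery returns them with a count of 0. The sketch below shows this using Python's built-in sqlite3 module; the tables are cut down to a few columns and filled with invented rows, so treat it as an illustration rather than the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reports (ReportID INTEGER PRIMARY KEY, ReportName TEXT);
    CREATE TABLE Expenses (ExpID INTEGER PRIMARY KEY, ReportID INTEGER);
    INSERT INTO Reports VALUES (1, 'Travel'), (2, 'Office');
    -- Report 1 has two expenses; report 2 has none.
    INSERT INTO Expenses VALUES (100, 1), (101, 1);
""")

# INNER JOIN + GROUP BY: only reports with at least one expense.
join_rows = conn.execute("""
    SELECT r.ReportID, r.ReportName, COUNT(e.ExpID) AS ExpCount
    FROM Reports r
    INNER JOIN Expenses e ON e.ReportID = r.ReportID
    GROUP BY r.ReportID, r.ReportName
    ORDER BY r.ReportID
""").fetchall()

# Correlated subquery: every report, with 0 where nothing matches.
subquery_rows = conn.execute("""
    SELECT r.ReportID,
           r.ReportName,
           (SELECT COUNT(e1.ExpID)
            FROM Expenses e1
            WHERE e1.ReportID = r.ReportID) AS ExpCount
    FROM Reports r
    ORDER BY r.ReportID
""").fetchall()

print(join_rows)
print(subquery_rows)
```

Switching the join to a LEFT JOIN would make the GROUP BY version return the empty report as well.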

How to 'add' a column to a query result while the query contains aggregate function?

I have a table named 'Attendance' which is used to record student attendance time in courses. This table has 4 columns, say 'id', 'course_id', 'attendance_time', and 'student_name'. An example of few records in this table is:
23 100 1/1/2010 10:00:00 Tom
24 100 1/1/2010 10:20:00 Bob
25 187 1/2/2010 08:01:01 Lisa
.....
I want to create a summary of the latest attendance time for each course. I created a query below:
SELECT course_id, max(attendance_time) FROM attendance GROUP BY course_id
The result would be something like this
100 1/1/2010 10:20:00
187 1/2/2010 08:01:01
Now, all I want to do is add the 'id' column to the result above. How to do it?
I can't just change the command to something like this
SELECT id, course_id, max(attendance_time) FROM attendance GROUP BY id, course_id
because it would return all the records as if the aggregate function is not used. Please help me.
This is a typical 'greatest per group', 'greatest-n-per-group' or 'groupwise maximum' query that comes up on Stack Overflow almost every day. You can search Stack Overflow for these terms to find many different examples of how to solve this with different databases. One way to solve it is as follows:
SELECT
T2.course_id,
T2.attendance_time,
T2.id
FROM (
SELECT
course_id,
MAX(attendance_time) AS attendance_time
FROM attendance
GROUP BY course_id
) T1
JOIN attendance T2
ON T1.course_id = T2.course_id
AND T1.attendance_time = T2.attendance_time
Note that this query can in theory return multiple rows per course_id if there are multiple rows with the same attendance_time. If that cannot happen then you don't need to worry about this issue. If this is a potential problem then you can solve this by adding an extra grouping on course_id, attendance_time and selecting the minimum or maximum id.
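
The tie-breaking idea in the note above might look like the following sketch, run through Python's built-in sqlite3 module. The sample rows extend the data from the question with one invented row (id 26) that ties with id 24 on attendance_time, and timestamps are stored as ISO text; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance (id INTEGER PRIMARY KEY, course_id INTEGER, "
    "attendance_time TEXT, student_name TEXT)"
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?)",
    [
        (23, 100, "2010-01-01 10:00:00", "Tom"),
        (24, 100, "2010-01-01 10:20:00", "Bob"),
        (26, 100, "2010-01-01 10:20:00", "Eve"),  # invented tie with id 24
        (25, 187, "2010-01-02 08:01:01", "Lisa"),
    ],
)

# Same join as above, plus an outer GROUP BY that keeps one id per course.
tie_break_sql = """
    SELECT T2.course_id,
           T2.attendance_time,
           MAX(T2.id) AS id
    FROM (
        SELECT course_id, MAX(attendance_time) AS attendance_time
        FROM attendance
        GROUP BY course_id
    ) T1
    JOIN attendance T2
      ON T1.course_id = T2.course_id
     AND T1.attendance_time = T2.attendance_time
    GROUP BY T2.course_id, T2.attendance_time
    ORDER BY T2.course_id
"""
result = conn.execute(tie_break_sql).fetchall()
for row in result:
    print(row)
```

Using MIN(T2.id) instead would keep the lowest id among the tied rows.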
What do you need the additional column for? It already has a course ID, which identifies the data. A synthetic ID to the query would be useless because it does not refer to anything. If you want to get the max from the query results for a single course, then you can add a where condition like this:
SELECT course_id, max(attendance_time) FROM attendance WHERE course_id = your_id_here GROUP BY course_id;
If you mean that the column should be named 'id', you can alias it in the query:
SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;
You could make a view out of your query to easily access the aggregate data:
CREATE VIEW max_course_times AS SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;
SELECT * FROM max_course_times;
For SQL Server 2008 onwards, I like to use a Common Table Expression to add aggregated columns to queries:
WITH AttendanceTimes (course_id, maxTime)
AS
(
SELECT
course_id,
MAX(attendance_time)
FROM attendance
GROUP BY course_id
)
SELECT
a.course_id,
t.maxTime,
a.id
FROM attendance a
INNER JOIN AttendanceTimes t
ON a.course_id = t.course_id
AND a.attendance_time = t.maxTime