SQL Nested Query with distinct count - sql

I have a dilemma, and I'm hoping someone can help me out. I am working through some made-up problems based on an old textbook of mine. This isn't a question from the book, but the data is; I just wanted to see whether I could still work in SQL. When this code is executed,
SELECT COUNT(code_description) "Number of Different Crimes", last, first,
code_description
FROM
(
SELECT criminal_id, last, first, crime_code, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
ORDER BY criminal_id
)
WHERE criminal_id = 1020
GROUP BY last, first, code_description;
I am provided with these results:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
1 Phelps Sam Agg Assault
1 Phelps Sam Drug Offense
Ultimately, I would like the number of different crimes to be 2 on each line, since this criminal has two unique crimes charged to him. I would like it to be displayed something like:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
2 Phelps Sam Agg Assault
2 Phelps Sam Drug Offense
Not to push my luck, but I would also like to replace the following line:
WHERE criminal_id = 1020
with something a little more elegant that picks out any criminal with more than one crime type associated with them. In this case, Sam Phelps is the only one in this data set.

As @sgeddes said in a comment, you can use an analytic count, which doesn't need a subquery if you're specifying the criminal ID:
SELECT COUNT(code_description) OVER (PARTITION BY first, last) AS "Number of Different Crimes",
last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
WHERE criminal_id = 1020;
If you want to look for anyone with multiple crimes then you do need a subquery so you can filter on the analytic result:
SELECT charge_count AS "Number of Different Crimes",
last, first, code_description
FROM (
SELECT COUNT(DISTINCT code_description) OVER (PARTITION BY first, last) AS charge_count,
criminal_id, last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
)
WHERE charge_count > 1
ORDER BY criminal_id, code_description;
If the charges are spread across multiple crimes but duplicated, the distinct count still works, but you might want to add a DISTINCT to the overall result set (unless you want to show other crime-specific info); otherwise each duplicated charge appears as its own row.
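The filter on an analytic count can be tried out with a small sketch. It uses Python's sqlite3 module purely so the SQL can be run; the flattened `charges` table stands in for the four-table join, and the second criminal is made-up data. One assumption to flag: SQLite does not accept DISTINCT inside a window aggregate, so this sketch deduplicates in an inner query first and then counts with COUNT(*) OVER, instead of the Oracle COUNT(DISTINCT ...) OVER used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened stand-in for criminals/crimes/crime_charges/crime_codes.
    CREATE TABLE charges (criminal_id INTEGER, last TEXT, first TEXT, code_description TEXT);
    INSERT INTO charges VALUES
        (1020, 'Phelps', 'Sam', 'Agg Assault'),
        (1020, 'Phelps', 'Sam', 'Drug Offense'),
        (1020, 'Phelps', 'Sam', 'Drug Offense'),
        (1021, 'Doe', 'Jane', 'Theft');
""")

rows = conn.execute("""
    SELECT charge_count, last, first, code_description
    FROM (
        -- Count the deduplicated crime types per criminal.
        SELECT COUNT(*) OVER (PARTITION BY criminal_id) AS charge_count,
               criminal_id, last, first, code_description
        FROM (SELECT DISTINCT criminal_id, last, first, code_description
              FROM charges)
    )
    WHERE charge_count > 1
    ORDER BY criminal_id, code_description
""").fetchall()

for row in rows:
    print(row)
```

Partitioning by criminal_id rather than by first and last name avoids merging two different people who happen to share a name.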

Related

How can I get Access SQL to return a dataset of the largest value in each category?

This has been driving me crazy all day, and I've gone through every solution I can find on here. This should be a very simple thing.
I have a table in Access that contains a list of applications:
ApplicantNumber | Region
There are many more columns, but those are the two I care about at the moment. Each row is a separate application, and each applicant can submit multiple applications.
I have a query in Access that finds the count per applicant of applications in each region:
ApplicantNumber | Region | CountOfAPplications
How the ##&*!!! do I pull out of that the region with the most applications for each ApplicantNumber?
As far as I can tell, the following should work fine but it just provides the same output as the initial query with the full count per applicant:
SELECT myQry.ApplicantNumber, myQRY.Region, Max(myQRY.CountOfRegion)
FROM (SELECT AppliedCensusBlocks.ApplicantNumber, AppliedCensusBlocks.Region, Count(AppliedCensusBlocks.Region) AS CountOfRegion
FROM AppliedCensusBlocks
GROUP BY AppliedCensusBlocks.ApplicantNumber, AppliedCensusBlocks.Region) AS myQRY
GROUP BY myQry.ApplicantNumber, myQry.Region
What am I doing wrong? If I remove the Region field, Access works as I'd expect and just shows the ApplicantNumber and maximum count. But I'm really trying to get at the region name associated with the maximum count.
This is a bit tricky. MS Access is not the best suited for this sort of query. But here is one way:
SELECT acb.ApplicantNumber, acb.Region, Count(*) AS CountOfRegion
FROM AppliedCensusBlocks as acb
GROUP BY acb.ApplicantNumber, acb.Region
HAVING COUNT(*) = (SELECT TOP 1 COUNT(*)
FROM AppliedCensusBlocks as acb2
WHERE acb2.ApplicantNumber = acb.ApplicantNumber
GROUP BY acb2.Region
ORDER BY COUNT(*) DESC, acb2.Region
);
SELECT TOP 1 ApplicantNumber, Region, COUNT(*) AS Applications
FROM AppliedCensusBlocks
GROUP BY ApplicantNumber, Region
ORDER BY COUNT(*) DESC
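To see the difference between the two queries above, here is a small runnable sketch. It is written for SQLite through Python's sqlite3 module, not Access, so it uses LIMIT 1 where Access would use TOP 1, and a CTE where Access would use the saved counts query; the sample rows are made up. Under those assumptions, the correlated comparison returns one top region per applicant, while an uncorrelated top-1 returns a single row for the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AppliedCensusBlocks (ApplicantNumber INTEGER, Region TEXT);
    -- Made-up sample rows: one row per application.
    INSERT INTO AppliedCensusBlocks VALUES
        (1, 'North'), (1, 'North'), (1, 'South'),
        (2, 'East'), (2, 'West'), (2, 'West'), (2, 'West');
""")

# Per-applicant maximum: compare each count with that applicant's largest count.
top_regions = conn.execute("""
    WITH counts AS (
        SELECT ApplicantNumber, Region, COUNT(*) AS CountOfRegion
        FROM AppliedCensusBlocks
        GROUP BY ApplicantNumber, Region
    )
    SELECT c.ApplicantNumber, c.Region, c.CountOfRegion
    FROM counts AS c
    WHERE c.CountOfRegion = (SELECT MAX(c2.CountOfRegion)
                             FROM counts AS c2
                             WHERE c2.ApplicantNumber = c.ApplicantNumber)
    ORDER BY c.ApplicantNumber, c.Region
""").fetchall()

# Uncorrelated top-1: a single row for the whole table.
overall_top = conn.execute("""
    SELECT ApplicantNumber, Region, COUNT(*) AS Applications
    FROM AppliedCensusBlocks
    GROUP BY ApplicantNumber, Region
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchall()

print(top_regions)
print(overall_top)
```

If two regions tie for an applicant's maximum, this variant returns both rows; a tie-breaker would be needed to force exactly one.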

sql-server-2008 : get the last status of subjects of a student

Salam, (Greetings) to all.
Intro:
I am working on a Student Examination System, where students appear for exams and pass, fail, or are marked absent.
Problem:
I am tasked with fetching a summary of their status, a result card of sorts, which should print the very last status of each subject.
Below is a sample of the data where a student has appeared many times, in different sessions. I have highlighted one subject in which a student has appeared three times.
Now, I write the following Query which extract the same result as the picture above:
SELECT DISTINCT
gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,gr.MARKS,
gr.PASSFAIL, gr.GRADE,max(gr.SESSION_ID), gr.LEVEL_ID
FROM RESULT gr
WHERE gr.STUDKEY = '0100106524'
GROUP BY gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,gr.MARKS,
gr.PASSFAIL, gr.GRADE, gr.LEVEL_ID
Desired:
I want to get only the last status of a subject in which a student has appeared.
Help is requested. Thanks in advance.
Regards
I am using sql-server-2008.
This won't work because you include fields like gr.MARKS and gr.GRADE in both the SELECT and the GROUP BY. Every distinct combination of marks and grade then forms its own group, so the query returns one row per attempt instead of one row per subject. Join to the latest session instead:
SELECT
gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,
gr.PASSFAIL, gr.GRADE,gr.SESSION_ID, gr.LEVEL_ID
FROM RESULT gr
JOIN (SELECT MAX(SESSION_ID) AS SESSION_ID, STUDKEY, SUBJECT_ID
FROM RESULT
GROUP BY STUDKEY, SUBJECT_ID) gr1 ON gr1.SESSION_ID = gr.SESSION_ID AND gr1.STUDKEY = gr.STUDKEY AND gr1.SUBJECT_ID = gr.SUBJECT_ID
Hopefully there is a date field, or something else that indicates the order of the student's appearances in this class. Use that to sort your query in descending order, so that the most recent occurrence is the first record, then specify "TOP 1", which gives you only the most recent record for that student, including his most recent status.
SELECT TOP 1
gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,gr.MARKS,
gr.PASSFAIL, gr.GRADE,gr.SESSION_ID, gr.LEVEL_ID
FROM RESULT gr
WHERE gr.STUDKEY = '0100106524'
ORDER BY gr.Date DESC -- swap "Date" out for your field indicating the sequence.
Or use a GROUP BY with MAX(Date) if you're looking for multiple classes for the same student at the same time.
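A minimal sketch of the "latest row per subject" idea, run against SQLite through Python's sqlite3 module instead of SQL Server. It assumes that a higher SESSION_ID means a later session (the question does not confirm this), uses a cut-down RESULT table, and the subject IDs and statuses are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down RESULT table; the real one has more columns.
    CREATE TABLE RESULT (STUDKEY TEXT, SUBJECT_ID INTEGER, SESSION_ID INTEGER, PASSFAIL TEXT);
    INSERT INTO RESULT VALUES
        ('0100106524', 101, 1, 'Fail'),
        ('0100106524', 101, 2, 'Absent'),
        ('0100106524', 101, 3, 'Pass'),
        ('0100106524', 102, 1, 'Pass');
""")

latest = conn.execute("""
    SELECT gr.STUDKEY, gr.SUBJECT_ID, gr.SESSION_ID, gr.PASSFAIL
    FROM RESULT AS gr
    -- One row per (student, subject) holding that subject's latest session.
    JOIN (SELECT STUDKEY, SUBJECT_ID, MAX(SESSION_ID) AS SESSION_ID
          FROM RESULT
          GROUP BY STUDKEY, SUBJECT_ID) AS last_attempt
      ON last_attempt.STUDKEY = gr.STUDKEY
     AND last_attempt.SUBJECT_ID = gr.SUBJECT_ID
     AND last_attempt.SESSION_ID = gr.SESSION_ID
    WHERE gr.STUDKEY = '0100106524'
    ORDER BY gr.SUBJECT_ID
""").fetchall()

for row in latest:
    print(row)
```

The three attempts at subject 101 collapse to the single row from session 3, which is the behaviour the question asks for.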

How do I use the MAX function over three tables?

So, I have a problem with a SQL Query.
It's about getting weather data for German cities. I have 4 tables: staedte (the cities, with primary key loc_id), gehoert_zu (the city key together with the key of the weather station closest to that city, stations_id), wettermessung (all the weather measurements plus the station's key) and wetterstation (the station's key and location). I'm using PostgreSQL.
Here is what the tables look like:
wetterstation
s_id[PK] standort lon lat hoehe
----------------------------------------
10224 Bremen 53.05 8.8 4
wettermessung
stations_id[PK] datum[PK] max_temp_2m ......
----------------------------------------------------
10224 2013-3-24 -0.4
staedte
loc_id[PK] name lat lon
-------------------------------
15 Asch 48.4 9.8
gehoert_zu
loc_id[PK] stations_id[PK]
-----------------------------
15 10224
What I'm trying to do is get the name of the city with (for example) the highest temperature in a specified period (a whole month, or a single day). Since the weather data is bound to a station, I actually need to get the station's ID and then just pick one of the cities assigned to that station. A possible question would be: "In which city was it hottest in June?" If, say, the highest temperature was measured at station number 10224, I want to get the city Asch as the result. What I have so far is this:
SELECT name, MAX (max_temp_2m)
FROM wettermessung, staedte, gehoert_zu
WHERE wettermessung.stations_id = gehoert_zu.stations_id
AND gehoert_zu.loc_id = staedte.loc_id
AND wettermessung.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX (max_temp_2m) DESC
LIMIT 1
There are two problems with the results:
1) it's taking waaaay too long. The tables are not that big (cities has about 70k entries), but it needs between 1 and 7 minutes to get things done (depending on the time span)
2) it ALWAYS produces the same city and I'm pretty sure it's not the right one either.
I hope I managed to explain my problem clearly enough and I'd be happy for any kind of help. Thanks in advance ! :D
If you want to get the max temperature per city use this statement:
SELECT * FROM (
SELECT gz.loc_id, MAX(max_temp_2m) as temperature
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY gz.loc_id) as subselect
INNER JOIN staedte as std
ON std.loc_id = subselect.loc_id
ORDER BY subselect.temperature DESC
Use this statement to get the city with the highest temperature (only 1 city):
SELECT * FROM(
SELECT name, MAX(max_temp_2m) as temp
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
INNER JOIN staedte as std
ON gz.loc_id = std.loc_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX(max_temp_2m) DESC
LIMIT 1) as subselect
ORDER BY temp desc
LIMIT 1
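Prefer explicit joins (LEFT, RIGHT, INNER JOIN) over a comma-separated list of tables in the FROM clause. The join conditions then sit next to the tables they relate, so neither the reader nor the server has to work out your table references from the WHERE clause. The planner generally treats both forms the same way, so the gain is mainly in clarity and in avoiding accidental cross joins.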
This is a general example of how to get the item with the highest, lowest, biggest, smallest, whatever value. You can adjust it to your particular situation.
select fred, barney, wilma
from bedrock join
(select fred, max(dino) maxdino
from bedrock
where whatever
group by fred ) flinstone on bedrock.fred = flinstone.fred
where dino = maxdino
and other conditions
I propose you use a consistent naming convention. Singular terms for tables holding a single item per row is a good convention. Your only table breaking this is staedte; it should be stadt.
And I suggest using station_id consistently instead of both s_id and stations_id.
Building on these premises, for your question:
... get the name of the city with the ... highest temperature at a specified date
SELECT s.name, w.max_temp_2m
FROM (
SELECT station_id, max_temp_2m
FROM wettermessung
WHERE datum >= '2012-8-1'::date
AND datum < '2012-12-1'::date -- exclude upper border
ORDER BY max_temp_2m DESC, station_id -- id as tie breaker
LIMIT 1
) w
JOIN gehoert_zu g USING (station_id) -- assuming normalized names
JOIN stadt s USING (loc_id)
Use explicit JOIN conditions for better readability and maintenance.
Use table aliases to simplify your query.
Use x >= a AND x < b to include the lower border and exclude the upper border, which is the common use case.
Aggregate first and pick your station with the highest temperature, before you join to the other tables to retrieve the city name. Much simpler and faster.
You did not specify what to do when multiple "wettermessungen" tie on max_temp_2m in the given time frame. I added station_id as tiebreaker, meaning the station with the lowest id will be picked consistently if there are multiple qualifying stations.
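The "pick the hottest measurement first, then join" approach can be sketched as follows. This runs on SQLite via Python's sqlite3 module, not PostgreSQL, keeps the question's original table and column names (staedte, stations_id) instead of the renamed ones, stores dates as zero-padded ISO text so string comparison orders them correctly, and uses made-up measurements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wettermessung (stations_id INTEGER, datum TEXT, max_temp_2m REAL);
    CREATE TABLE gehoert_zu (loc_id INTEGER, stations_id INTEGER);
    CREATE TABLE staedte (loc_id INTEGER PRIMARY KEY, name TEXT);
    -- Made-up sample data; the 40.0 reading falls on the excluded upper border.
    INSERT INTO wettermessung VALUES
        (10224, '2012-08-15', 31.5),
        (10224, '2012-12-01', 40.0),
        (10300, '2012-09-01', 28.0);
    INSERT INTO staedte VALUES (15, 'Asch'), (16, 'Bremen');
    INSERT INTO gehoert_zu VALUES (15, 10224), (16, 10300);
""")

hottest = conn.execute("""
    SELECT s.name, w.max_temp_2m
    FROM (
        -- Find the single hottest measurement before touching the city tables.
        SELECT stations_id, max_temp_2m
        FROM wettermessung
        WHERE datum >= '2012-08-01'
          AND datum <  '2012-12-01'
        ORDER BY max_temp_2m DESC, stations_id
        LIMIT 1
    ) AS w
    JOIN gehoert_zu AS g ON g.stations_id = w.stations_id
    JOIN staedte AS s ON s.loc_id = g.loc_id
""").fetchall()

print(hottest)
```

Because the inner query returns one row, the joins only ever look up the cities of a single station. If several cities share that station, all of them are returned; add an ORDER BY and LIMIT 1 to the outer query to keep just one.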

Query to find how many students signed out

I need to write a query to find out how many students signed out after 1st period. We don't store a record if the student was present, so I can't check that the student was present 1st period and has 6 absence records (we have 7-period days). All I have is the info in the schema below. I have a query that I wrote, but it's not working. I need some help on where to go from here.
Thanks
Select student_id, Count(*) AS #ofPerAbsent
From Attend_Student_Detail
where School_Year='1112' and School_Number='0031'
and Absent_Date='2012-04-13' and Absent_Code IN ('ABU','ABX')
Group by Student_ID
Having count(*)<=6
ORDER BY #ofPerAbsent desc
So your criterion for determining that a student signed out after 1st period is having an Absent_Code of 'ABU' or 'ABX'?
If that assumption is correct, then you can query as follows to get count of students per day that fit that criteria...
SELECT Absent_Date, COUNT(DISTINCT Student_ID)
FROM Attend_Student_Detail
WHERE Absent_Code IN ('ABU','ABX')
GROUP BY Absent_Date
You can further filter to specific dates in the WHERE clause if you'd like.
Your schema doesn't make much sense to me by the way; so if the above is not what you're looking for, can you please explain your schema a bit more and I'm sure I can help.
From what I can gather, you will want to count all the absences minus the count of absences after the first period, so I think something like this should work.
SELECT
A.student_id,
-- COALESCE keeps the result non-NULL for students with no matching row in B
(Count(A.student_id) - COALESCE(B.absences_after, 0)) as absences
FROM
attend_student_detail as A
LEFT JOIN (
SELECT
Z.student_id,
Count(Z.student_id) as absences_after
FROM
attend_student_detail as Z
WHERE school_year='1112' AND school_number='0031'
AND absent_date='2012-04-13' AND absent_code IN ('ABU','ABX')
-- absent_period and 'period one' are placeholders for your period column and value
AND absent_period <> 'period one'
GROUP BY Z.student_id
) as B
ON B.student_id = A.student_id
GROUP BY A.student_id, B.absences_after;

SQL Query (incorrect use of joins?)

I'm trying to write an (Oracle) SQL query that, given an "agent_id", would give me a list of questions that agent has answered during an assessment, as well as an average score over all of the times that agent has answered those questions.
Note: I tried to design the query such that it would support multiple employees (so we can query at the store level), hence the "IN" condition in the where clause.
Here's what I have so far:
select question.question_id as questionId,
((sum(answer.answer) / count(answer.answer)) * 100) as avgScore
from SPMADMIN.SPM_QC_ASSESSMENT_ANSWER answer
join SPMADMIN.SPM_QC_QUESTION question
on answer.question_id = question.question_id
join SPMADMIN.SPM_QC_ASSESSMENT assessment
on answer.assessment_id = assessment.assessment_id
join SPMADMIN.SPM_QC_SUB_GROUP_TYPE sub_group
on question.sub_group_type_id = sub_group.sub_group_id
join SPMADMIN.SPM_QC_GROUP_TYPE theGroup
on sub_group.group_id = theGroup.group_id
where question.question_id in (select distinct question2.question_id
from SPMADMIN.SPM_QC_QUESTION question2
)
and question.bool_yn_active_flag = 'Y'
and assessment.agent_id in (?)
and answer.answer is not null
order by theGroup.page_order asc,
sub_group.group_order asc,
question.sub_group_order asc
Basically I would want to see:
|questionId|avgScore|
| 1 | 100 |
| 2 | 50 |
| 3 | 75 |
Such that every question that employee has ever answered is in the list of question indexes with their average score over all of the times they've answered it.
When I run it as is, I'm given a "ORA-00937: not a single-group group function" error. Any sort of combination of a "group by" clause I've added hasn't helped in the least.
When I run it removing the question.question_id as questionId, part of the select, it runs fine, but it shows their average score over all questions. I need it broken down by question.
Any help or pointers would be greatly appreciated.
When you have an aggregate function in the SELECT list (SUM and COUNT are aggregate functions), then any other columns in the SELECT list need to be in a GROUP BY clause. For example:
SELECT fi, COUNT(fo)
FROM fum
GROUP BY fi
The COUNT(fo) expression is an aggregate, the fi column is a non-aggregate. If you were to add another non-aggregate to the SELECT list, it would also need to be included in the GROUP BY. For example
SELECT TRUNC(fee), fi, COUNT(fo)
FROM fum
GROUP BY TRUNC(fee), fi
To be a little more precise, rather than say "columns in the SELECT list", we should actually say "all non-aggregate expressions in the SELECT list" will need to be included in the GROUP BY clause.
It's not your joins but your use of GROUP BY.
When you use a GROUP BY in SQL, the things you GROUP BY are the things which define the groups. Everything else you have in your SELECT has to be inside an aggregate that operates over the group.
You can also do aggregates over the entire set without a GROUP BY, but then every column will need to be within an aggregate function.
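Applied to the question, the rule means grouping by the question ID. Here is a minimal sketch using Python's sqlite3 module instead of Oracle, with a single hypothetical `answers` table standing in for the joined SPM_QC_* tables and made-up scores of 0 or 1. Note that SQLite is more lenient than Oracle about ungrouped columns, so it will not reproduce ORA-00937; the sketch only shows the corrected shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined assessment/answer tables.
    CREATE TABLE answers (agent_id INTEGER, question_id INTEGER, answer INTEGER);
    INSERT INTO answers VALUES
        (7, 1, 1), (7, 1, 1),
        (7, 2, 1), (7, 2, 0),
        (7, 3, 1), (7, 3, 1), (7, 3, 1), (7, 3, 0),
        (8, 1, 0);
""")

scores = conn.execute("""
    SELECT question_id AS questionId,
           SUM(answer) * 100.0 / COUNT(answer) AS avgScore
    FROM answers
    WHERE agent_id IN (?)
      AND answer IS NOT NULL
    -- question_id is the only non-aggregate in the SELECT list, so group by it.
    GROUP BY question_id
    ORDER BY question_id
""", (7,)).fetchall()

for question_id, avg_score in scores:
    print(question_id, avg_score)
```

In the original Oracle query the ORDER BY columns (page_order, group_order, sub_group_order) are also non-aggregates, so they would need to appear in the GROUP BY or be wrapped in an aggregate such as MIN.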