To solve the below question, I want to combine 2 result sets (Query #1 + Query #2) using SQL.
I've tried UNION but it didn't work. Help!
Question:
Sample Data:
Query #1:
SELECT
COUNT(student_id),
MAX(registration_date),
MAX(lab1)
FROM grades;
Query #1 result:
Query #2:
SELECT
MAX(SUM(NVL(lab1, 0) + NVL(lab2, 0)))
FROM grades
GROUP BY student_id;
Query #2 result:
You had a few descent attempts. If you could edit your existing post and supply some sample data would help better show us what the underlying basis would be. Are there multiple rows per lab1 and/or lab2. Are there multiple classes represented? You want a sum, but doing a max?
You need a group by per student so each student shows of their own. So you had that. You also had max(), but also sum(). So hopefully this will be close to what you need.
SELECT
g.student_id,
max( g.registration_date ) NewestRegistrationDate,
max( coalesce( g.lab1, 0 )) HighestLab1,
max( coalesce( g.lab2, 0 )) HighestLab2,
max( coalesce( g.lab1, 0 ))
+ max( coalesce( g.lab2, 0 )) HighestBothLabs
FROM
grades g
group by
g.student_id
At least the above will give you the per-student basis as your grid image would represent, not the final answer which apparently wants the total student and the highest grade overall. That one, you WOULD want the count(), and ignore the group by as you want the answer over the entire set. But do you want the highest in each lab category? Or the highest per student with whatever the highest student's total labs are.
EX:
Student Lab1 Lab2 Total
1 75 92 167
2 74 100 174
Here, you have two students. The first has the highest lab1, but the second has highest total. You need to make that determination or at least get clarification from the teacher. If you want the highest by lab 1, you probably need something like wrapping the first query into the second. But that too may be over what you are learning. Using one result as basis for the outer such as.
select
count(*) TotalStudents,
Max( PerStudent.NewestRegistrationDate ) MostRecentRegistrationDate,
Max( PerStudent.HighestLab1 ) HighestFirstLab,
Max( PerStudent.HighestBothLabs ) HighestAllLabs
from
( SELECT
g.student_id,
max( g.registration_date ) NewestRegistrationDate,
max( coalesce( g.lab1, 0 )) HighestLab1,
max( coalesce( g.lab2, 0 )) HighestLab2,
max( coalesce( g.lab1, 0 ))
+ max( coalesce( g.lab2, 0 )) HighestBothLabs
FROM
grades g
group by
g.student_id ) PerStudent
The inner pre-query (alias PerStudent) keeps the data at the per-student level giving you the basis to get the highest per at the OUTER query level that does NOT have the group by. But at least running the first query to confirm the baseline of the data first. If you dont have that correct, the outer won't make sense either.
Thanks to DRapp, I solved it with the code below.
SELECT
COUNT(*) TotalStudents,
MAX(PerStudent.MostRecentRegistrationDate) MostRecentRegistrationDate,
MAX(PerStudent.HighestFirstLab) HighestFirstLab,
MAX(PerStudent.SumBothLabs) HighestBothLabs
FROM
(SELECT
MAX(g.registration_date) MostRecentRegistrationDate,
MAX(coalesce(g.lab1, 0)) HighestFirstLab,
SUM(coalesce(g.lab1, 0) + coalesce(g.lab2, 0)) SumBothLabs
FROM grades g
GROUP BY g.student_id) PerStudent;
Related
I am working on some car accident data and am stuck on how to get the data in the form I want.
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
This is my code, which counts the accidents had per each sex for each severity. I know I can do this with group by but I wanted to use a partition by in order to work out % too.
However I get a very large table (I assume for each row that is each sex/severity. When I do the following:
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
group by
sex_of_driver,
accident_severity
I get this:
sex_of_driver
accident_severity
(No column name)
1
1
1
1
2
1
-1
2
1
-1
1
1
1
3
1
I won't give you the whole table, but basically, the group by has caused the count to just be 1.
I can't figure out why group by isn't working. Is this an MS SQL-Server thing?
I want to get the same result as below (obv without the CASE etc)
select
accident.accident_severity,
count(accident.accident_severity) as num_accidents,
vehicle.sex_of_driver,
CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
where
sex_of_driver != 3
and
sex_of_driver != -1
group by
accident.accident_severity,
vehicle.sex_of_driver
order by
accident.accident_severity
You seem to have a misunderstanding here.
GROUP BY will reduce your rows to a single row per grouping (ie per pair of sex_of_driver, accident_severity values. Any normal aggregates you use with this, such as COUNT(*), will return the aggregate value within that group.
Whereas OVER gives you a windowed aggregated, and means you are calculating it after reducing your rows. Therefore when you write count(accident_severity) over (partition by sex_of_driver, accident_severity) the aggregate only receives a single row in each partition, because the rows have already been reduced.
You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too." but you are misunderstanding how to do that. You don't need PARTITION BY to work out percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.
Note also that count(accident_severity) does not give you the number of distinct accident_severity values, it gives you the number of non-null values, which is probably not what you intend. You also have a very strange join predicate, you probably want something like a.vehicle_id = v.vehicle_id
So you want something like this:
select
sex_of_driver,
accident_severity,
count(*) as Count,
count(*) * 1.0 /
sum(count(*)) over (partition by sex_of_driver) as PercentOfSex
count(*) * 1.0 /
sum(count(*)) over () as PercentOfTotal
from
dbo.accident as accident a
inner join dbo.vehicle as v on
a.vehicle_id = v.vehicle_id
group by
sex_of_driver,
accident_severity;
I have 3 sub-tables of different formats joined together with unions if this affects anything into full-table. There I have columns "location", "amount" and "time". Then to keep generality for my later needs I union full-table with location-table that has all possible "location" values and other fields are null into master-table.
I query master-table,
select location, sum(amount)
from master-table
where (time...)
group by location
However some "location" values are dropped because sum(amount) is 0 for those "location"s but I really want to have full list of those "location"s for my further steps.
Alternative would be to use HAVING clause but from what I understand HAVING is impossible here because i filter on "time" while grouping on "location" and I would need to add "time" in grouping which destroys the purpose. Keep in mind that the goal here is to get sum(amount) in each "location"
select location, sum(amount)
from master-table
group by location, time
having (time...)
To view the output:
with the first code I get
loc1, 5
loc3, 10
loc6, 1
but I want to get
loc1, 5
loc2, 0
loc3, 10
loc4, 0
loc5, 0
loc6, 1
Any suggestions on what can be done with this structure of master-table? Alternative solution to which I have no idea how to code would be to add numbers from the first query result to location-table (as a query, not actual table) with the final result query that I've posted above.
What you want will require a complete list of locations, then a left-outer join using that table and your calculated values, and IsNull (for tsql) to ensure you see the 0s you expect. You can do this with some CTEs, which I find valuable for clarity during development, or you can work on "putting it all together" in a more traditional SELECT...FROM... statement. The CTE approach might look like this:
WITH loc AS (
SELECT DISTINCT LocationID
FROM location_table
), summary_data as (
SELECT LocationID, SUM(amount) AS location_sum
FROM master-table
GROUP BY LocationID
)
SELECT loc.LocationID, IsNull(location_sum,0) AS location_sum
FROM loc
LEFT OUTER JOIN summary_data ON loc.LocationID = summary_data.LocationID
See if that gets you a step or two closer to the results you're looking for.
I can think of 2 options:
You could move the WHERE to a CASE WHEN construction:
-- Option 1
select
location,
sum(CASE WHEN time <'16:00' THEN amount ELSE 0 END)
from master_table
group by location
Or you could JOIN with the possible values of location (which is my first ever RIGHT JOIN in a very long time 😉):
-- Option 2
select
x.location,
sum(CASE WHEN m.time <'16:00' THEN m.amount ELSE 0 END)
from master_table m
right join (select distinct location from master_table) x ON x.location = m.location
group by x.location
see: DBFIDDLE
The version using T-SQL without CTEs would be:
SELECT l.location ,
ISNULL(m.location_sum, 0) as location_sum
FROM master-table l
LEFT JOIN (
SELECT location,
SUM(amount) as location_sum
FROM master-table
WHERE (time ... )
GROUP BY location
) m ON l.location = m.location
This assumes that you still have your initial UNION in place that ensures that master-table has all possible locations included.
It is the where clause that excludes some locations. To ensure you retain every location you could introduce "conditional aggregation" instead of using the where clause: e.g.
select location, sum(case when (time...) then amount else 0 end) as location_sum
from master-table
group by location
i.e. instead of excluding some rows from the result, place the conditions inside the sum function that equate to the conditions you would have used in the where clause. If those conditions are true, then it will aggregate the amount, but if the conditions evaluate to false then 0 is summed, but the location is retained in the result.
Dataset looking at the types of crime for a given city.
Incident ID
Incident Code
Incident Category
Incident Subcategory
Incident Description
618691
4134
Assault
Simple Assault
Battery
618691
15300
Offences Against The Family And Children
Other
Hate Crime (secondary only)
618701
7053
Vehicle Impounded
Vehicle Impounded
Vehicle, Impounded
618701
65010
Traffic Violation Arrest
Traffic Violation Arrest
Traffic Violation Arrest
618701
65050
Other Miscellaneous
Other
Driving While Under The Influence Of Alcohol
626010
5043
Burglary
Burglary - Residential
Burglary, Residence, Unlawful Entry
626010
6381
Larceny Theft
Larceny Theft - Other
Embezzlement from Dependent or Elder Adult by Caretaker
626010
7041
Recovered Vehicle
Recovered Vehicle
Vehicle, Recovered, Auto
626010
16650
Drug Offense
Drug Violation
Methamphetamine Offense
Each IncidentID has 2, 3, or 4 Incident Codes associated with it.
I want to be able to count the number of times each combination of 2, 3, or 4 Incident Codes appears in the entire dataset.
For example:
Incident Codes 4134, 15300: x amount of times
Incident Codes 7053, 65010, 65050: x amount of times
Incident Codes 5043, 6381, 7041, 16650: x amount of times
I apologize if I've given a poor explanation - this is my first post on SO and quite frankly I don't know how to best communicate this question.
I don't know what SQL code to run to get my answer. The closest I've come to finding an answer is this post, Select combination of two columns, and count occurrences of this combination, but it already has the data separated into two columns, which my data is not there.
My thought is to split the additional codes into other columns, but perhaps there is a way to avoid doing that by having the code run the calculation for me without it.
I appreciate any and all input you may be able to give!
Let's suppose your table is named "TableX". I think this query should be near to what you need:
Select T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode, Count(1) AS AmountOfTimes
From TableX T1
Join TableX T2 ON T2.IncidentID = T1.IncidentID AND
T2.IncidentCode <> T1.IncidentCode
Left Join TableX T3 ON T3.IncidentID = T1.IncidentID AND
T3.IncidentCode <> T1.IncidentCode AND
T3.IncidentCode <> T2.IncidentCode
Left Join TableX T4 ON T4.IncidentID = T1.IncidentID AND
T4.IncidentCode <> T1.IncidentCode AND
T4.IncidentCode <> T2.IncidentCode AND
T4.IncidentCode <> T3.IncidentCode
Group By T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode
You would probably be best to try and NOT get all 3 parts in one query and here is why. Lets say for example that one officer enters their data as codes 1, 2, 3. Another enters codes as 3, 1, 2, and yet another enters as 2, 3, 1. They are all the same "set" of codes just in different order. If you rely on just being the first being the same, you would be getting 3 different rows showing the same thing each with 1 count.
You would be better served by running 3 distinct queries with a WHERE and HAVING clause based on just the codes you are interested in the "set". Something simple like
select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2
This will return all incidents that have BOTH parts, even if the incident was associated with any 3rd and/or 4th additional codes in a given incident. Having the total records IS your count.
So, now, take your codes of interest ex: 1 & 2, and you have the possibility of 2 more incident codes per incident, and you add an additional 30+ combinations of codes 3 & 4 into the mix. If you dont care about the others that may be "extra", it does not screw up your count on the precise piece(s) you are looking for.
Then, all you have to do to get your other "what if" scenario counts is change your IN clause once and the having to match the count. Since you are only filtering based on the specific codes in question, you only want those that have the same count regardless of extra incident codes per example stated.
YT.IncidentCode in ( 7053, 65010, 65050 )
group by
YT.IncidentID
having
count(*) = 3
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
group by
YT.IncidentID
having
count(*) = 4
Now, if you only really care about the final count of each respectively, just wrap that up one more to get the count of rows returned such as
select
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
Then, if you wanted to do this on some time period basis such as you have a given date of the incident, and you wanted to keep running the same query / counts, you could expand and do something like this by doing a UNION to each query.
select
'Assault and Offenses against Family and Children' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
UNION
select
'Vehicle Impound, Traffic Arrest, Other Misc' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 7053, 65010, 65050 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 3 ) PreQualified
UNION
select
'Burglary, Theft, Drugs and Vehicle Recovery' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 4 ) PreQualified
Notice each query in the UNION returns the same number, and order of columns. So it will just return a list (in this case) of 3 rows with a description and count per category regardless of the physical order the incident codes were entered, even IF they were entered in the 3rd and 4th when only looking for 2 code possibilities.
Sometimes a generic query (as in the left-join sample) is ok, and nothing wrong with it, but ask yourself the flexibility and do you want to drill into each permutation just to get your final result numbers.
I'm working on this query:
SELECT s.studentname,
Avg(cs.exam_season_one
+ cs.exam_season_two
+ cs.degree_season_one
+ cs.degree_season_two ) / 4 AS average
FROM courses_student cs
join students s
ON s.student_id = cs.student_id
join SECTION se
ON s.sectionid = se.sectionid
WHERE cs.courses_id = 1
AND ( se.classes_id = 2
OR se.classes_id = 5 )
AND s.studentname LIKE 'm%'
GROUP BY s.studentname
until this moment everything works perfectly but I need to add a last condition and I dont know how.
I need to get the sudents with the same average
I mean only students with count(average) > 1 (idk if this is right)
anyone knows how to solve this problem in this query?
PS: I use oracle.
Edit:
The create tables statements and sample data are here:
http://sqlfiddle.com/#!4/ebf636
trying to explain the problem more because maybe I did it the wrong way the first time!
First, the average is the average of 4 columns
The output I expect is to get the names of the students who belong to class 2 or class 5 (classes_id = 2, classes_id = 5), also their name should start with M
I want to check their average in a specific course (course_id = 1)
and the last condition I'm asking about is that I want to get the students who only have the same average in this course.
for example:
if we have 4 students and the averages in the course are (60,70,80,80) then I want to get only the last 2 student names because they have the same average. hope it's clear now!
You appear to be asking for the students where the average of the averages of their exam and degree seasons 1 and 2 are greater than 1 for certain sections.
Without sample data and expected output to validate against, its difficult to answer but, for each student, you want to GROUP BY the primary key that uniquely identifies the student (otherwise you may aggregate two students with the same name together) and you only need to check if the section exists (rather than JOINing the section as that would create duplicate rows if there are multiple matching sections and skew the averages).
Then
if we have 4 students and the averages in the course are (60,70,80,80) then I want to get only the last 2 student names because they have the same average
You want to count how many students have the same average and then filter out those with unique averages:
SELECT studentname,
average
FROM (
SELECT a.*,
COUNT(*) OVER (PARTITION BY average) AS num_with_average
FROM (
SELECT MAX(s.studentname) AS studentname,
( Avg(cs.exam_season_one)
+ Avg(cs.exam_season_two)
+ Avg(cs.degree_season_one)
+ Avg(cs.degree_season_two) ) / 4 AS average
FROM courses_student cs
join students s
ON s.student_id = cs.student_id
WHERE cs.courses_id = 1
AND s.studentname LIKE 'm%'
AND EXISTS(
SELECT 1
FROM section se
WHERE s.sectionid = se.sectionid
AND se.classes_id IN (2, 5)
)
GROUP BY
s.student_id
) a
)
WHERE num_with_average > 1;
db<>fiddle here
I'm trying to add the counts together and output the one with the max counts.
The question is: Display the person with the most medals (gold as place = 1, silver as place = 2, bronze as place = 3)
Add all the medals together and display the person with the most medals
Below is the code I have thought about (obviously doesn't work)
Any ideas?
Select cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
having max (count(re.place = 1) + count(re.place = 2) + count(re.place = 3))
Sorry forgot to add that were not allowed to use ORDER BY.
Some data in the table
Competitors Table
Competitornum GivenName Familyname gender Dateofbirth Countrycode
219153 Imri Daniel Male 1988-02-02 Aus
Results Table
Eventid Competitornum Place Lane Elapsedtime
SWM111 219153 1 2 20 02
From what you've described it sounds like you just need to take the "Top" individual in the total medal count. In order to do that you would write something like this.
Select top 1 cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
order by count(*) desc
Without using order by you have a couple of other options though I'm glossing over whatever syntax peculiarities sqlfire may use.
You could determine the max medal count of any user and then only select competitors that have that count. You could do this by saving it out to a variable or using a subquery.
Select cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
having count(*) = (
Select max( count(*) )
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
)
Just a note here. This second method is highly inefficient because we recalculate the max medal count for every row in the parent table. If sqlfire supports it you would be much better served by calculating this ahead of time, storing it in a variable and using that in the HAVING clause.
You are grouping by re.place, is that what you want? You want the results per ... ? :)
[edit] Good, now that's fixed you're almost there :)
The having is not needed in this case, you simply need to add a count(re.EventID) to your select and make a subquery out of it with a max(that_count_column).