Find rows with fewer than X associations (including 0)

Find rows with fewer than X associations (including 0) - sql

I have students associated to schools, and want to find all schools that have five or fewer (including zero) students that have has_mohawk = false.
Here's an Activerecord query:
School.joins(:students)
.group(:id)
.having("count(students.id) < 5")
.where(students: {has_mohawk: true})
This works for schools with 1 - 4 such students with mohawks, but omits schools where there are no such students!
I figured out a working solution and will post it. But I am interested in a more elegant solution.
Using Rails 5. I'm also curious whether Rails 6's missing could handle this?

find all schools that have five or fewer (including zero) students that have has_mohawk = false.
Here is an optimized SQL solution. SQL is what it comes down to in any case. (ORMs like Active Record are limited in their capabilities.)
SELECT sc.*
FROM schools sc
LEFT JOIN (
SELECT school_id
FROM students
WHERE has_mohawk = false
GROUP BY 1
HAVING count(*) >= 5
) st ON st.school_id = sc.id
WHERE st.school_id IS NULL; -- "not disqualified"
While involving all rows, aggregate before joining. That's faster.
This query takes the reverse approach by excluding schools with 5 or more qualifying students. The rest is your result - incl. schools with 0 qualifying students. See:
Select rows which are not present in other table
Any B-tree index on students (school_id) can support this query, but this partial multicolumn index would be perfect:
CREATE INDEX ON students (school_id) WHERE has_mohawk = false;
If there can be many students per school, this is faster:
SELECT sc.*
FROM schools sc
JOIN LATERAL (
SELECT count(*) < 5 AS qualifies
FROM (
SELECT -- select list can be empty (cheapest)
FROM students st
WHERE st.school_id = sc.id
AND st.has_mohawk = false
LIMIT 5 -- !
) st1
) st2 ON st2.qualifies;
The point is not to count all qualifying students, but stop looking once we found 5. Since the join to the LATERAL subquery with an aggregate function always returns a row (as opposed to the join in the first query), schools without qualifying students are kept in the loop, and we don't need the reverse approach.
About LATERAL:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

In addition to the first query, write another to find schools where no students have mohawks (works in Rails 5).
School.left_outer_joins(:students)
.group(:id)
.having("max(has_mohawk::Integer) = 0")
You might think from this popular answer that you could instead just write:
School.left_outer_joins(:students)
.group(:id)
.where.not(student: {has_mohawk: true})
But that will include (at least in Rails 5) schools where there is any student with a has_mohawk value of false, even if some students have a has_mohawk value of true.
Explanation of max(has_mohawk::Integer) = 0
It converts the boolean to an integer (1 for true, 0 for false). Schools with any true values will have a max of 1, and can thus be filtered out.
Similiar: SQL: Select records where ALL joined records satisfy some condition

Related

Oracle sql group function issues

Find Melbourne VIP level 4 customers’ first name, last name who have hired the vehicle model as “Ranger ” at least 2 times in database. You write three different queries: one is using operator EXISTS and the other one is using operator IN. The third query with the main filter criteria in FROM clause of the main query or filter criteria in the sub-query. Find one with the better performance.
I Have tried this query;
SELECT c_fname, c_fname FROM rental WHERE
EXISTS(SELECT c_id FROM customer WHERE c_city = 'Melbourne' AND customer.vip_level = '4')
AND EXISTS (SELECT vehicle_reg FROM vehicle WHERE v_model = 'Ranger')
HAVING COUNT(c_id)>=2 GROUP BY c_lname, c_fname;
I am getting error: SQL Error: ORA-00934: group function is not allowed here
00934. 00000 - "group function is not allowed here"
can anyone help me with this question. really struggled to get this done?

You are selecting from the wrong subject table as Rental does not have c_fname or c_lname columns.
You want to "Find Melbourne VIP level 4 customers’ first name, last name" which would be in the customer table:
SELECT c_fname,
c_lname
FROM customer
WHERE c_city = 'Melbourne'
AND vip_level = 4;
Then you want to add an additional filter "who have hired the vehicle model as “Ranger ” at least 2 times in database". That requires you to use EXISTS (for the answer for the first query) and you need to correlate between the outer-query and the sub-query; once you have done that then you do not need a GROUP BY clause and you are aggregating over the entire result set of the sub-query and can just use a HAVING clause.
SELECT c_fname,
c_lname
FROM customer c
WHERE c_city = 'Melbourne'
AND vip_level = 4
AND EXISTS(
SELECT 1
FROM rental r
INNER JOIN vehicle v
ON (r.vehicle_reg = v.vehicle_reg)
WHERE c.c_id = r.c_id
AND v.v_model = 'Ranger'
HAVING COUNT(*) >= 2
);
Then you need to write the same query using IN instead of EXISTS and the same query a third time using JOIN conditions instead of IN or EXISTS and, finally, you need to compare the performance of all three queries.

How to select all rows that are related to satisfied condition in ms ACCESS

I have the following data in table VehiclesAccSummary:
I am not sure how to do this but what I want is a query that would return all rows where there where only two car in the accident and it was a head on crash so in both cars PointOfImpact was 'Front' i know in need to do some inner join on the same table but i don' want to join same car with it self. Any idea ?
The end result should be something like this

You can use a subquery to count the number of "Front' records for each AccRef.
SELECT *
FROM VehiclesAccSummary INNER JOIN
(SELECT VehiclesAccSummary.AccRef, Count(VehiclesAccSummary.PointOfImpact) AS CountOfPointOfImpact FROM VehiclesAccSummary GROUP BY VehiclesAccSummary.AccRef, VehiclesAccSummary.PointOfImpact HAVING VehiclesAccSummary.PointOfImpact="Front") AS FrontCount
ON VehiclesAccSummary.AccRef = FrontCount.AccRef
WHERE VehiclesAccSummary.NumCars = 2 AND CountOfPointOfImpact = 2;
This will limit the records to AccRefs with NumCar = 2 and the count of Front records = 2.
Edit: Since the same car can be listed in multiple records, we need a new approach. Try this:
SELECT VehiclesAccSummary.*, Subquery.CountOfPointOfImpact
FROM VehiclesAccSummary LEFT JOIN (SELECT VehiclesAccSummary.AccRef, Count(VehiclesAccSummary.PointOfImpact) AS CountOfPointOfImpact
FROM VehiclesAccSummary
WHERE (((VehiclesAccSummary.PointOfImpact)<>"Front"))
GROUP BY VehiclesAccSummary.AccRef) AS Subquery ON VehiclesAccSummary.AccRef = Subquery.AccRef
WHERE (((Subquery.CountOfPointOfImpact) Is Null));
Instead of confirming that the count of front accidents is 2, this confirms that the count of non-front accidents is 0.

I get the same object twice

I am trying to get all the lessons of the students that have a grade that contains a certain term.
The orange relations are the relevant relations:
The query:
SELECT
tg.nhsColor AS cellColor,
tg.nhsTgradeName AS LessonName,
lsons.nhsLessonID AS LessonID,
lsons.nhsTgradeID AS TgradeID,
lsons.nhsDay AS nhsDay,
lsons.nhsHour AS nhsHour,
tg.nhsTeacherID AS TeacherID
FROM
nhsTeacherGrades AS tg,
nhsLessons AS lsons,
nhsLearnGroups,
nhsMembers AS mem,
nhsGrades AS grd
WHERE
tg.nhsTgradeID = lsons.nhsTgradeID
AND nhsLearnGroups.nhsTgradeID = tg.nhsTgradeID
AND mem.nhsUserID = nhsLearnGroups.nhsStudentID
AND mem.nhsGradeID = grd.nhsGradeID
AND grd.nhsGradeName LIKE '%"+gradePart+"%'
The query works, yet, i get the same lesson twice from this query.

You can get duplicates for at least two reasons:
the same lessons can occur in different teacher grades followed by a certain student
different students can follow the same teacher grade
The following (untested) nested SQL could solve this. It gets the teacher grade ID of each lesson and checks which of these have at least one viable student linked to it:
SELECT tg.nhsColor AS cellColor,
tg.nhsTgradeName AS LessonName,
lsons.nhsLessonID AS LessonID,
lsons.nhsTgradeID AS TgradeID,
lsons.nhsDay AS nhsDay,
lsons.nhsHour AS nhsHour,
tg.nhsTeacherID AS TeacherID
FROM nhsLessons AS lsons
INNER JOIN nhsTeacherGrades AS tg
ON tg.nhsTgradeID = lsons.nhsTgradeID
WHERE tg.nhsTgradeID IN (
SELECT grp.nhsTgradeID
FROM (nhsLearnGroups grp
INNER JOIN nhsMembers AS mem
ON mem.nhsUserID = grp.nhsStudentID)
INNER JOIN nhsGrades AS grd
ON mem.nhsGradeID = grd.nhsGradeID
WHERE grd.nhsGradeName LIKE '%"+gradePart+"%'
)
Note that I used the JOIN syntax, which is considered better practice than placing join conditions in the WHERE clause. MS Access is quite pesky about using parentheses in the JOIN clauses, so you might need to play with those a bit to make it work.

Alternative to using GROUP BY without aggregates to retrieve distinct "best" result

I'm trying to retrieve the "Best" possible entry from an SQL table.
Consider a table containing tv shows:
id, title, episode, is_hidef, is_verified
eg:
id title ep hidef verified
1 The Simpsons 1 True False
2 The Simpsons 1 True True
3 The Simpsons 1 True True
4 The Simpsons 2 False False
5 The Simpsons 2 True False
There may be duplicate rows for a single title and episode which may or may not have different values for the boolean fields. There may be more columns containing additional info, but thats unimportant.
I want a result set that gives me the best row (so is_hidef and is_verified are both "true" where possible) for each episode. For rows considered "equal" I want the most recent row (natural ordering, or order by an abitrary datetime column).
3 The Simpsons 1 True True
5 The Simpsons 2 True False
In the past I would have used the following query:
SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified
This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requiring aggragates etc etc). I'm not really interested in hearing again why MySQL is so bad for allowing this; but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the django ORM code for it).
Thanks =)

In some way similar to Andomar's but this one really works.
select C.*
FROM
(
select min(ID) minid
from (
select distinct title, ep, max(hidef*1 + verified*1) ord
from tbl
group by title, ep) a
inner join tbl b on b.title=a.title and b.ep=a.ep and b.hidef*1 + b.verified*1 = a.ord
group by a.title, a.ep, a.ord
) D inner join tbl C on D.minid = C.id
The first level tiebreak converts bits (SQL Server) or MySQL boolean to an integer value using *1, and the columns are added to produce the "best" value. You can give them weights, e.g. if hidef > verified, then use hidef*2 + verified*1 which can produce 3,2,1 or 0.
The 2nd level looks among those of the "best" scenario and extracts the minimum ID (or some other tie-break column). This is essential to reduce a multi-match result set to just one record.
In this particular case (table schema), the outer select uses the direct key to retrieve the matched records.

This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a SQL standard compliant solution. A solution like this would perform nicely:
SELECT s2.id
, s2.title
, s2.episode
, s2.is_hidef
, s2.is_verified
FROM (
select distinct title
, episode
from shows
where title = 'The Simpsons'
) s1
JOIN shows s2
ON s2.id =
(
select id
from shows s3
where s3.title = s1.title
and s3.episode = s1.episode
order by
s3.is_hidef DESC
, s3.is_verified DESC
limit 1
)
But given the cost of readability, I would stick with your original query.

MySQL Join from multiple options to select one value

I am putting together a nice little database for adding values to options, all these are setup through a map (Has and Belongs to Many) table, because many options are pointing to a single value.
So I am trying to specify 3 option.ids and a single id in a value table - four integers to point to a single value. Three tables. And I am running into a problem with the WHERE part of the statement, because if multiple values share an option there are many results. And I need just a single result.
SELECT value.id, value.name FROM value
LEFT JOIN (option_map_value, option_table)
ON (value.id = option_map_value.value_id AND option_map_value.option_table_id = option_table.id)
WHERE option_table.id IN (5, 2, 3) AND value.y_axis_id = 16;
The problem with the statement seems to be the IN on the WHERE clause. If one of the numbers are different in the IN() part, then there are multiple results - which is not good.
I have tried DISTINCT, which again works if there is one result, but returns many if there is many. The closest we have gotten to is adding a count - to return to value with the most options at the top.
So is there a way to do the WHERE to be more specific. I cannot break it out into option_table.id = 5 AND option_table.id = 2 - because that one fails. But can the WHERE clause be more specifc?
Maybe it is me being pedantic, but I would like to be able to return just the single result, instead of a count of results... Any ideas?

The problem with the statement seems to be the IN on the WHERE clause. If one of the numbers are different in the IN() part, then there are multiple results - which is not good. I have tried DISTINCT, which again works if there is one result, but returns many if there is many. The closest we have gotten to is adding a count - to return to value with the most options at the top.
You were very close, considering the DISTINCT:
SELECT v.id,
v.name
FROM VALUE v
LEFT JOIN OPTION_MAP_VALUE omv ON omv.value_id = v.id
LEFT JOIN OPTION_TABLE ot ON ot.id = omv.option_table_id
WHERE ot.id IN (5, 2, 3)
AND v.y_axis_id = 16
GROUP BY v.id, v.name
HAVING COUNT(*) = 3
You were on the right track, but needed to use GROUP BY instead in order to be able to use the HAVING clause to count the DISTINCT list of values.
Caveat emptor:
The GROUP BY/HAVING COUNT version of the query is dependent on your data model having a composite key, unique or primary, defined for the two columns involved (value_id and option_table_id). If this is not in place, the database will not stop duplicates being added. If duplicate rows are possible in the data, this version can return false positives because a value_id could have 3 associations to the option_table_id 5 - which would satisfy the HAVING COUNT(*) = 3.
Using JOINs:
A safer, though more involved, approach is to join onto the table that can have multiple options, as often as you have criteria:
SELECT v.id,
v.name
FROM VALUE v
JOIN OPTION_MAP_VALUE omv ON omv.value_id = v.id
JOIN OPTION_TABLE ot5 ON ot5.id = omv.option_table_id
AND ot5.id = 5
JOIN OPTION_TABLE ot2 ON ot2.id = omv.option_table_id
AND ot2.id = 2
JOIN OPTION_TABLE ot3 ON ot3.id = omv.option_table_id
AND ot3.id = 3
WHERE v.y_axis_id = 16
GROUP BY v.id, v.name

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Find rows with fewer than X associations (including 0) - sql

Related

Oracle sql group function issues

How to select all rows that are related to satisfied condition in ms ACCESS

I get the same object twice

Alternative to using GROUP BY without aggregates to retrieve distinct "best" result

MySQL Join from multiple options to select one value

Categories

Resources