Cannot find correct number of values in a table that are not in another table, though I can do otherwise - sql

I want to retrieve the course_id in table course that is not in the table takes. Table takes only contains course_id of courses taken by students. The problem is that if I have:
select count (distinct course.course_id)
from course, takes
where course.course_id = (takes.course_id);
the result is 85 which is smaller than the total number of course_id in table course, which is 200. The result is correct.
But I want to find the number of course_id that are not in the table takes, and I have:
select count (distinct course.course_id)
from course, takes
where course.course_id != (takes.course_id);
The result is 200, which is equal to the number of course_id in table course. What is wrong with my code?

This SQL will give you the count of course_id in table course that aren't in the table takes:
select count (*)
from course c
where not exists (select *
from takes t
where c.course_id = t.course_id);
You didn't specify your DBMS, however, this SQL is pretty standard so it should work in the popular DBMSs.
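To see why the != version returns every course, here is a minimal sketch using Python's sqlite3 module. The three courses and three takes rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical miniature data: three courses, two of which appear in takes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT PRIMARY KEY);
    CREATE TABLE takes  (id TEXT, course_id TEXT);
    INSERT INTO course VALUES ('C1'), ('C2'), ('C3');
    INSERT INTO takes  VALUES ('s1', 'C1'), ('s2', 'C1'), ('s1', 'C2');
""")

# The != comma join pairs each course with every takes row that has a
# different course_id, so a course survives whenever takes holds at least
# one row for some other course.
wrong = conn.execute("""
    SELECT COUNT(DISTINCT course.course_id)
    FROM course, takes
    WHERE course.course_id != takes.course_id
""").fetchone()[0]

# NOT EXISTS asks, per course, whether any matching takes row exists at all.
right = conn.execute("""
    SELECT COUNT(*)
    FROM course c
    WHERE NOT EXISTS (SELECT 1 FROM takes t WHERE t.course_id = c.course_id)
""").fetchone()[0]

print(wrong, right)
```

With this data the != query counts all three courses, while NOT EXISTS counts only the one course nobody takes.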

There are a few different ways to accomplish what you're looking for. My personal favorite is the LEFT JOIN condition. Let me walk you through it:
Fact One: You want to return a list of courses
Fact Two: You want to filter that list to not include anything in the Takes table.
I'd go about this by first mentally selecting a list of courses:
SELECT c.Course_ID
FROM Course c
and then filtering out the ones I don't want. One way to do this is to use a LEFT JOIN to get all the rows from the first table, along with any that happen to match in the second table, and then filter out the rows that actually do match, like so:
SELECT c.Course_ID
FROM
Course c
LEFT JOIN -- note the syntax: 'comma joins' are a bad idea.
Takes t ON
c.Course_ID = t.Course_ID -- at this point, you have all courses
WHERE t.Course_ID IS NULL -- this phrase means that none of the matching records will be returned.
Another note: as mentioned above, comma joins should be avoided. Instead, use the syntax I demonstrated above (INNER JOIN or LEFT JOIN, followed by the table name and an ON condition).
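The LEFT JOIN plus IS NULL pattern can be tried out on a toy data set. This is only a sketch using Python's sqlite3 module; the four courses and the takes rows are invented, and the student id column is an assumption.

```python
import sqlite3

# Hypothetical data: courses C1..C4, of which only C1 and C2 are taken.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT PRIMARY KEY);
    CREATE TABLE takes  (id TEXT, course_id TEXT);
    INSERT INTO course VALUES ('C1'), ('C2'), ('C3'), ('C4');
    INSERT INTO takes  VALUES ('s1', 'C1'), ('s2', 'C1'), ('s1', 'C2');
""")

# Step 1: the LEFT JOIN keeps every course; unmatched courses get NULLs
# in the takes columns.
joined = conn.execute("""
    SELECT c.course_id, t.course_id
    FROM course c
    LEFT JOIN takes t ON c.course_id = t.course_id
    ORDER BY c.course_id
""").fetchall()

# Step 2: keeping only the NULL rows leaves the courses nobody takes.
untaken = [row[0] for row in conn.execute("""
    SELECT c.course_id
    FROM course c
    LEFT JOIN takes t ON c.course_id = t.course_id
    WHERE t.course_id IS NULL
    ORDER BY c.course_id
""")]

print(joined)
print(untaken)
```

The unfiltered join has five rows (C1 twice, C2 once, C3 and C4 once each with NULL), and the filter keeps only C3 and C4.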

Related

SQL - subquery returning more than 1 value

What my issue is:
I am constantly returning multiple values when I don't expect to. I am attempting to get a specific climate, determined by the state, county, and country.
What I've tried:
The code given below. I am unsure as to what is wrong with it specifically. I do know that it is returning multiple values. But why? I specify that STATE_ABBREVIATION = PROV_TERR_STATE_LOC and with the inner joins that I do, shouldn't that create rows that are similar except for their different CLIMATE_IDs?
SELECT
...<code>...
(SELECT locations.CLIMATE_ID
FROM REF_CLIMATE_LOCATION locations, SED_BANK_TST.dbo.STATIONS stations
INNER JOIN REF_STATE states ON STATE_ID = states.STATE_ID
INNER JOIN REF_COUNTY counties ON COUNTY_ID = counties.COUNTY_ID
INNER JOIN REF_COUNTRY countries ON COUNTRY_ID = countries.COUNTRY_ID
WHERE STATE_ABBREVIATION = PROV_TERR_STATE_LOC) AS CLIMATE_ID
...<more code>...
FROM SED_BANK_TST.dbo.STATIONS stations
I've been at this for hours, looking up different questions on SO, but I cannot figure out how to make this subquery return a single value.
All those inner joins don't reduce the result set if the IDs you're testing exist in the REF tables. Apart from that you're doing a Cartesian product between locations and stations (which may be an old fashioned inner join because of the where clause).
You'll only get a single row if you only have a single row in the locations table that matches a single row in the stations table under the condition that STATE_ABBREVIATION = PROV_TERR_STATE_LOC
Your JOINs show a hierarchy of locations: Country->State->County, but your WHERE clause only limits by the state abbreviation. By joining the county you'll get one record for every county in that state. You CAN limit your results by taking the TOP 1 of the results, but you need to be very careful that that's really what you want. If you're looking for a specific county, you'll need to include that in the WHERE clause. You get some control with the TOP 1 in that it will give the top 1 based on an ORDER BY clause. I.e., if you want the most recently added, use:
SELECT TOP 1 [whatever] ORDER BY [DateCreated] DESC;
For your subquery, you can do something like this:
SELECT TOP 1
locations.CLIMATE_ID
FROM REF_CLIMATE_LOCATION locations ,
SED_BANK_TST.dbo.STATIONS stations
INNER JOIN REF_STATE states ON STATE_ID = states.STATE_ID
INNER JOIN REF_COUNTY counties ON COUNTY_ID = counties.COUNTY_ID
INNER JOIN REF_COUNTRY countries ON COUNTRY_ID = countries.COUNTRY_ID
WHERE STATE_ABBREVIATION = PROV_TERR_STATE_LOC
Just be sure to either add an ORDER BY at the end or be okay with it choosing the TOP 1 based on the "natural order" on the tables.
If you expect a single value from your sub-query, you probably need to use DISTINCT. The best way to check is to run the sub-query on its own and look at the result. Adding other columns from the tables you used can help you see what makes the result have multiple rows.
You can also use MAX(), MIN(), or TOP 1 to get a single value from the sub-query, but which one is right depends on the logic you want for locations.CLIMATE_ID. You need to answer the question, "How is it related to the rest of the columns retrieved?"
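The point about filtering on state only can be illustrated with a deliberately simplified, hypothetical schema (these are not the asker's real tables or columns). The sketch uses Python's sqlite3 module, so it counts rows rather than reproducing SQL Server's error; SQLite also has no TOP, which is why MIN() stands in for the "collapse to one value" approach.

```python
import sqlite3

# Hypothetical, simplified schema: one climate per (state, county).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE climate_location (state TEXT, county TEXT, climate_id INTEGER);
    CREATE TABLE stations (station_id INTEGER, state TEXT, county TEXT);
    INSERT INTO climate_location VALUES
        ('CA', 'Inyo', 1), ('CA', 'Marin', 2), ('NV', 'Clark', 3);
    INSERT INTO stations VALUES (10, 'CA', 'Marin');
""")

# Matching on state alone returns one row per county in that state.
by_state = conn.execute("""
    SELECT l.climate_id
    FROM climate_location l
    JOIN stations s ON s.state = l.state
    WHERE s.station_id = 10
    ORDER BY l.climate_id
""").fetchall()

# Matching on state and county narrows the result to the intended row.
by_state_and_county = conn.execute("""
    SELECT l.climate_id
    FROM climate_location l
    JOIN stations s ON s.state = l.state AND s.county = l.county
    WHERE s.station_id = 10
""").fetchall()

# Collapsing with MIN() does return one value, but not necessarily the right one.
collapsed = conn.execute("""
    SELECT MIN(l.climate_id)
    FROM climate_location l
    JOIN stations s ON s.state = l.state
    WHERE s.station_id = 10
""").fetchone()[0]

print(by_state, by_state_and_county, collapsed)
```

Here MIN() picks climate 1, which belongs to a different county than the station's, while the tighter join condition picks climate 2. That is the risk with TOP 1, MIN(), or MAX() when the real issue is a missing join condition.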

How to select one field based on another field, and then subtract any with more than 1 result?

Ok, so I have been looking everywhere and I can't seem to find anything.
Here's what I'm trying to do:
SELECT SFirstName, SLastName
FROM Advisors
WHERE Students <= ('1') and Students.AdvisorID=Advisors.AdvisorID;
SName holds the student names, and I basically need to list advisors based on the number of active students for each, and then filter out any with more than one student.
Here's an attempt to code this with the DBO names and all.
SELECT AFirstName, ALastName
FROM DBO.Students, DBO.Advisors
WHERE Advisor.AdvisorID=Student.AdvisorID AND >1;
Basically the AdvisorID is a foreign key within the Students table, and I need to match it with the Advisors table and then apply the >1 condition. I don't know how to reduce results by number yet, however.
Whenever I try this it tells me it cannot find Advisor.AdvisorID or Student.AdvisorID. How would I do this while still using the foreign key from one table and the primary key from the other to cross-check matches?
If I understand your question correctly, you want to list the advisors with at most one student. If so, what you need is a LEFT JOIN coupled with COUNT and HAVING. Start from Advisors so that advisors with no students are kept:
SELECT
a.AFirstName, a.ALastName
FROM dbo.Advisors a
LEFT JOIN dbo.Students s
ON s.AdvisorID = a.AdvisorID
GROUP BY
a.AdvisorID, a.AFirstName, a.ALastName
HAVING
COUNT(s.AdvisorID) <= 1
Notes:
Avoid using the old-style JOIN syntax.
Alias your tables to improve readability.
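A small runnable sketch of the COUNT plus HAVING idea, using Python's sqlite3 module. The StudentID column and all the sample rows are assumptions made for the example. Note that it puts Advisors on the left side of the LEFT JOIN, so an advisor with no students still appears with a count of 0.

```python
import sqlite3

# Hypothetical data: Ann advises two students, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Advisors (
        AdvisorID  INTEGER PRIMARY KEY,
        AFirstName TEXT,
        ALastName  TEXT
    );
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,
        AdvisorID INTEGER REFERENCES Advisors(AdvisorID)
    );
    INSERT INTO Advisors VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Orr');
    INSERT INTO Students VALUES (100, 1), (101, 1), (102, 2);
""")

# COUNT(s.StudentID) ignores the NULLs produced for advisors without
# students, so those advisors get a count of 0 rather than 1.
rows = conn.execute("""
    SELECT a.AFirstName, a.ALastName, COUNT(s.StudentID) AS student_count
    FROM Advisors a
    LEFT JOIN Students s ON s.AdvisorID = a.AdvisorID
    GROUP BY a.AdvisorID, a.AFirstName, a.ALastName
    HAVING COUNT(s.StudentID) <= 1
    ORDER BY a.AdvisorID
""").fetchall()

print(rows)
```

Ann is filtered out because she has two students; Bob (1) and Cy (0) remain.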

How to show count value as 0 on rows removed with WHERE (microsoft access)

I have two tables where one table represent the survey with the location and the other table the people interviewed (there are many people for each survey). I'm trying to show the count of people over a certain age in each location, however some provinces don't have anyone over certain ages therefore don't show in the resulting table. I would like the count to show zero if no one is over a certain age.
I have:
SELECT a.location, Count([b.age])
FROM Survey AS a LEFT JOIN person AS b ON a.surveyid = b.surveyid
Where b.age >= 85
GROUP BY a.location;
I realize that the WHERE clause is what is eliminating the zero count results but I can't figure out the subquery I would need.
Use conditional aggregation instead. That means moving the boolean condition into the argument of the aggregation function:
SELECT s.location,
SUM(IIF(p.age >= 85, 1, 0))
FROM Survey AS s LEFT JOIN
person AS p
ON s.surveyid = p.surveyid
GROUP BY s.location;
Notice that I changed the table aliases to be abbreviations of the table names. This makes the query easier to follow.
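The difference between the WHERE filter and conditional aggregation can be checked on made-up data. This sketch runs in SQLite through Python's sqlite3 module rather than in Access, so it uses CASE WHEN as a portable stand-in for IIF; the personid column and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical data: only the North survey has someone aged 85 or over,
# and the East survey has no interviewed people at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Survey (surveyid INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE person (personid INTEGER PRIMARY KEY, surveyid INTEGER, age INTEGER);
    INSERT INTO Survey VALUES (1, 'North'), (2, 'South'), (3, 'East');
    INSERT INTO person VALUES (1, 1, 90), (2, 1, 40), (3, 2, 30);
""")

# The WHERE clause runs before grouping and discards every row that
# fails the test, so locations without a match vanish.
with_where = conn.execute("""
    SELECT s.location, COUNT(p.age)
    FROM Survey AS s
    LEFT JOIN person AS p ON s.surveyid = p.surveyid
    WHERE p.age >= 85
    GROUP BY s.location
    ORDER BY s.location
""").fetchall()

# Conditional aggregation keeps every row and counts only the matches.
conditional = conn.execute("""
    SELECT s.location,
           SUM(CASE WHEN p.age >= 85 THEN 1 ELSE 0 END)
    FROM Survey AS s
    LEFT JOIN person AS p ON s.surveyid = p.surveyid
    GROUP BY s.location
    ORDER BY s.location
""").fetchall()

print(with_where)
print(conditional)
```

The WHERE version returns only North, while the conditional version lists all three locations with 0 for East and South.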

why results of two queries are different?

select distinct ID, title, takes.course_id
from course join takes
on course.course_id = takes.course_id
where takes.course_id in
(select takes.course_id
from takes
where ID = '10204');
select ID, title, takes.course_id
from course join takes
on course.course_id = takes.course_id
where ID = '10204';
I want to query the course IDs and the titles of the courses that the student whose ID is 10204 takes. The first query gives a result with 5000 rows, which is incorrect. The second gives a correct result. So what is wrong with the first?
The first query gives you data for all students that happen to take a course that 10204 also takes.
Essentially the first query can be read as "Find all courses and the students that take them, for any course that is also taken by student 10204". You can look at the first query as a 3 way join. The results of the subquery select takes.course_id from takes where ID = '10204' would be the "third" table.
Adding to the pile on since everyone seems to be offering bits and pieces, some of which are oddly irate...
The first query says "Give me information on the students and courses where the courses were also taken by student 10204"
The second query says "Give me information on the students and courses taken by student 10204"
You say you wanted to get the course IDs and titles for the courses taken by student 10204, so obviously the second query is the correct one. You don't care about other students that have taken the same courses.
Perhaps, to put it into perspective, rewriting the first (incorrect) query as an explicit join will help:
select distinct takes.ID, title, takes.course_id
from course
join takes
on course.course_id = takes.course_id
join takes as takes2
on takes.course_id = takes2.course_id
WHERE
takes2.ID = '10204';
Well, that could be because in the first query you are filtering on whether the course_id in the takes table matches one of a set of course_ids in that table (WHICH IS NOT NECESSARILY UNIQUE TO ONE STUDENT),
and in the second query you are directly filtering on a single student ID in that table!
Thank you, guys. I think my problem is that I did not realize that other students can take the same courses as the student with ID 10204. That is why, even though the condition only asks for courses taken by student 10204, the result covers those courses for both 10204 and the other students who take them.
Because takes.ID != course.ID. The first you use takes.ID in the where clause but the second you use course.ID
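For anyone who wants to see the two queries diverge, here is a minimal sketch using Python's sqlite3 module. The second student (20000), the course titles, and the rows are all invented; only the table and column names follow the question.

```python
import sqlite3

# Hypothetical data: 10204 takes C1 only; 20000 takes C1 and C2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE takes  (ID TEXT, course_id TEXT);
    INSERT INTO course VALUES ('C1', 'Intro'), ('C2', 'Algebra');
    INSERT INTO takes  VALUES ('10204', 'C1'), ('20000', 'C1'), ('20000', 'C2');
""")

# First query: the IN filter restricts the courses, not the students,
# so every student enrolled in one of 10204's courses comes back.
first = conn.execute("""
    SELECT DISTINCT ID, title, takes.course_id
    FROM course JOIN takes ON course.course_id = takes.course_id
    WHERE takes.course_id IN
          (SELECT takes.course_id FROM takes WHERE ID = '10204')
    ORDER BY ID
""").fetchall()

# Second query: the filter is on the student, so only 10204's rows remain.
second = conn.execute("""
    SELECT ID, title, takes.course_id
    FROM course JOIN takes ON course.course_id = takes.course_id
    WHERE ID = '10204'
""").fetchall()

print(first)
print(second)
```

The first result includes student 20000's enrollment in C1, which is exactly the extra data the asker was seeing at a larger scale.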

Modelling database for a small soccer league

The database is quite simple. Below there is a part of a schema relevant to this question
ROUND (round_id, round_number)
TEAM (team_id, team_name)
MATCH (match_id, match_date, round_id)
OUTCOME (team_id, match_id, score)
I have a problem with query to retrieve data for all matches played. The simple query below gives of course two rows for every match played.
select *
from round r
inner join match m on m.round_id = r.round_id
inner join outcome o on o.match_id = m.match_id
inner join team t on t.team_id = o.team_id
How should I write a query to have the match data in one row?
Or maybe should I redesign the database - drop the OUTCOME table and modify the MATCH table to look like this:
MATCH (match_id, match_date, team_away, team_home, score_away, score_home)?
You can almost generate the suggested change from the original tables using a self join on outcome table:
select o1.team_id team_id_1,
o2.team_id team_id_2,
o1.score score_1,
o2.score score_2,
o1.match_id match_id
from outcome o1
inner join outcome o2 on o1.match_id = o2.match_id and o1.team_id < o2.team_id
Of course, the home and away information cannot be generated from the original tables, so your suggested alternative design might be better after all. Also, take note of the condition o1.team_id < o2.team_id, which gets rid of the redundant symmetric match data (it also stops each outcome row from being joined with itself, which can be seen as the more important aspect).
In any case, using this select as part of your join, you can generate one row per match.
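The effect of the o1.team_id < o2.team_id condition is easy to check on a couple of made-up matches. This is a sketch using Python's sqlite3 module with only the OUTCOME table from the schema; the team ids, match ids, and scores are invented.

```python
import sqlite3

# Hypothetical data: two matches, two outcome rows each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outcome (team_id INTEGER, match_id INTEGER, score INTEGER);
    INSERT INTO outcome VALUES
        (1, 100, 2), (2, 100, 1),
        (3, 101, 0), (4, 101, 3);
""")

# Joining on match_id alone yields 2 x 2 rows per match: each row paired
# with itself plus both orderings of the two teams.
unfiltered = conn.execute("""
    SELECT COUNT(*)
    FROM outcome o1
    INNER JOIN outcome o2 ON o1.match_id = o2.match_id
""").fetchone()[0]

# Adding o1.team_id < o2.team_id keeps exactly one ordering per match.
paired = conn.execute("""
    SELECT o1.match_id, o1.team_id, o2.team_id, o1.score, o2.score
    FROM outcome o1
    INNER JOIN outcome o2
            ON o1.match_id = o2.match_id AND o1.team_id < o2.team_id
    ORDER BY o1.match_id
""").fetchall()

print(unfiltered)
print(paired)
```

Without the extra condition the join produces eight rows for two matches; with it, one row per match remains.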
You fetch two rows for every match played, but team_id and team_name are different in each:
- one for the home team
- one for the away team
So your query is fine.
Using the match table as you describe captures the logic of a game simply and naturally and additionally shows home and away teams which your initial model does not.
You might want to add the round id as a foreign key to round table and perhaps a flag to indicate a match abandoned situation.
Drop outcome. It shouldn't be a separate table, because you have exactly one outcome per match.
You may also want to consider how to handle matches that are cancelled. Perhaps the scores are null?