List every match with the goals scored by each team as shown. This will use "CASE WHEN" which has not been explained in any previous exercises - sql

The question states "List every match with the goals scored by each team as shown below."
This is the result that the question is asking me to show.
I'm quite confused about LEFT JOIN in particular for this problem. Initially, I used this code:
SELECT mdate, team1,
  SUM(CASE WHEN teamid = team1 THEN 1 ELSE 0 END) score1, team2,
  SUM(CASE WHEN teamid = team2 THEN 1 ELSE 0 END) score2
FROM game
JOIN goal ON matchid = id
GROUP BY mdate, team1, team2
However, SQLZOO marks this result as incorrect. So I looked up the answer on the Internet, and it is this:
SELECT mdate, team1,
  SUM(CASE WHEN teamid = team1 THEN 1 ELSE 0 END) score1, team2,
  SUM(CASE WHEN teamid = team2 THEN 1 ELSE 0 END) score2
FROM game
LEFT JOIN goal ON matchid = id
GROUP BY mdate, team1, team2
How did they know which kind of JOIN to use? I know that for the LEFT JOIN it takes all information from the game table and merges to the goal which includes only the matching information to the goal table. The JOIN table will only include information that both tables have in common.
The Database tables

how did they know which kind of JOIN to use
The definition of LEFT JOIN ON is that it returns the rows of (INNER) JOIN ON plus the unmatched rows of the left table extended by NULLs.
(Where a left row is "unmatched" in an INNER JOIN ON when it is not used to form a result row.)
Thus, there is always at least one row output for every left row input. There will be more than one when the associated INNER JOIN outputs more than one row for some left row(s) input. If the LEFT JOIN is ON a condition requiring that a left FK (foreign key) subrow equals its referenced right table subrow then there will be exactly one row output for each row input.
for the LEFT JOIN it takes every information from game table and merge to the goal which includes only the matching information to the goal table. The JOIN table will only include information that both tables have in common.
The terms you are using are just too vague. They don't actually describe what the operators calculate. (In this case or the general case.) "takes information from", "merge to", "includes only the matching information to" and "include information that both tables have in common" might evoke what the operators do if you already know, but they don't clearly describe or define. It is important in technical work to memorize the exact definitions of technical terms and to be able to use those terms only in the right way.
Is there any rule of thumb to construct SQL query from a human-readable description?

Here we use a left join instead of an inner join for the simple reason that the question asks to list all the matches, and a goal is not necessarily scored in every match. Some game ids can have no corresponding matchid in the goal table, which you can check by executing the following query:
select * from game left join goal on id=matchid
where matchid is NULL
and you will find that no goal was scored in games 1028 and 1029.
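To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module. The three games, the team codes and the goal rows are made up for illustration (they are not the SQLZOO data), and the query groups by id so the games are easy to tell apart. Game 3 has no rows in goal, so the inner join drops it, while the left join keeps it with both scores at 0.

```python
import sqlite3

# Made-up miniature of the game/goal tables: three games, goals in only two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, mdate TEXT, team1 TEXT, team2 TEXT);
CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT);
INSERT INTO game VALUES (1, 'day 1', 'AAA', 'BBB'),
                        (2, 'day 1', 'CCC', 'DDD'),
                        (3, 'day 2', 'EEE', 'FFF');
INSERT INTO goal VALUES (1, 'AAA', 'p1'), (1, 'BBB', 'p2'),
                        (2, 'CCC', 'p3'), (2, 'CCC', 'p4'), (2, 'DDD', 'p5');
""")

# Same query twice; only the join keyword changes.
QUERY = """
SELECT id, team1,
       SUM(CASE WHEN teamid = team1 THEN 1 ELSE 0 END) AS score1,
       team2,
       SUM(CASE WHEN teamid = team2 THEN 1 ELSE 0 END) AS score2
FROM game {join} goal ON matchid = id
GROUP BY id, team1, team2
ORDER BY id
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print("JOIN:     ", inner_rows)  # game 3 is missing
print("LEFT JOIN:", left_rows)   # game 3 appears with 0 and 0
```

For the unmatched game, teamid is NULL, so neither CASE condition is true and each ELSE branch contributes 0 to the sum.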

Related

How to join 4 tables in SQL?

I just started using SQL and I need some help. I have 4 tables in a database. All four are connected with each other. I need to find the amount of unique transactions but can't seem to find it.
Transactions
transaction_id pk
name
Partyinvolved
transaction.id pk
partyinvolved.id
type (buyer, seller)
PartyCompany
partyinvolved.id
Partycompany.id
Companies
PartyCompany.id pk
sector
pk = primary key
The transaction is unique if the conditions are met.
I only need a certain sector out of Companies, this is condition1. Condition2 is a condition inside table Partyinvolved but we first need to execute condition1. I know the conditions but do not know where to put them.
SELECT *
FROM group
INNER JOIN groupB ON groupB.group_id = group.id
INNER JOIN companies ON companies.id = groupB.company_id
WHERE condition1 AND condition2 ;
I want to output the amount of unique transactions with the name.
It is a bit unclear what you are asking, as your table definitions look like you're hinting at column meanings more than names. For example, by partycompany.id you probably mean the column that stores the relationship to the Id column of PartyCompany.
Anyway, if I follow that logic and look at your question about where to limit the record sets during the join: you could do it in the WHERE clause because you are using an INNER JOIN and it won't mess up your results, but the same would not be true if you were to use an outer join. Plus, for optimization it is typically best to add the limiter to the ON condition of the join.
I am also a bit lost as to what exactly you want e.g. a count of transactions or the actual transactions associated with a particular sector for instance. Anyway, either should be able to be derived from a basic query structure like:
SELECT
t.*
FROM
Companies co
INNER JOIN PartyCompany pco
ON co.PartyCompanyId = pco.PartyCompanyId
INNER JOIN PartyInvolved pinv
ON pco.PartyInvolvedId = pinv.PartyInvolvedId
AND pinv.[type] = 'buyer'
INNER JOIN Transactions t
ON pinv.TransactionId = t.TransactionId
WHERE
co.sector = 'some sector'
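The point about inner versus outer joins can be checked with a small sketch. It uses Python's sqlite3 module and two hypothetical tables (transactions and party_involved, with invented rows), not the asker's real schema. With an inner join the 'buyer' filter gives the same rows whether it sits in ON or in WHERE; with a left join the two placements differ.

```python
import sqlite3

# Hypothetical tables and rows, only for illustrating ON vs WHERE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE party_involved (transaction_id INTEGER, party_id INTEGER, type TEXT);
INSERT INTO transactions VALUES (1, 't1'), (2, 't2'), (3, 't3');
INSERT INTO party_involved VALUES (1, 10, 'buyer'), (1, 11, 'seller'), (2, 12, 'seller');
""")

def matching_ids(join_kind, filter_in_on):
    """Return distinct transaction ids, with the 'buyer' filter in ON or in WHERE."""
    on_filter = " AND p.type = 'buyer'" if filter_in_on else ""
    where_filter = "" if filter_in_on else " WHERE p.type = 'buyer'"
    sql = (
        "SELECT DISTINCT t.transaction_id FROM transactions t "
        f"{join_kind} party_involved p ON p.transaction_id = t.transaction_id"
        f"{on_filter}{where_filter} ORDER BY t.transaction_id"
    )
    return [row[0] for row in conn.execute(sql)]

inner_on = matching_ids("INNER JOIN", filter_in_on=True)
inner_where = matching_ids("INNER JOIN", filter_in_on=False)
left_on = matching_ids("LEFT JOIN", filter_in_on=True)
left_where = matching_ids("LEFT JOIN", filter_in_on=False)

print(inner_on, inner_where)  # same result for the inner join
print(left_on, left_where)    # ON keeps unmatched transactions, WHERE removes them
```

In the left join, a filter in ON only decides which right-hand rows count as matches, so every transaction still appears. The same filter in WHERE is applied after the join and discards the NULL-extended rows.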

SQL - subquery returning more than 1 value

What my issue is:
I am constantly returning multiple values when I don't expect to. I am attempting to get a specific climate, determined by the state, county, and country.
What I've tried:
The code given below. I am unsure as to what is wrong with it specifically. I do know that it is returning multiple values. But why? I specify that STATE_ABBREVIATION = PROV_TERR_STATE_LOC and with the inner joins that I do, shouldn't that create rows that are similar except for their different CLIMATE_IDs?
SELECT
...<code>...
(SELECT locations.CLIMATE_ID
FROM REF_CLIMATE_LOCATION locations, SED_BANK_TST.dbo.STATIONS stations
INNER JOIN REF_STATE states ON STATE_ID = states.STATE_ID
INNER JOIN REF_COUNTY counties ON COUNTY_ID = counties.COUNTY_ID
INNER JOIN REF_COUNTRY countries ON COUNTRY_ID = countries.COUNTRY_ID
WHERE STATE_ABBREVIATION = PROV_TERR_STATE_LOC) AS CLIMATE_ID
...<more code>...
FROM SED_BANK_TST.dbo.STATIONS stations
I've been at this for hours, looking up different questions on SO, but I cannot figure out how to make this subquery return a single value.
All those inner joins don't reduce the result set if the IDs you're testing exist in the REF tables. Apart from that you're doing a Cartesian product between locations and stations (which may be an old fashioned inner join because of the where clause).
You'll only get a single row if you only have a single row in the locations table that matches a single row in the stations table under the condition that STATE_ABBREVIATION = PROV_TERR_STATE_LOC
Your JOINs show a hierarchy of locations: Country->State->County, but your WHERE clause only limits by the state abbreviation. By joining the county you'll get one record for every county in that state. You CAN limit your results by taking the TOP 1 of the results, but you need to be very careful that that's really what you want. If you're looking for a specific county, you'll need to include that in the WHERE clause. You get some control with the TOP 1 in that it will give the top 1 based on an ORDER BY clause. I.e., if you want the most recently added, use:
SELECT TOP 1 [whatever] ORDER BY [DateCreated] DESC;
For your subquery, you can do something like this:
SELECT TOP 1
locations.CLIMATE_ID
FROM REF_CLIMATE_LOCATION locations ,
SED_BANK_TST.dbo.STATIONS stations
INNER JOIN REF_STATE states ON STATE_ID = states.STATE_ID
INNER JOIN REF_COUNTY counties ON COUNTY_ID = counties.COUNTY_ID
INNER JOIN REF_COUNTRY countries ON COUNTRY_ID = countries.COUNTRY_ID
WHERE STATE_ABBREVIATION = PROV_TERR_STATE_LOC
Just be sure to either add an ORDER BY at the end or be okay with it choosing the TOP 1 based on the "natural order" on the tables.
If you are expecting a single value from your sub-query, you probably need to use DISTINCT. The best way to check is to run your sub-query separately and look at the result. You may also include other columns from the tables you used to see what makes your result have multiple rows.
You can also use MAX(), MIN() or TOP 1 to get a single value from the sub-query, but this depends on the logic you want to achieve for locations.CLIMATE_ID. You need to answer the question, "How is it related to the rest of the columns retrieved?"
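As a rough illustration of why the sub-query returns several rows, the sketch below uses Python's sqlite3 module with two simplified, hypothetical tables. Which table each column really lives in is an assumption here, since the question does not say. A sub-query that lists its own copy of stations produces one row per matching station, while a sub-query correlated to the outer stations row produces one value per station. Note that SQL Server raises an error when a scalar sub-query returns more than one row, whereas SQLite silently uses the first row, so the sketch counts rows instead of relying on an error.

```python
import sqlite3

# Simplified, hypothetical versions of the tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stations (station_id INTEGER PRIMARY KEY, prov_terr_state_loc TEXT);
CREATE TABLE ref_climate_location (state_abbreviation TEXT, climate_id INTEGER);
INSERT INTO stations VALUES (1, 'CA'), (2, 'NV'), (3, 'CA');
INSERT INTO ref_climate_location VALUES ('CA', 7), ('NV', 9);
""")

# The sub-query run on its own, with its own copy of stations:
# it is not tied to any single outer row, so it yields one row per station.
uncorrelated = conn.execute("""
SELECT l.climate_id
FROM ref_climate_location l, stations s
WHERE l.state_abbreviation = s.prov_terr_state_loc
""").fetchall()

# Correlated version: the sub-query refers to the OUTER stations row,
# so it is evaluated once per station and yields a single value each time.
correlated = conn.execute("""
SELECT s.station_id,
       (SELECT l.climate_id
        FROM ref_climate_location l
        WHERE l.state_abbreviation = s.prov_terr_state_loc) AS climate_id
FROM stations s
ORDER BY s.station_id
""").fetchall()

print(len(uncorrelated))  # one row per station, not one row in total
print(correlated)
```

The correlated form only returns a single value if the lookup table has at most one row per state, which is an assumption of this toy data.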

Cannot find correct number of values in a table that are not in another table, though I can do otherwise

I want to retrieve the course_id in table course that is not in the table takes. Table takes only contains course_id of courses taken by students. The problem is that if I have:
select count (distinct course.course_id)
from course, takes
where course.course_id = (takes.course_id);
the result is 85 which is smaller than the total number of course_id in table course, which is 200. The result is correct.
But I want to find the number of course_id that are not in the table takes, and I have:
select count (distinct course.course_id)
from course, takes
where course.course_id != (takes.course_id);
The result is 200, which is equal to the number of course_id in table course. What is wrong with my code?
This SQL will give you the count of course_id in table course that aren't in the table takes:
select count (*)
from course c
where not exists (select *
from takes t
where c.course_id = t.course_id);
You didn't specify your DBMS, however, this SQL is pretty standard so it should work in the popular DBMSs.
There are a few different ways to accomplish what you're looking for. My personal favorite is the LEFT JOIN condition. Let me walk you through it:
Fact One: You want to return a list of courses
Fact Two: You want to filter that list to not include anything in the Takes table.
I'd go about this by first mentally selecting a list of courses:
SELECT c.Course_ID
FROM Course c
and then filtering out the ones I don't want. One way to do this is to use a LEFT JOIN to get all the rows from the first table, along with any that happen to match in the second table, and then filter out the rows that actually do match, like so:
SELECT c.Course_ID
FROM
Course c
LEFT JOIN -- note the syntax: 'comma joins' are a bad idea.
Takes t ON
c.Course_ID = t.Course_ID -- at this point, you have all courses
WHERE t.Course_ID IS NULL -- this phrase means that none of the matching records will be returned.
Another note: as mentioned above, comma joins should be avoided. Instead, use the syntax I demonstrated above (INNER JOIN or LEFT JOIN, followed by the table name and an ON condition).
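The behaviour described in the question, and both suggested fixes, can be reproduced with a small sketch in Python's sqlite3 module. The four courses and three takes rows are invented for illustration. The comma join with != pairs every course with every takes row that has a different id, so each course finds at least one partner and none is filtered out.

```python
import sqlite3

# Invented data: four courses, of which only c1 and c2 have been taken.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_id TEXT PRIMARY KEY);
CREATE TABLE takes (student_id INTEGER, course_id TEXT);
INSERT INTO course VALUES ('c1'), ('c2'), ('c3'), ('c4');
INSERT INTO takes VALUES (1, 'c1'), (2, 'c1'), (1, 'c2');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# The asker's approach: every course differs from SOME takes row, so all are counted.
not_equal_count = scalar("""
SELECT COUNT(DISTINCT course.course_id)
FROM course, takes
WHERE course.course_id != takes.course_id
""")

# Anti-join with NOT EXISTS.
not_exists_count = scalar("""
SELECT COUNT(*)
FROM course c
WHERE NOT EXISTS (SELECT * FROM takes t WHERE c.course_id = t.course_id)
""")

# Anti-join with LEFT JOIN ... IS NULL.
left_join_count = scalar("""
SELECT COUNT(*)
FROM course c
LEFT JOIN takes t ON c.course_id = t.course_id
WHERE t.course_id IS NULL
""")

print(not_equal_count, not_exists_count, left_join_count)
```

Both anti-join forms count only c3 and c4, the courses with no row in takes.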

Derived column results

I'm an SQL novice and have to formulate an SQL query that lists all 5 columns from a QUALITY table and adds two more columns: ProductCode of the items produced in the batch, and a derived column BatchQuality that contains “Poor” if the batch is of poor quality (contains more than 1 defective item) and “Good” otherwise.
I'm pulling from 3 tables that I put in an Oracle database: Production table (contains serialno, batchno, and productcode), Quality table (batchno, test1, test2, test3, test4), and Defective table (defectiveid, serialno).
I'm able to get 6 out of 7 columns by using the following:
select q.batchno, q.test1, q.test2, q.test3, q.test4, p.productcode_id
from production p, defective d, quality q
where d.serialno = p.serialno
and p.batchno = q.batchno;
Any ideas on how to get the last column called batchquality that says if it's good or poor? I'm thinking that I need a count function, but once I have that, how would I go about getting a new column that would state poor or good?
Appreciate any help that can be provided.
Your current query is an inner join using an old, outdated implicit join in the where clause. I assume the defective table only contains a row for a product if there was a defect. Your inner join will always return defective parts only, never parts without defects. For that you need an outer join. Another reason to ditch the outdated implicit joins and use an explicit JOIN operator:
select q.batchno, q.test1, q.test2, q.test3, q.test4, p.productcode_id
from production p
JOIN quality q ON p.batchno = q.batchno
LEFT JOIN defective d ON d.serialno = p.serialno;
For products that do not have defects, the values for the columns from the defective table will be null. So to get a flag indicating whether a product is "good" or "bad" you need to check if the value is null:
select q.batchno, q.test1, q.test2, q.test3, q.test4, p.productcode_id,
case
when d.serialno is null then 'good'
else 'bad'
end as batch_quality
from production p
JOIN quality q ON p.batchno = q.batchno
LEFT JOIN defective d ON d.serialno = p.serialno;
Due to the nature of joins, the above statement will however repeat each row from the production table for each row in the quality and defective table. It is not clear to me if you want that or not.
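If one row per batch is what is wanted, a possible approach is to group by batch and count the defective items, flagging a batch as 'Poor' when the count exceeds 1, as the question describes. The sketch below runs in Python's sqlite3 module rather than Oracle, uses invented rows, names the column productcode as in the question's table description, and assumes each batch has a single product code. Treat it as an illustration of the idea, not as the original answer's method.

```python
import sqlite3

# Invented rows. Assumption: each batch contains a single product code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production (serialno INTEGER PRIMARY KEY, batchno INTEGER, productcode TEXT);
CREATE TABLE quality (batchno INTEGER PRIMARY KEY, test1 TEXT, test2 TEXT, test3 TEXT, test4 TEXT);
CREATE TABLE defective (defectiveid INTEGER PRIMARY KEY, serialno INTEGER);
INSERT INTO quality VALUES (1, 'ok', 'ok', 'ok', 'ok'),
                           (2, 'ok', 'ok', 'ok', 'ok'),
                           (3, 'ok', 'ok', 'ok', 'ok');
INSERT INTO production VALUES (1, 1, 'P1'), (2, 1, 'P1'), (3, 1, 'P1'),
                              (4, 2, 'P2'), (5, 2, 'P2'),
                              (6, 3, 'P3');
INSERT INTO defective VALUES (1, 1), (2, 2), (3, 4);
""")

rows = conn.execute("""
SELECT q.batchno, q.test1, q.test2, q.test3, q.test4, p.productcode,
       -- COUNT(d.serialno) ignores the NULLs produced by unmatched left-join rows
       CASE WHEN COUNT(d.serialno) > 1 THEN 'Poor' ELSE 'Good' END AS batchquality
FROM quality q
JOIN production p ON p.batchno = q.batchno
LEFT JOIN defective d ON d.serialno = p.serialno
GROUP BY q.batchno, q.test1, q.test2, q.test3, q.test4, p.productcode
ORDER BY q.batchno
""").fetchall()

batch_quality = [(row[0], row[-1]) for row in rows]
print(batch_quality)
```

Batch 1 has two defective items and is flagged 'Poor'. Batch 2 has one and batch 3 has none, so both are 'Good'.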

Modelling database for a small soccer league

The database is quite simple. Below there is a part of a schema relevant to this question
ROUND (round_id, round_number)
TEAM (team_id, team_name)
MATCH (match_id, match_date, round_id)
OUTCOME (team_id, match_id, score)
I have a problem with query to retrieve data for all matches played. The simple query below gives of course two rows for every match played.
select *
from round r
inner join match m on m.round_id = r.round_id
inner join outcome o on o.match_id = m.match_id
inner join team t on t.team_id = o.team_id
How should I write a query to have the match data in one row?
Or maybe should I redesign the database - drop the OUTCOME table and modify the MATCH table to look like this:
MATCH (match_id, match_date, team_away, team_home, score_away, score_home)?
You can almost generate the suggested change from the original tables using a self join on outcome table:
select o1.team_id team_id_1,
o2.team_id team_id_2,
o1.score score_1,
o2.score score_2,
o1.match_id match_id
from outcome o1
inner join outcome o2 on o1.match_id = o2.match_id and o1.team_id < o2.team_id
Of course, the information for home and away are not possible to generate, so your suggested alternative approach might be better after all. Also, take note of the condition o1.team_id < o2.team_id, which gets rid of the redundant symmetric match data (actually it gets rid of the same outcome row being joined with itself as well, which can be seen as the more important aspect).
In any case, using this select as part of your join, you can generate one row per match.
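A quick way to check the effect of the o1.team_id < o2.team_id condition is the following sketch, written with Python's sqlite3 module and four invented outcome rows (two matches). Without the condition, each match yields four combinations (each row paired with itself and with the other row); with it, exactly one row per match remains.

```python
import sqlite3

# Invented outcome rows: two matches, two teams each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outcome (team_id INTEGER, match_id INTEGER, score INTEGER);
INSERT INTO outcome VALUES (1, 100, 2), (2, 100, 1), (3, 101, 0), (4, 101, 0);
""")

# Self join WITHOUT the inequality: every pairing, including a row with itself.
all_pairs = conn.execute("""
SELECT COUNT(*)
FROM outcome o1
INNER JOIN outcome o2 ON o1.match_id = o2.match_id
""").fetchone()[0]

# Self join WITH the inequality: one row per match.
one_per_match = conn.execute("""
SELECT o1.team_id AS team_id_1,
       o2.team_id AS team_id_2,
       o1.score   AS score_1,
       o2.score   AS score_2,
       o1.match_id AS match_id
FROM outcome o1
INNER JOIN outcome o2 ON o1.match_id = o2.match_id AND o1.team_id < o2.team_id
ORDER BY o1.match_id
""").fetchall()

print(all_pairs)
print(one_per_match)
```

The team with the lower team_id always lands in the first column, which is why home and away cannot be recovered from this result.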
You fetch 2 rows for every match played, but team_id and team_name are different in each:
- one for the home team
- one for the away team
So your query is good.
Using the match table as you describe captures the logic of a game simply and naturally and additionally shows home and away teams which your initial model does not.
You might want to add the round id as a foreign key to round table and perhaps a flag to indicate a match abandoned situation.
Drop outcome. It shouldn't be a separate table, because you have exactly one outcome per match.
You may want to consider how to handle matches that are cancelled. Perhaps scores are null?