Same queries giving different results - sql

For an assignment at school, we had to extract a count from a database. The question was as follows:
--19) How many airports in a timezone name containing 'Europe' are used as a source airport (source_airport_id) for a route that has aircraft that have a wake of 'M' or 'L'?
This was the code I came up with:
SELECT count(DISTINCT airports.id) FROM airports WHERE timezone_name LIKE '%Europe%' AND id IN
(SELECT source_airport_id FROM routes WHERE id IN
(SELECT id FROM route_aircrafts WHERE aircraft_id IN
(SELECT id FROM aircrafts WHERE wake_size IN ('M', 'L'))));
It returned 544, while the professor's answer returned 566:
SELECT count(DISTINCT airports.id)
FROM airports, routes, route_aircrafts, aircrafts
WHERE airports.id = routes.source_airport_id
AND routes.id = route_aircrafts.route_id
AND aircrafts.id = route_aircrafts.aircraft_id
AND airports.timezone_name LIKE '%Europe%'
AND aircrafts.wake_size IN ('M', 'L'); --566
To me, those two should be doing the same thing and I can't understand why the answers are different.

To get the same answer in your query you need:
SELECT count(DISTINCT airports.id) FROM airports WHERE timezone_name LIKE '%Europe%' AND id IN
(SELECT source_airport_id FROM routes WHERE id IN
(SELECT route_id FROM route_aircrafts WHERE aircraft_id IN
(SELECT id FROM aircrafts WHERE wake_size IN ('M', 'L'))));
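You selected route_aircrafts.id (that table's own primary key) where the foreign key route_aircrafts.route_id was needed, so routes.id was being compared against unrelated row ids. The result was close to the correct one only because the two sets of values happen to overlap heavily.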
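To see the effect on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the question, but every row is invented for illustration, so the counts are not the 544/566 from the assignment.

```python
import sqlite3

# Hypothetical miniature of the schema; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airports (id INTEGER PRIMARY KEY, timezone_name TEXT);
CREATE TABLE routes (id INTEGER PRIMARY KEY, source_airport_id INTEGER);
CREATE TABLE aircrafts (id INTEGER PRIMARY KEY, wake_size TEXT);
CREATE TABLE route_aircrafts (id INTEGER PRIMARY KEY, route_id INTEGER, aircraft_id INTEGER);
INSERT INTO airports VALUES (1, 'Europe/Paris'), (2, 'Europe/Oslo'), (3, 'Europe/Rome');
INSERT INTO routes VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO aircrafts VALUES (100, 'M');
-- route_aircrafts.id only partly coincides with the route_id values
INSERT INTO route_aircrafts VALUES (1, 1, 100), (2, 3, 100), (4, 2, 100);
""")

QUERY = """
SELECT count(DISTINCT airports.id) FROM airports
WHERE timezone_name LIKE '%Europe%' AND id IN
  (SELECT source_airport_id FROM routes WHERE id IN
    (SELECT {col} FROM route_aircrafts WHERE aircraft_id IN
      (SELECT id FROM aircrafts WHERE wake_size IN ('M', 'L'))))
"""

def count_airports(col):
    # col is the route_aircrafts column compared against routes.id
    return conn.execute(QUERY.format(col=col)).fetchone()[0]

wrong = count_airports("id")        # primary key of route_aircrafts
right = count_airports("route_id")  # foreign key pointing at routes
print(wrong, right)
```

With this data the primary-key version undercounts, because the id value 4 matches no route while the ids 1 and 2 match routes only by coincidence.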

I would go with something along the lines of:
SELECT COUNT(DISTINCT airports.id)
FROM airports
INNER JOIN routes ON airports.id = routes.source_airport_id
INNER JOIN route_aircrafts ON routes.id = route_aircrafts.route_id
INNER JOIN aircrafts ON route_aircrafts.aircraft_id = aircrafts.id
AND aircrafts.wake_size IN ('M', 'L')
WHERE airports.timezone_name LIKE '%Europe%'
Explanation:
SELECT COUNT(DISTINCT airports.id)
You don't want to count duplicate airports.ids more than once.
FROM airports
This is the main table you're counting from. All other tables build from this one.
INNER JOIN routes ON airports.id = routes.source_airport_id
INNER JOIN will only include rows that match in both tables. Matching on airports.id and routes.source_airport_id.
INNER JOIN route_aircrafts ON routes.id = route_aircrafts.route_id
INNER JOIN will only include rows that match in both tables. Matching on routes.id and route_aircrafts.route_id.
INNER JOIN aircrafts ON route_aircrafts.aircraft_id = aircrafts.id
AND aircrafts.wake_size IN ('M', 'L')
Same thing with the INNER JOINs above. We've added an additional filter for wakes. For an INNER JOIN, this filter can also be performed in the WHERE clause without changing the results. Putting filters in the JOIN keeps the intent together (and the optimizer will likely filter this way anyway). For an OUTER JOIN, filtering in the JOIN vs filtering in the WHERE can possibly return different results (depending on your data).
WHERE airports.timezone_name LIKE '%Europe%'
Now we are filtering the entire resultset by the timezone_name from the base table of airports.
When working with SQL, it's important to think of your data in SETS. This will help you write more performant, and less programmatic, queries.
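As a rough check of the points above, the sketch below (Python's sqlite3, with invented rows) runs the comma-join form, the INNER JOIN form with the wake filter in the ON clause, and the same joins with the filter in the WHERE clause. It also shows why DISTINCT matters when one airport is the source of several matching routes.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airports (id INTEGER PRIMARY KEY, timezone_name TEXT);
CREATE TABLE routes (id INTEGER PRIMARY KEY, source_airport_id INTEGER);
CREATE TABLE aircrafts (id INTEGER PRIMARY KEY, wake_size TEXT);
CREATE TABLE route_aircrafts (id INTEGER PRIMARY KEY, route_id INTEGER, aircraft_id INTEGER);
INSERT INTO airports VALUES (1, 'Europe/Paris'), (2, 'Europe/Oslo'), (3, 'Asia/Tokyo');
INSERT INTO routes VALUES (1, 1), (2, 2), (3, 3), (4, 1);
INSERT INTO aircrafts VALUES (100, 'M'), (200, 'H'), (300, 'L');
INSERT INTO route_aircrafts VALUES (1, 1, 100), (2, 4, 300), (3, 2, 200), (4, 3, 100);
""")

IMPLICIT = """
SELECT {expr}
FROM airports, routes, route_aircrafts, aircrafts
WHERE airports.id = routes.source_airport_id
  AND routes.id = route_aircrafts.route_id
  AND aircrafts.id = route_aircrafts.aircraft_id
  AND airports.timezone_name LIKE '%Europe%'
  AND aircrafts.wake_size IN ('M', 'L')
"""

FILTER_IN_ON = """
SELECT {expr}
FROM airports
INNER JOIN routes ON airports.id = routes.source_airport_id
INNER JOIN route_aircrafts ON routes.id = route_aircrafts.route_id
INNER JOIN aircrafts ON route_aircrafts.aircraft_id = aircrafts.id
  AND aircrafts.wake_size IN ('M', 'L')
WHERE airports.timezone_name LIKE '%Europe%'
"""

FILTER_IN_WHERE = """
SELECT {expr}
FROM airports
INNER JOIN routes ON airports.id = routes.source_airport_id
INNER JOIN route_aircrafts ON routes.id = route_aircrafts.route_id
INNER JOIN aircrafts ON route_aircrafts.aircraft_id = aircrafts.id
WHERE airports.timezone_name LIKE '%Europe%'
  AND aircrafts.wake_size IN ('M', 'L')
"""

def run(sql, expr="COUNT(DISTINCT airports.id)"):
    return conn.execute(sql.format(expr=expr)).fetchone()[0]

counts = [run(q) for q in (IMPLICIT, FILTER_IN_ON, FILTER_IN_WHERE)]
rows_before_distinct = run(FILTER_IN_ON, "COUNT(*)")
print(counts, rows_before_distinct)
```

All three forms agree here because every join is an inner join. The airport with id 1 appears in two joined rows (one 'M' route, one 'L' route) but is counted once.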

Related

How to count total crossover from two tables each with specific conditions

I am working from two tables in a dataset. Let's call the first one 'Demographic_Info', the other 'Study_Info'. The two tables both have a Subject_ID column. How can I run a query that will return all of the Subject_IDs where Sex = Male (from Demographic_Info) but also where the Study Case = Case (from Study_Info)?
Is this an inner join? Do I need to make a combined table?
I just don't know what function to use. I know how to select for each of these conditions in each table individually, but not how to run them against each other.
Yes, you will want to inner join and then use the where clause to filter on both tables.
select
s.Subject_ID
from `Study_info` s
inner join `Demographic_info` d on s.Subject_ID = d.Subject_ID
where d.Sex = 'Male'
and s.Study_Case = 'Case' -- Unclear from your question about the actual field name
The aliases s and d will be useful for organizing which table each field comes from (or if the same field occurs in both tables).
Similarly, you could filter first and then perform the join.
with study as (select * from `Study_info` where Study_Case = 'Case'),
demographics as (select * from `Demographic_info` where Sex = 'Male')
select s.Subject_ID
from study s
inner join demographics d on s.Subject_ID = d.Subject_ID
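A quick way to convince yourself that both forms return the same subjects is to run them on a few made-up rows. This sketch uses Python's sqlite3; the column name Study_Case is the same guess as in the answer, and the rows are invented.

```python
import sqlite3

# Hypothetical rows; Study_Case is an assumed column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Demographic_Info (Subject_ID INTEGER, Sex TEXT);
CREATE TABLE Study_Info (Subject_ID INTEGER, Study_Case TEXT);
INSERT INTO Demographic_Info VALUES (1, 'Male'), (2, 'Female'), (3, 'Male'), (4, 'Male');
INSERT INTO Study_Info VALUES (1, 'Case'), (2, 'Case'), (3, 'Control'), (4, 'Case');
""")

JOIN_THEN_FILTER = """
SELECT s.Subject_ID
FROM Study_Info s
INNER JOIN Demographic_Info d ON s.Subject_ID = d.Subject_ID
WHERE d.Sex = 'Male' AND s.Study_Case = 'Case'
ORDER BY s.Subject_ID
"""

FILTER_THEN_JOIN = """
WITH study AS (SELECT * FROM Study_Info WHERE Study_Case = 'Case'),
     demographics AS (SELECT * FROM Demographic_Info WHERE Sex = 'Male')
SELECT s.Subject_ID
FROM study s
INNER JOIN demographics d ON s.Subject_ID = d.Subject_ID
ORDER BY s.Subject_ID
"""

def subject_ids(sql):
    # Flatten the single-column result into a plain list
    return [row[0] for row in conn.execute(sql)]

join_then_filter = subject_ids(JOIN_THEN_FILTER)
filter_then_join = subject_ids(FILTER_THEN_JOIN)
print(join_then_filter, filter_then_join)
```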

How to select joined rows even if there is no match?

I checked so many similar questions but none apply to Firebird I guess.
I have two tables; one stores the customer information and the second stores the stock activities (which also includes orders). I'd like to fetch all customers and the counts of orders they have made. But no matter how I join the orders table; I end up with only the customers that have at least one order. That means customers who don't have a match in the stock activities table won't show up in the result set.
Here is the query I run;
SELECT
C.NAME, C.GROUPNAME, C.EMAIL,
COALESCE(COUNT(DISTINCT S.ORDERNO), '0') AS TOTALORDERS,
COALESCE(SUM(S.AMOUNT), '0') as TOTALREVENUE
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S ON C.ID = S.CUSTOMERID
WHERE C.GROUPNAME = 'B'
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
GROUP BY C.NAME, C.GROUPNAME, C.EMAIL
Without the join, I get 570 rows (of customers) and it's the correct result set. When I join the orders table to fetch the total order amount of these customers; I get only 379 results; which are the ones having at least one order. That means customers who don't have orders won't return. As you might have guessed; I want to have the customers having zero activity to return "0" as order amount and revenue.
The problem is that your WHERE clause filters on the "right hand" table's values.
WHERE ...
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
When the outer join generates records for "unmatched" rows from the left table, it supplies NULL values for all columns from the right table. So S.TYPE is NULL for those records.
There are two possible solutions:
1. Explicitly allow for the "NULL record" case in your WHERE logic.
By some standards this might be "more pure" in separating join conditions from filters, but it can get fairly complicated (and hence error-prone). One issue to be aware of is that you may have to distinguish generated NULL records from "real" records of the right table that just happen to have some NULL data.
Testing for the right table's value for the join key to be NULL should be reasonably safe. You could test for the right table's PK value to be NULL (assuming you have a true PK on that table).
2. Move the predicate from the WHERE clause to the outer join's ON clause.
This is very simple, and looks like
SELECT C.NAME, C.GROUPNAME, C.EMAIL,
COALESCE(COUNT(DISTINCT S.ORDERNO), '0') AS TOTALORDERS,
COALESCE(SUM(S.AMOUNT), '0') as TOTALREVENUE
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S
ON C.ID = S.CUSTOMERID
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
WHERE C.GROUPNAME = 'B'
GROUP BY C.NAME, C.GROUPNAME, C.EMAIL
This effectively filters the STOCK_ACTIVITY records presented to the join before attempting to match them against CUSTOMERS records (meaning the NULL records can still be generated without interference). ("Effectively" because it's folly to talk like you know what steps the DBMS will follow; all we can say is this has the same effect that you'd get by following certain steps...)
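The difference is easy to reproduce on a toy data set. The sketch below uses Python's sqlite3 rather than Firebird, with invented customers and activity rows, so treat it as an illustration of the join semantics only.

```python
import sqlite3

# Invented sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, GROUPNAME TEXT);
CREATE TABLE STOCK_ACTIVITY (CUSTOMERID INTEGER, ORDERNO TEXT, TYPE TEXT, AMOUNT INTEGER);
INSERT INTO CUSTOMERS VALUES (1, 'Ann', 'B'), (2, 'Bob', 'B'), (3, 'Cy', 'A');
INSERT INTO STOCK_ACTIVITY VALUES (1, 'O1', 'INVOICE', 10), (1, 'O2', 'TRANSFER', 5);
""")

FILTER_IN_WHERE = """
SELECT C.NAME, COUNT(DISTINCT S.ORDERNO), COALESCE(SUM(S.AMOUNT), 0)
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S ON C.ID = S.CUSTOMERID
WHERE C.GROUPNAME = 'B'
  AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
GROUP BY C.NAME
ORDER BY C.NAME
"""

FILTER_IN_ON = """
SELECT C.NAME, COUNT(DISTINCT S.ORDERNO), COALESCE(SUM(S.AMOUNT), 0)
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S
  ON C.ID = S.CUSTOMERID
  AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
WHERE C.GROUPNAME = 'B'
GROUP BY C.NAME
ORDER BY C.NAME
"""

filtered_in_where = conn.execute(FILTER_IN_WHERE).fetchall()
filtered_in_on = conn.execute(FILTER_IN_ON).fetchall()
print(filtered_in_where)  # Bob is missing
print(filtered_in_on)     # Bob is present with zeros
```

Bob has no activity at all, so the outer join gives him a row of NULLs. The WHERE version then discards that row because S.TYPE is NULL, while the ON version keeps it.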
If there is no STOCK_ACTIVITY for a CUSTOMER, a row full of NULLs is attached. This also means that the WHERE condition AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE') can never be true for those rows.
Keep the aggregate operation separated from the JOIN. That is the cleanest. First do the grouping then join the additional information.
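One possible reading of "group first, then join" is a derived table that aggregates STOCK_ACTIVITY per customer, which is then LEFT JOINed to CUSTOMERS. The sketch below runs that idea in Python's sqlite3 with invented rows; the alias T and the sample data are assumptions, not part of the original schema.

```python
import sqlite3

# Invented sample data; the alias T and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, GROUPNAME TEXT);
CREATE TABLE STOCK_ACTIVITY (CUSTOMERID INTEGER, ORDERNO TEXT, TYPE TEXT, AMOUNT INTEGER);
INSERT INTO CUSTOMERS VALUES (1, 'Ann', 'B'), (2, 'Bob', 'B');
INSERT INTO STOCK_ACTIVITY VALUES
  (1, 'O1', 'INVOICE', 10), (1, 'O2', 'RECEIPT', 5),
  (1, 'O3', 'TRANSFER', 7), (2, 'O4', 'TRANSFER', 9);
""")

AGGREGATE_THEN_JOIN = """
SELECT C.NAME,
       COALESCE(T.TOTALORDERS, 0) AS TOTALORDERS,
       COALESCE(T.TOTALREVENUE, 0) AS TOTALREVENUE
FROM CUSTOMERS C
LEFT JOIN (
    -- one row per customer that has at least one RECEIPT or INVOICE
    SELECT CUSTOMERID,
           COUNT(DISTINCT ORDERNO) AS TOTALORDERS,
           SUM(AMOUNT) AS TOTALREVENUE
    FROM STOCK_ACTIVITY
    WHERE TYPE IN ('RECEIPT', 'INVOICE')
    GROUP BY CUSTOMERID
) T ON T.CUSTOMERID = C.ID
WHERE C.GROUPNAME = 'B'
ORDER BY C.NAME
"""

totals = conn.execute(AGGREGATE_THEN_JOIN).fetchall()
print(totals)
```

Here the COALESCE calls do real work: a customer without a row in the derived table gets NULL totals from the LEFT JOIN, which are turned into zeros.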

SQL - why is this 'where' needed to remove row duplicates, when I'm already grouping?

Why, in this query, is the final 'WHERE' clause needed to limit duplicates?
The first LEFT JOIN is linking programs to entities on a UID
The first INNER JOIN is linking programs to a subquery that gets statistics for those programs, by linking on a UID
The subquery (that gets the StatsForDistributorClubs subset) is doing a grouping on UID columns
So, I would've thought that this would all be joining unique records anyway so we shouldn't get row duplicates
So why the need to limit based on the final WHERE by ensuring the 'program' is linked to the 'entity'?
(irrelevant parts of query omitted for clarity)
SELECT LmiEntity.[DisplayName]
,StatsForDistributorClubs.*
FROM [Program]
LEFT JOIN
LMIEntityProgram
ON LMIEntityProgram.ProgramUid = Program.ProgramUid
INNER JOIN
(
SELECT e.LmiEntityUid,
sp.ProgramUid,
SUM(attendeecount) [Total attendance],
FROM LMIEntity e,
Timetable t,
TimetableOccurrence [to],
ScheduledProgramOccurrence spo,
ScheduledProgram sp
WHERE
t.LicenseeUid = e.lmientityUid
AND [to].TimetableOccurrenceUid = spo.TimetableOccurrenceUid
AND sp.ScheduledProgramUid = spo.ScheduledProgramUid
GROUP BY e.lmientityUid, sp.ProgramUid
) AS StatsForDistributorClubs
ON Program.ProgramUid = StatsForDistributorClubs.ProgramUid
INNER JOIN LmiEntity
ON LmiEntity.LmiEntityUid = StatsForDistributorClubs.LmiEntityUid
LEFT OUTER JOIN Region
ON Region.RegionId = LMIEntity.RegionId
WHERE (
[Program].LicenseeUid = LmiEntity.LmiEntityUid
OR
[LMIEntityProgram].LMIEntityUid = LmiEntity.LmiEntityUid
)
If you were grouping in your outer query, the extra criteria probably wouldn't be needed, but only your inner query is grouped. Joining to a grouped inner query can still result in multiple rows being returned; for that matter, any of your JOINs could be the culprit.
Without seeing a sample of the duplication it's hard to know where the duplicates originate, but grouping in the outer query would definitely remove full duplicates, and revised JOIN criteria could also take care of it.
Your result set contains:
SELECT LmiEntity.[DisplayName]
,StatsForDistributorClubs.*
I suppose that your duplicates come from LMIEntityProgram.
My conjecture: LMIEntityProgram is a bridge table with both LMIEntityUid and ProgramUid, but you join only on ProgramUid.
If you have several LMIEntityUid values for a single ProgramUid, you will get duplicates.
And these duplicates are what you're filtering in the WHERE:
[LMIEntityProgram].LMIEntityUid = LmiEntity.LmiEntityUid
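You can do it in the JOIN instead (the LMIEntityProgram join then has to come after the LmiEntity join, because its ON clause references LmiEntity):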
LEFT JOIN LMIEntityProgram
ON LMIEntityProgram.ProgramUid = Program.ProgramUid
AND [LMIEntityProgram].LMIEntityUid = LmiEntity.LmiEntityUid
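Here is a stripped-down sketch of that conjecture in Python's sqlite3. The Stats table is a hypothetical stand-in for the grouped StatsForDistributorClubs subquery (already one row per entity and program), and all rows are invented.

```python
import sqlite3

# Hypothetical, simplified tables; Stats stands in for the grouped subquery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Program (ProgramUid TEXT);
CREATE TABLE LMIEntityProgram (LMIEntityUid TEXT, ProgramUid TEXT);
CREATE TABLE Stats (LmiEntityUid TEXT, ProgramUid TEXT, TotalAttendance INTEGER);
INSERT INTO Program VALUES ('P1');
INSERT INTO LMIEntityProgram VALUES ('E1', 'P1'), ('E2', 'P1');
INSERT INTO Stats VALUES ('E1', 'P1', 30), ('E2', 'P1', 40);
""")

# Bridge table joined on ProgramUid only: each stats row meets both bridge rows.
PROGRAM_KEY_ONLY = """
SELECT s.LmiEntityUid, s.TotalAttendance
FROM Program p
LEFT JOIN LMIEntityProgram ep ON ep.ProgramUid = p.ProgramUid
INNER JOIN Stats s ON s.ProgramUid = p.ProgramUid
ORDER BY s.LmiEntityUid
"""

# Bridge table joined on both keys, after the stats rows are available.
BOTH_KEYS = """
SELECT s.LmiEntityUid, s.TotalAttendance
FROM Program p
INNER JOIN Stats s ON s.ProgramUid = p.ProgramUid
LEFT JOIN LMIEntityProgram ep
  ON ep.ProgramUid = p.ProgramUid
  AND ep.LMIEntityUid = s.LmiEntityUid
ORDER BY s.LmiEntityUid
"""

duplicated = conn.execute(PROGRAM_KEY_ONLY).fetchall()
deduplicated = conn.execute(BOTH_KEYS).fetchall()
print(duplicated)
print(deduplicated)
```

Two bridge rows times two stats rows gives four result rows in the first query, even though the stats themselves are unique per entity and program.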

SQL Table A Left Join Table B And top of table B

I'm working myself into an SQL frenzy; hopefully someone out there can help!
I've got 2 tables which are basically Records and Outcomes. I want to join the 2 tables together and count the number of outcomes per record (0 or more), which I've got quite easily with:
Select records.Id, (IsNull(Count(outcomes.Id),0)) as outcomes
from records
Left Join
outcomes
on records.Id = outcomes.Id
group by
records.Id
The outcomes table also has a timestamp in it. What I want to do is include the last outcome in my result set, but if I add that to my query it generates a row for every combination of record and outcome.
Can any SQL expert point me in the right direction?
Cheers,
Try:
SELECT
dt.Id, dt.outcomes,MAX(o.YourTimestampColumn) AS LastOne
FROM (SELECT --basically your original query, just indented differently
records.Id, (ISNULL(COUNT(outcomes.Id),0)) AS outcomes
from records
LEFT JOIN outcomes ON records.Id = outcomes.Id
GROUP BY records.Id
) dt
LEFT JOIN outcomes o ON dt.Id = o.Id --LEFT JOIN so records with no outcomes are kept
GROUP BY dt.Id, dt.outcomes
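If only the latest timestamp is needed, a single grouped LEFT JOIN may be enough, since COUNT and MAX can be computed in the same pass. The sketch below uses Python's sqlite3; the columns RecordId and CreatedAt are hypothetical names (the question does not show the real ones), and the rows are invented.

```python
import sqlite3

# RecordId and CreatedAt are assumed column names; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (Id INTEGER PRIMARY KEY);
CREATE TABLE outcomes (Id INTEGER PRIMARY KEY, RecordId INTEGER, CreatedAt TEXT);
INSERT INTO records VALUES (1), (2);
INSERT INTO outcomes VALUES (1, 1, '2020-01-01'), (2, 1, '2020-02-01');
""")

COUNT_AND_LATEST = """
SELECT r.Id,
       COUNT(o.Id) AS outcomes,        -- COUNT ignores NULLs, so no outcomes gives 0
       MAX(o.CreatedAt) AS LastOne     -- NULL when the record has no outcomes
FROM records r
LEFT JOIN outcomes o ON o.RecordId = r.Id
GROUP BY r.Id
ORDER BY r.Id
"""

summary = conn.execute(COUNT_AND_LATEST).fetchall()
print(summary)
```

Record 2 has no outcomes and still shows up, with a count of 0 and no timestamp.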

SQL help: COUNT aggregate, list of entries and its comment count

So, what I intended to do is to fetch a list of entries/posts with their category and user details, AND each of its total published comments. (entries, categories, users, and comments are separate tables)
This query below fetches the records fine, but it seems to skip those entries with no comments. As far as I can see, the JOINs are good (LEFT JOIN on the comments table), and the query is correct. What did I miss ?
SELECT entries.entry_id, entries.title, entries.content,
entries.preview_image, entries.preview_thumbnail, entries.slug,
entries.view_count, entries.posted_on, entry_categories.title AS category_title,
entry_categories.slug AS category_slug, entry_categories.parent AS category_parent,
entry_categories.can_comment AS can_comment, entry_categories.can_rate AS can_rate,
users.user_id, users.group_id, users.username, users.first_name, users.last_name,
users.avatar_small, users.avatar_big, users.score AS user_score,
COUNT(entry_comments.comment_id) AS comment_count
FROM (entries)
JOIN entry_categories ON entries.category = entry_categories.category_id
JOIN users ON entries.user_id = users.user_id
LEFT JOIN entry_comments ON entries.entry_id = entry_comments.entry_id
WHERE `entries`.`publish` = 'Y'
AND `entry_comments`.`publish` = 'Y'
AND `entry_comments`.`deleted_at` IS NULL
AND `category` = 5
GROUP BY entries.entry_id, entries.title, entries.content,
entries.preview_image, entries.preview_thumbnail, entries.slug,
entries.view_count, entries.posted_on, category_title, category_slug,
category_parent, can_comment, can_rate, users.user_id, users.group_id,
users.username, users.first_name, users.last_name, users.avatar_big,
users.avatar_small, user_score
ORDER BY posted_on desc
edit: I am using MySQL 5.0
Well, you're doing a left join on entry_comments, with conditions:
`entry_comments`.`publish` = 'Y'
`entry_comments`.`deleted_at` IS NULL
For the entries with no comments, entry_comments.publish is NULL, so the condition publish = 'Y' is not satisfied and those rows are filtered out.
I guess this should solve the problem:
WHERE `entries`.`publish` = 'Y'
AND (
(`entry_comments`.`publish` = 'Y'
AND `entry_comments`.`deleted_at` IS NULL)
OR
`entry_comments`.`id` IS NULL
)
AND `category` = 5
In the OR condition, I put entry_comments.id, assuming this is the primary key of the entry_comments table, so you should replace it with the real primary key of entry_comments.
It's because you are setting a filter on columns in the entry_comments table. Replace the first with:
AND IFNULL(`entry_comments`.`publish`, 'Y') = 'Y'
Because your other filter on this table is an IS NULL one, this is all you need to do to allow the unmatched rows from the LEFT JOIN through.
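One case worth checking with either WHERE-based fix is an entry whose only comments are unpublished or deleted: the LEFT JOIN finds a match, so no NULL row is generated, and the WHERE filter then removes the entry altogether. A rough sketch in Python's sqlite3 (not MySQL 5.0), with a trimmed-down schema and invented rows, compares the original query, the OR ... IS NULL variant, and a variant that moves the comment filters into the ON clause.

```python
import sqlite3

# Trimmed-down, hypothetical version of the schema; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (entry_id INTEGER PRIMARY KEY, publish TEXT);
CREATE TABLE entry_comments (comment_id INTEGER PRIMARY KEY, entry_id INTEGER,
                             publish TEXT, deleted_at TEXT);
INSERT INTO entries VALUES (1, 'Y'), (2, 'Y'), (3, 'Y');
-- entry 1: two live comments and one deleted; entry 2: none; entry 3: one unpublished
INSERT INTO entry_comments VALUES
  (1, 1, 'Y', NULL), (2, 1, 'Y', NULL), (3, 1, 'Y', '2020-01-01'), (4, 3, 'N', NULL);
""")

TEMPLATE = """
SELECT e.entry_id, COUNT(c.comment_id) AS comment_count
FROM entries e
LEFT JOIN entry_comments c ON e.entry_id = c.entry_id {on_extra}
WHERE e.publish = 'Y' {where_extra}
GROUP BY e.entry_id
ORDER BY e.entry_id
"""

def comment_counts(on_extra="", where_extra=""):
    sql = TEMPLATE.format(on_extra=on_extra, where_extra=where_extra)
    return conn.execute(sql).fetchall()

live = "c.publish = 'Y' AND c.deleted_at IS NULL"
original = comment_counts(where_extra="AND " + live)
or_is_null = comment_counts(where_extra="AND ((" + live + ") OR c.comment_id IS NULL)")
filter_in_on = comment_counts(on_extra="AND " + live)
print(original)      # entries 2 and 3 are missing
print(or_is_null)    # entry 3 is still missing
print(filter_in_on)  # every published entry, with 0 where nothing qualifies
```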
Try changing the LEFT JOIN to a LEFT OUTER JOIN
OR
I'm no expert with this style of SQL joins (more of an Oracle man myself), but the wording of the left join is leading me to believe that it is joining entry_comments on to entries with entry_comments on the left, you really want it to be the other way around (I think).
So try something like:
LEFT OUTER JOIN entries ON entries.entry_id = entry_comments.entry_id