Under what conditions do these two SQL queries give different results? - sql

Disclaimer: I'm helping a person learn SQL by using an online tutorial. So you can consider this as a homework question.
There are 3 SQL Server tables that we are dealing with here:
Ships (name, class)
Classes - different classes of the ship (class)
Outcomes - what has happened to that ship (ship, result)
Now there are several crazy things in the layout of the database, the biggest one being that the ships in the Outcomes table may not be present in the Ships table when they are named the same way as the class in Class.
The point is to get the number of sunk ships of each class. I have helped the student to get to the following SQL:
Select
dbo.Classes.[class], Count(dbo.Outcomes.ship) as [count]
from
dbo.Classes
left join
dbo.Ships on dbo.Ships.[class] = dbo.Classes.[class]
left join
dbo.Outcomes on (dbo.Outcomes.ship = dbo.Classes.[class] or
dbo.Outcomes.ship = dbo.Ships.name)
and dbo.Outcomes.result = 'sunk'
Group by
dbo.Classes.[class]
However, it, apparently, is the incorrect solution, as it can on some occasions return incorrect results. On the web i have managed to find the following solution to this tutorial:
select
classes.class, count(T.ship)
from
classes
left join
(select
ship, class
from
outcomes
left join
ships on ship = name
where
result = 'sunk'
union
select
ship, class
from
outcomes
left join
classes on ship = class
where
result = 'sunk') as T on classes.class = T.class
group by
classes.class
But I cannot understand under which conditions will the results be different. Doesn't the Union operation of using two different join paths serve the exactly same function as does OR in the condition of the join?
P.S. This particular question in the tutorial is actually marked as 2 on the scale from 1-5 for difficulty. So i feel myself quite stupid.

With this dataset:
with Ships as (
select * from (values
('HMS Prince of Wales','King George V')
,('King George V','King George V')
)Ships(name,class)
),
Classes as (
select * from (values
('King George V')
)Classes(class)
),
Outcomes as (
select * from (values
('HMS Prince of Wales','sunk')
,('King George V','sunk')
)Outcomes(ship,result)
)
the two queries you provide respectively yield:
class count
------------- -----------
King George V 3
class
------------- -----------
King George V 2
The reason for the difference is that UNION is a set operator that eliminates duplicates (unlike UNION ALL) while the OR operator doesn't. We can test this by replacing UNION in the second query with UNION ALL, which now yields:
class
------------- -----------
King George V 3
just as in your first proposed solution.

The queries might return different results in many cases. One obvious case is when a sunk ship matches both class and name in outcomes. The union in the second query will return one row in this case. The join in the first query will return two rows. Hence the counts will be different.
I think you could fix this particular issue by using count(distinct) in the first query.

Needed goal :
"The point is to get the number of sunk ships of each class. I have helped the student to get to the following SQL:"
Sudo code should be created in three steps. You would need to break them into working steps
select * from class
//create a list of all of the classes that exists in your db
select * from [working previous statement] where results = "sunk"
//create another select statement that uses the previous select statement to refine results
//then refine it to have ships that have been sunk
select count from [[previous statement with the two selects]]
//This create the count of all of the sunk ships.
select count from [[previous statement with the two selects]] group by [previous statement with one select]
//this one should create the individual count of all of the sunken ships based on it's class
select sum from [[your pick of which statement]]
//do a sum
naturally you will need to do the linking so the code should be implemented as
select ship from ships sh, classes cl, outcome oc where sh.class = cl.class and oc.ship = sh.ship and oc.result = sunk group by cl.classes
then add the count steps in the mix.

Related

How to join 2 tables with multiple conditions each?

I have data split between 2 tables, and need to join the necessary data together for analysis.
One table Test 3 Output contains ID numbers, and the return value of the test. The other table Test Results contains the same IDs, along with their corresponding serial number and overall test result.
I need to combine these into a single table that just displays ID, serial number and test value.
Sorry in advance for the horrible SQL thats about to follow, I'm brand new to this.
I have 2 working queries that give me what I want, but I can't seem to join them together.
The first query:
select `ID`,`Serial Number` from `Test Results`t where (len(`Serial Number`)=16 and FailMode = '24V Supply FAIL')
This gets me the ID and serial number of all the tests that failed '24V supply'. It also filters out garbage serial numbers as the correct ones should have 16 digits.
The second query:
select `ID` from `Test 3 Output`o where o.`24V Supply (V)`<30
This gets me the ID and test results, and filters out some results that were greater than 30V. Note that '24V Supply(V) is the name of the column containing the test results.
Now when I try to join these with the ID, I get a syntax error. Here's what I tried:
select `ID`,`Serial Number`
from `Test Results`t
where (len(`Serial Number`)=16 and FailMode = '24V Supply FAIL')
left join (`Test 3 Output`o ON t.`ID` = o.`ID` where o.`24V Supply (V)`<30)
This gives the error:
Error: Syntax error (missing operator) in query expression (len(`Serial Number`)=16 and FailMode = '24V Supply FAIL') left join (`Test 3 Output`o ON t.`ID` = o.`ID` where o.`24V Supply (V)`<30)
I'm not sure what operator I'm missing but I had a feeling its related to the fact there's two where statements?
Can anyone offer some help?
Edit: I found a workaround since I can't use 2 where clauses with a join. I created 2 views with my 2 separate queries, and performed the join on those which got me what I wanted. I'd still like to hear a proper way of doing it though :)
You can join 2 subqueries like this:
SELECT q1.a, q1.b, q2.c
FROM (
(SELECT a, b FROM table1
WHERE b > 10) AS q1
LEFT JOIN
(SELECT a, c FROM table2
WHERE c > 20) AS q2
ON q1.a = q2.a
)
Doing the subqueries as separate query objects is easier to debug, but the query objects keep piling up...

Having SQL Server choose and show one record over other

Ok, hopefully I can explain this accurately. I work in SQL Server, and I am trying to get one row from a table that will show multiple rows for the same person for various reasons.
There is a column called college_attend which will show either New or Cont for each student.
My issue: my initial query narrows down the rows I'm pulling by Academic Year, which consists of two semesters: Fall of one year, and Spring of the following to create an academic year. This is why there are two rows returned for some students.
Basically, I need to generate an accurate count of those that are "New" and those that are "Cont", but I don't want both records for the same student counted. They will have two records because they will have one for spring and one for fall (usually). So if a student is "New" in fall, they will have a "Cont" record for spring. I want the query to show ONLY the "New" record if they have both a "New' and "Cont" record, and count it (which I will do in Report Builder). The other students will basically have two records that are "Cont": one for fall, and one "Cont" for spring, and so those would be considered the continuing ones or "Cont".
Here is the basic query I have so far:
SELECT DISTINCT
people.people_id,
people.last_name,
people.first_name,
academic.college_attend AS NewORCont,
academic.academic_year,
academic.academic_term,
FROM
academic
INNER JOIN
people ON people.people_id = academic.people_id
INNER JOIN
academiccalendar acc ON acc.academic_year = academic.academic_year
AND acc.academic_term = academic.academic_term
AND acc.true_academic_year = #Academic_year
I'm not sure if this can be done with a CASE statement? I thought of a GROUP BY, but then SQL Server will want me to add all of my columns to the GROUP BY clause, and that ends up negating the purpose of the grouping in the first place.
Just a sample of what I work with for each student:
People ID
Last
First
NeworCont
12345
Soanso
Guy
New
12345
Soanso
Guy
Cont
32345
Person
Nancy
Cont
32345
Person
Nancy
Cont
55555
Smith
John
New
55555
Smith
John
Cont
---------
------
-------
----------
Hopefully this sheds some light on the duplicate record issue I mentioned.
Without sample data its awkward to visualize the problem, and without the expected results specified it's also unclear what you want as the outcome. Perhaps this will assist, it will limit the results to only those who have both 'New' and 'Cont' in a single "true academic year" but the count seems redundant as this (I imagine) will always be 2 (being 1 New term and 1 Cont term)
SELECT
people.people_id
, people.last_name
, people.first_name
, acc.true_academic_year
, count(*) AS count_of
FROM academic
INNER JOIN people ON people.people_id = academic.people_id
INNER JOIN academiccalendar acc ON acc.academic_year = academic.academic_year
AND acc.academic_term = academic.academic_term
AND acc.true_academic_year = #Academic_year
GROUP BY
people.people_id
, people.last_name
, people.first_name
, acc.true_academic_year
HAVING MAX(academic.college_attend) = 'New'
AND MIN(academic.college_attend) = 'Cont'

SQL - Getting a column from another table to join this query

I've got the code below which displays the location_id and total number of antisocial crimes but I would like to get the location_name from a different table called location_dim be output as well. I tried to find a way to UNION it but couldn't get it to work. Any ideas?
SELECT fk5_location_id , COUNT(fk3_crime_id) as TOTAL_ANTISOCIAL_CRIMES
from CRIME_FACT
WHERE fk1_time_id = 3 AND fk3_crime_id = 1
GROUP BY fk5_location_id;
You want to use join to lookup the location name. The query would probably look like this:
SELECT ld.location_name, COUNT(cf.fk3_crime_id) as TOTAL_ANTISOCIAL_CRIMES
from CRIME_FACT cf join
LOCATION_DIM ld
on cf.fk5_location_id = ld.location_id
WHERE cf.fk1_time_id = 3 AND cf.fk3_crime_id = 1
GROUP BY ld.location_name;
You need to put in the right column names for ld.location_name and ld.location_id.
you need to find a relationship between the two tables to link a location to crime. that way you could use a "join" and select the fields from each table you are interested in.
I suggest taking a step back and reading up on the fundamentals of relational databases. There are many good books out there which is the perfect place to start.

sql query help multi joins?

here are my tables, im using sql developer oracle
Carowner(Carowner id, carowner-name,)
Car (Carid, car-name, carowner-id*)
Driver(driver_licenceno, driver-name)
Race(Race no, race-name, prize-money, race-date)
RaceEntry(Race no*, Car id*, Driver_licenceno*, finishing_position)
im trying to list to do the query below
which drivers have come second in races from the start of this year.
lncluding race name, driver name, and the name of the car in the output
i have attempted
select r.racename, d.driver-name, c.carowner-name
from race r, driver d, car c, raceentry re
where re.finishing_position = 2 and r.race-date is ...
Something like:
select r.racename, d.driver-name, c.carowner-name
from race r
join raceentry re on r.race_no = re.race_no
join car c on re.car_Id = c.car_id
join driver d on re.driverliscenceNo = d.driverliscenceNo
where re.finishing_position = 2 and r.race-date >='20130101'
This assumes only one car and one driver with a finsih place of 2nd in a particular race. You may need more conditions otherwise. If this is your own table design, you need to start right now learning to be consistent in your nameing between tables. It is important. Fields that are in multiple tables should have the same name and data type. Also you need to stop using implicit syntax. This ia aSQL antipattern and a very poor programming technique. It leads to mistakes such as accidental cross joins and is harder to read and maintain when things get more complex. As you are clearly learning, you need to stop this bad habit right now.
First off, multiple joins in the where clause are hard to get used to when you define more than 3 or 4 tables IMHO.
Do this instead:
Select
a.columnfroma
, b.columnfromb
, c.columnfromc
from tablea a
join tableb b on a.columnAandBShare = b.columnAandBShare
join tablec c on b.columnBandCShare = c.columnBandCShare
This while no one would say is a method you have to use, it is a much more readable method of performing joins.
Otherwise you are doing the joins in the where clause and if you have other predicates with your joins you are going to have to comment out which is which if you ever need to go back and look at it.

How to write a query returning non-chosen records

I have written a psychological testing application, in which the user is presented with a list of words, and s/he has to choose ten words which very much describe himself, then choose words which partially describe himself, and words which do not describe himself. The application itself works fine, but I was interested in exploring the meta-data possibilities: which words have been most frequently chosen in the first category, and which words have never been chosen in the first category. The first query was not a problem, but the second (which words have never been chosen) leaves me stumped.
The table structure is as follows:
table words: id, name
table choices: pid (person id), wid (word id), class (value between 1-6)
Presumably the answer involves a left join between words and choices, but there has to be a modifying statement - where choices.class = 1 - and this is causing me problems. Writing something like
select words.name
from words left join choices
on words.id = choices.wid
where choices.class = 1
and choices.pid = null
causes the database manager to go on a long trip to nowhere. I am using Delphi 7 and Firebird 1.5.
TIA,
No'am
Maybe this is a bit faster:
SELECT w.name
FROM words w
WHERE NOT EXISTS
(SELECT 1
FROM choices c
WHERE c.class = 1 and c.wid = w.id)
Something like that should do the trick:
SELECT name
FROM words
WHERE id NOT IN
(SELECT DISTINCT wid -- DISTINCT is actually redundant
FROM choices
WHERE class == 1)
SELECT words.name
FROM
words
LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
WHERE choices.pid IS NULL
Make sure you have an index on choices (class, wid).