Difference in where clause and join - sql

im very new to SQL and currently working with joins the first time in my life. What I am trying to figure out right now is trying to get the difference between to queries.
Query 1:
SELECT name
FROM actor
JOIN casting ON id = actorid
where (SELECT COUNT(ord) FROM casting join actor on actorid = actor.id AND ord=1) >= 30
GROUP BY name
Query 2:
SELECT name
FROM actor
JOIN casting ON id = actorid
AND (SELECT COUNT(ord) FROM casting WHERE actorid = actor.id AND ord=1)>=30)
GROUP BY name
So I would think that doing
FROM casting join actor on actorid = actor.id
in the subquery is the same as
FROM casting WHERE actorid = actor.id.
But apparently it is not. Could anyone help me out and explain why?
Edit: If anyone is wondering: The queries are based on question 13 from http://sqlzoo.net/wiki/More_JOIN_operations

Actually, the part that really looks like a "where" statement is only what's after the keyword ON. We sometimes fall on queries performing some data filtering directly at this stage, but its actual purpose is to specify the criteria used
A "join" is a very common operation that consists of associating the rows of two distinct tables according to a common criteria. For example, if you have, on one side, a table containing a client list in which each of them has a unique client number, and on a other side a order list table in which each order contains the client's number, then you may want to "resolve" the number of the latter table into its name, address, and so on.
Before SQL92 (26 years ago), the only way to achieve this was to write something like this :
SELECT client.name,
client.adress,
orders.product,
orders.totalprice
FROM client,orders
WHERE orders.clientNumber = client.clientNumber
AND orders.totalprice > 100.00
Selecting something from two (or more) tables induces a "cartesian product" which actually consists of associating every row from the first set which every row of the second one. This means that if your first table contains 3 rows and the second one 8 rows, the resulting set would be 24-row wide. And out of these, you use the WHERE clause to exclude basically everything and retain only rows in which the client number is the same on both side.
We understand that the size of the resulting set before filtering can grow exponentially if the contents of the different tables exceed a few rows (which is always the case) and it can get even worse if you imply more than two tables. Also, on the programmer's side, it rapidly becomes rather unreadable.
Therefore, if this is what you actually want to do, you now can explicitly tell the server about it, and specify the criteria at first, which will avoid unnecessary growing temporary subsets, while still letting you filter the results with WHERE if needed.
SELECT client.name,
client.adress,
orders.product,
orders.totalprice
FROM client
JOIN orders
ON orders.clientNumber = client.clientNumber
WHERE orders.totalprice > 100.00
It becomes critical when performing multiple JOIN in a single query, especially when performing both INNER and OUTER joins.

In the 2nd query your nested query takes the actor.id from its root query and only counts the results from that. In the 1st query your nested query counts results from all actors instead of only the specified one.

Related

SQL refusing to do a join even when every identifier is valid? (ORA-00904)

Made this account just to ask about this question after being unable to find/expending the local resources I have, so I come to you all.
I'm trying to join two tables - ORDERS and CUSTOMER - as per a question on my assignment
For every order, list the order number and order date along with the customer number, last name, and first name of the customer who placed the order.
So I'm looking for the order number, date, customer number, and the full name of customers.
The code goes as such
SELECT ORDERS.ORDR_ORDER_NUMBER, ORDERS.ORDR_ORDER_DATE, ORDERS.ORDR_CUSTOMER_NUMBER, CUSTOMER.CUST_LAST, CUSTOMER.CUST_FIRST
FROM ORDERS, CUSTOMER
WHERE ORDERS.ORDR_CUSTOMER_NUMBER = CUSTOMER.CUST_CUSTOMER_NUMBER;
I've done this code without the table identifiers, putting quotation marks around ORDERS.ORDR_CUSTOMER_NUMBER, aliases for the two tables, and even putting a space after ORDR_ in both SELECT & WHERE for laughs and nothing's working. All of them keep coming up with the error in the title (ORA-00904), saying [ORDERS.]ORDR_CUSTOMER_NUMBER is the invalid identifier even though it shouldn't be.
Here also are the tables I'm working with, in case that context is needed for help.
Anyway, the query that produces the result you want should take the form:
select
o.ordr_order_number,
o.ordr_order_date,
c.cust_customer_number,
c.cust_last,
c.cust_first
from orders o
join customer c on c.cust_customer_number = o.ordr_customer_number
As you see the query becomes a lot easier to read and write if you use modern join syntax, and if you use table aliases (o and c).
You have to add JOIN or INNER JOIN to your query. Because the data comes from two different tables the WHERE clause will not select both.
FROM Orders INNER JOIN Customers ON Orders.order_Customer_Number = Customer.Cust_Customer_Number

MySQL: Creating query that sums column while gathering info from other tables

List the Rider's Name, RaceLevel as
Race_Level and the total number of all points based on their placement. Make sure that you don't
list any riders who have not raced in any races yet (not placed yet). Sort the data from highest to
lowest total points
Here is what I have tried so far.
SELECT RIDERS.FIRST_NAME, RIDERS.LAST_NAME, RACES.RACE_LEVEL, PARTICIPATION.PLACEMENT
FROM RIDERS, RACES, PARTICIPATION
WHERE RIDERS.RIDER_ID = PARTICIPATION.RIDER_ID AND RACES.RACE_LEVEL = 'EASY' AND PARTICIPATION.PLACEMENT > 0;
I've tried adding SUM(PARTICIPATION.PLACEMENT) but it removes all results and leaves me with one line. I need to figure out how to sum placement for each individual.
I've tried adding SUM(PARTICIPATION.PLACEMENT) but it removes all results and leaves me with one line.
You need a group by clause to make this a valid aggregation query:
select ri.first_name, ri.last_name, ra.race_level, sum(pa.placement)
from riders ri
inner join participation pa on ri.rider_id = pa.rider_id
inner join races ra on ??
where ra.race_level = 'easy' and pa.placement > 0
group by ri.rider_id, ri.first_name, ri.last_name, ra.race_level
Important notes:
Use standard joins! Implicit joins (with commas in the from clause) are legacy syntax, that should not be used in new code. I modified your code, and now it is plain to see that you are missing a join condition on the race table; you did not provide enough information for me to make an educated guess, so I represented that as ?? in the query
table aliases make the query easier to write and read
adding the primary key of the riders table in the group by clause ensures that the query won't wrongly group together different riders that would have the same name

Return query results where two fields are different (Access 2010)

I'm working in a large access database (Access 2010) and am trying to return records where two locations are different.
In my case, I have a large number of birds that have been observed on multiple dates and potentially on different islands. Each bird has a unique BirdID and also an actual physical identifier (unfortunately that may have changed over time). [I'm going to try addressing the changing physical identifier issue later]. I currently want to query individual birds where one or more of their observations is different than the "IslandAlpha" (the first island where they were observed). Something along the lines of a criteria for BirdID: WHERE IslandID [doesn't equal] IslandAlpha.
I then need a separate query to find where all observations DO equal where they were first observed. So where IslandID = IslandAlpha
I'm new to Access, so let me know if you need more information on how my tables/relationships are set up! Thanks in advance.
Assuming the following tables:
Birds table in which all individual birds have records with a unique BirdID and IslandAlpha.
Sightings table in which individual sightings are recorded, including IslandID.
Your first query would look something like this:
SELECT *
FROM Birds
INNER JOIN Sightings ON Birds.BirdID=Sightings.BirdID
WHERE Sightings.IslandID <> Birds.IslandAlpha
You second query would be the same but with = instead of <> in the WHERE clause.
Please provide us information about the tables and columns you are using.
I will presume you are asking this question because a simple join of tables and filtering where IslandAlpha <> ObsLoc is not possible because IslandAlpha is derived from first observation record for each bird. Pulling first observation record for each bird requires a nested query. Need a unique record identifier in Observations - autonumber should serve. Assuming there is an observation date/time field, consider:
SELECT * FROM Observations WHERE ObsID IN
(SELECT TOP 1 ObsID FROM Observations AS Dupe
WHERE Dupe.ObsBirdID = Observations.ObsBirdID ORDER BY Dupe.ObsDateTime);
Now use that query for subsequent queries.
SELECT * FROM Observations
INNER JOIN Query1 ON Observations.ObsBirdID = Query1.ObsBirdID
WHERE Observations.ObsLocID <> Query1.ObsLocID;

SQL Multiple Joins - How do they work exactly?

I'm pretty sure this works universally across various SQL implementations. Suppose I have many-to-many relationship between 2 tables:
Customer: id, name
has many:
Order: id, description, total_price
and this relationship is in a junction table:
Customer_Order: order_date, customer_id, order_id
Now I want to write SQL query to join all of these together, mentioning the customer's name, the order's description and total price and the order date:
SELECT name, description, total_price FROM Customer
JOIN Customer_Order ON Customer_Order.customer_id = Customer.id
JOIN Order = Order.id = Customer_Order.order_id
This is all well and good. This query will also work if we change the order so it's FROM Customer_Order JOIN Customer or put the Order table first. Why is this the case? Somewhere I've read that JOIN works like an arithmetic operator (+, * etc.) taking 2 operands and you can chain operator together so you can have: 2+3+5, for example. Following this logic, first we have to calculate 2+3 and then take that result and add 5 to it. Is it the same with JOINs?
Is it that behind the hood, the first JOIN must first be completed in order for the second JOIN to take place? So basically, the first JOIN will create a table out of the 2 operands left and right of it. Then, the second JOIN will take that resulting table as its left operand and perform the usual joining. Basically, I want to understand how multiple JOINs work behind the hood.
In many ways I think ORMs are the bane of modern programming. Unleashing a barrage of underprepared coders. Oh well diatribe out of the way, You're asking a question about set theory. THere are potentially other options that center on relational algebra but SQL is fundamentally set theory based. here are a couple of links to get you started
Using set theory to understand SQL
A visual explanation of SQL

How do I remove "duplicate" rows from a view?

I have a view which was working fine when I was joining my main table:
LEFT OUTER JOIN OFFICE ON CLIENT.CASE_OFFICE = OFFICE.TABLE_CODE.
However I needed to add the following join:
LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
Although I added DISTINCT, I still get a "duplicate" row. I say "duplicate" because the second row has a different value.
However, if I change the LEFT OUTER to an INNER JOIN, I lose all the rows for the clients who have these "duplicate" rows.
What am I doing wrong? How can I remove these "duplicate" rows from my view?
Note:
This question is not applicable in this instance:
How can I remove duplicate rows?
DISTINCT won't help you if the rows have any columns that are different. Obviously, one of the tables you are joining to has multiple rows for a single row in another table. To get one row back, you have to eliminate the other multiple rows in the table you are joining to.
The easiest way to do this is to enhance your where clause or JOIN restriction to only join to the single record you would like. Usually this requires determining a rule which will always select the 'correct' entry from the other table.
Let us assume you have a simple problem such as this:
Person: Jane
Pets: Cat, Dog
If you create a simple join here, you would receive two records for Jane:
Jane|Cat
Jane|Dog
This is completely correct if the point of your view is to list all of the combinations of people and pets. However, if your view was instead supposed to list people with pets, or list people and display one of their pets, you hit the problem you have now. For this, you need a rule.
SELECT Person.Name, Pets.Name
FROM Person
LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
WHERE 0 = (SELECT COUNT(pets2.ID)
FROM Pets pets2
WHERE pets2.PersonID = pets1.PersonID
AND pets2.ID < pets1.ID);
What this does is apply a rule to restrict the Pets record in the join to to the Pet with the lowest ID (first in the Pets table). The WHERE clause essentially says "where there are no pets belonging to the same person with a lower ID value).
This would yield a one record result:
Jane|Cat
The rule you'll need to apply to your view will depend on the data in the columns you have, and which of the 'multiple' records should be displayed in the column. However, that will wind up hiding some data, which may not be what you want. For example, the above rule hides the fact that Jane has a Dog. It makes it appear as if Jane only has a Cat, when this is not correct.
You may need to rethink the contents of your view, and what you are trying to accomplish with your view, if you are starting to filter out valid data.
So you added a left outer join that is matching two rows? OFFICE_MIS.TABLE_CODE is not unique in that table I presume? you need to restrict that join to only grab one row. It depends on which row you are looking for, but you can do something like this...
LEFT OUTER JOIN OFFICE_MIS ON
OFFICE_MIS.ID = /* whatever the primary key is? */
(select top 1 om2.ID
from OFFICE_MIS om2
where CLIENT.REFERRAL_OFFICE = om2.TABLE_CODE
order by om2.ID /* change the order to fit your needs */)
If the secondd row has one different value than it is not really duplicate and should be included.
Instead of using DISTINCT, you could use a GROUP BY.
Group by all the fields that you want to be returned as unique values.
Use MIN/MAX/AVG or any other function to give you one result for fields that could return multiple values.
Example:
SELECT Office.Field1, Client.Field1, MIN(Office.Field1), MIN(Client.Field2)
FROM YourQuery
GROUP BY Office.Field1, Client.Field1
You could try using Distinct Top 1 but as Hunter pointed out, if there is if even one column is different then it should either be included or if you don't care about or need the column you should probably remove it. Any other suggestions would probably require more specific info.
EDIT: When using Distinct Top 1 you need to have an appropriate group by statement. You would really be using the Top 1 part. The Distinct is in there because if there is a tie for Top 1 you'll get an error without having some way to avoid a tie. The two most common ways I've seen are adding Distinct to Top 1 or you could add a column to the query that is unique so that sql would have a way to choose which record to pick in what would otherwise be a tie.