Does EXISTS(subquery) really return TRUE? - sql

As far as I know, EXISTS(subQuery) returns true when subQuery is nonempty. However, I am not sure if TRUE or FALSE values are the only values EXISTS returns. Consider the following piece of code:
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (SELECT ProductName FROM Products WHERE Products.SupplierID = Suppliers.supplierID AND Price < 20);
The query above returns 24 records.
Also, consider the following:
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (SELECT * FROM Products);
(Note that the second query is essentially the same as:
SELECT SupplierName
FROM Suppliers
)
The second query returns 29 records. If EXISTS only returns TRUE when the subquery is nonempty, then there shouldn't be a difference between the two queries. What's going on?
(You are welcome to try on your own here: https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_exists)

In your first query, which I would write as:
SELECT SupplierName
FROM Suppliers s
WHERE EXISTS (
SELECT 1
FROM Products p
WHERE p.SupplierID = s.supplierID AND Price < 20
);
the EXISTS clause only returns true if, for a given supplier record, we can find a matching product record with a price less than 20. This may not be true for every supplier record in the outer part of the query. On the other hand, in your second query:
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (SELECT * FROM Products);
the EXISTS clause will always return true assuming that the Products table has even one record in it. So your second query is logically the same as WHERE TRUE assuming Products is not empty.
The key point to understand here is that the exists clause will logically be evaluated for every record in the Suppliers table. It is not simply evaluated once, though in the case of the second query that could be done this way with the same result.
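A minimal sketch of that per-row evaluation, using Python's built-in sqlite3 module. The three suppliers and their prices are made-up sample rows for illustration, not the W3Schools data:

```python
import sqlite3

# In-memory database with hypothetical sample rows (not the W3Schools data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, SupplierName TEXT);
    CREATE TABLE Products  (ProductName TEXT, SupplierID INTEGER, Price REAL);
    INSERT INTO Suppliers VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Crane');
    INSERT INTO Products  VALUES ('Tea', 1, 10), ('Jam', 2, 50),
                                 ('Oil', 2, 15), ('Nut', 3, 30);
""")

# Correlated subquery: the condition is checked separately for each supplier.
correlated = [row[0] for row in conn.execute("""
    SELECT s.SupplierName
    FROM Suppliers s
    WHERE EXISTS (SELECT 1
                  FROM Products p
                  WHERE p.SupplierID = s.SupplierID AND p.Price < 20)
    ORDER BY s.SupplierName
""")]

# Uncorrelated subquery: true for every supplier while Products is nonempty.
uncorrelated = [row[0] for row in conn.execute("""
    SELECT SupplierName
    FROM Suppliers
    WHERE EXISTS (SELECT * FROM Products)
    ORDER BY SupplierName
""")]

print(correlated)    # suppliers with at least one product priced under 20
print(uncorrelated)  # every supplier
```

With this sample data, Crane is dropped by the first query because its only product costs 30, while the second query keeps all three suppliers.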

The difference between the two queries is that the first query is checking for a specific condition (price < 20) within the subquery, while the second query is only checking for the existence of any records within the subquery. So in the first query, only suppliers with products that have a price less than 20 will be returned, while in the second query, all suppliers will be returned because there are products in the Products table. The EXISTS keyword is used to check if there are any records in the subquery that meet the specified condition, and if there are, it returns true and the main query returns the specified columns.
Here is an example of using the EXISTS operator in SQL:
-- Example 1: Find all customers who have placed an order
SELECT * FROM Customers
WHERE EXISTS (SELECT * FROM Orders WHERE Orders.CustomerID = Customers.CustomerID);
-- Example 2: Find all products that have never been ordered
SELECT * FROM Products
WHERE NOT EXISTS (SELECT * FROM Orders WHERE Orders.ProductID = Products.ProductID);
In the second example, the outer query selects all rows from the Products table where no corresponding row exists in the Orders table that has a matching ProductID.
Keep in mind that the subquery in the EXISTS operator can be any SELECT statement that returns a result set. The only requirement is that the subquery returns at least one row for the EXISTS to evaluate to true.
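As a rough check of the original question (whether EXISTS yields anything other than true or false), the sketch below runs NOT EXISTS and a bare EXISTS expression in SQLite through Python's sqlite3 module. The tables and rows are hypothetical, and SQLite reports booleans as the integers 1 and 0; other databases may display them differently:

```python
import sqlite3

# Hypothetical sample data: product 2 ('Jam') has never been ordered.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY, ProductID INTEGER);
    INSERT INTO Products VALUES (1, 'Tea'), (2, 'Jam'), (3, 'Oil');
    INSERT INTO Orders   VALUES (100, 1), (101, 1), (102, 3);
""")

# NOT EXISTS keeps only the products with no matching order row.
never_ordered = [row[0] for row in conn.execute("""
    SELECT ProductName
    FROM Products
    WHERE NOT EXISTS (SELECT * FROM Orders
                      WHERE Orders.ProductID = Products.ProductID)
""")]

# EXISTS used as a plain expression: 1 for a nonempty subquery, 0 for an empty one.
nonempty = conn.execute("SELECT EXISTS (SELECT 1 FROM Orders)").fetchone()[0]
empty = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM Orders WHERE OrderID > 999)"
).fetchone()[0]

print(never_ordered, nonempty, empty)
```

The subquery either has rows or it does not, so EXISTS has only two outcomes; it does not produce NULL the way a comparison against a NULL value can.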

Related

Performing a query on a foreign table but returning only the original table?

Let's say I have a Item and Person table, where the Person 'id' column matches the Item 'person_id' column. I want to retrieve records in the Item table where the owner (Person) of the Item has 'category' 1. Sample structure below:
Item
id
item_name
person_id
Person
id
name
category (can be 1, 2, or 3)
I understand that I can use 'join' to find rows of the Item table where the two ids match, but I cannot use join for my use case. I need my SQL query to return only Item columns, and not Person columns. Basically, how can I construct a query that will query a table using values in another table, while still maintaining the original table structure?
"I understand that I can use 'join' to find rows of the Item table where the two ids match, but I cannot use join for my use case. I need my SQL query to return only Item columns, and not Person columns."
This statement is just false. You can select whatever columns you want when joining:
select i.*
from Item i
join Person p
on i.person_id = p.id and p.category = 1;
In terms of constructing the query, exists is also a very reasonable approach, but you can definitely use join.
One way is to use EXISTS with a correlated subquery:
SELECT *
FROM Item i
WHERE EXISTS (SELECT 1
FROM Person p
WHERE p.id = i.person_id AND p.category = 1)
You could use an IN clause to filter on the Person rows with category 1:
SELECT * FROM Item WHERE person_id IN (
SELECT Id FROM Person WHERE category=1)
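To compare the three approaches side by side, here is a small sketch using Python's sqlite3 module. The sample people and items are invented for illustration; the table and column names follow the structure given in the question:

```python
import sqlite3

# Hypothetical sample rows: Ann is category 1, Bob is category 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, category INTEGER);
    CREATE TABLE Item   (id INTEGER PRIMARY KEY, item_name TEXT, person_id INTEGER);
    INSERT INTO Person VALUES (1, 'Ann', 1), (2, 'Bob', 2);
    INSERT INTO Item   VALUES (10, 'pen', 1), (11, 'cup', 2), (12, 'hat', 1);
""")

queries = {
    "join": """SELECT i.* FROM Item i
               JOIN Person p ON i.person_id = p.id AND p.category = 1
               ORDER BY i.id""",
    "exists": """SELECT * FROM Item i
                 WHERE EXISTS (SELECT 1 FROM Person p
                               WHERE p.id = i.person_id AND p.category = 1)
                 ORDER BY i.id""",
    "in": """SELECT * FROM Item
             WHERE person_id IN (SELECT id FROM Person WHERE category = 1)
             ORDER BY id""",
}

# Run each variant and collect its rows.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

# Column names returned by the join variant: only Item columns, thanks to i.*
join_columns = [d[0] for d in conn.execute(queries["join"]).description]

print(results)
print(join_columns)
```

All three variants return the same Item rows here, and none of them exposes a Person column.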

Why do repeated values appear in SQL results

I have a question about joins. For example, using the example database dvdrental, this query:
SELECT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
Some records appear repeated; for example, "342 Harold Martino" appears 3 times:
342 Harold Martino
342 Harold Martino
342 Harold Martino
Do you know why repeated records appear, as in this example where the same record appears 3 times? Does this repetition mean that there are 3 records in the payment table where customer_id = 342? But the query "select * from payment where customer_id = 342" returns 32 records, so I don't understand how the join works.
There are many resources on this, so in short, your query says this in plain English:
Get all the records from the customer table.
Then, for each of those records, get every payment record that has the same value in the customer_id field.
Return one row for each matching payment record, repeating the customer's fields on each of those rows.
Finally, only return 3 columns:
the customer_id column from the customer table
the first_name column, which exists in one of the customer or payment tables
the last_name column, which exists in one of the customer or payment tables
Note that we didn't bring back any columns from the payment table... (I assume first_name and last_name are fields in the customer table...)
Keep in mind that a CROSS JOIN pairs every row on the left with every row on the right, so the number of rows returned is the number of rows in the current table multiplied by the number of rows in the joined table. (A FULL OUTER JOIN is different: it applies the ON condition and additionally keeps the unmatched rows from both sides.)
In your query, the INNER JOIN produces a recordset that can include all the fields from the current table as well as the fields from the joined table.
The INNER keyword specifies that only matching pairs of rows are returned, so in this case only payment records that match a customer_id in the (currently unfiltered) customer table will be returned, and customers without any payment are left out.
Because each payment matches at most one customer, the number of resulting rows is the number of rows in the joined table that have a match in the current table.
If instead you want to query:
Select all the customers that have payments
then you can use a join, but you should also use a DISTINCT clause, to only return the unique records:
SELECT DISTINCT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
An alternative way to do this is to use a sub-query instead of a join:
SELECT customer_id
, first_name
, last_name
FROM customer
WHERE EXISTS (SELECT customer_id FROM payment WHERE payment.customer_id = customer.customer_id)
The rules on when to use one style of query over the other are pretty convoluted and very dependent on the number of rows in each table, the types of indexes that are available and even the type or version of the RDBMS you are operating within.
To optimise, run both queries, compare the results and timing and use the one that fits your database better. If later performance becomes an issue, try the other one :)
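The row multiplication and the two ways of avoiding it can be reproduced on a tiny stand-in for dvdrental. This is only a sketch using Python's sqlite3 module; the three customers and four payments are invented, and the real database has many more columns:

```python
import sqlite3

# Hypothetical miniature of the customer/payment tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                           first_name TEXT, last_name TEXT);
    CREATE TABLE payment  (payment_id INTEGER PRIMARY KEY,
                           customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO payment  VALUES (1, 1, 5.0), (2, 1, 7.5), (3, 1, 2.0), (4, 2, 9.0);
""")

def ids(sql):
    """Run a query and return the first column of every row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# Plain join: one output row per matching payment, so customer 1 appears 3 times.
joined = ids("""
    SELECT customer.customer_id
    FROM customer
    INNER JOIN payment ON customer.customer_id = payment.customer_id
""")

# DISTINCT collapses the repeated rows after the join.
distinct = ids("""
    SELECT DISTINCT customer.customer_id
    FROM customer
    INNER JOIN payment ON customer.customer_id = payment.customer_id
""")

# EXISTS never multiplies rows: each customer is either kept or dropped.
exists = ids("""
    SELECT customer_id
    FROM customer
    WHERE EXISTS (SELECT 1 FROM payment
                  WHERE payment.customer_id = customer.customer_id)
""")

print(joined, distinct, exists)
```

Customer 3 has no payments and therefore appears in none of the three results.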

Why is the AVG not returning anything in my Where clause in SQL?

The question asks:
"List the customer name and phone number for customers whose credit limit is less than the average credit limit. (Customers table)"
So far I have
SELECT customerName, phone
FROM Customers
WHERE creditLimit < (SELECT AVG (creditLimit)
FROM Products);
Yet this code is not returning anything, what am I doing wrong? Thank you.
I would be surprised if Products has a credit limit. If you qualify your column names, you will immediately see your error:
SELECT c.customerName, c.phone
FROM Customers c
WHERE c.creditLimit < (SELECT AVG(p.creditLimit)
FROM Products p
);
In other words, you should be using Customers in the subquery. What is happening is that the query is interpreted like this:
SELECT c.customerName, c.phone
FROM Customers c
WHERE c.creditLimit < (SELECT AVG(c.creditLimit)
FROM Products p
);
That is, the subquery is using the creditLimit from the row being compared, rather than from the table referenced in the subquery (because there is no such column). A value is never less than itself, so this never evaluates to true.
A more important lesson to learn: always qualify your column names, so you don't ever have a problem like this again.
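For reference, a sketch of the corrected query with every column qualified and the subquery reading from Customers, run through Python's sqlite3 module. The three customers, their phone numbers and credit limits are made up, and the alias c2 is just an illustrative name:

```python
import sqlite3

# Hypothetical sample data: the average credit limit is (100 + 200 + 600) / 3 = 300.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customerName TEXT, phone TEXT, creditLimit INTEGER);
    INSERT INTO Customers VALUES ('Ann', '555-0100', 100),
                                 ('Bob', '555-0101', 200),
                                 ('Cy',  '555-0102', 600);
""")

# The subquery averages over its own copy of Customers (alias c2),
# so the outer row's limit is compared against the overall average.
below_average = conn.execute("""
    SELECT c.customerName, c.phone
    FROM Customers c
    WHERE c.creditLimit < (SELECT AVG(c2.creditLimit)
                           FROM Customers c2)
    ORDER BY c.customerName
""").fetchall()

print(below_average)
```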

Is GROUP BY needed in the following correlated subquery?

Given scenario:
table fd
(cust_id, fd_id) primary-key and amount
table loan
(cust_id, l_id) primary-key and amount
I want to list all customers who have a fixed deposit with an amount less than the sum of all their loans.
Query:
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id);
OR should we use
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id group by cust_id);
A customer can have multiple loans but one FD is considered at a time.
GROUP BY can be omitted in this case, because the SELECT list contains only an aggregate function and all rows are guaranteed to belong to the same cust_id group (by the WHERE clause).
The aggregation will be over all rows with matching cust_id in both cases. So both queries are correct.
This would be a cleaner way to implement the same thing:
SELECT fd.cust_id
FROM fd
JOIN loan USING (cust_id)
GROUP BY fd.cust_id, fd.amount
HAVING fd.amount < sum(loan.amount)
There is one difference: rows with identical (cust_id, amount) in fd only appear once in the result of my query, while they would appear multiple times in the original.
Either way, if there is no matching row with a non-null amount in table loan, you get no rows at all. I assume you are aware of that.
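The three variants can be compared on a small data set. This is a sketch using Python's sqlite3 module with invented rows: customer 1 has two deposits of the same amount, customer 2 has a deposit larger than its loans, and customer 3 has no loans at all:

```python
import sqlite3

# Hypothetical sample data for the fd and loan tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fd   (cust_id INTEGER, fd_id INTEGER, amount INTEGER,
                       PRIMARY KEY (cust_id, fd_id));
    CREATE TABLE loan (cust_id INTEGER, l_id INTEGER, amount INTEGER,
                       PRIMARY KEY (cust_id, l_id));
    INSERT INTO fd   VALUES (1, 1, 100), (1, 2, 100), (2, 1, 500), (3, 1, 50);
    INSERT INTO loan VALUES (1, 1, 80), (1, 2, 70), (2, 1, 300);
""")

def cust_ids(sql):
    """Run a query and return the cust_id of every result row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# Correlated subquery without GROUP BY.
without_group_by = cust_ids("""
    SELECT cust_id FROM fd
    WHERE amount < (SELECT sum(amount) FROM loan
                    WHERE fd.cust_id = loan.cust_id)
""")

# Same subquery with GROUP BY: still at most one group per outer row.
with_group_by = cust_ids("""
    SELECT cust_id FROM fd
    WHERE amount < (SELECT sum(amount) FROM loan
                    WHERE fd.cust_id = loan.cust_id
                    GROUP BY cust_id)
""")

# Join version: identical (cust_id, amount) pairs collapse into one group.
join_version = cust_ids("""
    SELECT fd.cust_id
    FROM fd
    JOIN loan USING (cust_id)
    GROUP BY fd.cust_id, fd.amount
    HAVING fd.amount < sum(loan.amount)
""")

print(without_group_by, with_group_by, join_version)
```

One caveat visible in this data: because both deposits of customer 1 land in the same group of the join version, each loan is counted once per deposit, so sum(loan.amount) is 300 in that group rather than 150. The comparison still holds here, but with duplicate (cust_id, amount) pairs the join version can pass where the subquery version would not.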
There is no need for GROUP BY since you have filtered the data by cust_id. In either case the inner query will return the same result.
No, it isn't, because you calculate sum(amount) for the customer with id = fd.cust_id, that is, for a single customer.
However, if your subquery somehow calculated the sum for more than one customer, the GROUP BY would cause the subquery to generate more than one row, which would make the comparison (<) fail, and with it the query.
A query with an aggregate like sum but without a group by will output one group. The aggregates will be computed over all matching rows.
A subquery in a condition clause is only allowed to return one row. If the subquery returned multiple rows, what would the following expression mean?
where 1 > (... subquery ...)
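So the group by should be omitted. Your second query still works only because the WHERE clause restricts the subquery to a single cust_id, so at most one group comes back; a GROUP BY that produced several groups would raise an error in most databases.
N.B. When you specify all, any, or some, a subquery can return multiple rows: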
where 1 > ALL (... subquery ...)
But it's easy to see why that doesn't make sense in your case; you'd compare one customer's data to that of another.

Some SQL Questions

I have been using SQL for years, but have mostly been using the query designer within SQL Studio (etc.) to put together my queries. I've recently found some time to actually "learn" what everything is doing and have set myself the following fairly simple tasks. Before I begin, I'd like to ask the SOF community their thoughts on the questions, possible answers and any tips they may have.
The questions are;
Find all records w/ a duplicate in a particular column (e.g. a linking id is in more than 1 record throughout table)
SUM price from a linked table within the same query (select within a select?)
Explain the difference between the 4 joins; LEFT, RIGHT, OUTER, INNER
Copy data from one table to another based on SELECT and WHERE criteria
Input welcomed & appreciated.
Chris
I recommend that you start by following some tutorials on this topic. Your questions are not uncommon questions for someone moving from a beginner to intermediate level in SQL. SQLZoo is an excellent resource for learning SQL so consider following that.
In response to your questions:
1) Find all records with a duplicate in a particular column
There are two steps here: find duplicate records and select those records. To find the duplicate records you should be doing something along the lines of:
select possible_duplicate_field, count(*)
from table
group by possible_duplicate_field
having count(*) > 1
What we're doing here is selecting everything from a table, then grouping it by the field we want to check for duplicates. The count function then gives me a count of the number of items within that group. The HAVING clause indicates that we want to filter AFTER the grouping to only show the groups which have more than one entry.
This is all fine in itself but it doesn't give you the actual records that have those values on them. If you knew the duplicate values then you'd write this:
select * from table where possible_duplicate_field = 'known_duplicate_value'
We can use the SELECT within a select to get a list of the matches:
select *
from table
where possible_duplicate_field in (
select possible_duplicate_field
from table
group by possible_duplicate_field
having count(*) > 1
)
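A runnable sketch of both steps, using Python's sqlite3 module. The table name links and its columns are hypothetical stand-ins for "table" and "possible_duplicate_field" (table itself is a reserved word):

```python
import sqlite3

# Hypothetical table: link_id 'a' and 'b' each occur twice, 'c' once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, link_id TEXT);
    INSERT INTO links VALUES (1, 'a'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'b');
""")

# Step 1: which values occur more than once, and how often?
duplicate_values = conn.execute("""
    SELECT link_id, count(*)
    FROM links
    GROUP BY link_id
    HAVING count(*) > 1
    ORDER BY link_id
""").fetchall()

# Step 2: fetch the full records that carry one of those values.
duplicate_rows = conn.execute("""
    SELECT *
    FROM links
    WHERE link_id IN (SELECT link_id
                      FROM links
                      GROUP BY link_id
                      HAVING count(*) > 1)
    ORDER BY id
""").fetchall()

print(duplicate_values)
print(duplicate_rows)
```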
2) SUM price from a linked table within the same query
This is a simple JOIN between two tables with a SUM of the two:
select sum(tableA.X + tableB.Y)
from tableA
join tableB on tableA.keyA = tableB.keyB
What you're doing here is joining two tables together where those two tables are linked by a key field. In this case, this is an inner join, which operates as you would expect (i.e. get me every row from the left table which has a matching record in the right table).
3) Explain the difference between the 4 joins; LEFT, RIGHT, OUTER, INNER
Consider two tables A and B. The concept of "LEFT" and "RIGHT" in this case are slightly clearer if you read your SQL from left to right. So, when I say:
select x from A join B ...
The left table is "A" and the right table is "B". LEFT and RIGHT only matter for outer joins, where they declare which of the two tables is the primary table, that is, the table whose rows are all kept even when no match exists. Incidentally, if you omit LEFT or RIGHT and write a plain JOIN, SQL treats it as an INNER JOIN.
For INNER and OUTER you are declaring what to do when matches don't exist in one of the tables. INNER declares that you want only the rows where there is a matching record in both tables. Hence, if table A contains keys "X", "Y" and "Z", and table B contains keys "X" and "Z", then an INNER JOIN will only return "X" and "Z" records from the two tables.
When an OUTER join is used (LEFT OUTER JOIN or RIGHT OUTER JOIN, where the word OUTER is optional), we're saying: Give me everything from the primary table and anything that matches from the secondary table. Hence, in the previous example, A LEFT JOIN B would give "X", "Y" and "Z" records in the output record set. However, there would be NULLs in the fields which should have come from the secondary table for key value "Y" as it doesn't exist in the secondary table.
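The X/Y/Z example can be tried directly. The sketch below uses Python's sqlite3 module; the tables a and b and their value columns are hypothetical, and only INNER JOIN and LEFT JOIN are shown because older SQLite versions do not support RIGHT or FULL OUTER JOIN:

```python
import sqlite3

# Table a holds keys X, Y, Z; table b holds only X and Z.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k TEXT PRIMARY KEY, a_val TEXT);
    CREATE TABLE b (k TEXT PRIMARY KEY, b_val TEXT);
    INSERT INTO a VALUES ('X', 'ax'), ('Y', 'ay'), ('Z', 'az');
    INSERT INTO b VALUES ('X', 'bx'), ('Z', 'bz');
""")

# INNER JOIN: only keys present in both tables survive.
inner_rows = conn.execute("""
    SELECT a.k, b.b_val
    FROM a INNER JOIN b ON a.k = b.k
    ORDER BY a.k
""").fetchall()

# LEFT (OUTER) JOIN: every row of a is kept; missing b values come back as NULL.
left_rows = conn.execute("""
    SELECT a.k, b.b_val
    FROM a LEFT JOIN b ON a.k = b.k
    ORDER BY a.k
""").fetchall()

print(inner_rows)
print(left_rows)  # the NULL for key Y shows up as None in Python
```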
4) Copy data from one table to another based on SELECT and WHERE criteria
This is pretty trivial and I'm surprised you've never encountered it. It's a simple nested SELECT in an INSERT statement (this may not be supported by your database - if not, try the next option):
insert into new_table select * from old_table where x = y
This assumes the tables have the same structure. If you have different structures then you'll need to specify the columns:
insert into new_table (list, of, fields)
select list, of, fields from old_table where x = y
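Both forms of INSERT ... SELECT can be tried with Python's sqlite3 module. In this sketch the columns of old_table and new_table, the extra table name_archive, and the status filter are all invented for illustration:

```python
import sqlite3

# Hypothetical tables: new_table mirrors old_table, name_archive does not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table    (id INTEGER, name TEXT, status TEXT);
    CREATE TABLE new_table    (id INTEGER, name TEXT, status TEXT);
    CREATE TABLE name_archive (label TEXT, source_id INTEGER);
    INSERT INTO old_table VALUES (1, 'a', 'done'), (2, 'b', 'open'), (3, 'c', 'done');
""")

# Same structure: copy whole rows that satisfy the WHERE criteria.
conn.execute("INSERT INTO new_table SELECT * FROM old_table WHERE status = 'done'")

# Different structure: name the target columns and select matching expressions.
conn.execute("""
    INSERT INTO name_archive (label, source_id)
    SELECT name, id FROM old_table WHERE status = 'done'
""")

same_structure = conn.execute("SELECT * FROM new_table ORDER BY id").fetchall()
different_structure = conn.execute(
    "SELECT * FROM name_archive ORDER BY source_id"
).fetchall()

print(same_structure)
print(different_structure)
```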
Let's say you have 2 tables named :
[OrderLine] with the columns [Id, OrderId, ProductId, Qty, Status]
[Product] with [Id, Name, Price]
1) all order lines of orders having more than 1 line (technically the same as looking for duplicates on OrderId):
select OrderId, count(*)
from OrderLine
group by OrderId
having count(*) > 1
2) total price of all order lines of order 1000:
select sum(p.Price * ol.Qty) as Price
from OrderLine ol
inner join Product p on ol.ProductId = p.Id
where ol.OrderId = 1000
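These first two queries can be checked on a few invented rows. The sketch uses Python's sqlite3 module; the products, prices and quantities are assumptions chosen so that order 1000 has two lines worth 3 * 2 + 5 * 1 = 11:

```python
import sqlite3

# Hypothetical sample data for the OrderLine and Product tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product   (Id INTEGER PRIMARY KEY, Name TEXT, Price INTEGER);
    CREATE TABLE OrderLine (Id INTEGER PRIMARY KEY, OrderId INTEGER,
                            ProductId INTEGER, Qty INTEGER, Status TEXT);
    INSERT INTO Product   VALUES (1, 'pen', 3), (2, 'cup', 5);
    INSERT INTO OrderLine VALUES (1, 1000, 1, 2, 'Open'),
                                 (2, 1000, 2, 1, 'Open'),
                                 (3, 1001, 1, 4, 'Closed');
""")

# Orders that have more than one line.
multi_line_orders = conn.execute("""
    SELECT OrderId, count(*)
    FROM OrderLine
    GROUP BY OrderId
    HAVING count(*) > 1
""").fetchall()

# Total price of order 1000: price times quantity, summed over its lines.
total_price = conn.execute("""
    SELECT sum(p.Price * ol.Qty) AS Price
    FROM OrderLine ol
    INNER JOIN Product p ON ol.ProductId = p.Id
    WHERE ol.OrderId = 1000
""").fetchone()[0]

print(multi_line_orders, total_price)
```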
3) difference between joins:
a inner join b => take every a that has a match in b; if no b is found, a is not returned
a left join b => take all a, match them with b, include a even if b is not found
a right join b => b left join a
a full outer join b => (a left join b) union (a right join b)
4) copy order lines to a history table :
insert into OrderLinesHistory
(CopiedOn, OrderLineId, OrderId, ProductId, Qty)
select
getDate(), Id, OrderId, ProductId, Qty
from
OrderLine
where
status = 'Closed'
To answer #4 and to perhaps show at least some understanding of SQL and the fact this isn't HW, just me trying to learn best practise;
SET NOCOUNT ON;
DECLARE @rc int
if @what = 1
BEGIN
select id from color_mapper where product = @productid and color = @colorid;
select @rc = @@rowcount
if @rc = 0
BEGIN
exec doSavingSPROC @colorid, @productid;
END
END
END