How do I use T-SQL's Exists keyword? - sql

I have a query I want to run as a subquery that will return a set of FK's. With them I want to return only rows that has a matching key.
Subquery:
SELECT ID
FROM tblTenantTransCode
WHERE
tblTenantTransCode.CheckbookCode =
(SELECT ID FROM tblCheckbookCode WHERE Description = 'Rent Income')
That will return all the transaction codes that have a checkbook code that matches Rent Income
Now I want to select All Transactions where their transactioncode matches an ID returned in the subquery. I've gotten this far, but SQL Server complains of a syntax error. How can I do this?
Full Query:
SELECT *
FROM tblTransaction
WHERE
tblTransaction.TransactionCode IN
(SELECT ID FROM tblTenantTransCode
WHERE tblTenantTransCode.CheckbookCode =
(SELECT ID FROM tblCheckbookCode WHERE Description = 'Rent Income'))
Tables:
tblCheckbookCode
ID
Description
Other Info
tblTenantTransCode
ID
CheckbookCode <-- fk we're looking for
in the tblCheckbookCode.
We're selecting only checkbook codes
that have the Description 'Rent Income'
Other Info
tblTransactions
ID
TransactionCode <-- fk to tenant transaction code.
We're looking for an ID that is returned
in the above query/join

To answer your question about using the EXISTS keyword, here is an example query that uses an EXISTS predicate, based on the query as currently given in your question.
SELECT t.*
FROM tblTransaction t
WHERE EXISTS
(
SELECT 1
FROM tblTenantTransCode ttc
JOIN tblCheckbookCode cc
ON (cc.ID = ttc.CheckbookCode AND cc.Description='Rent Income')
WHERE ttc.ID = t.TransactionCode
)
Additional Details:
We all recognize that there are a variety of SQL statements that will return the result set that meets the specified requirements. And there are likely going to be differences in the observed performance of those queries. Performance is particularly dependent on the DBMS, the optimizer mode, the query plan, and the statistics (number of rows and data value distribution).
One advantage of the EXISTS is that it makes clear that we aren't interested returning any expressions from tables in the subquery. It serves to logically separate the subquery from the outer query, in a way that a JOIN does not.
Another advantage of using EXISTS is that avoids returning duplicate rows that would be (might be) returned if we were to instead use a JOIN.
An EXISTS predicate can be used to test for the existence of any related row in a child table, without requiring a join. As an example, the following query returns a set of all orders that have at least one associated line_item:
SELECT o.*
FROM order o
WHERE EXISTS
( SELECT 1
FROM line_item li
WHERE li.order_id = o.id
)
Note that the subquery doesn't need to find ALL matching line items, it only needs to find one row in order to satisfy the condition. (If we were to write this query as a JOIN, then we would return duplicate rows whenever an order had more than one line item.)
A NOT EXISTS predicate is also useful, for example, to return a set of orders that do not have any associated line_items.
SELECT o.*
FROM order o
WHERE NOT EXISTS
( SELECT 1
FROM line_item li
WHERE li.order_id = o.id
)
Of course, NOT EXISTS is just one alternative. An equivalent result set could be obtained using an OUTER join and an IS NULL test (assuming we have at least one expression available from the line_item table that is NOT NULL)
SELECT o.*
FROM order o
LEFT
JOIN line_item li ON (li.order_id = o.id)
WHERE li.id IS NULL
There seems to be a lot of discussion (relating to answers to the original question) about needing to use an IN predicate, or needing to use a JOIN.
Those constructs are alternatives, but aren't necessary. The required result set can be returned by a query without using an IN and without using a JOIN. The result set can be returned with a query that uses an EXISTS predicate. (Note that the title of the OP question did ask about how to use the EXISTS keyword.)
Here is another alternative query (this is not my first choice), but the result set returned does satisfy the specified requirements:
SELECT t.*
FROM tblTransaction t
WHERE EXISTS
(
SELECT 1
FROM tblTenantTransCode ttc
WHERE ttc.ID = t.TransactionCode
AND EXISTS
(
SELECT 1
FROM tblCheckbookCode cc
WHERE cc.ID = ttc.CheckbookCode
AND cc.Description = 'Rent Income'
)
)
Of primary importance, the query should return a correct result set, one that satisfies the specified requirements, given all possible sets of conditions.
Some of the queries presented as answers here do NOT return the requested result set, or if they do, they happen to do so by accident. Some of the queries will work if we pre-assume something about the data, such that some columns are UNIQUE and NOT NULL.
Performance differences
Sometimes a query with an EXISTS predicate will not perform as well as a query with a JOIN or an IN predicate. In some cases, it may perform better. (With the EXISTS predicate, the subquery only has to find one row that satisfies the condition, rather than finding ALL matching rows, as would be required by a JOIN.)
Performance of various query options is best gauged by observation.

You are describing an inner join.
select tc.id
from tblTenantTransCode tc
inner join tblCheckbookCode cc on tc.CheckbookCode = cc.CheckbookCode
EDIT: It's still an inner join. I don't see any reason yet to use the IN clause.
select *
from tblTransaction t
inner join tblTenantTransCode tc on tc.id = t.TransactionCode
inner join tblCheckbookCode cc on cc.id = tc.CheckbookCode
where cc.description = 'Rent Income'
EDIT: If you must use the EXISTS predicate to solve this problem, see #spencer7953's answer. However, from what I'm seeing, the solution above is simpler and there are assumptions of uniqueness based on the fact that "Subquery" works for you (it wouldn't 100% of the time if there wasn't uniqueness in that table). I'm also addressing
Now I want to select All Transactions
where their transactioncode
matches an ID returned in the subquery
in my answer. If the request were something on the lines of this:
Now I want to select All Transcations
when any transactioncode matches an ID returned in the subquery.
I would use EXISTS to see if any transactioncode existed in the child table and return every row or none as appropriate.

Given your full query, this query will get you where you need to go using a single join.
The join filters out any transaction that doesn't have a transaction code of 'Rent Income.' It will take all record from the first table, build out the subset of the second table (that WHERE clause limits the records), and then filters the first table where those table math the join condition.
SELECT
t.*
FROM
tblTransaction t
INNER JOIN tblTenantTransCode c ON
t.TransactionCode = c.ID
INNER JOIN tblCheckbookCode chk ON
c.CheckbookCode = chk.ID
WHERE
chk.Description = 'Rent Income'
Edit: One other note: Avoid using SELECT * -- always specify the columns.
Edit Dos: I missed that there were three tables. Corrected! Thanks, spencer!

You need to use the 'IN' clause:
select id from tblTenantTransCode
where tblTenantTransCode.CheckbookCode in
(select id from tblCheckbookCode
where description = 'rent income')
an inner join would probably be a better solution though...
select ttc.id from tblTenantTransCode as ttc
inner join tblCheckbookCode as tcc
on ttc.CheckBookId = tcc.id
where tcc.description = 'rent income'

Try this:
SELECT
tblTenantTransCode.ID
FROM tblCheckbookCode
INNER JOIN tblTenantTransCode ON tblCheckbookCode.ID=tblTenantTransCode.CheckbookCode
WHERE tblCheckbookCode.Description = 'Rent Income'
Make sure you index tblCheckbookCode.Description.

Related

Where clause applied to only one column in join

I'm having some trouble with writing a certain SQL query. I have a wallet and a balance which I do join. The query now looks like that:
SELECT
`balances`.`id` AS `id`,
FROM
`wallet`
LEFT JOIN `balances` ON
( `wallet`.`currency` = `balances`.`currency` )
WHERE
`balances`.`user_id` = '181'
Because of the where clause, the query returns just matching records. I want to get all records from wallets table and only those from balances which do match where clause... hope I explained it well enough!
Cheers!
use subquery
SELECT w.*,t.*
FROM
wallet w
LEFT JOIN ( select * from balances where user_id = 181
) t ON w.currency =t.currency
Issue is you are applying filter on left join table wallets.
use below query.
SELECT
`balances`.`id` AS `id`,
FROM
`wallet`
LEFT JOIN (select * from `balances` `user_id` = '181') ON
( `wallet`.`currency` = `balances`.`currency` );
The question is not fully clear, but you almost definitely need an extra join clause on some sort of ID. Now there is no way to match a wallet with its balance(s). Assuming that balance have eg. a wallet_id, you'll want something like:
SELECT
`balances`.`id` AS `id`,
FROM
`wallet`
LEFT JOIN `balances` ON
(`wallet`.`id` = `balance`.`wallet_id` )
WHERE
`balances`.`user_id` = '181'
Move the condition to the ON clause. Don't use subqueries!
SELECT w.*, b.id
FROM wallet w LEFT JOIN
balances b
ON w.currency = b.currency AND
b.user_id = 181;
Notes:
The subquery in the FROM can impede the optimizer.
If you are using a LEFT JOIN, you should be selecting columns from the first table.
I am guessing that user_id is a number, so I removed the quotes around the comparison value.
Table aliases make the query easier to write and to read.
Backticks make the query harder to write and harder to read.

SQL Query to count the records

I am making up a SQL query which will get all the transaction types from one table, and from the other table it will count the frequency of that transaction type.
My query is this:
with CTE as
(
select a.trxType,a.created,b.transaction_key,b.description,a.mode
FROM transaction_data AS a with (nolock)
RIGHT JOIN transaction_types b with (nolock) ON b.transaction_key = a.trxType
)
SELECT COUNT (trxType) AS Frequency, description as trxType,mode
from CTE where created >='2017-04-11' and created <= '2018-04-13'
group by trxType ,description,mode
The transaction_types table contains all the types of transactions only and transaction_data contains the transactions which have occurred.
The problem I am facing is that even though it's the RIGHT join, it does not select all the records from the transaction_types table.
I need to select all the transactions from the transaction_types table and show the number of counts for each transaction, even if it's 0.
Please help.
LEFT JOIN is so much easier to follow.
I think you want:
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
transaction_data t
on tt.transaction_key = t.trxType and
t.created >= '2017-04-11' and t.created <= '2018-04-13'
group by tt.transaction_key, tt.description, t.mode;
Notes:
Use reasonable table aliases! a and b mean nothing. t and tt are abbreviations of the table name, so they are easier to follow.
t.mode will be NULL for non-matching rows.
The condition on dates needs to be in the ON clause. Otherwise, the outer join is turned into an inner join.
LEFT JOIN is easier to follow (at least for people whose native language reads left-to-right) because it means "keep all the rows in the table you have already read".

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused on how to pull in the EmailAddress column from the Clients table, as in order to join it I have to bring in other tables that aren't being used. I am assuming there is an easier way to do this using sub Queries as that is what we are working on at the time.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
)
This will return all the columns from the two columns provided in the inner set.
So, your inner query, which I won't write for you since you appear to be a student, and that wouldn't be right for me to give you the entire answer, would be a query consisting of a GROUP BY clause and a SUM of the order value field. But the key thing you need to understand is this set thinking: you can just wrap the ENTIRE query inside brackets and treat it as a table the way I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Client Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItem SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by C1.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) -- Close subquery
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is usefull, certainly when you select fields from the joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY is mostly meaningless inside a subquery if you don't use TOP.

What's the difference between filtering in the WHERE clause compared to the ON clause?

I would like to know if there is any difference in using the WHERE clause or using the matching in the ON of the inner join.
The result in this case is the same.
First query:
with Catmin as
(
select categoryid, MIN(unitprice) as mn
from production.Products
group by categoryid
)
select p.productname, mn
from Catmin
inner join Production.Products p
on p.categoryid = Catmin.categoryid
and p.unitprice = Catmin.mn;
Second query:
with Catmin as
(
select categoryid, MIN(unitprice) as mn
from production.Products
group by categoryid
)
select p.productname, mn
from Catmin
inner join Production.Products p
on p.categoryid = Catmin.categoryid
where p.unitprice = Catmin.mn; // this is changed
Result both queries:
My answer may be a bit off-topic, but I would like to highlight a problem that may occur when you turn your INNER JOIN into an OUTER JOIN.
In this case, the most important difference between putting predicates (test conditions) on the ON or WHERE clauses is that you can turn LEFT or RIGHT OUTER JOINS into INNER JOINS without noticing it, if you put fields of the table to be left out in the WHERE clause.
For example, in a LEFT JOIN between tables A and B, if you include a condition that involves fields of B on the WHERE clause, there's a good chance that there will be no null rows returned from B in the result set. Effectively, and implicitly, you turned your LEFT JOIN into an INNER JOIN.
On the other hand, if you include the same test in the ON clause, null rows will continue to be returned.
For example, take the query below:
SELECT * FROM A
LEFT JOIN B
ON A.ID=B.ID
The query will also return rows from A that do not match any of B.
Take this second query:
SELECT * FROM A
LEFT JOIN B
WHERE A.ID=B.ID
This second query won't return any rows from A that don't match B, even though you think it will because you specified a LEFT JOIN. That's because the test A.ID=B.ID will leave out of the result set any rows with B.ID that are null.
That's why I favor putting predicates in the ON clause rather than in the WHERE clause.
The results are exactly same.
Using "ON" clause is more suggested due to increasing performance of the query.
Instead of requesting the data from tables then filtering, by using on clause, you first filter first data-set and then join the data to other tables. So, lesser data to match and faster result is given.
There is no difference between the above two queries outputs both of them result same.
When you are using On Clause the join operation joins only those rows that matches the codidtion specified on ON Clause
Where as in case of Where Clause, the join opeartion joins all the rows and then filters out based on where condidtion Specified
So, obviously On Clause is more effective and should be preferred over where condidtion

What is the most efficient way to write a select statement with a "not in" subquery?

What is the most efficient way to write a select statement similar to the below.
SELECT *
FROM Orders
WHERE Orders.Order_ID not in (Select Order_ID FROM HeldOrders)
The gist is you want the records from one table when the item is not in another table.
For starters, a link to an old article in my blog on how NOT IN predicate works in SQL Server (and in other systems too):
Counting missing rows: SQL Server
You can rewrite it as follows:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
, however, most databases will treat these queries the same.
Both these queries will use some kind of an ANTI JOIN.
This is useful for SQL Server if you want to check two or more columns, since SQL Server does not support this syntax:
SELECT *
FROM Orders o
WHERE (col1, col2) NOT IN
(
SELECT col1, col2
FROM HeldOrders ho
)
Note, however, that NOT IN may be tricky due to the way it treats NULL values.
If Held.Orders is nullable, no records are found and the subquery returns but a single NULL, the whole query will return nothing (both IN and NOT IN will evaluate to NULL in this case).
Consider these data:
Orders:
OrderID
---
1
HeldOrders:
OrderID
---
2
NULL
This query:
SELECT *
FROM Orders o
WHERE OrderID NOT IN
(
SELECT OrderID
FROM HeldOrders ho
)
will return nothing, which is probably not what you'd expect.
However, this one:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
will return the row with OrderID = 1.
Note that LEFT JOIN solutions proposed by others is far from being a most efficient solution.
This query:
SELECT *
FROM Orders o
LEFT JOIN
HeldOrders ho
ON ho.OrderID = o.OrderID
WHERE ho.OrderID IS NULL
will use a filter condition that will need to evaluate and filter out all matching rows which can be numerius
An ANTI JOIN method used by both IN and EXISTS will just need to make sure that a record does not exists once per each row in Orders, so it will eliminate all possible duplicates first:
NESTED LOOPS ANTI JOIN and MERGE ANTI JOIN will just skip the duplicates when evaluating HeldOrders.
A HASH ANTI JOIN will eliminate duplicates when building the hash table.
"Most efficient" is going to be different depending on tables sizes, indexes, and so on. In other words it's going to differ depending on the specific case you're using.
There are three ways I commonly use to accomplish what you want, depending on the situation.
1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.
2. Another method is the "correlated subquery" which is a slight variation of what you have...
SELECT *
FROM Orders o
WHERE Orders.Order_ID not in (Select Order_ID
FROM HeldOrders h
where h.order_id = o.order_id)
Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.
3. Another method I use sometimes is left outer join...
SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null
When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
Each of these variations can work more or less efficiently in various scenarios.
You can use a LEFT OUTER JOIN and check for NULL on the right table.
SELECT O1.*
FROM Orders O1
LEFT OUTER JOIN HeldOrders O2
ON O1.Order_ID = O2.Order_Id
WHERE O2.Order_Id IS NULL
I'm not sure what is the most efficient, but other options are:
1. Use EXISTS
SELECT *
FROM ORDERS O
WHERE NOT EXISTS (SELECT 1
FROM HeldOrders HO
WHERE O.Order_ID = HO.OrderID)
2. Use EXCEPT
SELECT O.Order_ID
FROM ORDERS O
EXCEPT
SELECT HO.Order_ID
FROM HeldOrders
Try
SELECT *
FROM Orders
LEFT JOIN HeldOrders
ON HeldOrders.Order_ID = Orders.Order_ID
WHERE HeldOrders.Order_ID IS NULL