Difference between EXISTS and IN in SQL? - sql

What is the difference between the EXISTS and IN clause in SQL?
When should we use EXISTS, and when should we use IN?

The exists keyword can be used in that way, but really it's intended as a way to avoid counting:
--this statement needs to check the entire table
select count(*) from [table] where ...
--this statement is true as soon as one match is found
exists ( select * from [table] where ... )
This is most useful where you have if conditional statements, as exists can be a lot quicker than count.
The in is best used where you have a static list to pass:
select * from [table]
where [field] in (1, 2, 3)
When you have a table in an in statement it makes more sense to use a join, but mostly it shouldn't matter. The query optimiser should return the same plan either way. In some implementations (mostly older, such as Microsoft SQL Server 2000) in queries will always get a nested join plan, while join queries will use nested, merge or hash as appropriate. More modern implementations are smarter and can adjust the plan even when in is used.

EXISTS will tell you whether a query returned any results. e.g.:
SELECT *
FROM Orders o
WHERE EXISTS (
SELECT *
FROM Products p
WHERE p.ProductNumber = o.ProductNumber)
IN is used to compare one value to several, and can use literal values, like this:
SELECT *
FROM Orders
WHERE ProductNumber IN (1, 10, 100)
You can also use query results with the IN clause, like this:
SELECT *
FROM Orders
WHERE ProductNumber IN (
SELECT ProductNumber
FROM Products
WHERE ProductInventoryQuantity > 0)

Based on rule optimizer:
EXISTS is much faster than IN, when the sub-query results is very large.
IN is faster than EXISTS, when the sub-query results is very small.
Based on cost optimizer:
There is no difference.

I'm assuming you know what they do, and thus are used differently, so I'm going to understand your question as: When would it be a good idea to rewrite the SQL to use IN instead of EXISTS, or vice versa.
Is that a fair assumption?
Edit: The reason I'm asking is that in many cases you can rewrite an SQL based on IN to use an EXISTS instead, and vice versa, and for some database engines, the query optimizer will treat the two differently.
For instance:
SELECT *
FROM Customers
WHERE EXISTS (
SELECT *
FROM Orders
WHERE Orders.CustomerID = Customers.ID
)
can be rewritten to:
SELECT *
FROM Customers
WHERE ID IN (
SELECT CustomerID
FROM Orders
)
or with a join:
SELECT Customers.*
FROM Customers
INNER JOIN Orders ON Customers.ID = Orders.CustomerID
So my question still stands, is the original poster wondering about what IN and EXISTS does, and thus how to use it, or does he ask wether rewriting an SQL using IN to use EXISTS instead, or vice versa, will be a good idea?

EXISTS is much faster than IN when the subquery results is very large.
IN is faster than EXISTS when the subquery results is very small.
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO
INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5
INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'
Query 1
SELECT
FROM t1
WHERE not EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
Query 2
SELECT t1.*
FROM t1
WHERE t1.id not in (SELECT t2.t1id FROM t2 )
If in t1 your id has null value then Query 1 will find them, but Query 2 cant find null parameters.
I mean IN can't compare anything with null, so it has no result for null, but EXISTS can compare everything with null.

If you are using the IN operator, the SQL engine will scan all records fetched from the inner query. On the other hand if we are using EXISTS, the SQL engine will stop the scanning process as soon as it found a match.

IN supports only equality relations (or inequality when preceded by NOT).
It is a synonym to =any / =some, e.g
select *
from t1
where x in (select x from t2)
;
EXISTS supports variant types of relations, that cannot be expressed using IN, e.g. -
select *
from t1
where exists (select null
from t2
where t2.x=t1.x
and t2.y>t1.y
and t2.z like '℅' || t1.z || '℅'
)
;
And on a different note -
The allegedly performance and technical differences between EXISTS and IN may result from specific vendor's implementations/limitations/bugs, but many times they are nothing but myths created due to lack of understanding of the databases internals.
The tables' definition, statistics' accuracy, database configuration and optimizer's version have all impact on the execution plan and therefore on the performance metrics.

The Exists keyword evaluates true or false, but IN keyword compare all value in the corresponding sub query column.
Another one Select 1 can be use with Exists command. Example:
SELECT * FROM Temp1 where exists(select 1 from Temp2 where conditions...)
But IN is less efficient so Exists faster.

I think,
EXISTS is when you need to match the results of query with another subquery.
Query#1 results need to be retrieved where SubQuery results match. Kind of a Join..
E.g. select customers table#1 who have placed orders table#2 too
IN is to retrieve if the value of a specific column lies IN a list (1,2,3,4,5)
E.g. Select customers who lie in the following zipcodes i.e. zip_code values lies in (....) list.
When to use one over the other... when you feel it reads appropriately (Communicates intent better).

As per my knowledge when a subquery returns a NULL value then the whole statement becomes NULL. In that cases we are using the EXITS keyword. If we want to compare particular values in subqueries then we are using the IN keyword.

Which one is faster depends on the number of queries fetched by the inner query:
When your inner query fetching thousand of rows then EXIST would be better choice
When your inner query fetching few rows, then IN will be faster
EXIST evaluate on true or false but IN compare multiple value. When you don't know the record is exist or not, your should choose EXIST

Difference lies here:
select *
from abcTable
where exists (select null)
Above query will return all the records while below one would return empty.
select *
from abcTable
where abcTable_ID in (select null)
Give it a try and observe the output.

The reason is that the EXISTS operator works based on the “at least found” principle. It returns true and stops scanning table once at least one matching row found.
On the other hands, when the IN operator is combined with a subquery, MySQL must process the subquery first, and then uses the result of the subquery to process the whole query.
The general rule of thumb is that if the subquery contains a large
volume of data, the EXISTS operator provides a better performance.
However, the query that uses the IN operator will perform faster if
the result set returned from the subquery is very small.

In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.
https://docs.oracle.com/cd/B19306_01/server.102/b14211/sql_1016.htm#i28403

My understand is both should be the same as long as we are not dealing with NULL values.
The same reason why the query does not return the value for = NULL vs is NULL.
http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/
As for as boolean vs comparator argument goes, to generate a boolean both values needs to be compared and that is how any if condition works.So i fail to understand how IN and EXISTS behave differently
.

If a subquery returns more than one value, you might need to execute the outer query- if the values within the column specified in the condition match any value in the result set of the subquery. To perform this task, you need to use the in keyword.
You can use a subquery to check if a set of records exists. For this, you need to use the exists clause with a subquery. The exists keyword always return true or false value.

I believe this has a straightforward answer. Why don't you check it from the people who developed that function in their systems?
If you are a MS SQL developer, here is the answer directly from Microsoft.
IN:
Determines whether a specified value matches any value in a subquery or a list.
EXISTS:
Specifies a subquery to test for the existence of rows.

I found that using EXISTS keyword is often really slow (that is very true in Microsoft Access).
I instead use the join operator in this manner :
should-i-use-the-keyword-exists-in-sql

If you can use where in instead of where exists, then where in is probably faster.
Using where in or where exists
will go through all results of your parent result. The difference here is that the where exists will cause a lot of dependet sub-queries. If you can prevent dependet sub-queries, then where in will be the better choice.
Example
Assume we have 10,000 companies, each has 10 users (thus our users table has 100,000 entries). Now assume you want to find a user by his name or his company name.
The following query using were exists has an execution of 141ms:
select * from `users`
where `first_name` ='gates'
or exists
(
select * from `companies`
where `users`.`company_id` = `companies`.`id`
and `name` = 'gates'
)
This happens, because for each user a dependent sub query is executed:
However, if we avoid the exists query and write it using:
select * from `users`
where `first_name` ='gates'
or users.company_id in
(
select id from `companies`
where `name` = 'gates'
)
Then depended sub queries are avoided and the query would run in 0,012 ms

I did a little exercise on a query that I have recently been using. I originally created it with INNER JOINS, but I wanted to see how it looked/worked with EXISTS. I converted it. I will include both version here for comparison.
SELECT DISTINCT Category, Name, Description
FROM [CodeSets]
WHERE Category NOT IN (
SELECT def.Category
FROM [Fields] f
INNER JOIN [DataEntryFields] def ON f.DataEntryFieldId = def.Id
INNER JOIN Section s ON f.SectionId = s.Id
INNER JOIN Template t ON s.Template_Id = t.Id
WHERE t.AgencyId = (SELECT Id FROM Agencies WHERE Name = 'Some Agency')
AND def.Category NOT IN ('OFFLIST', 'AGENCYLIST', 'RELTO_UNIT', 'HOSPITALS', 'EMS', 'TOWCOMPANY', 'UIC', 'RPTAGENCY', 'REP')
AND (t.Name like '% OH %')
AND (def.Category IS NOT NULL AND def.Category <> '')
)
ORDER BY 1
Here are the statistics:
Here is the converted version:
SELECT DISTINCT cs.Category, Name, Description
FROM [CodeSets] cs
WHERE NOT Exists (
SELECT * FROM [Fields] f
WHERE EXISTS (SELECT * FROM [DataEntryFields] def
WHERE def.Id = f.DataEntryFieldId
AND def.Category NOT IN ('OFFLIST', 'AGENCYLIST', 'RELTO_UNIT', 'HOSPITALS', 'EMS', 'TOWCOMPANY', 'UIC', 'RPTAGENCY', 'REP')
AND (def.Category IS NOT NULL AND def.Category <> '')
AND def.Category = cs.Category
AND EXISTS (SELECT * FROM Section s
WHERE f.SectionId = s.Id
AND EXISTS (SELECT * FROM Template t
WHERE s.Template_Id = t.Id
AND EXISTS (SELECT * FROM Agencies
WHERE Name = 'Some Agency' and t.AgencyId = Id)
AND (t.Name like '% OH %')
)
)
)
)
ORDER BY 1
The results, at least to me, were unimpressive.
If I were more technically knowledgeable about how SQL works, I could give you an answer, but take this example as you may and make your own conclusion.
The INNER JOIN and IN () is easier to read, however.

EXISTS Is Faster in Performance than IN.
If Most of the filter criteria is in subquery then better to use IN and If most of the filter criteria is in main query then better to use EXISTS.

If you are using the IN operator, the SQL engine will scan all records fetched from the inner query. On the other hand if we are using EXISTS, the SQL engine will stop the scanning process as soon as it found a match.

Related

How to make a query where every column is a parallel count of a subquery?

I need to render a query such that every column contains the count of a respective table.
The code I have now is:
SELECT COUNT(table1.Id),
COUNT(table2.Id),
COUNT(table3.Id)
FROM table1,
table2,
table3
WHERE table1.done = 'No' OR
table2.done = 'No' OR
table3.done = 'No' OR
But I need the query to return the same result values as if every table would be counted independently, like:
SELECT COUNT(tableX.Id) FROM tableX WHERE talbeX.done = 'No'
where the 'X' stands for 1,2 or 3.
How can this be achived with SQL?
Thanks beforhand for the help.
Just use a nested sub query, exactly as you have explained it:
SELECT
(SELECT COUNT(table1.Id) FROM table1 WHERE table1.done = 'No') as T1Count,
(SELECT COUNT(table2.Id) FROM table2 WHERE table2.done = 'No') as T2Count,
(SELECT COUNT(table3.Id) FROM table3 WHERE table3.done = 'No') as T3Count,
(SELECT COUNT(tableN.Id) FROM tableN) as TNCount;
This will query the tables independently so you are free to use what ever additional criteria you may need without trying to correlate the results from each query
FROM in this case is not strictly necessary in the outer query as we are not returning rows from any specific table, there is no table that we could specify in the from clause. Each RDBMS has their own convention for these types of queries, MS SQL Server and Oracle are to predominant database engines used in Outsystems
If we did specify a table in FROM then this would return 1 row for every record in that table, which is inefficient and not required. So it is important that we do not include a FROM clause.
Transact-SQL - FROM
The FROM clause is usually required on the SELECT statement. The exception is when no table columns are listed, and the only items listed are literals or variables or arithmetic expressions.
ORACLE - DUAL Table
DUAL is a table automatically created by Oracle Database along with the data dictionary. DUAL is in the schema of the user SYS but is accessible by the name DUAL to all users. It has one column, DUMMY, defined to be VARCHAR2(1), and contains one row with a value X. Selecting from the DUAL table is useful for computing a constant expression with the SELECT statement. Because DUAL has only one row, the constant is returned only once. Alternatively, you can select a constant, pseudocolumn, or expression from any table, but the value will be returned as many times as there are rows in the table.
Update - OP is using Oracle!
After attempting the solution, OP responded that it raised the following error:
Error in advanced query SQL2: ORA-00923: FROM keyword not found where expected
The ORA prefix of this error number indicates that the data store is actually an Oracle implementation, so we need to append the FROM DUAL to the query.
SELECT
(SELECT COUNT(table1.Id) FROM table1 WHERE table1.done = 'No') as T1Count,
(SELECT COUNT(table2.Id) FROM table2 WHERE table2.done = 'No') as T2Count,
(SELECT COUNT(table3.Id) FROM table3 WHERE table3.done = 'No') as T3Count,
(SELECT COUNT(tableN.Id) FROM tableN) as TNCount
FROM DUAL;

Ensuring two columns only contain valid results from same subquery

I have the following table:
id symbol_01 symbol_02
1 abc xyz
2 kjh okd
3 que qid
I need a query that ensures symbol_01 and symbol_02 are both contained in a list of valid symbols. In other words I would needs something like this:
select *
from mytable
where symbol_01 in (
select valid_symbols
from somewhere)
and symbol_02 in (
select valid_symbols
from somewhere)
The above example would work correctly, but the subquery used to determine the list of valid symbols is identical both times and is quite large. It would be very innefficient to run it twice like in the example.
Is there a way to do this without duplicating two identical sub queries?
Another approach:
select *
from mytable t1
where 2 = (select count(distinct symbol)
from valid_symbols vs
where vs.symbol in (t1.symbol_01, t1.symbol_02));
This assumes that the valid symbols are stored in a table valid_symbols that has a column named symbol. The query would also benefit from an index on valid_symbols.symbol
You could try use a CTE like;
WITH ValidSymbols AS (
SELECT DISTINCT valid_symbol
FROM somewhere
)
SELECT mt.*
FROM MyTable mt
INNER JOIN ValidSymbols v1
ON mt.symbol_01 = v1.valid_symbol
INNER JOIN ValidSymbols v2
ON mt.symbol_02 = v2.valid_symbol
From a performance perspective, your query is the right way to do this. I would write it as:
select *
from mytable t
where exists (select 1
from valid_symbols vs
where t.symbol_01 = vs.valid_symbol
) and
exists (select 1
from valid_symbols vs
where t.symbol_02 = vs.valid_symbol
) ;
The important component is that you need an index on valid_symbols(valid_symbol). With this index, the lookup should be pretty fast. Appropriate indexes can even work if valid_symbols is a view, although the effect depends on the complexity of the view.
You seem to have a situation where you have two foreign key relationships. If you explicitly declare these relationships, then the database will enforce that the columns in your table match the valid symbols.

'In' clause in SQL server with multiple columns

I have a component that retrieves data from database based on the keys provided.
However I want my java application to get all the data for all keys in a single database hit to fasten up things.
I can use 'in' clause when I have only one key.
While working on more than one key I can use below query in oracle
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is similar to writing
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'CORE'
together
However, this concept of using 'in' clause as above is giving below error in 'SQL server'
ERROR:An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let know if their is any way to achieve the same in SQL server.
This syntax doesn't exist in SQL Server. Use a combination of And and Or.
SELECT *
FROM <table_name>
WHERE
(value_type = 1 and CODE1 = 'COMM')
OR (value_type = 1 and CODE1 = 'CORE')
(In this case, you could make it shorter, because value_type is compared to the same value in both combinations. I just wanted to show the pattern that works like IN in oracle with multiple fields.)
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE
(value_type, CODE1) IN (
SELECT type, code
FROM bar
WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE
EXISTS (
SELECT *
FROM bar
WHERE <some conditions>
AND foo.type_code = bar.type
AND foo.CODE1 = bar.code)
There are other ways to do it, depending on the case, like inner joins and the like.
If you have under 1000 tuples you want to check against and you're using SQL Server 2008+, you can use a table values constructor, and perform a join against it. You can only specify up to 1000 rows in a table values constructor, hence the 1000 tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN ( VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is going to be unique, otherwise you'll get duplicate values. I'm not sure how the performance of this compares to using many ANDs and ORs, but the SQL query is at least much cleaner to look at, in my opinion.
You can also write this to use EXIST instead of JOIN. That may have different performance characteristics and it will avoid the problem of producing duplicate results if your values aren't unique. It may be worth trying both EXIST and JOIN on your use case to see what's a better fit. Here's how EXIST would look,
SELECT * FROM <table_name>
WHERE EXISTS (
SELECT 1
FROM (
VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b)
WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. But sometimes that's not possible, e.g. your user lacks the permission to create temporary tables, and then using a table values constructor may be your best choice. Use EXIST or JOIN, depending on which gives you better performance on your database.
Normally you can not do it, but can use the following technique.
SELECT * FROM <table_name>
where (value_type+'/'+CODE1) IN (('I'+'/'+'COMM'),('I'+'/'+'CORE'));
A better solution is to avoid hardcoding your values and put then in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t VALUES ('I','COMM'),('I','CORE')
SELECT DT. *
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.ValueType AND T.Code = DT.Code
Thus, you avoid storing data in your code (persistent table version) and allow to easily modify the filters (without changing the code).
I think you can try this, combine and and or at the same time.
SELECT
*
FROM
<table_name>
WHERE
value_type = 1
AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is 'join' the columns as a string, and pass your values also combined as strings.
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
The other way is to do multiple ands and ors.
I had a similar problem in MS SQL, but a little different. Maybe it will help somebody in futere, in my case i found this solution (not full code, just example):
SELECT Table1.Campaign
,Table1.Coupon
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of cource on Coupon and Campaign in table i have index for fast search.
Compute it in MS Sql
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');

SQL IN() operator with condition inside

I've got table with few numbers inside (or even empty): #states table (value int)
And I need to make SELECT from another table with WHERE clause by definite column.
This column's values must match one of #states numbers or if #states is empty then accept all values (like there is no WHERE condition for this column).
So I tried something like this:
select *
from dbo.tbl_docs docs
where
docs.doc_state in(iif(exists(select 1 from #states), (select value from #states), docs.doc_state))
Unfortunately iif() can't return subquery resulting dataset. I tried different variations with iif() and CASE but it wasn't successful. How to make this condition?
select *
from dbo.tbl_docs docs
where
(
(select count(*) from #states) > 0
AND
docs.doc_state in(select value from #states)
)
OR
(
(select count(*) from #states)=0
AND 1=1
)
Wouldn't a left join do?
declare #statesCount int;
select #statesCount = count(1) from #states;
select
docs.*
from dbo.tbl_docs docs
left join #states s on docs.doc_state = s.value
where s.value is not null or #statesCount = 0;
In general, whenever your query contains sub-queries, you should stop for five minutes, and think hard about whether you really need a sub-query at all.
And if you've got a server capable of doing that, in many cases it might be better to preprocess the input parameters first, or perhaps use constructs such as MS SQL's with.
select *
from dbo.tbl_docs docs
where exists (select 1 from #states where value = doc_state)
or not exists (select 1 from #state)

TSQL NOT EXISTS Why is this query so slow?

Debugging an app which queries SQL Server 05, can't change the query but need to optimise things.
Running all the selects seperately are quick <1sec, eg: select * from acscard, select id from employee... When joined together it takes 50 seconds.
Is it better to set uninteresting accesscardid fields to null or to '' when using EXISTS?
SELECT * FROM ACSCard
WHERE NOT EXISTS
( SELECT Id FROM Employee
WHERE Employee.AccessCardId = ACSCard.acs_card_number )
AND NOT EXISTS
( SELECT Id FROM Visit
WHERE Visit.AccessCardId = ACSCard.acs_card_number )
ORDER by acs_card_id
Do you have indexes on Employee.AccessCardId, Visit.AccessCardId, and ACSCard.acs_card_number?
The SELECT clause is not evaluated in an EXISTS clause. This:
WHERE EXISTS(SELECT 1/0
FROM EMPLOYEE)
...should raise an error for dividing by zero, but it won't. But you need to put something in the SELECT clause for it to be a valid query - it doesn't matter if it's NULL or a zero length string.
In SQL Server, NOT EXISTS (and NOT IN) are better than the LEFT JOIN/IS NULL approach if the columns being compared are not nullable (the values on either side can not be NULL). The columns compared should be indexed, if they aren't already.