Multi keys in SQL WHERE IN clause - sql

Say you have an Accounts table where the ID column is the PK and TaxID + AccountNumber has a unique constraint:
select * from Accounts where ID in (100, 101)
Now you want to make a similar query using the natural key:
select * from Accounts
where {TaxID, AccountNumber} in
({"0123456", "2000897"}, {"0125556", "2000866"})
So this involves tuples and looks pretty legitimate. Is it possible to express somehow with ANSI SQL? Maybe in some specific SQL extension? If not, why (will appreciate any speculations)?

Both of these are valid ISO/ANSI Full SQL-92 syntax:
SELECT a.*
FROM Accounts a
INNER JOIN
( VALUES('0123456', '2000897'), ('0125556', '2000866')
) AS v(TaxID, AccountNumber)
ON (a.TaxID, a.AccountNumber) = (v.TaxID, v.AccountNumber)
SELECT *
FROM Accounts a
WHERE (a.TaxID, a.AccountNumber) IN
( VALUES ('0123456', '2000897'), ('0125556', '2000866') )
But I don't think either of them works in any current DBMS.
This is also valid Full SQL-92 syntax (it doesn't work in SQL-Server 2008 because of the NATURAL JOIN):
SELECT a.*
FROM Accounts a
NATURAL JOIN
( VALUES('0123456', '2000897'), ('0125556', '2000866')
) AS v(TaxID, AccountNumber)
This is also valid SQL (I'm not sure whether it is in the SQL-92 specification or a later one), and it is what you have, but using parentheses rather than curly brackets.
It is supported by MySQL, Postgres, DB2 (but not SQL Server):
SELECT a.*
FROM Accounts a
WHERE (TaxID, AccountNumber) IN
( ('0123456', '2000897'), ('0125556', '2000866') )
;
There has been a similar question in DBA.SE, with various other ways to formulate this:
selecting where two columns are in a set
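For readers who want to try the tuple form quickly, here is a minimal sketch using SQLite through Python's sqlite3 module. SQLite is only an assumed stand-in DBMS (it is not mentioned above), and the sample rows are made up. SQLite accepts the row-value IN form when the right-hand side is a VALUES subquery, which corresponds to the second Full SQL-92 query above.

```python
import sqlite3

# In-memory stand-in for the Accounts table from the question (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts ("
    " ID INTEGER PRIMARY KEY,"
    " TaxID TEXT NOT NULL,"
    " AccountNumber TEXT NOT NULL,"
    " UNIQUE (TaxID, AccountNumber))"
)
conn.executemany(
    "INSERT INTO Accounts (ID, TaxID, AccountNumber) VALUES (?, ?, ?)",
    [
        (100, "0123456", "2000897"),
        (101, "0125556", "2000866"),
        (102, "0123456", "2000866"),  # TaxID of row 100 with the account number of row 101
    ],
)

# Row-value IN against a VALUES list: each (TaxID, AccountNumber) pair must match as a whole.
rows = conn.execute(
    "SELECT ID FROM Accounts"
    " WHERE (TaxID, AccountNumber) IN"
    " (VALUES ('0123456', '2000897'), ('0125556', '2000866'))"
    " ORDER BY ID"
).fetchall()
matched_ids = [r[0] for r in rows]
print(matched_ids)
```

Row 102 is not returned even though both of its values appear somewhere in the list, because the pair as a whole is not listed.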

If you are using T-SQL, then an option that looks a bit like your hypothetical query is to use table literals, like this:
select *
from Accounts a
inner join (values('0123456', '2000897'),('0125556', '2000866'))
as v(TaxID, AccountNumber)
on a.TaxID = v.TaxID and a.AccountNumber = v.AccountNumber
Here you create a table literal named v that contains the fields TaxID and AccountNumber. Now you can join the table literal on two fields to get the desired result. One caveat is that a table literal can only contain 1000 rows. You can read more about T-SQL support for table literals on this page.
Edit: this page indicates that this construct also works in PostgreSQL.

Be careful how you interpret this. Shark's answer will work, but will return
TaxID AccountNumber
1234 8765
1234 7654
2345 8765
2345 7654
Which might not be what you want... For example, if you only want account number 8765 for tax ID 1234 and 7654 for tax ID 2345, you would need a WHERE clause like this:
WHERE (taxId='1234' and accountnumber='8765') OR
(taxid='2345' and accountNumber='7654')
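The difference can be checked with a small sketch. It assumes the answer being referred to filtered each column with its own IN list (that answer is not shown here), and it uses SQLite via Python's sqlite3 module purely as a stand-in DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (TaxID TEXT, AccountNumber TEXT)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)",
    [("1234", "8765"), ("1234", "7654"), ("2345", "8765"), ("2345", "7654")],
)

# Two independent IN lists accept every combination of the listed values.
independent = conn.execute(
    "SELECT TaxID, AccountNumber FROM Accounts"
    " WHERE TaxID IN ('1234', '2345') AND AccountNumber IN ('8765', '7654')"
).fetchall()

# Pairing the conditions keeps only the two intended combinations.
paired = conn.execute(
    "SELECT TaxID, AccountNumber FROM Accounts"
    " WHERE (TaxID = '1234' AND AccountNumber = '8765')"
    "    OR (TaxID = '2345' AND AccountNumber = '7654')"
    " ORDER BY TaxID"
).fetchall()

print(len(independent), paired)
```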

A crude way would be to concatenate the two values together, e.g.:
SELECT *
FROM Accounts
WHERE CAST(TaxID AS VARCHAR(10)) + '-' + CAST(AccountNumber AS VARCHAR(10))
IN ('0123456-2000897', '......', ....)
However, in e.g. SQL Server, this would not be able to use an index.
You could add a computed column that combines both values into 1 and then match on that:
SELECT * FROM Accounts WHERE MyComputedColumn IN ('0123456-2000897', ....)
Or, you could do:
SELECT a.*
FROM Accounts a
JOIN
(
SELECT '0123456' AS TaxID, '2000897' AS AccountNumber
UNION ALL
SELECT '0125556', '2000866'
) x ON a.TaxID = x.TaxID AND a.AccountNumber = x.AccountNumber

In Oracle SQL you can just substitute parentheses for the curly brackets "{}" in the original post (second example). It may not be the ANSI standard, but it's close, and it works fine.
Concatenating the values is not recommended: even with uncommon delimiters there's always some small risk of incorrectly matching freely-entered text values. Best not to get into the habit.
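A small sketch of the false-match risk, assuming SQLite (via Python's sqlite3 module) as a stand-in and deliberately contrived values that contain the delimiter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (TaxID TEXT, AccountNumber TEXT)")
# Two different key pairs that produce the same string once joined with '-'.
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)",
    [("12-34", "56"), ("12", "34-56")],
)

# Matching on the concatenated string cannot tell the two rows apart.
concatenated = conn.execute(
    "SELECT TaxID, AccountNumber FROM Accounts"
    " WHERE TaxID || '-' || AccountNumber IN ('12-34-56')"
).fetchall()

# Matching column by column returns only the intended row.
by_columns = conn.execute(
    "SELECT TaxID, AccountNumber FROM Accounts"
    " WHERE TaxID = '12-34' AND AccountNumber = '56'"
).fetchall()

print(len(concatenated), by_columns)
```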

Related

Ensuring two columns only contain valid results from same subquery

I have the following table:
id symbol_01 symbol_02
1 abc xyz
2 kjh okd
3 que qid
I need a query that ensures symbol_01 and symbol_02 are both contained in a list of valid symbols. In other words, I would need something like this:
select *
from mytable
where symbol_01 in (
select valid_symbols
from somewhere)
and symbol_02 in (
select valid_symbols
from somewhere)
The above example would work correctly, but the subquery used to determine the list of valid symbols is identical both times and is quite large. It would be very inefficient to run it twice like in the example.
Is there a way to do this without duplicating two identical sub queries?
Another approach:
select *
from mytable t1
where 2 = (select count(distinct symbol)
from valid_symbols vs
where vs.symbol in (t1.symbol_01, t1.symbol_02));
This assumes that the valid symbols are stored in a table valid_symbols that has a column named symbol. The query would also benefit from an index on valid_symbols.symbol
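One caveat with the counting approach is worth checking before relying on it: a row whose two columns hold the same valid symbol yields a distinct count of 1, so it is filtered out. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in, reuses the sample rows from the question, and adds a hypothetical row 4 with the same symbol twice. It compares the counting query with a two-EXISTS query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_symbols (symbol TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, symbol_01 TEXT, symbol_02 TEXT)"
)
conn.executemany(
    "INSERT INTO valid_symbols VALUES (?)",
    [(s,) for s in ("abc", "xyz", "kjh", "que", "qid")],  # 'okd' is deliberately absent
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "abc", "xyz"),
        (2, "kjh", "okd"),
        (3, "que", "qid"),
        (4, "abc", "abc"),  # hypothetical row: same valid symbol in both columns
    ],
)

# Counting approach: requires two *different* valid symbols.
by_count = [r[0] for r in conn.execute(
    "SELECT id FROM mytable t1"
    " WHERE 2 = (SELECT COUNT(DISTINCT symbol) FROM valid_symbols vs"
    "            WHERE vs.symbol IN (t1.symbol_01, t1.symbol_02))"
    " ORDER BY id"
)]

# Two EXISTS checks: each column is validated on its own.
by_exists = [r[0] for r in conn.execute(
    "SELECT id FROM mytable t"
    " WHERE EXISTS (SELECT 1 FROM valid_symbols vs WHERE vs.symbol = t.symbol_01)"
    "   AND EXISTS (SELECT 1 FROM valid_symbols vs WHERE vs.symbol = t.symbol_02)"
    " ORDER BY id"
)]

print(by_count, by_exists)
```

If the two columns can never hold the same symbol, both queries return the same rows.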
You could try using a CTE like this:
WITH ValidSymbols AS (
SELECT DISTINCT valid_symbol
FROM somewhere
)
SELECT mt.*
FROM MyTable mt
INNER JOIN ValidSymbols v1
ON mt.symbol_01 = v1.valid_symbol
INNER JOIN ValidSymbols v2
ON mt.symbol_02 = v2.valid_symbol
From a performance perspective, your query is the right way to do this. I would write it as:
select *
from mytable t
where exists (select 1
from valid_symbols vs
where t.symbol_01 = vs.valid_symbol
) and
exists (select 1
from valid_symbols vs
where t.symbol_02 = vs.valid_symbol
) ;
The important component is that you need an index on valid_symbols(valid_symbol). With this index, the lookup should be pretty fast. Appropriate indexes can even work if valid_symbols is a view, although the effect depends on the complexity of the view.
You seem to have a situation where you have two foreign key relationships. If you explicitly declare these relationships, then the database will enforce that the columns in your table match the valid symbols.
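A minimal sketch of the foreign-key idea, assuming SQLite (via Python's sqlite3 module) as a stand-in and a real valid_symbols table with a valid_symbol column. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other DBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.execute("CREATE TABLE valid_symbols (valid_symbol TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE mytable ("
    " id INTEGER PRIMARY KEY,"
    " symbol_01 TEXT REFERENCES valid_symbols (valid_symbol),"
    " symbol_02 TEXT REFERENCES valid_symbols (valid_symbol))"
)
conn.executemany("INSERT INTO valid_symbols VALUES (?)", [("abc",), ("xyz",)])

# Both symbols are valid, so this insert succeeds.
conn.execute("INSERT INTO mytable VALUES (1, 'abc', 'xyz')")

# 'nope' is not a valid symbol, so the database rejects the row.
try:
    conn.execute("INSERT INTO mytable VALUES (2, 'abc', 'nope')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
print(rejected, row_count)
```

With the constraints in place, invalid rows never get in, so the validating query is only needed for data loaded before the constraints existed.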

alternatives to using IN clause

I am running the below query:
SELECT
ReceiptVoucherId,
VoucherId,
ReceiptId,
rvtransactionAmount,
AmountUsed,
TransactionTypeId
FROM
[Scratch].[dbo].[LoyaltyVoucherTransactionDetails]
WHERE
VoucherId IN
(2000723,
2000738,
2000774,
2000873,
2000888,
2000924,
2001023,
2001038,
2001074,
2001173)
The aim is to extract the ReceiptVoucherId / VoucherId / ReceiptId / rvtransactionAmount / AmountUsed / TransactionTypeId data for the list of VoucherIds that I have.
My problem here is that my list of VoucherIds is 187k long, so an IN clause is not possible; it returns the error:
Internal error: An expression services limit has been reached
Can anyone advise on an alternative to doing it this way?
I am using SSMS 2014
You can try the approach:
select * from mytable where id in (select id from othertable)
or a left join:
select * from othertable left join mytable on mytable.id = othertable.id
I'm not sure which performs better. Also, the second query can return rows with empty (NULL) mytable columns if id is not declared as a foreign key.
fly-by-post, feel free to improve it.
Just create a table containing all these vouchers (hopefully you already have one) and then use IN() selecting from the table:
SELECT
ReceiptVoucherId,
VoucherId,
ReceiptId,
rvtransactionAmount,
AmountUsed,
TransactionTypeId
FROM
[Scratch].[dbo].[LoyaltyVoucherTransactionDetails]
WHERE
VoucherId IN (SELECT VoucherId FROM VouchersTable)
Insert the vouchers to look up into a separate table; let's call it Vouchers.
Then this query should do the trick. It does not use the IN clause; instead it uses an inner join, which may be faster.
SELECT
L.ReceiptVoucherId,
L.VoucherId,
L.ReceiptId,
L.rvtransactionAmount,
L.AmountUsed,
L.TransactionTypeId
FROM
[Scratch].[dbo].[LoyaltyVoucherTransactionDetails] L
INNER JOIN dbo.Vouchers V ON L.VoucherId = V.VoucherId
Maybe the following works for you:
First of all, declare a variable of type table (or alternatively a temp table) and insert your IDs into it.
Modify your Query to
WHERE VoucherID in (SELECT VoucherID FROM #t)
An alternative (similarly write-intensive for your hands ;-) ) is to create a CTE:
WITH cte AS (SELECT 2000723 UNION ALL SELECT ...)
and again rework your "WHERE ... IN ..." section.
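The load-then-join idea from the answers above can be sketched as follows. This assumes SQLite (via Python's sqlite3 module) as a stand-in for SQL Server, a simplified table with made-up rows, and a hypothetical temp table named wanted_vouchers. The IDs are sent as bound parameters rather than as a 187k-item literal list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the transaction details table (invented rows).
conn.execute(
    "CREATE TABLE LoyaltyVoucherTransactionDetails ("
    " ReceiptVoucherId INTEGER PRIMARY KEY,"
    " VoucherId INTEGER,"
    " AmountUsed REAL)"
)
conn.executemany(
    "INSERT INTO LoyaltyVoucherTransactionDetails (VoucherId, AmountUsed) VALUES (?, ?)",
    [(2_000_000 + i, 1.0) for i in range(1000)],  # VoucherIds 2000000..2000999
)

# 187k IDs to look up; only 2000500..2000999 exist in the table above.
wanted = range(2_000_500, 2_187_500)

# Load the IDs into a temp table (hypothetical name), then join against it.
conn.execute("CREATE TEMP TABLE wanted_vouchers (VoucherId INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_vouchers VALUES (?)", ((v,) for v in wanted))

rows = conn.execute(
    "SELECT d.VoucherId"
    " FROM LoyaltyVoucherTransactionDetails d"
    " JOIN wanted_vouchers w ON w.VoucherId = d.VoucherId"
    " ORDER BY d.VoucherId"
).fetchall()
found = [r[0] for r in rows]
print(len(found))
```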

'In' clause in SQL server with multiple columns

I have a component that retrieves data from database based on the keys provided.
However, I want my Java application to get the data for all keys in a single database hit to speed things up.
I can use 'in' clause when I have only one key.
While working on more than one key I can use below query in oracle
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is similar to writing
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'CORE'
together
However, using the IN clause as above gives the error below in SQL Server:
ERROR: An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let me know if there is any way to achieve the same in SQL Server.
This syntax doesn't exist in SQL Server. Use a combination of And and Or.
SELECT *
FROM <table_name>
WHERE
(value_type = 'I' and CODE1 = 'COMM')
OR (value_type = 'I' and CODE1 = 'CORE')
(In this case, you could make it shorter, because value_type is compared to the same value in both combinations. I just wanted to show the pattern that works like IN in oracle with multiple fields.)
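When the key pairs come from application code, the AND/OR pattern can be generated with bound parameters instead of being written by hand. The sketch below is only an illustration: it uses Python with SQLite as stand-ins for the Java application and SQL Server, and a hypothetical table named codes with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (value_type TEXT, CODE1 TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO codes VALUES (?, ?)",
    [("I", "COMM"), ("I", "CORE"), ("I", "MISC"), ("X", "COMM")],
)

# Key pairs to look up; the list must not be empty or the WHERE clause would be invalid.
keys = [("I", "COMM"), ("I", "CORE")]

# One "(value_type = ? AND CODE1 = ?)" group per pair, combined with OR.
predicate = " OR ".join("(value_type = ? AND CODE1 = ?)" for _ in keys)
params = [value for pair in keys for value in pair]  # flattened in the same order

sql = f"SELECT value_type, CODE1 FROM codes WHERE {predicate} ORDER BY CODE1"
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Only the placeholders are concatenated into the SQL text; the values themselves are always passed as parameters.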
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE
(value_type, CODE1) IN (
SELECT type, code
FROM bar
WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE
EXISTS (
SELECT *
FROM bar
WHERE <some conditions>
AND foo.value_type = bar.type
AND foo.CODE1 = bar.code)
There are other ways to do it, depending on the case, like inner joins and the like.
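To check that the two formulations select the same rows, here is a small sketch that runs both on SQLite (via Python's sqlite3 module), which happens to accept both forms. The tables foo and bar follow the answer above; the active column standing in for "<some conditions>" and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (value_type TEXT, CODE1 TEXT)")
conn.execute("CREATE TABLE bar (type TEXT, code TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [("I", "COMM"), ("I", "CORE"), ("X", "COMM")],
)
conn.executemany(
    "INSERT INTO bar VALUES (?, ?, ?)",
    [("I", "COMM", 1), ("I", "CORE", 1), ("X", "COMM", 0)],
)

# Tuple IN with a subquery (the Oracle-style form).
tuple_in = conn.execute(
    "SELECT value_type, CODE1 FROM foo"
    " WHERE (value_type, CODE1) IN (SELECT type, code FROM bar WHERE active = 1)"
    " ORDER BY CODE1"
).fetchall()

# Correlated EXISTS (the form that also works in SQL Server).
exists_form = conn.execute(
    "SELECT value_type, CODE1 FROM foo"
    " WHERE EXISTS (SELECT 1 FROM bar"
    "               WHERE bar.active = 1"
    "                 AND foo.value_type = bar.type"
    "                 AND foo.CODE1 = bar.code)"
    " ORDER BY CODE1"
).fetchall()

print(tuple_in == exists_form, exists_form)
```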
If you have under 1000 tuples you want to check against and you're using SQL Server 2008+, you can use a table value constructor and perform a join against it. You can only specify up to 1000 rows in a table value constructor, hence the 1000-tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN ( VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is going to be unique; otherwise you'll get duplicate rows. I'm not sure how the performance of this compares to using many ANDs and ORs, but the SQL query is at least much cleaner to look at, in my opinion.
You can also write this to use EXISTS instead of JOIN. That may have different performance characteristics, and it avoids producing duplicate results if your values aren't unique. It may be worth trying both EXISTS and JOIN on your use case to see which is a better fit. Here's how EXISTS would look:
SELECT * FROM <table_name>
WHERE EXISTS (
SELECT 1
FROM (
VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b)
WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. But sometimes that's not possible, e.g. your user lacks the permission to create temporary tables, and then using a table value constructor may be your best choice. Use EXISTS or JOIN, depending on which gives you better performance on your database.
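The duplicate-row difference between JOIN and EXISTS can be seen in a small sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in and a hypothetical table named codes. Because SQLite does not accept the AS MyTable(a, b) alias form on a VALUES subquery, the value list is written as a CTE named wanted instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (value_type TEXT, CODE1 TEXT)")  # hypothetical table
conn.executemany("INSERT INTO codes VALUES (?, ?)", [("I", "COMM"), ("I", "CORE")])

# The value list contains ('I', 'COMM') twice on purpose.
WANTED = "WITH wanted(a, b) AS (VALUES ('I', 'COMM'), ('I', 'COMM'), ('I', 'CORE')) "

# JOIN repeats a table row once per matching entry in the value list.
joined = conn.execute(
    WANTED
    + "SELECT t.value_type, t.CODE1 FROM codes t"
    " JOIN wanted ON wanted.a = t.value_type AND wanted.b = t.CODE1"
).fetchall()

# EXISTS returns each table row at most once, however many entries match.
existing = conn.execute(
    WANTED
    + "SELECT t.value_type, t.CODE1 FROM codes t"
    " WHERE EXISTS (SELECT 1 FROM wanted"
    "               WHERE wanted.a = t.value_type AND wanted.b = t.CODE1)"
    " ORDER BY t.CODE1"
).fetchall()

print(len(joined), existing)
```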
Normally you cannot do it, but you can use the following technique.
SELECT * FROM <table_name>
where (value_type+'/'+CODE1) IN (('I'+'/'+'COMM'),('I'+'/'+'CORE'));
A better solution is to avoid hardcoding your values and put them in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t VALUES ('I','COMM'),('I','CORE')
SELECT DT.*
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.value_type AND T.Code = DT.CODE1
Thus, you avoid storing data in your code (with the persistent table version) and can easily modify the filters without changing the code.
I think you can try this, combining AND and OR in the same query.
SELECT
*
FROM
<table_name>
WHERE
value_type = 'I'
AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is 'join' the columns as a string, and pass your values also combined as strings.
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
The other way is to do multiple ands and ors.
I had a similar, but slightly different, problem in MS SQL. Maybe it will help somebody in the future. In my case I found this solution (not the full code, just an example):
SELECT Table1.Campaign
,Table1.Coupon
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of course, I have an index on Coupon and Campaign in the table for fast searching.
Compute it in MS SQL:
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');

TSQL NOT EXISTS Why is this query so slow?

I'm debugging an app which queries SQL Server 2005. I can't change the query but need to optimise things.
Running each of the selects separately is quick (<1 sec), e.g. select * from acscard, select id from employee... When joined together it takes 50 seconds.
Is it better to set uninteresting accesscardid fields to null or to '' when using EXISTS?
SELECT * FROM ACSCard
WHERE NOT EXISTS
( SELECT Id FROM Employee
WHERE Employee.AccessCardId = ACSCard.acs_card_number )
AND NOT EXISTS
( SELECT Id FROM Visit
WHERE Visit.AccessCardId = ACSCard.acs_card_number )
ORDER by acs_card_id
Do you have indexes on Employee.AccessCardId, Visit.AccessCardId, and ACSCard.acs_card_number?
The SELECT clause is not evaluated in an EXISTS clause. This:
WHERE EXISTS(SELECT 1/0
FROM EMPLOYEE)
...should raise an error for dividing by zero, but it won't. But you need to put something in the SELECT clause for it to be a valid query - it doesn't matter if it's NULL or a zero length string.
In SQL Server, NOT EXISTS (and NOT IN) are better than the LEFT JOIN/IS NULL approach if the columns being compared are not nullable (the values on either side can not be NULL). The columns compared should be indexed, if they aren't already.
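The nullability condition matters for correctness as well as speed: with a NULL in the subquery, NOT IN returns no rows at all, while NOT EXISTS still returns the unmatched ones. A minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACSCard (acs_card_id INTEGER PRIMARY KEY, acs_card_number TEXT)")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, AccessCardId TEXT)")
conn.executemany("INSERT INTO ACSCard VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
# One employee holds card 'A'; another has no card (NULL).
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, "A"), (2, None)])

# NOT EXISTS: the NULL row simply never matches, so cards B and C are returned.
not_exists = [r[0] for r in conn.execute(
    "SELECT acs_card_number FROM ACSCard c"
    " WHERE NOT EXISTS (SELECT 1 FROM Employee e"
    "                   WHERE e.AccessCardId = c.acs_card_number)"
    " ORDER BY acs_card_id"
)]

# NOT IN: comparing against NULL yields unknown, so no card qualifies.
not_in = [r[0] for r in conn.execute(
    "SELECT acs_card_number FROM ACSCard"
    " WHERE acs_card_number NOT IN (SELECT AccessCardId FROM Employee)"
    " ORDER BY acs_card_id"
)]

print(not_exists, not_in)
```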

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundreds of known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
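For completeness, a sketch of how the delimiter trick behaves, assuming SQLite (via Python's sqlite3 module) as a stand-in and invented rows. The surrounding '|' characters are what stop 100 from matching inside 1001.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [(100,), (1001,), (2002,), (4004,)],
)

# Each id is wrapped in '|' before being searched for in the delimited list.
rows = conn.execute(
    "SELECT id FROM some_table"
    " WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%'"
    " ORDER BY id"
).fetchall()
matched = [r[0] for r in rows]
print(matched)
```

Because the pattern is built from the column value, every row has to be examined, which is one reason this approach is a kludge.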
In Oracle, I always put the IDs into a temporary table to perform massive SELECTs and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT);
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table valued user defined function that takes a comma delimited string of IDs, and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell @IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(@IDs) AS [IDs]
ON [IDs].id = myTable.id
END
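The body of fn_split_list is not shown above. As an illustration of the general idea only (not the function described in this answer), the sketch below splits a comma-delimited string with a recursive CTE and joins the result to the table. It assumes SQLite via Python's sqlite3 module, and the sample rows and the split/item/rest names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three"), (4, "four")],
)

# Recursive CTE that peels one comma-separated item off the string per step.
# The seed row has a NULL item, which never joins to any id.
SPLIT_AND_JOIN = """
WITH RECURSIVE split(item, rest) AS (
    SELECT NULL, :ids || ','
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT t.id, t.name
FROM myTable t
JOIN split s ON CAST(s.item AS INTEGER) = t.id
ORDER BY t.id
"""

# The client still concatenates the IDs into one string, as in the answer above.
rows = conn.execute(SPLIT_AND_JOIN, {"ids": "1,3,4"}).fetchall()
print(rows)
```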
The fastest is to put the ids in another table and JOIN
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (id), and all its values would be unique.