SQL multiple columns in IN clause - sql

If we need to query a table based on some set of values for a given column, we can simply use the IN clause.
But if query need to be performed based on multiple columns, we could not use IN clause(grepped in SO threads.)
From other SO threads, we can circumvent this problem using joins or exists clause etc. But they all work if both main table and search data are in the database.
E.g
User table:
firstName, lastName, City
Given a list of (firstname, lastName) tuples, I need to get the cities.
I can think of following solutions.
1
Construct a select query like,
SELECT city from user where (firstName=x and lastName=y) or (firstName=a and lastName=b) or .....
2
Upload all firstName, lastName values into a staging table and perform a join between 'user' table and the new staging table.
Are there any options for solving this problem and what is the preferred of solving this problem in general?

You could do like this:
SELECT city FROM user WHERE (firstName, lastName) IN (('a', 'b'), ('c', 'd'));
The sqlfiddle.

It often ends up being easier to load your data into the database, even if it is only to run a quick query. Hard-coded data seems quick to enter, but it quickly becomes a pain if you start having to make changes.
However, if you want to code the names directly into your query, here is a cleaner way to do it:
with names (fname,lname) as (
values
('John','Smith'),
('Mary','Jones')
)
select city from user
inner join names on
fname=firstName and
lname=lastName;
The advantage of this is that it separates your data out of the query somewhat.
(This is DB2 syntax; it may need a bit of tweaking on your system).

In Oracle you can do this:
SELECT * FROM table1 WHERE (col_a,col_b) IN (SELECT col_x,col_y FROM table2)

In general you can easily write the Where-Condition like this:
select * from tab1
where (col1, col2) in (select col1, col2 from tab2)
Note
Oracle ignores rows where one or more of the selected columns is NULL. In these cases you probably want to make use of the NVL-Funktion to map NULL to a special value (that should not be in the values):
select * from tab1
where (col1, NVL(col2, '---') in (select col1, NVL(col2, '---') from tab2)
oracle sql

Ensure you have an index on your firstname and lastname columns and go with 1. This really won't have much of a performance impact at all.
EDIT: After #Dems comment regarding spamming the plan cache ,a better solution might be to create a computed column on the existing table (or a separate view) which contained a concatenated Firstname + Lastname value, thus allowing you to execute a query such as
SELECT City
FROM User
WHERE Fullname in (#fullnames)
where #fullnames looks a bit like "'JonDoe', 'JaneDoe'" etc

Determine whether the list of names is different with each query or reused. If it is reused, it belongs to the database.
Even if it is unique with each query, it may be useful to load it to a temporary table (#table syntax) for performance reasons - in that case you will be able to avoid recompilation of a complex query.
If the maximum number of names is fixed, you should use a parametrized query.
However, if none of the above cases applies, I would go with inlining the names in the query as in your approach #1.

Related

Optimizing an Oracle SQL query which uses IN clause extensively

I maintain an application where I am trying to optimize an Oracle SQL query wherein multiple IN clauses are used. This query is now a blocker as it hogs nearly 3 minutes of execution time and affects application performance severely.The query is called from Java code(JDBC) and looks like this :
Select disctinct col1,col2,col3,.. colN from Table1
where 1=1 and not(col1 in (idsetone1,idsetone2,... idsetoneN)) or
(col1 in(idsettwo1,idsettwo2,...idsettwoN))....
(col1 in(idsetN1,idsetN2,...idsetNN))
The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible. ID sets have grown over time with use of the application and currently they number more than 10,000 records.
How can I start with optimizing this query ?
I really doupt about "The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible." Of course you can join the tables, provided you got select privileges on it.
Anyway, let's assume it is not possible due to whatever reason. One solution could be to insert all entries first into a Nested Table and the use this one:
CREATE OR REPLACE TYPE NUMBER_TABLE_TYPE AS TABLE OF NUMBER;
Select disctinct col1,col2,col3,.. colN from Table1
where 1=1
and not (col1 NOT MEMBER OF (NUMBER_TABLE_TYPE(idsetone1,idsetone2,... idsetoneN))
OR
(col1 MEMBER OF NUMBER_TABLE_TYPE(idsettwo1,idsettwo2,...idsettwoN))
Regarding the max. number of elements Oracle Documentation says: Because a nested table does not have a declared size, you can put as many elements in the constructor as necessary.
I don't know how serious you can take this statement.
You should put all the items into one temporary table and to an explicit join:
Select your cols
from Table1
left join table_with_items
on table_with_items.id = Table1.col1
where table_with_items.id is null;
Also that distinct suggest a problem in your business logic or in the architecture of application. Why do you have duplicate ids? You should get rid of that distinct.

I would like to treat on the momentary table or inner table of oracle by making two or more values into a line.

Since I am a Japanese, I am poor at English.
Please understand the situation.
There is the following as indispensable requirements.
This requirement is unchangeable.
I know only ID of two or more values.
This number is over 500000.
It acquires early at low cost by 1 time of SQL.
The index is created by id and it is optimized.
The following SQL queries think of me by making these things into the method of taking as a search condition.
select *
from emp
where id in(1,5,7,8.....)
or id in(5000,5002....)
It divides 1000 affairs at a time by "in" after above where.
However, processing takes most time in case of this method.
As a result of investigating many things, it turned out that it is processing time earlier to specify conditions by "exists" rather than having specified conditions by "in".
In order to use "exists", you have to ask by a subquery.
However, it calls by a subquery well by what kind of method, or I cannot imagine.
Someone should teach a good method.
Thank you for your consideration.
If my understanding is correct, you are trying to do this:
select * from emp where emp in (list of several thousand values)
Because oracle only support lists of 1000 values in that construct your code uses a union.
Suggested solution:
Create a global temporary table with an id field the same size as emp.id
Insert the id:s you want to find in this table
Join against this table in your select
create global temporary table temp_id (id number) on commit delete rows;
Your select code can be replaced by:
<code to insert the emp.id:s you want to search for>
select * from emp inner join temp_id tmp on emp.id = temp_id.id;

SQL WHERE ID IN (id1, id2, ..., idn)

I need to write a query to retrieve a big list of ids.
We do support many backends (MySQL, Firebird, SQLServer, Oracle, PostgreSQL ...) so I need to write a standard SQL.
The size of the id set could be big, the query would be generated programmatically. So, what is the best approach?
1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
My question here is. What happens if n is very big? Also, what about performance?
2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn
I think that this approach does not have n limit, but what about performance if n is very big?
3) Writing a programmatic solution:
foreach (var id in myIdList)
{
var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
myObjectList.Add(item);
}
We experienced some problems with this approach when the database server is queried over the network. Normally is better to do one query that retrieve all results versus making a lot of small queries. Maybe I'm wrong.
What would be a correct solution for this problem?
Option 1 is the only good solution.
Why?
Option 2 does the same but you repeat the column name lots of times; additionally the SQL engine doesn't immediately know that you want to check if the value is one of the values in a fixed list. However, a good SQL engine could optimize it to have equal performance like with IN. There's still the readability issue though...
Option 3 is simply horrible performance-wise. It sends a query every loop and hammers the database with small queries. It also prevents it from using any optimizations for "value is one of those in a given list"
An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.
You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.
Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.
What Ed Guiness suggested is really a performance booster , I had a query like this
select * from table where id in (id1,id2.........long list)
what i did :
DECLARE #temp table(
ID int
)
insert into #temp
select * from dbo.fnSplitter('#idlist#')
Then inner joined the temp with main table :
select * from table inner join temp on temp.id = table.id
And performance improved drastically.
First option is definitely the best option.
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
However considering that the list of ids is very huge, say millions, you should consider chunk sizes like below:
Divide you list of Ids into chunks of fixed number, say 100
Chunk size should be decided based upon the memory size of your server
Suppose you have 10000 Ids, you will have 10000/100 = 100 chunks
Process one chunk at a time resulting in 100 database calls for select
Why should you divide into chunks?
You will never get memory overflow exception which is very common in scenarios like yours.
You will have optimized number of database calls resulting in better performance.
It has always worked like charm for me. Hope it would work for my fellow developers as well :)
Doing the SELECT * FROM MyTable where id in () command on an Azure SQL table with 500 million records resulted in a wait time of > 7min!
Doing this instead returned results immediately:
select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id
Use a join.
In most database systems, IN (val1, val2, …) and a series of OR are optimized to the same plan.
The third way would be importing the list of values into a temporary table and join it which is more efficient in most systems, if there are lots of values.
You may want to read this articles:
Passing parameters in MySQL: IN list vs. temporary table
I think you mean SqlServer but on Oracle you have a hard limit how many IN elements you can specify: 1000.
Sample 3 would be the worst performer out of them all because you are hitting up the database countless times for no apparent reason.
Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.
For 1st option
Add IDs into temp table and add inner join with main table.
CREATE TABLE #temp (column int)
INSERT INTO #temp (column)
SELECT t.column1 FROM (VALUES (1),(2),(3),...(10000)) AS t(column1)
Try this
SELECT Position_ID , Position_Name
FROM
position
WHERE Position_ID IN (6 ,7 ,8)
ORDER BY Position_Name

How do I return ALL columns in one table and just a few in another using join?

I would rather not have to list all columns in tableA The '*' works for one table, but I don't want to get back all columns in tableB from JOIN. Reason being, these records are being deleted, and I want to store data from tableA (only) as serialized xml for period of time.
select tableA.*, tableB.col1, tableB.col2, ...
It is a poor practice to ever use select * or select table1.*. It is bad for maintenance and performance both. You should never do that in production code.
Just use the column names that you want.

Sql Un-Wizardry: Compare values from one list in another

I have a comparison I'd like to make more efficient in SQL.
The input field (fldInputField) is a comma separated list of "1,3,4,5"
The database has a field (fldRoleList) which contains "1,2,3,4,5,6,7,8"
So, for the first occurrence of fldInputField within fldRoleList, tell us which value it was.
Is there a way to achieve the following in MySQL or a Stored Procedure?
pseudo-code
SELECT *
FROM aTable t1
WHERE fldInputField in t1.fldRoleList
/pseudo-code
I'm guessing there might be some functions that are best suited for this type of comparison? I couldn't find anything in the search, if someone could direct me I'll delete the question... Thanks!
UPDATE: This isn't the ideal (or good) way to do things. It's inherited code and we are simply trying to put in a quick fix while we look at building in the logic to deal with this via normalized rows.. Luckily this isn't heavily used code.
I agree with #Ken White's answer that comma-delimited lists have no place in a normalized database design.
The solution would be simpler and perform better if you stored the fldRoleList as multiple rows in a dependent table:
SELECT t1.*, r1.fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField);
(see the MySQL function FIND_IN_SET())
But that outputs multiple rows if multiple roles match the comma-separated input string. If you need to restrict the result to one row per aTable entry, with the first matching role:
SELECT t1.*, MIN(r1.fldRole) AS First_fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField);
GROUP BY t1.aTable_id;
You have a terrible schema design, you know. Comma-delimited lists have no business in a DB.
That being said... You're looking for LIKE.
SELECT * FROM aTable t1 WHERE t.fldRoleList LIKE fldInputField + '%'
If the content might not always match at the beginning, add another percent sign before fldInputField.
SELECT * FROM aTable t1 WHERE t.fldRoleList LIKE '%' + fldInputField + '%'