Oracle error ORA-00913 - IN clause limitation with more than 65,000 values (used an OR condition for every 1,000 values)

My application team is trying to fetch 85,000 values from a table using a SELECT query that is being built on the fly by their program.
SELECT * FROM TEST_TABLE
WHERE (
ID IN (00001,00002, ..., 01000)
OR ID IN (01001,01002, ..., 02000)
...
OR ID IN (84001,84002, ..., 85000)
);
But I am getting the error "ORA-00913: too many values".
If I reduce the IN lists to a total of 65,000 values, I do not get this error. Is there any limit on the number of values for an IN clause (combined with OR conditions)?

The issue isn't about in lists; it is about a limit on the number of or-delimited compound conditions. I believe the limit applies not to or specifically, but to any compound conditions using any combination of or, and and not, with or without parentheses. And, importantly, this doesn't seem to be documented anywhere, nor acknowledged by anyone at Oracle.
As you clearly know already, there is a limit of 1000 items in an in list - and you have worked around that.
The parser expands an in condition as a compound, or-delimited condition. The limit that applies to you is the one I mentioned already.
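For example (some_table and the values here are purely illustrative):
-- The IN-list form ...
SELECT * FROM some_table WHERE id IN (1, 2, 3);
-- ... is handled as the equivalent OR-delimited chain of atomic conditions:
SELECT * FROM some_table WHERE id = 1 OR id = 2 OR id = 3;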
The limit is 65,535 "atomic" conditions (put together with or, and, not). It is not difficult to write examples that confirm this.
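Here is a sketch of one such test (the trivial 1=2 conditions and variable names are mine, and it assumes 11g or later so that EXECUTE IMMEDIATE accepts a CLOB). With n = 65535 the statement parses and runs; in my understanding, bumping n to 65536 makes it fail:
DECLARE
  l_sql CLOB;
  l_cnt NUMBER;
  n     PLS_INTEGER := 65535;   -- try 65535, then 65536
BEGIN
  l_sql := 'SELECT COUNT(*) FROM dual WHERE 1=2';
  FOR i IN 2 .. n LOOP
    l_sql := l_sql || ' OR 1=2';   -- one more atomic condition per iteration (slow, but fine for a one-off test)
  END LOOP;
  EXECUTE IMMEDIATE l_sql INTO l_cnt;   -- parse and execute the generated statement
  DBMS_OUTPUT.PUT_LINE('OK with ' || n || ' conditions, count = ' || l_cnt);
END;
/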
The better question is why (and, of course, how to work around it).
My suspicion: To evaluate such compound conditions, the compiled code must use a stack, which is very likely implemented as an array. The array is indexed by unsigned 16-bit integers (why so small, only Oracle can tell). So the stack size can be no more than 2^16 = 65,536; and actually only one less, because Oracle thinks that array indexes start at 1, not at 0 - so they lose one index value (0).
Workaround: create a temporary table to store your 85,000 values. Note that the idea of using tuples (artificial as it is) allows you to overcome the 1000 values limit for a single in list, but it does not work around the limit of 65,535 "atomic" conditions in an or-delimited compound condition; this limit applies in the most general case, regardless of where the conditions come from originally (in lists or anything else).
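A minimal sketch of that workaround (object names are placeholders; the application would normally fill the table with batched/array inserts rather than one statement per value):
CREATE GLOBAL TEMPORARY TABLE tmp_ids (
  id NUMBER PRIMARY KEY
) ON COMMIT PRESERVE ROWS;

INSERT INTO tmp_ids (id) VALUES (1);
INSERT INTO tmp_ids (id) VALUES (2);
-- ... one row per value, 85,000 in total ...

SELECT t.*
FROM   test_table t
       JOIN tmp_ids ON tmp_ids.id = t.id;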
More information on AskTom - you may want to start at the bottom (my comments, which are the last ones in the threads):
https://asktom.oracle.com/pls/apex/f?p=100:11:10737011707014::::P11_QUESTION_ID:9530196000346534356#9545388800346146842
https://asktom.oracle.com/pls/apex/f?p=100:11:10737011707014::::P11_QUESTION_ID:778625947169#9545394700346458835

Related

In Oracle SQL, what is the maximum number of AND clauses in a query?

More of a curiosity question: I am studying SQL and I want to know the maximum number of AND clauses:
WHERE condition1
AND condition2
AND condition3
AND condition4
...
AND condition?
...
AND condition_n;
i.e. what is the biggest possible n? It would seem that, since these could be trivial comparisons, the limit is high.
How far can one go before reaching the limit?
Practically, there is no limit.
Most tools will have some limit on the length of the SQL statement that they can deal with. If you want to get really deep into the weeds, though, you could use the dbms_sql package, which accepts a collection of varchar2(4000) values that together comprise a single SQL statement. That would get you up to 2^32 * 4000 bytes. If we assume that every condition is at least 10 bytes, that puts a reasonable upper limit of 400 * 2^32, which is roughly 1.7 trillion conditions. If you're getting anywhere close to that, you're doing something really wrong. Most tools will have limits that kick in well before that.
Of course, if you did create the largest possible SQL statement using dbms_sql, that SQL statement would require roughly 17 trillion bytes (about 16 TB). A single SQL statement that required 16 TB of storage would probably create other issues...
I put together a simple test case:
select * from dual
where 1=1
and 1=1
...
Using SQL*Plus, I was able to run with 100,000 conditions (admittedly, very simple ones) without an issue. I'd find any use case that came even close to approaching that number to be highly suspect...
I'm not sure about that, because I don't think it is defined even in the Oracle documentation, but it would come down to a few factors.
One is the length of the query. In a few Oracle community/forum posts I have read that in Oracle 9i the maximum length of a SQL statement was 64K, but in later versions that limit is no longer specified; instead it is said to depend on disk space, memory availability, etc.
Also, in a few Oracle forums I have read that Oracle supports 1000 elements in an IN list (IN (a1, a2, ..., a1000)), which gets converted into 1000 OR conditions: a1 OR a2 OR ... OR a1000. My understanding is that if it supports 1000 OR conditions, it should be able to cope with the same number of AND conditions as well.
But ultimately, I don't think there is any documented limit/upper bound.

How many items can be placed in an array in WHERE IN?

I have a query
SELECT [whatever] FROM [somewhere] WHERE [someValue] IN [value1, value2, ..., valueN]
What is the maximum size for N (from valueN above) in an Oracle 10g database? Could it be as high as 10k or 50k?
If you're using the 'expression list' version of the IN condition, which appears to be the case from your question though you're missing the brackets around the list of values, then you're limited by the expression list itself:
A comma-delimited list of expressions can contain no more than 1000 expressions. A comma-delimited list of sets of expressions can contain any number of sets, but each set can contain no more than 1000 expressions.
If you're using the subquery version then there is no limit, other than possibly system resources.
Oracle has a fixed limit of 1000 elements for an IN clause as documented in the manual:
http://docs.oracle.com/cd/E11882_01/server.112/e26088/conditions013.htm#i1050801
You can specify up to 1000 expressions in expression_list.
This thread suggests that the limit is 1000. However, I would suggest you don't even go there and instead place your values in a table and turn your query into a subselect. Much neater, more flexible and better performance.
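A sketch of what that looks like, assuming the values are first staged in a helper table (my_values and val are placeholder names, and the outer identifiers mirror the pseudo-names in the question):
SELECT whatever
FROM   somewhere
WHERE  somevalue IN (SELECT val FROM my_values);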
That depends on the number of rows you have for that particular column. In some cases the table may contain millions of records.

For an Oracle NUMBER datatype, LIKE operator vs BETWEEN..AND operator

Assume mytable is an Oracle table and it has a field called id. The datatype of id is NUMBER(8). Compare the following queries:
select * from mytable where id like '715%'
and
select * from mytable where id between 71500000 and 71599999
I would think the second is more efficient, since I think a "number comparison" would require fewer assembly-language instructions than a "string comparison". I need a confirmation or correction. Please confirm/correct and add any further comments related to either operator.
UPDATE: I forgot to mention 1 important piece of info. id in this case must be an 8-digit number.
If you only want values between 71500000 and 71599999 then yes, the second one is much more efficient. The first one would also return values between 7150-7159, 71500-71599, and so forth. You would either need to sift through unnecessary results or write another couple of lines of code to filter the rest of them out. The second option is definitely more efficient for what you seem to want to do.
It seems like the execution plan on the second query is more efficient.
The first query is doing a full table scan of the IDs, whereas the second query is not.
(The test data and the execution plans for the two queries are not reproduced here.)
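If you want to reproduce the comparison, one way is the standard EXPLAIN PLAN / DBMS_XPLAN combination (this assumes an index exists on id):
EXPLAIN PLAN FOR select * from mytable where id like '715%';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR select * from mytable where id between 71500000 and 71599999;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);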
I don't like the idea of using LIKE with a numeric column.
Also, it may not give the results you are looking for.
If you have a value of 715000000, it will show up in the query result, even though it is larger than 71599999.
Also, I do not like between on principle.
If a thing is between two other things, it should not include those two other things. But this is just a personal annoyance.
I prefer to use >= and <=. This avoids confusion when I read the query. In addition, sometimes I have to change the query to something like >= a and < c. If I had started by using the BETWEEN operator, I would have to rewrite it when I don't want to be inclusive.
Harv
In addition to the other points raised, using LIKE in the manner you suggest would cause Oracle to not use any indexes on the ID column, due to the implicit conversion of the data from number to character, resulting in a full table scan when using LIKE versus an index range scan when using BETWEEN. Assuming, of course, you have an index on ID. Even if you don't, however, Oracle will have to do the type conversion on each value it scans in the LIKE case, which it won't have to do in the other.
You can use a math function; otherwise you have to use the TO_CHAR function to be able to use LIKE, but that will cause performance problems.
select * from mytable where floor(id /100000) = 715
or
select * from mytable where floor(id / 100000) = TO_NUMBER('715') -- this version is parametric

Performance of SQL functions vs. code functions

We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six-digits long so each has two blank characters leading. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right; I was trying to see if there was any overhead in the mere act of calling the function, but it still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause it can often prevent the SQL server from using indexes. This does not apply to transforming returned columns with functions.
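For example, with your ord_no column (the literal value here is made up, and this assumes an index exists on ord_no):
-- Can use an index seek on ord_no: the stored value keeps its leading blanks.
SELECT ord_no FROM oelinhst_sql WHERE ord_no = '  123456';

-- Usually forces a scan: every row's ord_no has to be trimmed before it can be compared.
SELECT ord_no FROM oelinhst_sql WHERE LTRIM(ord_no) = '123456';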
It's typically user defined functions (UDFs) that get a bad rap with regards to SQL performance and might be the source of the advice you're getting.
The reason for this is you can build some pretty hairy functions that cause massive overhead with exponential effect.
As you've found with RTRIM and LTRIM, this isn't a blanket reason to stop using all functions on the SQL side.
It somewhat depends on what all is encompassed by "things like string trimming", but, for string trimming at least, I'd definitely let the database do that (there will be less network traffic as well). As for the indexes, they will still be used if your WHERE clause is just using the column itself (as opposed to a function of the column). Use of the indexes won't be affected whatsoever by using functions on the actual columns you're retrieving (just on how you're selecting the rows).
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said:
our order numbers are six-digits long so each has two blank characters leading
That makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type which will take 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number; over the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
Fully optimise your database before looking to deal with the data outside of it; serving data is what a database server is designed to do.
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster
SELECT SomeFields FROM oelinhst_sql
WHERE
datetimeField >= '1/1/2011'
and
datetimeField < '2/1/2011'
than this
SELECT SomeFields FROM oelinhst_sql
WHERE
Month(datetimeField) = 1
and
year(datetimeField) = 2011
even though the rows that are returned are the same

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. Because the substring call appears to be applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters are in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative length in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so, then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
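For reference, a minimal sketch of that "case when" guard (SQL Server syntax assumed; what to return for the short names is up to you):
SELECT CASE
         WHEN LEN(NAME) > 3 THEN SUBSTRING(NAME, 1, LEN(NAME) - 3)
         ELSE NULL   -- short names: substitute whatever value makes sense here
       END
FROM NAMES;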
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it I will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
See also: TSQL divide by zero encountered despite no columns containing 0.
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the SELECT clause. I guess I'll have to go find the SQL standard myself and see if I can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsgroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
1. Build a working table from all of the table constructors in the FROM clause.
2. Remove from the working table those rows that do not satisfy the WHERE clause.
3. Construct the expressions in the SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called the query execution plan. It is based on query optimization rules, indexes, temporary buffers and execution-time statistics. If you are using SQL Server Management Studio, there is a toolbar above your query editor where you can look at the estimated execution plan; it shows how your query will be transformed to gain some speed. So if you just used your NAMES table and it is in the buffer, the engine might first try to subquery your data and then join it with the other table.