How many items can be placed in an array in WHERE IN? - sql

I have a query
SELECT [whatever] FROM [somewhere] WHERE [someValue] IN [value1, value2, ..., valueN]
What is the maximum size for N (from valueN above) in an Oracle 10g database? Could it be as high as 10k or 50k?

If you're using the 'expression list' version of the IN condition, which appears to be the case from your question though you're missing the brackets around the list of values, then you're limited by the expression list itself:
A comma-delimited list of expressions can contain no more than 1000
expressions. A comma-delimited list of sets of expressions can contain
any number of sets, but each set can contain no more than 1000
expressions.
If you're using the subquery version then there is no limit, other than possibly system resources.

Oracle has a fixed limit of 1000 elements for an IN clause as documented in the manual:
http://docs.oracle.com/cd/E11882_01/server.112/e26088/conditions013.htm#i1050801
You can specify up to 1000 expressions in expression_list.

This thread suggests that the limit is 1000. However, I would suggest you don't even go there and instead place your values in a table and turn your query into a subselect. Much neater, more flexible and better performance.
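A minimal sketch of the table-plus-subselect idea, using Python's built-in sqlite3 module as a stand-in for the real database. The table and column names (somewhere, some_value, wanted_values) are hypothetical, and SQLite does not share Oracle's 1000-item restriction; the point is only the shape of the query.

```python
import sqlite3

# SQLite stands in for the real database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somewhere (some_value INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO somewhere VALUES (?, ?)",
    [(i, f"row {i}") for i in range(10_000)],
)

# Load the lookup values into their own table instead of an IN list.
wanted = range(0, 10_000, 2)  # 5,000 values, well past a 1000-item list
conn.execute("CREATE TEMP TABLE wanted_values (v INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_values VALUES (?)", [(v,) for v in wanted])

# The IN condition now takes a subquery, so no expression list is involved.
matched = conn.execute(
    "SELECT COUNT(*) FROM somewhere "
    "WHERE some_value IN (SELECT v FROM wanted_values)"
).fetchone()[0]
print(matched)
```

In Oracle the same shape would typically use a global temporary table or a regular staging table for the values.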

That depends on the number of rows you have for that particular column. In some cases the table may hold millions of records.

Related

Conditionally LIMIT in BigQuery

I have read that in Postgres setting LIMIT NULL will effectively not limit the results of the SELECT. However in BigQuery when I set LIMIT NULL based on a condition I see Syntax error: Unexpected keyword NULL.
I'd like to figure out a way to limit or not based on a condition (could be an argument passed into a procedure, or a parameter passed in by a query job, anything I can write a CASE or IF statement for). The mechanism for setting the condition shouldn't matter, what I'm looking for is whether there is a way to syntactically indicate a value for LIMIT, that will not limit, in a valid way to BigQuery.
The LIMIT clause works differently within BigQuery. It specifies the maximum number of rows in the result. The count n in LIMIT n must be a constant INT64.
You can stay within the limit on cached result size by:
Using filters to limit the result set.
Using a LIMIT clause to reduce the result set, especially if you are using an ORDER BY clause.
You can see this example:
SELECT
title
FROM
`my-project.mydataset.mytable`
ORDER BY
title DESC
LIMIT
100
This will only return 100 rows.
The best practice is to use it if you are sorting a very large number of values. You can see this document with examples.
If you want to return all rows from a table, you need to omit the LIMIT clause.
SELECT
title
FROM
`my-project.mydataset.mytable`
ORDER BY
title DESC
This example will return all the rows from a table. It is not recommended to omit LIMIT if your tables are too large, as it will consume a lot of resources.
One solution to optimize resources is to use clustered tables. This will save costs and querying time. You can see this document with a detailed explanation of how it works.
You can write a stored procedure that dynamically creates a query based on input parameters. Once your SQL query is ready, you can use EXECUTE IMMEDIATE to run it. In this way, you can control what value, if any, is provided to the LIMIT clause of your query.
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#execute_immediate
Hope this answers your query.
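The same idea can be applied on the client side before the query is submitted. The sketch below is an illustration, not BigQuery API code: build_query is a hypothetical helper that appends LIMIT only when a limit is supplied. Inside a BigQuery script, the equivalent would be to assemble the string the same way and pass it to EXECUTE IMMEDIATE.

```python
from typing import Optional


def build_query(table: str, limit: Optional[int] = None) -> str:
    """Return a SELECT statement, appending LIMIT only when a limit is given.

    build_query is a hypothetical helper; the column name and ordering
    mirror the example query above.
    """
    sql = f"SELECT title FROM `{table}` ORDER BY title DESC"
    if limit is not None:
        limit = int(limit)  # rejects non-numeric input before it reaches the SQL
        if limit < 0:
            raise ValueError("LIMIT must be non-negative")
        sql += f" LIMIT {limit}"
    return sql


limited = build_query("my-project.mydataset.mytable", 100)
unlimited = build_query("my-project.mydataset.mytable")
print(limited)
print(unlimited)
```

Omitting the clause entirely, rather than looking for a "no limit" value, sidesteps the syntax error from LIMIT NULL.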

Oracle error code ORA-00913 - IN CLAUSE limitation with more than 65000 values (Used OR condition for every 1k values)

My application team is trying to fetch 85,000 values from a table using a SELECT query that is being built on the fly by their program.
SELECT * FROM TEST_TABLE
WHERE (
ID IN (00001,00002, ..., 01000)
OR ID IN (01001,01002, ..., 02000)
...
OR ID IN (84001,84002, ..., 85000)
);
But I am getting the error "ORA-00913 too many values".
If I reduce the IN clauses to only 65,000 values, I do not get this error. Is there a limit on the number of values in IN clauses combined with OR?
The issue isn't about in lists; it is about a limit on the number of or-delimited compound conditions. I believe the limit applies not to or specifically, but to any compound conditions using any combination of or, and and not, with or without parentheses. And, importantly, this doesn't seem to be documented anywhere, nor acknowledged by anyone at Oracle.
As you clearly know already, there is a limit of 1000 items in an in list - and you have worked around that.
The parser expands an in condition as a compound, or-delimited condition. The limit that applies to you is the one I mentioned already.
The limit is 65,535 "atomic" conditions (put together with or, and, not). It is not difficult to write examples that confirm this.
The better question is why (and, of course, how to work around it).
My suspicion: To evaluate such compound conditions, the compiled code must use a stack, which is very likely implemented as an array. The array is indexed by unsigned 16-bit integers (why so small, only Oracle can tell). So the stack size can be no more than 2^16 = 65,536; and actually only one less, because Oracle thinks that array indexes start at 1, not at 0 - so they lose one index value (0).
Workaround: create a temporary table to store your 85,000 values. Note that the idea of using tuples (artificial as it is) allows you to overcome the 1000 values limit for a single in list, but it does not work around the limit of 65,535 "atomic" conditions in an or-delimited compound condition; this limit applies in the most general case, regardless of where the conditions come from originally (in lists or anything else).
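If the program that builds the statement has to keep generating IN lists, it can at least check the totals before sending the SQL. The sketch below assumes the two figures from this answer: 1000 items per IN list (documented) and 65,535 atomic conditions overall (the answerer's observation, not a documented value). build_in_filter is a hypothetical helper and handles integer IDs only.

```python
MAX_IN_LIST = 1000               # documented expression-list limit
MAX_ATOMIC_CONDITIONS = 65_535   # limit reported in this answer; undocumented


def build_in_filter(column, values):
    """Build '(col IN (...) OR col IN (...))' with at most 1000 items per list.

    Raises ValueError when the total would exceed the reported limit, in
    which case a temporary table is the way to go.
    """
    values = [int(v) for v in values]
    if not values:
        raise ValueError("no values supplied")
    if len(values) > MAX_ATOMIC_CONDITIONS:
        raise ValueError(
            f"{len(values)} values exceed {MAX_ATOMIC_CONDITIONS}; "
            "load them into a temporary table instead"
        )
    chunks = [values[i:i + MAX_IN_LIST] for i in range(0, len(values), MAX_IN_LIST)]
    parts = [f"{column} IN ({', '.join(map(str, chunk))})" for chunk in chunks]
    return "(" + " OR ".join(parts) + ")"


small_filter = build_in_filter("ID", range(1, 2501))
print(small_filter.count(" IN ("))
```

Calling it with 85,000 values raises instead of producing a statement that the database would reject.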
More information on AskTom - you may want to start at the bottom (my comments, which are the last ones in the threads):
https://asktom.oracle.com/pls/apex/f?p=100:11:10737011707014::::P11_QUESTION_ID:9530196000346534356#9545388800346146842
https://asktom.oracle.com/pls/apex/f?p=100:11:10737011707014::::P11_QUESTION_ID:778625947169#9545394700346458835

Is it possible to use LIKE with a set of strings instead of a single element?

I have a list of proper names (in a table), and another table with a free-text field. I want to check whether that field contains any of the proper names. If it were just one, I could do
WHERE free_text LIKE "%proper_name%"
but how do you do that for an entire list? Is there a better string function I can use with a list?
Thanks
No, like does not have that capability.
Many databases support regular expressions, which enable you to do what you want. For instance, in Postgres this is phrased as:
where free_text ~ 'name1|name2|name3'
Many databases also have full-text search capabilities that speed such searches.
Both capabilities are highly specific to the database you are using.
Well, you can use LIKE in a standard JOIN, but the query will most likely be slow, because it searches for each proper name in each free_text value.
For example, if you have 10 proper names in the list and a certain free_text value contains the first name, the server will still go on to check the remaining 9 names.
Here is the query:
SELECT -- DISTINCT
free_text_table.*
FROM
free_text_table
INNER JOIN proper_names_table ON free_text_table.free_text LIKE proper_names_table.proper_name
;
If a certain free_text value contains several proper names, that row will be returned several times, so you may need to add DISTINCT to the query. It depends on what you need.
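A runnable sketch of the join approach, using Python's sqlite3 module as a stand-in database with made-up sample rows. It assumes the stored names contain no wildcards themselves, so the pattern wraps each name in % on both sides (written with SQLite's || concatenation; other databases use + or CONCAT). Note that SQLite's LIKE is case-insensitive for ASCII by default.

```python
import sqlite3

# SQLite stands in for the real database; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proper_names_table (proper_name TEXT);
    CREATE TABLE free_text_table (id INTEGER, free_text TEXT);
    INSERT INTO proper_names_table VALUES ('Alice'), ('Bob');
    INSERT INTO free_text_table VALUES
        (1, 'Alice met Bob at noon'),
        (2, 'Nothing to report'),
        (3, 'Call Alice back');
""")

JOIN_SQL = """
    SELECT {distinct} f.id
    FROM free_text_table AS f
    INNER JOIN proper_names_table AS p
        ON f.free_text LIKE '%' || p.proper_name || '%'
    ORDER BY f.id
"""

# Row 1 mentions two names, so it comes back twice without DISTINCT.
with_duplicates = [r[0] for r in conn.execute(JOIN_SQL.format(distinct=""))]
deduplicated = [r[0] for r in conn.execute(JOIN_SQL.format(distinct="DISTINCT"))]
print(with_duplicates, deduplicated)
```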
It is possible to use a LATERAL JOIN to avoid the Cartesian product (where each row in free_text_table is compared to each row in proper_names_table). The end result may be faster than the simple variant. It depends on your data distribution.
Here is SQL Server syntax.
SELECT
free_text_table.*
FROM
free_text_table
CROSS APPLY
(
SELECT TOP(1)
proper_names_table.proper_name
FROM proper_names_table
WHERE free_text_table.free_text LIKE proper_names_table.proper_name
-- ORDER BY proper_names_table.frequency
) AS A
;
Here we don't need DISTINCT: there will be at most one row in the result for each row from free_text_table (one or zero). The optimiser should be smart enough to stop reading and processing proper_names_table as soon as the first match is found, thanks to the TOP(1) clause.
If you can also order your proper names so that those most likely to be found come first, the query is more likely to be faster than a simple JOIN. (Add a suitable ORDER BY clause in the subquery.)

In Oracle SQL , what is the maximum number of AND clauses in a query?

More of a curiosity question. I am studying SQL and want to know the maximum number of AND clauses:
WHERE condition1
AND condition2
AND condition3
AND condition4
...
AND condition?
...
AND condition_n;
i.e. what is the biggest possible n? It would seem that since these could be trivial comparisons, the limit is high.
How far can one go before reaching the limit?
Practically, there is no limit.
Most tools will have some limit on the length of the SQL statement that they can deal with. If you want to get really deep into the weeds, though, you could use the dbms_sql package, which accepts a collection of varchar2(4000) that comprise a single SQL statement. That would get you up to 2^32 * 4000 bytes. If we assume that every condition is at least 10 bytes, that puts a reasonable upper limit of 400 * 2^32, which is roughly 1.7 trillion conditions. If you're getting anywhere close to that, you're doing something really wrong. Most tools will have limits that kick in well before that.
Of course, if you did create the largest possible SQL statement using dbms_sql, that SQL statement would require ~17 trillion bytes. A single SQL statement that required about 17 TB of storage would probably create other issues...
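The estimate can be checked with a few lines of arithmetic. Taking the figures used in this answer (2^32 collection entries of 4000 bytes each, and an assumed minimum of 10 bytes per condition), the products come to about 17 trillion bytes and about 1.7 trillion conditions:

```python
CHUNKS = 2 ** 32           # collection entries, per the answer
CHUNK_BYTES = 4000         # one varchar2(4000) per entry
MIN_CONDITION_BYTES = 10   # assumed minimum size of a single condition

max_statement_bytes = CHUNKS * CHUNK_BYTES
max_conditions = max_statement_bytes // MIN_CONDITION_BYTES

print(f"{max_statement_bytes:,} bytes")      # 17,179,869,184,000 bytes
print(f"{max_conditions:,} conditions")      # 1,717,986,918,400 conditions
```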
I put together a simple test case:
select * from dual
where 1=1
and 1=1
...
Using SQL*Plus, I was able to run with 100,000 conditions (admittedly, very simple ones) without an issue. I'd find any use case that came even close to approaching that number to be highly suspect...
I'm not sure about that, because I don't think it is defined even in the Oracle specification, but a few factors would determine it.
Length of the query: in a few Oracle community/forum posts I have read that in Oracle 9i the maximum length of an SQL statement is 64k, but in later versions no fixed limit is specified; instead it is said to depend on disk space, memory availability, etc.
Again, in a few Oracle forums I have read that Oracle supports 1000 elements in an IN list (IN (a1,a2,...,a1000)), which gets converted to 1000 OR conditions like a1 OR a2 OR ... OR a1000. With that, my understanding is that if it supports 1000 OR conditions, it will be able to cope with the same number of AND conditions as well.
But ultimately, I don't think there is any documented limit or upper bound.

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. It appears that the substring call is being applied before the filtering happens, so the query fails.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns, holding 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters are in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this gives me an error, because when it reaches "X" it tries to use a negative number for the length in the substring call, and it fails.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so, then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on: the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it, I will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if I can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsgroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of the table constructors in the FROM clause.
Remove from the working table those rows that do not satisfy the WHERE clause.
Construct the expressions in the SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
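The three conceptual steps can be sketched in plain Python using the NAMES and LONG_NAMES rows from the question. This models only the logical order described here, not what any optimizer actually does. It treats the vendor's join condition as a WHERE filter over a cross product, and cut_last_three is a hypothetical stand-in for the substring expression.

```python
names = [{"id": 1, "name": "Peter"}, {"id": 2, "name": "X"}]
long_names = [{"id": 1, "name_length": 5}]


def cut_last_three(name):
    """Mimic substring(NAME, 1, len(NAME) - 3), failing on a negative length."""
    length = len(name) - 3
    if length < 0:
        raise ValueError("negative length passed to substring")
    return name[:length]


# Step 1: build the working table from the FROM clause (a cross product).
working = [(n, ln) for n in names for ln in long_names]

# Step 2: remove the rows that do not satisfy the filter condition.
filtered = [(n, ln) for n, ln in working if n["id"] == ln["id"]]

# Step 3: evaluate the SELECT expressions against the remaining rows only.
result = [cut_last_three(n["name"]) for n, _ in filtered]
print(result)
```

Evaluating cut_last_three against every row of the working table before step 2 would raise on "X", which is the failure described in the question.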
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
What you are describing is the query execution plan. It is based on query optimization rules, indexes, temporary buffers, and execution-time statistics. If you are using SQL Server Management Studio, the toolbar above the query editor lets you view the estimated execution plan, which shows how the engine will rearrange your query to gain some speed. So if you have just used your NAMES table and it is in the buffer, the engine might first run the subquery against that data and then join it with the other table.