How to put more than 1 million IDs using UNION ALL [duplicate] - sql

I have comma-delimited IDs that I want to use in a NOT IN clause.
I'm using Oracle 11g.
select * from table where ID NOT IN (1,2,3,4,...,1001,1002,...)
results in
ORA-01795: maximum number of expressions in a list is 1000
I don't want to use a temp table. I am considering doing this instead:
select * from table1 where ID NOT IN (1,2,3,4,…,1000) AND
ID NOT IN (1001,1002,…,2000)
Is there a better workaround for this issue?

You said you don't want to, but: use a temporary table. That's the correct solution here.
Query parsing is expensive in Oracle, and that's what you'll get when you put thousands of identifiers into a giant blob of SQL. Also, there are ill-defined limits on query length that you're going to hit. Doing an anti-JOIN against a table, on the other hand... Oracle is good at that. Bulk loading data into a table, Oracle is good at that too. Use a temp table.
Limiting IN to a thousand entries is a sanity check. The fact that you're hitting it means you're trying to do something insane.
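The temp-table approach can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for Oracle; the names `excluded_ids`, `rows_not_in` and the `name` column are assumptions made for the example, and on Oracle the table would typically be a global temporary table loaded through array binds.

```python
import sqlite3

def rows_not_in(conn, ids):
    """Return rows of table1 whose id is absent from ids, using a temp table."""
    cur = conn.cursor()
    # Hypothetical temp table that holds the IDs to exclude.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS excluded_ids (id INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM excluded_ids")
    # Bulk-load with bind parameters: the SQL text stays the same size
    # no matter how many IDs there are.
    cur.executemany("INSERT OR IGNORE INTO excluded_ids (id) VALUES (?)",
                    ((i,) for i in ids))
    # Anti-join: keep only the rows that have no match in the temp table.
    cur.execute(
        "SELECT t.id, t.name FROM table1 t "
        "WHERE NOT EXISTS (SELECT 1 FROM excluded_ids e WHERE e.id = t.id) "
        "ORDER BY t.id"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 5001)])

# Exclude IDs 1..4998, far more than a 1000-item IN list would allow.
remaining = rows_not_in(conn, range(1, 4999))
print(remaining)
```

The query text never changes with the number of IDs, so there is nothing for the parser to choke on.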

Stepping back from the question: can you combine the query that produces these 1000+ IDs with this query? That would be the better way to simplify your SQL.

It's insane.
But you can probably try to select from select:
SELECT * FROM
(SELECT * FROM table WHERE ID NOT IN (1,2,3,4,...,1000))
WHERE ID NOT IN (1001,1002,…,2000)
Make as many levels as you need.

Use MINUS, the opposite of UNION:
SELECT * FROM TABLE
MINUS
SELECT T.* FROM TABLE T,TABLE2 T2 WHERE T.ID = T2.ID
This returns the records of table T whose id is not in table2 T2.
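A small way to try the set-difference idea is sketched below with Python's sqlite3 module. SQLite has no MINUS keyword, so the sketch uses EXCEPT, the standard SQL operator that does the same job; the tables `t` and `t2` and their contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t  (id INTEGER, name TEXT);
CREATE TABLE t2 (id INTEGER);
INSERT INTO t  VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO t2 VALUES (2);
""")

# All rows of t, minus the rows of t that have a matching id in t2.
only_in_t = sorted(conn.execute("""
    SELECT * FROM t
    EXCEPT
    SELECT t.* FROM t JOIN t2 ON t.id = t2.id
""").fetchall())
print(only_in_t)
```

Like MINUS, EXCEPT also removes duplicate rows from its result, which matters if the table has no unique key.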

Related

SQL Like Operator very slow when using from Another table in AWS Athena

I have an SQL query in Athena that is very slow when the LIKE operator takes its value from another table:
Select * from table1 t1
Where t1.value like (select
concat('%',t2.value,'%') as val
from table2 t2 where t2.id =1
limit 1)
The above query is very slow.
When I use something like the query below, it is very fast:
Select * from table1 t1
Where t1.value like
'%somevalue%'
In my scenario the LIKE value is not fixed; it can change over time, which is why I need to read it from another table.
Please suggest the fastest way.
"Slow" is a relative term, but a query that joins two tables will always be slower than a query that doesn't. A query that compares against a pattern that needs to be looked up in another table at query time will always be slower than a query that uses a static pattern.
Does that mean that the query with the subquery is slow? Perhaps, but you have to base that on what you're actually asking the query engine to do.
Let's dissect what your query is doing:
The outer query looks for all columns of all rows of the first table where one of the columns contains a particular string.
That string is dynamically looked up by scanning every row in the second table looking for a row with a particular value for the id column.
In other words, the query with the static pattern scans only the first table, but the query with the subquery scans both tables. That's always going to be slower, because it's doing more work. How much more work? That depends on the sizes of the tables. You aren't specifying the running times of any of the queries or the sizes of the tables, so it's hard to know.
You don't provide enough context in your question for an answer any more precise than this. We can only respond with generalities like: if it's slow then don't use LIKE, that's a slow operation. Don't use a subquery that reads the whole second table, that's slow.
I have found another way to do the same thing, and it is much faster in Athena. POSITION searches for a literal substring, so the % wildcards are not needed:
Select * from table1 t1
Where POSITION((select t2.value from table2 t2 where t2.id = 1 limit 1) in t1.value) > 0

Difference between two tables, unknown fields

Is there a way in Access using SQL to get the difference between 2 tables?
I'm building an audit function and I want to return all records from table1 where a value (or values) doesn't match the corresponding record in table2. Primary keys will always match between the two tables. They will always contain the exact same number of fields, field names, and types, as each other. However, the number and name of those fields cannot be determined before the query is run.
Please also note, I am looking for an Access SQL solution. I know how to solve this with VBA.
Thanks,
There are several possibilities to compare fields with known names, but there is no way in SQL to access fields without knowing their names, mostly because SQL doesn't consider fields to have a specific order in a table.
So the only way to accomplish what you need in pure Access SQL would be a dedicated SQL command for it (something like the * placeholder for all fields). But there isn't one; see the Microsoft Access SQL Reference.
What you COULD do is create the SQL statement on the fly in VBA. (I know, you said you didn't want to do it in VBA, but this still does the work in SQL and only uses VBA to create the SQL.)
Doing everything in VBA would probably take some time, but creating an SQL on the fly is very fast and you can optimize it to the specific table. Then executing the SQL is the fastest solution you can get.
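The "build the SQL on the fly" idea can be sketched like this. The sketch is in Python with sqlite3 rather than VBA and Access, so the catalog lookup (`PRAGMA table_info`) and the NULL-safe `IS NOT` comparison are SQLite specifics; in Access the field names would come from the TableDef instead. The function name `build_diff_sql` and the sample tables are assumptions for illustration, and table and column names are assumed to be trusted, simple identifiers.

```python
import sqlite3

def build_diff_sql(conn, left, right, key):
    """Generate a query returning rows of `left` that differ from `right`."""
    # Column names are discovered at run time (row[1] is the column name).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({left})")]
    compared = [c for c in cols if c != key]
    if not compared:
        raise ValueError("no non-key columns to compare")
    # "IS NOT" is SQLite's NULL-safe inequality, so NULL vs NULL counts as equal.
    conditions = " OR ".join(f"a.{c} IS NOT b.{c}" for c in compared)
    return (f"SELECT a.* FROM {left} a JOIN {right} b "
            f"ON a.{key} = b.{key} WHERE {conditions}")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER);
INSERT INTO table1 VALUES (1, 'a', 10), (2, 'b', 20), (3, NULL, 30);
INSERT INTO table2 VALUES (1, 'a', 10), (2, 'x', 20), (3, NULL, 30);
""")

sql = build_diff_sql(conn, "table1", "table2", "id")
diff = conn.execute(sql).fetchall()
print(sql)
print(diff)
```

The generated statement is ordinary SQL, so the comparison itself still runs inside the database engine.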
It's hard to be sure without your table structure, but you can probably get that done with the NOT IN operator or with WHERE NOT EXISTS, like this:
select * from table1
where some_field not in (select some_other_field from table2);
(OR)
select * from table1 t1
where not exists (select 1 from table2 where some_other_field = t1.some_field);
SELECT A.*, B.* FROM A FULL JOIN B ON (A.C = B.C) WHERE A.C IS NULL OR B.C IS NULL;
If you have tables A and B, both with column C, the query above returns the records that are present in only one of the two tables. To get all the differences with a single query, a full join must be used, as above.

Optimizing an Oracle SQL query which uses IN clause extensively

I maintain an application where I am trying to optimize an Oracle SQL query in which multiple IN clauses are used. This query is now a blocker, as it takes nearly 3 minutes to execute and severely affects application performance. The query is called from Java code (JDBC) and looks like this:
Select distinct col1,col2,col3,.. colN from Table1
where 1=1 and not(col1 in (idsetone1,idsetone2,... idsetoneN)) or
(col1 in(idsettwo1,idsettwo2,...idsettwoN))....
(col1 in(idsetN1,idsetN2,...idsetNN))
The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible. ID sets have grown over time with use of the application and currently they number more than 10,000 records.
How can I start optimizing this query?
I really doubt the statement "The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible." Of course you can join the tables, provided you have select privileges on them.
Anyway, let's assume it is not possible for whatever reason. One solution could be to put all the entries into a nested table first and then use that:
CREATE OR REPLACE TYPE NUMBER_TABLE_TYPE AS TABLE OF NUMBER;
Select distinct col1,col2,col3,.. colN from Table1
where 1=1
and (col1 NOT MEMBER OF NUMBER_TABLE_TYPE(idsetone1,idsetone2,... idsetoneN)
OR
col1 MEMBER OF NUMBER_TABLE_TYPE(idsettwo1,idsettwo2,...idsettwoN))
Regarding the maximum number of elements, the Oracle documentation says: Because a nested table does not have a declared size, you can put as many elements in the constructor as necessary.
I don't know how seriously you can take this statement.
You should put all the items into one temporary table and do an explicit join:
Select your cols
from Table1
left join table_with_items
on table_with_items.id = Table1.col1
where table_with_items.id is null;
Also, that distinct suggests a problem in your business logic or in the architecture of the application. Why do you have duplicate ids? You should get rid of that distinct.

SQL WHERE ID IN (id1, id2, ..., idn)

I need to write a query to retrieve a big list of ids.
We support many backends (MySQL, Firebird, SQLServer, Oracle, PostgreSQL ...) so I need to write standard SQL.
The size of the id set could be big, the query would be generated programmatically. So, what is the best approach?
1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
My question here is: what happens if n is very big? Also, what about performance?
2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn
I think that this approach does not have n limit, but what about performance if n is very big?
3) Writing a programmatic solution:
foreach (var id in myIdList)
{
var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
myObjectList.Add(item);
}
We experienced some problems with this approach when the database server is queried over the network. Normally it is better to run one query that retrieves all results than to make a lot of small queries. Maybe I'm wrong.
What would be a correct solution for this problem?
Option 1 is the only good solution.
Why?
Option 2 does the same but you repeat the column name lots of times; additionally the SQL engine doesn't immediately know that you want to check if the value is one of the values in a fixed list. However, a good SQL engine could optimize it to have equal performance like with IN. There's still the readability issue though...
Option 3 is simply horrible performance-wise. It sends a query every loop and hammers the database with small queries. It also prevents it from using any optimizations for "value is one of those in a given list"
An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.
You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.
Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.
What Ed Guiness suggested is really a performance booster. I had a query like this:
select * from table where id in (id1,id2.........long list)
What I did:
DECLARE @temp table(
ID int
)
insert into @temp
select * from dbo.fnSplitter('#idlist#')
Then I inner joined the temp table with the main table:
select * from table inner join @temp t on t.id = table.id
And performance improved drastically.
First option is definitely the best option.
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
However, if the list of ids is huge, say millions, you should process it in chunks, like below:
Divide your list of ids into chunks of a fixed size, say 100.
The chunk size should be decided based on the memory size of your server.
Suppose you have 10000 ids; you will have 10000/100 = 100 chunks.
Process one chunk at a time, resulting in 100 database calls for the select.
Why should you divide into chunks?
You avoid the out-of-memory exceptions that are very common in scenarios like yours.
You keep the number of database calls bounded and predictable, resulting in better performance than one call per id.
It has always worked like charm for me. Hope it would work for my fellow developers as well :)
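The chunking steps above can be sketched as follows. This is a minimal illustration using Python's sqlite3 module; the `items` table, the `fetch_by_ids` helper and the chunk size of 100 are assumptions for the example, not a recommendation for any particular server.

```python
import sqlite3

def chunks(ids, size):
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_by_ids(conn, ids, chunk_size=100):
    """Run one IN query per chunk; return (all rows, number of calls)."""
    rows, calls = [], 0
    for chunk in chunks(ids, chunk_size):
        # One bind placeholder per id in this chunk, e.g. "?,?,?".
        placeholders = ",".join("?" * len(chunk))
        query = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        rows.extend(conn.execute(query, chunk))
        calls += 1
    return rows, calls

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 10001)])

rows, calls = fetch_by_ids(conn, list(range(1, 10001)), chunk_size=100)
print(len(rows), calls)
```

Using bind placeholders instead of pasting the ids into the SQL text also keeps every chunk's statement identical, which lets the database reuse the parsed statement.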
Doing the SELECT * FROM MyTable where id in () command on an Azure SQL table with 500 million records resulted in a wait time of > 7min!
Doing this instead returned results immediately:
select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id
Use a join.
In most database systems, IN (val1, val2, …) and a series of OR are optimized to the same plan.
The third way would be to import the list of values into a temporary table and join against it, which is more efficient in most systems if there are lots of values.
You may want to read this article:
Passing parameters in MySQL: IN list vs. temporary table
I think you mean SQL Server, but on Oracle there is a hard limit on how many IN elements you can specify: 1000.
Sample 3 would be the worst performer out of them all because you are hitting up the database countless times for no apparent reason.
Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.
For the 1st option:
Add the IDs to a temp table and inner join it with the main table.
CREATE TABLE #temp (id int)
INSERT INTO #temp (id)
SELECT t.column1 FROM (VALUES (1),(2),(3),...(10000)) AS t(column1)
Try this
SELECT Position_ID , Position_Name
FROM
position
WHERE Position_ID IN (6 ,7 ,8)
ORDER BY Position_Name

Sql Un-Wizardry: Compare values from one list in another

I have a comparison I'd like to make more efficient in SQL.
The input field (fldInputField) is a comma separated list of "1,3,4,5"
The database has a field (fldRoleList) which contains "1,2,3,4,5,6,7,8"
So, for the first occurrence of fldInputField within fldRoleList, tell us which value it was.
Is there a way to achieve the following in MySQL or a Stored Procedure?
pseudo-code
SELECT *
FROM aTable t1
WHERE fldInputField in t1.fldRoleList
/pseudo-code
I'm guessing there might be some functions that are best suited for this type of comparison? I couldn't find anything in the search, if someone could direct me I'll delete the question... Thanks!
UPDATE: This isn't the ideal (or good) way to do things. It's inherited code and we are simply trying to put in a quick fix while we look at building in the logic to deal with this via normalized rows.. Luckily this isn't heavily used code.
I agree with @Ken White's answer that comma-delimited lists have no place in a normalized database design.
The solution would be simpler and perform better if you stored the fldRoleList as multiple rows in a dependent table:
SELECT t1.*, r1.fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField);
(see the MySQL function FIND_IN_SET())
But that outputs multiple rows if multiple roles match the comma-separated input string. If you need to restrict the result to one row per aTable entry, with the first matching role:
SELECT t1.*, MIN(r1.fldRole) AS First_fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField)
GROUP BY t1.aTable_id;
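For comparison, the matching rule from the question can be sketched outside the database. This is a plain Python illustration under the assumption that "first" means first in the order of the input list; the function name `first_matching_role` is made up for the example. Note that this differs from the MIN() query above, which returns the smallest matching role rather than the first one in input order.

```python
def first_matching_role(input_field, role_list):
    """Return the first value of input_field that also occurs in role_list."""
    # Split the stored list once; a set makes each membership test cheap.
    roles = set(role_list.split(","))
    for value in input_field.split(","):
        if value in roles:
            return value
    return None  # no value of the input list is present in the role list

print(first_matching_role("1,3,4,5", "1,2,3,4,5,6,7,8"))
print(first_matching_role("9,3,2", "1,2,3"))
```

Comparing whole list items, as done here and by FIND_IN_SET, avoids the false positives of substring matching, where "1" would also match inside "10".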
You have a terrible schema design, you know. Comma-delimited lists have no business in a DB.
That being said... You're looking for LIKE.
SELECT * FROM aTable t1 WHERE t1.fldRoleList LIKE CONCAT(fldInputField, '%')
If the content might not always match at the beginning, add another percent sign before fldInputField.
SELECT * FROM aTable t1 WHERE t1.fldRoleList LIKE CONCAT('%', fldInputField, '%')