How to make a query with IN in the WHERE clause run faster - sql

I have this query in Oracle 11g:
SELECT DOC_ID,DOC_NAME,DESC
FROM TABLE1
WHERE DOC_ID NOT IN(
SELECT DOC_ID FROM TABLE2
)
The query above runs very slowly because the tables hold a lot of data.
Is there a way to get the same result with better performance?
Any help is much appreciated.
Thanks.

Using WHERE NOT EXISTS may perform better:
SELECT DOC_ID,DOC_NAME,DESCr
FROM TABLE1 t1
WHERE not exists (
SELECT 1 FROM TABLE2 where
doc_id = t1.doc_id
);
Example: http://sqlfiddle.com/#!4/4b59e/3
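One thing worth checking before swapping NOT IN for NOT EXISTS is whether TABLE2.DOC_ID can contain NULL, because the two forms then return different rows. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database standing in for Oracle, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (doc_id INTEGER, doc_name TEXT);
    CREATE TABLE table2 (doc_id INTEGER);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2), (NULL);
""")

# NOT IN: a NULL in the subquery makes the comparison unknown for every row.
not_in = conn.execute(
    "SELECT doc_id FROM table1 "
    "WHERE doc_id NOT IN (SELECT doc_id FROM table2) "
    "ORDER BY doc_id"
).fetchall()

# NOT EXISTS: the NULL row simply never matches, so unmatched rows come back.
not_exists = conn.execute(
    "SELECT doc_id FROM table1 t1 "
    "WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.doc_id = t1.doc_id) "
    "ORDER BY doc_id"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(1,), (3,)]
```

If the column is declared NOT NULL, both forms return the same rows and the choice comes down to which plan the optimizer produces.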

I wouldn't use NOT IN for that. If you join on what I imagine is one of your keys, it should be much faster:
select tb1.DOC_ID, tb1.DOC_NAME, tb1.DESC
from table1 tb1
left join table2 tb2
on tb1.DOC_ID = tb2.DOC_ID
where tb2.DOC_ID is null
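A left join used this way returns the unmatched rows only when the filter tests the right-hand key for IS NULL; testing for IS NOT NULL returns the matched rows instead. Here is a small sketch of the difference, again using an in-memory SQLite database as a stand-in and made-up rows:

```python
import sqlite3

# In-memory SQLite stands in for Oracle; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (doc_id INTEGER PRIMARY KEY, doc_name TEXT);
    CREATE TABLE table2 (doc_id INTEGER);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2), (3);
""")

# The same left join, filtered two ways on the right-hand key.
join_sql = """
    SELECT tb1.doc_id
    FROM table1 tb1
    LEFT JOIN table2 tb2 ON tb1.doc_id = tb2.doc_id
    WHERE tb2.doc_id IS {}
    ORDER BY tb1.doc_id
"""

# Rows of table1 with no partner in table2 (what NOT IN was asking for).
missing = conn.execute(join_sql.format("NULL")).fetchall()
# Rows of table1 that do have a partner in table2 (the opposite set).
present = conn.execute(join_sql.format("NOT NULL")).fetchall()

print(missing)  # [(1,)]
print(present)  # [(2,), (3,)]
```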

Related

Pass values as parameter from select query

I want to pass values from output of select query to another query. Basically both queries will be part of a stored procedure. e.g.
select Id, RelId
from tables
There will be multiple rows returned by above query and I want to pass them to the following query
select name
from table2
where Id = @Id and MgId = @RelId
Please suggest
You cannot pass multiple rows to a query through scalar parameters like that.
But maybe you can just join your 2 tables, which would be far more efficient.
Not knowing your table schemas, I suggest something like this. You might have to adapt it to your actual table schemas, of course.
select name
from table2 t2
inner join tables t on t2.Id = t.Id
and t2.MgId = t.RelId
EDIT
As Gordon mentioned in his answer, this approach can return duplicate rows in your result.
If you don't want that, then here are 2 ways of getting rid of the duplicates:
select distinct name
from ...
or group the rows by adding this at the end of the statement:
group by name
Though this will work, avoiding the duplicates in the first place, as in Gordon's answer, is better.
I would suggest using exists:
select t2.name
from table2 t2
where exists (select 1
from tables t
where t2.Id = t.Id and t2.MgId = t.RelId
);
The difference between exists and join is that this will not generate duplicates, if there are multiple matches between the tables.
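To make the duplicate issue concrete, here is a small sketch that runs both forms against made-up data in an in-memory SQLite database. The column layout of tables and table2 is assumed from the queries above, and the rows are invented so that one pair matches twice.

```python
import sqlite3

# In-memory SQLite; schemas are assumed from the queries, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tables (Id INTEGER, RelId INTEGER);
    CREATE TABLE table2 (Id INTEGER, MgId INTEGER, name TEXT);
    INSERT INTO tables VALUES (1, 10), (1, 10), (2, 20);
    INSERT INTO table2 VALUES (1, 10, 'alice'), (2, 20, 'bob'), (3, 30, 'carol');
""")

# Inner join: 'alice' matches two rows in tables, so she appears twice.
joined = conn.execute("""
    SELECT t2.name
    FROM table2 t2
    INNER JOIN tables t ON t2.Id = t.Id AND t2.MgId = t.RelId
    ORDER BY t2.name
""").fetchall()

# EXISTS: each table2 row is returned at most once, however many matches exist.
via_exists = conn.execute("""
    SELECT t2.name
    FROM table2 t2
    WHERE EXISTS (SELECT 1 FROM tables t
                  WHERE t2.Id = t.Id AND t2.MgId = t.RelId)
    ORDER BY t2.name
""").fetchall()

print(joined)      # [('alice',), ('alice',), ('bob',)]
print(via_exists)  # [('alice',), ('bob',)]
```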
Or...
SELECT *
INTO #Table1
FROM ...
SELECT *
INTO #Table2
FROM ...
SELECT *
FROM #Table1 T1
JOIN #Table2 T2
DROP TABLE #Table1, #Table2

Run UPDATE query for every row in another SELECT

I have an SQL query that can return from 0 to, let's say, 20 results. Here is an example:
SELECT value_id FROM table1 t1
INNER JOIN table2 t2 ON ....
INNER JOIN table3 t3 ON ....
WHERE ....
Then, I want to run for each value_id an UPDATE query. Let's say:
UPDATE table4
SET new_value = 1
WHERE value_id IN (SELECT value_id FROM table1 t1
INNER JOIN table2 t2 ON ....
INNER JOIN table3 t3 ON ....
WHERE ....)
Can a subquery work for this? Is it efficient, or is there another way?
Your query is fine. The performance depends on how your database is structured. For instance, if the SELECT runs fast, then the UPDATE should be pretty fast (not as fast: there is more overhead for the UPDATE).
So, the answer to your question is: yes, a subquery can work like this. Test the SELECT version (with table4) to get an idea of the effect on performance.
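As a rough illustration of "test the SELECT version first", the sketch below counts the rows the subquery would touch and then runs the UPDATE with the same subquery. It uses an in-memory SQLite database; the joins to table2 and table3 are left out because their conditions are elided in the question, so the subquery here is a simplified stand-in and the rows are made up.

```python
import sqlite3

# In-memory SQLite; the schema and data are hypothetical and simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (value_id INTEGER);
    CREATE TABLE table4 (value_id INTEGER, new_value INTEGER DEFAULT 0);
    INSERT INTO table1 VALUES (1), (3);
    INSERT INTO table4 (value_id) VALUES (1), (2), (3);
""")

# Stand-in for the question's SELECT ... INNER JOIN ... WHERE .... subquery.
subquery = "SELECT value_id FROM table1"

# Dry run: how many table4 rows would the UPDATE touch?
would_change = conn.execute(
    f"SELECT COUNT(*) FROM table4 WHERE value_id IN ({subquery})"
).fetchone()[0]

# The UPDATE itself, using the same subquery; rowcount reports rows modified.
cur = conn.execute(
    f"UPDATE table4 SET new_value = 1 WHERE value_id IN ({subquery})"
)
updated = cur.rowcount

rows = conn.execute(
    "SELECT value_id, new_value FROM table4 ORDER BY value_id"
).fetchall()

print(would_change, updated)  # 2 2
print(rows)                   # [(1, 1), (2, 0), (3, 1)]
```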

Unable to understand query

I am working on an SSIS job that contains a complex query.
It has something like:
some sql statements
left outer join
(
select query joining two more tables )
table1
Here, I am unable to understand what table1 means. Is it a kind of temporary view that gets created? This table1 is used in other parts of the query. But table1 actually does exist in the database.
Is it that the result of the select query in the parentheses is created as table1?
Please clarify this for me.
I am not able to post my code because of security policies.
Here is a SQL Fiddle example.
Below is the sample query:
Select Temp1.id,Table1.id Table1_id
from Temp1
left Outer join
(
Select Temp2.id
from Temp2
join Temp3
On Temp2.id = Temp3.id
) Table1
on Temp1.id = Table1.Id
In the above example, Table1 is the alias for the data coming from the join of two tables (Temp2 and Temp3).
table1 is an alias for your subquery. It's the name of the subquery, which you can use to qualify its columns, for example table1.col1.
It is an alias for the query in the parentheses.
If you removed it, you would get an error.
Aliases are also useful when the same column name appears in more than one joined table, so you can distinguish them.
For instance if colX is both in Table1 and Table2 you would have a query like:
SELECT T1.colX,T2.colX
FROM Table1 T1
JOIN Table2 T2
ON T1.id = T2.id
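The sample query above can be run end to end to see that the alias names the subquery's result and does not create a table. This is a sketch using an in-memory SQLite database; the rows in Temp1, Temp2 and Temp3 are made up, and the inner column is given an explicit name (AS id) to keep the example unambiguous.

```python
import sqlite3

# In-memory SQLite; table names follow the sample query, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Temp1 (id INTEGER);
    CREATE TABLE Temp2 (id INTEGER);
    CREATE TABLE Temp3 (id INTEGER);
    INSERT INTO Temp1 VALUES (1), (2), (3);
    INSERT INTO Temp2 VALUES (1), (2);
    INSERT INTO Temp3 VALUES (2), (3);
""")

# Table1 is only a name for the parenthesised subquery (a derived table).
rows = conn.execute("""
    SELECT Temp1.id, Table1.id AS Table1_id
    FROM Temp1
    LEFT OUTER JOIN (
        SELECT Temp2.id AS id
        FROM Temp2
        JOIN Temp3 ON Temp2.id = Temp3.id
    ) Table1
    ON Temp1.id = Table1.id
    ORDER BY Temp1.id
""").fetchall()

# The database catalog still lists only the three real tables.
real_tables = sorted(
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)

print(rows)         # [(1, None), (2, 2), (3, None)]
print(real_tables)  # ['Temp1', 'Temp2', 'Temp3']
```

If a real table with the same name as the alias also exists in the database, the alias takes precedence inside that query.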

In SQL, how can I perform a "subtraction" operation?

Suppose I have two tables, which both have user ids. I want to perform an operation that would return all user IDS in table 1 that are not in table 2. I know there has to be some easy way to do this - can anyone offer some assistance?
It's slow, but you can normally accomplish this with something like NOT IN. (Various RDBMSs offer better ways to do this; Oracle, for instance, has an EXISTS clause that can be used for it.)
But you could say:
select id from table1 where id not in (select id from table2)
There are a few ways to do it. Here's one approach using NOT EXISTS:
SELECT userid
FROM table1
WHERE NOT EXISTS
(
SELECT *
FROM table2
WHERE table1.userid = table2.userid
)
And here's another approach using a join:
SELECT table1.userid
FROM table1
LEFT JOIN table2
ON table1.userid = table2.userid
WHERE table2.userid IS NULL
The fastest approach depends on the database.
One way is to use EXCEPT if your SQL dialect supports it. Apart from EXCEPT also removing duplicate rows, it is equivalent to performing a left join and a null test:
SELECT user_id FROM table1 LEFT JOIN table2 ON table1.user_id = table2.user_id WHERE table2.user_id IS NULL;
If it is
SQL Server:
SELECT id FROM table1
EXCEPT
SELECT id FROM table2
Oracle:
SELECT id FROM table1
MINUS
SELECT id FROM table2
For the rest, I am not sure.
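One difference between the set operator and the join form is worth seeing on data: EXCEPT returns distinct rows, while the left join with a null test keeps duplicates from table1. The sketch below uses an in-memory SQLite database (which supports EXCEPT) and made-up rows.

```python
import sqlite3

# In-memory SQLite; the rows are made up, with id 1 deliberately duplicated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (1), (2), (3);
    INSERT INTO table2 VALUES (3);
""")

# Set difference: duplicates in table1 collapse to a single row.
except_rows = conn.execute(
    "SELECT id FROM table1 EXCEPT SELECT id FROM table2 ORDER BY 1"
).fetchall()

# Anti-join: every unmatched table1 row is kept, duplicates included.
anti_join_rows = conn.execute("""
    SELECT table1.id
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NULL
    ORDER BY table1.id
""").fetchall()

print(except_rows)     # [(1,), (2,)]
print(anti_join_rows)  # [(1,), (1,), (2,)]
```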
Try this:
SELECT id FROM table1 WHERE id NOT IN
(
SELECT id FROM table2
)
select ID from table1
where ID not in (select ID from table2)

SQL query: how to translate IN() into a JOIN?

I have a lot of SQL queries like this:
SELECT o.Id, o.attrib1, o.attrib2
FROM table1 o
WHERE o.Id IN (
SELECT DISTINCT Id
FROM table1
, table2
, table3
WHERE ...
)
These queries have to run on different database engines (MySql, Oracle, DB2, MS-Sql, Hypersonic), so I can only use common SQL syntax.
I read that in MySQL the IN clause isn't optimized and is really slow, so I want to switch this to a JOIN.
I tried:
SELECT o.Id, o.attrib1, o.attrib2
FROM table1 o, table2, table3
WHERE ...
But this does not take into account the DISTINCT keyword.
Question: How do I get rid of the duplicate rows using the JOIN approach?
To write this with a JOIN you can use an inner select and join with that:
SELECT o.Id, o.attrib1, o.attrib2 FROM table1 o
JOIN (
SELECT DISTINCT Id FROM table1, table2, table3 WHERE ...
) T1
ON o.id = T1.Id
I'm not sure this will be much faster, but maybe... you can try it for yourself.
In general restricting yourself only to SQL that will work on multiple databases is not going to result in the best performance.
"But this does not take into account the DISTINCT keyword."
You do not need the distinct in the sub-query. The in will return one row in the outer query regardless of whether it matches one row or one hundred rows in the sub-query. So, if you want to improve the performance of the query, junking that distinct would be a good start.
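A quick way to convince yourself of this is to run the IN query with and without DISTINCT against a subquery that returns the same Id many times. The sketch below does that in an in-memory SQLite database with made-up rows; the two-table schema is a simplification of the question's three tables.

```python
import sqlite3

# In-memory SQLite; simplified, hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER, attrib1 TEXT);
    CREATE TABLE table2 (Id INTEGER);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES (1), (1), (1);
""")

# The subquery yields Id 1 three times unless DISTINCT is applied.
with_distinct = conn.execute(
    "SELECT o.Id, o.attrib1 FROM table1 o "
    "WHERE o.Id IN (SELECT DISTINCT Id FROM table2)"
).fetchall()

without_distinct = conn.execute(
    "SELECT o.Id, o.attrib1 FROM table1 o "
    "WHERE o.Id IN (SELECT Id FROM table2)"
).fetchall()

# IN is a membership test, so repeated values in the subquery change nothing.
print(with_distinct)     # [(1, 'x')]
print(without_distinct)  # [(1, 'x')]
```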
One way of tuning in clauses is to rewrite them using exists instead. Depending on the distribution of data this may be a lot more efficient, or it may be slower. With tuning, the benchmark is king.
SELECT o.Id, o.attrib1, o.attrib2
FROM table1 o
WHERE EXISTS (
SELECT 1 FROM table1 t1, table2 t2, table3 t3 WHERE ...
AND ( t1.id = o.id
or t2.id = o.id
or t3.id = o.id
)
)
Not knowing your business logic the precise formulation of that additional filter may be wrong.
Incidentally, I notice that you have table1 in both the outer query and the sub-query. If that is not a mistake in transcribing your actual SQL here, you may want to consider whether that makes sense. It would be better to avoid querying that table twice; using exists may make it easier to avoid the double hit.
SELECT DISTINCT o.Id, o.attrib1, o.attrib2
FROM table1 o, table2, table3
WHERE ...
Though if you need to support a number of different database back ends you probably want to give each its own set of repository classes in your data layer, so you can optimize your queries for each. This also gives you the power to persist in other types of databases, or xml, or web services, or whatever should the need arise down the road.
I'm not sure I really understand what your problem is. Why don't you try this:
SELECT distinct o.Id, o.attrib1, o.attrib2
FROM
table1 o
, table o1
, table o2
...
where
o1.id1 = o.id
or o2.id = o.id