IN clause versus OR clause performance - sql

I have a query as below:
select *
from table_1
where column_name in ('value1','value2','value3');
Considering that such a table may hold millions of rows, would either of the restructurings below perform better?
select *
from table_1 where
column_name = 'value1'
or column_name = 'value2'
or column_name ='value3';
or
select *
from table_1
where column_name = any ('value1','value2','value3');
If possible, I would also like to know the performance benefits of each form.
Thanks in advance

The form of the query doesn't matter much when only 3 values are being checked.
Oracle will rewrite the query anyway to use the best option available.
If there were many more values, and they were dynamic, then the IN clause or an inner join could be better.
It's best to leave the query as it currently is.
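One way to check this on your own data is to run both forms and compare the rows and the plans. The following is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for Oracle, and the generated data and the index name idx_column_name are assumptions made for illustration. In Oracle you would inspect the plans with EXPLAIN PLAN FOR and DBMS_XPLAN.DISPLAY instead.

```python
import sqlite3

# SQLite stands in for Oracle here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER PRIMARY KEY, column_name TEXT)")
conn.executemany(
    "INSERT INTO table_1 (column_name) VALUES (?)",
    [(f"value{i % 10}",) for i in range(1000)],  # made-up sample data
)
conn.execute("CREATE INDEX idx_column_name ON table_1 (column_name)")

in_sql = (
    "SELECT id FROM table_1 "
    "WHERE column_name IN ('value1','value2','value3') ORDER BY id"
)
or_sql = (
    "SELECT id FROM table_1 WHERE column_name = 'value1' "
    "OR column_name = 'value2' OR column_name = 'value3' ORDER BY id"
)

rows_in = conn.execute(in_sql).fetchall()
rows_or = conn.execute(or_sql).fetchall()
print(len(rows_in), rows_in == rows_or)

# Show the plan the engine picked for each form (the output format is engine-specific).
for sql in (in_sql, or_sql):
    for plan_row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(plan_row)
```

Both forms return the same rows; whether the plans differ depends on the engine and its version, which is why it is worth looking at the plan rather than guessing.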

There is a third way, which can be faster than 'IN' or multiple 'OR' conditions when the list of values is large:
select *
from table_1 tb1
inner join table_2 tb2
on tb1.column_name = tb2.column_name
Here table_2 (or a subquery) would hold the required values that were listed in the 'IN' and 'OR' conditions in your example.
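A minimal sketch of the join approach, again using SQLite through Python as a stand-in (the sample rows are made up). It assumes the values in table_2 are unique, which the PRIMARY KEY enforces here; with duplicate values in table_2 the join would return duplicate rows, unlike IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, column_name TEXT);
INSERT INTO table_1 (column_name)
VALUES ('value1'), ('value2'), ('value4'), ('value3');
-- table_2 holds the wanted values; PRIMARY KEY keeps them unique
CREATE TABLE table_2 (column_name TEXT PRIMARY KEY);
INSERT INTO table_2 VALUES ('value1'), ('value2'), ('value3');
""")

join_rows = conn.execute("""
    SELECT tb1.id
    FROM table_1 tb1
    INNER JOIN table_2 tb2 ON tb1.column_name = tb2.column_name
    ORDER BY tb1.id
""").fetchall()

in_rows = conn.execute("""
    SELECT id FROM table_1
    WHERE column_name IN ('value1','value2','value3')
    ORDER BY id
""").fetchall()

print(join_rows == in_rows)
```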

Related

SQL join performance operation order

I am trying to work out how to order a join query to improve its performance.
Let's say we have two tables to join, to which some filters must be applied.
Is it the same to do this:
table1_result = select * from table1 where field1 = 'A';
table2_result = select * from table2 where field1 = 'A';
result = select * from table1_result as one inner join table2_result as two on one.field1 = two.field1;
as doing this:
result = select * from table1 as one inner join table2 as two on one.field1 = two.field1
where one.field1 = 'A' and two.field1 = 'A';
or even doing this:
result = select * from table1 as one inner join table2 as two on one.field1 = two.field1 and one.field1 = 'A';
Thank you so much!!
Some common optimization techniques to improve your queries:
Index the columns used in joins. If they are foreign keys, databases like MySQL normally index them already.
Index the columns used in conditions or the WHERE clause.
Avoid * and explicitly select only the columns that you really need.
The order of the joins won't matter in most cases, because DB engines are intelligent enough to decide that themselves.
So it's better to analyze the structure of both joined tables and have indexes in place.
And if anyone is further interested in how changing the order of conditions can help performance, I have a detailed answer over here: mysql Slow query issue.
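To see that the three forms from the question are logically equivalent for an inner join, here is a small sketch using SQLite through Python. The id columns and the sample rows are assumptions added for illustration, and the pre-filtered variant is written with subqueries in place of the intermediate results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, field1 TEXT);
INSERT INTO table1 (field1) VALUES ('A'), ('A'), ('B');
INSERT INTO table2 (field1) VALUES ('A'), ('B'), ('B');
""")

# Variant 1: filter each table first, then join the filtered results.
prefiltered = conn.execute("""
    SELECT one.id, two.id
    FROM (SELECT * FROM table1 WHERE field1 = 'A') AS one
    INNER JOIN (SELECT * FROM table2 WHERE field1 = 'A') AS two
      ON one.field1 = two.field1
    ORDER BY one.id, two.id
""").fetchall()

# Variant 2: join, then filter in the WHERE clause.
where_filter = conn.execute("""
    SELECT one.id, two.id
    FROM table1 AS one INNER JOIN table2 AS two ON one.field1 = two.field1
    WHERE one.field1 = 'A' AND two.field1 = 'A'
    ORDER BY one.id, two.id
""").fetchall()

# Variant 3: put the filter into the ON condition.
on_filter = conn.execute("""
    SELECT one.id, two.id
    FROM table1 AS one INNER JOIN table2 AS two
      ON one.field1 = two.field1 AND one.field1 = 'A'
    ORDER BY one.id, two.id
""").fetchall()

print(prefiltered == where_filter == on_filter)
```

This equivalence holds for inner joins. With outer joins, moving a filter between ON and WHERE changes the result.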

How to make Query with IN in WHERE clause run faster

I have this query in Oracle 11g:
SELECT DOC_ID,DOC_NAME,DESC
FROM TABLE1
WHERE DOC_ID NOT IN(
SELECT DOC_ID FROM TABLE2
)
The SQL query above runs very slowly because there is so much data in the tables.
Is there any way to get the same result with better performance?
Any help is much appreciated.
Thanks.
Using NOT EXISTS may perform better:
SELECT DOC_ID,DOC_NAME,DESCr
FROM TABLE1 t1
WHERE not exists (
SELECT 1 FROM TABLE2 where
doc_id = t1.doc_id
);
Example: http://sqlfiddle.com/#!4/4b59e/3
I wouldn't use the IN statement for that. If you join on what I imagine is one of your keys, it should be much faster:
select tb1.DOC_ID, tb1.DOC_NAME, tb1.DESC
from table1 tb1
left join table2 tb2
on tb1.DOC_ID = tb2.DOC_ID
where tb2.DOC_ID is null
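The three ways of writing this anti-join can be compared side by side. This is a sketch using SQLite through Python rather than Oracle, with made-up rows; the DESC column is left out because DESC is a reserved word. Note that the left join form keeps the rows where the joined key IS NULL, that is, the rows with no match in TABLE2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (DOC_ID INTEGER PRIMARY KEY, DOC_NAME TEXT);
CREATE TABLE table2 (DOC_ID INTEGER);
INSERT INTO table1 VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e');
INSERT INTO table2 VALUES (2), (4);
""")

not_in_rows = conn.execute("""
    SELECT DOC_ID FROM table1
    WHERE DOC_ID NOT IN (SELECT DOC_ID FROM table2)
    ORDER BY DOC_ID
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT DOC_ID FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.DOC_ID = t1.DOC_ID)
    ORDER BY DOC_ID
""").fetchall()

left_join_rows = conn.execute("""
    SELECT tb1.DOC_ID
    FROM table1 tb1
    LEFT JOIN table2 tb2 ON tb1.DOC_ID = tb2.DOC_ID
    WHERE tb2.DOC_ID IS NULL  -- keep only rows without a match
    ORDER BY tb1.DOC_ID
""").fetchall()

print(not_in_rows == not_exists_rows == left_join_rows)
```

The three results match here only because TABLE2.DOC_ID contains no NULLs; which form is fastest depends on the engine and the indexes.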

What is a better way of writing this oracle sql query?

Somehow, this does not seem very efficient. Can this be optimized or made more efficient?
SELECT DISTINCT p.col1 from table1 p where p.col1 not in
(SELECT DISTINCT o.col1 from table1 o where o.col2 = 'ABC')
For example, select all supermarkets that do not have product = soap.
You want all col1 values where col2 is never 'ABC'. You can approach this with aggregation:
select p.col1
from table1 p
group by p.col1
having sum(case when p.col2 = 'ABC' then 1 else 0 end) = 0;
Why is this faster? Well, there are cases where it won't be. But it often will be. A select distinct is doing an aggregation anyway, so other methods that use joins or IN add extra work on top of it. That extra work is only worth it if it significantly reduces the amount of data being processed.
Also, NOT IN is dangerous semantically. If any value of col1 is NULL on a row where col2 = 'ABC', then all data will be filtered out. That is, the query will return no rows at all. That can be sped up a great deal! This formulation assumes that col1 is never NULL in this case.
Finally, if you have a list of col1 values that is already unique, then the fastest method is probably:
select c.col1
from col1table c
where not exists (select 1 from table1 o where o.col1 = c.col1 and o.col2 = 'ABC')
For this query, an index on table1(col1, col2) is optimal for performance.
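As a quick check that the aggregation and NOT EXISTS forms agree with the original NOT IN query, here is a sketch using SQLite through Python. The sample rows and the index name idx_col1_col2 are made up, and col1table is built from the distinct col1 values purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 TEXT, col2 TEXT);
INSERT INTO table1 VALUES
  ('s1','ABC'), ('s1','XYZ'), ('s2','XYZ'),
  ('s3','DEF'), ('s3','XYZ'), ('s4','ABC');
CREATE INDEX idx_col1_col2 ON table1 (col1, col2);
-- Assumed lookup table holding each col1 value exactly once
CREATE TABLE col1table AS SELECT DISTINCT col1 FROM table1;
""")

not_in_rows = conn.execute("""
    SELECT DISTINCT p.col1 FROM table1 p
    WHERE p.col1 NOT IN (SELECT DISTINCT o.col1 FROM table1 o WHERE o.col2 = 'ABC')
    ORDER BY p.col1
""").fetchall()

agg_rows = conn.execute("""
    SELECT p.col1 FROM table1 p
    GROUP BY p.col1
    HAVING SUM(CASE WHEN p.col2 = 'ABC' THEN 1 ELSE 0 END) = 0
    ORDER BY p.col1
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT c.col1 FROM col1table c
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 o WHERE o.col1 = c.col1 AND o.col2 = 'ABC')
    ORDER BY c.col1
""").fetchall()

print(agg_rows, agg_rows == not_in_rows == not_exists_rows)
```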
Did you try just querying with a not clause?
i.e.
select distinct col1 from table1 where col2 <> 'ABC'
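Be aware that this form answers a different question: it returns every col1 that has at least one row where col2 is not 'ABC', even if another row for the same col1 does have 'ABC'. A small sketch with made-up rows, using SQLite through Python, shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 TEXT, col2 TEXT);
INSERT INTO table1 VALUES
  ('s1','ABC'), ('s1','XYZ'), ('s2','XYZ'),
  ('s3','DEF'), ('s3','XYZ'), ('s4','ABC');
""")

# Rows whose col2 is something other than 'ABC'
neq_rows = conn.execute("""
    SELECT DISTINCT col1 FROM table1 WHERE col2 <> 'ABC' ORDER BY col1
""").fetchall()

# col1 values that never appear together with 'ABC' (the original query)
not_in_rows = conn.execute("""
    SELECT DISTINCT p.col1 FROM table1 p
    WHERE p.col1 NOT IN (SELECT o.col1 FROM table1 o WHERE o.col2 = 'ABC')
    ORDER BY p.col1
""").fetchall()

print(neq_rows)
print(not_in_rows)
```

Here 's1' has both an 'ABC' row and an 'XYZ' row, so it shows up in the first result but not in the second.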
I would structure that along the lines of:
select supermarkets.*
from supermarkets
where not exists (
select 1
from product_in_supermarkets
where product_in_supermarkets.supermarket_id = supermarkets.id and
product_in_supermarkets.product_type = 'soap')
Have an index on:
product_in_supermarkets(supermarket_id, product_type)
for best performance.
Now having said that, it could be that under the right circumstances a NOT EXISTS and a NOT IN query get transformed to be the same, and an anti-join would be executed. Semantically I like the correlated subquery with not exists, as I think it better represents the intent of the query.
NOT IN is also susceptible to unexpected effects should there be a null value in the projection from the subquery, as no value can be said to be not in a list that includes NULL (including NULL).
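The NULL behaviour is easy to reproduce. The sketch below uses SQLite through Python with made-up rows for the supermarket tables; one product row has a NULL supermarket_id, which is enough to make the NOT IN version return nothing while NOT EXISTS still works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supermarkets (id INTEGER PRIMARY KEY);
CREATE TABLE product_in_supermarkets (supermarket_id INTEGER, product_type TEXT);
INSERT INTO supermarkets VALUES (1), (2), (3);
INSERT INTO product_in_supermarkets VALUES
  (1, 'soap'), (2, 'bread'), (NULL, 'soap');  -- note the NULL key
""")

not_in_rows = conn.execute("""
    SELECT id FROM supermarkets
    WHERE id NOT IN (
        SELECT supermarket_id FROM product_in_supermarkets
        WHERE product_type = 'soap')
    ORDER BY id
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT id FROM supermarkets
    WHERE NOT EXISTS (
        SELECT 1 FROM product_in_supermarkets
        WHERE product_in_supermarkets.supermarket_id = supermarkets.id
          AND product_in_supermarkets.product_type = 'soap')
    ORDER BY id
""").fetchall()

print(not_in_rows)      # empty: the NULL makes every NOT IN test unknown or false
print(not_exists_rows)  # supermarkets 2 and 3 have no soap
```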
I think you should consider creating an index on col1.
I'd also try using
select distinct p.col1 from table1 p where not exists
(select distinct o.col1 from table1 o where o.col1 = p.col1 and o.col2 = 'ABC');
Also, depending on the number of rows and the data entropy, dropping the distinct from the inner query can sometimes be a useful trade-off.

Use join with a table and SQL Statement

Joins are usually used to fetch data from 2 tables using a factor common to both tables.
Is it possible to use a join statement with a table and the results of another SQL statement, and if so, what is the syntax?
Sure, this is called a derived table
such as:
select a.column, b.column
from
table1 a
join (select statement) b
on b.column = a.column
Keep in mind that it will run the select for the derived table in its entirety, so it helps to select only the things you need.
EDIT: I've found that I rarely need to use this technique unless I am joining on some aggregated queries.... so I would carefully consider your design here.
For example, thus far most demonstrations in this thread have not required the use of a derived table.
It depends on what the other statement is, but one of the techniques you can use is common table expressions - this may not be available on your particular SQL platform.
In the case of SQL Server, if the other statement is a stored procedure, you may have to insert the results into a temporary table and join to that.
It's also possible in SQL Server (and some other platforms) to have table-valued functions which can be joined just like a view or table.
select *
from TableA a
inner join (select x from TableB) b
on a.x = b.x
Select c.CustomerCode, c.CustomerName, sq.AccountBalance
From Customers c
Join (
Select CustomerCode, AccountBalance
From Balances
)sq on c.CustomerCode = sq.CustomerCode
Sure, as an example:
SELECT *
FROM Employees E
INNER JOIN
(
SELECT EmployeeID, COUNT(EmployeeID) as ComplaintCount
FROM Complaints
GROUP BY EmployeeID
) C ON E.EmployeeID = C.EmployeeID
WHERE C.ComplaintCount > 3
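Here is that example as a runnable sketch, using SQLite through Python. The Name and ComplaintID columns and the sample rows are assumptions added so there is something to query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Complaints (ComplaintID INTEGER PRIMARY KEY, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
-- Ann has 4 complaints, Bob has 2, Cy has none
INSERT INTO Complaints (EmployeeID) VALUES (1), (1), (1), (1), (2), (2);
""")

rows = conn.execute("""
    SELECT E.Name, C.ComplaintCount
    FROM Employees E
    INNER JOIN
    (
        -- derived table: one row per employee with the complaint count
        SELECT EmployeeID, COUNT(EmployeeID) AS ComplaintCount
        FROM Complaints
        GROUP BY EmployeeID
    ) C ON E.EmployeeID = C.EmployeeID
    WHERE C.ComplaintCount > 3
""").fetchall()

print(rows)
```

Only Ann is returned, because the derived table is aggregated first and the outer WHERE then filters on the aggregated column.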
It is. But what specifically are you looking to do?
That can be done with either a sub-select, a view or a temp table... More information would help us answer this question better, including which SQL software, and an example of what you'd like to do.
Try this:
SELECT t1.col1, t2.col2 FROM Table1 t1 INNER JOIN
(SELECT col1, col2, col3 FROM Table2) t2 ON t1.col1 = t2.col1

how to select fields from different db's with the same table and field name

I have two databases; for argument's sake let's call them db1 and db2. They are structured exactly the same, and both have a table called table1 with fields id and value1.
My question is: how do I write a query that selects the field value1 from both tables, linked by the same id?
You can prefix the table names with the database name to identify the two similarly named tables. You can then use that fully qualified table name to refer to the similarly named fields.
So, without aliases:
select db1.table1.id, db1.table1.value1, db2.table1.value1
from db1.table1 inner join db2.table1 on db1.table1.id = db2.table1.id
and with aliases
select t1.id, t1.value1, t2.value1
from db1.table1 as t1 inner join db2.table1 as t2 on t1.id = t2.id
You may also want to alias the selected columns so your select line becomes:
select t1.id as id, t1.value1 as value_from_db1, t2.value1 as value_from_db2
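The aliased query can be tried out without a database server. The sketch below is an assumption-laden stand-in: it uses SQLite through Python and ATTACH DATABASE to create two in-memory databases named db1 and db2, with made-up rows. How you address a second database differs per product, so treat this only as an illustration of the qualified names and aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate in-memory databases, attached under the names used in the question.
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")
conn.executescript("""
CREATE TABLE db1.table1 (id INTEGER PRIMARY KEY, value1 TEXT);
CREATE TABLE db2.table1 (id INTEGER PRIMARY KEY, value1 TEXT);
INSERT INTO db1.table1 VALUES (1, 'a'), (2, 'b');
INSERT INTO db2.table1 VALUES (1, 'x'), (3, 'z');
""")

rows = conn.execute("""
    SELECT t1.id AS id, t1.value1 AS value_from_db1, t2.value1 AS value_from_db2
    FROM db1.table1 AS t1
    INNER JOIN db2.table1 AS t2 ON t1.id = t2.id
""").fetchall()

print(rows)
```

Only id 1 exists in both databases, so the inner join returns a single row with one value from each side.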
This is T-SQL, but I can't imagine MySQL would be that much different (will delete answer if that's not the case)
SELECT
a.Value1 AS [aValue]
,b.Value1 AS [bValue]
FROM
db1.dbo.Table1 a
INNER JOIN db2.dbo.Table1 b
ON a.Id = b.Id
Try something such as this.
$dbhost="server_name";
$dbuser1="user1";
$dbpass1="password1";
$dbname1="database_I";
$dbname2="database_II";
$db1=mssql_connect($dbhost,$dbuser1,$dbpass1);
mssql_select_db($dbname1,$db1);
$query="SELECT ... FROM database_I.dbo.table1, database_II.dbo.table2 WHERE ....";
etc. Sorry if this does not help.
There is an easy way in SQL Server. Extend the table name in the FROM clause, so instead of using select ... from tablename, use
select ... from database.namespace.tablename
The default namespace (schema) is dbo.
You could use a union select:
Simple example:
select "one" union select "two";
This will return 2 rows: the first row contains one and the 2nd row contains two. It is as if you are concatenating 2 SQL queries; the only constraint is that they both must return the same number of columns.
Multiple databases:
select * from client_db.users where id=1 union select * from master_db.users where id=1;
In this case the users tables in both databases must have the same number of columns. You said they have the same structure, so you shouldn't have a problem.
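A short sketch of the UNION behaviour described above, using SQLite through Python (string literals are written with single quotes, which is the portable form). It also shows that UNION removes duplicate rows while UNION ALL keeps them, and that a column-count mismatch is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two single-row selects stacked into one result set.
two_rows = sorted(conn.execute("SELECT 'one' UNION SELECT 'two'").fetchall())

# UNION drops duplicate rows, UNION ALL keeps them.
union_rows = conn.execute("SELECT 'one' UNION SELECT 'one'").fetchall()
union_all_rows = conn.execute("SELECT 'one' UNION ALL SELECT 'one'").fetchall()

# Both sides must return the same number of columns.
try:
    conn.execute("SELECT 1, 2 UNION SELECT 3")
    mismatch_rejected = False
except sqlite3.OperationalError:
    mismatch_rejected = True

print(two_rows, len(union_rows), len(union_all_rows), mismatch_rejected)
```

Keep in mind that a union stacks the rows from both databases on top of each other; it does not put the two value1 columns side by side the way a join on id does.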