Query with IN operator not performing well in SQL Server

I have a query in SQL Server that looks something like this:
select sum (col1)
from TableA
where col2 = ?
and col3 = ?
and col4 in (?, ?, ?... ?)
TableA has a composite index on (col2, col3, col4).
This query performs poorly as the number of items in the IN list grows.
Is there a good way to rewrite this query for better performance?
The list can contain anywhere from 1 to 300 items.

Your index should be pretty good. For this query:
select sum(col1)
from TableA
where col2 = ? and
col3 = ? and
col4 in (?, ?, ?... ?);
The optimal index is (col2, col3, col4, col1) (the last column can be an included column rather than a key column, if you prefer).
For the index to be used, you need to be sure that the types are compatible for the comparisons. Conversions -- and changes in collation -- preclude the use of indexes.
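As a rough illustration of the index shape suggested above, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The index name ix_tablea_cover and the sample data are made up for the example, and SQLite's planner is not SQL Server's, so treat the plan output as indicative only.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER)"
)
rows = [(i, i % 2, i % 3, i % 50) for i in range(1000)]  # made-up sample data
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?, ?)", rows)

# Equality columns first, then the IN column, then the summed column,
# so the query can be answered from the index alone.
conn.execute("CREATE INDEX ix_tablea_cover ON TableA (col2, col3, col4, col1)")

in_values = [1, 7, 13]
placeholders = ", ".join("?" * len(in_values))  # one "?" per IN item
sql = (
    "SELECT SUM(col1) FROM TableA "
    f"WHERE col2 = ? AND col3 = ? AND col4 IN ({placeholders})"
)
params = [1, 1, *in_values]

total = conn.execute(sql, params).fetchone()[0]

# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

# Same sum computed in plain Python, as a cross-check.
expected = sum(
    c1 for c1, c2, c3, c4 in rows if c2 == 1 and c3 == 1 and c4 in in_values
)
print(total, plan)
```

On SQL Server itself, the equivalent check would be to look at the actual execution plan and confirm an index seek rather than a scan.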

Related

Best way to compare three columns in Hive SQL

I need to compare three columns containing string dates ('yyyy-mm-dd') in Hive SQL. Please take into consideration that the table has more than 2 million records.
Given three columns (col1, col2, col3) from table T1, I must guarantee that:
col1 = col2, and both, or at least one of them, differ from col3.
My best regards,
Logically you have an issue.
col1 = col2
Therefore if col1 != col3 then col2 != col3;
Therefore it's really enough to use:
select * from T1 where col1 = col2 and col1 != col3;
This filter can be evaluated map-side, so a plain WHERE clause is likely good enough.
If you wanted to say that two out of the three need to match, you could use GROUP BY with HAVING to reduce the number of comparisons.
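To check the simplification above on a few rows, here is a small sketch using Python's sqlite3 module in place of Hive. The three sample rows are invented. It compares the longer condition with the shortened one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive in this sketch
conn.execute("CREATE TABLE T1 (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?)",
    [
        ("2020-01-01", "2020-01-01", "2020-01-02"),  # col1 = col2, differs from col3
        ("2020-01-01", "2020-01-01", "2020-01-01"),  # all three equal
        ("2020-01-01", "2020-01-02", "2020-01-03"),  # col1 != col2
    ],
)

# The condition as originally worded: col1 = col2 and at least one differs from col3.
long_form = conn.execute(
    "SELECT * FROM T1 WHERE col1 = col2 AND (col1 != col3 OR col2 != col3)"
).fetchall()

# The shortened condition: once col1 = col2 holds, one comparison with col3 suffices.
short_form = conn.execute(
    "SELECT * FROM T1 WHERE col1 = col2 AND col1 != col3"
).fetchall()

print(long_form == short_form, short_form)
```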

SQL parametrised query containing multiple options

I would like to write a query
Select col1, col2
from table
where col1 = 'blah' or 'blah2' or 'blah3'
and col2 = 'blah' or 'blah2' or 'blah3'
I am used to writing them like this for a SINGLE option
select
col1, col2
from
table
where
col1 = :col1 and col2 = :col2
Parameters.AddWithValue(":col1", "blah")
Parameters.AddWithValue(":col2", "blah")
Now I want to add several options with OR between them, and obviously the above code won't work. The SQL is for SQLite. Can anyone suggest how I could do this? I may potentially have more than 3 different values for each parameter. I have tried searching but the answer is elusive.
You still have to use complete expressions, i.e., you need to write col1 = or col2 = every time.
Alternatively, use IN:
SELECT ... WHERE col1 IN (:c11, :c12, :c13) AND col2 IN (:c21, :c22, :c23);
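When the number of options varies, the placeholder list for each IN can be generated at run time. The sketch below shows one way to do that with Python's sqlite3 module. The table name items and the helper find_rows are hypothetical, and positional "?" placeholders are used instead of named ones to keep the generation simple.

```python
import sqlite3


def find_rows(conn, col1_options, col2_options):
    """Return rows whose col1 and col2 each match one of the given options."""
    # Only the "?" markers are interpolated into the SQL text;
    # the values themselves are always passed as bound parameters.
    in1 = ", ".join("?" for _ in col1_options)
    in2 = ", ".join("?" for _ in col2_options)
    sql = (
        "SELECT col1, col2 FROM items "
        f"WHERE col1 IN ({in1}) AND col2 IN ({in2}) "
        "ORDER BY col1, col2"
    )
    return conn.execute(sql, [*col1_options, *col2_options]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col1 TEXT, col2 TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("blah", "blah2"), ("blah3", "blah"), ("other", "blah"), ("blah", "nope")],
)

options = ["blah", "blah2", "blah3"]
matches = find_rows(conn, options, options)
print(matches)
```

The same idea carries over to other client libraries: build one placeholder per value, then add one parameter per placeholder.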

Query to retrieve data based on many composite keys

Unfortunately the db I am stuck supporting contains little to no surrogate/primary keys, and there's no chance to add one. So I'm left with using composite keys.
I'd like to query a single table with a list of composite keys. I can have as many as 5k composite keys. So how can I do this? The query below works, but I would have to build it dynamically, which is not something I've ever seen done or had to do. It seems there should be a better way to do this...
select * from dog_manners
where
col1 NOT IN ('', 'abcd')
and
(
-- here are the composite keys (each pair must be unique)
(col2 = 'Scottish Terrier' and col3 = 'black') or
(col2 = 'Golden Retriever' and col3 = 'brown') or
(col2 = 'Golden Retriever' and col3 = 'wheaten') or
(col2 = 'Yorkshire Terrier' and col3 = 'brown') or
etc...
)
If this is the best way, is there a limit to the number of OR conditions I can have? If so, I'll have to break it up into smaller chunks.
I don't know Informix, so I searched and it appears that temporary tables exist. One idea is to create one and use it in a join:
CREATE TEMP TABLE tmp1 ( col2 varchar(20), col3 varchar(20) );
INSERT INTO tmp1 (col2, col3) VALUES ('Scottish Terrier', 'black')
, ('Golden Retriever', 'brown');
SELECT *
FROM dog_manners x
JOIN tmp1
ON tmp1.col2 = x.col2
AND tmp1.col3 = x.col3;
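The temp-table idea above can be driven from client code by bulk-inserting the key pairs and then joining. Here is a minimal sketch with Python's sqlite3 module standing in for Informix. The sample rows and the wanted_keys list are invented, and the exact temp-table syntax on Informix may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Informix in this sketch
conn.execute("CREATE TABLE dog_manners (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO dog_manners VALUES (?, ?, ?)",
    [
        ("sit", "Scottish Terrier", "black"),
        ("stay", "Golden Retriever", "brown"),
        ("", "Golden Retriever", "wheaten"),    # excluded by the col1 filter
        ("roll", "Yorkshire Terrier", "grey"),  # not in the key list
    ],
)

# The composite keys to look up; in practice this list could hold thousands of pairs.
wanted_keys = [
    ("Scottish Terrier", "black"),
    ("Golden Retriever", "brown"),
    ("Golden Retriever", "wheaten"),
]

# Load the keys into a temporary table with one parameterised bulk insert.
conn.execute("CREATE TEMP TABLE tmp1 (col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO tmp1 (col2, col3) VALUES (?, ?)", wanted_keys)

# Join against the key table instead of building a long chain of ORs.
found = conn.execute(
    """
    SELECT x.col1, x.col2, x.col3
    FROM dog_manners x
    JOIN tmp1 ON tmp1.col2 = x.col2 AND tmp1.col3 = x.col3
    WHERE x.col1 NOT IN ('', 'abcd')
    ORDER BY x.col1
    """
).fetchall()
print(found)
```

The SQL text stays fixed no matter how many keys there are; only the contents of the temporary table change.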
I'm not sure I'd recommend this, but without more detail on your requirement (other than that it "needs to be built dynamically"), the following alternative would work, although it may not be terribly efficient:
select * from dog_manners
where
col1 NOT IN ('', 'abcd')
and TRIM(col2)||':'||TRIM(col3) IN
(
'Scottish Terrier:black', 'Golden Retriever:brown',
'Golden Retriever:wheaten', 'Yorkshire Terrier:brown',
...
)
Can you give some more detail on how these combinations are generated?

Query optimization; Table.Column = @Param OR @Param IS NULL

When a WHERE clause uses a condition like Table.Column = @Param OR @Param IS NULL, the index on Column is not used.
Is this true, and if so, how can I write this kind of query so that it still uses the index?
Example query:
SELECT Col1, Col2 ...
FROM Table
WHERE (Col1 = @col OR @col IS NULL)
AND (Col2 = @col2 OR @col2 IS NULL)
AND (Col3 = @col3 OR @col3 IS NULL)
Any help is appreciated.
Unfortunately, the generation of execution plans does not behave as you expect.
For that single query, a single plan is created. In creating that plan the indexes to use are selected, and fixed. It doesn't matter what parameters you provide, the same plan, same indexes, etc, are always used.
The optimiser has tried to find the best plan that fits all eventualities but, by the nature of this type of query, there isn't one. This is borne out by the fact that your plan does not use an index at all.
The solution is to use dynamic SQL. This feels untidy, but if you use parameterised queries with sp_executesql, it can actually be quite structured, and very performant.
Here is a link to a very useful article on the subject: dynamic search
It's very in depth, but it is a very robust approach to this problem.
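The dynamic-SQL idea can also be applied on the client side: add a predicate only for the parameters that were actually supplied, and keep the values as bound parameters. The sketch below illustrates this with Python's sqlite3 module rather than sp_executesql. The helper build_search and the table SearchDemo are hypothetical names for the example.

```python
import sqlite3


def build_search(col=None, col2=None, col3=None):
    """Build a parameterised query containing only the filters that were supplied."""
    clauses, params = [], []
    # Column names come from this fixed list, never from user input.
    for column, value in (("Col1", col), ("Col2", col2), ("Col3", col3)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT Col1, Col2, Col3 FROM SearchDemo"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SearchDemo (Col1 INTEGER, Col2 INTEGER, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO SearchDemo VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 200), (3, 20, 100)],
)

sql, params = build_search(col2=10, col3=100)
filtered = conn.execute(sql, params).fetchall()

sql_all, params_all = build_search()
everything = conn.execute(sql_all + " ORDER BY Col1", params_all).fetchall()
print(sql, filtered, len(everything))
```

Each distinct combination of supplied parameters produces its own SQL text, so each can get a plan suited to its own predicates.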
SELECT Col1, Col2 ...
FROM Table
WHERE EXISTS(
SELECT Col1, Col2, Col3
INTERSECT
SELECT @col, @col2, @col3)
Intuitively, this seems like it should perform very badly, but SQL Server's query optimiser knows how to give INTERSECT special treatment, and internally translates it to (pseudo-SQL)
SELECT Col1, Col2 ...
FROM Table
WHERE (Col1, Col2, Col3) IS (@col, @col2, @col3)
as you can see in the query plan. If you have indices on these columns, they can and do get used.
I originally picked this up from Paul White's Undocumented Query Plans: Equality Comparisons blog post, which may be an interesting further read.
Why not try this:
SELECT Col1, Col2 ...
FROM Table
WHERE Col1 = IsNull(@col, Col1)
AND Col2 = IsNull(@col2, Col2)
AND Col3 = IsNull(@col3, Col3)
About your question:
Does your query analyzer say that it doesn't use the index on columns 1, 2 and 3? Did you create an index on all three columns? If so, it should be used regardless of the OR ... IS NULL conditions.
Try to have an index on all the columns in the WHERE clause, and use the more structured query below:
SELECT Col1, Col2 ...
FROM Table
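WHERE Col1 = COALESCE(@col, Col1)
AND Col2 = COALESCE(@col2, Col2)
AND Col3 = COALESCE(@col3, Col3)
The COALESCE() function returns the first non-null argument, so if @col is NULL it returns Col1 and the condition becomes Col1 = Col1. Note that rows where Col1 is itself NULL are then excluded, because NULL = NULL does not evaluate to true.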

Inline If Statements in SQL

I wish to do something like this:
DECLARE @IgnoreNulls bit = 1;
SELECT Col1, Col2
FROM tblSimpleTable
IF @IgnoreNulls = 1
BEGIN
WHERE Col2 IS NOT NULL
END
ORDER BY Col1 DESC;
The idea is to, in a very PHP/ASP.NET-ish kinda way, only filter NULLs if the user wishes to. Is this possible in T-SQL? Or do we need one large IF block like so:
IF @IgnoreNulls = 1
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
WHERE Col2 IS NOT NULL
ORDER BY Col1 DESC;
END
ELSE
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
ORDER BY Col1 DESC;
END
You can do that this way:
SELECT Col1, Col2
FROM tblSimpleTable
WHERE ( @IgnoreNulls != 1 OR Col2 IS NOT NULL )
ORDER BY Col1 DESC
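To see how this single-statement predicate behaves for both values of the flag, here is a small sketch using Python's sqlite3 module in place of SQL Server. The sample rows and the helper fetch are invented, and the flag is passed as a named parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server in this sketch
conn.execute("CREATE TABLE tblSimpleTable (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO tblSimpleTable VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c")],
)

# When the flag is not 1 the first branch is true for every row,
# so the NULL filter only applies when the flag is 1.
SQL = (
    "SELECT Col1, Col2 FROM tblSimpleTable "
    "WHERE (:ignore_nulls != 1 OR Col2 IS NOT NULL) "
    "ORDER BY Col1 DESC"
)


def fetch(ignore_nulls):
    """Run the same statement with the flag bound as 0 or 1."""
    return conn.execute(SQL, {"ignore_nulls": int(ignore_nulls)}).fetchall()


filtered = fetch(True)
unfiltered = fetch(False)
print(filtered, unfiltered)
```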
Dynamically changing searches based on the given parameters is a complicated subject, and doing it one way rather than another, even with only a very slight difference, can have massive performance implications. The key is to get a good query execution plan, one that uses an index; don't worry about keeping the code compact or about repeating code.
Read this and consider all the methods. Your best method will depend on your parameters, your data, your schema, and your actual usage:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
In general (unless the table is small) the best approach is to separate out the cases and do something like you have in your question.
IF (@IgnoreNulls = 1)
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
WHERE Col2 IS NOT NULL
ORDER BY Col1 DESC;
END
ELSE
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
ORDER BY Col1 DESC;
END
This is less likely to cause you problems with suboptimal query plans being cached.