SQL Server Where Clause Performance

I have a SQL query with more than 10 conditions in its WHERE clause. I am not sure which of the approaches below would be better in terms of performance. Some of my parameters are important and some are secondary.
If you can explain which is better and why, I would be grateful.
My Params
DECLARE @ParamImportant1 int, @ParamImportant2 int, @ParamImportant3 int,
@ParamSecondary1 int, @ParamSecondary2 int, @ParamSecondary3 int
First Method
I have an index which contains all of these columns.
SELECT
*
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
AND (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
Second Method
Dividing the query using a subquery or CTE.
SELECT
*
FROM
(
SELECT
*
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
) X
WHERE (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
Third Method
Using Temp Table
SELECT
*
INTO #MyTemp
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
SELECT
*
FROM #MyTemp
WHERE (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)

Under most circumstances, your first version should work well:
WHERE Col1 = @ParamImportant1 AND
Col2 = @ParamImportant2 AND
Col3 = @ParamImportant3 AND
(@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1) AND
(@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2) AND
(@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
The index that you want needs to start with the three important columns. It can then include the other columns as well (col1, col2, col3, colsec1, colsec2, colsec3).
Note that the index seek will only use col1, col2 and col3; the conditions on the secondary columns are applied as residual filters, so the index does not reduce the number of rows that have to be read for them.
Under most circumstances, that seems reasonable. If this is not, then you may need multiple indexes and dynamic SQL.
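For reference, a minimal sketch of such an index (the index name is just a placeholder; key columns first, secondary columns as included columns):
CREATE NONCLUSTERED INDEX IX_MyTable_Important
ON MyTable (Col1, Col2, Col3)
INCLUDE (ColSec1, ColSec2, ColSec3);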

Avoid "smart logic" in your parameters, otherwise MSSQL cannot figure out the best way of getting the data.
The most reliable method for arriving at the best execution plan is to avoid unnecessary filters in the SQL statement.
https://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic
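If you keep the single catch-all query, one well-known workaround (a sketch, not taken from the answers above) is OPTION (RECOMPILE), which makes SQL Server compile a plan for the actual parameter values on each execution, at the cost of a compile per call:
SELECT *
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
AND (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
OPTION (RECOMPILE); -- plan is built with the sniffed values, so the NULL branches can be pruned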

Related

NOT IN vs concatenate columns

Aren't the two SQL statements below the same? I mean, functionality-wise they should do the same thing, shouldn't they?
I was expecting the first SQL to return a result as well.
CREATE TABLE #TEST
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST VALUES ('123', '321', 'ABC')
INSERT INTO #TEST VALUES ('123', '436', 'ABC')
CREATE TABLE #TEST_1
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST_1 VALUES ( '123','532','ABC')
INSERT INTO #TEST_1 VALUES ( '123','436','ABC')
--No result
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
Let's put this into a bit more context and look at your 2 WHERE clauses, which I'm going to call "WHERE 1" and "WHERE 2" respectively:
--WHERE 1
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--WHERE 2
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
As you might have noticed, these do not behave the same. In fact, from a logic point of view, and in the way the database engine handles them, they are completely different.
WHERE 2, to start with, is not SARGable. This means that any indexes on your tables cannot be used and the database engine has to scan the entire table. WHERE 1, however, is SARGable, and if you had any indexes they could be used to perform seeks, likely helping performance.
From the point of view of logic, let's look at WHERE 2 first. It requires that the concatenated value of COL1 and COL2 not match any concatenated value of COL1 and COL2 from the other table, which means the compared values must come from the same row. So '123456' would match only where Col1 has the value '123' and Col2 the value '456' on the same row.
For WHERE 1, however, the value of Col1 must not be found anywhere in the other table, and neither must the value of Col2, but the matches can come from different rows. This is where things differ: as '123' appears in Col1 of both tables (and is the only value there), the first NOT IN is never satisfied and no rows are returned.
If you wanted a SARGable version of WHERE 2, I would suggest using NOT EXISTS:
--1 row
SELECT T.COL1, --Don't use *, specify your columns
T.COL2, --Qualifying your columns is important!
T.COL3
FROM #TEST T --Aliasing is important!
WHERE NOT EXISTS (SELECT 1
FROM #TEST_1 T1
WHERE T1.COL1 = T.COL1
AND T1.COL2 = T.COL2);
When you combine two varchar values with + in SQL Server, the strings are concatenated, not added: '123' + '321' is '123321', not 444.
In the first query you are not concatenating anything, so what you wrote means:
Select all rows from #TEST whose Col1 value is not found anywhere in #TEST_1.Col1 and whose Col2 value is not found anywhere in #TEST_1.Col2.
And actually, the first condition alone cuts everything out, since you have the value '123' in Col1 of both tables.
The second query concatenates the strings row by row, so it means:
Select all rows from #TEST where COL1 + COL2 (that is '123321' for the first row and '123436' for the second) is not among the concatenated values of #TEST_1.
For the rows in #TEST_1 those values are:
First row: COL1 + COL2 = '123532'
Second row: COL1 + COL2 = '123436'
So only the row that produces '123321' is absent from #TEST_1, and that's why you get 1 row as the result.
To sum up:
In the first query the very first condition already eliminates everything, which is why you see no records; the second query compares the two columns of each row as a single concatenated string, which is why it returns 1 row. Beware that concatenation can produce accidental matches (e.g. '12' + '3436' also yields '123436'), which is another reason to prefer the NOT EXISTS form shown above.
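A quick way to verify the concatenation behaviour (T-SQL):
-- varchar + varchar concatenates; this returns '123321', not 444
SELECT '123' + '321' AS result;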

Select query too slow even though I'm using an index

I'm not sure if I'm doing something wrong here but I have a query running on a table with millions of rows.
The query is something like this:
select *
from dbo.table with (index (index_1), nolock)
where col1 = 15464
and col2 not in ('X', 'U')
and col3 is null
and col4 = 'E'
Index looks like this:
CREATE NONCLUSTERED INDEX [index_1] ON [dbo].[table] ([col1], [col2], [col3], [col4]) WITH (FILLFACTOR=90) ON [PRIMARY]
GO
This select still takes over a minute to run. What am I missing?
For this query:
select *
from table
where col1 = 15464 and
col2 not in ('X', 'U') and
col3 is null and
col4 = 'E';
The best index is table(col1, col4, col3, col2). The query should use the index automatically, without a hint.
When choosing an index based on a WHERE clause, you should put the equality conditions first, followed by at most one column with an inequality. For the purposes of indexing, IN and NOT IN generally count as inequality conditions.
Also, if you mix data types, then sometimes indexes are not used. So, this assumes that col1 is numeric.
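A sketch of that index (index_2 is just a placeholder name):
CREATE NONCLUSTERED INDEX index_2
ON [dbo].[table] (col1, col4, col3, col2);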

Multiple data row columns per line

I am trying to display a single column from a data set, but spread across multiple columns per line. For example:
[Row1] [Row2] [Row3]
[Row4] [Row5] [Row6]
Instead of:
[Row1]
[Row2]
[Row3] etc.
The data set needs to be joined with another table based on a column from an outer table, which means, AFAIK, cross tabs are out of the question, as you can't use data set parameters with them. There is no limit to how many rows a single data set may contain, but I want three row values per line.
I can modify the data set query; however, I can only use plain old SQL in those queries, and I cannot create temporary tables or anything "new" on the server side. A BIRT-only solution would be more desirable, though.
If you can change the query to output
1 1 [Row1]
1 2 [Row2]
1 3 [Row3]
2 1 [Row4]
2 2 [Row5]
2 3 [Row6]
into a temporary table tmp, then you could query that using something like
select col1, col3 into tmp1 from tmp where col2 = 1;
select col1, col3 into tmp2 from tmp where col2 = 2;
select col1, col3 into tmp3 from tmp where col2 = 3;
select tmp1.col3, tmp2.col3, tmp3.col3 from tmp1, tmp2, tmp3 where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
You could generate col1 and col2 using rownum, but it's non-standard, and it requires the output of the original query to be sorted properly.
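If your database supports window functions, one way to generate those two grouping columns is ROW_NUMBER() (a sketch; sort_key is a placeholder for whatever ordering the report needs, and integer-division/modulo syntax varies slightly by database):
select (rn - 1) / 3 + 1 as col1, -- line number: rows 1-3 -> 1, rows 4-6 -> 2, ...
       (rn - 1) % 3 + 1 as col2, -- position within the line: 1, 2, 3
       col3
from (
    select row_number() over (order by sort_key) as rn, col3
    from (ORIGINAL_QUERY) q
) numbered;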
Edit:
If you can't use a temporary table, I assume you can use subqueries:
select tmp1.col3, tmp2.col3, tmp3.col3 from
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 1) as tmp1,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 2) as tmp2,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 3) as tmp3
where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
and hope the optimizer is smart.

INSERT INTO SELECT + 1 custom column

I need to copy data from the original table and add a custom column specified in the query.
Original table structure: col1, col2, col3
Insert table structure: x, col1, col2, col3
INSERT INTO newtable
SELECT *
FROM original
WHERE cond
and I'm getting this error
Column count doesn't match value count at row 1
How can I insert the x value in this single query?
I thought something like this would work:
INSERT INTO newtable
SELECT 'x' = NULL, *
FROM original
WHERE cond
Any ideas? Is it possible to use *? That table has a lot of columns, and x has to be the first value.
I know this is all bad, but I have to work with an unbelievably ugly DB and even worse PHP code.
The second statement is almost correct; instead of 'x' = NULL, use null x (I'm assuming you want to store a null value in a column named x):
INSERT INTO newtable
SELECT null x, o.* FROM original o WHERE cond
Select Null as X, *
into newtable
from original
where ...
INSERT INTO newtable
SELECT null as x, col1, col2, col3 FROM original WHERE cond

Difference two rows in a single SQL SELECT statement

I have a database table that has a structure like the one shown below:
CREATE TABLE dated_records (
recdate DATE NOT NULL,
col1 DOUBLE NOT NULL,
col2 DOUBLE NOT NULL,
col3 DOUBLE NOT NULL,
col4 DOUBLE NOT NULL,
col5 DOUBLE NOT NULL,
col6 DOUBLE NOT NULL,
col7 DOUBLE NOT NULL,
col8 DOUBLE NOT NULL
);
I want to write an SQL statement that will allow me to return a record containing the changes between two supplied dates, for specified columns - e.g. col1, col2 and col3
For example, I might want to see how much the values in col1, col2 and col3 changed during the interval between two dates. A dumb way of doing this would be to select the rows (separately) for each date and then difference the fields outside the DB server:
SQL1 = "SELECT col1, col2 col3 FROM dated_records WHERE recdate='2001-01-01'";
SQL1 = "SELECT col1, col2 col3 FROM dated_records WHERE recdate='2001-02-01'";
however, I'm sure there there is a way a smarter way of performing the differencing using pure SQL. I am guessing that it will involve using a self join (and possibly a nested subquery), but I may be over complicating things - I decided it would be better to ask the SQL experts on here to see how they would solve this problem in the most efficient way.
Ideally the SQL should be DB agnostic, but if it needs to be tied to be a particular db, then it would have to be PostgreSQL.
Just select the two rows, join them into one, and subtract the values:
select d1.recdate, d2.recdate,
(d2.col1 - d1.col1) as delta_col1,
(d2.col2 - d1.col2) as delta_col2,
...
from (select *
from dated_records
where recdate = <date1>
) d1 cross join
(select *
from dated_records
where recdate = <date2>
) d2
I think that if what you want is to get the rows that appear in one result set but not the other, you can use the EXCEPT operator:
The EXCEPT operator returns the rows that are in the first result set
but not in the second.
So your two queries become one single query with the EXCEPT operator joining them:
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-01-01'
EXCEPT
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-02-01'
Assuming the table also has a sequential integer id column (one is not shown in the DDL above), you could correlate adjacent rows:
SELECT
COALESCE
(a.col1 -
(
SELECT b.col1
FROM dated_records b
WHERE b.id = a.id + 1
),
a.col1)
FROM dated_records a
WHERE recdate='2001-01-01';
You could use window functions plus DISTINCT:
SELECT DISTINCT
first_value(recdate) OVER () AS date1
,last_value(recdate) OVER () AS date2
,last_value(col1) OVER () - first_value(col1) OVER () AS delta1
,last_value(col2) OVER () - first_value(col2) OVER () AS delta2
...
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-01-03')
For any two days. Uses a single index or table scan, so it should be fast.
I did not order the window, but all calculations use the same window, so the values are consistent.
This solution can easily be generalized for calculations between n rows. You may want to use nth_value() from the Postgres arsenal of window functions in this case.
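For illustration, a hedged sketch of that generalization in PostgreSQL (nth_value() needs an explicit frame to see the whole window):
SELECT DISTINCT
       nth_value(col1, 2) OVER w - first_value(col1) OVER w AS delta1
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-01-03')
WINDOW w AS (ORDER BY recdate
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);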
This seemed a quicker way to write it if you are looking for a simple delta. (Note that first() and last() are not built-in PostgreSQL aggregates; they have to be created or installed separately.)
SELECT first(col1) - last(col1) AS delta_col1
, first(col2) - last(col2) AS delta_col2
FROM dated_records WHERE recdate IN ('2001-02-01', '2001-01-01')
You may not know which of the two rows comes first, but you can always wrap the answer in abs(first(col1) - last(col1)).
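If those aggregates are not available, a portable sketch using conditional aggregation (assuming one row per date; dates from the question):
SELECT MAX(CASE WHEN recdate = '2001-02-01' THEN col1 END)
     - MAX(CASE WHEN recdate = '2001-01-01' THEN col1 END) AS delta_col1,
       MAX(CASE WHEN recdate = '2001-02-01' THEN col2 END)
     - MAX(CASE WHEN recdate = '2001-01-01' THEN col2 END) AS delta_col2
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-02-01');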