I'm not sure if I'm doing something wrong here but I have a query running on a table with millions of rows.
The query is something like this:
select *
from dbo.table with (index (index_1), nolock)
where col1 = 15464
and col2 not in ('X', 'U')
and col3 is null
and col4 = 'E'
The index looks like this:
CREATE NONCLUSTERED INDEX [index_1] ON [dbo].[table] ([col1], [col2], [col3], [col4]) WITH (FILLFACTOR=90) ON [PRIMARY]
GO
This select still takes over a minute to run. What am I missing?
For this query:
select *
from table
where col1 = 15464 and
col2 not in ('X', 'U') and
col3 is null and
col4 = 'E';
The best index is table(col1, col4, col3, col2). The query should use the index automatically, without a hint.
When choosing an index based on a WHERE clause, you should put the columns with equality conditions first, followed by one column with an inequality condition. For the purposes of indexing, IN and NOT IN generally count as inequality conditions.
Also, if you mix data types, indexes are sometimes not used, so this assumes that col1 is numeric.
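For illustration, the suggested index could be created like this (the index name is just a placeholder):
CREATE NONCLUSTERED INDEX [index_2] ON [dbo].[table] ([col1], [col4], [col3], [col2]);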
Aren't both of the SQL statements below the same? I mean, functionality-wise, shouldn't they do the same thing?
I was expecting the first SQL to return a result as well.
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 +COL2 FROM #TEST_1)
CREATE TABLE #TEST
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST VALUES ('123', '321', 'ABC')
INSERT INTO #TEST VALUES ('123', '436', 'ABC')
CREATE TABLE #TEST_1
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST_1 VALUES ( '123','532','ABC')
INSERT INTO #TEST_1 VALUES ( '123','436','ABC')
--No result
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
Let's put this into a bit more context and look at your 2 WHERE clauses, which I'm going to call "WHERE 1" and "WHERE 2" respectively:
--WHERE 1
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--WHERE 2
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
As you might have noticed, these do not behave the same. In fact, from a logic point of view and in the way the database engine handles them, they are completely different.
To start with, WHERE 2 is not SARGable. This means that any indexes on your tables could not be used, and the database engine would have to scan the entire table. WHERE 1, however, is SARGable, and if you had any indexes they could be used to perform seeks, likely helping performance.
From the point of view of logic, let's look at WHERE 2 first. It requires that the concatenated value of COL1 and COL2 not match any concatenated value of COL1 and COL2 from the other table, which means the matching values must come from the same row. So '123456' would match only when Col1 has the value '123' and Col2 the value '456' on one row.
For WHERE 1, however, the value of Col1 must not be found in the other table, and the value of Col2 must not be found either, but the matches can come from different rows. This is where the two differ. As '123' appears in Col1 of both tables (and is the only value there), the NOT IN condition is never satisfied and no rows are returned.
If you wanted a SARGable version of WHERE 2, I would suggest using a NOT EXISTS:
--1 row
SELECT T.COL1, --Don't use *, specify your columns
T.COL2, --Qualifying your columns is important!
T.COL3
FROM #TEST T --Aliasing is important!
WHERE NOT EXISTS (SELECT 1
FROM #TEST_1 T1
WHERE T1.COL1 = T.COL1
AND T1.COL2 = T.COL2);
When you use + between two varchar values in SQL Server, it performs string concatenation, not numeric addition, so '123' + '321' gives '123321', not 444.
In the first query you are not combining the columns, so what you asked is:
Select all rows from #TEST whose Col1 value does not appear in #TEST_1 and whose Col2 value does not appear in #TEST_1.
In practice the first condition alone cuts everything out, since '123' is the only value in Col1 of both tables.
The second query concatenates the strings before comparing, so it does:
Select all rows from #TEST where COL1 + COL2 (that is '123321' for the first row and '123436' for the second) is not among the concatenated values of #TEST_1.
The concatenated values in #TEST_1 are:
For the first row COL1 + COL2 = '123532'
For the second row COL1 + COL2 = '123436'
So only the row with '123321' has no match in #TEST_1, and that is why you get 1 row as the result.
To sum up:
That's why you see only 1 row from the second query and no records from the first query. In the first query, the first condition alone already eliminates everything, because '123' appears in Col1 of both tables. In the second query, the concatenation ties Col1 and Col2 together, so the values have to match on the same row of #TEST_1.
So '123' + '321' is '123321', not 444.
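You can verify the concatenation behaviour with a quick check:
SELECT '123' + '321' AS result;
-- returns '123321' (varchar concatenation), not 444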
I have a SQL query with over 10 conditions in the WHERE clause. I am not sure which of the approaches below would be better in terms of performance. Some of my parameters are important and some are secondary.
If you can explain which is better and why, I would be grateful.
My Params
DECLARE @ParamImportant1 int, @ParamImportant2 int, @ParamImportant3 int,
@ParamSecondary1 int, @ParamSecondary2 int, @ParamSecondary3 int
First Method
I have an index which contains all params.
SELECT
*
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
AND (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
Second Method
Splitting the query using a subquery or CTE.
SELECT
*
FROM
(
SELECT
*
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
) X
WHERE (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
Third Method
Using Temp Table
SELECT
*
INTO #MyTemp
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
SELECT
*
FROM #MyTemp
WHERE (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
Under most circumstances, your first version should work well:
WHERE Col1 = @ParamImportant1 AND
Col2 = @ParamImportant2 AND
Col3 = @ParamImportant3 AND
(@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1) AND
(@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2) AND
(@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
The index that you want needs to start with the three important columns. It can then include the other columns as well (col1, col2, col3, colsec1, colsec2, colsec3).
Note that the index will be scanned for all rows matching col1, col2, and col3; it will not reduce the number of rows that still have to be checked against the secondary columns.
Under most circumstances, that seems reasonable. If this is not, then you may need multiple indexes and dynamic SQL.
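For illustration, a covering index along those lines might look like this (the index name is just a placeholder):
CREATE NONCLUSTERED INDEX IX_MyTable_Important
ON MyTable (Col1, Col2, Col3)
INCLUDE (ColSec1, ColSec2, ColSec3);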
Avoid "smart logic" in your parameters, otherwise MSSQL cannot figure out the best way of getting the data.
The most reliable method for arriving at the best execution plan is to avoid unnecessary filters in the SQL statement.
https://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic
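If you do keep the optional filters in a single statement, a commonly used alternative in SQL Server is OPTION (RECOMPILE), which compiles the plan for the actual parameter values so the "@Param IS NULL OR ..." branches can be simplified away (a sketch of the first method with the hint added):
SELECT *
FROM MyTable
WHERE Col1 = @ParamImportant1 AND Col2 = @ParamImportant2 AND Col3 = @ParamImportant3
AND (@ParamSecondary1 IS NULL OR ColSec1 = @ParamSecondary1)
AND (@ParamSecondary2 IS NULL OR ColSec2 = @ParamSecondary2)
AND (@ParamSecondary3 IS NULL OR ColSec3 = @ParamSecondary3)
OPTION (RECOMPILE); -- plan is recompiled for the actual parameter values on every execution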
I have a query that I am building that requires multiple flags. One of those flags is to find the percentage of increase between two values in the same row.
For example I have two values on my row:
Col1 = 26323 and Col2 = 26397
Col2 has increased by 0.28% over Col1. How can I express this in my query?
Like this:
select Col1, Col2, (Col2 *100.0/Col1)-100 from (
select Col1 = 26323 , Col2 =26397
)a
Result :
Col1 Col2 (No column name)
26323 26397 0.281122972305
SELECT
100.0 * (col2 - col1) / col1 AS pdif
FROM ptable
Hope it is what you are looking for.
I have two tables
select col1 , col2 , col3, col4, ........, col20 from ftcm; --TABLE has 470708 ROWS
select val from cspm where product='MARK'; --TABLE has 1 ROW
I have to make col3 null if col2 = val.
I have thought of joining like this:
select
col1 , col2 , decode(col2,val,NULL,col3) col3 , col4, ........, col20
from ftcm a left outer join ( select val from cspm where product='MARK') b
on a.col2=b.val;
but it seems to be time-consuming.
Please advise if there is a better way to tune it.
I have not tested this query, but if you know that the subquery on cspm returns only one value, then you can perhaps try the following query:
select col1, col2, decode(col2,(select val from cspm where product='MARK'),NULL,col3) col3, col4 ... col20 from ftcm
Since you are doing an outer join, the above might produce an equivalent output.
Another option you can explore is to use a parallel hint:
select /*+ parallel(em,4) */ col1, col2, decode(col2,(select val from cspm where product='MARK'),NULL,col3) col3, col4 ... col20 from ftcm em
However, consult your DBA before using the parallel hint at the specified degree (4).
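If readability matters, the same logic can also be expressed with a standard CASE expression instead of DECODE (a sketch with the column list abbreviated; note that CASE with = treats NULLs in col2/val slightly differently from DECODE):
select col1, col2,
       case when col2 = (select val from cspm where product='MARK') then null else col3 end col3,
       col4, ........, col20
from ftcm;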
I have a database table that has a structure like the one shown below:
CREATE TABLE dated_records (
recdate DATE NOT NULL,
col1 DOUBLE PRECISION NOT NULL,
col2 DOUBLE PRECISION NOT NULL,
col3 DOUBLE PRECISION NOT NULL,
col4 DOUBLE PRECISION NOT NULL,
col5 DOUBLE PRECISION NOT NULL,
col6 DOUBLE PRECISION NOT NULL,
col7 DOUBLE PRECISION NOT NULL,
col8 DOUBLE PRECISION NOT NULL
);
I want to write an SQL statement that will allow me to return a record containing the changes between two supplied dates, for specified columns - e.g. col1, col2 and col3
For example, I might want to see how much the values in col1, col2 and col3 have changed during the interval between two dates. A dumb way of doing this would be to select the row for each date separately and then difference the fields outside the DB server -
SQL1 = "SELECT col1, col2 col3 FROM dated_records WHERE recdate='2001-01-01'";
SQL1 = "SELECT col1, col2 col3 FROM dated_records WHERE recdate='2001-02-01'";
however, I'm sure there is a smarter way of performing the differencing in pure SQL. I am guessing that it will involve a self join (and possibly a nested subquery), but I may be overcomplicating things - I decided it would be better to ask the SQL experts on here to see how they would solve this problem in the most efficient way.
Ideally the SQL should be DB agnostic, but if it needs to be tied to a particular DB, then it would have to be PostgreSQL.
Just select the two rows, join them into one, and subtract the values:
select d1.recdate, d2.recdate,
(d2.col1 - d1.col1) as delta_col1,
(d2.col2 - d1.col2) as delta_col2,
...
from (select *
from dated_records
where recdate = <date1>
) d1 cross join
(select *
from dated_records
where recdate = <date2>
) d2
I think that if what you want is to get the rows that do not intersect between the two SELECT queries, you can use the EXCEPT operator:
The EXCEPT operator returns the rows that are in the first result set
but not in the second.
So your two queries become one single query with the EXCEPT operator joining them:
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-01-01'
EXCEPT
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-02-01'
SELECT
COALESCE
(a.col1 -
(
SELECT b.col1
FROM dated_records b
WHERE b.id = a.id + 1
),
a.col1)
FROM dated_records a
WHERE recdate='2001-01-01';
You could use window functions plus DISTINCT:
SELECT DISTINCT
first_value(recdate) OVER () AS date1
,last_value(recdate) OVER () AS date2
,last_value(col1) OVER () - first_value(col1) OVER () AS delta1
,last_value(col2) OVER () - first_value(col2) OVER () AS delta2
...
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-01-03')
For any two days. Uses a single index or table scan, so it should be fast.
I did not order the window, but all calculations use the same window, so the values are consistent.
This solution can easily be generalized for calculations between n rows. You may want to use nth_value() from the Postgres arsenal of window functions in this case.
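A rough sketch of that generalization with nth_value(), using the same two dates and an explicitly framed window (the framing is my assumption, not taken from the original answer):
SELECT DISTINCT
       nth_value(recdate, 1) OVER w AS date1
      ,nth_value(recdate, 2) OVER w AS date2
      ,nth_value(col1, 2) OVER w - nth_value(col1, 1) OVER w AS delta1
      ,nth_value(col2, 2) OVER w - nth_value(col2, 1) OVER w AS delta2
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-01-03')
WINDOW w AS (ORDER BY recdate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);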
This seemed like a quicker way to write it if you are looking for a simple delta.
SELECT first(col1) - last(col1) AS delta_col1
, first(col2) - last(col2) AS delta_col2
FROM dated_records WHERE recdate IN ('2001-02-01', '2001-01-01')
You may not know whether the first row or the second row comes first, but you can always wrap the answer in abs(first(col1) - last(col1)).
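Note that first() and last() are not built-in aggregates in stock PostgreSQL, so the above assumes they have been added as custom aggregates. If they are not available, conditional aggregation gives the same deltas without worrying about row order (a sketch, assuming PostgreSQL 9.4+ for FILTER):
SELECT max(col1) FILTER (WHERE recdate = '2001-02-01')
     - max(col1) FILTER (WHERE recdate = '2001-01-01') AS delta_col1
     , max(col2) FILTER (WHERE recdate = '2001-02-01')
     - max(col2) FILTER (WHERE recdate = '2001-01-01') AS delta_col2
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-02-01');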