Why does the optimizer choose a key lookup instead of 2 separate queries? - sql

I have a table that has a primary key/clustered index on an ID column and a nonclustered index on a system date column. If I query all the columns from the table using the system date column (a covering index wouldn't make sense here), the execution plan shows a key lookup, because for each record it finds it has to go to the ID to get all of the column data.
The weird thing is, if I write 2 queries with a temp table, it performs much faster. I can query the system date to get a table of IDs and then use that table to search the ID column. This makes sense because you're no longer doing the slow key lookup for each record.
Why doesn't the optimizer do this for us?
--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01'
--faster version
--id primary key/clustered index
--systemdate nonclustered index
select ID, SystemDate
into #myTempTable
from MyTable
where SystemDate > '2019-01-01'
select t1.ID, t1.col1, t1.col2, t1.col3, t1.col4, t1.col5, t1.SystemDate
from MyTable t1
inner join #myTempTable t2
on t1.ID = t2.ID

Well, in the second case you're actually doing the key lookup yourself, aren't you? ;)
The optimizer's plan could be slower because of outdated (or missing) statistics or a fragmented index.
To explain why it's actually slower, it would be best if you pasted your execution plans here. That would make it much easier to explain what happens.
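One way to gather the evidence asked for above is sketched here, assuming the table and column names from the question. It refreshes statistics and then runs the query with I/O and timing output enabled, so the logical reads of the two versions can be compared alongside the actual execution plans.

```sql
-- Refresh statistics so the optimizer's row estimates are current.
-- FULLSCAN is optional and can be slow on large tables.
UPDATE STATISTICS MyTable WITH FULLSCAN;

-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the temp table version between the same SET statements gives a like-for-like comparison.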

The query optimizer chooses a key lookup because the query is not supported by a covering index, so it has to fetch the missing columns from the table itself:
/*
--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index
*/
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01';
Adding a covering index should boost the performance:
CREATE INDEX my_idx ON MyTable(SystemDate) INCLUDE(col1, col2, col3, col4, col5);
For a query without a JOIN:
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable -- single table
where SystemDate > '2019-01-01';
the execution plan still contains a join: a Nested Loops join between the index seek and the key lookup.
After introducing the covering index, there is no need for the additional key lookup.

Related

Performance for Avg & Max in SQL

I want to decrease the query execution time for the following query.
This query is taking around 1 min 20 secs for about 2k records.
Number of records in table: 1348474
Number of records processed through where query: 25000
Number of records returned: 2152
SELECT Col1, Col2,
ISNULL(AVG(Col3),0) AS AvgCol,
ISNULL(MAX(Col3),0) AS MaxCol,
COUNT(*) AS Col5
FROM TableName WITH(NOLOCK)
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC
I tried removing the AVG & MAX columns and it lowered to 1 sec.
Is there any optimized solution for the same?
I have no other indexing other than Primary key.
Update
Indexes added:
nonclustered located on PRIMARY - Col1
nonclustered located on PRIMARY - Col2
clustered, unique, primary key located on PRIMARY - Id
======
Thanks in advance..Happy coding !!!
For this query:
SELECT Col1, Col2,
COALESCE(AVG(Col3), 0) AS AvgCol,
COALESCE(MAX(Col3), 0) AS MaxCol,
COUNT(*) AS Col5
FROM TableName
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC;
I would start with an index on (Col1, Col2, Col3).
I'm not sure if this will help. It is possible that the issue is the time for ordering the results.
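A minimal sketch of the suggested index, assuming the table and column names from the question (the index name is hypothetical). With the keys in this order, rows can be read already ordered by (Col1, Col2) and Col3 is available in the index itself, so the aggregation may avoid touching the base table. The final ORDER BY on MaxCol is a computed value, so a sort of the grouped result is still expected.

```sql
-- IX_TableName_Col1_Col2_Col3 is a hypothetical name.
-- Keys match the GROUP BY columns first, then the aggregated column.
CREATE NONCLUSTERED INDEX IX_TableName_Col1_Col2_Col3
    ON TableName (Col1, Col2, Col3);
```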

Order of columns in primary key in CrateDB

Does the order of columns in the primary key impact the performance of related queries depending on the order of columns given in the select statement?
Example:
primary key (col1, col2, col3);
select col2, col3 from table;
-> would this select use the pk index?
select col3,col1,col2 from table;
-> would this select use the pk index?
No, the order is not relevant.
But the primary key index is only used if all primary key columns are used inside a where clause (like all indices).
select ... from table where col1 = ... and col2 = ... and col3 = ...;

Multiple Non-Clustered index and performance?

I have a table in SQL Server that has 700,000 records. When I run a simple select query with 3 to 4 conditions in the where clause, it takes up to 45 seconds. I already have 2 non-clustered indexes and 1 clustered index on it, so I was thinking of adding 2 more non-clustered indexes to the table. That way the table will have indexes on all the columns I use in the where clause of my query. I have tried this and found that the result comes back much faster than before.
Can having 5 to 6 non-clustered indexes harm database performance, or would it not affect it much?
My Query structure is
SELECT ( SOME COLUMNS) FROM MyTable
WHERE COL1 = @Id AND COL2 >= @SomeDate AND (NOT (COL3 = 1)) AND
(COL4 <= @SomeOtherDate)
Table has 35 columns.
This is your query:
SELECT ( SOME COLUMNS)
FROM MyTable
WHERE COL1 = @Id AND COL2 >= @SomeDate AND (NOT (COL3 = 1)) AND
(COL4 <= @SomeOtherDate)
Unfortunately, your query can only make direct use of two columns in this clause. I would suggest the following composite index: (col1, col2, col3, col4). This index covers the where clause, but can only be used directly for the first two conditions.
A clustered index would probably be a marginal improvement over a non-clustered b-tree index.
Note that if col3 only takes on the values 0 and 1, then you should write the where clause as:
WHERE COL1 = @Id AND COL2 >= @SomeDate AND COL3 = 0 AND
(COL4 <= @SomeOtherDate)
And use either (col1, col3, col2, col4) or (col1, col3, col4, col2).
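As a sketch of the first of those two options, assuming the table and column names from the question (the index name is hypothetical): the two equality columns lead, followed by one range column that can still be used for the seek, with the remaining range column carried in the key so the filter can be checked without visiting the table.

```sql
-- IX_MyTable_Col1_Col3_Col2_Col4 is a hypothetical name.
-- Equality predicates (COL1, COL3) first, then the range columns.
CREATE NONCLUSTERED INDEX IX_MyTable_Col1_Col3_Col2_Col4
    ON MyTable (COL1, COL3, COL2, COL4);
```

Whether COL2 or COL4 should come third depends on which range predicate is more selective for the data, which the question does not state.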

SQL Covering Columns Order

Does the order of covering columns matter in an index?
CREATE INDEX idx1 ON MyTable (Col1, Col2) INCLUDE (Col3, Col4)
That is, the order of Col3 and Col4 in the above example.
No, included columns are not ordered, so the order in which they appear does not matter.

SQL Server indexes - ascending or descending, what difference does it make?

When you create an index on a column or number of columns in MS SQL Server (I'm using version 2005), you can specify that the index on each column be either ascending or descending. I'm having a hard time understanding why this choice is even here. Using binary sort techniques, wouldn't a lookup be just as fast either way? What difference does it make which order I choose?
This primarily matters when used with composite indexes:
CREATE INDEX ix_index ON mytable (col1, col2 DESC);
can be used for either:
SELECT *
FROM mytable
ORDER BY
col1, col2 DESC
or:
SELECT *
FROM mytable
ORDER BY
col1 DESC, col2
, but not for:
SELECT *
FROM mytable
ORDER BY
col1, col2
An index on a single column can be efficiently used for sorting in both ways.
See the article in my blog for details:
Descending indexes
Update:
In fact, this can matter even for a single column index, though it's not so obvious.
Imagine an index on a column of a clustered table:
CREATE TABLE mytable (
pk INT NOT NULL PRIMARY KEY,
col1 INT NOT NULL
)
CREATE INDEX ix_mytable_col1 ON mytable (col1)
The index on col1 keeps ordered values of col1 along with the references to rows.
Since the table is clustered, the references to rows are actually the values of the pk. They are also ordered within each value of col1.
This means that the leaves of the index are actually ordered on (col1, pk), and this query:
SELECT col1, pk
FROM mytable
ORDER BY
col1, pk
needs no sorting.
If we create the index as follows:
CREATE INDEX ix_mytable_col1_desc ON mytable (col1 DESC)
, then the values of col1 will be sorted descending, but the values of pk within each value of col1 will be sorted ascending.
This means that the following query:
SELECT col1, pk
FROM mytable
ORDER BY
col1, pk DESC
can be served by ix_mytable_col1_desc but not by ix_mytable_col1.
In other words, the columns that constitute a CLUSTERED INDEX on any table are always the trailing columns of any other index on that table.
For a true single column index it makes little difference from the Query Optimiser's point of view.
For the table definition
CREATE TABLE T1( [ID] [int] IDENTITY NOT NULL,
[Filler] [char](8000) NULL,
PRIMARY KEY CLUSTERED ([ID] ASC))
The Query
SELECT TOP 10 *
FROM T1
ORDER BY ID DESC
Uses an ordered scan with scan direction BACKWARD as can be seen in the Execution Plan. There is a slight difference however in that currently only FORWARD scans can be parallelised.
However it can make a big difference in terms of logical fragmentation. If the index is created with keys descending but new rows are appended with ascending key values then you can end up with every page out of logical order. This can severely impact the size of the IO reads when scanning the table and it is not in cache.
See the fragmentation results
name   page_count   avg_fragmentation_in_percent   fragment_count   avg_fragment_size_in_pages
------ ------------ ------------------------------ ---------------- ----------------------------
T1     1000         0.4                            5                200
T2     1000         99.9                           1000             1
for the script below
/*Uses T1 definition from above*/
SET NOCOUNT ON;
CREATE TABLE T2( [ID] [int] IDENTITY NOT NULL,
[Filler] [char](8000) NULL,
PRIMARY KEY CLUSTERED ([ID] DESC))
BEGIN TRAN
GO
INSERT INTO T1 DEFAULT VALUES
GO 1000
INSERT INTO T2 DEFAULT VALUES
GO 1000
COMMIT
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('T1'), 1, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('T2'), 1, NULL, 'DETAILED')
WHERE index_level = 0
It's possible to use the spatial results tab to verify the supposition that this is because the later pages have ascending key values in both cases.
SELECT page_id,
[ID],
geometry::Point(page_id, [ID], 0).STBuffer(4)
FROM T1
CROSS APPLY sys.fn_PhysLocCracker( %% physloc %% )
UNION ALL
SELECT page_id,
[ID],
geometry::Point(page_id, [ID], 0).STBuffer(4)
FROM T2
CROSS APPLY sys.fn_PhysLocCracker( %% physloc %% )
The sort order matters when you want to retrieve lots of sorted data, not individual records.
Note that (as you are suggesting with your question) the sort order is typically far less significant than what columns you are indexing (the system can read the index in reverse if the order is opposite what it wants). I rarely give index sort order any thought, whereas I agonize over the columns covered by the index.
@Quassnoi provides a great example of when it does matter.