How can LIKE '%...' seek on an index? - sql

I would expect these two SELECTs to have the same execution plan and performance. Since there is a leading wildcard on the LIKE, I expect an index scan. When I run this and look at the plans, the first SELECT behaves as expected (with a scan). But the second SELECT plan shows an index seek, and runs 20 times faster.
Code:
-- Uses index scan, as expected:
SELECT 1
FROM AccountAction
WHERE AccountNumber LIKE '%441025586401'
-- Uses index seek somehow, and runs much faster:
declare #empty VARCHAR(30) = ''
SELECT 1
FROM AccountAction
WHERE AccountNumber LIKE '%441025586401' + #empty
Question:
How does SQL Server use an index seek when the pattern starts with a wildcard?
Bonus question:
Why does concatenating an empty string change/improve the execution plan?
Details:
There is a non-clustered index on Accounts.AccountNumber
There are other indexes, but both the seek and the scan are on this index.
The Accounts.AccountNumber column is a nullable varchar(30)
The server is SQL Server 2012
Table and index definitions:
CREATE TABLE [updatable].[AccountAction](
[ID] [int] IDENTITY(1,1) NOT NULL,
[AccountNumber] [varchar](30) NULL,
[Utility] [varchar](9) NOT NULL,
[SomeData1] [varchar](10) NOT NULL,
[SomeData2] [varchar](200) NULL,
[SomeData3] [money] NULL,
--...
[Created] [datetime] NULL,
CONSTRAINT [PK_Account] PRIMARY KEY NONCLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_updatable_AccountAction_AccountNumber_UtilityCode_ActionTypeCd] ON [updatable].[AccountAction]
(
[AccountNumber] ASC,
[Utility] ASC
)
INCLUDE ([SomeData1], [SomeData2], [SomeData3]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE CLUSTERED INDEX [CIX_Account] ON [updatable].[AccountAction]
(
[Created] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
NOTE:
Here is the actual execution plan for the two queries. The names of the objects differ slightly from the code above because I was trying to keep the question simple.

These tests (database AdventureWorks2008R2) shows what happens:
SET NOCOUNT ON;
SET STATISTICS IO ON;
PRINT 'Test #1';
SELECT p.BusinessEntityID, p.LastName
FROM Person.Person p
WHERE p.LastName LIKE '%be%';
PRINT 'Test #2';
DECLARE #Pattern NVARCHAR(50);
SET #Pattern=N'%be%';
SELECT p.BusinessEntityID, p.LastName
FROM Person.Person p
WHERE p.LastName LIKE #Pattern;
SET STATISTICS IO OFF;
SET NOCOUNT OFF;
Results:
Test #1
Table 'Person'. Scan count 1, logical reads 106
Test #2
Table 'Person'. Scan count 1, logical reads 106
The results from SET STATISTICS IO shows that LIO are the same.
But the execution plans are quite different:
In the first test, SQL Server uses an Index Scan explicit but in the second test SQL Server uses an Index Seek which is an Index Seek - range scan. In the last case SQL Server uses a Compute Scalar operator to generate these values
[Expr1005] = Scalar Operator(LikeRangeStart([#Pattern])),
[Expr1006] = Scalar Operator(LikeRangeEnd([#Pattern])),
[Expr1007] = Scalar Operator(LikeRangeInfo([#Pattern]))
and, the Index Seek operator use an Seek Predicate (optimized) for a range scan (LastName > LikeRangeStart AND LastName < LikeRangeEnd) plus another unoptimized Predicate (LastName LIKE #pattern).
How can LIKE '%...' seek on an index?
My answer: it isn't a "real" Index Seek. It's a Index Seek - range scan which, in this case, has the same performance like Index Scan.
Please see, also, the difference between Index Seek and Index Scan (similar debate):
So…is it a Seek or a Scan?.
Edit 1: The execution plan for OPTION(RECOMPILE) (see Aaron's recommendation please) shows, also, an Index Scan (instead of Index Seek):

Related

Is this index on a table necessary?

I have a table with 5 indexes in SQL server. I'm aware of the fact that indexes affect inserts so I'd like to keep them to the minimum. I definitely need the first four indexes (as seen in the below sample).
However, I'm note quite sure if the last index (TimeSubmitted) is absolutely necessary - note that there is already CliendId+TimeSubmitted index.
The only reason why it is there is to make purging of expired rows from the table more efficient - or at least this is what my intention is. The purging job will be scheduled to run once a day - most likely at night.
There could be hundreds of thousands of records in the table at any given time.
Stored proc to purge the table:
CREATE PROCEDURE uspPurgeMyTable
(
#ExpiryDate datetime
)
AS
BEGIN
DELETE FROM MyTable
WHERE TimeSubmitted < #ExpiryDate;
END
Table (non relevant columns omitted):
CREATE TABLE MyTable (
[ClientId] [char](36) NOT NULL,
[UserName] [nvarchar](256) NOT NULL,
[TimeSubmitted] [datetime] NOT NULL,
[ProviderId] [uniqueidentifier] NULL,
[RegionId] [int] NULL
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientArea] ON [dbo].[MyTable]
(
[ClientId] ASC,
[RegionId] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientPrinter] ON [dbo].[MyTable]
(
[ClientId] ASC,
[ProviderId] DESC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientTime] ON [dbo].[MyTable]
(
[ClientId] ASC,
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientUser] ON [dbo].[MyTable]
(
[ClientId] ASC,
[UserName] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_MyTable_TimeSubmitted] ON [dbo].[MyTable]
(
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
Yes, the IX_MyTable_TimeSubmitted index will be beneficial for the purge process you described. Since the purge is based on the TimeSubmitted column only and none of the other indexes start with that, the best SQL Server would be able to do is use them in an index scan.
As with any indexing, there are trade-offs between read and update performance. You should measure your read, insert, and purge performance with and without the additional index to see what provides the best response time for each situation. Batch processes such as the purge that run overnight may not be as time sensitive as inserting or reading in your particular scenario.
If you are concerned about performance, you should take a different approach, partitioning. A reasonable place to learn about partitioning is the documentation.
A partitioned table stores each partition in a separate set of files. These are invisible to users of the database, in general. However, if you want to drop old data, you can just drop a partition.
This not only eliminates the need for the index. It also eliminates the need for the delete. And, dropping a partition is much more efficient than deleting, because there is much less logging.

SQL Server - Is creating an index with a unique constraint as one of the columns necessary?

I have the query below:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = #input
AND FK_SLA_Process = #input2
AND IsActive = 1
And this is my index for this SLA table.
CREATE INDEX IX_SLA_SLAName_FK_SLA_Process_IsActive ON dbo.SLA (SLAName, FK_SLA_Process, IsActive) INCLUDE (SLATimeInSeconds)
However, the SLAName column is unique so it has a unique constraint/index.
Is my created index an overkill? Do I still need it or will SQL Server use the index created on the unique column SLAName?
It would be an "overkill" if your index would only be on SLAName, but you are also ordering by FK_SLA_Process and IsActive so queries that need needs columns will benefit more from your index and less if you just had the unique one.
So for a query like this:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Both index will yield the same results and there would be no point in yours. But for queries like:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
AND FK_SLA_Process = 'Some Value'
Or
SELECT SLATimeInSeconds
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Your index will be better than the unique one (2nd example is a covering index).
You should inspect which kind of SELECT you do to this table and decide if you need this one or not. Keep in mind that having many indexes might speed up selects but slow down inserts, updates and deletes.
Assuming you have a such table declaration:
CREATE TABLE SLA
(
ID INT PRIMARY KEY,
SLAName VARCHAR(50) NOT NULL UNIQUE,
fk_SLA INT,
IsActive TINYINT
)
Under the hood we have two indexes:
CREATE TABLE [dbo].[SLA](
[ID] [int] NOT NULL,
[SLAName] [varchar](50) NOT NULL,
[fk_SLA] [int] NULL,
[IsActive] [tinyint] NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
So this query will have an index seek and has an optimal plan:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
Its query plan indicates index seek because we are searching by index UNIQUE NONCLUSTERED ([SLAName] ASC ) and don't use other columns in WHERE statement:
But if you add extra parameters into WHERE:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
AND s.fk_SLA = 1
AND s.IsActive = 1
Execution plan will have extra look up:
Lookup happens when index does not have necessary information. SQL Query engine has to get out from UNIQUE NONCLUSTERED index data structure to find data of columns fk_SLA and IsActive in your table SLA.
So your index is overkill as you have UNIQUE NONCLUSTERED index:
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
If SLAName column is unique and it has a unique constraint, any query that returns only one or 0 rows (all the queries with a point search that include SLAName = 'SomeName' condition) will use the unique index and make (maximum) one lookup in the base table.
Unless your queries have a range search like SLAName like 'SomeName%' there is no need in covering index because index search + 1 lookup is almost the same as only index search, and there is no need to waste space / maintain another index for such a miserable performance gain.

How to optimize a select top N Query

I have a very large table, consisting of 40 million rows, in a SQL Server 2008 Database.
CREATE TABLE [dbo].[myTable](
[ID] [bigint] NOT NULL,
[CONTRACT_NUMBER] [varchar](50) NULL,
[CUSTOMER_NAME] [varchar](200) NULL,
[INVOICE_NUMBER] [varchar](50) NULL,
[AGENCY] [varchar](50) NULL,
[AMOUNT] [varchar](50) NULL,
[INVOICE_MONTH] [int] NULL,
[INVOICE_YEAR] [int] NULL,
[Unique_ID] [bigint] NULL,
[bar_code] [varchar](50) NOT NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[ID] ASC,
[bar_code] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I am trying to optimize performance for the following query:
SELECT top 35 ID,
CONTRACT_NR,
CUSTOMER_NAME,
INVOICE_NUMBER,
AMOUNT,
AGENCY,
CONTRACT_NUMBER,
ISNULL([INVOICE_MONTH], 1) as [INVOICE_MONTH],
ISNULL([INVOICE_YEAR], 1) as [INVOICE_YEAR],
bar_code,
Unique_ID
from MyTable
WHERE
CONTRACT_NUMBER like #CONTRACT_NUMBER and
INVOICE_NUMBER like #INVOICE_NUMBER and
CUSTOMER_NAME like #CUSTOMER_NAME
ORDER BY Unique_ID desc
In order to do that i build an included index on the columns CONTRACT_NUMBER, INVOICE_NUMBER and CUSTOMER_NAME.
CREATE NONCLUSTERED INDEX [ix_search_columns_without_uniqueid] ON [dbo].[MyTable]
(
[CONTRACT_NUMBER] ASC,
[CUSTOMER_NAME] ASC,
[INVOICE_NUMBER] ASC
)
INCLUDE ( [ID],
[AGENCY],
[AMOUNT],
[INVOICE_MONTH],
[INVOICE_YEAR],
[Unique_ID],
[Contract_nr],
[bar_code]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Still the query is taking from 3 sec to 10 sec to execute. From the query execution plan i see that an index seek operation is taking place consuming about 30% of the total workload and than a Sort (Top N) operation which is consuming the other 70%. Any idea how can i optimize this query, a response time of less than 1 sec is preferred?
Note: I tried also to include dhe column [Unique_ID] in the index columns. In this case the query execution plan is doing an index scan, but with many users querying the database, i am having the same problem.
Check this page for more detail.
Update the statistic with a full scan to make the optimizer work easier.
UPDATE STATISTICS tablename WITH fullscan
GO
Set statistics time on and execute the following query
SET STATISTICS time ON
GO
SELECT num_of_reads, num_of_bytes_read,
num_of_writes, num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)
GO
SELECT TOP 100 c1, c2,c3
FROM yourtablename
WHERE c1<30000
ORDER BY c2
GO
SELECT num_of_reads, num_of_bytes_read,
num_of_writes, num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)
GO
Result
CPU time = 124 ms, elapsed time = 91 ms
Before Query execution
num_of_reads num_of_bytes_read num_of_writes num_of_bytes_written
-------------------- -------------------- -------------------- --------------------
725864 46824931328 793589 51814416384
After Query execution
num_of_reads num_of_bytes_read num_of_writes num_of_bytes_written
-------------------- -------------------- -------------------- --------------------
725864 46824931328 793589 51814416384
Source : https://www.mssqltips.com/sqlservertip/2053/trick-to-optimize-top-clause-in-sql-server/
Try and replace your clustered index (currently on two columns) with one solely on unique_id (assuming that it really is unique). This will aid your sorting. Then add a second covering index - as you have tried - on the three columns used in the WHERE. Check your statistics are upto date. Ihave a feeling that the column bar_code in your PK is preventing your sort from running as quickly as it could.
Do your variables contain wildcards?If they do,and they are leading wildcards, the index on the WHERE columns cannot be used. If they are not wildcarded, try a direct "=", assuming case-sensitivity is not an issue.
UPDATE: since you have leading wildcards, you will not be able to take advantage of an index on CONTRACT_NUMBER , INVOICE_NUMBER or CUSTOMER_NAME: as GriGrim suggested, the only alternative here is to use fulltext searches (CONTAINS keyword etc.).

SQL Server 2008 index fragmentation problem

Everyday I import 2,000,000 rows from some text files using BULK INSERT into SQL Server 2008 and then I do some post-processing to update the records.
I have some indexes on the table to execute the post-process as fast as possible and in normal situation, the post-processing script takes about 40 seconds to run.
But sometimes (I don't know when) the post-processing does not work. In the situation I've mentioned, it is not done after an hour! After rebuilding indexes, everything is fine and normal.
What should I do to prevent the problem to be happened?
Right now, I have a nightly job to rebuild all indexes. Why the index fragmentation grows up to 90%?
Update:
Here is my table which I import text file into:
CREATE TABLE [dbo].[My_Transactions](
[My_TransactionId] [bigint] NOT NULL,
[FileId] [int] NOT NULL,
[RowNo] [int] NOT NULL,
[TransactionTypeId] [smallint] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionNumber] [dbo].[TransactionNumber] NOT NULL,
[CardNumber] [dbo].[CardNumber] NULL,
[AccountNumber] [dbo].[CardNumber] NULL,
[BankCardTypeId] [smallint] NOT NULL,
[AcqBankId] [smallint] NOT NULL,
[DeviceNumber] [dbo].[DeviceNumber] NOT NULL,
[Amount] [dbo].[Amount] NOT NULL,
[DeviceTypeId] [smallint] NOT NULL,
[TransactionFee] [dbo].[Amount] NOT NULL,
[AcqSwitchId] [tinyint] NOT NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_Jam_Transactions_8_1290487676__K1_K4_K12_K6_K11_5] ON [dbo].[Jam_Transactions]
(
[Jam_TransactionId] ASC,
[TransactionTypeId] ASC,
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_Jam_Transactions_8_1290487676__K12_K6_K11_K1_5] ON [dbo].[Jam_Transactions]
(
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC,
[Jam_TransactionId] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Jam_Transactions] ON [dbo].[Jam_Transactions]
(
[Jam_TransactionId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Have you tried just refreshing the statistics after such a large insert:
UPDATE STATISTICS my_table
My experience with large bulk inserts is that the statistics get all mangled up and need a refreshed afterwards, it's also much faster than running a REINDEX or index REORDER.
Another option is to look into padding the index, you likely have no padding fill factor on your indexes meaning that if your index is:
A, B, D, E, F
and you insert a value with a CardNumber of C, then your index will look like:
A, B, D, E, F, C
and hence be ~20% fragmented, if you instead specify a fill factor for your index of say 15% we would see it look like roughly:
A, B, D, _, E, F
(Note the internal the empty space is put roughly in the middle point of the fillfactor % not at the end)
So that when you insert the C value it is closer to being correct, but it actually sees that the D is just swapped with the C and usually moves the D at that point.
Beyond that, are you sure that the fragmentation is actually the problem, as part of reindexing the table is read and loaded entirely into memory (provided it fits) and thus any query you run on it will be very fast.
Instead of including this table in the nightly job, why don't you make index maintenance (on this table specifically) part of the nightly import job, between BULK INSERT and whatever 'post-processing' is?
We don't have enough information to know why the index fragmentation grows that quickly. Which index(es)? How many indexes are there? What is the order of the data in the file?
You may also consider using the ORDER option in the BULK INSERT statement to change the way the data is inserted. It may make the load take longer but it should reduce the need to reorganize. Again depending on the order of the source data and the index(es) that become fragmented.
Finally, what is the impact of rebuilding/not rebuilding or reorganizing/not reorganizing the indexes? Have you tried both? Perhaps it makes the post-processing run quicker if you rebuild, but perhaps only a defragment is necessary. And while it may make the post-processing quicker, what about the queries that are run against the table later in the day? Have you done any metrics against those to see if they speed up or slow down depending on what you do at night?
Does your main table keep growing by 2 million rows per day or is there a lot of deleting taking place as well? Could you bulk insert into a temporary import table and do your processing prior to inserting into the main table? You can always use hints to force your queries to use certain indices:
SELECT *
FROM your_table_name WITH (INDEX(your_index_name))
WHERE your_column_name = 5
I would try taking the index offline before a mass row insertion and the bring it back online after the mass row insertion. Much, much more faster as compared to re-indexing, or performing a drop and create index..... difference is that the index is there data is being stored but the index is currently not being used, "Offline" until it is brought back "Online". I have a 1.5 million row insertion process and had a problem with one of my non clustered indexes fragmenting which was causing poor performance. Went form 99% fragmentation to .14% using the MSSQL Offline Online Index option....
Code sample:
ALTER INDEX idx_a ON dbo.tbl_A
REBUILD WITH (ONLINE = OFF);
Toggle between OFF and ON and you are good to go....

Problem in using indexes with SQL Server 2005 and Hibernate

I've a problem with a query generated by Hibernate that do not uses an index. Access to database is made from Java using JTDS and server version is SQL Server 2005, latest service pack.
The field is nullable and is a foreign key that, in some specific scenarios, could be completely null, column is indexed via a not clustered index but the index is never used when the column is entirely null, creating a large number of full table scans and performance issues.
The situation could be verified also using the standard query analyzer with the following SQL code:
Create table and indexes
CREATE TABLE [dbo].[TestNulls](
[PK] [varchar](36) NOT NULL,
[DATA] [varchar](36) NULL,
[DATANULL] [varchar](36) NULL,
CONSTRAINT [PK_TestNulls] PRIMARY KEY NONCLUSTERED
(
[PK] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IDX_DATA] ON [dbo].[TestNulls]
(
[DATA] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_DATANULL] ON [dbo].[TestNulls]
(
[DATANULL] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Fill it with some random data using the newid function
declare #i as int
set #i = 0
while (#i < 500000)
begin
set nocount on
insert into TestNulls values(NEWID(), NEWID(), null)
insert into TestNulls values(NEWID(), null, null)
insert into TestNulls values(NEWID(), null, null)
set #i = (#i + 1)
set nocount on
end;
This query perform a full table scan
declare #p varchar(36)
set #p = NEWID()
select PK, DATA, DATANULL from TestNulls
where DATANULL = #p
If I complete the query with a "and DATANULL IS NOT NULL" the query now uses the index.
Help needed:
How can I force the JTDS/Hibernate combination to use the index (the sendStringParametersAsUnicode is already set to false by default)?
Is there a way to append "and column is not null" for all hibernate queries that uses a nullable field?
Any explanation about this behavior?
Regards
Massimo
1) I think, we should avoid NULL values. Just use DEFAULT and place some {00000-0000-000...} as NULL value. Your data filling script generates too many nulls values, so selectivity of values of this field is very low. I think SQL Server will choose to scan then use index in this case (SQL Server automaticly chooses to use or does not use index itself). And it makes sence. You should analyse your REAL data. Any way you can force it to just use some index. You can create stored procedure to sql server and then query it from hibernate, for example,
or command hibernate to use custom query to request data (I think, it is possible) and add table hint to your query to force using some index:
INDEX ( index_val [ ,...n ] ):
select PK, DATA, DATANULL from TestNulls WITH INDEX(IDX_DATANULL)
The selectivity is the "number of rows" / "cardinality", so if you have 10K customers, and search for all "female", you have to consider that the search would return 10K/2 = 5K rows, so a very "bad" selectivity.
Luck.
You are using a table with no clustered index ("heap table", as it is called), that is generally not very efficient for SELECTs, because any meaningful query requires either a bookmark lookup or a full table scan.
So, to use the index the server will have to: 1) find the given values in the index and retrieve corresponding Row IDs, 2) retrive the rows by the IDs and return the data.
Given the nature of your data, the optimizer "thinks" full scan is mor efficient.
I'd suggest you to try:
Rebuilding the statistics on the table. Outdated stats could lead optimizer to wrong decisions.
Force using index via a hint. Do not forget to test whether it is really faster on your actual data (sometimes optimizer happens to know better than you).
Create a covering index for this query by adding some data (it will make inserts/updates somewhat slower, so you should consider the overall impact on the system):
CREATE INDEX IDX_DATANULL_FULL ON TestNulls (DATANULL) INCLUDE (PK, DATA)