SQL Server 2008 index fragmentation problem

Every day I import 2,000,000 rows from some text files into SQL Server 2008 using BULK INSERT, and then I do some post-processing to update the records.
I have some indexes on the table to make the post-processing run as fast as possible, and in the normal situation the post-processing script takes about 40 seconds.
But sometimes (I don't know when) the post-processing does not work: in the situation I've mentioned, it still isn't finished after an hour! After rebuilding the indexes, everything is fine and normal again.
What should I do to prevent this problem from happening?
Right now I have a nightly job to rebuild all indexes. Why does the index fragmentation grow to 90%?
Update:
Here is my table which I import text file into:
CREATE TABLE [dbo].[My_Transactions](
[My_TransactionId] [bigint] NOT NULL,
[FileId] [int] NOT NULL,
[RowNo] [int] NOT NULL,
[TransactionTypeId] [smallint] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionNumber] [dbo].[TransactionNumber] NOT NULL,
[CardNumber] [dbo].[CardNumber] NULL,
[AccountNumber] [dbo].[CardNumber] NULL,
[BankCardTypeId] [smallint] NOT NULL,
[AcqBankId] [smallint] NOT NULL,
[DeviceNumber] [dbo].[DeviceNumber] NOT NULL,
[Amount] [dbo].[Amount] NOT NULL,
[DeviceTypeId] [smallint] NOT NULL,
[TransactionFee] [dbo].[Amount] NOT NULL,
[AcqSwitchId] [tinyint] NOT NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_My_Transactions_8_1290487676__K1_K4_K12_K6_K11_5] ON [dbo].[My_Transactions]
(
[My_TransactionId] ASC,
[TransactionTypeId] ASC,
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_My_Transactions_8_1290487676__K12_K6_K11_K1_5] ON [dbo].[My_Transactions]
(
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC,
[My_TransactionId] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_My_Transactions] ON [dbo].[My_Transactions]
(
[My_TransactionId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Have you tried simply refreshing the statistics after such a large insert?
UPDATE STATISTICS my_table
My experience with large bulk inserts is that the statistics get mangled and need a refresh afterwards; it's also much faster than running an index REBUILD or REORGANIZE.
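For example, assuming the My_Transactions table above, a full-scan refresh (FULLSCAN is optional, but it gives the most accurate statistics after a large load) would look like:
UPDATE STATISTICS dbo.My_Transactions WITH FULLSCAN;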
Another option is to look into padding the index. You likely have no fill factor specified on your indexes, meaning that if your index pages contain:
A, B, D, E, F
and you insert a row with a CardNumber of C, then your index will look like:
A, B, D, E, F, C
and hence be ~20% fragmented. If you instead specify a fill factor that leaves, say, 15% free space, it would look roughly like:
A, B, D, _, E, F
(Note that the free space is distributed throughout the index pages, not left at the end.)
So when you insert the C value there is room for it near where it belongs, and the page does not have to be split.
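A minimal sketch of setting a fill factor on one of the indexes above (FILLFACTOR = 85 builds pages 85% full, leaving ~15% free space; the right value depends on your insert pattern):
ALTER INDEX [IX_My_Transactions] ON [dbo].[My_Transactions]
REBUILD WITH (FILLFACTOR = 85);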
Beyond that, are you sure that fragmentation is actually the problem? As part of reindexing, the table is read and loaded entirely into memory (provided it fits), so any query you run against it right afterwards will be very fast.

Instead of including this table in the nightly job, why don't you make index maintenance (on this table specifically) part of the nightly import job, between BULK INSERT and whatever 'post-processing' is?
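For example (a sketch; REORGANIZE is cheaper than a full REBUILD and is often sufficient between loads):
ALTER INDEX ALL ON [dbo].[My_Transactions] REORGANIZE;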
We don't have enough information to know why the index fragmentation grows that quickly. Which index(es)? How many indexes are there? What is the order of the data in the file?
You may also consider using the ORDER option of the BULK INSERT statement to change the way the data is inserted. It may make the load take longer, but it should reduce the need to reorganize, again depending on the order of the source data and which index(es) become fragmented.
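A sketch (the file path is illustrative, and the ORDER hint only helps when it matches the clustered index of the target table):
BULK INSERT dbo.My_Transactions
FROM 'D:\import\transactions.txt'
WITH (ORDER (My_TransactionId ASC), TABLOCK);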
Finally, what is the impact of rebuilding/not rebuilding or reorganizing/not reorganizing the indexes? Have you tried both? Perhaps it makes the post-processing run quicker if you rebuild, but perhaps only a defragment is necessary. And while it may make the post-processing quicker, what about the queries that are run against the table later in the day? Have you done any metrics against those to see if they speed up or slow down depending on what you do at night?

Does your main table keep growing by 2 million rows per day or is there a lot of deleting taking place as well? Could you bulk insert into a temporary import table and do your processing prior to inserting into the main table? You can always use hints to force your queries to use certain indices:
SELECT *
FROM your_table_name WITH (INDEX(your_index_name))
WHERE your_column_name = 5

I would try disabling the index before the mass insert and rebuilding it afterwards. That is much, much faster than reindexing or dropping and re-creating the index. The difference is that while the index is disabled its definition stays in place, but it is neither maintained nor used until it is brought back by a rebuild. I have a 1.5 million row insertion process and had a problem with one of my non-clustered indexes fragmenting, which was causing poor performance. Using this disable/rebuild approach I went from 99% fragmentation to 0.14%.
Code sample:
ALTER INDEX idx_a ON dbo.tbl_A DISABLE;
-- ... perform the mass insert here ...
ALTER INDEX idx_a ON dbo.tbl_A REBUILD WITH (ONLINE = OFF);
The REBUILD re-enables the index and leaves it freshly defragmented.


Table deadlock issues with lock escalations

I have a table where anywhere between 1 and 5 million records come in as a batch, and then a bunch of stored procedures are run on them that update and delete records in the batch.
All of these stored procedures use two fields for selectivity, so they only touch the records in that batch.
Both of these fields are in a nonclustered index.
There are times when multiple batches run at the same time, and I continually get deadlocks between batches, I assume due to lock escalation.
I'm trying to figure out whether there is a way to solve this without a complete redesign to a dedicated table per batch. Is disabling page locks asking for more trouble?
Additional information:
Example of the table structure and index (the real table has a lot more columns than this):
CREATE TABLE [dbo].[TempImport](
[UID] [int] IDENTITY(1,1) NOT NULL,
[EID] [int] NULL,
[EXTID] [int] NULL,
[COL1] [varchar](50) NULL,
[COL2] [varchar](50) NULL,
CONSTRAINT [PK_TempImport] PRIMARY KEY CLUSTERED
(
[UID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_TempImport_Main] ON [dbo].[TempImport]
(
[EID] ASC,
[EXTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
And the types of queries in the stored procedures look something like this:
update TempImport set COL1 = 'foo' where EID = @EID and EXTID = @EXT and COL2 = 'bar'
And the last thing that happens when the batch completes is something like this:
Delete from TempImport where EID = @EID and EXTID = @EXT
It is typically the delete and the updates in the stored procedures that are involved in the deadlock.
Please let me know if any other info would be useful
Just some potential suggestions:
Yeah, you could split them up into (say) batches of 2000, to stop the row locks escalating to table locks, but you'd have a gazillion running. Probably not a good solution.
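For reference, splitting the delete into batches would look something like this inside the stored procedure (2000 keeps each statement well below the roughly 5,000-lock threshold at which SQL Server attempts escalation to a table lock):
WHILE 1 = 1
BEGIN
    DELETE TOP (2000) FROM dbo.TempImport
    WHERE EID = @EID AND EXTID = @EXT;
    IF @@ROWCOUNT = 0 BREAK;
END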
You could modify the update processes (as per Brent Ozar's video on deadlocks that I recommended in the comments) to update each table once and only once. As a suggestion, you could also try encapsulating them in transactions. These changes could remove the deadlocks but add blocking (where the second operation has to wait for the first to finish).
A structural method is to make a 'loading' or 'scratch' table which has IDs and the relevant data to be operated on (you could see this as a queue). When calls to do updates come in, they simply insert their requests into that queue. Then you have an asynchronous process (that can only run one at a time) that gets all outstanding data from the queue (and flags it as such), does the relevant processing, then cleans the processed data out of the scratch table. A sketch follows below.
Note that if you have other things accessing/using this table, then they could get blocked or deadlocked on this table too, and then your approach needs to be very careful.
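A minimal sketch of such a queue table and the single-worker claim step (all names here are illustrative, not taken from the system above):
CREATE TABLE dbo.WorkQueue (
    QueueId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    EID int NOT NULL,
    EXTID int NOT NULL,
    IsProcessing bit NOT NULL DEFAULT (0)
);
-- The asynchronous worker claims all outstanding rows in one statement;
-- READPAST skips rows that another session has already locked.
UPDATE q
SET IsProcessing = 1
OUTPUT inserted.QueueId, inserted.EID, inserted.EXTID
FROM dbo.WorkQueue AS q WITH (UPDLOCK, READPAST)
WHERE q.IsProcessing = 0;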

Is this index on a table necessary?

I have a table with 5 indexes in SQL server. I'm aware of the fact that indexes affect inserts so I'd like to keep them to the minimum. I definitely need the first four indexes (as seen in the below sample).
However, I'm not quite sure whether the last index (TimeSubmitted) is absolutely necessary; note that there is already a ClientId+TimeSubmitted index.
The only reason it is there is to make purging expired rows from the table more efficient, or at least that is my intention. The purging job will be scheduled to run once a day, most likely at night.
There could be hundreds of thousands of records in the table at any given time.
Stored proc to purge the table:
CREATE PROCEDURE uspPurgeMyTable
(
@ExpiryDate datetime
)
AS
BEGIN
DELETE FROM MyTable
WHERE TimeSubmitted < @ExpiryDate;
END
Table (non relevant columns omitted):
CREATE TABLE MyTable (
[ClientId] [char](36) NOT NULL,
[UserName] [nvarchar](256) NOT NULL,
[TimeSubmitted] [datetime] NOT NULL,
[ProviderId] [uniqueidentifier] NULL,
[RegionId] [int] NULL
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientArea] ON [dbo].[MyTable]
(
[ClientId] ASC,
[RegionId] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientPrinter] ON [dbo].[MyTable]
(
[ClientId] ASC,
[ProviderId] DESC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientTime] ON [dbo].[MyTable]
(
[ClientId] ASC,
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientUser] ON [dbo].[MyTable]
(
[ClientId] ASC,
[UserName] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_MyTable_TimeSubmitted] ON [dbo].[MyTable]
(
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
Yes, the IX_MyTable_TimeSubmitted index will be beneficial for the purge process you described. Since the purge filters on the TimeSubmitted column alone and none of the other indexes lead with that column, the best SQL Server could do with them is an index scan.
As with any indexing, there are trade-offs between read and update performance. You should measure your read, insert, and purge performance with and without the additional index to see what provides the best response time for each situation. Batch processes such as the purge that run overnight may not be as time sensitive as inserting or reading in your particular scenario.
If you are concerned about performance, you could take a different approach: partitioning. A reasonable place to learn about partitioning is the documentation.
A partitioned table stores each partition in a separate set of files (filegroups). These are invisible to users of the database, in general. However, if you want to remove old data, you can simply switch out an entire partition.
This not only eliminates the need for the index; it also eliminates the need for the delete. And removing a whole partition is much more efficient than deleting row by row, because there is much less logging.
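A sketch, assuming MyTable has been partitioned by month on TimeSubmitted and that dbo.MyTable_Purge is an empty table with an identical structure on the same filegroup (both are assumptions, not shown above):
-- Metadata-only operation: moves the oldest partition out of MyTable
ALTER TABLE dbo.MyTable SWITCH PARTITION 1 TO dbo.MyTable_Purge;
-- Far less logging than an equivalent DELETE
TRUNCATE TABLE dbo.MyTable_Purge;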

SQL Server : MERGE performance

I have a database table with 5 million rows. The clustered index is an auto-increment identity column. The PK is a code-generated NVARCHAR(256) holding a SHA256 hash of a URL; it is covered by a non-clustered index on the table.
The table is as follows:
CREATE TABLE [dbo].[store_image](
[imageSHAID] [nvarchar](256) NOT NULL,
[imageGUID] [uniqueidentifier] NOT NULL,
[imageURL] [nvarchar](2000) NOT NULL,
[showCount] [bigint] NOT NULL,
[imageURLIndex] AS (CONVERT([nvarchar](450),[imageURL],(0))),
[autoIncID] [bigint] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_imageSHAID] PRIMARY KEY NONCLUSTERED
(
[imageSHAID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [autoIncPK] ON [dbo].[store_image]
(
[autoIncID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
imageSHAID is a SHA256 hash of an image URL, e.g. "http://blah.com/image1.jpg"; it is stored in a varchar of length 256.
imageGUID is a code-generated GUID with which I identify the image (it will be used as an index later, but for now I have omitted that index).
imageURL is the full URL of the image (up to 2000 characters).
showCount is the number of times the image has been shown; it is incremented each time this particular image is shown.
imageURLIndex is a computed column limited to 450 characters; it allows me to do text searches on the imageURL should I choose to, and it is indexable (again, the index is omitted for brevity).
autoIncID is the clustered index key; it should allow faster inserting of data.
Periodically I merge from a temp table into the store_image table. The temp table structure is as follows (very similar to the store_image table):
CREATE TABLE [dbo].[store_image_temp](
[imageSHAID] [nvarchar](256) NULL,
[imageURL] [nvarchar](2000) NULL,
[showCount] [bigint] NULL
) ON [PRIMARY]
GO
When the merge process is run, I write a DataTable to the temp table using the following code:
using (SqlBulkCopy bulk = new SqlBulkCopy(storeConn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.KeepNulls, null))
{
bulk.DestinationTableName = "[dbo].[store_image_temp]";
bulk.WriteToServer(imageTableUpsetDataTable);
}
I then run the merge command to update the showCount in the store_image table by merging from the temp table based on the imageSHAID. If the image doesn't currently exist in the store_image table, I create it:
merge into store_image as Target using [dbo].[store_image_temp] as Source
on Target.imageSHAID=Source.imageSHAID
when matched then update set
Target.showCount=Target.showCount+Source.showCount
when not matched then insert values (Source.imageSHAID,NEWID(), Source.imageURL, Source.showCount);
I'm typically trying to merge 2k-5k rows from the temp table to the store_image table at any one merge process.
I used to run this DB on an SSD (connected via SATA 1) and it was very fast (under 200 ms). I ran out of room on the SSD, so I swapped the DB to a 1 TB 7200 RPM spinning disk; since then, completion times are 6-100 seconds (6,000-100,000 ms). While the bulk insert is running I can see disk activity of around 1-2 MB/sec and low CPU usage.
Is this a typical write time for this amount of data? It seems a little slow to me; what is causing the slow performance? Surely with imageSHAID being indexed we should expect quicker seek times than this?
Any help would be appreciated.
Thanks for your time.
Your UPDATE clause in the MERGE updates showCount. This requires a key lookup on the clustered index.
However, the clustered index is declared non-unique, so SQL Server adds a hidden uniquifier, and the optimiser cannot assume uniqueness even though the underlying column is unique.
So, I'd make these changes (see the sketch after this list):
make the clustered primary key autoIncID
make the current PK on imageSHAID a standalone unique index (not a constraint) and add an INCLUDE for showCount; unique constraints can't have INCLUDEd columns
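A sketch of those changes using the names from the question (the constraint and index names PK_store_image and UX_store_image_imageSHAID are illustrative):
-- Drop the existing non-unique clustered index and the nonclustered PK
DROP INDEX autoIncPK ON dbo.store_image;
ALTER TABLE dbo.store_image DROP CONSTRAINT PK_imageSHAID;
-- Re-create the identity column as the unique clustered primary key
ALTER TABLE dbo.store_image ADD CONSTRAINT PK_store_image PRIMARY KEY CLUSTERED (autoIncID);
-- Unique index (not a constraint) on the hash, covering showCount
CREATE UNIQUE NONCLUSTERED INDEX UX_store_image_imageSHAID
ON dbo.store_image (imageSHAID) INCLUDE (showCount);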
More observations:
you don't need nvarchar for the hash or URL columns; these are not Unicode.
A hash is also fixed length, so a hex-encoded SHA256 fits in char(64).
The declared length of a column determines how much memory the server assigns to the query. See this for more: is there an advantage to varchar(500) over varchar(8000)?

Soft Delete - Use IsDeleted flag or separate joiner table?

Should we use a flag for soft deletes, or a separate joiner table? Which is more efficient? Database is SQL Server.
Background Information
A while back we had a DB consultant come in and look at our database schema. When we soft delete a record, we update an IsDeleted flag on the appropriate table(s). It was suggested that, instead of using a flag, we store the deleted records in a separate table and use a join, as that would be better. I've put that suggestion to the test, but at least on the surface, the extra table and join look to be more expensive than using a flag.
Initial Testing
I've set up this test.
Two tables, Example and DeletedExample. I added a nonclustered index on the IsDeleted column.
I did three tests, loading a million records with the following deleted/non deleted ratios:
Deleted/NonDeleted
50/50
10/90
1/99
Results for 50/50, 10/90, and 1/99 (execution-plan screenshots omitted).
Database scripts, for reference (Example, DeletedExample, and the index on Example.IsDeleted):
CREATE TABLE [dbo].[Example](
[ID] [int] NOT NULL,
[Column1] [nvarchar](50) NULL,
[IsDeleted] [bit] NOT NULL,
CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Example] ADD CONSTRAINT [DF_Example_IsDeleted] DEFAULT ((0)) FOR [IsDeleted]
GO
CREATE TABLE [dbo].[DeletedExample](
[ID] [int] NOT NULL,
CONSTRAINT [PK_DeletedExample] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DeletedExample] WITH CHECK ADD CONSTRAINT [FK_DeletedExample_Example] FOREIGN KEY([ID])
REFERENCES [dbo].[Example] ([ID])
GO
ALTER TABLE [dbo].[DeletedExample] CHECK CONSTRAINT [FK_DeletedExample_Example]
GO
CREATE NONCLUSTERED INDEX [IX_IsDeleted] ON [dbo].[Example]
(
[IsDeleted] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
The numbers you have seem to indicate that my initial impression was correct: if your most common query against this database is to filter on IsDeleted = 0, then performance will be better with a simple bit flag, especially if you make wise use of indexes.
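One example of such an index on SQL Server 2008 and later is a filtered index covering only the live rows (a sketch, assuming most queries filter on IsDeleted = 0; the indexed column here is just an illustration):
CREATE NONCLUSTERED INDEX IX_Example_NotDeleted
ON dbo.Example (Column1)
WHERE IsDeleted = 0;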
If you often query for deleted and undeleted data separately, then you could see a performance gain by having a table for deleted items and another for undeleted items, with identical fields. But denormalizing your data like this is rarely a good idea, as it will most often cost you far more in code maintenance costs than it will gain you in performance increases.
I'm not a SQL expert, but in my opinion it all depends on how heavily the database is used. If the database is accessed by a large number of users and needs to be efficient, then a separate deleted-records table is good. The better option would be to use a flag during production hours and, as part of daily/weekly/monthly maintenance, move all the soft-deleted records to the deleted-records table and clear them out of the production table. A mixture of both options would be a good one.

Problem in using indexes with SQL Server 2005 and Hibernate

I have a problem with a query generated by Hibernate that does not use an index. Access to the database is made from Java using jTDS, and the server is SQL Server 2005 with the latest service pack.
The field is nullable and is a foreign key that, in some specific scenarios, can be entirely null. The column is indexed via a non-clustered index, but the index is never used when the column is entirely null, producing a large number of full table scans and performance issues.
The situation can also be verified using the standard query analyzer with the following SQL code:
Create table and indexes
CREATE TABLE [dbo].[TestNulls](
[PK] [varchar](36) NOT NULL,
[DATA] [varchar](36) NULL,
[DATANULL] [varchar](36) NULL,
CONSTRAINT [PK_TestNulls] PRIMARY KEY NONCLUSTERED
(
[PK] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IDX_DATA] ON [dbo].[TestNulls]
(
[DATA] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_DATANULL] ON [dbo].[TestNulls]
(
[DATANULL] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Fill it with some random data using the NEWID function:
declare @i as int
set @i = 0
set nocount on
while (@i < 500000)
begin
insert into TestNulls values(NEWID(), NEWID(), null)
insert into TestNulls values(NEWID(), null, null)
insert into TestNulls values(NEWID(), null, null)
set @i = (@i + 1)
end;
This query performs a full table scan:
declare @p varchar(36)
set @p = NEWID()
select PK, DATA, DATANULL from TestNulls
where DATANULL = @p
If I add "and DATANULL IS NOT NULL" to the query, it now uses the index.
Help needed:
How can I force the jTDS/Hibernate combination to use the index (the sendStringParametersAsUnicode property is already set to false by default)?
Is there a way to append "and column is not null" to all Hibernate queries that use a nullable field?
Any explanation about this behavior?
Regards
Massimo
1) I think we should avoid NULL values: use a DEFAULT and store a placeholder such as {00000-0000-000...} instead of NULL. Your data-filling script generates too many NULL values, so the selectivity of this field is very low. I think SQL Server will choose to scan rather than use the index in this case (SQL Server decides by itself whether or not to use an index), and that makes sense. You should analyse your REAL data. Anyway, you can force it to use a specific index: you can create a stored procedure on the SQL Server and query it from Hibernate, for example,
or tell Hibernate to use a custom (native) query to fetch the data (I think this is possible) and add a table hint to your query to force the use of a particular index:
INDEX ( index_val [ ,...n ] ):
select PK, DATA, DATANULL from TestNulls WITH (INDEX(IDX_DATANULL))
Selectivity here is roughly "rows returned" / "total rows (cardinality)": if you have 10K customers and search for all "female" ones, the search would return about 10K/2 = 5K rows, which is very "bad" selectivity.
Good luck.
You are using a table with no clustered index (a "heap table", as it is called), which is generally not very efficient for SELECTs, because any meaningful query requires either a bookmark lookup or a full table scan.
So, to use the index the server has to: 1) find the given values in the index and retrieve the corresponding Row IDs, and 2) retrieve the rows by those IDs and return the data.
Given the nature of your data, the optimizer "thinks" a full scan is more efficient.
I'd suggest you to try:
Rebuilding the statistics on the table. Outdated stats can lead the optimizer to wrong decisions.
Force the index via a hint. Do not forget to test whether it is really faster on your actual data (sometimes the optimizer happens to know better than you).
Create a covering index for this query by adding some data (it will make inserts/updates somewhat slower, so you should consider the overall impact on the system):
CREATE INDEX IDX_DATANULL_FULL ON TestNulls (DATANULL) INCLUDE (PK, DATA)