Updating SQL Database Table with 30M Rows - sql

I have an MS SQL table with approx. 30M rows in which I need to update a field based on the previous records. Here is an update that works, but it takes an incredible amount of time:
UPDATE AccountTransaction
SET EndingBalance = (SELECT COALESCE(SUM(b.amount), 0)
FROM AccountTransaction AS b
WHERE b.AccountId = AccountTransaction.AccountId
and b.Date <= AccountTransaction.Date
and (b.Date != AccountTransaction.Date
or b.CreatedDate < AccountTransaction.CreatedDate))
+ Amount
Here is the full DDL:
CREATE TABLE [dbo].[AccountTransaction](
[AccountTransactionId] [uniqueidentifier] NOT NULL,
[AccountId] [uniqueidentifier] NOT NULL,
[Amount] [decimal](16, 2) NOT NULL,
[EndingBalance] [decimal](16, 2) NOT NULL,
[Date] [date] NOT NULL,
[CreatedDate] [datetime2](3) NOT NULL,
CONSTRAINT [PkAccountTransaction] PRIMARY KEY CLUSTERED
(
[AccountTransactionId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IxAccountTransaction_AccountId_Date_CreatedDate] ON [dbo].[AccountTransaction]
(
[AccountId] ASC,
[Date] ASC,
[CreatedDate] ASC
)
INCLUDE ([AccountTransactionId],
[Amount],
[EndingBalance]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IxAccountTransaction_AccountId] ON [dbo].[AccountTransaction]
(
[AccountId] ASC
)
INCLUDE ([AccountTransactionId],
[Amount],
[EndingBalance],
[Date],
[CreatedDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

The following should yield much better performance and will be able to take advantage of the IxAccountTransaction_AccountId_Date_CreatedDate index...
WITH
cte_Runningtotal AS (
SELECT
at1.EndingBalance,
-- CreatedDate breaks ties within the same Date, matching the original UPDATE
NewEB = SUM(at1.Amount) OVER (PARTITION BY at1.AccountId ORDER BY at1.[Date], at1.CreatedDate ROWS UNBOUNDED PRECEDING)
FROM
dbo.AccountTransaction at1
)
UPDATE rt SET
rt.EndingBalance = rt.NewEB
FROM
cte_Runningtotal rt;
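The windowed running total can be tried on a handful of rows before running it against 30M. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25+ is assumed for window functions), the four sample rows are invented, and integer keys replace the uniqueidentifier columns. It computes the totals with a SELECT and writes them back by primary key, whereas on SQL Server the updatable CTE does this in one statement.

```python
import sqlite3

# SQLite stands in for SQL Server here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE AccountTransaction (
        AccountTransactionId INTEGER PRIMARY KEY,
        AccountId            INTEGER NOT NULL,
        Amount               NUMERIC NOT NULL,
        EndingBalance        NUMERIC NOT NULL DEFAULT 0,
        Date                 TEXT    NOT NULL,
        CreatedDate          TEXT    NOT NULL
    )""")

# Hypothetical sample data: rows 1 and 2 share a Date and differ only in CreatedDate.
conn.executemany(
    "INSERT INTO AccountTransaction "
    "(AccountTransactionId, AccountId, Amount, Date, CreatedDate) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 100, "2020-01-01", "2020-01-01 09:00:00"),
        (2, 1, -30, "2020-01-01", "2020-01-01 10:00:00"),
        (3, 1, 50, "2020-01-02", "2020-01-02 08:00:00"),
        (4, 2, 10, "2020-01-01", "2020-01-01 12:00:00"),
    ],
)

# Running total per account, ordered by Date and then CreatedDate.
running = conn.execute("""
    SELECT AccountTransactionId,
           SUM(Amount) OVER (
               PARTITION BY AccountId
               ORDER BY Date, CreatedDate
               ROWS UNBOUNDED PRECEDING
           ) AS NewEB
    FROM AccountTransaction
""").fetchall()

# Write the computed balances back by primary key.
conn.executemany(
    "UPDATE AccountTransaction SET EndingBalance = ? WHERE AccountTransactionId = ?",
    [(new_eb, txn_id) for txn_id, new_eb in running],
)

balances = conn.execute(
    "SELECT AccountTransactionId, EndingBalance "
    "FROM AccountTransaction ORDER BY AccountTransactionId"
).fetchall()
print(balances)
```

Rows 1 and 2 share a Date, so ordering by Date alone would leave their relative order undefined; including CreatedDate in the ORDER BY makes the balances deterministic.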

Related

OFFSET ... FETCH is slow on high paging value

This is my scenario:
CREATE TABLE [dbo].[tblSMSSendQueueMain](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SendMethod] [int] NOT NULL,
CONSTRAINT [PK_tblSMSSendQueueLog] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblSMSSendQueueMainSendStatus](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[QueueID] [int] NULL,
[SendStatus] [int] NULL,
[StatusDate] [datetime] NULL,
[UserID] [int] NULL,
CONSTRAINT [PK_tblSMSSendQueueMainSendStatus] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
and some indexes:
CREATE NONCLUSTERED INDEX [IX_tblSMSSendQueueMainSendStatus_SendStatus_Single] ON [dbo].[tblSMSSendQueueMainSendStatus]
(
[SendStatus] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSSendQueueMain_SendMethod] ON [dbo].[tblSMSSendQueueMain]
(
[SendMethod] ASC,
[ID] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Each table has about 13M rows.
The QueueID column of tblSMSSendQueueMainSendStatus is a foreign key referencing the ID column of tblSMSSendQueueMain.
The server has an 8-core Xeon CPU and 8GB RAM.
I use OFFSET and FETCH for paging. It works well for offsets under 100k, but when the offset goes above 100k the query becomes slow and takes about 5 or 6 seconds to run.
This is my query:
SELECT q.ID
FROM tblSMSSendQueueMain q
INNER JOIN tblSMSSendQueueMainSendStatus qs
ON q.ID = qs.QueueID
WHERE 1 = 1
AND qs.SendStatus = 5
AND [SendMethod] = 19
ORDER BY q.ID desc OFFSET 10 * (1000000 - 1) ROWS
FETCH NEXT 10 ROWS ONLY
Does anyone have any idea where I am going wrong?
The reason this is so slow is that the only way for the server to get the correct starting row is by reading every single row before it.
You are much better off using Keyset Pagination. Instead of paging by starting row-number, pass in a parameter of the starting key.
For this to work, you must return a unique column or columns, and for this to be performant they should be indexed well.
Pass in @startingRow as the previous batch's highest ID. You can get this any way you like: because of the ORDER BY it is the ID in the last row of the previous page, so your client app can keep it in a variable.
SELECT TOP (10)
q.ID
FROM tblSMSSendQueueMain q
INNER JOIN tblSMSSendQueueMainSendStatus qs
ON q.ID = qs.QueueID
WHERE 1 = 1
AND qs.SendStatus = 5
AND q.[SendMethod] = 19
AND qs.ID > @startingRow -- drop this line for the first query
ORDER BY qs.ID;
I must say, your query is somewhat strange. If the foreign key is q.ID = qs.QueueID, then you will get multiple identical results if you are just querying q.ID. I suspect you actually want each q.ID only once, in which case q.ID is your unique key:
SELECT DISTINCT TOP (10)
q.ID
FROM tblSMSSendQueueMain q
INNER JOIN tblSMSSendQueueMainSendStatus qs
ON q.ID = qs.QueueID
WHERE 1 = 1
AND qs.SendStatus = 5
AND q.[SendMethod] = 19
AND q.ID > @startingRow -- drop this line for the first query
ORDER BY q.ID;
Alternatively, I would prefer an EXISTS/IN as it more clearly states the requirement:
SELECT TOP (10)
q.ID
FROM tblSMSSendQueueMain q
WHERE 1 = 1
AND q.[SendMethod] = 19
AND q.ID IN (
SELECT qs.QueueID
FROM tblSMSSendQueueMainSendStatus qs
WHERE qs.SendStatus = 5
)
AND q.ID > @startingRow -- drop this line for the first query
ORDER BY q.ID;
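As a rough illustration of how a client would drive keyset pagination, the sketch below pages through the IN-based query by remembering the last ID of each page. It is a minimal sketch under assumptions, not the answer's exact code: Python's built-in sqlite3 module stands in for SQL Server (so LIMIT replaces TOP), the 25 sample rows are invented, and `fetch_page` and the index name `IX_Status` are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblSMSSendQueueMain (
        ID INTEGER PRIMARY KEY,
        SendMethod INTEGER NOT NULL
    );
    CREATE TABLE tblSMSSendQueueMainSendStatus (
        ID INTEGER PRIMARY KEY,
        QueueID INTEGER,
        SendStatus INTEGER
    );
    CREATE INDEX IX_Status
        ON tblSMSSendQueueMainSendStatus (SendStatus, QueueID);
""")

# Hypothetical data: 25 queue rows, each with two matching status rows.
conn.executemany(
    "INSERT INTO tblSMSSendQueueMain (ID, SendMethod) VALUES (?, ?)",
    [(i, 19) for i in range(1, 26)],
)
conn.executemany(
    "INSERT INTO tblSMSSendQueueMainSendStatus (QueueID, SendStatus) VALUES (?, 5)",
    [(i,) for i in range(1, 26)] * 2,
)


def fetch_page(conn, last_seen_id=None, page_size=10):
    """Return the next page of queue IDs after last_seen_id (None = first page)."""
    sql = """
        SELECT q.ID
        FROM tblSMSSendQueueMain q
        WHERE q.SendMethod = 19
          AND q.ID IN (SELECT qs.QueueID
                       FROM tblSMSSendQueueMainSendStatus qs
                       WHERE qs.SendStatus = 5)
    """
    params = []
    if last_seen_id is not None:
        # Keyset predicate: seek past the last key instead of counting rows.
        sql += " AND q.ID > ?"
        params.append(last_seen_id)
    sql += " ORDER BY q.ID LIMIT ?"
    params.append(page_size)
    return [row[0] for row in conn.execute(sql, params)]


# Walk all pages, carrying the last ID of each page into the next call.
pages = []
last_id = None
while True:
    page = fetch_page(conn, last_id)
    if not page:
        break
    pages.append(page)
    last_id = page[-1]

print(pages)
```

Each call only needs the last key of the previous page, so the cost of a page does not grow with how deep into the result set it is. The trade-off is that a client cannot jump straight to an arbitrary page number.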

Azure DB/MSSQL 2017 query performance regression

I have this pretty simple table with 17m records in it:
CREATE TABLE [dbo].[LineNumbers](
[Id] [int] IDENTITY(1,1) NOT NULL,
[LineDescriptionId] [int] NOT NULL,
[ProtocolId] [int] NULL,
[Value] [int] NULL,
CONSTRAINT [PK_LineNumbers] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
A query against the table with an additional join works fine as long as ProtocolId is not in the select list:
select top 1
ln.LineDescriptionId
from LineNumbers ln
join LineDescriptions ld on ld.Id = ln.LineDescriptionId and ld.ProtocolSetId = 25
-- Elapsed time: 00:00:00.1718750
Execution plan: https://www.brentozar.com/pastetheplan/?id=rJV34gvR7
But when I add ProtocolId to the select list, the query time grows dramatically:
select top 1
ln.ProtocolId
from LineNumbers ln
join LineDescriptions ld on ld.Id = ln.LineDescriptionId and ld.ProtocolSetId = 25
-- Elapsed time: 00:02:19.6464843
Execution plan: https://www.brentozar.com/pastetheplan/?id=SkG-hyDCQ
Also, this works smoothly:
select top 1
(select ProtocolId from LineNumbers where LineNumbers.Id = ln.Id) as ProtocolId
from LineNumbers ln
join LineDescriptions ld on ld.Id = ln.LineDescriptionId and ld.ProtocolSetId = 25
-- Elapsed time: 00:00:00.1718750
I tried these queries and variations on Azure DB and on a local MSSQL 2017 instance. The results are the same: as long as I keep ProtocolId out of the select list, everything is fine.
Is there some mistake in my data schema (everything was created via Entity Framework migrations)?
CREATE TABLE [dbo].[LineNumbers](
[Id] [int] IDENTITY(1,1) NOT NULL,
[LineDescriptionId] [int] NOT NULL,
[ProtocolId] [int] NULL,
[Value] [int] NULL,
CONSTRAINT [PK_LineNumbers] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Index [IX_LineNumbers_LineDescriptionId] Script Date: 21.11.2018 10:47:09 ******/
CREATE NONCLUSTERED INDEX [IX_LineNumbers_LineDescriptionId] ON [dbo].[LineNumbers]
(
[LineDescriptionId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [IX_LineNumbers_LineDescriptionId_Value] Script Date: 21.11.2018 10:47:09 ******/
CREATE NONCLUSTERED INDEX [IX_LineNumbers_LineDescriptionId_Value] ON [dbo].[LineNumbers]
(
[LineDescriptionId] ASC,
[Value] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [IX_LineNumbers_ProtocolId] Script Date: 21.11.2018 10:47:09 ******/
CREATE NONCLUSTERED INDEX [IX_LineNumbers_ProtocolId] ON [dbo].[LineNumbers]
(
[ProtocolId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
ALTER TABLE [dbo].[LineNumbers] WITH NOCHECK ADD CONSTRAINT [FK_LineNumbers_LineDescriptions_LineDescriptionId] FOREIGN KEY([LineDescriptionId])
REFERENCES [dbo].[LineDescriptions] ([Id])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[LineNumbers] CHECK CONSTRAINT [FK_LineNumbers_LineDescriptions_LineDescriptionId]
GO
ALTER TABLE [dbo].[LineNumbers] WITH NOCHECK ADD CONSTRAINT [FK_LineNumbers_Protocols_ProtocolId] FOREIGN KEY([ProtocolId])
REFERENCES [dbo].[Protocols] ([Id])
GO
ALTER TABLE [dbo].[LineNumbers] CHECK CONSTRAINT [FK_LineNumbers_Protocols_ProtocolId]
GO
Eventually, I solved it by adding a nonclustered index on LineNumbers.LineDescriptionId that includes LineNumbers.ProtocolId:
CREATE NONCLUSTERED INDEX [IX_LineNumbers_LineDescriptionId_ProtocolId] ON
[dbo].[LineNumbers]([LineDescriptionId] ASC)
INCLUDE ([ProtocolId])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
Result:
SELECT TOP 1
ln.ProtocolId
FROM LineNumbers ln
JOIN LineDescriptions ld ON ld.Id = ln.LineDescriptionId AND ld.ProtocolSetId = 25
-- Elapsed time: 00:00:00.1403155
Execution plan: https://www.brentozar.com/pastetheplan/?id=Syywn1wRQ
Why does it work that way?
For example, if I implement a similar use case with PostgreSQL, no additional indexes are needed at all (besides the obvious FK indexes on the ProtocolId and LineDescriptionId fields).
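One plausible reading, offered as an assumption since the linked plans are not reproduced here: IX_LineNumbers_LineDescriptionId holds only LineDescriptionId plus the clustering key Id, so returning ProtocolId forces either a lookup into the clustered index per row or a different plan, while the new index covers the query. The sketch below shows the same distinction using Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no INCLUDE clause, so a composite key is used as the nearest equivalent, and `plan_for` is a hypothetical helper.

```python
import sqlite3


def plan_for(index_sql):
    """Build an empty LineNumbers table with one index and return the query plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE LineNumbers (
            Id INTEGER PRIMARY KEY,
            LineDescriptionId INTEGER NOT NULL,
            ProtocolId INTEGER,
            Value INTEGER
        )""")
    conn.execute(index_sql)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT ProtocolId FROM LineNumbers WHERE LineDescriptionId = 25"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail string.
    return " ".join(row[-1] for row in rows)


# Index on the join column only: ProtocolId must be fetched from the table itself.
narrow_plan = plan_for(
    "CREATE INDEX IX_LineNumbers_LineDescriptionId "
    "ON LineNumbers (LineDescriptionId)"
)

# Index that also carries ProtocolId: the query can be answered from the index alone.
covering_plan = plan_for(
    "CREATE INDEX IX_LineNumbers_LineDescriptionId_ProtocolId "
    "ON LineNumbers (LineDescriptionId, ProtocolId)"
)

print(narrow_plan)
print(covering_plan)
```

In SQLite the second plan is reported as using a covering index, which is the same effect the INCLUDE ([ProtocolId]) index achieves on SQL Server. This only illustrates the covering-index idea; it does not explain why SQL Server's optimizer picked the slow plan in the first place.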

SQL Server slow select from large Table

I have two very large SQL Server tables for an IoT project.
The first table is Message (7,423,889,085 rows):
CREATE TABLE [aymax].[Message](
[MessageId] [bigint] IDENTITY(1,1) NOT NULL,
[ObjectId] [int] NOT NULL,
[TimeStamp] [datetime] NOT NULL CONSTRAINT [DF__Message__TimeSta__3B75D760] DEFAULT (getdate()),
[GpsTime] [datetime] NOT NULL,
[VisibleSatelites] [int] NOT NULL,
[X] [float] NOT NULL,
[Y] [float] NOT NULL,
CONSTRAINT [Message_PK] PRIMARY KEY NONCLUSTERED
(
[MessageId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
The second table is MessageSensors (26,359,568,037 rows); it holds the value of each sensor for every row in the Message table:
CREATE TABLE [aymax].[MessageSensors](
[MessageId] [bigint] NOT NULL,
[DataSourceId] [int] NOT NULL,
[Value] [float] NOT NULL CONSTRAINT [DF__AnalogDat__Value__5812160E] DEFAULT ((0)),
CONSTRAINT [AnalogData_PK] PRIMARY KEY CLUSTERED
(
[MessageId] ASC,
[DataSourceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
My problem is that seeking by a time interval between two datetimes is really slow, and it gets slower still when I also select the message sensor data. Also, when I run the sp_BlitzIndex check from brentozar.com, it reports:
"Indexaphobia: High value missing index"
[aymax].[MessageSensors] (EQUALITY: [DataSourceId], [Value] INCLUDES: [MessageId] )
[aymax].[MessageSensors] EQUALITY: [Value] INCLUDES: [MessageId], [DataSourceId]
I believe that creating these two indexes would increase storage a lot and would also take too long to build. I need your advice on indexing both tables.
My current indexes:
1-
CREATE NONCLUSTERED INDEX [IX_gpstime_objectid] ON [aymax].[Message]
(
[GpsTime] ASC
)
INCLUDE ( [MessageId],
[ObjectId]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
2-
alter TABLE [aymax].[Message] ADD CONSTRAINT [Message_PK] PRIMARY KEY NONCLUSTERED
(
[MessageId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
3-
ALTER TABLE [aymax].[MessageSensors] ADD CONSTRAINT [AnalogData_PK] PRIMARY KEY CLUSTERED
(
[MessageId] ASC,
[DataSourceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
4-
CREATE NONCLUSTERED INDEX [MessageData_DataSourceId_IDX] ON [aymax].[MessageSensors]
(
[DataSourceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Any help, please: I need fast retrieval from Message and MessageSensors.
Update:
While investigating, I found that selecting the float Value column slows the query down dramatically, from 1 second to 3 minutes:
SELECT m.messageid,
m.objectid,
m.gpstime,
m.x,
m.y,
-- this is the slow part: replacing md.Value (a float) with md.MessageId makes the query return fast
md.Value ,
0
FROM aymax.[message] m WITH (nolock)
left JOIN aymax.MessageSensors md WITH (nolock)
ON m.messageid = md.messageid
AND md.datasourceid = 425732
WHERE m.objectid = 14099
AND m.gpstime BETWEEN '2017-04-01 19:46:18.607' AND '2017-04-10 19:05:18.607'
Possible solutions:
Filtered index (filter by date and do not index old data)
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes.
Clustered index on (GpsTime, MessageId), especially if you do not plan to add other indexes. This requires rebuilding the table.
Partitioning (see @Siyaul's comments).

Why XML column is greyed out?

Why is the XML column greyed out so that I cannot put anything in it? I can populate the other columns (ID_A and ID_B), but the Xml column does not let me add anything, not even "<element>".
I created the table like this:
USE [DBNAME]
GO
DROP TABLE [dbo].[TableName]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TableName](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ID_A] [int] NULL,
[ID_B] [int] NULL,
[Xml] [xml] NULL,
CONSTRAINT [PK_TableName] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY],
CONSTRAINT [IX_TableName] UNIQUE NONCLUSTERED
(
[ID_A] ASC,
[ID_B] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

Why is index scan used instead of index seek?

I created unique, clustered indices on two large tables that I have to join, but those indices are not used as evidenced by the query plan. When I force the use of those indices with a hint, an index scan is used and performance gets much worse. The unique key of one table is the foreign key of the second table, so this puzzles me. Here is the schema. The two tables are LOC and POL. LOC has 7 million odd rows, and POL has over 6 million rows.
CREATE TABLE [dbo].[LOC](
[acct_num] [char](30) NOT NULL,
[cntr_num] [char](30) NOT NULL,
[lob_cde] [char](2) NOT NULL,
[ste_locn_nme] [char](30) NOT NULL,
[buldg_num] [char](20) NOT NULL,
[prctr_cde] [char](3) NULL,
...more fields...
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IDX_All] ON [dbo].[LOC]
(
[acct_num] ASC,
[cntr_num] ASC,
[lob_cde] ASC,
[buldg_num] ASC,
[prctr_cde] ASC,
[spcl_cond_1_id] ASC,
[spcl_cond_2_id] ASC,
[spcl_cond_3_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE UNIQUE CLUSTERED INDEX [IDX_LOC_PKEY] ON [dbo].[LOC]
(
[acct_num] ASC,
[lob_cde] ASC,
[prctr_cde] ASC,
[cntr_num] ASC,
[ste_locn_nme] ASC,
[buldg_num] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
... more non-unique indices on LOC, each on single columns: buldg_num, prctr_cde, acct_num
CREATE TABLE [dbo].[POL](
[acct_num] [char](30) NOT NULL,
[cntr_num] [char](30) NOT NULL,
[lob_cde] [char](2) NOT NULL,
[prctr_cde] [char](3) NULL,
...more fields...
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IDX_All] ON [dbo].[POL]
(
[acct_num] ASC,
[cntr_num] ASC,
[lob_cde] ASC,
[prctr_cde] ASC,
[acct_nme] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE UNIQUE CLUSTERED INDEX [IDX_POL_PKEY] ON [dbo].[POL]
(
[acct_num] ASC,
[lob_cde] ASC,
[prctr_cde] ASC,
[cntr_num] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
... more non-unique indices on POL, each on single columns: cntr_num, prctr_cde
As you can see, the four fields of the POL table's unique key (acct_num, lob_cde, prctr_cde, cntr_num) are the first four columns of the LOC table's primary key. Here is one query I want to run, like many that will join these two tables:
select
[Easy matches] = COUNT(*)
FROM LOC INNER JOIN POL ON (
LOC.acct_num = POL.acct_num
AND LOC.lob_cde = POL.lob_cde
AND LOC.prctr_cde = POL.prctr_cde
AND LOC.cntr_num = POL.cntr_num)
Without hints, this likes to use the IDX_prctr_cde index from each table. The prctr_cde column is not very selective; there are only seven different values in the LOC or POL tables. If I hint that the query should use IDX_cntr_num index, I get good performance, since it is a highly selective column (over 6 million distinct values in each table). acct_num is almost as selective as cntr_num, also with over 6 million distinct values.
Why is a non-selective index used by default? Why is switching to using the unique, clustered index making the query run much slower? (10x, 20x or even 30x slower.)
Note: The hint I used was:
OPTION (
TABLE HINT(POL, INDEX (IDX_POL_PKEY)),
TABLE HINT(LOC, INDEX (IDX_LOC_PKEY))
)
Note: I am using SQL Server 2005 and SQL Server 2008.