I have a SQL Server query that performs poorly when paginating with OFFSET/FETCH. The early pages return very quickly, but later pages are extremely slow and are creating a bottleneck in our system. The query joins two temp tables (#A and #T) to a permanent table. Here is a pared-down version of the query:
Select
I.CustomerID,
I.InvoiceID,
I.ItemID,
TI1.TrackID as TrackID1,
TI1.ItemName as ItemName1,
TI1.TrackCatID as TrackCatID1,
TI1.CategoryName as CategoryName1,
TI2.TrackID as TrackID2,
TI2.ItemName as ItemName2,
TI2.TrackCatID as TrackCatID2,
TI2.CategoryName as CategoryName2,
A.AccountID,
A.Name as AccountName,
A.utimestamp as UpdateTimeStamp
FROM
#A A
Inner Join [dbo].[Item] I WITH(FORCESEEK)
On
A.CustomerID = I.CustomerID And
A.AccountID = I.AccountID
Left Join #T TI1 On
I.CustomerID = TI1.CustomerID And
I.TrackID1 = TI1.TrackID
Left Join #T TI2 On
I.CustomerID = TI2.CustomerID And
I.TrackID2 = TI2.TrackID
Order by
A.utimestamp
Offset 0 Rows Fetch Next 1000 Rows Only
Where the temp tables are defined as:
Create table #T (CustomerID uniqueidentifier, TrackCatID uniqueidentifier, TrackID uniqueidentifier, ItemName varchar(100), CategoryName varchar(100),PRIMARY KEY (CustomerID,TrackID))
Create table #A (CustomerID uniqueidentifier, AccountID uniqueidentifier, Name varchar(100), utimestamp timestamp, PRIMARY KEY (CustomerID, AccountID))
Regarding the DB table [dbo].[Item] it is defined as:
CREATE TABLE [dbo].[Item](
[Sequence] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [uniqueidentifier] NOT NULL,
[ItemID] [uniqueidentifier] NOT NULL,
[AccountID] [uniqueidentifier] NULL,
[TrackID1] [uniqueidentifier] NULL,
[TrackID2] [uniqueidentifier] NULL,
CONSTRAINT [PK_Item] PRIMARY KEY NONCLUSTERED
(
[CustomerID] ASC,
[ItemID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY],
CONSTRAINT [CX_Item] UNIQUE CLUSTERED
(
[CustomerID] ASC,
[Sequence] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
And it has a number of indexes, each on several columns:
Item Indexes
1: CustomerID, Sequence
2: CustomerID, AccountId, Sequence
3: CustomerID, TrackId1
4: CustomerID, TrackId2
5: CustomerID, ItemID
Is there something I'm missing that's causing the later paginated queries to be slow? Note: The 0 in "Offset 0 Rows" increases by 1000 for every page.
Also, I added an index to the temp table #A and didn't see any improvement in performance:
CREATE INDEX IDX_Timestamp ON #A(utimestamp)
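A note on why later pages get slower: with OFFSET n ROWS the server still has to produce and discard the first n rows of the sorted join before it can return the page, so the cost grows with the page number. One commonly used alternative is keyset (seek) pagination, where each page starts after the last row of the previous page. The sketch below is only an illustration under stated assumptions: @Rows is a hypothetical stand-in for the joined result, SortKey plays the role of A.utimestamp, and TieBreaker stands for a unique column such as I.Sequence, which is needed because several Item rows can share one utimestamp.

```sql
-- Hypothetical stand-in for the joined result; all names here are illustrative.
DECLARE @Rows TABLE (
    SortKey    int         NOT NULL,  -- plays the role of A.utimestamp
    TieBreaker int         NOT NULL,  -- plays the role of a unique column such as I.Sequence
    Payload    varchar(20) NOT NULL,
    PRIMARY KEY (SortKey, TieBreaker)
);

INSERT INTO @Rows (SortKey, TieBreaker, Payload)
VALUES (1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c'), (2, 2, 'd'), (3, 1, 'e');

DECLARE @PageSize int = 2;

-- Values taken from the last row of the previous page (here: the row 1, 2, 'b').
DECLARE @LastSortKey int = 1, @LastTieBreaker int = 2;

-- Seek past the previous page instead of counting and discarding rows.
SELECT TOP (@PageSize) SortKey, TieBreaker, Payload
FROM @Rows
WHERE SortKey > @LastSortKey
   OR (SortKey = @LastSortKey AND TieBreaker > @LastTieBreaker)
ORDER BY SortKey, TieBreaker;
```

The caller has to remember the sort values of the last row it received, so this suits "next page" navigation better than jumping to an arbitrary page number.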
I have the table below:
CREATE TABLE [dbo].[Client](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](150) NOT NULL,
[InternalSiteId] AS (isnull(CONVERT([int],[dbo].[GetCurrentTemporalValue]([Id],'Client_InternalSite')),(0))),
[BudgetingStatusId] AS (isnull(CONVERT([int],[dbo].[GetCurrentTemporalValue]([Id],'Client_BudgetingStatus')),(1))),
[BusinessUnitId] AS (isnull(CONVERT([int],[dbo].[GetCurrentTemporalValue]([Id],'Client_BusinessUnit')),(0))),
CONSTRAINT [PK_dbo.Client] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
which has 3 computed columns that use a scalar function:
CREATE FUNCTION [dbo].[GetCurrentTemporalValue]
(
@clientId INT,
@temporalType NVARCHAR(128)
)
RETURNS NVARCHAR(255)
AS
BEGIN
DECLARE @retVal INT
DECLARE @at DATETIME
SET @at = GETUTCDATE()
SELECT @retVal = CAST(Value AS INT) FROM dbo.Temporal
WHERE 1 = 1
AND ClientId = @clientId
AND TemporalType = @temporalType
AND ( ValidFrom <= @at OR ValidFrom IS NULL )
AND ( ValidTo >= @at OR ValidTo IS NULL)
RETURN @retVal
END
GO
and this function uses the table below:
CREATE TABLE [dbo].[Temporal](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ClientId] [int] NOT NULL,
[Value] [nvarchar](255) NULL,
[ValidFrom] [datetime2](7) NULL,
[ValidTo] [datetime2](7) NULL,
[TemporalType] [nvarchar](128) NOT NULL,
CONSTRAINT [PK_dbo.Temporal] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
The Client table has around 2,500 rows, but select * from [Client] takes about 14 seconds!
When I comment out these 3 computed columns, the query time drops back to under one second, so the scalar function seems to be causing the issue.
I tried simplifying the condition (leaving only 1 = 1), but it didn't change the speed.
What I want to achieve is:
historical values (kept in the Temporal table)
the current value exposed through Client.InternalSiteId, Client.BudgetingStatusId and Client.BusinessUnitId
I cannot simply make InternalSiteId a regular int field with a foreign key to, let's say, an InternalSiteHistoryTable, because the ValidFrom and ValidTo fields define the period in which a given value is valid, and ValidFrom can, for example, be set to a future date. That's why I need such a calculation in a function to find the current value.
What should I change to achieve the goals above while keeping data retrieval reasonably fast?
Use a view that joins to your Temporal table instead of embedding function calls in computed columns:
-- Table without those functions to slow things down
CREATE TABLE [dbo].[ClientName](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](150) NOT NULL,
CONSTRAINT [PK_dbo.Client] PRIMARY KEY CLUSTERED
(
[Id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)
ON [PRIMARY]
) ON [PRIMARY]
GO
-- Instead, use a view to do the same thing
CREATE VIEW [dbo].[Client] as
SELECT
ClientName.Id,
ClientName.Name,
isnull(CONVERT([int], Client_InternalSite.Value), 0) AS InternalSiteId,
isnull(CONVERT([int], Client_BudgetingStatus.Value), 1) AS BudgetingStatusId,
isnull(CONVERT([int], Client_BusinessUnit.Value), 0) AS BusinessUnitId
-- LEFT JOIN keeps clients that have no current Temporal row, so the isnull defaults apply
FROM ClientName LEFT JOIN
( SELECT Value, ClientId
FROM Temporal
WHERE TemporalType='Client_InternalSite'
AND ( ValidFrom <= GETUTCDATE() OR ValidFrom IS NULL )
AND ( ValidTo >= GETUTCDATE() OR ValidTo IS NULL)
) AS Client_InternalSite ON ClientName.ID = Client_InternalSite.ClientID
LEFT JOIN
( SELECT Value, ClientId
FROM Temporal
WHERE TemporalType='Client_BudgetingStatus'
AND ( ValidFrom <= GETUTCDATE() OR ValidFrom IS NULL )
AND ( ValidTo >= GETUTCDATE() OR ValidTo IS NULL)
) AS Client_BudgetingStatus ON ClientName.ID = Client_BudgetingStatus.ClientID
LEFT JOIN
( SELECT Value, ClientId
FROM Temporal
WHERE TemporalType='Client_BusinessUnit'
AND ( ValidFrom <= GETUTCDATE() OR ValidFrom IS NULL )
AND ( ValidTo >= GETUTCDATE() OR ValidTo IS NULL)
) AS Client_BusinessUnit ON ClientName.ID = Client_BusinessUnit.ClientID
GO
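I can't test this against your database, and I don't know what indexes you have on the Temporal table (the ClientId, TemporalType, ValidFrom and ValidTo columns matter here), but a view like this will typically run faster because the Temporal table is joined in a set-based way rather than the scalar function being executed three times for every row.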
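A supporting index may help the three lookups into Temporal. The sketch below is an assumption, not something tested against your data: the index name is hypothetical, and the key order (ClientId, TemporalType) simply mirrors the join and filter columns used above, with the remaining columns included so the lookups do not have to touch the clustered index.

```sql
-- Hypothetical index name; assumes dbo.Temporal exists as defined in the question.
CREATE NONCLUSTERED INDEX IX_Temporal_ClientId_TemporalType
    ON dbo.Temporal (ClientId, TemporalType)
    INCLUDE (Value, ValidFrom, ValidTo);
```

Whether it pays off depends on how many rows Temporal holds per client and type, so compare the execution plans before and after.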
I'm trying to join data from 3 tables in SQL Server and display the following in the result:
the alias of an entity
whether the entity is virtual
the last date (if known)
the value (if known)
I tried this:
select
sr.alias, c.virtual, max(d.date) date
from
App_reference sr
join
Sensor c on (c.id_capteur = sr.id_capteur)
left join
Sensor_data d on (c.id_capteur = d.id_capteur)
group by
d.id_capteur, sr.alias, c.virtual
order by
sr.alias
Here is the database schema:
CREATE TABLE [dbo].[App_reference]
(
[id_ref] [int] IDENTITY(1,1) NOT NULL,
[alias] [varchar](60) NOT NULL,
[id_capteur] [int] NOT NULL,
CONSTRAINT [PK_App_reference] PRIMARY KEY CLUSTERED
(
[id_ref] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Sensor]
(
[id_capteur] [int] IDENTITY(1,1) NOT NULL,
[name] [varchar](50) NOT NULL,
[virtual] [tinyint] NULL,
[unite] [varchar](5) NULL,
[id_type] [int] NOT NULL,
CONSTRAINT [PK_Sensor] PRIMARY KEY CLUSTERED
(
[id_capteur] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Sensor_data]
(
[id_entry] [int] IDENTITY(1,1) NOT NULL,
[id_capteur] [int] NOT NULL,
[value] [xml] NOT NULL,
[date] [datetime] NOT NULL,
CONSTRAINT [PK_Sensor_data] PRIMARY KEY CLUSTERED
(
[id_entry] ASC,
[id_capteur] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Assume that all columns named like "id_%" are linked by foreign keys.
The query above runs fine and returns:
alias               virtual  date
Place 1 (Physique)  0        2017-04-27 14:58:42.423
Place 2             1        NULL
Place 3             1        NULL
But I also tried to select the value, like this:
select
sr.alias, c.virtual, max(d.date) date, d.value
from
Citopia_test.dbo.Smartparking_reference sr
join
Citopia_test.dbo.Sensor c on (c.id_capteur = sr.id_capteur)
left join
Citopia_test.dbo.Sensor_data d on (c.id_capteur = d.id_capteur)
group by
d.id_capteur, sr.alias, c.virtual
order by
sr.alias
And I got this error:
Column 'Sensor_data.value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
So I tried several things, such as adding the column to the GROUP BY, but nothing changed.
You probably want the value from the record with the max date. Use ROW_NUMBER to get those records.
select alias, virtual, date, value
from
(
select
sr.alias, c.virtual, d.date, d.value,
row_number() over (partition by sr.alias order by d.date desc) as rn
from Citopia_test.dbo.Smartparking_reference sr
join Citopia_test.dbo.Sensor c on (c.id_capteur = sr.id_capteur)
left join Citopia_test.dbo.Sensor_data d on (c.id_capteur = d.id_capteur)
) numbered
where rn = 1
order by alias;
This gets you one row per sr.alias. If you want one row per sr.alias + c.virtual then change the partition by clause accordingly.
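As a minimal sketch of that variation, here is the same pattern partitioned by alias and virtual. The table variables and sample rows are made up for illustration (they only mimic the shape of the tables in the question), so the block can be run on its own.

```sql
-- Hypothetical sample data shaped like the tables in the question.
DECLARE @Ref    TABLE (alias varchar(60) NOT NULL, id_capteur int NOT NULL);
DECLARE @Sensor TABLE (id_capteur int NOT NULL, [virtual] tinyint NULL);
DECLARE @Data   TABLE (id_capteur int NOT NULL, [value] xml NOT NULL, [date] datetime NOT NULL);

INSERT INTO @Ref    VALUES ('Place 1', 1), ('Place 2', 2);
INSERT INTO @Sensor VALUES (1, 0), (2, 1);
INSERT INTO @Data   VALUES
    (1, '<v>10</v>', '20170426 09:00:00'),
    (1, '<v>12</v>', '20170427 14:58:42');

SELECT alias, [virtual], [date], [value]
FROM
(
    SELECT sr.alias, c.[virtual], d.[date], d.[value],
           -- One numbering sequence per alias + virtual combination, newest row first.
           ROW_NUMBER() OVER (PARTITION BY sr.alias, c.[virtual]
                              ORDER BY d.[date] DESC) AS rn
    FROM @Ref sr
    JOIN @Sensor c ON c.id_capteur = sr.id_capteur
    LEFT JOIN @Data d ON d.id_capteur = c.id_capteur
) numbered
WHERE rn = 1
ORDER BY alias;
```

Sensors without any data still appear, with NULL in the date and value columns, because of the left join.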
I need some help writing a SQL Server 2012 query that finds and marks orderIDs that have $0.00 payments due to reversal(s).
So far I have:
Select Distinct a.orderID, a.orderPaid,
(Select SUM((c1.linePrice + c1.lineShippingCost + c1.lineTaxCost + c1.lineOptionCost) * c1.lineQuantity)
From vwSelectOrderLineItems c1 Where c1.orderID = a.orderID) As OrderAmount,
(Select SUM(b1.payAmount) FROM vwSelectOrderPayments b1 Where b1.orderID = a.orderID) as Payment,
1 As IsReversal
From vwSelectOrders a
Left Outer Join vwSelectOrderPayments b On b.orderID = a.orderID
Where b.payValid = 1 AND a.orderPaid = 0
This returns some $0 payments on some orders. When I query the payment table with the orderID of these records, I can see that 2 payments were posted: (1) the original payment and (2) the reversal.
How can I flag the orders that have $0 payments?
Orders
CREATE TABLE [dbo].[TblOrders](
[orderID] [bigint] IDENTITY(1000,1) NOT NULL,
[orderPaid] [bit] NOT NULL,
[orderPaidOn] [datetime] NULL,
CONSTRAINT [PK_TblOrders] PRIMARY KEY CLUSTERED
(
[orderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[TblOrders] ADD CONSTRAINT [DF__TblOrders__order__1975C517] DEFAULT ((0)) FOR [orderPaid]
Order Line Items
CREATE TABLE [dbo].[TblOrderLineItems](
[lineID] [bigint] IDENTITY(1,1) NOT NULL,
[orderID] [bigint] NOT NULL,
[lineQuantity] [int] NOT NULL,
[linePrice] [money] NOT NULL,
[lineShippingCost] [money] NOT NULL,
[lineTaxCost] [money] NOT NULL,
[lineOptionCost] [money] NOT NULL,
CONSTRAINT [PK_TblOrderLineItems] PRIMARY KEY CLUSTERED
(
[lineID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[TblOrderLineItems] ADD CONSTRAINT [DF_TblOrderLineItems_lineShippingCost] DEFAULT ((0)) FOR [lineShippingCost]
GO
ALTER TABLE [dbo].[TblOrderLineItems] ADD CONSTRAINT [DF_TblOrderLineItems_lineTaxCost] DEFAULT ((0)) FOR [lineTaxCost]
GO
ALTER TABLE [dbo].[TblOrderLineItems] ADD CONSTRAINT [DF_TblOrderLineItems_lineOptionCost] DEFAULT ((0)) FOR [lineOptionCost]
GO
Order Payments
CREATE TABLE [dbo].[TblOrderPayments](
[paymentID] [bigint] IDENTITY(1,1) NOT NULL,
[orderID] [bigint] NOT NULL,
[payAmount] [money] NOT NULL,
[payPosted] [datetime] NOT NULL,
[payValid] [bit] NOT NULL,
CONSTRAINT [PK_TblOrderPayments] PRIMARY KEY CLUSTERED
(
[paymentID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[TblOrderPayments] ADD CONSTRAINT [DF_TblOrderPayments_payValid] DEFAULT ((0)) FOR [payValid]
GO
Views
CREATE VIEW [dbo].[vwSelectOrderLineItems] AS
SELECT orderID, linePrice, lineShippingCost, lineTaxCost, lineOptionCost, lineQuantity
FROM [dbo].[TblOrderLineItems]
GO
CREATE VIEW [dbo].[vwSelectOrderPayments] AS
SELECT paymentID, orderID, payAmount, payValid
FROM dbo.TblOrderPayments
GO
CREATE VIEW [dbo].[vwSelectOrders] AS
SELECT orderID , orderPaid
FROM dbo.TblOrders
GO
Note
I cannot change the table structure
SELECT DISTINCT a.orderID,
a.orderPaid,
c.OrderAmount,
d.Payment
FROM vwSelectOrders AS a
INNER JOIN ( SELECT SUM((linePrice + lineShippingCost + lineTaxCost + lineOptionCost) * lineQuantity) AS OrderAmount, orderID
FROM vwSelectOrderLineItems GROUP BY orderID) AS c ON c.orderID = a.orderID
-- A condition on an aggregate belongs in HAVING, not WHERE
INNER JOIN (SELECT SUM(payAmount) AS Payment, orderID FROM vwSelectOrderPayments GROUP BY orderID HAVING ISNULL(SUM(payAmount),0) > 0) AS d ON d.orderID = a.orderID
LEFT OUTER JOIN vwSelectOrderPayments b ON b.orderID = a.orderID
WHERE b.payValid = 1 AND a.orderPaid = 0
This is a better query because it does not use correlated subqueries. A correlated subquery is one that references a row of the outer query. That is often not optimal, because the subquery may be executed once for every row the outer query returns. Once you give us the table definitions, we can probably fix the overall data returned by your query.
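For the flagging itself, one possible approach is to group the payments per order and keep only the orders whose payments net to zero. This is a sketch under assumptions that the question does not confirm: that a reversal is stored as a negative payAmount, and that both the original payment and the reversal have payValid = 1. The table variable and its rows are hypothetical sample data.

```sql
-- Hypothetical sample data shaped like vwSelectOrderPayments.
DECLARE @Payments TABLE (
    paymentID bigint NOT NULL,
    orderID   bigint NOT NULL,
    payAmount money  NOT NULL,
    payValid  bit    NOT NULL
);

INSERT INTO @Payments (paymentID, orderID, payAmount, payValid)
VALUES (1, 1000,  25.00, 1),   -- original payment
       (2, 1000, -25.00, 1),   -- reversal (assumed to be stored as a negative amount)
       (3, 1001,  40.00, 1);   -- ordinary payment, never reversed

-- Orders with more than one valid payment that net to $0 are flagged as reversals.
SELECT orderID,
       SUM(payAmount)  AS Payment,
       COUNT(*)        AS PaymentCount,
       CAST(1 AS bit)  AS IsReversal
FROM @Payments
WHERE payValid = 1
GROUP BY orderID
HAVING SUM(payAmount) = 0
   AND COUNT(*) > 1;
```

With the sample rows above, only order 1000 qualifies. Against the real schema, the same grouped query could be joined to vwSelectOrders on orderID.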
I have the following table:
CREATE TABLE [Cache].[Marker](
[ID] [int] NOT NULL,
[SubID] [varchar](15) NOT NULL,
[ReadTime] [datetime] NOT NULL,
[EquipmentID] [varchar](25) NULL,
[Sequence] [int] NULL
) ON [PRIMARY]
With the following clustered index:
CREATE UNIQUE CLUSTERED INDEX [IX_Marker_EquipmentID_ReadTime_SubID] ON [Cache].[Marker]
(
[EquipmentID] ASC,
[ReadTime] ASC,
[SubID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
And this query:
Declare @EquipmentId nvarchar(50)
Set @EquipmentId = 'KLM52B-MARKER'
SELECT TOP 1
cr.C44DistId,
cr.C473RightLotId
From Cache.Marker m
INNER JOIN Cache.vwCoaterRecipe AS cr ON cr.MarkerId = m.ID
Where m.EquipmentID = @EquipmentId And m.ReadTime >= '3/1/2013'
ORDER BY m.Id desc
Here is the query plan being generated:
My question is this. Why isn't the clustered index on the Cache.Marker table being used with a seek instead of a scan on another index? Furthermore, SSMS query analyzer is suggesting I add an index on Marker.ReadTime with ID and EquipmentID columns included.
There are roughly 1M rows in the Cache.Marker table.
How many unique equipment IDs do you have? The optimizer has probably decided (perhaps mistakenly) that the date is a better first lookup. You can force it to use your index with the WITH (INDEX(...)) table hint, and FORCESEEK can help as well. I highly recommend this, because index behavior then stays predictable as the database grows.
SELECT TOP 1
cr.C44DistId,
cr.C473RightLotId
From Cache.Marker m
WITH ( INDEX( IX_Marker_EquipmentID_ReadTime_SubID ), FORCESEEK )
INNER JOIN Cache.vwCoaterRecipe AS cr
ON cr.MarkerId = m.ID
Where m.EquipmentID = @EquipmentId And m.ReadTime >= '3/1/2013'
ORDER BY m.Id desc
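Separately from the hints, it may be worth checking the variable's data type. The question declares it as nvarchar(50), while the EquipmentID column is varchar(25). Because nvarchar has higher data type precedence, SQL Server converts the column side of the comparison, which can make a seek on EquipmentID less attractive, or impossible, depending on the collation. A minimal sketch, assuming that mismatch is a contributing factor (the plan is not shown here, so this is unconfirmed):

```sql
-- Assumes Cache.Marker and Cache.vwCoaterRecipe exist as described in the question.
-- The variable is declared as varchar(25) to match the EquipmentID column,
-- so the comparison needs no implicit conversion on the column side.
DECLARE @EquipmentId varchar(25);
SET @EquipmentId = 'KLM52B-MARKER';

SELECT TOP 1
    cr.C44DistId,
    cr.C473RightLotId
FROM Cache.Marker AS m
INNER JOIN Cache.vwCoaterRecipe AS cr
    ON cr.MarkerId = m.ID
WHERE m.EquipmentID = @EquipmentId
  AND m.ReadTime >= '20130301'   -- language-independent date literal
ORDER BY m.ID DESC;
```

If the plan still scans an index ordered by ID, the TOP 1 ... ORDER BY m.ID DESC may be what steers the optimizer there, and the hint from the answer remains an option.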
table optin_channel_1 (for each 'channel' there's a dedicated table)
CREATE TABLE [dbo].[optin_channel_1](
[key_id] [bigint] NOT NULL,
[valid_to] [datetime] NOT NULL,
[valid_from] [datetime] NOT NULL,
[key_type_id] [int] NOT NULL,
[optin_flag] [tinyint] NOT NULL,
[source_proc_id] [int] NOT NULL,
[date_inserted] [datetime] NOT NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [ix_id] ON [dbo].[optin_channel_1]
(
[key_type_id] ASC,
[key_id] ASC,
[valid_to] ASC,
[valid_from] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
table profile_conns
CREATE TABLE [dbo].[profile_conns](
[profile_key_id] [bigint] NOT NULL,
[valid_to] [datetime] NOT NULL,
[valid_from] [datetime] NOT NULL,
[conn_key_id] [bigint] NOT NULL,
[conn_key_type_id] [int] NOT NULL,
[conn_type_id] [int] NOT NULL,
[source_proc_id] [int] NOT NULL,
[date_inserted] [datetime] NOT NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [ix_id] ON [dbo].[profile_conns]
(
[profile_key_id] ASC,
[conn_key_type_id] ASC,
[conn_key_id] ASC,
[valid_to] ASC,
[valid_from] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
table lu_channel_conns
CREATE TABLE [dbo].[lu_channel_conns](
[channel_id] [int] NOT NULL,
[conn_type_id] [int] NOT NULL,
CONSTRAINT [PK_lu_channel_conns] PRIMARY KEY CLUSTERED
(
[channel_id] ASC,
[conn_type_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
table lu_conn_type
CREATE TABLE [dbo].[lu_conn_type](
[conn_type_id] [int] NOT NULL,
[default_key_type_id] [int] NOT NULL,
[master_key_type_id] [int] NOT NULL,
[date_inserted] [datetime] NOT NULL,
CONSTRAINT [PK_lu_conns] PRIMARY KEY CLUSTERED
(
[conn_type_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
view v_source_proc_id_by_group_id
SELECT DISTINCT x.source_proc_id, x.source_proc_group_id
FROM lu_source_proc x INNER JOIN lu_source_proc_group y ON x.source_proc_group_id = y.group_id
The following dynamic SQL statement is executed:
SET @sql_str='SELECT @ret=MAX(o.optin_flag)
FROM optin_channel_'+CAST(@channel_id AS NVARCHAR(100))+' o
INNER HASH JOIN dbo.v_source_proc_id_by_group_id y ON o.source_proc_id=y.source_proc_id AND y.source_proc_group_id=@source_proc_group_id
INNER HASH JOIN profile_conns z ON z.profile_key_id=cast(@profile_key_id AS NVARCHAR(100)) AND z.conn_key_type_id=o.key_type_id AND z.conn_key_id=o.[key_id] AND z.valid_to=''01.01.3000''
INNER HASH JOIN lu_channel_conns x ON x.channel_id=@channel_id AND z.conn_type_id=x.conn_type_id
INNER HASH JOIN lu_conn_type ct ON ct.conn_type_id=x.conn_type_id AND ct.default_key_type_id=o.key_type_id'
SET @param='@channel_id INT, @profile_key_id INT, @source_proc_group_id INT, @ret NVARCHAR(400) OUTPUT'
EXEC sp_executesql @sql_str,@param,@channel_id,@profile_key_id,@source_proc_group_id,@ret OUTPUT
For example, this produces:
SELECT @ret=MAX(o.optin_flag)
FROM optin_channel_1 o
INNER HASH JOIN dbo.v_source_proc_id_by_group_id y
ON o.source_proc_id=y.source_proc_id
AND y.source_proc_group_id=5
INNER HASH JOIN profile_conns z
ON z.profile_key_id=1
AND z.conn_key_type_id=o.key_type_id
AND z.conn_key_id=o.[key_id]
AND z.valid_to='01.01.3000'
INNER HASH JOIN lu_channel_conns x
ON x.channel_id=1
AND z.conn_type_id=x.conn_type_id
INNER HASH JOIN lu_conn_type ct
ON ct.conn_type_id=x.conn_type_id
AND ct.default_key_type_id=o.key_type_id
These tables are used for an opt-in database. optin_flag can be 0 or 1. With the last statement I want to get 1 as the optin_flag from optin_channel_1 for the given channel_id=1, for the user with profile_key_id=1, when the opt-in was inserted into the database by a process belonging to source_proc_group_id=5. I hope this is enough to understand what's going on.
Is this the best way to use the clustered indexes? Or would it be better to remove profile_key_id from the index on profile_conns and put z.profile_key_id=1 in a WHERE clause?
Maybe there's a much better way to optimize this SELECT (changes to the database schema are not possible, only changes to indexes and to the statement).
Without knowing the size of the tables and the kind of data stored in them, it is difficult to gauge.
Assuming optin_channel_1 and profile_conns both hold a lot of data, I would try the following:
Clustered index on optin_channel_1 (key_id) or (key_type_id), depending on which field has the most distinct values (since you don't have a covering index).
Clustered index on profile_conns (conn_key_id) or (conn_key_type_id), matching what you chose for optin_channel_1.
etc.
Basically, if profile_conns does not hold much data, I would put the clustered index on the most selective "filter" field (I suspect profile_key_id). If the table holds a lot of data, I would aim for a hash/merge join and match its clustered index with the clustered index of the optin_channel_1 table.
I would also rewrite the query as follows:
SELECT @ret = MAX(o.optin_flag)
FROM optin_channel_1 o
JOIN dbo.v_source_proc_id_by_group_id y
ON o.source_proc_id = y.source_proc_id
JOIN profile_conns z
ON z.conn_key_type_id = o.key_type_id
AND z.conn_key_id = o.[key_id]
JOIN lu_channel_conns x
ON z.conn_type_id = x.conn_type_id
JOIN lu_conn_type ct
ON ct.conn_type_id = x.conn_type_id
AND ct.default_key_type_id=o.key_type_id
WHERE y.source_proc_group_id = 5
AND z.profile_key_id = 1
AND x.channel_id = 1
AND z.valid_to = '01.01.3000'
I changed the query this way because:
Putting the filter conditions in the WHERE clause shows you which fields are relevant when aiming for a hash/merge join.
Join hints are rarely a good idea. It is very hard to beat the query optimizer at choosing the best plan, and a bad plan usually indicates an issue with your indexes or statistics.
So, in summary:
small table joined to big table => go for nested loops, and focus the clustered index on the "filter" field in the small table and on the join field in the big table.
big table joined to big table => go for a hash/merge join, and put the clustered index on the matching field on both sides.
multi-field indexes are usually only a good idea when they are "covering", meaning all the fields you query are in the index (either as key columns or via the INCLUDE clause).
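To illustrate the last point, here is what a covering nonclustered index with an INCLUDE clause could look like for the profile_conns part of the query. This is only a sketch of the syntax, not a tuned recommendation: the index name is hypothetical, and the choice of key columns assumes that the filters on profile_key_id and valid_to are the selective ones.

```sql
-- Hypothetical index name; assumes dbo.profile_conns exists as defined in the question.
-- Key columns: the filter fields. INCLUDE columns: the remaining fields the query reads,
-- so the index alone can answer the profile_conns part of the join.
CREATE NONCLUSTERED INDEX ix_profile_conns_profile_valid_to
    ON dbo.profile_conns (profile_key_id, valid_to)
    INCLUDE (conn_key_type_id, conn_key_id, conn_type_id);
```

Included columns are stored only at the leaf level of the index, so they make it covering without widening the key that is used for seeking and sorting.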