Clustered index trouble - sql

In our production system (SQL Server 2008 / R2) there is a table in which generated documents are stored.
The documents have a reference (varchar) and a sequence_nr (int). The document may be generated multiple times and each iteration gets saved in this table incrementing the sequence number. Additionally each record has a data column (varbinary) and a timestamp as well as a user tag.
The only reasons to query this table are auditing later on and determining the next sequence number during inserts.
The primary key for the table is clustered over the reference and sequence_nr columns.
As you can probably guess, the data in the table does not grow in key order, since a document can be generated again at a later time.
I realized this after inserts in the table started timing out.
The inserts are performed with a stored procedure. The stored procedure determines the current max sequence_nr for the given reference and inserts the new row with the next sequence_nr.
I am fairly sure a poor choice of clustered index is causing the timeout problems, since records will be inserted for already existing references, only with a different sequence_nr and thus may end up anywhere in the record collection, but most likely not at the end.
On to my question: would it be better to go for a non-clustered index as primary key or would it be better to introduce an identity column, make it a clustered primary key and keep an index for the combination of reference and sequence_nr?
Note that for the time being (and as far as we can foresee) there is no need to query this table intensively, except to determine the next sequence_nr.
Edit in answer to questions:
Tbh, I'm not sure about the timeout in the production environment. I do know that new documents get added in parallel running processes.
Table:
CREATE TABLE [dbo].[tbl_document] (
[reference] VARCHAR(50) NOT NULL,
[sequence_nr] INT NOT NULL,
[creation_date] DATETIME2 NOT NULL,
[creation_user] NVARCHAR (50) NOT NULL,
[document_data] VARBINARY(MAX) NOT NULL
);
Primary Key:
ALTER TABLE [dbo].[tbl_document]
ADD CONSTRAINT [PK_tbl_document] PRIMARY KEY CLUSTERED ([reference] ASC, [sequence_nr] ASC)
WITH (ALLOW_PAGE_LOCKS = ON, ALLOW_ROW_LOCKS = ON, PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF);
Stored procedure:
CREATE PROCEDURE [dbo].[usp_save_document]
    @reference NVARCHAR(50),
    @sequence_nr INT OUTPUT,
    @creation_date DATETIME2,
    @creation_user NVARCHAR(50),
    @document_data VARBINARY(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @current_sequence_nr INT

    SELECT @current_sequence_nr = MAX(sequence_nr)
    FROM [dbo].[tbl_document]
    WHERE [reference] = @reference

    IF @current_sequence_nr IS NULL
    BEGIN
        SELECT @sequence_nr = 1
    END
    ELSE
    BEGIN
        SELECT @sequence_nr = @current_sequence_nr + 1
    END

    INSERT INTO [dbo].[tbl_document]
        ([reference],
         [sequence_nr],
         [creation_date],
         [creation_user],
         [document_data])
    VALUES (@reference,
            @sequence_nr,
            @creation_date,
            @creation_user,
            @document_data)
END
Hope that helps.

I would go for making the PK nonclustered, since:
keeping a b-tree balanced when the key contains a varchar makes each leaf entry much bigger;
from what you say, you aren't scanning this table for many rows at a time.

Since a clustered index physically orders the records of the table to match the index order, it is mainly useful when you want to read several consecutive records in that order, because then whole records can be read with a sequential read from disk.
If you are only using data that is present in the index, there is no gain in making it clustered, because a nonclustered index is itself kept in order, separate from the data.
So for your specific case a non-clustered index is the right way to go. Inserts won't need to reorder the data (only the index), and finding a new sequence_nr can be fulfilled by looking at the index alone.
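A sketch of the second option from the question, assuming the table names given above (the identity column and index names are illustrative): an identity column becomes the clustered primary key, and a unique nonclustered index keeps (reference, sequence_nr) unique while serving the MAX(sequence_nr) lookup:

```sql
-- Illustrative sketch: surrogate identity key clustered, business key nonclustered.
ALTER TABLE [dbo].[tbl_document] DROP CONSTRAINT [PK_tbl_document];

ALTER TABLE [dbo].[tbl_document] ADD [document_id] INT IDENTITY(1,1) NOT NULL;

ALTER TABLE [dbo].[tbl_document]
    ADD CONSTRAINT [PK_tbl_document] PRIMARY KEY CLUSTERED ([document_id]);

-- Enforces the business key and covers the MAX(sequence_nr) lookup.
CREATE UNIQUE NONCLUSTERED INDEX [UX_tbl_document_reference_sequence_nr]
    ON [dbo].[tbl_document] ([reference] ASC, [sequence_nr] ASC);
```

With this layout, inserts always append at the end of the clustered index, and the MAX(sequence_nr) query becomes a seek on the narrow nonclustered index.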

Related

SQL Server create primary key constraint duplicate key error

I have been experiencing some strange behaviour with one of my SQL commands taken from one of our stored procedures.
This command follows the below order of execution:
1) Drop table
2) Select * into table name from live server
3) Alter table to apply PK - this step fails once out of 4 daily executions
My SQL statement:
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[inf].[tblBase_MyTable]') AND type IN (N'U'))
    DROP TABLE [inf].[tblBase_MyTable]

SELECT * INTO [inf].[tblBase_MyTable]
FROM LiveServer.KMS_ALLOCATION WITH (NOLOCK)

ALTER TABLE [inf].[tblBase_MyTable] ADD
    CONSTRAINT [PK_KMS_ALLOCATION] PRIMARY KEY NONCLUSTERED ([ID] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

GRANT SELECT ON [inf].[tblBase_MyTable] TO ourGroup
This is very strange considering the table is dropped, and I thought the indexes / keys would also be dropped. However I get this error at the same time every day. Any advice would be very much appreciated.
Error:
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'inf.tblBase_MyTable' and the index name 'PK_KMS_ALLOCATION'.
Duplicate keys in the [inf].[tblBase_MyTable] table are actually possible thanks to the WITH (NOLOCK) hint, which allows "dirty reads". Have a look at this blog post, which describes it in detail: SQL Server NOLOCK Hint & other poor ideas:
What many people think NOLOCK is doing
Most people think the NOLOCK hint just reads rows & doesn’t have to
wait till others have committed their updates or selects. If someone
is updating, that is OK. If they’ve changed a value then 99.999% of
the time they will commit, so it’s OK to read it before they commit.
If they haven’t changed the record yet then it saves me waiting, its
like my transaction happened before theirs did.
The Problem
The issue is that transactions do more than just update the row. Often
they require an index to be updated OR they run out of space on the
data page. This may require new pages to be allocated & existing rows
on that page to be moved, called a PageSplit. It is possible for your
select to completely miss a number of rows &/or count other rows
twice.
Well... you might have to repeat creating the new table and filling it until the check-query from @DarkoMartinovic no longer returns duplicates. Only then can you continue to add the PK. But this solution might cause heavy load on your live system, and you have no guarantee of a 1:1 copy of the data either.
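The check-query itself is not shown in the thread; a typical duplicate check of this kind (table and column names taken from the question) might look like:

```sql
-- Find IDs that appear more than once after the SELECT ... INTO copy.
SELECT ID, COUNT(*) AS cnt
FROM [inf].[tblBase_MyTable]
GROUP BY ID
HAVING COUNT(*) > 1;
```

If this returns no rows, adding the nonclustered primary key on ID should succeed.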
Having reviewed various helpful comments here, I have decided against (for now) implementing SNAPSHOT isolation as this interface does not make use of a proper staging environment.
To move to this would mean creating a staging area, setting that database to READ COMMITTED SNAPSHOT isolation, and rebuilding the entire interface.
To that end and on the basis of saving development time, we have opted for ensuring that any ghost reads where dupes could be brought across from the source are handled before applying the PK.
This is by no means an ideal solution in terms of performance on the target server but will provide some headroom for now and certainly remove the previous error.
SQL approach below:
DECLARE @ALLOCTABLE TABLE
    (SEQ INT, ID NVARCHAR(1000), CLASSID NVARCHAR(1000), [VERSION] NVARCHAR(25),
     [TYPE] NVARCHAR(100), VERSIONSEQUENCE NVARCHAR(100), VERSIONSEQUENCE_TO NVARCHAR(100),
     BRANCHID NVARCHAR(100), ISDELETED INT, RESOURCE_CLASS NVARCHAR(25),
     RESOURCE_ID NVARCHAR(100), WARD_ID NVARCHAR(100), ISCOMPLETE INT, TASK_ID NVARCHAR(100));
------- ALLOCATION
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[inf].[tblBase_MyTable]') AND type IN (N'U'))
    DROP TABLE [inf].[tblBase_MyTable]

SELECT * INTO [inf].[tblBase_MyTable]
FROM LiveServer.KMS_ALLOCATION WITH (NOLOCK)

INSERT INTO @ALLOCTABLE
SELECT *
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ISCOMPLETE DESC) SEQ, AL.*
      FROM [inf].[tblBase_MyTable] AL
     ) DUPS
WHERE SEQ > 1

DELETE FROM [inf].[tblBase_MyTable]
WHERE ID IN (SELECT ID FROM @ALLOCTABLE)
  AND ISCOMPLETE = 0
ALTER TABLE [inf].[tblBase_MyTable] ADD CONSTRAINT
[PK_KMS_ALLOCATION] PRIMARY KEY NONCLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GRANT SELECT ON [inf].[tblBase_MyTable] TO OurGroup
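For reference, the snapshot-based alternative that was ruled out would amount to something like this on the source or staging database (the database name here is illustrative), after which the WITH (NOLOCK) hints could be dropped:

```sql
-- Readers see a consistent, committed snapshot instead of dirty pages.
ALTER DATABASE StagingDB SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
```

This trades the duplicate/missing-row anomalies of NOLOCK for some tempdb version-store overhead.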

SQL type IGNORE_DUP_KEY

CREATE TYPE [dbo].[IdList] AS TABLE ([Id] [int] NULL)
GO
How can I insert the same value multiple times into a table of this type?
I guess it is something like IGNORE_DUP_KEY, but I can't seem to get it to work.
If there is no key or index on the column (as there isn't, in the statement you've given) then there already is no restriction on inserting the same value multiple times in a table.
DECLARE #i IdList
INSERT #i VALUES (1), (1), (1)
will work just fine. If you want to have a unique index with the IGNORE_DUP_KEY option so inserts will be discarded if the value is already there, rather than producing a constraint violation, you can do so by including a unique index with that option in the declaration:
CREATE TYPE [dbo].[IdList] AS TABLE (
[Id] [int] NULL,
INDEX IX_IdList_Id UNIQUE(ID) WITH (IGNORE_DUP_KEY = ON)
);
Or with a primary key (for non-nullable columns):
CREATE TYPE [dbo].[IdList] AS TABLE ([Id] [int] PRIMARY KEY WITH (IGNORE_DUP_KEY = ON));
Be careful with this, because silently discarding duplicate values can be a really good way to mask essential problems in your processing. SQL Server does produce the informational message "Duplicate key was ignored", but that message is itself easy to ignore (and gives no details on which keys were discarded).
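With the unique-index version of the type shown above, a duplicate insert is silently dropped rather than raising an error (a small sketch):

```sql
DECLARE @i dbo.IdList;
INSERT @i VALUES (1), (1), (2);   -- emits "Duplicate key was ignored"
SELECT COUNT(*) AS cnt FROM @i;   -- 2 rows survive, not 3
```

Without IGNORE_DUP_KEY = ON, the same INSERT would fail with a unique-constraint violation and insert nothing.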
Jeroen Mostert told me just to remove the key; I did, and it worked.

Sql Server create table queries

I am using Sql Server Migration assistant to migrate my DB from MySql to SQL Server and in the process learning Sql Server.
The following is the create table syntax for the autogenerated schema.
CREATE TABLE [dbo].[TABLE1] (
[COLUMN1] BIGINT IDENTITY (131556, 1) NOT NULL,
[COLUMN2] INT CONSTRAINT [DF__TABLE_1__PRD_I__24E777C3] DEFAULT ((0)) NULL,
[COLUMN3] INT CONSTRAINT [DF__TABLE1__PROMO__2AA05119] DEFAULT ((0)) NULL,
CONSTRAINT [PK_TABLE1] PRIMARY KEY CLUSTERED ([COLUMN1] ASC)
);
GO
CREATE NONCLUSTERED INDEX [COLUMN3]
ON [dbo].[TABLE1]([COLUMN3] ASC);
GO
EXECUTE sp_addextendedproperty @name = N'MS_SSMA_SOURCE', @value = N'TABLE1',
    @level0type = N'SCHEMA', @level0name = N'dbo', @level1type = N'TABLE',
    @level1name = N'TABLE1';
I am trying to understand and clean up the schema. Can someone please help with the following (naive) questions?
Why should the primary key (COLUMN1) specified as a PRIMARY KEY CLUSTERED?
In the original MySql table, COLUMN3 is indexed. Is a NONCLUSTERED INDEX the equivalent for Sql Server? What is the meaning of
CREATE NONCLUSTERED INDEX [COLUMN3]
ON [dbo].[TABLE1]([COLUMN3] ASC);
I did not understand the following:
EXECUTE sp_addextendedproperty @name = N'MS_SSMA_SOURCE', @value = N'TABLE1',
    @level0type = N'SCHEMA', @level0name = N'dbo', @level1type = N'TABLE',
    @level1name = N'TABLE1';
Can someone explain what it does?
Is the above create table syntax the minimal syntax to achieve what it does?
I have 131555 rows in my MySql table. Should I be specifying IDENTITY (131556, 1) to start auto increment of key from 131556 after I migrate data?
1.Why should the primary key (COLUMN1) specified as a "PRIMARY KEY CLUSTERED"?
It is generally best for every SQL Server table to have a clustered index, and only one clustered index is allowed per table, because the b-tree leaf nodes of the clustered index are the actual data pages. The index supporting the primary key is often the best candidate, but you can make the PK nonclustered and choose a different index as the clustered one if that's advantageous for your particular situation. For example, if your queries most often do range searches on COLUMN3 instead of selecting or joining by COLUMN1, the COLUMN3 index might be a better choice for clustering, along with a NONCLUSTERED primary key.
2. In the original MySql table, COLUMN3 is indexed. Is a NONCLUSTERED INDEX the equivalent for Sql Server? What is the meaning of
A non-clustered index is also a b-tree index, allowing rows to be located by the index key more efficiently than a table scan.
3.I did not understand the following:
EXECUTE sp_addextendedproperty @name = N'MS_SSMA_SOURCE', @value =
N'TABLE1', @level0type = N'SCHEMA', @level0name = N'dbo', @level1type
= N'TABLE', @level1name = N'TABLE1';
Can someone explain what it does?
SQL Server has an extended property feature that allows you to attach meta-data to database objects. In this case, SSMA added an extended property to indicate the table was created by the tool.
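You can read the property back with fn_listextendedproperty (a sketch using the names from the generated script):

```sql
-- List the MS_SSMA_SOURCE property that SSMA attached to dbo.TABLE1.
SELECT objtype, objname, name, value
FROM sys.fn_listextendedproperty(N'MS_SSMA_SOURCE',
                                 N'SCHEMA', N'dbo',
                                 N'TABLE',  N'TABLE1',
                                 NULL, NULL);
```

Extended properties have no effect on query behavior; they are purely metadata, so you can safely drop them during cleanup if you don't need the provenance information.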
4.Is the above create table syntax the minimal syntax to achieve what it does?
No. For example, one could omit the constraint names and SQL Server would generate a name automatically. However, the best practice is to explicitly name constraints to facilitate subsequent DDL changes and so that the purpose is self-documenting. Personally, I'd name the default constraints like DF_TABLENAME_COLUMNNAME.
5.I have 131555 rows in my MySql table. Should I be specifying IDENTITY (131556, 1) to start auto increment of key from 131556 after
I migrate data?
If you were to create the table with IDENTITY(1,1) and then insert rows with SET IDENTITY_INSERT ON, SQL Server would automatically adjust the next identity value according to the highest value inserted. I don't know much about SSMA, but it looks like SSMA already did that for you.
1 and 2. You should probably read this description of the differences between a clustered and non-clustered index: http://msdn.microsoft.com/en-au/library/ms190457.aspx. In short there can be only one clustered index on a table and it defines the sort order for data in the table. You can have many non-clustered indexes on a table.
SQL Server allows you to add extended properties to objects - see here for details: http://technet.microsoft.com/en-us/library/ms190243%28v=sql.105%29.aspx. Basically they are for storing metadata about the object - a description for the table for example, or an input mask for a column.
Here's the full syntax for creating tables: http://msdn.microsoft.com/en-AU/library/ms174979.aspx. I note that your example is creating the table and adding some default constraints. You could rewrite this as
CREATE TABLE [dbo].[TABLE1]
(
    [COLUMN1] BIGINT IDENTITY (131556, 1) NOT NULL,
    [COLUMN2] INT NULL DEFAULT (0),
    [COLUMN3] INT NULL DEFAULT (0),
    CONSTRAINT [PK_TABLE1] PRIMARY KEY CLUSTERED ([COLUMN1] ASC)
);
It really depends on whether you want to retain your existing key values, which I would assume you do. If so you should insert the data with
SET IDENTITY_INSERT yourTable ON
Read about identity columns here: http://msdn.microsoft.com/en-us/library/ms186775.aspx
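A sketch of loading migrated rows while keeping the original key values (table and column names from the question; the inserted values are just placeholders):

```sql
SET IDENTITY_INSERT [dbo].[TABLE1] ON;

-- Explicit key values from the source system; the column list is mandatory
-- when IDENTITY_INSERT is ON.
INSERT INTO [dbo].[TABLE1] ([COLUMN1], [COLUMN2], [COLUMN3])
VALUES (1, 0, 0);

SET IDENTITY_INSERT [dbo].[TABLE1] OFF;
```

Only one table per session can have IDENTITY_INSERT ON at a time, so turn it off as soon as the load finishes.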

SQL Index Update with Covering Columns

I am creating an index on a table and I want to include a covering column: messageText nvarchar(1024)
After insertion, the messageText is never updated, so it's an ideal candidate to include in a covering index to speed up lookups.
But what happens if I update other columns in same index?
Will the entire row in the index need reallocating or will just that data from the updated column be updated in the index?
Simple Example
Imaging the following table:
CREATE TABLE [Messages](
[messageID] [int] IDENTITY(1,1) NOT NULL,
[mbrIDTo] [int] NOT NULL,
[isRead] [bit] NOT NULL,
[messageText] [nvarchar](1024) NOT NULL
)
And the following Index:
CREATE NONCLUSTERED INDEX [IX_messages] ON [Messages] ( [mbrIDTo] ASC, [messageID] ASC )
INCLUDE ( [isRead], [messageText])
When we update the table:
UPDATE Messages
SET isRead = 1
WHERE (mbrIDTo = 6546)
The query plan shows that the index IX_messages is utilized and will also be updated, because the column isRead is part of the index.
Therefore does including large text fields (such as messageText in the above) as part of a covering column in an index, impact performance when other values, in that same index, are updated?
When a row is updated in SQL Server, the entire row is deleted and a new row with the updated values is inserted. Therefore, even if the messageText field is not changing, it will still have to be re-written to disk.
Here is a blog post from Paul Randal with a good example: http://www.sqlskills.com/blogs/paul/do-changes-to-index-keys-really-do-in-place-updates/
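One way to gauge the cost on your own system (a sketch, not from the thread) is to compare the I/O that SQL Server reports for the UPDATE when IX_messages does and does not INCLUDE the wide messageText column:

```sql
-- Run once with the index including messageText, once without,
-- and compare the logical reads/writes reported in the Messages tab.
SET STATISTICS IO ON;

UPDATE Messages
SET isRead = 1
WHERE mbrIDTo = 6546;

SET STATISTICS IO OFF;
```

A noticeably higher page count with messageText included would indicate the wide column is being rewritten along with the rest of the index row.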

Increasing performance on a logging table in SQL Server 2005

I have a "history" table where I log each request into a Web Handler on our web site. Here is the table definition:
/****** Object: Table [dbo].[HistoryRequest] Script Date: 10/09/2009 17:18:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HistoryRequest](
[HistoryRequestID] [uniqueidentifier] NOT NULL,
[CampaignID] [int] NOT NULL,
[UrlReferrer] [nvarchar](512) NOT NULL,
[UserAgent] [nvarchar](512) NOT NULL,
[UserHostAddress] [nvarchar](15) NOT NULL,
[UserHostName] [nvarchar](512) NOT NULL,
[HttpBrowserCapabilities] [xml] NOT NULL,
[Created] [datetime] NOT NULL,
[CreatedBy] [nvarchar](100) NOT NULL,
[Updated] [datetime] NULL,
[UpdatedBy] [nvarchar](100) NULL,
CONSTRAINT [PK_HistoryRequest] PRIMARY KEY CLUSTERED
(
[HistoryRequestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[HistoryRequest] WITH CHECK ADD CONSTRAINT [FK_HistoryRequest_Campaign] FOREIGN KEY([CampaignID])
REFERENCES [dbo].[Campaign] ([CampaignId])
GO
ALTER TABLE [dbo].[HistoryRequest] CHECK CONSTRAINT [FK_HistoryRequest_Campaign]
GO
37 seconds for 1050 rows on this statement:
SELECT *
FROM HistoryRequest AS hr
WHERE Created > '10/9/2009'
ORDER BY Created DESC
Does anyone have any suggestions for speeding this up? I have a clustered index on the PK and a regular index on the Created column. I tried a unique index and it barfed, complaining there is a duplicate entry somewhere, which can be expected.
Any insights are welcome!
You are requesting all columns (*) over a non-covering index (Created). On a large data set you are guaranteed to hit the index tipping point, where a clustered index scan is more efficient than a nonclustered index range seek plus bookmark lookups.
Do you need * always? If yes, and if the typical access pattern is like this, then you must organize the table accordingly and make Created the leftmost clustered key.
If not, then consider changing your query to a covered query, e.g. select only HistoryRequestID and Created, which are covered by the nonclustered index. If more fields are needed, add them as included columns to the nonclustered index, but take into account that this will add extra storage space and I/O log write time.
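A sketch of that covering-index option (the included columns here are just examples of "more fields"; pick the ones your query actually selects):

```sql
-- Descending key order matches the ORDER BY Created DESC in the query.
CREATE NONCLUSTERED INDEX IX_HistoryRequest_Created
    ON dbo.HistoryRequest (Created DESC)
    INCLUDE (CampaignID, UserHostAddress);
```

A query that touches only these columns can then be answered from the index alone, with no bookmark lookups into the clustered index.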
Hey, I've seen some odd behavior when pulling XML columns in large sets. Try putting your index on Created back, then specify the columns in your select statement; but omit the XML. See how that affects the return time for results.
For a log table, you probably don't need a uniqueidentifier column. You're not likely to query on it either, so it's not a good candidate for a clustered index. Your sample query is on "Created", yet there's no index on it. If you query frequently on ranges of "Created" values then it would be a good candidate for clustering even though it's not necessarily unique.
OTOH, the foreign key suggests frequent querying by Campaign, in which case having the clustering done by that column could make sense, and would also probably do a better job of scattering the inserted keys in the indexes - both the surrogate key and the timestamp would add records in sequential order, which is net more work over time for insertions because the node sectors are filled less randomly.
If it's just a log table, why does it have update audit columns? It would normally be write-only.
Rebuild indexes. Use the WITH (NOLOCK) hint after table names where appropriate; this applies if you want to run long(ish) running queries against tables that are heavily used in a live environment (such as a log table). It basically means your query might miss some of the very latest records, but you also aren't holding a lock open on the table, which creates additional overhead.
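For example, the hint applied to the query from the question (a sketch; note the dirty-read caveats that come with NOLOCK):

```sql
SELECT HistoryRequestID, Created
FROM dbo.HistoryRequest WITH (NOLOCK)   -- dirty reads: rows may be missed or counted twice
WHERE Created > '20091009'
ORDER BY Created DESC;
```

The unambiguous 'yyyymmdd' date literal is also safer than '10/9/2009', which is interpreted differently depending on the session's language settings.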