I have a table with 5 indexes in SQL Server. I'm aware that indexes affect inserts, so I'd like to keep them to a minimum. I definitely need the first four indexes (as seen in the sample below).
However, I'm not quite sure if the last index (TimeSubmitted) is absolutely necessary - note that there is already a ClientId+TimeSubmitted index.
The only reason why it is there is to make purging of expired rows from the table more efficient - or at least this is what my intention is. The purging job will be scheduled to run once a day - most likely at night.
There could be hundreds of thousands of records in the table at any given time.
Stored proc to purge the table:
CREATE PROCEDURE uspPurgeMyTable
(
@ExpiryDate datetime
)
AS
BEGIN
DELETE FROM MyTable
WHERE TimeSubmitted < @ExpiryDate;
END
Table (non relevant columns omitted):
CREATE TABLE MyTable (
[ClientId] [char](36) NOT NULL,
[UserName] [nvarchar](256) NOT NULL,
[TimeSubmitted] [datetime] NOT NULL,
[ProviderId] [uniqueidentifier] NULL,
[RegionId] [int] NULL
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientArea] ON [dbo].[MyTable]
(
[ClientId] ASC,
[RegionId] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientPrinter] ON [dbo].[MyTable]
(
[ClientId] ASC,
[ProviderId] DESC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientTime] ON [dbo].[MyTable]
(
[ClientId] ASC,
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientUser] ON [dbo].[MyTable]
(
[ClientId] ASC,
[UserName] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_MyTable_TimeSubmitted] ON [dbo].[MyTable]
(
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
Yes, the IX_MyTable_TimeSubmitted index will be beneficial for the purge process you described. Since the purge filters on the TimeSubmitted column only and none of the other indexes has that as its leading column, the best SQL Server could do without it is an index scan.
As with any indexing, there are trade-offs between read and update performance. You should measure your read, insert, and purge performance with and without the additional index to see what provides the best response time for each situation. Batch processes such as the purge that run overnight may not be as time sensitive as inserting or reading in your particular scenario.
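One way to take the purge measurement suggested above is to run the delete inside a transaction that is rolled back, with I/O and timing statistics switched on, once with IX_MyTable_TimeSubmitted in place and once without it. This is only a sketch: the 30-day retention period is an assumption (the question does not state one), and it is best run against a test copy because the delete still takes locks until the rollback completes.

```sql
-- Assumption: rows older than 30 days count as expired.
DECLARE @ExpiryDate datetime = DATEADD(DAY, -30, GETDATE());

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

    -- Same statement as in uspPurgeMyTable
    DELETE FROM dbo.MyTable
    WHERE TimeSubmitted < @ExpiryDate;

ROLLBACK TRANSACTION;  -- undo the test delete so the run can be repeated

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The logical reads and elapsed time reported in the Messages output can then be compared between the two runs.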
If you are concerned about performance, you should consider a different approach: partitioning. A reasonable place to learn about partitioning is the documentation.
A partitioned table stores each partition as a separate physical unit, which can be placed on its own filegroup. This is generally invisible to users of the database. However, if you want to drop old data, you can just remove a whole partition (in SQL Server, by switching it out or truncating it).
This eliminates not only the need for the index but also the need for the delete. And removing a partition is much more efficient than deleting, because there is much less logging.
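As a rough sketch of what that could look like for this table: the object names, the daily granularity and the boundary dates below are all hypothetical, and the table is simplified to the columns shown in the question.

```sql
-- Hypothetical names and boundary dates; one partition per day.
CREATE PARTITION FUNCTION pfMyTableDaily (datetime)
AS RANGE RIGHT FOR VALUES ('20240101', '20240102', '20240103');

-- Keep every partition on PRIMARY for simplicity.
CREATE PARTITION SCHEME psMyTableDaily
AS PARTITION pfMyTableDaily ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.MyTablePartitioned (
    ClientId      char(36)         NOT NULL,
    UserName      nvarchar(256)    NOT NULL,
    TimeSubmitted datetime         NOT NULL,
    ProviderId    uniqueidentifier NULL,
    RegionId      int              NULL
) ON psMyTableDaily (TimeSubmitted);
GO

-- Partition 1 holds every row with TimeSubmitted before 2024-01-01.
-- TRUNCATE ... WITH (PARTITIONS ...) needs SQL Server 2016 or later and
-- requires all indexes on the table to be partition-aligned.
TRUNCATE TABLE dbo.MyTablePartitioned WITH (PARTITIONS (1));
```

On older versions the usual equivalent is ALTER TABLE ... SWITCH PARTITION into an empty staging table that is then truncated or dropped. It is also worth checking edition support, since table partitioning was an Enterprise-only feature in older releases.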
I have a table where anywhere between 1 and 5 million records come in as a batch, and then a bunch of stored procedures are run on them that update and delete records in the batch.
All of these stored procedures are using two fields for selectivity so they only run on the records in that batch.
Both of these fields are in a nonclustered index.
There are times when multiple batches run at the same time, and I am continuously getting deadlocks between batches, I assume due to lock escalation.
Trying to figure out if there is a way to solve this without a complete redesign to use a dedicated table for each batch. Is disabling page locks asking for more trouble?
Additional information:
Example of the table structure and index (the real table has a lot more columns than this):
CREATE TABLE [dbo].[TempImport](
[UID] [int] IDENTITY(1,1) NOT NULL,
[EID] [int] NULL,
[EXTID] [int] NULL,
[COL1] [varchar](50) NULL,
[COL2] [varchar](50) NULL,
CONSTRAINT [PK_TempImport] PRIMARY KEY CLUSTERED
(
[UID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_TempImport_Main] ON [dbo].[TempImport]
(
[EID] ASC,
[EXTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
And the types of queries in the stored procedures look something like this:
update TempImport set COL1 = 'foo' where EID = @EID and EXTID = @EXT and COL2 = 'bar'
And the last thing that happens when a batch completes is something like this:
delete from TempImport where EID = @EID and EXTID = @EXT
It is typically the delete and the updates in the stored procedures that are involved in the deadlock.
Please let me know if any other info would be useful.
Just some potential suggestions
Yeah, you could split them up into (say) batches of 2000, to stop the row locks escalating to table locks, but you'd have a gazillion running. Probably not a good solution.
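For reference, the chunked version of the final delete could be sketched like this. The parameter values are placeholders, and 2000 is simply the batch size mentioned above, chosen to stay below the point at which SQL Server typically attempts lock escalation.

```sql
-- Placeholder values; in the real procedure these would be its parameters.
DECLARE @EID int = 1,
        @EXT int = 1,
        @rows int = 1;

WHILE @rows > 0
BEGIN
    -- Delete at most 2000 rows of this batch per statement.
    DELETE TOP (2000)
    FROM dbo.TempImport
    WHERE EID = @EID
      AND EXTID = @EXT;

    -- Stop once a pass deletes nothing.
    SET @rows = @@ROWCOUNT;
END
```

Each pass is its own short transaction (assuming no outer transaction is open), so locks are released between passes at the cost of many more statements.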
You could modify the update processes (as per Brent Ozar's video re deadlocks I recommended in the comments) to update each table once and once only. As a suggestion, you could also try encapsulating them in transactions. These could remove deadlocks, but add blocking (where the second operation has to wait for the first to finish).
A structural method is to make a 'loading' or 'scratch' table which has IDs and relevant data to be operated on (you could see this as a queue). When calls to do updates come in, they simply insert their requests into that queue. Then you have an asynchronous process (that can only be called one at a time) that gets all outstanding data from the queue (and flags it as such), does the relevant processing, then cleans out the processed data from the scratch table.
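A very rough sketch of that queue idea follows. Everything in it is hypothetical: the queue table, its columns and the procedure name are invented for illustration, and sp_getapplock is just one way to enforce the "one at a time" rule, not something the suggestion above prescribes.

```sql
-- Hypothetical queue table; names and columns are illustrative only.
CREATE TABLE dbo.ImportQueue (
    QueueId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    EID     int NOT NULL,
    EXTID   int NOT NULL,
    Claimed bit NOT NULL DEFAULT (0)
);
GO

CREATE PROCEDURE dbo.uspProcessImportQueue
AS
BEGIN
    SET NOCOUNT ON;

    -- Allow only one worker at a time: give up immediately if another
    -- session already holds the application lock.
    DECLARE @lock int;
    EXEC @lock = sp_getapplock
         @Resource    = N'ImportQueueWorker',
         @LockMode    = N'Exclusive',
         @LockOwner   = N'Session',
         @LockTimeout = 0;
    IF @lock < 0 RETURN;

    -- Flag everything currently outstanding as claimed by this run.
    UPDATE dbo.ImportQueue SET Claimed = 1 WHERE Claimed = 0;

    -- ... do the updates/deletes on TempImport for the claimed rows here ...

    -- Clean out the processed requests.
    DELETE FROM dbo.ImportQueue WHERE Claimed = 1;

    EXEC sp_releaseapplock
         @Resource  = N'ImportQueueWorker',
         @LockOwner = N'Session';
END
```

Callers would only insert into dbo.ImportQueue. A production version would also need error handling so that the application lock is released if the processing step fails.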
Note that if you have other things accessing/using this table, then they could get blocked or deadlocked on this table too, and then your approach needs to be very careful.
I have the query below:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = @input
AND FK_SLA_Process = @input2
AND IsActive = 1
And this is my index for this SLA table.
CREATE INDEX IX_SLA_SLAName_FK_SLA_Process_IsActive ON dbo.SLA (SLAName, FK_SLA_Process, IsActive) INCLUDE (SLATimeInSeconds)
However, the SLAName column is unique so it has a unique constraint/index.
Is my created index an overkill? Do I still need it or will SQL Server use the index created on the unique column SLAName?
It would be "overkill" if your index were only on SLAName, but it also has FK_SLA_Process and IsActive as key columns, so queries that filter on those columns will benefit more from your index than from the unique one alone.
So for a query like this:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Both indexes will perform the same and there would be no point in yours. But for queries like:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
AND FK_SLA_Process = 'Some Value'
Or
SELECT SLATimeInSeconds
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Your index will be better than the unique one (2nd example is a covering index).
You should inspect which kinds of SELECT you run against this table and decide whether you need this index or not. Keep in mind that having many indexes might speed up selects but slows down inserts, updates and deletes.
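One way to see how the existing indexes are actually being used is the index usage DMV. This is a sketch, not a complete analysis: the counters are cumulative only since the last SQL Server restart, so they need to cover a representative period.

```sql
-- Seeks, scans, lookups and updates per index on dbo.SLA
-- for the current database.
SELECT i.name AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.SLA');
```

An index with many user_updates but few or no seeks and scans is a candidate for removal.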
Assuming you have a table declaration like this:
CREATE TABLE SLA
(
ID INT PRIMARY KEY,
SLAName VARCHAR(50) NOT NULL UNIQUE,
fk_SLA INT,
IsActive TINYINT
)
Under the hood we have two indexes:
CREATE TABLE [dbo].[SLA](
[ID] [int] NOT NULL,
[SLAName] [varchar](50) NOT NULL,
[fk_SLA] [int] NULL,
[IsActive] [tinyint] NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
So this query will have an index seek and has an optimal plan:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
Its query plan shows an index seek because we are searching on the UNIQUE NONCLUSTERED ([SLAName] ASC) index and don't use other columns in the WHERE clause.
But if you add extra parameters into WHERE:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
AND s.fk_SLA = 1
AND s.IsActive = 1
The execution plan will have an extra lookup.
A lookup happens when the index does not contain the necessary information. The query engine has to leave the UNIQUE NONCLUSTERED index structure to find the values of the fk_SLA and IsActive columns in the SLA table itself.
So your index is overkill as you have UNIQUE NONCLUSTERED index:
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
If the SLAName column is unique and has a unique constraint, any query that returns one or zero rows (all queries with a point search that includes the SLAName = 'SomeName' condition) will use the unique index and make at most one lookup in the base table.
Unless your queries have a range search like SLAName LIKE 'SomeName%', there is no need for a covering index, because an index seek plus one lookup costs almost the same as an index seek alone, and there is no need to waste space on and maintain another index for such a negligible performance gain.
Table Props already has a non-clustered index for column 'CreatedOn' but this index doesn't include certain other columns that are required in order to significantly improve the query performance of a frequently run query.
To fix this, is it best to:
1. create an additional non-clustered index with the included columns or
2. alter the existing index to add the other columns as included columns?
In addition:
- How will my decision affect the performance of other queries currently using the non-clustered index?
- If it is best to alter the existing index should it be dropped and re-created or altered in order to add the included columns?
A simplified version of the table is below along with the index in question:
CREATE TABLE dbo.Props(
PropID int NOT NULL,
Reference nchar(10) NULL,
CustomerID int NULL,
SecondCustomerID int NULL,
PropStatus smallint NOT NULL,
StatusLastChanged datetime NULL,
PropType smallint NULL,
CreatedOn datetime NULL,
CreatedBy int NULL,
CONSTRAINT PK_Props PRIMARY KEY CLUSTERED
(
PropID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
Current index:
CREATE NONCLUSTERED INDEX idx_CreatedOn ON dbo.Props
(
CreatedOn ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
All 5 of the columns required in the new or altered index are foreign key columns: a mixture of smallint and int, nullable and non-nullable.
In the example the columns to include are: CustomerID, SecondCustomerID, PropStatus, PropType and CreatedBy.
As always... It depends...
As a general rule, having redundant indexes is not desirable. So, in the absence of other information, you'd be better off adding the included columns, making it a covering index.
That said, the original index was likely built for another "high frequency" query, so now you have to determine whether or not the increased index page count is going to adversely affect the existing queries that use the index in its current state.
You'd also want to look at the expense of doing a key lookup in relation to the rest of the query. If the key lookup is only a minor part of the total cost, it's unlikely that the performance gains will offset the expense of maintaining the larger index.
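On the drop-and-recreate question: as far as I know, ALTER INDEX cannot change an index's key or included columns, so the definition has to be re-issued with CREATE INDEX. Using DROP_EXISTING = ON does that in one step instead of a separate DROP and CREATE. A sketch using the columns listed in the question:

```sql
-- Re-create idx_CreatedOn in place, adding the five included columns.
CREATE NONCLUSTERED INDEX idx_CreatedOn ON dbo.Props
(
    CreatedOn ASC
)
INCLUDE (CustomerID, SecondCustomerID, PropStatus, PropType, CreatedBy)
WITH (DROP_EXISTING = ON, FILLFACTOR = 90);
```

Existing queries that seek on CreatedOn can still use the index; the main difference for them is that each leaf row is wider, so the index occupies more pages.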
Should we use a flag for soft deletes, or a separate joiner table? Which is more efficient? Database is SQL Server.
Background Information
A while back we had a DB consultant come in and look at our database schema. When we soft delete a record, we update an IsDeleted flag on the appropriate table(s). It was suggested that instead of using a flag, we store the deleted records in a separate table and use a join, as that would be better. I've put that suggestion to the test but, at least on the surface, the extra table and join look to be more expensive than using a flag.
Initial Testing
I've set up this test.
Two tables, Example and DeletedExample. I added a nonclustered index on the IsDeleted column.
I did three tests, loading a million records with the following deleted/non deleted ratios:
Deleted/NonDeleted
50/50
10/90
1/99
[Results for the 50/50, 10/90 and 1/99 ratios: images not available]
Database Scripts, For Reference, Example, DeletedExample, and Index for Example.IsDeleted
CREATE TABLE [dbo].[Example](
[ID] [int] NOT NULL,
[Column1] [nvarchar](50) NULL,
[IsDeleted] [bit] NOT NULL,
CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Example] ADD CONSTRAINT [DF_Example_IsDeleted] DEFAULT ((0)) FOR [IsDeleted]
GO
CREATE TABLE [dbo].[DeletedExample](
[ID] [int] NOT NULL,
CONSTRAINT [PK_DeletedExample] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DeletedExample] WITH CHECK ADD CONSTRAINT [FK_DeletedExample_Example] FOREIGN KEY([ID])
REFERENCES [dbo].[Example] ([ID])
GO
ALTER TABLE [dbo].[DeletedExample] CHECK CONSTRAINT [FK_DeletedExample_Example]
GO
CREATE NONCLUSTERED INDEX [IX_IsDeleted] ON [dbo].[Example]
(
[IsDeleted] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
The numbers you have seem to indicate that my initial impression was correct: if your most common query against this database is to filter on IsDeleted = 0, then performance will be better with a simple bit flag, especially if you make wise use of indexes.
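One example of such index use, offered as an assumption rather than something tested in the question, is a filtered index that only contains the non-deleted rows. The index name and the choice of Column1 as the key are hypothetical.

```sql
-- Hypothetical filtered index: only rows with IsDeleted = 0 are stored,
-- so it stays small when a large share of the table is soft-deleted.
CREATE NONCLUSTERED INDEX IX_Example_Active
ON dbo.Example (Column1)
WHERE IsDeleted = 0;

-- A query whose predicate matches the filter can be answered from it.
SELECT ID, Column1
FROM dbo.Example
WHERE IsDeleted = 0
  AND Column1 = N'some value';
```

Filtered indexes are available from SQL Server 2008 onwards. Whether this beats the plain IX_IsDeleted index would need the same kind of measurement as in the question.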
If you often query for deleted and undeleted data separately, then you could see a performance gain by having a table for deleted items and another for undeleted items, with identical fields. But denormalizing your data like this is rarely a good idea, as it will most often cost you far more in code maintenance costs than it will gain you in performance increases.
I'm not an SQL expert, but in my opinion it all depends on how heavily the database is used. If the database is accessed by a large number of users and needs to be efficient, then using a separate isDeleted table will be good. A better option would be to use a flag during production time and, as part of daily/weekly/monthly maintenance, move all the soft-deleted records to the isDeleted table and clear them from the production table. A mixture of both options would be a good one.
Every day I import 2,000,000 rows from some text files using BULK INSERT into SQL Server 2008, and then I do some post-processing to update the records.
I have some indexes on the table so that the post-processing runs as fast as possible; in a normal situation, the post-processing script takes about 40 seconds to run.
But sometimes (I don't know when) the post-processing does not finish: in that situation it is still not done after an hour! After rebuilding the indexes, everything is fine and normal again.
What should I do to prevent this problem from happening?
Right now, I have a nightly job to rebuild all indexes. Why does the index fragmentation grow up to 90%?
Update:
Here is the table I import the text files into:
CREATE TABLE [dbo].[My_Transactions](
[My_TransactionId] [bigint] NOT NULL,
[FileId] [int] NOT NULL,
[RowNo] [int] NOT NULL,
[TransactionTypeId] [smallint] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionNumber] [dbo].[TransactionNumber] NOT NULL,
[CardNumber] [dbo].[CardNumber] NULL,
[AccountNumber] [dbo].[CardNumber] NULL,
[BankCardTypeId] [smallint] NOT NULL,
[AcqBankId] [smallint] NOT NULL,
[DeviceNumber] [dbo].[DeviceNumber] NOT NULL,
[Amount] [dbo].[Amount] NOT NULL,
[DeviceTypeId] [smallint] NOT NULL,
[TransactionFee] [dbo].[Amount] NOT NULL,
[AcqSwitchId] [tinyint] NOT NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_Jam_Transactions_8_1290487676__K1_K4_K12_K6_K11_5] ON [dbo].[Jam_Transactions]
(
[Jam_TransactionId] ASC,
[TransactionTypeId] ASC,
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_Jam_Transactions_8_1290487676__K12_K6_K11_K1_5] ON [dbo].[Jam_Transactions]
(
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC,
[Jam_TransactionId] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Jam_Transactions] ON [dbo].[Jam_Transactions]
(
[Jam_TransactionId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Have you tried just refreshing the statistics after such a large insert:
UPDATE STATISTICS my_table
My experience with large bulk inserts is that the statistics get all mangled up and need to be refreshed afterwards. It's also much faster than running an index rebuild or reorganize.
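Applied to the table from the question, that step could sit between the BULK INSERT and the post-processing. The FULLSCAN option here is an assumption on my part: it reads every row instead of a sample, which gives the most accurate statistics but takes longer.

```sql
-- Refresh all statistics on the import table after the bulk load.
UPDATE STATISTICS dbo.My_Transactions WITH FULLSCAN;
```

If a full scan is too slow for the load window, dropping the WITH FULLSCAN clause falls back to the default sampling.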
Another option is to look into padding the index. You likely have no fill factor set on your indexes, meaning that if your index is:
A, B, D, E, F
and you insert a value with a CardNumber of C, then your index will look like:
A, B, D, E, F, C
and hence be ~20% fragmented. If you instead specify a fill factor that leaves roughly 15% of each page free (FILLFACTOR = 85 in SQL Server), it would look roughly like:
A, B, D, _, E, F
(Note that the empty space is left roughly in the middle of the page rather than at the end.)
So when you insert the C value, there is free space close to where it belongs: D is moved over to make room and C takes its place, instead of C ending up out of order at the end.
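In SQL Server terms, a fill factor is applied when an index is created or rebuilt, and the number is the percentage of each leaf page that is filled, not the percentage left free. A sketch against one of the indexes from the question, with 85 as an arbitrary starting value to tune from:

```sql
-- Rebuild the index leaving roughly 15% free space on each leaf page.
ALTER INDEX [IX_Jam_Transactions] ON [dbo].[Jam_Transactions]
REBUILD WITH (FILLFACTOR = 85);
```

The free space is used up by later inserts, so the setting only helps if the index is rebuilt periodically. It also makes the index larger, which costs extra reads.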
Beyond that, are you sure that the fragmentation is actually the problem? As part of reindexing, the table is read and loaded entirely into memory (provided it fits), and thus any query you run on it afterwards will be very fast.
Instead of including this table in the nightly job, why don't you make index maintenance (on this table specifically) part of the nightly import job, between BULK INSERT and whatever 'post-processing' is?
We don't have enough information to know why the index fragmentation grows that quickly. Which index(es)? How many indexes are there? What is the order of the data in the file?
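To answer the "which index(es)" part, the physical stats DMV reports fragmentation per index. A sketch for the table in the question, to be run before and after the import:

```sql
-- Fragmentation and size of every index on the table.
-- 'LIMITED' is the cheapest scan mode.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Jam_Transactions'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

A row with index_type_desc of HEAP and a NULL index_name represents the table itself when it has no clustered index.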
You may also consider using the ORDER option in the BULK INSERT statement to change the way the data is inserted. It may make the load take longer but it should reduce the need to reorganize. Again depending on the order of the source data and the index(es) that become fragmented.
Finally, what is the impact of rebuilding/not rebuilding or reorganizing/not reorganizing the indexes? Have you tried both? Perhaps it makes the post-processing run quicker if you rebuild, but perhaps only a defragment is necessary. And while it may make the post-processing quicker, what about the queries that are run against the table later in the day? Have you done any metrics against those to see if they speed up or slow down depending on what you do at night?
Does your main table keep growing by 2 million rows per day or is there a lot of deleting taking place as well? Could you bulk insert into a temporary import table and do your processing prior to inserting into the main table? You can always use hints to force your queries to use certain indices:
SELECT *
FROM your_table_name WITH (INDEX(your_index_name))
WHERE your_column_name = 5
I would try taking the index offline before a mass row insertion and then bringing it back online afterwards. This is much faster than re-indexing or performing a drop and create of the index. The difference is that the index is still there and the data is still being stored, but the index is not used while it is "offline", until it is brought back "online". I have a 1.5 million row insertion process and had a problem with one of my nonclustered indexes fragmenting, which was causing poor performance. It went from 99% fragmentation to 0.14% using the MSSQL offline/online index option.
Code sample:
ALTER INDEX idx_a ON dbo.tbl_A
REBUILD WITH (ONLINE = OFF);
Toggle between OFF and ON and you are good to go.
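A caveat on the sample above: as I understand it, the ONLINE option of REBUILD only controls whether the table stays available while the index is being rebuilt; it does not by itself take the index out of use. If the goal is for the index not to be maintained during the load, the statements I know for that are DISABLE followed by a REBUILD. A sketch using the same placeholder names:

```sql
-- Stop maintaining the nonclustered index during the load.
-- Do not disable a clustered index: that makes the table inaccessible.
ALTER INDEX idx_a ON dbo.tbl_A DISABLE;

-- ... run the mass row insertion here ...

-- Rebuilding re-enables the index and leaves it freshly defragmented.
ALTER INDEX idx_a ON dbo.tbl_A REBUILD;
```

Queries cannot use the index while it is disabled, so this only suits a load window in which nothing else depends on it.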