Soft Delete - Use IsDeleted flag or separate joiner table? - indexing

Should we use a flag for soft deletes, or a separate joiner table? Which is more efficient? Database is SQL Server.
Background Information
A while back we had a DB consultant come in and look at our database schema. When we soft delete a record, we would update an IsDeleted flag on the appropriate table(s). It was suggested that instead of using a flag, we store the deleted records in a separate table and use a join, as that would be better. I've put that suggestion to the test, but at least on the surface, the extra table and join look to be more expensive than using a flag.
Initial Testing
I've set up this test.
Two tables, Example and DeletedExample. I added a nonclustered index on the IsDeleted column.
I did three tests, loading a million records with the following deleted/non deleted ratios:
Deleted/NonDeleted
50/50
10/90
1/99
Results - 50/50
Results - 10/90
Results - 1/99
Database scripts, for reference: Example, DeletedExample, and the index on Example.IsDeleted
CREATE TABLE [dbo].[Example](
[ID] [int] NOT NULL,
[Column1] [nvarchar](50) NULL,
[IsDeleted] [bit] NOT NULL,
CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Example] ADD CONSTRAINT [DF_Example_IsDeleted] DEFAULT ((0)) FOR [IsDeleted]
GO
CREATE TABLE [dbo].[DeletedExample](
[ID] [int] NOT NULL,
CONSTRAINT [PK_DeletedExample] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DeletedExample] WITH CHECK ADD CONSTRAINT [FK_DeletedExample_Example] FOREIGN KEY([ID])
REFERENCES [dbo].[Example] ([ID])
GO
ALTER TABLE [dbo].[DeletedExample] CHECK CONSTRAINT [FK_DeletedExample_Example]
GO
CREATE NONCLUSTERED INDEX [IX_IsDeleted] ON [dbo].[Example]
(
[IsDeleted] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

The numbers you have seem to indicate that my initial impression was correct: if your most common query against this database is to filter on IsDeleted = 0, then performance will be better with a simple bit flag, especially if you make wise use of indexes.
If you often query for deleted and undeleted data separately, then you could see a performance gain by having a table for deleted items and another for undeleted items, with identical fields. But denormalizing your data like this is rarely a good idea, as it will most often cost you far more in code maintenance than it will gain you in performance.
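If those common queries really do filter on IsDeleted = 0, a filtered index may serve you better than the plain index on the bit column; a minimal sketch against the Example table above (the index name is my own invention):

-- A smaller index containing only live rows; a query such as
-- SELECT ID, Column1 FROM dbo.Example WHERE IsDeleted = 0
-- can be satisfied entirely from it, without touching deleted rows.
CREATE NONCLUSTERED INDEX IX_Example_NotDeleted
ON dbo.Example (ID)
INCLUDE (Column1)
WHERE IsDeleted = 0;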

I'm not a SQL expert, but in my opinion it all depends on how heavily the database is used. If the database is accessed by a large number of users and needs to be efficient, then a separate table for deleted records is the better choice. An even better option would be to use a flag during production and, as part of daily/weekly/monthly maintenance, move all the soft-deleted records into the deleted-records table and clear them out of the production table. A mixture of both options would be a good one.
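For that maintenance step, a single DELETE with an OUTPUT clause can move the soft-deleted rows atomically; a sketch, assuming a hypothetical dbo.ExampleArchive table with matching columns and no triggers or foreign keys that would block the move:

-- Move soft-deleted rows into the archive and remove them from the
-- live table in one statement; OUTPUT ... INTO inserts each deleted
-- row into dbo.ExampleArchive (a hypothetical table) as it goes.
DELETE FROM dbo.Example
OUTPUT DELETED.ID, DELETED.Column1, DELETED.IsDeleted
INTO dbo.ExampleArchive (ID, Column1, IsDeleted)
WHERE IsDeleted = 1;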

Related

Why is my filtered index not being used for one statement?

I have the following table (an excerpt follows w/o other columns):
USE [opg-systems-dev]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Orders]
(
[SyncChannelEngineOrder] [bit] NOT NULL,
[IsSyncing] [bit] NOT NULL,
CONSTRAINT [PK_Orders]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Orders]
ADD DEFAULT (CONVERT([bit], (0))) FOR [IsSyncing]
GO
-- -----
CREATE NONCLUSTERED INDEX [IX_Orders_SyncChannelEngineOrder]
ON [dbo].[Orders] ([SyncChannelEngineOrder] ASC)
WHERE ([SyncChannelEngineOrder] <> (0))
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_Orders_IsSyncing]
ON [dbo].[Orders] ([IsSyncing] ASC)
WHERE ([IsSyncing] <> (0))
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
I'm trying to use a filtered index here - but it seems when I'm querying that the index for SyncChannelEngineOrder is not being used for some reason.
The setup is supposed to be the same for both columns.
The query is requesting all columns with a SELECT *. The indexes only contain one column (ignoring the clustered key). Because the rest of the column data is not in the index, the query optimizer calculates a lower cost for simply scanning all of the rows in the base table.
If SQL Server chose to use the existing indexes, it would first scan those rows, and then it would perform a bookmark lookup to get the rest of the columns from the table. If there are very few rows in the filtered index, and the table is very large, the SQL Server query optimizer could theoretically calculate a low enough cost to use the filtered index to limit the number of rows. Otherwise, bookmark lookups are expensive on a large number of rows. I am guessing the number of rows in the index is over this threshold.
SQL Server will add the clustered key to a non-clustered index to facilitate the bookmark lookup. If the query were to request only the clustered key and the index filter columns, the indexes would cover the query, and I expect the query optimizer would choose to use the filtered indexes.
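For example, a narrow query like the following should be covered by the filtered index and use it (assuming Id is the clustered key, as in the table definition above):

-- Only the clustered key and the filter column are requested, so
-- the filtered index alone can satisfy the query with no bookmark
-- lookup back into the base table.
SELECT Id, SyncChannelEngineOrder
FROM dbo.Orders
WHERE SyncChannelEngineOrder <> 0;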

Is this index on a table necessary?

I have a table with 5 indexes in SQL Server. I'm aware of the fact that indexes affect inserts, so I'd like to keep them to a minimum. I definitely need the first four indexes (as seen in the sample below).
However, I'm not quite sure if the last index (TimeSubmitted) is absolutely necessary - note that there is already a ClientId+TimeSubmitted index.
The only reason why it is there is to make purging of expired rows from the table more efficient - or at least this is what my intention is. The purging job will be scheduled to run once a day - most likely at night.
There could be hundreds of thousands of records in the table at any given time.
Stored proc to purge the table:
CREATE PROCEDURE uspPurgeMyTable
(
    @ExpiryDate datetime
)
AS
BEGIN
    DELETE FROM MyTable
    WHERE TimeSubmitted < @ExpiryDate;
END
Table (non relevant columns omitted):
CREATE TABLE MyTable (
[ClientId] [char](36) NOT NULL,
[UserName] [nvarchar](256) NOT NULL,
[TimeSubmitted] [datetime] NOT NULL,
[ProviderId] [uniqueidentifier] NULL,
[RegionId] [int] NULL
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientArea] ON [dbo].[MyTable]
(
[ClientId] ASC,
[RegionId] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientPrinter] ON [dbo].[MyTable]
(
[ClientId] ASC,
[ProviderId] DESC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientTime] ON [dbo].[MyTable]
(
[ClientId] ASC,
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_ClientUser] ON [dbo].[MyTable]
(
[ClientId] ASC,
[UserName] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_MyTable_TimeSubmitted] ON [dbo].[MyTable]
(
[TimeSubmitted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
Yes, the IX_MyTable_TimeSubmitted index will be beneficial for the purge process you described. Since the purge is based on the TimeSubmitted column only and none of the other indexes start with that, the best SQL Server would be able to do is use them in an index scan.
As with any indexing, there are trade-offs between read and update performance. You should measure your read, insert, and purge performance with and without the additional index to see what provides the best response time for each situation. Batch processes such as the purge that run overnight may not be as time sensitive as inserting or reading in your particular scenario.
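Independent of the extra index, it may also help to purge in batches so that each transaction stays short and the log stays small; a sketch of the same procedure with batching (the batch size of 5000 is arbitrary):

ALTER PROCEDURE uspPurgeMyTable
(
    @ExpiryDate datetime
)
AS
BEGIN
    -- Delete in chunks; loop until nothing is left to delete.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000)
        FROM MyTable
        WHERE TimeSubmitted < @ExpiryDate;

        IF @@ROWCOUNT = 0
            BREAK;
    END
END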
If you are concerned about performance, you could take a different approach: partitioning. A reasonable place to learn about partitioning is the documentation.
A partitioned table stores each partition on its own filegroup (possibly a separate set of files). These are invisible to users of the database, in general. However, if you want to drop old data, you can simply remove a partition.
This not only eliminates the need for the index, it also eliminates the need for the DELETE. And removing a partition is much more efficient than deleting rows, because far less is written to the transaction log.
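For illustration only, a rough sketch of what this could look like here, assuming MyTable were rebuilt on a monthly partition scheme over TimeSubmitted (all names and boundary dates are placeholders; note SQL Server has no literal DROP PARTITION, so old data is switched out instead):

-- Monthly boundaries; a maintenance job would add new ones over time.
CREATE PARTITION FUNCTION pfMyTableByMonth (datetime)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

CREATE PARTITION SCHEME psMyTableByMonth
AS PARTITION pfMyTableByMonth ALL TO ([PRIMARY]);

-- To purge the oldest month: switch that partition into an empty
-- staging table of identical structure on the same filegroup, then
-- truncate the staging table. Both steps are metadata-only, so they
-- are far cheaper than a row-by-row DELETE.
ALTER TABLE MyTable SWITCH PARTITION 1 TO MyTableStaging;
TRUNCATE TABLE MyTableStaging;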

Include columns in an existing non-clustered index or add a new non-clustered index

Table Props already has a non-clustered index for column 'CreatedOn' but this index doesn't include certain other columns that are required in order to significantly improve the query performance of a frequently run query.
To fix this, is it best to:
1. create an additional non-clustered index with the included columns, or
2. alter the existing index to add the other columns as included columns?
In addition:
- How will my decision affect the performance of other queries currently using the non-clustered index?
- If it is best to alter the existing index should it be dropped and re-created or altered in order to add the included columns?
A simplified version of the table is below along with the index in question:
CREATE TABLE dbo.Props(
PropID int NOT NULL,
Reference nchar(10) NULL,
CustomerID int NULL,
SecondCustomerID int NULL,
PropStatus smallint NOT NULL,
StatusLastChanged datetime NULL,
PropType smallint NULL,
CreatedOn datetime NULL,
CreatedBy int NULL
CONSTRAINT PK_Props PRIMARY KEY CLUSTERED
(
PropID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
Current index:
CREATE NONCLUSTERED INDEX idx_CreatedOn ON dbo.Props
(
CreatedOn ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
All 5 of the columns required in the new or altered index are foreign key columns - a mixture of smallint and int, nullable and non-nullable.
In the example the columns to include are: CustomerID, SecondCustomerID, PropStatus, PropType and CreatedBy.
As always... It depends...
As a general rule having redundant indexes is not desirable. So, in the absence of other information, you'd be better off adding the included columns, making it a covering index.
That said, the original index was likely built for another "high frequency" query... So now you have to determine whether or not the increased index page count is going to adversely affect the existing queries that use the index in its current state.
You'd also want to look at the expense of doing a key lookup in relation to the rest of the query. If the key lookup is only a minor part of the total cost, it's unlikely that the performance gains will offset the expense of maintaining the larger index.
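If you do decide to widen the existing index, a single CREATE INDEX with DROP_EXISTING = ON rebuilds it in place in one step; a sketch using the columns listed above:

-- The five columns are added at the leaf level only (INCLUDE), so
-- the index key stays narrow and existing seeks on CreatedOn are
-- unaffected, while the frequent query becomes covered.
CREATE NONCLUSTERED INDEX idx_CreatedOn ON dbo.Props
(
    CreatedOn ASC
)
INCLUDE (CustomerID, SecondCustomerID, PropStatus, PropType, CreatedBy)
WITH (DROP_EXISTING = ON, FILLFACTOR = 90)
ON [PRIMARY];
GO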

SQLSERVER Identity Column values Jumping by millions

I have a table with an int identity column, and at times it skips IDs by the thousands. Searches suggest it is normal for SQL Server to skip by 1000 or 1001, but mine increases by 20,000 or more on occasion, and last time it jumped by 95,216,000.
I'm unable to find a reason why this is happening; I checked SQL Server for crash logs and any other suspicious events, but no luck.
We have replication on the table - is that related?
Create Table Script is like..
CREATE TABLE [dbo].[Table](
[CId] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
.
.
.
CONSTRAINT [PK_Table] PRIMARY KEY CLUSTERED
(
[CId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
Did you check the values returned by this query?
SELECT IDENT_SEED(TABLE_NAME) AS Seed,
       IDENT_INCR(TABLE_NAME) AS Increment,
       IDENT_CURRENT(TABLE_NAME) AS Current_Identity,
       TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE OBJECTPROPERTY(OBJECT_ID(TABLE_NAME), 'TableHasIdentity') = 1
  AND TABLE_TYPE = 'BASE TABLE'
Also, did you by any chance truncate the table?
Please refer to this link for the possible reasons for the gaps:
http://sqlity.net/en/792/the-gap-in-the-identity-value-sequence/
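If a jump has burned through a large part of the int range, DBCC CHECKIDENT can report and, where safe, wind back the counter. The reseed value below is purely illustrative and is only safe if no existing CId exceeds it:

-- Report the current identity value without changing anything.
DBCC CHECKIDENT ('dbo.[Table]', NORESEED);

-- Wind the counter back so the next insert gets 1000001; only do
-- this after confirming no row has a CId above the new seed.
DBCC CHECKIDENT ('dbo.[Table]', RESEED, 1000000);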

create primary key on existing table with data

As part of a migration project, we have imported data from a JDE iSeries DB2 database. An SSIS package was created to create the destination tables and import data. The import went successfully.
Now comes the problem - the customer wants Primary Keys created in the destination DB (SQL 2008 R2). The problem table in this case is one that has 104 columns and 7.5 million rows of data. The PK required for this table is composite, with 7 columns.
We are considering this :
BEGIN TRANSACTION
GO
ALTER TABLE [dbo].[F0911] ADD CONSTRAINT [F0911_PK] PRIMARY KEY CLUSTERED
(
[GLDCT] ASC,
[GLDOC] ASC,
[GLKCO] ASC,
[GLDGJ] ASC,
[GLJELN] ASC,
[GLLT] ASC,
[GLEXTL] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
COMMIT
or this:
-- Rename existing table (pass unquoted names: brackets in the new
-- name would literally become part of the table name)
EXEC sp_rename 'F0911', 'F0911_old'
GO
-- Create new table
SELECT * INTO F0911 FROM F0911_old WHERE 1=0
GO
--Create PK constraints
ALTER TABLE [dbo].[F0911] ADD CONSTRAINT [F0911_PK] PRIMARY KEY CLUSTERED
(
[GLDCT] ASC,
[GLDOC] ASC,
[GLKCO] ASC,
[GLDGJ] ASC,
[GLJELN] ASC,
[GLLT] ASC,
[GLEXTL] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
--Insert data into new tables
INSERT INTO F0911
SELECT * FROM F0911_old
GO
-- Drop old tables
DROP TABLE F0911_old
GO
Which would be the more efficient approach, performance-wise? I have a gut feeling that both are the same, and that the first approach implicitly does what the second one does explicitly. Is this understanding correct?
Please note that all these columns already exist in the table and we cannot modify the table definition.
They're the same. The effect of creating a clustered index is to arrange the pages, which will happen in both cases. For the non-clustered indexes, it will help to disable them before the load and rebuild them afterwards (rebuilding re-enables an index).
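For reference, disabling and rebuilding a non-clustered index around the load would look like this (the index name is a placeholder):

-- Before the bulk INSERT ... SELECT:
ALTER INDEX IX_F0911_SomeColumn ON dbo.F0911 DISABLE;

-- After the load; REBUILD re-enables the index:
ALTER INDEX IX_F0911_SomeColumn ON dbo.F0911 REBUILD;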
I think the first approach is right, but I don't understand the reason for the BEGIN TRANSACTION and COMMIT. I don't think the transaction is necessary, because you are not modifying the table's data; transactions are used when we have to lock data that is being modified in real time, so that the old data is not read.