Composite Keys SQL server - sql

I have created a joining table for many-to-many relationship.
The table only has 2 cols in it, ticketid and groupid
typical data would be
groupid ticketid
20 56
20 87
20 96
24 13
24 87
25 5
My question is when creating the composite key should I have ticketid followed by groupid
CONSTRAINT [PK_ticketgroup] PRIMARY KEY CLUSTERED
(
[ticketid] ASC,
[groupid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Or the other way, groupid followed by ticketid
CONSTRAINT [PK_ticketgroup] PRIMARY KEY CLUSTERED
(
[groupid] ASC,
[ticketid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Would searching the index be quicker in option 1 as the ticketid's have more chance of being unique then the groupids and they would be at the start of the composite key? Or is this negligible?

The difference would most likely be negligible.
It is however recommended for SQL Server that the most selective column be placed fist. If a column with low selectivity is placed first, the Optimizer may determine that your index is not very selective and will choose to ignore it. See this sqlserverpedia.com Wiki Article for more information.

I would actually create two indexes. Given that ticket IDs are more likely to be unique, the clustered index would be GroupID,TicketID in that order. I would then create a non-clustered non-unique index on TicketID.
The reason being that if you wanted to query based only on group ID, they would logically be contiguous and there would be a block of them. The other index will give you fastest when only TicketID is specified.
I do think it would probably be negligible overall depending on how the data will be queried (i.e. if groupid and ticketid are always provided).

Related

SQL Server - Is creating an index with a unique constraint as one of the columns necessary?

I have the query below:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = #input
AND FK_SLA_Process = #input2
AND IsActive = 1
And this is my index for this SLA table.
CREATE INDEX IX_SLA_SLAName_FK_SLA_Process_IsActive ON dbo.SLA (SLAName, FK_SLA_Process, IsActive) INCLUDE (SLATimeInSeconds)
However, the SLAName column is unique so it has a unique constraint/index.
Is my created index an overkill? Do I still need it or will SQL Server use the index created on the unique column SLAName?
It would be an "overkill" if your index would only be on SLAName, but you are also ordering by FK_SLA_Process and IsActive so queries that need needs columns will benefit more from your index and less if you just had the unique one.
So for a query like this:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Both index will yield the same results and there would be no point in yours. But for queries like:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
AND FK_SLA_Process = 'Some Value'
Or
SELECT SLATimeInSeconds
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Your index will be better than the unique one (2nd example is a covering index).
You should inspect which kind of SELECT you do to this table and decide if you need this one or not. Keep in mind that having many indexes might speed up selects but slow down inserts, updates and deletes.
Assuming you have a such table declaration:
CREATE TABLE SLA
(
ID INT PRIMARY KEY,
SLAName VARCHAR(50) NOT NULL UNIQUE,
fk_SLA INT,
IsActive TINYINT
)
Under the hood we have two indexes:
CREATE TABLE [dbo].[SLA](
[ID] [int] NOT NULL,
[SLAName] [varchar](50) NOT NULL,
[fk_SLA] [int] NULL,
[IsActive] [tinyint] NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
So this query will have an index seek and has an optimal plan:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
Its query plan indicates index seek because we are searching by index UNIQUE NONCLUSTERED ([SLAName] ASC ) and don't use other columns in WHERE statement:
But if you add extra parameters into WHERE:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
AND s.fk_SLA = 1
AND s.IsActive = 1
Execution plan will have extra look up:
Lookup happens when index does not have necessary information. SQL Query engine has to get out from UNIQUE NONCLUSTERED index data structure to find data of columns fk_SLA and IsActive in your table SLA.
So your index is overkill as you have UNIQUE NONCLUSTERED index:
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
If SLAName column is unique and it has a unique constraint, any query that returns only one or 0 rows (all the queries with a point search that include SLAName = 'SomeName' condition) will use the unique index and make (maximum) one lookup in the base table.
Unless your queries have a range search like SLAName like 'SomeName%' there is no need in covering index because index search + 1 lookup is almost the same as only index search, and there is no need to waste space / maintain another index for such a miserable performance gain.

Include columns in an existing non-clustered index or add a new non-clustered index

Table Props already has a non-clustered index for column 'CreatedOn' but this index doesn't include certain other columns that are required in order to significantly improve the query performance of a frequently run query.
To fix this is it best to;
1. create an additional non-clustered index with the included columns or
2. alter the existing index to add the other columns as included columns?
In addition:
- How will my decision affect the performance of other queries currently using the non-clustered index?
- If it is best to alter the existing index should it be dropped and re-created or altered in order to add the included columns?
A simplified version of the table is below along with the index in question:
CREATE TABLE dbo.Props(
PropID int NOT NULL,
Reference nchar(10) NULL,
CustomerID int NULL,
SecondCustomerID int NULL,
PropStatus smallint NOT NULL,
StatusLastChanged datetime NULL,
PropType smallint NULL,
CreatedOn datetime NULL,
CreatedBy int NULL
CONSTRAINT PK_Props PRIMARY KEY CLUSTERED
(
PropID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
Current index:
CREATE NONCLUSTERED INDEX idx_CreatedOn ON dbo.Props
(
CreatedOn ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
All 5 of the columns required in the new or altered index are; foreign key columns, a mixture of smallint and int, nullable and non-nullable.
In the example the columns to include are: CustomerID, SecondCustomerID, PropStatus, PropType and CreatedBy.
As always... It depends...
As a general rule having redundant indexes is not desirable. So, in the absence of other information, you'd be better off adding the included columns, making it a covering index.
That said, the original index was likely built for another "high frequency" query... So now you have to determine weather or not the the increased index page count is going adversely affect the existing queries that use the index in it's current state.
You'd also want to look at the expense of doing a key lookup in relation to the rest of the query. If the key lookup in only a minor part of the total cost, it's unlikely that the performance gains will offset the expense of maintaining the larger index.

SQL Conditional Unique Constraint With Where Clause Within Same Table

I have a table where I want to ensure that a combination of five columns remain unique within that table. For example:
ALTER TABLE [dbo].[MyTable]
ADD CONSTRAINT [UQ__MyTable.MFG.Model.Class.Depiction.Iteration]
UNIQUE NONCLUSTERED
(
[ManufacturerID] ASC,
[Model] ASC,
[BlockClassID] ASC,
[BlockDepictionID] ASC,
[BlockIterationID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
I want to exclude combinations where a sixth separate column has a particular value. For example, I only want to enforce this above constraint when the column [Flag] = 0 and exclude enforcement when the column [Flag] = 1 .
As workaround, You can get the proper ANSI behavior in SQL Server 2008 and above by creating a unique, filtered index.
CREATE UNIQUE NONCLUSTERED INDEX [IX__MyTable.MFG.Model.Class.Depiction.Iteration]
ON [dbo].[MyTable] ([ManufacturerID],[Model],[BlockClassID],[BlockDepictionID],[BlockIterationID])
WHERE [Flag] = 0;
TechNet article

create primary key on existing table with data

As part of a migration project, we have imported data from a JDE iSeries DB2 database. An SSIS package was created to create the destination tables and import data. The import went successfully.
Now comes the problem - The customer wants Primary Keys created in the destination DB (SQL 2008 R2). The problem table in this case, would be one table that has 104 columns and 7.5 million rows of data. The PK required for this table is composite and has 7 columns.
We are considering this :
BEGIN TRANSACTION
GO
ALTER TABLE [dbo].[F0911] ADD CONSTRAINT [F0911_PK] PRIMARY KEY CLUSTERED
(
[GLDCT] ASC,
[GLDOC] ASC,
[GLKCO] ASC,
[GLDGJ] ASC,
[GLJELN] ASC,
[GLLT] ASC,
[GLEXTL] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
COMMIT
or this:
-- Rename existing tables
sp_RENAME '[F0911]' , '[F0911_old]'
GO
-- Create new table
SELECT * INTO F0911 FROM F0911_old WHERE 1=0
GO
--Create PK constraints
ALTER TABLE [dbo].[F0911] ADD CONSTRAINT [F0911_PK] PRIMARY KEY CLUSTERED
(
[GLDCT] ASC,
[GLDOC] ASC,
[GLKCO] ASC,
[GLDGJ] ASC,
[GLJELN] ASC,
[GLLT] ASC,
[GLEXTL] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
--Insert data into new tables
INSERT INTO F0911
SELECT * FROM F0911_old
GO
-- Drop old tables
DROP TABLE F0911_old
GO
Which would be a more efficient approach, performance wise? I have a gut feeling that both are the same and even the first approach does the same thing as the second one does, implicitly. Is this understanding correct?
Please note that all these columns already exist in the table and we cannot modify the table definition.
Thanks,
Raj
They're the same. The effect of creating a clustered index is to arrange the pages which will happen in both cases. For non-clustered indexes it will help to disable the index and then turn it back on and rebuilding it.
I think the first approach is right, but I don't understand the reason of BEGIN Transaction and END transaction. I don't think Transaction keyword is necessary because you are not modifying data of the table. Transaction is used where we have to lock the data and we are modifying real time data so the old data is not used.

Soft Delete - Use IsDeleted flag or separate joiner table?

Should we use a flag for soft deletes, or a separate joiner table? Which is more efficient? Database is SQL Server.
Background Information
A while back we had a DB consultant come in and look at our database schema. When we soft delete a record, we would update an IsDeleted flag on the appropriate table(s). It was suggested that instead of using a flag, store the deleted records in a seperate table and use a join as that would be better. I've put that suggestion to the test, but at least on the surface, the extra table and join looks to be more expensive then using a flag.
Initial Testing
I've set up this test.
Two tables, Example and DeletedExample. I added a nonclustered index on the IsDeleted column.
I did three tests, loading a million records with the following deleted/non deleted ratios:
Deleted/NonDeleted
50/50
10/90
1/99
Results - 50/50
Results - 10/90
Results - 1/99
Database Scripts, For Reference, Example, DeletedExample, and Index for Example.IsDeleted
CREATE TABLE [dbo].[Example](
[ID] [int] NOT NULL,
[Column1] [nvarchar](50) NULL,
[IsDeleted] [bit] NOT NULL,
CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Example] ADD CONSTRAINT [DF_Example_IsDeleted] DEFAULT ((0)) FOR [IsDeleted]
GO
CREATE TABLE [dbo].[DeletedExample](
[ID] [int] NOT NULL,
CONSTRAINT [PK_DeletedExample] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DeletedExample] WITH CHECK ADD CONSTRAINT [FK_DeletedExample_Example] FOREIGN KEY([ID])
REFERENCES [dbo].[Example] ([ID])
GO
ALTER TABLE [dbo].[DeletedExample] CHECK CONSTRAINT [FK_DeletedExample_Example]
GO
CREATE NONCLUSTERED INDEX [IX_IsDeleted] ON [dbo].[Example]
(
[IsDeleted] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
The numbers you have seem to indicate that my initial impression was correct: if your most common query against this database is to filter on IsDeleted = 0, then performance will be better with a simple bit flag, especially if you make wise use of indexes.
If you often query for deleted and undeleted data separately, then you could see a performance gain by having a table for deleted items and another for undeleted items, with identical fields. But denormalizing your data like this is rarely a good idea, as it will most often cost you far more in code maintenance costs than it will gain you in performance increases.
I'm not the SQL expert but in my opinion,it all depends on the usage frequency of the database. If the database is accessed by the large number of users and needs to be efficient then usage of a seperate isDeleted table will be good. The better option would be using a flag during the production time and as a part of daily/weekly/monthly maintanace you may move all the soft deleted records to the isDeleted table and clear the production table of soft deleted records. The mixture of both option will be good a good one.