How to design a mapping table? - SQL

I'm having a bit of a design issue here. I'm kind of a novice at this, so I need some help.
Due to a company merger, where everything must go into one of the systems, I'm supposed to map our customers to new customer IDs in the other company's system.
When I get the new customer IDs, I'm supposed to ensure that each one is unique, and the same goes for our existing customer IDs.
Current customerID: CurCustID
New customer ID: NewCustID
First, I would like the database to make sure that every value in the CurCustID column is unique (appears in only one record); second, I would like the NewCustID column to be unique in the same way.
Third, I would like the combination of CurCustID and NewCustID to accept only unique data.
If you can help me I would be very thankful; on the other hand, if my approach is bad practice and there is a best-practice way of doing this, then please let me know.
USE [Database]
GO
/****** Object: Table [dbo].[TblMapning] Script Date: 05/30/2016 14:30:21 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TblMapning](
[CurCustID] [varchar](255) NOT NULL,
[NewCustID] [varchar](255) NOT NULL,
PRIMARY KEY CLUSTERED
(
[CurCustID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

Seems like you'd have to create 3 separate tables to enforce all those things. How new customer IDs are generated in the new and mapping tables could be automated, depending on your house rules for how new ones are assigned. You would probably want to create code or stored procedures to make the process more user-friendly, spit out errors for duplicated old customer IDs, etc., but at the table level this would be one way to enforce it:
CREATE TABLE [dbo].[Tbloldcustids](
    CustID [varchar](255) NOT NULL,
    PRIMARY KEY (CustID)
)
CREATE TABLE [dbo].[Tblnewcustids](
    CustID [varchar](255) NOT NULL,
    PRIMARY KEY (CustID)
)
CREATE TABLE [dbo].[TblMapping](
    [CurCustID] [varchar](255) NOT NULL,
    [NewCustID] [varchar](255) NOT NULL,
    PRIMARY KEY (CurCustID, NewCustID)
)
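Note that a composite primary key on (CurCustID, NewCustID) only makes the pair unique; it doesn't stop the same CurCustID from appearing twice with different NewCustID values, so with the three-table design you'd still want foreign keys from the mapping table to the two ID tables. If the mapping is strictly one-to-one, a single table can also enforce all three rules by itself. A minimal sketch, with illustrative constraint names:
CREATE TABLE [dbo].[TblMapping](
    [CurCustID] [varchar](255) NOT NULL,
    [NewCustID] [varchar](255) NOT NULL,
    -- each CurCustID can appear only once
    CONSTRAINT [PK_TblMapping] PRIMARY KEY CLUSTERED ([CurCustID]),
    -- each NewCustID can appear only once; together these make every pair unique too
    CONSTRAINT [UQ_TblMapping_NewCustID] UNIQUE ([NewCustID])
)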

Violation of PRIMARY KEY constraint 'xx'. Cannot insert duplicate key in object 'cc'. The duplicate key is (x)

Updated question for better understanding, and because I found the solution I was looking for:
My script goes like this:
CREATE TABLE [dbo].[MyTable]
(
[Columndata1] [nvarchar] (255) NOT NULL,
[Columndata2] [nvarchar] (max) NOT NULL,
[Columndata3] [nvarchar] (max) NOT NULL,
[ColumndataTime] [datetime] NOT NULL,
CONSTRAINT [PK_MyTable]
PRIMARY KEY CLUSTERED ([Columndata1] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[MyTable]
ADD CONSTRAINT [DF_MyTable_ColumndataTime] DEFAULT (getutcdate()) FOR [ColumndataTime]
GO
I am trying to work around the duplicate-PK case, so that if a duplicate comes in, the request is ignored, and otherwise the record is created.
I guess I wasn't clear about that in my initial question.
I think you already have your answer: you are inserting duplicate values into the column you designated as your primary key, and the error you are getting tells you exactly that. I can see that you assumed something more "sinister" was happening, but it seems that this wasn't the case; it wasn't a "race condition" or anything more complex.
However, I thought it might be worth pointing out something that I see as a bit of a "red flag". Maybe this doesn't qualify as an answer, but it does address some points in your original question, particularly where you ask about the options in the "complicated" part of your CREATE TABLE script, and it's too long for a comment.
If you have a "default" out of the box installation of SQL Server then 90% of the statements in your CREATE TABLE script are simply redundant defaults.
I can run this script:
CREATE TABLE [dbo].[MyTable] (
[Columndata1] [nvarchar] (255) NOT NULL,
[Columndata2] [nvarchar] (max) NOT NULL,
[Columndata3] [nvarchar] (max) NOT NULL,
[ColumndataTime] [datetime] NOT NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED ([Columndata1]));
Then I can generate the create script for the table, directly from SSMS, to get this:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[MyTable](
[Columndata1] [nvarchar](255) NOT NULL,
[Columndata2] [nvarchar](max) NOT NULL,
[Columndata3] [nvarchar](max) NOT NULL,
[ColumndataTime] [datetime] NOT NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[Columndata1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
That second script looks very similar to the one in your post, doesn't it?
Now there's nothing necessarily wrong with specifying all of these default options, and there are cases where it might work to your advantage, but my personal preference (and the preference of everyone I have ever worked with) is to omit them, as they just make scripts harder to peer review; they are essentially "clutter". You can argue over whether it's worth specifying the ASC in the PRIMARY KEY section, since some of this is assumed knowledge, and there's always the possibility that Microsoft might change the defaults in the future (in which case the first script would no longer generate what you wanted). Taking a pragmatic view of how these things work, though, the chances of Microsoft changing these options in a future version are incredibly slim, as it would break so many databases out in the wild.
Take this as you want, but I thought it was worth a stab at explaining, as you seem to be a little fixated (maybe the wrong word :)) on the "long part" (your words) of your original query.
My solution was to use an INSERT ... WHERE NOT EXISTS statement, greatly inspired by this answer: https://stackoverflow.com/a/3025332
INSERT INTO `table` (`value1`, `value2`)
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
WHERE NOT EXISTS (SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1)
This way I won't get duplicates.
Kudos to Richard Hansell for the explanation, making me move my focus from the "complicated" part.
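For the record, the linked answer is written for MySQL (the backticks, FROM DUAL and LIMIT don't apply to SQL Server). A rough T-SQL sketch of the same idea against the table above, where the @Value parameters are hypothetical:
-- insert only if no row with this key exists yet;
-- Columndata1 is the primary key, so checking it alone is enough
INSERT INTO [dbo].[MyTable] ([Columndata1], [Columndata2], [Columndata3])
SELECT @Value1, @Value2, @Value3
WHERE NOT EXISTS (SELECT 1 FROM [dbo].[MyTable] WHERE [Columndata1] = @Value1);
Under concurrent inserts this still isn't bulletproof without extra locking, but for a single writer it avoids the duplicate-key error.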

Re-seeding a large SQL table

Using version:
Microsoft SQL Server 2008 R2 (SP3-OD) (KB3144114) - 10.50.6542.0 (Intel X86)
Feb 22 2016 18:12:09
Copyright (c) Microsoft Corporation
Standard Edition on Windows NT 5.2 <X86> (Build : )
I have a heavy table (135K rows) that I moved from another DB.
It transferred with the [id] column being a standard int column instead of being the identity key & seed column.
When I try to edit that field to become an identity specification with a seed value, it errors out and gives me this error:
Execution Timeout Expired.
The timeout period elapsed prior to completion of the operation...
I even tried deleting that column, to recreate it later, but I get the same issue.
Thanks
UPDATE:
Table structure:
CREATE TABLE [dbo].[tblEmailsSent](
[id] [int] IDENTITY(1,1) NOT NULL, -- this is what it should be; currently it's just an [int] NOT NULL
[Sent] [datetime] NULL,
[SentByUser] [nvarchar](50) NULL,
[ToEmail] [nvarchar](150) NULL,
[StudentID] [int] NULL,
[SubjectLine] [nvarchar](200) NULL,
[MessageContent] [nvarchar](max) NULL,
[ReadStatus] [bit] NULL,
[Folder] [nvarchar](50) NULL,
CONSTRAINT [PK_tblMessages] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
I think your question is a duplicate of Adding an identity to an existing column. That question has an answer that should be perfect for your situation; I'll reproduce its essential part below.
But before that, let's clarify why you see the timeout error.
You are trying to add the IDENTITY property to an existing column, and you are using the SSMS GUI for it. A simple ALTER COLUMN statement can't do this, so SSMS generates a script that creates a new table, copies the data over into the new table, drops the old table, and renames the new table to the old name. When you do this operation via the SSMS GUI, it runs its scripts with a predefined timeout of 30 seconds.
Of course, you can change this setting in SSMS and increase the timeout, but there is a much better way.
Simple/lazy way
Use SSMS GUI to change the column definition, but then instead of clicking "Save", click "Generate Change Script" in the table designer.
Then save this script to a file and review the generated T-SQL code that the GUI runs behind the scenes.
You'll see that it creates a temp table with the required schema, copies data over, re-creates foreign keys and indexes, drops the old table and renames the new table.
The script itself is usually correct, but pay close attention to the transactions in it. For some reason SSMS often doesn't use a single transaction for the whole operation, but several. I'd recommend manually reviewing the script and making sure that there is only one BEGIN TRANSACTION at the top and one COMMIT at the end; you don't want to end up with a half-done operation, say a table where all indexes and foreign keys were dropped.
If this is a one-off operation, that could be enough for you. Your table is only 2.4GB, so it may take a few minutes, but it should not take hours.
If you run the T-SQL script yourself in SSMS, then by default there is no timeout. You can stop it yourself if it takes too long.
A smarter and faster way to do it is described in detail in this answer by Justin Grant.
The main idea is to use the ALTER TABLE...SWITCH statement to make the change by touching only the metadata, without touching each page of the table.
BEGIN TRANSACTION;
-- create a new table with required schema
CREATE TABLE [dbo].[NEW_tblEmailsSent](
[id] [int] IDENTITY(1,1) NOT NULL,
[Sent] [datetime] NULL,
[SentByUser] [nvarchar](50) NULL,
[ToEmail] [nvarchar](150) NULL,
[StudentID] [int] NULL,
[SubjectLine] [nvarchar](200) NULL,
[MessageContent] [nvarchar](max) NULL,
[ReadStatus] [bit] NULL,
[Folder] [nvarchar](50) NULL,
CONSTRAINT [PK_tblEmailsSent] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
-- switch the tables
ALTER TABLE [dbo].[tblEmailsSent] SWITCH TO [dbo].[NEW_tblEmailsSent];
-- drop the original (now empty) table
DROP TABLE [dbo].[tblEmailsSent];
-- rename new table to old table's name
EXEC sp_rename 'NEW_tblEmailsSent','tblEmailsSent';
COMMIT;
After the new table has the IDENTITY property, you should normally set the current identity value to the maximum of the actual values in your table. If you don't, new rows inserted into the table would start from 1.
One way to do this is to run DBCC CHECKIDENT after you have switched the tables:
DBCC CHECKIDENT('dbo.tblEmailsSent')
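If you'd rather set the value explicitly, DBCC CHECKIDENT also accepts a RESEED argument; a small sketch (looking the maximum up via a variable is just one way to do it):
-- reseed to the current maximum id so the next insert continues from there
DECLARE @maxid int = (SELECT MAX(id) FROM dbo.tblEmailsSent);
DBCC CHECKIDENT('dbo.tblEmailsSent', RESEED, @maxid);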
Alternatively, you can specify the new seed in the table definition:
CREATE TABLE [dbo].[NEW_tblEmailsSent](
[id] [int] IDENTITY(<max value of id + 1>, 1) NOT NULL,

SQL DELETE statement not working

I have a customized application in my company where I can create a place for users to input their values into a database.
The table where I am submitting the data has 5 columns; its SQL CREATE query is below:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Log_Ongoing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[LogType] [int] NULL,
[ActivityDate] [datetime] NOT NULL,
[ActivityDescription] [text] NULL,
[Train] [int] NULL,
CONSTRAINT [PK_Log_Ongoing] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Log_Ongoing] WITH CHECK ADD CONSTRAINT [FK_Log_Ongoing_Trains] FOREIGN KEY([Train])
REFERENCES [dbo].[Trains] ([Id])
GO
ALTER TABLE [dbo].[Log_Ongoing] CHECK CONSTRAINT [FK_Log_Ongoing_Trains]
GO
The purpose of this table is to record the ongoing activities in the plant.
The user can come back later and modify those activities (updating, adding, or deleting) through the application by choosing the report date and then modifying the data.
My thinking was that, before the user submits the data, I would first delete the old data with the same report date and then insert the new data again.
Unfortunately, the new data is inserted successfully, but the old data is not deleted.
I ran a SQL trace to check the queries that the application sends to the database, and I found the two statements below:
exec sp_executesql N'DELETE FROM Log_Ongoing WHERE ActivityDate = @StartDate',N'@startDate datetimeoffset(7)',@startDate='2017-02-12 07:00:00 +02:00'
exec sp_executesql N'INSERT INTO Log_Ongoing (LogType, ActivityDate, ActivityDescription, Train ) VALUES (1,@StartDate, @Activity, @Train)',N'@Train int,@Activity nvarchar(2),@startDate datetimeoffset(7)',@Train=1,@Activity=N'11',@startDate='2017-02-12 07:00:00 +02:00'
When I tested the INSERT statement in SSMS, it worked fine, but when I tested the DELETE statement, it didn't work. What is wrong with this query?
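One observation worth checking (this is my reading of the trace, not a confirmed answer): ActivityDate is a datetime column, while the traced parameter is datetimeoffset(7). On INSERT the parameter is converted down to datetime, dropping the offset; in the DELETE's WHERE clause the datetime column is instead converted up to datetimeoffset, which assumes an offset of +00:00, so the stored 07:00:00 no longer equals 07:00:00 +02:00. A small repro sketch:
-- the stored value loses its offset on the way in
DECLARE @startDate datetimeoffset(7) = '2017-02-12 07:00:00 +02:00';
DECLARE @stored datetime = CONVERT(datetime, @startDate); -- 2017-02-12 07:00:00

-- in the comparison, @stored is treated as 2017-02-12 07:00:00 +00:00
IF @stored = @startDate PRINT 'match' ELSE PRINT 'no match'; -- prints 'no match'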

Trigger to change autoincremented value

I have a simple table with tax rates
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TaxRates](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NULL,
CONSTRAINT [PK_TaxRates] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
If a user deletes a record, I don't want the next insert to leave a gap in the auto-incremented values.
To be clearer:
Say I have 3 records with ids 0, 1 and 2. When I delete the row with id 2 and some time later add the next tax rate, I want the records in this table to be numbered 0, 1, 2 as before.
There should be no chance of a gap like 0, 1, 2, 4, 6. It must be a trigger.
Could you help with that?
You need to accept gaps or not use IDENTITY:
id should have no external meaning
You can't update IDENTITY values
IDENTITY columns will always have gaps
In this case you'd be updating the clustered PK, which would be expensive
What about foreign keys? You'd need a CASCADE
Contiguous numbers can be generated with ROW_NUMBER() at read time (see the sketch below)
Generating ids without IDENTITY (whether you load this table or another) won't be concurrency-safe under load
Trying to INSERT into a gap (via an INSTEAD OF trigger) won't be concurrency-safe under load either
(Edit) History tables may retain the deleted values anyway
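A minimal sketch of the read-time numbering idea, subtracting 1 only to mimic the 0-based ids in the question:
-- contiguous display numbers computed at read time; gaps in Id don't matter
SELECT ROW_NUMBER() OVER (ORDER BY Id) - 1 AS DisplayId,
       Name
FROM dbo.TaxRates;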
An option, if the identity column has become something passed around in your organization, is to duplicate that column into a non-identity column on the same table; you can then modify those new id values at will while retaining the actual identity field.
Turning IDENTITY_INSERT on and off also allows you to insert explicit identity values, as sketched below.
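A short sketch of that, with an illustrative value re-filling a gap:
-- temporarily allow explicit values in the IDENTITY column
SET IDENTITY_INSERT dbo.TaxRates ON;
INSERT INTO dbo.TaxRates (Id, Name)
VALUES (2, N'Standard rate'); -- hypothetical row re-using a deleted id
SET IDENTITY_INSERT dbo.TaxRates OFF;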

Increasing performance on a logging table in SQL Server 2005

I have a "history" table where I log each request into a Web Handler on our web site. Here is the table definition:
/****** Object: Table [dbo].[HistoryRequest] Script Date: 10/09/2009 17:18:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HistoryRequest](
[HistoryRequestID] [uniqueidentifier] NOT NULL,
[CampaignID] [int] NOT NULL,
[UrlReferrer] [nvarchar](512) NOT NULL,
[UserAgent] [nvarchar](512) NOT NULL,
[UserHostAddress] [nvarchar](15) NOT NULL,
[UserHostName] [nvarchar](512) NOT NULL,
[HttpBrowserCapabilities] [xml] NOT NULL,
[Created] [datetime] NOT NULL,
[CreatedBy] [nvarchar](100) NOT NULL,
[Updated] [datetime] NULL,
[UpdatedBy] [nvarchar](100) NULL,
CONSTRAINT [PK_HistoryRequest] PRIMARY KEY CLUSTERED
(
[HistoryRequestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[HistoryRequest] WITH CHECK ADD CONSTRAINT [FK_HistoryRequest_Campaign] FOREIGN KEY([CampaignID])
REFERENCES [dbo].[Campaign] ([CampaignId])
GO
ALTER TABLE [dbo].[HistoryRequest] CHECK CONSTRAINT [FK_HistoryRequest_Campaign]
GO
37 seconds for 1050 rows on this statement:
SELECT *
FROM HistoryRequest AS hr
WHERE Created > '10/9/2009'
ORDER BY Created DESC
Does anyone have any suggestions for speeding this up? I have a clustered index on the PK and a regular index on the Created column. I tried a unique index and it barfed, complaining that there is a duplicate entry somewhere, which is to be expected.
Any insights are welcome!
You are requesting all columns (*) over a non-covering index (Created). On a large data set you are guaranteed to hit the Index Tipping Point, where a clustered index scan is more efficient than a nonclustered index range seek plus bookmark lookup.
Do you need * always? If yes, and if the typical access pattern is like this, then you must organize the table accordingly and make Created the leftmost clustered key.
If not, then consider changing your query to a covered query, e.g. select only HistoryRequestID and Created, which are covered by the nonclustered index. If more fields are needed, add them as included columns to the nonclustered index, but take into account that this will add extra storage space and I/O log write time.
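A minimal sketch of the INCLUDE approach; the two included columns here are just examples, pick whatever your query actually selects:
-- nonclustered index on Created that also carries a couple of extra columns,
-- so a query selecting only these never needs a bookmark lookup
CREATE NONCLUSTERED INDEX [IX_HistoryRequest_Created]
ON [dbo].[HistoryRequest] ([Created] DESC)
INCLUDE ([CampaignID], [UserHostAddress]);

-- now a covered query (HistoryRequestID rides along as the clustering key)
SELECT [HistoryRequestID], [CampaignID], [UserHostAddress], [Created]
FROM [dbo].[HistoryRequest]
WHERE [Created] > '20091009'
ORDER BY [Created] DESC;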
Hey, I've seen some odd behavior when pulling XML columns in large sets. Try putting your index on Created back, then specify the columns in your SELECT statement, but omit the XML column. See how that affects the return time for results.
For a log table, you probably don't need a uniqueidentifier column. You're not likely to query on it either, so it's not a good candidate for a clustered index. Your sample query is on "Created", yet there's no index on it. If you query frequently on ranges of "Created" values then it would be a good candidate for clustering even though it's not necessarily unique.
OTOH, the foreign key suggests frequent querying by Campaign, in which case clustering on that column could make sense, and would also probably do a better job of scattering the inserted keys in the indexes: both the surrogate key and the timestamp would add records in sequential order, which is net more work over time for insertions because the node sectors are filled less randomly.
If it's just a log table, why does it have update audit columns? It would normally be write-only.
Rebuild your indexes. Use the WITH (NOLOCK) hint after table names where appropriate; this mostly applies if you want to run long(ish) queries against tables that are heavily used in a live environment (such as a log table). It basically means your query might miss some of the very latest records, but you also aren't holding a lock open on the table, which would create additional overhead.
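A sketch of that hint applied to the query from the question, with the usual caveat that NOLOCK also permits dirty (uncommitted) reads:
-- reads without taking shared locks; may skip or double-read very recent rows
SELECT [HistoryRequestID], [Created]
FROM [dbo].[HistoryRequest] WITH (NOLOCK)
WHERE [Created] > '20091009'
ORDER BY [Created] DESC;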