To Find and Eliminate Duplicate IDs - SQL

I use this code to update one of my tables by calling a function that generates a random ID for each item. I started with around 1,000 rows, but now the table is growing and I have found duplicate IDs in it. Is there any way to modify the code I am using so that it checks for IDs already in the table and generates a new code if a similar one exists?

Your code shows you setting the field Password, but your results show that UniqueID is the duplicated field. (Maybe it's Password renamed?)
Assuming userId is unique (if not, add an actual identity column now; "ALTER TABLE dbo.Users ADD ID INT NOT NULL IDENTITY(1, 1)" should do the trick) and assuming Password is the field to change, use the following:
DECLARE @FN VARCHAR(20);
DECLARE @LN VARCHAR(20);
DECLARE @PW VARCHAR(20);
DECLARE @ID INT;
SELECT TOP 1
    @FN = FirstName,
    @LN = LastName,
    @ID = userID
FROM dbo.Users
WHERE Password IS NULL;
WHILE @@ROWCOUNT = 1
BEGIN
    SET @PW = dbo.GenerateID(@FN, @LN);
    WHILE EXISTS (SELECT 1 FROM dbo.Users WHERE Password = @PW)
        SET @PW = dbo.GenerateID(@FN, @LN);
    UPDATE dbo.Users SET Password = @PW WHERE userId = @ID;
    SELECT TOP 1
        @FN = FirstName,
        @LN = LastName,
        @ID = userID
    FROM dbo.Users
    WHERE Password IS NULL;
END
This looks for a blank password. If none is found, the outer loop is skipped. If one is found, we generate passwords until we find one not already in the table. Then we look for another row with a blank password before the end of the outer loop.
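As a side note, the row-at-a-time loop can often be replaced with a single set-based statement. A rough sketch, assuming the same dbo.Users table, that the Password column is at least 36 characters wide, and that a GUID is an acceptable unique value (this swaps NEWID() in for dbo.GenerateID, which is a different ID format than the question's function produces):

```sql
-- Fill every blank password in one statement; NEWID() yields a
-- distinct GUID per row, so no duplicate check is needed.
UPDATE dbo.Users
SET Password = CAST(NEWID() AS VARCHAR(36))
WHERE Password IS NULL;
```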

Sounds like you're new to this. Don't worry, T-SQL is pretty easy to learn. First things first: I suggest that you create a unique non-clustered index on the UniqueID column; this will prevent duplicate values from being inserted into your table. If someone does try to insert a duplicate value, the insert will throw an exception. Before you can create the index, though, you'll need to remove all the duplicate UniqueID values from your table.
CREATE UNIQUE NONCLUSTERED INDEX [IDX_UniqueID] ON [dbo].[Users]
(
[UniqueID] ASC
) ON [PRIMARY]
You can learn more about non-clustered indexes here: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described
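As a sketch of one way to clear out the existing duplicate UniqueID values before creating that index (the CTE name Dupes is my own; which row survives in each group is arbitrary here, so add an ORDER BY column if you care which copy is kept):

```sql
-- Number the rows within each group of identical UniqueID values,
-- then delete everything past the first row in each group.
;WITH Dupes AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY UniqueID
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.Users
)
DELETE FROM Dupes
WHERE rn > 1;
```

Deleting through the CTE removes the underlying rows from dbo.Users, which is a common T-SQL idiom for de-duplication.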
I also suggest that you consider changing the underlying type of your UniqueID field to a 'uniqueidentifier.' Here's an example of a table schema that uses a 'uniqueidentifier' column type for the UniqueID column:
CREATE TABLE [dbo].[Users](
[personId] [int] IDENTITY(1,1) NOT NULL,
[firstName] [nvarchar](50) NOT NULL,
[lastName] [nvarchar](50) NOT NULL,
[UniqueID] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
[personId] ASC
) ON [PRIMARY]
) ON [PRIMARY]
A 'uniqueidentifier' column type in SQL Server holds a Globally Unique Identifier (aka a GUID or UUID). It's easy to generate a GUID in most languages. To generate a GUID in T-SQL you just need to invoke the NEWID() function.
SELECT NEWID() -- output: D100FC00-B482-4580-A161-199BE264C1D1
You can learn more about GUIDs here: https://en.wikipedia.org/wiki/Universally_unique_identifier
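If you go the uniqueidentifier route, you can also let SQL Server generate the value automatically with a default constraint. A sketch against the schema above; the constraint name DF_Users_UniqueID is my own invention:

```sql
-- Give UniqueID a default so inserts never have to supply it.
ALTER TABLE [dbo].[Users]
    ADD CONSTRAINT [DF_Users_UniqueID] DEFAULT NEWID() FOR [UniqueID];

-- An insert that omits UniqueID now gets a fresh GUID automatically.
INSERT INTO [dbo].[Users] ([firstName], [lastName])
VALUES (N'Jane', N'Doe');
```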
Hope this helps. Best of luck on your project. :)


SQL Server identity number is not consecutive

I am creating a database table with an index number (CustRef) for each row.
The index starts from 1 and increases by 1 with each new row. The query is as follows:
CREATE TABLE [dbo].[CustDetails]
(
[CustRef] [int] IDENTITY(1,1) NOT NULL,
[LName] [nchar](25) NOT NULL,
[FName] [nchar](25) NOT NULL,
[Address] [nchar](80) NULL,
[Suburb] [nchar](25) NULL,
[State] [nchar](5) NULL,
[PCode] [nchar](5) NULL,
CONSTRAINT [PK_CustDetails]
PRIMARY KEY CLUSTERED ([CustRef] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
The table was created successfully and I tested it by inserting some sample data. At one point, I deleted a row whose index number (CustRef) was 6, then continued inserting sample data.
Unfortunately, the index numbers are no longer consecutive. In other words, I expected the next new row to use index 6 as its row index number. However, new entries skip index 6 and start from index 7.
As you can see from the above screenshot, between index 5 and 7, index 6 is missing.
How do I resolve this issue? Thanks in advance.
1. Create the table
CREATE TABLE [dbo].[CustDetails]
(
[CustRef] [int] IDENTITY(1,1) NOT NULL,
[LName] [nchar](25) NOT NULL,
[FName] [nchar](25) NOT NULL,
[Address] [nchar](80) NULL,
[Suburb] [nchar](25) NULL,
[State] [nchar](5) NULL,
[PCode] [nchar](5) NULL,
)
GO
2. Insert some records.
3. Delete the last record; in my case it was the 4th record.
4. Turn identity insert on:
SET IDENTITY_INSERT [dbo].[CustDetails] ON;
5. Insert the record with the identity column included:
insert into [dbo].[CustDetails] ([CustRef], [LName], [FName], [Address], [Suburb], [State], [PCode])
values (4, 'test4', 'test', 'test', 'test', 'test', 'test')
This approach is good for testing. Normally, though, unless you truncate the table, an identity column in SQL Server will not reset, because the value has already been assigned. So this is only a workaround: using IDENTITY_INSERT, you can look up MAX(CustRef) and set the identity column value yourself when inserting a record.
In your table CustDetails, the column CustRef is an identity column. It auto-increments: 1, 2, 3, 4, 5, 6. You then deleted the last row, i.e. DELETE FROM CustDetails WHERE CustRef = 6. The next row inserted will still get 7, because SQL Server has already assigned the value 6 to the previous row. Consider another scenario: if you had inserted rows 1 through 9 and then deleted the 6th row, what would you expect the next value to be according to your requirement, 10 or 6?
I once implemented this when a client asked for the ability to delete the last row only, with no numbers skipped; I wrote logic that used max value + 1 for the next row. You can remove the identity property and use MAX(CustRef) + 1 at insert time. Alternatively, you can use DBCC CHECKIDENT to reseed after each delete:
declare @id int = 0
SELECT @id = MAX(CustRef) FROM CustDetails
DBCC CHECKIDENT ('dbo.CustDetails', reseed, @id)
This tells SQL Server to reset the identity value to the last maximum, which is not at all a recommended approach; it is just an option. Alternatively, you can remove the identity property and write logic that gets MAX(CustRef) before the insert statement, increments it, and inserts into CustDetails.
Well, if it didn't work that way, then you would run into this:
insert custRef ...; -- id = 1
insert custRef ...; -- id = 2
insert custRef (fName, lName)
values ('Jane', 'Goodall');
-- id = 3
Jane Goodall! She deserves a prize.
declare @giveAPrizeTo int = (
    select id
    from custRef
    where fname = 'Jane'
    and lname = 'Goodall'
);
We'll actually deliver it in a bit.
But first, another task. Customers that can't be contacted aren't really useful. The boss asked that we keep the table pure.
delete custRef where address is null;
alter table custRef alter column address nchar(80) not null;
Okay, well, moving on, add the next person.
insert custRef (fName, lName, address)
values ('Jeffrey', 'Dahmer', '1234 W Somewhere; Milwaukee, Wisconsin 12345');
In this hypothetical dialect where id's get recycled, Jeffrey Dahmer now has id = 3.
Hmm, that's interesting. I should be careful of the newest customer. Well, I got distracted, what was I doing? Oh yeah, that prize! I'd better deliver it.
declare @message nvarchar(max) = (
    select 'Congratulations ' + fName + ' ' + lName + ', ' +
           'we support your good works. Have a prize!'
    from custRef
    where id = @giveAPrizeTo
);
print (@message);
Congratulations Jeffrey Dahmer, we support your good works. Have a prize!
Oops!

Data gets changed when copying data in chunks between two identical tables

In short, I am trying to copy data from one table to another nearly identical table (minus constraints, indices, and a precision change to a decimal column) in batches using INSERT [NewTable] SELECT TOP X * FROM [Table], but some data is getting changed during the copy. Read on for more details.
Why we are copying in the first place
We are altering the precision of a couple of columns in our largest table and do not have the time in our deployment window to do a simple ALTER statement. As an alternative, we decided to create a table with the new schema and copy the data over in batches in the days leading up to the deploy, allowing us to simply drop the old table and rename the new one during the deployment window.
Creation scripts for new and old tables
These are not the exact tables we have in our DB, but they've been trimmed down for this question. The actual table has ~100 columns.
CREATE TABLE [dbo].[Table]
(
[Id] BIGINT NOT NULL PRIMARY KEY NONCLUSTERED IDENTITY,
[ForeignKey1] INT NOT NULL,
[ForeignKey2] INT NOT NULL,
[ForeignKey3] INT NOT NULL,
[Name] VARCHAR(MAX) NOT NULL,
[SomeValue] DECIMAL(14, 5) NULL,
CONSTRAINT [FK_Table_ForeignKeyTable1] FOREIGN KEY ([ForeignKey1]) REFERENCES [ForeignKeyTable1]([ForeignKey1]),
CONSTRAINT [FK_Table_ForeignKeyTable2] FOREIGN KEY ([ForeignKey2]) REFERENCES [ForeignKeyTable2]([ForeignKey2]),
CONSTRAINT [FK_Table_ForeignKeyTable3] FOREIGN KEY ([ForeignKey3]) REFERENCES [ForeignKeyTable3]([ForeignKey3]),
)
GO
CREATE INDEX [IX_Table_ForeignKey2] ON [dbo].[Table] ([ForeignKey2])
GO
CREATE TABLE [dbo].[NewTable]
(
[Id] BIGINT NOT NULL PRIMARY KEY NONCLUSTERED IDENTITY,
[ForeignKey1] INT NOT NULL,
[ForeignKey2] INT NOT NULL,
[ForeignKey3] INT NOT NULL,
[Name] VARCHAR(MAX) NOT NULL,
[SomeValue] DECIMAL(16, 5) NULL
)
SQL I wrote to copy data
DECLARE @BatchSize INT
DECLARE @Count INT

-- Leave these the same --
SET @Count = 1

-- Update these to modify run behavior --
SET @BatchSize = 5000

WHILE @Count > 0
BEGIN
    SET IDENTITY_INSERT [dbo].[NewTable] ON;
    INSERT INTO [dbo].[NewTable]
        ([Id],
         [ForeignKey1],
         [ForeignKey2],
         [ForeignKey3],
         [Name],
         [SomeValue])
    SELECT TOP (@BatchSize)
        [Id],
        [ForeignKey1],
        [ForeignKey2],
        [ForeignKey3],
        [Name],
        [SomeValue]
    FROM [dbo].[Table]
    WHERE NOT EXISTS (SELECT 1 FROM [dbo].[NewTable] WHERE [dbo].[NewTable].Id = [dbo].[Table].Id)
    ORDER BY Id

    SET @Count = @@ROWCOUNT

    SET IDENTITY_INSERT [dbo].[NewTable] OFF;
END
The Problem
Somehow, data is getting garbled or modified in a seemingly random pattern during the copy. Most (maybe all) of the modified data we've seen has been in the ForeignKey2 column, and the value we end up with in the new table is seemingly random as well; it didn't exist at all in the old table. There doesn't seem to be any rhyme or reason to which records are affected, either.
For example, here is one row for the original table and the corresponding row in the new table:
Old Table
ID: 204663
FK1: 452
FK2: 522413
FK3: 11190
Name: Masked
Some Value: 0.0
New Table
ID: 204663
FK1: 452
FK2: 120848
FK3: 11190
Name: Masked but matches Old Table
Some Value: 0.0
Environment
SQL was run in SSMS. Database is an Azure SQL Database.

Update all records with same boolean if at least one record meets the criteria?

I am wondering if the way I am trying to update a set of records is the best way, or if there is a more efficient way to handle this.
Example Table:
CREATE TABLE [dbo].[ListItems] (
[Id] int NOT NULL IDENTITY(1,1) ,
[EmailAddress] nvarchar(MAX) NULL ,
[FirstName] nvarchar(MAX) NULL ,
[LastName] nvarchar(MAX) NULL ,
[IpAddress] nvarchar(MAX) NULL,
[IsUnsubscribed] bit,
[Md5Hash] varchar(250) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ListId] int,
CONSTRAINT [FK_dbo.ListItems_dbo.Lists_ListId] FOREIGN KEY ([ListId]) REFERENCES [dbo].[Lists] ([Id]) ON DELETE CASCADE ON UPDATE NO ACTION
)
CREATE INDEX [IX_MD5] ON [dbo].[ListItems]
([Md5Hash] ASC)
WITH (FILLFACTOR = 80)
ON [PRIMARY]
GO
Users can be part of multiple lists, and there is a flag indicating whether they are unsubscribed from a specific list. There are times when a user wants to be removed from all lists. Their Id will be different for each list, so I cannot use the Id as an identifier. I have an index on Md5Hash, so I am using that, since it is unique for each email address. This is what I have set up so far, but it is slow when there are a lot of records to look through:
Update ListItems set IsUnsubscribed = 1 where IsUnsubscribed = 0 and Md5Hash in (Select Md5Hash from ListItems where IsUnsubscribed = 1)
I was curious if there is a better way of doing this.
This is about as good as you're going to do, but I would probably re-write it as an EXISTS:
UPDATE li
SET IsUnsubscribed = 1
FROM dbo.ListItems AS li -- always use schema prefix!
WHERE IsUnsubscribed = 0
AND EXISTS
(
SELECT 1 FROM dbo.ListItems
WHERE Md5Hash = li.Md5Hash
AND IsUnsubscribed = 1
);
Of course another idea would be to check for the existence of at least one value of 1 at query time, instead of constantly having to run this query to keep all the values at 1. This is busy work for no good reason.
I still think you can abstract this status away into another table (as I suggested in my comment) in a way that is transparent to the surrounding infrastructure. Views, synonyms, enforcing data access through stored procedures, etc. can all assist in this...
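One way to sketch that abstraction; the table and column layout below (a dbo.Unsubscribes table keyed on Md5Hash) is my own invention, not something from the question:

```sql
-- A single row here marks an email address as unsubscribed everywhere,
-- instead of flipping a bit on every ListItems row it appears in.
CREATE TABLE dbo.Unsubscribes
(
    Md5Hash VARCHAR(250) NOT NULL PRIMARY KEY
);

-- At query time, a subscriber is active only if no unsubscribe row exists:
SELECT li.EmailAddress, li.ListId
FROM dbo.ListItems AS li
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Unsubscribes AS u
    WHERE u.Md5Hash = li.Md5Hash
);
```

A view over this query (or a stored procedure wrapping it) can keep the change invisible to the surrounding infrastructure, which is the kind of transparency suggested above.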

Alter an existing Identity Column's Increment value

I am stumped.
I am trying to alter the increment value of the identity columns in a collection of existing SQL Server tables (which all have data), and I have been trying to work out whether it is possible without writing a custom script per table.
I can't find a solution that doesn't require dropping and recreating the tables, which would require a different script for each table, as they each have different column lists.
For example, I want to change the existing table
CREATE TABLE [dbo].[ActionType](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Action] [varchar](100) NOT NULL,
CONSTRAINT [PK_ActionType] PRIMARY KEY CLUSTERED
(
[ID] ASC
) ON [PRIMARY]
) ON [PRIMARY]
To
CREATE TABLE [dbo].[ActionType](
[ID] [int] IDENTITY(1,5) NOT NULL,
[Action] [varchar](100) NOT NULL,
CONSTRAINT [PK_ActionType] PRIMARY KEY CLUSTERED
(
[ID] ASC
) ON [PRIMARY]
) ON [PRIMARY]
via something like
exec sp_AlterIncrement @TABLE_NAME = 'ActionType', @NEW_INCREMENT = 5
while keeping the data.
This would fix a big deployment issue I am facing right now, so any help would be appreciated.
You cannot alter an identity column's increment after you create it. It is only possible to change the seed value, with DBCC CHECKIDENT.
You have to drop and recreate the column.
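A sketch of the drop-and-recreate approach for the ActionType table above. Note that this reassigns the ID values (1, 6, 11, ...) rather than preserving them, so it only fits if nothing references the old IDs:

```sql
-- Replace IDENTITY(1,1) with IDENTITY(1,5) on dbo.ActionType.
-- The row data is kept; the ID values themselves are regenerated.
ALTER TABLE dbo.ActionType DROP CONSTRAINT PK_ActionType;
ALTER TABLE dbo.ActionType DROP COLUMN ID;
ALTER TABLE dbo.ActionType ADD ID INT IDENTITY(1, 5) NOT NULL;
ALTER TABLE dbo.ActionType ADD CONSTRAINT PK_ActionType PRIMARY KEY CLUSTERED (ID);
```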
I had to do that before on a small table, and it's fairly easy to do. The trick is that you have to update each value to something that doesn't currently exist as a key, and then back, since you can't just increment it by 1 because that key already exists. It takes 2 updates. For a table with IDs smaller than 100, for example:
update my_table set id = id+100;
update my_table set id = id-99;
Anyway, I do not understand why you want to alter the identity increment, because either way you will keep the same column as the primary key or as part of the clustered key.
Also, if any change to the column type is required, I don't think it is possible without altering the table structure:
Alter table ActionType
Alter column ID
You can also revert to the original structure when it is no longer required. This can be used for the specified case as well, if you need it on an on-demand basis.
Please respond so that I can provide further feedback.
A couple of things; maybe too much info, but helpful when doing stuff like this. The following will reset the identity seed to whatever you want:
DBCC CHECKIDENT ([DB.Schema.Table], reseed, 0) -- The next record will get 1. You can set the seed to any value.
If you want to insert data into a table that has an identity but you need to force the value to something specific, do this:
SET IDENTITY_INSERT [DB].[schema].[Table] ON
...Add your data here
SET IDENTITY_INSERT [DB].[schema].[Table] OFF
Sometimes this is necessary, and the following might provide an answer. Suppose the existing table is IDENTITY(1,1) (table A in the example below). It contains values, but you would like to change it to an increment of 2, say, so that it works well with another table (table B below).
A (the renamed B) would then get odd IDs in addition to whatever it used to contain, while the new B would get even numbers.
This script shows how to do it:
create table A(id int identity(1,1),v char)
insert into A
Select 'A'
union select 'B'
union select 'C'
go
create table B(id int identity(1,2),v char)
go
SET IDENTITY_INSERT B ON
GO
insert into B(Id,v)
Select Id,v from A
go
SET IDENTITY_INSERT B OFF
GO
insert into B
Select 'D'
union select 'E'
go
drop table A
go
EXEC sp_RENAME 'B' , 'A'
go
Select * from A
go
Select max(Id)+1 from A
go
create table B(id int identity(8,2),v char)
go
insert into B
Select 'A'
union select 'B'
union select 'C'
go
Select * from B
If you need to renumber or compress your identity field, the easiest way is as follows:
Temporarily convert your identity field into a plain int column
Replace the values, using for example an Excel sheet, in order to fill them in
Copy and paste the column from your Excel file into the int field
Save the table
Open it again in design mode and change the int field back into an identity
If this identity field is used in a child table, make sure you have a trigger to also propagate the new values into the dependent tables.
And that's all.
If you need to control identity data in your application, just change the column to int and manage the incremental values in code with the DMax function.
Hope it helps

Inserting record from one column to another column in the same scope or statement

I have a stored procedure that populates a table. As indicated in the code below, this table has an identity column which is also the primary key column.
I would like to prefix the primary key with leading letters, for example: ABC123.
Obviously this is not possible directly, because the primary key column is of INT datatype.
So I created an additional column into which I can insert the prefixed primary key. This works, except I have to make the new column nullable and use an UPDATE statement.
Something tells me there is a better way.
Is there a way I can do this without using UPDATE after the initial INSERT, and have the new CategoryID column be NOT NULL?
Table Code:
CREATE TABLE [dbo].[Registration] (
[SystemID] INT IDENTITY (100035891, 1) NOT NULL,
[CategoryID] CHAR (13) NULL,
[FName] VARCHAR (30) NOT NULL,
[LName] VARCHAR (30) NOT NULL,
[MInit] CHAR (1) NULL,
PRIMARY KEY CLUSTERED ([SystemID] ASC)
);
Stored Procedure:
CREATE PROCEDURE [dbo].[uspInsertRegistration]
    @FName VARCHAR(30),
    @LName VARCHAR(30),
    @MInit CHAR(1),
    @CategoryID CHAR(13),
    @SystemID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @ErrCode int
    INSERT INTO [dbo].[Registration] ([FName], [LName], [MInit])
    VALUES (@FName, @LName, @MInit)
    SELECT @ErrCode = @@ERROR, @SystemID = SCOPE_IDENTITY()
    UPDATE [dbo].[Registration]
    SET CategoryID = 'ABC' + CAST(SystemID AS VARCHAR(10))
    WHERE SystemID = @SystemID
    SET NOCOUNT OFF
    RETURN @ErrCode
END
Finally this is what the table looks like with the data:
Thanks for being contagious with your knowledge. :)
Guy
My suggestion is to use a computed column, as what you're trying to do introduces redundancy. See:
http://msdn.microsoft.com/en-us/library/ms191250%28v=sql.105%29.aspx
Alternatively, make the column big enough to hold a GUID, put a GUID into it on the insert, then update it afterwards.
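A minimal sketch of the computed-column suggestion against the Registration table above; the PERSISTED option and the exact expression are my choices, not spelled out in the answer:

```sql
CREATE TABLE [dbo].[Registration] (
    [SystemID]   INT IDENTITY (100035891, 1) NOT NULL,
    -- CategoryID is derived from SystemID, so no UPDATE is ever needed
    -- and the value can never fall out of sync with the key.
    [CategoryID] AS ('ABC' + CAST([SystemID] AS VARCHAR(10))) PERSISTED,
    [FName]      VARCHAR (30) NOT NULL,
    [LName]      VARCHAR (30) NOT NULL,
    [MInit]      CHAR (1) NULL,
    PRIMARY KEY CLUSTERED ([SystemID] ASC)
);
```

With PERSISTED, the value is stored and can even be indexed; without it, the expression is evaluated on read.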