SQL Server Insert without duplicate

I have a table with millions of records in a SQL Server database. I want to insert a record only if it is not a duplicate of an existing one, but I don't want to have to check first whether a duplicate exists. Is there any way to insert directly and, if a duplicate record exists, just ignore the new insert?

You might be able to achieve what you want by applying a UNIQUE INDEX on the table and specifying IGNORE_DUP_KEY ON:
Specifies the error response when an insert operation attempts to insert duplicate key values into a unique index. The IGNORE_DUP_KEY option applies only to insert operations after the index is created or rebuilt. The option has no effect when executing CREATE INDEX, ALTER INDEX, or UPDATE. The default is OFF.
ON
A warning message will occur when duplicate key values are inserted into a unique index. Only the rows violating the uniqueness constraint will fail.
So the insert will succeed for new unique rows, but you can't avoid the warning message.
You would create this index across all columns by which you're defining uniqueness/duplicates - which may or may not be all columns in the table - you haven't given us a definition to work from.
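For example, a minimal sketch of such an index (the table and column names are placeholders; this assumes duplicates are defined by those two columns):
CREATE UNIQUE INDEX IX_MyTable_NoDups
    ON dbo.MyTable (Col1, Col2)
    WITH (IGNORE_DUP_KEY = ON);

-- Rows that would duplicate (Col1, Col2) are now skipped with a
-- "Duplicate key was ignored." warning instead of failing the whole INSERT.
INSERT INTO dbo.MyTable (Col1, Col2) VALUES (1, 'a');
INSERT INTO dbo.MyTable (Col1, Col2) VALUES (1, 'a'); -- silently ignored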

If you are inserting records from a table, filter out the rows that already exist with NOT EXISTS:
INSERT INTO INSERT_TABLE_NAME
(.....)
SELECT
(.....)
FROM TABLE_NAME T1
WHERE NOT EXISTS (
    SELECT 1
    FROM INSERT_TABLE_NAME T2
    WHERE T2.COLUMN_NAME1 = T1.COLUMN_NAME1
      AND T2.COLUMN_NAME2 = T1.COLUMN_NAME2
      AND ...
)
If you are inserting a record by values, use SELECT instead of VALUES so the same existence check can be applied:
INSERT INTO INSERT_TABLE_NAME
(.....)
SELECT
VALUE1, VALUE2, ...
WHERE NOT EXISTS (
    SELECT 1
    FROM INSERT_TABLE_NAME T2
    WHERE T2.COLUMN_NAME1 = VALUE1
      AND T2.COLUMN_NAME2 = VALUE2
)
My solution is only suitable when the number of columns in your table is reasonable.
Of course, @Damien_The_Unbeliever has given a better solution, but you can't always implement it after a certain point.


Adding a computed column that uses MAX

I need to create a sequential number column for record numbering purposes.
I am OK with losing the sequence if I delete a row from the middle of the table.
For example
1
2
3
If I delete 2, I am OK with the next value being 4.
I tried to alter my table to
alter table [dbo].[mytable]
add [record_seq] as (MAX(record_seq) + 1)
but I am getting "An aggregate may not appear in a computed column expression or check constraint."
Which is a bit confusing. Do I need to specify an initial value? Is there a better way?
If you're looking to allocate a sequence number even in cases where the table doesn't get a record inserted, I would handle it in the process responsible for performing those inserts. Create another table, in this table keep track of the max identity value of that sequence. Each time you want to perform an insert, reserve the sequence number you want by updating that table first. If you rely on selecting the max existing value, you could be at risk of multiple sessions getting the same "new" sequence number before inserting. Even if the insert fails, you will have incremented that control table so nothing else uses that value that has been reserved.
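A minimal sketch of that control-table pattern (the table and column names here are hypothetical):
-- One-row control table holding the last sequence value handed out
CREATE TABLE dbo.SequenceControl (LastSeq INT NOT NULL);
INSERT INTO dbo.SequenceControl (LastSeq) VALUES (0);

-- Reserve the next value atomically; OUTPUT hands back the reserved number
DECLARE @NextSeq TABLE (Seq INT);

UPDATE dbo.SequenceControl
SET LastSeq = LastSeq + 1
OUTPUT inserted.LastSeq INTO @NextSeq;

-- Use the reserved value for the actual insert; even if this insert fails,
-- no other session will ever be handed the same number
INSERT INTO dbo.mytable (record_seq /*, other columns */)
SELECT Seq FROM @NextSeq;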
It's not supported in SQL Server. You can use an identity column:
ALTER TABLE [dbo].[mytable]
ADD [record_seq] INT IDENTITY
Or use a trigger to update your seq column after insert and/or delete.
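A rough sketch of such an AFTER INSERT trigger, assuming record_seq is a plain INT column (not an identity) and the table has an id primary key (the trigger name and the id column are hypothetical):
CREATE TRIGGER trg_mytable_record_seq
ON dbo.mytable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Give each newly inserted row the next value after the current maximum;
    -- ROW_NUMBER handles multi-row inserts
    UPDATE t
    SET record_seq = base.max_seq + new_rows.rn
    FROM dbo.mytable AS t
    JOIN (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
          FROM inserted) AS new_rows
        ON new_rows.id = t.id
    CROSS JOIN (SELECT ISNULL(MAX(record_seq), 0) AS max_seq
                FROM dbo.mytable) AS base;
END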

Insert Records with Violations in SQL Server

I want to populate a particular table with 5000 records in the format below.
Insert into #Table
(c1,c2,c3,c4,c5)
Values
(1,2,3,4,5),
(2,2,3,4,5),
(3,2,3,4,5),
(4,2,3,4,5),
(5,2,3,4,5)
....
....
Up to 1000 rows
When I try to execute it, I get a foreign key violation. I know the reason: one of the values does not exist in its corresponding parent table.
There are a few records causing this violation. It's very hard to find those violating rows among the 1000 rows, so I want to insert at least the valid records into my target table, leaving the violating rows as they are for now.
I am not sure how to do this. Please suggest any ideas.
If this is a one-time thing, then you can do the following:
Drop the FK constraint
ALTER TABLE MyTAble
DROP CONSTRAINT FK_Constraint
GO
Execute INSERT
Find the records with no matching parent id.
SELECT * FROM MyTable MT WHERE NOT EXISTS (SELECT 1 FROM ParentTable PT WHERE MT.ParentId = PT.ID)
DELETE those records or do something else with them.
Recreate the FK constraint.
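For that last step, a sketch of recreating the constraint with validation (using the same hypothetical constraint, table, and column names as above):
ALTER TABLE MyTable
WITH CHECK
ADD CONSTRAINT FK_Constraint FOREIGN KEY (ParentId) REFERENCES ParentTable (ID)
GO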
Disable the foreign key or fix your data.
Finding the bad data is simple - you can always temporarily insert it into a buffer table and run queries to find which data is missing in the related table.
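A sketch of both options (the constraint, table, and column names here are hypothetical):
-- Option 1: temporarily disable the foreign key, load the data, then re-enable it
ALTER TABLE MyTable NOCHECK CONSTRAINT FK_Constraint;
-- ... run the big INSERT here ...
ALTER TABLE MyTable WITH CHECK CHECK CONSTRAINT FK_Constraint; -- fails if bad rows remain

-- Option 2: load everything into a buffer table first, then insert only the valid rows
CREATE TABLE #Buffer (c1 INT, c2 INT, c3 INT, c4 INT, c5 INT);
-- insert the 1000 VALUES rows into #Buffer instead of the real table, then:
INSERT INTO MyTable (c1, c2, c3, c4, c5)
SELECT b.c1, b.c2, b.c3, b.c4, b.c5
FROM #Buffer AS b
WHERE EXISTS (SELECT 1 FROM ParentTable p WHERE p.ID = b.c1); -- assuming c1 is the FK column

-- The offending rows are simply the ones that did not qualify:
SELECT * FROM #Buffer AS b
WHERE NOT EXISTS (SELECT 1 FROM ParentTable p WHERE p.ID = b.c1);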

SQL Server concurrency

I asked two questions at once in my last thread, and the first has been answered. I decided to mark the original thread as answered and repost the second question here. Link to original thread if anyone wants it:
Handling SQL Server concurrency issues
Suppose I have a table with a field which holds foreign keys for a second table. Initially records in the first table do not have a corresponding record in the second, so I store NULL in that field. Now at some point a user runs an operation which will generate a record in the second table and have the first table link to it. If two users simultaneously try to generate the record, a single record should be created and linked to, and the other user receives a message saying the record already exists. How do I ensure that duplicates are not created in a concurrent environment?
The steps I need to carry out are:
1) Look up x number of records in table A
2) Perform some business logic that prepares a single row which is inserted into table B
3) Update the records selected in step 1) to point to the newly created record in table B
I can use scope_identity() to retrieve the primary key of the newly created record in table B, so I don't need to worry about the new record being lost due to simultaneous transactions. However I need to eliminate the possibility of concurrently executing processes resulting in a duplicate record in table B being created.
In SQL Server 2008, this can be handled with a filtered unique index:
CREATE UNIQUE INDEX ix_MyIndexName ON MyTable (FKField) WHERE FkField IS NOT NULL
This will require all non-null values be unique, and the database will enforce it for you.
The 2005 way of simulating a unique filtered index for constraint purposes is
CREATE VIEW dbo.EnforceUnique
WITH SCHEMABINDING
AS
SELECT FkField
FROM dbo.TableB
WHERE FkField IS NOT NULL
GO
CREATE UNIQUE CLUSTERED INDEX ix ON dbo.EnforceUnique(FkField)
Connections that update the base table will need to have the correct SET options, but unless you are using non-default options this will be the case anyway in SQL Server 2005 (ARITHABORT used to be the problem one in 2000).
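For reference, a sketch of the SET options such sessions typically need (listed from memory; verify against the documentation for your version):
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;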
Using a computed column
ALTER TABLE MyTable ADD
OneNonNullOnly AS ISNULL(FkField, -PkField)
CREATE UNIQUE INDEX ix_OneNullOnly ON MyTable (OneNonNullOnly);
Assumes:
FkField is numeric
no clash of FkField and -PkField values
Decided to go with the following:
1) Begin transaction
2) UPDATE tableA SET foreignKey = -1 OUTPUT inserted.id INTO #tempTable
FROM (business logic)
WHERE foreignKey is null
3) If @@ROWCOUNT > 0 then
3a) Create record in table 2.
3b) Capture ID of newly created record using scope_identity()
3c) UPDATE tableA SET foreignKey = IdOfNewRecord FROM tableA INNER JOIN #tempTable ON tableA.id = #tempTable.id
Since I write junk into the foreign key field in step 2), those rows are locked and no concurrent transactions will touch them. The first transaction is free to create the record. After the transaction is committed, the blocked transaction will execute the update query, but won't capture any of the original rows because the WHERE clause only considers NULL foreignKey fields. If no rows are returned (@@ROWCOUNT = 0), the current transaction exits without creating the record in table B and returns some sort of error message to the client (e.g. "Error: Record already exists").
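Put together, a sketch of that flow (the table, column, and procedure names are hypothetical, and the business-logic filter is reduced to a placeholder comment):
CREATE PROCEDURE dbo.CreateLinkedRecord
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    DECLARE @claimed TABLE (id INT);

    -- Claim the rows: writing a junk value locks them, and the WHERE clause
    -- means a second concurrent caller will claim nothing
    UPDATE dbo.tableA
    SET foreignKey = -1
    OUTPUT inserted.id INTO @claimed
    WHERE foreignKey IS NULL;      -- plus the business-logic filter

    IF @@ROWCOUNT > 0
    BEGIN
        -- tableB's real columns aren't given in the question; someColumn is a stand-in
        INSERT INTO dbo.tableB (someColumn) VALUES ('some value');

        DECLARE @newId INT;
        SET @newId = SCOPE_IDENTITY();

        UPDATE a
        SET a.foreignKey = @newId
        FROM dbo.tableA AS a
        JOIN @claimed AS c ON c.id = a.id;
    END
    ELSE
    BEGIN
        RAISERROR('Record already exists', 16, 1);
    END

    COMMIT TRANSACTION;
END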

How to apply a unique key constraint on 2 columns in a table when the existing values cause an error

The CREATE UNIQUE INDEX statement terminated because a duplicate key was found
for the object name 'dbo.tblhm' and the index name 'New_id1'. The duplicate
key value is (45560, 44200).
I want to know how to create a unique key constraint on 2 columns taken together. The values previously stored in the database do not fit that constraint, which is why it is showing me the above error. How do I overcome that so the constraint can be added and no column value in the database gets deleted?
If I follow you correctly, you have a duplicate key which you want to ignore but still want to apply a unique constraint going forward? I don't think this is possible. Either you need to remove the duplicate row (or update it such that it is not a duplicate), move the duplicated data into an archive table without a unique index or add the index to the existing table without a unique constraint.
I stand to be corrected, but I don't think there is any other way round this.
Let's assume that you are creating a unique index on columns column1 and column2 in your table dbo.tblhm.
This assumes that no combination of column1, column2 values is repeated in any rows of dbo.tblhm.
As per your error, the combination (45560, 44200) of values for column1, column2 is present in more than 1 row, and hence the constraint fails.
What you need to do is clean up your data first, using an UPDATE statement to change the column1 or column2 values in the rows that are duplicates, BEFORE you try to create the constraint.
AFAIK, in Oracle you have the NOVALIDATE keyword, which can be used to achieve what you want without cleaning up the existing data. But at least I am not aware of any way to achieve that in SQL Server without first cleaning up the data.
The error means exactly what it says - there is more than one row with the same key.
i.e. for
CREATE UNIQUE INDEX New_id1 on dbo.tblhm(Column1, Column2)
there is more than one row with the same values for Column1 and Column2
So either
Your data is corrupt (e.g. rows were inserted without checking for duplicates) - you will need to find and merge/delete the duplicate keys before recreating the index (a sketch of this is below).
Or your index can't be unique (e.g. there is a valid business-level reason why there can be more than one row with this key).
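If the data is indeed corrupt, one common cleanup sketch uses ROW_NUMBER to keep one arbitrary row per key and delete the rest (assuming you don't care which duplicate survives):
WITH Dups AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Column1, Column2
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.tblhm
)
DELETE FROM Dups
WHERE rn > 1;

-- Now the unique index can be created
CREATE UNIQUE INDEX New_id1 ON dbo.tblhm (Column1, Column2);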

Changing table field to UNIQUE

I want to run the following sql command:
ALTER TABLE `my_table` ADD UNIQUE (
`ref_id` ,
`type`
);
The problem is that some of the data in the table would make this invalid, therefore altering the table fails.
Is there a clever way in MySQL to delete the duplicate rows?
SQL can, at best, handle this arbitrarily. To put it another way: this is your problem.
You have data that currently isn't unique. You want to make it unique. You need to decide how to handle the duplicates.
There are a variety of ways of handling this:
Modifying or deleting duplicate rows by hand if the numbers are sufficiently small;
Running statements to update or delete duplicates that meet certain criteria, to get to a point where the exceptions can be dealt with on an individual basis;
Copying the data to a temporary table, emptying the original and using queries to repopulate the table (a sketch of this is shown after this list); and
so on.
Note: these all require user intervention.
You could of course just copy the table to a temporary table, empty the original and copy in the rows just ignoring those that fail but I expect that won't give you the results that you really want.
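Picking up the copy-and-repopulate option, a rough MySQL sketch that keeps one arbitrary row per (ref_id, type) (this assumes my_table has an id primary key and nothing references it via foreign keys; the temporary table name is made up):
CREATE TABLE my_table_tmp LIKE my_table;

-- Keep only the row with the lowest id from each duplicate group
INSERT INTO my_table_tmp
SELECT t.*
FROM my_table t
JOIN (SELECT MIN(id) AS id FROM my_table GROUP BY ref_id, type) keep_rows
  ON keep_rows.id = t.id;

TRUNCATE TABLE my_table;
INSERT INTO my_table SELECT * FROM my_table_tmp;
DROP TABLE my_table_tmp;

ALTER TABLE my_table ADD UNIQUE (ref_id, type);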
If you don't care which row gets deleted, use IGNORE:
ALTER IGNORE TABLE `my_table` ADD UNIQUE (
`ref_id` ,
`type`
);
What you can do is add a temporary identity (auto-increment) column to your table. With that you can write a query to identify and delete the duplicates (you can tweak the query to make sure only one row from each set of duplicates is retained).
Once this is done, drop the temporary column and add the unique constraint to your original columns.
Hope this helps.
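A minimal sketch of that approach in MySQL, assuming my_table has no existing auto-increment column (tmp_id is a made-up name):
-- Add a temporary auto-increment column (it must be indexed in MySQL)
ALTER TABLE `my_table`
    ADD COLUMN `tmp_id` INT NOT NULL AUTO_INCREMENT,
    ADD KEY (`tmp_id`);

-- Delete every duplicate except the row with the lowest tmp_id in each group
DELETE t1
FROM `my_table` t1
JOIN `my_table` t2
  ON t1.`ref_id` = t2.`ref_id`
 AND t1.`type` = t2.`type`
 AND t1.`tmp_id` > t2.`tmp_id`;

-- Drop the helper column and add the unique constraint
ALTER TABLE `my_table` DROP COLUMN `tmp_id`;
ALTER TABLE `my_table` ADD UNIQUE (`ref_id`, `type`);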
What I've done in the past is export the unique set of data, drop the table, recreate it with the unique columns and import the data.
It is often faster than trying to figure out how to delete the duplicate data.
There is a good KB article that provides a step-by-step approach to finding and removing rows that have duplicate values. It provides two approaches - a one-off approach for finding and removing a single row and a broader solution to solving this when many rows are involved.
http://support.microsoft.com/kb/139444
Here is a snippet I used to delete duplicate rows in one of the tables
BEGIN TRANSACTION

SELECT *,
       RANK() OVER (PARTITION BY PolicyId, PlanSeqNum, BaseProductSeqNum,
                                 CoInsrTypeCd, SupplierTypeSeqNum
                    ORDER BY CoInsrAmt DESC) AS MyRank
INTO #tmpTable
FROM PlanCoInsr

SELECT DISTINCT PolicyId, PlanSeqNum, BaseProductSeqNum,
       SupplierTypeSeqNum, CoInsrTypeCd, CoInsrAmt
INTO #tmpTable2
FROM #tmpTable
WHERE MyRank = 1

TRUNCATE TABLE PlanCoInsr

INSERT INTO PlanCoInsr
SELECT * FROM #tmpTable2

DROP TABLE #tmpTable
DROP TABLE #tmpTable2

COMMIT
This worked for me:
ALTER TABLE table_name ADD UNIQUE KEY field_name (field_name)
You will have to find some other field that is unique because deleting on ref_id and type alone will delete them all.
To get the duplicates:
select ref_id, type from my_table group by ref_id, type having count(*)>1
Xaprb has some clever tricks (maybe too clever): http://www.xaprb.com/blog/2007/02/06/how-to-delete-duplicate-rows-with-sql-part-2/