Bulk Insert Only New Rows - sql

I have a group of CSV files that I'm using to populate a SQL database. I'm setting this up to be a daily process. I've just discovered, though, that a handful of the files come in each day with all the historical data, not just the daily updates.
When I try to do the bulk insert this causes an error because the primary key is being violated.
I thought that setting IGNORE_DUP_KEY = ON would stop the non-unique records from being inserted.
CREATE TABLE TABLE1
(
Column1 varchar(32),
Column2 varchar(32),
Column3 char(9),
Column4 int,
Column5 float(53),
Column6 date,
CONSTRAINT pk_One
PRIMARY KEY (Column1, Column2, Column6)
WITH (IGNORE_DUP_KEY = ON)
);
Then when I try to run the script that updates the tables I get the unhelpful error message "Associated statement is not prepared (0)."
I could get around this by writing the results into a separate table, then writing the new rows into the table proper, but having separate handling for the different tables strikes me as painful, and ugly.
Is there an easy way to just tell SQL to only write the rows that don't violate the primary key constraint?

INSERT INTO table1 (column1) VALUES (value1)
... So, if you just want to ignore inserts that fail, use INSERT IGNORE INTO (this is MySQL syntax). It will try to insert the values, skip the rows that fail, and move on gracefully. If you want to replace the existing values with the new ones when the key already exists, use ON DUPLICATE KEY UPDATE (IGNORE is optional below, depending on whether you want errors to be thrown):
INSERT [IGNORE] INTO table1 (column1) VALUES (value1) ON DUPLICATE KEY UPDATE column1 = VALUES(column1)
So "INSERT IGNORE" is the answer, applied on the inserts rather than at table creation. Refer to the MySQL docs; ON DUPLICATE KEY UPDATE is also useful to know.
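For SQL Server, which is what IGNORE_DUP_KEY in the question implies, the staging-table route the asker mentions might look roughly like the sketch below. The staging table name, the file path, and the BULK INSERT options (comma-separated fields, one header row) are assumptions for illustration, not details from the question.

```sql
-- Hypothetical staging table: same columns as TABLE1, but no primary key,
-- so the bulk load itself can never fail on duplicates.
CREATE TABLE TABLE1_Staging
(
    Column1 varchar(32),
    Column2 varchar(32),
    Column3 char(9),
    Column4 int,
    Column5 float(53),
    Column6 date
);

-- Load the whole file, history included (path and options are assumptions).
BULK INSERT TABLE1_Staging
FROM 'C:\data\table1.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Copy across only the rows whose key is not already in TABLE1.
INSERT INTO TABLE1 (Column1, Column2, Column3, Column4, Column5, Column6)
SELECT s.Column1, s.Column2, s.Column3, s.Column4, s.Column5, s.Column6
FROM TABLE1_Staging AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM TABLE1 AS t
    WHERE t.Column1 = s.Column1
      AND t.Column2 = s.Column2
      AND t.Column6 = s.Column6
);

-- Empty the staging table for the next daily run.
TRUNCATE TABLE TABLE1_Staging;
```

Because the same three statements work for any table once the column list and key columns are swapped in, the per-table handling can stay fairly uniform. If a single file can contain the same key twice, the NOT EXISTS filter alone will not catch that; the IGNORE_DUP_KEY = ON setting on the primary key would then discard the extra copies with a warning.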

Related

Violation of UNIQUE KEY constraint '...'. Cannot insert duplicate key in object 'dbo.Cliente'. The duplicate key value is (<NULL>) [duplicate]

I want to have a unique constraint on a column which I am going to populate with GUIDs. However, my data contains null values for this column. How do I create the constraint that allows multiple null values?
Here's an example scenario. Consider this schema:
CREATE TABLE People (
Id INT CONSTRAINT PK_MyTable PRIMARY KEY IDENTITY,
Name NVARCHAR(250) NOT NULL,
LibraryCardId UNIQUEIDENTIFIER NULL,
CONSTRAINT UQ_People_LibraryCardId UNIQUE (LibraryCardId)
)
Then see this code for what I'm trying to achieve:
-- This works fine:
INSERT INTO People (Name, LibraryCardId)
VALUES ('John Doe', 'AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA');
-- This also works fine, obviously:
INSERT INTO People (Name, LibraryCardId)
VALUES ('Marie Doe', 'BBBBBBBB-BBBB-BBBB-BBBB-BBBBBBBBBBBB');
-- This would *correctly* fail:
--INSERT INTO People (Name, LibraryCardId)
--VALUES ('John Doe the Second', 'AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA');
-- This works fine this one first time:
INSERT INTO People (Name, LibraryCardId)
VALUES ('Richard Roe', NULL);
-- THE PROBLEM: This fails even though I'd like to be able to do this:
INSERT INTO People (Name, LibraryCardId)
VALUES ('Marcus Roe', NULL);
The final statement fails with a message:
Violation of UNIQUE KEY constraint 'UQ_People_LibraryCardId'. Cannot insert duplicate key in object 'dbo.People'.
How can I change my schema and/or uniqueness constraint so that it allows multiple NULL values, while still checking for uniqueness on actual data?
What you're looking for is indeed part of the ANSI standards SQL:92, SQL:1999 and SQL:2003, ie a UNIQUE constraint must disallow duplicate non-NULL values but accept multiple NULL values.
In the Microsoft world of SQL Server however, a single NULL is allowed but multiple NULLs are not...
In SQL Server 2008, you can define a unique filtered index based on a predicate that excludes NULLs:
CREATE UNIQUE NONCLUSTERED INDEX idx_yourcolumn_notnull
ON YourTable(yourcolumn)
WHERE yourcolumn IS NOT NULL;
In earlier versions, you can resort to VIEWS with a NOT NULL predicate to enforce the constraint.
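Applied to the People table from the question, the filtered-index approach might look like the following sketch. The index name is made up for illustration; the constraint name is the one from the question's schema.

```sql
-- Remove the plain UNIQUE constraint, which treats NULLs as duplicates of each other.
ALTER TABLE People DROP CONSTRAINT UQ_People_LibraryCardId;

-- Enforce uniqueness only over rows that actually have a card id.
-- (UX_People_LibraryCardId_NotNull is a hypothetical name.)
CREATE UNIQUE NONCLUSTERED INDEX UX_People_LibraryCardId_NotNull
ON People(LibraryCardId)
WHERE LibraryCardId IS NOT NULL;

-- Both NULL inserts are now accepted:
INSERT INTO People (Name, LibraryCardId) VALUES ('Richard Roe', NULL);
INSERT INTO People (Name, LibraryCardId) VALUES ('Marcus Roe', NULL);

-- A repeated non-NULL value is still rejected:
-- INSERT INTO People (Name, LibraryCardId)
-- VALUES ('John Doe the Second', 'AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA');
```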
SQL Server 2008 +
You can create a unique index that accepts multiple NULLs with a WHERE clause. See the answer below.
Prior to SQL Server 2008
You cannot create a UNIQUE constraint and allow multiple NULLs. You need to set a default value of NEWID().
Update the existing values to NEWID() where NULL before creating the UNIQUE constraint.
SQL Server 2008 And Up
Just filter a unique index:
CREATE UNIQUE NONCLUSTERED INDEX UQ_Party_SamAccountName
ON dbo.Party(SamAccountName)
WHERE SamAccountName IS NOT NULL;
In Lower Versions, A Materialized View Is Still Not Required
For SQL Server 2005 and earlier, you can do it without a view. I just added a unique constraint like you're asking for to one of my tables. Given that I want uniqueness in column SamAccountName, but I want to allow multiple NULLs, I used a materialized column rather than a materialized view:
ALTER TABLE dbo.Party ADD SamAccountNameUnique
AS (Coalesce(SamAccountName, Convert(varchar(11), PartyID)))
ALTER TABLE dbo.Party ADD CONSTRAINT UQ_Party_SamAccountName
UNIQUE (SamAccountNameUnique)
You simply have to put something in the computed column that will be guaranteed unique across the whole table when the actual desired unique column is NULL. In this case, PartyID is an identity column and being numeric will never match any SamAccountName, so it worked for me. You can try your own method—be sure you understand the domain of your data so that there is no possibility of intersection with real data. That could be as simple as prepending a differentiator character like this:
Coalesce('n' + SamAccountName, 'p' + Convert(varchar(11), PartyID))
Even if PartyID became non-numeric someday and could coincide with a SamAccountName, now it won't matter.
Note that the presence of an index including the computed column implicitly causes each expression result to be saved to disk with the other data in the table, which DOES take additional disk space.
Note that if you don't want an index, you can still save CPU by making the expression be precalculated to disk by adding the keyword PERSISTED to the end of the column expression definition.
In SQL Server 2008 and up, definitely use the filtered solution instead if you possibly can!
Controversy
Please note that some database professionals will see this as a case of "surrogate NULLs", which definitely have problems (mostly due to issues around trying to determine when something is a real value or a surrogate value for missing data; there can also be issues with the number of non-NULL surrogate values multiplying like crazy).
However, I believe this case is different. The computed column I'm adding will never be used to determine anything. It has no meaning of itself, and encodes no information that isn't already found separately in other, properly defined columns. It should never be selected or used.
So, my story is that this is not a surrogate NULL, and I'm sticking to it! Since we don't actually want the non-NULL value for any purpose other than to trick the UNIQUE index to ignore NULLs, our use case has none of the problems that arise with normal surrogate NULL creation.
All that said, I have no problem with using an indexed view instead, but it brings some issues with it such as the requirement of using SCHEMABINDING. Have fun adding a new column to your base table (you'll at minimum have to drop the index, and then drop the view or alter the view to not be schema bound). See the full (long) list of requirements for creating an indexed view in the SQL Server documentation.
Update
If your column is numeric, there may be the challenge of ensuring that the unique constraint using Coalesce does not result in collisions. In that case, there are some options. One might be to use a negative number, to put the "surrogate NULLs" only in the negative range, and the "real values" only in the positive range. Alternately, the following pattern could be used. In table Issue (where IssueID is the PRIMARY KEY), there may or may not be a TicketID, but if there is one, it must be unique.
ALTER TABLE dbo.Issue ADD TicketUnique
AS (CASE WHEN TicketID IS NULL THEN IssueID END);
ALTER TABLE dbo.Issue ADD CONSTRAINT UQ_Issue_Ticket_AllowNull
UNIQUE (TicketID, TicketUnique);
If IssueID 1 has ticket 123, the UNIQUE constraint will be on values (123, NULL). If IssueID 2 has no ticket, it will be on (NULL, 2). Some thought will show that this constraint cannot be duplicated for any row in the table, and still allows multiple NULLs.
For people who are using SQL Server Management Studio and want to create a unique but nullable index: create your unique index as you normally would, then in the Index Properties for your new index select "Filter" from the left-hand panel and enter your filter (which is your WHERE clause). It should read something like this:
([YourColumnName] IS NOT NULL)
This works with MSSQL 2012
When I applied the unique index below:
CREATE UNIQUE NONCLUSTERED INDEX idx_badgeid_notnull
ON employee(badgeid)
WHERE badgeid IS NOT NULL;
every non null update and insert failed with the error below:
UPDATE failed because the following SET options have incorrect settings: 'ARITHABORT'.
I found this on MSDN
SET ARITHABORT must be ON when you are creating or changing indexes on computed columns or indexed views. If SET ARITHABORT is OFF, CREATE, UPDATE, INSERT, and DELETE statements on tables with indexes on computed columns or indexed views will fail.
So to get this to work correctly I did this
Right click [Database]-->Properties-->Options-->Other Options-->Miscellaneous-->Arithmetic Abort Enabled-->True
I believe it is possible to set this option in code using
ALTER DATABASE "DBNAME" SET ARITHABORT ON
but I have not tested this.
It can be done in the designer as well: right-click the index and choose Properties to open the index properties window.
Create a view that selects only the rows where the column is not NULL and create the UNIQUE INDEX on the view:
CREATE VIEW myview
AS
SELECT *
FROM mytable
WHERE mycolumn IS NOT NULL
CREATE UNIQUE INDEX ux_myview_mycolumn ON myview (mycolumn)
Note that you'll need to perform INSERTs and UPDATEs on the view instead of the table.
You may do it with an INSTEAD OF trigger:
CREATE TRIGGER trg_mytable_insert ON mytable
INSTEAD OF INSERT
AS
BEGIN
INSERT
INTO myview
SELECT *
FROM inserted
END
It is possible to create a unique constraint on a Clustered Indexed View
You can create the View like this:
CREATE VIEW dbo.VIEW_OfYourTable WITH SCHEMABINDING AS
SELECT YourUniqueColumnWithNullValues FROM dbo.YourTable
WHERE YourUniqueColumnWithNullValues IS NOT NULL;
and the unique constraint like this:
CREATE UNIQUE CLUSTERED INDEX UIX_VIEW_OFYOURTABLE
ON dbo.VIEW_OfYourTable(YourUniqueColumnWithNullValues)
In my experience - if you're thinking a column needs to allow NULLs but also needs to be UNIQUE for values where they exist, you may be modelling the data incorrectly. This often suggests you're creating a separate sub-entity within the same table as a different entity. It probably makes more sense to have this entity in a second table.
In the provided example, I would put LibraryCardId in a separate LibraryCards table with a unique not-null foreign key to the People table:
CREATE TABLE People (
Id INT CONSTRAINT PK_MyTable PRIMARY KEY IDENTITY,
Name NVARCHAR(250) NOT NULL
)
CREATE TABLE LibraryCards (
LibraryCardId UNIQUEIDENTIFIER CONSTRAINT PK_LibraryCards PRIMARY KEY,
PersonId INT NOT NULL,
CONSTRAINT UQ_LibraryCardId_PersonId UNIQUE (PersonId),
FOREIGN KEY (PersonId) REFERENCES People(Id)
)
This way you don't need to bother with a column being both unique and nullable. If a person doesn't have a library card, they just won't have a record in the library cards table. Also, if there are additional attributes about the library card (perhaps Expiration Date or something), you now have a logical place to put those fields.
Maybe consider an "INSTEAD OF" trigger and do the check yourself? With a non-clustered (non-unique) index on the column to enable the lookup.
As stated before, SQL Server doesn't implement the ANSI standard when it comes to UNIQUE CONSTRAINT. There is a ticket on Microsoft Connect for this since 2007. As suggested there and here the best options as of today are to use a filtered index as stated in another answer or a computed column, e.g.:
CREATE TABLE [Orders] (
[OrderId] INT IDENTITY(1,1) NOT NULL,
[TrackingId] varchar(11) NULL,
...
[ComputedUniqueTrackingId] AS (
CASE WHEN [TrackingId] IS NULL
THEN '#' + cast([OrderId] as varchar(12))
ELSE [TrackingId] END
),
CONSTRAINT [UQ_TrackingId] UNIQUE ([ComputedUniqueTrackingId])
)
You can create an INSTEAD OF trigger to check for specific conditions and error if they are met. Creating an index can be costly on larger tables.
Here's an example:
CREATE TRIGGER PONY.trg_pony_unique_name ON PONY.tbl_pony
INSTEAD OF INSERT, UPDATE
AS
BEGIN
IF EXISTS(
SELECT TOP (1) 1
FROM inserted i
GROUP BY i.pony_name
HAVING COUNT(1) > 1
)
OR EXISTS(
SELECT TOP (1) 1
FROM PONY.tbl_pony t
INNER JOIN inserted i
ON i.pony_name = t.pony_name
)
THROW 911911, 'A pony must have a name as unique as s/he is. --PAS', 16;
ELSE
INSERT INTO PONY.tbl_pony (pony_name, stable_id, pet_human_id)
SELECT pony_name, stable_id, pet_human_id
FROM inserted
END
You can't do this with a UNIQUE constraint, but you can do this in a trigger.
CREATE TRIGGER [dbo].[OnInsertMyTableTrigger]
ON [dbo].[MyTable]
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Column1 INT;
DECLARE @Column2 INT; -- allow nulls on this column
SELECT @Column1=Column1, @Column2=Column2 FROM inserted;
-- Check if an existing record already exists, if not allow the insert.
IF NOT EXISTS(SELECT * FROM dbo.MyTable WHERE Column1=@Column1 AND Column2=@Column2 AND @Column2 IS NOT NULL)
BEGIN
INSERT INTO dbo.MyTable (Column1, Column2)
SELECT @Column1, @Column2;
END
ELSE
BEGIN
RAISERROR('The unique constraint applies on Column1 %d, AND Column2 %d, unless Column2 is NULL.', 16, 1, @Column1, @Column2);
ROLLBACK TRANSACTION;
END
END
CREATE UNIQUE NONCLUSTERED INDEX [UIX_COLUMN_NAME]
ON [dbo].[Employee]([Username] ASC) WHERE ([Username] IS NOT NULL)
WITH (ALLOW_PAGE_LOCKS = ON, ALLOW_ROW_LOCKS = ON, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF, ONLINE = OFF,
MAXDOP = 0) ON [PRIMARY];
Use this if you have a registration form with a text box, the insert runs with the text box left empty, and the user clicks the submit button. It excludes empty strings from the uniqueness check:
CREATE UNIQUE NONCLUSTERED INDEX [IX_tableName_Column]
ON [dbo].[tableName]([columnName] ASC) WHERE [columnName] != '';

Avoid inserting duplicates when using autoincrementing index

I have a query:
INSERT INTO tweet_hashtags(hashtag_id, tweet_id)
VALUES(1, 1)
ON CONFLICT DO NOTHING
RETURNING id
which works fine and inserts with id = 1, but when there is a duplicate, let's say another (1, 1), it inserts with id = 2. I want to prevent this from happening. I read that I can do ON CONFLICT (col_name), but that doesn't really help because I need to check for two values at a time.
The on conflict clause requires a unique constraint or index on the set of columns that you want to be unique - and it looks like you don't have that in place.
You can set it when you create the table:
create table tweet_hashtags(
id serial primary key,
hashtag_id int,
tweet_id int,
unique (hashtag_id, tweet_id)
);
Or, if the table already exists, you can create a unique index (but you need to get rid of the duplicates first):
create unique index idx_tweet_hashtags on tweet_hashtags(hashtag_id, tweet_id);
Then your query should just work:
insert into tweet_hashtags(hashtag_id, tweet_id)
values(1, 1)
on conflict (hashtag_id, tweet_id) do nothing
returning id
Specifying the conflict target makes the intent clearer and should be generally preferred (although it is not mandatory with do nothing).
Note that the query returns nothing when the insert is skipped (that is, the existing id is not returned).
Here is a demo on DB Fiddle that demonstrates the behavior with and without the unique index.
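If the caller needs the id whether or not the row was new, one commonly used pattern is to wrap the insert in a CTE and fall back to a lookup. This is only a sketch, and it assumes the unique constraint on (hashtag_id, tweet_id) described above is in place:

```sql
with ins as (
    -- Returns a row only when the insert actually happened.
    insert into tweet_hashtags(hashtag_id, tweet_id)
    values (1, 1)
    on conflict (hashtag_id, tweet_id) do nothing
    returning id
)
select id from ins
union all
-- Fallback: the row that was already there before this statement ran.
select id
from tweet_hashtags
where hashtag_id = 1 and tweet_id = 1
limit 1;
```

The second branch does not see the row inserted by the first, since both run against the same snapshot, so at most one id comes back. Under concurrent writers it is possible for both branches to come back empty (another transaction inserted the pair but has not committed yet), so a retry may be needed in that case.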

How to ignore duplicate Primary Key in SQL?

I have an Excel sheet with several values which I imported into SQL (book1$) and I want to transfer the values into ProcessList. Several rows have the same primary key, which is the ProcessID, because the rows contain original and modified values, both of which I want to keep. How do I make SQL ignore the duplicate primary keys?
I tried IGNORE_DUP_KEY = ON, but for rows with a duplicated primary key only one row (the latest) shows up.
CREATE TABLE dbo.ProcessList
(
Edited varchar(1),
ProcessId int NOT NULL PRIMARY KEY WITH (IGNORE_DUP_KEY = ON),
Name varchar(30) NOT NULL,
Amount smallmoney NOT NULL,
CreationDate datetime NOT NULL,
ModificationDate datetime
)
INSERT INTO ProcessList SELECT Edited, ProcessId, Name, Amount, CreationDate, ModificationDate FROM Book1$
SELECT * FROM ProcessList
Also, if I have a row and I update the values of that row, is there any way to keep the original values of the row and insert a clone of that row below, with the updated values and creation/modification date updated automatically?
How do I make SQL ignore the duplicate primary keys?
Under no circumstances can a transaction be committed that results in a table containing two distinct rows with the same primary key. That is fundamental to the nature of a primary key. SQL Server's IGNORE_DUP_KEY option does not change that -- it merely affects how SQL Server handles the problem. (With the option turned on it silently refuses to insert rows having the same primary key as any existing row; otherwise, such an insertion attempt causes an error.)
You can address the situation either by dropping the primary key constraint or by adding one or more columns to the primary key to yield a composite key whose collective value is not duplicated. I don't see any good candidate columns for an expanded PK among those you described, though. If you drop the PK then it might make sense to add a synthetic, autogenerated PK column.
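A minimal sketch of the second option, with a synthetic autogenerated key and ProcessId demoted to an ordinary indexed column. The ProcessListId column and the constraint and index names are made up for illustration; the other columns are the ones from the question.

```sql
CREATE TABLE dbo.ProcessList
(
    -- Hypothetical surrogate key: unique per row, generated by SQL Server.
    ProcessListId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ProcessList PRIMARY KEY,
    Edited varchar(1),
    ProcessId int NOT NULL,          -- no longer unique, so originals and edits can coexist
    Name varchar(30) NOT NULL,
    Amount smallmoney NOT NULL,
    CreationDate datetime NOT NULL,
    ModificationDate datetime
);

-- Non-unique index so lookups by ProcessId stay cheap.
CREATE INDEX IX_ProcessList_ProcessId ON dbo.ProcessList (ProcessId);

-- The import then keeps every row from the sheet.
INSERT INTO dbo.ProcessList (Edited, ProcessId, Name, Amount, CreationDate, ModificationDate)
SELECT Edited, ProcessId, Name, Amount, CreationDate, ModificationDate
FROM Book1$;
```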
Also, if I have a row and I update the values of that row, is there any way to keep the original values of the row and insert a clone of that row below, with the updated values and creation/modification date updated automatically?
If you want to ensure that this happens automatically, however a row happens to be updated, then look into triggers. If you want a way to automate it, but you're willing to make the user ask for the behavior, then consider a stored procedure.
Try this (note that INSERT IGNORE is MySQL syntax):
INSERT IGNORE INTO ProcessList SELECT Edited, ProcessId, Name, Amount, CreationDate, ModificationDate FROM Book1$
SELECT * FROM ProcessList
You drop the constraint. Something like this:
alter table dbo.ProcessList drop constraint PK_ProcessId;
You need to know the constraint name.
In other words, you can't ignore a primary key. It is defined as unique and not-null. If you want the table to have duplicates, then that is not the primary key.

How to force duplicate key insert mssql

I know this sounds crazy (and if I had designed the database I would have done it differently), but I actually want to force a duplicate key on an insert. I'm working with a database that was designed to have 'not null' PK columns that hold the same value in every row. The record-keeping software I'm working with is somehow able to insert duplicates into these columns for every one of its records. I need to copy data from a column in another table into one column in this one. Normally I would just insert into that column only, but the PK columns are set to 'not null', so I have to put something in them, and the way the table is set up that something has to be the same thing for every record. This should be impossible, but the company that made the record-keeping software made it work somehow. Does anyone know how this could be done?
P.S. I know this is normally not a good idea at all. So please just include suggestions for how this could be done regardless of how crazy it is. Thank you.
A SQL Server primary key has to be unique and NOT NULL. So, the column you're seeing duplicate data in cannot be the primary key on its own. As urlreader suggests, it must be part of a composite primary key with one or more other columns.
How to tell what columns make up the primary key on a table: In Enterprise Manager, expand the table and then expand Columns. The primary key columns will have a "key" symbol next to them. You'll also see "PK" in the column description after, like this:
MyFirstIDColumn (PK, int, not null)
MySecondIDColumn (PK, int, not null)
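As an alternative to the object tree, the primary key columns can also be listed with a query against the catalog views. This is a sketch; dbo.MyTable stands in for the real table name.

```sql
-- List the columns of the primary key on dbo.MyTable, in key order.
SELECT c.name AS column_name,
       ic.key_ordinal
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.is_primary_key = 1
  AND i.object_id = OBJECT_ID(N'dbo.MyTable')
ORDER BY ic.key_ordinal;
```

If this returns more than one row, the key is composite, and only the combination of those columns has to be unique.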
Once you know which columns make up the primary key, simply ensure that you are inserting a combination of unique data into the columns. So, for my sample table above, that would be:
INSERT INTO MyTable (MyFirstIDColumn, MySecondIDColumn) VALUES (1,1) --SUCCEED
INSERT INTO MyTable (MyFirstIDColumn, MySecondIDColumn) VALUES (1,2) --SUCCEED
INSERT INTO MyTable (MyFirstIDColumn, MySecondIDColumn) VALUES (1,1) --FAIL because of duplicate (1,1)
INSERT INTO MyTable (MyFirstIDColumn, MySecondIDColumn) VALUES (1,3) --SUCCEED
More on primary keys:
http://msdn.microsoft.com/en-us/library/ms191236%28v=sql.105%29.aspx

renumber primary key

How would I reset the primary key counter on a sql table and update each row with a new primary key?
I would add another column to the table first, populate that with the new PK.
Then I'd use update statements to update the new fk fields in all related tables.
Then you can drop the old PK and old fk fields.
EDIT: Yes, as Ian says you will have to drop and then recreate all foreign key constraints.
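On SQL Server, the first two steps might be sketched as below. All names are hypothetical: dbo.MyTable has the old key in Id, and dbo.ChildTable references it through MyTableId.

```sql
-- Step 1: add the new key column and number the rows 1..N in old-key order.
ALTER TABLE dbo.MyTable ADD NewId int NULL;
GO

WITH numbered AS (
    SELECT NewId,
           ROW_NUMBER() OVER (ORDER BY Id) AS rn
    FROM dbo.MyTable
)
UPDATE numbered
SET NewId = rn;
GO

-- Step 2: give the related table a matching column and fill it via the old key.
ALTER TABLE dbo.ChildTable ADD NewMyTableId int NULL;
GO

UPDATE c
SET c.NewMyTableId = p.NewId
FROM dbo.ChildTable AS c
JOIN dbo.MyTable AS p
    ON p.Id = c.MyTableId;
```

The GO separators are there because a column added by ALTER TABLE cannot be referenced in the same batch. Dropping the old key columns and recreating the primary and foreign key constraints on the new ones is left out of the sketch.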
Not sure which DBMS you're using but if it happens to be SQL Server:
SET IDENTITY_INSERT [MyTable] ON
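allows you to insert explicit values into the identity primary key column (identity values cannot be changed with UPDATE, so rows being renumbered have to be deleted and re-inserted). Then when you are done with the keys (you could use a CURSOR for this if the logic is complicated)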
SET IDENTITY_INSERT [MyTable] OFF
Hope that helps!
This may or may not be MS SQL specific, but:
TRUNCATE TABLE resets the identity counter, so one quick and dirty way to do this would be to:
1) Do a backup
2) Copy the table contents to a temp table
3) Truncate the table
4) Copy the temp table contents back to the table (which has the identity column):
SELECT Field1, Field2 INTO #MyTable FROM MyTable
TRUNCATE TABLE MyTable
INSERT INTO MyTable
(Field1, Field2)
SELECT Field1, Field2 FROM #MyTable
SELECT * FROM MyTable
-----------------------------------
ID Field1 Field2
1 Value1 Value2
Why would you even bother? The whole point of counter-based "identity" primary keys is that the numbers are arbitrary and meaningless.
you could do it in the following steps:
create copy of yourTable with extra column new_key
populate copyOfYourTable with the affected rows from yourTable along with desired values of new_key
temporarily disable constraints
update all related tables to point to the value of new_key instead of the old_key
delete affected rows from yourTable
SET IDENTITY_INSERT [yourTable] ON
insert affected rows again with the new proper value of the key (from copy table)
SET IDENTITY_INSERT [yourTable] OFF
reseed identity
re-enable constraints
delete the copyOfYourtable
But as others said all that work is not needed.
I tend to look at identity-type primary keys as the equivalent of pointers in C: I use them to reference other objects but never modify or access them explicitly.
If this is Microsoft's SQL Server, one thing you could do is use the [dbcc checkident](http://msdn.microsoft.com/en-us/library/ms176057(SQL.90).aspx) command.
Assume you have a single table that you want to move around data within along with renumbering the primary keys. For the example, the name of the table is ErrorCode. It has two fields, ErrorCodeID (which is the primary key) and a Description.
Example Code Using dbcc checkident
-- Reset the primary key counter
dbcc checkident(ErrorCode, reseed, 7000)
-- Copy all rows with an ID of 8000 or greater into the 7000 range
insert into ErrorCode (Description)
select Description from ErrorCode where ErrorCodeID >= 8000
-- Delete the old rows
delete ErrorCode where ErrorCodeID >= 8000
-- Reset the primary key counter
dbcc checkident(ErrorCode, reseed, 8000)
With this example, you'll effectively be moving all rows to a different primary key and then reseeding so that later inserts continue from the 8000 range.
Hope this helps a bit!