Primary key violation while merging data from another table


I have two tables, TBTC03 and TBTC03Y; TBTC03Y has two extra columns, EFFDTE and EXPDTE. I have to merge the data from TBTC03 into TBTC03Y with the following logic:
If no matching TC03 entry is found in TC03Y:
a new TC03Y record is built with the TC03 data
the Effective Date will default to '01-01-1980'
the Expiration Date will default to '09-30-1995'
I wrote the following query:
insert into TBTC03Y (LOB,MAJPERIL,LOSSCAUSE,NUMERICCL,EFFDTE,EXPDTE)
select LOB,MAJPERIL,LOSSCAUSE,NUMERICCL,'0800101' ,'0950930'
from TBTC03 where not EXISTS (select * from TBTC03Y where
TBTC03Y.LOB = TBTC03.LOB AND
TBTC03Y.MAJPERIL = TBTC03.MAJPERIL AND
TBTC03Y.LOSSCAUSE = TBTC03.LOSSCAUSE AND
TBTC03Y.NUMERICCL = TBTC03.NUMERICCL)
The primary key for both the tables is LOB, MAJPERIL and LOSSCAUSE.
However, some TBTC03Y records already exist with the same primary key values.
Running the above query raises primary key constraint violations on some of the rows.
I can't figure out how to accomplish this.

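The primary key violation happens because you're also including NUMERICCL in the WHERE clause of the NOT EXISTS subquery. A TBTC03 row whose key already exists in TBTC03Y with a different NUMERICCL passes the check and is then inserted with a duplicate key. If you remove that condition, you will only insert rows whose key is not already present.
You may have to create a separate process, as it appears you have some records in each table that have the same LOB, MAJPERIL and LOSSCAUSE but a different NUMERICCL. I can think of three options here: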
You have an issue with the data that needs fixing.
Maybe you want to update this value to match, in which case you're looking at an UPDATE rather than INSERT INTO.
You need to update your composite primary key to include the column NUMERICCL.
Removing NUMERICCL from the where clause would also correct this.

If the PK for both tables is {LOB, MAJPERIL, LOSSCAUSE}, you should remove TBTC03Y.NUMERICCL = TBTC03.NUMERICCL from your where clause.
Example:
t1{LOB, MAJPERIL, LOSSCAUSE, NUMERICCL}
1 1 1 1
t2{LOB, MAJPERIL, LOSSCAUSE, NUMERICCL}
1 1 1 2
In t2 there is no row where:
TBTC03Y.LOB = TBTC03.LOB AND
TBTC03Y.MAJPERIL = TBTC03.MAJPERIL AND
TBTC03Y.LOSSCAUSE = TBTC03.LOSSCAUSE AND
TBTC03Y.NUMERICCL = TBTC03.NUMERICCL
But inserting the row will obviously violate the PK constraint in t2, which already contains that key:
t2{LOB, MAJPERIL, LOSSCAUSE}
1 1 1
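A quick way to see the difference is to run both versions of the query against a throwaway database. The sketch below uses Python's built-in sqlite3 module as a stand-in for the real DBMS. The tables are hypothetical, cut-down versions of TBTC03/TBTC03Y, keyed on (LOB, MAJPERIL, LOSSCAUSE) as in the question. It relies on SQLite's default behaviour of aborting the whole INSERT ... SELECT statement on the first key violation.

```python
import sqlite3

# Hypothetical, cut-down versions of the two tables; the PK is (LOB, MAJPERIL, LOSSCAUSE).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBTC03 (LOB TEXT, MAJPERIL TEXT, LOSSCAUSE TEXT, NUMERICCL TEXT,
                     PRIMARY KEY (LOB, MAJPERIL, LOSSCAUSE));
CREATE TABLE TBTC03Y (LOB TEXT, MAJPERIL TEXT, LOSSCAUSE TEXT, NUMERICCL TEXT,
                      EFFDTE TEXT, EXPDTE TEXT,
                      PRIMARY KEY (LOB, MAJPERIL, LOSSCAUSE));
INSERT INTO TBTC03 VALUES ('1', '1', '1', '1'), ('2', '2', '2', '2');
-- Same key as the first TBTC03 row, but a different NUMERICCL.
INSERT INTO TBTC03Y VALUES ('1', '1', '1', '2', '0800101', '0950930');
""")

MERGE_SQL = """
INSERT INTO TBTC03Y (LOB, MAJPERIL, LOSSCAUSE, NUMERICCL, EFFDTE, EXPDTE)
SELECT LOB, MAJPERIL, LOSSCAUSE, NUMERICCL, '0800101', '0950930'
FROM TBTC03
WHERE NOT EXISTS (SELECT 1 FROM TBTC03Y
                  WHERE TBTC03Y.LOB = TBTC03.LOB
                    AND TBTC03Y.MAJPERIL = TBTC03.MAJPERIL
                    AND TBTC03Y.LOSSCAUSE = TBTC03.LOSSCAUSE
                    {extra})
"""

# Original query: also matching on NUMERICCL lets the ('1','1','1') row through to the INSERT.
try:
    conn.execute(MERGE_SQL.format(extra="AND TBTC03Y.NUMERICCL = TBTC03.NUMERICCL"))
    original_failed = False
except sqlite3.IntegrityError:
    original_failed = True

# Corrected query: match on the primary key columns only.
conn.execute(MERGE_SQL.format(extra=""))
rows = conn.execute("SELECT LOB, NUMERICCL FROM TBTC03Y ORDER BY LOB").fetchall()
print(original_failed, rows)  # True [('1', '2'), ('2', '2')]
```

The existing TBTC03Y row keeps its NUMERICCL. Whether that is the right outcome is exactly the data question raised in the options above.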

Related

SQL/DB2 SQLSTATE=23505 error when executing an UPDATE statement

I am getting a SQLSTATE=23505 error when I execute the following DB2 statement:
update SEOURLKEYWORD
set URLKEYWORD = REPLACE(URLKEYWORD, '/', '-')
where STOREENT_ID = 10701
and URLKEYWORD like '%/%';
After a quick search, I found that SQLSTATE 23505 is defined as follows:
AN INSERTED OR UPDATED VALUE IS INVALID BECAUSE THE INDEX IN INDEX SPACE CONSTRAINS COLUMNS OF THE TABLE SO NO TWO ROWS CAN CONTAIN DUPLICATE VALUES IN THOSE COLUMNS RID OF EXISTING ROW IS X
The full error I am seeing is:
DB2 Database Error: ERROR [23505] [IBM][DB2/LINUXX8664] SQL0803N One or more values in the INSERT statement, UPDATE statement, or foreign key update caused by a DELETE statement are not valid because the primary key, unique constraint or unique index identified by "2" constrains table "WSCOMUSR.SEOURLKEYWORD" from having duplicate values for the index key. SQLSTATE=23505
I'm not sure what the "index identified by '2'" means, but it could be significant.
The properties of the columns for the SEOURLKEYWORD table are as follows:
Based on my understanding of this information, the only column that is forced to be unique is SEOURLKEYWORD_ID, the primary key column. This makes it sound like the update statement I'm trying to run is attempting to insert a row that has a SEOURLKEYWORD_ID that already exists in the table.
If I run a select * statement on the rows I'm trying to update, here's what I get:
select * from SEOURLKEYWORD
where storeent_id = 10701
and lower(URLKEYWORD) like '%/%';
I don't understand how executing the UPDATE statement is resulting in an error here. There are only 4 rows this statement should even be looking at, and I'm not manually updating the primary key at all. It kind of seems like it's reinserting a duplicate row with the updated column value before deleting the existing row.
Why am I seeing this error when I try to update the URLKEYWORD column of these four rows? How can I resolve this issue?
IMPORTANT: While writing this question, I narrowed the problem down to the last of the four rows in the table above, SEOURLKEYWORD_ID = 3074457345616973668. I can update the other three rows just fine, but the fourth row causes the error and I have no idea why. If I run select * from SEOURLKEYWORD where SEOURLKEYWORD_ID = 3074457345616973668;, I see only the one row.

The error is pretty clear. You have a unique index/constraint in the table. Say you have two rows like this:
STOREENT_ID   URLKEYWORD
10701         A/B
10701         A-B
When 'A/B' in the first row is replaced by 'A-B', the result would violate a unique constraint on (STOREENT_ID, URLKEYWORD) or on (URLKEYWORD) alone (note that other columns could also be included in the unique constraint/index).
You can avoid these collisions by not updating the rows that would cause them. I don't know which columns the unique constraint is on, but let's say only URLKEYWORD. Then:
update SEOURLKEYWORD
set URLKEYWORD = REPLACE(URLKEYWORD, '/', '-')
where STOREENT_ID = 10701 and
URLKEYWORD like '%/%' and
not exists (select 1 from SEOURLKEYWORD s2
where s2.SEOURLKEYWORD_ID <> SEOURLKEYWORD.SEOURLKEYWORD_ID -- skip the row being updated, which always matches itself
and replace(s2.URLKEYWORD, '/', '-') = REPLACE(SEOURLKEYWORD.URLKEYWORD, '/', '-')
);
Note that replace() is required on both sides of the comparison because you might have:
A-B/C
A/B-C
These only conflict after the replacement in both values.
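The guard can be tried out locally with Python's sqlite3 module standing in for DB2. This is a minimal sketch with hypothetical rows. It assumes the unique index covers URLKEYWORD only, as supposed above. The NOT EXISTS subquery excludes the row being updated; without that exclusion, every row would match itself and nothing would be updated.

```python
import sqlite3

# Hypothetical stand-in for SEOURLKEYWORD, assuming the unique index covers URLKEYWORD only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SEOURLKEYWORD (SEOURLKEYWORD_ID INTEGER PRIMARY KEY,
                            STOREENT_ID INTEGER,
                            URLKEYWORD TEXT UNIQUE);
INSERT INTO SEOURLKEYWORD VALUES
    (1, 10701, 'A/B'),    -- would become 'A-B', which row 2 already has
    (2, 10701, 'A-B'),
    (3, 10701, 'C/D'),    -- no collision, safe to rewrite
    (4, 10701, 'A-B/C'),  -- rows 4 and 5 only collide after both are rewritten
    (5, 10701, 'A/B-C');
""")

# The unguarded statement is rejected as a whole.
try:
    conn.execute("""UPDATE SEOURLKEYWORD SET URLKEYWORD = REPLACE(URLKEYWORD, '/', '-')
                    WHERE STOREENT_ID = 10701 AND URLKEYWORD LIKE '%/%'""")
    plain_update_failed = False
except sqlite3.IntegrityError:
    plain_update_failed = True

# Guarded version: skip a row if any *other* row maps to the same rewritten value.
conn.execute("""
UPDATE SEOURLKEYWORD
SET URLKEYWORD = REPLACE(URLKEYWORD, '/', '-')
WHERE STOREENT_ID = 10701
  AND URLKEYWORD LIKE '%/%'
  AND NOT EXISTS (SELECT 1 FROM SEOURLKEYWORD s2
                  WHERE s2.SEOURLKEYWORD_ID <> SEOURLKEYWORD.SEOURLKEYWORD_ID
                    AND REPLACE(s2.URLKEYWORD, '/', '-')
                        = REPLACE(SEOURLKEYWORD.URLKEYWORD, '/', '-'))
""")
rows = conn.execute(
    "SELECT SEOURLKEYWORD_ID, URLKEYWORD FROM SEOURLKEYWORD ORDER BY 1").fetchall()
print(plain_update_failed, rows)
```

Only row 3 changes. Both members of each colliding pair are left alone, so you still have to decide by hand which one should win.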

To complement the answer given by @GordonLinoff, here is a query that can be used to find a table's unique constraints, with their index IDs and the columns included in them:
SELECT c.tabschema, c.tabname, i.iid AS index_id, i.indname, ck.colname
FROM syscat.tabconst c
INNER JOIN syscat.indexes i
ON i.indname = c.constname -- unique index name matches constraint name
AND i.tabschema = c.tabschema AND i.tabname = c.tabname
INNER JOIN syscat.keycoluse ck
ON ck.constname = c.constname
AND ck.tabschema = c.tabschema AND ck.tabname = c.tabname
WHERE c.type = 'U' -- constraint type: unique
AND c.tabschema = 'YOURSCHEMA' AND c.tabname = 'YOURTABLE' -- replace schema/table
ORDER BY i.iid, ck.colseq
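For comparison, SQLite exposes the same information through pragmas rather than catalog views. This is a swapped-in technique, not the DB2 method above. The sketch below, using Python's sqlite3 module and a hypothetical table, lists every unique index together with its key columns in order.

```python
import sqlite3

# Hypothetical table: a rowid-alias primary key plus a composite unique constraint.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SEOURLKEYWORD (SEOURLKEYWORD_ID INTEGER PRIMARY KEY,
                                            STOREENT_ID INTEGER,
                                            URLKEYWORD TEXT,
                                            UNIQUE (STOREENT_ID, URLKEYWORD))""")

def unique_indexes(conn, table):
    """Return {index_name: [column, ...]} for every unique index on `table`."""
    found = {}
    # PRAGMA index_list rows start with (seq, name, unique, ...); `table` is trusted input here.
    for row in conn.execute(f"PRAGMA index_list('{table}')"):
        name, is_unique = row[1], row[2]
        if is_unique:
            # PRAGMA index_info rows are (seqno, cid, column_name), one per key column.
            cols = [info[2] for info in conn.execute(f"PRAGMA index_info('{name}')")]
            found[name] = cols
    return found

result = unique_indexes(conn, "SEOURLKEYWORD")
print(result)
```

An INTEGER PRIMARY KEY is an alias for the rowid and has no separate index, so only the composite constraint shows up.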

Cannot insert duplicate key SQL

insert into A (id,Name)
select ti.id, ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3
id is the primary key but is not autogenerated. When I run the query I get an error:
Violation of PRIMARY KEY constraint 'XPKA'. Cannot insert duplicate key in object 'dbo.A'
Table A:
id Name
1 A
2 B
3 C
and I want to insert:
id Name
4 A
5 B
6 C

Every row must have a different value for the Primary Key column. You are inserting the records from A back into itself, thus you are attempting to create a new row using a Primary Key value that is already being used. This leads to the error message that you see.
If you must insert records in this fashion, then you need a strategy for generating unique values for the PK column. If you cannot use an autoincrement rule (the normal method), then your own logic needs to enforce this requirement; otherwise you will continue to see errors like this.

You are selecting from table A and inserting straight back into it. This means that the ID values you insert will certainly already be there.
The message says that the ID column has a primary key on it, which requires the values in the column to be unique, so it won't let you perform the action.
To fix your query based on your stated requirement, change the script to:
insert into A (id,Name)
select ti.id + 3, ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3

You need to adjust the ID of the rows you are inserting. In your example to produce keys 4, 5, 6:
insert into A (id,Name)
select ti.id + 3 as NewKey,ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3
But in reality you need to pick a value that will keep your new keys separate from any possible old key, maybe:
insert into A (id,Name)
select ti.id + 100000 as NewKey,ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3
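Instead of a hard-coded offset, the current maximum key can serve as the offset. Assuming all ids are positive, every copied id then lands above every existing one. Here is a minimal sketch in Python's sqlite3 module, standing in for SQL Server. It does not protect against another session inserting rows at the same time; that would need locking or a sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INT (not INTEGER) PRIMARY KEY, so SQLite treats id as an ordinary, non-generated key.
conn.executescript("""
CREATE TABLE A (id INT NOT NULL PRIMARY KEY, Name TEXT);
INSERT INTO A (id, Name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")

# Offset the copied ids by the current maximum, so no new id can equal an existing one.
conn.execute("""
INSERT INTO A (id, Name)
SELECT ti.id + (SELECT MAX(id) FROM A), ti.Name
FROM A ti
WHERE ti.id >= 1 AND ti.id <= 3
""")
rows = conn.execute("SELECT id, Name FROM A ORDER BY id").fetchall()
print(rows)
```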

As Yaakov Ellis has said...
Every row must have a different value for the Primary Key column.
And your WHERE clause restricts the rows to the same three every time:
those with the unique ids 1, 2 and 3.
So if you want to replace those rows, rather than trying to INSERT them where they already exist (which generates your error),
maybe you could UPDATE them instead?
That would resolve your issue.
UPDATE
After your addition of extra code...
You should set your UNIQUE Key Identifier to the ID Number and not the ABC field name (whatever you have called it)

Inserting rows and updating rows on tables that may or may not have primary keys

I am trying to update a table so that missing rows are added and out-of-date rows are updated, using a table in a different database as the reference.
However, some tables have primary keys and some do not.
If there is a primary key, the insert command will not run; if there is no primary key, rows get duplicated.
Is there a way to have the insert command skip over primary key values that are already there?
I'm using SQL Server Management Studio 2005, and here is the code I have so far for a table with a primary key (PKcolumn):
INSERT [testDB].[dbo].[table1]
SELECT * FROM [sourceDB].[dbo].[table1]
UPDATE test
SET
test.[PKcolumn] = source.[PKcolumn]
,test.[column2] = source.[column2]
,test.[column3] = source.[column3]
FROM
[sourceDB].[dbo].[table1] AS source
INNER JOIN
[testDB].[dbo].[table1] AS test
ON source.[PKcolumn] = test.[PKcolumn]
Update works perfectly but Insert will not run at all if there is even one duplicate.
Any suggestions on how to make this code work?
Also, any tips for doing the same thing on a table without a primary key?

You'll need to exclude rows that are already present in the table in your INSERT query, using a LEFT OUTER JOIN:
INSERT [testDB].[dbo].[table1]
SELECT s.* -- only the source columns; SELECT * would also return the joined target columns
FROM [sourceDB].[dbo].[table1] AS s
LEFT OUTER JOIN [testDB].[dbo].[table1] AS t ON s.[PKcolumn] = t.[PKcolumn]
WHERE t.[PKcolumn] IS NULL
For a table without a primary key, I suppose you'd need to join on ALL columns to avoid duplicates.
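Both cases can be sketched with Python's sqlite3 module. An attached in-memory database plays the role of the separate source database, and the tables and rows are hypothetical. One caveat for the all-columns join: NULL never compares equal, so a row containing NULLs would never match and would be inserted again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An attached in-memory database plays the role of the separate source database.
conn.execute("ATTACH DATABASE ':memory:' AS sourceDB")
conn.executescript("""
CREATE TABLE sourceDB.table1 (PKcolumn INT PRIMARY KEY, column2 TEXT, column3 TEXT);
CREATE TABLE main.table1 (PKcolumn INT PRIMARY KEY, column2 TEXT, column3 TEXT);
INSERT INTO sourceDB.table1 VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
INSERT INTO main.table1 VALUES (1, 'old', 'x');

CREATE TABLE sourceDB.nopk (column2 TEXT, column3 TEXT);
CREATE TABLE main.nopk (column2 TEXT, column3 TEXT);
INSERT INTO sourceDB.nopk VALUES ('a', 'x'), ('b', 'y');
INSERT INTO main.nopk VALUES ('a', 'x');
""")

# With a PK: anti-join on the key, selecting only the source table's columns.
conn.execute("""
INSERT INTO main.table1
SELECT s.* FROM sourceDB.table1 AS s
LEFT OUTER JOIN main.table1 AS t ON s.PKcolumn = t.PKcolumn
WHERE t.PKcolumn IS NULL
""")

# Without a PK: anti-join on every column.
conn.execute("""
INSERT INTO main.nopk
SELECT s.* FROM sourceDB.nopk AS s
LEFT OUTER JOIN main.nopk AS t ON s.column2 = t.column2 AND s.column3 = t.column3
WHERE t.column2 IS NULL
""")
keyed = conn.execute("SELECT * FROM main.table1 ORDER BY PKcolumn").fetchall()
unkeyed = conn.execute("SELECT * FROM main.nopk ORDER BY column2").fetchall()
print(keyed, unkeyed)
```

Row 1 of table1 keeps 'old'; bringing it up to date is the job of the separate UPDATE from the question.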

Regarding deleting a record

Hi, I have a table that does not have any primary key or unique key.
How can I delete the duplicate records?
Can anyone tell me?

The easiest way would be to copy all of the duplicates into another identical table, delete them all from the original table, then put back the duplicates (just once for each unique one of course) from the temporary table.
For example:
BEGIN TRANSACTION
CREATE TABLE Holding_Table (my_string VARCHAR(20) NOT NULL)
INSERT INTO Holding_Table (my_string)
SELECT my_string
FROM My_Table
GROUP BY my_string
HAVING COUNT(*) > 1
DELETE MT
FROM Holding_Table HT
INNER JOIN My_Table MT ON MT.my_string = HT.my_string
INSERT INTO My_Table (my_string)
SELECT my_string
FROM Holding_Table
DROP TABLE Holding_Table
COMMIT TRANSACTION
This is just a simple example with one column. You would need to adjust it for your table obviously. Then be sure to add a primary key to your table...
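The holding-table steps can be tried out in Python's sqlite3 module, with hypothetical data. SQLite has no DELETE ... FROM ... JOIN syntax, so this sketch swaps in an IN subquery for the delete step. Everything else follows the T-SQL script above, including the explicit transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE My_Table (my_string VARCHAR(20) NOT NULL);
INSERT INTO My_Table (my_string) VALUES ('a'), ('a'), ('a'), ('b'), ('c'), ('c');
""")

conn.executescript("""
BEGIN;
-- 1. Keep one copy of each duplicated value.
CREATE TEMP TABLE Holding_Table AS
    SELECT my_string FROM My_Table GROUP BY my_string HAVING COUNT(*) > 1;
-- 2. Remove every copy of those values (IN instead of DELETE ... JOIN).
DELETE FROM My_Table WHERE my_string IN (SELECT my_string FROM Holding_Table);
-- 3. Put a single copy back.
INSERT INTO My_Table (my_string) SELECT my_string FROM Holding_Table;
DROP TABLE Holding_Table;
COMMIT;
""")
rows = conn.execute("SELECT my_string FROM My_Table ORDER BY my_string").fetchall()
print(rows)
```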

You would have to create a primary key first. Then you would be able to run an aggregate query and see how many duplicates there are and delete based off of the new ID. You could then remove the primary key and make another field the primary key if you so desired (or stick with the one you created).
I have done this many times when fixing ancient legacy databases.

If you use: SET ROWCOUNT 1
You can get SQL to delete only a single row, and use whatever technique you prefer to delete the identical rows one at a time.
To revert back to normal behaviour, use: SET ROWCOUNT 0
However, it would be advisable to at least add a column that allows you to uniquely identify each row so that you can avoid this problem in future. The following does the trick:
ALTER TABLE TableName ADD TableName_ID int IDENTITY NOT NULL
Now you can simply: DELETE TableName WHERE TableName_ID = ? for each of your duplicates.

Check this article on support.microsoft.com.
It covers a lot about how to identify duplicate rows, among other things.

Adding this as another answer since it's a different approach...
You could also add a new column to the table, make that one unique, and then use that to delete all but one of the duplicate rows. For example:
ALTER TABLE My_Table
ADD my_id INT IDENTITY NOT NULL
DELETE
MT1
FROM
My_Table MT1
WHERE EXISTS (
SELECT
*
FROM
My_Table MT2
WHERE
MT2.my_string = MT1.my_string AND
MT2.my_id < MT1.my_id)
ALTER TABLE My_Table
DROP COLUMN my_id
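SQLite cannot add an IDENTITY column to an existing table. Every ordinary SQLite table already has a hidden rowid, though, and it can play the role of my_id. Here is a sketch of the same EXISTS delete in Python's sqlite3 module, with hypothetical data. It keeps the first physical copy of each value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE My_Table (my_string TEXT NOT NULL);
INSERT INTO My_Table (my_string) VALUES ('a'), ('a'), ('b'), ('a'), ('c'), ('c');
""")

# Delete every row for which an identical row with a smaller rowid exists.
conn.execute("""
DELETE FROM My_Table
WHERE EXISTS (SELECT 1 FROM My_Table MT2
              WHERE MT2.my_string = My_Table.my_string
                AND MT2.rowid < My_Table.rowid)
""")
rows = conn.execute("SELECT rowid, my_string FROM My_Table ORDER BY rowid").fetchall()
print(rows)
```

The result is the same whether the condition is evaluated before or during the delete. The lowest rowid of each value is never deleted, and it is always there to match the later copies.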

Create a unique primary key (hash) from database columns

I have this table which doesn't have a primary key.
I'm going to insert some records into a new table to analyze them, and I'm thinking of creating a new primary key from the values of all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3*31
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? It won't work for me to simply mark all the columns as the PK, because what I need to do is compare them with data from another DB table.
My table has 3 numbers and a date.
EDIT: What my problem is
I think a bit more background is needed. I'm sorry for not providing it before.
I have a database (dm) that is updated every day from another db (the original source). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was loaded into dm.
I manually created a table with the same structure in my Oracle XE and copied the records from the original source into my db (myxe). I copied only the July records, to create a report needed by the end of the month.
Finally, on Aug 8 the update process was fixed, and the records that had been waiting to be migrated by this automatic process were copied into the database (from the original source to dm).
This process deletes the data from the original source once it has been copied (into dm).
Everything looked fine, but we have just realized that some of the records were lost (about 25% of July's).
So what I want to do is use my backup (myxe) to insert all the missing records into the database (dm).
The problems here are:
They don't have a well-defined PK.
They are in separate databases.
So I thought that if I could create a unique PK in both tables that gave the same value for the same row, I could tell which rows were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.* from the_table@PRODUCTION a, the_table b where
a.idle = b.idle and
a.activity = b.activity and
a.finishdate = b.finishdate
This returns all the rows that are present in both databases (the intersection, not the union); I got 2,000 records.
What I'm going to do next is delete them all from the target db and then insert all the rows from my db into the target table.
I hope I don't make things worse :-S

The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead I'd recommend using an autoincrementing ID for your primary key.
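Here is a concrete version of that danger, written in Python purely for illustration. The formula from the question gives every column the same weight, so any permutation of the same numbers collides. A positional, Java-style combination (as in Arrays.hashCode, with Java's 32-bit overflow ignored here) avoids that particular case but still collides on other inputs.

```python
def naive_hash(c1, c2, c3):
    # The formula from the question: each column gets the same weight, so order is lost.
    return c1 * 31 + c2 * 31 + c3 * 31

def polynomial_hash(c1, c2, c3):
    # Java-style combination as in Arrays.hashCode (ignoring Java's 32-bit overflow).
    h = 1
    for c in (c1, c2, c3):
        h = 31 * h + c
    return h

# Permuting the columns collides under the naive formula...
print(naive_hash(1, 2, 3), naive_hash(3, 2, 1))             # 186 186
# ...and the positional formula still collides for other inputs.
print(polynomial_hash(1, 0, 0), polynomial_hash(0, 31, 0))  # 30752 30752
```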

Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT
UPDATE mytable
SET pk_col = rownum
ALTER TABLE mytable MODIFY pk_col INT NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
or this:
ALTER TABLE mytable ADD pk_col RAW(16)
UPDATE mytable
SET pk_col = SYS_GUID()
ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
The latter uses GUIDs, which are unique across databases but consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO mytable v
USING (
SELECT rowid AS rid,
ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS rn -- ROWNUM is assigned before ORDER BY, so number the sorted rows with ROW_NUMBER()
FROM mytable
)
ON (v.rowid = rid)
WHEN MATCHED THEN
UPDATE
SET pk_col = rn
Note that the tables must be identical row for row (i.e. have the same number of rows with the same data in them).
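The point of numbering by content is that physical row order no longer matters. Here is a sketch in Python's sqlite3 module, which needs SQLite 3.25 or later for window functions. SQLite has no MERGE statement, so this swaps in a temp table mapping each rowid to its ROW_NUMBER(), followed by a correlated UPDATE. The tables t_a/t_b and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_a (col1 INT, col2 INT, col3 TEXT, pk_col INT);
CREATE TABLE t_b (col1 INT, col2 INT, col3 TEXT, pk_col INT);
-- Same data, inserted in a different physical order.
INSERT INTO t_a (col1, col2, col3) VALUES (2, 1, 'x'), (1, 5, 'y'), (1, 2, 'z');
INSERT INTO t_b (col1, col2, col3) VALUES (1, 2, 'z'), (2, 1, 'x'), (1, 5, 'y');
""")

def assign_keys(conn, table):
    # Number rows by their content, not their physical order (`table` is trusted input).
    conn.execute("DROP TABLE IF EXISTS temp.key_map")
    conn.execute(f"""CREATE TEMP TABLE key_map AS
                     SELECT rowid AS rid,
                            ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS rn
                     FROM {table}""")
    conn.execute(f"""UPDATE {table}
                     SET pk_col = (SELECT rn FROM key_map WHERE key_map.rid = {table}.rowid)""")

for name in ("t_a", "t_b"):
    assign_keys(conn, name)

a = conn.execute("SELECT pk_col, col1, col2, col3 FROM t_a ORDER BY pk_col").fetchall()
b = conn.execute("SELECT pk_col, col1, col2, col3 FROM t_b ORDER BY pk_col").fetchall()
print(a == b, a)
```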
Update 2:
For your very problem, you don't need a PK at all.
If you just want to select the records missing in dm, use this one (on dm side)
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in mytable@dm.
Note that it will also collapse duplicates, if there are any.
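The same set-difference approach can be tried in Python's sqlite3 module, where Oracle's MINUS is spelled EXCEPT and likewise removes duplicate rows. An attached in-memory database stands in for myxe, and main stands in for dm. The column names come from the question's EDIT 2; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An attached in-memory database stands in for the myxe backup; main stands in for dm.
conn.execute("ATTACH DATABASE ':memory:' AS myxe")
conn.executescript("""
CREATE TABLE myxe.mytable (idle INT, activity INT, finishdate TEXT);
CREATE TABLE main.mytable (idle INT, activity INT, finishdate TEXT);
INSERT INTO myxe.mytable VALUES (1, 10, '07-01'), (2, 20, '07-02'), (3, 30, '07-03');
INSERT INTO main.mytable VALUES (1, 10, '07-01');
""")

# Rows present in the backup but not in the target.
MISSING_SQL = "SELECT * FROM myxe.mytable EXCEPT SELECT * FROM main.mytable"
missing = sorted(conn.execute(MISSING_SQL).fetchall())
print(missing)

# Copy the missing rows across.
conn.execute("INSERT INTO main.mytable " + MISSING_SQL)
total = conn.execute("SELECT COUNT(*) FROM main.mytable").fetchone()[0]
print(total)
```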

Assuming that you have ensured uniqueness, you can do almost the same thing in SQL. The only problem will be converting the date to a numeric value so that you can hash it.
Select Table2.SomeFields
FROM Table1 LEFT OUTER JOIN Table2 ON
(Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
((DatePart(year,Table1.date) + DatePart(month,Table1.date) + DatePart(day,Table1.date) )* 31) = Table2.hashedPk
The above query would work for SQL Server, the only difference for Oracle would be in terms of how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And, you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left side of the Join condition logic for the value.

If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns as are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data; could you use that?
