Oracle sql query with complex constraint - sql

I have a table named T1 with the following values:
Col1 Col2 Col3
Rs1 S S2
Rs2 SX S3
Rs3 S S2
From a CSV I need to insert some values into the table: Rs4, SX and S3 for the respective columns.
I need to apply a check with the following constraint:
One S3 can belong to only one SX, but the (SX, S3) pair can belong to multiple Col1 values.
What would the Oracle query for this be? If the above condition holds, I then need to run an insert query that is already prepared. How can this be validated?
PS: we can't create another table.

I had to do a little discovery after I was informed that I totally missed the ORACLE tag. Knowing what you do not know is very important to me, so this post should now be sufficiently different.
THE BASIC PROBLEM WITH ORACLE'S CHECK
A check constraint can NOT be defined on a SQL view.
A check constraint defined on a table must refer only to columns in that table; it cannot refer to columns in other tables.
A check constraint can NOT include a SQL subquery.
A check constraint can be defined in either a SQL CREATE TABLE statement or a SQL ALTER TABLE statement.
REVISITING THE PROBLEM
We know that a (Col2, Col3) pair may appear in more than one row: #(Col2, Col3) >= 1.
We know that Col1 values are associated with (Col2, Col3) pairs, but what is the cardinality of Col1 for a given pair? Can it be more than 1?
Clearly, the business requirements are not fully explained.
REVISITING THE SOLUTIONS
Adding Objects to the database.
While adding additional tables has been voted down, is it possible to add an ID column? Assuming Col1 is NOT unique to the subsets of (Col2, Col3), then you can add a true ID column that fulfills the need for normalization while providing true indexing power in your queries.
Col1 Col2 Col3 Col4
Rs1 S S2 1
Rs2 SX S3 2
Rs3 S S2 1
To be clear, Col4 would still be an ID, since the values of Col2 and Col3 are determined by Col4: (Col2, Col3) maps 1:1 to Col4.
CHECKS
Multiple CHECK constraints, each with a simple condition enforcing a single business rule, are preferable to a single CHECK constraint with a complicated condition enforcing multiple business rules. (ORACLE - Constraint)
A single column can have multiple CHECK constraints that reference the column in its definition. There is no limit to the number of CHECK constraints that you can define on a column. (ORACLE - Data Integrity)
If you can add a column...by the love of monkeys, please do...not only will it make your life much easier, but you can also make QUERYING the table very efficient. However, for the rest of this post, I will assume you cannot add columns:
RESTATING THE PROBLEM IN CONSTRAINTS
Col2 may not appear with a different Col3, and vice versa.
(Col2, Col3) may have multiple Col1 values... what is the possible cardinality of Col1? Can it repeat? I read no.
WRITING OUT THE THEORY ON CHECKS
IF Col1 truly is unique within each {(Col2, Col3)} subset, then the following already works:
ALTER TABLE Example3
ADD CONSTRAINT ch_example3_3way UNIQUE (MUT, D, X); -- only works if these values never repeat
The other main constraint, #(Col2, Col3) > 1, simply cannot work in a CHECK unless you knew what value was being entered, so as to enforce a real SARG. Any Col1 = Col1 or Col1 IN (Col1) is the same thing as writing 1 = 1.
ON TRIGGERS
As tempting as the idea sounds, a quick glance through Oracle's documentation left me warning against their use. Some reasons from ORACLE:
ORACLE - USING TRIGGERS
Do not create recursive triggers.
For example, if you create an AFTER UPDATE statement trigger on the employees table, and the trigger itself issues an UPDATE statement on the employees table, the trigger fires recursively until it runs out of memory.
Use triggers on DATABASE judiciously. They are executed for every user every time the event occurs on which the trigger is created.
Other problems include (TOADWORLD - ORACLE WIKI):
Not compiled - STORED PROCs, by contrast, can reuse a cached plan
No SELECT Trigger Support
Complete Trigger Failure
Disabled Triggers
No Version Control
Update OF COLUMN
No Support of SYS Table Triggers
Mutating Triggers
Hidden Behavior
Still, there are advantages to TRIGGERs, and you could enforce data integrity by using a query where the first result of
SELECT Col2, Col3 FROM T1 WHERE ROWNUM = 1
is compared to the inserted values :new.Col2, :new.Col3 (a rough sketch follows below). But this would require the trigger to fire EVERY TIME a row was inserted... recompiled and everything... I STRONGLY URGE AVOIDANCE.
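For illustration only, a rough sketch of a trigger in that spirit; rather than comparing against a single ROWNUM = 1 row, it checks the incoming pair against all existing rows. It assumes single-row INSERT ... VALUES statements (a multi-row INSERT ... SELECT would raise ORA-04091 mutating-table errors), and the trigger name and error text are made up:
CREATE OR REPLACE TRIGGER trg_t1_pair_check
BEFORE INSERT ON T1
FOR EACH ROW
DECLARE
    v_cnt NUMBER;
BEGIN
    -- reject the new row if its Col3 already exists paired with a different Col2
    SELECT COUNT(*)
    INTO   v_cnt
    FROM   T1
    WHERE  Col3 = :NEW.Col3
    AND    Col2 <> :NEW.Col2;

    IF v_cnt > 0 THEN
        RAISE_APPLICATION_ERROR(-20001, 'Col3 value already belongs to a different Col2');
    END IF;
END;
/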
STORED PROCS
Whatever you may think of STORED PROCEDURES, I suggest you consider them again. Everything from Functions, DML, DDL, database management, RECURSIVE LOGIC, sp_executesql, and beyond can be accomplished through a PROC.
Easily managed, they provide encapsulation against accidental or malicious disabling or mutilation of code.
PROCs are compiled once and can reuse cached query plans, providing improved performance.
They provide superior portability and can be embedded into TRIGGERs, ORM frameworks, applications and beyond.
They can automate almost any function in a database, including ETL, resource management, security, and discovery. Views are commonly run through stored PROCs.
THE UNIQUE ADVANTAGE OF ORACLE
Perhaps forgotten: this is ORACLE, which allows you to suspend CONSTRAINTS by declaring them DEFERRABLE. From an ETL specialist's perspective, this is essentially making a staging table out of your only table... which is pretty sweet in your predicament of having limited DDL rights.
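A minimal sketch of what that could look like (the constraint name is illustrative, and a deferrable UNIQUE only covers the duplicate-row rule, not the pairing rule):
ALTER TABLE T1
    ADD CONSTRAINT t1_row_unique UNIQUE (Col1, Col2, Col3)
    DEFERRABLE INITIALLY IMMEDIATE;

-- inside a transaction, postpone the check until COMMIT
SET CONSTRAINT t1_row_unique DEFERRED;
-- ... run the staged inserts here ...
COMMIT;  -- the constraint is validated at this point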
CONCLUDING COMMENTS
There are a few efficient methods to delete duplicates in your data.
DELETE FROM T1
WHERE rowid NOT IN
(SELECT MAX(rowid)
FROM T1
GROUP BY Col1, Col2, Col3);
NOTE: rowid is the physical location of the row, while rownum represents the logical position in the query.
Lastly, my attempt at a rowid-based query. Unfortunately, time is running late and the free compiler from ORACLE is unhelpful, but I think the idea is what is important.
CREATE TABLE Example3 (MUT VARCHAR(50), D VARCHAR(50), X VARCHAR(50) );
INSERT INTO Example3 (MUT, D, X) VALUES('MUT', 'T', 'M' );
INSERT INTO Example3 (MUT, D, X) VALUES('MUT', 'T', 'P' );
INSERT INTO Example3 (MUT, D, X) VALUES('MUT', 'X', 'LP');
INSERT INTO Example3 (MUT, D, X) VALUES('MUT', 'X', 'Z');
INSERT INTO Example3 (MUT, D, X) VALUES('MUT', 'Y', 'POP');
SELECT C.D, B.X, B.rid
FROM Example3 A
LEFT OUTER JOIN (
    SELECT X, rowid AS rid
    FROM Example3) B ON B.rid = A.rowid
LEFT OUTER JOIN (
    SELECT D, MAX(rowid) AS rid   -- keep, per D, the row with the highest rowid
    FROM Example3
    GROUP BY D) C ON C.rid = B.rid;

Finally, I was able to resolve the question with some SELECT queries and a few IF conditions, applied inside a stored procedure.
SELECT COUNT(col3)
INTO   V_exist_value
FROM   T3
WHERE  col3 = Variable_col3
AND    col1 <> Variable_col1
AND    col2 = Variable_col2;

IF (V_exist_value >= 1) THEN
    INSERT INTO T3 (col1, col2, col3)
    VALUES (Variable_col1, Variable_col2, Variable_col3);
ELSE
    SELECT COUNT(col3)
    INTO   V_exist_value1
    FROM   T3
    WHERE  col3 = Variable_col3;

    IF (V_exist_value1 = 0) THEN
        INSERT INTO T3 (col1, col2, col3)
        VALUES (Variable_col1, Variable_col2, Variable_col3);
    ELSE
        RAISE Exception_col3_value_exists;
    END IF;
END IF;
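For completeness, a sketch of how that logic might sit inside a full procedure; the procedure name, parameter declarations and the error mapping are illustrative, not part of the original answer:
CREATE OR REPLACE PROCEDURE validate_and_insert (
    Variable_col1 IN T3.col1%TYPE,
    Variable_col2 IN T3.col2%TYPE,
    Variable_col3 IN T3.col3%TYPE
) AS
    V_exist_value               NUMBER;
    V_exist_value1              NUMBER;
    Exception_col3_value_exists EXCEPTION;
BEGIN
    NULL;  -- the SELECT / IF / INSERT block shown above goes here
EXCEPTION
    WHEN Exception_col3_value_exists THEN
        RAISE_APPLICATION_ERROR(-20001, 'col3 value already belongs to a different col2');
END validate_and_insert;
/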

If you don't want to use a trigger then you must normalize your tables.
Create a second table - say T1_PAIRS - that will store all permitted pairs of (col2, col3).
Create a unique constraint on the col3 column in table T1_PAIRS - this allows only unique values of COL3, so for example no more than one pair can use the S3 value ==> this enforces the rule: "One S3 can belong to only one SX".
Create a primary key on ( col2, col3 ) columns in this table T1_PAIRS.
Create a foreign key constraint on ( col2, col3 ) in T1 table that references the primary key of T1_PAIRS table.
Finally, create a unique constraint on the (col1, col2, col3) columns to enforce the rule ==> "S3 and SX as a pair can belong to multiple Col1 values (but to any single Col1 value only once)".
An example:
CREATE TABLE T1_PAIRS (
Col2 varchar2(10), Col3 varchar2(10),
CONSTRAINT T1_PAIRS_PK PRIMARY KEY( col2, col3 ),
CONSTRAINT T1_col3_UQ UNIQUE( col3 )
);
INSERT ALL
INTO T1_PAIRS( col2, col3 ) VALUES( 'S', 'S2' )
INTO T1_PAIRS( col2, col3 ) VALUES( 'SX', 'S3' )
SELECT 1 FROM dual;
ALTER TABLE T1
ADD CONSTRAINT col2_col3_pair_fk
FOREIGN KEY ( col2, col3 ) REFERENCES T1_pairs( col2, col3 );
ALTER TABLE T1
ADD CONSTRAINT pair_can_belong_to_multi_col1 UNIQUE( col1, col2, col3 );
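With those constraints in place, behaviour looks roughly like this (assuming the UNIQUE(col3) variant above; the Rs4 row is the one from the question):
-- the prepared insert from the question simply works:
INSERT INTO T1 (col1, col2, col3) VALUES ('Rs4', 'SX', 'S3');  -- OK: (SX, S3) is a permitted pair

-- trying to give S3 a second parent is rejected at the pairs table:
INSERT INTO T1_PAIRS (col2, col3) VALUES ('SX2', 'S3');        -- fails: T1_col3_UQ violated

-- and an exact duplicate of an existing T1 row is rejected as well:
INSERT INTO T1 (col1, col2, col3) VALUES ('Rs1', 'S', 'S2');   -- fails: pair_can_belong_to_multi_col1 violated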

Related

SQL UPDATE value based on row and column location without ID or key

In SQL (I'm using postgres, but am open to other variations), is it possible to update a value based on a row location and a column name when the table doesn't have unique rows or keys? ...without adding a column that contains unique values?
For example, consider the table:
col1 col2 col3
1    1    1
1    1    1
1    1    1
I would like to update the table based on the row number or numbers. For example, change the values of rows 1 and 3, col2 to 5 like so:
col1 col2 col3
1    5    1
1    1    1
1    5    1
I can start with the example table:
CREATE TABLE test_table (col1 int, col2 int, col3 int);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
Now, I could add an additional column, say "id" and simply:
UPDATE test_table SET col2 = 5 WHERE id = 1
UPDATE test_table SET col2 = 5 WHERE id = 3
But can this be done just based on row number?
I can select based on row number using something like:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER() FROM test_table
) as sub
WHERE row_number BETWEEN 1 AND 2
But this doesn't seem to play well with the UPDATE statement (at least in Postgres). Likewise, I have tried using subqueries and common table expressions, but again I'm running into difficulties with the UPDATE part. How can I perform something that accomplishes this pseudo code?: UPDATE <my table> SET <col name> = <new value> WHERE row_number = 1 or 3, or... This is trivial in other languages like R or Python (e.g., using pandas's .iloc function). It would be interesting to know how to do this in SQL.
Edit: in my table example, I should have specified the column types as something like int.
This is one of the many instances where you should embrace the lesser evil that is Surrogate Keys. Whichever table has a primary key of (col1,col2,col3) should have an additional key created by the system, such as an identity or GUID.
You don't specify the data type of (col1,col2,col3), but if for some reason you're allergic to surrogate keys you can embrace the slightly greater evil of a "combined key", where instead of a database-created value your unique key field is derived from some other fields. (In this instance, it'd be something like CONCAT(col1, '-', col2, '-', col3) ).
Should neither of the above be practical, you will be left with the greatest evil of having to manually specify all three columns each time you query a record. Which means that any other object or table which references this one will need to have not one but three distinct fields to identify which record you're talking about.
Ideally, btw, you would have some business key in the actual data which you can guarantee by design will be unique, never-changing, and never-blank. (Or at least changing so infrequently that the db can handle cascade updates reasonably well.)
You may wind up using a surrogate key for performance in such a case anyway, but that's an implementation detail rather than a data modeling requirement.
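A minimal sketch of the surrogate-key route, assuming PostgreSQL 10+ (the id column name is illustrative):
-- add a system-generated key once; existing rows are numbered automatically
ALTER TABLE test_table ADD COLUMN id int GENERATED ALWAYS AS IDENTITY;

-- row-targeted updates then become trivial
UPDATE test_table SET col2 = 5 WHERE id IN (1, 3);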

PostgreSQL insert multiple on conflict targets [duplicate]

I have two columns in a table, col1 and col2; they are both uniquely indexed (col1 is unique and so is col2).
I need to insert into this table, use the ON CONFLICT syntax and update other columns, but I can't use both columns in the conflict_target clause.
It works:
INSERT INTO table
...
ON CONFLICT ( col1 )
DO UPDATE
SET
-- update needed columns here
But how to do this for several columns, something like this:
...
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
....
ON CONFLICT requires a unique index* to do the conflict detection. So you just need to create a unique index on both columns:
t=# create table t (id integer, a text, b text);
CREATE TABLE
t=# create unique index idx_t_id_a on t (id, a);
CREATE INDEX
t=# insert into t values (1, 'a', 'foo');
INSERT 0 1
t=# insert into t values (1, 'a', 'bar') on conflict (id, a) do update set b = 'bar';
INSERT 0 1
t=# select * from t;
id | a | b
----+---+-----
1 | a | bar
* In addition to unique indexes, you can also use exclusion constraints. These are a bit more general than unique constraints. Suppose your table had columns for id and valid_time (and valid_time is a tsrange), and you wanted to allow duplicate ids, but not for overlapping time periods. A unique constraint won't help you, but with an exclusion constraint you can say "exclude new records if their id equals an old id and also their valid_time overlaps its valid_time."
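A small sketch of that exclusion-constraint idea (the table name is made up; the btree_gist extension is needed so that plain equality on id can participate in a GiST exclusion constraint):
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    id         int,
    valid_time tsrange,
    -- reject a new row whose id equals an existing id AND whose valid_time overlaps it
    EXCLUDE USING gist (id WITH =, valid_time WITH &&)
);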
A sample table and data
CREATE TABLE dupes(col1 int primary key, col2 int, col3 text,
CONSTRAINT col2_unique UNIQUE (col2)
);
INSERT INTO dupes values(1,1,'a'),(2,2,'b');
Reproducing the problem
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this Q1. The result is
ERROR: duplicate key value violates unique constraint "col2_unique"
DETAIL: Key (col2)=(2) already exists.
What the documentation says
conflict_target can perform unique index inference. When performing
inference, it consists of one or more index_column_name columns and/or
index_expression expressions, and an optional index_predicate. All
table_name unique indexes that, without regard to order, contain
exactly the conflict_target-specified columns/expressions are inferred
(chosen) as arbiter indexes. If an index_predicate is specified, it
must, as a further requirement for inference, satisfy arbiter indexes.
This gives the impression that the following query should work, but it does not, because it would actually require a composite unique index on col1 and col2. However, such an index would not guarantee that col1 and col2 would each be unique individually, which is one of the OP's requirements.
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1,col2) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this query Q2. (It fails, though not with a syntax error: PostgreSQL reports that there is no unique or exclusion constraint matching the ON CONFLICT specification.)
Why?
PostgreSQL behaves this way because what should happen when a conflict occurs on the second column is not well defined. There are a number of possibilities. For example, in the above Q1 query, should PostgreSQL update col1 when there is a conflict on col2? But what if that leads to another conflict on col1? How is PostgreSQL expected to handle that?
A solution
A solution is to combine ON CONFLICT with old fashioned UPSERT.
CREATE OR REPLACE FUNCTION merge_db(key1 INT, key2 INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE dupes SET col3 = data WHERE col1 = key1 AND col2 = key2;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently, or key2
        -- already exists in col2, we could get a unique-key failure
        BEGIN
            INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col1) DO UPDATE SET col3 = data;
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            BEGIN
                INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col2) DO UPDATE SET col3 = data;
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- Do nothing, and loop to try the UPDATE again.
            END;
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;
You would need to modify the logic of this stored function so that it updates the columns exactly the way you want it to. Invoke it like
SELECT merge_db(3,2,'c');
SELECT merge_db(1,2,'d');
Nowadays it seems to be impossible. Neither does the latest version of the ON CONFLICT syntax permit repeating the clause, nor is it possible with a CTE: you cannot break the INSERT apart from the ON CONFLICT to add more conflict targets.
If you are using Postgres 9.5 or later, you can use the special EXCLUDED table.
Example taken from What's new in PostgreSQL 9.5:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Vlad got the right idea.
First you have to create a table-level unique constraint on the columns col1, col2. Then, once you do that, you can do the following:
INSERT INTO dupes values(3,2,'c')
ON CONFLICT ON CONSTRAINT dupes_pkey
DO UPDATE SET col3 = 'c', col2 = 2
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
works fine, but you should not update col1 or col2 in the SET section.
Create a constraint (a unique index, for example),
OR/AND
look at the existing constraints (\d table_name in psql).
Then use ON CONFLICT ON CONSTRAINT constraint_name in the INSERT statement, as in the sketch below.
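A sketch using the dupes table defined earlier in this thread (col2_unique is the named unique constraint from that example):
INSERT INTO dupes (col1, col2, col3)
VALUES (3, 2, 'c')
ON CONFLICT ON CONSTRAINT col2_unique
DO UPDATE SET col3 = EXCLUDED.col3;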
You can typically (I would think) generate a statement with only one on conflict that specifies the one and only constraint that is of relevance, for the thing you are inserting.
Because typically, only one constraint is the "relevant" one, at a time. (If many, then I'm wondering if something is weird / oddly-designed, hmm.)
Example:
(License: Not CC0, only CC-By)
// there're these unique constraints:
// unique (site_id, people_id, page_id)
// unique (site_id, people_id, pages_in_whole_site)
// unique (site_id, people_id, pages_in_category_id)
// and only *one* of page-id, category-id, whole-site-true/false
// can be specified. So only one constraint is "active", at a time.
val thingColumnName = thingColumnName(notfificationPreference)
val insertStatement = s"""
insert into page_notf_prefs (
site_id,
people_id,
notf_level,
page_id,
pages_in_whole_site,
pages_in_category_id)
values (?, ?, ?, ?, ?, ?)
-- There can be only one on-conflict clause.
on conflict (site_id, people_id, $thingColumnName) <—— look
do update set
notf_level = excluded.notf_level
"""
val values = List(
siteId.asAnyRef,
notfPref.peopleId.asAnyRef,
notfPref.notfLevel.toInt.asAnyRef,
// Only one of these is non-null:
notfPref.pageId.orNullVarchar,
if (notfPref.wholeSite) true.asAnyRef else NullBoolean,
notfPref.pagesInCategoryId.orNullInt)
runUpdateSingleRow(insertStatement, values)
And:
private def thingColumnName(notfPref: PageNotfPref): String =
if (notfPref.pageId.isDefined)
"page_id"
else if (notfPref.pagesInCategoryId.isDefined)
"pages_in_category_id"
else if (notfPref.wholeSite)
"pages_in_whole_site"
else
die("TyE2ABK057")
The on conflict clause is dynamically generated, depending on what I'm trying to do. If I'm inserting a notification preference, for a page — then there can be a unique conflict, on the site_id, people_id, page_id constraint. And if I'm configuring notification prefs, for a category — then instead I know that the constraint that can get violated, is site_id, people_id, category_id.
So I can, and fairly likely you can too in your case, generate the correct on conflict (... columns), because I know what I want to do, and therefore I know which single one of the many unique constraints is the one that can get violated.
Kind of hacky, but I solved this by concatenating the two values from col1 and col2 into a new column, col3 (kind of like an index of the two) and comparing against that. This only works if you need it to match BOTH col1 and col2.
INSERT INTO table
...
ON CONFLICT ( col3 )
DO UPDATE
SET
-- update needed columns here
Where col3 = the concatenation of the values from col1 and col2.
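If you would rather not maintain that concatenated column by hand, one option (assuming PostgreSQL 12 or newer; table and column names here are illustrative) is a stored generated column with a unique constraint on it:
CREATE TABLE pairs (
    col1 text NOT NULL,
    col2 text NOT NULL,
    -- kept in sync automatically; unique, so it can serve as the conflict target
    col3 text GENERATED ALWAYS AS (col1 || '-' || col2) STORED UNIQUE
);

INSERT INTO pairs (col1, col2) VALUES ('a', 'b')
ON CONFLICT (col3) DO NOTHING;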
I get that I am late to the party, but for the people looking for answers I found this:
here
INSERT INTO tbl_Employee
VALUES (6,'Noor')
ON CONFLICT (EmpID,EmpName)
DO NOTHING;
ON CONFLICT is a very clumsy solution. Run
UPDATE dupes SET key1 = $1, key2 = $2 WHERE key3 = $3;
and if the affected row count is 0, then
INSERT INTO dupes (key1, key2, key3) VALUES ($1, $2, $3);
This works on Oracle, Postgres and every other database.

Avoid Duplicates with INSERT INTO TABLE VALUES from csv file

I have a .csv file with 600 million plus rows. I need to upload this into a database. It will have 3 columns assigned as a composite primary key.
I use pandas to read the file in chunks of 1000 lines.
At each chunk iteration I use the
INSERT INTO db_name.dbo.table_name("col1", "col2", "col3", "col4")
VALUES (?,?,?,?)
cursor.executemany(query, df.values.tolist())
syntax with pyodbc in Python to upload the data in chunks of 1000 lines.
Unfortunately, there are apparently some duplicate rows present. When a duplicate row is encountered, the upload stops with an error from SQL Server.
Question: how can I upload the data such that whenever a duplicate is encountered, instead of stopping it just skips that line and uploads the rest? I found some questions and answers on INSERT INTO table from another table, or from declared variables, but nothing on reading from a file and using the INSERT INTO table (col_names) VALUES () command.
Based on those answers one idea might be:
At each iteration of chunks:
Upload to a temp table
Do the insertion from the temp table into the final table
Delete the rows in the temp table
However, with such a large file each second counts, and I was looking for an answer with better efficiency.
I also tried to deal with duplicates using python, however, since the file is too large to fit into the memory I could not find a way to do that.
Question 2: if I were to use BULK INSERT, how would I skip over the duplicates?
Thank you
You can try to use a CTE and an INSERT ... SELECT ... WHERE NOT EXISTS.
WITH cte
AS
(
SELECT ? col1,
? col2,
? col3,
? col4
)
INSERT INTO db_name.dbo.table_name
(col1,
col2,
col3,
col4)
SELECT col1,
col2,
col3,
col4
FROM cte
WHERE NOT EXISTS (SELECT *
FROM db_name.dbo.table_name
WHERE table_name.col1 = cte.col1
AND table_name.col2 = cte.col2
AND table_name.col3 = cte.col3
AND table_name.col4 = cte.col4);
Possibly delete some of the table_name.col<n> = cte.col<n> comparisons, if the column isn't part of the primary key.
I would always load into a temporary load table first, which doesn't have any unique or PK constraint on those columns. This way you can always see that the whole file has loaded, which is an invaluable check in any ETL work, and it also allows easy analysis of the source data.
After that then use an insert such as suggested by an earlier answer, or if you know that the target table is empty then simply
INSERT INTO db_name.dbo.table_name(col1,col2,col3,col4)
SELECT distinct col1,col2,col3,col4 from load_table
The best approach is to use a temporary table and execute a MERGE-INSERT statement. You can do something like this (not tested):
CREATE TABLE #MyTempTable (col1 VARCHAR(50), col2, col3...);
INSERT INTO #MyTempTable(col1, col2, col3, col4)
VALUES (?,?,?,?)
CREATE CLUSTERED INDEX ix_tempCol1 ON #MyTempTable (col1);
MERGE INTO db_name.dbo.table_name AS TARGET
USING #MyTempTable AS SOURCE ON TARGET.COL1 = SOURCE.COL1 AND TARGET.COL2 = SOURCE.COL2 ...
WHEN NOT MATCHED THEN
INSERT(col1, col2, col3, col4)
VALUES(source.col1, source.col2, source.col3, source.col4);
You need to consider the best indexes for your temporary table to make the MERGE faster. With the WHEN NOT MATCHED clause you avoid duplicates, depending on the ON clause.
SQL Server Integration Services offers one method that can read data from a source (via a Data Flow task), then remove duplicates using its Sort transformation (it has a checkbox to remove duplicates).
https://www.mssqltips.com/sqlservertip/3036/removing-duplicates-rows-with-ssis-sort-transformation/
Of course the data has to be sorted, and 600 million+ rows isn't going to be fast.
If you want to use pure SQL Server then you need a staging table (without a pk constraint). After importing your data into Staging, you would insert into your target table filtering on the composite PK combination. For example:
Insert into dbo.RealTable (KeyCol1, KeyCol2, KeyCol3, Col4)
Select Col1, Col2, Col3, Col4
from dbo.Staging S
where not exists (Select *
from dbo.RealTable RT
where RT.KeyCol1 = S.Col1
AND RT.KeyCol2 = S.Col2
AND RT.KeyCol3 = S.Col3
)
In theory you could also use the set operator EXCEPT, since it returns the distinct rows of the first query that do not appear in the second. For example:
INSERT INTO RealTable
SELECT * FROM Staging
EXCEPT
SELECT * FROM RealTable
Would insert distinct rows from Staging into RealTable (that don't already exist in RealTable). This method doesn't account for rows that share the same composite PK but differ in the other columns - so an insert error here would indicate that different values are being assigned to the same composite PK in the csv.

postgresql insert into from select

I have two tables table1 and test_table1 which have the same schema.
Both tables have rows/data and pk id's starting from 1.
I would like to do:
insert into test_table1 select * from table1;
but this fails due to the pk values from table1 existing in test_table1.
A way around it would be to specify the columns and leave the pk column out, but for some reason that's not working either:
e.g.
NOTE - no pk columns in query below
insert into test_table1 (col1, col2,..., coln) select col1,col2,...,coln from table1;
returns
ERROR: duplicate key value violates unique constraint "test_table1_pkey"
DETAIL: Key (id)=(1) already exists.
I know this works in MySQL; is this just due to PostgreSQL? Any way around it?
EDIT:
Both tables have primary keys and sequence set.
Since it wasn't clear - tables don't have the same data.
I would just like to add rows from table1 to test_table1.
For answers telling me to exclude the primary key from the query - I did, as I said before.
Just remove the pk column from the column list of the query:
insert into test_table1 (col2,..., coln) select col2,...,coln from table1;
If it still fails, maybe you don't have a sequence on the pk column.
Create a sequence for the already existing pk column:
create sequence test_table1_seq;
ALTER TABLE test_table1
ALTER COLUMN col1 SET DEFAULT nextval('test_table1_seq'::regclass);
And update the sequence value to the current maximum:
SELECT setval('test_table1_seq', (SELECT MAX(col1) FROM test_table1));
This post helped me solve my problem, not sure what went wrong:
How to fix PostgreSQL error "duplicate key violates unique constraint"
If you get this message when trying to insert data into a PostgreSQL database:
ERROR: duplicate key violates unique constraint
That likely means that the primary key sequence in the table you're working with has somehow become out of sync, likely because of a mass import process (or something along those lines). Call it a "bug by design", but it seems that you have to manually reset the primary key sequence after restoring from a dump file. At any rate, to see if your values are out of sync, run these two commands:
SELECT MAX(the_primary_key) FROM the_table;
SELECT nextval('the_primary_key_sequence');
If the first value is higher than the second value, your sequence is out of sync. Back up your PG database (just in case), then run this:
SELECT setval('the_primary_key_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);
That will set the sequence to the next available value that's higher than any existing primary key in the sequence.
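If you don't know the sequence name, pg_get_serial_sequence can look it up from the table and column (names here are illustrative; this works when the sequence is owned by the column, as with serial or identity columns, not for the standalone sequence created above):
SELECT setval(pg_get_serial_sequence('test_table1', 'id'),
              (SELECT MAX(id) FROM test_table1));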
You would rather want to do an UPDATE JOIN, like:
UPDATE test_table1 AS v
SET col1 = s.col1,
col2 = s.col2,
col3 = s.col3,
.....
colN = s.colN
FROM table1 AS s
WHERE v.id = s.id;
What you want to do is an upsert.
with upsert as (
    update test_table1 tt
    set    col1 = t.col1,
           col2 = t.col2,
           col3 = t.col3
    from   table1 t
    where  t.id = tt.id
    returning tt.id
)
-- insert only the rows from table1 that were not updated above
insert into test_table1 (id, col1, col2, col3)
select id, col1, col2, col3
from   table1 t
where  not exists (select 1 from upsert u where u.id = t.id)

SQL Constraint IGNORE_DUP_KEY on Update

I have a Constraint on a table with IGNORE_DUP_KEY. This allows bulk inserts to partially work where some records are dupes and some are not (only inserting the non-dupes). However, it does not allow updates to partially work, where I only want those records updated where dupes will not be created.
Does anyone know how I can support IGNORE_DUP_KEY when applying updates?
I am using MS SQL 2005
If I understand correctly, you want to do UPDATEs without specifying the necessary WHERE logic to avoid creating duplicates?
create table #t (col1 int not null, col2 int not null, primary key (col1, col2))
insert into #t
select 1, 1 union all
select 1, 2 union all
select 2, 3
-- you want to do just this...
update #t set col2 = 1
-- ... but you really need to do this
update #t set col2 = 1
where not exists (
select * from #t t2
where #t.col1 = t2.col1 and col2 = 1
)
The main options that come to mind are:
Use a complete UPDATE statement to avoid creating duplicates
Use an INSTEAD OF UPDATE trigger to 'intercept' the UPDATE and only do UPDATEs that won't create a duplicate
Use a row-by-row processing technique such as cursors and wrap each UPDATE in TRY...CATCH... or whatever the language's equivalent is
I don't think anyone can tell you which one is best, because it depends on what you're trying to do and what environment you're working in. But because row-by-row processing could potentially produce some false positives, I would try to stick with a set-based approach.
I'm not sure what is really going on, but if you are inserting duplicates and updating Primary Keys as part of a bulk load process, then a staging table might be the solution for you. You create a table that you make sure is empty prior to the bulk load, then load it with the 100% raw data from the file, then process that data into your real tables (set based is best). You can do things like this to insert all rows that don't already exist:
INSERT INTO RealTable
(pk, col1, col2, col3)
SELECT
pk, col1, col2, col3
FROM StageTable s
WHERE NOT EXISTS (SELECT
1
FROM RealTable r
WHERE s.pk=r.pk
)
Preventing the duplicates in the first place is best. You could also do UPDATEs on your real table by joining in the staging table, etc. This will avoid the need to "work around" the constraints. When you work around the constraints, you usually create difficult-to-find bugs.
I have the feeling you should use the MERGE statement, and in the update part you should really not update the key you want to keep unique. That also means that you have to define in your table that the key is unique (by setting a unique index or defining it as the primary key). Then any update or insert with a duplicate key will fail.
Edit: I think this link will help on that:
http://msdn.microsoft.com/en-us/library/bb522522.aspx
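For what it's worth, a rough sketch of that MERGE idea, reusing the staging-table names from the earlier answer (note that MERGE requires SQL Server 2008 or later, so it will not help on SQL Server 2005):
MERGE INTO RealTable AS r
USING StageTable AS s
    ON r.pk = s.pk
WHEN MATCHED THEN
    -- update everything except the key itself
    UPDATE SET col1 = s.col1, col2 = s.col2, col3 = s.col3
WHEN NOT MATCHED THEN
    INSERT (pk, col1, col2, col3)
    VALUES (s.pk, s.col1, s.col2, s.col3);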