I'm inserting bulk records using COPY statement in PostgreSQL. What I realize is, the sequence IDs are not getting updated and when I try to insert a record later, it throws duplicate sequence ID. Should I manually update the sequence number to get the number of records after performing COPY? Isn't there a solution while performing COPY, just increment the sequence variable, that is, the primary key field of the table? Please clarify me on this. Thanks in advance!
For instance, if I insert 200 records, COPY does good and my table shows all the records. When I manually insert a record later, it says duplicate sequence ID error. It very well implies that it didn’t increment the sequence ids during COPYing as work fine during normal INSERTing. Instead of instructing the sequence id to set the max number of records, won’t there be any mechanism to educate the COPY command to increment the sequence IDs during its bulk COPYing option?
You ask:
Should I manually update the sequence number to get the number of records after performing COPY?
Yes, you should, as documented here:
Update the sequence value after a COPY FROM:
| BEGIN;
| COPY distributors FROM 'input_file';
| SELECT setval('serial', max(id)) FROM distributors;
| END;
You write:
it didn’t increment the sequence ids during COPYing as work fine during normal INSERTing
But that is not so! :) When you perform a normal INSERT, typically you do not specify an explicit value for the SEQUENCE-backed primary key. If you did, you would run in to the same problems as you are having now:
postgres=> create table uh_oh (id serial not null primary key, data char(1));
NOTICE: CREATE TABLE will create implicit sequence "uh_oh_id_seq" for serial column "uh_oh.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "uh_oh_pkey" for table "uh_oh"
CREATE TABLE
postgres=> insert into uh_oh (id, data) values (1, 'x');
INSERT 0 1
postgres=> insert into uh_oh (data) values ('a');
ERROR: duplicate key value violates unique constraint "uh_oh_pkey"
DETAIL: Key (id)=(1) already exists.
Your COPY command, of course, is supplying an explicit id value, just like the example INSERT above.
I realize that this is a bit old but maybe someone might still be looking for the answer.
As other said COPY works in a similar way as INSERT, so for inserting into a table that has a sequence, you simply don't mention the sequence field at all and it is taken care of for you. For COPY it works in the same exact way. But doesn't it COPY require ALL fields in the table to be present in the text file? The correct answer is NO, it doesn't, but it is the default behavior.
To COPY and leave the sequence out do the following:
COPY $YOURSCHEMA.$YOURTABLE(col1,col2,col3,col4) FROM '$your_input_file' DELIMITER ',' CSV HEADER;
No need to manually update the schema afterwards, it works as intended and in my testing is just about as fast.
You could copy to a sister table, then insert into mytable select * from sister - that would increment the sequence.
If your loaded data has the id field, don't select it for the insert: insert into mytable (col1, col2, col3) select col1, col2, col3 from sister
Related
I want to add another row in my existing table and I'm a bit hesitant if I'm doing the right thing because it might skew the database. I have my script below and would like to hear your thoughts about it.
I want to add another row for 'Jane' in the table, which will be 'SKATING" in the ACT column.
Table: [Emp_table].[ACT].[LIST_EMP]
My script is:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
VALUES
('REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE')
Will this do the trick?
Your statement looks ok. If the database has a problem with it (for example, due to a foreign key constraint violation), it will reject the statement.
If any of the fields in your table are numeric (and not varchar or char), just remove the quotes around the corresponding field. For example, if emp_cod and line_no are int, insert the following values instead:
('REG','EMP',45233,'2016-06-20 00:00:00:00',2,'SKATING','JANE')
Inserting records into a database has always been the most common reason why I've lost a lot of my hairs on my head!
SQL is great when it comes to SELECT or even UPDATEs but when it comes to INSERTs it's like someone from another planet came into the SQL standards commitee and managed to get their way of doing it implemented into the final SQL standard!
If your table does not have an automatic primary key that automatically gets generated on every insert, then you have to code it yourself to manage avoiding duplicates.
Start by writing a normal SELECT to see if the record(s) you're going to add don't already exist. But as Robert implied, your table may not have a primary key because it looks like a LOG table to me. So insert away!
If it does require to have a unique record everytime, then I strongly suggest you create a primary key for the table, either an auto generated one or a combination of your existing columns.
Assuming the first five combined columns make a unique key, this select will determine if your data you're inserting does not already exist...
SELECT COUNT(*) AS FoundRec FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND [EMP_COD] = wsEmpCod AND [DATE] = wsDate AND [LINE_NO] = wsLineno
The wsXXX declarations, you will have to replace them with direct values or have them DECLAREd earlier in your script.
If you ran this alone and recieved a value of 1 or more, then the data exists already in your table, at least those 5 first columns. A true duplicate test will require you to test EVERY column in your table, but it should give you an idea.
In the INSERT, to do it all as one statement, you can do this ...
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
VALUES
('REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE')
WHERE (SELECT COUNT(*) AS FoundRec FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND
[EMP_COD] = wsEmpCod AND [DATE] = wsDate AND
[LINE_NO] = wsLineno) = 0
Just replace the wsXXX variables with the values you want to insert.
I hope that made sense.
I'm did some research about SQL batch inserts - let's say I have 100k items to be inserted, and I set the batch size to 100.
If the ID column is not marked as Identity then that bulk insert will work.
But I found quite interesting (theoretical so far) problem, and I need some opinions:
The problem can be, if e.g. 5 users are making the bulk inserts in the same time, how then safely provide the the ID column value ? I can't just get the table rows count + 1, because in that way all of that 5 users will have the ID duplicates and the bulk insert operation will fail.
You can use SEQUENCE as an UNIQUE ID generator or try TRIGGER ON INSERT to get a unique ID.
EDIT
With mysql you can build trigger for every row
DELIMITER $$
CREATE TRIGGER adresse_trigger_insert_check
BEFORE INSERT ON adresse
FOR EACH ROW BEGIN
IF NEW.land IS NULL THEN
SET NEW.land := 'XY';
END IF;
END$$
DELIMITER ;
Should I Use IDENTITY or Not?
Sometimes Another Approach Works Better
I've declared the following table for use by audit triggers:
CREATE TABLE audit_transaction_ids (id IDENTITY PRIMARY KEY, uuid VARCHAR UNIQUE NOT NULL, `time` TIMESTAMP NOT NULL);
The trigger will get invoked multiple times in the same transaction.
The first time the trigger is invoked, I want it to insert a new
row with the current TRANSACTION_ID() and time.
The subsequent times the trigger is invoked, I want it to return
the existing "id" (I invoke Statement.getGeneratedKeys() to that end)
without altering "uuid" or "time".
The current schema seems to have two problems.
When I invoke MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES(TRANSACTION_ID(), NOW()) I get: org.h2.jdbc.JdbcSQLException: Column "ID" contains null values; SQL
statement: MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES
(TRANSACTION_ID(), NOW()) [90081-155]
I suspect that invoking MERGE on an existing row will alter "time".
How do I fix both these problems?
MERGE is analogous to java.util.Map.put(key, value): it will insert the row if it doesn't exist, and update the row if it does. That being said, you can still merge into a table containing AUTO_INCREMENT columns so long as you use another column as the key.
Given customer[id identity, email varchar(30), count int] you could merge into customer(id, email, count) key(email) values((select max(id) from customer c2 where c2.email='test#acme.com'), 'test#acme.com', 10). Meaning, re-use the id if a record exists, use null otherwise.
See also https://stackoverflow.com/a/18819879/14731 for a portable way to insert-or-update depending on whether a row already exists.
1. MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES(TRANSACTION_ID(), NOW())
If you just want to insert a new row, use:
INSERT INTO audit_transaction_ids (uuid, time) VALUES(TRANSACTION_ID(), NOW())
MERGE without setting the value for the column ID doesn't make sense if ID is used as the key, because that way it could never (even in theory) update an existing rows. What you could do is using another key column (in the case above there is no column that could be used). See the documentation for MERGE for details.
2. Invoking MERGE on an existing row will alter "time"
I'm not sure if you talk about the fact that the value of the column 'time' is altered. This is the expected behavior if you use MERGE ... VALUES(.., NOW()), because the MERGE statement is supposed to update that column.
Or maybe you mean that older versions of H2 returned different values within the same transaction (unlike most other databases, which return the same value within the same transaction). This is true, however with H2 version 1.3.155 (2011-05-27) and later, this incompatibility is fixed. See also the change log: "CURRENT_TIMESTAMP() and so on now return the same value within a transaction." It looks like this is not the problem in your case, because you do seem to use version 1.3.155 (the error message [90081-155] includes the build / version number).
Short Answer:
MERGE INTO AUDIT_TRANSACTION_IDS (uuid, time) KEY (uuid, time)
VALUES (TRANSACTION_ID(), NOW());
little performance tip: make sure uuid is indexed
Long Answer:
MERGE is basically an UPDATE which INSERTs when no record found to be updated.
Wikipedia gives a more concise, standardized syntax of
MERGE but you have to supply your own update and insert.
(Whether this will be supported in H2 or not is not mine to answer)
So how do you update a record using MERGE in H2? You define a key to be looked up for, if it is found you update the row (with column names you supply, and you can define DEFAULT here, to reset your columns to its defaults), otherwise you insert the row.
Now what is Null? Null means unknown, not found, undefined, anything which is not what you're looking for.
That is why Null works as key to be looked up for. Because it means the record is not found.
MERGE INTO table1 (id, col1, col2)
KEY(id) VALUES (Null, 1, 2)
Null has a value. it IS a value.
Now let's see your SQL.
MERGE INTO table1 (id, col1, col2)
KEY(id) VALUES (DEFAULT, 1, 2)
What is that implying? To me, it says
I have this [DEFAULT, 1, 2], find me a DEFAULT in column id,
then update col1 to 1, col2 to 2, if found.
otherwise, insert default to id, 1 to col1, 2 to col2.
See what I emphasized there? What does that even mean? What is DEFAULT? How do you compare DEFAULT to id?
DEFAULT is just a keyword.
You can do stuff like,
MERGE INTO table1 (id, col1,
timeStampCol) KEY(id) VALUES (Null, 1,
DEFAULT)
but don't put DEFAULT in the key column.
What you do when you need to maintain a table with unique values when you can't use UNIQUE constraint?
For example, I use MySQL and want to map my urls to ids. So I create a table:
CREATE TABLE url (id INTEGER PRIMARY KEY AUTO_INCREMENT, url VARCHAR(2048));
The problem is that mysql doesn't allow unique field bigger than 1000 bytes.
How in general do insert only if not exist in sql atomically?
You could create an extra field which would be the hash of a url e.g. md5, and make that hash field unique. You can certainly be sure that the URL is unique then, and with almost 100% certainty you can insert a new URL if it isn't already there.
It is tempting to create a table lock, however creating a table lock will implicitly commit the transaction you are working on: http://www.databasesandlife.com/mysql-lock-tables-does-an-implicit-commit/
You could create a single-row table e.g. name mutex, type=InnoDB, insert a row into it, and do a select for update on that row to create a lock which is compatible with transactions. It's nasty but that's the way I do table locks in MySQL in my applications :(
You could use a not exist condition:
insert YourTable
(url)
values ('blah blah blah')
where not exists
(
select *
from YourTable
where url = 'blah blah blah'
)
In my opinion the best way to handle it is to write a trigger. The trigger is going to check each value in the table to see whether they are equal and if yes, to raise an error. However, I don't think an URL will go beyond 1000 characters but if it does in your case, you should write a trigger to handle the uniqueness.
I have the following hsqldb table, in which I map UUIDs to auto incremented IDs:
SHORT_ID (BIG INT, PK, auto incremented) | UUID (VARCHAR, unique)
Create command:
CREATE TABLE mytable (SHORT_ID BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, UUID VARCHAR(36) UNIQUE)
In order to add new pairs concurrently, I want to use the atomic MERGE INTO statement. So my (prepared) statement looks like this:
MERGE INTO mytable USING (VALUES(CAST(? AS VARCHAR(36)))) AS v(x) ON mytable.UUID = v.x WHEN NOT MATCHED THEN INSERT VALUES v.x
When I execute the statement (setting the placeholder correctly), I always get a
Caused by: org.hsqldb.HsqlException: row column count mismatch
Could you please give me a hint, what is going wrong here?
Thanks in advance.
Epilogue
I reported this behavior as a bug, and it is today (2010-05-25) fixed in the hsqldb SVN repository, per hsqldb-Bugs-2989597. (Thanks, hsqldb!)
Updated Answer
Neat one! Here's what I got to work under HSQLDB 2.0.0rc9, which supports the syntax and the error message you posted:
MERGE INTO mytable
USING (SELECT 'a uuid' FROM dual) AS v(x) -- my own "DUAL" table
ON (mytable.UUID = v.x)
WHEN NOT MATCHED THEN INSERT
VALUES (NULL, x) -- explicit NULL for "SHORT_ID" :(
Note, I could not convince 2.0.0rc9 to accept ... THEN INSERT (UUID) VALUES (x), which is IIUC a perfectly acceptable and clearer specification than the above. (My SQL knowledge is hardly compendious, but this looks like a bug to me.)
Original Answer
You appear to be INSERTing a single value (a 1-tuple) into a table with more than one column. Perhaps you can modify the end of your statement to read:
... WHEN NOT MATCHED INSERT ("UUID") VALUES (v.x)
I got same problems but solve in few minutes.
Its occur when datavalues and table structure are not same.Add explicit (NULL) in your empty column value.
Like i created table
TestCase table:
ID TESTCASEID DESCRIPTION
but your insertion statement you donot want to add any description for any testcase description then you have to explicite in insertion statement you have to set null value for description