Background: I am migrating from PostgreSQL to Vertica and found that there are some issues with IDENTITY or AUTO_INCREMENT columns. One of them is that Vertica cannot assign values to IDENTITY columns, nor alter a column that already contains data into an IDENTITY column. Therefore I created a sequence, set the column's default to draw from it, and added a unique constraint:
SELECT MAX(id_column) FROM MY_SCHEMA.my_table;
which is 12345
CREATE SEQUENCE MY_SCHEMA.seq_id_column MINVALUE 12346 CACHE 1;
ALTER TABLE MY_SCHEMA.my_table
ALTER COLUMN id_column SET DEFAULT(MY_SCHEMA.seq_id_column.nextval);
ALTER TABLE MY_SCHEMA.my_table ADD UNIQUE(id_column);
This works as expected. In this case I have the cache deactivated, as I am on a single-node installation and want my ID column to be contiguous. However, this is not an option on a cluster installation, as the required lock becomes a bottleneck.
Question: In a Vertica cluster with several nodes, how can I access the ID of the last insert in a session (without an additional SELECT)?
E.g. in PostgreSQL I could do something like
INSERT INTO MY_SCHEMA.my_table RETURNING id_column;
which does not work in Vertica. Furthermore, Vertica's LAST_INSERT_ID() function does not work for named sequences. I also suspect that querying the current_value of MY_SCHEMA.seq_id_column could give wrong results due to caching, but I am unsure about this.
Why no additional SELECT?
To my knowledge, such a SELECT only returns correct values after a commit, and I cannot commit after every single insert for performance reasons.
The comments from LukStorms pointed me in the right direction.
As far as I have tested, the NEXTVAL() function gives contiguous values when a single session queries them. Furthermore, under concurrent access, CURRVAL issued after an insert retrieves the session's cached value, which is guaranteed to be unique but not necessarily contiguous. Since I never call NEXTVAL anywhere other than in my default clause, this solves the problem for me, although there might be cases where an additional call to NEXTVAL between inserts increments the sequence counter.
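For illustration, a minimal sketch of that pattern using the table and sequence from above (the text column name is made up; id_column is filled by the sequence default):
INSERT INTO MY_SCHEMA.my_table (some_text_column) VALUES ('example row');
-- returns the ID the default clause just assigned within this session
SELECT CURRVAL('MY_SCHEMA.seq_id_column');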
One case I can think of (and will test in the future) is what happens if AUTOCOMMIT is set to OFF; it is ON by default for the Vertica client drivers.
UPDATE:
This even seems to work with AUTOCOMMIT set to OFF (shown using the vertica-python client driver, where C is the connection and cur is the cursor):
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.fetchall()
--> 1
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 1
cur.execute("SET SESSION AUTOCOMMIT TO OFF")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 4
However, the sequence value appears to be unaffected by a rollback of the connection, so the following happens:
C.rollback()
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 4
Related
Here are my SQLite db details:
sqlite> .table
url_db
sqlite> .schema url_db
CREATE TABLE url_db(URL TEXT UNIQUE);
sqlite> select * from url_db;
play.googleapis.com
notifications.google.com
contacts.skype.com
edge.skype.com
people.skype.com
I wanted to execute the command delete from url_db limit 1.
So I've downloaded the full source code (sqlite-src-3240000.zip) from the official download page and compiled it with the option SQLITE_ENABLE_UPDATE_DELETE_LIMIT=1.
When I execute that command, it sometimes deletes a random entry rather than the first one. I wanted to delete play.googleapis.com, but instead the command deleted contacts.skype.com.
sqlite> select * from url_db;
play.googleapis.com
notifications.google.com
edge.skype.com
people.skype.com
What's the cause of this behavior? I am implementing a FIFO list in which, once the entries reach 500, I need to delete the first entry.
it sometimes deletes a random entry rather than the first one.
You misunderstand relational databases: a table represents an unordered set. There is no "first" row in a table unless you explicitly define the ordering.
Because this problem occurs so often, SQLite has a built-in work-around, the rowid "column". You can use this as a regular column, resulting in:
delete from url_db
order by rowid
limit 1;
Personally, I prefer an explicitly declared auto-increment column, but SQLite builds this functionality in (unlike other databases).
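If your build does not have SQLITE_ENABLE_UPDATE_DELETE_LIMIT enabled, a subquery on rowid gives the same effect; a minimal sketch for the 500-entry FIFO cap mentioned in the question (note that rowids can in principle be reused, so an explicit auto-increment column is safer if strict insertion order matters):
-- remove the oldest row (lowest rowid) once the table holds more than 500 entries
DELETE FROM url_db
WHERE rowid = (SELECT MIN(rowid) FROM url_db)
  AND (SELECT COUNT(*) FROM url_db) > 500;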
I have 730,000+ records which I need to delete in an Ingres db (ANSI92) without overloading it. A simple DELETE with a search condition doesn't work: the DB just uses all memory and throws an error. I am thinking of running it in a loop and deleting in portions of 10-20K records.
I tried to use TOP and it didn't work:
delete top (10)from TABLE where web_id <0 ;
I also tried LIMIT, which didn't work either:
DELETE FROM TABLE WHERE web_id < 0 LIMIT 10;
Any ideas how to do it? Thank you!
You could use a session temporary table to hold the first 10 tids (tuple IDs) and then delete based on those:
declare global temporary table session.tenrows as
select first 10 tid the_tid from "table" where web_id<0
on commit preserve rows with norecovery;
delete from "table" where tid in (select the_tid from session.tenrows);
When you say "without overloading the db", do you mean avoiding hitting the force-abort limit of the transaction log file? If so, what might work for you is:
set session with on_logfull=notify;
delete from table where web_id<0;
This would automatically commit your transaction at the points where force-abort is reached and then carry on, rather than rolling back and reporting an error.
A downside of using this setting is that it can be tricky to unpick what has/hasn't been done if any other error should occur (your work will likely be partially committed), but since this appears to be a straight delete from a table it should be quite obvious which rows remain and which don't.
The "set session" statement must be run at the start of a transaction.
I would advise not running concurrent sessions with "on_logfull=notify" (there have been bugs in this area, whether they're fixed in your installation depends on your version/patch level).
I've read almost every post concerning this question but haven't found the best answer. Some of them recommend using IDENTITY, while others use triggers to increment an integer column.
I'd also like to use triggers, as there will be many deletes happening in my table in this case. In addition, I mainly come from the Interbase DBMS, where I used to create a BEFORE INSERT trigger on the table, and this issue has bothered me ever since I migrated from Interbase to MS SQL Server.
This is how I did it in Interbase:
CREATE trigger currency_bi for currency
active before insert position 0
AS
declare variable m integer;
begin
select max(id)+1 from currency into :m;
if (:m is NULL ) then m=1;
new.id=:m;
end
So, as I will use this frequently, what is the best way to create a trigger that increments an integer column using max(id)+1?
Don't use triggers to do this; it will either kill performance or cause all sorts of concurrency problems, depending on your use of transactions and locking.
It's better to use one of the mechanisms available in the engine: the identity property or a sequence object.
If you're running a newer version of SQL Server with the sequence feature available, use a sequence. It will allow you to reserve a range of IDs from the client application and assign them to new rows on the client before sending them to the server for insert.
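A minimal sketch, assuming SQL Server 2012 or later and the currency table from the question (the sequence and constraint names here are made up):
CREATE SEQUENCE dbo.currency_id_seq AS INT START WITH 1 INCREMENT BY 1;

-- let the server assign IDs when none is supplied
ALTER TABLE dbo.currency
    ADD CONSTRAINT DF_currency_id DEFAULT (NEXT VALUE FOR dbo.currency_id_seq) FOR id;

-- or reserve a block of 100 IDs for the client application to hand out itself
DECLARE @first_id sql_variant;
EXEC sys.sp_sequence_get_range
    @sequence_name     = N'dbo.currency_id_seq',
    @range_size        = 100,
    @range_first_value = @first_id OUTPUT;
SELECT @first_id AS first_reserved_id;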
Always use the identity option. Since, as you said, you frequently delete records, a trigger can sometimes return wrong information (this comes down to the isolation level).
Suppose one transaction deletes the row with the highest ID just before, or at the same time as, your trigger fires. The trigger then reads a highest value that no longer exists a moment later.
So when you run a SELECT query, you see a gap, which is wrong.
SQL Server provides a built-in mechanism for this type of situation with the auto-identity option.
http://mrbool.com/understanding-auto-increment-in-sql-server/29171
You do not need to bother with this yourself. Another drawback of a trigger is that if multiple inserts happen, it only fires after the last insert statement.
Try to never use a trigger for this, as it is harmful and hard to control.
If you still want to assign the value yourself, do it in your insert statement rather than in a trigger.
I have a table with an auto-increment column which looks like:
ALTER TABLE SOMESCHEMA.SOMETABLE
ALTER COLUMN ID
SET DATA TYPE INTEGER GENERATED BY DEFAULT
SET INCREMENT BY 1
SET NO ORDER
SET NO CYCLE
SET MINVALUE 1
SET MAXVALUE 2147483647
SET NO CACHE;
As long as I let the DBMS generate the IDs, everything works fine and I can get the generated ID via:
SELECT IDENTITY_VAL_LOCAL() FROM sysibm.sysdummy1
But sometimes I need to insert a row with an ID of my choice, and there I get into trouble.
Let's say we have a single row in the table with ID 1. Now I insert a new row with a manually assigned ID of 2. The next time I try to insert a new row without a preset ID, I get the error SQL0803 "DUPLICATE KEY".
I assume the internal "next ID" value for that auto-increment column doesn't update itself if the ID of a row is set manually.
So I tried resetting this value with:
ALTER TABLE SOMESCHEMA.SOMETABLE ALTER COLUMN ID RESTART WITH 3
But this causes a permanent table lock, which I don't know how to release.
How can I get this "mixed-mode" ID column working? Is it possible to make it work like MySQL, where the DBMS automatically updates the next ID upon an insert with a manually assigned ID? If not, how can I release that {insert swear-word here} lock that pops up when I try to reset the next ID?
SQL0913 isn't creating a lock; it is reporting that a lock already exists. ALTER TABLE needs an exclusive lock on the table in order to reset the ID number. A table can be locked by another process having it open, or it can be locked by this process if there are uncommitted rows.
There is another reason the table may be in use: soft close (or pseudo-close). For performance reasons, DB2 for i keeps cursors in memory so that they can be reused as efficiently as possible, so even if you say CLOSE CURSOR, DB2 keeps the cursor in memory. These soft-closed cursors can be closed with the command ALCOBJ OBJ((SOMESCHEMA/SOMETABLE *FILE *EXCL)) WAIT(1) CONFLICT(*RQSRLS). The CONFLICT(*RQSRLS) parameter tells DB2 to close all soft-closed cursors.
So the root of the issue is that DB2 wants exclusive access to the table, which is really a design question, because typically one doesn't manipulate a table's structure during the work day. It sounds as though this table is sometimes a parent and sometimes a child when it comes to ID numbers. If that is the case, may I suggest that you ALTER the table again?
I think the implementation might be better if you used a trigger rather than auto-increment. Fire the trigger on INSERT: if an ID is supplied, do nothing; if no ID is supplied, SELECT MAX(ID)+1 and use that as the actual ID number you commit to the database.
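A rough, untested sketch of such a trigger (assuming the ID column is no longer defined as an identity; exact trigger syntax varies between DB2 for i releases, and the MAX(ID)+1 approach has the concurrency caveats discussed earlier in this document):
CREATE TRIGGER SOMESCHEMA.SOMETABLE_BI
    BEFORE INSERT ON SOMESCHEMA.SOMETABLE
    REFERENCING NEW AS N
    FOR EACH ROW MODE DB2ROW
    WHEN (N.ID IS NULL)
        -- fill in MAX(ID)+1 only when the insert did not supply an ID
        SET N.ID = (SELECT COALESCE(MAX(ID), 0) + 1 FROM SOMESCHEMA.SOMETABLE);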
ALTER TABLE table_name ALTER COLUMN column_name RESTART WITH 99999;
This fixed my issue; "99999" is simply an example of the next ID to be used.
In one of our apps, we read in data from a file and expand it into several tables. If any part of the file is corrupt, we halt the read, and delete whatever data got inserted.
The issue here is that we have an auto-increment ID on one of the import tables, and when we remove a problem file, the ID continues from its post-import value rather than its pre-import value.
In other words...
ID starts at 50.
Insert 100 records, max ID is now 150.
Delete 100 records, max ID is still 150.
Insert 50 records, ID is 200.
We've "lost" the range of 100 records. Is there an "auto decrement" equivalent to go with the auto increment?
Autonumbers shouldn't be that meaningful to you. Their guarantee is that they provide uniqueness, nothing more. You can still reseed if you are using SQL Server: DBCC CHECKIDENT.
Checks the current identity value for the specified table and, if it is needed, changes the identity value. You can also use DBCC CHECKIDENT to manually set a new seed value for the identity column.
From BOL:
The following example forces the current identity value in the Employee table in the AdventureWorks database to a value of 30.
USE AdventureWorks;
GO
DBCC CHECKIDENT ('HumanResources.Employee', RESEED, 30);
GO
I am not recommending this, just pointing it out. DBCC CHECKIDENT can throw an error if you try to reseed to a value that is already in use; in that case you'd need logic to handle it if you were relying on such a task.
I question the thought process behind what makes these numbers so important. It sounds like you want one additional field, called LineNumber, that is incremented or decremented, etc. But even in that case you have to handle the rows that come after a deleted record. So if you have 50 rows and you delete row 25, you have to renumber anything greater than 25:
UPDATE MyTable
SET LineItemNumber = LineItemNumber - 1
WHERE LineItemNumber > @LineItemNumberToBeDeleted;
Auto-decrementing on delete sounds like a bad idea. If done incorrectly, you can start injecting much bigger bugs into your code. If the IDs are a big deal, try giving each row a batch number and an incremented ID for every item within the batch. You could also use GUIDs, though they're not sequential.
You can reseed the AutoIncrement ID by doing:
DBCC CHECKIDENT
(
tablename
[, [NORESEED | RESEED [, newreseedvalue]]]
)
However, I would not recommend this as a best practice. Your query should be atomic: it either commits and updates the table, or rolls back if it fails (leaving the ID untouched). To implement an atomic query you could use a TRANSACTION.
BEGIN TRY
BEGIN TRANSACTION #TranName;
-- Your database logic here
COMMIT TRANSACTION #TranName;
END TRY
BEGIN CATCH
ROLLBACK TRAN #TranName;
END CATCH
GO
Sources:
http://msdn.microsoft.com/en-us/library/ms188929.aspx
http://www.techrepublic.com/blog/datacenter/how-do-i-reseed-a-sql-server-identity-column/406
In my experience, a best practice for this portion of an ETL (Extract, Transform, Load) process is to perform the bulk load in more than one step (a sketch follows the list):
Load data from file(s) into empty "loading" or "staging" table(s) which exist just for this purpose. This detects file-level corruption.
Check data for referential integrity and other validations. This detects data-level errors.
Insert only valid data into the "real" tables. This avoids unnecessary deletes and avoids wasting auto increment values.
Log or report data which failed the import checks.
Immediately before the next run of this process, truncate the "loading" tables.
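A minimal sketch of that flow in SQL Server syntax (all table and column names here are hypothetical, and the NOT NULL check stands in for whatever validations apply):
-- assume the file has already been bulk-loaded into the staging table
-- move only rows that pass validation into the real table
INSERT INTO dbo.import_target (col_a, col_b)
SELECT s.col_a, s.col_b
FROM dbo.import_staging AS s
WHERE s.col_a IS NOT NULL;          -- plus any referential-integrity checks

-- report the rows that failed the checks
SELECT * FROM dbo.import_staging AS s WHERE s.col_a IS NULL;

-- empty the staging table before the next run
TRUNCATE TABLE dbo.import_staging;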