Vertica SQL: overwrite data on insert

How can I overwrite the table each time there is an INSERT statement in Vertica?
Consider:
INSERT INTO table1 VALUES ('My Value');
This will give, say:
| MyCol |
----------
My Value
How can I overwrite the same table on the next insert statement, say:
INSERT INTO table1 VALUES ('My Value2');
| MyCol |
----------
My Value2

You can either DELETE or TRUNCATE your table; there is no built-in overwrite method in Vertica. Use TRUNCATE, since you only ever want a single value.
INSERT INTO table1 VALUES ('My Value');
TRUNCATE TABLE table1;
INSERT INTO table1 VALUES ('My Value2');
Or use DELETE with an explicit COMMIT (if the connection is lost before you commit, the change will not take effect):
Rollback
An individual statement returns an ERROR message. In this case, Vertica rolls back the statement.
DDL errors, systemic failures, deadlocks, and resource constraints return a ROLLBACK message. In this case, Vertica rolls back the entire transaction.
INSERT INTO table1 VALUES ('My Value');
DELETE FROM table1
WHERE MyCol !='My Value2';
INSERT INTO table1 VALUES ('My Value2');
COMMIT;

I might suggest that you don't do such a thing.
The simplest method is to populate the table with a row, perhaps:
insert into table1 (value)
values (null);
Then use update, not insert:
update table1
set value = ?;
That fixes your problem.
If you insist on using insert, you could insert values with an identity column and use a view to get the most recent value:
create table table1 (
    table1_id identity(1, 1),
    value varchar(255)
);
Then access the table using a view:
create view v_table1 as
select value
from table1
order by table1_id desc
limit 1;
If the view becomes inefficient, you can periodically empty the table.
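For instance, a periodic cleanup could keep only the newest row (a sketch, using the table1_id identity column defined above):
delete from table1
where table1_id < (select max(table1_id) from table1);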
One advantage of this approach is that the table is never empty and not locked for very long -- so it is generally available. Deleting rows and inserting rows can be tricky in that respect.
If you really like triggers, you can use a table as above. Then use a trigger to update the row in another table that has a single row. This also maximizes availability, without overhead for fetching the most recent value.
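Vertica itself has no DML triggers, so this variant only applies to a database that does. A minimal sketch in MySQL syntax, with latest_value as a hypothetical one-row table holding the current value:
create trigger keep_latest
after insert on table1
for each row
    -- copy every newly inserted value into the single-row table
    update latest_value set value = NEW.value;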

If it is a single-row table, then there's no risk whatsoever in filling it with a single row that can be NULL, as @Gordon Linoff suggests.
Internally, you should be aware that Vertica always implements an UPDATE as a DELETE (by adding a delete vector for the row) followed by an INSERT.
That is no problem with a single-row table: the Tuple Mover (the background daemon that, to put it simply, wakes up every 5 minutes to de-fragment the internal storage) will create a single Read Optimized Storage (ROS) container out of the previous value, the delete vector pointing to that previous value (thus deactivating it), and the newly inserted value it is updated to.
So:
CREATE TABLE table1 (
    mycol VARCHAR(16)
) UNSEGMENTED ALL NODES; -- a small table, replicate it across all nodes
-- now you have an empty table
-- for the following scenario, I assume you commit the changes every time, as other connected
-- processes will want to see the data you changed
-- then, only once:
INSERT INTO table1 VALUES (NULL::VARCHAR(16));
-- now, you get a ROS container for one row.
-- Later:
UPDATE table1 SET mycol='first value';
-- a DELETE vector is created to mark the initial "NULL" value as invalid
-- a new row is added to the ROS container with the value "first value"
-- Then, before 5 minutes have elapsed, you go:
UPDATE table1 SET mycol='second value';
-- another DELETE vector is created, in a new delete-vector-ROS-container,
-- to mark "first value" as invalid
-- another new row is added to a new ROS container, containing "second value"
-- Now 5 minutes have elapsed since the start, the Tuple Mover sees there's work to do,
-- and:
-- - it reads the ROS containers containing "NULL" and "first value"
-- - it reads the delete-vector-ROS containers marking both "NULL" and "first value"
-- as invalid
-- - it reads the last ROS container containing "second value"
-- --> and it finally merges everything into a brand new ROS container that only contains
-- "second value"; at the end, the four other ROS containers are deleted.
With a single-row table, this works wonderfully. Don't do it like that for a billion rows.
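If you want to watch this happen, Vertica exposes the storage internals through system tables; a quick sketch (table names as in recent Vertica versions, so check your version's documentation):
SELECT * FROM v_monitor.storage_containers; -- one row per ROS container
SELECT * FROM v_monitor.delete_vectors;     -- one row per delete vector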

Related

Edit inserted table sql

On insert I need to edit a value if it is null. I created a trigger, but I don't know how to edit the inserted table.
ALTER TRIGGER [trigger1] on [dbo].[table]
instead of insert
as
declare @secuencia bigint, @ID_PERSONA varchar;
select @secuencia = SECUENCIA from inserted
select @ID_PERSONA = ID_PERSONA from inserted
if @secuencia is null begin
    set inserted.SECUENCIA = NEXT VALUE FOR SEQ_BIOINTEG -- (sequence)
end
I don't know how to edit the inserted table.
You do not. That table is read-only.
Note how your trigger also says:
instead of insert
There is no way to edit the inserted table.
What you do instead is set up an INSERT statement against the original table, using the data from the inserted table - filtering to the ROWS of inserted, mostly via a join.
Changing inserted makes no sense, logically - because triggers in SQL are one of two things:
INSTEAD OF - then there is no actual insert happening for inserted to start with. Instead of doing the insert, the trigger is called. As such, changing inserted makes no sense.
AFTER - then the insert already happened (and you UPDATE the rows). As the trigger runs after the insert, changing inserted makes no sense.
Note that I say ROWS - your trigger has one very basic error: it assumes inserted contains ONE row. It is a table - the changes may come from an insert statement that inserts multiple rows (which is trivial, e.g. INSERT ... SELECT, or simply an INSERT with VALUES for multiple rows). Handle those.
select @ID_PERSONA = ID_PERSONA from inserted
makes no sense - inserted is a table, so what value does ID_PERSONA from inserted contain if 2 rows are inserted? You must treat inserted like any other table, as in the sketch below.
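For example, a set-based rewrite of the trigger might look like this. A sketch only: it assumes the table has exactly the two columns shown, and that SECUENCIA should fall back to the sequence when NULL. Note that NEXT VALUE FOR is not allowed inside COALESCE in SQL Server, hence the two inserts:
ALTER TRIGGER [trigger1] on [dbo].[table]
instead of insert
as
begin
    -- rows that already carry a SECUENCIA are inserted unchanged
    insert into [dbo].[table] (SECUENCIA, ID_PERSONA)
    select i.SECUENCIA, i.ID_PERSONA
    from inserted as i
    where i.SECUENCIA is not null;
    -- rows without one get the next sequence value, set-based over all rows
    insert into [dbo].[table] (SECUENCIA, ID_PERSONA)
    select NEXT VALUE FOR SEQ_BIOINTEG, i.ID_PERSONA
    from inserted as i
    where i.SECUENCIA is null;
end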
Apart from all the varied issues with your trigger code, as mentioned by others, the easiest way to use a SEQUENCE value in a table is to just put it in a DEFAULT constraint:
ALTER TABLE dbo.[table]
ADD CONSTRAINT DF_table_seq
DEFAULT (NEXT VALUE FOR dbo.SEQ_BIOINTEG)
FOR SECUENCIA;
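Note that the default only fires when SECUENCIA is omitted from the insert's column list; an explicit NULL is still stored as NULL. A hypothetical usage:
INSERT INTO dbo.[table] (ID_PERSONA) -- SECUENCIA omitted, so the default fires
VALUES ('X123');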

Is there an elegant way to deal with inserting rows with duplicate identifiers?

I'm trying to insert some rows into my table that have the same unique identifier, but all the other fields are different (the rows represent points on a map, and they just happen to have the same name). The final result I'd like to end up with is to somehow modify the offending rows to have unique identifiers (adding on some incrementing number to the identifier, like "name0", "name1", "name2", etc.) during the insertion command.
I'm aware of Postgres's recent addition of "ON CONFLICT" support, but it's not quite what I'm looking for.
According to the Postgres 9.6 Documentation:
The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error. For each individual row proposed for insertion, either the insertion proceeds, or ... the alternative conflict_action is taken. ...ON CONFLICT DO UPDATE updates the existing row that conflicts with the row proposed for insertion as its alternative action.
What I would like to do is 1) either modify the offending row or the insertion itself and 2) proceed with the insertion (instead of replacing it with an update, like the ON CONFLICT feature does). Is there an elegant way of accomplishing this? Or am I going to need to write something more complex?
You can do this:
create table my_table
(
    name text primary key,
    some_column varchar
);
create sequence my_table_seq;
The sequence is used to assign a unique suffix to the new row's PK column.
The "insert on conflict insert modified" behaviour can be done like this:
with data (name, some_column) as (
    values ('foo', 'bar')
), inserted as (
    insert into my_table
    select *
    from data
    on conflict (name) do nothing
    returning *
)
insert into my_table (name, some_column)
select concat(name, '_', nextval('my_table_seq')), some_column
from data
where not exists (select 1 from inserted);
The first time you insert a value into the PK column, the insert (in the CTE "inserted") just proceeds. The final insert won't insert anything, because the where not exists (...) prevents that: the inserted CTE returned one row.
The second time you run this, the first insert won't insert anything, and thus the second (final) insert will.
There is one drawback though: if something was inserted by the "inserted" CTE, the overall statement will report "0 rows affected", because the final insert is the one "driving" this information.
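For illustration, running the statement above twice with the same name (and a freshly created my_table_seq, so nextval returns 1) should leave:
select * from my_table;
--  name  | some_column
-- -------+-------------
--  foo   | bar
--  foo_1 | bar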

multithreading with the trigger

I have written a trigger which transfers a record from a table members_new to members_old. The function of the trigger is to insert a record into members_old after an insert on members_new. So suppose a record is inserted into members_new like:
nMmbID  nMmbName  nMmbAdd
1       Abhi      Bangalore
This record will get inserted into members_old, which has the same structure.
My trigger is like:
create trigger add_new_record
after insert on members_new
for each row
INSERT INTO `test`.`members_old`
(
    `nMmbID`,
    `nMmbName`,
    `nMmbAdd`
)
(
    SELECT
        `members_new`.`nMmbID`,
        `members_new`.`nMmbName`,
        `members_new`.`nMmbAdd`
    FROM `test`.`members_new`
    -- reads the last record from members_new to stop duplication on members_old;
    -- this also reduces the chance of errors
    where nMmbID = (select max(nMmbID) from `test`.`members_new`)
)
This trigger is working for now, but my confusion is: what will happen if multiple insertions happen at the same instant?
Will it reduce performance?
Will I ever face a deadlock, given that members_old has FKs?
If there is a better solution for this situation, please shed some light on it.
From the manual:
You can refer to columns in the subject table (the table associated with the trigger) by using the aliases OLD and NEW. OLD.col_name refers to a column of an existing row before it is updated or deleted. NEW.col_name refers to the column of a new row to be inserted or an existing row after it is updated.
create trigger add_new_record
after insert on members_new
for each row
INSERT INTO `test`.`members_old`
SET `nMmbID`   = NEW.nMmbID,
    `nMmbName` = NEW.nMmbName,
    `nMmbAdd`  = NEW.nMmbAdd;
And you will have no problem with deadlocks or whatever. It should also be much faster, because you don't have to read the max value first (which is also unsafe and might lead to compromised data). Read about isolation levels and transactions if you're interested in why.
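A quick way to verify, using the names from the question:
INSERT INTO `test`.`members_new` (nMmbID, nMmbName, nMmbAdd)
VALUES (1, 'Abhi', 'Bangalore');
SELECT * FROM `test`.`members_old`;
-- the same row should now appear in members_old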

Delete and Insert or Select and Update

We have a status table. When the status changes we currently delete the old record and insert a new.
We are wondering if it would be faster to do a select to check whether the row exists, followed by an insert or an update.
Although similar to the following question, it is not the same, since we are changing individual records and the other question was doing a total table refresh.
DELETE, INSERT vs UPDATE || INSERT
Since you're talking SQL Server 2008, have you considered MERGE? It's a single statement that allows you to do an update or insert:
create table T1 (
    ID int not null,
    Val1 varchar(10) not null
)
go
insert into T1 (ID,Val1)
select 1,'abc'
go
merge into T1
using (select 1 as ID,'def' as Val1) upd on T1.ID = upd.ID --<-- These identify the row you want to update/insert and the new value you want to set. They could be @parameters
when matched then update set Val1 = upd.Val1
when not matched then insert (ID,Val1) values (upd.ID,upd.Val1);
What about INSERT ... ON DUPLICATE KEY (MySQL syntax)? Doing a select first to check if a record exists, and then checking the result in your program, creates a race condition. That might not be important in your case if there is only a single instance of the program, however.
INSERT INTO users (username, email) VALUES ('Jo', 'jo@email.com')
ON DUPLICATE KEY UPDATE email = 'jo@email.com'
You can perform an UPDATE and check @@ROWCOUNT. If 0 rows were affected, then perform an INSERT afterwards; otherwise do nothing.
Your suggestion would mean always two statements for each status change. The usual way is to do an UPDATE and then check whether the operation changed any rows (most databases have a variable like ROWCOUNT, which should be greater than 0 if something changed). If it didn't, do an INSERT.
Search for UPSERT to find patterns for your specific DBMS.
Personally, I think the UPDATE method is the best. Instead of doing a SELECT first to check whether a record already exists, you can attempt an UPDATE and, if no rows are affected (using @@ROWCOUNT), do an INSERT, as sketched below.
The reason for this is that sooner or later you might want to track status changes, and the best way to do this would be to keep an audit trail of all changes using a trigger on the status table.
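A minimal T-SQL sketch of that pattern (status_table, id, and status are hypothetical names):
DECLARE @id INT = 1, @newStatus VARCHAR(20) = 'shipped';
UPDATE dbo.status_table
SET status = @newStatus
WHERE id = @id;
-- no row matched: this is a new status record, so insert it
IF @@ROWCOUNT = 0
    INSERT INTO dbo.status_table (id, status)
    VALUES (@id, @newStatus);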

Getting the next ID without inserting a row

Is it possible in SQL (SQL Server) to retrieve the next ID (integer) from an identity column in a table before, and without actually, inserting a row? This is not necessarily the highest ID plus 1 if the most recent row was deleted.
I ask this because we occasionally have to update a live DB with new rows. The ID of the row is used in our code (e.g. switch (ID) { case ID: }) and must be the same. If our development DB and live DB get out of sync, it would be nice to predict a row ID in advance, before deployment.
I could of course SET IDENTITY_INSERT ON, or run a transaction (does rolling back also roll back the ID?), etc., but I wondered if there was a function that returned the next ID without incrementing it.
try IDENT_CURRENT:
Select IDENT_CURRENT('yourtablename')
This works even if you haven't inserted any rows in the current session:
Returns the last identity value generated for a specified table or view. The last identity value generated can be for any session and any scope.
Edit:
After spending a number of hours comparing entire page dumps, I realised there is an easier way and I should have stayed on the DMVs.
The value survives a backup / restore, which is a clear indication that it is stored - yet I dumped all the pages in the DB and couldn't find what changed when a record was added. Comparing 200k-line page dumps isn't fun.
Using the dedicated admin console, I took a dump of every single internal table exposed, inserted a row, and then took a further dump of the system tables. Both dumps were identical, which indicates that whilst the value survives (and therefore must be stored), it is not exposed even at that level.
So after going around in a circle I realised the DMV did have the answer.
create table foo (MyID int identity not null, MyField char(10))
insert into foo values ('test')
go 10
-- Inserted 10 rows
select Convert(varchar(8),increment_value) as IncrementValue,
Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name ='myid'
-- insert another row
insert into foo values ('test')
-- check the values again
select Convert(varchar(8),increment_value) as IncrementValue,
Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name ='myid'
-- delete the rows
delete from foo
-- check the DMV again
select Convert(varchar(8),increment_value) as IncrementValue,
Convert(varchar(8),last_value) as LastValue
from sys.identity_columns where name ='myid'
-- value is currently 11 and increment is 1, so the next insert gets 12
insert into foo values ('test')
select * from foo
Result:
MyID MyField
----------- ----------
12 test
(1 row(s) affected)
Even though the rows were removed, the last value was not reset, so last value + increment should be the right answer.
Also going to write up the episode on my blog.
Oh, and the short cut to it all:
select ident_current('foo') + ident_incr('foo')
So it actually turns out to be easy - but this all assumes no one else has used your ID whilst you got it back. Fine for investigation, but I wouldn't want to use it in code.
This is a little bit strange but it will work:
If you want to know the next value, start from the greatest value currently in the table:
SELECT max(id) FROM yourtable
To make this work, you'll need to reset the identity on insert:
DECLARE @value INTEGER
SELECT @value = max(id) + 1 FROM yourtable
DBCC CHECKIDENT (yourtable, reseed, @value)
INSERT INTO yourtable ...
Not exactly an elegant solution but I haven't had my coffee yet ;-)
(This also assumes that there is nothing done to the table by your process or any other process between the first and second blocks of code).
You can pretty easily determine the last value used:
SELECT last_value
FROM sys.identity_columns
WHERE object_id = OBJECT_ID('yourtablename')
Usually, the next ID will be last_value + 1 - but there's no guarantee for that.
Rather than using an IDENTITY column, you could use a UNIQUEIDENTIFIER (GUID) column as the unique row identifier and insert known values.
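A sketch of that approach (my_rows and payload are hypothetical names; the known GUID would live in source control alongside the code that references it):
CREATE TABLE dbo.my_rows (
    row_id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    payload VARCHAR(100) NOT NULL
);
INSERT INTO dbo.my_rows (row_id, payload)
VALUES ('3F2504E0-4F89-11D3-9A0C-0305E82C3301', 'known value');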
The other option (which I use) is SET IDENTITY_INSERT ON, with the row IDs managed in a single source-controlled 'document'.
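For that second option, a minimal sketch (my_table with an identity column id is hypothetical; the explicit ID comes from the source-controlled document):
SET IDENTITY_INSERT dbo.my_table ON;
INSERT INTO dbo.my_table (id, name) -- the column list must name the identity column
VALUES (42, 'known row');
SET IDENTITY_INSERT dbo.my_table OFF;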