How do I add an auto-incrementing column to an existing Vertica table?

I have a table that currently has the following structure
id, row1
(null), 232
(null), 4455
(null), 16
I'd like for id to be an auto incrementing primary key, as follows:
id, row1
1, 232
2, 4455
3, 16
I've read the documentation and it looks like the function that I need is AUTO_INCREMENT and that I can edit the table using an ALTER TABLE statement. However, I can't seem to get the syntax quite right. How do I go about doing this? Is it even possible with a pre-existing table?

What you need to do is the following:
create a new sequence:
CREATE SEQUENCE sequence_auto_increment START 1;
create a new table:
create table tab2 as select * from tab1 limit 0;
insert the data:
insert /*+ direct */ into tab2
select NEXTVAL('sequence_auto_increment'),row1 from tab1;
As @Kermit mentioned, the best way to do this in Vertica is to recreate the table once, instead of updating it row by row; use the direct hint so you skip the WOS storage (much faster).
As for the column default that @Nazmul created, I wouldn't use it. Vertica doesn't care too much about constraints; you need to insert the values you want explicitly, and default constraints are not the way to backfill existing rows.
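If you then want the copy to take the original table's place, the swap is two statements; a minimal sketch, assuming nothing else (views, projections) still references tab1 by name:
-- hedged sketch: swap the rebuilt table into place
DROP TABLE tab1;
ALTER TABLE tab2 RENAME TO tab1;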

You need to update your existing data with something like the query below:
UPDATE t1
SET id = t2.id
FROM
(
SELECT row1, ROW_NUMBER() OVER (ORDER BY row1) AS id
FROM t1
) AS t2
-- assumes row1 values are unique; otherwise there is nothing to join on
WHERE t1.row1 = t2.row1
Then alter your table using the syntax below:
-- get the value to start the sequence at (you want MAX(id) + 1)
SELECT MAX(id) FROM t1;
-- create the sequence (START 5 here assumes MAX(id) was 4)
CREATE SEQUENCE seq1 START 5;
-- syntax as of 6.1
-- modify the column to supply the next value for future rows
ALTER TABLE t1 ALTER COLUMN id SET DEFAULT NEXTVAL('seq1');
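A quick sanity check, using the names above: new rows should now pick up their id from the sequence automatically.
-- hedged check: id comes from seq1 via the DEFAULT
INSERT INTO t1 (row1) VALUES (99);
SELECT * FROM t1 ORDER BY id;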

If you want to use the AUTO_INCREMENT feature:
1) Copy the data to a temp table
2) Recreate the base table with the column defined as auto-incrementing
3) Copy the data back for the other columns (see the sketch below)
If you just want the numbers filled in, refer to the other answer by Nazmul.
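A minimal sketch of those three steps, assuming the single data column row1 from the question; adjust names and types to your table:
-- 1) copy the data to a temp table
CREATE TABLE tab1_tmp AS SELECT row1 FROM tab1;
-- 2) recreate the base table with an auto-incrementing column
DROP TABLE tab1;
CREATE TABLE tab1 (id AUTO_INCREMENT, row1 INT);
-- 3) copy the data back; id is generated for each inserted row
INSERT /*+ direct */ INTO tab1 (row1) SELECT row1 FROM tab1_tmp;
DROP TABLE tab1_tmp;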

Related

How to Insert new Record into Table if the Record is not Present in the Table in Teradata

I want to insert a new record if the record is not present in the table
For that I am using the query below in Teradata:
INSERT INTO sample(id, name) VALUES('12','rao')
WHERE NOT EXISTS (SELECT id FROM sample WHERE id = '12');
When I execute the above query, I get the error below.
WHERE NOT EXISTS
Failure 3706 Syntax error: expected something between ')' and the 'WHERE' keyword.
Can anyone help with the above issue? It would be very helpful.
You can use INSERT INTO ... SELECT ... as follows:
INSERT INTO sample(id,name)
select '12','rao'
WHERE NOT EXISTS (SELECT id FROM sample WHERE id = '12');
You can also create a primary/unique key on the id column to avoid inserting duplicate data into it.
I would advise writing the query as:
INSERT INTO sample (id, name)
SELECT id, name
FROM (SELECT 12 as id, 'rao' as name) x
WHERE NOT EXISTS (SELECT 1 FROM sample s WHERE s.id = x.id);
This means that you do not need to repeat the constant value -- such repetition can be a cause of errors in queries. Note that I removed the single quotes. id looks like a number so treat it as a number.
The uniqueness of ids is usually handled using a unique constraint or index:
alter table sample add constraint unq_sample_id unique (id);
This makes sure that the database ensures uniqueness. Your approach can fail if two inserts are run at the same time with the same id. An attempt to insert a duplicate returns an error (which the EXISTS check then avoids).
In practice, id columns are usually generated automatically by the database. So the create table statement would look more like:
id integer generated by default as identity
And the insert would look like:
insert into sample (name)
values ('rao');
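Putting those together, a minimal sketch of the identity-based table (the VARCHAR size is an assumption):
CREATE TABLE sample (
id INTEGER GENERATED BY DEFAULT AS IDENTITY,
name VARCHAR(50)
);
-- id is generated automatically on insert
INSERT INTO sample (name) VALUES ('rao');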
If id is the Primary Index of the table you can use MERGE:
merge into sample as tgt
using VALUES('12','rao') as src (id, name)
on src.id = tgt.id
when not matched
then insert (src.id,src.name)

max value in insert into

I need to get the maximum number in a column because, in an insert operation, I have to insert the max number in that column + 1 for each insert. I did this:
insert into table1(id ,.., field,...)
select newid(), ..., (SELECT CONVERT(VARCHAR(8), FORMAT((MAX(number)+1),'00000000'))
FROM table1)
It works just for the first inserted row; then I get the same number for the other rows too!
This is too long for a comment.
Why aren't you using an identity column? You can define the number as:
number int identity(1, 1)
If you want the value as a string padded to eight characters, then use a computed column:
number_string as (format(number, '00000000'))
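For example, a hedged sketch of such a table definition (table and column names assumed from the question):
CREATE TABLE table1 (
id uniqueidentifier DEFAULT NEWID(),
number int IDENTITY(1, 1),
-- computed column: the identity value padded to eight characters
number_string AS (FORMAT(number, '00000000'))
);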
EDIT:
There are strong reasons why you want an identity column and not to calculate the values yourself. You can do what you want using row_number(), where the logic looks like this:
insert into table1(id ,.., field,...)
select newid(), ...,
convert(varchar(8),
(coalesce(t1.max_number, 0) +
row_number() over (order by (select null))
)
)
from table2 t2 cross join
(select max(t1.number) as max_number from table1 t1) t1;
Note: I am assuming that the inserts are coming from a different table, but table2 can really be table1.
Very importantly: This is not thread safe. Two different threads can run the same code and result in the same values. The solution to this is locking the entire table. However, that can have very significant performance impacts.
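If you must serialize the writers yourself, one hedged sketch (reusing the table1/table2 names from above) is an explicit transaction with an exclusive table-lock hint; this blocks all concurrent writers until the commit, which is exactly the performance impact mentioned above:
BEGIN TRANSACTION;
-- TABLOCKX holds an exclusive lock on table1 until COMMIT, so no other
-- session can read MAX(number) and insert the same value concurrently
INSERT INTO table1 WITH (TABLOCKX) (id, number)
SELECT NEWID(),
(SELECT COALESCE(MAX(number), 0) FROM table1)
+ ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM table2;
COMMIT;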
There's a lot to comment on here, and like Gordon, I can't fit it into one.
Firstly, I notice you have a column id and you're inserting the value NEWID() into it. I therefore hope that id isn't your CLUSTERED INDEX, as if it is, NEWID() is not doing it any favours. If you are using a uniqueidentifier for your CLUSTERED INDEX, use NEWSEQUENTIALID() instead, and don't provide the value in the INSERT:
ALTER TABLE dbo.Table1 ADD CONSTRAINT DF_Table1_id DEFAULT NEWSEQUENTIALID() FOR id;
As for your INSERT, as I said in my comment: "Use an IDENTITY column. If you must then have sequential values, use a VIEW and ROW_NUMBER. Let SQL Server gracefully handle the incrementing value. Trying to increment the number yourself is only going to cause you problems, such as race conditions, and your data will be in a far worse position." Unfortunately you can't change an existing column to an IDENTITY, so this is a little harder. Likely you'll want to do something like:
--You cannot UPDATE an IDENTITY column (even with IDENTITY_INSERT ON),
--so copy into a new table that has the IDENTITY property
CREATE TABLE dbo.Table1_new (id uniqueidentifier,
                             number int IDENTITY(1,1)
                             /*, other columns... */);
GO
SET IDENTITY_INSERT dbo.Table1_new ON;
--Preserve the existing values in the new IDENTITY column
INSERT INTO dbo.Table1_new (id, number)
SELECT id, number FROM dbo.Table1;
SET IDENTITY_INSERT dbo.Table1_new OFF;
GO
--Drop the old table and rename the new one into place
DROP TABLE dbo.Table1;
EXEC sp_rename N'dbo.Table1_new', N'Table1';
As Gordon said, if you simply then need a formatted value (with leading 0's) and no worry about gaps, use a computed column:
ALTER TABLE dbo.Table1 ADD Number_f AS RIGHT(CONCAT('00000000',number),8) PERSISTED;
If, however, you want them to be in sequential order, updating accordingly when a row is deleted or an INSERT fails, etc., then you can use a view with the following expression:
RIGHT(CONCAT('00000000', ROW_NUMBER() OVER (ORDER BY number ASC)), 8)
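A hedged sketch of such a view (the view and output column names are assumed):
CREATE VIEW dbo.Table1Numbered
AS
SELECT id,
-- gap-free sequence, recomputed on every read
RIGHT(CONCAT('00000000', ROW_NUMBER() OVER (ORDER BY number ASC)), 8) AS number_f
FROM dbo.Table1;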
You can use a ROW_NUMBER() function to increment the IDs. Just replace the +1 with ROW_NUMBER() OVER(). Something like this:
insert into table1(id ,.., field,...)
select (SELECT MAX(number) FROM table1) + ROW_NUMBER() OVER(ORDER BY <field1>), ...
from table1

Can I keep old keys linked to new keys when making a copy in SQL?

I am trying to copy a record in a table and change a few values with a stored procedure in SQL Server 2005. This is simple, but I also need to copy relationships in other tables with the new primary keys. As this proc is being used to batch copy records, I've found it difficult to store some relationship between old keys and new keys.
Right now, I am grabbing new keys from the batch insert using OUTPUT INTO.
ex:
INSERT INTO table
(column1, column2,...)
OUTPUT INSERTED.PrimaryKey INTO #TableVariable
SELECT column1, column2,...
Is there a way like this to easily get the old keys inserted at the same time I am inserting new keys (to ensure I have paired up the proper corresponding keys)?
I know cursors are an option, but I have never used them and have only heard them referenced in a horror story fashion. I'd much prefer to use OUTPUT INTO, or something like it.
If you need to track both old and new keys in your temp table, you need to cheat and use MERGE:
Data setup:
create table T (
ID int IDENTITY(5,7) not null,
Col1 varchar(10) not null
);
go
insert into T (Col1) values ('abc'),('def');
And the replacement for your INSERT statement:
declare @TV table (
Old_ID int not null,
New_ID int not null
);
merge into T t1
using (select ID,Col1 from T) t2
on 1 = 0
when not matched then insert (Col1) values (t2.Col1)
output t2.ID, inserted.ID into @TV;
And (actually needs to be in the same batch so that you can access the table variable):
select * from T;
select * from @TV;
Produces:
ID Col1
5 abc
12 def
19 abc
26 def
Old_ID New_ID
5 19
12 26
The reason you have to do this is because of an irritating limitation on the OUTPUT clause when used with INSERT - you can only access the inserted table, not any of the tables that might be part of a SELECT.
INSERT statements loading data into tables with an IDENTITY column are guaranteed to generate the values in the same order as the ORDER BY clause in the SELECT.
If you want the IDENTITY values to be assigned in a sequential fashion
that follows the ordering in the ORDER BY clause, create a table that
contains a column with the IDENTITY property and then run an INSERT ..
SELECT … ORDER BY query to populate this table.
From: The behavior of the IDENTITY function when used with SELECT INTO or INSERT .. SELECT queries that contain an ORDER BY clause
You can use this fact to match your old identity values with your new ones. First, collect the list of primary keys that you intend to copy into a temporary table. You can include your modified column values as well if needed:
select
PrimaryKey,
Col1
--Col2... etc
into #NewRecords
from Table
--where whatever...
Then do your INSERT with the OUTPUT clause to capture your new ids into the table variable:
declare @NewIds table (
New_ID int not null
);
INSERT INTO Table
(Col1 /*,Col2... etc.*/)
OUTPUT INSERTED.PrimaryKey INTO @NewIds
SELECT Col1 /*,Col2... etc.*/
from #NewRecords
order by PrimaryKey
Because of the ORDER BY PrimaryKey clause, you are guaranteed that your New_ID numbers will be generated in the same order as the PrimaryKey field of the copied records. Now you can match them up by row numbers ordered by the ID values. The following query gives you the pairings:
select PrimaryKey, New_ID
from
(select PrimaryKey,
ROW_NUMBER() over (order by PrimaryKey) OldRow
from #NewRecords
) PrimaryKeys
join
(
select New_ID,
ROW_NUMBER() over (order by New_ID) NewRow
from @NewIds
) New_IDs
on OldRow = NewRow
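With that pairing in hand, copying the dependent rows is one more insert; a hedged sketch where ChildTable, ParentKey, and SomeChildColumn are illustrative names, not from the original post:
-- assume the pairing query above was saved with SELECT ... INTO #KeyMap
INSERT INTO ChildTable (ParentKey, SomeChildColumn)
SELECT m.New_ID, c.SomeChildColumn
FROM ChildTable c
JOIN #KeyMap m ON c.ParentKey = m.PrimaryKey;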

Create a unique primary key (hash) from database columns

I have this table which doesn't have a primary key.
I'm going to insert some records in a new table to analyze them and I'm thinking in creating a new primary key with the values from all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3 * 31;
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? It won't work for me to simply mark all the columns as the PK, because what I need to do is compare them with data from another DB's table.
My table has 3 numbers and a date.
EDIT What my problem is
I think a bit more of background is needed. I'm sorry for not providing it before.
I have a database (dm) that is updated every day from another db (originalsource). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was updated into dm.
I manually created a table with the same structure in my Oracle XE, and copied the records from the original source into my db (myxe). I copied only records from July, to create a report needed by the end of the month.
Finally, on Aug 8 the update process got fixed, and the records which had been waiting to be migrated by this automatic process got copied into the database (from originalsource to dm).
This process cleans the data out of the original source once it is copied (into dm).
Everything looked fine, but we have just realized that some of the records got lost (about 25% of July).
So, what I want to do is use my backup (myxe) to insert into the database (dm) all those missing records.
The problems here are:
They don't have a well-defined PK.
They are in separate databases.
So I thought that if I could derive the same unique PK from the data in both tables, I could tell which rows were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.* from the_table@PRODUCTION a, the_table b where
a.idle = b.idle and
a.activity = b.activity and
a.finishdate = b.finishdate
This returns all the rows that are present in both databases (the intersection); I've got 2,000 records.
What I'm going to do next is delete them all from the target db and then insert them all from my db into the target table.
I hope I don't get into something worse :-S
The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead I'd recommend using an autoincrementing ID for your primary key.
Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT
UPDATE mytable
SET pk_col = rownum
ALTER TABLE mytable MODIFY pk_col INT NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
or this:
ALTER TABLE mytable ADD pk_col RAW(16)
UPDATE mytable
SET pk_col = SYS_GUID()
ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
The latter uses GUIDs, which are unique across databases, but they consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO mytable v
USING (
-- number the rows only after sorting: rownum in the same query block as
-- an ORDER BY is assigned before the sort, so it needs an inner query
SELECT rid, rownum AS rn
FROM (
SELECT rowid AS rid
FROM mytable
ORDER BY
col1, col2, col3
)
) d
ON (v.rowid = d.rid)
WHEN MATCHED THEN
UPDATE
SET pk_col = d.rn
Note that the tables should be identical down to the row (i.e. have the same number of rows with the same data in them).
Update 2:
For your very problem, you don't need a PK at all.
If you just want to select the records missing in dm, use this one (on dm side)
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in mytable@dm.
Note that it will shrink all duplicates if any.
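To actually backfill dm, the same MINUS can feed the insert directly; a sketch, run on the dm side and assuming identical column lists in both tables:
INSERT INTO mytable
SELECT * FROM mytable@myxe
MINUS
SELECT * FROM mytable;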
Assuming that you have ensured uniqueness, you can do almost the same thing in SQL. The only problem will be converting the date to a numeric value so that you can hash it.
Select Table2.SomeFields
FROM Table1 LEFT OUTER JOIN Table2 ON
(Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
((DatePart(year,Table1.date) + DatePart(month,Table1.date) + DatePart(day,Table1.date) )* 31) = Table2.hashedPk
The above query would work for SQL Server, the only difference for Oracle would be in terms of how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And, you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left side of the Join condition logic for the value.
If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns as are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data; could you use that?
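On the checksum question: Oracle does provide ORA_HASH. A hedged sketch of using it to find rows on one side whose hash has no match on the other, with column names taken from the question's join (note that a hash collision could hide a genuinely missing row):
SELECT a.*
FROM the_table@PRODUCTION a
WHERE ORA_HASH(a.idle || '|' || a.activity || '|'
|| TO_CHAR(a.finishdate, 'YYYYMMDDHH24MISS'))
NOT IN (SELECT ORA_HASH(b.idle || '|' || b.activity || '|'
|| TO_CHAR(b.finishdate, 'YYYYMMDDHH24MISS'))
FROM the_table b);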

Row number in Sybase tables

Sybase db tables do not have a concept of self-updating row numbers. However, for one of the modules, I require a row number corresponding to each row in the database such that max(column) would always tell me the number of rows in the table.
I thought I'd introduce an int column and keep updating it to track the row number. However, I'm having problems updating this column in the case of deletes. What SQL should I use in the delete trigger to update this column?
You can easily assign a unique number to each row by using an identity column. The identity can be a numeric or an integer (in ASE12+).
This will almost do what you require. There are certain circumstances in which you will get a gap in the identity sequence (these are called "identity gaps"). Also, deletes will cause gaps in the sequence, as you've identified.
Why do you need to use max(col) to get the number of rows in the table, when you could just use count(*)? If you're trying to get the last row from the table, then you can do
select * from table where column = (select max(column) from table).
Regarding the delete trigger to update a manually managed column, I think this would be a potential source of deadlocks, and many performance issues. Imagine you have 1 million rows in your table, and you delete row 1, that's 999999 rows you now have to update to subtract 1 from the id.
Delete trigger
CREATE TRIGGER tigger ON myTable FOR DELETE
AS
update myTable
set id = id - (select count(*) from deleted d where d.id < myTable.id)
To avoid locking problems
You could add an extra table (which joins to your primary table) like this:
CREATE TABLE rowCounter
(id int, -- foreign key to main table
rownum int)
... and use the rownum field from this table.
If you put the delete trigger on this table then you would hugely reduce the potential for locking problems.
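A hedged sketch of that trigger (assuming rowCounter.id mirrors myTable's key column):
CREATE TRIGGER rowCounter_del ON myTable FOR DELETE
AS
-- close the gaps while the deleted rows' counter entries still exist
update rowCounter
set rownum = rownum - (select count(*)
from rowCounter rc, deleted d
where rc.id = d.id
and rc.rownum < rowCounter.rownum)
-- then remove the counter rows for the deleted ids
delete rowCounter
from rowCounter, deleted
where rowCounter.id = deleted.id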
Approximate solution?
Does the table need to keep its rownumbers up to date all the time?
If not, you could have a job which runs every minute or so, which checks for gaps in the rownum, and does an update.
Question: do the rownumbers have to reflect the order in which rows were inserted?
If not, you could do far fewer updates, but only updating the most recent rows, "moving" them into gaps.
Leave a comment if you would like me to post any SQL for these ideas.
I'm not sure why you would want to do this. You could experiment with using temporary tables and "select into" with an Identity column like below.
create table test
(
col1 int,
col2 varchar(3)
)
insert into test values (100, "abc")
insert into test values (111, "def")
insert into test values (222, "ghi")
insert into test values (300, "jkl")
insert into test values (400, "mno")
select rank = identity(10), col1 into #t1 from test
select * from #t1
delete from test where col2="ghi"
select rank = identity(10), col1 into #t2 from test
select * from #t2
drop table test
drop table #t1
drop table #t2
This would give you a dynamic id (of sorts).