Get identity of row inserted in Snowflake Datawarehouse


If I have a table with an auto-incrementing ID column, I'd like to be able to insert a row into that table, and get the ID of the row I just created. I know that generally, StackOverflow questions need some sort of code that was attempted or research effort, but I'm not sure where to begin with Snowflake. I've dug through their documentation and I've found nothing for this.
The best I could do so far is try result_scan() and last_query_id(), but these don't give me any relevant information about the row that was inserted, just confirmation that a row was inserted.
I believe what I'm asking for is along the lines of MS SQL Server's SCOPE_IDENTITY() function.
Is there a Snowflake equivalent function for MS SQL Server's SCOPE_IDENTITY()?
EDIT: for the sake of having code in here:
CREATE TABLE my_db..my_table
(
ROWID INT IDENTITY(1,1),
some_number INT,
a_time TIMESTAMP_LTZ(9),
b_time TIMESTAMP_LTZ(9),
more_data VARCHAR(10)
);
INSERT INTO my_db..my_table
(
some_number,
a_time,
more_data
)
VALUES
(1, my_time_value, some_data);
I want to get to that auto-increment ROWID for this row I just inserted.

NOTE: The answer below may be incorrect in some very rare cases; see the UPDATE section below.
Original answer
Snowflake does not provide the equivalent of SCOPE_IDENTITY today.
However, you can exploit Snowflake's time travel to retrieve the maximum value of a column right after a given statement is executed.
Here's an example:
create or replace table x(rid int identity, num int);
insert into x(num) values(7);
insert into x(num) values(9);
-- you can insert rows in a separate transaction now to test it
select max(rid) from x AT(statement=>last_query_id());
----------+
MAX(RID) |
----------+
2 |
----------+
You can also save the last_query_id() into a variable if you want to access it later, e.g.
insert into x(num) values(5);
set qid = last_query_id();
...
select max(rid) from x AT(statement=>$qid);
Note: this will usually be correct, but if a user, for example, manually inserts a large value into rid, it can distort the result of this query.
UPDATE
Note: I realized the code above can, in rare cases, produce an incorrect answer.
Since the execution order of the various phases of a query in a distributed system like Snowflake can be non-deterministic, and Snowflake allows concurrent INSERT statements, the following might happen:
Two queries, Q1 and Q2, do a simple single row INSERT, start at roughly the same time
Q1 starts, is a bit ahead
Q2 starts
Q1 creates a row with value 1 from the IDENTITY column
Q2 creates a row with value 2 from the IDENTITY column
Q2 gets ahead of Q1 - this is the key part
Q2 commits, is marked as finished at time T2
Q1 commits, is marked as finished at time T1
Note that T1 is later than T2. Now, when we try to do SELECT ... AT(statement=>Q1), we will see the state as-of T1, including all changes from statements before, hence including the value 2 from Q2. Which is not what we want.
The way around it could be to add a unique identifier to each INSERT (e.g. from a separate SEQUENCE object), and then use a MAX.
Sorry. Distributed transactions are hard :)
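The workaround mentioned above (tag each INSERT with its own unique identifier, then look the row up by that tag) can be sketched as follows. This is only an illustration under stated assumptions, not Snowflake code: it uses Python's built-in sqlite3 module as a stand-in database, and the insert_tag column and insert_and_get_id helper are hypothetical names introduced for the example.

```python
import sqlite3
import uuid


def insert_and_get_id(conn, num):
    """Insert one row tagged with a fresh UUID, then read back its generated rid."""
    tag = str(uuid.uuid4())  # hypothetical per-insert tag, unique to this statement
    conn.execute("INSERT INTO x (num, insert_tag) VALUES (?, ?)", (num, tag))
    # Filtering on the tag means rows written by other inserts cannot match,
    # so MAX only ever sees the row(s) created by this particular insert.
    row = conn.execute(
        "SELECT MAX(rid) FROM x WHERE insert_tag = ?", (tag,)
    ).fetchone()
    return row[0]


# SQLite stands in for the real warehouse here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x (rid INTEGER PRIMARY KEY AUTOINCREMENT, num INT, insert_tag TEXT)"
)
first_id = insert_and_get_id(conn, 7)
second_id = insert_and_get_id(conn, 9)
print(first_id, second_id)
```

The point of the design is that the lookup no longer depends on commit order: whichever statement finishes first, each caller only matches its own tag.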

If I have a table with an auto-incrementing ID column, I'd like to be
able to insert a row into that table, and get the ID of the row I just
created.
FWIW, here's a slight variation of the current accepted answer (using Snowflake's 'Time Travel' feature) that gives any column values "of the row I just created." It applies to auto-incrementing sequences and more generally to any column configured with a default (e.g. CURRENT_TIMESTAMP() or UUID_STRING()). Further, I believe it avoids any inconsistencies associated with a second query utilizing MAX().
Assuming this table setup:
CREATE TABLE my_db.my_table
(
ROWID INT IDENTITY(1,1),
some_number INT,
a_time TIMESTAMP_LTZ(9),
b_time TIMESTAMP_LTZ(9),
more_data VARCHAR(10)
);
Make sure change tracking is enabled for this table (the CHANGES clause used below relies on it):
ALTER TABLE my_db.my_table SET change_tracking = true;
Perform the INSERT per usual:
INSERT INTO my_db.my_table
(
some_number,
a_time,
more_data
)
VALUES
(1, my_time_value, some_data);
Save LAST_QUERY_ID() into a variable, then use the CHANGES clause with both BEFORE(statement => ...) and END(statement => ...) set to that query ID. This selects exactly the row(s) that the previous INSERT added to my_table, with the column values they had at the moment they were added, including any defaults:
SET insertQueryId=LAST_QUERY_ID();
SELECT
ROWID,
some_number,
a_time,
b_time,
more_data
FROM my_db.my_table
CHANGES(information => default)
BEFORE(statement => $insertQueryId)
END(statement => $insertQueryId);
For more information on the CHANGES, BEFORE, and END clauses, see the Snowflake documentation.

Related

How to insert a row if not exists otherwise select and return its ID in both cases in MariaDB?

I have a table with ID primary key (autoincrement) and a unique column Name. Is there an efficient way in MariaDB to insert a row into this table if the same Name doesn't exist, otherwise select the existing row and, in both cases, return the ID of the row with this Name?
Here's a solution for Postgres. However, it seems MariaDB doesn't have the RETURNING id clause.
What I have tried so far is brute-force:
INSERT IGNORE INTO services (Name) VALUES ('JohnDoe');
SELECT ID FROM services WHERE Name='JohnDoe';
UPDATE: MariaDB 10.5 has RETURNING clause, however, the queries I have tried so far throw a syntax error:
WITH i AS (INSERT IGNORE INTO services (`Name`) VALUES ('John') RETURNING ID)
SELECT ID FROM i
UNION
SELECT ID FROM services WHERE `Name`='John'

For a single row, assuming id is AUTO_INCREMENT.
INSERT INTO t (name)
VALUES ('JohnDoe')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
SELECT LAST_INSERT_ID();
That looks kludgy, but it is an example in the documentation.
Caution: Most forms of INSERT will "burn" auto_inc ids. That is, they grab the next id(s) before realizing that the id won't be used. This could lead to overflowing the max auto_inc size.
It is also wise not to put the normalization inside the transaction that does the "meat" of the code. It ties up the table unnecessarily long and runs extra risk of burning ids in the case of rollback.
For batch updating of a 'normalization' table like that, see my notes here: http://mysql.rjweb.org/doc.php/staging_table#normalization (It avoids burning ids.)
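For comparison, the two-statement pattern from the question (insert while ignoring duplicates, then select by name) can be wrapped in one helper. This is a minimal sketch, assuming SQLite via Python's sqlite3 module rather than MariaDB; INSERT OR IGNORE is SQLite's counterpart of INSERT IGNORE, and get_or_create_id is a hypothetical helper name.

```python
import sqlite3


def get_or_create_id(conn, name):
    """Return the ID for `name`, inserting the row first if it does not exist."""
    # A duplicate Name violates the UNIQUE constraint and is silently skipped.
    conn.execute("INSERT OR IGNORE INTO services (Name) VALUES (?)", (name,))
    # Whether the row was just inserted or already existed, it is there now.
    row = conn.execute(
        "SELECT ID FROM services WHERE Name = ?", (name,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE)")
first = get_or_create_id(conn, "JohnDoe")
again = get_or_create_id(conn, "JohnDoe")
other = get_or_create_id(conn, "JaneDoe")
print(first, again, other)
```

Calling the helper twice with the same name returns the same ID, which is the behavior the question asks for; the cost is one extra round trip compared with the single-statement LAST_INSERT_ID(id) trick.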

Find Last Inserted Record MS SQL SERVER

I ran about 12 lakh (1.2 million) INSERT statements against a single table, but after some time the query terminated. How can I find the last inserted record?
a) The table doesn't have a created-date column.
b) I cannot use an ORDER BY clause because the primary key values are manually generated.
c) LAST() is not a built-in function in MS SQL.
Alternatively, is there any way to find the last executed query?
There must be some way, but I am not able to figure it out.
The table contains only a primary key constraint and no other constraints.

As requested in the comments, here is a quick and dirty manual solution, assuming you've got the list of INSERT statements (or the corresponding data) in the same sequence as the issued INSERTs. For this example I assume 1 million records.
INSERT ... VALUES (1, ...)
...
INSERT ... VALUES (250000, ...)
...
INSERT ... VALUES (500000, ...)
...
INSERT ... VALUES (750000, ...)
...
INSERT ... VALUES (1000000, ...)
You just have to find the last PK, that has been inserted. Luckily in this case there is one. So you start doing a manual binary search in the table issuing
SELECT pk FROM myTable WHERE pk = 500000
If you get a row back, you know the inserts got at least that far, so continue checking with pk = 750000; if that row is there too, check pk = 875000. If 750000 is not there, the INSERTs must have stopped earlier, so check pk = 625000 next. Each probe halves the remaining range, so in this case the process stops after about 20 steps.
It's just plain manual divide and conquer.
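The manual procedure above is an ordinary binary search, which can be sketched like this. The exists callback is a hypothetical stand-in for running SELECT pk FROM myTable WHERE pk = ? and checking whether a row came back, and the sketch assumes the rows were inserted in ascending pk order, as in the example.

```python
def find_last_inserted(exists, low, high):
    """Binary-search [low, high] for the largest pk that exists.

    Returns (last_pk, probes); last_pk is None if no pk in the range exists.
    Assumes every pk up to the last inserted one is present and every later
    pk is missing.
    """
    last_found = None
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if exists(mid):
            last_found = mid   # inserts got at least this far; look higher
            low = mid + 1
        else:
            high = mid - 1     # inserts stopped before this pk; look lower
    return last_found, probes


# Stand-in for the database: pretend the run died after pk 637123.
LAST_PK = 637123
last_pk, probes = find_last_inserted(lambda pk: pk <= LAST_PK, 1, 1_000_000)
print(last_pk, probes)
```

For a range of 1 million keys the loop never needs more than 20 probes, which matches the estimate in the answer.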

There is a way.
Unfortunately, you have to set this up in advance for it to help you.
So if, by any chance, you still have the PRIMARY KEYS you inserted at hand, go ahead and delete all rows that have those keys:
DELETE FROM tableName WHERE ID IN (id1, id2, ...., idn)
Then you enable Change Data Capture for your database (have the db already selected):
EXEC sys.sp_cdc_enable_db;
Now you also need to enable Change Data Capture for that table. In an example that I've tried, I could just run:
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'tableName', @role_name = null
Now you are almost set up! Look into your system services and verify that SQL Server Agent is running for your DBMS; if it is not, capturing will not happen.
Now when you insert something into your table you can select data changes from a new table called [cdc].[dbo_tableName_CT]:
SELECT [__$start_lsn]
,[__$end_lsn]
,[__$seqval]
,[__$operation]
,[__$update_mask]
,[ID]
,[Value]
FROM [cdc].[dbo_tableName_CT]
GO
You can order by __$seqval, which should give you the order in which the rows were inserted.
NOTE: this feature seems not to be present in SQL Server Express

How to use multiple identity numbers in one table?

I have a web application that creates printable forms. Each form has a unique number on it; the problem is that I have 2 form types that need separate number ranges.
For example:
Form1- Numbered 2000000-2999999
Form2- Numbered 3000000-3999999
dbo.test2 - is my form information table
Tsel - is my autoinc table for the 3000000 series numbers
Tadv - is my autoinc table for the 2000000 series numbers
What I have done is create 2 tables with just an autoinc column (one for the 2000000 series numbers and one for the 3000000 series numbers). I then created a trigger that adds a record to the corresponding table, reads back the autoinc number, and stores it in my form information table, so each form gets the just-created number for its series.
Although it does work, I'm concerned that the numbers will get messed up under load.
I'm not sure that @@IDENTITY will always return the right value when many people are using the system. (I cannot have duplicates, and I need to use the numbering format shown above.)
See code below.
**** TRIGGER ****
CREATE TRIGGER MAKEANID2 ON dbo.test2
AFTER INSERT
AS
SET NOCOUNT ON
declare @someid int
declare @someid2 int
declare @startfrom int
declare @test1 varchar(10)
select @someid=@@IDENTITY
select @test1 = (Select name1 from test2 where sysid = @someid )
if @test1 = 'select'
begin
insert into Tsel Default values
select @someid2 = @@IDENTITY
end
if @test1 = 'adv'
begin
insert into Tadv Default values
select @someid2 = @@IDENTITY
end
update test2
set name2=(@someid2) where sysid = @someid
SET NOCOUNT OFF

The best way to keep the two IDs in sync is to create a persisted Computed Column based on the actual identity column. Where Col1 is the identity column and Col2 is the persisted computed column that is the result of some formula based on Col1. You can then even Create Indexes on Computed Columns.
test this out:
CREATE TABLE YourTable
(Col1 int not null identity(2000000,1)
,Col2 AS (Col1-2000000+3000000) PERSISTED
,Col3 varchar(5)
)
GO
insert into YourTable (col3) values ('a')
insert into YourTable (col3) SELECT 'b' UNION SELECT 'c'
SELECT * FROM YourTable
OUTPUT:
Col1 Col2 Col3
----------- ----------- -----
2000000 3000000 a
2000001 3000001 b
2000002 3000002 c
(3 row(s) affected)
EDIT After OPs comments, I'm still not 100% sure what you are after.
I never used SQL Server 2000 (we skipped that version), and I don't really want to look up how to do everything in that version, it is so limited without the OUTPUT clause and ROW_NUMBER(), CTEs, etc.
I can think of three methods to do:
1) You could just create a sequence table, where you have 2 rows one for A and one for B, each time you need to insert one, look up, increment, and save the value of the type of seq you need and then insert with that value. for example if you are inserting a type "A" row, do this:
INSERT INTO test2
(col1, col2, col3,...)
SELECT
ISNULL(MAX(NextSeq),0)+1, col2, col3,...
FROM YourSequenceTable WITH (UPDLOCK, HOLDLOCK)
WHERE SequenceType='A'
UPDATE YourSequenceTable
SET NextSeq=ISNULL(NextSeq,0)+1
WHERE SequenceType='A'
2) Change your table structure to save the data only in Tsel or Tadv, and have a trigger insert into a third, common table where you can keep your additional "common" identity. The common table would look like:
CommonTable
ID int not null identity(1,1) primary key
TselID int null FK to Tsel.PK
TadvID int null FK to Tadv.PK
3) If you need a single table, try this, which is a real hack. Change your Tsel and Tadv tables to contain all the necessary columns. From the application, INSERT INTO Tsel when the value is select, and have a trigger grab that identity value, INSERT it into test2, and then remove the data from Tsel. Likewise, when the value is adv, INSERT INTO Tadv and have a trigger on that table insert the data into test2 and remove the data from Tadv. You need all data columns in Tsel and Tadv so the trigger can copy the values to test2, but the trigger will remove the rows afterwards (the identity will stay sequential even if the original rows are removed).
your Tsel trigger would look like:
CREATE Trigger MAKEANID2_Tsel ON dbo.Tsel
AFTER INSERT
AS
--copy data from Tsel into test2., test2 can still have its own identity value
INSERT INTO test2
(PK, col1, col2, col3,...)
SELECT
col0, col1, col2, col3,....
FROM INSERTED
--remove rows from Tsel, which were just copied and not needed anymore.
DELETE Tsel
WHERE PK IN (SELECT PK FROM INSERTED)
GO

You are right to worry about @@IDENTITY; it is not recommended. If someone else adds a different trigger that inserts into a table with an identity column, and that trigger fires first, that is the value you will get.
But you have much bigger problems. Your trigger is designed to work on only one record at a time. This is a very, very bad thing to do with a trigger. Triggers operate on sets of data and must ALWAYS (even if you think there will never be more than one record inserted at a time) be set up to handle sets of data, not one record. Further, you don't need to ask for the identity: the identities of all records inserted in the batch are in a pseudotable called inserted, which is available in triggers.
Now, reading one of your comments, you say you can't have any missing values at all. In that case you cannot under any circumstances use an identity column, as it will have gaps if any transaction is rolled back. You will have to write your own process to create the numbers based on the last number, and look out for race conditions.
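A hand-rolled number generator of the kind described above could look roughly like this. It is a sketch under assumptions, not SQL Server code: it uses Python's sqlite3 module, and the form_counters table and next_form_number helper are hypothetical names. The idea is one counter row per series, read and incremented inside a single write transaction so two callers cannot receive the same number.

```python
import sqlite3


def next_form_number(conn, series):
    """Atomically hand out the next number for `series` ('adv' or 'select')."""
    # BEGIN IMMEDIATE takes the write lock up front, so no other writer can
    # read the counter between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (number,) = conn.execute(
            "SELECT next_number FROM form_counters WHERE series = ?", (series,)
        ).fetchone()
        conn.execute(
            "UPDATE form_counters SET next_number = next_number + 1 WHERE series = ?",
            (series,),
        )
        conn.execute("COMMIT")
        return number
    except Exception:
        conn.execute("ROLLBACK")  # a failed attempt must not consume a number
        raise


# isolation_level=None gives autocommit mode, so transactions are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE form_counters (series TEXT PRIMARY KEY, next_number INTEGER NOT NULL)"
)
conn.execute("INSERT INTO form_counters VALUES ('adv', 2000000), ('select', 3000000)")

numbers = [
    next_form_number(conn, "select"),
    next_form_number(conn, "adv"),
    next_form_number(conn, "select"),
]
print(numbers)
```

To stay gap-free in practice, the counter update would probably need to share a transaction with the insert of the form row itself, so that a rollback returns the number; the sketch leaves that part out.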

Asking a Microsoft SQL Server database for the next auto-generated identifier on a table

I have a table in a SQL Server database that has an auto-generated integer primary key. Without inserting a record into the table, I need to query the database and get what the next auto-generated ID number will be.
I think it's SQL Server version 2005, if that makes a difference.
Is there a way to do this?

Yes, but it's unreliable because another session might use the expected number.
If you still want to do this, use IDENT_CURRENT
Edit, as the comments have pointed out (improving my answer):
You need to add IDENT_INCR('MyTable') to this to get the potential next number.
Another process may roll back, so this number may not be the one used anyway.

No, there is not. The ID will only ever be defined and handed out when the actual INSERT happens.
You can check the last given ID by using
DBCC CHECKIDENT('YourTableName', NORESEED)
but that's just the last one used. There is no guarantee that the next one will really be this value + 1; it could be, but nothing guarantees it. (NORESEED makes the command report the current identity value without changing it.)

The only way to get a number that is guaranteed not to be taken by another process (i.e., to avoid a race condition) is to do the insert. Is there any reason you can't do a NULL insert (i.e., just insert into the table with NULLs or default values for all other columns) and then subsequently UPDATE it?
For example:
CREATE TABLE bob (
seq INTEGER IDENTITY (1,1) NOT NULL,
col1 INTEGER NULL
)
GO
DECLARE @seqid INTEGER
INSERT INTO bob DEFAULT VALUES
SET @seqid = SCOPE_IDENTITY()
-- do stuff with @seqid
UPDATE bob SET col1 = 42 WHERE seq = @seqid
GO
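The same insert-first, update-later pattern can be tried out from application code. The sketch below assumes SQLite through Python's sqlite3 module instead of SQL Server; cursor.lastrowid plays the role that SCOPE_IDENTITY() plays in the T-SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bob (seq INTEGER PRIMARY KEY AUTOINCREMENT, col1 INTEGER NULL)"
)

# Insert a placeholder row; the database assigns seq at this point, so the
# number is reserved for us and cannot be handed to another session.
cur = conn.execute("INSERT INTO bob DEFAULT VALUES")
seq_id = cur.lastrowid  # id generated by this cursor's INSERT

# ... do stuff with seq_id ...

# Fill in the real data once it is known.
conn.execute("UPDATE bob SET col1 = ? WHERE seq = ?", (42, seq_id))
conn.commit()

row = conn.execute("SELECT seq, col1 FROM bob").fetchone()
print(row)
```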

You shouldn't use the technique in code, but if you need to do it for investigative purposes:
select ident_current('foo') + ident_incr('foo')
That gives you the last value generated plus the increment for the identity, so it should represent the next choice SQL Server would make, without inserting a row to find out. This is a correct value even if a rollback has pushed the value forwards, but again, this is investigative SQL, not something I would put in code.
The two values can also be found in the sys.identity_columns catalog view; the fields are increment_value and last_value.

Another way, depending on what you're doing, is to insert whatever data goes into the table and then use @@identity to retrieve the id of the record inserted.
example:
declare @table table (id int identity(1,1), name nvarchar(10))
insert into @table values ('a')
insert into @table values ('b')
insert into @table values ('c')
insert into @table values ('d')
select @@identity
insert into @table values ('e')
insert into @table values ('f')
select @@identity

This is pretty much a bad idea straight off the bat, but if you don't anticipate high volume and/or concurrency issues, you could just do something like this
select @nextnum = max(Id) + 1 from MyTable

I don't think that's possible out of the box in MS SQL (any version). You can do this with a column of type uniqueidentifier and the function NEWID().
For int column, you would have to implement your own sequential generator.

Row number in Sybase tables

Sybase db tables do not have a concept of self-updating row numbers. However, for one of the modules, I require a row number for each row in the database, such that max(Column) would always tell me the number of rows in the table.
I thought I'd introduce an int column and keep updating it to keep track of the row number. However, I'm having problems updating this column in the case of deletes. What SQL should I use in a delete trigger to update this column?

You can easily assign a unique number to each row by using an identity column. The identity can be a numeric or an integer (in ASE12+).
This will almost do what you require. There are certain circumstances in which you will get a gap in the identity sequence (these are called "identity gaps"). Deletes will also cause gaps in the sequence, as you've identified.
Why do you need to use max(col) to get the number of rows in the table, when you could just use count(*)? If you're trying to get the last row from the table, then you can do
select * from table where column = (select max(column) from table).
Regarding the delete trigger to update a manually managed column, I think this would be a potential source of deadlocks, and many performance issues. Imagine you have 1 million rows in your table, and you delete row 1, that's 999999 rows you now have to update to subtract 1 from the id.
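One way to sidestep the mass update entirely is to compute row numbers at query time instead of storing them. The sketch below illustrates that idea under assumptions: it runs on SQLite via Python's sqlite3 module rather than Sybase, and my_table with its payload column is a hypothetical table. A correlated COUNT(*) produces a gap-free numbering even after a delete, and COUNT(*) on the table gives the row count directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO my_table (payload) VALUES (?)",
    [("a",), ("b",), ("c",), ("d",)],
)
conn.execute("DELETE FROM my_table WHERE id = 2")  # leaves a gap in id

# rownum = number of rows whose id is <= this row's id, so it is always
# dense (1, 2, 3, ...) no matter which ids have been deleted.
rows = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM my_table AS t2 WHERE t2.id <= t.id) AS rownum,
           t.id,
           t.payload
    FROM my_table AS t
    ORDER BY t.id
    """
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(rows, total)
```

The correlated subquery is quadratic in the worst case, so on a large table this trades update cost for query cost; whether that is acceptable depends on how often the numbering is actually read.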

Delete trigger
CREATE TRIGGER tigger ON myTable FOR DELETE
AS
update myTable
set id = id - (select count(*) from deleted d where d.id < t.id)
from myTable t
To avoid locking problems
You could add an extra table (which joins to your primary table) like this:
CREATE TABLE rowCounter
(id int, -- foreign key to main table
rownum int)
... and use the rownum field from this table.
If you put the delete trigger on this table then you would hugely reduce the potential for locking problems.
Approximate solution?
Does the table need to keep its rownumbers up to date all the time?
If not, you could have a job which runs every minute or so, which checks for gaps in the rownum, and does an update.
Question: do the rownumbers have to reflect the order in which rows were inserted?
If not, you could do far fewer updates, but only updating the most recent rows, "moving" them into gaps.
Leave a comment if you would like me to post any SQL for these ideas.

I'm not sure why you would want to do this. You could experiment with using temporary tables and "select into" with an Identity column like below.
create table test
(
col1 int,
col2 varchar(3)
)
insert into test values (100, "abc")
insert into test values (111, "def")
insert into test values (222, "ghi")
insert into test values (300, "jkl")
insert into test values (400, "mno")
select rank = identity(10), col1 into #t1 from test
select * from #t1
delete from test where col2="ghi"
select rank = identity(10), col1 into #t2 from test
select * from #t2
drop table test
drop table #t1
drop table #t2
This would give you a dynamic id (of sorts)
