I have a SQL Server table with three columns; the first two columns make up the primary key. I'm writing a stored procedure that updates the last two columns in bulk. It works fine as long as there are no primary key violations, but when there is a primary key violation it throws an error and stops executing.
How can I make it skip the offending row and continue updating the remaining records, as long as they don't cause a primary key violation?
Is there a better way to approach this problem? I'm only doing a simple UPDATE with a WHERE clause like column2 = somevalue AND column3 = somevalue.
In SQL Server you'd use MERGE to upsert (i.e. insert or update):
MERGE mytable
USING (SELECT 1 as key1, 2 as key2, 3 as col1, 4 as col2) AS src
ON (mytable.key1 = src.key1 AND mytable.key2 = src.key2)
WHEN MATCHED THEN
UPDATE SET col1 = src.col1, col2 = src.col2
WHEN NOT MATCHED THEN
INSERT (key1, key2, col1, col2) VALUES (src.key1, src.key2, src.col1, src.col2);
There is nothing inherently wrong with your question, despite the rather loud protestations. It is confusing, though, especially when you refer to columns by position; that is a big no-no. A script that reproduces the problem is generally the best way to both demonstrate it and get useful suggestions.
The short answer to your question is: you can't. A statement either succeeds or fails as a whole. If you want to update each row individually and ignore certain errors, then you need to write your T-SQL to do that (a sketch of that approach follows below).
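For illustration, here is a minimal sketch of such a row-by-row update. All names are hypothetical stand-ins for the question's table and the stored procedure's parameters: dbo.MyTable(column1, column2, column3) with primary key (column1, column2).
-- Hypothetical stand-ins for the stored procedure's parameters
DECLARE @newCol2 int = 2, @newCol3 int = 3,
        @filterCol2 int = 1, @filterCol3 int = 5;
DECLARE @c1 int, @c2 int;
-- Walk the candidate rows one by one
DECLARE cur CURSOR LOCAL STATIC FOR
    SELECT column1, column2
    FROM dbo.MyTable
    WHERE column2 = @filterCol2 AND column3 = @filterCol3;
OPEN cur;
FETCH NEXT FROM cur INTO @c1, @c2;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        UPDATE dbo.MyTable
        SET column2 = @newCol2, column3 = @newCol3
        WHERE column1 = @c1 AND column2 = @c2;
    END TRY
    BEGIN CATCH
        -- 2627 = primary key violation: skip this row and keep going
        IF ERROR_NUMBER() <> 2627
            RAISERROR('Unexpected error during row-by-row update', 16, 1);
    END CATCH;
    FETCH NEXT FROM cur INTO @c1, @c2;
END;
CLOSE cur;
DEALLOCATE cur;
Row-by-row processing is slow, so treat this as a last resort; a set-based check, like the one demonstrated further down, is usually preferable.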
And despite the protests (again), there are situations where it is necessary to update columns that are part of the primary key. It is unusual - very unusual - but you should also be wary of any absolute statement about T-SQL. When you find yourself doing unusual things, you should review your schema (and your approach), because it is quite possible that there are better ways to accomplish your goal.
And in this case, I suggest that you really SHOULD think about what you are trying to accomplish. If you want to update a set of rows in a particular way and the statement fails, that means there is a flaw somewhere. Typically, this error implies that your update logic is not correct; perhaps you are assuming something about your data that is not accurate. It is impossible to know from a distance. The error message will tell you which set of values caused the conflict, so that should give you sufficient information to investigate. As another tool, write a SELECT statement that demonstrates your proposed update and look for the values from the error message. E.g.:
set nocount on;
create table #x (a smallint not null, b smallint not null, c varchar(10) not null, constraint xx primary key(a, b));
insert #x (a, b, c) values (1, 1, 'test'), (1, 2, 'zork');
select * from #x;
-- this update fails: row (1, 1) would become (1, 2), which already exists
update #x set b = 2, c = 'dork';
-- this select "previews" the proposed update so you can spot the conflicting key values
select a, b, c, cast(2 as smallint) as new_b, 'dork' as new_c
from #x
order by a, new_b;
drop table #x;
If I have a table with an auto-incrementing ID column, I'd like to be able to insert a row into that table, and get the ID of the row I just created. I know that generally, StackOverflow questions need some sort of code that was attempted or research effort, but I'm not sure where to begin with Snowflake. I've dug through their documentation and I've found nothing for this.
The best I could do so far is try result_scan() and last_query_id(), but these don't give me any relevant information about the row that was inserted, just confirmation that a row was inserted.
I believe what I'm asking for is along the lines of MS SQL Server's SCOPE_IDENTITY() function.
Is there a Snowflake equivalent function for MS SQL Server's SCOPE_IDENTITY()?
EDIT: for the sake of having code in here:
CREATE TABLE my_db..my_table
(
ROWID INT IDENTITY(1,1),
some_number INT,
a_time TIMESTAMP_LTZ(9),
b_time TIMESTAMP_LTZ(9),
more_data VARCHAR(10)
);
INSERT INTO my_db..my_table
(
some_number,
a_time,
more_data
)
VALUES
(1, my_time_value, some_data);
I want to get to that auto-increment ROWID for this row I just inserted.
NOTE: The answer below may not be 100% correct in some very rare cases; see the UPDATE section below.
Original answer
Snowflake does not provide the equivalent of SCOPE_IDENTITY today.
However, you can exploit Snowflake's time travel to retrieve the maximum value of a column right after a given statement is executed.
Here's an example:
create or replace table x(rid int identity, num int);
insert into x(num) values(7);
insert into x(num) values(9);
-- you can insert rows in a separate transaction now to test it
select max(rid) from x AT(statement=>last_query_id());
----------+
 MAX(RID) |
----------+
 2        |
----------+
You can also save the last_query_id() into a variable if you want to access it later, e.g.
insert into x(num) values(5);
set qid = last_query_id();
...
select max(rid) from x AT(statement=>$qid);
Note: it will usually be correct, but if a user e.g. manually inserts a large value into rid, it might influence the result of this query.
UPDATE
Note: I realized the code above might, in rare cases, generate an incorrect answer.
Since the execution order of the various phases of a query in a distributed system like Snowflake can be non-deterministic, and Snowflake allows concurrent INSERT statements, the following might happen:
Two queries, Q1 and Q2, each doing a simple single-row INSERT, start at roughly the same time
Q1 starts, is a bit ahead
Q2 starts
Q1 creates a row with value 1 from the IDENTITY column
Q2 creates a row with value 2 from the IDENTITY column
Q2 gets ahead of Q1 - this is the key part
Q2 commits, is marked as finished at time T2
Q1 commits, is marked as finished at time T1
Note that T1 is later than T2. Now, when we try to do SELECT ... AT(statement=>Q1), we will see the state as-of T1, including all changes from statements before, hence including the value 2 from Q2. Which is not what we want.
The way around it could be to add a unique identifier to each INSERT (e.g. from a separate SEQUENCE object), and then use a MAX.
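For illustration, here is a minimal sketch of one way to read that workaround. The batch_id column and the x_batch_seq sequence are assumptions added for the example; they tag the rows of each INSERT so they can be found without relying on time travel:
create or replace sequence x_batch_seq;
create or replace table x(rid int identity, num int, batch_id int);
-- Grab a unique identifier for this insert up front
set bid = (select x_batch_seq.nextval);
-- Tag the inserted row(s) with that identifier
insert into x(num, batch_id) select 5, $bid;
-- The row(s) from this insert can now be found unambiguously,
-- regardless of what concurrent inserts committed in the meantime
select max(rid) from x where batch_id = $bid;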
Sorry. Distributed transactions are hard :)
If I have a table with an auto-incrementing ID column, I'd like to be able to insert a row into that table, and get the ID of the row I just created.
FWIW, here's a slight variation of the current accepted answer (using Snowflake's 'Time Travel' feature) that gives any column values "of the row I just created." It applies to auto-incrementing sequences and more generally to any column configured with a default (e.g. CURRENT_TIMESTAMP() or UUID_STRING()). Further, I believe it avoids any inconsistencies associated with a second query utilizing MAX().
Assuming this table setup:
CREATE TABLE my_db.my_table
(
ROWID INT IDENTITY(1,1),
some_number INT,
a_time TIMESTAMP_LTZ(9),
b_time TIMESTAMP_LTZ(9),
more_data VARCHAR(10)
);
Make sure change tracking is enabled for this table (it is what the CHANGES clause used below relies on):
ALTER TABLE my_db.my_table SET change_tracking = true;
Perform the INSERT per usual:
INSERT INTO my_db.my_table
(
some_number,
a_time,
more_data
)
VALUES
(1, my_time_value, some_data);
Use the CHANGES clause, with BEFORE(statement => ...) and END(statement => ...) both set to LAST_QUERY_ID(), to SELECT exactly the row(s) added to my_table by the previous INSERT statement, with the column values as they existed the moment the row(s) were added, including any defaults:
SET insertQueryId=LAST_QUERY_ID();
SELECT
ROWID,
some_number,
a_time,
b_time,
more_data
FROM my_db.my_table
CHANGES(information => default)
BEFORE(statement => $insertQueryId)
END(statement => $insertQueryId);
For more information on the CHANGES, BEFORE, and END clauses, see the Snowflake documentation.
I'd like something like
INSERT VALUES(1,2,3) INTO sometable ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
So I'd like the following behavior:
#CREATE TABLE sometable (a int primary key, b int, c int);
CREATE TABLE
#INSERT INTO sometable VALUES (1,2,3) ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
INSERT 0 1
#INSERT INTO sometable VALUES (1,2,3) ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
INSERT 0 0
#INSERT INTO sometable VALUES (1,3,2) ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
ERROR: duplicate key value violates unique constraint "sometable_pkey"
DETAIL: Key (a)=(1) already exists.
Desiring this seems like a very natural thing, because a client application can't assume it knows whether an insert succeeded (if Postgres or the client crashes or the network fails, the request might have been processed but the client never receives confirmation). So any well-written application needs to deal with this case somehow.
However, the least bad way of achieving this that I have found is still very annoying:
INSERT INTO sometable (a,b,c) VALUES (1,2,3) ON CONFLICT (a) DO UPDATE SET b = EXCLUDED.b WHERE sometable.b = EXCLUDED.b AND sometable.c = EXCLUDED.c;
In other words, do a no-op update, but only if the values are what you would have inserted, and then throw an error if 0 rows (rather than 1) were touched.
Is there a better way?
You can use an INSERT based on a select:
insert into sometable
select *
from ( values (1,2,3) ) as data(a,b,c)
where not exists (select *
from sometable
where data = sometable);
Yes, the condition where data = sometable is valid in Postgres and simply compares all columns.
This can also be extended to multiple rows:
insert into sometable
select *
from (
values
(1,2,3),
(4,5,6),
(7,8,9)
) as data(a,b,c)
where not exists (select *
from sometable
where data = sometable);
This does not prevent PK violation errors (as ON CONFLICT does) if done from multiple transactions, though. You still need to handle those errors.
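If you want to tolerate the case where another transaction inserted the identical row concurrently, one possible sketch (not part of the original answer) is to catch the unique_violation and re-check whether the existing row is in fact identical, using the sometable definition from the question:
DO $$
BEGIN
    INSERT INTO sometable
    SELECT * FROM (VALUES (1,2,3)) AS data(a,b,c)
    WHERE NOT EXISTS (SELECT * FROM sometable WHERE data = sometable);
EXCEPTION WHEN unique_violation THEN
    IF NOT EXISTS (SELECT 1 FROM sometable WHERE a = 1 AND b = 2 AND c = 3) THEN
        RAISE;   -- a conflicting but different row exists: surface the error
    END IF;
    -- otherwise the identical row was inserted concurrently; ignore it
END;
$$;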
I want to set up a table with a constraint on it, but when I insert records, I don't want to get any constraint violation errors. I would like SQL to quietly drop any records that aren't unique, but carry on inserting those that can be inserted.
For example:
create table table1
(value1 int,
value2 int,
constraint uc_tab1 Unique (value1,value2)
)
create table table2
(value1 int,
value2 int
)
insert into table2 (value1,value2)
select 1,1
union all
select 2,1
union all
select 3,1
union all
select 1,1
insert into table1
select value1,value2 from table2
At the moment, this will fall over on a constraint violation. I want to suppress that error, so that table1 contains...
1,1
2,1
3,1
(in this example, I could just do a group by on table2, but in my actual application that isn't really viable)
I vaguely remember reading something about this years ago, but I might have imagined it. Is this possible?
Many thanks in advance
Please don't do this; you will lose data very easily.
Instead, try to change your application so it only inserts valid data instead of dropping incorrect data.
You can use the IGNORE_DUP_KEY index option, although personally I think it is better to find another way of solving your problem.
You can set it to ON to only generate warnings for inserted rows that violate the unique constraint instead of generating errors.
Look into the MERGE statement. It's complex, but can be made to do what you are describing.
(There is or was something that could cause an INSERT statement to continue to insert data even if some rows could not be inserted, but for the life of me I can't find it in BOL or recall what it was called. I'm pretty sure it raised errors anyway, and it always sounded like a horrible idea to me.)
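For illustration, a minimal sketch of how MERGE could be made to do this with the table1/table2 definitions above; the DISTINCT collapses the duplicates within table2 itself, and WHEN NOT MATCHED skips anything already present in table1:
MERGE table1 AS t
USING (SELECT DISTINCT value1, value2 FROM table2) AS s
    ON t.value1 = s.value1 AND t.value2 = s.value2
WHEN NOT MATCHED THEN
    INSERT (value1, value2) VALUES (s.value1, s.value2);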
Specifying Ignore_Dup_Key when I created my constraint did the trick. In the above example, I changed the table1 definition to....
create table table1
(value1 int,
value2 int,
constraint uc_tab1 Unique (value1,value2) WITH (IGNORE_DUP_KEY = ON)
)
And it worked perfectly
I ran about 1.2 million (12 lakh) INSERT commands against a single table, but after some time the query terminated. How can I find the last inserted record?
a) The table doesn't have a created-date column.
b) I can't apply an ORDER BY clause, because the primary key values are generated manually.
c) LAST() is not a built-in function in MS SQL.
Alternatively, is there any way to find the last executed query?
There must be some way, but I'm not able to figure it out.
The table contains only a primary key constraint, no other constraints.
As requested in the comments, here is a quick and dirty manual solution, assuming you've got the list of INSERT statements (or the corresponding data) in the same sequence as the issued INSERTs. For this example I assume 1 million records.
INSERT ... VALUES (1, ...)
...
INSERT ... VALUES (250000, ...)
...
INSERT ... VALUES (500000, ...)
...
INSERT ... VALUES (750000, ...)
...
INSERT ... VALUES (1000000, ...)
You just have to find the last PK that was inserted, and luckily, in this case, the keys were issued in a known ascending order. So you start doing a manual binary search on the table, issuing:
SELECT pk FROM myTable WHERE pk = 500000
If you get a row back, you know the inserts got at least that far. Continue checking with pk = 750000; if that row is there, check pk = 875000. If 750000 is not there, the INSERTs must have stopped earlier, so check pk = 625000 next. For 1 million rows this process stops after about 20 steps.
It's just plain manual divide and conquer.
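For illustration, here is a minimal sketch that automates that manual search, assuming the same hypothetical myTable/pk names and that the keys were issued in ascending order from 1 to 1,000,000:
-- Binary search for the largest pk value that made it into the table
DECLARE @lo INT = 1, @hi INT = 1000000, @mid INT, @lastFound INT = 0;
WHILE @lo <= @hi
BEGIN
    SET @mid = (@lo + @hi) / 2;
    IF EXISTS (SELECT 1 FROM myTable WHERE pk = @mid)
    BEGIN
        SET @lastFound = @mid;   -- this key made it in; search the upper half
        SET @lo = @mid + 1;
    END
    ELSE
        SET @hi = @mid - 1;      -- this key is missing; search the lower half
END
SELECT @lastFound AS last_inserted_pk;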
There is a way.
Unfortunately, you have to set this up in advance for it to help you.
So if, by any chance, you still have the primary keys you inserted at hand, go ahead and delete all rows that have those keys:
DELETE FROM tableName WHERE ID IN (id1, id2, ...., idn)
Then you enable Change Data Capture for your database (with that database already selected):
EXEC sys.sp_cdc_enable_db;
Now you also need to enable Change Data Capture for that table; in an example that I tried, I could just run:
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'tableName', @role_name = NULL;
Now you are almost set up! You need to look into your system services and verify that SQL Server Agent is running for your DBMS; if it is not, capturing will not happen.
Now when you insert something into your table you can select data changes from a new table called [cdc].[dbo_tableName_CT]:
SELECT [__$start_lsn]
,[__$end_lsn]
,[__$seqval]
,[__$operation]
,[__$update_mask]
,[ID]
,[Value]
FROM [cdc].[dbo_tableName_CT]
GO
The output includes the CDC metadata columns alongside your table's own columns (ID and Value in this example). You can order by __$seqval, which should give you the order in which the rows were inserted.
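For illustration, a minimal sketch of that query against the hypothetical capture table above (in CDC, __$operation = 2 marks inserts):
SELECT TOP (1) [ID], [Value]
FROM [cdc].[dbo_tableName_CT]
WHERE [__$operation] = 2        -- 2 = insert
ORDER BY [__$seqval] DESC;      -- most recently inserted row first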
NOTE: this feature seems not to be present in SQL Server Express
I have a Constraint on a table with IGNORE_DUP_KEY. This allows bulk inserts to partially work where some records are dupes and some are not (only inserting the non-dupes). However, it does not allow updates to partially work, where I only want those records updated where dupes will not be created.
Does anyone know how I can support IGNORE_DUP_KEY when applying updates?
I am using MS SQL 2005
If I understand correctly, you want to do UPDATEs without specifying the necessary WHERE logic to avoid creating duplicates?
create table #t (col1 int not null, col2 int not null, primary key (col1, col2))
insert into #t
select 1, 1 union all
select 1, 2 union all
select 2, 3
-- you want to do just this...
update #t set col2 = 1
-- ... but you really need to do this
update #t set col2 = 1
where not exists (
select * from #t t2
where #t.col1 = t2.col1 and col2 = 1
)
The main options that come to mind are:
Use a complete UPDATE statement to avoid creating duplicates
Use an INSTEAD OF UPDATE trigger to 'intercept' the UPDATE and only do UPDATEs that won't create a duplicate
Use a row-by-row processing technique such as cursors and wrap each UPDATE in TRY...CATCH... or whatever the language's equivalent is
I don't think anyone can tell you which one is best, because it depends on what you're trying to do and what environment you're working in. But because row-by-row processing could potentially produce some false positives, I would try to stick with a set-based approach.
I'm not sure what is really going on, but if you are inserting duplicates and updating Primary Keys as part of a bulk load process, then a staging table might be the solution for you. You create a table that you make sure is empty prior to the bulk load, then load it with the 100% raw data from the file, then process that data into your real tables (set based is best). You can do things like this to insert all rows that don't already exist:
INSERT INTO RealTable
(pk, col1, col2, col3)
SELECT
pk, col1, col2, col3
FROM StageTable s
WHERE NOT EXISTS (SELECT
1
FROM RealTable r
WHERE s.pk=r.pk
)
Preventing the duplicates in the first place is best. You could also do UPDATEs on your real table by joining in the staging table, etc. This avoids the need to "work around" the constraints; when you work around the constraints, you usually create difficult-to-find bugs.
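For illustration, a minimal sketch of that update-by-join, using the same RealTable/StageTable names as above:
-- Update only the rows that already exist in the real table,
-- pulling the new values from the staging table
UPDATE r
SET r.col1 = s.col1,
    r.col2 = s.col2,
    r.col3 = s.col3
FROM RealTable r
JOIN StageTable s
    ON s.pk = r.pk;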
My feeling is that you should use the MERGE statement, and in the UPDATE part simply not update the key you want to keep unique. That also means you have to define that key as unique in your table (by setting a unique index or defining it as the primary key). Then any update or insert with a duplicate key will fail.
Edit: I think this link will help on that:
http://msdn.microsoft.com/en-us/library/bb522522.aspx
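For illustration, a minimal sketch of that idea. The names are hypothetical, with (col1, col2) as the unique key and col3 as ordinary data; note that MERGE requires SQL Server 2008 or later:
MERGE dbo.TargetTable AS tgt
USING (SELECT 1 AS col1, 2 AS col2, 'new data' AS col3) AS src
    ON tgt.col1 = src.col1 AND tgt.col2 = src.col2
WHEN MATCHED THEN
    UPDATE SET col3 = src.col3          -- the unique key columns are never updated
WHEN NOT MATCHED THEN
    INSERT (col1, col2, col3) VALUES (src.col1, src.col2, src.col3);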