batch update a column using sql

I have a DB2 table with a very large number of rows (a couple of million). I need to change the data type of one of the columns.
In DB2 LUW there does not seem to be a way to change the data type of a column directly (ALTER TABLE ALTER COLUMN SET DATA TYPE does not work). So I am creating a new column, copying the data to it, and dropping the old column.
Since it's not a good idea to do a single direct update on the whole table, I'm creating a procedure that will update and commit 10000 rows at a time.
Given this, I have the following questions:
What is the best way to carry out the update here? As far as I can tell, a cursor allows iteration over one row at a time. Is updating 10000 rows one at a time, then committing, and repeating until the whole table is updated the correct way to do it?
Is there a simpler way to handle the original problem of changing the data type of a column?

DB2 has a feature called LOAD FROM CURSOR that allows for fast migration of data.
The following is an example of this using LOAD FROM CURSOR:
-- this is the original table
CREATE TABLE TEST (
ID INT NOT NULL GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
TEXT VARCHAR(50)
)#
CREATE TABLE TEST_NEW (
ID INT NOT NULL GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
TEXT CLOB(5000000)
)#
DECLARE C1 CURSOR FOR SELECT ID, TEXT FROM TEST#
LOAD FROM C1 OF CURSOR INSERT INTO TEST_NEW (ID,TEXT)#
DROP TABLE TEST#
RENAME TABLE TEST_NEW TO TEST#
In addition, the following procedure can also be used. It copies the data into a new column on the same table and commits after every 10000 records:
-- this is the original table
CREATE TABLE TEST (
  ID INT NOT NULL GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  TEXT VARCHAR(50)
)#
-- add the new column that will receive the copied data
ALTER TABLE TEST ADD COLUMN TEXT_NEW CLOB(5000000)#
CREATE OR REPLACE PROCEDURE MIGRATE_TEST()
LANGUAGE SQL
BEGIN
  DECLARE EOF INT DEFAULT 0;
  DECLARE CUR_COUNT INT DEFAULT 0;
  DECLARE CUR_ID INT DEFAULT 0;
  -- WITH HOLD keeps the cursor open across the intermediate commits
  DECLARE C CURSOR WITH HOLD FOR
    SELECT ID FROM TEST WHERE TEXT_NEW IS NULL;
  DECLARE CONTINUE HANDLER FOR NOT FOUND
    SET EOF = 1;
  OPEN C;
  FETCH_LOOP: LOOP
    FETCH FROM C INTO CUR_ID;
    IF EOF <> 0 THEN
      LEAVE FETCH_LOOP;
    END IF;
    UPDATE TEST
      SET TEXT_NEW = TEXT
      WHERE ID = CUR_ID;
    SET CUR_COUNT = CUR_COUNT + 1;
    IF CUR_COUNT >= 10000 THEN
      CALL DBMS_OUTPUT.PUT_LINE('COMMITTING');
      COMMIT WORK;
      SET CUR_COUNT = 0;
    END IF;
  END LOOP FETCH_LOOP;
  -- commit the final partial batch
  COMMIT WORK;
  CLOSE C;
END#
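As a rough illustration of the commit-every-N-rows idea in the procedure above, here is a minimal Python sketch that uses the standard sqlite3 module as a stand-in for DB2. It is not DB2 code: the function name migrate_in_batches, the LIMIT-based batching, and the in-memory table are assumptions made for the example. Unlike the cursor loop, it updates one batch per statement, and the TEXT IS NOT NULL guard keeps rows with an empty source value from being selected forever.

```python
import sqlite3


def migrate_in_batches(conn, batch_size=10000):
    """Copy TEXT into TEXT_NEW in chunks, committing after each chunk."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE TEST
               SET TEXT_NEW = TEXT
             WHERE ID IN (SELECT ID FROM TEST
                           WHERE TEXT_NEW IS NULL AND TEXT IS NOT NULL
                           ORDER BY ID
                           LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()  # release locks and keep each transaction small
        if cur.rowcount == 0:
            break  # nothing left to migrate
        total += cur.rowcount
    return total


# Hypothetical demo data: 25 rows migrated in batches of 10.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TEST (ID INTEGER PRIMARY KEY, TEXT TEXT, TEXT_NEW TEXT)"
)
conn.executemany(
    "INSERT INTO TEST (TEXT) VALUES (?)", [(f"row {i}",) for i in range(25)]
)
conn.commit()

migrated = migrate_in_batches(conn, batch_size=10)
remaining = conn.execute(
    "SELECT COUNT(*) FROM TEST WHERE TEXT_NEW IS NULL"
).fetchone()[0]
```

Because each pass only touches rows that are still unmigrated, the loop can be stopped and restarted without redoing finished batches.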

Related

Generate a unique column sequence value based on a query handling concurrency

I have a requirement to automatically generate a column's value based on another query's result. Because this column value must be unique, I need to take into consideration concurrent requests. This query needs to generate a unique value for a support ticket generator.
The template for the unique value is CustomerName-Month-Year-SupportTicketForThisMonthCount.
So the script should automatically generate:
AcmeCo-10-2019-1
AcmeCo-10-2019-2
AcmeCo-10-2019-3
and so on as support tickets are created. How can I ensure that AcmeCo-10-2019-1 is not generated twice if two support tickets are created at the same time for AcmeCo?
insert into SupportTickets (name)
select concat_ws('-', @CustomerName, @Month, @Year, COUNT(*))
from SupportTickets
where customerName = @CustomerName
and CreatedDate between @MonthStart and @MonthEnd;

One possibility:
Create a counter table:
create table Counter (
  Id int identity(1,1),
  Name varchar(64),
  Count1 int
)
Name is a unique identifier for the sequence. In your case Name would be CustomerName-Month-Year, i.e. you would end up with one row in this table for every Customer/Year/Month combination.
Then write a stored procedure similar to the following to allocate a new sequence number:
create procedure [dbo].[Counter_Next]
(
  @Name varchar(64)
  , @Value int out -- Value to be used
)
as
begin
  set nocount, xact_abort on;
  declare @Temp int;
  begin tran;
  -- Ensure we have an exclusive lock before changing variables
  select top 1 1 from dbo.Counter with (tablockx);
  set @Value = null; -- if a value is passed in it stuffs us up, so null it
  -- Attempt an update and assignment in a single statement
  update dbo.[Counter] set
    @Value = Count1 = Count1 + 1
  where [Name] = @Name;
  if @@rowcount = 0 begin
    set @Value = 10001; -- Some starting value
    -- Create a new record if none exists
    insert into dbo.[Counter] ([Name], Count1)
    select @Name, @Value;
  end;
  commit tran;
  return 0;
end;
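The same counter-table idea can be sketched in Python with the standard sqlite3 module standing in for SQL Server. This is only an illustration under stated assumptions: next_counter is a hypothetical helper, BEGIN IMMEDIATE plays the role that the exclusive table lock plays above, and the starting value is 1 (to match the ticket numbers in the question) rather than 10001.

```python
import sqlite3


def next_counter(conn, name, start=1):
    """Allocate the next number for `name` inside one write transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        cur = conn.execute(
            "UPDATE Counter SET Count1 = Count1 + 1 WHERE Name = ?", (name,)
        )
        if cur.rowcount == 0:
            # No row yet for this customer/month/year: create it.
            conn.execute(
                "INSERT INTO Counter (Name, Count1) VALUES (?, ?)",
                (name, start),
            )
        value = conn.execute(
            "SELECT Count1 FROM Counter WHERE Name = ?", (name,)
        ).fetchone()[0]
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Counter ("
    "Id INTEGER PRIMARY KEY, Name TEXT UNIQUE, Count1 INTEGER)"
)

tickets = [
    f"AcmeCo-10-2019-{next_counter(conn, 'AcmeCo-10-2019')}"
    for _ in range(3)
]
```

The important property is that the increment and the read happen inside the same transaction that holds the write lock, so two callers cannot be handed the same number.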

You could look into using a TIME type instead of COUNT(*) to create unique values. That way duplicates are much less likely. Hope that helps.

Best way to generate a UniqueID for a group of rows?

This is very simplified, but I have a web service array of items that looks something like this:
[12345, 34131, 13431]
I am going to loop through the array and insert the items one by one into a database. Each value would be tied to a unique identifier showing that they belong to the same group, so the table should look like this:
1 12345
1 34131
1 13431
If another array came along, all of its numbers would be inserted with unique ID 2, and so on. Basically, this is to keep track of groups.
There will potentially be multiple processes executing this at the same time, so what would be the best way to generate the unique identifier and also ensure that two processes cannot use the same one?

You should fix your data model. It is missing an entity, say, batches.
create table batches (
batch_id int identity(1, 1) primary key,
created_at datetime default getdate()
);
You might have other information as well.
And your table should have a foreign key reference, batch_id to batches.
Then your code should do the following:
1. Insert a new row into batches. A new batch has begun.
2. Fetch the id that was just created.
3. Use this id for the rows that you want to insert.
Although you could do this with a sequence, a separate table makes more sense to me. You are tying a bunch of rows together into something. That something should be represented in the data model.
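The three steps above can be sketched as follows, using Python's standard sqlite3 module instead of SQL Server so the example runs on its own. The items table, its value column, and the insert_group helper are hypothetical names introduced only for this illustration.

```python
import sqlite3


def insert_group(conn, items):
    """Create one batch row, then insert every item under that batch id."""
    with conn:  # commits on success, rolls back if anything fails
        cur = conn.execute("INSERT INTO batches DEFAULT VALUES")
        batch_id = cur.lastrowid  # id generated for the new batch
        conn.executemany(
            "INSERT INTO items (batch_id, value) VALUES (?, ?)",
            [(batch_id, v) for v in items],
        )
    return batch_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batches ("
    "batch_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute(
    "CREATE TABLE items ("
    "batch_id INTEGER NOT NULL REFERENCES batches(batch_id), "
    "value INTEGER NOT NULL)"
)

first = insert_group(conn, [12345, 34131, 13431])
second = insert_group(conn, [111, 222])
rows = conn.execute(
    "SELECT batch_id, value FROM items ORDER BY rowid"
).fetchall()
```

Since the database generates the batch id, concurrent processes each get their own value without any application-level coordination.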

You can declare this:
DECLARE @UniqueID UNIQUEIDENTIFIER = NEWID();
and use it as your unique identifier when you insert your batch.

Since it isn't a primary key, an identity column is out. Honestly, I'd probably just track it using a separate ID sequence table. Create a proc that grabs the next available ID and then increments it. If you open a transaction at the beginning of the proc, it should prevent a second thread from getting the number until the first thread is done with its update.
Something like:
CREATE PROCEDURE getNextID
  @NextNumber INT OUTPUT
  ,@id_type VARCHAR(20)
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @NextValue TABLE (NextNumber int);
  BEGIN TRANSACTION;
  -- increment the stored counter and capture the new value in one statement
  UPDATE id_sequence
  SET last_used_number = ISNULL(last_used_number, 0) + 1
  OUTPUT inserted.last_used_number INTO @NextValue(NextNumber)
  WHERE id_type = @id_type;
  SELECT @NextNumber = NextNumber FROM @NextValue;
  COMMIT TRANSACTION;
END

SQL update if exist and insert else and return the key of the row

I have a table named WORD with the following columns
WORD_INDEX INT NOT NULL AUTO_INCREMENT,
CONTENT VARCHAR(255),
FREQUENCY INT
What I want to do is this: when I try to add a row to the table and a row with the same CONTENT already exists, I want to increment its FREQUENCY by 1. Otherwise I want to add the row to the table. In both cases, the WORD_INDEX of the newly inserted or updated row must be returned.
I want to do this in an H2 database with a single query.
I have tried 'on duplicate key update', but this does not seem to work in H2.
PS: I can do this by first running a select query on CONTENT and then, if I get an empty result set, running an insert query, or otherwise an update query. But as I have a very large number of words, I am trying to optimize the insert operation by reducing the number of database interactions.

Per your edited question, you can achieve this using a stored procedure like the one below [sample code, MySQL syntax]:
DELIMITER $$
CREATE PROCEDURE sp_insert_update_word(IN CONTENT_DATA VARCHAR(255),
  IN FREQ INT, OUT Insert_Id INT)
BEGIN
  DECLARE rec_count INT;
  -- check whether the content is already stored
  SELECT COUNT(*) INTO rec_count FROM WORD WHERE CONTENT = CONTENT_DATA;
  IF (rec_count > 0) THEN
    UPDATE WORD SET FREQUENCY = FREQUENCY + 1 WHERE CONTENT = CONTENT_DATA;
    SELECT NULL INTO Insert_Id;
  ELSE
    INSERT INTO WORD(CONTENT, FREQUENCY) VALUES(CONTENT_DATA, FREQ);
    SELECT LAST_INSERT_ID() INTO Insert_Id;
  END IF;
END$$
DELIMITER ;
Then call your procedure and select the returned inserted id like below:
CALL sp_insert_update_word('some_content_data', 3, @Insert_Id);
SELECT @Insert_Id;
The procedure above checks whether the same content already exists. If it does, it performs an UPDATE; otherwise it performs an INSERT. Finally, it returns the newly generated auto-increment ID for an insert, or NULL for an update.

First try to update the frequency where content = "your submitted data here". If the number of affected rows is 0, insert a new row. You might also want to make CONTENT unique, considering that each row will always store different data.
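A minimal sketch of this update-first approach, written in Python with the standard sqlite3 module as a stand-in for H2 (the JDBC code for H2 would differ). The add_word helper is a hypothetical name, and the final SELECT is an extra step added here so that the key is returned in both the insert and the update case.

```python
import sqlite3


def add_word(conn, content):
    """Increment FREQUENCY if the word exists, else insert it; return its key."""
    with conn:  # one transaction for the whole operation
        cur = conn.execute(
            "UPDATE WORD SET FREQUENCY = FREQUENCY + 1 WHERE CONTENT = ?",
            (content,),
        )
        if cur.rowcount == 0:
            # No existing row was updated, so this is a new word.
            conn.execute(
                "INSERT INTO WORD (CONTENT, FREQUENCY) VALUES (?, 1)",
                (content,),
            )
        return conn.execute(
            "SELECT WORD_INDEX FROM WORD WHERE CONTENT = ?", (content,)
        ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WORD ("
    "WORD_INDEX INTEGER PRIMARY KEY AUTOINCREMENT, "
    "CONTENT TEXT UNIQUE, "
    "FREQUENCY INTEGER NOT NULL)"
)

apple_key = add_word(conn, "apple")
pear_key = add_word(conn, "pear")
apple_again = add_word(conn, "apple")
apple_freq = conn.execute(
    "SELECT FREQUENCY FROM WORD WHERE CONTENT = 'apple'"
).fetchone()[0]
```

The UNIQUE constraint on CONTENT is what makes the lookup by content unambiguous and guards against duplicate rows if two writers race.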

Generating the Next Id when Id is non-AutoNumber

I have a table called Employee. The EmpId column serves as the primary key. In my scenario, I cannot make it AutoNumber.
What would be the best way of generating the next EmpId for a new row that I want to insert into the table?
I am using SQL Server 2008 with C#.
Here is the code that I am currently using to get IDs for key-value pair tables or link tables (m*n relations):
Create PROCEDURE [dbo].[mSP_GetNEXTID]
  @NEXTID int out,
  @TABLENAME varchar(100),
  @UPDATE CHAR(1) = NULL
AS
BEGIN
  DECLARE @QUERY VARCHAR(500)
  BEGIN
    IF EXISTS (SELECT LASTID FROM LASTIDS WHERE TABLENAME = @TABLENAME and active=1)
    BEGIN
      SELECT @NEXTID = LASTID FROM LASTIDS WHERE TABLENAME = @TABLENAME and active=1
      IF(@UPDATE IS NULL OR @UPDATE = '')
      BEGIN
        UPDATE LASTIDS
        SET LASTID = LASTID + 1
        WHERE TABLENAME = @TABLENAME
        and active=1
      END
    END
    ELSE
    BEGIN
      SET @NEXTID = 1
      INSERT INTO LASTIDS(LASTID,TABLENAME, ACTIVE)
      VALUES(@NEXTID+1,@TABLENAME, 1)
    END
  END
END

Using MAX(id) + 1 is a bad idea in terms of both performance and concurrency.
Instead, you should use sequences, which were designed specifically for this kind of problem.
CREATE SEQUENCE EmpIdSeq AS bigint
START WITH 1
INCREMENT BY 1;
And to generate the next id use:
SELECT NEXT VALUE FOR EmpIdSeq;
You can use the generated value in an insert statement:
INSERT Emp (EmpId, X, Y)
VALUES (NEXT VALUE FOR EmpIdSeq, 'x', 'y');
And even use it as default for your column:
CREATE TABLE Emp
(
EmpId bigint PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR EmpIdSeq),
X nvarchar(255) NULL,
Y nvarchar(255) NULL
);
Update: The above solution is only applicable to SQL Server 2012+. For older versions you can simulate the sequence behavior using dummy tables with identity fields:
CREATE TABLE EmpIdSeq (
  SeqID bigint IDENTITY PRIMARY KEY CLUSTERED
);
And a procedure that emulates NEXT VALUE:
CREATE PROCEDURE GetNewSeqVal_Emp
  @NewSeqVal bigint OUTPUT
AS
BEGIN
  SET NOCOUNT ON
  INSERT EmpIdSeq DEFAULT VALUES
  SET @NewSeqVal = scope_identity()
  DELETE FROM EmpIdSeq WITH (READPAST)
END;
Usage example:
DECLARE @NewSeqVal bigint
EXEC GetNewSeqVal_Emp @NewSeqVal OUTPUT
The performance overhead of deleting the last inserted element will be minimal. Still, as pointed out by the original author, you can optionally remove the delete statement and schedule a maintenance job to delete the table contents during off-hours (trading space for performance).
Adapted from SQL Server Customer Advisory Team Blog.

The query
select max(empid) + 1 from employee
is the simple way to get the next number, but if multiple users are inserting into the database, two users can read the same max(empid), each add 1, and end up with duplicate IDs. If you do have multiple users, you may have to lock the table while inserting. This is not best practice, which is why auto increment exists for database tables.

I hope this works for you, assuming that your ID field is an integer:
INSERT INTO Table WITH (TABLOCK)
SELECT CASE WHEN MAX(ID) IS NULL
THEN 1 ELSE MAX(ID)+1 END, VALUE_1, VALUE_2....
FROM Table

Try the following query:
INSERT INTO Table VALUES
((SELECT isnull(MAX(ID),0)+1 FROM Table), VALUE_1, VALUE_2....)
You have to wrap MAX(ID) in isnull; otherwise the result will be NULL when the table contains no rows.

sql stored procedure not working(no rows affected)

trying to get this stored procedure to work.
ALTER PROCEDURE [team1].[add_testimonial]
-- Add the parameters for the stored procedure here
@currentTestimonialDate char(10),@currentTestimonialContent varchar(512),@currentTestimonialOriginator varchar(20)
AS
BEGIN
DECLARE
@keyValue int
SET NOCOUNT ON;
--Get the Highest Key Value
SELECT @keyValue=max(TestimonialKey)
FROM Testimonial
--Update the Key by 1
SET @keyValue=@keyValue+1
--Store into table
INSERT INTO Testimonial VALUES (@keyValue, @currentTestimonialDate, @currentTestimonialContent, @currentTestimonialOriginator)
END
yet it just returns
Running [team1].[add_testimonial] ( @currentTestimonialDate = 11/11/10, @currentTestimonialContent = this is a test, @currentTestimonialOriginator = theman ).
No rows affected.
(0 row(s) returned)
@RETURN_VALUE = 0
Finished running [team1].[add_testimonial].
and nothing is added to the database. What might be the problem?

There may be problems in two places:
a. If there is no data in the table, max(TestimonialKey) returns NULL, so @keyValue + 1 is also NULL. Handle it like this:
--Get the Highest Key Value
SELECT @keyValue = ISNULL(MAX(TestimonialKey), 0)
FROM Testimonial
--Update the Key by 1
SET @keyValue = @keyValue + 1
b. Check the data type of the date column in the table. If it is a datetime type rather than char, convert @currentTestimonialDate to DateTime before inserting it.
Also, check which columns do not allow NULL and make sure you are passing data for them.
If you are not passing data for all columns, specify the column names as below:
--Store into table
INSERT INTO Testimonial(keyValue, currentTestimonialDate,
currentTestimonialContent, currentTestimonialOriginator)
VALUES (@keyValue, @currentTestimonialDate,
@currentTestimonialContent, @currentTestimonialOriginator)
EDIT:
After getting the comment from marc_s:
Make keyValue an INT IDENTITY column. Then concurrent calls from multiple users won't be a problem, because the DBMS handles it. The final procedure might look like this:
ALTER PROCEDURE [team1].[add_testimonial]
-- Add the parameters for the stored procedure here
@currentTestimonialDate char(10),
@currentTestimonialContent varchar(512),@currentTestimonialOriginator varchar(20)
AS
BEGIN
SET NOCOUNT ON;
--Store into table
INSERT INTO Testimonial VALUES (@currentTestimonialDate,
@currentTestimonialContent, @currentTestimonialOriginator)
END

Two issues that I can spot:
First,
SELECT @keyValue=max(TestimonialKey)
should be
SELECT @keyValue=ISNULL(max(TestimonialKey), 0)
to account for the case when there are no records in the table.
Second, I believe that with NOCOUNT ON, the count of inserted rows is not returned to the caller. So, before your INSERT statement, add
SET NOCOUNT OFF
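The empty-table case that both answers point out is easy to reproduce. The sketch below uses Python's standard sqlite3 module as a stand-in for SQL Server, with COALESCE as the standard-SQL counterpart of ISNULL. The table is reduced to two columns for the example, and the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Testimonial (TestimonialKey INTEGER, Content TEXT)"
)

# On an empty table MAX() yields NULL, and NULL + 1 is still NULL.
naive_key = conn.execute(
    "SELECT MAX(TestimonialKey) + 1 FROM Testimonial"
).fetchone()[0]

# Substituting 0 for the NULL gives a usable first key.
safe_key = conn.execute(
    "SELECT COALESCE(MAX(TestimonialKey), 0) + 1 FROM Testimonial"
).fetchone()[0]

conn.execute(
    "INSERT INTO Testimonial VALUES (?, ?)", (safe_key, "this is a test")
)
next_key = conn.execute(
    "SELECT COALESCE(MAX(TestimonialKey), 0) + 1 FROM Testimonial"
).fetchone()[0]
```

This only addresses the NULL problem. It does not make MAX + 1 safe under concurrent inserts, which is why the identity column suggested above is the more robust fix.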
