Adding/updating bulk data using SQL

We are inserting bulk data into one of our database tables using SQL Server Management Studio. The data being sent to the database is added to a particular table (this is controlled by a stored procedure). What we are finding is that a timeout occurs before the operation completes; we think the operation is slow because of the WHILE loop, but we're unsure how to write a faster equivalent.
-- Insert statements for procedure here
WHILE @i < @nonexistingTblCount
BEGIN
    INSERT INTO AlertRanking (MetricInstanceID, GreenThreshold, RedThreshold, AlertTypeID, MaxThreshold, MinThreshold)
    VALUES ((SELECT id FROM #nonexistingTbl ORDER BY id OFFSET @i ROWS FETCH NEXT 1 ROWS ONLY),
            @greenThreshold, @redThreshold, @alertTypeID, @maxThreshold, @minThreshold);

    SET @id = (SELECT ID FROM AlertRanking
               WHERE MetricInstanceID = (SELECT id FROM #nonexistingTbl ORDER BY id OFFSET @i ROWS FETCH NEXT 1 ROWS ONLY)
                 AND GreenThreshold = @greenThreshold
                 AND RedThreshold = @redThreshold
                 AND AlertTypeID = @alertTypeID);

    SET @i = @i + 1;
END
Where @nonexistingTblCount is the total number of rows in the temp table #nonexistingTbl. The #nonexistingTbl table is declared earlier and contains all the values we want to add to the table.

Instead of using a loop, you should be able to insert all of the records with a single statement.
INSERT INTO AlertRanking (MetricInstanceID, GreenThreshold, RedThreshold, AlertTypeID, MaxThreshold, MinThreshold)
SELECT id, @greenThreshold, @redThreshold, @alertTypeID, @maxThreshold, @minThreshold
FROM #nonexistingTbl
ORDER BY id;
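If the loop's only other job was capturing @id so you know which AlertRanking rows were created, a set-based insert can return the new IDs as well. A minimal sketch using the OUTPUT clause, assuming AlertRanking.ID is the identity column referenced in the question and MetricInstanceID is an INT (adjust the types to your schema):
-- Sketch only: capture the generated IDs alongside the source values.
DECLARE @inserted TABLE (NewID INT, MetricInstanceID INT);   -- assumed types

INSERT INTO AlertRanking (MetricInstanceID, GreenThreshold, RedThreshold, AlertTypeID, MaxThreshold, MinThreshold)
OUTPUT inserted.ID, inserted.MetricInstanceID INTO @inserted (NewID, MetricInstanceID)
SELECT id, @greenThreshold, @redThreshold, @alertTypeID, @maxThreshold, @minThreshold
FROM #nonexistingTbl;

-- @inserted now maps each new AlertRanking.ID to the MetricInstanceID it was created for.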

Related

Azure Synapse fastest way to process 20k statements in order

I am designing an incremental update process for a cloud-based database (Azure). The only existing changelog is a .txt file that records every insert, delete, and update statement that the database processes. There is no change data capture table available, nor any database table that records changes, and I cannot enable watermarking on the database. The .txt file is structured as follows:
update [table] set x = 'data' where y = 'data'
go
insert into [table] values (data)
go
delete from [table] where x = data
go
I have built my process to convert the .txt file into a table in the cloud as follows:
update_id | db_operation | statement | user | processed_flag
----------|--------------|-------------------------------------------------|-------|---------------
1 | 'update' | 'update [table] set x = data where y = data' | user1 | 0
2 | 'insert' | 'insert into [table] values (data)' | user2 | 0
3 | 'delete' | 'delete from [table] where x = data' | user3 | 1
I use this code to create a temporary table of the unprocessed transactions, then loop over that table, build a SQL statement for each row and execute it:
CREATE TABLE temp_incremental_updates
WITH
(
DISTRIBUTION = HASH ( [user] ),
HEAP
)
AS
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Sequence,
[user],
[statement]
FROM upd.incremental_updates
WHERE processed_flag = 0;
DECLARE @nbr_statements INT = (SELECT COUNT(*) FROM temp_incremental_updates),
        @i INT = 1;
WHILE @i <= @nbr_statements
BEGIN
    DECLARE @sql_code NVARCHAR(4000) = (SELECT [statement] FROM temp_incremental_updates WHERE Sequence = @i);
    EXEC sp_executesql @sql_code;
    SET @i += 1;
END
DROP TABLE temp_incremental_updates;
UPDATE incremental_updates SET processed_flag = 1
This is taking a very long time, upwards of an hour. Is there a different way I can quickly process multiple SQL statements that need to occur in a specific order? Order is relevant because, for example, if I try to process a delete statement before the insert statement that created that data, Azure Synapse will throw an error.
Less than 2 hours for 20k individual statements is pretty good for Synapse!
Synapse isn't meant to do transactional processing. You need to convert the individual updates into batch updates and execute statements like MERGE over big batches of rows instead of running INSERT, UPDATE and DELETE for each row.
In your situation, you could:
Group all inserts/updates by table name.
Create a temp table for each group, e.g. table1_insert_updates.
Run a MERGE-like statement from table1_insert_updates into table1 (a sketch follows this list).
For deletes:
Group primary keys by table name.
Run one DELETE FROM table1 WHERE key IN (primary keys) per table.
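A minimal sketch of the per-table merge step. The table name table1, the key column id and the payload column x are placeholders, not from the question; map them to your own schema:
-- Sketch only: upsert a whole staging table into its target in one statement.
MERGE table1 AS tgt
USING table1_insert_updates AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.x = src.x
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, x) VALUES (src.id, src.x);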
Frankly, 20k is an awkward number: it's not small enough to be trivial, yet far from big enough to play to Synapse's strengths. So even after "grouping" you could still have performance issues if your batch/group sizes are too small.
Synapse isn't meant for transaction processing. It'll merge a table with a million rows into a table with a billion rows in less than 5 minutes using a single MERGE statement to upsert a million rows, but if you run 1000 delete and 1000 insert statements one after the other it'll probably take longer!
EDIT: You'll also have to use PARTITION BY with RANK (or ROW_NUMBER) to de-duplicate in case there are multiple updates to the same row in a single batch. Depending on what your input contains (whether an update carries all columns, even unchanged ones, or only the changed ones), this can become very complicated.
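As a rough illustration of that de-duplication step, assuming the staging table carries a load_sequence column (a placeholder name) that preserves the order the statements arrived in:
-- Sketch only: keep the most recent staged row per key before merging.
WITH ranked AS (
    SELECT id, x, load_sequence,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY load_sequence DESC) AS rn
    FROM table1_insert_updates
)
SELECT id, x
FROM ranked
WHERE rn = 1;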
Again Synapse is not meant for transaction processing.
Try to declare a cursor for selecting all the data from temp_incremental_updates at once, instead of making multiple reads:
CREATE TABLE temp_incremental_updates
WITH
(
DISTRIBUTION = HASH ( [user] ),
HEAP
)
AS
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Sequence,
[user],
[statement]
FROM upd.incremental_updates
WHERE processed_flag = 0;
DECLARE @sql_code NVARCHAR(4000);
DECLARE cur CURSOR FOR SELECT [statement] FROM temp_incremental_updates ORDER BY Sequence
OPEN cur
FETCH NEXT FROM cur INTO @sql_code
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executesql @sql_code;
    FETCH NEXT FROM cur INTO @sql_code
END
CLOSE cur
DEALLOCATE cur
-- Rest of the code

Updating a column in every table in a schema in SQL Server

I want to update the column Last_Modified in every table in a given schema. This column should be updated with the latest timestamp whenever another column in the same table (ENDTIME) is updated.
To do this I have the following script in SQL Server:
DECLARE @TotalRows FLOAT
SET @TotalRows = (SELECT COUNT(*) FROM table1)
DECLARE @TotalLoopCount INT
SET @TotalLoopCount = CEILING(@TotalRows / 100000)
DECLARE @InitialLoopCount INT
SET @InitialLoopCount = 1
DECLARE @AffectedRows INT
SET @AffectedRows = 0
DECLARE @intialrows INT;
SET @intialrows = 1
DECLARE @lastrows INT
SET @lastrows = 100000;
DECLARE @Remaining INT
SET @Remaining = 0
WHILE @InitialLoopCount <= @TotalLoopCount
BEGIN
    WITH updateRows AS
    (
        SELECT
            t1.*,
            ROW_NUMBER() OVER (ORDER BY caster) AS seqnum
        FROM
            table1 t1
    )
    UPDATE updateRows
    SET last_modified = ENDTIME AT TIME ZONE 'Central Standard Time'
    WHERE last_modified IS NULL
      AND updateRows.ENDTIME IS NOT NULL
      AND updateRows.seqnum BETWEEN @intialrows AND @lastrows;

    SET @AffectedRows = @AffectedRows + @@ROWCOUNT
    SET @intialrows = @intialrows + 100000
    SET @lastrows = @lastrows + 100000
    -- COMMIT
    SET @Remaining = @TotalRows - @AffectedRows
    SET @InitialLoopCount = @InitialLoopCount + 1
END
This script determines the row count of the table, divides it by 100,000 and runs only that many loop iterations to perform the entire update. It breaks the update down into batches and updates the corresponding block of rows in each iteration until they are all done.
This script is only for 1 table, i.e table1. I want to now modify this script in such a way that it dynamically takes all the tables in a schema and runs the above script for each of them. Let's say the schema name is schema1 and it has 32 tables, so this script should run for all those 32 tables.
I am able to retrieve the tables in schema1 but I am not able to dynamically send those to this script. Can anyone please help me with this?
To dynamically change table names at runtime you're going to need something like sp_executesql. See here for an example of its use: https://stackoverflow.com/a/3556554/22194
Then you could have an outer cursor that fetches the table names, assembles each query in a string, and executes it. It's going to look horrible though.
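A minimal sketch of that approach, assuming every table in schema1 has the last_modified and ENDTIME columns from the question; the batching logic from the original script is left out for brevity:
-- Sketch only: run a simple dynamic UPDATE against each table in schema1.
DECLARE @tableName sysname, @sql NVARCHAR(MAX);

DECLARE table_cur CURSOR FOR
    SELECT TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = 'schema1' AND TABLE_TYPE = 'BASE TABLE';

OPEN table_cur;
FETCH NEXT FROM table_cur INTO @tableName;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'UPDATE ' + QUOTENAME('schema1') + N'.' + QUOTENAME(@tableName) +
               N' SET last_modified = ENDTIME AT TIME ZONE ''Central Standard Time''' +
               N' WHERE last_modified IS NULL AND ENDTIME IS NOT NULL;';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM table_cur INTO @tableName;
END
CLOSE table_cur;
DEALLOCATE table_cur;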
If your schema doesn't change much, another approach would be to generate a long script with a section for each table. You generate the script by querying the table names and then repeating the script body with each different table name. Excel is actually pretty good for that sort of thing: paste your table names into Excel, use Excel to generate the script, then copy/paste it back into SSMS.
This will be a long repetitive script but will avoid the disadvantage of having all the SQL in strings.
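Instead of Excel, you could also let SQL Server generate the repetitive script itself. A hedged sketch that emits one simple UPDATE per table in schema1 (extend the template to the full batching logic as needed); copy the output into a new SSMS window and run it:
-- Sketch only: generates the repetitive script as query output.
SELECT 'UPDATE ' + QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) +
       ' SET last_modified = ENDTIME AT TIME ZONE ''Central Standard Time''' +
       ' WHERE last_modified IS NULL AND ENDTIME IS NOT NULL;'
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'schema1' AND TABLE_TYPE = 'BASE TABLE';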

Do While loop based on conditions in SQL Server [duplicate]

This question already has answers here:
How to loop statements in SQL Server
(2 answers)
Closed 4 years ago.
How can I implement a while loop in SQL Server based on the condition below?
First I need to execute a select statement which returns ITEM_CODE; it may return multiple rows.
What I want to do inside the while loop is, for each ITEM_CODE, get data from other tables (including joins) and insert that data into a table. The loop should end after the number of iterations determined by the first statement's result.
Sample query structure:
SELECT ITEM_CODE  -- the while loop must run once per row returned here
FROM TABLE_1
WHERE ITEM_AVAILABILITY = 'TRUE'
This statement will return a single row or may return multiple rows.
I need to pass each ITEM_CODE to the while loop in turn. Inside the while loop I will get the values from multiple tables and insert them into another table.
WHILE (@COUNT <> 0)
BEGIN
    -- Need to have the ITEM_CODE while looping each time.
    -- Get data from multiple tables and assign to variables (SET @VARIABLES)
    -- Insert statement
    IF (@COUNT = @COUNT)
        BREAK;
END
Is it possible with SQL Server? If yes, please help me fix this.
Try this:
DECLARE @DataSource TABLE
(
    [ITEM_CODE] VARCHAR(12)
);

INSERT INTO @DataSource ([ITEM_CODE])
SELECT ITEM_CODE  -- the while loop must run once per row returned here
FROM TABLE_1
WHERE ITEM_AVAILABILITY = 'TRUE';

DECLARE @CurrentItemCode VARCHAR(12);

WHILE (EXISTS (SELECT 1 FROM @DataSource))
BEGIN
    SELECT TOP 1 @CurrentItemCode = [ITEM_CODE]
    FROM @DataSource;

    -- get data from the other tables and insert it into the target table here

    DELETE FROM @DataSource
    WHERE [ITEM_CODE] = @CurrentItemCode;
END;
The idea is to perform a loop while there is data in our buffer table. Then in each iteration of the loop, get one random item code value, perform your internal operations and delete the value from the buffer.
Some people use a row ID column in the buffer table, for example INT IDENTITY(1,1), select the first element, do the internal operations and then delete by ID. But then you need to know the count of the records and increment the ID with each iteration, something like a for loop; a sketch of that variant follows.
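A minimal sketch of that ID-based variant, under the same assumptions as the code above (TABLE_1, ITEM_CODE and ITEM_AVAILABILITY come from the question; everything else is placeholder):
-- Sketch only: buffer table with an identity column, walked like a for loop.
DECLARE @DataSource TABLE
(
    [ID]        INT IDENTITY(1,1),
    [ITEM_CODE] VARCHAR(12)
);

INSERT INTO @DataSource ([ITEM_CODE])
SELECT ITEM_CODE
FROM TABLE_1
WHERE ITEM_AVAILABILITY = 'TRUE';

DECLARE @i INT = 1,
        @count INT = (SELECT COUNT(*) FROM @DataSource),
        @CurrentItemCode VARCHAR(12);

WHILE @i <= @count
BEGIN
    SELECT @CurrentItemCode = [ITEM_CODE]
    FROM @DataSource
    WHERE [ID] = @i;

    -- get data from the other tables and insert it into the target table here

    SET @i += 1;
END;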

How do I group related data into a batch on an insert using Declare Cursor?

I have one table to select from, one table to update where I create a new transaction ID, and another table to insert into using that transaction ID. I want to group all of my like transactions from the first table into one insert, using a transaction ID that is created on the update.
Here is the first table's data.
This is the query I am using now to select, update and insert:
DECLARE @TransactionNo int
DECLARE @Counter int
DECLARE @DateOut Date
DECLARE @Department_No nvarchar(100)
DECLARE @Job_Id nvarchar(50)

DECLARE Cur CURSOR FOR SELECT Counter, DateOut, Job, Department FROM [dbo].[TimeCards_Inv]
OPEN Cur
FETCH NEXT FROM Cur INTO @Counter, @DateOut, @Department_No, @Job_Id
WHILE @@FETCH_STATUS = 0
BEGIN
    -- creates a new transaction number
    UPDATE cas_tekworks.dbo.next_number
    SET next_trx = next_trx + 1
    WHERE table_name_no = 'next_inventory_qty_jrnl'

    SELECT @TransactionNo = [next_trx] FROM cas_tekworks.dbo.next_number
    WHERE table_name_no = 'next_inventory_qty_jrnl'

    IF @PunchType = 'O'
    BEGIN
        -- Insert header record
        INSERT INTO cas_tekworks.dbo.inventory_job_h
            (transaction_no, dateout, job)
        SELECT @TransactionNo, @DateOut, @Job_Id
    END

    FETCH NEXT FROM Cur INTO @Counter, @DateOut, @Department_No, @Job_Id
END
CLOSE Cur
DEALLOCATE Cur
Currently this code creates a new transaction ID for each line in the table, but I would like it to create just one transaction ID per group when I group by job and department. These are the current results: in my scenario I would like it to have just 160 for job 10000, not two transaction IDs.
How do you want your update to behave?
I mean, if you want to add the number of lines, why make a loop that adds one at a time instead of doing a count and adding the result?
You have this possibility if you want to add the count:
UPDATE cas_tekworks.dbo.next_number
SET next_trx = next_trx + (select count(*) from [dbo].[TimeCards_Inv]) --it's possible you have to do a distinct or something else here
WHERE table_name_no='next_inventory_qty_jrnl'
If you can add multiple values, it's still possible to do it in one operation.
You can create a temp table with the appropriate values, and in your update add an OUTPUT clause saving all the information you need. In your case that would be job, date and ID.
Then you can make an insert from the temp table.
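A rough sketch of that temp table + OUTPUT idea, reusing the names from the question; #groups and @old_trx are placeholder names, and you may want to wrap the whole thing in a transaction:
-- Sketch only: build one row per job/department group, reserve that many
-- transaction numbers in a single UPDATE, then insert all headers at once.
SELECT Job, Department, MAX(DateOut) AS DateOut,
       ROW_NUMBER() OVER (ORDER BY Job, Department) AS grp_no
INTO #groups
FROM [dbo].[TimeCards_Inv]
GROUP BY Job, Department;

DECLARE @old_trx TABLE (next_trx int);

UPDATE cas_tekworks.dbo.next_number
SET next_trx = next_trx + (SELECT COUNT(*) FROM #groups)
OUTPUT deleted.next_trx INTO @old_trx (next_trx)   -- value before the update
WHERE table_name_no = 'next_inventory_qty_jrnl';

INSERT INTO cas_tekworks.dbo.inventory_job_h (transaction_no, dateout, job)
SELECT (SELECT next_trx FROM @old_trx) + g.grp_no, g.DateOut, g.Job
FROM #groups AS g;

DROP TABLE #groups;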
I resolved this issue by creating the transaction number first and then declaring a sub-cursor within the first one. I also took out the counter ID field and just grouped by job and department. When selecting the date out I used MAX(dateout).
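For reference, a rough sketch of what that resolution might look like; the names come from the question, the inner sub-cursor over each group's detail rows is only indicated with a comment, and this is a reconstruction rather than the exact code used:
-- Sketch only: one new transaction number per job/department group.
DECLARE @TransactionNo int, @DateOut date, @Job_Id nvarchar(50), @Department_No nvarchar(100);

DECLARE grp_cur CURSOR FOR
    SELECT Job, Department, MAX(DateOut)
    FROM [dbo].[TimeCards_Inv]
    GROUP BY Job, Department;

OPEN grp_cur
FETCH NEXT FROM grp_cur INTO @Job_Id, @Department_No, @DateOut
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE cas_tekworks.dbo.next_number
    SET next_trx = next_trx + 1
    WHERE table_name_no = 'next_inventory_qty_jrnl'

    SELECT @TransactionNo = next_trx FROM cas_tekworks.dbo.next_number
    WHERE table_name_no = 'next_inventory_qty_jrnl'

    INSERT INTO cas_tekworks.dbo.inventory_job_h (transaction_no, dateout, job)
    VALUES (@TransactionNo, @DateOut, @Job_Id)

    -- a sub-cursor over the group's detail rows would go here

    FETCH NEXT FROM grp_cur INTO @Job_Id, @Department_No, @DateOut
END
CLOSE grp_cur
DEALLOCATE grp_cur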

Incremental count column based on another column contents

I need to populate a column with the running count based on another column contents. The table is like this:
count seq_num
1 123-456-789
1 123-456-780
1 123-456-990
2 123-456-789
2 123-456-990
So, as the seq_num column changes, the counter resets to '1' and as the column repeats, the counter increments by 1.
This is using SQL2000, and the seq_num field is varchar.
Any ideas?
If you're inserting, you can use a subquery:
insert into
table (count, seq_num)
values
((select count(*)+1 from table where seq_num = @seq)
,@seq)
Otherwise, you'll need to have a date on there or some way of telling it how to determine what was first:
update table
set count =
(select count(*)+1 from table t2
where t2.seq_num = table.seq_num
and t2.insertdate < table.insertdate)
If you need to be able to continue updating this in the future, you might try this. It's a few steps but would fix it AND set it up for future use. (You'll probably need to check my syntax; I mess with Oracle more now, so I may have mixed up some things, but the logic should work.)
First, create a table to contain the current counter level per sequence:
create table newTable (counter int, sequence varchar(20)) -- size the varchar to match your seq_num column
Then, fill it with data like this:
insert into newTable (counter, sequence)
select distinct 0 as counter, sequence
from table
This will put each sequence number in the table one time and the counter for each will be set at 0.
Then, create an update proc with TWO update statements and a bit of extra logic:
Create procedure counterUpdater (@sequence varchar(20)) as
Declare @l_counter int;
select @l_counter = counter
from newTable
where sequence = @sequence
--assuming you have a primary key in the table.
Declare @id int;
Select top 1 @id = id from table
where sequence = @sequence
and counter is null;
--update the table needing edits.
update table
set counter = @l_counter + 1
where id = @id
--update the new table so you can keep track of which
--counter you are on
update newTable
set counter = @l_counter + 1
where sequence = @sequence
Then run a proc to execute this proc for each record in your table.
Now you should have a "newTable" filled with the currently used counter for each record in the table. Set up your insert proc so that any time a new record is created, if its sequence is not already in newTable, you add it there with a count of 1 and put a count of 1 in the main table. If the sequence DOES already exist, use the logic above (increment the count already held in newTable and use that new count as the counter value in both newTable and the main table).
Basically, this method uses a small lookup table in place of querying the existing table each time. It will be most beneficial if you have a large table with lots of repeated sequence numbers. If your sequence numbers only occur two or three times each, you probably want to do a query instead when you update and then later insert:
First, to update:
--find out the counter value
Declare @l_counter int
select @l_counter = max(counter)
from table where sequence = @sequence
update table
set counter = @l_counter + 1
where id = (select top 1 id from table where sequence = @sequence
            and counter is null)
then run that for each record.
Then, when inserting new records:
Declare @l_counter int
select @l_counter = max(counter)
from table where sequence = @sequence
set @l_counter = IsNull(@l_counter, 0)
Insert into table
(counter, sequence) values (@l_counter + 1, @sequence)
Again, I'm positive I've mixed and matched my syntaxes here, but the concepts should work. Of course, it's a "one at a time" approach instead of set-based, so it might be a little inefficient, but it will work.