SQL - Insert millions of records

I want to optimize the insertion of millions of records in SQL Server.
while @i <= 2000000 begin
    set @sql = 'select @max_c = isnull(max(c), 0) from nptb_data_v7'
    Execute sp_executesql @sql, N'@max_c int output', @max_c output
    set @max_c = @max_c + 1
    while @j <= 10 begin
        set @sql_insert = @sql_insert + '(' + cast(@i as varchar) + ',' + cast(@b as varchar) + ',' + cast(@max_c as varchar) + '),'
        if len(ltrim(rtrim(@sql_insert))) >= 3800
        begin
            set @sql_insert = SUBSTRING(@sql_insert, 1, len(ltrim(rtrim(@sql_insert))) - 1)
            Execute sp_executesql @sql_insert
            set @sql_insert = 'insert into dbo.nptb_data_v7 values '
        end
        set @b = @b + 1
        if @b > 100000
        begin
            set @b = 1
        end
        set @j = @j + 1
    end
    set @i = @i + 1
    set @j = 1
end
set @sql_insert = SUBSTRING(@sql_insert, 1, len(ltrim(rtrim(@sql_insert))) - 1)
Execute sp_executesql @sql_insert
end
I want to optimize the above code, as it is taking hours to complete.

There are quite a few critical things I want to hit on. First, iteratively doing just about anything (especially inserting millions of rows) in SQL is almost never the right way to go. So right off the bat, we need to look at throwing that model out the window.
Second, your approach appears to be to loop over a set of numbers millions of times and add a new set of parentheses to the VALUES clause of the insert, winding you up with a ridiculously long string of tuples which just look like (1,1,2),(1,2,2)... etc. If you WERE going to do this iteratively, you'd want to make it so that every loop just did an insert rather than building an unwieldy insert string.
Third, nothing here needs dynamic SQL; not the first assignment of @max_c, nor the insertion into nptb_data_v7. You could statically construct all of these statements without having to use dynamic SQL, which a) obfuscates your code and b) opens you up to injection attacks.
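For example, the first statement can simply be written as a static assignment:

select @max_c = isnull(max(c), 0) from dbo.nptb_data_v7;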
With those out of the way, now we can get down to brass tacks. All this appears to be doing is creating combinations of an auto-incrementing number between 1 and 2 million and, based on some rules, values derived from @i, @b and the current iteration.
The first thing you need here is a tally table (just a big table of integers). There are tons of ways to do this, but here's a succinct script which should get you started.
http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/
Once you have this table made, your script becomes a matter of self-joining your newly created numbers/tally table to itself so that the rules you define are satisfied for your insert. Since it's not 100% clear what your code is trying to get at, nor can I run it from the information provided, this is where I have to leave it up to you. If you can provide a better summary of what your tables look like, your variable declarations and your objective, I may be able to help you write some code. But hopefully this should get you started.
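To show the shape of the end result, here's a minimal, untested sketch of the set-based pattern. The column names i, b and c are assumptions (no DDL was posted), and the wrapping-counter and max(c) rules would still need translating once the objective is clear:

-- inline tally via stacked CROSS JOINs: 10^7 candidate rows, trimmed to 2,000,000
WITH digits AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
),
tally AS (
    SELECT TOP (2000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM digits d1 CROSS JOIN digits d2 CROSS JOIN digits d3
         CROSS JOIN digits d4 CROSS JOIN digits d5
         CROSS JOIN digits d6 CROSS JOIN digits d7
)
INSERT INTO dbo.nptb_data_v7 (i, b, c)
SELECT n,
       (n - 1) % 100000 + 1, -- a wrapping counter in the style of @b
       n                     -- placeholder for the @max_c rule
FROM tally;

A single INSERT ... SELECT like this replaces millions of loop iterations and lets the engine process the whole set in one pass.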

Related

How to pick a table_name value from one table and delete records from the table_name table based on a condition?

We have a table. Let's call it Table_A.
Table_A holds a bunch of table_names and a numeric value (NO_OF_DAYS) associated with each table_name (for example, T_Table1 with NO_OF_DAYS = 90, T_Table2, and so on).
Can someone help me write a query to:
Select the table_names from Table_A one by one; go to that table, check the Date_inserted of each record against NO_OF_DAYS in Table_A, and if the record is older than NO_OF_DAYS, then DELETE THAT RECORD from that specific table.
I'm guessing we have to create dynamic values for this query but I'm having a hard time.
So, with that sample data, the query should:
Select the first table_name (T_Table1) from Table_A
Go to that Table (T_Table1)
Check the date inserted of each record in (T_Table1) against the condition
If the condition is met (the record was inserted more than NO_OF_DAYS ago, which is 90 in this case), delete the record; else move to the next record
Move on to the next table (T_Table2) in Table_A
Continue till all the table_names in Table_A have been processed
What you posted as your attempt (in a comment) quite simply isn't going to work. Let's actually format that first, shall we:
SET SQL = '
DELETE [' + dbo + '].[' + TABLE_NAME + ']
where [Date_inserted ] < '
SET SQL = SQL + ' convert(varchar, DATEADD(day, ' + CONVERT(VARCHAR, NO_OF_DAYS) + ',' + '''' + CONVERT(VARCHAR, GETDATE(), 102) + '''' + '))'
PRINT SQL
EXEC (SQL)
Firstly, I actually have no idea what you're even trying to do here. You have things like [' + dbo + '], which means that you're referencing a column called dbo; as you're using a SET, no column dbo can exist. Also, variables are prefixed with an @ in SQL Server; you have none.
Anyway, the solution. Some might not like this one, as I'm using a CURSOR, rather than doing it all in one go. I, however, do have my reasons. A CURSOR isn't actually a "bad" thing, like many believe; the problem is that people constantly use them incorrectly. Using a CURSOR to loop through records and create a hierarchy for example is a terrible idea; there are far better dataset approaches.
So, what are my reasons? Firstly, I can parametrise the dynamic SQL; this would be harder outside a CURSOR, as I'd need to declare a different parameter for every DELETE. Also, with a CURSOR, if the DELETE fails on one table, the others still run; one long piece of dynamic SQL would mean that if one of the transactions failed, they would all be rolled back. Also, depending on the size of the deletes, that could be a very big DELETE.
It's important, however, that you understand what I've done here; if you don't, that's a problem unto itself. What happens if you need to troubleshoot it in the future? SO isn't a website for support like that; you need to support your own code. If you can't understand the code you're given, don't use it, or learn what it's doing first (otherwise you're doing the wrong thing).
Note I use my own objects, in the absence of consumable sample data:
CREATE TABLE TableOfTables (TableName sysname,
NoOfDays int);
GO
INSERT INTO TableOfTables
VALUES ('T1',10),
('T2',15),
('T3',5);
GO
DECLARE Deletes CURSOR FOR
SELECT TableName, NoOfDays
FROM TableOfTables;
DECLARE @SQL nvarchar(MAX), @TableName sysname, @Days int;

OPEN Deletes;

FETCH NEXT FROM Deletes
INTO @TableName, @Days;

WHILE @@FETCH_STATUS = 0 BEGIN

    SET @SQL = N'DELETE FROM ' + QUOTENAME(@TableName) + NCHAR(10) +
               N'WHERE DATEDIFF(DAY, InsertedDate, GETDATE()) >= @dDays;';

    PRINT @SQL; --Say hello to your best friend. o/
    --EXEC sp_executesql @SQL, N'@dDays int', @dDays = @Days; --Uncomment to run

    FETCH NEXT FROM Deletes
    INTO @TableName, @Days;
END

CLOSE Deletes;
DEALLOCATE Deletes;
GO
DROP TABLE TableOfTables;
GO

Adapt SQL query in a while loop

I have been spending a fair amount of time researching a method to adapt an SQL query while in a loop, in order to bring back data from multiple tables.
The one method I came across that makes this possible would be executing the query as a loadstring, then you could adapt the query each time the loop runs ( as explained via this link: https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-executesql-transact-sql ).
To be more specific, I am attempting to run a rather large query which loops through multiple databases; however, each database has a branch code, such as A, B, C, D, E etc. So each time I execute the query, I am using joins to reach all the databases I need from A. To make this work, I would need to copy and paste this entire 500-line query over 5 times to cover every branch.
The method using loadstring would end up being similar to this:
DECLARE @process varchar(max) = 'select * from Vis_' + Branch[i] + '_Quotes'; exec(@process)
Is there a better method to adapt your query search while its running?
Here is one example of how this might be used. It's not clear if this fits your requirements, but it appears that dynamic SQL is new to you so I've supplied an example that includes both looping and passing in parameters safely. This is untested, but hopefully should get you on the right track.
This assumes you have an existing table of branches with the corresponding branch codes (ideal, as then the script doesn't need updating when adding/disabling/removing a branch). If you don't, then you could always create a table variable and insert branches at the top of your script:
declare @sql nvarchar(max) = N'',
    @BranchCode nvarchar(10) = '',
    @param1 int,
    @param2 nvarchar(10);

while 1=1 begin
    set @BranchCode =
        (select top 1 Code from Branch where Active = 1 and Code > @BranchCode order by Code)

    if @BranchCode is null break;

    set @sql = @sql + 'select * from Vis_' + @BranchCode + '_Quotes
where col1 = @param1 and col2 like @param2
' -- notice extra linebreak (or space) added to separate each query
end

exec sp_executesql @sql,
    N'@param1 int, @param2 nvarchar(10), ...', -- parameter definitions
    @param1, @param2, ... -- any additional parameters you need to safely pass in

Replacing iterative stored procedure execution with a set-based approach

I have an issue where I am trying to replace the following code with a different solution. Currently I am using a cursor, but it is running too slowly. My understanding is that iterative solutions can only be implemented with cursors or while loops, but I am trying to find a set-based approach and am running out of ideas. I was hoping that I could find some inspiration here. Thanks all.
--used to find a unique list of Some_ID
@Id1, @Id2, @Id3

DECLARE SomeCursor CURSOR FOR
SELECT SOME_ID FROM SomeTable
WHERE ID1=@Id1 AND ID2=@Id2 and ID3=@Id3

OPEN SomeCursor
FETCH NEXT FROM SomeCursor INTO @SomeID
WHILE @@FETCH_STATUS = 0
BEGIN
    Print @SomeID
    --simply populates a single table with values pulled from
    --other tables in the database based on the given parameters.
    EXEC SP_PART1 @SomeID, @parameters...
    print 'part 2 starting'
    EXEC SP_PART2 @SomeID, @parameters...
    FETCH NEXT FROM SomeCursor INTO @SomeID
    print getdate()
END
CLOSE SomeCursor;
DEALLOCATE SomeCursor;
Your only option to make this set-based is to rewrite the sps to make them set-based (using table-valued parameters instead of individual ones; see the sketch below) or to write set-based code in this proc instead of re-using procs designed for single-record use. This is a case where code re-use is usually not appropriate.
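A minimal sketch of the TVP idea (SQL Server 2008+; the type and procedure names here are hypothetical, and the proc body is a placeholder for your real set-based logic):

CREATE TYPE dbo.SomeIdList AS TABLE (SomeID int PRIMARY KEY);
GO
CREATE PROCEDURE dbo.SP_PART1_SetBased
    @SomeIDs dbo.SomeIdList READONLY -- the whole id set arrives in one call
AS
BEGIN
    -- set-based body: JOIN against @SomeIDs instead of handling one @SomeID per call
    SELECT s.SomeID FROM @SomeIDs AS s;
END
GO
-- the caller fills the TVP from the same query the cursor used
DECLARE @ids dbo.SomeIdList;
INSERT INTO @ids (SomeID)
SELECT SOME_ID FROM SomeTable
WHERE ID1 = @Id1 AND ID2 = @Id2 AND ID3 = @Id3; -- @Id1..@Id3 as in your current proc
EXEC dbo.SP_PART1_SetBased @SomeIDs = @ids;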
I'm not too sure what you want, but why not use your select statement to build your SQL script and execute it all at once, with something like this:
DECLARE @sql VARCHAR(MAX);

-- note: GO is a client batch separator and is not valid inside dynamic SQL
SELECT @sql = COALESCE(@sql, '') + 'EXEC SP_Part1 ' + CAST(SOME_ID AS varchar(20)) + '; EXEC SP_Part2 ' + CAST(SOME_ID AS varchar(20)) + '; '
FROM SomeTable
WHERE ID1=@Id1 AND ID2=@Id2 AND ID3=@Id3

EXEC (@sql)

How to copy large amount of data from one table to other table in SQL Server

I want to copy a large amount of data from one table to another table. I used cursors in a stored procedure to do this, but it only works for tables with few records. If the tables contain more records, it executes for a long time and hangs. Please give some suggestions on how I can copy the data in a faster way. My SP is below:
--exec uds_shop
--select * from CMA_UDS.dbo.Dim_Shop
--select * from UDS.dbo.Dim_Shop
--delete from CMA_UDS.dbo.Dim_Shop
alter procedure uds_shop
as
begin
    declare @dwkeyshop int
    declare @shopdb int
    declare @shopid int
    declare @shopname nvarchar(60)
    declare @shoptrade int
    declare @dwkeytradecat int
    declare @recordowner nvarchar(20)
    declare @LogMessage varchar(600)

    Exec CreateLog 'Starting Process', 1

    DECLARE cur_shop CURSOR FOR
    select DW_Key_Shop, Shop_ID, Shop_Name, Trade_Sub_Category_Code, DW_Key_Source_DB, DW_Key_Trade_Category, Record_Owner
    from UDS.dbo.Dim_Shop

    OPEN cur_shop
    FETCH NEXT FROM cur_shop INTO @dwkeyshop, @shopid, @shopname, @shoptrade, @shopdb, @dwkeytradecat, @recordowner
    WHILE @@FETCH_STATUS = 0
    BEGIN
        Set @LogMessage = ''
        Set @LogMessage = 'Records insertion/updation start date and time : ''' + Convert(varchar(19), GetDate()) + ''''
        if (isnull(@dwkeyshop, '') <> '')
        begin
            if not exists (select crmshop.DW_Key_Shop from CMA_UDS.dbo.Dim_Shop as crmshop where (convert(varchar, crmshop.DW_Key_Shop) + CONVERT(varchar, crmshop.DW_Key_Source_DB)) = convert(varchar, (CONVERT(varchar, @dwkeyshop) + CONVERT(varchar, @shopdb))))
            begin
                Set @LogMessage = Ltrim(Rtrim(@LogMessage)) + ' ' + 'Record for shop table is inserting...'
                insert into CMA_UDS.dbo.Dim_Shop
                    (DW_Key_Shop, DW_Key_Source_DB, DW_Key_Trade_Category, Record_Owner, Shop_ID, Shop_Name, Trade_Sub_Category_Code)
                values
                    (@dwkeyshop, @shopdb, @dwkeytradecat, @recordowner, @shopid, @shopname, @shoptrade)
                Set @LogMessage = Ltrim(Rtrim(@LogMessage)) + ' ' + 'Record successfully inserted in shop table for shop Id : ' + Convert(varchar, @shopid)
            end
            else
            begin
                Set @LogMessage = Ltrim(Rtrim(@LogMessage)) + ' ' + 'Record for Shop table is updating...'
                update CMA_UDS.dbo.Dim_Shop
                set DW_Key_Trade_Category = @dwkeytradecat,
                    Record_Owner = @recordowner,
                    Shop_ID = @shopid, Shop_Name = @shopname, Trade_Sub_Category_Code = @shoptrade
                where DW_Key_Shop = @dwkeyshop and DW_Key_Source_DB = @shopdb
                Set @LogMessage = Ltrim(Rtrim(@LogMessage)) + ' ' + 'Record successfully updated for shop Id : ' + Convert(varchar, @shopid)
            end
        end
        Exec CreateLog @LogMessage, 0
        FETCH NEXT FROM cur_shop INTO @dwkeyshop, @shopid, @shopname, @shoptrade, @shopdb, @dwkeytradecat, @recordowner
    end
    CLOSE cur_shop
    DEALLOCATE cur_shop
End
Assuming targetTable and destinationTable have the same schema...
INSERT INTO targetTable
SELECT * FROM destinationTable
WHERE someCriteria
Avoid the use of cursors unless there is no other way (rare).
You can use the WHERE clause to filter out any duplicate records.
If you have an identity column, use an explicit column list that doesn't contain the identity column.
You can also try disabling constraints and removing indexes provided you replace them (and make sure the constraints are checked) afterwards.
If you are on SQL Server 2008 (onwards), you can use the MERGE statement; a sketch follows below.
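For instance, applied to the Dim_Shop tables from the question, the whole cursor collapses to a single statement (a sketch, assuming DW_Key_Shop plus DW_Key_Source_DB uniquely identify a row):

MERGE CMA_UDS.dbo.Dim_Shop AS tgt
USING UDS.dbo.Dim_Shop AS src
    ON tgt.DW_Key_Shop = src.DW_Key_Shop
   AND tgt.DW_Key_Source_DB = src.DW_Key_Source_DB
WHEN MATCHED THEN
    UPDATE SET tgt.DW_Key_Trade_Category = src.DW_Key_Trade_Category,
               tgt.Record_Owner = src.Record_Owner,
               tgt.Shop_ID = src.Shop_ID,
               tgt.Shop_Name = src.Shop_Name,
               tgt.Trade_Sub_Category_Code = src.Trade_Sub_Category_Code
WHEN NOT MATCHED BY TARGET THEN
    INSERT (DW_Key_Shop, DW_Key_Source_DB, DW_Key_Trade_Category, Record_Owner, Shop_ID, Shop_Name, Trade_Sub_Category_Code)
    VALUES (src.DW_Key_Shop, src.DW_Key_Source_DB, src.DW_Key_Trade_Category, src.Record_Owner, src.Shop_ID, src.Shop_Name, src.Trade_Sub_Category_Code);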
From my personal experience, when you copy huge amounts of data from one table to another (with similar constraints), drop the constraints on the table the data is being copied into. Once the copy is done, reinstate all the constraints.
I could reduce the copy time from 7 hours to 30 mins in my case (100 million records with 6 constraints).
INSERT INTO targetTable
SELECT * FROM destinationTable
WHERE someCriteria -- based on the criteria you can copy/move the records
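A minimal sketch of that drop-and-reinstate pattern, using the placeholder table names above (this handles FOREIGN KEY and CHECK constraints only; nonclustered indexes would be scripted out and recreated separately):

ALTER TABLE targetTable NOCHECK CONSTRAINT ALL; -- disable FK/CHECK constraints

INSERT INTO targetTable
SELECT * FROM destinationTable
WHERE someCriteria;

ALTER TABLE targetTable WITH CHECK CHECK CONSTRAINT ALL; -- re-enable and revalidate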
Cursors are notoriously slow, and RAM can begin to become a problem for very large datasets.
It does look like you are doing a good bit of logging in each iteration, so you may be stuck with the cursor, but I would instead look for a way to break the job up into multiple invocations so that you can keep your footprint small.
If you have an autonumber column, I would add a '@startIdx bigint' parameter to the procedure, and redefine your cursor statement to take the 'TOP 1000' rows 'WHERE [autonumberField] > @startIdx ORDER BY [autonumberField]'. Then create a new stored procedure with something like:
DECLARE @startIdx bigint = 0
WHILE (SELECT COUNT(*) FROM <sourceTable>) > @startIdx
BEGIN
    EXEC <your stored procedure> @startIdx
    SET @startIdx = @startIdx + 1000
END
Also, make sure your database files are set to auto-grow, and that they do so in large increments, so you are not spending all your time growing your data files.
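For example (a hypothetical sketch; the database and logical file names are placeholders):

ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Data, FILEGROWTH = 512MB);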

MS SQL - High performance data inserting with stored procedures

I'm searching for a very high-performance way to insert data into an MS SQL database.
The data is a (relatively big) construct of objects with relations. For security reasons I want to use stored procedures instead of direct table access.
Let's say I have a structure like this:
Document
MetaData
User
Device
Content
ContentItem[0]
SubItem[0]
SubItem[1]
SubItem[2]
ContentItem[1]
...
ContentItem[2]
...
Right now I think of creating one big query, doing something like this (just pseudo-code):
EXEC @DeviceID = CreateDevice ...;
EXEC @UserID = CreateUser ...;
EXEC @DocID = CreateDocument @DeviceID, @UserID, ...;
EXEC @ItemID = CreateItem @DocID, ...
EXEC CreateSubItem @ItemID, ...
EXEC CreateSubItem @ItemID, ...
EXEC CreateSubItem @ItemID, ...
...
But is this the best solution for performance? If not, what would be better?
Split it into more queries? Give all the data to one big stored procedure to reduce the size of each query? Any other performance clues?
I also thought of giving multiple items to one stored procedure, but I don't think it's possible to pass a non-static number of items to a stored procedure.
Since 'INSERT INTO A VALUES (B,C),(C,D),(E,F)' is more performant than 3 single inserts, I thought I could gain some performance there.
Thanks for any hints,
Marks
Use one stored procedure so far as possible:
INSERT INTO MyTable (field1, field2)
SELECT 'firstValue', 'secondValue'
UNION ALL
SELECT 'anotherFirstValue', 'anotherSecondValue'
UNION ALL
...
If you aren't sure how many items you're inserting, you can construct the SQL query within the sproc and then execute it. Here's a procedure I wrote to take a CSV list of groups and add their relationships to a user entity:
ALTER PROCEDURE [dbo].[UpdateUserADGroups]
    @username varchar(100),
    @groups varchar(5000)
AS
BEGIN

    DECLARE @pos int,
            @previous_pos int,
            @value varchar(50),
            @sql varchar(8000)

    SET @pos = 1
    SET @previous_pos = 0
    SET @sql = 'INSERT INTO UserADGroups(UserID, RoleName) '

    DECLARE @userID int
    SET @userID = (SELECT TOP 1 UserID FROM Users WHERE Username = @username)

    WHILE @pos > 0
    BEGIN
        SET @pos = CHARINDEX(',', @groups, @previous_pos + 1)
        IF @pos > 0
        BEGIN
            SET @value = SUBSTRING(@groups, @previous_pos + 1, @pos - @previous_pos - 1)
            SET @sql = @sql + 'SELECT ' + cast(@userID as char(5)) + ',''' + @value + ''' UNION ALL '
            SET @previous_pos = @pos
        END
    END

    IF @previous_pos < LEN(@groups)
    BEGIN
        SET @value = SUBSTRING(@groups, @previous_pos + 1, LEN(@groups))
        SET @sql = @sql + 'SELECT ' + cast(@userID as char(5)) + ',''' + @value + ''''
    END

    print @sql
    exec (@sql)

END
This is far faster than individual INSERTS.
Also, make sure you have just a single clustered index on the primary key; more indexes will slow the INSERT down, as they will all need to be updated.
However, the more complex your dataset is, the less likely it is that you'll be able to do the above, so you will simply have to make logical compromises. I actually ended up calling the above routine around 8000 times.