I went through a lot of posts on SO, but none of them fit my situation.
We have a situation where we want to store a large dataset on SQL Server 2017 into multiple reference tables.
We have tried a cursor and it works fine. However, we are concerned about performance when loading large data (1+ million rows).
Example:
T_Bulk is the input table, T_Bulk_orignal is the destination table, and T_Bulk_reference is a reference table for T_Bulk_orignal.
create table T_Bulk
(
Id uniqueidentifier,
ElementType nvarchar(max),
[Description] nvarchar(max)
)
create table T_Bulk_orignal
(
Id uniqueidentifier,
ElementType nvarchar(max),
[Description] nvarchar(max)
)
create table T_Bulk_reference
(
Id uniqueidentifier,
Description2 nvarchar(max)
)
create proc UseCursor
(
    @udtT_Bulk as dbo.udt_T_Bulk READONLY
)
as
begin
    DECLARE @Id uniqueidentifier, @ElementType varchar(500), @Description varchar(500), @Description2 varchar(500)
    DECLARE MY_CURSOR CURSOR
    LOCAL STATIC READ_ONLY FORWARD_ONLY
    FOR
    SELECT Id, ElementType, [Description]
    FROM dbo.T_BULK
    OPEN MY_CURSOR
    FETCH NEXT FROM MY_CURSOR INTO @Id, @ElementType, @Description, @Description2
    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRANSACTION Trans1
        BEGIN TRY
            IF EXISTS (select Id from T_Bulk_orignal where ElementType=@ElementType and Description=@Description)
                select @Id = Id from T_Bulk_orignal where ElementType=@ElementType and Description=@Description
            ELSE
            BEGIN
                insert T_Bulk_orignal(Id,ElementType,Description) values (@Id, @ElementType, @Description)
            END
            INSERT T_Bulk_reference(Id,description2)
            SELECT Id, Description2
            FROM (select @Id as Id, @Description2 as Description2) F
            WHERE NOT EXISTS (SELECT * FROM T_Bulk_reference C WHERE C.Id = F.Id and C.Description2 = F.Description2);
            COMMIT TRANSACTION Trans1
            FETCH NEXT FROM MY_CURSOR INTO @Id, @ElementType, @Description, @Description2
        END TRY
        BEGIN CATCH
            ROLLBACK TRANSACTION Trans1
            SELECT @@ERROR
        END CATCH
    END
    CLOSE MY_CURSOR
    DEALLOCATE MY_CURSOR
end
We want this operation to execute in one go, like a bulk insert. However, we also need to cross-check for data discrepancies, and if one row cannot be inserted we need to roll back only that specific record.
The only catch with bulk insertion is that reference table data is involved.
Please suggest the best approach for this.
This sounds like a job for SSIS (SQL Server Integration Services).
https://learn.microsoft.com/en-us/sql/integration-services/ssis-how-to-create-an-etl-package
In SSIS you can create a data migration job that can do reference checks. You can set it up to fail, warn, or ignore errors at each stage. To find resources on this, search for ETL and SSIS.
I have done jobs like yours on 50+ million rows.
Sure it takes a while, and it rolls back everything (if set up like that) on an error, but it is the best tool for this kind of job.
I found a solution for loading a large file in one go, like a bulk insert.
SQL has a MERGE statement.
The MERGE statement is used to make changes in one table based on
values matched from another. It can be used to combine insert,
update, and delete operations into one statement.
So we can pass the data to a stored procedure as a DataTable; the source will be your user-defined table type and the target will be your actual SQL table.
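To make the MERGE idea concrete, here is a minimal sketch, not a tested solution. It assumes the table type dbo.udt_T_Bulk has the columns of T_Bulk plus a Description2 column (the question never shows its definition), and the procedure name UseMerge is made up for illustration. Rows whose ElementType or Description is NULL will not match under plain equality and would need extra handling.

```sql
-- Assumption: the table type mirrors T_Bulk plus a Description2 column.
CREATE TYPE dbo.udt_T_Bulk AS TABLE
(
    Id            uniqueidentifier,
    ElementType   nvarchar(max),
    [Description] nvarchar(max),
    Description2  nvarchar(max)
);
GO

CREATE PROCEDURE dbo.UseMerge   -- hypothetical name
(
    @udtT_Bulk AS dbo.udt_T_Bulk READONLY
)
AS
BEGIN
    SET NOCOUNT ON;

    -- Step 1: insert only the (ElementType, Description) pairs that are
    -- not in the destination yet. Duplicate pairs in the input are
    -- reduced to one row first so they are not inserted twice.
    MERGE dbo.T_Bulk_orignal AS tgt
    USING (
        SELECT d.Id, d.ElementType, d.[Description]
        FROM (
            SELECT Id, ElementType, [Description],
                   ROW_NUMBER() OVER (PARTITION BY ElementType, [Description]
                                      ORDER BY Id) AS rn
            FROM @udtT_Bulk
        ) AS d
        WHERE d.rn = 1
    ) AS src
        ON  tgt.ElementType   = src.ElementType
        AND tgt.[Description] = src.[Description]
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, ElementType, [Description])
        VALUES (src.Id, src.ElementType, src.[Description]);

    -- Step 2: resolve every input row to the Id stored in the destination
    -- (existing or just inserted) and add the missing reference rows.
    INSERT dbo.T_Bulk_reference (Id, Description2)
    SELECT DISTINCT o.Id, b.Description2
    FROM @udtT_Bulk AS b
    JOIN dbo.T_Bulk_orignal AS o
        ON  o.ElementType   = b.ElementType
        AND o.[Description] = b.[Description]
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.T_Bulk_reference AS r
                      WHERE r.Id = o.Id
                        AND r.Description2 = b.Description2);
END
```

Each statement here succeeds or fails as a whole, so this does not by itself give the per-row rollback asked for in the question; bad rows would have to be filtered out or validated before the MERGE runs.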
Related
I want to use a database cursor; first I need to understand what its use and syntax are, and in which scenario we can use this in stored procedures? Are there different syntaxes for different versions of SQL Server?
When is it necessary to use?
Cursors are a mechanism to explicitly enumerate through the rows of a result set, rather than retrieving it as such.
However, while they may be more comfortable to use for programmers accustomed to writing While Not RS.EOF Do ..., they are typically a thing to be avoided within SQL Server stored procedures if at all possible -- if you can write a query without the use of cursors, you give the optimizer a much better chance to find a fast way to implement it.
In all honesty, I've never found a realistic use case for a cursor that couldn't be avoided, with the exception of a few administrative tasks such as looping over all indexes in the catalog and rebuilding them. I suppose they might have some uses in report generation or mail merges, but it's probably more efficient to do the cursor-like work in an application that talks to the database, letting the database engine do what it does best -- set manipulation.
Cursors are used when the records returned by a query need to be fetched one row at a time.
Example of cursor:
DECLARE @eName varchar(50), @job varchar(50)
DECLARE MynewCursor CURSOR -- declare the cursor
FOR
Select eName, job FROM emp where deptno = 10
OPEN MynewCursor -- open the cursor
FETCH NEXT FROM MynewCursor
INTO @eName, @job -- fetch the first row
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @eName + ' ' + @job -- print the name and job
    FETCH NEXT FROM MynewCursor
    INTO @eName, @job -- fetch the next row
END
CLOSE MynewCursor
DEALLOCATE MynewCursor
OUTPUT:
ROHIT PRG
jayesh PRG
Rocky prg
A cursor can be used to retrieve data on a row-by-row basis. It acts like a looping statement (i.e. a while or for loop).
To use cursors in SQL procedures, you need to do the following:
1. Declare a cursor that defines a result set.
2. Open the cursor to establish the result set.
3. Fetch the data into local variables as needed from the cursor, one row at a time.
4. Close the cursor when done.
For example:
declare @tab table
(
    Game varchar(15),
    Rollno varchar(15)
)
insert into @tab values('Cricket','R11')
insert into @tab values('VollyBall','R12')
declare @game varchar(20)
declare @Rollno varchar(20)
declare cur2 cursor for select game,rollno from @tab
open cur2
fetch next from cur2 into @game,@rollno
WHILE @@FETCH_STATUS = 0
begin
    print @game
    print @rollno
    FETCH NEXT FROM cur2 into @game,@rollno
end
close cur2
deallocate cur2
Cursor itself is an iterator (like WHILE). By saying iterator I mean a way to traverse the record set (aka a set of selected data rows) and do operations on it while traversing. Operations could be INSERT or DELETE for example. Hence you can use it for data retrieval for example. Cursor works with the rows of the result set sequentially - row by row. A cursor can be viewed as a pointer to one row in a set of rows and can only reference one row at a time, but can move to other rows of the result set as needed.
This link has a clear explanation of its syntax and contains additional information plus examples.
Cursors can be used in Sprocs too. They are a shortcut that allow you to use one query to do a task instead of several queries. However, cursors recognize scope and are considered undefined out of the scope of the sproc and their operations execute within a single procedure. A stored procedure cannot open, fetch, or close a cursor that was not declared in the procedure.
I would argue you might want to use a cursor when you want to do comparisons of characteristics that are on different rows of the return set, or if you want to write a different output row format than a standard one in certain cases. Two examples come to mind:
One was in a college where each add and drop of a class had its own row in the table. It might have been bad design but you needed to compare across rows to know how many add and drop rows you had in order to determine whether the person was in the class or not. I can't think of a straightforward way to do that with only SQL.
Another example is writing a journal total line for GL journals. You get an arbitrary number of debits and credits in your journal, you have many journals in your rowset return, and you want to write a journal total line every time you finish a journal to post it into a General Ledger. With a cursor you could tell when you left one journal and started another and have accumulators for your debits and credits and write a journal total line (or table insert) that was different than the debit/credit line.
CREATE PROCEDURE [dbo].[SP_Data_newUsingCursor]
(
    @SCode NVARCHAR(MAX)=NULL,
    @Month INT=NULL,
    @Year INT=NULL,
    @Msg NVARCHAR(MAX)=null OUTPUT
)
AS
BEGIN
    DECLARE @SEPERATOR as VARCHAR(1)
    DECLARE @SP INT
    DECLARE @VALUE VARCHAR(MAX)
    SET @SEPERATOR = ','
    -- Split the comma-separated site codes into a temp table
    -- (every code, including the last, must be followed by a separator)
    CREATE TABLE #TempSiteCode (id int NOT NULL)
    WHILE PATINDEX('%' + @SEPERATOR + '%', @SCode ) <> 0
    BEGIN
        SELECT @SP = PATINDEX('%' + @SEPERATOR + '%' ,@SCode)
        SELECT @VALUE = LEFT(@SCode , @SP - 1)
        SELECT @SCode = STUFF(@SCode, 1, @SP, '')
        INSERT INTO #TempSiteCode (id) VALUES (@VALUE)
    END
    DECLARE
        @EmpCode bigint=null,
        @EmpName nvarchar(50)=null
    CREATE TABLE #TempEmpDetail
    (
        EmpCode bigint
    )
    CREATE TABLE #TempFinalDetail
    (
        EmpCode bigint,
        EmpName nvarchar(500)
    )
    -- Cursor variable used to walk the selected employee codes
    DECLARE @TempSiteFinalCursor CURSOR
    INSERT INTO #TempEmpDetail
    (
        EmpCode
    )
    (
        SELECT DISTINCT EmpCode FROM tbl_Att_MSCode
        WHERE tbl_Att_MSCode.SiteCode IN (SELECT id FROM #TempSiteCode)
        AND fldMonth=@Month AND fldYear=@Year
    )
    SET @TempSiteFinalCursor=CURSOR FOR SELECT EmpCode FROM #TempEmpDetail
    OPEN @TempSiteFinalCursor
    -- The cursor returns a single column, so fetch into a single variable
    FETCH NEXT FROM @TempSiteFinalCursor INTO @EmpCode
    WHILE @@FETCH_STATUS=0
    BEGIN
        SET @EmpName=(SELECT EmpName FROM tbl_Employees WHERE EmpCode=@EmpCode)
        INSERT INTO #TempFinalDetail
        (
            EmpCode,
            EmpName
        )
        VALUES
        (
            @EmpCode,
            @EmpName
        )
        FETCH NEXT FROM @TempSiteFinalCursor INTO @EmpCode
    END
    SELECT EmpCode,
           EmpName
    FROM #TempFinalDetail
    CLOSE @TempSiteFinalCursor
    DEALLOCATE @TempSiteFinalCursor
    DROP TABLE #TempEmpDetail
    DROP TABLE #TempFinalDetail
END
I have scoured the internet for a solution (mostly Stack Overflow) and I cannot come up with anything.
Here is my goal: I have a local database and I have set up a linked server to another database. I am creating a trigger on one of my local tables. One of the column values is a Hotel ID. In the linked server there is a table called "Hotel". The point of this trigger is to check and make sure that the HotelID I am trying to insert into my local table is a value that exists in the linked server's Hotel table.
Example: If I want to insert a new row into my "Store Table" from local, I want to make sure that the HotelID I am trying to insert exists in the "Hotel" table in my linked server. If it does not exist, I want to rollback the transaction and display a message.
Below is the code I have been playing with. I feel like I could be close, but I am open to the idea that I am extremely far away.
FYI: The code inside of the IF NOT EXISTS statement is incorrect. I am just confused as to what needs to go in there.
CREATE TRIGGER tr_trigger ON Store
AFTER Insert
AS
DECLARE @HotelID smallint = (SELECT HotelID FROM inserted)
DECLARE @query NVARCHAR(MAX) = N'SELECT * FROM OPENQUERY (test,''
SELECT HotelID FROM test.dbo.Hotel WHERE HotelID = ''''' +
CONVERT(nvarchar(15),@HotelID) +''''''')'
DECLARE @StoredResult Nvarchar(20)
BEGIN
EXEC sp_executesql @query, N'@StoredResult NVARCHAR(MAX) OUTPUT', @StoredResult =
@StoredResult OUTPUT
SELECT @StoredResult
IF NOT EXISTS (SELECT * FROM OPENQUERY (test,' SELECT HotelID FROM test.dbo.Hotel'))
BEGIN
PRINT'That HotelID does not exist. Please try again.'
ROLLBACK TRANSACTION
END
END
GO
EDIT: This has been solved thanks to a couple of suggestions from marc_s. Below is my new code that works how I need it to.
CREATE TRIGGER tr_trigger ON Store
AFTER Update, Insert
AS
BEGIN
IF NOT EXISTS (SELECT A.* FROM OPENQUERY (test, 'SELECT HotelID FROM test.dbo.hotel') A
INNER JOIN inserted i
ON A.HotelID = i.HotelID)
BEGIN
PRINT'Please enter a valid HotelID'
ROLLBACK TRANSACTION
END
END
GO
How about:
CREATE TRIGGER tr_DataIntegrity ON Store
AFTER Update, Insert
AS
BEGIN
IF EXISTS (
SELECT * FROM inserted i
WHERE NOT EXISTS (
SELECT A.*
FROM OPENQUERY (TITAN_Prescott_Store, 'SELECT HotelID FROM FARMS_Prescott.dbo.hotel') A
WHERE A.HotelID = i.HotelID))
BEGIN
PRINT'Please do not enter an invalid HotelID'
ROLLBACK TRANSACTION
END
END
GO
I have to use a Stored Procedure - that I cannot change/modify. While it is a bit complicated, it can be simplified to be a SELECT statement i.e. with no RETURN or OUTPUT parameter. For the purpose of this discussion assume it to be something like:
SELECT [URL] as imgPath
FROM [mydatasource].[dbo].[DigitalContent]
I need to execute this Stored Procedure passing in the Row ID (SKU) of each row in a Table.
I use a cursor for this as below:
DECLARE @sku varchar(100)
DECLARE @imgPath varchar(500)
DECLARE c CURSOR FOR
    SELECT [SKU]
    FROM [mydatasource].[dbo].[PROD_TABLE]
OPEN c
FETCH NEXT FROM c INTO @sku
WHILE @@FETCH_STATUS = 0 BEGIN
    EXEC @imgPath = [mydatasource].[dbo].[getImage] @sku
    --UPDATE PROD_TABLE SET ImgPath=@imgPath WHERE SKU=@sku
    SELECT @imgPath AS ImgPath
    FETCH NEXT FROM c INTO @sku
END
CLOSE c
DEALLOCATE c
Unfortunately, the return value @imgPath comes back as 0, i.e. success. This results in 0s being inserted into my PROD_TABLE or dumped on the console. However, as the getImage stored procedure executes, it dumps the correct values of imgPath to the console.
How do I get this correct value (i.e. the result of the SELECT in the Stored Procedure) in the Loop above, so that I can correctly update my PROD_TABLE?
Answer
Thanks to RBarryYoung's suggestion, my final code looks like:
DECLARE @sku varchar(100)
DECLARE @imgPath varchar(500)
DECLARE c CURSOR FOR
    SELECT [SKU]
    FROM [mydatasource].[dbo].[PROD_TABLE]
OPEN c
FETCH NEXT FROM c INTO @sku
WHILE @@FETCH_STATUS = 0 BEGIN
    CREATE TABLE #OutData ( imgPath varchar(500) )
    INSERT INTO #OutData EXEC [mydatasource].[dbo].[getImage] @sku
    --UPDATE PROD_TABLE SET ImgPath=(SELECT * FROM #OutData) WHERE SKU=@sku
    SELECT * FROM #OutData
    DROP TABLE #OutData
    FETCH NEXT FROM c INTO @sku
END
CLOSE c
DEALLOCATE c
The performance may not be the best, but at least it works :-).
First, create a temp table (#OutData) whose definition matches the output dataset being returned by getImage.
Then, change your EXEC .. statement to this:
INSERT INTO #OutData EXEC [mydatasource].[dbo].[getImage] @sku
Response to the question: "Is it possible to insert the Key/Row ID into the Temp Table, that way I will not have to TRUNCATE it after each loop iteration?"
First, as a general rule you shouldn't use TRUNCATE on #temp tables as there are some obscure locking problems with that. If you need to do that, just DROP and CREATE them again (they're optimized for that anyway).
Secondly, you cannot modify the dataset returned by a stored procedure in any way. Of course, once it's in the #temp table you can do what you want with it. So you could add a KeyId column to #OutData. Then inside the loop make a second #temp table (#TmpData), and use INSERT..EXEC to dump into that table instead. Then INSERT..SELECT into #OutData by selecting from #TmpData, adding your KeyID column. Finally, DROP TABLE #TmpData as the last statement in your loop.
This should perform fairly well.
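Put together, the KeyId approach described above might look roughly like the sketch below. It assumes getImage returns a single imgPath column and that PROD_TABLE has an ImgPath column (as the commented-out UPDATE in the question suggests); the final set-based UPDATE is my own addition, not something from the original code.

```sql
-- Accumulates one row per SKU; KeyId records which SKU produced the path.
CREATE TABLE #OutData (KeyId varchar(100), imgPath varchar(500));

DECLARE @sku varchar(100);

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT [SKU]
    FROM [mydatasource].[dbo].[PROD_TABLE];

OPEN c;
FETCH NEXT FROM c INTO @sku;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Scratch table whose shape matches the result set of getImage.
    CREATE TABLE #TmpData (imgPath varchar(500));

    INSERT INTO #TmpData EXEC [mydatasource].[dbo].[getImage] @sku;

    -- Copy the result across, tagging it with the current SKU.
    INSERT INTO #OutData (KeyId, imgPath)
    SELECT @sku, imgPath FROM #TmpData;

    DROP TABLE #TmpData;

    FETCH NEXT FROM c INTO @sku;
END

CLOSE c;
DEALLOCATE c;

-- With every result keyed by SKU, the table can be updated in one statement.
UPDATE p
SET    p.ImgPath = o.imgPath
FROM   [mydatasource].[dbo].[PROD_TABLE] AS p
JOIN   #OutData AS o ON o.KeyId = p.SKU;

DROP TABLE #OutData;
```

The procedure is still called once per row, so the loop remains, but the write to PROD_TABLE happens once at the end instead of once per iteration.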
Sometimes executing code entirely inside SQL Server can be more difficult than doing so directly client-side, sending multiple queries calling the SProc (ideally batched in a single round-trip) and processing the results there directly.
Otherwise, the INSERT-EXEC method seems the easier if you absolutely can't modify the called procedure. There are a few alternative methods, all with some additional problems, shown here: http://www.sommarskog.se/share_data.html
My scenario is a bit different. What I am doing in my stored procedure is:
Create a temp table and insert rows into it using a cursor
Create Table #_tempRawFeed
(
Code Int Identity,
RawFeed VarChar(Max)
)
Insert data into the temp table using a cursor
Set @GetATM = Cursor Local Forward_Only Static For
    Select DeviceCode,ReceivedOn
    From RawStatusFeed
    Where C1BL=1 AND Processed=0
    Order By ReceivedOn Desc
Open @GetATM
Fetch Next
From @GetATM Into @ATM_ID,@Received_On
While @@FETCH_STATUS = 0
Begin
    Set @Raw_Feed=@ATM_ID+' '+Convert(VarChar,@Received_On,121)+' '+'002333'+' '+@ATM_ID+' : Bills - Cassette Type 1 - LOW '
    Insert Into #_tempRawFeed(RawFeed) Values(@Raw_Feed)
    Fetch Next
    From @GetATM Into @ATM_ID,@Received_On
End
Now I have to process each row in the temp table using another cursor
DECLARE @RawFeed VarChar(Max)
DECLARE Push_Data CURSOR FORWARD_ONLY LOCAL STATIC
FOR SELECT RawFeed
FROM #_tempRawFeed
OPEN Push_Data
FETCH NEXT FROM Push_Data INTO @RawFeed
WHILE @@FETCH_STATUS = 0
BEGIN
    /*
    What should I write here to retrieve each row one at a time?
    One row should be stored in the variable; in the next iteration the previous value should be discarded.
    */
    FETCH NEXT FROM Push_Data INTO @RawFeed
END
CLOSE Push_Data
DEALLOCATE Push_Data
Drop Table #_tempRawFeed
What should I write inside BEGIN to retrieve each row one at a time?
One row should be stored in the variable; in the next iteration the previous value should be discarded.
Regarding your last question, if what you are really intending to do within your last cursor is to concatenate RawFeed column values into one variable, you don't need cursors at all. You can use the following (adapted from your SQL Fiddle code):
CREATE TABLE #_tempRawFeed
(
    Code Int IDENTITY,
    RawFeed VarChar(MAX)
)
INSERT INTO #_tempRawFeed(RawFeed) VALUES('SAGAR')
INSERT INTO #_tempRawFeed(RawFeed) VALUES('Nikhil')
INSERT INTO #_tempRawFeed(RawFeed) VALUES('Deepali')
DECLARE @RawFeed VarChar(MAX)
-- Append each row's value to the variable, separated by ', '
SELECT @RawFeed = COALESCE(@RawFeed + ', ', '') + ISNULL(RawFeed, '')
FROM #_tempRawFeed
SELECT @RawFeed
DROP TABLE #_tempRawFeed
More on concatenating different row values into a single string here: Concatenate many rows into a single text string?
I am pretty sure that you can avoid using the first cursor as well. Please avoid using cursors, since they really hurt performance. The same result can be achieved using set-based operations.
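As an illustration of the set-based route for the first cursor, the whole loop can probably be collapsed into a single INSERT ... SELECT. This is only a sketch: it assumes DeviceCode is a character column (the question concatenates it directly with strings) and it reuses the table and column names from the question.

```sql
-- Same temp table as in the question.
CREATE TABLE #_tempRawFeed
(
    Code    Int IDENTITY,
    RawFeed VarChar(MAX)
);

-- One statement builds every feed line that the cursor built row by row.
INSERT INTO #_tempRawFeed (RawFeed)
SELECT DeviceCode + ' ' + CONVERT(varchar(30), ReceivedOn, 121)
       + ' ' + '002333' + ' ' + DeviceCode
       + ' : Bills - Cassette Type 1 - LOW '
FROM RawStatusFeed
WHERE C1BL = 1 AND Processed = 0
ORDER BY ReceivedOn DESC;   -- identity values follow this order

SELECT Code, RawFeed FROM #_tempRawFeed ORDER BY Code;

DROP TABLE #_tempRawFeed;
```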
I have a stored procedure that alters user data in a certain way. I pass it user_id and it does its thing. I want to run a query on a table and then, for each user_id I find, run the stored procedure once on that user_id.
How would I write a query for this?
Use a cursor.
ADDENDUM: [MS SQL cursor example]
declare @field1 int
declare @field2 int
declare cur CURSOR LOCAL for
    select field1, field2 from sometable where someotherfield is null
open cur
fetch next from cur into @field1, @field2
while @@FETCH_STATUS = 0 BEGIN
    --execute your sproc on each row
    exec uspYourSproc @field1, @field2
    fetch next from cur into @field1, @field2
END
close cur
deallocate cur
in MS SQL, here's an example article
note that cursors are slower than set-based operations, but faster than manual while-loops; more details in this SO question
ADDENDUM 2: if you will be processing more than just a few records, pull them into a temp table first and run the cursor over the temp table; this will prevent SQL from escalating into table-locks and speed up operation
ADDENDUM 3: and of course, if you can inline whatever your stored procedure is doing to each user ID and run the whole thing as a single SQL update statement, that would be optimal
Try to change your method if you need to loop!
Within the parent stored procedure, create a #temp table that contains the data that you need to process. Call the child stored procedure; the #temp table will be visible and you can process it, hopefully working with the entire set of data and without a cursor or loop.
This really depends on what this child stored procedure is doing. If you are UPDATE-ing, you can "update from" joining in the #temp table and do all the work in one statement without a loop. The same can be done for INSERTs and DELETEs. If you need to do multiple updates with IFs, you can convert those to multiple UPDATE FROM statements with the #temp table and use CASE expressions or WHERE conditions.
When working in a database try to lose the mindset of looping, it is a real performance drain, will cause locking/blocking and slow down the processing. If you loop everywhere, your system will not scale very well, and will be very hard to speed up when users start complaining about slow refreshes.
Post the content of this procedure you want call in a loop, and I'll bet 9 out of 10 times, you could write it to work on a set of rows.
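To illustrate the "update from" idea, here is a minimal sketch. The schema is entirely hypothetical (a dbo.Users table with user_id, status and last_login columns), since the question does not show what the procedure actually does; the point is only that the per-row IF logic becomes a CASE expression and the loop becomes one joined UPDATE.

```sql
-- Hypothetical work table: the rows to process and the value each should get.
CREATE TABLE #UsersToProcess
(
    user_id    int PRIMARY KEY,
    new_status varchar(20)
);

-- The IF/ELSE a per-row procedure would run becomes a CASE expression.
INSERT INTO #UsersToProcess (user_id, new_status)
SELECT u.user_id,
       CASE WHEN u.last_login < DATEADD(YEAR, -1, GETDATE())
            THEN 'inactive'
            ELSE 'active'
       END
FROM dbo.Users AS u
WHERE u.status IS NULL;

-- One statement updates every selected user; no cursor or WHILE loop.
UPDATE u
SET    u.status = t.new_status
FROM   dbo.Users AS u
JOIN   #UsersToProcess AS t ON t.user_id = u.user_id;

DROP TABLE #UsersToProcess;
```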
You can do it with a dynamic query.
declare @cadena varchar(max) = ''
select @cadena = @cadena + 'exec spAPI ' + ltrim(id) + ';'
from sysobjects;
exec(@cadena);
Something like this; substitutions will be needed for your table and field names.
Declare @TableUsers Table (User_ID nVarchar(50), MyRowCount Int Identity(1,1))
Declare @i Int, @MaxI Int, @UserID nVarchar(50)
Insert into @TableUsers (User_ID)
Select User_ID
From Users
Where (My Criteria)
Select @MaxI = @@RowCount, @i = 1
While @i <= @MaxI
Begin
    Select @UserID = User_ID from @TableUsers Where MyRowCount = @i
    Exec prMyStoredProc @UserID
    Select
    @i = @i + 1, @UserID = null
End
Use a table variable or a temporary table.
As has been mentioned before, a cursor is a last resort. Mostly because it uses lots of resources, issues locks and might be a sign you're just not understanding how to use SQL properly.
Side note: I once came across a solution that used cursors to update
rows in a table. After some scrutiny, it turned out the whole thing
could be replaced with a single UPDATE command. However, in this case,
where a stored procedure should be executed, a single SQL-command
won't work.
Create a table variable like this (if you're working with lots of data or are short on memory, use a temporary table instead):
DECLARE @menus AS TABLE (
    id INT IDENTITY(1,1),
    parent NVARCHAR(128),
    child NVARCHAR(128));
The id is important.
Replace parent and child with some good data, e.g. relevant identifiers or the whole set of data to be operated on.
Insert data in the table, e.g.:
INSERT INTO @menus (parent, child)
VALUES ('Some name', 'Child name');
...
INSERT INTO @menus (parent,child)
VALUES ('Some other name', 'Some other child name');
Declare some variables:
DECLARE @id INT = 1;
DECLARE @parentName NVARCHAR(128);
DECLARE @childName NVARCHAR(128);
And finally, create a while loop over the data in the table:
WHILE @id IS NOT NULL
BEGIN
    SELECT @parentName = parent,
           @childName = child
    FROM @menus WHERE id = @id;
    EXEC myProcedure @parent=@parentName, @child=@childName;
    SELECT @id = MIN(id) FROM @menus WHERE id > @id;
END
The first select fetches data from the temporary table. The second select updates the @id. MIN returns null if no rows were selected.
An alternative approach is to loop while the table has rows, SELECT TOP 1 and remove the selected row from the temp table:
WHILE EXISTS(SELECT 1 FROM #menuIDs)
BEGIN
    SELECT TOP 1 @menuID = menuID FROM #menuIDs;
    EXEC myProcedure @menuID=@menuID;
    DELETE FROM #menuIDs WHERE menuID = @menuID;
END;
Can this not be done with a user-defined function to replicate whatever your stored procedure is doing?
SELECT dbo.udfMyFunction(user_id), someOtherField, etc FROM MyTable WHERE WhateverCondition
where udfMyFunction is a function you make that takes in the user ID and does whatever you need to do with it.
See http://www.sqlteam.com/article/user-defined-functions for a bit more background
I agree that cursors really ought to be avoided where possible. And it usually is possible!
(of course, my answer presupposes that you're only interested in getting the output from the SP and that you're not changing the actual data. I find "alters user data in a certain way" a little ambiguous from the original question, so thought I'd offer this as a possible solution. Utterly depends on what you're doing!)
I like the dynamic query way of Dave Rincon as it does not use cursors and is small and easy. Thank you Dave for sharing.
But for my needs on Azure SQL and with a "distinct" in the query, I had to modify the code like this:
Declare @SQL nvarchar(max);
-- Set SQL Variable
-- Prepare exec command for each distinctive tenantid found in Machines
SELECT @SQL = (Select distinct 'exec dbo.sp_S2_Laser_to_cache ' +
                   convert(varchar(8),tenantid) + ';'
               from Dim_Machine
               where iscurrent = 1
               FOR XML PATH(''))
--for debugging print the sql
print @SQL;
--execute the generated sql script
exec sp_executesql @SQL;
I hope this helps someone...