SQL Query with Cursor optimization

I have a query where I iterate through a table and, for each entry, iterate through another table and then compute some results. I use a cursor to iterate through the first table. The query always takes more than 3 minutes to complete. If I do something similar in C#, where the tables are arrays or dictionaries, it takes less than a second. What am I doing wrong, and how can I make it more efficient?
DELETE FROM [QueryScores]
GO
INSERT INTO [QueryScores] (Id)
SELECT Id FROM [Documents]
DECLARE @Id NVARCHAR(50)
DECLARE myCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [Id] FROM [QueryScores]
OPEN myCursor
FETCH NEXT FROM myCursor INTO @Id
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @Score FLOAT = 0.0
DECLARE @CounterMax INT = (SELECT COUNT(*) FROM [Query])
DECLARE @Counter INT = 0
PRINT 'Document: ' + CAST(@Id AS VARCHAR)
PRINT 'Score: ' + CAST(@Score AS VARCHAR)
WHILE @Counter < @CounterMax
BEGIN
DECLARE @StemId INT = (SELECT [Query].[StemId] FROM [Query] WHERE [Query].[Id] = @Counter)
DECLARE @Weight FLOAT = (SELECT [tfidf].[Weight] FROM [TfidfWeights] AS [tfidf] WHERE [tfidf].[StemId] = @StemId AND [tfidf].[DocumentId] = @Id)
PRINT 'WEIGHT: ' + CAST(@Weight AS VARCHAR)
IF(@Weight > 0.0)
BEGIN
DECLARE @QWeight FLOAT = (SELECT [Query].[Weight] FROM [Query] WHERE [Query].[StemId] = @StemId)
SET @Score = @Score + (@QWeight * @Weight)
PRINT 'Score: ' + CAST(@Score AS VARCHAR)
END
SET @Counter = @Counter + 1
END
UPDATE [QueryScores] SET Score = @Score WHERE Id = @Id
FETCH NEXT FROM myCursor INTO @Id
END
CLOSE myCursor
DEALLOCATE myCursor
The logic: I have a list of documents and a question/query. I iterate through every document and, in a nested loop, through the query terms/words to check whether the document contains them. If it does, I multiply the pre-calculated weights and add the result to the document's score.

The problem is that you're trying to use a set-based language to iterate through things like a procedural language. SQL requires a different mindset. You should almost never be thinking in terms of loops in SQL.
From what I can gather from your code, this should do what you're trying to do in all of those loops, but it does it in a single statement in a set-based manner, which is what SQL is good at.
INSERT INTO QueryScores (id, score)
SELECT
D.id,
SUM(CASE WHEN W.[Weight] > 0 THEN W.[Weight] * Q.[Weight] ELSE NULL END)
FROM
Documents D
CROSS JOIN Query Q
LEFT OUTER JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.DocumentId = D.id
GROUP BY
D.id
Of course, without a description of your requirements or sample data with expected output I don't know if this is actually what you're looking to get, but it's my best guess given your code.
You should read: https://stackoverflow.com/help/how-to-ask
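To check that the set-based query gives the same scores as the cursor, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (so the SQL is SQLite dialect, not T-SQL). The table contents are made-up sample data, and row_by_row_scores is a hypothetical helper that imitates the cursor with one lookup per (document, query term) pair.

```python
import sqlite3


def build_db():
    """Create an in-memory database with made-up sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Documents (Id TEXT PRIMARY KEY);
        CREATE TABLE [Query] (Id INTEGER PRIMARY KEY, StemId INTEGER, Weight REAL);
        CREATE TABLE TfidfWeights (DocumentId TEXT, StemId INTEGER, Weight REAL);
        INSERT INTO Documents VALUES ('d1'), ('d2'), ('d3');
        INSERT INTO [Query] VALUES (0, 10, 0.5), (1, 20, 2.0);
        INSERT INTO TfidfWeights VALUES
            ('d1', 10, 0.5), ('d1', 20, 0.25), ('d2', 20, 1.5), ('d3', 30, 0.75);
    """)
    return con


# Set-based: one statement scores every document.
SET_BASED = """
    SELECT d.Id,
           SUM(CASE WHEN w.Weight > 0 THEN w.Weight * q.Weight END) AS Score
    FROM Documents d
    CROSS JOIN [Query] q
    LEFT JOIN TfidfWeights w ON w.StemId = q.StemId AND w.DocumentId = d.Id
    GROUP BY d.Id
    ORDER BY d.Id
"""


def set_based_scores(con):
    return dict(con.execute(SET_BASED).fetchall())


def row_by_row_scores(con):
    """Imitate the cursor: one lookup per (document, query term) pair."""
    terms = con.execute("SELECT StemId, Weight FROM [Query]").fetchall()
    scores = {}
    for (doc_id,) in con.execute("SELECT Id FROM Documents ORDER BY Id").fetchall():
        score = 0.0  # the cursor starts @Score at 0.0
        for stem_id, q_weight in terms:
            row = con.execute(
                "SELECT Weight FROM TfidfWeights WHERE StemId = ? AND DocumentId = ?",
                (stem_id, doc_id),
            ).fetchone()
            if row is not None and row[0] > 0:
                score += q_weight * row[0]
        scores[doc_id] = score
    return scores


con = build_db()
print(set_based_scores(con))   # {'d1': 0.75, 'd2': 3.0, 'd3': None}
print(row_by_row_scores(con))  # {'d1': 0.75, 'd2': 3.0, 'd3': 0.0}
```

One difference shows up for d3, which shares no stem with the query: SUM over no matching rows gives NULL, while the cursor starts @Score at 0.0. Wrap the SUM in COALESCE(..., 0) if 0 is what you want.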

The query I came up with is very similar to the one from Tom H.
There are a lot of unknowns about the problem the OP's code is trying to solve. Is there a particular reason the code only checks for rows in the Query table where the Id value is between 0 and one less than the number of rows in the table? Or is the intent really just to get all of the rows from Query?
Here's my version:
INSERT INTO QueryScores (Id, Score)
SELECT d.Id
, SUM(CASE WHEN w.Weight > 0 THEN w.Weight * q.Weight ELSE NULL END) AS Score
FROM [Documents] d
CROSS
JOIN [Query] q
LEFT
JOIN [TfidfWeights] w
ON w.StemId = q.StemId
AND w.DocumentId = d.Id
GROUP BY d.Id
Processing RBAR (row by agonizing row) is almost always going to be slower than processing as a set. SQL is designed to operate on sets of data. There is overhead for each individual SQL statement, and for each context switch between the procedure and the SQL engine. Sure, there might be room to improve performance of individual parts of the procedure, but the big gain is going to be doing an operation on the entire set, in a single SQL statement.
If there's some reason you need to process one document at a time with a cursor, then get rid of the inner loop, the individual SELECTs and all those PRINT statements, and just use a single query to get the score for each document:
OPEN myCursor
FETCH NEXT FROM myCursor INTO @Id
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE [QueryScores]
SET Score
= ( SELECT SUM( CASE WHEN w.Weight > 0
THEN w.Weight * q.Weight
ELSE NULL END
)
FROM [Query] q
JOIN [TfidfWeights] w
ON w.StemId = q.StemId
WHERE w.DocumentId = @Id
)
WHERE Id = @Id
FETCH NEXT FROM myCursor INTO @Id
END
CLOSE myCursor
DEALLOCATE myCursor
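The UPDATE inside this loop already holds the whole calculation, so the cursor can be dropped entirely: replace @Id with a reference to the row being updated and remove the outer WHERE. A minimal sketch of that single correlated UPDATE in SQLite through Python's sqlite3 module, with made-up sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE [Query] (Id INTEGER PRIMARY KEY, StemId INTEGER, Weight REAL);
    CREATE TABLE TfidfWeights (DocumentId TEXT, StemId INTEGER, Weight REAL);
    CREATE TABLE QueryScores (Id TEXT PRIMARY KEY, Score REAL);
    INSERT INTO [Query] VALUES (0, 10, 0.5), (1, 20, 2.0);
    INSERT INTO TfidfWeights VALUES
        ('d1', 10, 0.5), ('d1', 20, 0.25), ('d2', 20, 1.5), ('d3', 30, 0.75);
    INSERT INTO QueryScores (Id) VALUES ('d1'), ('d2'), ('d3');
""")

# Same subquery as the cursor body, but "w.DocumentId = @Id" now points at
# the row being updated, so one statement handles every document.
con.execute("""
    UPDATE QueryScores
    SET Score = (
        SELECT SUM(CASE WHEN w.Weight > 0 THEN w.Weight * q.Weight END)
        FROM [Query] q
        JOIN TfidfWeights w ON w.StemId = q.StemId
        WHERE w.DocumentId = QueryScores.Id
    )
""")

scores = dict(con.execute("SELECT Id, Score FROM QueryScores ORDER BY Id"))
print(scores)  # {'d1': 0.75, 'd2': 3.0, 'd3': None}
```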

You might not even need the Documents table:
INSERT INTO QueryScores (id, score)
SELECT W.DocumentId as [id]
, SUM(W.[Weight] * Q.[Weight]) as [score]
FROM Query Q
JOIN TfidfWeights W
ON W.StemId = Q.StemId
AND W.[Weight] > 0
GROUP BY W.DocumentId
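One trade-off to be aware of: with only an inner join, documents that share no stem with the query get no row at all, whereas the question seeds QueryScores with every document. A minimal sketch comparing the two shapes in SQLite through Python's sqlite3 module, with made-up sample data (document d3 matches no query term):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Documents (Id TEXT PRIMARY KEY);
    CREATE TABLE [Query] (Id INTEGER PRIMARY KEY, StemId INTEGER, Weight REAL);
    CREATE TABLE TfidfWeights (DocumentId TEXT, StemId INTEGER, Weight REAL);
    INSERT INTO Documents VALUES ('d1'), ('d2'), ('d3');
    INSERT INTO [Query] VALUES (0, 10, 0.5), (1, 20, 2.0);
    INSERT INTO TfidfWeights VALUES
        ('d1', 10, 0.5), ('d1', 20, 0.25), ('d2', 20, 1.5), ('d3', 30, 0.75);
""")

# Inner join only: d3 disappears because none of its stems are in the query.
without_documents = con.execute("""
    SELECT W.DocumentId, SUM(W.Weight * Q.Weight)
    FROM [Query] Q
    JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.Weight > 0
    GROUP BY W.DocumentId
    ORDER BY W.DocumentId
""").fetchall()

# Starting from Documents keeps every document, with NULL where nothing matched.
with_documents = con.execute("""
    SELECT D.Id, SUM(CASE WHEN W.Weight > 0 THEN W.Weight * Q.Weight END)
    FROM Documents D
    CROSS JOIN [Query] Q
    LEFT JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.DocumentId = D.Id
    GROUP BY D.Id
    ORDER BY D.Id
""").fetchall()

print(without_documents)  # [('d1', 0.75), ('d2', 3.0)]
print(with_documents)     # [('d1', 0.75), ('d2', 3.0), ('d3', None)]
```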


SQL Grouping with condition

I want to sum rows in a table. The algorithm is rather simple in theory, but building the query is hard (at least for me).
Generally, I want to sum the "values" of a "sub-group". A sub-group is a range of rows starting with the first row where type=0 and ending with the last row where type=1. A sub-group should contain only one (the first) row with type=0.
The sample below presents correct (left) and incorrect (right) behavior.
I tried several approaches, including grouping and partitioning, unfortunately without any success. Has anybody had a similar problem?
I'm using MS SQL Server (so T-SQL 'magic' is allowed).
EDIT:
The results I want:
"ab",6
"cdef",20
"ghi",10
"kl",8
You can identify the groups by doing a cumulative sum of zeros. Then use aggregation or window functions.
Note that SQL tables represent unordered sets, so you need a column to specify the ordering. The code below assumes that this column is id.
select min(id), max(id), sum(value)
from (select t.*,
sum(case when type = 0 then 1 else 0 end) over (order by id) as grp
from t
) t
group by grp
order by min(id);
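A quick way to see the running-sum grouping in action: a minimal sketch in SQLite (which supports the same window-function syntax) through Python's sqlite3 module. The rows are hypothetical, not the question's data; each type = 0 row starts a new grp value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, type INTEGER, value INTEGER)")
# Hypothetical rows: type = 0 at ids 1, 3 and 7 starts three sub-groups.
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 0, 1), (2, 1, 2), (3, 0, 3), (4, 1, 4), (5, 1, 5),
     (6, 1, 6), (7, 0, 7), (8, 1, 8), (9, 1, 9)],
)

# The running count of type = 0 rows is the group number.
groups = con.execute("""
    SELECT MIN(id), MAX(id), SUM(value)
    FROM (SELECT t.*,
                 SUM(CASE WHEN type = 0 THEN 1 ELSE 0 END) OVER (ORDER BY id) AS grp
          FROM t) t
    GROUP BY grp
    ORDER BY MIN(id)
""").fetchall()
print(groups)  # [(1, 2, 3), (3, 6, 18), (7, 9, 24)]
```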
You can use a window function with the same cumulative approach:
select t.*, sum(value) over (partition by grp)
from (select t.*, sum(case when type = 0 then 1 else 0 end) over (order by id) as grp
from t
) t
where grp > 0;
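Unlike the GROUP BY version, this keeps every row and attaches its group total; the where grp > 0 filter drops rows that come before the first type = 0 row, since their running count is still 0. A minimal sketch in SQLite through Python's sqlite3 module, with hypothetical rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, type INTEGER, value INTEGER)")
# Hypothetical rows; id 1 comes before any type = 0 row, so it gets grp = 0.
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 0, 1), (3, 1, 2), (4, 0, 3), (5, 1, 4)],
)

# WHERE runs before the window function, so the grp = 0 row is excluded.
rows = con.execute("""
    SELECT id, grp, SUM(value) OVER (PARTITION BY grp) AS group_total
    FROM (SELECT t.*,
                 SUM(CASE WHEN type = 0 THEN 1 ELSE 0 END) OVER (ORDER BY id) AS grp
          FROM t) t
    WHERE grp > 0
    ORDER BY id
""").fetchall()
print(rows)  # [(2, 1, 3), (3, 1, 3), (4, 2, 7), (5, 2, 7)]
```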
A solution with a cursor and an output table.
As Gordon wrote, the order of the rows is not defined, so ID is used for ordering here as well.
declare @output as table (
ID_sum nvarchar(max)
,value_sum int
)
DECLARE @ID as nvarchar(1)
,@value as int
,@type as int
,@ID_sum as nvarchar(max)
,@value_sum as int
,@last_type as int
DECLARE group_cursor CURSOR FOR
SELECT [ID],[value],[type]
FROM [t]
ORDER BY ID
OPEN group_cursor
FETCH NEXT FROM group_cursor
INTO @ID, @value,@type
WHILE @@FETCH_STATUS = 0
BEGIN
if (@last_type is null and @type = 0)
begin
set @ID_sum = @ID
set @value_sum = @value
end
if (@last_type in(0,1) and @type = 1)
begin
set @ID_sum += @ID
set @value_sum += @value
end
if (@last_type = 1 and @type = 0)
begin
insert into @output values (@ID_sum, @value_sum)
set @ID_sum = @ID
set @value_sum = @value
end
if (@last_type = 0 and @type = 0)
begin
set @ID_sum = @ID
set @value_sum = @value
end
set @last_type = @type
FETCH NEXT FROM group_cursor
INTO @ID, @value,@type
END
CLOSE group_cursor;
DEALLOCATE group_cursor;
if (@last_type = 1)
begin
insert into @output values (@ID_sum, @value_sum)
end
select *
from @output
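The cursor is essentially a small state machine over the rows in ID order. A Python port makes its rules easy to test; cursor_groups and the sample rows are hypothetical, and, as an assumption, the port skips type = 1 rows that appear before the first type = 0 row, where the T-SQL version would insert a (NULL, NULL) row. Note that of several consecutive type = 0 rows only the last one starts the sub-group, whereas the running-sum queries above give each type = 0 row its own group.

```python
def cursor_groups(rows):
    """Port of the cursor above; rows are (id, value, type) tuples in ID order.

    Assumption: type = 1 rows before the first type = 0 row are skipped.
    """
    output = []
    id_sum = value_sum = None  # None means "no sub-group started yet"
    last_type = None
    for row_id, value, row_type in rows:
        if row_type == 0:
            if last_type == 1 and id_sum is not None:
                output.append((id_sum, value_sum))  # close the finished sub-group
            # every type = 0 row (re)starts the sub-group, so of several
            # consecutive type = 0 rows only the last one is kept
            id_sum, value_sum = row_id, value
        elif id_sum is not None:
            id_sum += row_id  # a type = 1 row extends the open sub-group
            value_sum += value
        last_type = row_type
    if last_type == 1 and id_sum is not None:
        output.append((id_sum, value_sum))  # flush the last sub-group
    return output


sample = [("a", 1, 0), ("b", 2, 1), ("c", 3, 0), ("d", 4, 1), ("e", 5, 1),
          ("f", 6, 0), ("g", 7, 0), ("h", 8, 1)]
print(cursor_groups(sample))  # [('ab', 3), ('cde', 12), ('gh', 15)]
```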

Using a for loop in sql procedure

I want to use @odemeturu in a SQL stored procedure and check it in the WHERE condition. For example, I want to filter with (@odemeturu = 1 OR @odemeturu = 2 OR @odemeturu = 3 OR @odemeturu = 4). How can I solve this problem?
NOTE: The number of values in @odemeturu changes; '1,2,3,4' is not fixed.
alter PROCEDURE sp_siparis
(
@PageNo INT,
@RowCountPerPage INT,
@adsoyadfilter nvarchar(50),
@toplamtutarfilter decimal,
@tarihfilter datetime,
@odemeturu nvarchar(500) = '1,2,3,4'
)
AS
SELECT
u.AdiSoyadi as AdSoyad,
s.OdemeTipAdi as OdemeTipAdi,
sd.Adi as SiparisDurumAdi,
s.OlusturmaTarihi as OlusturmaTarihi,
s.GenelToplam as GenelToplam
FROM
Siparis as s with(NOLOCK)
inner join
SiparisDurum as sd with(NOLOCK) on s.Durumu=sd.Id
inner join
Uye as u with(NOLOCK) on s.Uye_Id=u.Id
WHERE
(u.AdiSoyadi LIKE '%' + @adsoyadfilter + '%') OR (s.GenelToplam = @toplamtutarfilter) OR (s.OlusturmaTarihi = @tarihfilter)
ORDER BY
s.Id OFFSET (@PageNo) ROWS FETCH NEXT @RowCountPerPage ROWS ONLY
GO
You can add these conditions to the WHERE clause,
or
you can use a cursor and, inside the WHILE loop, use IF to process only the records you need:
DECLARE data_Cursor Cursor For
-- SELECT data here
OPEN data_Cursor
FETCH NEXT FROM data_Cursor INTO
-- add variables here
WHILE @@FETCH_STATUS = 0
BEGIN
IF -- your if condition
BEGIN
-- your logic here
END
FETCH NEXT FROM data_Cursor INTO
-- add variables here
END
CLOSE data_Cursor
DEALLOCATE data_Cursor
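A set-based alternative to the cursor is to test each row's payment-type id against the comma-separated parameter inside the WHERE clause. On SQL Server 2016 or later, s.SomeTypeColumn IN (SELECT value FROM STRING_SPLIT(@odemeturu, ',')) does this directly. The sketch below instead uses the portable trick of wrapping both the list and the id in commas, checked in SQLite through Python's sqlite3 module (in T-SQL the same test is ',' + @odemeturu + ',' LIKE '%,' + CAST(... AS varchar(10)) + ',%'). The table contents and the OdemeTip_Id column are hypothetical, since the question does not show which column holds the payment-type id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical orders; OdemeTip_Id stands in for the payment-type id column.
con.execute("CREATE TABLE Siparis (Id INTEGER PRIMARY KEY, OdemeTip_Id INTEGER)")
con.executemany(
    "INSERT INTO Siparis VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 12)],
)


def orders_with_payment_types(odemeturu):
    """Return order ids whose payment type is in a list such as '1,2,3,4'.

    Assumes the list has no spaces around the commas.
    """
    # ',1,2,' LIKE '%,1,%' matches the whole id 1 only, so 12 is not hit by 1 or 2.
    sql = """
        SELECT Id FROM Siparis
        WHERE ',' || ? || ',' LIKE '%,' || CAST(OdemeTip_Id AS TEXT) || ',%'
        ORDER BY Id
    """
    return [row[0] for row in con.execute(sql, (odemeturu,))]


print(orders_with_payment_types("1,2,3,4"))  # [1, 2, 3, 4]
print(orders_with_payment_types("2,5"))      # [2, 5]
```

Because the column is wrapped in an expression, this filter cannot use an index on it; STRING_SPLIT with IN is usually the better fit where it is available.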

Loop through all the rows of a temp table and call a stored procedure for each row

I have declared a temp table to hold all the required values as follows:
DECLARE @temp TABLE
(
Password INT,
IdTran INT,
Kind VARCHAR(16)
)
INSERT INTO @temp
SELECT s.Password, s.IdTran, 'test'
from signal s inner join vefify v
on s.Password = v.Password
and s.IdTran = v.IdTran
and v.type = 'DEV'
where s.[Type] = 'start'
AND NOT EXISTS (SELECT * FROM signal s2
WHERE s.Password = s2.Password
and s.IdTran = s2.IdTran
and s2.[Type] = 'progress' )
INSERT INTO @temp
SELECT s.Password, s.IdTran, 'test'
FROM signal s inner join vefify v
on s.Password = v.Password
and s.IdTran = v.IdTran
and v.type = 'PROD'
where s.[Type] = 'progress'
AND NOT EXISTS (SELECT * FROM signal s2
WHERE s.Password = s2.Password
and s.IdTran = s2.IdTran
and s2.[Type] = 'finish' )
Now I need to loop through the rows in the @temp table and, for each row, call a stored procedure that takes all the columns of @temp as input parameters.
How can I achieve this?
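You could use a cursor (it must run in the same batch as the DECLARE @temp, because a table variable goes out of scope at the end of the batch):
DECLARE @password int, @idTran int, @kind varchar(16)
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR SELECT Password, IdTran, Kind FROM @temp
OPEN cur
FETCH NEXT FROM cur INTO @password, @idTran, @kind
WHILE @@FETCH_STATUS = 0 BEGIN
EXEC mysp @password, @idTran, @kind -- call your sp here
FETCH NEXT FROM cur INTO @password, @idTran, @kind
END
CLOSE cur
DEALLOCATE cur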
Try returning the dataset from your stored procedure into a DataTable in C# or VB.Net. The large amount of data in the DataTable can then be copied to your destination table with SqlBulkCopy. I have used SqlBulkCopy to load large DataTables with thousands of rows into SQL tables with great success in terms of performance.
You may want to experiment with SqlBulkCopy in your C# or VB.Net code.
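Something like this? (MySQL syntax; it has to run inside a stored procedure body.)
DECLARE maxval INT;
DECLARE val INT;
DECLARE ind INT DEFAULT 1;
SELECT MAX(ID) INTO maxval FROM `table`;
WHILE ind <= maxval DO
SELECT `value` INTO val FROM `table` WHERE `ID` = ind;
CALL fn(val); -- fn is your own stored procedure
SET ind = ind + 1;
END WHILE;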
You can do something like this:
Declare @min int=0, @max int =0 -- bounds of the loop counter
Declare @Recordid int,@To nvarchar(30),@Subject nvarchar(250),@Body nvarchar(max) -- values you need in each iteration
select ROW_NUMBER() OVER(ORDER BY [Recordid] ) AS Rownumber, Recordid, [To], [Subject], [Body], [Flag]
into #temp_Mail_Mstr FROM Mail_Mstr where Flag='1' --select the rows you need, with a row number, into a temp table
set @min = (select MIN(Rownumber) from #temp_Mail_Mstr); --Get minimum row number from temp table
set @max = (select Max(Rownumber) from #temp_Mail_Mstr); --Get maximum row number from temp table
while(@min <= @max)
BEGIN
select @Recordid=Recordid, @To=[To], @Subject=[Subject], @Body=Body from #temp_Mail_Mstr where Rownumber=@min
-- You can use your variables (like @Recordid,@To,@Subject,@Body) here
-- Do your work here
set @min=@min+1 --Increment of current row number
END
You don't always need a cursor for this; you can do it with a WHILE loop instead. Avoid cursors whenever possible; in my experience a WHILE loop is faster than a cursor.
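Numbering the filtered rows with ROW_NUMBER() is what lets this loop step its counter by 1 even when Recordid has gaps, which a loop that increments the ID itself cannot do. A minimal sketch of the same pattern in SQLite through Python's sqlite3 module, with hypothetical rows; the Python while loop stands in for the T-SQL one, and the temp table name drops the # prefix:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Mail_Mstr (Recordid INTEGER PRIMARY KEY, [To] TEXT, Flag TEXT)")
# Hypothetical rows: gaps in Recordid, and row 5 is filtered out by Flag.
con.executemany(
    "INSERT INTO Mail_Mstr VALUES (?, ?, ?)",
    [(3, "a@example.com", "1"), (5, "b@example.com", "0"),
     (7, "c@example.com", "1"), (12, "d@example.com", "1")],
)

# Number the filtered rows 1..n, like the SELECT ... INTO #temp_Mail_Mstr.
con.execute("""
    CREATE TEMP TABLE temp_Mail_Mstr AS
    SELECT ROW_NUMBER() OVER (ORDER BY Recordid) AS Rownumber, Recordid, [To]
    FROM Mail_Mstr
    WHERE Flag = '1'
""")

row_min, row_max = con.execute(
    "SELECT MIN(Rownumber), MAX(Rownumber) FROM temp_Mail_Mstr"
).fetchone()

processed = []
current = row_min
while current <= row_max:
    record_id, to = con.execute(
        "SELECT Recordid, [To] FROM temp_Mail_Mstr WHERE Rownumber = ?", (current,)
    ).fetchone()
    processed.append((record_id, to))  # "do your work here" goes in place of this
    current += 1

print(processed)
# [(3, 'a@example.com'), (7, 'c@example.com'), (12, 'd@example.com')]
```

The loop still handles one row per iteration, so it carries the same per-statement overhead as a cursor.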

SQL while loop with Temp Table

I need to create a temporary table and then update the original table. Creating the temporary table is not a problem.
create table #mod_contact
(
id INT IDENTITY NOT NULL PRIMARY KEY,
SiteID INT,
Contact1 varchar(25)
)
INSERT INTO #mod_contact (SiteID, Contact1)
select r.id, r.Contact from dbo.table1 r where CID = 142
GO
Now I need to loop through the table and update r.contact = SiteID + r.contact
I have never used a while loop before and can't seem to make any examples I have seen work.
You can do this in multiple ways, but I think you're looking for a way that uses a cursor.
A cursor is a sort of pointer into a table which, when advanced, points to the next record (it's more or less analogous to a for-next loop).
To use a cursor, you can do the following:
-- DECLARE the cursor
DECLARE CUR CURSOR FAST_FORWARD READ_ONLY FOR SELECT SiteID, Contact1 FROM #mod_contact
-- DECLARE some variables to store the values in
DECLARE @varSiteId int
DECLARE @varContact varchar(25)
-- Use the cursor
OPEN CUR
FETCH NEXT FROM CUR INTO @varSiteId, @varContact
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE dbo.table1
SET contact = CAST(@varSiteId AS varchar(12)) + @varContact -- CAST needed: SiteID is an int, Contact1 a varchar
WHERE id = @varSiteId -- SiteID holds table1.id (it was filled from r.id)
FETCH NEXT FROM CUR INTO @varSiteId, @varContact
END
CLOSE CUR
DEALLOCATE CUR
It's not the most efficient way to get this done, but I think this is what you were looking for.
Hope it helps.
Use a set-based approach; there's no need to loop (based on the few details given):
UPDATE
r
SET
r.Contact = CAST(m.SiteID AS varchar(12)) + r.Contact
FROM
table1 r
INNER JOIN
#mod_contact m
ON m.SiteID = r.id
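A minimal sketch of the set-based update in SQLite through Python's sqlite3 module, with hypothetical rows. Because the temporary table's SiteID column is filled from r.id, it is the key that links back to table1. SQLite only gained UPDATE ... FROM in version 3.33, so this sketch swaps in a correlated subquery, which expresses the same single-statement update; || replaces T-SQL's + for string concatenation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, CID INTEGER, Contact TEXT);
    INSERT INTO table1 VALUES (1, 142, 'Ann'), (2, 142, 'Bob'), (3, 7, 'Cy');

    -- temp table filled the same way as in the question
    CREATE TEMP TABLE mod_contact (
        id INTEGER PRIMARY KEY,
        SiteID INTEGER,
        Contact1 TEXT
    );
    INSERT INTO mod_contact (SiteID, Contact1)
    SELECT r.id, r.Contact FROM table1 r WHERE CID = 142;
""")

# One statement updates every matching row; CAST avoids mixing int and text.
con.execute("""
    UPDATE table1
    SET Contact = (SELECT CAST(m.SiteID AS TEXT) FROM mod_contact m
                   WHERE m.SiteID = table1.id) || Contact
    WHERE id IN (SELECT SiteID FROM mod_contact)
""")

result = con.execute("SELECT id, Contact FROM table1 ORDER BY id").fetchall()
print(result)  # [(1, '1Ann'), (2, '2Bob'), (3, 'Cy')]
```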
Your brain wants to do this:
while records
update(i); //update record i
records = records + 1
end while
SQL is set-based and lets you take a whole bunch of records and update them in a single command. The beauty of this is that you can use the WHERE clause to filter out rows that are not needed.
As others have mentioned, learning how to do loops in SQL is generally a bad idea; however, since you're trying to understand how to do something, here's an example:
DECLARE @ID int
SELECT @ID =1
WHILE @ID <= (SELECT MAX(ID) FROM table_1)
-- while some condition is true, then do the following
--actions between the BEGIN and END
BEGIN
UPDATE table_1
SET contact = CAST(siteID as varchar(100)) + contact
WHERE table_1.CID = @ID
--increment the step variable so that the condition will eventually be false
SET @ID = @ID + 1
END
--do something else once the condition is satisfied
PRINT 'DONE!! Don''t try this in production code...'
Try this one:
-- DECLARE the cursor
DECLARE CUR CURSOR FAST_FORWARD READ_ONLY FOR SELECT column1,column2 FROM table
-- DECLARE some variables to store the values in
DECLARE @varId int
DECLARE @varSiteId int
--DECLARE @varContract varchar(25)
-- Use the cursor
OPEN CUR
FETCH NEXT FROM CUR INTO @varId, @varSiteId
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT *
FROM Table2
WHERE column1 = @varId
AND column2 = @varSiteId
FETCH NEXT FROM CUR INTO @varId, @varSiteId
END
CLOSE CUR
DEALLOCATE CUR
need to create a temporary table and then update the original table.
Why use a temporary table at all? Your CID column doesn't appear in the temporary table, so I don't see how you can successfully update the original table using SiteID, unless there is only one row where CID = 142, in which case using a temp table is definitely overkill.
You can just do this:
UPDATE dbo.table1
SET contact = SiteID + contact
WHERE CID = 142;
Here's a related example which may help getting you to 'think in SQL':
UPDATE T
SET A = B, B = A;
Assuming A and B are of the same type, this would successfully swap their values.
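A minimal sketch of that swap in SQLite through Python's sqlite3 module. Every right-hand side is evaluated against the row as it was before the UPDATE, so no temporary variable is needed; MySQL is a known exception, since it evaluates single-table assignments from left to right.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (A INTEGER, B INTEGER)")
con.executemany("INSERT INTO T VALUES (?, ?)", [(1, 10), (2, 20)])

# Both assignments read the pre-update values, so this swaps A and B.
con.execute("UPDATE T SET A = B, B = A")

swapped = con.execute("SELECT A, B FROM T ORDER BY A").fetchall()
print(swapped)  # [(10, 1), (20, 2)]
```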

How to update a column fetched by a cursor in TSQL

Before I go any further: Yes, I know that cursors perform poorly compared with set-based operations. In this particular case I'm running a cursor on a temporary table of 100 or so records, and that temporary table will always be fairly small, so performance is less crucial than flexibility.
My difficulty is that I'm having trouble finding an example of how to update a column fetched by a cursor. Previously when I've used cursors I've retrieved values into variables, then run an update query at each step based upon these values. On this occasion I want to update a field in the temporary table, yet I can't figure out how to do it.
In the example below, I'm trying to update the field CurrentPOs in temporary table #t1, based upon a query that uses #t1.Product_ID to look up the required value. You will see in the code that I have attempted to use the notation curPO.Product_ID to reference this, but it doesn't work. I have also attempted to use an update statement against curPO, also unsuccessfully.
I can make the code work by fetching to variables, but I'd like to know how to update the field directly.
I think I'm probably missing something obvious, but can anyone help?
declare curPO cursor
for select Product_ID, CurrentPOs from #t1
for update of CurrentPOs
open curPO
fetch next from curPO
while @@fetch_status = 0
begin
select OrderQuantity = <calculation>,
ReceiveQuantity = <calculation>
into #POs
from PurchaseOrderLine POL
inner join SupplierAddress SA ON POL.Supplier_ID = SA.Supplier_ID
inner join PurchaseOrderHeader POH ON POH.PurchaseOrder_ID = POL.PurchaseOrder_ID
where Product_ID = curPO.Product_ID
and SA.AddressType = '1801'
update curPO set CurrentPOs = (select sum(OrderQuantity) - sum(ReceiveQuantity) from #POs)
drop table #POs
fetch next from curPO
end
close curPO
deallocate curPO
After doing a bit more googling, I found a partial solution. The update code is as follows:
UPDATE #T1
SET CURRENTPOS = (SELECT SUM(ORDERQUANTITY) - SUM(RECEIVEQUANTITY)
FROM #POS)
WHERE CURRENT OF CURPO
I still had to use FETCH INTO, however, to retrieve #t1.Product_ID and run the query that produces #POs, so I'd still like to know whether it's possible to use FETCH on its own.
Is this what you want? T-SQL has no way to reference the fetched row's columns through the cursor name (curPO.Product_ID), so fetch them INTO variables and update #t1 WHERE CURRENT OF curPO:
declare @Product_ID int -- assumed type; use the type of #t1.Product_ID
declare @CurrentPOs int -- assumed type; use the type of #t1.CurrentPOs
declare curPO cursor
for select Product_ID, CurrentPOs from #t1
for update of CurrentPOs
open curPO
fetch next from curPO into @Product_ID, @CurrentPOs
while @@fetch_status = 0
begin
update #t1 set CurrentPOs =
(select sum(<OrderQuantityCalculation>)
from PurchaseOrderLine POL
inner join SupplierAddress SA ON POL.Supplier_ID = SA.Supplier_ID
inner join PurchaseOrderHeader POH ON POH.PurchaseOrder_ID = POL.PurchaseOrder_ID
where Product_ID = @Product_ID
and SA.AddressType = '1801') -
(select sum(<ReceiveQuantityCalculation>)
from PurchaseOrderLine POL
inner join SupplierAddress SA ON POL.Supplier_ID = SA.Supplier_ID
inner join PurchaseOrderHeader POH ON POH.PurchaseOrder_ID = POL.PurchaseOrder_ID
where Product_ID = @Product_ID
and SA.AddressType = '1801')
where current of curPO
fetch next from curPO into @Product_ID, @CurrentPOs
end
close curPO
deallocate curPO
Maybe you need something like this:
update DataBaseName..TableName
set ColumnName = value
where current of your_cursor_name;
Here's an example that calculates one column based on values from two others (note that this could also be done in the original table select). The example can be copy-pasted into an SSMS query window and run without any editing.
DECLARE @cust_id INT = 2, @dynamic_val NVARCHAR(40), @val_a INT, @val_b INT
DECLARE @tbl_invoice table(Cust_ID INT, Cust_Fees INT, Cust_Tax INT)
INSERT @tbl_invoice ( Cust_ID, Cust_Fees, Cust_Tax ) SELECT 1, 111, 11
INSERT @tbl_invoice ( Cust_ID, Cust_Fees, Cust_Tax ) SELECT 2, 222, 22
INSERT @tbl_invoice ( Cust_ID, Cust_Fees, Cust_Tax ) SELECT 3, 333, 33
DECLARE @TblCust TABLE
(
Rec_ID INT
, Val_A INT
, Val_B INT
, Dynamic_Val NVARCHAR(40)
, PRIMARY KEY NONCLUSTERED (Rec_ID)
)
INSERT @TblCust(Rec_ID, Val_A, Val_B, Dynamic_Val)
SELECT Rec_ID = Cust_ID, Val_A = Cust_Fees, Val_B = Cust_Tax, NULL
FROM @tbl_invoice
DECLARE cursor_cust CURSOR FOR
SELECT Rec_ID, Val_A, Val_B, Dynamic_Val
FROM @TblCust
WHERE Rec_ID <> @cust_id
FOR UPDATE OF Dynamic_Val;
OPEN cursor_cust;
FETCH NEXT FROM cursor_cust INTO @cust_id, @val_a, @val_b, @dynamic_val;
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE @TblCust
SET Dynamic_Val = N'@c = "' + LTRIM(STR((@val_a + @val_b), 40)) + N'"'
WHERE CURRENT OF cursor_cust
FETCH NEXT FROM cursor_cust INTO @cust_id, @val_a, @val_b, @dynamic_val;
END
CLOSE cursor_cust
DEALLOCATE cursor_cust
SELECT * FROM @TblCust
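As the answer notes, Dynamic_Val could also be computed in the original select without any cursor. A minimal sketch of that set-based version, using the same three invoice rows in SQLite through Python's sqlite3 module (|| replaces the T-SQL + and LTRIM(STR(...)) string building):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl_invoice (Cust_ID INTEGER, Cust_Fees INTEGER, Cust_Tax INTEGER);
    INSERT INTO tbl_invoice VALUES (1, 111, 11), (2, 222, 22), (3, 333, 33);
""")

cust_id = 2  # the customer the cursor's WHERE Rec_ID <> @cust_id skips
rows = con.execute("""
    SELECT Cust_ID AS Rec_ID, Cust_Fees AS Val_A, Cust_Tax AS Val_B,
           CASE WHEN Cust_ID <> ?
                THEN '@c = "' || (Cust_Fees + Cust_Tax) || '"'
           END AS Dynamic_Val
    FROM tbl_invoice
    ORDER BY Cust_ID
""", (cust_id,)).fetchall()

print(rows)
# [(1, 111, 11, '@c = "122"'), (2, 222, 22, None), (3, 333, 33, '@c = "366"')]
```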