Deduplication of imported records in SQL server

Deduplication of imported records in SQL server - sql

I have the following T_SQL Stored Procedure that is currently taking up 50% of the total time needed to run all processes on newly imported records into our backend analysis suite. Unfortunately, this data needs to be imported every time and is causing a bottleneck as our DB size grows.
Basically, we are trying to identify all duplicate in the records and keep only one of them.
DECLARE #status INT
SET #status = 3
DECLARE #contactid INT
DECLARE #email VARCHAR (100)
--Contacts
DECLARE email_cursor CURSOR FOR
SELECT email FROM contacts WHERE (reference = #reference AND status = 1 ) GROUP BY email HAVING (COUNT(email) > 1)
OPEN email_cursor
FETCH NEXT FROM email_cursor INTO #email
WHILE ##FETCH_STATUS = 0
BEGIN
PRINT #email
UPDATE contacts SET duplicate = 1, status = #status WHERE email = #email and reference = #reference AND status = 1
SELECT TOP 1 #contactid = id FROM contacts where reference = #reference and email = #email AND duplicate = 1
UPDATE contacts SET duplicate =0, status = 1 WHERE id = #contactid
FETCH NEXT FROM email_cursor INTO #email
END
CLOSE email_cursor
DEALLOCATE email_cursor
I have added all the indexes I can see from query execution plans, but it may be possible to update the entire SP to run differently, as I have managed to do with others.

Use this single query to de-dup.
;with tmp as (
select *
,rn=row_number() over (partition by email, reference order by id)
,c=count(1) over (partition by email, reference)
from contacts
where status = 1
)
update tmp
set duplicate = case when rn=1 then 0 else 1 end
,status = case when rn=1 then 1 else 3 end
where c > 1
;
It will only de-dup among the records where status=1, and considers rows with the same (email,reference) combination as dups.

Related

SQL Grouping with condition

I want to sum rows in table. The algorithm is rather simple in theory but hard (at least for me) when I need to build a query.
Generally, I want to sum "values" of a "sub-group". Sub-group is defined as a range of elements starting with first row where type=0 and finishing with last row where type=1. the sub-group should contain only one (first) row with type=0.
The sample below presents correct (left) and incorrect (right) behavior.
I tried several approaches including grouping and partitioning. Unfortunately w/o any success. Anybody had similar problem?
I used MS SQL Server (so T-SQL 'magic' is allowed)
EDIT:
The results I want:
"ab",6
"cdef",20
"ghi",10
"kl",8

You can identify the groups by doing a cumulative sum of zeros. Then use aggregation or window functions.
Note that SQL tables represent unordered sets, so you need a column to specify the ordering. The code below assumes that this column is id.
select min(id), max(id), sum(value)
from (select t.*,
sum(case when type = 0 then 1 else 0 end) over (order by id) as grp
from t
) t
group by grp
order by min(id);

You can use window function with cumulative approach :
select t.*, sum(value) over (partition by grp)
from (select t.*, sum(case when type = 0 then 1 else 0 end) over (order by id) as grp
from table t
) t
where grp > 0;

Solution with a cursor and output-table.
As Gordon wrote it is not defined how the set will be ordered, so ID is also used here.
declare #output as table (
ID_sum nvarchar(max)
,value_sum int
)
DECLARE #ID as nvarchar(1)
,#value as int
,#type as int
,#ID_sum as nvarchar(max)
,#value_sum as int
,#last_type as int
DECLARE group_cursor CURSOR FOR
SELECT [ID],[value],[type]
FROM [t]
ORDER BY ID
OPEN group_cursor
FETCH NEXT FROM group_cursor
INTO #ID, #value,#type
WHILE ##FETCH_STATUS = 0
BEGIN
if (#last_type is null and #type = 0)
begin
set #ID_sum = #ID
set #value_sum = #value
end
if (#last_type in(0,1) and #type = 1)
begin
set #ID_sum += #ID
set #value_sum += #value
end
if (#last_type = 1 and #type = 0)
begin
insert into #output values (#ID_sum, #value_sum)
set #ID_sum = #ID
set #value_sum = #value
end
if (#last_type = 0 and #type = 0)
begin
set #ID_sum = #ID
set #value_sum = #value
end
set #last_type = #type
FETCH NEXT FROM group_cursor
INTO #ID, #value,#type
END
CLOSE group_cursor;
DEALLOCATE group_cursor;
if (#last_type = 1)
begin
insert into #output values (#ID_sum, #value_sum)
end
select *
from #output

Loop and update without cursor

I have a table where item balances are stored.
CREATE TABLE itembalance (
ItemID VARCHAR(15),
RemainingQty INT,
Cost Money,
Id INT
)
I need to make sure that whenever an item is being sent out, the proper balances are deducted from the itembalance table. I do it this way:
DECLARE crsr CURSOR LOCAL FAST_FORWARD FOR
SELECT
itembalance.Cost,
itembalance.RemainingQty
itembalance.Id
FROM dbo.itembalance
WHERE itembalance.ItemID = #v_item_to_be_updated AND RemainingQty > 0
OPEN crsr
FETCH crsr
INTO
#cost,
#qty,
#id
WHILE ##FETCH_STATUS = 0
BEGIN
IF #qty >= #qty_to_be_deducted
BEGIN
UPDATE itembalance SET RemainingQty = RemainingQty - #qty_to_be_deducted WHERE Id = #id
/*do something with cost*/ BREAK
END
ELSE
BEGIN
UPDATE itembalance SET RemainingQty = 0 WHERE Id = #id
/*do something with cost*/ SET #qty_to_be_deducted = #qty_to_be_deducted - #qty
END
FETCH crsr
INTO
#cost,
#qty,
#id
END
CLOSE crsr
DEALLOCATE crsr
The table may contain same item code but with different cost. This code is okay for few items being updated at a time but whenever a lot of items/quantities are being sent out, the process becomes really slow. Is there a way to optimize this code? I am guessing the cursor is making it slow so I want to explore a different code for this process.

This looks like you just need a simple CASE expression:
UPDATE dbo.itembalance
SET Qty = CASE WHEN Qty >= #qty_to_be_deducted THEN Qty - #qty_to_be_deducted ELSE 0 END
WHERE ItemID = #v_item_to_be_updated
--What is the difference between Qty and RemainingQty?
--Why are you checking one and updating the other?
AND RemainingQty > 0;

You code is not very clear as to how and why the mechanism is required and works.
However assuming that you must have multiple records with an outstanding balance, and that you must consider multiple records sequentially as part of this mechanism, then you have two options to solve that within SQL (handling in client code is another option):
1) Use a cursor as you have done
2) Use a temp table or table variable and iterate over it - pretty similar to a cursor but might be faster - you'd have to try and see e.g.
declare #TableVariable table (Cost money, RemainingQty int, Id int, OrderBy int, Done bit default(0))
declare #Id int, #Cost money, #RemainingQty int
insert into #TableVariable (Cost, RemainingQty, Id, OrderBy)
SELECT
itembalance.Cost
, itembalance.RemainingQty
, itembalance.Id
, 1 /* Some order by condition */
FROM dbo.itembalance
WHERE itembalance.ItemID = #v_item_to_be_updated AND RemainingQty > 0
while exists (select 1 from #TableVariable where Done = 0) begin
select top 1 #Id = id, #Cost = Cost, #RemainingQty
from #TableVariable
where Done = 0
order by OrderBy
-- Do stuff here
update #TableVariable set Done = 1 where id = #Id
end
However the code you have shown doesn't appear that it should be slow - so it may be that you are lacking the appropriate indexes, and that a single ItemId update is locking too many rows in the ItemBalance table which is then affecting other ItemId updates.

Using a temp table with a stored procedure to cycle through IDs [duplicate]

How can one call a stored procedure for each row in a table, where the columns of a row are input parameters to the sp without using a Cursor?

Generally speaking I always look for a set based approach (sometimes at the expense of changing the schema).
However, this snippet does have its place..
-- Declare & init (2008 syntax)
DECLARE #CustomerID INT = 0
-- Iterate over all customers
WHILE (1 = 1)
BEGIN
-- Get next customerId
SELECT TOP 1 #CustomerID = CustomerID
FROM Sales.Customer
WHERE CustomerID > #CustomerId
ORDER BY CustomerID
-- Exit loop if no more customers
IF ##ROWCOUNT = 0 BREAK;
-- call your sproc
EXEC dbo.YOURSPROC #CustomerId
END

You could do something like this: order your table by e.g. CustomerID (using the AdventureWorks Sales.Customer sample table), and iterate over those customers using a WHILE loop:
-- define the last customer ID handled
DECLARE #LastCustomerID INT
SET #LastCustomerID = 0
-- define the customer ID to be handled now
DECLARE #CustomerIDToHandle INT
-- select the next customer to handle
SELECT TOP 1 #CustomerIDToHandle = CustomerID
FROM Sales.Customer
WHERE CustomerID > #LastCustomerID
ORDER BY CustomerID
-- as long as we have customers......
WHILE #CustomerIDToHandle IS NOT NULL
BEGIN
-- call your sproc
-- set the last customer handled to the one we just handled
SET #LastCustomerID = #CustomerIDToHandle
SET #CustomerIDToHandle = NULL
-- select the next customer to handle
SELECT TOP 1 #CustomerIDToHandle = CustomerID
FROM Sales.Customer
WHERE CustomerID > #LastCustomerID
ORDER BY CustomerID
END
That should work with any table as long as you can define some kind of an ORDER BY on some column.

DECLARE #SQL varchar(max)=''
-- MyTable has fields fld1 & fld2
Select #SQL = #SQL + 'exec myproc ' + convert(varchar(10),fld1) + ','
+ convert(varchar(10),fld2) + ';'
From MyTable
EXEC (#SQL)
Ok, so I would never put such code into production, but it does satisfy your requirements.

I'd use the accepted answer, but another possibility is to use a table variable to hold a numbered set of values (in this case just the ID field of a table) and loop through those by Row Number with a JOIN to the table to retrieve whatever you need for the action within the loop.
DECLARE #RowCnt int; SET #RowCnt = 0 -- Loop Counter
-- Use a table variable to hold numbered rows containg MyTable's ID values
DECLARE #tblLoop TABLE (RowNum int IDENTITY (1, 1) Primary key NOT NULL,
ID INT )
INSERT INTO #tblLoop (ID) SELECT ID FROM MyTable
-- Vars to use within the loop
DECLARE #Code NVarChar(10); DECLARE #Name NVarChar(100);
WHILE #RowCnt < (SELECT COUNT(RowNum) FROM #tblLoop)
BEGIN
SET #RowCnt = #RowCnt + 1
-- Do what you want here with the data stored in tblLoop for the given RowNum
SELECT #Code=Code, #Name=LongName
FROM MyTable INNER JOIN #tblLoop tL on MyTable.ID=tL.ID
WHERE tl.RowNum=#RowCnt
PRINT Convert(NVarChar(10),#RowCnt) +' '+ #Code +' '+ #Name
END

Marc's answer is good (I'd comment on it if I could work out how to!)
Just thought I'd point out that it may be better to change the loop so the SELECT only exists once (in a real case where I needed to do this, the SELECT was quite complex, and writing it twice was a risky maintenance issue).
-- define the last customer ID handled
DECLARE #LastCustomerID INT
SET #LastCustomerID = 0
-- define the customer ID to be handled now
DECLARE #CustomerIDToHandle INT
SET #CustomerIDToHandle = 1
-- as long as we have customers......
WHILE #LastCustomerID <> #CustomerIDToHandle
BEGIN
SET #LastCustomerId = #CustomerIDToHandle
-- select the next customer to handle
SELECT TOP 1 #CustomerIDToHandle = CustomerID
FROM Sales.Customer
WHERE CustomerID > #LastCustomerId
ORDER BY CustomerID
IF #CustomerIDToHandle <> #LastCustomerID
BEGIN
-- call your sproc
END
END

If you can turn the stored procedure into a function that returns a table, then you can use cross-apply.
For example, say you have a table of customers, and you want to compute the sum of their orders, you would create a function that took a CustomerID and returned the sum.
And you could do this:
SELECT CustomerID, CustomerSum.Total
FROM Customers
CROSS APPLY ufn_ComputeCustomerTotal(Customers.CustomerID) AS CustomerSum
Where the function would look like:
CREATE FUNCTION ComputeCustomerTotal
(
#CustomerID INT
)
RETURNS TABLE
AS
RETURN
(
SELECT SUM(CustomerOrder.Amount) AS Total FROM CustomerOrder WHERE CustomerID = #CustomerID
)
Obviously, the example above could be done without a user defined function in a single query.
The drawback is that functions are very limited - many of the features of a stored procedure are not available in a user-defined function, and converting a stored procedure to a function does not always work.

For SQL Server 2005 onwards, you can do this with CROSS APPLY and a table-valued function.
Using CROSS APPLY in SQL Server 2005
Just for clarity, I'm referring to those cases where the stored procedure can be converted into a table valued function.

This is a variation on the answers already provided, but should be better performing because it doesn't require ORDER BY, COUNT or MIN/MAX. The only disadvantage with this approach is that you have to create a temp table to hold all the Ids (the assumption is that you have gaps in your list of CustomerIDs).
That said, I agree with #Mark Powell though that, generally speaking, a set based approach should still be better.
DECLARE #tmp table (Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL, CustomerID INT NOT NULL)
DECLARE #CustomerId INT
DECLARE #Id INT = 0
INSERT INTO #tmp SELECT CustomerId FROM Sales.Customer
WHILE (1=1)
BEGIN
SELECT #CustomerId = CustomerId, #Id = Id
FROM #tmp
WHERE Id = #Id + 1
IF ##rowcount = 0 BREAK;
-- call your sproc
EXEC dbo.YOURSPROC #CustomerId;
END

This is a variation of n3rds solution above. No sorting by using ORDER BY is needed, as MIN() is used.
Remember that CustomerID (or whatever other numerical column you use for progress) must have a unique constraint. Furthermore, to make it as fast as possible CustomerID must be indexed on.
-- Declare & init
DECLARE #CustomerID INT = (SELECT MIN(CustomerID) FROM Sales.Customer); -- First ID
DECLARE #Data1 VARCHAR(200);
DECLARE #Data2 VARCHAR(200);
-- Iterate over all customers
WHILE #CustomerID IS NOT NULL
BEGIN
-- Get data based on ID
SELECT #Data1 = Data1, #Data2 = Data2
FROM Sales.Customer
WHERE [ID] = #CustomerID ;
-- call your sproc
EXEC dbo.YOURSPROC #Data1, #Data2
-- Get next customerId
SELECT #CustomerID = MIN(CustomerID)
FROM Sales.Customer
WHERE CustomerID > #CustomerId
END
I use this approach on some varchars I need to look over, by putting them in a temporary table first, to give them an ID.

If you don't what to use a cursor I think you'll have to do it externally (get the table, and then run for each statement and each time call the sp)
it Is the same as using a cursor, but only outside SQL.
Why won't you use a cursor ?

I usually do it this way when it's a quite a few rows:
Select all sproc parameters in a dataset with SQL Management Studio
Right-click -> Copy
Paste in to excel
Create single-row sql statements with a formula like '="EXEC schema.mysproc #param=" & A2' in a new excel column. (Where A2 is your excel column containing the parameter)
Copy the list of excel statements into a new query in SQL Management Studio and execute.
Done.
(On larger datasets i'd use one of the solutions mentioned above though).

DELIMITER //
CREATE PROCEDURE setFakeUsers (OUT output VARCHAR(100))
BEGIN
-- define the last customer ID handled
DECLARE LastGameID INT;
DECLARE CurrentGameID INT;
DECLARE userID INT;
SET #LastGameID = 0;
-- define the customer ID to be handled now
SET #userID = 0;
-- select the next game to handle
SELECT #CurrentGameID = id
FROM online_games
WHERE id > LastGameID
ORDER BY id LIMIT 0,1;
-- as long as we have customers......
WHILE (#CurrentGameID IS NOT NULL)
DO
-- call your sproc
-- set the last customer handled to the one we just handled
SET #LastGameID = #CurrentGameID;
SET #CurrentGameID = NULL;
-- select the random bot
SELECT #userID = userID
FROM users
WHERE FIND_IN_SET('bot',baseInfo)
ORDER BY RAND() LIMIT 0,1;
-- update the game
UPDATE online_games SET userID = #userID WHERE id = #CurrentGameID;
-- select the next game to handle
SELECT #CurrentGameID = id
FROM online_games
WHERE id > LastGameID
ORDER BY id LIMIT 0,1;
END WHILE;
SET output = "done";
END;//
CALL setFakeUsers(#status);
SELECT #status;

A better solution for this is to
Copy/past code of Stored Procedure
Join that code with the table for which you want to run it again (for each row)
This was you get a clean table-formatted output. While if you run SP for every row, you get a separate query result for each iteration which is ugly.

In case the order is important
--declare counter
DECLARE #CurrentRowNum BIGINT = 0;
--Iterate over all rows in [DataTable]
WHILE (1 = 1)
BEGIN
--Get next row by number of row
SELECT TOP 1 #CurrentRowNum = extendedData.RowNum
--here also you can store another values
--for following usage
--#MyVariable = extendedData.Value
FROM (
SELECT
data.*
,ROW_NUMBER() OVER(ORDER BY (SELECT 0)) RowNum
FROM [DataTable] data
) extendedData
WHERE extendedData.RowNum > #CurrentRowNum
ORDER BY extendedData.RowNum
--Exit loop if no more rows
IF ##ROWCOUNT = 0 BREAK;
--call your sproc
--EXEC dbo.YOURSPROC #MyVariable
END

I had some production code that could only handle 20 employees at a time, below is the framework for the code. I just copied the production code and removed stuff below.
ALTER procedure GetEmployees
#ClientId varchar(50)
as
begin
declare #EEList table (employeeId varchar(50));
declare #EE20 table (employeeId varchar(50));
insert into #EEList select employeeId from Employee where (ClientId = #ClientId);
-- Do 20 at a time
while (select count(*) from #EEList) > 0
BEGIN
insert into #EE20 select top 20 employeeId from #EEList;
-- Call sp here
delete #EEList where employeeId in (select employeeId from #EE20)
delete #EE20;
END;
RETURN
end

I had a situation where I needed to perform a series of operations on a result set (table). The operations are all set operations, so its not an issue, but...
I needed to do this in multiple places. So putting the relevant pieces in a table type, then populating a table variable w/ each result set allows me to call the sp and repeat the operations each time i need to .
While this does not address the exact question he asks, it does address how to perform an operation on all rows of a table without using a cursor.
#Johannes offers no insight into his motivation , so this may or may not help him.
my research led me to this well written article which served as a basis for my solution
https://codingsight.com/passing-data-table-as-parameter-to-stored-procedures/
Here is the setup
drop type if exists cpRootMapType
go
create type cpRootMapType as Table(
RootId1 int
, RootId2 int
)
go
drop procedure if exists spMapRoot2toRoot1
go
create procedure spMapRoot2toRoot1
(
#map cpRootMapType Readonly
)
as
update linkTable set root = root1
from linktable lt
join #map m on lt.root = root2
update comments set root = root1
from comments c
join #map m on c.root = root2
-- ever growing list of places this map would need to be applied....
-- now consolidated into one place
here is the implementation
... populate #matches
declare #map cpRootMapType
insert #map select rootid1, rootid2 from #matches
exec spMapRoot2toRoot1 #map

I like to do something similar to this (though it is still very similar to using a cursor)
[code]
-- Table variable to hold list of things that need looping
DECLARE #holdStuff TABLE (
id INT IDENTITY(1,1) ,
isIterated BIT DEFAULT 0 ,
someInt INT ,
someBool BIT ,
otherStuff VARCHAR(200)
)
-- Populate your #holdStuff with... stuff
INSERT INTO #holdStuff (
someInt ,
someBool ,
otherStuff
)
SELECT
1 , -- someInt - int
1 , -- someBool - bit
'I like turtles' -- otherStuff - varchar(200)
UNION ALL
SELECT
42 , -- someInt - int
0 , -- someBool - bit
'something profound' -- otherStuff - varchar(200)
-- Loop tracking variables
DECLARE #tableCount INT
SET #tableCount = (SELECT COUNT(1) FROM [#holdStuff])
DECLARE #loopCount INT
SET #loopCount = 1
-- While loop variables
DECLARE #id INT
DECLARE #someInt INT
DECLARE #someBool BIT
DECLARE #otherStuff VARCHAR(200)
-- Loop through item in #holdStuff
WHILE (#loopCount <= #tableCount)
BEGIN
-- Increment the loopCount variable
SET #loopCount = #loopCount + 1
-- Grab the top unprocessed record
SELECT TOP 1
#id = id ,
#someInt = someInt ,
#someBool = someBool ,
#otherStuff = otherStuff
FROM #holdStuff
WHERE isIterated = 0
-- Update the grabbed record to be iterated
UPDATE #holdAccounts
SET isIterated = 1
WHERE id = #id
-- Execute your stored procedure
EXEC someRandomSp #someInt, #someBool, #otherStuff
END
[/code]
Note that you don't need the identity or the isIterated column on your temp/variable table, i just prefer to do it this way so i don't have to delete the top record from the collection as i iterate through the loop.

Sql Optimization on advertising system

I am currently developing on an advertising system, which have been running just fine for a while now, apart from recently when our views per day have shot up from about 7k to 328k. Our server cannot take the pressure on this anymore - and knowing that I am not the best SQL guy around (hey, I can make it work, but not always in the best way) I am asking here for some optimization guidelines. I hope that some of you will be able to give rough ideas on how to improve this - I don't specifically need code, just to see the light :).
As it is at the moment, when an advert is supposed to be shown a PHP script is called, which in return calls a stored procedure. This stored procedure does several checks, it tests up against our customer database to see if the person showing the advert (given by a primary key id) is an actual customer under the given locale (our system is running on several languages which are all run as separate sites). Next up is all the advert details fetched out (image location as an url, height and width of the advert) - and lest step calls a separate stored procedure to test if the advert is allowed to be shown (is the campaign expired by either date or number of adverts allowed to show?) and if the customer has access to it (we got 2 access systems running, a blacklist and a whitelist one) and lastly what type of campaign we're running, is the view unique and so forth.
The code consists of a couple of stored procedures that I will post in here.
--- procedure called from PHP
CREATE PROCEDURE [dbo].[ExecView]
(
#publisherId bigint,
#advertId bigint,
#localeId int,
#ip varchar(15),
#ipIsUnique bit,
#success bit OUTPUT,
#campaignId bigint OUTPUT,
#advert varchar(500) OUTPUT,
#advertWidth int OUTPUT,
#advertHeight int OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE #unique bit
DECLARE #approved bit
DECLARE #publisherEarning money
DECLARE #advertiserCost money
DECLARE #originalStatus smallint
DECLARE #advertUrl varchar(500)
DECLARE #return int
SELECT #success = 1, #advert = NULL, #advertHeight = NULL, #advertWidth = NULL
--- Must be valid publisher, ie exist and actually be a publisher
IF dbo.IsValidPublisher(#publisherId, #localeId) = 0
BEGIN
SELECT #success = 0
RETURN 0
END
--- Must be a valid advert
EXEC #return = FetchAdvertDetails #advertId, #localeId, #advert OUTPUT, #advertUrl OUTPUT, #advertWidth OUTPUT, #advertHeight OUTPUT
IF #return = 0
BEGIN
SELECT #success = 0
RETURN 0
END
EXEC CanAddStatToAdvert 2, #advertId, #publisherId, #ip, #ipIsUnique, #success OUTPUT, #unique OUTPUT, #approved OUTPUT, #publisherEarning OUTPUT, #advertiserCost OUTPUT, #originalStatus OUTPUT, #campaignId OUTPUT
IF #success = 1
BEGIN
INSERT INTO dbo.Stat (AdvertId, [Date], Ip, [Type], PublisherEarning, AdvertiserCost, [Unique], Approved, PublisherCustomerId, OriginalStatus)
VALUES (#advertId, GETDATE(), #ip, 2, #publisherEarning, #advertiserCost, #unique, #approved, #publisherId, #originalStatus)
END
END
--- IsValidPublisher
CREATE FUNCTION [dbo].[IsValidPublisher]
(
#publisherId bigint,
#localeId int
)
RETURNS bit
AS
BEGIN
DECLARE #customerType smallint
DECLARE #result bit
SET #customerType = (SELECT [Type] FROM dbo.Customer
WHERE CustomerId = #publisherId AND Deleted = 0 AND IsApproved = 1 AND IsBlocked = 0 AND LocaleId = #localeId)
IF #customerType = 2
SET #result = 1
ELSE
SET #result = 0
RETURN #result
END
-- Fetch advert details
CREATE PROCEDURE [dbo].[FetchAdvertDetails]
(
#advertId bigint,
#localeId int,
#advert varchar(500) OUTPUT,
#advertUrl varchar(500) OUTPUT,
#advertWidth int OUTPUT,
#advertHeight int OUTPUT
)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SELECT #advert = T1.Advert, #advertUrl = T1.TargetUrl, #advertWidth = T1.Width, #advertHeight = T1.Height FROM Advert as T1
INNER JOIN Campaign AS T2 ON T1.CampaignId = T2.Id
WHERE T1.Id = #advertId AND T2.LocaleId = #localeId AND T2.Deleted = 0 AND T2.[Status] <> 1
IF #advert IS NULL
RETURN 0
ELSE
RETURN 1
END
--- CanAddStatToAdvert
CREATE PROCEDURE [dbo].[CanAddStatToAdvert]
#type smallint, --- Type of stat to add
#advertId bigint,
#publisherId bigint,
#ip varchar(15),
#ipIsUnique bit,
#success bit OUTPUT,
#unique bit OUTPUT,
#approved bit OUTPUT,
#publisherEarning money OUTPUT,
#advertiserCost money OUTPUT,
#originalStatus smallint OUTPUT,
#campaignId bigint OUTPUT
AS
BEGIN
SET NOCOUNT ON;
DECLARE #campaignLimit int
DECLARE #campaignStatus smallint
DECLARE #advertsLeft int
DECLARE #campaignType smallint
DECLARE #campaignModeration smallint
DECLARE #count int
SELECT #originalStatus = 0
SELECT #success = 1
SELECT #approved = 1
SELECT #unique = 1
SELECT #campaignId = CampaignId FROM dbo.Advert
WHERE Id = #advertId
IF #campaignId IS NULL
BEGIN
SELECT #success = 0
RETURN
END
SELECT #campaignLimit = Limit, #campaignStatus = [Status], #campaignType = [Type], #publisherEarning = PublisherEarning, #advertiserCost = AdvertiserCost, #campaignModeration = ModerationType FROM dbo.Campaign
WHERE Id = #campaignId
IF (#type <> 0 AND #type <> 2 AND #type <> #campaignType) OR ((#campaignType = 0 OR #campaignType = 2) AND (#type = 1)) -- if not a click or view type, then type must match the campaign (ie, only able to do leads on lead campaigns, no isales or etc), click and view campaigns however can do leads too
BEGIN
SELECT #success = 0
RETURN
END
-- Take advantage of the fact that the variable only gets touched if there is a record,
-- which is supposed to override the existing one, if there is one
SELECT #publisherEarning = Earning FROM dbo.MapCampaignPublisherEarning
WHERE CanpaignId = #campaignId AND PublisherId = #publisherId
IF #campaignStatus = 1
BEGIN
SELECT #success = 0
RETURN
END
IF NOT #campaignLimit IS NULL
BEGIN
SELECT #advertsLeft = AdvertsLeft FROM dbo.Campaign WHERE Id = #campaignId
IF #advertsLeft < 1
BEGIN
SELECT #success = 0
RETURN
END
END
IF #campaignModeration = 0 -- blacklist
BEGIN
SELECT #count = COUNT([Status]) FROM dbo.MapCampaignModeration WHERE CampaignId = #campaignId AND PublisherId = #publisherId AND [Status] = 3
IF #count > 0
BEGIN
SELECT #success = 0
RETURN
END
END
ELSE -- whitelist
BEGIN
SELECT #count = COUNT([Status]) FROM dbo.MapCampaignModeration WHERE CampaignId = #campaignId AND PublisherId = #publisherId AND [Status] = 2
IF #count < 1
BEGIN
SELECT #success = 0
RETURN
END
END
IF #ipIsUnique = 1
BEGIN
SELECT #unique = 1
END
ELSE
BEGIN
IF (SELECT COUNT(T1.Id) FROM dbo.Stat AS T1
INNER JOIN dbo.IQ_Advert AS T2
ON T1.AdvertId = T2.Id
WHERE T2.CampaignId = #campaignId
AND T1.[Type] = #type
AND T1.[Unique] = 1
AND T1.PublisherCustomerId = #publisherId
AND T1.Ip = #ip
AND DATEADD(SECOND, 86400, T1.[Date]) > GETDATE()
) = 0
SELECT #unique = 1
ELSE
BEGIN
SELECT #unique = 0, #originalStatus = 1 -- not unique, and set status to be ip conflict
END
END
IF #unique = 0 AND #type <> 0 AND #type <> 2
BEGIN
SELECT #unique = 1, #approved = 0
END
IF #originalStatus = 0
SELECT #originalStatus = 5
IF #approved = 0 OR #type <> #campaignType
BEGIN
SELECT #publisherEarning = 0, #advertiserCost = 0
END
END
I am thinking this needs more than just a couple of indexes thrown in to help it, but rather a total rethinking of how to handle it. I have been heard that running this as a batch would help, but I am not sure how to get this implemented, and really not sure if i can implement it in a such way where I keep all these nice checks before the actual insert or if I have to give up on some of this.
Anyhow, all help would be appreciated, if you need any of the table layouts, let me know :).
Thanks for taking the time to look at it :)

Make sure to reference tables with the ownership prefix. So instead of:
INNER JOIN Campaign AS T2 ON T1.CampaignId = T2.Id
Use
INNER JOIN dbo.Campaign AS T2 ON T1.CampaignId = T2.Id
That will allow the database to cache the execution plan.
Another possibility is to disable database locking, which has data integrity risks, but can significantly increase performance:
INNER JOIN dbo.Campaign AS T2 (nolock) ON T1.CampaignId = T2.Id
Run a sample query in SQL Analyzer with "Show Execution Plan" turned on. This might give you a hint as to the slowest part of the query.

it seems like FetchAdvertDetails hit the same tables as the start of CanAddStatToAdvert (Advert and Campaign). If possible, I'd try to eliminate FetchAdvertDetails and roll its logic into CanAddStatToAdvert, so you don't have the hit Advert and Campaign the extra times.

Get rid of most of the SQL.
This stored procedure does several
checks, it tests up against our
customer database to see if the person
showing the advert (given by a primary
key id) is an actual customer under
the given locale (our system is
running on several languages which are
all run as separate sites). Next up is
all the advert details fetched out
(image location as an url, height and
width of the advert) - and lest step
calls a separate stored procedure to
test if the advert is allowed to be
shown (is the campaign expired by
either date or number of adverts
allowed to show?) and if the customer
has access to it (we got 2 access
systems running, a blacklist and a
whitelist one) and lastly what type of
campaign we're running, is the view
unique and so forth.
Most of this should not be done in the database for every request. In particular:
Customer and local can be stored in memory. Expire them after 5 minutes or so, but do not ask for this info on every repetitive request.
Advert details can also be stored. Every advert will have a "key" to identify it (Number)?. Ust a dictionary / hashtable in memory.
Eliminate as many SQL Parts as you can. Dumping repetitive work on the SQL Database is a typical mistake.

SQL Call Stored Procedure for each Row without using a cursor

How can one call a stored procedure for each row in a table, where the columns of a row are input parameters to the sp without using a Cursor?

Generally speaking I always look for a set based approach (sometimes at the expense of changing the schema).
However, this snippet does have its place..
-- Declare & init (2008 syntax)
DECLARE #CustomerID INT = 0
-- Iterate over all customers
WHILE (1 = 1)
BEGIN
-- Get next customerId
SELECT TOP 1 #CustomerID = CustomerID
FROM Sales.Customer
WHERE CustomerID > #CustomerId
ORDER BY CustomerID
-- Exit loop if no more customers
IF ##ROWCOUNT = 0 BREAK;
-- call your sproc
EXEC dbo.YOURSPROC #CustomerId
END

You could do something like this: order your table by e.g. CustomerID (using the AdventureWorks Sales.Customer sample table), and iterate over those customers using a WHILE loop:
-- define the last customer ID handled
DECLARE #LastCustomerID INT
SET #LastCustomerID = 0
-- define the customer ID to be handled now
DECLARE #CustomerIDToHandle INT
-- select the next customer to handle
SELECT TOP 1 #CustomerIDToHandle = CustomerID
FROM Sales.Customer
WHERE CustomerID > #LastCustomerID
ORDER BY CustomerID
-- as long as we have customers......
WHILE #CustomerIDToHandle IS NOT NULL
BEGIN
-- call your sproc
-- set the last customer handled to the one we just handled
SET #LastCustomerID = #CustomerIDToHandle
SET #CustomerIDToHandle = NULL
-- select the next customer to handle
SELECT TOP 1 #CustomerIDToHandle = CustomerID
FROM Sales.Customer
WHERE CustomerID > #LastCustomerID
ORDER BY CustomerID
END
That should work with any table as long as you can define some kind of an ORDER BY on some column.

DECLARE #SQL varchar(max)=''
-- MyTable has fields fld1 & fld2
Select #SQL = #SQL + 'exec myproc ' + convert(varchar(10),fld1) + ','
+ convert(varchar(10),fld2) + ';'
From MyTable
EXEC (#SQL)
Ok, so I would never put such code into production, but it does satisfy your requirements.

I'd use the accepted answer, but another possibility is to use a table variable to hold a numbered set of values (in this case just the ID field of a table) and loop through those by Row Number with a JOIN to the table to retrieve whatever you need for the action within the loop.
DECLARE #RowCnt int; SET #RowCnt = 0 -- Loop Counter
-- Use a table variable to hold numbered rows containg MyTable's ID values
DECLARE #tblLoop TABLE (RowNum int IDENTITY (1, 1) Primary key NOT NULL,
ID INT )
INSERT INTO #tblLoop (ID) SELECT ID FROM MyTable
-- Vars to use within the loop
DECLARE #Code NVarChar(10); DECLARE #Name NVarChar(100);
WHILE #RowCnt < (SELECT COUNT(RowNum) FROM #tblLoop)
BEGIN
SET #RowCnt = #RowCnt + 1
-- Do what you want here with the data stored in tblLoop for the given RowNum
SELECT #Code=Code, #Name=LongName
FROM MyTable INNER JOIN #tblLoop tL on MyTable.ID=tL.ID
WHERE tl.RowNum=#RowCnt
PRINT Convert(NVarChar(10),#RowCnt) +' '+ #Code +' '+ #Name
END

Marc's answer is good (I'd comment on it if I could work out how to!)
Just thought I'd point out that it may be better to change the loop so the SELECT only exists once (in a real case where I needed to do this, the SELECT was quite complex, and writing it twice was a risky maintenance issue).
-- define the last customer ID handled
DECLARE #LastCustomerID INT
SET #LastCustomerID = 0
-- define the customer ID to be handled now
DECLARE #CustomerIDToHandle INT
SET #CustomerIDToHandle = 1
-- as long as we have customers......
WHILE #LastCustomerID <> #CustomerIDToHandle
BEGIN
SET #LastCustomerId = #CustomerIDToHandle
-- select the next customer to handle
SELECT TOP 1 #CustomerIDToHandle = CustomerID
FROM Sales.Customer
WHERE CustomerID > #LastCustomerId
ORDER BY CustomerID
IF #CustomerIDToHandle <> #LastCustomerID
BEGIN
-- call your sproc
END
END

If you can turn the stored procedure into a function that returns a table, then you can use cross-apply.
For example, say you have a table of customers, and you want to compute the sum of their orders, you would create a function that took a CustomerID and returned the sum.
And you could do this:
SELECT CustomerID, CustomerSum.Total
FROM Customers
CROSS APPLY ufn_ComputeCustomerTotal(Customers.CustomerID) AS CustomerSum
Where the function would look like:
CREATE FUNCTION ComputeCustomerTotal
(
#CustomerID INT
)
RETURNS TABLE
AS
RETURN
(
SELECT SUM(CustomerOrder.Amount) AS Total FROM CustomerOrder WHERE CustomerID = #CustomerID
)
Obviously, the example above could be done without a user defined function in a single query.
The drawback is that functions are very limited - many of the features of a stored procedure are not available in a user-defined function, and converting a stored procedure to a function does not always work.

For SQL Server 2005 onwards, you can do this with CROSS APPLY and a table-valued function.
Using CROSS APPLY in SQL Server 2005
Just for clarity, I'm referring to those cases where the stored procedure can be converted into a table valued function.

This is a variation on the answers already provided, but should be better performing because it doesn't require ORDER BY, COUNT or MIN/MAX. The only disadvantage with this approach is that you have to create a temp table to hold all the Ids (the assumption is that you have gaps in your list of CustomerIDs).
That said, I agree with #Mark Powell though that, generally speaking, a set based approach should still be better.
DECLARE #tmp table (Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL, CustomerID INT NOT NULL)
DECLARE #CustomerId INT
DECLARE #Id INT = 0
INSERT INTO #tmp SELECT CustomerId FROM Sales.Customer
WHILE (1=1)
BEGIN
SELECT #CustomerId = CustomerId, #Id = Id
FROM #tmp
WHERE Id = #Id + 1
IF ##rowcount = 0 BREAK;
-- call your sproc
EXEC dbo.YOURSPROC #CustomerId;
END

This is a variation of n3rds solution above. No sorting by using ORDER BY is needed, as MIN() is used.
Remember that CustomerID (or whatever other numerical column you use for progress) must have a unique constraint. Furthermore, to make it as fast as possible CustomerID must be indexed on.
-- Declare & init
DECLARE #CustomerID INT = (SELECT MIN(CustomerID) FROM Sales.Customer); -- First ID
DECLARE #Data1 VARCHAR(200);
DECLARE #Data2 VARCHAR(200);
-- Iterate over all customers
WHILE #CustomerID IS NOT NULL
BEGIN
-- Get data based on ID
SELECT #Data1 = Data1, #Data2 = Data2
FROM Sales.Customer
WHERE [ID] = #CustomerID ;
-- call your sproc
EXEC dbo.YOURSPROC #Data1, #Data2
-- Get next customerId
SELECT #CustomerID = MIN(CustomerID)
FROM Sales.Customer
WHERE CustomerID > #CustomerId
END
I use this approach on some varchars I need to look over, by putting them in a temporary table first, to give them an ID.

If you don't what to use a cursor I think you'll have to do it externally (get the table, and then run for each statement and each time call the sp)
it Is the same as using a cursor, but only outside SQL.
Why won't you use a cursor ?

I usually do it this way when it's a quite a few rows:
Select all sproc parameters in a dataset with SQL Management Studio
Right-click -> Copy
Paste in to excel
Create single-row sql statements with a formula like '="EXEC schema.mysproc #param=" & A2' in a new excel column. (Where A2 is your excel column containing the parameter)
Copy the list of excel statements into a new query in SQL Management Studio and execute.
Done.
(On larger datasets i'd use one of the solutions mentioned above though).

DELIMITER //
CREATE PROCEDURE setFakeUsers (OUT output VARCHAR(100))
BEGIN
-- define the last customer ID handled
DECLARE LastGameID INT;
DECLARE CurrentGameID INT;
DECLARE userID INT;
SET #LastGameID = 0;
-- define the customer ID to be handled now
SET #userID = 0;
-- select the next game to handle
SELECT #CurrentGameID = id
FROM online_games
WHERE id > LastGameID
ORDER BY id LIMIT 0,1;
-- as long as we have customers......
WHILE (#CurrentGameID IS NOT NULL)
DO
-- call your sproc
-- set the last customer handled to the one we just handled
SET #LastGameID = #CurrentGameID;
SET #CurrentGameID = NULL;
-- select the random bot
SELECT #userID = userID
FROM users
WHERE FIND_IN_SET('bot',baseInfo)
ORDER BY RAND() LIMIT 0,1;
-- update the game
UPDATE online_games SET userID = #userID WHERE id = #CurrentGameID;
-- select the next game to handle
SELECT #CurrentGameID = id
FROM online_games
WHERE id > LastGameID
ORDER BY id LIMIT 0,1;
END WHILE;
SET output = "done";
END;//
CALL setFakeUsers(#status);
SELECT #status;

A better solution for this is to
Copy/past code of Stored Procedure
Join that code with the table for which you want to run it again (for each row)
This was you get a clean table-formatted output. While if you run SP for every row, you get a separate query result for each iteration which is ugly.

In case the order is important
--declare counter
DECLARE #CurrentRowNum BIGINT = 0;
--Iterate over all rows in [DataTable]
WHILE (1 = 1)
BEGIN
--Get next row by number of row
SELECT TOP 1 #CurrentRowNum = extendedData.RowNum
--here also you can store another values
--for following usage
--#MyVariable = extendedData.Value
FROM (
SELECT
data.*
,ROW_NUMBER() OVER(ORDER BY (SELECT 0)) RowNum
FROM [DataTable] data
) extendedData
WHERE extendedData.RowNum > #CurrentRowNum
ORDER BY extendedData.RowNum
--Exit loop if no more rows
IF ##ROWCOUNT = 0 BREAK;
--call your sproc
--EXEC dbo.YOURSPROC #MyVariable
END

I had some production code that could only handle 20 employees at a time, below is the framework for the code. I just copied the production code and removed stuff below.
ALTER procedure GetEmployees
#ClientId varchar(50)
as
begin
declare #EEList table (employeeId varchar(50));
declare #EE20 table (employeeId varchar(50));
insert into #EEList select employeeId from Employee where (ClientId = #ClientId);
-- Do 20 at a time
while (select count(*) from #EEList) > 0
BEGIN
insert into #EE20 select top 20 employeeId from #EEList;
-- Call sp here
delete #EEList where employeeId in (select employeeId from #EE20)
delete #EE20;
END;
RETURN
end

I had a situation where I needed to perform a series of operations on a result set (table). The operations are all set operations, so its not an issue, but...
I needed to do this in multiple places. So putting the relevant pieces in a table type, then populating a table variable w/ each result set allows me to call the sp and repeat the operations each time i need to .
While this does not address the exact question he asks, it does address how to perform an operation on all rows of a table without using a cursor.
#Johannes offers no insight into his motivation , so this may or may not help him.
my research led me to this well written article which served as a basis for my solution
https://codingsight.com/passing-data-table-as-parameter-to-stored-procedures/
Here is the setup
drop type if exists cpRootMapType
go
create type cpRootMapType as Table(
RootId1 int
, RootId2 int
)
go
drop procedure if exists spMapRoot2toRoot1
go
create procedure spMapRoot2toRoot1
(
#map cpRootMapType Readonly
)
as
update linkTable set root = root1
from linktable lt
join #map m on lt.root = root2
update comments set root = root1
from comments c
join #map m on c.root = root2
-- ever growing list of places this map would need to be applied....
-- now consolidated into one place
here is the implementation
... populate #matches
declare #map cpRootMapType
insert #map select rootid1, rootid2 from #matches
exec spMapRoot2toRoot1 #map

I like to do something similar to this (though it is still very similar to using a cursor)
[code]
-- Table variable to hold list of things that need looping
DECLARE #holdStuff TABLE (
id INT IDENTITY(1,1) ,
isIterated BIT DEFAULT 0 ,
someInt INT ,
someBool BIT ,
otherStuff VARCHAR(200)
)
-- Populate your #holdStuff with... stuff
INSERT INTO #holdStuff (
someInt ,
someBool ,
otherStuff
)
SELECT
1 , -- someInt - int
1 , -- someBool - bit
'I like turtles' -- otherStuff - varchar(200)
UNION ALL
SELECT
42 , -- someInt - int
0 , -- someBool - bit
'something profound' -- otherStuff - varchar(200)
-- Loop tracking variables
DECLARE #tableCount INT
SET #tableCount = (SELECT COUNT(1) FROM [#holdStuff])
DECLARE #loopCount INT
SET #loopCount = 1
-- While loop variables
DECLARE #id INT
DECLARE #someInt INT
DECLARE #someBool BIT
DECLARE #otherStuff VARCHAR(200)
-- Loop through item in #holdStuff
WHILE (#loopCount <= #tableCount)
BEGIN
-- Increment the loopCount variable
SET #loopCount = #loopCount + 1
-- Grab the top unprocessed record
SELECT TOP 1
#id = id ,
#someInt = someInt ,
#someBool = someBool ,
#otherStuff = otherStuff
FROM #holdStuff
WHERE isIterated = 0
-- Update the grabbed record to be iterated
UPDATE #holdAccounts
SET isIterated = 1
WHERE id = #id
-- Execute your stored procedure
EXEC someRandomSp #someInt, #someBool, #otherStuff
END
[/code]
Note that you don't need the identity or the isIterated column on your temp/variable table, i just prefer to do it this way so i don't have to delete the top record from the collection as i iterate through the loop.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Deduplication of imported records in SQL server - sql

Related

SQL Grouping with condition

Loop and update without cursor

Using a temp table with a stored procedure to cycle through IDs [duplicate]

Sql Optimization on advertising system

SQL Call Stored Procedure for each Row without using a cursor

Categories

Resources