Iterate through rows in SQL Server 2008

Consider the table SAMPLE:
id integer
name nvarchar(10)
There is a stored proc called myproc. It takes only one parameter (which is id).
Given a name as parameter, find all rows with name = @nameparameter and pass all those ids
to myproc.
eg:
sample->
1 mark
2 mark
3 stu
41 mark
When mark is passed, 1, 2 and 41 are to be passed to myproc individually.
i.e. the following should happen:
execute myproc 1
execute myproc 2
execute myproc 41
I can't touch myproc nor see its content. I just have to pass the values to it.

If you must iterate(*), use the construct designed to do it - the cursor. Much maligned, but if it most clearly expresses your intentions, I say use it:
DECLARE @ID int
DECLARE IDs CURSOR LOCAL FOR select ID from SAMPLE where Name = @NameParameter
OPEN IDs
FETCH NEXT FROM IDs into @ID
WHILE @@FETCH_STATUS = 0
BEGIN
    exec myproc @ID
    FETCH NEXT FROM IDs into @ID
END
CLOSE IDs
DEALLOCATE IDs
(*) This answer has received a few upvotes recently, but I feel I ought to incorporate my original comment here also, and add some general advice:
In SQL, you should generally seek a set-based solution. The entire language is oriented around set-based solutions, and (in turn) the optimizer is oriented around making set-based solutions work well. In further turn, the tools we have available for tuning the optimizer are also set-oriented - e.g. applying indexes to tables.
There are a few situations where iteration is the best approach, but they are few and far between, and may be likened to Jackson's rules on optimization - don't do it - and (for experts only) don't do it yet.
You're far better served to first formulate what you want in terms of the set of all rows to be affected - what is the overall change to be achieved? - and then try to formulate a query that encapsulates that goal. Only if the resulting query does not perform adequately (or some other component is unable to do anything other than deal with each row individually) should you consider iteration.
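For this particular question a set-based rewrite isn't possible, because myproc is a black box that takes one id at a time. Purely to illustrate the advice above, though, if the per-row logic were available inline, a hypothetical set-based formulation (table and column names invented for the sketch) would collapse the whole loop into one statement:
-- Hypothetical sketch only: assumes the work myproc does could be written inline
-- as a single statement over the whole set of matching rows.
UPDATE t
SET    t.Processed = 1
FROM   dbo.SomeTargetTable AS t
INNER JOIN dbo.SAMPLE AS s
        ON s.id = t.SampleId
WHERE  s.name = @NameParameter;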

I declare a table variable @sample and insert all rows that have name = 'rahul', along with a Status column used to mark which rows have already been processed. Then, using a while loop, I iterate through all the rows of @sample, which holds all the ids for name = 'rahul'.
use dumme

Declare @Name nvarchar(50)
set @Name = 'Rahul'

DECLARE @sample table (
    ID int,
    Status varchar(500)
)

insert into @sample (ID, Status)
select ID, 0 from sample where name = @Name

Declare @ID int
while ((select count(ID) from @sample where Status = 0) > 0)
begin
    -- pick the next unprocessed id, pass it to the proc, then mark it as done
    select top 1 @ID = ID from @sample where Status = 0 order by ID
    exec myproc @ID
    update @sample set Status = 1 where ID = @ID
end

-- Builds a comma-separated list of the matching ids
Declare @retStr varchar(100)

select @retStr = COALESCE(@retStr, '') + cast(sample.ID as varchar(12)) + ', '
from sample
WHERE sample.Name = @nameparameter

select @retStr = ltrim(rtrim(substring(@retStr, 1, len(@retStr) - 1)))

Return ISNULL(@retStr, '')


Create a stored procedure that keeps returning new rows

I have a table with x number of rows. I want to create a stored procedure that always selects a new row and returns it (when all rows have been returned, it starts over from the first row). My idea is to select the top 1 row (ordered by a datetime column), return it from the stored procedure, and then set a datetime column so that next time a new row is returned. It needs to be thread safe, so I would expect some row locking is needed (I don't know if this is true). How would you create a stored procedure like that? I am not sure if you need to use variables or if it can be done in a single query. Something like:
select top 1 *
from [dbo].[AppRegistrations]
order by LastUsed
update [dbo].[AppRegistrations]
set LastUsed = getdate()
In the comments it is stated that it cannot be done in a single query. If I add the following to a stored procedure, will it then be thread safe? Or do I need to add a lock? And does the query make sense, or should it be done differently?
declare @id int
declare @name as nvarchar(256)
select top 1 @id = id, @name = name from [dbo].[AppRegistrations] order by LastUsed
Update [dbo].[AppRegistrations] set LastUsed = getdate() where id = @id
select @id, @name
It is important that another caller cannot slip in between the select and the update and get the same row; each call must return a unique row. That is why I wanted it in a single query.
I tried to gather everything up and added a row lock. The following sample works as expected, but I don't know whether the row lock is the right approach or whether I should expect problems. Can someone validate whether this approach is correct?
BEGIN TRAN
declare @id int
declare @name as nvarchar(256)
select top 1 @id = id, @name = name from [dbo].[AppRegistrations] WITH (HOLDLOCK, ROWLOCK) order by LastUsed
Update [dbo].[AppRegistrations] set LastUsed = getdate() where id = @id
select @id as id, @name as name
COMMIT TRAN
I make a good number of assumptions here, but a single UPDATE with an OUTPUT clause does the select-and-update in one atomic statement:
UPDATE [dbo].[AppRegistrations]
SET LastSelected = CURRENT_TIMESTAMP
OUTPUT INSERTED.*
WHERE Id = (SELECT TOP (1) Id
FROM [dbo].[AppRegistrations]
ORDER BY LastSelected
)
Here is some background on the OUTPUT clause: https://learn.microsoft.com/en-us/sql/t-sql/queries/output-clause-transact-sql?view=sql-server-ver15
Here is another reference where you can do slightly more complex things: https://learn.microsoft.com/en-us/sql/t-sql/queries/update-transact-sql?view=sql-server-ver15#CaptureResults
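As a usage sketch (hypothetical procedure name, and using the question's LastUsed column rather than the LastSelected column assumed above), the statement could be wrapped in a procedure like this:
-- Hypothetical wrapper; the single UPDATE ... OUTPUT statement claims and returns
-- the row in one operation, so no explicit BEGIN TRAN is needed for this simple case.
CREATE PROCEDURE dbo.GetNextAppRegistration
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE [dbo].[AppRegistrations]
    SET LastUsed = CURRENT_TIMESTAMP
    OUTPUT INSERTED.id, INSERTED.name
    WHERE id = (SELECT TOP (1) id
                FROM [dbo].[AppRegistrations]
                ORDER BY LastUsed);
END;
Whether extra hints such as READPAST or UPDLOCK are worth adding under heavy concurrency is something to test against your workload rather than assume.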

IN-clause with optional parameter SQL

I have a stored procedure that returns a set of data based on 2 input parameters. One of the parameters is optional, so I am using:
WHERE
    (tbl_Process.ProjectID = @ProjectID)
    AND (tbl_AnalysisLookup.AnalysisCodeID = 7)
    AND (tbl_ProcessSubStep.ProcessID = ISNULL(@ProcessID, tbl_ProcessSubStep.ProcessID))
@ProcessID is an optional parameter, so the user may or may not provide it.
Now I need to change my stored procedure to accommodate multiple ProcessIDs, i.e. the user can now select a list of ProcessIDs, a single ProcessID, or no ProcessID at all, and the stored proc should handle all these scenarios. What is the best way to achieve this without using dynamic queries unless absolutely required?
In a nutshell, I wanted my stored proc to handle optional parameters with multiple values (a WHERE ... IN clause). The solution, and a link to the webpage I got it from, is provided below. It's a very good article and will help you choose the right solution based on your requirements.
I have finally figured out how to achieve this. There are a couple of ways to do it; what I am using now is a function that splits a string of ProcessIDs on a delimiter and inserts them into a table, which is then used in my stored proc. Here is the code and the link to the webpage:
http://www.codeproject.com/Articles/58780/Techniques-for-In-Clause-and-SQL-Server
CREATE FUNCTION [dbo].[ufnDelimitedBigIntToTable]
(
    @List varchar(max), @Delimiter varchar(10)
)
RETURNS @Ids TABLE
    (Id bigint) AS
BEGIN
    DECLARE @list1 VARCHAR(MAX), @Pos INT

    SET @List = LTRIM(RTRIM(@List)) + @Delimiter
    SET @Pos = CHARINDEX(@Delimiter, @List, 1)

    WHILE @Pos > 0
    BEGIN
        SET @list1 = LTRIM(RTRIM(LEFT(@List, @Pos - 1)))
        IF @list1 <> ''
            INSERT INTO @Ids(Id) VALUES (CAST(@list1 AS bigint))
        SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))
        SET @Pos = CHARINDEX(@Delimiter, @List, 1)
    END

    RETURN
END
Once made, the table-function can be used in a query:
CREATE PROCEDURE [dbo].[GetUsingDelimitedFunctionTable]
    @Ids varchar(max)
AS
BEGIN
    SET NOCOUNT ON

    SELECT s.Id, s.SomeString
    FROM SomeString s (NOLOCK)
    WHERE EXISTS (SELECT *
                  FROM ufnDelimitedBigIntToTable(@Ids, ',') Ids
                  WHERE s.Id = Ids.Id)
END
The Link also provides more ways to achieve this.
Not the best, but one way is to convert both sides to varchar and use the LIKE operator to compare them. It doesn't need any huge modifications; just change the datatype of your parameter to varchar. Something like the code below:
@ProcessIDs LIKE '%[,]' + Convert(varchar(10), tbl_ProcessSubStep.ProcessID) + '[,]%'
Hope it helps.
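A usage sketch of that idea, assuming the caller wraps the list in leading and trailing delimiters (e.g. ',1,2,41,') and leaves the parameter NULL or empty when no filter is wanted:
-- Hypothetical usage of the LIKE-based list check.
DECLARE @ProcessIDs varchar(max);
SET @ProcessIDs = ',1,2,41,';   -- or NULL / '' for "no filter"

SELECT tbl_ProcessSubStep.*
FROM   tbl_ProcessSubStep
WHERE  ISNULL(@ProcessIDs, '') = ''
   OR  @ProcessIDs LIKE '%[,]' + CONVERT(varchar(10), tbl_ProcessSubStep.ProcessID) + '[,]%';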
You didn't specify your database product in your question, but I'm going to guess from the @Parameter naming style that you're using SQL Server.
Except for the unusual requirement of interpreting empty input to mean 'all', this is a restatement of the problem of arrays in SQL, explored thoroughly by Erland Sommarskog. Read all his articles on the subject for a good analysis of all the techniques you can use.
Here I'll explain how to use a table-valued parameter to solve your problem.
Execute the following scripts all together to set up the test environment in an idempotent way.
Creating a sample solution
First create a new empty test database StackOverFlow13556628:
USE master;
GO
IF DB_ID('StackOverFlow13556628') IS NOT NULL
BEGIN
ALTER DATABASE StackOverFlow13556628 SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE StackOverFlow13556628;
END;
GO
CREATE DATABASE StackOverFlow13556628;
GO
USE StackOverFlow13556628;
GO
Next, create a user-defined table type PrincipalList with one column, principal_id. This type holds the input values used to query the catalog view sys.database_principals.
CREATE TYPE PrincipalList AS TABLE (
principal_id INT NOT NULL PRIMARY KEY
);
GO
After that, create the stored procedure GetPrincipals which takes a PrincipalList table-valued parameter as input, and returns a result set from sys.database_principals.
CREATE PROCEDURE GetPrincipals (
    @principal_ids PrincipalList READONLY
)
AS
BEGIN
    IF EXISTS (SELECT * FROM @principal_ids)
    BEGIN
        SELECT *
        FROM sys.database_principals
        WHERE principal_id IN (
            SELECT principal_id
            FROM @principal_ids
        );
    END
    ELSE
    BEGIN
        SELECT *
        FROM sys.database_principals;
    END;
END;
GO
If the table-valued parameter contains rows, then the procedure returns all the rows in sys.database_principals that have a matching principal_id value. If the table-valued parameter is empty, it returns all the rows.
Testing the solution
You can query multiple principals like this:
DECLARE @principals PrincipalList;

INSERT INTO @principals (principal_id) VALUES (1);
INSERT INTO @principals (principal_id) VALUES (2);
INSERT INTO @principals (principal_id) VALUES (3);

EXECUTE GetPrincipals
    @principal_ids = @principals;
GO
Result:
principal_id name
1 dbo
2 guest
3 INFORMATION_SCHEMA
You can query a single principal like this:
DECLARE @principals PrincipalList;

INSERT INTO @principals (principal_id) VALUES (1);

EXECUTE GetPrincipals
    @principal_ids = @principals;
GO
Result:
principal_id name
1 dbo
You can query all principals like this:
EXECUTE GetPrincipals;
Result:
principal_id name
0 public
1 dbo
2 guest
3 INFORMATION_SCHEMA
4 sys
16384 db_owner
16385 db_accessadmin
16386 db_securityadmin
16387 db_ddladmin
16389 db_backupoperator
16390 db_datareader
16391 db_datawriter
16392 db_denydatareader
16393 db_denydatawriter
Remarks
This solution is inefficient because you always have to read from the table-valued parameter twice. In practice, unless your table-valued parameter has millions of rows, it will probably not be the major bottleneck.
Using an empty table-valued parameter in this way feels unintuitive. A more obvious design might simply be to have two stored procedures - one that returns all the rows, and one that returns only rows with matching ids. It would be up to the calling application to choose which one to call.
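A minimal sketch of that two-procedure alternative (hypothetical procedure names, reusing the PrincipalList type from above):
-- Hypothetical alternative design: the calling application picks which procedure to execute.
CREATE PROCEDURE GetAllPrincipals
AS
BEGIN
    SELECT * FROM sys.database_principals;
END;
GO

CREATE PROCEDURE GetPrincipalsById (
    @principal_ids PrincipalList READONLY
)
AS
BEGIN
    SELECT dp.*
    FROM sys.database_principals AS dp
    WHERE dp.principal_id IN (SELECT principal_id FROM @principal_ids);
END;
GO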

Execute SQL statements while looping a table

I want to create a table with a few records in it and then run a set of sql statements for every record in that table. I would use the data in the table to set values in the sql statement.
This should allow me to write the SQL just once and then run it for whatever data I put in the table.
But, I'm not sure how to go about doing this. Should I use a cursor to loop the table? Some other way?
Thanks for any help or advice you can give me.
A CURSOR has some overhead associated with it, but it can be a good way to walk through your table. Cursors are not a totally unnecessary evil and have their place.
With the limited information that WilliamB2 provided, it sounds like a CURSOR may be a good solution for this problem: walk through his data and generate the multiple downstream INSERTs.
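As a rough sketch of what that cursor might look like (the driving table, its columns, and the downstream INSERT are placeholders, since the question doesn't give a schema):
-- Hypothetical driving table and per-row statement; adapt the names to your schema.
DECLARE @col1 int, @col2 varchar(50);

DECLARE rowCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT col1, col2 FROM dbo.DrivingTable;

OPEN rowCursor;
FETCH NEXT FROM rowCursor INTO @col1, @col2;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Use the current row's values to parameterise whatever statement you need.
    INSERT INTO dbo.TargetTable (col1, col2) VALUES (@col1, @col2);

    FETCH NEXT FROM rowCursor INTO @col1, @col2;
END;

CLOSE rowCursor;
DEALLOCATE rowCursor;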
Yes, you can use a cursor. You can also use a while loop:
declare @table as table(col1 int, col2 varchar(20))
declare @col1 int
declare @col2 varchar(50)
declare @sql varchar(max)

insert into @table
SELECT col1, col2 FROM OriginalTable

while (exists(select top 1 'x' from @table)) -- as long as @table contains records, continue
begin
    select top 1 @col1 = col1, @col2 = col2 from @table

    -- build and run a statement using the current row's values (TargetTable is a placeholder)
    SET @sql = 'INSERT INTO TargetTable VALUES (' + cast(@col1 as varchar) + ')'
    exec (@sql)

    delete top (1) from @table -- remove the previously processed row; also ensures no infinite loop
end
I think a cursor has some overhead attached to it.
With this second approach you are not working on the original table.
Maybe you could use INSERT...SELECT instead of the loop:
INSERT INTO target_table
SELECT
some_col,
some_other_col,
'Some fixed value',
NULL,
42,
you_get_the_idea
FROM source_table
WHERE source_table.you_get_the_idea = 1
The columns on your SELECT should match the structure of the target table (you can omit an int/identity pk like id if you have one).
Whether this or the loop is the better option depends on how many tables you want to populate inside the loop. If it's just a few, I usually stick with INSERT...SELECT.

What is the most efficient way in T-SQL to compare answer strings to answer keys for scoring an exam

These exams typically have about 120 questions. Currently, the answer strings are compared to the keys and a value of 1 or 0 is assigned per question. When complete, the 1's are totaled for a raw score.
Are there any T-SQL functions like intersect or diff or something all together different that would handle this process as quickly as possible for 100,000 examinees?
Thanks in advance for your expertise.
-Steven
Try selecting, for each question, whether the answer equals the correct answer. I assume you have the students' tests in one table and the key in another; something like this ought to work:
select student_test.student_id,
       student_test.test_id,
       student_test.question_id,
       CASE WHEN student_test.answer = test_key.answer
              OR (student_test.answer IS NULL AND test_key.answer IS NULL)
            THEN 1 ELSE 0 END AS correct
from student_test
INNER JOIN test_key
        ON student_test.test_id = test_key.test_id
       AND student_test.question_id = test_key.question_id
WHERE student_test.test_id = <the test to grade>
You can group the results by student and test, then sum the last column if you want the DB to give you the total score. This will give a detailed "right/wrong" analysis of the test.
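For the total, a grouped variant of the same join (a sketch under the same assumed student_test / test_key layout) would be:
-- Sums one point per correct answer, per student and test.
SELECT st.student_id,
       st.test_id,
       SUM(CASE WHEN st.answer = tk.answer
                  OR (st.answer IS NULL AND tk.answer IS NULL)
                THEN 1 ELSE 0 END) AS raw_score
FROM student_test AS st
INNER JOIN test_key AS tk
        ON st.test_id = tk.test_id
       AND st.question_id = tk.question_id
GROUP BY st.student_id, st.test_id;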
EDIT: The answers being stored as one continuous string makes this much harder. You will most likely have to implement it procedurally with a cursor, meaning each student's answers are loaded, SUBSTRINGed into individual characters, and compared to the key in RBAR (row by agonizing row) fashion. You could also implement a scalar-valued function that compares string A to string B one character at a time and returns the number of differences, then call that function from a driving query once per student.
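A sketch of such a scalar function (hypothetical name; it counts matching positions rather than differences, since the raw score is what is ultimately wanted, and it assumes one character per question):
-- Hypothetical scoring function: compares the two strings position by position.
CREATE FUNCTION dbo.ufnCountMatches (@answers varchar(200), @key varchar(200))
RETURNS int
AS
BEGIN
    DECLARE @i int, @matches int
    SELECT @i = 1, @matches = 0

    WHILE @i <= LEN(@key)
    BEGIN
        IF SUBSTRING(@answers, @i, 1) = SUBSTRING(@key, @i, 1)
            SET @matches = @matches + 1
        SET @i = @i + 1
    END

    RETURN @matches
END
It could then be called from a driving query, e.g. SELECT student_id, dbo.ufnCountMatches(answers, @testkey) AS score FROM test_answers.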
Something like this might work out for you:
select student_id, studentname, answers, 0 as score
into #scores from test_answers

declare @studentid int
declare @i int
declare @answers varchar(120)
declare @testkey varchar(120)

select @testkey = test_key from test_keys where test_id = 1234

declare student_cursor cursor for
    select student_id from #scores

open student_cursor
fetch next from student_cursor into @studentid
while @@FETCH_STATUS = 0
begin
    select @i = 1
    select @answers = answers from #scores where student_id = @studentid

    while @i <= len(@answers)
    begin
        if substring(@answers, @i, 1) = substring(@testkey, @i, 1)
            update #scores set score = score + 1 where student_id = @studentid
        select @i = @i + 1
    end

    fetch next from student_cursor into @studentid
end

close student_cursor
deallocate student_cursor

select * from #scores
drop table #scores
I doubt that's the single most efficient way to do it, but it's not a bad starting point at least.

How to keep a rolling checksum in SQL?

I am trying to keep a rolling checksum to account for order, so take the previous 'checksum' and xor it with the current one and generate a new checksum.
Name Checksum Rolling Checksum
------ ----------- -----------------
foo 11829231 11829231
bar 27380135 checksum(27380135 ^ 11829231) = 93291803
baz 96326587 checksum(96326587 ^ 93291803) = 67361090
How would I accomplish something like this?
(Note that the calculations are completely made up and are for illustration only)
This is basically the running total problem.
Edit:
My original claim was that this is one of the few places where a cursor-based solution actually performs best. The problem with the triangular self-join solution is that it repeatedly ends up recalculating the same cumulative checksum as a subcalculation for the next step, so it is not very scalable: the work required grows quadratically with the number of rows.
Corina's answer uses the "quirky update" approach. I've adjusted it to do the checksum, and in my test it took 3 seconds rather than the 26 seconds of the cursor solution. Both produced the same results. Unfortunately, however, it relies on an undocumented aspect of UPDATE behaviour. I would definitely read the discussion here before deciding whether to rely on this in production code.
There is a third possibility described here (using the CLR), which I didn't have time to test. From the discussion there, it seems a good option for calculating running-total-type things at display time, but it is outperformed by the cursor when the result of the calculation must be saved back.
CREATE TABLE TestTable
(
PK int identity(1,1) primary key clustered,
[Name] varchar(50),
[CheckSum] AS CHECKSUM([Name]),
RollingCheckSum1 int NULL,
RollingCheckSum2 int NULL
)
/*Insert some random records (753,571 on my machine)*/
INSERT INTO TestTable ([Name])
SELECT newid() FROM sys.objects s1, sys.objects s2, sys.objects s3
Approach One: Based on the Jeff Moden Article
DECLARE @RCS int

UPDATE TestTable
SET @RCS = RollingCheckSum1 =
    CASE WHEN @RCS IS NULL THEN
        [CheckSum]
    ELSE
        CHECKSUM([CheckSum] ^ @RCS)
    END
FROM TestTable WITH (TABLOCKX)
OPTION (MAXDOP 1)
Approach Two - Using the same cursor options as Hugo Kornelis advocates in the discussion for that article.
SET NOCOUNT ON
BEGIN TRAN

DECLARE @RCS2 INT
DECLARE @PK INT, @CheckSum INT

DECLARE curRollingCheckSum CURSOR LOCAL STATIC READ_ONLY
FOR
SELECT PK, [CheckSum]
FROM TestTable
ORDER BY PK

OPEN curRollingCheckSum
FETCH NEXT FROM curRollingCheckSum
INTO @PK, @CheckSum

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RCS2 = CASE WHEN @RCS2 IS NULL THEN @CheckSum ELSE CHECKSUM(@CheckSum ^ @RCS2) END

    UPDATE dbo.TestTable
    SET RollingCheckSum2 = @RCS2
    WHERE @PK = PK

    FETCH NEXT FROM curRollingCheckSum
    INTO @PK, @CheckSum
END

CLOSE curRollingCheckSum
DEALLOCATE curRollingCheckSum

COMMIT
Test that they are the same:
SELECT * FROM TestTable
WHERE RollingCheckSum1 <> RollingCheckSum2
I'm not sure about a rolling checksum, but for a rolling sum, for instance, you can do this using the UPDATE command:
declare @a table (name varchar(2), value int, rollingvalue int)

insert into @a
select 'a', 1, 0 union all select 'b', 2, 0 union all select 'c', 3, 0

select * from @a

declare @sum int
set @sum = 0

update @a
set @sum = rollingvalue = value + @sum

select * from @a
Select Name, [Checksum]
    , (Select Checksum_Agg(T1.[Checksum])
       From Table As T1
       Where T1.Name <= T.Name) As RollingChecksum
From Table As T
Order By T.Name
To do a rolling anything, you need some semblance of an order to the rows. That can be by name, an integer key, a date, or whatever. In my example I used Name (even though the order in your sample data isn't alphabetical). In addition, I'm using the CHECKSUM_AGG aggregate in SQL Server.
Ideally you would also have a unique value on which to compare the inner and outer query. E.g., Where T1.PK <= T.PK for an integer key (or even a string key) would work well. In my solution, if Name had a unique constraint it would also work well enough.