What is the most efficient way in T-SQL to compare answer strings to answer keys for scoring an exam?

These exams typically have about 120 questions. Currently, the answer strings are compared to the keys and a value of 1 or 0 is assigned per question. When complete, the 1's are totaled for a raw score.
Are there any T-SQL functions like intersect or diff, or something altogether different, that would handle this process as quickly as possible for 100,000 examinees?
Thanks in advance for your expertise.
-Steven

Try selecting the equality of a question to its correct answer. I assume you have the student's tests in one table and the key in another; something like this ought to work:
select student_test.student_id,
student_test.test_id,
student_test.question_id,
(student_test.answer = test_key.answer OR (student_test.answer IS NULL AND test_key.answer IS NULL))
from student_test
INNER JOIN test_key
ON student_test.test_id = test_key.test_id
AND student_test.question_id = test_key.question_id
WHERE student_test.test_id = <the test to grade>
You can group the results by student and test, then sum the last column if you want the DB to give you the total score. This will give a detailed "right/wrong" analysis of the test.
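For example, a grouped version of the same join (only a sketch, using the same assumed tables and an @test_id parameter in place of the placeholder above) would return one raw score per student:
SELECT st.student_id,
st.test_id,
SUM(CASE WHEN st.answer = tk.answer
OR (st.answer IS NULL AND tk.answer IS NULL)
THEN 1 ELSE 0 END) AS raw_score
FROM student_test AS st
INNER JOIN test_key AS tk
ON st.test_id = tk.test_id
AND st.question_id = tk.question_id
WHERE st.test_id = @test_id -- the test to grade
GROUP BY st.student_id, st.test_id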
EDIT: The answers being stored as one continuous string makes it much harder. You will most likely have to implement this in a procedural fashion with a cursor, meaning each student's answers are loaded, SUBSTRINGed into single characters, and compared to the key in an RBAR (row by agonizing row) fashion. You could also implement a scalar-valued function that compares string A to string B one character at a time and returns the number of differences, then call that function from a driving query that runs it for each student.
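A set-based alternative to the character-by-character loop is also possible: cross join the answer strings against a tally of character positions and sum the matches. This is only a sketch, borrowing the test_answers and test_keys tables from the cursor answer below; the column names are assumptions:
DECLARE @key varchar(120)
SELECT @key = test_key FROM test_keys WHERE test_id = 1234
;WITH Positions AS
(
-- one row per character position in the key string
SELECT TOP (LEN(@key)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS pos
FROM sys.all_objects
)
SELECT ta.student_id,
SUM(CASE WHEN SUBSTRING(ta.answers, p.pos, 1) = SUBSTRING(@key, p.pos, 1)
THEN 1 ELSE 0 END) AS raw_score
FROM test_answers AS ta
CROSS JOIN Positions AS p
GROUP BY ta.student_id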

Something like this might work out for you:
select student_id, studentname, answers, 0 as score
into #scores from test_answers
declare @studentid int
declare @i int
declare @answers varchar(120)
declare @testkey varchar(120)
select @testkey = test_key from test_keys where test_id = 1234
declare student_cursor cursor for
select student_id from #scores
open student_cursor
fetch next from student_cursor into @studentid
while @@FETCH_STATUS = 0
begin
select @i = 1
select @answers = answers from #scores where student_id = @studentid
while @i <= len(@answers)
begin
if substring(@answers, @i, 1) = substring(@testkey, @i, 1)
update #scores set score = score + 1 where student_id = @studentid
select @i = @i + 1
end
fetch next from student_cursor into @studentid
end
close student_cursor
deallocate student_cursor
select * from #scores
drop table #scores
I doubt that's the single most efficient way to do it, but it's not a bad starting point at least.

SQL Server 2008: Is there a more efficient way to do this update loop?

First posted question, I apologize in advance for any blunders.
The table contains records that are assigned to a team; the initial assignments are done by another process. Frequently we have to reassign an agent's records and spread them out equally among the rest of the team. We have been doing this by hand, one by one, which was cumbersome, so I came up with this solution:
DECLARE @UpdtAgt TABLE (ID INT, Name varchar(25))
INSERT INTO @UpdtAgt
VALUES (1, 'Gandalf')
,(2,'Hank')
,(3,'Icarus')
CREATE TABLE #UpdtQry (TblID varchar(25))
INSERT INTO #UpdtQry
SELECT ShtID
FROM TestUpdate
DECLARE @RowID INT
DECLARE @AgtID INT
DECLARE @Agt varchar(25)
DECLARE @MaxID INT
SET @MaxID = (SELECT COUNT(*) FROM @UpdtAgt)
SET @AgtID = 1
--WHILE ((SELECT COUNT(*) FROM #UpdtQry) > 0)
WHILE EXISTS (SELECT TblID FROM #UpdtQry)
BEGIN
SET @RowID = (SELECT TOP 1 TblID FROM #UpdtQry)
SET @Agt = (SELECT Name FROM @UpdtAgt WHERE ID = @AgtID)
UPDATE TestUpdate
SET Assignment = @Agt
WHERE ShtID = @RowID
DELETE #UpdtQry WHERE TblID = @RowID
IF @AgtID < @MaxID
SET @AgtID = @AgtID + 1
ELSE
SET @AgtID = 1
END
DROP TABLE #UpdtQry
This is really my first attempt at doing something this in-depth. An update of 100 rows takes about 30 seconds to do. The UPDATE table, TestUpdate, has only the CLUSTERED index. How can I make this more efficient?
EDIT: I didn't define the @UpdtAgt and #UpdtQry tables very well in my explanation. @UpdtAgt holds the agents that are being reassigned the records, and will likely change each time this is used. #UpdtQry will have a WHERE clause defining which agents' records are being reassigned; again, this will change with each use. I hope that makes it a little clearer. Again, apologies for not getting it right the first time.
EDIT 2: I commented out the old WHILE clause and inserted the one that HABO suggested. Thank you again HABO.
I think this is what you're looking for:
DECLARE @UpdtAgt TABLE
(
ID INT,
Name VARCHAR(25)
)
INSERT @UpdtAgt
VALUES (1, 'Gandalf')
,(2, 'Hank')
,(3, 'Icarus')
UPDATE t
SET t.Assignment = a.Name
FROM TestUpdate AS t
INNER JOIN @UpdtAgt AS a
ON t.ShtID = a.ID
That should do all 4 rows at once.
P.S...
If you do create tables like in your original post in future, please try to keep the naming of your columns and variables consistent with their purpose!
In your example you used ID, AgtID, ShtID and (most confusingly) TblID, and I think they're all the same thing? (Please correct me if I'm wrong!) If you called it AgtID everywhere (and @AgtID for the variable; there's no real need for @RowID) then it would be much easier to see at a glance what's going on! The same goes for Assignment and Name.
Because this is your first attempt at something like this, I want to congratulate you on something that works. While it is not ideal (and what is?) it meets the main goal: it works. There is a better way to do this using something known as a cursor. I remind myself of the proper syntax using the following page from Microsoft: Click here for full instruction on cursors
Having said that, the code at the end of this post shows my quick solution to your situation. Notice the following:
The @TestUpdate table is declared as a table variable so that the query will run in MSSQL without touching permanent tables.
Only the @UpdtAgt table needs to be set up separately each run. However, if this is used regularly, it would be best to make it a permanent table.
The CLOSE and DEALLOCATE statements at the end are IMPORTANT - forgetting these will have rather unpleasant consequences.
DECLARE @TestUpdate TABLE (ShtID int, Assignment varchar(25))
INSERT INTO @TestUpdate
VALUES (1,'Fred')
,(2,'Barney')
,(3,'Fred')
,(4,'Wilma')
,(5,'Betty'),(6,'Leopold'),(7,'Frank'),(8,'Fred')
DECLARE @UpdtAgt TABLE (ID INT, Name varchar(25))
INSERT INTO @UpdtAgt
VALUES (1, 'Gandalf')
,(2,'Hank')
,(3,'Icarus')
DECLARE @recid int
DECLARE @AgtID int SET @AgtID=0
DECLARE @MaxID int SET @MaxID = (SELECT COUNT(*) FROM @UpdtAgt)
DECLARE assignment_cursor CURSOR
FOR SELECT ShtID FROM @TestUpdate
OPEN assignment_cursor
FETCH NEXT FROM assignment_cursor
INTO @recid
WHILE @@FETCH_STATUS = 0
BEGIN
SET @AgtID = @AgtID + 1
IF @AgtID > @MaxID SET @AgtID = 1
UPDATE TU
SET Assignment = (SELECT TOP 1 Name FROM @UpdtAgt WHERE ID=@AgtID)
FROM @TestUpdate TU
WHERE TU.ShtID=@recid
FETCH NEXT FROM assignment_cursor INTO @recid
END
CLOSE assignment_cursor
DEALLOCATE assignment_cursor
SELECT * FROM @TestUpdate
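For comparison, a set-based round-robin assignment is also possible. This is only a sketch, reusing the sample @TestUpdate and @UpdtAgt tables declared above; it replaces the cursor by numbering the rows and the agents and joining on the remainder:
;WITH NumberedRows AS
(
SELECT ShtID, ROW_NUMBER() OVER (ORDER BY ShtID) - 1 AS rn
FROM @TestUpdate
),
NumberedAgents AS
(
SELECT Name, ROW_NUMBER() OVER (ORDER BY ID) - 1 AS agentNo, COUNT(*) OVER () AS agentCount
FROM @UpdtAgt
)
UPDATE t
SET t.Assignment = a.Name
FROM @TestUpdate AS t
INNER JOIN NumberedRows AS r ON r.ShtID = t.ShtID
INNER JOIN NumberedAgents AS a ON a.agentNo = r.rn % a.agentCount
Each row is numbered, each agent is numbered, and row number modulo agent count picks the agent, so the records are spread evenly without any looping.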

Stored Procedure algorithm taking 9 hours- better way of doing this?

I need to build an SQL stored procedure which basically updates an existing table (of about 150,000 rows) with an ID.
The table that this stored procedure will run over is basically a list of people, their names, addresses etc.
Now the algorithm for the id of the person is as follows:
- Take up to the first 4 characters of the person's first name.
- Take up to the first 2 characters of the person's last name.
- Pad the rest with 0's, with a counting number at the end, until the field is 8 characters.
For instance, the name JOHN SMITH would have an ID of 'JOHNSM00'. If there were 2 JOHN SMITHs, the ID of the next person would be JOHNSM01. If the person's name was FI LYNN, for instance, the ID would be FILY0000.
I've got the following stored procedure that I wrote, but it takes around 9 hours to run! Is there a better way of doing this that I am missing?
ALTER PROCEDURE [dbo].[LM_SP_UPDATE_PERSON_CODES]
AS
DECLARE @NAMEKEY NVARCHAR(10)
DECLARE @NEWNAMEKEY NVARCHAR(10)
DECLARE @LENGTH INT
DECLARE @KEYCOUNT INT
DECLARE @I INT
DECLARE @PADDING NVARCHAR(8)
DECLARE @PERSONS CURSOR
DECLARE @FIRSTNAME NVARCHAR(30)
DECLARE @LASTNAME NVARCHAR(30)
SET @PADDING = '00000000'
--FIRST CLEAR OLD NEW NAMEKEYS IF ANY EXIST
UPDATE LM_T_PERSONS SET NEW_NAMEKEY = NULL
SET @PERSONS = CURSOR FOR
SELECT NAMEKEY, NAME_2, NAME_1 FROM LM_T_PERSONS
OPEN @PERSONS
FETCH NEXT FROM @PERSONS INTO @NAMEKEY, @FIRSTNAME, @LASTNAME
WHILE @@FETCH_STATUS = 0
BEGIN
--CHECK THE LENGTH OF FIRST NAME TO MAKE SURE NOTHING EXCEEDS 4
SET @LENGTH = LEN(@FIRSTNAME)
IF @LENGTH > 4
SET @LENGTH = 4
SET @NEWNAMEKEY = SUBSTRING(@FIRSTNAME,1,@LENGTH)
--CHECK THE LENGTH OF LAST NAME TO MAKE SURE NOTHING EXCEEDS 2
SET @LENGTH = LEN(@LASTNAME)
IF @LENGTH > 2
SET @LENGTH = 2
SET @NEWNAMEKEY = @NEWNAMEKEY + SUBSTRING(@LASTNAME,1,@LENGTH)
SET @LENGTH = LEN(@NEWNAMEKEY)
SET @I = 0
SET @PADDING = SUBSTRING('00000000',1,8 - LEN(@NEWNAMEKEY) - LEN(CONVERT(NVARCHAR(8),@I)))
--SEE IF THIS KEY ALREADY EXISTS
SET @KEYCOUNT = (SELECT COUNT(1) FROM LM_T_PERSONS WHERE NEW_NAMEKEY = @NEWNAMEKEY + @PADDING + CONVERT(NVARCHAR(8),@I) )
WHILE @KEYCOUNT > 0
BEGIN
SET @I = @I+1
SET @PADDING = SUBSTRING('00000000',1,8 - LEN(@NEWNAMEKEY) - LEN(CONVERT(NVARCHAR(8),@I)))
SET @KEYCOUNT = (SELECT COUNT(1) FROM LM_T_PERSONS WHERE NEW_NAMEKEY = @NEWNAMEKEY + @PADDING + CONVERT(NVARCHAR(8),@I) )
END
UPDATE LM_T_PERSONS SET NEW_NAMEKEY = @NEWNAMEKEY + @PADDING + CONVERT(NVARCHAR(8),@I) WHERE NAMEKEY = @NAMEKEY
FETCH NEXT FROM @PERSONS INTO @NAMEKEY, @FIRSTNAME, @LASTNAME
END
CLOSE @PERSONS
DEALLOCATE @PERSONS
Something like this can do it without the cursor:
UPDATE P
SET NEW_NAMEKEY = FIRSTNAME + LASTNAME + REPLICATE('0', 8 - LEN(FIRSTNAME) - LEN(LASTNAME) - LEN(I)) + I
FROM
LM_T_PERSONS AS P JOIN
(
SELECT
NAMEKEY,
LEFT(NAME_2, 4) AS FIRSTNAME,
LEFT(NAME_1, 2) AS LASTNAME,
CONVERT(NVARCHAR, ROW_NUMBER() OVER(PARTITION BY LEFT(NAME_2, 4), LEFT(NAME_1, 2) ORDER BY NAMEKEY)) AS I
FROM
LM_T_PERSONS
) AS DATA
ON P.NAMEKEY = DATA.NAMEKEY
You can verify the query here:
http://sqlfiddle.com/#!3/47365/19
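If the counter should start at 00 for the first person with a given prefix (as in the JOHNSM00 example in the question), the row number can be shifted down by one; a sketch of that variation on the inner query's I column:
CONVERT(NVARCHAR, ROW_NUMBER() OVER(PARTITION BY LEFT(NAME_2, 4), LEFT(NAME_1, 2) ORDER BY NAMEKEY) - 1) AS I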
I don't have any strict "you should do it the XYZ way" advice, but from similar sorts of exercises in the past:
If you want to keep the stored proc and you have a window for a long-running task, like a weekend, where you can be sure you'll be the only operation running, then setting the database to Simple recovery mode for the duration of your work may speed things up (I assume you're working on a production database, so it is in Full recovery mode). Since you're not writing to the transaction log you have limited recoverability, so you want to make sure you're the only person doing anything, and I'd take a full backup before starting work in case things get nasty.
I don't think it's so much the stored proc as the cursor usage, SUBSTRING and so on; you're writing procedural code for work that is mainly set-based. I understand the "why" behind these, but an option would be to take that logic out and use something like SQL Server Integration Services, i.e. go with a technology better suited to looping or doing transformations against individual rows.
Following on from using something more suited to procedural work... you could always write a simple .NET application or similar. Speaking from my own (limited) experience, I have seen this done in the past, but the mileage has varied based on things like the complexity of the operation (yours sounds simple enough, transforming a user-ID field), the volumes, and the person writing it. I would say I've never seen it go particularly well (we never turned around and said "that was awesome"), but more that it got the job done and we moved on to something else, taking neither good nor bad from the experience (just "average").
I think SSIS is a good way to go, as you can extract these records from your DB, do the operations you need (SSIS supports a pretty broad variety of things you can do to data, including writing .NET code {albeit VB.NET, from memory} if you have to) and then update your database.
Other kinds of ETL technologies will probably allow you to do similar things, but I'm most familiar with SSIS. 150k rows wouldn't be a huge problem, as it can deal with much larger volumes; from my own experience we would write SSIS packages that did nothing too special, but they could do these sorts of operations over 1 million rows in about 15 minutes... which I think the experts will say is still a little slow :-)
HTH a bit, Nathan
This query will get exactly what you want, and much faster.
select FirstName,
LastName,
ID + replicate('0',8-len(ID)-len(cast(rankNumber as varchar)))+cast(rankNumber as varchar)
from (
select dense_rank() over (partition by id order by rownumber) rankNumber,
FirstName,
LastName,
ID
from (
select row_number() over (Order by FirstName) rownumber,
FirstName,
LastName,
RTRIM(cast(FirstName as char(4)))+ RTRIM(cast(LastName as char(2))) as ID
from person
) A
) B
How about avoiding the inner WHILE loop altogether by getting the maximum sequence-number suffix (@I) already stored for the NEW_NAMEKEY prefix, then adding 1 to it when one exists, or using 0 when the lookup returns NULL?
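A sketch of that idea, keeping the original cursor but replacing the inner loop with a single lookup (variable names follow the procedure above):
-- @NEWNAMEKEY is the first-name/last-name prefix built earlier in the loop
SELECT @I = ISNULL(MAX(CONVERT(int, RIGHT(NEW_NAMEKEY, 8 - LEN(@NEWNAMEKEY)))) + 1, 0)
FROM LM_T_PERSONS
WHERE LEFT(NEW_NAMEKEY, LEN(@NEWNAMEKEY)) = @NEWNAMEKEY
AND RIGHT(NEW_NAMEKEY, 8 - LEN(@NEWNAMEKEY)) NOT LIKE '%[^0-9]%' -- only purely numeric suffixes
-- @PADDING and the final UPDATE then proceed exactly as in the original procedure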

Iterate through rows in SQL Server 2008

Consider the table SAMPLE:
id integer
name nvarchar(10)
There is a stored proc called myproc. It takes only one parameter (which is id).
Given a name as a parameter, find all rows with name = @nameparameter and pass each of those ids
to myproc.
eg:
sample->
1 mark
2 mark
3 stu
41 mark
When mark is passed, 1, 2 and 41 are to be passed to myproc individually.
i.e. the following should happen:
execute myproc 1
execute myproc 2
execute myproc 41
I can't touch myproc nor see its content. I just have to pass the values to it.
If you must iterate(*), use the construct designed to do it - the cursor. Much maligned, but if it most clearly expresses your intentions, I say use it:
DECLARE @ID int
DECLARE IDs CURSOR LOCAL FOR select ID from SAMPLE where Name = @NameParameter
OPEN IDs
FETCH NEXT FROM IDs into @ID
WHILE @@FETCH_STATUS = 0
BEGIN
exec myproc @ID
FETCH NEXT FROM IDs into @ID
END
CLOSE IDs
DEALLOCATE IDs
(*) This answer has received a few upvotes recently, but I feel I ought to incorporate my original comment here also, and add some general advice:
In SQL, you should generally seek a set-based solution. The entire language is oriented around set-based solutions, and (in turn) the optimizer is oriented around making set-based solutions work well. In further turn, the tools we have available for tuning the optimizer are also set-oriented - e.g. applying indexes to tables.
There are a few situations where iteration is the best approach. These are few and far between, and may be likened to Jackson's rules on optimization - don't do it - and (for experts only) don't do it yet.
You're far better served to first try to formulate what you want in terms of the set of all rows to be affected - what is the overall change to be achieved? - and then try to formulate a query that encapsulates that goal. Only if the query produced by doing so is not performing adequately (or there's some other component that is unable to do anything other than deal with each row individually) should you consider iteration.
I just declare a table variable @sample and insert all rows that have name = 'Rahul', along with a Status column to record whether the row has been iterated. Then, using a WHILE loop, I iterate through all the rows of @sample, which holds all the ids for name = 'Rahul'.
use dumme
Declare @Name nvarchar(50)
set @Name='Rahul'
DECLARE @sample table (
ID int,
Status varchar(500)
)
insert into @sample (ID, Status) select ID, 0 from sample where name = @Name
while ((select count(ID) from @sample where Status = 0) > 0)
begin
select top 1 ID from @sample where Status = 0 order by ID
update @sample set Status = 1 where ID = (select top 1 ID from @sample where Status = 0 order by ID)
end
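Inside that loop, the selected Id would then be handed to the procedure. A minimal sketch of a loop body that does this, reusing the same @sample table:
declare @Id int
while ((select count(ID) from @sample where Status = 0) > 0)
begin
select top 1 @Id = ID from @sample where Status = 0 order by ID
exec myproc @Id
update @sample set Status = 1 where ID = @Id
end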
Declare @retStr varchar(100)
select @retStr = COALESCE(@retStr, '') + cast(sample.ID as varchar(10)) + ', '
from sample
WHERE sample.Name = @nameparameter
select @retStr = ltrim(rtrim(substring(@retStr, 1, len(@retStr) - 1)))
Return ISNULL(@retStr, '')

Why does my T-SQL WHILE loop not work?

In my code, I need to test whether the specified column is NULL, finding the value closest to 0 for which it is (the column can hold numbers from 0 to 50), so I have tried the code below.
It should start from 0 and test the query for each value. When @Result becomes NULL, it should stop. However, it does not work; it still prints 0.
declare @hold int
declare @Result int
set @hold=0
set @Result=0
WHILE (@Result!=null)
BEGIN
select @Result=(SELECT Hold from Numbers WHERE Name='Test' AND Hold=@hold)
set @hold=@hold+1
END
print @hold
First, you can't test equality with NULL. NULL means an unknown value, so you don't know whether it does (or does not) equal any specific value. Instead of @Result != NULL, use @Result IS NOT NULL.
Second, don't use this kind of sequential processing in SQL if you can at all help it. SQL is made to handle sets, not process things sequentially. You could do all of this work with one simple SQL command and it will most likely run faster anyway:
SELECT
MIN(hold) + 1
FROM
Numbers N1
WHERE
N1.name = 'Test' AND
NOT EXISTS
(
SELECT
*
FROM
Numbers N2
WHERE
N2.name = 'Test' AND
N2.hold = N1.hold + 1
)
The query above basically tells SQL Server, "Give me the smallest hold value plus 1 (MIN(hold) + 1) in the table Numbers where the name is 'Test' (name = 'Test') and where a row with a name of 'Test' and a hold of one more than that does not exist (the whole "NOT EXISTS" part)". In the case of the following rows:
Name Hold
-------- ----
Test 1
Test 2
NotTest 3
Test 20
SQL Server finds all of the rows with a name of "Test" (1, 2, 20), then finds which ones don't have a row with name = 'Test' and hold = hold + 1. For Test, 1 there is a row Test, 2, so it is excluded. For Test, 2 there is no Test, 3, so it's still in the potential results. For Test, 20 there is no Test, 21, so that leaves us with:
Name Hold
-------- ----
Test 2
Test 20
Now SQL Server looks for MIN(hold) and gets 2 then it adds 1, so you get 3.
SQL Server may not perform the operations exactly as I described. The SQL statement tells SQL Server what you're looking for, but not how to get it. SQL Server has the freedom to use whatever method it determines is the most efficient for getting the answer.
The key is to always think in terms of sets and how those sets get put together (through JOINs), filtered (through WHERE conditions or ON conditions within a join), and, when necessary, grouped and aggregated (MIN, MAX, AVG, etc.).
Have you tried:
WHILE (@Result is not null)
BEGIN
select @Result=(SELECT Hold from Numbers WHERE Name='Test' AND Hold=@hold)
set @hold=@hold+1
END
Here's a more advanced version of Tom H.'s query:
SELECT MIN(N1.hold) + 1
FROM Numbers N1
LEFT OUTER JOIN Numbers N2
ON N2.Name = N1.Name AND N2.hold = N1.hold + 1
WHERE N1.name = 'Test' AND N2.name IS NULL
It's not as intuitive if you're not familiar with SQL, but it uses identical logic. For those who are more familiar with SQL, it makes the relationship between N1 and N2 easier to see. It may also be easier for the query optimizer to handle, depending on your DBMS.
Try this:
declare @hold int
declare @Result int
set @hold=0
set @Result=0
declare @max int
SELECT @max=MAX(Hold) FROM Numbers
WHILE (@hold <= @max)
BEGIN
select @Result=(SELECT Hold from Numbers WHERE Name='Test' AND Hold=@hold)
set @hold=@hold+1
END
print @hold
WHILE is tricky in T-SQL - you can also use it for foreach-style looping through (temp) tables, like this:
-- Foreach with T-SQL while
DECLARE @tempTable TABLE (rownum int IDENTITY (1, 1) Primary key NOT NULL, Number int)
declare @RowCnt int
declare @MaxRows int
select @RowCnt = 1
select @MaxRows=count(*) from @tempTable
declare @number int
while @RowCnt <= @MaxRows
begin
-- Number from given RowNumber
SELECT @number=Number FROM @tempTable where rownum = @RowCnt
-- next row
Select @RowCnt = @RowCnt + 1
end
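To apply that skeleton to this question, the table variable would first be filled with the matching ids (a sketch, reusing @NameParameter from the cursor answer above), and the procedure call would go right after the Number is fetched:
insert into @tempTable (Number)
select ID from SAMPLE where Name = @NameParameter
-- inside the loop, after fetching @number:
-- exec myproc @number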

How to keep a rolling checksum in SQL?

I am trying to keep a rolling checksum to account for order, so I take the previous 'checksum', XOR it with the current one, and generate a new checksum.
Name Checksum Rolling Checksum
------ ----------- -----------------
foo 11829231 11829231
bar 27380135 checksum(27380135 ^ 11829231) = 93291803
baz 96326587 checksum(96326587 ^ 93291803) = 67361090
How would I accomplish something like this?
(Note that the calculations are completely made up and are for illustration only)
This is basically the running total problem.
Edit:
My original claim was that this is one of the few places where a cursor-based solution actually performs best. The problem with the triangular self-join solution is that it repeatedly ends up recalculating the same cumulative checksum as a subcalculation for the next step, so it is not very scalable; the work required grows quadratically with the number of rows.
Corina's answer uses the "quirky update" approach. I've adjusted it to do the checksum, and in my test it took 3 seconds rather than 26 seconds for the cursor solution. Both produced the same results. Unfortunately, however, it relies on an undocumented aspect of UPDATE behaviour. I would definitely read the discussion here before deciding whether to rely on this in production code.
There is a third possibility described here (using the CLR) which I didn't have time to test. From the discussion there, it seems a good option for calculating running-total-type things at display time, but it is outperformed by the cursor when the result of the calculation must be saved back.
CREATE TABLE TestTable
(
PK int identity(1,1) primary key clustered,
[Name] varchar(50),
[CheckSum] AS CHECKSUM([Name]),
RollingCheckSum1 int NULL,
RollingCheckSum2 int NULL
)
/*Insert some random records (753,571 on my machine)*/
INSERT INTO TestTable ([Name])
SELECT newid() FROM sys.objects s1, sys.objects s2, sys.objects s3
Approach One: Based on the Jeff Moden Article
DECLARE @RCS int
UPDATE TestTable
SET @RCS = RollingCheckSum1 =
CASE WHEN @RCS IS NULL THEN
[CheckSum]
ELSE
CHECKSUM([CheckSum] ^ @RCS)
END
FROM TestTable WITH (TABLOCKX)
OPTION (MAXDOP 1)
Approach Two - Using the same cursor options as Hugo Kornelis advocates in the discussion for that article.
SET NOCOUNT ON
BEGIN TRAN
DECLARE @RCS2 INT
DECLARE @PK INT, @CheckSum INT
DECLARE curRollingCheckSum CURSOR LOCAL STATIC READ_ONLY
FOR
SELECT PK, [CheckSum]
FROM TestTable
ORDER BY PK
OPEN curRollingCheckSum
FETCH NEXT FROM curRollingCheckSum
INTO @PK, @CheckSum
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RCS2 = CASE WHEN @RCS2 IS NULL THEN @CheckSum ELSE CHECKSUM(@CheckSum ^ @RCS2) END
UPDATE dbo.TestTable
SET RollingCheckSum2 = @RCS2
WHERE @PK = PK
FETCH NEXT FROM curRollingCheckSum
INTO @PK, @CheckSum
END
CLOSE curRollingCheckSum
DEALLOCATE curRollingCheckSum
COMMIT
Test that they are the same:
SELECT * FROM TestTable
WHERE RollingCheckSum1 <> RollingCheckSum2
I'm not sure about a rolling checksum, but for a rolling sum for instance, you can do this using the UPDATE command:
declare @a table (name varchar(2), value int, rollingvalue int)
insert into @a
select 'a', 1, 0 union all select 'b', 2, 0 union all select 'c', 3, 0
select * from @a
declare @sum int
set @sum = 0
update @a
set @sum = rollingvalue = value + @sum
select * from @a
Select Name, [Checksum]
, (Select Checksum_Agg(T1.[Checksum])
From [Table] As T1
Where T1.Name < T.Name) As RollingChecksum
From [Table] As T
Order By T.Name
To do a rolling anything, you need some semblance of an order to the rows. That can be by name, an integer key, a date or whatever. In my example, I used name (even though the order in your sample data isn't alphabetical). In addition, I'm using the Checksum_Agg function in SQL.
In addition, you would ideally have a unique value on which you compare the inner and outer query. E.g., Where T1.PK < T.PK for an integer key or even string key would work well. In my solution if Name had a unique constraint, it would also work well enough.
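With an integer key, the same pattern would look like the sketch below (the PK column is assumed, and <= is used so each row's own checksum is included in its rolling value, matching the sample output in the question):
Select T.Name, T.[Checksum]
, (Select Checksum_Agg(T1.[Checksum])
From [Table] As T1
Where T1.PK <= T.PK) As RollingChecksum
From [Table] As T
Order By T.PK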