In my code, I need to find the value, as close to 0 as possible, for which the specified column is null (it can hold numbers from 0 to 50), so I tried the code below.
It should start from 0 and test the query for each value. When @Result becomes null, it should return. However, it does not work; it still prints 0.
declare @hold int
declare @Result int
set @hold=0
set @Result=0
WHILE (@Result!=null)
BEGIN
select @Result=(SELECT Hold from Numbers WHERE Name='Test' AND Hold=@hold)
set @hold=@hold+1
END
print @hold
First, you can't test equality of NULL. NULL means an unknown value, so you don't know whether or not it does (or does not) equal any specific value. Instead of @Result != NULL, use @Result IS NOT NULL.
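A minimal sketch of the difference, assuming the default ANSI_NULLS ON setting; the variable is only illustrative:

```sql
DECLARE @Result int = NULL;

-- With ANSI_NULLS ON (the default), comparing anything to NULL yields UNKNOWN,
-- so neither of these conditions is treated as true.
IF @Result != NULL PRINT 'not reached: != NULL';
IF @Result = NULL  PRINT 'not reached: = NULL';

-- IS NULL / IS NOT NULL are the predicates that actually test for NULL.
IF @Result IS NULL     PRINT '@Result is null';
IF @Result IS NOT NULL PRINT '@Result is not null';
```

The same rule applies when the variable holds a value: 0 != NULL is also UNKNOWN, which is why the WHILE loop in the question never enters its body and the script prints 0.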
Second, don't use this kind of sequential processing in SQL if you can at all help it. SQL is made to handle sets, not process things sequentially. You could do all of this work with one simple SQL command and it will most likely run faster anyway:
SELECT
MIN(hold) + 1
FROM
Numbers N1
WHERE
N1.name = 'Test' AND
NOT EXISTS
(
SELECT
*
FROM
Numbers N2
WHERE
N2.name = 'Test' AND
N2.hold = N1.hold + 1
)
The query above basically tells the SQL Server, "Give me the smallest hold value plus 1 (MIN(hold) + 1) in the table Numbers where the name is test (name = 'Test') and where the row with name of 'Test' and hold of one more than that does not exist (the whole "NOT EXISTS" part)". In the case of the following rows:
Name Hold
-------- ----
Test 1
Test 2
NotTest 3
Test 20
SQL Server finds all of the rows with name of "Test" (1, 2, 20) then finds which ones don't have a row with name = Test and hold = hold + 1. For 1 there is a row with Test, 2 that exists. For Test, 2 there is no Test, 3 so it's still in the potential results. For Test, 20 there is no Test, 21 so that leaves us with:
Name Hold
-------- ----
Test 2
Test 20
Now SQL Server looks for MIN(hold) and gets 2 then it adds 1, so you get 3.
SQL Server may not perform the operations exactly as I described. The SQL statement tells SQL Server what you're looking for, but not how to get it. SQL Server has the freedom to use whatever method it determines is the most efficient for getting the answer.
The key is to always think in terms of sets and how those sets get put together (through JOINs), filtered (through WHERE conditions or ON conditions within a join), and, when necessary, grouped and aggregated (MIN, MAX, AVG, etc.).
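To try the walk-through yourself, here is a small self-contained sketch. The table variable @Numbers is a stand-in for the real Numbers table and holds only the four sample rows shown earlier:

```sql
-- Hypothetical sample data matching the rows discussed above.
DECLARE @Numbers TABLE (Name varchar(20), Hold int);

INSERT INTO @Numbers (Name, Hold)
VALUES ('Test', 1), ('Test', 2), ('NotTest', 3), ('Test', 20);

-- Same set-based query, run against the table variable.
SELECT MIN(N1.Hold) + 1 AS FirstGap
FROM @Numbers N1
WHERE N1.Name = 'Test'
  AND NOT EXISTS (
        SELECT *
        FROM @Numbers N2
        WHERE N2.Name = 'Test'
          AND N2.Hold = N1.Hold + 1
      );
-- Returns 3 for this data, as in the walk-through.
```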
Have you tried:
WHILE (@Result is not null)
BEGIN
select @Result=(SELECT Hold from Numbers WHERE Name='Test' AND Hold=@hold)
set @hold=@hold+1
END
Here's a more advanced version of Tom H.'s query:
SELECT MIN(N1.hold) + 1
FROM Numbers N1
LEFT OUTER JOIN Numbers N2
ON N2.Name = N1.Name AND N2.hold = N1.hold + 1
WHERE N1.name = 'Test' AND N2.name IS NULL
It's not as intuitive if you're not familiar with SQL, but it uses identical logic. For those who are more familiar with SQL, it makes the relationship between N1 and N2 easier to see. It may also be easier for the query optimizer to handle, depending on your DBMS.
Try this:
declare @hold int
declare @Result int
set @hold=0
set @Result=0
declare @max int
SELECT @max=MAX(Hold) FROM Numbers
WHILE (@hold <= @max)
BEGIN
select @Result=(SELECT Hold from Numbers WHERE Name='Test' AND Hold=@hold)
set @hold=@hold+1
END
print @hold
While is tricky in T-SQL - you can use this for (foreach) looping through (temp) tables too - with:
-- Foreach with T-SQL while
DECLARE @tempTable TABLE (rownum int IDENTITY (1, 1) Primary key NOT NULL, Number int)
declare @RowCnt int
declare @MaxRows int
select @RowCnt = 1
select @MaxRows=count(*) from @tempTable
declare @number int
while @RowCnt <= @MaxRows
begin
-- Number from given RowNumber
SELECT @number=Number FROM @tempTable where rownum = @RowCnt
-- next row
Select @RowCnt = @RowCnt + 1
end
I have to create a function in a SQL Server trigger for generating random numbers after insert. I want to update the column with the generated random number. Please help me find what I have missed in my code.
If you know other ways, please suggest a way to complete my task.
This is my SQL Server trigger:
ALTER TRIGGER [dbo].[trgEnquiryMaster]
ON [dbo].[enquiry_master]
AFTER INSERT
AS
declare @EnquiryId int;
declare @ReferenceNo varchar(50);
declare @GenReferenceNo NVARCHAR(MAX);
select @EnquiryId = i.enquiry_id from inserted i;
select @ReferenceNo = i.reference_no from inserted i;
BEGIN
SET @GenReferenceNo = 'CREATE FUNCTION functionRandom (@Reference VARCHAR(MAX) )
RETURNS VARCHAR(MAX)
As
Begin
DECLARE @r varchar(8);
SELECT @r = coalesce(@r, '') + n
FROM (SELECT top 8
CHAR(number) n FROM
master..spt_values
WHERE type = P AND
(number between ascii(0) and ascii(9)
or number between ascii(A) and ascii(Z)
or number between ascii(a) and ascii(z))
ORDER BY newid()) a
RETURNS @r
END
'
EXEC(@GenReferenceNo)
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
-- update statements for trigger here
UPDATE enquiry_master
SET reference_no ='updated'
WHERE enquiry_id = @EnquiryId
END
To generate random numbers, just call CRYPT_GEN_RANDOM which was introduced in SQL Server 2008:
SELECT CRYPT_GEN_RANDOM(5) AS [Hex],
CONVERT(VARCHAR(20), CRYPT_GEN_RANDOM(5), 2) AS [HexStringWithout0x],
CONVERT(VARCHAR(20), CRYPT_GEN_RANDOM(10)) AS [Translated-ASCII],
CONVERT(NVARCHAR(20), CRYPT_GEN_RANDOM(20)) AS [Translated-UCS2orUTF16]
returns:
Hex HexStringWithout0x Translated-ASCII Translated-UCS2orUTF16
0x4F7D9ABBC4 0ECF378A7A ¿"bü<ݱØï 붻槬㟰添䛺⯣왚꒣찭퓚
If you are ok with just 0 - 9 and A - F, then the CONVERT(VARCHAR(20), CRYPT_GEN_RANDOM(5), 2) is all you need.
Please see my answer on DBA.StackExchange on a similar question for more details:
Password generator function
The UPDATE statement shown in the "Update" section of that linked answer is what you want, just remove the WHERE condition and add the JOIN to the Inserted pseudo-table.
The query should look something like the following:
DECLARE @Length INT = 10;
UPDATE em
SET em.[reference_no] = rnd.RandomValue
FROM dbo.enquiry_master em
INNER JOIN Inserted ins
ON ins.enquiry_id = em.enquiry_id
CROSS APPLY dbo.GenerateReferenceNo(CRYPT_GEN_RANDOM((em.[enquiry_id] % 1) + @Length)) rnd;
And since the function is slightly different, here is how it should be in order to get both upper-case and lower-case letters:
CREATE FUNCTION dbo.GenerateReferenceNo(@RandomValue VARBINARY(20))
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
WITH base(item) AS
(
SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL
SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL
), items(item) AS
(
SELECT NULL
FROM base b1
CROSS JOIN base b2
)
SELECT (
SELECT TOP (LEN(@RandomValue))
SUBSTRING('1234567890QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm',
(CONVERT(TINYINT, SUBSTRING(@RandomValue, 1, 1)) % 62) + 1,
1) AS [text()]
FROM items
FOR XML PATH('')
) AS [RandomValue];
GO
And please follow the usage shown above, passing in CRYPT_GEN_RANDOM((em.[enquiry_id] % 1) + @Length), not: CRYPT_GEN_RANDOM(@RefferenceNOLength).
Other notes:
@marc_s already explained the one-row vs multiple-rows flaw and how to fix that.
Not only is a trigger not the place to create a new object (i.e. the function), that function wouldn't have worked anyway since the call to newid() (in the ORDER BY) is not allowed in a function.
You don't need to issue two separate SELECTs to set two different variables. You could do the following:
SELECT @EnquiryId = i.enquiry_id,
@ReferenceNo = i.reference_no
FROM TableName i;
Passing strings into a function requires quoting those strings inside of single-quotes: ASCII('A') instead of ASCII(A).
UPDATE
The full Trigger definition should be something like the following:
ALTER TRIGGER [dbo].[trgEnquiryMaster]
ON [dbo].[enquiry_master]
AFTER INSERT
AS
BEGIN
DECLARE @Length INT = 10;
UPDATE em
SET em.[reference_no] = rnd.RandomValue
FROM dbo.enquiry_master em
INNER JOIN Inserted ins
ON ins.enquiry_id = em.enquiry_id
CROSS APPLY dbo.GenerateReferenceNo(
CRYPT_GEN_RANDOM((em.[enquiry_id] % 1) + @Length)
) rnd;
END;
A trigger should be very nimble and quick - it is no place to do heavy and time-intensive processing, and definitely no place to create new database objects since (a) the trigger is executed in the context of the code causing it to fire, and (b) you cannot control when and how often the trigger is fired.
You need to
define and create your function to generate that random value during database setup - once, before any operations are executed on the database
rewrite your trigger to take into account that multiple rows could be inserted at once, and in that case, the Inserted table will contain multiple rows which all have to be handled.
So your trigger will look something like this (with several assumptions by me, e.g. that enquiry_id is the primary key on your table; you need this to establish the INNER JOIN between your data table and the Inserted pseudo table):
ALTER TRIGGER [dbo].[trgEnquiryMaster]
ON [dbo].[enquiry_master]
AFTER INSERT
AS
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
-- update statements for trigger here
UPDATE enq
SET reference_no = dbo.GenerateRandomValue(.....)
FROM enquiry_master enq
INNER JOIN inserted i ON enq.enquiry_id = i.enquiry_id
I created the script below. I am looking to update a column called "C1Int" with the number 1234567 on rows chosen randomly by pkey.
I created a random generator for the pkey that uses 1 as the min and the total number of rows as the max.
Then there is a loop that should update the rows over and over, based on the number in the WHILE statement. When I run it, it just updates one random row with the 1234567 number, and even though it's still running the loop, it never updates anything else. Am I missing something? Is there a better way to do this?
DECLARE @a INT
DECLARE @maxpkey INT
DECLARE @minpkey INT
DECLARE @randompkey INT
SET @a = 1
SET @maxpkey = (select count(*) from [LoadTestTwo].[dbo].[actbenchdb.Table1])
SET @minpkey = 1
SET @randompkey = ROUND(((@maxpkey - @minpkey -1) * RAND() + @minpkey),0)
WHILE @a < 500000000000000000000
BEGIN
UPDATE [LoadTestTwo].[dbo].[actbenchdb.Table1]
SET C1Int = (1234567)
WHERE Pkey = @randompkey
SET @a = @a + 1
END
Your current loop only updates a single row because you set the random key outside the loop and then just run the same update statement 500,000,000,000,000,000,000 times (Which assuming 1 million executions per second would still take 15 million years to complete, and would likely hit all your records anyway).
It is clear what you are trying to do; I just don't know why you would want to randomly change data in your database. I may not understand why, but I can at least say how: if you want to update n random rows, then rather than running a loop n times, it would be better to perform a single update:
DECLARE @n INT = 100; -- NUMBER OF RANDOM ROWS TO UPDATE
UPDATE t
SET C1Int = 1234567
FROM ( SELECT TOP (@n) *
FROM [LoadTestTwo].[dbo].[actbenchdb.Table1]
ORDER BY NEWID()
) AS t;
There are simpler and better ways to create a DB Load Generator!
Let me Google that for you
I have a large table with 100,000,000 rows. I'd like to select every n'th row from the table. My first instinct is to use something like this:
SELECT id,name FROM table WHERE id%125000=0
to retrieve an even spread of 800 rows (id is a clustered index).
This technique works fine on smaller data sets, but with my larger table the query takes 2.5 minutes. I assume this is because the modulus operation is applied to every row. Is there a more efficient method of row skipping?
Your query assumes that the IDs are contiguous (and probably they aren't without you realizing this...). Anyway, you should generate the IDs yourself:
select *
from T
where ID in (0, 250000*1, 250000*2, ...)
Maybe you need a TVP to send all IDs because there are so many. Or, you produce the IDs on the server in T-SQL or a SQLCLR function or a numbers table.
This technique allows you to perform index seeks and will be the fastest you can possibly produce. It reads the minimal amount of data possible.
Modulo is not SARGable. SQL Server could support this if Microsoft wanted it, but this is an exotic use case. They will never make modulo SARGable and they shouldn't.
The time is not going into the modulus operation itself, but rather into just reading 124,999 unnecessary rows for every row that you actually want (i.e., the Table Scan or Clustered Index Scan).
Just about the only way to speed up a query like this is something that seems at first illogical: Add an extra non-Clustered index on just that column ([ID]). Additionally, you may have to add an Index Hint to force it to use that index. And finally, it may not actually make it faster, though for a modulus of 125,000+, it should be (though it'll never be truly fast).
If your IDs are not necessarily contiguous (any deleted rows will pretty much cause this) and you really do need exactly every nth row, by ID order, then you can still use the approach above, but you will have to resequence the IDs for the modulo operation using ROW_NUMBER() OVER(ORDER BY ID) in the query.
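A rough sketch of that resequencing, assuming a table called dbo.BigTable with id and name columns (placeholder names, not from the question). It still has to read every row to number them, so it fixes correctness for gaps rather than speed:

```sql
-- Sketch only: dbo.BigTable, id and name are placeholder names.
WITH numbered AS (
    SELECT id,
           name,
           ROW_NUMBER() OVER (ORDER BY id) AS rn  -- gap-free sequence 1..N in id order
    FROM dbo.BigTable
)
SELECT id, name
FROM numbered
WHERE rn % 125000 = 0;  -- every 125,000th row, regardless of gaps in id
```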
If id is in an index, then I am thinking of something along these lines:
with ids as (
select 1 as id
union all
select id + 125000
from ids
where id <= 100000000
)
select ids.id,
(select name from table t where t.id = ids.id) as name
from ids
option (MAXRECURSION 1000);
I think this formulation will use the index on table.
EDIT:
As I think about this approach, you can actually use it to get actual random ids in the table, rather than just evenly spaced ones:
with ids as (
select 1 as cnt,
ABS(CONVERT(BIGINT,CONVERT(BINARY(8), NEWID()))) % 100000000 as id
union all
select cnt + 1, ABS(CONVERT(BIGINT,CONVERT(BINARY(8), NEWID()))) % 100000000
from ids
where cnt < 800
)
select ids.id,
(select name from table t where t.id = ids.id) as name
from ids
option (MAXRECURSION 1000);
The code for the actual random number generator came from here.
EDIT:
Due to quirks in SQL Server, you can still get non-contiguous ids, even in your scenario. This accepted answer explains the cause. In short, identity values are not allocated one at a time, but rather in groups. The server can fail and even unused values get skipped.
One reason I wanted to do the random sampling was to help avoid this problem. Presumably, the above situation is rather rare on most systems. You can use the random sampling to generate say 900 ids. From these, you should be able to find 800 that are actually available for your sample.
-- VARCHAR(MAX): 800 ids plus commas do not fit in VARCHAR(1000)
DECLARE @i int, @max int, @query VARCHAR(MAX)
SET @i = 0
SET @max = (SELECT max(id)/125000 FROM Table1)
SET @query = 'SELECT id, name FROM Table1 WHERE id in ('
WHILE @i <= @max
BEGIN
IF @i > 0 SET @query = @query + ','
SET @query = @query + CAST(@i*125000 as varchar(12))
SET @i = @i + 1
END
SET @query = @query + ')'
EXEC(@query)
EDIT :
To avoid any "holes" in a non-Contiguous ID situation, you can try something like this :
-- VARCHAR(MAX): 800 ids plus commas do not fit in VARCHAR(1000)
DECLARE @i int, @start int, @id int, @max int, @query VARCHAR(MAX)
SET @i = 0
SET @max = (SELECT max(id)/125000 FROM Table1)
SET @query = 'SELECT id, name FROM Table1 WHERE id in ('
WHILE @i <= @max
BEGIN
SET @start = @i*125000
-- first existing id at or after the target position
SET @id = (SELECT TOP 1 id FROM Table1 WHERE id >= @start ORDER BY id ASC)
IF @i > 0 SET @query = @query + ','
SET @query = @query + CAST(@id as VARCHAR(12))
SET @i = @i + 1
END
SET @query = @query + ')'
EXEC(@query)
These exams typically have about 120 questions. Currently, the strings are compared to the keys and a value of 1 or 0 is assigned. When complete, the 1's are totaled for a raw score.
Are there any T-SQL functions like intersect or diff or something all together different that would handle this process as quickly as possible for 100,000 examinees?
Thanks in advance for your expertise.
-Steven
Try selecting the equality of a question to its correct answer. I assume you have the student's tests in one table and the key in another; something like this ought to work:
select student_test.student_id,
student_test.test_id,
student_test.question_id,
CASE WHEN student_test.answer = test_key.answer OR (student_test.answer IS NULL AND test_key.answer IS NULL) THEN 1 ELSE 0 END AS is_correct
from student_test
INNER JOIN test_key
ON student_test.test_id = test_key.test_id
AND student_test.question_id = test_key.question_id
WHERE student_test.test_id = <the test to grade>
You can group the results by student and test, then sum the last column if you want the DB to give you the total score. This will give a detailed "right/wrong" analysis of the test.
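As a sketch of that grouping step, assuming the same student_test and test_key tables (themselves an assumption about the schema) and a hypothetical @test_id variable for the test being graded:

```sql
-- @test_id is a hypothetical value; 1234 is only an example id.
DECLARE @test_id int = 1234;

SELECT st.student_id,
       st.test_id,
       -- 1 per correct answer, 0 otherwise; the sum is the raw score
       SUM(CASE WHEN st.answer = tk.answer
                  OR (st.answer IS NULL AND tk.answer IS NULL)
                THEN 1 ELSE 0 END) AS raw_score
FROM student_test AS st
INNER JOIN test_key AS tk
        ON st.test_id = tk.test_id
       AND st.question_id = tk.question_id
WHERE st.test_id = @test_id
GROUP BY st.student_id, st.test_id;
```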
EDIT: The answers being stored as a continuous string make it much harder. You will most likely have to implement this in a procedural fashion with a cursor, meaning each student's answers are loaded, SUBSTRINGed into varchar(1)s, and compared to the key in an RBAR (row by agonizing row) fashion. You could also implement a scalar-valued function that compared string A to string B one character at a time and returned the number of differences, then call that function from a driving query that will call this function for each student.
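The scalar-function idea could look roughly like the sketch below. The function name dbo.CountMatches and the tables test_answers (student_id, answers) and test_keys (test_id, test_key) are assumptions for illustration, and the function counts matches (the raw score) rather than differences:

```sql
-- Hypothetical helper: counts positions where the answer string matches the key.
CREATE FUNCTION dbo.CountMatches (@answers varchar(200), @answer_key varchar(200))
RETURNS int
AS
BEGIN
    DECLARE @i int = 1, @score int = 0;

    WHILE @i <= LEN(@answer_key)
    BEGIN
        -- Compare one character at a time; a missing answer compares as no match.
        IF SUBSTRING(@answers, @i, 1) = SUBSTRING(@answer_key, @i, 1)
            SET @score = @score + 1;
        SET @i = @i + 1;
    END;

    RETURN @score;
END;
GO

-- Driving query: one function call per student (1234 is an example test id).
SELECT a.student_id,
       dbo.CountMatches(a.answers, k.test_key) AS raw_score
FROM test_answers AS a
CROSS JOIN test_keys AS k
WHERE k.test_id = 1234;
```

Scalar functions are still evaluated row by row, so this mainly tidies the logic; whether it beats a cursor for 100,000 examinees would need measuring.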
Something like this might work out for you:
select student_id, studentname, answers, 0 as score
into #scores from test_answers
declare @studentid int
declare @i int
declare @answers varchar(120)
declare @testkey varchar(120)
select @testkey = test_key from test_keys where test_id = 1234
declare student_cursor cursor for
select student_id from #scores
open student_cursor
fetch next from student_cursor into @studentid
while @@FETCH_STATUS = 0
begin
select @i = 1
select @answers = answers from #scores where student_id = @studentid
-- <= so that the last answer is compared as well
while @i <= len(@answers)
begin
-- T-SQL has no MID(); SUBSTRING is the equivalent
if substring(@answers, @i, 1) = substring(@testkey, @i, 1)
update #scores set score = score + 1 where student_id = @studentid
select @i = @i + 1
end
fetch next from student_cursor into @studentid
end
close student_cursor
deallocate student_cursor
select * from #scores
drop table #scores
I doubt that's the single most efficient way to do it, but it's not a bad starting point at least.
I am trying to keep a rolling checksum to account for order, so take the previous 'checksum' and xor it with the current one and generate a new checksum.
Name Checksum Rolling Checksum
------ ----------- -----------------
foo 11829231 11829231
bar 27380135 checksum(27380135 ^ 11829231) = 93291803
baz 96326587 checksum(96326587 ^ 93291803) = 67361090
How would I accomplish something like this?
(Note that the calculations are completely made up and are for illustration only)
This is basically the running total problem.
Edit:
My original claim was that this is one of the few places where a cursor-based solution actually performs best. The problem with the triangular self-join solution is that it repeatedly recalculates the same cumulative checksum as a subcalculation for the next step, so it is not very scalable: the work required grows quadratically with the number of rows.
Corina's answer uses the "quirky update" approach. I've adjusted it to do the check sum and in my test found that it took 3 seconds rather than 26 seconds for the cursor solution. Both produced the same results. Unfortunately however it relies on an undocumented aspect of Update behaviour. I would definitely read the discussion here before deciding whether to rely on this in production code.
There is a third possibility described here (using the CLR) which I didn't have time to test. But from the discussion here it seems to be a good possibility for calculating running-total type things at display time, but outperformed by the cursor when the result of the calculation must be saved back.
CREATE TABLE TestTable
(
PK int identity(1,1) primary key clustered,
[Name] varchar(50),
[CheckSum] AS CHECKSUM([Name]),
RollingCheckSum1 int NULL,
RollingCheckSum2 int NULL
)
/*Insert some random records (753,571 on my machine)*/
INSERT INTO TestTable ([Name])
SELECT newid() FROM sys.objects s1, sys.objects s2, sys.objects s3
Approach One: Based on the Jeff Moden Article
DECLARE @RCS int
UPDATE TestTable
SET @RCS = RollingCheckSum1 =
CASE WHEN @RCS IS NULL THEN
[CheckSum]
ELSE
CHECKSUM([CheckSum] ^ @RCS)
END
FROM TestTable WITH (TABLOCKX)
OPTION (MAXDOP 1)
Approach Two - Using the same cursor options as Hugo Kornelis advocates in the discussion for that article.
SET NOCOUNT ON
BEGIN TRAN
DECLARE @RCS2 INT
DECLARE @PK INT, @CheckSum INT
DECLARE curRollingCheckSum CURSOR LOCAL STATIC READ_ONLY
FOR
SELECT PK, [CheckSum]
FROM TestTable
ORDER BY PK
OPEN curRollingCheckSum
FETCH NEXT FROM curRollingCheckSum
INTO @PK, @CheckSum
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RCS2 = CASE WHEN @RCS2 IS NULL THEN @CheckSum ELSE CHECKSUM(@CheckSum ^ @RCS2) END
UPDATE dbo.TestTable
SET RollingCheckSum2 = @RCS2
WHERE @PK = PK
FETCH NEXT FROM curRollingCheckSum
INTO @PK, @CheckSum
END
COMMIT
Test they are the same
SELECT * FROM TestTable
WHERE RollingCheckSum1<> RollingCheckSum2
I'm not sure about a rolling checksum, but for a rolling sum for instance, you can do this using the UPDATE command:
declare @a table (name varchar(2), value int, rollingvalue int)
insert into @a
select 'a', 1, 0 union all select 'b', 2, 0 union all select 'c', 3, 0
select * from @a
declare @sum int
set @sum = 0
update @a
set @sum = rollingvalue = value + @sum
select * from @a
Select Name, Checksum
, (Select Checksum_Agg(T1.Checksum)
From Table As T1
Where T1.Name < T.Name) As RollingChecksum
From Table As T
Order By T.Name
To do a rolling anything, you need some semblance of an order to the rows. That can be by name, an integer key, a date or whatever. In my example, I used name (even though the order in your sample data isn't alphabetical). In addition, I'm using the Checksum_Agg function in SQL.
In addition, you would ideally have a unique value on which you compare the inner and outer query. E.g., Where T1.PK < T.PK for an integer key or even string key would work well. In my solution if Name had a unique constraint, it would also work well enough.