SQL Binomial Distribution / Arithmetic Overflow

I created a function for a cumulative binomial distribution. It works well for extremely modest sample sizes, but I get an arithmetic overflow on larger samples.
The largest culprit is n!. In Excel, 170! = 7.3E+306 and 171! = #NUM!.
Excel has an internal function that calculates the binomial distribution, and it works with values of n much, much larger than 170.
Is there something I can do to limit the magnitude of the numbers generated?
EDIT: I played with this:
SET @probout = 2*3*4*5*6*7*8*9*10*11*12
Worked fine.
SET @probout = 2*3*4*5*6*7*8*9*10*11*12*13/10000
Resulted in overflow.
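The second expression presumably overflows because the literals are multiplied as plain int values before the result ever reaches the float variable: 2*3*...*13 is 6,227,020,800, which is past the int maximum of 2,147,483,647. A one-line sketch that forces float arithmetic from the first factor avoids it:
SET @probout = CAST(2 AS float)*3*4*5*6*7*8*9*10*11*12*13/10000   -- evaluated as float, no int overflow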
Function below.
ALTER FUNCTION [dbo].[binomdist_cumulative]
(
    @n int
    ,@k int
    ,@p float
)
RETURNS float
AS
BEGIN
    -- Local Variable Declarations
    -- ---------------------------
    DECLARE @kfac float
        ,@nfac float
        ,@nkfac float
        ,@i float
        ,@f int
        ,@probout float

    SET @i = 0
    SET @f = 0
    SET @nfac = 0
    SET @kfac = 0
    SET @nkfac = 0
    SET @probout = 0

    WHILE @i <= @k
    BEGIN
        --k!
        SET @f = @i-1
        SET @kfac = @i
        IF @kfac > 0
        BEGIN
            WHILE @f > 0
            BEGIN
                SET @kfac = @kfac*@f
                SET @f = @f -1
            END
        END
        ELSE
        BEGIN
            SET @kfac = 1
        END

        --n!
        SET @f = @n-1
        SET @nfac = @n
        IF @nfac > 0
        BEGIN
            WHILE @f > 0
            BEGIN
                SET @nfac = @nfac * @f
                SET @f = @f -1
            END
        END
        ELSE
        BEGIN
            SET @nfac = 1
        END

        --(n-k)!
        SET @f = @n-@i-1
        SET @nkfac = @n-@i
        IF @nkfac > 0
        BEGIN
            WHILE @f > 0
            BEGIN
                SET @nkfac = @nkfac * @f
                SET @f = @f -1
            END
        END
        ELSE
        BEGIN
            SET @nkfac = 1
        END

        --Accumulate distribution
        SET @probout = @probout + @nfac/(@kfac*@nkfac)*POWER(@p,@i)*POWER(1-@p,@n-@i)
        SET @i = @i+1
    END

    RETURN @probout
END

Let me give you a hint.
If you calculate the full factorials, you are quickly going to get overflows. If you do an incremental calculation, then you won't.
For instance, instead of calculating (5 choose 3) as (5*4*3*2*1) / ((3*2*1) * (2*1)), calculate it as: (5 / 1) * (4 / 2) * (3 / 3) * (2 / 2) * (1 / 1) . . . oh, wait, you can see that the last three terms are all "1".
To be clear, you want to calculate the product of:
(n - i) / (k - i)
for i between 0 and k - 1. That is, you are dividing the product of the k consecutive numbers ending at n by the product of the k consecutive numbers starting at 1.
You'll see that this incremental approach forestalls the overflow issues.
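A rough sketch of that incremental idea as a T-SQL function (the name binomdist_cumulative_inc is just illustrative, mirroring the original signature; the coefficient C(n, i) is carried forward from one term to the next instead of building three factorials):
CREATE FUNCTION dbo.binomdist_cumulative_inc
(
    @n int
    ,@k int
    ,@p float
)
RETURNS float
AS
BEGIN
    DECLARE @i float = 0
        ,@coef float = 1        -- C(@n, 0)
        ,@probout float = 0

    WHILE @i <= @k
    BEGIN
        -- accumulate C(n,i) * p^i * (1-p)^(n-i)
        SET @probout = @probout + @coef * POWER(@p, @i) * POWER(1 - @p, @n - @i)

        -- incremental update: C(n, i+1) = C(n, i) * (n - i) / (i + 1)
        SET @coef = @coef * (@n - @i) / (@i + 1)
        SET @i = @i + 1
    END

    RETURN @probout
END
The coefficient still grows, but only to C(n, k) rather than n!, so it stays inside float range for much larger n.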

Related

SQL Query To Find Preceding Records

This is a hard question for me to verbalize. I'll do my best. This is my record set:
DealershipID PlacementDealershipID
40309 -1
787289 40309
787461 787289
787402 787461
787520 787402
You will notice that the top of the chain is ID 40309 that has a PlacementDealershipID of -1. If we start with ID 787520, the PlacementDealershipID goes "up" the chain, until we get to DealershipID 40309.
Desired results would look like this starting at DealershipID 787520.
Starting DealershipID ---->>> 787520
1 Level(s) Up ---->>> 787402
2 Level(s) Up ---->>> 787461
3 Level(s) Up ---->>> 787289
4 Level(s) Up ---->>> 40309
I want to print a list of DealershipIDs in SQL, in order, from the starting point up to 40309. I have done this using a while loop and declared variables (code below). Is there an easier way to do this with SQL's built-in functionality?
-- Loop and pull structure
set @PreceedingDealershipID = 999
set @LevelNumber = 1
set @ContinueLoop = 1
set @LoopCount = 0

-- Determine maximum level for this system
set @MaxLevel = 30
set @MaxLevel = coalesce(@MaxLevel,30)

PRINT N'Starting ID ---->>> ' + convert(varchar(40), @DealershipID);

while (@ContinueLoop = 1 and @PreceedingDealershipID <> -1 and @PreceedingDealershipID <> 0)
begin
    set @LoopCount = @LoopCount + 1
    if (@LoopCount > @MaxLevel)
    begin
        raiserror('THERE IS A LOOP IN THE DATABASE! CANNOT CONTINUE!',15,15)
        return
    end

    SELECT @PreceedingDealershipID = d.PlacementDealershipID
    FROM Sponsorship as d WITH (NOLOCK)
    WHERE d.DealershipID = CASE WHEN @LevelNumber = 1 THEN @DealershipId ELSE @PreceedingDealershipID END

    if (@PreceedingDealershipID <> 0 and @PreceedingDealershipID <> -1)
    Begin
        PRINT N'Level ' + convert(varchar(40), @LevelNumber) + ' Next Level Up ---->>> ' + convert(varchar(40), @PreceedingDealershipID);
        set @LevelNumber = @LevelNumber + 1
    End
end -- end main loop
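For what it's worth, a recursive CTE is the built-in way to walk a chain like this; a rough sketch against the same Sponsorship table (the starting ID and the 30-level cap are taken from the code above, the output shape is illustrative):
DECLARE @DealershipID int = 787520;

WITH Chain AS
(
    -- anchor: the starting dealership
    SELECT d.DealershipID, d.PlacementDealershipID, 0 AS LevelsUp
    FROM Sponsorship AS d
    WHERE d.DealershipID = @DealershipID

    UNION ALL

    -- recursive step: follow PlacementDealershipID up the chain
    SELECT p.DealershipID, p.PlacementDealershipID, c.LevelsUp + 1
    FROM Sponsorship AS p
    INNER JOIN Chain AS c ON p.DealershipID = c.PlacementDealershipID
    WHERE c.PlacementDealershipID NOT IN (-1, 0)
)
SELECT LevelsUp, DealershipID
FROM Chain
ORDER BY LevelsUp
OPTION (MAXRECURSION 30);    -- plays the same loop-guard role as @MaxLevel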

SP execution time is extremely slow

I created a stored procedure that calculates a financial spreading based on a linear self-adjusting rule, and it takes more than 2 minutes to finish the calculations.
The value goes through multiple iterations that adjust and refine it until it reaches the optimal final value.
The parameters are the following:
@input1 = 100000
@input2 = 40
@input3 = 106833
BEGIN
    DECLARE @X decimal(22,6) = 0
    DECLARE @Y decimal(22,6) = 0.001
    DECLARE @Z decimal(22,6)
    DECLARE @r decimal(22,6)
    DECLARE @v decimal(22,6)

    SET @v = POWER(1/(1+ (@Y/12)), @input2)
    SET @r = ((@Y/@input2) * @input1) / (1-@v)

    IF (@r < @input3)
        SET @Z = @Y + ABS((@X - @Y)/2)
    ELSE
        SET @Z = @Y - ABS((@X - @Y) /2)

    SET @X = @Y
    SET @Y = @Z

    WHILE (ABS(@r - @input3) > 0.001)
    BEGIN
        SET @v = POWER(1/(1+ (@Y/12)), @input2)
        SET @r = ((@Y/@input2) * @input1) / (1-@v)

        IF (@r < @input3)
            SET @Z = @Y + ABS((@X - @Y)/2)
        ELSE
            SET @Z = @Y - ABS((@X - @Y) /2)

        SET @X = @Y
        IF @Y = @Z
            BREAK
        SET @Y = @Z
    END

    RETURN (CAST(@Y AS decimal(22,6)) * 100)
END
run time = 2 mins and 20 seconds
An alternative to your stored procedure written in TSQL might be a SQL CLR function written in C#. You have to use Visual Studio and create a Database Project.
public static decimal ConvertTo6(double d)
{
    return Math.Round(Convert.ToDecimal(d), 6, MidpointRounding.AwayFromZero);
}

public static decimal ConvertTo6(decimal d)
{
    return Math.Round(d, 6, MidpointRounding.AwayFromZero);
}

[Microsoft.SqlServer.Server.SqlFunction]
[return: SqlFacet(Precision = 22, Scale = 6)]
public static SqlDecimal CalcFinancialSpreading(int input1 = 100000, int input2 = 40, int input3 = 106833)
{
    decimal x = 0.000000m;
    decimal y = 0.001000m;
    decimal z;
    decimal r;
    decimal v;

    v = ConvertTo6(Math.Pow(1 / (1 + (Convert.ToDouble(y) / 12d)), input2));
    r = ConvertTo6(((y / input2) * input1) / (1 - v));

    if (r < input3)
    {
        z = y + Math.Abs((x - y) / 2);
        z = ConvertTo6(z);
    }
    else
    {
        z = y - Math.Abs((x - y) / 2);
        z = ConvertTo6(z);
    }

    x = y;
    y = z;

    while (Math.Abs(r - input3) > 0.001m)
    {
        v = ConvertTo6((Math.Pow(Convert.ToDouble(1 / (1 + (y / 12))), Convert.ToDouble(input2))));
        r = ((y / input2) * input1) / (1 - v);
        r = ConvertTo6(r);

        if (r < input3)
        {
            z = y + Math.Abs((x - y) / 2);
            z = ConvertTo6(z);
        }
        else
        {
            z = y - Math.Abs((x - y) / 2);
            z = ConvertTo6(z);
        }

        x = y;
        if (y == z) break;
        y = z;
    }

    decimal result = y * 100;
    return new SqlDecimal(result);
}
Executed as C# code, the result is received in 45 seconds on my machine vs. T-SQL in 1 min 56 seconds.
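For reference, the assembly has to be registered before the function can be called from T-SQL; a rough sketch, where the assembly name FinancialSpreading, the class name UserDefinedFunctions (the default in a Database Project), and the DLL path are all placeholders for whatever your project produces:
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

CREATE ASSEMBLY FinancialSpreading
FROM 'C:\CLR\FinancialSpreading.dll';   -- placeholder path

CREATE FUNCTION dbo.CalcFinancialSpreading (@input1 int, @input2 int, @input3 int)
RETURNS decimal(22,6)
AS EXTERNAL NAME FinancialSpreading.UserDefinedFunctions.CalcFinancialSpreading;

SELECT dbo.CalcFinancialSpreading(100000, 40, 106833);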
Kudos to @wikiCan for answering this one...

Incorrect Syntax; Access VBA function to SQL Server

I am converting a large MS Access application to run almost entirely on SQL Server to increase performance, as the Access DB is ancient and extremely slow. As such, I am recreating the VBA functions as SQL functions. SQL Server is giving me syntax errors all over the place and I am trying to understand why. Here is the VBA:
Function AlphaNum(prodno As String)
    Dim i As Long
    If Len(prodno) = 0 Then Exit Function
    For i = 1 To Len(prodno)
        If Mid(prodno, i, 1) Like "[0-9a-z]" Then AlphaNum = AlphaNum & Mid(prodno, i, 1)
    Next i
    If IsNumeric(AlphaNum) Then
        While Left(AlphaNum, 1) = "0"
            AlphaNum = Mid(AlphaNum, 2)
        Wend
    End If
End Function
I've gotten this far with SQL. I am certainly not an expert at this either; any ideas?
CREATE FUNCTION dbo.[NAL_AlphaNum]
(
    @prodno varchar(10)
)
RETURNS VarChar(10)
BEGIN
    DECLARE @counter INT,
            @AlphaNum varchar(10)
    SET @counter='1'
    CASE WHEN Len(@prodno) = 0 THEN EXIT
    WHILE @counter < Len(@prodno)
    BEGIN
        CASE WHEN Mid(@prodno, i, 1) LIKE '[0-9a-z]'
            THEN @AlphaNum = @AlphaNum + Mid(@prodno, i, 1)
        SET @counter = @counter + 1
    END
    CASE WHEN IsNumeric(@AlphaNum) THEN
        WHILE Left(@AlphaNum, 1) = '0'
            @AlphaNum = Mid(@AlphaNum, 2)
    END
    RETURN @AlphaNum
END;
Thank you in advance.
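For comparison, here is one way the same logic might look as a working T-SQL scalar function. It keeps the dbo.NAL_AlphaNum name from the attempt above; the rest (a WHILE loop plus IF instead of CASE statements, SUBSTRING in place of Mid) is a sketch and is not tested against the Access behaviour:
CREATE FUNCTION dbo.NAL_AlphaNum (@prodno varchar(10))
RETURNS varchar(10)
AS
BEGIN
    DECLARE @i int = 1, @AlphaNum varchar(10) = ''

    -- keep only the characters 0-9 and a-z
    WHILE @i <= LEN(@prodno)
    BEGIN
        IF SUBSTRING(@prodno, @i, 1) LIKE '[0-9a-z]'
            SET @AlphaNum = @AlphaNum + SUBSTRING(@prodno, @i, 1)
        SET @i = @i + 1
    END

    -- strip leading zeros when the remainder is numeric
    IF ISNUMERIC(@AlphaNum) = 1
        WHILE LEFT(@AlphaNum, 1) = '0'
            SET @AlphaNum = SUBSTRING(@AlphaNum, 2, LEN(@AlphaNum))

    RETURN @AlphaNum
END;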

Nesting while loop inside a while loop in cshell

I need my code to read files that are numbered between 1 and 4000. It will then do something with the files. I am trying to break them up into blocks of 500 with the following:
#!/bin/tcsh
@ x = 1
@ looper = 1
while ($x < 3)
    while ($looper < 500)
        @ filenumber = $x - 1
        @ filenumber = $filenumber * 500
        @ filenumber = $filenumber + $looper
        echo $filenumber
        @ looper += 1
    done
    @ x += 1
done
I want this to count from 1 to 1000 in units of 500. However, when I try this, the script only counts to 500. Does anyone know why this is?
Thanks for your help
You need to initialize @ looper = 1 inside the outer loop; otherwise it gets initialized only once, and the second iteration starts with the value 500.
@ x = 1
while ($x < 3)
    @ looper = 1            # <-- here
    while ($looper < 500)
        @ filenumber = $x - 1
        @ filenumber = $filenumber * 500
        @ filenumber = $filenumber + $looper
        echo $filenumber
        @ looper += 1
    done
    @ x += 1
done
The answer is that right beneath the @ x += 1 line there needs to be a line resetting the $looper variable:
    @ x += 1
    @ looper = 1
done
WHOOPS!!!

Which function is faster, Abs or IIf?

I ran the following test to determine the difference in efficiency between IIf and Abs:
Public Sub TestSpeed()
    Dim i As Long
    Dim res As Integer
    Debug.Print "*** IIF ***"
    Debug.Print Format(Now, "HH:mm:ss")
    For i = 1 To 999999999
        res = IIf(-1 = True, 1, 0)
    Next
    Debug.Print Format(Now, "HH:mm:ss")
    Debug.Print "*** ABS **"
    Debug.Print Format(Now, "HH:mm:ss")
    For i = 1 To 999999999
        res = Abs(-1)
    Next
    Debug.Print Format(Now, "HH:mm:ss")
End Sub
The results show that Abs is about 12 times faster:
TestSpeed
*** IIF ***
15:59:08
16:01:26
*** ABS **
16:01:26
16:01:37
Can anyone support this or prove that the contrary is true?
EDIT:
One situation in which one may need to decide between the two functions is for doing multiple counts based on criteria in an SQL query such as:
SELECT Sum(Abs(Colour = 'Yellow')) AS CountOfYellowItems, Sum(Abs(Votes > 3)) AS CountOfMoreThanThreeVotes FROM tblItems
versus
SELECT Sum(IIf(Colour = 'Yellow', 1, 0)) AS CountOfYellowItems, Sum(IIf(Votes > 3, 1, 0)) AS CountOfMoreThanThreeVotes FROM tblItems
I ran a similar test that supports your findings:
Option Compare Database
Option Explicit
Public Sub SpeedTest()
    Const LoopLimit = 99999999
    Dim Tasked(LoopLimit) As Boolean, i As Long
    Dim Total As Long, t0 As Single, Elapsed As Single
    For i = 0 To LoopLimit
        Tasked(i) = False
    Next
    Debug.Print "*** IIF ***"
    Total = 0
    t0 = Timer
    For i = 0 To LoopLimit
        Total = Total + IIf(Tasked(i) = True, 1, 0)
    Next
    Elapsed = Timer - t0
    Debug.Print "Elapsed time: " & Format(Elapsed, "0.0") & " seconds."
    Debug.Print "Average time: " & Format(Elapsed / (LoopLimit + 1) * 1000000000, "0") & " nanoseconds."
    Debug.Print "*** ABS ***"
    Total = 0
    t0 = Timer
    For i = 0 To LoopLimit
        Total = Total + Abs(Tasked(i))
    Next
    Elapsed = Timer - t0
    Debug.Print "Elapsed time: " & Format(Elapsed, "0.0") & " seconds."
    Debug.Print "Average time: " & Format(Elapsed / (LoopLimit + 1) * 1000000000, "0") & " nanoseconds."
End Sub
resulting in
*** IIF ***
Elapsed time: 19.0 seconds.
Average time: 190 nanoseconds.
*** ABS ***
Elapsed time: 2.4 seconds.
Average time: 24 nanoseconds.
In terms of raw execution speed, Abs(BooleanValue) appears to be an order of magnitude faster than IIf(BooleanValue = True, 1, 0).
Whether or not that difference has a significant impact on the overall performance of the code in which the functions may be used depends very much on the context, as illustrated here.
I have had a look at this in SQL for comparison.
DECLARE @i As BIGINT = 1
DECLARE @res As INT
declare @v BIGINT = -1

Print '*** ABS **'
DECLARE @s2 datetime = GETDATE()
Print Format(@s2, 'HH:mm:ss')
SET @i = 1
WHILE @i < 9999999
begin
    SET @res = Abs(@v)
    SET @i = @i + 1
end
DECLARE @e2 datetime = GETDATE()
Print Format(@e2, 'HH:mm:ss')
Print DATEDIFF(MILLISECOND, @s2, @e2)

DECLARE @i As BIGINT = 1
DECLARE @res As INT
declare @v INT = -1

Print '*** IIF **'
DECLARE @s1 datetime = GETDATE()
Print Format(@s1, 'HH:mm:ss')
SET @i = 1
WHILE @i < 9999999
begin
    SET @res = IIf(@v < 0 , @v*-1, @v)
    SET @i = @i + 1
END
DECLARE @e1 datetime = GETDATE()
Print Format(@e1, 'HH:mm:ss')
Print DATEDIFF(MILLISECOND, @s1, @e1)
You can't run both statements together to get a fair comparison. I think there is almost no difference. One case where IIF is better is where @v BIGINT = -2147483648 (or greater), as ABS would just fail with an overflow.
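For instance, with the declarations used in the ABS batch above, the assignment fails as soon as @v holds that value:
DECLARE @res INT, @v BIGINT = -2147483648
SET @res = Abs(@v)   -- arithmetic overflow: +2147483648 does not fit in the int variable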