Is there a Linear Regression Function in SQL Server? - sql-server-2005

Are there any Linear Regression functions in SQL Server 2005/2008, similar to the Linear Regression functions in Oracle?

To the best of my knowledge, there is none. Writing one is pretty straightforward, though. The following gives you the constant alpha and slope beta for y = Alpha + Beta * x + epsilon:
-- test data (GroupIDs 1, 2 normal regressions, 3, 4 = no variance)
WITH some_table(GroupID, x, y) AS
( SELECT 1, 1, 1 UNION SELECT 1, 2, 2 UNION SELECT 1, 3, 1.3
UNION SELECT 1, 4, 3.75 UNION SELECT 1, 5, 2.25 UNION SELECT 2, 95, 85
UNION SELECT 2, 85, 95 UNION SELECT 2, 80, 70 UNION SELECT 2, 70, 65
UNION SELECT 2, 60, 70 UNION SELECT 3, 1, 2 UNION SELECT 3, 1, 3
UNION SELECT 4, 1, 2 UNION SELECT 4, 2, 2),
-- linear regression query
/*WITH*/ mean_estimates AS
( SELECT GroupID
,AVG(x * 1.) AS xmean
,AVG(y * 1.) AS ymean
FROM some_table
GROUP BY GroupID
),
stdev_estimates AS
( SELECT pd.GroupID
-- T-SQL STDEV() implementation is not numerically stable
,CASE SUM(SQUARE(x - xmean)) WHEN 0 THEN 1
ELSE SQRT(SUM(SQUARE(x - xmean)) / (COUNT(*) - 1)) END AS xstdev
, SQRT(SUM(SQUARE(y - ymean)) / (COUNT(*) - 1)) AS ystdev
FROM some_table pd
INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
GROUP BY pd.GroupID, pm.xmean, pm.ymean
),
standardized_data AS -- increases numerical stability
( SELECT pd.GroupID
,(x - xmean) / xstdev AS xstd
,CASE ystdev WHEN 0 THEN 0 ELSE (y - ymean) / ystdev END AS ystd
FROM some_table pd
INNER JOIN stdev_estimates ps ON ps.GroupID = pd.GroupID
INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
),
standardized_beta_estimates AS
( SELECT GroupID
,CASE WHEN SUM(xstd * xstd) = 0 THEN 0
ELSE SUM(xstd * ystd) / (COUNT(*) - 1) END AS betastd
FROM standardized_data pd
GROUP BY GroupID
)
SELECT pb.GroupID
,ymean - xmean * betastd * ystdev / xstdev AS Alpha
,betastd * ystdev / xstdev AS Beta
FROM standardized_beta_estimates pb
INNER JOIN stdev_estimates ps ON ps.GroupID = pb.GroupID
INNER JOIN mean_estimates pm ON pm.GroupID = pb.GroupID
Here GroupID is used to show how to group by some value in your source data table. If you just want the statistics across all data in the table (not specific sub-groups), you can drop it and the joins. I have used the WITH statement for the sake of clarity; as an alternative, you can use sub-queries. Please be mindful of the precision of the data type used in your tables, as numerical stability can deteriorate quickly if the precision is not high enough relative to your data.
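As a rough cross-check of the query above, the same steps (means, sample standard deviations, standardized data, standardized beta) can be sketched outside the database. The Python below is only an illustration: the function name `standardized_regression` is made up here, and the zero-variance fallbacks are meant to mirror the CASE expressions in the query.

```python
import math

def standardized_regression(xs, ys):
    """Return (alpha, beta) for y = alpha + beta * x, following the CTE steps."""
    n = len(xs)
    xmean = sum(xs) / n
    ymean = sum(ys) / n
    sxx = sum((x - xmean) ** 2 for x in xs)
    syy = sum((y - ymean) ** 2 for y in ys)
    # Same fallback as the query: zero x-variance gives xstdev = 1.
    xstdev = 1.0 if sxx == 0 else math.sqrt(sxx / (n - 1))
    ystdev = math.sqrt(syy / (n - 1))
    xstd = [(x - xmean) / xstdev for x in xs]
    ystd = [0.0 if ystdev == 0 else (y - ymean) / ystdev for y in ys]
    if sum(v * v for v in xstd) == 0:
        betastd = 0.0
    else:
        betastd = sum(a * b for a, b in zip(xstd, ystd)) / (n - 1)
    beta = betastd * ystdev / xstdev
    alpha = ymean - xmean * beta
    return alpha, beta

# GroupID 1 from the test data in the query
alpha, beta = standardized_regression([1, 2, 3, 4, 5], [1, 2, 1.3, 3.75, 2.25])
```

For GroupID 3 (no variance in x) the sketch returns a slope of 0 and the mean of y as the intercept, which is how the query's fallbacks appear to behave as well.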
EDIT: (in answer to Peter's question for additional statistics like R2 in the comments)
You can easily calculate additional statistics using the same technique. Here is a version with R2, correlation, and sample covariance:
-- test data (GroupIDs 1, 2 normal regressions, 3, 4 = no variance)
WITH some_table(GroupID, x, y) AS
( SELECT 1, 1, 1 UNION SELECT 1, 2, 2 UNION SELECT 1, 3, 1.3
UNION SELECT 1, 4, 3.75 UNION SELECT 1, 5, 2.25 UNION SELECT 2, 95, 85
UNION SELECT 2, 85, 95 UNION SELECT 2, 80, 70 UNION SELECT 2, 70, 65
UNION SELECT 2, 60, 70 UNION SELECT 3, 1, 2 UNION SELECT 3, 1, 3
UNION SELECT 4, 1, 2 UNION SELECT 4, 2, 2),
-- linear regression query
/*WITH*/ mean_estimates AS
( SELECT GroupID
,AVG(x * 1.) AS xmean
,AVG(y * 1.) AS ymean
FROM some_table pd
GROUP BY GroupID
),
stdev_estimates AS
( SELECT pd.GroupID
-- T-SQL STDEV() implementation is not numerically stable
,CASE SUM(SQUARE(x - xmean)) WHEN 0 THEN 1
ELSE SQRT(SUM(SQUARE(x - xmean)) / (COUNT(*) - 1)) END AS xstdev
, SQRT(SUM(SQUARE(y - ymean)) / (COUNT(*) - 1)) AS ystdev
FROM some_table pd
INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
GROUP BY pd.GroupID, pm.xmean, pm.ymean
),
standardized_data AS -- increases numerical stability
( SELECT pd.GroupID
,(x - xmean) / xstdev AS xstd
,CASE ystdev WHEN 0 THEN 0 ELSE (y - ymean) / ystdev END AS ystd
FROM some_table pd
INNER JOIN stdev_estimates ps ON ps.GroupID = pd.GroupID
INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
),
standardized_beta_estimates AS
( SELECT GroupID
,CASE WHEN SUM(xstd * xstd) = 0 THEN 0
ELSE SUM(xstd * ystd) / (COUNT(*) - 1) END AS betastd
FROM standardized_data
GROUP BY GroupID
)
SELECT pb.GroupID
,ymean - xmean * betastd * ystdev / xstdev AS Alpha
,betastd * ystdev / xstdev AS Beta
,CASE ystdev WHEN 0 THEN 1 ELSE betastd * betastd END AS R2
,betastd AS Correl
,betastd * xstdev * ystdev AS Covar
FROM standardized_beta_estimates pb
INNER JOIN stdev_estimates ps ON ps.GroupID = pb.GroupID
INNER JOIN mean_estimates pm ON pm.GroupID = pb.GroupID
EDIT 2 improves numerical stability by standardizing data (instead of only centering) and by replacing STDEV because of numerical stability issues. To me, the current implementation seems to be the best trade-off between stability and complexity. I could improve stability by replacing my standard deviation with a numerically stable online algorithm, but this would complicate the implementation substantially (and slow it down). Similarly, implementations using e.g. Kahan(-Babuška-Neumaier) compensations for the SUM and AVG seem to perform modestly better in limited tests, but make the query much more complex. And as long as I do not know how T-SQL implements SUM and AVG (e.g. it might already be using pairwise summation), I cannot guarantee that such modifications always improve accuracy.
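To make the compensated-summation idea mentioned above concrete, here is a small Python sketch of the Neumaier variant of Kahan summation. It is not what T-SQL does internally (that is unknown, as noted above); the function name `neumaier_sum` is just an illustrative choice.

```python
def neumaier_sum(values):
    """Sum floats while tracking the low-order bits lost in each addition."""
    total = 0.0
    compensation = 0.0
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            # Low-order digits of v were lost when adding it to total.
            compensation += (total - t) + v
        else:
            # Low-order digits of total were lost instead.
            compensation += (v - t) + total
        total = t
    return total + compensation

data = [1.0, 1e100, 1.0, -1e100]
naive = sum(data)              # the two 1.0 terms are lost to rounding
compensated = neumaier_sum(data)
```

The extra bookkeeping per element is what would make an equivalent SQL query considerably more complex than a plain SUM.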

This is an alternate method, based off a blog post on Linear Regression in T-SQL, which uses the following equations:
The SQL suggestion in the blog uses cursors though. Here's a prettified version of a forum answer that I used:
table
-----
X (numeric)
Y (numeric)
/**
* m = (nSxy - SxSy) / (nSxx - SxSx)
* b = Ay - (Ax * m)
* N.B. S = Sum, A = Mean
*/
DECLARE @n INT
SELECT @n = COUNT(*) FROM table
SELECT (@n * SUM(X*Y) - SUM(X) * SUM(Y)) / (@n * SUM(X*X) - SUM(X) * SUM(X)) AS M,
AVG(Y) - AVG(X) *
(@n * SUM(X*Y) - SUM(X) * SUM(Y)) / (@n * SUM(X*X) - SUM(X) * SUM(X)) AS B
FROM table
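The two formulas in the comment above (m from the raw sums, b from the means) can be checked quickly outside SQL. This is a minimal Python sketch; `fit_line` is a hypothetical helper name, not something from the blog post or the forum answer.

```python
def fit_line(xs, ys):
    """Slope m and intercept b via m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = sy / n - (sx / n) * m  # b = Ay - Ax * m
    return m, b

# Points that lie exactly on y = 2x + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Note that the denominator is zero when all x values are equal, so a real query would need to guard against that case.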

I've actually written an SQL routine using Gram-Schmidt orthogonalization. It, as well as other machine learning and forecasting routines, is available at sqldatamine.blogspot.com
At the suggestion of Brad Larson I've added the code here rather than just directing users to my blog. This produces the same results as the linest function in Excel. My primary source is Elements of Statistical Learning (2008) by Hastie, Tibshirani and Friedman.
--Create a table of data
create table #rawdata (id int,area float, rooms float, odd float, price float)
insert into #rawdata select 1, 2201,3,1,400
insert into #rawdata select 2, 1600,3,0,330
insert into #rawdata select 3, 2400,3,1,369
insert into #rawdata select 4, 1416,2,1,232
insert into #rawdata select 5, 3000,4,0,540
--Insert the data into x & y vectors
select id xid, 0 xn,1 xv into #x from #rawdata
union all
select id, 1,rooms from #rawdata
union all
select id, 2,area from #rawdata
union all
select id, 3,odd from #rawdata
select id yid, 0 yn, price yv into #y from #rawdata
--create a residuals table and insert the intercept (1)
create table #z (zid int, zn int, zv float)
insert into #z select id , 0 zn,1 zv from #rawdata
--create tables for the orthogonal (#c) & regression (#b) parameters
create table #c(cxn int, czn int, cv float)
create table #b(bn int, bv float)
--@p indexes the independent variables; the intercept has index 0
declare @p int
set @p = 1
--Loop through each independent variable and estimate the orthogonal parameter (#c)
-- then estimate the residuals and insert into the residuals table (#z)
while @p <= (select max(xn) from #x)
begin
insert into #c
select xn cxn, zn czn, sum(xv*zv)/sum(zv*zv) cv
from #x join #z on xid = zid where zn = @p-1 and xn>zn group by xn, zn
insert into #z
select zid, xn,xv- sum(cv*zv)
from #x join #z on xid = zid join #c on czn = zn and cxn = xn where xn = @p and zn<xn group by zid, xn,xv
set @p = @p +1
end
--Loop through each independent variable and estimate the regression parameter by regressing the orthogonal
-- residuals on the dependent variable y
while @p>=0
begin
insert into #b
select zn, sum(yv*zv)/ sum(zv*zv)
from #z join
(select yid, yv-isnull(sum(bv*xv),0) yv from #x join #y on xid = yid left join #b on xn=bn group by yid, yv) y
on zid = yid where zn = @p group by zn
set @p = @p-1
end
--The regression parameters
select * from #b
--Actual vs. fit with error
select yid, yv, fit, yv-fit err from #y join
(select xid, sum(xv*bv) fit from #x join #b on xn = bn group by xid) f
on yid = xid
--R Squared
select 1-sum(power(err,2))/sum(power(yv,2)) from
(select yid, yv, fit, yv-fit err from #y join
(select xid, sum(xv*bv) fit from #x join #b on xn = bn group by xid) f
on yid = xid) d
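For readers who find the set-based version hard to follow, the two loops above can be sketched in plain Python. This is an illustration of regression by successive orthogonalization under my reading of the SQL, not a port of the blog code; `gram_schmidt_regression` and `dot` are hypothetical names, and the first column is assumed to be the intercept column of ones.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt_regression(columns, y):
    """columns[0] is the intercept column of ones; returns one coefficient per column."""
    # First loop: orthogonalize each column against the earlier residual columns.
    z = []
    for xj in columns:
        zj = list(xj)
        for zk in z:
            gamma = dot(zk, xj) / dot(zk, zk)
            zj = [a - gamma * b for a, b in zip(zj, zk)]
        z.append(zj)
    # Second loop: from the last variable down to the intercept, regress what
    # is left of y (after removing already-estimated terms) on the residual.
    beta = [0.0] * len(columns)
    for p in range(len(columns) - 1, -1, -1):
        partial = list(y)
        for j in range(p + 1, len(columns)):
            partial = [r - beta[j] * x for r, x in zip(partial, columns[j])]
        beta[p] = dot(z[p], partial) / dot(z[p], z[p])
    return beta

ones = [1, 1, 1, 1, 1]
a = [1, 2, 3, 4, 5]
b = [2, 1, 5, 3, 8]
y = [1 + 2 * ai + 3 * bi for ai, bi in zip(a, b)]
coefficients = gram_schmidt_regression([ones, a, b], y)
```

The columns must be linearly independent; otherwise one of the residual columns is all zeros and the division fails.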

There are no linear regression functions in SQL Server. But to calculate a Simple Linear Regression (Y' = bX + A) between pairs of data points x,y - including the calculation of the Correlation Coefficient, Coefficient of Determination (R^2) and Standard Estimate of Error (Standard Deviation), do the following:
For a table regression_data with numeric columns x and y:
declare @total_points int
declare @intercept DECIMAL(38, 10)
declare @slope DECIMAL(38, 10)
declare @r_squared DECIMAL(38, 10)
declare @standard_estimate_error DECIMAL(38, 10)
declare @correlation_coefficient DECIMAL(38, 10)
declare @average_x DECIMAL(38, 10)
declare @average_y DECIMAL(38, 10)
declare @sumX DECIMAL(38, 10)
declare @sumY DECIMAL(38, 10)
declare @sumXX DECIMAL(38, 10)
declare @sumYY DECIMAL(38, 10)
declare @sumXY DECIMAL(38, 10)
declare @Sxx DECIMAL(38, 10)
declare @Syy DECIMAL(38, 10)
declare @Sxy DECIMAL(38, 10)
Select
@total_points = count(*),
@average_x = avg(x),
@average_y = avg(y),
@sumX = sum(x),
@sumY = sum(y),
@sumXX = sum(x*x),
@sumYY = sum(y*y),
@sumXY = sum(x*y)
from regression_data
set @Sxx = @sumXX - (@sumX * @sumX) / @total_points
set @Syy = @sumYY - (@sumY * @sumY) / @total_points
set @Sxy = @sumXY - (@sumX * @sumY) / @total_points
set @correlation_coefficient = @Sxy / SQRT(@Sxx * @Syy)
set @slope = (@total_points * @sumXY - @sumX * @sumY) / (@total_points * @sumXX - power(@sumX,2))
set @intercept = @average_y - (@total_points * @sumXY - @sumX * @sumY) / (@total_points * @sumXX - power(@sumX,2)) * @average_x
set @r_squared = (@intercept * @sumY + @slope * @sumXY - power(@sumY,2) / @total_points) / (@sumYY - power(@sumY,2) / @total_points)
-- calculate standard_estimate_error (standard deviation)
Select
@standard_estimate_error = sqrt(sum(power(y - (@slope * x + @intercept),2)) / @total_points)
From regression_data
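The quantities computed above (Sxx, Syy, Sxy, the correlation coefficient, R^2 and the standard error of the estimate) can be sanity-checked with a short Python sketch. The function name `regression_stats` and the dictionary keys are illustrative assumptions; the standard error divides by the number of points, as the script above does.

```python
import math

def regression_stats(xs, ys):
    """Slope, intercept, correlation, R^2 and standard error from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum_xx - sum_x * sum_x / n
    syy = sum_yy - sum_y * sum_y / n
    sxy = sum_xy - sum_x * sum_y / n
    slope = sxy / sxx
    intercept = sum_y / n - slope * sum_x / n
    r = sxy / math.sqrt(sxx * syy)
    # Root of the mean squared residual (divides by n, like the T-SQL above).
    see = math.sqrt(
        sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n
    )
    return {"slope": slope, "intercept": intercept, "r": r,
            "r_squared": r * r, "standard_error": see}

stats = regression_stats([1, 2, 3, 4], [3, 5, 7, 9])
```

For simple linear regression R^2 equals the square of the correlation coefficient, which is why the sketch does not repeat the longer R^2 formula from the script.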

Here it is as a function that takes a table type of the form table (Y float, X float), which is called XYDoubleType, and assumes our linear function is of the form AX + B. It returns A and B as table columns, in case you want to use it in a join or something similar.
CREATE FUNCTION FN_GetABForData(
@XYData as XYDoubleType READONLY
) RETURNS @ABData TABLE(
A FLOAT,
B FLOAT,
Rsquare FLOAT )
AS
BEGIN
DECLARE @sx FLOAT, @sy FLOAT
DECLARE @sxx FLOAT,@syy FLOAT, @sxy FLOAT,@sxsy FLOAT, @sxsx FLOAT, @sysy FLOAT
DECLARE @n FLOAT, @A FLOAT, @B FLOAT, @Rsq FLOAT
SELECT @sx =SUM(D.X) ,@sy =SUM(D.Y), @sxx=SUM(D.X*D.X),@syy=SUM(D.Y*D.Y),
@sxy =SUM(D.X*D.Y),@n =COUNT(*)
From @XYData D
SET @sxsx =@sx*@sx
SET @sxsy =@sx*@sy
SET @sysy = @sy*@sy
SET @A = (@n*@sxy -@sxsy)/(@n*@sxx -@sxsx)
SET @B = @sy/@n - @A*@sx/@n
SET @Rsq = POWER((@n*@sxy -@sxsy),2)/((@n*@sxx-@sxsx)*(@n*@syy -@sysy))
INSERT INTO @ABData (A,B,Rsquare) VALUES(@A,@B,@Rsq)
RETURN
END

To add to @icc97's answer, I have included the weighted versions for the slope and the intercept. If the values are all constant, the slope will be NULL (with the appropriate settings SET ARITHABORT OFF; SET ANSI_WARNINGS OFF;) and will need to be replaced with 0 via coalesce().
Here is a solution written in SQL:
with d as (select segment,w,x,y from somedatasource)
select segment,
avg(y) - avg(x) *
((count(*) * sum(x*y)) - (sum(x)*sum(y)))/
((count(*) * sum(x*x)) - (Sum(x)*Sum(x))) as intercept,
((count(*) * sum(x*y)) - (sum(x)*sum(y)))/
((count(*) * sum(x*x)) - (sum(x)*sum(x))) AS slope,
avg(y) - ((avg(x*y) - avg(x)*avg(y))/var_samp(X)) * avg(x) as interceptUnstable,
(avg(x*y) - avg(x)*avg(y))/var_samp(X) as slopeUnstable,
(Avg(x * y) - Avg(x) * Avg(y)) / (stddev_pop(x) * stddev_pop(y)) as correlationUnstable,
(sum(y*w)/sum(w)) - (sum(w*x)/sum(w)) *
((sum(w)*sum(x*y*w)) - (sum(x*w)*sum(y*w)))/
((sum(w)*sum(x*x*w)) - (sum(x*w)*sum(x*w))) as wIntercept,
((sum(w)*sum(x*y*w)) - (sum(x*w)*sum(y*w)))/
((sum(w)*sum(x*x*w)) - (sum(x*w)*sum(x*w))) as wSlope,
(count(*) * sum(x * y) - sum(x) * sum(y)) / (sqrt(count(*) * sum(x * x) - sum(x) * sum(x))
* sqrt(count(*) * sum(y * y) - sum(y) * sum(y))) as correlation,
(sum(w) * sum(x*y*w) - sum(x*w) * sum(y*w)) /
(sqrt(sum(w) * sum(x*x*w) - sum(x*w) * sum(x*w)) * sqrt(sum(w) * sum(y*y*w)
- sum(y*w) * sum(y*w))) as wCorrelation,
count(*) as n
from d where x is not null and y is not null group by segment
Where w is the weight. I double checked this against R to confirm the results.
One may need to cast the data from somedatasource to floating point.
I included the unstable versions to warn you against those. (Special thanks goes to Stephan in another answer.)
Update: added weighted correlation
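As a small illustration of the weighted formulas (wSlope and wIntercept above), here is a Python sketch. `weighted_fit` is a hypothetical name; the check relies on the fact that an integer weight of 2 should behave like listing the same point twice.

```python
def weighted_fit(xs, ys, ws):
    """Weighted least-squares slope and intercept, following wSlope/wIntercept."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = swy / sw - (swx / sw) * slope
    return slope, intercept

# The middle point carries weight 2 ...
weighted = weighted_fit([1, 2, 3], [1, 3, 2], [1, 2, 1])
# ... which should match duplicating it with unit weights.
duplicated = weighted_fit([1, 2, 2, 3], [1, 3, 3, 2], [1, 1, 1, 1])
```

As with the SQL, the denominator is zero when all x values are the same, so callers need to handle that case themselves.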

I have translated the Linear Regression Function used in the FORECAST function in Excel, and created an SQL function that returns a, b, and the Forecast.
You can see the complete theoretical explanation in the Excel help for the FORECAST function.
First of all you will need to create the table data type XYFloatType:
CREATE TYPE [dbo].[XYFloatType]
AS TABLE(
[X] FLOAT,
[Y] FLOAT)
Then write the following function:
/*
-- =============================================
-- Author: Me :)
-- Create date: Today :)
-- Description: (Copied Excel help):
--Calculates, or predicts, a future value by using existing values.
The predicted value is a y-value for a given x-value.
The known values are existing x-values and y-values, and the new value is predicted by using linear regression.
You can use this function to predict future sales, inventory requirements, or consumer trends.
-- =============================================
*/
CREATE FUNCTION dbo.FN_GetLinearRegressionForcast
(@PtXYData as XYFloatType READONLY, @PnFuturePoint int)
RETURNS @ABDData TABLE( a FLOAT, b FLOAT, Forecast FLOAT)
AS
BEGIN
DECLARE @LnAvX Float
,@LnAvY Float
,@LnB Float
,@LnA Float
,@LnForeCast Float
Select @LnAvX = AVG([X])
,@LnAvY = AVG([Y])
FROM @PtXYData;
SELECT @LnB = SUM ( ([X]-@LnAvX)*([Y]-@LnAvY) ) / SUM (POWER([X]-@LnAvX,2))
FROM @PtXYData;
SET @LnA = @LnAvY - @LnB * @LnAvX;
SET @LnForeCast = @LnA + @LnB * @PnFuturePoint;
INSERT INTO @ABDData ([A],[B],[Forecast]) VALUES (@LnA,@LnB,@LnForeCast)
RETURN
END
/*
your tests:
(I used the same values that are in the excel help)
DECLARE @t XYFloatType
INSERT @t VALUES(20,6),(28,7),(31,9),(38,15),(40,21) -- x and y values
SELECT *, A+B*30 [Prueba] FROM dbo.FN_GetLinearRegressionForcast(@t, 30);
*/
*/
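The same calculation is easy to reproduce outside SQL to check the function's output. The Python sketch below follows the centered-sum formula used in the function body; `forecast` is a hypothetical helper name, and the data are the x and y values from the test comment above.

```python
def forecast(x_new, xs, ys):
    """Return (a, b, predicted y) for y = a + b * x using centered sums."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    a = y_mean - b * x_mean
    return a, b, a + b * x_new

a, b, predicted = forecast(30, [20, 28, 31, 38, 40], [6, 7, 9, 15, 21])
```

With these inputs the predicted value comes out at roughly 10.607, which should match what the SQL function returns for the same data.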

I hope the following answer helps one understand where some of the solutions come from. I am going to illustrate it with a simple example, but the generalization to many variables is theoretically straightforward as long as you know how to use index notation or matrices. For implementing the solution for anything beyond 3 variables you'll need Gram-Schmidt (see Colin Campbell's answer above) or another matrix inversion algorithm.
Since all the functions we need (variance, covariance, average, sum etc.) are aggregation functions in SQL, one can easily implement the solution. I've done so in HIVE to do linear calibration of the scores of a Logistic model. Amongst many advantages, one is that you can work entirely within HIVE without going out and back in from some scripting language.
The model for your data (x_1, x_2, y) where your data points are indexed by i, is
y(x_1, x_2) = m_1*x_1 + m_2*x_2 + c
The model appears "linear", but needn't be. For example, x_2 can be any non-linear function of x_1, as long as it has no free parameters in it, e.g. x_2 = Sinh(3*(x_1)^2 + 42). Even if x_2 is "just" x_2 and the model is linear, the regression problem isn't. Only when you decide that the problem is to find the parameters m_1, m_2, c such that they minimize the L2 error do you have a Linear Regression problem.
The L2 error is sum_i( (y[i] - f(x_1[i], x_2[i]))^2 ). Minimizing this w.r.t. the 3 parameters (set the partial derivatives w.r.t. each parameter = 0) yields 3 linear equations for 3 unknowns. These equations are LINEAR in the parameters (this is what makes it Linear Regression) and can be solved analytically. Doing this for a simple model (1 variable, linear model, hence two parameters) is straightforward and instructive. The generalization to a non-Euclidean metric norm on the error vector space is straightforward, the diagonal special case amounts to using "weights".
Back to our model in two variables:
y = m_1*x_1 + m_2*x_2 + c
Take the expectation value =>
<y> = m_1*<x_1> + m_2*<x_2> + c (0)
Now take the covariance w.r.t. x_1 and x_2, and use cov(x,x) = var(x):
cov(y, x_1) = m_1*var(x_1) + m_2*covar(x_2, x_1) (1)
cov(y, x_2) = m_1*covar(x_1, x_2) + m_2*var(x_2) (2)
These are two equations in two unknowns, which you can solve by inverting the 2X2 matrix.
In matrix form:
...
which can be inverted to yield
...
where
det = var(x_1)*var(x_2) - covar(x_1, x_2)^2
(oh barf, what the heck are "reputation points? Gimme some if you want to see the equations.)
In any case, now that you have m1 and m2 in closed form, you can solve (0) for c.
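Since the matrix equations are not shown, here is a hedged Python sketch of the procedure described: solve equations (1) and (2) for m_1 and m_2 with the standard 2x2 inverse (Cramer's rule), then obtain c from equation (0). The helper names `mean`, `cov` and `fit_two_variables` are assumptions for illustration, and population (divide-by-n) covariances are used throughout, which does not affect the result as long as the same convention is used everywhere.

```python
def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Population covariance; cov(x, x) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def fit_two_variables(x1, x2, y):
    """Solve y = m1*x1 + m2*x2 + c from variances and covariances."""
    v1, v2, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    cy1, cy2 = cov(y, x1), cov(y, x2)
    det = v1 * v2 - c12 ** 2
    m1 = (cy1 * v2 - cy2 * c12) / det   # from equations (1) and (2)
    m2 = (cy2 * v1 - cy1 * c12) / det
    c = mean(y) - m1 * mean(x1) - m2 * mean(x2)  # from equation (0)
    return m1, m2, c

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 5, 3, 8]
y = [2 * a + 3 * b + 1 for a, b in zip(x1, x2)]
m1, m2, c = fit_two_variables(x1, x2, y)
```

Every quantity here is an average of products, which is why the same computation maps directly onto SQL aggregate functions.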
I checked the analytical solution above to Excel's Solver for a quadratic with Gaussian noise and the residual errors agree to 6 significant digits.
Contact me if you want to do Discrete Fourier Transform in SQL in about 20 lines.

Related

SQL Server - Generate 6 float values that MAX - MIN = parameter

I need to generate 6 float values with 1 decimal, as efficiently as possible, where:
MAX(value) - MIN(value) = @parameter
I have this code:
BEGIN
DECLARE @parameter float = 0.6
WHILE @validated = 0
BEGIN
IF @count < 6
BEGIN -- fill table with 6 random values from 0 to 2 (with 1 decimal)
INSERT INTO #tempdata ([value])
SELECT ROUND(RAND()*(2-0),1);
SET @count = @count + 1
END
IF @count = 6 -- if temp table has 6 values then do the validation
BEGIN
SELECT @result = (MAX(value) - MIN(value)) FROM #tempdata
IF(@result = @parameter)
BEGIN
PRINT 'MATCH PARAMETER'
SET @validated = 1
END
ELSE
BEGIN
DELETE #tempdata
SET @count = 0
END
END
END
END
This works, but sometimes it takes 10 or 20 seconds and it should be faster.
For example, if the @parameter value is 0.8, then we need 6 numbers between 0 and 2 whose MAX - MIN matches it, for example:
0.7
1.1
1.0
0.9
1.5
1.2
MAX(1.5) - MIN(0.7) = 0.8
Any clue?
Do the following:
Generate 6 random numbers between 0 and 1
Normalize the values to be between 0 and 0.8 (or whatever)
Add something back in if you don't want them to all start at 0
In SQL:
select x,
max(x) over () - min(x) over () as starting_at_0,
min(x) over () + 0.8 * (x - min(x) over ()) / (max(x) over () - min(x) over ()) as the_value_you_want
from (values (rand(checksum(newid()))),
(rand(checksum(newid()))),
(rand(checksum(newid()))),
(rand(checksum(newid()))),
(rand(checksum(newid()))),
(rand(checksum(newid())))
) v(x);
Here is a db<>fiddle.
Cast to numeric(2, 1), if you want only one decimal point.
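The rescaling idea from the steps above can also be sketched in Python, which makes it easy to see why no retry loop is needed. The function name `values_with_range` and its parameters are hypothetical, and the sketch leaves out the final rounding to one decimal.

```python
import random

def values_with_range(parameter, count=6, rng=None):
    """Generate `count` random floats whose max - min equals `parameter`."""
    rng = rng or random.Random()
    xs = [rng.random() for _ in range(count)]
    lo, hi = min(xs), max(xs)
    # Stretch (or shrink) the spread to `parameter`, keeping the minimum fixed.
    # Assumes the draws are not all identical, otherwise hi - lo is zero.
    return [lo + parameter * (x - lo) / (hi - lo) for x in xs]

values = values_with_range(0.8, rng=random.Random(42))
spread = max(values) - min(values)
```

Because the smallest value stays below 1 and the spread is fixed, the results stay between 0 and 2 for any parameter up to 1.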
EDIT:
I tend to forget that this also works:
select x,
max(x) over () - min(x) over () as starting_at_0,
min(x) over () + 0.8 * (x - min(x) over ()) / (max(x) over () - min(x) over ()) as the_value_you_want
from (values (rand()),
(rand()),
(rand()),
(rand()),
(rand()),
(rand())
) v(x);
(See here.)
SQL Server treats rand() in a special way. Each call to rand() is evaluated before the query is run. So, rand() has the same value on multiple rows in the result set. However, it has different values in different columns.
with randvals(rval) as
(
select rand()
union all
select rand()
union all
select rand()
union all
select rand()
union all
select rand()
union all
select rand()
),
arandvals(rval, xrval, mrval) as
(
select rval, max(rval) over() as xrval, min(rval) over() as mrval
from randvals
)
select cast(0.8 * rval / (xrval - mrval) as numeric(3,2))
from arandvals

Find Coordinates between two points using sql

I have two columns, Column X and Column Y, with (X1,Y1) = (1,2) and (X2,Y2) = (9,10). Considering (X1,Y1) as the start point and (X2,Y2) as the end point, I can find the slope. But using the slope and these points, how do I find the remaining points between them?
For example, I have values like:
ColumnX ColumnY
1 1
. .
. .
. .
10 10
The slope is (Y1-Y2)/(X1-X2), that is (10-1)/(10-1) = 1.
Using the slope and the coordinates, how do I find the remaining 9 coordinates between them using SQL?
I don't fully understand what your input data looks like, but you can generate points using a recursive CTE:
with points as (
select 1 as x_start, 2 as y_start, 9 as x_end, 10 as y_end
),
cte as (
select x_start as x, convert(float(53), y_start) as y, x_end, convert(float(53), (y_end - y_start) * 1.0 / (x_end - x_start)) as slope
from points
union all
select x + 1, y + slope, x_end, slope
from cte
where x < x_end
)
select *
from cte
order by x;
Here is a db<>fiddle.
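The recursion above amounts to stepping x by 1 and adding the slope to y each time. A minimal Python sketch of the same idea, assuming integer x coordinates with the start to the left of the end (`points_between` is a hypothetical name):

```python
def points_between(start, end):
    """All points with integer x from start to end on the line through both."""
    (x1, y1), (x2, y2) = start, end
    slope = (y2 - y1) / (x2 - x1)
    return [(x, y1 + slope * (x - x1)) for x in range(x1, x2 + 1)]

points = points_between((1, 2), (9, 10))
```

Computing each y directly from the start point, rather than accumulating y + slope step by step, avoids rounding error building up over long ranges.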
Here is the solution, let me know if it works for you
declare @x1 as decimal(10,2)
declare @y1 as decimal(10,2)
declare @x2 as decimal(10,2)
declare @y2 as decimal(10,2)
set @x1=1
set @y1=1
set @x2=10
set @y2=10
declare @mytab as table (x decimal(10,2),y decimal(10,2))
insert into @mytab values(@x1,@y1),(@x2,@y2)
declare @slope as decimal(10,2)
set @slope=(@y1-@y2)/(@x1-@x2)
--(y2=y1+s*(x2-x1)
;with cte as(
select @x1 x, @y1 y
union all
select cast(x+1 as decimal(10,2)),cast( @y1+@slope*(x+1.0-@x1) as decimal(10,2)) from cte
where x+1 < 11)
select x,y from cte

How to generate a range of numbers between two numbers?

I have two numbers as input from the user, like for example 1000 and 1050.
How do I generate the numbers between these two numbers, using a SQL query, in separate rows? I want this:
1000
1001
1002
1003
.
.
1050
Select non-persisted values with the VALUES keyword. Then use JOINs to generate lots and lots of combinations (can be extended to create hundreds of thousands of rows and beyond).
Short and fast version (not that easy to read):
WITH x AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n))
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM x ones, x tens, x hundreds, x thousands
ORDER BY 1
More verbose version:
SELECT ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n
FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n)
ORDER BY 1
Both versions can easily be extended with a WHERE clause, limiting the output of numbers to a user-specified range. If you want to reuse it, you can define a table-valued function for it.
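To illustrate how the digit tables combine, here is a Python sketch of the verbose version with a range filter added. `number_range` is a hypothetical name; `itertools.product` plays the role of the cross join between the ones, tens, hundreds and thousands tables.

```python
from itertools import product

DIGITS = range(10)

def number_range(start, end):
    """Yield start..end (inclusive) by combining four digit 'tables'."""
    # product() varies the last element fastest, so numbers come out ascending.
    for thousands, hundreds, tens, ones in product(DIGITS, repeat=4):
        n = ones + 10 * tens + 100 * hundreds + 1000 * thousands
        if start <= n <= end:  # the WHERE clause
            yield n

numbers = list(number_range(1000, 1050))
```

Like the four-table SQL version, this sketch only covers 0 to 9999; each additional digit table multiplies the covered range by ten.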
An alternative solution is a recursive CTE:
DECLARE @startnum INT=1000
DECLARE @endnum INT=1050
;
WITH gen AS (
SELECT @startnum AS num
UNION ALL
SELECT num+1 FROM gen WHERE num+1<=@endnum
)
SELECT * FROM gen
option (maxrecursion 10000)
SELECT DISTINCT n = number
FROM master..[spt_values]
WHERE number BETWEEN @start AND @end
Note that this table is only usable up to a maximum of 2048, because beyond that the numbers have gaps.
Here's a slightly better approach using a system view (available since SQL Server 2005):
;WITH Nums AS
(
SELECT n = ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects
)
SELECT n FROM Nums
WHERE n BETWEEN @start AND @end
ORDER BY n;
Or use a custom number table. Credit to Aaron Bertrand; I suggest reading the whole article: Generate a set or sequence without loops
The best option I have used is as follows:
DECLARE @min bigint, @max bigint
SELECT @Min=919859000000 ,@Max=919859999999
SELECT TOP (@Max-@Min+1) @Min-1+row_number() over(order by t1.number) as N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
I have generated millions of records using this and it works perfectly.
I recently wrote this inline table-valued function to solve this very problem. It's not limited in range other than by memory and storage. It accesses no tables, so there's generally no need for disk reads or writes. It joins in exponentially more values on each iteration, so it's very fast even for very large ranges. It creates ten million records in five seconds on my server. It also works with negative values.
CREATE FUNCTION [dbo].[fn_ConsecutiveNumbers]
(
@start int,
@end int
) RETURNS TABLE
RETURN
select
x268435456.X
| x16777216.X
| x1048576.X
| x65536.X
| x4096.X
| x256.X
| x16.X
| x1.X
+ @start
X
from
(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15)) as x1(X)
join
(VALUES (0),(16),(32),(48),(64),(80),(96),(112),(128),(144),(160),(176),(192),(208),(224),(240)) as x16(X)
on x1.X <= @end-@start and x16.X <= @end-@start
join
(VALUES (0),(256),(512),(768),(1024),(1280),(1536),(1792),(2048),(2304),(2560),(2816),(3072),(3328),(3584),(3840)) as x256(X)
on x256.X <= @end-@start
join
(VALUES (0),(4096),(8192),(12288),(16384),(20480),(24576),(28672),(32768),(36864),(40960),(45056),(49152),(53248),(57344),(61440)) as x4096(X)
on x4096.X <= @end-@start
join
(VALUES (0),(65536),(131072),(196608),(262144),(327680),(393216),(458752),(524288),(589824),(655360),(720896),(786432),(851968),(917504),(983040)) as x65536(X)
on x65536.X <= @end-@start
join
(VALUES (0),(1048576),(2097152),(3145728),(4194304),(5242880),(6291456),(7340032),(8388608),(9437184),(10485760),(11534336),(12582912),(13631488),(14680064),(15728640)) as x1048576(X)
on x1048576.X <= @end-@start
join
(VALUES (0),(16777216),(33554432),(50331648),(67108864),(83886080),(100663296),(117440512),(134217728),(150994944),(167772160),(184549376),(201326592),(218103808),(234881024),(251658240)) as x16777216(X)
on x16777216.X <= @end-@start
join
(VALUES (0),(268435456),(536870912),(805306368),(1073741824),(1342177280),(1610612736),(1879048192)) as x268435456(X)
on x268435456.X <= @end-@start
WHERE @end >=
x268435456.X
| isnull(x16777216.X, 0)
| isnull(x1048576.X, 0)
| isnull(x65536.X, 0)
| isnull(x4096.X, 0)
| isnull(x256.X, 0)
| isnull(x16.X, 0)
| isnull(x1.X, 0)
+ @start
GO
SELECT X FROM fn_ConsecutiveNumbers(5, 500);
It's handy for date and time ranges as well:
SELECT DATEADD(day,X, 0) DayX
FROM fn_ConsecutiveNumbers(datediff(day,0,'5/8/2015'), datediff(day,0,'5/31/2015'))
SELECT DATEADD(hour,X, 0) HourX
FROM fn_ConsecutiveNumbers(datediff(hour,0,'5/8/2015'), datediff(hour,0,'5/8/2015 12:00 PM'));
You could use a cross apply join on it to split records based on values in the table. So for example to create a record for every minute on a time range in a table you could do something like:
select TimeRanges.StartTime,
TimeRanges.EndTime,
DATEADD(minute,X, 0) MinuteX
FROM TimeRanges
cross apply fn_ConsecutiveNumbers(datediff(minute,0,TimeRanges.StartTime),
datediff(minute,0,TimeRanges.EndTime)) ConsecutiveNumbers
It works for me!
select top 50 ROW_NUMBER() over(order by a.name) + 1000 as Rcount
from sys.all_objects a
I do it with recursive CTEs, but I'm not sure if it is the best way:
declare @initial as int = 1000;
declare @final as int =1050;
with cte_n as (
select @initial as contador
union all
select contador+1 from cte_n
where contador <@final
) select * from cte_n option (maxrecursion 0)
Regards.
declare @start int = 1000
declare @end int =1050
;with numcte
AS
(
SELECT @start [SEQUENCE]
UNION all
SELECT [SEQUENCE] + 1 FROM numcte WHERE [SEQUENCE] < @end
)
SELECT * FROM numcte
If you don't have a problem installing a CLR assembly in your server a good option is writing a table valued function in .NET. That way you can use a simple syntax, making it easy to join with other queries and as a bonus won't waste memory because the result is streamed.
Create a project containing the following class:
using System;
using System.Collections;
using System.Data;
using System.Data.Sql;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
namespace YourNamespace
{
public sealed class SequenceGenerator
{
[SqlFunction(FillRowMethodName = "FillRow")]
public static IEnumerable Generate(SqlInt32 start, SqlInt32 end)
{
int _start = start.Value;
int _end = end.Value;
for (int i = _start; i <= _end; i++)
yield return i;
}
public static void FillRow(Object obj, out int i)
{
i = (int)obj;
}
private SequenceGenerator() { }
}
}
Put the assembly somewhere on the server and run:
USE db;
CREATE ASSEMBLY SqlUtil FROM 'c:\path\to\assembly.dll'
WITH permission_set=Safe;
CREATE FUNCTION [Seq](@start int, @end int)
RETURNS TABLE(i int)
AS EXTERNAL NAME [SqlUtil].[YourNamespace.SequenceGenerator].[Generate];
Now you can run:
select * from dbo.seq(1, 1000000)
slartidan's answer can be improved, performance wise, by eliminating all references to the cartesian product and using ROW_NUMBER() instead (execution plan compared):
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x1(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x2(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x3(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x4(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x5(x)
ORDER BY n
Wrap it inside a CTE and add a where clause to select desired numbers:
DECLARE @n1 AS INT = 100;
DECLARE @n2 AS INT = 40099;
WITH numbers AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x1(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x2(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x3(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x4(x),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x5(x)
)
SELECT numbers.n
FROM numbers
WHERE n BETWEEN @n1 and @n2
ORDER BY n
Nothing new but I rewrote Brian Pressler solution to be easier on the eye, it might be useful to someone (even if it's just future me):
alter function [dbo].[fn_GenerateNumbers]
(
@start int,
@end int
) returns table
return
with
b0 as (select n from (values (0),(0x00000001),(0x00000002),(0x00000003),(0x00000004),(0x00000005),(0x00000006),(0x00000007),(0x00000008),(0x00000009),(0x0000000A),(0x0000000B),(0x0000000C),(0x0000000D),(0x0000000E),(0x0000000F)) as b0(n)),
b1 as (select n from (values (0),(0x00000010),(0x00000020),(0x00000030),(0x00000040),(0x00000050),(0x00000060),(0x00000070),(0x00000080),(0x00000090),(0x000000A0),(0x000000B0),(0x000000C0),(0x000000D0),(0x000000E0),(0x000000F0)) as b1(n)),
b2 as (select n from (values (0),(0x00000100),(0x00000200),(0x00000300),(0x00000400),(0x00000500),(0x00000600),(0x00000700),(0x00000800),(0x00000900),(0x00000A00),(0x00000B00),(0x00000C00),(0x00000D00),(0x00000E00),(0x00000F00)) as b2(n)),
b3 as (select n from (values (0),(0x00001000),(0x00002000),(0x00003000),(0x00004000),(0x00005000),(0x00006000),(0x00007000),(0x00008000),(0x00009000),(0x0000A000),(0x0000B000),(0x0000C000),(0x0000D000),(0x0000E000),(0x0000F000)) as b3(n)),
b4 as (select n from (values (0),(0x00010000),(0x00020000),(0x00030000),(0x00040000),(0x00050000),(0x00060000),(0x00070000),(0x00080000),(0x00090000),(0x000A0000),(0x000B0000),(0x000C0000),(0x000D0000),(0x000E0000),(0x000F0000)) as b4(n)),
b5 as (select n from (values (0),(0x00100000),(0x00200000),(0x00300000),(0x00400000),(0x00500000),(0x00600000),(0x00700000),(0x00800000),(0x00900000),(0x00A00000),(0x00B00000),(0x00C00000),(0x00D00000),(0x00E00000),(0x00F00000)) as b5(n)),
b6 as (select n from (values (0),(0x01000000),(0x02000000),(0x03000000),(0x04000000),(0x05000000),(0x06000000),(0x07000000),(0x08000000),(0x09000000),(0x0A000000),(0x0B000000),(0x0C000000),(0x0D000000),(0x0E000000),(0x0F000000)) as b6(n)),
b7 as (select n from (values (0),(0x10000000),(0x20000000),(0x30000000),(0x40000000),(0x50000000),(0x60000000),(0x70000000)) as b7(n))
select s.n
from (
select
b7.n
| b6.n
| b5.n
| b4.n
| b3.n
| b2.n
| b1.n
| b0.n
+ @start
n
from b0
join b1 on b0.n <= @end-@start and b1.n <= @end-@start
join b2 on b2.n <= @end-@start
join b3 on b3.n <= @end-@start
join b4 on b4.n <= @end-@start
join b5 on b5.n <= @end-@start
join b6 on b6.n <= @end-@start
join b7 on b7.n <= @end-@start
) s
where #end >= s.n
GO
2 years later, but I found I had the same issue. Here is how I solved it. (edited to include parameters)
DECLARE @Start INT, @End INT
SET @Start = 1000
SET @End = 1050
SELECT TOP (@End - @Start+1) ROW_NUMBER() OVER (ORDER BY S.[object_id])+(@Start - 1) [Numbers]
FROM sys.all_objects S WITH (NOLOCK)
I know I'm 4 years too late, but I stumbled upon yet another alternative answer to this problem. The issue for speed isn't just pre-filtering, but also preventing sorting. It's possible to force the join-order to execute in a manner that the Cartesian product actually counts up as a result of the join. Using slartidan's answer as a jump-off point:
WITH x AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n))
SELECT ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n
FROM x ones, x tens, x hundreds, x thousands
ORDER BY 1
If we know the range we want, we can specify it via @Upper and @Lower. By combining the join hint REMOTE with TOP, we can calculate only the subset of values we want with nothing wasted.
WITH x AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n))
SELECT TOP (1+@Upper-@Lower) @Lower + ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n
FROM x thousands
INNER REMOTE JOIN x hundreds on 1=1
INNER REMOTE JOIN x tens on 1=1
INNER REMOTE JOIN x ones on 1=1
The join hint REMOTE forces the optimizer to compare on the right side of the join first. By specifying each join as REMOTE from most to least significant value, the join itself will count upwards by one correctly. No need to filter with a WHERE, or sort with an ORDER BY.
If you want to increase the range, you can continue to add additional joins with progressively higher orders of magnitude, so long as they're ordered from most to least significant in the FROM clause.
Note that this is a query specific to SQL Server 2008 or higher.
If your SQL Server version is 2022 or later (and so supports the GENERATE_SERIES function), you can use GENERATE_SERIES and declare START and STOP parameters.
GENERATE_SERIES returns a single-column table containing a sequence of values in which each differs from the preceding one by STEP.
declare @start int = 1000
declare @stop int = 1050
declare @step int = 2
SELECT [Value]
FROM GENERATE_SERIES(@start, @stop, @step)
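When consecutive integers are all that is needed, the step argument can be left out. A minimal sketch, assuming SQL Server 2022 or later; the step defaults to 1 when start is not greater than stop:

```sql
-- Assumes SQL Server 2022+; GENERATE_SERIES does not exist in earlier versions.
DECLARE @start int = 1000;
DECLARE @stop  int = 1050;

-- With no step argument the series advances by 1: 1000, 1001, ..., 1050.
SELECT [value]
FROM GENERATE_SERIES(@start, @stop);
```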
Here are a couple of reasonably efficient, widely compatible solutions:
USE master;
declare @min as int; set @min = 1000;
declare @max as int; set @max = 1050; --null returns all
-- Up to 256 - 2 048 rows depending on SQL Server version
select isnull(@min,0)+number.number as number
FROM dbo.spt_values AS number
WHERE number."type" = 'P' --integers
and ( @max is null --return all
or isnull(@min,0)+number.number <= @max --return up to max
)
order by number
;
-- Up to 65 536 - 4 194 303 rows depending on SQL Server version
select isnull(@min,0)+value1.number+(value2.number*numberCount.numbers) as number
FROM dbo.spt_values AS value1
cross join dbo.spt_values AS value2
cross join ( --get the number of numbers (depends on version)
select sum(1) as numbers
from dbo.spt_values
where spt_values."type" = 'P' --integers
) as numberCount
WHERE value1."type" = 'P' --integers
and value2."type" = 'P' --integers
and ( @max is null --return all
or isnull(@min,0)+value1.number+(value2.number*numberCount.numbers)
<= @max --return up to max
)
order by number
;
A recursive CTE that grows exponentially (even with the default limit of 100 recursion levels, this can build up to 2^100 numbers):
DECLARE @startnum INT=1000
DECLARE @endnum INT=1050
DECLARE @size INT=@endnum-@startnum+1
;
WITH numrange (num) AS (
SELECT 1 AS num
UNION ALL
SELECT num*2 FROM numrange WHERE num*2<=@size
UNION ALL
SELECT num*2+1 FROM numrange WHERE num*2+1<=@size
)
SELECT num+@startnum-1 FROM numrange order by num
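For contrast, here is a sketch of the ordinary linear recursive CTE, which adds one row per recursion level. It is not part of the answer above; it only illustrates why the doubling form matters: any range longer than the default limit of 100 recursion levels needs a MAXRECURSION hint.

```sql
DECLARE @startnum int = 1000;
DECLARE @endnum   int = 1500;   -- 501 values, so more than 100 recursion levels

WITH numrange (num) AS (
    SELECT @startnum
    UNION ALL
    SELECT num + 1 FROM numrange WHERE num < @endnum
)
SELECT num
FROM numrange
ORDER BY num
OPTION (MAXRECURSION 0);        -- 0 removes the limit; without the hint this query fails after 100 levels
```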
Update for SQL Server 2017 and later:
If the sequence you want has fewer than 8,000 values, this will work:
Declare @start_num int = 1000
, @end_num int = 1050
-- n separators split into n + 1 pieces, so this yields @start_num through @end_num inclusive
Select [number] = @start_num - 1 + ROW_NUMBER() over (order by (Select null))
from string_split(replicate(' ',@end_num-@start_num),' ')
This will also work:
DECLARE @startNum INT = 1000;
DECLARE @endNum INT = 1050;
INSERT INTO dbo.Numbers
( Num
)
SELECT CASE WHEN MAX(Num) IS NULL THEN @startNum
ELSE MAX(Num) + 1
END AS Num
FROM dbo.Numbers
GO 51
This gave me the best speed when running the query:
DECLARE @num INT = 1000
WHILE(@num<1050)
begin
INSERT INTO [dbo].[Codes]
( Code
)
VALUES (@num)
SET @num = @num + 1
end
I had to insert picture file paths into a database using a similar method. The query below worked fine:
DECLARE @num INT = 8270058
WHILE(@num<8270284)
begin
INSERT INTO [dbo].[Galleries]
(ImagePath)
VALUES
('~/Content/Galeria/P'+CONVERT(varchar(10), @num)+'.JPG')
SET @num = @num + 1
end
The code for you would be:
DECLARE @num INT = 1000
WHILE(@num<1051)
begin
SELECT @num
SET @num = @num + 1
end
Here's what I came up with:
create or alter function dbo.fn_range(@start int, @end int) returns table
return
with u2(n) as (
select n
from (VALUES (0),(1),(2),(3)) v(n)
),
u8(n) as (
select
x0.n | x1.n * 4 | x2.n * 16 | x3.n * 64 as n
from u2 x0, u2 x1, u2 x2, u2 x3
)
select
@start + s.n as n
from (
select
x0.n | isnull(x1.n, 0) * 256 | isnull(x2.n, 0) * 65536 as n
from u8 x0
left join u8 x1 on @end-@start > 256
left join u8 x2 on @end-@start > 65536
) s
where s.n < @end - @start
Generates up to 2^24 values. The upper bound is exclusive: @end itself is not returned. Join conditions keep it fast for small values.
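A usage sketch for the function above, assuming it has been created as dbo.fn_range. Because the final filter on s.n uses a strict less-than, the end argument is exclusive, so pass the last wanted value plus one:

```sql
-- Returns 1000 through 1050 (51 rows); the end argument 1051 itself is not returned.
SELECT n
FROM dbo.fn_range(1000, 1051)
ORDER BY n;
```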
This is what I do. It's pretty fast and flexible, and not a lot of code.
DECLARE @count  int =   65536;
DECLARE @start  int =   11;
DECLARE @xml    xml =   REPLICATE(CAST('<x/>' AS nvarchar(max)), @count);
; WITH GenerateNumbers(Num) AS
(
    SELECT  ROW_NUMBER() OVER (ORDER BY @count) + @start - 1
    FROM    @xml.nodes('/x') X(T)
)
SELECT  Num
FROM    GenerateNumbers;
Note that (ORDER BY @count) is a dummy. It doesn't do anything, but ROW_NUMBER() requires an ORDER BY.
Edit:
I realized that the original question was to get a range from x to y. My script can be modified like this to get a range:
DECLARE @start  int =   5;
DECLARE @end   int =   21;
DECLARE @xml    xml =   REPLICATE(CAST('<x/>' AS nvarchar(max)), @end - @start + 1);
; WITH GenerateNumbers(Num) AS
(
    SELECT  ROW_NUMBER() OVER (ORDER BY @end) + @start - 1
    FROM    @xml.nodes('/x') X(T)
)
SELECT  Num
FROM    GenerateNumbers;
-- Generate Numeric Range
-- Source: http://www.sqlservercentral.com/scripts/Miscellaneous/30397/
-- #NumRange is a temporary table holding the digits 0..9
CREATE TABLE #NumRange(
n int
)
DECLARE @MinNum int
DECLARE @MaxNum int
DECLARE @I int
SET NOCOUNT ON
SET @I = 0
WHILE @I <= 9 BEGIN
INSERT INTO #NumRange VALUES(@I)
SET @I = @I + 1
END
SET @MinNum = 1
-- The five digit positions below only reach 99999; add a sixth join to actually reach 1000000
SET @MaxNum = 1000000
SELECT num = a.n +
(b.n * 10) +
(c.n * 100) +
(d.n * 1000) +
(e.n * 10000)
FROM #NumRange a
CROSS JOIN #NumRange b
CROSS JOIN #NumRange c
CROSS JOIN #NumRange d
CROSS JOIN #NumRange e
WHERE a.n +
(b.n * 10) +
(c.n * 100) +
(d.n * 1000) +
(e.n * 10000) BETWEEN @MinNum AND @MaxNum
ORDER BY a.n +
(b.n * 10) +
(c.n * 100) +
(d.n * 1000) +
(e.n * 10000)
DROP TABLE #NumRange
This only works for sequences no longer than the row count of some application table. Assume I want a sequence from 1..100, and have an application table dbo.foo with a column (of numeric or string type) foo.bar:
select
top 100
row_number() over (order by dbo.foo.bar) as seq
from dbo.foo
Despite its presence in an order by clause, dbo.foo.bar does not have to have distinct or even non-null values.
Of course, SQL Server 2012 has sequence objects, so there's a natural solution in that product.
This completed for me in 36 seconds on our DEV server. Like Brian's answer, focusing on filtering to the range is important from within the query; a BETWEEN still tries to generate all the initial records prior to the lower bound even though it doesn't need them.
declare @s bigint = 10000000
, @e bigint = 20000000
;WITH
Z AS (SELECT 0 z FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15)) T(n)),
Y AS (SELECT 0 z FROM Z a, Z b, Z c, Z d, Z e, Z f, Z g, Z h, Z i, Z j, Z k, Z l, Z m, Z n, Z o, Z p),
N AS (SELECT ROW_NUMBER() OVER (PARTITION BY 0 ORDER BY z) n FROM Y)
SELECT TOP (1+@e-@s) @s + n - 1 FROM N
Note that ROW_NUMBER returns a bigint, so no method that uses it can number more than 2^63 - 1 generated records. The 16^16 (= 2^64) rows available from the cross join above are therefore more than this query can ever use.
This uses procedural code and a table-valued function. Slow, but easy and predictable.
CREATE FUNCTION [dbo].[Sequence] (@start int, @end int)
RETURNS
@Result TABLE(ID int)
AS
begin
declare @i int;
set @i = @start;
while @i <= @end
begin
insert into @Result values (@i);
set @i = @i+1;
end
return;
end
Usage:
SELECT * FROM dbo.Sequence (3,7);
ID
3
4
5
6
7
It's a table, so you can use it in joins with other data. I most frequently use this function as the left side of a join against a GROUP BY hour, day etc to ensure a contiguous sequence of time values.
SELECT DateAdd(hh,ID,'2018-06-20 00:00:00') as HoursInTheDay FROM dbo.Sequence (0,23) ;
HoursInTheDay
2018-06-20 00:00:00.000
2018-06-20 01:00:00.000
2018-06-20 02:00:00.000
2018-06-20 03:00:00.000
2018-06-20 04:00:00.000
(...)
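The gap-filling join mentioned above might look like the following sketch. The table dbo.Events and its EventTime column are hypothetical stand-ins for whatever is being counted; dbo.Sequence is the function defined above.

```sql
-- Hypothetical table: dbo.Events(EventTime datetime). Names are illustrative only.
DECLARE @day datetime = '20180620';

SELECT DATEADD(hh, s.ID, @day) AS HourStart,
       COUNT(e.EventTime)      AS EventCount   -- 0 for hours with no events
FROM dbo.Sequence(0, 23) AS s
LEFT JOIN dbo.Events AS e
       ON e.EventTime >= DATEADD(hh, s.ID, @day)
      AND e.EventTime <  DATEADD(hh, s.ID + 1, @day)
GROUP BY s.ID
ORDER BY s.ID;
```

Putting the sequence on the left side of the LEFT JOIN is what guarantees one output row per hour, even when no events fall in that hour.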
Performance is uninspiring (16 seconds for a million rows) but good enough for many purposes.
SELECT count(1) FROM [dbo].[Sequence] (
1000001
,2000000)
GO
Oracle 12c; Quick but limited:
select rownum+1000 from all_objects fetch first 50 rows only;
Note: limited to the row count of the all_objects view.
The solution I've developed and used for quite some time now (riding some on the shared works of others) is slightly similar to at least one posted. It doesn't reference any tables and returns an unsorted range of up to 1048576 values (2^20) and can include negatives if desired. You can of course sort the result if necessary. It runs pretty quickly, especially on smaller ranges.
Select value from dbo.intRange(-500, 1500) order by value -- returns 2001 values
create function dbo.intRange
(
@Starting as int,
@Ending as int
)
returns table
as
return (
select value
from (
select @Starting +
( bit00.v | bit01.v | bit02.v | bit03.v
| bit04.v | bit05.v | bit06.v | bit07.v
| bit08.v | bit09.v | bit10.v | bit11.v
| bit12.v | bit13.v | bit14.v | bit15.v
| bit16.v | bit17.v | bit18.v | bit19.v
) as value
from (select 0 as v union ALL select 0x00001 as v) as bit00
cross join (select 0 as v union ALL select 0x00002 as v) as bit01
cross join (select 0 as v union ALL select 0x00004 as v) as bit02
cross join (select 0 as v union ALL select 0x00008 as v) as bit03
cross join (select 0 as v union ALL select 0x00010 as v) as bit04
cross join (select 0 as v union ALL select 0x00020 as v) as bit05
cross join (select 0 as v union ALL select 0x00040 as v) as bit06
cross join (select 0 as v union ALL select 0x00080 as v) as bit07
cross join (select 0 as v union ALL select 0x00100 as v) as bit08
cross join (select 0 as v union ALL select 0x00200 as v) as bit09
cross join (select 0 as v union ALL select 0x00400 as v) as bit10
cross join (select 0 as v union ALL select 0x00800 as v) as bit11
cross join (select 0 as v union ALL select 0x01000 as v) as bit12
cross join (select 0 as v union ALL select 0x02000 as v) as bit13
cross join (select 0 as v union ALL select 0x04000 as v) as bit14
cross join (select 0 as v union ALL select 0x08000 as v) as bit15
cross join (select 0 as v union ALL select 0x10000 as v) as bit16
cross join (select 0 as v union ALL select 0x20000 as v) as bit17
cross join (select 0 as v union ALL select 0x40000 as v) as bit18
cross join (select 0 as v union ALL select 0x80000 as v) as bit19
) intList
where @Ending - @Starting < 0x100000
and intList.value between @Starting and @Ending
)
;WITH u AS (
SELECT Unit FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(Unit)
),
d AS (
SELECT
(Thousands+Hundreds+Tens+Units) V
FROM
(SELECT Thousands = Unit * 1000 FROM u) Thousands
,(SELECT Hundreds = Unit * 100 FROM u) Hundreds
,(SELECT Tens = Unit * 10 FROM u) Tens
,(SELECT Units = Unit FROM u) Units
WHERE
(Thousands+Hundreds+Tens+Units) <= 10000
)
SELECT * FROM d ORDER BY v
I made the below function after reading this thread. Simple and fast:
go
create function numbers(@begin int, @len int)
returns table as return
with d as (
select 1 v from (values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(v)
)
select top (@len) @begin -1 + row_number() over(order by (select null)) v
from d d0
cross join d d1
cross join d d2
cross join d d3
cross join d d4
cross join d d5
cross join d d6
cross join d d7
go
select * from numbers(987654321,500000)

MS SQL - User Defined Function - Slope Intercept RSquare ; How to Group by Portfolio

Below is an example of my data, table RR_Linest:
Portfolio   Month_number   Collections
A           1              $100
A           2              $90
A           3              $80
A           4              $70
B           1              $100
B           2              $90
B           3              $80
I was able to figure out how to get the slope, intercept, and R-square for one portfolio by removing the portfolio column, selecting only the month_number (x) and collections data (y) for a single portfolio (I removed the data for portfolio B), and running the code below.
I have been trying to change the function so that, when I run it, it gives me the slope, intercept, and R-square by portfolio. Does someone know how to do that? I have tried many ways and I just can't figure it out.
First I created the function:
declare @RegressionInput_A [dbo].[RegressionInput_A]
insert into @RegressionInput_A (x,y)
select
([model month]),log([collection $])
from [dbo].[RR_Linest]
select * from [dbo].LinearRegression_A
GO
drop function dbo.LinearRegression_A
GO
-- CREATE FUNCTION must be the first statement in its batch, hence the GO above
CREATE FUNCTION dbo.LinearRegression_A
(
@RegressionInputs_A AS dbo.RegressionInput_A READONLY
)
RETURNS @RegressionOutput_A TABLE
(
Slope DECIMAL(18, 6),
Intercept DECIMAL(18, 6),
RSquare DECIMAL(18, 6)
)
AS
BEGIN
DECLARE @Xaverage AS DECIMAL(18, 6)
DECLARE @Yaverage AS DECIMAL(18, 6)
DECLARE @slope AS DECIMAL(18, 6)
DECLARE @intercept AS DECIMAL(18, 6)
DECLARE @rSquare AS DECIMAL(18, 6)
SELECT
@Xaverage = AVG(x),
@Yaverage = AVG(y)
FROM
@RegressionInputs_A
SELECT
@slope = SUM((x - @Xaverage) * (y - @Yaverage))/SUM(POWER(x - @Xaverage, 2))
FROM
@RegressionInputs_A
SELECT
@intercept = @Yaverage - (@slope * @Xaverage)
SELECT @rSquare = 1 - (SUM(POWER(y - (@intercept + @slope * x), 2))/(SUM(POWER(y - (@intercept + @slope * x), 2)) + SUM(POWER(((@intercept + @slope * x) - @Yaverage), 2))))
FROM
@RegressionInputs_A
INSERT INTO
@RegressionOutput_A
(
Slope,
Intercept,
RSquare
)
SELECT
@slope,
@intercept,
@rSquare
RETURN
END
GO
Then I run the function
declare @RegressionInput_A [dbo].[RegressionInput_A]
insert into @RegressionInput_A (x,y)
select
([model month]),log([collection $])
from [dbo].[RR_Linest]
select * from [dbo].[LinearRegression_A](@RegressionInput_A)
Wow, this is a really cool example of how to use nested CTEs in an inline table-valued function (ITVF). You want to use an ITVF since they are fast. See Wayne Sheffield's blog article that attests to this fact.
I always start with a sample database/table if the problem is really complicated, to make sure I give the user a correct solution.
Let's create a database named [test] based on model.
--
-- Create a simple db
--
-- use master
use master;
go
-- delete existing databases
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Test')
DROP DATABASE Test
GO
-- simple db based on model
create database Test;
go
-- switch to new db
use [Test];
go
Let's create a table type named [InputToLinearReg].
--
-- Create table type to pass data
--
-- Delete the existing table type
IF EXISTS (SELECT * FROM sys.systypes WHERE name = 'InputToLinearReg')
DROP TYPE dbo.InputToLinearReg
GO
-- Create the table type
CREATE TYPE InputToLinearReg AS TABLE
(
portfolio_cd char(1),
month_num int,
collections_amt money
);
go
Okay, here is the multi-layered SELECT statement that uses CTEs. The query optimizer treats an inline function as an ordinary SQL statement, which can be executed in parallel, whereas a regular multi-statement function can't. See the black box section of Wayne's article.
--
-- Create in line table value function (fast)
--
-- Remove if it exists
IF OBJECT_ID('CalculateLinearReg') > 0
DROP FUNCTION CalculateLinearReg
GO
-- Create the function
CREATE FUNCTION CalculateLinearReg
(
@ParmInTable AS dbo.InputToLinearReg READONLY
)
RETURNS TABLE
AS
RETURN
(
WITH cteRawData as
(
SELECT
T.portfolio_cd,
CAST(T.month_num as decimal(18, 6)) as x,
LOG(CAST(T.collections_amt as decimal(18, 6))) as y
FROM
@ParmInTable as T
),
cteAvgByPortfolio as
(
SELECT
portfolio_cd,
AVG(x) as xavg,
AVG(y) as yavg
FROM
cteRawData
GROUP BY
portfolio_cd
),
cteSlopeByPortfolio as
(
SELECT
R.portfolio_cd,
SUM((R.x - A.xavg) * (R.y - A.yavg)) / SUM(POWER(R.x - A.xavg, 2)) as slope
FROM
cteRawData as R
INNER JOIN
cteAvgByPortfolio A
ON
R.portfolio_cd = A.portfolio_cd
GROUP BY
R.portfolio_cd
),
cteInterceptByPortfolio as
(
SELECT
A.portfolio_cd,
(A.yavg - (S.slope * A.xavg)) as intercept
FROM
cteAvgByPortfolio as A
INNER JOIN
cteSlopeByPortfolio S
ON
A.portfolio_cd = S.portfolio_cd
)
SELECT
A.portfolio_cd,
A.xavg,
A.yavg,
S.slope,
I.intercept,
1 - (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) /
(SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) +
SUM(POWER(((I.intercept + S.slope * R.x) - A.yavg), 2)))) as rsquared
FROM
cteRawData as R
INNER JOIN
cteAvgByPortfolio as A ON R.portfolio_cd = A.portfolio_cd
INNER JOIN
cteSlopeByPortfolio S ON A.portfolio_cd = S.portfolio_cd
INNER JOIN
cteInterceptByPortfolio I ON S.portfolio_cd = I.portfolio_cd
GROUP BY
A.portfolio_cd,
A.xavg,
A.yavg,
S.slope,
I.intercept
);
Last but not least, set up a table variable and get the answers. Unlike your solution above, it groups by portfolio ID.
-- Load data into variable
DECLARE @InTable AS InputToLinearReg;
-- insert data
insert into @InTable
values
('A', 1, 100.00),
('A', 2, 90.00),
('A', 3, 80.00),
('A', 4, 70.00),
('B', 1, 100.00),
('B', 2, 90.00),
('B', 3, 80.00);
-- show data
select * from CalculateLinearReg(@InTable)
go
Here is a picture of the results using your data.
CREATE FUNCTION dbo.LinearRegression
(
@RegressionInputs AS dbo.RegressionInput READONLY
)
RETURNS TABLE AS
RETURN
(
WITH
t1 AS ( --calculate averages
SELECT portfolio, x, y,
AVG(x) OVER(PARTITION BY portfolio) Xaverage,
AVG(y) OVER(PARTITION BY portfolio) Yaverage
FROM @RegressionInputs
),
t2 AS ( --calculate slopes
SELECT portfolio, Xaverage, Yaverage,
SUM((x - Xaverage) * (y - Yaverage))/SUM(POWER(x - Xaverage, 2)) slope
FROM t1
GROUP BY portfolio, Xaverage, Yaverage
),
t3 AS ( --calculate intercepts
SELECT portfolio, slope,
(Yaverage - (slope * Xaverage) ) AS intercept
FROM t2
),
t4 AS ( --calculate rSquare
SELECT t1.portfolio, slope, intercept,
1 - (SUM(POWER(y - (intercept + slope * x), 2))/(SUM(POWER(y - (intercept + slope * x), 2)) + SUM(POWER(((intercept + slope * x) - Yaverage), 2)))) AS rSquare
FROM t1
INNER JOIN t3 ON (t1.portfolio = t3.portfolio)
GROUP BY t1.portfolio, slope, intercept --non-aggregated columns in the SELECT list must be grouped
)
SELECT portfolio, slope, intercept, rSquare FROM t4
)
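As a cross-check on the functions above, slope and intercept per portfolio can also be computed in a single GROUP BY from plain sums, using the closed-form least-squares formulas. This is a sketch rather than a replacement: it uses the raw collections values instead of their logarithm, inlines the question's sample data, and omits R-square.

```sql
WITH d(portfolio, x, y) AS (
    -- Sample data from the question; cast to float to avoid integer division.
    SELECT portfolio, CAST(x AS float), CAST(y AS float)
    FROM (VALUES ('A', 1, 100), ('A', 2, 90), ('A', 3, 80), ('A', 4, 70),
                 ('B', 1, 100), ('B', 2, 90), ('B', 3, 80)) AS v(portfolio, x, y)
),
s AS (
    SELECT portfolio,
           COUNT(*)   AS n,
           SUM(x)     AS sx,
           SUM(y)     AS sy,
           SUM(x * y) AS sxy,
           SUM(x * x) AS sxx
    FROM d
    GROUP BY portfolio
)
SELECT portfolio,
       -- slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); NULL when all x are equal
       (n * sxy - sx * sy) / NULLIF(n * sxx - sx * sx, 0) AS slope,
       -- intercept = (Sy - slope*Sx) / n
       (sy - (n * sxy - sx * sy) / NULLIF(n * sxx - sx * sx, 0) * sx) / n AS intercept
FROM s;
```

For this sample data both portfolios come out with slope -10 and intercept 110, since collections fall by exactly 10 per month from 100 at month 1.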

How can I calculate the distance between 2 rows' points in a set of results?

I have a query that takes a LINESTRING and converts it to a result set of POINTS.
What I can't figure out is how to find the distance between 2 specific row points in this result set.
This is what I have so far:
DECLARE @GeographyToConvert geography
SET @GeographyToConvert = geography::STGeomFromText('LINESTRING (26.6434033 -81.7097817, 26.6435367 -81.709785, 26.6435783 -81.7098033, 26.6436067 -81.709825, 26.6435883 -81.709875, 26.64356 -81.7100417, 26.6434417 -81.710125, 26.6433167 -81.7101467, 26.643195 -81.7101033, 26.6431533 -81.7099517, 26.643175 -81.7097867, 26.643165 -81.7097917, 26.6431633 -81.7097367, 26.6431583 -81.7097083)',4326);
WITH GeographyPoints(N, Point) AS
(
SELECT 1, @GeographyToConvert.STPointN(1)
UNION ALL
SELECT N + 1, @GeographyToConvert.STPointN(N + 1)
FROM GeographyPoints GP
WHERE N < @GeographyToConvert.STNumPoints()
)
SELECT N,Point.STBuffer(0.25) as point, Point.STAsText() FROM GeographyPoints
For example, how can I compare the distance between N=10 & N=11?
This is what I was trying, but it does not work:
Declare @Point1 geography;
Declare @Point2 geography;
DECLARE @GeographyToConvert geography
--SET @GeometryToConvert = (select top 1 geotrack from dbo.SYNCTESTING2 where geotrack is not null);
SET @GeographyToConvert = geography::STGeomFromText('LINESTRING (26.6434033 -81.7097817, 26.6435367 -81.709785, 26.6435783 -81.7098033, 26.6436067 -81.709825, 26.6435883 -81.709875, 26.64356 -81.7100417, 26.6434417 -81.710125, 26.6433167 -81.7101467, 26.643195 -81.7101033, 26.6431533 -81.7099517, 26.643175 -81.7097867, 26.643165 -81.7097917, 26.6431633 -81.7097367, 26.6431583 -81.7097083)',4326);
WITH GeographyPoints(N, Point) AS
(
SELECT 1, @GeographyToConvert.STPointN(1)
UNION ALL
SELECT N + 1, @GeographyToConvert.STPointN(N + 1)
FROM GeographyPoints GP
WHERE N < @GeographyToConvert.STNumPoints()
)
SELECT N,Point.STBuffer(0.25) as point, Point.STAsText() FROM GeographyPoints
select @Point1 = Point FROM GeometryPoints where N = 10;
select @Point2 = Point FROM GeometryPoints where N = 11
select @Point1.STDistance(@Point2) as [Distance in Meters]
Replace
SELECT N,Point.STBuffer(0.25) as point, Point.STAsText() FROM GeographyPoints
With
SELECT * INTO #GeographyPoints FROM GeographyPoints
DECLARE @N1 INT = 10
DECLARE @N2 INT = 11
SELECT (SELECT Point FROM #GeographyPoints WHERE N=@N1).STDistance(
(SELECT Point FROM #GeographyPoints WHERE N=@N2))
DROP TABLE #GeographyPoints
Then just change the values of @N1 and @N2 as necessary.
Is this what you're looking for? Distance to the previous point?
DECLARE @GeographyToConvert geography
SET @GeographyToConvert = geography::STGeomFromText('LINESTRING (26.6434033 -81.7097817, 26.6435367 -81.709785, 26.6435783 -81.7098033, 26.6436067 -81.709825, 26.6435883 -81.709875, 26.64356 -81.7100417, 26.6434417 -81.710125, 26.6433167 -81.7101467, 26.643195 -81.7101033, 26.6431533 -81.7099517, 26.643175 -81.7097867, 26.643165 -81.7097917, 26.6431633 -81.7097367, 26.6431583 -81.7097083)',4326);
WITH GeographyPoints(N, Point, PreviousPoint, DistanceFromPrevious) AS
(
SELECT 1, @GeographyToConvert.STPointN(1), CAST(NULL AS GEOGRAPHY), CAST(0 AS Float)
UNION ALL
SELECT N + 1, @GeographyToConvert.STPointN(N + 1)
, @GeographyToConvert.STPointN(N)
, @GeographyToConvert.STPointN(N).STDistance(@GeographyToConvert.STPointN(N + 1))
FROM GeographyPoints GP
WHERE N < @GeographyToConvert.STNumPoints()
)
SELECT N,Point.STBuffer(0.25) as point, Point.STAsText(), PreviousPoint, DistanceFromPrevious FROM GeographyPoints