I have data in Redshift that I'm aggregating to the Year-Quarter level, i.e. the number of items per Year-Quarter.
I need to show a continuous trend, so I need to fill in the gaps in Year-Quarter. The picture below should give a clearer idea of my current data and desired output.
How can I achieve this in Redshift SQL?
A query like this should do the trick:
create table test (yq int, items int);
INSERT INTO test Values (20201,10),(20204, 15),(20213, 25),(20222, 30);
with recursive quarters(q) as (
select min(yq) as q
from test
union all
select decode(right(q::text, 1), 4, q + 7, q + 1) as q
from quarters
where q < (select max(yq) from test)
)
select q as yq, decode(items is null, true,
lag(items ignore nulls) over (order by q), items) as items
from test t
right join quarters q
on t.yq = q.q
order by q;
It uses a recursive CTE to generate the quarters range needed, right joins this with the source data, and then uses a LAG() window function to populate the items if the value is NULL.
This is known as forward filling values:
CREATE TABLE #Temp
(
[YQ] nvarchar(5),
[items] int
)
INSERT INTO #Temp Values ('20201',10),('20204', 15),('20213', 25),('20222', 30)
---------------------------------------------------------------------------------
DECLARE @start int, @end int, @starty int, @endy int
SELECT @start=1, @end=4
-- First and last year present in the data (first four characters of YQ)
SELECT @starty=MIN(Substring(YQ,0,5)), @endy=MAX(Substring(YQ,0,5)) from #Temp
;With cte1(y) as
(
Select @starty as y
union all
Select y + 1
from cte1
where y < @endy
)
, cte2(n) as
(
Select @start as n
union all
Select n + 1
from cte2
where n < @end
)
SELECT t1.YQ AS 'Year-Quarter',
-- Forward fill: take the most recent earlier quarter that has a value
CASE WHEN t2.items is null then (SELECT TOP 1 items from #Temp WHERE items is not null and YQ < t1.YQ ORDER BY YQ DESC) ELSE t2.items END AS '# Items'
FROM
(
SELECT CAST(cte1.y AS nvarchar(4)) + CAST(cte2.n AS nvarchar(1)) AS YQ
FROM cte1, cte2
) t1
LEFT JOIN #Temp t2 ON t2.YQ = t1.YQ
WHERE t1.YQ <= (SELECT MAX(YQ) FROM #Temp)
ORDER BY t1.YQ, t2.items
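The expected output of both queries can be cross-checked outside the database. The following is only a minimal Python sketch of the same idea (generate every quarter between the first and last one, then carry the last known value forward); the function names `next_quarter` and `fill_quarters` are made up for illustration, and the YYYYQ integer encoding is assumed from the sample data.

```python
def next_quarter(yq):
    """Return the quarter after yq, where yq is encoded as YYYYQ (e.g. 20204)."""
    year, quarter = divmod(yq, 10)
    # After Q4 roll over to Q1 of the next year, otherwise just add one.
    return (year + 1) * 10 + 1 if quarter == 4 else yq + 1


def fill_quarters(rows):
    """Generate every quarter from the first to the last and forward-fill items.

    rows is a dict mapping YYYYQ -> items for the quarters that have data.
    """
    filled = []
    last_items = None
    yq = min(rows)
    end = max(rows)
    while yq <= end:
        if yq in rows:
            last_items = rows[yq]  # a real value: remember it
        filled.append((yq, last_items))  # gaps reuse the last real value
        yq = next_quarter(yq)
    return filled


sample = {20201: 10, 20204: 15, 20213: 25, 20222: 30}
result = fill_quarters(sample)
for yq, items in result:
    print(yq, items)
```

Comparing this list with the query result is a quick way to confirm that the year rollover (Q4 to Q1) and the forward fill behave as intended.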
I have a very long text string being imported into a table. I would like to split the string up; I have a routine to pull the data into a table, but it creates all the data in a single field in a table.
Example Text:
05/10/2018 21:14,#FXAAF00123456,,Cup 1 X Plane,0.00000,OK,Cup 1 Y Plane,0.00000,OK,Cup 1 Z Plane,40.64252,OK,Cup 2 X Plane,77.89434,OK,..etc
(The test string is much longer than this, in the region of 1500-1700 characters, but with the same structure in the rest of the string).
This data is a series of test measurements, with the name of the value, the value, and the OK/NOK indicator.
I want the results to be stored in a table (variable) with three fields, so the data above becomes:
Field1|Field2|Field3
05/10/2018 21:14|#FXAAF00123456|null|
Cup 1 X Plane|0.00000|OK|
Cup 1 Y Plane|0.00000|OK|
Cup 1 Z Plane|40.64252|OK|
Cup 2 X Plane|77.89434|OK|
...etc
I am using this function to split the string into a table variable:
CREATE FUNCTION [dbo].[fnSplitString]
(
@InputString NVARCHAR(MAX),
@Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT [Value] FROM
(
SELECT
[Value] = LTRIM(RTRIM(SUBSTRING(@InputString, [Number],
CHARINDEX(@Delim, @InputString + @Delim, [Number]) - [Number])))
FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@InputString)
AND SUBSTRING(@Delim + @InputString, [Number], LEN(@Delim)) = @Delim
) AS y
);
How can this be modified to give the output required above?
You can try this tiny inline splitting approach.
DECLARE @s VARCHAR(MAX)='05/10/2018 21:14,#FXAAF00123456,,Cup 1 X Plane,0.00000,OK,Cup 1 Y Plane,0.00000,OK,Cup 1 Z Plane,40.64252,OK,Cup 2 X Plane,77.89434,OK';
;WITH
a AS (SELECT n=0, i=-1, j=0 UNION ALL SELECT n+1, j, CAST(CHARINDEX(',', @s, j+1) AS INT) FROM a WHERE j > i)
,b AS (SELECT n, SUBSTRING(@s, i+1, IIF(j>0, j, LEN(@s)+1)-i-1) s FROM a WHERE i >= 0)
,c AS (SELECT n,(n-1) % 3 AS Position,(n-1)/3 AS RowIndex,s FROM b)
SELECT MAX(CASE WHEN Position=0 THEN s END) AS part1
,MAX(CASE WHEN Position=1 THEN s END) AS part2
,MAX(CASE WHEN Position=2 THEN s END) AS part3
FROM c
GROUP BY RowIndex
OPTION (MAXRECURSION 0);
The result
part1|part2|part3
05/10/2018 21:14|#FXAAF00123456|
Cup 1 X Plane|0.00000|OK
Cup 1 Y Plane|0.00000|OK
Cup 1 Z Plane|40.64252|OK
Cup 2 X Plane|77.89434|OK
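The index arithmetic in CTE c, (n-1) % 3 for the column and (n-1) / 3 for the output row, can be checked with a few lines of Python. This is only an illustrative sketch; `split_into_rows` is a hypothetical helper, not part of the SQL solution, and it yields an empty string where SQL would show a blank field.

```python
def split_into_rows(text, delimiter=",", width=3):
    """Split text on delimiter and regroup the fragments into rows of `width` fields."""
    rows = {}
    for n, part in enumerate(text.split(delimiter), start=1):
        # Same arithmetic as the SQL: integer division gives the row,
        # the remainder gives the position inside the row.
        row_index, position = divmod(n - 1, width)
        rows.setdefault(row_index, [None] * width)[position] = part
    return [tuple(rows[i]) for i in sorted(rows)]


sample = ("05/10/2018 21:14,#FXAAF00123456,,"
          "Cup 1 X Plane,0.00000,OK,Cup 1 Y Plane,0.00000,OK")
table = split_into_rows(sample)
for row in table:
    print(row)
```

If the number of fragments is not a multiple of three, the last row keeps `None` in the missing positions, which mirrors the NULLs that MAX(CASE ...) would return.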
Hint
You might change your splitter function to the recursive approach above. On the one hand, your function is limited to a string length equal to the row count of sys.all_objects, which might be smaller than your input. On the other hand, your approach has to test every single position, while the recursive approach hops from delimiter to delimiter, so it should be faster.
This could easily be extended to a multi-character delimiter if needed.
UPDATE: another approach without recursion
Recursion makes the first approach clumsy to use in a splitter function, because OPTION (MAXRECURSION 0) must be placed at the end of the query and cannot live within the function. Try this out:
;WITH
a(Casted) AS (SELECT CAST('<x>' + REPLACE((SELECT @s AS [*] FOR XML PATH('')),',','</x><x>') + '</x>' AS XML))
,b(s,RowIndex,Position) AS
(
SELECT x.value(N'text()[1]','nvarchar(max)')
,(ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1) /3
,(ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1) %3
FROM a
CROSS APPLY Casted.nodes(N'/x') X(x)
)
SELECT RowIndex
,MAX(CASE WHEN Position=0 THEN s END) AS part1
,MAX(CASE WHEN Position=1 THEN s END) AS part2
,MAX(CASE WHEN Position=2 THEN s END) AS part3
FROM b
GROUP BY RowIndex;
Hint:
Using (SELECT @s AS [*] FOR XML PATH('')) makes this approach safe for characters that are forbidden in XML (such as <, > and &), because they get escaped.
This requires a small modification to your fnSplitString function: add a RowNo column to identify the original sequence of the delimited items.
CREATE FUNCTION [dbo].[fnSplitString]
(
@InputString NVARCHAR(MAX),
@Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT RowNo, [Value] FROM
(
SELECT RowNo = ROW_NUMBER() OVER (ORDER BY Number),
[Value] = LTRIM(RTRIM(SUBSTRING(@InputString, [Number],
CHARINDEX(@Delim, @InputString + @Delim, [Number]) - [Number])))
FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@InputString)
AND SUBSTRING(@Delim + @InputString, [Number], LEN(@Delim)) = @Delim
) AS y
);
With that, you can group every 3 rows into one. The RowNo can also be used to identify the column.
The query
; with tbl as
(
select col = '05/10/2018 21:14,#FXAAF00123456,,Cup 1 X Plane,0.00000,OK,Cup 1 Y Plane,0.00000,OK,Cup 1 Z Plane,40.64252,OK,Cup 2 X Plane,77.89434,OK'
)
select Field1 = MAX(CASE WHEN (RowNo - 1) % 3 = 0 THEN Value END),
Field2 = MAX(CASE WHEN (RowNo - 1) % 3 = 1 THEN Value END),
Field3 = MAX(CASE WHEN (RowNo - 1) % 3 = 2 THEN Value END)
from tbl t
cross apply dbo.fnSplitString (t.col, ',')
group by (RowNo - 1) / 3
Try the following script after you create the SQL split function given in the reference document.
That split function returns the position of each string fragment, and that information is used to build the row data.
declare @str nvarchar(max) = '05/10/2018 21:14,#FXAAF00123456,,Cup 1 X Plane,0.00000,OK,Cup 1 Y Plane,0.00000,OK,Cup 1 Z Plane,40.64252,OK,Cup 2 X Plane,77.89434,OK'
-- Step 1: assign each fragment a row number (rn) and a column
select
floor((id-1) / 3)+1 rn,
case when id % 3 = 1 then val end Field1,
case when id % 3 = 2 then val end Field2,
case when id % 3 = 0 then val end Field3
from dbo.Split(@str,',')
-- Step 2: collapse the three fragments of each rn into a single row
select
rn,
max(Field1) Field1,
max(Field2) Field2,
max(Field3) Field3
from (
select
floor((id-1) / 3)+1 rn,
case when id % 3 = 1 then val end Field1,
case when id % 3 = 2 then val end Field2,
case when id % 3 = 0 then val end Field3
from dbo.Split(@str,',')
) t
group by rn
I've added a function to my DB that splits a comma separated string into separate rows.
Now in my string I have: 1,55,2,56,3,57,etc... where (1) is the rowID and (55) the value I want to enter into row 1 of my table.
How can I modify this function to pull the 1st,3rd,5th,etc... values and 2nd,4th,6th,etc... values into two different columns?
CREATE FUNCTION dbo.SplitStringToValues
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
E4(N) AS (SELECT 1 FROM E2 a, E2 b),
E42(N) AS (SELECT 1 FROM E4 a, E2 b),
cteTally(N) AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
FROM cteStart s;
go
-------------- Update
Thanks everyone for your examples. I'm going to try out each of these until I get something working. I will accept an answer once I figure out which one I can make work.
Thank you,
Alexp
Here is an attempt using a batch script; please try it out:
DECLARE @List NVARCHAR(MAX) = '1,55,2,56,3,57,10,65,11,88';
DECLARE @Delimiter NVARCHAR(255) = ',';
DECLARE @ListDataTable TABLE
(
ID INT IDENTITY (1, 1)
,DataKey INT
,DataValue INT
)
INSERT INTO @ListDataTable (DataKey, DataValue)
SELECT
value
,LEAD(value, 1, 0) OVER(ORDER BY (SELECT 1))
FROM STRING_SPLIT(@List, @Delimiter) WHERE RTRIM(value) <> '';
-- To get odd key values
SELECT * FROM
(
SELECT DataKey, DataValue FROM @ListDataTable WHERE ID % 2 = 1
) Temp WHERE DataKey % 2 = 1;
-- To get even key values
SELECT * FROM
(
SELECT DataKey, DataValue FROM @ListDataTable WHERE ID % 2 = 1
) Temp WHERE DataKey % 2 = 0;
Modify your function to return two columns: the position and the value. This is easy enough and keeps the function general purpose. Just change the select to:
SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter, @List, s.N1), 0) - s.N1, 8000)),
ItemNum = row_number() over (order by s.N1)
FROM cteStart s;
Then you can use it to get the information you want. Here is one method:
select max(case when ItemNum % 2 = 1 then Item end) as rownum,
max(case when ItemNum % 2 = 0 then Item end) as value
from dbo.SplitStringToValues('1,55,2,56,3,57', ',')
group by (ItemNum - 1) / 2
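To see why grouping by (ItemNum - 1) / 2 pairs each key with its value, here is a small Python sketch of the same arithmetic. It is only an illustration under the assumption that items are numbered from 1 in list order; `split_pairs` is a hypothetical name.

```python
def split_pairs(text, delimiter=","):
    """Pair up alternating items: odd positions are row ids, even positions are values."""
    pairs = {}
    for item_num, item in enumerate(text.split(delimiter), start=1):
        group = (item_num - 1) // 2          # items 1,2 -> 0; items 3,4 -> 1; ...
        slot = "rownum" if item_num % 2 == 1 else "value"
        pairs.setdefault(group, {})[slot] = item
    # A trailing key without a value ends up with None, like a NULL from MAX(CASE ...).
    return [(p.get("rownum"), p.get("value")) for _, p in sorted(pairs.items())]


pairs = split_pairs("1,55,2,56,3,57")
print(pairs)
```

Note that the items stay strings here, just as Item is a string in the split function; convert them if you need integers.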
@Macwise was on to something with LEAD. You could do this:
SELECT rownum = item, value
FROM
(
SELECT itemnumber, item, value = LEAD(item,1) OVER (ORDER BY itemnumber)
FROM dbo.SplitStringToValues('1,44,2,55,3,456,4,123,5,0', ',')
) split
WHERE 1 = itemnumber%2;
Gordon's solution is the best, most elegant pre-2012 solution. Here's another pre-2012 solution that does not require a sort in the execution plan:
SELECT rownum = s1.Item, value = s2.Item
FROM DelimitedSplit8K(@string, ',') s1
INNER MERGE JOIN SplitStringToValues('1,44,2,55,3,456,4,123,5,0', ',') s2
ON 1 = s1.itemNumber % 2 AND s1.ItemNumber = s2.ItemNumber-1;
Instead of changing that function, you can get the next row's value next to the id with the LEAD function, introduced in SQL Server 2012:
SELECT Id, Value
FROM (SELECT
ROW_NUMBER() over (order by(select 1)) as cnt,
t.item AS Id,
Lead(t.item)
OVER (
ORDER BY (SELECT 1)) Value
FROM dbo.Splitstringtovalues('10,20,30,40,50,10,20,30,40,50,60,70', ',')
t)
keyValue
WHERE keyValue.value IS NOT NULL
and cnt % 2 = 1
ACT and CAT are anagrams
I have to write a function in SQL Server that takes two strings and returns a Boolean indicating whether they are anagrams of each other.
It doesn't make much sense to do this in SQL Server, but it is for learning purposes only.
SQL Server is not good at this kind of thing, but here you go:
WITH Src AS
(
SELECT * FROM (VALUES
('CAT', 'ACT'),
('CAR', 'RAC'),
('BUZ', 'BUS'),
('FUZZY', 'MUZZY'),
('PACK', 'PACKS'),
('AA', 'AA'),
('ABCDEFG', 'GFEDCBA')) T(W1, W2)
), Numbered AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) Num
FROM Src
), Splitted AS
(
SELECT Num, W1 Word1, W2 Word2, LEFT(W1, 1) L1, LEFT(W2, 1) L2, SUBSTRING(W1, 2, LEN(W1)) W1, SUBSTRING(W2, 2, LEN(W2)) W2
FROM Numbered
UNION ALL
SELECT Num, Word1, Word2, LEFT(W1, 1) L1, LEFT(W2, 1) L2, SUBSTRING(W1, 2, LEN(W1)) W1, SUBSTRING(W2, 2, LEN(W2)) W2
FROM Splitted
WHERE LEN(W1)>0 AND LEN(W2)>0
), SplitOrdered AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Num ORDER BY L1) LNum1,
ROW_NUMBER() OVER (PARTITION BY Num ORDER BY L2) LNum2
FROM Splitted
)
SELECT S1.Num, S1.Word1, S1.Word2, CASE WHEN COUNT(*)=LEN(S1.Word1) AND COUNT(*)=LEN(S1.Word2) THEN 1 ELSE 0 END Test
FROM SplitOrdered S1
JOIN SplitOrdered S2 ON S1.L1=S2.L2 AND S1.Num=S2.Num AND S1.LNum1=S2.LNum2
GROUP BY S1.Num, S1.Word1, S1.Word2
And results:
1 CAT ACT 1
2 CAR RAC 1
3 BUZ BUS 0
4 FUZZY MUZZY 0
5 PACK PACKS 0
6 AA AA 1
7 ABCDEFG GFEDCBA 1
First split (T-SQL Split Word into characters) both words into temporary tables. Then perform an outer join and check for nulls.
Edit thanks to George's comment:
1. Split (T-SQL Split Word into characters) both words into temporary tables.
2. Modify the temporary tables, or use CTEs, to add a column with count(*) grouped by letter.
3. Perform a full outer join on the two temporary tables, using a letter and its count in the join condition.
4. Check for nulls in the output: if there are none, you have an anagram.
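The steps above can be sketched in Python to make the logic concrete. This is not the T-SQL itself, only an illustration: `Counter` stands in for the per-letter count(*), and the symmetric difference of the (letter, count) pairs plays the role of the rows that the full outer join leaves with NULLs. The function name is made up, and the comparison is case-sensitive, unlike a typical case-insensitive SQL Server collation.

```python
from collections import Counter


def is_anagram_by_counts(word1, word2):
    """Return True when both words have exactly the same letters with the same counts."""
    counts1 = Counter(word1)  # letter -> count for the first word
    counts2 = Counter(word2)  # letter -> count for the second word
    # (letter, count) pairs present on only one side correspond to the
    # NULL rows of a full outer join on letter and count.
    unmatched = set(counts1.items()) ^ set(counts2.items())
    return not unmatched


for w1, w2 in [("CAT", "ACT"), ("BUZ", "BUS"), ("PACK", "PACKS"), ("AA", "AA")]:
    print(w1, w2, is_anagram_by_counts(w1, w2))
```

Joining on the count as well as the letter matters: without it, 'AAB' and 'ABB' would match on every letter and wrongly pass.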
The first thing that comes to mind:
DECLARE @word1 nvarchar(max) = NULL,
@word2 nvarchar(max) = 'Test 1',
@i int = 0, @n int
DECLARE @table TABLE (
id int,
letter int
)
SELECT @word1 = ISNULL(LOWER(@word1),''), @word2 = ISNULL(LOWER(@word2),'')
SELECT @n = CASE WHEN LEN(@word1) > LEN(@word2) THEN LEN(@word1) ELSE LEN(@word2) END
WHILE @n > 0
BEGIN
INSERT INTO @table
SELECT 1, ASCII(SUBSTRING(@word1,@n,1))
UNION ALL
SELECT 2, ASCII(SUBSTRING(@word2,@n,1))
SET @n=@n-1
END
SELECT CASE WHEN COUNT(*) = 0 THEN 1 ELSE 0 END isAnagram
FROM (
SELECT id, letter, COUNT(letter) as c
FROM @table
WHERE id = 1
GROUP BY id, letter)as t
FULL OUTER JOIN (
SELECT id, letter, COUNT(letter) as c
FROM @table
WHERE id = 2
GROUP BY id, letter) as p
ON t.letter = p.letter and t.c =p.c
WHERE t.letter is NULL OR p.letter is null
Output:
isAnagram
0
You can also use loops in functions, and they can work fast. I am not able to get any of the other answers even close to the performance of this function:
CREATE FUNCTION IsAnagram
(
@value1 VARCHAR(255)
, @value2 VARCHAR(255)
)
RETURNS BIT
BEGIN
IF(LEN(@value1) != LEN(@value2))
RETURN 0;
DECLARE @firstChar VARCHAR(3);
WHILE (LEN(@value1) > 0)
BEGIN
SET @firstChar = CONCAT('%', LEFT(@value1, 1), '%');
IF(PATINDEX(@firstChar, @value2) > 0)
SET @value2 = STUFF(@value2, PATINDEX(@firstChar, @value2), 1, '');
ELSE
RETURN 0;
SET @value1 = STUFF(@value1, 1, 1, '');
END
RETURN (SELECT IIF(@value2 = '', 1, 0));
END
GO
SELECT dbo.IsAnagram('asd', 'asd')
--1
SELECT dbo.IsAnagram('asd', 'dsa')
--1
SELECT dbo.IsAnagram('assd', 'dsa')
--0
SELECT dbo.IsAnagram('asd', 'dssa')
--0
SELECT dbo.IsAnagram('asd', 'asd')
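For readers who want to trace the loop outside SQL Server, here is a rough Python equivalent of the same remove-as-you-go idea. It is a sketch, not a port: `str.find` matches the character literally, whereas PATINDEX interprets characters such as %, _ and [ as wildcards, so the two can disagree on inputs that contain those characters. The function name is invented for this example.

```python
def is_anagram_by_removal(value1, value2):
    """Consume value1 one character at a time, deleting a matching character from value2."""
    if len(value1) != len(value2):
        return False
    remaining = value2
    for ch in value1:
        pos = remaining.find(ch)  # first occurrence of ch, or -1 if absent
        if pos < 0:
            return False  # value1 has a character that value2 lacks
        remaining = remaining[:pos] + remaining[pos + 1:]  # remove that one occurrence
    return remaining == ""


for a, b in [("asd", "asd"), ("asd", "dsa"), ("assd", "dsa"), ("asd", "dssa")]:
    print(a, b, is_anagram_by_removal(a, b))
```

The early length check is what keeps the loop cheap for obviously different inputs, the same shortcut the SQL function takes.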
This is something a numbers table can help with.
Code to create and populate a small numbers table is below.
CREATE TABLE dbo.Numbers
(
Number INT PRIMARY KEY
);
WITH Ten(N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
INSERT INTO dbo.Numbers
SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) AS Number
FROM Ten T10,
Ten T100,
Ten T1000
Once that is in place you can use
SELECT W1,
W2,
IsAnagram = CASE
WHEN LEN(W1) <> LEN(W2)
THEN 0
ELSE
CASE
WHEN EXISTS (SELECT SUBSTRING(W1, Number, 1),
COUNT(*)
FROM dbo.Numbers
WHERE Number <= LEN(W1)
GROUP BY SUBSTRING(W1, Number, 1)
EXCEPT
SELECT SUBSTRING(W2, Number, 1),
COUNT(*)
FROM dbo.Numbers
WHERE Number <= LEN(W2)
GROUP BY SUBSTRING(W2, Number, 1))
THEN 0
ELSE 1
END
END
FROM (VALUES
('CAT', 'ACT'),
('CAR', 'RAC'),
('BUZ', 'BUS'),
('FUZZY', 'MUZZY'),
('PACK', 'PACKS'),
('AA', 'AA'),
('ABCDEFG', 'GFEDCBA')) T(W1, W2)
Or an alternative implementation could be
IsAnagram = CASE
WHEN LEN(W1) <> LEN(W2)
THEN 0
ELSE
CASE
WHEN EXISTS (SELECT 1
FROM dbo.Numbers N
CROSS APPLY (VALUES(1,W1),
(2,W2)) V(Col, String)
WHERE N.Number <= LEN(W1)
GROUP BY SUBSTRING(String, Number, 1)
HAVING COUNT(CASE WHEN Col = 1 THEN 1 END) <>
COUNT(CASE WHEN Col = 2 THEN 1 END))
THEN 0
ELSE 1
END
END
I have the following block of code that calculates the formula for a trend line using linear regression (method of least squares). It just finds the R-squared value and the coefficients for the X and Y axes.
This calculates the exact values if the X and Y axes are int or float.
CREATE FUNCTION [dbo].[LinearReqression] (@Data AS XML)
RETURNS TABLE AS RETURN (
WITH Array AS (
SELECT x = n.value('@x', 'float'),
y = n.value('@y', 'float')
FROM @Data.nodes('/r/n') v(n)
),
Medians AS (
SELECT xbar = AVG(x), ybar = AVG(y)
FROM Array ),
BetaCalc AS (
SELECT Beta = SUM(xdelta * (y - ybar)) / NULLIF(SUM(xdelta * xdelta), 0)
FROM Array
CROSS JOIN Medians
CROSS APPLY ( SELECT xdelta = (x - xbar) ) xd ),
AlphaCalc AS (
SELECT Alpha = ybar - xbar * beta
FROM Medians
CROSS JOIN BetaCalc),
SSCalc AS (
SELECT SS_tot = SUM((y - ybar) * (y - ybar)),
SS_err = SUM((y - (Alpha + Beta * x)) * (y - (Alpha + Beta * x)))
FROM Array
CROSS JOIN Medians
CROSS JOIN AlphaCalc
CROSS JOIN BetaCalc )
SELECT r_squared = CASE WHEN SS_tot = 0 THEN 1.0
ELSE 1.0 - ( SS_err / SS_tot ) END,
Alpha, Beta
FROM AlphaCalc
CROSS JOIN BetaCalc
CROSS JOIN SSCalc
)
Usage:
DECLARE @DataTable TABLE (
SourceID INT,
x Date,
y FLOAT
) ;
INSERT INTO @DataTable ( SourceID, x, y )
SELECT ID = 0, x = 1.2, y = 1.0
UNION ALL SELECT 1, 1.6, 1
UNION ALL SELECT 2, 2.0, 1.5
UNION ALL SELECT 3, 2.0, 1.75
UNION ALL SELECT 4, 2.1, 1.85
UNION ALL SELECT 5, 2.1, 2
UNION ALL SELECT 6, 2.2, 3
UNION ALL SELECT 7, 2.2, 3
UNION ALL SELECT 8, 2.3, 3.5
UNION ALL SELECT 9, 2.4, 4
UNION ALL SELECT 10, 2.5, 4
UNION ALL SELECT 11, 3, 4.5 ;
-- Create and view XML data array
DECLARE @DataXML XML ;
SET @DataXML = (
SELECT -- FLOAT values are formatted in XML like "1.000000000000000e+000", increasing the character count
-- Converting them to VARCHAR first keeps the XML small without sacrificing precision
-- They are unpacked as FLOAT in the function either way
[@x] = CAST(x AS VARCHAR(20)),
[@y] = CAST(y AS VARCHAR(20))
FROM @DataTable
FOR XML PATH('n'), ROOT('r') ) ;
SELECT @DataXML ;
-- Get the results
SELECT * FROM dbo.LinearReqression (@DataXML) ;
In my case, the X axis may also be a date column. How can I run the same regression analysis with date columns?
The short answer: calculating a trend line for dates is pretty much the same as calculating a trend line for floats.
For dates, you can choose some starting date and use the number of days between the starting date and each of your dates as X.
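As a quick cross-check of this idea outside SQL Server, the sketch below fits the same least-squares line in plain Python after converting each date to a day count. It is an illustration under assumptions, not the function from the question: `linear_regression` and `days_since` are names invented here, and the origin date 2001-01-01 is just one arbitrary choice.

```python
from datetime import date


def days_since(d, origin=date(2001, 1, 1)):
    """Convert a date to a float number of days after the chosen origin."""
    return float((d - origin).days)


def linear_regression(xs, ys):
    """Ordinary least squares for y = alpha + beta * x; returns (r_squared, alpha, beta)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    ss_err = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_err / ss_tot
    return r_squared, alpha, beta


dates = [date(2001, 1, d) for d in (13, 17, 21, 21, 22, 22, 23, 23, 24, 25, 26, 31)]
ys = [1.0, 1, 1.5, 1.75, 1.85, 2, 3, 3, 3.5, 4, 4, 4.5]
xs = [days_since(d) for d in dates]
r_squared, alpha, beta = linear_regression(xs, ys)
print(r_squared, alpha, beta)
```

With this encoding Beta is expressed per day. Picking a different origin shifts Alpha but leaves Beta and R-squared unchanged, which is why the choice of starting date is free.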
I didn't check your function itself and I assume that formulas there are correct.
Also, I don't understand why you generate XML out of the table and parse it back into the table inside the function. It is rather inefficient. You can simply pass the table.
I used your function to make two variants: for processing floats and for processing dates.
I'm using SQL Server 2008 for this example.
First, create a user-defined table type so that we can pass a table into the function:
CREATE TYPE [dbo].[FloatRegressionDataTableType] AS TABLE(
[x] [float] NOT NULL,
[y] [float] NOT NULL
)
GO
Then create the function that accepts such a table:
CREATE FUNCTION [dbo].[LinearRegressionFloat] (@ParamData dbo.FloatRegressionDataTableType READONLY)
RETURNS TABLE AS RETURN (
WITH Array AS (
SELECT x,
y
FROM @ParamData
),
Medians AS (
SELECT xbar = AVG(x), ybar = AVG(y)
FROM Array ),
BetaCalc AS (
SELECT Beta = SUM(xdelta * (y - ybar)) / NULLIF(SUM(xdelta * xdelta), 0)
FROM Array
CROSS JOIN Medians
CROSS APPLY ( SELECT xdelta = (x - xbar) ) xd ),
AlphaCalc AS (
SELECT Alpha = ybar - xbar * beta
FROM Medians
CROSS JOIN BetaCalc),
SSCalc AS (
SELECT SS_tot = SUM((y - ybar) * (y - ybar)),
SS_err = SUM((y - (Alpha + Beta * x)) * (y - (Alpha + Beta * x)))
FROM Array
CROSS JOIN Medians
CROSS JOIN AlphaCalc
CROSS JOIN BetaCalc )
SELECT r_squared = CASE WHEN SS_tot = 0 THEN 1.0
ELSE 1.0 - ( SS_err / SS_tot ) END,
Alpha, Beta
FROM AlphaCalc
CROSS JOIN BetaCalc
CROSS JOIN SSCalc
)
GO
Very similarly, create a type for a table with dates:
CREATE TYPE [dbo].[DateRegressionDataTableType] AS TABLE(
[x] [date] NOT NULL,
[y] [float] NOT NULL
)
GO
Then create a function that accepts such a table. For each given date, it calculates the number of days between 2001-01-01 and the given date x using DATEDIFF, and then casts the result to float to make sure that the rest of the calculations are correct (DATEDIFF returns an int, and AVG over an int column returns a truncated int). If you remove the cast to float, you'll see a different result. You can choose any other starting date; it doesn't have to be 2001-01-01.
CREATE FUNCTION [dbo].[LinearRegressionDate] (@ParamData dbo.DateRegressionDataTableType READONLY)
RETURNS TABLE AS RETURN (
WITH Array AS (
SELECT CAST(DATEDIFF(day, '2001-01-01', x) AS float) AS x,
y
FROM @ParamData
),
Medians AS (
SELECT xbar = AVG(x), ybar = AVG(y)
FROM Array ),
BetaCalc AS (
SELECT Beta = SUM(xdelta * (y - ybar)) / NULLIF(SUM(xdelta * xdelta), 0)
FROM Array
CROSS JOIN Medians
CROSS APPLY ( SELECT xdelta = (x - xbar) ) xd ),
AlphaCalc AS (
SELECT Alpha = ybar - xbar * beta
FROM Medians
CROSS JOIN BetaCalc),
SSCalc AS (
SELECT SS_tot = SUM((y - ybar) * (y - ybar)),
SS_err = SUM((y - (Alpha + Beta * x)) * (y - (Alpha + Beta * x)))
FROM Array
CROSS JOIN Medians
CROSS JOIN AlphaCalc
CROSS JOIN BetaCalc )
SELECT r_squared = CASE WHEN SS_tot = 0 THEN 1.0
ELSE 1.0 - ( SS_err / SS_tot ) END,
Alpha, Beta
FROM AlphaCalc
CROSS JOIN BetaCalc
CROSS JOIN SSCalc
)
GO
This is how to test the functions:
-- test float data
DECLARE @FloatDataTable [dbo].[FloatRegressionDataTableType];
INSERT INTO @FloatDataTable (x, y)
VALUES
(1.2, 1.0)
,(1.6, 1)
,(2.0, 1.5)
,(2.0, 1.75)
,(2.1, 1.85)
,(2.1, 2)
,(2.2, 3)
,(2.2, 3)
,(2.3, 3.5)
,(2.4, 4)
,(2.5, 4)
,(3, 4.5);
SELECT * FROM dbo.LinearRegressionFloat(@FloatDataTable);
-- test date data
DECLARE @DateDataTable [dbo].[DateRegressionDataTableType];
INSERT INTO @DateDataTable (x, y)
VALUES
('2001-01-13', 1.0)
,('2001-01-17', 1)
,('2001-01-21', 1.5)
,('2001-01-21', 1.75)
,('2001-01-22', 1.85)
,('2001-01-22', 2)
,('2001-01-23', 3)
,('2001-01-23', 3)
,('2001-01-24', 3.5)
,('2001-01-25', 4)
,('2001-01-26', 4)
,('2001-01-31', 4.5);
SELECT * FROM dbo.LinearRegressionDate(@DateDataTable);
Here are two result sets:
r_squared Alpha Beta
----------------------------------------------------------
0.798224907472009 -2.66524390243902 2.46417682926829
r_squared Alpha Beta
----------------------------------------------------------
0.79822490747201 -2.66524390243902 0.246417682926829