I'm trying to return the sum of the bottom 3 results when I call the function with 3 as the input. Currently the function returns a result set, which isn't allowed in a function. How would I fix this?
The code looks like this:
DELIMITER ++
CREATE FUNCTION function1 (input INT) returns INT
BEGIN
DECLARE amount int;
SET amount = input;
SELECT SUM(T.C1*Y.C2)
FROM Table T, YTable Y
WHERE T.ID=Y.TID
ORDER BY T.C1*Y.C2 ASC
LIMIT amount;
END++
Table T (C1, ID) has values
(20,1), (50,2), (100,3), (110, 4)
YTable Y (C2, TID) has values
(30, 1), (90, 2), (110, 3), (160,4)
Expected output would be 20*30 + 50*90 + 100*110 = 16,100
Use ROW_NUMBER() to determine which rows are the bottom results.
WITH cte as (
SELECT T.C1*Y.C2 as total,
ROW_NUMBER() OVER (ORDER BY T.C1*Y.C2) as rn
FROM Table T
JOIN YTable Y
ON T.ID = Y.TID
)
SELECT SUM(total)
FROM cte
WHERE rn <= input
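The ROW_NUMBER() approach above can be checked against the sample data from the question. This is a minimal sketch that runs the same CTE in SQLite through Python's sqlite3 module, assuming a SQLite build with window functions (3.25 or later). The table names t_table and y_table and the function name bottom_n_sum are made up for the sketch (Table is a reserved word).

```python
import sqlite3

def bottom_n_sum(n):
    """Sum the n smallest C1*C2 products, numbering rows with ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table names; the data is the sample data from the question.
    conn.executescript("""
        CREATE TABLE t_table (C1 INT, ID INT);
        CREATE TABLE y_table (C2 INT, TID INT);
        INSERT INTO t_table VALUES (20, 1), (50, 2), (100, 3), (110, 4);
        INSERT INTO y_table VALUES (30, 1), (90, 2), (110, 3), (160, 4);
    """)
    row = conn.execute("""
        WITH cte AS (
            SELECT T.C1 * Y.C2 AS total,
                   ROW_NUMBER() OVER (ORDER BY T.C1 * Y.C2) AS rn
            FROM t_table T
            JOIN y_table Y ON T.ID = Y.TID
        )
        SELECT SUM(total) FROM cte WHERE rn <= ?
    """, (n,)).fetchone()
    conn.close()
    return row[0]

print(bottom_n_sum(3))  # 600 + 4500 + 11000 = 16100
```

Because the query produces a single scalar, the same SELECT can feed a variable inside the function (SELECT ... INTO amount) instead of returning a result set.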
Construct the query using CONCAT(), prepare it, and execute it.
Related
Sample data:
create table #temp (id int, qty int, checkvalue int)
insert into #temp values (1,1,3)
insert into #temp values (2,2,3)
insert into #temp values (3,1,3)
insert into #temp values (4,1,3)
Given the data above, I would like to show, from top to bottom, exactly as many rows as are needed for sum(qty) to equal checkvalue. Note that checkvalue is always the same for all records. For the sample data above, the desired output is:
Id Qty checkValue
1 1 3
2 2 3
Because 1+2=3 and no more data is needed to show. If checkvalue was 4, we would show the third record: Id:3 Qty:1 checkValue:4 as well.
This is the code I use to handle this problem. It works well.
declare @checkValue int = (select top 1 checkvalue from #temp);
declare @counter int = 0, @sumValue int = 0;
while @sumValue < @checkValue
begin
set @counter = @counter + 1;
set @sumValue = @sumValue + (
select t.qty from
(
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY id ASC) AS rownumber,
id,qty,checkvalue
FROM #temp
) AS foo
WHERE rownumber = @counter
) t
)
end
declare @sql nvarchar(255) = 'select top '+cast(@counter as varchar(5))+' * from #temp'
EXECUTE sp_executesql @sql, N'@counter int', @counter = @counter;
However, I am not sure if this is the best way to deal with it and wonder if there is a better approach. There are many professionals here and I'd like to hear from them about what they think about my approach and how we can improve it. Any advice would be appreciated!
Try this:
select id, qty, checkvalue from (
select t1.*,
sum(t1.qty) over (partition by t2.id) [sum]
from #temp [t1] join #temp [t2] on t1.id <= t2.id
) a where checkvalue = [sum]
Smart self-join is all you need :)
For SQL Server 2012, and onwards, you can easily achieve this using ROWS BETWEEN in your OVER clause and the use of a CTE:
WITH Running AS(
SELECT *,
SUM(qty) OVER (ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningQty
FROM #temp t)
SELECT id, qty, checkvalue
FROM Running
WHERE RunningQty <= checkvalue;
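To see what the running-total query returns for the sample data, here is a small sketch that runs it in SQLite from Python. It assumes a SQLite build with window functions (3.25 or later). The table name temp_rows and the function name rows_up_to_checkvalue are made up for the sketch.

```python
import sqlite3

def rows_up_to_checkvalue(rows):
    """Return the leading rows whose running SUM(qty) stays <= checkvalue."""
    conn = sqlite3.connect(":memory:")
    # temp_rows stands in for the #temp table from the question.
    conn.execute("CREATE TABLE temp_rows (id INT, qty INT, checkvalue INT)")
    conn.executemany("INSERT INTO temp_rows VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        WITH Running AS (
            SELECT id, qty, checkvalue,
                   SUM(qty) OVER (ORDER BY id
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND CURRENT ROW) AS RunningQty
            FROM temp_rows
        )
        SELECT id, qty, checkvalue
        FROM Running
        WHERE RunningQty <= checkvalue
        ORDER BY id
    """).fetchall()
    conn.close()
    return result

sample = [(1, 1, 3), (2, 2, 3), (3, 1, 3), (4, 1, 3)]
print(rows_up_to_checkvalue(sample))  # [(1, 1, 3), (2, 2, 3)]
```

Note that the filter is <=, so if the running total jumps past checkvalue without ever hitting it exactly, the query still returns the rows below the threshold rather than nothing.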
One basic improvement is to reduce the number of iterations. You're incrementing by 1, but if you repurpose the logic behind binary search, you'd get something close to this:
DECLARE @RoughAverage int = 1 -- Some arbitrary value. The closer it is to the real average, the faster things should be.
DECLARE @CheckValue int = (SELECT TOP 1 checkvalue FROM #temp)
DECLARE @Sum int = 0
WHILE 1 = 1 -- Refer to BREAK below.
BEGIN
SELECT TOP (@RoughAverage) @Sum = SUM(qty) OVER(ORDER BY id)
FROM #temp
ORDER BY id
IF @Sum = @CheckValue
BREAK -- Indicating you reached your objective.
ELSE
SET @RoughAverage = @CheckValue - @Sum -- Most likely incomplete like this.
END
For SQL Server 2008 you can use a recursive CTE. TOP 1 WITH TIES limits the result to the first combination. Remove it to see all combinations.
with cte as (
select
*, rn = row_number() over (order by id)
from
#temp
)
, rcte as (
select
i = id, id, qty, sumV = qty, checkvalue, rn
from
cte
union all
select
a.id, b.id, b.qty, a.sumV + b.qty, a.checkvalue, b.rn
from
rcte a
join cte b on a.rn + 1 = b.rn
where
a.sumV < b.checkvalue
)
select
top 1 with ties id, qty, checkvalue
from (
select
*, needed = max(case when sumV = checkvalue then 1 else 0 end) over (partition by i)
from
rcte
) t
where
needed = 1
order by dense_rank() over (order by i)
I want to get the positive part of a number x in SQL, meaning the result is x if x > 0 and zero otherwise. I want to apply it to the result of an aggregate function.
select 1 as num, 200 as weight into #table
insert into #table values
(8, 100),
(10, 200),
(11, -300),
(20, -100);
Till now I have been using the following:
select sum(num * weight)/sum(weight) as Result,
IIf(sum(num * weight)/sum(weight)>0, sum(num * weight)/sum(weight), 0) as PositivePartResult
from #table
But this becomes hard to read as the expression gets longer. Is there a built-in function that gives the same result without repeating the formula?
Another way of writing the same query is:
select Result,
case when Result > 0 Then Result else 0 end as PositivePartResult
from
(
select sum(num * weight)/sum(weight) as Result
from #table
)T
You could either calculate the value inline or, if you'll be doing this frequently, create a user defined function:
create function PositiveValue( @N as Int )
returns Int as
begin
return ( Sign( @N ) + 1 ) / 2 * @N;
end;
go
declare @Samples as Table ( N Int );
insert into @Samples ( N ) values ( -42 ), ( -1 ), ( 0 ), ( 1 ), ( 42 );
select N, ( Sign( N ) + 1 ) / 2 * N as PositiveValue1, dbo.PositiveValue( N ) as PositiveValue2
from @Samples;
-- drop function dbo.PositiveValue;
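The expression ( Sign( N ) + 1 ) / 2 * N relies on integer division: Sign(N) + 1 is 0, 1 or 2, and dividing by 2 gives 0, 0 or 1, which then multiplies N. A quick sketch of the same arithmetic in Python, where the helper names sign and positive_part are only for illustration:

```python
def sign(n):
    """Return -1, 0 or 1, like SQL's SIGN() for integers."""
    return (n > 0) - (n < 0)

def positive_part(n):
    """Integer-only version of max(n, 0), mirroring (SIGN(N) + 1) / 2 * N."""
    # sign(n) + 1 is 0, 1 or 2; integer division by 2 maps that to 0, 0 or 1.
    return (sign(n) + 1) // 2 * n

for n in (-42, -1, 0, 1, 42):
    print(n, positive_part(n))
```

For the sample table in the question, sum(num * weight) is -2300 and sum(weight) is 100, so Result is -23 and its positive part is 0. The trick only works as written for integer types; with decimals the division by 2 no longer truncates.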
I have a table with 3 columns, one of which is [Code]. The table has many records.
I want to select the records whose [Code] values are close to 10, in order:
for example, first select the records that have [Code] = 9, then the records that have [Code] = 8, etc.
This is what I implemented based on your idea.
If you want records that are near by row number (rid) rather than by value, you only need to change a.data to a.rid in the condition.
declare @t table (data int)
insert into @t values(1), (2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(50),(51),(52)
declare @value int = 11 , @getDatToValue int = 2
select * from
(
select * , ROW_NUMBER( ) over(order by data) rid
from @t
)
a
where
a.data between (@value - @getDatToValue) and (@value + @getDatToValue)
It's hard for me to explain what I want, so the title may be unclear, but I hope I can describe it with code.
I have some data with two important values: a time t and a value f(t). It's stored in a table, for example
1 - 1000
2 - 1200
3 - 1100
4 - 1500
...
I want to plot a graph from this data, and the graph should contain N points. If the table has fewer than N rows, we just return the table. Otherwise we should group the points. For example, with N = Count/2, the example above becomes:
1 - (1000+1200)/2 = 1100
2 - (1100+1500)/2 = 1300
...
I wrote an SQL script (it works fine for N >> Count), where MonitoringDateTime is t and ResultCount is f(t):
ALTER PROCEDURE [dbo].[usp_GetRequestStatisticsData]
@ResourceTypeID bigint,
@DateFrom datetime,
@DateTo datetime,
@EstimatedPointCount int
AS
BEGIN
SET NOCOUNT ON;
SET ARITHABORT ON;
declare @groupSize int;
declare @resourceCount int;
select @resourceCount = Count(*)
from ResourceType
where ID & @ResourceTypeID > 0
SELECT d.ResultCount
,MonitoringDateTime = d.GeneratedOnUtc
,ResourceType = a.ResourceTypeID,
ROW_NUMBER() OVER(ORDER BY d.GeneratedOnUtc asc) AS Row
into #t
FROM dbo.AgentData d
INNER JOIN dbo.Agent a ON a.CheckID = d.CheckID
WHERE d.EventType = 'Result' AND
a.ResourceTypeID & @ResourceTypeID > 0 AND
d.GeneratedOnUtc between @DateFrom AND @DateTo AND
d.Result = 1
select @groupSize = Count(*) / (@EstimatedPointCount * @resourceCount)
from #t
if @groupSize = 0 -- return all points
select ResourceType, MonitoringDateTime, ResultCount
from #t
else
select ResourceType, CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
from #t
where [Row] % @groupSize = 0
group by ResourceType, [Row]
order by MonitoringDateTime
END
However, it doesn't work for N ~= Count, and it spends a lot of time on inserts.
This is why I wanted to use CTEs, but they don't work with an if/else statement.
So I worked out a formula for the group number (to use in the GROUP BY clause), because we have
GroupNumber = Count < N ? Row : Row*NumberOfGroups
where Count is the number of rows in the table, and NumberOfGroups = Count/EstimatedPointCount.
Using some trivial mathematics we get the formula
GroupNumber = Row + (Row*Count/EstimatedPointCount - Row)*MAX(Count - Count/EstimatedPointCount,0)/(Count - Count/EstimatedPointCount)
but it doesn't work because of the Count aggregate function:
Column 'dbo.AgentData.ResultCount' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
My English is very bad and I know it (and I'm trying to improve it), but hope dies last, so please advise.
Results of the query
SELECT d.ResultCount
, MonitoringDateTime = d.GeneratedOnUtc
, ResourceType = a.ResourceTypeID
FROM dbo.AgentData d
INNER JOIN dbo.Agent a ON a.CheckID = d.CheckID
WHERE d.GeneratedOnUtc between '2015-01-28' AND '2015-01-30' AND
a.ResourceTypeID & 1376256 > 0 AND
d.EventType = 'Result' AND
d.Result = 1
https://onedrive.live.com/redir?resid=58A31FC352FC3D1A!6118&authkey=!AATDebemNJIgHoo&ithint=file%2ccsv
Here's an example using NTILE and your simple sample data at the top of your question:
declare @samples table (ID int, sample int)
insert into @samples (ID,sample) values
(1,1000),
(2,1200),
(3,1100),
(4,1500)
declare @results int
set @results = 2
;With grouped as (
select *,NTILE(@results) OVER (order by ID) as nt
from @samples
)
select nt,AVG(sample) from grouped
group by nt
Which produces:
nt
-------------------- -----------
1 1100
2 1300
If @results is changed to 4 (or any higher number) then you just get back your original result set.
Unfortunately, I don't have your full data nor can I fully understand what you're trying to do with the full stored procedure, so the above would probably need to be adapted somewhat.
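For anyone unsure how NTILE assigns rows to buckets, here is a rough Python sketch of the same bucketing rule followed by the per-bucket average. The function names ntile and ntile_averages are made up for the sketch, and the integer average assumes non-negative samples (SQL Server's AVG over int also returns an int).

```python
def ntile(n_rows, buckets):
    """Return the 1-based bucket number for each of n_rows ordered rows.

    Like NTILE: rows are split as evenly as possible, and the first
    (n_rows % buckets) buckets get one extra row.
    """
    base, extra = divmod(n_rows, buckets)
    assignment = []
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        assignment.extend([b] * size)
    return assignment

def ntile_averages(samples, buckets):
    """Average the (already ordered) samples within each bucket."""
    groups = {}
    for value, b in zip(samples, ntile(len(samples), buckets)):
        groups.setdefault(b, []).append(value)
    # Integer average, assuming non-negative values.
    return [(b, sum(v) // len(v)) for b, v in sorted(groups.items())]

samples = [1000, 1200, 1100, 1500]
print(ntile_averages(samples, 2))  # [(1, 1100), (2, 1300)]
print(ntile_averages(samples, 6))  # one row per bucket: the original values
```

When there are more buckets than rows, the extra buckets are simply empty, which is why asking for 4 or more points gives back the original four rows.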
I haven't tried it, but how about instead of
select ResourceType, CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
from #t
where [Row] % @groupSize = 0
group by ResourceType, [Row]
order by MonitoringDateTime
perhaps something like
select ResourceType, CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
from #t
group by ResourceType, convert(int,[Row]/@groupSize)
order by MonitoringDateTime
Maybe that points you in a new direction? Converting to int truncates everything after the decimal point, so I'm hoping that will give you a better grouping. You might need to compute your row number per resource type for this to work.
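One detail worth checking with the integer-division grouping: ROW_NUMBER() starts at 1, so Row/groupSize puts fewer rows in the first group. A small Python sketch of the group keys, where group_keys is a made-up helper and the zero_based flag stands for using (Row - 1)/groupSize instead:

```python
def group_keys(n_rows, group_size, zero_based):
    """Group key per row when bucketing 1-based row numbers by integer division."""
    rows = range(1, n_rows + 1)
    if zero_based:
        # (Row - 1) / groupSize: every group gets exactly group_size rows.
        return [(r - 1) // group_size for r in rows]
    # Row / groupSize: the first group is one row short.
    return [r // group_size for r in rows]

print(group_keys(4, 2, zero_based=False))  # [0, 1, 1, 2]
print(group_keys(4, 2, zero_based=True))   # [0, 0, 1, 1]
```

With the four sample points and a group size of 2, only the second variant pairs rows (1, 2) and (3, 4), which gives the averages 1100 and 1300 from the question.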
Below is an example of my data, table RR_Linest:
Portfolio  Month_number  Collections
A          1             $100
A          2             $90
A          3             $80
A          4             $70
B          1             $100
B          2             $90
B          3             $80
I was able to figure out how to get the slope, intercept, and R-square for one portfolio by removing the portfolio column, selecting only the month_number (x) and collections (y) data for a single portfolio (I removed the data for portfolio B), and running the code below.
I have been trying to change the function so that when I run it, it gives me the slope, intercept, and R-square by portfolio. Does someone know how to do that? I have tried many ways and I just can't figure it out.
First I created the function:
declare @RegressionInput_A [dbo].[RegressionInput_A]
insert into @RegressionInput_A (x,y)
select
([model month]),log([collection $])
from [dbo].[RR_Linest]
select * from [dbo].LinearRegression_A
GO
drop function dbo.LinearRegression_A
GO -- CREATE FUNCTION must be the first statement in its batch
CREATE FUNCTION dbo.LinearRegression_A
(
@RegressionInputs_A AS dbo.RegressionInput_A READONLY
)
RETURNS @RegressionOutput_A TABLE
(
Slope DECIMAL(18, 6),
Intercept DECIMAL(18, 6),
RSquare DECIMAL(18, 6)
)
AS
BEGIN
DECLARE @Xaverage AS DECIMAL(18, 6)
DECLARE @Yaverage AS DECIMAL(18, 6)
DECLARE @slope AS DECIMAL(18, 6)
DECLARE @intercept AS DECIMAL(18, 6)
DECLARE @rSquare AS DECIMAL(18, 6)
SELECT
@Xaverage = AVG(x),
@Yaverage = AVG(y)
FROM
@RegressionInputs_A
SELECT
@slope = SUM((x - @Xaverage) * (y - @Yaverage))/SUM(POWER(x - @Xaverage, 2))
FROM
@RegressionInputs_A
SELECT
@intercept = @Yaverage - (@slope * @Xaverage)
SELECT @rSquare = 1 - (SUM(POWER(y - (@intercept + @slope * x), 2))/(SUM(POWER(y - (@intercept + @slope * x), 2)) + SUM(POWER(((@intercept + @slope * x) - @Yaverage), 2))))
FROM
@RegressionInputs_A
INSERT INTO
@RegressionOutput_A
(
Slope,
Intercept,
RSquare
)
SELECT
@slope,
@intercept,
@rSquare
RETURN
END
GO
Then I run the function
declare @RegressionInput_A [dbo].[RegressionInput_A]
insert into @RegressionInput_A (x,y)
select
([model month]),log([collection $])
from [dbo].[RR_Linest]
select * from [dbo].[LinearRegression_A](@RegressionInput_A)
Wow, this is a really cool example of how to use nested CTEs in an inline table-valued function (ITVF). You want to use an ITVF since they are fast. See Wayne Sheffield’s blog article, which attests to this fact.
I always start with a sample database/table if it is really complicated to make sure I give the user a correct solution.
Let's create a database named [test] based on model.
--
-- Create a simple db
--
-- use master
use master;
go
-- delete existing databases
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Test')
DROP DATABASE Test
GO
-- simple db based on model
create database Test;
go
-- switch to new db
use [Test];
go
Let's create a table type named [InputToLinearReg].
--
-- Create table type to pass data
--
-- Delete the existing table type
IF EXISTS (SELECT * FROM sys.systypes WHERE name = 'InputToLinearReg')
DROP TYPE dbo.InputToLinearReg
GO
-- Create the table type
CREATE TYPE InputToLinearReg AS TABLE
(
portfolio_cd char(1),
month_num int,
collections_amt money
);
go
Okay, here is the multi-layered SELECT statement that uses CTEs. The query optimizer treats this as a single SQL statement that can be executed in parallel, unlike a regular function, which can't. See the black box section of Wayne's article.
--
-- Create in line table value function (fast)
--
-- Remove if it exists
IF OBJECT_ID('CalculateLinearReg') > 0
DROP FUNCTION CalculateLinearReg
GO
-- Create the function
CREATE FUNCTION CalculateLinearReg
(
@ParmInTable AS dbo.InputToLinearReg READONLY
)
RETURNS TABLE
AS
RETURN
(
WITH cteRawData as
(
SELECT
T.portfolio_cd,
CAST(T.month_num as decimal(18, 6)) as x,
LOG(CAST(T.collections_amt as decimal(18, 6))) as y
FROM
@ParmInTable as T
),
cteAvgByPortfolio as
(
SELECT
portfolio_cd,
AVG(x) as xavg,
AVG(y) as yavg
FROM
cteRawData
GROUP BY
portfolio_cd
),
cteSlopeByPortfolio as
(
SELECT
R.portfolio_cd,
SUM((R.x - A.xavg) * (R.y - A.yavg)) / SUM(POWER(R.x - A.xavg, 2)) as slope
FROM
cteRawData as R
INNER JOIN
cteAvgByPortfolio A
ON
R.portfolio_cd = A.portfolio_cd
GROUP BY
R.portfolio_cd
),
cteInterceptByPortfolio as
(
SELECT
A.portfolio_cd,
(A.yavg - (S.slope * A.xavg)) as intercept
FROM
cteAvgByPortfolio as A
INNER JOIN
cteSlopeByPortfolio S
ON
A.portfolio_cd = S.portfolio_cd
)
SELECT
A.portfolio_cd,
A.xavg,
A.yavg,
S.slope,
I.intercept,
1 - (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) /
(SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) +
SUM(POWER(((I.intercept + S.slope * R.x) - A.yavg), 2)))) as rsquared
FROM
cteRawData as R
INNER JOIN
cteAvgByPortfolio as A ON R.portfolio_cd = A.portfolio_cd
INNER JOIN
cteSlopeByPortfolio S ON A.portfolio_cd = S.portfolio_cd
INNER JOIN
cteInterceptByPortfolio I ON S.portfolio_cd = I.portfolio_cd
GROUP BY
A.portfolio_cd,
A.xavg,
A.yavg,
S.slope,
I.intercept
);
Last but not least, set up a table variable and get the answers. Unlike your solution above, it groups by portfolio.
-- Load data into variable
DECLARE @InTable AS InputToLinearReg;
-- insert data
insert into @InTable
values
('A', 1, 100.00),
('A', 2, 90.00),
('A', 3, 80.00),
('A', 4, 70.00),
('B', 1, 100.00),
('B', 2, 90.00),
('B', 3, 80.00);
-- show data
select * from CalculateLinearReg(@InTable)
go
Here is a picture of the results using your data.
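If you want to cross-check the numbers outside the database, the same formulas are easy to reproduce. This is a minimal Python sketch, not part of the function above: regression_by_group is a made-up name, it takes (group, x, y) tuples, and it uses floats rather than DECIMAL(18, 6), so the last digits may differ slightly from SQL Server's output.

```python
import math
from collections import defaultdict

def regression_by_group(rows):
    """Return {group: (slope, intercept, rsquared)} for (group, x, y) rows."""
    groups = defaultdict(list)
    for key, x, y in rows:
        groups[key].append((float(x), float(y)))
    results = {}
    for key, pts in groups.items():
        n = len(pts)
        xavg = sum(x for x, _ in pts) / n
        yavg = sum(y for _, y in pts) / n
        # Same slope and intercept formulas as the CTEs above.
        slope = (sum((x - xavg) * (y - yavg) for x, y in pts)
                 / sum((x - xavg) ** 2 for x, _ in pts))
        intercept = yavg - slope * xavg
        # R-squared as 1 - SSres / (SSres + SSreg), matching the SQL.
        ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pts)
        ss_reg = sum(((intercept + slope * x) - yavg) ** 2 for x, _ in pts)
        results[key] = (slope, intercept, 1 - ss_res / (ss_res + ss_reg))
    return results

# Sample data from the question; y is the natural log of collections,
# as LOG() is in T-SQL.
data = [("A", 1, 100), ("A", 2, 90), ("A", 3, 80), ("A", 4, 70),
        ("B", 1, 100), ("B", 2, 90), ("B", 3, 80)]
for portfolio, stats in regression_by_group(
        [(p, m, math.log(c)) for p, m, c in data]).items():
    print(portfolio, stats)
```

Both portfolios should come out with a negative slope, since collections fall every month.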
-- Assumes the table type dbo.RegressionInput has columns (portfolio, x, y)
CREATE FUNCTION dbo.LinearRegression
(
@RegressionInputs AS dbo.RegressionInput READONLY
)
RETURNS TABLE AS
RETURN
(
WITH
t1 AS ( --calculate averages
SELECT portfolio, x, y,
AVG(x) OVER(PARTITION BY portfolio) Xaverage,
AVG(y) OVER(PARTITION BY portfolio) Yaverage
FROM @RegressionInputs
),
t2 AS ( --calculate slopes
SELECT portfolio, Xaverage, Yaverage,
SUM((x - Xaverage) * (y - Yaverage))/SUM(POWER(x - Xaverage, 2)) slope
FROM t1
GROUP BY portfolio, Xaverage, Yaverage
),
t3 AS ( --calculate intercepts
SELECT portfolio, slope,
(Yaverage - (slope * Xaverage) ) AS intercept
FROM t2
),
t4 AS ( --calculate rSquare
SELECT t1.portfolio, slope, intercept,
1 - (SUM(POWER(y - (intercept + slope * x), 2))/(SUM(POWER(y - (intercept + slope * x), 2)) + SUM(POWER(((intercept + slope * x) - Yaverage), 2)))) AS rSquare
FROM t1
INNER JOIN t3 ON (t1.portfolio = t3.portfolio)
GROUP BY t1.portfolio, slope, intercept --non-aggregated columns must be grouped
)
SELECT portfolio, slope, intercept, rSquare FROM t4
)