I'm following the example in the link below to create a SQL trend line on a report:
https://www.mssqltips.com/sqlservertip/3432/add-a-linear-trendline-to-a-graph-in-sql-server-reporting-services/
I've got it all up and running, but I also want to work out the trend by department. At the moment it just merges all the data into one final value. I think the section of code below needs altering so the sums are calculated for each department I add, but how best do I do this?
-- calculate sample size and the different sums
SELECT
@sample_size = COUNT(*)
,@sumX = SUM(ID)
,@sumY = SUM([OrderQuantity])
,@sumXX = SUM(ID*ID)
,@sumYY = SUM([OrderQuantity]*[OrderQuantity])
,@sumXY = SUM(ID*[OrderQuantity])
FROM #Temp_Regression;
-- output results
SELECT
SampleSize = @sample_size
,SumRID = @sumX
,SumOrderQty = @sumY
,SumXX = @sumXX
,SumYY = @sumYY
,SumXY = @sumXY;
These variables are then used to work out the trend line:
-- calculate the slope and intercept
SET @slope = CASE WHEN @sample_size = 1
THEN 0 -- avoid divide by zero error
ELSE (@sample_size * @sumXY - @sumX * @sumY) / (@sample_size * @sumXX - POWER(@sumX,2))
END;
SET @intercept = (@sumY - (@slope*@sumX)) / @sample_size;
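For anyone who wants to sanity-check the slope/intercept arithmetic outside the database, here is the same pair of least-squares formulas in plain Python (the sample x/y values are made up for illustration):

```python
# Least-squares slope/intercept from the same running sums used in the T-SQL above.
# Sample data: x = row ID, y = order quantity (made-up values).
xs = [1, 2, 3, 4, 5]
ys = [10, 12, 15, 13, 18]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Guard against the single-sample case, as the T-SQL CASE expression does.
slope = 0 if n == 1 else (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(slope, intercept)
```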
You need to add the departments column to both the SELECT and the GROUP BY:
SELECT departments,
SampleSize = Count(*),
SumRID = Sum(ID),
SumOrderQty = Sum([OrderQuantity]),
SumXX = Sum(ID * ID),
SumYY = Sum([OrderQuantity] * [OrderQuantity]),
SumXY = Sum(ID * [OrderQuantity])
FROM #Temp_Regression
GROUP BY departments
Here is an easier way to calculate the slope & intercept for all departments:
;WITH cte
AS (SELECT departments,
sample_size = Count(*),
sumX = Sum(ID),
sumY = Sum([OrderQuantity]),
sumXX = Sum(ID * ID),
sumYY = Sum([OrderQuantity] * [OrderQuantity]),
sumXY = Sum(ID * [OrderQuantity])
FROM #Temp_Regression
GROUP BY departments),
slope
AS (SELECT departments,
Sample_Size,
sumX,
sumY,
slope = CASE
WHEN sample_size = 1 THEN 0 -- avoid divide by zero error
ELSE ( sample_size * sumXY - sumX * sumY ) / ( sample_size * sumXX - Power(sumX, 2) )
END
FROM cte)
SELECT departments,
slope,
intercept = ( sumY - ( slope * sumX ) ) / sample_size
FROM slope
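As a runnable sketch of the grouped version, essentially the same CTE works in SQLite from Python (the table name, departments, and values here are made up; sumYY is omitted since the slope and intercept don't use it):

```python
import sqlite3

# Per-department least-squares slope/intercept via grouped sums (made-up data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Temp_Regression (departments TEXT, ID REAL, OrderQuantity REAL)")
con.executemany(
    "INSERT INTO Temp_Regression VALUES (?, ?, ?)",
    [("A", 1, 10), ("A", 2, 12), ("A", 3, 15), ("A", 4, 13), ("A", 5, 18),
     ("B", 1, 5), ("B", 2, 9)],
)

rows = con.execute("""
    WITH cte AS (
        SELECT departments,
               COUNT(*)                AS sample_size,
               SUM(ID)                 AS sumX,
               SUM(OrderQuantity)      AS sumY,
               SUM(ID * ID)            AS sumXX,
               SUM(ID * OrderQuantity) AS sumXY
        FROM Temp_Regression
        GROUP BY departments),
    slope AS (
        SELECT departments, sample_size, sumX, sumY,
               CASE WHEN sample_size = 1 THEN 0  -- avoid divide by zero
                    ELSE (sample_size * sumXY - sumX * sumY)
                         / (sample_size * sumXX - sumX * sumX)
               END AS slope
        FROM cte)
    SELECT departments, slope, (sumY - slope * sumX) / sample_size AS intercept
    FROM slope
    ORDER BY departments
""").fetchall()
print(rows)
```

Each department gets its own slope and intercept row, which is exactly what the report needs to draw one trend line per department.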
I need the SUM of this result:
SELECT
ROUND(
(invoicesitems.pscost / invoicesitems.inmedunit -
AVG(ISNULL(mainexstock.peacecost,0)) / invoicesitems.inmedunit), 2)
*
CASE WHEN units.unitqty=3 THEN (invoicesitems.bigqty *
invoicesitems.inbigunit * invoicesitems.inmedunit) +
(invoicesitems.medqty * invoicesitems.inmedunit) +
invoicesitems.smallqty
ELSE (invoicesitems.bigqty * invoicesitems.inmedunit)
+ invoicesitems.smallqty
END AS PROFITS
FROM invoicesitems
INNER JOIN mainexstock ON mainexstock.pid = invoicesitems.pid
INNER JOIN units ON units.[uid] = invoicesitems.punits
WHERE invoicesitems.bid = 'B-0480580'
GROUP BY
invoicesitems.pid, invoicesitems.inbigunit,
invoicesitems.inmedunit, invoicesitems.bigqty,
invoicesitems.medqty, invoicesitems.smallqty,
invoicesitems.pscost, units.unitqty
You can use it as a CTE and get the sum. For example:
with a as (
-- your big query here
)
select sum(profits) as profits from a;
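Here is a minimal runnable sketch of that CTE wrap-and-sum pattern, using SQLite from Python (the table and values are stand-ins for the big PROFITS query):

```python
import sqlite3

# Wrap an arbitrary query in a CTE and aggregate its result (made-up data
# standing in for the per-row PROFITS query).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE invoice_lines (profits REAL)")
con.executemany("INSERT INTO invoice_lines VALUES (?)", [(12.5,), (-3.0,), (7.25,)])

total = con.execute("""
    WITH a AS (
        SELECT profits FROM invoice_lines  -- your big query goes here
    )
    SELECT SUM(profits) AS profits FROM a
""").fetchone()[0]
print(total)
```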
I want to implement this equation in SQL:
f(x) = f(a) + (f(b)-f(a)) * (x-a)/(b-a)
My input is as follows:
What I tried was:
select ((select CosValue from CosineTable where Angle=70) +
((select CosValue from CosineTable where Angle=75) -
(select CosValue from CosineTable where Angle=70)) * (73-70) / (75-70)
from CosineTable;
It's showing me a syntax error.
You are missing a closing bracket ).
Try this:
SELECT ( ( SELECT CosValue
FROM CosineTable
WHERE Angle = 70
) + ( ( SELECT CosValue
FROM CosineTable
WHERE Angle = 75
) - ( SELECT CosValue
FROM CosineTable
WHERE Angle = 70
) ) * ( 73 - 70 ) / ( 75 - 70 ) )
FROM CosineTable;
But it only works if your subselects always return a single value.
Assuming there is only one row per angle in your table, your query can be simplified by using a cross join:
select a70.cosvalue + (a75.cosvalue - a70.cosvalue) * (73-70) / (75-70)
from CosineTable a70
cross join cosinetable a75
where a70.angle = 70
and a75.angle = 75;
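As a quick check of the interpolation formula itself, here is the same calculation in plain Python, using math.cos to stand in for the CosineTable values (the actual table contents are assumed; note that math.cos expects radians):

```python
import math

# Linear interpolation: f(x) = f(a) + (f(b) - f(a)) * (x - a) / (b - a)
def interpolate(a, fa, b, fb, x):
    return fa + (fb - fa) * (x - a) / (b - a)

cos70 = math.cos(math.radians(70))  # stand-in for the Angle = 70 row
cos75 = math.cos(math.radians(75))  # stand-in for the Angle = 75 row
approx = interpolate(70, cos70, 75, cos75, 73)
print(round(approx, 4))  # close to the true math.cos(math.radians(73))
```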
I need a function to calculate a trend line. I have a query (part of the function):
select round(sum(nvl(vl_indice, vl_meta))/12, 2) from (
SELECT
SUM (vl_indice) vl_indice, SUM (vl_meta) vl_meta
FROM
(SELECT cd_mes, vl_indice, NULL vl_meta, dt.id_tempo,
fi.id_multi_empresa, fi.id_setor, fi.id_indice
FROM dbadw.fa_indice fi , dbadw.di_tempo dt ,
dbadw.di_multi_empresa dme , dbaportal.organizacao o ,
dbadw.di_indice di
WHERE fi.id_tempo = dt.id_tempo
AND DT.CD_MES BETWEEN TO_NUMBER(TO_CHAR(ADD_MONTHS(TO_DATE(TO_CHAR(PCD_MES),'YYYYMM'),- 11),'YYYYMM'))
AND PCD_MES
AND DT.ANO = TO_NUMBER(TO_CHAR(TO_DATE(TO_CHAR(PCD_MES),'YYYYMM'),'YYYY'))
AND fi.id_multi_empresa = dme.id_multi_empresa
AND dme.cd_multi_empresa = NVL(o.cd_multi_empresa_mv2000, o.cd_organizacao)
AND o.cd_organizacao = PCD_ORG
AND fi.id_setor IS NULL
AND fi.id_indice = di.id_indice
AND di.cd_indice = PCD_IVM
UNION ALL
SELECT cd_mes, NULL vl_indice, vl_meta, dt.id_tempo,
fm.id_multi_empresa, fm.id_setor, fm.id_indice
FROM dbadw.fa_meta_indice fm , dbadw.di_tempo dt ,
dbadw.di_multi_empresa dme , dbaportal.organizacao o ,
dbadw.di_indice di
WHERE fm.id_tempo = dt.id_tempo
AND DT.ANO = TO_NUMBER(TO_CHAR(TO_DATE(TO_CHAR(PCD_MES),'YYYYMM'),'YYYY'))
AND fm.id_multi_empresa = dme.id_multi_empresa
AND dme.cd_multi_empresa = NVL(o.cd_multi_empresa_mv2000, o.cd_organizacao)
AND o.cd_organizacao = PCD_ORG
AND fm.id_setor IS NULL
AND fm.id_indice = di.id_indice
AND di.cd_indice = PCD_IVM
)
GROUP BY cd_mes, id_tempo, id_multi_empresa, id_setor, id_indice
ORDER BY cd_mes);
I tried to calculate the trend line on the first line, but it is not correct. Can anybody help me?
It's very difficult to work out from the query alone what you want to fit a "trend line" to; I assume you mean using least-squares linear regression to find the best fit to the data.
So an example with test data:
Oracle Setup:
CREATE TABLE data ( x, y ) AS
SELECT LEVEL,
230 + DBMS_RANDOM.VALUE(-5,5) - 3.14159 * DBMS_RANDOM.VALUE( 0.95, 1.05 ) * LEVEL
FROM DUAL
CONNECT BY LEVEL <= 1000;
As you can see, the data is random but approximately follows y = -3.14159x + 230.
Query - Get the Least Square Regression y-intercept and gradient:
SELECT REGR_INTERCEPT( y, x ) AS best_fit_y_intercept,
REGR_SLOPE( y, x ) AS best_fit_gradient
FROM data
This will return something like:
best_fit_y_intercept best_fit_gradient
-------------------- -----------------
230.531799878168 -3.143190435415
Query - Get the y co-ordinate of the line of best fit:
SELECT x,
y,
REGR_INTERCEPT( y, x ) OVER () + x * REGR_SLOPE( y, x ) OVER () AS best_fit_y
FROM data
The solution is:
SELECT valor, mes,
((mes * SLOPE) + INTERCEPT) TENDENCIA, SLOPE, INTERCEPT from
( select valor, mes, ROUND(REGR_SLOPE(valor,mes) over (partition by id_multi_empresa),4)SLOPE,
ROUND(REGR_INTERCEPT(valor,mes) over (PARTITION by id_multi_empresa),4) INTERCEPT from( --the initial select
I have two tables:
ID,YRMO,Counts
1,Dec 2013,4
1,Jan 2014,6
1,Feb 2014,7
2,Jan 2014,6
2,Feb 2014,8
ID,YRMO,Counts
1,Dec 2013,10
1,Jan 2014,8
1,March 2014,12
2,Jan 2014,6
2,Feb 2014,10
I want to find the Pearson correlation coefficient for each ID. There are more than 200 different IDs.
Pearson correlation is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive.
More can be found here: http://oreilly.com/catalog/transqlcook/chapter/ch08.html in the section on calculating correlation.
To calculate the Pearson correlation coefficient, you first calculate the mean, then the standard deviation, and then the correlation coefficient, as outlined below.
1. Calculate Mean
insert into tab2 (tab1_id, mean)
select ID, sum([counts]) /
(select count(*) from tab1) as mean
from tab1
group by ID;
2. Calculate standard deviation
update tab2
set stddev = (
select sqrt(
sum([counts] * [counts]) /
(select count(*) from tab1)
- mean * mean
) stddev
from tab1
where tab1.ID = tab2.tab1_id
group by tab1.ID);
3. Finally Pearson Correlation Coefficient
select ID,
((sf.sum1 / (select count(*) from tab1)
- stats1.mean * stats2.mean
)
/ (stats1.stddev * stats2.stddev)) as PCC
from (
select r1.ID,
sum(r1.[counts] * r2.[counts]) as sum1
from tab1 r1
join tab1 r2
on r1.ID = r2.ID
group by r1.ID
) sf
join tab2 stats1
on stats1.tab1_id = sf.ID
join tab2 stats2
on stats2.tab1_id = sf.ID
Which on your posted data results in
See a demo fiddle here http://sqlfiddle.com/#!3/0da20/5
EDIT:
Well, I refined it a bit. You can use the function below to get the PCC, but I am not getting exactly the same result as yours; instead I get 0.999996000000000 for ID = 1.
This could be a good entry point for you; you can refine the calculation further from here.
create function calculate_PCC(@id int)
returns decimal(16,15)
as
begin
declare @mean numeric(16,5);
declare @stddev numeric(16,5);
declare @count numeric(16,5);
declare @pcc numeric(16,12);
declare @store numeric(16,7);
select @count = CONVERT(numeric(16,5), count(case when Id=@id then 1 end)) from tab1;
select @mean = convert(numeric(16,5),sum([Counts])) / @count
from tab1 WHERE ID = @id;
select @store = (sum(counts * counts) / @count) from tab1 WHERE ID = @id;
set @stddev = sqrt(@store - (@mean * @mean));
set @pcc = ((@store - (@mean * @mean)) / (@stddev * @stddev));
return @pcc;
end
Call the function like
select db_name.dbo.calculate_PCC(1)
A Single-Pass Solution:
There are two flavors of the Pearson correlation coefficient, one for a Sample and one for an entire Population. These are simple, single-pass, and I believe, correct formulas for both:
-- Methods for calculating the two Pearson correlation coefficients
SELECT
-- For Population
(avg(x * y) - avg(x) * avg(y)) /
(sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y)))
AS correlation_coefficient_population,
-- For Sample
(count(*) * sum(x * y) - sum(x) * sum(y)) /
(sqrt(count(*) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(*) * sum(y * y) - sum(y) * sum(y)))
AS correlation_coefficient_sample
FROM (
-- The following generates a table of sample data containing two columns with a luke-warm and tweakable correlation
-- y = x for 0 thru 99, y = x - 100 for 100 thru 199, etc. Execute it as a stand-alone to see for yourself
-- x and y are CAST as DECIMAL to avoid integer math, you should definitely do the same
-- Try TOP 100 or less for full correlation (y = x for all cases), TOP 200 for a PCC of 0.5, TOP 300 for one near 0.33, etc.
-- The superfluous "+ 0" is where you could apply various offsets to see that they have no effect on the results
SELECT TOP 200
CAST(ROW_NUMBER() OVER (ORDER BY [object_id]) - 1 + 0 AS DECIMAL) AS x,
CAST((ROW_NUMBER() OVER (ORDER BY [object_id]) - 1) % 100 AS DECIMAL) AS y
FROM sys.all_objects
) AS a
As I noted in the comments, you can try the example with TOP 100 or less for full correlation (y = x for all cases); TOP 200 yields correlations very near 0.5; TOP 300, around 0.33; etc. There is a place ("+ 0") to add an offset if you like; spoiler alert, it has no effect. Make sure you CAST your values as DECIMAL - integer math can significantly impact these calcs.
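The two formulas are easy to verify outside SQL as well. Here is a plain-Python transcription over a small made-up sample; note that on any one data set the two expressions agree algebraically, since the extra factors of n cancel:

```python
import math

# Pearson correlation from running sums, mirroring the single-pass SQL above.
# Data is made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean = lambda v: sum(v) / len(v)

# "Population" form: covariance of x,y over the product of std deviations.
pop = (mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)) / (
    math.sqrt(mean([x * x for x in xs]) - mean(xs) ** 2)
    * math.sqrt(mean([y * y for y in ys]) - mean(ys) ** 2)
)

# "Sample" form: the count(*)*sum(...) expression from the SQL.
samp = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / (
    math.sqrt(n * sum(x * x for x in xs) - sum(xs) ** 2)
    * math.sqrt(n * sum(y * y for y in ys) - sum(ys) ** 2)
)

print(round(pop, 6), round(samp, 6))
```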
I'm trying to work out the percentage of growth between 2 years but it is returning growth as 0.
SELECT my.finmonth,
my.trnyear,
my.drawofficenum,
my1.ytdty,
ly1.ytdly,
CASE
WHEN my1.ytdty <> 0 THEN ( my1.ytdty - ly1.ytdly ) / ly1.ytdly * 100
ELSE 0
END AS Growth2012
FROM salestymonth my
LEFT JOIN salestyytd my1
ON my.finmonth = my1.finmonth
AND my.trnyear = my1.trnyear
AND my.drawofficenum = my1.drawofficenum
LEFT JOIN saleslyytd ly1
ON my.finmonth = ly1.finmonth
AND my.trnyear = ly1.trnyear
AND my.drawofficenum = ly1.drawofficenum
WHERE my.finmonth = '1'
ORDER BY ytdty DESC
Try 1.00*(my1.YTDTY - ly1.YTDLY) / ly1.YTDLY. If your column types are integers, you won't get non-integer results from dividing unless you force the numerator to be a decimal or float.
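The truncation is easy to reproduce outside SQL. In this Python sketch, // stands in for SQL Server's integer division, and the ytdty/ytdly values are made up:

```python
# Integer division truncates, so (ytdty - ytdly) / ytdly is 0 whenever
# growth is under 100%. Multiplying by 1.00 first forces decimal/float
# arithmetic - the same trick as in the SQL above.
ytdty, ytdly = 120, 100

truncated = (ytdty - ytdly) // ytdly * 100    # integer math, like int columns in SQL Server
fixed = 1.00 * (ytdty - ytdly) / ytdly * 100  # force non-integer math first

print(truncated, fixed)
```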