How to calculate average value based on duration between measurements? - sql

I have data similar to this:
Price DateChanged Product
10 2012-01-01 A
12 2012-02-01 A
30 2012-03-01 A
10 2012-09-01 A
12 2013-01-01 A
110 2012-01-01 B
112 2012-02-01 B
130 2012-03-01 B
110 2012-09-01 B
112 2013-01-01 B
I want to calculate average value, but the challenge is this:
Look at the first record, price 10 is valid for a duration of one month, price 12 is valid for a duration of one month while price 30 is valid for a duration of six months.
So a basic average for product A, (10+12+30+10+12)/5, would result in 14.8, while taking duration into account the average price would be ~20.1.
What is the best approach to solve this?
I know I could create a sub-query with a row_number() to join against to calculate a duration, but is there a better way? SQL Server has powerful features like STDistance, so surely there is a function for this?

What you are looking for is called a weighted average, and AFAIK there is no built-in function in SQL Server that calculates it for you. However, it is not that hard to calculate by hand.
First, you need to find the weight of each data point, in this case, you need to find the duration of each price period. You might have some additional columns in your data that could enable easier lookup, but you could do it like this as well:
SELECT p1.Product, p1.Price, p1.DateChanged AS DateStart,
isnull(min(p2.DateChanged),getdate()) AS DateEnd
INTO #PricePlanStartEnd
FROM PricePlan p1
LEFT OUTER JOIN PricePlan p2
ON p1.DateChanged < p2.DateChanged
AND p1.Product =p2.Product
GROUP BY p1.Product, p1.Price, p1.DateChanged
ORDER BY p1.Product, p1.DateChanged
This creates a #PricePlanStartEnd temporary table that has the start and the end of each price period. I've used getdate() as the end of the current time period. If you need to just calculate an average up to the last price change, just use INNER JOIN instead of the LEFT OUTER JOIN.
After that you just need to divide the sum of (price * period) by the total length of the period, and get the answer.
Here is an SQL Fiddle with the calculation
Also, when you're working with months, remember that not all months are the same length, so a price valid for December was active longer than one valid for February.
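As a rough illustration of the "sum of (price * period) divided by the total length" step, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It is not the SQL Server code from the fiddle: the `periods` CTE name and the use of SQLite's `julianday()` for day counts are my assumptions, and it uses the INNER JOIN variant, so the last price (which has no end date yet) is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PricePlan (Price REAL, DateChanged TEXT, Product TEXT)")
conn.executemany(
    "INSERT INTO PricePlan VALUES (?, ?, ?)",
    [
        (10, "2012-01-01", "A"), (12, "2012-02-01", "A"), (30, "2012-03-01", "A"),
        (10, "2012-09-01", "A"), (12, "2013-01-01", "A"),
        (110, "2012-01-01", "B"), (112, "2012-02-01", "B"), (130, "2012-03-01", "B"),
        (110, "2012-09-01", "B"), (112, "2013-01-01", "B"),
    ],
)

# Step 1 (CTE): pair each price with the next change date for the same product.
# Step 2: weight each price by the number of days it was valid and divide by
# the total number of days, instead of a hard-coded divisor.
query = """
WITH periods AS (
    SELECT p1.Product, p1.Price, p1.DateChanged AS DateStart,
           MIN(p2.DateChanged) AS DateEnd
    FROM PricePlan p1
    JOIN PricePlan p2
      ON p2.Product = p1.Product AND p2.DateChanged > p1.DateChanged
    GROUP BY p1.Product, p1.Price, p1.DateChanged
)
SELECT Product,
       SUM(Price * (julianday(DateEnd) - julianday(DateStart)))
         / SUM(julianday(DateEnd) - julianday(DateStart)) AS WeightedAvg
FROM periods
GROUP BY Product
"""
weighted = dict(conn.execute(query).fetchall())
for product, avg in sorted(weighted.items()):
    print(product, round(avg, 2))
```

Because the weights here are days rather than months, the result for product A comes out slightly above the month-based figure (2012 was a leap year and the six-month period is 184 days long).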

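Using a CTE and row_number() to get the monthly average up to the last dateChanged (Fiddle-Demo). The divisor 12 is the number of months between the first and the last price change in the sample data.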
;with cte as (
select product, dateChanged, price,
row_number() over (partition by product order by datechanged) rn
from x
)
select t1.product,
sum(t1.price *1.0 * datediff(month, t1.dateChanged,t2.dateChanged))/12 monthlyAvg
from cte t1 join cte t2 on t1.product = t2.product
and t1.rn +1 = t2.rn
group by t1.product
--Results
Product MonthlyAvg
A 20.166666
B 120.166666
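Or, if you need an up-to-date daily average, use a LEFT JOIN (Fiddle-Demo). Note that the hard-coded divisor 365 assumes the data spans one year; in general you would divide by the total number of days covered.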
;with cte as (
select product, dateChanged, price,
row_number() over (partition by product order by datechanged) rn
from x
)
select t1.product,
sum(t1.price *1.0 *
datediff(day, t1.dateChanged,isnull(t2.dateChanged,getdate())))/365 dailyAvg
from cte t1 left join cte t2 on t1.product = t2.product
and t1.rn +1 = t2.rn
group by t1.product
--Results
product dailyAvg
A 21.386301
B 130.975342

Related

Past 7 days running amounts average as progress per each date

The query is simple, but I am having trouble implementing the SQL logic. Suppose I have records like this:
Phoneno Company Date Amount
83838 xyz 20210901 100
87337 abc 20210902 500
47473 cde 20210903 600
The expected output is the running average of amount over the past 7 days for each date (the current date and the 6 days before it):
Date amount avg
20210901 100 100
20210902 500 300
20210903 600 400
I tried
Select date, amount, select
avg(lg) from (
Select case when lag(amount)
Over (order by NULL) IS NULL
THEN AMOUNT
ELSE
lag(amount)
Over (order by NULL) END AS LG)
From table
WHERE DATE>=t.date-7) as avg
From table t;
But I am getting wrong avg values. Could anyone please help?
Note: I've tried without lag too, and it also gives the wrong averages.
You could use a self join to group the dates
select distinct
a.dt,
b.dt as preceding_dt, --just for QA purpose
a.amt,
b.amt as preceding_amt,--just for QA purpose
avg(b.amt) over (partition by a.dt) as avg_amt
from t a
join t b on a.dt-b.dt between 0 and 6
group by a.dt, b.dt, a.amt, b.amt; --to dedupe the data after the join
If you want to make your correlated subquery approach work, you don't really need the lag.
select dt,
amt,
(select avg(b.amt) from t b where a.dt-b.dt between 0 and 6) as avg_lg
from t a;
If you don't have multiple rows per date, this gets even simpler
select dt,
amt,
avg(amt) over (order by dt rows between 6 preceding and current row) as avg_lg
from t;
Also the condition DATE>=t.date-7 you used is left open on one side meaning it will qualify a lot of dates that shouldn't have been qualified.
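To make the closed 7-day window concrete, here is a small sketch of the correlated-subquery variant, run against an in-memory SQLite database from Python. The ISO date strings, the `julianday()` date arithmetic and the extra 2021-09-10 row are my assumptions, added only to show that a row more than 6 days after the others averages over itself alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, amt REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2021-09-01", 100),
        ("2021-09-02", 500),
        ("2021-09-03", 600),
        ("2021-09-10", 200),  # hypothetical row, outside the window of the others
    ],
)

# For each row, average all rows dated from 6 days earlier up to the same day.
# BETWEEN 0 AND 6 closes the window on both sides.
query = """
SELECT a.dt, a.amt,
       (SELECT AVG(b.amt)
        FROM t b
        WHERE julianday(a.dt) - julianday(b.dt) BETWEEN 0 AND 6) AS avg_amt
FROM t a
ORDER BY a.dt
"""
rows = conn.execute(query).fetchall()
for dt, amt, avg_amt in rows:
    print(dt, amt, avg_amt)  # averages: 100, 300, 400, then 200 for the late row
```

Unlike the `rows between 6 preceding and current row` form, this version counts calendar days rather than rows, so gaps in the dates do not widen the window.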
You can use analytical function with the windowing clause to get your results:
SELECT DISTINCT BillingDate,
AVG(amount) OVER (ORDER BY BillingDate
RANGE BETWEEN TO_DSINTERVAL('7 00:00:00') PRECEDING
AND TO_DSINTERVAL('0 00:00:00') FOLLOWING) AS RUNNING_AVG
FROM accounts
ORDER BY BillingDate;
Here is a DBFiddle showing the query in action (LINK)

How to add using the same row of data on SQL?

I am in need of showing the summation of contributions over time; however I would like to demonstrate it using this format.
Date Pay Total
8.1 100 100
8.8 150 250
8.15 50 300
So I have only two sets of data, the date and the amount paid.
I would like to show how the total amount paid changes with each payment.
I think I would need to use a subquery but I cannot get it to work for me!
Any suggestions?
You didn't specify a DBMS so this is ANSI SQL using a window function
select date,
pay,
sum(pay) over (order by date) as total
from the_table
order by date;
This assumes the 30 in the last line is just a typo and should actually be 300
These are portable ways to do it. I'm assuming that your dates are unique.
Inner join:
select t1."Date", min(t1.Pay) as Pay, sum(t2.Pay) as CumulativeTotal
from T t1 inner join T t2 on t2."Date" <= t1."Date"
group by t1."Date"
order by t1."Date"
Scalar subquery:
select
t1."Date", t1.Pay,
(select sum(t2.Pay) from T t2 where t2."Date" <= t1."Date") as CumulativeTotal
from T t1
order by t1."Date"
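For what it's worth, here is a quick sketch that checks the two portable forms against each other on the sample payments, using an in-memory SQLite database from Python. The year in the dates is made up (the question only gives month and day), and in both queries the running total sums `t2.Pay`, i.e. the payments on or before the current row's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T ("Date" TEXT, Pay INTEGER)')
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    # The year 2015 is an assumption; the question only shows 8.1, 8.8, 8.15.
    [("2015-08-01", 100), ("2015-08-08", 150), ("2015-08-15", 50)],
)

# Self-join: every row t1 is paired with all rows t2 dated on or before it.
join_query = """
SELECT t1."Date", MIN(t1.Pay) AS Pay, SUM(t2.Pay) AS CumulativeTotal
FROM T t1
JOIN T t2 ON t2."Date" <= t1."Date"
GROUP BY t1."Date"
ORDER BY t1."Date"
"""

# Scalar subquery: the same sum, computed once per outer row.
subquery_query = """
SELECT t1."Date", t1.Pay,
       (SELECT SUM(t2.Pay) FROM T t2 WHERE t2."Date" <= t1."Date") AS CumulativeTotal
FROM T t1
ORDER BY t1."Date"
"""

join_rows = conn.execute(join_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
print(join_rows)
print(subquery_rows)
```

Both forms rely on the dates being unique; with duplicate dates the self-join version would group several payments into one row.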

SQL Aggregates OVER and PARTITION

All,
This is my first post on Stackoverflow, so go easy...
I am using SQL Server 2008.
I am fairly new to writing SQL queries, and I have a problem that I thought was pretty simple, but I've been fighting for 2 days. I have a set of data that looks like this:
UserId Duration(Seconds) Month
1 45 January
1 90 January
1 50 February
1 42 February
2 80 January
2 110 February
3 45 January
3 62 January
3 56 January
3 60 February
Now, what I want is to write a single query that gives me the average for a particular user and compares it against the average of all users for that month. So the resulting dataset after a query for user #1 would look like this:
UserId Duration(seconds) OrganizationDuration(Seconds) Month
1 67.5 63 January
1 46 65.5 February
I've been batting around different subqueries and group by scenarios and nothing ever seems to work. Lately, I've been trying OVER and PARTITION BY, but with no success there either. My latest query looks like this:
select Userid,
AVG(duration) OVER () as OrgAverage,
AVG(duration) as UserAverage,
DATENAME(mm,MONTH(StartDate)) as Month
from table.name
where YEAR(StartDate)=2014
AND userid=119
GROUP BY MONTH(StartDate), UserId
This query bombs out with a "'Duration' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause" error.
Please keep in mind I'm dealing with a very large amount of data. I think I can make it work with CASE statements, but I'm looking for a cleaner, more efficient way to write the query if possible.
Thank you!
You are joining two queries together here:
Per-User average per month
All Organisation average per month
If you are only going to return data for one user at a time then an inline select may give you joy:
SELECT AVG(a.duration) AS UserAverage,
(SELECT AVG(b.Duration) FROM tbl b WHERE MONTH(b.StartDate) = MONTH(a.StartDate)) AS OrgAverage
...
FROM tbl a
WHERE userid = 119
GROUP BY MONTH(StartDate), UserId
Note - using comparison on MONTH may be slow - you may be better off having a CTE (Common Table Expression)
You are missing the PARTITION BY clause in the AVG window function:
OVER ( Partition by MONTH(StartDate))
Please try this. It works fine for me.
WITH C1
AS
(
SELECT
AVG(Duration) AS TotalAvg,
[Month]
FROM [dbo].[Test]
GROUP BY [Month]
),
C2
AS
(
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY UserID, [Month] ORDER BY UserID) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
)
SELECT C2.*, C1.TotalAvg
FROM C2 c2
INNER JOIN C1 c1 ON c1.[Month] = c2.[Month]
ORDER BY c2.UserID, c2.[Month] desc;
I was able to get it done using a self join. There's probably a better way.
Select UserId, AVG(t1.Duration) as Duration, t2.duration as OrgDur, t1.Month
from #temp t1
inner join (Select Distinct MONTH, AVG(Duration) over (partition by Month) as duration
from #temp) t2 on t2.Month = t1.Month
group by t1.Month, t1.UserId, t2.Duration
order by t1.UserId, Month desc
Here's using a CTE which is probably a better solution and definitely easier to read
With MonthlyAverage
as
(
Select MONTH, AVG(Duration) as OrgDur
from #temp
group by Month
)
Select UserId, AVG(t1.Duration) as Duration, m.OrgDur as OrgDur, t1.Month
from #temp t1
inner join MonthlyAverage m on m.Month = t1.Month
group by UserId, t1.Month, m.OrgDur
You can try below with less code.
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY [Month]) AS TotalAvg,
AVG(Duration) OVER(PARTITION BY UserID, [Month] ORDER BY UserID) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
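Here is a small sketch of the two-window approach on the question's sample data, run on an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The table layout and the `averages` alias are my assumptions. The point it tries to show is that the filter on the user has to be applied outside the windowed query, otherwise the organization average would be computed over that user's rows only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (UserID INTEGER, Duration INTEGER, Month TEXT)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?)",
    [
        (1, 45, "January"), (1, 90, "January"), (1, 50, "February"), (1, 42, "February"),
        (2, 80, "January"), (2, 110, "February"),
        (3, 45, "January"), (3, 62, "January"), (3, 56, "January"), (3, 60, "February"),
    ],
)

# Inner query: both averages are computed over ALL users' rows.
# Outer query: only then keep the rows for the requested user.
query = """
SELECT UserID, Month, UserAvg, OrgAvg
FROM (
    SELECT DISTINCT UserID, Month,
           AVG(Duration) OVER (PARTITION BY UserID, Month) AS UserAvg,
           AVG(Duration) OVER (PARTITION BY Month) AS OrgAvg
    FROM Test
) AS averages
WHERE UserID = ?
ORDER BY Month DESC
"""
rows = conn.execute(query, (1,)).fetchall()
for row in rows:
    print(row)  # (1, 'January', 67.5, 63.0) then (1, 'February', 46.0, 65.5)
```

One caveat when porting this back: SQLite's AVG always returns a floating-point value, whereas SQL Server's AVG over an int column returns an int, so there you would likely want something like AVG(Duration * 1.0) to get 67.5 rather than 67.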

Finding percentage change in prices between rows

I have been trying (to no avail) to formulate an SQL query that will return rows with the greatest change in pricing between the most recent entry and the first entry greater than 1 day previous.
Price scraping takes a non-trivial amount of time due to a large data set, so the times of the first and last rows in one pull will often differ by many minutes. I would like to be able to pull the first record from x time or more ago, in pseudo-SQL: SELECT price FROM table WHERE date < [now epoch time in ms] - 86400000 ORDER BY date DESC LIMIT 1
My table format is as follows: (date is epoch time in milliseconds)
itemid price date ...
-----------------------------------
... most recent entries ...
1 15.50 1373022446000
2 5.00 1373022446000
3 20.50 1373022446000
... first entries older than X milliseconds ...
1 13.00 1372971693000
2 7.00 1372971693000
3 20.50 1372971693000
I would like to have a query that returned a result something similar to the following
itemid abs pct
----------------------------
1 +2.50 +19.2%
2 -2.00 -28.6%
3 0.00 0.00%
I'm not sure how to approach this. It seems as though it should be able to be done with a query, but I've been struggling to make any progress. I'm running sqlite3 on Play Framework 2.1.1.
Thanks!
You can do this with correlated subqueries and joins. The first problem is identifying the most recent price. tmax helps with this by getting the latest date for each item. This is then joined to the original data to get information such as the price.
Then a correlated subquery is used to get the previous price at least xxx milliseconds before that date. Note this is a relative time span, based on the original date. If you want an absolute time span, then do date arithmetic on the current time.
select t.itemid, t.price - t.prevprice,
(t.price - t.prevprice) / t.prevprice as change
from (select t.*,
(select t2.price
from yourtable t2
where t2.itemid = t.itemid and
t2.date < t.date - xxx
order by t2.date desc
limit 1
) as prevprice
from yourtable t join
(select itemid, max(date) as maxdate
from yourtable t
group by itemid
) tmax
on tmax.itemid = t.itemid and tmax.maxdate = t.date
) t
If you have a large amount of data, you might really consider upgrading to a database other than SQLite. In any event, indexes can help improve performance.
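As a sanity check of the approach, here is a sketch on the question's sample rows, using an in-memory SQLite database from Python. It is a variant rather than the exact query above: the previous row is found with MAX(date) instead of ORDER BY ... LIMIT 1, the table name `prices` is made up, and the 12-hour threshold `MIN_AGE_MS` is a hypothetical value chosen because the two sample snapshots are only about 14 hours apart (the question itself asks for one day). The percentage is taken relative to the older price, which is what the expected output in the question implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (itemid INTEGER, price REAL, date INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, 15.50, 1373022446000), (2, 5.00, 1373022446000), (3, 20.50, 1373022446000),
        (1, 13.00, 1372971693000), (2, 7.00, 1372971693000), (3, 20.50, 1372971693000),
    ],
)

MIN_AGE_MS = 12 * 60 * 60 * 1000  # hypothetical threshold; the question uses 86400000

# cur  = the most recent row per item (joined via tmax)
# prev = the newest row that is at least MIN_AGE_MS older than cur
query = """
SELECT cur.itemid,
       cur.price - prev.price AS abs_change,
       ROUND((cur.price - prev.price) * 100.0 / prev.price, 1) AS pct_change
FROM prices cur
JOIN (SELECT itemid, MAX(date) AS maxdate FROM prices GROUP BY itemid) tmax
  ON tmax.itemid = cur.itemid AND tmax.maxdate = cur.date
JOIN prices prev
  ON prev.itemid = cur.itemid
 AND prev.date = (SELECT MAX(p.date) FROM prices p
                  WHERE p.itemid = cur.itemid AND p.date <= cur.date - ?)
ORDER BY cur.itemid
"""
rows = conn.execute(query, (MIN_AGE_MS,)).fetchall()
for itemid, abs_change, pct_change in rows:
    print(itemid, abs_change, pct_change)
```

Items with no row old enough simply drop out of the result because of the inner join on `prev`; a LEFT JOIN would keep them with NULL changes instead.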
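If I read your question correctly, you want the difference and the percentage change between the first price and the last price for each itemid.
I think this will help you (note that it uses SQL Server's TOP syntax; on SQLite you would need LIMIT 1 instead):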
select
t1.itemid,
(select top 1 price from table tout where tout.itemid = t1.itemid order by date desc) -
(select top 1 price from table tout where tout.itemid = t1.itemid order by date) as dif,
((select top 1 price from table tout where tout.itemid = t1.itemid order by date desc) -
(select top 1 price from table tout where tout.itemid = t1.itemid order by date)) /
(select top 1 price from table tout where tout.itemid = t1.itemid order by date desc) * 100 as percent
from
table t1
group by t1.itemid

query to display additional column based on aggregate value

I've been mulling over this problem for a couple of hours now with no luck, so I thought people on SO might be able to help :)
I have a table with data regarding processing volumes at stores. The first three columns shown below can be queried from that table. What I'm trying to do is add a 4th column that is basically a flag for whether a store has processed >=$150 in total, and if so, displays the corresponding date. The date displayed is that of the first row at which the store's cumulative volume reaches $150. Subsequent processing volumes don't count once the activated date has been hit. For example, for store 4, there's just one instance of the activated date.
store_id sales_volume date activated_date
----------------------------------------------------
2 5 03/14/2012
2 125 05/21/2012
2 30 11/01/2012 11/01/2012
3 100 02/06/2012
3 140 12/22/2012 12/22/2012
4 300 10/15/2012 10/15/2012
4 450 11/25/2012
5 100 12/03/2012
Any insights as to how to build out this fourth column? Thanks in advance!
The solution starts by calculating the cumulative sales. Then, you want the activation date only when the cumulative sales first pass through the $150 level. This happens when adding the current sales amount pushes the cumulative amount over the threshold. The following case expression handles this.
select t.store_id, t.sales_volume, t.date,
(case when 150 > cumesales - t.sales_volume and 150 <= cumesales
then date
end) as ActivationDate
from (select t.*,
sum(sales_volume) over (partition by store_id order by date) as cumesales
from t
) t
If you have an older version of Postgres that does not support cumulative sum, you can get the cumulative sales with a subquery like:
(select sum(sales_volume) from t t2 where t2.store_id = t.store_id and t2.date <= t.date) as cumesales
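To see the threshold-crossing test at work, here is a sketch using the subquery form of the cumulative sum on the question's data, run on an in-memory SQLite database from Python rather than Postgres. The dates are rewritten as ISO strings so that they sort correctly as text, and the `s` alias is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (store_id INTEGER, sales_volume INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (2, 5, "2012-03-14"), (2, 125, "2012-05-21"), (2, 30, "2012-11-01"),
        (3, 100, "2012-02-06"), (3, 140, "2012-12-22"),
        (4, 300, "2012-10-15"), (4, 450, "2012-11-25"),
        (5, 100, "2012-12-03"),
    ],
)

# cumesales is the running total per store up to and including the current row.
# A row is the activation row when the total was below 150 before it
# (cumesales - sales_volume) and is at least 150 after it.
query = """
SELECT store_id, sales_volume, date,
       CASE WHEN cumesales - sales_volume < 150 AND cumesales >= 150
            THEN date END AS activated_date
FROM (
    SELECT t.*,
           (SELECT SUM(t2.sales_volume) FROM t t2
            WHERE t2.store_id = t.store_id AND t2.date <= t.date) AS cumesales
    FROM t
) AS s
ORDER BY store_id, date
"""
rows = conn.execute(query).fetchall()
activated = {store: day for store, _, _, day in rows if day is not None}
print(activated)  # stores 2, 3 and 4 get a date; store 5 never reaches 150
```

Because the condition looks at the total before and after the current row, at most one row per store can match, which is why store 4's second sale does not get a date.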
Variant 1
You can LEFT JOIN to a derived table that calculates the first date surpassing the $150 limit per store:
SELECT t.*, b.activated_date
FROM tbl t
LEFT JOIN (
SELECT store_id, min(thedate) AS activated_date
FROM (
SELECT store_id, thedate
,sum(sales_volume) OVER (PARTITION BY store_id
ORDER BY thedate) AS running_sum
FROM tbl
) a
WHERE running_sum >= 150
GROUP BY 1
) b ON t.store_id = b.store_id AND t.thedate = b.activated_date
ORDER BY t.store_id, t.thedate;
The calculation of the first day has to be done in two steps, since the window function accumulating the running sum has to be applied in a separate SELECT.
Variant 2
Another window function instead of the LEFT JOIN. May or may not be faster. Test with EXPLAIN ANALYZE.
SELECT *
,CASE WHEN running_sum >= 150 AND thedate = first_value(thedate)
OVER (PARTITION BY store_id, running_sum >= 150 ORDER BY thedate)
THEN thedate END AS activated_date
FROM (
SELECT *
,sum(sales_volume)
OVER (PARTITION BY store_id ORDER BY thedate) AS running_sum
FROM tbl
) b
ORDER BY store_id, thedate;
->sqlfiddle demonstrating both.