SQL Server window function for cumulative sum over a month

I'm trying to use a window function to compute a cumulative sum by month, as follows:
sum(MeterReading) over (partition by Serial, code order by month(MeterReadingDate)) as cumulative
This seems to be way too slow to run and doesn't return any results after waiting. Is there something I am doing wrong?
Basically, I want to see the sum against each month for each serial/code.

Select
    serial,
    code,
    DATEPART(YEAR, MeterReadingDate) as Year,
    DATEPART(MONTH, MeterReadingDate) as Month,
    sum(MeterReading) over (
        partition by
            Serial,
            code,
            Datepart(YEAR, MeterReadingDate),
            Datepart(MONTH, MeterReadingDate)
    ) as cumulative
from table
First, a sum with an ORDER BY clause makes no sense here, since you want to add all results for one month together.
Second, adding the two DATEPARTs for year and month partitions your data so that each sum only adds meter readings from one month.
If you are interested in seeing yearly variations per month, you can remove the Datepart(YEAR,...) and perhaps add an average.
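To see what the partition-only version returns, here is a minimal sketch run from Python against an in-memory SQLite database. SQLite is only a stand-in for SQL Server (an assumption, as are the table name readings and the sample rows), so strftime takes the place of DATEPART.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (serial TEXT, code TEXT, "
    "MeterReadingDate TEXT, MeterReading INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("S1", "A", "2023-01-05", 10),
        ("S1", "A", "2023-01-20", 15),
        ("S1", "A", "2023-02-03", 7),
        ("S1", "A", "2024-01-09", 4),
    ],
)

# No ORDER BY inside OVER(): every row gets the total of its own partition.
rows = conn.execute(
    """
    SELECT MeterReadingDate,
           SUM(MeterReading) OVER (
               PARTITION BY serial, code,
                            strftime('%Y', MeterReadingDate),
                            strftime('%m', MeterReadingDate)
           ) AS monthly_total
    FROM readings
    ORDER BY MeterReadingDate
    """
).fetchall()

monthly_totals = [total for _, total in rows]
print(rows)
```

Both January 2023 rows carry the same monthly total, and January 2024 stays separate because the year is part of the partition.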

sum(MeterReading) over (
    partition by Serial,
                 code,
                 DATEADD(MONTH, DATEDIFF(MONTH, 0, MeterReadingDate), 0)
    order by MeterReadingDate
) as cumulative
The function with DATEADD and DATEDIFF converts a date to the first of the month.
Then I add this function to the PARTITION to group by Serial, Code and Month.
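The truncation trick is plain month arithmetic. As a rough Python model (the helper names below are made up for illustration): DATEDIFF(MONTH, 0, d) counts the month boundaries between the base date and d, and DATEADD(MONTH, n, 0) adds that many whole months back onto the base date. SQL Server treats the date value 0 as 1900-01-01.

```python
from datetime import date

# SQL Server interprets the date value 0 as 1900-01-01.
BASE = date(1900, 1, 1)


def month_diff(start: date, end: date) -> int:
    """Model of DATEDIFF(MONTH, start, end): month boundaries crossed."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def add_months(start: date, months: int) -> date:
    """Model of DATEADD(MONTH, months, start) for a start on day 1."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, start.day)


def first_of_month(d: date) -> date:
    """Model of DATEADD(MONTH, DATEDIFF(MONTH, 0, d), 0)."""
    return add_months(BASE, month_diff(BASE, d))


print(first_of_month(date(2023, 3, 17)))
```

Every date in the same calendar month maps to the same value, which is why the expression works as a partition key.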

Related

Finding the initial sampled time window after using SAMPLE BY again

I can't seem to find a perhaps easy solution to what I'm trying to accomplish here, using SQL and, more importantly, QuestDB. I also find it hard to put my exact question into words so bear with me.
Input
My real input is different of course but a similar dataset or case is the gas_prices table on the demo page of QuestDB. On https://demo.questdb.io, you can directly write and run queries against some sample database, so it should be easy enough to follow.
The main task I want to accomplish is to find out which month was responsible for the year's highest gallon price.
Output
Using the following query, I can get the average gallon price per month just fine.
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
timestamp                      avg_per_month
2000-06-05T00:00:00.000000Z    1.6724
2000-07-05T00:00:00.000000Z    1.69275
2000-08-05T00:00:00.000000Z    1.635
...                            ...
Then I take all these monthly averages, group them by year, and return the maximum gallon price per year by wrapping the above query in a subquery, like so:
SELECT timestamp, max(avg_per_month) as max_per_year FROM (
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
) SAMPLE BY 12M
timestamp                      max_per_year
2000-01-05T00:00:00.000000Z    1.69275
2001-01-05T00:00:00.000000Z    1.767399999999
2002-01-05T00:00:00.000000Z    1.52075
...                            ...
Wanted output
I want to know which month was responsible for the maximum price of a year.
Looking at the output of the above query, we see that the maximum gallon price for the year 2000 was 1.69275. Which month of the year 2000 had this amount as its average price? I'd like to display this month in an additional column.
For the first row, July 2000 is shown in the additional column for year 2000 because it is responsible for the highest average price in 2000. For the second row, it was May 2001 as that month had the highest average price of 2001.
timestamp                      max_per_year      which_month_is_responsible
2000-01-05T00:00:00.000000Z    1.69275           2000-07-05T00:00:00.000000Z
2001-01-05T00:00:00.000000Z    1.767399999999    2001-05-05T00:00:00.000000Z
...                            ...               ...
What did I try?
I tried by adding a subquery to the SELECT to have a "duplicate" of some sort for the timestamp column but that's apparently never valid in QuestDB (?), so probably the solution is by adding even more subqueries in the FROM? Or a UNION?
Who can help me out with this? The data is there in the database and it can be calculated. It's just a matter of getting it out.
I think 'wanted output' can be achieved with window functions.
Please have a look at:
CREATE TABLE electricity (ts TIMESTAMP, consumption DOUBLE) TIMESTAMP(ts);
INSERT INTO electricity
SELECT (x*1000000)::timestamp, rnd_double()
FROM long_sequence(10000000);
SELECT day, ts, max_per_day
FROM
(
    SELECT timestamp_floor('d', ts) as day,
           ts,
           avg_in_15_min as max_per_day,
           row_number() OVER (PARTITION BY timestamp_floor('d', ts) ORDER BY avg_in_15_min desc) as rn_per_day
    FROM
    (
        SELECT ts, avg(consumption) as avg_in_15_min
        FROM electricity
        SAMPLE BY 15m
    )
) WHERE rn_per_day = 1
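The same row_number() pattern carries over to the gas price question: rank the monthly averages within each year and keep rank 1. Below is a minimal sketch in Python using an in-memory SQLite database rather than QuestDB (an assumption, so substr on a text month replaces timestamp_floor and there is no SAMPLE BY). The monthly table and its numbers are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding already-computed monthly averages.
conn.execute("CREATE TABLE monthly (month TEXT, avg_per_month REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [
        ("2000-06", 1.6724),
        ("2000-07", 1.69275),
        ("2000-08", 1.635),
        ("2001-04", 1.70),    # illustrative value
        ("2001-05", 1.7674),  # illustrative value
    ],
)

# Rank months inside each year by average price, then keep the top one.
best = conn.execute(
    """
    SELECT year,
           avg_per_month AS max_per_year,
           month AS which_month_is_responsible
    FROM (
        SELECT substr(month, 1, 4) AS year,
               month,
               avg_per_month,
               row_number() OVER (
                   PARTITION BY substr(month, 1, 4)
                   ORDER BY avg_per_month DESC
               ) AS rn
        FROM monthly
    )
    WHERE rn = 1
    ORDER BY year
    """
).fetchall()

print(best)
```

Because the month travels along with the ranked row, no second lookup is needed to find out which month produced the maximum.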

Sum dates with different timestamps and picking the min date?

Beginner here. I want to have only one row for each delivery date but it is important to keep the hours and the minutes. I have the following table in Oracle (left):
As you can see there are days that a certain SKU (e.g SKU A) was delivered twice in the same day. The table on the right is the desired result. Essentially, I want to have the quantities that arrived on the 28th summed up and in the Supplier_delivery column I want to have the earliest delivery timestamp.
I need to keep the hours and the minutes; otherwise, I know I could achieve this by writing something like:
SELECT SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD'), SUM(QTY) FROM TABLE GROUP BY SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD')
Any ideas?
You can use MIN():
SELECT SKU, MIN(SUPPLIER_DELIVERY), SUM(QTY)
FROM TABLE
GROUP BY SKU, TRUNC(SUPPLIER_DELIVERY);
This assumes that SUPPLIER_DELIVERY is a date and does not need to be converted to one. But it would work with TO_DATE() in the GROUP BY as well.
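As a quick check of the idea, here is a small sketch in Python with an in-memory SQLite database standing in for Oracle (an assumption, so date() replaces TRUNC). The deliveries table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; timestamps are ISO strings so MIN() sorts correctly.
conn.execute(
    "CREATE TABLE deliveries (sku TEXT, supplier_delivery TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?)",
    [
        ("A", "2023-03-28 08:15", 10),
        ("A", "2023-03-28 14:40", 5),
        ("A", "2023-03-29 09:00", 7),
        ("B", "2023-03-28 10:30", 3),
    ],
)

# Group by the calendar day, but report the earliest full timestamp.
result = conn.execute(
    """
    SELECT sku,
           MIN(supplier_delivery) AS first_delivery,
           SUM(qty) AS total_qty
    FROM deliveries
    GROUP BY sku, date(supplier_delivery)
    ORDER BY sku, first_delivery
    """
).fetchall()

print(result)
```

The two deliveries of SKU A on the 28th collapse into one row that keeps the 08:15 timestamp and the summed quantity.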

How to Aggregate Weighted averages fields at higher level?

I am calculating a weighted figure for a field as sum(revenue) / sum(qty), as in the query below. I will then create a view that stores these results, as shown in the code below.
My question is: if I select this w_revenue from the view and want to see it per year, how do I aggregate it?
select month, sum(revenue)/ Sum(qty) as w_revenue,
year
from my_revenue_table
group by month, year;
create view xyz_rev as
Select month, sum(revenue)/ Sum(qty) as w_revenue,
year
from my_revenue_table
group by month, year;
select year, w_revenue
from
xyz_rev ;
You'll aggregate W_REVENUE once again:
select year, sum(w_revenue)
from xyz_rev
group by year
Since there is no real way to get the w_revenue for the year from just the monthly values, I would suggest changing the view as follows:
create view xyz_rev as
Select month, sum(revenue)/ Sum(qty) as w_revenue,
year
from my_revenue_table
group by ROLLUP(MONTH), year;
This way you'll already have the year data along with the data for the single months.
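A small numeric check makes the point about monthly ratios concrete. The sketch below uses Python with an in-memory SQLite database (an assumption, chosen only so the example runs anywhere) and two made-up months. It compares re-aggregating w_revenue from the view with recomputing sum(revenue) / sum(qty) from the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_revenue_table "
    "(year INTEGER, month INTEGER, revenue REAL, qty REAL)"
)
# Illustrative rows: month 1 has ratio 10, month 2 has ratio 5.
conn.executemany(
    "INSERT INTO my_revenue_table VALUES (?, ?, ?, ?)",
    [(2023, 1, 100.0, 10.0), (2023, 2, 300.0, 60.0)],
)
conn.execute(
    """
    CREATE VIEW xyz_rev AS
    SELECT month, SUM(revenue) / SUM(qty) AS w_revenue, year
    FROM my_revenue_table
    GROUP BY month, year
    """
)

# Re-aggregating the monthly ratios from the view.
sum_of_ratios, avg_of_ratios = conn.execute(
    "SELECT SUM(w_revenue), AVG(w_revenue) FROM xyz_rev WHERE year = 2023"
).fetchone()

# Recomputing the ratio for the whole year from the base table.
yearly_ratio = conn.execute(
    "SELECT SUM(revenue) / SUM(qty) FROM my_revenue_table WHERE year = 2023"
).fetchone()[0]

print(sum_of_ratios, avg_of_ratios, yearly_ratio)
```

Neither the sum nor the plain average of the monthly ratios matches the yearly figure, because the months carry different quantities. The yearly value has to come from the underlying sums, which is what a rollup over the base table provides.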

Moving trailing week average in PostgreSQL

My source data includes Transaction ID, Date, and Amount. I need a one-week trailing average of the amount per transaction that moves on a daily basis. The problem is that sometimes there are no transactions on a particular date, and I need the average per transaction, not per day, while the trailing window moves by day, not by week. In this particular case I can't use OVER with ROWS PRECEDING. I'm stuck with it :(
Data looks like this:
https://gist.github.com/avitominoz/a252e9f1ab3b1d02aa700252839428dd
There are two ways to do this. One uses generate_series() to get all the dates. The second uses a lateral join.
with minmax as (
      select min(trade_date) as mintd, max(trade_date) as maxtd
      from sales
     )
select days.dte, s.values,
       avg(s.values) over (order by days.dte
                           rows between 6 preceding and current row
                          ) as avg_7day
-- minmax must appear in FROM so that mintd and maxtd are visible to generate_series()
from minmax cross join
     generate_series(mintd, maxtd, interval '1 day') days(dte) left join
     sales s
     on s.trade_date = days.dte;
Note: this ignores the values on missing days rather than treating them as 0. If you want 0, then use avg(coalesce(values, 0)).
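When a day can hold several transactions, a row-based frame counts rows rather than days. One alternative (a swapped-in technique, not the query above) is a correlated subquery that averages every transaction in the seven calendar days ending on each date. The sketch below runs it from Python on an in-memory SQLite database as a stand-in for PostgreSQL. The column name amount and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per transaction, dates as ISO text.
conn.execute("CREATE TABLE sales (trade_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2023-01-01", 10.0),
        ("2023-01-01", 20.0),  # two transactions on the same day
        ("2023-01-03", 30.0),
        ("2023-01-09", 50.0),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE days(dte) AS (
        -- calendar of every day between the first and last trade
        SELECT MIN(trade_date) FROM sales
        UNION ALL
        SELECT date(dte, '+1 day') FROM days
        WHERE dte < (SELECT MAX(trade_date) FROM sales)
    )
    SELECT d.dte,
           (SELECT AVG(s.amount)
            FROM sales s
            WHERE s.trade_date BETWEEN date(d.dte, '-6 days') AND d.dte
           ) AS avg_7day
    FROM days d
    ORDER BY d.dte
    """
).fetchall()

avg_by_day = dict(rows)
print(avg_by_day)
```

Each calendar day gets a row, including days without transactions, and the average is weighted per transaction because AVG runs over the individual sales rows inside the window.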

Calculating the AVG value per GROUP in the GROUP BY Clause

I'm working on a query in SQL Server 2005 that looks at a table of recorded phone calls, groups them by the hour of the day, and computes the average wait time for each hour in the day.
I have a query that I think works, but I'm having trouble convincing myself it's right.
SELECT
DATEPART(HOUR, CallTime) AS Hour,
(AVG(calls.WaitDuration) / 60) AS WaitingTimesInMinutes
FROM (
SELECT
CallTime,
WaitDuration
FROM Calls
WHERE DATEADD(day, DATEDIFF(Day, 0, CallTime), 0) = DATEADD(day, DATEDIFF(Day, 0, GETDATE()), 0)
AND DATEPART(HOUR, CallTime) BETWEEN 6 AND 18
) AS calls
GROUP BY DATEPART(HOUR, CallTime)
ORDER BY DATEPART(HOUR, CallTime);
To clarify what I think is happening, this query looks at all calls made on the same day as today, and where the hour of the call is between 6 and 18 -- the times are recorded and SELECTed in 24-hour time, so this between hours is to get calls between 6am and 6pm.
Then, the outer query computes the average of the WaitDuration column (and converts seconds to minutes) and then groups each average by the hour.
What I'm uncertain of is this: Are the reported BY HOUR averages only for the calls made in that hour's timeframe? Or does it compute each reported average using all the calls made on the day and between the hours? I know the AVG function has an optional OVER/PARTITION clause, and it's been a while since I used AVG as a group function. What I would like is for each result grouped by hour to show ONLY the average wait time for that specific hour of the day.
Thanks for your time in this.
The grouping happens on the values that get spit out of datepart(hour, ...). You're already filtering on that value so you know they're going to range between 6 and 18. That's all that the grouping is going to see.
Now of course the datepart() function does what you're looking for in that it looks at the clock and gives the hour component of the time. If you want your group to coincide with HH:00:00 to HH:59:59.997 then you're in luck.
I've already noted in comments that you probably meant to filter your range from 6 to 17 and that your query will probably perform better if you change that and compare your raw CallTime value against a static range instead. Your reasoning looks correct to me. And because your reasoning is correct, you don't need the inner query (derived table) at all.
Also, if WaitDuration is an integer then you're going to be doing integer division in your output. You'd need to cast to decimal in that case, or change the divisor to a decimal value like 60.00.
Yes if you use the AVG function with a GROUP BY only the items in that group are averaged. Just like if you use the COUNT function with a GROUP BY only the items in that group are counted.
You can use windowing functions (OVER/PARTITION) to conceptually perform GROUP BYs on different criteria for a single function.
For example:
AVG(zed) OVER (PARTITION BY DATEPART(YEAR, CallTime)) as YEAR_AVG
Are the reported BY HOUR averages only for the calls made in that hour's timeframe?
Yes. The WHERE clause is applied before the grouping and aggregation, so the aggregation will apply to all records that fit the WHERE clause and within each group.
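A tiny experiment confirms this. The sketch below uses Python with an in-memory SQLite database in place of SQL Server 2005 (an assumption, so strftime replaces DATEPART), with made-up calls. It filters to hours 6 through 17 and divides by 60.0 to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calls (CallTime TEXT, WaitDuration INTEGER)")
# Illustrative rows; WaitDuration is in seconds.
conn.executemany(
    "INSERT INTO Calls VALUES (?, ?)",
    [
        ("2023-05-01 09:05:00", 60),
        ("2023-05-01 09:40:00", 180),
        ("2023-05-01 10:10:00", 300),
        ("2023-05-01 19:30:00", 900),  # outside the range, dropped by WHERE
    ],
)

# WHERE filters rows first; AVG then sees only the rows of each hour group.
per_hour = conn.execute(
    """
    SELECT CAST(strftime('%H', CallTime) AS INTEGER) AS hour,
           AVG(WaitDuration) / 60.0 AS wait_minutes
    FROM Calls
    WHERE CAST(strftime('%H', CallTime) AS INTEGER) BETWEEN 6 AND 17
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

print(per_hour)
```

The 9 o'clock average comes only from the two 9 o'clock calls, and the long evening call never reaches the aggregation.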