SQL: calculating a running total down the rows while taking other fields into account

I'm hoping you guys can help with this problem.
I have a set of data which I have displayed via Excel.
I'm trying to work out the rolling new cap allowance, but I need to deduct the previous weeks' bookings. I don't want to use a cursor, so can anyone help?
I'm going to group by the product ID, so it will need to start afresh for every product.
In the image, columns A to D are fixed and I am trying to calculate the data in column E ('New Cap'). The 'New Cap' column shows the expected results.
Column F gives a detailed formula of what I'm trying to do.
Thanks
Update:
The formula looks like this.

You want the sum of the cap through this row minus the sum of booked through the previous row. This is easy to do with window functions:
select t.*,
       (sum(cap - booked) over (partition by productid order by weekbeg) + booked
       ) as new_cap
from t;

You can get the new running total using lag and sum over window functions: calculate cap minus the previous week's booked first, then use a running sum() over () for the total:
select weekbeg, productid, cap, booked,
       sum(n) over (partition by productid order by weekbeg) as new_cap
from (
    select *,
           cap - lag(booked, 1, 0) over (partition by productid order by weekbeg) as n
    from t
) t
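The lag-based approach above can be sanity-checked on a small made-up dataset; SQLite's window functions behave the same way here. The column names (weekbeg, productid, cap, booked) follow the question's description, and the sample values are invented:

```python
# Minimal sketch: running "new cap" = cumulative cap minus the booked
# figures of all *previous* weeks, restarting per product.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (weekbeg TEXT, productid INTEGER, cap INTEGER, booked INTEGER)")
conn.executemany("INSERT INTO t VALUES (?,?,?,?)", [
    ("2020-01-06", 1, 10, 4),
    ("2020-01-13", 1, 10, 6),
    ("2020-01-20", 1, 10, 3),
])
rows = conn.execute("""
    SELECT weekbeg, productid, cap, booked,
           SUM(n) OVER (PARTITION BY productid ORDER BY weekbeg) AS new_cap
    FROM (
        SELECT *,
               cap - LAG(booked, 1, 0) OVER (PARTITION BY productid
                                             ORDER BY weekbeg) AS n
        FROM t
    )
""").fetchall()
for r in rows:
    print(r)
```

With three weeks of cap 10 and bookings 4, 6, 3, the running new cap comes out as 10, 16, 20: each week's value is the total cap so far minus everything booked before that week.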

Related

Finding the initial sampled time window after using SAMPLE BY again

I can't seem to find a solution, perhaps an easy one, to what I'm trying to accomplish here using SQL and, more importantly, QuestDB. I also find it hard to put my exact question into words, so bear with me.
Input
My real input is different of course but a similar dataset or case is the gas_prices table on the demo page of QuestDB. On https://demo.questdb.io, you can directly write and run queries against some sample database, so it should be easy enough to follow.
The main task I want to accomplish is to find out which month was responsible for the year's highest galon price.
Output
Using the following query, I can get the average galon price per month just fine.
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M

timestamp                     | avg_per_month
------------------------------|---------------
2000-06-05T00:00:00.000000Z   | 1.6724
2000-07-05T00:00:00.000000Z   | 1.69275
2000-08-05T00:00:00.000000Z   | 1.635
...                           | ...
Then, I get all these monthly averages, group them by year and return the maximum galon price per year by wrapping the above query in a subquery, like so:
SELECT timestamp, max(avg_per_month) as max_per_year FROM (
    SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
) SAMPLE BY 12M

timestamp                     | max_per_year
------------------------------|----------------
2000-01-05T00:00:00.000000Z   | 1.69275
2001-01-05T00:00:00.000000Z   | 1.767399999999
2002-01-05T00:00:00.000000Z   | 1.52075
...                           | ...
Wanted output
I want to know which month was responsible for the maximum price of a year.
Looking at the output of the above query, we see that the maximum galon price for the year 2000 was 1.69275. Which month of the year 2000 had this amount as average price? I'd like to display this month in an additional column.
For the first row, July 2000 is shown in the additional column for year 2000 because it is responsible for the highest average price in 2000. For the second row, it was May 2001 as that month had the highest average price of 2001.
timestamp                     | max_per_year    | which_month_is_responsible
------------------------------|-----------------|----------------------------
2000-01-05T00:00:00.000000Z   | 1.69275         | 2000-07-05T00:00:00.000000Z
2001-01-05T00:00:00.000000Z   | 1.767399999999  | 2001-05-05T00:00:00.000000Z
...                           | ...             | ...
What did I try?
I tried adding a subquery to the SELECT to get a "duplicate" of sorts of the timestamp column, but that's apparently not valid in QuestDB (?), so perhaps the solution involves adding even more subqueries in the FROM? Or a UNION?
Who can help me out with this? The data is there in the database and it can be calculated. It's just a matter of getting it out.
I think 'wanted output' can be achieved with window functions.
Please have a look at:
CREATE TABLE electricity (ts TIMESTAMP, consumption DOUBLE) TIMESTAMP(ts);

INSERT INTO electricity
    SELECT (x*1000000)::timestamp, rnd_double()
    FROM long_sequence(10000000);

SELECT day, ts, max_per_day
FROM (
    SELECT timestamp_floor('d', ts) as day,
           ts,
           avg_in_15_min as max_per_day,
           row_number() OVER (PARTITION BY timestamp_floor('d', ts)
                              ORDER BY avg_in_15_min DESC) as rn_per_day
    FROM (
        SELECT ts, avg(consumption) as avg_in_15_min
        FROM electricity
        SAMPLE BY 15m
    )
) WHERE rn_per_day = 1
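The same row_number() pattern can be reproduced outside QuestDB; here is a sketch in SQLite on a hand-made monthly table (the names monthly, month, and avg_price are invented for the example): rank the rows inside each year by price descending, then keep only the top-ranked row.

```python
# Per-group "argmax" via ROW_NUMBER(): which month carries each year's
# maximum average price.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month TEXT, avg_price REAL)")
conn.executemany("INSERT INTO monthly VALUES (?,?)", [
    ("2000-06", 1.6724), ("2000-07", 1.69275), ("2000-08", 1.635),
    ("2001-04", 1.7), ("2001-05", 1.7674),
])
rows = conn.execute("""
    SELECT year, month AS which_month_is_responsible, avg_price AS max_per_year
    FROM (
        SELECT substr(month, 1, 4) AS year, month, avg_price,
               ROW_NUMBER() OVER (PARTITION BY substr(month, 1, 4)
                                  ORDER BY avg_price DESC) AS rn
        FROM monthly
    )
    WHERE rn = 1
    ORDER BY year
""").fetchall()
for r in rows:
    print(r)
```

Each year keeps exactly one row: the month whose average was that year's maximum, which is the extra column the question asks for.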

Taking the rolling sum of a column until its difference from another column is positive in BigQuery

I'm not sure whether this can be done in BigQuery. Anyhow, I've been trying to solve this.
Columns:
Here, Total Inventory is the inventory that arrives on that date, Demand predicts sales, and Final Inventory is the inventory remaining on that day.
I want to subtract the rolling sum of demand from the total inventory to get the final inventory until the value is positive; afterwards, the rolling sum of demand should start again.
Code that I've written:
GREATEST((SUM(Total_Inventory) OVER (PARTITION BY SKU ORDER BY Date) -
          SUM(Demand) OVER (PARTITION BY SKU ORDER BY Date)), 0)
Basically, this code breaks down because there is no condition on the rolling sum of demand instructing it to stop.
After that, I populated values 1, 2, ..., n whenever new inventory gets added. That helped me partition based on arriving inventory; however, when the final inventory is positive from previous results and new inventory gets added, it doesn't consider the last rolling sum of demand and starts summing from that row.
The column where I impute 1, 2, ..., n: sort
Code:
GREATEST((SUM(Total_Inventory) OVER (PARTITION BY SKU ORDER BY Date) -
          SUM(Demand) OVER (PARTITION BY SKU, sort ORDER BY Date)), 0)
For better understanding, I'm attaching the screenshots Raw Data, Desired Results_1, Desired Results_2, Desired Results_3.
Any help would be great. Thanks in advance!!
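To see why the clamped running difference is not enough on its own, here is a sketch in SQLite (which uses the two-argument scalar MAX(x, 0) where BigQuery has GREATEST). The table and column names (SKU, Date, Total_Inventory, Demand) follow the question; the sample rows are invented:

```python
# Clamped running difference: cumulative inventory minus cumulative demand,
# floored at 0. The clamp hides the deficit but the demand sum is never
# restarted, which is exactly the breakdown the question describes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inv (SKU TEXT, Date TEXT, Total_Inventory INTEGER, Demand INTEGER)")
conn.executemany("INSERT INTO inv VALUES (?,?,?,?)", [
    ("A", "2021-01-01", 100, 40),
    ("A", "2021-01-02",   0, 40),
    ("A", "2021-01-03",   0, 40),   # stock runs out here
    ("A", "2021-01-04",  50, 40),   # new stock arrives
])
rows = conn.execute("""
    SELECT Date,
           MAX(SUM(Total_Inventory) OVER (PARTITION BY SKU ORDER BY Date)
             - SUM(Demand)          OVER (PARTITION BY SKU ORDER BY Date),
               0) AS final_inventory
    FROM inv
""").fetchall()
for r in rows:
    print(r)
```

The result is 60, 20, 0, 0: after the stock-out on day 3, the carried-over deficit swallows the new 50 units, whereas the desired restart logic would give 50 − 40 = 10 on day 4.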

How to NTILE over distinct values in BigQuery?

I have a query that I'm trying to put together in Google BigQuery that would decile sales for each customer. The problem I'm running into is that if a decile breaks at the point where many customers have the same sales value, they could end up in different deciles despite having the same sales.
For example, if there were twenty customers in total, and one spent $100, 18 spent $50, and one spent $25, the 18 customers who spent $50 will still be broken out across all the deciles due to equal groups being created, whereas in reality I would want them to be placed in the same decile.
The data that I'm using is obviously a bit more complex -- there are about 10 million customers, and the sales are deciled within a particular group to which each customer belongs.
Example code:
NTILE(10) OVER (PARTITION BY customer_group ORDER BY yearly_sales asc) as current_sales_decile
The NTILE function works, but I just run into the problem described above and haven't figured out how to fix it. Any suggestions welcome.
Calculate the ntile yourself:
select ceiling(rank() over (partition by customer_group order by yearly_sales) * 10.0 /
               count(*) over (partition by customer_group)
       ) as current_sales_decile
This gives you more control over how the tiles are formed. In particular, all rows with the same value go in the same tile.
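The manual-NTILE idea can be checked on the 20-customer example from the question. This sketch runs in SQLite, which has no CEILING() function, so ceil(rank * 10 / cnt) is written with the integer-arithmetic equivalent (rank * 10 + cnt - 1) / cnt:

```python
# Manual decile: ceil(rank * 10 / count) keeps tied sales values together,
# unlike NTILE(10), which forces equal-sized tiles.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_group TEXT, yearly_sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?,?)",
                 [("g1", 100)] + [("g1", 50)] * 18 + [("g1", 25)])
rows = conn.execute("""
    SELECT yearly_sales,
           (RANK()  OVER (PARTITION BY customer_group ORDER BY yearly_sales) * 10
              + COUNT(*) OVER (PARTITION BY customer_group) - 1)
             / COUNT(*) OVER (PARTITION BY customer_group) AS decile
    FROM sales
""").fetchall()
for r in rows:
    print(r)
```

All 18 customers who spent $50 share rank 2 and therefore land in the same decile, while the $100 customer (rank 20) lands in decile 10.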

Oracle SQL - Sum next X number of Rows

I have a table in an Oracle database with projected sales per week and would like to sum the next 3 weeks for each week. Here is an example of the table for one product, with what I would like to achieve in the last column.
I tried the Sum(Proj Sales) over (partition by Product order by Date), but I am not sure how to configure the Sum Over to get what I am looking for.
Any assistance will be much appreciated.
You can use analytic functions. Assuming that the next three weeks are the current row and the next two:
select t.*,
sum(proj_sales) over (partition by product
order by date
rows between current row and 2 following
) as next_three_weeks
from t;
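The ROWS BETWEEN frame above can be tried out directly; here is a sketch in SQLite with placeholder names (t, product, week, proj_sales) standing in for the question's table:

```python
# Forward-looking rolling sum: current week plus the two following weeks,
# per product.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (product TEXT, week TEXT, proj_sales INTEGER)")
conn.executemany("INSERT INTO t VALUES (?,?,?)", [
    ("A", "2021-W01", 10),
    ("A", "2021-W02", 20),
    ("A", "2021-W03", 30),
    ("A", "2021-W04", 40),
])
rows = conn.execute("""
    SELECT product, week, proj_sales,
           SUM(proj_sales) OVER (PARTITION BY product
                                 ORDER BY week
                                 ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
             AS next_three_weeks
    FROM t
""").fetchall()
for r in rows:
    print(r)
```

The frame shrinks near the end of the partition, so the last weeks sum over fewer than three rows (90, then 70, then 40 here).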

Calculate max value by a dimension over a number of days in SQL (Presto)

I am looking to get the max value by a specific dimension over a number of days in SQL, as per example below:
I have this initial dataset:
And I am looking to calculate the maximum number of items and the maximum sales by product type across those days, as in the example below:
Expected output:
Any advice on the best way to get this? I tried the max function and max_by to get the max by product ID, but it didn't work.
Thank you in advance.
Use window functions:
select t.*,
       max(items) over (partition by product_type) as max_items,
       max(sales) over (partition by product_type) as max_sales
from t;