Taking rolling sum values of a column until its difference from another column is positive in BigQuery - sql

I'm not sure whether this can be done in BigQuery. Anyhow, I've been trying to solve it.
Columns:
Total_Inventory is the inventory that arrives on that date, Demand is the predicted sales, and Final_Inventory is the inventory remaining at the end of that day.
I want to subtract the rolling sum of Demand from Total_Inventory to get Final_Inventory for as long as the result stays positive; after that, the rolling sum of Demand should start over.
Code that I've written:
GREATEST((SUM(Total_Inventory) OVER (PARTITION BY SKU ORDER BY Date) -
SUM(Demand) OVER (PARTITION BY SKU ORDER BY Date)), 0)
Basically, this code breaks down because there is no condition telling the rolling sum of demand where to stop.
After that, I populated the values 1, 2, ..., n whenever new inventory gets added. That helped me partition based on inventory arrivals; however, when the final inventory from the previous block is still positive and new inventory gets added, it doesn't carry over the last rolling sum of demand and instead starts summing from that row.
The column where I impute 1, 2, ..., n is called sort.
Code:
GREATEST((SUM(Total_Inventory) OVER (PARTITION BY SKU ORDER BY Date) -
SUM(Demand) OVER (PARTITION BY SKU, sort ORDER BY Date)), 0)
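For reference, here is a minimal sketch of how that expression could sit in a full query; the table name inventory is an assumption, and the column names follow the description above. This only reproduces the attempted approach, it does not yet solve the restart problem:
SELECT
  SKU,
  Date,
  Total_Inventory,
  Demand,
  -- Hypothetical wrapper around the expression above (assumed table: inventory)
  GREATEST(SUM(Total_Inventory) OVER (PARTITION BY SKU ORDER BY Date)
           - SUM(Demand) OVER (PARTITION BY SKU, sort ORDER BY Date), 0) AS Final_Inventory
FROM inventory
ORDER BY SKU, Date;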
For better understanding, I'm attaching the screenshots Raw Data, Desired Results_1, Desired Results_2, Desired Results_3.
Any help would be great. Thanks in advance!!

Related

Finding the initial sampled time window after using SAMPLE BY again

I can't seem to find a (perhaps easy) solution to what I'm trying to accomplish here using SQL and, more importantly, QuestDB. I also find it hard to put my exact question into words, so bear with me.
Input
My real input is different of course but a similar dataset or case is the gas_prices table on the demo page of QuestDB. On https://demo.questdb.io, you can directly write and run queries against some sample database, so it should be easy enough to follow.
The main task I want to accomplish is to find out which month was responsible for the year's highest galon price.
Output
Using the following query, I can get the average galon price per month just fine.
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
timestamp                      avg_per_month
2000-06-05T00:00:00.000000Z    1.6724
2000-07-05T00:00:00.000000Z    1.69275
2000-08-05T00:00:00.000000Z    1.635
...
Then, I get all these monthly averages, group them by year and return the maximum galon price per year by wrapping the above query in a subquery, like so:
SELECT timestamp, max(avg_per_month) as max_per_year FROM (
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
) SAMPLE BY 12M
timestamp                      max_per_year
2000-01-05T00:00:00.000000Z    1.69275
2001-01-05T00:00:00.000000Z    1.767399999999
2002-01-05T00:00:00.000000Z    1.52075
...
Wanted output
I want to know which month was responsible for the maximum price of a year.
Looking at the output of the above query, we see that the maximum galon price for the year 2000 was 1.69275. Which month of the year 2000 had this amount as average price? I'd like to display this month in an additional column.
For the first row, July 2000 is shown in the additional column for year 2000 because it is responsible for the highest average price in 2000. For the second row, it was May 2001 as that month had the highest average price of 2001.
timestamp                      max_per_year      which_month_is_responsible
2000-01-05T00:00:00.000000Z    1.69275           2000-07-05T00:00:00.000000Z
2001-01-05T00:00:00.000000Z    1.767399999999    2001-05-05T00:00:00.000000Z
...
What did I try?
I tried adding a subquery to the SELECT to get a "duplicate" of sorts of the timestamp column, but that apparently isn't valid in QuestDB (?), so perhaps the solution is to add even more subqueries in the FROM? Or a UNION?
Who can help me out with this? The data is there in the database and it can be calculated. It's just a matter of getting it out.
I think 'wanted output' can be achieved with window functions.
Please have a look at:
-- Synthetic test data: 10 million rows, one reading per second, random consumption
CREATE TABLE electricity (ts TIMESTAMP, consumption DOUBLE) TIMESTAMP(ts);

INSERT INTO electricity
SELECT (x*1000000)::timestamp, rnd_double()
FROM long_sequence(10000000);

-- For each day, keep the 15-minute average with the highest value (rn_per_day = 1)
SELECT day, ts, max_per_day
FROM
(
    SELECT timestamp_floor('d', ts) AS day,
           ts,
           avg_in_15_min AS max_per_day,
           row_number() OVER (PARTITION BY timestamp_floor('d', ts)
                              ORDER BY avg_in_15_min DESC) AS rn_per_day
    FROM
    (
        SELECT ts, avg(consumption) AS avg_in_15_min
        FROM electricity
        SAMPLE BY 15m
    )
) WHERE rn_per_day = 1
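Applied back to the gas_prices question, the same row_number() pattern might look roughly like this (an untested sketch; it assumes timestamp_floor() also accepts the 'y' unit for year):
SELECT year, timestamp, max_per_year
FROM
(
    SELECT timestamp_floor('y', timestamp) AS year,
           timestamp,
           avg_per_month AS max_per_year,
           row_number() OVER (PARTITION BY timestamp_floor('y', timestamp)
                              ORDER BY avg_per_month DESC) AS rn_per_year
    FROM
    (
        -- The asker's original monthly average query
        SELECT timestamp, avg(galon_price) AS avg_per_month
        FROM 'gas_prices'
        SAMPLE BY 1M
    )
) WHERE rn_per_year = 1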

SQL: calculating a running total as you go down the rows while taking other fields into account

I'm hoping you guys can help with this problem.
I have a set of data which I have displayed via Excel.
I'm trying to work out the rolling new cap allowance but need to deduct the previous weeks' bookings. I don't want to use a cursor, so can anyone help?
I'm going to group by the product id, so it will need to start afresh for every product.
In the image, columns A to D are fixed and I am trying to calculate the data in column E ('New Cap'); the 'New Cap' column shows the expected results.
Column F gives a detailed formula of what I'm trying to do.
Not sure what I've done for the post to be marked down.
Thanks
Update:
The formula looks like this.
You want the sum of the cap through this row minus the sum of booked through the previous row. This is easy to do with window functions:
select t.*,
       (sum(cap - booked) over (partition by productid order by weekbeg) + booked
       ) as new_cap
from t;
You can get the new running total using LAG and SUM window functions - first calculate cap minus the previous week's booked, then use SUM OVER() for the running total:
select weekbeg, ProductId, Cap, Booked,
       sum(n) over (partition by productid order by weekbeg) as New_Cap
from (
    select *,
           cap - lag(booked, 1, 0) over (partition by productid order by weekbeg) as n
    from t
) t
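To try either query without the original table, here is a hypothetical sample set using Postgres-style VALUES syntax; the column names follow the answers above and the numbers are made up:
with t as (
    select * from (values
        (date '2024-01-01', 1, 10, 4),
        (date '2024-01-08', 1, 10, 7),
        (date '2024-01-15', 1, 10, 2)
    ) as v(weekbeg, productid, cap, booked)
)
select t.*,
       -- running cap minus bookings up to the previous week: 10, 16, 19
       sum(cap - booked) over (partition by productid order by weekbeg) + booked as new_cap
from t;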

DAX Proportion of balance by state and date

I am currently attempting to use DAX queries to calculate the proportion of the balance attributed to each State in my analysis cube, from the following image:
I currently have a Sales table with a ReportDateKey that joins to a ReportDate table that has a DateKey.
If I use the following statement:
AllCurrentBalanceByDate:=CALCULATE([TotalCurrentBalance],ALLSELECTED())
It gives me the overall total, ignoring the date altogether, which is a useless figure.
If I enter the following query and display it in the excel spreadsheet:
AllCurrentBalanceByDate:=CALCULATE([TotalCurrentBalance],ALLSELECTED('Report Date'[Month]))
it is returning the same data as found in the Balance column. Again, useless. I need a total for each month, so that I can calculate the state balance / overall total for that month to get the proportion/percentage attributable to that State.
What am I doing wrong?
If you want your measure to ignore whichever State is selected, you need to include the State column in your ALL filter.
Also, I suppose you want to use ALL instead of ALLSELECTED, as your overall balance per month shouldn't be affected by external filters on State (but this depends on your use case).
AllCurrentBalanceByDate:=CALCULATE(SUM([CurrentBalance]),ALL(Geography[StateName]))

Sum dates with different timestamps and picking the min date?

Beginner here. I want to have only one row for each delivery date but it is important to keep the hours and the minutes. I have the following table in Oracle (left):
As you can see, there are days on which a certain SKU (e.g. SKU A) was delivered twice. The table on the right is the desired result. Essentially, I want the quantities that arrived on the 28th summed up, and in the Supplier_delivery column I want the earliest delivery timestamp.
I need to keep the hours and the minutes; otherwise I know I could achieve this by writing something like: SELECT SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD'), SUM(QTY) FROM TABLE GROUP BY SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD')
Any ideas?
You can use MIN():
SELECT SKU, MIN(SUPPLIER_DELIVERY), SUM(QTY)
FROM TABLE
GROUP BY SKU, TRUNC(SUPPLIER_DELIVERY);
This assumes that SUPPLIER_DELIVERY is a date and does not need to be converted to one. But it would work with TO_DATE() in the GROUP BY as well.
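For clarity on the result shape, a sketch of the same query with column aliases and ordering added; the table name deliveries is a stand-in for the real table:
SELECT SKU,
       MIN(SUPPLIER_DELIVERY) AS SUPPLIER_DELIVERY,  -- earliest timestamp on that day
       SUM(QTY)               AS QTY                 -- total quantity delivered that day
FROM   deliveries
GROUP  BY SKU, TRUNC(SUPPLIER_DELIVERY)
ORDER  BY SKU, MIN(SUPPLIER_DELIVERY);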

How to right-click a table to show values?

In a pivot table that plots values against a timeline it is possible to right-click the table, select "Show values as..." and have them appear as a percentage of a particular day.
I'm trying to recreate the same behaviour using DAX measures: I would like to have a measure that shows each day's price as a percentage of the first day of the year.
I've successfully created a measure that correctly identifies the first date of the year, i.e. the baseline:
FDate:=CALCULATE(FIRSTDATE(Prices[Date]),ALLEXCEPT('Calendar','Calendar'[Year]))
However, I can't figure out how to use this FDate to get that day's price (needed as the baseline for further calculations):
CALCULATE([Sum of Price], ALLEXCEPT('Calendar','Calendar'[Year]), FILTER('Prices', 'Prices'[Date]=[FDate])) returns each day's price, not the first date's.
CALCULATE([Sum of Price], FILTER(ALLEXCEPT('Calendar','Calendar'[Year]),'Calendar'[Date]=[FDate])) ignores the YEAR report filter and returns the price of the very first date in my calendar table and not the first date in the year I've filtered for.
Any pointer in the right direction would be greatly appreciated!
Thanks
Here's the solution:
VAR FirstDate = [FDate]
RETURN (
    CALCULATE(
        [Price],
        FILTER(ALLEXCEPT('Calendar', 'Calendar'[Year]), 'Calendar'[Date] = FirstDate)
    )
)
Variables allow you to evaluate a value in a certain filter context and leave it unaffected by subsequent filter contexts - that, at least, is my layman's understanding.
More info here: https://www.sqlbi.com/articles/variables-in-dax/