Calculate sales commission with maximum value in BigQuery - google-bigquery

Hi, I am trying to calculate a sales commission with a maximum of $25,000 for the year. For example, if an employee earns $3,000 a month, the total for the year will be $36,000; however, I have to pay a maximum of $25,000.
I tried window functions like SUM(mtd_commission) over the months and compared the result against 25000, but it stops after the 8th month, at 24000. How can I calculate only the remaining $1000 (25000 - 24000) for the 9th month?
Thanks

You might consider the query below.
WITH sample_table AS (
SELECT month, 3000 mtd_comm FROM UNNEST(GENERATE_ARRAY(1, 12)) month
)
SELECT *,
SUM(mtd_comm) OVER w0 ytd_comm,
CASE
WHEN SUM(mtd_comm) OVER w0 <= 25000 THEN mtd_comm
ELSE GREATEST(0, 25000 - SUM(mtd_comm) OVER w1)
END AS paid_comm,
LEAST(25000, SUM(mtd_comm) OVER w0) ytd_paid_comm,
FROM sample_table
WINDOW w0 AS (ORDER BY month),
w1 AS (ORDER BY month RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
;
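To check the capping logic outside BigQuery, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). It swaps in an equivalent formulation of the CASE expression above: the amount paid in a month is the capped running total after that month minus the capped running total before it, i.e. MIN(cap, ytd) - MIN(cap, ytd - mtd). In BigQuery the same expression would use LEAST instead of SQLite's two-argument scalar MIN. The function name and sample data are illustrative assumptions.

```python
import sqlite3

CAP = 25000  # yearly commission cap from the question


def capped_commission(monthly):
    """Return (month, mtd_comm, ytd_comm, paid_comm) rows with the yearly payout capped at CAP."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample_table (month INTEGER, mtd_comm INTEGER)")
    con.executemany(
        "INSERT INTO sample_table VALUES (?, ?)",
        list(enumerate(monthly, start=1)),
    )
    rows = con.execute(
        """
        SELECT month,
               mtd_comm,
               ytd_comm,
               -- capped total after this month minus capped total before it
               MIN(:cap, ytd_comm) - MIN(:cap, ytd_comm - mtd_comm) AS paid_comm
        FROM (
            SELECT month,
                   mtd_comm,
                   SUM(mtd_comm) OVER (
                       ORDER BY month
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS ytd_comm
            FROM sample_table
        )
        ORDER BY month
        """,
        {"cap": CAP},
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in capped_commission([3000] * 12):
        print(row)
```

For 12 months of $3,000 this pays $3,000 for months 1 to 8, $1,000 for month 9 (25000 - 24000), and $0 afterwards. Using ytd - mtd as the "before this month" total avoids a second window frame.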

Related

join info from two columns into one

I'm using a public data set to run some modeling while trying to learn BigQuery SQL. I have a date column but I'm trying to group by day of the year not full date. Date is entered as 2018-2-12 but I'd like it just as 2-12 or 02-12. I have the code to extract the day and month from date but can't find a way to concatenate the two in order for it to be grouped.
SELECT
EXTRACT (MONTH FROM sales.date) AS month,
EXTRACT(DAY FROM sales.date) AS day ,
ROUND(AVG(sales.bottles_sold/sales.pack), 2) as pack_qty, -- average case or pack
ROUND(AVG(sales.bottles_sold), 2) AS qty_bottles, -- average total number of bottles
ROUND(AVG(sales.sale_dollars), 2) as sales_rev, -- average sales rev
ROUND(AVG((sales.state_bottle_retail - sales.state_bottle_cost) * sales.bottles_sold), 2) AS profit, -- avg profit on that day
ROUND (AVG(sales.volume_sold_liters), 2) as volumeLit, -- average volume in liters
ROUND (AVG(sales.volume_sold_gallons), 2) as volumeGal -- average volume in gal
FROM `bigquery-public-data.iowa_liquor_sales.sales` AS sales
GROUP BY
month,
day
ORDER BY
volumeGal DESC;
SELECT
EXTRACT (MONTH FROM sales.date) AS month,
EXTRACT(DAY FROM sales.date) AS day ,
ROUND(AVG(sales.bottles_sold/sales.pack), 2) as pack_qty, -- average case or pack
ROUND(AVG(sales.bottles_sold), 2) AS qty_bottles, -- average total number of bottles
ROUND(AVG(sales.sale_dollars), 2) as sales_rev, -- average sales rev
ROUND(AVG((sales.state_bottle_retail - sales.state_bottle_cost) * sales.bottles_sold), 2) AS profit, -- avg profit on that day
ROUND (AVG(sales.volume_sold_liters), 2) as volumeLit, -- average volume in liters
ROUND (AVG(sales.volume_sold_gallons), 2) as volumeGal -- average volume in gal
FROM `bigquery-public-data.iowa_liquor_sales.sales` AS sales
GROUP BY
EXTRACT (MONTH FROM sales.date),
EXTRACT(DAY FROM sales.date)
ORDER BY
volumeGal DESC;
You should use the CONCAT function inside the SELECT statement. With the following query you will have a single day column with the "day-month" format in the result.
SELECT
CONCAT(CAST(EXTRACT(DAY FROM sales.date) AS STRING), '-', CAST(EXTRACT(MONTH FROM sales.date) AS STRING)) AS day,
ROUND(AVG(sales.bottles_sold/sales.pack), 2) as pack_qty, -- average case or pack
ROUND(AVG(sales.bottles_sold), 2) AS qty_bottles, -- average total number of bottles
ROUND(AVG(sales.sale_dollars), 2) as sales_rev, -- average sales rev
ROUND(AVG((sales.state_bottle_retail - sales.state_bottle_cost) * sales.bottles_sold), 2) AS profit, -- avg profit on that day
ROUND (AVG(sales.volume_sold_liters), 2) as volumeLit, -- average volume in liters
ROUND (AVG(sales.volume_sold_gallons), 2) as volumeGal -- average volume in gal
FROM `bigquery-public-data.iowa_liquor_sales.sales` AS sales
GROUP BY
day
ORDER BY
volumeGal DESC;
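In SQL Server (T-SQL) you can use the + operator to concatenate two strings. Note that this syntax (CONVERT, +, MONTH(), DAY()) does not run in BigQuery, where CONCAT or the || operator is used instead.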
Code:
SELECT
convert(nvarchar,month(sales.date))+'-'+ convert(nvarchar,day(sales.date) ) as date_month ,
ROUND(AVG(sales.bottles_sold/sales.pack), 2) as pack_qty, -- average case or pack
ROUND(AVG(sales.bottles_sold), 2) AS qty_bottles, -- average total number of bottles
ROUND(AVG(sales.sale_dollars), 2) as sales_rev, -- average sales rev
ROUND(AVG((sales.state_bottle_retail - sales.state_bottle_cost) * sales.bottles_sold), 2) AS profit, -- avg profit on that day
ROUND (AVG(sales.volume_sold_liters), 2) as volumeLit, -- average volume in liters
ROUND (AVG(sales.volume_sold_gallons), 2) as volumeGal -- average volume in gal
FROM bigquery-public-data.iowa_liquor_sales.sales AS sales
GROUP BY
convert(nvarchar,month(sales.date))+'-'+ convert(nvarchar,day(sales.date))
ORDER BY
volumeGal DESC;
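For the "02-12" style key the question asks for, BigQuery's FORMAT_DATE('%m-%d', sales.date) returns a zero-padded month-day string directly, with no casting or concatenation. The sketch below shows the same grouping idea in SQLite (via Python's sqlite3), where strftime('%m-%d', ...) plays that role; the two-column table, the sale_date column name, and the sample rows are assumptions for illustration.

```python
import sqlite3


def avg_bottles_by_month_day(rows):
    """Group (sale_date, bottles_sold) rows by a zero-padded 'MM-DD' key across all years."""
    con = sqlite3.connect(":memory:")
    # sale_date holds ISO 'YYYY-MM-DD' strings, which strftime() understands
    con.execute("CREATE TABLE sales (sale_date TEXT, bottles_sold INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT strftime('%m-%d', sale_date) AS month_day,  -- e.g. '02-12'
               ROUND(AVG(bottles_sold), 2) AS qty_bottles
        FROM sales
        GROUP BY month_day
        ORDER BY month_day
        """
    ).fetchall()
    con.close()
    return result
```

Because the key is zero-padded, sorting it as a string also sorts it chronologically within the year.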

Calculate average and standard deviation for predefined number of values substituting missing rows with zeros

I have a simple table that contains a record of products and their total sales per day over a year (just 3 columns - Product, Date, Sales). So, for example, if product A is sold every single day, it'll have 365 records. Similarly, if product B is sold for only 50 days, the table will have just 50 rows for that product - one for each day of sale.
I need to calculate the daily average sales and standard deviation for the entire year, which means that, for product B, I need to have additional 365-50=315 entries with zero sales to be able to calculate the daily average and standard deviation for the year correctly.
Is there a way to do this efficiently and dynamically in SQL?
Thanks
We can generate 366 rows and join the sales data to it:
WITH rg(rn) AS (
SELECT 1 AS rn
UNION ALL
SELECT a.rn + 1 AS rn
FROM rg a
WHERE a.rn < 366
)
SELECT
*
FROM
rg
LEFT JOIN (
SELECT YEAR(saledate) as yr, DATEPART(dayofyear, saledate) as doy, count(*) as numsales
FROM sales
GROUP BY YEAR(saledate), DATEPART(dayofyear, saledate)
) s ON rg.rn = s.doy
OPTION (MAXRECURSION 370);
You can replace the NULLs (days with no sale data) with e.g. AVG(COALESCE(numsales, 0)). You'll probably also need a WHERE clause to eliminate the 366th day in non-leap years (for example, take the year modulo 4 and only keep 366 rows if the result is 0; this shortcut is correct for the years 1901 to 2099).
If you're only doing a single year, you can use a WHERE clause in the sales subquery to return only the relevant records. It is most efficient to use a range such as WHERE saledate >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1) AND saledate < DATEFROMPARTS(YEAR(GETDATE()) + 1, 1, 1) rather than calling a function on every sale date to extract its year and compare it to a constant. You can also drop YEAR(saledate) from the SELECT/GROUP BY if there is only a single year.
If you're doing multiple years, you could make rg generate more rows, or (perhaps simpler) cross join it to a list of years so that you get 366 rows multiplied by, e.g., VALUES (2015),(2016),(2017),(2018),(2019),(2020) (and make the year from the sales subquery part of the join condition too).
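The calendar-join idea can be sketched in SQLite (via Python's sqlite3). Here a recursive CTE walks from January 1 to December 31 one day at a time, so leap years get 366 rows automatically instead of through a modulo check. Every product is cross joined to every day and the sales are LEFT JOINed on, so missing days become zeros through COALESCE. It assumes one row per product per day, as the question describes; the column names product, saledate, sales and the function name are illustrative.

```python
import sqlite3


def yearly_daily_average(rows, year):
    """Average daily sales per product over every calendar day of `year`,
    treating days with no row as zero sales. Assumes one row per product per day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (product TEXT, saledate TEXT, sales REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE days(d) AS (
            SELECT date(:start)                 -- Jan 1
            UNION ALL
            SELECT date(d, '+1 day') FROM days  -- next day ...
            WHERE d < date(:last)               -- ... until Dec 31 (365 or 366 rows)
        ),
        products AS (SELECT DISTINCT product FROM sales)
        SELECT p.product,
               AVG(COALESCE(s.sales, 0)) AS daily_avg,
               COUNT(*) AS n_days
        FROM products p
        CROSS JOIN days
        LEFT JOIN sales s
               ON s.product = p.product AND s.saledate = days.d
        GROUP BY p.product
        ORDER BY p.product
        """,
        {"start": f"{year:04d}-01-01", "last": f"{year:04d}-12-31"},
    ).fetchall()
    con.close()
    return result
```

A standard deviation can be computed over the same zero-filled rows; the trade-off is that the query materializes products × days rows.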
Find the first and last day of the year, then use DATEDIFF() to find the number of days in that year.
After that, don't use AVG on sales; use SUM(Sales) / days_in_year instead.
select *,
days_in_year = datediff(day, first_of_year, last_of_year) + 1
from (values (2019), (2020)) v(year)
cross apply
(
select first_of_year = dateadd(year, year - 1900, 0),
last_of_year = dateadd(year, year - 1900 + 1, -1)
) d
There's a different way to look at it: don't try to add empty rows, just divide by the number of days in the year. While the number of days in a year isn't constant (a leap year has 366 days), it can be calculated easily, since the first day of the year is always January 1st and the last is always December 31st:
SELECT YEAR(date) AS sales_year,
product,
SUM(sales) * 1.0 / DATEPART(dy, DATEFROMPARTS(YEAR(date), 12, 31)) AS avg_daily_sales
FROM sales_table
GROUP BY YEAR(date), product
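The divide-by-days idea also covers the standard deviation the question asks about, without adding any rows: the missing days contribute 0 to both SUM(sales) and SUM(sales * sales) but still count in N, so the population variance is SUM(x²)/N - (SUM(x)/N)². The sketch below assumes that identity is acceptable (it can lose precision for very large values) and uses SQLite via Python's sqlite3; the column names and function name are illustrative. For a sample standard deviation, multiply the variance by N/(N-1).

```python
import math
import sqlite3


def yearly_mean_and_std(rows, year):
    """Per-product population mean and standard deviation of daily sales for `year`,
    counting every day without a row as a zero-sales day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (product TEXT, saledate TEXT, sales REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    start, end = f"{year:04d}-01-01", f"{year + 1:04d}-01-01"
    # Days in the year = Jan 1 of next year minus Jan 1 of this year (365 or 366).
    n_days = round(con.execute("SELECT julianday(?) - julianday(?)", (end, start)).fetchone()[0])
    sums = con.execute(
        """
        SELECT product, SUM(sales), SUM(sales * sales)
        FROM sales
        WHERE saledate >= ? AND saledate < ?
        GROUP BY product
        ORDER BY product
        """,
        (start, end),
    ).fetchall()
    con.close()
    result = {}
    for product, s, ss in sums:
        mean = s / n_days
        # Var = E[x^2] - E[x]^2; missing days add 0 to both sums but still count in n_days.
        var = ss / n_days - mean * mean
        result[product] = (mean, math.sqrt(max(var, 0.0)))
    return result
```

Products with no sales at all in the year do not appear in the result; their mean and standard deviation would both be 0.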

Is there a way to find the highest 90 day total value for any 90 day period in the last year?

I'm thinking the only way to do it is to sum the values between (today - 365) and (today - 365 + 90), then move on by 1 day each time, but that would be impractical. Is there a way around it?
If you have exactly one row for each day (so that 90 rows span 90 days), you can use a windowed running sum:
select top (1) t.*
from (select t.*, sum(x) over (order by date rows between 89 preceding and current row) as sum_90
from t
) t
order by sum_90 desc;
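The same 90-row window can be tried in SQLite through Python's sqlite3, where LIMIT 1 replaces T-SQL's TOP (1). This sketch also counts the rows in each window and keeps only complete 90-day windows, so the partial windows at the start of the data are never reported; that filter is an added design choice, not part of the answer above. Restricting to the last year would be a WHERE on the date inside the inner query. The table t with columns date and x follows the answer; the function name and the daily data are hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def best_90_day_window(daily_values, start):
    """daily_values holds one value per consecutive day starting at `start`.
    Returns (end_date, total) of the 90-day window with the largest total."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (date TEXT, x INTEGER)")
    con.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [((start + timedelta(days=i)).isoformat(), v) for i, v in enumerate(daily_values)],
    )
    row = con.execute(
        """
        SELECT date, sum_90
        FROM (
            SELECT date,
                   SUM(x)   OVER (ORDER BY date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW) AS sum_90,
                   COUNT(*) OVER (ORDER BY date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW) AS n_rows
            FROM t
        )
        WHERE n_rows = 90          -- only complete 90-day windows
        ORDER BY sum_90 DESC
        LIMIT 1
        """
    ).fetchone()
    con.close()
    return row
```

Each window sum is computed in a single pass over the ordered rows, so there is no need to re-sum 90 values for every starting day.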

Average over rolling date period

I have 4 dimensions, one of which is date. For each date, I need to calculate the average over the last 30 days, per combination of the other dimension values.
I have tried to run an average over a partition by the 4 dimensions, in the form of:
SELECT
Date, Produce,Company, Song, Revenues,
Average(case when Date between Date -Interval '31' day and Date - Interval '1' Day then Revenues else null End) over (partition by Date,Company,Song,Revenues order by Date) as "Running Average"
From
Base_Table
I get only nulls with every aggregation I tried.
Help is appreciated. Thanks
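You can try the query below. It assumes there is exactly one row per date for each combination of dimensions, because a ROWS frame counts rows rather than days: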
SELECT
Date, Produce,Company, Song, Revenues,
AVG(Revenues) OVER (PARTITION BY Produce, Company, Song ORDER BY Date ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) AS "Running Average"
From
Base_Table
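If some dates are missing, a ROWS frame reaches back further than 30 days. A date-based RANGE frame avoids that. The sketch below shows it in SQLite via Python's sqlite3, ordering by julianday(dt) so that "30 PRECEDING AND 1 PRECEDING" means the 30 calendar days before the current date (RANGE with numeric offsets needs SQLite 3.28 or later). The column dt stands in for the question's Date column; the table layout, function name, and sample rows are assumptions.

```python
import sqlite3


def rolling_30_day_avg(rows):
    """rows: (dt, produce, company, song, revenues). Adds the average revenue of the
    previous 30 calendar days (excluding the current day) per dimension combination."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE base_table (dt TEXT, produce TEXT, company TEXT, song TEXT, revenues REAL)"
    )
    con.executemany("INSERT INTO base_table VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT dt, produce, company, song, revenues,
               AVG(revenues) OVER (
                   PARTITION BY produce, company, song
                   ORDER BY julianday(dt)                 -- numeric day number
                   RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
               ) AS running_average                       -- NULL if no earlier day in range
        FROM base_table
        ORDER BY produce, company, song, dt
        """
    ).fetchall()
    con.close()
    return result
```

With rows on Jan 1, Jan 2, Jan 20 and Feb 15, the Feb 15 average only includes Jan 20 (26 days earlier); Jan 1 and Jan 2 fall outside the 30-day range.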

How to calculate a dynamic average value between rows?

For the first 12 months, status_flag should be N. From the 13th month onward, we need to take the average of sales over the first 13 rows and compare it with the Min and Max values: if it lies between Min and Max, set status_flag to Y, otherwise set it to N.
Likewise, for the 14th row, take the average of the first 14 rows and compare it with Min and Max, and so on.
How to do this?
I think the challenging part is getting the average sales. You can use analytic (window) functions:
select Storeid, Months, Min, Max, sales,
avg(sales) over (order by Months RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as avg_sales
from your_table;
The rest should be easier. Note that RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the default frame when ORDER BY is specified, so you can omit it:
with a as
(select Storeid, Months, Min, Max, sales,
avg(sales) over (order by Months) as avg_sales
from your_table)
select Storeid, Months, Min, Max, sales, avg_sales,
case
when Months <= 12 then 'N'
else
case
when avg_sales between Min and Max then 'Y'
else 'N'
end
end as Status_flag
from a;
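The cumulative-average approach can be checked in SQLite via Python's sqlite3. This sketch assumes the running average should restart for each store, so it adds PARTITION BY storeid. It also uses an explicit ROWS frame, which gives the same result as the default RANGE frame as long as each store has one row per month. The Min/Max columns are renamed min_val/max_val here to avoid confusion with the MIN/MAX functions; those names, the function name, and the sample data are illustrative.

```python
import sqlite3


def status_flags(rows):
    """rows: (storeid, months, min_val, max_val, sales).
    Returns (storeid, months, avg_sales, status_flag) ordered by store and month."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE your_table (storeid INTEGER, months INTEGER,"
        " min_val REAL, max_val REAL, sales REAL)"
    )
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        WITH a AS (
            SELECT storeid, months, min_val, max_val, sales,
                   -- cumulative average of all rows up to and including this month
                   AVG(sales) OVER (
                       PARTITION BY storeid
                       ORDER BY months
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS avg_sales
            FROM your_table
        )
        SELECT storeid, months, avg_sales,
               CASE WHEN months > 12 AND avg_sales BETWEEN min_val AND max_val
                    THEN 'Y' ELSE 'N' END AS status_flag
        FROM a
        ORDER BY storeid, months
        """
    ).fetchall()
    con.close()
    return result
```

With sales of 10 for months 1 to 12, 23 in month 13 and 53 in month 14, the running averages are 11 and 14, so a Min/Max band of 10 to 12 gives Y for month 13 and N for month 14.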
UPDATE your_table t SET status_flag =
CASE WHEN
(SELECT COUNT(*)
FROM your_table x
WHERE x.Months <= t.Months) > 12
AND
(SELECT AVG(x.sales)
FROM your_table x
WHERE x.Months <= t.Months)
BETWEEN t.Min AND t.Max
THEN 'Y' ELSE 'N' END;