Recursive calculation in BigQuery

I need to calculate stock as a dynamic, recursive value, following this simple equation:
estoque(n) = max(estoque(n-1) + entrada(n) - venda(n) - quebra(n), 0)
Here n refers to a period (month, day, year, etc.).
Whenever a stock value is negative, it is replaced by zero.
How can I calculate this in BigQuery? Here is an example:
WITH `project.dataset.table` AS (
SELECT 10 entrada, 5 venda, 8 quebra, 8 mes, 2019 ano UNION ALL
SELECT 12, 8 , 3, 9, 2019 UNION ALL
SELECT 20, 15, 2, 10, 2019 UNION ALL
SELECT 30, 12, 2, 11, 2019 UNION ALL
SELECT 20, 10, 5, 12, 2019 UNION ALL
SELECT 30, 12, 2, 1, 2020 UNION ALL
SELECT 30, 12, 2, 2, 2020 UNION ALL
SELECT 30, 12, 2, 3, 2020
)
SELECT entrada, venda, quebra,
variacao,
greatest(coalesce(lag(variacao) over (partition by 'project.dataset.table' order by ano, mes),0) + entrada - venda - quebra, 0) as estoque
FROM (
SELECT *,
entrada - venda - quebra AS variacao
FROM `project.dataset.table`
)
And the expected result would be:
entrada venda quebra variacao estoque
10 5 8 -3 0
12 8 3 1 1
20 15 2 3 4
30 12 2 16 20
20 10 5 5 25
30 12 2 16 41
30 12 2 16 57
30 12 2 16 73
But the result of the code above is:
entrada venda quebra variacao estoque
10 5 8 -3 0
12 8 3 1 0
20 15 2 3 4
30 12 2 16 19
20 10 5 5 21
30 12 2 16 21
30 12 2 16 32
30 12 2 16 32
Thanks in advance!

At the time of this answer, BigQuery did not support recursive operations natively. Try array_agg() combined with a JavaScript user-defined function, but note that this approach does not scale well:
CREATE TEMP FUNCTION special_sum(x ARRAY<INT64>)
RETURNS INT64
LANGUAGE js
AS """
var estoque = 0;
for (const num of x)
{
estoque = Math.max(estoque + parseInt(num), 0);
}
return estoque;
""";
WITH `project.dataset.table` AS (
SELECT 10 entrada, 5 venda, 8 quebra, 8 mes, 2019 ano UNION ALL
SELECT 12, 8 , 3, 9, 2019 UNION ALL
SELECT 20, 15, 2, 10, 2019 UNION ALL
SELECT 30, 12, 2, 11, 2019 UNION ALL
SELECT 20, 10, 5, 12, 2019 UNION ALL
SELECT 30, 12, 2, 1, 2020 UNION ALL
SELECT 30, 12, 2, 2, 2020 UNION ALL
SELECT 30, 12, 2, 3, 2020
)
select *,
special_sum(array_agg(entrada - venda - quebra) over (order by ano, mes rows unbounded preceding)) as estoque
from `project.dataset.table`
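For readers who want to check the recurrence outside BigQuery, here is a minimal Python sketch of the same clamped running sum. It assumes an opening stock of zero, and the function name running_stock is only illustrative; the rows are the sample data from the question.

```python
from itertools import accumulate

# (entrada, venda, quebra) per period, in chronological order (sample rows from the question)
rows = [
    (10, 5, 8), (12, 8, 3), (20, 15, 2), (30, 12, 2),
    (20, 10, 5), (30, 12, 2), (30, 12, 2), (30, 12, 2),
]

def running_stock(rows):
    """Return the stock after each period, replacing negative stock with zero."""
    variations = [entrada - venda - quebra for entrada, venda, quebra in rows]
    # accumulate carries the previous stock forward; initial=0 is the assumed opening stock
    clamped = accumulate(variations, lambda stock, v: max(stock + v, 0), initial=0)
    return list(clamped)[1:]  # drop the opening stock itself

estoque = running_stock(rows)
print(estoque)  # [0, 1, 4, 20, 25, 41, 57, 73]
```

The printed list matches the expected estoque column in the question.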

Solved using an "internal" sum of all stock values, like this:
WITH `project.dataset.table` AS (
SELECT 10 entrada, 5 venda, 8 quebra, 8 mes, 2019 ano UNION ALL
SELECT 12, 8 , 3, 9, 2019 UNION ALL
SELECT 20, 15, 2, 10, 2019 UNION ALL
SELECT 30, 12, 2, 11, 2019 UNION ALL
SELECT 20, 10, 5, 12, 2019 UNION ALL
SELECT 30, 12, 2, 1, 2020 UNION ALL
SELECT 30, 12, 2, 2, 2020 UNION ALL
SELECT 30, 12, 2, 3, 2020
)
SELECT entrada, venda, quebra,
sum(greatest(entrada - venda - quebra,0)) over (partition by 'project.dataset.table' order by ano, mes) as estoque,
mes, ano
FROM `project.dataset.table`
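One caveat worth checking before reusing this query: summing greatest(variation, 0) is not the same recurrence as clamping the running stock at zero. The two agree on the sample data only because the single negative variation comes first. The sketch below compares both in Python; the second input list is hypothetical and chosen to show where they diverge.

```python
def clamped_stock(variations):
    """Recurrence from the question: stock never drops below zero."""
    stock, out = 0, []
    for v in variations:
        stock = max(stock + v, 0)
        out.append(stock)
    return out

def sum_of_positive_parts(variations):
    """What sum(greatest(variation, 0)) over (order by ...) computes."""
    total, out = 0, []
    for v in variations:
        total += max(v, 0)
        out.append(total)
    return out

sample = [-3, 1, 3, 16, 5, 16, 16, 16]   # entrada - venda - quebra from the question
counterexample = [5, -3, 2]              # hypothetical: a loss after stock has built up

print(clamped_stock(sample) == sum_of_positive_parts(sample))  # True
print(clamped_stock(counterexample))          # [5, 2, 4]
print(sum_of_positive_parts(counterexample))  # [5, 5, 7]
```

If negative variations can occur after the first period, the windowed sum overstates the stock, so an approach that carries the clamped value forward is needed.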

Generate rows to fill in gaps between years, carry over a value from previous year

I have a table of road condition ratings (roads are rated from 1-20; 20 being good).
with road_inspections
(road_id, year, cond) as (
select 1, 2009, 17 from dual union all
select 1, 2011, 16 from dual union all
select 1, 2015, 14 from dual union all
select 1, 2016, 18.3 from dual union all
select 1, 2019, 18.1 from dual union all
select 2, 2013, 17.5 from dual union all
select 2, 2016, 18 from dual union all
select 2, 2019, 18 from dual union all
select 2, 2022, 18 from dual union all
select 3, 2022, 20 from dual)
select * from road_inspections
ROAD_ID YEAR COND
---------- ---------- ----------
1 2009 17
1 2011 16
1 2015 14
1 2016 18.3
1 2019 18.1
2 2013 17.5
2 2016 18
2 2019 18
2 2022 18
3 2022 20
In a query, for each road, I want to generate rows to fill in the gaps between the years.
For a given road, starting at the first row (the earliest inspection), there should be consecutive rows for each year all the way to the current year (the sysdate year; currently 2022).
For the filler rows, I want to carry over the condition rating from the last known inspection.
The result would look like this:
ROAD_ID YEAR COND
---------- ---------- ----------
1 2009 17
1 2010 17 *
1 2011 16
1 2012 16 *
1 2013 16 *
1 2014 16 *
1 2015 14
1 2016 18.3
1 2017 18.3 *
1 2018 18.3 *
1 2019 18.1
1 2020 18.1 *
1 2021 18.1 *
1 2022 18.1 *
2 2013 17.5
2 2014 17.5 *
2 2015 17.5 *
2 2016 18
2 2017 18 *
2 2018 18 *
2 2019 18
2 2020 18 *
2 2021 18 *
2 2022 18
3 2022 20
*=filler row
Question:
How can I create those filler rows using an Oracle SQL query?
(My priorities are: simplicity first, performance second.)
You can use the LEAD analytic function with a LATERAL joined hierarchical query to generate the missing rows from each row until the next row:
SELECT r.road_id,
y.year,
r.cond
FROM ( SELECT r.*,
LEAD(year, 1, EXTRACT(YEAR FROM SYSDATE) + 1)
OVER (PARTITION BY road_id ORDER BY year) AS next_year
FROM road_inspections r
) r
CROSS JOIN LATERAL (
SELECT r.year + LEVEL - 1 AS year
FROM DUAL
CONNECT BY r.year + LEVEL - 1 < r.next_year
) y
Which, for the sample data:
CREATE TABLE road_inspections (road_id, year, cond) as
select 1, 2009, 17 from dual union all
select 1, 2011, 16 from dual union all
select 1, 2015, 14 from dual union all
select 1, 2016, 18.3 from dual union all
select 1, 2019, 18.1 from dual union all
select 2, 2013, 17.5 from dual union all
select 2, 2016, 18 from dual union all
select 2, 2019, 18 from dual union all
select 2, 2022, 18 from dual union all
select 3, 2022, 20 from dual;
Outputs:
ROAD_ID YEAR COND
---------- ---------- ----------
1 2009 17
1 2010 17
1 2011 16
1 2012 16
1 2013 16
1 2014 16
1 2015 14
1 2016 18.3
1 2017 18.3
1 2018 18.3
1 2019 18.1
1 2020 18.1
1 2021 18.1
1 2022 18.1
2 2013 17.5
2 2014 17.5
2 2015 17.5
2 2016 18
2 2017 18
2 2018 18
2 2019 18
2 2020 18
2 2021 18
2 2022 18
3 2022 20
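The idea behind the LEAD query (each inspection is repeated for every year up to, but not including, the next inspection of the same road) can be sketched in Python as follows. This is only an illustration of the logic, not Oracle code; fill_gaps and its current_year parameter are hypothetical names standing in for the query and for EXTRACT(YEAR FROM SYSDATE).

```python
from itertools import groupby

inspections = [
    (1, 2009, 17), (1, 2011, 16), (1, 2015, 14), (1, 2016, 18.3), (1, 2019, 18.1),
    (2, 2013, 17.5), (2, 2016, 18), (2, 2019, 18), (2, 2022, 18),
    (3, 2022, 20),
]

def fill_gaps(inspections, current_year):
    """Expand each inspection into one row per year until the next inspection."""
    filled = []
    ordered = sorted(inspections, key=lambda r: (r[0], r[1]))
    for road_id, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # Year of the following inspection; after the last one, current_year + 1
        # (this mirrors the default argument passed to LEAD)
        next_years = [r[1] for r in group[1:]] + [current_year + 1]
        for (_, year, cond), next_year in zip(group, next_years):
            filled.extend((road_id, y, cond) for y in range(year, next_year))
    return filled

result = fill_gaps(inspections, current_year=2022)
for row in result[:4]:
    print(row)
```

With the sample data and 2022 as the current year this yields 25 rows, the same count as the output above.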
with
road_inspections (road_id, year_, cond) as (
select 1, 2009, 17 from dual union all
select 1, 2011, 16 from dual union all
select 1, 2015, 14 from dual union all
select 1, 2016, 18.3 from dual union all
select 1, 2019, 18.1 from dual union all
select 2, 2013, 17.5 from dual union all
select 2, 2016, 18 from dual union all
select 2, 2019, 18 from dual union all
select 2, 2022, 18 from dual union all
select 3, 2022, 20 from dual
)
, prep (road_id, first_year) as (
select road_id, min(year_)
from road_inspections
group by road_id
)
, all_years (road_id, year_) as (
select p.road_id, l.year_
from prep p cross join lateral (
select first_year + level - 1 as year_
from dual
connect by level <= 2022 - first_year + 1
) l
)
select road_id, year_,
last_value(ri.cond ignore nulls) over
(partition by road_id order by year_) as cond
from all_years ay left outer join road_inspections ri using (road_id, year_)
;
The first subquery, prep, finds the first year for each road id. This is used in the all_years subquery to generate all the years relevant for each road id.
Then left-outer-join to the original data, copy the cond wherever it is available, and use the analytic function last_value with the ignore nulls option to fill in the gaps.
Note that I changed the column name year to year_ (with a trailing underscore); year is an Oracle keyword, not a good choice for a column name.
Output:
ROAD_ID YEAR_ COND
---------- ---------- ----------
1 2009 17
1 2010 17
1 2011 16
1 2012 16
1 2013 16
1 2014 16
1 2015 14
1 2016 18.3
1 2017 18.3
1 2018 18.3
1 2019 18.1
1 2020 18.1
1 2021 18.1
1 2022 18.1
2 2013 17.5
2 2014 17.5
2 2015 17.5
2 2016 18
2 2017 18
2 2018 18
2 2019 18
2 2020 18
2 2021 18
2 2022 18
3 2022 20
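The generate-then-fill approach described in this answer (build every year per road, look up the inspection if there is one, otherwise keep the last known value) can be sketched in Python like this. It uses only roads 2 and 3 from the sample data to stay short; forward_fill is an illustrative name, and None plays the role of the NULL produced by the outer join.

```python
# (road_id, year) -> cond, a subset of the sample data
inspections = {
    (2, 2013): 17.5, (2, 2016): 18, (2, 2019): 18, (2, 2022): 18,
    (3, 2022): 20,
}

def forward_fill(inspections, current_year):
    """Generate all years per road, then carry the last known cond forward."""
    first_year = {}
    for road_id, year in inspections:
        first_year[road_id] = min(year, first_year.get(road_id, year))

    rows = []
    for road_id in sorted(first_year):
        last_cond = None
        for year in range(first_year[road_id], current_year + 1):
            cond = inspections.get((road_id, year))  # None where no inspection exists
            if cond is not None:
                last_cond = cond                     # like last_value(... ignore nulls)
            rows.append((road_id, year, last_cond))
    return rows

filled = forward_fill(inspections, current_year=2022)
for row in filled:
    print(row)
```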
Using the LEAD function and a CONNECT BY LEVEL row generator, we can achieve the same result:
with r as (
select
*
from
road_inspections
union
select
road_id,
2022,
cond
from
road_inspections
where
(road_id, year) in(
select
road_id,
max(year) over (partition by road_id)
from
road_inspections a
where
not exists (
select
1
from
road_inspections b
where
a.road_id = b.road_id
and b.year = 2022
)
)
),
data as(
SELECT
r.*,
nvl(
lead(year, 1) over (
partition by road_id
order by
year
)- year,
0
) gaps
FROM
r
)
select
road_id,
year + level -1 year,
cond
from
(
select
a.road_id,
year,
cond,
rownum rn,
gaps
from
data a
) connect by level <= gaps
and prior rn = rn
and prior dbms_random.value != 1
order by
road_id,
year + level -1;

How to group sales by month, quarter and year in the same row using case?

I'm trying to return the total number of sales for every month, every quarter, for the year 2016. I want to display annual sales on the first month row, and not on the other rows. Plus, I want to display the quarter sales on the first month of each quarter, and not on the others.
To further explain this, here's what I want to achieve:
MONTH MONTH_SALES QUARTER_SALES YEAR_SALES
1 2183 5917 12505
2 1712 - -
3 1972 - -
4 2230 6588 -
5 2250 - -
6 2108 - -
Here's my SQL query so far:
SELECT
Time.month,
SUM(Sales.sales) AS MONTH_SALES, -- display monthly sales.
CASE
WHEN MOD(Time.month, 3) = 1 THEN ( -- first month of quarter
SELECT
SUM(Sales.sales)
FROM
Sales,
Time
WHERE
Sales.Time_id = Time.Time_id
AND Time.year = 2016
GROUP BY
Time.quarter
FETCH FIRST 1 ROW ONLY
)
END AS QUARTER_SALES,
CASE
WHEN Time.month = 1 THEN ( -- display annual sales.
SELECT
SUM(Sales.sales)
FROM
Sales,
Time
WHERE
Sales.Time_id = Time.Time_id
AND Time.year = 2016
GROUP BY
Time.year
)
END AS YEAR_SALES
FROM
Sales,
Time
WHERE
Sales.Time_id = Time.Time_id
AND Time.year = 2016
GROUP BY
Time.month
ORDER BY
Time.month
I'm almost getting the desired output, but I'm getting the same duplicated 6588 value in quarter sales for the first and fourth month (because I'm fetching the first row that comes from first quarter).
MONTH MONTH_SALES QUARTER_SALES YEAR_SALES
1 2183 6588 12505
2 1712 - -
3 1972 - -
4 2230 6588 -
5 2250 - -
6 2108 - -
I even tried to put WHERE Time.quarter = ((Time.month * 4) / 12) but the month value from the outer query doesn't get passed in the subquery.
Unfortunately I don't have enough experience with CASE WHEN expressions to know how to pass the month row. Any tips would be awesome.
How about this?
Sample data:
with
time (time_id, month, quarter, year) as
(select 1, 1, 1, 2016 from dual union all
select 2, 2, 1, 2016 from dual union all
select 3, 3, 1, 2016 from dual union all
select 4, 4, 2, 2016 from dual union all
select 5, 5, 2, 2016 from dual union all
select 6, 6, 2, 2016 from dual union all
select 7, 7, 3, 2016 from dual union all
select 8, 8, 3, 2016 from dual union all
select 9, 9, 3, 2016 from dual
),
sales (time_id, sales) as
(select 1, 100 from dual union all
select 1, 100 from dual union all
select 2, 200 from dual union all
select 3, 300 from dual union all
select 4, 400 from dual union all
select 5, 500 from dual union all
select 6, 600 from dual union all
select 7, 700 from dual union all
select 8, 800 from dual union all
select 9, 900 from dual
),
The query begins here. It uses the sum aggregate in its analytic form; the partition by clause says over which rows each total is computed. row_number, similarly, numbers the rows within each quarter/year by month; it is then used in the CASE expressions to decide whether to show the quarterly/yearly total.
temp as
(select t.month, t.quarter, t.year, sum(s.sales) month_sales
from time t join sales s on s.time_id = t.time_id
where t.year = 2016
group by t.month, t.quarter, t.year
),
temp2 as
(select month, quarter, month_sales,
sum(month_sales) over (partition by quarter) quarter_sales,
sum(month_sales) over (partition by year ) year_sales,
-- order by month so that rnq/rny = 1 marks the first month of the quarter/year
row_number() over (partition by quarter order by month) rnq,
row_number() over (partition by year order by month) rny
from temp
)
select month,
month_sales,
case when rnq = 1 then quarter_sales end quarter_sales,
case when rny = 1 then year_sales end year_sales
from temp2
order by month;
     MONTH MONTH_SALES QUARTER_SALES YEAR_SALES
---------- ----------- ------------- ----------
         1         200           700       4600
         2         200
         3         300
         4         400          1500
         5         500
         6         600
         7         700          2400
         8         800
         9         900
9 rows selected.
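To make the "show the total only on the first row of each group" logic easier to follow, here is a small Python sketch of the same report. It is an illustration rather than a translation of the SQL; the data is chosen to match the output shown above, and report is a hypothetical helper name.

```python
from collections import defaultdict

# (month, quarter, sales amount); month 1 has two sales rows
sales = [
    (1, 1, 100), (1, 1, 100), (2, 1, 200), (3, 1, 300),
    (4, 2, 400), (5, 2, 500), (6, 2, 600),
    (7, 3, 700), (8, 3, 800), (9, 3, 900),
]

def report(sales):
    """Return (month, month_sales, quarter_sales, year_sales) rows.

    quarter_sales is filled only on the first month of each quarter and
    year_sales only on the first month of the year; other cells are None.
    """
    month_total = defaultdict(int)
    quarter_total = defaultdict(int)
    quarter_of = {}
    for month, quarter, amount in sales:
        month_total[month] += amount
        quarter_total[quarter] += amount
        quarter_of[month] = quarter
    year_total = sum(month_total.values())

    rows, seen_quarters = [], set()
    for position, month in enumerate(sorted(month_total)):
        quarter = quarter_of[month]
        first_in_quarter = quarter not in seen_quarters  # same role as rnq = 1
        seen_quarters.add(quarter)
        rows.append((
            month,
            month_total[month],
            quarter_total[quarter] if first_in_quarter else None,
            year_total if position == 0 else None,       # same role as rny = 1
        ))
    return rows

rows = report(sales)
for row in rows:
    print(row)
```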

SQL: How to deal with NULL and PARTITION BY?

I've got a question, if you don't mind terribly.
So suppose I have this kind of a table here – Products (amount sold by quarter in 2000, only there are multiple entries for the same product and quarter (with different dates)):
product quarter amount sold
---------- ---------- -----------
Jeans 1 20
Jeans 2 40
Jeans 3 60
Jeans 4 5
Skirt 1 10
Skirt 2 5
Skirt 3 30
Blouse 1 15
Blouse 2 40
Blouse 3 60
Blouse 4 15
I want to reintroduce it as follows:
product quarter1 quarter2 quarter3 quarter4
---------- ---------- ---------- ---------- ----------
Jeans 20 40 60 5
Skirt 10 5 30 Null
Blouse 15 40 60 15
I decided to do it with a partition (it's not quite that simple: there are several rows with the same quarter for the same product but different amounts sold, which is why I use sum(amount_sold), but you get the idea, I hope):
WITH quater_sales as(
SELECT DISTINCT pro.product, pro.quarter, to_char (sum(pro.amount_sold) OVER (PARTITION BY pro.product, pro.quarter)) AS quater
FROM products pro
ORDER BY pro.product)
SELECT quater_sales.product, quater_sales.quater AS "Q1", qu2.quater AS "Q2", qu3.quater AS "Q3", qu4.quater AS "Q4"
FROM quater_sales
JOIN quater_sales qu2 ON quater_sales.product=qu2.product
JOIN quater_sales qu3 ON quater_sales.product=qu3.product
JOIN quater_sales qu4 ON quater_sales.product=qu4.product
WHERE quater_sales.quarter=1 and qu2.quarter=2 and qu3.quarter=3 and qu4.quarter=4
The problem is with partition (or maybe it's the condition of select) that the product that was not sold in all the 4 quarters is just discarded. What I basically get in the end is this:
product quarter1 quarter2 quarter3 quarter4
---------- ---------- ---------- ---------- ----------
Jeans 20 40 60 5
Blouse 15 40 60 15
So how do I make "skirts" appear there too? I am a bit stuck with this.
Have you considered using a PIVOT statement?
WITH
quarter_sales (product, quarter, amount_sold)
AS
(SELECT 'Jeans', 1, 20 FROM DUAL
UNION ALL
SELECT 'Jeans', 2, 40 FROM DUAL
UNION ALL
SELECT 'Jeans', 3, 60 FROM DUAL
UNION ALL
SELECT 'Jeans', 4, 5 FROM DUAL
UNION ALL
SELECT 'Skirt', 1, 10 FROM DUAL
UNION ALL
SELECT 'Skirt', 2, 5 FROM DUAL
UNION ALL
SELECT 'Skirt', 3, 30 FROM DUAL
UNION ALL
SELECT 'Blouse', 1, 15 FROM DUAL
UNION ALL
SELECT 'Blouse', 2, 40 FROM DUAL
UNION ALL
SELECT 'Blouse', 3, 60 FROM DUAL
UNION ALL
SELECT 'Blouse', 4, 15 FROM DUAL)
SELECT *
FROM (SELECT *
FROM quarter_sales qs)
PIVOT (SUM (amount_sold)
FOR quarter
IN (1 AS quarter1, 2 AS quarter2, 3 AS quarter3, 4 AS quarter4));
PRODUCT QUARTER1 QUARTER2 QUARTER3 QUARTER4
__________ ___________ ___________ ___________ ___________
Blouse 15 40 60 15
Jeans 20 40 60 5
Skirt 10 5 30
Try PIVOT. This is how you would pivot in T-SQL:
declare @tmp table(product varchar(20), quarter int, [amount sold] int);
insert into @tmp values
('Jeans', 1, 20)
,('Jeans', 2, 40)
,('Jeans', 3, 60)
,('Jeans', 4, 5)
,('Skirt', 1, 10)
,('Skirt', 2, 5)
,('Skirt', 3, 30)
,('Blouse', 1, 15)
,('Blouse', 2, 40)
,('Blouse', 3, 60)
,('Blouse', 4, 15);
select product, [1] as quarter1,[2] as quarter2,[3] as quarter3,[4] as quarter4
from
(
select product,quarter,[amount sold] from @tmp)p
pivot
(
sum([amount sold])
for quarter in([1],[2],[3],[4])
) as pvt;
output:
product quarter1 quarter2 quarter3 quarter4
Blouse 15 40 60 15
Jeans 20 40 60 5
Skirt 10 5 30 NULL
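The reason PIVOT keeps Skirt while the self-join version drops it is that a pivot starts from the products and leaves a missing quarter empty, whereas the inner joins require a matching row in every quarter. A minimal Python sketch of the pivot logic, with None standing in for NULL (pivot here is a hypothetical helper, not a library call):

```python
rows = [
    ("Jeans", 1, 20), ("Jeans", 2, 40), ("Jeans", 3, 60), ("Jeans", 4, 5),
    ("Skirt", 1, 10), ("Skirt", 2, 5), ("Skirt", 3, 30),
    ("Blouse", 1, 15), ("Blouse", 2, 40), ("Blouse", 3, 60), ("Blouse", 4, 15),
]

def pivot(rows, quarters=(1, 2, 3, 4)):
    """Sum amounts per (product, quarter) and lay quarters out as columns."""
    totals = {}
    for product, quarter, amount in rows:
        per_product = totals.setdefault(product, {})
        # Several rows for the same product and quarter are summed, as SUM() does
        per_product[quarter] = per_product.get(quarter, 0) + amount
    # .get(q) returns None for a quarter with no sales, so the product is kept
    return {product: [q_totals.get(q) for q in quarters]
            for product, q_totals in totals.items()}

table = pivot(rows)
for product, cells in table.items():
    print(product, cells)
```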

How to convert this MYSQL SQL to HIVE SQL?

The table ProductOrder columns include:
id shopid starttime endtime
1 123 2018-04-27 2018-04-28
2 234 2018-04-23 2018-04-30
3 189 2018-05-01 2018-05-30
4 321 2018-05-01 2018-05-29
I want to count the valid shops for each day of the latest month. A shop is valid on a given day when starttime <= $curDate <= endtime, where curDate is a variable that takes each day of the latest month in turn.
Today is 2018-04-27,so the query result should be:
day count
2018-04-27 2
2018-04-26 1
2018-04-25 1
2018-04-24 1
2018-04-23 1
2018-04-22 0
2018-04-21 0
……………………………………
2018-03-26 0
I achieved this in MySQL, and the following SQL works well there. How can I convert it to Hive SQL?
SELECT
DATE_SUB(DATE(NOW()), INTERVAL days_ago.days DAY) day,
COUNT(distinct(shopID)) count
FROM
(SELECT 0 days UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION
SELECT 10 UNION SELECT 11 UNION SELECT 12 UNION SELECT 13 UNION SELECT 14 UNION
SELECT 15 UNION SELECT 16 UNION SELECT 17 UNION SELECT 18 UNION SELECT 19 UNION
SELECT 20 UNION SELECT 21 UNION SELECT 22 UNION SELECT 23 UNION SELECT 24 UNION
SELECT 25 UNION SELECT 26 UNION SELECT 27 UNION SELECT 28 UNION SELECT 29)
AS days_ago
LEFT JOIN ProductOrder
ON DATE_SUB(DATE(NOW()), INTERVAL days_ago.days DAY) <= ProductOrder.endtime
AND DATE_SUB(DATE(NOW()), INTERVAL days_ago.days DAY) >= ProductOrder.starttime
AND status = 2
GROUP BY days_ago.days;
Where Hive does not support non-equi join conditions in the ON clause, the date-range test has to move out of the join. Putting it in the WHERE clause would discard the days on which no shop is valid, so test it inside the aggregate instead. Use STACK instead of many UNION subqueries.
select DATE_SUB(CURRENT_DATE, days_ago.days) day,
COUNT(DISTINCT CASE
WHEN DATE_SUB(CURRENT_DATE, days_ago.days) <= po.endtime
AND DATE_SUB(CURRENT_DATE, days_ago.days) >= po.starttime
THEN po.shopID --NULL (not counted) when the day is outside the range
END) count
from
(
select stack(30, --the number of elements
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29) as (days)
) days_ago
LEFT JOIN ProductOrder po ON po.status = 2
GROUP BY DATE_SUB(CURRENT_DATE, days_ago.days);
SELECT DATE_SUB(CURRENT_DATE, days_ago.days),
COUNT(DISTINCT(shopID)) count
FROM
(
SELECT explode(array(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29)) as days
) days_ago
LEFT JOIN ProductOrder po ON
(
DATE_SUB(CURRENT_DATE, days_ago.days) <= po.endtime
AND DATE_SUB(CURRENT_DATE, days_ago.days) >= po.starttime
AND status = 2
)
GROUP BY days_ago.days;
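As a cross-check of what both queries are meant to return, here is a small Python sketch that counts the distinct valid shops for each of the last 30 days. It uses the four sample rows from the question and fixes "today" at 2018-04-27; the status = 2 filter is left out because the sample table does not show that column, and valid_shops_per_day is an illustrative name.

```python
from datetime import date, timedelta

# (shopid, starttime, endtime) from the sample ProductOrder table
orders = [
    (123, date(2018, 4, 27), date(2018, 4, 28)),
    (234, date(2018, 4, 23), date(2018, 4, 30)),
    (189, date(2018, 5, 1), date(2018, 5, 30)),
    (321, date(2018, 5, 1), date(2018, 5, 29)),
]

def valid_shops_per_day(orders, today, days=30):
    """Return (day, count) pairs, newest first, including days with a zero count."""
    counts = []
    for days_ago in range(days):
        day = today - timedelta(days=days_ago)
        # A set gives the COUNT(DISTINCT shopID) behaviour
        shops = {shop for shop, start, end in orders if start <= day <= end}
        counts.append((day, len(shops)))
    return counts

counts = valid_shops_per_day(orders, today=date(2018, 4, 27))
for day, count in counts[:7]:
    print(day, count)
```

Days with no valid shop stay in the result with a count of 0, which is the behaviour the expected output in the question asks for.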

combine Join and not in date query

Here are my 2 tables.
Review_master
id Rev_month Rev_year ...
1 JAN 2017
2 MAR 2017
4 FEB 2017
Review_det
Id Rev_id closed_date (MM/DD/YYYY)
1 1 01/01/2017
2 1 02/01/2017
3 1 01/17/2017
4 2 03/03/2017
5 2 04/03/2017
6 4 02/02/2017
7 4 02/05/2017
Now I need to find the number of reviews that were closed outside their review month. Review id 1 belongs to January, and in the review details table the row with id 2 was closed in February, so it should be counted.
Final output:
Rev_Id #_Closed_outside_month
1 1
2 1
4 0
There are two main points:
The month abbreviation (MON) of the closed date must differ from rev_month:
rev_month != to_char(closed_date,'MON')
The two tables must be combined with an outer join.
So you can simply use the following:
select m.id "Rev_id", count(closed_date) "#_Closed_outside_month"
from Review_det d right outer join Review_master m on ( d.Rev_id = m.id )
and rev_month != to_char(closed_date,'MON')
group by m.id
order by m.id;
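The counting rule used by this query can be sketched in Python as follows. Like the query, the sketch compares only the month abbreviation and ignores the year, and it starts every review at zero so that reviews without any outside-month closure still appear (the role of the outer join). The names closed_outside_month and MONTHS are illustrative.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# id -> (rev_month, rev_year), from the Review_master sample
review_master = {1: ("JAN", 2017), 2: ("MAR", 2017), 4: ("FEB", 2017)}

# (rev_id, closed_date), from the Review_det sample
review_det = [
    (1, date(2017, 1, 1)), (1, date(2017, 2, 1)), (1, date(2017, 1, 17)),
    (2, date(2017, 3, 3)), (2, date(2017, 4, 3)),
    (4, date(2017, 2, 2)), (4, date(2017, 2, 5)),
]

def closed_outside_month(master, details):
    """Count, per review, the detail rows closed in a different month."""
    counts = {rev_id: 0 for rev_id in master}  # every review appears, even with 0
    for rev_id, closed in details:
        rev_month, _rev_year = master[rev_id]
        if MONTHS[closed.month - 1] != rev_month:
            counts[rev_id] += 1
    return counts

result = closed_outside_month(review_master, review_det)
print(result)  # {1: 1, 2: 1, 4: 0}
```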
Here's one option:
with review_master (id, rev_month, rev_year) as
(select 1, 'jan', '2017' from dual union
select 2, 'mar', '2017' from dual union
select 4, 'feb', '2017' from dual),
review_det (id, rev_id, closed_date) as
(select 1, 1, date '2017-01-01' from dual union
select 2, 1, date '2017-02-01' from dual union
select 3, 1, date '2017-01-17' from dual union
select 4, 2, date '2017-03-03' from dual union
select 5, 2, date '2017-04-03' from dual union
select 6, 4, date '2017-02-02' from dual union
select 7, 4, date '2017-02-05' from dual)
select m.id,
case when to_char(d.closed_date, 'mmyyyy') <>
to_char(to_date(m.rev_month||' '||m.rev_year, 'mon yyyy',
'nls_date_language = english'), 'mmyyyy')
then 1
else 0
end closed_outside_Month
from review_master m, review_det d
where m.id = d.rev_id
and d.closed_date = (select max(d1.closed_date)
from review_Det d1
where d1.rev_id = d.rev_id
);
ID CLOSED_OUTSIDE_MONTH
---------- --------------------
1 1
2 1
4 0