Filling missing combinations in a table - SQL

input:
item loc month year qty
A DEL 5 2020 12
A DEL 6 2020 14
A DEL 8 2020 16
A DEL 9 2020 17
output:
item loc month year qty
A DEL 5 2020 12
A DEL 6 2020 14
A DEL 7 2020 26
A DEL 8 2020 16
A DEL 9 2020 17
A DEL 10 2020 33
description:
Month 7 is missing from my input. To fill it in, I sum the quantities of the two preceding months.
For example, the output for month 7 is 12 (from month 5) + 14 (from month 6) = 26. Likewise, month 10 is 16 (from month 8) + 17 (from month 9) = 33.
Whenever a month is missing, it should be filled using this logic.

I have written a two-step script, but it only fills months that are missing between existing values, not boundary values; that is, it won't treat month 10 as missing because it lies past the last month in the table.
Step 1: Insert each missing month, with NULL in all other columns.
INSERT INTO TEST_MISSING(MONTH)
select min_a - 1 + level
from ( select min(MONTH) min_a
, max(MONTH) max_a
from TEST_MISSING
)
connect by level <= max_a - min_a + 1
minus
select MONTH
from TEST_MISSING;
Step 2: Populate the other columns with LAG, using the values from the rows above, and then calculate the quantity with a window function.
SELECT NVL(ITEM, NEW_ITEM) ITEM,
NVL(LOC, NEW_LOC) LOC,
MONTH, NVL(YEAR, NEW_YEAR) YEAR,
CASE WHEN QTY IS NULL THEN SUM(NVL(QTY, 0)) OVER(PARTITION BY NEW_ITEM ORDER BY MONTH ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) ELSE QTY END AS QTY
FROM (
SELECT A.*,
nvl(item,CASE WHEN ITEM IS NULL THEN (LAG(ITEM) OVER(ORDER BY MONTH)) END) NEW_ITEM,
nvl(LOC,CASE WHEN LOC IS NULL THEN (LAG(LOC) OVER(ORDER BY MONTH)) END) NEW_LOC,
nvl(YEAR,CASE WHEN YEAR IS NULL THEN (LAG(YEAR) OVER(ORDER BY MONTH)) END) NEW_YEAR
FROM TEST_MISSING A)
X
ORDER BY MONTH;
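The fill rule described above, including the boundary month, can be sketched outside SQL to make the expected behaviour concrete. This is only an illustration in Python, not the Oracle solution: the function name and the `last_month` parameter are hypothetical, it handles a single item/loc/year, and it assumes a filled value may itself feed a later fill.

```python
def fill_missing_months(rows, last_month=None):
    """rows: list of (month, qty) pairs for one item/loc/year, in any order.

    Returns (month, qty) for every month from the first month present up to
    last_month (assumed parameter; defaults to the last month present).
    """
    qty_by_month = dict(rows)
    first = min(qty_by_month)
    last = last_month if last_month is not None else max(qty_by_month)
    filled = []
    for month in range(first, last + 1):
        if month not in qty_by_month:
            # Missing month: sum of the two preceding months.
            # A month before the start of the data counts as 0 (assumption).
            qty_by_month[month] = (
                qty_by_month.get(month - 1, 0) + qty_by_month.get(month - 2, 0)
            )
        filled.append((month, qty_by_month[month]))
    return filled


rows = [(5, 12), (6, 14), (8, 16), (9, 17)]
result = fill_missing_months(rows, last_month=10)
```

Passing the boundary month explicitly is what lets month 10 be generated; a range derived only from min(MONTH) and max(MONTH) of the table can never extend past month 9.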

Related

Sum of last 12 months

I have a table with 3 columns (Year, Month, Value) in SQL Server. The fourth column below, ValueOfLastTwelveMonths, is the result I want:
Year Month Value ValueOfLastTwelveMonths
2021 1 30 30
2021 2 24 54 (30 + 24)
2021 5 26 80 (54 + 26)
2021 11 12 92 (80 + 12)
2022 1 25 87 (SUM of values from 1 2022 TO 2 2021)
2022 2 40 103 (SUM of values from 2 2022 TO 3 2021)
2022 4 20 123 (SUM of values from 4 2022 TO 5 2021)
I need a SQL query to calculate ValueOfLastTwelveMonths.
SELECT Year,
       Month,
       Value,
       SUM (Value) OVER (PARTITION BY Year, Month)
FROM MyTable
This is much easier if you have a row for each month and year; afterwards (if needed) you can filter out the rows with NULL values. It's easier because you then know exactly how many rows to look back: 11.
If you build a dataset of the years and months, you can LEFT JOIN your data to it, aggregate, and finally filter out the padding rows:
SELECT *
INTO dbo.YourTable
FROM (VALUES(2021,1,30),
(2021,2,24),
(2021,5,26),
(2021,11,12),
(2022,1,25),
(2022,2,40),
(2022,4,20))V(Year,Month,Value);
GO
WITH YearMonth AS(
SELECT YT.Year,
V.Month
FROM (SELECT DISTINCT Year
FROM dbo.YourTable) YT
CROSS APPLY (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))V(Month)),
RunningTotal AS(
SELECT YM.Year,
YM.Month,
YT.Value,
SUM(YT.Value) OVER (ORDER BY YM.Year, YM.Month
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS Last12Months
FROM YearMonth YM
LEFT JOIN dbo.YourTable YT ON YM.Year = YT.Year
AND YM.Month = YT.Month)
SELECT Year,
Month,
Value,
Last12Months
FROM RunningTotal
WHERE Value IS NOT NULL;
GO
DROP TABLE dbo.YourTable;
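To check the expected numbers independently of the SQL, the same rolling total can be sketched in Python. This is an illustration rather than the answer's method: instead of padding the data with empty months, it compares a month index (year * 12 + month), and it assumes at most one row per (year, month). The function name is hypothetical.

```python
def last_twelve_months(rows):
    """rows: list of (year, month, value). Returns (year, month, value, rolling_sum)."""
    # A single month index turns "within the last 12 months" into an integer range check.
    indexed = [(year * 12 + month, year, month, value) for year, month, value in rows]
    result = []
    for idx, year, month, value in indexed:
        # Current month plus the 11 months before it.
        total = sum(v for i, _, _, v in indexed if idx - 11 <= i <= idx)
        result.append((year, month, value, total))
    return result


data = [
    (2021, 1, 30), (2021, 2, 24), (2021, 5, 26), (2021, 11, 12),
    (2022, 1, 25), (2022, 2, 40), (2022, 4, 20),
]
totals = [row[3] for row in last_twelve_months(data)]
```

The totals match the ValueOfLastTwelveMonths column in the question, which is a quick way to confirm that "look back 11 rows over a gap-free calendar" and "look back 11 month indexes" describe the same window.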

Calculate sales metrics (like past 6 months, past 3 months, sale one year ago etc.) on transaction data in BigQuery

I have to create a view in BigQuery with some details of product sales. The measurements to be included in the view are explained below. These measurements have to be calculated for each product for every day that product is sold. A product is identified by a unique combination of 5-6 attributes (in our demo, the code1 and code2 columns). The date represents the transaction date.
sales_today -> the sum of sales for each product (combination of code1 and code2) per day.
TotSales_previous_3_months -> the sum of sales for each product in the previous 3 months (without including any sales from the current month). For example, if we are calculating TotSales_previous_3_months for a product sale on 5th March 2022, we have to sum up the sales of that product from 1st December 2021 to 28th February 2022.
TotSales_previous_6_months -> the sum of sales for each product in the previous 6 months (without including any sales from the current month). It follows the same logic as TotSales_previous_3_months.
sale_one_month_ago -> the sum of sales of the product on this day exactly one month ago. For example, if we are calculating sale_one_month_ago for a product sale on 5th March 2022, it would be the sum of sales of that product on 5th February 2022.
sale_one_year_ago -> the sum of sales of the product on this day exactly one year ago. For example, if we are calculating sale_one_year_ago for a product sale on 5th March 2022, it would be the sum of sales of that product on 5th March 2021.
Unique_count_flag -> flag = 1 if the number of sales of the product on a day = 1. If the number of sales of the product on a day is more than 1, flag = 0.
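The "previous N months" boundaries in the definitions above are easy to get wrong by a day, so here is a small Python sketch of how the window can be derived from a transaction date. The function name is hypothetical, and the sketch only illustrates the date arithmetic, not the BigQuery view.

```python
from datetime import date, timedelta


def previous_months_window(d, months):
    """Return (start, end) covering the `months` full calendar months before d's month."""
    first_of_month = d.replace(day=1)
    # Last day of the previous month: the current month is excluded entirely.
    end = first_of_month - timedelta(days=1)
    # Step back `months` months using a zero-based month index.
    index = first_of_month.year * 12 + (first_of_month.month - 1) - months
    start = date(index // 12, index % 12 + 1, 1)
    return start, end


start, end = previous_months_window(date(2022, 3, 5), 3)
```

For a sale on 5th March 2022 this gives 1st December 2021 to 28th February 2022, matching the example in the TotSales_previous_3_months definition.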
I have created this table (test_sales) with some demo data for understanding.
code1 code2 date gen sales
1 A 2021-02-04 jerez 7
1 A 2021-02-04 abc 5
1 A 2022-02-04 wres 10
1 A 2022-03-04 tomz 10
1 A 2022-03-05 everyz 10
1 A 2022-05-01 ben10 30
1 A 2022-06-01 xyx 10
1 A 2022-06-01 xya 5
2 A 2022-05-10 iqoom 20
3 C 2022-01-10 imola 60
3 C 2022-04-01 nurburgring 50
3 C 2022-06-01 jerez 30
The result set after calculations should look like this (a dash marks an empty cell):
code1 code2 date gen sales sales_today TotSales_previous_3_months TotSales_previous_6_months sale_one_month_ago sale_one_year_ago Unique_count_flag
1 A 2021-02-04 jerez 7 12 0 0 0 - 0
1 A 2021-02-04 abc 5 12 0 0 0 - 0
1 A 2022-02-04 wres 10 10 0 0 0 12 1
1 A 2022-03-04 tomz 10 10 10 10 10 - 1
1 A 2022-03-05 everyz 10 10 10 10 0 - 1
1 A 2022-05-01 ben10 30 30 30 30 0 - 1
1 A 2022-06-01 xyx 10 15 50 60 30 - 0
1 A 2022-06-01 xya 5 15 50 60 30 - 0
2 A 2022-05-10 iqoom 20 20 0 0 0 - 1
3 C 2022-01-10 imola 60 60 0 0 0 - 1
3 C 2022-04-01 nurburgring 50 50 60 60 0 - 1
3 C 2022-06-01 jerez 30 30 50 110 0 - 1
I was able to write the code below to get this result. It works fine for small datasets, but here I am dealing with around 60 GB of data (~50 columns and ~80 million rows). If I adapt the code for the original sales data (which is itself the result of joining a few tables), it runs for a very long time. Is there an alternative or more efficient way to achieve the results?
with temp as
(SELECT
code1,code2,date,gen,sales,
COUNT(*) OVER(PARTITION BY code1, code2, date) AS cnt,
SUM(sales) OVER(PARTITION BY code1, code2,date) AS sales_today,
array_agg(struct(sales as sales,date as date)) over(partition by code1,code2 order by date) as past_records
FROM
`test_sales`
)
select * except(past_records,cnt),
(select ifnull(sum(x.sales),0)
from unnest(temp.past_records) as x
where x.date between (date_trunc(temp.date,MONTH) - INTERVAL 3 MONTH) and (date_trunc(temp.date, MONTH) - interval 1 day)) as TotSales_previous_3_months,
(select ifnull(sum(x.sales),0)
from unnest(temp.past_records) as x
where x.date between (date_trunc(temp.date,MONTH) - INTERVAL 6 MONTH) and (date_trunc(temp.date, MONTH) - interval 1 day)) as TotSales_previous_6_months,
(select ifnull(sum(x.sales),0)
from unnest(temp.past_records) as x
where x.date = temp.date - INTERVAL 1 MONTH) as sale_one_month_ago,
(select ifnull(sum(x.sales),0)
from unnest(temp.past_records) as x
where x.date = temp.date - INTERVAL 1 YEAR) as sale_one_year_ago,
if(cnt = 1,1,0) as Unique_count_flag
from temp
Modified code, inspired by Mikhail's approach:
select *,
-- extract(year from date) * 12 + extract(month from date) as months,
-- UNIX_DATE(date) AS days,
sum(sales) over(product_date) as sales_today,
sum(sales) over(product range between 3 preceding and 1 preceding) as TotSales_previous_3_months,
sum(sales) over(product range between 6 preceding and 1 preceding) as TotSales_previous_6_months,
case when extract(day from date) = 31 and extract(month from date) in (3,12,10,7,5)
then sum(sales) over(product_by_unix_date range between 31 preceding and 31 preceding)
when extract(day from date) = 30 and extract(month from date) = 3
then sum(sales) over(product_by_unix_date range between 30 preceding and 30 preceding)
when extract(day from date) = 29 and extract(month from date) = 3
then sum(sales) over(product_by_unix_date range between 29 preceding and 29 preceding)
else
sum(sales) over(product_day range between 1 preceding and 1 preceding)
end as sale_one_month_ago,
case when extract(day from date) = 29 and extract(month from date) = 2
then sum(sales) over(product_by_unix_date range between 366 preceding and 366 preceding)
else
sum(sales) over(product_day range between 12 preceding and 12 preceding)
end as sale_one_year_ago
from `river-blade-343102.test.test_sales`
window
product as (partition by code1, code2 order by extract(year from date) * 12 + extract(month from date)),
product_date as (partition by code1, code2, date ),
product_day as (partition by code1, code2, extract(day from date) order by extract(year from date) * 12 + extract(month from date)),
product_by_unix_date as (partition by code1,code2 order by UNIX_DATE(date))
Consider the version of your query below. It is still not perfect, but at least it is easier to read and maintain.
select *,
sum(sales) over(product_date) as sales_today,
sum(sales) over(product range between 3 preceding and 1 preceding) as TotSales_previous_3_months,
sum(sales) over(product range between 6 preceding and 1 preceding) as TotSales_previous_6_months,
sum(sales) over(product_day range between 1 preceding and 1 preceding) as sale_one_month_ago,
sum(sales) over(product_day range between 12 preceding and 12 preceding) as sale_one_year_ago,
from test_sales
window
product as (partition by code1, code2 order by extract(year from date) * 12 + extract(month from date)),
product_date as (partition by code1, code2, date),
product_day as (partition by code1, code2, extract(day from date) order by extract(year from date) * 12 + extract(month from date))
if applied to sample data in your question - output is
You asked: "Is there an alternative or efficient way to achieve the results?"
The above is definitely an alternative, with its own pros and cons.
Whether it is more efficient: I think so, but to be honest I'm not 100% sure. It depends on your data, so you need to test it against your data and see.
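The idea behind the window definitions above is to replace date arithmetic with a month index (year * 12 + month), so that "previous 3 months" becomes "index - 3 to index - 1" and "same day one month ago" becomes "same day of month at index - 1". A minimal Python sketch of that idea follows, using the (1, A) rows from the question; the function and key names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    # Consecutive integer per calendar month, so "3 months back" is index - 3.
    return d.year * 12 + d.month


def sales_metrics(rows):
    """rows: list of (code1, code2, date, sales) tuples, one per transaction."""
    by_month = defaultdict(int)  # (code1, code2, month index) -> sales
    by_day = defaultdict(int)    # (code1, code2, month index, day of month) -> sales
    for code1, code2, d, sales in rows:
        by_month[(code1, code2, month_index(d))] += sales
        by_day[(code1, code2, month_index(d), d.day)] += sales

    def previous(code1, code2, m, n):
        # Sum of the n full months before month index m (current month excluded).
        return sum(by_month.get((code1, code2, m - k), 0) for k in range(1, n + 1))

    out = []
    for code1, code2, d, sales in rows:
        m = month_index(d)
        out.append({
            "date": d,
            "sales_today": by_day[(code1, code2, m, d.day)],
            "prev_3_months": previous(code1, code2, m, 3),
            "prev_6_months": previous(code1, code2, m, 6),
            "one_month_ago": by_day.get((code1, code2, m - 1, d.day), 0),
            "one_year_ago": by_day.get((code1, code2, m - 12, d.day), 0),
        })
    return out


rows = [
    (1, "A", date(2021, 2, 4), 7), (1, "A", date(2021, 2, 4), 5),
    (1, "A", date(2022, 2, 4), 10), (1, "A", date(2022, 3, 4), 10),
    (1, "A", date(2022, 3, 5), 10), (1, "A", date(2022, 5, 1), 30),
    (1, "A", date(2022, 6, 1), 10), (1, "A", date(2022, 6, 1), 5),
]
metrics = sales_metrics(rows)
```

One caveat of matching on day of month: a sale on 31st March finds nothing "one month ago" because there is no 31st February, whereas subtracting INTERVAL 1 MONTH from a date clamps to the last day of the shorter month. That difference is what the extra CASE branches in the modified query are compensating for.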

SQL query for incoming and outgoing stocks, first and last

I need to write a query that shows sales and stocks (incoming and outgoing) for each model in October 2021.
To obtain the incoming and outgoing stocks, I need vt_stocks_cube_sz.qty for the first day of the month and for the last day of the month, respectively.
For now I have just written the sum of stocks (SUM(vt_stocks_cube_sz.qty) as stocks), but that isn't correct.
Could you help me split the stocks according to the rule above? I can't work out how to write the query correctly.
%%time
SELECT vt_sales_cube_sz.modc_barc2 model,
SUM(vt_sales_cube_sz.qnt) sales,
SUM(vt_stocks_cube_sz.qty) as stocks
FROM vt_sales_cube_sz
LEFT JOIN vt_date_cube2
ON vt_sales_cube_sz.id_calendar_int = vt_date_cube2.id_calendar_int
LEFT JOIN vt_stocks_cube_sz ON
vt_stocks_cube_sz.parent_modc_barc = vt_sales_cube_sz.modc_barc AND
vt_stocks_cube_sz.id_stock = vt_sales_cube_sz.id_stock AND
vt_stocks_cube_sz.id_calendar_int = vt_sales_cube_sz.id_calendar_int AND
vt_stocks_cube_sz.vipusk_type = vt_sales_cube_sz.price_type
WHERE vt_date_cube2.wk_year_id = 2021
AND vt_date_cube2.wk_MoY_id = 10
AND vt_sales_cube_sz.id_stock IN
(SELECT id_stock
FROM vt_warehouse_cube
WHERE channel = 'OffLine')
GROUP BY vt_sales_cube_sz.modc_barc2
If you're looking for a robust and generalizable approach I'd suggest using analytic functions such as FIRST_VALUE, LAST_VALUE or something slightly different with RANK or ROW_NUMBER.
A simple example follows, so you can rerun it on your side and adjust it to the specific tables/fields you're using.
N.B.: You might need some tiebreakers in case you had multiple entries for the same first/last day.
with dummy_table as (
SELECT 1 as month, 1 as day, 10 as value UNION ALL
SELECT 1 as month, 2 as day, 20 as value UNION ALL
SELECT 1 as month, 3 as day, 30 as value UNION ALL
SELECT 2 as month, 1 as day, 5 as value UNION ALL
SELECT 2 as month, 3 as day, 15 as value UNION ALL
SELECT 2 as month, 5 as day, 25 as value
)
SELECT
month,
day,
case when day = first_day then 'first' else 'last' end as type,
value,
FROM (
SELECT *
, FIRST_VALUE(day) over (partition by month order by day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as first_day
, LAST_VALUE(day) over (partition by month order by day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_day
FROM dummy_table
) tmp
WHERE day = first_day OR day=last_day
Dummy table:
Row month day value
1 1 1 10
2 1 2 20
3 1 3 30
4 2 1 5
5 2 3 15
6 2 5 25
Result:
Row month day type value
1 1 1 first 10
2 1 3 last 30
3 2 1 first 5
4 2 5 last 25
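To connect the first/last rows back to the original question, the same selection can be sketched in Python so that each month ends up with one opening (incoming) and one closing (outgoing) value. This is an illustration on the dummy data only; the function name is hypothetical and it assumes one row per (month, day), so no tiebreaker is needed.

```python
def opening_and_closing(rows):
    """rows: list of (month, day, value). Returns {month: (opening, closing)}."""
    by_month = {}
    for month, day, value in rows:
        by_month.setdefault(month, []).append((day, value))
    result = {}
    for month, entries in by_month.items():
        entries.sort()                    # order by day within the month
        opening = entries[0][1]           # value on the first recorded day
        closing = entries[-1][1]          # value on the last recorded day
        result[month] = (opening, closing)
    return result


dummy_table = [
    (1, 1, 10), (1, 2, 20), (1, 3, 30),
    (2, 1, 5), (2, 3, 15), (2, 5, 25),
]
stocks = opening_and_closing(dummy_table)
```

Note that "first" and "last" here mean the first and last days that have a row, which is also what FIRST_VALUE and LAST_VALUE return; if the stock table has no row for the calendar first or last day of the month, the nearest recorded day is used instead.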

Selecting records that have low numbers consecutively

I have a table as follows (using BigQuery):
id year month day rating
111 2020 11 30 4
111 2020 12 01 4
112 2020 11 30 5
113 2020 11 30 5
Is there a way in which I can select ids that have ratings that are consecutively (two or more consecutive records) low (low as in both records' ratings less than 4.5)?
For example, my desired output is:
id year month day rating
111 2020 11 30 4
111 2020 12 01 4
If you want all rows, then you need to look at both the previous rating and the next rating:
SELECT t.*
FROM (SELECT t.*,
LAG(rating) OVER (PARTITION BY id ORDER BY year, month, day ASC) AS prev_rating,
LEAD(rating) OVER (PARTITION BY id ORDER BY year, month, day ASC) AS next_rating,
FROM dataset.table t
) t
WHERE (rating < 4.5 and prev_rating < 4.5) OR
(rating < 4.5 and next_rating < 4.5)
Below is for BigQuery Standard SQL
select * except(grp, seq_len)
from (
select *, sum(1) over(partition by id, grp) seq_len
from (
select *,
countif(rating >= 4.5) over(partition by id order by year, month, day) grp
from `project.dataset.table`
)
where rating < 4.5
)
where seq_len > 1
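Both answers look for runs of consecutive low ratings within one id. The grouping logic can be sketched in Python to show what counts as a run; this is an illustration only, with a hypothetical function name, and it assumes year, month, and day are stored as integers.

```python
from itertools import groupby


def consecutive_low(rows, threshold=4.5):
    """rows: list of (id, year, month, day, rating).

    Returns the rows that belong to a run of two or more consecutive
    low ratings (rating < threshold) within the same id.
    """
    result = []
    ordered = sorted(rows, key=lambda r: r[:4])  # by id, then date
    for _, id_rows in groupby(ordered, key=lambda r: r[0]):
        # Within one id, split the date-ordered rows into runs of low / not-low.
        for is_low, run in groupby(id_rows, key=lambda r: r[4] < threshold):
            run = list(run)
            if is_low and len(run) >= 2:
                result.extend(run)
    return result


table = [
    (111, 2020, 11, 30, 4),
    (111, 2020, 12, 1, 4),
    (112, 2020, 11, 30, 5),
    (113, 2020, 11, 30, 5),
]
low_runs = consecutive_low(table)
```

Runs are tracked per id: two different ids that each have a single low rating do not form a run together.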

How to use lead lag function in oracle

I have written a query that produces the result set below.
Note: I have months from jan-2016 to jan-2018.
There are two types, either 'hist' or 'future'.
Result set:
In this example, consider the combination of id1, id2, id3 to be 1, 2, 3.
type month id1 id2 id3 value
hist jan-17 1 2 3 10
hist feb-17 1 2 3 20
future jan-17 1 2 3 15
future feb-17 1 2 3 1
hist mar-17 1 2 3 2
future apr-17 1 2 3 5
My calculation logic depends on the month's position within its quarter.
For example, for January (the first month of the quarter) I want the value to be: future value of Jan + future value of Feb + future value of March.
So for jan-17 the output should be 15 + 1 + 0 (there is no future value for March).
For February (the second month of the quarter) the value should be: hist of Jan + future of Feb + future of March, i.e. 10 + 1 + 0 (the future value for March is not available).
Similarly, for March the value should be: hist of Jan + hist of Feb + future of March, i.e. 10 + 20 + 0 (the forecast for March is not present).
The same applies to April, May, and June (depending on the month's position within the quarter).
I am aware of the LEAD and LAG functions, but I am not able to apply them here.
Can someone please help?
I would not mess with lag, this can all be done with a group by if you convert your dates to quarters:
WITH
dset
AS
(SELECT DATE '2017-01-17' month, 5 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-02-17' month, 6 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-03-25' month, 7 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-05-25' month, 4 VALUE
FROM DUAL)
SELECT SUM (VALUE) value_sum, TO_CHAR (month, 'q') quarter, TO_CHAR (month, 'YYYY') year
FROM dset
GROUP BY TO_CHAR (month, 'q'), TO_CHAR (month, 'YYYY');
This results in:
VALUE_SUM QUARTER YEAR
18 1 2017
4 2 2017
We can use an analytic function if you need the result on each record:
SELECT SUM (VALUE) OVER (PARTITION BY TO_CHAR (month, 'q'), TO_CHAR (month, 'YYYY')) quarter_sum, month, VALUE
FROM dset
This results in:
QUARTER_SUM MONTH VALUE
18 1/17/2017 5
18 2/17/2017 6
18 3/25/2017 7
4 5/25/2017 4
Make certain you include the year; you don't want to combine quarters from different years.
As one of the comments said, the trick lies in another question of yours and its corresponding answer. It goes something like this:
with
x as
(select 'hist' type, To_Date('JAN-2017','MON-YYYY') ym , 10 value from dual union all
select 'future' type, To_Date('JAN-2017','MON-YYYY'), 15 value from dual union all
select 'future' type, To_Date('FEB-2017','MON-YYYY'), 1 value from dual),
y as
(select * from x Pivot(Sum(Value) For Type in ('hist' as h,'future' as f))),
/* Pivot for easy lag,lead query instead of working with rows..*/
z as
(
select ym,sum(h) H,sum(f) F from (
Select y.ym,y.H,y.F from y
union all
select add_months(to_Date('01-JAN-2017','DD-MON-YYYY'),rownum-1) ym, 0 H, 0 F
from dual connect by rownum <=3 /* depends on how many months you are querying...
so this dual adds the corresponding missing 0 records...*/
) group by ym
)
select
ym,
Case
When MOD(Extract(Month from YM),3) = 1
Then F + Lead(F,1) Over(Order by ym) + Lead(F,2) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 2
Then Lag(H,1) Over(Order by ym) + F + Lead(F,1) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 0
Then Lag(H,2) Over(Order by ym) + Lag(H,1) Over(Order by ym) + F
End Required_Value
from z
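The rule the question describes can be stated compactly: within a quarter, months before the current one contribute their hist value, and the current and later months contribute their future value, with missing values counting as 0. A short Python sketch of that rule, using the numbers from the question, may help when checking the SQL output. The function name and the dictionary layout are hypothetical.

```python
def required_value(year, month, hist, future):
    """hist, future: dicts mapping (year, month) -> value; missing entries count as 0."""
    first = month - (month - 1) % 3  # first month of the quarter containing `month`
    total = 0
    for m in range(first, first + 3):
        # Earlier months of the quarter use hist; the current and later months use future.
        source = hist if m < month else future
        total += source.get((year, m), 0)
    return total


hist = {(2017, 1): 10, (2017, 2): 20, (2017, 3): 2}
future = {(2017, 1): 15, (2017, 2): 1, (2017, 4): 5}
values = [required_value(2017, m, hist, future) for m in (1, 2, 3, 4)]
```

January gives 15 + 1 + 0, February 10 + 1 + 0, and March 10 + 20 + 0, as in the question; April starts a new quarter, so only its own future value of 5 counts.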