SQL: from QTD to daily values in BigQuery

We receive QTD (quarter-to-date) data daily, and we need to transform it into daily values (monthly would also work).
The value column has no particular pattern: it can increase or decrease, it can drop to zero because of product returns, it can stay the same (no purchase), or it can be missing for some other reason.
SKU | value | Date
ABC | 200 | 2022-01-10
ABC | 300 | 2022-02-10
ABC | 100 | 2022-03-10
XYZ | 1000 | 2022-01-10
XYZ | 1200 | 2022-02-10
XYZ | (missing) | 2022-03-10
The required output should look like this, without subtracting the last value of the previous quarter from the first value of a new quarter:
SKU | value | Date
ABC | 200 | 2022-01-10
ABC | 100 | 2022-02-10
ABC | -200 | 2022-03-10
XYZ | 1000 | 2022-01-10
XYZ | 200 | 2022-02-10
XYZ | 0 | 2022-03-10
The tricky part is the start of a new quarter. For example, assuming by default that Q4 runs from October to December and Q1 from January to March:
SKU | value | Date
ABC | 200 | 2022-01-12
ABC | 300 | 2022-02-12
ABC | 100 | 2022-03-12
ABC | 100 | 2022-01-01
ABC | 250 | 2022-02-01
ABC | 300 | 2022-03-01
This should become:
SKU | value | Date
ABC | 200 | 2022-01-12
ABC | 100 | 2022-02-12
ABC | -200 | 2022-03-12
ABC | 100 | 2022-01-01
ABC | 150 | 2022-02-01
ABC | 50 | 2022-03-01
This is on BigQuery; any help would be much appreciated.

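You might consider the query below. The sample dates are YYYY-DD-MM strings (so 2022-01-12 is 1 December 2022), which is why they are parsed with PARSE_DATE('%Y-%d-%m', Date). Each value is then differenced against the previous value of the same SKU within the same year and quarter.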
WITH sample_table AS (
  -- Dates are stored as YYYY-DD-MM strings
  SELECT 'ABC' SKU, 200 value, '2022-01-12' Date UNION ALL
  SELECT 'ABC' SKU, 300 value, '2022-02-12' Date UNION ALL
  SELECT 'ABC' SKU, 100 value, '2022-03-12' Date UNION ALL
  SELECT 'ABC' SKU, 100 value, '2022-01-01' Date UNION ALL
  SELECT 'ABC' SKU, 250 value, '2022-02-01' Date UNION ALL
  SELECT 'ABC' SKU, 300 value, '2022-03-01' Date
)
SELECT SKU,
  -- On the first row of a quarter LAG returns its default 0, so the QTD value is kept as-is;
  -- a missing (NULL) value yields NULL, which IFNULL turns into 0
  IFNULL(value - LAG(value, 1, 0) OVER w, 0) AS value,
  Date
FROM (SELECT * REPLACE(PARSE_DATE('%Y-%d-%m', Date) AS Date) FROM sample_table)
-- Partitioning by year and quarter restarts the differencing at every new quarter
WINDOW w AS (PARTITION BY SKU, EXTRACT(YEAR FROM Date), EXTRACT(QUARTER FROM Date)
             ORDER BY UNIX_DATE(Date));
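To check the expected output without BigQuery, the same window logic can be sketched in plain Python. This is an illustrative sketch, not part of the answer: `qtd_to_daily` is a hypothetical helper, the dates are assumed to be YYYY-DD-MM strings as in the query above, and `None` stands in for a missing value.

```python
from datetime import datetime


def qtd_to_daily(rows):
    """Turn (sku, qtd_value, 'YYYY-DD-MM') rows into (sku, daily_value, date) rows.

    Mirrors the query: each value is differenced against the previous value of the
    same SKU in the same year and quarter (LAG with a default of 0), and a None
    (NULL) result becomes 0 (IFNULL).
    """
    parsed = [(sku, val, datetime.strptime(d, "%Y-%d-%m").date()) for sku, val, d in rows]
    parsed.sort(key=lambda r: (r[0], r[2]))  # order by date within each SKU
    prev = {}  # (sku, year, quarter) -> previous QTD value in that partition
    result = []
    for sku, val, d in parsed:
        key = (sku, d.year, (d.month - 1) // 3 + 1)
        lag = prev.get(key, 0)  # no earlier row in this quarter -> default 0
        daily = 0 if val is None or lag is None else val - lag
        result.append((sku, daily, d))
        prev[key] = val
    return result


# Second example from the question: Q4 rows, then Q1 rows
sample = [
    ("ABC", 200, "2022-01-12"),
    ("ABC", 300, "2022-02-12"),
    ("ABC", 100, "2022-03-12"),
    ("ABC", 100, "2022-01-01"),
    ("ABC", 250, "2022-02-01"),
    ("ABC", 300, "2022-03-01"),
]

for sku, value, day in qtd_to_daily(sample):
    print(sku, value, day.isoformat())
```

Because the rows are sorted by date, the January (Q1) rows print before the December (Q4) ones; the daily values are 100, 150, 50 for Q1 and 200, 100, -200 for Q4, matching the expected output.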

Related

How to transform data into daily snapshot given the two date columns?

I have product data in my table which looks similar to this:
product_id | user_id | sales_start | sales_end | quantity
1 | 12 | 2022-01-01 | 2022-02-01 | 15
2 | 234 | 2022-11-01 | 2022-12-31 | 123
I want to transform the table into a daily snapshot so that it would look something like this:
product_id | user_id | quantity | date
1 | 12 | 15 | 2022-01-01
1 | 12 | 15 | 2022-01-02
1 | 12 | 15 | 2022-01-03
... | ... | ... | ...
2 | 234 | 123 | 2022-12-31
I know how to do a similar thing in Pandas, but I need to do it within AWS Athena.
I thought of generating the date interval and unnesting it, but I am struggling to map the rows properly.
Any ideas on how to transform the data?
The sequence function will help you here:
SELECT product_id, user_id, quantity, date(date) AS date
FROM (
  VALUES
    (1, 12, DATE '2022-01-01', DATE '2022-02-01', 15),
    (2, 234, DATE '2022-11-01', DATE '2022-12-31', 123)
) AS t (product_id, user_id, sales_start, sales_end, quantity),
-- one row per day from sales_start to sales_end, inclusive
UNNEST(sequence(sales_start, sales_end, interval '1' day)) AS s(date)
You can use sequence to generate a range of dates and then unnest it:
-- sample data
with dataset(product_id, user_id, sales_start, sales_end, quantity) as (
values (1, 12 , date '2022-01-01', date '2022-01-05', 15), -- short date ranges
(2, 234, date '2022-11-01', date '2022-11-03', 123) -- short date ranges
)
-- query
select product_id, user_id, quantity, date
from dataset,
unnest(sequence(sales_start, sales_end, interval '1' day)) as t(date);
Output:
product_id | user_id | quantity | date
1 | 12 | 15 | 2022-01-01
1 | 12 | 15 | 2022-01-02
1 | 12 | 15 | 2022-01-03
1 | 12 | 15 | 2022-01-04
1 | 12 | 15 | 2022-01-05
2 | 234 | 123 | 2022-11-01
2 | 234 | 123 | 2022-11-02
2 | 234 | 123 | 2022-11-03
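For comparison, the same expansion (one output row per day, both ends inclusive, as sequence with interval '1' day produces) can be sketched in plain Python. `daily_snapshot` is a hypothetical helper, not an Athena function; the sample rows are the short ranges used in the answer above.

```python
from datetime import date, timedelta


def daily_snapshot(rows):
    """Expand (product_id, user_id, sales_start, sales_end, quantity) rows to one row per day."""
    out = []
    for product_id, user_id, start, end, quantity in rows:
        day = start
        while day <= end:  # inclusive of sales_end, like sequence(start, end, interval '1' day)
            out.append((product_id, user_id, quantity, day))
            day += timedelta(days=1)
    return out


dataset = [
    (1, 12, date(2022, 1, 1), date(2022, 1, 5), 15),
    (2, 234, date(2022, 11, 1), date(2022, 11, 3), 123),
]

for row in daily_snapshot(dataset):
    print(row)
```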

BigQuery: merge rows where the start date of one row is the end date of another

In BigQuery, I have a customer table recording how much money (an amount X) each customer spends between a start date and an end date, like this:
id | start_date | end_date | amount
1 | 2022-01-01 | 2022-01-10 | 100
1 | 2022-01-10 | 2022-01-15 | 30
1 | 2022-02-10 | 2022-02-18 | 10
1 | 2022-02-18 | 2022-02-20 | 30
1 | 2022-02-20 | 2022-02-25 | 50
1 | 2022-02-18 | 2022-03-20 | 5000
2 | 2022-01-12 | 2022-01-15 | 30
2 | 2022-01-15 | 2022-01-27 | 30
And I would like to have this:
id | start_date | end_date | amount
1 | 2022-01-01 | 2022-01-15 | 130
1 | 2022-02-10 | 2022-02-25 | 90
1 | 2022-02-18 | 2022-03-20 | 5000
2 | 2022-01-12 | 2022-01-27 | 60
The catch is that there can be multiple contiguous rows for the same id, and when there is a choice of which row to merge, we want to merge the row with the smallest possible time interval. That is why, in the example, the row with id=1, start_date=2022-02-18, end_date=2022-03-20 is not merged.
Consider the approach below:
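select id, min(start_date) start_date, max(end_date) end_date, sum(amount) amount
from (
  -- running count of group starts gives each contiguous chain its own grp number
  select *, countif(ifnull(new_group, true)) over (partition by id order by end_date) grp
  from (
    -- a row continues the chain only if it starts exactly where the previous row (by end_date) ended;
    -- for the first row of each id, lag() is NULL, so new_group is NULL and counts as true above
    select *, start_date != lag(end_date) over(partition by id order by end_date) new_group
    from your_table
  )
)
group by id, grp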
Applied to the sample data in your question, this returns the four expected rows shown above.
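A plain-Python sketch of the same gaps-and-islands idea can help verify the grouping. It is an illustration, not BigQuery code: `merge_contiguous` is a hypothetical name, dates are assumed to be ISO YYYY-MM-DD strings (which sort in calendar order), and it assumes no two rows of an id share an end_date (the SQL window's default frame would treat such ties together).

```python
from itertools import groupby


def merge_contiguous(rows):
    """Merge (id, start_date, end_date, amount) rows whose start equals the previous row's end.

    Rows are ordered by end_date within each id, as in the query; a row starts a new
    group unless its start_date equals the previous row's end_date.
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        prev_end = None  # reset per id, so the first row of an id always opens a group
        for _, start, end, amount in group:
            if prev_end is not None and start == prev_end:
                g_id, g_start, g_end, g_amount = out[-1]
                out[-1] = (g_id, min(g_start, start), max(g_end, end), g_amount + amount)
            else:
                out.append((id_, start, end, amount))
            prev_end = end
    return out


rows = [
    (1, "2022-01-01", "2022-01-10", 100),
    (1, "2022-01-10", "2022-01-15", 30),
    (1, "2022-02-10", "2022-02-18", 10),
    (1, "2022-02-18", "2022-02-20", 30),
    (1, "2022-02-20", "2022-02-25", 50),
    (1, "2022-02-18", "2022-03-20", 5000),
    (2, "2022-01-12", "2022-01-15", 30),
    (2, "2022-01-15", "2022-01-27", 30),
]

for row in merge_contiguous(rows):
    print(row)
```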

Query to aggregate data from table into another table based on number of entries of an id column

I have a table like this:
order_id start_date end_date amount corrected_amount
1 2020-01-01 2020-01-31 100 95
1 2020-02-01 2020-02-28 200 200
1 2020-03-01 2020-03-30 100 100
1 2020-10-01 2020-11-25 200 95
2 2020-01-01 2020-05-30 500 250
3 2020-01-01 2020-12-31 400 5
I am trying to write a query that aggregates this into a smaller table with just one row per order_id, summing the values according to a few rules that I am having trouble implementing:
When there is only one entry for an order_id, like for order_id 2 and 3, I want to return the order_id, start_date, end_date, and the value from the amount column.
When there are multiple entries, like for order_id 1, I want to return the order_id, the minimum start_date, and the maximum end_date; for the amount, I want the sum of corrected_amount for every row whose end_date is earlier than today's date, plus the amount of every row whose end_date is later than today.
So for the table above the result would look like
order_id start_date end_date amount
1 2020-01-01 2020-11-25 595
2 2020-01-01 2020-05-30 500
3 2020-01-01 2020-12-31 400
Consider using IF:
WITH TestData AS (
SELECT 1 as order_id, DATE('2020-01-01') as start_date, DATE('2020-01-31') as end_date, 100 as amount, 95 as corrected_amount UNION ALL
SELECT 1, DATE('2020-02-01'), DATE('2020-02-28'), 200, 200 UNION ALL
SELECT 1, DATE('2020-03-01'), DATE('2020-03-30'), 100, 100 UNION ALL
SELECT 1, DATE('2020-10-01'), DATE('2020-11-25'), 200, 95 UNION ALL
SELECT 2, DATE('2020-01-01'), DATE('2020-05-30'), 500, 250 UNION ALL
SELECT 3, DATE('2020-01-01'), DATE('2020-12-31'), 400, 5
)
SELECT order_id,
MIN(start_date) AS start_date,
MAX(end_date) AS end_date,
IF(COUNT(*) > 1,
SUM(IF(end_date < CURRENT_DATE(), corrected_amount, amount)),
SUM(amount)
) as amount
FROM TestData
GROUP BY order_id
When CURRENT_DATE() falls between 2020-03-31 and 2020-11-25, this returns the expected output above.
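Because the query depends on CURRENT_DATE(), a small Python sketch that takes "today" as a parameter makes it easy to check the rules against the expected output. This is illustrative only: `aggregate_orders` and its `today` parameter are hypothetical, and dates are ISO strings so they compare in calendar order.

```python
def aggregate_orders(rows, today):
    """Collapse (order_id, start, end, amount, corrected_amount) rows to one row per order_id.

    Single-row orders keep amount; multi-row orders sum corrected_amount for rows that
    ended before `today` and amount for the rest, like the IF/SUM query.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    out = []
    for order_id, items in sorted(groups.items()):
        start = min(r[1] for r in items)
        end = max(r[2] for r in items)
        if len(items) > 1:
            amount = sum(r[4] if r[2] < today else r[3] for r in items)
        else:
            amount = items[0][3]
        out.append((order_id, start, end, amount))
    return out


test_data = [
    (1, "2020-01-01", "2020-01-31", 100, 95),
    (1, "2020-02-01", "2020-02-28", 200, 200),
    (1, "2020-03-01", "2020-03-30", 100, 100),
    (1, "2020-10-01", "2020-11-25", 200, 95),
    (2, "2020-01-01", "2020-05-30", 500, 250),
    (3, "2020-01-01", "2020-12-31", 400, 5),
]

print(aggregate_orders(test_data, today="2020-06-01"))
```

With a later date such as 2021-01-01, every end_date of order 1 is in the past, so its amount becomes 95 + 200 + 100 + 95 = 490.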

bigquery - calculate monthly outstanding values

I'm trying to solve the following problem:
A user took three loans with terms of 3, 4, and 5 months.
How can I calculate in BigQuery how much he owes at each point in time?
I know how to do this calculation in R or Python, but I would clearly prefer a BigQuery/SQL solution.
Thank you!
I have this data:
Take Date | Return Date | Sum
2016-01-01 | 2016-03-31 | 10
2016-02-01 | 2016-05-31 | 20
2016-03-01 | 2016-07-31 | 50
I need the output like this:
Date Sum
2016-01-01 10
2016-02-01 30
2016-03-01 80
2016-04-01 70
2016-05-01 70
2016-06-01 50
2016-07-01 50
2016-08-01 0
Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, DATE '2016-01-01' take_date, DATE '2016-03-31' return_date, 10 amount
UNION ALL SELECT 1, DATE '2016-02-01', DATE '2016-05-31', 20
UNION ALL SELECT 1, DATE '2016-03-01', DATE '2016-07-31', 50
), dates AS (
SELECT id, day
FROM (
SELECT id, GENERATE_DATE_ARRAY(
MIN(take_date),
DATE_ADD(DATE_TRUNC(MAX(return_date), MONTH), INTERVAL 1 MONTH),
INTERVAL 1 MONTH
) days
FROM `project.dataset.table`
GROUP BY id
), UNNEST(days) day
)
SELECT d.id, d.day, SUM(IF(d.day BETWEEN t.take_date AND t.return_date, amount, 0)) amount
FROM dates d
LEFT JOIN `project.dataset.table` t
ON d.id = t.id
GROUP BY d.id, d.day
ORDER BY d.day
with the result:
Row id day amount
1 1 2016-01-01 10
2 1 2016-02-01 30
3 1 2016-03-01 80
4 1 2016-04-01 70
5 1 2016-05-01 70
6 1 2016-06-01 50
7 1 2016-07-01 50
8 1 2016-08-01 0
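The same monthly grid and "active loan" test can be sketched in plain Python to cross-check the numbers. This is an illustration, not BigQuery code: `next_month` and `monthly_outstanding` are hypothetical helpers, and it assumes, as in the sample, that loans are taken on the 1st of a month so the monthly steps land on the 1st.

```python
from datetime import date


def next_month(d):
    """First day of the month after d's month."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def monthly_outstanding(loans):
    """Sum the amounts of the loans active on each monthly date, like the query's SUM(IF(... BETWEEN ...))."""
    day = min(take for take, _, _ in loans)
    last = next_month(max(ret for _, ret, _ in loans))  # month after the last return date
    out = []
    while day <= last:
        # BETWEEN is inclusive on both ends
        active = sum(amount for take, ret, amount in loans if take <= day <= ret)
        out.append((day, active))
        day = next_month(day)
    return out


loans = [
    (date(2016, 1, 1), date(2016, 3, 31), 10),
    (date(2016, 2, 1), date(2016, 5, 31), 20),
    (date(2016, 3, 1), date(2016, 7, 31), 50),
]

for day, total in monthly_outstanding(loans):
    print(day.isoformat(), total)
```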

How do I write a query to display the months between two dates?

I have a database of IDs with income and start/end dates, as below, but I have trouble breaking down the income per ID per month over the given start/end date range.
A sample of the table data is below:
ID | INCOME | START_DATE | END_DATE
1 | 2000 | 02/01/2016 | 05/31/2016
1 | 1500 | 12/01/2015 | 01/31/2016
2 | 1000 | 01/01/2016 | 04/30/2016
The outcome should be:
ID | INCOME | MONTH
1 | 2000 | 05/2016
1 | 2000 | 04/2016
1 | 2000 | 03/2016
1 | 2000 | 02/2016
1 | 1500 | 01/2016
1 | 1500 | 12/2015
2 | 1000 | 04/2016
2 | 1000 | 03/2016
2 | 1000 | 02/2016
2 | 1000 | 01/2016
How would I write the Oracle SQL such that I am able to produce the above outcome efficiently (assuming the table has thousands of unique IDs)?
You can do this using connect by, like so:
with sample_data as (select 1 id, 2000 income, to_date('01/02/2016', 'dd/mm/yyyy') start_date, to_date('31/05/2016', 'dd/mm/yyyy') end_date from dual union all
select 1 id, 1500 income, to_date('01/12/2015', 'dd/mm/yyyy') start_date, to_date('31/01/2016', 'dd/mm/yyyy') end_date from dual union all
select 2 id, 1000 income, to_date('01/01/2016', 'dd/mm/yyyy') start_date, to_date('30/04/2016', 'dd/mm/yyyy') end_date from dual)
select id,
income,
add_months(trunc(start_date, 'mm'), -1 + level) mnth
from sample_data
connect by prior id = id
and prior income = income
and prior sys_guid() is not null
and add_months(trunc(start_date, 'mm'), -1 + level) <= trunc(end_date, 'mm')
order by id, income desc, mnth desc;
ID INCOME MNTH
---------- ---------- ---------
1 2000 01-MAY-16
1 2000 01-APR-16
1 2000 01-MAR-16
1 2000 01-FEB-16
1 1500 01-JAN-16
1 1500 01-DEC-15
2 1000 01-APR-16
2 1000 01-MAR-16
2 1000 01-FEB-16
2 1000 01-JAN-16
You could use recursive subquery factoring, if you're on 11gR2 or higher:
with r (id, income, this_date, end_date) as (
select id, income, trunc(start_date, 'MM'), trunc(end_date, 'MM')
from your_table
union all
select id, income, this_date + interval '1' month, end_date
from r
where end_date > this_date
)
select id, income, to_char(this_date, 'MM/YYYY') as month
from r
order by id, this_date desc;
ID INCOME MONTH
---------- ---------- -------
1 2000 05/2016
1 2000 04/2016
1 2000 03/2016
1 2000 02/2016
1 1500 01/2016
1 1500 12/2015
2 1000 04/2016
2 1000 03/2016
2 1000 02/2016
2 1000 01/2016
The anchor member gets the starting information, which I'm truncating to the start of the month. That is probably redundant, but it guards against a row starting late enough in the month to cause a problem with interval addition (in Oracle, adding INTERVAL '1' MONTH to a date such as 31 January raises ORA-01839, because 31 February does not exist). The recursive member then keeps adding a month to each existing row until it reaches the end date.
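The anchor/recursive structure maps onto a simple loop. A minimal Python sketch, not Oracle code: `expand_months` and `next_month` are hypothetical helpers, and, like the truncation in the anchor member, stepping from the 1st of each month avoids the end-of-month problem.

```python
from datetime import date


def next_month(d):
    """First day of the next month (d is assumed to be the 1st of a month)."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def expand_months(rows):
    """Expand (id, income, start_date, end_date) rows to one (id, income, 'MM/YYYY') row per month."""
    expanded = []
    for id_, income, start, end in rows:
        this_date = start.replace(day=1)  # anchor member: trunc(start_date, 'MM')
        end_month = end.replace(day=1)    # trunc(end_date, 'MM')
        expanded.append((id_, income, this_date))
        while end_month > this_date:      # recursive member: where end_date > this_date
            this_date = next_month(this_date)
            expanded.append((id_, income, this_date))
    # order by id, this_date desc
    expanded.sort(key=lambda r: (r[0], -r[2].toordinal()))
    return [(id_, income, d.strftime("%m/%Y")) for id_, income, d in expanded]


sample = [
    (1, 2000, date(2016, 2, 1), date(2016, 5, 31)),
    (1, 1500, date(2015, 12, 1), date(2016, 1, 31)),
    (2, 1000, date(2016, 1, 1), date(2016, 4, 30)),
]

for row in expand_months(sample):
    print(row)
```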