Calculate a partitioned sum from two tables using rows whose index is less than or equal to the current index in BigQuery - sql

I have 2 tables. The first is a main table that I want to join to the second table and compute a partitioned sum over.
The first table is main_table:
Month       Product  MOB
2020-12-01  B2B      1
2020-12-01  B2B      2
2021-01-01  B2B      1
2020-11-01  B2C      1
2020-11-01  B2C      2
2020-11-01  B2C      3
The second table is second_table:
month       Product  MOB  amount
2020-12-01  B2B      0    100
2020-12-01  B2B      2    100
2021-01-01  B2B      1    50
2020-11-01  B2C      -2   50
2020-11-01  B2C      1    55
2020-11-01  B2C      3    100
My expected result is:
Month       Product  MOB  partition_amount
2020-12-01  B2B      1    100
2020-12-01  B2B      2    200
2021-01-01  B2B      1    50
2020-11-01  B2C      1    105
2020-11-01  B2C      2    105
2020-11-01  B2C      3    205
partition_amount is calculated by matching main_table.Month = second_table.month and main_table.Product = second_table.Product, then summing second_table.amount over the rows where second_table.mob <= main_table.mob.
Can anyone help me write this query in BigQuery?

it would be calculate when second_table.mob <= main_table.mob
One method is join and aggregation:
select m.month, m.product, m.mob, sum(s.amount) as partition_amount
from main_table m join
second_table s
on s.month = m.month and
s.product = m.product and
s.mob <= m.mob
group by 1, 2, 3;
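As a quick check of the join-and-aggregate approach above, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module and runs an equivalent query, summing the amount column of second_table. SQLite is only a stand-in for BigQuery here (an assumption made so the example runs locally), and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question; column types are assumed.
conn.executescript("""
create table main_table (month text, product text, mob integer);
create table second_table (month text, product text, mob integer, amount integer);
insert into main_table values
  ('2020-12-01', 'B2B', 1), ('2020-12-01', 'B2B', 2), ('2021-01-01', 'B2B', 1),
  ('2020-11-01', 'B2C', 1), ('2020-11-01', 'B2C', 2), ('2020-11-01', 'B2C', 3);
insert into second_table values
  ('2020-12-01', 'B2B', 0, 100), ('2020-12-01', 'B2B', 2, 100),
  ('2021-01-01', 'B2B', 1, 50),  ('2020-11-01', 'B2C', -2, 50),
  ('2020-11-01', 'B2C', 1, 55),  ('2020-11-01', 'B2C', 3, 100);
""")

# Join on month and product, keep only rows with s.mob <= m.mob, then sum.
query = """
select m.month, m.product, m.mob, sum(s.amount) as partition_amount
from main_table m
join second_table s
  on s.month = m.month
 and s.product = m.product
 and s.mob <= m.mob
group by m.month, m.product, m.mob
order by m.product, m.month, m.mob
"""
partition_rows = conn.execute(query).fetchall()
for row in partition_rows:
    print(row)
```

On the sample data this reproduces the partition_amount values from the question (100, 200, 50, 105, 105, 205). Because it is an inner join, a main_table row with no matching second_table row at or below its mob would be dropped rather than reported as 0.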

Consider below
select any_value(main_table).*,
sum(if(second_table.mob <= main_table.mob, amount, 0)) as partition_amount
from `project.dataset.main_table` main_table
left join `project.dataset.second_table` second_table
using(month, product)
group by format('%t', main_table)
If applied to the sample data in your question, the output matches the expected result shown there.

Related

How to index match with conditions in sql

I have tables like this:
regist table
userID  registDate
1       2022-01-22
2       2022-01-23
session table
userID  date_key    traffic
null    2022-01-02  facebook
1       2021-01-03  facebook
1       2021-01-04  google
1       2021-01-05  linkedin
2       2021-01-15  facebook
2       2021-01-25  facebook
3       2021-01-20  facebook
Output
userID  date_key    traffic   regist date
1       2021-01-03  facebook  2022-01-22
1       2021-01-04  google    2022-01-22
1       2021-01-05  linkedin  2022-01-22
2       2021-01-15  facebook  2022-01-23
How do I merge the tables so that I can return the regist date? Do I do a right join?
Is this correct?
select *
from sessiontables st
left join registtable rt on st.userID = rt.userID
where st.userID is not null
How do I keep only the userIDs that exist in the regist table?
If I understand correctly, you can try joining to a subquery that uses an aggregate function.
select rt.userID,
st.date_key,
st.traffic,
rt.registDate
from (
SELECT userID,min(date_key) date_key,traffic
FROM sessiontables
GROUP BY traffic,userID
) st
JOIN registtable rt
ON st.userID=rt.userID
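To see what this query returns, here is a small sketch that runs it against the sample rows in an in-memory SQLite database via Python's sqlite3 module. SQLite and the column types are assumptions made for a runnable example; the table names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question; column types are assumed.
conn.executescript("""
create table registtable (userID integer, registDate text);
create table sessiontables (userID integer, date_key text, traffic text);
insert into registtable values (1, '2022-01-22'), (2, '2022-01-23');
insert into sessiontables values
  (null, '2022-01-02', 'facebook'),
  (1, '2021-01-03', 'facebook'),
  (1, '2021-01-04', 'google'),
  (1, '2021-01-05', 'linkedin'),
  (2, '2021-01-15', 'facebook'),
  (2, '2021-01-25', 'facebook'),
  (3, '2021-01-20', 'facebook');
""")

# Earliest session per (userID, traffic), inner-joined to registered users.
query = """
select rt.userID, st.date_key, st.traffic, rt.registDate
from (
  select userID, min(date_key) as date_key, traffic
  from sessiontables
  group by traffic, userID
) st
join registtable rt
  on st.userID = rt.userID
order by rt.userID, st.date_key
"""
session_rows = conn.execute(query).fetchall()
for row in session_rows:
    print(row)
```

The inner join is what filters out the null userID and userID 3, since neither has a row in the regist table; the min(date_key) per traffic source is what collapses the two facebook sessions of userID 2 into one.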

Multiple Between Dates Aggregation

I have this dataset that is structured more or less like the following:
Product  Sales Value  Sales Qty  Sales Date  Period 1 start  Period 1 end  Period 2 start  Period 2 end
XXX      6            2          2021-05-20  2021-05-15      2021-05-21    2021-05-22      2021-06-01
YYY      10           3          2021-05-21  2021-05-15      2021-05-21    2021-05-22      2021-06-01
XXX      3            1          2021-05-23  2021-05-15      2021-05-21    2021-05-22      2021-06-01
XXX      6            2          2021-05-24  2021-05-15      2021-05-21    2021-05-22      2021-06-01
I would like to sum the columns "sales value" and "sales quantity" and create 4 more columns called "Period 1 Sales", "Period 1 Qty", "Period 2 Sales", and "Period 2 Qty".
For example, with the data above, I would get four new columns while rows would be grouped by product:
Product  Period 1 start  Period 1 end  Period 2 start  Period 2 end  Period 1 Sales Value  Period 1 Sales Qty  Period 2 Sales Value  Period 2 Sales Qty
XXX      2021-05-15      2021-05-21    2021-05-22      2021-06-01    6                     2                   9                     3
YYY      2021-05-15      2021-05-21    2021-05-22      2021-06-01    0                     0                   10                    3
I'm using SQL Server and right now I am pretty much stuck.
So far I managed to Cross Join my calendar table and sales table to get the table described in the first matrix.
Can anybody help?
Thanks a lot!
The following provides your desired results, except for YYY: in your sample data its sale falls in period 1, not period 2.
select Product,
Period1start, Period1end, Period2start, Period2end,
Sum(P1sales) Period1Sales, Sum(P1Qty) Period1Qty,
Sum(P2sales) Period2Sales, Sum(P2Qty) Period2Qty
from t
cross apply(values(case when SalesDate between Period1start and Period1end then SalesValue else 0 end) )p1s(P1sales)
cross apply(values(case when SalesDate between Period1start and Period1end then SalesQty else 0 end) )p1q(P1Qty)
cross apply(values(case when SalesDate between Period2start and Period2end then SalesValue else 0 end) )p2s(P2sales)
cross apply(values(case when SalesDate between Period2start and Period2end then SalesQty else 0 end) )p2q(P2Qty)
group by Product, Period1start, Period1end, Period2start, Period2end
See example DB<>Fiddle
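For readers without a SQL Server instance, the same idea can be sketched with plain conditional aggregation (sum over a case expression) instead of cross apply. The example below runs in SQLite through Python's sqlite3 module; that engine, the table name t, and the space-free column names are assumptions carried over from the query above, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question; dates are stored as ISO text so
# that BETWEEN compares them correctly as strings.
conn.executescript("""
create table t (Product text, SalesValue integer, SalesQty integer, SalesDate text,
                Period1start text, Period1end text, Period2start text, Period2end text);
insert into t values
  ('XXX', 6, 2,  '2021-05-20', '2021-05-15', '2021-05-21', '2021-05-22', '2021-06-01'),
  ('YYY', 10, 3, '2021-05-21', '2021-05-15', '2021-05-21', '2021-05-22', '2021-06-01'),
  ('XXX', 3, 1,  '2021-05-23', '2021-05-15', '2021-05-21', '2021-05-22', '2021-06-01'),
  ('XXX', 6, 2,  '2021-05-24', '2021-05-15', '2021-05-21', '2021-05-22', '2021-06-01');
""")

# Each sum only counts rows whose SalesDate falls inside the given period.
query = """
select Product, Period1start, Period1end, Period2start, Period2end,
       sum(case when SalesDate between Period1start and Period1end then SalesValue else 0 end) as Period1Sales,
       sum(case when SalesDate between Period1start and Period1end then SalesQty else 0 end) as Period1Qty,
       sum(case when SalesDate between Period2start and Period2end then SalesValue else 0 end) as Period2Sales,
       sum(case when SalesDate between Period2start and Period2end then SalesQty else 0 end) as Period2Qty
from t
group by Product, Period1start, Period1end, Period2start, Period2end
order by Product
"""
period_rows = conn.execute(query).fetchall()
for row in period_rows:
    print(row)
```

On the sample rows, XXX gets 6 / 2 in period 1 and 9 / 3 in period 2, while YYY gets 10 / 3 in period 1 and 0 / 0 in period 2, because its only sale is dated 2021-05-21.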

Match group of variables and values with the nearest datetime

I have a transaction table that looks like this:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:11:00 001 101 -45 Y
2021-03-01 10:11:00 001 105 -25 Y
2021-03-01 10:11:00 001 109 -40 Y
The table does not have an id column; the transaction_start for a given store_no will never be the same.
Whenever a transaction is post voided, the transaction is then repeated with the same store_no, item_no but with a negative/minus amount and an equal or higher transaction_start. Also, the column post_voided is then equal to 'Y'.
In the example above, the rows 1-3 have the same transaction_start and store_no, thus belonging to the same receipt, containing three different items (101, 105, 109). The same logic is applied to the other rows: rows 4-5 belong to a same receipt, and so on. In the example, 4 different receipts can be seen. The last receipt, given by the last three rows, is a post voided of the first receipt (rows 1-3).
What I want to do is change the transaction_start of the post_voided = 'Y' transactions (in my example, only one receipt, the last three rows) to the closest datetime of the matching receipt: one with the same store_no and item_no and the opposite (positive) amount, but with post_voided = 'N' (in my example, the first three rows, where store_no, all item_no values and the amounts match). The transaction_start of the post-voided receipt is always equal to or higher than that of the "original" receipt.
Desired output:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:00:00 001 101 -45 Y
2021-03-01 10:00:00 001 105 -25 Y
2021-03-01 10:00:00 001 109 -40 Y
Here a link of the table: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=26142fa24e46acb4213b96c86f4eb94b
Thanks in advance!
Consider below
select a.* replace(ifnull(b.transaction_start, a.transaction_start) as transaction_start)
from `project.dataset.table` a
left join (
select * replace(-amount as amount)
from `project.dataset.table`
where post_voided = 'N'
) b
using (store_no, item_no)
If applied to the sample data in your question, the output matches the desired output.
Consider below for new / extended example (https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=91f9f180fd672e7c357aa48d18ced5fd)
select x.* replace(ifnull(y.original_transaction_start, x.transaction_start) as transaction_start)
from `project.dataset.table` x
left join (
select b.transaction_start, b.store_no, b.item_no, b.amount amount,
max(a.transaction_start) original_transaction_start
from `project.dataset.table` a
join `project.dataset.table` b
on a.store_no = b.store_no
and a.item_no = b.item_no
and a.amount = -b.amount
and a.post_voided = 'N'
and b.post_voided = 'Y'
and a.transaction_start < b.transaction_start
group by b.transaction_start, b.store_no, b.item_no, b.amount
) y
using (store_no, item_no, amount, transaction_start)
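The matching rule can also be tried locally. The sketch below is a variant, not the query above: it uses a correlated subquery instead of the self-join, runs in SQLite through Python's sqlite3 module, and uses <= on transaction_start because the question says the void can have an equal timestamp. The table name tx and the column types are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question; tx is a hypothetical table name.
conn.executescript("""
create table tx (transaction_start text, store_no text, item_no text,
                 amount integer, post_voided text);
insert into tx values
  ('2021-03-01 10:00:00', '001', '101', 45, 'N'),
  ('2021-03-01 10:00:00', '001', '105', 25, 'N'),
  ('2021-03-01 10:00:00', '001', '109', 40, 'N'),
  ('2021-03-01 10:05:00', '002', '103', 35, 'N'),
  ('2021-03-01 10:05:00', '002', '135', 20, 'N'),
  ('2021-03-01 10:08:00', '001', '140', 2, 'N'),
  ('2021-03-01 10:11:00', '001', '101', -45, 'Y'),
  ('2021-03-01 10:11:00', '001', '105', -25, 'Y'),
  ('2021-03-01 10:11:00', '001', '109', -40, 'Y');
""")

# For each voided row, look up the latest non-voided row with the same
# store and item and the opposite amount that is not later than the void;
# fall back to the row's own timestamp when nothing matches.
query = """
select case when t.post_voided = 'Y' then coalesce(
           (select max(o.transaction_start)
              from tx o
             where o.post_voided = 'N'
               and o.store_no = t.store_no
               and o.item_no = t.item_no
               and o.amount = -t.amount
               and o.transaction_start <= t.transaction_start),
           t.transaction_start)
       else t.transaction_start end as transaction_start,
       t.store_no, t.item_no, t.amount, t.post_voided
from tx t
order by t.rowid
"""
fixed_rows = conn.execute(query).fetchall()
for row in fixed_rows:
    print(row)
```

On the sample data the three post-voided rows come back with transaction_start 2021-03-01 10:00:00 and every other row is unchanged. Like the query above, this matches item by item, so it does not verify that a whole receipt was voided together.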

Query to find active days per year to find revenue per user per year

I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id  user_name  user_joining_date
1        Steve      2013-01-04
2        Adam       2012-11-01
3        John       2013-05-05
4        Tony       2012-01-01
5        Dan        2010-01-01
6        Alex       2019-01-01
7        Kim        2019-01-01
bundle_dim
bundle_id  bundle_name            bundle_type  bundle_cost_per_day
101        movies and TV          prime        5.5
102        TV and sports          prime        6.5
103        Cooking                prime        7
104        Sports and news        prime        5
105        kids movie             extra        2
106        kids educative         extra        3.5
107        spanish news           extra        2.5
108        Spanish TV and sports  extra        3.5
109        Travel                 extra        2
plans_fact
user_id  bundle_id  bundle_start_date  bundle_end_date
1        101        2019-10-10         2020-10-10
2        107        2020-01-15         (null)
2        106        2020-01-15         2020-12-31
2        101        2020-01-15         (null)
2        103        2020-01-15         2020-02-15
1        101        2020-10-11         (null)
1        107        2019-10-10         2020-10-10
1        105        2019-10-10         2020-10-10
4        101        2021-01-01         2021-02-01
3        104        2020-02-17         2020-03-17
2        108        2020-01-15         (null)
4        102        2021-01-01         (null)
4        103        2021-01-01         (null)
4        108        2021-01-01         (null)
5        103        2020-01-15         (null)
5        101        2020-01-15         2020-02-15
6        101        2021-01-01         2021-01-17
6        101        2021-01-20         (null)
6        108        2021-01-01         (null)
7        104        2020-02-17         (null)
7        103        2020-01-17         2020-01-18
1        102        2020-12-11         (null)
2        106        2021-01-01         (null)
7        107        2020-01-15         (null)
Note: a NULL bundle_end_date means the subscription is still active.
A user's active days can be calculated as bundle_end_date - bundle_start_date (for the given bundle).
Total revenue per user can be calculated as total number of active days * bundle rate per day.
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
, sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.bundle_cost_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;
You need a 'year' table to help split each row that spans multiple years into its separate years. For each year, you also need to recalculate the start and end dates. That's what I do in the yearParsed CTE in the code below. I hard-code the years in the join that creates y. You will probably do it differently, but however you get those values will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic to the cte to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
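The clipping that the yearParsed CTE performs can be illustrated outside SQL. Below is a minimal Python sketch, using a made-up helper name split_by_year, that splits one subscription interval into per-year pieces the same way: the later of the two start dates, the earlier of the two end dates, and a day difference like datediff(day, ...). A NULL end date would have to be replaced by today's date by the caller, as the isnull(..., getdate()) does.

```python
from datetime import date

def split_by_year(start, end):
    """Yield (year, clipped_start, clipped_end, days) for each calendar
    year the interval touches. Hypothetical helper for illustration."""
    for year in range(start.year, end.year + 1):
        clipped_start = max(start, date(year, 1, 1))    # like iif(start > y.startDt, ...)
        clipped_end = min(end, date(year, 12, 31))      # like iif(end < y.endDt, ...)
        # Same counting as datediff(day, startDt, endDt): end minus start.
        yield year, clipped_start, clipped_end, (clipped_end - clipped_start).days

# First plans_fact row: user 1, bundle 101, 2019-10-10 to 2020-10-10.
pieces = list(split_by_year(date(2019, 10, 10), date(2020, 10, 10)))
for piece in pieces:
    print(piece)
```

For that row the pieces are 82 days in 2019 and 283 days in 2020. Note that clipping at December 31 and counting with a day difference drops the one day that crosses each year boundary (82 + 283 = 365, versus 366 for the unsplit interval), so whether to use January 1 of the following year as the upper bound depends on how inclusive the end date is meant to be.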
Now, if this is common, you should probably create a base-table for your 'year' table. But if it's not common, but for this report you don't want to have to keep coming back to hard-code the year information into the y table, you can do this:
declare @yearTable table (
year int,
startDt char(10),
endDt char(10)
);
with y as (
select year = year(min(pf.bundle_start_date))
from plans_fact pf
union all
select year + 1
from y
where year < year(getdate())
)
insert @yearTable
select year,
startDt = convert(char(4),year) + '-01-01',
endDt = convert(char(4),year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.

Calculate Average between columns by comparing two rows in SQL Server

I have the below table
BidID AppID AppStatus StatusTime
1 1 In Review 2019-01-02 12:00:00
1 1 Approved 2019-01-02 13:00:00
1 2 In Review 2019-01-04 13:00:00
1 2 Approved 2019-01-04 14:00:00
2 2 In Review 2019-01-07 15:00:00
2 2 Approved 2019-01-07 17:00:00
3 1 In Review 2019-01-09 13:00:00
4 1 Approved 2019-01-09 13:00:00
What I am trying to do is calculate the average StatusTime difference in minutes, using the following logic:
first group by BidID and then by AppID, and then calculate the time difference between the StatusTime of the In Review and Approved AppStatus rows.
E.g.:
first group by BidID, then by AppID,
then find the In Review status and the next Approved status, and calculate the minute difference between the two dates.
BidID  AppID  AppStatus                                                                 BidAverage
1      1, 2   For App ID 1 (2019-01-02 15:48:42.000 - 2019-01-02 12:33:36.000): 1 hour  1.5
              For App ID 2 (2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000): 2 hour
2      2      For App ID 2 (2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000): 1       1
3      1      No calculation since no Approved
4      1      No calculation since no In Review before Approved
Final Average (1.5 + 1) / 2 = 1.25 for the table
I have already figured out the time difference excluding Saturday (Time Difference Excluding Weekend) using David's suggestion.
What I am not sure about is how to check that the AppStatus is first In Review and then Approved, and only then calculate the time difference. If there is no Approved row, as for BidID 3, that pair should be left out of the average. The result should then be averaged across AppID and then across BidID.
Thanks
I think you can just use min() and max() for simplicity to get the times for the bid/app pairs. The rest is just aggregation and more aggregation.
The processing you describe seems to be:
select avg(avg_bid_diff)
from (select BidID, avg(diff*1.0) as avg_bid_diff
from (select BidID, AppID,
datediff(second, min(StatusTime), max(StatusTime)) as diff
from t
where AppStatus in ('In Review', 'Approved')
group by BidID, AppID
having count(*) = 2
) ba
group by BidID
) b;
This makes assumptions that are consistent with the provided data: that the statuses don't have duplicates for the bid/app pairs and that approval is always after review.
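To try the nested aggregation on the sample table, here is a minimal sketch in SQLite via Python's sqlite3 module. SQLite has no datediff, so the difference in seconds is taken from strftime('%s', ...) instead; that substitution, the table name t, and the column types are assumptions made for a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question; column types are assumed.
conn.executescript("""
create table t (BidID integer, AppID integer, AppStatus text, StatusTime text);
insert into t values
  (1, 1, 'In Review', '2019-01-02 12:00:00'),
  (1, 1, 'Approved',  '2019-01-02 13:00:00'),
  (1, 2, 'In Review', '2019-01-04 13:00:00'),
  (1, 2, 'Approved',  '2019-01-04 14:00:00'),
  (2, 2, 'In Review', '2019-01-07 15:00:00'),
  (2, 2, 'Approved',  '2019-01-07 17:00:00'),
  (3, 1, 'In Review', '2019-01-09 13:00:00'),
  (4, 1, 'Approved',  '2019-01-09 13:00:00');
""")

# Innermost level: one row per (BidID, AppID) pair that has both statuses.
# Middle level: average per BidID. Outer level: average of those averages.
query = """
select avg(avg_bid_diff)
from (select BidID, avg(diff * 1.0) as avg_bid_diff
      from (select BidID, AppID,
                   cast(strftime('%s', max(StatusTime)) as integer)
                 - cast(strftime('%s', min(StatusTime)) as integer) as diff
            from t
            where AppStatus in ('In Review', 'Approved')
            group by BidID, AppID
            having count(*) = 2
           ) ba
      group by BidID
     ) b
"""
overall_avg_seconds = conn.execute(query).fetchone()[0]
print(overall_avg_seconds)
```

With the rows as listed in the table, BidID 1 averages 3600 seconds, BidID 2 averages 7200 seconds, and BidIDs 3 and 4 are excluded by the having clause, so the overall result is 5400 seconds (1.5 hours).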