BigQuery: JOIN on single matching row - sql

I have two tables, one containing orders with a nested line_items structure and another with a pricing history for each product sku code.
Orders Table
order_id  order_date  item_sku  item_quantity  item_subtotal
1         2022-23-07  SKU1      7              12.34
                      SKU2      1              9.99
2         2022-12-07  SKU1      1              1.12
                      SKU3      5              32.54
Price History Table
item_sku  effective_date  cost
SKU1      2022-20-07      0.78
SKU2      2022-02-03      4.50
SKU1      2022-02-03      0.56
SKU3      2022-02-03      4.32
Desired Output
order_id  order_date  item_sku  item_quantity  item_subtotal  cost
1         2022-23-07  SKU1      7              12.34          0.78
                      SKU2      1              9.99           4.50
2         2022-12-07  SKU1      1              1.12           0.56
                      SKU3      5              32.54          4.32
I'm trying to get the product cost by finding the cost at the time of the order being placed.
SELECT order_id, order_date,
ARRAY(
SELECT AS STRUCT
item_sku,
item_quantity,
item_subtotal,
cost.product_cost
FROM UNNEST(line_items) as items
JOIN `price_history_table` as cost
ON items.item_sku = cost.sku AND effective_date < order_date
) AS line_items,
FROM
`order_data_table`
The above query works but creates a separate line_item array row for each record in the price history table.
How can I match on just the most recent price for that SKU? I want to add something like this:
ORDER BY effective_date DESC LIMIT 1
But I can't work out where to add it.

How can I match on just the most recent price for that sku
You need to add the line below to the subquery, and move the join out of the SELECT list to avoid the correlated subquery issue:
QUALIFY 1 = ROW_NUMBER() OVER(PARTITION BY item_sku ORDER BY effective_date DESC)
So the final query will look like this:
SELECT order_id, order_date,
ARRAY_AGG(line_item) AS line_items
FROM (
SELECT order_id, order_date,
STRUCT(item_sku,
item_quantity,
item_subtotal,
cost.product_cost) AS line_item
FROM `order_data_table`, UNNEST(line_items) AS items
JOIN `price_history_table` AS cost
ON items.item_sku = cost.sku AND effective_date < order_date
QUALIFY 1 = ROW_NUMBER() OVER(PARTITION BY order_id, order_date, item_sku ORDER BY effective_date DESC)
)
GROUP BY order_id, order_date
with output
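As a rough check of the same idea outside BigQuery, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has neither arrays nor QUALIFY, so this assumes the line items are already flattened into a hypothetical order_items table and filters on ROW_NUMBER() in an outer query instead. The dates from the question are rewritten as ISO YYYY-MM-DD so that string comparison orders them correctly, and the table and column names are my own, not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened version of the orders table (one row per line item).
CREATE TABLE order_items (order_id INTEGER, order_date TEXT, item_sku TEXT,
                          item_quantity INTEGER, item_subtotal REAL);
CREATE TABLE price_history (item_sku TEXT, effective_date TEXT, cost REAL);
INSERT INTO order_items VALUES
  (1, '2022-07-23', 'SKU1', 7, 12.34),
  (1, '2022-07-23', 'SKU2', 1, 9.99),
  (2, '2022-07-12', 'SKU1', 1, 1.12),
  (2, '2022-07-12', 'SKU3', 5, 32.54);
INSERT INTO price_history VALUES
  ('SKU1', '2022-07-20', 0.78),
  ('SKU2', '2022-03-02', 4.50),
  ('SKU1', '2022-03-02', 0.56),
  ('SKU3', '2022-03-02', 4.32);
""")

# Rank every earlier price per (order, sku), newest first, and keep rank 1.
latest_cost_rows = conn.execute("""
SELECT order_id, item_sku, cost
FROM (
  SELECT o.order_id, o.item_sku, p.cost,
         ROW_NUMBER() OVER (PARTITION BY o.order_id, o.item_sku
                            ORDER BY p.effective_date DESC) AS rn
  FROM order_items AS o
  JOIN price_history AS p
    ON p.item_sku = o.item_sku AND p.effective_date < o.order_date
) AS ranked
WHERE rn = 1
ORDER BY order_id, item_sku
""").fetchall()

for row in latest_cost_rows:
    print(row)
```

On the sample data this gives one cost per line item: 0.78 and 4.50 for order 1, and 0.56 and 4.32 for order 2, which matches the desired output in the question.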

Related

Find the products contributing to the 50% of the total sales using SQL SUM window Function

There are two Tables - orders and item_line
orders
order_id  created_at           total_amount
123       2022-11-11 13:40:50  450.00
124       2022-10-30 00:40:50  1500.00
item_line
order_id  product_id  product_name  quantity  unit_price
123       a1b         milo          4         100.00
123       c2d         coke          5         10.00
124       c2d         coke          150       10.00
The question is:
Find the products contributing to the 50% of the total sales.
My take on this is -
SELECT i.product_name,SUM(o.total_amount)AS 'Net Sales'
FROM item_line i
JOIN orders o on o.order_id = i.order_id
GROUP BY i.product_name
HAVING SUM(o.total_amount) = (SUM(o.total_amount)*0.5);
But this is not correct. The SUM window function needs to be used, but how?
Try the following; the explanation is in the query comments:
-- find the total sales for each product
WITH product_sales AS
(
SELECT product_id, product_name,
SUM(quantity * unit_price) AS product_tot_sales
FROM item_line
GROUP BY product_id, product_name
),
-- find the running sales percentage for each product, starting from the product with the highest sales value
running_percentage AS
(
SELECT product_id, product_name, product_tot_sales,
SUM(product_tot_sales) OVER (ORDER BY product_tot_sales DESC) /
SUM(product_tot_sales) OVER () AS running_sales_percentage,
SUM(product_tot_sales) OVER () AS tot_sales
FROM product_sales
)
-- select products whose running sales percentage is less than or equal to the smallest running_sales_percentage that is >= 0.5
-- this selects all of the products that together contribute 50% of the total sales
SELECT product_id, product_name, product_tot_sales,
tot_sales,
running_sales_percentage
FROM running_percentage
WHERE running_sales_percentage <=
(
SELECT MIN(running_sales_percentage)
FROM running_percentage
WHERE running_sales_percentage >= 0.5
)
You don't need a join with the orders table; all the data you need exists in the item_line table.
See demo.
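For anyone who wants to try the running-percentage query locally, here is a small sketch that runs it against the sample item_line rows using Python's sqlite3 module. This assumes a SQLite build with window function support (3.25 or later); the tot_sales column is dropped only to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_line (order_id INTEGER, product_id TEXT, product_name TEXT,
                        quantity INTEGER, unit_price REAL);
INSERT INTO item_line VALUES
  (123, 'a1b', 'milo', 4, 100.00),
  (123, 'c2d', 'coke', 5, 10.00),
  (124, 'c2d', 'coke', 150, 10.00);
""")

top_half_products = conn.execute("""
WITH product_sales AS (
  SELECT product_id, product_name,
         SUM(quantity * unit_price) AS product_tot_sales
  FROM item_line
  GROUP BY product_id, product_name
),
running_percentage AS (
  SELECT product_id, product_name, product_tot_sales,
         SUM(product_tot_sales) OVER (ORDER BY product_tot_sales DESC) /
         SUM(product_tot_sales) OVER () AS running_sales_percentage
  FROM product_sales
)
SELECT product_id, product_name, product_tot_sales
FROM running_percentage
WHERE running_sales_percentage <= (
  SELECT MIN(running_sales_percentage)
  FROM running_percentage
  WHERE running_sales_percentage >= 0.5
)
""").fetchall()

print(top_half_products)
```

With these three rows, coke sells 1550.00 and milo 400.00 out of 1950.00 in total, so coke alone already passes the 50% mark and is the only product returned.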

Find oldest record by customer?

For my SQL Server 2016 project I have an Orders table that looks like the one below, and I want to create a SQL query that shows the oldest order for each customer/product. There are thousands of orders in the Orders table today, and I expect it to grow, so I want this to perform well.
The goal is the output to look like this:
OrderID  CustomerID  ProductID  OrderDt   OrderAmt
123      1           1          1/1/2021  $50
456      1           2          1/2/2021  $20
345      2           1          1/1/2021  $30
The data in the Orders table today look like this:
OrderID  CustomerID  ProductID  OrderDt   OrderAmt
123      1           1          1/1/2021  $50
758      1           1          1/2/2021  $80
563      1           2          1/3/2021  74
684      1           2          1/4/2021  23
456      1           2          1/2/2021  $20
345      2           1          1/1/2021  $30
The canonical method is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by customerid, productid order by orderdt, orderid) as seqnum
from t
) t
where seqnum = 1;
With an index on (customerid, productid, orderdt), a correlated subquery might be a smidgen faster:
select t.*
from t
where t.orderdt = (select min(t2.orderdt)
from t t2
where t2.productid = t.productid and t2.customerid = t.customerid
);
Or a slightly less performant method without subqueries:
select top (1) with ties t.*
from t
order by row_number() over (partition by productid, customerid order by orderdt);
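To sanity-check that the first two variants agree on the sample data, here is a sketch using Python's sqlite3 module rather than SQL Server (so the top (1) with ties form is left out, since SQLite has no equivalent). The table is named orders instead of t, dates are stored as ISO strings, and amounts are plain integers; all of those are my assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid INTEGER, customerid INTEGER, productid INTEGER,
                     orderdt TEXT, orderamt INTEGER);
INSERT INTO orders VALUES
  (123, 1, 1, '2021-01-01', 50),
  (758, 1, 1, '2021-01-02', 80),
  (563, 1, 2, '2021-01-03', 74),
  (684, 1, 2, '2021-01-04', 23),
  (456, 1, 2, '2021-01-02', 20),
  (345, 2, 1, '2021-01-01', 30);
""")

# Variant 1: number the rows per customer/product, oldest first, keep row 1.
by_row_number = conn.execute("""
SELECT orderid, customerid, productid, orderdt, orderamt
FROM (
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY customerid, productid
                            ORDER BY orderdt, orderid) AS seqnum
  FROM orders AS o
) AS ranked
WHERE seqnum = 1
ORDER BY customerid, productid
""").fetchall()

# Variant 2: correlated subquery on the minimum order date.
by_min_date = conn.execute("""
SELECT o.orderid, o.customerid, o.productid, o.orderdt, o.orderamt
FROM orders AS o
WHERE o.orderdt = (SELECT MIN(o2.orderdt)
                   FROM orders AS o2
                   WHERE o2.productid = o.productid
                     AND o2.customerid = o.customerid)
ORDER BY o.customerid, o.productid
""").fetchall()

print(by_row_number)
print(by_min_date)
```

Both queries return orders 123, 456 and 345 here. They can differ when two orders for the same customer and product share the oldest date: the row_number() version keeps one row (the lower orderid), while the MIN() version returns both.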

Calculating multiple averages across different parts of the table?

I have the following transactions table:
customer_id  purchase_date  product   category     department   quantity  store_id
1            2020-10-01     Kit Kat   Candy        Food         2         store_A
1            2020-10-01     Snickers  Candy        Food         1         store_A
1            2020-10-01     Snickers  Candy        Food         1         store_A
2            2020-10-01     Snickers  Candy        Food         2         store_A
2            2020-10-01     Baguette  Bread        Food         5         store_A
2            2020-10-01     iPhone    Cell phones  Electronics  2         store_A
3            2020-10-01     Sony PS5  Games        Electronics  1         store_A
I would like to calculate the average number of products purchased (for each product in the table). I'm also looking to calculate averages across each category and each department by accounting for all products within the same category or department respectively. Care should be taken to divide over unique customers AND the product quantity being greater than 0 (a 0 quantity indicates a refund, and should not be accounted for).
So basically, the output table would look like below:
...where store_id and average_level_type are partition columns.
Is there a way to achieve this in a single pass over the transactions table? or do I need to break down my approach into multiple steps?
Thanks!
How about using “union all” as below -
Select store_id, 'product' as average_level_type,product as id, sum(quantity) as total_quantity,
Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id,product
Union all
Select store_id, 'category' as average_level_type, category as id, sum(quantity) as total_quantity,
Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id,category
Union all
Select store_id, 'department' as average_level_type,department as id, sum(quantity) as total_quantity,
Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id,department;
If you want to avoid union all, you can use something like rollup() or group by grouping sets() to achieve the same result, but the query would be a little more complicated to get the output in the exact format shown in the question.
EDIT : Below is how you can use grouping sets to get the same output -
Select store_id,
case when G_ID = 3 then 'product'
when G_ID = 5 then 'category'
when G_ID = 6 then 'department' end As average_level_type,
case when G_ID = 3 then product
when G_ID = 5 then category
when G_ID = 6 then department end As id,
total_quantity,
unique_customer_count,
average
from
(select store_id, product, category, department, sum(quantity) as total_quantity, Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average, GROUPING__ID As G_ID
from transactions
where quantity > 0
group by store_id,product,category,department
grouping sets((store_id,product),(store_id,category),(store_id,department))
) Tab
order by 2
;
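A variation on the union all idea, sketched here with Python's sqlite3 module: unpivot the three levels into one derived table first, then aggregate once. This is not the grouping sets query above (SQLite does not support grouping sets), and it is not necessarily a single scan of the table. The 1.0 * factor is there because SQLite truncates integer division; other engines may not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INTEGER, purchase_date TEXT, product TEXT,
                           category TEXT, department TEXT, quantity INTEGER,
                           store_id TEXT);
INSERT INTO transactions VALUES
  (1, '2020-10-01', 'Kit Kat',  'Candy',       'Food',        2, 'store_A'),
  (1, '2020-10-01', 'Snickers', 'Candy',       'Food',        1, 'store_A'),
  (1, '2020-10-01', 'Snickers', 'Candy',       'Food',        1, 'store_A'),
  (2, '2020-10-01', 'Snickers', 'Candy',       'Food',        2, 'store_A'),
  (2, '2020-10-01', 'Baguette', 'Bread',       'Food',        5, 'store_A'),
  (2, '2020-10-01', 'iPhone',   'Cell phones', 'Electronics', 2, 'store_A'),
  (3, '2020-10-01', 'Sony PS5', 'Games',       'Electronics', 1, 'store_A');
""")

result_rows = conn.execute("""
WITH unpivoted AS (
  -- one copy of each non-refund row per aggregation level
  SELECT store_id, 'product' AS average_level_type, product AS id,
         customer_id, quantity
  FROM transactions WHERE quantity > 0
  UNION ALL
  SELECT store_id, 'category', category, customer_id, quantity
  FROM transactions WHERE quantity > 0
  UNION ALL
  SELECT store_id, 'department', department, customer_id, quantity
  FROM transactions WHERE quantity > 0
)
SELECT store_id, average_level_type, id,
       SUM(quantity) AS total_quantity,
       COUNT(DISTINCT customer_id) AS unique_customer_count,
       1.0 * SUM(quantity) / COUNT(DISTINCT customer_id) AS average
FROM unpivoted
GROUP BY store_id, average_level_type, id
""").fetchall()

# Index the averages by (level, id) for easy lookup.
level_averages = {(level, item_id): average
                  for _store, level, item_id, _total, _customers, average in result_rows}
print(level_averages[("department", "Food")])
```

On the sample rows, Food has a total quantity of 11 across 2 distinct customers, so its average is 5.5; Candy comes out at 3.0 and Snickers at 2.0.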

How to Retrieve Maximum Value of Each Group? - SQL

There is a table tbl_products that contains data as shown below:
Id Name
----------
1 P1
2 P2
3 P3
4 P4
5 P5
6 P6
And another table tbl_inputs that contains data as shown below:
Id Product_Id Price Register_Date
----------------------------------------
1 1 10 2010-01-01
2 1 20 2010-10-11
3 1 30 2011-01-01
4 2 100 2010-01-01
5 2 200 2009-01-01
6 3 500 2011-01-01
7 3 270 2010-10-15
8 4 80 2010-01-01
9 4 50 2010-02-02
10 4 92 2011-01-01
I want to select all products(id, name, price, register_date) with maximum date in each group.
For Example:
Id Name Price Register_Date
----------------------------------------
3 P1 30 2011-01-01
4 P2 100 2010-01-01
6 P3 500 2011-01-01
10 P4 92 2011-01-01
select
id
,name
,code
,price
from tbl_products tp
cross apply (
select top 1 price
from tbl_inputs ti
where ti.product_id = tp.id
order by register_date desc
) tii
Although it is not the optimal way, you can do it like this:
;with gb as (
select
product_id
,max(register_date) As max_register_date
from tbl_inputs
group by product_id
)
select
ti.id
,ti.product_id
,ti.price
,ti.register_date
from tbl_inputs ti
join gb
on ti.product_id=gb.product_id
and ti.register_date = gb.max_register_date
But as I said earlier .. this is not the way to go in this case.
;with cte as
(
select t1.id, t1.name, t1.code, t2.price, t2.register_date,
row_number() over (partition by product_id order by register_date desc) rn
from tbl_products t1
join tbl_inputs t2
on t1.id = t2.product_id
)
select id, name, code, price, register_date
from cte
where rn = 1
Something like this..
select id, product_id, price, max(register_date)
from tbl_inputs
group by id, product_id, price
You can use the max function and the group by clause. If you only need results from the table tbl_inputs, you don't even need a join:
select product_id, max(register_date), price
from tbl_inputs
group by product_id, price
If you need fields from tbl_products, you have to use a join.
select p.name, p.code, i.id, i.price, max(i.register_date)
from tbl_products p join tbl_inputs i on p.id=i.product_id
group by p.name, p.code, i.id, i.price
Try this:
SELECT T1.id, T1.product_id, T1.price, T1.register_date
FROM tbl_inputs T1 INNER JOIN
(
SELECT product_id, MAX(register_date) As Max_register_date
FROM tbl_inputs
GROUP BY product_id
) T2 ON(T1.product_id= T2.product_id AND T1.register_date= T2.Max_register_date)
This is, of course, assuming your dates are unique. If they are not, you need to add the DISTINCT keyword to the outer SELECT statement.
edit
Sorry, I didn't explain it very well. Your dates can be duplicated; it's not a problem as long as they are unique per product id. If you can have duplicated dates per product id, then you will have more than one row per product in the outcome of the select statement I suggested, and you will have to find a way to reduce it to one row per product.
i.e:
If you have records like this (when the last date for a product appears more than once in your table with different prices)
id | product_Id | price | register_date
--------------------------------------------
1 | 1 | 10.00 | 01/01/2000
2 | 1 | 20.00 | 01/01/2000
it will result in having both of these records as outcome.
However, if the register_date is unique per product id, then you will get only one result for each product id.
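To make the tie case concrete, here is a small sketch with Python's sqlite3 module using the two rows from the example above (dates rewritten as ISO strings). It compares the join-on-MAX query with a ROW_NUMBER() version that breaks ties. Picking the highest id as the tiebreaker is purely my assumption; any deterministic rule would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_inputs (id INTEGER, product_id INTEGER, price REAL,
                         register_date TEXT);
INSERT INTO tbl_inputs VALUES
  (1, 1, 10.00, '2000-01-01'),
  (2, 1, 20.00, '2000-01-01');
""")

# Join on the per-product maximum date: ties on the date all survive.
join_on_max_rows = conn.execute("""
SELECT T1.id, T1.product_id, T1.price, T1.register_date
FROM tbl_inputs AS T1
INNER JOIN (
  SELECT product_id, MAX(register_date) AS max_register_date
  FROM tbl_inputs
  GROUP BY product_id
) AS T2
  ON T1.product_id = T2.product_id
 AND T1.register_date = T2.max_register_date
ORDER BY T1.id
""").fetchall()

# ROW_NUMBER() with a tiebreaker (highest id wins) keeps exactly one row.
row_number_rows = conn.execute("""
SELECT id, product_id, price, register_date
FROM (
  SELECT i.*,
         ROW_NUMBER() OVER (PARTITION BY product_id
                            ORDER BY register_date DESC, id DESC) AS rn
  FROM tbl_inputs AS i
) AS ranked
WHERE rn = 1
""").fetchall()

print(join_on_max_rows)
print(row_number_rows)
```

The join returns both rows for product 1, while the ROW_NUMBER() version returns only the row with id 2. Note that DISTINCT would not collapse the two joined rows here, because their id and price differ.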

How to join tables based on the dates

COMMISSION table
PRODUCT_ID DATE COMMISSION
1 20110101 27.00
1 20120101 28.00
1 20130705 30.00
2 20110101 17.00
2 20120501 16.00
2 20130101 18.00
...
ORDER table
PRODUCT_ID DATE PRICE
1 20110405 2500
2 20130402 3000
2 20130101 1900
Desired output
PRODUCT_ID DATE PRICE COMMISSION
1 20110405 2500 27.00
2 20130402 3000 16.00
2 20130101 1900 18.00
The Commission table records the commission % based on the product id and date.
The Order table is basically a record of orders placed on a particular date.
I'd like to join two tables and bring the appropriate commission based on the date of the order. For example, you can see that the first order's commission is 27.00 as the date for the product_id 1 falls between 20110101 and 20120101.
How do I do this? Seems like a simple 1 to n relationship but I can't figure it out.
Try
SELECT o.*,
(
SELECT TOP 1 commission
FROM commission
WHERE product_id = o.product_id
AND date <= o.date
ORDER BY date DESC
) commission
FROM [order] o
Here is SQLFiddle demo
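For reference, the same "latest commission on or before the order date" lookup can be tried with Python's sqlite3 module. SQLite has no TOP, so this sketch uses ORDER BY ... DESC LIMIT 1 instead, and the table is named orders to avoid quoting the reserved word order. Only the commission rows actually listed in the question are loaded; the rows behind the "..." are unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commission (product_id INTEGER, date TEXT, commission REAL);
CREATE TABLE orders (product_id INTEGER, date TEXT, price INTEGER);
INSERT INTO commission VALUES
  (1, '20110101', 27.00),
  (1, '20120101', 28.00),
  (1, '20130705', 30.00),
  (2, '20110101', 17.00),
  (2, '20120501', 16.00),
  (2, '20130101', 18.00);
INSERT INTO orders VALUES
  (1, '20110405', 2500),
  (2, '20130402', 3000),
  (2, '20130101', 1900);
""")

# For each order, pick the newest commission row dated on or before the order.
orders_with_commission = conn.execute("""
SELECT o.product_id, o.date, o.price,
       (SELECT c.commission
        FROM commission AS c
        WHERE c.product_id = o.product_id
          AND c.date <= o.date
        ORDER BY c.date DESC
        LIMIT 1) AS commission
FROM orders AS o
ORDER BY o.product_id, o.date
""").fetchall()

for row in orders_with_commission:
    print(row)
```

With only the listed rows, the 20130402 order for product 2 picks up 18.00, because the 20130101 rate is the latest one on or before that date. The 16.00 shown in the question's desired output would not come out of this rule.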