BigQuery Unnest based on a condition + cross join

Consider the following schema for monthly purchases, where items and payment are nested columns:
user STRING
items RECORD
    class STRING
    spent INT
payment RECORD
    method STRING
For example (MWE with nested items and payment):
User  items          payment
      class   spent  method
Joe   fruit   45     Direct Debit
      drinks  10     Credit Card
      fish    20
      drugs   35
I'd like to unnest both fields, items and payment, but in different ways:
- items: keep only the entries with class='fruit' or class='drinks' and unstack their values into columns
- payment: a regular cross join of user with payment methods
One possible solution is to split the work into separate steps:
with
payments as (
    select user, pay.method as payment
    from table
    cross join table.payment as pay
)
, items_fruit as (
    select user, items.spent as spent_fruit
    from table
    cross join table.items as items
    where items.class IN ('fruit')
)
, items_drinks as (
    select user, items.spent as spent_drinks
    from table
    cross join table.items as items
    where items.class IN ('drinks')
)
select *
from payments
inner join items_fruit using(user)
inner join items_drinks using(user)
The outcome of this operation looks like the table below. However, it seems neither elegant nor efficient. What is the best way to achieve it?
user  payment       spent_fruit  spent_drinks
Joe   Direct Debit  45           10
Joe   Credit Card   45           10

Consider the simple approach below:
select * from (
    select user, class, spent, method as payment
    from your_table t, t.items, t.payment
)
pivot (sum(spent) as spent for class in ('fruit', 'drinks'))
Applied to the sample data in your question, this gives the desired output.
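For readers without BigQuery at hand, here is a minimal sketch of the same reshaping in Python with the standard sqlite3 module. It is not BigQuery syntax: SQLite has neither nested columns nor PIVOT, so the two nested fields are modelled as flat child tables (that layout is an assumption made purely for illustration) and the pivot is written as conditional aggregation, which is roughly what PIVOT with SUM does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed flat stand-ins for the nested 'items' and 'payment' fields
    CREATE TABLE items   (user TEXT, class TEXT, spent INTEGER);
    CREATE TABLE payment (user TEXT, method TEXT);
    INSERT INTO items VALUES
        ('Joe', 'fruit', 45), ('Joe', 'drinks', 10),
        ('Joe', 'fish', 20),  ('Joe', 'drugs', 35);
    INSERT INTO payment VALUES
        ('Joe', 'Direct Debit'), ('Joe', 'Credit Card');
""")

# Join every payment method to every item of the same user, then fold the
# wanted classes into columns; other classes contribute nothing to the sums.
rows = conn.execute("""
    SELECT p.user,
           p.method AS payment,
           SUM(CASE WHEN i.class = 'fruit'  THEN i.spent END) AS spent_fruit,
           SUM(CASE WHEN i.class = 'drinks' THEN i.spent END) AS spent_drinks
    FROM payment AS p
    JOIN items AS i ON i.user = p.user
    GROUP BY p.user, p.method
    ORDER BY MIN(p.rowid)
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this prints one row per payment method, each carrying 45 for fruit and 10 for drinks.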

Related

SQL query to return average of results using JOIN and GROUP BY

I have a simple manufacturing job card system that tracks parts and labor for an assigned job.
It consists of a JobHeader table that holds the job card number (JobHeader.JobNo), the ID of the part being manufactured (JobHeader.RegNo) and the quantity to be manufactured (JobHeader.BOMQty).
There is a child table (JobLabour) that tracks all the times that have been worked on the job (JobLabour.WorkedTime).
I'm looking for a query that will return the average time taken to produce a part across the last 5 job cards for that particular part.
The following query
SELECT TOP 5 JobHeader.RegNo, JobHeader.BOMQty, sum(JobLabour.WorkedTime) AS TotalTime
FROM JobHeader
INNER JOIN JobLabour ON JobHeader.JobNo = JobLabour.JobNo
WHERE JobHeader.RegNo = 'RM-BRU-0134'
GROUP BY JobHeader.BOMQty, JobHeader.JobNo, JobHeader.RegNo
will return this:
But what I'm looking for is a query that will return the average BOMQty and average totalTime. Something like this:
Is there a way to do this?

Your question explicitly mentions the "last five" but does not specify how that is determined. Presumably, you have some sort of date/time column in the data that defines this.
In SQL Server, you can use apply:
select r.RegNo, a.avg_BOMQty, a.avg_TotalTime
from (select distinct RegNo from JobHeader) r outer apply
     (select avg(t.BOMQty) as avg_BOMQty, avg(t.TotalTime) as avg_TotalTime
      from (select top (5) jh.BOMQty,
                   (select sum(jl.WorkedTime)
                    from JobLabour jl
                    where jl.JobNo = jh.JobNo
                   ) as TotalTime
            from JobHeader jh
            where jh.RegNo = r.RegNo
            order by jh.<some datetime> desc -- however you determine the last five
           ) t
     ) a;
You can add a where clause to the outer query to filter on one or more particular parts.

If I understand you correctly, this will do the job. It works for one RegNo (here 'RM-BRU-0134') at a time:
with topFive as (
    SELECT TOP 5 JobHeader.RegNo, JobHeader.BOMQty, sum(JobLabour.WorkedTime) AS TotalTime
    FROM JobHeader
    INNER JOIN JobLabour ON JobHeader.JobNo = JobLabour.JobNo
    WHERE JobHeader.RegNo = 'RM-BRU-0134'
    GROUP BY JobHeader.BOMQty, JobHeader.JobNo, JobHeader.RegNo
    -- add an ORDER BY on whatever column defines "last"; without it, TOP 5 returns an arbitrary five jobs
)
select RegNo, avg(BOMQty) as BOMQty, avg(TotalTime) as TotalTime
from topFive
group by RegNo
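The three steps (total the labour per job, keep the five most recent jobs, average them) can be tried out with a small sketch in Python's sqlite3 module. Two things here are assumptions rather than part of the question: the JobDate column, which stands in for whatever defines "last", and the sample rows, which are invented. SQLite also spells TOP 5 as LIMIT 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- JobDate is a hypothetical column; the question does not name one
    CREATE TABLE JobHeader (JobNo INTEGER, RegNo TEXT, BOMQty INTEGER, JobDate TEXT);
    CREATE TABLE JobLabour (JobNo INTEGER, WorkedTime REAL);
    INSERT INTO JobHeader VALUES
        (1, 'RM-BRU-0134', 100, '2020-01-01'),
        (2, 'RM-BRU-0134', 10,  '2020-02-01'),
        (3, 'RM-BRU-0134', 20,  '2020-03-01'),
        (4, 'RM-BRU-0134', 30,  '2020-04-01'),
        (5, 'RM-BRU-0134', 40,  '2020-05-01'),
        (6, 'RM-BRU-0134', 50,  '2020-06-01');
    INSERT INTO JobLabour VALUES
        (1, 99), (2, 1), (2, 1), (3, 4), (4, 6), (5, 8), (6, 5), (6, 5);
""")

row = conn.execute("""
    WITH per_job AS (              -- step 1: one row per job with its total time
        SELECT jh.RegNo, jh.JobNo, jh.JobDate, jh.BOMQty,
               SUM(jl.WorkedTime) AS TotalTime
        FROM JobHeader AS jh
        JOIN JobLabour AS jl ON jl.JobNo = jh.JobNo
        WHERE jh.RegNo = ?
        GROUP BY jh.RegNo, jh.JobNo, jh.JobDate, jh.BOMQty
    ),
    last_five AS (                 -- step 2: the five most recent jobs
        SELECT * FROM per_job ORDER BY JobDate DESC LIMIT 5
    )
    SELECT RegNo, AVG(BOMQty), AVG(TotalTime)   -- step 3: average them
    FROM last_five
    GROUP BY RegNo
""", ("RM-BRU-0134",)).fetchone()

print(row)
```

With these invented rows the oldest job (quantity 100, 99 time units) is excluded, so the averages come out as 30.0 and 6.0.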

Expand Join to not limit data

I have a weird question. I understand that joins return matching data based on the ON condition. The problem I am facing is that I need the business date back from both tables, but at the same time I need to join on the date in order to get the totals correct.
See the code below:
Select
o.Resort,
o.Business_Date,
Occupied,
Comps,
House,
ADR,
Room_Revenue,
Occupied-(Comps+House) AS DandT,
Coalesce(gd.Projected_Occ1,0) AS Projected_Occ1,
Occupied-(Comps+House)+Coalesce(gd.Projected_Occ1,0) as Total
from Occupancy o
left join Group_Details_HF gd
on o.Business_Date = gd.Business_Date
and o.Resort = gd.resort
UNION ALL
select
o.Resort,
o.Business_Date,
Occupied,
Comps,
House,
ADR,
Room_Revenue,
Occupied-(Comps+House) AS DandT,
Coalesce(gd.Projected_Occ1,0) AS Projected_Occ1,
Coalesce(Occupied-(Comps+House),0)+Coalesce(gd.Projected_Occ1,0) as Total
from Occupancy_Forecast o
FULL OUTER JOIN Group_Details_HF gd
on o.Business_Date = gd.Business_Date
and o.Resort = gd.resort
Currently, this gives me the desired results from the Occupancy and Occupancy_Forecast tables. However, when a business date does not exist in the Occupancy_Forecast table, the Group_Details_HF data for that date is ignored. I need the results to combine the dates when they exist in both tables, or give the unique results for each when there is no match.

I decided to create another pivot table storing the details from Group_Details_HF and then UNION the two tables together, which gave me the desired result without fiddling with the join :)

extending row number in windows function

I am trying to find the top 10 brands and article types on each shopping page for an e-tailer.
The logic I am using is as follows.
I create a table for each of the two using the logic below, storing the top 10:
WITH cte AS
(
    SELECT shoppingpage_url,
           brand,
           COUNT(*) AS sp_count
    FROM TABLE name
    GROUP BY 1, 2
)
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY shoppingpage_url ORDER BY sp_count DESC) AS Top_10_flag
FROM cte
I do the same for article types and join the two:
SELECT a.shoppingpage_url,
a.top_10_flag,
brand,
article_type
FROM dev.top10_Brand a
LEFT JOIN dev.top10_Articletype b
ON a.shoppingpage_url = b.shoppingpage_url
AND a.Top_10_flag = b.Top_10_flag
The problem I am facing is that certain pages have just one brand but multiple article types.
I lose the article types whose Top_10_flag has no matching Top_10_flag on the brand side, i.e. on pages with fewer brands than article types.
How do I prevent this?
Sample data:
-- brand data table
shoppingpage_url, brand,sp_count,Top_10_flag
url1,brandd,5,1
url2,branda,17,1
url2,brandb,8,2
url2,brandc,4,3
url3,brande,5,1
-- article type table
shoppingpage_url, article_type,sp_count,Top_10_flag
url1,articletype1,5,1
url1,articletype2,5,1
url1,articletype3,5,1
url2,articletype12,17,1
url2,articletype3,8,2
url3,articletype23,5,1
url3,articletype2,5,1
-----
the result I am getting
shoppingpage_url,Top_10_flag, brand, article_type
url1,1,brandd,articletype1
url2,1,branda,articletype12
url2,2,brandb,articletype3
url2,3,brandc,
url3,1,brande,articletype23
---------------------------
what I want
url1,1,brandd,articletype1
url1,2,,articletype2
url1,3,,articletype3
url2,1,branda,articletype12
url2,2,brandb,articletype3
url2,3,brandc,
url3,1,brande,articletype23
url3,2,,articletype2

Are you looking for a full join?
SELECT COALESCE(b.shoppingpage_url, a.shoppingpage_url) as shoppingpage_url,
COALESCE(b.top_10_flag, a.top_10_flag) as top_10_flag,
b.brand, a.article_type
FROM dev.top10_Brand b FULL JOIN
dev.top10_Articletype a
ON a.shoppingpage_url = b.shoppingpage_url AND
a.Top_10_flag = b.Top_10_flag
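To see what the full join does with data shaped like yours, here is a small sketch in Python's sqlite3 module. Several things are assumptions: the table names are simplified stand-ins, the flags are strict 1, 2, 3 ranks as ROW_NUMBER() would produce them, and, because older SQLite builds lack FULL JOIN, the sketch emulates it with a LEFT JOIN plus a UNION ALL of the article rows that found no brand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE top10_brand       (shoppingpage_url TEXT, brand TEXT, top_10_flag INTEGER);
    CREATE TABLE top10_articletype (shoppingpage_url TEXT, article_type TEXT, top_10_flag INTEGER);
    INSERT INTO top10_brand VALUES
        ('url1', 'brandd', 1),
        ('url2', 'branda', 1), ('url2', 'brandb', 2), ('url2', 'brandc', 3);
    INSERT INTO top10_articletype VALUES
        ('url1', 'articletype1', 1), ('url1', 'articletype2', 2), ('url1', 'articletype3', 3),
        ('url2', 'articletype12', 1), ('url2', 'articletype3', 2);
""")

rows = conn.execute("""
    -- every brand row, with its article type when the rank matches
    SELECT b.shoppingpage_url, b.top_10_flag, b.brand, a.article_type
    FROM top10_brand AS b
    LEFT JOIN top10_articletype AS a
           ON a.shoppingpage_url = b.shoppingpage_url
          AND a.top_10_flag = b.top_10_flag
    UNION ALL
    -- plus the article rows that have no brand at the same rank
    SELECT a.shoppingpage_url, a.top_10_flag, NULL AS brand, a.article_type
    FROM top10_articletype AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM top10_brand AS b
        WHERE b.shoppingpage_url = a.shoppingpage_url
          AND b.top_10_flag = a.top_10_flag
    )
    ORDER BY 1, 2
""").fetchall()

for row in rows:
    print(row)
```

Both kinds of gaps survive: url1 keeps its second and third article types with an empty brand, and url2 keeps brandc with an empty article type.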

Access 2013 SQL, three tables, two using sum wrong results

Can someone please help me with this issue? I've scoured the Internet looking at dozens of examples, but I just can't find a solution that works.
I am using Access 2013. I am trying to build a query that highlights all part numbers from a supplier that have customer back orders and/or overdue deliveries.
I am using three tables:
tbl_Inventory_Master, from which I need the part number, on hand stock value, and the supplier code.
tbl_Customer_Back_Order, which I need to join for any back orders, as I need the count of back order lines and the sum of the back order quantity.
tbl_On_Order, which I need to add if the supplier has a late delivery, showing the count of overdue deliveries and the sum of the overdue quantities.
The query retrieves the data, but the returned quantities are double what they should be.
SELECT
I.Inventory_Part_Num, I.Description, I.On_Hand_Stock,
COUNT (B.Part_Number) AS Back_Order_Count, SUM(B.Back_Order_Qty) as BO_Qty,
COUNT(O.Part_Number) AS Late_Deliveries_Count, SUM(O.Order_Qty) AS Late_Qty
FROM (tbl_Inventory_Master AS I
LEFT OUTER JOIN tbl_Customer_Back_Order AS B
ON I.Inventory_Part_Num = B.Part_Number)
LEFT OUTER JOIN tbl_On_Order AS O
ON I.Inventory_Part_Num = O.Part_Number
WHERE
I.Customer_Code = '274' AND
O.Due_Date < [ENTER TODAYS DATE IN FORMAT DD/MM/YYYY]
GROUP BY I.Inventory_Part_Num, I.Description, I.On_Hand_Stock
For example, for the part number 2022940 I should have 10 back order lines and an overdue quantity of 43. Instead, the query is returning 20 back order lines and an overdue quantity sum of 86.
From the on order table I have three orders totaling 144 pieces, instead the query is returning 960.
Can someone please advise, as this is driving me crazy?

You are joining along unrelated dimensions, so you need to aggregate before joining:
SELECT I.Inventory_Part_Num, I.Description, I.On_Hand_Stock,
B.Back_Order_Count, B.BO_Qty,
O.Late_Deliveries_Count, O.Late_Qty
FROM (tbl_Inventory_Master AS I LEFT OUTER JOIN
(SELECT B.Part_Number, COUNT(*) as Back_Order_Count,
SUM(B.Back_Order_Qty) as BO_Qty
FROM tbl_Customer_Back_Order AS B
GROUP BY B.Part_Number
) as B
ON I.Inventory_Part_Num = B.Part_Number
) LEFT JOIN
(SELECT O.Part_Number, COUNT(O.Part_Number) AS Late_Deliveries_Count,
SUM(O.Order_Qty) AS Late_Qty
FROM tbl_On_Order AS O
WHERE O.Due_Date < [ENTER TODAYS DATE IN FORMAT DD/MM/YYYY]
GROUP BY O.Part_Number
) as O
ON I.Inventory_Part_Num = O.Part_Number
WHERE I.Customer_Code = '274';
Notice the outer aggregation is no longer needed.
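The doubling is easy to reproduce on a toy dataset. Below is a minimal sketch using Python's sqlite3 module rather than Access; the table names are shortened and the rows are invented for illustration. One part has two back order lines and two late orders, so joining both child tables directly yields 2 x 2 = 4 rows and every count and sum is inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, invented stand-ins for the three Access tables
    CREATE TABLE inventory  (part TEXT);
    CREATE TABLE back_order (part TEXT, qty INTEGER);
    CREATE TABLE on_order   (part TEXT, qty INTEGER);
    INSERT INTO inventory  VALUES ('2022940');
    INSERT INTO back_order VALUES ('2022940', 3), ('2022940', 7);
    INSERT INTO on_order   VALUES ('2022940', 20), ('2022940', 23);
""")

# Joining both child tables first multiplies their rows against each other.
naive = conn.execute("""
    SELECT i.part, COUNT(b.part), SUM(b.qty), COUNT(o.part), SUM(o.qty)
    FROM inventory AS i
    LEFT JOIN back_order AS b ON b.part = i.part
    LEFT JOIN on_order   AS o ON o.part = i.part
    GROUP BY i.part
""").fetchone()

# Aggregating each child table to one row per part before joining avoids that.
fixed = conn.execute("""
    SELECT i.part, b.n, b.qty, o.n, o.qty
    FROM inventory AS i
    LEFT JOIN (SELECT part, COUNT(*) AS n, SUM(qty) AS qty
               FROM back_order GROUP BY part) AS b ON b.part = i.part
    LEFT JOIN (SELECT part, COUNT(*) AS n, SUM(qty) AS qty
               FROM on_order GROUP BY part) AS o ON o.part = i.part
""").fetchone()

print("naive:", naive)
print("fixed:", fixed)
```

The naive query reports 4 lines and quantities of 20 and 86, while the pre-aggregated one reports 2 lines and quantities of 10 and 43.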

BigQuery - Shuffle By error

I have a table of about 5M rows. Note that this is just a proof of concept; ultimately we will need to be in the TB range. I am doing a self join to find permutations of products for a market basket analysis.
I need to find the number of times the combination occurs in a basket, the ratio of occurrences to total baskets, and the number of times the item occurs in all baskets. This is pretty standard. BigQuery does not support selects in the predicate of another select so I needed to create another join I suppose. Here's what I came up with -
select twoItem.upc1,twoItem.upc2,twoItem.twoItemOccurrences, totalUpc.totalUpcCount
from
(
select purchase1.upc as upc1,purchase2.upc as upc2,count(upc1) as twoItemOccurrences
from
conagra.purchase as purchase1
join each conagra.purchase as purchase2
on purchase1.upc = purchase2.upc
group by upc1,upc2
) as twoItem
JOIN EACH
(
select purchase3.upc as upc3, count(*) as totalUpcCount
from conagra.purchase as purchase3
group by upc3
) as totalUpc
on totalUpc.upc3 = twoItem.upc1
LIMIT 50;
I get the following error:
SHUFFLE BY may only be applied to parallelizable queries, but query is not parallelizable: (SELECT * FROM (SELECT [purchase3.upc] AS [upc3], COUNT(*) AS [totalUpcCount]...
Maybe an unpublished limitation?
Any help would be appreciated.

Try running these with GROUP EACH BY on your inner queries. We'll improve the response message for queries like this.
