Joining two tables on two criteria "cannot partition on repeated field" - sql

I'm using BigQuery for this.
I have a subquery that pulls data from a table with account_id, product, date, and product_spend fields. It calculates the total lifetime spend for each product for each account_id by adding up the line items.
SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY 1, 2
The result looks like this:
table: lifetime
account_id product lifetime_product_spend
===========================================================
A product1 50
A product2 20
B product2 100
B product3 150
C product3 500
I'm trying to preserve the values and join them with a larger query:
SELECT account_id,
product,
month,
SUM(spend)
FROM data_source
WHERE month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY 1, 2, 3
This query has a table that looks like this:
table: monthly
account_id product month spend
=================================================================
A product1 1 10
A product1 2 20
A product1 3 30
A product2 1 5
A product2 2 15
B product2 2 100
B product3 2 100
B product3 3 50
C product3 1 100
C product3 2 400
I'm not using an aggregate to calculate lifetime_product_spend on the second table. Due to the sheer amount of data, I'm only able to include the last 6 months of data. That's why I'm calculating the lifetime spend in a separate subquery and joining the two.
My current query is failing:
SELECT d.account_id,
d.product,
d.month,
sum(d.spend),
u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
ON d.account_id = u.account_id
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
because it doesn't assign each lifetime figure to its product as in the lifetime table. That's because I'm only joining on account_id. The bad output is below, truncated: the join ignores the product, so every month/product/account row is repeated once for each of the 5 lifetime_product_spend values:
table: monthly
account_id product month spend lifetime_product_spend
=====================================================================================
A product1 1 10 50
A product1 1 10 20
A product1 1 10 100
A product1 1 10 150
A product1 1 10 500
A product1 2 20 50
A product1 2 20 20
A product1 2 20 100
A product1 2 20 150
A product1 2 20 500
Is there a way for me to join on both of them? I've tried doing a JOIN ON x = x AND y = y:
SELECT d.account_id,
d.product,
d.month,
sum(d.spend),
u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
but it gives me this error : "Execution Failed
Error: Cannot partition on repeated field d.product".
I want my final table to look like this:
table: monthly
account_id product month spend lifetime_product_spend
=====================================================================================
A product1 1 10 50
A product1 2 20 50
A product1 3 30 50
A product2 1 5 20
A product2 2 15 20
B product2 2 100 100
B product3 2 100 150
B product3 3 50 150
C product3 1 100 500
C product3 2 400 500
I think I need "FLATTEN" somewhere, but I can't seem to get it in the right place. Thanks for reading.

SELECT d.account_id,
d.product,
d.month,
sum(d.spend),
u.lifetime_product_spend
FROM FLATTEN(data_source, product) d
LEFT JOIN (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
The above works with the original data source flattened around the repeated field d.product. Thanks for the comments and help.
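The effect of joining on one key versus two can be reproduced outside BigQuery. The following is a minimal sketch using Python's built-in sqlite3 module, loaded with the sample lifetime and monthly rows from the question. SQLite has no repeated fields, so it does not reproduce the FLATTEN step or the "cannot partition on repeated field" error; it only shows why the second join condition removes the duplicated rows.

```python
import sqlite3

# In-memory database holding the two sample result sets from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lifetime (account_id TEXT, product TEXT, lifetime_product_spend INTEGER);
CREATE TABLE monthly (account_id TEXT, product TEXT, month INTEGER, spend INTEGER);
INSERT INTO lifetime VALUES
  ('A', 'product1', 50), ('A', 'product2', 20),
  ('B', 'product2', 100), ('B', 'product3', 150),
  ('C', 'product3', 500);
INSERT INTO monthly VALUES
  ('A', 'product1', 1, 10), ('A', 'product1', 2, 20), ('A', 'product1', 3, 30),
  ('A', 'product2', 1, 5), ('A', 'product2', 2, 15),
  ('B', 'product2', 2, 100), ('B', 'product3', 2, 100), ('B', 'product3', 3, 50),
  ('C', 'product3', 1, 100), ('C', 'product3', 2, 400);
""")

# Joining on account_id alone pairs every monthly row with every
# lifetime row of the same account, so rows are duplicated.
one_key = conn.execute("""
SELECT m.account_id, m.product, m.month, m.spend, l.lifetime_product_spend
FROM monthly m
LEFT JOIN lifetime l ON m.account_id = l.account_id
""").fetchall()

# Joining on both account_id and product keeps one lifetime row per monthly row.
two_key = conn.execute("""
SELECT m.account_id, m.product, m.month, m.spend, l.lifetime_product_spend
FROM monthly m
LEFT JOIN lifetime l
  ON m.account_id = l.account_id AND m.product = l.product
ORDER BY m.account_id, m.product, m.month
""").fetchall()

print(len(one_key), "rows when joining on account_id only")
print(len(two_key), "rows when joining on account_id and product")
for row in two_key:
    print(row)
```

With the two-key join, the row count matches the monthly table and each row carries the lifetime figure for its own product.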

Write "Select .... from usage" as a sub-query and apply an INNER JOIN or LEFT JOIN on data_source table.
SELECT d.account_id,
d.product,
d.month,
sum(d.spend),
u.lifetime_product_spend
from data_source d
left join (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
on(d.account_id=u.account_id and d.product=u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend

Related

Joining multiple tables and getting MAX value in subquery PostgreSQL

I have 4 Tables in PostgreSQL with the following structure as you can see below:
"Customers"
ID | NAME
101 Max
102 Peter
103 Alex
"orders"
ID | customer_id | CREATED_AT
1 101 2022-05-12
2 101 2022-06-14
3 101 2022-07-9
4 102 2022-02-14
5 102 2022-06-18
6 103 2022-05-22
"orderEntry"
ID | order_id | product_id |
1 3 10
2 3 20
3 3 30
4 5 20
5 5 40
6 6 20
"product"
ID | min_duration
10 P10D
20 P20D
30 P30D
40 P40D
50 P50D
Firstly I need to select "orders" with the max(created_at) date for each customer this is done with the query (it works!):
SELECT c.id as customerId,
o.id as orderId,
o.created_at
FROM Customer c
INNER JOIN Orders o
ON c.id = o.customer_id
INNER JOIN
(
SELECT customer_id, MAX(created_at) Max_Date
FROM Orders
GROUP BY customer_id
) res ON o.customer_id = res.customer_id AND
o.created_at = res.Max_date
the result will look like this:
customer_id | order_id | CREATED_AT
101 3 2022-07-9
102 5 2022-06-18
103 6 2022-05-22
Secondly I need to select for each order_id from "orderEntry" Table, "products" with the max(min_duration) the result should be:
order_id | max(min_duration)
3 P30D
5 P40D
6 P20D
and then join results from 1) and 2) queries by "order_id" and the total result which I'm trying to get should look like this:
customer_name | customer_id | Order_ID | Order_CREATED_AT | Max_Duration
Max 101 3 2022-07-9 P30D
Peter 102 5 2022-06-18 P40D
Alex 103 6 2022-05-22 P20D
I'm struggling to write the query for 2) and then join everything with the query from 1) to get the result. I would appreciate any help!
You could turn the first query into a CTE and join the remaining tables to it, like this:
WITH CTE AS ( SELECT c.id as customerId,
o.id as orderId,
o.created_at
FROM Customer c
INNER JOIN Orders o
ON c.id = o.customer_id
INNER JOIN
(
SELECT customer_id, MAX(created_at) Max_Date
FROM Orders
GROUP BY customer_id
) res ON o.customer_id = res.customer_id AND
o.created_at = res.Max_date)
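SELECT CTE.customerId, CTE.orderId, CTE.created_at, pr.min_duration
FROM CTE
-- MAX(product_id) picks the highest product id per order; in the sample data that is also the product with the longest min_duration
JOIN (SELECT order_id, MAX(product_id) AS product_id FROM "orderEntry" GROUP BY order_id) oe ON CTE.orderId = oe.order_id
JOIN "product" pr ON oe.product_id = pr.id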
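If the product with the longest min_duration is not always the one with the highest id, the maximum has to be taken over the duration itself. Below is a minimal sketch using Python's sqlite3 with the sample data from the question. Several things here are assumptions: the table names are lower-cased (order_entry is a hypothetical spelling of "orderEntry"), the date 2022-07-9 is normalized to 2022-07-09 so text comparison works, and every min_duration is assumed to have the form P<n>D so the day count can be extracted with substr. In PostgreSQL a cast to interval would be the more natural route if the column type allows it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER, created_at TEXT);
CREATE TABLE order_entry (id INTEGER, order_id INTEGER, product_id INTEGER);
CREATE TABLE product (id INTEGER, min_duration TEXT);
INSERT INTO customers VALUES (101, 'Max'), (102, 'Peter'), (103, 'Alex');
INSERT INTO orders VALUES
  (1, 101, '2022-05-12'), (2, 101, '2022-06-14'), (3, 101, '2022-07-09'),
  (4, 102, '2022-02-14'), (5, 102, '2022-06-18'), (6, 103, '2022-05-22');
INSERT INTO order_entry VALUES
  (1, 3, 10), (2, 3, 20), (3, 3, 30), (4, 5, 20), (5, 5, 40), (6, 6, 20);
INSERT INTO product VALUES
  (10, 'P10D'), (20, 'P20D'), (30, 'P30D'), (40, 'P40D'), (50, 'P50D');
""")

latest_rows = conn.execute("""
WITH latest AS (
  -- Step 1: the most recent order per customer.
  SELECT o.customer_id, o.id AS order_id, o.created_at
  FROM orders o
  JOIN (SELECT customer_id, MAX(created_at) AS max_date
        FROM orders GROUP BY customer_id) r
    ON o.customer_id = r.customer_id AND o.created_at = r.max_date
),
longest AS (
  -- Step 2: the longest duration (in days) among each order's products.
  -- Assumes every min_duration looks like 'P<n>D'.
  SELECT oe.order_id,
         MAX(CAST(substr(p.min_duration, 2, length(p.min_duration) - 2) AS INTEGER)) AS max_days
  FROM order_entry oe
  JOIN product p ON p.id = oe.product_id
  GROUP BY oe.order_id
)
-- Step 3: combine both results by order id.
SELECT c.name, c.id, l.order_id, l.created_at, 'P' || g.max_days || 'D' AS max_duration
FROM latest l
JOIN customers c ON c.id = l.customer_id
JOIN longest g ON g.order_id = l.order_id
ORDER BY c.id
""").fetchall()

for row in latest_rows:
    print(row)
```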

group by of one column and having count of another

I have a table 'customer' which contains 4 columns
name day product price
A 2021-04-01 p1 100
B 2021-04-01 p1 100
C 2021-04-01 p2 120
A 2021-04-01 p2 120
A 2021-04-02 p1 100
B 2021-04-02 p3 80
C 2021-04-03 p2 120
D 2021-04-03 p2 120
C 2021-04-04 p1 100
With a command
SELECT COUNT(name)
FROM (SELECT name
FROM customer
WHERE day > '2021-03-28'
AND day < '2021-04-09'
GROUP BY name
HAVING COUNT(name) > 2)
I can count the number of customers who bought something more than twice in a period of time.
For each day (GROUP BY day), I would like to know how many customers bought something, counting only the customers who bought something more than twice in the period.
Suggested Edit:
In the example above, A and C are the customers that satisfy the condition.
The desired output will be:
day how_many
2021-04-01 2
2021-04-02 1
2021-04-03 1
2021-04-04 1
I interpret your question as wanting to know how many customers made more than one purchase on each day. If so, one method uses two levels of aggregation:
select day,
sum(case when day_count >= 2 then 1 else 0 end)
from (select c.name, c.day, count(*) as day_count
from customer c
group by c.name, c.day
) nc
group by day
order by day;
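Under the other reading of the question (count, per day, only the customers with more than two purchases in the whole period), a different query is needed. The following is a minimal sketch with Python's sqlite3 and the sample rows from the question; it is one possible formulation, not the answerer's method. It filters on the qualifying names with a subquery and then counts distinct names per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, day TEXT, product TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("A", "2021-04-01", "p1", 100), ("B", "2021-04-01", "p1", 100),
        ("C", "2021-04-01", "p2", 120), ("A", "2021-04-01", "p2", 120),
        ("A", "2021-04-02", "p1", 100), ("B", "2021-04-02", "p3", 80),
        ("C", "2021-04-03", "p2", 120), ("D", "2021-04-03", "p2", 120),
        ("C", "2021-04-04", "p1", 100),
    ],
)

per_day = conn.execute("""
SELECT day, COUNT(DISTINCT name) AS how_many
FROM customer
WHERE day > '2021-03-28' AND day < '2021-04-09'
  -- keep only customers with more than two purchases in the period
  AND name IN (SELECT name
               FROM customer
               WHERE day > '2021-03-28' AND day < '2021-04-09'
               GROUP BY name
               HAVING COUNT(*) > 2)
GROUP BY day
ORDER BY day
""").fetchall()

for row in per_day:
    print(row)
```

On the sample data this yields the desired output from the question: 2 customers on 2021-04-01 and 1 on each of the following three days.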

Displaying results for fixed values in SQL

I am having difficulty in solving the below problem:
I have a table which contains the shopid, date, hour, category and sales amount.
shopid date hour category amount
------------------------------------
1 date1 7 food 10
1 date1 8 food 15
1 date1 10 misc. 5
2 date1 7 food 6
...................................
I am trying to calculate the total sales amount in each hour by food category and display like the following:
shopid category hour amount
------------------------------------
1 food 6 0
1 food 7 5
1 food 8 20
2 food 9 40
...................................
The shops' opening hours are 6 am to 10 pm, and any given hour may have no sales at all. I was able to perform the hourly summation, but I am unable to display a zero for the hours with no sales (e.g. 6 am or any other hour within the opening hours) for each sales category.
Use a left join against a list of hours:
select t.shopid, t.category, g.hour, sum(t.amount)
from generate_series(6,22) as g(hour)
left join the_table t on t.hour = g.hour
group by t.shopid, t.category, g.hour
order by t.shopid, t.category, g.hour;
I am trying to calculate the total sales amount in each hour by food category.
This makes sense, but it doesn't make sense to include the shopid in the results.
To do this, you need to generate the rows -- which are all hours and food categories. Then bring in the actual results using left join:
select c.category, g.hour, coalesce(sum(s.amount), 0)
from generate_series(6, 22) g(hour) cross join
(select distinct category from sales) c left join
sales s
on s.hour = g.hour and s.category = c.category
group by c.category, g.hour
order by c.category, g.hour;
If you want results by shop/category/hour, then you can use the same idea:
select sh.shopid, c.category, g.hour,
coalesce(sum(s.amount), 0)
from generate_series(6, 22) g(hour) cross join
(select distinct category from sales) c cross join
(select distinct shopid from sales) sh left join
sales s
on s.shopid = sh.shopid and
s.hour = g.hour and
s.category = c.category
group by sh.shopid, c.category, g.hour
order by sh.shopid, c.category, g.hour;
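The same idea can be tried on the four sample rows from the question. This is a minimal sketch with Python's sqlite3. Since generate_series is not available in a default SQLite build, a recursive CTE stands in for it here; the table name sales comes from the answer, while the column name sale_date is a hypothetical stand-in for the question's date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (shopid INTEGER, sale_date TEXT, hour INTEGER, category TEXT, amount INTEGER);
INSERT INTO sales VALUES
  (1, 'date1', 7, 'food', 10),
  (1, 'date1', 8, 'food', 15),
  (1, 'date1', 10, 'misc.', 5),
  (2, 'date1', 7, 'food', 6);
""")

hourly = conn.execute("""
WITH RECURSIVE hours(hour) AS (
  -- opening hours 6..22, replacing generate_series(6, 22)
  SELECT 6
  UNION ALL
  SELECT hour + 1 FROM hours WHERE hour < 22
)
SELECT sh.shopid, c.category, h.hour, COALESCE(SUM(s.amount), 0) AS amount
FROM hours h
CROSS JOIN (SELECT DISTINCT category FROM sales) c
CROSS JOIN (SELECT DISTINCT shopid FROM sales) sh
LEFT JOIN sales s
  ON s.shopid = sh.shopid AND s.hour = h.hour AND s.category = c.category
GROUP BY sh.shopid, c.category, h.hour
ORDER BY sh.shopid, c.category, h.hour
""").fetchall()

# Index the result by (shop, category, hour) for easy lookup.
totals = {(shop, cat, hour): amount for shop, cat, hour, amount in hourly}
print(len(hourly), "rows: 2 shops x 2 categories x 17 hours")
print(totals[(1, "food", 6)], totals[(1, "food", 7)], totals[(1, "food", 8)])
```

Every shop/category/hour combination appears exactly once, and the combinations without sales come back as 0 rather than being missing.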

pgsql -Showing top 10 products's sales and other products as 'others' and its sum of sales

I have a table called "products" with 100 records containing sales details. My requirement sounds simple, but I was not able to implement it.
I need to show the top 10 product names with their sales, and group all other products under the name "Others" with their combined sales. So my output will have 11 rows in total; the 11th row should be "Others" with the sum of sales of all remaining products. Can anyone give me the logic?
O/p should be like this,
Name sales
------ -----
1 colgate 9000
2 pepsodent 8000
3 closeup 7000
4 brittal 6000
5 ariies 5000
6 babool 4000
7 imami 3000
8 nepolop 2500
9 lactoteeth 2000
10 menwhite 1500
11 Others 6000 (sum of sales of remaining 90 products)
here is my sql query,
select case when rank < 11 then prod_cat else 'Others' end as prod_cat,
total_sales, ID, rank
from (select ROW_NUMBER() over (order by sum(i.grandtotal) desc) as rank,
pc.name as prod_cat,
sum(i.grandtotal) as total_sales,
pc.m_product_category_id as ID
from adempiere.c_invoice i
join adempiere.c_invoiceline il on il.c_invoice_id = i.c_invoice_id
join adempiere.m_product p on p.m_product_id = il.m_product_id
join adempiere.m_product_category pc on pc.m_product_category_id = p.m_product_category_id
where extract(year from i.dateacct) = extract(year from now())
group by pc.m_product_category_id) innersql
order by total_sales desc
o/p what i got is,
prod_cat total_sales id rank
-------- ----------- --- ----
BSHIRT 4511697.63 460000015 1
BT-SHIRT 2725167.03 460000016 2
SHIRT 2630471.56 1000003 3
BJEAN 1793514.07 460000005 4
JEAN 1115402.90 1000004 5
GT-SHIRT 1079596.33 460000062 6
T SHIRT 446238.60 1000006 7
PANT 405189.00 1000005 8
GDRESS 396789.02 460000059 9
BTROUSER 393739.48 460000017 10
Others 164849.41 1000009 11
Others 156677.00 1000008 12
Others 146678.00 1000007 13
As @e4c5 suggests, use UNION:
with
totals as (
select --pc.m_product_category_id as id,
pc.name as prod_cat,
sum(i.grandtotal) as total_sales,
ROW_NUMBER() over (order by sum(i.grandtotal) desc) as rank
from adempiere.c_invoice i
join adempiere.c_invoiceline il on (il.c_invoice_id=i.c_invoice_id)
join adempiere.m_product p on (p.m_product_id=il.m_product_id)
join adempiere.m_product_category pc on (pc.m_product_category_id=p.m_product_category_id)
where i.dateacct >= date_trunc('year', now()) and i.dateacct < date_trunc('year', now()) + interval '1' year
group by pc.m_product_category_id, pc.name
),
ranked_others as (
select prod_cat, total_sales, rank
from totals where rank <= 10
union
select 'Others', sum(total_sales), 11
from totals where rank > 10
)
select prod_cat, total_sales
from ranked_others
order by rank
Also, I recommend using sargable conditions like the one above, which is slightly more complicated than the one you implemented, but generally worth the extra effort.
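The top-10-plus-Others pattern can be tried on a small data set. The following is a minimal sketch with Python's sqlite3, assuming SQLite 3.25 or newer for window function support. The products table and its 15 rows (p1 to p15 with sales from 1500 down to 100) are hypothetical illustration data, not the adempiere schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, sales INTEGER)")
# Hypothetical data: p1 sells 1500, p2 sells 1400, ..., p15 sells 100.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(f"p{i}", (16 - i) * 100) for i in range(1, 16)],
)

top_rows = conn.execute("""
WITH ranked AS (
  SELECT name, sales,
         ROW_NUMBER() OVER (ORDER BY sales DESC) AS rnk
  FROM products
)
-- the ten best sellers keep their own row
SELECT name, sales, rnk FROM ranked WHERE rnk <= 10
UNION ALL
-- everything else collapses into a single 11th row
SELECT 'Others', SUM(sales), 11 FROM ranked WHERE rnk > 10
ORDER BY rnk
""").fetchall()

for name, sales, rnk in top_rows:
    print(rnk, name, sales)
```

The two branches never produce the same row, so UNION ALL is used here to skip the duplicate-elimination step that plain UNION performs.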

How get data from 3 tables?

I have 3 tables: Collection, PaymentData, and Payment.
Payment is a computed table that holds only one row per product, so rcvamt and restamt come from the Payment table.
I have following data
Collection:
id(PK) clientid company Client product total note
1 2001 Company1 Client1 Product1 50000 note1
2 2002 Company2 Client2 Product2 60000 note2
3 2003 Company3 Client3 Product3 70000 note3
PaymentData:
wid(PK) wcid(FK) clientid product rcvamt restamt rcvdate nxtdate Note
1 1 2001 Product1 500 49500 10-1-2015 11-2-2015 abc1
2 1 2001 Product1 800 48700 11-2-2015 12-3-2015 xyz1
3 2 2002 Product2 1500 58500 5-3-2015 6-4-2015 qwe1
Payment
id(PK) wid(FK) clientid product rcvamt restamt
1 2 2001 Product1 1300 48700
2 3 2002 Product2 1500 58500
I want to show a report like
clientid company product total rcvamt restamt rcvdate nxtdate note
2001 Company1 Product1 50000 1300 48700 11-2-2015 12-3-2015 xyz1
2002 Company2 Product2 60000 1500 58500 5-3-2015 6-4-2015 qwe1
2003 Company3 Product3 70000 - - - - -
I tried to make it simple:
SELECT DISTINCT
C.clientid
, C.company
, C.product
, C.total
, P.rcvamt
, P.restamt
, ( SELECT TOP 1 rcvdate FROM PaymentData AS PD1 WHERE PD1.ClientID=PD.ClientID AND PD1.Product=PD.Product ORDER BY rcvdate DESC)
, ( SELECT TOP 1 nxtdate FROM PaymentData AS PD1 WHERE PD1.ClientID=PD.ClientID AND PD1.Product=PD.Product ORDER BY rcvdate DESC)
, ( SELECT TOP 1 Note FROM PaymentData AS PD1 WHERE PD1.ClientID=PD.ClientID AND PD1.Product=PD.Product ORDER BY rcvdate DESC)
FROM
Collection C
LEFT OUTER JOIN Payment P
ON C.clientid = P.clientid
LEFT OUTER JOIN PaymentData PD
ON P.clientid = PD.clientid
But I don't know all the relationships between the tables.
The Answer to your Question could look like this
Select Collection.clientid
,Collection.company
,Collection.product
,Collection.total
,Payment.rcvamt
,Payment.restamt
,PaymentData.rcvdate
,PaymentData.nxtdate
,PaymentData.Note
From PaymentData
Inner Join (Select wcid
,Max(PaymentData.rcvdate) as rcvdate
,Max(PaymentData.nxtdate) as nxtdate
FROM PaymentData
GROUP BY wcid) AS SubSelect ON PaymentData.wcid = SubSelect.wcid
AND PaymentData.rcvdate = SubSelect.rcvdate
AND PaymentData.nxtdate = SubSelect.nxtdate
Inner Join Payment on PaymentData.wcid = Payment.id
RIGHT OUTER JOIN Collection ON PaymentData.clientid = Collection.clientid
Here is the sqlfiddle to prove my answer.
Something like this should work. It appears you want rcvamt summed across all payments, while the other fields come from the last payment received. I also assume this is SQL Server due to your name. If it's a different db, please provide the
UPDATE: Changed to LEFT JOIN to handle client 3 without products, fixed typo in product. SQL Fiddle: http://sqlfiddle.com/#!3/8ad566/19/0
SELECT
c.clientid,
c.company,
c.product,
c.total,
SUM(pd.rcvamt) AS rcvamt,
LastPayment.restamt,
LastPayment.rcvdate,
LastPayment.nxtDate,
LastPayment.note
FROM Collection c
LEFT OUTER JOIN PaymentData pd
ON pd.wcid = c.id
LEFT OUTER JOIN (
SELECT
wcid,
restamt,
rcvdate,
nxtdate,
Note,
ROW_NUMBER() OVER (PARTITION BY wcid ORDER BY rcvdate DESC) AS RowNum
FROM PaymentData
) LastPayment
ON LastPayment.wcid = c.id
AND LastPayment.RowNum = 1 -- Get last payment info
GROUP BY
c.clientid,
c.company,
c.product,
c.total,
LastPayment.restamt,
LastPayment.rcvdate,
LastPayment.nxtDate,
LastPayment.note
ORDER BY
c.clientid
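The "sum of payments plus last payment" report can be checked against the sample data from the question. This is a minimal sketch with Python's sqlite3 (SQLite 3.25 or newer for ROW_NUMBER), not the SQL Server query above. Two assumptions are made: the payment total is pre-aggregated in its own subquery instead of a GROUP BY on the outer query, and the last payment is taken to be the one with the highest wid, because the sample dates are not in a sortable text format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection (id INTEGER, clientid INTEGER, company TEXT, client TEXT,
                         product TEXT, total INTEGER, note TEXT);
CREATE TABLE paymentdata (wid INTEGER, wcid INTEGER, clientid INTEGER, product TEXT,
                          rcvamt INTEGER, restamt INTEGER, rcvdate TEXT, nxtdate TEXT, note TEXT);
INSERT INTO collection VALUES
  (1, 2001, 'Company1', 'Client1', 'Product1', 50000, 'note1'),
  (2, 2002, 'Company2', 'Client2', 'Product2', 60000, 'note2'),
  (3, 2003, 'Company3', 'Client3', 'Product3', 70000, 'note3');
INSERT INTO paymentdata VALUES
  (1, 1, 2001, 'Product1', 500, 49500, '10-1-2015', '11-2-2015', 'abc1'),
  (2, 1, 2001, 'Product1', 800, 48700, '11-2-2015', '12-3-2015', 'xyz1'),
  (3, 2, 2002, 'Product2', 1500, 58500, '5-3-2015', '6-4-2015', 'qwe1');
""")

report = conn.execute("""
SELECT c.clientid, c.company, c.product, c.total,
       t.rcvamt, lp.restamt, lp.rcvdate, lp.nxtdate, lp.note
FROM collection c
-- total received per collection row
LEFT JOIN (SELECT wcid, SUM(rcvamt) AS rcvamt
           FROM paymentdata GROUP BY wcid) t
  ON t.wcid = c.id
-- last payment per collection row (assumed: highest wid is the latest)
LEFT JOIN (SELECT wcid, restamt, rcvdate, nxtdate, note,
                  ROW_NUMBER() OVER (PARTITION BY wcid ORDER BY wid DESC) AS rn
           FROM paymentdata) lp
  ON lp.wcid = c.id AND lp.rn = 1
ORDER BY c.clientid
""").fetchall()

for row in report:
    print(row)
```

Client 2003 has no payments, so the LEFT JOINs leave its payment columns as NULL (None in Python), which corresponds to the dashes in the desired report.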