I only want the count of Brands (Blue, White, Red) if their corresponding Customer value is listed more than once.
*customer* *productid* *brand*
1 A Red
2 B Blue
1 A Red
2 C Blue
3 B White
1 A Red
2 B Blue
Desired result: I want a single dataset that has the Brands with their tallied count, only for Customers that are repeat purchasers.
*brands* *repeat_purchase*
Red 3
Blue 2
select customer, productid, count(productid) as repeat_purchase
from Public."CustomerData"
group by customer, productid
having count(productid) > 1;
Above is what I have so far, but I can't figure out how to get just two columns: one with the name of each Brand, and one with the total number of times that brand is included in a repeat purchase.
Your question seems to want two levels of aggregation:
select brand, sum(cnt) as repeat_purchase
from (select customer, productid, brand, count(*) as cnt
from Public."CustomerData"
group by customer, productid, brand
having count(*) >= 2
) t
group by brand;
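As a quick check, the two-level aggregation can be run against the sample rows from the question. This is only a sketch: it assumes SQLite (through Python's sqlite3 module) instead of the original database, uses an unqualified table name CustomerData, and adds an ORDER BY so the output order is stable.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerData (customer INTEGER, productid TEXT, brand TEXT)")
conn.executemany(
    "INSERT INTO CustomerData VALUES (?, ?, ?)",
    [(1, "A", "Red"), (2, "B", "Blue"), (1, "A", "Red"), (2, "C", "Blue"),
     (3, "B", "White"), (1, "A", "Red"), (2, "B", "Blue")],
)

# Inner query: one row per (customer, productid, brand) bought at least twice.
# Outer query: add those counts up per brand.
query = """
SELECT brand, SUM(cnt) AS repeat_purchase
FROM (SELECT customer, productid, brand, COUNT(*) AS cnt
      FROM CustomerData
      GROUP BY customer, productid, brand
      HAVING COUNT(*) >= 2) AS t
GROUP BY brand
ORDER BY repeat_purchase DESC
"""
repeat_rows = conn.execute(query).fetchall()
print(repeat_rows)  # [('Red', 3), ('Blue', 2)]
```

White is absent because customer 3 bought product B only once, and the single purchase of product C by customer 2 does not count towards Blue.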
Related
What if I have a table like this and I want to select the best-selling product_id?
id transaction_id product_id qty_sold
1 21 2 5
2 22 3 2
3 23 4 2
3 24 2 1
3 25 2 4
I want the best-selling product_id, i.e. the one with the highest total qty_sold.
Using SQL Server, you can group by product_id, add up qty_sold, and order by that total descending. If two products come out with the same total quantity, also taking the minimum transaction ID per product gives you a way to break the tie:
SELECT TOP 1 product_id, SUM(qty_sold) as sellcount, MIN(transaction_id) as firsttran
FROM t
GROUP BY product_id
ORDER BY SUM(qty_sold) DESC, MIN(transaction_id)
Once you're happy the sums are right, you can remove the SUM(qty_sold) as sellcount and MIN(transaction_id) as firsttran columns from the SELECT if you only need the product ID.
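The same idea can be tried on the sample table with a small sketch. It assumes SQLite through Python's sqlite3 module, so LIMIT 1 stands in for SQL Server's TOP 1; the table name t is taken from the answer above.

```python
import sqlite3

# Sample rows from the question: (id, transaction_id, product_id, qty_sold).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, transaction_id INTEGER, product_id INTEGER, qty_sold INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 21, 2, 5), (2, 22, 3, 2), (3, 23, 4, 2), (3, 24, 2, 1), (3, 25, 2, 4)],
)

# Total quantity per product, highest first; ties go to the earliest transaction.
ranking_sql = """
SELECT product_id, SUM(qty_sold) AS sellcount, MIN(transaction_id) AS firsttran
FROM t
GROUP BY product_id
ORDER BY SUM(qty_sold) DESC, MIN(transaction_id)
"""
ranking = conn.execute(ranking_sql).fetchall()
best = conn.execute(ranking_sql + " LIMIT 1").fetchone()

print(ranking)  # [(2, 10, 21), (3, 2, 22), (4, 2, 23)]
print(best)     # (2, 10, 21)
```

Products 3 and 4 tie on a total of 2, and the tie-break on the minimum transaction ID places product 3 first.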
I have the following transactions table:
customer_id | purchase_date | product | category | department | quantity | store_id
1 | 2020-10-01 | Kit Kat | Candy | Food | 2 | store_A
1 | 2020-10-01 | Snickers | Candy | Food | 1 | store_A
1 | 2020-10-01 | Snickers | Candy | Food | 1 | store_A
2 | 2020-10-01 | Snickers | Candy | Food | 2 | store_A
2 | 2020-10-01 | Baguette | Bread | Food | 5 | store_A
2 | 2020-10-01 | iPhone | Cell phones | Electronics | 2 | store_A
3 | 2020-10-01 | Sony PS5 | Games | Electronics | 1 | store_A
I would like to calculate the average number of products purchased (for each product in the table). I'm also looking to calculate averages across each category and each department by accounting for all products within the same category or department respectively. Care should be taken to divide over unique customers AND the product quantity being greater than 0 (a 0 quantity indicates a refund, and should not be accounted for).
So basically, the output table would look like below:
...where store_id and average_level_type are partition columns.
Is there a way to achieve this in a single pass over the transactions table, or do I need to break my approach down into multiple steps?
Thanks!
How about using “union all” as below -
select store_id, 'product' as average_level_type, product as id,
sum(quantity) as total_quantity,
count(distinct customer_id) as unique_customer_count,
sum(quantity) / count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id, product
union all
select store_id, 'category' as average_level_type, category as id,
sum(quantity) as total_quantity,
count(distinct customer_id) as unique_customer_count,
sum(quantity) / count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id, category
union all
select store_id, 'department' as average_level_type, department as id,
sum(quantity) as total_quantity,
count(distinct customer_id) as unique_customer_count,
sum(quantity) / count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id, department;
If you want to avoid union all, you can use something like rollup() or group by grouping sets() to achieve the same result, but the query is a little more complicated if the output has to match the exact format shown in the question.
EDIT: Below is how you can use grouping sets to get the same output -
Select store_id,
case when G_ID = 3 then 'product'
when G_ID = 5 then 'category'
when G_ID = 6 then 'department' end As average_level_type,
case when G_ID = 3 then product
when G_ID = 5 then category
when G_ID = 6 then department end As id,
total_quantity,
unique_customer_count,
average
from
(select store_id, product, category, department,
sum(quantity) as total_quantity,
count(distinct customer_id) as unique_customer_count,
sum(quantity)/count(distinct customer_id) as average,
GROUPING__ID As G_ID
from transactions
where quantity > 0
group by store_id,product,category,department
grouping sets((store_id,product),(store_id,category),(store_id,department))
) Tab
order by 2
;
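To see what the three levels produce on the sample rows, here is a small sketch of the union all approach. It assumes SQLite through Python's sqlite3 module (SQLite has no grouping sets, so only the union all form is shown), builds the three branches in a loop over a hypothetical LEVELS tuple, and multiplies by 1.0 so the division is not truncated to an integer in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transactions (
    customer_id INTEGER, purchase_date TEXT, product TEXT, category TEXT,
    department TEXT, quantity INTEGER, store_id TEXT)""")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "2020-10-01", "Kit Kat", "Candy", "Food", 2, "store_A"),
    (1, "2020-10-01", "Snickers", "Candy", "Food", 1, "store_A"),
    (1, "2020-10-01", "Snickers", "Candy", "Food", 1, "store_A"),
    (2, "2020-10-01", "Snickers", "Candy", "Food", 2, "store_A"),
    (2, "2020-10-01", "Baguette", "Bread", "Food", 5, "store_A"),
    (2, "2020-10-01", "iPhone", "Cell phones", "Electronics", 2, "store_A"),
    (3, "2020-10-01", "Sony PS5", "Games", "Electronics", 1, "store_A"),
])

# One SELECT per aggregation level, glued together with UNION ALL.
LEVELS = ("product", "category", "department")
query = "\nUNION ALL\n".join(
    f"""SELECT store_id, '{level}' AS average_level_type, {level} AS id,
       SUM(quantity) AS total_quantity,
       COUNT(DISTINCT customer_id) AS unique_customer_count,
       1.0 * SUM(quantity) / COUNT(DISTINCT customer_id) AS average
FROM transactions
WHERE quantity > 0
GROUP BY store_id, {level}"""
    for level in LEVELS
)

# Map (level, id) -> average for easy inspection.
averages = {(lvl, key): avg for _, lvl, key, _, _, avg in conn.execute(query)}
for key, avg in sorted(averages.items()):
    print(key, avg)
```

For example, Snickers has a total quantity of 4 across 2 distinct customers (average 2.0), while the Food department has 11 across the same 2 customers (average 5.5).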
Suppose I have a product id column and a sales column:
product id sales
1 1000
2 10000
3 50000
4 12000
5 8000
How do I write an SQL query to get all product ids that contribute to the top 80% of sales?
For this, you want a cumulative sum. Presumably, you want the top selling such products, so:
select p.*
from (select p.*,
sum(sales) over (order by sales desc) as running_sales,
sum(sales) over () as total_sales
from products p
) p
where running_sales - sales < 0.8 * total_sales;
This returns the products from the top seller down, up to and including the one whose running total first reaches or exceeds 80% of the total sales.
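A worked run on the sample data makes the cutoff concrete. The sketch below assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, and a column named product_id for the "product id" column in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, 1000), (2, 10000), (3, 50000), (4, 12000), (5, 8000)],
)

# running_sales - sales is the running total *before* the current row,
# so a row is kept while the products ahead of it have not yet reached 80%.
query = """
SELECT product_id, sales, running_sales
FROM (SELECT p.*,
             SUM(sales) OVER (ORDER BY sales DESC) AS running_sales,
             SUM(sales) OVER () AS total_sales
      FROM products p) AS p
WHERE running_sales - sales < 0.8 * total_sales
ORDER BY sales DESC
"""
top = conn.execute(query).fetchall()
print(top)  # [(3, 50000, 50000), (4, 12000, 62000), (2, 10000, 72000)]
```

Total sales are 81000, so the threshold is 64800. Products 3 and 4 together reach 62000, and product 2 is the one that crosses the threshold, so it is included as well.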
I have a table, sales, which is ordered by descending TotalSales
user_id | TotalSales
----------------------
4 10
2 1.5
5 0.99
3 0.5
1 0.33
What I would like to do is find the percentage of the sum of all sales that the xx% most important sales represent.
For example if I wanted to do it for top 40% sales, here I would get (10+1.5)/(10+1.5+0.99+0.5+0.33)= 86%
But right now I haven't been able to select "top xx% rows".
Edit: DB management system can be MySQL or Vertica or Hive
select sum(TotalSales) as s from sales where TotalSales >= x;
select sum(TotalSales) as b from sales;
Your result is s/b,
where x is the TotalSales cutoff you set each time for the percentage you want.
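If the cutoff should be a percentage of rows instead of a TotalSales value, one possible approach is to number the rows and keep those whose rank is within pct times the row count. This is a sketch under assumptions: it uses window functions (run here on SQLite through Python's sqlite3 module, so it would need adapting for an engine without them), it breaks ties by user_id, and the parameter name pct is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id INTEGER, TotalSales REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(4, 10), (2, 1.5), (5, 0.99), (3, 0.5), (1, 0.33)],
)

# rn ranks rows from the largest sale down; n is the total number of rows.
# Rows with rn <= pct * n form the "top pct of rows".
query = """
SELECT SUM(CASE WHEN rn <= :pct * n THEN TotalSales ELSE 0 END)
       / SUM(TotalSales) AS share
FROM (SELECT TotalSales,
             ROW_NUMBER() OVER (ORDER BY TotalSales DESC, user_id) AS rn,
             COUNT(*) OVER () AS n
      FROM sales) AS ranked
"""
share = conn.execute(query, {"pct": 0.4}).fetchone()[0]
print(round(share, 2))  # 0.86
```

With five rows and pct = 0.4 the top two rows are kept, which reproduces the (10+1.5)/(10+1.5+0.99+0.5+0.33) figure from the question.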
I have a customers table as following:
customername, ordername, amount
=============================
bob, book, 20
bob, computer, 40
steve,hat, 15
bill, book, 12
bill, computer, 3
steve, pencil, 10
bill, pen, 2
I want to run a query to get the following result:
customername, ordername, amount
=============================
bob, computer, 40
bob, book, 20
bob, ~total~, 60
steve, hat, 15
steve, pencil, 10
steve, ~total~,25
bill, book, 12
bill, computer, 3
bill, pen, 2
bill, ~total~, 17
I want the amount for each customer to be ordered from max to min and a new ordername as "~total~" (must always be the last row for each customer) with a result as sum of all amount for the same customer.
So, in above example, bob should be the first since the total=60, steve the second (total=25) and bill the third (total=17).
Use:
SELECT x.customername,
       x.ordername,
       x.amount
  FROM (SELECT a.customername,
               a.ordername,
               a.amount,
               y.rk,
               1 AS sort
          FROM CUSTOMERS a
          JOIN (SELECT c.customername,
                       ROW_NUMBER() OVER (ORDER BY SUM(c.amount) DESC) AS rk
                  FROM CUSTOMERS c
              GROUP BY c.customername) y ON y.customername = a.customername
        UNION ALL
        SELECT b.customername,
               '~total~',
               SUM(b.amount),
               ROW_NUMBER() OVER (ORDER BY SUM(b.amount) DESC) AS rk,
               2 AS sort
          FROM CUSTOMERS b
      GROUP BY b.customername) x
ORDER BY x.rk, x.customername, x.sort, x.amount DESC
You could look at using GROUP BY ROLLUP, but the ordername value would be NULL so you'd have to post-process it to get that replaced with "~total~"...
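For comparison, here is a sketch of a slightly shorter variation of the UNION ALL idea, run on the sample rows. It is not the query above: it assumes SQLite 3.25 or later through Python's sqlite3 module (SQLite has no ROLLUP, so the UNION ALL shape is kept), and it sorts customers by a per-customer total computed with SUM() OVER (PARTITION BY ...) instead of joining to a ROW_NUMBER() ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customername TEXT, ordername TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("bob", "book", 20), ("bob", "computer", 40), ("steve", "hat", 15),
     ("bill", "book", 12), ("bill", "computer", 3), ("steve", "pencil", 10),
     ("bill", "pen", 2)],
)

# Detail rows carry their customer's total (sort = 1); the '~total~' rows
# carry the same total with sort = 2 so they land last within each customer.
query = """
SELECT customername, ordername, amount
FROM (SELECT customername, ordername, amount,
             SUM(amount) OVER (PARTITION BY customername) AS total,
             1 AS sort
      FROM customers
      UNION ALL
      SELECT customername, '~total~', SUM(amount), SUM(amount), 2
      FROM customers
      GROUP BY customername) AS x
ORDER BY total DESC, customername, sort, amount DESC
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

The rows come out as bob (total 60), then steve (25), then bill (17), each with orders sorted from highest to lowest amount and the ~total~ row last.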