PostgreSQL: group with most common items: handling overlapping items - sql

PostgreSQL: 9.3
I have a long log of "shopping cart ids" and the "product ids" each shopping cart contains.
I'm looking for a way to create groups of shopping carts that have the most "product ids" in common. A "product id" can be in multiple groups at the same time.
As a result I need the "shopping cart ids", the "product ids" and the name of each group (group 1, group 2, ...).
Does anyone have a hint on how to do it? I know a SQL query is not ideal for this, but it's all I have at the moment.
EDIT: With the query below I know that groups of xx shopping carts have xx products in common.
WITH a AS (
SELECT Shopping_Cart.Product_Id AS Product_Id, count(Shopping_Cart.Product_Id) AS "count" FROM Shopping_Cart
GROUP BY Shopping_Cart.Product_Id
ORDER BY "count"
)
SELECT a."count" AS "Product in Common", count(DISTINCT Shopping_Cart.id) AS "Shopping Cart Count" FROM a
RIGHT JOIN Shopping_Cart ON Shopping_Cart.Product_Id = a.Product_Id
GROUP BY a."count"
It's better than nothing but if I have 7 shoppers with items 1,2,3 and 7 shoppers with items 4,5,6 they fall into the same group of shoppers with 3 items in common. I need to separate them.

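You might want to loop through your Product table, joining back on your Shopping table by CartId. Maybe something like this (the function, table and column names are placeholders, and the integer column types are an assumption):
CREATE OR REPLACE FUNCTION products_in_common()
RETURNS TABLE (prod_id integer, other_prod_id integer, cart_count bigint) AS $$
DECLARE
p record;
BEGIN
-- for every product, count the carts it shares with each product
FOR p IN SELECT ProductId FROM ProductTable
LOOP
RETURN QUERY
SELECT p.ProductId, s.ProductId, count(DISTINCT s.CartId)
FROM ShoppingTable s
WHERE s.CartId IN
(SELECT CartId FROM ShoppingTable WHERE ProductId = p.ProductId)
GROUP BY s.ProductId
ORDER BY count(DISTINCT s.CartId) DESC;
END LOOP;
RETURN;
END
$$ LANGUAGE plpgsql;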
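The per-product loop above can probably also be written as a single self-join that counts, for each pair of products, how many carts contain both. Below is a minimal sketch using Python's built-in sqlite3 module so it can be run without a PostgreSQL server; the table layout (shopping_cart with id and product_id) follows the query in the question, and the sample rows are made up to mirror the "items 1,2,3 versus items 4,5,6" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shopping_cart (id INTEGER, product_id INTEGER)")

# Made-up data: carts 1 and 2 hold products 1,2,3; carts 3 and 4 hold 4,5,6.
rows = [(cart, prod) for cart in (1, 2) for prod in (1, 2, 3)]
rows += [(cart, prod) for cart in (3, 4) for prod in (4, 5, 6)]
conn.executemany("INSERT INTO shopping_cart VALUES (?, ?)", rows)

# Self-join on the cart id: every pair of products that share a cart,
# with the number of distinct carts containing both.
pair_counts = conn.execute(
    """
    SELECT a.product_id, b.product_id, COUNT(DISTINCT a.id) AS carts
    FROM shopping_cart a
    JOIN shopping_cart b ON a.id = b.id AND a.product_id < b.product_id
    GROUP BY a.product_id, b.product_id
    ORDER BY carts DESC, a.product_id, b.product_id
    """
).fetchall()

for first, second, carts in pair_counts:
    print(first, second, carts)
```

Pairs that never share a cart (for example products 1 and 4) simply do not appear, which is what keeps the two groups of shoppers apart.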

Related

Countif or CASE with multiple conditions

I am trying to figure out the most efficient way to count products being placed in an online cart. I have ranked the first 3 items placed in a cart by purchase time (the time they were put in the cart, not the actual checkout time), but now I am struggling to figure out a way to count the different combinations of items going into the cart.
Counting the individual ranks is easy enough, but I need a count for purchasing product 1 first and product 1 second, as well as for all the other possible combinations (5 products total). I only need to count first items in the cart, all combinations of first item to second item, and all combinations of second item to third item.
SELECT
COUNTIF(product = 'Product1' and rank = 1) as firstpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product1' and rank = 2)) as firstpurchase_product1_secondpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product2' and rank = 2)) as firstpurchase_product1_secondpurchase_product2,
#code would continue for all combinations.
FROM(
customer_info.customer_id as customer_id,
customer_info.session_id as session_id,
customer_info.product_purchased as product,
ROW_NUMBER() OVER (PARTITION BY customer_info.session_id ORDER BY customer_info.purchase_time ASC) AS rank
FROM customer_purchases cp,
WHERE p_date >= "2022-04-12"
)rnk
where rnk.finish_rank in (1,2,3)
This seems like a lot of code; is there a better way to do it? The query returns 0 for every line except when counting just first purchases. Should I be using CASE instead?
Any thoughts or ideas would be appreciated.
Thanks!
Example of input:
Product 1, Product 2, Product 3
Product 1, Product 1, Product 1
Product 4, Product 2, Product 1
Product 3, Product 3, Product 5
Product 4, Product 2, Product 4
--this goes on for hundreds of lines
Output:
Count Product 1 in first column
Count Product 2 in first column
#continue for all 5
Count of customers who put product 1 in cart first AND product 1 in cart second
Count of customers who put product 1 in cart first AND product 2 in cart second
###continue with all combinations with product 1
Count of customers who put product 2 in cart first and product 1 in cart second
Count of customers who put product 2 in the cart first and product 2 in the cart second
###continue with all combinations of product 2,3,4, and 5
It seems to me that you want to GROUP BY a set of columns (item1, item2, item3) and produce a count of the number of times each combination occurs.
Possibly (it's a little unclear from your wording - a well-formatted table showing example raw data and desired results for that example would be helpful), you want to know an overall count for values of item1 regardless of the other items. This can be achieved via GROUP BY ROLLUP(item1, item2, item3).
So, our aim is to get an unaggregated table with those columns, so that we can aggregate it as described!
You have a long-format table (customer ID, session ID, product, rank) and we want a wide-format table with a column for each value of rank. This is a PIVOT operation:
WITH rnk AS (
SELECT
customer_id,
session_id,
product_purchased AS product,
ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY purchase_time ASC) AS rank
FROM customer_info
WHERE p_date >= "2022-04-12"
QUALIFY rank IN (1,2,3)
),
pivoted AS (
SELECT *
FROM rnk PIVOT(
ANY_VALUE(product) AS item FOR rank in (1,2,3)
)
)
SELECT
item_1,
item_2,
item_3,
COUNT(*) AS N
FROM
pivoted
GROUP BY
ROLLUP(item_1, item_2, item_3)
Does that get you what you want?
A couple of features to note:
I use common table expressions (WITH) to make this more readable
QUALIFY is a filter clause that applies to the result of a window function, much as HAVING applies to the result of an aggregate
Pivoting requires an aggregation function because in general there could be many records with the same value of session, product, and rank. Here we know there will be one record only, so it's safe to use ANY_VALUE (which 'aggregates' by non-deterministically choosing one of the values).
Just to prevent confusion: ROLLUP will give you something like 'Product A', NULL, NULL for some of its records - this doesn't mean items 2 and 3 don't exist, it's just how it signals those records that group only by item 1 and aggregate over all values of the other items.
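To make the pivot and ROLLUP steps concrete, here is a rough plain-Python sketch of what they compute. It is not how the database engine does it, only an illustration; the session ids are invented, and the product rows are the first three lines of the example input from the question.

```python
from collections import Counter, defaultdict

# Long format: (session_id, product, rank). Session ids are made up.
long_rows = [
    ("s1", "Product 1", 1), ("s1", "Product 2", 2), ("s1", "Product 3", 3),
    ("s2", "Product 1", 1), ("s2", "Product 1", 2), ("s2", "Product 1", 3),
    ("s3", "Product 4", 1), ("s3", "Product 2", 2), ("s3", "Product 1", 3),
]

# Pivot: one (item_1, item_2, item_3) tuple per session.
by_session = defaultdict(dict)
for session_id, product, rank in long_rows:
    by_session[session_id][rank] = product
wide_rows = [tuple(items.get(r) for r in (1, 2, 3)) for items in by_session.values()]

# ROLLUP(item_1, item_2, item_3): count every prefix of the grouping
# columns, padding the rolled-up columns with None (NULL in SQL).
rollup = Counter()
for row in wide_rows:
    for depth in range(len(row) + 1):
        key = row[:depth] + (None,) * (len(row) - depth)
        rollup[key] += 1

for key, n in sorted(rollup.items(), key=lambda kv: [x or "" for x in kv[0]]):
    print(key, n)
```

The key ("Product 1", None, None) is the count of sessions whose first item was Product 1, whatever came second and third, and (None, None, None) is the grand total.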

Using SQL to create a query of what grid location generates the most income in a shop

I am new to SQL and databases and I have created a basic db for a shop. Tables are: Purchase, Item, Product. I am trying to create a query that will pull back which grid had the most income. I have tried the code below:
SELECT PRODUCT.LATITUDE
, PRODUCT.LONGITUDE
, SUM(PRICE)"TOTAL"
, PRODUCT.PRODUCT_ID
, PURCHASE.PURCHASE_ID
, ITEM.PURCHASE_ID,
FROM PRODUCT
, PURCHASE
, ITEM,
WHERE PURCHASE.PURCHASE_ID = ITEM.PURCHASE_ID
AND ITEM.PRODUCT_ID = PRODUCT.PRODUCT_ID;
Any tips on how best to bring back these details?
Thanks!
Here you go:
SELECT PRODUCT.LATITUDE
, PRODUCT.LONGITUDE
, SUM(PRODUCT.PRICE) "TOTAL"
FROM PURCHASE
JOIN ITEM ON PURCHASE.PURCHASE_ID = ITEM.PURCHASE_ID
JOIN PRODUCT ON PRODUCT.PRODUCT_ID = ITEM.PRODUCT_ID
GROUP BY
PRODUCT.LATITUDE,
PRODUCT.LONGITUDE
ORDER BY SUM(PRODUCT.PRICE) DESC
NULLS LAST;
This assumes that the PRODUCT table holds the location in the shop (LATITUDE, LONGITUDE) and the PRICE, and that the only thing you need from the purchase side is PRODUCT_ID: each sold product is a single row there, so the joined rows cover every product sold in the history.
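If you want to try the join-then-group idea on a toy database, something like the following sketch should work. It uses Python's built-in sqlite3 module; the column names follow the question, while the column types and all sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER, latitude INTEGER, longitude INTEGER, price REAL);
    CREATE TABLE purchase (purchase_id INTEGER);
    CREATE TABLE item (purchase_id INTEGER, product_id INTEGER);
    -- made-up data: two products at grid (0, 0), one at grid (1, 2)
    INSERT INTO product VALUES (1, 0, 0, 10.0), (2, 0, 0, 5.0), (3, 1, 2, 12.0);
    INSERT INTO purchase VALUES (100), (101);
    INSERT INTO item VALUES (100, 1), (100, 3), (101, 2), (101, 3);
    """
)

# One row per sold item after the joins; summing price per grid gives income.
income = conn.execute(
    """
    SELECT product.latitude, product.longitude, SUM(product.price) AS total
    FROM purchase
    JOIN item ON item.purchase_id = purchase.purchase_id
    JOIN product ON product.product_id = item.product_id
    GROUP BY product.latitude, product.longitude
    ORDER BY total DESC
    """
).fetchall()

best = income[0]  # the grid location with the highest income
print(best)
```

Adding LIMIT 1 to the query would return only the top grid location directly.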

SQL Pivot column values

I have tried following this and this (a SQL Server specific solution), but they were not helpful.
I have two tables, Product and Sale and I want to find how many products are sold on each day. But I want to pivot the table so that columns become the products name and each row will contain the amount of products sold for each day ordered by the day.
Simplified schema is as following
CREATE TABLE product (
id integer,
name varchar(40),
price float(2)
);
CREATE TABLE sale(
id integer,
product_id integer,
transaction_time timestamp
);
This is what I want
I only managed to aggregate the total sales per day per product but I am not able to pivot the product names.
select date(sale.transaction_date)
, product.id
, product.name
, count(product_id)
from sale inner join
product on sale.product_id = product.id
group by date(sale.transaction_date)
, product.id
, product.name
This is the situation so far
Please suggest.
You need pivoting logic, e.g.
select
s.transaction_date::date,
count(case when p.name = 'intelligent_rubber_clock' then 1 end) as intelligent_rubber_clock,
count(case when p.name = 'intelligent_iron_wallet' then 1 end) as intelligent_iron_wallet,
count(case when p.name = 'practical_marble_car' then 1 end) as practical_marble_car
from sale s
inner join product p
on s.product_id = p.id
group by
s.transaction_date::date;
Since your expected output aggregates by date alone, only the transaction date should be in your GROUP BY clause. The trick used here is to take the count of a CASE expression which returns 1 when the record is from a given product and NULL otherwise (there is no ELSE branch). COUNT ignores NULLs, so this generates conditional counts for each product, all in separate columns. To add more columns, just add more conditional counts.
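Here is a small runnable sketch of the conditional-count pivot, using Python's built-in sqlite3 module rather than PostgreSQL. The product names and sale rows are made up, and the timestamp is stored as text because SQLite has no timestamp type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (id INTEGER, name TEXT);
    CREATE TABLE sale (id INTEGER, product_id INTEGER, transaction_time TEXT);
    INSERT INTO product VALUES (1, 'clock'), (2, 'wallet');
    INSERT INTO sale VALUES
        (1, 1, '2020-01-01 09:00:00'),
        (2, 1, '2020-01-01 10:00:00'),
        (3, 2, '2020-01-01 11:00:00'),
        (4, 2, '2020-01-02 09:00:00');
    """
)

# CASE without ELSE yields NULL for non-matching rows, and COUNT skips NULLs,
# so each column counts only the sales of one product.
daily = conn.execute(
    """
    SELECT date(s.transaction_time) AS day,
           COUNT(CASE WHEN p.name = 'clock' THEN 1 END) AS clock,
           COUNT(CASE WHEN p.name = 'wallet' THEN 1 END) AS wallet
    FROM sale s
    JOIN product p ON s.product_id = p.id
    GROUP BY day
    ORDER BY day
    """
).fetchall()

for row in daily:
    print(row)
```

Note that the second day still gets a 0 in the clock column, since COUNT over only NULLs is 0.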

SQL query with grouping

I have a problem solving the following task:
'Show for every seller how much he earned (quantity * product_price) by selling the product PS4 in the year 2013'
The relations are:
seller(id , seller_name, advertised_by);
product( id, product_name, product_price);
sale(id, seller_id, product_id, quantity, date);
I inserted following data:
INSERT into seller VALUES
(1,'Bob',NULL),
(2,'Mary',1),
(3,'Peter',1),
(4,'Parker',1),
(5,'Jeff',1);
INSERT INTO product VALUES
(1,'PS4',100),
(2,'XBOX One',300),
(3,'Laptop',500);
INSERT INTO sale VALUES
(1,1,1,1,'4 5 2013'),
(2,2,1,2,'5 6 2013'),
(3,3,1,3,'6 6 2013'),
(4,4,1,4,'6 6 2013');
I know not using foreign keys or using varchar for date isn't good but I want to have the example being simple.
SELECT seller.id,seller.seller_name, (sale.quantity * product.price) AS sale
FROM seller,product,sale
WHERE product.id = sale.product_id
AND product.product_name = 'PS4'
AND sale.date like '%2013'
GROUP by seller.id;
I know that I have to use a GROUP BY but grouping by seller.id doesn't work.
You need to group by every column that isn't aggregated, and apply an aggregate function to the others. Here, you need to add seller_name to the GROUP BY clause (which shouldn't change the grouping, as the id is already unique), and sum the sales.
Also, as a side note, implicit joins (having more than one table in the FROM clause) have been considered outdated for years, and it's recommended you use an explicit join instead. Your query also never joins seller to sale; the explicit join below does, and it uses the product_price column from your schema:
SELECT seller.id, seller.seller_name, SUM(sale.quantity * product.product_price) AS sale
FROM seller
JOIN sale ON sale.seller_id = seller.id
JOIN product ON product.id = sale.product_id
WHERE product.product_name = 'PS4' AND sale.date LIKE '%2013'
GROUP BY seller.id, seller.seller_name;
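To check the grouping against the data from the question, the sketch below loads the same rows into an in-memory SQLite database through Python's sqlite3 module. The column types are assumptions (the question does not give them), and the price column is spelled product_price as in the listed relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE seller (id INTEGER, seller_name TEXT, advertised_by INTEGER);
    CREATE TABLE product (id INTEGER, product_name TEXT, product_price INTEGER);
    CREATE TABLE sale (id INTEGER, seller_id INTEGER, product_id INTEGER,
                       quantity INTEGER, date TEXT);
    INSERT INTO seller VALUES (1,'Bob',NULL),(2,'Mary',1),(3,'Peter',1),
                              (4,'Parker',1),(5,'Jeff',1);
    INSERT INTO product VALUES (1,'PS4',100),(2,'XBOX One',300),(3,'Laptop',500);
    INSERT INTO sale VALUES (1,1,1,1,'4 5 2013'),(2,2,1,2,'5 6 2013'),
                            (3,3,1,3,'6 6 2013'),(4,4,1,4,'6 6 2013');
    """
)

# Group by both non-aggregated columns and sum quantity * price per seller.
earnings = conn.execute(
    """
    SELECT seller.id, seller.seller_name,
           SUM(sale.quantity * product.product_price) AS earned
    FROM seller
    JOIN sale ON sale.seller_id = seller.id
    JOIN product ON product.id = sale.product_id
    WHERE product.product_name = 'PS4' AND sale.date LIKE '%2013'
    GROUP BY seller.id, seller.seller_name
    ORDER BY seller.id
    """
).fetchall()

for row in earnings:
    print(row)
```

Jeff has no sales, so the inner joins leave him out; a LEFT JOIN from seller would be needed to list him with no earnings.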

Using 2 fields in ORDER BY clause

I have a page that shows "special offers", and I need to order the results by discount value. Besides, I want products with quantity=0 to be shown at the end of the list (regardless of the discount value).
So, is there any way to do that using only SQL? I mean... if I set "ORDER BY discount, quantity DESC" the list shows products ordered by discount, and each group of discounts is ordered by the quantity value... this isn't what I want.
Thanks in advance...
ORDER BY CASE Quantity WHEN 0 THEN 99999999 ELSE Discount END, Quantity DESC
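-- sold_out is a helper column added only for sorting
(SELECT *, 0 AS sold_out FROM `products` WHERE quantity > 0)
UNION ALL
(SELECT *, 1 AS sold_out FROM `products` WHERE quantity <= 0)
ORDER BY sold_out, discount;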
Like this?
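For what it's worth, the CASE idea can also be written with a plain 0/1 flag as the first sort key, which avoids picking a magic number like 99999999. A minimal sketch with Python's built-in sqlite3 module follows; the products table layout and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, discount INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("a", 30, 0), ("b", 10, 5), ("c", 20, 2), ("d", 5, 0)],
)

# First key: 0 for products in stock, 1 for quantity = 0 (pushed to the end).
# Second key: the discount, within each of the two blocks.
ordered = conn.execute(
    """
    SELECT name FROM products
    ORDER BY CASE WHEN quantity = 0 THEN 1 ELSE 0 END, discount
    """
).fetchall()

names = [row[0] for row in ordered]
print(names)
```

Use "discount DESC" as the second key if the biggest discounts should come first.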