Finding the most frequently occurring combination - sql

I have two tables, Orders and Products. The Orders table contains the orders made by customers, and the products included in each order are in the Products table.
My requirement is to get the total number of orders for the most frequently occurring product combinations.
That is, for Product 1, Product 2 and Product 3, what is the total number of orders? If an order contains 10 products, including Product 1, Product 2 and Product 3, that order should be counted.
An order_id can have multiple products, and I'm confused about how to get this result. Can anyone share or suggest a solution?
I'm using PostgreSQL.
Below is the sample query:
SELECT
"Orders"."order_id",pr.product_name
FROM
"data"."orders" AS "Orders"
LEFT JOIN data.items i On i."order_id"="Orders"."order_id"
LEFT join data.products pr on pr."product_id"=i."product_id"
WHERE TO_CHAR("Orders"."created_at_order",'YYYY-MM-DD') BETWEEN '2019-02-01' AND '2019-04-30'
ORDER BY "Orders"."order_id"
The desired result looks like this (3 columns): the most purchased product combination with the number of orders in which it occurs.
Product 1, Product 2, Product 3, etc..... , Number Of Orders
This is the sample data output. I need the list of products that are purchased together most often. (For now I have shown only 3 columns as a sample, but the number may vary with the number of products in an order.)
and example

SELECT
"Orders"."order_id",
string_agg(DISTINCT pr.product_name::character varying, ',') AS product_name,
count(1) AS product_no
FROM
"data"."orders" AS "Orders"
LEFT JOIN data.items i On i."order_id"="Orders"."order_id"
LEFT join data.products pr on pr."product_id"=i."product_id"
WHERE TO_CHAR("Orders"."created_at_order",'YYYY-MM-DD') BETWEEN '2019-02-01' AND '2019-04-30'
GROUP BY "Orders"."order_id"
ORDER BY count(1);
You can try using a GROUP BY clause.

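If you want the number of orders per product in general, count the orders grouped by product in the table that links orders to products (data.items in your query), not in the products table itself, where each product appears only once. To get the most frequent product, sort in descending order. The query should look something like this:
SELECT product_id, COUNT(DISTINCT order_id) AS order_count
FROM data.items
GROUP BY product_id
ORDER BY order_count DESC
LIMIT 1;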
Hope this helps!

Try using GROUP BY and take the most-counted value, as below:
SELECT
pr.product_name,
COUNT(DISTINCT "Orders"."order_id") AS order_count
FROM
"data"."orders" AS "Orders"
LEFT JOIN data.items i ON i."order_id"="Orders"."order_id"
LEFT JOIN data.products pr ON pr."product_id"=i."product_id"
WHERE TO_CHAR("Orders"."created_at_order",'YYYY-MM-DD') BETWEEN '2019-02-01' AND '2019-04-30'
GROUP BY pr.product_name
ORDER BY COUNT(DISTINCT "Orders"."order_id") DESC
LIMIT 1 -- You can use the LIMIT or not, as per requirement
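The answers above count orders per single product, while the question asks about combinations. A minimal sketch of the combination count follows, assuming a simplified, hypothetical items table (order_id, product_name) held in an in-memory SQLite database rather than the asker's PostgreSQL schema. It builds the set of products for each order, counts identical sets, and also counts the orders that contain a given set of products among others.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: one row per (order, product) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (order_id INTEGER, product_name TEXT);
    INSERT INTO items VALUES
        (1, 'Product 1'), (1, 'Product 2'), (1, 'Product 3'),
        (2, 'Product 2'), (2, 'Product 1'), (2, 'Product 3'),
        (3, 'Product 1'), (3, 'Product 2'),
        (4, 'Product 3');
""")

# Build the set of distinct products for each order.
products_by_order = {}
query = "SELECT DISTINCT order_id, product_name FROM items"
for order_id, product_name in conn.execute(query):
    products_by_order.setdefault(order_id, set()).add(product_name)

# Count identical combinations; sorting makes the key order-independent.
combo_counts = Counter(tuple(sorted(p)) for p in products_by_order.values())
top_combo, top_orders = combo_counts.most_common(1)[0]

# Count orders that contain ALL of the wanted products (possibly among others).
wanted = {"Product 1", "Product 2"}
orders_containing = sum(1 for p in products_by_order.values() if wanted <= p)

print(top_combo, top_orders)
print(orders_containing)
```

In PostgreSQL the same grouping could probably be done in SQL alone, by building one sorted string per order with string_agg(DISTINCT product_name, ',' ORDER BY product_name) in a subquery and then grouping the outer query by that string.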

Related

Can't sort correctly when using GROUP BY

I have two tables:
Products (id, product_name, option)
Prices (id, product_id, price, shop, available)
Each product can have several prices that each shop enters.
I want to select products and sort them by their lowest price, from low to high.
But this code does not work correctly:
Select
product_name,
Prices.price
FROM Products
LEFT JOIN Prices ON Prices.product_id=Products.id
AND Prices.available="yes"
GROUP BY product_name
ORDER BY Prices.price
LIMIT 0,10
The code above first groups the products by name and then sorts them by price, and that is my problem.
I don't want to show the same product several times.
Is there any solution?
Select
product_name,
MIN(Prices.price) as mprice
FROM Products
LEFT JOIN Prices ON Prices.product_id=Products.id AND Prices.available="yes"
GROUP BY product_name
ORDER BY mprice
LIMIT 0,10
You are close. You only need an aggregation function in the ORDER BY. However, I would also advise you to use table aliases, an INNER JOIN, and single quotes for the string constant:
SELECT p.product_name, MIN(pr.price)
FROM Products p INNER JOIN
Prices pr
ON pr.product_id = p.id
AND pr.available = 'yes'
GROUP BY p.product_name
ORDER BY MIN(pr.price)
LIMIT 0, 10;
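To check that the aggregated query returns each product once, sorted by its lowest available price, here is a small sketch using an in-memory SQLite database. The sample rows are invented for illustration, and LIMIT 10 stands in for MySQL's LIMIT 0, 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE Prices (
        id INTEGER PRIMARY KEY, product_id INTEGER,
        price INTEGER, shop TEXT, available TEXT
    );
    -- Invented sample data: several shops price the same product.
    INSERT INTO Products VALUES (1, 'Laptop'), (2, 'Phone'), (3, 'Tablet');
    INSERT INTO Prices VALUES
        (1, 1, 900, 'A', 'yes'), (2, 1, 850, 'B', 'yes'),
        (3, 2, 300, 'A', 'yes'), (4, 2, 250, 'B', 'no'),
        (5, 3, 400, 'A', 'yes');
""")

# One row per product, ordered by the lowest price among available offers.
rows = conn.execute("""
    SELECT p.product_name, MIN(pr.price) AS min_price
    FROM Products p
    INNER JOIN Prices pr
        ON pr.product_id = p.id
       AND pr.available = 'yes'
    GROUP BY p.product_name
    ORDER BY min_price
    LIMIT 10
""").fetchall()

print(rows)
```

The Phone offer at 250 is ignored because it is not available, so Phone is listed at 300.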
If you GROUP BY, you won't get the different prices for the same product. You could take the MAX (or MIN) value or count them using aggregate functions, but you won't get every price this way. Also, you should have an intermediary table, since a product can be in different models with different prices. And I don't get why you use a name attribute. Why don't you just use the product ID? You could do something like:
SELECT products.product_name, prices.price
FROM products, prices
WHERE products.ID = prices.product_ID
ORDER BY prices.price ASC;

SQL based Northwind, hard time on filtering

So in a practice site there is a question:
Which Product is the most popular? (number of items)
This means that there are customers, and they want to know the product most often ordered by those customers (the overall orders of the top 1 ordered product).
I sincerely do not know how to solve this one.
Any help?
What I've tried so far is:
SELECT TOP(1) ProductID, ProductName
FROM Products
GROUP BY ProductID, ProductName
ORDER BY COUNT(*) DESC
But that's far from what they have asked.
In this one, I just get the top 1 product with the lowest count, but that doesn't say anything about the customers who ordered this product.
It only means that this specific item could have been at a low quantity and still be lower than the others, while the others were at a very high quantity and are now just low (but still not low enough).
I hope I was clear enough.
If the data exists in that table, you might just need to order by something more sophisticated than count, like summing the quantity (if that column exists). Also, if ProductID and ProductName are already unique identifiers, note that you don't need the group by and sum at all.
SELECT TOP(1) ProductID, ProductName
FROM Products
GROUP BY ProductID, ProductName
ORDER BY SUM(Quantity) DESC
I don't know what your keys are, but it sounds like you actually want to be counting how many times it was ordered by customers, so you may need to join on the Customers table. I am assuming here that you have a table Orders, that has one line per order and shares the ProductID key. I also assume that ProductID is unique in Products (which may not be true based on your first query).
SELECT TOP(1) Products.ProductID, Products.ProductName
FROM Products
LEFT JOIN Orders
ON Orders.ProductID = Products.ProductID
GROUP BY Products.ProductID, Products.ProductName
ORDER BY COUNT(Orders.OrderID) DESC
This really depends on what tables and keys you have available to you.
Select top 1 P.ProductID,P.ProductName,Sum(OD.Quantity)AS Quantity
From [Order Details] OD
inner join Products P ON P.ProductID = OD.ProductID
Group By P.ProductID,P.ProductName
Order by Quantity Desc
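A small sketch of the quantity-based approach, run against an in-memory SQLite database. The table is named OrderDetails here instead of [Order Details], the rows are invented, and LIMIT 1 replaces SQL Server's TOP 1, so treat it as an illustration rather than the Northwind schema itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
    -- Invented sample data.
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Tofu');
    INSERT INTO OrderDetails VALUES
        (100, 1, 10), (101, 1, 5),
        (100, 2, 12),
        (101, 3, 3), (102, 3, 20);
""")

# Sum the ordered quantity per product and keep the largest total.
top_product = conn.execute("""
    SELECT p.ProductID, p.ProductName, SUM(od.Quantity) AS TotalQuantity
    FROM OrderDetails od
    INNER JOIN Products p ON p.ProductID = od.ProductID
    GROUP BY p.ProductID, p.ProductName
    ORDER BY TotalQuantity DESC
    LIMIT 1
""").fetchone()

print(top_product)
```

Note that LIMIT 1 (like TOP 1) returns a single row even when two products tie for first place.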
You can work out something like this (table names/schema may differ):
with cte_product
as
(
select P.ProductID, Rank() over (order by Count(1) desc) as Rank from
Orders O
inner join Product P
on P.ProductID = O.ProductID
group by P.ProductID
)
)
select P.productID, P.ProductName from
cte_product ct
inner join product p
on ct.productId = p.ProductID
where ct.Rank = 1
The crux is the use of RANK() to get the most popular product. You can fetch any other columns you need using the relevant joins.

Get last record making multiple searches in the same table

I have one table called Products.
Fields
product_id
Type (IN or OUT)
Date (date of registration)
The table has several rows, recording product entries (IN) and outputs (OUT) with their respective dates.
How do I find products that do not have OUTPUT movement after the LAST ENTRY?
I already tried:
SELECT Products.product_id, Products.Type, MAX(Products.Date)
FROM Products PRODUCTS_1
LEFT OUTER JOIN Products PRODUCTS
ON PRODUCTS_1.Product_Id = PRODUCTS.Product_Id
AND PRODUCTS_1.Type='O'
WHERE (PRODUCTS.Type='I')
AND (PRODUCTS_1.Date>PRODUCTS.Date)
GROUP BY Products.product_id, Products.Type;
This query will list all products for which the latest entry is I. I think that's what you are asking for.
SELECT p.product_id
FROM products p
GROUP BY p.product_id
HAVING MAX(Date) = MAX(CASE WHEN Type = 'I' THEN Date END)
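The HAVING comparison can be tried on a few rows to see which products it keeps. This is a sketch on an in-memory SQLite database; the table name movements and the columns move_type and move_date are hypothetical stand-ins for the Products table and its Type and Date fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movements (product_id INTEGER, move_type TEXT, move_date TEXT);
    -- Invented sample data, dates in ISO format so they sort as text.
    INSERT INTO movements VALUES
        (1, 'I', '2020-01-01'), (1, 'O', '2020-01-05'), (1, 'I', '2020-01-10'),
        (2, 'I', '2020-01-01'), (2, 'O', '2020-01-03'),
        (3, 'I', '2020-02-01'),
        (4, 'O', '2020-01-01');
""")

# Keep products whose latest movement overall is also their latest entry ('I').
rows = conn.execute("""
    SELECT product_id
    FROM movements
    GROUP BY product_id
    HAVING MAX(move_date) = MAX(CASE WHEN move_type = 'I' THEN move_date END)
    ORDER BY product_id
""").fetchall()
product_ids = [r[0] for r in rows]

print(product_ids)
```

Product 2 is dropped because an output follows its last entry, and product 4 is dropped because it has no entry at all (the CASE yields NULL). One caveat: if an entry and an output share the same date, the product is still returned, so a timestamp column would be safer than a plain date.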

Aggregate after join without duplicates

Consider this query:
select
count(p.id),
count(s.id),
sum(s.price)
from
(select * from orders where <condition>) as s,
(select * from products where <condition>) as p
where
s.id = p.order;
There are, for example, 200 records in products and 100 in orders (one order can contain one or more products).
I need to join them and then:
1) count products (should return 200)
2) count orders (should return 100)
3) sum one of the orders fields (should return the sum of 100 prices)
The problem is that after the join, p and s have the same length. For 2) I can write count(distinct s.id), but for 3) I get duplicates (for example, if a sale has 2 products, its price is summed twice), so sum works on the entire 200-record set when it should only cover 100.
Any thoughts on how to sum only distinct records from the joined table without breaking the other selects?
Example, joined table has
id sale price
0 0 4
0 0 4
1 1 3
2 2 4
2 2 4
2 2 4
So the sum(s.price) will return:
4+4+3+4+4+4=23
but I need:
4+3+4=11
If the products table is really more of an "order lines" table, then the query would make sense. You can do what you want in several ways. Here I'm going to suggest conditional aggregation:
select count(distinct p.id), count(distinct s.id),
sum(case when seqnum = 1 then s.price end)
from (select o.* from orders o where <condition>) s join
(select p.*, row_number() over (partition by p.order order by p.order) as seqnum
from products p
where <condition>
) p
on s.id = p.order;
Normally, a table called "products" would have one row per product, with things like a description and name. A table called something like "OrderLines" or "OrderProducts" or "OrderDetails" would have the products within a given order.
You are not interested in single product records, but only in their number. So join the aggregate (one record per order) instead of the single rows:
select
count(*) as count_orders,
sum(p.cnt) as count_products,
sum(s.price)
from orders as s
join
(
select order, count(*) as cnt
from products
where <condition>
group by order
) as p on p.order = s.id
where <condition>;
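The effect of joining a pre-aggregated subquery can be checked against the example numbers from the question (prices 4, 3 and 4 with 2, 1 and 3 products). This sketch uses an in-memory SQLite database, omits the <condition> filters, and names the foreign key order_id (an assumption, since order is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE products (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (0, 4), (1, 3), (2, 4);
    INSERT INTO products (order_id) VALUES (0), (0), (1), (2), (2), (2);
""")

# Naive join: each order's price is repeated once per product row.
naive_sum = conn.execute("""
    SELECT SUM(s.price)
    FROM orders s
    JOIN products p ON p.order_id = s.id
""").fetchone()[0]

# Aggregate products to one row per order first, then join.
count_orders, count_products, total_price = conn.execute("""
    SELECT COUNT(*), SUM(p.cnt), SUM(s.price)
    FROM orders s
    JOIN (
        SELECT order_id, COUNT(*) AS cnt
        FROM products
        GROUP BY order_id
    ) p ON p.order_id = s.id
""").fetchone()

print(naive_sum)
print(count_orders, count_products, total_price)
```

The naive join gives 23, while the pre-aggregated join gives 3 orders, 6 products and a price total of 11, matching the figures in the question.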
Your main problem is the table design. You currently have no way of knowing the price of a product if there were no sales of it. The price should be in the product table: a product costs a certain price. Then you can count all the products of a sale and also get the total price of the sale.
Also, why are you using subqueries? Depending on the database, the optimizer may not be able to use indexes when joining two subqueries. If your joins are that complicated, use views; in many databases they can be indexed.

I need to compare COUNT() results in Access SQL and perform functions based on the result

I have the following tables:
VENDOR: VENDOR_ID, VENDOR_NAME
PRODUCT: PRODUCT_ID, PRODUCT_DESC, VENDOR_ID
ITEM: ITEM_ID, STORE_ID, PRODUCT_ID, ITEM_PRICE
STORE: STORE_ID, STORE_NAME, STORE_LOCATION
The VENDOR table stores information about each product's vendor, the PRODUCT table stores information about the products, and the ITEM table is essentially the store's inventory, with a record for each item in stock. If the same product appears more than once in the inventory, each copy has a different ITEM_ID; ITEM_ID and PRODUCT_ID together form the primary key. The STORE table stores store information :D
I want to list the names of the vendors who provide products to the store that has the greatest range of products. So, I think I'll need to use a COUNT function to count the number of distinct PRODUCT_IDs in each STORE's ITEM records.
I don't really know how to get started on this, I would appreciate a bit of help.
This is what I have so far:
SELECT DISTINCT VENDOR.VENDOR_NAME AS [Vendor Name]
FROM VENDOR, PRODUCT, ITEM, STORE
WHERE STORE.STORE_ID
IN (SELECT STORE_ID
FROM ITEM);
This returns all the vendors, but I need to add a COUNT or MAX function in there, but I'm not sure how to go about doing that. Any help would be greatly appreciated.
SELECT Z.VENDOR_NAME AS VENDOR_NAME
FROM
(
SELECT B.PRODUCT_ID,A.STORE_ID
FROM
(
SELECT A.STORE_ID,MAX(COUNT_PRODUCTS_PER_EACH_STORE) AS MAX_COUNT_PRODUCTS_PER_EACH_STORE
FROM
(
SELECT STORE_ID,COUNT(DISTINCT PRODUCT_ID) AS COUNT_PRODUCTS_PER_EACH_STORE
FROM ITEM
GROUP BY STORE_ID
) A
) A,
ITEM B
WHERE A.STORE_ID = B.STORE_ID
) X,
PRODUCT Y,
VENDOR Z
WHERE X.PRODUCT_ID = Y.PRODUCT_ID
AND Y.VENDOR_ID = Z.VENDOR_ID;
To break it down, you first want to get "the store that has the greatest range of products". Since Access doesn't allow COUNT(DISTINCT ...), you need to use a subquery to get distinct records before counting:
SELECT TOP 1
Store_ID,
COUNT(Product_ID) AS StoreProducts
FROM ( SELECT DISTINCT Store_ID, Product_ID
FROM Item
) i
GROUP BY Store_ID
ORDER BY COUNT(Product_ID) DESC;
This returns the store_ID of the store that has the highest number of distinct products.
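To see why the DISTINCT subquery matters, here is a sketch on an in-memory SQLite database with invented rows. SQLite does support COUNT(DISTINCT ...), so it is used here only as a convenient way to run the subquery form; LIMIT 1 replaces Access's TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (Item_ID INTEGER PRIMARY KEY, Store_ID INTEGER, Product_ID INTEGER);
    -- Invented data: store 3 stocks the most items but only one product.
    INSERT INTO Item (Store_ID, Product_ID) VALUES
        (1, 10), (1, 10), (1, 11),
        (2, 10), (2, 11), (2, 12),
        (3, 10), (3, 10), (3, 10), (3, 10);
""")

# Counting raw item rows picks the store with the most stock.
by_items = conn.execute("""
    SELECT Store_ID, COUNT(Product_ID) AS StoreItems
    FROM Item
    GROUP BY Store_ID
    ORDER BY COUNT(Product_ID) DESC
    LIMIT 1
""").fetchone()

# De-duplicating (store, product) pairs first counts the product range.
by_range = conn.execute("""
    SELECT Store_ID, COUNT(Product_ID) AS StoreProducts
    FROM (SELECT DISTINCT Store_ID, Product_ID FROM Item) AS i
    GROUP BY Store_ID
    ORDER BY COUNT(Product_ID) DESC
    LIMIT 1
""").fetchone()

print(by_items)
print(by_range)
```

Without the subquery, store 3 wins with 4 item rows; with it, store 2 wins with 3 distinct products.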
Then you need to get all the vendor_IDs that supply products to any given store (and the number of products supplied, if required):
SELECT Item.Store_ID,
Vendor.Vendor_ID,
Vendor.Vendor_Name,
COUNT(Product.Product_ID) AS ItemsSuppliedToStore
FROM (Item
INNER JOIN Product
ON item.Product_ID = Product.Product_ID)
INNER JOIN Vendor
ON Vendor.Vendor_ID = Product.Vendor_ID
GROUP BY Item.Store_ID, Vendor.Vendor_ID, Vendor.Vendor_Name;
This will give you all vendors by store_ID and the number of items they have at that store.
Then you can combine your two queries:
SELECT Store.Store_ID,
Store.Store_Name,
TopStore.StoreProducts,
Vendor.Vendor_ID,
Vendor.Vendor_Name,
COUNT(Product.Product_ID) AS ItemsSuppliedToStore
FROM ((((SELECT TOP 1
Store_ID,
COUNT(Product_ID) AS StoreProducts
FROM ( SELECT DISTINCT Store_ID, Product_ID
FROM Item
) i
GROUP BY Store_ID
ORDER BY COUNT(Product_ID) DESC
) AS TopStore
INNER JOIN Store
ON Store.Store_ID = TopStore.Store_ID)
INNER JOIN Item
ON Item.Store_ID = TopStore.Store_ID)
INNER JOIN Product
ON item.Product_ID = Product.Product_ID)
INNER JOIN Vendor
ON Vendor.Vendor_ID = Product.Vendor_ID
GROUP BY Store.Store_ID, Store.Store_Name, TopStore.StoreProducts,
Vendor.Vendor_ID, Vendor.Vendor_Name;
SELECT V.VENDOR_NAME, COUNT(I.PRODUCT_ID)
FROM ((VENDOR AS V
INNER JOIN PRODUCT AS P ON V.VENDOR_ID = P.VENDOR_ID)
INNER JOIN ITEM AS I ON P.PRODUCT_ID = I.PRODUCT_ID)
INNER JOIN STORE AS S ON I.STORE_ID = S.STORE_ID
WHERE S.STORE_NAME = 'Store Name'
GROUP BY V.VENDOR_NAME
ORDER BY COUNT(I.PRODUCT_ID) DESC
The above query will get the vendor's name and a count of the product ids. I put the WHERE clause in there so you could specify a certain store. If you want all stores then remove the WHERE.