Shopping basket: Products with options, how to check if a combination exists? - sql

I have a bit of a SQL problem that I'm hoping someone can help with.
On my website someone can order a product and then options for that product, e.g. they buy a car, and options such as tyres and a stereo system. The customer can add multiple items to their basket for the same main item (a car), but with different options for it, such as:
car, pirelli, b&o
car, michelin, b&o
I have this DB structure to achieve this:
orders
----------
order_ref
total

orders_parts
------------
id
order_ref
part_id
quantity

orders_parts_options
--------------------
id
option

orders
------
12345  1000.01

orders_parts
------------
1001  12345  Audi  1
1002  12345  Audi  1

orders_parts_options
--------------------
1001  michelin
1001  b&o
1002  pirelli
1002  b&o
So here you can see I have two Audis in my shopping basket, one with Michelin, one with Pirelli, both with B&O audio systems. My question: let's say another call is made to add an item to the shopping basket for this order, e.g. another Audi with Michelin and B&O. What SQL would I need to get the orders_parts.id of 1001?
I came up with this bit of rubbish:
SELECT op.id FROM orders_parts op
INNER JOIN orders_parts_options opo ON (op.id = opo.id)
WHERE op.order_ref = 12345 AND (opo.option = 'michelin' OR opo.option = 'b&o')
But I get this result
1001
1001
1002
from that. I'm guessing I need to aggregate it and have a having count = 2 in there, but just cannot work it out. Anyone smarter out there who can help me?
(Just to add: the DB is normalized in real life, but for clarity I've put full text values in there.)

SELECT op.id, count(op.id) as n FROM orders_parts op
INNER JOIN orders_parts_options opo ON (op.id = opo.id)
WHERE op.order_ref = 12345 AND (opo.option = 'michelin' OR opo.option = 'b&o')
GROUP BY op.id;
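This query returns each part id for order_ref = 12345 along with n, the number of its options that match 'michelin' or 'b&o'. Adding HAVING COUNT(op.id) = 2 after the GROUP BY keeps only the parts that have both options (1001 in the sample data).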
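If the combination has to match exactly (the part has the requested options and no others), the count can be checked twice in the HAVING clause. Here is a minimal sketch using Python's built-in sqlite3 module and the sample data from the question. The in-memory database and the helper name find_matching_part are assumptions for illustration, and it assumes each part has at least one option and no option is stored twice for the same part.

```python
import sqlite3

# In-memory copy of the sample data from the question (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_parts (id INTEGER, order_ref INTEGER, part_id TEXT, quantity INTEGER);
CREATE TABLE orders_parts_options (id INTEGER, "option" TEXT);
INSERT INTO orders_parts VALUES (1001, 12345, 'Audi', 1), (1002, 12345, 'Audi', 1);
INSERT INTO orders_parts_options VALUES
  (1001, 'michelin'), (1001, 'b&o'), (1002, 'pirelli'), (1002, 'b&o');
""")

def find_matching_part(order_ref, part_id, options):
    """Hypothetical helper: ids of parts whose option set equals `options`."""
    placeholders = ", ".join("?" for _ in options)
    sql = f"""
        SELECT op.id
        FROM orders_parts op
        JOIN orders_parts_options opo ON op.id = opo.id
        WHERE op.order_ref = ? AND op.part_id = ?
        GROUP BY op.id
        -- the part has exactly as many options as requested ...
        HAVING COUNT(*) = ?
        -- ... and every one of them is in the requested list
           AND SUM(CASE WHEN opo."option" IN ({placeholders}) THEN 1 ELSE 0 END) = ?
    """
    params = [order_ref, part_id, len(options), *options, len(options)]
    return [row[0] for row in conn.execute(sql, params)]

matches = find_matching_part(12345, "Audi", ["michelin", "b&o"])
```

With the sample rows, matches contains only 1001, so the basket code could increment that row's quantity instead of inserting a new one.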

Related

POSTGRESQL - Finding specific product when

I've attempted to write a query but I've not managed to get it working correctly.
I'm attempting to retrieve where a specific product has been bought but where it also has been bought with other products. In the case below, I want to find where product A01 has been bought but also when it was bought with other products.
Data (extracted from tables for illustration):
Order | Product
123456 | A01
123457 | A01
123457 | B02
123458 | C03
123459 | A01
123459 | C03
Query which will return all orders with product A01 without showing other products:
SELECT
O.NUMBER,
O.DATE,
P.NUMBER
FROM
ORDERS O
JOIN PRODUCTS P on P.ID = O.ID
WHERE
P.NUMBER = 'A01'
I've tried to create a sub query which brings back just orders of product A01 but I don't know how to place it in the query for it to return all orders containing product A01 as well as any other product ordered with it.
I'd be very grateful for any help with this.
Thanks in advance.
You can use a conditional SUM to detect whether an order group contains one or more 'A01' rows:
CREATE TABLE orders
("Order" int, "Product" varchar(3))
;
INSERT INTO orders
("Order", "Product")
VALUES
(123456, 'A01'),
(123457, 'A01'),
(123457, 'B02'),
(123458, 'C03'),
(123459, 'A01'),
(123459, 'C03')
;
SELECT "Order"
FROM orders
GROUP BY "Order"
HAVING SUM(CASE WHEN "Product" = 'A01' THEN 1 ELSE 0 END) > 0
I appreciated Juan's including the DDL to create the database on my system. By the time I saw it, I'd already done all the same work, except that I got around the reserved word problem by naming that field Order1.
Sadly, neither of the offered queries did what I wanted on my system. I used MySQL.
The first one returned the A01 lines of the two orders on which other products were ordered too. I took Alex's purpose to include seeing all items of all orders that included A01. (Perhaps he wants to tell future customers what other products other customers have ordered with A01, and generate sales that way.)
The second one returned the three A01 lines.
Maybe Alex wants:
select *
from orders
where Order1 in (select Order1
from orders
where Product = 'A01')
It outputs all lines of all orders that include A01. The subquery makes a list of all orders with A01. The outer query returns all lines of those orders.
In a big database, you might not want to run two queries, but this is the only way I see to get the result I understood Alex wanted. If that is what he wanted, he would have to run a second query once armed with output from the queries offered, so there's no real gain.
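For anyone who wants to try the subquery approach without setting up a server, here is a small sketch using Python's built-in sqlite3 module and the sample rows from the question. The column names order_no and product are my own stand-ins (chosen to avoid the reserved word), not names from the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_no INTEGER, product TEXT);
INSERT INTO orders VALUES
  (123456, 'A01'), (123457, 'A01'), (123457, 'B02'),
  (123458, 'C03'), (123459, 'A01'), (123459, 'C03');
""")

# The subquery lists the orders that contain the product;
# the outer query returns every line of those orders.
rows = conn.execute("""
    SELECT order_no, product
    FROM orders
    WHERE order_no IN (SELECT order_no FROM orders WHERE product = ?)
    ORDER BY order_no, product
""", ("A01",)).fetchall()

for order_no, product in rows:
    print(order_no, product)
```

Order 123458 drops out because it has no A01 line, while the B02 and C03 lines of the other orders are kept. Note that this is sent to the database as a single statement.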
Good discussion. Thanks to all!
Use a GROUP BY clause along with HAVING to find the orders that contain more than one product, like:
select "order"
from data
group by "order"
having count(distinct product) > 1;

Semi-hierarchical SQL query with multiple tables and possible outer joins

I have products. Each product is made up of items and assemblies. Assemblies themselves can be made up of items too. So it's a hierarchy but limited in depth. What I would like to do is list products with the items and assemblies it contains, plus any items in the product's assemblies.
This is the output I would like to see. It doesn't have to look exactly like this, but the aim is to show the items in the product, then the assemblies and within each assembly the items with in it. The number of columns isn't fixed, if more are necessary to show the items in the assemblies there is no problem with that.
ProductID ProductName AssemblyID AssemblyName ItemID ItemName
--------- ----------- ---------- ------------ ------ --------
P0001001  Product One
                                              I0045  Item A
                                              I0082  Item B
                      A00023     Assembly 1
                                              I0320  Item 1
                                              I0900  Item 2
                      A00024     Assembly 2
                                              I0877  Item 3
                                              I0900  Item 2
                                              I0042  Item 4
This I can then use to build a report grouped on the Product ID to list the contents of each product.
This is the table structure I have at the moment.
ProductList:       ProductID, ProductName, Price
ProductItems:      ProductID, ItemID
ItemList:          ItemID, ItemName, Cost
ProductAssemblies: ProductID, AssemblyID, BuildTime
AssemblyItems:     AssemblyID, ItemID
AssemblyList:      AssemblyID, AssemblyName

ProductList.ProductID        -> ProductItems.ProductID
ProductList.ProductID        -> ProductAssemblies.ProductID
ProductItems.ItemID          -> ItemList.ItemID
ProductAssemblies.AssemblyID -> AssemblyItems.AssemblyID
ProductAssemblies.AssemblyID -> AssemblyList.AssemblyID
AssemblyItems.ItemID         -> ItemList.ItemID
What kind of SELECT statement would I need to do this?
I think I need some sort of outer join, but I don't know SQL syntax well enough to structure the select statement. All my efforts have led to the product being listed multiple times, once for each combination of item and assembly. So if a product has 3 items and 2 assemblies, the product appears 6 times.
Searching for this kind of problem is not easy as I don't know what to search for. Is it a three-table problem, an outer join issue, or just a simple syntactical answer?
Or would it be better to switch to a pure hierarchical table structure without the use of assemblies? It would then be easier to search on hierarchical tables to solve any problems I might have.
I'm using LibreOffice 3.5.6.2 Base. It has wizards and other helpful things but they don't extend to the complexity of the situation that I find myself in. The aim is that the database contains prices and it can be used to properly price out the products from the cost of the items and time to build the assemblies.
Be gentle, I'm a newbie to SO.
The normal SQL approach to this would put all the data on one line, rather than split among several lines. So, your data would look like:
ProductID ProductName AssemblyID AssemblyName ItemID ItemName
--------- ----------- ---------- ------------ ------ --------
P0001001  Product One                         I0045  Item A
P0001001  Product One                         I0082  Item B
P0001001  Product One A00023     Assembly 1   I0320  Item 1
P0001001  Product One A00023     Assembly 1   I0900  Item 2
. . .
The product and assembly information, for instance, would not be blank for a given item. All would be on the same line.
This information comes from two sources, the product items and the assembly items. The following query gets each component, then unions them together, finally ordering the results by product:
select *
from ((select p.ProductID, p.ProductName, NULL as AssemblyID, NULL as AssemblyName, il.ItemID, il.ItemName
       from ProductList p join
            ProductItems pi
            on p.ProductID = pi.ProductID join
            ItemList il
            on pi.ItemID = il.ItemID
      ) union all
      (select p.ProductID, p.ProductName, al.AssemblyID, al.AssemblyName, il.ItemID, il.ItemName
       from ProductList p join
            ProductAssemblies pa
            on pa.ProductID = p.ProductID join
            AssemblyList al
            on pa.AssemblyID = al.AssemblyID join
            AssemblyItems ai
            on ai.AssemblyID = al.AssemblyID join
            ItemList il
            on ai.ItemID = il.ItemID
      )
     ) t
order by 1, 2, 3, 4, 5, 6
Often, restructuring into the format you want would be done at the app level. You can do it in SQL, but the best approach depends on the database you are using.
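As a rough check of the UNION ALL idea, here is a sketch using Python's built-in sqlite3 module with a cut-down version of the question's tables (only the columns the query needs). The sample rows are taken from the desired output in the question. Which tables they live in is my assumption, and LibreOffice Base's own engine may need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductList (ProductID TEXT, ProductName TEXT);
CREATE TABLE ItemList (ItemID TEXT, ItemName TEXT);
CREATE TABLE AssemblyList (AssemblyID TEXT, AssemblyName TEXT);
CREATE TABLE ProductItems (ProductID TEXT, ItemID TEXT);
CREATE TABLE ProductAssemblies (ProductID TEXT, AssemblyID TEXT);
CREATE TABLE AssemblyItems (AssemblyID TEXT, ItemID TEXT);
INSERT INTO ProductList VALUES ('P0001001', 'Product One');
INSERT INTO ItemList VALUES
  ('I0045', 'Item A'), ('I0082', 'Item B'), ('I0320', 'Item 1'), ('I0900', 'Item 2');
INSERT INTO AssemblyList VALUES ('A00023', 'Assembly 1');
INSERT INTO ProductItems VALUES ('P0001001', 'I0045'), ('P0001001', 'I0082');
INSERT INTO ProductAssemblies VALUES ('P0001001', 'A00023');
INSERT INTO AssemblyItems VALUES ('A00023', 'I0320'), ('A00023', 'I0900');
""")

rows = conn.execute("""
    -- items attached directly to the product (no assembly)
    SELECT p.ProductID, p.ProductName, NULL AS AssemblyID, NULL AS AssemblyName,
           il.ItemID, il.ItemName
    FROM ProductList p
    JOIN ProductItems pi ON pi.ProductID = p.ProductID
    JOIN ItemList il ON il.ItemID = pi.ItemID
    UNION ALL
    -- items reached through one of the product's assemblies
    SELECT p.ProductID, p.ProductName, al.AssemblyID, al.AssemblyName,
           il.ItemID, il.ItemName
    FROM ProductList p
    JOIN ProductAssemblies pa ON pa.ProductID = p.ProductID
    JOIN AssemblyList al ON al.AssemblyID = pa.AssemblyID
    JOIN AssemblyItems ai ON ai.AssemblyID = al.AssemblyID
    JOIN ItemList il ON il.ItemID = ai.ItemID
    ORDER BY 1, 3, 5
""").fetchall()

for row in rows:
    print(row)
```

Each item shows up once per product (or once per assembly it belongs to), so the product is no longer multiplied by every item/assembly combination. In SQLite the NULL assembly columns sort first, which puts the direct items ahead of the assembly items. Other databases may order NULLs differently.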

SELECT datafields with multiple groups and sums

I can't seem to group by multiple data fields and sum a particular grouped column.
I want to group Person to customer and then group customer to price and then sum price. The person with the highest combined sum(price) should be listed in ascending order.
Example:
table customer
-----------
customer | common_id
green 2
blue 2
orange 1
table invoice
----------
person | price | common_id
bob 2330 1
greg 360 2
greg 170 2
SELECT DISTINCT
min(person) As person,min(customer) AS customer, sum(price) as price
FROM invoice a LEFT JOIN customer b ON a.common_id = b.common_id
GROUP BY customer,price
ORDER BY person
The results I desire are:
**BOB:**
Orange, $2230
**GREG:**
green, $360
blue,$170
The colors are the customer, that GREG and Bob handle. Each color has a price.
There are two issues that I can see. One is a bit picky, and one is quite fundamental.
Presentation of data in SQL
SQL returns tabular data sets. It's not able to return subsets with headings, looking something like a pivot table.
This means that this is not possible...
**BOB:**
Orange, $2230
**GREG:**
green, $360
blue, $170
But that this is possible...
Bob, Orange, $2230
Greg, Green, $360
Greg, Blue, $170
Relating data
I can visually see how you relate the data together...
table customer           table invoice
--------------           -------------
customer | common_id     person | price | common_id
green      2             greg     360     2
blue       2             greg     170     2
orange     1             bob      2330    1
But SQL doesn't have any implied ordering. Things can only be related if an expression can state that they are related. For example, the following is equally possible...
table customer           table invoice
--------------           -------------
customer | common_id     person | price | common_id
green      2             greg     170     2          \ These two have
blue       2             greg     360     2          / been swapped
orange     1             bob      2330    1
This means that you need rules (and likely additional fields) that explicitly state which customer record matches which invoice record, especially when there are multiples in both with the same common_id.
An example of a rule could be: the lowest price always matches with the first customer alphabetically. But then, what happens if you have three records in customer for common_id = 2, but only two records in invoice for common_id = 2? Or does the number of records always match, and do you enforce that?
Most likely you need an extra piece (or pieces) of information to know which records relate to each other.
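To make the example rule concrete (lowest price pairs with the first customer alphabetically, within each common_id), here is a sketch using Python's built-in sqlite3 module. The rule is only the illustration from above, not something the data model guarantees. The sketch assumes there are no duplicate customers or tied prices within a common_id, and it silently drops any rows left unpaired when the counts differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer TEXT, common_id INTEGER);
CREATE TABLE invoice (person TEXT, price INTEGER, common_id INTEGER);
INSERT INTO customer VALUES ('green', 2), ('blue', 2), ('orange', 1);
INSERT INTO invoice VALUES ('bob', 2330, 1), ('greg', 360, 2), ('greg', 170, 2);
""")

rows = conn.execute("""
    SELECT i.person, c.customer, i.price
    FROM (SELECT c1.*,
                 -- position of this customer, alphabetically, within its common_id
                 (SELECT COUNT(*) FROM customer c2
                  WHERE c2.common_id = c1.common_id
                    AND c2.customer <= c1.customer) AS rn
          FROM customer c1) c
    JOIN (SELECT i1.*,
                 -- position of this invoice, by ascending price, within its common_id
                 (SELECT COUNT(*) FROM invoice i2
                  WHERE i2.common_id = i1.common_id
                    AND i2.price <= i1.price) AS rn
          FROM invoice i1) i
      ON i.common_id = c.common_id AND i.rn = c.rn
    ORDER BY i.person, i.price DESC
""").fetchall()

for person, customer, price in rows:
    print(person, customer, price)
```

The pairing only exists because the rule was written into the query. Change the rule and blue and green swap prices, which is why a real key linking the two tables would be the safer fix.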
You should group by all of your selected fields except the sum; then the group_concat function (MySQL) may help you concatenate the resulting rows of each group.
I'm not sure how you could possibly do this. Greg has 2 colors AND 2 prices, so how do you determine which goes with which?
Greg Blue 170 or Greg Blue 360? Or attaching Green to either price?
I think the colors need to have unique identifiers, separate from the person's unique identifiers.
Just a thought.

Making a query more efficient for reads

I have a data model like the following:
username | product1 | product2
-------------------------------
harold abc qrs
harold abc def
harold def abc
kim abc def
kim lmn qrs
...
username | friend_username
---------------------------
john harold
john kim
...
I want to build a histogram of the most frequent product1 to product2 records there are, restricted to a given product1 id, and restricted only to friends of john. So something like:
What do friends of john link to for product1, when product1='abc':
Select all of john's friends from the friends table. For each friend, count and group the number of records where product1 = 'abc', sort results in desc order:
Results:
abc -> def (2 instances)
abc -> qrs (1 instance)
I know we can do the following in a relational database, but there will be some threshold where this kind of query will start utilizing a lot of resources. Users might have a large number of friend records (500+). If this query is running 5 times every time a user loads a page, I'm worried I'll run out of resources quickly.
Is there some other table I can introduce to my model to relieve the overhead of running the above query every time users want to see the histogram breakdown? All I can think of is to precompute the histograms when possible so that reads are optimized.
Thanks for any ideas
Here's your query:
SELECT p.product2,
COUNT(p.product2) AS num_product
FROM PRODUCTS p
JOIN FRIENDS f ON f.friend_username = p.username
AND f.username = 'john'
WHERE p.product1 = 'abc'
GROUP BY p.product2
ORDER BY num_product DESC
To handle 5 products, use:
SELECT p.product1,
p.product2,
COUNT(p.product2) AS num_product
FROM PRODUCTS p
JOIN FRIENDS f ON f.friend_username = p.username
AND f.username = 'john'
WHERE p.product1 IN ('abc', 'def', 'ghi', 'jkl', 'mno')
GROUP BY p.product1, p.product2
ORDER BY num_product DESC
It's pretty simple, and the more you can filter the records down, the faster it will run, because it works on a smaller dataset.
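Here is the single-product query run against the sample rows from the question, as a sketch using Python's built-in sqlite3 module. The table names products and friends follow the query above; the in-memory setup is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (username TEXT, product1 TEXT, product2 TEXT);
CREATE TABLE friends (username TEXT, friend_username TEXT);
INSERT INTO products VALUES
  ('harold', 'abc', 'qrs'), ('harold', 'abc', 'def'), ('harold', 'def', 'abc'),
  ('kim', 'abc', 'def'), ('kim', 'lmn', 'qrs');
INSERT INTO friends VALUES ('john', 'harold'), ('john', 'kim');
""")

# Count product2 values linked from one product1, restricted to one user's friends.
histogram = conn.execute("""
    SELECT p.product2, COUNT(*) AS num_product
    FROM products p
    JOIN friends f ON f.friend_username = p.username
                  AND f.username = ?
    WHERE p.product1 = ?
    GROUP BY p.product2
    ORDER BY num_product DESC
""", ("john", "abc")).fetchall()

for product2, num_product in histogram:
    print("abc ->", product2, num_product)
```

This gives def with 2 instances and qrs with 1, matching the expected results in the question.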
"If this query is running 5 times every time a user loads a page, I'm worried I'll run out of resources quickly."
My first question is why you'd run this query more than once per page. If it's to cover more than one friend, the query I posted can be updated to expose counts for products on a per friend or user basis.
After that, I'd wonder if the query can be cached at all. How fresh do you really need the data to be - is 2 hours acceptable? How about 6 or 12... We'd all like the data to be instantaneous, but you need to weigh that against performance and make a decision.
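If some staleness is acceptable, one option is a small time-based cache in the application, in front of the query. The sketch below is only an assumption about how that could look in Python. The class name TTLCache and its methods are made up for illustration, and a shared cache or a precomputed summary table could serve the same purpose.

```python
import time

class TTLCache:
    """Keeps computed values for ttl_seconds before recomputing them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: skip the database entirely
        value = compute()        # expired or missing: run the real query
        self._entries[key] = (now, value)
        return value

# Usage: cache each (user, product1) histogram for two hours.
cache = TTLCache(ttl_seconds=2 * 60 * 60)
result = cache.get_or_compute(("john", "abc"), lambda: [("def", 2), ("qrs", 1)])
```

The lambda stands in for whatever function actually runs the histogram query. The trade-off is the one described above: a longer TTL means fewer queries but older data.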

SQL Query Advice - Most recent item

I have a table where I store customer sales (on periodicals, like newspaper) data. The product is stored by issue. Example
custid prodid issue qty datesold
1 123 2 12 01052008
2 234 1 5 01022008
1 123 1 5 01012008
2 444 2 3 02052008
How can I retrieve the last issue of every product for a specific customer, and what is the fastest way to do it? Can I have samples for both SQL Server 2000 and 2005? Please note, the table has over 500k rows.
Thanks
Assuming that "latest" is determined by date (rather than by issue number), this method is usually pretty fast, assuming decent indexes:
SELECT
T1.prodid,
T1.issue
FROM
Sales T1
LEFT OUTER JOIN dbo.Sales T2 ON
T2.custid = T1.custid AND
T2.prodid = T1.prodid AND
T2.datesold > T1.datesold
WHERE
T1.custid = @custid AND
T2.custid IS NULL
Handling 500k rows is something that a laptop can probably handle without trouble, let alone a real server, so I'd stay clear of denormalizing your database for "performance". Don't add extra maintenance, inaccuracy, and most of all headaches by tracking a "last sold" somewhere else.
EDIT: I forgot to mention... this doesn't specifically handle cases where two issues have the same exact datesold. You might need to tweak it based on your business rules for that situation.
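To see the self-join in action, here is a sketch using Python's built-in sqlite3 module rather than SQL Server, so the parameter is a ? placeholder instead of a variable. The sample rows come from the question, but I've rewritten datesold as ISO strings (reading 01052008 as 1 May 2008, which is an assumption) so that plain comparison orders them correctly. The helper name latest_issues is also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (custid INTEGER, prodid INTEGER, issue INTEGER, qty INTEGER, datesold TEXT);
INSERT INTO Sales VALUES
  (1, 123, 2, 12, '2008-05-01'),
  (2, 234, 1, 5,  '2008-02-01'),
  (1, 123, 1, 5,  '2008-01-01'),
  (2, 444, 2, 3,  '2008-05-02');
""")

def latest_issues(custid):
    """Hypothetical helper: (prodid, issue) of the newest sale per product."""
    return conn.execute("""
        SELECT t1.prodid, t1.issue
        FROM Sales t1
        LEFT JOIN Sales t2
          ON t2.custid = t1.custid
         AND t2.prodid = t1.prodid
         AND t2.datesold > t1.datesold   -- t2 is a newer sale of the same product
        WHERE t1.custid = ?
          AND t2.custid IS NULL          -- keep rows for which no newer sale exists
        ORDER BY t1.prodid
    """, (custid,)).fetchall()

print(latest_issues(1))
print(latest_issues(2))
```

For customer 1 only issue 2 of product 123 survives, because the issue 1 row finds a newer sale to join to. An index covering custid, prodid and datesold is the kind of "decent index" this pattern relies on.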
Generic SQL; SQL Server's syntax shouldn't be much different:
SELECT prodid, max(issue) FROM sales WHERE custid = ? GROUP BY prodid;
Is this a new project? If so, I would be wary of setting up your database like this and read up a bit on normalization, so that you might end up with something like this:
CustID LastName FirstName
------ -------- ---------
1 Woman Test
2 Man Test
ProdID ProdName
------ --------
123 NY Times
234 Boston Globe
ProdID IssueID PublishDate
------ ------- -----------
123 1 12/05/2008
123 2 12/06/2008
CustID OrderID OrderDate
------ ------- ---------
1 1 12/04/2008
OrderID ProdID IssueID Quantity
------- ------ ------- --------
1 123 1 5
2 123 2 12
I'd have to know your database better to come up with a better schema, but it sounds like you're building too many things into a flat table, which will cause lots of issues down the road.
If you're looking for the most recent sale by date, maybe this is what you need:
SELECT prodid, issue
FROM Sales
WHERE custid = @custid
AND datesold = (SELECT MAX(datesold)
FROM Sales s
WHERE s.prodid = Sales.prodid
AND s.custid = @custid)
Querying an existing, ever-growing historical table is way too slow!
I strongly suggest you create a new table, tblCustomerSalesLatest, which stores the last issue data for each customer, and select from there.