Designing and Querying Product / Review system - sql

I created a product / review system from scratch and I'm having a hard time writing the following query in SQL Server.
My schema has separate tables for products, reviews, categories, product photos and brands. I have to query them all to get the brand and category names, the photo details, the average rating and the number of reviews.
The part I can't get right is the number of reviews and the average rating.
Reviews can be hidden (the user has deleted them) or blocked (waiting for moderation). My product table doesn't have number-of-reviews or average-rating columns, so I need to compute them in this query, without counting the blocked and hidden ones (that is, only rows with r.bloqueado=0 and r.hidden=0).
I have the query below, but it counts the blocked and hidden reviews too. If I uncomment the "and r.bloqueado=0 and r.hidden=0" part I get the right count, but then products that have 0 reviews disappear (and I need them!).
select top 20
p.id, p.brand, m.nome, c.name,
count(r.product) AS NoReviews, Avg(r.nota) AS AvgRating,
f.id as cod_foto,f.nome as nome_foto
from
tblBrands AS m
inner join
(tblProducts AS p
left join
tblProductsReviews AS r ON p.id = r.product) ON p.brand = m.id
left join
tblProductsCategorias as c on p.categoria = c.id
left join
(select
id_product, id, nome
from
tblProductsFotos O
where
id = (SELECT min(I.id)
FROM tblProductsFotos I
WHERE I.id_product = O.id_product)) as f on p.id = f.id_product
where
p.bloqueado = 0
-- Problem: and r.bloqueado=0 and r.hidden=0
group by
p.id, p.brand, p.modalidade, m.nome, c.name, f.id,f.nome
Need your advice:
I have seen other systems that store the average rating and number of reviews in the product table. That would reduce the complexity of this query a lot (and probably help performance), but then I would have to run extra queries on every new review and on every block or hide action. I can easily do that. Since inserts and updates happen much less often than product listings, this sounds attractive.
Would that be a better idea?
Or is it better to find a way to fix this query? Can you help me find a solution?
Thanks

To count the reviews you can use SUM over a CASE expression that yields 1 when the review is visible (r.bloqueado=0 and r.hidden=0) and 0 otherwise, so you can avoid the filter in the WHERE clause. Products without reviews get NULLs from the left join, fall into the ELSE branch and end up with a count of 0.
select top 20 p.id, p.brand, m.nome, c.name, sum(
case when r.bloqueado=0 and r.hidden=0 then 1
else 0
end ) AS NoReviews,
Avg(case when r.bloqueado=0 and r.hidden=0 then r.nota end) AS AvgRating, f.id as cod_foto,f.nome as nome_foto
from tblBrands AS m
inner join (tblProducts AS p
left join tblProductsReviews AS r ON p.id=r.product ) ON p.brand = m.id
left join tblProductsCategorias as c on p.categoria=c.id
left join (select id_product,id,nome from tblProductsFotos O
where id = (SELECT min(I.id) FROM tblProductsFotos I
WHERE I.id_product = O.id_product)) as f on p.id = f.id_product where p.bloqueado=0
group by p.id, p.brand, p.modalidade, m.nome, c.name, f.id,f.nome
For the average you can do something similar: a CASE without an ELSE returns NULL for blocked or hidden reviews, and AVG ignores NULLs.
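As a runnable sketch of the conditional-aggregation idea, the snippet below uses SQLite through Python's sqlite3 module rather than SQL Server, with a cut-down hypothetical schema (`products`, `reviews`) that keeps only the columns involved. It is meant to show the shape of the query, not the asker's full statement.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns needed for the counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        product INTEGER,
        nota REAL,
        bloqueado INTEGER,
        hidden INTEGER
    );
    INSERT INTO products (id) VALUES (1), (2), (3);
    -- Product 1: two visible reviews plus a blocked one.
    -- Product 2: a single hidden review. Product 3: no reviews at all.
    INSERT INTO reviews (product, nota, bloqueado, hidden) VALUES
        (1, 4, 0, 0), (1, 2, 0, 0), (1, 5, 1, 0), (2, 1, 0, 1);
""")


def review_stats(conn):
    """Count and average only visible reviews, keeping products with none."""
    return conn.execute("""
        SELECT p.id,
               SUM(CASE WHEN r.bloqueado = 0 AND r.hidden = 0 THEN 1 ELSE 0 END) AS NoReviews,
               AVG(CASE WHEN r.bloqueado = 0 AND r.hidden = 0 THEN r.nota END) AS AvgRating
        FROM products AS p
        LEFT JOIN reviews AS r ON r.product = p.id
        GROUP BY p.id
        ORDER BY p.id
    """).fetchall()


stats = review_stats(conn)
print(stats)  # [(1, 2, 3.0), (2, 0, None), (3, 0, None)]
```

Every product stays in the result, and the blocked and hidden reviews affect neither the count nor the average.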

It's very easy to lose records when combining a where clause with an outer join. Rows that do not exist in the outer table are returned as NULL. Your filter has accidentally excluded these nulls.
Here's an example that demonstrates what's happening:
/* Sample data.
* There are two tables: product and review.
* There are two products: 1 & 2.
* Only product 1 has a review.
*/
DECLARE @Product TABLE
(
ProductId INT
)
;
DECLARE @Review TABLE
(
ReviewId INT,
ProductId INT,
Blocked BIT
)
;
INSERT INTO @Product
(
ProductId
)
VALUES
(1),
(2)
;
INSERT INTO @Review
(
ReviewId,
ProductId,
Blocked
)
VALUES
(1, 1, 0)
;
Outer joining the tables, without a where clause, returns:
Query
-- No where.
SELECT
p.ProductId,
r.ReviewId,
r.Blocked
FROM
@Product AS p
LEFT OUTER JOIN @Review AS r ON r.ProductId = p.ProductId
;
Result
ProductId ReviewId Blocked
1         1        0
2         NULL     NULL
Filtering for Blocked = 0 would remove the second record, and therefore ProductId 2. Instead, keep the NULLs explicitly:
-- With where.
SELECT
p.ProductId,
r.ReviewId,
r.Blocked
FROM
@Product AS p
LEFT OUTER JOIN @Review AS r ON r.ProductId = p.ProductId
WHERE
r.Blocked = 0
OR r.Blocked IS NULL
;
This query retains the NULL value, and ProductId 2. Your example is a little more complicated because you have two fields.
SELECT
...
WHERE
(
Blocked = 0
AND Hidden = 0
)
OR Blocked IS NULL
;
You do not need to check both fields for NULL, as they appear in the same table.
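One caveat with the `OR ... IS NULL` filter is worth checking: a product whose only reviews are blocked still has joined rows, none of them NULL, so it drops out of the result. Moving the condition into the ON clause avoids that. The sketch below compares the two forms using SQLite through Python's sqlite3 module (an assumption made so the example is runnable; the table names mirror the sample data above, plus a hypothetical third product).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductId INTEGER PRIMARY KEY);
    CREATE TABLE Review (
        ReviewId INTEGER PRIMARY KEY,
        ProductId INTEGER,
        Blocked INTEGER NOT NULL
    );
    INSERT INTO Product VALUES (1), (2), (3);
    -- Product 1 has a visible review, product 3 only a blocked one, product 2 none.
    INSERT INTO Review VALUES (1, 1, 0), (2, 3, 1);
""")

# Filter in WHERE: product 3 disappears because its only joined row is blocked.
where_rows = conn.execute("""
    SELECT p.ProductId, COUNT(r.ReviewId)
    FROM Product AS p
    LEFT JOIN Review AS r ON r.ProductId = p.ProductId
    WHERE r.Blocked = 0 OR r.Blocked IS NULL
    GROUP BY p.ProductId
    ORDER BY p.ProductId
""").fetchall()

# Filter in ON: blocked reviews simply fail to join, the product row survives.
on_rows = conn.execute("""
    SELECT p.ProductId, COUNT(r.ReviewId)
    FROM Product AS p
    LEFT JOIN Review AS r ON r.ProductId = p.ProductId AND r.Blocked = 0
    GROUP BY p.ProductId
    ORDER BY p.ProductId
""").fetchall()

print(where_rows)  # [(1, 1), (2, 0)]
print(on_rows)     # [(1, 1), (2, 0), (3, 0)]
```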

Related

Returning IDs from two other tables, or NULL if no IDs are found, using a left join in SQL Server

I am wondering if someone could help me. I am trying to join two tables and return an ID if one exists; if there is no ID, I want NULL, but the row for that product should still be returned rather than dropped. My query below returns twice as many records as expected, and I cannot figure out why.
SELECT
T2.ProductID, FirstChild.SupplierID, SecondChild.AccountID
FROM
Products T2
LEFT OUTER JOIN
(
SELECT TOP(1) SupplierID, Reference,CompanyID, Row_Number() OVER (Partition By SupplierID Order By SupplierID) AS RowNo FROM Suppliers
)
FirstChild ON T2.SupplierReference = FirstChild.Reference AND RowNo = 1 AND FirstChild.CompanyID = T2.CompanyID
LEFT OUTER JOIN
(
SELECT TOP(1) AccountID, SageKey,CompanyID, Row_Number() OVER (Partition By AccountID Order By AccountID) AS RowNo2 FROM Accounts
)
SecondChild ON T2.ProductAccountReference = SecondChild.Reference AND RowNo2 = 1 AND SecondChild.CompanyID =T2.CompanyID
Example of what I am trying to do
ProductID SupplierID AccountID
1 5 2
2 6 NULL
3 NULL NULL
Using OUTER APPLY and ditching the ROW_NUMBER seems like a better choice here:
SELECT
p.ProductId
,FirstChild.SupplierId
,SecondChild.AccountId
FROM
Products p
OUTER APPLY (SELECT TOP (1) s.SupplierId
FROM
Suppliers s
WHERE
p.SupplierReference = s.Reference
AND p.CompanyId = s.CompanyId
ORDER BY
s.SupplierId
) FirstChild
OUTER APPLY (SELECT TOP (1) a.AccountId
FROM
Accounts a
WHERE
p.ProductAccountReference = a.Reference
AND p.CompanyId = a.CompanyId
ORDER BY
a.AccountID
) SecondChild
The way your query is written, the derived tables are not correlated with the outer table. That means you always get whichever SupplierId SQL Server happens to pick for TOP (1), and if that row doesn't match the product's reference you won't get a value at all. You need to relate the derived table to your outer table and then select the top 1; adding an ORDER BY in the derived table plays the same role as picking the row number you want.
If it's just showing duplicate records, wouldn't an inelegant solution just be to add distinct in the select line?

WHERE Clause for One-To-Many Association

I have two tables Products and ProductProperties.
Products
name - string
description - text
etc etc
ProductProperties
product_id - integer
property_id - integer
There is also a table Properties which basically stores the list of property names and their attributes
How can I implement a SQL command that finds a product with the property_ids (A or B or C) AND (X or Y or Z)
I've got up to here:
SELECT DISTINCT "products".*
FROM "products"
INNER JOIN "product_properties" ON "product_properties"."product_id" = "products"."id" AND "product_properties"."deleted_at" IS NULL
WHERE "products"."deleted_at" IS NULL
AND (product_properties.property_id IN ('504, 506, 403'))
AND (product_properties.property_id IN ('520, 501, 502'))
But it doesn't really work since it's looking for a Product Property which has both values 504 and 520, which will never exist.
Would appreciate some help!
You need to define an intermediate result set per property group:
SELECT DISTINCT p.*
FROM products p
JOIN product_properties groupA ON groupA.product_id = p.id AND groupA.deleted_at IS NULL AND groupA.property_id IN (504, 506, 403)
JOIN product_properties groupB ON groupB.product_id = p.id AND groupB.deleted_at IS NULL AND groupB.property_id IN (520, 501, 502)
WHERE p.deleted_at IS NULL
You see, you detected the problem yourself very nicely: "But it doesn't really work since it's looking for a Product Property which has both values 504 and 520, which will never exist."
Indeed, a WHERE clause is evaluated against one joined row at a time, so all the criteria on product_properties are applied to the same row at once. You need to join the table once per group, under a different alias, and apply each group's criteria to its own alias.
One method uses exists or in:
select p.*
from products p
where p.id in (select pp.product_id
from product_properties pp
where pp.property_id in ('504', '520')
);
This saves you from having to use distinct in the outer query.
If, perchance, you really mean finding the products that have all the properties, then a join and group by work:
select p.*
from products p join
product_properties pp
on p.id = pp.product_id
where pp.property_id in ('504', '520')
group by p.id -- yes, this is allowed in Postgres
having count(*) = 2;
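For the original requirement, (A or B or C) AND (X or Y or Z), a plain `count(*) = 2` is not enough, because a product with two properties from the same group would also pass. One possible variation is to count matches per group in the HAVING clause. The sketch below runs on SQLite through Python's sqlite3 module; the helper name `products_in_both_groups` and the sample rows are made up for illustration, and the `deleted_at` columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_properties (product_id INTEGER, property_id INTEGER);
    INSERT INTO products VALUES
        (1, 'both groups'), (2, 'group A only'), (3, 'two from group A');
    INSERT INTO product_properties VALUES
        (1, 504), (1, 501),
        (2, 506),
        (3, 504), (3, 403);
""")


def products_in_both_groups(conn, group_a, group_b):
    """Return ids of products with at least one property from each group."""
    marks_a = ",".join("?" * len(group_a))
    marks_b = ",".join("?" * len(group_b))
    sql = f"""
        SELECT p.id
        FROM products AS p
        JOIN product_properties AS pp ON pp.product_id = p.id
        GROUP BY p.id
        HAVING SUM(CASE WHEN pp.property_id IN ({marks_a}) THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN pp.property_id IN ({marks_b}) THEN 1 ELSE 0 END) > 0
        ORDER BY p.id
    """
    # Placeholders are bound in order: group A first, then group B.
    return [row[0] for row in conn.execute(sql, [*group_a, *group_b])]


matches = products_in_both_groups(conn, [504, 506, 403], [520, 501, 502])
print(matches)  # [1]
```

Product 3 has two matching properties, but both belong to the first group, so it is correctly excluded.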
Hi, try these queries. I wrote them without testing, so check them, but they show the idea:
SELECT DISTINCT pr.*
FROM products pr
WHERE pr.id IN
(
SELECT product_id FROM product_properties WHERE property_id IN (504,520)
GROUP BY product_id
HAVING Count(*) = 2
) AND pr.deleted_at IS NULL
SELECT DISTINCT pr.*
FROM products pr INNER JOIN (
SELECT product_id,count(*) as nbr FROM product_properties WHERE property_id IN (504,520)
GROUP BY product_id
) as temp ON temp.product_id = pr.id
WHERE pr.deleted_at IS NULL AND temp.nbr = 2
You can also check this one (the join conditions could go in the WHERE clause instead of using INNER JOIN):
SELECT DISTINCT p.* FROM products as p
INNER JOIN product_properties as p1 ON p1.product_id = p.id
INNER JOIN product_properties as p2 ON p2.product_id = p.id
WHERE p.deleted_at IS NULL
AND p1.property_id = 504 AND p1.deleted_at IS NULL
AND p2.property_id = 520 AND p2.deleted_at IS NULL

Refactoring a tsql view which uses row_number() to return rows with a unique column value

I have a SQL view, which I'm using to retrieve data. Let's say it's a large list of products, which are linked to the customers who have bought them. The view should return only one row per product, no matter how many customers it is linked to. I'm using the row_number function to achieve this. (This example is simplified; the general situation is a query that should return only one row for each unique value of some column X. Which row is returned is not important.)
CREATE VIEW productView AS
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE temp.product_numbering = 1
Now let's say that the total number of rows in this view is ~1 million, and running select * from productView takes 10 seconds. Performing a query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query gets evaluated like this:
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE product_numbering = 1 and products.Id = 10
I think this is causing the inner subquery to be evaluated in full each time. Ideally I'd like to use something along the following lines:
SELECT
Row_number() OVER(PARTITION BY products.productID ORDER BY products.productID) AS product_numbering,
customer.id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
WHERE product_numbering = 1
But this doesn't seem to be allowed. Is there any way to do something similar?
EDIT -
After much experimentation, the actual problem I believe I am having is how to force a join to return exactly 1 row. I tried to use outer apply, as suggested below. Some sample code.
CREATE TABLE Products (id int not null PRIMARY KEY)
CREATE TABLE Customers (
id int not null PRIMARY KEY,
productId int not null,
value varchar(20) NOT NULL)
declare @count int = 1
while @count <= 150000
begin
insert into Customers (id, productID, value)
values (@count,@count/2, 'Value ' + cast(@count/2 as varchar))
insert into Products (id)
values (@count)
SET @count = @count + 1
end
CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC)
With the above sample set, the 'get everything' query below
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
takes ~1000ms to run. Adding an explicit condition:
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
where Customers.value = 'Value 45872'
takes about the same amount of time. 1000ms for a fairly simple query is already too much, and it scales the wrong way (upwards) when adding further similar joins.
Try the following approach, using a Common Table Expression (CTE). With the test data you provided, it returns specific ProductIds in less than a second.
create view ProductTest as
with cte as (
select
row_number() over (partition by p.id order by p.id) as RN,
c.*
from
Products p
inner join Customers c
on p.id = c.productid
)
select *
from cte
where RN = 1
go
select * from ProductTest where ProductId = 25
What if you did something like:
SELECT ...
FROM products
OUTER APPLY (SELECT TOP 1 * from customer where customerid = products.buyerid) as customer
...
Then the filter on productId should help. It might be worse without filtering, though.
The problem is that your data model is flawed. You should have three tables:
Customers (customerId, ...)
Products (productId,...)
ProductSales (customerId, productId)
Furthermore, the sale table should probably be split into a one-to-many pair (Sales and SalesDetails). Unless you fix your data model you're just going to run in circles chasing red-herring problems. If the system is not your design, fix it. If the boss doesn't let you fix it, then fix it. If you cannot fix it, then fix it. There isn't an easy way out of the bad data model you're proposing.
This will probably be fast enough if you really don't care which customer you bring back:
select p1.*, c1.*
FROM products p1
Left Join (
select p2.id, max( c2.id) max_customer_id
From products p2
Join customer c2 on
c2.productID = p2.id
group by p2.id
) product_max_customer on
product_max_customer.id = p1.id
Left join customer c1 on
c1.id = product_max_customer.max_customer_id
;
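A small runnable sketch of this "pick the customer with the highest id per product" approach follows. It uses SQLite through Python's sqlite3 module instead of SQL Server, and a tiny hypothetical data set, so it says nothing about the timings discussed in the question; it only shows that each product comes back exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, productId INTEGER, value TEXT);
    INSERT INTO products VALUES (1), (2), (3);
    -- Product 1 has two customers, product 2 has one, product 3 has none.
    INSERT INTO customer VALUES (10, 1, 'a'), (11, 1, 'b'), (12, 2, 'c');
""")

rows = conn.execute("""
    SELECT p1.id, c1.id, c1.value
    FROM products AS p1
    LEFT JOIN (
        -- One row per product: the highest customer id linked to it.
        SELECT productId, MAX(id) AS max_customer_id
        FROM customer
        GROUP BY productId
    ) AS pmc ON pmc.productId = p1.id
    LEFT JOIN customer AS c1 ON c1.id = pmc.max_customer_id
    ORDER BY p1.id
""").fetchall()

print(rows)  # [(1, 11, 'b'), (2, 12, 'c'), (3, None, None)]
```

Because the derived table is grouped on the product key, the outer joins can never multiply product rows, and products without customers are still returned.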

Many to many query

I have two tables products and sections in a many to many relationship and a join table products_sections. A product can be in one or more sections (new, car, airplane, old).
Products
id name
-----------------
1 something
2 something_else
3 other_thing
Sections
id name
-----------------
1 new
2 car
Products_sections
product_id section_id
--------------------------
1 1
1 2
2 1
3 2
I want to extract all products that are both in the new and the car sections. In this example result returned should be product 1. What is the correct mysql query to obtain this?
SELECT Products.name
FROM Products
WHERE NOT EXISTS (
SELECT id
FROM Sections
WHERE name IN ('new','car')
AND NOT EXISTS (
SELECT *
FROM Products_sections
WHERE Products_sections.section_id = Sections.id
AND Products_sections.product_id = Products.id
)
)
In other words, select those products for which none of the desired Section.id values is missing from the Products_sections table for that product.
Answer andho's comment:
You can put
NOT EXISTS (<select query>)
into a WHERE clause like any other predicate. It will evaluate to TRUE if there are no rows in the result set described by <select query>.
Stepwise, here's how to get to this query as an answer:
Step 1. The requirement is to identify all products that are "in both the 'new' and 'car' sections".
Step 2. A product is in both the 'new' and 'car' sections if both the 'new' and 'car' sections contain the product. Equivalently, a product is in both the 'new' and 'car' sections if neither of those sections fails to contain the product. (Note the double negative: neither fails to contain.) Restated again, we want all the products for which there is no required section failing to contain the product.
The required sections are these:
SELECT id
FROM Sections
WHERE name IN ('new','car')
Therefore, the desired products are these:
SELECT Products.name
FROM Products
WHERE NOT EXISTS ( -- there does not exist
SELECT id -- a section
FROM Sections
WHERE name IN ('new','car') -- that is required
AND (the section identified by Sections.id fails to contain the product identified by Products.id)
)
Step 3. A given section (such as 'new' or 'car') does contain a particular product if there's a row in Products_sections for the given section and particular product. So a given section fails to contain a particular product if there is no such row in Products_sections.
Step 4. If the query below does contain a row, the section_id section does contain the product_id product:
SELECT *
FROM Products_sections
WHERE Products_sections.section_id = Sections.id
AND Products_sections.product_id = Products.id
So the section_id section fails to contain the product (and that's what we need to express) if the query above does not produce a row in its result, or if NOT EXISTS ().
Seems complicated, but once you get it in your head, it sticks: Are all required items present? Yes, so long as there does not exist a required item that is not present.
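To see the double NOT EXISTS at work on the sample data from the question, here is a small sketch that loads the three tables into an in-memory SQLite database through Python's sqlite3 module (the choice of SQLite is an assumption made for runnability; the question targets MySQL, where the same query text should apply).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Sections (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Products_sections (product_id INTEGER, section_id INTEGER);
    INSERT INTO Products VALUES
        (1, 'something'), (2, 'something_else'), (3, 'other_thing');
    INSERT INTO Sections VALUES (1, 'new'), (2, 'car');
    INSERT INTO Products_sections VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

# "There is no required section that fails to contain the product."
names = [row[0] for row in conn.execute("""
    SELECT Products.name
    FROM Products
    WHERE NOT EXISTS (
        SELECT 1
        FROM Sections
        WHERE name IN ('new', 'car')
          AND NOT EXISTS (
              SELECT 1
              FROM Products_sections
              WHERE Products_sections.section_id = Sections.id
                AND Products_sections.product_id = Products.id
          )
    )
""")]

print(names)  # ['something']
```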
The way I always do these is this:
Start at what you're trying to get (products), and then go through your lookup table (products_sections) to what you're trying to filter by (sections). This way, you can have it in plain view what you're looking for, and you never have to memorize surrogate keys (which are a great thing to have, not to memorize).
select distinct
p.name
from
products p
inner join products_sections ps1 on
p.id = ps1.product_id
inner join sections s1 on
ps1.section_id = s1.id
inner join products_sections ps2 on
p.id = ps2.product_id
inner join sections s2 on
ps2.section_id = s2.id
where
s1.name = 'new'
and s2.name = 'car'
Voila. Four inner joins (the lookup table is joined once per section, since a single products_sections row can only point at one section), and you have a clear, concise query that is obvious about what it's bringing back. Hope this helps!
SELECT product_id, count(*) AS TotalSection
FROM Products_sections
WHERE section_id IN (1,2)
GROUP BY product_id
HAVING TotalSection = 2;
See if this works in MySQL.
The query below is a little unwieldy, but it should answer your question:
select products.id
from products
where products.id in
(
select products_sections.product_id
from products_sections
where products_sections.section_id=1
)
and products.id in
(
select products_sections.product_id
from products_sections
where products_sections.section_id=2
)
Self-join on two subsets of the join table, then select the unique product ids:
SELECT DISTINCT car.product_id
FROM ( SELECT product_id
FROM Products_sections
WHERE section_id = 2
) car JOIN
( SELECT product_id
FROM Products_sections
WHERE section_id = 1
) neww
ON (car.product_id = neww.product_id)
This query is a variation of a more general solution:
SELECT DISTINCT car.product_id
FROM Products_sections car join
Products_sections neww ON (car.product_id = neww.product_id AND
car.section_id = 2 AND
neww.section_id = 1)
A less efficient but more straightforward solution is:
SELECT p.name FROM Products p WHERE
EXISTS (SELECT 'found car'
FROM Products_sections ps
WHERE ps.product_id = p.id AND ps.section_id = 2)
AND
EXISTS (SELECT 'found new'
FROM products_sections ps
WHERE ps.product_id = p.id AND ps.section_id = 1)
----------------
I used literal ids for clarity. If necessary, replace the expressions section_id = 2 and section_id = 1 with
section_id = (SELECT s.id FROM Sections s WHERE s.name = 'car')
section_id = (SELECT s.id FROM Sections s WHERE s.name = 'new')
Also, you can select product names by plugging in any of the queries above like this:
SELECT Products.name FROM Products
WHERE EXISTS (
SELECT 'found product'
FROM Products_sections car join
Products_sections neww ON (car.product_id = neww.product_id AND
car.section_id = 2 AND
neww.section_id = 1)
WHERE car.product_id = Products.id
)
SELECT p.*
FROM Products p
INNER JOIN (SELECT ps.product_id
FROM Products_sections ps
INNER JOIN Sections s
ON s.id = ps.section_id
WHERE s.name IN ('new','car')
GROUP BY ps.product_id
HAVING Count(ps.product_id) = 2) pp
ON p.id = pp.product_id
This query will get you the result without having to add more inner joins when you need to search more sections. What will change are:
the values inside the IN () parentheses
the value compared against Count in the HAVING clause, which should be replaced with the number of sections you are searching
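The two things that change, the IN list and the expected count, can both be derived from one list of section names. The sketch below shows one way to do that with SQLite through Python's sqlite3 module; the helper name `products_in_all_sections` is hypothetical, and it assumes section names are unique, so that the number of distinct names equals the number of distinct sections.

```python
import sqlite3


def products_in_all_sections(conn, section_names):
    """Return names of products that appear in every one of the given sections."""
    wanted = sorted(set(section_names))   # de-duplicate so the count is right
    marks = ",".join("?" * len(wanted))   # one placeholder per section name
    sql = f"""
        SELECT p.name
        FROM Products AS p
        JOIN Products_sections AS ps ON ps.product_id = p.id
        JOIN Sections AS s ON s.id = ps.section_id
        WHERE s.name IN ({marks})
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT s.id) = ?
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Sections (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE Products_sections (product_id INTEGER, section_id INTEGER);
    INSERT INTO Products VALUES
        (1, 'something'), (2, 'something_else'), (3, 'other_thing');
    INSERT INTO Sections VALUES (1, 'new'), (2, 'car');
    INSERT INTO Products_sections VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

both = products_in_all_sections(conn, ["new", "car"])
only_car = products_in_all_sections(conn, ["car"])
print(both)      # ['something']
print(only_car)  # ['something', 'other_thing']
```

Counting distinct section ids rather than rows keeps the result correct even if the join table contained a duplicate (product, section) pair.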
SELECT id, name FROM
(
SELECT
products.id,
products.name,
COUNT(*) AS count FROM products
INNER JOIN products_sections
ON products_sections.product_id=products.id
INNER JOIN sections
ON sections.id=products_sections.section_id
WHERE sections.name IN ('car', 'new')
GROUP BY products.id, products.name
) AS P
WHERE count = 2
select
`p`.`id`,
`p`.`name`
from `Sections` as `s`
join `Products_sections` as `ps` on `ps`.`section_id` = `s`.`id`
join `Products` as `p` on `p`.`id` = `ps`.`product_id`
where `s`.`id` in ( 1,2 )
group by `p`.`id`, `p`.`name`
having count( distinct `s`.`name` ) = 2
will return...
id name
-----------------
1 something
Is that what you were looking for?

Join two tables where all child records of first table match all child records of second table

I have four tables: Customer, CustomerCategory, Limit, and LimitCategory. A customer can be in multiple categories and a limit can also have multiple categories. I need to write a query that returns the customer name and limit amount where ALL of the customer's categories match ALL of the limit's categories.
I'm guessing it would be similar to the answer here, but I can't seem to get it right. Thanks!
Edit - Here's what the tables look like:
tblCustomer
customerId
name
tblCustomerCategory
customerId
categoryId
tblLimit
limitId
limit
tblLimitCategory
limitId
categoryId
I THINK you're looking for:
SELECT *
FROM CustomerCategory
LEFT OUTER JOIN Customer
ON CustomerCategory.CustomerId = Customer.Id
INNER JOIN LimitCategory
ON CustomerCategory.CategoryId = LimitCategory.CategoryId
LEFT OUTER JOIN Limit
ON Limit.Id = LimitCategory.LimitId
Updated!
Thanks to Felix for pointing out a flaw in my existing solution (3 years after I originally posted it, hehe). After looking at it again, I think this might be correct. Here I'm getting (1) the customers and limits with matching categories, plus the number of matching categories, (2) the number of categories per customer, (3) the number of categories per limit, (4) I then ensure the number of categories for customer and limits is the same as the number of the matches between the customers and limits:
UNTESTED!
select
matches.name,
matches.limit
from (
select
c.name,
c.customerId,
l.limit,
l.limitId,
count(*) over(partition by cc.customerId, lc.limitId) as matchCount
from tblCustomer c
join tblCustomerCategory cc on c.customerId = cc.customerId
join tblLimitCategory lc on cc.categoryId = lc.categoryId
join tblLimit l on lc.limitId = l.limitId
) as matches
join (
select
cc.customerId,
count(*) as categoryCount
from tblCustomerCategory cc
group by cc.customerId
) as customerCategories
on matches.customerId = customerCategories.customerId
join (
select
lc.limitId,
count(*) as categoryCount
from tblLimitCategory lc
group by lc.limitId
) as limitCategories
on matches.limitId = limitCategories.limitId
where matches.matchCount = customerCategories.categoryCount
and matches.matchCount = limitCategories.categoryCount
I don't know if this will work or not; it's just a thought I had and I can't test it. I'm sure there's a nicer way! Don't be too harsh :)
SELECT
c.customerId
, l.limitId
FROM
tblCustomer c
CROSS JOIN
tblLimit l
WHERE NOT EXISTS
(
SELECT
lc.categoryId
FROM
tblLimitCategory lc
WHERE
lc.limitId = l.limitId
EXCEPT
SELECT
cc.categoryId
FROM
tblCustomerCategory cc
WHERE
cc.customerId = c.customerId
)
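A NOT EXISTS over an EXCEPT checks one direction only: no limit category is missing from the customer. For the two category sets to match exactly, the reverse check is needed as well. The sketch below tries both directions on a small made-up data set, using SQLite through Python's sqlite3 module (an assumption for runnability; the customer names and limit amounts are invented, and `limit` is quoted because it is a keyword in SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCustomer (customerId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblCustomerCategory (customerId INTEGER, categoryId INTEGER);
    CREATE TABLE tblLimit (limitId INTEGER PRIMARY KEY, "limit" INTEGER);
    CREATE TABLE tblLimitCategory (limitId INTEGER, categoryId INTEGER);
    -- Ann is in categories {10, 20}, Bob only in {10}.
    INSERT INTO tblCustomer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO tblCustomerCategory VALUES (1, 10), (1, 20), (2, 10);
    -- Limit 100 covers {10, 20}, limit 200 covers {10}.
    INSERT INTO tblLimit VALUES (100, 5000), (200, 900);
    INSERT INTO tblLimitCategory VALUES (100, 10), (100, 20), (200, 10);
""")

pairs = conn.execute("""
    SELECT c.name, l."limit"
    FROM tblCustomer AS c
    CROSS JOIN tblLimit AS l
    WHERE NOT EXISTS (
            -- limit categories the customer lacks
            SELECT lc.categoryId FROM tblLimitCategory AS lc
            WHERE lc.limitId = l.limitId
            EXCEPT
            SELECT cc.categoryId FROM tblCustomerCategory AS cc
            WHERE cc.customerId = c.customerId
        )
      AND NOT EXISTS (
            -- customer categories the limit lacks
            SELECT cc.categoryId FROM tblCustomerCategory AS cc
            WHERE cc.customerId = c.customerId
            EXCEPT
            SELECT lc.categoryId FROM tblLimitCategory AS lc
            WHERE lc.limitId = l.limitId
        )
    ORDER BY c.customerId, l.limitId
""").fetchall()

print(pairs)  # [('Ann', 5000), ('Bob', 900)]
```

With only the first NOT EXISTS, Ann would also pair with limit 200, because every category of that limit is one of hers even though she has an extra one.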