Get SUM in GROUP BY with JOIN using MySQL - sql

I have two tables in MySQL 5.1.38.
products
+----+------------+-------+------------+
| id | name | price | department |
+----+------------+-------+------------+
| 1 | Fire Truck | 15.00 | Toys |
| 2 | Bike | 75.00 | Toys |
| 3 | T-Shirt | 18.00 | Clothes |
| 4 | Skirt | 18.00 | Clothes |
| 5 | Pants | 22.00 | Clothes |
+----+------------+-------+------------+
ratings
+------------+--------+
| product_id | rating |
+------------+--------+
| 1 | 5 |
| 2 | 5 |
| 2 | 3 |
| 2 | 5 |
| 3 | 5 |
| 4 | 5 |
| 5 | 4 |
+------------+--------+
My goal is to get the total price of all products which have a 5 star rating in each department. Something like this.
+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes | 36.00 | /* T-Shirt and Skirt */
| Toys | 90.00 | /* Fire Truck and Bike */
+------------+-------------+
I would like to do this without a subquery if I can. At first I tried a join with a sum().
select department, sum(price) from products
join ratings on product_id=products.id
where rating=5 group by department;
+------------+------------+
| department | sum(price) |
+------------+------------+
| Clothes | 36.00 |
| Toys | 165.00 |
+------------+------------+
As you can see, the total for the Toys department is incorrect: the Bike has two 5 star ratings, so the join counts its price twice.
I then tried adding distinct to the sum.
select department, sum(distinct price) from products
join ratings on product_id=products.id where rating=5
group by department;
+------------+---------------------+
| department | sum(distinct price) |
+------------+---------------------+
| Clothes | 18.00 |
| Toys | 90.00 |
+------------+---------------------+
But then the clothes department is off because two products share the same price.
Currently my work-around involves taking something unique about the product (the id) and using that to make the price unique.
select department, sum(distinct price + id * 100000) - sum(id * 100000) as total_price
from products join ratings on product_id=products.id
where rating=5 group by department;
+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes | 36.00 |
| Toys | 90.00 |
+------------+-------------+
But this feels like such a silly hack. Is there a better way to do this without a subquery? Thanks!

Use:
SELECT p.department,
SUM(p.price) AS total_price
FROM PRODUCTS p
JOIN (SELECT DISTINCT
r.product_id,
r.rating
FROM RATINGS r) x ON x.product_id = p.id
AND x.rating = 5
GROUP BY p.department
Technically, this does not use a subquery - it uses a derived table/inline view.
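To see the difference between the plain join and the de-duplicated derived table, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database. SQLite stands in for MySQL here, which is an assumption made only to keep the example self-contained; the two queries themselves are plain SQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, department TEXT);
CREATE TABLE ratings (product_id INTEGER, rating INTEGER);
INSERT INTO products VALUES
  (1, 'Fire Truck', 15.00, 'Toys'), (2, 'Bike', 75.00, 'Toys'),
  (3, 'T-Shirt', 18.00, 'Clothes'), (4, 'Skirt', 18.00, 'Clothes'),
  (5, 'Pants', 22.00, 'Clothes');
INSERT INTO ratings VALUES (1,5),(2,5),(2,3),(2,5),(3,5),(4,5),(5,4);
""")

# Plain join: the Bike matches two 5 star rows, so its price is summed twice.
naive = conn.execute("""
SELECT p.department, SUM(p.price)
FROM products p JOIN ratings r ON r.product_id = p.id
WHERE r.rating = 5
GROUP BY p.department ORDER BY p.department
""").fetchall()

# Derived table: duplicates are collapsed before the join.
deduped = conn.execute("""
SELECT p.department, SUM(p.price) AS total_price
FROM products p
JOIN (SELECT DISTINCT product_id, rating FROM ratings) x
  ON x.product_id = p.id AND x.rating = 5
GROUP BY p.department ORDER BY p.department
""").fetchall()

print(naive)    # [('Clothes', 36.0), ('Toys', 165.0)]
print(deduped)  # [('Clothes', 36.0), ('Toys', 90.0)]
```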

The primary reason you are having trouble finding a solution is that the schema as presented is fundamentally flawed. You shouldn't allow a table to have two rows that are complete duplicates of each other. Every table should have a means to uniquely identify each row even if it is the combination of all columns. Now, if we change the ratings table so that it has an AUTO_INCREMENT column called Id, the problem is easier:
Select products.department, Sum(price) As total_price
From products
Join ratings As R1
On R1.product_id = products.id
And R1.rating = 5
Left Join ratings As R2
On R2.product_id = R1.product_id
And R2.rating = R1.rating
And R2.Id > R1.Id
Where R2.Id Is Null
Group By products.department
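A minimal sketch of this self-join idea, run on in-memory SQLite rather than MySQL (an assumption for the demo). Note that it uses an inner join for R1: with a left join there, a product with no 5 star rating (Pants) would survive the "R2.Id Is Null" filter and be added to the total. The lower-case table aliases are just local naming.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, department TEXT);
-- ratings now has its own auto-incrementing id, as the answer proposes
CREATE TABLE ratings (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      product_id INTEGER, rating INTEGER);
INSERT INTO products VALUES
  (1, 'Fire Truck', 15.00, 'Toys'), (2, 'Bike', 75.00, 'Toys'),
  (3, 'T-Shirt', 18.00, 'Clothes'), (4, 'Skirt', 18.00, 'Clothes'),
  (5, 'Pants', 22.00, 'Clothes');
INSERT INTO ratings (product_id, rating)
VALUES (1,5),(2,5),(2,3),(2,5),(3,5),(4,5),(5,4);
""")

# r1 picks the 5 star ratings; r2 looks for a later duplicate of the same
# rating. Keeping only rows with no later duplicate leaves one row per product.
rows = conn.execute("""
SELECT p.department, SUM(p.price) AS total_price
FROM products p
JOIN ratings r1
  ON r1.product_id = p.id AND r1.rating = 5
LEFT JOIN ratings r2
  ON r2.product_id = r1.product_id
 AND r2.rating = r1.rating
 AND r2.id > r1.id
WHERE r2.id IS NULL
GROUP BY p.department ORDER BY p.department
""").fetchall()

print(rows)  # [('Clothes', 36.0), ('Toys', 90.0)]
```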

You can do two queries. First query:
SELECT DISTINCT product_id FROM ratings WHERE rating = 5;
Then take each of those IDs and put them into the second query by hand:
SELECT department, Sum(price) AS total_price
FROM products
WHERE id IN (1,2,3,4)
GROUP BY department;
This is the work-around for not being able to use subqueries. Without them, there is no way to eliminate the duplicate records caused by the join.
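If the two queries are issued from application code, the IDs do not have to be pasted in by hand. Here is a rough sketch in Python with the standard sqlite3 module (SQLite standing in for MySQL is an assumption); it builds one placeholder per ID instead of concatenating values into the SQL string.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, department TEXT);
CREATE TABLE ratings (product_id INTEGER, rating INTEGER);
INSERT INTO products VALUES
  (1, 'Fire Truck', 15.00, 'Toys'), (2, 'Bike', 75.00, 'Toys'),
  (3, 'T-Shirt', 18.00, 'Clothes'), (4, 'Skirt', 18.00, 'Clothes'),
  (5, 'Pants', 22.00, 'Clothes');
INSERT INTO ratings VALUES (1,5),(2,5),(2,3),(2,5),(3,5),(4,5),(5,4);
""")

# First query: the distinct products that have at least one 5 star rating.
ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT product_id FROM ratings WHERE rating = 5 ORDER BY product_id")]

# Second query: one "?" placeholder per id, values passed separately.
totals = []
if ids:  # an empty IN () list is not valid in every database
    placeholders = ",".join("?" for _ in ids)
    totals = conn.execute(
        f"SELECT department, SUM(price) AS total_price FROM products "
        f"WHERE id IN ({placeholders}) GROUP BY department ORDER BY department",
        ids,
    ).fetchall()

print(ids)     # [1, 2, 3, 4]
print(totals)  # [('Clothes', 36.0), ('Toys', 90.0)]
```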

I can't think of any way to do it without a subquery somewhere in the query. You could perhaps use a View to mask the use of a subquery.
Barring that, your best bet is probably to find the minimum data set needed to make the calculation and do that in the front end. Whether or not that's possible depends on your specific data - how many rows, etc.
The other option (actually, maybe this is the best one...) would be to get a new ORM or do without it altogether ;)
This view would allow you to bypass the subquery:
CREATE VIEW Distinct_Product_Ratings
AS
SELECT DISTINCT
product_id,
rating
FROM
Ratings
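For completeness, a sketch of how the main query would read once that view exists. It runs on in-memory SQLite (an assumption; the CREATE VIEW and SELECT statements are ordinary SQL), and the query against the view contains no visible subquery.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, department TEXT);
CREATE TABLE ratings (product_id INTEGER, rating INTEGER);
INSERT INTO products VALUES
  (1, 'Fire Truck', 15.00, 'Toys'), (2, 'Bike', 75.00, 'Toys'),
  (3, 'T-Shirt', 18.00, 'Clothes'), (4, 'Skirt', 18.00, 'Clothes'),
  (5, 'Pants', 22.00, 'Clothes');
INSERT INTO ratings VALUES (1,5),(2,5),(2,3),(2,5),(3,5),(4,5),(5,4);

CREATE VIEW Distinct_Product_Ratings AS
SELECT DISTINCT product_id, rating FROM ratings;
""")

# The view hides the DISTINCT, so this is a plain join plus GROUP BY.
rows = conn.execute("""
SELECT p.department, SUM(p.price) AS total_price
FROM products p
JOIN Distinct_Product_Ratings d ON d.product_id = p.id
WHERE d.rating = 5
GROUP BY p.department ORDER BY p.department
""").fetchall()

print(rows)  # [('Clothes', 36.0), ('Toys', 90.0)]
```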

Related

Excluding tuples based on maximum condition

I have been trying to solve this SQL query problem, without success. The problem is the following:
PROBLEM:
Given 4 tables, PRODUCTS, REPAIRS, OWNERS and MALFUNCTION, for each product Brand and Model display the type of malfunction that has been repaired the most times.
The tables have the following fields:
PRODUCTS: *Series_num, Brand, Model, Year, Code_Owner
OWNERS: *Code_Owner, Name, Surname, Street, Civic, City, (u)Phone
MALFUNCTIONS: *Malf_code, Desc
REPAIRS: *Series_num, *Malf_code, *Repair_Date, Price
* <- Primary key
(u) <- Unique attribute
The expected result, given this example of data:
| MODEL | BRAND | MALF_CODE | NUMBER OF REPAIRS|
|----------------------------------------------------|
| 1 | BRAND1 | 1 | 20 |
| 1 | BRAND1 | 2 | 10 |
| 2 | BRAND1 | 1 | 1 |
| 2 | BRAND1 | 2 | 1 |
| 1 | BRAND2 | 1 | 10 |
| 1 | BRAND2 | 2 | 11 |
Should be:
| MODEL | BRAND | MALF_CODE | NUMBER OF REPAIRS|
|----------------------------------------------------|
| 1 | BRAND1 | 1 | 20 |
| 2 | BRAND1 | 1 | 1 |
| 1 | BRAND2 | 2 | 11 |
Note that BRAND1, MODEL:2 has the same number of repairs for two different types of malfunction, so one of the rows can be ignored or both of them can be shown (it does not matter)
WHAT I'VE TRIED:
To get the first table, I used a simple JOIN query:
SELECT A.MODEL, A.BRAND, R.MALF_CODE, COUNT(*) AS N_REP
FROM REPAIRS R LEFT JOIN PRODUCTS A ON A.SERIES_NUM = R.SERIES_NUM
GROUP BY A.MODEL, A.BRAND, R.MALF_CODE;
Then I tried to get the second table thanks to MAX() function:
SELECT A.MODEL, A.BRAND, R.MALF_CODE, COUNT(*) AS N_REP
FROM REPAIRS R LEFT JOIN PRODUCTS A ON A.SERIES_NUM = R.SERIES_NUM
GROUP BY A.MODEL, A.BRAND, R.MALF_CODE
HAVING COUNT(*) IN(
SELECT MAX(R.MALF_CODE)
FROM REPAIRS R LEFT JOIN PRODUCTS A ON A.SERIES_NUM = R.SERIES_NUM
GROUP BY A.MODEL, A.BRAND, R.MALF_CODE
ORDER BY A.BRAND, R.MALF_CODE);
But this throws me the following error:
[42000][907] ORA-00907: Missing closing Parenthesis
It seems I can't find the error.
I hope I've been clear enough. Thanks in advance.
EDIT: I forgot to mention that I'm aware of RANK functions and such, but never heard of Partitions. So a solution without them is highly appreciated but not mandatory.
If I understand correctly, you want the row with the most repairs for each model/brand combination. If so, window functions are one method:
SELECT MODEL, BRAND, MALF_CODE, N_REP
FROM (SELECT P.MODEL, P.BRAND, R.MALF_CODE, COUNT(*) AS N_REP,
ROW_NUMBER() OVER (PARTITION BY P.MODEL, P.BRAND ORDER BY COUNT(*) DESC, R.MALF_CODE) as SEQNUM
FROM REPAIRS R LEFT JOIN
PRODUCTS P
ON P.SERIES_NUM = R.SERIES_NUM
GROUP BY P.MODEL, P.BRAND, R.MALF_CODE
) MB
WHERE seqnum = 1;
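Since the question also asks for a solution without window functions, one possible alternative is to compute the counts once and join them to the per-group maximum. This is a sketch under assumptions: it runs on in-memory SQLite instead of Oracle, the table definitions are cut down to the columns the query needs, and the repair dates are made up to reproduce the counts from the question. Ties (BRAND1, model 2) return both rows, which the question says is acceptable.

```python
import sqlite3

# In-memory SQLite stands in for Oracle (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (series_num INTEGER PRIMARY KEY, brand TEXT, model INTEGER);
CREATE TABLE repairs (series_num INTEGER, malf_code INTEGER, repair_date TEXT,
                      PRIMARY KEY (series_num, malf_code, repair_date));
INSERT INTO products VALUES (1, 'BRAND1', 1), (2, 'BRAND1', 2), (3, 'BRAND2', 1);
""")

# (series_num, malf_code, number of repairs): the counts from the question.
sample = [(1, 1, 20), (1, 2, 10), (2, 1, 1), (2, 2, 1), (3, 1, 10), (3, 2, 11)]
for series_num, malf_code, n in sample:
    conn.executemany(
        "INSERT INTO repairs VALUES (?, ?, ?)",
        [(series_num, malf_code, f"2020-01-{day + 1:02d}") for day in range(n)],
    )

# counts: repairs per model/brand/malfunction.
# m: the highest count per model/brand; joining back keeps only the top rows.
rows = conn.execute("""
WITH counts AS (
  SELECT p.model, p.brand, r.malf_code, COUNT(*) AS n_rep
  FROM repairs r JOIN products p ON p.series_num = r.series_num
  GROUP BY p.model, p.brand, r.malf_code
)
SELECT c.model, c.brand, c.malf_code, c.n_rep
FROM counts c
JOIN (SELECT model, brand, MAX(n_rep) AS max_rep
      FROM counts GROUP BY model, brand) m
  ON m.model = c.model AND m.brand = c.brand AND m.max_rep = c.n_rep
ORDER BY c.brand, c.model, c.malf_code
""").fetchall()

for row in rows:
    print(row)
```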

Join max records in PostgreSQL

I have two tables:
products
+----+--------+
| id | name |
+----+--------+
| 1 | Orange |
| 2 | Juice |
| 3 | Fance |
+----+--------+
reviews
+----+------------+-------+------------+
| id | created_at | price | product_id |
+----+------------+-------+------------+
| 1 | 12/12/20 | 2 | 1 |
| 2 | 12/14/20 | 4 | 1 |
| 3 | 12/15/20 | 5 | 2 |
+----+------------+-------+------------+
How can I get a list of products ordered by the price of the most recent (max created_at) review?
+------------+--------+-----------+-------+
| product_id | name | review_id | price |
+------------+--------+-----------+-------+
| 2 | Juice | 3 | 5 |
| 1 | Orange | 2 | 4 |
| 3 | Fance | | |
+------------+--------+-----------+-------+
I use latest PostgreSQL.
Using DISTINCT ON
SELECT
*
FROM (
SELECT DISTINCT ON (p.id)
p.id,
p.name,
r.id as review_id,
r.price
FROM
reviews r
RIGHT JOIN products p ON r.product_id = p.id
ORDER BY p.id, r.created_at DESC NULLS LAST
) s
ORDER BY price DESC NULLS LAST
1. Join both tables (products LEFT JOIN reviews, or reviews RIGHT JOIN products).
2. Order the rows. First group each product's rows together (ORDER BY p.id), then sort by date in descending order so that the most recent review is the first row for each product.
3. DISTINCT ON always keeps the first row of each ordered group, so you get the most recent entry per product.
4. To sort the product rows, put steps 1-3 into a subquery and order by price afterwards.
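DISTINCT ON is PostgreSQL-specific. As a sketch of the same "first row per ordered group" idea in portable SQL, the version below uses ROW_NUMBER() instead and runs on in-memory SQLite (an assumption; it needs SQLite 3.25 or newer for window functions). The dates are rewritten in ISO format so that they sort correctly as text, and "price IS NULL" is used as a portable stand-in for NULLS LAST.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, created_at TEXT,
                      price INTEGER, product_id INTEGER);
INSERT INTO products VALUES (1, 'Orange'), (2, 'Juice'), (3, 'Fance');
INSERT INTO reviews VALUES
  (1, '2020-12-12', 2, 1), (2, '2020-12-14', 4, 1), (3, '2020-12-15', 5, 2);
""")

# rn = 1 marks the most recent review of each product; products without
# reviews keep a single row of NULLs from the LEFT JOIN.
rows = conn.execute("""
SELECT product_id, name, review_id, price
FROM (
  SELECT p.id AS product_id, p.name, r.id AS review_id, r.price,
         ROW_NUMBER() OVER (PARTITION BY p.id
                            ORDER BY r.created_at DESC) AS rn
  FROM products p
  LEFT JOIN reviews r ON r.product_id = p.id
) s
WHERE rn = 1
ORDER BY price IS NULL, price DESC
""").fetchall()

for row in rows:
    print(row)
# (2, 'Juice', 3, 5)
# (1, 'Orange', 2, 4)
# (3, 'Fance', None, None)
```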
DISTINCT ON and an outer join is a good approach, but I would handle this as:
SELECT . . . -- whatever columns you want
FROM products p LEFT JOIN
(SELECT DISTINCT ON (r.product_id) r.*
FROM reviews r
ORDER BY r.product_id, r.created_at DESC NULLS LAST
) r
ON r.product_id = p.id
ORDER BY r.price DESC NULLS LAST;
The difference in doing DISTINCT ON before the JOIN or after may look minor. But this version of the query can take advantage of an index on reviews(product_id, created_at desc). And that could be a big performance win on a lot of data.
Indexes cannot be used for an ORDER BY that mixes columns from different tables.

How to get a MAX and a COUNT from a three table join?

I got an interview question where there's a Car sale modeled in a DB. Each Car represents a physical car in a Car sale which refers to a Make and a Model table. A Sale table keeps track of each Car that is sold. A Sale only consists of one Car, so there's a record in Sale per every unique Car that had been sold.
The question was to find out the name of the most sold Model in the car sale. I answered with a 3-level nested query. The interviewer specifically asked for a solution using joins, where I only succeeded in joining the tables without the aggregates.
How would you join 3 tables as below (Car, Make, Sale) while using two other aggregates?
Here's a rough sketch of the schema. The most sold Model here should return 'Corolla'
Car
| carid| modid | etc...
_________________
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
Make
| mkid | name |
_________________
| 1 | Toyota |
| 2 | Nissan |
| 3 | Chevy |
| 4 | Merc |
| 5 | Ford |
Model
| modid| name | mkid |
________________________
| 1 | Corolla| 1
| 2 | Sunny | 2
| 3 | Carina | 1
| 4 | Skyline| 2
| 5 | Focus | 5
Sale
| sid | carid | etc...
_________________
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
Edit:
Using MS SQL Server 2008
Output needed:
Model Name | Count
_____________________
Corolla | 3
i.e. The model of the Car that has been sold the most.
Notice only 3 Corollas and 2 Sunnys are in the Car table, while the Sale table corresponds to each of those with other sales detail. The 5 Sale records are actually Corolla, Corolla, Corolla, Sunny and Sunny.
Since you are using SQL Server 2008, make use of Common Table Expression and Window Function.
WITH recordList
AS
(
SELECT c.name, COUNT(*) [Count],
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) rn
FROM Sale a
INNER JOIN Car b
ON a.carid = b.carID
INNER JOIN Model c
ON b.modID = c.modID
GROUP BY c.Name
)
SELECT name, [Count]
FROM recordList
WHERE rn = 1
When interviewers ask for this they usually want you to say that you'd use windowed functions. You could give each sale a unique ascending number partitioned by model and the highest sale number you'd get would be the max count.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
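A minimal sketch of the window-function approach on the sample data. It runs on in-memory SQLite 3.25 or newer rather than SQL Server (an assumption), and it splits the work into two steps, count first and rank second, which is a choice made here for readability rather than something the question requires.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (carid INTEGER PRIMARY KEY, modid INTEGER);
CREATE TABLE model (modid INTEGER PRIMARY KEY, name TEXT, mkid INTEGER);
CREATE TABLE sale (sid INTEGER PRIMARY KEY, carid INTEGER);
INSERT INTO car VALUES (1,1),(2,1),(3,1),(4,2),(5,2);
INSERT INTO model VALUES (1,'Corolla',1),(2,'Sunny',2),(3,'Carina',1),
                         (4,'Skyline',2),(5,'Focus',5);
INSERT INTO sale VALUES (1,1),(2,2),(3,3),(4,4),(5,5);
""")

# counts: sales per model. ranked: DENSE_RANK gives every model with the
# highest count rank 1, so ties for first place would all be returned.
rows = conn.execute("""
WITH counts AS (
  SELECT m.name, COUNT(*) AS sold
  FROM sale s
  JOIN car c ON c.carid = s.carid
  JOIN model m ON m.modid = c.modid
  GROUP BY m.name
),
ranked AS (
  SELECT name, sold, DENSE_RANK() OVER (ORDER BY sold DESC) AS rn
  FROM counts
)
SELECT name, sold FROM ranked WHERE rn = 1
""").fetchall()

print(rows)  # [('Corolla', 3)]
```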
The following query works on Oracle 11g:
SELECT name FROM (
SELECT model.name AS name FROM car , sale , model
WHERE car.carid=sale.carid
AND car.modid=model.modid
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
Or
SELECT name FROM (
SELECT model.name AS name FROM car natural join sale natural join model
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
OUTPUT
| NAME |
-----------
| Corolla |
This is based on your newly added SQL Server 2008 tag. If you are using a different RDBMS you'll probably need to use limit instead of top and place it at the end of the top_sold_car subquery.
select Make.name as Make, Model.name as Model
from (
select top 1 Car.modid, count(*) as num_sold
from Car
join Sale
on (Sale.carid = Car.carid)
group by Car.modid
order by num_sold desc) as top_sold_car
join Model
on (top_sold_car.modid = Model.modid)
join Make
on (Model.mkid = Make.mkid)

Issue with SQL involving JOINS

I have 2 tables with similar layout, involving INCOME and EXPENSES.
The id column is a customer ID.
I need a result of customer TOTAL AMOUNT, summing up income and expenses.
Table: Income
| id | amountIN|
+--------------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
Table: Expenses
| id | amountOUT|
+---------------+
| 1 | -x |
| 4 | -z |
My problem is that some customers only have expenses and others only income, so I cannot know in advance if I need to do a LEFT or a RIGHT JOIN.
In the example above a RIGHT JOIN could do the trick, but if the situation is inverted (more customers in the Expenses table) it doesn't work.
Expected Result
| id | TotalAmount|
+--------------+
| 1 | a - x |
| 2 | b |
| 3 | c |
| 4 | d - z |
Any help?
select id, SUM(Amount)
from
(
select id, amountin as Amount
from Income
union all
select id, amountout as Amount
from Expenses
) a
group by id
I believe a full join will solve your problem.
I would approach this as a union. Do that in your subquery then sum on it.
For instance:
select id, sum(amt) from
(
select i.id, i.amountIN as amt from Income i
union all
select e.id, e.amountOUT as amt from Expenses e
) t
group by id
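A small sketch of the union approach with made-up numbers in place of a, b, c, d, x and z (the amounts and the extra customer 5 are assumptions added for illustration). It runs on in-memory SQLite. Customer 5 appears only in Expenses, which is the case a one-sided LEFT or RIGHT JOIN would miss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Income (id INTEGER, amountIN INTEGER);
CREATE TABLE Expenses (id INTEGER, amountOUT INTEGER);
-- Hypothetical amounts; expenses are stored as negative numbers, as in the question.
INSERT INTO Income VALUES (1, 100), (2, 50), (3, 70), (4, 20);
INSERT INTO Expenses VALUES (1, -30), (4, -5), (5, -10);
""")

# Stack both tables into one (id, amt) list, then sum per customer.
rows = conn.execute("""
SELECT id, SUM(amt) AS TotalAmount
FROM (
  SELECT id, amountIN AS amt FROM Income
  UNION ALL
  SELECT id, amountOUT AS amt FROM Expenses
) t
GROUP BY id
ORDER BY id
""").fetchall()

print(rows)  # [(1, 70), (2, 50), (3, 70), (4, 15), (5, -10)]
```

UNION ALL rather than UNION matters here: UNION would drop duplicate rows, so two identical amounts for the same customer would be counted once.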
You should really have another table like client :
Table: Client
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
So you could do something like that
SELECT c.ID, COALESCE(i.amountIN, 0) + COALESCE(e.amountOUT, 0) AS TotalAmount
FROM Client c
LEFT JOIN Income i ON i.ID = c.ID
LEFT JOIN Expenses e ON e.ID = c.ID
The amounts are added because amountOUT is already stored as a negative number. This is less complicated, and I'm sure the Client table will come in handy another time :)

Count problem in SQL when I want results from different tables

ALTER PROCEDURE GetProducts
@CategoryID INT
AS
SELECT COUNT(tblReview.GroupID) AS ReviewCount,
COUNT(tblComment.GroupID) AS CommentCount,
Product.GroupID,
MAX(Product.ProductID) AS ProductID,
AVG(Product.Price) AS Price,
MAX (Product.Year) AS Year,
MAX (Product.Name) AS Name,
AVG(tblReview.Grade) AS Grade
FROM tblReview, tblComment, Product
WHERE (Product.CategoryID = @CategoryID)
GROUP BY Product.GroupID
HAVING COUNT(distinct Product.GroupID) = 1
This is what the tables look like:
**Product** |**tblReview** | **tblComment**
ProductID | ReviewID | CommentID
Name | Description | Description
Year | GroupID | GroupID
Price | Grade |
GroupID
GroupID is the name_year of a Product, e.g. Nike_2010. One product can have different sizes, for example:
ProductID | Name | Year | Price | Size | GroupID
1 | Nike | 2010 | 50 | 8 | Nike_2010
2 | Nike | 2010 | 50 | 9 | Nike_2010
3 | Nike | 2010 | 50 | 10 | Nike_2010
4 | Adidas| 2009 | 45 | 8 | Adidas_2009
5 | Adidas| 2009 | 45 | 9 | Adidas_2009
6 | Adidas| 2009 | 45 | 10 | Adidas_2009
I don't get the right counts for tblReview and tblComment. If I add one review to Nike size 8 and one review to Nike size 10, I want a review count of 2 when I list the products by distinct GroupID. Right now I get the same count for reviews and comments, and both are wrong.
I use a datalist to show all the products with a different/unique GroupID. I want it to look like this:
______________
| |
| Name: Nike |
| Year: 2010 |
| (All Sizes) |
| x Reviews |
| x Comments |
| x AVG Grade |
|______________|
That is, the review count, the comment count and the average grade over all products with the same GroupID. The average works fine.
Because you are not specifying any criteria to join the tables, you get the products of the category you specify combined with every row of tblReview and every row of tblComment (effectively a massive cross join).
Your AVG just happens to work by luck.
You should try something like this:
SELECT (SELECT COUNT(*) FROM tblReview WHERE tblReview.GroupID = Product.GroupID) AS ReviewCount,
(SELECT COUNT(*) FROM tblComment WHERE tblComment.GroupID = Product.GroupID) AS CommentCount,
Product.GroupID,
MAX(Product.ProductID) AS ProductID,
AVG(Product.Price) AS Price,
MAX (Product.Year) AS Year,
MAX (Product.Name) AS Name,
(SELECT AVG(tblReview.Grade) FROM tblReview WHERE tblReview.GroupID = Product.GroupID) AS Grade
FROM Product
WHERE (Product.CategoryID = @CategoryID)
GROUP BY Product.GroupID
HAVING COUNT(distinct Product.GroupID) = 1
Normally I would not use correlated subqueries and instead join to aggregate subqueries, but this is more illustrative of your problem.
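For reference, a sketch of the "join to aggregate subqueries" variant mentioned above. Assumptions: it runs on in-memory SQLite instead of SQL Server, the category is passed as a plain query parameter instead of a stored procedure argument, and the review and comment rows are invented to match the "Nike size 8 and size 10" scenario from the question. Each child table is aggregated per GroupID before it is joined, so reviews and comments cannot multiply each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER,
                      Price INTEGER, Size INTEGER, GroupID TEXT, CategoryID INTEGER);
CREATE TABLE tblReview (ReviewID INTEGER PRIMARY KEY, Description TEXT,
                        GroupID TEXT, Grade INTEGER);
CREATE TABLE tblComment (CommentID INTEGER PRIMARY KEY, Description TEXT, GroupID TEXT);
INSERT INTO Product VALUES
  (1,'Nike',2010,50,8,'Nike_2010',1), (2,'Nike',2010,50,9,'Nike_2010',1),
  (3,'Nike',2010,50,10,'Nike_2010',1), (4,'Adidas',2009,45,8,'Adidas_2009',1),
  (5,'Adidas',2009,45,9,'Adidas_2009',1), (6,'Adidas',2009,45,10,'Adidas_2009',1);
-- Hypothetical reviews and comments for the demo.
INSERT INTO tblReview VALUES (1,'size 8','Nike_2010',4), (2,'size 10','Nike_2010',2),
                             (3,'ok','Adidas_2009',5);
INSERT INTO tblComment VALUES (1,'a','Nike_2010'), (2,'b','Nike_2010'), (3,'c','Nike_2010');
""")

# r and c each hold exactly one row per GroupID, so joining them to Product
# does not inflate the counts.
rows = conn.execute("""
SELECT p.GroupID,
       MAX(p.Name) AS Name,
       MAX(p.Year) AS Year,
       AVG(p.Price) AS Price,
       COALESCE(MAX(r.ReviewCount), 0) AS ReviewCount,
       COALESCE(MAX(c.CommentCount), 0) AS CommentCount,
       MAX(r.Grade) AS Grade
FROM Product p
LEFT JOIN (SELECT GroupID, COUNT(*) AS ReviewCount, AVG(Grade) AS Grade
           FROM tblReview GROUP BY GroupID) r ON r.GroupID = p.GroupID
LEFT JOIN (SELECT GroupID, COUNT(*) AS CommentCount
           FROM tblComment GROUP BY GroupID) c ON c.GroupID = p.GroupID
WHERE p.CategoryID = ?
GROUP BY p.GroupID
ORDER BY p.GroupID
""", (1,)).fetchall()

for row in rows:
    print(row)
# ('Adidas_2009', 'Adidas', 2009, 45.0, 1, 0, 5.0)
# ('Nike_2010', 'Nike', 2010, 50.0, 2, 3, 3.0)
```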
There will be one comment row for every product, so both COUNT(tblReview.GroupID) and COUNT(tblComment.GroupID) will return the number of products x number of comments for that group.
Another way of explaining that is by running the query without a group by. The database will iterate over the rows, and increase COUNT(tblReview.GroupID) for every row where tblReview.GroupID is not null.
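One solution is to use distinct on the review's own key, with the tables joined on GroupID. Counting distinct GroupID values would only ever give 1 per group. Change the ReviewCount to:
COUNT(DISTINCT tblReview.ReviewID) AS ReviewCount,
      ^^^^^^^^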