Count problem in SQL when I want results from different tables - sql

ALTER PROCEDURE GetProducts
@CategoryID INT
AS
SELECT COUNT(tblReview.GroupID) AS ReviewCount,
COUNT(tblComment.GroupID) AS CommentCount,
Product.GroupID,
MAX(Product.ProductID) AS ProductID,
AVG(Product.Price) AS Price,
MAX(Product.Year) AS Year,
MAX(Product.Name) AS Name,
AVG(tblReview.Grade) AS Grade
FROM tblReview, tblComment, Product
WHERE (Product.CategoryID = @CategoryID)
GROUP BY Product.GroupID
HAVING COUNT(distinct Product.GroupID) = 1
This is what the tables look like:
**Product** | **tblReview** | **tblComment**
ProductID | ReviewID | CommentID
Name | Description | Description
Year | GroupID | GroupID
Price | Grade |
GroupID | |
GroupID is name_year of a product, e.g. Nike_2010. One product can have different sizes, for example:
ProductID | Name | Year | Price | Size | GroupID
1 | Nike | 2010 | 50 | 8 | Nike_2010
2 | Nike | 2010 | 50 | 9 | Nike_2010
3 | Nike | 2010 | 50 | 10 | Nike_2010
4 | Adidas| 2009 | 45 | 8 | Adidas_2009
5 | Adidas| 2009 | 45 | 9 | Adidas_2009
6 | Adidas| 2009 | 45 | 10 | Adidas_2009
I don't get the right counts from tblReview and tblComment. If I add a review to Nike size 8 and another to Nike size 10, I want a review count of 2 when I list the products with different GroupIDs. Right now I get the same count for reviews and comments, and both are wrong.
I use a DataList to show all the products with distinct GroupIDs. I want it to look like this:
______________
| |
| Name: Nike |
| Year: 2010 |
| (All Sizes) |
| x Reviews |
| x Comments |
| x AVG Grade |
|______________|
That is, the review count, the comment count and the average grade across all products with the same GroupID. The average works fine.

Because you are not specifying any criteria that join the tables, every product in the category you specify is combined with every row of tblReview and every row of tblComment (effectively a massive cross join).
Your AVG only appears to work by luck.
You should try something like this:
SELECT (SELECT COUNT(*) FROM tblReview WHERE tblReview.GroupID = Product.GroupID) AS ReviewCount,
(SELECT COUNT(*) FROM tblComment WHERE tblComment.GroupID = Product.GroupID) AS CommentCount,
Product.GroupID,
MAX(Product.ProductID) AS ProductID,
AVG(Product.Price) AS Price,
MAX(Product.Year) AS Year,
MAX(Product.Name) AS Name,
(SELECT AVG(tblReview.Grade) FROM tblReview WHERE tblReview.GroupID = Product.GroupID) AS Grade
FROM Product
WHERE (Product.CategoryID = @CategoryID)
GROUP BY Product.GroupID
HAVING COUNT(distinct Product.GroupID) = 1
Normally I would not use correlated subqueries and instead join to aggregate subqueries, but this is more illustrative of your problem.
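The "join to aggregate subqueries" variant mentioned above could look roughly like the sketch below. It uses Python's sqlite3 module only so that it can be run as-is; the sample reviews and comments, the CategoryID column on Product, and its value 1 are assumptions made for the illustration, not part of the original schema description. The first query reproduces the inflated counts from the unjoined tables, the second pre-aggregates each child table per GroupID before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER,
                      Price REAL, Size INTEGER, GroupID TEXT,
                      CategoryID INTEGER);  -- CategoryID is assumed
CREATE TABLE tblReview (ReviewID INTEGER PRIMARY KEY, Description TEXT,
                        GroupID TEXT, Grade INTEGER);
CREATE TABLE tblComment (CommentID INTEGER PRIMARY KEY, Description TEXT,
                         GroupID TEXT);
INSERT INTO Product VALUES
  (1,'Nike',2010,50,8,'Nike_2010',1), (2,'Nike',2010,50,9,'Nike_2010',1),
  (3,'Nike',2010,50,10,'Nike_2010',1), (4,'Adidas',2009,45,8,'Adidas_2009',1),
  (5,'Adidas',2009,45,9,'Adidas_2009',1), (6,'Adidas',2009,45,10,'Adidas_2009',1);
-- Hypothetical sample data: two Nike reviews, one Adidas review, one Nike comment.
INSERT INTO tblReview VALUES
  (1,'good','Nike_2010',4), (2,'ok','Nike_2010',2), (3,'fine','Adidas_2009',5);
INSERT INTO tblComment VALUES (1,'hi','Nike_2010');
""")

# No join conditions: every product row meets every review and comment row.
cross = conn.execute("""
SELECT Product.GroupID, COUNT(tblReview.GroupID), COUNT(tblComment.GroupID)
FROM tblReview, tblComment, Product
WHERE Product.CategoryID = ?
GROUP BY Product.GroupID
ORDER BY Product.GroupID
""", (1,)).fetchall()

# Aggregate each child table per GroupID first, then join the small results.
fixed = conn.execute("""
SELECT p.GroupID,
       COALESCE(r.ReviewCount, 0) AS ReviewCount,
       COALESCE(c.CommentCount, 0) AS CommentCount,
       r.Grade
FROM Product p
LEFT JOIN (SELECT GroupID, COUNT(*) AS ReviewCount, AVG(Grade) AS Grade
           FROM tblReview GROUP BY GroupID) r ON r.GroupID = p.GroupID
LEFT JOIN (SELECT GroupID, COUNT(*) AS CommentCount
           FROM tblComment GROUP BY GroupID) c ON c.GroupID = p.GroupID
WHERE p.CategoryID = ?
GROUP BY p.GroupID, r.ReviewCount, c.CommentCount, r.Grade
ORDER BY p.GroupID
""", (1,)).fetchall()

print(cross)  # [('Adidas_2009', 9, 9), ('Nike_2010', 9, 9)]
print(fixed)  # [('Adidas_2009', 1, 0, 5.0), ('Nike_2010', 2, 1, 3.0)]
```

The LEFT JOINs keep groups that have no reviews or comments, and COALESCE turns their missing counts into 0.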

Without join conditions, every product row is combined with every review row and every comment row, so COUNT(tblReview.GroupID) and COUNT(tblComment.GroupID) both return the same inflated number: the products in the group x all reviews x all comments.
Another way of explaining that is by running the query without a group by. The database will iterate over the rows, and increase COUNT(tblReview.GroupID) for every row where tblReview.GroupID is not null.
One solution, once the tables are joined on GroupID, is to count distinct keys instead of rows. Change the ReviewCount to:
COUNT(DISTINCT tblReview.ReviewID) AS ReviewCount,

Related

Getting a distinct value from one column if all rows match a certain criteria

I'm trying to find a performant and easy-to-read query to get a distinct value from one column, if all rows in the table match a certain criteria.
I have a table that tracks e-commerce orders and whether they're delivered on time, with contents and schema as follows:
> select * from orders;
+----+--------------------+-------------+
| id | delivered_on_time | customer_id |
+----+--------------------+-------------+
| 1 | 1 | 9 |
| 2 | 0 | 9 |
| 3 | 1 | 10 |
| 4 | 1 | 10 |
| 5 | 0 | 11 |
+----+--------------------+-------------+
I would like to get all distinct customer_id's which have had all their orders delivered on time. I.e. I would like an output like this:
+-------------+
| customer_id |
+-------------+
| 10 |
+-------------+
What's the best way to do this?
I've found a solution, but it's a bit hard to read and I doubt it's the most efficient way to do it (using double CTE's):
> with hits_all as (
select memberid,count(*) as count from orders group by memberid
),
hits_true as
(select memberid,count(*) as count from orders where hit = true group by memberid)
select
*
from
hits_true
inner join
hits_all on
hits_all.memberid = hits_true.memberid
and hits_all.count = hits_true.count;
+----------+-------+----------+-------+
| memberid | count | memberid | count |
+----------+-------+----------+-------+
| 10 | 2 | 10 | 2 |
+----------+-------+----------+-------+
You use group by and having as follows:
select customer_id
from orders
group by customer_id
having sum(delivered_on_time) = count(*)
This works because an on-time delivery is identified by delivered_on_time = 1. So you can just ensure that the sum of delivered_on_time is equal to the number of records for the customer.
You can use aggregation and having:
select customer_id
from orders
group by customer_id
having min(delivered_on_time) = 1;
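A quick way to check the HAVING approach is to run it against the sample rows from the question. The sketch below uses Python's sqlite3 module for that, and assumes delivered_on_time only ever holds 0 or 1 and is never NULL. It also shows why the condition has to pin the value to 1: checking only that MIN equals MAX would admit a customer whose orders were all late.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, delivered_on_time INTEGER,
                     customer_id INTEGER);
INSERT INTO orders VALUES (1,1,9), (2,0,9), (3,1,10), (4,1,10), (5,0,11);
""")

def customers(having):
    # Run the grouped query with the given HAVING condition.
    sql = ("SELECT customer_id FROM orders GROUP BY customer_id "
           "HAVING " + having + " ORDER BY customer_id")
    return [row[0] for row in conn.execute(sql)]

by_sum = customers("SUM(delivered_on_time) = COUNT(*)")
by_min = customers("MIN(delivered_on_time) = 1")
min_eq_max = customers("MIN(delivered_on_time) = MAX(delivered_on_time)")

print(by_sum)      # [10]
print(by_min)      # [10]
print(min_eq_max)  # [10, 11]  (customer 11 has only late orders)
```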

Excluding tuples based on maximum condition

I have been trying to solve this SQL query problem, but without success. The problem is the following:
PROBLEM:
Given 4 tables, PRODUCTS, REPAIRS, OWNERS and MALFUNCTIONS, for each product Brand and Model display the type of malfunction that has been repaired the most times.
The tables have the following fields:
PRODUCTS: *Series_num, Brand, Model, Year, Code_Owner
OWNERS: *Code_Owner, Name, Surname, Street, Civic, City, (u)Phone
MALFUNCTIONS: *Malf_code, Desc
REPAIRS: *Series_num, *Malf_code, *Repair_Date, Price
* <- Primary key
(u) <- Unique attribute
The expected result, given this example of data:
| MODEL | BRAND | MALF_CODE | NUMBER OF REPAIRS|
|----------------------------------------------------|
| 1 | BRAND1 | 1 | 20 |
| 1 | BRAND1 | 2 | 10 |
| 2 | BRAND1 | 1 | 1 |
| 2 | BRAND1 | 2 | 1 |
| 1 | BRAND2 | 1 | 10 |
| 1 | BRAND2 | 2 | 11 |
Should be:
| MODEL | BRAND | MALF_CODE | NUMBER OF REPAIRS|
|----------------------------------------------------|
| 1 | BRAND1 | 1 | 20 |
| 2 | BRAND1 | 1 | 1 |
| 1 | BRAND2 | 2 | 11 |
Note that BRAND1, MODEL:2 has the same number of repairs for two different types of malfunction, so one of the rows can be ignored or both of them can be shown (it does not matter)
WHAT I'VE TRIED:
To get the first table, I used a simple JOIN query:
SELECT A.MODEL, A.BRAND, R.MALF_CODE, COUNT(*) AS N_REP
FROM REPAIRS R LEFT JOIN PRODUCTS A ON A.SERIES_NUM = R.SERIES_NUM
GROUP BY A.MODEL, A.BRAND, R.MALF_CODE;
Then I tried to get the second table thanks to MAX() function:
SELECT A.MODEL, A.BRAND, R.MALF_CODE, COUNT(*) AS N_REP
FROM REPAIRS R LEFT JOIN PRODUCTS A ON A.SERIES_NUM = R.SERIES_NUM
GROUP BY A.MODEL, A.BRAND, R.MALF_CODE
HAVING COUNT(*) IN(
SELECT MAX(R.MALF_CODE)
FROM REPAIRS R LEFT JOIN PRODUCTS A ON A.SERIES_NUM = R.SERIES_NUM
GROUP BY A.MODEL, A.BRAND, R.MALF_CODE
ORDER BY A.BRAND, R.MALF_CODE);
But this throws me the following error:
[42000][907] ORA-00907: Missing closing Parenthesis
It seems I can't find the error.
I hope I've been clear enough. Thanks in advance.
EDIT: I forgot to mention that I'm aware of RANK functions and such, but never heard of Partitions. So a solution without them is highly appreciated but not mandatory.
If I understand correctly, you want the row with the most repairs for each model/brand combination. If so, window functions are one method:
SELECT MODEL, BRAND, MALF_CODE, N_REP
FROM (SELECT P.MODEL, P.BRAND, R.MALF_CODE, COUNT(*) AS N_REP,
ROW_NUMBER() OVER (PARTITION BY P.MODEL, P.BRAND ORDER BY COUNT(*) DESC, R.MALF_CODE) as SEQNUM
FROM REPAIRS R LEFT JOIN
PRODUCTS P
ON P.SERIES_NUM = R.SERIES_NUM
GROUP BY P.MODEL, P.BRAND, R.MALF_CODE
) MB
WHERE seqnum = 1;
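For readers who would rather avoid window functions, as the question's edit mentions, one alternative is to compute the per-malfunction counts once and keep the rows whose count equals the maximum for their model/brand. The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module rather than Oracle, and the series numbers, dates and repair counts are made-up sample data. Unlike ROW_NUMBER(), this version keeps both rows when two malfunctions tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (series_num TEXT PRIMARY KEY, brand TEXT, model INTEGER);
CREATE TABLE repairs (series_num TEXT, malf_code INTEGER, repair_date TEXT,
                      PRIMARY KEY (series_num, malf_code, repair_date));
-- Hypothetical sample data.
INSERT INTO products VALUES ('S1','BRAND1',1), ('S2','BRAND1',2), ('S3','BRAND2',1);
INSERT INTO repairs VALUES
  ('S1',1,'2020-01-01'), ('S1',1,'2020-02-01'), ('S1',2,'2020-03-01'),
  ('S2',1,'2020-01-01'), ('S2',2,'2020-02-01'),
  ('S3',1,'2020-01-01'), ('S3',2,'2020-02-01'), ('S3',2,'2020-03-01');
""")

rows = conn.execute("""
WITH counts AS (
    -- One row per model/brand/malfunction with its number of repairs.
    SELECT p.model, p.brand, r.malf_code, COUNT(*) AS n_rep
    FROM repairs r
    JOIN products p ON p.series_num = r.series_num
    GROUP BY p.model, p.brand, r.malf_code
)
SELECT c.model, c.brand, c.malf_code, c.n_rep
FROM counts c
WHERE c.n_rep = (SELECT MAX(c2.n_rep)
                 FROM counts c2
                 WHERE c2.model = c.model AND c2.brand = c.brand)
ORDER BY c.brand, c.model, c.malf_code
""").fetchall()

for row in rows:
    print(row)
# (1, 'BRAND1', 1, 2)
# (2, 'BRAND1', 1, 1)
# (2, 'BRAND1', 2, 1)
# (1, 'BRAND2', 2, 2)
```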

Select DISTINCT returning too many records

I have two tables: Products and Items. I want to select distinct items that belong to a product based on the condition column, sorted by price ASC.
+-------------------+
| id | name |
+-------------------+
| 1 | Mickey Mouse |
+-------------------+
+-------------------------------------+
| id | product_id | condition | price |
+-------------------------------------+
| 1 | 1 | New | 90 |
| 2 | 1 | New | 80 |
| 3 | 1 | Excellent | 60 |
| 4 | 1 | Excellent | 50 |
| 5 | 1 | Used | 30 |
| 6 | 1 | Used | 20 |
+-------------------------------------+
Desired output:
+----------------------------------------+
| id | name | condition | price |
+----------------------------------------+
| 2 | Mickey Mouse | New | 80 |
| 4 | Mickey Mouse | Excellent | 50 |
| 6 | Mickey Mouse | Used | 20 |
+----------------------------------------+
Here's the query. It returns six records instead of the desired three:
SELECT DISTINCT(items.condition), items.price, products.name
FROM products
INNER JOIN items ON products.id = items.product_id
WHERE products.id = 1
ORDER BY items."price" ASC, products.name;
Correct PostgreSQL query:
SELECT DISTINCT ON (items.condition) items.id, items.condition, items.price, products.name
FROM products
INNER JOIN items ON products.id = items.product_id
WHERE products.id = 1
ORDER BY items.condition, items.price, products.name;
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal.
Details here
There is no distinct() function in SQL. Your query is being parsed as
SELECT DISTINCT (items.condition), ...
which is equivalent to
SELECT DISTINCT items.condition, ...
DISTINCT applies to the whole row - if two or more rows all have the same field values, THEN the "duplicate" row is dropped from the result set.
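The whole-row behaviour is easy to see on the question's sample data. This is a small sketch using Python's sqlite3 module instead of PostgreSQL; the parentheses around the first column make no difference, and because every price is different, no row counts as a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER,
                    "condition" TEXT, price INTEGER);
INSERT INTO products VALUES (1, 'Mickey Mouse');
INSERT INTO items VALUES
  (1,1,'New',90), (2,1,'New',80), (3,1,'Excellent',60),
  (4,1,'Excellent',50), (5,1,'Used',30), (6,1,'Used',20);
""")

# DISTINCT compares the full (condition, price, name) row.
whole_row = conn.execute("""
SELECT DISTINCT (items."condition"), items.price, products.name
FROM products
JOIN items ON products.id = items.product_id
WHERE products.id = 1
""").fetchall()

# With only one column selected, duplicates really do collapse.
one_column = conn.execute("""
SELECT DISTINCT "condition" FROM items WHERE product_id = 1
""").fetchall()

print(len(whole_row), len(one_column))  # 6 3
```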
You probably want something more like
SELECT items.condition, MIN(items.price), products.name
FROM ...
...
GROUP BY items.condition, products.name
I want to select distinct items that belong to a product based on the
condition column, sorted by price ASC.
You most probably want DISTINCT ON:
SELECT *
FROM (
SELECT DISTINCT ON (i.condition)
i.id AS item_id, p.name, i.condition, i.price
FROM products p
JOIN items i ON i.product_id = p.id
WHERE p.id = 1
ORDER BY i.condition, i.price ASC
) sub
ORDER BY item_id;
Since the leading columns of ORDER BY have to match the columns used in DISTINCT ON , you need a subquery to get the sort order you display.
Better yet:
SELECT i.item_id, p.name, i.condition, i.price
FROM (
SELECT DISTINCT ON (condition)
id AS item_id, product_id, condition, price
FROM items
WHERE product_id = 1
ORDER BY condition, price
) i
JOIN products p ON p.id = i.product_id
ORDER BY item_id;
Should be a bit faster.
Aside: You shouldn't be using the non-descriptive name id as identifier. Use item_id and product_id instead.
More details, links and a benchmark test in this related answer:
Select first row in each GROUP BY group?
Use a SELECT GROUP BY, extracting only the MIN(price) for every PRODUCT/CONDITION.
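A portable version of that GROUP BY idea, for databases without DISTINCT ON, might look like the sketch below. It is run here on SQLite through Python's sqlite3 module; the derived-table alias m and the column alias min_price are names chosen for this illustration. The minimum price per product/condition is computed first and then joined back to items to recover the item id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER,
                    "condition" TEXT, price INTEGER);
INSERT INTO products VALUES (1, 'Mickey Mouse');
INSERT INTO items VALUES
  (1,1,'New',90), (2,1,'New',80), (3,1,'Excellent',60),
  (4,1,'Excellent',50), (5,1,'Used',30), (6,1,'Used',20);
""")

rows = conn.execute("""
SELECT i.id, p.name, i."condition", i.price
FROM items i
JOIN (SELECT product_id, "condition", MIN(price) AS min_price
      FROM items
      GROUP BY product_id, "condition") m
  ON m.product_id = i.product_id
 AND m."condition" = i."condition"
 AND m.min_price = i.price
JOIN products p ON p.id = i.product_id
WHERE p.id = 1
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
# (2, 'Mickey Mouse', 'New', 80)
# (4, 'Mickey Mouse', 'Excellent', 50)
# (6, 'Mickey Mouse', 'Used', 20)
```

If two items of the same condition share the minimum price, this returns both of them, whereas DISTINCT ON would keep only one.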

How to get a MAX and a COUNT from a three table join?

I got an interview question where there's a Car sale modeled in a DB. Each Car represents a physical car in a Car sale which refers to a Make and a Model table. A Sale table keeps track of each Car that is sold. A Sale only consists of one Car, so there's a record in Sale per every unique Car that had been sold.
The question was to find out the name of the most sold Model in the car sale. I answered with a 3-level nested query. The interviewer specifically asked for a solution using joins, but I only managed to join the tables without the aggregates.
How would you join 3 tables as below (Car, Make, Sale) while using two other aggregates?
Here's a rough sketch of the schema. The most sold Model here should return 'Corolla'
Car
| carid| modid | etc...
_________________
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
Make
| mkid | name |
_________________
| 1 | Toyota |
| 2 | Nissan |
| 3 | Chevy |
| 4 | Merc |
| 5 | Ford |
Model
| modid| name | mkid |
________________________
| 1 | Corolla| 1
| 2 | Sunny | 2
| 3 | Carina | 1
| 4 | Skyline| 2
| 5 | Focus | 5
Sale
| sid | carid | etc...
_________________
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
Edit:
Using MS SQL Server 2008
Output needed:
Model Name | Count
_____________________
Corolla | 3
i.e. The model of the Car that has been sold the most.
Notice only 3 Corollas and 2 Sunnys are in the Car table while the Sale table corresponds to each of those with other sales detail. The 5 Sale records are actually Corolla, Corolla, Corolla, Sunny and Sunny.
Since you are using SQL Server 2008, make use of Common Table Expression and Window Function.
WITH recordList
AS
(
SELECT c.name, COUNT(*) [Count],
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) rn
FROM Sale a
INNER JOIN Car b
ON a.carid = b.carID
INNER JOIN Model c
ON b.modID = c.modID
GROUP BY c.Name
)
SELECT name, [Count]
FROM recordList
WHERE rn = 1
SQLFiddle Demo
When interviewers ask for this they usually want you to say that you'd use windowed functions. You could give each sale a unique ascending number partitioned by model and the highest sale number you'd get would be the max count.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
The following query works on Oracle 11g. Here's the fiddle link.
SELECT name FROM (
SELECT model.name AS name FROM car , sale , model
WHERE car.carid=sale.carid
AND car.modid=model.modid
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
Or
SELECT name FROM (
SELECT model.name AS name FROM car natural join sale natural join model
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
OUTPUT
| NAME |
-----------
| Corolla |
This is based on your newly added SQL Server 2008 tag. If you are using a different RDBMS you'll probably need to use limit instead of top and place it at the end of the top_sold_car subquery.
select Make.name as Make, Model.name as Model
from (
select top 1 modid, count(*) as num_sold
from Car
group by modid
order by num_sold desc) as top_sold_car
join Model
on (top_sold_car.modid = Model.modid)
join Make
on (Model.mkid = Make.mkid)
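As a rough illustration of the join-plus-aggregate approach with LIMIT in place of TOP, here is the question's sample schema loaded into SQLite through Python's sqlite3 module. It is a sketch rather than SQL Server syntax, and it counts rows from Sale rather than Car so that only cars actually sold are included. With a tie for first place, LIMIT 1 would return just one of the tied models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Make (mkid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Model (modid INTEGER PRIMARY KEY, name TEXT, mkid INTEGER);
CREATE TABLE Car (carid INTEGER PRIMARY KEY, modid INTEGER);
CREATE TABLE Sale (sid INTEGER PRIMARY KEY, carid INTEGER);
INSERT INTO Make VALUES (1,'Toyota'), (2,'Nissan'), (3,'Chevy'), (4,'Merc'), (5,'Ford');
INSERT INTO Model VALUES (1,'Corolla',1), (2,'Sunny',2), (3,'Carina',1),
                         (4,'Skyline',2), (5,'Focus',5);
INSERT INTO Car VALUES (1,1), (2,1), (3,1), (4,2), (5,2);
INSERT INTO Sale VALUES (1,1), (2,2), (3,3), (4,4), (5,5);
""")

top_model = conn.execute("""
SELECT mk.name AS make, m.name AS model, COUNT(*) AS num_sold
FROM Sale s
JOIN Car c ON c.carid = s.carid      -- each sale refers to one car
JOIN Model m ON m.modid = c.modid    -- each car has one model
JOIN Make mk ON mk.mkid = m.mkid     -- each model has one make
GROUP BY m.modid, m.name, mk.name
ORDER BY num_sold DESC
LIMIT 1
""").fetchone()

print(top_model)  # ('Toyota', 'Corolla', 3)
```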

Get SUM in GROUP BY with JOIN using MySQL

I have two tables in MySQL 5.1.38.
products
+----+------------+-------+------------+
| id | name | price | department |
+----+------------+-------+------------+
| 1 | Fire Truck | 15.00 | Toys |
| 2 | Bike | 75.00 | Toys |
| 3 | T-Shirt | 18.00 | Clothes |
| 4 | Skirt | 18.00 | Clothes |
| 5 | Pants | 22.00 | Clothes |
+----+------------+-------+------------+
ratings
+------------+--------+
| product_id | rating |
+------------+--------+
| 1 | 5 |
| 2 | 5 |
| 2 | 3 |
| 2 | 5 |
| 3 | 5 |
| 4 | 5 |
| 5 | 4 |
+------------+--------+
My goal is to get the total price of all products which have a 5 star rating in each department. Something like this.
+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes | 36.00 | /* T-Shirt and Skirt */
| Toys | 90.00 | /* Fire Truck and Bike */
+------------+-------------+
I would like to do this without a subquery if I can. At first I tried a join with a sum().
select department, sum(price) from products
join ratings on product_id=products.id
where rating=5 group by department;
+------------+------------+
| department | sum(price) |
+------------+------------+
| Clothes | 36.00 |
| Toys | 165.00 |
+------------+------------+
As you can see, the price for the Toys department is incorrect: the Bike has two 5-star ratings, so its price is counted twice because of the join.
I then tried adding distinct to the sum.
select department, sum(distinct price) from products
join ratings on product_id=products.id where rating=5
group by department;
+------------+---------------------+
| department | sum(distinct price) |
+------------+---------------------+
| Clothes | 18.00 |
| Toys | 90.00 |
+------------+---------------------+
But then the clothes department is off because two products share the same price.
Currently my work-around involves taking something unique about the product (the id) and using that to make the price unique.
select department, sum(distinct price + id * 100000) - sum(id * 100000) as total_price
from products join ratings on product_id=products.id
where rating=5 group by department;
+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes | 36.00 |
| Toys | 90.00 |
+------------+-------------+
But this feels like such a silly hack. Is there a better way to do this without a subquery? Thanks!
Use:
SELECT p.department,
SUM(p.price) AS total_price
FROM PRODUCTS p
JOIN (SELECT DISTINCT
r.product_id,
r.rating
FROM RATINGS r) x ON x.product_id = p.id
AND x.rating = 5
GROUP BY p.department
Technically, this does not use a subquery - it uses a derived table/inline view.
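To see the effect of the derived table, the sketch below loads the question's sample data into SQLite through Python's sqlite3 module (not MySQL 5.1, so treat it as an illustration of the idea only) and compares the plain join with the de-duplicated one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL,
                       department TEXT);
CREATE TABLE ratings (product_id INTEGER, rating INTEGER);
INSERT INTO products VALUES
  (1,'Fire Truck',15.00,'Toys'), (2,'Bike',75.00,'Toys'),
  (3,'T-Shirt',18.00,'Clothes'), (4,'Skirt',18.00,'Clothes'),
  (5,'Pants',22.00,'Clothes');
INSERT INTO ratings VALUES (1,5), (2,5), (2,3), (2,5), (3,5), (4,5), (5,4);
""")

# Plain join: the Bike matches two 5-star rows, so its price is summed twice.
naive = conn.execute("""
SELECT p.department, SUM(p.price)
FROM products p
JOIN ratings r ON r.product_id = p.id
WHERE r.rating = 5
GROUP BY p.department
ORDER BY p.department
""").fetchall()

# Derived table: duplicate (product_id, rating) pairs collapse before the join.
deduped = conn.execute("""
SELECT p.department, SUM(p.price) AS total_price
FROM products p
JOIN (SELECT DISTINCT product_id, rating FROM ratings) x
  ON x.product_id = p.id AND x.rating = 5
GROUP BY p.department
ORDER BY p.department
""").fetchall()

print(naive)    # [('Clothes', 36.0), ('Toys', 165.0)]
print(deduped)  # [('Clothes', 36.0), ('Toys', 90.0)]
```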
The primary reason you are having trouble finding a solution is that the schema as presented is fundamentally flawed. You shouldn't allow a table to have two rows that are complete duplicates of each other. Every table should have a means to uniquely identify each row even if it is the combination of all columns. Now, if we change the ratings table so that it has an AUTO_INCREMENT column called Id, the problem is easier:
Select products.department, Sum(price) As total_price
From products
Join ratings As R1
On R1.product_id = products.id
And R1.rating = 5
Left Join ratings As R2
On R2.product_id = R1.product_id
And R2.rating = R1.rating
And R2.Id > R1.Id
Where R2.Id Is Null
Group By products.department
You can do two queries. First query:
SELECT DISTINCT product_id FROM ratings WHERE rating = 5;
Then, take each of those ID's and manually put them in the second query:
SELECT department, Sum(price) AS total_price
FROM products
WHERE id In (1,2,3,4)
GROUP BY department;
This is the work-around for not being able to use subqueries. Without them, there is no way to eliminate the duplicate records caused by the join.
I can't think of any way to do it without a subquery somewhere in the query. You could perhaps use a View to mask the use of a subquery.
Barring that, your best bet is probably to find the minimum data set needed to make the calculation and do that in the front end. Whether or not that's possible depends on your specific data - how many rows, etc.
The other option (actually, maybe this is the best one...) would be to get a new ORM or do without it altogether ;)
This view would allow you to bypass the subquery:
CREATE VIEW Distinct_Product_Ratings
AS
SELECT DISTINCT
product_id,
rating
FROM
Ratings