Joining three tables with aggregation - sql

I have the following items table:
items:
id pr1 pr2 pr3
-------------------
1 11 22 tt
...
and two tables associated with the items:
comments:
item_id text
-------------
1 "cool"
1 "very good"
...
tags:
item_id tag
-------------
1 "life"
1 "drug"
...
Now I want to get a table with columns item_id, pr1, pr2, count(comments), count(tags) with a condition WHERE pr3 = zz. What is the best way to get it? I can do this by creating additional tables, but I was wondering if there is a way to achieve this using only a single SQL statement. I'm using Postgres 9.3.

The easiest way is certainly to get the counts in the select clause:
select
id,
pr1,
pr2,
(select count(*) from comments where item_id = items.id) as comment_count,
(select count(*) from tags where item_id = items.id) as tag_count
from items;

You can just join, but you need to be careful not to double count. For example, you can use subqueries to get what you want.
SELECT i.id,i.pr1,i.pr2, commentcount,tagcount FROM
items i
INNER JOIN
(SELECT item_id,count(*) as commentcount from comments GROUP BY item_id) c
ON i.id = c.item_id
INNER JOIN
(SELECT item_id,count(*) as tagcount from tags GROUP BY item_id) t
ON i.id = t.item_id
[EDIT] based on the comment, here's the left join version...
SELECT i.id,i.pr1,i.pr2, coalesce(commentcount,0) as commentcount,
coalesce(tagcount,0) as tagcount FROM
items i
LEFT JOIN
(SELECT item_id,count(*) as commentcount from comments GROUP BY item_id) c
ON i.id = c.item_id
LEFT JOIN
(SELECT item_id,count(*) as tagcount from tags GROUP BY item_id) t
ON i.id = t.item_id

Try this:
SELECT i.id, i.pr1, i.pr2, A.commentCount, B.tagCount
FROM items i
LEFT OUTER JOIN (SELECT item_id, COUNT(1) AS commentCount
FROM comments
GROUP BY item_id
) AS A ON i.id = A.item_id
LEFT OUTER JOIN (SELECT item_id, count(1) as tagCount
FROM tags
GROUP BY item_id
) AS B ON i.id = B.item_id;

select
i.id
, i.pr1
, i.pr2
, count(c.item_id) as count_comments
, count(t.item_id) as count_tags
from items i
left outer join comments c on i.id = c.item_id
left outer join tags t on i.id = t.item_id
group by i.id, i.pr1, i.pr2
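I've used a LEFT OUTER JOIN to also return counts of zero. Be aware that joining both child tables in one pass multiplies their rows: an item with 2 comments and 3 tags produces 6 joined rows, so both counts come out as 6. The counts are only correct when at most one of the two tables has more than one row per item; otherwise aggregate each table in its own subquery before joining.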
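To make the double-counting risk concrete, here is a minimal sketch that runs both styles of query against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Postgres here, and the sample rows (including the pr3 = 'zz' values) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, pr1 INTEGER, pr2 INTEGER, pr3 TEXT);
    CREATE TABLE comments (item_id INTEGER, text TEXT);
    CREATE TABLE tags (item_id INTEGER, tag TEXT);
    -- hypothetical sample data: item 1 has 2 comments and 3 tags, item 2 has none
    INSERT INTO items VALUES (1, 11, 22, 'zz'), (2, 33, 44, 'zz'), (3, 55, 66, 'tt');
    INSERT INTO comments VALUES (1, 'cool'), (1, 'very good');
    INSERT INTO tags VALUES (1, 'life'), (1, 'drug'), (1, 'music');
""")

# Joining both child tables directly pairs every comment with every tag.
naive = conn.execute("""
    SELECT i.id, COUNT(c.item_id), COUNT(t.item_id)
    FROM items i
    LEFT JOIN comments c ON c.item_id = i.id
    LEFT JOIN tags t ON t.item_id = i.id
    WHERE i.pr3 = 'zz'
    GROUP BY i.id
    ORDER BY i.id
""").fetchall()

# Aggregating each child table first keeps one row per item on each side.
pre_aggregated = conn.execute("""
    SELECT i.id, i.pr1, i.pr2,
           COALESCE(c.comment_count, 0),
           COALESCE(t.tag_count, 0)
    FROM items i
    LEFT JOIN (SELECT item_id, COUNT(*) AS comment_count
               FROM comments GROUP BY item_id) c ON c.item_id = i.id
    LEFT JOIN (SELECT item_id, COUNT(*) AS tag_count
               FROM tags GROUP BY item_id) t ON t.item_id = i.id
    WHERE i.pr3 = 'zz'
    ORDER BY i.id
""").fetchall()

print(naive)           # [(1, 6, 6), (2, 0, 0)]
print(pre_aggregated)  # [(1, 11, 22, 2, 3), (2, 33, 44, 0, 0)]
```

The pre-aggregated form also shows where the WHERE pr3 = 'zz' condition from the question would go.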

Related

Select query not printing all of the items

When running
SELECT
s.id space_id
,s.name space_name
,i.name item_name
,GROUP_CONCAT(DISTINCT a.category) attribute_names
FROM spaces s
INNER JOIN spaceItemAssociations sia ON sia.space_id = s.id
INNER JOIN items i ON i.id = sia.item_id
INNER JOIN itemAssociations ia ON ia.items_id = i.id
INNER JOIN itemAttributes a ON ia.itemAttributes_id = a.id
WHERE s.id = 1
on this sql fiddle I get only one row instead of four.
Expected:
4 rows with the objects belonging to this space and their attributes
Actual:
1 row
Is it my select that is wrong?
Because an aggregate function (GROUP_CONCAT) is present, this is an aggregate query - but because no GROUP BY is present, it aggregates over the entire result set, leaving you with a single result row.
If you want distinct categories for each item (each group of rows corresponding to the same item), rather than distinct categories across all items, add a GROUP BY i.id or similar.
When you GROUP_CONCAT the categories, you don't see that these 4 rows only have attribute2. Returning only 1 row is the normal behavior here because GROUP_CONCAT is an aggregate.
Check my SQL Fiddle: with GROUP_CONCAT commented out, the query returns all 4 rows.
SELECT
s.id space_id
,s.name space_name
,i.name item_name
, a.category
-- ,GROUP_CONCAT(DISTINCT a.category) attribute_names
FROM spaces s
INNER JOIN spaceItemAssociations sia ON sia.space_id = s.id
INNER JOIN items i ON i.id = sia.item_id
INNER JOIN itemAssociations ia ON ia.items_id = i.id
INNER JOIN itemAttributes a ON ia.itemAttributes_id = a.id
WHERE s.id = 1
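The effect of the missing GROUP BY can be reproduced with a small sketch. It uses SQLite through Python's sqlite3 module (SQLite also has GROUP_CONCAT and, like MySQL in its permissive mode, accepts the bare i.name column). The schema is trimmed to three of the tables and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE itemAttributes (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE itemAssociations (items_id INTEGER, itemAttributes_id INTEGER);
    -- hypothetical sample data
    INSERT INTO items VALUES (1, 'chair'), (2, 'table');
    INSERT INTO itemAttributes VALUES (1, 'attribute1'), (2, 'attribute2');
    INSERT INTO itemAssociations VALUES (1, 2), (2, 1), (2, 2);
""")

base = """
    SELECT i.name, GROUP_CONCAT(DISTINCT a.category)
    FROM items i
    INNER JOIN itemAssociations ia ON ia.items_id = i.id
    INNER JOIN itemAttributes a ON ia.itemAttributes_id = a.id
"""

# No GROUP BY: the aggregate spans the whole result set, so one row comes back.
collapsed = conn.execute(base).fetchall()

# GROUP BY i.id: one row per item, each with its own category list.
per_item = conn.execute(base + " GROUP BY i.id ORDER BY i.id").fetchall()

print(len(collapsed))  # 1
print(len(per_item))   # 2
```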

SELECT * and SELECT COUNT(*) in one query

My SQL query looks like this
SELECT *
FROM categories AS c
LEFT JOIN LATERAL (SELECT i.*
FROM influencer_profiles AS i
WHERE c.id = i.category_id
ORDER BY i.updated_at
LIMIT 2) AS i ON 1 = 1
INNER JOIN users AS u ON i.user_id = u.id
But I also want to count the influencer_profiles per category, to display how many influencer_profiles each category has. How can I use COUNT(*) while still selecting all columns?
SELECT *
FROM categories AS c
LEFT JOIN LATERAL (SELECT COUNT(*)
FROM influencer_profiles AS i
WHERE c.id = i.category_id
ORDER BY i.updated_at
LIMIT 2) AS i ON 1 = 1
INNER JOIN users AS u ON i.user_id = u.id
This code doesn't work.
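Perhaps you just want a window function. I note that you are using a left join in one place and the inner join is undoing it, so I use left joins throughout. Keep in mind that the window count only sees the rows that survive LIMIT 2, so it gives the number of rows returned per category, not the total number of profiles in it.
So, I am thinking: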
SELECT c.*, i.*, u.*,
COUNT(*) OVER (PARTITION BY c.id) as category_cnt
FROM categories c LEFT JOIN LATERAL
(SELECT i.*
FROM influencer_profiles AS i
WHERE c.id = i.category_id
ORDER BY i.updated_at
LIMIT 2
) i
ON 1=1 LEFT JOIN
users u
ON i.user_id = u.id;
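If the total number of profiles per category is what is wanted, one possible variation is to compute the count and a row number in the same subquery and filter on the row number afterwards. The sketch below is an assumption-laden illustration, not the query above: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module, replaces LATERAL (which SQLite lacks) with ROW_NUMBER(), leaves out the users table, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE influencer_profiles (
        id INTEGER PRIMARY KEY, category_id INTEGER, updated_at TEXT);
    -- hypothetical sample data: category 1 has 3 profiles, category 2 has none
    INSERT INTO categories VALUES (1, 'food'), (2, 'travel');
    INSERT INTO influencer_profiles VALUES
        (1, 1, '2020-01-01'), (2, 1, '2020-01-02'), (3, 1, '2020-01-03');
""")

rows = conn.execute("""
    SELECT c.id, r.id, COALESCE(r.category_cnt, 0)
    FROM categories c
    LEFT JOIN (
        SELECT i.*,
               -- position of each profile within its category
               ROW_NUMBER() OVER (PARTITION BY i.category_id
                                  ORDER BY i.updated_at) AS rn,
               -- total profiles in the category, computed before any filtering
               COUNT(*) OVER (PARTITION BY i.category_id) AS category_cnt
        FROM influencer_profiles i
    ) r ON r.category_id = c.id AND r.rn <= 2
    ORDER BY c.id, r.rn
""").fetchall()

print(rows)  # [(1, 1, 3), (1, 2, 3), (2, None, 0)]
```

Category 1 returns only two profile rows but still reports a count of 3, and the empty category is kept with a count of 0.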

Left outer join with only first row

I have a query something like
SELECT S.product_id, S.link, C.id AS category_id
FROM Products P
INNER JOIN SEO S ON S.product_id = P.id AND P.product_type = 1
LEFT OUTER JOIN Categories C ON c.product_id = P.id
WHERE P.active = 1
It works fine for me as long as each product is assigned to only one category. But if a product is assigned to many categories, it returns all possible combinations.
Can I select only the first one? And if a product doesn't have any category, the link should still be returned with category_id = NULL.
An easy way is to use outer apply, so as to have a correlated join, and make that a top 1 query. Thus you are able to access all columns of the category record in question. I'm adding a category name here as an example:
select s.product_id, s.link, c.id as category_id, c.name as category_name
from products p
inner join seo s on s.product_id = p.id
outer apply
(
select top 1 *
from categories cat
where cat.product_id = p.id
order by cat.id
) c
where p.active = 1
and p.product_type = 1;
You can use a GROUP BY to accomplish this along with an Aggregate function, most likely MIN or MAX.
Depending on which Category Id you prefer in your result you could select the minimum.
SELECT S.product_id, S.link, MIN(C.id) AS category_id
FROM Products P
INNER JOIN SEO S ON S.product_id = P.id AND P.product_type = 1
LEFT OUTER JOIN Categories C ON c.product_id = P.id
WHERE P.active = 1
GROUP BY S.product_id, S.link
Or the maximum.
SELECT S.product_id, S.link, MAX(C.id) AS category_id
FROM Products P
INNER JOIN SEO S ON S.product_id = P.id AND P.product_type = 1
LEFT OUTER JOIN Categories C ON c.product_id = P.id
WHERE P.active = 1
GROUP BY S.product_id, S.link
Alternate solution using a correlated subquery with TOP 1:
SELECT S.product_id, S.link,
(
SELECT TOP 1 C.id FROM Categories C WHERE C.product_id = P.id
ORDER BY C.id /* or your preferred sort option */
) AS category_id
FROM Products P
INNER JOIN SEO S ON S.product_id = P.id AND P.product_type = 1
WHERE P.active = 1
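The "first category or NULL" behaviour of the correlated-subquery approach can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module, so LIMIT 1 takes the place of SQL Server's TOP 1, and the tables are trimmed down and filled with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_type INTEGER, active INTEGER);
    CREATE TABLE seo (product_id INTEGER, link TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, product_id INTEGER);
    -- hypothetical sample data: product 1 has two categories, product 2 has none
    INSERT INTO products VALUES (1, 1, 1), (2, 1, 1);
    INSERT INTO seo VALUES (1, '/a'), (2, '/b');
    INSERT INTO categories VALUES (10, 1), (11, 1);
""")

rows = conn.execute("""
    SELECT s.product_id, s.link,
           -- scalar subquery: yields NULL when the product has no category
           (SELECT c.id
            FROM categories c
            WHERE c.product_id = p.id
            ORDER BY c.id
            LIMIT 1) AS category_id
    FROM products p
    INNER JOIN seo s ON s.product_id = p.id
    WHERE p.active = 1 AND p.product_type = 1
    ORDER BY s.product_id
""").fetchall()

print(rows)  # [(1, '/a', 10), (2, '/b', None)]
```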

ORDER BY id IN Subquery

I have a query like this:
SELECT i.*
FROM items i
WHERE i.id
IN (
SELECT c.item_id
FROM cart c
WHERE c.sessID=MY_SESSION_ID
)
It's working beautifully, but I need to sort the items from the cart by date of purchase (cart.id) DESC.
I don't want to sort in PHP. How can I sort by cart.id?
I tried:
SELECT i.*
FROM items i
WHERE i.id
IN (
SELECT c.item_id
FROM cart c
WHERE c.sessID=MY_SESSION_ID
)
ORDER BY c.id
But it did not sort correctly.
Change your subquery to an inner join. Columns selected inside a subquery cannot be referenced outside of it, so the outer ORDER BY cannot see c.id.
SELECT i.*
FROM items i
JOIN (SELECT item_id,
id
FROM cart
WHERE sessID = MY_SESSION_ID) C
ON i.id = c.item_id
ORDER BY c.id Desc
or use this.
SELECT i.*
FROM items i
JOIN cart C
ON i.id = c.item_id
AND c.sessID = MY_SESSION_ID
ORDER BY c.id Desc
Try this query:
SELECT i.* FROM items i LEFT OUTER JOIN cart c
ON i.id = c.item_id WHERE c.sessID=MY_SESSION_ID AND
c.item_id is not null ORDER BY c.id DESC
Try this:
SELECT i.*
FROM items i
INNER JOIN cart c ON i.id = c.item_id
WHERE c.sessID = MY_SESSION_ID
GROUP BY i.id
ORDER BY MAX(c.id) DESC;
OR
SELECT i.*
FROM items i
INNER JOIN (SELECT item_id, MAX(id) AS cid
FROM cart
WHERE sessID = MY_SESSION_ID
GROUP BY item_id
) AS c ON i.id = c.item_id
ORDER BY c.cid DESC;
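As a quick check of the last variant, the sketch below runs it on an in-memory SQLite database through Python's sqlite3 module. The session id is passed as a bound parameter instead of the MY_SESSION_ID placeholder, and the rows are invented; note that item 2 appears twice in the cart but is returned once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cart (id INTEGER PRIMARY KEY, item_id INTEGER, sessID TEXT);
    -- hypothetical sample data: session 's1' added item 2, then 1, then 2 again
    INSERT INTO items VALUES (1, 'pen'), (2, 'book'), (3, 'lamp');
    INSERT INTO cart VALUES (1, 2, 's1'), (2, 1, 's1'), (3, 3, 's2'), (4, 2, 's1');
""")

rows = conn.execute("""
    SELECT i.id, i.name
    FROM items i
    INNER JOIN (SELECT item_id, MAX(id) AS cid   -- latest cart row per item
                FROM cart
                WHERE sessID = ?
                GROUP BY item_id) c ON c.item_id = i.id
    ORDER BY c.cid DESC
""", ("s1",)).fetchall()

print(rows)  # [(2, 'book'), (1, 'pen')]
```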

Optimize JOIN SQL query with additional SELECT

I need a query which will select just one image (GROUP BY phi.id_product) for each product, and this image has to be the one with the highest priority (inner SELECT with ORDER BY statement).
The priority is stored in the N:M relation table called product_has_image.
I've created a query, but it takes about 3 seconds to execute and I need to optimize it. Here it is:
SELECT p.*, i.id AS imageid
FROM `product` p JOIN `category` c on c.`id` = p.`id_category`
LEFT OUTER JOIN (SELECT id_product, id_image FROM
`product_has_image` ORDER BY priority DESC) phi ON p.id = phi.id_product
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
Indexes which I find to be important in this query are:
image (PRIMARY id)
product_has_image (PRIMARY id_product, id_image; INDEX id_product; INDEX id_image)
product (PRIMARY id, id_category; INDEX id_category)
category (PRIMARY id; INDEX id_parent)
Most of the time is spent joining against the inner SELECT statement, which is required for sorting.
Joining with LEFT JOIN [product_has_image] phi ON p.id = phi.id_product is much faster, but doesn't assign the image with the highest priority.
Any help would be appreciated.
Reformatted for sensibility . . .
SELECT p.*, i.id AS imageid
FROM `product` p
INNER JOIN `category` c on (c.`id` = p.`id_category`)
LEFT OUTER JOIN (SELECT id_product, id_image
FROM `product_has_image`
ORDER BY priority DESC) phi
ON (p.id = phi.id_product)
LEFT OUTER JOIN `image` i
ON (phi.id_image = i.id)
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
Without seeing an execution plan or DDL, I'd guess (shudder) that the problem is likely to be the inner select/sort. If you create a view
create view highest_priority_images as
select id_product, max(priority) as max_priority
from product_has_image
group by id_product
Then you can replace that inner SELECT...ORDER BY with a SELECT...INNER JOIN on that view. That would reduce the cardinality, so I'd expect it to run faster.
Posting DDL would help.
I would probably try to do it like this:
SELECT p.*, i.id AS imageid
FROM `product` p
INNER JOIN `category` c ON c.id = p.id_category
/* a list of `id_product`s with their highest priorities
from `product_has_image` */
LEFT OUTER JOIN (
SELECT id_product, MAX(priority) AS max_priority
FROM `product_has_image`
GROUP BY id_product
) m ON p.id = m.id_product
/* now joining `product_has_image` again, using
m.`max_priority` for additional filtering */
LEFT OUTER JOIN `product_has_image` phi
ON p.id = phi.id_product AND m.max_priority = phi.priority
/* if you only select `id` from `image`, you can use
phi.`id_image` instead and remove this join */
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE c.id_parent = 2 OR c.id = 2
Can't test it now, but wouldn't it be possible to do this?
SELECT p.*, i.id AS imageid
FROM `product` p JOIN `category` c on c.`id` = p.`id_category`
LEFT JOIN `product_has_image` phi ON p.id = phi.id_product
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
ORDER BY phi.priority DESC
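The idea is to do it in a regular join and order by phi.priority. Note, however, that ORDER BY is applied after GROUP BY, so it sorts the grouped result and does not guarantee that each group keeps its highest-priority image.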
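The MAX(priority) join suggested earlier in this thread can be sketched as follows. This is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, drops the category and image tables, and uses invented rows. If two images of one product share the top priority, this approach returns both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_has_image (id_product INTEGER, id_image INTEGER, priority INTEGER);
    -- hypothetical sample data: product 3 has no image at all
    INSERT INTO product VALUES (1, 'mug'), (2, 'cap'), (3, 'bag');
    INSERT INTO product_has_image VALUES (1, 100, 1), (1, 101, 5), (2, 200, 3);
""")

rows = conn.execute("""
    SELECT p.id, phi.id_image
    FROM product p
    -- highest priority per product
    LEFT JOIN (SELECT id_product, MAX(priority) AS max_priority
               FROM product_has_image
               GROUP BY id_product) m ON m.id_product = p.id
    -- join back to pick the image row that carries that priority
    LEFT JOIN product_has_image phi
           ON phi.id_product = p.id AND phi.priority = m.max_priority
    ORDER BY p.id
""").fetchall()

print(rows)  # [(1, 101), (2, 200), (3, None)]
```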