group twice in one query - sql

The query below doesn't return what I expect.
The table relationships are:
each gallery includes multiple media, and each media includes multiple media_user_action rows.
I want to count how many media_user_action rows each gallery has and order by that count, with one row per gallery:
rows: [
{
"id": 1
},
{
"id": 2
}
]
but this query returns duplicate gallery rows, something like:
rows: [
{
"id": 1
},
{
"id": 1
},
{
"id": 2
}
...
]
I think this is because the LEFT JOIN subquery groups the media_user_action rows only by media_id.
Do I need to group by gallery_id as well?
SELECT
g.*
FROM gallery g
LEFT JOIN gallery_media gm ON gm.gallery_id = g.id
LEFT JOIN (
SELECT
media_id,
COUNT(*) as mua_count
FROM media_user_action
WHERE type = 0
GROUP BY media_id
) mua ON mua.media_id = gm.media_id
ORDER BY g.id desc NULLS LAST OFFSET $1 LIMIT $2
table
gallery
id |
1 |
2 |
gallery_media
id | gallery_id fk gallery.id | media_id fk media.id
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
....
media_user_action
id | media_id fk media.id | user_id | type
1 | 1 | 1 | 0
2 | 1 | 2 | 0
3 | 3 | 1 | 0
...
media
id |
1 |
2 |
3 |
UPDATE
There are other tables I need to select from. This is part of a function like this one: https://jsfiddle.net/g8wtqqqa/1/, which builds the query from the options the user selects.
So, to correct my question: when the user wants to count media_user_action and order by that count, I need a way to put this in a subquery, if possible without changing any other code.
Based on @trincot's answer below, I updated the code: I only added media_count to the select list, changed it a little, and put the rest in a subquery. This is what I want.
The rows are now grouped by gallery.id, but sorting by media_count DESC and ASC gives the same result, and I can't figure out why.
SELECT
g.*,
row_to_json(gi.*) as gallery_information,
row_to_json(gl.*) as gallery_limit,
media_count
FROM gallery g
LEFT JOIN gallery_information gi ON gi.gallery_id = g.id
LEFT JOIN gallery_limit gl ON gl.gallery_id = g.id
LEFT JOIN "user" u ON u.id = g.create_by_user_id
LEFT JOIN category_gallery cg ON cg.gallery_id = g.id
LEFT JOIN category c ON c.id = cg.category_id
LEFT JOIN (
SELECT
gm.gallery_id,
COUNT(DISTINCT mua.media_id) media_count
FROM gallery_media gm
INNER JOIN media_user_action mua
ON mua.media_id = gm.media_id AND mua.type = 0
GROUP BY gm.gallery_id
) gm ON gm.gallery_id = g.id
ORDER BY gm.media_count asc NULLS LAST OFFSET $1 LIMIT $2

The join with gallery_media table is multiplying your results. The count and grouping should happen after you have made that join.
You could achieve that like this:
SELECT g.id,
COUNT(DISTINCT mua.media_id)
FROM gallery g
LEFT JOIN gallery_media gm
ON gm.gallery_id = g.id
LEFT JOIN media_user_action mua
ON mua.media_id = gm.media_id AND mua.type = 0
GROUP BY g.id
ORDER BY 2 DESC
If you need the other information as well, you can use a simplified form of the above as a subquery and join it with anything else you need; it will not multiply the number of rows:
SELECT g.*,
row_to_json(gi.*) as gallery_information,
row_to_json(gl.*) as gallery_limit,
media_count
FROM gallery g
LEFT JOIN (
SELECT gm.gallery_id,
COUNT(DISTINCT mua.media_id) media_count
FROM gallery_media gm
INNER JOIN media_user_action mua
ON mua.media_id = gm.media_id AND mua.type = 0
GROUP BY gm.gallery_id
) gm
ON gm.gallery_id = g.id
LEFT JOIN gallery_information gi ON gi.gallery_id = g.id
LEFT JOIN gallery_limit gl ON gl.gallery_id = g.id
ORDER BY media_count DESC NULLS LAST
OFFSET $1
LIMIT $2
The above assumes that gallery_id is unique in the tables gallery_information and gallery_limit.
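To see the row multiplication concretely, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database and compares the original join with a pre-aggregated subquery. SQLite is only a stand-in for PostgreSQL so the example can run anywhere; the Python variable names, the `agg` alias, and the choice of `COUNT(*)` to count actions (rather than `COUNT(DISTINCT mua.media_id)`) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gallery (id INTEGER PRIMARY KEY);
    CREATE TABLE gallery_media (id INTEGER PRIMARY KEY, gallery_id INTEGER, media_id INTEGER);
    CREATE TABLE media_user_action (id INTEGER PRIMARY KEY, media_id INTEGER, user_id INTEGER, type INTEGER);
    INSERT INTO gallery VALUES (1), (2);
    INSERT INTO gallery_media VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3);
    INSERT INTO media_user_action VALUES (1, 1, 1, 0), (2, 1, 2, 0), (3, 3, 1, 0);
""")

# Original shape: one output row per gallery_media row, so gallery 1 repeats.
duplicated = conn.execute("""
    SELECT g.id
    FROM gallery g
    LEFT JOIN gallery_media gm ON gm.gallery_id = g.id
    LEFT JOIN (
        SELECT media_id, COUNT(*) AS mua_count
        FROM media_user_action
        WHERE type = 0
        GROUP BY media_id
    ) mua ON mua.media_id = gm.media_id
    ORDER BY g.id
""").fetchall()

# Pre-aggregated shape: the subquery returns at most one row per gallery,
# so joining it to gallery cannot multiply rows.
per_gallery = conn.execute("""
    SELECT g.id, COALESCE(agg.action_count, 0) AS action_count
    FROM gallery g
    LEFT JOIN (
        SELECT gm.gallery_id, COUNT(*) AS action_count
        FROM gallery_media gm
        INNER JOIN media_user_action mua
            ON mua.media_id = gm.media_id AND mua.type = 0
        GROUP BY gm.gallery_id
    ) agg ON agg.gallery_id = g.id
    ORDER BY action_count DESC, g.id
""").fetchall()

print(duplicated)   # gallery 1 appears twice
print(per_gallery)  # one row per gallery, with its action count
```

Adding `g.id` as a secondary sort key keeps the order stable when two galleries have the same count, which also matters once OFFSET and LIMIT are applied.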

You're grouping by media_id to get a count, but since one gallery can have many gallery_media, you still end up with multiple rows for one gallery. You can either sum the mua_count from your subselect:
SELECT g.*, sum(mua_count)
FROM gallery g
LEFT JOIN gallery_media gm ON gm.gallery_id = g.id
LEFT JOIN (
SELECT media_id,
COUNT(*) as mua_count
FROM media_user_action
WHERE type = 0
GROUP BY media_id
) mua ON mua.media_id = gm.media_id
GROUP BY g.id
ORDER BY g.id desc NULLS LAST;
id | sum
----+-----
2 | 1
1 | 2
Or you can just JOIN all the way through and group once on g.id:
SELECT g.id, count(*)
FROM gallery g
JOIN gallery_media gm ON gm.gallery_id = g.id
JOIN media_user_action mua ON mua.media_id = gm.media_id AND mua.type = 0
GROUP BY g.id
ORDER BY count DESC;
id | count
----+-------
1 | 2
2 | 1

If you only want to show data from the gallery table (with select g.*), why do you join the other tables? An outer join attaches one or more records to each main record, depending on how many matches are found in the outer-joined table, so it is no surprise that you get duplicates (in your case because gallery ID 1 has two matches in gallery_media).

Related

Postgres: Many to many joins creates double output

I've recently added a many to many JOIN to one of my queries to add a "tag" functionality. The many to many works great, however, it's now causing a previously working part of the query to output records twice.
SELECT v.*
FROM "Server" AS s
JOIN "Vote" AS v ON (s.id = v."serverId")
JOIN "_ServerToTag" st ON (s.id = st."A")
OFFSET 0 LIMIT 25;
id | createdAt | authorId | serverId
-----+-------------------------+----------+----------
190 | 2020-12-23 15:47:25.476 | 6667 | 3
190 | 2020-12-23 15:47:25.476 | 6667 | 3
194 | 2020-12-21 15:47:25.476 | 6667 | 3
194 | 2020-12-21 15:47:25.476 | 6667 | 3
In the example above:
Server is my main table which contains a bunch of entries. Think of it as Reddit Posts, they have a title, content and use the Vote table to count "upvotes".
id | title
----+-------------------------------
3 | test server 3
Votes is a really simple table, it contains a timestamp of the "upvote", who created it, and the Server.id it is assigned to.
_ServerToTag is a table that contains two columns A and B. It connects Server to another table which contains Tags.
A | B
---+---
3 | 1
3 | 2
The above is a much-simplified query; in reality, I am summing the outcome of the query to get the total number of votes.
The desired outcome would be that the results are not duplicated:
id | createdAt | authorId | serverId
-----+-------------------------+----------+----------
190 | 2020-12-23 15:47:25.476 | 6667 | 3
194 | 2020-12-21 15:47:25.476 | 6667 | 3
I'm really unsure why this is even happening so I have absolutely no idea how to fix it.
Any help would be greatly appreciated.
Edit:
DISTINCT works if I want to query the Vote table. But not in more complex queries. In my case it would look something more like this:
SELECT s.id, s.title, sum(case WHEN v."createdAt" >= '2020-12-01' AND v."createdAt" < '2021-01-01'
THEN 1 ELSE 0 END ) AS "voteCount"
FROM "Server" AS s
LEFT JOIN "Vote" AS v ON (s.id = v."serverId")
LEFT JOIN "_ServerToTag" st ON (s.id = st."A")
GROUP BY s.id, s.title;
id | title | voteCount
----+-------------------------------+-----------
3 | test server 3 | 4
In the above, I only need the voteCount column to be DISTINCT.
SELECT s.id, s.title, sum(DISTINCT case WHEN v."createdAt" >= '2020-12-01' AND v."createdAt" < '2021-01-01'
THEN 1 ELSE 0 END ) AS "voteCount"
FROM "Server" AS s
LEFT JOIN "Vote" AS v ON (s.id = v."serverId")
LEFT JOIN "_ServerToTag" st ON (s.id = st."A")
GROUP BY s.id, s.title;
id | title | voteCount
----+-------------------------------+-----------
3 | test server 3 | 1
The above kind of works, but it seems to only count one vote even if there are multiple.
It appears that the problem is that you added the join to _ServerToTag. Because there are multiple rows in _ServerToTag for each row in Server, the query returns multiple rows for each server, one for each matching row in _ServerToTag.
It appears that _ServerToTag was added to the query so that it only includes servers which have tags. If that's your intent, you can use:
SELECT v.id, v."authorId", v."serverId", COUNT(DISTINCT v."createdAt") AS TOTAL_VOTES
FROM "Server" AS s
INNER JOIN "Vote" AS v
ON s.id = v."serverId"
INNER JOIN (SELECT DISTINCT "A" FROM "_ServerToTag") st
ON s.id = st."A"
WHERE v."createdAt" >= '2020-12-01' AND
v."createdAt" < '2021-01-01'
GROUP BY v.id, v."authorId", v."serverId"
OFFSET 0 LIMIT 25
or
SELECT v.id, v."authorId", v."serverId", COUNT(DISTINCT v."createdAt") AS TOTAL_VOTES
FROM "Server" AS s
INNER JOIN "Vote" AS v
ON s.id = v."serverId"
WHERE v."createdAt" >= '2020-12-01' AND
v."createdAt" < '2021-01-01' AND
s.id IN (SELECT "A" FROM "_ServerToTag")
GROUP BY v.id, v."authorId", v."serverId"
OFFSET 0 LIMIT 25
which may communicate the intent of the query a bit better.
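As a rough illustration of the point above, the following sketch rebuilds the question's three tables in an in-memory SQLite database (a stand-in for PostgreSQL; the column types and the Python variable names are assumptions) and counts votes per server, once with the link table joined in and once with it used only as a filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Server" (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE "Vote" (id INTEGER PRIMARY KEY, "createdAt" TEXT, "authorId" INTEGER, "serverId" INTEGER);
    CREATE TABLE "_ServerToTag" ("A" INTEGER, "B" INTEGER);
    INSERT INTO "Server" VALUES (3, 'test server 3');
    INSERT INTO "Vote" VALUES
        (190, '2020-12-23 15:47:25.476', 6667, 3),
        (194, '2020-12-21 15:47:25.476', 6667, 3);
    INSERT INTO "_ServerToTag" VALUES (3, 1), (3, 2);
""")

# Joining the link table repeats every vote once per tag: 2 votes x 2 tags.
inflated = conn.execute("""
    SELECT s.id, COUNT(v.id) AS vote_count
    FROM "Server" AS s
    LEFT JOIN "Vote" AS v ON s.id = v."serverId"
    LEFT JOIN "_ServerToTag" AS st ON s.id = st."A"
    GROUP BY s.id
""").fetchall()

# Using the link table only as a filter leaves the vote rows unmultiplied.
filtered = conn.execute("""
    SELECT s.id, COUNT(v.id) AS vote_count
    FROM "Server" AS s
    LEFT JOIN "Vote" AS v ON s.id = v."serverId"
    WHERE s.id IN (SELECT "A" FROM "_ServerToTag")
    GROUP BY s.id
""").fetchall()

print(inflated)  # the count is doubled by the two tag rows
print(filtered)  # the count matches the two votes in the table
```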
EDIT
If you want to be able to count entries which have no votes you'll need to use an outer join to pull in the (potentially non-existent) votes and then use a CASE expression to only count votes if they exist:
SELECT s.id, v.id, v."authorId", v."serverId",
CASE
WHEN v.id IS NULL THEN 0
ELSE COUNT(DISTINCT v."createdAt")
END AS TOTAL_VOTES
FROM "Server" AS s
LEFT OUTER JOIN "Vote" AS v
ON s.id = v."serverId" AND
v."createdAt" >= '2020-12-01' AND
v."createdAt" < '2021-01-01'
-- the date filter is part of the join condition so that servers without votes are kept
WHERE s.id IN (SELECT "A" FROM "_ServerToTag")
GROUP BY s.id, v.id, v."authorId", v."serverId"
OFFSET 0 LIMIT 25
You may not actually need that though - you may be able to get away with
SELECT s.id, v.id, v."authorId", v."serverId",
COUNT(DISTINCT v."createdAt") AS TOTAL_VOTES
FROM "Server" AS s
LEFT OUTER JOIN "Vote" AS v
ON s.id = v."serverId" AND
v."createdAt" >= '2020-12-01' AND
v."createdAt" < '2021-01-01'
-- the date filter is part of the join condition so that servers without votes are kept
WHERE s.id IN (SELECT "A" FROM "_ServerToTag")
GROUP BY s.id, v.id, v."authorId", v."serverId"
OFFSET 0 LIMIT 25
Okay so I went and asked a friend for help after not really being able to fix my problem with the answers I received.
I think my query was just too complex and confusing, and my friend suggested using subqueries to make it less complicated and easier to manage.
My query now looks like this:
SELECT
s.id
, s.title
, COALESCE(v."VOTES", 0) AS "voteCount"
FROM "Server" AS s
-- Join tags
INNER JOIN
(
SELECT
st."A"
, json_agg(
json_build_object(
'id',
t.id,
'tagName',
t."tagName"
)
) as "tagsArray"
FROM
"_ServerToTag" AS st
INNER JOIN
"Tag" AS t
ON
t.id = st."B"
GROUP BY
st."A"
) AS tag
ON
tag."A" = s.id
-- Count votes
LEFT JOIN
(
SELECT
"serverId"
, COUNT(*) AS "VOTES"
FROM
"Vote" as v
WHERE
v."createdAt" >= '2020-12-01' AND
v."createdAt" < '2021-01-01'
GROUP BY "serverId"
) as v
ON
s.id = v."serverId"
OFFSET 0 LIMIT 25;
This works exactly the same way but by selecting what I need directly in the joins it's more readable and I have more control over the data I get back.

how to get value using latest date from one table and joining to another table

I have a table, inventory_movement, with this data:
product_id | staff_name | status | sum | reference_number
-----------+------------+--------+-----+-----------------
1          | zes        | cp     | 1   | 000122
2          | shan       | cp     | 4   | 000133
I have another table, inventory_orderproduct, which holds the cost per order date:
orderdate  | product_id | cost
-----------+------------+-----
01/11/2018 | 1          | 3200
01/11/2018 | 2          | 100
02/11/2018 | 1          | 4000
02/11/2018 | 1          | 500
03/11/2018 | 2          | 2000
I want this result:
product_id | staff_name | status | sum | reference_number | cost
-----------+------------+--------+-----+------------------+-----
1          | zes        | cp     | 1   | 000122           | 4000
2          | shan       | cp     | 4   | 000133           | 2000
Here is my query:
select ipm.product_id,
case when ipm.order_by_id is not null then
(select au.first_name from users_staffuser us inner join auth_user au on us.user_id= au.id
where us.id = ipm.order_by_id) else '0' end as "Staff_name"
,ipm.status,
Sum(ipm.quantity), ip.reference_number
from inventory_productmovement ipm
inner join inventory_product ip on ipm.product_id = ip.id
inner join users_staffuser us on ip.branch_id = us.branch_id
inner join auth_user au on us.user_id = au.id
AND ipm.status = 'CP'
group by ipm.product_id, au.first_name, ipm.status,
ip.reference_number, ip.product_name
order by 1
Here is one solution. Note that it takes the highest cost per product, which matches the expected output for the sample data but is not necessarily the cost from the latest date.
SELECT i.product_id,i.staff_name,i.status,i.sum,i.reference_number,s.Cost
FROM (SELECT product_id,MAX(cost) AS Cost
FROM inventory_orderproduct
GROUP BY product_id ) s
JOIN inventory_movement i ON i.product_id =s.product_id
In the given situation, this should work fine (table1 stands for inventory_orderproduct and table2 for inventory_movement):
Select table2.product_id, table2.staff_name, table2.status, table2.reference_number,
MAX(table1.cost)
FROM table2
LEFT JOIN table1 ON table1.product_id = table2.product_id
GROUP BY table2.product_id, table2.staff_name, table2.status, table2.reference_number
You can use the below query to get MAX cost for products
SELECT i.product_id,i.staff_name,i.status,i.sum,i.reference_number,s.MAXCost
FROM (SELECT product_id,MAX(cost) AS MAXCost
FROM inventory_orderproduct
GROUP BY product_id ) s
JOIN inventory_movement i ON i.product_id =s.product_id
To retrieve the cost from the latest date, use the query below:
WITH cte as (
SELECT product_id,cost
,ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY orderdate DESC) AS Rno
FROM inventory_orderproduct )
SELECT i.product_id,i.staff_name,i.status,i.sum,i.reference_number,s.Cost
FROM cte s
JOIN inventory_movement i ON i.product_id =s.product_id
WHERE s.Rno=1
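The latest-row-per-product idea can be tried out with the sketch below, which runs against an in-memory SQLite database as a stand-in for the real server (window functions need SQLite 3.25 or newer). Two things are assumptions on my part: the dates are rewritten as ISO strings so that text ordering is chronological, and `cost DESC` is added as a tie-breaker because product 1 has two rows on its latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory_movement (
        product_id INTEGER, staff_name TEXT, status TEXT,
        "sum" INTEGER, reference_number TEXT);
    CREATE TABLE inventory_orderproduct (orderdate TEXT, product_id INTEGER, cost INTEGER);
    INSERT INTO inventory_movement VALUES
        (1, 'zes', 'cp', 1, '000122'),
        (2, 'shan', 'cp', 4, '000133');
    -- Dates stored as ISO strings (assumption) so they sort chronologically.
    INSERT INTO inventory_orderproduct VALUES
        ('2018-11-01', 1, 3200),
        ('2018-11-01', 2, 100),
        ('2018-11-02', 1, 4000),
        ('2018-11-02', 1, 500),
        ('2018-11-03', 2, 2000);
""")

# Number each product's cost rows from newest to oldest, then keep row 1.
latest_cost = conn.execute("""
    WITH ranked AS (
        SELECT product_id, cost,
               ROW_NUMBER() OVER (
                   PARTITION BY product_id
                   ORDER BY orderdate DESC, cost DESC  -- cost DESC: assumed tie-breaker
               ) AS rn
        FROM inventory_orderproduct
    )
    SELECT i.product_id, i.staff_name, i.status, i."sum", i.reference_number, r.cost
    FROM inventory_movement AS i
    JOIN ranked AS r ON r.product_id = i.product_id
    WHERE r.rn = 1
    ORDER BY i.product_id
""").fetchall()

print(latest_cost)
```

Without a tie-breaker, which of the two rows dated 02/11/2018 wins for product 1 is not defined, so the result could be 4000 or 500.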
You can use the query below; it picks the row with the latest date for each product:
WITH result as (
SELECT product_id,cost
,ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY orderdate DESC) AS rn
FROM inventory_orderproduct )
SELECT i.product_id,i.staff_name,i.status,i.sum,i.reference_number,s.cost
FROM result s
JOIN inventory_movement i ON i.product_id =s.product_id
WHERE s.rn = 1

Order by and limit on a multiple left join - PostgresSQL and python

I have the following relations:
A Product has multiple Images
A Product can have multiple Categories
A Category can have multiple Products
I want to get:
only the 'short_name' from the first category
only the first image url order_by another parameter
I have the following SQL, in PostgreSql:
SELECT DISTINCT ON(I.product_id) P.id, P.name, P.short_description,
CAT.short_name AS category, I.url
FROM products_product AS P
LEFT JOIN products_product_categories AS RPC ON P.id = RPC.product_id
LEFT JOIN categories_category AS CAT ON RPC.category_id = CAT.id
LEFT JOIN products_productimage AS I ON I.product_id = P.id
WHERE (P.is_active = TRUE)
My issue is that I don't know how to apply a limit and an order by to a left join. I tried adding LIMIT 1:
LEFT JOIN categories_category AS CAT ON RPC.category_id = CAT.id LIMIT 1
but it does not work; I get the error 'syntax error at or near "LEFT"'.
Category table
id | category_name | category_short_name
1 catA A
2 catB B
3 catC C
Product table
id | product_name | product_desc
1 P1 lorem1
2 P2 lorem2
3 P3 lorem3
ManytoMany: product_category
id product_id category_id
1 1 1
2 2 1
3 1 2
4 3 3
5 3 3
Image table
id url product_id order
1 lo1 1 4
2 lo2 1 0
3 lo3 1 1
4 lo4 2 0
For Product with id1 I expect to get:
name: P1, desc 'lorem1', category short_name : cat A, image url lo2
DISTINCT ON makes no sense without ORDER BY. As you want two different orders (on i.order for images and on cat.id for categories), you must do this in separate subqueries.
select p.id, p.name, p.short_description, c.short_name, i.url
from products_product p
left join
(
select distinct on (pcat.product_id) pcat.product_id, cat.short_name
from products_product_categories pcat
join categories_category cat on cat.id = pcat.category_id
order by pcat.product_id, cat.id
) c on c.product_id = p.id
left join
(
select distinct on (product_id) product_id, url
from products_productimage
order by product_id, "order"
) i on i.product_id = p.id
where p.is_active
order by p.id;
Two alternatives to write this query are:
subqueries with fetch first row only in the select clause
lateral left joins on subqueries with fetch first row only
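The first alternative, scalar subqueries in the select clause, might look like the sketch below. It runs on an in-memory SQLite database as a stand-in, so it uses `LIMIT 1` where PostgreSQL would also accept `FETCH FIRST ROW ONLY`. The table definitions are reconstructed from the question's query and sample data; the `is_active` values and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories_category (id INTEGER PRIMARY KEY, name TEXT, short_name TEXT);
    CREATE TABLE products_product (
        id INTEGER PRIMARY KEY, name TEXT, short_description TEXT, is_active INTEGER);
    CREATE TABLE products_product_categories (
        id INTEGER PRIMARY KEY, product_id INTEGER, category_id INTEGER);
    CREATE TABLE products_productimage (
        id INTEGER PRIMARY KEY, url TEXT, product_id INTEGER, "order" INTEGER);
    INSERT INTO categories_category VALUES (1, 'catA', 'A'), (2, 'catB', 'B'), (3, 'catC', 'C');
    -- is_active = 1 for every product is an assumption; the question does not show it.
    INSERT INTO products_product VALUES
        (1, 'P1', 'lorem1', 1), (2, 'P2', 'lorem2', 1), (3, 'P3', 'lorem3', 1);
    INSERT INTO products_product_categories VALUES
        (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 3), (5, 3, 3);
    INSERT INTO products_productimage VALUES
        (1, 'lo1', 1, 4), (2, 'lo2', 1, 0), (3, 'lo3', 1, 1), (4, 'lo4', 2, 0);
""")

# Each scalar subquery returns at most one value per product, so the outer
# query keeps exactly one row per product and each pick has its own ordering.
rows = conn.execute("""
    SELECT p.id, p.name, p.short_description,
           (SELECT cat.short_name
            FROM products_product_categories pcat
            JOIN categories_category cat ON cat.id = pcat.category_id
            WHERE pcat.product_id = p.id
            ORDER BY cat.id
            LIMIT 1) AS category,
           (SELECT img.url
            FROM products_productimage img
            WHERE img.product_id = p.id
            ORDER BY img."order"
            LIMIT 1) AS url
    FROM products_product p
    WHERE p.is_active
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

A product without images simply gets NULL for url, which mirrors what the LEFT JOIN version returns.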

Select all categories with COUNT of sub-categories

I need to select all categories with count of its sub-categories.
Assume here are my tables:
categories
id | title
----------
1 | colors
2 | animals
3 | plants
sub_categories
id | category_id | title | confirmed
------------------------------------
1 1 red 1
2 1 blue 1
3 1 pink 1
4 2 cat 1
5 2 tiger 0
6 2 lion 0
What I want is :
id | title | count
------------------
1 colors 3
2 animals 1
3 plants 0
What I have tried so far:
SELECT c.id, c.title, count(s.category_id) as count from categories c
LEFT JOIN sub_categories s on c.id = s.category_id
WHERE c.confirmed = 't' AND s.confirmed='t'
GROUP BY c.id, c.title
ORDER BY count DESC
The only problem with this query is that it does not show categories with 0 sub-categories!
You can also check it on SqlFiddle.
Any help would be greatly appreciated.
The reason you don't get rows with zero counts is that the WHERE clause requires s.confirmed to be 't', which eliminates the rows where the outer join filled in NULLs.
Move s.confirmed check into join expression to fix this problem:
SELECT c.id, c.title, count(s.category_id) as count from categories c
LEFT JOIN sub_categories s on c.id = s.category_id AND s.confirmed='t'
WHERE c.confirmed = 't'
GROUP BY c.id, c.title
ORDER BY count DESC
Adding Sql Fiddle: http://sqlfiddle.com/#!17/83add/13
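The difference between filtering in WHERE and filtering in the join condition can be reproduced with the question's sample data. This is a minimal sketch on an in-memory SQLite database (a stand-in for PostgreSQL); it stores confirmed as 1/0 and leaves out the c.confirmed check because the categories table shown has no such column, both of which are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE sub_categories (
        id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT, confirmed INTEGER);
    INSERT INTO categories VALUES (1, 'colors'), (2, 'animals'), (3, 'plants');
    INSERT INTO sub_categories VALUES
        (1, 1, 'red', 1), (2, 1, 'blue', 1), (3, 1, 'pink', 1),
        (4, 2, 'cat', 1), (5, 2, 'tiger', 0), (6, 2, 'lion', 0);
""")

# Filter in WHERE: the NULL-extended row for 'plants' fails the test and is dropped.
filter_in_where = conn.execute("""
    SELECT c.id, c.title, COUNT(s.category_id) AS cnt
    FROM categories c
    LEFT JOIN sub_categories s ON c.id = s.category_id
    WHERE s.confirmed = 1
    GROUP BY c.id, c.title
    ORDER BY cnt DESC, c.id
""").fetchall()

# Filter in ON: unconfirmed rows are not joined, but every category is kept.
filter_in_on = conn.execute("""
    SELECT c.id, c.title, COUNT(s.category_id) AS cnt
    FROM categories c
    LEFT JOIN sub_categories s ON c.id = s.category_id AND s.confirmed = 1
    GROUP BY c.id, c.title
    ORDER BY cnt DESC, c.id
""").fetchall()

print(filter_in_where)  # 'plants' is missing
print(filter_in_on)     # 'plants' appears with a count of 0
```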
You can try this too (it makes explicit which column(s) you are really grouping by):
SELECT c.id, c.title, COALESCE(s.RC, 0) AS RC
from categories c
LEFT JOIN (SELECT category_id, COUNT(*) AS RC
FROM sub_categories
WHERE confirmed= 't'
GROUP BY category_id) s on c.id = s.category_id
WHERE c.confirmed = 't'
ORDER BY RC DESC

SQL Server: Subquery on a join

I have two tables with schema and data as below. Table A has an id and an associated name. Table B associates the id from Table A with a price and otherAttr. For each entry in Table A, there may be multiple entries with different prices in Table B, but otherAttr is the same for each entry.
Given an id for Table A, I would like to select the name, otherAttr, and the minimum price.
The query below returns multiple results; I need a query that returns a single result with the minimum price.
SELECT a.name, b.price, b.otherAttr
FROM A a
LEFT Join B b on b.idFromA = 1
WHERE a.id = 1
Table A
id | name
---+-----
1  | A
2  | B
Table B
idFromA | price | otherAttr
--------+-------+----------
1       | 200   | abc
1       | 300   | abc
1       | 400   | abc
2       | 20    | def
2       | 30    | def
2       | 40    | ef
I massively oversimplified my example. In addition to selecting the min price and otherAttr from Table B, I also have to select a bunch of other attributes from joins on other tables. Which is why I was thinking the Group By and Min should be a subquery of the join on Table B, as a way to avoid Grouping By all the attributes I am selecting (because the attributes being selected for vary programmatically).
The Actual query looks more like:
SELECT a.name, b.price, b.otherAttr, c.x, c.y, d.e, d.f, g.h....
FROM A a
LEFT Join B b on b.idFromA = 1
LEFT Join C c on something...
LEFT Join D d on something...
LEFT Join G g on something...
WHERE a.id = 1
To get this, you could use GROUP BY in an INNER query:
SELECT gd.name, gd.price, gd.otherAttr, c.x, c.y, d.e, d.f, g.h....
FROM
(SELECT a.id, a.name, MIN(b.price) as price, b.otherAttr
FROM A a
LEFT Join B b on b.idFromA = a.id
WHERE a.id = 1
GROUP BY a.id, a.name, b.otherAttr) gd
LEFT Join C c on something...
LEFT Join D d on something...
LEFT Join G g on something...
Try:
SELECT a.name, MIN(b.price) minprice, b.otherAttr
FROM A a
LEFT Join B b ON a.Id = b.idFromA
WHERE a.id = 1
GROUP BY a.name, b.otherAttr
You could just do this instead:
SELECT a.name, MIN(b.price), MIN(b.otherAttr)
FROM A a
LEFT JOIN B b on b.idFromA = a.id
WHERE a.id = 1
GROUP BY a.name;
You need to inner join on price as well as on id in the subquery to pick out the record(s) with the lowest price. TOP(1) then returns only one of those records. You can avoid TOP(1) if you can expand the conditions and GROUP BY fields in the subquery so that your schema ensures only a single record is returned for that combination of attributes. Lastly, avoid left joins when intersecting sets.
SELECT TOP(1) p.id, p.price, b.OtherAttr
FROM B as b
INNER JOIN
(SELECT A.id, min(B.price) as price
FROM B
INNER JOIN A on A.id=B.idFromA and A.ID=1
GROUP BY A.id) as p on b.idFromA=p.id and b.price=p.price
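For the case where many other tables are joined as well, one option is to aggregate Table B on its own and join that one-row-per-id result to Table A, so that no outer GROUP BY is needed. The sketch below tries this on an in-memory SQLite database (a stand-in for SQL Server); the alias `m`, the column name `min_price`, and the use of `MIN(otherAttr)` are assumptions, the last one relying on the question's statement that otherAttr is the same for every row of an id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (idFromA INTEGER, price INTEGER, otherAttr TEXT);
    INSERT INTO A VALUES (1, 'A'), (2, 'B');
    INSERT INTO B VALUES
        (1, 200, 'abc'), (1, 300, 'abc'), (1, 400, 'abc'),
        (2, 20, 'def'), (2, 30, 'def'), (2, 40, 'ef');
""")

# The derived table has one row per idFromA, so joining it cannot multiply
# the rows of A, and further LEFT JOINs can be added without grouping again.
query = """
    SELECT a.name, m.min_price, m.otherAttr
    FROM A AS a
    LEFT JOIN (
        SELECT idFromA, MIN(price) AS min_price, MIN(otherAttr) AS otherAttr
        FROM B
        GROUP BY idFromA
    ) AS m ON m.idFromA = a.id
    WHERE a.id = ?
"""

result = conn.execute(query, (1,)).fetchall()
print(result)
```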