Postgres Select Distinct AND Order By Date - sql

Table and columns of note:
pictures, tags, picture_tags
pictures.id
pictures.created_date
tags.id
picture_tags.tag_id
picture_tags.picture_id
I have the following query using a join table:
SELECT pictures.*, pictures.id as picture_id, tags.id as tag_id
FROM picture_tags
LEFT JOIN pictures ON pictures.id = picture_tags.picture_id
LEFT JOIN tags ON tags.id = picture_tags.tag_id
WHERE picture_tags.tag_id IN (1, 2)
GROUP BY pictures.id, tags.id
ORDER BY pictures.created_date ASC;
Since a picture can have multiple tags, this can return the same picture.id multiple times. Is there a way to prevent this so that picture.ids only show up once?
It is currently returning like this:
id | created_date | picture_id | tag_id
1 | 2022-12-08 19:04:23 | 1 | 1
1 | 2022-12-08 19:04:23 | 1 | 2
2 | 2022-12-09 00:46:30 | 2 | 3
My ideal return would be something like:
picture.created_date | picture.id | tagIds
2022-12-08 19:04:23 | 1 | [ 1, 2 ]
2022-12-09 00:46:30 | 2 | [3]

As I said in a comment, you want to think carefully before combining rows like this. But if you really want to, you can do this:
SELECT p.id, p.created_date, string_agg(pt.tag_id::text, ',') as tag_ids
FROM pictures p
INNER JOIN picture_tags pt on pt.picture_id = p.id
WHERE pt.tag_id IN (1,2)
GROUP BY p.id, p.created_date
ORDER BY p.created_date
Note I converted the LEFT JOIN to INNER JOIN. In the original query, the first join made no sense as a LEFT JOIN (all the fields would be NULL) and the second join was effectively an INNER JOIN because of the WHERE clause. Additionally, since we didn't actually use any fields from the tag table I was able to remove that join completely.
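For anyone who wants to try the aggregation without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: its group_concat plays the role of Postgres's string_agg, and the sample rows are taken from the question. Note that with the tag filter in place, picture 2 (tag 3) does not appear at all.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, created_date TEXT);
    CREATE TABLE picture_tags (picture_id INTEGER, tag_id INTEGER);
    INSERT INTO pictures VALUES (1, '2022-12-08 19:04:23'),
                                (2, '2022-12-09 00:46:30');
    INSERT INTO picture_tags VALUES (1, 1), (1, 2), (2, 3);
""")

raw_rows = conn.execute("""
    SELECT p.id, p.created_date, group_concat(pt.tag_id, ',') AS tag_ids
    FROM pictures p
    INNER JOIN picture_tags pt ON pt.picture_id = p.id
    WHERE pt.tag_id IN (1, 2)
    GROUP BY p.id, p.created_date
    ORDER BY p.created_date
""").fetchall()

# group_concat does not guarantee element order, so sort on the client side.
pictures_with_tags = [
    (pic_id, created, sorted(int(t) for t in tag_ids.split(",")))
    for pic_id, created, tag_ids in raw_rows
]
print(pictures_with_tags)
```

In Postgres itself, array_agg(pt.tag_id) would return a real array instead of a comma-separated string, which is closer to the "[ 1, 2 ]" shape asked for.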

Related

Efficiently getting multiple counts of foreign key rows in PostgreSQL

I have a database that consists of users who can perform various actions, which I keep track of in multiple tables. I'm creating a point system, so I need to count how many of each type of action the user did. For example, if I had:
users posts comments shares
id | username id | user_id id | user_id id | user_id
------------- -------------- -------------- --------------
1 | abc 1 | 1 1 | 1 1 | 2
2 | xyz 2 | 1 2 | 2 2 | 2
I would want to return:
user_details
id | username | post_count | comment_count | share_count
---------------------------------------------------------
1 | abc | 2 | 1 | 0
2 | xyz | 0 | 1 | 2
This is slightly different from this question about foreign key counts since I want to return the individual counts per table.
What I've tried so far (example code):
SELECT
users.id,
users.username,
COUNT( DISTINCT posts.id ) as post_count,
COUNT( DISTINCT comments.id ) as comment_count,
COUNT( DISTINCT shares.id ) as share_count
FROM users
LEFT JOIN posts ON posts.user_id = users.id
LEFT JOIN comments ON comments.user_id = users.id
LEFT JOIN shares ON shares.user_id = users.id
GROUP BY users.id
While this works, I had to use DISTINCT in all of my counts because the LEFT JOINS were causing high numbers of duplicate rows. I feel like there must be a better way to do this since (please correct me if I'm wrong) on each LEFT JOIN, the DISTINCT is having to filter out an exponentially growing number of duplicated rows.
Thank you so much for any help you could give me with this!
You can join derived tables that already do the aggregation.
SELECT u.id,
u.username,
coalesce(pc.c, 0) AS post_count,
coalesce(cc.c, 0) AS comment_count,
coalesce(sc.c, 0) AS share_count
FROM users AS u
LEFT JOIN (SELECT p.user_id,
count(*) AS c
FROM posts AS p
GROUP BY p.user_id) AS pc
ON pc.user_id = u.id
LEFT JOIN (SELECT c.user_id,
count(*) AS c
FROM comments AS c
GROUP BY c.user_id) AS cc
ON cc.user_id = u.id
LEFT JOIN (SELECT s.user_id,
count(*) AS c
FROM shares AS s
GROUP BY s.user_id) AS sc
ON sc.user_id = u.id;
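The difference between joining raw rows and joining pre-aggregated derived tables can be checked on the sample data from the question. This is a sketch that assumes SQLite (via Python's sqlite3) behaves like Postgres for these queries; the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE shares (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'abc'), (2, 'xyz');
    INSERT INTO posts VALUES (1, 1), (2, 1);
    INSERT INTO comments VALUES (1, 1), (2, 2);
    INSERT INTO shares VALUES (1, 2), (2, 2);
""")

# Joining the raw tables multiplies rows per user, so plain COUNT over-counts.
fan_out = conn.execute("""
    SELECT u.id, COUNT(p.id), COUNT(c.id), COUNT(s.id)
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    LEFT JOIN comments c ON c.user_id = u.id
    LEFT JOIN shares s ON s.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Aggregating each table first yields at most one row per user per join.
pre_aggregated = conn.execute("""
    SELECT u.id, u.username,
           COALESCE(pc.c, 0), COALESCE(cc.c, 0), COALESCE(sc.c, 0)
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM posts GROUP BY user_id) pc
           ON pc.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM comments GROUP BY user_id) cc
           ON cc.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM shares GROUP BY user_id) sc
           ON sc.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(fan_out)         # comment counts are inflated by the other joins
print(pre_aggregated)  # matches the desired user_details table
```

Because each derived table is grouped before it is joined, no DISTINCT is needed and the intermediate result never grows beyond one row per user.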

Left join command is not showing all results

I have a table RESTAURANT:
Id | Name
------------------
0 | 'McDonalds'
1 | 'Burger King'
2 | 'Starbucks'
3 | 'Pans'
And a table ORDER:
Id | ResId | Client
--------------------
0 | 1 | 'Peter'
1 | 2 | 'John'
2 | 2 | 'Peter'
Where 'ResId' is a foreign key from RESTAURANT.Id.
I want to select the number of orders per restaurant:
Expected result:
Restaurant | Number of orders
----------------------------------
'McDonalds' | 0
'Burger King' | 1
'Starbucks' | 2
'Pans' | 0
Actual result:
Restaurant | Number of orders
----------------------------------
'McDonalds' | 0
'Burger King' | 1
'Starbucks' | 2
Command used:
select r.Name, count(o.ResId)
from RESTAURANT r
left join ORDER o on r.Id like o.ResId
group by o.ResId;
Just fix the group by clause:
select r.name, count(o.resid) as cnt_orders
from restaurants r
left join orders o on r.id = o.resid
group by r.id, r.name;
That way, the SELECT and GROUP BY clauses are consistent; I also added the restaurant id to the group, so potential restaurants having the same name are not aggregated together. I also changed like to =: this is more efficient, and does not alter the logic.
You could also phrase this with a subquery, so there is no need for outer aggregation. I would prefer:
select r.*,
(select count(*) from orders o where o.resid = r.id) as cnt_orders
from restaurants r
Your query should be generating an error because the select columns and the group by columns are incompatible. Just aggregate by the unaggregated columns in the select:
select r.Name, count(o.ResId)
from RESTAURANT r left join
ORDER o
on r.Id = o.ResId
group by r.Name;
Notes:
You might want to include r.id in the GROUP BY (and SELECT) in case restaurants can have the same name.
Note the use of = instead of LIKE. The ids look like numbers, so you should use number operations. LIKE is a string operation.
ORDER is a bad name for a table because it is a SQL keyword.
As a general rule, in a LEFT JOIN, you don't want the aggregation keys to be from the second table, because those values could be NULL.
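The last note can be seen directly on the sample data. The sketch below assumes SQLite through Python's sqlite3, with the tables renamed to restaurant and orders (hypothetical names chosen to avoid the ORDER keyword). SQLite happens to accept the mismatched SELECT and GROUP BY columns, which many other databases reject, so it makes the collapsed NULL group visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, res_id INTEGER, client TEXT);
    INSERT INTO restaurant VALUES (0, 'McDonalds'), (1, 'Burger King'),
                                  (2, 'Starbucks'), (3, 'Pans');
    INSERT INTO orders VALUES (0, 1, 'Peter'), (1, 2, 'John'), (2, 2, 'Peter');
""")

# Grouping by the right-hand key: both restaurants without orders have
# res_id = NULL after the LEFT JOIN, so they fall into a single group.
grouped_by_right_key = conn.execute("""
    SELECT r.name, COUNT(o.res_id)
    FROM restaurant r
    LEFT JOIN orders o ON r.id = o.res_id
    GROUP BY o.res_id
""").fetchall()

# Grouping by the left-hand key keeps one row per restaurant.
grouped_by_left_key = conn.execute("""
    SELECT r.name, COUNT(o.res_id)
    FROM restaurant r
    LEFT JOIN orders o ON r.id = o.res_id
    GROUP BY r.id, r.name
    ORDER BY r.id
""").fetchall()

print(len(grouped_by_right_key))  # one row fewer than there are restaurants
print(grouped_by_left_key)
```

Which of the two order-less restaurants ends up representing the NULL group is arbitrary, which is why only the row count is printed for the first query.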

SQL join on table twice not bringing expected results

I have two tables (extraneous columns removed to exemplify the issue):
-People-
PID | CarID1 | CarID2
----------------------
1 | 1 | 3
2 | 5 | NULL
3 | 1 | NULL
4 | NULL | 1
-Cars-
CarID
-----
1
3
5
I'm creating a view based on the CarID so using:
SELECT
c.CarID,
COUNT(p.PID) AS pCount
FROM
Cars c
LEFT JOIN People p ON p.CarID1 = c.CarID OR p.CarID2 = c.CarID
Group By c.CarID
Brings back the expected results:
CarID | pCount
--------------
1 | 3
3 | 1
5 | 1
The issue is that with 1,000+ car IDs and 25,000 people, this query can take a long time (without the OR clause it runs in milliseconds).
So I was trying to do it another way like this:
SELECT
c.CarID,
COUNT(p1.PID) AS pCount1,
COUNT(p2.PID) AS pCount2
FROM
Cars c
LEFT JOIN People p1 ON p1.CarID1 = c.CarID
LEFT JOIN People p2 ON p2.CarID2 = c.CarID
Group By c.CarID
It's many times quicker, but because CarID 1 exists in both CarID1 and CarID2 I'm getting this:
CarID | pCount1 | pCount2
-------------------------
1 | 3 | 3
3 | 0 | 1
5 | 1 | 0
When I would expect this:
CarID | pCount1 | pCount2
-------------------------
1 | 2 | 1
3 | 0 | 1
5 | 1 | 0
And I could just sum the pCount1 and pCount2
Is there any way I can achieve the results of the first query using the 2nd method? I'm presuming the GROUP BY clause has something to do with it, but not sure how to omit it.
How about unpivoting the columns and then joining:
SELECT v.CarID, COUNT(p.PID) AS pCount
FROM People p CROSS APPLY
(VALUES (p.CarID1), (p.CarID2)) v(CarID) JOIN
Cars c
ON v.CarID = c.CarId
WHERE v.CarID IS NOT NULL
GROUP BY v.CarID;
If you want to keep cars even with no people, then you can express this as a LEFT JOIN:
SELECT c.CarID, COUNT(p.PID) AS pCount
FROM Cars c LEFT JOIN
(People p CROSS APPLY
(VALUES (p.CarID1), (p.CarID2)) v(CarID)
)
ON v.CarID = c.CarId
GROUP BY c.CarID;
Here is a db<>fiddle.
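The same unpivot-then-join idea can be tried outside SQL Server. The sketch below assumes SQLite through Python's sqlite3, which has no CROSS APPLY, so the unpivot is written as a UNION ALL of the two columns instead. That is a swapped-in technique, not the syntax used above, but the shape of the query is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (PID INTEGER PRIMARY KEY, CarID1 INTEGER, CarID2 INTEGER);
    CREATE TABLE Cars (CarID INTEGER PRIMARY KEY);
    INSERT INTO People VALUES (1, 1, 3), (2, 5, NULL), (3, 1, NULL), (4, NULL, 1);
    INSERT INTO Cars VALUES (1), (3), (5);
""")

# Unpivot CarID1/CarID2 into one CarID column, then join on a plain equality.
unpivoted_counts = conn.execute("""
    SELECT c.CarID, COUNT(v.PID) AS pCount
    FROM Cars c
    LEFT JOIN (
        SELECT PID, CarID1 AS CarID FROM People
        UNION ALL
        SELECT PID, CarID2 AS CarID FROM People
    ) v ON v.CarID = c.CarID
    GROUP BY c.CarID
    ORDER BY c.CarID
""").fetchall()

print(unpivoted_counts)
```

One difference from the OR join is worth keeping in mind: a person with the same car in both columns would be counted twice here, whereas the OR version counts that person once.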
Is the p.CarID1 a Primary Key?
If so it would explain that a join on the carID1 is fast but on the carID2 it's slow.
Try creating an Index on CarID2 and see if that solves your performance issues.
The index would turn a full table scan into an index lookup, which is a lot faster.
CREATE NONCLUSTERED INDEX CarId2Index
ON People (CarID2);
If that solves it you can keep your query as it is.
Alternatively you can send us the query explain plan so we can see what is slowing it down.
Try using SUM with condition like below.
SELECT
c.CarID,
SUM(IIF(p1.PID IS NULL, 0, 1)) AS pCount1,
SUM(IIF(p2.PID IS NULL, 0, 1)) AS pCount2
FROM
Cars c
LEFT JOIN People p1 ON p1.CarID1 = c.CarID
LEFT JOIN People p2 ON p2.CarID2 = c.CarID
Group By c.CarID
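It helps to run the two-join query on the sample data to see where the extra rows come from. SUM(IIF(x IS NULL, 0, 1)) counts exactly the rows that COUNT(x) counts, so the sketch below uses COUNT and, as a swapped-in alternative, COUNT(DISTINCT ...). It assumes SQLite through Python's sqlite3 rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (PID INTEGER PRIMARY KEY, CarID1 INTEGER, CarID2 INTEGER);
    CREATE TABLE Cars (CarID INTEGER PRIMARY KEY);
    INSERT INTO People VALUES (1, 1, 3), (2, 5, NULL), (3, 1, NULL), (4, NULL, 1);
    INSERT INTO Cars VALUES (1), (3), (5);
""")

# Each p1 match is paired with each p2 match, so plain COUNT sees the
# product of the two match sets; COUNT(DISTINCT ...) undoes the duplication.
two_join_counts = conn.execute("""
    SELECT c.CarID,
           COUNT(p1.PID)          AS plain1,
           COUNT(p2.PID)          AS plain2,
           COUNT(DISTINCT p1.PID) AS pCount1,
           COUNT(DISTINCT p2.PID) AS pCount2
    FROM Cars c
    LEFT JOIN People p1 ON p1.CarID1 = c.CarID
    LEFT JOIN People p2 ON p2.CarID2 = c.CarID
    GROUP BY c.CarID
    ORDER BY c.CarID
""").fetchall()

for row in two_join_counts:
    print(row)
```

On this data the DISTINCT columns match the counts the question expected, while the plain counts for car 1 are inflated by the cross pairing.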
Try with COALESCE function:
SELECT
c.CarID,
COUNT(p.PID) AS pCount
FROM
Cars c
LEFT JOIN People p ON COALESCE(p.CarID1, p.CarID2) = c.CarID
Group By c.CarID

Postgres SQL: getting group count

I have the following table
>> tbl_category
id | category
-------------
0 | A
1 | B
...|...
>>tbl_product
id | category_id | product
---------------------------
0 | 0 | P1
1 | 1 | P2
...|... | ...
I can use the following query to count the number of products in a category.
select category, count(tbl_product.product) from tbl_product
join tbl_category on tbl_product.category_id = tbl_category.id
group by category
However, some categories have no products at all. How do I get these to show up in the query result as well?
Use a left join:
select c.category, count(p.product)
from tbl_category c left join
tbl_product p
on p.category_id = c.id
group by c.category;
The table where you want to keep all the rows goes first (tbl_category).
Note the use of table aliases to make the query easier to write and to read.
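A small check of this behaviour, sketched with Python's sqlite3 as a stand-in for Postgres. Category C is a hypothetical extra row added so that there is a category with no products. The query also returns COUNT(*) next to COUNT(p.product) to show why the counted column should come from the product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_category (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE tbl_product (id INTEGER PRIMARY KEY, category_id INTEGER, product TEXT);
    -- 'C' is a made-up category with no products (assumption for illustration).
    INSERT INTO tbl_category VALUES (0, 'A'), (1, 'B'), (2, 'C');
    INSERT INTO tbl_product VALUES (0, 0, 'P1'), (1, 1, 'P2');
""")

category_counts = conn.execute("""
    SELECT c.category,
           COUNT(p.product) AS product_count,  -- ignores the NULL from the outer join
           COUNT(*)         AS row_count       -- counts the padded row as well
    FROM tbl_category c
    LEFT JOIN tbl_product p ON p.category_id = c.id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()

print(category_counts)
```

The empty category still produces one joined row filled with NULLs, so COUNT(*) reports 1 for it while COUNT(p.product) reports 0.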

GROUP BY including 0 where none present

I have a table of lists, each of which contains posts. I want a query that tells me how many posts each list has, including an entry with a 0 for each list that doesn't have any posts.
eg.
posts:
id | list_id
--------------
1 | 1
2 | 1
3 | 2
4 | 2
lists:
id
---
1
2
3
should return:
list_id | num_posts
-------------------
1 | 2
2 | 2
3 | 0
I have done so using the following query, but it feels a bit stupid to effectively do the grouping and then execute another sub-query to fill in the blanks:
WITH "count_data" AS (
SELECT "posts"."list_id" AS "list_id", COUNT(DISTINCT "posts"."id") AS "num_posts"
FROM "posts"
INNER JOIN "lists" ON "posts"."list_id" = "lists"."id"
GROUP BY "posts"."list_id"
)
SELECT "lists"."id", COALESCE("count_data"."num_posts", 0)
FROM "lists"
LEFT JOIN "count_data" ON "count_data"."list_id" = "lists"."id"
ORDER BY "count_data"."num_posts" DESC
Thanks!
It'll be more efficient to left join directly, avoiding a seq scan with a big merge join in the process:
select lists.id as list_id, count(posts.list_id) as num_posts
from lists
left join posts on posts.list_id = lists.id
group by lists.id
If I understand your question, this should work:
SELECT a.id AS list_id, COALESCE(b.num_posts, 0) AS num_posts
FROM lists a
LEFT JOIN (SELECT list_id, COUNT(*) AS num_posts
FROM posts
GROUP BY list_id
) b
ON a.id = b.list_id