Using UNNEST with a JOIN - sql

I want to be able to use the unnest() function in PostgreSQL in a complicated SQL query that has many JOINs. Here's the example query:
SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id
FROM mentions
INNER JOIN taggings ON taggings.mention_id = mentions.id
INNER JOIN tags ON tags.id = taggings.tag_id
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3
GROUP BY tags.parent_id
I want to eliminate the taggings table here, because my mentions table has an integer array field named taglist that consists of all linked tag ids of mentions.
I tried the following:
SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id
FROM mentions
INNER JOIN tags ON tags.id IN (SELECT unnest(taglist))
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3
GROUP BY tags.parent_id
This works but brings different results than the first query.
So what I want to do is to use the result of the SELECT unnest(taglist) in a JOIN query to compensate for the taggings table.
How can I do that?
UPDATE: taglist is the same set as the respective list of tag ids of mention.

Technically, your query might work like this (not entirely sure about the objective of this query):
SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM (
SELECT unnest(m.taglist) AS tag_id
FROM mentions m
WHERE m.search_id = 3
AND 9 = ANY (m.taglist)
) m
JOIN tags t USING (tag_id) -- assumes tags.tag_id!
GROUP BY t.parent_id;
However, it seems to me you are going in the wrong direction here. Normally one would remove the redundant array taglist and keep the normalized database schema. Then your original query should serve well, only with the syntax shortened using aliases:
SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM mentions m
JOIN taggings mt ON mt.mention_id = m.id
JOIN tags t ON t.id = mt.tag_id
WHERE 9 = ANY (m.taglist)
AND m.search_id = 3
GROUP BY t.parent_id;
Unravel the mystery
<rant>
The root cause for your "different results" is the unfortunate naming convention that some intellectually challenged ORMs impose on people.
I am speaking of id as column name. Never use this anti-pattern in a database with more than one table. Right, that means basically any database. As soon as you join a bunch of tables (that's what you do in a database) you end up with a bunch of columns named id. Utterly pointless.
The ID column of a table named tag should be tag_id (unless there is another descriptive name). Never id.
</rant>
Your query inadvertently counts tags instead of mentions:
SELECT 25 AS keyword_id, count(m.id) AS total, t.parent_id AS tag_id
FROM (
SELECT unnest(m.taglist) AS id
FROM mentions m
WHERE m.search_id = 4
AND 25 = ANY (m.taglist)
) m
JOIN tags t USING (id)
GROUP BY t.parent_id;
It should work this way:
SELECT 25 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id
FROM (
SELECT m.id, unnest(m.taglist) AS tag_id
FROM mentions m
WHERE m.search_id = 4
AND 25 = ANY (m.taglist)
) m
JOIN tags t ON t.id = m.tag_id
GROUP BY t.parent_id;
I also added back the DISTINCT to your count() that got lost along the way in your query.
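To see why the mention id has to travel alongside each unnested tag, here is a minimal sketch in plain Python rather than SQL. The sample rows, tag ids, and parent ids are invented for illustration, and the list comprehension only mimics what unnest() does to each array:

```python
from collections import defaultdict

# Hypothetical sample data; all ids and parent ids are made up.
mentions = [
    {"id": 1, "search_id": 4, "taglist": [25, 30, 31]},
    {"id": 2, "search_id": 4, "taglist": [25, 30]},
    {"id": 3, "search_id": 4, "taglist": [30]},
    {"id": 4, "search_id": 5, "taglist": [25, 31]},
]
tag_parent = {25: 100, 30: 100, 31: 200}  # tags.id -> tags.parent_id

# Rough equivalent of:
#   SELECT m.id, unnest(m.taglist) FROM mentions m
#   WHERE m.search_id = 4 AND 25 = ANY (m.taglist)
unnested = [
    (m["id"], tag_id)
    for m in mentions
    if m["search_id"] == 4 and 25 in m["taglist"]
    for tag_id in m["taglist"]
]

row_count = defaultdict(int)    # what count(m.id) sees: one row per tag
mention_ids = defaultdict(set)  # what count(DISTINCT m.id) sees
for mention_id, tag_id in unnested:
    parent = tag_parent[tag_id]
    row_count[parent] += 1
    mention_ids[parent].add(mention_id)

distinct_count = {parent: len(ids) for parent, ids in mention_ids.items()}
print(dict(row_count), distinct_count)
```

With this data, parent 100 is reached through four unnested rows but only two distinct mentions, which is the gap between counting tags and counting mentions.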

Something like this should work:
...
tags t INNER JOIN
(SELECT UNNEST(taglist) as idd) a ON t.id = a.idd
...

Related

How to convert [NULL] to NULL in Postgres SQL statement?

In Postgres, I set out to write a SQL statement that would return various fields from one table, along with a column containing an array of tag strings that come from another table. I've made quite good progress with this code:
SELECT p.photo_id, p.name, p.path, array_agg(t.tag) as tags FROM photos p
JOIN users u USING (user_id)
LEFT JOIN photo_tags pt USING (photo_id)
LEFT JOIN tags t USING (tag_id)
WHERE u.user_id = 'some_uuid'
GROUP BY p.photo_id, p.name, p.path
ORDER BY date(p.date_created) DESC, p.date_created ASC
Everything is working exactly like I intended except for one thing: If a given photo has no tags attached to it then this is being returned: [NULL]
I would prefer to return just NULL rather than null in an array. I've tried several things, including using coalesce and ifnull but couldn't fix things precisely the way I want.
Not the end of the world if an array with NULL is returned by the endpoint but if you know a way to return just NULL instead, I would appreciate learning how to do this.
You can filter out NULLs during aggregation.
If no row passes the filter, array_agg returns NULL instead of [NULL]:
SELECT array_agg(t.tag) filter (where t.tag is not null) as tags
FROM ...
I would go with a subquery in your case:
SELECT p.photo_id, p.name, p.path, t.agg_tags AS tags
FROM photos p
JOIN users u USING (user_id)
LEFT JOIN (
-- aggregate once per photo; photos without tags get no row here, hence NULL
SELECT pt.photo_id, array_agg(tg.tag) AS agg_tags
FROM photo_tags pt
JOIN tags tg USING (tag_id)
GROUP BY pt.photo_id
) t USING (photo_id)
WHERE u.user_id = 'some_uuid'
ORDER BY date(p.date_created) DESC, p.date_created ASC
You did not post much information about your schema, table sizes and so on, but a LATERAL join could be an alternative to the syntax above.

sum greater than in subquery

I'm making a database in PostgreSQL that revolves around democracy. All data that should be displayed is controlled by the users and their percentage of power.
I'm struggling to write an SQL query where a collection of tags on a post should only be shown once the sum of all the percentages for each tag reaches a certain threshold.
The relations between the tables (relevant to this question) look like this:
The post_tags table is used for deciding what tag stays on what post, decided by the users based on their percentage.
It may look something like this
approved_by | post_id | tag_id | percentage
1           | 1       | 1      | 0.33
5           | 1       | 3      | 0.45
7           | 1       | 3      | 0.25
6           | 1       | 3      | 0.15
4           | 1       | 1      | 0.90
1           | 1       | 2      | 0.45
1           | 1       | 6      | -0.60
6           | 1       | 2      | -0.15
How do you write an SQL query that selects a post and its tags if the percentage sum is above a certain threshold?
In the case of SUM(post_tags.percentage) > 0.75, only the tags with tag_id 1 and 3 should show.
So far, I have written this SQL query, but it contains duplicates in the array_agg (might be a separate issue), and the HAVING only seems to depend on the total sum of all the tags in the array_agg.
SELECT
posts.post_id, array_agg(tags.name) AS tags
FROM
posts, tags, post_tags
WHERE
post_tags.post_id = posts.post_id AND
post_tags.tag_id = tags.tag_id
GROUP BY
posts.post_id
HAVING
SUM(post_tags.percentage) > 0.75
LIMIT 10;
I assume I might need to do a subquery within the query, but you can't do SUM inside the WHERE clause. I'm a bit lost at this issue.
Any help is appreciated
UPDATE
Because I think there need to be at least 2 queries in play, I think this should be one of them
SELECT
tags.name
FROM
post_tags, posts, tags
WHERE
post_tags.tag_id = tags.tag_id AND
post_tags.post_id = posts.post_id AND
posts.post_id = 1
GROUP BY
tags.tag_id
HAVING
SUM(post_tags.percentage) > 0.75
In this case, it's only for post 1, and I don't know how to continue this query for all posts
It's easy to get confused, but start small, and then expand the SQL query as you go.
Note the inner parenthesis will execute first. Start with the inner query, and then work on the outer query when building SQL queries.
In this case, a query that finds the relevant tags for a single post can be written like so:
SELECT
t.name
FROM
post_tags
INNER JOIN tags t ON t.tag_id = post_tags.tag_id
INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = 1
GROUP BY
t.tag_id
HAVING
SUM(post_tags.percentage) > 0.75
To expand on this and apply the query to every post, replace the literal 1 with a reference to the post_id of the outer query. The complete SQL query becomes this:
SELECT p.post_id, ARRAY(
SELECT
t.name
FROM
post_tags
INNER JOIN tags t ON t.tag_id = post_tags.tag_id
INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = p.post_id
GROUP BY
t.tag_id
HAVING
SUM(post_tags.percentage) > 0.75
) AS tags
FROM posts p
Big thanks to Tim Biegeleisen who helped change out the FROM statements to INNER JOIN (tho performance-wise, both are tested equally fast in this case).
One idea would be to first aggregate the total percentage for each post/tag pair in a subquery. The subquery gives you a new join table unique_post_tags (one entry per post and tag, including the total_percentage for each post/tag pair). You can then select from unique_post_tags in the outer query, filtering irrelevant tags in the WHERE clause:
SELECT unique_post_tags.post_id, unique_post_tags.tag_id FROM
(
SELECT post_id, tag_id, sum(percentage) as total_percentage
FROM post_tags
GROUP BY post_id, tag_id
) AS unique_post_tags
WHERE unique_post_tags.total_percentage > 0.75
To actually select the tag names per post and group it into an array as you requested, the above query can be extended like this:
SELECT unique_post_tags.post_id, array_agg(t.name) AS tags FROM
(
SELECT post_id, tag_id, sum(percentage) as total_percentage
FROM post_tags
GROUP BY post_id, tag_id
) AS unique_post_tags
LEFT JOIN tags t ON t.tag_id = unique_post_tags.tag_id
WHERE unique_post_tags.total_percentage > 0.75
GROUP BY unique_post_tags.post_id
Update
After looking at your answer more closely, I now realize that my idea of reducing the join table to the relevant entries first, can be implemented entirely in the subquery using the GROUP BY/HAVING approach you initially suggested:
SELECT relevant_post_tags.post_id , array_agg(t.name) AS tags FROM
(
SELECT post_id, tag_id
FROM post_tags
GROUP BY post_id, tag_id
HAVING SUM(percentage) > 0.75
) AS relevant_post_tags
LEFT JOIN tags t ON t.tag_id = relevant_post_tags.tag_id
GROUP BY relevant_post_tags.post_id;
Or written as CTE (for readability):
WITH relevant_post_tags AS (
SELECT post_id, tag_id
FROM post_tags
GROUP BY post_id, tag_id
HAVING SUM(percentage) > 0.75
)
SELECT relevant_post_tags.post_id, array_agg(t.name) AS tags
FROM relevant_post_tags
LEFT JOIN tags t ON t.tag_id = relevant_post_tags.tag_id
GROUP BY relevant_post_tags.post_id;
If the 0.75 limit is static, you could also create a relevant_post_tags view in the DB and select from there directly. I did not look at the performance of the above (my guess would be that the query optimizer takes care of it, just note that using CTEs had some pitfalls in earlier Postgres versions).
The approach I came up with is a bit different from what you initially asked for: the result set for the queries above will only contain posts that actually have tags.
If you need to select all posts, you can expand like this:
WITH relevant_post_tags AS (
SELECT post_id, tag_id
FROM post_tags
GROUP BY post_id, tag_id
HAVING SUM(percentage) > 0.75
)
SELECT p.post_id, array_remove(array_agg(t.name), NULL) AS tags
FROM posts p
LEFT JOIN relevant_post_tags pt ON pt.post_id = p.post_id
LEFT JOIN tags t ON t.tag_id = pt.tag_id
GROUP BY p.post_id;
Or closer to your solution:
WITH relevant_post_tags AS (
SELECT post_id, tag_id
FROM post_tags
GROUP BY post_id, tag_id
HAVING SUM(percentage) > 0.75
)
SELECT p.post_id, ARRAY(
SELECT t.name
FROM relevant_post_tags pt
JOIN tags t ON t.tag_id = pt.tag_id
WHERE pt.post_id = p.post_id
)
FROM posts p;
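As a quick sanity check of the GROUP BY post_id, tag_id / HAVING approach, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 instead of PostgreSQL, so the array is assembled in Python rather than with array_agg or ARRAY(...). The tag names and the extra post 2 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER PRIMARY KEY);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post_tags (approved_by INTEGER, post_id INTEGER,
                        tag_id INTEGER, percentage REAL);
-- post 2 and the tag names are hypothetical; post_tags rows are from the question
INSERT INTO posts VALUES (1), (2);
INSERT INTO tags VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma'), (6, 'delta');
INSERT INTO post_tags VALUES
  (1, 1, 1, 0.33), (5, 1, 3, 0.45), (7, 1, 3, 0.25), (6, 1, 3, 0.15),
  (4, 1, 1, 0.90), (1, 1, 2, 0.45), (1, 1, 6, -0.60), (6, 1, 2, -0.15);
""")

rows = conn.execute("""
WITH relevant_post_tags AS (
  SELECT post_id, tag_id
  FROM post_tags
  GROUP BY post_id, tag_id          -- one group per post/tag pair
  HAVING SUM(percentage) > 0.75     -- threshold applies per pair
)
SELECT p.post_id, t.name
FROM posts p
LEFT JOIN relevant_post_tags pt ON pt.post_id = p.post_id
LEFT JOIN tags t ON t.tag_id = pt.tag_id
ORDER BY p.post_id, t.tag_id
""").fetchall()

# Collect names per post; posts without qualifying tags keep an empty list.
tags_by_post = {}
for post_id, name in rows:
    tags_by_post.setdefault(post_id, [])
    if name is not None:
        tags_by_post[post_id].append(name)
print(tags_by_post)
```

For post 1 the per-tag sums are 1.23 (tag 1), 0.30 (tag 2), 0.85 (tag 3) and -0.60 (tag 6), so only tags 1 and 3 pass the threshold, as the question expects.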

PostgreSQL - GROUP BY clause

I want to search by tags, and then list all articles with that tag, and also how many of given tags they match. So for example I might have:
Page1 - 2 (has css and php tag)
Page2 - 1 (has only css tag)
Query:
SELECT COUNT(t.tag)
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag = 'css' OR t.tag = 'php'
GROUP BY t.tag
LIMIT 9
When I only select COUNT(t.tag), the query works and I get okay results. But if I append e.g. the ID of my article, I get the following error:
ERROR: column "a.title" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT COUNT(t.tag), a.title FROM a_tags t
How to add said columns to this query?
Postgres 9.1 or later, quoting the release notes of 9.1 ...
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.
Related:
Return a grouped list with occurrences using Rails and PostgreSQL
The queries in the question and in @Michael's answer have the logic backwards. We want to count how many tags match per article, not how many articles have a certain tag. So we need to GROUP BY w_article.id, not by a_tags.id.
list all articles with that tag, and also how many of given tags they match
To fix this:
SELECT count(t.tag) AS ct, a.* -- any column from table a allowed ...
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag IN ('css', 'php')
GROUP BY a.id -- ... since PK is in GROUP BY
LIMIT 9;
Assuming id is the primary key of w_article.
However, this form will be faster while doing the same:
SELECT a.*, ct
FROM (
SELECT a2t.article AS id, count(*) AS ct
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
WHERE t.tag IN ('css', 'php')
GROUP BY 1
LIMIT 9 -- LIMIT early - cheaper
) sub
JOIN w_article a USING (id); -- attached alias to article in the sub
Closely related answer from just yesterday:
Why does the following join increase the query time significantly?
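To make the "group by article, not by tag" point concrete, here is a small sketch using Python's built-in sqlite3 as a stand-in for Postgres. The table and column names follow the question; the rows are invented to match the Page1/Page2 example. Both a.id and a.title are listed in GROUP BY so the query does not rely on the primary-key shortcut described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE w_article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE w_articles2tag (article INTEGER, tag INTEGER);
-- hypothetical sample rows
INSERT INTO a_tags VALUES (1, 'css'), (2, 'php'), (3, 'html');
INSERT INTO w_article VALUES (1, 'Page1'), (2, 'Page2'), (3, 'Page3');
INSERT INTO w_articles2tag VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")

# One group per article; the count is the number of matching tags it has.
matches = conn.execute("""
SELECT a.title, count(*) AS ct
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag IN ('css', 'php')
GROUP BY a.id, a.title
ORDER BY ct DESC, a.title
LIMIT 9
""").fetchall()
print(matches)
```

Page3 carries only the html tag, so it drops out in the WHERE clause before grouping.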
When you use a "GROUP BY" clause, you need to enclose all columns that are not grouped in an aggregate function. Try adding title to the GROUP BY list, or selecting "min(a.title)" instead.
SELECT COUNT(t.tag), a.title FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag = 'css' OR t.tag = 'php' GROUP BY t.tag, a.title LIMIT 9

Sql query to find things tagged with all specified tags

Let's say I have the following tables:
TAGS
id: integer
name: string
POSTS
id: integer
body: text
TAGGINGS
id: integer
tag_id: integer
post_id: integer
How would I go about writing a query that selects all posts that are tagged with ALL of the following tags (name attribute of tags table): "Cheese", "Wine", "Paris", "Frace", "City", "Scenic", "Art"
See also: Sql query to find things with most specified tags (note: similar, but not a duplicate!)
Using IN:
SELECT p.*
FROM POSTS p
WHERE p.id IN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
Using a JOIN
SELECT p.*
FROM POSTS p
JOIN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id
Using EXISTS
SELECT p.*
FROM POSTS p
WHERE EXISTS (SELECT NULL
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
AND tg.post_id = p.id
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
Explanation
The crux of things is that the COUNT(DISTINCT t.name) needs to match the number of tag names to ensure that all those tags are related to the post. Without the DISTINCT, there's a risk that duplicates of one of the names could return a count of 7--so you'd have a false positive.
Performance
Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the resultset. EXISTS would be my next choice--no duplicate risk, and generally faster execution but checking the explain plan will ultimately tell you what's best based on your setup and data.
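The false-positive risk mentioned in the explanation can be reproduced with a small sketch. It uses Python's built-in sqlite3 in place of the original database, only three tag names instead of seven, and invented rows in which post 2 is tagged "Cheese" twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE taggings (id INTEGER PRIMARY KEY, tag_id INTEGER, post_id INTEGER);
-- hypothetical sample rows
INSERT INTO tags VALUES (1, 'Cheese'), (2, 'Wine'), (3, 'Paris');
INSERT INTO posts VALUES (1, 'has all three'), (2, 'Cheese twice, no Paris');
INSERT INTO taggings (tag_id, post_id) VALUES
  (1, 1), (2, 1), (3, 1),
  (1, 2), (1, 2), (2, 2);
""")

QUERY = """
SELECT tg.post_id
FROM taggings tg
JOIN tags t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese', 'Wine', 'Paris')
GROUP BY tg.post_id
HAVING {count_expr} = 3
ORDER BY tg.post_id
"""

# Counting distinct names: only posts carrying all three tags qualify.
with_distinct = [r[0] for r in conn.execute(
    QUERY.format(count_expr="COUNT(DISTINCT t.name)"))]
# Counting rows: the duplicate 'Cheese' tagging pushes post 2 to 3 as well.
without_distinct = [r[0] for r in conn.execute(
    QUERY.format(count_expr="COUNT(*)"))]
print(with_distinct, without_distinct)
```

If the taggings table had a unique constraint on (tag_id, post_id), both variants would agree; the DISTINCT guards against data where that is not enforced.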
Try this:
Select * From Posts p
Where Not Exists
(Select * From tags t
Where name in
('Cheese', 'Wine', 'Paris',
'Frace', 'City', 'Scenic', 'Art')
And Not Exists
(Select * From taggings
Where tag_id = t.id
And post_id = p.id))
Explanation: Asking for a list of those Posts that have had every one of a specified set of tags associated with it is equivalent to asking for those posts where there is no tag in that same specified set that has not been associated with it, i.e., the SQL above.
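The double negation can be checked on a tiny data set. This is a sketch using Python's built-in sqlite3 with the column names from the question's schema (tags.id, posts.id), three tag names instead of seven, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE taggings (id INTEGER PRIMARY KEY, tag_id INTEGER, post_id INTEGER);
-- hypothetical sample rows
INSERT INTO tags VALUES (1, 'Cheese'), (2, 'Wine'), (3, 'Paris');
INSERT INTO posts VALUES (1, 'all three'), (2, 'two of three'), (3, 'untagged');
INSERT INTO taggings (tag_id, post_id) VALUES
  (1, 1), (2, 1), (3, 1),
  (1, 2), (2, 2);
""")

# Keep a post only if no requested tag exists that the post is missing.
fully_tagged = [r[0] for r in conn.execute("""
SELECT p.id
FROM posts p
WHERE NOT EXISTS (
    SELECT 1 FROM tags t
    WHERE t.name IN ('Cheese', 'Wine', 'Paris')
    AND NOT EXISTS (
        SELECT 1 FROM taggings tg
        WHERE tg.tag_id = t.id AND tg.post_id = p.id
    )
)
ORDER BY p.id
""")]
print(fully_tagged)
```

One property worth knowing: a requested name that does not exist in the tags table at all is silently ignored by this form, because the outer subquery never produces a row for it.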

Fetch fields from a table that has the same relation to another table

I'll try to explain my case as well as I can.
I'm making a website where you can find topics by browsing their tags. Nothing strange there. I'm having a tricky time with some of the queries though. They might be easy for you, my mind is pretty messed up from doing a lot of work :P.
I have the tables "topics" and "tags". They are joined using the table tags_topics which contains topic_id and tag_id. When the user wants to find a topic they might first select one tag to filter by, and then add another one to the filter. Then I make a query for fetching all topics that have both of the selected tags. They might also have other tags, but they MUST have the tags chosen to filter by. The number of tags to filter by differs, but we always have a list of user-selected tags to filter by.
This was mostly answered in Filtering from join-table and I went for the multiple-joins solution.
Now I need to fetch the tags that the user can filter by. So if we already have a defined filter of 2 tags, I need to fetch all tags, except those in the filter, that are associated with topics that include all the tags in the filter. This might sound weird, so I'll give a practical example :P
Let's say we have three topics: tennis, gym and golf.
tennis has tags: sport, ball, court and racket
gym has tags: sport, training and muscles
golf has tags: sport, ball, stick and outside
User selects tag sport, so we show all three tennis, gym and golf, and we show ball, court, racket, training, muscles, stick and outside as other possible filters.
User now adds ball to the filter. Filter is now sport and ball, so we show the topics tennis and golf, with court, racket, stick and outside as additional possible filters.
User now adds court to the filter, so we show tennis and racket as an additional possible filter.
I hope I'm making some sense. By the way, I'm using MySQL.
SELECT DISTINCT `tags`.`tag`
FROM `tags`
LEFT JOIN `tags_topics` ON `tags`.`id` = `tags_topics`.`tag_id`
LEFT JOIN `topics` ON `tags_topics`.`topic_id` = `topics`.`id`
LEFT JOIN `tags_topics` AS `tt1` ON `tt1`.`topic_id` = `topics`.`id`
LEFT JOIN `tags` AS `t1` ON `t1`.`id` = `tt1`.`tag_id`
LEFT JOIN `tags_topics` AS `tt2` ON `tt2`.`topic_id` = `topics`.`id`
LEFT JOIN `tags` AS `t2` ON `t2`.`id` = `tt2`.`tag_id`
LEFT JOIN `tags_topics` AS `tt3` ON `tt3`.`topic_id` = `topics`.`id`
LEFT JOIN `tags` AS `t3` ON `t3`.`id` = `tt3`.`tag_id`
WHERE `t1`.`tag` = 'tag1'
AND `t2`.`tag` = 'tag2'
AND `t3`.`tag` = 'tag3'
AND `tags`.`tag` NOT IN ('tag1', 'tag2', 'tag3')
SELECT topic_id
FROM topic_tag
WHERE tag_id = 1
OR tag_id = 2
OR tag_id = 3
GROUP BY topic_id
HAVING COUNT(topic_id) = 3;
The above query will get all topic_ids that have all three tag_ids of 1, 2 and 3. Then use this as a subquery:
SELECT tag_name
FROM tag
INNER JOIN topic_tag
ON tag.tag_id = topic_tag.tag_id
WHERE topic_id IN
( SELECT topic_id
FROM topic_tag
WHERE tag_id = 1
OR tag_id = 2
OR tag_id = 3
GROUP BY topic_id
HAVING COUNT(topic_id) = 3
)
AND
(
tag.tag_id <> 1
AND tag.tag_id <> 2
AND tag.tag_id <> 3
)
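The tennis/gym/golf example from the question makes a convenient test for this subquery approach. Below is a sketch using Python's built-in sqlite3 rather than MySQL. The table names follow this answer (tag, topic_tag); the tag ids and the helper remaining_tags are hypothetical. It assumes each (topic_id, tag_id) pair appears only once, which the HAVING COUNT check relies on, and it excludes the selected tags with NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE topic_tag (topic_id INTEGER, tag_id INTEGER,
                        PRIMARY KEY (topic_id, tag_id));
-- hypothetical ids: topics 1 = tennis, 2 = gym, 3 = golf
INSERT INTO tag VALUES (1, 'sport'), (2, 'ball'), (3, 'court'), (4, 'racket'),
                       (5, 'training'), (6, 'muscles'), (7, 'stick'), (8, 'outside');
INSERT INTO topic_tag VALUES (1, 1), (1, 2), (1, 3), (1, 4),
                             (2, 1), (2, 5), (2, 6),
                             (3, 1), (3, 2), (3, 7), (3, 8);
""")

def remaining_tags(selected_ids):
    """Tags still available as filters once selected_ids are all required."""
    marks = ",".join("?" * len(selected_ids))
    sql = f"""
    SELECT DISTINCT tag.tag_name
    FROM tag
    JOIN topic_tag ON tag.tag_id = topic_tag.tag_id
    WHERE topic_tag.topic_id IN (
        SELECT topic_id FROM topic_tag
        WHERE tag_id IN ({marks})
        GROUP BY topic_id
        HAVING COUNT(*) = ?          -- topic must carry every selected tag
    )
    AND tag.tag_id NOT IN ({marks})  -- do not offer tags already selected
    ORDER BY tag.tag_name
    """
    params = [*selected_ids, len(selected_ids), *selected_ids]
    return [row[0] for row in conn.execute(sql, params)]

print(remaining_tags([1, 2]))     # filter: sport + ball
print(remaining_tags([1, 2, 3]))  # filter: sport + ball + court
```

The exclusion has to hold for every selected tag at once (AND / NOT IN); joining the inequalities with OR would be true for every row and exclude nothing.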
I think this is what you are looking for.
Select a.topic_id
from join_table a
where exists( select *
from join_table b
where a.tag_id = b.tag_id
and b.topic_id = selected_topic )
group by a.topic_id
having count(*) = ( select count(*)
from join_table c
where c.topic_id = selected_topic )
Should give you a list of topics which have all of the tags for selected_topic.
Generic solution from the top of my head but prone to have typos:
CREATE VIEW shared_tags_count AS
SELECT topic_to_tag1.topic_id AS topic_id1, topic_to_tag2.topic_id AS topic_id2, COUNT(*) as number
FROM topic_to_tag as topic_to_tag1
JOIN topic_to_tag as topic_to_tag2
ON topic_to_tag1.topic_id <> topic_to_tag2.topic_id
AND topic_to_tag1.tag_id = topic_to_tag2.tag_id
GROUP BY topic_to_tag1.topic_id, topic_to_tag2.topic_id;
CREATE VIEW tags_count AS
SELECT topic_id, COUNT(*) as number
FROM topic_to_tag
GROUP BY topic_id;
CREATE VIEW related_topics AS
SELECT shared_tags_count.topic_id1, shared_tags_count.topic_id2
FROM shared_tags_count
JOIN tags_count
ON topic_id=topic_id1
AND shared_tags_count.number = tags_count.number;
CREATE VIEW related_tags AS
SELECT related_topics.topic_id1 as topic_id, topic_to_tag.tag_id
FROM related_topics
JOIN topic_to_tag
ON related_topics.topic_id2 = topic_to_tag.topic_id;
You just have to query the related_tags view.
Interesting challenge btw.