many-to-many query - sql

I have a problem and I don't know which solution is better.
I have 2 tables: posts(id, title), posts_tags(post_id, tag_id).
My task is to select the posts that have all of a given set of tag ids, for example 4, 10 and 11.
The match need not be exact: a post may have any other tags at the same time.
So, how can I do this most efficiently? Create a temporary table in each query? Or maybe some kind of stored procedure?
In the future, the user could ask the script to select posts with any number of tags (it could be 1 tag only or 10 at the same time), and I must be sure that the method I choose is the best one for my problem.
Sorry for my English, thanks for your attention.

This solution assumes that (post_id, tag_id) in posts_tags is enforced to be UNIQUE:
SELECT id, title FROM posts
INNER JOIN posts_tags ON posts_tags.post_id = posts.id
WHERE tag_id IN (4, 10, 11)
GROUP BY id, title
HAVING COUNT(*) = 3
Although it's not a single query that covers all possible tag combinations, it's easy to generate as dynamic SQL. To search for another set of tags, change the IN () list to contain all the tags, and change the COUNT(*) = value to the number of tags specified. The advantage of this solution over cascading a bunch of JOINs together is that you don't have to add JOINs, or even extra WHERE terms, when you change the request.
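A minimal sketch of how the dynamic SQL could be generated, assuming Python with the standard sqlite3 module (the original question does not name a client language or database, and the helper name posts_with_all_tags and the sample rows are made up for illustration). Only the placeholders are generated; the tag ids themselves are passed as bound parameters.

```python
import sqlite3

def posts_with_all_tags(conn, tag_ids):
    """Return (id, title) rows for posts that carry every tag in tag_ids."""
    tag_ids = sorted(set(tag_ids))  # duplicate ids would break the count check
    if not tag_ids:
        return []
    placeholders = ", ".join("?" for _ in tag_ids)  # one "?" per tag
    sql = (
        "SELECT p.id, p.title FROM posts p "
        "INNER JOIN posts_tags pt ON pt.post_id = p.id "
        f"WHERE pt.tag_id IN ({placeholders}) "
        "GROUP BY p.id, p.title "
        "HAVING COUNT(*) = ? "
        "ORDER BY p.id"
    )
    return conn.execute(sql, (*tag_ids, len(tag_ids))).fetchall()

# Hypothetical sample data; the UNIQUE constraint is the assumption stated above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE posts_tags (post_id INTEGER, tag_id INTEGER,
                             UNIQUE (post_id, tag_id));
    INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO posts_tags VALUES (1, 4), (1, 10), (1, 11), (1, 7),
                                  (2, 4), (2, 10), (3, 11);
""")
result = posts_with_all_tags(conn, [4, 10, 11])
print(result)
```

Post 1 is the only row returned here: it has the extra tag 7, which does not matter, while posts 2 and 3 each lack at least one of the requested tags.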

select p.id, p.title
from posts p, posts_tags t
where p.id = t.post_id
and t.tag_id in (4, 10, 11);
?

Does this work?
select *
from posts
where posts.id in
(select post_id
from posts_tags
where tag_id = 4
and post_id in (select post_id
from posts_tags
where tag_id = 10
and post_id in (select post_id
from posts_tags
where tag_id = 11)))

You can make a time-storage trade-off by storing a one-way hash of the post's tag names, sorted alphabetically.
When a post is tagged, execute select t.name from tags t inner join posts_tags pt on pt.tag_id = t.id where pt.post_id = [ID_of_tagged_post] order by t.name. Concatenate all of the tag names, create a hash using the MD5 algorithm and insert the value into a column alongside your post (or into another table joined by a foreign key, if you prefer).
When you want to search for a specific combination of tags, simply execute (remembering to sort the tag names) select p.* from posts p where p.taghash = MD5([concatenated_tag_string]). Note that this only finds posts whose tag set is exactly the combination searched for.
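To illustrate how such a hash behaves, here is a small sketch in Python using the standard hashlib module. The function name tag_hash, the sample tag names, and the comma separator are assumptions of this sketch (the answer only says to concatenate the sorted names; a separator is added so that differently split names cannot produce the same string).

```python
import hashlib

def tag_hash(tag_names):
    """MD5 hex digest of the tag names, sorted and joined with a separator."""
    joined = ",".join(sorted(tag_names))  # the "," separator is an assumption
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

stored = tag_hash(["sql", "mysql", "join"])           # computed when the post is tagged
same_set = tag_hash(["join", "sql", "mysql"])         # same tags, different order
superset = tag_hash(["join", "sql", "mysql", "php"])  # one extra tag
print(stored == same_set, stored == superset)
```

Sorting makes the hash independent of tag order, but a post with one extra tag hashes differently, so this lookup cannot answer "has at least these tags" on its own.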

This selects all posts that have any of the tags (4, 10, 11):
select distinct id, title from posts
where exists (
select * from posts_tags
where
post_id = id and
tag_id in (4, 10, 11))
Or you can use this:
select distinct id, title from posts
join posts_tags on post_id = id
where tag_id in (4, 10, 11)
(Both will be optimized the same way).
This selects all posts that have all of the tags (4, 10, 11):
select distinct id, title from posts
where not exists (
select * from posts_tags t1
where
t1.tag_id in (4, 10, 11) and
not exists (
select * from posts_tags as t2
where
t1.tag_id = t2.tag_id and
id = t2.post_id))
The list of tags in the in clause is what dynamically changes (in all cases).
But, this last query is not really fast, so you could use something like this instead:
create temporary table target_tags (tag_id int);
insert into target_tags values(4),(10),(11);
select id, title from posts
join posts_tags on post_id = id
join target_tags on target_tags.tag_id = posts_tags.tag_id
group by id, title
having count(*) = (select count(*) from target_tags);
drop table target_tags;
The part that changes dynamically is now in the second statement (the insert).

Related

Filter on Foreign Key with LATERAL JOIN yields strange results

thanks for your time!
Basically, I'm trying to filter a NxM table using foreign keys, with 0,1 or N different tags. The problem is that LEFT LATERAL JOIN yields bizarre results.
Please, don't mind the strange casting, I'm doing so because I'm using spring boot.
Here is a fiddle showing a fake relationship:
https://www.db-fiddle.com/f/6bDu33keWACHssLqznk88n/0
Schema (PostgreSQL v13)
CREATE TABLE posts (id int primary key);
CREATE TABLE tags (id int primary key);
CREATE TABLE post_tags (post_id int references posts(id),
tags_id int references tags(id),
primary key (post_id, tags_id));
INSERT INTO posts VALUES (1), (2), (3), (4);
INSERT INTO tags VALUES (8), (9);
INSERT INTO post_tags VALUES (1,8), (1,9), (2,8);
Query #1
select * from posts p
left join lateral (select * from post_tags pt where pt.post_id = p.id) pt on 1=1
where (1 is null or pt.tags_id = any(cast(STRING_TO_ARRAY(CAST('9' AS TEXT), ',') AS INT[])));
id post_id tags_id
1 1 9
Query #2
select * from posts p
left join lateral (select * from post_tags pt where pt.post_id = p.id limit 1) pt on 1=1
where (1 is null or pt.tags_id = any(cast(STRING_TO_ARRAY(CAST('9' AS TEXT), ',') AS INT[])));
There are no results to be displayed.
Query #3
select * from posts p
left join lateral (select * from post_tags pt where pt.post_id = p.id) pt on 1=1
where (1 is null or pt.tags_id = any(cast(STRING_TO_ARRAY(CAST('9,8' AS TEXT), ',') AS INT[])));
id post_id tags_id
1 1 9
1 1 8
2 2 8
Query #4
select * from posts p
left join lateral (select * from post_tags pt where pt.post_id = p.id limit 1) pt on 1=1
where (1 is null or pt.tags_id = any(cast(STRING_TO_ARRAY(CAST('9,8' AS TEXT), ',') AS INT[])));
id post_id tags_id
1 1 8
2 2 8
If you notice, query #2 yields no results, although it should. I suspect the limit 1 is not allowing it to function properly. But if I remove it, I get duplicate results (as seen in query #3).
My question is: how can I filter on foreign keys without getting duplicate results?
EDIT ---
I expect the query to return at most 1 result per category that matches the where clause;
Query #2 should return:
id post_id tags_id
1 1 9
Or, if multiple tags are passed, it should return the same as query #4: both matching posts (1 and 2), with no post duplicated (such as post id = 1 in query #3).
Thanks
Obviously there is a problem with the where clause. It filters on pt.tags_id, but that column comes from the left join, so it may be null; a post that has no tags is therefore always filtered out. With limit 1, the lateral subquery keeps a single arbitrary tag row per post before the filter is applied, which is why query #2 can come back empty.
It also occurs to me that you don't really need a join (which may cause cardinality issues, as you are seeing); if you just want to filter the posts by tag, exists seems more appropriate:
select p.*
from posts p
where exists (
select 1
from post_tags pt
where pt.post_id = p.id
and pt.tags_id = any(cast(STRING_TO_ARRAY(CAST('9,8' AS TEXT), ',') AS INT[]))
)
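A quick way to check the exists approach against the fiddle data is sketched below. It assumes Python with the standard sqlite3 module rather than PostgreSQL, so the any(... INT[]) test is swapped for a plain IN list with bound parameters; the helper name posts_with_any_tag is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE post_tags (post_id INTEGER, tags_id INTEGER,
                            PRIMARY KEY (post_id, tags_id));
    INSERT INTO posts VALUES (1), (2), (3), (4);
    INSERT INTO post_tags VALUES (1, 8), (1, 9), (2, 8);
""")

def posts_with_any_tag(tag_ids):
    """Ids of posts having at least one of tag_ids, each post at most once."""
    marks = ", ".join("?" for _ in tag_ids)  # IN list stands in for any(array)
    sql = f"""
        SELECT p.id FROM posts p
        WHERE EXISTS (SELECT 1 FROM post_tags pt
                      WHERE pt.post_id = p.id AND pt.tags_id IN ({marks}))
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql, tag_ids)]

only_9 = posts_with_any_tag([9])
both = posts_with_any_tag([9, 8])
print(only_9, both)
```

Because exists only tests whether a matching row is present, post 1 appears once even though it matches both tags 8 and 9.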

SELECT from subquery without having to specify all columns in GROUP BY

Idea is to query an article table where an article has a given tag, and then to STRING_AGG all (even unrelated) tags that belong to that article row.
Example tables and query:
CREATE TABLE article (id SERIAL, body TEXT);
CREATE TABLE article_tag (article INT, tag INT);
CREATE TABLE tag (id SERIAL, title TEXT);
SELECT DISTINCT ON (id)
q.id, q.body, STRING_AGG(q.tag_title, '|') tags
FROM (
SELECT a.*, tag.title tag_title
FROM article a
LEFT JOIN article_tag x ON a.id = x.article
LEFT JOIN tag ON tag.id = x.tag
WHERE tag.title = 'someTag'
) q
GROUP BY q.id
Running the above, Postgres requires that q.body be included in GROUP BY:
ERROR: column "q.body" must appear in the GROUP BY clause or be used in an aggregate function
As I understand it, it's because subquery q doesn't include any PRIMARY key.
I naively thought that the DISTINCT ON would supplement that, but it doesn't seem so.
Is there a way to mark a column in a subquery as PRIMARY so that we don't have to list all columns in GROUP BY clause?
If we do have to list all columns in GROUP BY clause, does that incur significant perf cost?
EDIT: to elaborate, since PostgreSQL 9.1 you don't have to list non-primary-key (i.e. functionally dependent) columns when using GROUP BY, e.g. the following query works fine:
SELECT a.id, a.body, STRING_AGG(tag.title, '|') tags
FROM article a
LEFT JOIN article_tag x ON a.id = x.article
LEFT JOIN tag ON tag.id = x.tag
GROUP BY a.id
I was wondering if I can leverage the same behavior, but with a subquery (by somehow indicating that q.id is a PRIMARY key).
It sadly doesn't work when you wrap your primary key in subquery and I don't know of any way to "mark it" as you suggested.
You can try this workaround using window function and distinct:
CREATE TABLE test1 (id serial primary key, name text, value text);
CREATE TABLE test2 (id serial primary key, test1_id int, value text);
INSERT INTO test1(name, value)
values('name1', 'test01'), ('name2', 'test02'), ('name3', 'test03');
INSERT INTO test2(test1_id, value)
values(1, 'test1'), (1, 'test2'), (3, 'test3');
SELECT DISTINCT ON (id) id, name, string_agg(value2, '|') over (partition by id)
FROM (SELECT test1.*, test2.value AS value2
FROM test1
LEFT JOIN test2 ON test2.test1_id = test1.id) AS sub;
id name string_agg
1 name1 test1|test2
2 name2 null
3 name3 test3
The problem is in the outer SELECT: every selected column must either be aggregated or appear in GROUP BY. Postgres wants you to specify what to do with q.body: group by it or compute an aggregate. It looks a little awkward but should work.
SELECT DISTINCT ON (id)
q.id, q.body, STRING_AGG(q.tag_title, '|') tags
FROM (
SELECT a.*, tag.title tag_title
FROM article a
LEFT JOIN article_tag x ON a.id = x.article
LEFT JOIN tag ON tag.id = x.tag
WHERE tag.title = 'someTag'
) q
GROUP BY q.id, q.body
--             ^^^^^^
Another way is to make a query to get id and aggregated tags then join body to it. If you wish I can make an example.
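One possible shape for that second approach is sketched here, assuming Python with the standard sqlite3 module, so group_concat stands in for STRING_AGG; the sample rows and the primary keys are invented for the example. The inner query aggregates all tags per article id, the body is joined on afterwards, and the tag filter is a separate exists test so that unrelated tags still end up in the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_tag (article INTEGER, tag INTEGER);
    INSERT INTO article VALUES (1, 'first'), (2, 'second');
    INSERT INTO tag VALUES (1, 'someTag'), (2, 'other');
    INSERT INTO article_tag VALUES (1, 1), (1, 2), (2, 2);
""")

sql = """
    SELECT a.id, a.body, agg.tags
    FROM (SELECT x.article AS id, group_concat(t.title, '|') AS tags
          FROM article_tag x
          JOIN tag t ON t.id = x.tag
          GROUP BY x.article) agg          -- only the id is grouped here
    JOIN article a ON a.id = agg.id        -- body is joined on afterwards
    WHERE EXISTS (SELECT 1 FROM article_tag x2
                  JOIN tag t2 ON t2.id = x2.tag
                  WHERE x2.article = a.id AND t2.title = ?)
"""
rows = conn.execute(sql, ("someTag",)).fetchall()
print(rows)
```

Only article 1 carries someTag, so it is the only row returned, and its tags column contains both of its tags (the order inside the aggregate is not guaranteed).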

SQL search for all the projects which have all mentioned tags assigned

I have a SQL query for searching projects based on tags. It works fine if any of the tags matches.
SELECT *
FROM projects
WHERE projects.id IN (SELECT taggable_id
FROM taggings
WHERE taggable_type='Project'
AND taggable_id=projects.id
AND taggings.tag_id IN (1, 2, 3))
There are three tables, Taggings with columns id, taggable_id, taggable_type, tag_id; Tags with columns id, name; and Projects with columns id, name, description.
What I want is, to search for all the projects for which all 3 tags are assigned.
Thanks.
You can do this by counting the number of matches. Your subquery is redundant in the comparison to p.id (done both by the IN and by the correlation clause). Here is one method:
SELECT p.*
from projects p
WHERE 3 = (SELECT COUNT(*)
FROM taggings t
WHERE t.taggable_type = 'Project' AND
t.taggable_id = p.id AND
t.tag_id IN (1, 2, 3)
);
You can also do this using IN:
SELECT p.*
from projects p
WHERE p.id IN (SELECT t.taggable_id
FROM taggings t
WHERE t.taggable_type = 'Project' AND
t.tag_id IN (1, 2, 3)
GROUP BY t.taggable_id
HAVING COUNT(*) = 3
);

Is there a better way to find the most popular title in a 'self-linked' table of user posts?

I have this (simplified for space) table schema with user posts and related comments:
create table
tbl_post (
id integer primary key,
title text not null,
content text not null,
post_id integer null
);
where tbl_post.post_id is the (int) id of the post that a comment belongs to,
or null if the tbl_post.id row is a main, authored post (namely not a comment).
I'm using this sqlite query to figure out the most popular title in the posts table (the criterion is how many comments relate to it...):
select
title
from
tbl_post
where
id = (
select
post_id
from (
select
post_id, count(post_id) as tot
from
tbl_post
where
ifnull(post_id, '') != ''
group by
post_id
order by
tot desc
limit 1
)
);
which looks quite bulky to me, with those two nested select statements. I would like to make the query simpler (shorter, potentially faster) somehow. Thanks.
How about a self-join?
SELECT p.Id, p.title, p.content, COUNT(c.Id) AS nbOfComments
FROM tbl_post p
LEFT JOIN tbl_post c ON p.Id = c.post_id
WHERE p.post_id IS NULL
GROUP BY p.Id, p.title, p.content
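The self-join above returns a comment count for every main post. To get only the most popular title, one option is to sort by the count and keep the first row. The sketch below assumes Python with the standard sqlite3 module and a few made-up rows; ties are broken arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_post (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        content TEXT NOT NULL,
        post_id INTEGER NULL
    );
    -- rows 1 and 2 are main posts, rows 3 to 5 are comments (hypothetical data)
    INSERT INTO tbl_post VALUES
        (1, 'first',  'post',    NULL),
        (2, 'second', 'post',    NULL),
        (3, 're',     'comment', 1),
        (4, 're',     'comment', 2),
        (5, 're',     'comment', 2);
""")

sql = """
    SELECT p.title, COUNT(c.id) AS nbOfComments
    FROM tbl_post p
    LEFT JOIN tbl_post c ON c.post_id = p.id
    WHERE p.post_id IS NULL
    GROUP BY p.id, p.title
    ORDER BY nbOfComments DESC
    LIMIT 1
"""
top = conn.execute(sql).fetchone()
print(top)
```

With this data the post titled 'second' has two comments and 'first' has one, so 'second' is the row that comes back.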

SQL Taxonomy Help

I have a database that relates content by taxonomy and I am trying to query that content by taxonomy. It looks like this:
Table 1
content_id, content_name
Table 2
content_id, content_taxonomy
What I am trying in my query is to find content with two or more types of related taxonomy. My query looks like this:
SELECT content_id FROM table_1 JOIN table_2 ON table_1.content_id=table_2.content_id WHERE content_taxonomy='ABC' AND content_taxonomy='123'
Except it returns nothing. I later tried a group by with:
SELECT content_id FROM table_1 JOIN table_2 ON table_1.content_id=table_2.content_id WHERE content_taxonomy='ABC' AND content_taxonomy='123' GROUP BY content_id, content_taxonomy
But that didn't work either. Any suggestions please?
SELECT *
FROM content c
WHERE (
SELECT COUNT(*)
FROM taxonomy t
WHERE t.content_id = c.content_id
AND t.content_taxonomy IN ('ABC', '123')
) = 2
Create a UNIQUE INDEX or a PRIMARY KEY on taxonomy (content_id, content_taxonomy) for this to work fast.
SELECT c.*
FROM (
SELECT content_id
FROM taxonomy
WHERE content_taxonomy IN ('ABC', '123')
GROUP BY
content_id
HAVING COUNT(*) = 2
) t
JOIN content c
ON c.content_id = t.content_id
In this case, create a UNIQUE INDEX or a PRIMARY KEY on taxonomy (content_taxonomy, content_id) (note the order of the fields).
Either solution can be more or less efficient than the other, depending on how many taxonomies you have per content item and how likely a match is.
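For anyone wondering why the original AND query returns nothing, here is a small sketch assuming Python with the standard sqlite3 module, using the content and taxonomy table names from the answer and invented sample rows. The WHERE clause is evaluated one joined row at a time, and no single taxonomy row can hold both values at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (content_id INTEGER PRIMARY KEY, content_name TEXT);
    CREATE TABLE taxonomy (content_id INTEGER, content_taxonomy TEXT,
                           PRIMARY KEY (content_id, content_taxonomy));
    INSERT INTO content VALUES (1, 'one'), (2, 'two');
    INSERT INTO taxonomy VALUES (1, 'ABC'), (1, '123'), (2, 'ABC');
""")

# Per-row test: a single row never has both taxonomy values, so nothing matches.
and_rows = conn.execute("""
    SELECT c.content_id FROM content c
    JOIN taxonomy t ON t.content_id = c.content_id
    WHERE t.content_taxonomy = 'ABC' AND t.content_taxonomy = '123'
""").fetchall()

# Counting the matching rows per content_id finds content that has both.
counted = conn.execute("""
    SELECT content_id FROM taxonomy
    WHERE content_taxonomy IN ('ABC', '123')
    GROUP BY content_id
    HAVING COUNT(*) = 2
""").fetchall()
print(and_rows, counted)
```

The first query comes back empty, while the counting query returns content 1, the only item tagged with both 'ABC' and '123'.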