"Tag" searching/exclusion query design issue - sql

Background: I'm working on a homebrew project for managing a collection of my own images, and have been trying to implement a tag-based search so I can easily sift through them.
Right now, I'm using RedBean's tagging API to apply tags to each image's database entry, but I'm stuck on one detail of my implementation. To support searches where multiple tags refine the results (when searching for "ABC XYZ", a tagged image must have both tags "ABC" and "XYZ"),
I have to do some of the processing in the server-side language rather than in SQL, and then run an (optional) second query to verify that none of the returned images has a tag that was explicitly excluded (when searching for "ABC -XYZ", a tagged image must have tag "ABC" and not "XYZ").
The problem here is that my current method requires me to run all results through my server-side code, which makes any attempt at sensible pagination/result offsets inaccurate.
My goal is to just grab the rows of the post table that contain the requested tags (and not contain any excluded tags) with one query, and still be able to use LIMIT/OFFSET arguments for my query to obtain reasonably paginated results.
The table schemas are as follows:
Table "post"
Columns:
id (PRIMARY KEY for post table)
(image metadata, not relevant to tag search)
Table "tag"
Columns:
id (PRIMARY KEY for tag table)
title (string of varying length - assume varchar(255))
Table "post_tag"
Columns:
id (PRIMARY KEY for post_tag table)
post_id (associated with column "post.id")
tag_id (associated with column "tag.id")
If possible, I'd like to also be able to have WHERE conditions specific to post table columns as well.
What should I be using for query structure? I've been playing with left joins but haven't been able to get the exact structure I need to solve this.

Here is the basic idea:
The LEFT OUTER JOIN brings in the set of posts that have any of the tags you want to exclude. The final WHERE clause (notag.post_id is null) keeps only the rows of the first post table that have no match in that set.
The INNER JOIN brings in the set of posts that have all of the tags you require. Note that the number 2 in the HAVING clause must match the number of unique tag names that you provide in the IN clause.
select p.*
from post p
left outer join (
select pt.post_id
from post_tag pt
inner join tag t on pt.tag_id = t.id
where t.title in ('UVW', 'XYZ')
) notag on p.id = notag.post_id
inner join (
select pt.post_id
from post_tag pt
inner join tag t on pt.tag_id = t.id
where t.title in ('ABC', 'DEF')
group by pt.post_id
having count(distinct t.title) = 2
) yestag on p.id = yestag.post_id
where notag.post_id is null
--add additional WHERE filters here as needed
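To show how the query above could be driven from a search string such as "ABC DEF -XYZ" while keeping LIMIT/OFFSET inside SQL, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module; the function names (parse_search, search_posts, make_demo_db) and the sample rows are made up for illustration and are not part of RedBean. The SQL keeps the same shape as the answer's query, with placeholders instead of literal tag names.

```python
import sqlite3

def parse_search(text):
    """Split 'ABC DEF -XYZ' into (included, excluded) lists of unique tags."""
    included, excluded = [], []
    for word in text.split():
        target = excluded if word.startswith("-") else included
        tag = word.lstrip("-")
        if tag and tag not in target:
            target.append(tag)
    return included, excluded

def search_posts(conn, text, limit=20, offset=0):
    """Return ids of posts that have every included tag and no excluded tag."""
    included, excluded = parse_search(text)
    sql, params = ["select p.id from post p"], []
    if included:
        marks = ",".join("?" * len(included))
        sql.append(f"""
            inner join (
                select pt.post_id
                from post_tag pt
                inner join tag t on pt.tag_id = t.id
                where t.title in ({marks})
                group by pt.post_id
                having count(distinct t.title) = ?
            ) yestag on p.id = yestag.post_id""")
        # The HAVING count must equal the number of unique included tags.
        params += included + [len(included)]
    if excluded:
        marks = ",".join("?" * len(excluded))
        sql.append(f"""
            left outer join (
                select pt.post_id
                from post_tag pt
                inner join tag t on pt.tag_id = t.id
                where t.title in ({marks})
            ) notag on p.id = notag.post_id
            where notag.post_id is null""")
        params += excluded
    # A stable ORDER BY keeps pages consistent between requests.
    sql.append("order by p.id limit ? offset ?")
    params += [limit, offset]
    return [row[0] for row in conn.execute("\n".join(sql), params)]

def make_demo_db():
    """Build a tiny in-memory database with the question's three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table post (id integer primary key);
        create table tag (id integer primary key, title varchar(255));
        create table post_tag (id integer primary key, post_id integer, tag_id integer);
        insert into post (id) values (1), (2), (3), (4);
        insert into tag (id, title) values (1, 'ABC'), (2, 'DEF'), (3, 'XYZ');
        insert into post_tag (post_id, tag_id) values
            (1, 1), (1, 2),          -- post 1: ABC, DEF
            (2, 1), (2, 2), (2, 3),  -- post 2: ABC, DEF, XYZ
            (3, 1),                  -- post 3: ABC
            (4, 3);                  -- post 4: XYZ
    """)
    return conn
```

With the sample rows, search_posts(make_demo_db(), "ABC -XYZ") gives posts 1 and 3. Any extra conditions on post columns would go into the WHERE clause alongside the notag check.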

Related

Join List of words to a table

I have two tables. Table "List" and table "Content". I want to store 6 types of lists. These lists are black,white and grey lists and each of these list contain a few words. Whenever someone notices a new word that should be in one of the lists, then new word should be simply added to the database.
The image below shows you the tables that I use.
I want to refer to or join a certain list, for instance a list named "Blacklist" with ListID = 1, with the correct set of blacklist words, which could have ContentID = 1.
The content is a list of words, but I am clueless as to how I should join the correct list of words (content) to a ListID. I don't know how to query this.
The part that is troubling me is that it is a list of words. So ContentID = 1 has, for example, the words "Login", "Password", "Credential", etc. How do I relate it to ListID = 1 with the name "BlackList"? And do the same for the other lists?
I think it should look like this.
SELECT ID
FROM List
LEFT JOIN Content
ON LIST.ID = ContenID AND CONTENT.ISDEFAULT = 1
WHERE ListID = 1
This only joins the two ID with each other. How do I join the correct list of words with the correct list? Maybe I am totally missing the point with the query above?
Question: How do I join a set or list of words to a list with a name and ListID?
Once you change this schema, the below query will work
SELECT ListID,ContentID,Words
FROM List
LEFT JOIN Content
ON List.ListID = Content.ListID
WHERE List.ListID = 1
I have considered the schema from the diagrams. Please execute the below query:
SELECT
L.Name AS 'ListName',
C.Words
FROM List L
INNER JOIN Content C ON C.ListID = L.ListID
WHERE
C.Words IN ('Login','Password','Credential')
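Since the schema diagram is not shown here, the following is only a sketch under a loudly stated assumption: Content holds one word per row and carries a ListID foreign key back to List (the column names follow the two answers above). It uses SQLite through Python's sqlite3 module, and the helper names words_for_list and add_word are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: one row per word, each pointing at its list.
    create table List (ListID integer primary key, Name text);
    create table Content (
        ContentID integer primary key,
        ListID integer references List (ListID),
        Words text
    );
    insert into List (ListID, Name) values (1, 'BlackList'), (2, 'WhiteList');
    insert into Content (ListID, Words) values
        (1, 'Login'), (1, 'Password'), (1, 'Credential'), (2, 'Welcome');
""")

def words_for_list(conn, list_name):
    """Return every word that belongs to the named list."""
    rows = conn.execute("""
        select C.Words
        from List L
        inner join Content C on C.ListID = L.ListID
        where L.Name = ?
        order by C.ContentID
    """, (list_name,))
    return [row[0] for row in rows]

def add_word(conn, list_name, word):
    """Add a new word to the named list by inserting one Content row."""
    conn.execute(
        "insert into Content (ListID, Words) "
        "select ListID, ? from List where Name = ?",
        (word, list_name),
    )
```

Under this layout, adding a newly noticed word is a single insert, and no list needs a fixed number of words.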

How to select records that have multiple values in a related table?

I have the following three tables for tagging content where each content can have one-to-many tags. For example, a content record could have a tag of California and Variable.
Table w/ the content
Content
-ContentID
-ContentName
Table w/ the tags
Tag
-TagID
-TagName
Table that links the content and the tags
ContentTag
-ContentID
-TagID
With the following SELECT statement I want to get records with TagID of both 21 and 54; however, no rows are returned.
SELECT * FROM ContentTag
INNER JOIN Content On ContentTag.ContentID=Content.ContentID
INNER JOIN Tag ON ContentTag.TagID=Tag.TagID
Where (Tag.TagID=21 And Tag.TagID=54)
How do I create a SQL SELECT statement to retrieve content that has one-to-many tags?
I like to approach this question using aggregation and a having clause:
SELECT c.ContentId, c.ContentName
FROM ContentTag ct INNER JOIN
Content c
On ct.ContentID = c.ContentID
WHERE ct.TagID IN (21, 54)
GROUP BY c.ContentId, c.ContentName
HAVING COUNT(Distinct ct.TagId) = 2;
Some notes:
You don't need the join to the tags table. You are filtering on the tag id, which is already in ContentTag.
You don't need *. I presume you are looking for content that has the two tags.
The WHERE clause limits the tags to the two tags in question.
The HAVING clause makes sure both are there.
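To make the difference concrete, here is a small sketch, assuming SQLite through Python's sqlite3 module and made-up sample rows. It first shows that the original per-row condition can never match, then runs the aggregation query for any list of tag ids; content_with_all_tags is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Content (ContentID integer primary key, ContentName text);
    create table ContentTag (ContentID integer, TagID integer);
    insert into Content values (1, 'Both tags'), (2, 'Only 21'), (3, 'Only 54');
    insert into ContentTag values (1, 21), (1, 54), (2, 21), (3, 54);
""")

# WHERE is evaluated per row, and a single row has exactly one TagID,
# so "TagID = 21 AND TagID = 54" can never be true.
per_row_matches = conn.execute(
    "select count(*) from ContentTag where TagID = 21 and TagID = 54"
).fetchone()[0]

def content_with_all_tags(conn, tag_ids):
    """Return (ContentID, ContentName) rows that carry every tag id given."""
    tag_ids = sorted(set(tag_ids))
    marks = ",".join("?" * len(tag_ids))
    rows = conn.execute(f"""
        select c.ContentID, c.ContentName
        from ContentTag ct
        inner join Content c on ct.ContentID = c.ContentID
        where ct.TagID in ({marks})
        group by c.ContentID, c.ContentName
        having count(distinct ct.TagID) = ?
        order by c.ContentID
    """, tag_ids + [len(tag_ids)])
    return rows.fetchall()
```

Passing the number of unique tag ids as the HAVING value keeps the count in step with the IN list when the number of tags changes.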

SQL query - get rows from a table based on one condition and through join table based on another condition

I have a tags table - id, name, owner_id (owner_id is FK for users)
and a user_tags table - user_id, tag_id (linking table between users and tags for the purpose of sharing those tags - ie, users who can access the tag but aren't the owner)
I have a query that can get me tags through a join on the user_tags table:
SELECT tags . *
FROM tags
JOIN user_tags ON user_tags.user_id =2
AND user_tags.tag_id = tags.id
LIMIT 0 , 30
But in that same query I'd also like to select tags WHERE tags.owner_id = 2, getting all tags shared with that user through the linking table(user_tags) and also tags that user owns (tags.owner_id = user_id).
If I include WHERE tags.owner_id = 2 after the join, It only returns results where tags.owner_id = 2.
If I include OR tags.owner_id = 2, I get repeats of all the results.
If I make the statement SELECT DISTINCT... OR tags.owner_id = 2 I end up with the correct result set, but I'm not sure that's the correct way to do this join with condition.
Is there a better way/best practice?
Also, why does a join return multiples of the results (ie why is SELECT DISTINCT or GROUP BY necessary)?
Thank you.
EDIT FOR CLARIFICATION OF STRUCTURE
I wouldn't use user_tags.user_id as part of the join condition. Just specify both conditions in the WHERE clause to make your intent clearer. Note that owner_id lives on tags, not on user_tags, and that a LEFT JOIN is needed so that tags the user owns but has not shared with anyone are still returned. To answer your question: yes, you need to de-dupe with DISTINCT, because one tags.id can match many user_tags rows and the join returns one row per match.
SELECT DISTINCT tags.*
FROM tags
LEFT JOIN user_tags ON tags.id = user_tags.tag_id
WHERE user_tags.user_id = 2
OR tags.owner_id = 2
LIMIT 0,30
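As a sketch of why the join produces repeats, the example below uses SQLite through Python's sqlite3 module with made-up rows. It assumes the desired result is every tag the user owns (shared or not) plus every tag shared with the user, so it joins with LEFT JOIN. It also shows an EXISTS form, offered as one possible alternative, which never multiplies rows and so needs no DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tags (id integer primary key, name text, owner_id integer);
    create table user_tags (user_id integer, tag_id integer);
    insert into tags values
        (1, 'owned-unshared', 2),
        (2, 'owned-shared', 2),
        (3, 'shared-with-2', 5),
        (4, 'unrelated', 5);
    -- tag 2 is shared with users 7 and 8; tag 3 with users 2 and 9
    insert into user_tags values (7, 2), (8, 2), (2, 3), (9, 3);
""")

JOINED = """
    select {distinct} tags.id
    from tags
    left join user_tags on tags.id = user_tags.tag_id
    where user_tags.user_id = ? or tags.owner_id = ?
    order by tags.id
"""

EXISTS = """
    select tags.id
    from tags
    where tags.owner_id = ?
       or exists (select 1 from user_tags
                  where user_tags.tag_id = tags.id
                    and user_tags.user_id = ?)
    order by tags.id
"""

def ids(sql, user_id):
    """Run one of the queries above for a user and return the tag ids."""
    return [row[0] for row in conn.execute(sql, (user_id, user_id))]

# The join yields one row per matching user_tags row, so tag 2 (owned by
# user 2 and shared with two other users) appears twice without DISTINCT.
with_repeats = ids(JOINED.format(distinct=""), 2)
deduplicated = ids(JOINED.format(distinct="distinct"), 2)
via_exists = ids(EXISTS, 2)
```

Here with_repeats is [1, 2, 2, 3], while both deduplicated and via_exists are [1, 2, 3].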

Full text search involving 2 tables

I'm very new to full-text search, and I was told to do a full-text search over 2 tables and sort the results by relevance.
I will search table "Posts" and table "PostComments". I must look for the search term (let's say "AnyWord") in Posts.Description, Posts.Title and PostComments.Comments.
I have to return Posts ordered by relevance, but since I'm searching both Posts AND PostComments I don't know if this makes sense. I'd say I need all the information in the same table in order to sort by relevance.
Could you help me figure out whether this makes sense and, if it does, how to achieve it?
EDIT
I'll try to explain a little better what I need.
A Post is relevant to the search if the search term is present in the title, in the description or in any of the related PostComments.
But on the front end I will show a list of posts. The title of each post in this list is a link to the post itself. The post comments are visible there but not in the search result list, although they are involved in the search process.
So you could have posts in the search results that matched JUST because the search term is present in one or more comments.
Only ContainsTable returns an evaluation of relevance. You did not mention what needed to be returned, so I simply returned the name of the table in which the value is stored along with the given table's primary key (you would replace "PrimaryKey" with your actual primary key column name, e.g. PostId or PostCommentsId), the value and its rank.
Select Z.TableName, Z.PK, Z.Value, Z.Rank
From (
Select 'Posts' As TableName, Posts.PrimaryKey As PK, Posts.Description As Value, CT.Rank
From Posts
Join ContainsTable( Posts, Description, 'Anyword' ) As CT
-- [Key] is bracketed because KEY is a reserved word in T-SQL
On CT.[Key] = Posts.PrimaryKey
Union All
Select 'PostComments', PostComments.PrimaryKey, PostComments.Comments, CT.Rank
From PostComments
Join ContainsTable( PostComments, Comments, 'Anyword' ) As CT
On CT.[Key] = PostComments.PrimaryKey
) As Z
Order By Z.Rank Desc
EDIT Given the additional information, it is much clearer. First, it would appear that the ranking of the search has no bearing on the results. So, all that is necessary is to use an OR between the search on post information and the search on PostComments:
Select ...
From Posts
Where Contains( (Posts.Description, Posts.Title), 'searchterm' )
Or Exists (
Select 1
From PostComments
Where PostComments.PostId = Posts.Id
And Contains( PostComments.Comments, 'searchterm' )
)

how do i display all the tags related to all the feedbacks in one query

I am trying to write a sql query which fetches all the tags related to every topic being displayed on the page.
like this
TITLE: feedback1
POSTED BY: User1
CATEGORY: category1
TAGS: tag1, tag2, tag3
TITLE: feedback2
POSTED BY: User2
CATEGORY: category2
TAGS: tag2, tag5, tag7,tag8
TITLE: feedback3
POSTED BY: User3
CATEGORY: category3
TAGS: tag1, tag5, tag6, tag3
The relationship of tags to topics is many to many.
Right now I first fetch all the topics from the "topics" table and then loop over the returned topics array, running one query per topic to fetch its tags.
But this method is slow and inefficient.
Please help me write this sql query.
Query for fetching all the topics and its information is as follows:
SELECT
tbl_feedbacks.pk_feedbackid as feedbackId,
tbl_feedbacks.type as feedbackType,
DATE_FORMAT(tbl_feedbacks.createdon,'%M %D, %Y') as postedOn,
tbl_feedbacks.description as description,
tbl_feedbacks.upvotecount as upvotecount,
tbl_feedbacks.downvotecount as downvotecount,
(tbl_feedbacks.upvotecount)-(tbl_feedbacks.downvotecount) as totalvotecount,
tbl_feedbacks.viewcount as viewcount,
tbl_feedbacks.title as feedbackTitle,
tbl_users.email as userEmail,
tbl_users.name as postedBy,
tbl_categories.pk_categoryid as categoryId,
tbl_clients.pk_clientid as clientId
FROM
tbl_feedbacks
LEFT JOIN tbl_users
ON ( tbl_users.pk_userid = tbl_feedbacks.fk_tbl_users_userid )
LEFT JOIN tbl_categories
ON ( tbl_categories.pk_categoryid = tbl_feedbacks.fk_tbl_categories_categoryid )
LEFT JOIN tbl_clients
ON ( tbl_clients.pk_clientid = tbl_feedbacks.fk_tbl_clients_clientid )
WHERE
tbl_clients.pk_clientid = '1'
What is the best practice to follow when you need to display all the tags related to every topic shown on a single page?
How do I alter the above SQL query so that all the tags, plus the related information for each topic, are fetched using a single query?
What I am trying to achieve is similar to the 'questions' page of Stack Overflow,
where all the information (tags + information for every topic being displayed) is properly displayed.
Thanks
To do this, I would have three tables:
Topics
topic_id
[whatever else you need to know for a topic]
Tags
tag_id
[etc]
Map
topic_id
tag_id
select t.[whatever], tag.[whatever]
from topics t
join map m on t.topic_id = m.topic_id
join tags tag on tag.tag_id = m.tag_id
where [conditionals]
Set up partitions and/or indexes on the map table to maximize the speed of your query. For example, if you have many more topics than tags, partition the table on topics. Then, each time you grab all the tags for a topic, it will be 1 read from 1 area, no seeking needed. Make sure to have both topics and tags indexed on their _id.
Use your 'explain plan' tool. (I am not familiar with mysql, but I assume there is some tool that can tell you how a query will be run, so you can optimize it)
EDIT:
So you have the following tables:
tbl_feedbacks
tbl_users
tbl_categories
tbl_clients
tbl_tags
tbl_topics
tbl_topics_tags
The query you provide as a starting point shows how feedback, users, categories and clients relate to each other.
I assume that tbl_topics_tags contains FKs to tags and topics, showing which topic has which tag. Is this correct?
What of (feedbacks, users, categories, and clients) has a FK to topics or tags? Or, do either topics or tags have a FK to any of the initial 4?
Once I know this, I'll be able to show how to modify the query.
EDIT #2
There are two different ways to go about this:
The easy way is to just join on your FK. This will give you one row for each tag. It is much easier and more flexible to put together the SQL to do it this way. If you are using some other language to take the results of the query and translate them to present them to the user, this method is better. If nothing else, it will be far more obvious what is going on, and will be easier to debug and maintain.
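A minimal sketch of this first approach, assuming SQLite through Python's sqlite3 module: one joined query returns a row per (feedback, tag) pair, and application code folds those rows into one entry per feedback. The tag table and column names (tbl_tag, tbl_feedbacks_tags, fk_feedbackid, tag_id, tag) are guesses that were never confirmed by the asker, and feedbacks_with_tags is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Guessed schema: table and column names are assumptions.
    create table tbl_feedbacks (pk_feedbackid integer primary key, title text);
    create table tbl_tag (tag_id integer primary key, tag text);
    create table tbl_feedbacks_tags (fk_feedbackid integer, tag_id integer);
    insert into tbl_feedbacks values
        (1, 'feedback1'), (2, 'feedback2'), (3, 'feedback3');
    insert into tbl_tag values
        (1, 'tag1'), (2, 'tag2'), (3, 'tag3'), (5, 'tag5');
    insert into tbl_feedbacks_tags values
        (1, 1), (1, 2), (1, 3), (2, 2), (2, 5);
""")

def feedbacks_with_tags(conn):
    """Fetch every feedback and its tags with one query, grouped in Python."""
    rows = conn.execute("""
        select f.pk_feedbackid, f.title, tt.tag
        from tbl_feedbacks f
        left join tbl_feedbacks_tags tft on tft.fk_feedbackid = f.pk_feedbackid
        left join tbl_tag tt on tt.tag_id = tft.tag_id
        order by f.pk_feedbackid, tt.tag_id
    """)
    grouped = {}
    for feedback_id, title, tag in rows:
        entry = grouped.setdefault(feedback_id, {"title": title, "tags": []})
        if tag is not None:  # LEFT JOIN yields NULL for feedback with no tags
            entry["tags"].append(tag)
    return grouped
```

The LEFT JOINs keep feedback that has no tags at all; it shows up with an empty tag list instead of disappearing from the page.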
However, you may want each row of the query results to contain one feedback (and the tags that go with it).
SQL joining question <- this is a question I posted on how to do this. The answer I accepted is an oracle-only answer AFAIK, but there are other non-oracle answers.
Adapting Kevin's answer (which is supposed to work in SQL92 compliant systems):
select
[other stuff: same as in your post],
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 0 ) as tag1,
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 1 ) as tag2,
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 2 ) as tag3
from [same as in the OP]
This should do the trick.
Notes:
This will pull the first three tags. AFAIK, there isn't a way to have an arbitrary number of tags. You can expand the number of tags shown by copying and pasting more of those parts of the query. Make sure to increase the offset setting.
If this does not work, you'll probably have to write up another question, focusing on how to do the pivot in mysql. I've never used mysql, so I'm only guessing that this will work based on what others have told me.
One tip: you'll usually get more attention to your question if you strip away all the extra details. In the question I linked to above, I was really joining between 4 or 5 different tables, with many different fields. But I stripped it down to just the part I didn't know (how to get oracle to aggregate my results into one row). I know some stuff, but you can usually do far better than just one person if you trim your question down to the essentials.
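On the point about an arbitrary number of tags: one option, given here as a sketch and not as the method from the linked answer, is the GROUP_CONCAT aggregate, which both MySQL and SQLite provide. The example runs on SQLite through Python's sqlite3 module; the tag table and column names are the same unconfirmed guesses used in the query above, and tags_per_feedback is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Guessed schema: table and column names are assumptions.
    create table tbl_feedbacks (pk_feedbackid integer primary key, title text);
    create table tbl_tag (tag_id integer primary key, tag text);
    create table tbl_feedbacks_tags (fk_feedbackid integer, tag_id integer);
    insert into tbl_feedbacks values
        (1, 'feedback1'), (2, 'feedback2'), (3, 'feedback3');
    insert into tbl_tag values
        (1, 'tag1'), (2, 'tag2'), (3, 'tag3'), (5, 'tag5');
    insert into tbl_feedbacks_tags values
        (1, 1), (1, 2), (1, 3), (2, 2), (2, 5);
""")

def tags_per_feedback(conn):
    """Return one (id, title, tags) tuple per feedback, tags as a sorted list."""
    rows = conn.execute("""
        select f.pk_feedbackid, f.title, group_concat(tt.tag, ', ') as tags
        from tbl_feedbacks f
        left join tbl_feedbacks_tags tft on tft.fk_feedbackid = f.pk_feedbackid
        left join tbl_tag tt on tt.tag_id = tft.tag_id
        group by f.pk_feedbackid, f.title
        order by f.pk_feedbackid
    """)
    # group_concat returns NULL when a feedback has no tags. SQLite does not
    # promise an order inside the concatenated string, so sort after splitting.
    return [
        (feedback_id, title, sorted(tags.split(", ")) if tags else [])
        for feedback_id, title, tags in rows
    ]
```

In MySQL the equivalent call is written GROUP_CONCAT(tt.tag ORDER BY tt.tag_id SEPARATOR ', '), which also fixes the order of the tags inside the string.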