How do I display all the tags related to all the feedbacks in one query - SQL

I am trying to write a SQL query that fetches all the tags related to every topic displayed on the page, like this:
TITLE: feedback1
POSTED BY: User1
CATEGORY: category1
TAGS: tag1, tag2, tag3
TITLE: feedback2
POSTED BY: User2
CATEGORY: category2
TAGS: tag2, tag5, tag7, tag8
TITLE: feedback3
POSTED BY: User3
CATEGORY: category3
TAGS: tag1, tag5, tag6, tag3
The relationship of tags to topics is many-to-many.
Right now I first fetch all the topics from the "topics" table, then loop over the returned topics array and run one more query per topic to fetch its tags.
This is slow and inefficient, because it issues one extra query for every topic on the page.
Please help me write this SQL query.
The query for fetching all the topics and their information is as follows:
SELECT
tbl_feedbacks.pk_feedbackid as feedbackId,
tbl_feedbacks.type as feedbackType,
DATE_FORMAT(tbl_feedbacks.createdon,'%M %D, %Y') as postedOn,
tbl_feedbacks.description as description,
tbl_feedbacks.upvotecount as upvotecount,
tbl_feedbacks.downvotecount as downvotecount,
(tbl_feedbacks.upvotecount)-(tbl_feedbacks.downvotecount) as totalvotecount,
tbl_feedbacks.viewcount as viewcount,
tbl_feedbacks.title as feedbackTitle,
tbl_users.email as userEmail,
tbl_users.name as postedBy,
tbl_categories.pk_categoryid as categoryId,
tbl_clients.pk_clientid as clientId
FROM
tbl_feedbacks
LEFT JOIN tbl_users
ON ( tbl_users.pk_userid = tbl_feedbacks.fk_tbl_users_userid )
LEFT JOIN tbl_categories
ON ( tbl_categories.pk_categoryid = tbl_feedbacks.fk_tbl_categories_categoryid )
LEFT JOIN tbl_clients
ON ( tbl_clients.pk_clientid = tbl_feedbacks.fk_tbl_clients_clientid )
WHERE
tbl_clients.pk_clientid = '1'
What is the best practice when you need to display all the tags related to every topic shown on a single page?
How do I alter the above SQL query so that all the tags, plus the related information of the topics, are fetched using a single query?
What I am trying to achieve is similar to the 'questions' page of Stack Overflow.
There, all the information (tags plus the information of every topic being displayed) is properly displayed.
Thanks

To do this, I would have three tables:
Topics
  topic_id
  [whatever else you need to know for a topic]
Tags
  tag_id
  [etc]
Map
  topic_id
  tag_id
select t.[whatever], tag.[whatever]
from topics t
join map m on t.topic_id = m.topic_id
join tags tag on tag.tag_id = m.tag_id
where [conditionals]
Set up partitions and/or indexes on the map table to maximize the speed of your query. For example, if you have many more topics than tags, partition the table on topics. Then, each time you grab all the tags for a topic, it will be 1 read from 1 area, no seeking needed. Make sure to have both topics and tags indexed on their _id.
Use your 'explain plan' tool. (I am not familiar with MySQL, but I assume there is some tool that can tell you how a query will be run, so you can optimize it.)
EDIT:
So you have the following tables:
tbl_feedbacks
tbl_users
tbl_categories
tbl_clients
tbl_tags
tbl_topics
tbl_topics_tags
The query you provide as a starting point shows how feedback, users, categories and clients relate to each other.
I assume that tbl_topics_tags contains FKs to tags and topics, showing which topic has which tag. Is this correct?
What of (feedbacks, users, categories, and clients) has a FK to topics or tags? Or, do either topics or tags have a FK to any of the initial 4?
Once I know this, I'll be able to show how to modify the query.
EDIT #2
There are two different ways to go about this:
The easy way is to just join on your FK. This will give you one row for each tag. It is much easier and more flexible to put together the SQL this way. If you are using some other language to take the results of the query and present them to the user, this method is better. If nothing else, it will be far more obvious what is going on, and it will be easier to debug and maintain.
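As a rough illustration of the "easy way": the join returns one row per (feedback, tag) pair, and the application folds those rows into one entry per feedback. This is only a sketch in Python against an in-memory SQLite database; the table and column names (feedbacks, tags, feedbacks_tags) are made up for the example and are not taken from your schema.

```python
import sqlite3

# Hypothetical minimal schema: feedbacks, tags, and a mapping table between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feedbacks (feedback_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE feedbacks_tags (feedback_id INTEGER, tag_id INTEGER);
    INSERT INTO feedbacks VALUES (1, 'feedback1'), (2, 'feedback2'), (3, 'feedback3');
    INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
    INSERT INTO feedbacks_tags VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

# One query for the whole page: one row per (feedback, tag) pair.
# LEFT JOIN keeps feedbacks that have no tags (their tag column is NULL).
rows = conn.execute("""
    SELECT f.feedback_id, f.title, t.tag
    FROM feedbacks f
    LEFT JOIN feedbacks_tags ft ON ft.feedback_id = f.feedback_id
    LEFT JOIN tags t ON t.tag_id = ft.tag_id
    ORDER BY f.feedback_id, t.tag_id
""").fetchall()

# Fold the flat rows into one entry per feedback in application code.
tags_by_feedback = {}
for feedback_id, title, tag in rows:
    entry = tags_by_feedback.setdefault(feedback_id, {"title": title, "tags": []})
    if tag is not None:
        entry["tags"].append(tag)

for entry in tags_by_feedback.values():
    print("TITLE:", entry["title"], "| TAGS:", ", ".join(entry["tags"]))
```

The point is that the database is hit once per page rather than once per feedback; the grouping loop is cheap by comparison.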
However, you may want each row of the query results to contain one feedback (and the tags that go with it).
I posted a question, "SQL joining question", on how to do this. The answer I accepted is Oracle-only AFAIK, but there are other non-Oracle answers.
Adapting Kevin's answer (which is supposed to work in SQL92 compliant systems):
select
[other stuff: same as in your post],
(select tt.tag
from tbl_tags tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 0 ) as tag1,
(select tt.tag
from tbl_tags tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 1 ) as tag2,
(select tt.tag
from tbl_tags tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 2 ) as tag3
from [same as in the OP]
This should do the trick.
Notes:
This will pull the first three tags. AFAIK, there isn't a way to have an arbitrary number of tags with this approach. You can expand the number of tags shown by copying and pasting more of those parts of the query. Make sure to increase the offset setting.
If this does not work, you'll probably have to write up another question, focusing on how to do the pivot in MySQL. I've never used MySQL, so I'm only guessing that this will work based on what others have told me.
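If the database has a string aggregate, an arbitrary number of tags can be collapsed into a single column instead of using one subquery per tag. MySQL has GROUP_CONCAT and SQLite has group_concat. The sketch below uses SQLite through Python only so that it runs standalone; the table and column names are hypothetical, not the ones from the question.

```python
import sqlite3

# Hypothetical schema, same shape as a feedbacks/tags/mapping design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feedbacks (feedback_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE feedbacks_tags (feedback_id INTEGER, tag_id INTEGER);
    INSERT INTO feedbacks VALUES (1, 'feedback1'), (2, 'feedback2'), (3, 'feedback3');
    INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
    INSERT INTO feedbacks_tags VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

# One row per feedback; all of its tags are aggregated into one string.
# A feedback without tags gets NULL in the tags column.
rows = conn.execute("""
    SELECT f.title, group_concat(t.tag, ', ') AS tags
    FROM feedbacks f
    LEFT JOIN feedbacks_tags ft ON ft.feedback_id = f.feedback_id
    LEFT JOIN tags t ON t.tag_id = ft.tag_id
    GROUP BY f.feedback_id, f.title
    ORDER BY f.feedback_id
""").fetchall()

# SQLite does not guarantee the order inside group_concat, so sort client-side.
tags_per_title = {title: sorted(tags.split(", ")) if tags else []
                  for title, tags in rows}
print(tags_per_title)
```

In MySQL the equivalent expression would be GROUP_CONCAT(t.tag ORDER BY t.tag_id SEPARATOR ', '). Be aware that MySQL truncates the result at group_concat_max_len, so very long tag lists may need that setting raised.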
One tip: you'll usually get more attention to your question if you strip away all the extra details. In the question I linked to above, I was really joining between 4 or 5 different tables, with many different fields. But I stripped it down to just the part I didn't know (how to get oracle to aggregate my results into one row). I know some stuff, but you can usually do far better than just one person if you trim your question down to the essentials.

Related

Performance in SQL sentence containing ORDER BY, LIMIT and COUNT

I've been looking for a way to improve this dangerous combination of clauses in one SQL statement.
To give you some context, I have a table with information about articles (article_id, author, ...) and another one that pairs an article_id with a tag_id. As an article can have several tags, that second table can have several rows with the same article_id and different tag_id values.
In order to get a list of the 8 articles that have the most tags in common with the one I want (in this case, article 1354), I have written the following query:
SELECT articles.article_id, articles.author, count(articles_tags.article_id) as times
FROM articles
INNER JOIN articles_tags ON (articles.article_id=articles_tags.article_id)
WHERE id_tag IN
(SELECT article_id FROM articles_tags WHERE article_id=1354)
AND article_id <> 1354
GROUP BY article_id
ORDER BY times DESC
LIMIT 8
It is EXTREMELY slow... about 90 seconds for half a million articles.
If I delete the "ORDER BY times" clause, it runs almost instantly, but then I won't get the most similar articles.
What can i do?
Thanks!!
A query on a sub-select is often a time-killer. Also, as the query didn't really appear to be accurate, or was missing something, I am assuming that your articles_tags table has two columns: one for the article ID, and another for the tag ID associated with it.
That said, I would pre-query just the tag IDs for article 1354 (the one you are interested in). Join that result back to the article tags on matching tag IDs. From that, you take the SECOND article tags alias and get ITS article ID, and then the count of rows that MATCH (via a join, not a left join). Apply the group by on the article ID as you had, and for grins, join to the articles table to get the author.
Now, note: some SQL engines require you to group by all non-aggregate fields, so you MAY have to either add the author to the group by (it will always be the same per article ID anyway), or change it to MAX( A.author ) as Author, which would give the same results.
I would have an index on (tag_id, article_id) so that rows for the "common" tags are found quickly. You could have one article with 10 tags and another article with 10 completely different tags, resulting in 0 in common. The inner join will prevent that other article from even appearing in the result set.
You STILL have the time associated with blowing through half a million articles as you described, which could be millions of actual tag entries.
select
AT2.article_id,
A.Author,
count(*) as Times
from
( select ATG.id_tag
from articles_tags ATG
where ATG.Article_ID = 1354
order by ATG.id_tag ) CommonTags
JOIN articles_tags AT2
on CommonTags.ID_Tag = AT2.ID_Tag
AND AT2.Article_ID <> 1354
JOIN articles A
on AT2.Article_ID = A.Article_ID
group by
AT2.article_id
order by
Times DESC
limit 8
It seems that it should be possible to do this without any subqueries, and then a quicker query may result.
Here the article of interest is joined to its tags, and then further to other articles having these tags. Then the number of tags for each article is counted and ordered:
SELECT a2.article_id, a2.author, COUNT(t2.tag_id) AS times
FROM articles a1
INNER JOIN articles_tags t1
ON t1.article_id = a1.article_id -- find tags for starting article
INNER JOIN articles_tags t2
ON t2.tag_id = t1.tag_id -- find other instances of those tags
AND t2.article_id <> t1.article_id
INNER JOIN articles a2
ON a2.article_id = t2.article_id -- and the articles where they are used
WHERE a1.article_id = 1354
GROUP BY a2.article_id, a2.author -- count common tags by articles
ORDER BY times DESC
LIMIT 8
If you know a lower bound on the number of tags in common (e.g. 3), inserting HAVING times > 2 before ORDER BY times DESC could give a further speed improvement.
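To check the self-join idea on a small data set, here is a sketch that runs it against an in-memory SQLite database from Python. The sample rows, the index name and the helper most_similar are invented for the illustration; the column names follow the articles_tags(article_id, tag_id) layout assumed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE articles_tags (article_id INTEGER, tag_id INTEGER);
    -- Index led by tag_id so rows sharing a tag are found without a full scan.
    CREATE INDEX idx_tag_article ON articles_tags (tag_id, article_id);
    INSERT INTO articles VALUES (1354, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'dee');
    INSERT INTO articles_tags VALUES
        (1354, 10), (1354, 11), (1354, 12),
        (2, 10), (2, 11), (2, 12),
        (3, 10), (3, 99),
        (4, 98);
""")

def most_similar(conn, article_id, limit=8):
    """Return (article_id, author, common_tag_count) for the closest articles."""
    return conn.execute("""
        SELECT a2.article_id, a2.author, COUNT(*) AS times
        FROM articles_tags t1
        JOIN articles_tags t2
          ON t2.tag_id = t1.tag_id
         AND t2.article_id <> t1.article_id
        JOIN articles a2 ON a2.article_id = t2.article_id
        WHERE t1.article_id = ?
        GROUP BY a2.article_id, a2.author
        ORDER BY times DESC, a2.article_id
        LIMIT ?
    """, (article_id, limit)).fetchall()

similar = most_similar(conn, 1354)
print(similar)
```

Article 4 shares no tag with 1354, so the inner join never produces a row for it. The extra a2.article_id in the ORDER BY is only there to make ties deterministic.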

How can I group objects retrieved from database tables that have the same properties?

I am working on an application that allows users to build a "book" from a number of "pages" and then place them in any order that they'd like. It's possible that multiple people can build the same book (the same pages in the same order). The books are built by the user prior to them being processed and printed, so I need to group books together that have the same exact layout (the same pages in the same order). I've written a million queries in my life, but for some reason I can't grasp how to do this.
I could simply write a big SELECT query, and then loop through the results and build arrays of objects that have the same pages in the same sequence, but I'm trying to figure out how to do this with one query.
Here is my data layout:
dbo.Books
  BookId
  Quantity
dbo.BookPages
  BookId
  PageId
  Sequence
dbo.Pages
  PageId
  DocName
So, I need some clarification on a few things:
Once a user orders the pages the way they want, are they saved back to a database?
If yes, then is the question how to run a query that groups book orders with the same page ordering, so that they are sent to the printers in an optimal way?
Or does the user lay out the pages, then send the order directly to the printer? If so, it seems more complicated and less efficient to capture requested print jobs and order them on the fly on the way out to the printers.
What language/technology are you using to create this solution? .NET? Java?
With the answers to these questions, I can better gauge what you need.
Until I have those answers, I am assuming that:
You are using some type of many-to-many table to store customer page ordering. If so, then you'll need to write a query to select distinct page orderings and group by them. This is possible with a single SQL query.
However, if you want more control over how this data is joined, then doing this programmatically may be the way to go, although you will lose performance by reading in all the data and then reshaping it into a form your printers can consume.
Two books are identical only if each book's page count equals the number of matching (sequence, page) rows between them.
It was tagged TSQL when I started. The syntax may not be the same in other SQL dialects.
;WITH BookPageCount
AS
(
-- number of pages in each book
select b1.BookId, COUNT(*) as [individualCount]
from dbo.BookPages b1 with (nolock)
group by b1.BookId
),
BookCombinedCount
AS
(
-- for each pair of books, number of positions holding the same page
select b1.BookId as [book1ID], b2.BookId as [book2ID], COUNT(*) as [combinedCount]
from dbo.BookPages b1 with (nolock)
join dbo.BookPages b2 with (nolock)
on b1.BookId < b2.BookId
and b1.Sequence = b2.Sequence
and b1.PageId = b2.PageId
group by b1.BookId, b2.BookId
)
select BookCombinedCount.book1ID, BookCombinedCount.book2ID
from BookCombinedCount
join BookPageCount as book1 on book1.BookId = BookCombinedCount.book1ID
join BookPageCount as book2 on book2.BookId = BookCombinedCount.book2ID
where BookCombinedCount.combinedCount = book1.individualCount
and BookCombinedCount.combinedCount = book2.individualCount
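The same page-count-equals-match-count idea can be tried out on a tiny data set. The sketch below ports it to SQLite (run from Python) purely so it can be executed without SQL Server; the sample rows are invented, and it assumes each (BookId, Sequence) pair appears only once in BookPages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BookPages (BookId INTEGER, PageId INTEGER, Sequence INTEGER);
    INSERT INTO BookPages VALUES
        (1, 100, 1), (1, 200, 2),               -- book 1: pages 100, 200
        (2, 100, 1), (2, 200, 2),               -- book 2: same layout as book 1
        (3, 200, 1), (3, 100, 2),               -- book 3: same pages, other order
        (4, 100, 1), (4, 200, 2), (4, 300, 3);  -- book 4: one extra page
""")

identical_pairs = conn.execute("""
    WITH BookPageCount AS (
        SELECT BookId, COUNT(*) AS individualCount
        FROM BookPages
        GROUP BY BookId
    ),
    BookCombinedCount AS (
        SELECT b1.BookId AS book1Id, b2.BookId AS book2Id,
               COUNT(*) AS combinedCount
        FROM BookPages b1
        JOIN BookPages b2
          ON b1.BookId < b2.BookId
         AND b1.Sequence = b2.Sequence
         AND b1.PageId = b2.PageId
        GROUP BY b1.BookId, b2.BookId
    )
    SELECT c.book1Id, c.book2Id
    FROM BookCombinedCount c
    JOIN BookPageCount p1 ON p1.BookId = c.book1Id
    JOIN BookPageCount p2 ON p2.BookId = c.book2Id
    WHERE c.combinedCount = p1.individualCount
      AND c.combinedCount = p2.individualCount
    ORDER BY c.book1Id, c.book2Id
""").fetchall()

print(identical_pairs)
```

Book 3 is rejected because no position matches, and book 4 because its own page count (3) is larger than the number of matching positions (2). Note that the result is a list of pairs, so three identical books would show up as three pairs.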

How to optimize count of new posts in favorites

First of all, I will be glad for any reply. I will try to structure my text for better understanding.
Situation
I run a thematic internet forum where you can add topics to your favorites in the menu, and the menu shows the number of new posts in these topics. So every time you reload the page (or go anywhere on the site), new posts for all topics in your favorites are checked.
Problem
This is of course quite expensive on the DB, because it is common to have 20-50 favorites and I have to check the DB to see whether any post was added in any of these topics. The average topic has 1000-2000 posts. And this happens for every pageview for every user, which is approximately 900 000 pageviews per month.
Possible solution 1
I store the total number of posts in every topic, and I store the number of last viewed posts for every topic, for every user. This may be the fastest, but it has a lot of functional disadvantages (deleting, filtering of posts, etc.).
Possible solution 2
I store the id of the last viewed post for every topic, for every user. This is a very good solution, but about ten times slower than the previous one.
Database
I store all posts for all topics in one huge table = hundreds of thousands of posts.
Question
I would like to remove the problems that solution 1 brings, but I need to keep the speed. I thought of creating a table for each topic and using solution 2, but I don't know if it will help. So if you have any experience with this, please tell me what would be the fastest solution.
Thank you very much.
Firstly: I have no idea about your schema or database system, but this should be relatively simple, assuming you keep a record of when your user was last seen ($DATE_USER_WAS_LAST_SEEN in the example below), each of your posts is associated with its topic by some kind of id, and you have a list of all the $FAVOURITES ids.
SELECT topic_id, count(*) AS count FROM posts
WHERE topic_id IN ($FAVOURITES)
AND created_date > $DATE_USER_WAS_LAST_SEEN
GROUP BY topic_id
will give you an output like:
topic_id | count
---------------------
3 | 20
1 | 27
33 | 120
This should be an acceptable speed for this kind of scale. You could improve the query by replacing IN with a long (topic_id = 1 OR topic_id = 2 OR topic_id = etc) string if your database doesn't automatically optimise these things.
Secondly: Don't worry so much about keeping these values bang up to date. People will use them as an indicator that there are new messages, not base life decisions on them. So cache these requests per user (either on the user's own record or in some kind of in-memory cache like memcache, if you are familiar with those) and expire the cache every 5 minutes or so. This will radically reduce your hits to the database.
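To make the caching suggestion concrete, here is a minimal in-process sketch in Python. It is not memcache; it is just a dictionary with an expiry time per key, and TTLCache, get_or_compute and the fake counting function are names invented for this example. In a real deployment the compute callback would run the COUNT query above.

```python
import time

class TTLCache:
    """Tiny per-key cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]           # still fresh: no database hit
        value = compute()           # missing or expired: recompute and store
        self._entries[key] = (now, value)
        return value

# Stand-in for the expensive "count new posts in favourites" query.
calls = []
def count_new_posts_for_user_42():
    calls.append(1)
    return {3: 20, 1: 27, 33: 120}

fake_now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])

first = cache.get_or_compute(42, count_new_posts_for_user_42)
second = cache.get_or_compute(42, count_new_posts_for_user_42)  # served from cache
fake_now[0] = 301.0                                             # 5 minutes later
third = cache.get_or_compute(42, count_new_posts_for_user_42)   # recomputed
print(len(calls))
```

With a 5 minute expiry, a user who loads 30 pages in that window causes one query instead of 30. The trade-off is that the counter can be up to 5 minutes stale.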
I assume your post ids are sequential and always incrementing.
Create a table for your favorites with at least these fields: user_id, topic_id, last_post_id
You can then check for new posts with this simple query:
select topics.id, count(posts.id)
from users
inner join favorites on favorites.user_id = users.id
inner join topics on topics.id = favorites.topic_id
inner join posts on
posts.topic_id = topics.id and
posts.id > last_post_id
where users.id = $id
group by topics.id
This should run pretty smoothly.
You must also update last_post_id each time a user visits a topic, but this should be pretty straightforward.
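Here is a small runnable sketch of the last_post_id bookkeeping, using SQLite from Python. The table layout follows the fields listed above, but the sample data, the index name and the helpers new_post_counts and mark_topic_read are invented. Unlike the query above it uses a LEFT JOIN, which is my own variation so that favorites with no new posts still come back with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER);
    CREATE TABLE favorites (
        user_id INTEGER, topic_id INTEGER, last_post_id INTEGER,
        PRIMARY KEY (user_id, topic_id)
    );
    -- Lets the database jump straight to "posts in this topic with id > N".
    CREATE INDEX idx_posts_topic ON posts (topic_id, id);
    INSERT INTO posts (id, topic_id) VALUES (1, 10), (2, 10), (3, 20), (4, 10), (5, 20);
    INSERT INTO favorites VALUES (7, 10, 2), (7, 20, 5);
""")

def new_post_counts(conn, user_id):
    """Map topic_id -> number of posts newer than the user's last seen post."""
    rows = conn.execute("""
        SELECT f.topic_id, COUNT(p.id)
        FROM favorites f
        LEFT JOIN posts p
          ON p.topic_id = f.topic_id
         AND p.id > f.last_post_id
        WHERE f.user_id = ?
        GROUP BY f.topic_id
    """, (user_id,)).fetchall()
    return dict(rows)

def mark_topic_read(conn, user_id, topic_id):
    """Call when the user opens a topic: remember its newest post id."""
    conn.execute("""
        UPDATE favorites
        SET last_post_id = (SELECT COALESCE(MAX(id), 0)
                            FROM posts WHERE topic_id = ?)
        WHERE user_id = ? AND topic_id = ?
    """, (topic_id, user_id, topic_id))

before = new_post_counts(conn, 7)
mark_topic_read(conn, 7, 10)
after = new_post_counts(conn, 7)
print(before, after)
```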
If you have an index on (topic_id, post_id) on the huge all_posts table, this query shouldn't be too costly:
select a.topic_id, count(*)
from all_posts a
inner join user_favorites u on u.topic_id = a.topic_id
where a.post_id > u.post_id and u.user_id = #user_id
group by a.topic_id

Full text search involving 2 tables

I'm very new to full-text search, and I was told to do a full-text search over 2 tables and sort the results by relevance.
I will search the table "Posts" and the table "PostComments". I must look for the search term (let's say "AnyWord") in Posts.Description, Posts.Title and PostComments.Comments.
I have to return Posts ordered by relevance, but since I'm searching both Posts AND PostComments, I don't know if this makes sense. I'd say I need all the information in the same table in order to sort by relevance.
Could you help me figure out whether this makes sense and, if it does, how to achieve it?
EDIT
I'll try to explain a little better what I need.
A Post is relevant for the search if the search term is present in the title, in the description or in any of the related PostComments.
But on the front end I will show a list of posts. The title of each post in this list is a link to the post itself. The post comments are visible there but not in the search result list, although they are involved in the search process.
So you could have posts in the search results that matched JUST because the search term is present in one or more comments.
Only ContainsTable returns an evaluation of relevance. You did not mention what needed to be returned so I simply returned the name of the table from where the value is stored along with the given table's primary key (you would replace "PrimaryKey" with your actual primary key column name e.g. PostId or PostCommentsId), the value and its rank.
Select Z.TableName, Z.PK, Z.Value, Z.Rank
From (
Select 'Posts' As TableName, Posts.PrimaryKey As PK, Posts.Description As Value, CT.Rank
From Posts
Join ContainsTable( Posts, Description, 'Anyword' ) As CT
On CT.[Key] = Posts.PrimaryKey
Union All
Select 'PostComments', PostComments.PrimaryKey, PostComments.Comments, CT.Rank
From PostComments
Join ContainsTable( PostComments, Comments, 'Anyword' ) As CT
On CT.[Key] = PostComments.PrimaryKey
) As Z
Order By Z.Rank Desc
EDIT Given the additional information, it is much clearer. First, it would appear that the ranking of the search has no bearing on the results. So, all that is necessary is to use an OR between the search on post information and the search on PostComments:
Select ...
From Posts
Where Contains( (Posts.Description, Posts.Title), 'searchterm' )
Or Exists (
Select 1
From PostComments
Where PostComments.PostId = Posts.Id
And Contains( PostComments.Comments, 'searchterm' )
)

finding the posts related by tags with one specific post in mysql

I have these tables: Posts (post_id, title and text), Tags (tag_id, tag), Post_tag_nn (id, tag_id, post_id). For a specific post that has, for example, 4 tags, I want all the posts with all four of those tags, then all the posts with any three of those tags, then all posts with any two of those tags, and so on. How can I build a SQL query for this purpose? (In PHP it seems like a backtracking problem: all the subsets of a given set.)
Have a query to find the tags of the current post, something like
SELECT tag_id
FROM Post_tag_nn
WHERE post_id = $post_id;
Then using those tag id's, this query should return you the id's of posts with 4,3,2,... matching tags:
SELECT post_id, COUNT(post_id) AS tag_count
FROM Post_tag_nn
WHERE tag_id IN ($array_of_tag_ids)
GROUP BY post_id
ORDER BY tag_count DESC;
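The two steps above can also be folded into one statement by joining Post_tag_nn to itself, and the "4 tags, then 3, then 2..." buckets can be built from the sorted result. This is only a sketch, run against SQLite from Python with invented sample rows; related_posts is a hypothetical helper name, and the self-join is my variation on the two-query version above.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Post_tag_nn (id INTEGER PRIMARY KEY, tag_id INTEGER, post_id INTEGER);
    INSERT INTO Post_tag_nn (tag_id, post_id) VALUES
        (1, 100), (2, 100), (3, 100), (4, 100),   -- the post we start from
        (1, 101), (2, 101), (3, 101), (4, 101),   -- shares 4 tags
        (1, 102), (2, 102), (3, 102),             -- shares 3 tags
        (4, 103), (9, 103),                       -- shares 1 tag
        (9, 104);                                 -- shares none
""")

def related_posts(conn, post_id):
    """Return {matching_tag_count: [post_id, ...]}, highest counts first."""
    rows = conn.execute("""
        SELECT other.post_id, COUNT(*) AS tag_count
        FROM Post_tag_nn AS mine
        JOIN Post_tag_nn AS other
          ON other.tag_id = mine.tag_id
         AND other.post_id <> mine.post_id
        WHERE mine.post_id = ?
        GROUP BY other.post_id
        ORDER BY tag_count DESC, other.post_id
    """, (post_id,)).fetchall()
    # Rows arrive sorted by tag_count, so groupby yields one bucket per count.
    return {count: [pid for pid, _ in group]
            for count, group in groupby(rows, key=lambda row: row[1])}

buckets = related_posts(conn, 100)
print(buckets)
```

No subset enumeration is needed: counting shared tags per post already tells you which bucket each post falls into.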
If you're going to be pulling down every post with even a single one of the tags regardless, you might be best off just running a single query per tag to pull all of the posts with that tag, and then generating the sets yourself.
Something like:
select t.id, t.tag_id, p.post_id, p.title, p.text
from post_tag_nn as t, posts p
where p.post_id = t.post_id
order by t.id
And then do the group in your code. You could of course do two different queries, one where you figure out the order and count of your tags and then one where you pull back the post for each tag.