How to combine data from 2 tables under certain conditions? - sql

I have 2 tables. One table contains posts and the other contains votes for the posts. Each member can vote (+ or -) for each post.
(Structure example:)
Posts table: pid, belongs, userp, text.
Votes table: vid, userv, postid, vote.
There is also a table that contains the users' information.
What I want: suppose I am a logged-in member. I want to show all the posts, but not let me vote again on the ones I've already voted on (and show me whether I voted + or -).
What I have done so far is very bad, as it runs a lot of queries:
SELECT `posts`.*, `users`.`username`
FROM `posts`,`users`
WHERE `posts`.belongs=$taken_from_url AND `users`.`usernumber`=`posts`.`userp`
ORDER BY `posts`.`pid` DESC;
and then:
foreach ($query as $result) {if (logged_in) {select vote from votes....etc} }
So if I am logged in and the page shows 30 posts, it runs 30 extra queries to check whether, and how, I voted on each post. My question is: can I do this more efficiently with a JOIN (I guess), and how? (I already tried something, but didn't succeed.)

Firstly, if you're going to have significantly different output for logged-in users versus those who aren't, just use two queries rather than trying to create something really complicated.
Secondly, this should do roughly what you want:
SELECT p.*, u.username,
(SELECT SUM(vote) FROM votes WHERE postid = p.pid) total_votes,
(SELECT vote FROM votes WHERE postid = p.pid AND userv = $logged_in_user_id) my_vote
FROM posts p
JOIN users u ON p.userp = u.usernumber
WHERE p.belongs = $taken_from_url
ORDER BY p.pid DESC
Note: You don't say what values the vote column holds. I'm assuming it's either +1 (up) or -1 (down), so you can easily find the total votes by adding them up. If you're not doing it this way, I suggest you do, to make your life easier.
The first correlated subquery can be eliminated by doing a JOIN and GROUP BY but I tend to find the above form much more readable.
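For comparison, here is a minimal sketch of the JOIN + GROUP BY form, run against an in-memory SQLite copy of the question's tables. The sample rows and the +1/-1 vote values are assumptions. This version folds both subqueries into one aggregation, and it assumes a user votes at most once per post (otherwise my_vote would be a sum).

```python
import sqlite3

# Hypothetical in-memory copy of the question's tables (column names from
# the question; sample rows are made up, votes are assumed to be +1 / -1).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (usernumber INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE posts (pid INTEGER PRIMARY KEY, belongs INTEGER, userp INTEGER, text TEXT);
CREATE TABLE votes (vid INTEGER PRIMARY KEY, userv INTEGER, postid INTEGER, vote INTEGER);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO posts VALUES (10, 7, 1, 'first'), (11, 7, 2, 'second'), (12, 7, 1, 'third');
INSERT INTO votes VALUES (1, 1, 10, 1), (2, 2, 10, 1), (3, 2, 11, -1);
""")

GROUPED_SQL = """
SELECT p.pid, u.username,
       COALESCE(SUM(v.vote), 0) AS total_votes,
       -- only the logged-in user's vote row contributes; NULL means no vote
       SUM(CASE WHEN v.userv = :uid THEN v.vote END) AS my_vote
FROM posts p
JOIN users u ON p.userp = u.usernumber
LEFT JOIN votes v ON v.postid = p.pid   -- keeps posts that have no votes
WHERE p.belongs = :thread
GROUP BY p.pid, u.username
ORDER BY p.pid DESC
"""

def posts_with_votes(conn, thread_id, user_id):
    # Bound parameters instead of interpolating $taken_from_url into the SQL.
    return conn.execute(GROUPED_SQL, {"uid": user_id, "thread": thread_id}).fetchall()

print(posts_with_votes(conn, 7, 1))
# [(12, 'alice', 0, None), (11, 'bob', -1, None), (10, 'alice', 2, 1)]
```

The readability trade-off the answer mentions is real: the correlated subqueries state the intent directly, while this form needs the CASE trick to pull out one user's vote.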
So what this does is join users to posts, much like you were doing, except that it uses JOIN syntax (which again comes down to readability). Then it has two subqueries: the first finds the total votes for that particular post and the second finds out what the logged-in user's vote was:
+1: up vote;
-1: down vote;
NULL: no vote.
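Below is a small sketch of how the page could map my_vote onto what it displays. It runs the answer's query in SQLite, with named parameters in place of the PHP variables. The table contents and the "can vote" / "voted +" labels are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample data: two users, two posts in thread 7, votes of +1 / -1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (usernumber INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE posts (pid INTEGER PRIMARY KEY, belongs INTEGER, userp INTEGER, text TEXT);
CREATE TABLE votes (vid INTEGER PRIMARY KEY, userv INTEGER, postid INTEGER, vote INTEGER);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO posts VALUES (10, 7, 1, 'first'), (11, 7, 2, 'second');
INSERT INTO votes VALUES (1, 1, 10, 1), (2, 2, 10, 1), (3, 2, 11, -1);
""")

# The answer's query; :uid and :thread replace $logged_in_user_id / $taken_from_url.
ANSWER_SQL = """
SELECT p.pid, u.username,
       (SELECT SUM(vote) FROM votes WHERE postid = p.pid) AS total_votes,
       (SELECT vote FROM votes WHERE postid = p.pid AND userv = :uid) AS my_vote
FROM posts p
JOIN users u ON p.userp = u.usernumber
WHERE p.belongs = :thread
ORDER BY p.pid DESC
"""

def vote_states(conn, thread_id, user_id):
    """Return (pid, label) pairs describing what to show for each post."""
    states = []
    for pid, _username, _total, my_vote in conn.execute(
            ANSWER_SQL, {"uid": user_id, "thread": thread_id}):
        if my_vote is None:
            states.append((pid, "can vote"))   # NULL: no vote yet
        elif my_vote > 0:
            states.append((pid, "voted +"))    # +1: up vote
        else:
            states.append((pid, "voted -"))    # -1: down vote
    return states

print(vote_states(conn, 7, 1))     # logged in as alice
print(vote_states(conn, 7, None))  # not logged in: userv = NULL never matches
```

Passing None for an anonymous visitor makes the second subquery always return NULL. That is one way to reuse the same query for both cases, although the answer's advice to keep two queries still applies when the output differs a lot.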

Related

How to get post with maximum likes or post with likes counts in rails

I have two models, Post and Like, with a relationship between them: Post has_many likes. I want an optimal way to find which post has the most likes. One way of doing this is:
count = {}
Post.includes(:likes).each do |post|
  # size uses the preloaded likes; count would run a COUNT query per post
  count[post.id] = post.likes.size
end
Initially I used an array, which is not a good data structure for this, so I switched to a hash, but I am still not satisfied with this approach. What would be the best way to get posts with their like counts?
I have also tried the following query, but it is not working as expected. Could anyone suggest a better, more optimal approach?
Post.joins("LEFT OUTER JOIN Likes ON likes.post_id =posts.id").group("posts.id").order("COUNT(likes.id) DESC")
Use counter_cache (belongs_to :post, counter_cache: true on Like, plus an integer likes_count column on posts) so that you always have a count of likes on the Post objects. Then you can call Post.order(likes_count: :desc).first to retrieve the post that has the most likes (Post.maximum(:likes_count) only returns the highest count, not the post). Likewise, any Post query will include a post's like count.
You don't need a join. Group likes by post_id and count them; the post_id with the highest count is the id of your most-liked post. Then you can join or just select the post you're looking for. In pure SQL it would look like:
SELECT l.post_id, count(*) as cnt
FROM likes l
GROUP BY l.post_id
ORDER BY cnt DESC
LIMIT 1;
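Here is a minimal sketch of that idea in SQLite, including the final step of joining the winning post_id back to posts. The posts(id, title) and likes(id, post_id) tables and their rows are assumptions for illustration. With ties, which top post you get is arbitrary unless you add a tie-breaker to the ORDER BY.

```python
import sqlite3

# Hypothetical tables mirroring the Rails models: posts(id, title), likes(id, post_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE likes (id INTEGER PRIMARY KEY, post_id INTEGER);
INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO likes (post_id) VALUES (1), (2), (2), (3), (2);
""")

def most_liked(conn):
    # Aggregate likes per post, keep only the top one, then join back to posts.
    return conn.execute("""
        SELECT p.id, p.title, t.cnt
        FROM (SELECT post_id, COUNT(*) AS cnt
              FROM likes
              GROUP BY post_id
              ORDER BY cnt DESC
              LIMIT 1) AS t
        JOIN posts p ON p.id = t.post_id
    """).fetchone()

print(most_liked(conn))  # (2, 'b', 3)
```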

How to do this in one select query?

I need to display a list of posts. For each post, I need to also show:
How many people "like" the post.
Three names of those who "like" the post (preferably friends of viewing user).
If the viewing user "likes" the post, I'd like for him/her to be one of the three.
I don't know how to do it without querying for each item in a for loop, which is proving to be very slow. Sure, caching/denormalization will help, but I'd like to know if this can be done otherwise. How does Facebook do it?
Assuming this basic db structure, any suggestions?
users
-----
id
username
posts
---------
id
user_id
content
friendships
-----------
user_id
friend_id
is_confirmed (bool)
users_liked_posts
-----------------
user_id
post_id
As a side note, if anyone knows how to do this in SQLAlchemy, that would be very much appreciated.
EDIT: SQLFiddle http://sqlfiddle.com/#!2/9e703
You can try this in your sqlfiddle. The condition "WHERE user_id = 2" needs 2 replaced by your current user id.
SELECT numbered.*
FROM
(SELECT ranked.*,
IF (post_id=@prev_post,
@n := @n + 1,
@n := 1 AND @prev_post := post_id) as position
FROM
(SELECT users_liked_posts.post_id,
users_liked_posts.user_id,
visitor.user_id as u1,
friendships.user_id as u2,
IF (visitor.user_id is not null, 1, IF(friendships.user_id is not null, 2, 3)) as rank
FROM users_liked_posts
INNER JOIN posts
ON posts.id = users_liked_posts.post_id
LEFT JOIN friendships
ON users_liked_posts.user_id = friendships.user_id
AND friendships.friend_id = posts.user_id
LEFT JOIN (SELECT post_id, user_id FROM users_liked_posts WHERE user_id = 2) visitor
ON users_liked_posts.post_id = visitor.post_id
AND users_liked_posts.user_id = visitor.user_id
ORDER BY users_liked_posts.post_id, rank) as ranked
JOIN
(SELECT @n := 0, @prev_post := 0) as setup) as numbered
WHERE numbered.position < 4
You can easily join the subquery "numbered" with the table "users" to obtain additional user information. The extra fields u1 and u2 are there to help you see what is happening; you can remove them.
General idea of the query:
1) Left join users_liked_posts two times: first with a copy of itself restricted to the current visitor (the derived table visitor), and then with friendships, which is restricted to friends.
2) the column rank, IF (visitor.user_id is not null, 1, IF(friendships.user_id is not null, 2, 3)), assigns a rank to each user in users_liked_posts. This query is sorted by post and by rank.
3) use the previous as a subquery to create the same data but with a running position for the users, per post.
4) use the previous as a subquery to extract the top 3 positions per post.
No, these steps cannot be merged, in particular because MySQL does not allow a computed column to be referenced by its alias in the WHERE clause.
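On databases with window functions (MySQL 8.0+, PostgreSQL, SQLite 3.25+), the running-position trick with user variables can be replaced by ROW_NUMBER(). The sketch below is a swapped-in technique, not the answer's query. It runs in SQLite and ranks the viewer first, then the viewer's confirmed friends, then everyone else, and it also returns the total like count per post. Assumptions: a friendship row (user_id = liker, friend_id = viewer) marks the liker as the viewer's friend, and the sample data is made up. The rank column is renamed like_rank because RANK is reserved in newer MySQL versions.

```python
import sqlite3

# Hypothetical data using the question's schema; viewer is user 2,
# and carl (3) is a confirmed friend of the viewer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, content TEXT);
CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER, is_confirmed INTEGER);
CREATE TABLE users_liked_posts (user_id INTEGER, post_id INTEGER);
INSERT INTO users VALUES (1,'ann'), (2,'viewer'), (3,'carl'), (4,'dina'), (5,'eve');
INSERT INTO posts VALUES (100, 1, 'hello'), (101, 4, 'world');
INSERT INTO friendships VALUES (3, 2, 1);
INSERT INTO users_liked_posts VALUES (1,100), (3,100), (4,100), (5,100), (2,100), (4,101), (3,101);
""")

# Window functions need SQLite 3.25+ (or MySQL 8.0+).
TOP_LIKERS_SQL = """
WITH ranked AS (
  SELECT l.post_id, l.user_id,
         CASE WHEN l.user_id = :viewer THEN 1      -- the viewer first
              WHEN f.user_id IS NOT NULL THEN 2    -- then the viewer's friends
              ELSE 3 END AS like_rank               -- then everyone else
  FROM users_liked_posts l
  LEFT JOIN friendships f
         ON f.user_id = l.user_id AND f.friend_id = :viewer AND f.is_confirmed = 1
),
numbered AS (
  SELECT post_id, user_id,
         ROW_NUMBER() OVER (PARTITION BY post_id ORDER BY like_rank, user_id) AS position,
         COUNT(*) OVER (PARTITION BY post_id) AS total_likes
  FROM ranked
)
SELECT n.post_id, n.total_likes, u.username
FROM numbered n JOIN users u ON u.id = n.user_id
WHERE n.position <= 3
ORDER BY n.post_id, n.position
"""

def top_likers(conn, viewer_id):
    return conn.execute(TOP_LIKERS_SQL, {"viewer": viewer_id}).fetchall()

for row in top_likers(conn, viewer_id=2):
    print(row)
```

Here the filter on position can sit in the outer query's WHERE, because the alias is defined in an inner CTE. This is the same layering reason the answer gives for why its steps cannot be merged.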
@koriander gave the SQL answer, but as for how Facebook does it, you already partially answered that: they use highly denormalized data and caching. They also implement atomic counters and in-memory edge lists to perform graph traversals, and they most certainly don't use relational database concepts (like JOINs), since those don't scale. Even the MySQL clusters they run are essentially just key/value stores that only get accessed when there's a miss in the cache layer.
Instead of an RDBMS, I might suggest a graph database for your purposes, like Neo4j.
Good luck.
EDIT:
You're really going to have to play with Neo4j if you're interested in using it. You may or may not find it easier coming from a SQL background, but it will certainly provide more powerful, and likely faster, queries for performing graph traversals.
Here are a couple of examples of Cypher queries that may be useful to you.
Count how many people like a post:
START post=node({postId})
MATCH post<-[:like]-user
RETURN count(*)
(really you should use an atomic counter, instead, if it's something you're going to be querying for a lot)
Get three people who liked a post with the following constraints:
The first likingUser will always be the current user if he/she liked the post.
If friends of the current user liked the post, they will show up before any non-friends.
START post=node({postId}), user=node({currentUserId})
MATCH path = post<-[:like]-likingUser-[r?:friend*0..1]-user
RETURN likingUser, count(r) as rc, length(path) as len
ORDER BY rc desc, len asc
LIMIT 3
I'll try to explain the above query... if I can.
Start by grabbing two nodes, the post and the current user
Match all users who like the post (likingUser)
Additionally, test whether there is a path of length 0 or 1 which connects likingUser through a friendship relationship to the current user (a path of length 0 indicates that likingUser==user).
Now, order the results first by whether or not relationship r exists (it will exist if the likingUser is friends with user or if likingUser==user). So, count(r) will be either 0 or 1 for each result. Since we prefer results where count(r)==1, we'll sort this in descending order.
Next, perform a secondary sort which forces the current user to the top of the list if he/she was part of the results set. We do this by checking the length of path. When user==likingUser, the path length will be shorter than when user is a friend of likingUser, so we can use length(path) to force user up to the top by sorting in ascending order.
Lastly, we limit the results to only the top three results.
Hopefully that makes some sense. As a side note, you may actually get better performance by separating out your queries: for example, one query to see if the user likes the post, then another to get up to three friends who liked the post, and finally another to get up to three non-friends who liked the post. I say it may be faster because each query can short-circuit after it gets three results, whereas the big single query I wrote has to consider all possibilities and then sort them. So keep in mind that just because you can combine multiple questions into a single query doesn't mean you should; it may actually perform worse than multiple queries.
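The same split-query idea can be sketched against the question's relational schema rather than Neo4j. This is a translation of the approach into SQL, not the answer's Cypher. It uses three queries, each with a LIMIT, and later queries are skipped once three names are found. The sample data and the friendship direction (user_id = liker, friend_id = viewer) are assumptions.

```python
import sqlite3

# Hypothetical data using the question's schema; viewer is user 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER, is_confirmed INTEGER);
CREATE TABLE users_liked_posts (user_id INTEGER, post_id INTEGER);
INSERT INTO users VALUES (1,'ann'), (2,'viewer'), (3,'carl'), (4,'dina'), (5,'eve');
INSERT INTO friendships VALUES (3, 2, 1);
INSERT INTO users_liked_posts VALUES (1,100), (3,100), (4,100), (5,100), (2,100), (4,101), (3,101);
""")

LIKERS_SQL = """
SELECT u.username
FROM users_liked_posts l
JOIN users u ON u.id = l.user_id
WHERE l.post_id = :post AND {condition}
ORDER BY l.user_id
LIMIT :limit
"""
IS_FRIEND = """EXISTS (SELECT 1 FROM friendships f
                       WHERE f.user_id = l.user_id AND f.friend_id = :viewer
                         AND f.is_confirmed = 1)"""

def three_likers(conn, post_id, viewer_id, limit=3):
    names = []
    conditions = (
        "l.user_id = :viewer",                          # 1) the viewer
        f"l.user_id <> :viewer AND {IS_FRIEND}",        # 2) friends
        f"l.user_id <> :viewer AND NOT {IS_FRIEND}",    # 3) everyone else
    )
    for condition in conditions:
        if len(names) >= limit:
            break  # enough names: skip the remaining queries entirely
        params = {"post": post_id, "viewer": viewer_id, "limit": limit - len(names)}
        sql = LIKERS_SQL.format(condition=condition)
        names += [row[0] for row in conn.execute(sql, params)]
    return names

print(three_likers(conn, 100, 2))  # ['viewer', 'carl', 'ann']
```

Whether this beats a single query depends on indexes and data sizes, so it is worth measuring both.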

How to optimize count of new posts in favorites

I will be glad for any reply. I will try to structure my text for better understanding.
Situation
I run a thematic internet forum where you can add topics to your favorites in the menu, and the menu shows the number of new posts in these topics. So every time you reload the page (go anywhere on the whole site), new posts are checked for all topics in your favorites.
Problem
This is of course quite expensive for the DB, because it is common to have 20-50 favorites, and I have to check the DB for new posts in each of these topics. The average topic has 1000-2000 posts, and this happens on every pageview for every user, which is approximately 900,000 pageviews per month.
Possible solution 1
I store the total number of posts in every topic, and for every user and topic I store the post count at the time of their last view. This may be the fastest, but it has a lot of functional disadvantages (deleting, filtering of posts, etc.).
Possible solution 2
I store the id of the last viewed post for every topic, for every user. This is a very good solution, but about ten times slower than the previous one.
Database
I store all posts for all topics in one huge table: hundreds of thousands of posts.
Question
I would like to remove the problems that solution 1 brings, but I need to keep the speed. I thought of creating a table for each topic and using solution 2, but I don't know if it will help. So if you have any experience with this, please tell me what would be the fastest solution.
Thank you very much.
Firstly: I have no idea about your schema or database system, but this should be relatively simple, assuming you keep a record of when your user was last seen ($DATE_USER_WAS_LAST_SEEN in the example below), each of your posts is associated with its topic by some kind of id, and you have a list of all the $FAVOURITES ids.
SELECT topic_id, count(*) AS count FROM posts
WHERE topic_id IN ($FAVOURITES)
AND created_date > $DATE_USER_WAS_LAST_SEEN
GROUP BY topic_id
will give you an output like:
topic_id | count
---------------------
3 | 20
1 | 27
33 | 120
This should be an acceptable speed at this kind of scale. You could improve the query by not using IN and instead building a long (topic_id = 1 OR topic_id = 2 OR ...) string, if your database doesn't automatically optimise these things.
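A minimal sketch of the query above in SQLite, with the favourite ids and the last-seen timestamp passed as bound parameters instead of being pasted into the SQL. The posts(id, topic_id, created_date) layout, the ISO-8601 string timestamps and the (topic_id, created_date) index are assumptions.

```python
import sqlite3

# Hypothetical posts table; ISO-8601 strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, created_date TEXT);
CREATE INDEX posts_topic_created ON posts (topic_id, created_date);
INSERT INTO posts (topic_id, created_date) VALUES
  (1, '2024-01-01 10:00'), (1, '2024-01-02 09:00'),
  (3, '2024-01-02 12:00'), (3, '2024-01-03 08:00'),
  (33, '2023-12-31 23:00');
""")

def new_post_counts(conn, favourites, last_seen):
    """Return {topic_id: count of posts newer than last_seen}; topics with none are absent."""
    if not favourites:
        return {}
    # One "?" per favourite id, so every value is bound rather than concatenated.
    placeholders = ", ".join("?" for _ in favourites)
    sql = f"""
        SELECT topic_id, COUNT(*) AS count
        FROM posts
        WHERE topic_id IN ({placeholders})
          AND created_date > ?
        GROUP BY topic_id"""
    return dict(conn.execute(sql, [*favourites, last_seen]).fetchall())

print(new_post_counts(conn, [1, 3, 33], "2024-01-01 12:00"))
```

Topics with no new posts simply do not appear in the result, so the caller should treat a missing key as zero.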
Secondly: don't worry so much about keeping these values bang up to date. People will use them as an indicator that there are new messages, not base life decisions on them. So cache these results per user (either on the user's own record or in some kind of in-memory cache like memcache, if you are familiar with those) and expire the cache every 5 minutes or so; this will radically reduce your hits to the database.
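The per-user caching idea can be sketched as a tiny time-to-live cache. This is a process-local stand-in for something like memcache, not its API, and the class name and the 300-second default are illustrative assumptions. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time

class TTLCache:
    """Process-local cache whose entries expire after `ttl` seconds (a memcache stand-in)."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # still fresh: no database hit
        value = compute()          # expired or missing: recompute once
        self._store[key] = (now, value)
        return value

# Usage with a fake clock: the expensive query runs once per user per 5 minutes.
fake_now = [0.0]
calls = []
cache = TTLCache(ttl=300, clock=lambda: fake_now[0])

def count_new_posts_for_user_42():
    calls.append(1)                # stands in for the favourites query
    return {1: 1, 3: 2}

cache.get_or_compute(42, count_new_posts_for_user_42)
cache.get_or_compute(42, count_new_posts_for_user_42)   # served from cache
fake_now[0] = 301.0
cache.get_or_compute(42, count_new_posts_for_user_42)   # expired: recomputed
print(len(calls))  # 2
```

With several web servers, a shared cache such as memcache avoids each process recomputing the same counts.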
I assume your post ids are sequential and always increasing.
Create a favorites table with at least these fields: user_id, topic_id, last_post_id
You can then check for new posts with this simple query:
select topics.id, count(posts.id)
from users
inner join favorites on favorites.user_id = users.id
inner join topics on topics.id = favorites.topic_id
inner join posts on
posts.topic_id = topics.id and
posts.id > last_post_id
where users.id = $id
group by topics.id
This should run pretty smoothly.
You must also update last_post_id each time a user visits a topic, but this should be pretty straightforward.
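A minimal sketch of both halves, the new-post query and the last_post_id update on visit, in SQLite. The table contents, the (topic_id, id) index and the mark_topic_read helper name are assumptions for illustration. Setting last_post_id to the topic's current MAX(id) relies on the answer's premise that post ids only increase.

```python
import sqlite3

# Hypothetical schema following the answer: favorites(user_id, topic_id, last_post_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE topics (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER);
CREATE TABLE favorites (user_id INTEGER, topic_id INTEGER, last_post_id INTEGER,
                        PRIMARY KEY (user_id, topic_id));
CREATE INDEX posts_topic_id ON posts (topic_id, id);
INSERT INTO users VALUES (1);
INSERT INTO topics VALUES (10), (20);
INSERT INTO posts (id, topic_id) VALUES (1, 10), (2, 10), (3, 20), (4, 10), (5, 20);
INSERT INTO favorites VALUES (1, 10, 2), (1, 20, 5);
""")

def new_posts(conn, user_id):
    """{topic_id: number of posts newer than the user's last_post_id}."""
    return dict(conn.execute("""
        SELECT topics.id, COUNT(posts.id)
        FROM users
        JOIN favorites ON favorites.user_id = users.id
        JOIN topics ON topics.id = favorites.topic_id
        JOIN posts ON posts.topic_id = topics.id
                  AND posts.id > favorites.last_post_id
        WHERE users.id = ?
        GROUP BY topics.id""", (user_id,)).fetchall())

def mark_topic_read(conn, user_id, topic_id):
    # Hypothetical helper, called when the user opens a topic:
    # remember the newest post id that exists in it right now.
    conn.execute("""
        UPDATE favorites
        SET last_post_id = (SELECT COALESCE(MAX(id), 0) FROM posts WHERE topic_id = ?)
        WHERE user_id = ? AND topic_id = ?""", (topic_id, user_id, topic_id))

print(new_posts(conn, 1))   # {10: 1}
mark_topic_read(conn, 1, 10)
print(new_posts(conn, 1))   # {}
```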
If you have an index on (topic_id, post_id) on the huge all_posts table, this query shouldn't be too costly:
select a.topic_id, count(*)
from all_posts a
inner join user_favorites u on u.topic_id = a.topic_id
where a.post_id > u.post_id and u.user_id = @user_id
group by a.topic_id

how do i display all the tags related to all the feedbacks in one query

I am trying to write a sql query which fetches all the tags related to every topic being displayed on the page.
Like this:
TITLE: feedback1
POSTED BY: User1
CATEGORY: category1
TAGS: tag1, tag2, tag3
TITLE: feedback2
POSTED BY: User2
CATEGORY: category2
TAGS: tag2, tag5, tag7,tag8
TITLE: feedback3
POSTED BY: User3
CATEGORY: category3
TAGS: tag1, tag5, tag6, tag3
The relationship of tags to topics is many to many.
Right now I first fetch all the topics from the "topics" table, and then I loop over the returned topics array to fetch the tags of each topic.
But this method is very expensive in terms of speed, and not efficient either.
Please help me write this SQL query.
Query for fetching all the topics and its information is as follows:
SELECT
tbl_feedbacks.pk_feedbackid as feedbackId,
tbl_feedbacks.type as feedbackType,
DATE_FORMAT(tbl_feedbacks.createdon,'%M %D, %Y') as postedOn,
tbl_feedbacks.description as description,
tbl_feedbacks.upvotecount as upvotecount,
tbl_feedbacks.downvotecount as downvotecount,
(tbl_feedbacks.upvotecount)-(tbl_feedbacks.downvotecount) as totalvotecount,
tbl_feedbacks.viewcount as viewcount,
tbl_feedbacks.title as feedbackTitle,
tbl_users.email as userEmail,
tbl_users.name as postedBy,
tbl_categories.pk_categoryid as categoryId,
tbl_clients.pk_clientid as clientId
FROM
tbl_feedbacks
LEFT JOIN tbl_users
ON ( tbl_users.pk_userid = tbl_feedbacks.fk_tbl_users_userid )
LEFT JOIN tbl_categories
ON ( tbl_categories.pk_categoryid = tbl_feedbacks.fk_tbl_categories_categoryid )
LEFT JOIN tbl_clients
ON ( tbl_clients.pk_clientid = tbl_feedbacks.fk_tbl_clients_clientid )
WHERE
tbl_clients.pk_clientid = '1'
What is the best practice in such cases, when you need to display all the tags related to every topic shown on a single page?
How do I alter the above SQL query so that all the tags plus the related topic information are fetched using a single query?
What I am trying to achieve is similar to the 'Questions' page of Stack Overflow, where all the information (the tags plus the information of every topic being displayed) is shown properly.
Thanks
To do this, I would have three tables:
Topics
topic_id
[whatever else you need to know for a topic]
Tags
tag_id
[etc]
Map
topic_id
tag_id
select t.[whatever], tag.[whatever]
from topics t
join map m on t.topic_id = m.topic_id
join tags tag on tag.tag_id = m.tag_id
where [conditionals]
Set up partitions and/or indexes on the map table to maximize the speed of your query. For example, if you have many more topics than tags, partition the table on topics. Then, each time you grab all the tags for a topic, it will be 1 read from 1 area, no seeking needed. Make sure to have both topics and tags indexed on their _id.
Use your 'explain plan' tool. (I am not familiar with MySQL, but there should be a tool that tells you how a query will be run, so you can optimize it; in MySQL this is the EXPLAIN statement.)
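A minimal sketch of the three-table layout in SQLite. The join returns one row per (topic, tag) pair, and the application groups those rows per topic, so a whole page needs one query instead of one per topic. The topics/tags/map names follow the answer; the title and name columns and the rows are assumptions. Topics without tags drop out of an inner join, so use LEFT JOIN if they must still be listed.

```python
import sqlite3

# Hypothetical data in the answer's topics / tags / map layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE map (topic_id INTEGER, tag_id INTEGER, PRIMARY KEY (topic_id, tag_id));
INSERT INTO topics VALUES (1, 'feedback1'), (2, 'feedback2');
INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3'), (5, 'tag5');
INSERT INTO map VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 5);
""")

def topics_with_tags(conn):
    rows = conn.execute("""
        SELECT t.topic_id, t.title, tag.name
        FROM topics t
        JOIN map m ON t.topic_id = m.topic_id
        JOIN tags tag ON tag.tag_id = m.tag_id
        ORDER BY t.topic_id, tag.tag_id""")
    grouped = {}
    for topic_id, title, tag_name in rows:
        # One row per tag: collect the tags under their topic.
        grouped.setdefault((topic_id, title), []).append(tag_name)
    return grouped

for (topic_id, title), tag_names in topics_with_tags(conn).items():
    print(f"TITLE: {title}  TAGS: {', '.join(tag_names)}")
```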
EDIT:
So you have the following tables:
tbl_feedbacks
tbl_users
tbl_categories
tbl_clients
tbl_tags
tbl_topics
tbl_topics_tags
The query you provide as a starting point shows how feedback, users, categories and clients relate to each other.
I assume that tbl_topics_tags contains FKs to tags and topics, showing which topic has which tag. Is this correct?
What of (feedbacks, users, categories, and clients) has a FK to topics or tags? Or, do either topics or tags have a FK to any of the initial 4?
Once I know this, I'll be able to show how to modify the query.
EDIT #2
There are two different ways to go about this:
The easy way is to just join on your FK. This will give you one row for each tag. It is much easier and more flexible to put together the SQL this way. If you are using some other language to take the results of the query and translate them for presentation to the user, this method is better. If nothing else, it will be far more obvious what is going on, and it will be easier to debug and maintain.
However, you may want each row of the query results to contain one feedback (and the tags that go with it).
SQL joining question <- this is a question I posted on how to do this. The answer I accepted is Oracle-only AFAIK, but there are other, non-Oracle answers.
Adapting Kevin's answer (which is supposed to work in SQL92 compliant systems):
select
[other stuff: same as in your post],
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 0 ) as tag1,
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 1 ) as tag2,
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 2 ) as tag3
from [same as in the OP]
This should do the trick.
Notes:
This will pull the first three tags. AFAIK, there isn't a way to get an arbitrary number of tags this way. You can show more tags by copying and pasting more of those subqueries; make sure to increase the offset each time.
If this does not work, you'll probably have to write up another question focusing on how to do the pivot in MySQL. I've never used MySQL, so I'm only guessing that this will work based on what others have told me.
One tip: you'll usually get more attention to your question if you strip away all the extra details. In the question I linked to above, I was really joining between 4 or 5 different tables, with many different fields. But I stripped it down to just the part I didn't know (how to get Oracle to aggregate my results into one row). I know some stuff, but you can usually do far better than just one person if you trim your question down to the essentials.
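An alternative the answer does not cover is an aggregate that concatenates the tags into one column, which handles any number of tags per feedback. In MySQL the aggregate is GROUP_CONCAT(tt.tag ORDER BY tt.tag_id SEPARATOR ', '), which is subject to group_concat_max_len (1024 bytes by default). The sketch below uses SQLite's group_concat(X, separator) as a stand-in. It borrows the answer's assumed table names (tbl_tag, tbl_feedbacks_tags), and the rows are made up. SQLite does not guarantee the concatenation order in older versions, so sort in the application if order matters.

```python
import sqlite3

# Hypothetical tables using the names assumed in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_feedbacks (pk_feedbackid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tbl_tag (tag_id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE tbl_feedbacks_tags (fk_feedbackid INTEGER, tag_id INTEGER);
INSERT INTO tbl_feedbacks VALUES (1, 'feedback1'), (2, 'feedback2'), (3, 'no tags yet');
INSERT INTO tbl_tag VALUES (1, 'tag1'), (2, 'tag2'), (5, 'tag5'), (7, 'tag7');
INSERT INTO tbl_feedbacks_tags VALUES (1, 1), (1, 2), (2, 2), (2, 5), (2, 7);
""")

# One row per feedback; LEFT JOINs keep feedbacks that have no tags (tags = NULL).
TAGS_SQL = """
SELECT f.pk_feedbackid, f.title, group_concat(tt.tag, ', ') AS tags
FROM tbl_feedbacks f
LEFT JOIN tbl_feedbacks_tags tft ON tft.fk_feedbackid = f.pk_feedbackid
LEFT JOIN tbl_tag tt ON tt.tag_id = tft.tag_id
GROUP BY f.pk_feedbackid, f.title
ORDER BY f.pk_feedbackid
"""

rows = conn.execute(TAGS_SQL).fetchall()
for feedback_id, title, tags in rows:
    print(feedback_id, title, tags)
```

In the real query, the same GROUP BY would go on the tbl_feedbacks primary key, with the other selected columns either listed in the GROUP BY or functionally dependent on that key.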

selecting and displaying ranked items and a user's votes, a la reddit, digg, et al

When selecting ranked objects from a database (e.g., articles users have voted on), what is the best way to show:
the current page of items
the user's rating, per item (if they've voted)
rough schema:
articles: id, title, content, ...
user: id, username, ...
votes: id, user_id, article_id, vote_value
is it better/ideal to:
select the current page of items
select the user's vote, limiting them to the page of items with an 'IN' clause
or
select the current page of items and just 'JOIN' vote data from the table of user votes
or, something entirely different?
This is theoretically in a high-traffic environment, using an RDBMS like MySQL. FWIW, I see this as "thinking it out before doing" and not "premature optimization."
thanks!
The JOIN would be faster; it would save a round trip to the database.
However, I wouldn't worry at all about this until you actually get some traffic. Many people have spoken out against premature optimization; I'll quote a random one:
More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.
If you need to order by votes, use this:
SELECT *
FROM (
SELECT a.*, (
SELECT SUM(vote_value)
FROM votes v
WHERE v.article_id = a.id
) AS votes
FROM articles a
) AS ranked
ORDER BY
votes DESC
LIMIT 100, 10
This will count the votes and paginate in a single query (LIMIT 100, 10 skips the first 100 rows and returns the next 10).
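A small runnable version of the count-and-paginate query in SQLite, using LIMIT ? OFFSET ? with a page size of 2 so the pages are easy to check. The sample articles and votes are assumptions. COALESCE turns "no votes" into 0, so un-voted articles sort between positive and negative scores instead of as NULL. The id tie-breaker keeps page boundaries stable.

```python
import sqlite3

# Hypothetical schema from the question (articles, votes).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE votes (user_id INTEGER, article_id INTEGER, vote_value INTEGER,
                    PRIMARY KEY (user_id, article_id));
INSERT INTO articles VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO votes VALUES (1, 1, 1), (2, 1, 1), (1, 2, -1), (1, 3, 1), (2, 3, 1), (3, 3, 1);
""")

def page_by_votes(conn, page, per_page=2):
    return conn.execute("""
        SELECT id, title, votes FROM (
            SELECT a.id, a.title,
                   COALESCE((SELECT SUM(vote_value) FROM votes v
                             WHERE v.article_id = a.id), 0) AS votes
            FROM articles a
        ) AS ranked
        ORDER BY votes DESC, id      -- id breaks ties so pages don't overlap
        LIMIT ? OFFSET ?""", (per_page, page * per_page)).fetchall()

print(page_by_votes(conn, 0))  # [(3, 'c', 3), (1, 'a', 2)]
print(page_by_votes(conn, 1))  # [(4, 'd', 0), (2, 'b', -1)]
```

The subquery still sums votes for every article before sorting. On a large, high-traffic table, a stored vote total per article (as with counter caches) avoids that work.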
If you want to show only the user's own votes, use LEFT JOIN (articles the user has not voted on get a NULL vote_value):
SELECT a.*, vote_value
FROM articles a
LEFT JOIN
votes v
ON v.user_id = @current_user
AND v.article_id = a.id
ORDER BY
a.timestamp DESC
LIMIT 100, 10
Having an index on (user_id, article_id) will greatly improve this query.
Note that you can make (user_id, article_id) the PRIMARY KEY of votes, which will improve this query even more (and also ensures each user votes at most once per article).
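A minimal SQLite sketch of the LEFT JOIN page with the user's own votes, with (user_id, article_id) as the composite primary key of votes. The sample rows are assumptions, and because the sample table has no timestamp column, ORDER BY a.id DESC stands in for a.timestamp DESC.

```python
import sqlite3

# Hypothetical data; the composite PK doubles as the (user_id, article_id) index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE votes (user_id INTEGER, article_id INTEGER, vote_value INTEGER,
                    PRIMARY KEY (user_id, article_id));
INSERT INTO articles VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO votes VALUES (1, 1, 1), (2, 1, 1), (1, 2, -1), (1, 3, 1), (2, 3, 1), (3, 3, 1);
""")

def page_with_my_votes(conn, user_id, page, per_page=2):
    # vote_value is None where the user has not voted on that article.
    return conn.execute("""
        SELECT a.id, a.title, v.vote_value
        FROM articles a
        LEFT JOIN votes v ON v.user_id = ? AND v.article_id = a.id
        ORDER BY a.id DESC            -- stand-in for a.timestamp DESC
        LIMIT ? OFFSET ?""", (user_id, per_page, page * per_page)).fetchall()

print(page_with_my_votes(conn, 1, 0))  # [(4, 'd', None), (3, 'c', 1)]
print(page_with_my_votes(conn, 1, 1))  # [(2, 'b', -1), (1, 'a', 1)]
```

Because the user condition sits in the ON clause rather than in WHERE, articles without a matching vote stay in the page. The primary key also rejects a second vote row for the same user and article.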