SQL ORDER BY something other than one of the table's columns

I have a table with posts in it. Website visitors can upvote or downvote a post. I want to order a certain SQL query by the score of the post, but my posts table doesn't have a score column: I keep the upvotes and downvotes in a separate votes table, because that tells me who voted on what. I could add a score column to my posts table and update it every time someone votes on a post, but I'd rather not, as the score is something I can work out by subtracting the downvotes from the upvotes anyway.
Do you have any suggestions? Or should I just go ahead and add a score column to my table?
Edit
My posts table has a post_id column (among other irrelevant columns) and my votes table has columns post_id, user_id and positive (the latter is a BOOLEAN, being 1 when the vote is an upvote and 0 when the vote is a downvote).
I can easily determine the score of a post 'by hand', by first querying the number of upvotes of that post, then the number of downvotes, and calculating their difference. However, I would like to query my posts table and order by the score of that post, so I want to know how/if I can query the votes table in the ORDER BY command while querying the posts table.

No, you do not have to create a score column. You can order by the calculated score, as below:
Since you do have the upvotes and downvotes in a different table, you need to join, as Tim Schmelter has explained.
SELECT p.*
FROM Post p
INNER JOIN Votes v
ON p.PostID = v.PostID
ORDER BY (v.upvotes - v.downvotes);
If you want to get the query to perform better, you could add a function-based index for (v.upvotes - v.downvotes).
EDIT:
Based on the updated information about the posts and the votes table, the following query can be used. The score is calculated within an inline view using a CASE statement. Then, this inline view is joined with the posts table, ordering the rows by the score. Note that an INNER JOIN is used, so only posts that have votes would be listed. To list all posts, a LEFT JOIN could be used instead.
SELECT p.*
FROM posts p
INNER JOIN
(
SELECT
post_id,
SUM
(
CASE
WHEN positive = 0 THEN -1
ELSE 1
END
) score
FROM votes v
GROUP BY post_id
) scores
ON p.post_id = scores.post_id
ORDER BY scores.score;
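The LEFT JOIN variant mentioned above can be tried out with a small sketch. It assumes SQLite driven from Python's sqlite3 module rather than the asker's actual database, and the sample rows are invented for illustration. COALESCE turns the NULL score of a post without votes into 0, and the alias net_score is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY);
    CREATE TABLE votes (post_id INTEGER, user_id INTEGER, positive INTEGER);
    INSERT INTO posts (post_id) VALUES (1), (2), (3);
    -- post 1 nets +1, post 2 nets -2, post 3 has no votes at all
    INSERT INTO votes (post_id, user_id, positive) VALUES
        (1, 10, 1), (1, 11, 1), (1, 12, 0),
        (2, 10, 0), (2, 11, 0);
""")

# Aggregate the votes once per post, then LEFT JOIN so unvoted posts survive.
rows = conn.execute("""
    SELECT p.post_id, COALESCE(s.score, 0) AS net_score
    FROM posts p
    LEFT JOIN (
        SELECT post_id,
               SUM(CASE WHEN positive = 0 THEN -1 ELSE 1 END) AS score
        FROM votes
        GROUP BY post_id
    ) s ON p.post_id = s.post_id
    ORDER BY net_score DESC, p.post_id
""").fetchall()

print(rows)
```

With an INNER JOIN in place of the LEFT JOIN, post 3 would drop out of the result entirely.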

You have to link both tables via JOIN. Presuming that the Score-table has a column PostID:
SELECT p.*, Score = s.Upvotes- s.DownVotes
FROM Post p
INNER JOIN Score s
ON p.PostID = s.PostID
ORDER BY Score

Presumably, your data has a scores table with a column for each vote and an indicator of whether it is an up vote or down vote. If so, you need to aggregate this information and then you can use it for ordering:
select p.*, (NumUpVotes - NumDownVotes) as NetVotes
from posts p left outer join
(select PostId, sum(case when IsUpVote = 'Y' then 1 else 0 end) as NumUpvotes,
sum(case when IsDownVote = 'Y' then 1 else 0 end) as NumDownVotes
from scores s
group by PostId
) s
on p.postId = s.PostId
order by (NumUpVotes - NumDownVotes);
You don't specify what database you are using so this uses standard SQL that should work in any database. You can adapt the logic for your particular data structure.

Related

How to query top record group conditional on the counts and strings in a second table

I call on the SQL Gods of the internet!! I so desperately need your help with this query, my livelihood depends on it. I solved it in Alteryx in about 2 minutes, but I need to write this query in SQL and I am relatively new to the language in terms of complex blending and syntax.
Your help would be so appreciated!! :) xoxox I can't begin to describe
Using SSMS I need to use 2 tables 'searches' and 'events' to query...
the TOP 2 [user]s with the highest count of unique search ids in Table 'searches'
Condition that the [user]s in the list have at least 1 eventid in 'events' where [event type] starts with "great"
Here is an example of what needs to happen
search event and end result example
So the only pieces i have so far are below but boy oh boy please don't Laugh :(
What i was trying to do is..
select a table of unique users with the searchcounts from the search table
inner join selected table from 1 on userid with a table described in 3
create table of unique user ids with counts of events with [type] starting with "great"
Filter the inner joined table for the top 2 search counts from step 1
SELECT userid, COUNT(*) as searchcount
FROM searches
GROUP BY userid
INNER JOIN (SELECT userid, COUNT(*) as eventcount
FROM events WHERE LEFT(type, 5) = "great" AND eventcount>0 Group by userid)
ON searches.userid=events.userId
Obviously, this doesn't work at all!!! I think my structure is off and my method of filtering for "great" is wrong. Also, I don't know how to add the "top 2" clause to the search table query without affecting the inner join. This code needs to be fairly efficient, so if you have a better, more computationally efficient idea... I love you long time
SELECT top(2) userid, COUNT(*) as searchcount FROM searches
where userid in (select userid from events where left(type, 5)='great')
GROUP BY userid
order by count(*) desc
Hope the above query serves your purpose.
I think you need EXISTS and the window function dense_rank, as follows:
Select * from
(Select u.userid, dense_rank() over (order by count(*) desc) as rn
From users u join searches s on u.userid = s.userid
Where exists
(select 1 from events e
Where e.userid = u.userid And LEFT(e.type, 5) = 'great')
Group by u.userid ) t Where rn <= 2
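For anyone who wants to check the logic on sample data, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module instead of SQL Server, so LIMIT 2 stands in for TOP(2) and substr() for LEFT(). The column name searchid and all the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches (searchid INTEGER, userid TEXT);
    CREATE TABLE events (eventid INTEGER, userid TEXT, type TEXT);
    INSERT INTO searches VALUES
        (1, 'ann'), (2, 'ann'), (3, 'ann'),
        (4, 'bob'), (5, 'bob'),
        (6, 'cat'), (7, 'cat'), (8, 'cat'), (9, 'cat'),
        (10, 'dan');
    -- cat has the most searches but no event type starting with 'great'
    INSERT INTO events VALUES
        (1, 'ann', 'great click'),
        (2, 'bob', 'great view'),
        (3, 'cat', 'poor click'),
        (4, 'dan', 'greatest hit');
""")

# Keep only users with a qualifying event, count their distinct searches,
# and return the two largest counts.
top_users = conn.execute("""
    SELECT s.userid, COUNT(DISTINCT s.searchid) AS searchcount
    FROM searches s
    WHERE EXISTS (
        SELECT 1
        FROM events e
        WHERE e.userid = s.userid
          AND substr(e.type, 1, 5) = 'great'
    )
    GROUP BY s.userid
    ORDER BY searchcount DESC
    LIMIT 2
""").fetchall()

print(top_users)
```

The EXISTS filter runs before the grouping, so a user such as cat is excluded before the top-2 cut is made, no matter how many searches they have.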

LEFT JOIN discarding left rows in results?

Simplifying my issue, let's say I have two tables:
"Users" storing user_id and event_date from users who access each day.
"Purchases" storing user_id, event_date and product_id from users who make purchases each day.
I need to get from all users, their respective product purchases, or null value for product_id if a user didn't make a purchase. For that purpose I made this query:
with all_users as (
select user_id from `my_project.my_dataset.Users`
where event_date = "2019-12-01"
)
select user_id,product_id
from all_users
left join `my_project.my_dataset.Purchases`
using(user_id)
where event_date = "2019-12-01"
But this query returns only the user_ids of users who made purchases; in other words, rows in the LEFT from_item (all_users) are being omitted from the result.
Is this working as expected? I read that LEFT JOIN always retains all rows of the left from_item.
EDIT 1:
Adding some screenshots:
This is the full query detailed before, but with real names (table "Users" is "user_metrics_daily" and table "Purchases" is "virtual_currency_daily"). As you can see, I added the count(distinct user_pseudo_id)OVER() to count how many distinct users are in the result.
On the other hand, this is a query to get the number of users I expect to have in the result (8935 users, with null values in product_id for users who didn't purchase). But I actually got 2724 distinct users (the number of users who made purchases).
EDIT 2: I found a solution to my desired result, but still I don't understand what's wrong with my first query.
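Your query runs as written: with USING(user_id) the join column is not ambiguous. The problem is the WHERE clause. all_users only exposes user_id, so event_date in the outer query can only refer to my_project.my_dataset.Purchases. For a user without a purchase, the LEFT JOIN fills every Purchases column with NULL, the comparison event_date = "2019-12-01" is not true for that row, and the row is discarded. In effect, the WHERE clause turns the LEFT JOIN into an INNER JOIN.
To keep all users, move the condition on Purchases into the ON clause. It also helps to say explicitly which table each projected column comes from: user_id from all_users and product_id from my_project.my_dataset.Purchases.
with all_users as (
select user_id from `my_project.my_dataset.Users`
where event_date = "2019-12-01"
)
select
a.user_id,
p.product_id
from all_users as a
left join `my_project.my_dataset.Purchases` as p
on a.user_id = p.user_id
and p.event_date = "2019-12-01"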
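The difference between filtering the right-hand table in WHERE and filtering it in ON can be reproduced on a toy data set. This is only a sketch: it uses SQLite via Python's sqlite3 module rather than BigQuery, the table names are shortened to users and purchases, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, event_date TEXT);
    CREATE TABLE purchases (user_id INTEGER, event_date TEXT, product_id TEXT);
    INSERT INTO users VALUES
        (1, '2019-12-01'), (2, '2019-12-01'), (3, '2019-12-01');
    -- user 1 bought on the day, user 2 bought the day before, user 3 never bought
    INSERT INTO purchases VALUES
        (1, '2019-12-01', 'sword'),
        (2, '2019-11-30', 'shield');
""")

base = """
    WITH all_users AS (
        SELECT user_id FROM users WHERE event_date = '2019-12-01'
    )
    SELECT a.user_id, p.product_id
    FROM all_users AS a
    LEFT JOIN purchases AS p ON a.user_id = p.user_id {clause}
    ORDER BY a.user_id
"""

# Filter applied after the join: NULL-extended rows fail the comparison and vanish.
filter_in_where = conn.execute(
    base.format(clause="WHERE p.event_date = '2019-12-01'")
).fetchall()

# Filter applied as part of the join condition: unmatched users are kept with NULL.
filter_in_on = conn.execute(
    base.format(clause="AND p.event_date = '2019-12-01'")
).fetchall()

print(filter_in_where)
print(filter_in_on)
```

Only the second form returns one row per user, with None as product_id for users 2 and 3.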

SQL Joins, Count(), and group by to sort 'posts' by # of yes/no 'votes'

I have posts, votes, and comments tables. Each post can have N 'yes votes', N 'no votes' and N comments. I am trying to get a set of posts sorted by number of yes votes.
I have a query that does exactly this, but it is running far too slowly. On a data set of 1500 posts and 15K votes, it takes 0.48 seconds on my dev machine. How can I optimize this?
select
p.*,
v.yes,
x.no
from
posts p
left join (select post_id, vote_type_id, count(1) as yes from votes where (vote_type_id = 1) group by post_id) v on v.post_id = p.id
left join (select post_id, vote_type_id, count(1) as no from votes where (vote_type_id = 2) group by post_id) x on x.post_id = p.id
left join (select post_id, count(1) as comment_count from comments group by post_id) p on p.confession_id = p.id
order by
yes desc
limit
0, 10
EDIT:
Votes and Comments both have a post_id FK
Adding an index on vote_type_id and post_id in the votes table shaved .1sec off the query execution.
Add a 'yes_count' column and use a trigger to update the vote count for each post when the vote is made. You can index this column, then it should be very fast.
Use explain for checking the query execution plan so you can see why it is slow, usually it is enough to see the plan and later create appropriate indexes. The 1.5k and 15k tables are really small so that query should be much faster.
Why don't you add a yes column and a no column? Rather than adding a new entry for every vote, just increment the count.
If I misunderstood your database or you can't modify it, do you at least have a foreign key from votes.post_id to posts.id? Foreign keys are crucial if you do any join.
First off, your current query shouldn't compile, as it uses p as an alias for both the comments and the posts table.
Second, you're joining votes twice: once for no, and once for yes. Using a CASE statement, you can compute the sums of both with a single join. Here's a sample query:
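select
    p.*,
    v.yes,
    v.no,
    c.comment_count
from posts p
-- one pass over votes: count yes and no per post before joining
left join (
    select post_id,
        sum(case when vote_type_id = 1 then 1 else 0 end) as yes,
        sum(case when vote_type_id = 2 then 1 else 0 end) as no
    from votes
    group by post_id
) v on v.post_id = p.id
left join (
    select post_id, count(*) as comment_count from comments group by post_id
) c on c.post_id = p.id
order by v.yes desc
limit 0, 10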
Third, you could verify that the proper foreign keys exist for the relations between posts, votes and comments. A (post_id, vote_type_id) index on the votes table could also help.
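One thing to watch when counting votes and comments in the same query: if both tables are joined directly to posts, every vote row is repeated once per comment, so the sums come out inflated. The sketch below shows the effect and the pre-aggregated alternative. It assumes SQLite through Python's sqlite3 module rather than MySQL, and the rows and the aliases yes_votes, no_votes and comment_count are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE votes (post_id INTEGER, vote_type_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts VALUES (1), (2);
    -- post 1: two yes, one no, three comments; post 2: one yes, no comments
    INSERT INTO votes VALUES (1, 1), (1, 1), (1, 2), (2, 1);
    INSERT INTO comments (post_id) VALUES (1), (1), (1);
""")

# Joining both child tables directly: 3 votes x 3 comments = 9 rows for post 1.
naive = conn.execute("""
    SELECT p.id,
           SUM(CASE WHEN v.vote_type_id = 1 THEN 1 ELSE 0 END) AS yes_votes
    FROM posts p
    LEFT JOIN votes v ON v.post_id = p.id
    LEFT JOIN comments c ON c.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Aggregating each child table on its own first keeps one row per post.
pre_aggregated = conn.execute("""
    SELECT p.id,
           COALESCE(v.yes_sum, 0) AS yes_votes,
           COALESCE(v.no_sum, 0) AS no_votes,
           COALESCE(c.n_comments, 0) AS comment_count
    FROM posts p
    LEFT JOIN (
        SELECT post_id,
               SUM(CASE WHEN vote_type_id = 1 THEN 1 ELSE 0 END) AS yes_sum,
               SUM(CASE WHEN vote_type_id = 2 THEN 1 ELSE 0 END) AS no_sum
        FROM votes
        GROUP BY post_id
    ) v ON v.post_id = p.id
    LEFT JOIN (
        SELECT post_id, COUNT(*) AS n_comments
        FROM comments
        GROUP BY post_id
    ) c ON c.post_id = p.id
    ORDER BY yes_votes DESC
""").fetchall()

print(naive)
print(pre_aggregated)
```

The naive version reports six yes votes for post 1 instead of two, which is why the CASE sums are best computed inside a derived table.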

Get Common Rows Within The Same Table

I've had a bit of a search, but didn't find anything quite like what I'm trying to achieve.
Basically, I'm trying to find a similarity between two users' voting habits.
I have a table storing each individual vote made, which stores:
voteID
itemID (the item the vote is attached to)
userID (the user who voted)
direction (whether the user voted the post up, or down)
I'm aiming to calculate the similarity between, say, users A and B, by finding out two things:
The number of votes they have in common. That is, the number of times they've both voted on the same post (the direction does not matter at this point).
The number of times they've voted in the same direction, on common votes.
(Then simply to calculate #2 as a percentage of #1, to achieve a crude similarity rating).
My question is, how do I find the intersection between the two users' sets of votes? (i.e. how do I calculate point #1 adequately, without looping over every vote in a highly inefficient way.) If they were in different tables, an INNER JOIN would suffice, I'd imagine... but that obviously won't work on the same table (or will it?).
Any ideas would be greatly appreciated.
Something like this:
SELECT COUNT(*)
FROM votes v1
INNER JOIN votes v2 ON (v1.itemID = v2.itemID)
WHERE v1.userID = 'userA'
AND v2.userID = 'userB'
In case you want to do this for a single user (rather than knowing both users at the start) to find to whom they are the closest match:
SELECT
v2.userID,
COUNT(*) AS matching_items,
SUM(CASE WHEN v2.direction = v1.direction THEN 1 ELSE 0 END) AS matching_votes
FROM
Votes v1
INNER JOIN Votes v2 ON
v2.userID <> v1.userID AND
v2.itemID = v1.itemID
WHERE
v1.userID = @userID
GROUP BY
v2.userID
You can then limit that however you see fit (return the top 10, top 20, all, etc.)
I haven't tested this yet, so let me know if it doesn't act as expected.
Here's an example that should get you closer:
SELECT COUNT(*)
FROM (
SELECT u1.userID
FROM vote u1, vote u2
WHERE u1.itemID = u2.itemID
AND u1.userID = user1
AND u2.userID = user2) AS common_votes
Assuming userID 1 being compared to userID 2
For finding how many votes they have in common:
SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
AND v2.itemID = v1.itemID)
WHERE v1.userID = 1;
For finding when they also voted the same:
SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
AND v2.itemID = v1.itemID
AND v2.direction = v1.direction)
WHERE v1.userID = 1;
A self join is in order. Here it is with all you asked:
SELECT v1.userID user1, v2.userID user2,
count(*) n_votes_in_common,
sum(case when v1.direction = v2.direction then 1 else 0 end) n_votes_same_direction,
sum(case when v1.direction = v2.direction then 1 else 0 end) * 100.0 / count(*) crude_similarity_percent
FROM votes v1
INNER JOIN votes v2
ON v1.itemID = v2.itemID
AND v1.userID < v2.userID
GROUP BY v1.userID, v2.userID
You most certainly can join a table to itself. In fact, that's what you're going to have to do. You must use aliasing when joining a table to itself. If your table doesn't have a PK or FK, you'll have to use Union instead. Union will remove duplicates and Union All will not.
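To see the self join at work for one pair of users, here is a small sketch. It assumes SQLite via Python's sqlite3 module, encodes direction as 1 for up and -1 for down (the question does not say how it is stored), and uses invented votes for users 'A' and 'B'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE votes (
        voteID INTEGER PRIMARY KEY,
        itemID INTEGER,
        userID TEXT,
        direction INTEGER
    );
    -- A and B share items 1, 2 and 3; they agree on items 1 and 3
    INSERT INTO votes (itemID, userID, direction) VALUES
        (1, 'A', 1), (2, 'A', 1), (3, 'A', -1), (4, 'A', 1),
        (1, 'B', 1), (2, 'B', -1), (3, 'B', -1), (5, 'B', 1);
""")

# One self join gives both numbers: rows are the common items, and the CASE
# counts the rows where both users voted the same way.
common, same = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN v1.direction = v2.direction THEN 1 ELSE 0 END)
    FROM votes v1
    INNER JOIN votes v2 ON v2.itemID = v1.itemID
    WHERE v1.userID = ? AND v2.userID = ?
""", ("A", "B")).fetchone()

# Guard against users with nothing in common (COUNT is 0 and SUM is NULL).
similarity = 100.0 * same / common if common else None

print(common, same, similarity)
```

Passing the user IDs as parameters keeps the query reusable for any pair.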

SQL: Get all posts with any comments

I need to construct some rather simple SQL, I suppose, but as it's a rare event that I work with DBs these days I can't figure out the details.
I have a table 'posts' with the following columns:
id, caption, text
and a table 'comments' with the following columns:
id, name, text, post_id
What would the (single) SQL statement look like which retrieves the captions of all posts which have one or more comments associated with it through the 'post_id' key? The DBMS is MySQL if it has any relevance for the SQL query.
select p.caption, count(c.id)
from posts p join comments c on p.id = c.post_id
group by p.caption
having count(c.id) > 0
SELECT DISTINCT p.caption, p.id
FROM posts p,
comments c
WHERE c.post_ID = p.ID
I think using a join would be a lot faster than using the IN clause or a subquery.
SELECT DISTINCT caption
FROM posts
INNER JOIN comments ON posts.id = comments.post_id
Forget about counts and subqueries.
The inner join will pick up all the comments that have valid posts and exclude all the posts that have 0 comments. The DISTINCT will collapse the duplicate caption entries for posts that have more than 1 comment.
I find this syntax to be the most readable in this situation:
SELECT * FROM posts P
WHERE EXISTS (SELECT * FROM Comments WHERE post_id = P.id)
It expresses your intent better than most of the others in this thread - "give me all the posts ..." (select * from posts) "... that have any comments" (where exists (select * from comments ... )). It's essentially the same as the joins above, but because you're not actually doing a join, you don't have to worry about getting duplicates of the records in Posts, so you'll just get one record per post.
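The duplicate issue is easy to see on a tiny data set. The sketch below assumes SQLite through Python's sqlite3 module rather than MySQL, and the posts and comments are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT, text TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY, name TEXT, text TEXT, post_id INTEGER
    );
    INSERT INTO posts VALUES (1, 'First', ''), (2, 'Second', ''), (3, 'Third', '');
    -- two comments on post 1, none on post 2, one on post 3
    INSERT INTO comments (name, text, post_id) VALUES
        ('x', 'hi', 1), ('y', 'hey', 1), ('z', 'yo', 3);
""")

# A plain inner join returns one row per comment, so 'First' appears twice.
joined = conn.execute("""
    SELECT p.caption
    FROM posts p
    INNER JOIN comments c ON c.post_id = p.id
    ORDER BY p.id
""").fetchall()

# EXISTS only tests for a matching comment, so each post appears at most once.
with_exists = conn.execute("""
    SELECT p.caption
    FROM posts p
    WHERE EXISTS (SELECT 1 FROM comments c WHERE c.post_id = p.id)
    ORDER BY p.id
""").fetchall()

print(joined)
print(with_exists)
```

With the join, DISTINCT or GROUP BY is needed to get back to one row per post; with EXISTS it is not.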
SELECT caption FROM posts
INNER JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id;
No need for a having clause or count().
edit: Should be an inner join of course (to avoid nulls if a comment is orphaned), thanks to jishi.
Just going off the top of my head here but maybe something like:
SELECT caption FROM posts WHERE id IN (SELECT post_id FROM comments GROUP BY post_id HAVING count(*) > 0)
You're basically looking at performing a subquery --
SELECT p.caption FROM posts p WHERE (SELECT COUNT(*) FROM comments c WHERE c.post_id=p.id) > 0;
This has the effect of running the SELECT COUNT(*) subquery for each row in the posts table. Depending on the size of your tables, you might consider adding an additional column, comment_count, to your posts table to store the number of corresponding comments, so that you can simply do
SELECT p.caption FROM posts p WHERE comment_count > 0