SQL - ordering table by information from multiple tables

Title of the question may not have been very clear - I am not really sure how to name this question, but I hope that my explanation will make my problem clearer.
I have 3 tables:
[1] score
id rating_type
1 UPVOTE
2 UPVOTE
3 DOWNVOTE
4 UPVOTE
5 DOWNVOTE
6 DOWNVOTE
[2] post_score
post_id score_id
1 1
1 2
1 3
2 4
2 5
2 6
and [3] post
id title
1 title1
2 title2
My goal is to order [3] post table by score.
Assume UPVOTE has a value of 1 and DOWNVOTE a value of -1. In this example, the post with id = 1 has 3 scores related to it, with the values UPVOTE, UPVOTE, DOWNVOTE, making the "numeric score" of this post 1 (1 + 1 - 1);
likewise, the post with id = 2 also has 3 scores, with the values UPVOTE, DOWNVOTE, DOWNVOTE, making its "numeric score" -1 (1 - 1 - 1).
How would I order post table by this score? In this example, if I ordered by score asc, I would expect the following result:
id title
2 title2
1 title1
My attempts didn't go far, I am stuck here with this query currently, which doesn't really do anything useful yet:
WITH fullScoreInformation AS (
SELECT * FROM score s
JOIN post_score ps ON s.id = ps.score_id),
upvotes AS (SELECT * FROM fullScoreInformation WHERE rating_type = 'UPVOTE'),
downvotes AS (SELECT * FROM fullScoreInformation WHERE rating_type = 'DOWNVOTE')
SELECT p.id, rating_type, title FROM post p JOIN fullScoreInformation fsi on p.id = fsi.post_id
I am using PostgreSQL. Queries will be used in my Spring Boot application (I normally use native queries).
Perhaps this data structure is bad and I should have constructed my entities differently ?

"My goal is to order post table by score. Assume UPVOTE represents value of 1 and DOWNVOTE value of -1"
One option uses a subquery to count the upvotes and downvotes of each post:
select p.*, s.*
from post p
cross join lateral (
select
count(*) filter(where s.rating_type = 'UPVOTE' ) as cnt_up,
count(*) filter(where s.rating_type = 'DOWNVOTE') as cnt_down
from post_score ps
inner join score s on s.id = ps.score_id
where ps.post_id = p.id
) s
order by s.cnt_up - s.cnt_down desc
"Perhaps this data structure is bad and I should have constructed my entities differently?"
As it stands, I don't see the need for two distinct tables post_score and score. For the data you have shown, this is a 1-1 relationship, so a single table storing the post id and the rating type should be sufficient.

You'd better use a LEFT JOIN, otherwise you won't get posts that have no votes yet. Then aggregate to get the filtered sums of the scores. Wrap each sum in coalesce(), so that a post with no upvotes, no downvotes, or no votes at all contributes 0 instead of NULL, then add the two sums and order by the result.
SELECT p.id,
p.title
FROM post p
LEFT JOIN post_score ps
ON ps.post_id = p.id
LEFT JOIN score s
ON s.id = ps.score_id
GROUP BY p.id,
p.title
ORDER BY coalesce(sum(1) FILTER (WHERE s.rating_type = 'UPVOTE'), 0)
+
coalesce(sum(-1) FILTER (WHERE s.rating_type = 'DOWNVOTE'), 0);
I second GMB's comment about the superfluous table.
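To make the ordering easy to check, here is a small sketch that loads the question's sample data into an in-memory SQLite database from Python and sorts posts by net score. SQLite stands in for PostgreSQL purely for illustration, a conditional SUM(CASE ...) replaces the FILTER clause for portability, and post 3 ("title3") is a hypothetical row added only to show how a post without votes behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score (id INTEGER PRIMARY KEY, rating_type TEXT);
    CREATE TABLE post_score (post_id INTEGER, score_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO score VALUES (1,'UPVOTE'),(2,'UPVOTE'),(3,'DOWNVOTE'),
                             (4,'UPVOTE'),(5,'DOWNVOTE'),(6,'DOWNVOTE');
    INSERT INTO post_score VALUES (1,1),(1,2),(1,3),(2,4),(2,5),(2,6);
    -- post 3 is hypothetical: it has no votes at all
    INSERT INTO post VALUES (1,'title1'),(2,'title2'),(3,'title3');
""")

# LEFT JOINs keep posts without votes; COALESCE turns their NULL sum into 0.
ORDER_BY_NET_SCORE = """
    SELECT p.id, p.title,
           COALESCE(SUM(CASE s.rating_type
                            WHEN 'UPVOTE' THEN 1
                            WHEN 'DOWNVOTE' THEN -1
                        END), 0) AS net_score
    FROM post p
    LEFT JOIN post_score ps ON ps.post_id = p.id
    LEFT JOIN score s ON s.id = ps.score_id
    GROUP BY p.id, p.title
    ORDER BY net_score ASC, p.id
"""

posts_by_score = conn.execute(ORDER_BY_NET_SCORE).fetchall()
for row in posts_by_score:
    print(row)
```

Post 2 sorts first with a net score of -1, the unvoted post follows with 0, and post 1 comes last with 1.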

Related

Why does my SQL SELECT query output only 1 result

The SELECT query below sometimes outputs only 1 result. Does anyone know why?
SELECT * FROM people
WHERE id = (SELECT person_id FROM stars
WHERE movie_id = (SELECT id FROM movies
WHERE year = 2004))
ORDER BY birth;
You should avoid subqueries here: = compares id against a single value, so it cannot match every person the subquery returns. Your best bet is to use something like the following code:
SELECT
ppl.* -- to get just people information
FROM
people ppl,
stars sta,
movies mov
WHERE
ppl.id = sta.person_id
AND sta.movie_id = mov.id
AND mov.YEAR = 2004
ORDER BY
ppl.birth;
If you also want stars or movie information, just add the desired fields to the select list, such as mov.title (assuming you have a column named title on the movies table :P)
EDIT:
As pointed out, I will leave an example using JOIN also.
SELECT
ppl.* -- to get just people information
FROM
people AS ppl
INNER JOIN
stars AS sta ON ppl.id = sta.person_id
INNER JOIN
movies AS mov ON sta.movie_id = mov.id
WHERE
mov.YEAR = 2004
ORDER BY
ppl.birth;
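The difference between = and IN against a multi-row subquery can be reproduced with a few made-up rows. This sketch assumes SQLite, where a subquery used as a scalar value yields only its first row (several other databases raise an error instead); the table contents are hypothetical and only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birth INTEGER);
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE stars (movie_id INTEGER, person_id INTEGER);
    -- hypothetical sample rows: two movies from 2004, one star each
    INSERT INTO people VALUES (1,'Ann',1970),(2,'Bob',1965),(3,'Cy',1980);
    INSERT INTO movies VALUES (10,'A',2004),(11,'B',2004),(12,'C',1999);
    INSERT INTO stars VALUES (10,1),(11,2),(12,3);
""")

# '=' treats each subquery as a single value, so only one movie is considered.
eq_rows = conn.execute("""
    SELECT name FROM people
    WHERE id = (SELECT person_id FROM stars
                WHERE movie_id = (SELECT id FROM movies WHERE year = 2004))
    ORDER BY birth
""").fetchall()

# 'IN' tests membership in the whole result set of each subquery.
in_rows = conn.execute("""
    SELECT name FROM people
    WHERE id IN (SELECT person_id FROM stars
                 WHERE movie_id IN (SELECT id FROM movies WHERE year = 2004))
    ORDER BY birth
""").fetchall()

print(len(eq_rows), in_rows)
```

With IN, both stars of 2004 movies come back, ordered by birth year; with =, only one of them does.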

How to Limit Results Per Match on a Left Join - SQL Server

I have a table with student info [STU] and a table with parent info [PAR]. I want to return an email address for each student, but just one. So I run this query:
SELECT [STU].[ID], [PAR].[EM]
FROM (SELECT [STU].* FROM DB1.STU)
STU LEFT JOIN (SELECT [PAR].* FROM DB1.PAR) PAR ON [STU].[ID] = [PAR].[ID]
This gives me the below table:
Student ID ParentEmail
1 jim@email.com
1 sarah@email.com
2 paul@email.com
2 tim@email.com
3 bill@email.com
3 frank@email.com
3 joyce@email.com
4 greg@email.com
5 tony@email.com
5 sam@email.com
Each student has multiple parent emails, but I only want one. In other words, I want the output to look like this:
Student ID ParentEmail
1 jim@email.com
2 paul@email.com
3 frank@email.com
4 greg@email.com
5 sam@email.com
I've tried so many things. I've tried using GROUP BY and MIN/MAX and I've tried complex CASE statements, and I've tried COALESCE but I just can't seem to figure it out.
I think OUTER APPLY is the simplest method:
SELECT [STU].[ID], [PAR].[EM]
FROM DB1.STU OUTER APPLY
(SELECT TOP (1) [PAR].*
FROM DB1.PAR
WHERE [STU].[ID] = [PAR].[ID]
) PAR;
Normally, there would be an ORDER BY in the subquery, to give you control over which email you want -- the longest, shortest, oldest, or whatever. Without an ORDER BY it returns just one email, which is what you are asking for.
If you just want one column from the parent table, a simple approach is a correlated subquery:
select
s.id student_id,
(select max(p.em) from db1.par p where p.id = s.id) parent_email
from db1.stu s
This gives you the greatest parent email per student.
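As a rough illustration of controlling which email is picked, the sketch below uses a correlated subquery with ORDER BY ... LIMIT 1, the SQLite counterpart of TOP (1) with an ORDER BY. It runs against in-memory SQLite rather than SQL Server, uses part of the question's sample data, and adds a hypothetical student 6 without any parent rows to show the NULL case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stu (id INTEGER PRIMARY KEY);
    CREATE TABLE par (id INTEGER, em TEXT);
    -- student 6 is hypothetical and has no parent rows
    INSERT INTO stu VALUES (1),(2),(3),(6);
    INSERT INTO par VALUES
        (1,'jim@email.com'),(1,'sarah@email.com'),
        (2,'paul@email.com'),(2,'tim@email.com'),
        (3,'bill@email.com'),(3,'frank@email.com'),(3,'joyce@email.com');
""")

# The ORDER BY inside the subquery decides which email wins;
# here it is the alphabetically first one per student.
one_email_per_student = conn.execute("""
    SELECT s.id AS student_id,
           (SELECT p.em
            FROM par p
            WHERE p.id = s.id
            ORDER BY p.em
            LIMIT 1) AS parent_email
    FROM stu s
    ORDER BY s.id
""").fetchall()

for row in one_email_per_student:
    print(row)
```

Changing the ORDER BY expression (for example to a length or date column, if one exists) changes which email is kept, while students without parents still appear with a NULL email.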

Multiple rows get only specific values

I feel like this should be a rather simple problem, yet I'm struggling to find the solution.
We have three tables that make up a question-and-answer system. One holds the questions, another the answers, and the third stores the users' selections.
Question table
QuestionID Question
1 What is your favorite color?
2 Where were you born?
Answer table
AnswerID QuestionID Answer
1 1 Blue
2 1 Green
3 1 Yellow
4 2 USA
5 2 Africa
Answer stored table
AnswerStoreID QuestionID AnswerID UserID
1 1 1 1
2 1 2 1
3 2 4 2
4 2 5 2
5 1 1 3
I want to find the UserID that answered QuestionID 1 as AnswerID 1 AND QuestionID 2 as AnswerID 4.
Thought it would be simple like this
SELECT UserID
FROM Question Q
INNER JOIN Answer A ON A.QuestionID = A.QuestionID
INNER JOIN AnswerStore AS ON AS.AnswerID = A.AnswerID
WHERE (AS.AnswerID = 1 AND AS.QuestionID = 1)
AND (AS.AnswerID = 2 AND AS.QuestionID = 4)
That returns nothing, though. Replacing the AND between the two conditions with an OR returns users who gave only one of those answers, which is not what I want either. I want only those users who answered both of these questions.
I then did a query with some various joins to do a query per question but feel that is too complicated and heavy for this problem and I'm overthinking it. Is there an easier solution to this problem?
---- Edit ----
Actually, you don't even need the JOINs in your original query:
SELECT t.UserID
FROM AnswerStore AS t
WHERE (t.AnswerID = 1 AND t.QuestionID = 1)
OR (t.AnswerID = 4 AND t.QuestionID = 2)
GROUP BY t.UserID
HAVING COUNT(*) = 2
---- Original Full Answer ----
This is actually a fairly common question, that appears a couple times a week. Unfortunately, it is really hard to formulate a repeatable/searchable question to reference for it.
SELECT S.UserID
FROM Question Q
INNER JOIN Answer A ON A.QuestionID = Q.QuestionID
INNER JOIN AnswerStore S ON S.AnswerID = A.AnswerID
WHERE (S.AnswerID = 1 AND S.QuestionID = 1)
OR (S.AnswerID = 4 AND S.QuestionID = 2)
GROUP BY S.UserID
HAVING COUNT(*) = 2
The general form is:
SELECT A.a_id
FROM A
INNER JOIN B ON A.a_id = B.a_id
WHERE B.something IN ([list])
GROUP BY A.a_id
HAVING COUNT(*) = [length of list]
-- or in cases where B matches may be non-unique
-- HAVING COUNT(DISTINCT B.something) = [length of list]
You are really looking at two sets of data, UserIDs that answered QuestionID 1 as AnswerID 1, and UserIDs that answered QuestionID 2 as AnswerID 4. So you can join the sets together to find UserIDs that are in both sets of data:
SELECT as1.UserID
FROM AnswerStore as1 INNER JOIN AnswerStore as2 ON as1.UserID = as2.UserID
AND as1.QuestionID = 1 AND as1.AnswerID = 1
AND as2.QuestionID = 2 AND as2.AnswerID = 4
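The GROUP BY / HAVING pattern can be tried out with the sample data in an in-memory SQLite database. In this sketch the helper name users_with_all_answers is made up for illustration, and user 4 is a hypothetical addition, since no user in the original sample data answered both questions as required. The helper assumes each question appears at most once in the list of wanted pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AnswerStore (
        AnswerStoreID INTEGER PRIMARY KEY,
        QuestionID INTEGER, AnswerID INTEGER, UserID INTEGER);
    INSERT INTO AnswerStore VALUES
        (1,1,1,1),(2,1,2,1),(3,2,4,2),(4,2,5,2),(5,1,1,3),
        -- hypothetical user 4 gave both wanted answers
        (6,1,1,4),(7,2,4,4);
""")

def users_with_all_answers(conn, pairs):
    """Return UserIDs that stored every (QuestionID, AnswerID) pair in `pairs`."""
    where = " OR ".join("(QuestionID = ? AND AnswerID = ?)" for _ in pairs)
    params = [value for pair in pairs for value in pair]
    sql = f"""
        SELECT UserID
        FROM AnswerStore
        WHERE {where}
        GROUP BY UserID
        HAVING COUNT(DISTINCT QuestionID) = ?
        ORDER BY UserID
    """
    return [row[0] for row in conn.execute(sql, params + [len(pairs)])]

both = users_with_all_answers(conn, [(1, 1), (2, 4)])
only_first = users_with_all_answers(conn, [(1, 1)])
print(both, only_first)
```

The OR keeps every row that matches any wanted pair, and the HAVING clause then keeps only users whose matching rows cover all of the wanted questions.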

SQL Sampling: one element from each bucket

Here is a simulation of the basic setup I have: each person can hold multiple possessions.
Persons table:
id name
1 Carl
2 Sam
3 Tom
4 Jack
Possessions table:
possession personId
car 2
shoes 2
shovel 2
tent 3
matches 3
axe 4
I want to generate a random set of possessions belonging to a random set of people, one possession per person.
So, in a non-SQL world I would generate a set of N random people and then pick a random possession for each person in the set. But how do I implement that in SQL semantics?
I thought of getting a random sample of possessions with some variation of:
SELECT * FROM Posessions WHERE 0.01 >= RAND()
And then filtering out duplicate persons, but that is no good as it will favor persons with large number of possessions in the end, and I want each person to have equal chance of being selected.
Is there a canonical way to solve this?
P.S. Person contains ~50000 entities and Possession contains ~2500000 entities, but I only need to perform this sampling once, so it can be somewhat slow.
Why not take a random set of persons and join it to the possessions ranked in random order? Something like the query below. Apologies if it contains any errors; I don't have a database to check it against right now:
select * from (
(select top 1 percent * from persons order by newid()) a
inner join
(select p.*, ROW_NUMBER() OVER (partition by personId order by newid()) r from posessions p) b
on (a.id = b.personId)
)
where r = 1;
One way would be (for 2 persons below and one possession per person)
DECLARE @PeopleCount INT = 2,
@PossessionsPerPersonCount INT = 1;
SELECT *
FROM (SELECT TOP (@PeopleCount) *
FROM Persons
ORDER BY CRYPT_GEN_RANDOM(4)) RandomPersons
OUTER APPLY (SELECT TOP (@PossessionsPerPersonCount) * FROM Posessions p
WHERE RandomPersons.id = p.personId
ORDER BY CRYPT_GEN_RANDOM(4)) RandomPosessions
Hopefully Possession has an index on personId so that it can seek into the relevant rows per person (average 50) rather than scanning all 2,500,000 in the table for each person.
I've used OUTER APPLY above as not all the people in your example data have possessions (i.e. Carl doesn't).
If you only want to include people with possessions and want one possession per person you can use this instead.
DECLARE @PeopleCount INT = 2;
SELECT TOP (@PeopleCount) *
FROM Persons
CROSS APPLY (SELECT TOP (1) * FROM Posessions p
WHERE Persons.id = p.personId
ORDER BY CRYPT_GEN_RANDOM(4)) RandomPosessions
ORDER BY CRYPT_GEN_RANDOM(4);
The following query (MySQL syntax) will generate 3 random samples for you:
SELECT p.personId,
(SELECT possession FROM posessions p1 WHERE p1.personId = p.personId ORDER BY RAND() LIMIT 1) AS possession
FROM posessions p
GROUP BY p.personId
ORDER BY RAND()
LIMIT 3
The subquery picks a random possession for each person, while the outer query picks the random persons.
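The two-stage idea (pick random persons, then one random possession for each) can be sketched against in-memory SQLite, which uses RANDOM() and LIMIT instead of NEWID() and TOP. The function name sample_possessions is made up for illustration, and the table name keeps the Posessions spelling used in the queries above. Because the persons are shuffled before any possession is chosen, owning many possessions does not raise a person's chance of being selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Posessions (possession TEXT, personId INTEGER);
    INSERT INTO Persons VALUES (1,'Carl'),(2,'Sam'),(3,'Tom'),(4,'Jack');
    INSERT INTO Posessions VALUES
        ('car',2),('shoes',2),('shovel',2),
        ('tent',3),('matches',3),('axe',4);
""")

def sample_possessions(conn, people_count):
    """Pick `people_count` random owners and one random possession for each."""
    sql = """
        SELECT p.id,
               (SELECT po.possession
                FROM Posessions po
                WHERE po.personId = p.id
                ORDER BY RANDOM()
                LIMIT 1) AS possession
        FROM Persons p
        -- skip people who own nothing (Carl in the sample data)
        WHERE EXISTS (SELECT 1 FROM Posessions po WHERE po.personId = p.id)
        ORDER BY RANDOM()
        LIMIT ?
    """
    return conn.execute(sql, (people_count,)).fetchall()

picked = sample_possessions(conn, 2)
print(picked)
```

The output differs from run to run, but it always contains two distinct owners, each paired with one of their own possessions.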

Text search of a many-to-many data relation

I know this must have been answered before here, but I simply can't find a matching question.
Using a LIKE '%keyword%', I want to search a many-to-many data relationship in a MSSQL database and reduce it to a one-to-one result set. The two tables are joined through a linking table. Here's a very simplified version of what I'm talking about:
Books:
book_id title
1 Treasure Island
2 Poe Collected Stories
3 Invest in Treasure Islands
Categories:
category_id name
1 Children
2 Adventure
3 Horror
4 Classic
5 Money
BookCategory:
book_id category_id
1 1
1 2
1 4
2 3
2 4
3 5
What I want to do is search for a phrase in the title (e.g. '%treasure island%') and get matching Books records that contain the search string and the single highest matching Categories record that goes with each book -- I want to discard the lesser category records. In other words, I'm looking for this:
book_id title category_id name
1 Treasure Island 4 Classic
3 Invest in Treasure Islands 5 Money
Any suggestions?
Try this. Filter your lookup table, then join:
With maxCategories AS
(select book_id, max(category_id) as category_id from BookCategory group by book_id)
select Books.book_id, Books.Title, Categories.category_id, Categories.name
from Books
inner join maxCategories on (Books.book_id = maxCategories.book_id)
inner join Categories on (Categories.category_id = maxCategories.category_id)
where Books.title like '%treasure island%'
Try:
select * from
(select b.*,
c.*,
row_number() over (partition by bc.book_id
order by bc.category_id desc) rn
from Books b
join BookCategory bc on b.book_id = bc.book_id
join Categories c on bc.category_id = c.category_id
where b.title like '%treasure island%') sq
where rn=1
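For a quick check of the expected output, the CTE approach can be run against the sample data in an in-memory SQLite database. This is only a sketch: SQLite stands in for SQL Server, and it relies on SQLite's LIKE being case-insensitive for ASCII text by default, which roughly matches the behaviour of a typical case-insensitive SQL Server collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE BookCategory (book_id INTEGER, category_id INTEGER);
    INSERT INTO Books VALUES
        (1,'Treasure Island'),(2,'Poe Collected Stories'),
        (3,'Invest in Treasure Islands');
    INSERT INTO Categories VALUES
        (1,'Children'),(2,'Adventure'),(3,'Horror'),(4,'Classic'),(5,'Money');
    INSERT INTO BookCategory VALUES (1,1),(1,2),(1,4),(2,3),(2,4),(3,5);
""")

# Reduce the link table to one row per book (its highest category_id),
# then join it to both sides and filter on the title.
matches = conn.execute("""
    WITH maxCategories AS (
        SELECT book_id, MAX(category_id) AS category_id
        FROM BookCategory
        GROUP BY book_id
    )
    SELECT b.book_id, b.title, c.category_id, c.name
    FROM Books b
    JOIN maxCategories mc ON mc.book_id = b.book_id
    JOIN Categories c ON c.category_id = mc.category_id
    WHERE b.title LIKE ?
    ORDER BY b.book_id
""", ("%treasure island%",)).fetchall()

for row in matches:
    print(row)
```

The result contains one row per matching book, each carrying only its highest category, which is the table asked for in the question.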