MySQL Left Join with conditional - sql

It seems pretty simple: I have a table 'question' which stores a list of all questions, and a many-to-many table called 'question_answer' which sits between 'question' and 'user'.
Is it possible to do one query that returns all questions in the question table together with the answers a given user has submitted, with NULL values for the questions that user has not answered?
question:
| id | question |
question_answer:
| id | question_id | answer | user_id |
I am doing this query, but the condition means that only the answered questions are returned. Will I need to resort to a nested select?
SELECT * FROM `question` LEFT JOIN `question_answer`
ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id

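If user_id is in the outer-joined table, then the predicate user_id = 14583461 in the WHERE clause filters out every row where user_id is NULL, i.e. the rows for unanswered questions. You could say "user_id = 14583461 or user_id is null", but that still drops any question that only other users have answered; putting the predicate in the join condition avoids this.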

Shouldn't you use RIGHT JOIN?
SELECT * FROM question_answer RIGHT JOIN question ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id

something like this might help (http://pastie.org/1114844)
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varchar(32) not null
)engine=innodb;
drop table if exists question;
create table question
(
question_id int unsigned not null auto_increment primary key,
ques varchar(255) not null
)engine=innodb;
drop table if exists question_ans;
create table question_ans
(
user_id int unsigned not null,
question_id int unsigned not null,
ans varchar(255) not null,
primary key (user_id, question_id)
)engine=innodb;
insert into users (username) values
('user1'),('user2'),('user3'),('user4');
insert into question (ques) values
('question1 ?'),('question2 ?'),('question3 ?');
insert into question_ans (user_id,question_id,ans) values
(1,1,'foo'), (1,2,'mysql'), (1,3,'php'),
(2,1,'bar'), (2,2,'oracle'),
(3,1,'foobar');
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
order by
u.user_id,
q.question_id;
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where
u.user_id = 2
order by
q.question_id;
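A minimal sketch of the cross join approach above, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL (so the engine=innodb clauses and unsigned types are dropped; that substitution is an assumption of this sketch, not part of the original answer). It shows that a user's unanswered questions come back with a NULL answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE question (question_id INTEGER PRIMARY KEY, ques TEXT NOT NULL);
    CREATE TABLE question_ans (
        user_id INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        ans TEXT NOT NULL,
        PRIMARY KEY (user_id, question_id));
    INSERT INTO users (username) VALUES ('user1'), ('user2'), ('user3'), ('user4');
    INSERT INTO question (ques) VALUES ('question1 ?'), ('question2 ?'), ('question3 ?');
    INSERT INTO question_ans (user_id, question_id, ans) VALUES
        (1, 1, 'foo'), (1, 2, 'mysql'), (1, 3, 'php'),
        (2, 1, 'bar'), (2, 2, 'oracle'),
        (3, 1, 'foobar');
""")

# Every (user, question) pair is generated first; the answer is then
# attached where one exists and left as NULL (None in Python) otherwise.
rows = conn.execute("""
    SELECT u.user_id, q.question_id, a.ans
    FROM users u
    CROSS JOIN question q
    LEFT OUTER JOIN question_ans a
        ON a.user_id = u.user_id AND a.question_id = q.question_id
    WHERE u.user_id = ?
    ORDER BY q.question_id
""", (2,)).fetchall()

for row in rows:
    print(row)
```

Filtering on u.user_id in the WHERE clause is safe here because users is on the preserved side of the outer join.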
edit: added some stats/explain plan & runtime:
runtime: 0.031 (10,000 users, 1000 questions, 3.5 million answers)
select count(*) from users
count(*)
========
10000
select count(*) from question
count(*)
========
1000
select count(*) from question_ans
count(*)
========
3682482
explain
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where
u.user_id = 256
order by
u.user_id,
q.question_id;
id  select_type  table  type    possible_keys  key      key_len  ref                         rows  Extra
==  ===========  =====  ====    =============  ===      =======  ===                         ====  =====
1   SIMPLE       u      const   PRIMARY        PRIMARY  4        const                       1     Using filesort
1   SIMPLE       q      ALL                                                                  687
1   SIMPLE       a      eq_ref  PRIMARY        PRIMARY  8        const,foo_db.q.question_id  1

Move the user_id predicate into the join condition. This will then ensure that all rows from question are returned, but only rows from question_answer with the specified user ID and question ID.
SELECT * FROM question
LEFT JOIN question_answer ON question_answer.question_id = question.id
AND user_id = 14583461
ORDER BY user_id, question_id
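To make the difference concrete, here is a small sketch that runs three variants of the query against an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for MySQL here, and the second user (id 99) and the answer values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, question TEXT);
    CREATE TABLE question_answer (
        id INTEGER PRIMARY KEY,
        question_id INTEGER,
        answer TEXT,
        user_id INTEGER);
    INSERT INTO question VALUES (1, 'q1'), (2, 'q2'), (3, 'q3');
    -- User 14583461 answered q1; hypothetical user 99 answered q1 and q2.
    INSERT INTO question_answer (question_id, answer, user_id) VALUES
        (1, 'a', 14583461), (1, 'b', 99), (2, 'c', 99);
""")

BASE = """
    SELECT q.id, qa.answer
    FROM question q
    LEFT JOIN question_answer qa ON qa.question_id = q.id {on_extra}
    {where}
    ORDER BY q.id
"""

def run(on_extra="", where=""):
    return conn.execute(BASE.format(on_extra=on_extra, where=where)).fetchall()

# Predicate in WHERE: only the answered question survives.
where_rows = run(where="WHERE qa.user_id = 14583461")

# Adding OR ... IS NULL: q2 is still lost, because user 99 answered it.
or_null_rows = run(where="WHERE qa.user_id = 14583461 OR qa.user_id IS NULL")

# Predicate in ON: every question is returned, unanswered ones with NULL.
on_rows = run(on_extra="AND qa.user_id = 14583461")

print(where_rows)
print(or_null_rows)
print(on_rows)
```

Only the last variant returns all three questions, with None for the two that user 14583461 has not answered.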

Related

How can I join these tables to get this count?

I have 4 tables :
users
id int primary key
questions
id int primary key
user_id int references users(id)
answers
id int primary key
question_id references questions(id)
user_id references users(id)
likes
id int primary key
answer_id references answers(id)
question_id references questions(id)
check answer_id xor question_id
A like can either reference an answer or a question, but not both so one foreign key will be null.
user_id in the likes tables is the user who placed the like.
How can I count the number of likes that were placed on each user's questions and answers?
If I understand correctly, you need to count the likes each user has earned on their questions and answers together.
If so, then one way is:
select coalesce(questions.user_id, answers.user_id) as liked_user_id, count(*)
from likes
left join questions
on likes.question_id = questions.id
left join answers
on likes.answer_id = answers.id
group by liked_user_id
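As a quick check of the coalesce approach, the sketch below loads a few made-up rows into an in-memory SQLite database through Python's sqlite3 module (the sample data and the use of SQLite are assumptions for illustration) and runs the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE questions (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        question_id INTEGER REFERENCES questions(id),
        user_id INTEGER REFERENCES users(id));
    CREATE TABLE likes (
        id INTEGER PRIMARY KEY,
        answer_id INTEGER REFERENCES answers(id),
        question_id INTEGER REFERENCES questions(id));

    INSERT INTO users VALUES (1), (2);
    INSERT INTO questions VALUES (1, 1), (2, 2);        -- q1 by user 1, q2 by user 2
    INSERT INTO answers VALUES (1, 1, 2), (2, 2, 1);    -- a1 by user 2, a2 by user 1
    -- Each like points at either an answer or a question, never both.
    INSERT INTO likes (answer_id, question_id) VALUES
        (NULL, 1), (NULL, 1), (1, NULL), (2, NULL), (NULL, 2);
""")

# Exactly one of the two left joins matches per like, so COALESCE picks
# the author of whichever item (question or answer) was liked.
likes_per_user = conn.execute("""
    SELECT COALESCE(q.user_id, a.user_id) AS liked_user_id, COUNT(*)
    FROM likes l
    LEFT JOIN questions q ON l.question_id = q.id
    LEFT JOIN answers a ON l.answer_id = a.id
    GROUP BY liked_user_id
    ORDER BY liked_user_id
""").fetchall()

print(likes_per_user)
```

User 1 receives two likes on q1 and one on a2; user 2 receives one on a1 and one on q2. Users with no likes at all do not appear in this result.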
One method uses union all:
select 'questions' as which, count(*)
from questions q join
likes l
on l.question_id = q.id
where q.user_id = $user_id
union all
select 'answers' as which, count(*)
from answers a join
likes l
on l.answer_id = a.id
where a.user_id = $user_id;
EDIT:
If you want the result for all users, with one row per user, then correlated subqueries are a pretty easy method:
select u.*,
(select count(*)
from questions q join
likes l
on l.question_id = q.id
where q.user_id = u.id
) as question_likes,
(select count(*)
from answers a join
likes l
on l.answer_id = a.id
where a.user_id = u.id
) as answer_likes
from users u;

selecting records from parent / child tables

I have parent / child tables that look like this:
CREATE TABLE users(
id integer primary key AUTOINCREMENT,
pnum varchar(10),
dloc varchar(100),
cc varchar(10),
name varchar(255),
group active bit(1)
);
CREATE TABLE group_members(
id integer primary key AUTOINCREMENT,
group_id integer,
member_id integer,
FOREIGN KEY (group_id) REFERENCES users(id),
FOREIGN KEY (member_id) REFERENCES users(id)
);
Users Data looks like:
ID PNUM DLOC CC NAME GRP
86|23101|dloc 89| | |0
87|23101|dloc 90| | |0
88|23102|dloc 91| | |0
590|12345|Group | |Test Group|1
591|90000|dloc 1 | | |0
group_members data looks like:
ID GROUP_ID MEMBER_ID
1 |590 | 87
2 |590 | 88
Based on the PNUM, I would like to be able to get the dloc values for all users, whether it's a group or not.
So for example, if someone requests pnum 23101, I would like to get back
"dloc 89" and
"dloc 90"
But if they request 12345, I would like to get back
"dloc 90", and
"dloc 91"
So far, I have come up with this query:
SELECT users.dloc
FROM users
WHERE users.id IN
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
That works for groups, but it won't return any results if I run the same query with pnum 23101.
What I've tried so far
I tried to see if I could use OR like so:
SELECT users.dloc
FROM users
WHERE users.id in
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
OR
(SELECT users.dloc
FROM users
WHERE pnum='12345')
Any suggestions would be appreciated.
You may use a UNION to execute both queries, one of which matches directly on PNUM for values where GROUP <> 1 and the other which matches on PNUM and joins through group_members for values where GROUP = 1.
In the second part, you need to join twice to users to get the members of the group matching the original PNUM.
Since the GROUP condition is opposite in each, only one part of the UNION will ever return results.
/* First part of UNION directly matches PNUM for GROUP = 0 */
SELECT dloc
FROM users
WHERE
PNUM = 23101
AND `group` <> 1
UNION
/**
Second part of UNION matches GROUP = 1
and joins through group_members back to users
to get member dloc (from the second users join)
*/
SELECT uu.dloc
FROM users u
INNER JOIN group_members m ON u.`GROUP` = 1 AND u.id = m.group_id
INNER JOIN users uu ON m.member_id = uu.id
WHERE
u.PNUM = 23101
This does unfortunately require placing the PNUM value twice in the query, once per UNION part, but that isn't so bad.
Here it is in action (using MySQL rather than SQLite, but that doesn't really matter)
Using your original method with OR and an IN() subquery, it can also be done, but I've added WHERE conditions for GROUP = 0 and GROUP = 1.
SELECT users.dloc
FROM users
WHERE users.id in (
SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='23101' AND `GROUP` = 1
)
OR users.id IN (
SELECT users.ID
FROM users
WHERE pnum='23101' AND `GROUP` = 0
);
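The UNION version can be tried directly in SQLite from Python's sqlite3 module. This is only a sketch: the table is trimmed to the columns the query needs, the flag column is assumed to be named "group", and a named parameter is used so the PNUM value only has to be supplied once even though it appears in both halves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, pnum TEXT, dloc TEXT, "group" INTEGER);
    CREATE TABLE group_members (
        id INTEGER PRIMARY KEY, group_id INTEGER, member_id INTEGER);
    INSERT INTO users VALUES
        (86, '23101', 'dloc 89', 0), (87, '23101', 'dloc 90', 0),
        (88, '23102', 'dloc 91', 0), (590, '12345', 'Group', 1),
        (591, '90000', 'dloc 1', 0);
    INSERT INTO group_members VALUES (1, 590, 87), (2, 590, 88);
""")

def dlocs_for(pnum):
    """Return the dloc values for a pnum, expanding groups to their members."""
    sql = """
        SELECT dloc FROM users WHERE pnum = :pnum AND "group" <> 1
        UNION
        SELECT uu.dloc
        FROM users u
        INNER JOIN group_members m ON u."group" = 1 AND u.id = m.group_id
        INNER JOIN users uu ON m.member_id = uu.id
        WHERE u.pnum = :pnum
        ORDER BY dloc
    """
    return [row[0] for row in conn.execute(sql, {"pnum": pnum})]

print(dlocs_for("23101"))  # plain users matched directly on pnum
print(dlocs_for("12345"))  # group row expanded to its members
```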
And here's the alternative method in action...

Query conditionally return only one row per distinct id

I am making a Reddit clone and I'm having trouble querying my list of posts so that, given a logged-in user, every post shows whether or not that user upvoted it. I made a small example to make things simpler.
I am trying to return only one row per distinct post_id, but prioritize the upvoted column to be t > f > null.
For this example data:
> select * from post;
id
----
1
2
3
> select * from users;
id
----
1
2
> select * from upvoted;
user_id | post_id
---------+---------
1 | 1
2 | 1
If I am given user_id = 1 I want my query to return:
postid | user_upvoted
--------+--------------
1 | t
2 | f
3 | f
Since user1 upvoted post1, upvoted is t. Since user1 did not upvote post2, upvoted is f. Same for post3.
Schema
CREATE TABLE IF NOT EXISTS post (
id bigserial,
PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS users (
id serial,
PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS upvoted (
user_id integer
REFERENCES users(id)
ON DELETE CASCADE ON UPDATE CASCADE,
post_id bigint
REFERENCES post(id)
ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (user_id, post_id)
);
What I tried so far
SELECT post.id as postid,
CASE WHEN user_id=1 THEN true ELSE false END as user_upvoted
FROM post LEFT OUTER JOIN upvoted
ON post_id = post.id;
Which gives me:
postid | user_upvoted
--------+--------------
1 | t
1 | f
2 | f
3 | f
Due to the join, there are two "duplicate" rows that result from the query. I want to prioritize the rows as t > f > null, so I want to keep the 1 | t row.
Full script with schema+data.
You should be able to do this with distinct on:
SELECT distinct on (p.id) p.id as postid,
(CASE WHEN user_id = 1 THEN true ELSE false END) as upvoted
FROM post p LEFT OUTER JOIN
upvoted u
ON u.post_id = p.id
ORDER BY p.id, upvoted desc;
Since the combination (user_id, post_id) is defined unique in upvoted (PRIMARY KEY), this can be much simpler:
SELECT p.id AS post_id, u.post_id IS NOT NULL AS user_upvoted
FROM post p
LEFT JOIN upvoted u ON u.post_id = p.id
AND u.user_id = 1;
Simply add user_id = 1 to the join condition. Makes perfect use of the index and should be simplest and fastest.
You also mention NULL, but there are only two distinct states in the result: true / false.
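Here is a small sketch of the join-condition version using the example data from the question. It runs on an in-memory SQLite database via Python's sqlite3 module instead of PostgreSQL (an assumption made so the example is self-contained), so the boolean column comes back as 1 / 0 rather than t / f.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE upvoted (
        user_id INTEGER REFERENCES users(id),
        post_id INTEGER REFERENCES post(id),
        PRIMARY KEY (user_id, post_id));
    INSERT INTO post VALUES (1), (2), (3);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO upvoted VALUES (1, 1), (2, 1);
""")

def upvote_flags(user_id):
    """One row per post, flagged 1 if the given user upvoted it, else 0."""
    return conn.execute("""
        SELECT p.id AS post_id, u.post_id IS NOT NULL AS user_upvoted
        FROM post p
        LEFT JOIN upvoted u ON u.post_id = p.id AND u.user_id = ?
        ORDER BY p.id
    """, (user_id,)).fetchall()

# The original attempt, with user_id tested in a CASE expression instead of
# the join condition: post 1 appears twice, once for each upvoting user.
duplicated = conn.execute("""
    SELECT post.id, CASE WHEN user_id = 1 THEN 1 ELSE 0 END
    FROM post LEFT OUTER JOIN upvoted ON post_id = post.id
""").fetchall()

print(upvote_flags(1))
print(len(duplicated))
```

Because (user_id, post_id) is the primary key, at most one upvoted row can match each post once the user is fixed, which is why no DISTINCT is needed.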
Alternative approach
On second thought, you might be complicating a very basic task. If you are only interested in posts the current user upvoted, use this simple query instead:
SELECT post_id FROM upvoted WHERE user_id = 1;
All other posts are not upvoted by the given user. It would seem we don't have to list those explicitly.
SQL Fiddle.
The exists() operator yields a boolean value:
SELECT p.id
, EXISTS (SELECT * FROM upvoted x
WHERE x.post_id = p.id
AND x.user_id = 1) AS it_was_upvoted_by_user1
FROM post p
;

List of questions comparison

I have a profile that looks like this:
profile_id | answer_id
----------------------
1 1
1 4
1 10
I have a table which contains a list of responses by poll respondents with structure like this:
user_id | answer_id
-------------------
1 1
1 9
2 1
2 4
2 10
3 14
3 29
How do I select a list of users that gave all of the answers in the profile? In this case only user 2.
You can use the following:
select user_id
from response r
where answer_id in (select distinct answer_id -- get the list of distinct answer_id
from profile
where profile_id = 1) -- add filter if needed
group by user_id -- group by each user
having count(distinct answer_id) = (select count(distinct answer_id) -- verify the user has the distinct count
from profile
where profile_id = 1) -- add filter if needed
See SQL Fiddle with Demo
Or another way to write this is:
select user_id
from response r
where answer_id in (1, 4, 10)
group by user_id
having count(distinct answer_id) = 3
See SQL Fiddle with Demo
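For a runnable check of the count-based query, the sketch below loads the question's sample rows into an in-memory SQLite database via Python's sqlite3 module. The table names profile and response follow the first query above; using SQLite is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (profile_id INTEGER, answer_id INTEGER);
    CREATE TABLE response (user_id INTEGER, answer_id INTEGER);
    INSERT INTO profile VALUES (1, 1), (1, 4), (1, 10);
    INSERT INTO response VALUES
        (1, 1), (1, 9),
        (2, 1), (2, 4), (2, 10),
        (3, 14), (3, 29);
""")

def users_matching_profile(profile_id):
    """Users whose responses include every answer_id in the given profile."""
    sql = """
        SELECT user_id
        FROM response
        WHERE answer_id IN (SELECT answer_id FROM profile WHERE profile_id = :pid)
        GROUP BY user_id
        HAVING COUNT(DISTINCT answer_id) =
               (SELECT COUNT(DISTINCT answer_id) FROM profile WHERE profile_id = :pid)
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, {"pid": profile_id})]

print(users_matching_profile(1))
```

User 1 matches only one of the three profile answers and user 3 matches none, so only user 2 passes the HAVING test.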
This is an example of a join query with an aggregation:
select a.user_id
from profile p full outer join
answers a
on p.answer_id = a.answer_id and
p.profile_id = 1
group by a.user_id
having count(p.profile_id) = count(*) and
count(a.user_id) = count(*)
The full outer join matches all the profile rows to all the answers. If the two sets completely match, then there are no NULLs in the ids of the other set. The having clause checks for just this condition. Note that because the rows are grouped by a.user_id, this only verifies that every answer the user gave is in the profile, so a user who gave just some of the profile's answers and nothing else would also pass.
SELECT DISTINCT user_id
FROM user_answer
WHERE user_id in (SELECT user_id FROM user_answer WHERE answer_id = 1) AND
user_id in (SELECT user_id FROM user_answer WHERE answer_id = 4) AND
user_id in (SELECT user_id FROM user_answer WHERE answer_id = 10)
SELECT *
FROM table1
INNER JOIN table2
ON table1.answer_id = table2.answer_id
WHERE table2.user_id = 2
I think this might be what you're looking for.

mysql "group by" very slow query

I have this query on a table with about 100k records. It runs quite slowly (3-4s); when I take out the GROUP BY it's much faster (less than 0.5s). I'm at a loss as to how to fix this:
SELECT msg.id,
msg.thread_id,
msg.senderid,
msg.recipientid,
from_user.username AS from_name,
to_user.username AS to_name
FROM msgtable AS msg
LEFT JOIN usertable AS from_user ON msg.senderid = from_user.id
LEFT JOIN usertable AS to_user ON msg.recipientid = to_user.id
GROUP BY msg.thread_id
ORDER BY msg.id desc
msgtable has indexes on thread_id, id, senderid and recipientid.
explain returns:
id  select_type  table      type    possible_keys  key      key_len  ref                 rows    Extra
1   SIMPLE       msg        ALL     NULL           NULL     NULL     NULL                162346  Using temporary; Using filesort
1   SIMPLE       from_user  eq_ref  PRIMARY        PRIMARY  4        db.msg.senderid     1
1   SIMPLE       to_user    eq_ref  PRIMARY        PRIMARY  4        db.msg.recipientid  1
Any ideas how to speed this up while returning the same result? (There are multiple messages per thread; I want to return only one message per thread in this query.)
thanks in advance.
try this:
select m.thread_id, m.id, m.senderid, m.recipientid,
f.username as from_name, t.username as to_name
from msgtable m
join usertable f on m.senderid = f.id
join usertable t on m.recipientid = t.id
where m.id = (select MAX(id) from msgtable where thread_id = m.thread_id)
Or this:
select m.thread_id, m.id, m.senderid, m.recipientid,
(select username from usertable where id = m.senderid) as from_name,
(select username from usertable where id = m.recipientid) as to_name
from msgtable m
where m.id = (select MAX(id) from msgtable where thread_id = m.thread_id)
Why were the user tables left joined? Can a message be missing a from or to?..
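The first rewrite can be sketched on a tiny data set using an in-memory SQLite database from Python's sqlite3 module. The usernames, the message rows, and the composite index on (thread_id, id) are all assumptions for illustration; the index is the kind that typically lets the correlated MAX(id) lookup avoid scanning the whole table, though whether it helps on the real MySQL table would need to be confirmed with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usertable (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE msgtable (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER,
        senderid INTEGER,
        recipientid INTEGER);
    -- Hypothetical index supporting "newest id within a thread" lookups.
    CREATE INDEX idx_msg_thread_id ON msgtable (thread_id, id);

    INSERT INTO usertable VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO msgtable VALUES
        (1, 10, 1, 2),   -- thread 10, first message
        (2, 10, 2, 1),   -- thread 10, reply (newest in thread 10)
        (3, 20, 1, 2);   -- thread 20, only message
""")

# Keep only the newest message of each thread instead of relying on
# GROUP BY to pick an arbitrary row per thread.
latest = conn.execute("""
    SELECT m.thread_id, m.id, f.username AS from_name, t.username AS to_name
    FROM msgtable m
    JOIN usertable f ON m.senderid = f.id
    JOIN usertable t ON m.recipientid = t.id
    WHERE m.id = (SELECT MAX(id) FROM msgtable WHERE thread_id = m.thread_id)
    ORDER BY m.id DESC
""").fetchall()

print(latest)
```

Unlike the original GROUP BY query, this makes the choice of row per thread explicit: it is always the message with the highest id.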
The biggest problem is that you have no usable indexes on msgtable. Create an index on at least senderid and recipientid, and it should help the speed of your query, as it will limit the number of results needing to be scanned.