selecting records from parent / child tables - sql

I have parent / child tables that look like this:
CREATE TABLE users(
id integer primary key AUTOINCREMENT,
pnum varchar(10),
dloc varchar(100),
cc varchar(10),
name varchar(255),
`group` bit(1)
);
CREATE TABLE group_members(
id integer primary key AUTOINCREMENT,
group_id integer,
member_id integer,
FOREIGN KEY (group_id) REFERENCES users(id),
FOREIGN KEY (member_id) REFERENCES users(id)
);
Users Data looks like:
ID PNUM DLOC CC NAME GRP
86|23101|dloc 89| | |0
87|23101|dloc 90| | |0
88|23102|dloc 91| | |0
590|12345|Group | |Test Group|1
591|90000|dloc 1 | | |0
group_members data looks like:
ID GROUP_ID MEMBER_ID
1 |590 | 87
2 |590 | 88
Based on the PNUM, I would like to be able to get the dloc values for all users, whether it's a group or not.
So for example, if someone requests pnum 23101, I would like to get back
"dloc 89" and
"dloc 90"
But if they request 12345, I would like to get back
"dloc 90", and
"dloc 91"
So far, I have come up with this query:
SELECT users.dloc
FROM users
WHERE users.id IN
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
That works for groups, but it won't return any results if I run the same query with pnum 23101.
What I've tried so far
I tried to see if I could use OR like so:
SELECT users.dloc
FROM users
WHERE users.id in
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
OR
(SELECT users.dloc
FROM users
WHERE pnum='12345')
Any suggestions would be appreciated.

You may use a UNION to execute both queries, one of which matches directly on PNUM for values where GROUP <> 1 and the other which matches on PNUM and joins through group_members for values where GROUP = 1.
In the second part, you need to join twice to users to get the members of the group matching the original PNUM.
Since the GROUP condition is opposite in each, only one part of the UNION will ever return results.
/* First part of UNION directly matches PNUM for GROUP = 0 */
SELECT dloc
FROM users
WHERE
PNUM = 23101
AND `group` <> 1
UNION
/**
Second part of UNION matches GROUP = 1
and joins through group_members back to users
to get member dloc (from the second users join)
*/
SELECT uu.dloc
FROM users u
INNER JOIN group_members m ON u.`GROUP` = 1 AND u.id = m.group_id
INNER JOIN users uu ON m.member_id = uu.id
WHERE
u.PNUM = 23101
This does unfortunately require placing the PNUM value twice in the query, once per UNION part, but that isn't so bad.
Here it is in action (using MySQL rather than SQLite, but that doesn't really matter)
Using your original method with OR and an IN() subquery, it can also be done, but I've added WHERE conditions for GROUP = 0 and GROUP = 1.
SELECT users.dloc
FROM users
WHERE users.id in (
SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='23101' AND `GROUP` = 1
)
OR users.id IN (
SELECT users.ID
FROM users
WHERE pnum='23101' AND `GROUP` = 0
);
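As a minimal sketch of the UNION approach, the snippet below loads the sample rows from the question into an in-memory SQLite database and runs the query through Python's sqlite3 module. The function name dlocs_for_pnum and the INTEGER/TEXT column types are assumptions made for illustration. A named parameter is used so that the PNUM value only has to be supplied once, even though it appears in both parts of the UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, pnum TEXT, dloc TEXT, "group" INTEGER);
    CREATE TABLE group_members (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        group_id INTEGER REFERENCES users(id),
        member_id INTEGER REFERENCES users(id));
    INSERT INTO users VALUES
        (86, '23101', 'dloc 89', 0), (87, '23101', 'dloc 90', 0),
        (88, '23102', 'dloc 91', 0), (590, '12345', 'Group', 1),
        (591, '90000', 'dloc 1', 0);
    INSERT INTO group_members (group_id, member_id) VALUES (590, 87), (590, 88);
""")

def dlocs_for_pnum(conn, pnum):
    """Return dloc values for a pnum, expanding a group to its members."""
    sql = """
        -- direct match for non-group rows
        SELECT dloc FROM users
        WHERE pnum = :pnum AND "group" <> 1
        UNION
        -- group row: follow group_members back to the member rows
        SELECT uu.dloc
        FROM users u
        JOIN group_members m ON u.id = m.group_id
        JOIN users uu ON uu.id = m.member_id
        WHERE u.pnum = :pnum AND u."group" = 1
        ORDER BY 1
    """
    return [row[0] for row in conn.execute(sql, {"pnum": pnum})]

print(dlocs_for_pnum(conn, "23101"))  # ['dloc 89', 'dloc 90']
print(dlocs_for_pnum(conn, "12345"))  # ['dloc 90', 'dloc 91']
```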

Related

Postgres many to one relationship join multiple tables and select all rows, provided that at least one row matches some criteria

Suppose I have a schema something like
create table if not exists user (
id serial primary key,
name text not null
);
create table if not exists post (
id serial primary key,
user_id integer not null references user (id),
score integer not null
)
I want to run a query that selects a row from the user table by ID, and all the rows that reference it from the post table, provided that at least one row in the post table has a score of greater than some number n (e.g. 50). I'm not exactly sure how to do this though.
You can use window functions. Let me assume that post has a user_id column so the tables can be tied together:
select u.*
from user u join
(select p.*, max(score) over (partition by user_id) as max_score
from post p
) p
on p.user_id = u.id
where p.max_score > 50;
If you just wanted all scores, then aggregation with filtering might be sufficient:
select u.*, array_agg(p.score order by p.score desc)
from user u join
post p
on p.user_id = u.id
group by u.id
having max(p.score) > 50;
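For a runnable illustration of the same filtering idea, here is a sketch that returns each qualifying user together with all of their posts. It uses a grouped subquery with HAVING rather than a window function, and it runs on SQLite through Python's sqlite3 module only so that it can be tried without a server. The table name app_user (chosen to avoid quoting the reserved word user in PostgreSQL), the sample rows, and the function name are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES app_user(id),
        score INTEGER NOT NULL);
    INSERT INTO app_user VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1, 1, 10), (2, 1, 60), (3, 2, 40);
""")

def users_with_high_post(conn, n):
    """Rows of (user id, name, post id, score) for users having any post scoring above n."""
    sql = """
        SELECT u.id, u.name, p.id, p.score
        FROM app_user u
        JOIN post p ON p.user_id = u.id
        WHERE u.id IN (
            SELECT user_id
            FROM post
            GROUP BY user_id
            HAVING MAX(score) > ?
        )
        ORDER BY u.id, p.id
    """
    return conn.execute(sql, (n,)).fetchall()

print(users_with_high_post(conn, 50))  # [(1, 'ann', 1, 10), (1, 'ann', 2, 60)]
```

Note that the low-scoring post of user 1 is still returned, because the condition applies to the user as a whole and not to each row.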

Query to get record id if exists in any table

I have these 4 tables; they store user items.
Each item is unique and can exist in only one table at a time.
I am searching by the serial column.
users:
======
id
name
user_bags:
==========
id
user_id
serial
user_store:
===========
id
user_id
serial
user_storage:
============
id
user_id
serial
I have a list of items and need to find whether each one is in any of these tables, showing the id of the matching record in that table.
user_name | user_bags | user_store | user_storage |
==================================================================
A | 2390 | | |
------------------------------------------------------------------
B | | 352 | |
------------------------------------------------------------------
A | 5500 | | |
------------------------------------------------------------------
C | | | 6440 |
------------------------------------------------------------------
I tried this:
SELECT
users.name AS user_name,
(SELECT id FROM user_bags WHERE user_bags.serial = 'abc' AND user_bags.user_id = users.id) AS user_bags,
(SELECT id FROM user_storage WHERE user_storage.serial = 'abc' AND user_storage.user_id = users.id) AS user_storage,
(SELECT id FROM user_store WHERE user_store.serial = 'abc' AND user_store.user_id = users.id) AS user_store
FROM
users
How do I write a better (faster) and proper query? I will be looking up several thousand serials, and there are millions of records in each table at a time.
Update: only show users for whom a match was found.
Your query is fine. For performance, you want indexes on (user_id, serial, id) in the three tables used in the subqueries.
With the indexes, a left join would probably have equivalent performance:
select u.name, ub.id as user_bags, us.id as user_storage, ust.id as user_store
from users u left join
user_bags ub
on ub.serial = 'abc' and ub.user_id = u.id left join
user_storage us
on us.serial = 'abc' and us.user_id = u.id left join
user_store ust
on ust.serial = 'abc' and ust.user_id = u.id;
This is about the best I can do with the given information.
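A small sketch of the left-join version, run against SQLite through Python's sqlite3 module, may help to try this out. The sample rows, index names, and the function name find_serial are assumptions. The COALESCE filter is an addition that is not part of the query above: it is one possible way to keep only users with a match, as the question's update asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users        (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_bags    (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
    CREATE TABLE user_store   (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
    CREATE TABLE user_storage (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
    -- indexes as suggested in the answer
    CREATE INDEX ix_bags    ON user_bags    (user_id, serial, id);
    CREATE INDEX ix_store   ON user_store   (user_id, serial, id);
    CREATE INDEX ix_storage ON user_storage (user_id, serial, id);
    INSERT INTO users VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO user_bags    VALUES (2390, 1, 'abc');
    INSERT INTO user_store   VALUES (352, 2, 'xyz');
    INSERT INTO user_storage VALUES (6440, 3, 'def');
""")

def find_serial(conn, serial):
    """Rows of (user_name, user_bags id, user_store id, user_storage id) for one serial."""
    sql = """
        SELECT u.name, ub.id, ust.id, us.id
        FROM users u
        LEFT JOIN user_bags    ub  ON ub.serial  = :s AND ub.user_id  = u.id
        LEFT JOIN user_store   ust ON ust.serial = :s AND ust.user_id = u.id
        LEFT JOIN user_storage us  ON us.serial  = :s AND us.user_id  = u.id
        -- keep only users where at least one table matched
        WHERE COALESCE(ub.id, ust.id, us.id) IS NOT NULL
    """
    return conn.execute(sql, {"s": serial}).fetchall()

print(find_serial(conn, "abc"))  # [('A', 2390, None, None)]
```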
Let me restructure / standardize this a little for you.
SELECT users.name AS [user_name],
user_bags.id AS user_bags,
user_storage.id AS user_storage,
user_store.id AS user_store
FROM users
LEFT JOIN user_store ON user_store.user_id = users.id
LEFT JOIN user_bags ON user_bags.user_id = users.id AND user_bags.serial = user_store.serial
LEFT JOIN user_storage ON user_storage.user_id = users.id AND user_storage.serial = user_store.serial
WHERE user_store.serial = 'abc'
Make sure that you have covering indexes:
CREATE INDEX IX_users_ID ON Users ( ID )
CREATE INDEX IX_user_store_ID_Serial ON user_store ( ID, serial )
CREATE INDEX IX_user_bags_ID_Serial ON user_bags ( ID, serial )
CREATE INDEX IX_user_storage_ID_Serial ON user_storage ( ID, serial )
There are ways to improve performance from here, but I would need to know the tables, see the query plans, and know their respective record counts.

Query conditionally return only one row per distinct id

I am making a Reddit clone and I'm having trouble querying my list of posts, given a logged in user, that shows whether or not logged in user upvoted the post for every post. I made a small example to make things simpler.
I am trying to return only one row per distinct post_id, but prioritize the upvoted column to be t > f > null.
For this example data:
> select * from post;
id
----
1
2
3
> select * from users;
id
----
1
2
> select * from upvoted;
user_id | post_id
---------+---------
1 | 1
2 | 1
If I am given user_id = 1 I want my query to return:
postid | user_upvoted
--------+--------------
1 | t
2 | f
3 | f
Since user1 upvoted post1, upvoted is t. Since user1 did not upvote post2, upvoted is f. Same for post3.
Schema
CREATE TABLE IF NOT EXISTS post (
id bigserial,
PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS users (
id serial,
PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS upvoted (
user_id integer
REFERENCES users(id)
ON DELETE CASCADE ON UPDATE CASCADE,
post_id bigint
REFERENCES post(id)
ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (user_id, post_id)
);
What I tried so far
SELECT post.id as postid,
CASE WHEN user_id=1 THEN true ELSE false END as user_upvoted
FROM post LEFT OUTER JOIN upvoted
ON post_id = post.id;
Which gives me:
postid | user_upvoted
--------+--------------
1 | t
1 | f
2 | f
3 | f
Due to the join, there are two "duplicate" rows that result from the query. I want to prioritize the row with t > f > null. So I want to keep the 1 | t row.
You should be able to do this with distinct on:
SELECT distinct on (p.id) p.id as postid,
(CASE WHEN user_id = 1 THEN true ELSE false END) as upvoted
FROM post p LEFT OUTER JOIN
upvoted u
ON u.post_id = p.id
ORDER BY p.id, upvoted desc;
Since the combination (user_id, post_id) is defined unique in upvoted (PRIMARY KEY), this can be much simpler:
SELECT p.id AS post_id, u.post_id IS NOT NULL AS user_upvoted
FROM post p
LEFT JOIN upvoted u ON u.post_id = p.id
AND u.user_id = 1;
Simply add user_id = 1 to the join condition. Makes perfect use of the index and should be simplest and fastest.
You also mention NULL, but there are only two distinct states in the result: true / false.
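To see why moving the user filter into the join condition yields exactly one row per post, here is a minimal sketch using the example data from the question. It runs on SQLite through Python's sqlite3 module instead of PostgreSQL, and the function name upvotes_for is an assumption. SQLite returns 1/0 for the IS NOT NULL expression, so the result is converted to a Python bool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post  (id INTEGER PRIMARY KEY);
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE upvoted (
        user_id INTEGER REFERENCES users(id),
        post_id INTEGER REFERENCES post(id),
        PRIMARY KEY (user_id, post_id));
    INSERT INTO post  VALUES (1), (2), (3);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO upvoted VALUES (1, 1), (2, 1);
""")

def upvotes_for(conn, user_id):
    """One (post_id, user_upvoted) row per post for the given user."""
    sql = """
        SELECT p.id, u.post_id IS NOT NULL
        FROM post p
        LEFT JOIN upvoted u ON u.post_id = p.id
                           AND u.user_id = ?   -- filter inside the join
        ORDER BY p.id
    """
    return [(pid, bool(flag)) for pid, flag in conn.execute(sql, (user_id,))]

print(upvotes_for(conn, 1))  # [(1, True), (2, False), (3, False)]
```

Because (user_id, post_id) is the primary key, at most one upvoted row can match each post, so no duplicates appear.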
Alternative approach
On second thought, you might be complicating a very basic task. If you are only interested in posts the current user upvoted, use this simple query instead:
SELECT post_id FROM upvoted WHERE user_id = 1;
All other posts are not upvoted by the given user. It would seem we don't have to list those explicitly.
The exists() operator yields a boolean value:
SELECT p.id
, EXISTS (SELECT * FROM upvoted x
WHERE x.post_id = p.id
AND x.user_id = 1) AS it_was_upvoted_by_user1
FROM post p
;

Find rows that have same value in one column and other values in another column?

I have a PostgreSQL database that stores users in a users table and conversations they take part in a conversation table. Since each user can take part in multiple conversations and each conversation can involve multiple users, I have a conversation_user linking table to track which users are participating in each conversation:
# conversation_user
id | conversation_id | user_id
----+------------------+--------
1 | 1 | 32
2 | 1 | 3
3 | 2 | 32
4 | 2 | 3
5 | 2 | 4
In the above table, user 32 is having one conversation with just user 3 and another with both 3 and user 4. How would I write a query that would show that there is a conversation between just user 32 and user 3?
I've tried the following:
SELECT conversation_id AS cid,
user_id
FROM conversation_user
GROUP BY cid HAVING count(*) = 2
AND (user_id = 32
OR user_id = 3);
SELECT conversation_id AS cid,
user_id
FROM conversation_user
GROUP BY (cid HAVING count(*) = 2
AND (user_id = 32
OR user_id = 3));
SELECT conversation_id AS cid,
user_id
FROM conversation_user
WHERE (user_id = 32)
OR (user_id = 3)
GROUP BY cid HAVING count(*) = 2;
These queries throw an error that says that user_id must appear in the GROUP BY clause or be used in an aggregate function. Putting them in an aggregate function (e.g. MIN or MAX) doesn't sound appropriate. I thought that my first two attempts were putting them in the GROUP BY clause.
What am I doing wrong?
This is a case of relational division. We have assembled an arsenal of techniques under this related question:
How to filter SQL results in a has-many-through relation
The special difficulty is to exclude additional users. There are basically 4 techniques.
Select rows which are not present in other table
I suggest LEFT JOIN / IS NULL:
SELECT cu1.conversation_id
FROM conversation_user cu1
JOIN conversation_user cu2 USING (conversation_id)
LEFT JOIN conversation_user cu3 ON cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
WHERE cu1.user_id = 32
AND cu2.user_id = 3
AND cu3.conversation_id IS NULL;
Or NOT EXISTS:
SELECT cu1.conversation_id
FROM conversation_user cu1
JOIN conversation_user cu2 USING (conversation_id)
WHERE cu1.user_id = 32
AND cu2.user_id = 3
AND NOT EXISTS (
SELECT 1
FROM conversation_user cu3
WHERE cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
);
Both queries do not depend on a UNIQUE constraint for (conversation_id, user_id), which may or may not be in place. Meaning, the query even works if user_id 32 (or 3) is listed more than once for the same conversation. You would get duplicate rows in the result, though, and need to apply DISTINCT or GROUP BY.
The only condition is the one you formulated:
... a query that would show that there is a conversation between just user 32 and user 3?
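The NOT EXISTS variant can be sketched as follows, using the rows from the question plus one deliberately duplicated membership row. This runs on SQLite through Python's sqlite3 module instead of PostgreSQL; the function name and the added DISTINCT (to collapse the duplicates mentioned above) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversation_user (
        id INTEGER PRIMARY KEY, conversation_id INTEGER, user_id INTEGER);
    INSERT INTO conversation_user (conversation_id, user_id) VALUES
        (1, 32), (1, 3),
        (2, 32), (2, 3), (2, 4),
        (1, 32);   -- duplicate membership, allowed without a UNIQUE constraint
""")

def conversations_between_only(conn, a, b):
    """Conversation ids whose participants are exactly users a and b."""
    sql = """
        SELECT DISTINCT cu1.conversation_id
        FROM conversation_user cu1
        JOIN conversation_user cu2
          ON cu2.conversation_id = cu1.conversation_id
        WHERE cu1.user_id = :a
          AND cu2.user_id = :b
          AND NOT EXISTS (            -- no third participant
              SELECT 1
              FROM conversation_user cu3
              WHERE cu3.conversation_id = cu1.conversation_id
                AND cu3.user_id NOT IN (:a, :b)
          )
        ORDER BY cu1.conversation_id
    """
    return [row[0] for row in conn.execute(sql, {"a": a, "b": b})]

print(conversations_between_only(conn, 32, 3))  # [1]
```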
Audited query
The query you linked in the comment wouldn't work. You forgot to exclude other participants. Should be something like:
SELECT * -- or whatever you want to return
FROM conversation_user cu1
WHERE cu1.user_id = 32
AND EXISTS (
SELECT 1
FROM conversation_user cu2
WHERE cu2.conversation_id = cu1.conversation_id
AND cu2.user_id = 3
)
AND NOT EXISTS (
SELECT 1
FROM conversation_user cu3
WHERE cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
);
Which is similar to the other two queries, except that it will not return multiple rows if user_id = 3 is linked multiple times.
You can use conditional aggregation to select all conversations that have only 2 specific participants:
select conversation_id from conversation_user
group by conversation_id
having count(*) = 2
and count(case when user_id not in (32,3) then 1 end) = 0
If (conversation_id, user_id) is not unique, replace having count(*) = 2 with having count(distinct user_id) = 2.
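A quick way to check the conditional-aggregation idea is the sketch below, which uses the count(distinct user_id) form so that a repeated membership row does not break the count. It runs on SQLite through Python's sqlite3 module; the extra duplicate row and the function name are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversation_user (
        id INTEGER PRIMARY KEY, conversation_id INTEGER, user_id INTEGER);
    INSERT INTO conversation_user (conversation_id, user_id) VALUES
        (1, 32), (1, 3), (1, 3),      -- user 3 listed twice on purpose
        (2, 32), (2, 3), (2, 4);
""")

def two_party_conversations(conn, a, b):
    """Conversation ids with exactly the two distinct participants a and b."""
    sql = """
        SELECT conversation_id
        FROM conversation_user
        GROUP BY conversation_id
        HAVING COUNT(DISTINCT user_id) = 2
           AND COUNT(CASE WHEN user_id NOT IN (:a, :b) THEN 1 END) = 0
        ORDER BY conversation_id
    """
    return [row[0] for row in conn.execute(sql, {"a": a, "b": b})]

print(two_party_conversations(conn, 32, 3))  # [1]
```

The CASE expression yields NULL for rows that belong to the two wanted users, and COUNT ignores NULLs, so the second condition counts only outsiders.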
If you just want confirmation:
select conversation_id
from conversation_user
group by conversation_id
having bool_and(user_id in (3,32))
and count(*) = 2;
If you want full details, you can use window functions and a CTE like this:
with a as (
select *
,bool_and(user_id in (3,32))
over (partition by conversation_id)
and 2 = count(user_id)
over (partition by conversation_id)
as conv_candidates
from conversation_user
)
select * from a where conv_candidates;
Because you want conversations with just 2 users, you can use a self outer join on other users and filter out hits.
To find all 2-user conversations and who they're between:
SELECT
a.conversation_id cid,
a.user_id user_id_1,
b.user_id user_id_2
FROM conversation_user a
JOIN conversation_user b ON b.conversation_id = a.conversation_id
AND b.user_id > a.user_id
LEFT JOIN conversation_user c ON c.conversation_id = a.conversation_id
AND c.user_id NOT IN (a.user_id, b.user_id)
WHERE c.conversation_id IS NULL -- only return misses on join to others
To find all 2-user conversations for a particular user, just add:
AND 32 IN (a.user_id, b.user_id)

MySQL Left Join with conditional

It seems pretty simple: I have a table 'question' which stores a list of all questions, and a many-to-many table called 'question_answer' which sits between 'question' and 'user'.
Is it possible to write one query that returns all questions from the question table, with the user's answers where they exist and NULL values for the unanswered questions?
question:
| id | question |
question_answer:
| id | question_id | answer | user_id |
I am doing this query, but the condition is enforcing that only the questions answered are returned. Will I need to resort to a nested select?
SELECT * FROM `question` LEFT JOIN `question_answer`
ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id
If user_id is in the outer-joined table, then your predicate user_id = 14583461 will filter out every row where user_id is null, i.e. the rows for unanswered questions. You need to say "user_id = 14583461 or user_id is null".
Shouldn't you use RIGHT JOIN?
SELECT * FROM question_answer RIGHT JOIN question ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id
something like this might help (http://pastie.org/1114844)
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varchar(32) not null
)engine=innodb;
drop table if exists question;
create table question
(
question_id int unsigned not null auto_increment primary key,
ques varchar(255) not null
)engine=innodb;
drop table if exists question_ans;
create table question_ans
(
user_id int unsigned not null,
question_id int unsigned not null,
ans varchar(255) not null,
primary key (user_id, question_id)
)engine=innodb;
insert into users (username) values
('user1'),('user2'),('user3'),('user4');
insert into question (ques) values
('question1 ?'),('question2 ?'),('question3 ?');
insert into question_ans (user_id,question_id,ans) values
(1,1,'foo'), (1,2,'mysql'), (1,3,'php'),
(2,1,'bar'), (2,2,'oracle'),
(3,1,'foobar');
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
order by
u.user_id,
q.question_id;
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where
u.user_id = 2
order by
q.question_id;
edit: added some stats/explain plan & runtime:
runtime: 0.031 (10,000 users, 1000 questions, 3.5 million answers)
select count(*) from users
count(*)
========
10000
select count(*) from question
count(*)
========
1000
select count(*) from question_ans
count(*)
========
3682482
explain
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where
u.user_id = 256
order by
u.user_id,
q.question_id;
id | select_type | table | type   | possible_keys | key     | key_len | ref                        | rows | Extra
 1 | SIMPLE      | u     | const  | PRIMARY       | PRIMARY | 4       | const                      | 1    | Using filesort
 1 | SIMPLE      | q     | ALL    |               |         |         |                            | 687  |
 1 | SIMPLE      | a     | eq_ref | PRIMARY       | PRIMARY | 8       | const,foo_db.q.question_id | 1    |
Move the user_id predicate into the join condition. This will then ensure that all rows from question are returned, but only rows from question_answer with the specified user ID and question ID.
SELECT * FROM question
LEFT JOIN question_answer ON question_answer.question_id = question.id
AND user_id = 14583461
ORDER BY user_id, question_id
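The difference between filtering in WHERE and filtering in the join condition can be seen in a small sketch. It uses SQLite through Python's sqlite3 module instead of MySQL, and the sample rows (including the second user id 99) and the function name answers_for are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, question TEXT);
    CREATE TABLE question_answer (
        id INTEGER PRIMARY KEY, question_id INTEGER, answer TEXT, user_id INTEGER);
    INSERT INTO question VALUES (1, 'q1'), (2, 'q2'), (3, 'q3');
    INSERT INTO question_answer (question_id, answer, user_id) VALUES
        (1, 'yes', 14583461),   -- answered by our user
        (2, 'no', 99);          -- answered only by someone else
""")

def answers_for(conn, user_id, filter_in_join=True):
    """Rows of (question id, answer); answer is None where the user has not answered."""
    if filter_in_join:
        # predicate in ON: every question survives the outer join
        sql = """
            SELECT q.id, qa.answer
            FROM question q
            LEFT JOIN question_answer qa
                   ON qa.question_id = q.id AND qa.user_id = ?
            ORDER BY q.id
        """
    else:
        # predicate in WHERE: rows with NULL user_id are filtered out again
        sql = """
            SELECT q.id, qa.answer
            FROM question q
            LEFT JOIN question_answer qa ON qa.question_id = q.id
            WHERE qa.user_id = ?
            ORDER BY q.id
        """
    return conn.execute(sql, (user_id,)).fetchall()

print(answers_for(conn, 14583461))                        # [(1, 'yes'), (2, None), (3, None)]
print(answers_for(conn, 14583461, filter_in_join=False))  # [(1, 'yes')]
```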