mysql "group by" very slow query


I have this query on a table with about 100k records. It runs quite slowly (3-4 s); when I take out the GROUP BY it is much faster (less than 0.5 s). I'm at a loss as to how to fix this:
SELECT msg.id,
msg.thread_id,
msg.senderid,
msg.recipientid,
from_user.username AS from_name,
to_user.username AS to_name
FROM msgtable AS msg
LEFT JOIN usertable AS from_user ON msg.senderid = from_user.id
LEFT JOIN usertable AS to_user ON msg.recipientid = to_user.id
GROUP BY msg.thread_id
ORDER BY msg.id desc
msgtable has indexes on thread_id, id, senderid and recipientid.
explain returns:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE msg ALL NULL NULL NULL NULL 162346 Using temporary; Using filesort
1 SIMPLE from_user eq_ref PRIMARY PRIMARY 4 db.msg.senderid 1
1 SIMPLE to_user eq_ref PRIMARY PRIMARY 4 db.msg.recipientid 1
Any ideas how to speed this up while returning the same result? There are multiple messages per thread, and I want to return only one message per thread in this query.
Thanks in advance.

Try this:
select m.thread_id, m.id, m.senderid, m.recipientid,
f.username as from_name, t.username as to_name
from msgtable m
join usertable f on m.senderid = f.id
join usertable t on m.recipientid = t.id
where m.id = (select MAX(id) from msgtable where thread_id = m.thread_id)
Or this:
select m.thread_id, m.id, m.senderid, m.recipientid,
(select username from usertable where id = m.senderid) as from_name,
(select username from usertable where id = m.recipientid) as to_name
from msgtable m
where m.id = (select MAX(id) from msgtable where thread_id = m.thread_id)
Why were the user tables left joined? Can a message be missing a sender or a recipient?
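A minimal sketch of the "newest message per thread" query above, run against a toy dataset. It uses Python's sqlite3 module as a stand-in for MySQL, so it only checks the result set, not MySQL's timing. The sample rows and the index name idx_msg_thread_id are assumptions made for illustration; a composite index on (thread_id, id) is one plausible way to support the per-thread MAX(id) lookup.

```python
import sqlite3

# Hypothetical miniature of the question's schema; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usertable (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE msgtable (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER,
        senderid INTEGER,
        recipientid INTEGER
    );
    -- Assumed composite index that supports MAX(id) per thread_id.
    CREATE INDEX idx_msg_thread_id ON msgtable (thread_id, id);

    INSERT INTO usertable VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO msgtable VALUES
        (1, 10, 1, 2),
        (2, 10, 2, 1),
        (3, 20, 1, 2),
        (4, 10, 1, 2);
""")

# One row per thread: the message whose id is the largest in its thread.
latest_per_thread = conn.execute("""
    SELECT m.thread_id, m.id, f.username AS from_name, t.username AS to_name
    FROM msgtable m
    JOIN usertable f ON m.senderid = f.id
    JOIN usertable t ON m.recipientid = t.id
    WHERE m.id = (SELECT MAX(id) FROM msgtable WHERE thread_id = m.thread_id)
    ORDER BY m.id DESC
""").fetchall()

for row in latest_per_thread:
    print(row)
```

Unlike the original GROUP BY, which lets MySQL pick an arbitrary row per thread, this form states explicitly which message of each thread is returned.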

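The biggest problem is that no index on msgtable is actually being used: EXPLAIN reports type ALL with key NULL, so all 162346 rows are scanned and then grouped through a temporary table and a filesort. The indexes on senderid and recipientid only matter for the joins, and those lookups already use the primary key of usertable. A composite index on (thread_id, id) is more likely to help, particularly if the query is rewritten to look up the newest id per thread, because it limits the number of rows that need to be scanned.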

Related

Query to get record id if exists in any table

I have these 4 tables, which store user items.
Each item is unique and can exist in only one table at a time.
I am searching by the serial column.
users:
======
id
name
user_bags:
==========
id
user_id
serial
user_store:
===========
id
user_id
serial
user_storage:
============
id
user_id
serial
I have a list of items and need to check whether each one is in any of the tables, showing the id of the record in the table where it was found.
user_name | user_bags | user_store | user_storage |
==================================================================
A | 2390 | | |
------------------------------------------------------------------
B | | 352 | |
------------------------------------------------------------------
A | 5500 | | |
------------------------------------------------------------------
C | | | 6440 |
------------------------------------------------------------------
I tried this:
SELECT
users.name AS user_name,
(SELECT id FROM user_bags WHERE user_bags.serial = 'abc' AND user_bags.user_id = users.id) AS user_bags,
(SELECT id FROM user_storage WHERE user_storage.serial = 'abc' AND user_storage.user_id = users.id) AS user_storage,
(SELECT id FROM user_store WHERE user_store.serial = 'abc' AND user_store.user_id = users.id) AS user_store
FROM
users
How do I write a better (faster) and more proper query? I will be looking up several thousand serials, and there are millions of records in each table at a time.
Update: only show users for whom a match was found.

Your query is fine. For performance, you want indexes on (user_id, serial, id) in the three tables used in the subqueries.
With the indexes, a left join would probably have equivalent performance:
select u.name, ub.id as user_bags, us.id as user_storage, ust.id as user_store
from users u left join
user_bags ub
on ub.serial = 'abc' and ub.user_id = u.id left join
user_storage us
on us.serial = 'abc' and us.user_id = u.id left join
user_store ust
on ust.serial = 'abc' and ust.user_id = u.id;
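A minimal sketch of the left-join approach, extended with a filter so that only users with a match are returned, as the question's update asks. SQLite (via Python's sqlite3) is used as a stand-in engine; the sample rows, the index names and the helper find_serial are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_bags (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
    CREATE TABLE user_store (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
    CREATE TABLE user_storage (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);

    -- Indexes on (user_id, serial, id), as suggested in the answer.
    CREATE INDEX ix_bags ON user_bags (user_id, serial, id);
    CREATE INDEX ix_store ON user_store (user_id, serial, id);
    CREATE INDEX ix_storage ON user_storage (user_id, serial, id);

    INSERT INTO users VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO user_bags VALUES (2390, 1, 'abc');
    INSERT INTO user_store VALUES (352, 2, 'xyz');
    INSERT INTO user_storage VALUES (6440, 3, 'def');
""")


def find_serial(serial):
    """Return (user_name, bags_id, store_id, storage_id) for users holding the serial."""
    return conn.execute("""
        SELECT u.name, ub.id AS user_bags, ust.id AS user_store, us.id AS user_storage
        FROM users u
        LEFT JOIN user_bags ub ON ub.serial = :serial AND ub.user_id = u.id
        LEFT JOIN user_store ust ON ust.serial = :serial AND ust.user_id = u.id
        LEFT JOIN user_storage us ON us.serial = :serial AND us.user_id = u.id
        -- Keep only users for whom at least one table matched.
        WHERE COALESCE(ub.id, ust.id, us.id) IS NOT NULL
    """, {"serial": serial}).fetchall()


print(find_serial("abc"))
print(find_serial("xyz"))
```

The serial test stays in each ON clause so that a miss in one table produces a NULL column rather than removing the user; the COALESCE filter then drops users with no match at all.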

This is about the best I can do with the given information.
Let me restructure / standardize this a little for you.
SELECT users.name AS user_name,
user_bags.id AS user_bags,
user_storage.id AS user_storage,
user_store.id AS user_store
FROM users
LEFT JOIN user_store ON user_store.user_id = users.id AND user_store.serial = 'abc'
LEFT JOIN user_bags ON user_bags.user_id = users.id AND user_bags.serial = 'abc'
LEFT JOIN user_storage ON user_storage.user_id = users.id AND user_storage.serial = 'abc'
WHERE COALESCE(user_store.id, user_bags.id, user_storage.id) IS NOT NULL
Make sure that you have indexes covering the join columns:
CREATE INDEX IX_users_ID ON users ( id )
CREATE INDEX IX_user_store_UserID_Serial ON user_store ( user_id, serial )
CREATE INDEX IX_user_bags_UserID_Serial ON user_bags ( user_id, serial )
CREATE INDEX IX_user_storage_UserID_Serial ON user_storage ( user_id, serial )
There are ways to improve performance from here, but I would need to know the tables, see the query plans, and know their respective record counts.

Flattening nested query in WHERE clause with NOT IN

Suppose I have these two tables, simplified for the purpose of the question:
CREATE TABLE merchandises
(
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price INT NOT NULL
)
CREATE TABLE gifts
(
id BIGSERIAL NOT NULL PRIMARY KEY,
from_user VARCHAR(255) REFERENCES users(id),
to_user VARCHAR(255) REFERENCES users(id),
with_merchandise BIGINT REFERENCES merchandises(id)
)
The merchandises table lists the available merchandise. The gifts table records that a user has sent a piece of merchandise to another user as a gift (a proper index is in place to avoid duplication).
What I would like to query is a list of merchandise that a user can send to another user, provided that the merchandise has not already been gifted from that user to that recipient.
This is a query that works, but I hope to find one without a nested query, thinking that it might perform better thanks to the PostgreSQL optimizer.
SELECT DISTINCT ON (m.id) m.id, m.name, m.description
FROM merchandises m
WHERE m.id NOT IN (
SELECT g.with_merchandise
FROM gifts g
WHERE g.from_user = 'some_user_id' AND g.to_user = 'some_other_user_id'
)
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
In the previous attempt, I had this query, but I found out that it does not work:
SELECT DISTINCT ON (m.id) m.id, m.name, m.description
FROM merchandises m
LEFT JOIN gifts g
ON m.id = g.with_merchandise
WHERE g.id IS NULL
OR g.from_user <> 'some_user_id' AND g.to_user <> 'some_other_user_id'
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
This query does not work because even though the WHERE clause filters out gift entries from two specific users, two other users might have given gifts with the same merchandise (same merchandise_id).

Even though you asked to remove the subquery, a NOT EXISTS subquery might run faster than NOT IN, especially if the NOT IN subquery returns a lot of values:
SELECT m.id, m.name, m.description
FROM merchandises m
WHERE NOT EXISTS (
SELECT 1
FROM gifts g
WHERE g.with_merchandise = m.id
AND g.from_user = 'some_user_id'
AND g.to_user = 'some_other_user_id'
)
This query can take advantage of a composite index on gifts(with_merchandise, from_user, to_user).
If you would still rather use a left join, move the conditions on from_user and to_user from the WHERE clause to the ON clause:
SELECT m.id, m.name, m.description
FROM merchandises m
LEFT JOIN gifts g ON m.id = g.with_merchandise
AND g.from_user = 'some_user_id' AND g.to_user = 'some_other_user_id'
WHERE g.id IS NULL
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
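A minimal sketch comparing the two forms on a toy dataset, to check that the NOT EXISTS query and the left join with the conditions in the ON clause return the same rows. SQLite (via Python's sqlite3) stands in for PostgreSQL here, the users table is omitted, and the sample rows and user ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchandises (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price INT NOT NULL
    );
    CREATE TABLE gifts (
        id INTEGER PRIMARY KEY,
        from_user TEXT,
        to_user TEXT,
        with_merchandise INTEGER REFERENCES merchandises(id)
    );
    INSERT INTO merchandises VALUES (1, 'mug', 5), (2, 'pen', 2), (3, 'cap', 9);
    INSERT INTO gifts (from_user, to_user, with_merchandise) VALUES
        ('u1', 'u2', 1),   -- the pair we query for already gifted the mug
        ('u3', 'u4', 1),   -- a different pair gifted the mug as well
        ('u3', 'u4', 2);   -- a different pair gifted the pen
""")
params = {"sender": "u1", "recipient": "u2"}

# Correlated NOT EXISTS: keep merchandise with no gift row for this pair.
not_exists = conn.execute("""
    SELECT m.id, m.name
    FROM merchandises m
    WHERE NOT EXISTS (
        SELECT 1
        FROM gifts g
        WHERE g.with_merchandise = m.id
          AND g.from_user = :sender
          AND g.to_user = :recipient
    )
    ORDER BY m.id
""", params).fetchall()

# Anti-join: the pair filter lives in ON, so gifts between other users
# never match and cannot hide or duplicate a merchandise row.
anti_join = conn.execute("""
    SELECT m.id, m.name
    FROM merchandises m
    LEFT JOIN gifts g ON m.id = g.with_merchandise
        AND g.from_user = :sender AND g.to_user = :recipient
    WHERE g.id IS NULL
    ORDER BY m.id
""", params).fetchall()

print(not_exists)
print(anti_join)
```

Because at most the gifts of one sender and recipient pair can match in the ON clause, neither query needs DISTINCT ON here.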

This uses a left outer join and should perform well.
SELECT m.*
FROM merchandises m
LEFT OUTER JOIN (SELECT with_merchandise FROM gifts WHERE from_user = 'some_user_id' AND to_user = 'some_other_user_id' GROUP BY with_merchandise) g ON m.id = g.with_merchandise
WHERE g.with_merchandise IS NULL
ORDER BY m.id ASC
LIMIT 20 OFFSET 0

selecting records from parent / child tables

I have parent / child tables that look like this:
CREATE TABLE users(
id integer primary key AUTOINCREMENT,
pnum varchar(10),
dloc varchar(100),
cc varchar(10),
name varchar(255),
group active bit(1)
);
CREATE TABLE group_members(
id integer primary key AUTOINCREMENT,
group_id integer,
member_id integer,
FOREIGN KEY (group_id) REFERENCES users(id),
FOREIGN KEY (member_id) REFERENCES users(id)
);
Users Data looks like:
ID PNUM DLOC CC NAME GRP
86|23101|dloc 89| | |0
87|23101|dloc 90| | |0
88|23102|dloc 91| | |0
590|12345|Group | |Test Group|1
591|90000|dloc 1 | | |0
group_members data looks like:
ID GROUP_ID MEMBER_ID
1 |590 | 87
2 |590 | 88
Based on the PNUM, I would like to be able to get the dloc values for all users, whether it's a group or not.
So for example, if someone requests pnum 23101, I would like to get back
"dloc 89" and
"dloc 90"
But if they request 12345, I would like to get back
"dloc 90", and
"dloc 91"
So far, I have come up with this query:
SELECT users.dloc
FROM users
WHERE users.id IN
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
That works for groups, but it won't return any results if I run the same query with pnum 23101.
What I've tried so far
I tried to see if I could use OR like so:
SELECT users.dloc
FROM users
WHERE users.id in
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
OR
(SELECT users.dloc
FROM users
WHERE pnum='12345')
Any suggestions would be appreciated.

You may use a UNION to execute both queries, one of which matches directly on PNUM for values where GROUP <> 1 and the other which matches on PNUM and joins through group_members for values where GROUP = 1.
In the second part, you need to join twice to users to get the members of the group matching the original PNUM.
Since the GROUP condition is opposite in each, only one part of the UNION will ever return results.
/* First part of UNION directly matches PNUM for GROUP = 0 */
SELECT dloc
FROM users
WHERE
PNUM = 23101
AND `group` <> 1
UNION
/**
Second part of UNION matches GROUP = 1
and joins through group_members back to users
to get member dloc (from the second users join)
*/
SELECT uu.dloc
FROM users u
INNER JOIN group_members m ON u.`GROUP` = 1 AND u.id = m.group_id
INNER JOIN users uu ON m.member_id = uu.id
WHERE
u.PNUM = 23101
This does unfortunately require placing the PNUM value twice in the query, once per UNION part, but that isn't so bad.
Here it is in action (using MySQL rather than SQLite, but that doesn't really matter)
Using your original method with OR and an IN() subquery, it can also be done, but I've added WHERE conditions for GROUP = 0 and GROUP = 1.
SELECT users.dloc
FROM users
WHERE users.id in (
SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='23101' AND `GROUP` = 1
)
OR users.id IN (
SELECT users.ID
FROM users
WHERE pnum='23101' AND `GROUP` = 0
);
And here's the alternative method in action...
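Both queries can be checked against the question's sample data with a small script. This is a sketch under a few assumptions: SQLite (via Python's sqlite3) is the engine, the users table is reduced to the columns the queries need, the flag column is assumed to be named "group" as in the answer, and the helper names dlocs_union and dlocs_or_in are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        pnum VARCHAR(10),
        dloc VARCHAR(100),
        "group" INTEGER   -- 1 marks a group row, 0 an ordinary user
    );
    CREATE TABLE group_members (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        group_id INTEGER REFERENCES users(id),
        member_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users (id, pnum, dloc, "group") VALUES
        (86, '23101', 'dloc 89', 0),
        (87, '23101', 'dloc 90', 0),
        (88, '23102', 'dloc 91', 0),
        (590, '12345', 'Group', 1),
        (591, '90000', 'dloc 1', 0);
    INSERT INTO group_members (group_id, member_id) VALUES (590, 87), (590, 88);
""")


def dlocs_union(pnum):
    """UNION form: direct match for plain users, member lookup for groups."""
    rows = conn.execute("""
        SELECT dloc FROM users WHERE pnum = :pnum AND "group" <> 1
        UNION
        SELECT uu.dloc
        FROM users u
        INNER JOIN group_members m ON u."group" = 1 AND u.id = m.group_id
        INNER JOIN users uu ON m.member_id = uu.id
        WHERE u.pnum = :pnum
    """, {"pnum": pnum}).fetchall()
    return sorted(r[0] for r in rows)


def dlocs_or_in(pnum):
    """OR / IN form of the same lookup."""
    rows = conn.execute("""
        SELECT users.dloc
        FROM users
        WHERE users.id IN (
            SELECT gm.member_id
            FROM group_members gm
            INNER JOIN users g ON g.id = gm.group_id
            WHERE g.pnum = :pnum AND g."group" = 1
        )
        OR users.id IN (
            SELECT id FROM users WHERE pnum = :pnum AND "group" = 0
        )
    """, {"pnum": pnum}).fetchall()
    return sorted(r[0] for r in rows)


print(dlocs_union("23101"), dlocs_union("12345"))
```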

MySQL Left Join with conditional

It seems pretty simple: I have a table 'question', which stores a list of all questions, and a many-to-many table called 'question_answer', which sits between 'question' and 'user'.
Is it possible to get back, in one query, all questions in the question table together with the answers a given user has given, with the unanswered questions having NULL values?
question:
| id | question |
question_answer:
| id | question_id | answer | user_id |
I am doing this query, but the condition means that only the answered questions are returned. Will I need to resort to a nested select?
SELECT * FROM `question` LEFT JOIN `question_answer`
ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id

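If user_id is in the outer-joined table, then the predicate user_id = 14583461 in the WHERE clause discards every row where user_id is NULL, i.e. the rows for unanswered questions. You need to say "user_id = 14583461 or user_id is null". Note that this still drops questions that only other users have answered, because those rows carry a non-NULL user_id that fails both tests.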

Shouldn't you use RIGHT JOIN?
SELECT * FROM question_answer RIGHT JOIN question ON question_answer.question_id = question.id
WHERE user_id = 14583461 GROUP BY question_id

something like this might help (http://pastie.org/1114844)
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varchar(32) not null
)engine=innodb;
drop table if exists question;
create table question
(
question_id int unsigned not null auto_increment primary key,
ques varchar(255) not null
)engine=innodb;
drop table if exists question_ans;
create table question_ans
(
user_id int unsigned not null,
question_id int unsigned not null,
ans varchar(255) not null,
primary key (user_id, question_id)
)engine=innodb;
insert into users (username) values
('user1'),('user2'),('user3'),('user4');
insert into question (ques) values
('question1 ?'),('question2 ?'),('question3 ?');
insert into question_ans (user_id,question_id,ans) values
(1,1,'foo'), (1,2,'mysql'), (1,3,'php'),
(2,1,'bar'), (2,2,'oracle'),
(3,1,'foobar');
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
order by
u.user_id,
q.question_id;
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where
u.user_id = 2
order by
q.question_id;
edit: added some stats/explain plan & runtime:
runtime: 0.031 (10,000 users, 1000 questions, 3.5 million answers)
select count(*) from users
count(*)
========
10000
select count(*) from question
count(*)
========
1000
select count(*) from question_ans
count(*)
========
3682482
explain
select
u.*,
q.*,
a.ans
from users u
cross join question q
left outer join question_ans a on a.user_id = u.user_id and a.question_id = q.question_id
where
u.user_id = 256
order by
u.user_id,
q.question_id;
id select_type table type possible_keys key key_len ref rows Extra
== =========== ===== ==== ============= === ======= === ==== =====
1 SIMPLE u const PRIMARY PRIMARY 4 const 1 Using filesort
1 SIMPLE q ALL 687
1 SIMPLE a eq_ref PRIMARY PRIMARY 8 const,foo_db.q.question_id 1

Move the user_id predicate into the join condition. This will then ensure that all rows from question are returned, but only rows from question_answer with the specified user ID and question ID.
SELECT * FROM question
LEFT JOIN question_answer ON question_answer.question_id = question.id
AND user_id = 14583461
ORDER BY user_id, question_id
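A minimal sketch of why the position of the predicate matters, using SQLite (via Python's sqlite3) as a stand-in for MySQL. The sample rows are made up; question 2 is deliberately answered only by another user, and question 3 by nobody.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, question TEXT);
    CREATE TABLE question_answer (
        id INTEGER PRIMARY KEY,
        question_id INTEGER,
        answer TEXT,
        user_id INTEGER
    );
    INSERT INTO question VALUES (1, 'q1'), (2, 'q2'), (3, 'q3');
    INSERT INTO question_answer (question_id, answer, user_id) VALUES
        (1, 'foo', 14583461),
        (2, 'bar', 99);   -- q2 is answered only by a different user
""")
uid = 14583461

# Predicate in WHERE: the outer join is effectively turned into an inner join.
filter_in_where = conn.execute("""
    SELECT q.id, a.answer
    FROM question q
    LEFT JOIN question_answer a ON a.question_id = q.id
    WHERE a.user_id = ?
    ORDER BY q.id
""", (uid,)).fetchall()

# Adding "OR user_id IS NULL" recovers q3 but still loses q2.
where_or_null = conn.execute("""
    SELECT q.id, a.answer
    FROM question q
    LEFT JOIN question_answer a ON a.question_id = q.id
    WHERE a.user_id = ? OR a.user_id IS NULL
    ORDER BY q.id
""", (uid,)).fetchall()

# Predicate in ON: every question survives, unanswered ones carry NULL.
filter_in_on = conn.execute("""
    SELECT q.id, a.answer
    FROM question q
    LEFT JOIN question_answer a ON a.question_id = q.id AND a.user_id = ?
    ORDER BY q.id
""", (uid,)).fetchall()

print(filter_in_where)
print(where_or_null)
print(filter_in_on)
```

Only the ON-clause form returns one row per question regardless of what other users have answered.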

What's wrong with this MySQL query? SELECT * AS `x`, how to use x again later?

The following MySQL query:
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`,
from `actions`
where `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100
…returns an error:
Unknown column 'sID' in 'IN/ALL/ANY subquery'
I don't understand what I'm doing wrong here. The sID thing is not supposed to be a column, but the 'alias' (what is this called?) I created by executing (select siteID from users where userID = uID) as sID. And it’s not even inside the IN subquery.
Any ideas?
Edit: @Roland: Thanks for your comment. I have three tables, actions, users and sites. The table actions contains a userID field, which corresponds to an entry in the users table. Every user in this table (users) has a siteID.
I'm trying to select the latest actions from the actions table, and link them to the users and sites table to find out who performed those actions, and on which site. Hope that makes sense :)

You either need to enclose it in a subquery:
SELECT *
FROM (
SELECT userID AS uID, (SELECT siteID FROM users WHERE userID = actions.userID) AS sID, timestamp
FROM actions
) q
WHERE sID IN (SELECT siteID FROM sites WHERE foo = "bar")
ORDER BY
timestamp DESC
LIMIT 100
or, better, rewrite it as a JOIN:
SELECT a.userId, u.siteID
FROM actions a
JOIN users u
ON u.userID = a.userID
WHERE siteID IN
(
SELECT siteID
FROM sites
WHERE foo = 'bar'
)
ORDER BY
timestamp DESC
LIMIT 100
Create the following indexes:
actions (timestamp)
users (userId)
sites (foo, siteID)
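A minimal sketch of the JOIN rewrite on a toy dataset, with SQLite (via Python's sqlite3) standing in for MySQL purely to check the result set. The sample rows and the index names are assumptions; users.userID is declared as the primary key here, which already gives it an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteID INTEGER PRIMARY KEY, foo TEXT);
    CREATE TABLE users (userID INTEGER PRIMARY KEY, siteID INTEGER);
    CREATE TABLE actions (userID INTEGER, timestamp INTEGER);

    -- Indexes along the lines suggested above (names are hypothetical).
    CREATE INDEX ix_actions_timestamp ON actions (timestamp);
    CREATE INDEX ix_sites_foo ON sites (foo, siteID);

    INSERT INTO sites VALUES (1, 'bar'), (2, 'baz');
    INSERT INTO users VALUES (10, 1), (11, 2), (12, 1);
    INSERT INTO actions VALUES (10, 100), (11, 200), (12, 300), (10, 400);
""")

# The site is reached through a join, so no select-list alias is needed
# in the WHERE clause.
latest_actions = conn.execute("""
    SELECT a.userID AS uID, u.siteID AS sID
    FROM actions a
    JOIN users u ON u.userID = a.userID
    WHERE u.siteID IN (SELECT siteID FROM sites WHERE foo = 'bar')
    ORDER BY a.timestamp DESC
    LIMIT 100
""").fetchall()

print(latest_actions)
```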

The column alias is not established until the query processor finishes the SELECT clause and builds the first intermediate result set, so it cannot be referenced in WHERE, which is evaluated before that. It can only be referenced in clauses that operate on that intermediate result set, such as GROUP BY (MySQL also accepts it in HAVING and ORDER BY). If you want to use it this way, put the alias inside a subquery; it will then be in the result set generated by the subquery and therefore accessible to the outer query. To illustrate:
(This is not the simplest way to do this query, but it illustrates how to establish and use a column alias from a subquery.)
select a.userID as uID, z.sID
from actions a
join (select userID, siteID as sID from users) z
on z.userID = a.userID
where z.sID in (select siteID from sites where foo = "bar")
order by timestamp desc limit 100

Try the following:
SELECT
a.userID as uID
,u.siteID as sID
FROM
actions as a
INNER JOIN
users as u ON u.userID=a.userID
WHERE
u.siteID IN (SELECT siteID FROM sites WHERE foo = 'bar')
ORDER BY
a.timestamp DESC
LIMIT 100

I think the reason for the error is that the alias isn't available to the WHERE clause, which is why we have HAVING.
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`
from `actions`
HAVING `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100
Though I also agree with the other answers that your query could be better structured.

Try the following
SELECT
a.userID as uID
,u.siteID as sID
FROM
actions as a
INNER JOIN
users as u ON u.userID = a.userID
INNER JOIN
sites as s ON u.siteID = s.siteID
WHERE
s.foo = 'bar'
ORDER BY
a.timestamp DESC
LIMIT 100
If you wish to use a field from the select section later you can try a subselect
SELECT One,
Two,
One + Two as Three
FROM (
SELECT 1 AS One,
2 as Two
) sub

I don't know whether this was available 11 years ago, but I found that the easiest way is to use HAVING (it has to come before ORDER BY):
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`
from `actions`
HAVING `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100
