Is it possible to update an "order" column from within a trigger in MySQL?

We have a table in our system that would benefit from a numeric column so we can easily grab the 1st, 2nd, 3rd records for a job. We could, of course, update this column from the application itself, but I was hoping to do it in the database.
The final method must handle cases where users insert data that belongs in the "middle" of the results, as they may receive information out of order. They may also edit or delete records, so there will be corresponding update and delete triggers.
The table:
CREATE TABLE `test` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`seq` int(11) unsigned NOT NULL,
`job_no` varchar(20) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=latin1
And some example data:
mysql> SELECT * FROM test ORDER BY job_no, seq;
+----+-----+--------+------------+
| id | seq | job_no | date |
+----+-----+--------+------------+
| 5 | 1 | 123 | 2009-10-05 |
| 6 | 2 | 123 | 2009-10-01 |
| 4 | 1 | 123456 | 2009-11-02 |
| 3 | 2 | 123456 | 2009-11-10 |
| 2 | 3 | 123456 | 2009-11-19 |
+----+-----+--------+------------+
I was hoping to update the "seq" column from a trigger, but MySQL doesn't allow this and fails with the error "Can't update table 'test' in stored function/trigger because it is already used by statement which invoked this stored function/trigger".
My test trigger is as follows:
CREATE TRIGGER `test_after_ins_tr` AFTER INSERT ON `test`
FOR EACH ROW
BEGIN
SET @seq = 0;
UPDATE
`test` t
SET
t.`seq` = @seq := (SELECT @seq + 1)
WHERE
t.`job_no` = NEW.`job_no`
ORDER BY
t.`date`;
END;
Is there any way to achieve what I'm after other than remembering to call a function after each update to this table?

What about this?
CREATE TRIGGER `test_after_ins_tr` BEFORE INSERT ON `test`
FOR EACH ROW
BEGIN
SET @seq = (SELECT COALESCE(MAX(seq),0) + 1 FROM test t WHERE t.job_no = NEW.job_no);
SET NEW.seq = @seq;
END;
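The BEFORE INSERT trigger above only appends the next number, so a row that arrives out of order still lands at the end. One way to cover the "middle" case is to renumber a whole job after each change. Below is a minimal sketch of that renumbering, using Python's built-in sqlite3 module as a stand-in for MySQL. The function names `renumber_job` and `demo` are hypothetical, and the SQL is SQLite-flavoured, so it may need rewriting before it runs on MySQL.

```python
import sqlite3


def renumber_job(conn, job_no):
    """Reassign seq = 1, 2, 3, ... for one job, ordered by date and then id."""
    conn.execute(
        """
        UPDATE test
        SET seq = (
            -- a row's position is the number of rows that sort at or before it
            SELECT COUNT(*)
            FROM test AS t2
            WHERE t2.job_no = test.job_no
              AND (t2.date < test.date
                   OR (t2.date = test.date AND t2.id <= test.id))
        )
        WHERE job_no = ?
        """,
        (job_no,),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, seq INTEGER NOT NULL DEFAULT 0,"
        " job_no TEXT NOT NULL, date TEXT NOT NULL)"
    )
    # The third row is dated between the first two, i.e. it arrives out of order.
    rows = [("123456", "2009-11-02"), ("123456", "2009-11-19"), ("123456", "2009-11-10")]
    conn.executemany("INSERT INTO test (job_no, date) VALUES (?, ?)", rows)
    renumber_job(conn, "123456")
    return conn.execute(
        "SELECT date, seq FROM test WHERE job_no = ? ORDER BY seq", ("123456",)
    ).fetchall()
```

Calling `renumber_job` after every insert, update or delete for the affected job keeps `seq` consistent, at the cost of having to remember the call.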

From Sergi's comment above:
http://dev.mysql.com/doc/refman/5.1/en/stored-program-restrictions.html - "Within a stored function or trigger, it is not permitted to modify a table that is already being used (for reading or writing) by the statement that invoked the function or trigger."

Related

Create trigger to automatically update column with subquery

In my application, I have a couple of tables, lessons and votes. Here's what they look like:
Lessons
+-------------+---------+----------------------+
| Column | Type | Modifiers |
|-------------+---------+----------------------|
| id | uuid | not null |
| votes_total | integer | not null default 0 |
+-------------+---------+----------------------+
Votes
+-------------+---------+-------------+
| Column | Type | Modifiers |
|-------------+---------+-------------|
| positive | boolean | not null |
| user_id | uuid | not null |
| lesson_id | uuid | not null |
+-------------+---------+-------------+
Whenever a row in votes is inserted, updated or deleted, I'd like to update the votes_total column of the related lesson using a subquery. Here's what I've tried:
CREATE PROCEDURE update_lesson_votes_total()
LANGUAGE SQL
AS $$
UPDATE lessons
SET votes_total = (
SELECT SUM(CASE WHEN positive THEN 1 ELSE -1 END)
FROM votes
WHERE lesson_id = NEW.lesson_id
)
WHERE id = NEW.lesson_id;
$$;
CREATE TRIGGER votes_change
AFTER INSERT OR UPDATE OR DELETE ON votes
FOR EACH ROW
EXECUTE PROCEDURE update_lesson_votes_total();
However, when I try to run this in a migration, I get the following error:
(Postgrex.Error) ERROR 42601 (syntax_error) cannot insert multiple commands into a prepared statement
Your function must be declared with RETURNS trigger. You can't use NEW in a DELETE trigger, because there is no new row; only OLD is available. You can attach the same function to more than one trigger and use TG_OP to tell whether it was fired by an INSERT, UPDATE or DELETE.
Then, in a CASE or IF statement, you can use NEW or OLD to get the id's value.
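CREATE FUNCTION update_lesson_votes_total()
RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
v_lesson_id uuid;
BEGIN
-- NEW is not populated for DELETE, so take the id from OLD in that case
IF TG_OP = 'DELETE' THEN
v_lesson_id := OLD.lesson_id;
ELSE
v_lesson_id := NEW.lesson_id;
END IF;
UPDATE lessons
SET votes_total = (
-- COALESCE keeps votes_total non-null when the lesson has no votes left
SELECT COALESCE(SUM(CASE WHEN positive THEN 1 ELSE -1 END), 0)
FROM votes
WHERE lesson_id = v_lesson_id
)
WHERE id = v_lesson_id;
RETURN NULL;
END;
$$;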
CREATE TRIGGER votes_change_u_i
AFTER INSERT OR UPDATE ON votes
FOR EACH ROW
EXECUTE PROCEDURE update_lesson_votes_total();
CREATE TRIGGER votes_change_d
AFTER DELETE ON votes
FOR EACH ROW
EXECUTE PROCEDURE update_lesson_votes_total();
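To see the NEW versus OLD distinction in action without a PostgreSQL server, here is a small sketch using Python's built-in sqlite3 module. SQLite has no TG_OP or trigger functions, so this stand-in uses one trigger per event and picks NEW or OLD directly. The trigger names and the `make_db` and `total` helpers are assumptions made for this illustration.

```python
import sqlite3

# One trigger body, instantiated three times; {row} is NEW or OLD.
RECOUNT = """
CREATE TRIGGER votes_{name} AFTER {event} ON votes
BEGIN
  UPDATE lessons
  SET votes_total = (
    SELECT COALESCE(SUM(CASE WHEN positive THEN 1 ELSE -1 END), 0)
    FROM votes
    WHERE lesson_id = {row}.lesson_id
  )
  WHERE id = {row}.lesson_id;
END;
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lessons (id TEXT PRIMARY KEY,
                              votes_total INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE votes (positive INTEGER NOT NULL,
                            user_id TEXT NOT NULL,
                            lesson_id TEXT NOT NULL);
    """)
    # INSERT and UPDATE read the lesson id from NEW; DELETE only has OLD.
    for name, event, row in [("ins", "INSERT", "NEW"),
                             ("upd", "UPDATE", "NEW"),
                             ("del", "DELETE", "OLD")]:
        conn.executescript(RECOUNT.format(name=name, event=event, row=row))
    return conn


def total(conn, lesson_id):
    return conn.execute(
        "SELECT votes_total FROM lessons WHERE id = ?", (lesson_id,)
    ).fetchone()[0]
```

Recomputing the sum on every change is simple but rereads all votes for the lesson each time; whether that is acceptable depends on how many votes a lesson collects.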

Audited table and foreign key

I have a database with multiple tables that must be audited.
As an example, I have a table of objects, each defined by a unique ID, a name and a description.
The name will always be the same. It is not possible to update it. "ObjectA" will always be "ObjectA".
As you see the name is not unique in the database but only in the logic.
The columns "from", "to" and "creator_id" are used to audit the changes. "from" is the date of the change; "to" is the date when a newer row was added, and is null for the latest row. "creator_id" is the ID of the user who made the change.
+----+----------+--------------+----------------------+----------------------+------------+
| id | name | description | from | to | creator_id |
+----+----------+--------------+----------------------+----------------------+------------+
| 1 | ObjectA | My object | 2021-05-30T00:05:00Z | 2021-05-31T05:04:36Z | 18 |
| 2 | ObjectB | My desc | 2021-05-30T02:07:25Z | null | 15 |
| 3 | ObjectA | Super object | 2021-05-31T05:04:36Z | null | 20 |
+----+----------+--------------+----------------------+----------------------+------------+
Now I have another table that must have a foreign key to this object table based on the "unique" object name.
+----+---------+-------------+
| id | foo | object_name |
+----+---------+-------------+
| 1 | blabla | ObjectA |
| 2 | wawawa | ObjectB |
+----+---------+-------------+
How can I create this link between those 2 tables?
I already tried creating another table holding a UUID and adding a "unique_identifier" column to the object table. The foreign key is then linked to this UUID table rather than to the object table. The issue is that I have multiple tables with this problem, so I would have to create twice as many tables.
It is also possible to use the object ID as the FK instead of the name, but then I would have to update every table holding that FK with the new ID whenever an object is updated.
By the SQL standard, a foreign key must reference either the primary key or a unique key of the parent table. If the primary key has multiple columns, the foreign key must have the same number and order of columns. Therefore the foreign key references a unique row in the parent table; there can be no duplicates.
Another solution is to use a trigger: you can check that the object exists in the objects table before you insert into the other table.
Update: adding code
Prepare the tables and create the trigger. (I have only included 3 columns in the Objects table for simplicity. In the trigger, I am just printing the error in the ELSE part; you could raise an error using RAISERROR to return it to the client.)
Create table AuditObjects(id int identity (1,1),ObjectName varchar(20), ObjectDescription varchar(100) )
Insert into AuditObjects values('ObjectA','description ObjectA Test')
Insert into AuditObjects values('ObjectB','description ObjectB Test')
Insert into AuditObjects values('ObjectC','description ObjectC Test')
Insert into AuditObjects values('ObjectB','description ObjectB Test')
Insert into AuditObjects values('ObjectB','description ObjectB Test')
Insert into AuditObjects values('ObjectA','description ObjectA Test')
Create table ObjectTab2 (id int identity (1,1),foo varchar(200), ObjectName varchar(20))
go
CREATE TRIGGER t_CheckObject ON ObjectTab2 INSTEAD OF INSERT
AS BEGIN
Declare @errormsg varchar(200), @ObjectName varchar(20)
select @ObjectName = objectname from INSERTED
if exists(select 1 from AuditObjects where objectname = @ObjectName)
Begin
INSERT INTO ObjectTab2 (foo, Objectname)
Select foo, Objectname
from INSERTED
End
Else
Begin
Select @errormsg = 'Object '+objectname+ ' does not exists in AuditObjects table'
from Inserted
print(@errormsg)
End
END;
Now if you try to insert a row in ObjectTab2 with object name as "ObjectC", insert will be allowed as "objectC" is present in audit table.
Insert into ObjectTab2 values('blabla', 'ObjectC')
Select * from ObjectTab2
id foo ObjectName
----------- ------ --------------------
1 blabla ObjectC
However, if you try to enter "ObjectD", it will not make an insert and give error msg in output.
Insert into ObjectTab2 values('Inserting ObjectD', 'ObjectD')
Object ObjectD does not exists in AuditObjects table
It's not exactly what you asked for, but it gives you the same functionality and results.
Can you not still go ahead with linking the two tables based on 'object name'. The only difference would be - when you join the two tables, you would get multiple records from table1 (the first table you were referencing). You may then add filter condition based on from and to, as per your requirements.
Post Edit -
What I meant is, you can still achieve the desired results without introducing Foreign Key in this scenario -
Let's call your tables - Table1 and Table2
--Below will give you all records from Table1
SELECT T2.*, T1.description, T1.creator_id, T1.from, T1.to
FROM TABLE2 T2
INNER JOIN TABLE1 T1 ON T2.OBJECT_NAME = T1.NAME;
--Below will give you ONLY those records from Table1 whose TO is null
SELECT T2.*, T1.description, T1.creator_id, T1.from, T1.to
FROM TABLE2 T2
INNER JOIN TABLE1 T1 ON T2.OBJECT_NAME = T1.NAME
WHERE T1.TO IS NULL;
I decided to go with an additional table to have this final design:
Table "Object"
+-------+--------------------------------------+---------+--------------+----------------------+----------------------+------------+
| id PK | identifier FK | name | description | from | to | creator_id |
+-------+--------------------------------------+---------+--------------+----------------------+----------------------+------------+
| 1 | 123e4567-e89b-12d3-a456-426614174000 | ObjectA | My object | 2021-05-30T00:05:00Z | 2021-05-31T05:04:36Z | 18 |
| 2 | 123e4567-e89b-12d3-a456-524887451057 | ObjectB | My desc | 2021-05-30T02:07:25Z | null | 15 |
| 3 | 123e4567-e89b-12d3-a456-426614174000 | ObjectA | Super object | 2021-05-31T05:04:36Z | null | 20 |
+-------+--------------------------------------+---------+--------------+----------------------+----------------------+------------+
Table "Object_identifier"
+--------------------------------------+
| identifier PK |
+--------------------------------------+
| 123e4567-e89b-12d3-a456-426614174000 |
| 123e4567-e89b-12d3-a456-524887451057 |
+--------------------------------------+
Table "foo"
+-------+--------+--------------------------------------+
| id PK | foo | object_identifier FK |
+-------+--------+--------------------------------------+
| 1 | blabla | 123e4567-e89b-12d3-a456-426614174000 |
| 2 | wawawa | 123e4567-e89b-12d3-a456-524887451057 |
+-------+--------+--------------------------------------+
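The final design can be sketched with Python's built-in sqlite3 module to show how the foreign keys fit together. This is only an illustration under a few assumptions: the "from" and "to" columns are renamed `valid_from` and `valid_to` because FROM and TO are SQL keywords, the identifiers are stored as text, and the helper names `make_db` and `current_description` are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE object_identifier (identifier TEXT PRIMARY KEY);
        CREATE TABLE object (
            id          INTEGER PRIMARY KEY,
            identifier  TEXT NOT NULL REFERENCES object_identifier(identifier),
            name        TEXT NOT NULL,
            description TEXT,
            valid_from  TEXT NOT NULL,
            valid_to    TEXT,              -- NULL marks the latest version
            creator_id  INTEGER NOT NULL
        );
        CREATE TABLE foo (
            id                INTEGER PRIMARY KEY,
            foo               TEXT,
            object_identifier TEXT NOT NULL REFERENCES object_identifier(identifier)
        );
    """)
    return conn


def current_description(conn, foo_id):
    """Follow foo -> identifier -> latest object version (valid_to IS NULL)."""
    row = conn.execute(
        """
        SELECT o.description
        FROM foo AS f
        JOIN object AS o ON o.identifier = f.object_identifier
        WHERE f.id = ? AND o.valid_to IS NULL
        """,
        (foo_id,),
    ).fetchone()
    return row[0] if row else None
```

Because "foo" points at the stable identifier rather than at a specific object row, adding a new audited version of an object does not require touching "foo".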

Creating a trigger to copy data from 1 table to another

I am trying to create a trigger that will copy data from table 1 into table 2 when a new entry is inserted into table 1:
Table 1
id | first_name | last_name | email | uid | pwd
----+------------+-----------+-------+-----+-----
Table 2
user_id | user_first_name | user_last_name | user_uid
---------+-----------------+----------------+---------
the code i am using is this :
DROP TRIGGER IF EXISTS usersetup_identifier ON users;
CREATE OR REPLACE FUNCTION usersetup_identifier_insert_update() RETURNS trigger AS $BODY$
BEGIN
if NEW.identifier is null then
NEW.identifier := "INSERT INTO users_setup (user_id, user_first_name, user_last_name, user_uid)
SELECT id, first_name, last_name, uid
FROM users";
end if;
RETURN NEW;
end
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER usersetup_identifier
AFTER INSERT OR UPDATE ON users FOR EACH ROW
EXECUTE PROCEDURE usersetup_identifier_insert_update();
but when i insert data into table 1 i am getting this error message :
NOTICE: identifier "INSERT INTO users_setup (user_id, user_first_name, user_last_name, user_uid)
SELECT id, first_name, last_name, uid
FROM users" will be truncated to "INSERT INTO users_setup (user_id, user_first_name, user_last_na"
ERROR: record "new" has no field "identifier"
CONTEXT: SQL statement "SELECT NEW.identifier is null"
PL/pgSQL function usersetup_identifier_insert_update() line 3 at IF
the table descriptions are:
Table "public.users"
Column | Type | Collation | Nullable | Default
------------+---------------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('users_id_seq'::regclass)
first_name | character(20) | | not null |
last_name | character(20) | | not null |
email | character(60) | | not null |
uid | character(20) | | not null |
pwd | character(20) | | not null |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"users_email_key" UNIQUE CONSTRAINT, btree (email)
"users_pwd_key" UNIQUE CONSTRAINT, btree (pwd)
"users_uid_key" UNIQUE CONSTRAINT, btree (uid)
Triggers:
usersetup_identifier AFTER INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE usersetup_identifier_insert_update()
All the columns match their corresponding columns.
Can anyone help and tell me where I am going wrong?
Table "public.users_setup"
Column | Type | Collation | Nullable | Default
-----------------+------------------------+-----------+----------+-----------------------------------------
id | integer | | not null | nextval('users_setup_id_seq'::regclass)
user_id | integer | | |
user_first_name | character(20) | | |
user_last_name | character(20) | | |
user_uid | character(20) | | |
Can any one help me with where I am going wrong?
There are multiple errors in your code
the table users has no column named identifier, so the expression NEW.identifier is invalid
You are assigning a value to a (non-existing) column with the expression new.identifier := ... - but you want to run an INSERT statement, not assign a value.
String values need to be enclosed in single quotes, e.g. 'Arthur', double quotes denote identifiers (e.g. a table or column name). But there is no column named "INSERT INTO use ..."
To access the values of the row being inserted you need to use the new record and the column names. No need to select from the table:
As far as I can tell, this is what you want:
CREATE OR REPLACE FUNCTION usersetup_identifier_insert_update()
RETURNS trigger
AS
$BODY$
BEGIN
INSERT INTO users_setup (user_id, user_first_name, user_last_name, user_uid)
values (new.id, new.first_name, new.last_name, new.uid);
RETURN NEW;
end
$BODY$
LANGUAGE plpgsql;
Unrelated, but:
Copying data around like that is bad database design. What happens if you change the user's name? Then you would need to UPDATE the users_setup table as well. It is better to store only a (foreign key) reference in the users_setup table that references the users table.
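The key point of the corrected trigger function in this answer is that the values come from the NEW record, not from a SELECT over the whole users table. The same idea can be tried out quickly with Python's built-in sqlite3 module; this is a stand-in sketch rather than PostgreSQL, and the trigger name `usersetup_copy` and helper `make_db` are made up for the example.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL,
            email      TEXT NOT NULL UNIQUE,
            uid        TEXT NOT NULL UNIQUE,
            pwd        TEXT NOT NULL
        );
        CREATE TABLE users_setup (
            id INTEGER PRIMARY KEY,
            user_id INTEGER,
            user_first_name TEXT,
            user_last_name  TEXT,
            user_uid        TEXT
        );
        -- Copy only the row that was just inserted, via NEW.
        CREATE TRIGGER usersetup_copy AFTER INSERT ON users
        BEGIN
            INSERT INTO users_setup (user_id, user_first_name, user_last_name, user_uid)
            VALUES (NEW.id, NEW.first_name, NEW.last_name, NEW.uid);
        END;
    """)
    return conn
```

Selecting from users inside the trigger instead would copy every existing user again on each insert, which is why the answer switches to NEW.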

Write SQL script to insert data

In a database that contains many tables, I need to write a SQL script that inserts data only if it does not already exist.
Table currency
| id | Code | lastupdate | rate |
+--------+---------+------------+-----------+
| 1 | USD | 05-11-2012 | 2 |
| 2 | EUR | 05-11-2012 | 3 |
Table client
| id | name | createdate | currencyId|
+--------+---------+------------+-----------+
| 4 | tony | 11-24-2010 | 1 |
| 5 | john | 09-14-2010 | 2 |
Table: account
| id | number | createdate | clientId |
+--------+---------+------------+-----------+
| 7 | 1234 | 12-24-2010 | 4 |
| 8 | 5648 | 12-14-2010 | 5 |
I need to insert to:
currency (id=3, Code=JPY, lastupdate=today, rate=4)
client (id=6, name=Joe, createdate=today, currencyId=Currency with Code 'USD')
account (id=9, number=0910, createdate=today, clientId=Client with name 'Joe')
Problem:
script must check if row exists or not before inserting new data
script must allow us to add a foreign key to the new row where this foreign related to a row already found in database (as currencyId in client table)
script must allow us to add the current datetime to the column in the insert statement (such as createdate in client table)
script must allow us to add a foreign key to the new row where this foreign related to a row inserted in the same script (such as clientId in account table)
Note: I tried the following SQL statement but it solved only the first problem
INSERT INTO Client (id, name, createdate, currencyId)
SELECT 6, 'Joe', '05-11-2012', 1
WHERE not exists (SELECT * FROM Client where id=6);
this query runs without any error but as you can see I wrote createdate and currencyid manually, I need to take currency id from a select statement with where clause (I tried to substitute 1 by select statement but query failed).
This is an example about what I need, in my database, I need this script to insert more than 30 rows in more than 10 tables.
any help
You wrote
I tried to substitute 1 by select statement but query failed
But I wonder why did it fail? What did you try? This should work:
INSERT INTO Client (id, name, createdate, currencyId)
SELECT
6,
'Joe',
current_date,
(select c.id from currency as c where c.code = 'USD') as currencyId
WHERE not exists (SELECT * FROM Client where id=6);
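The same pattern extends to all four requirements in the question: an existence check, a foreign key looked up from an existing row, the current date, and a foreign key pointing at a row inserted earlier in the same script. The sketch below runs it against SQLite through Python's sqlite3 module, so the syntax is an assumption as far as other databases go (for example, some require a FROM clause in the SELECT). The names `seed`, `INSERT_SCRIPT` and `insert_missing` are hypothetical.

```python
import sqlite3


def seed(conn):
    """Create the three tables and the rows that already exist."""
    conn.executescript("""
        CREATE TABLE currency (id INTEGER PRIMARY KEY, code TEXT NOT NULL,
                               lastupdate TEXT NOT NULL, rate INTEGER);
        CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                             createdate TEXT NOT NULL,
                             currencyId INTEGER REFERENCES currency(id));
        CREATE TABLE account (id INTEGER PRIMARY KEY, number TEXT NOT NULL,
                              createdate TEXT NOT NULL,
                              clientId INTEGER REFERENCES client(id));
        INSERT INTO currency VALUES (1, 'USD', '2012-05-11', 2),
                                    (2, 'EUR', '2012-05-11', 3);
    """)


# Each statement inserts only when the target id is not present yet.
INSERT_SCRIPT = """
INSERT INTO currency (id, code, lastupdate, rate)
SELECT 3, 'JPY', CURRENT_DATE, 4
WHERE NOT EXISTS (SELECT 1 FROM currency WHERE id = 3);

-- foreign key taken from a row that already existed
INSERT INTO client (id, name, createdate, currencyId)
SELECT 6, 'Joe', CURRENT_DATE, (SELECT id FROM currency WHERE code = 'USD')
WHERE NOT EXISTS (SELECT 1 FROM client WHERE id = 6);

-- foreign key taken from the row inserted just above
INSERT INTO account (id, number, createdate, clientId)
SELECT 9, '0910', CURRENT_DATE, (SELECT id FROM client WHERE name = 'Joe')
WHERE NOT EXISTS (SELECT 1 FROM account WHERE id = 9);
"""


def insert_missing(conn):
    conn.executescript(INSERT_SCRIPT)
```

Running `insert_missing` a second time leaves the tables unchanged, which is what makes the script safe to re-run.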
It looks like you can work out if the data exists.
Here is a quick bit of code written in SQL Server / Sybase that I think answers your basic questions:
create table currency(
id numeric(16,0) identity primary key,
code varchar(3) not null,
lastupdated datetime not null,
rate smallint
);
create table client(
id numeric(16,0) identity primary key,
createddate datetime not null,
currencyid numeric(16,0) foreign key references currency(id)
);
insert into currency (code, lastupdated, rate)
values('EUR',GETDATE(),3)
--inserts the date and last allocated identity into client
insert into client(createddate, currencyid)
values(GETDATE(), @@IDENTITY)
go

MySQL GROUP BY optimization

This question is a more specific version of a previous question I asked
Table
CREATE TABLE Test4_ClusterMatches
(
`match_index` INT UNSIGNED,
`cluster_index` INT UNSIGNED,
`id` INT NOT NULL AUTO_INCREMENT,
`tfidf` FLOAT,
PRIMARY KEY (`cluster_index`,`match_index`,`id`)
);
The query I want to run
mysql> explain SELECT `match_index`, SUM(`tfidf`) AS total
FROM Test4_ClusterMatches WHERE `cluster_index` IN (1,2,3 ... 3000)
GROUP BY `match_index`;
The Problem with the query
It uses temporary and filesort, so it's too slow:
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
| 1 | SIMPLE | Test4_ClusterMatches | range | PRIMARY | PRIMARY | 4 | NULL | 51540 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
With the current indexing the query would need to sort by cluster_index first to eliminate the use of temporary and filesort, but doing so gives the wrong results for sum(tfidf).
Changing the primary key to
PRIMARY KEY (`match_index`,`cluster_index`,`id`)
This doesn't use filesort or temporary tables, but it examines 14,932,441 rows, so it is also too slow:
+----+-------------+----------------------+-------+---------------+---------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------+---------+------+----------+--------------------------+
| 1 | SIMPLE | Test5_ClusterMatches | index | NULL | PRIMARY | 16 | NULL | 14932441 | Using where; Using index |
+----+-------------+----------------------+-------+---------------+---------+---------+------+----------+--------------------------+
Tight Index Scan
Using tight index scan by running the search for just one index
mysql> explain SELECT match_index, SUM(tfidf) AS total
FROM Test4_ClusterMatches WHERE cluster_index =3000
GROUP BY match_index;
Eliminates the temporary tables and filesort.
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+--------------------------+
| 1 | SIMPLE | Test4_ClusterMatches | ref | PRIMARY | PRIMARY | 4 | const | 27 | Using where; Using index |
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+--------------------------+
I'm not sure if this can be exploited with some magic sql-fu that I haven't come across yet?
Question
How can I change my query so that it uses 3,000 cluster_indexes and avoids temporary and filesort, without needing to examine 14,932,441 rows?
Update
Using the table
CREATE TABLE Test6_ClusterMatches
(
match_index INT UNSIGNED,
cluster_index INT UNSIGNED,
id INT NOT NULL AUTO_INCREMENT,
tfidf FLOAT,
PRIMARY KEY (id),
UNIQUE KEY(cluster_index,match_index)
);
The query below then gives 10 rows in set (0.41 sec) :)
SELECT `match_index`, SUM(`tfidf`) AS total FROM Test6_ClusterMatches WHERE
`cluster_index` IN (.....)
GROUP BY `match_index` ORDER BY total DESC LIMIT 0,10;
but it's using temporary and filesort
+----+-------------+----------------------+-------+---------------+---------------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | Test6_ClusterMatches | range | cluster_index | cluster_index | 5 | NULL | 78663 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------+-------+---------------+---------------+---------+------+-------+----------------------------------------------+
I'm wondering if there's any way to make it faster by eliminating the temporary table and filesort?
I had a quick look and this is what I came up with - hope it helps...
SQL Table
drop table if exists cluster_matches;
create table cluster_matches
(
cluster_id int unsigned not null,
match_id int unsigned not null,
...
tfidf float not null default 0,
primary key (cluster_id, match_id) -- if this isnt unique add id to the end !!
)
engine=innodb;
Test Data
select count(*) from cluster_matches
count(*)
========
17974591
select count(distinct(cluster_id)) from cluster_matches;
count(distinct(cluster_id))
===========================
1000000
select count(distinct(match_id)) from cluster_matches;
count(distinct(match_id))
=========================
6000
explain select
cm.match_id,
sum(tfidf) as sum_tfidf,
count(*) as count_tfidf
from
cluster_matches cm
where
cm.cluster_id between 5000 and 10000
group by
cm.match_id
order by
sum_tfidf desc limit 10;
id select_type table type possible_keys key key_len ref rows Extra
== =========== ===== ==== ============= === ======= === ==== =====
1 SIMPLE cm range PRIMARY PRIMARY 4 290016 Using where; Using temporary; Using filesort
runtime - 0.067 seconds.
Pretty respectable runtime of 0.067 seconds but I think we can make it better.
Stored Procedure
You will have to forgive me for not wanting to type/pass in a list of 5000+ random cluster_ids !
call sum_cluster_matches(null,1); -- for testing
call sum_cluster_matches('1,2,3,4,....5000',1);
The bulk of the sproc isn't very elegant, but all it does is split a CSV string into individual cluster_ids and populate a temp table.
drop procedure if exists sum_cluster_matches;
delimiter #
create procedure sum_cluster_matches
(
in p_cluster_id_csv varchar(65535),
in p_show_explain tinyint unsigned
)
proc_main:begin
declare v_id varchar(10);
declare v_done tinyint unsigned default 0;
declare v_idx int unsigned default 1;
create temporary table tmp(cluster_id int unsigned not null primary key);
-- not very elegant - split the string into tokens and put them into a temp table...
if p_cluster_id_csv is not null then
while not v_done do
set v_id = trim(substring(p_cluster_id_csv, v_idx,
if(locate(',', p_cluster_id_csv, v_idx) > 0,
locate(',', p_cluster_id_csv, v_idx) - v_idx, length(p_cluster_id_csv))));
if length(v_id) > 0 then
set v_idx = v_idx + length(v_id) + 1;
insert ignore into tmp values(v_id);
else
set v_done = 1;
end if;
end while;
else
-- instead of passing in a huge comma separated list of cluster_ids im cheating here to save typing
insert into tmp select cluster_id from clusters where cluster_id between 5000 and 10000;
-- end cheat
end if;
if p_show_explain then
select count(*) as count_of_tmp from tmp;
explain
select
cm.match_id,
sum(tfidf) as sum_tfidf,
count(*) as count_tfidf
from
cluster_matches cm
inner join tmp on tmp.cluster_id = cm.cluster_id
group by
cm.match_id
order by
sum_tfidf desc limit 10;
end if;
select
cm.match_id,
sum(tfidf) as sum_tfidf,
count(*) as count_tfidf
from
cluster_matches cm
inner join tmp on tmp.cluster_id = cm.cluster_id
group by
cm.match_id
order by
sum_tfidf desc limit 10;
drop temporary table if exists tmp;
end proc_main #
delimiter ;
Results
call sum_cluster_matches(null,1);
count_of_tmp
============
5001
id select_type table type possible_keys key key_len ref rows Extra
== =========== ===== ==== ============= === ======= === ==== =====
1 SIMPLE tmp index PRIMARY PRIMARY 4 5001 Using index; Using temporary; Using filesort
1 SIMPLE cm ref PRIMARY PRIMARY 4 vldb_db.tmp.cluster_id 8
match_id sum_tfidf count_tfidf
======== ========= ===========
1618 387 64
1473 387 64
3307 382 64
2495 373 64
1135 373 64
3832 372 57
3203 362 58
5464 358 67
2100 355 60
1634 354 52
runtime 0.028 seconds.
Explain plan and runtime much improved.
If the cluster_index values in the WHERE condition are continuous, then instead of IN use:
WHERE (cluster_index >= 1) and (cluster_index <= 3000)
If the values are not continuous then you can create a temporary table to hold the cluster_index values with an index and use an INNER JOIN to the temporary table.
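As a rough illustration of the temporary-table approach, the sketch below loads the wanted cluster_index values into an indexed temp table and joins against it instead of passing a long IN list. It uses Python's built-in sqlite3 module, so it only shows the shape of the query and says nothing about MySQL's execution plan or timings. The table name `cluster_matches`, the temp table `wanted` and the function `top_matches` are assumptions for the example.

```python
import sqlite3


def top_matches(conn, cluster_ids, limit=10):
    """Sum tfidf per match_index for the given clusters via a temp-table join."""
    # The primary key gives the temp table an index on cluster_index.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS wanted (cluster_index INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM wanted")
    conn.executemany(
        "INSERT OR IGNORE INTO wanted VALUES (?)", [(c,) for c in set(cluster_ids)]
    )
    return conn.execute(
        """
        SELECT cm.match_index, SUM(cm.tfidf) AS total
        FROM cluster_matches AS cm
        JOIN wanted AS w ON w.cluster_index = cm.cluster_index
        GROUP BY cm.match_index
        ORDER BY total DESC, cm.match_index
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

Whether this beats the IN list on a given MySQL version is something to confirm with EXPLAIN on real data.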