How to choose indexes - sql

I'm trying to execute the following query several times:
SELECT st2.stop_id AS to_stop_id,
TIME_TO_SEC(
ADDTIME(TIMEDIFF(MIN(st1.time), %time),
TIMEDIFF(st2.time, st2.time))) AS duration
FROM stop_times st1,
stop_times st2,
trips tr,
calendar cal
WHERE tr.service_id = cal.service_id
AND tr.trip_id = st1.trip_id
AND st1.trip_id = st2.trip_id
AND st1.stop_id = %sid
AND st1.stop_seq +1 = st2.stop_seq
AND st1.time > %time
AND DATE(NOW()) BETWEEN cal.start_date AND
cal.end_date
GROUP BY st2.stop_id
However, it runs extremely slowly. I indexed the following columns:
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| stop_times | 0 | st_id | 1 | st_id | A | 11431583 | NULL | NULL | | BTREE | | |
| stop_times | 1 | fk_tid_s | 1 | trip_id | A | 1039234 | NULL | NULL | YES | BTREE | | |
| stop_times | 1 | st_per_sid | 1 | stop_id | A | 33135 | NULL | NULL | YES | BTREE | | |
| calendar | 0 | PRIMARY | 1 | service_id | A | 5206 | NULL | NULL | | BTREE | | |
| calendar | 0 | PRIMARY | 1 | service_id | A | 5206 | NULL | NULL | | BTREE | | |
| trips | 0 | PRIMARY | 1 | trip_id | A | 449489 | NULL | NULL | | BTREE | | |
| trips | 1 | fk_rid | 1 | route_id | A | 1937 | NULL | NULL | YES | BTREE | | |
| trips | 1 | fk_sid | 1 | service_id | A | 7749 | NULL | NULL | YES | BTREE | | |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
(For some reason, st_id is not shown as a PRIMARY KEY, although it is one. I don't know whether that matters, but I mention it just in case.)
I ran EXPLAIN on this query and it gave the following output:
+------+-------------+-------+--------+-------------------------------------------------+---------------------+---------+------------------------------+------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+-------------------------------------------------+---------------------+---------+------------------------------+------+---------------------------------------------------------------------+
| 1 | SIMPLE | st1 | range | comp_uniq_st_seq,st_per_sid,comp_uniq_stid_time | comp_uniq_stid_time | 9 | NULL | 1396 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | tr | eq_ref | PRIMARY,fk_sid | PRIMARY | 8 | reseau_ratp.st1.trip_id | 1 | Using where |
| 1 | SIMPLE | cal | eq_ref | PRIMARY,comp_sid_date_en,comp_sid_date_st | PRIMARY | 4 | reseau_ratp.tr.service_id | 1 | Using where |
| 1 | SIMPLE | st2 | ref | comp_uniq_st_seq | comp_uniq_st_seq | 14 | reseau_ratp.st1.trip_id,func | 1 | Using index condition |
+------+-------------+-------+--------+-------------------------------------------------+---------------------+---------+------------------------------+------+---------------------------------------------------------------------+
What should I do to get this query running faster?
EDIT:
Query using the requested JOIN ... ON syntax:
SELECT st2.stop_id AS to_stop_id,
TIME_TO_SEC(
ADDTIME(TIMEDIFF(MIN(st1.time), %time),
TIMEDIFF(st2.time, st2.time))) AS duration
FROM stop_times st1
INNER JOIN stop_times st2
ON st1.trip_id = st2.trip_id AND st1.stop_seq + 1 = st2.stop_seq
INNER JOIN trips tr
ON tr.trip_id = st1.trip_id
INNER JOIN calendar cal
ON tr.service_id = cal.service_id
WHERE st1.stop_id = %sid
AND st1.time > %time
AND cal.start_date <= NOW()
AND cal.end_date >= NOW()
GROUP BY st2.stop_id
Here is the SHOW CREATE TABLE output for stop_times:
CREATE TABLE `stop_times` (
`trip_id` bigint(10) unsigned DEFAULT NULL,
`stop_id` int(10) DEFAULT NULL,
`time` time DEFAULT NULL,
`stop_seq` int(10) unsigned DEFAULT NULL,
UNIQUE KEY `comp_uniq_st_seq` (`trip_id`,`stop_seq`),
KEY `comp_uniq_stid_time` (`stop_id`,`time`),
CONSTRAINT `fk_sid_s` FOREIGN KEY (`stop_id`) REFERENCES `stops` (`stop_id`),
CONSTRAINT `fk_tid_s` FOREIGN KEY (`trip_id`) REFERENCES `trips` (`trip_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
For calendar:
CREATE TABLE `calendar` (
`service_id` int(10) unsigned NOT NULL,
`start_date` date DEFAULT NULL,
`end_date` date DEFAULT NULL,
PRIMARY KEY (`service_id`),
KEY `comp_sid_date_en` (`service_id`,`end_date`),
KEY `comp_sid_date_st` (`service_id`,`start_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
And for trips:
CREATE TABLE `trips` (
`trip_id` bigint(10) unsigned NOT NULL DEFAULT '0',
`route_id` int(10) unsigned DEFAULT NULL,
`service_id` int(10) unsigned DEFAULT NULL,
`trip_headsign` varchar(15) DEFAULT NULL,
`trip_short_name` varchar(15) DEFAULT NULL,
`direction_id` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`trip_id`),
KEY `fk_rid` (`route_id`),
KEY `fk_sid` (`service_id`),
CONSTRAINT `fk_rid` FOREIGN KEY (`route_id`) REFERENCES `routes` (`route_id`),
CONSTRAINT `fk_sid` FOREIGN KEY (`service_id`) REFERENCES `calendar` (`service_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

st1 needs this composite index: INDEX(stop_id, time)
Please use the JOIN ... ON syntax.
Please provide SHOW CREATE TABLE.
Here is a Cookbook on creating INDEXes from a SELECT.
(Edit)
Calendar is trickier to handle, and there is no "good" index. These may help:
INDEX(service_id, start_date)
INDEX(service_id, end_date)
plus, reformulate AND DATE(NOW()) BETWEEN cal.start_date AND cal.end_date into
AND cal.start_date <= NOW()
AND cal.end_date >= NOW()
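One thing to check with the reformulation above is the last day of a service period. The sketch below is plain Python, not MySQL, and it assumes that MySQL promotes a DATE to a DATETIME at 00:00:00 when it is compared with NOW(). Under that assumption the two forms agree in the middle of the period but disagree on the end date itself. The function names are made up for illustration.

```python
from datetime import date, datetime, time

def as_midnight(d: date) -> datetime:
    # Assumption: a DATE compared with a DATETIME is treated as midnight of that day.
    return datetime.combine(d, time.min)

def active_by_date(start: date, end: date, now: datetime) -> bool:
    # Models: DATE(NOW()) BETWEEN start_date AND end_date
    return start <= now.date() <= end

def active_by_datetime(start: date, end: date, now: datetime) -> bool:
    # Models: start_date <= NOW() AND end_date >= NOW()
    return as_midnight(start) <= now and as_midnight(end) >= now

start, end = date(2024, 1, 1), date(2024, 1, 31)
mid_month = datetime(2024, 1, 15, 12, 0)
last_day = datetime(2024, 1, 31, 12, 0)

print(active_by_date(start, end, mid_month), active_by_datetime(start, end, mid_month))
print(active_by_date(start, end, last_day), active_by_datetime(start, end, last_day))
```

If the day-level behaviour of the original query matters, comparing against CURDATE() instead of NOW() should keep it while still leaving the columns bare for the optimizer.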
(Edit 2)
Wherever practical, say NOT NULL. This is probably especially important in stop_times which does not have a PRIMARY KEY. Change the two columns in UNIQUE KEY comp_uniq_st_seq (trip_id,stop_seq) to be NOT NULL and turn it into PRIMARY KEY (trip_id, stop_seq). This will allow the performance benefits of "the PK is clustered with the data" to kick in.
Now that I see the CREATE TABLE for Calendar, and that service_id is the PRIMARY KEY, the two indexes I suggested for it are probably useless. (Again, this relates to "clustering".)
My Cookbook for building indexes may come in handy.

Related

Response Slow when Order BY is added to my SQL Query

I have the following job_requests table schema as shown here
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| available_to | integer[] | NO | | | |
| available_type | varchar(255) | NO | | NULL | |
| start_at | varchar(255) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
I have the following query to return a list of records and order them by the type_of_pool value
WITH matching_jobs AS (
SELECT
job_requests_with_distance.*,
CASE WHEN (users.id = ANY (available_to) AND available_type = 0) THEN 'favourite'
ELSE 'normal'
END AS type_of_pool
FROM (
SELECT
job_requests.*,
users.id AS user_id
FROM
job_requests,
users) AS job_requests_with_distance
LEFT JOIN users ON users.id = user_id
WHERE start_at > NOW() at time zone 'Asia/Kuala_Lumpur'
AND user_id = 491
AND (user_id != ALL(coalesce(unavailable_to, array[]::int[])))
)
SELECT
*
FROM
matching_jobs
WHERE (type_of_pool != 'normal')::BOOLEAN
ORDER BY
array_position (ARRAY['favourite','exclusive','normal']::text[], type_of_pool)
LIMIT 30
If I remove the ORDER BY clause, the query takes about 3 ms, but with the ORDER BY clause it takes about 1.3 seconds to run.
I'm not sure how to optimize this query to make it faster. I have read about using indexes, but I don't see how an index would help in this scenario.
Any help is appreciated.

Using CTE and aggregate functions with UPDATE

In PostgreSQL 10.6 I am trying to store results of 2 aggregate function calls into the avg_score and avg_time columns of the words_users table, but unfortunately get the syntax error:
WITH last_week_moves AS (
SELECT
m.gid,
m.uid,
m.played - LAG(m.played) OVER(PARTITION BY m.gid ORDER BY played) AS diff
FROM words_moves m
JOIN words_games g ON (m.gid = g.gid AND 5 IN (g.player1, g.player2))
WHERE m.played > CURRENT_TIMESTAMP - INTERVAL '1 week'
)
UPDATE words_users SET
avg_score = (SELECT ROUND(AVG(score), 1) FROM words_moves WHERE uid = 5),
avg_time = TO_CHAR(AVG(diff), 'HH24:MI')
FROM last_week_moves
WHERE uid = 5
GROUP BY uid;
(I am using the hardcoded uid = 5 in the above statement, but in real life the statement is wrapped inside a PL/pgSQL stored function and uses a parameter, uid = in_uid.)
ERROR: 42601: syntax error at or near "GROUP"
LINE 15: GROUP BY uid
^
LOCATION: scanner_yyerror, scan.l:1128
The database seems to be unhappy with the GROUP BY, but I need it for AVG(diff), because the CTE delivers the times between moves always for 2 players in a game:
SELECT
m.gid,
m.uid,
m.played - LAG(m.played) OVER(PARTITION BY m.gid ORDER BY played) AS diff
FROM words_moves m
JOIN words_games g ON (m.gid = g.gid AND 5 IN (g.player1, g.player2))
WHERE m.played > CURRENT_TIMESTAMP - INTERVAL '1 week';
gid | uid | diff
-------+-------+-----------------------
50399 | 774 | ¤
50608 | 8977 | ¤
50608 | 5 | 00:39:48.121149
50608 | 8977 | 00:09:46.221235
50608 | 5 | 01:35:23.524209
50608 | 8977 | 09:26:40.794061
50697 | 5 | ¤
50697 | 10322 | 02:13:16.502079
50697 | 5 | 01:47:44.681788
50697 | 10322 | 00:01:31.597973
50697 | 5 | 12:11:24.54716
50697 | 10322 | 12:01:15.078243
50697 | 5 | 11:52:39.60056
50697 | 10322 | 00:11:30.491137
50697 | 5 | 00:14:53.612513
50697 | 10322 | 01:45:23.940957
...
52469 | 5 | 02:46:29.768655
52469 | 8550 | 01:16:45.169882
52469 | 5 | 08:38:00.691552
(59 rows)
Does anybody know how to change my UPDATE query?
Below are the 3 tables in question:
# \d words_users
Table "public.words_users"
Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+------------------------------------------
uid | integer | | not null | nextval('words_users_uid_seq'::regclass)
created | timestamp with time zone | | not null |
visited | timestamp with time zone | | not null |
ip | inet | | not null |
fcm | text | | |
apns | text | | |
adm | text | | |
motto | text | | |
vip_until | timestamp with time zone | | |
grand_until | timestamp with time zone | | |
banned_until | timestamp with time zone | | |
banned_reason | text | | |
elo | integer | | not null |
medals | integer | | not null |
coins | integer | | not null |
avg_score | double precision | | |
avg_time | text | | |
Indexes:
"words_users_pkey" PRIMARY KEY, btree (uid)
Check constraints:
"words_users_banned_reason_check" CHECK (length(banned_reason) > 0)
"words_users_elo_check" CHECK (elo >= 0)
"words_users_medals_check" CHECK (medals >= 0)
Referenced by:
TABLE "words_chat" CONSTRAINT "words_chat_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_games" CONSTRAINT "words_games_player1_fkey" FOREIGN KEY (player1) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_games" CONSTRAINT "words_games_player2_fkey" FOREIGN KEY (player2) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_moves" CONSTRAINT "words_moves_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_reviews" CONSTRAINT "words_reviews_author_fkey" FOREIGN KEY (author) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_reviews" CONSTRAINT "words_reviews_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_scores" CONSTRAINT "words_scores_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_social" CONSTRAINT "words_social_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
TABLE "words_stats" CONSTRAINT "words_stats_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
# \d words_moves
Table "public.words_moves"
Column | Type | Collation | Nullable | Default
---------+--------------------------+-----------+----------+------------------------------------------
mid | bigint | | not null | nextval('words_moves_mid_seq'::regclass)
action | text | | not null |
gid | integer | | not null |
uid | integer | | not null |
played | timestamp with time zone | | not null |
tiles | jsonb | | |
score | integer | | |
letters | text | | |
hand | text | | |
puzzle | boolean | | not null | false
Indexes:
"words_moves_pkey" PRIMARY KEY, btree (mid)
"words_moves_gid_played_idx" btree (gid, played DESC)
"words_moves_uid_idx" btree (uid)
Check constraints:
"words_moves_score_check" CHECK (score >= 0)
Foreign-key constraints:
"words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
"words_moves_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
Referenced by:
TABLE "words_scores" CONSTRAINT "words_scores_mid_fkey" FOREIGN KEY (mid) REFERENCES words_moves(mid) ON DELETE CASCADE
# \d words_games
Table "public.words_games"
Column | Type | Collation | Nullable | Default
----------+--------------------------+-----------+----------+------------------------------------------
gid | integer | | not null | nextval('words_games_gid_seq'::regclass)
created | timestamp with time zone | | not null |
finished | timestamp with time zone | | |
player1 | integer | | not null |
player2 | integer | | |
played1 | timestamp with time zone | | |
played2 | timestamp with time zone | | |
state1 | text | | |
state2 | text | | |
reason | text | | |
hint1 | text | | |
hint2 | text | | |
score1 | integer | | not null |
score2 | integer | | not null |
chat1 | integer | | not null |
chat2 | integer | | not null |
hand1 | character(1)[] | | not null |
hand2 | character(1)[] | | not null |
pile | character(1)[] | | not null |
letters | character(1)[] | | not null |
values | integer[] | | not null |
bid | integer | | not null |
friendly | boolean | | |
Indexes:
"words_games_pkey" PRIMARY KEY, btree (gid)
"words_games_player1_coalesce_idx" btree (player1, COALESCE(finished, 'infinity'::timestamp with time zone))
"words_games_player2_coalesce_idx" btree (player2, COALESCE(finished, 'infinity'::timestamp with time zone))
Check constraints:
"words_games_chat1_check" CHECK (chat1 >= 0)
"words_games_chat2_check" CHECK (chat2 >= 0)
"words_games_check" CHECK (player1 <> player2)
"words_games_score1_check" CHECK (score1 >= 0)
"words_games_score2_check" CHECK (score2 >= 0)
Foreign-key constraints:
"words_games_bid_fkey" FOREIGN KEY (bid) REFERENCES words_boards(bid) ON DELETE CASCADE
"words_games_player1_fkey" FOREIGN KEY (player1) REFERENCES words_users(uid) ON DELETE CASCADE
"words_games_player2_fkey" FOREIGN KEY (player2) REFERENCES words_users(uid) ON DELETE CASCADE
Referenced by:
TABLE "words_chat" CONSTRAINT "words_chat_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
TABLE "words_moves" CONSTRAINT "words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
TABLE "words_scores" CONSTRAINT "words_scores_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
UPDATE:
As suggested by @lau I have tried moving the AVG into the CTE, but get another syntax error:
WITH last_week_moves AS (
SELECT
m.gid,
m.uid,
TO_CHAR(AVG(m.played - LAG(m.played) OVER(PARTITION BY m.gid ORDER BY played)), 'HH24:MI') AS diff
FROM words_moves m
JOIN words_games g ON (m.gid = g.gid AND 5 IN (g.player1, g.player2))
WHERE m.played > CURRENT_TIMESTAMP - INTERVAL '1 week'
GROUP BY uid
)
UPDATE words_users SET
avg_score = (SELECT ROUND(AVG(score), 1) FROM words_moves WHERE uid = 5),
avg_time = diff
FROM last_week_moves
WHERE uid = 5;
ERROR: 42803: aggregate function calls cannot contain window function calls
LINE 5: TO_CHAR(AVG(m.played - LAG(m.played)...
^
LOCATION: check_agg_arguments_walker, parse_agg.c:728
You seem to want:
WITH last_week_moves AS (
SELECT m.uid,
(MAX(m.played) - MIN(m.played)) / NULLIF(COUNT(*) - 1, 0) as avg_diff,
AVG(score) as avg_score
FROM words_moves m JOIN
words_games g
ON m.gid = g.gid AND 5 IN (g.player1, g.player2)
WHERE m.played > CURRENT_TIMESTAMP - INTERVAL '1 week'
GROUP BY m.uid
)
UPDATE words_users wu
SET avg_score = lwm.avg_score,
avg_time = TO_CHAR(avg_diff, 'HH24:MI')
FROM last_week_moves lwm
WHERE wu.uid = lwm.uid AND
wu.uid = 5;
Note that this simplifies the diff calculation, so lag() is not needed.
You can see that these are equivalent:
value diff
1
4 3
9 5
The average of diff is obviously 4. This is ((4 - 1) + (9 - 4)) / 2. You see that the "4"s cancel, so it is really (9 - 1) / 2. This observation generalizes.
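To check the cancellation argument on more than three numbers, here is a small Python sketch (not part of the SQL answer; the function names are made up for illustration). It compares the average of consecutive gaps, as LAG() would produce them, with (MAX - MIN) / (COUNT - 1). The identity only holds when the gaps are taken between consecutive rows of the same ordered series that is being averaged.

```python
from datetime import datetime, timedelta

def mean_gap_with_lag(times):
    """Average of consecutive differences, the way LAG() would produce them."""
    ordered = sorted(times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps) if gaps else None

def mean_gap_telescoped(times):
    """The same average computed as (MAX - MIN) / (COUNT - 1)."""
    if len(times) < 2:
        return None  # mirrors NULLIF(COUNT(*) - 1, 0) producing NULL
    return (max(times) - min(times)) / (len(times) - 1)

base = datetime(2024, 1, 1)
# Same shape as the 1, 4, 9 example, expressed as hours after a base time.
played = [base + timedelta(hours=h) for h in (1, 4, 9)]

print(mean_gap_with_lag(played))
print(mean_gap_telescoped(played))
```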

Rewriting this subquery?

I am trying to build a new table such that the values in the existing table are NOT contained (but obviously the following checks for contained) in another table. Following is my table structure:
mysql> explain t1;
+-----------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| point | bigint(20) unsigned | NO | MUL | 0 | |
+-----------+---------------------+------+-----+---------+-------+
mysql> explain whitelist;
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| x | bigint(20) unsigned | YES | | NULL | |
| y | bigint(20) unsigned | YES | | NULL | |
| geonetwork | linestring | NO | MUL | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
My query looks like this:
SELECT point
FROM t1
WHERE EXISTS(SELECT source
FROM whitelist
WHERE MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))));
Explain:
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
| 1 | PRIMARY | t1 | index | NULL | point | 8 | NULL | 1001 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | whitelist | ALL | _geonetwork | NULL | NULL | NULL | 3257 | Using where |
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
The query is taking 6 seconds to execute for 1000 records in t1 which is unacceptable for me. How can I rewrite this query using Joins (or perhaps a faster way if that exists) if I don't have a column to join on? Even a stored procedure is acceptable I guess in the worst case. My goal is to finally create a new table containing entries from t1. Any suggestions?
Unless the query optimizer is failing, a WHERE EXISTS construct should result in the same plan as a join with a GROUP BY clause. Look at optimizing MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))); that's probably where your query is spending all its time. I don't have a suggestion for that, but here's your query written with a JOIN:
Select t1.point
from t1
join whitelist on MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
group by t1.point
;
or to get the points in t1 not in whitelist:
Select t1.point
from t1
left join whitelist on MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
where whitelist.id is null
;
This seems like a case where denormalizing t1 might be beneficial. Adding a GeomFrmTxt column with a value of GeomFromText(CONCAT('POINT(', t1.point, ' 0)')) could speed up the query you already have.
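For anyone who wants to try the two join shapes without a spatial setup, here is a rough Python sketch using the standard sqlite3 module. It is only a stand-in: SQLite replaces MySQL, and a BETWEEN test on hypothetical lo and hi columns replaces MBRContains on geonetwork. The point is the shape of the queries, a join with GROUP BY for "contained" and a LEFT JOIN ... IS NULL for "not contained".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (point INTEGER NOT NULL);
    -- lo/hi are hypothetical columns standing in for the geonetwork geometry.
    CREATE TABLE whitelist (id INTEGER PRIMARY KEY, lo INTEGER, hi INTEGER);
    INSERT INTO t1 (point) VALUES (5), (15), (25);
    INSERT INTO whitelist (lo, hi) VALUES (0, 10), (20, 30), (4, 6);
""")

# Points covered by at least one whitelist entry; GROUP BY removes the
# duplicate produced when a point matches several entries.
contained = [row[0] for row in conn.execute("""
    SELECT t1.point
    FROM t1
    JOIN whitelist w ON t1.point BETWEEN w.lo AND w.hi
    GROUP BY t1.point
    ORDER BY t1.point
""")]

# Anti-join: keep only the points for which no whitelist row matched.
not_contained = [row[0] for row in conn.execute("""
    SELECT t1.point
    FROM t1
    LEFT JOIN whitelist w ON t1.point BETWEEN w.lo AND w.hi
    WHERE w.id IS NULL
    ORDER BY t1.point
""")]

print(contained, not_contained)
```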

Integer comparison in mysql using an index

I need to compare integers in a mysql table. Pretty simple, but this table is fairly large... so queries take a long time. No problem, I can use an index. According to MySQL documentation, I should be able to use an index for comparison operators:
"A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN"
However, when I try this it has no effect on performance and the index is not used according to explain :(
SELECT * FROM Node n WHERE n.X < 800000
That results in extremely poor performance, and EXPLAIN lists our "Rectangle_Index" among the possible_keys, but the key actually used is NULL. Here's our CREATE TABLE statement:
CREATE TABLE `Visual_Node` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`X` bigint(20) NOT NULL,
`Y` bigint(20) NOT NULL,
`X_plus_Width` bigint(20) DEFAULT NULL,
`Y_plus_Height` bigint(20) DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `Rectangle_Index` (`X`,`X_plus_Width`,`Y`,`Y_plus_Height`)
) ENGINE=InnoDB AUTO_INCREMENT=4340743 DEFAULT CHARSET=latin1
Can anyone help with this query? The actual query I want to run is the following:
SELECT * FROM Node n WHERE 800000 BETWEEN n.X and n.X_plus_Width AND 1234567 BETWEEN n.Y and n.Y_plus_Height
Update (as requested in one of the answers below)
Altering the table structure is very difficult for me. Here's the output of EXPLAIN for the basic query:
mysql> explain select * from Node n where n.X < 800000;
+----+-------------+-------+------+-----------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | n | ALL | Rectangle_Index | NULL | NULL | NULL | 173952 | Using where |
+----+-------------+-------+------+-----------------+------+---------+------+--------+-------------+
1 row in set (0.02 sec)
If you rewrite your query as
SELECT *
FROM Node n
WHERE
n.X <= 800000 AND
n.X_plus_Width >= 800000 AND
n.Y <= 1234567 AND
n.Y_plus_Height >= 1234567
MySQL can use the index for only one column (it can't use an index for more than one range condition, and you have four of them).
I suggest you take a look at the spatial extensions.
Have you checked the details of multiple-column indexes, specifically the part about how the optimizer is (or is not) able to use them? Here's a quote from this page:
If the table has a multiple-column
index, any leftmost prefix of the
index can be used by the optimizer to
find rows. For example, if you have a
three-column index on (col1, col2,
col3), you have indexed search
capabilities on (col1), (col1, col2),
and (col1, col2, col3).
Perhaps you could try creating multiple single-column indexes, rather than one multiple-column index?
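The leftmost-prefix rule quoted above, together with the limit of one range condition per index, can be sketched as a small Python function. This is a deliberately simplified model and not how the MySQL optimizer is implemented: it ignores covering index scans, index merge and index condition pushdown. The function name and the "eq"/"range" labels are made up for illustration.

```python
def usable_index_prefix(index_columns, predicates):
    """Return the leading index columns a B-tree lookup could narrow on.

    predicates maps a column name to "eq" or "range". Columns are consumed
    left to right; the walk stops at the first column with no predicate,
    and a range predicate is the last one that can be used.
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break  # gap in the prefix: later columns cannot narrow the search
        used.append(column)
        if kind == "range":
            break  # only one range condition per index lookup
    return used

rectangle_index = ["X", "X_plus_Width", "Y", "Y_plus_Height"]

# The rectangle query has four range conditions, one per column.
four_ranges = {column: "range" for column in rectangle_index}
print(usable_index_prefix(rectangle_index, four_ranges))

# A predicate on Y alone does not start at the leftmost column.
print(usable_index_prefix(rectangle_index, {"Y": "range"}))
```

In this model only X is usable for the rectangle query, which would correspond to a key_len of 8 (a single bigint).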
EDIT 1:
I put together a simple test on my copy of MySQL (version 5.0.51a-24+lenny3). It shows that when using both your proper query, and your test query, your Rectangle_Index is being used. However, when using the proper query, the key_len is 8, suggesting that not all the parts of the multi-column index are being used. Perhaps the output from your version of MySQL differs in this respect.
As you'll see from the output below, even when additional indexes are added, Rectangle_Index is still chosen in all cases, except when only the Y column is referenced in the query:
CREATE TABLE `Visual_Node` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`X` bigint(20) NOT NULL,
`Y` bigint(20) NOT NULL,
`X_plus_Width` bigint(20) DEFAULT NULL,
`Y_plus_Height` bigint(20) DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `Rectangle_Index` (`X`,`X_plus_Width`,`Y`,`Y_plus_Height`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `Visual_Node` VALUES
(1, 100000, 1000000, 3000000, 3000000),
(2, 200000, 2000000, 4000000, 4000000),
(3, 300000, 3000000, 5000000, 5000000),
(4, 400000, 4000000, 6000000, 6000000),
(5, 500000, 5000000, 7000000, 7000000),
(6, 600000, 6000000, 8000000, 8000000),
(7, 700000, 7000000, 9000000, 9000000),
(8, 800000, 8000000, 10000000, 10000000),
(9, 900000, 9000000, 11000000, 11000000),
(10, 1000000, 10000000, 12000000, 12000000);
EXPLAIN SELECT * FROM Visual_Node n WHERE n.X < 800000;
+----+-------------+-------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | n | range | Rectangle_Index | Rectangle_Index | 8 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+------+--------------------------+
EXPLAIN SELECT * FROM Visual_Node n WHERE n.Y < 800000;
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | n | index | NULL | Rectangle_Index | 34 | NULL | 10 | Using where; Using index |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
EXPLAIN SELECT * FROM Visual_Node n
WHERE 800000 BETWEEN n.X and n.X_plus_Width
AND 1234567 BETWEEN n.Y and n.Y_plus_Height;
+----+-------------+-------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | n | range | Rectangle_Index | Rectangle_Index | 8 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+------+--------------------------+
ALTER TABLE `Visual_Node` ADD INDEX `X_Index` (`X`,`X_plus_Width`);
ALTER TABLE `Visual_Node` ADD INDEX `Y_Index` (`Y`,`Y_plus_Height`);
EXPLAIN SELECT * FROM Visual_Node n WHERE n.X < 800000;
+----+-------------+-------+-------+-------------------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | n | range | Rectangle_Index,X_Index | Rectangle_Index | 8 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+-------------------------+-----------------+---------+------+------+--------------------------+
EXPLAIN SELECT * FROM Visual_Node n WHERE n.Y < 800000;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | n | range | Y_Index | Y_Index | 8 | NULL | 1 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
EXPLAIN SELECT * FROM Visual_Node n
WHERE 800000 BETWEEN n.X and n.X_plus_Width
AND 1234567 BETWEEN n.Y and n.Y_plus_Height;
+----+-------------+-------+-------+---------------------------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | n | range | Rectangle_Index,X_Index,Y_Index | Rectangle_Index | 8 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+---------------------------------+-----------------+---------+------+------+--------------------------+
ALTER TABLE `Visual_Node` ADD INDEX `X` (`X`,`X_plus_Width`);
ALTER TABLE `Visual_Node` ADD INDEX `X_plus_Width` (`X_plus_Width`);
ALTER TABLE `Visual_Node` ADD INDEX `Y` (`Y`);
ALTER TABLE `Visual_Node` ADD INDEX `Y_plus_Height` (`Y_plus_Height`);
EXPLAIN SELECT * FROM Visual_Node n WHERE n.X < 800000;
+----+-------------+-------+-------+---------------------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | n | range | Rectangle_Index,X_Index,X | Rectangle_Index | 8 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+---------------------------+-----------------+---------+------+------+--------------------------+
EXPLAIN SELECT * FROM Visual_Node n WHERE n.Y < 800000;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | n | range | Y_Index,Y | Y_Index | 8 | NULL | 1 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
EXPLAIN SELECT * FROM Visual_Node n
WHERE 800000 BETWEEN n.X and n.X_plus_Width
AND 1234567 BETWEEN n.Y and n.Y_plus_Height;
+----+-------------+-------+-------+----------------------------------------------------------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------------------------------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | n | range | Rectangle_Index,X_Index,Y_Index,X,X_plus_Width,Y,Y_plus_Height | Rectangle_Index | 8 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+----------------------------------------------------------------+-----------------+---------+------+------+--------------------------+
Can you post the output from your EXPLAIN query?
What version of MySQL are you using?
EDIT 2:
The Spatial Extensions, as suggested by Naktibalda, are really cool. I'd not used these before, but if you are able to alter your table structure to use them, they may solve your problem.
Curious, I did a little research, and here's the result of my test scripts:
CREATE TABLE `Spatial_Node` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`Rectangle` POLYGON NOT NULL,
PRIMARY KEY (`Id`),
SPATIAL KEY `Rectangle` (`Rectangle`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `Spatial_Node` (`Rectangle`)
SELECT Polygon(LineString(
Point(X, Y),
Point(X_plus_Width, Y),
Point(X_plus_Width, Y_plus_Height),
Point(X, Y_plus_Height),
Point(X, Y)
))
FROM Visual_Node;
SELECT AsText(`Rectangle`) FROM Spatial_Node
WHERE MBRContains(Rectangle, Point(100001, 1000001));
+-----------------------------------------------------------------------------------------+
| AsText(`Rectangle`) |
+-----------------------------------------------------------------------------------------+
| POLYGON((100000 1000000,3000000 1000000,3000000 3000000,100000 3000000,100000 1000000)) |
+-----------------------------------------------------------------------------------------+
EXPLAIN SELECT AsText(`Rectangle`) FROM Spatial_Node
WHERE MBRContains(Rectangle, Point(100001, 1000001));
+----+-------------+--------------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+-----------+---------+------+------+-------------+
| 1 | SIMPLE | Spatial_Node | range | Rectangle | Rectangle | 32 | NULL | 1 | Using where |
+----+-------------+--------------+-------+---------------+-----------+---------+------+------+-------------+
I have no idea how the speed will compare, but I've definitely learned something new and exciting today. Thanks Naktibalda :-)
Have you tried changing the index to:
CREATE TABLE `Visual_Node` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`X` bigint(20) NOT NULL,
`Y` bigint(20) NOT NULL,
`X_plus_Width` bigint(20) DEFAULT NULL,
`Y_plus_Height` bigint(20) DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `X_Index` (`X`),
KEY `Y_Index` (`Y`),
KEY `X_Width_Index` (`X_plus_Width`),
KEY `Y_Height_Index` (`Y_plus_Height`)
) ENGINE=InnoDB AUTO_INCREMENT=4340743 DEFAULT CHARSET=latin1
Judging by your AUTO_INCREMENT value, you'll probably want to test this with a smaller set of data.

MySQL syntax for Join Update

I have two tables that look like this
Train
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| TrainID | varchar(11) | NO | PRI | NULL | |
| Capacity | int(11) | NO | | 50 | |
+----------+-------------+------+-----+---------+-------+
Reservations
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| ReservationID | int(11) | NO | PRI | NULL | auto_increment |
| FirstName | varchar(30) | NO | | NULL | |
| LastName | varchar(30) | NO | | NULL | |
| DDate | date | NO | | NULL | |
| NoSeats | int(2) | NO | | NULL | |
| Route | varchar(11) | NO | | NULL | |
| Train | varchar(11) | NO | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
Currently, I'm trying to create a query that will increment the capacity of a Train if a reservation is cancelled. I know I have to perform a join, but I'm not sure how to do it in an UPDATE statement. For example, I know how to get the capacity of a Train given a certain ReservationID, like so:
select Capacity
from Train
Join Reservations on Train.TrainID = Reservations.Train
where ReservationID = "15";
But I'd like to construct the query that does this -
Increment Train.Capacity by ReservationTable.NoSeats given a ReservationID
If possible, I'd also like to know how to increment by an arbitrary number of seats. As an aside, I'm planning on deleting the reservation after I perform the increment in a Java transaction. Will the delete affect the transaction?
Thanks for the help!
MySQL supports a multi-table UPDATE syntax, which would look approximately like this:
UPDATE Reservations r JOIN Train t ON (r.Train = t.TrainID)
SET t.Capacity = t.Capacity + r.NoSeats
WHERE r.ReservationID = ?;
You can update the Train table and delete from the Reservations table in the same transaction. As long as you do the update first and then do the delete second, it should work.
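The update-then-delete order can be tried out with a rough Python sketch using the standard sqlite3 module. It is a stand-in, not the MySQL or Java code from the question: SQLite is used so the example runs anywhere, and since it does not accept the UPDATE ... JOIN form, the join is written with subqueries. The cancel_reservation helper is a made-up name, and the tables are trimmed to the columns that matter.

```python
import sqlite3

def cancel_reservation(conn, reservation_id):
    """Return the seats to the train, then delete the reservation, atomically."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        # The update must run first: it still needs the reservation row
        # to find the train and the number of seats.
        conn.execute(
            """
            UPDATE Train
               SET Capacity = Capacity + (SELECT NoSeats FROM Reservations
                                           WHERE ReservationID = ?)
             WHERE TrainID = (SELECT Train FROM Reservations
                               WHERE ReservationID = ?)
            """,
            (reservation_id, reservation_id),
        )
        conn.execute(
            "DELETE FROM Reservations WHERE ReservationID = ?",
            (reservation_id,),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Train (TrainID TEXT PRIMARY KEY, Capacity INTEGER NOT NULL DEFAULT 50);
    CREATE TABLE Reservations (
        ReservationID INTEGER PRIMARY KEY,
        NoSeats INTEGER NOT NULL,
        Train TEXT NOT NULL
    );
    INSERT INTO Train VALUES ('T1', 40);
    INSERT INTO Reservations VALUES (15, 3, 'T1');
""")

cancel_reservation(conn, 15)
capacity = conn.execute("SELECT Capacity FROM Train WHERE TrainID = 'T1'").fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM Reservations").fetchone()[0]
print(capacity, remaining)
```

If the reservation id does not exist, the subquery in the WHERE clause yields NULL, no train row matches, and nothing is changed.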
Here is another example of an UPDATE statement that contains joins to determine the value that is being updated. In this case, I want to update the transactions.payee_id with the related account payment id, if the payee_id is zero (wasn't assigned).
UPDATE transactions t
JOIN account a ON a.id = t.account_id
JOIN account ap ON ap.id = a.pmt_act_id
SET t.payee_id = a.pmt_act_id
WHERE t.payee_id = 0