Response slow when ORDER BY is added to my SQL query - sql

I have the following job_requests table schema:
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(11)      | NO   | PRI | NULL    | auto_increment |
| available_to   | integer[]    | NO   |     |         |                |
| available_type | varchar(255) | NO   |     | NULL    |                |
| start_at       | varchar(255) | NO   |     | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
I have the following query to return a list of records and order them by the type_of_pool value
WITH matching_jobs AS (
    SELECT
        job_requests_with_distance.*,
        CASE WHEN (users.id = ANY (available_to) AND available_type = 0) THEN 'favourite'
        ELSE 'normal'
        END AS type_of_pool
    FROM (
        SELECT
            job_requests.*,
            users.id AS user_id
        FROM
            job_requests,
            users) AS job_requests_with_distance
    LEFT JOIN users ON users.id = user_id
    WHERE start_at > NOW() AT TIME ZONE 'Asia/Kuala_Lumpur'
        AND user_id = 491
        AND (user_id != ALL (coalesce(unavailable_to, array[]::int[])))
)
SELECT
    *
FROM
    matching_jobs
WHERE (type_of_pool != 'normal')::BOOLEAN
ORDER BY
    array_position(ARRAY['favourite', 'exclusive', 'normal']::text[], type_of_pool)
LIMIT 30
If I remove the ORDER BY clause, the query takes about 3 ms, but when I add the ORDER BY it takes about 1.3 seconds to run.
I am not sure how to optimize this query to make it faster. I have read about using indexes, but I am not sure how an index would help in this scenario.
Any help is appreciated.
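One way to see where the time goes (a minimal diagnostic sketch, assuming PostgreSQL, since the query uses array_position and ::BOOLEAN casts) is to run the statement with and without the ORDER BY under EXPLAIN ANALYZE and compare the two plans:
-- Prefix the full statement from the question; the plan node that accounts for
-- most of the 1.3 s is the part worth indexing or rewriting. A Sort node fed by
-- many rows means every matching row is produced and sorted before LIMIT 30 applies.
EXPLAIN (ANALYZE, BUFFERS)
WITH matching_jobs AS (
    ...  -- same CTE body as in the question
)
SELECT *
FROM matching_jobs
WHERE (type_of_pool != 'normal')::BOOLEAN
ORDER BY array_position(ARRAY['favourite', 'exclusive', 'normal']::text[], type_of_pool)
LIMIT 30;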

Related

Why is my query still so fast when I filter on a non-indexed column?

I am learning about database indexing.
Here are the indexes of a table, which has 330k records:
mysql> show index from employee;
+----------+------------+-------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name    | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+-------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee |          0 | PRIMARY     |            1 | id            | A         |      297383 | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| employee |          0 | ak_employee |            1 | personal_code | A         |      297383 | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_email   |            1 | email         | A         |      297383 | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
+----------+------------+-------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
As you can see, there are only three indexes on this table.
Now I want to run a query with a WHERE condition on the birth_date column. I thought it would be very slow because there is no index on birth_date, but when I tried the query, I found it was very fast.
mysql> select sql_no_cache *
-> from employee
-> where birth_date > '1955-11-11'
-> limit 100
-> ;
100 rows in set, 1 warning (0.04 sec)
So I am confused:
Why is it still so fast without an index?
Since it is still fast, why do we need indexes at all?
This is your query:
select sql_no_cache *
from employee
where birth_date > '1955-11-11'
limit 100
There is no index on birth_date, so the query starts reading the data from the data pages. For each record, it compares the birth_date and returns the row if it matches. When it has found 100 rows (due to the LIMIT), it stops.
Presumably, it finds 100 rows quite quickly. After all, the median age of the United States is about 38 -- which is (as I write this) a birth year of 1981. By far, most people were born after 1955.
The query would be much slower if you had an order by or group by. That would require reading all the data before returning anything.
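As a quick illustration (the index name below is hypothetical, and actual timings will vary), an ORDER BY forces MySQL to read and sort every matching row before it can apply the LIMIT, unless an index already delivers the rows in the requested order:
-- With no index on birth_date, this must collect and sort all matching rows
-- before it can return the first 100:
SELECT SQL_NO_CACHE *
FROM employee
WHERE birth_date > '1955-11-11'
ORDER BY birth_date
LIMIT 100;

-- A hypothetical index on birth_date lets MySQL read rows already in
-- birth_date order, so it can again stop after 100 rows:
ALTER TABLE employee ADD INDEX idx_birth_date (birth_date);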

SQL query to find the time difference

I want to find the time difference between the login and logout of a user.
The hard part for me is that both the login and the logout time are in one column, and there is a "Status" column to show whether the row is a login or a logout.
Example:
Timestamp            Status  UserName
2015-04-26 20:12:33  Login   Grashia
2015-04-26 23:22:13  Logout  Grashia
How do I query this?
I tried the DATEDIFF function, but I know that's not the right way.
Suppose you have the following table schema
CREATE TABLE `user_log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `dt` datetime DEFAULT NULL,
  `status` varchar(15) COLLATE utf8_bin DEFAULT NULL,
  PRIMARY KEY (`id`)
);
with this sort of data
+----+---------+---------------------+--------+
| id | user_id | dt                  | status |
+----+---------+---------------------+--------+
|  1 |       1 | 2015-09-23 08:35:36 | Login  |
|  2 |       1 | 2015-09-23 17:15:44 | Logout |
|  3 |       1 | 2015-09-24 08:55:18 | Login  |
|  4 |       2 | 2015-09-23 09:00:16 | Login  |
|  5 |       2 | 2015-09-23 18:00:23 | Logout |
+----+---------+---------------------+--------+
You can use this query
SELECT i.user_id, i.dt AS 'login_dt', IFNULL(o.dt, '-') AS 'logout_dt',
       TIMEDIFF(IFNULL(o.dt, NOW()), i.dt) AS 'total_time'
FROM
    (SELECT * FROM user_log WHERE `status` = 'Login') i
LEFT OUTER JOIN
    (SELECT * FROM user_log WHERE `status` = 'Logout') o
    ON i.user_id = o.user_id AND DATE(i.dt) = DATE(o.dt)
to get this result
+---------+---------------------+---------------------+------------+
| user_id | login_dt            | logout_dt           | total_time |
+---------+---------------------+---------------------+------------+
|       1 | 2015-09-23 08:35:36 | 2015-09-23 17:15:44 | 08:40:08   |
|       1 | 2015-09-24 08:55:18 | -                   | 00:10:23   |
|       2 | 2015-09-23 09:00:16 | 2015-09-23 18:00:23 | 09:00:07   |
+---------+---------------------+---------------------+------------+
You will have to add the required indexes and choose a proper table engine for optimum performance.
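A minimal sketch of what such an index might look like (the index name is made up; the right columns depend on your actual workload):
-- The derived tables filter on `status`, and the join then matches rows by
-- user and by date, so these are the candidate columns for a composite index.
ALTER TABLE user_log ADD INDEX idx_status_user_dt (`status`, `user_id`, `dt`);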
If it is SQL Server, your query will look like this.
I suppose that there should also be a userid column.
You need to join the table to itself to compare data in the same column:
SELECT li.username, DATEDIFF(minute, li.timestamp, lo.timestamp)
FROM
    (SELECT userid, username, timestamp FROM logtable WHERE status = 'Login') AS li
INNER JOIN
    (SELECT userid, timestamp FROM logtable WHERE status = 'Logout') AS lo
    ON li.userid = lo.userid
You can read more about the DATEDIFF function here:
https://msdn.microsoft.com/en-us/library/ms189794.aspx

Rewriting this subquery?

I am trying to build a new table with the values from the existing table that are NOT contained in another table (although, obviously, the query below checks for "contained"). The following is my table structure:
mysql> explain t1;
+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| id        | int(11)             | YES  |     | NULL    |       |
| point     | bigint(20) unsigned | NO   | MUL | 0       |       |
+-----------+---------------------+------+-----+---------+-------+
mysql> explain whitelist;
+-------------+---------------------+------+-----+---------+----------------+
| Field       | Type                | Null | Key | Default | Extra          |
+-------------+---------------------+------+-----+---------+----------------+
| id          | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| x           | bigint(20) unsigned | YES  |     | NULL    |                |
| y           | bigint(20) unsigned | YES  |     | NULL    |                |
| geonetwork  | linestring          | NO   | MUL | NULL    |                |
+-------------+---------------------+------+-----+---------+----------------+
My query looks like this:
SELECT point
FROM t1
WHERE EXISTS(SELECT source
FROM whitelist
WHERE MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))));
Explain:
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
| id | select_type        | table              | type  | possible_keys     | key       | key_len | ref  | rows | Extra                    |
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
|  1 | PRIMARY            | t1                 | index | NULL              | point     | 8       | NULL | 1001 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | whitelist          | ALL   | _geonetwork       | NULL      | NULL    | NULL | 3257 | Using where              |
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
The query is taking 6 seconds to execute for 1000 records in t1 which is unacceptable for me. How can I rewrite this query using Joins (or perhaps a faster way if that exists) if I don't have a column to join on? Even a stored procedure is acceptable I guess in the worst case. My goal is to finally create a new table containing entries from t1. Any suggestions?
Unless the query optimizer is failing, a WHERE EXISTS construct should result in the same plan as a join with a GROUP BY clause. Look at optimizing MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))); that's probably where your query is spending all its time. I don't have a suggestion for that, but here's your query written with a JOIN:
SELECT t1.point
FROM t1
JOIN whitelist ON MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
GROUP BY t1.point
;
or to get the points in t1 not in whitelist:
SELECT t1.point
FROM t1
LEFT JOIN whitelist ON MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
WHERE whitelist.id IS NULL
;
This seems like a case where de-normalizing t1 might be beneficial. Adding a GeomFrmTxt column with a value of GeomFromText(CONCAT('POINT(', t1.point, ' 0)')) could speed up the query you already have.
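A rough sketch of that denormalization, with a hypothetical point_geom column standing in for the suggested GeomFrmTxt column:
-- Precompute the geometry once per row instead of calling GeomFromText on
-- every comparison, then reuse the stored value in the join.
ALTER TABLE t1 ADD COLUMN point_geom POINT;
UPDATE t1 SET point_geom = GeomFromText(CONCAT('POINT(', point, ' 0)'));

SELECT t1.point
FROM t1
LEFT JOIN whitelist ON MBRContains(whitelist.geonetwork, t1.point_geom)
WHERE whitelist.id IS NULL;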

MySQL syntax for Join Update

I have two tables that look like this
Train
+----------+-------------+------+-----+---------+-------+
| Field    | Type        | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| TrainID  | varchar(11) | NO   | PRI | NULL    |       |
| Capacity | int(11)     | NO   |     | 50      |       |
+----------+-------------+------+-----+---------+-------+
Reservations
+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| ReservationID | int(11)     | NO   | PRI | NULL    | auto_increment |
| FirstName     | varchar(30) | NO   |     | NULL    |                |
| LastName      | varchar(30) | NO   |     | NULL    |                |
| DDate         | date        | NO   |     | NULL    |                |
| NoSeats       | int(2)      | NO   |     | NULL    |                |
| Route         | varchar(11) | NO   |     | NULL    |                |
| Train         | varchar(11) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+
Currently, I'm trying to create a query that will increment the capacity on a Train if a reservation is cancelled. I know I have to perform a join, but I'm not sure how to do it in an UPDATE statement. For example, I know how to get the capacity of a Train given a certain ReservationID, like so:
select Capacity
from Train
Join Reservations on Train.TrainID = Reservations.Train
where ReservationID = "15";
But I'd like to construct the query that does this: increment Train.Capacity by Reservations.NoSeats, given a ReservationID.
If possible, I'd also like to know how to increment by an arbitrary number of seats. As an aside, I'm planning on deleting the reservation after I perform the increment in a Java transaction. Will the delete affect the transaction?
Thanks for the help!
MySQL supports a multi-table UPDATE syntax, which would look approximately like this:
UPDATE Reservations r JOIN Train t ON (r.Train = t.TrainID)
SET t.Capacity = t.Capacity + r.NoSeats
WHERE r.ReservationID = ?;
You can update the Train table and delete from the Reservations table in the same transaction. As long as you do the update first and then do the delete second, it should work.
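A minimal sketch of that transaction (123 is a placeholder for the reservation being cancelled):
START TRANSACTION;

-- Give the seats back to the train first...
UPDATE Reservations r JOIN Train t ON (r.Train = t.TrainID)
SET t.Capacity = t.Capacity + r.NoSeats
WHERE r.ReservationID = 123;

-- ...then remove the reservation, and commit both changes together.
DELETE FROM Reservations WHERE ReservationID = 123;

COMMIT;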
Here is another example of an UPDATE statement that contains joins to determine the value that is being updated. In this case, I want to update the transactions.payee_id with the related account payment id, if the payee_id is zero (wasn't assigned).
UPDATE transactions t
JOIN account a ON a.id = t.account_id
JOIN account ap ON ap.id = a.pmt_act_id
SET t.payee_id = a.pmt_act_id
WHERE t.payee_id = 0

MySQL join with sort on datetime in joined table

I have 2 large mysql tables: Articles and ArticleTopics. I want to query the DB and retrieve the last 30 articles published for a given topicID. My current query is rather slow. Any ideas on how to improve it?
More details:
The tables:
Articles (~1 million rows)
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| articleId | int(11)      | NO   | PRI | NULL    | auto_increment |
| title     | varchar(255) | NO   |     | NULL    |                |
| content   | longtext     | NO   |     | NULL    |                |
| pubDate   | datetime     | NO   | MUL | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
ArticleTopics (~10 million rows)
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| articleId | int(11)      | NO   | MUL | NULL    |       |
| topicId   | int(11)      | NO   | MUL | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
And my query:
SELECT a.articleId, a.pubDate
FROM Articles a, ArticleTopics t
WHERE t.articleId=a.articleId AND t.topicId=3364
ORDER BY a.pubDate DESC LIMIT 30;
And the EXPLAIN of the query:
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                       | key               | key_len | ref               | rows | Extra                                        |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
|  1 | SIMPLE      | t     | ref    | articleId,topicId,topicId_articleId | topicId_articleId | 4       | const             | 4281 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY,articleId_pubDate           | PRIMARY           | 4       | t.articleId       | 1    |                                              |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
The slowness, I believe, is coming from the ORDER BY a.pubDate DESC. I can greatly improve performance by faking it a bit by instead doing an ORDER BY t.articleId DESC and having an index in ArticleTopics on both articleId & topicId, since in general, the articleIds are in the same order as pubDates. They are not always, however, so it's not ideal. I'd like to be able to sort it on the pubDate.
Update: Added EXPLAIN.
You can rewrite the query in various ways to see if it speeds things up:
SELECT a.articleId, a.pubDate
FROM Articles a
WHERE a.articleId IN (
    SELECT articleId
    FROM ArticleTopics
    WHERE topicId = 3364
)
ORDER BY a.pubDate DESC LIMIT 30;
Or:
SELECT a.articleId, a.pubDate
FROM Articles a
INNER JOIN ArticleTopics t ON t.articleId = a.articleId
WHERE t.topicId = 3364
ORDER BY a.pubDate DESC LIMIT 30;
The important index for both queries is the one on Articles, with articleId as its first field.
If Articles is a large table, with, say, an entire PDF stored in a binary column, you can create an index that fully covers the query. Full coverage means all selected fields are part of the index. For this query, a fully covering index would be (articleId, pubDate).
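A sketch of that covering index (the index name is hypothetical):
-- Both selected columns are in the index, so MySQL can answer the query from
-- the index alone without touching the large table rows.
CREATE INDEX idx_articles_id_pubdate ON Articles (articleId, pubDate);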
At this point, do you have an index on topicId? If so, does the index contain only the topicId field?
And maybe you can post the output of the EXPLAIN query.