MySQL version 5.1.26
I'm getting the wrong result from a SELECT that has WHERE, ORDER BY, and LIMIT clauses.
It's only a problem when the ORDER BY uses the id column.
I read the MySQL manual section on LIMIT Optimization.
My guess from reading the manual is that there is some problem with the index on the primary key, id. But I don't know where to go from here...
Question: what should I do to best solve the problem?
Works correctly:
mysql> SELECT id, created_at FROM billing_invoices
WHERE (billing_invoices.account_id = 5) ORDER BY id DESC ;
+------+---------------------+
| id | created_at |
+------+---------------------+
| 1336 | 2010-05-14 08:05:25 |
| 1334 | 2010-05-06 08:05:25 |
| 1331 | 2010-05-05 23:18:11 |
+------+---------------------+
3 rows in set (0.00 sec)
WRONG result when LIMIT is added! It should be the first row, id 1336:
mysql> SELECT id, created_at FROM billing_invoices
WHERE (billing_invoices.account_id = 5) ORDER BY id DESC limit 1;
+------+---------------------+
| id | created_at |
+------+---------------------+
| 1331 | 2010-05-05 23:18:11 |
+------+---------------------+
1 row in set (0.00 sec)
Works correctly:
mysql> SELECT id, created_at FROM billing_invoices
WHERE (billing_invoices.account_id = 5) ORDER BY created_at DESC ;
+------+---------------------+
| id | created_at |
+------+---------------------+
| 1336 | 2010-05-14 08:05:25 |
| 1334 | 2010-05-06 08:05:25 |
| 1331 | 2010-05-05 23:18:11 |
+------+---------------------+
3 rows in set (0.01 sec)
Works correctly with limit:
mysql> SELECT id, created_at FROM billing_invoices
WHERE (billing_invoices.account_id = 5) ORDER BY created_at DESC limit 1;
+------+---------------------+
| id | created_at |
+------+---------------------+
| 1336 | 2010-05-14 08:05:25 |
+------+---------------------+
1 row in set (0.01 sec)
Additional info:
explain SELECT id, created_at FROM billing_invoices WHERE (billing_invoices.account_id = 5) ORDER BY id DESC limit 1;
+----+-------------+------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+-------------+
| 1 | SIMPLE | billing_invoices | range | index_billing_invoices_on_account_id | index_billing_invoices_on_account_id | 4 | NULL | 3 | Using where |
+----+-------------+------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+-------------+
Added SHOW CREATE TABLE billing_invoices result:
Table: billing_invoices
Create Table:
CREATE TABLE `billing_invoices` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`invoice_date` date NOT NULL,
`prior_invoice_id` int(11) DEFAULT NULL,
`closing_balance` decimal(8,2) NOT NULL,
`note` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`monthly_invoice` tinyint(1) NOT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_billing_invoices_on_account_id` (`account_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1337 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Added more:
I now see that on my development machine everything works correctly. On that machine, VERSION() returns 5.1.26-rc-log.
On my production machine, where the problem occurs, VERSION() returns 5.1.26-rc-percona-log.
So at this point, I'm thinking the problem is with the Percona build?
Added more:
At this point, I'm going to consider it a bug in the Percona InnoDB driver. I've posted a question to their forum. As an immediate workaround, I'm going to order by created_at. I will also investigate upgrading the database on my system and see if that helps.
My thanks to Rabbott and mdma for their help. I also appreciate the confirmation that I'm not doing something silly and that this really is a problem.
Could be this bug that was never resolved for your updated version? http://bugs.mysql.com/bug.php?id=31001
I am running 5.1.42 locally. I copied and pasted your queries from above and am getting all the correct results. Whether or not it's the bug mentioned above, it sounds like a bug, and it appears to have been fixed in a release more recent than yours.
Seems peculiar, maybe a bug? As a workaround, you could make the selection explicit: use a subquery to select the MAX(id) and filter on that in a WHERE clause. E.g.
SELECT id, created_at FROM billing_invoices
WHERE id IN (SELECT MAX(id) FROM billing_invoices WHERE account_id=5)
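Against the sample data above, this should return the row with id 1336; because the subquery pins down a single id, adding a LIMIT can no longer change which row comes back.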
From here,
Bug Details
It seems it was fixed in 5.1.28:
[22 Jul 2008 20:34] Bugs System
Pushed into 5.1.28
However, I'm noticing the same problem in my version: 5.1.41-3ubuntu12.8
Related
I have the following job_requests table schema, as shown here:
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(11)      | NO   | PRI | NULL    | auto_increment |
| available_to   | integer[]    | NO   |     |         |                |
| available_type | varchar(255) | NO   |     | NULL    |                |
| start_at       | varchar(255) | NO   |     | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
I have the following query to return a list of records, ordered by the type_of_pool value:
WITH matching_jobs AS (
SELECT
job_requests_with_distance.*,
CASE WHEN (users.id = ANY (available_to) AND available_type = 0) THEN 'favourite'
ELSE 'normal'
END AS type_of_pool
FROM (
SELECT
job_requests.*,
users.id AS user_id
FROM
job_requests,
users) AS job_requests_with_distance
LEFT JOIN users ON users.id = user_id
WHERE start_at > NOW() at time zone 'Asia/Kuala_Lumpur'
AND user_id = 491
AND (user_id != ALL(coalesce(unavailable_to, array[]::int[])))
)
SELECT
*
FROM
matching_jobs
WHERE (type_of_pool != 'normal')::BOOLEAN
ORDER BY
array_position(ARRAY['favourite','exclusive','normal']::text[], type_of_pool)
LIMIT 30
If I remove the ORDER BY clause it takes about 3 ms, but when I add the ORDER BY it takes about 1.3 seconds to run.
How do I optimize this query to make it faster? I have read about using indexes, but I'm not sure how an index would help in this scenario.
Any help is appreciated.
I'm having a problem where my rows are being "hidden" (or SELECT * doesn't retrieve them) after updating.
I've tried with the CLI and with a GUI (DBeaver), but the result is the same; here is an example:
select * from users limit 4;
id | email | password | status | role | created_at | updated_at
----+--------------------------------+-----------------+----------+--------+----------------------------+----------------------------
8 | Brad.Bailey@gmail.com | qYHsmrKWaiaiZxI | disabled | seller | 2019-09-06 21:43:08.043-03 | 2019-08-13 16:04:25.233-03
9 | Marcelino_Prohaska97@gmail.com | sUMuOM_gXCPxz19 | disabled | seller | 2019-06-14 15:39:45.447-03 | 2019-06-25 12:54:01.023-03
10 | Gino_Blick@gmail.com | iOkZQhc7JSsQcpY | disabled | seller | 2020-02-13 13:39:16.26-03 | 2019-12-18 17:02:37.938-03
11 | Tiffany.Schuster16@yahoo.com | Bw2OhPUtIRcWxZF | active | seller | 2018-07-30 08:01:29.942-03 | 2019-09-03 10:50:40.314-03
(4 rows)
Then
update users set email = 'test@test.com' where id = 8;
UPDATE 1
And then, this happens:
select * from users limit 4;
id | email | password | status | role | created_at | updated_at
----+--------------------------------+-----------------+----------+--------+----------------------------+----------------------------
9 | Marcelino_Prohaska97@gmail.com | sUMuOM_gXCPxz19 | disabled | seller | 2019-06-14 15:39:45.447-03 | 2019-06-25 12:54:01.023-03
10 | Gino_Blick@gmail.com | iOkZQhc7JSsQcpY | disabled | seller | 2020-02-13 13:39:16.26-03 | 2019-12-18 17:02:37.938-03
11 | Tiffany.Schuster16@yahoo.com | Bw2OhPUtIRcWxZF | active | seller | 2018-07-30 08:01:29.942-03 | 2019-09-03 10:50:40.314-03
12 | Brody_Pollich@yahoo.com | ZlFy3kEUSrmxHAa | disabled | seller | 2018-07-06 13:18:29.936-03 | 2019-08-03 21:46:22.296-03
(4 rows)
The thing is, the row still exists, but it is not shown by SELECT *
select * from users where id = 8 limit 10;
id | email | password | status | role | created_at | updated_at
----+---------------+-----------------+----------+--------+----------------------------+----------------------------
8 | test@test.com | qYHsmrKWaiaiZxI | disabled | seller | 2019-09-06 21:43:08.043-03 | 2019-08-13 16:04:25.233-03
(1 row)
Here is the structure of my table (DDL generated by DBeaver) and my PostgreSQL version:
CREATE TABLE public.users (
id serial NOT NULL,
email varchar(255) NOT NULL,
"password" varchar(255) NOT NULL,
status varchar(255) NOT NULL,
"role" varchar(255) NOT NULL,
created_at timestamptz NOT NULL,
updated_at timestamptz NOT NULL,
CONSTRAINT users_email_unique UNIQUE (email),
CONSTRAINT users_pkey PRIMARY KEY (id)
);
SELECT version();
version
---------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 12.3 (Ubuntu 12.3-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)
Thanks.
Remove limit and apply order by:
select * from users order by id;
Is ID = 8 back?
Documentation says:
Because the order of the rows in the database table is unspecified, when you use the LIMIT clause, you should always use the ORDER BY clause to control the row order. If you don’t do so, you will get a result set whose rows are in an unspecified order.
which means that you previously saw the record with ID = 8 only by accident; there is no guarantee you'll get it again with just a LIMIT clause.
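For example, combining LIMIT with an explicit ORDER BY makes the result deterministic, and the id = 8 row is guaranteed to come back first:
select * from users order by id limit 4;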
I am trying to get some data that has a datetime set, using the long format (e.g. 2019-04-26 18:02:42).
When I use the following query, I expected to find the entry below:
SELECT ip, cam_id
FROM test_table
WHERE ( date_time >= '2019-04-26 20:00:00' AND date_time < '2019-04-26 20:59:59' );
Entry:
id | ip | cam_id | date_time
-----+----+--------+---------------------
1 | 13 | 2 | 2019-04-26 20:46:06
However, I am not getting any results. What am I doing wrong?
EDIT: Table schema
 Column    | Type                   | Collation | Nullable | Default                                 | Storage  | Stats target | Description
-----------+------------------------+-----------+----------+-----------------------------------------+----------+--------------+-------------
 id        | integer                |           | not null | nextval('test_table_id_seq'::regclass)  | plain    |              |
 ip        | integer                |           |          |                                         | plain    |              |
 cam_id    | integer                |           |          |                                         | plain    |              |
 date_time | character varying(255) |           |          |                                         | extended |              |
Indexes:
    "test_table_pkey" PRIMARY KEY, btree (id)
If you are new to PostgreSQL, you should start by reading the PostgreSQL manual and its examples. Don't use third-party or unrelated code and SQL generators, especially ones unrelated to PostgreSQL; those will only confuse you.
Currently, your query is comparing strings, not datetimes.
If you run this statement, it will change the date_time column's type from character varying(255) to timestamp, and then your query will work properly:
alter table test_table alter column date_time TYPE timestamp without time zone using date_time::timestamp without time zone
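Once the column is a real timestamp, an index on it (a sketch; the index name is illustrative) also lets range predicates like the one in the original query use an index scan:
create index test_table_date_time_idx on test_table (date_time);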
You have to convert the string to a datetime when comparing. Note that in PostgreSQL this is done with a cast or to_timestamp (CONVERT(datetime, ...) is SQL Server syntax):
to_timestamp(date_time, 'YYYY-MM-DD HH24:MI:SS')
I have a content application that needs to count responses in a time slice and then order them by number of responses. It currently works great with a small data set, but needs to scale to millions of rows. My current query won't scale.
mysql> describe Responses;
+---------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| site_id | int(10) unsigned | NO | MUL | NULL | |
| content_id | bigint(20) unsigned | NO | PRI | NULL | |
| response_id | bigint(20) unsigned | NO | PRI | NULL | |
| date | int(10) unsigned | NO | | NULL | |
+---------------+---------------------+------+-----+---------+-------+
The table type is InnoDB, and the primary key is on (content_id, response_id). There is an additional index on (content_id, date) used to find responses to a piece of content, and another index on (site_id, date) used in the query I'm having trouble with:
mysql> explain SELECT content_id id, COUNT(response_id) num_responses
FROM Responses
WHERE site_id = 1
AND date > 1234567890
AND date < 1293579867
GROUP BY content_id
ORDER BY num_responses DESC
LIMIT 0, 10;
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
| 1 | SIMPLE | Responses | range | date | date | 8 | NULL | 102 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
That's the best I've been able to come up with, but it will end up counting millions of rows and sorting tens of thousands, just to pull in a handful of results.
I can't think of a way to precalculate the count either, as the date range is arbitrary. I have some liberty with changing the primary key: it can be composed of content_id, response_id, and site_id in any order, but cannot contain date.
The application is developed mostly in PHP, so if there is a quicker way to accomplish the same results by splitting the query into subqueries, using temporary tables, or doing things on the application side, I'm open to suggestions.
(Reposted from comments by request)
Set up a table that has three columns: id, date, and num_responses, where num_responses holds the number of responses for the given id on the given date. Backfill the table appropriately, and then at around midnight (or later) each night, run a script that adds the rows for the previous day.
Then, to get the rows you want, you can merely query the table mentioned above.
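A minimal sketch of that summary table, the nightly backfill, and the final query (names and date arithmetic are illustrative, assuming the date column holds unix timestamps as in the schema above):
-- Hypothetical daily rollup: one row per content item per day.
-- (If you still need to filter on site_id, add it as a column here too.)
CREATE TABLE ResponseCounts (
  `id`            BIGINT UNSIGNED NOT NULL,   -- the content_id
  `date`          DATE NOT NULL,
  `num_responses` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`id`, `date`)
) ENGINE=InnoDB;

-- Nightly script: insert the previous day's counts.
INSERT INTO ResponseCounts (`id`, `date`, `num_responses`)
SELECT content_id, DATE(FROM_UNIXTIME(`date`)), COUNT(*)
FROM Responses
WHERE `date` >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 1 DAY)
  AND `date` <  UNIX_TIMESTAMP(CURDATE())
GROUP BY content_id, DATE(FROM_UNIXTIME(`date`));

-- The report then scans the much smaller rollup table.
SELECT `id`, SUM(num_responses) AS num_responses
FROM ResponseCounts
WHERE `date` >= '2009-02-14' AND `date` < '2010-12-29'
GROUP BY `id`
ORDER BY num_responses DESC
LIMIT 0, 10;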
Rather than recalculating everything on each query, how about caching the calculated counts from the last query, and then updating the cache by adding only the increment since then, using a date condition in the WHERE clause?
Have you considered partitioning the table by date? Are there any indices on the table?
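For reference, a sketch of RANGE partitioning on the unix-timestamp date column (partition names and boundaries are illustrative; note that MySQL requires every unique key, including the primary key, to contain the partitioning column, so the primary key would first have to be extended to include date):
ALTER TABLE Responses
PARTITION BY RANGE (`date`) (
  PARTITION p2009 VALUES LESS THAN (1262304000),  -- up to 2010-01-01 UTC
  PARTITION p2010 VALUES LESS THAN (1293840000),  -- up to 2011-01-01 UTC
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);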
I have two large MySQL tables: Articles and ArticleTopics. I want to query the DB and retrieve the last 30 articles published for a given topicId. My current query is rather slow. Any ideas on how to improve it?
More details:
The tables:
Articles (~1 million rows)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| articleId | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | NO | | NULL | |
| content | longtext | NO | | NULL | |
| pubDate | datetime | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+----------------+
ArticleTopics (~10 million rows)
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| articleId | int(11) | NO | MUL | NULL | |
| topicId | int(11) | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+-------+
And my query:
SELECT a.articleId, a.pubDate
FROM Articles a, ArticleTopics t
WHERE t.articleId=a.articleId AND t.topicId=3364
ORDER BY a.pubDate DESC LIMIT 30;
And the EXPLAIN of the query:
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| 1 | SIMPLE | t | ref | articleId,topicId,topicId_articleId | topicId_articleId | 4 | const | 4281 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY,articleId_pubDate | PRIMARY | 4 | t.articleId | 1 | |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
The slowness, I believe, comes from the ORDER BY a.pubDate DESC. I can greatly improve performance by faking it a bit: ordering by t.articleId DESC instead, with an index in ArticleTopics on both articleId and topicId, since the articleIds are generally in the same order as the pubDates. They are not always, however, so it's not ideal. I'd like to be able to sort on pubDate.
Update: Added EXPLAIN.
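For reference, the workaround described above would look something like this (a sketch based on that description; it orders by the join table's articleId rather than pubDate):
SELECT a.articleId, a.pubDate
FROM ArticleTopics t
JOIN Articles a ON a.articleId = t.articleId
WHERE t.topicId = 3364
ORDER BY t.articleId DESC
LIMIT 30;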
You can rewrite the query in various ways to see if it speeds things up:
SELECT a.articleId, a.pubDate
FROM Articles a
WHERE a.articleId in (
select articleId
from ArticleTopics
where topicId = 3364
)
ORDER BY a.pubDate DESC LIMIT 30;
Or:
SELECT a.articleId, a.pubDate
FROM Articles a
INNER JOIN ArticleTopics t ON t.articleId = a.articleId
WHERE t.topicId = 3364
ORDER BY a.pubDate DESC LIMIT 30;
The important index for both queries is on Articles, with articleId as its first field.
If Articles is a large table, with, say, an entire PDF in a binary column, you can create an index that fully covers the query. Full coverage means all selected fields are part of the index. For this query, a fully covering index would be (articleId, pubDate).
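As DDL, that would be something like the following (the EXPLAIN above suggests such an index, articleId_pubDate, may already exist):
CREATE INDEX articleId_pubDate ON Articles (articleId, pubDate);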
At this point, do you have an index on topicId? If so, does the index contain only the topicId field?
And maybe you can post the output of the EXPLAIN query.