After some expected growth in our service, all of a sudden some updates are taking an extremely long time. These used to be pretty fast until the table reached about 2MM records; now each one takes about 40-60 seconds.
update table1 set field1=field1+1 where id=2229230;
Query OK, 0 rows affected (42.31 sec)
Rows matched: 1 Changed: 0 Warnings: 0
Here are the field types:
`id` bigint(20) NOT NULL auto_increment,
`field1` int(11) default '0',
Here is the profiling result for context switches, which is the only metric that shows high numbers:
mysql> show profile context switches;
+----------------------+-----------+-------------------+---------------------+
| Status               | Duration  | Context_voluntary | Context_involuntary |
+----------------------+-----------+-------------------+---------------------+
| (initialization)     |  0.000007 |                 0 |                   0 |
| checking permissions |  0.000009 |                 0 |                   0 |
| Opening tables       |  0.000045 |                 0 |                   0 |
| System lock          |  0.000009 |                 0 |                   0 |
| Table lock           |  0.000008 |                 0 |                   0 |
| init                 |  0.000056 |                 0 |                   0 |
| Updating             | 46.063662 |             75487 |               14658 |
| end                  |  2.803943 |              5851 |                 857 |
| query end            |  0.000054 |                 0 |                   0 |
| freeing items        |  0.000011 |                 0 |                   0 |
| closing tables       |  0.000008 |                 0 |                   0 |
| logging slow query   |  0.000265 |                 2 |                   1 |
+----------------------+-----------+-------------------+---------------------+
12 rows in set (0.00 sec)
The table has about 2.5 million records, the id is the primary key, and there is one unique index on another field (not included in the update).
It's an InnoDB table.
Any pointers on what could be the cause?
Any particular variables that could help track the issue down?
Is there an "explain" for updates?
EDIT: I just noticed that the table also has a:
`modDate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
Explain:
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | table1 | const | PRIMARY       | PRIMARY | 8       | const |    1 |       |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.02 sec)
There's no way that query should take a long time if id is really the primary key (unless you have lots and lots of ids equal to 2229230?). Please run the following two SQL statements and post the results:
show create table table1;
explain select * from table1 where id = 2229230;
Update: just to be complete, also do a
select count(*) from table1 where id = 2229230;
OK, after a few hours of tracking this one down, it seems the cause was a "duplicate index". The SHOW CREATE TABLE output I pulled to answer Keith had this strange combo:
a unique key on fieldx
a key on fieldx
The second one is obviously redundant and useless, but after I dropped that key all updates went back to < 1 sec.
+1 to Keith and Ivan, as their comments helped me finally track down the issue.
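For anyone hitting the same thing, this is roughly how a duplicate index shows up and how to drop it (the index name below is hypothetical):
-- List every index on the table; a duplicate appears as two entries
-- (one unique, one not) covering the same column.
SHOW INDEX FROM table1;
-- Drop the redundant non-unique one (the name here is just an example).
ALTER TABLE table1 DROP INDEX idx_fieldx;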
May I recommend as well:
OPTIMIZE TABLE table1;
Sometimes your tables just need a little love. If they've grown rapidly and you haven't optimized them in a while, the indices may be a little crazy. In fact, if you've got the time (and you do), it never hurts to throw in
REPAIR TABLE table1;
What helped for me was changing the engine from 'innodb' to 'myisam'.
My update query on a similar size dataset went from 100 ms to 0.1 ms.
Be aware that changing the engine type for an existing application might have consequences, as InnoDB has better data integrity and features your application might be depending on.
For me it was worth losing InnoDB for a 1000× speed increase on big datasets.
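For reference, the engine change itself is a one-liner (a sketch using the table name from the first question; note that the ALTER rebuilds the whole table and drops InnoDB-only features such as transactions, row-level locking and foreign keys):
-- Convert the table's storage engine; this rewrites the entire table.
ALTER TABLE table1 ENGINE=MyISAM;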
I'd come across this same issue, which turned out to be due to a trigger on the table. Any time an update was done on my table, the trigger would update another table, and that other table was where the delay was actually caused. Adding an index on that table fixed the issue. So try checking for triggers on the table.
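A quick way to check for them (the LIKE pattern filters on the table name):
-- Show any triggers defined on this table in the current database.
SHOW TRIGGERS LIKE 'table1';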
Another thing to take into consideration when debugging update statements that take far more time than the equivalent select statement:
are the update statements executed within a transaction?
In my case a transaction turned the update statement(s) from blazingly fast to extremely slow, and the more rows affected, the heavier the loss of performance. My statements work with user-defined variables, but I don't know if that is part of the problem.
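A minimal way to reproduce that comparison (a sketch reusing the update from the first question; autocommit vs. an explicit transaction):
-- Standalone statement: autocommit commits it immediately.
UPDATE table1 SET field1 = field1 + 1 WHERE id = 2229230;
-- Same statement inside an explicit transaction, committed at the end.
START TRANSACTION;
UPDATE table1 SET field1 = field1 + 1 WHERE id = 2229230;
COMMIT;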
Related
I have a PostgreSQL (v10.10) / PostGIS (v2.5) table of the following structure
    Column     |             Type             | Collation | Nullable |               Default                | Storage | Stats target | Description
---------------+------------------------------+-----------+----------+--------------------------------------+---------+--------------+-------------
 id            | integer                      |           | not null | nextval('seq_carlocation'::regclass) | plain   |              |
 timestamp     | timestamp with time zone     |           | not null |                                      | plain   |              |
 location      | postgis.geometry(Point,4326) |           |          |                                      | main    |              |
 speed         | double precision             |           |          |                                      | plain   |              |
 car_id        | integer                      |           | not null |                                      | plain   |              |
 timestamp_dvc | timestamp with time zone     |           |          |                                      | plain   |              |
Indexes:
"carlocation_pkey" PRIMARY KEY, btree (id)
"carlocation_pkey_car_id" btree (car_id)
Foreign-key constraints:
"carlocation_pkey_car_id_fk" FOREIGN KEY (car_id) REFERENCES car(id) DEFERRABLE INITIALLY DEFERRED
It's filled with location data through a Python/Django backend 1-5 times per second, with a single insert per row.
The problem:
The insert sometimes takes several seconds, sometimes even minutes. Here's an excerpt of pg_stat_statements for that query:
The db server is overloaded sometimes, but even when load is very low, the time of this specific insert doesn't really drop. At the same time, other insert queries with the same or even higher frequency, and many more indexes to keep updated, take just a few milliseconds.
What I already tried:
removed all indexes except PK and FK (both timestamps and the location had one before)
re-created the complete table
re-created sequence
data is now archived (exported to S3 and deleted from the table) once a month, so the number of entries in the table is at most 2 million
VACUUM ANALYZE after each archive
Any idea what could cause the slow performance of this specific insert not affecting others?
Every time I copy a specific query from the slow log and run EXPLAIN ANALYZE, nothing special shows up, and both planning and execution time are far less than 1 ms.
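For reference, the pg_stat_statements excerpt mentioned above can be pulled with a query like this (a sketch; the column names are those of the pg_stat_statements version shipped with PostgreSQL 10, and the table name carlocation is inferred from the index names):
-- Timing statistics for the insert in question, slowest first.
SELECT query, calls, mean_time, max_time
FROM pg_stat_statements
WHERE query ILIKE 'insert into%carlocation%'
ORDER BY max_time DESC;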
I'm looking for a best practice or solution, on a conceptual level, to a problem I'm working on.
I have a collection of data points (around 500) which are partially changed, by a user, over time. It is important to be able to tell which values were changed at what point in time. The data might look like this:
Data changed over time:
+------------+---------------+---------------+---------------+-------+-----------------+
| Date       | Value no. 1   | Value no. 2   | Value no. 3   | ...   | Value no. 500   |
|------------+---------------+---------------+---------------+-------+-----------------|
| 1/1/2018   |               |               | 2             |       | 1               |
| 1/3/2018   | 2             | 1             |               |       |                 |
| 1/7/2018   |               |               | 4             |       | 8               |
| 1/12/2018  | 5             | 3             |               |       |                 |
....
It must be possible to take a snapshot at a certain point in time and get the complete set of data points that were valid at that particular point in time, like this:
Snapshot taken 1/3/2018 will yield:
+-----------+-----------+-----------+-------+-------------+
| Value 1   | Value 2   | Value 3   | ...   | Value 500   |
|-----------+-----------+-----------+-------+-------------|
| 2         | 1         | 2         | 0     | 1           |
Snapshot taken 1/9/2018 will yield:
+-----------+-----------+-----------+-------+-------------+
| Value 1   | Value 2   | Value 3   | ...   | Value 500   |
|-----------+-----------+-----------+-------+-------------|
| 2         | 1         | 4         | 0     | 8           |
Snapshot taken 1/13/2018 will yield:
+-----------+-----------+-----------+-------+-------------+
| Value 1   | Value 2   | Value 3   | ...   | Value 500   |
|-----------+-----------+-----------+-------+-------------|
| 5         | 3         | 4         | 0     | 8           |
and so on...
I'm not bound by a particular database technology, so either SQL or NoSQL will do. It is probably not possible to satisfy all the requirements in the DB-domain - some will probably have to be addressed in code. But my main question is what database technology is best suited for this task?
I'm not quite sure this fits a time-series database (TSDB), since only a portion of the values are changed at a given time, and it is important to know which values changed. Maybe I'm wrong?
/Chris
My suggestion would be to model this in a sparse format, something like:
CREATE TABLE DataPoint (
    DataID     int,           /* 1 to 500 in your example, or whatever you need to identify it */
    ValidFrom  timestamp,     /* default value 01/01/1970-00:00:00 or a suitable "Epoch" */
    ValidUntil timestamp,     /* default value 31/12/3999-00:00:00 or again something far in the future for your case */
    value      Number(7,5)    /* again, this may be any data type, or even more than one field if needed, like Price & Currency */
);
What we have just defined is a set of data points and the "interval" in which each one has a specific value, so if you measured DataPoint 1 yesterday and got a value of 89.768 you will insert:
DataId=1
ValidFrom=26/11/2018-14:52:41
ValidUntil=31/12/3999-00:00:00
Value=89.768
Then you measure it again tomorrow and get:
DataId=1
ValidFrom=28/11/2018-14:51:23
ValidUntil=31/12/3999-00:00:00
Value=89.443
(Let's assume you also have logic so that when you record a new value you update the current record and set its ValidUntil=28/11/2018-14:51:23; this is not strictly needed but will make the example query simpler.)
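In plain SQL that step might look like this (a sketch; timestamp literal syntax varies by database, and both statements belong in one transaction):
-- Close the currently open row for DataID 1...
UPDATE DataPoint
   SET ValidUntil = TIMESTAMP '2018-11-28 14:51:23'
 WHERE DataID = 1
   AND ValidUntil = TIMESTAMP '3999-12-31 00:00:00';
-- ...and insert the new measurement as the new open-ended row.
INSERT INTO DataPoint (DataID, ValidFrom, ValidUntil, value)
VALUES (1, TIMESTAMP '2018-11-28 14:51:23', TIMESTAMP '3999-12-31 00:00:00', 89.443);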
One month from now you will have accumulated more measurements for data #1, and likewise, at different moments, for data #2 to #500.
You now want to find out what the values were at noon today (i.e. one month "ago" by then), i.e. at 27/11/2018 12:00:00:
SELECT DataID, Value FROM DataPoint WHERE ValidFrom <= TIMESTAMP '2018-11-27 12:00:00' AND ValidUntil > TIMESTAMP '2018-11-27 12:00:00';
This will return:
001,89.768
002,45.678
...,...
500,112.809
Regarding logging who did this, or for what reason, you can either log it separately (saving for example DataPoint Id, Timestamp, UserId...) or make it part of the original table, so that whenever you register a new datapoint you also log who measured it.
Have a look at the SQL Server temporal tables engine, which may be a solution in your case. This approach allows you to run the queries mentioned in the question, for example:
SELECT *
FROM my_data
FOR SYSTEM_TIME AS OF '2018-01-01'
However, the table in the example seems to be very wide (maybe denormalized). I would suggest grouping columns by some technical or functional characteristic (vertical partitioning) to avoid further maintenance drawbacks.
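For context, a system-versioned (temporal) table in SQL Server is declared roughly like this (a sketch; table and column names are illustrative, not taken from the question):
-- The PERIOD columns are maintained automatically; old row versions go to the history table.
CREATE TABLE my_data (
    id           INT       NOT NULL PRIMARY KEY,
    value_1      INT       NULL,
    value_2      INT       NULL,
    SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.my_data_history));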
Possible Duplicate:
Why is SELECT * considered harmful?
Probably a database nOOb question.
Our application has a table like the following
TABLE WF
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| id         | int(11)     | NO   | PRI | NULL    | auto_increment |
| children   | text        | YES  |     | NULL    |                |
| w_id       | int(11)     | YES  |     | NULL    |                |
| f_id       | int(11)     | YES  |     | NULL    |                |
| filterable | tinyint(1)  | YES  |     | 1       |                |
| created_at | datetime    | YES  |     | NULL    |                |
| updated_at | datetime    | YES  |     | NULL    |                |
| status     | smallint(6) | YES  |     | 1       |                |
| visible    | tinyint(1)  | YES  |     | 1       |                |
| weight     | int(11)     | YES  |     | NULL    |                |
| root       | tinyint(1)  | YES  |     | 0       |                |
| mfr        | tinyint(1)  | YES  |     | 0       |                |
+------------+-------------+------+-----+---------+----------------+
This table is expected to be upwards of ten million records. The schema is not expected to change much. I need to retrieve the columns f_id, children, status, visible, weight, root, mfr.
Which approach is faster for data retrieval?
1) Select * from WF where w_id = 1 AND status = 1;
I will strip the unnecessary columns in the application layer.
2) Select children,f_id,status,visible,weight,root,mfr from WF where w_id = 1 AND status = 1;
There is no need to strip the unnecessary columns, as they are pre-selected in the query.
Does anyone have a real-life benchmark as to which is faster? I know some say SELECT * is evil, but will MySQL respond faster when grabbing the whole chunk as opposed to retrieving selected columns?
I am using MySQL version: 5.1.37-1ubuntu5 (Ubuntu) and the application is Rails3 app.
As an example of how a select statement that names only a subset of columns can be significantly faster: it can use a covering index on the table that includes just those columns, potentially resulting in much better query performance.
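For instance, with the table above, an index like the one below could cover the narrower query entirely, and EXPLAIN would then report "Using index" (a sketch; the TEXT column children is left out because TEXT columns can only be prefix-indexed and so can't be covered):
-- The index contains every column the query touches, so no row lookups are needed.
ALTER TABLE WF ADD INDEX idx_wf_covering (w_id, status, f_id, visible, weight, root, mfr);
-- This query can now be satisfied from the index alone:
SELECT f_id, status, visible, weight, root, mfr FROM WF WHERE w_id = 1 AND status = 1;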
If you return fewer columns, there is less data to send across the network and less data for the database to process, so the query will almost always return faster. Databases also tend to be slower with SELECT * because they first have to figure out what the columns are, which is more work than when you specify them. Further, SELECT * will often return bad results if the structure changes significantly: it may end up showing users fields you don't want them to see, and if someone is silly enough to rearrange the columns, the application may show things in the wrong order or, when inserting from that data, put values in the wrong columns. It is almost always poor practice to use SELECT * in production code.
Explain
SELECT `feed_objects`.*
FROM `feed_objects`
WHERE (`feed_objects`.feed_id IN
( 165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,126,125,124,122,
121, 120,119,118,117,116,115,114,113,111,110)) ;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | feed_objects | ALL  | by_feed_id    | NULL | NULL    | NULL |  188 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
The index by_feed_id is not used.
But when I list fewer values in the WHERE clause, everything works right:
Explain
SELECT `feed_objects`.*
FROM `feed_objects`
WHERE (`feed_objects`.feed_id IN
(165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,125,124)) ;
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table        | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | feed_objects | range | by_feed_id    | by_feed_id | 9       | NULL |   18 | Using where |
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
The index by_feed_id is used.
What is the problem?
The MySQL optimizer makes a lot of decisions that sometimes look strange. In this particular case, I believe that you have a very small table (188 rows total from the looks of the first EXPLAIN), and that is affecting the optimizer's decision.
The "How MySQL Uses Indexes" manual pages offers this snippet of info:
Sometimes MySQL does not use an index,
even if one is available. One
circumstance under which this occurs
is when the optimizer estimates that
using the index would require MySQL to
access a very large percentage of the
rows in the table. (In this case, a
table scan is likely to be much faster
because it requires fewer seeks.)
Because the number of ids in your first WHERE clause is relatively large compared to the table size, MySQL has determined that it is faster to scan the data than to consult the index, as the data scan is likely to result in less disk access time.
You could test this by adding rows to the table and re-running the first EXPLAIN query. At a certain point, MySQL will start using the index for it as well as the second query.
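If you want to see the difference yourself, you can also force the index for the first query and compare its timing against the table scan the optimizer chose (a sketch reusing the IN list from the question):
SELECT `feed_objects`.*
FROM `feed_objects` FORCE INDEX (by_feed_id)
WHERE (`feed_objects`.feed_id IN
  (165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,126,125,124,122,
   121,120,119,118,117,116,115,114,113,111,110));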
EXPLAIN SELECT
*
FROM
content_link link
STRAIGHT_JOIN
content
ON
link.content_id = content.id
WHERE
link.content_id = 1
LIMIT 10;
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
| id | select_type | table   | type  | possible_keys | key        | key_len | ref   | rows | Extra |
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
|  1 | SIMPLE      | link    | ref   | content_id    | content_id | 4       | const |    1 |       |
|  1 | SIMPLE      | content | const | PRIMARY       | PRIMARY    | 4       | const |    1 |       |
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
However, when I remove the WHERE, the query stops using the key (even when I explicitly force it to):
EXPLAIN SELECT
*
FROM
content_link link FORCE KEY (content_id)
STRAIGHT_JOIN
content
ON
link.content_id = content.id
LIMIT 10;
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
| id | select_type | table   | type   | possible_keys | key     | key_len | ref                    | rows    | Extra       |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
|  1 | SIMPLE      | link    | index  | content_id    | PRIMARY | 7       | NULL                   | 4555299 | Using index |
|  1 | SIMPLE      | content | eq_ref | PRIMARY       | PRIMARY | 4       | ft_dir.link.content_id |       1 |             |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
Are there any work-arounds to this?
I realize I'm selecting the entire table in the second example, but why does MySQL suddenly decide that it's going to ignore my FORCE anyway and not use the key? Without the key the query takes like 10 minutes... ugh.
FORCE is a bit of a misnomer. Here's what the MySQL docs say (emphasis mine):
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Since you aren't actually "finding" any rows (you are selecting them all), a table scan is always going to be fastest, and the optimizer is smart enough to know that in spite of what you are telling it.
ETA:
Try adding an ORDER BY on the primary key once and I bet it'll use the index.
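Something along these lines (a sketch: it orders on the indexed column content_id here; swap in the primary key columns per the suggestion above):
EXPLAIN SELECT *
FROM content_link link FORCE INDEX (content_id)
STRAIGHT_JOIN content ON link.content_id = content.id
ORDER BY link.content_id
LIMIT 10;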
An index helps search quickly inside a table, but it just slows things down if you select the entire table. So MySQL is correct in ignoring the index.
In your case, maybe the index has a hidden side effect that's not known to MySQL. For example, if the inner join holds only for a few rows, an index would speed things up. But MySQL can't know that without an explicit hint.
There is an exception: when every column you select is inside the index, the index is still useful if you select every row. For example, if you have an index on LastName, the following query still benefits from the index:
select LastName from orders
But this one won't:
select * from Orders
Your content_id seems to accept NULL values.
The MySQL optimizer thinks there is no guarantee that your query will return all rows by using only the index (though actually there is such a guarantee, since you use the column in a JOIN).
That's why it reverts to a full table scan.
Either add a NOT NULL condition:
SELECT *
FROM content_link link FORCE KEY (content_id)
STRAIGHT_JOIN
content
ON content.id = link.content_id
WHERE link.content_id IS NOT NULL
LIMIT 10;
or mark your column as NOT NULL:
ALTER TABLE content_link MODIFY content_id INT NOT NULL;
(MODIFY requires the full column definition; adjust INT to whatever type content_id actually is.)
Update:
This is verified bug 45314 in MySQL.