I have a PostgreSQL (v10.10) / PostGIS (v2.5) table of the following structure
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------+------------------------------+-----------+----------+---------------------------------------------------+---------+--------------+-------------
id | integer | | not null | nextval('seq_carlocation'::regclass) | plain | |
timestamp | timestamp with time zone | | not null | | plain | |
location | postgis.geometry(Point,4326) | | | | main | |
speed | double precision | | | | plain | |
car_id | integer | | not null | | plain | |
timestamp_dvc | timestamp with time zone | | | | plain | |
Indexes:
"carlocation_pkey" PRIMARY KEY, btree (id)
"carlocation_pkey_car_id" btree (car_id)
Foreign-key constraints:
"carlocation_pkey_car_id_fk" FOREIGN KEY (car_id) REFERENCES car(id) DEFERRABLE INITIALLY DEFERRED
It's filled with location data through a Python/Django backend, 1-5 times per second, with a single INSERT per row.
The problem:
The INSERT sometimes takes several seconds, sometimes even minutes; here's an excerpt from pg_stat_statements for that query:
The DB server is sometimes overloaded, but even when load is very low, the duration of this specific INSERT doesn't really drop. At the same time, other INSERT queries with the same or even higher frequency, and many more indexes to keep updated, take just a few milliseconds.
What I already tried:
removed all indexes except PK and FK (both timestamps and the location had one before)
re-created the complete table
re-created sequence
data is now archived (exported to S3 and deleted from the table) once a month, so the number of rows in the table is at most 2 million
VACUUM ANALYZE after each archive
Any idea what could cause the slow performance of this specific INSERT while not affecting others?
Every time I copy a specific query from the slow log and run EXPLAIN ANALYZE on it, nothing special shows up, and both planning and execution time are far below 1 ms.
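In case it helps: while one of these INSERTs hangs, I can check what it is waiting on with something like the query below (a sketch; I'm assuming the table is named carlocation, as the index and sequence names suggest).
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS running_for,
       query
FROM pg_stat_activity
WHERE query ILIKE 'insert%carlocation%';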
I have a Postgresql-database with more than 100 billion rows in one table.
The table schema is as follows:
id_1 | integer | | not null |
id_2 | bigint | | not null |
created_at | timestamp without time zone | | not null |
id_3 | bigint | | |
char1 | character varying(20) | | not null |
lang | character(6) | | not null |
gps | point | | |
some_dat | character varying(140)[] | | |
JSON | jsonb | | not null |
I'm trying to search inside the JSON object and sort the data by values in the JSON object, but the problem is that sorting and returning the data takes too much time.
Sorting the data by created_at, for example, also takes a long time to return a result.
I'm trying to make my application as close to real-time as I can.
I have two indexes, on id_1 and id_2.
I also tried using a materialized view for each id, but refreshing the materialized view takes a long time as well.
Any suggestions please?
I'm running PostgreSQL 10.3, on a Linux server with SSD and 128 GB of ram.
Thanks,
If you want to sort a query result with an expression like this:
ORDER BY expr1, expr2, ...
You need the following index to speed up the sorting:
CREATE INDEX ON atable ((expr1), (expr2), ...);
If that does not work because the expressions contain functions that are not IMMUTABLE, you cannot speed up the sort with an index. In that case, consider rewriting your query with IMMUTABLE expressions.
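For example, to sort by a value inside your jsonb column plus created_at, something like this could work (a sketch; js stands for your jsonb column and 'score' for a key inside it, both names are assumptions):
-- index on the immutable expressions used in ORDER BY
CREATE INDEX ON atable (((js ->> 'score')::numeric), created_at);

SELECT *
FROM atable
ORDER BY (js ->> 'score')::numeric, created_at
LIMIT 100;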
I've read some information about the ugly side of just setting a deleted_at field in your tables to signify a row has been deleted.
Namely
http://richarddingwall.name/2009/11/20/the-trouble-with-soft-delete/
Are there any potential problems with taking a row from a table you want to delete and pivoting it into some EAV tables?
For instance.
Let's say I have two tables, deleted and deleted_rows, described as follows.
mysql> describe deleted;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| tablename | varchar(255) | YES | | NULL | |
| deleted_at | timestamp | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
mysql> describe deleted_rows;
+--------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| entity | int(11) | YES | MUL | NULL | |
| name | varchar(255) | YES | | NULL | |
| value | blob | YES | | NULL | |
+--------+--------------+------+-----+---------+----------------+
Now, when you want to delete a row from any table, you delete it from that table and then insert it into these tables like so.
deleted
+----+-----------+---------------------+
| id | tablename | deleted_at |
+----+-----------+---------------------+
| 1 | products | 2011-03-23 00:00:00 |
+----+-----------+---------------------+
deleted_rows
+----+--------+-------------+-------------------------------+
| id | entity | name | value |
+----+--------+-------------+-------------------------------+
| 1 | 1 | Title | A Great Product |
| 2 | 1 | Price | 55.00 |
| 3 | 1 | Description | You guessed it... it's great. |
+----+--------+-------------+-------------------------------+
A few things I see off the bat:
You'll need to use application logic (Ruby, PHP, Python, etc.) to do the pivot; a rough sketch of what that produces is shown below.
The table could grow pretty big, because I'm using a blob to handle the unknown size of the row value.
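For reference, per deleted row the statements would look roughly like this (sketched here in plain SQL; in practice the application would generate them, and the products columns id, Title, Price, Description are assumptions taken from the example above):
-- record the deletion event and remember its id
INSERT INTO deleted (tablename, deleted_at) VALUES ('products', NOW());
SET @entity := LAST_INSERT_ID();

-- pivot the row's columns into name/value pairs
INSERT INTO deleted_rows (entity, name, value)
SELECT @entity, 'Title',       Title       FROM products WHERE id = 1
UNION ALL
SELECT @entity, 'Price',       Price       FROM products WHERE id = 1
UNION ALL
SELECT @entity, 'Description', Description FROM products WHERE id = 1;

-- finally remove the original row
DELETE FROM products WHERE id = 1;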
Do you see any other glaring problems with this type of soft delete?
Why not mirror your tables with archive tables?
create table mytable(
col_1 int
,col_2 varchar(100)
,col_3 date
,primary key(col_1)
);
create table mytable_deleted(
delete_id int not null auto_increment
,delete_dtm datetime not null
-- All of the original columns
,col_1 int
,col_2 varchar(100)
,col_3 date
,index(col_1)
,primary key(delete_id)
);
Then simply add on-delete triggers on your tables that insert the current row into the mirrored table before the deletion. That would give you a dead-simple and very performant solution.
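A minimal sketch of such a trigger for the tables above (the trigger name is arbitrary):
-- copy the row being deleted into the archive table
create trigger mytable_before_delete
before delete on mytable
for each row
  insert into mytable_deleted (delete_dtm, col_1, col_2, col_3)
  values (now(), old.col_1, old.col_2, old.col_3);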
You could actually generate the tables and trigger code using the data dictionary.
Note that you might not want to have a unique index on the original primary key (col_1) in the archive table, because you may actually end up deleting the same row twice over time if you are using natural keys. Unless you plan to hook up the archive tables in your application (for undo purposes), you can drop the index entirely. Also, I added the time of deletion (delete_dtm) and a surrogate key that can be used to delete the deleted (hehe) rows.
You may also consider range partitioning the archive table on delete_dtm. This makes it pretty much effortless to purge data from the tables.
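For example (a rough sketch that redefines the archive table from above; MySQL requires the partitioning column to appear in every unique key, so the primary key is widened to (delete_id, delete_dtm) here):
create table mytable_deleted(
 delete_id int not null auto_increment
,delete_dtm datetime not null
,col_1 int
,col_2 varchar(100)
,col_3 date
,index(col_1)
,primary key(delete_id, delete_dtm)
)
partition by range (to_days(delete_dtm)) (
 partition p2011_h1 values less than (to_days('2011-07-01'))
,partition p2011_h2 values less than (to_days('2012-01-01'))
,partition p_future values less than maxvalue
);

-- purging a whole range of archived rows is then a cheap metadata operation
alter table mytable_deleted drop partition p2011_h1;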
I have a content application that needs to count responses in a time slice, then order them by number of responses. It currently works great with a small data set, but it needs to scale to millions of rows, and my current query won't work at that scale.
mysql> describe Responses;
+---------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| site_id | int(10) unsigned | NO | MUL | NULL | |
| content_id | bigint(20) unsigned | NO | PRI | NULL | |
| response_id | bigint(20) unsigned | NO | PRI | NULL | |
| date | int(10) unsigned | NO | | NULL | |
+---------------+---------------------+------+-----+---------+-------+
The table type is InnoDB, the primary key is on (content_id, response_id). There is an additional index on (content_id, date) used to find responses to a piece of content, and another additional index on (site_id, date) used in the query I am having trouble with:
mysql> explain SELECT content_id id, COUNT(response_id) num_responses
FROM Responses
WHERE site_id = 1
AND date > 1234567890
AND date < 1293579867
GROUP BY content_id
ORDER BY num_responses DESC
LIMIT 0, 10;
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
| 1 | SIMPLE | Responses | range | date | date | 8 | NULL | 102 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
That's the best I've been able to come up with, but it will end up being millions of rows that need to be counted, resulting in tens of thousands of rows to sort, just to pull in a handful of results.
I can't think of a way to precalculate the count either, as the date range is arbitrary. I have some liberty with changing the primary key: it can be composed of content_id, response_id, and site_id in any order, but cannot contain date.
The application is developed mostly in PHP, so if there is a quicker way to accomplish the same results by splitting the query into subqueries, using temporary tables, or doing things on the application side, I'm open to suggestions.
(Reposted from comments by request)
Set up a table that has three columns: id, date, and num_responses. The column num_responses consists of the number of responses for the given id on the given date. Backfill the table appropriately, and then at around midnight (or later) each night, run a script that adds a new row for the previous day.
Then, to get the rows you want, you can merely query the table mentioned above.
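A sketch of what that could look like (the table and column names are my own, site_id is added so the table can answer the query from the question, and Responses.date is assumed to be a unix timestamp):
CREATE TABLE ResponseCounts (
  site_id       INT UNSIGNED    NOT NULL,
  content_id    BIGINT UNSIGNED NOT NULL,
  day           DATE            NOT NULL,
  num_responses INT UNSIGNED    NOT NULL,
  PRIMARY KEY (site_id, day, content_id)
);

-- nightly job: aggregate yesterday's responses into one row per (site, content, day)
INSERT INTO ResponseCounts (site_id, content_id, day, num_responses)
SELECT site_id, content_id, DATE(FROM_UNIXTIME(`date`)), COUNT(*)
FROM Responses
WHERE `date` >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 1 DAY)
  AND `date` <  UNIX_TIMESTAMP(CURDATE())
GROUP BY site_id, content_id, DATE(FROM_UNIXTIME(`date`));

-- the original top-10 then scans the much smaller summary rows
-- (day bounds are the question's timestamps converted, roughly 2009-02-13 to 2010-12-28)
SELECT content_id AS id, SUM(num_responses) AS num_responses
FROM ResponseCounts
WHERE site_id = 1
  AND day > '2009-02-13' AND day < '2010-12-29'
GROUP BY content_id
ORDER BY num_responses DESC
LIMIT 10;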
Rather than recalculating everything each time, how about caching the calculated count as of the last query, and then adding only the increment, by putting a date condition into the WHERE clause so that only new rows are counted?
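Roughly, and with every name below being an assumption, that could look like:
-- running totals per (site, content); the primary key lets the upsert accumulate increments
CREATE TABLE cached_counts (
  site_id       INT UNSIGNED    NOT NULL,
  content_id    BIGINT UNSIGNED NOT NULL,
  num_responses INT UNSIGNED    NOT NULL,
  PRIMARY KEY (site_id, content_id)
);

-- unix timestamp the cache was last brought up to date
SET @cached_until := 1293579867;

-- count only the rows newer than the cache and fold them in
INSERT INTO cached_counts (site_id, content_id, num_responses)
SELECT site_id, content_id, COUNT(*)
FROM Responses
WHERE site_id = 1
  AND `date` > @cached_until
GROUP BY site_id, content_id
ON DUPLICATE KEY UPDATE num_responses = num_responses + VALUES(num_responses);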
Have you considered partitioning the table by date? Are there any indices on the table?
Explain
SELECT `feed_objects`.*
FROM `feed_objects`
WHERE (`feed_objects`.feed_id IN
( 165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,126,125,124,122,
121, 120,119,118,117,116,115,114,113,111,110)) ;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | feed_objects | ALL | by_feed_id | NULL | NULL | NULL | 188 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
The index by_feed_id is not used.
But when I specify fewer values in the WHERE clause, everything works as expected:
Explain
SELECT `feed_objects`.*
FROM `feed_objects`
WHERE (`feed_objects`.feed_id IN
(165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,125,124)) ;
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
| 1 | SIMPLE | feed_objects | range | by_feed_id | by_feed_id | 9 | NULL | 18 | Using where |
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
The index by_feed_id is used.
What is the problem?
The MySQL optimizer makes a lot of decisions that sometimes look strange. In this particular case, I believe that you have a very small table (188 rows total from the looks of the first EXPLAIN), and that is affecting the optimizer's decision.
The "How MySQL Uses Indexes" manual pages offers this snippet of info:
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
Because the number of ids in your first WHERE clause is relatively large compared to the table size, MySQL has determined that it is faster to scan the data than to consult the index, as the data scan is likely to result in less disk access time.
You could test this by adding rows to the table and re-running the first EXPLAIN query. At a certain point, MySQL will start using the index for it as well as the second query.
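Another quick way to see the difference is to force the index and compare the plans (just an experiment, not a recommended fix):
EXPLAIN
SELECT `feed_objects`.*
FROM `feed_objects` FORCE INDEX (by_feed_id)
WHERE (`feed_objects`.feed_id IN
( 165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,126,125,124,122,
  121,120,119,118,117,116,115,114,113,111,110)) ;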
After some expected growth on our service, all of a sudden some updates are taking an extremely long time. These used to be pretty fast until the table reached about 2 million records; now they take about 40-60 seconds each.
update table1 set field1=field1+1 where id=2229230;
Query OK, 0 rows affected (42.31 sec)
Rows matched: 1 Changed: 0 Warnings: 0
Here are the field types:
`id` bigint(20) NOT NULL auto_increment,
`field1` int(11) default '0',
Result from profiling, for context switches, which is the only part of the results that shows high numbers:
mysql> show profile context switches
-> ;
+----------------------+-----------+-------------------+---------------------+
| Status | Duration | Context_voluntary | Context_involuntary |
+----------------------+-----------+-------------------+---------------------+
| (initialization) | 0.000007 | 0 | 0 |
| checking permissions | 0.000009 | 0 | 0 |
| Opening tables | 0.000045 | 0 | 0 |
| System lock | 0.000009 | 0 | 0 |
| Table lock | 0.000008 | 0 | 0 |
| init | 0.000056 | 0 | 0 |
| Updating | 46.063662 | 75487 | 14658 |
| end | 2.803943 | 5851 | 857 |
| query end | 0.000054 | 0 | 0 |
| freeing items | 0.000011 | 0 | 0 |
| closing tables | 0.000008 | 0 | 0 |
| logging slow query | 0.000265 | 2 | 1 |
+----------------------+-----------+-------------------+---------------------+
12 rows in set (0.00 sec)
The table is about 2.5 million records, the id is the primary key, and it has one unique index on another field (not included in the update).
It's an InnoDB table.
Any pointers on what could be the cause?
Any particular variables that could help track the issue down?
Is there an "explain" for updates?
EDIT: I also just noticed that the table has a:
`modDate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
Explain:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | table1 | const | PRIMARY | PRIMARY | 8 | const | 1 | |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.02 sec)
There's no way that query should take a long time, if id is really the primary key (unless you have lots and lots of ids equal to 2229230?). Please run the following two SQL statements and post the results:
show create table table1;
explain select * from table1 where id = 2229230;
Update: just to be complete, also do a
select count(*) from table1 where id = 2229230;
OK, after a few hours of tracking this one down, it seems the cause was a "duplicate index". The CREATE TABLE output I was getting in order to answer Keith had this strange combo:
a unique key on fieldx
a key on fieldx
The second one is obviously redundant and useless, but after I dropped that key, all updates went back to < 1 sec.
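For anyone hitting the same thing, this is roughly how it showed up and how to get rid of it (the index names here are made up):
-- SHOW CREATE TABLE revealed two indexes on the same column, e.g.:
--   UNIQUE KEY `fieldx` (`fieldx`),
--   KEY `fieldx_2` (`fieldx`)
show create table table1\G

-- drop the redundant non-unique one
alter table table1 drop index fieldx_2;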
+1 to Keith and Ivan, as their comments helped me finally track down the issue.
May I recommend as well:
OPTIMIZE TABLE table1;
Sometimes your tables just need a little love. If they've grown rapidly and you haven't optimized them in a while, the indexes may be a little crazy. In fact, if you've got the time (and you do), it never hurts to throw in
REPAIR TABLE table1;
What helped for me was changing the engine from InnoDB to MyISAM.
My update query on a similar size dataset went from 100 ms to 0.1 ms.
Be aware that changing the engine type for an existing application might have consequences, as InnoDB offers better data integrity and features your application might depend on.
For me it was worth losing InnoDB for a 1000 times speed increase on big datasets.
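If you do want to try it, the switch itself is a single statement (take a backup first; note that MyISAM has no transactions, row-level locking, or foreign keys):
alter table table1 engine = MyISAM;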
I'd come across this same issue, which turned out to be due to a trigger on the table. Any time an update was done on my table, the trigger would update another table, which is where the delay was actually happening. Adding an index on that other table fixed the issue. So try checking for triggers on the table.
Another thing to take into consideration when debugging update statements that take way too much time compared to the equivalent select statement:
Are the update statements executed within a transaction?
In my case, a transaction turned the update statement(s) from blazingly fast to extremely slow. The more rows affected, the heavier the loss of performance. My statements work with user-defined variables, but I don't know if that is part of the 'problem'.