Optimize SQL query

Is it possible to optimize this query?
SELECT count(locId) AS antal , locId
FROM `geolitecity_block`
WHERE (1835880985 >= startIpNum AND 1835880985 <= endIpNum)
OR (1836875969 >= startIpNum AND 1836875969 <= endIpNum)
OR (1836878754 >= startIpNum AND 1836878754 <= endIpNum)
...
...
OR (1843488110 >= startIpNum AND 1843488110 <= endIpNum)
GROUP BY locId ORDER BY antal DESC LIMIT 100
The table looks like this
CREATE TABLE IF NOT EXISTS `geolitecity_block` (
`startIpNum` int(11) unsigned NOT NULL,
`endIpNum` int(11) unsigned NOT NULL,
`locId` int(11) unsigned NOT NULL,
PRIMARY KEY (`startIpNum`),
KEY `locId` (`locId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
UPDATE
and the EXPLAIN output looks like this
+----+-------------+-------------------+-------+---------------+-------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+-------+---------+------+------+----------------------------------------------+
| 1 | SIMPLE | geolitecity_block | index | PRIMARY | locId | 4 | NULL | 108 | Using where; Using temporary; Using filesort |
+----+-------------+-------------------+-------+---------------+-------+---------+------+------+----------------------------------------------+

To optimize performance, create an index on startIpNum and endIpNum.
CREATE INDEX index_startIpNum ON geolitecity_block (startIpNum);
CREATE INDEX index_endIpNum ON geolitecity_block (endIpNum);

Indexing columns that are being grouped or sorted on will almost always improve performance. I would suggest plugging this query into the DTA (Database Tuning Advisor) to see if it can make any suggestions; these might include the creation of one or more indexes in addition to statistics.

If it is possible in your use case, create a temporary table TMP_RESULT (without the ORDER BY) and then submit a second query that orders the results by antal. Filesort is extremely slow and -- in your case -- you cannot avoid it, because you do not sort by any of the keys/indexes. To perform the count operation, you have to scan the complete table. A temporary table is a much faster solution.
P.S. Adding an index on (startIpNum, endIpNum) will definitely help you get better performance, but -- if you have a lot of rows -- it will not be a huge improvement.
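A sketch of the temporary-table approach described above, with a made-up table name (the remaining OR ranges are elided just as in the question):
CREATE TEMPORARY TABLE tmp_result AS
SELECT count(locId) AS antal, locId
FROM `geolitecity_block`
WHERE (1835880985 >= startIpNum AND 1835880985 <= endIpNum)
-- ... the remaining OR ranges from the original query ...
OR (1843488110 >= startIpNum AND 1843488110 <= endIpNum)
GROUP BY locId;

-- Sorting the aggregated rows is cheap compared to sorting the raw matches.
SELECT antal, locId
FROM tmp_result
ORDER BY antal DESC
LIMIT 100;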


How to declare "nextval('testing_thing_thing_id_seq'::regclass)" as default value for column "thing_id" in postgres table "testing_thing"?

In my postgres db there is a table called testing_thing, which I can see (by running \d testing_thing at my psql prompt) is defined as
Table "public.testing_thing"
Column | Type | Collation | Nullable | Default
--------------+-------------------+-----------+----------+-----------------------------------------------------
thing_id | integer | | not null | nextval('testing_thing_thing_id_seq'::regclass)
thing_num | smallint | | not null | 0
thing_desc | character varying | | not null |
Indexes:
"testing_thing_pk" PRIMARY KEY, btree (thing_num)
I want to drop it and re-create it exactly as it is, but I don't know how to reproduce the
nextval('testing_thing_thing_id_seq'::regclass)
part for column thing_id.
This is the query I put together to create the table:
CREATE TABLE testing_thing(
thing_id integer NOT NULL, --what else should I put here?
thing_num smallint NOT NULL PRIMARY KEY DEFAULT 0,
thing_desc varchar(100) NOT NULL
);
what is it missing?
Add a DEFAULT to the column you want to increment and call nextval():
CREATE SEQUENCE testing_thing_thing_id_seq START WITH 1;
CREATE TABLE testing_thing(
thing_id integer NOT NULL DEFAULT nextval('testing_thing_thing_id_seq'),
thing_num smallint NOT NULL PRIMARY KEY DEFAULT 0,
thing_desc varchar(100) NOT NULL
);
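For the record, the serial pseudo-type is shorthand for exactly this pattern: it creates a sequence named testing_thing_thing_id_seq, sets the column default to nextval() on it, and marks the sequence as owned by the column:
CREATE TABLE testing_thing(
thing_id serial, -- implies NOT NULL DEFAULT nextval('testing_thing_thing_id_seq')
thing_num smallint NOT NULL PRIMARY KEY DEFAULT 0,
thing_desc varchar(100) NOT NULL
);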
Side note: Keep in mind that attaching a sequence to a column does not prevent users from manually filling it with arbitrary data, which can create really nasty problems with primary keys. If you want to avoid that and do not necessarily need a sequence, consider creating an identity column, e.g.
CREATE TABLE testing_thing(
thing_id integer NOT NULL GENERATED ALWAYS AS IDENTITY,
thing_num smallint NOT NULL PRIMARY KEY DEFAULT 0,
thing_desc varchar(100) NOT NULL
);
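A quick usage sketch: inserts simply omit thing_id and let Postgres generate it (RETURNING shows the assigned value):
INSERT INTO testing_thing (thing_num, thing_desc)
VALUES (1, 'first thing')
RETURNING thing_id;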
Demo: db<>fiddle

How to re-define indexes in a postgres SQL table

I have this table which is automatically created in my DB.
This is the description of the table using the \d command.
Table "public.tableA":
Column | Type | Modifiers
----------------------------+----------+-----------------------------------------------------
var_a | integer | not null
var_b | integer | not null
var_c | bigint | not null default nextval('var_c_sequence'::regclass)
var_d | integer |
var_e | integer |
var_f | smallint | default mysessionid()
var_g | smallint | default (-1)
var_h | boolean | default false
var_g | uuid |
Indexes:
"tableA_pkey" PRIMARY KEY, btree (var_c)
"tableA_edit" btree (var_g) WHERE var_g <> (-1)
"tableA_idx" btree (var_a)
Check constraints:
"constraintC" CHECK (var_f > 0 AND var_d IS NULL AND var_e IS NULL OR (var_f = 0 OR var_f = (-1)) AND var_d IS NOT NULL AND var_e IS NOT NULL)
Triggers:
object_create BEFORE INSERT ON tableA FOR EACH ROW EXECUTE PROCEDURE create_tableA()
object_update BEFORE DELETE OR UPDATE ON tableA FOR EACH ROW EXECUTE PROCEDURE update_tableA()
I'm interested in creating this table myself, and I'm not quite sure how to define these indexes manually. Any ideas?
Unless I've totally missed the boat:
alter table public."tableA"
add constraint "tableA_pkey" PRIMARY KEY (var_c);
create index "tableA_edit" on public."tableA" (var_g) WHERE var_g <> (-1);
create index "tableA_idx" on public."tableA" (var_a);
Btree is default, so I don't bother specifying that, but you can if you want.
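For completeness, the explicit form with the access method spelled out would be, e.g.:
create index "tableA_idx" on public."tableA" using btree (var_a);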
You didn't ask, but the check constraint syntax is:
alter table public."tableA"
add constraint "constraintC"
CHECK (var_f > 0 AND var_d IS NULL AND var_e IS NULL OR
(var_f = 0 OR var_f = (-1)) AND var_d IS NOT NULL AND var_e IS NOT NULL)
By the way, the cheat would be to just look at the DDL in PgAdmin.
All that said, I generally discourage the use of "quotes" around a table name to enforce upper/lower case. There are cases where it makes sense (otherwise, why would the functionality exist), but in many cases it creates a lot of extra work down the line. In the case of the index names, it doesn't even buy you anything, since you don't really refer to them in any SQL.

PostgreSQL date query performance problems

I have around 5 million rows in a postgres table. I'd like to know how many rows match start_time >= NOW(), but despite having an index on start_time the query is extremely slow (in the order of several hours).
EXPLAIN SELECT COUNT(*) FROM core_event WHERE start_time >= NOW();
Aggregate (cost=449217.81..449217.82 rows=1 width=0)
-> Index Scan using core_event_start_time on core_event (cost=0.00..447750.83 rows=586791 width=0)
Index Cond: (start_time >= now())
Here's the schema information for the table:
id | integer | not null default nextval('core_event_id_seq'::regclass)
source | character varying(100) | not null
external_id | character varying(100) |
title | character varying(250) | not null
location | geometry | not null
start_time | timestamp with time zone |
stop_time | timestamp with time zone |
thumb | character varying(300) |
image | character varying(100) |
image_thumb | character varying(100) |
address | character varying(300) |
description | text |
venue_name | character varying(100) |
website | character varying(300) |
city_id | integer |
category | character varying(100) |
phone | character varying(50) |
place_id | integer |
image_url | character varying(300) |
event_type | character varying(200) |
hidden | boolean | not null
views | integer | not null
added | timestamp with time zone |
I have indexes on the following fields:
city_id
external_id (unique)
location
location_id
place_id
start_time
Is there any easy way for me to speed up the query (eg. a partial index), or am I going to have to resort to partitioning the data by date?
Try adding a partial index like the following:
CREATE INDEX core_event_start_time_recent_idx ON core_event (start_time)
WHERE start_time >= '2011-01-12 0:0'::timestamptz
This will create a comparatively small index. Index creation will take some time, but queries like this one will be much faster thereafter.
SELECT count(*) FROM core_event WHERE start_time >= now();
The effectiveness of this index for queries against now() will degrade slowly over the course of time, depending on how many new rows are coming in. Update (= drop & create) the index with a more recent timestamp occasionally at off hours.
You could automate this with a plpgsql function that you call via a cronjob or pgAgent.
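A minimal sketch of such a function, assuming the index name from above (the function name is made up); it drops the stale partial index and recreates it with the current time as the cut-off:
CREATE OR REPLACE FUNCTION rebuild_start_time_partial_idx() RETURNS void AS
$$
BEGIN
    DROP INDEX IF EXISTS core_event_start_time_recent_idx;
    -- rebuild with now() baked in as the new constant cut-off
    EXECUTE 'CREATE INDEX core_event_start_time_recent_idx
                 ON core_event (start_time)
                 WHERE start_time >= ' || quote_literal(now()) || '::timestamptz';
END;
$$ LANGUAGE plpgsql;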
You might try and see if running CLUSTER on the table improves things (if it doesn't go against other requirements in your db):
CLUSTER core_event USING core_event_start_time;
Yes, cluster on the full index, not the partial one. This will take a while and needs an exclusive lock, because it effectively rewrites the table. It also effectively vacuums the table fully. Read about it in the manual.
You also may want to increase the statistics target for core_event.start_time:
ALTER TABLE core_event ALTER start_time SET STATISTICS 1000; -- example value
The default is just 100. Then:
ANALYZE core_event;
Of course, all the usual performance stuff applies, too.
Do most of these columns get populated for each row? If so, the amount of disk that PostgreSQL has to look at to check rows for visibility, even after consulting the index, will be fairly large. Try, for example, creating a separate table that only has id and start_time:
create table core_event_start_time as select id, start_time from core_event;
alter table core_event_start_time add primary key(id);
alter table core_event_start_time add foreign key(id) references core_event(id);
create index on core_event_start_time(start_time);
Now see how long it takes to count IDs in core_event_start_time only. Of course, this approach will take up more buffer cache at the expense of space for your actual core_event table...
If it helps, you can add a trigger onto core_event to keep the auxiliary table updated.
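A rough sketch of such a trigger (function and trigger names are made up). It covers inserts and updates; if the foreign key from the auxiliary table is declared with ON DELETE CASCADE, deletes clean up automatically:
CREATE OR REPLACE FUNCTION sync_core_event_start_time() RETURNS trigger AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- mirror the new row into the narrow auxiliary table
        INSERT INTO core_event_start_time (id, start_time)
        VALUES (NEW.id, NEW.start_time);
    ELSE  -- UPDATE
        UPDATE core_event_start_time
           SET start_time = NEW.start_time
         WHERE id = NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER core_event_start_time_sync
AFTER INSERT OR UPDATE ON core_event
FOR EACH ROW EXECUTE PROCEDURE sync_core_event_start_time();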
(postgresql 9.2 will introduce "index only scans" which may help with this sort of situation, but that's for the future)

MySQL: Which indexes to use for a simple range select?

I have a table with ~30 million rows (and growing!) and currently I have some problems with a simple range select.
The query looks like this:
SELECT SUM( CEIL( dlvSize / 100 ) ) as numItems
FROM log
WHERE timeLogged BETWEEN 1000000 AND 2000000
AND user = 'example'
It takes minutes to finish and I think the solution lies in the indexes I'm using. Here is the result of explain:
+----+-------------+-------+-------+---------------------------------+---------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------------+---------+---------+------+----------+-------------+
| 1 | SIMPLE | log | range | PRIMARY,timeLogged | PRIMARY | 4 | NULL | 11839754 | Using where |
+----+-------------+-------+-------+---------------------------------+---------+---------+------+----------+-------------+
My table structure is this one (reduced to make it fit the problem better):
CREATE TABLE IF NOT EXISTS `log` (
`origDomain` varchar(64) NOT NULL default '0',
`timeLogged` int(11) NOT NULL default '0',
`orig` varchar(128) NOT NULL default '',
`rcpt` varchar(128) NOT NULL default '',
`dlvSize` varchar(255) default NULL,
`user` varchar(255) default NULL,
PRIMARY KEY (`timeLogged`,`orig`,`rcpt`),
KEY `timeLogged` (`timeLogged`),
KEY `orig` (`orig`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Any ideas of what can I do to optimize this query or indexes on my table?
You may want to try adding a composite index on (user, timeLogged):
CREATE TABLE IF NOT EXISTS `log` (
...
KEY `user_timeLogged` (user, timeLogged),
...
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
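Since the table already exists, in practice you would add that index with ALTER TABLE (the index name is arbitrary):
ALTER TABLE `log` ADD KEY `user_timeLogged` (`user`, `timeLogged`);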
Related Stack Overflow post:
Database: When should I use a composite index?
In addition to the suggestions made by the other answers, I note that you have a column user in the table which is a varchar(255). If this refers to a column in a table of users, then 1) it would most likely be far more efficient to add an integer ID column to that table, and use that as the primary key and as a referencing column in other tables; 2) you are using InnoDB, so why not take advantage of the foreign key capabilities it offers?
Consider that if you index by a varchar(n) column, it is treated like a char(n) in the index, so each row of your current primary key takes up 4 + 128 + 128 = 260 bytes in the index.
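A rough sketch of suggestions 1) and 2), assuming a users table with an integer id primary key exists (it is not shown in the question) and that existing log rows can be backfilled with the matching ids:
ALTER TABLE `log`
  ADD COLUMN `user_id` INT UNSIGNED NOT NULL, -- type must match users.id
  ADD KEY `user_id_timeLogged` (`user_id`, `timeLogged`),
  ADD CONSTRAINT `fk_log_user`
    FOREIGN KEY (`user_id`) REFERENCES `users` (`id`);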
Add an index on user.

Is InnoDB sorting really THAT slow?

I had all my tables in MyISAM but the table-level locking was starting to kill me when I had long-running update jobs. I converted my primary tables over to InnoDB and now many of my queries are taking over 1 minute to complete, where they were nearly instantaneous on MyISAM. They are usually stuck in the Sorting result step. Did I do something wrong?
For example :
SELECT * FROM `metaward_achiever`
INNER JOIN `metaward_alias` ON (`metaward_achiever`.`alias_id` = `metaward_alias`.`id`)
WHERE `metaward_achiever`.`award_id` = 1507
ORDER BY `metaward_achiever`.`modified` DESC
LIMIT 100
Takes about 90 seconds now. Here is the describe:
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+-------+-----------------------------+
| 1 | SIMPLE | metaward_achiever | ref | metaward_achiever_award_id,metaward_achiever_alias_id | metaward_achiever_award_id | 4 | const | 66424 | Using where; Using filesort |
| 1 | SIMPLE | metaward_alias | eq_ref | PRIMARY | PRIMARY | 4 | paul.metaward_achiever.alias_id | 1 | |
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+-------+-----------------------------+
It seems that now TONS of my queries get stuck in the "Sorting result" step:
mysql> show processlist;
+--------+------+-----------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+--------+------+-----------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| 460568 | paul | localhost | paul | Query | 0 | NULL | show processlist |
| 460638 | paul | localhost | paul | Query | 0 | Sorting result | SELECT `metaward_achiever`.`id`, `metaward_achiever`.`modified`, `metaward_achiever`.`created`, `met |
| 460710 | paul | localhost | paul | Query | 79 | Sending data | SELECT `metaward_achiever`.`id`, `metaward_achiever`.`modified`, `metaward_achiever`.`created`, `met |
| 460722 | paul | localhost | paul | Query | 49 | Updating | UPDATE `metaward_alias` SET `modified` = '2009-09-15 12:43:50', `created` = '2009-08-24 11:55:24', ` |
| 460732 | paul | localhost | paul | Query | 25 | Sorting result | SELECT `metaward_achiever`.`id`, `metaward_achiever`.`modified`, `metaward_achiever`.`created`, `met |
+--------+------+-----------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
And why is that simple update stuck for 49 seconds?
If it helps, here are the schemas :
| metaward_alias | CREATE TABLE `metaward_alias` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`modified` datetime NOT NULL,
`created` datetime NOT NULL,
`string_id` varchar(255) DEFAULT NULL,
`shortname` varchar(100) NOT NULL,
`remote_image` varchar(500) DEFAULT NULL,
`image` varchar(100) NOT NULL,
`user_id` int(11) DEFAULT NULL,
`type_id` int(11) NOT NULL,
`md5` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `string_id` (`string_id`),
KEY `metaward_alias_user_id` (`user_id`),
KEY `metaward_alias_type_id` (`type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=858381 DEFAULT CHARSET=utf8 |
| metaward_award | CREATE TABLE `metaward_award` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`modified` datetime NOT NULL,
`created` datetime NOT NULL,
`string_id` varchar(20) NOT NULL,
`owner_id` int(11) NOT NULL,
`name` varchar(100) NOT NULL,
`description` longtext NOT NULL,
`owner_points` int(11) NOT NULL,
`url` varchar(500) NOT NULL,
`remote_image` varchar(500) DEFAULT NULL,
`image` varchar(100) NOT NULL,
`parent_award_id` int(11) DEFAULT NULL,
`slug` varchar(110) NOT NULL,
`true_points` double DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `string_id` (`string_id`),
KEY `metaward_award_owner_id` (`owner_id`),
KEY `metaward_award_parent_award_id` (`parent_award_id`),
KEY `metaward_award_slug` (`slug`),
KEY `metaward_award_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=122176 DEFAULT CHARSET=utf8 |
| metaward_achiever | CREATE TABLE `metaward_achiever` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`modified` datetime NOT NULL,
`created` datetime NOT NULL,
`award_id` int(11) NOT NULL,
`alias_id` int(11) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `metaward_achiever_award_id` (`award_id`),
KEY `metaward_achiever_alias_id` (`alias_id`)
) ENGINE=InnoDB AUTO_INCREMENT=77175366 DEFAULT CHARSET=utf8 |
And these in my my.cnf
innodb_file_per_table
innodb_buffer_pool_size = 2048M
innodb_additional_mem_pool_size = 16M
innodb_flush_method=O_DIRECT
As written in Should you move from MyISAM to Innodb? (which is pretty recent):
Innodb Needs Tuning: As a final note about MyISAM to Innodb migration, I should mention Innodb tuning. Innodb needs tuning. Really. MyISAM can work well with defaults for many applications; I've seen hundreds-of-GB databases run with MyISAM default settings and work reasonably. Innodb needs resources, and it often will not work well with defaults. Tuning MyISAM from defaults rarely gives more than a 2-3x gain, while it can be as much as 10-50x for Innodb tables, in particular for write-intensive workloads. Check here for details.
So, about MySQL Innodb Settings, the author wrote in Innodb Performance Optimization Basics:
The most important ones are:
innodb_buffer_pool_size – 70-80% of memory is a safe bet. I set it to 12G on a 16GB box.
UPDATE: If you're looking for more details, check out the detailed guide on tuning the innodb buffer pool.
innodb_log_file_size – This depends on your recovery speed needs, but 256M seems to be a good balance between reasonable recovery time and good performance.
innodb_log_buffer_size=4M – 4M is good for most cases, unless you're piping large blobs to Innodb; in that case increase it a bit.
innodb_flush_log_at_trx_commit=2 – If you're not concerned about ACID and can lose the transactions of the last second or two in case of a full OS crash, set this value. It can have a dramatic effect, especially on a lot of short write transactions.
innodb_thread_concurrency=8 – Even with the current Innodb scalability fixes, having limited concurrency helps. The actual number may be higher or lower depending on your application, but the default of 8 is a decent start.
innodb_flush_method=O_DIRECT – Avoid double buffering and reduce swap pressure; in most cases this setting improves performance. Be careful, though, if you do not have a battery-backed-up RAID cache, as write IO may suffer.
innodb_file_per_table – If you do not have too many tables, use this option so you do not get uncontrolled growth of the main Innodb tablespace that you can't reclaim. This option was added in MySQL 4.1 and is now stable enough to use.
Also check if your application can run in the READ-COMMITTED isolation mode – if it does, set it as the default with transaction-isolation=READ-COMMITTED. This option has some performance benefits, especially for locking in 5.0, with even more to come with MySQL 5.1 and row-level replication.
Just for the record, the people behind mysqlperformanceblog.com ran a benchmark comparing Falcon, MyISAM and InnoDB. The benchmark was really supposed to be highlighting Falcon, except it was InnoDB that won the day, topping both Falcon and MyISAM in queries per second for almost every test: InnoDB vs MyISAM vs Falcon benchmarks – part 1.
That is a large result set (66,424 rows) that MySQL must manually sort. Try adding an index to metaward_achiever.modified.
There is a limitation with MySQL 4.x that only allows MySQL to use one index per table. Since it is using the index on metaward_achiever.award_id column for the WHERE selection, it cannot also use the index on metaward_achiever.modified for the sort. I hope you're using MySQL 5.x, which may have improved this.
You can see this by doing explain on this simplified query:
SELECT * FROM `metaward_achiever`
WHERE `metaward_achiever`.`award_id` = 1507
ORDER BY `metaward_achiever`.`modified` DESC
LIMIT 100
If you can get this using the indexes for both the WHERE selection and sorting, then you're set.
You could also create a compound index with both metaward_achiever.award_id and metaward_achiever.modified. If MySQL doesn't use it, then you can hint at it or remove the one on just award_id.
Alternatively, if you can get rid of metaward_achiever.id and make metaward_achiever.award_id your primary key and add a key on metaward_achiever.modified, or better yet make metaward_achiever.award_id combined with metaward_achiever.modified your primary key, then you'll be really good.
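A sketch of that compound index (the index name is made up); with it, MySQL can use one index both for the award_id filter and for the ORDER BY on modified, avoiding the filesort:
ALTER TABLE `metaward_achiever`
  ADD KEY `metaward_achiever_award_modified` (`award_id`, `modified`);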
You can try to optimize the file sorting by modifying settings. Unfortunately, I'm not experienced with this, as our DBA handles the configuration, but you might want to check out this great blog:
http://www.mysqlperformanceblog.com/
Here's an article about filesort in particular:
http://s.petrunia.net/blog/?p=24
My guess is that you probably haven't configured your InnoDB settings beyond the defaults. You should do a quick google for setting up your InnoDB options.
The one that caused me the most noticeable performance issues out of the box was innodb_buffer_pool_size. This should be set to 50-80% of your machine's memory. By default it's often only a few MB. Crank it way up, and you should see a noticeable performance increase.
Also take a look at innodb_additional_mem_pool_size.
Start here, but also google around for "innodb performance tuning".
MySQL's query optimizer is not good, as I recall. Try a subselect instead of a straight join.
SELECT * FROM (SELECT * FROM `metaward_achiever`
WHERE `metaward_achiever`.`award_id` = 1507) a
INNER JOIN `metaward_alias` ON (a.`alias_id` = `metaward_alias`.`id`)
ORDER BY a.`modified` DESC
LIMIT 100
Or something like that (untested syntax above).
Sorting is something done by the database server, not the storage engine, in MySQL.
If in both cases, the engine was not able to provide the results in already-sorted form (it depends on the index used), then the server needs to sort them.
The only reason that MyISAM / InnoDB might be different is that the order the rows come back could affect how sorted the data are already - MyISAM could give the data back in "more sorted" order in some cases (and vice versa).
Still, sorting 60k rows is not going to take long as it's a very small data set. Are you sure you've got your sort buffer set big enough?
Using an on-disc filesort() instead of an in-memory one is much slower. The engine should however, not make any difference to this. filesort is not an engine function, but a MySQL core function. filesort does, in fact, suck in quite a lot of ways but it's not normally that slow.
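If you want to check, the current value and a per-session experiment look like this (the 4MB figure is purely illustrative, not a recommendation):
SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 4 * 1024 * 1024; -- try 4MB for this session only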
Try adding a key covering the fields metaward_achiever.alias_id, metaward_achiever.award_id, and metaward_achiever.modified; this will help a lot. And do not put keys on varchar fields, as it will increase the time for inserts and updates. Also, it seems you have 77M records in the achiever table, so you may want to look into InnoDB optimizations. There are lots of good tutorials around on how to set memory limits for it.