Very slow DELETE/UPDATE in MariaDB ColumnStore server using primary key

We have a table using the InnoDB engine (on a MariaDB ColumnStore server) that has 5 million rows and a two-column primary key. DDL:
CREATE TABLE `interest` (
`src_id` TINYINT NOT NULL,
`id` binary(36) NOT NULL,
`account_id` bigint(20) NOT NULL,
`instrument_id` int(11) NOT NULL,
`quantity` decimal(15,4) NOT NULL,
`time` datetime NOT NULL,
`interest` decimal(16,2) NOT NULL,
`auto_incr` bigint(20) NOT NULL ,
PRIMARY KEY (`src_id`, `id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and both of these queries take 4-5 seconds to execute:
DELETE FROM interest WHERE `id`='52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id`='1';
UPDATE interest SET interest = 1 WHERE `id`='52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id`='1';
The result of the EXPLAIN is:
EXPLAIN EXTENDED DELETE FROM interest WHERE `id`='52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id`='1';
It looks like the primary key is not being used, because the plan scans more than 50% of the rows.
I exported the data from this table (on the MariaDB ColumnStore server) and imported it into a table on a normal MariaDB server, and there the problem does not occur: both queries execute in under 10 ms.
We are using the latest version of MariaDB ColumnStore: 1.2.5.

ColumnStore uses a slightly customized version of MariaDB, and there is a place where custom code alters optimizer settings, but that should only kick in for queries that involve ColumnStore tables. JFYI: Enterprise 10.4 and community 10.5 will ship ColumnStore as a normal engine on the non-customized MariaDB build.
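If you are stuck on an affected build, a first hedged check (plain MariaDB statements, nothing ColumnStore-specific) is to refresh statistics and confirm whether an equivalent SELECT would use the primary key:
-- Refresh InnoDB index statistics in case the optimizer's row estimates are stale.
ANALYZE TABLE interest;
-- If this plan shows key=PRIMARY with type=const or ref, the index itself is
-- fine and the regression is specific to the DELETE/UPDATE path.
EXPLAIN SELECT * FROM interest
WHERE `id` = '52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id` = 1;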

Related

Primary Index not being used

I have a Greenplum cluster with the following specifications:
Master (16 vCPUs, 32GB RAM, 27GB Swap)
4 Segment hosts (16 vCPUs, 62GB RAM, 27GB Swap each)
Earlier I had two segments and outstanding performance for my use cases, but ever since I expanded the cluster to four nodes, I have been unable to get queries to use the indexes.
Queries that used to execute within 10 ms (with an index hit) now take 2-5 seconds on a sequential scan.
I have attached my schema and some sample EXPLAIN ANALYZE output (this is a sample query plan; the relevant table has 48,260,809 rows).
Schema:
\c dmiprod
Create Table dmiprod_schema."package"
(
identity varchar(4096) not null,
"identityHash" bytea not null,
"packageDate" date not null,
ctime timestamp not null,
customer varchar(32) not null
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.package ADD CONSTRAINT "package_pkey" PRIMARY KEY ("identityHash");
create index idx_package_ctime on dmiprod_schema.package ("ctime");
create index idx_package_packageDate on dmiprod_schema.package ("packageDate");
create index idx_package_customer on dmiprod_schema.package ("customer");
CREATE TABLE dmiprod_schema."tags"
(
"identityHash" bytea not null,
tag varchar(32) not null,
UNIQUE ("identityHash",tag)
) distributed by ("identityHash");
create index "idx_tags_identityHash" on dmiprod_schema.tags ("identityHash");
create index idx_tags_tag on dmiprod_schema.tags ("tag");
CREATE TABLE dmiprod_schema."features"
(
"identityHash" bytea not null,
ctime timestamp not null,
utime timestamp not null,
phash varchar(64) ,
ahash varchar(64),
chash varchar(78),
iimages JSON ,
lcert JSON ,
slogos JSON
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.features ADD CONSTRAINT "features_pkey" PRIMARY KEY ("identityHash");
create index idx_features_phash on dmiprod_schema.features ("phash");
CREATE TABLE dmiprod_schema."raw"
(
"identityHash" bytea not null,
ctime timestamp not null,
utime timestamp not null,
ourl TEXT,
lurl TEXT,
"pageText" TEXT,
"ocrText" TEXT,
html TEXT,
meta JSON
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.raw ADD CONSTRAINT "raw_pkey" PRIMARY KEY ("identityHash");
CREATE TABLE dmiprod_schema.packageLock
(
"identityHash" bytea not null,
secret bytea not null,
ctime timestamp not null,
UNIQUE ("identityHash")
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.packageLock ADD CONSTRAINT "packageLock_pkey" PRIMARY KEY ("identityHash");
create index idx_packageLock_secret on dmiprod_schema.packageLock ("secret");
Recreating the table and the indexes, inserting 20 million random bytea values of lengths 32, 48, 64, 96, and 128 into identityHash, and then performing the same select results in package_pkey being used and the query completing in under 20 ms.
Aside from index usage, the other difference is the use of the GPORCA optimizer. I suggest you run set optimizer = 'on'; and try again. If that does not work, post your GPDB/Greenplum version and session settings, including optimizer, enable_indexscan, and anything else relevant.
I tested on VMs on a single physical host, with Tanzu Greenplum 6.17.1 and 4 segment hosts.
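A hedged sketch of that re-test (standard Greenplum session settings; enable_seqscan = off is a diagnostic knob only, not something to leave on):
-- Enable the GPORCA optimizer for this session and re-check the plan.
SET optimizer = on;
-- Diagnostic only: if the index is chosen once seqscans are penalized, the
-- planner can use it and the problem is costing/statistics, not the index.
SET enable_seqscan = off;
-- '\x00112233' is a hypothetical identityHash value.
EXPLAIN ANALYZE SELECT * FROM dmiprod_schema.package WHERE "identityHash" = '\x00112233'::bytea;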
Just to confirm: when you expanded the system (assuming you used gpexpand), did you run the redistribution phase (gpexpand is a two-step process)? And when that completed, did you run analyzedb on the system to make sure statistics were updated for the new table/index segments?
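And a hedged way to check the second point (pg_stat_last_operation is a Greenplum-specific catalog; column names as per the Greenplum docs):
-- Show the most recent ANALYZE recorded for each table.
SELECT objid::regclass AS table_name, statime
FROM pg_stat_last_operation
WHERE classid = 'pg_class'::regclass AND staactionname = 'ANALYZE'
ORDER BY statime DESC;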

Postgresql Query is very Slow

I have a table with 300,000 rows, and when I run a simple query like
select * from diario_det;
it takes 41041 ms to return the rows. Is that normal? How can I optimize the query?
I am using PostgreSQL 9.3 on CentOS 7.
Here is my table:
CREATE TABLE diario_det
(
cod_empresa numeric(2,0) NOT NULL,
nro_asiento numeric(8,0) NOT NULL,
nro_secue_pase numeric(4,0) NOT NULL,
descripcion_pase character varying(150) NOT NULL,
monto_debe numeric(16,3),
monto_haber numeric(16,3),
estado character varying(1) NOT NULL,
cod_pcuenta character varying(15) NOT NULL,
cod_local numeric(2,0) NOT NULL,
cod_centrocosto numeric(4,0) NOT NULL,
cod_ejercicio numeric(4,0) NOT NULL,
nro_comprob character varying(15),
conciliado_por character varying(10),
CONSTRAINT fk_diario_det_cab FOREIGN KEY (cod_empresa, cod_local, cod_ejercicio, nro_asiento)
REFERENCES diario_cab (cod_empresa, cod_local, cod_ejercicio, nro_asiento) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_diario_det_pc FOREIGN KEY (cod_empresa, cod_pcuenta)
REFERENCES plan_cuenta (cod_empresa, cod_pcuenta) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=TRUE
);
ALTER TABLE diario_det
OWNER TO postgres;
-- Index: pk_diario_det_ax
-- DROP INDEX pk_diario_det_ax;
CREATE INDEX pk_diario_det_ax
ON diario_det
USING btree
(cod_pcuenta COLLATE pg_catalog."default", cod_local, estado COLLATE pg_catalog."default");
Very roughly, one row is 231 bytes; times 300,000 rows, that is 69,300,000 bytes (~69 MB) that has to be transferred from server to client.
I think 41 seconds is a bit long, but the query is bound to be slow because of the amount of data that has to be loaded from disk and transferred.
You can optimise the query by:
selecting just the columns you are going to use instead of all of them (if you need just cod_empresa, that reduces the total amount of transferred data to ~1.2 MB, but the server still has to iterate through all records - slow)
filtering only the rows you are going to use - a WHERE clause on indexed columns can really speed the query up (see the sketch below)
If you want to know what is happening in your query, play around with EXPLAIN and EXPLAIN EXECUTE.
Also, if you're running a dedicated database server, be sure to configure it to use a generous share of system resources.
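A hedged sketch of both points against this schema (the filter values are made up; the existing index pk_diario_det_ax covers cod_pcuenta, cod_local, estado):
-- Select only the columns you need, and filter on indexed columns.
SELECT cod_empresa, nro_asiento, monto_debe, monto_haber
FROM diario_det
WHERE cod_pcuenta = '1110101'  -- hypothetical account code
  AND cod_local = 1
  AND estado = 'A';            -- hypothetical status flag
-- Prefix the statement with EXPLAIN ANALYZE to confirm the index is actually used.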

In H2 Database, add index while table creation in single query

I am trying to create a table with its indexes defined in a single query, but H2 gives an error. For example:
create table tbl_Cust
(
id int primary key auto_increment not null,
fid int,
c_name varchar(50),
INDEX (fid)
);
but this gives the error:
Unknown data type: "("; SQL statement:
[Error Code: 50004]
[SQL State: HY004]
Because of this I have to run two separate queries to create a table with an index: first a query to create the table, and then a second query to add the index:
create INDEX c_fid on tbl_Cust(fid);
Is there something wrong with my query, or does H2 simply not support creating a table with an index in a single query?
Interesting question. The solution is even more interesting, as it involves the MySQL compatibility mode.
It's actually possible to run the exact command you wrote without any modification, provided you add the MySQL mode to your JDBC URL.
An example URL looks like this: jdbc:h2:mem:;mode=mysql
SQL remains:
create table tbl_Cust
(
id int primary key auto_increment not null,
fid int,
c_name varchar(50),
INDEX (fid)
);
Update count: 0
(15 ms)
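If changing the JDBC URL is inconvenient, H2 also exposes the compatibility mode as a SQL statement (hedged: SET MODE switches the whole database, not just your session):
-- Put H2 into MySQL compatibility mode; the inline INDEX syntax is then accepted.
SET MODE MySQL;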
Too bad I did not see this question earlier... Hopefully the solution will come in handy for someone one day :-)
I was able to resolve the problem. According to
http://www.h2database.com/html/grammar.html#create_index
I modified the query. It works fine with my H2 server.
CREATE TABLE subscription_validator (
application_id int(11) NOT NULL,
api_id int(11) NOT NULL,
validator_id int(11) NOT NULL,
PRIMARY KEY (application_id,api_id),
CONSTRAINT subscription_validator_ibfk_1 FOREIGN KEY (validator_id) REFERENCES validator (id) ON UPDATE CASCADE
);
CREATE INDEX validator_id ON subscription_validator(validator_id);

MySQL query slow when selecting VARCHAR

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in the base table are not in the same physical order as the rows in the index, so reading the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to pull the values requested in the select clause straight from the index; it has already read that index into RAM for the where clause, so it takes advantage of that fact.
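You can verify this in the plan: for the fast query, EXPLAIN should show "Using index" in the Extra column, meaning the index alone satisfies it; the slow query will not.
EXPLAIN SELECT id, `rank`
FROM search_engine_rankings
WHERE keyword_id = 19 AND search_engine_id = 11 AND indexed_at = '2010-12-03';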
In addition to Tim's answer: an index in MySQL can only be used left-to-right, which means the columns of your index can serve your WHERE clause only up to the first indexed column you skip.
Currently your UNIQUE index is keyword_id, search_engine_id, rank, indexed_at. It can filter on keyword_id and search_engine_id, but still has to scan the remaining rows to filter on indexed_at.
But if you change it to keyword_id, search_engine_id, indexed_at, rank (just the order), it can filter on keyword_id, search_engine_id, and indexed_at.
I believe it will then be able to fully use that index to read the appropriate part of your table.
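A hedged sketch of that reordering (rebuilding a unique index on ~250M rows rewrites the whole index, so plan for a long maintenance window):
ALTER TABLE search_engine_rankings
  DROP KEY unique_ranking,
  ADD UNIQUE KEY unique_ranking (keyword_id, search_engine_id, indexed_at, `rank`);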
I know it's an old post, but I was experiencing the same situation and hadn't found an answer.
This really happens in MySQL: when you select varchar columns, processing takes a lot of time. My query took about 20 seconds to process 1.7M rows, and now it takes about 1.9 seconds.
OK, first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, run the same query, but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TL;DR: I solved this by running OPTIMIZE TABLE on the table.
I experienced the same just now. Even lookups on the primary key selecting just a few rows were slow. Testing a bit, I found it was not limited to the varchar column; selecting an int also took a considerable amount of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The approved answer here is to just make an index that also spans someint, and the query will be fast, as MySQL can fetch all the information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround - something is clearly wrong; it should not take three seconds to fetch three rows from a table! Besides, most applications just do a "select * from mytable", and making changes on the application side is not always trivial.
After OPTIMIZE TABLE, both queries take <10 ms.
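For reference, the command in question; on InnoDB, OPTIMIZE TABLE maps to a full table-and-index rebuild, so expect it to take a while on large tables:
OPTIMIZE TABLE mytable;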

Large MyISAM table slow even for non-concurrent inserts/updates

I have a MyISAM table with ~50,000,000 records (tasks for a web crawler):
CREATE TABLE `tasks2` (
`id` int(11) NOT NULL auto_increment,
`url` varchar(760) character set latin1 NOT NULL,
`state` varchar(10) collate utf8_bin default NULL,
`links_depth` int(11) NOT NULL,
`sites_depth` int(11) NOT NULL,
`error_text` text character set latin1,
`parent` int(11) default NULL,
`seed` int(11) NOT NULL,
`random` int(11) NOT NULL default '0',
PRIMARY KEY (`id`),
UNIQUE KEY `URL_UNIQUE` (`url`),
KEY `next_random_task` (`state`,`random`)
) ENGINE=MyISAM AUTO_INCREMENT=61211954 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Once every few seconds, one of the following operations occurs (but never more than one simultaneously):
INSERT ... VALUES (500 rows) - inserts new tasks
UPDATE ... WHERE id IN (up to 10 ids) - updates state for batch of tasks
SELECT ... WHERE (by next_random_task index) - loads batch of tasks for processing
My problem is that inserts and updates are very slow - on the order of tens of seconds, sometimes over a minute. Selects are fast, though. Why could this be happening, and how can I improve performance?
~50M rows on regular hardware is a decent number.
Please go through this question on Server Fault (even though it is written for InnoDB, there are similar parameters for MyISAM).
After that you should start the cycle of:
identifying (logging) slow queries to understand your patterns or confirm your assumptions (a sketch of this step follows the list)
tweaking my.cnf or adding/removing indexes (depending on the patterns)
measuring improvements
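A hedged sketch of the logging step (standard MySQL/MariaDB server variables; the 1-second threshold is an arbitrary starting point):
-- Capture statements slower than 1 second in the slow query log.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;
-- The destination is controlled by slow_query_log_file and log_output.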
EXPLAIN a sample UPDATE against the full table to ensure the primary key index is being used.
Consider changing state to a TINYINT or ENUM to make its index smaller. (ENUM might not actually do this).
Do you need the unique key on url? This will slow down inserts.
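A hedged sketch of the last three suggestions (the ids and ENUM labels are made up; EXPLAIN on UPDATE statements needs MySQL 5.6+ / MariaDB 10.0+):
-- Confirm the batch UPDATE actually uses the primary key (ids are hypothetical).
EXPLAIN UPDATE tasks2 SET state = 'done' WHERE id IN (1234, 5678, 9012);
-- Shrink state and its index; these ENUM labels are hypothetical.
ALTER TABLE tasks2 MODIFY `state` ENUM('new','in_progress','done','error') DEFAULT NULL;
-- If the unique key on url is not strictly required, dropping it speeds up inserts.
ALTER TABLE tasks2 DROP KEY URL_UNIQUE;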