In TimescaleDB, while querying compressed chunks, segmentby indexes are not being used. How can I make sure that indexes are used while querying? - indexing

Below is my table schema:
CREATE TABLE public.table1
(
machinecode character varying(100) COLLATE pg_catalog."default" NOT NULL,
groupname character varying(50) COLLATE pg_catalog."default" NOT NULL,
sensor character varying(100) COLLATE pg_catalog."default" NOT NULL,
"time" timestamp without time zone NOT NULL,
value1 integer,
value2 real,
PRIMARY KEY (machinecode, groupname, sensor, "time")
);
CREATE INDEX ON table1 (machinecode, groupname, sensor, "time" DESC);
SELECT create_hypertable('table1', 'time');
SELECT set_chunk_time_interval('table1', INTERVAL '24 hours');
ALTER TABLE table1 SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'machinecode,groupname,sensor'
);
SELECT add_compression_policy('table1', INTERVAL '7 days');
Image 1 (EXPLAIN output of the query)
But as can be seen from image 1, the EXPLAIN output shows no index scan on machinecode.
select * from _timescaledb_catalog.hypertable
where id =
(SELECT compressed_hypertable_id FROM _timescaledb_catalog.hypertable
WHERE table_name='table1');
Image 2 (result of the catalog query above)
From image 2 it can be seen that _compressed_hypertable_18 is created, and the indexes on it are:
CREATE INDEX _compressed_hypertable_18_machinecode__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_18 USING btree (machinecode, _ts_meta_sequence_num)
CREATE INDEX _compressed_hypertable_18_groupname__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_18 USING btree (groupname, _ts_meta_sequence_num)
CREATE INDEX _compressed_hypertable_18_sensor__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_18 USING btree (sensor, _ts_meta_sequence_num)
But the EXPLAIN output still shows that the query is not using an index.
How can I make sure that indexes are used while querying?

Your query plan shows that the issue is the cast of the column to TEXT. This answer on GitHub suggests using TEXT for the segmentby columns. In your case you use CHARACTER VARYING, which introduces the need for a cast, and since the cast is applied to the column, the index cannot be used.
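A minimal sketch of that suggestion, assuming you can recreate the table with text columns (everything else is copied from the schema in the question):
CREATE TABLE public.table1
(
machinecode text NOT NULL,
groupname text NOT NULL,
sensor text NOT NULL,
"time" timestamp without time zone NOT NULL,
value1 integer,
value2 real,
PRIMARY KEY (machinecode, groupname, sensor, "time")
);
SELECT create_hypertable('table1', 'time');
SELECT set_chunk_time_interval('table1', INTERVAL '24 hours');
ALTER TABLE table1 SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'machinecode,groupname,sensor'
);
SELECT add_compression_policy('table1', INTERVAL '7 days');
-- With text columns a predicate such as WHERE machinecode = 'M1' needs no cast,
-- so the segmentby indexes on the compressed chunks can be used.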

Related

How to find the columns that need to be indexed?

I'm starting to learn SQL and relational databases. Below is the table that I have; it has around 10 million records. My composite key is (reltype, from_product_id, to_product_id).
What strategy should I follow when selecting the columns that need to be indexed? I have also documented the operations that will be performed on the table. Please help me determine which columns or combinations of columns need to be indexed.
Table DDL is shown below.
Table name: prod_rel.
Database schema name: public
CREATE TABLE public.prod_rel (
reltype varchar NULL,
assocsequence float4 NULL,
action varchar NULL,
from_product_id varchar NOT NULL,
to_product_id varchar NOT NULL,
status varchar NULL,
starttime varchar NULL,
endtime varchar null,
primary key (reltype, from_product_id, to_product_id)
);
Operations performed on table:
select distinct(reltype )
from public.prod_rel;
update public.prod_rel
set status = ? , starttime = ?
where from_product_id = ?;
update public.prod_rel
set status = ? , endtime = ?
where from_product_id = ?;
select *
from public.prod_rel
where from_product_id in (select distinct (from_product_id)
from public.prod_rel
where status = ?
and action in ('A', 'E', 'C', 'P')
and reltype = ?
fetch first 1000 rows only);
Note: I'm not performing any JOIN operations. Also, please ignore the casing of table and column names; I'm just getting started.
Ideal would be two indexes:
CREATE INDEX ON prod_rel (from_product_id);
CREATE INDEX ON prod_rel (status, reltype)
WHERE action IN ('A', 'E', 'C', 'P');
Your primary key (which is also implemented using an index) cannot support queries 2 and 3 because from_product_id is not at the beginning. If you redefine the primary key as (from_product_id, to_product_id, reltype), you don't need the first index I suggested.
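A sketch of that redefinition, assuming the constraint still carries PostgreSQL's default name prod_rel_pkey (check yours with \d prod_rel):
ALTER TABLE public.prod_rel DROP CONSTRAINT prod_rel_pkey;
ALTER TABLE public.prod_rel ADD PRIMARY KEY (from_product_id, to_product_id, reltype);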
Why does order matter? Imagine you are looking for a book in a library where the books are ordered by “last name, first name”. You can use this ordering to find all books by “Dickens” quickly, but not all books by any “Charles”.
But let me also comment on your queries.
The first one will perform badly if there are lots of different reltype values; try raising work_mem in that case. It is always a sequential scan of the whole table, and no index can help.
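For example (the value is only an illustration; size it to the memory you actually have available):
SET work_mem = '256MB';
SELECT DISTINCT reltype FROM public.prod_rel;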
I have changed the order of the primary key columns as shown below, as per @a_horse_with_no_name's suggestion, and created only one index on the (from_product_id, reltype, status, action) columns.
CREATE TABLE public.prod_rel (
reltype varchar NULL,
assocsequence float4 NULL,
action varchar NULL,
from_product_id varchar NOT NULL,
to_product_id varchar NOT NULL,
status varchar NULL,
starttime varchar NULL,
endtime varchar null,
primary key (from_product_id, to_product_id, reltype)
);
Also, I have gone through the site suggested by @a_horse_with_no_name. It was amazing; I came to know a lot of new things about indexing.
https://use-the-index-luke.com/

SQLite speed up select with collate nocase

I have an SQLite db:
CREATE TABLE IF NOT EXISTS Commits
(
GlobalVer INTEGER PRIMARY KEY,
Data blob NOT NULL
) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
I want to run one select:
SELECT Commits.Data
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ?
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
After that I want to run another select:
SELECT Commits.Data,Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ? COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
The problem is that the 2nd select is super slow. I think this is because of COLLATE NOCASE. I want to speed it up. I tried to add an index, but it doesn't help (maybe I did something wrong?). How can I execute the 2nd query at roughly the same speed as the 1st one?
An index can be used to speed up a search only if it uses the same collation as the query.
By default, an index takes the collation from the table column, so you could change the table definition:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL COLLATE NOCASE,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
However, this would make the first query slower.
To speed up both queries, you need two indexes, one for each collation: keep the default collation for the implicit (primary key) index, and use NOCASE for the explicit index:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
CREATE INDEX IF NOT EXISTS Streams_nocase_idx ON Streams(Name COLLATE NOCASE, GlobalVer);
(Adding the second column to the index speeds up the ORDER BY in this query.)
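To verify which index each query picks, you can prefix it with EXPLAIN QUERY PLAN; a sketch for the case-insensitive query (the literal and LIMIT/OFFSET values are placeholders, and the exact plan wording varies by SQLite version):
EXPLAIN QUERY PLAN
SELECT Commits.Data, Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer = Commits.GlobalVer
WHERE Streams.Name = 'some-stream' COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT 10 OFFSET 0;
-- should report a SEARCH on Streams using Streams_nocase_idx rather than a full scan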

Optimizing GROUP BY in hsqldb

I have a table with 700K+ records on which a simple GROUP BY query takes more than 35 seconds to execute. I'm out of ideas on how to optimize this.
SELECT TOP 10 called_dn, COUNT(called_dn) FROM reportview.calls_out GROUP BY called_dn;
Here I add TOP 10 to limit network transfer induced delays.
I have an index on called_dn (HSQLDB seems not to be using it).
called_dn is non nullable.
reportview.calls_out is a cached table.
Here's the table script:
CREATE TABLE calls_out (
pk_global_call_id INTEGER GENERATED BY DEFAULT AS SEQUENCE seq_global_call_id NOT NULL,
sys_global_call_id VARCHAR(65),
call_start TIMESTAMP WITH TIME ZONE NOT NULL,
call_end TIMESTAMP WITH TIME ZONE NOT NULL,
duration_interval INTERVAL HOUR TO SECOND(0),
duration_seconds INTEGER,
call_segments INTEGER,
calling_dn VARCHAR(25) NOT NULL,
called_dn VARCHAR(25) NOT NULL,
called_via_dn VARCHAR(25),
fk_end_status INTEGER NOT NULL,
fk_incoming_queue INTEGER,
call_start_year INTEGER,
call_start_month INTEGER,
call_start_week INTEGER,
call_start_day INTEGER,
call_start_hour INTEGER,
call_start_minute INTEGER,
call_start_second INTEGER,
utc_created TIMESTAMP WITH TIME ZONE,
created_by VARCHAR(25),
utc_modified TIMESTAMP WITH TIME ZONE,
modified_by VARCHAR(25),
PRIMARY KEY (pk_global_call_id),
FOREIGN KEY (fk_incoming_queue)
REFERENCES lookup_incoming_queue(pk_id),
FOREIGN KEY (fk_end_status)
REFERENCES lookup_end_status(pk_id));
Am I stuck with this kind of performance, or is there something I might try to speed up this query?
EDIT: Here's the query plan if it helps:
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[true]
columns=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN not nullable
COUNT arg=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN nullable]
[range variable 1
join type=INNER
table=CALLS_OUT
cardinality=771855
access=FULL SCAN
join condition = [index=SYS_IDX_SYS_PK_10173_10177]]]
groupColumns=[COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN]
offset=[VALUE = 0, TYPE = INTEGER]
limit=[VALUE = 10, TYPE = INTEGER]
PARAMETERS=[]
SUBQUERIES[]
Well, it seems there's no way to avoid a full scan in this situation.
Just for reference for future souls reaching this question, here's what I resorted to in the end:
I created a summary table maintained by INSERT / DELETE triggers on the original table. This, in combination with suitable indexes and LIMIT USING INDEX clauses in my queries, yields very good performance.
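A rough sketch of that summary-table approach (the table and trigger names are made up, and the HSQLDB trigger/MERGE syntax shown here is an assumption worth double-checking against your version):
CREATE TABLE called_dn_counts (
called_dn VARCHAR(25) PRIMARY KEY,
call_count BIGINT NOT NULL
);
CREATE TRIGGER trg_calls_out_ins AFTER INSERT ON calls_out
REFERENCING NEW ROW AS newrow FOR EACH ROW
BEGIN ATOMIC
MERGE INTO called_dn_counts c
USING (VALUES (newrow.called_dn)) AS v(called_dn)
ON c.called_dn = v.called_dn
WHEN MATCHED THEN UPDATE SET c.call_count = c.call_count + 1
WHEN NOT MATCHED THEN INSERT (called_dn, call_count) VALUES (v.called_dn, 1);
END;
-- A mirrored AFTER DELETE trigger would decrement the count.
-- The TOP 10 report then becomes a cheap read of the small summary table:
SELECT TOP 10 called_dn, call_count FROM called_dn_counts;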

Postgresql Query is very Slow

I have a table with 300000 rows, and when I run a simple query like
select * from diario_det;
it takes 41041 ms to return the rows. Is that normal? How can I optimize the query?
I use PostgreSQL 9.3 on CentOS 7.
Here is my table:
CREATE TABLE diario_det
(
cod_empresa numeric(2,0) NOT NULL,
nro_asiento numeric(8,0) NOT NULL,
nro_secue_pase numeric(4,0) NOT NULL,
descripcion_pase character varying(150) NOT NULL,
monto_debe numeric(16,3),
monto_haber numeric(16,3),
estado character varying(1) NOT NULL,
cod_pcuenta character varying(15) NOT NULL,
cod_local numeric(2,0) NOT NULL,
cod_centrocosto numeric(4,0) NOT NULL,
cod_ejercicio numeric(4,0) NOT NULL,
nro_comprob character varying(15),
conciliado_por character varying(10),
CONSTRAINT fk_diario_det_cab FOREIGN KEY (cod_empresa, cod_local, cod_ejercicio, nro_asiento)
REFERENCES diario_cab (cod_empresa, cod_local, cod_ejercicio, nro_asiento) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_diario_det_pc FOREIGN KEY (cod_empresa, cod_pcuenta)
REFERENCES plan_cuenta (cod_empresa, cod_pcuenta) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=TRUE
);
ALTER TABLE diario_det
OWNER TO postgres;
-- Index: pk_diario_det_ax
-- DROP INDEX pk_diario_det_ax;
CREATE INDEX pk_diario_det_ax
ON diario_det
USING btree
(cod_pcuenta COLLATE pg_catalog."default", cod_local, estado COLLATE pg_catalog."default");
Very roughly, the size of one row is 231 bytes; times 300000 rows, that's 69,300,000 bytes (~69 MB) that has to be transferred from server to client.
I think 41 seconds is a bit long, but the query is still bound to be slow because of the amount of data that has to be loaded from disk and transferred.
You can optimise the query by:
selecting just the columns you are going to use, not all of them (if you need just cod_empresa, that would reduce the total amount of transferred data to ~1.2 MB, but the server would still have to iterate through all records, which is slow);
filtering only the rows you are actually going to use - a WHERE clause on indexed columns can really speed the query up.
If you want to know what is happening in your query, play around with EXPLAIN and EXPLAIN ANALYZE.
Also, if you're running a dedicated database server, be sure to configure it properly to use a lot of system resources.
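A minimal sketch of both suggestions combined (the filter value is hypothetical; cod_pcuenta is chosen because it is the leading column of your pk_diario_det_ax index):
EXPLAIN ANALYZE
SELECT cod_empresa, nro_asiento, monto_debe, monto_haber
FROM diario_det
WHERE cod_pcuenta = '1101'
AND cod_local = 1;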

MySQL 1 million row query speed

I'm having trouble getting a decent query time out of a large MySQL table; currently it's taking over 20 seconds. The problem lies in the GROUP BY, as MySQL needs to run a filesort, but I don't see how I can get around this.
QUERY:
SELECT play_date, COUNT(DISTINCT(email)) AS count
FROM log
WHERE type = 'play'
AND play_date BETWEEN '2009-02-23'
AND '2009-02-24'
GROUP BY play_date
ORDER BY play_date desc
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL type,type_2 NULL NULL NULL 530892 Using where; Using filesort
TABLE STRUCTURE
CREATE TABLE IF NOT EXISTS `log` (
`id` int(11) NOT NULL auto_increment,
`email` varchar(255) NOT NULL,
`type` enum('played','reg','friend') NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
`play_date` date NOT NULL,
`email_refer` varchar(255) NOT NULL,
`remote_addr` varchar(15) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `type` (`type`),
KEY `email_refer` (`email_refer`),
KEY `type_2` (`type`,`timestamp`,`play_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=707859 ;
If anyone knows how I could improve the speed, I would be very grateful.
Tom
EDIT
I've added a new index with just play_date and type, but MySQL refuses to use it:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL play_date NULL NULL NULL 801647 Using where; Using filesort
This index was created using ALTER TABLE log ADD INDEX (type, play_date);
You need to create an index on the fields type AND play_date.
Like this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`);
Or, alternatively, you can rearrange your last key like this:
KEY `type_2` (`type`,`play_date`,`timestamp`)
so MySQL can use its left part as a key.
You should add an index on the fields that you base your search on.
In your case those are play_date and type.
You're not taking advantage of the key named type_2. It is a composite key for type, timestamp and play_date, but you're filtering by type and play_date, ignoring timestamp. Because of this, the engine can't make use of that key.
You should create an index on the fields type and play_date, or remove timestamp from the key type_2.
Or you could try to incorporate timestamp into your current query as a filter. But judging from your current query I don't think that is logical.
There either needs to be an index on play_date, or play_date should be moved to second place in the composite index.
The fastest option would be this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`, `email`);
It would turn this index into a "covering index", which would mean that the query only accesses the index (held in memory) and doesn't even go to the hard disk.
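A sketch of that suggestion with an explicit (made-up) index name, and the check to run afterwards; the exact EXPLAIN output will vary:
ALTER TABLE `log` ADD INDEX `type_3` (`type`, `play_date`, `email`);
EXPLAIN SELECT play_date, COUNT(DISTINCT email) AS count
FROM log
WHERE type = 'play'
AND play_date BETWEEN '2009-02-23' AND '2009-02-24'
GROUP BY play_date
ORDER BY play_date DESC;
-- the Extra column should now mention "Using index" (covering) instead of a full scan with filesort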
The DESC parameter is causing MySQL not to use the index for the ORDER BY. You can leave it ASC and iterate the result set in reverse on the client side (?).