Primary Index not being used - optimization

I have a Greenplum cluster with the following specifications:
Master: 16 vCPUs, 32 GB RAM, 27 GB swap
4 segment hosts: 16 vCPUs, 62 GB RAM, 27 GB swap each
Earlier I had two segments and performance was outstanding for my use cases, but ever since I expanded the cluster to four segment hosts I cannot get the queries to use the indexes.
Queries that executed within 10 ms (with an index hit) now take 2-5 seconds on a sequential scan.
I have attached my schema and a sample EXPLAIN ANALYZE output (the relevant table has 48,260,809 rows).
Schema:
\c dmiprod
Create Table dmiprod_schema."package"
(
identity varchar(4096) not null,
"identityHash" bytea not null,
"packageDate" date not null,
ctime timestamp not null,
customer varchar(32) not null
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.package ADD CONSTRAINT "package_pkey" PRIMARY KEY ("identityHash");
create index idx_package_ctime on dmiprod_schema.package ("ctime");
create index idx_package_packageDate on dmiprod_schema.package ("packageDate");
create index idx_package_customer on dmiprod_schema.package ("customer");
CREATE TABLE dmiprod_schema."tags"
(
"identityHash" bytea not null,
tag varchar(32) not null,
UNIQUE ("identityHash",tag)
) distributed by ("identityHash");
create index "idx_tags_identityHash" on dmiprod_schema.tags ("identityHash");
create index idx_tags_tag on dmiprod_schema.tags ("tag");
CREATE TABLE dmiprod_schema."features"
(
"identityHash" bytea not null,
ctime timestamp not null,
utime timestamp not null,
phash varchar(64) ,
ahash varchar(64),
chash varchar(78),
iimages JSON ,
lcert JSON ,
slogos JSON
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.features ADD CONSTRAINT "features_pkey" PRIMARY KEY ("identityHash");
create index idx_features_phash on dmiprod_schema.features ("phash");
CREATE TABLE dmiprod_schema."raw"
(
"identityHash" bytea not null,
ctime timestamp not null,
utime timestamp not null,
ourl TEXT,
lurl TEXT,
"pageText" TEXT,
"ocrText" TEXT,
html TEXT,
meta JSON
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.raw ADD CONSTRAINT "raw_pkey" PRIMARY KEY ("identityHash");
CREATE TABLE dmiprod_schema.packageLock
(
"identityHash" bytea not null,
secret bytea not null,
ctime timestamp not null,
UNIQUE ("identityHash")
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.packageLock ADD CONSTRAINT "packageLock_pkey" PRIMARY KEY ("identityHash");
create index idx_packageLock_secret on dmiprod_schema.packageLock ("secret");

Recreating the table and the indexes, inserting 20 million random bytea values of lengths 32, 48, 64, 96 and 128 into identityHash, and then performing the same SELECT results in package_pkey being used and the query finishing in under 20 ms.
Aside from index usage, the other difference in my test is the use of the GPORCA optimizer. I suggest you SET optimizer = on; and try again. If that does not work, post your GPDB/Greenplum version and your session settings, including optimizer, enable_indexscan, and any other relevant settings.
I tested on VMs on a single physical host, with Tanzu Greenplum 6.17.1 and 4 segment hosts.
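A minimal sketch of that check, run in the same session as the slow query (the SELECT is only a placeholder for your actual query):
-- enable GPORCA for this session and re-check the plan
SET optimizer = on;
EXPLAIN ANALYZE SELECT ... ;
-- values worth posting back
SELECT version();
SHOW optimizer;
SHOW enable_indexscan;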

Just to confirm: when you expanded the system (assuming you used gpexpand), did you run the redistribution phase (gpexpand is a two-step process)? And when that completed, did you run analyzedb so that statistics were updated for the new table/index segments?
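If you want to verify from SQL, here is a hedged sketch using one of your tables: it shows whether rows are actually spread across the new segments and what row count the planner currently believes.
SELECT gp_segment_id, count(*) FROM dmiprod_schema.package GROUP BY 1;
SELECT relname, reltuples, relpages FROM pg_class WHERE relname = 'package';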

Related

Large SQL Request optimization for Faces Euclidean Distances calculations

I am calculating Euclidean distance between faces and want to store results in a table.
Current setup :
Each face is stored in Objects table and Distances between faces is stored in Faces_distances table.
The object table has the following columns objects_id, face_encodings, description
The faces_distances table has the following columns face_from, face_to, distance
In my data set I have around 22,231 face objects, which results in 494,217,361 pairs of faces, although I understand it could be divided by 2 because
distance(face_from, face_to) = distance(face_to, face_from)
The database is Postgres 12.
The query below inserts the pairs of faces that have not been calculated yet (without performing the distance calculation), but the execution time is extremely long (it started 4 days ago and is still not done). Is there a way to optimize it?
-- public.objects definition
-- Drop table
-- DROP TABLE public.objects;
CREATE TABLE public.objects
(
objects_id int4 NOT NULL DEFAULT
nextval('objects_in_image_objects_id_seq'::regclass),
filefullname varchar(2303) NULL,
bbox varchar(255) NULL,
description varchar(255) NULL,
confidence numeric NULL,
analyzer varchar(255) NOT NULL DEFAULT 'object_detector'::character
varying,
analyzer_version int4 NOT NULL DEFAULT 100,
x int4 NULL,
y int4 NULL,
w int4 NULL,
h int4 NULL,
image_id int4 NULL,
derived_from_object int4 NULL,
object_image_filename varchar(2023) NULL,
face_encodings _float8 NULL,
face_id int4 NULL,
face_id_iteration int4 NULL,
text_found varchar NULL COLLATE "C.UTF-8",
CONSTRAINT objects_in_image_pkey PRIMARY KEY (objects_id),
CONSTRAINT objects_in_images FOREIGN KEY (objects_id) REFERENCES
public.objects(objects_id)
);
CREATE TABLE public.face_distances
(
face_from int8 NOT NULL,
face_to int8 NOT NULL,
distance float8 NULL,
CONSTRAINT face_distances_pk PRIMARY KEY (face_from, face_to)
);
-- public.face_distances foreign keys
ALTER TABLE public.face_distances ADD CONSTRAINT face_distances_fk
FOREIGN KEY (face_from) REFERENCES public.objects(objects_id);
ALTER TABLE public.face_distances ADD CONSTRAINT face_distances_fk_1
FOREIGN KEY (face_to) REFERENCES public.objects(objects_id);
Indexes:
CREATE UNIQUE INDEX objects_in_image_pkey ON public.objects USING btree (objects_id);
CREATE INDEX objects_description_column ON public.objects USING btree (description);
CREATE UNIQUE INDEX face_distances_pk ON public.face_distances USING btree (face_from, face_to);
Query to add all pairs of faces that are not already in the table:
insert into face_distances (face_from,face_to)
select t1.face_from , t1.face_to
from (
select f_from.objects_id face_from,
f_from.face_encodings face_from_encodings,
f_to.objects_id face_to,
f_to.face_encodings face_to_encodings
from objects f_from,
objects f_to
where f_from.description = 'face'
and f_to.description = 'face' ) as t1
left join face_distances on (
t1.face_from= face_distances.face_from
and t1.face_to = face_distances.face_to )
where face_distances.face_from is null;
Try this simplified query.
It took only 5 minutes on my Apple M1 (SQL Server) with 22,231 'face' objects and generated 247,097,565 pairs, which is exactly C(22231,2). The syntax is compatible with PostgreSQL.
Optimizations: explicit JOIN syntax instead of the old comma-join style, and ranking functions to remove duplicate permutations, since (A,B) = (B,A).
I also removed the last LEFT JOIN face_distances: recomputing into an empty table is a lot faster than checking for existence, because that check triggers an index key lookup for every pair.
insert into face_distances (face_from,face_to)
select f1,f2
from(
select --only needed fields here as this will fill temporary tables
f1.objects_id f1
,f2.objects_id f2
,dense_rank()over(order by f1.objects_id) rank1
,rank()over(partition by f2.objects_id order by f1.objects_id) rank2
from objects f1
-- generates all permutations
join objects f2 on f2.objects_id <> f1.objects_id and f2.description = 'face'
where f1.description = 'face'
)a
where rank2 >= rank1 --removes duplicate permutations
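A hedged alternative sketch of the same idea, ordering each pair by objects_id instead of using window functions (assumes it is acceptable that face_from always holds the smaller id):
insert into face_distances (face_from, face_to)
select f1.objects_id, f2.objects_id
from objects f1
join objects f2
  on f2.objects_id > f1.objects_id
 and f2.description = 'face'
where f1.description = 'face';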

Very slow delete/update in MariaDb ColumnStore Server using primary key

We have a table using the InnoDB engine (on a MariaDB ColumnStore server) that has 5 million rows and a primary key of 2 columns. DDL:
CREATE TABLE `interest` (
`src_id` TINYINT NOT NULL,
`id` binary(36) NOT NULL,
`account_id` bigint(20) NOT NULL,
`instrument_id` int(11) NOT NULL,
`quantity` decimal(15,4) NOT NULL,
`time` datetime NOT NULL,
`interest` decimal(16,2) NOT NULL,
`auto_incr` bigint(20) NOT NULL ,
PRIMARY KEY (`src_id`, `id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Both of these queries take 4-5 seconds to execute:
DELETE FROM interest WHERE `id`='52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id`='1';
UPDATE interest SET interest = 1 WHERE `id`='52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id`='1';
The result of the EXPLAIN is:
EXPLAIN EXTENDED DELETE FROM interest WHERE `id`='52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id`='1';
It looks like the primary key is not being used, because the query scans more than 50% of the rows.
I exported the data from this table (on the MariaDB ColumnStore server) and imported it into a table on a normal MariaDB server, and there is no such problem: both queries execute in under 10 ms.
We are using the latest version of MariaDB ColumnStore: 1.2.5.
ColumnStore uses a slightly customized version of MariaDB, and there is a place where the custom code alters optimizer settings, but that only applies to queries involving ColumnStore tables. FYI, Enterprise 10.4 and community 10.5 will ship ColumnStore as a normal engine on the non-customized MariaDB codebase.
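A hedged diagnostic sketch you could run on the ColumnStore server, reusing the literal values from the question, to compare plans and refresh statistics:
-- does the same predicate use the primary key when expressed as a SELECT?
EXPLAIN SELECT * FROM interest
WHERE `id` = '52f35ddb-94d0-4744-bb3c-f3ae8100dc8e' AND `src_id` = 1;
-- refresh index statistics on the InnoDB table
ANALYZE TABLE interest;
-- check whether any optimizer settings differ from a stock MariaDB server
SHOW VARIABLES LIKE 'optimizer_switch';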

One to many to many relationship, with composite keys

In my game, an archetype is a collection of associated traits, an attack type, a damage type, and a resource type. Each piece of data is unique to each
archetype. For example, the Mage archetype might look like the following:
archetype: Mage
attack: Targeted Area Effect
damage: Shock
resource: Mana
trait_defense: Willpower
trait_offense: Intelligence
This is the archetype table in SQLite syntax:
create table archetype
(
archetype_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_defense_id varchar(16) not null,
trait_offense_id varchar(16) not null,
archetype_description varchar(128),
constraint pk_archetype primary key (archetype_id),
constraint uk_archetype unique (attack_id, damage_id,
resource_type_id,
trait_defense_id,
trait_offense_id)
);
The primary key should be the complete composite, but I do not want to pass
all the data around to other tables unless necessary. For example, there are
crafting skills associated with each archetype which do not need to know any
other archetype data.
An effect is a combat outcome that can be applied to a friend or foe. An effect has an application type (instant, overtime), a type (buff, debuff, harm, heal, etc.) and a detail describing to which stat the effect applies. It also has most of the archetype data to make each effect unique. Also included is the associated trait used for progress and skill checks. For example, an effect might look like:
apply: Instant
type: Harm
detail: Health
archetype: Mage
attack_id: Targeted Area Effect
damage_id: Shock
resource: Mana
trait_id: Intelligence
This is the effect table in SQLite syntax:
create table effect
(
effect_apply_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
archetype_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint pk_effect primary key(archetype_id, effect_type_id,
effect_detail_id, effect_apply_id,
attack_id, damage_id, resource_type_id),
constraint fk_effect_archetype_id foreign key(archetype_id, attack_id,
damage_id, resource_type_id)
references archetype (archetype_id, attack_id,
damage_id, resource_type_id)
);
An ability is a container that can hold multiple effects. There is no limit to
the kinds of effects it can hold, e.g. having both Mage and Warrior effects in
the same ability, or even having two of the same effects, is fine. Each effect
in the ability is going to have the archetype data, and the effect data.
Again.
Ability tables in SQLite syntax:
create table ability
(
ability_id varchar(64),
ability_description varchar(128),
constraint pk_ability primary key (ability_id)
);
create table ability_effect
(
ability_effect_id integer primary key autoincrement,
ability_id varchar(64) not null,
archetype_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
effect_apply_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint fk_ability_effect_ability_id foreign key (ability_id)
references ability (ability_id),
constraint fk_ability_effect_effect_id foreign key (archetype_id,
effect_type_id,
effect_detail_id,
effect_apply_id)
references effect (archetype_id,
effect_type_id,
effect_detail_id,
effect_apply_id)
);
This is basically a one to many to many relationship, so I needed a technical
key to have duplicate effects in the ability_effect table.
Questions:
1) Is there a better way to design these tables to avoid the duplication of
data over these three tables?
2) Should these tables be broken down further?
3) Is it better to perform multiple table lookups to collect all the data? For example, just passing around the archetype_id and doing lookups for the data when necessary (which will be often).
UPDATE:
I actually do have parent tables for attacks, damage, etc. I removed those
tables and their related indexes from the sample to make the question clean,
concise, and focused on my duplicate data issue.
I was trying to avoid each table having both an id and a name, as both would be candidate keys, so having both would be wasted space. I was trying to keep the SQLite database as small as possible (hence the many "varchar(16)" declarations, which I now know SQLite ignores). It seems that in SQLite having both values is unavoidable, unless being roughly twice as slow is acceptable with the WITHOUT ROWID option at table creation. So, I will rewrite my database to use ids and names via the rowid implementation.
Thanks for your input guys!
1) Is there a better way to design these tables to avoid the
duplication of data over these three tables?
and also
2) Should these tables be broken down further?
It would appear so.
It would appear that Mage is a unique archetype, as is Warrior (based on your example: "the Mage archetype might look like the following").
As such, why not make archtype_id the primary key and reference the attack type, damage type, etc. from tables of their own, i.e. have an attack table and a damage table?
So you could, for example, have something like this (simplified for demonstration):
DROP TABLE IF EXISTS archtype;
DROP TABLE IF EXISTS attack;
DROP TABLE IF EXISTS damage;
CREATE TABLE IF NOT EXISTS attack (attack_id INTEGER PRIMARY KEY, attack_name TEXT, a_more_columns TEXT);
INSERT INTO attack (attack_name, a_more_columns) VALUES
('Targetted Affect','ta blah'), -- id 1
('AOE','aoe blah'), -- id 2
('Bounce Effect','bounce blah') -- id 3
;
CREATE TABLE IF NOT EXISTS damage (damage_id INTEGER PRIMARY KEY, damage_name TEXT, d_more_columns TEXT);
INSERT INTO damage (damage_name,d_more_columns) VALUES
('Shock','shock blah'), -- id 1
('Freeze','freeze blah'), -- id 2
('Fire','fire blah'), -- id 3
('Hit','hit blah')
;
CREATE TABLE IF NOT EXISTS archtype (archtype_id INTEGER PRIMARY KEY, archtype_name TEXT, attack_id_ref INTEGER, damage_id_ref INTEGER, at_more_columns TEXT);
INSERT INTO archtype (archtype_name,attack_id_ref,damage_id_ref,at_more_columns) VALUES
('Mage',1,1,'Mage blah'),
('Warrior',3,4,'Warrior Blah'),
('Dragon',2,3,'Dragon blah'),
('Iceman',2,2,'Iceman blah')
;
SELECT archtype_name, damage_name, attack_name FROM archtype JOIN damage ON damage_id_ref = damage_id JOIN attack ON attack_id_ref = attack_id;
Note that aliases of the rowid have been used for the ids rather than names, as these are generally the most efficient.
The data for rowid tables is stored as a B-Tree structure containing one entry for each table row, using the rowid value as the key. This means that retrieving or sorting records by rowid is fast. Searching for a record with a specific rowid, or for all records with rowids within a specified range, is around twice as fast as a similar search made by specifying any other PRIMARY KEY or indexed value. (SQL As Understood By SQLite - CREATE TABLE - ROWIDs and the INTEGER PRIMARY KEY)
A rowid is generated for every row (unless WITHOUT ROWID is specified), and a column declared INTEGER PRIMARY KEY is an alias of the rowid.
Beware of AUTOINCREMENT: unlike other RDBMSs that use it to automatically generate unique row ids, SQLite creates a unique id (the rowid) by default. The AUTOINCREMENT keyword adds a constraint that ensures the generated id is larger than any that has ever existed; to do this it requires an additional table, sqlite_sequence, that has to be maintained and interrogated, and as such it has overheads. The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed. (SQLite Autoincrement)
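A minimal sketch of the rowid alias, using a hypothetical demo table: the INTEGER PRIMARY KEY column is the rowid, so it auto-populates without AUTOINCREMENT.
CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO demo (name) VALUES ('a'), ('b');
SELECT id, rowid FROM demo; -- both columns return the same values (1 and 2)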
The query at the end lists each archetype alongside its damage and attack names.
Now say you wanted archetypes to have multiple attacks and damages each; the above can easily be adapted by using many-to-many relationships, introducing reference/mapping/link tables (all just different names for the same thing). Such a table has two columns (sometimes more, for data specific to the individual link): one referencing the parent (archtype) and the other referencing the child (attack/damage).
e.g. the following could be added :-
DROP TABLE IF EXISTS archtype_attack_reference;
CREATE TABLE IF NOT EXISTS archtype_attack_reference
(aar_archtype_id INTEGER NOT NULL, aar_attack_id INTEGER NOT NULL,
PRIMARY KEY(aar_archtype_id,aar_attack_id))
WITHOUT ROWID;
DROP TABLE IF EXISTS archtype_damage_reference;
CREATE TABLE IF NOT EXISTS archtype_damage_reference
(adr_archtype_id INTEGER NOT NULL, adr_damage_id INTEGER NOT NULL,
PRIMARY KEY(adr_archtype_id,adr_damage_id))
WITHOUT ROWID
;
INSERT INTO archtype_attack_reference VALUES
(1,1), -- Mage has attack Targetted
(1,3), -- Mage has attack Bounce
(3,2), -- Dragon has attack AOE
(2,1), -- Warrior has attack targetted
(2,2), -- Warrior has attack AOE
(4,2), -- Iceman has attack AOE
(4,3) -- Iceman has attack Bounce
;
INSERT INTO archtype_damage_reference VALUES
(1,1),(1,3), -- Mage can damage with Shock and Fire
(2,4), -- Warrior can damage with Hit
(3,3),(3,4), -- Dragon can damage with Fire and Hit
(4,2),(4,4) -- Iceman can damage with Freeze and Hit
;
SELECT archtype_name, attack_name,damage_name FROM archtype
JOIN archtype_attack_reference ON archtype_id = aar_archtype_id
JOIN archtype_damage_reference ON archtype_id = adr_archtype_id
JOIN attack ON aar_attack_id = attack_id
JOIN damage ON adr_damage_id = damage_id
;
The query lists each archetype/attack/damage combination.
With a slight change the above query could even be used to perform a random attack e.g. :-
SELECT archtype_name, attack_name,damage_name FROM archtype
JOIN archtype_attack_reference ON archtype_id = aar_archtype_id
JOIN archtype_damage_reference ON archtype_id = adr_archtype_id
JOIN attack ON aar_attack_id = attack_id
JOIN damage ON adr_damage_id = damage_id
ORDER BY random() LIMIT 1 -- ADDED THIS LINE
;
You could get one combination on one run and a different one the next, since a single random row is returned each time.
3) Is it better to perform multiple table lookups to collect all the
data? For example, just passing around the archetype_id and doing
lookups for the data when necessary (which will be often).
That's pretty hard to say. You may initially think to gather all the data once and keep it in memory, say as an object. However, at times the underlying data may well already be in memory because it is cached, so perhaps a mix of the two approaches is better. I believe the answer is that you will need to test the various scenarios.
I would probably avoid those composite primary keys and use the more common integer with autoincrement, then add unique or non-unique composite indexes where needed.
Although, in my opinion, it's not always a bad idea to use a short CHAR or VARCHAR as the primary key in some cases, mostly when easy-to-understand abbreviations can be used.
An example: suppose you have a reference table for countries, with the primary key on the 2-character CountryCode. When querying a table with a foreign key on that CountryCode, it's much easier for the human mind to understand 'US' than some integer; even without joining to Countries you'll probably know which country is referenced.
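A small illustration of that idea (hypothetical tables, just to show the shape):
create table country
(
country_code char(2) not null,
country_name varchar(64) not null,
constraint pk_country primary key (country_code)
);
create table player
(
player_id integer primary key autoincrement,
player_name varchar(64) not null,
country_code char(2) not null references country (country_code)
);
-- 'US' in player.country_code is readable on its own, without joining to country.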
So here are your tables with a slightly different layout.
create table archetype
(
archetype_id integer primary key autoincrement,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_defense_id varchar(16) not null,
trait_offense_id varchar(16) not null,
archetype_description varchar(128),
constraint uk_archetype unique (attack_id, damage_id,
resource_type_id,
trait_defense_id,
trait_offense_id)
);
create table effect
(
effect_id integer primary key autoincrement,
archetype_id integer not null, -- FK archetype
effect_apply_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint uk_effect unique(archetype_id, effect_type_id,
effect_detail_id, effect_apply_id,
attack_id, damage_id, resource_type_id),
constraint fk_effect_archetype_id foreign key(archetype_id)
references archetype (archetype_id)
);
create table ability
(
ability_id integer primary key autoincrement,
ability_description varchar(128)
);
create table ability_effect
(
ability_effect_id integer primary key autoincrement,
ability_id integer not null, -- FK ability
effect_id integer not null, -- FK effect
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint fk_ability_effect_ability_id foreign key (ability_id)
references ability (ability_id),
constraint fk_ability_effect_effect_id foreign key (effect_id)
references effect (effect_id)
);

I have a GUID Clustered primary key - Is there a way I can optimize or unfragment a table that might be fragmented?

Here's the code I have. The table actually has 20 more columns but I am just showing the first few:
CREATE TABLE [dbo].[Phrase]
(
[PhraseId] [uniqueidentifier] NOT NULL,
[PhraseNum] [int] NULL,
[English] [nvarchar](250) NOT NULL,
PRIMARY KEY CLUSTERED ([PhraseId] ASC)
) ON [PRIMARY]
GO
From what I remember reading (Fragmentation and GUID clustered key), it was supposed to be fine to have a GUID for the primary key, but now it's been suggested that it's not a good idea, because data has to be re-ordered for each insert, causing fragmentation.
Can anyone comment on this? Now that my table has already been created, is there a way to defragment it? Also, how can I stop this problem from getting worse? Can I modify the existing table to add NEWSEQUENTIALID?
That's true: NEWSEQUENTIALID helps to fill the data and index pages completely.
But the uniqueidentifier data size is 4 times that of int, so roughly 4 times as many pages are required for the key compared to int.
create table #t (col int,
col2 uniqueidentifier DEFAULT NEWSEQUENTIALID());
insert into #t (col) values (1),(2);
select DATALENGTH(col2), DATALENGTH(col) from #t;
Suppose x data pages are required in the int case to hold 100 key values; in the NEWSEQUENTIALID case roughly 4x data pages will be required to hold the same 100 values.
Therefore the query will read more pages to fetch the same number of records.
So, if you can alter the table, you can add an int identity column and make it the primary key and clustered index (PK+CI). You can then drop the [uniqueidentifier] column or keep it, as your requirements dictate.
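A hedged sketch of that suggestion (the new column and constraint names are illustrative; the existing primary key's system-generated name has to be looked up and dropped first):
ALTER TABLE dbo.Phrase ADD PhraseIntId INT IDENTITY(1,1) NOT NULL;
-- ALTER TABLE dbo.Phrase DROP CONSTRAINT <existing PK name>;
ALTER TABLE dbo.Phrase ADD CONSTRAINT PK_Phrase_PhraseIntId PRIMARY KEY CLUSTERED (PhraseIntId);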
Looks like this is a duplicate of:
INT vs Unique-Identifier for ID field in database
But here's a rehash for your issue:
Rather than a GUID, and depending on your table depth, int or bigint would be better choices, both from storage and optimization vantages. You might also consider defining the field as int identity not null to further help population.
GUIDs have a considerable storage impact, due to their length.
CREATE TABLE [dbo].[Phrase]
(
[PhraseId] [int] identity NOT NULL
CONSTRAINT [PK_Phrase_PhraseId] PRIMARY KEY,
[PhraseNum] [int] NULL,
[English] [nvarchar](250) NOT NULL,
....
) ON [PRIMARY]
GO

Postgresql Query is very Slow

I have a table with 300,000 rows, and when I run a simple query like
select * from diario_det;
it takes 41041 ms to return the rows. Is that acceptable? How can I optimize the query?
I use PostgreSQL 9.3 on CentOS 7.
Here is my table:
CREATE TABLE diario_det
(
cod_empresa numeric(2,0) NOT NULL,
nro_asiento numeric(8,0) NOT NULL,
nro_secue_pase numeric(4,0) NOT NULL,
descripcion_pase character varying(150) NOT NULL,
monto_debe numeric(16,3),
monto_haber numeric(16,3),
estado character varying(1) NOT NULL,
cod_pcuenta character varying(15) NOT NULL,
cod_local numeric(2,0) NOT NULL,
cod_centrocosto numeric(4,0) NOT NULL,
cod_ejercicio numeric(4,0) NOT NULL,
nro_comprob character varying(15),
conciliado_por character varying(10),
CONSTRAINT fk_diario_det_cab FOREIGN KEY (cod_empresa, cod_local, cod_ejercicio, nro_asiento)
REFERENCES diario_cab (cod_empresa, cod_local, cod_ejercicio, nro_asiento) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_diario_det_pc FOREIGN KEY (cod_empresa, cod_pcuenta)
REFERENCES plan_cuenta (cod_empresa, cod_pcuenta) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=TRUE
);
ALTER TABLE diario_det
OWNER TO postgres;
-- Index: pk_diario_det_ax
-- DROP INDEX pk_diario_det_ax;
CREATE INDEX pk_diario_det_ax
ON diario_det
USING btree
(cod_pcuenta COLLATE pg_catalog."default", cod_local, estado COLLATE pg_catalog."default");
Very roughly, the size of one row is 231 bytes, times 300,000 rows... that is 69,300,000 bytes (~69 MB) that has to be transferred from server to client.
I think 41 seconds is a bit long, but the query still has to be slow because of the amount of data that has to be loaded from disk and transferred.
You can optimise the query by:
- selecting just the columns you are going to use rather than all of them (if you only need cod_empresa this would reduce the total amount of transferred data to ~1.2 MB, but the server would still have to iterate through all the records, which is slow);
- filtering to only the rows you are going to use - a WHERE clause on indexed columns can really speed the query up.
If you want to know what is happening in your query, play around with EXPLAIN and EXPLAIN EXECUTE.
Also, if you're running a dedicated database server, be sure to configure it properly to use a lot of system resources.
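A hedged sketch combining both suggestions, using the columns covered by your existing index pk_diario_det_ax (the literal values are placeholders):
EXPLAIN (ANALYZE, BUFFERS)
SELECT cod_empresa, nro_asiento, monto_debe, monto_haber
FROM diario_det
WHERE cod_pcuenta = '1101'
  AND cod_local = 1
  AND estado = 'A';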