How can I speed up the following SQL query? [duplicate]

Possible Duplicate:
How can I speed up the following update query?
I would like to run the following query in an acceptable time (e.g. max 15 minutes) on a modern desktop computer running PostgreSQL 8.4:
UPDATE cap_options_rule_items
SET cap_option_id = O.id
FROM cap_options_rule_items I
JOIN cap_options_rules R
ON R.id = I.cap_options_rule_id
JOIN cap_options O
ON R.cap_engine_id = O.cap_engine_id
AND O.code = I.cap_option_code;
I would like to know if there are obvious mistakes I'm making in the query or with the choice of indexes.
The tables in the query have the following numbers of records:
cap_options_rule_items: 2208705
cap_options_rules: 430268
cap_options: 1628188
And the following schema (including indexes):
-- Table: cap_options_rule_items
CREATE TABLE cap_options_rule_items
(
id serial NOT NULL,
cap_options_rule_id integer,
cap_option_code integer,
"primary" boolean,
cap_option_id integer,
CONSTRAINT cap_options_rule_items_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
-- Index: index_cap_options_rule_items_on_cap_option_id
CREATE INDEX index_cap_options_rule_items_on_cap_option_id
ON cap_options_rule_items
USING btree (cap_option_code);
-- Index: index_cap_options_rule_items_on_cap_option_rule_id
CREATE INDEX index_cap_options_rule_items_on_cap_option_rule_id
ON cap_options_rule_items
USING btree (cap_options_rule_id);
-- Table: cap_options_rules
CREATE TABLE cap_options_rules
(
id serial NOT NULL,
rule_type character varying(255),
cap_engine_id integer,
CONSTRAINT cap_options_rules_pkey PRIMARY KEY (id)
) WITH ( OIDS=FALSE
);
-- Index: index_cap_options_rules_on_cap_engine_id
CREATE INDEX index_cap_options_rules_on_cap_engine_id
ON cap_options_rules
USING btree (cap_engine_id);
-- Table: cap_options
CREATE TABLE cap_options
( id serial NOT NULL,
description character varying(255),
cap_engine_id integer,
cap_option_category_id integer,
basic_price numeric,
vat numeric,
default_option boolean,
created_at timestamp without time zone,
updated_at timestamp without time zone,
code integer,
CONSTRAINT cap_options_pkey PRIMARY KEY (id)
) WITH ( OIDS=FALSE
);
-- Index: index_code_and_cap_engine_id_on_cap_options
CREATE INDEX index_code_and_cap_engine_id_on_cap_options
ON cap_options
USING btree (code, cap_engine_id);
Thank you!

Your query is slow because you repeated the target table in the FROM clause. PostgreSQL treats that as an independent self-join: nothing ties the copy aliased I back to the rows being updated, so every row of cap_options_rule_items is matched against the entire join result, and the O.id that gets assigned is effectively arbitrary.
I think you really want something like this:
UPDATE cap_options_rule_items I
SET cap_option_id = O.id
FROM cap_options_rules R
JOIN cap_options O ON R.cap_engine_id = O.cap_engine_id
WHERE I.cap_options_rule_id = R.id
AND I.cap_option_code = O.code;
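To confirm what the planner does before running the statement for real, you can wrap it in a transaction and roll it back; EXPLAIN ANALYZE actually executes the statement, so the ROLLBACK matters (a quick sanity check, not a tuning recipe):
BEGIN;
EXPLAIN ANALYZE
UPDATE cap_options_rule_items I
SET cap_option_id = O.id
FROM cap_options_rules R
JOIN cap_options O ON R.cap_engine_id = O.cap_engine_id
WHERE I.cap_options_rule_id = R.id
AND I.cap_option_code = O.code;
ROLLBACK;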

Related

Large SQL Request optimization for Faces Euclidean Distances calculations

I am calculating the Euclidean distance between faces and want to store the results in a table.
Current setup:
Each face is stored in the objects table, and the distances between faces are stored in the face_distances table.
The objects table has the following columns: objects_id, face_encodings, description.
The face_distances table has the following columns: face_from, face_to, distance.
In my data set I have around 22,231 face objects, which results in 494,217,361 pairs of faces, although I understand this could be divided by 2, because distance(face_from, face_to) = distance(face_to, face_from).
The database is Postgres 12.
The query below inserts the pairs of faces (without performing the distance calculation) that have not been calculated yet, but the execution time is extremely long (it started 4 days ago and is still not done). Is there a way to optimize it?
-- public.objects definition
-- Drop table
-- DROP TABLE public.objects;
CREATE TABLE public.objects
(
objects_id int4 NOT NULL DEFAULT
nextval('objects_in_image_objects_id_seq'::regclass),
filefullname varchar(2303) NULL,
bbox varchar(255) NULL,
description varchar(255) NULL,
confidence numeric NULL,
analyzer varchar(255) NOT NULL DEFAULT 'object_detector'::character
varying,
analyzer_version int4 NOT NULL DEFAULT 100,
x int4 NULL,
y int4 NULL,
w int4 NULL,
h int4 NULL,
image_id int4 NULL,
derived_from_object int4 NULL,
object_image_filename varchar(2023) NULL,
face_encodings _float8 NULL,
face_id int4 NULL,
face_id_iteration int4 NULL,
text_found varchar NULL COLLATE "C.UTF-8",
CONSTRAINT objects_in_image_pkey PRIMARY KEY (objects_id),
CONSTRAINT objects_in_images FOREIGN KEY (objects_id) REFERENCES
public.objects(objects_id)
);
CREATE TABLE public.face_distances
(
face_from int8 NOT NULL,
face_to int8 NOT NULL,
distance float8 NULL,
CONSTRAINT face_distances_pk PRIMARY KEY (face_from, face_to)
);
-- public.face_distances foreign keys
ALTER TABLE public.face_distances ADD CONSTRAINT face_distances_fk
FOREIGN KEY (face_from) REFERENCES public.objects(objects_id);
ALTER TABLE public.face_distances ADD CONSTRAINT face_distances_fk_1
FOREIGN KEY (face_to) REFERENCES public.objects(objects_id);
-- Indexes
CREATE UNIQUE INDEX objects_in_image_pkey ON public.objects USING btree (objects_id);
CREATE INDEX objects_description_column ON public.objects USING btree (description);
CREATE UNIQUE INDEX face_distances_pk ON public.face_distances USING btree (face_from, face_to);
The query to add all pairs of faces that are not already in the table:
insert into face_distances (face_from,face_to)
select t1.face_from , t1.face_to
from (
select f_from.objects_id face_from,
f_from.face_encodings face_from_encodings,
f_to.objects_id face_to,
f_to.face_encodings face_to_encodings
from objects f_from,
objects f_to
where f_from.description = 'face'
and f_to.description = 'face' ) as t1
left join face_distances on (
t1.face_from= face_distances.face_from
and t1.face_to = face_distances.face_to )
where face_distances.face_from is null;
Try this simplified query.
It took only 5 minutes on my Apple M1 (SQL Server) with the 22231 'face' objects and generated 247,097,565 pairs, which is exactly C(22231, 2). The syntax is compatible with PostgreSQL.
Optimizations: explicit JOIN syntax instead of the old comma-separated join, and ranking functions to remove duplicate permutations ((A,B) = (B,A)).
I also removed the last LEFT JOIN against face_distances: recomputing into an empty table is a lot faster than checking for existence, because that check triggers an index key lookup for every pair.
insert into face_distances (face_from,face_to)
select f1,f2
from(
select --only needed fields here as this will fill temporary tables
f1.objects_id f1
,f2.objects_id f2
,dense_rank()over(order by f1.objects_id) rank1
,rank()over(partition by f2.objects_id order by f1.objects_id) rank2
from objects f1
-- generates all permutations
join objects f2 on f2.objects_id <> f1.objects_id and f2.description = 'face'
where f1.description = 'face'
)a
where rank2 >= rank1 --removes duplicate permutations
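For comparison, the same deduplication can be had without window functions: an inequality join keeps each unordered pair exactly once (a sketch, assuming objects_id values are unique):
insert into face_distances (face_from, face_to)
select f1.objects_id, f2.objects_id
from objects f1
join objects f2
on f2.objects_id > f1.objects_id -- keeps (A,B), drops the mirrored (B,A)
where f1.description = 'face'
and f2.description = 'face';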

How can I fix "operator does not exist: text = uuid" when using Haskell's postgres-simple library to perform a multi-row insert?

I am using the postgresql-simple library to insert into the eligible_class_passes table, which is essentially a join table representing a many-to-many relationship.
I am using the executeMany function from postgresql-simple to do a multi-row insert.
updateEligibleClassPasses :: Connection -> Text -> Text -> [Text] -> IO Int64
updateEligibleClassPasses conn tenantId classTypeId classPassTypeIds =
  withTransaction conn $ do
    executeMany
      conn
      [sql|
        INSERT INTO eligible_class_passes (class_type_id, class_pass_type_id)
        SELECT upd.class_type_id::uuid, upd.class_pass_type_id::uuid
        FROM (VALUES (?, ?, ?)) as upd(class_type_id, class_pass_type_id, tenant_id)
        INNER JOIN class_types AS ct
        ON upd.class_type_id::uuid = ct.id
        INNER JOIN subscription_types AS st
        ON class_pass_type_id::uuid = st.id
        WHERE ct.tenant_id = upd.tenant_id::uuid AND st.tenant_id = upd.tenant_id::uuid
      |]
      params
  where
    addParams classPassTypeId = (classTypeId, classPassTypeId, tenantId)
    params = addParams <$> classPassTypeIds
When this function is executed with the correct parameters applied, I get the following runtime error:
SqlError {sqlState = "42883", sqlExecStatus = FatalError, sqlErrorMsg = "operator does not exist: text = uuid", sqlErrorDetail = "", sqlErrorHint = "No operator matches the given name and argument type(s). You might need to add explicit type casts."}
However, when translated to SQL without the parameter substitutions (?), the query works correctly when executed in psql:
INSERT INTO eligible_class_passes (class_type_id, class_pass_type_id)
SELECT upd.class_type_id::uuid, upd.class_pass_type_id::uuid
FROM (VALUES ('863cb5ea-7a68-41d5-ab9f-5344605de500', 'e9195660-fd48-4fa2-9847-65a0ad323bd5', '597e6d7a-092a-49be-a2ea-11e8d85d8f82')) as upd(class_type_id, class_pass_type_id, tenant_id)
INNER JOIN class_types AS ct
ON upd.class_type_id::uuid = ct.id
INNER JOIN subscription_types AS st
ON class_pass_type_id::uuid = st.id
WHERE ct.tenant_id = upd.tenant_id::uuid AND st.tenant_id = upd.tenant_id::uuid;
My schema is as follows
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE tenants (
id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
name text NOT NULL UNIQUE,
email text NOT NULL UNIQUE,
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now()
);
CREATE TABLE class_types (
id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
tenant_id UUID NOT NULL,
FOREIGN KEY (tenant_id) REFERENCES tenants (id),
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now()
);
CREATE TABLE class_pass_types (
id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
name TEXT NOT NULL,
tenant_id UUID NOT NULL,
price Int NOT NULL,
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now(),
FOREIGN KEY (tenant_id) REFERENCES tenants (id)
);
-- Many to many join through table.
-- Expresses class pass type redeemability against class types.
CREATE TABLE eligible_class_passes (
class_type_id UUID,
class_pass_type_id UUID,
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now(),
FOREIGN KEY (class_type_id) REFERENCES class_types (id) ON DELETE CASCADE,
FOREIGN KEY (class_pass_type_id) REFERENCES class_pass_types (id) ON DELETE CASCADE,
PRIMARY KEY (
class_type_id, class_pass_type_id
)
);
To help debug your issue, use the formatQuery function; then you can see what final query postgresql-simple is sending to the server.
Also, I'd recommend using the UUID type from the uuid-types package instead of Text for the UUIDs. Using Text most likely hides some issues from you (which you'll hopefully see by using formatQuery).
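At the SQL level, one variant worth trying once you can see the generated query is to make the parameter types explicit inside the VALUES row itself, so nothing falls back to text (a sketch; whether executeMany's expansion of the VALUES pattern preserves these casts is exactly the kind of thing formatQuery will show you):
INSERT INTO eligible_class_passes (class_type_id, class_pass_type_id)
SELECT upd.class_type_id, upd.class_pass_type_id
FROM (VALUES (?::uuid, ?::uuid, ?::uuid)) as upd(class_type_id, class_pass_type_id, tenant_id)
INNER JOIN class_types AS ct
ON upd.class_type_id = ct.id
INNER JOIN subscription_types AS st
ON upd.class_pass_type_id = st.id
WHERE ct.tenant_id = upd.tenant_id AND st.tenant_id = upd.tenant_id;
With the casts in the VALUES list, the scattered ::uuid casts elsewhere in the query are no longer needed.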

ORA-00904: "NO_OF_PROJ_PER_CON_PY": invalid identifier

I am trying to create a fact table which will display the number of projects per consultant per year. It has two dimension tables, one for time (report_time_dim) and one for consultants (consultant_dim), plus the main fact table (fact_table).
CREATE TABLE fact_table(
fact_key INTEGER NOT NULL,
consultant_key INTEGER NOT NULL,
time_key INTEGER NOT NULL,
no_of_projects_py INTEGER,
no_of_consultants_py INTEGER,
no_of_accounts_py INTEGER,
no_of_proj_per_con_py INTEGER,
fk1_time_key INTEGER NOT NULL,
fk2_consultant_key INTEGER NOT NULL,
-- Specify the PRIMARY KEY constraint for table "fact_table".
-- This indicates which attribute(s) uniquely identify each row of data.
CONSTRAINT pk_fact_table PRIMARY KEY (consultant_key,time_key)
);
CREATE TABLE report_time_dim(
time_key INTEGER NOT NULL,
year INTEGER,
-- Specify the PRIMARY KEY constraint for table "time_dim".
-- This indicates which attribute(s) uniquely identify each row of data.
CONSTRAINT pk_report_time_dim PRIMARY KEY (time_key)
);
CREATE TABLE consultant_dim(
consultant_key INTEGER NOT NULL,
project_id INTEGER,
consultant_id INTEGER,
-- Specify the PRIMARY KEY constraint for table "consultant_dim".
-- This indicates which attribute(s) uniquely identify each row of data.
CONSTRAINT pk_consultant_dim PRIMARY KEY (consultant_key)
);
Each table has its own surrogate key, and I have managed to populate the time and consultant tables successfully; however, the issue I'm having is with the fact table. When I try to populate it I get the error ORA-00904: "NO_OF_PROJ_PER_CON_PY": invalid identifier. I am unsure how to go about fixing this and populating the fact table so it will display the information I want. Any help would be appreciated.
--populate fact_table
--table that lists consultant ids, project ids and years
DROP TABLE temp_fact1;
CREATE TABLE temp_fact1 AS
SELECT project_id, fk2_consultant_id, to_number(to_char(lds_project.pj_actual_start_date, 'YYYY')) as which_year FROM lds_project;
--display table
SELECT * FROM temp_fact1;
--list that counts the number of projects for each consultant and specify the year
DROP TABLE temp_fact2;
CREATE TABLE temp_fact2 AS
SELECT which_year, fk2_consultant_id, COUNT(*) project_id FROM temp_fact1 GROUP by fk2_consultant_id, which_year;
--display table
SELECT * FROM temp_fact2;
--fact table surrogate key
DROP SEQUENCE fact_seq;
CREATE SEQUENCE fact_seq
START WITH 1
INCREMENT BY 1
MAXVALUE 1000000
MINVALUE 1
NOCACHE
NOCYCLE;
--load data
INSERT INTO fact_table (fact_key, consultant_key, time_key, no_of_proj_per_con_py)
SELECT fact_seq.nextval, consultant_key, report_time_dim.time_key, no_of_proj_per_con_py FROM temp_fact2, report_time_dim WHERE temp_fact2.which_year = report_time_dim.year;
Try just running this SELECT by itself - it's the last statement in your script.
SELECT fact_seq.nextval,
consultant_key,
report_time_dim.time_key,
no_of_proj_per_con_py
FROM temp_fact2, report_time_dim
WHERE temp_fact2.which_year = report_time_dim.year;
It doesn't look like either TEMP_FACT2 or REPORT_TIME_DIM has a column named no_of_proj_per_con_py. I'm not sure where you want to pull that data from, actually.
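Looking at the script, temp_fact2 aliases its count as project_id (COUNT(*) project_id), so no table in the FROM list has a no_of_proj_per_con_py column, which is exactly what ORA-00904 is complaining about. One possible fix, sketched from the tables shown (it assumes fk2_consultant_id is the value you want in consultant_key), is to alias the count to the name the INSERT expects:
--list that counts the number of projects for each consultant per year
CREATE TABLE temp_fact2 AS
SELECT which_year,
       fk2_consultant_id,
       COUNT(*) AS no_of_proj_per_con_py
FROM temp_fact1
GROUP BY fk2_consultant_id, which_year;
--load data (fk1_time_key and fk2_consultant_key are NOT NULL, so fill them too)
INSERT INTO fact_table (fact_key, consultant_key, time_key, no_of_proj_per_con_py, fk1_time_key, fk2_consultant_key)
SELECT fact_seq.nextval,
       temp_fact2.fk2_consultant_id,
       report_time_dim.time_key,
       temp_fact2.no_of_proj_per_con_py,
       report_time_dim.time_key,
       temp_fact2.fk2_consultant_id
FROM temp_fact2, report_time_dim
WHERE temp_fact2.which_year = report_time_dim.year;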

Optimizing GROUP BY in hsqldb

I have a table with 700K+ records on which a simple GROUP BY query takes in excess of 35 seconds to execute. I'm out of ideas on how to optimize this.
SELECT TOP 10 called_dn, COUNT(called_dn) FROM reportview.calls_out GROUP BY called_dn;
Here I add TOP 10 to rule out delays induced by the network transfer.
I have an index on called_dn (HSQLDB seems not to be using it).
called_dn is not nullable.
reportview.calls_out is a cached table.
Here's the table script:
CREATE TABLE calls_out (
pk_global_call_id INTEGER GENERATED BY DEFAULT AS SEQUENCE seq_global_call_id NOT NULL,
sys_global_call_id VARCHAR(65),
call_start TIMESTAMP WITH TIME ZONE NOT NULL,
call_end TIMESTAMP WITH TIME ZONE NOT NULL,
duration_interval INTERVAL HOUR TO SECOND(0),
duration_seconds INTEGER,
call_segments INTEGER,
calling_dn VARCHAR(25) NOT NULL,
called_dn VARCHAR(25) NOT NULL,
called_via_dn VARCHAR(25),
fk_end_status INTEGER NOT NULL,
fk_incoming_queue INTEGER,
call_start_year INTEGER,
call_start_month INTEGER,
call_start_week INTEGER,
call_start_day INTEGER,
call_start_hour INTEGER,
call_start_minute INTEGER,
call_start_second INTEGER,
utc_created TIMESTAMP WITH TIME ZONE,
created_by VARCHAR(25),
utc_modified TIMESTAMP WITH TIME ZONE,
modified_by VARCHAR(25),
PRIMARY KEY (pk_global_call_id),
FOREIGN KEY (fk_incoming_queue)
REFERENCES lookup_incoming_queue(pk_id),
FOREIGN KEY (fk_end_status)
REFERENCES lookup_end_status(pk_id));
Am I stuck with this kind of performance, or is there something I might try to speed up this query?
EDIT: Here's the query plan if it helps:
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[true]
columns=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN not nullable
COUNT arg=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN nullable]
[range variable 1
join type=INNER
table=CALLS_OUT
cardinality=771855
access=FULL SCAN
join condition = [index=SYS_IDX_SYS_PK_10173_10177]]]
groupColumns=[COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN]
offset=[VALUE = 0, TYPE = INTEGER]
limit=[VALUE = 10, TYPE = INTEGER]
PARAMETERS=[]
SUBQUERIES[]
Well, it seems there's no way to avoid a full scan in this situation.
Just for reference of future souls reaching this question, here's what I resorted to in the end:
I created a summary table maintained by INSERT / DELETE triggers on the original table. This, in combination with suitable indexes and LIMIT USING INDEX clauses in my queries, yields very good performance.
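For future reference, here is a minimal sketch of that summary-table idea (the summary table and trigger names are hypothetical, and it assumes HSQLDB 2.x's SQL trigger and MERGE syntax):
-- One row per called_dn, holding its current call count.
CREATE TABLE called_dn_counts (
called_dn VARCHAR(25) PRIMARY KEY,
call_count INTEGER NOT NULL);
-- Seed the summary from the existing data.
INSERT INTO called_dn_counts
SELECT called_dn, COUNT(*) FROM calls_out GROUP BY called_dn;
-- Keep the counts in sync when rows are inserted...
CREATE TRIGGER trg_calls_out_ins AFTER INSERT ON calls_out
REFERENCING NEW ROW AS n FOR EACH ROW
MERGE INTO called_dn_counts c
USING (VALUES (n.called_dn)) AS v(dn) ON c.called_dn = v.dn
WHEN MATCHED THEN UPDATE SET call_count = c.call_count + 1
WHEN NOT MATCHED THEN INSERT (called_dn, call_count) VALUES (v.dn, 1);
-- ...and when rows are deleted.
CREATE TRIGGER trg_calls_out_del AFTER DELETE ON calls_out
REFERENCING OLD ROW AS o FOR EACH ROW
UPDATE called_dn_counts SET call_count = call_count - 1
WHERE called_dn = o.called_dn;
The top-10 query then reads the small summary table instead of scanning 700K+ rows.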

How to use sort order with JOIN

I want to use a table JOIN with a sort order for these tables:
CREATE TABLE ACCOUNT(
ID INTEGER NOT NULL,
USER_NAME TEXT NOT NULL,
PASSWD TEXT,
FIRST_NAME TEXT,
LAST_NAME TEXT,
E_MAIL TEXT NOT NULL,
COUNTRY TEXT,
STATE TEXT,
CITY TEXT,
ADDRESS TEXT,
STATUS INTEGER,
SECURITY_QUESTION TEXT,
SECURITY_ANSWER TEXT,
LAST_PASSWD_RESET DATE,
DESCRIPTION TEXT,
LAST_UPDATED DATE,
CREATED DATE
)
;
-- ADD KEYS FOR TABLE ACCOUNT
ALTER TABLE ACCOUNT ADD CONSTRAINT KEY1 PRIMARY KEY (ID)
;
ALTER TABLE ACCOUNT ADD CONSTRAINT USER_NAME UNIQUE (USER_NAME)
;
ALTER TABLE ACCOUNT ADD CONSTRAINT E_MAIL UNIQUE (E_MAIL)
;
-- TABLE ACCOUNT_ROLE
CREATE TABLE ACCOUNT_ROLE(
ID INTEGER NOT NULL,
USER_NAME TEXT NOT NULL,
ROLE INTEGER,
PERMISSION TEXT,
LAST_UPDATED DATE,
CREATED DATE
)
;
-- CREATE INDEXES FOR TABLE ACCOUNT_ROLE
CREATE INDEX IX_RELATIONSHIP19 ON ACCOUNT_ROLE (ID)
;
-- ADD KEYS FOR TABLE ACCOUNT_ROLE
ALTER TABLE ACCOUNT_ROLE ADD CONSTRAINT KEY26 PRIMARY KEY (ID)
;
ALTER TABLE ACCOUNT_ROLE ADD CONSTRAINT RELATIONSHIP19 FOREIGN KEY (ID) REFERENCES ACCOUNT (ID) ON DELETE CASCADE ON UPDATE CASCADE
;
Working query:
SELECT * FROM ACCOUNT ORDER BY %S %S offset ? limit ?
I tried this SQL query:
SELECT *
FROM ACCOUNT_ROLE
INNER JOIN ACCOUNT ON ACCOUNT.ID = ACCOUNT_ROLE.ID
ORDER BY Account.%S Account.%S offset ? limit ?
But I get this error message:
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near "Account"
Position: 99
How can I fix this query? I would like to get the data from the two tables and sort it based on a column value.
It is not entirely clear what you are asking, so I am proposing a few troubleshooting steps instead:
It looks like you are trying to do some query preprocessing. Log the query after this preprocessing is done and troubleshoot based on that. Failing that, check the PostgreSQL logs for the failing query text (the query logged by your application is better, because of how placeholders are handled).
Once you are looking at the query itself, check it for syntax errors.
The problems are almost certainly in portions of your code you have not shown us; knowing how to get the troubleshooting process started, however, is worth mentioning.
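That said, one concrete issue is visible in what you posted: your working query interpolates a column and a sort direction (ORDER BY %S %S), but in the JOIN version both placeholders were prefixed with Account., so the second expands to something like Account.DESC, which matches the syntax error at or near "Account". A sketch of the corrected template, assuming the first %S is a column of ACCOUNT and the second is ASC or DESC:
SELECT *
FROM ACCOUNT_ROLE
INNER JOIN ACCOUNT ON ACCOUNT.ID = ACCOUNT_ROLE.ID
ORDER BY ACCOUNT.%S %S offset ? limit ?
-- e.g. expands to: ORDER BY ACCOUNT.USER_NAME DESC offset 0 limit 10
Because the column name is pasted into the SQL string rather than bound as a parameter, validate it against a whitelist of known column names before formatting.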