I am joining two tables with a left join:
The first table is quite simple
create table L (
id integer primary key
);
and contains only a handful of records.
The second table is
create table R (
L_id null references L,
k text not null,
v text not null
);
and contains millions of records.
The following two indexes are on R:
create index R_ix_1 on R(L_id);
create index R_ix_2 on R(k);
This select statement, imho, selects the wrong index:
select
L.id,
R.v
from
L left join
R on
L.id = R.L_id and
R.k = 'foo';
An EXPLAIN QUERY PLAN tells me that the select statement uses the index R_ix_2, and the execution of the select takes too much time. I believe the performance would be much
better if SQLite chose to use R_ix_1 instead.
I tried also
select
L.id,
R.v
from
L left join
R indexed by R_ix_1 on
L.id = R.L_id and
R.k = 'foo';
but that gave me Error: no query solution.
Is there something I can do to make sqlite use the other index?
Your join condition relies on 2 columns, so your index should cover those 2 columns:
create index R_ix_1 on R(L_id, k);
If you run other queries relying on only a single column, you can keep the old indexes, but you still need this double-column index as well:
create index R_ix_1 on R(L_id);
create index R_ix_2 on R(k);
create index R_ix_3 on R(L_id, k);
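To sanity-check this, here is a minimal sketch using Python's built-in sqlite3 module, with the table and index names from the question. The composite index is created alone so the plan is unambiguous; the data is omitted since EXPLAIN QUERY PLAN works on empty tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Recreate the question's schema, with only the proposed two-column index.
con.executescript("""
CREATE TABLE L (id INTEGER PRIMARY KEY);
CREATE TABLE R (L_id INTEGER REFERENCES L, k TEXT NOT NULL, v TEXT NOT NULL);
CREATE INDEX R_ix_3 ON R(L_id, k);
""")
plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT L.id, R.v
FROM L LEFT JOIN R ON L.id = R.L_id AND R.k = 'foo'
""").fetchall()
for row in plan:
    print(row[-1])  # one line per plan step
```

Because both join conditions are equalities, SQLite can consume both columns of the index in a single search on R.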
I wonder if the SQLite optimizer just gets confused in this case. Does this work better?
select L.id, R.v
from L left join
R
on L.id = R.L_id
where R.k = 'foo' or R.k is NULL;
EDIT:
Of course, SQLite will only use an index if the types of the columns are the same. The question doesn't specify the type of L_id. If it is not the same as the type of the primary key, then the index (probably) will not be used.
Related
The question is for Firebird 2.5. Let's assume we have the following query:
SELECT
EVENTS.ID,
EVENTS.TS,
EVENTS.DEV_TS,
EVENTS.COMPLETE_TS,
EVENTS.OBJ_ID,
EVENTS.OBJ_CODE,
EVENTS.SIGNAL_CODE,
EVENTS.SIGNAL_EVENT,
EVENTS.REACTION,
EVENTS.PROT_TYPE,
EVENTS.GROUP_CODE,
EVENTS.DEV_TYPE,
EVENTS.DEV_CODE,
EVENTS.SIGNAL_LEVEL,
EVENTS.SIGNAL_INFO,
EVENTS.USER_ID,
EVENTS.MEDIA_ID,
SIGNALS.ID AS SIGNAL_ID,
SIGNALS.SIGNAL_TYPE,
SIGNALS.IMAGE AS SIGNAL_IMAGE,
SIGNALS.NAME AS SIGNAL_NAME,
REACTION.INFO,
USERS.NAME AS USER_NAME
FROM EVENTS
LEFT OUTER JOIN SIGNALS ON (EVENTS.SIGNAL_ID = SIGNALS.ID)
LEFT OUTER JOIN REACTION ON (EVENTS.ID = REACTION.EVENTS_ID)
LEFT OUTER JOIN USERS ON (EVENTS.USER_ID = USERS.ID)
WHERE (TS BETWEEN '27.07.2021 00:00:00' AND '28.07.2021 10:34:08')
AND (OBJ_ID = 8973)
AND (DEV_CODE IN (0, 1234))
AND (DEV_TYPE = 79)
AND (PROT_TYPE = 8)
ORDER BY TS;
EVENTS has about 190 million records by now and this query takes too much time to complete. As I read here, the tables have to have indexes on all the columns that are used.
Here are the CREATE INDEX statements for the EVENTS table:
CREATE INDEX FK_EVENTS_OBJ ON EVENTS (OBJ_ID);
CREATE INDEX FK_EVENTS_SIGNALS ON EVENTS (SIGNAL_ID);
CREATE INDEX IDX_EVENTS_COMPLETE_TS ON EVENTS (COMPLETE_TS);
CREATE INDEX IDX_EVENTS_OBJ_SIGNAL_TS ON EVENTS (OBJ_ID,SIGNAL_ID,TS);
CREATE INDEX IDX_EVENTS_TS ON EVENTS (TS);
Here is the data from the PLAN analyzer:
PLAN JOIN (JOIN (JOIN (EVENTS ORDER IDX_EVENTS_TS INDEX (FK_EVENTS_OBJ, IDX_EVENTS_TS), SIGNALS INDEX (PK_SIGNALS)), REACTION INDEX (IDX_REACTION_EVENTS)), USERS INDEX (PK_USERS))
As requested the speed of the execution:
without LEFT JOIN -> 138ms
with LEFT JOIN -> 338ms
Is there another way to speed up the execution of the query besides indexing the columns or maybe add another index?
If I add another index will the optimizer choose to use it?
You can only optimize the joins themselves by making sure that the keys in the joined tables are indexed. These all look like primary keys, so they should have appropriate indexes.
For this WHERE clause:
WHERE (TS BETWEEN '27.07.2021 00:00:00' AND '28.07.2021 10:34:08') AND
(OBJ_ID = 8973) AND
(DEV_CODE IN (0, 1234)) AND
(DEV_TYPE = 79) AND
(PROT_TYPE = 8)
You probably want an index on (OBJ_ID, DEV_TYPE, PROT_TYPE, TS, DEV_CODE). The order of the first three keys is not particularly important because they are all equality comparisons. I am guessing that one day of data is fewer rows than two device codes.
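Firebird isn't available here, but the ordering principle (equality columns first, the range column last) can be illustrated with SQLite through Python's sqlite3 module. The index name and the simplified schema below are made up for the sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified stand-in for the EVENTS table, with equality columns
# leading the index and the range column (TS) last.
con.executescript("""
CREATE TABLE EVENTS (ID INTEGER PRIMARY KEY, TS TEXT, OBJ_ID INT,
                     DEV_CODE INT, DEV_TYPE INT, PROT_TYPE INT);
CREATE INDEX IDX_EVENTS_FILTER ON EVENTS (OBJ_ID, DEV_TYPE, PROT_TYPE, TS);
""")
plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT ID FROM EVENTS
WHERE OBJ_ID = 8973 AND DEV_TYPE = 79 AND PROT_TYPE = 8
  AND TS BETWEEN '2021-07-27' AND '2021-07-28'
""").fetchall()
print(plan[0][-1])
```

The plan shows all three equality columns plus the TS range being satisfied by a single index search; the same layout is what the suggested Firebird index aims for.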
First of all you want to find the table1 rows quickly. You are using several columns in your WHERE clause to get them. Provide an index on these columns. Which column is the most selective? I.e. which criterion narrows the result rows the most? Let's say it's dt, so we put this first:
create index idx1 on table1 (dt, oid, pt, ts, dc);
I have put ts and dc last, because we are looking for more than one value in these columns. It may still be that putting ts or dc as the first column is a good choice. Sometimes we have to play around with this, i.e. provide several indexes with the column order changed and then see which one gets used by the DBMS.
Tables table2 and table4 get accessed by the primary key, for which an index exists. But table3 gets accessed by t1id. So provide an index on that, too:
create index idx2 on table3 (t1id);
Faced with need of using column aliases in where condition for selection.
Found possible solution here.
Let's assume we have one-to-one relationship (user-to-role) and we want to get results as following:
SELECT u.name AS u_name, r.name AS r_name
FROM users AS u
INNER JOIN roles AS r
ON u.role_id = r.role_id
WHERE u.name = 'John'
And we have a corresponding index on users.name (just for example).
If this query is run with EXPLAIN, it shows all indexes that are used during selection (including index for name).
Now, as we want to use aliases in WHERE clause, based on proposed solution we can rewrite the query:
SELECT * FROM (
SELECT u.name AS u_name, r.name AS r_name
FROM users AS u
INNER JOIN roles AS r
ON u.role_id = r.role_id
) AS temp
WHERE u_name = 'John'
As you can see, there's no WHERE clause in the nested select. Running this query with EXPLAIN gives the same results (admittedly, I'm not an expert at analyzing EXPLAIN output, but still):
same indexes
same costs
similar time of execution
And I'm a little bit confused by this result: I was convinced that at least the index on the user name would not be used.
Q1: Does postgres use indexes in that way?
Q2: Are there possible performance issues?
The subquery is not needed, so it can be unrolled/collapsed.
The following query will generate a flat plan (and indexes are not relevant)
\i tmp.sql
CREATE TABLE t
(a integer not null primary key
);
insert into t(a)
select v from generate_series(1,10000) v
;
ANALYZE t;
EXPLAIN
SELECT * from (
select d AS e from (
select c as d from (
select b AS c from (
select a AS b from t
) s
) r
) q
) p
where e =95
;
Resulting plan:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 10000
ANALYZE
QUERY PLAN
---------------------------------------------------------------------
Index Only Scan using t_pkey on t (cost=0.17..2.38 rows=1 width=4)
Index Cond: (a = 95)
(2 rows)
In the OP's fragment, the innermost query (table expression) is a two-table join, but the mechanism is the same: all the outer layers can be peeled off (and the result column is renamed).
And yes: the join will benefit from indexes on the joined fields, and the final where could use an index, too.
SQL is a descriptive language, not a procedural language. A SQL query describes the result set being produced. It does not specify how to create it -- and that is even more true in Postgres which doesn't have compiler options or hints.
What actually gets run is a directed acyclic graph of operations (DAG). The compiling step creates the DAG. Postgres is smart enough to realize that the subquery is meaningless, so the two versions are optimized to the same DAG.
Let me add that I think Postgres usually materializes CTEs, so using a CTE might prevent the index from being used.
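Postgres isn't required to see the flattening effect; SQLite's planner collapses the same kind of WHERE-free renaming subqueries too. A small sketch with Python's sqlite3 and a toy table similar to the one above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER NOT NULL PRIMARY KEY)")
con.executemany("INSERT INTO t(a) VALUES (?)", [(v,) for v in range(1, 101)])
# Two layers of renaming subqueries around a simple table scan.
plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT * FROM (SELECT d AS e FROM (SELECT a AS d FROM t) q) p
WHERE e = 95
""").fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The plan is a single indexed search on t's primary key, not a scan: the wrappers have been peeled away exactly as described for Postgres.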
I have a table T with some 500000 records. That table is a hierarchical table.
My goal is to update the table by self-joining it, based on a condition describing the parent-child relationship.
The update query is taking really long because the number of rows is really high. I have created a unique index on the columns which identify the rows to update (meaning X and Y). After creating the index the cost has gone down, but the query is still very slow.
This my query format
update T
set (a1, b1)
= (select parent.a1, parent.b1
from T parent, T child
where parent.id = child.Parent_id
and T.X = child.X
and T.Y = child.Y)
After creating the index, the execution plan shows that it is doing an index scan for CRS.PARENT but going for a full table scan for CRS.CHILD, and also during the update; as a result the query is taking forever to complete.
Please suggest any tips or recommendations to solve this problem.
You are updating all 500,000 rows, so an index is a bad idea. 500,000 index lookups will take much longer than it needs to.
You would be better served using a MERGE statement.
It is hard to tell exactly what your table structure is, but it would look something like this, assuming X and Y are the primary key columns in T (...could be wrong about that):
MERGE INTO T
USING ( SELECT TC.X,
TC.Y,
TP.A1,
TP.A2
FROM T TC
INNER JOIN T TP ON TP.ID = TC.PARENT_ID ) U
ON ( T.X = U.X AND T.Y = U.Y )
WHEN MATCHED THEN UPDATE SET T.A1 = U.A1,
T.A2 = U.A2;
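As a quick semantic check of what the MERGE is meant to do (each row receives its parent's A1/A2), here is a hypothetical sketch using Python's sqlite3. SQLite lacks MERGE, so a correlated UPDATE stands in, and the X/Y matching is reduced to a toy two-row table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T (id INTEGER PRIMARY KEY, parent_id INTEGER,
                x INT, y INT, a1 TEXT, a2 TEXT);
INSERT INTO T VALUES (1, NULL, 0, 0, 'pa1', 'pa2'),
                     (2, 1,    1, 1, NULL,  NULL);
-- Copy each parent's a1/a2 down to its child rows.
UPDATE T
SET a1 = (SELECT p.a1 FROM T p WHERE p.id = T.parent_id),
    a2 = (SELECT p.a2 FROM T p WHERE p.id = T.parent_id)
WHERE T.parent_id IS NOT NULL;
""")
child = con.execute("SELECT a1, a2 FROM T WHERE id = 2").fetchone()
print(child)
```

The Oracle MERGE in the answer computes the parent-child join once and applies all updates in a single pass, which is exactly why it scales better than this row-by-row correlated form.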
I have the following compound SQL statement for a lookup, and I am trying to understand what the optimal indexes (indices?) are to create, which ones I should leave out because they aren't needed, and whether it is counterproductive to have multiple.
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items LEFT OUTER JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE items.standard_part_number LIKE '#{part_number}%'
UNION ALL
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items LEFT OUTER JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE part_numbers.value LIKE '#{part_number}%'
ORDER BY items.standard_part_number
LIMIT '#{limit}' OFFSET '#{offset}'
I have the following indices. Some of them may not be necessary, or I could be missing an index... Or worse, could having too many work against the optimal performance configuration?
for items:
CREATE INDEX index_items_standard_part_number ON items (standard_part_number);
for part_numbers:
CREATE INDEX index_part_numbers_item_id ON part_numbers (item_id);
CREATE INDEX index_part_numbers_item_id_and_account_id on part_numbers (item_id,account_id);
CREATE INDEX index_part_numbers_item_id_and_account_id_and_value ON part_numbers (item_id,account_id,value);
CREATE INDEX index_part_numbers_item_id_and_value on part_numbers (item_id,value);
CREATE INDEX index_part_numbers_value on part_numbers (value);
Update:
The schema for the tables listed above
CREATE TABLE accounts (id INTEGER PRIMARY KEY,name TEXT,code TEXT UNIQUE,created_at INTEGER,updated_at INTEGER,company_id INTEGER,standard BOOLEAN,price_list_id INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY,standard_part_number TEXT UNIQUE,standard_price INTEGER,part_number TEXT,price INTEGER,quantity INTEGER,unit_of_measure TEXT,metadata TEXT,image_file_name TEXT,created_at INTEGER,updated_at INTEGER,company_id INTEGER);
CREATE TABLE part_numbers (id INTEGER PRIMARY KEY,value TEXT,item_id INTEGER,account_id INTEGER,created_at INTEGER,updated_at INTEGER,company_id INTEGER,standard BOOLEAN);
Outer joins constrain the join order, so you should not use them unless necessary.
In the second subquery, the WHERE part_numbers.value LIKE ... clause would filter out any unmatched records anyway, so you should drop that LEFT OUTER.
SQLite can use at most one index per table per (sub)query.
So to be able to use the same index for both searching and sorting, both operations must use the same collation.
LIKE uses a case-insensitive collation, so the ORDER BY should be declared to use the same (ORDER BY items.standard_part_number COLLATE NOCASE).
This is not possible if the part numbers must be sorted case sensitively.
This is not needed if SQLite does not actually use the same index for both (check with EXPLAIN QUERY PLAN).
In the first subquery, there is no index that could be used for the items.standard_part_number LIKE '#{part_number}%' search.
You would need an index like this (NOCASE is needed for LIKE):
CREATE INDEX iii ON items(standard_part_number COLLATE NOCASE);
In the second subquery, SQLite is likely to use part_numbers as the outer table in the join because it has two filtered columns.
An index for these two searches must look like this (with NOCASE only for the second column):
CREATE INDEX ppp ON part_numbers(account_id, value COLLATE NOCASE);
With all these changes, the query and its EXPLAIN QUERY PLAN output look like this:
EXPLAIN QUERY PLAN
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items LEFT OUTER JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE items.standard_part_number LIKE '#{part_number}%'
UNION ALL
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE part_numbers.value LIKE '#{part_number}%'
ORDER BY items.standard_part_number COLLATE NOCASE
LIMIT -1 OFFSET 0;
1|0|0|SEARCH TABLE items USING INDEX iii (standard_part_number>? AND standard_part_number<?)
1|1|1|SEARCH TABLE part_numbers USING COVERING INDEX index_part_numbers_item_id_and_account_id_and_value (item_id=? AND account_id=?)
2|0|1|SEARCH TABLE part_numbers USING INDEX ppp (account_id=? AND value>? AND value<?)
2|1|0|SEARCH TABLE items USING INTEGER PRIMARY KEY (rowid=?)
2|0|0|USE TEMP B-TREE FOR ORDER BY
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
The second subquery cannot use an index for sorting because part_numbers is not the outer table in the join, but the speedup from looking up both account_id and value through an index is likely to be greater than the slowdown from doing an explicit sorting step.
For this query alone, you could drop all indexes not mentioned here.
If the part numbers can be searched case sensitively, you should remove all the COLLATE NOCASE stuff and replace the LIKE searches with a case-sensitive search (partnum BETWEEN 'abc' AND 'abcz').
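The NOCASE point is easy to verify with Python's sqlite3: with an index in the matching collation, the LIKE prefix search becomes a range search on the index (table trimmed to the relevant column):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, standard_part_number TEXT UNIQUE);
CREATE INDEX iii ON items(standard_part_number COLLATE NOCASE);
""")
plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT id FROM items WHERE standard_part_number LIKE 'abc%'
""").fetchall()
print(plan[0][-1])
```

Note that the UNIQUE constraint's automatic index (BINARY collation) cannot serve the case-insensitive LIKE; the plan picks the NOCASE index iii and rewrites the LIKE into two range bounds.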
I have two tables - Keys and KeysTemp.
KeysTemp contains temporary data which should be merged with Keys using the Hash field.
Here is the query:
SELECT
r.[Id]
FROM
[KeysTemp] AS k
WHERE
r.[Hash] NOT IN (SELECT [Hash] FROM [Keys] WHERE [SourceId] = 10)
I have indexes on both tables for SourceId and Hash fields:
CREATE INDEX [IdxKeysTempSourceIdHash] ON [KeysTemp]
(
[SourceId],
[Hash]
);
The same index for Keys table, but query is still very slow.
There are 5 rows in the temporary table and about 60,000 in the main table. A query by hash takes about 27 milliseconds, but querying these 5 rows takes about 3 seconds.
I also tried splitting the index, i.e. creating separate indexes for SourceId and Hash, but it performs the same way. An OUTER JOIN works even worse here. How can I solve this issue?
UPDATE
If I remove WHERE [SourceId] = 10 from the query it completes in 30ms, that's great, but I need this condition :)
Thanks
Maybe
select k.id
from keytemp as k left outer join keys as kk on (k.hash=kk.hash and kk.sourceid=10)
where kk.hash is null;
? Assuming that r is k. Also, have you tried NOT EXISTS? I have no idea if it works differently…
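A quick check of this anti-join shape (and the NOT EXISTS variant) with Python's sqlite3, using made-up rows where hash 'cc' exists in Keys only under a different SourceId:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Keys     (Id INTEGER PRIMARY KEY, SourceId INT, Hash TEXT);
CREATE TABLE KeysTemp (Id INTEGER PRIMARY KEY, SourceId INT, Hash TEXT);
CREATE INDEX IdxKeysSourceIdHash ON Keys (SourceId, Hash);
INSERT INTO Keys     VALUES (1, 10, 'aa'), (2, 10, 'bb'), (3, 99, 'cc');
INSERT INTO KeysTemp VALUES (1, 10, 'aa'), (2, 10, 'cc');
""")
# LEFT JOIN ... IS NULL anti-join from the answer above.
rows = con.execute("""
SELECT k.Id FROM KeysTemp AS k
LEFT JOIN Keys AS kk ON kk.Hash = k.Hash AND kk.SourceId = 10
WHERE kk.Hash IS NULL
""").fetchall()
# NOT EXISTS variant, mentioned as an alternative.
rows2 = con.execute("""
SELECT k.Id FROM KeysTemp AS k
WHERE NOT EXISTS (SELECT 1 FROM Keys WHERE SourceId = 10 AND Hash = k.Hash)
""").fetchall()
print(rows, rows2)
```

Both forms return only the temp row whose hash has no SourceId = 10 match, and both can drive the lookup through the (SourceId, Hash) index.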
I would do :
SELECT
r.[Id]
FROM
[KeysTemp] AS k
WHERE
r.[Id] NOT IN (SELECT A.[Id] FROM [KeysTemp] AS A, [Keys] AS B WHERE B.[SourceId] = 10 AND A.[Hash] = B.[Hash])
You list all the elements in KeysTemp (few) that exist in Keys, then take the ones in KeysTemp that are not among them.
If there are just a few new keys you could try this:
SELECT
r.[Id]
FROM
[KeysTemp] AS k
WHERE
r.[Id] NOT IN (SELECT kt.[Id] FROM [Keys] AS k1
INNER JOIN [KeysTemp] AS kt ON kt.Hash = k1.Hash
WHERE k1.[SourceId] = 10)
KeysTemp should have an index on the Hash column and Keys on the SourceId column.