Why doesn't the key prefix optimization work with a secondary index on a clustering column? - scylla

ScyllaDB implements a so-called "key prefix optimization" for secondary indexes, which eliminates filtering if part of the primary key is specified. For example, it's possible to execute SELECT * FROM A WHERE a = 'a' AND b = 'a' AND d = 'a'; on table A:
CREATE TABLE A (
a text,
b text,
c text,
d text,
PRIMARY KEY(a,b,c)
);
CREATE INDEX A_index ON A (d);
But it doesn't work if A.d is a clustering column. E.g. as in table B below.
CREATE TABLE B (
a text,
b text,
c text,
d text,
PRIMARY KEY(a,b,c,d)
);
CREATE INDEX B_index ON B (d);
The above SELECT query fails with the error:
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data filtering
and thus may have unpredictable performance. If you want to execute
this query despite the performance unpredictability, use ALLOW
FILTERING"
ScyllaDB 3.0.1.

Thanks for finding an interesting corner case :)
The problem is that the second query restricts clustering columns (b, d), which in itself does not form a clustering key prefix. Of course, d is indexed, so what should happen is using a in key prefix optimization and d as an indexed column.
Instead, it's wrongly decided that (b, d) does not form a prefix, so it's discarded from the optimization candidates, without taking into account that d has an index.
This simplification will be fixed; I created a bug tracker issue here: https://github.com/scylladb/scylla/issues/4178
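The prefix check described in this answer can be illustrated with a small sketch. This is not Scylla's actual code: the function name `is_clustering_prefix` and the "set the indexed column aside first" step are assumptions made purely for illustration, using the columns of table B.

```python
def is_clustering_prefix(restricted, clustering_key):
    """Return True if the restricted columns are exactly the first
    len(restricted) columns of the clustering key (order-insensitive)."""
    n = len(restricted)
    return set(restricted) == set(clustering_key[:n])

# Table B from the question: PRIMARY KEY(a, b, c, d)
clustering_key = ["b", "c", "d"]   # a is the partition key
restricted = ["b", "d"]            # clustering columns restricted by the SELECT
indexed = {"d"}                    # columns that have a secondary index

# Naive check: (b, d) skips c, so it is not a clustering key prefix.
naive = is_clustering_prefix(restricted, clustering_key)

# Index-aware check (hypothetical): set the indexed column aside first,
# then test whether what remains forms a prefix.
remaining = [col for col in restricted if col not in indexed]
index_aware = is_clustering_prefix(remaining, clustering_key)

print(naive, index_aware)
```

The naive check rejects (b, d), while the index-aware variant accepts the remaining (b), which matches the behaviour the answer says should happen.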

Related

I have a composite key for a table. I want to join on just one column of this key. Does that column need a separate index?

Imagine I have a table with a composite primary key containing DateCode and AddressCode.
I want to join that table with another table on just AddressCode.
I know there will be a single index on DateCode combined with AddressCode, since that is the primary key. Should I also have an index on just AddressCode in this table, purely for efficient joins to other tables that use only AddressCode as a foreign key? This is what I would do in MySQL, though I'm not sure whether Microsoft SQL Server handles this situation better automatically somehow.
After further research and experimentation, I have my own answer. Yes, a join on a column that is part of a composite key but is not the first element of that index (that is, "most significant member") requires a separate index. Without that index, performing a JOIN on that column requires a full scan of either the composite index or the table.
To clarify this further, suppose there is a composite index (such as the one automatically created for a composite primary key) on three columns a, b, and c. If the index was created on a, b, c via
CREATE INDEX NewIndex ON Table(a, b, c)
then a is the most significant and c is the least. If the index was created on b, c, a, like so
CREATE INDEX NewIndex ON Table(b, c, a)
then b is the most significant. Since the index is ordered according to this significance, finding values indexed by the most significant component of a composite index requires only a trivial amount of additional effort in comparison to finding values indexed by that column alone (that is, it’s like looking for all integers that begin with “7” in an ordered list from 1 to 1000), whereas finding values indexed on less significant components of a composite index typically requires a full index scan (that is, it’s like looking for all integers that end with “7” in an ordered list from 1 to 1000).
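The ordering argument can be sketched in plain Python, treating a composite index on (a, b, c) as a sorted list of tuples. This is only a model of the idea, not how any particular database stores its indexes; the helper names `find_by_leading` and `find_by_trailing` are made up for the example.

```python
from bisect import bisect_left
from itertools import product

# Model a composite index on (a, b, c) as tuples kept in sorted order.
index = sorted(product(range(10), repeat=3))  # 1000 entries

def find_by_leading(index, a):
    """Binary-search the contiguous run of entries whose first component is a."""
    lo = bisect_left(index, (a,))      # first entry with leading value >= a
    hi = bisect_left(index, (a + 1,))  # first entry with leading value >= a + 1
    return index[lo:hi]

def find_by_trailing(index, c):
    """Entries with a given last component are scattered, so every entry is inspected."""
    return [entry for entry in index if entry[2] == c]

leading = find_by_leading(index, 7)
trailing = find_by_trailing(index, 7)
print(len(leading), len(trailing))
```

Both lookups return the same number of entries here, but the first needs only two binary searches, while the second has to walk the whole list.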

Does indexing in Postgres improve ordering speed?

Let's say you have a table with a primary key A, and two columns B and C.
When querying we want to do SELECT * FROM table WHERE A = 'thing' ORDER BY B, C
Since A is a primary key, it already has an index. Is there any benefit to adding an index on B and C in terms of speeding up ordering?
Thanks!
This query cannot benefit from additional indexes.
If A is the primary key, then the query can return only zero or one row, so ordering is trivial and cannot be made faster.
In fact, you should omit the ORDER BY clause.
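A quick way to convince yourself is the following sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only to keep the example self-contained); the table name `t` and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT PRIMARY KEY, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("thing", 2, 1), ("other", 1, 2)])

# An equality match on the primary key yields at most one row,
# so ORDER BY has nothing to reorder.
with_order = conn.execute(
    "SELECT * FROM t WHERE a = ? ORDER BY b, c", ("thing",)).fetchall()
without_order = conn.execute(
    "SELECT * FROM t WHERE a = ?", ("thing",)).fetchall()
missing = conn.execute(
    "SELECT * FROM t WHERE a = ? ORDER BY b, c", ("absent",)).fetchall()

print(with_order, without_order, missing)
```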

MS Access: Best indexing strategy for retrieving DISTINCT combinations of joined fields

I have two tables in MS Access 2010:
Table tblA:
idA AutoNumber
a Text(255)
b Text(255)
c Text(255)
x Text(255)
y Text(255)
Table tblB:
idB AutoNumber
fkA Long Integer
d Text(255)
e Text(255)
z Text(255)
... and need to execute the following query:
SELECT DISTINCT
tblA.a
, tblA.b
, tblA.c
, tblB.d
, tblB.e
FROM tblA
INNER JOIN tblB
on tblA.idA = tblB.fkA
;
Both tables are very large and I was wondering what is the best indexing strategy to achieve the fastest response time.
idA and idB are the primary keys for their respective tables and fkA has its own index.
But what about tblA.a, tblA.b, tblA.c, tblB.d, tblB.e? Should I create a composite index on tblA.a, tblA.b, tblA.c and one on tblB.d, tblB.e? Or should each field be indexed individually?
I tried both options and the first one seems to yield slightly better results, though both are not very satisfactory in terms of performance. I would like to understand more about the theoretical background and appreciate every input.
As you are joining all records, the DBMS may simply opt for full table scans to join the tables.
With indexes on tblA(idA) and tblB(fkA) you give the DBMS the option to use these instead, but it's up to the DBMS whether to do so (it will, hopefully, choose the faster way, whichever that is).
You can also offer the DBMS covering indexes. That means all columns used in the query are in that index, so if the DBMS uses it, it doesn't have to access the table additionally, but can get everything from the index itself. As you have no where clause, the DBMS may still prefer to access the tables row by row, rather than run through indexes. The covering indexes would be:
tblA(idA, a, b, c)
tblB(fkA, d, e)
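What "covering" means can be shown with a small sketch. It uses SQLite through Python's sqlite3 module rather than MS Access (an assumption for the sake of a runnable example; Access may plan the query differently), and the index names `idx_fk` and `idx_fk_d_e` are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblB (idB INTEGER PRIMARY KEY, fkA INTEGER, d TEXT, e TEXT, z TEXT)")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT d, e FROM tblB WHERE fkA = 1"

# Index on the join column only: the table must still be visited for d and e.
conn.execute("CREATE INDEX idx_fk ON tblB(fkA)")
plain_plan = plan(query)

# Index that also contains d and e: the query is answered from the index alone.
conn.execute("DROP INDEX idx_fk")
conn.execute("CREATE INDEX idx_fk_d_e ON tblB(fkA, d, e)")
covering_plan = plan(query)

print(plain_plan)
print(covering_plan)
```

In SQLite the second plan reports a covering index, meaning the table rows themselves are never read for this query.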

Avoid duplicate data in SQLite3 with a covering index

In our company we have a rather big SQLite3 database with, let's say, some points of interest (POI). The database is created once, and used in read-only mode in a mobile user application.
POI have names that can contain several words and letters with diacritics. To perform a quick search of POI in the application, there is an additional table with single uppercase ASCII words and the corresponding ID in the main table. And there is a covering index. The database looks like this (simplified) :
CREATE TABLE poi(id INTEGER PRIMARY KEY, name TEXT, attributes TEXT);
CREATE TABLE poi_search (word TEXT, poi_id INTEGER);
CREATE INDEX poi_search_idx ON poi_search(word, poi_id);
Then, you can query for POI whose names contain "FOO" with a request like this:
SELECT * from poi INNER JOIN poi_search ON poi.id=poi_search.poi_id
WHERE poi_search.word >= 'FOO' AND poi_search.word < 'FOP';
The query is very quick and uses a covering index, so it doesn't need to access the poi_search table at all:
sqlite> EXPLAIN QUERY PLAN SELECT * from poi INNER JOIN poi_search ON poi.id=poi_search.poi_id WHERE poi_search.word >= 'FOO' AND poi_search.word < 'FOP';
0|0|1|SEARCH TABLE poi_search USING COVERING INDEX poi_search_idx (word>? AND word<?)
0|1|0|SEARCH TABLE poi USING INTEGER PRIMARY KEY (rowid=?)
0|1|0|SEARCH TABLE poi USING INTEGER PRIMARY KEY (rowid=?)
I just realized that this is a big waste of space, since the covering index duplicates all the data of the indexed table. In the application, the table poi_search is in fact never used.
Is there a way, even a tricky one, to remove or truncate the poi_search table while keeping all the data in the covering index? I know that such a database would be in an incoherent state, so there is probably no way to do such a hack with the official API.
I don't mind using a hacked version of SQLite3 to produce the database, but the DB has to return correct search results for the given request in a vanilla SQLite3 client.
There is no tricky way, or a hack, to do what you want.
You'll have to make do with the documented way, which is guaranteed to keep the database consistent:
CREATE TABLE poi_search (
word TEXT,
poi_id INTEGER,
PRIMARY KEY (word, poi_id)
) WITHOUT ROWID;
-- no other index needed
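As a rough check, the sketch below builds a WITHOUT ROWID search table whose primary key is assumed to be (word, poi_id), mirroring the columns of the original covering index, and runs a prefix range search for words starting with FOO. The sample POI rows are invented, and WITHOUT ROWID needs SQLite 3.8.2 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE poi(id INTEGER PRIMARY KEY, name TEXT, attributes TEXT);
-- The table itself is stored as a b-tree keyed on (word, poi_id),
-- so no separate index duplicates the data.
CREATE TABLE poi_search (
    word TEXT,
    poi_id INTEGER,
    PRIMARY KEY (word, poi_id)
) WITHOUT ROWID;
INSERT INTO poi VALUES
    (1, 'Foo Bar', ''), (2, 'Cafe Foobar', ''), (3, 'Baz', ''), (4, 'Foo Two', '');
INSERT INTO poi_search VALUES
    ('FOO', 1), ('BAR', 1), ('CAFE', 2), ('FOOBAR', 2),
    ('BAZ', 3), ('FOO', 4), ('TWO', 4);
""")

# Range search for words starting with FOO, joined back to the main table.
matches = conn.execute("""
    SELECT DISTINCT poi.id, poi.name
    FROM poi INNER JOIN poi_search ON poi.id = poi_search.poi_id
    WHERE poi_search.word >= 'FOO' AND poi_search.word < 'FOP'
    ORDER BY poi.id
""").fetchall()

print(matches)
```

Note that the word FOO appears for two different POIs, which is why this sketch keys the table on both columns rather than on the word alone.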

is a db index composite by default?

when I create an index on a db2, for example with the following code:
CREATE INDEX T_IDX ON T(
A,
B)
is it a composite index?
if not: how can I then create a composite index?
if yes: in order to have two different index should I create them separately as:
CREATE INDEX T1_IDX ON T(A)
CREATE INDEX T2_IDX ON T(B)
EDIT: this discussion is not going in the direction I expected (but in a better one :)). I actually asked how, and not why, to create separate indexes; I planned to ask that in a different question, but since you anticipated me:
suppose I have a table T(A,B,C) and a search function search() that selects from the table using any of the following conditions:
WHERE A = x
WHERE B = x
WHERE C = x
WHERE A = x AND B=y (and so on AC, CB, ABC)
If I create a composite index on ABC, is it going to work, for example, when I select on just C?
The table is quite big, and inserts/updates are not so frequent.
Yep multiple fields on create index = composite by definition: Specify two or more column names to create a composite index.
Understanding when to use composite indexes appears to be your last question...
If all columns selected by a query are in a composite index, then the DB engine can return these values from the index without accessing the table, so you get faster seek times.
However if one or the other are used in queries, then creating individual indexes will serve you best. It depends on the types of queries executed and what values they contain/filter/join.
If you sometimes have one, the other, or both, then creating all 3 indexes is a possibility as well. But keep in mind that each additional index increases the time it takes to insert, update, or delete, so on heavily modified tables, more indexes are generally bad because the overhead of maintaining them affects performance.
The index on A, B is a composite index, and can be used to seek on just A or a seek on A with B or for a general scan, of course.
There is usually not much of a point in having an index on A, B and an index on just A, since a partial search on A, B can be used if you only have A. That wider index will be a little less efficient, however, so if the A lookup is extremely frequent and the write requirements mean that it is acceptable to update the extra index, it could be justifiable.
Having an index on B may be necessary, since the A, B index is not very suitable for searches based on B only.
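The behaviour described in this answer can be observed with a small experiment. The sketch uses SQLite via Python's sqlite3 module instead of DB2 (an assumption made so the example runs anywhere; DB2's optimizer may choose differently), and reuses the index names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B INTEGER, C INTEGER)")
conn.execute("CREATE INDEX T_IDX ON T(A, B)")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# The composite index serves lookups on A alone and on A together with B.
plan_a = plan("SELECT * FROM T WHERE A = 1")
plan_ab = plan("SELECT * FROM T WHERE A = 1 AND B = 2")

# A lookup on B alone cannot seek into the (A, B) index.
plan_b = plan("SELECT * FROM T WHERE B = 2")

# A separate index on B turns that scan into a seek.
conn.execute("CREATE INDEX T2_IDX ON T(B)")
plan_b_indexed = plan("SELECT * FROM T WHERE B = 2")

for p in (plan_a, plan_ab, plan_b, plan_b_indexed):
    print(p)
```

In SQLite, the first two plans are index searches on T_IDX, the third is a full scan, and the last one switches to a search on T2_IDX.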
First Answer: YES
CREATE INDEX JOB_BY_DPT
ON EMPLOYEE (WORKDEPT, JOB)
Second Answer:
It depends on your query: if most of the time your query references a single column in the WHERE clause, like select * from T where A = 'something', then a single-column index is what you want; but if both columns A and B are referenced, then you should create a composite one.
For further reference, please check
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0000919.htm