SQLite does not like inner joins?

I am having trouble with SQLite in an Android application.
It seems that any JOIN operation totally kills my performance.
One table is an fts3 table because my application is a dictionary, and I read that fts3 benefits dictionary-like lookups.
These are the two tables I want to join (mainly to get the meaning of the word (okurigana) in different languages):
CREATE VIRTUAL TABLE tango USING fts3 (okurigana, kana, pos, pos_detail);
CREATE TABLE translation (_id int(7), language VARCHAR(10), meaning VARCHAR(100), FOREIGN KEY (_id) REFERENCES tango(rowid));
CREATE INDEX lang_match ON translation (language);
I query these tables with this command:
Select a.rowid, a.okurigana, a.kana, b.meaning
from tango a inner join translation b
ON a.rowid=b._id AND b.language='eng'
WHERE a.okurigana MATCH 'A*';
The query takes several seconds to complete, and I don't understand why. If I use this query (with the inner join removed), it is extremely fast.
Select a.rowid, a.okurigana, a.kana
from tango a
WHERE a.okurigana MATCH 'A*';
Why does a join kill the performance?

You can speed up the query with the use of indexes. This is your query:
Select a.rowid, a.okurigana, a.kana, b.meaning
from tango a inner join
translation b
ON a.rowid = b._id AND b.language = 'eng'
WHERE a.okurigana MATCH 'A*';
There are basically two ways for the engine to process this query. One way is to do the filtering on tango using the where clause and then to look up the values in translation. For this, a useful index would be:
create index translation_id_language_meaning on translation(_id, language, meaning)
The other way would be to scan translation and then do the lookup on tango. For this, a useful index would be:
create index translation_language_id_meaning on translation(language, _id, meaning)
The first is probably most appropriate for your query, but the better solution depends on the table statistics and distribution of values.
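One way to check whether the first index actually gets picked up is SQLite's EXPLAIN QUERY PLAN. The following is only a minimal sketch using Python's built-in sqlite3 module: it looks at the single-row lookup into translation that the join has to perform for each matching word, and it leaves the fts3 table out. The helper name lookup_plan is made up for illustration.

```python
import sqlite3

def lookup_plan(with_index):
    """Return SQLite's query-plan text for one translation lookup by _id and language."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE translation (_id INTEGER, language TEXT, meaning TEXT)"
    )
    if with_index:
        conn.execute(
            "CREATE INDEX translation_id_language_meaning "
            "ON translation (_id, language, meaning)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT meaning FROM translation WHERE _id = ? AND language = 'eng'",
        (1,),
    ).fetchall()
    conn.close()
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in rows]

print(lookup_plan(with_index=False))
print(lookup_plan(with_index=True))
```

Without the index the plan should report a scan of translation; with it, the plan should mention translation_id_language_meaning. The exact wording of the plan text varies between SQLite versions.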

If adding an inner join slows the query down without increasing significantly the number of rows that you get back, it is usually because your schema lacks an index.
In your case, it looks like your translation._id or translation.language is not indexed (perhaps both columns need indexing).
Adding indexes using the CREATE INDEX ... command for these two columns should speed up your query.

Related

Postgres Functional Index on string_to_array(long_string, ',') not being used

I'm dealing with a ~20M-row table in Postgres 10.9 that has a text column containing a bunch of comma-delimited strings. This table gets joined all over the place to many tables that are much longer, and every time the previous authors did so, they joined with an ON clause of the form some_other_string = ANY(string_to_array(col, ',')).
I'm trying to implement a quick optimization to make queries faster while I work on a better solution with the following index:
My functional index:
create index string_to_array_index on happy_table (string_to_array(col, ','));
Test query:
select string_to_array(col, ',') from happy_table;
When I execute an explain on the test query in order to see if the index is being used, I can see that it isn't. I see examples of functional indexes on strings where they lowercase the string or perform some basic operation like that. Do functional indexes work with string_to_array?
select a.id
from joyful_table a
join happy_table b on a.col = any(string_to_array(b.col, ','));
That is a bad design. No matter what you do and how big the tables are, you are stuck with a nested loop join (because the join condition does not use the = operator).
You are right; the best you can do is to speed up that nested loop with an index.
Your index doesn't work because it is a B-tree index, and that cannot be used with arrays in a meaningful way. What you need is a GIN index:
CREATE INDEX ON happy_table USING gin (string_to_array(col, ','));
But that index won't be used with = ANY. You'll have to rewrite the join to
SELECT a.id
FROM joyful_table a
JOIN happy_table b
ON ARRAY[a.col] <@ string_to_array(b.col, ',');
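To see why a GIN index suits this kind of lookup, it may help to think of it as an inverted index: a map from each array element to the rows that contain it. The Python below is only a conceptual sketch of that idea, not how Postgres implements it. The table contents and the function names build_inverted_index and join_with_index are invented for illustration.

```python
from collections import defaultdict

def build_inverted_index(happy_rows):
    """Map each comma-separated element to the ids of the rows that contain it."""
    index = defaultdict(set)
    for row_id, col in happy_rows.items():
        for element in col.split(","):
            index[element].add(row_id)
    return index

def join_with_index(joyful_rows, index):
    """Pair each joyful row with the happy rows whose list contains its value."""
    return sorted(
        (a_id, b_id)
        for a_id, value in joyful_rows.items()
        for b_id in index.get(value, ())
    )

# Made-up sample data: row id -> column value.
happy_table = {1: "red,green", 2: "blue", 3: "green,blue"}
joyful_table = {10: "green", 11: "blue", 12: "pink"}

pairs = join_with_index(joyful_table, build_inverted_index(happy_table))
print(pairs)  # [(10, 1), (10, 3), (11, 2), (11, 3)]
```

Each joyful row costs one dictionary lookup instead of a pass over every happy row, which is roughly the saving the index gives the nested loop.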

Is this index defined correctly for this join usage? (Postgres)

select
*
from
tbl1 as a
inner join
tbl2 as b on
a.id=b.id
left join
tbl3 as c on
b.id=c.parent_id and
c.some_col=2 and
c.attribute_id=3
In the example above:
If I want optimal performance on the join, should I define the index on tbl3 like so?
parent_id,
some_col,
attribute_id
The answer depends on the chosen join type.
If PostgreSQL chooses a nested loop or a merge outer join, your index is perfect.
If PostgreSQL chooses a hash outer join, the index won't help at all. In that case you need an index on (some_col, attribute_id).
Work with EXPLAIN to make the best choice for your case.
Note: If one of the conditions on some_col and attribute_id is not selective (doesn't filter out a significant number of rows), it is often better to omit that column in the index. In that case, it is better to get the benefit of a smaller index and more HOT updates.
My answer is "Maybe". I am speaking from experience with SQL Server, so someone please correct me if I am wrong and it is different in Postgres.
Your index looks fine for the most part. An issue that may arise is using the SELECT *. If tbl3 has more columns than what is defined in your index and you are querying those fields, they won't be in your index and the engine will have to do additional lookups outside that index.
Another thing would be based on the cardinality of your fields, meaning which are the most selective. If parent_id has a high cardinality, meaning very few duplicates, it could cause more reads against the index. However, if your lowest cardinality field is first and the db can quickly filter out huge chunks of data, that might be more efficient.
I have seen both work very well in SQL Server. SQL Server has even recommended indexes, I apply them, and then it recommends a different one based on field cardinality. Again, I am not familiar with the Postgres engine and I am just assuming these topics apply across both. If all else fails, create 3 indexes with different column order and see which one the engine likes the best.
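As a rough way to try out the "work with EXPLAIN" advice, here is a small sketch that builds the three tables and the index from the question and asks the engine for its plan. Note that it uses SQLite through Python's sqlite3 module purely as a stand-in: PostgreSQL's planner and join strategies differ, so treat this only as a pattern for the experiment. The payload column and the index name tbl3_parent_some_attr are assumptions added for illustration.

```python
import sqlite3

def join_plan():
    """Return the plan text SQLite produces for the three-table join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl1 (id INTEGER PRIMARY KEY);
        CREATE TABLE tbl2 (id INTEGER PRIMARY KEY);
        -- payload is a hypothetical extra column that is not part of the index.
        CREATE TABLE tbl3 (parent_id INTEGER, some_col INTEGER,
                           attribute_id INTEGER, payload TEXT);
        CREATE INDEX tbl3_parent_some_attr
            ON tbl3 (parent_id, some_col, attribute_id);
    """)
    rows = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT *
        FROM tbl1 AS a
        INNER JOIN tbl2 AS b ON a.id = b.id
        LEFT JOIN tbl3 AS c
               ON b.id = c.parent_id AND c.some_col = 2 AND c.attribute_id = 3
    """).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail.
    return [row[-1] for row in rows]

for line in join_plan():
    print(line)
```

In PostgreSQL the equivalent check would be to run EXPLAIN on the real query against realistic data volumes, since the chosen join type depends on table statistics.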

Oracle performance questions, inner selects in joins, temporary WITH tables indexes

I would like to ask about three aspects of performance (Oracle 11g).
1./ If I define a temporary table with the keyword "WITH", like
WITH tbl AS (
SELECT [columns from both tables...]
FROM table_with_inexes
JOIN other_table ...
)
SELECT ...
FROM tbl
JOIN xxx ON tbl.column = xxx.column
is a subsequent select on that temporary table able to use the indexes that were defined on table_with_inexes and other_table?
2./ Is it possible to add indexes to a temporary table created by "WITH" within a single SQL command like the one above?
3./ When I have a construct such as this:
...
LEFT JOIN (
SELECT indexedColumn, otherColumns
FROM table
JOIN other_table
GROUP BY ...
) C
ON (outerTable.indexedColumn = C.indexedColumn)
in which cases could Oracle use indexes on indexedColumn? I assume that the select in the LEFT JOIN is only a "projection" that does not maintain indexes, so the join's ON clause is evaluated without using indexes?
The WITH clause (or subquery factoring, as it's also known) is just a means of creating aliases for subqueries. It's most useful when you have multiple copies of the same subquery in your query, in which case Oracle may or may not choose to create a temporary table for it behind the scenes (aka "materialize" it). You should read up on this.
To answer your questions:
1) If the indexes are available to be used (no functions on the columns involved, selecting a small percentage of the data etc, etc) then they'll be used, just like in any other query.
2) You can't add indexes to the subquery. Not even to the temporary table that Oracle might create behind the scenes; you have no control over that.
3) I suggest you read up about when indexes might or might not be used. Try http://www.orafaq.com/node/1403 or http://www.orafaq.com/tuningguide/not%20using%20index.html, or perform your own google search.
A WITH clause might be either inlined or materialized. It's up to Oracle to decide which approach is better. In your case both queries will most probably have the same execution plan (the subquery will be inlined).
PS: even if the table is materialized, indexes cannot be added; Oracle cannot do that. On the other hand, in most cases it is not even necessary: the table can be materialized as a hash table (not a heap table), or a full table scan is used on it.
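The general idea that a WITH clause is only an alias for a subquery, so indexes on the underlying table remain usable, can be tried out in any engine that exposes its plan. The sketch below does so in SQLite via Python's sqlite3 module. It is not Oracle, and Oracle's optimizer makes its own decisions, so this only illustrates the concept. The names base_table and base_id_idx are hypothetical.

```python
import sqlite3

def cte_plan():
    """Return SQLite's plan text for a filter applied on top of a WITH subquery."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE base_table (id INTEGER, val TEXT);
        CREATE INDEX base_id_idx ON base_table (id);
    """)
    rows = conn.execute("""
        EXPLAIN QUERY PLAN
        WITH tbl AS (SELECT id, val FROM base_table)
        SELECT val FROM tbl WHERE id = 5
    """).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail.
    return [row[-1] for row in rows]

print(cte_plan())
```

If the plan mentions base_id_idx, the engine has looked through the WITH alias and used the index on the underlying table. In Oracle, the same question is answered by the execution plan of the real statement.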

Index spanning multiple tables in PostgreSQL

Is it possible in PostgreSQL to place an index on an expression containing fields of multiple tables? For example, an index to speed up a query of the following form:
SELECT *, (table1.x + table2.x) AS z
FROM table1
INNER JOIN table2
ON table1.id = table2.id
ORDER BY z ASC
No, it's not possible to have an index spanning multiple tables. It also wouldn't be guaranteed to speed anything up, since you won't always get an Index Only Scan. What you really want is a materialized view, but pg doesn't have those either. You can try implementing it yourself using triggers.
Update
As noted by @petter, materialized views were introduced in 9.3.
No, that's not possible in any currently shipping SQL dbms. Oracle supports bitmap join indexes, but that might not be relevant. It's not clear to me whether you want an index on only the join columns of multiple tables, or whether you want an index on arbitrary columns of joined tables.
To determine the real source of performance problems, learn to read the output of PostgreSQL's EXPLAIN ANALYZE.
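The "implement it yourself using triggers" idea can be sketched as a summary table that stores table1.x + table2.x per id and carries an ordinary index on that value. The example below uses SQLite through Python's sqlite3 module so that it is runnable. The table joined_sum, its index, and the trigger names are hypothetical. It assumes the table1 row is inserted before its table2 partner and does not handle deletes, so it is a starting point rather than a complete solution.

```python
import sqlite3

def build_demo():
    """Create table1/table2 plus a trigger-maintained table holding table1.x + table2.x."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER NOT NULL);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, x INTEGER NOT NULL);

        -- Hypothetical summary table: one row per joined id, indexed on z.
        CREATE TABLE joined_sum (id INTEGER PRIMARY KEY, z INTEGER NOT NULL);
        CREATE INDEX joined_sum_z ON joined_sum (z);

        -- Assumption: the table1 row exists before its table2 partner is inserted.
        CREATE TRIGGER table2_insert AFTER INSERT ON table2
        BEGIN
            INSERT OR REPLACE INTO joined_sum (id, z)
            SELECT t1.id, t1.x + NEW.x FROM table1 AS t1 WHERE t1.id = NEW.id;
        END;

        CREATE TRIGGER table1_update AFTER UPDATE OF x ON table1
        BEGIN
            UPDATE joined_sum
            SET z = NEW.x + (SELECT x FROM table2 WHERE id = NEW.id)
            WHERE id = NEW.id;
        END;

        CREATE TRIGGER table2_update AFTER UPDATE OF x ON table2
        BEGIN
            UPDATE joined_sum
            SET z = NEW.x + (SELECT x FROM table1 WHERE id = NEW.id)
            WHERE id = NEW.id;
        END;
    """)
    return conn

conn = build_demo()
conn.executemany("INSERT INTO table1 (id, x) VALUES (?, ?)", [(1, 10), (2, 1), (3, 5)])
conn.executemany("INSERT INTO table2 (id, x) VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
conn.execute("UPDATE table1 SET x = 0 WHERE id = 1")

# The ORDER BY now runs against a single table that has an index on z.
ordered = conn.execute("SELECT id, z FROM joined_sum ORDER BY z ASC").fetchall()
print(ordered)  # [(1, 1), (2, 3), (3, 8)]
```

On PostgreSQL 9.3 or later, a materialized view with an index on z is an alternative. It has to be refreshed explicitly, whereas the triggers above keep the summary current on every write.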

mysql: which queries can utilize which indexes?

I'm using MySQL 5.0 and am a bit new to indexes. Which of the following queries can be helped by indexing, and which indexes should I create?
(Don't assume either table to have unique values. This isn't homework, it's just some examples I made up to try to get my head around indexing.)
Query1:
Select a.*, b.*
From a
Left Join b on b.type=a.type;
Query2:
Select a.*, b.*
From a,b
Where a.type=b.type;
Query3:
Select a.*
From a
Where a.type in (Select b.type from b where b.brand=5);
Here is my guess at which indexes would be used for these different kinds of queries:
Query1:
Create Index Query1 Using Hash on b (type);
Query2:
Create Index Query2a Using Hash on a (type);
Create Index Query2b Using Hash on b (type);
Query3:
Create Index Query3 Using Hash on b (brand,type);
Am I correct that neither Query1 nor Query3 would utilize any indexes on table a?
I believe these should all be hash indexes because there is only = or !=, right?
Thanks
Using the EXPLAIN command in MySQL will give a lot of great info on what MySQL is doing and how a query can be optimized.
In q1 and q2: an index on (a.type, all other a cols) and one on (b.type, all other b cols).
In q3: an index on (a.b_type, all other a cols) and one on b (brand, type).
Ideally, you'd want all the columns that were selected stored directly in the index so that MySQL doesn't have to jump from the index back to the table data to fetch the selected columns. However, that is not always manageable (i.e. sometimes you need to select * and indexing all columns is too costly), in which case indexing just the search columns is fine.
So everything you said works great.
query 3 is invalid, but i assume you meant
where a.type in ....
Query 1 is the same as query two, just better syntax, both probably have the same query plan and both will use both indexes.
Query 3 will use the index on b.brand, but not the type portion of it. It would also use an index on a.type if you had one.
You are right that they should be hash indexes.
Query 3 could utilize an index on a.type if the number of b's with brand=5 is close to zero
Query2 will utilize indices if they are B-trees (and thus are sorted). Using hash indices with an index join may slow down your query (because you'll have to read Size(a) values in a non-sequential way).
Query optimization and indexing is a huge topic, so you'll definitely want to read about MySQL and the specific storage engines you're using. "USING HASH" is supported by the MEMORY and NDB engines; InnoDB and MyISAM only support B-tree indexes, so a hash index requested on those engines ends up as a B-tree.
The joins you have will perform a full table or index scan even though the join condition is equality; every row will have to be read because there's no WHERE clause.
You'll probably be better off with a standard b-tree index, but measure it and investigate the query plan with "explain". MySQL InnoDB stores row data organized by primary key so you should also have a primary key on your tables, not just an index. It's best if you can use the primary key in your joins because otherwise MySQL retrieves the primary key from the index, then does another fetch to get the row. The nice exception to that rule is if your secondary index includes all the columns you need in the query. That's called a covering index and MySQL will not have to lookup the row at all.
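The covering-index effect described above can be observed in a query plan. The sketch below uses SQLite via Python's sqlite3 module as a stand-in (in MySQL you would look for "Using index" in the Extra column of EXPLAIN instead). The columns name and extra, the index a_type_name, and the helper plan are made up for illustration.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query-plan detail text for the given SELECT statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, type INTEGER, name TEXT, extra TEXT);
    CREATE INDEX a_type_name ON a (type, name);
""")

# Every selected column is in the index, so the table row is never fetched.
covered_plan = plan(conn, "SELECT name FROM a WHERE type = 1")
# "extra" is not in the index, so each match needs a second fetch from the table.
uncovered_plan = plan(conn, "SELECT name, extra FROM a WHERE type = 1")

print(covered_plan)
print(uncovered_plan)
```

The first plan should say the index is used as a covering index, while the second uses the same index without that qualifier, which corresponds to the extra row fetch described above.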