Any ways to speed up like 'foo%' queries in PostgreSQL?


I have many queries like
select * from table where (upper (column1) like 'FOO%')
and (upper (column2) like 'BAR%')
and (upper (column3) like 'XYZ%')
And such an index:
create index on table (upper(column1::text), upper(column2::text), upper(column3::text));
But for some reason the queries are pretty slow, and EXPLAIN shows that they don't use any index scan, just a simple seq scan. I've read that the B-tree index type is the best for queries like mine, where the pattern is anchored at the start and the wildcard comes at the end of the constant.
Any ideas why this happens? Maybe something is wrong with my index creation command?

For that, you need three indexes:
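/* "text_pattern_ops" makes the index usable for LIKE even if the database collation is not C;
   indexing upper(...) matches the expressions used in the WHERE clause */
CREATE INDEX ON "table" (upper(column1) text_pattern_ops);
CREATE INDEX ON "table" (upper(column2) text_pattern_ops);
CREATE INDEX ON "table" (upper(column3) text_pattern_ops);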
PostgreSQL will scan an index for each WHERE condition that promises to significantly reduce the number of rows. If it scans several indexes, it can combine the results with a bitmap AND. If one of these WHERE conditions is never selective, you can omit the corresponding index, since it won't be used.
You won't be able to cover that query with a single index.
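A toy model of that plan, assuming each index behaves like a sorted list of (key, row id) pairs: every selective condition gets its own prefix scan, and the per-index results are intersected, roughly what a PostgreSQL BitmapAnd does with real bitmaps. The data and the helper names (build_index, prefix_scan) are hypothetical and only for illustration.

```python
from bisect import bisect_left

# Toy model, not PostgreSQL internals: one "index" per column, stored as a
# sorted list of (upper(value), row_id) pairs.
def build_index(rows, col):
    return sorted((row[col].upper(), rid) for rid, row in enumerate(rows))

def prefix_scan(index, prefix):
    """Row ids whose key starts with prefix: seek to the first key >= prefix,
    then read forward until the prefix no longer matches."""
    pos = bisect_left(index, (prefix,))
    hits = set()
    while pos < len(index) and index[pos][0].startswith(prefix):
        hits.add(index[pos][1])
        pos += 1
    return hits

rows = [
    ("foo1", "bar1", "xyz1"),
    ("foo2", "baz", "xyz2"),
    ("qux", "bar2", "xyz3"),
    ("foo3", "bar3", "abc"),
]
indexes = [build_index(rows, c) for c in range(3)]

# One scan per selective condition, then intersect the row-id sets.
candidates = [prefix_scan(ix, p) for ix, p in zip(indexes, ["FOO", "BAR", "XYZ"])]
matching = set.intersection(*candidates)
print(sorted(matching))  # [0]
```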

Well, the use of the UPPER function on the three columns basically precludes any chance of a plain index on those columns being used. However, if you could ensure that you only store uppercase values in the three columns, then you could add an index:
CREATE INDEX idx ON yourTable (column1, column2, column3);
You would then use this version of your query:
SELECT *
FROM yourTable
WHERE column1 LIKE 'FOO%' AND column2 LIKE 'BAR%' AND column3 LIKE 'XYZ%';
The reason this index would work is that your LIKE expressions match prefixes starting from the very beginning of the column values. As a result, a B-tree index can be used to search for these prefixes (in PostgreSQL, as long as the database uses the C collation or the index is created with text_pattern_ops).
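Why a B-tree helps with a prefix: under plain character-by-character ordering, LIKE 'FOO%' selects exactly the keys in the half-open range ['FOO', 'FOP'), so an index can seek to the first key and stop at the upper bound. The sketch assumes code-point ordering (roughly what text_pattern_ops gives you in PostgreSQL); prefix_bounds and like_prefix are illustrative names, not a database API.

```python
from bisect import bisect_left

def prefix_bounds(prefix):
    """Half-open range [low, high) containing every string that starts with prefix.
    Assumes a non-empty prefix whose last character is not the maximum code point."""
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

def like_prefix(sorted_keys, prefix):
    # Two seeks into the sorted keys replace a scan over all of them.
    low, high = prefix_bounds(prefix)
    return sorted_keys[bisect_left(sorted_keys, low):bisect_left(sorted_keys, high)]

keys = sorted(["BAR", "FOO", "FOOBAR", "FOP", "FOZ", "FO", "XYZ"])
print(prefix_bounds("FOO"))      # ('FOO', 'FOP')
print(like_prefix(keys, "FOO"))  # ['FOO', 'FOOBAR']
```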

Related

In Oracle, if I make a composite index on 2 columns, in which situations will this index be used to search for a record?

In Oracle, if I make a composite index on 2 columns, in which situations will this index be used to search for a record?
a) If my query has a WHERE clause which involves first column
e.g. WHERE first_column = 'John'
b) If my query has a WHERE clause which involves second column
e.g. WHERE second_column = 'Sharma'
c) Either a or b
d) Both a and b
e) Not specifically these 2 columns but it could be any column in the WHERE clause.
f) Only column a or both columns a and b

I happen to think that MySQL does a pretty good job of describing how composite indexes are used. The documentation is here.
The basic idea is that the index would normally be used in the following circumstances:
When the WHERE condition is an equality on col1 (col1 = value).
When the WHERE condition is an inequality or IN on col1 (col1 in (list), col1 < value).
When the WHERE condition is an equality on col1 and col2, connected by AND (col1 = val1 and col2 = val2).
When the WHERE condition is an equality on col1 and an inequality or IN on col2.
Any of the above four cases, with additional conditions on other columns connected by AND.
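The rules above can be condensed into a simplified model of the leftmost-prefix rule: walk the index columns in order, continue through equality conditions, and stop after the first range or IN condition, or at the first column with no condition at all. Real optimizers have more cases (skip scans, for example), so usable_index_columns is a hypothetical helper for reasoning about the list, not any engine's actual algorithm.

```python
def usable_index_columns(index_cols, predicates):
    """Return the index columns whose conditions can narrow an index search.

    predicates maps a column name to "eq" (col = value) or "range"
    (col < value, col IN (list), ...); columns without a condition are absent.
    """
    used = []
    for col in index_cols:
        kind = predicates.get(col)
        if kind is None:   # gap in the prefix: later columns cannot narrow the search
            break
        used.append(col)
        if kind != "eq":   # in this model, columns after a range/IN are only filtered
            break
    return used

idx_cols = ("col1", "col2")
print(usable_index_columns(idx_cols, {"col1": "eq", "col2": "range"}))  # ['col1', 'col2']
print(usable_index_columns(idx_cols, {"col1": "range", "col2": "eq"}))  # ['col1']
print(usable_index_columns(idx_cols, {"col2": "eq"}))                   # []
```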
In addition, the index would normally be used if col1 and col2 are the only columns referenced in the query. This is called a covering index, and -- assuming there are other columns in the table -- it is faster to read the index than the original table because the index is smaller.
Oracle has a pretty smart optimizer, so it might also use the index in some related circumstances, for instance when col1 uses an in condition along with a condition on col2.
In general, a condition will not qualify for an index if the column is an argument to a function. So, these clauses would not use a basic index:
where month(col1) = 3
where trunc(col1) = trunc(sysdate)
where abs(col1) < 1
Oracle supports function-based indexes, so if these constructs are actually important, you can create an index on month(col1), trunc(col1), or abs(col1).
Also, OR tends to make the use of indexes less likely.
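SQLite's expression indexes play the same role as Oracle's function-based indexes, so SQLite can serve as a quick stand-in to see the effect: a plain index on col1 does not help WHERE upper(col1) = 'FOO', but an index on upper(col1) does. The table and index names are made up; in Oracle you would inspect the plan with EXPLAIN PLAN instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, other TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM t WHERE upper(col1) = 'FOO'"

con.execute("CREATE INDEX t_col1 ON t (col1)")
before = plan(query)   # the function hides col1, so this index is not usable

con.execute("CREATE INDEX t_upper_col1 ON t (upper(col1))")
after = plan(query)    # the indexed expression matches the WHERE expression

print(before)
print(after)
```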

d) Both a and b
If the leading column is used, Oracle will likely use a regular index range scan and just ignore the unused columns.
If a non-leading column is used, Oracle can use an index skip scan. In practice a skip scan is not used very often.
There are two completely different questions here: when can Oracle use an index and when will Oracle use an index. The above explains that Oracle can use an index in either case, and you can test that out with a hint: /*+ index(table_name index_name) */.
Determining when Oracle will use an index is much trickier. Oracle uses multi-block reads for full table scans and fast full index scans, and single-block reads for other index scans. This means a full table scan is more efficient when reading a larger percentage of the data. But there are a lot of factors involved: the percentage of data, how big the index is, system statistics that tell Oracle how fast single- and multi-block I/O are, the number of distinct values (especially important for choosing a skip scan), the index clustering factor (how well the table is ordered by the index columns), etc.
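To make the trade-off concrete, here is a deliberately simplified, hypothetical cost model in the spirit of the factors listed above. A full scan pays for the table blocks divided by the multi-block read count. An index range scan pays for the tree height, the matching share of leaf blocks, and table block visits in proportion to the clustering factor. Oracle's real costing has many more inputs, and every number below is invented.

```python
def ceil_div(a, b):
    return -(-a // b)

def full_scan_cost(table_blocks, multiblock_read_count):
    # Multi-block reads: one request fetches several adjacent blocks.
    return ceil_div(table_blocks, multiblock_read_count)

def index_scan_cost(height, leaf_blocks, clustering_factor, matching_rows, total_rows):
    # Single-block reads: descend the tree, read the matching share of leaf
    # blocks, then visit table blocks in proportion to the clustering factor.
    return (height
            + ceil_div(leaf_blocks * matching_rows, total_rows)
            + ceil_div(clustering_factor * matching_rows, total_rows))

def choose_plan(matching_rows, total_rows=1_000_000, table_blocks=10_000,
                multiblock_read_count=16, height=2, leaf_blocks=2_000,
                clustering_factor=500_000):
    full = full_scan_cost(table_blocks, multiblock_read_count)
    index = index_scan_cost(height, leaf_blocks, clustering_factor,
                            matching_rows, total_rows)
    return "index range scan" if index < full else "full table scan"

print(choose_plan(1_000))   # 0.1% of rows: index range scan (504 vs 625)
print(choose_plan(10_000))  # 1% of rows: full table scan (5022 vs 625)
```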

The optimizer will use indexes in several scenarios, even if they are not "perfect".
Optimally, if you are querying using the leading columns of the index, then the index will be used. Even if you're referencing only the first column, it will still use the index if the optimizer deems that it filters out enough data.
If the indexed columns don't match the query's conditions (for instance, only the second column is referenced in the WHERE clause), the optimizer could still use the index for a full index scan if the index holds all of the data required, because the index is smaller than the full table.
In your example, if you are only querying from that table and you only have that one index, (a) will use the index, and (b) will use it only if you are querying just the columns in the index while the table itself has more.
If you have other indexes, or join other tables, that could change the execution plan completely.
Check out http://docs.oracle.com/cd/B19306_01/server.102/b14231/indexes.htm

wildcard or "in list" when querying in Postgres

I have a few tables where I need to get the data related to foo. Each table has about 10^8 rows.
So I need to get all rows from these tables where the column includes the substring 'foo'.
select * from bar where my_col like '%foo%';
I know this is slow so I check the possible values:
select distinct my_col from bar where my_col like '%foo%';
-- => ('xx_foo', 'yy_foo', 'xx_foo_xx', 'foo' ... 'xx_foo_yy')
The number of possible values varies between 3 and 20.
Now how slow is '%foo%' really?
select * from bar where my_col like '%foo%';
-- or
select * from bar where my_col in('foo', 'xx_foo' ... 'foo_yy'); -- list_size = 20
Any general rule on when to use what, or is testing the speed for different cases the only way to go?
Edit: I do not own the table, and no index exists on the column I am searching. So it needs to do a full table scan no matter what.

If you use %foo%, you will get a full-table scan, which is slow.
If you use IN with a list of values, then an index can be used if one exists on the column on which you have the condition.
So, if you are able, you should avoid using %foo%. Depending on how often new values appear in the table, you might consider keeping an extra table that holds the distinct values, using it when querying your main table, and updating it whenever a new distinct value comes into play (if that is possible in your design).
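Both answers can be checked quickly with SQLite as a stand-in, since its EXPLAIN QUERY PLAN output is easy to read from Python. The index bar_my_col is hypothetical (the asker has none): with it, the IN list becomes a handful of index lookups, while the leading % still forces a scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, my_col TEXT, payload TEXT)")
con.execute("CREATE INDEX bar_my_col ON bar (my_col)")  # hypothetical index

def plan(sql):
    # Keep only the human-readable detail column of each plan row.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

like_plan = plan("SELECT * FROM bar WHERE my_col LIKE '%foo%'")
in_plan = plan("SELECT * FROM bar WHERE my_col IN ('foo', 'xx_foo', 'yy_foo')")

print(like_plan)  # a full scan of bar
print(in_plan)    # a search using bar_my_col
```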

A search using the LIKE operator will surely lead to a table scan when the pattern starts with a %. When using the IN operator, if the values cover no more than a few percent of the rows in the table, an index can be used, if it exists. Check the cardinality concept:
http://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29
The DBMS knows about cardinalities by keeping statistics about the tables. If your column has high cardinality and an index on it, then an index scan is likely when using the IN operator. To update the statistics, issue an ANALYZE command.
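The statistics can be inspected in SQLite, used here as a stand-in because it also updates them with ANALYZE. After ANALYZE, sqlite_stat1 records the number of rows in each index followed by the average number of rows per distinct value, which is how the planner estimates cardinality. The table and data are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, my_col TEXT)")
con.execute("CREATE INDEX bar_my_col ON bar (my_col)")

# 1,000 rows spread over 100 distinct values, i.e. 10 rows per value.
con.executemany("INSERT INTO bar (my_col) VALUES (?)",
                [(f"value_{i % 100}",) for i in range(1000)])
con.execute("ANALYZE")

stat = con.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'bar_my_col'").fetchone()[0]
print(stat)  # expected to start with "1000 10": index rows, then rows per distinct value
```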

Do indexes work with group functions in Oracle?

I am running following query.
SELECT Table_1.Field_1,
Table_1.Field_2,
SUM(Table_1.Field_5) BALANCE_AMOUNT
FROM Table_1, Table_2
WHERE Table_1.Field_3 NOT IN (1, 3)
AND Table_2.Field_2 <> 2
AND Table_2.Field_3 = 'Y'
AND Table_1.Field_1 = Table_2.Field_1
AND Table_1.Field_4 = '31-oct-2011'
GROUP BY Table_1.Field_1, Table_1.Field_2;
I have created an index on columns (Field_1,Field_2,Field_3,Field_4) of Table_1, but the index is not getting used.
If I remove SUM(Table_1.Field_5) from the select clause, then the index gets used.
I am confused whether the optimizer is simply not using this index or whether it's because of the SUM() function I have used in the query.
Please share your explanation.

When you remove the SUM you also remove field_5 from the query. All the data needed to answer the query can then be found in the index, which may be quicker than scanning the table. If you added field_5 to the index the query with SUM might use the index.

If your query returns a large percentage of the table's rows, Oracle may decide that doing a full table scan is cheaper than "hopping" between the index and the table's heap (to get the values in Table_1.Field_5).
Try adding Table_1.Field_5 to the index (thus covering the whole query with the index) and see if this helps.
See "Index-Only Scan: Avoiding Table Access" at Use The Index, Luke for a conceptual explanation of what is going on.
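Here is a small SQLite sketch of the covering-index suggestion. It is simplified to Table_1 alone with a hypothetical condition field_1 = 10, and SQLite stands in for Oracle, though the idea carries over. Without field_5 in the index, each matching index entry still needs a table lookup. With field_5 appended, the plan reports a covering index.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE table_1 (
    field_1 INTEGER, field_2 INTEGER, field_3 INTEGER,
    field_4 TEXT, field_5 NUMERIC)""")

def plan(sql):
    return " | ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT field_1, field_2, SUM(field_5) FROM table_1 "
         "WHERE field_1 = 10 GROUP BY field_1, field_2")

con.execute("CREATE INDEX ix_without_5 ON table_1 (field_1, field_2, field_3, field_4)")
plan_without = plan(query)  # index search, then a table lookup for field_5

con.execute("DROP INDEX ix_without_5")
con.execute("CREATE INDEX ix_with_5 ON table_1 (field_1, field_2, field_3, field_4, field_5)")
plan_with = plan(query)     # everything the query needs is in the index

print(plan_without)
print(plan_with)
```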

As you mentioned, the presence of the summation function results in the index being overlooked.
There are function based indexes:
A function-based index includes columns that are either transformed by a function, such as the UPPER function, or included in an expression, such as col1 + col2.
Defining a function-based index on the transformed column or expression allows that data to be returned using the index when that function or expression is used in a WHERE clause or an ORDER BY clause. Therefore, a function-based index can be beneficial when frequently-executed SQL statements include transformed columns, or columns in expressions, in a WHERE or ORDER BY clause.
However, as with everything, function-based indexes have their restrictions:
Expressions in a function-based index cannot contain any aggregate functions. The expressions must reference only columns in a row in the table.

Though I see some good answers here, a couple of important points are being missed:
Saying that having SUM(Table_1.Field_5) in the select clause causes the index not to be used is not correct. Your index on (Field_1,Field_2,Field_3,Field_4) can still be used. But there are problems with your index and SQL query.
Since your index is only on (Field_1,Field_2,Field_3,Field_4), even if the index gets used, the database will have to access the actual table row to fetch Field_5 for the SUM. Which path is taken then depends entirely on the execution plan that the SQL optimizer considers most cost-effective: if it figures out that a full table scan costs less than using the index, it will ignore the index. With that said, here are the probable problems with your index:
As others have stated, you could simply add Field_5 to the index so that there is no need for separate table access.
The column order of your index matters very much for performance. For example, in your case, if you order it as (Field_4,Field_1,Field_2,Field_3), it will be quicker, since you have an equality condition on Field_4: Table_1.Field_4 = '31-oct-2011'. Think of it this way:
Table_1.Field_4 = '31-oct-2011' will leave fewer candidate rows to choose the final result from than Table_1.Field_3 NOT IN (1, 3). Things might change since you are doing a join. It's always best to look at the execution plan and design your index and SQL accordingly.
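The effect of column order can be sketched with SQLite as a stand-in for Oracle, again using Table_1 alone without the join. With (field_1, field_2, field_3, field_4), nothing constrains the leading column, so the planner falls back to a scan (SQLite generally needs ANALYZE statistics before it considers a skip scan). With field_4 first, the equality condition can drive an index search. The index names are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE table_1 (
    field_1 INTEGER, field_2 INTEGER, field_3 INTEGER,
    field_4 TEXT, field_5 NUMERIC)""")

def plan(sql):
    return " | ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT field_5 FROM table_1 "
         "WHERE field_3 NOT IN (1, 3) AND field_4 = '31-oct-2011'")

con.execute("CREATE INDEX ix_1234 ON table_1 (field_1, field_2, field_3, field_4)")
leading_unconstrained = plan(query)  # leading column has no condition

con.execute("CREATE INDEX ix_4123 ON table_1 (field_4, field_1, field_2, field_3)")
leading_equality = plan(query)       # equality on the leading column

print(leading_unconstrained)
print(leading_equality)
```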

Does the Column Order in the WHERE clause matter for Index Selection?

Suppose I'm running a query that has:
WHERE column1 = "value1"
AND column2 = "value2"
column1 is indexed, and column2 is not. Does the order of my WHERE clause matter? Should I run a subquery over the indexed column first? Or, is SQL smart enough to automatically query over the indexed column first?

The order in the SQL statement does not matter, certainly not for single-column indexes.
Composite indexes (indexes on more than one column) require that the query reference at least one column, starting from the left of the list. For example, a composite index defined as (column1, column2, column3) needs queries to at least reference column1 in order to use the index. A query that only references column2, or a combination of column2 and column3, would typically not use the composite index.
That said, the optimizer's index decisions are determined by table statistics and by how fragmented the index is at the time of the query. Neither of these is maintained automatically because, depending on the amount of data, updating them can be very time-consuming (so you wouldn't want it happening all the time). Having an index doesn't guarantee that the index will always be used.
Indexes are also not part of the ANSI SQL standard, but surprisingly, vendors (MySQL, Oracle, etc.) have relatively similar syntax and naming.

For that query, either of these is optimal:
INDEX(column1, column2)
INDEX(column2, column1)
The order of things in the WHERE does not matter; the order of the columns in an INDEX does matter, sometimes a lot.
Cardinality does not matter.
More on creating optimal indexes for MySQL; much of that should be relevant to other engines.
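Here is a quick SQLite check that the order of conditions in the WHERE clause does not matter: both orderings produce the same plan against an index on (column1, column2). String literals use single quotes here, because in standard SQL double quotes denote identifiers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (column1 TEXT, column2 TEXT, column3 TEXT)")
con.execute("CREATE INDEX t_c1_c2 ON t (column1, column2)")

def plan(sql):
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

a = plan("SELECT * FROM t WHERE column1 = 'value1' AND column2 = 'value2'")
b = plan("SELECT * FROM t WHERE column2 = 'value2' AND column1 = 'value1'")

print(a == b)  # True: the planner normalizes the order
print(a)
```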

The order that you type your where clause does not matter -- the execution planner for the database will sort that out.
In the example you show above, the rows matching column1 will be looked up first because it is indexed, and then the value of column2 is checked.

If I remember correctly, the order of clauses is not significant. It's all part of the same execution plan, so if you view the execution plan, you will notice that the WHERE clause on a non-indexed field will be very expensive, regardless of the order you put it in.
If it is a highly queried field, you are better off having it in a nonclustered index, or at the very least in an INCLUDE clause of the index.

Does SQLite multi column primary key need an additional index?

If I create a table like so:
CREATE TABLE something (column1, column2, PRIMARY KEY (column1, column2));
Neither column1 nor column2 are unique by themselves. However, I will do most of my queries on column1.
Does the multi column primary key create an index for both columns separately? I would think that if you specify a multi column primary key it would index them together, but I really don't know.
Would there be any performance benefit to adding a UNIQUE INDEX on column1?

There will probably not be a performance benefit, because queries against col1=xxx and col2=yyy would use the same index as queries like col1=zzz with no mention of col2. But my experience is only Oracle, SQL Server, Ingres, and MySQL. I don't know for sure.

You certainly don't want to add a unique index on column 1 as you just stated:
Neither column1 nor column2 are unique by themselves.
If column one comes first, it will be first in the multicolumn index in most databases and thus it is likely to be used. The second column is the one that might not use the index. I wouldn't add one on the second column unless you see problems and again, I would add an index not a unique index based on the comment you wrote above.
But SQLite must have some way of showing what it is using, like most other databases, right? Set the PK and see whether queries using just column1 are using it.

I stumbled across this question while researching this same question, so figured I'd share my findings. Note that all of the below is tested on SQLite 3.39.4. I make no guarantees about how it will hold up on old/future versions. That said, SQLite is not exactly known for radically changing behavior at random.
To give a concrete answer for SQLite specifically: an index on column1 would provide no benefits, but an index on column2 would.
Let's look at a simple SQL script:
CREATE TABLE tbl (
column1 TEXT NOT NULL,
column2 TEXT NOT NULL,
val INTEGER NOT NULL,
PRIMARY KEY (column1, column2)
);
-- Uncomment to make the final SELECT fast
-- CREATE INDEX column2_ix ON tbl (column2);
EXPLAIN QUERY PLAN SELECT val FROM tbl WHERE column1 = 'column1' AND column2 = 'column2';
EXPLAIN QUERY PLAN SELECT val FROM tbl WHERE column1 = 'column1';
EXPLAIN QUERY PLAN SELECT val FROM tbl WHERE column2 = 'column2';
EXPLAIN QUERY PLAN is SQLite's method of allowing you to inspect what its query planner is actually going to do.
You can execute the script via something like:
$ sqlite3 :memory: < sample.sql
This gives the output
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=? AND column2=?)
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=?)
QUERY PLAN
`--SCAN tbl
So the first two queries, the ones which SELECT on (column1, column2) and (column1), will use the index to perform the search, which should be nice and fast.
Note that the last query, the SELECT on (column2) has different output, though. It says it's going to SCAN the table -- that is, go through each row one by one. This will be significantly less performant.
What happens if we uncomment the CREATE INDEX in the above script? This will give the output
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=? AND column2=?)
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=?)
QUERY PLAN
`--SEARCH tbl USING INDEX column2_ix (column2=?)
Now the query on column2 will also use an index, and should be just as performant as the others.
