Is substr or LIKE faster in Oracle? - sql

Would
WHERE substr(my_field,1,6) = 'search'
or
WHERE my_field LIKE 'search%'
be faster in Oracle, or would there be no difference?

Assuming maximum performance is the goal, I would ideally choose SUBSTR(my_field,1,6) and create a function-based index to support the query.
CREATE INDEX my_substr_idx
ON my_table( substr( my_field,1,6 ) );
As others point out, SUBSTR(my_field,1,6) would not be able to use a regular index on MY_FIELD. The LIKE version might use the index, but the optimizer's cardinality estimates in that case are generally rather poor so it is quite likely to either not use an index when it would be helpful or to use an index when a table scan would be preferable. Indexing the actual expression will give the optimizer far more information to work with so it is much more likely to pick the index correctly. Someone smarter than I am may be able to suggest a way to use statistics on virtual columns in 11g to give the optimizer better information for the LIKE query.
If 6 is a variable (i.e. you sometimes want to search the first 6 characters and sometimes want to search a different number), you probably won't be able to come up with a function-based index to support that query. In that case, you're probably better off with the vagaries of the optimizer's decisions with the LIKE formulation.
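As a rough sketch of the function-based-index approach described above (Oracle syntax, assuming you are connected as the owner of my_table): once the index exists, gathering statistics on the hidden virtual column that Oracle creates behind it gives the optimizer column-level information about the expression. The method_opt value here is one possible choice for illustration, not something the answer prescribes.

```sql
-- Assumes the function-based index already exists:
--   CREATE INDEX my_substr_idx ON my_table (SUBSTR(my_field, 1, 6));

-- Gather statistics, including the hidden column behind the index expression.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'MY_TABLE',
    method_opt => 'FOR ALL HIDDEN COLUMNS SIZE AUTO',
    cascade    => TRUE);
END;
/

-- Check whether the optimizer now picks the index for the matching predicate.
EXPLAIN PLAN FOR
SELECT * FROM my_table WHERE SUBSTR(my_field, 1, 6) = 'search';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The predicate has to use the same expression as the index, SUBSTR(my_field, 1, 6), for the index to be considered.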

Of the two options provided, definitely LIKE. The substring method will have to be executed against all rows in the table. Using LIKE will allow the use of indexes.
To check my answer, just profile the results. It should be clear as day.

If you have an index on my_field, then LIKE may be faster. Do your own benchmarks.

If you have no index, then there is no difference, because Oracle does a full table scan and evaluates the expression for each row.
You can put an index on the column to speed up both queries.
CREATE INDEX my_like_idx
ON my_table( my_field );
This index is more flexible and speeds up the query using LIKE. It works for any comparison that starts with literal characters and has the placeholder (%) at the end. Oracle does an index range scan to find all matching rows.
CREATE INDEX my_substr_idx
ON my_table( substr( my_field,1,6 ) );
This index speeds up the query with substr, but it is very specific: it only helps comparisons on the first 6 characters.
If you query for a piece starting in the middle of the string, creating a function-based index will help.
WHERE substr(my_field,2,5) = 'earch'
WHERE my_field like '%earch%'
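A minimal sketch for the middle-of-string case (Oracle syntax, assuming the same my_table; the index name my_mid_substr_idx is hypothetical). Note that the two predicates above are not equivalent: the SUBSTR form matches 'earch' only at position 2, while LIKE '%earch%' matches it anywhere, and a leading wildcard generally rules out an index range scan on my_field.

```sql
-- Function-based index on the exact expression used in the predicate.
CREATE INDEX my_mid_substr_idx
  ON my_table (SUBSTR(my_field, 2, 5));

-- This predicate can use my_mid_substr_idx.
SELECT *
FROM   my_table
WHERE  SUBSTR(my_field, 2, 5) = 'earch';
```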

There are really two issues here:
For which one will Oracle produce the more accurate cardinality and cost estimate?
Which method is more flexible in terms of potential access methods?
This may vary by version, but both are pretty easy to test and that way you're sure that you have the best information for your version and your data.
Run execution plans for both queries using ...
explain plan for
select ... from ... where my_field LIKE 'search%';
select * from table(dbms_xplan.display);
and
explain plan for
select ... from ... where substr(my_field,1,6) = 'search';
select * from table(dbms_xplan.display);
You may see a difference in the execution plan, depending on the presence of indexes etc., but also compare the cardinality estimates with the actual result that you get from:
select count(*) from ... where my_field LIKE 'search%';
One of the two methods may be significantly more accurate than the other.
If neither of them is very accurate and this query is expected to run for a non-trivial amount of time, then consider using dynamic sampling to improve the estimate, because with the wrong cardinality estimate the optimizer may choose a suboptimal access method anyway.
explain plan for
select /*+ dynamic_sampling(4) */ ... from ... where substr(my_field,1,6) = 'search';
select * from table(dbms_xplan.display);
As far as index usage goes, both methods could use an index-based access method. The LIKE predicate is probably more index friendly and could use a range scan or a fast full index scan. The SUBSTR method can certainly use the fast full index scan, but whether the optimizer will consider a range scan is best tested on your own version -- my recollection is that it won't but who's to say that substr(my_column,1,n) won't be recognised as a special case, if not now then in the future?
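One way to compare estimated and actual row counts in a single step, sketched here for Oracle 10g and later, is the gather_plan_statistics hint together with DBMS_XPLAN.DISPLAY_CURSOR. The table name my_table is an assumption (the queries above leave it elided), and the session needs access to the relevant V$ views.

```sql
-- Run the query once with row-source statistics collection enabled.
SELECT /*+ gather_plan_statistics */ COUNT(*)
FROM   my_table
WHERE  my_field LIKE 'search%';

-- Show the plan of the last statement executed in this session,
-- with estimated (E-Rows) and actual (A-Rows) row counts side by side.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

In SQL*Plus this usually needs SET SERVEROUTPUT OFF first, otherwise the "last statement" is the internal DBMS_OUTPUT call rather than your query. Repeating the same two steps with the SUBSTR predicate shows which formulation gets the more accurate estimate.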

I would profile both. But I would guess that LIKE would be much faster, because it can do a binary search on the index (if the field is indexed). If you use the SUBSTR method, you will end up with a full table scan, as Oracle has to evaluate the function row by row.

Related

How to use index in SQL query

Well, I am new to this stuff. I have created an index at the start of my SP as follows:
Create Index index_fab
ON TblFab (Fab_name)
Now I have this query under it:
select fab_name from TblFab where artc = 'x' and atelr = 'y'
Is it necessary to use the index name in the select statement, or will it automatically be used to speed up queries?
Do I have to use something like
select fab_name from TblFab WITH(INDEX(index_fab)) where artc = 'x' and atelr = 'y'
or is there another method to use this index in a query?
Also, how do I use an index if we are joining on this table?
Firstly, do you mean you're creating the index in a stored procedure? That's a bad idea - if you run the stored procedure twice, it will fail because the index already exists.
Secondly, your query doesn't use the column mentioned in the index, so it will have no impact.
Thirdly, as JodyT writes, the query analyzer (SQL Server itself) will decide which index to use; it's almost certainly better at it than you are.
Finally, to speed up the query you mention, create an index on columns artc and atelr.
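A minimal T-SQL sketch of the last two points, assuming the table lives in the dbo schema; the index name IX_TblFab_artc_atelr is hypothetical. The existence check avoids the "index already exists" failure if the script runs twice, and INCLUDE (Fab_name) is an optional extra that lets the query be answered from the index alone.

```sql
-- Create the index only if it is not there yet.
IF NOT EXISTS (
    SELECT 1
    FROM   sys.indexes
    WHERE  name = 'IX_TblFab_artc_atelr'
      AND  object_id = OBJECT_ID('dbo.TblFab')
)
BEGIN
    CREATE NONCLUSTERED INDEX IX_TblFab_artc_atelr
        ON dbo.TblFab (artc, atelr)
        INCLUDE (Fab_name);
END;

-- No hint needed: the optimizer can pick the index by itself.
SELECT Fab_name
FROM   dbo.TblFab
WHERE  artc = 'x' AND atelr = 'y';
```

Index creation like this normally belongs in a deployment script rather than in the stored procedure itself.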
The SQL Server query optimizer will decide whether the index is suitable for the query. You normally should not try to force a specific index: a table hint can name the index you want, but leaving the choice to the optimizer is almost always better.
As other people have answered your question directly, my opinion is that you should first understand why you need indexes. Indexes can improve performance, but they can also cause performance issues. It's better to know when and why you need an index than just how to use one.
You can read almost every little detail here.
Regarding your example, your index has no impact on the query because the indexed column does not appear in the query's WHERE clause.
You can also try:
CREATE INDEX yourIndexName
ON yourTableName (column_you_are_looking_for1,column_you_are_lookingfor2)
Also good to know: If no index exists on a table, a table scan must be performed for each table referenced in a database query. The larger the table, the longer a table scan takes because a table scan requires each table row to be accessed sequentially. Although a table scan might be more efficient for a complex query that requires most of the rows in a table, for a query that returns only some table rows an index scan can access table rows more efficiently. (source from here )
Hope this helps.
An index should be used by default if you run a query against the table using it.
But I think in the query you posted it will not be used, because you are not filtering your data by the column you created your index on.
I think you would have to create the index for the artc and atelr columns to profit from that.
To see whether your index is used, take a look at the execution plan in SQL Server Management Studio.
More info on indexes: Use The Index, Luke.
You don't need to include the index in your query; it's managed by SQL Server. You also don't need to name the index in the select if you join to this table. Hope that's clear.
Your index uses the "Fab_name" column, which you don't filter on in your select statement, so it's of no use.
Since you're new to this, you might benefit from an index like this :
Create Index index_fab
ON TblFab (artc, atelr)
or maybe like this
Create Index index_fab
ON TblFab (atelr, artc)
...yes there are a lot of subtleties to learn.
For better performance:
List out the columns /tables which are frequently used,
Create index on those tables/columns only.
If the index is properly set up, the optimizer will use it automatically. By properly set up, I mean that it's selective enough, can effectively help the query, etc. Read about it. You can check for yourself whether the index is being used with the "Include Actual Execution Plan" option in SSMS.
It's generally advised not to use WITH(INDEX()) hints and to let the optimizer decide by itself, except in very special cases when you just know better ;).
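If you prefer a text plan over the graphical one in SSMS, a sketch like the following (T-SQL, run from SSMS or sqlcmd, where GO separates batches) prints the estimated plan without executing the query. The index name you look for in the output depends on what you created.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- The query is compiled but not executed; the plan text is returned instead.
SELECT fab_name
FROM   TblFab
WHERE  artc = 'x' AND atelr = 'y';
GO

SET SHOWPLAN_TEXT OFF;
GO
```

An Index Seek on your index in the output means it is being used; a Table Scan or Clustered Index Scan means it is not.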

Filter Performance in Cognos SUBSTRING Vs LIKE

I'm working in Cognos 10.1 now.
I'd like to find the names which start with 'AB', 'CE', 'JA'. I'm concerned about the query's performance as the query subject (table) contains about 1,000,000+ records. Which filter should I use?
substring ([Participant],1,2) in ('AB', 'CE', 'JA')
or
[Participant] like 'AB%' or [Participant] like 'CE%' or [Participant] like 'JA%'
Which would execute faster?
My experience in the past has suggested that if you're interested in strings that start with a particular character set then LIKE will yield a better result than the SUBSTR method, but the benefits generally only appear for strings of sufficient length for the optimiser to believe that an index scan is beneficial. This has usually been more than two characters, as I recall, so you may not see benefits in your case.
With the substr() predicate, in the absence of a function based index (see below), the best sort of index access you can hope for is a fast full index scan, which would generally not be as good as a regular index access method which Like might allow.
However, it is possible to define a function-based index on Substr(participant,1,2) that could be used by the substr() predicate. It would only be worthwhile if the start and length arguments on the substr (1 and 2 in your case) are fixed. A bitmap index may be a good choice if the table's modification patterns make bitmap indexes suitable in general.
You don't have an index on that column. So the only choice the optimizer has is a Full Table Scan. Frankly, the precise syntax of the filter isn't going to make any difference to that cost.
As David suggests, building a function-based index on substr(participant,1,2) could give you some benefits. But it's only worthwhile if this is the sort of query you'll run a lot.
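A minimal sketch of that function-based index, assuming the underlying database is Oracle (the question does not say) and that the query subject maps to a table called participants, which is a hypothetical name; the index name is hypothetical too.

```sql
-- Index the fixed two-character prefix.
CREATE INDEX participants_prefix_idx
  ON participants (SUBSTR(participant, 1, 2));

-- The filter must use the same expression for the index to apply.
SELECT participant
FROM   participants
WHERE  SUBSTR(participant, 1, 2) IN ('AB', 'CE', 'JA');
```

Whether Cognos passes the substring filter through to the database in exactly this form is worth checking in the generated native SQL.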

How to use index in select statement?

Let's say in the employee table, I have created an index (idx_name) on the emp_name column.
Do I need to explicitly specify the index name in the select statement, or will it automatically be used to speed up queries?
If it has to be specified in the select statement, what is the syntax for using an index in a select query?
If you want to test the index to see if it works, here is the syntax:
SELECT *
FROM Table WITH(INDEX(Index_Name))
The WITH statement will force the index to be used.
Good question,
Usually the DB engine should automatically select the index to use based on query execution plans it builds. However, there are some pretty rare cases when you want to force the DB to use a specific index.
To be able to answer your specific question you have to specify the DB you are using.
For MySQL, you want to read the Index Hint Syntax documentation on how to do this
For example, you can use index hints in a SELECT statement this way:
SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;
SELECT * FROM table1 IGNORE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
SELECT * FROM t1 USE INDEX (i1) IGNORE INDEX (i2) USE INDEX (i2);
There are many more ways; check this.
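To see whether a MySQL hint actually changes anything, you can put EXPLAIN in front of the statement. The sketch below reuses table1 and col1_index from the examples above and uses FORCE INDEX, which is a stronger variant of USE INDEX.

```sql
-- The "key" column of the EXPLAIN output shows which index MySQL chose.
EXPLAIN
SELECT *
FROM   table1 FORCE INDEX (col1_index)
WHERE  col1 = 1 AND col2 = 2 AND col3 = 3;
```

Running the same EXPLAIN without the hint shows what the optimizer would pick on its own.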
Do I need to explicitly specify?
No, there is no need to specify it explicitly.
The DB engine should automatically select the index to use based on the query execution plans it builds (from Tudor Constantin's answer).
The optimiser will judge whether using your index will make your query run faster, and if so, it will use the index (from niktrl's answer).
In general, the index will be used if the assumed cost of using the index, and then possibly having to perform further bookmark lookups is lower than the cost of just scanning the entire table.
If your query is of the form:
SELECT Name from Table where Name = 'Boris'
And 1 row out of 1000 has the name Boris, it will almost certainly be used. If everyone's name is Boris, it will probably resort to a table scan, since the index is unlikely to be a more efficient strategy to access the data.
If it's a wide table (lots of columns) and you do:
SELECT * from Table where Name = 'Boris'
Then it may still choose to perform the table scan, if it's a reasonable assumption that it's going to take more time retrieving the other columns from the table than it will to just look up the name, or again, if it's likely to be retrieving a lot of rows anyway.
The optimiser will judge if the use of your index will make your query run faster, and if it is, it will use the index.
Depending on your RDBMS you can force the use of an index, although it is not recommended unless you know what you are doing.
In general, you should index columns that you use in table joins and WHERE clauses.
Generally, when you create an index on a table, database will automatically use that index while searching for data in that table. You don't need to do anything about that.
However, in MSSQL, you can specify an index hint, which says that a particular index should be used to execute the query. More information about this can be found here.
Index hints also seem to be available for MySQL. Thanks to Tudor Constantin.
By using the column that the index is applied to within your conditions, it will be included automatically. You do not have to use it, but it will speed up queries when it is used.
SELECT * FROM TABLE WHERE attribute = 'value'
Will use the appropriate index.
The index hint is only available for Microsoft Dynamics database servers.
For traditional SQL Server, the filters you define in your 'Where' clause should persuade the engine to use any relevant indices...
Provided the engine's execution plan can efficiently identify how to read the information (whether a full table scan or an indexed scan) - it must compare the two before executing the statement proper, as part of its built-in performance optimiser.
However, you can force the optimiser to scan by using something like
Select *
From [yourtable] With (Index(0))
Where ...
Or to use the clustered index (as a scan or a seek) by using something like
Select *
From [yourtable] With (Index(1))
Where ...
The choice is yours. Look at the table's index properties in the object panel to get an idea of which index you want to use. It ought to match your filter(s).
For best results, list the filters which would return the fewest results first.
I don't know if I'm right in saying, but it seems like the query filters are sequential; if you get your sequence right, the optimiser shouldn't have to do it for you by comparing all the combinations, or at least not begin the comparison with the more expensive queries.
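For completeness, a SQL Server hint can also name the index instead of using 0 or 1. A minimal sketch using the names from the question (employee, idx_name, emp_name); the search value is made up:

```sql
-- Table hint naming a specific index (T-SQL).
SELECT emp_name
FROM   employee WITH (INDEX(idx_name))
WHERE  emp_name = 'Boris';
```

This is mainly useful for testing; in normal use the hint is left out and the optimizer chooses.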

Multiple indexes on one column

Using Oracle, there is a table called User.
Columns: Id, FirstName, LastName
Indexes: 1. PK(Id), 2. UPPER(FirstName), 3. LOWER(FirstName), 4. Index(FirstName)
As you can see index 2, 3, 4 are indexes on the same column - FirstName.
I know this creates overhead, but my question is on selecting how will the database react/optimize?
For instance:
SELECT Id FROM User u WHERE
u.FirstName LIKE 'MIKE%'
Will Oracle hit the right index or will it not?
The problem is that via Hibernate this slows down the query VERY much (so it uses prepared statements).
Thanks.
UPDATE: Just to clarify indexes 2 and 3 are functional indexes.
In addition to Mat's point that either index 2 or 3 should be redundant because you should choose one approach to doing case-insensitive searches and to Richard's point that it will depend on the selectivity of the index, be aware that there are additional concerns when you are using the LIKE clause.
Assuming you are using bind variables (which it sounds like you are based on your use of prepared statements), the optimizer has to guess at how selective the actual bind value is going to be. Something short like 'S%' is going to be very non-selective, causing the optimizer to generally prefer a table scan. A longer string like 'Smithfield-Manning%', on the other hand, is likely to be very selective and would likely use index 4. How Oracle handles this variability will depend on the version.
Oracle 9i introduced bind variable peeking. This meant that the first time Oracle parsed a query after a reboot (or after the query plan was aged out of the shared pool), Oracle looked at the bind value and decided what plan to use based on that value. Assuming that most of your queries would benefit from the index scan because users are generally searching on relatively selective values, this was great if the first query after a reboot had a selective condition. But if you got unlucky and someone did a WHERE firstname LIKE 'S%' immediately after a reboot, you'd be stuck with the table scan query plan until the query plan was removed from the shared pool.
Starting in Oracle 11, however, the optimizer has the ability to do adaptive cursor sharing. That means that the optimizer will try to figure out that WHERE firstname LIKE 'S%' should do a table scan and WHERE firstname LIKE 'Smithfield-Manning%' should do an index scan and will maintain multiple query plans for the same statement in the shared pool. That solves most of the problems that we had with bind variable peeking in earlier versions.
But even here, the accuracy of the optimizer's selectivity estimates is generally going to be problematic for medium-length strings. It's generally going to know that a single-character string is very weakly selective and that a 20 character string is highly selective but even with a 256 bucket histogram, it's not going to have a whole lot of information about how selective something like WHERE firstname LIKE 'Smit%' really is. It may know roughly how selective 'Sm%' is based on the column histogram but it's guessing rather blindly at how selective the next two characters are. So it's not uncommon to end up in a situation where most of the queries work efficiently but the optimizer is convinced that WHERE firstname LIKE 'Cave%' isn't selective enough to use an index.
Assuming that this is a common query, you may want to consider using Oracle's plan stability features to force Oracle to use a particular plan regardless of the value of a bind variable. This may mean that users that enter a single character have to wait even longer than they would otherwise have waited because the index scan is substantially less efficient than doing a table scan. But that may be worth it for other users that are searching for short but reasonably distinctive last names. And you may do things like add a ROWNUM limiter to the query or add logic to the front end that requires a minimum number of characters in the search box to avoid situations where a table scan would be more efficient.
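A rough sketch of the ROWNUM limiter idea in Oracle syntax. The table name users, the index name users_firstname_idx, and the bind name :name_prefix are all hypothetical, and the INDEX hint is a simpler stand-in for the plan stability features mentioned above (stored outlines or SQL plan baselines), not the same mechanism.

```sql
-- Ask for the index on firstname and stop after the first 50 matches,
-- so even a weakly selective prefix cannot return an unbounded result.
SELECT /*+ INDEX(u users_firstname_idx) */ u.id
FROM   users u
WHERE  u.firstname LIKE :name_prefix || '%'
AND    ROWNUM <= 50;
```

A minimum prefix length would still be enforced in the front end, as suggested above.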
It's a bit strange to have both the upper and lower function-based indexes on the same field. And I don't think the optimizer will use either in your query as it is.
You should pick one or the other (and probably drop the last one too), and only ever query on the upper (or lower)-case with something like:
select id from user u where upper(u.firstname) like 'MIKE%'
Edit: look at this post too, it has some interesting info: How to use a function-based index on a column that contains NULLs in Oracle 10+?
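A minimal sketch of the "pick one" approach with a bind variable (Oracle syntax; the table name users, the index name, and the bind name :name_prefix are hypothetical):

```sql
-- Keep a single case-insensitive index.
CREATE INDEX users_upper_firstname_idx
  ON users (UPPER(firstname));

-- Always query through the same function so the index can be used.
SELECT u.id
FROM   users u
WHERE  UPPER(u.firstname) LIKE UPPER(:name_prefix) || '%';
```

With this in place, the LOWER(FirstName) index is redundant, and the plain index on FirstName is only needed if some queries compare the column without a function.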
It may not hit any of your indexes, because you are returning ID in the SELECT clause, which is not covered by the indexes.
If the index is very selective, and Oracle decides it is still worthwhile using it to find 'MIKE%' then perform a lookup on the data to get the ID column, then it may use 4. Index(FirstName). 2 and 3 will only be used if the column searched uses the exact function defined in the index.

Indexing affects only the WHERE clause?

If I have something like:
CREATE INDEX idx_myTable_field_x
ON myTable
USING btree (field_x);
SELECT COUNT(field_x), field_x FROM myTable GROUP BY field_x ORDER BY field_x;
Imagine myTable with around 500,000 rows and most of field_x values being unique.
Since I don't use any WHERE clause, will the created index have any effect at all in my query?
Edit: I'm asking this question because I don't get any relevant difference between query-times before and after creating the index; They always take about 8 seconds (which, of course is too much time!). Is this behaviour expected?
The index will not help here: you are reading the whole table anyway, so there is no use in going to an index first (PostgreSQL did not have index-only scans at the time of writing; they were added in version 9.2).
Because nearly all values in the index are unique, it wouldn't really help in this situation anyway. Index lookups (including index-scans for other DBMS) tend to be really helpful for lookup of a small number of rows.
There is a slight possibility that the index might be used for ordering but I doubt that.
If you look at the output of EXPLAIN ANALYZE VERBOSE you can see if the sorting is done in memory or (due to the size of the result) is done on disk.
If sorting is done on disk, you can speed up the query by increasing the work_mem - either globally or just for your session.
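A small sketch of the session-level variant for PostgreSQL; the 64MB value is an arbitrary example, not a recommendation.

```sql
-- Raise the per-operation sort memory for this session only.
SET work_mem = '64MB';

-- Re-run the query and look at the "Sort Method" line in the output:
-- "external merge  Disk: ..." means the sort spilled to disk,
-- "quicksort  Memory: ..." means it fit in memory.
EXPLAIN ANALYZE
SELECT COUNT(field_x), field_x
FROM   myTable
GROUP  BY field_x
ORDER  BY field_x;

-- Go back to the server default.
RESET work_mem;
```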
Since field_x is the only column referenced in your query, your index covers the query and should help you avoid lookups into actual rows of myTable.
EDIT: As indicated in the comment discussion below, while this answer is valid for most RDBMS implementations, it does not apply to PostgreSQL.
The index should be used. If you ever want to see how your indexes are being used (or not), the execution plan of the query is a great place to see what the database has decided to do. In your case you should execute something like:
explain SELECT COUNT(field_x), field_x FROM myTable GROUP BY field_x ORDER BY field_x;
More information about what all the output you are seeing means can be found in the postgres docs: http://www.postgresql.org/docs/8.4/static/sql-explain.html
There is also: http://wiki.postgresql.org/wiki/Image:Explaining_EXPLAIN.pdf which is a bit more in depth.