Using index on `LIKE :varname || '%'` in firebird - sql

I have a query
SELECT DISTINCT FKDOCUMENT
FROM PNTM_DOCUMENTS_FT_INDEX
WHERE WORD LIKE 'sometext%'
PLAN SORT ((PNTM_DOCUMENTS_FT_INDEX INDEX (IX_PNTM_DOCUMENTS_FT_INDEX)))
And it works fine.
But when I try to use a concatenated string with LIKE, Firebird does not use the index:
SELECT DISTINCT FKDOCUMENT
FROM PNTM_DOCUMENTS_FT_INDEX
WHERE WORD LIKE 'sometext' || '%'
PLAN SORT ((PNTM_DOCUMENTS_FT_INDEX NATURAL))
How can I force it to use the index?

The short answer, as ain already commented, is to use STARTING [WITH] instead of LIKE if you don't need a like pattern, but always want to do a prefix search. So:
WHERE WORD STARTING WITH 'sometext' -- No %!
or
WHERE WORD STARTING WITH :param
As far as I know this is exactly what Firebird does with LIKE 'sometext%'. This will use an index when available, and you don't need to escape it for presence of like pattern symbols. The downside is that you can't use like pattern symbols.
Now as to why Firebird doesn't use an index when you use
WHERE WORD LIKE :param || '%' -- (or LIKE :param, for that matter)
or
WHERE WORD LIKE 'sometext' || '%'
The first case is easily explained: statement preparation is done separately from execution. Firebird needs to take into account the possibility that the parameter value starts with a _ or - worse - a %, and it can't use an index for that.
As to the second case, it should be possible to optimize it to the equivalent of LIKE 'sometext%', but Firebird probably considers anything that is not a plain literal as not optimizable. For this specific example it would be possible to decide that it is optimizable, but this is a very specific exception (usually one doesn't concatenate literals like this; most of the time one or more 'black boxes' such as columns, functions, or CASE expressions are involved).
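If LIKE with a parameter cannot be avoided, the escaping mentioned above has to be done by hand. The following is only a sketch: it uses Python's built-in sqlite3 as a stand-in for Firebird, and the helper name `escape_like`, the table `words`, and the choice of backslash as escape character are all assumptions made for illustration.

```python
import sqlite3

def escape_like(value, escape="\\"):
    """Escape LIKE metacharacters so that value only matches itself."""
    return (
        value.replace(escape, escape + escape)  # escape the escape character first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )

# SQLite stands in for Firebird here; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("100%pure",), ("100 percent",), ("10",)],
)

# The parameter is concatenated with '%' inside the query, as in the question.
query = "SELECT word FROM words WHERE word LIKE ? || '%' ESCAPE '\\' ORDER BY word"

# Unescaped, the % inside the value acts as a wildcard and over-matches.
unescaped = [r[0] for r in conn.execute(query, ("100%",))]

# Escaped, only values that literally start with "100%" are returned.
escaped = [r[0] for r in conn.execute(query, (escape_like("100%"),))]
```

With STARTING WITH none of this is needed, which is one more reason to prefer it for plain prefix searches.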

Related

How to ignore left padding in a LIKE statement?

I'm modifying code for an ID search bar and I'm trying to let the user search for IDs using SQL syntax, for example '%535%'. Doing just that is simple enough, but I've been searching and racking my brains for a while now and I can't seem to find a solution to the issue described below:
The problem is that the IDs are all left-padded varchar(14), as in:
' 8534'
' 393583'
' 123456/789'
This effectively prevents the user from searching only for IDs that begin with a certain sequence, as '85%' returns no results due to the whitespace padding.
The site I'm maintaining is an oldie written in classic ASP (w/ JScript) and the search is done via a stored procedure with the whole 'WHERE' clause being passed in as a parameter.
I'm not able to modify the database, so what I'm asking is: is there any way to modify the clause so that the padding is ignored and '52%' returns IDs beginning with 52?
You want:
where ltrim(id) like '123%'
or, assuming that there are no interior spaces in the id:
where concat(' ', id) like '% 123%'
SELECT LTRIM(ID) FROM table WHERE ID LIKE '%1234%'
Edit: as you can only modify the WHERE clause:
WHERE LTRIM(ID) LIKE '1234%'
Functions in the where clause tend to be slow. Something like this might be quicker:
where id like '123%'
or id like '% 123%'
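To compare the variants suggested above, here is a small sketch using Python's built-in sqlite3 as a stand-in for the real database. The table name `items`, the extra sample ID '1853', and the helper `search` are assumptions for illustration; the IDs are left-padded to 14 characters as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT)")

# Left-pad every ID with spaces to 14 characters, mimicking the varchar(14) column.
ids = ["8534", "393583", "123456/789", "1853"]
conn.executemany("INSERT INTO items VALUES (?)", [(i.rjust(14),) for i in ids])

def search(where, params):
    """Run a query with the given WHERE clause and return the trimmed IDs."""
    sql = "SELECT ltrim(id) FROM items WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

# Plain prefix search fails because of the padding.
plain = search("id LIKE ?", ("85%",))

# Variant 1: trim in the WHERE clause.
with_ltrim = search("ltrim(id) LIKE ?", ("85%",))

# Variant 2: no function call; match either an unpadded ID or "space + prefix".
with_or = search("id LIKE ? OR id LIKE ?", ("85%", "% 85%"))
```

Both variants return only '8534' here. The second one relies on the IDs containing no interior spaces, as noted above.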

SELECT by string prefix using an index

You have a column foo, of some string type, with an index on that column. You want to SELECT from the table WHERE the foo column has the prefix 'pre'. Obviously, the index should be able to help here.
Here is the most obvious way to search by prefix:
SELECT * FROM tab WHERE foo LIKE 'pre%';
Unfortunately, this does not get optimized to use the index (in Oracle or Postgres, at least).
The following, however, does work:
SELECT * FROM tab WHERE 'pre' <= foo AND foo < 'prf';
But are there better ways to accomplish this, or are there ways of making the above more elegant? In particular:
I need a function from 'pre' to 'prf', but this has to work for any underlying collation. Also, it's more complicated than above, because if searching for e.g. 'prz' then the upper bound would have to be 'psa', and so on.
Can I abstract this into a stored function/procedure and still hit the index? So I could write something like ... WHERE prefix('pre', foo);?
Answers for all DBMSes appreciated.
The database is quite important here. It so happens that SQL Server does this optimization for like.
One way is to do something like this:
where foo >= 'pre' and foo <= 'pre' + '~'
'~' has the largest 7-bit ASCII value of a printable character, so it is basically bigger than anything else. This however, may be a problem if you are using wide characters or a non-standard character set.
You cannot abstract this into a function, because use of a function generally precludes the use of indexes. If you are always looking at the first three characters, then in Oracle you can create an index on those three characters (something called a "function-based index").
How about
select * from tab where foo between 'pre' and 'prf' and foo != 'prf'
This enables the index in the same way. An RDBMS would have to be pretty dumb not to use an index for that.
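The "function from 'pre' to 'prf'" asked for in the question can be sketched on the application side. This is only a sketch under a strong assumption: the column is compared in plain code point (binary) order. Under a linguistic collation the result may be wrong. The name `prefix_upper_bound` is made up for this example.

```python
from typing import Optional

def prefix_upper_bound(prefix: str) -> Optional[str]:
    """Return the smallest string that sorts after every string starting
    with prefix, assuming code point ordering. Returns None when no such
    bound exists (empty prefix, or only maximal code points)."""
    chars = list(prefix)
    while chars:
        last = ord(chars[-1])
        if last < 0x10FFFF:
            # Skip the surrogate range, which holds no valid characters.
            nxt = 0xE000 if last == 0xD7FF else last + 1
            chars[-1] = chr(nxt)
            return "".join(chars)
        # The last character cannot be incremented: drop it and carry over.
        chars.pop()
    return None

# Hypothetical usage with a parameterized range query:
#   SELECT * FROM tab WHERE foo >= ? AND foo < ?
params = ("pre", prefix_upper_bound("pre"))
```

Note that in code point order the bound for 'prz' is 'pr{' rather than 'psa', because '{' is the character that follows 'z'. Either value would work as an exclusive upper bound in that ordering, but 'pr{' is the tighter one.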

SQL: transform full-text search into like construction

I've got a stored procedure that performs a search using full-text indexes in the general case. But I can't build a full-text index for one field, so I need to use a LIKE construction for it.
So, the problem is: parameter could be
"a*" or "b*"
like parameter for CONTAINS command.
Can anyone suggest a good way to transform this parameter for a LIKE construction?
Thank you.
P.S: I use MSSQL Server
Depending on the full-text search constructs you want to support, this is generally impossible.
According to MSDN, full-text search syntax on SQL Server supports these constructs:
One or more specific words or phrases (simple term)
something along LIKE '%[,;.-()!? ]Term[,;.-()!? ]%'
A word or a phrase where the words begin with specified text (prefix term)
something along LIKE '%[,;.-()!? ]Term%'
Inflectional forms of a specific word (generation term)
Not possible
A word or phrase close to another word or phrase (proximity term)
Not possible
Synonymous forms of a specific word (thesaurus)
Not possible
Words or phrases using weighted values (weighted term)
Not possible
Those which I have marked "not possible" can't really be translated to LIKE queries, but of course you could get inventive (using your own stemming algorithm for inflectional forms, or your own thesaurus for synonyms) to support at least some of those.
In the end, you will probably need to use dynamic SQL.
Here is a way you can get the correct WHERE clause, given that input:
declare @str varchar(255) = '"a*" or "b*"';
with const as (select 'col' as col)
select col+' like '+replace(replace(REPLACE(@str, '"', ''''), '*', '%'), 'or ', 'or '+COL+' like ') as WhereClause
from const
The "const" is just a table with one column to specify your column name. It allows it to be specified in one place.
This just does replaces to get the correct syntax for LIKE. Of course, this would be more complex to support more functionality from CONTAINS.
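If the WHERE clause is assembled in application code rather than in T-SQL, the same replacement idea can be sketched there. This is a minimal sketch that assumes the input consists only of double-quoted terms joined by OR; the function name `contains_to_like` and the `?` placeholder style are assumptions, and the column name must come from trusted code because it is interpolated into the SQL text.

```python
import re

def contains_to_like(expr, column):
    """Translate a CONTAINS-style expression such as '"a*" or "b*"' into a
    parameterized LIKE clause. Only quoted terms joined by OR are handled."""
    # Pull out every double-quoted term, e.g. ['a*', 'b*'].
    terms = re.findall(r'"([^"]*)"', expr)
    # A trailing * (CONTAINS prefix term) becomes a trailing % (LIKE wildcard).
    params = [t[:-1] + "%" if t.endswith("*") else t for t in terms]
    clause = " OR ".join(f"{column} LIKE ?" for _ in params)
    return clause, params

clause, params = contains_to_like('"a*" or "b*"', "col")
```

Passing the terms as parameters instead of splicing them into the string avoids quoting problems when a term contains an apostrophe.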
Thanks to everyone!
Unfortunately expression parsing is not enough for general case.
I use regular expressions in MS SQL SERVER
http://anastasiosyal.com/POST/2008/07/05/REGULAR-EXPRESSIONS-IN-MS-SQL-SERVER-USING-CLR.ASPX

How to create simple fuzzy search with PostgreSQL only?

I have a little problem with search functionality on my RoR based site. I have many Products with some CODEs. This code can be any string like "AB-123-lHdfj". Now I use the ILIKE operator to find products:
Product.where("code ILIKE ?", "%" + params[:search] + "%")
It works fine, but it can't find product with codes like "AB123-lHdfj", or "AB123lHdfj".
What should I do for this? May be Postgres has some string normalization function, or some other methods to help me?
Postgres provides a module with several string comparison functions such as soundex and metaphone. But you will want to use the levenshtein edit distance function.
Example:
test=# SELECT levenshtein('GUMBO', 'GAMBOL');
levenshtein
-------------
2
(1 row)
The 2 is the edit distance between the two words. When you apply this against a number of words and sort by the edit distance result you will have the type of fuzzy matches that you're looking for.
Try this query sample: (with your own object names and data of course)
SELECT *
FROM some_table
WHERE levenshtein(code, 'AB123-lHdfj') <= 3
ORDER BY levenshtein(code, 'AB123-lHdfj')
LIMIT 10
This query says:
Give me the top 10 results of all data from some_table where the edit distance between the code value and the input 'AB123-lHdfj' is at most 3. You will get back all rows where the value of code is within 3 edits of 'AB123-lHdfj'...
Note: if you get an error like:
function levenshtein(character varying, unknown) does not exist
Install the fuzzystrmatch extension using:
test=# CREATE EXTENSION fuzzystrmatch;
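To make the notion of edit distance concrete, here is a minimal Python sketch of the textbook dynamic-programming algorithm. It is not the fuzzystrmatch implementation, just an illustration of what the number means: the count of single-character insertions, deletions, and substitutions needed to turn one string into the other.

```python
def levenshtein(a, b):
    """Edit distance between a and b, computed row by row."""
    # prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning i characters into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(
                prev[j] + 1,         # delete ca
                cur[j - 1] + 1,      # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = cur
    return prev[-1]

print(levenshtein("GUMBO", "GAMBOL"))            # 2, as in the psql example
print(levenshtein("AB-123-lHdfj", "AB123lHdfj"))  # 2: both dashes removed
```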
Paul told you about levenshtein(). That's a very useful tool, but it's also very slow with big tables. It has to calculate the Levenshtein distance from the search term for every single row. That's expensive and cannot use an index. The "accelerated" variant levenshtein_less_equal() is faster for long strings, but still slow without index support.
If your requirements are as simple as the example suggests, you can still use LIKE. Just replace any - in your search term with % in the WHERE clause. So instead of:
WHERE code ILIKE '%AB-123-lHdfj%'
Use:
WHERE code ILIKE '%AB%123%lHdfj%'
Or, dynamically:
WHERE code ILIKE '%' || replace('AB-123-lHdfj', '-', '%') || '%'
% in LIKE patterns stands for 0-n characters. Or use _ for exactly one character. Or use regular expressions for a smarter match:
WHERE code ~* 'AB.?123.?lHdfj'
.? ... 0 or 1 characters
Or:
WHERE code ~* 'AB\-?123\-?lHdfj'
\-? ... 0 or 1 dashes
You may want to escape special characters in LIKE or regexp patterns. See:
Escape function for regular expression or LIKE patterns
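Building such a pattern from user input could look like the following sketch. It is written against Python's `re` module, whose syntax agrees with the Postgres regular expressions for a pattern this simple; the function name `dash_optional_pattern` is an assumption. Each piece between dashes is escaped so that characters like `.` or `(` in the search term are matched literally.

```python
import re

def dash_optional_pattern(term):
    """Turn 'AB-123-lHdfj' into a regex in which every dash is optional."""
    # Escape each fragment, then rejoin with an optional dash.
    parts = [re.escape(part) for part in term.split("-")]
    return "-?".join(parts)

pattern = dash_optional_pattern("AB-123-lHdfj")

# Case-insensitive search, comparable to ~* in Postgres.
codes = ["AB-123-lHdfj", "AB123-lHdfj", "ab123lhdfj", "AB_123_lHdfj"]
matches = [c for c in codes if re.search(pattern, c, re.IGNORECASE)]
```

The resulting pattern string could then be passed as a bind parameter to a query using `~*`.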
If your actual problem is more complex and you need something faster then there are various options, depending on your requirements:
There is full text search, of course. But this may be overkill in your case.
A more likely candidate is trigram-matching with the additional module pg_trgm. See:
Using Levenshtein function on each element in a tsvector?
PostgreSQL LIKE query performance variations
Related blog post by Depesz
It can be combined with LIKE, ILIKE, ~, or ~* since PostgreSQL 9.1.
Also interesting in this context: the similarity() function or % operator of that module.
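For intuition about what trigram matching measures, here is a simplified Python sketch. It is loosely modeled on the idea behind pg_trgm, not a reimplementation: the whole string is treated as a single lowercased word padded with two leading spaces and one trailing space, whereas the real module splits its input into words first. The function names are made up for this example.

```python
def trigrams(s):
    """Set of 3-character substrings of the padded, lowercased string."""
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Share of trigrams the two strings have in common (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# Identical strings (ignoring case) share all trigrams.
same = similarity("AB123lHdfj", "ab123lhdfj")

# A one-character difference still leaves most trigrams in common.
close = similarity("AB123lHdfj", "AB123lHdfk")

# Unrelated strings share none.
far = similarity("AB123lHdfj", "ZZ999")
```

Because the trigrams can be precomputed and indexed, this kind of comparison scales much better than computing an edit distance for every row.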
Last but not least you can implement a hand-knit solution with a function to normalize the strings to be searched. For instance, you could transform AB1-23-lHdfj --> ab123lhdfj, save it in an additional column and search with terms transformed the same way.
Or use an index on the expression instead of the redundant column. (Involved functions must be IMMUTABLE.) Possibly combine that with pg_trgm from above.
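The normalization step could be sketched like this in application code. The rule used here (lowercase, then drop everything except ASCII letters and digits) is an assumption; the right rule depends on what the codes can contain. The same function has to be applied both when storing the normalized value and when preparing the search term.

```python
import re

def normalize_code(code):
    """Lowercase the code and strip everything except ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", code.lower())

stored = normalize_code("AB1-23-lHdfj")   # value for the additional column
wanted = normalize_code("ab123-LHDFJ")    # user input, normalized the same way
```

Both calls yield 'ab123lhdfj', so a plain equality or LIKE comparison on the normalized column finds the row.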
Overview of pattern-matching techniques:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

SQL Contains - only match at start

For some reason I cannot find the answer on Google! With the SQL CONTAINS function, how can I tell it to match only at the beginning of a string? I.e., I am looking for the full-text equivalent of
LIKE 'some_term%'.
I know I can use like, but since I already have the full-text index set up, AND the table is expected to have thousands of rows, I would prefer to use Contains.
Thanks!
You want something like this:
Rather than specify multiple terms, you can use a 'prefix term' if the
terms begin with the same characters. To use a prefix term, specify
the beginning characters, then add an asterisk (*) wildcard to the end
of the term. Enclose the prefix term in double quotes. The following
statement returns the same results as the previous one.
-- Search for all terms that begin with 'storm'
SELECT StormID, StormHead, StormBody FROM StormyWeather
WHERE CONTAINS(StormHead, '"storm*"')
http://www.simple-talk.com/sql/learn-sql-server/full-text-indexing-workbench/
You can use CONTAINS with a LIKE subquery for matching only a start:
SELECT *
FROM (
SELECT *
FROM myTable WHERE CONTAINS(edition, '"Alice in wonderland"')
) AS S1
WHERE S1.edition LIKE 'Alice in wonderland%'
This way, the slow LIKE query will be run against a smaller set
The only solution I can think of is to actually prepend a unique word to the beginning of every field in the table.
e.g. Update every row so that 'xfirstword ' appears at the start of the text (e.g. Field1). Then you can search for CONTAINS(Field1, 'NEAR ((xfirstword, "TERM*"),0)')
Pretty crappy solution, especially as we know that the full text index stores the actual position of each word in the text (see this link for details: http://msdn.microsoft.com/en-us/library/ms142551.aspx)
I am facing a similar issue. This is what I have implemented as a workaround.
I have made another table and pulled only the rows like 'some_term%'.
Now, on this new table I have implemented the FullText search.
Please do inform me if you tried some other better approach