SQL: Use REGEXP_REPLACE on query parameter inside of LIKE statement - sql

I have a query which is supposed to find matching rows ignoring case and special characters that may be present both in the query and the corresponding column. For that I use REGEXP_REPLACE like this:
SELECT *
FROM Order
WHERE REGEXP_REPLACE(reference, '[^a-zA-Z0-9äöüÄÖÜ]', '') LIKE %:search%
where search is the name of the parameter I want to use. That works, but doesn't yet sanitize the search parameter from unwanted special characters.
What I would like to do is something like the following, i.e. having the REGEXP_REPLACE on the right side as well:
SELECT *
FROM Order
WHERE REGEXP_REPLACE(reference, '[^a-zA-Z0-9äöüÄÖÜ]', '') LIKE %REGEXP_REPLACE(:search, '[^a-zA-Z0-9äöüÄÖÜ]', '')%
However that doesn't work and I get the following error:
[42000][1064] You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '%REGEXP_REPLACE(
Is it not possible to use a function on the parameter or as part of a LIKE statement? Are there any workarounds?

It looks like you want to create a string starting and ending with '%' to use in your LIKE operator. To do that in MySQL's dialect of SQL you need to do your string manipulation explicitly using the built-in string manipulation functions.
You can use those functions anywhere your query needs a text string.
Try using CONCAT in an expression like this to generate that string. You'll be able to use it on the right side of your LIKE.
CONCAT('%', REGEXP_REPLACE(:search, '[^a-zA-Z0-9äöüÄÖÜ]', ''), '%')
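As a rough, runnable illustration of sanitizing both sides of the LIKE, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB. Everything engine-specific is an assumption: SQLite has no REGEXP_REPLACE, so one is registered from Python, and SQLite concatenates with || where MariaDB would use CONCAT. The table name orders and the sample rows are made up.

```python
import re
import sqlite3

# Same character class as in the question: everything outside it is removed.
PATTERN = "[^a-zA-Z0-9äöüÄÖÜ]"


def regexp_replace(value, pattern, replacement):
    """Stand-in for the server-side REGEXP_REPLACE (an assumption for SQLite)."""
    return re.sub(pattern, replacement, value or "")


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP_REPLACE", 3, regexp_replace)
conn.execute("CREATE TABLE orders (reference TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("AB-12/34",), ("XY_99",), ("ab1234",)],
)


def find_orders(search):
    # Both the column and the parameter are sanitized; the '%' wildcards are
    # concatenated around the sanitized parameter (CONCAT(...) on MariaDB).
    sql = """
        SELECT reference FROM orders
        WHERE REGEXP_REPLACE(reference, :pat, '')
              LIKE '%' || REGEXP_REPLACE(:search, :pat, '') || '%'
        ORDER BY reference
    """
    return [row[0] for row in conn.execute(sql, {"pat": PATTERN, "search": search})]


print(find_orders("b#1-2"))
```

The search string "b#1-2" is reduced to "b12" before the comparison, so it finds both the punctuated and the plain reference. SQLite's LIKE is case-insensitive for ASCII letters by default; other engines and collations may behave differently.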
I hope you don't want your query to be fast. It will be slow. It must examine every value of Order.reference in your table. It's slow because
it's not sargable due to WHERE f(column) LIKE whatever, and
column LIKE '%something%' requires looking at every value of column, rather than random-accessing a BTREE index.
If you build a database to scale up, you design it so your queries can be sargable. Sargability here might look like
WHERE cleaned_up_reference
LIKE CONCAT(REGEXP_REPLACE(:search, '[^a-zA-Z0-9äöüÄÖÜ]', ''), '%')
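without the leading % on the right, and without evaluating any function on the column or columns being searched. Here cleaned_up_reference stands for a separate, indexed column that stores the already-sanitized value of reference.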
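The sargable layout described above could be sketched like this, again with Python's sqlite3 as a stand-in engine. The clean helper, the orders table, and the index name are hypothetical; the sanitized value is written at insert time and the search string is sanitized in application code, so the query applies no function to the searched column. Whether a given engine actually uses the index for the prefix search depends on its optimizer and collation, which this sketch does not verify.

```python
import re
import sqlite3

PATTERN = "[^a-zA-Z0-9äöüÄÖÜ]"


def clean(text):
    """Strip unwanted characters and lower-case the rest (hypothetical helper)."""
    return re.sub(PATTERN, "", text).lower()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (reference TEXT, cleaned_up_reference TEXT)")
conn.execute("CREATE INDEX idx_orders_cleaned ON orders (cleaned_up_reference)")


def add_order(reference):
    # Store the sanitized value once, at write time.
    conn.execute("INSERT INTO orders VALUES (?, ?)", (reference, clean(reference)))


def find_by_prefix(search):
    # The pattern has no leading '%' and is built outside the query.
    # clean() leaves only letters and digits, so no wildcard can sneak in.
    sql = (
        "SELECT reference FROM orders "
        "WHERE cleaned_up_reference LIKE ? ORDER BY reference"
    )
    return [row[0] for row in conn.execute(sql, (clean(search) + "%",))]


for ref in ["AB-12/34", "XY_99", "Z-AB12"]:
    add_order(ref)

print(find_by_prefix("a.b-1"))
```

The trade-off is that this only answers "starts with" searches; a "contains" search still needs the leading % and the full scan that comes with it.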

You can try this:
SELECT * FROM Order a
WHERE REGEXP_REPLACE(a.reference, '[^a-zA-Z0-9äöüÄÖÜ]', '') LIKE CONCAT('%', :search, '%')

Related

regarding SQL Query "LIKE"

Why is '%' specifically used in SQL when searching with a LIKE query? E.g., if I want to find all persons whose name starts with the letter 's', why do I have to write "LIKE 's%'" and not just "LIKE 's'"?
The wildcard character % stands for zero or more characters in a string. In LIKE 's%' it matches whatever follows the leading 's', so the pattern finds every name that starts with 's'. Without the wildcard, LIKE 's' matches only values that are exactly 's'.
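The difference is easy to see on a tiny table. This is a minimal sketch using Python's sqlite3 module; the persons table and the names in it are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?)",
    [("s",), ("sam",), ("sue",), ("tess",)],
)


def names_like(pattern):
    """Return all names matching the given LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT name FROM persons WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in rows]


print(names_like("s"))   # no wildcard: only the exact value 's'
print(names_like("s%"))  # 's' followed by zero or more characters
print(names_like("%s"))  # zero or more characters followed by 's'
```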

Django ORM underscore wildcard

I have been searching for a way of using Django ORM to use the SQL underscore wildcard, and do something equivalent to this:
SELECT * FROM table
WHERE field LIKE 'abc_wxyz'
Currently, I am doing:
field_like = 'abc_wxyz'
result = MyClass.objects.extra(where=["field LIKE %s"], params=[field_like])
I already tried contains and icontains, but that's not what I need, since they wrap the value in percent signs and escape the underscore in the query:
SELECT * FROM table
WHERE field LIKE '%abc\_wxyz%'
Thanks!
You can use the __regex lookup to build more complex lookup expressions than __contains, __startswith or __endswith (each of which can be prefixed with "i" to make the lookup case-insensitive, as in icontains). In your case, I think
MyClass.objects.filter(field__regex=r'^abc.wxyz$')
Would do what you are trying to do.
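If you have many LIKE-style patterns, the translation to a regex can be automated. The helper below is a hypothetical sketch, not part of Django: it maps _ to . and % to .*, escapes everything else with Python's re.escape, and anchors the result. The regex dialect used by __regex depends on the database backend, so the escaping produced here is only checked against Python's own re module.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")    # exactly one arbitrary character
        elif ch == "%":
            parts.append(".*")   # zero or more arbitrary characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "^" + "".join(parts) + "$"


regex = like_to_regex("abc_wxyz")
print(regex)

# Assumed usage with the model from the question (requires Django):
# MyClass.objects.filter(field__regex=like_to_regex("abc_wxyz"))
```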
You can use the field__contains attribute.
for example:
MyClass.objects.filter(field__contains='abc_wxyz')
This is equivalent to:
SELECT * FROM MyClass WHERE field LIKE 'abc_wxyz'
Lord Elron's answer is incorrect. Django escapes all developer-supplied wildcard characters in the LIKE-type lookups. The statement is equivalent to
SELECT * FROM MyClass WHERE field LIKE '%abc\_wxyz%'
(as the OP discovered), so the underscore is matched literally and does not act as a wildcard.
See Escaping percent signs and underscores in LIKE statements
The field lookups that equate to LIKE SQL statements (iexact, contains, icontains, startswith, istartswith, endswith and iendswith) will automatically escape the two special characters used in LIKE statements – the percent sign and the underscore.
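The effect of that escaping can be reproduced outside Django. The sketch below uses Python's sqlite3 module and a hypothetical escape_like helper (it imitates the behaviour described above and is not Django's own code) to show that an escaped underscore only matches a literal underscore.

```python
import sqlite3


def escape_like(value, escape="\\"):
    """Escape the LIKE wildcards % and _, plus the escape character itself."""
    for ch in (escape, "%", "_"):
        value = value.replace(ch, escape + ch)
    return value


conn = sqlite3.connect(":memory:")


def matches(text, pattern):
    """Evaluate text LIKE pattern with backslash as the escape character."""
    row = conn.execute("SELECT ? LIKE ? ESCAPE '\\'", (text, pattern)).fetchone()
    return bool(row[0])


raw = "abc_wxyz"
contains_pattern = "%" + escape_like(raw) + "%"  # what a contains lookup amounts to

print(matches("abcXwxyz", raw))               # unescaped _ acts as a wildcard
print(matches("abcXwxyz", contains_pattern))  # escaped _ must be a literal _
print(matches("abc_wxyz", contains_pattern))
```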

How does SQL's LIKE work in case of Path Enumeration?

I am reading the book SQL Antipatterns where a SQL query is used like this:
SELECT *
FROM Comments AS c
WHERE '1/4/6/7/' LIKE c.path || '%';
to find ancestors of comment #7 from this table:
I am not very familiar with the regex employed for LIKE and would appreciate understanding how it does its work. Specifically, does it matter that the literal '1/4/6/7/' is located on the left-hand side of the LIKE keyword? And how does the entire WHERE predicate work (i.e. || '%')?
First of all, in case it is not clear, the || is the string concatenation operator. So, if the value of c.path is '1/', then c.path || '%' yields '1/%'.
Usually, what we do with LIKE is WHERE field LIKE 'constant%' to check whether the value of the field starts with the constant. Here the author of the query wants to see whether the constant starts with the value of the field, which is a bizarre thing to do.
So, obviously, you cannot do WHERE field LIKE 'constant%' because in this particular (weird) kind of query it is the constant that may be longer than the field, and not the other way around.
A simple LIKE expression in SQL (as opposed to the regex matching available in some RDBMSs) does not support regular expressions. Instead, it supports two special "wildcard" characters: the underscore _, which is roughly equivalent to the dot . in regex, and the percent sign %, which is roughly equivalent to the .* construct.
|| in the example is the concatenation operator, similar to the + operator applied to String objects in Java. Hence, the constant value 1/4/6/7/ is compared to a string from the path column followed by any characters: essentially, a prefix match.
This is a bad approach, because it places data from the table on the right side of the LIKE expression. This is very expensive, because this operation cannot use indexing, making the search run very slowly.
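To see the predicate at work, here is a small sketch with Python's sqlite3 module. The rows are invented for the illustration (the table from the question is not reproduced here); only the shape of the query follows the book's example. Each row's path gets a '%' appended and the constant is tested against that pattern, so a row matches when its path is a prefix of the constant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?)",
    [
        (1, "1/"),
        (2, "1/2/"),
        (3, "1/2/3/"),
        (4, "1/4/"),
        (5, "1/4/5/"),
        (6, "1/4/6/"),
        (7, "1/4/6/7/"),
    ],
)


def ancestors(path):
    # Constant on the left, column-based pattern on the right:
    # matches rows whose path is a prefix of the given path.
    sql = (
        "SELECT comment_id FROM Comments AS c "
        "WHERE ? LIKE c.path || '%' ORDER BY comment_id"
    )
    return [row[0] for row in conn.execute(sql, (path,))]


def descendants(path):
    # The usual direction: rows whose path starts with the given path.
    sql = (
        "SELECT comment_id FROM Comments AS c "
        "WHERE c.path LIKE ? || '%' ORDER BY comment_id"
    )
    return [row[0] for row in conn.execute(sql, (path,))]


print(ancestors("1/4/6/7/"))
print(descendants("1/4/"))
```

Note that both queries include the comment itself, because a path is trivially a prefix of itself.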

WHERE Clause Requires Like When Equals Should Work

I have a query that I think should look like this:
select *
from Requesters
where CITIZEN_STATUS = 'OS-IE ';
The field CITIZEN_STATUS, whose data type is varchar(15), has a trailing space for this particular value. I have pasted it into Notepad++ and looked at it with a hex editor, and the final space is indeed 0x20.
For the query to work, I have to write it like this:
select *
from Requesters
where CITIZEN_STATUS like 'OS-IE%';
So, obviously, I have a workaround and the question is not urgent. But I would really like to know why the first query fails to do what I expect. Does anyone have any ideas?
I should mention I am using SQL Server 2005 and can provide more information about the configuration if needed.
In MySQL 5, this query works, but the comparison ignores trailing whitespace, so it matches 'OS-IE ' as well as 'OS-IE'. In SQL Server 2005 you can use a regular expression that anchors the end of the value: the dollar sign '$' marks the end, so a pattern with a space directly before '$' requires the trailing space. SQL Server 2005 has no built-in regular expression functions, so this relies on CLR-based functions such as those described at http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
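The anchoring idea itself can be tried in isolation. This sketch uses Python's re module purely as a stand-in for whatever regex function the database exposes; the helper name is made up.

```python
import re

# A space directly before the end-of-string anchor makes the space mandatory.
TRAILING_SPACE = re.compile(r"^OS-IE $")


def has_trailing_space_value(value):
    """True only for 'OS-IE' followed by exactly one trailing space."""
    return TRAILING_SPACE.search(value) is not None


print(has_trailing_space_value("OS-IE "))
print(has_trailing_space_value("OS-IE"))
```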

SQLite not using index when using concatenation

I am using the following SQL statement for SQLite:
select * from words where "word" like ? || '%' || ? ;
in order to bind parameters to the first and last letters. I have tested this both with and without an index on the column word, and the results are the same. However, when running the queries as
select * from words where "word" like 'a%a';
etc. (that is, hardcoding each value instead of using ||), the query is about 10x faster when indexed.
Can someone show me how to use the index and the parameters both?
I found an answer thanks to the sqlite mailing list. It says here (http://sqlite.org/optoverview.html), section 4: "The right-hand side of the LIKE or GLOB must be either a string literal or a parameter bound to a string literal that does not begin with a wildcard character."
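Following that rule, one workaround is to build the whole pattern in application code and bind it as a single parameter, so the right-hand side of LIKE is a plain parameter rather than a concatenation. The sketch below uses Python's sqlite3 module; the NOCASE collation on the column and the index name are assumptions made for the example, and whether the planner really picks the index depends on the SQLite version and settings, so the plan is only printed for inspection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_words_word ON words (word)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("alpha",), ("area",), ("apple",), ("banana",)],
)

SQL = 'SELECT word FROM words WHERE "word" LIKE ? ORDER BY word'


def words_between(first, last):
    # Concatenate in Python, then bind one parameter.
    pattern = first + "%" + last
    return [row[0] for row in conn.execute(SQL, (pattern,))]


def query_plan(first, last):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + SQL, (first + "%" + last,))
    return [row[-1] for row in rows]


print(words_between("a", "a"))
for line in query_plan("a", "a"):
    print(line)
```

If first or last can contain % or _, they would need escaping before being placed in the pattern.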