How does SQL's LIKE work in case of Path Enumeration?

I am reading the book SQL Antipatterns where a SQL query is used like this:
SELECT *
FROM Comments AS c
WHERE '1/4/6/7/' LIKE c.path || '%';
to find ancestors of comment #7 from this table:
I am not very familiar with the regex employed for LIKE and would appreciate an explanation of how it works. Specifically, does it matter that the literal '1/4/6/7/' is on the left-hand side of the LIKE keyword? And how does the entire WHERE predicate (i.e. || '%') work?

First of all, in case it is not clear, the || is the string concatenation operator. So, if the value of c.path is '1/', then c.path || '%' yields '1/%'.
Usually, what we do with LIKE is WHERE field LIKE 'constant%' to check whether the value of the field starts with the constant. Here the author of the query wants to see whether the constant starts with the value of the field, which is a bizarre thing to do.
So you cannot write WHERE field LIKE 'constant%', because in this particular (weird) kind of query it is the constant that may be longer than the field, and not the other way around. Instead, the pattern is built from the field: for the row whose path is '1/', the test becomes '1/4/6/7/' LIKE '1/%', which is true because % matches the remaining '4/6/7/'.
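To see the reversed pattern in action, here is a minimal sketch using Python's built-in sqlite3 module (SQLite also uses || for concatenation). The table layout and the sample paths are hypothetical, loosely modelled on the book's Comments table; the known path is bound as a parameter instead of being written as a literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?)",
    [(1, "1/"), (2, "1/2/"), (3, "1/2/3/"), (4, "1/4/"),
     (5, "1/4/5/"), (6, "1/4/6/"), (7, "1/4/6/7/")],
)

def ancestors(path):
    # The known path is the left operand; each row turns its own path into
    # the pattern "<path>%", so a row matches when it is a prefix of `path`.
    rows = conn.execute(
        "SELECT comment_id FROM Comments AS c "
        "WHERE ? LIKE c.path || '%' ORDER BY comment_id",
        (path,),
    )
    return [row[0] for row in rows]

# % also matches zero characters, so comment 7 counts as its own "ancestor".
print(ancestors("1/4/6/7/"))  # [1, 4, 6, 7]
```

The trailing slash in every path matters: it keeps a path such as '1/4/6/' from being treated as a prefix of a hypothetical '1/4/60/'.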

A simple LIKE expression in SQL (as opposed to the regex-based operators available in some RDBMSs) does not support regular expressions. Instead, it supports two special "wildcard" characters: underscore _, which matches exactly one character (roughly like the dot . in a regex), and percent %, which matches any sequence of zero or more characters (roughly like the .* construct).
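The correspondence between the two LIKE wildcards and regex syntax can be made concrete with a small translator. This is a sketch under stated assumptions: it handles plain LIKE patterns without an ESCAPE clause, and it is case-sensitive, whereas the case sensitivity of a real LIKE depends on the DBMS and collation. The function names are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a plain SQL LIKE pattern (no ESCAPE clause) into a Python regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")            # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")             # exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal, including "."
    return "".join(parts)

def like(value, pattern):
    # LIKE must match the whole value, hence fullmatch; DOTALL lets "." match newlines.
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None

print(like("1/4/6/7/", "1/%"))  # True
print(like("abc", "a.c"))       # False: "." is not a wildcard in LIKE
```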
|| in the example is the concatenation operator, similar to the + operator applied to String objects in Java. Hence the constant value '1/4/6/7/' is compared with the value of the path column followed by %, i.e. followed by any characters: essentially, a prefix match.
This is an expensive approach, because it places data from the table on the right side of the LIKE expression. Since the pattern is different for every row, the database cannot use an index on path to narrow the search; it has to evaluate the predicate for every row, which makes the search slow on large tables.
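One common workaround for the indexing problem, sketched here as a swapped-in technique rather than the book's method, assumes the application already knows the full path of the comment. It splits that path into its prefixes in application code and compares them with IN, so every test is a plain equality on path that an ordinary B-tree index can serve. The helper names and the index are hypothetical.

```python
import sqlite3

def path_prefixes(path):
    # "1/4/6/7/" -> ["1/", "1/4/", "1/4/6/", "1/4/6/7/"]
    ids = [part for part in path.split("/") if part]
    return ["/".join(ids[: i + 1]) + "/" for i in range(len(ids))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("CREATE INDEX idx_comments_path ON Comments (path)")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?)",
    [(1, "1/"), (2, "1/2/"), (4, "1/4/"), (6, "1/4/6/"), (7, "1/4/6/7/")],
)

def ancestors_by_prefix(path):
    prefixes = path_prefixes(path)
    if not prefixes:
        return []
    # One placeholder per prefix: every comparison is an equality on `path`.
    placeholders = ", ".join("?" for _ in prefixes)
    sql = ("SELECT comment_id FROM Comments "
           f"WHERE path IN ({placeholders}) ORDER BY comment_id")
    return [row[0] for row in conn.execute(sql, prefixes)]

print(ancestors_by_prefix("1/4/6/7/"))  # [1, 4, 6, 7]
```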

Related

SQL: Use REGEXP_REPLACE on query parameter inside of LIKE statement

I have a query which is supposed to find matching rows ignoring case and special characters that may be present both in the query and the corresponding column. For that I use REGEXP_REPLACE like this:
SELECT *
FROM Order
WHERE REGEXP_REPLACE(reference, '[^a-zA-Z0-9äöüÄÖÜ]', '') LIKE %:search%
where search is the name of the parameter I want to use. That works, but doesn't yet sanitize the search parameter from unwanted special characters.
What I would like to do is something like the following, i.e. having the REGEXP_REPLACE on the right side as well:
SELECT *
FROM Order
WHERE REGEXP_REPLACE(reference, '[^a-zA-Z0-9äöüÄÖÜ]', '') LIKE %REGEXP_REPLACE(:search, '[^a-zA-Z0-9äöüÄÖÜ]', '')%
However that doesn't work and I get the following error:
[42000][1064] You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '%REGEXP_REPLACE(
Is it not possible to use a function on the parameter or as part of a LIKE statement? Are there any workarounds?
It looks like you want to create a string starting and ending with '%' to use in your LIKE operator. To do that in MySQL's dialect of SQL you need to do your string manipulation explicitly using the built-in string manipulation functions.
You can use those functions anywhere your query needs a text string.
Try using CONCAT in an expression like this to generate that string. You'll be able to use it on the right side of your LIKE.
CONCAT('%', REGEXP_REPLACE(:search, '[^a-zA-Z0-9äöüÄÖÜ]', ''), '%')
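The idea of applying the same cleanup to the parameter and wrapping the result in wildcards can be tried locally with SQLite, with two assumptions: SQLite has no REGEXP_REPLACE, so a Python function is registered under that name as a hypothetical stand-in, and SQLite spells CONCAT as ||. Note that SQLite's LIKE folds case only for ASCII letters.

```python
import re
import sqlite3

CLEAN = "[^a-zA-Z0-9äöüÄÖÜ]"

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in: a Python function with the same
# (text, pattern, replacement) shape, registered as regexp_replace.
conn.create_function(
    "regexp_replace", 3,
    lambda text, pattern, repl: None if text is None else re.sub(pattern, repl, text),
)
conn.execute('CREATE TABLE "Order" (reference TEXT)')
conn.executemany('INSERT INTO "Order" VALUES (?)',
                 [("AB-123/x",), ("CD 456",), ("ab123x",)])

def search(term):
    # The cleanup runs on the bound parameter too, and its result is
    # wrapped in '%' wildcards with || before LIKE sees it.
    sql = ('SELECT reference FROM "Order" '
           "WHERE regexp_replace(reference, :pat, '') "
           "LIKE '%' || regexp_replace(:search, :pat, '') || '%' "
           "ORDER BY reference")
    return [row[0] for row in conn.execute(sql, {"pat": CLEAN, "search": term})]

print(search("ab 12"))  # ['AB-123/x', 'ab123x']
```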
I hope you don't need your query to be fast, because it will be slow: it must examine every value of Order.reference in your table. It's slow because
it's not sargable, due to WHERE f(column) LIKE whatever, and
column LIKE '%something%' requires looking at every value of column, rather than random-accessing a BTREE index.
If you build a database to scale up, you design it so your queries can be sargable. Sargability here might look like
WHERE cleaned_up_reference
LIKE CONCAT(REGEXP_REPLACE(:search, '[^a-zA-Z0-9äöüÄÖÜ]', ''), '%')
without the leading % on the right, and without evaluating any function on the column or columns being searched.
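The sargable version assumes a cleaned_reference column that is filled in when rows are written. A minimal SQLite sketch of that design follows; the cleanup is done in Python, and the schema, index name and sample values are hypothetical. Whether SQLite itself uses the index for LIKE depends on its own LIKE-optimization rules, so treat this as a check of the results, not of the query plan.

```python
import re
import sqlite3

CLEAN = re.compile("[^a-zA-Z0-9äöüÄÖÜ]")

def clean(text):
    return CLEAN.sub("", text)

conn = sqlite3.connect(":memory:")
# Hypothetical schema: cleaned_reference is computed once, at write time,
# so searches never evaluate a function on the column.
conn.execute('CREATE TABLE "Order" (reference TEXT, cleaned_reference TEXT)')
conn.execute('CREATE INDEX idx_order_cleaned ON "Order" (cleaned_reference)')
for ref in ["AB-123/x", "CD 456", "xAB123"]:
    conn.execute('INSERT INTO "Order" VALUES (?, ?)', (ref, clean(ref)))

def prefix_search(term):
    # Only a trailing wildcard: a prefix search on the stored column.
    rows = conn.execute(
        'SELECT reference FROM "Order" '
        "WHERE cleaned_reference LIKE ? ORDER BY reference",
        (clean(term) + "%",),
    )
    return [row[0] for row in rows]

# 'xAB123' contains AB1 but does not start with it, so it is not returned.
print(prefix_search("ab-1"))  # ['AB-123/x']
```

The trade-off is that a prefix search no longer finds matches in the middle of the reference, which is exactly what makes it index-friendly.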
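You can try this (keep :search outside the quotes; inside a string literal it is not treated as a parameter, so the '%' wildcards have to be concatenated around it):
SELECT * FROM Order a
WHERE REGEXP_REPLACE(a.reference, '[^a-zA-Z0-9äöüÄÖÜ]', '') LIKE CONCAT('%', :search, '%')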

Pattern matching in BigQuery vs SSMS: return strings which contain special characters or numerics

I'm a bit lost.
I've had a look at the documentation, but I'm not sure whether you can use LIKE and pattern matching in BigQuery the same way as in SSMS.
The code shown here works in SSMS, but the results are not correct in BigQuery, so I was wondering if there was another way to do it.
WHERE column_name NOT LIKE '[a-Z]%'
I'm looking to return strings which contain special characters or numerics.
Use REGEXP_CONTAINS instead. Anchoring the pattern with ^ mirrors the SSMS query, which only tests the first character:
where not regexp_contains(column_name, r'^[a-zA-Z]')
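LIKE is also supported in BigQuery as a comparison operator, but it only understands the % and _ wildcards, not SQL Server's bracketed character ranges such as [a-Z].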
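The SSMS query, a regex that tests for letters anywhere, and the stated goal ("contain special characters or numerics") do not all select the same rows, so it helps to compare candidate predicates on a few strings first. This sketch uses Python's re module as a stand-in for BigQuery's RE2 engine; for plain character classes and the ^ anchor the two behave the same. The sample values are made up.

```python
import re

values = ["abc", "1abc", "#tag", "123", "ab1"]

# SSMS NOT LIKE '[a-Z]%': the first character is not a letter.
not_starting_with_letter = [v for v in values if not re.search(r"^[a-zA-Z]", v)]
# not regexp_contains(v, r'[a-zA-Z]'): the value contains no letter at all.
no_letters = [v for v in values if not re.search(r"[a-zA-Z]", v)]
# regexp_contains(v, r'[^a-zA-Z]'): at least one non-letter anywhere.
has_non_letter = [v for v in values if re.search(r"[^a-zA-Z]", v)]

print(not_starting_with_letter)  # ['1abc', '#tag', '123']
print(no_letters)                # ['123']
print(has_non_letter)            # ['1abc', '#tag', '123', 'ab1']
```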

Postgres Querying ILIKE and "%#{}%"

Can someone explain how the "?" prevents SQL injection?
Candy.where("taste ILIKE ?", "%#{term}%")
Also, why is "%#{term}%" used as opposed to just #{term}? What do the percent signs represent?
The percent signs are wildcard characters that let the term match anywhere in the value.
You've actually asked two different questions there.
Neither of them is particularly related to Rails, so I am going to answer them generically (also because I'm not that familiar with Ruby!).
How does using '?' prevent SQL Injection
SQL Injection occurs when you use values provided from outside your program - user provided values - directly in SQL statements. For example, suppose you had this pseudo code:
sql="SELECT foo FROM bar WHERE name='"+name+"'"
where perhaps name is a variable containing user-supplied data. If name contains a single quote ('), the SQL engine will think that single quote is the end of the value and continue parsing the remainder of the variable as SQL text.
Using placeholders (such as '?') avoids this, because you never quote the value supplied for the placeholder yourself: all of its content is treated as part of the value, and none of it is parsed as SQL, regardless of any embedded quotes.
Incidentally, the form of the placeholders depends on the DB engine and/or the client framework. Natively, PostgreSQL uses $1, $2, etc. for placeholders. Many frameworks extend this to allow '?' and other placeholder syntaxes.
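The difference between splicing a value into the SQL text and binding it to a placeholder can be demonstrated with Python's sqlite3 module, which also uses ? placeholders. The table and column names come from the pseudo code above; the rows and the attack string are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, name TEXT)")
conn.executemany("INSERT INTO bar VALUES (?, ?)",
                 [("secret-1", "alice"), ("secret-2", "bob")])

def unsafe(name):
    # The value is spliced into the SQL text, so a quote inside `name`
    # ends the string literal and the rest is parsed as SQL.
    sql = "SELECT foo FROM bar WHERE name='" + name + "'"
    return [row[0] for row in conn.execute(sql)]

def safe(name):
    # The value is bound to the ? placeholder and is never parsed as SQL.
    return [row[0] for row in conn.execute("SELECT foo FROM bar WHERE name = ?", (name,))]

attack = "x' OR '1'='1"
print(sorted(unsafe(attack)))  # ['secret-1', 'secret-2']: every row leaks
print(safe(attack))            # []: nobody is literally named that
print(safe("alice"))           # ['secret-1']
```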
Why is "%#{term}%" used as opposed to just #{term}
The SQL ILIKE operator uses '%' signs as wildcards. An expression such as:
taste ILIKE '%apple%'
would match 'apple', 'rottenApple', 'applesauce' or any other string containing 'apple', using a case-insensitive match.
Note that the '%' signs are part of the right-hand operand to ILIKE, within the quotes, so you cannot use placeholders like this:
Candy.where("taste ILIKE %?%", "#{term}")
An alternative would be:
Candy.where("taste ILIKE '%' || ? || '%'", "#{term}")
That works because || is the concatenation operator, so it concatenates the literal value % with the value of the placeholder and then the trailing literal value %.
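The same '%' || ? || '%' construction can be tried with Python's sqlite3 module. SQLite has no ILIKE, but its LIKE is case-insensitive for ASCII letters by default, so it stands in for ILIKE in this sketch; the candies table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candies (name TEXT, taste TEXT)")
conn.executemany("INSERT INTO candies VALUES (?, ?)",
                 [("A", "apple"), ("B", "rottenApple"),
                  ("C", "applesauce"), ("D", "cherry")])

def by_taste(term):
    # The wildcards are SQL literals; only the search term goes through ?.
    rows = conn.execute(
        "SELECT name FROM candies WHERE taste LIKE '%' || ? || '%' ORDER BY name",
        (term,),
    )
    return [row[0] for row in rows]

print(by_taste("apple"))  # ['A', 'B', 'C']
```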
When you use ?, Rails escapes SQL control characters in the value by itself.

How to use BETWEEN Operator with Text Value in SQL?

How do I use the BETWEEN operator with a text value? For example, what is the right syntax to select all products whose ProductName ends with any letter BETWEEN 'C' and 'M'?
Most SQL dialects provide the RIGHT() function. This allows you to do:
WHERE RIGHT(TextValue, 1) BETWEEN 'C' AND 'M'
If your database doesn't have this function, you can do something similar with other built-in string functions (for example, a substring function that extracts the last character). Also, the exact comparison might depend on the collation of the column/table/database/server: sometimes comparisons are case-insensitive and sometimes they are case-sensitive.
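SQLite is one database without RIGHT(); there, substr(x, -1) returns the last character. Its default BINARY collation is also case-sensitive, which illustrates the collation caveat. A minimal sketch via Python's sqlite3 module, with hypothetical product names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?)",
    [(n,) for n in ["Aniseed Syrup", "Chai", "Chang", "Ikura",
                    "Inlagd Sill", "Ipoh Coffee", "Tofu"]],
)

def ending_between(lo, hi, fold_case):
    # substr(x, -1) plays the role of RIGHT(x, 1); upper() makes the test case-insensitive.
    last = "upper(substr(ProductName, -1))" if fold_case else "substr(ProductName, -1)"
    sql = f"SELECT ProductName FROM Products WHERE {last} BETWEEN ? AND ? ORDER BY ProductName"
    return [row[0] for row in conn.execute(sql, (lo, hi))]

print(ending_between("C", "M", fold_case=True))   # ['Chai', 'Chang', 'Inlagd Sill', 'Ipoh Coffee']
print(ending_between("C", "M", fold_case=False))  # []: lowercase letters sort after 'M' in BINARY order
```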
In case you are interested in an alternative method (which does work with the w3schools SQL editor), you can also use the LIKE operator:
WHERE ProductName LIKE '%[c-m]'
This will get you all Product Names ending in any character between C and M.
In this case, the LIKE operator uses two wildcard characters (bracketed sets such as [c-m] are a SQL Server extension; standard LIKE only has % and _):
1. % : Any string of zero or more characters.
2. [c-m] : Any single character within the specified range ([a-f]) or set ([abcdef]).
You can find more information about the LIKE operator here:
https://msdn.microsoft.com/en-us/library/ms179859.aspx

How to determine whether a varchar field DOES NOT contain characters in set

I need to determine whether any rows in a varchar column contain characters outside of the particular set below:
abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|=''
I tried this but am not sure if it is correct:
select AccName
from Transactions
where AccName not like '%[!abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|='']%'
Should this work?
Any help appreciated.
You cannot use a regular expression inside an ordinary LIKE condition in a query. If you want to use regular expressions, you will have to use a special operator. In MySQL, you could try the following:
SELECT AccName
FROM Transactions
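WHERE AccName REGEXP '[^a-zA-Z0-9.#,/\\\\&%();:+_*?|=''-]';
Here ^ (not !) negates the bracket expression, - is moved to the end so it is taken literally rather than as a range, the backslash is escaped once for the string literal and once for the regex, and no trailing % is needed because REGEXP matches anywhere in the value. If this doesn't run as-is, you may have to tidy up the regular expression further. And as marc_s asked, the exact regular expression and query will depend on the DB system you are using.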
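The negated-character-class idea can be checked outside the database first. This sketch uses Python's re module with the asker's allowed set written as ranges plus punctuation; the sample values are hypothetical. Note that a space is not in the set as given, so values containing spaces are flagged too.

```python
import re

# Negated class built from the asker's allowed set: "-" goes last so it is
# literal rather than a range, and the backslash is escaped for the regex.
OUTSIDE = re.compile(r"[^a-zA-Z0-9.#,/\\&%();:+_*?|='-]")

def has_disallowed(value):
    # A single disallowed character anywhere is enough to flag the value.
    return OUTSIDE.search(value) is not None

names = ["ACME.Ltd", "O'Neil&Sons", "Smith/Jones#2", "back\\slash", "Müller", "a b"]
print([n for n in names if has_disallowed(n)])  # ['Müller', 'a b']
```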
Database management systems vary in their support for matching regular expressions. The examples below use PostgreSQL, which supports POSIX regular expressions, along with other flavors. They also test for (case-sensitive) matches rather than non-matches, to avoid double negatives like "'Mike' doesn't not match the regular expression".
AFAIK, no DBMS lets you mix the LIKE operator with a regular expression.
A LIKE expression of the form column_name LIKE '%a%' will match if 'a' appears anywhere in the column. But you need your regular expression to match the whole value of the column. Anchor the regular expression at the start and end of each value (^ and $), and tell the DBMS to match one or more instances (+) of the atom.
select 'Mike' ~ '^[a-zA-Z0-9]+$'; -- 'Mike' matches the regex
Write a failing test.
select 'Mike?' ~ '^[a-zA-Z0-9]+$'; -- 'Mike?' doesn't match the regex
Add the question mark to the regex, and verify the test succeeds.
select 'Mike?' ~ '^[a-zA-Z0-9?]+$'; -- 'Mike?' matches the regex
Repeat failing test and succeeding test for each character. When you've caught all the characters you want, invert the logic using the !~ operator in place of the ~ operator.
When your data is clean move this into a CHECK constraint.
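The final step, a CHECK constraint, can be sketched with the SQLite that ships with Python. In PostgreSQL the constraint could be written directly as CHECK (AccName ~ '^[a-zA-Z0-9?]+$'); SQLite parses X REGEXP Y but provides no regexp() function, so registering one in Python is an assumption of this sketch, as are the table and the sample values.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite evaluates "X REGEXP Y" as regexp(Y, X); deterministic=True (Python 3.8+)
# marks the function as safe to use inside the schema.
conn.create_function(
    "regexp", 2,
    lambda pattern, value: None if value is None else re.search(pattern, value) is not None,
    deterministic=True,
)
conn.execute(
    "CREATE TABLE Transactions ("
    "AccName TEXT CHECK (AccName REGEXP '^[a-zA-Z0-9?]+$'))"
)

conn.execute("INSERT INTO Transactions VALUES ('Mike?')")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO Transactions VALUES ('Mike!')")
except sqlite3.IntegrityError:
    rejected = True  # CHECK constraint failed: '!' is not in the allowed set

print(rejected)  # True
```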
PostgreSQL pattern matching