How to _properly_ escape LIKE queries in Rails? [duplicate] - sql

I want to match a url field against a url prefix (which may contain percent signs), e.g. .where("url LIKE ?", "#{some_url}%").
What's the most Rails way?

From Rails version 4.2.x there is an active record method called sanitize_sql_like. So, you can do in your model a search scope like:
scope :search, -> search { where('"accounts"."name" LIKE ?', "#{sanitize_sql_like(search)}%") }
and call the scope like:
Account.search('Test_%')
The resulting escaped sql string is:
SELECT "accounts".* FROM "accounts" WHERE ("accounts"."name" LIKE 'Test\_\%%')
Read more here: http://edgeapi.rubyonrails.org/classes/ActiveRecord/Sanitization/ClassMethods.html

If I understand correctly, you're worried about "%" appearing inside some_url and rightly so; you should also be worried about embedded underscores ("_") too, they're the LIKE version of "." in a regex. I don't think there is any Rails-specific way of doing this so you're left with gsub:
.where('url like ?', some_url.gsub('%', '\\\\\%').gsub('_', '\\\\\_') + '%')
There's no need for string interpolation here either. You need to double the backslashes to escape their meaning from the database's string parser so that the LIKE parser will see simple "\%" and know to ignore the escaped percent sign.
You should check your logs to make sure the two backslashes get through. I'm getting confusing results from checking things in irb, using five (!) gets the right output but I don't see the sense in it; if anyone does see the sense in five of them, an explanatory comment would be appreciated.
UPDATE: Jason King has kindly offered a simplification for the nightmare of escaped escape characters. This lets you specify a temporary escape character so you can do things like this:
.where("url LIKE ? ESCAPE '!'", some_url.gsub(/[!%_]/) { |x| '!' + x })
I've also switched to the block form of gsub to make it a bit less nasty.
This is standard SQL92 syntax, so will work in any DB that supports that, including PostgreSQL, MySQL and SQLite.
Embedding one language inside another is always a bit of a nightmarish kludge and there's not that much you can do about it. There will always be ugly little bits that you just have to grin and bear.

https://gist.github.com/3656283
With this code,
Item.where(Item.arel_table[:name].matches("%sample!%code%"))
correctly escapes % between "sample" and "code", and matches "AAAsample%codeBBB" but does not for "AAAsampleBBBcodeCCC" on MySQL, PostgreSQL and SQLite3 at least.

Post.where('url like ?', "%#{some_url + '%'}%)

Related

VB.NET LIKE comparison with # and *

I have the following file name: d#cument.txt
I want to compare to see if it matches pattern: d#cument*
I do this like this:
return "d#cument.txt" Like "d#cument*"
This returns false. The ending asterix seems to be the problem. Because if I change file name to just "d#cument" and Like pattern to "d#cument" it returns true.
Any idea why and/or workaround?
Documentation states that # has a meaning in a Like pattern so you need to "escape" it by putting it between brackets :
Return "d#cument.txt" Like "d[#]cument*"
Alternatively you can use String.StartsWith to do the same thing without worrying with special chars.
Note also that though Like is convenient for simple patterns ; for things more complex it could be better to switch to Regex instead.

Lucene ignore keywords in search term

This seems like it should be simple, but I can't figure out how to get Lucene to ignore the AND, OR, and NOT keywords - the query parser throws a parse error when it gets one. I have a query builder class that splits the search term so that it searches on the words themselves as well as on n-grams in the word. I'm using Lucene in Java.
So in a search for, say, "ANDERSON COOPER" the query string looks like:
name: (ANDERSON COOPER "ANDERSON COOPER")^5 gram4: ( ANDE NDER DERS ERSO RSON
SONC ONCO NCOO COOP OOPE OPER)
the query parser throws an error when it gets those ANDs. Ideally, I'd like the parser to just ignore AND, OR, NOT altogether, and I'll use the &&, ||, and ! equivalents if I need them - do I have to modify the code in the QueryParser class itself to get this? Or is there an easier way? I could also just insert an escape character for these cases if that is the best way to do it, but adding \ before the word AND doesn't seem to do anything.
You can wrap the AND in quotes like this: "AND". Is that easy? A regex could probably do that easily if you know exactly what your queries look like.
The parser shouldn't have a problem with it, and the PhraseQuery will be rewritten as a term query, so it will be a small constant-time performance difference big-oh O(1).
The regex could probably look like this:
\b(AND|OR|NOT)\b
Which would be replaced with
"$1"

a proper way to escape %% when building LIKE queries in Rails 3 / ActiveRecord

I want to match a url field against a url prefix (which may contain percent signs), e.g. .where("url LIKE ?", "#{some_url}%").
What's the most Rails way?
From Rails version 4.2.x there is an active record method called sanitize_sql_like. So, you can do in your model a search scope like:
scope :search, -> search { where('"accounts"."name" LIKE ?', "#{sanitize_sql_like(search)}%") }
and call the scope like:
Account.search('Test_%')
The resulting escaped sql string is:
SELECT "accounts".* FROM "accounts" WHERE ("accounts"."name" LIKE 'Test\_\%%')
Read more here: http://edgeapi.rubyonrails.org/classes/ActiveRecord/Sanitization/ClassMethods.html
If I understand correctly, you're worried about "%" appearing inside some_url and rightly so; you should also be worried about embedded underscores ("_") too, they're the LIKE version of "." in a regex. I don't think there is any Rails-specific way of doing this so you're left with gsub:
.where('url like ?', some_url.gsub('%', '\\\\\%').gsub('_', '\\\\\_') + '%')
There's no need for string interpolation here either. You need to double the backslashes to escape their meaning from the database's string parser so that the LIKE parser will see simple "\%" and know to ignore the escaped percent sign.
You should check your logs to make sure the two backslashes get through. I'm getting confusing results from checking things in irb, using five (!) gets the right output but I don't see the sense in it; if anyone does see the sense in five of them, an explanatory comment would be appreciated.
UPDATE: Jason King has kindly offered a simplification for the nightmare of escaped escape characters. This lets you specify a temporary escape character so you can do things like this:
.where("url LIKE ? ESCAPE '!'", some_url.gsub(/[!%_]/) { |x| '!' + x })
I've also switched to the block form of gsub to make it a bit less nasty.
This is standard SQL92 syntax, so will work in any DB that supports that, including PostgreSQL, MySQL and SQLite.
Embedding one language inside another is always a bit of a nightmarish kludge and there's not that much you can do about it. There will always be ugly little bits that you just have to grin and bear.
https://gist.github.com/3656283
With this code,
Item.where(Item.arel_table[:name].matches("%sample!%code%"))
correctly escapes % between "sample" and "code", and matches "AAAsample%codeBBB" but does not for "AAAsampleBBBcodeCCC" on MySQL, PostgreSQL and SQLite3 at least.
Post.where('url like ?', "%#{some_url + '%'}%)

Regex for doubling single quotes in string

I would like a regex that would make this:
VALUES('Hit 'n Run')
into
VALUES('Hit ''n Run')
Is this possible?
No, this is not really possible. If you have VALUES('Hit 'n Run'), you already have an invalid mixture of delimiting apostrophes and literal apostrophes. String processing is like mixing sugar and salt: once you've mixed contexts without proper escaping there is no way of pulling them back apart.
If you are trying to rescue broken data, you could try something like (?<!\()'(?!\)) to match apostrophes that don't have a bracket next to them. It's a weak and easily fooled tactic but for simple data it might work.
If you are putting together dynamic SQL queries you must escape the ' before you put it into the query string, either using a simple string replace ' with '' if you're sure that's the only escape your DBMS requires, or — much better — using a dedicated SQL-string-literal-escaping function appropriate to your DBMS. Quite what that function would be depends on what platform (language, DBMS) you're talking about.
Any pattern that could be expressed in RegEx could then be exploited to create the very SQL injection issues you're trying to avoid.
Example nasty input:
VALUES(');DELETE * FROM customer;SELECT '

Split SQL statements

I am writing a backend application which needs to be able to send multiple SQL commands to a MySQL server.
MySQL >= 5.x support multiple statements, but unfortunately we are interfacing with MySQL 4.x.
I am trying to find a way (hint: regex) to split SQL statements by their semicolon, but it should ignore semicolons in single and double quotes strings.
http://www.dev-explorer.com/articles/multiple-mysql-queries has a very nice regex to do that, but doesn't support double quotes.
I'd be happy to hear your suggestions.
Can't be done with regex, it's insufficiently powerful to parse SQL. There may be an SQL parser available for your language — which is it? — but parsing SQL is quite hard, especially given the range of different syntaxes available. Even in MySQL alone there are many SQL_MODE flags on a server and connection level that can affect how basic strings and comments are parsed, making statements behave quite differently.
The example at dev-explorer goes to amusing lengths to try to cope with escaped apostrophes and trailing strings, but will still fail for many valid combinations of them, not to mention the double quotes, backticks, the various comment syntaxes, or ANSI SQL_MODE.
As bobince said, regular expressions are probably not going to be powerful enough to do this. They're certainly not going to be powerful enough to do it in any halfway elegant manner. The second link cdonner provided also does not address this; most answers there were trying to talk the questioner out of doing this without semicolons; if he had taken the general advice, then he'd have ended up where you are.
I think the quickest path to solving this is going to be with a string scanner function, that examines every character of the string in sequence, and reacts based on a bit of stored state. Rough pseudocode:
Read in a character
If the character is not special, CONTINUE
If the character is escaped (checking this probably requires examining the previous character), CONTINUE
If the character would start a new string or end an existing one, toggle a flag IN_STRING (you might need multiple flags for different string types... I've honestly tried and succeeded at remaining ignorant of the minutiae of SQL quoting/escaping) and CONTINUE
If the character is a semicolon AND we are not currently in a string, we have found a query! OUTPUT it and CONTINUE scanning until the end of the string.
Language parsing is not any of my areas of experience, so you'll want to consider that approach carefully; nonetheless, it's going to be fast (with C-style strings, none of those steps are at all expensive, save possibly for the OUTPUT, depending on what "outputting" means in your context) and I think it should get the job done.
maybe with the following Java Regexp? check the test...
#Test
public void testRegexp() {
String s = //
"SELECT 'hello;world' \n" + //
"FROM DUAL; \n" + //
"\n" + //
"SELECT 'hello;world' \n" + //
"FROM DUAL; \n" + //
"\n";
String regexp = "([^;]*?('.*?')?)*?;\\s*";
assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
I would suggest seeing if you can redefine the problem space so the need to send multiple queries separated only by their terminator is not required.
Try this. Just replaced the 1st ' with \" and it seems to work for both ' and "
;+(?=([^\"|^\\']['|\\'][^'|^\\']['|\\'])[^'|^\\'][^'|^\\']$)