I would like a regex that would make this:
VALUES('Hit 'n Run')
into
VALUES('Hit ''n Run')
Is this possible?
No, this is not really possible. If you have VALUES('Hit 'n Run'), you already have an invalid mixture of delimiting apostrophes and literal apostrophes. String processing is like mixing sugar and salt: once you've mixed contexts without proper escaping there is no way of pulling them back apart.
If you are trying to rescue broken data, you could try something like (?<!\()'(?!\)) to match apostrophes that don't have a bracket next to them. It's a weak and easily fooled tactic but for simple data it might work.
If you are putting together dynamic SQL queries you must escape the ' before you put it into the query string, either using a simple string replace ' with '' if you're sure that's the only escape your DBMS requires, or — much better — using a dedicated SQL-string-literal-escaping function appropriate to your DBMS. Quite what that function would be depends on what platform (language, DBMS) you're talking about.
Any pattern that could be expressed in RegEx could then be exploited to create the very SQL injection issues you're trying to avoid.
Example nasty input:
VALUES(');DELETE * FROM customer;SELECT '
Related
I have a small issue, so here's a bit of background:
We are developing a Qlik Sense application and we normally write our expressions to an external script. We save these as variables, and then evaluate the variables in the application. The advantage of this is a) we can use better version control with GIT, and b) we can separate the queries from the application if we ever need to change platforms in future.
My Problem:
I have come across a situation where we need to concat a string to the result of an expression, which can be done easily in the application, but when you save the expression to an external file the single quotes around the expression interfere with the single quotes around the string.
I tried
using double quotes for the string only, but qlik doesn't evaluate it correctly.
same goes for the expression using double quotes only.
escaping the single quote inside the expression, eg. "\'" but same story.
What I was thinking of doing next was changing the quote to a rogue character so qlik would ignore it as text, then replacing it with a quote later so qlik would then try to evaluate it.
Example Code:
SET variable = 'if(isnull(month),'Month: ' & date(now(), 'MMM-YYYY'),'Month: ' & only({$<year={2016}, month={6}>}month)';
After some further research I found that Qlik has its own way of escaping characters without using the "\" character. I was able to solve this issue by escaping the inner single quotes like this:
SET variable = 'if(isnull(month),''Month: '' & date(now(), ''MMM-YYYY''),''Month: '' & only({$<year={2016}, month={6}>}month)';
Feels like a pretty silly oversight now, but hopefully this will save someone some time in the future.
I want to match a url field against a url prefix (which may contain percent signs), e.g. .where("url LIKE ?", "#{some_url}%").
What's the most Rails way?
From Rails version 4.2.x there is an active record method called sanitize_sql_like. So, you can do in your model a search scope like:
scope :search, -> search { where('"accounts"."name" LIKE ?', "#{sanitize_sql_like(search)}%") }
and call the scope like:
Account.search('Test_%')
The resulting escaped sql string is:
SELECT "accounts".* FROM "accounts" WHERE ("accounts"."name" LIKE 'Test\_\%%')
Read more here: http://edgeapi.rubyonrails.org/classes/ActiveRecord/Sanitization/ClassMethods.html
If I understand correctly, you're worried about "%" appearing inside some_url and rightly so; you should also be worried about embedded underscores ("_") too, they're the LIKE version of "." in a regex. I don't think there is any Rails-specific way of doing this so you're left with gsub:
.where('url like ?', some_url.gsub('%', '\\\\\%').gsub('_', '\\\\\_') + '%')
There's no need for string interpolation here either. You need to double the backslashes to escape their meaning from the database's string parser so that the LIKE parser will see simple "\%" and know to ignore the escaped percent sign.
You should check your logs to make sure the two backslashes get through. I'm getting confusing results from checking things in irb, using five (!) gets the right output but I don't see the sense in it; if anyone does see the sense in five of them, an explanatory comment would be appreciated.
UPDATE: Jason King has kindly offered a simplification for the nightmare of escaped escape characters. This lets you specify a temporary escape character so you can do things like this:
.where("url LIKE ? ESCAPE '!'", some_url.gsub(/[!%_]/) { |x| '!' + x })
I've also switched to the block form of gsub to make it a bit less nasty.
This is standard SQL92 syntax, so will work in any DB that supports that, including PostgreSQL, MySQL and SQLite.
Embedding one language inside another is always a bit of a nightmarish kludge and there's not that much you can do about it. There will always be ugly little bits that you just have to grin and bear.
https://gist.github.com/3656283
With this code,
Item.where(Item.arel_table[:name].matches("%sample!%code%"))
correctly escapes % between "sample" and "code", and matches "AAAsample%codeBBB" but does not for "AAAsampleBBBcodeCCC" on MySQL, PostgreSQL and SQLite3 at least.
Post.where('url like ?', "%#{some_url + '%'}%)
Is it possible to do String comparison where one of the strings I am comparing against has wild cards and is generally just for formatting purposes. For example
Dim correctFormat as String = "##-##-###-##"
Dim stringToCheck = someClass.SomeFunctionThatReturnsAStringToCheck
If FormatOf(CorrectFormat) = FormatOF(StringToCheck) then
Else
End if
I am aware of the made up FormatOf syntax, but I'm just using it to show what I am asking.
No need for regular expressions.
You can simply use the Like operator, which supports ?, * and # as wildcards and also character lists ([...], [!...])
So you simply change your code to:
If stringToCheck Like correctFormat Then
and it will work as expected.
The way is to use regular expressions - that's what they are for.
This is the regular expression that matches the format you have posted:
^\d{2}-\d{2}-\d{3}-\d{2}$
As the previous post mentioned, you should use regular expressions for that purpose - they are way better for that task.
Sadly, learning them can be confusing, especially finding bugs can be really annoying.
I really like http://www.regular-expressions.info/ and http://regexpal.com/ for building and testing regexes before.
In VB.net use something like reg.ismatch
I want to match a url field against a url prefix (which may contain percent signs), e.g. .where("url LIKE ?", "#{some_url}%").
What's the most Rails way?
From Rails version 4.2.x there is an active record method called sanitize_sql_like. So, you can do in your model a search scope like:
scope :search, -> search { where('"accounts"."name" LIKE ?', "#{sanitize_sql_like(search)}%") }
and call the scope like:
Account.search('Test_%')
The resulting escaped sql string is:
SELECT "accounts".* FROM "accounts" WHERE ("accounts"."name" LIKE 'Test\_\%%')
Read more here: http://edgeapi.rubyonrails.org/classes/ActiveRecord/Sanitization/ClassMethods.html
If I understand correctly, you're worried about "%" appearing inside some_url and rightly so; you should also be worried about embedded underscores ("_") too, they're the LIKE version of "." in a regex. I don't think there is any Rails-specific way of doing this so you're left with gsub:
.where('url like ?', some_url.gsub('%', '\\\\\%').gsub('_', '\\\\\_') + '%')
There's no need for string interpolation here either. You need to double the backslashes to escape their meaning from the database's string parser so that the LIKE parser will see simple "\%" and know to ignore the escaped percent sign.
You should check your logs to make sure the two backslashes get through. I'm getting confusing results from checking things in irb, using five (!) gets the right output but I don't see the sense in it; if anyone does see the sense in five of them, an explanatory comment would be appreciated.
UPDATE: Jason King has kindly offered a simplification for the nightmare of escaped escape characters. This lets you specify a temporary escape character so you can do things like this:
.where("url LIKE ? ESCAPE '!'", some_url.gsub(/[!%_]/) { |x| '!' + x })
I've also switched to the block form of gsub to make it a bit less nasty.
This is standard SQL92 syntax, so will work in any DB that supports that, including PostgreSQL, MySQL and SQLite.
Embedding one language inside another is always a bit of a nightmarish kludge and there's not that much you can do about it. There will always be ugly little bits that you just have to grin and bear.
https://gist.github.com/3656283
With this code,
Item.where(Item.arel_table[:name].matches("%sample!%code%"))
correctly escapes % between "sample" and "code", and matches "AAAsample%codeBBB" but does not for "AAAsampleBBBcodeCCC" on MySQL, PostgreSQL and SQLite3 at least.
Post.where('url like ?', "%#{some_url + '%'}%)
I am writing a backend application which needs to be able to send multiple SQL commands to a MySQL server.
MySQL >= 5.x support multiple statements, but unfortunately we are interfacing with MySQL 4.x.
I am trying to find a way (hint: regex) to split SQL statements by their semicolon, but it should ignore semicolons in single and double quotes strings.
http://www.dev-explorer.com/articles/multiple-mysql-queries has a very nice regex to do that, but doesn't support double quotes.
I'd be happy to hear your suggestions.
Can't be done with regex, it's insufficiently powerful to parse SQL. There may be an SQL parser available for your language — which is it? — but parsing SQL is quite hard, especially given the range of different syntaxes available. Even in MySQL alone there are many SQL_MODE flags on a server and connection level that can affect how basic strings and comments are parsed, making statements behave quite differently.
The example at dev-explorer goes to amusing lengths to try to cope with escaped apostrophes and trailing strings, but will still fail for many valid combinations of them, not to mention the double quotes, backticks, the various comment syntaxes, or ANSI SQL_MODE.
As bobince said, regular expressions are probably not going to be powerful enough to do this. They're certainly not going to be powerful enough to do it in any halfway elegant manner. The second link cdonner provided also does not address this; most answers there were trying to talk the questioner out of doing this without semicolons; if he had taken the general advice, then he'd have ended up where you are.
I think the quickest path to solving this is going to be with a string scanner function, that examines every character of the string in sequence, and reacts based on a bit of stored state. Rough pseudocode:
Read in a character
If the character is not special, CONTINUE
If the character is escaped (checking this probably requires examining the previous character), CONTINUE
If the character would start a new string or end an existing one, toggle a flag IN_STRING (you might need multiple flags for different string types... I've honestly tried and succeeded at remaining ignorant of the minutiae of SQL quoting/escaping) and CONTINUE
If the character is a semicolon AND we are not currently in a string, we have found a query! OUTPUT it and CONTINUE scanning until the end of the string.
Language parsing is not any of my areas of experience, so you'll want to consider that approach carefully; nonetheless, it's going to be fast (with C-style strings, none of those steps are at all expensive, save possibly for the OUTPUT, depending on what "outputting" means in your context) and I think it should get the job done.
maybe with the following Java Regexp? check the test...
#Test
public void testRegexp() {
String s = //
"SELECT 'hello;world' \n" + //
"FROM DUAL; \n" + //
"\n" + //
"SELECT 'hello;world' \n" + //
"FROM DUAL; \n" + //
"\n";
String regexp = "([^;]*?('.*?')?)*?;\\s*";
assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
I would suggest seeing if you can redefine the problem space so the need to send multiple queries separated only by their terminator is not required.
Try this. Just replaced the 1st ' with \" and it seems to work for both ' and "
;+(?=([^\"|^\\']['|\\'][^'|^\\']['|\\'])[^'|^\\'][^'|^\\']$)