How to regex-match a string escaped SQL-style?

examples:
"""Romeo and Juliet"""
'another string that is quoted with '' single quotes'
The problem is that the string can contain the quote character used for database escaping even at its beginning and end, so the regex has to check whether that character occurs in a run of odd length (1, 3, 5, ...) at the end of the matched string.

Try this regex for single quotes: '(?:[^']|'')*'. The same approach works for double quotes, which gives the full regex:
'(?:[^']|'')*'|"(?:[^"]|"")*"
In the string hello 'my ''beautiful''' 'world'! """Romeo and Juliet""" it finds:
'my ''beautiful'''
'world'
"""Romeo and Juliet"""

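As a rough check of the regex above, here is a minimal Python sketch. Python is an assumption here (the question names no host language), and the helper names find_quoted and unquote are hypothetical. It applies the same pattern and then collapses the doubled quotes back into single ones.

```python
import re

# One alternative per quote style. Inside a literal, a doubled quote
# stands for one literal quote character.
SINGLE = r"'(?:[^']|'')*'"
DOUBLE = r'"(?:[^"]|"")*"'
SQL_QUOTED = re.compile(SINGLE + "|" + DOUBLE)


def find_quoted(text):
    """Return every SQL-style quoted literal found in text, quotes included."""
    return SQL_QUOTED.findall(text)


def unquote(literal):
    """Strip the outer quotes and collapse doubled quotes (hypothetical helper)."""
    quote = literal[0]
    return literal[1:-1].replace(quote + quote, quote)


sample = "hello 'my ''beautiful''' 'world'! \"\"\"Romeo and Juliet\"\"\""
matches = find_quoted(sample)
for m in matches:
    print(m, "->", unquote(m))
```

Because the alternation tries the doubled quote whenever a quote is seen inside the literal, an odd-length run of quotes at the end is split into escaped pairs plus one closing quote.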
Wherever your string includes (or might include) characters that cause problems in your SQL, you should always escape it.
If you're using PHP with a MySQL database, use mysql_real_escape_string($text). Note that the mysql_* extension was removed in PHP 7; mysqli_real_escape_string or prepared statements are the current equivalents.
For other databases and languages, check what is available for your setup; there is likely an existing method.

Related

Printing Unnecessary escape character [duplicate]

I tried many ways to get a single backslash from an executed (I don't mean an input from html).
I can handle special characters such as tab, newline, and many others and escape them to \\t, \\n, or \\(some other character), but I cannot get a single backslash when a non-special character follows it.
I don't want something like:
str = "\apple"; // I want this, to return:
console.log(str); // \apple
And if I try to get the character at index 0, I get a instead of \.
(See ES2015 update at the end of the answer.)
You've tagged your question both string and regex.
In JavaScript, the backslash has special meaning both in string literals and in regular expressions. If you want an actual backslash in the string or regex, you have to write two: \\.
The following string starts with one backslash; the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash in the string:
var str = "\\I have one backslash";
The following regular expression will match a single backslash (not two); again, the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash character in the regular expression pattern:
var rex = /\\/;
If you're using a string to create a regular expression (rather than using a regular expression literal as I did above), note that you're dealing with two levels: The string level, and the regular expression level. So to create a regular expression using a string that matches a single backslash, you end up using four:
// Matches *one* backslash
var rex = new RegExp("\\\\");
That's because first, you're writing a string literal, but you want to actually put backslashes in the resulting string, so you write \\ for each backslash you want. But your regex also requires two \\ for every one real backslash you want, and so it needs to see two backslashes in the string. Hence, a total of four. This is one of the reasons I avoid using new RegExp(string) whenever I can; I get confused easily. :-)
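The answer is about JavaScript, but Python's string literals and its re module follow the same two-level rule for backslashes, so the counting can be sketched in Python (an assumption of this example, not something the answer uses). The variable names are illustrative only.

```python
import re

# String level: the literal "\\\\" holds exactly two backslash characters.
pattern_text = "\\\\"

# Regex level: those two characters form the escape \\ and match ONE backslash.
one_backslash = re.compile(pattern_text)

sample = "a\\b"  # three characters: a, \, b
match = one_backslash.search(sample)

# A raw string literal removes the string level, leaving only the regex level.
same_pattern = re.compile(r"\\")

print(len(pattern_text), len(sample), match.start())
```

Four backslashes in source become two in the string, and the regex engine reads those two as one literal backslash.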
ES2015 and ES2018 update
Fast-forward to 2015, and as Dolphin_Wood points out, the new ES2015 standard gives us template literals, tag functions, and the String.raw function:
// Yes, this unlikely-looking syntax is actually valid ES2015
let str = String.raw`\apple`;
str ends up having the characters \, a, p, p, l, and e in it. Just be careful there are no ${ in your template literal, since ${ starts a substitution in a template literal. E.g.:
let foo = "bar";
let str = String.raw`\apple${foo}`;
...ends up being \applebar.
Try String.raw method:
str = String.raw`\apple` // "\apple"
Reference here: String.raw()
\ is an escape character. When it is followed by a non-special character, it does not become a literal \; the backslash is simply dropped. To get a literal backslash, you have to double it: \\.
console.log("\apple"); //-> "apple"
console.log("\\apple"); //-> "\apple"
Before ES2015's String.raw, there was no way to get the original, raw string definition or to create a string literal without escape characters.
Please try the following (Java); it works for me, and the output contains the backslash:
String sss="dfsdf\\dfds";
System.out.println(sss);

sql regexp string end with ".0"

I want to check whether a positive-number string ends with ".0", so I wrote the following SQL:
select '12310' REGEXP '^[0-9]*\.0$'. However, the result is true. I wonder why, since I put "\" before "." to escape it.
So I wrote another one, select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false.
Could anyone tell me the right pattern?
A dot (.) in a regexp has a special meaning (any character) and must be escaped if you want a literal dot:
select '12310' REGEXP '^[0-9]*\\.0$';
Result:
false
Use a double backslash to escape special characters in Hive. The backslash has special meaning in Hive string literals and introduces sequences such as \073 (semicolon), \n (newline), and \t (tab), so a regex escape needs two backslashes. The same applies to the digit character class, which is written \\d:
hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true
Also, characters inside square brackets do not need double-backslash escaping: [.] can be used instead of \\.
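The regex-level half of this explanation can be checked outside Hive. The sketch below uses Python's re module as a stand-in (an assumption: Hive uses Java regular expressions, which treat these particular patterns the same way). The pattern named unescaped is what the engine receives once the SQL string literal has consumed the single backslash, as described above.

```python
import re


def matches(pattern, text):
    """True if pattern is found in text (helper name is illustrative)."""
    return re.search(pattern, text) is not None


unescaped = r"^[0-9]*.0$"    # the dot is a wildcard here
escaped = r"^[0-9]*\.0$"     # the backslash reached the regex engine
bracketed = r"^[0-9]*[.]0$"  # a character class needs no backslash at all

print(matches(unescaped, "12310"))   # the wildcard dot consumes the "1"
print(matches(escaped, "12310"))
print(matches(escaped, "1231.0"))
print(matches(bracketed, "1231.0"))
```

The bracketed form sidesteps the question of how many backslashes the SQL layer strips, which is why it is the less error-prone choice.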
If you know it is a number string, why not just use:
select ( val like '%.0' )
You need regular expression if you want to validate that the string has digits everywhere else. But if you only need to check the last two characters, like is sufficient.
As for your question: . is a wildcard in regular expressions. It matches any character.

What regular expression characters have to be escaped in SQL?

To prevent SQL injection attacks, the book "Building Scalable Web Sites" has a function that replaces regular expression characters with their escaped versions:
function db_escape_str_rlike($string) {
    // Prefix each regex metacharacter ( ) . [ ] * ^ $ with a backslash.
    return preg_replace("/([().\[\]*^\$])/", '\\\$1', $string);
}
Does this function escape ( ) . [ ] * ^ $? Why are only those characters escaped in SQL?
I found an excerpt from the book you mention, and found that the function is not for escaping to protect against SQL injection vulnerabilities. I assumed it was, and temporarily answered your question with that in mind. I think other commenters are making the same assumption.
The function is actually about escaping characters that you want to use in regular expressions. There are several characters that have special meaning in regular expressions, so if you want to search for those literal characters, you need to escape them (precede with a backslash).
This has little to do with SQL. You would need to escape the same characters if you wanted to search for them literally using grep, sed, perl, vim, or any other program that uses regular expression searches.
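To illustrate the point that this is regex escaping rather than SQL escaping, here is a small Python sketch. The function escape_like_book is a hypothetical Python rendering of the book's character set, not the book's code; re.escape is Python's standard-library equivalent and covers more characters.

```python
import re


def escape_like_book(text):
    """Backslash-escape ( ) . [ ] * ^ $ as the PHP one-liner does (hypothetical port)."""
    return re.sub(r"([().\[\]*^$])", r"\\\1", text)


# Unescaped, the dot is a wildcard and matches the "b".
wildcard_hit = re.search("a.c", "abc") is not None

# Escaped, only a literal dot matches.
literal_miss = re.search(re.escape("a.c"), "abc") is None
literal_hit = re.search(re.escape("a.c"), "xa.cx") is not None

print(escape_like_book("a.c(1)"))
```

Note that the book's list is shorter than the full set of regex metacharacters (it omits +, ?, {, |, and \, for example), so a library function such as re.escape is usually the safer choice where one exists.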
Unfortunately, which characters are special inside SQL string literals is an open issue. Each database vendor uses its own rules (notably Oracle's MySQL, which uses \ escape sequences).
The official SQL way to escape a ', which is the string delimiter used for values is to double the ', as in ''.
That should be the only way to ensure transparency in SQL statements, and the only way to introduce a proper ' into a string. As soon as a vendor accepts \' as a synonym for a quote, you have to support all the extra escape sequences when delimiting strings. Suppose you have:
'Mac O''Connor' (should yield the string "Mac O'Connor")
and assume that doubling is the only way to escape a '. Then, when you see a ' inside the literal, you check the next character:
if it is another ', you have a '' sequence, which you convert into a single '.
if it is anything else, you terminate the string literal and process that character as the first of the next token.
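The two-case rule just described can be sketched as a small scanner. This is a minimal Python illustration under the assumption that doubling is the only escape; the function name read_sql_string is hypothetical, and it does not handle backslash escapes.

```python
def read_sql_string(source, start=0):
    """Parse a single-quoted literal beginning at source[start].

    Returns (value, index_after_literal). Only the '' escape is recognised.
    """
    if start >= len(source) or source[start] != "'":
        raise ValueError("expected an opening quote")
    chars = []
    i = start + 1
    while i < len(source):
        ch = source[i]
        if ch != "'":
            chars.append(ch)            # ordinary character
            i += 1
        elif i + 1 < len(source) and source[i + 1] == "'":
            chars.append("'")           # '' stands for one quote
            i += 2
        else:
            return "".join(chars), i + 1  # lone quote ends the literal
    raise ValueError("unterminated string literal")


text = "'Mac O''Connor' AND x"
value, end = read_sql_string(text)
print(value, "| rest:", text[end:])
```

One character of lookahead is enough here, which is the simplicity that is lost once a second escape mechanism is accepted.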
But if you also admit \ as an escape, then you have to check for \', \\', and \\\' (this last one should be converted to \' on input), and so on. You can run into trouble if you don't detect special cases such as:
\'' (should the '' be processed as SQL mandates, or the first \' is escaping the first ' and the second is the string end quote?)
\\'' (should the \\ be converted into a single \ then the ' should be the string terminator, or do we have to switch to SQL way of encoding and consider '' as a single quote?)
etc.
Check your database documentation to see whether \ as an escape character affects only the encoding of special characters (such as control characters) or also the interpretation of the quote character. If it does not affect the quote, you have to escape ' the other way, by doubling it.
That is why vendors provide functions to escape and unescape character literals embedded as values in a SQL statement. If you don't escape properly, attackers will put escape sequences into the data they post to you to see whether that lets them modify the text of the SQL command, for example by adding a semicolon ; followed by a complete SQL statement that gives them free access to your database.

How to create a regular expression to find an embedded dollar sign?

I am looking for a string like this: ">$3.45 in some HTML (I'm screen scraping), using this string as a regular expression: @"\">\\$"
The problem is that since the $ is a Regex character (match at end of line) my target is not found.
How do I write this string expression so NSRegularExpression will find the embedded ">$ in my HTML?
The \ is both the Objective-C string escape character and the regular-expression escape character, so to escape the $ you need to use:
@"\">\\$"
This creates a string containing a single \, and NSRegularExpression then sees that backslash and uses it to escape the $.
Note: At the time of writing this answer the question has been edited by a third party to remove the original problem!
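The same literal-then-regex layering can be tried in Python, whose string escapes for \" and \\ behave like Objective-C's in this case. This is only an analogue under that assumption; the sample HTML and the name PRICE_MARKER are made up for illustration.

```python
import re

# String level: \" yields a quote and \\ yields one backslash,
# so the regex engine receives the four characters  ">\$
PRICE_MARKER = re.compile("\">\\$")

html = '<td class="price">$3.45</td>'  # made-up sample markup

found = PRICE_MARKER.search(html)

# Without the backslash, $ is an end-of-input anchor and nothing matches.
anchored = re.search('">$', html)

print(PRICE_MARKER.pattern, found.group(), anchored)
```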

what characters should be escaped in sql string parameters

I need a complete list of characters that should be escaped in sql string parameters to prevent exceptions. I assume that I need to replace all the offending characters with the escaped version before I pass it to my ObjectDataSource filter parameter.
No, the ObjectDataSource will handle all the escaping for you. Any parametrized query will also require no escaping.
As others have pointed out, in 99% of the cases where someone thinks they need to ask this question, they are doing it wrong. Parameterization is the way to go. If you really need to escape yourself, try to find out if your DB access library offers a function for this (for example, MySQL has mysql_real_escape_string).
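As a sketch of what parameterization looks like in practice, the following uses Python's built-in sqlite3 module. That choice is an assumption made so the example is self-contained; the question itself is about ObjectDataSource, and the company table and find_company function are hypothetical.

```python
import sqlite3


def find_company(conn, legal_name):
    """Look up rows by name; the value is bound, never spliced into the SQL text."""
    cur = conn.execute(
        "SELECT legal_name FROM company WHERE legal_name = ?",
        (legal_name,),
    )
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (legal_name TEXT)")
conn.executemany(
    "INSERT INTO company (legal_name) VALUES (?)",
    [("Mac O'Connor",), ("Acme",)],
)

print(find_company(conn, "Mac O'Connor"))
print(find_company(conn, "x' OR '1'='1"))
```

The apostrophe in the name needs no doubling, and the injection-style input is treated as an ordinary value that matches no row.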
SQL Books online:
Search for String Literals:
String Literals
A string literal consists of zero or more characters surrounded by quotation marks. If a string contains quotation marks, these must be escaped in order for the expression to parse. Any two-byte character except \x0000 is permitted in a string, because the \x0000 character is the null terminator of a string.
Strings can include other characters that require an escape sequence. The following table lists escape sequences for string literals.
Escape sequence    Description
\a                 Alert
\b                 Backspace
\f                 Form feed
\n                 New line
\r                 Carriage return
\t                 Horizontal tab
\v                 Vertical tab
\"                 Quotation mark
\\                 Backslash
\xhhhh             Unicode character in hexadecimal notation
Here's how I handled apostrophes: double each one so it is read as a literal quote. You could do the same thing with other offending characters that you run into. (Example in VB.Net.)
Dim companyFilter = Trim(Me.ddCompany.SelectedValue)
If (Me.ddCompany.SelectedIndex > 0) Then
    ' Double each apostrophe so the filter expression sees a literal quote.
    filterString += String.Format("LegalName like '{0}'", companyFilter.Replace("'", "''"))
End If
Me.objectDataSource.FilterExpression = filterString
Me.displayGrid.DataBind()