I have a text box in a project where user can write database queries, but I nedd to prevent statements like DELETE, DROP or use of comments (/* */, --) or semicolon ;.
I'm using the folowing RegExp to check the query. It must math only valid statments.
/^(?!.*\-\-)(?!.*\/\*)(?!.*\*\/)(?!.*;)(?!.*CREATE)(?!.*DROP)(?!.*ALTER)(?!.*UPDATE)(?!.*DELETE).*$/
The RegExp is working fine, but it's not matching also line breaks and carriage returns (\n, \r), which should be permited.
How can I update the RegExp to allow \n and \r?
#FabSa gave me the answer.
The trick is just set the PCRE_DOTALL flag, as I'm using PHP _preg_match(). This does make a dot (.) metacharacter matches all characters, incluind newline (\n).
So, the final regex is as folows:
/^(?!.*\-\-)(?!.*\/\*)(?!.*\*\/)(?!.*;)(?!.*CREATE)(?!.*DROP)(?!.*ALTER)(?!.*UPDATE)(?!.*DELETE).*$/s
Related
I want to judge if a positive number string is end with ".0", so I wrote the following sql:
select '12310' REGEXP '^[0-9]*\.0$'. The result is true however. I wonder why I got the result, since I use "\" before "." to escape.
So I write another one as select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false.
Could anyone tell me the right pattern?
Dot (.) in regexp has special meaning (any character) and requires escaping if you want literally dot:
select '12310' REGEXP '^[0-9]*\\.0$';
Result:
false
Use double-slash to escape special characters in Hive. slash has special meaning and used for characters like \073 (semicolon), \n (newline), \t (tab), etc. This is why for escaping you need to use double-slash. Also for character class digit use \\d:
hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true
Also characters inside square brackets do not need double-slash escaping: [.] can be used instead of \\.
If you know it is a number string, why not just use:
select ( val like '%.0' )
You need regular expression if you want to validate that the string has digits everywhere else. But if you only need to check the last two characters, like is sufficient.
As for your question . is a wildcard in regular expressions. It matches any character.
To prevent SQL injection attack, the book "Building Scalable Web Sites" has a function to replace regular expression characters with escaped version:
function db_escape_str_rlike($string) {
preg_replace("/([().\[\]*^\$])/", '\\\$1', $string);
}
Does this function escape ( ) . [ ] * ^ $? Why are only those characters escaped in SQL?
I found an excerpt from the book you mention, and found that the function is not for escaping to protect against SQL injection vulnerabilities. I assumed it was, and temporarily answered your question with that in mind. I think other commenters are making the same assumption.
The function is actually about escaping characters that you want to use in regular expressions. There are several characters that have special meaning in regular expressions, so if you want to search for those literal characters, you need to escape them (precede with a backslash).
This has little to do with SQL. You would need to escape the same characters if you wanted to search for them literally using grep, sed, perl, vim, or any other program that uses regular expression searches.
Unfortunately, active characters in sql databases is an open issue. Each database vendor uses their own (mainly oracle's mysql, that uses \ escape sequences)
The official SQL way to escape a ', which is the string delimiter used for values is to double the ', as in ''.
That should be the only way to ensure transparency in SQL statements, and the only way to introduce a proper ' into a string. As soon as any vendor admits \' as a synonim of a quote, you are open to support all the extra escape sequences to delimit strings. Suppose you have:
'Mac O''Connor' (should go into "Mac O'Connor" string)
and assume the only way to escape a ' is that... then you have to check the next char when you see a ' for a '' sequence and:
you get '' that you change into '.
you get another, and you terminate the string literal and process the char as the first of the next token.
But if you admit \ as escape also, then you have to check for \' and for \\', and \\\' (this last one should be converted to \' on input) etc. You can run into trouble if you don't detect special cases as
\'' (should the '' be processed as SQL mandates, or the first \' is escaping the first ' and the second is the string end quote?)
\\'' (should the \\ be converted into a single \ then the ' should be the string terminator, or do we have to switch to SQL way of encoding and consider '' as a single quote?)
etc.
You have to check your database documentation to see if \ as escape characters affect only the encoding of special characters (like control characters or the like) and also affects the interpretation of the quote character or simply doesn't, and you have to escape ' the other way.
That is the reason for the vendors to include functions to do the escape/unescape of character literals into values to be embedded in a SQL statement. The idea of the attackers is to include (if you don't properly do) escape sequences into the data they post to you to see if that allows them to modify the text of the sql command to simply add a semicolon ; and write a complete sql statement that allows them to access freely your database.
I am using a replace function to add some quotes around a couple of keywords.
However, this replacement doesn't work for a few cases like the one below.
See example below.
This is the query:
replace(replace(aa.SourceQuery,'sequence','"sequence"'),'timestamp','"timestamp"')
Before:
select timestamp, SparkTimeStamp
from SparkRecordCounts
After:
select "timestamp", Spark"timestamp"
from SparkRecordCounts
However, I want it to be like:
select "timestamp", Sparktimestamp
from SparkRecordCounts
EDIT I wrote this before knowing what RDBMS you were using but have left it in case it helps someone else.
I think you are looking for word boundaries in your replacement, which are generally a job for regular expressions.
Oracle has one built in, called regexp_replace, and you could use something like this:
regexp_replace(aa.SourceQuery, '(^|\s|\W)timestamp($|\s|\W)', '\1"timestamp"\2')
The regular expression looks at the start for:
^ - the start of the line OR
\s - a space character OR
\W - a non-word character
It then matches timestamp, and must end with:
$ - the end of the line OR
\s - a space character OR
\W - a non-word character
Then, and only then, does it perform the replace. \1 and \2 are used to preserve what word boundary matched at the beginning and ending of the word.
I'm not sure how other databases handle regexp_replace, it looks like mysql can via a plugin like this but there may not be a native method.
SQL Server has a solution to something similar here
In the following sql, what the use of escape is ?
select * from dual where dummy like 'funny&_' escape '&';
SQL*Plus ask for the value of _ whether escape is specified or not.
The purpose of the escape clause is to stop the wildcard characters (eg. % or _) from being considered as wildcards, as per the documentation
The reason why you're being prompted for the value of _ is because you're using &, which is also usually the character used to prompt for a substitution variable.
To stop the latter from happening, you could:
change to a different escape character
prior to running your statement, run set define off if you're using SQL*Plus (or as a script in a GUI, eg. Toad) or turn off the substitution variable prompting if you're using a GUI.
change the define character to something different by running set define <character>
The escape character is used to indicate that the underscore should be matched as an actual character, rather than as a single-character wildcard. This is explained in the documentation.
You can include the actual characters % or _ in the pattern by using the ESCAPE clause, which identifies the escape character. If the escape character precedes the character % or _ in the pattern, then Oracle interprets this character literally in the pattern rather than as a special pattern-matching character.
If you didn't have the escape clause then the underscore would match any single character, so where dummy like 'funny_' would match 'funnyA', 'funnyB', etc. and not just an actual underscore.
The escape character you've chosen is & which is the default SQL*Plus client substitution variable marker. It has nothing to do with the escape clause, and using that is causing the &_ part of the pattern to be interpreted as a substitution variable called _, hence your being prompted. As it isn't related, the escape clause has no effect on that.
The simplest thing is probably to choose a different escape character. If you want to use that specific escape character and not be prompted, disable or change the substitution character:
set define off
select * from dual where dummy like 'funny&_' escape '&';
set define on
That will then match rows where dummy contains exactly the string 'funny_'. (It's therefore equivalent to where dummy = 'funny_', as there are no unescaped wildcards, making the like pattern matching redundant). It will not match any that start with that pattern (it's sort of like using regexp_like with start and end anchors, and you might be expecting it to work as if you hadn't supplied anchors, but it doesn't). You would need to add a % wildcard for that:
set define off
select * from dual where dummy like 'funny&_%' escape '&';
set define on
And if you want to match any that don't start with funny_ but have it somewhere in the middle of the value, you would need to add another wildcard before it too:
set define off
select * from dual where dummy like '%funny&_%' escape '&';
set define on
You haven't shown any sample data or expected results to it isn't clear which pattern you need.
SQL Fiddle doesn't have substitution variables but here's an example showing how those three patterns match various values.
The syntax for the SQL LIKE Condition is:
expression LIKE pattern [ ESCAPE 'escape_character' ]
Parameters or Arguments
expression : A character expression such as a column or field.
pattern : A character expression that contains pattern matching. The patterns that you can choose from are:
Wildcard | Explanation
---------+-------------
% | Allows you to match any string of any length (including zero length)
_ | Allows you to match on a single character
escape_character: Optional. It allows you to test for literal instances of a wildcard character such as % or _.
Source : http://www.techonthenet.com/sql/like.php
SQLITE Query question:
I have a query which returns string with the character '#' in it.
I would like to remove all characters after this specific character '#':
select field from mytable;
result :
text#othertext
text2#othertext
text3#othertext
So in my sample I would like to create a query which only returns :
text
text2
text3
I tried something with instr() to get the index, but instr() was not recognized as a function -> SQL Error: no such function: instr (probably old version of db . sqlite_version()-> 3.7.5).
Any hints howto achieve this ?
There are two approaches:
You can rtrim the string of all characters other than the # character.
This assumes, of course, that (a) there is only one # in the string; and (b) that you're dealing with simple strings (e.g. 7-bit ASCII) in which it is easy to list all the characters to be stripped.
You can use sqlite3_create_function to create your own rendition of INSTR. The specifics here will vary a bit upon how you're using