Split SQL statements blocks using Regular expression in C# - sql

Can we write a regular expression that splits a stored procedure into multiple SQL statements?
It should split UPDATE, DELETE, SELECT, etc. statements.
Edit: my attempt to solve the problem http://tsqlparsergdr.codeplex.com/

If you have the grammar for the stored procedure language, you could use ANTLR to parse the procedure, extract the relevant parts of the language, and then do any further processing necessary. It should be relatively easy to get a grammar going from scratch as well.
With regexes, you would need a set of expressions to deal with the whole procedure, e.g. a regex to match just INSERT statements, which possibly span many lines and possibly contain local variables from the procedure, and so on.

If you are working with a known set of SQL procedures it should be pretty easy to examine them and come up with a set of regexes to split them as required.
If you are looking for something that will handle any possible set of SQL procedures, then regexes won't hack it! SQL has a complex recursive grammar, and there will always be some subselect, GROUP BY, or literal that will break your regex-based parser.
As the previous poster recommended, you really need a full parser, such as one generated by ANTLR or JavaCC (is there a C# equivalent?).
There are a number of SQL-92 grammar definitions available for these parser generators on the net, so a large part of the work has been done for you. The remaining part, writing the parser's application logic, is still far from trivial.

To parse arbitrary stored procedures, you're far better off with a SQL parser. Trying to parse arbitrary SQL with regexes will amount to writing your own parser.
To parse a specific set of stored procedures, a regex may be able to do the job. You'll need to provide a few examples of the input you have and the desired output if you want a more detailed answer.
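As a rough illustration of the "specific set of procedures" case, here is a minimal Python sketch (the same pattern syntax should carry over to C#'s Regex class with little change). It assumes every statement is terminated with a semicolon, which T-SQL does not require, and it only knows about string literals, line comments, and non-nested block comments. The name split_statements is made up for this example. Anything beyond those assumptions (BEGIN...END blocks, statements without terminators) is exactly where a real parser becomes necessary.

```python
import re

# Either something that must be skipped whole, or a semicolon to split on.
TOKEN = re.compile(
    r"""
      '(?:[^']|'')*'      # string literal; '' is an escaped quote
    | --[^\n]*            # line comment
    | /\*.*?\*/           # block comment (non-nested)
    | ;                   # statement terminator
    """,
    re.VERBOSE | re.DOTALL,
)


def split_statements(sql: str) -> list[str]:
    """Split on semicolons that are outside literals and comments."""
    statements, start = [], 0
    for match in TOKEN.finditer(sql):
        if match.group() == ";":
            piece = sql[start:match.start()].strip()
            if piece:
                statements.append(piece)
            start = match.end()
    tail = sql[start:].strip()
    if tail:
        statements.append(tail)
    return statements
```

Semicolons inside a quoted string or a comment are consumed by the first three alternatives, so they never reach the split branch.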

Related

Is the single-quote the only character in PL/SQL text literals that needs escaping assuming LIKE clauses are not being used?

I have a script that generates a large bulk of update statements. Those statements will be executed by being copied and pasted into SQL Developer by a human without validation.
To avoid accidental SQL injection, I tried to understand more about special characters. Based on the resources I found so far, I'm starting to believe that if LIKE clauses are not being used, then it's only the single-quote that needs to be escaped. This contradicts my spidey sense, which states that escaping special characters must be more complicated than that, so I decided to post this question for validation.
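For reference, the single-quote doubling described above could look like the following Python sketch. The function name oracle_text_literal is hypothetical, and the sketch covers only the quote doubling itself; it does not settle whether the tool that executes the pasted script gives meaning to any other character.

```python
def oracle_text_literal(value: str) -> str:
    """Wrap value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"
```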

Can you write a sql statement without spaces between keywords

I am trying to do SQL Injection testing but I am currently testing a command line that separates parameters by spaces, so I'm trying to write a sql statement without any spaces. I've gotten it down to:
create table"aab"("id"int,"notes"varchar(100))
But I cannot figure out how to get rid of the space between CREATE and TABLE. The same would apply obviously for DROP and TABLE, etc.
Does anyone have any ideas? This is for Microsoft SQL Server 2014. Thanks!
[Update]: We are evaluating a third party product for vulnerabilities. I am not doing this to test my own code for weaknesses.
In many cases you can put an empty comment where a space would go, so /**/ instead of a space.
Sure, it is possible to write some pretty elaborate statements without spaces.
Here is one.
select'asdf'as[asdf]into[#MyTable]
You can even do things like execute sp_executesql without spaces.
exec[sp_executesql]N'select''asdf''as[asdf]into[#MyTable]'
This is not possible; you have to check every argument to make sure it is as intended.
If they are supposed to be numbers, make sure they are numbers; if they are supposed to be strings that may contain specific characters (like ' or ,), you should escape them when executing the request.
There should be a dedicated mechanism in your programming language to take care of that (like PreparedStatement in Java).
You can also use brackets () around every function call and expression instead of spaces:
SELECT(COUNT(id))FROM(users)where(id>5)
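The point for anyone evaluating a product is that filtering on spaces is not a defense. A small Python sketch, using SQLite as a stand-in (an assumption: the question concerns SQL Server 2014, whose parser may accept a different set of forms), shows two of the tricks above being parsed without a single space. The helper names run and spaces_to_comments are invented for this example.

```python
import sqlite3


def run(sql: str):
    """Execute one statement against a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def spaces_to_comments(sql: str) -> str:
    """Replace every space with an empty comment.

    Naive on purpose: it would also rewrite spaces inside string literals.
    """
    return sql.replace(" ", "/**/")


no_space = spaces_to_comments("select 1 + 1")   # select/**/1/**/+/**/1
comment_result = run(no_space)
quote_result = run("select'asdf'as[asdf]")
```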

sql injection when single quote in input is always replaced with double quote

Suppose the SQL query invoked by the login form is:
SELECT * FROM Users WHERE Name ='user input data' AND Pass ='user input data'
Now all single quotes in user input data are replaced with double single quotes (in other words, ' is replaced with ''.).
I can think of this possible sql injection: set user as the intended user and set the password as '\' or 1=1
but I can't think of how I am going to avoid the last ' from disrupting my sql injection.
Avoiding the last ' from disrupting your SQL injection is as simple as using a comment; for example, -- works great for single-line SQL. Alternatively, you can always add something like this:
someHack and '' = '
Now, is it possible to get around your single-quote to double-quote replacement? That most likely depends heavily on the actual SQL variant and the SQL engine you're using. For example, some SQL engines might treat line endings in input in a way that would give you trouble. Unicode is very likely to give you trouble as well, with its very complicated rules. In particular, if parts of your code are Unicode-aware and others aren't, it's quite easy to pass a harmless Unicode character to someone who expects ASCII, and actually use your replacement to add the (unescaped) quote. In case you think this is a weird edge case, it's actually a very real and widely used SQL injection vector against PHP programmers who thought replace (or addslashes) is good enough to prevent SQL injection :D
If possible, try to use parameters instead of slapping text together. It will likely help your performance as well :) The simplest way to avoid parser bugs is to avoid using the parser.
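A minimal sketch of what "use parameters" means, in Python with the standard sqlite3 module standing in for whatever database the login form really talks to (an assumption). The Users table mirrors the query in the question; check_login and the sample row are made up, and storing plain-text passwords is for illustration only.

```python
import sqlite3


def check_login(conn: sqlite3.Connection, name: str, password: str) -> bool:
    """Return True if a matching row exists; inputs are bound, never spliced in."""
    row = conn.execute(
        "SELECT 1 FROM Users WHERE Name = ? AND Pass = ?",
        (name, password),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, Pass TEXT)")
conn.execute("INSERT INTO Users VALUES (?, ?)", ("alice", "s3cret"))
```

With this shape, a password such as ' or 1=1 -- is compared as an ordinary string, so no quote replacement is needed at all.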

SQL injection if brackets and semicolons are filtered

I have a statement like this:
SELECT * FROM TABLE WHERE COLUMN = 123456
123456 is provided by the user, so it is vulnerable to SQLi. But if I strip all semicolons and brackets, is it possible for the hacker to run any statements other than SELECT (like DROP, UPDATE, INSERT, etc.)?
I am already using prepared statements, but I am curious: if the input is stripped of the statement terminator and brackets, can the hacker modify the DB in any way?
Use SQL parameters. Attempting to "sanitize" input is an extremely bad idea. Try googling some complex SQL injection snippets; you won't believe how creative black hat hackers are.
In general it's very difficult to be 100% certain that you are safe from this type of attack by trying to strip out specific characters - there are just too many ways to get around your code (by using character encodings etc.)
A better option is to pass parameters to a stored procedure, like this:
CREATE PROCEDURE usp_MyStoredProcedure
@MyParam int
AS
BEGIN
SELECT * FROM TABLE WHERE COLUMN = @MyParam
END
GO
That way SQL will treat the value passed in as a parameter, and nothing else, no matter what it contains. And in this case it would only accept a value of type int anyway.
If you don't want, or can't, use a stored procedure, then I'd suggest changing your code so that the input parameter can only contain a pre-defined list of characters - in this case numeric characters. That way you can be certain that the value is safe to use.
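The allow-list idea can be sketched in a few lines of Python. The function name parse_numeric_id is hypothetical; the check accepts ASCII digits only and rejects everything else instead of trying to strip "bad" characters.

```python
import re

# ASCII digits only; str.isdigit() would also accept other Unicode digits.
_DIGITS = re.compile(r"[0-9]+")


def parse_numeric_id(raw: str) -> int:
    """Return raw as an int, or raise ValueError if it is not purely numeric."""
    if not _DIGITS.fullmatch(raw):
        raise ValueError(f"not a numeric id: {raw!r}")
    return int(raw)
```

Converting to an integer and passing that value on, preferably still as a bound parameter, means the original string never reaches the SQL text.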

T-SQL language specification and lexing rules

I'm thinking about writing a templating tool for generating T-SQL code, which will include delimited sections like the ones below:
SELECT
~~idcolumn~~
FROM
~~table~~
WHERE
~~table~~.flag = 1
Notice the double-tildes delimiting bits? This is an idea for an escape sequence in my templating language. But I want to be certain that the escape sequence is valid, i.e. that it will never occur in a valid T-SQL statement. Problem is, I can't find any official Microsoft description of the T-SQL language.
Does anyone know of an official specification for the T-SQL language, or at least the lexing rules? So I can make an informed decision about the escape sequence.
UPDATES:
Thanks for the suggestions so far, but I'm not looking for confirmation of the '~~' escape sequence per se. What I need is a document I can point to and say 'Microsoft says this character sequence is totally impossible in T-SQL.' For instance, Microsoft publishes the language specification for C# here, which includes a description of what characters can go into valid C# programs (see page 67 of the PDF). I'm looking for a similar reference.
The double-tilde: "~~" is actually perfectly good T-SQL. For instance, "(SELECT ~~1)" returns '1'.
There are several well known and often used formats for template parameters, one example being $(paramname) (also used in other scripts as well as T-SQL scripts)
Why not use an existing format?
It doesn't matter if ~~ is legal TSQL or not, if you provide an escape for producing ~~ in actual TSQL when you need it.
Since template parameters have to have a nonzero-length identifier, you have a peculiar case where the identifier length is ridiculously "zero", e.g., ~~~~. This kind of thing makes an ideal escape sequence, since it is useless for anything else. Simply process your template text; whenever you find ~~name~~ replace it by the named parameter string, and whenever you find ~~~~ replace it by ~~. Now, if ~~ is needed in the final TSQL, just write ~~~~ in your template.
I suspect that even if you do this, the number of times you'll actually write ~~~~ in practice will be close to zero, so the reason for doing it is theoretical completeness and giving you a warm fuzzy feeling that you can write anything in a template.
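A minimal Python sketch of that substitution rule, assuming placeholder names consist of word characters only; the names render and PLACEHOLDER are invented for this example. An empty name between the tildes is the escape for a literal ~~.

```python
import re

# ~~name~~ is a placeholder; ~~~~ (empty name) is the escape for a literal ~~.
PLACEHOLDER = re.compile(r"~~(\w*)~~")


def render(template: str, params: dict[str, str]) -> str:
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name == "":
            return "~~"          # escape sequence
        return params[name]      # KeyError for an unknown placeholder
    return PLACEHOLDER.sub(substitute, template)
```

For example, render("SELECT ~~idcolumn~~ FROM ~~table~~", {"idcolumn": "id", "table": "users"}) yields "SELECT id FROM users". Note that this version substitutes everywhere, including inside string literals.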
Well, I'm not sure about a complete description of the language, but it appears that ~~ could occur in an identifier provided that it is quoted (in brackets, typically).
You may have more luck with a convention saying you don't support identifiers with ~~ in them. Or, just reserve your own lexical symbols and don't worry about ~~ occurring elsewhere.
You could treat quoted literals and strings as content, regardless of whether they contain your escape sequence. It would make it more robust.
Run the text through a lexer, to separate each token. If the token is a string or a quoted literal, treat it as such. But if it is a literal that begins and ends with ~~, you can safely assume it is a template placeholder.
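A rough Python sketch of that lexer idea, using one regular expression with three alternatives rather than a full T-SQL lexer (an assumption: only single-quoted strings and [bracketed] identifiers need protecting). The names LEXER and render_outside_literals are hypothetical.

```python
import re

LEXER = re.compile(
    r"""
      (?P<string>'(?:[^']|'')*')          # string literal; '' escapes a quote
    | (?P<bracket>\[(?:[^\]]|\]\])*\])    # [quoted identifier]; ]] escapes ]
    | ~~(?P<name>\w+)~~                   # template placeholder
    """,
    re.VERBOSE,
)


def render_outside_literals(template: str, params: dict[str, str]) -> str:
    def substitute(match: re.Match) -> str:
        if match.lastgroup == "name":
            return params[match.group("name")]
        return match.group()   # strings and bracketed names pass through untouched
    return LEXER.sub(substitute, template)
```

Because the literal alternatives are tried at each position before the placeholder one, any ~~...~~ that sits inside quotes or brackets is consumed as part of the literal and left alone.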
I'm not sure you'll find something that will never occur in a valid statement. Consider:
DECLARE @TemplateBreakingString varchar(100) = '~~I hope this works~~'
or
CREATE TABLE [~~TemplateBreakingTable~~] (IDField INT Identity)
Your escape sequence can occur in string literals, but that is all. That said, Microsoft owns T-SQL, and they are free to do anything they want with it in future versions of SQL Server. Still, I think ~~ is safe enough.