According to this forum discussion, SQL Server (I'm using 2005 but I gather this also applies to 2000 and 2008) silently truncates any varchars you specify as stored procedure parameters to the declared length of the parameter, even if inserting that string directly using an INSERT would actually cause an error. For example, if I create this table:
CREATE TABLE testTable(
[testStringField] [nvarchar](5) NOT NULL
)
then when I execute the following:
INSERT INTO testTable(testStringField) VALUES(N'string which is too long')
I get an error:
String or binary data would be truncated.
The statement has been terminated.
Great. Data integrity preserved, and the caller knows about it. Now let's define a stored procedure to insert that:
CREATE PROCEDURE spTestTableInsert
@testStringField [nvarchar](5)
AS
INSERT INTO testTable(testStringField) VALUES(@testStringField)
GO
and execute it:
EXEC spTestTableInsert @testStringField = N'string which is too long'
No errors, 1 row affected. A row is inserted into the table, with testStringField as 'strin'. SQL Server silently truncated the stored procedure's varchar parameter.
Now, this behaviour might be convenient at times but I gather there is NO WAY to turn it off. This is extremely annoying, as I want the thing to error if I pass too long a string to the stored procedure. There seem to be 2 ways to deal with this.
First, declare the stored proc's @testStringField parameter as size 6, and check whether its length is over 5. This seems like a bit of a hack and involves irritating amounts of boilerplate code.
Second, just declare ALL stored procedure varchar parameters to be varchar(max), and then let the INSERT statement within the stored procedure fail.
The latter seems to work fine, so my question is: is it a good idea to use varchar(max) ALWAYS for strings in SQL Server stored procedures, if I actually want the stored proc to fail when too long a string is passed? Could it even be best practice? The silent truncation that can't be disabled seems stupid to me.
It just is.
I've never noticed a problem though because one of my checks would be to ensure my parameters match my table column lengths. In the client code too. Personally, I'd expect SQL to never see data that is too long. If I did see truncated data, it'd be bleeding obvious what caused it.
If you do feel the need for varchar(max), beware a massive performance issue caused by datatype precedence. varchar(max) has higher precedence than varchar(n) (longest is highest). So in this type of query you'll get a scan, not a seek, and every varchar(100) value is CAST to varchar(max):
UPDATE ...WHERE varchar100column = @varcharmaxvalue
Edit:
There is an open Microsoft Connect item regarding this issue.
And it's probably worthy of inclusion in Erland Sommarskog's Strict settings (and matching Connect item).
Edit 2, after Martin's comment:
DECLARE @sql VARCHAR(MAX), @nsql nVARCHAR(MAX);
SELECT @sql = 'B', @nsql = 'B';
SELECT
LEN(@sql),
LEN(@nsql),
DATALENGTH(@sql),
DATALENGTH(@nsql)
;
DECLARE @t table(c varchar(8000));
INSERT INTO @t values (replicate('A', 7500));
SELECT LEN(c) from @t;
SELECT
LEN(@sql + c),
LEN(@nsql + c),
DATALENGTH(@sql + c),
DATALENGTH(@nsql + c)
FROM @t;
Thanks, as always, to StackOverflow for eliciting this kind of in-depth discussion. I have recently been scouring through my Stored Procedures to make them more robust using a standard approach to transactions and try/catch blocks. I disagree with Joe Stefanelli that "My suggestion would be to make the application side responsible", and fully agree with Jez: "Having SQL Server verify the string length would be much preferable". The whole point for me of using stored procedures is that they are written in a language native to the database and should act as a last line of defence. On the application side the difference between 255 and 256 is just a meaningless number, but within the database environment, a field with a maximum size of 255 will simply not accept 256 characters. The application validation mechanisms should reflect the backend db as best they can, but maintenance is hard, so I want the database to give me good feedback if the application mistakenly allows unsuitable data. That's why I'm using a database instead of a bunch of text files with CSV or JSON or whatever.
I was puzzled why one of my SPs threw the 8152 error and another silently truncated. I finally twigged: The SP which threw the 8152 error had a parameter which allowed one character more than the related table column. The table column was set to nvarchar(255) but the parameter was nvarchar(256). So, wouldn't my "mistake" address gbn's concern: "massive performance issue"? Instead of using max, perhaps we could consistently set the table column size to, say, 255 and the SP parameter to just one character longer, say 256. This solves the silent truncation problem and doesn't incur any performance penalty.
Presumably there is some other disadvantage that I haven't thought of, but it seems a good compromise to me.
Update:
I'm afraid this technique is not consistent. Further testing reveals that I can sometimes trigger the 8152 error and sometimes the data is silently truncated. I would be very grateful if someone could help me find a more reliable way of dealing with this.
Update 2:
Please see Pyitoechito's answer on this page.
The same behavior can be seen here:
declare @testStringField [nvarchar](5)
set @testStringField = N'string which is too long'
select @testStringField
My suggestion would be to make the application side responsible for validating the input before calling the stored procedure.
Update: I'm afraid this technique is not consistent. Further testing reveals that I can sometimes trigger the 8152 error and sometimes the data is silently truncated. I would be very grateful if someone could help me find a more reliable way of dealing with this.
This is probably occurring because the 256th character in the string is white-space. VARCHARs will truncate trailing white-space on insertion and just generate a warning. So your stored procedure is silently truncating your strings to 256 characters, and your insertion is truncating the trailing white-space (with a warning). It will produce an error when said character is not white-space.
Perhaps a solution would be to make the stored procedure's VARCHAR a suitable length to catch a non-white-space character. VARCHAR(512) would probably be safe enough.
One solution would be to:
Change all incoming parameters to be varchar(max)
Declare private variables of the correct length inside the procedure (simply copy and paste all the incoming parameters and add "Int" to the end of each name)
Declare a table variable with the column names the same as variable names
Insert into the table a row where each variable goes into the column with the same name
Select from the table into internal variables
This way your modifications to the existing code are going to be very minimal like in the sample below.
This is the original code:
create procedure spTest
(
@p1 varchar(2),
@p2 varchar(3)
)
This is the new code:
create procedure spTest
(
@p1 varchar(max),
@p2 varchar(max)
)
as
declare @p1Int varchar(2), @p2Int varchar(3)
declare @test table (p1 varchar(2), p2 varchar(3))
insert into @test (p1,p2) values (@p1, @p2)
select @p1Int=p1, @p2Int=p2 from @test
Note that if the length of an incoming parameter is greater than the limit, SQL Server will throw an error instead of silently chopping off the string.
You could always put an IF statement into your stored procedures that checks the length of each parameter and throws an error if it is greater than the specified length. This is rather time-consuming, though, and would be a pain to maintain if you change the data size.
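A minimal sketch of such a check, assuming the testTable from the question and a hypothetical procedure name, spTestTableInsertChecked. The parameter is declared one character wider than the column, so an over-long value survives the silent truncation long enough to be detected. DATALENGTH is used rather than LEN because LEN ignores trailing spaces.

```sql
-- Hypothetical variant of spTestTableInsert; the name is an assumption.
CREATE PROCEDURE spTestTableInsertChecked
    @testStringField nvarchar(6)   -- one character wider than the nvarchar(5) column
AS
BEGIN
    SET NOCOUNT ON;

    -- DATALENGTH returns bytes: 2 per nvarchar character
    -- (assuming no supplementary characters), trailing spaces included.
    IF DATALENGTH(@testStringField) > 5 * 2
    BEGIN
        RAISERROR(N'@testStringField is longer than 5 characters.', 16, 1);
        RETURN 1;
    END;

    INSERT INTO testTable (testStringField) VALUES (@testStringField);
END;
GO
```

With this in place, EXEC spTestTableInsertChecked @testStringField = N'string which is too long' should raise the error instead of inserting 'strin'. The cost is exactly the boilerplate complained about above: the limit is repeated in the check and has to be kept in step with the column definition.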
This isn't the Answer that'll solve your problem today, but it includes a Feature Suggestion for MSSQL to consider adding, that would resolve this issue.
It is important to call this out as a shortcoming of MSSQL, so we may help them resolve it by raising awareness of it.
Here's the formal Suggestion if you'd like to vote on it:
https://feedback.azure.com/forums/908035-sql-server/suggestions/38394241-request-for-new-rule-string-truncation-error-for
I share your frustration.
The whole point of setting Character-Size on Parameters is so other Developers will instantly know
what the Size Limits are (via Intellisense) when passing in Data.
This is like having your documentation baked right into the Sproc's Signature.
Look, I get it, Implicit-Conversion during Variable Assignments is the culprit.
Still, there is no good reason to expend this amount of energy battling scenarios
where you are forced to work around this feature.
If you ask me, Sprocs and Functions should have the same engine-rules in place,
for Assigning Parameters, that are used when Populating Tables. Is this really too much to ask?
All these suggestions to use Larger Character-Limits
and then adding Validation for EACH Parameter in EVERY Sproc is ridiculous.
I know it's the only way to ensure Truncation is avoided, but really MSSQL?
I don't care if it's ANSI/ISO Standard or whatever, it's dumb!
When Values are too long - I want my code to break - every time.
It should be: Do not pass go, and fix your code.
You could have multiple truncation bugs festering for years and never catch them.
What happened to ensuring your Data-Integrity?
It's dangerous to assume your SQL Code will only ever be called after all Parameters are Validated.
I try to add the same Validation to both my Website and in the Sproc it calls,
and I still catch Errors in my Sproc that slipped past the website. It's a great sanity-check!
What if you want to re-use your Sproc for a WebSite/WebService and also have it called from other
Sprocs/Jobs/Deployment/Ad-Hoc Scripts (where there is no front-end to Validate Parameters)?
MSSQL Needs a "NO_TRUNC" Option to Enforce this on any Non-Max String Variable
(even those used as Parameters for Sprocs and Functions).
It could be Connection/Session-Scoped:
(like how the "TRANSACTION ISOLATION LEVEL READ UNCOMMITTED" Option affects all Queries)
Or focused on a Single Variable:
(like how "NOLOCK" is a Table Hint for just 1 Table).
Or a Trace-Flag or Database Property you turn on to apply this to All Sproc/Function Parameters in the Database.
I'm not asking to upend decades of Legacy Code.
Just asking MS for the option to better manage our Databases.
I'm currently working on a .NET application and want to make it as modular as possible. I've already created a basic SELECT procedure, which returns data by checking inputted parameters on SQL Server side.
I want to create a procedure that parses structured data passed as a string and inserts its contents into the corresponding table in the database.
For example, I have a table as
CREATE TABLE ExampleTable (
id_exampleTable int IDENTITY (1, 1) NOT NULL,
exampleColumn1 nvarchar(200) NOT NULL,
exampleColumn2 int NULL,
exampleColumn3 int NOT NULL,
CONSTRAINT pk_exampleTable PRIMARY KEY ( id_exampleTable )
)
And my procedure starts as
CREATE PROCEDURE InsertDataIntoCorrespondingTable
@dataTable nvarchar(max), --name of Table in my DB
@data nvarchar(max) --normalized string parameter as 'column1, column2, column3, etc.'
AS
BEGIN
IF @dataTable = 'table'
BEGIN
/**Parse this string and execute insert command**/
END
ELSE IF /**Other statements**/
END
TL;DR
So basically, I'm looking for a solution that can help me achieve something like this
EXEC InsertDataIntoCorrespondingTableByID
@dataTable = 'ExampleTable',
@data = '''exampleColumn1'', 2, 3'
Which should be equal to just
INSERT INTO ExampleTable SELECT 'exampleColumn1', 2, 3
Sure, I can push data as INSERT statements (for each and every 14 tables inside DB...), generated inside an app, but I want to conquer T-SQL :)
This might be reasonable (to some degree) on an RDBMS that supports structured data like JSON or XML natively, but doing this the way you are planning is going to cause some real pain-in-the-rear support and, more importantly, a sql injection attack vector. I would leave this to the realm of the web backend server where it belongs.
You are likely going to invent your own structured data markup language and parser to solve this in SQL Server. That's a wheel that doesn't need to be reinvented. If you do end up building this, strongly consider going with JSON to avoid all the issues that structured data inherently brings with it, assuming your version of SQL Server supports JSON parsing/packaging.
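For illustration, a minimal sketch of the JSON route, assuming SQL Server 2016 or later (database compatibility level 130 or higher) and the ExampleTable from the question. The JSON property names are an assumption; any agreed naming would do.

```sql
-- Hypothetical payload; in the procedure this would arrive as the @data parameter.
DECLARE @data nvarchar(max) =
    N'{"exampleColumn1": "some text", "exampleColumn2": 2, "exampleColumn3": 3}';

-- OPENJSON maps named properties to typed columns, so nothing depends on column order.
INSERT INTO ExampleTable (exampleColumn1, exampleColumn2, exampleColumn3)
SELECT exampleColumn1, exampleColumn2, exampleColumn3
FROM OPENJSON(@data)
WITH (
    exampleColumn1 nvarchar(200) '$.exampleColumn1',
    exampleColumn2 int           '$.exampleColumn2',
    exampleColumn3 int           '$.exampleColumn3'
);
```

Because the values are read by name and typed by the WITH clause, they are never concatenated into a SQL string.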
Your front end that packages your data into your SDML is going to have to assume column ordinals, but column ordinal is not something that one should rely on in a database. SQL Amateurs often do, I know from years in the industry and dealing with end users that are upset when a new column is introduced in a position they don't want it. Adding a column to a table shouldn't break an application. If it does, that application has bad code.
Regarding the sql injection attack vector, your SP code is going to get ugly. You'll need to parse out each item in @data into a variable of its own in order to properly parameterize your dynamic sql that is being built. See here under the "working with parameters" section for what that will look like. Failure to add this to your SP code means that values passed in that @data SDML could become executable SQL instead of literals and that would be very bad. This is not easy to solve in SP language. Where it IS easy to solve though is in the backend server code. Every database library on the planet supports parameterized query building/execution natively.
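As a rough sketch of what that parameterization might look like once the values have been parsed out, assuming the ExampleTable from the question. The variable names and the hard-coded column list are assumptions made for illustration.

```sql
-- Hypothetical values; in the procedure these would come from parsing @data.
DECLARE @dataTable sysname = N'ExampleTable';
DECLARE @c1 nvarchar(200) = N'exampleColumn1', @c2 int = 2, @c3 int = 3;

-- Resolve the table name against the catalog instead of trusting the caller.
DECLARE @safeTable nvarchar(300);
SELECT @safeTable = QUOTENAME(SCHEMA_NAME(schema_id)) + N'.' + QUOTENAME(name)
FROM sys.tables
WHERE name = @dataTable;

IF @safeTable IS NULL
BEGIN
    RAISERROR(N'Unknown table name.', 16, 1);
    RETURN;
END;

-- Only the validated identifier is concatenated; the values travel as parameters.
DECLARE @sql nvarchar(max) =
    N'INSERT INTO ' + @safeTable +
    N' (exampleColumn1, exampleColumn2, exampleColumn3) VALUES (@p1, @p2, @p3);';

EXEC sys.sp_executesql
    @sql,
    N'@p1 nvarchar(200), @p2 int, @p3 int',
    @p1 = @c1, @p2 = @c2, @p3 = @c3;
```

Even in this small form the parameter list has to be written per table, which is the maintenance burden described above.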
Once you have this built you will be dynamically generating an INSERT statement and dynamically generating variables or an array or some data structure to pass in parameters to the INSERT statement to avoid sql injection attacks. It's going to be dynamic, on top of dynamic, on top of dynamic which leads to:
From a support context, imagine that your application just totally throws up one day. You have to dive in to investigate. You track down the SDML that your front end created that caused the failure, and you open up your SP code to troubleshoot. Imagine what this code ends up looking like:
It has to determine if the table exists
It has to parse the SDML to get each literal
It has to read DB metadata to get the column list
It has to dynamically write the insert statement, listing the columns from metadata and dynamically creating sql parameters for the VALUES() list.
It has to execute sending a dynamic number of variables into the dynamically generated sql.
My support staff would hang me out to dry if they had to deal with that, and I'm the one paying them.
All of this is solved by using a proper backend to handle communication, deeper validation, sql parameter binding, error catching and handling, and all the other things that backend servers are meant to do.
I believe that your back end web server should be VERY aware of the underlying data model. It should be the connection between your view, your data, and your model. Leave the database to the things it's good at (reading and writing data). Leave your front end to the things that it's good at (presenting a UI for the end user).
I suppose you could do something like this (may need a little extra work)
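declare @columns nvarchar(max);
-- skip the identity column, which cannot be inserted into directly
select @columns = string_agg(quotename(name), ', ') WITHIN GROUP ( ORDER BY column_id )
from sys.all_columns
where object_id = object_id(@dataTable) and is_identity = 0;
-- sp_executesql requires an nvarchar statement; @data is still concatenated unparameterized
declare @sql nvarchar(max) = concat('INSERT INTO ', @dataTable, ' (', @columns, ') VALUES (', @data, ')');
exec sp_executesql @sql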
But please don't. If this were a good idea, there would be tons of examples of how to do it. There aren't so it's probably not a good idea.
There are, however, tons of examples of using ORMs or auto-generated code instead, because that way your code is maintainable, debuggable and performant.
I want to insert the results of a stored procedure into a temp table using OPENROWSET. However, the issue I run into is I'm not able to pass parameters to my stored procedure.
This is my stored procedure:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[N_spRetrieveStatement]
@PeopleCodeId nvarchar(10),
@StatementNumber int
AS
SET NOCOUNT ON
DECLARE @PersonId int
SELECT @PersonId = [dbo].[fnGetPersonId](@PeopleCodeId)
SELECT *
INTO #tempSpRetrieveStatement
FROM OPENROWSET('SQLNCLI', 'Server=PCPRODDB01;Trusted_Connection=yes;',
'EXEC Campus.dbo.spRetrieveStatement @StatementNumber, @PersonId');
--2577, 15084
SELECT *
FROM #tempSpRetrieveStatement;
OpenRowSet will not allow you to execute Procedure with input parameters. You have to use INSERT/EXEC.
INSERT INTO #tempSpRetrieveStatement(Col1, Col2,...)
EXEC PCPRODDB01.Campus.dbo.spRetrieveStatement @StatementNumber, @PersonId
Create and test a LinkedServer for PCPRODDB01 before running the above command.
The root of your problem is that you don't actually have parameters inside your statement that you're transmitting to the remote server you're connecting to, given the code sample you provided. Even if it was the very same machine you were connecting to, they'd be in different processes, and the other process doesn't have access to your session variables.
LinkedServer was mentioned as an option, and my understanding is that's the preferred option. However in practice that's not always available due to local quirks in tech or organizational constraints. It happens.
But there is a way to do this.
It's hiding in plain sight.
You need to pass literals into the string that will be executed on the other server, right?
So, you start by building the string that will do that.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[N_spRetrieveStatement]
@PeopleCodeId nvarchar(10),
@StatementNumber int
AS
SET NOCOUNT ON
DECLARE
@PersonId INT,
@TempSQL VARCHAR(4000) = '';
SELECT @PersonId = [dbo].[fnGetPersonId](@PeopleCodeId);
SET @TempSQL =
'EXEC Campus.dbo.spRetrieveStatement ''''' +
FORMAT(@StatementNumber,'D') +''''', ''''' +
FORMAT(@PersonId,'D') + '''''';
--2577, 15084
Note the seemingly excessive number of quotes. That's not a mistake -- that's foreshadowing. Because, yes, OPENROWSET hates taking variables as parameters. It, too, only wants literals. So, how do we give OPENROWSET what it needs?
We create a string that is the entire statement, no variables of any kind. And we execute that.
-- A temp table created inside EXEC() is dropped when that batch ends, so it is read in the same batch.
SET @TempSQL =
'SELECT * INTO #tempSpRetrieveStatement ' +
'FROM OPENROWSET(''SQLNCLI'', ''Server=PCPRODDB01;Trusted_Connection=yes;'', ''' + @TempSQL + '''); ' +
'SELECT * FROM #tempSpRetrieveStatement;';
EXEC (@TempSQL);
And that's it! Pretty simple except for counting your escaped quotes, right?
Now... This is almost beyond the scope of the question you asked, but it is a 'gotcha' I've experienced in executing stored procedures in another machine via OPENROWSET. You're obviously used to using temp tables. This will fail if the stored procedure you're calling is creating temp tables or doing a few other things that -- in a nutshell -- inspire the terror of ambiguity into your SQL server. It doesn't like ambiguity. If that's the case, you'll see a message like this:
"Msg 11514, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1
The metadata could not be determined because statement '…your remote EXEC statement here…' in procedure '…name of your local stored procedure here…' contains dynamic SQL. Consider using the WITH RESULT SETS clause to explicitly describe the result set."
So, what's up with that?
You don't just get data back with OPENROWSET. The local and remote servers have a short conversation about what exactly the local server is going to expect from the remote server (so it can optimize receiving and processing it as it comes in -- something that's extremely important for large rowsets). Starting with SQL Server 2012, sp_describe_first_result_set is the newer procedure for this, and normally it executes quickly without you noticing it. It's just that its powers of divination aren't unlimited. Namely, it doesn't know how to get the type and name information regarding temp tables (and probably a few other things it can't do -- PIVOT in a select statement is probably right out).
I specifically wanted to be sure to point this out because of your reply regarding your hesitation about using LinkedServer. In fact, the very same reasons you're hesitant are likely to render that error message's suggestion completely useless -- you can't even predict what columns you're getting and in what order until you've got them.
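Where the columns are known in advance, the WITH RESULT SETS clause that the error message suggests looks roughly like the sketch below. The column names and types here are hypothetical; they would have to match what spRetrieveStatement really returns.

```sql
-- Hypothetical result-set shape; adjust to the procedure's actual output.
SELECT *
FROM OPENROWSET('SQLNCLI', 'Server=PCPRODDB01;Trusted_Connection=yes;',
    'EXEC Campus.dbo.spRetrieveStatement 2577, 15084
     WITH RESULT SETS ((StatementNumber int, PersonId int, Amount money))');
```

Declaring the shape up front means the metadata no longer has to be inferred from the procedure body, but it only helps when that shape really is fixed.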
I think what you're doing will work if, say, you're just branching upstream based on conditional statements and are executing one of several potential SELECT statements. I think it will work if you're just not confident that you can depend on the upstream component being fixed and are trying to ensure that even if it varies, this procedure doesn't have to because it's very generic.
But if, on the other hand, you're facing a situation in which you literally cannot guarantee that SQL Server can predict the columns, you're likely going to have to force some changes in the stored procedure you're calling to insist that it's stable. You might, for instance, work out how to ensure all possible fields are always present by using CASE expressions rather than any PIVOT. You might create a session table that's dedicated to housing what you need to SELECT just long enough to do that, then DELETE the contents back out of there. You might change the way in which you transmit your data such that it's basically gone through the equivalent of UNPIVOT. And after all that extra work, maybe it'll be just a matter of preference whether you use LinkedServer or OPENROWSET to port the data across.
So that's the answer to the literal question you asked, and one of the limits on what you can do with the answer.
I need to know how to turn off ANSI warnings on my stored procedure please. I keep getting the error
String or binary data would be truncated.
However, I would rather this be turned off, since I expect the truncation and would rather allow it.
I added the statement
SET ANSI_WARNINGS OFF
GO
right before the stored procedure; however, doing this does not seem to suppress the error at all.
As for why I have this truncation error to begin with: one of my stored procs executes dynamic SQL to retrieve values (SQLFiddle showing the code). I had to set the length on all of my fields to the length of the longest one (NVarchar(3072)). When my query is executed, however, I need them back at the right size when printing them to the client.
Would appreciate info on how to best deal with this please. Thanks in advance.
I agree with @marc_s -- fix the problem, not the symptom, especially if your intent is to truncate. What will another developer think when he comes along and a proc is throwing these errors and a non-standard flag was used to suppress the issue?
Code to make your intent to truncate clear.
Identifying your Problem
The fiddle doesn't display the behavior you describe, so I'm still a little confused as to the issue.
Also, your SQL fiddle is way too dense for a question like this. If I don't answer your question below, work to isolate the problem to the simplest use case possible. Don't just dump 500 lines of your app into a window.
Note: The Max NVarchar is either 4000 in version of SQL 7 & 2000 or 2 Gigs (nvarchar(max)) in SQL 2005 and later. I have no idea where you came up with 3072.
My Test
If you're truncating at the SPROC parameter level, the ANSI_WARNINGS flag is ignored, as this MSDN page warns. If the truncation happens inside your procedure, the flag does apply; I created a little test proc that shows ANSI_WARNINGS OFF allowing truncation:
CREATE Proc DoSomething (@longThing varchar(50)) AS
DECLARE @T1 TABLE ( shortThing VARCHAR(20) );
SET ANSI_WARNINGS OFF
Print ' I don''t even whimper when truncating'
INSERT INTO @T1 (ShortThing) VALUES ( @longThing);
SET ANSI_WARNINGS ON
Print ' I yell when truncated'
INSERT INTO @T1 (ShortThing) VALUES ( @longThing);
Then calling it as follows works as expected:
exec DoSomething 'Text string longer than 20 characters'
FIXING THE PROBLEM
Nevertheless, why not just code so your intent to (potentially) truncate data is clear? You can avoid the warning rather than turn it off. I would do one of the following:
make your Procedure parameters long enough to accommodate the input
If you need to shorten string data, use Substring() to trim it.
Use CAST or CONVERT to format the data to your requirement. This page (see the section headed "Implicit Conversions") details how CAST and CONVERT work.
My simple example above can be modified as follows to avoid the need to set any flag.
CREATE Proc DoSomethingBETTER (@longThing varchar(50)) AS
SET ANSI_WARNINGS ON
DECLARE @T1 TABLE ( shortThing VARCHAR(20) );
--try one of these 3 options...
INSERT INTO @T1 (ShortThing) VALUES ( Convert(varchar(20), @longThing));
INSERT INTO @T1 (ShortThing) VALUES ( Substring(@longThing, 1, 20));
INSERT INTO @T1 (ShortThing) VALUES ( Cast(@longThing as varchar(20)) );
Print('Ansi warnings can be on when truncating data');
An Aside - Clustered Guids
Looking at your fiddle I noticed that you use uniqueidentifier as the key in your clustered indexes. In almost every scenario this is a pretty inefficient option. The randomness of GUIDs means your data is constantly being fragmented & re-shuffled.
Hopefully you can convert to int identity, use newsequentialid(), or use COMB GUIDs as described in Jimmy Nilsson's article.
You can see more about the problem here, here, here, and here.
If yes, why are there still so many successful SQL injections? Just because some developers are too dumb to use parameterized statements?
When articles talk about parameterized queries stopping SQL attacks they don't really explain why, it's often a case of "It does, so don't ask why" -- possibly because they don't know themselves. A sure sign of a bad educator is one that can't admit they don't know something. But I digress.
The reason I find the confusion totally understandable is simple. Imagine a dynamic SQL query:
sqlQuery='SELECT * FROM custTable WHERE User=' + Username + ' AND Pass=' + password
so a simple sql injection would be just to put the Username in as ' OR 1=1--
This would effectively make the sql query:
sqlQuery='SELECT * FROM custTable WHERE User='' OR 1=1-- ' AND PASS=' + password
This says select all customers where their username is blank ('') or 1=1, which is a boolean expression that evaluates to true. Then it uses -- to comment out the rest of the query. So this will just return the whole customer table, or do whatever you want with it; if logging in, it will log in with the first user's privileges, which can often be the administrator's.
Now parameterized queries do it differently, with code like:
sqlQuery='SELECT * FROM custTable WHERE User=? AND Pass=?'
parameters.add("User", username)
parameters.add("Pass", password)
where username and password are variables pointing to the associated inputted username and password
Now at this point, you may be thinking, this doesn't change anything at all. Surely you could still just put into the username field something like Nobody OR 1=1'--, effectively making the query:
sqlQuery='SELECT * FROM custTable WHERE User=Nobody OR 1=1'-- AND Pass=?'
And this would seem like a valid argument. But, you would be wrong.
The way parameterized queries work is that the sqlQuery is sent as a query, and the database knows exactly what this query will do, and only then will it insert the username and password merely as values. This means they cannot affect the query, because the database already knows what the query will do. So in this case it would look for a username of "Nobody OR 1=1'--" and a blank password, which should come up false.
This isn't a complete solution though, and input validation will still need to be done, since this won't affect other problems, such as XSS attacks, as you could still put javascript into the database. Then if this is read out onto a page, it would display it as normal javascript, depending on any output validation. So really the best thing to do is still use input validation, but use parameterized queries or stored procedures to stop any SQL attacks.
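The same idea can be sketched in T-SQL with sp_executesql, assuming the custTable from the example above with User and Pass columns. The nvarchar(50) sizes are an assumption.

```sql
-- The attacker-supplied text is bound as a value; it is never parsed as SQL.
DECLARE @username nvarchar(50) = N''' OR 1=1--';
DECLARE @password nvarchar(50) = N'';

EXEC sys.sp_executesql
    N'SELECT * FROM custTable WHERE [User] = @u AND Pass = @p;',
    N'@u nvarchar(50), @p nvarchar(50)',
    @u = @username,
    @p = @password;
```

The statement text is fixed before the values arrive, so the query simply looks for a user literally named ' OR 1=1-- and, presumably, finds none.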
The links that I have posted in my comments to the question explain the problem very well. I've summarised my feelings on why the problem persists, below:
1. Those just starting out may have no awareness of SQL injection.
2. Some are aware of SQL injection, but think that escaping is the (only?) solution. If you do a quick Google search for php mysql query, the first page that appears is the mysql_query page, on which there is an example that shows interpolating escaped user input into a query. There's no mention (at least not that I can see) of using prepared statements instead. As others have said, there are so many tutorials out there that use parameter interpolation that it's not really surprising how often it is still used.
3. A lack of understanding of how parameterized statements work. Some think that it is just a fancy means of escaping values.
4. Others are aware of parameterized statements, but don't use them because they have heard that they are too slow. I suspect that many people have heard how incredibly slow parameterized statements are, but have not actually done any testing of their own. As Bill Karwin pointed out in his talk, the difference in performance should rarely be used as a factor when considering the use of prepared statements. The benefits of prepare once, execute many, often appear to be forgotten, as do the improvements in security and code maintainability.
5. Some use parameterized statements everywhere, but with interpolation of unchecked values such as table and column names, keywords and conditional operators. Dynamic searches, such as those that allow users to specify a number of different search fields, comparison conditions and sort order, are prime examples of this.
6. False sense of security when using an ORM. ORMs still allow interpolation of SQL statement parts - see point 5.
7. Programming is a big and complex subject, database management is a big and complex subject, security is a big and complex subject. Developing a secure database application is not easy - even experienced developers can get caught out.
8. Many of the answers on stackoverflow don't help. When people write questions that use dynamic SQL and parameter interpolation, there is often a lack of responses that suggest using parameterized statements instead. On a few occasions, I've had people rebut my suggestion to use prepared statements - usually because of the perceived unacceptable performance overhead. I seriously doubt that those asking most of these questions are in a position where the extra few milliseconds taken to prepare a parameterized statement will have a catastrophic effect on their application.
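For the dynamic sort order case mentioned in the list above, one option is to avoid interpolation altogether by mapping the caller's choice onto a fixed set of columns. This is a minimal sketch; the Users table and its column names are hypothetical.

```sql
-- Hypothetical table and column names, for illustration only.
DECLARE @sortBy varchar(20) = 'fullname';   -- would come from the caller

SELECT usrID, usrUName, usrFullName
FROM Users
ORDER BY
    CASE @sortBy
        WHEN 'username' THEN usrUName
        WHEN 'fullname' THEN usrFullName
        ELSE usrUName          -- unknown values fall back to a fixed default
    END;
```

The caller's input only ever selects among columns the query author listed, so there is nothing to inject into. The trade-off is that every CASE branch must return a compatible type.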
Well good question.
The answer is more stochastic than deterministic and I will try to explain my view, using a small example.
There are many references on the net that suggest using parameters in our queries, or stored procedures with parameters, in order to avoid SQL Injection (SQLi). I will show you that stored procedures (for instance) are not a magic wand against SQLi. The responsibility still remains with the programmer.
Consider the following SQL Server Stored Procedure that will get the user row from a table 'Users':
create procedure getUser
@name varchar(20)
,@pass varchar(20)
as
declare @sql as nvarchar(512)
set @sql = 'select usrID, usrUName, usrFullName, usrRoleID '+
'from Users '+
'where usrUName = '''+@name+''' and usrPass = '''+@pass+''''
execute(@sql)
You can get the results by passing as parameters the username and the password. Supposing the password is in free text (just for simplicity of this example) a normal call would be:
DECLARE #RC int
DECLARE #name varchar(20)
DECLARE #pass varchar(20)
EXECUTE #RC = [dbo].[getUser]
#name = 'admin'
,#pass = '!#Th1siSTheP#ssw0rd!!'
GO
But here we have a bad programming technique used by the programmer inside the stored procedure, so an attacker can execute the following:
DECLARE @RC int
DECLARE @name varchar(20)
DECLARE @pass varchar(20)
EXECUTE @RC = [TestDB].[dbo].[getUser]
@name = 'admin'
,@pass = 'any'' OR 1=1 --'
GO
The above parameters will be passed as arguments to the stored procedure and the SQL command that finally will be executed is:
select usrID, usrUName, usrFullName, usrRoleID
from Users
where usrUName = 'admin' and usrPass = 'any' OR 1=1 --'
...which returns every row in Users, because AND binds more tightly than OR and the trailing quote is commented out.
The problem here is that even if we follow the principle "Create a stored procedure and pass the fields to search as parameters", the SQLi still succeeds. This is because we have just copied our bad programming practice into the stored procedure. The solution is to rewrite our stored procedure as follows:
alter procedure getUser
@name varchar(20)
,@pass varchar(20)
as
select usrID, usrUName, usrFullName, usrRoleID
from Users
where usrUName = @name and usrPass = @pass
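The same contrast can be reproduced outside SQL Server. The following is only a minimal sketch in Python, using the standard-library sqlite3 module instead of T-SQL; the simplified Users table, the sample passwords, and the function names login_concatenated and login_parameterized are assumptions made for illustration.

```python
import sqlite3

def make_db():
    # In-memory database with a simplified version of the Users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (usrID INTEGER, usrUName TEXT, usrPass TEXT)")
    conn.executemany(
        "INSERT INTO Users VALUES (?, ?, ?)",
        [(1, "admin", "secret1"), (2, "alice", "secret2")],
    )
    return conn

def login_concatenated(conn, name, password):
    # Vulnerable: the values become part of the SQL text itself.
    sql = (
        "SELECT usrID, usrUName FROM Users "
        "WHERE usrUName = '" + name + "' AND usrPass = '" + password + "'"
    )
    return conn.execute(sql).fetchall()

def login_parameterized(conn, name, password):
    # Safe: the values are bound separately and are never parsed as SQL.
    sql = "SELECT usrID, usrUName FROM Users WHERE usrUName = ? AND usrPass = ?"
    return conn.execute(sql, (name, password)).fetchall()

conn = make_db()
payload = "any' OR 1=1 --"
leaked = login_concatenated(conn, "admin", payload)   # every row comes back
safe = login_parameterized(conn, "admin", payload)    # no row matches
print(len(leaked), len(safe))
```

With the concatenated version the payload rewrites the WHERE clause; with the bound version it is compared as an ordinary (and wrong) password.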
What I am trying to say is that developers must first learn what an SQLi attack is and how it can be performed, and then safeguard their code accordingly. Blindly following 'best practices' is not always the safer way... and maybe this is why we have so many 'best practices' failures!
Yes, the use of prepared statements stops all SQL injections, at least in theory. In practice, parameterized statements may not be real prepared statements, e.g. PDO in PHP emulates them by default so it's open to an edge case attack.
If you're using real prepared statements, everything is safe. Well, at least as long as you don't concatenate unsafe SQL into your query as a reaction to not being able to prepare table names, for example.
If yes, why are there still so many successful SQL injections? Just because some developers are too dumb to use parameterized statements?
Yes, education is the main point here, and legacy code bases. Many tutorials use escaping and those can't be easily removed from the web, unfortunately.
I avoid absolutes in programming; there is always an exception. I highly recommend stored procedures and command objects. The majority of my background is with SQL Server, but I do play with MySQL from time to time. There are many advantages to stored procedures, including cached query plans; yes, this can be accomplished with parameters and inline SQL, but that opens up more possibilities for injection attacks and doesn't help with separation of concerns. For me it's also much easier to secure a database, as my applications generally only have execute permission for said stored procedures. Without direct table/view access it's much more difficult to inject anything. If the application's user is compromised, the attacker only has permission to execute exactly what was pre-defined.
My two cents.
I wouldn't say "dumb".
I think the tutorials are the problem. Most SQL tutorials, books, whatever explain SQL with inlined values, not mentioning bind parameters at all. People learning from these tutorials don't have a chance to learn it right.
Because most code isn't written with security in mind, and management, given a choice between adding features (especially something visible that can be sold) and security/stability/reliability (which is a much harder sell), will almost invariably choose the former. Security is only a concern when it becomes a problem.
Can parameterized statement stop all SQL injection?
Yes, as long as your database driver offers a placeholder for every possible SQL literal. Most prepared statement drivers don't. Say, you'd never find a placeholder for a field name or for an array of values. That forces a developer to fall back on tailoring the query by hand, using concatenation and manual formatting, with predictable results.
That's why I made my MySQL wrapper for PHP, which supports most of the literals that can be added to a query dynamically, including arrays and identifiers.
If yes, why are there still so many successful SQL injections? Just because some developers are too dumb to use parameterized statements?
As you can see, in reality it's just impossible to have all your queries parameterized, even if you're not dumb.
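For the "array of values" case, one common workaround is to generate the placeholders rather than the values. This is a minimal sketch in Python with the standard-library sqlite3 module, not the PHP wrapper mentioned above; the products table and the helper name select_names_by_ids are hypothetical.

```python
import sqlite3

def select_names_by_ids(conn, ids):
    # An empty IN () list is a syntax error, so handle it up front.
    ids = list(ids)
    if not ids:
        return []
    # Only the number of "?" marks depends on the input; the values are bound.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT name FROM products WHERE id IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, ids)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "pen"), (2, "ink"), (3, "pad")],
)

names = select_names_by_ids(conn, [1, 3])
# A hostile "id" is just a value that matches nothing.
hostile = select_names_by_ids(conn, ["1) OR 1=1 --"])
print(names, hostile)
```

The query text is still built dynamically, but nothing user-supplied ever reaches it, so the data/code separation is preserved.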
First my answer to your first question: Yes, as far as I know, by using parameterized queries, SQL injections will not be possible anymore. As to your following questions, I am not sure and can only give you my opinion on the reasons:
I think it's easier to "just" write the SQL query string by concatenating different parts (maybe even dependent on some logical checks) together with the values to be inserted.
It's just creating the query and executing it.
Another advantage is that you can print (echo, output or whatever) the sql query string and then use this string for a manual query to the database engine.
When working with prepared statements, you always have at least one step more:
You have to build your query (including the parameters, of course)
You have to prepare the query on the server
You have to bind the parameters to the actual values you want to use for your query
You have to execute the query.
That's somewhat more work (and not so straightforward to program) especially for some "quick and dirty" jobs which often prove to be very long-lived...
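How much extra work those steps are depends on the client library. As a rough sketch, assuming Python's standard-library sqlite3 module and a hypothetical table bar, preparing, binding and executing collapse into one call, and the statement can still be printed for debugging:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, baz TEXT)")
conn.execute("INSERT INTO bar VALUES (?, ?)", ("hello", "x"))

# Step 1: build the query text with placeholders instead of values.
sql = "SELECT foo FROM bar WHERE baz = ?"
params = ("x",)

# For debugging, the template and its values can be printed side by side.
print(sql, params)

# Steps 2 to 4: prepare, bind and execute all happen inside this one call.
rows = conn.execute(sql, params).fetchall()
```

Other drivers expose prepare, bind and execute as separate calls, so the overhead described above is real in some environments.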
Best regards,
Box
SQL injection is a subset of the larger problem of code injection, where data and code are provided over the same channel and data is mistaken for code. Parameterized queries prevent this from occurring by forming the query using context about what is data and what is code.
In some specific cases, this is not sufficient. In many DBMSes, it's possible to dynamically execute SQL with stored procedures, introducing a SQL injection flaw at the DBMS level. Calling such a stored procedure using parameterized queries will not prevent the SQL injection in the procedure from being exploited. Another example can be seen in this blog post.
More commonly, developers use the functionality incorrectly. Commonly the code looks something like this when done correctly:
db.parameterize_query("select foo from bar where baz = ?", user_input)
Some developers will concatenate strings together and then use a parameterized query, which doesn't actually make the aforementioned data/code distinction that provides the security guarantees we're looking for:
db.parameterize_query("select foo from bar where baz = '" + user_input + "'")
Correct usage of parameterized queries provides very strong, but not impenetrable, protection against SQL injection attacks.
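To make the data/code distinction concrete, here is a small sketch using Python's standard-library sqlite3 module as a stand-in for the generic db object above; the table contents are invented for the example. It shows that a bound value containing quotes is treated purely as data, and that in this driver a ? wrapped in quotes is a string literal rather than a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, baz TEXT)")

user_input = "x'; DROP TABLE bar; --"

# The hostile string is stored verbatim; it is never parsed as SQL.
conn.execute("INSERT INTO bar VALUES (?, ?)", ("first", user_input))
stored = conn.execute(
    "SELECT baz FROM bar WHERE baz = ?", (user_input,)
).fetchall()

# A quoted '?' is a one-character string literal, so there is nothing to
# bind to and sqlite3 rejects the extra parameter.
try:
    conn.execute("SELECT foo FROM bar WHERE baz = '?'", (user_input,))
    quoted_placeholder_rejected = False
except sqlite3.ProgrammingError:
    quoted_placeholder_rejected = True

print(stored, quoted_placeholder_rejected)
```

Other drivers may behave differently for the quoted case, so the placeholder syntax of the library in use is worth checking.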
To protect your application from SQL injection, perform the following steps:
Step 1. Constrain input.
Step 2. Use parameters with stored procedures.
Step 3. Use parameters with dynamic SQL.
Refer to http://msdn.microsoft.com/en-us/library/ff648339.aspx
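Step 1 can be sketched as an allow-list check that runs before the value is bound as a parameter. This is only an illustration in Python; the rule itself (letters, digits and underscore, at most 20 characters) and the name constrain_username are assumptions, not something the linked guidance prescribes.

```python
import re

# Hypothetical rule: 1 to 20 characters from a small allow-list.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,20}")

def constrain_username(value):
    # Reject anything that does not match the whole pattern. Over-long
    # input raises an error instead of being silently cut to size.
    if not isinstance(value, str) or USERNAME_RE.fullmatch(value) is None:
        raise ValueError("invalid username")
    return value

def is_valid(value):
    try:
        constrain_username(value)
        return True
    except ValueError:
        return False

print(is_valid("admin"), is_valid("any' OR 1=1 --"), is_valid("a" * 21))
```

Validation like this complements parameters (steps 2 and 3); it does not replace them.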
Even if prepared statements are properly used throughout the web application’s own code, SQL injection flaws may still exist if database code components construct queries from user input in an unsafe manner.
The following is an example of a stored procedure that is vulnerable to SQL injection in the @name parameter:
CREATE PROCEDURE show_current_orders
(@name varchar(400) = NULL)
AS
DECLARE @sql nvarchar(4000)
SELECT @sql = 'SELECT id_num, searchstring FROM searchorders WHERE ' +
'searchstring = ''' + @name + '''';
EXEC (@sql)
GO
Even if the application passes the user-supplied name value to the stored procedure in a safe manner, the procedure itself concatenates this directly into a dynamic query and therefore is vulnerable.
I have worked on SQL stored procedures and I have noticed that many people use two different approaches -
First, to use select queries i.e. something like
Select * from TableA where colA = 10 order by colA
Second, is to do the same by constructing a query i.e. like
Declare @sqlstring varchar(100)
Declare @sqlwhereclause varchar(100)
Declare @sqlorderby varchar(100)
Set @sqlstring = 'Select * from TableA '
Set @sqlwhereclause = 'where colA = 10 '
Set @sqlorderby = 'order by colA'
Set @sqlstring = @sqlstring + @sqlwhereclause + @sqlorderby
-- Parentheses are required; without them EXEC treats the string as a procedure name
exec (@sqlstring)
Now, I know both work fine. But, the second method I mentioned is a little annoying to maintain.
I want to know which one is better? Is there any specific reason one would resort to one method over the other? Any benefits of one method over other?
Use the first one. This will allow a query plan to be cached properly, apart from being the way you are supposed to work with SQL.
The second one is open to SQL Injection attacks, apart from the other issues.
With the dynamic SQL you will not get compile time checking, so it may fail only when invoked (the sooner you know about incorrect syntax, the better).
And, you noted yourself, the maintenance burden is also higher.
The second method has the obvious drawback of not being syntax checked at compile time. It does, however, allow a dynamic order by clause, which the first does not. I recommend that you always use the first example unless you have a very good reason to make the query dynamic. And, as @Oded has already pointed out, be sure to guard yourself against SQL injection if you do go for the second approach.
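When a dynamic order by is really needed, one way to keep it safe is to map the caller's choice onto a fixed set of column names and bind everything else. The sketch below uses Python's standard-library sqlite3 module rather than a T-SQL procedure; the colB column, the sample rows and the helper fetch_sorted are made up for the example.

```python
import sqlite3

# Only these identifiers can ever appear in the ORDER BY clause.
SORTABLE = {"colA": "colA", "colB": "colB"}

def fetch_sorted(conn, sort_key, col_a_value, descending=False):
    try:
        column = SORTABLE[sort_key]
    except KeyError:
        raise ValueError(f"cannot sort by {sort_key!r}") from None
    direction = "DESC" if descending else "ASC"
    # The identifier comes from the allow-list; the filter value is bound.
    sql = f"SELECT colA, colB FROM TableA WHERE colA = ? ORDER BY {column} {direction}"
    return conn.execute(sql, (col_a_value,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (colA INTEGER, colB TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [(10, "z"), (10, "a"), (5, "m")],
)

ascending = fetch_sorted(conn, "colB", 10)
descending = fetch_sorted(conn, "colB", 10, descending=True)
print(ascending, descending)
```

A sort key outside the allow-list raises ValueError before any SQL is built.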
I don't have a full comprehensive answer for you, but I can tell you right now that the latter method is much more difficult to work with when importing the stored procedure as a function in an ORM. Since the SQL is constructed dynamically, you have to manually create any type-classes that are returned from the stored procedure that aren't directly correlated to entities in your model.
With that in mind, there are times where you simply can't avoid constructing a SQL statement, especially when where clauses and joins depend on the parameters passed in. In my experience, I have found that stored procs that are creating large, variably joined/whered statements for EXECs are trying to do too many things. In these situations, I would recommend you keep the Single Responsibility Principle in mind.
Executing dynamic SQL inside a stored procedure reduces the value of using stored procedures to just a saved query container. Stored procedures are mostly beneficial in that the query execution plan (a very costly operation) is compiled and stored in memory the first time the procedure is executed. This means that every subsequent execution of the procedure bypasses the query plan calculations and jumps right to the data retrieval portion of the operation.
Also, allowing a stored procedure to take an executable query string as a parameter is dangerous. Anyone with execute permission granted on the procedure could potentially cause havoc on the rest of the database.