I'm currently working on a .NET application and want to make it as modular as possible. I've already created a basic SELECT procedure, which returns data by checking input parameters on the SQL Server side.
I want to create a procedure that parses structured data passed as a string and inserts its contents into the corresponding table in the database.
For example, I have a table as
CREATE TABLE ExampleTable (
id_exampleTable int IDENTITY (1, 1) NOT NULL,
exampleColumn1 nvarchar(200) NOT NULL,
exampleColumn2 int NULL,
exampleColumn3 int NOT NULL,
CONSTRAINT pk_exampleTable PRIMARY KEY ( id_exampleTable )
)
And my procedure starts as
CREATE PROCEDURE InsertDataIntoCorrespondingTable
@dataTable nvarchar(max), --name of Table in my DB
@data nvarchar(max) --normalized string parameter as 'column1, column2, column3, etc.'
AS
BEGIN
IF @dataTable = 'table'
BEGIN
/**Parse this string and execute insert command**/
END
ELSE IF /**Other statements**/
END
TL;DR
So basically, I'm looking for a solution that can help me achieve something like this
EXEC InsertDataIntoCorrespondingTable
@dataTable = 'ExampleTable',
@data = '''exampleColumn1'', 2, 3'
Which should be equal to just
INSERT INTO ExampleTable SELECT 'exampleColumn1', 2, 3
Sure, I can push data as INSERT statements (for each and every 14 tables inside DB...), generated inside an app, but I want to conquer T-SQL :)
This might be reasonable (to some degree) on an RDBMS that supports structured data like JSON or XML natively, but doing this the way you are planning is going to cause some real pain-in-the-rear support and, more importantly, a SQL injection attack vector. I would leave this to the realm of the web backend server where it belongs.
You are likely going to invent your own structured data markup language (SDML) and parser to solve this in SQL Server. That's a wheel that doesn't need to be reinvented. If you do end up building this, strongly consider going with JSON to avoid all the issues that a home-grown structured data format inherently brings with it, assuming your version of SQL Server supports JSON parsing/packaging.
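If JSON is an option, a minimal sketch of what that could look like for the ExampleTable from the question is below. It assumes SQL Server 2016 or later (OPENJSON needs database compatibility level 130 or higher), and the JSON property names are my own assumption; they simply mirror the column names.

```sql
-- Assumed payload shape: one JSON object whose keys match the column names
DECLARE @data nvarchar(max) =
    N'{"exampleColumn1": "some text", "exampleColumn2": 2, "exampleColumn3": 3}';

-- Columns are matched by name, so the column order in the table does not matter
INSERT INTO ExampleTable (exampleColumn1, exampleColumn2, exampleColumn3)
SELECT exampleColumn1, exampleColumn2, exampleColumn3
FROM OPENJSON(@data)
WITH (
    exampleColumn1 nvarchar(200) '$.exampleColumn1',
    exampleColumn2 int           '$.exampleColumn2',
    exampleColumn3 int           '$.exampleColumn3'
);
```

The values come out of OPENJSON already typed and are never concatenated into a SQL string, so this particular statement needs no dynamic SQL at all. It is still one statement per table, though.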
Your front end that packages your data into your SDML is going to have to assume column ordinals, but column ordinal is not something that one should rely on in a database. SQL amateurs often do; I know this from years in the industry and from dealing with end users who are upset when a new column is introduced in a position they don't want it. Adding a column to a table shouldn't break an application. If it does, that application has bad code.
Regarding the SQL injection attack vector, your SP code is going to get ugly. You'll need to parse out each item in @data into a variable of its own in order to properly parameterize the dynamic SQL that is being built. See here under the "working with parameters" section for what that will look like. Failure to add this to your SP code means that values passed in that @data SDML could become executable SQL instead of literals, and that would be very bad. This is not easy to solve in SP language. Where it IS easy to solve, though, is in the backend server code. Every database library on the planet supports parameterized query building/execution natively.
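For reference, this is roughly what a properly parameterized dynamic INSERT looks like once the values have been parsed out. It is a sketch only: the parameter names @c1, @c2 and @c3 are made up for illustration, and the hard part (getting from the @data string to these typed values) is not shown.

```sql
-- Only the statement text is dynamic; the values travel as typed parameters
DECLARE @sql nvarchar(max) =
    N'INSERT INTO ExampleTable (exampleColumn1, exampleColumn2, exampleColumn3)
      VALUES (@c1, @c2, @c3);';

EXEC sp_executesql
    @sql,
    N'@c1 nvarchar(200), @c2 int, @c3 int',  -- parameter definitions
    @c1 = N'exampleColumn1',
    @c2 = 2,
    @c3 = 3;
```

Note that the parameter definition list is itself a string, so a fully generic procedure would have to build that dynamically too, per table.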
Once you have this built you will be dynamically generating an INSERT statement and dynamically generating variables or an array or some data structure to pass in parameters to the INSERT statement to avoid SQL injection attacks. It's going to be dynamic, on top of dynamic, on top of dynamic, which leads to the next point.
From a support context, imagine that your application just totally throws up one day. You have to dive in to investigate. You track down the SDML that your front end created that caused the failure, and you open up your SP code to troubleshoot. Imagine what this code ends up looking like:
It has to determine if the table exists
It has to parse the SDML to get each literal
It has to read DB metadata to get the column list
It has to dynamically write the insert statement, listing the columns from metadata and dynamically creating sql parameters for the VALUES() list.
It has to execute sending a dynamic number of variables into the dynamically generated sql.
My support staff would hang me out to dry if they had to deal with that, and I'm the one paying them.
All of this is solved by using a proper backend to handle communication, deeper validation, sql parameter binding, error catching and handling, and all the other things that backend servers are meant to do.
I believe that your back end web server should be VERY aware of the underlying data model. It should be the connection between your view, your data, and your model. Leave the database to the things it's good at (reading and writing data). Leave your front end to the things that it's good at (presenting a UI for the end user).
I suppose you could do something like this (may need a little extra work)
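declare @columns nvarchar(max);
-- Build the column list from metadata, skipping the identity column
select @columns = string_agg(name, ', ') WITHIN GROUP ( ORDER BY column_id )
from sys.all_columns
where object_id = object_id(@dataTable) and is_identity = 0;
-- @dataTable and @data are concatenated as-is, so this is wide open to SQL injection
declare @sql nvarchar(max) = concat('INSERT INTO ',@dataTable,' (',@columns,') VALUES (', @data, ')');
exec sp_executesql @sql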
But please don't. If this were a good idea, there would be tons of examples of how to do it. There aren't, so it's probably not a good idea.
There are, however, tons of examples of using ORMs or auto-generated code instead, because that way your code is maintainable, debuggable and performant.
Related
I am trying to create a stored procedure that takes a table name as an argument and executes some queries on that table.
So...
CREATE PROCEDURE blabla
@TableName nvarchar(50)
AS
DROP TABLE @TableName -- just an example, real queries are much longer
GO
This query gives me an incorrect syntax error.
I know I can always use sp_executesql procedure, but I want a neater way where I don't need to worry about building an endless sql string.
Thanks
Here is a good article on why not to use Dynamic SQL in most cases as well as how to use it properly when it is the best solution:
http://www.sommarskog.se/dynamic_sql.html
Basically, doing what you are looking to do has a number of issues, including not allowing the system to properly check for permission issues before executing, not being able to optimize the stored procedure, and (most importantly) opening yourself up to SQL injection. You can mitigate this last issue somewhat but it involves a much more complex statement. Here is a quote from the above article:
Passing table and column names as parameters to a procedure with dynamic SQL is rarely a good idea for application code. (It can make perfectly sense for admin tasks). As I've said, you cannot pass a table or a column name as a parameter to sp_executesql, but you must interpolate it into the SQL string. Still you should protect it against SQL injection, as a matter of routine. It could be that bad it comes from user input.
To this end, you should use the built-in function quotename() (added in SQL 7). quotename() takes two parameters: the first is a string, and the second is a pair of delimiters to wrap the string in. The default for the second parameter is []. Thus, quotename('Orders') returns [Orders]. quotename() takes care of nested delimiters, so if you have a really crazy table name like Left]Bracket, quotename() will return [Left]]Bracket].
Note that when you work with names with several components, each component should be quoted separately. quotename('dbo.Orders') returns [dbo.Orders], but that is a table in an unknown schema of which the first four characters are d, b, o and a dot. As long as you only work with the dbo schema, best practice is to add dbo in the dynamic SQL and only pass the table name. If you work with different schemas, pass the schema as a separate parameter. (Although you could use the built-in function parsename() to split up a @tblname parameter in parts.)
I know you want a "neater" way of creating a dynamic statement, but the reality is that not only is that not possible for what you want to do, you actually need to make the statement even more complex in order to ensure that the stored procedure is safe. I would try very hard to look for a different way to solve this issue (the article has a few suggestions). If you can avoid making this statement into dynamic SQL, you really should.
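To make the quoted advice concrete, here is a small sketch of quoting each name component separately before splicing it into the string. The variable names and the dbo.Orders table are placeholders, and the DECLARE-with-initializer syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical inputs; in a procedure these would be the parameters
DECLARE @schemaName sysname = N'dbo';
DECLARE @tableName  sysname = N'Orders';

-- Quote each name component separately, then join them with a dot
DECLARE @sql nvarchar(max) =
    N'SELECT COUNT(*) FROM ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N';';

-- Refuse to run anything if the quoted name does not resolve to a user table
IF OBJECT_ID(QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName), N'U') IS NOT NULL
    EXEC sp_executesql @sql;
ELSE
    RAISERROR(N'Unknown table.', 16, 1);
```

Even in this tiny case the "neat" one-liner has grown into several statements, which is the complexity cost described above.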
There are very few places that parameters can be used in T-SQL. Usually, it's exactly the places where you would find a quoted string - not just any arbitrary place within the query (where the query is necessarily in a string form anyway)
E.g., you could use a parameter or variable to replace 'hello' below:
SELECT * from Table2 where ColA = 'hello'
But you couldn't use it where Table2 appears. I don't know why people seem to expect such things to be possible in T-SQL, when it's generally not possible in most other programming languages either, outside of exec/eval style functions.
If you have multiple tables that share the same structure (names and types of columns), it generally suggests that what you should actually have is a single table, with possibly additional column(s) that distinguish between rows that would originally be in different tables. E.g. if you currently have:
CREATE TABLE MaleEmployees (
EmployeeNo int not null,
Name varchar(50) not null,
)
and
CREATE TABLE FemaleEmployees (
EmployeeNo int not null,
Name varchar(50) not null
)
You should instead have:
CREATE TABLE Employees (
EmployeeNo int not null,
Name varchar(50) not null,
Gender char(1) not null,
constraint CK_Gender_Valid CHECK (Gender in ('M','F'))
)
You can then query this Employees table, regardless of gender, rather than trying to parametrize the table name within your query. Of course, the above is an exaggerated example.
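With the combined table, the thing that varies is an ordinary value, which is exactly the kind of place where a parameter is allowed. A minimal sketch (the @Gender variable is just for illustration and stands in for a procedure parameter):

```sql
-- The value is a parameter; the table name stays fixed in the statement
DECLARE @Gender char(1) = 'F';

SELECT EmployeeNo, Name
FROM Employees
WHERE Gender = @Gender;
```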
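declare @l nvarchar(max)
set @l = 'DROP TABLE ' + @TableName
exec (@l) -- the parentheses matter: without them, exec looks for a procedure named by @l
But if that's what you mean by 'endless string', I'm not sure what you want.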
The correct syntax (notice the begin):
CREATE PROCEDURE blabla
@TableName nvarchar(50)
AS
begin
DROP TABLE @TableName -- just an example, real queries are much longer
END
GO
According to this forum discussion, SQL Server (I'm using 2005 but I gather this also applies to 2000 and 2008) silently truncates any varchars you specify as stored procedure parameters to the length of the varchar, even if inserting that string directly using an INSERT would actually cause an error. eg. If I create this table:
CREATE TABLE testTable(
[testStringField] [nvarchar](5) NOT NULL
)
then when I execute the following:
INSERT INTO testTable(testStringField) VALUES(N'string which is too long')
I get an error:
String or binary data would be truncated.
The statement has been terminated.
Great. Data integrity preserved, and the caller knows about it. Now let's define a stored procedure to insert that:
CREATE PROCEDURE spTestTableInsert
@testStringField [nvarchar](5)
AS
INSERT INTO testTable(testStringField) VALUES(@testStringField)
GO
and execute it:
EXEC spTestTableInsert @testStringField = N'string which is too long'
No errors, 1 row affected. A row is inserted into the table, with testStringField as 'strin'. SQL Server silently truncated the stored procedure's varchar parameter.
Now, this behaviour might be convenient at times but I gather there is NO WAY to turn it off. This is extremely annoying, as I want the thing to error if I pass too long a string to the stored procedure. There seem to be 2 ways to deal with this.
First, declare the stored proc's @testStringField parameter as size 6, and check whether its length is over 5. This seems like a bit of a hack and involves irritating amounts of boilerplate code.
Second, just declare ALL stored procedure varchar parameters to be varchar(max), and then let the INSERT statement within the stored procedure fail.
The latter seems to work fine, so my question is: is it a good idea to use varchar(max) ALWAYS for strings in SQL Server stored procedures, if I actually want the stored proc to fail when too long a string is passed? Could it even be best practice? The silent truncation that can't be disabled seems stupid to me.
It just is.
I've never noticed a problem though because one of my checks would be to ensure my parameters match my table column lengths. In the client code too. Personally, I'd expect SQL to never see data that is too long. If I did see truncated data, it'd be bleeding obvious what caused it.
If you do feel the need for varchar(max), beware a massive performance issue because of datatype precedence. varchar(max) has higher precedence than varchar(n) (longest is highest). So in this type of query you'll get a scan, not a seek, and every varchar(100) value is CAST to varchar(max):
UPDATE ...WHERE varchar100column = @varcharmaxvalue
Edit:
There is an open Microsoft Connect item regarding this issue.
And it's probably worthy of inclusion in Erland Sommarskog's Strict settings (and matching Connect item).
Edit 2, after Martin's comment:
DECLARE @sql VARCHAR(MAX), @nsql nVARCHAR(MAX);
SELECT @sql = 'B', @nsql = 'B';
SELECT
LEN(@sql),
LEN(@nsql),
DATALENGTH(@sql),
DATALENGTH(@nsql)
;
DECLARE @t table(c varchar(8000));
INSERT INTO @t values (replicate('A', 7500));
SELECT LEN(c) from @t;
SELECT
LEN(@sql + c),
LEN(@nsql + c),
DATALENGTH(@sql + c),
DATALENGTH(@nsql + c)
FROM @t;
Thanks, as always, to StackOverflow for eliciting this kind of in-depth discussion. I have recently been scouring through my Stored Procedures to make them more robust using a standard approach to transactions and try/catch blocks. I disagree with Joe Stefanelli that "My suggestion would be to make the application side responsible", and fully agree with Jez: "Having SQL Server verify the string length would be much preferable". The whole point for me of using stored procedures is that they are written in a language native to the database and should act as a last line of defence. On the application side the difference between 255 and 256 is just a meaningless number, but within the database environment, a field with a maximum size of 255 will simply not accept 256 characters. The application validation mechanisms should reflect the backend db as best they can, but maintenance is hard, so I want the database to give me good feedback if the application mistakenly allows unsuitable data. That's why I'm using a database instead of a bunch of text files with CSV or JSON or whatever.
I was puzzled why one of my SPs threw the 8152 error and another silently truncated. I finally twigged: The SP which threw the 8152 error had a parameter which allowed one character more than the related table column. The table column was set to nvarchar(255) but the parameter was nvarchar(256). So, wouldn't my "mistake" address gbn's concern: "massive performance issue"? Instead of using max, perhaps we could consistently set the table column size to, say, 255 and the SP parameter to just one character longer, say 256. This solves the silent truncation problem and doesn't incur any performance penalty.
Presumably there is some other disadvantage that I haven't thought of, but it seems a good compromise to me.
Update:
I'm afraid this technique is not consistent. Further testing reveals that I can sometimes trigger the 8152 error and sometimes the data is silently truncated. I would be very grateful if someone could help me find a more reliable way of dealing with this.
Update 2:
Please see Pyitoechito's answer on this page.
The same behavior can be seen here:
declare @testStringField [nvarchar](5)
set @testStringField = N'string which is too long'
select @testStringField
My suggestion would be to make the application side responsible for validating the input before calling the stored procedure.
Update: I'm afraid this technique is not consistent. Further testing reveals that I can sometimes trigger the 8152 error and sometimes the data is silently truncated. I would be very grateful if someone could help me find a more reliable way of dealing with this.
This is probably occurring because the 256th character in the string is white-space. VARCHARs will truncate trailing white-space on insertion and just generate a warning. So your stored procedure is silently truncating your strings to 256 characters, and your insertion is truncating the trailing white-space (with a warning). It will produce an error when said character is not white-space.
Perhaps a solution would be to make the stored procedure's VARCHAR a suitable length to catch a non-white-space character. VARCHAR(512) would probably be safe enough.
One solution would be to:
Change all incoming parameters to be varchar(max)
Declare private variables in the SP with the correct data lengths (simply copy and paste all the incoming parameters and add "Int" at the end of each name)
Declare a table variable with the column names the same as variable names
Insert into the table a row where each variable goes into the column with the same name
Select from the table into internal variables
This way your modifications to the existing code are going to be very minimal like in the sample below.
This is the original code:
create procedure spTest
(
@p1 varchar(2),
@p2 varchar(3)
)
This is the new code:
create procedure spTest
(
@p1 varchar(max),
@p2 varchar(max)
)
as
declare @p1Int varchar(2), @p2Int varchar(3)
declare @test table (p1 varchar(2), p2 varchar(3))
insert into @test (p1,p2) values (@p1, @p2)
select @p1Int=p1, @p2Int=p2 from @test
Note that if the length of an incoming parameter is greater than the limit, SQL Server will throw an error instead of silently chopping off the string.
You could always throw an if statement into your SPs that checks the length of each parameter and, if it is greater than the specified length, throws an error. This is rather time consuming though and would be a pain to update if you update the data size.
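A rough sketch of that kind of check, using the testTable from the question. The procedure name is made up, and the limit of 5 is hard-coded here, which is exactly the maintenance burden mentioned above.

```sql
CREATE PROCEDURE spTestTableInsertChecked  -- hypothetical name
    @testStringField nvarchar(max)         -- wide on purpose, so nothing is cut off on the way in
AS
BEGIN
    -- LEN ignores trailing spaces, so 'abcde   ' still passes this check
    IF LEN(@testStringField) > 5
    BEGIN
        RAISERROR(N'@testStringField is longer than 5 characters.', 16, 1);
        RETURN;
    END;

    INSERT INTO testTable (testStringField) VALUES (@testStringField);
END
GO
```

Calling it with N'string which is too long' should now raise the custom error instead of inserting 'strin'.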
This isn't the Answer that'll solve your problem today, but it includes a Feature Suggestion for MSSQL to consider adding, that would resolve this issue.
It is important to call this out as a shortcoming of MSSQL, so we may help them resolve it by raising awareness of it.
Here's the formal Suggestion if you'd like to vote on it:
https://feedback.azure.com/forums/908035-sql-server/suggestions/38394241-request-for-new-rule-string-truncation-error-for
I share your frustration.
The whole point of setting Character-Size on Parameters is so other Developers will instantly know
what the Size Limits are (via Intellisense) when passing in Data.
This is like having your documentation baked right into the Sproc's Signature.
Look, I get it, Implicit-Conversion during Variable Assignments is the culprit.
Still, there is no good reason to expend this amount of energy battling scenarios
where you are forced to work around this feature.
If you ask me, Sprocs and Functions should have the same engine-rules in place,
for Assigning Parameters, that are used when Populating Tables. Is this really too much to ask?
All these suggestions to use Larger Character-Limits
and then adding Validation for EACH Parameter in EVERY Sproc is ridiculous.
I know it's the only way to ensure Truncation is avoided, but really MSSQL?
I don't care if it's ANSI/ISO Standard or whatever, it's dumb!
When Values are too long - I want my code to break - every time.
It should be: Do not pass go, and fix your code.
You could have multiple truncation bugs festering for years and never catch them.
What happened to ensuring your Data-Integrity?
It's dangerous to assume your SQL Code will only ever be called after all Parameters are Validated.
I try to add the same Validation to both my Website and in the Sproc it calls,
and I still catch Errors in my Sproc that slipped past the website. It's a great sanity-check!
What if you want to re-use your Sproc for a WebSite/WebService and also have it called from other
Sprocs/Jobs/Deployment/Ad-Hoc Scripts (where there is no front-end to Validate Parameters)?
MSSQL Needs a "NO_TRUNC" Option to Enforce this on any Non-Max String Variable
(even those used as Parameters for Sprocs and Functions).
It could be Connection/Session-Scoped:
(like how the "TRANSACTION ISOLATION LEVEL READ UNCOMMITTED" Option affects all Queries)
Or focused on a Single Variable:
(like how "NOLOCK" is a Table Hint for just 1 Table).
Or a Trace-Flag or Database Property you turn on to apply this to All Sproc/Function Parameters in the Database.
I'm not asking to upend decades of Legacy Code.
Just asking MS for the option to better manage our Databases.
To make a long story short...
I'm building a web app in which the user can select any combination of about 40 parameters. However, for one of the results they want(investment experience), I have to extract information from a different table and compare the values in six different columns(stock exp, mutual funds exp, etc) and return only the highest value of the six for that specific record.
This is not the issue. The issue is that at runtime, my query to find the investment exp doesn't necessarily know the account id. Considering a table scan would bring well over half a million clients, this is not an option. So what I'm trying to do is edit a copy of my main dynamically built query, but instead of returning 30+ columns, it'll just return 2, the accountid and experienceid (which is the PK for the experience table) so I can do the filtering deal.
Some of you may define dynamic SQL a little different than myself. My query is a string that depending on the arguments sent to my procedure, portions of the where clause will be turned on or off by switches. In the end I execute, it's all done on the server side, all the web app does is send an array of arguments to my proc.
My over simplified code looks essentially like this:
declare @sql varchar(8000)
set @sql =
'select [columns]
into #tempTable
from [table]
[table joins]' + @dynamicallyBuiltWhereClause
exec(@sql)
After this part I try to use #tempTable for the investment experience filtering process, but I get an error telling me #tempTable doesn't exist.
Any and all help would be greatly appreciated.
The problem is that the temp table's scope is limited to the exec() statement, so it is gone by the time exec() returns. You can turn your temp table into a "global" temp table by using 2 hash signs -> ##tempTable. However, I wonder why you are using a variable @dynamicallyBuiltWhereClause to generate your SQL statement.
I have done what you are doing in the past, but have had better success generating SQL from the application (using C# to generate my SQL).
Also, you may want to look into Table Variables. I have seen some strange instances using temp tables where an application re-uses a connection and the temp table from the last query is still there.
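Another option, different from the global temp table suggested above, is to create the temp table in the outer scope first and have the dynamic SQL only insert into it. A temp table created by the caller is visible inside exec(), but not the other way round. A self-contained sketch follows; the #accounts table, its columns and the sample WHERE clause are stand-ins I made up for the real query.

```sql
-- Hypothetical stand-in for the real source tables and joins
CREATE TABLE #accounts (accountid int NOT NULL, experienceid int NOT NULL);
INSERT INTO #accounts (accountid, experienceid) VALUES (1, 10);
INSERT INTO #accounts (accountid, experienceid) VALUES (2, 20);

-- Create the result table in the outer scope so it outlives the exec() call
CREATE TABLE #tempTable (accountid int NOT NULL, experienceid int NOT NULL);

DECLARE @dynamicallyBuiltWhereClause nvarchar(4000) = N' WHERE accountid = 1';
DECLARE @sql nvarchar(max) =
    N'INSERT INTO #tempTable (accountid, experienceid)
      SELECT accountid, experienceid FROM #accounts' + @dynamicallyBuiltWhereClause;

EXEC (@sql);

-- Still available here, after the dynamic batch has finished
SELECT accountid, experienceid FROM #tempTable;
```

Unlike ##tempTable, the table stays private to the session, so two users running the procedure at the same time do not collide.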
I have about half a dozen generic, but fairly complex stored procedures and functions that I would like to use in a more generic fashion.
Ideally I'd like to be able to pass the table name as a parameter to the procedure, as currently it is hard coded.
The research I have done suggests I need to convert all existing SQL within my procedures to use dynamic SQL in order to splice in the dynamic table name from the parameter. However, I was wondering if there is an easier way, by referencing the table in another way?
For example:
SELECT * FROM @MyTable WHERE...
If so, how do I set the @MyTable variable from the table name?
I am using SQL Server 2005.
Dynamic SQL is the only way to do this, but I'd reconsider the architecture of your application if it requires this. SQL isn't very good at "generalized" code. It works best when it's designed and coded to do individual tasks.
Selecting from TableA is not the same as selecting from TableB, even if the select statements look the same. There may be different indexes, different table sizes, data distribution, etc.
You could generate your individual stored procedures, which is a common approach. Have a code generator that creates the various select stored procedures for the tables that you need. Each table would have its own SP(s), which you could then link into your application.
I've written these kinds of generators in T-SQL, but you could easily do it with most programming languages. It's pretty basic stuff.
Just to add one more thing since Scott E brought up ORMs... you should also be able to use these stored procedures with most sophisticated ORMs.
You'd have to use dynamic sql. But don't do that! You're better off using an ORM.
EXEC(N'SELECT * from ' + @MyTable + N' WHERE ... ')
You can use dynamic SQL, but check that the object exists first unless you can 100% trust the source of that parameter. It's likely that there will be a performance hit, as SQL Server won't be able to re-use the same execution plan for different parameters.
IF OBJECT_ID(@tablename, N'U') IS NOT NULL
BEGIN
--dynamic sql
END
ALTER procedure [dbo].[test](@table_name varchar(max))
AS
BEGIN
declare @tablename varchar(max)=@table_name;
declare @statement varchar(max);
set @statement = 'Select * from ' + @tablename;
execute (@statement);
END
In the application I'm working on porting to the web, we currently dynamically access different tables at runtime from run to run, based on a "template" string that is specified. I would like to move the burden of doing that back to the database now that we are moving to SQL server, so I don't have to mess with a dynamic GridView. I thought of writing a Table-valued UDF with a parameter for the table name and one for the query WHERE clause.
I entered the following for my UDF but obviously it doesn't work. Is there any way to take a varchar or string of some kind and get a table reference that can work in the FROM clause?
CREATE FUNCTION TemplateSelector
(
@template varchar(40),
@code varchar(80)
)
RETURNS TABLE
AS
RETURN
(
SELECT * FROM @template WHERE ProductionCode = @code
)
Or some other way of getting a result set similar in concept to this. Basically all records in the table indicated by the varchar @template with the matching ProductionCode of the @code.
I get the error "Must declare the table variable "@template"", so SQL Server probably thinks I'm trying to select from a table variable.
On Edit: Yeah I don't need to do it in a function, I can run Stored Procs, I've just not written any of them before.
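CREATE PROCEDURE TemplateSelector
(
@template varchar(40),
@code varchar(80)
)
AS
-- Quote the table name; pass @code as a real parameter rather than concatenating it
DECLARE @sql nvarchar(max)
SET @sql = N'SELECT * FROM ' + QUOTENAME(@template) + N' WHERE ProductionCode = @code'
EXEC sp_executesql @sql, N'@code varchar(80)', @code = @code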
This works, though it's not a UDF.
The only way to do this is with the exec command.
Also, you have to move it out to a stored proc instead of a function. Apparently functions can't execute dynamic sql.
The only way that this would be possible is with dynamic SQL; however, dynamic SQL is not supported by SQL Server within a function.
I'm sorry to say that I'm quite sure that it is NOT possible to do this within a function.
If you were working with stored procedures it would be possible.
Also, it should be noted that, by replacing the table name in the query, you've destroyed SQL Server's ability to cache the execution plan for the query. This pretty much reduces the advantage of using a UDF or SP to nil. You might as well just call the SQL query directly.
I have a finite number of tables that I want to be able to address, so I could write something using IF that tests @template for matches with a number of values and for each match runs
SELECT * FROM TEMPLATENAME WHERE ProductionCode = @code
It sounds like that is a better option
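A sketch of that IF-based version is below. TemplateA and TemplateB are placeholder table names and TemplateSelectorStatic is a made-up procedure name; the real list would contain one branch per table you want to expose.

```sql
CREATE PROCEDURE TemplateSelectorStatic  -- hypothetical name
    @template varchar(40),
    @code varchar(80)
AS
BEGIN
    -- Each branch is plain static SQL, so nothing is concatenated into a string
    IF @template = 'TemplateA'
        SELECT * FROM dbo.TemplateA WHERE ProductionCode = @code;
    ELSE IF @template = 'TemplateB'
        SELECT * FROM dbo.TemplateB WHERE ProductionCode = @code;
    ELSE
        RAISERROR(N'Unknown template name.', 16, 1);
END
GO
```

Because the table names are written out in the procedure, the IF list doubles as a whitelist: a value of @template that is not listed can never reach a table.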
If you have numerous tables with identical structure, it usually means you haven't designed your database in a normal form. You should unify these into one table. You may need to give this table one more attribute column to distinguish the data sets.