Is there a way to validate programmability objects in SQL Server 2008?
I have a database with ~500 programmability objects which depend on other programmability objects (not only tables).
If I do some refactoring, it is very hard to find other objects that are broken by the changes. For example if I change the parameter count...
Original state of database:
CREATE FUNCTION [dbo].[GetSomeText]() RETURNS nvarchar(max) AS BEGIN RETURN 'asdf' END
/* uses "GetSomeText()" function */
CREATE FUNCTION [dbo].[GetOtherText]() RETURNS nvarchar(max) AS BEGIN RETURN [dbo].[GetSomeText]() + '-qwer' END
Now I do some refactoring (add parameter @Num to GetSomeText() function):
ALTER FUNCTION [dbo].[GetSomeText](@Num int) RETURNS nvarchar(max) AS BEGIN RETURN 'asdf' + CAST(@Num as nvarchar(max)) END
Now the function GetOtherText() is broken, because it is calling GetSomeText() function without a required parameter.
Is there a way to get information about this error?
Currently I script every programmability object as ALTER, run the alter script, and check for errors. This approach seems too complex (and is hard to use in a T-SQL-only environment).
EDIT:
Thanks for the answers! I know how to get dependencies or a list of all objects.
The problem is in checking the body of an object. Once I have the dependency, is there another way to check validity than running an ALTER script?
I don't think there's a way to find the dependency. You can, however, find everything that references the name of the object you're changing like this:
select OBJECT_DEFINITION(o.object_id) as objectDefinition, *
from sys.objects o
where o.type in ('P', 'FN')
and OBJECT_DEFINITION(o.object_id) like '%GetSomeText%'
o.type in ('P', 'FN') limits the search to P - Procedures and FN - Scalar Functions. Check out more info about OBJECT_DEFINITION: http://msdn.microsoft.com/en-us/library/ms176090.aspx
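The text search above can also match comments or partial names. As a possible complement, SQL Server 2008 and later record name-based references in the sys.sql_expression_dependencies catalog view. The sketch below lists the modules that reference dbo.GetSomeText (the function from the question). It assumes the reference lives in the same database, and it only tells you who references the function, not whether the call is still valid.

```sql
-- List modules in the current database that reference dbo.GetSomeText.
SELECT
    OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
    OBJECT_NAME(d.referencing_id)        AS referencing_object,
    o.type_desc                          AS referencing_type
FROM sys.sql_expression_dependencies AS d
JOIN sys.objects AS o
    ON o.object_id = d.referencing_id
WHERE d.referenced_id = OBJECT_ID(N'dbo.GetSomeText')
ORDER BY referencing_schema, referencing_object;
```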
Perhaps you could try introducing some automated database developer/unit testing.
With 500 SQL objects it would be onerous to go back and retrofit tests for them all. The best approach might be to create these tests incrementally as the need to refactor/change existing APIs/create new SQL objects arises.
These automated tests could then be included as part of your overall continuous integration approach. Note that for the example given you would still have the issue of finding existing dependencies. But once there was sufficient test coverage, the tests should highlight any breaking changes introduced.
I have created a test tool that might be of use - but there are a number of others out there:
http://dbtestunit.wordpress.com/
One of the easiest ways to get dependency is to use sp_depends. This does work with functions, but you need to be sure you are in the right DB context:
USE MyDatabase
EXEC sp_depends @objname = N'dbo.FunctionName'
This will show you any object whether it be a function, stored proc, table, or view that has a dependency for the listed object.
This is not always accurate with cross-database dependencies, though, so be aware.
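Once the dependent objects are known, one possible way to re-check them without scripting ALTER statements is the system procedure sys.sp_refreshsqlmodule, which re-binds a non-schema-bound module against current metadata and raises an error when binding fails. The loop below is a sketch under assumptions: it refreshes every non-schema-bound procedure, function, and view in the current database and prints the failures, and it assumes it is not run inside an outer transaction. Problems hidden by deferred name resolution (for example, a procedure that references a missing table) may still go unreported, so treat a clean run as a hint rather than proof.

```sql
DECLARE @name nvarchar(776);

DECLARE module_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(OBJECT_SCHEMA_NAME(m.object_id)) + N'.'
         + QUOTENAME(OBJECT_NAME(m.object_id))
    FROM sys.sql_modules AS m
    JOIN sys.objects AS o
        ON o.object_id = m.object_id
    WHERE m.is_schema_bound = 0                    -- schema-bound modules cannot be refreshed
      AND o.type IN ('P', 'FN', 'IF', 'TF', 'V');  -- procedures, functions, views

OPEN module_cursor;
FETCH NEXT FROM module_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXEC sys.sp_refreshsqlmodule @name;
    END TRY
    BEGIN CATCH
        -- Report the module that no longer binds, together with the engine's message
        PRINT @name + N': ' + ERROR_MESSAGE();
        -- Clean up in case the failed refresh left a transaction open
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    END CATCH;

    FETCH NEXT FROM module_cursor INTO @name;
END

CLOSE module_cursor;
DEALLOCATE module_cursor;
```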
I'm currently working on a .NET application and want to make it as modular as possible. I've already created a basic SELECT procedure, which returns data by checking inputted parameters on SQL Server side.
I want to create a procedure that parses structured data passed as a string and inserts its contents into the corresponding table in the database.
For example, I have a table as
CREATE TABLE ExampleTable (
id_exampleTable int IDENTITY (1, 1) NOT NULL,
exampleColumn1 nvarchar(200) NOT NULL,
exampleColumn2 int NULL,
exampleColumn3 int NOT NULL,
CONSTRAINT pk_exampleTable PRIMARY KEY ( id_exampleTable )
)
And my procedure starts as
CREATE PROCEDURE InsertDataIntoCorrespondingTable
@dataTable nvarchar(max), --name of Table in my DB
@data nvarchar(max) --normalized string parameter as 'column1, column2, column3, etc.'
AS
BEGIN
IF @dataTable = 'table'
BEGIN
/**Parse this string and execute insert command**/
END
ELSE IF /**Other statements**/
END
TL;DR
So basically, I'm looking for a solution that can help me achieve something like this
EXEC InsertDataIntoCorrespondingTableByID
@dataTable = 'ExampleTable',
@data = '''exampleColumn1'', 2, 3'
Which should be equal to just
INSERT INTO ExampleTable SELECT 'exampleColumn1', 2, 3
Sure, I can push data as INSERT statements (for each and every 14 tables inside DB...), generated inside an app, but I want to conquer T-SQL :)
This might be reasonable (to some degree) on an RDBMS that supports structured data like JSON or XML natively, but doing this the way you are planning is going to cause some real pain-in-the-rear support and, more importantly, a sql injection attack vector. I would leave this to the realm of the web backend server where it belongs.
You are likely going to invent your own structured data markup language (SDML) and parser to solve this in SQL Server. That's a wheel that doesn't need to be reinvented. If you do end up building this, strongly consider going with JSON to avoid all the issues that structured data inherently brings with it, assuming your version of SQL Server supports JSON parsing/packaging.
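If the SQL Server version allows it (OPENJSON requires SQL Server 2016 or later and database compatibility level 130 or higher), a JSON payload keyed by column name could be inserted roughly as sketched below. The payload shape and the variable name @json are assumptions for illustration; the table is the ExampleTable from the question.

```sql
-- Hypothetical payload: one JSON object whose keys match the column names
DECLARE @json nvarchar(max) =
    N'{"exampleColumn1": "some text", "exampleColumn2": 2, "exampleColumn3": 3}';

INSERT INTO dbo.ExampleTable (exampleColumn1, exampleColumn2, exampleColumn3)
SELECT exampleColumn1, exampleColumn2, exampleColumn3
FROM OPENJSON(@json)
WITH (
    exampleColumn1 nvarchar(200) '$.exampleColumn1',
    exampleColumn2 int           '$.exampleColumn2',
    exampleColumn3 int           '$.exampleColumn3'
);
```

Because values are matched by key rather than by position, adding a column to the table does not silently shift the data.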
Your front end that packages your data into your SDML is going to have to assume column ordinals, but column ordinal is not something that one should rely on in a database. SQL Amateurs often do, I know from years in the industry and dealing with end users that are upset when a new column is introduced in a position they don't want it. Adding a column to a table shouldn't break an application. If it does, that application has bad code.
Regarding the SQL injection attack vector, your SP code is going to get ugly. You'll need to parse out each item in @data into a variable of its own in order to properly parameterize the dynamic SQL that is being built. See here under the "working with parameters" section for what that will look like. Failure to add this to your SP code means that values passed in that @data SDML could become executable SQL instead of literals, and that would be very bad. This is not easy to solve in SP language. Where it IS easy to solve, though, is in the backend server code. Every database library on the planet supports parameterized query building/execution natively.
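For a single known table, parameterized dynamic SQL could look like the sketch below. It uses sp_executesql with a parameter definition list, so the values travel as typed parameters and never become part of the SQL text. The parameter names @c1, @c2, and @c3 are made up for illustration; the table is the ExampleTable from the question.

```sql
DECLARE @sql nvarchar(max) =
    N'INSERT INTO dbo.ExampleTable (exampleColumn1, exampleColumn2, exampleColumn3)
      VALUES (@c1, @c2, @c3);';

-- The second argument declares the parameters; the remaining arguments bind their values
EXEC sys.sp_executesql
    @sql,
    N'@c1 nvarchar(200), @c2 int, @c3 int',
    @c1 = N'exampleColumn1',
    @c2 = 2,
    @c3 = 3;
```

Doing this generically means building both the statement and the parameter list at run time for every table, which is where the complexity described here comes from.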
Once you have this built you will be dynamically generating an INSERT statement and dynamically generating variables or an array or some data structure to pass in parameters to the INSERT statement to avoid sql injection attacks. It's going to be dynamic, on top of dynamic, on top of dynamic which leads to:
From a support context, imagine that your application just totally throws up one day. You have to dive in to investigate. You track down the SDML that your front end created that caused the failure, and you open up your SP code to troubleshoot. Imagine what this code ends up looking like:
It has to determine if the table exists
It has to parse the SDML to get each literal
It has to read DB metadata to get the column list
It has to dynamically write the insert statement, listing the columns from metadata and dynamically creating sql parameters for the VALUES() list.
It has to execute sending a dynamic number of variables into the dynamically generated sql.
My support staff would hang me out to dry if they had to deal with that, and I'm the one paying them.
All of this is solved by using a proper backend to handle communication, deeper validation, sql parameter binding, error catching and handling, and all the other things that backend servers are meant to do.
I believe that your back end web server should be VERY aware of the underlying data model. It should be the connection between your view, your data, and your model. Leave the database to the things it's good at (reading and writing data). Leave your front end to the things that it's good at (presenting a UI for the end user).
I suppose you could do something like this (may need a little extra work)
declare @columns nvarchar(max);
select @columns = string_agg(name, ', ') WITHIN GROUP ( ORDER BY column_id )
from sys.all_columns
where object_id = object_id(@dataTable);
-- WARNING: @dataTable and @data are concatenated as-is, so this is open to SQL injection
declare @sql nvarchar(max) = concat('INSERT INTO ', @dataTable, ' (', @columns, ') VALUES (', @data, ')');
exec sp_executesql @sql;
But please don't. If this were a good idea, there would be tons of examples of how to do it. There aren't so it's probably not a good idea.
There are, however, tons of examples of using ORMs or auto-generated code instead, because that way your code is maintainable, debuggable, and performant.
I want to create a function to return a list of files in a directory so that I can call the function in a SELECT statement. Yes I could use a stored procedure, but then I would need to use a cursor.
This is what I want to do, but this gives the error
Invalid use of a side-effecting operator 'INSERT EXEC' within a function.
Code:
CREATE FUNCTION [dbo].[fnGetFilesInDirectory]
(@Path VARCHAR(512),
@FileMask VARCHAR(256))
RETURNS @Files TABLE (
FilePath VARCHAR(512)
)
AS
BEGIN
DECLARE @Cmd VARCHAR(8000)
SET @Cmd = 'dir ' + quotename(@Path + @FileMask, NCHAR(34)) + ' /B'
INSERT INTO @Files (FilePath)
EXEC xp_cmdshell @Cmd
RETURN
END
Funnily enough, this is valid:
INSERT INTO @Files (FilePath) SELECT 'test.txt'
and this is valid without the INSERT before it:
EXEC xp_cmdshell @cmd
But combining them is not.
Any suggestions on another approach to this?
The documentation clearly specifies that this is not possible:
Calling Extended Stored Procedures from Functions
The extended stored procedure, when it is called from inside a
function, cannot return result sets to the client. Any ODS APIs that
return result sets to the client will return FAIL. The extended stored
procedure could connect back to an instance of SQL Server; however, it
should not try to join the same transaction as the function that
invoked the extended stored procedure.
I am not sure where this limitation comes from. The suggested work-around is a hack, but it might work. Call an extended stored procedure that executes a shell script, which connects back to the database and writes the results of the shell command into a table. Then use the results from that table. There might be some transactional issues.
I don't fully understand the advantage of putting this logic in a function. I admit it might seem convenient. But, if you are iterating through files -- say to load them -- then you need to execute stored procedures on each one. If you are loading a table, you can do so through a stored procedure, using the same logic.
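The same listing can be captured outside a function, where INSERT ... EXEC is allowed, and then queried set-based without a cursor. A minimal sketch, assuming xp_cmdshell is enabled and using a hypothetical path and file mask:

```sql
DECLARE @Path varchar(512) = 'C:\Temp\';    -- hypothetical directory
DECLARE @FileMask varchar(256) = '*.txt';   -- hypothetical mask
-- QUOTENAME returns NULL for inputs longer than 128 characters, so this only suits short paths
DECLARE @Cmd varchar(8000) = 'dir ' + QUOTENAME(@Path + @FileMask, '"') + ' /B';

CREATE TABLE #Files (FilePath varchar(512) NULL);

-- Allowed here because this runs in a batch or procedure, not in a function
INSERT INTO #Files (FilePath)
EXEC master.dbo.xp_cmdshell @Cmd;

-- xp_cmdshell output can include NULL rows; filter them out before using the list
SELECT FilePath
FROM #Files
WHERE FilePath IS NOT NULL;

DROP TABLE #Files;
```

The temp table can be joined or filtered like any other table, which covers most cases where a cursor over the file list seemed necessary.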
The problem is almost certainly that the INSERT INTO table EXEC proc; construct creates an internal Transaction, and you aren't allowed to use Transactions in T-SQL functions (Scalar UDF and Multi-statement TVF; Inline TVF isn't relevant here as it can only be a SELECT statement).
However, this is rather trivial to handle via a SQLCLR TVF. You can use classes like FileSystemInfo and DirectoryInfo, etc., to enumerate files in directories in several different ways (i.e. with or without passing in filters that can include the * and ? wildcards, recursive through subdirectories or not). You just need to mark the Assembly as WITH PERMISSION_SET = EXTERNAL_ACCESS. And you do not need (or want) to set the DB to TRUSTWORTHY ON, but instead sign the Assembly, create an Asymmetric Key in [master] from the signed Assembly, create a Login from that Asymmetric Key, and then grant that Login the EXTERNAL ACCESS ASSEMBLY permission. For more information on working with SQLCLR, please see the series I am writing on that topic at SQL Server Central: Stairway to SQLCLR (that site does require free registration, but it's definitely worth it). Level 7 in particular shows how to handle doing the security properly when using Visual Studio/SSDT.
For anyone who doesn't want to deal with doing any development, I wrote a library of SQLCLR functions and stored procedures called SQL# that includes several file system functions, including File_GetDirectoryListing which does exactly this. It is a streaming TVF so it is very fast / efficient, and allows for RegEx filters on Filename and Path instead of the standard * and ? wildcards. However, just FYI: it is only available in the Full version, not in the Free version.
I am working with some commercial schemas, which have a set of similar tables that differ only in language name, e.g.:
Products_en
Products_fr
Products_de
I also have several stored procedures which I am using to access these to perform some administrative functions, and I have opted to use synonyms since there is a lot of code, and writing everything as dynamic SQL is just painful:
declare @lang varchar(50) = 'en'
if object_id('dbo.ProductsTable', 'sn') is not null drop synonym dbo.ProductsTable
exec('create synonym dbo.ProductsTable for dbo.Products_' + @lang)
/* Call the synonym table */
select top 10 * from dbo.ProductsTable
update ProductsTable set a = 'b'
My question is how does SQL Server treat synonyms when it comes to concurrent access? My fear is that a procedure could start, then a second come along and change the table the synonym points to halfway through causing major issues. I could wrap everything in a BEGIN TRAN and COMMIT TRAN which should theoretically remove the risk of two processes changing a synonym, however the documentation is scarce on this matter and I can not get a definitive answer.
Just to note, although this system is concurrent, it is not high traffic, so the performance hits of using synonyms/transactions are not really an issue here.
Thanks for any suggestions.
Your fear is justified. Synonyms are not intended to be used in this way. Wrapping it in a transaction (not sure what isolation level would be required) might solve the issue, but only by making the system single user.
If I were dealing with this, I would probably have gone with dynamic SQL because I am familiar with it. However, having thought about it, I wonder if schemas could solve your problem.
You could create a schema for each language and then have a table called products in each schema. Your stored proc can then reference an unqualified table name, and SQL Server should resolve the reference to the table that is in the default schema of the current user. You'll then need to either change which account your application authenticates as to determine which schema it uses, or use EXECUTE AS in a stored proc to decide which schema is the default.
I haven't tested this schema idea, I may not have thought of everything and I don't know enough about your application to know if it is actually workable in your case. Let us know if you decide to try it.
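For comparison, the dynamic SQL route mentioned above avoids shared state altogether, because each call builds its own statement. A minimal sketch, assuming the Products_<lang> naming from the question; the variable names are made up for illustration:

```sql
DECLARE @lang varchar(50) = 'en';
DECLARE @table sysname = N'Products_' + @lang;
DECLARE @qualified nvarchar(300) = N'dbo.' + QUOTENAME(@table);

-- Refuse to run unless the target is an existing user table
IF OBJECT_ID(@qualified, 'U') IS NULL
BEGIN
    RAISERROR('No table found for language %s', 16, 1, @lang);
    RETURN;
END

-- QUOTENAME keeps the language suffix from being interpreted as SQL
DECLARE @sql nvarchar(max) = N'SELECT TOP (10) * FROM ' + @qualified + N';';
EXEC sys.sp_executesql @sql;
```

Two sessions running this with different languages do not interfere with each other, unlike two sessions re-pointing the same synonym.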
When it comes to creating stored procedures, views, functions, etc., is it better to do a DROP...CREATE or an ALTER on the object?
I've seen numerous "standards" documents stating to do a DROP...CREATE, but I've seen numerous comments and arguments advocating for the ALTER method.
The ALTER method preserves security, while I've heard that the DROP...CREATE method forces a recompile of the entire SP the first time it's executed instead of just a statement-level recompile.
Can someone please tell me if there are other advantages / disadvantages to using one over the other?
ALTER will also force a recompile of the entire procedure. Statement-level recompile applies to statements inside procedures, e.g. a single SELECT, that are recompiled because the underlying tables changed, without any change to the procedure. It wouldn't even be possible to selectively recompile just certain statements on ALTER PROCEDURE: in order to understand what changed in the SQL text after an ALTER PROCEDURE, the server would have to ... compile it.
For all objects ALTER is always better because it preserves all security, all extended properties, all dependencies and all constraints.
This is how we do it:
if object_id('YourSP') is null
exec ('create procedure dbo.YourSP as select 1')
go
alter procedure dbo.YourSP
as
...
The code creates a "stub" stored procedure if it doesn't exist yet, otherwise it does an alter. In this way any existing permissions on the procedure are preserved, even if you execute the script repeatedly.
Starting with SQL Server 2016 SP1, you now have the option to use CREATE OR ALTER syntax for stored procedures, functions, triggers, and views. See CREATE OR ALTER – another great language enhancement in SQL Server 2016 SP1 on the SQL Server Database Engine Blog. For example:
CREATE OR ALTER PROCEDURE dbo.MyProc
AS
BEGIN
SELECT * FROM dbo.MyTable
END;
Altering is generally better. If you drop and create, you can lose the permissions associated with that object.
If you have a function/stored proc that is called very frequently from a website for example, it can cause problems.
The stored proc will be dropped for a few milliseconds/seconds, and during that time, all queries will fail.
If you do an alter, you don't have this problem.
The templates for newly created stored proc are usually this form:
IF EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = '<name>')
BEGIN
DROP PROCEDURE <name>
END
GO
CREATE PROCEDURE <name>
......
However, the opposite is better, imo:
If the storedproc/function/etc doesn't exist, create it with a dummy select statement. Then, the alter will always work - it will never be dropped.
We have a stored proc for that, so our stored procs/functions usually look like this:
EXEC Utils.pAssureExistance 'Schema.pStoredProc'
GO
ALTER PROCEDURE Schema.pStoredProc
...
and we use the same stored proc for functions:
EXEC Utils.pAssureExistance 'Schema.fFunction'
GO
ALTER FUNCTION Schema.fFunction
...
In Utils.pAssureExistance we do an IF and look at the first character after the ".": if it's an "f", we create a dummy function; if it's a "p", we create a dummy stored proc.
Be careful though, if you create a dummy scalar function, and your ALTER is on a table-valued function, the ALTER FUNCTION will fail, saying it's not compatible.
Again, Utils.pAssureExistance can be handy, with an additional optional parameter
EXEC Utils.pAssureExistance 'Schema.fFunction', 'TableValuedFunction'
will create a dummy table-valued function.
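The body of Utils.pAssureExistance is not shown in this answer. The following is only a hypothetical reconstruction of how such a helper might work, based on the naming convention described above; the parameter names, the stub bodies, and the choice of a multi-statement stub for table-valued functions are all assumptions. It also assumes the Utils schema already exists.

```sql
-- Hypothetical reconstruction; the answer's real Utils.pAssureExistance is not shown.
CREATE PROCEDURE Utils.pAssureExistance
    @objectName nvarchar(300),          -- e.g. 'Schema.pStoredProc'
    @kind       nvarchar(30) = NULL     -- optional: 'TableValuedFunction'
AS
BEGIN
    SET NOCOUNT ON;

    -- Nothing to do when the object already exists
    IF OBJECT_ID(@objectName) IS NOT NULL
        RETURN;

    DECLARE @schema sysname = ISNULL(PARSENAME(@objectName, 2), N'dbo');
    DECLARE @name   sysname = PARSENAME(@objectName, 1);
    DECLARE @sql    nvarchar(max);

    IF @name IS NULL
    BEGIN
        RAISERROR('Cannot parse object name %s', 16, 1, @objectName);
        RETURN;
    END

    DECLARE @target nvarchar(300) = QUOTENAME(@schema) + N'.' + QUOTENAME(@name);

    IF @kind = N'TableValuedFunction'
        SET @sql = N'CREATE FUNCTION ' + @target
                 + N'() RETURNS @t TABLE (stub int) AS BEGIN RETURN; END';
    ELSE IF LEFT(@name, 1) = N'f'
        SET @sql = N'CREATE FUNCTION ' + @target
                 + N'() RETURNS int AS BEGIN RETURN 1; END';
    ELSE IF LEFT(@name, 1) = N'p'
        SET @sql = N'CREATE PROCEDURE ' + @target + N' AS SELECT 1;';
    ELSE
    BEGIN
        RAISERROR('Cannot infer object type from name %s', 16, 1, @objectName);
        RETURN;
    END

    -- CREATE PROCEDURE/FUNCTION must be the only statement in its batch, hence dynamic SQL
    EXEC sys.sp_executesql @sql;
END
```

The stub created here is a multi-statement table-valued function; a later ALTER to an inline table-valued function would hit the same incompatible-type error, so a real helper would probably need a separate option for that case.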
Additionally, I might be wrong, but I think if you do a drop procedure and a query is currently using the stored proc, it will fail.
However, an alter procedure will wait for all queries to stop using the stored proc, and then alter it. If the queries are "locking" the stored proc for too long (say a couple seconds), the ALTER will stop waiting for the lock, and alter the stored proc anyway: the queries using the stored proc will probably fail at that point.
DROP generally loses permissions AND any extended properties.
On some UDFs, ALTER will also lose extended properties (definitely on SQL Server 2005 multi-statement table-valued functions).
I typically do not DROP and CREATE unless I'm also recreating those things (or know I want to lose them).
I don't know if it's possible to make such a blanket comment and say "ALTER is better". I think it all depends on the situation. If you require this sort of granular permissioning down to the procedure level, you probably should handle this in a separate procedure. There are benefits to having to drop and recreate. It cleans out existing security and resets it to what's predictable.
I've always preferred using drop/recreate. I've also found it easier to store them in source control. Instead of doing .... if exists do alter and if not exists do create.
With that said... if you know what you're doing... I don't think it matters too much.
If you perform a DROP, and then use a CREATE, you have almost the same effect as using an ALTER VIEW statement. The problem is that you need to entirely re-establish your permissions on who can and can’t use the view. ALTER retains any dependency information and set permissions.
You've asked a question specifically relating to DB objects that do not contain any data, and theoretically should not be changed that often.
It's likely you may need to edit these objects, but not every 5 minutes. Because of this I think you've already hit the nail on the head: permissions.
Short answer, not really an issue, so long as permissions are not an issue
We used to use alter while we were working in development, either creating new functionality or modifying existing functionality. When we were done with our development and testing, we would then do a drop and create. This modifies the date/time stamp on the procs so you can sort them by date/time.
It also allowed us to see what was bundled by date for each deliverable we sent out.
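To sort procedures by date as described, the catalog view sys.procedures exposes both timestamps. A small sketch; note that create_date is reset by a drop and create, while modify_date is also updated by ALTER:

```sql
-- Most recently changed procedures first
SELECT
    SCHEMA_NAME(schema_id) AS schema_name,
    name,
    create_date,
    modify_date
FROM sys.procedures
ORDER BY modify_date DESC;
```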
Add with a drop if exists is better because, if you have multiple environments, when you move the script to QA or test or prod you don't know whether the object already exists in that environment. By adding a drop (if it already exists) and then an add, you will be covered regardless of whether it exists or not. You then have to reapply permissions, but it's better than hearing your install script errored out.
From a usability point of view, a drop and create is better than an alter. Alter will fail in a database that doesn't contain that object, but having an IF EXISTS DROP and then a CREATE will work in a database with the object already in existence or in a database where the object doesn't exist. In Oracle and PostgreSQL you normally create functions and procedures with the statement CREATE OR REPLACE, which does the same as a SQL Server IF EXISTS DROP and then a CREATE. It would be nice if SQL Server picked up this small but very handy syntax.
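For reference, newer versions narrow this gap: SQL Server 2016 added DROP ... IF EXISTS, which shortens the drop-and-create pattern. A sketch with the hypothetical names dbo.MyProc and MyRole (the role is assumed to exist); permissions still have to be reapplied after the drop:

```sql
-- Drops the procedure only when it exists (SQL Server 2016 and later)
DROP PROCEDURE IF EXISTS dbo.MyProc;
GO
CREATE PROCEDURE dbo.MyProc
AS
BEGIN
    SELECT 1 AS placeholder;
END
GO
-- Permissions are lost on DROP, so grant them again
GRANT EXECUTE ON dbo.MyProc TO MyRole;
GO
```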
This is how I would do it. Put all this in one script for a given object.
IF EXISTS ( SELECT 1
FROM information_schema.routines
WHERE routine_schema = 'dbo'
AND routine_name = '<PROCNAME>'
AND routine_type = 'PROCEDURE' )
BEGIN
DROP PROCEDURE <PROCNAME>
END
GO
CREATE PROCEDURE <PROCNAME>
AS
BEGIN
END
GO
GRANT EXECUTE ON <PROCNAME> TO <ROLE>
GO
Is it possible to create a stored procedure as
CREATE PROCEDURE Dummy
@ID INT NOT NULL
AS
BEGIN
END
Why is it not possible to do something like this?
You could check for its NULL-ness in the sproc and RAISERROR to report the state back to the calling location.
CREATE proc dbo.CheckForNull @i int
as
begin
if @i is null
raiserror('The value for @i should not be null', 15, 1) -- with log
end
GO
Then call:
exec dbo.CheckForNull @i = 1
or
exec dbo.CheckForNull @i = null
Your code is correct, sensible and even good practice. You just need to wait for SQL Server 2014 which supports this kind of syntax.
After all, why catch at runtime when you can at compile time?
See also this Microsoft document and search for Natively Compiled in there.
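As a rough illustration of what that looks like, the procedure from the question could be written as a natively compiled procedure along these lines. This is a sketch under assumptions: it requires SQL Server 2014 or later and a database that already has a MEMORY_OPTIMIZED_DATA filegroup, and the body is only a placeholder.

```sql
-- Requires SQL Server 2014+ and a database with a MEMORY_OPTIMIZED_DATA filegroup.
CREATE PROCEDURE dbo.Dummy
    @ID int NOT NULL   -- NOT NULL on parameters is only accepted for natively compiled modules
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    DECLARE @copy int = @ID;   -- placeholder body
END
```

Passing NULL for @ID should then be rejected when the procedure is called, rather than by a check written in the body.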
As dkrez says, nullability is not considered part of the data type definition. I still wonder why not.
Oh well, it seems I cannot edit @Unsliced's post because "This edit deviates from the original intent of the post. Even edits that must make drastic changes should strive to preserve the goals of the post's owner.".
So (@crokusek and everyone interested) this is my proposed solution:
You could check for its NULL-ness in the sproc and RAISERROR to report the state back to the calling location.
CREATE proc dbo.CheckForNull
@name sysname = 'parameter',
@value sql_variant
as
begin
if @value is null
raiserror('The value for %s should not be null', 16, 1, @name) -- with log
end
GO
Then call:
exec dbo.CheckForNull @name = 'whateverParamName', @value = 1
or
exec dbo.CheckForNull @value = null
One reason why you may need such syntax is that, when you use a stored procedure in the C# DataSet GUI wizard, it generates a function with nullable parameters if there is no null restriction. A null check in the procedure body does not help with that.
Parameter validation is not currently a feature of procedural logic in SQL Server, and NOT NULL is only one possible type of data validation. The CHAR datatype in a table has a length specification. Should that be implemented as well? And how do you handle exceptions? There is an extensive, highly developed and somewhat standards-based methodology for exception handling in table schemas; but not for procedural logic, probably because procedural logic is defined out of relational systems. On the other hand, stored procedures already have an existing mechanism for raising error events, tied into numerous APIs and languages. There is no such support for declarative data type constraints on parameters. The implications of adding it are extensive; especially since it's well-supported, and extensible, to simply add the code:
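IF @param IS NULL
BEGIN
    RAISERROR('@param must not be null', 16, 1); -- report the problem to the caller
    RETURN; -- RAISERROR alone does not stop the procedure
END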
The concept of NULL in the context of a stored procedure isn't even well-defined, especially compared to the context of a table or an SQL expression. And it's not Microsoft's definition. The SQL standards groups have spent a lot of years generating a lot of literature establishing the behavior of NULL and the bounds of the definitions for that behavior. And stored procedures aren't among the contexts they covered.
A stored procedure is designed to be as light-weight as possible to make database performance as efficient as possible. The datatypes of parameters are there not for validation, but to enable the compiler to give the query optimizer better information for compiling the best possible query plan. A NOT NULL constraint on a parameter is headed down a whole nother path by making the compiler more complex for the new purpose of validating arguments. And hence less efficient and heavier.
There's a reason stored procedures aren't written as C# functions.