I'm providing maintenance support for some SSIS packages. The packages have data flow sources with complex embedded SQL scripts that need to be modified from time to time. I'm thinking about moving those SQL scripts into stored procedures and calling them from SSIS, so that they are easier to modify, test, and deploy. I'm just wondering if there is any negative impact to the new approach. Can anyone give me a hint?
Yes, there are issues with using stored procs as data sources (though not with using them in Execute SQL tasks in the control flow).
You might want to read this:
http://www.jasonstrate.com/2011/01/31-days-of-ssis-no-more-procedures-2031/
Basically, the problem is that SSIS cannot always figure out the result set, and thus the columns, from a stored proc. I've personally run into this when a stored proc uses a temp table.
I don't know that I would go as far as the author of the article and avoid procs entirely, but be careful that you are not trying to do too much with them; if you have to do something complicated, do it in an Execute SQL task before the data flow.
I can honestly see nothing but improvements. Stored procedures will offer better security, the possibility for better performance due to cached execution plans, and easier maintenance, like you pointed out.
Refactor away!
You will not face issues using only simple stored procedures as a data source. If a procedure uses temp tables or CTEs, there is no guarantee you won't face issues. Even when you can preview results at design time, you may get errors at run time.
My experience has been that trying to get a sproc to function as a data source is just not worth the headache. Maybe some simple sprocs are fine, and in some cases TVFs will work well instead, but if you need to do some complex operations there's no alternative to a sproc.
The best workaround I've found is to create an output table for each sproc you need to use in SSIS:
1. Modify the sproc to truncate the output table at the start, and to write its output to this table instead of (or in addition to) ending with a SELECT statement.
2. Call the sproc with an Execute SQL task before your data flow.
3. Have your data flow read from the output table - a much simpler task.
4. If you want to save space, truncate the output table again with another Execute SQL task. I prefer to leave it, as it lets me examine the data later and lets me rerun the output data flow if it fails without calling the sproc again.
This is certainly less elegant than reading directly from a sproc's output, but it works. FWIW, this pattern follows the philosophy (obligatory in Oracle) that a sproc should not try to be a parameterized view.
Of course, all this assumes that you have privs to adjust the sproc in question. If necessary, you could write a new wrapper sproc which truncates the output table, then calls the old sproc and redirects its output to the new table.
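A minimal sketch of this pattern, with made-up table, column, and proc names for illustration:

```sql
-- Output table the sproc will fill; its schema must match the sproc's result set.
CREATE TABLE dbo.MySprocOutput (
    ID     INT            NOT NULL,
    Amount DECIMAL(18, 2) NULL
);
GO

ALTER PROCEDURE dbo.MySproc
AS
BEGIN
    -- Clear the previous run's rows first.
    TRUNCATE TABLE dbo.MySprocOutput;

    -- Write the results to the output table instead of SELECTing them back.
    INSERT INTO dbo.MySprocOutput (ID, Amount)
    SELECT t.ID, t.Amount
    FROM dbo.SomeSourceTable AS t;
END;
GO
```

In SSIS, the Execute SQL task then just runs `EXEC dbo.MySproc;`, and the data flow's OLE DB Source reads `SELECT ID, Amount FROM dbo.MySprocOutput;`, which SSIS has no trouble resolving metadata for.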
I have T-SQL queries stored on a hard drive: I:\queries\query1.sql and I:\queries\query2.sql.
I usually work in a way that I execute a query from a drive, and then I copy results into Excel, and then I work on it.
My problem here is that query1.sql is already long, and now I would like to extend it by getting a result of query2.sql, and join it with a result of query1.sql.
What I could do is append the code from query2.sql to query1.sql. But then the query gets really long and hard to maintain.
I would like to do something like this:
SELECT * FROM ("Result of I:\queries\query1.sql") q1
LEFT JOIN ("Result of I:\queries\query2.sql") q2 ON q1.ID=q2.ID
Is there any way to write a query or stored procedure, which will be again stored on a drive to do this?
Basically, you need to ask your DBA for a database where you are able to store things. This can be on the same system where the data is stored, or it could be on a linked system. Gosh, you could even run SQL Server locally and store the information and data there.
Then, the queries that you are storing in files should be views in the database. You can then run the queries and store and combine the results locally.
You are essentially recreating database functionality using text files and data files -- going through a lot of effort when SQL Server already supports this functionality.
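As a hypothetical sketch, the contents of query1.sql and query2.sql become views (the bodies and column names here are placeholders for whatever those files actually contain):

```sql
CREATE VIEW dbo.Query1 AS
SELECT ID, SomeColumn FROM dbo.Table1;   -- body of query1.sql goes here
GO
CREATE VIEW dbo.Query2 AS
SELECT ID, OtherColumn FROM dbo.Table2;  -- body of query2.sql goes here
GO

-- The join the question asks for then becomes trivial:
SELECT q1.*, q2.OtherColumn
FROM dbo.Query1 AS q1
LEFT JOIN dbo.Query2 AS q2
    ON q1.ID = q2.ID;
```

Each file's logic stays independently maintainable, and the combining query stays short.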
To expand on Gordon's comment (+1), why are you running scripts off of a drive? Most DBAs I've known would threaten bodily harm over this, as executing code that they can't control, troubleshoot, or see under source code control brings a whole host of security and supportability issues.
Far better to store this code in a stored procedure, which will have a saved query execution plan, can be tracked using various DMVs, and can have permissions assigned to it; then your outside Excel doc can just set a connection and execute the SP.
I'm not sure if a stored procedure is the correct path here so first, I'll explain what I need to accomplish.
I need to query one table with a variable as such:
SELECT *
FROM db.partList
WHERE column1 = @order_no;
Then, I need to query another table as such:
SELECT [serial_no]
FROM db.resultsList;
Finally, I need to iterate through the results of the above and return the first [serial_no] from db.partList that is not in the list produced.
The original programmer was doing this in a way that was blowing up the customer's network unnecessarily. There shouldn't be any reason this can't be done locally. Now I'm here to clean it up.
Thanks in advance.
So my questions are, would this be correct use of a stored procedure? If so, could someone perhaps give me some sample code to start working with? I don't often have to dive that deep into SQL Server.
This is SQL Server 2012.
I've got some example code of how I would do it in other languages if needed. I'm just not familiar enough with stored procedures to do this quickly.
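For what it's worth, this particular requirement doesn't need row-by-row iteration at all; a single set-based query can return the first serial_no in db.partList that's missing from db.resultsList. A sketch wrapped in a stored procedure (the serial_no column on partList and the parameter type are guesses from the question; adjust to the real schema):

```sql
CREATE PROCEDURE dbo.GetFirstMissingSerial
    @order_no VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- First serial for this order with no match in the results list.
    -- NOT EXISTS avoids the NULL pitfalls that NOT IN has.
    SELECT TOP (1) p.serial_no
    FROM db.partList AS p
    WHERE p.column1 = @order_no
      AND NOT EXISTS (
            SELECT 1
            FROM db.resultsList AS r
            WHERE r.serial_no = p.serial_no
          )
    ORDER BY p.serial_no;
END;
```

Run entirely on the server, this sends one small result back instead of dragging both tables across the network, which sounds like exactly the problem the original code was causing.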
In a SSIS ETL, I have a query that I need to run on a server/db that does not allow us to create stored procedures.
I would normally use the stored procedure in my variable as the source for my OLE DB source:
However, since we can't put the stored procedure on this server, I was going to store its code in a variable instead: execute a SQL statement to retrieve the text from our home database, then use the text stored in that variable as the SQL command for the source.
This way, I can still remotely change the SSIS OLE DB Source object WHERE clause (as long as I don't change the SELECT portion).
I can't imagine that this is very common, so I wanted to get some opinions - is there a better way to do this? I don't want to put all of the code for this SP into the OLE DB Source editor directly because we can't afford to redeploy in case of a WHERE clause update.
You've got the part down that many folks don't do and that's using Variables to drive your package execution. You are further correct in that you can't exactly swap out your columns. To be pedantic, which I am, you can completely change out the query as long as the same metadata is presented.
So, then this question becomes how best to accomplish allowing a package to have a query's filter driven by an external force. Factoring in maintainability, ease of debugging, etc.
My gut reaction is 3 Variables
QueryBase: String. Hardcoded. SELECT * FROM MyTable except of course I'd enumerate my columns
Query: String. EvaluateAsExpression = True Expression: #[User::QueryBase] + #[User::QueryFilter]
QueryFilter: String
So, we use Query in the OLE DB Source, much as you have your longer variable name in there. The only downside to this approach, pre-SSIS 2012, is the limitation on string length in an expression - it was 4000 characters, I believe. If you assign a value of 5k characters to a variable, that's fine; it's just that in the expression language, the result of concatenating two strings can't exceed that limit.
I didn't specify what QueryFilter is going to have in it or the magic to get it there. That, I would base on the bigger picture of your environment, usage, etc. but the general concept is that it will eventually turn into WHERE Condition1 IS NOT NULL but maybe in a full reload situation, it becomes an empty string.
So, what are our options for changing the value of QueryFilter
/SET is an optional parameter passed to the invoking process (dtexec.exe) that makes SSIS packages go. If you have a very limited set of choices and aren't interested in building additional infrastructure out to support the parameters, just hard code some examples. Approximately dtexec /file p1.dtsx /set \Package.Variables[User::QueryFilter].Properties[Value];" WHERE Condition1 IS NOT NULL" Save it into .bat files, different sql agent jobs, whatever. Click and run and you're done.
Configuration approach. SSIS offers native ability to use configurations from a SQL Server table, XML, Registry, Parent Package and Environment Variable for 2005 to current edition. The only downside to this approach is that it would not support concurrent execution with different parameters like the first would.
Environment approach. 2012 and 2014, with their new Project Deployment Model, give us the concept of Environments within the SSISDB catalog, which is similar to configuration with a SQL Server table but is done after development is complete and the packages are deployed. It's rather nice, as it builds out a history of values used, so if someone asks why the data is all wrong, you can write a query to pull back the parameters used and - oh look, someone used the initial-load filter instead of the daily one. Whoopsidaisy. Same concern over concurrent execution and changing values.
Table driven approach. Instead of using the Configuration with SQL Server table backing, you roll your own table and then add into your package an Execute SQL Task to retrieve the filter, Single Row, into our QueryFilter Variable.
Script Task. Use whatever floats your boat to determine what the filter should be.
Message Queue. There's a built-in Message Queue Task that might be of use here if you're already using MSMQ. Otherwise, it's too much effort to manage.
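The table-driven approach above, for example, can be sketched like this (table, column, and package names are made up for illustration):

```sql
-- One row per package whose filter you want to drive externally.
CREATE TABLE dbo.PackageFilter (
    PackageName SYSNAME        NOT NULL PRIMARY KEY,
    QueryFilter NVARCHAR(4000) NOT NULL
);

INSERT INTO dbo.PackageFilter (PackageName, QueryFilter)
VALUES (N'MyPackage', N' WHERE Condition1 IS NOT NULL');

-- Query for the Execute SQL Task (ResultSet = Single row),
-- with the one output column mapped to the User::QueryFilter variable:
SELECT QueryFilter
FROM dbo.PackageFilter
WHERE PackageName = N'MyPackage';
```

Changing the filter then becomes an UPDATE against this table, with no package redeployment needed.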
I have a stored procedure that takes a table name and writes out a series of INSERT statements, one for each row in the table. It's being used to provide sample, "real world" data for our test environment.
It works well, but some of these sample rowsets are 10-20k records. The stored proc writes them out using the PRINT statement, and it's hard to copy that many lines and paste them into Management Studio to run them. Is there a SQL redirect feature I might be able to use - perhaps to write this output to a table, with a way to loop through and run each statement that way? Just a thought.
I'd like to do this all from within the management studio and not have to write a C# program to create a dataset and loop over it, etc. I'm essentially looking for suggestions on a good approach. Thanks very much.
Use EXEC:
PRINT @INSERT_statement;
EXEC (@INSERT_statement);
...to run the query.
But I'd recommend looking at bulk insertion to make the data load faster:
BULK INSERT
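For example, if the generated rows were written out to a flat file instead of PRINTed, a single BULK INSERT loads them far faster than tens of thousands of individual INSERT statements (the target table, file path, and options here are illustrative):

```sql
BULK INSERT dbo.SampleData
FROM 'C:\exports\sample_data.csv'
WITH (
    FIELDTERMINATOR = ',',   -- column delimiter in the file
    ROWTERMINATOR   = '\n',  -- one row per line
    FIRSTROW        = 2,     -- skip the header row
    TABLOCK                  -- allows minimal logging where the recovery model permits
);
```

One bulk-logged operation replaces the per-row INSERT overhead entirely.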
Where is your stored procedure getting this data?
You may want to look into importing it as a table and then running your stored procedure against that imported table. SQL Server Management Studio has many options for importing data.
If your stored proc is generating the data, then that's a whole other issue.
Say I have a stored procedure, ProcA. It calls ProcB some of the time, ProcC all of the time and ProcD if it hits an error. ProcB calls ProcE every time. ProcC calls ProcF if it hits an error.
How could I automate getting the text of ProcA along with the text of all the procs that it calls, recursing all the way down the tree (A down to F)? This would help a lot with finding errors in complex SQL procs.
My first thought here is: get the text of ProcA, regex through it to find any calls to other procs, wash, rinse, repeat, and at the end spit out text (to a file or the UI) that looks like this:
ProcA definition
...
ProcB definition
...
...
ProcF definition
But I'm open to suggestion, perhaps there's an easier way. If there's a tool that does this already lemme know. I have no idea what to put into google on this one.
In SQL Server you have sys.sysdepends and sys.syscomments (both deprecated in newer versions in favor of sys.sql_expression_dependencies and sys.sql_modules).
INFORMATION_SCHEMA.ROUTINES can be useful in the general SQL case, but it has a limit to the size of the text returned on SQL Server, so I avoid it.
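On SQL Server 2008 and later, one possible sketch of the recursive walk uses sys.sql_expression_dependencies together with OBJECT_DEFINITION (the starting proc name is illustrative, and the depth cap is a simple guard against cyclic references):

```sql
-- Walk the call tree starting from ProcA, then return each proc's definition.
WITH DependencyTree AS (
    SELECT OBJECT_ID(N'dbo.ProcA') AS object_id, 0 AS depth
    UNION ALL
    SELECT d.referenced_id, t.depth + 1
    FROM DependencyTree AS t
    JOIN sys.sql_expression_dependencies AS d
        ON d.referencing_id = t.object_id
    WHERE d.referenced_id IS NOT NULL
      AND t.depth < 10                      -- guard against cycles
)
SELECT DISTINCT
    OBJECT_NAME(t.object_id)       AS proc_name,
    OBJECT_DEFINITION(t.object_id) AS definition
FROM DependencyTree AS t
JOIN sys.objects AS o
    ON o.object_id = t.object_id
WHERE o.type = 'P';                          -- procedures only
```

Concatenating the `definition` column in dependency order gives roughly the ProcA-through-ProcF listing described in the question, without any regex over the source text.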
This is highly database-dependent, because the best way (if possible) would be to query the database's internal tables.
If you're using MS SQL: in SQL 2000 Enterprise Manager, right-click on your stored procedure > All Tasks > Display Dependencies. In SQL 2005 Management Studio, right-click on your stored procedure > View Dependencies.
It won't show you any code for the other objects that this proc is dependent on, but it will list the objects, which you can then "wash, rinse, repeat".