SSIS 2008 with Stored Procedures. Transaction best practices - sql

I'm currently using SSIS for some process flow, scripting, and straight data import. Most of the data cleaning and transformation happens within stored procedures that I'm calling from SSIS Execute SQL Tasks. For most of the sprocs, if one fails for any reason, I don't really care about rolling back any transactions. My SSIS error handling essentially wipes out any staging data and then logs the errors to a table. (A human needs to fix the underlying data issue at that point.)
My question revolves around BEGIN TRAN / COMMIT TRAN. Are there any cases where a stored proc can fail and then not let the calling SSIS process know? I'm thinking of hardware failures, lock timeouts, etc.
I'd prefer to avoid using transactions as much as possible and rely on my SSIS error handling.
Thoughts?

One case I can think of (and transactions won't help there either) is the stored proc not updating or inserting any records. That would not be a failure, but it might need to be treated as one by the SSIS package. You might want to return how many rows were affected and check that afterwards.
We also do this for some imports where a count significantly off from the last import indicates a data problem. So if we usually get 100,000 records from client A in import B and we get 5,000 instead, the SSIS package fails until a human can look at it and see if the file is bad or if they genuinely did mean to reduce their work force or customer list.
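A minimal sketch of that row-count check, assuming a hypothetical cleaning proc and staging table (the names are made up):

    CREATE PROCEDURE dbo.usp_CleanStagingCustomers   -- hypothetical proc name
        @RowsAffected INT OUTPUT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- the actual cleaning/transform work
        UPDATE stg
        SET    CustomerName = LTRIM(RTRIM(CustomerName))
        FROM   dbo.StagingCustomer AS stg;           -- hypothetical staging table

        -- hand the count back to the calling Execute SQL Task
        SET @RowsAffected = @@ROWCOUNT;
    END;

In the Execute SQL Task, map @RowsAffected to a package variable (parameter direction Output) and fail the package via a precedence constraint expression when the value is zero or far outside the volume you normally see from that client.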
Incidentally, we stage to two tables (one with the raw unchanged data and one we use for cleaning). A failure of the SSIS package should not roll those back if you want to easily see what the data issue was. You can then tell whether the data was wrong from the start or whether it somehow got lost or fixed incorrectly in the cleaning process. Sometimes the place where the error got logged is not the place where the error actually occurred, and it is nice to see what the data looked like both unchanged and after the change process. Sometimes you have bad data, yes (OK, the majority of the time), but sometimes you have a bug. Having both tables enables you to quickly see which of the two it is.
If you are concerned that some failed executions are not bubbling back up to the package, you could have all your procs insert into a logging table as the last step and make sure that record is there before executing the next step.
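A hedged sketch of that kind of heartbeat logging (the table and columns are invented):

    -- Invented execution-log table
    CREATE TABLE dbo.ProcExecutionLog (
        LogID      INT IDENTITY(1,1) PRIMARY KEY,
        ProcName   SYSNAME  NOT NULL,
        RunDate    DATETIME NOT NULL DEFAULT GETDATE(),
        RowsLoaded INT      NULL
    );

    -- Last statement of each stored procedure:
    INSERT INTO dbo.ProcExecutionLog (ProcName, RowsLoaded)
    VALUES (OBJECT_NAME(@@PROCID), @RowsAffected);   -- count captured earlier in the proc

The package can then run a quick Execute SQL Task that confirms a fresh row exists for each proc before moving on to the next step.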

Is it enough to test a stored-procedure safely just by running it in a transaction?

I have a stored procedure called MoveSomeItems which gets some rows from tableA in the Foo database and moves them to tableA in the Bar database.
I want to test whether this sp really moves the items.
Is it enough to run this sp in a transaction and select the rows to see if they were moved, or should I approach it in a different way?
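For reference, the approach being asked about would look roughly like this sketch (assuming both databases live on the same instance):

    BEGIN TRANSACTION;

    EXEC dbo.MoveSomeItems;            -- the proc under test

    -- inspect both sides to see whether the rows actually moved
    SELECT * FROM Foo.dbo.tableA;
    SELECT * FROM Bar.dbo.tableA;

    ROLLBACK TRANSACTION;              -- undo everything once satisfied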
This depends on the impact of it all going wrong. What would the impact of having incorrect data in the destination table be: will it kill someone, simply annoy them, or is it unlikely anyone will notice? Will it be easy to fix?
There are risks associated with the approach you have given. For instance:
If the database is very busy, it is possible to cause excessive locking or even a deadlock with a transaction that may cause other transactions to fail. Setting the TRANSACTION ISOLATION LEVEL to READ UNCOMMITTED and the DEADLOCK PRIORITY to LOW will help to minimise this but not eliminate it entirely.
There is the possibility that other transactions may be running in READ UNCOMMITTED isolation mode, in which case they will see the results of your inserts temporarily, until the rollback is issued.
It is worth noting that if the procedure you are testing calls COMMIT TRANSACTION inside it you might not get the result you want when you call the ROLLBACK.
You might cause the database or its log to run out of disk space.
You might use up all the available CPU, Memory, Disk IO, Network or some other capacity limit.
Finally, I suspect this is not a complete list. The point I’m trying to make is that it could go wrong in strange ways.
If you have a personal development database that is fully backed up, then you wouldn't even need the transaction; simply do a restore after the event. The transaction may well save you some time though. This is the safest solution.
If you are using a shared development database your approach might be acceptable enough, but I would still do a backup just in case, especially if you are already on bad terms with the team.
If you are using a live database it may still be acceptable if the system as a whole is not that critical and can sustain some downtime while you repair things. Again do a backup.
If the database you are looking at is controlling a process that is safety critical or some other mission-critical function, don't even go there; you may lose the no-claims bonus on your liability insurance, or worse. In this instance it is best to restore a backup onto a test server and test there, thus creating my first scenario. But be warned: there are lots of issues that have to be considered when doing this. For instance, it may be illegal to use personal information in a test system. Also, there may be dependencies on other systems that will need to be mocked out to ensure you don't affect them; for example, don't connect a test system to a live email server.
If I have a complex stored proc that I want to be able to test and roll back, I add an input parameter (always as the last parameter), @debug, with a default value of 0 (so you don't need to specify it when you are running on prod).
Then I write code at the end that tests whether the parameter = 1 and, if so, runs whatever SELECT queries show me the data I want to see, then sends the program to the CATCH block using RAISERROR (never write multiple transactions without a TRY...CATCH block) and has it roll back.
This way you can easily check your results on dev and automatically roll back.
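A minimal sketch of that @debug pattern (the proc, table, and column names are made up):

    CREATE PROCEDURE dbo.usp_ComplexUpdate
        @SomeKey INT,
        @debug   BIT = 0                 -- always last; defaults off so prod calls need no change
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRY
            BEGIN TRANSACTION;

            UPDATE dbo.SomeTable         -- hypothetical work
            SET    SomeColumn = SomeColumn + 1
            WHERE  SomeKey = @SomeKey;

            IF @debug = 1
            BEGIN
                -- show whatever results you want to inspect
                SELECT * FROM dbo.SomeTable WHERE SomeKey = @SomeKey;
                -- jump to the CATCH block so the work is rolled back
                RAISERROR ('Debug run - rolling back', 16, 1);
            END;

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0
                ROLLBACK TRANSACTION;

            IF @debug = 0
            BEGIN
                DECLARE @msg NVARCHAR(4000);
                SET @msg = ERROR_MESSAGE();
                RAISERROR (@msg, 16, 1);  -- re-raise real errors when not debugging
            END;
        END CATCH;
    END;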

How to handle errors in a trigger?

I'm writing some SQL code that needs to be executed when rows are inserted in a database table, so I'm using an AFTER INSERT trigger; the code is quite complex, thus there could still be some bugs around.
I've discovered that, if an error happens when executing a trigger, SQL Server aborts the batch and/or the whole transaction. This is not acceptable for me, because it causes problems for the main application that uses the database; I also don't have the source code for that application, so I can't perform proper debugging on it. I absolutely need all database actions to succeed, even if my trigger fails.
How can I code my trigger so that, should an error happen, SQL Server will not abort the INSERT action?
Additionally, how can I perform proper error handling so that I can actually know the trigger has failed? Sending an email with the error data would be ok for me (the trigger's main purpose is actually sending emails), but how do I detect an error condition in a trigger and react to it?
Edit:
Thanks for the tips about optimizing performance by using something else than a trigger, but this code is not "complex" in the sense that it's long-running or performance intensive; it simply builds and sends a mail message, but in order to do so, it must retrieve data from various linked tables, and since I am reverse-engineering this application, I don't have the database schema available and am still trying to find my way around it; this is why conversion errors or unexpected/null values can still creep up, crashing the trigger execution.
Also, as stated above, I absolutely can't perform debugging on the application itself, nor modify it to do what I need in the application layer; the only way to react to an application event is by firing a database trigger when the application writes to the DB that something has just happened.
If the operations in the trigger are complex and/or potentially long running, and you don't want the activity to affect the original transaction, then you need to find a way to decouple the activity.
One way might be to use Service Broker. In the trigger, just create message(s) (one per row) and send them on their way, then do the rest of the processing in the service.
If that seems too complex, the older way to do it is to insert the rows needing processing into a work/queue table, and then have a job continuously pulling rows from there and doing the work.
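A sketch of the work/queue table variant, with invented names (the source table here is just a stand-in for whatever the application writes to):

    -- Queue table the trigger writes to
    CREATE TABLE dbo.MailWorkQueue (
        QueueID     INT IDENTITY(1,1) PRIMARY KEY,
        SourceID    INT      NOT NULL,            -- key of the inserted row
        QueuedAt    DATETIME NOT NULL DEFAULT GETDATE(),
        ProcessedAt DATETIME NULL
    );
    GO

    -- The trigger only records that work exists; it does no heavy lifting
    CREATE TRIGGER trg_Orders_QueueMail ON dbo.Orders   -- hypothetical application table
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.MailWorkQueue (SourceID)
        SELECT OrderID FROM inserted;
    END;

A SQL Agent job then polls MailWorkQueue, builds and sends the mails, and stamps ProcessedAt; if that job hits a bug, the application's INSERT has long since committed.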
Either way, you're now not preventing the original transaction from committing.
Triggers are part of the transaction. You could do a try/catch/swallow around the trigger code, or the somewhat more professional try/catch/log/swallow, but really you should let it go bang and then fix the real problem, which can only be in your trigger.
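If you do go the swallow-and-log route, the shape inside the trigger would be something like this sketch (the error-log table and mail details are invented, and it assumes Database Mail is already configured):

    CREATE TRIGGER trg_Orders_SendMail ON dbo.Orders   -- hypothetical table
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRY
            -- ...gather data from the linked tables and build the message here...
            EXEC msdb.dbo.sp_send_dbmail
                 @profile_name = 'MailProfile',        -- assumed Database Mail profile
                 @recipients   = 'someone@example.com',
                 @subject      = 'Row inserted';
        END TRY
        BEGIN CATCH
            -- log and swallow so the original INSERT is not aborted
            INSERT INTO dbo.TriggerErrorLog (ErrorNumber, ErrorMessage, LoggedAt)   -- invented log table
            VALUES (ERROR_NUMBER(), ERROR_MESSAGE(), GETDATE());
        END CATCH;
    END;

Be aware that some errors doom the surrounding transaction anyway (XACT_STATE() returns -1), and in that state even the CATCH block cannot save the outer INSERT; that is exactly the "let it go bang" case above.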
If none of the above are acceptable, then you can't use a trigger.

Sql Server 2005 - SSIS statistics per component per run

Coming from a different ETL tool, I'm trying to figure out how to get (production) statistics on each component as it runs in SSIS.
For example, if the flat file is reading from an external source that has a high deviation (the rows/sec changes drastically at different times), I would like to know that information.
If an SSIS package has a significant 'slow point' (a buffer filling up / the data stream being impacted), I would also like to know that.
And for the sprocs, CPU time and read IO/write IO from the DMVs would also be ideal (and useful for people demonstrating improvement when moving from sprocs to SSIS in a consistent/measurable way).
The reason I'm asking this question is I see the rows going through BIDS during debugging, but it may not reflect the actual rows/sec on each component in production.
How would one enable/introspect/obtain these kinds of statistics in a production environment (even if it takes a small performance hit, the numbers are a big deal)?
Thanks!
-Darren
This is difficult to do in SSIS 2005. I have seen the runtime engine "just stop" when trying to perform task-level logging from event handlers in complex SSIS packages. One thought: instrument the Data Flows only, by adding Row Count transformations just after the Source Adapters and on each Data Flow Path that outputs rows. Then add an Execute SQL Task to each Data Flow Task's OnPreExecute event handler to log the start of execution, and another Execute SQL Task to the corresponding OnPostExecute event handler. In the OnPostExecute logic, store the row counts and the end time of the Data Flow Task execution. I believe that will provide enough metrics to calculate throughput for the data flow pipeline.
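A possible shape for the logging side of that approach (the table and columns are only illustrative):

    -- Populated by the OnPreExecute / OnPostExecute Execute SQL Tasks
    CREATE TABLE dbo.DataFlowRunLog (
        RunLogID    INT IDENTITY(1,1) PRIMARY KEY,
        PackageName NVARCHAR(200) NOT NULL,
        TaskName    NVARCHAR(200) NOT NULL,
        StartTime   DATETIME NULL,     -- written by OnPreExecute
        EndTime     DATETIME NULL,     -- written by OnPostExecute
        RowsRead    INT      NULL      -- from the Row Count transformation's variable
    );

Throughput per run is then just RowsRead / DATEDIFF(second, StartTime, EndTime), which you can trend over time to spot the slow points in production.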
Hope this helps,
Andy
Not sure if it will help, but maybe you can try to configure logging on your package and select "SSIS log provider for SQL Server Profiler".
It shows several pieces of information between the beginning and the end of the data source processing.

Database Modification Scripts - Rollout / Rollback Best Practice?

I'm looking for insight into best practices regarding database scripting for modifications that go out alongside other code changes for a software system.
I used to work for a company that insisted that every rollout has a rollback ready in case of issues. This sounds sensible, but in my opinion, the rollback code for database modifications deployed via scripts has the same likelihood of failing as the rollout script.
For managed code, version control makes this quite simple, but for a database schema, rolling back changes is not so easy - especially if data is changed as part of the roll out.
My current practice is to test the rollout code by running it against a test database during late-stage development, and then running the application against that test database. Following this, I back up the live DB and proceed with the rollout.
I have yet to run into a problem, but am wondering how other shops manage database changes, and what the strategy is for recovering from any bugs.
All of our database scripts go through several test phases against databases that are like our live database. This way we can be fairly certain that the modification scripts will work as expected.
For rolling back, everything programmatic (stored procedures, views, functions, triggers) is easy to roll back: just apply the previous version of the object.
Like you mentioned, the challenging part comes when updating / deleting records from tables, or even adding new columns to tables. And you're right that in this case the rollback can be just as likely to fail.
What we do, if we have a change that can't be easily rolled back but touches a sensitive/critical area, is maintain a set of rollback scripts that also go through the same testing environments. We run the update script, validate that it works as expected, then run the rollback script and validate that the system behaves as it did prior to the modification.
Another thing that we do as just a precaution is to create a database snapshot (SQL Server 2005) prior to an update. That way if there are any unexpected issues, we can use the snapshot to recover any data that was potentially lost during the update.
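Creating the snapshot is a short script; the names and path here are placeholders, and it requires an edition that supports snapshots (Enterprise/Developer):

    CREATE DATABASE MyDb_PreRelease_Snapshot
    ON ( NAME = MyDb_Data,                               -- logical data file name of the source DB
         FILENAME = 'D:\Snapshots\MyDb_PreRelease.ss' )  -- placeholder path
    AS SNAPSHOT OF MyDb;

    -- If the rollout goes badly wrong, the whole database can be reverted:
    -- RESTORE DATABASE MyDb FROM DATABASE_SNAPSHOT = 'MyDb_PreRelease_Snapshot';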
So the safest course of action is to test against databases that are as close to your live system as possible, and to test your rollback scripts as well... and, just in case both of those fail, have a snapshot ready.
SQL Diff (or something like it) is always helpful if you are using a test database. It has a lot of checks and balances, safeguards, and ways of restoring or rolling back if there is an issue. Very useful.
http://www.apexsql.com/sql_tools_diff.aspx

Recover from SQL batch-abort errors inside a transaction? Alternative?

I'm looking for a way to continue execution of a transaction despite errors while inserting low-priority data. It seems like real nested transactions could be a solution, but they aren't supported by SQL Server 2005/2008. Another solution would be to have logic to decide whether an error is critical or not, but it would seem that's not possible either.
Here's more detail on my scenario:
Data is periodically inserted into the database using ADO.NET/C#, and while some of it is vital, some could also be missing without problems. When the inserts are done, some computations are made on the data (both vital and non-vital). This whole process is inside a transaction so everything remains in sync.
Currently, transaction save points are used, and partial rollbacks are made on exceptions which occur during non-vital inserts. However, this doesn't work for "batch-abort" errors, which automatically roll back the entire transaction. I understand some errors are critical, but things like failed casts are considered by SQL Server to be batch-abort errors. (Info on batch errors) I'm trying to prevent these errors from bringing down the whole insert when they occur on low-priority data.
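For context, the save-point pattern in play looks roughly like this sketch (table and names invented); the trouble is that once an error dooms the transaction, XACT_STATE() returns -1 and only a full rollback is permitted, so the save point cannot help:

    BEGIN TRANSACTION;

    -- ...vital inserts here...

    SAVE TRANSACTION NonVitalInsert;
    BEGIN TRY
        -- low-priority insert that is allowed to fail
        INSERT INTO dbo.OptionalData (SomeColumn) VALUES ('illustrative value');
    END TRY
    BEGIN CATCH
        IF XACT_STATE() = 1
            ROLLBACK TRANSACTION NonVitalInsert;   -- partial rollback; the outer transaction survives
        ELSE
        BEGIN
            ROLLBACK TRANSACTION;                  -- doomed transaction: everything is lost
            RAISERROR ('Transaction doomed by a batch-aborting error', 16, 1);
            RETURN;                                -- cannot reach COMMIT in this state
        END;
    END CATCH;

    -- ...computations on the vital (and surviving non-vital) data...
    COMMIT TRANSACTION;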
If what I'm describing isn't possible, I'm willing to consider any alternative way to achieve data integrity but allow the failure of the non-vital inserts.
Thanks for your help.
Unfortunately, this can't be done as you describe (full support for nested transactions would be key here). A couple of things I can think of that have been used to get around this in the past:
The best option would probably be to separate the commands into important/non-important groups that could be executed distinctly; naturally this would require that they not be order-dependent on each other.
Could also use a messaging based approach (see Service Broker) where you would execute the primary commands inline and push the non-primary commands onto a queue for execution later/separately. The push to the queue would be transactional within the batch, but the execution of the command when you pop off the queue would be separate. This too would require they not be order-dependent on each other.
If order-dependent, you could use the messaging approach for everything, which would ensure order and could have separate messages per operation, then grouping them together (via conversation groups) would allow you to pull them off the queue in order as well and use separate transactions for each 'type' of operation (i.e. primary vs. non-primary). This would require some special coding on your part if all the grouped messages must be a single autonomous operation, but could be done.
I hesitate to even mention this option, because it is a terrible option, but for full disclosure I suppose you could consider it at your discretion if you think it fits (but it is definitely not an architecture that would apply to almost any scenario). You could use xp_cmdshell to call out to the command line and execute sqlcmd/osql for the non-critical tasks - this sqlcmd execution would be in a separate transaction from the module you are executing from, and simply ignoring the xp_cmdshell failure should allow the primary batch to continue.
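Purely to illustrate that last (discouraged) option, it would be something along these lines; the server, database, and proc names are placeholders:

    -- Runs on its own connection, so in its own transaction;
    -- a failure here does not touch the calling batch's transaction
    DECLARE @rc INT;
    EXEC @rc = master..xp_cmdshell
         'sqlcmd -S MyServer -d MyDatabase -E -b -Q "EXEC dbo.usp_NonCriticalInsert"';
    -- @rc <> 0 means the non-critical work failed; log it or ignore it as appropriate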
Those are some ideas...
Could you do your import into a temporary location, using transactions only for the important parts? Once the temp location is loaded, having absorbed any non-critical errors, you can copy the data into its final destination in a single transaction. It depends on the nature of the work you are doing, but it's potentially a viable option.
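As a rough sketch of that idea (tables invented): load the staging table outside any long-running transaction, absorbing non-critical failures, then promote in one short transaction:

    -- Phase 1: load staging, tolerating failures on the non-vital rows (no big transaction here)
    INSERT INTO dbo.ImportStaging (Col1, Col2) VALUES ('a', 'b');   -- illustrative load

    -- Phase 2: promote whatever made it into staging, atomically
    BEGIN TRANSACTION;

        INSERT INTO dbo.FinalTable (Col1, Col2)
        SELECT Col1, Col2 FROM dbo.ImportStaging;

        -- ...run the computations on the promoted data here...

    COMMIT TRANSACTION;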