SSIS SQL Step Seems to Fail

I have an SSIS package (ss2k12) I'm working on that begins with a SQL task which checks whether a table exists, creates it if it doesn't, and then truncates it. The table is a work table for a Data Flow task that follows.
When I run the task by itself, it works. When I run the package (after deleting the table...), it fails looking for the missing table (which the SQL task creates if it's missing...). Is this because it's "pre-checking" the Data Flow task? How do I get around the issue?

When a package receives the signal to start, the SSIS engine looks at every component and verifies that it exists, that the metadata signatures match, etc. Then, when a component gets the signal that it can run, the metadata is rechecked prior to execution.
To get around this issue, set the DelayValidation property to True to indicate that validation should only occur when the component is ready to execute.
Depending on how your package is structured, you might need to set this at both the task (Data Flow) and the package (control flow) level.
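For context, the Execute SQL Task in this pattern runs a statement along these lines (the table name and columns below are hypothetical placeholders). Since the work table does not exist until this statement runs, the Data Flow's up-front validation is exactly what fails:

-- Create the work table only if it is missing, then empty it.
-- dbo.WorkTable and its columns are placeholders for illustration.
IF OBJECT_ID(N'dbo.WorkTable', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.WorkTable
    (
        Id      INT           NOT NULL,
        Payload NVARCHAR(255) NULL
    );
END;

TRUNCATE TABLE dbo.WorkTable;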

How to capture the data flow component name dynamically while an SSIS package is executing?

I would like to fetch the data flow component name while the package is executing, since the system variable [System::SourceName] only fetches the names of control flow tasks.
Is there a way to capture them using a script component?
I want to log the data flow components and keep a record of how many rows have been affected during the package run. I know Row Count gives the number of rows being processed, but I also have to capture the name of the data flow component. I am using SQL Server 2012.
I would enable the built-in Logging feature, and direct the output to the msdb database. Then I would query the sysssislog table.
Then I would enjoy a peaceful cup of tea while I contemplated how much time I had saved vs. reinventing the wheel with a custom solution. I might even stare out the window for a while...
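For what it's worth, once the Pipeline events are included in the logging configuration (OnPipelineRowsSent is not selected by default), a query along these lines pulls the component names and row counts out of msdb, assuming the log provider writes to the default dbo.sysssislog table:

-- OnPipelineRowsSent entries identify the data flow component/path
-- and the number of rows sent down it during execution.
SELECT  source,      -- the task that logged the event
        message,     -- contains the component name and rows sent
        starttime
FROM    msdb.dbo.sysssislog
WHERE   event = N'OnPipelineRowsSent'
ORDER BY starttime;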

SSIS Clear configuration variable values after package run / before saving

I have some SSIS packages that connect to an Oracle database. The connection parameters are stored in a SQL database and retrieved by using the Package Configuration tool.
My problem is that the variable that gets populated automatically by SSIS with the configuration string does not get emptied after the package is run. As a result, the value of the variable gets saved in the source code when the package is saved. I DO NOT want to have this variable value saved in my source files.
Any idea on how to prevent this from happening?
Thanks!
You can try setting the variable in a package configuration. The way you do this is simple.
First go to the topmost layer of your package, right-click on the empty space, and select Package Configurations. Choose to add one, give a location and name for the file, and then click Next.
Once you've done that, choose the variable you want and set its value.
Now you're not storing the actual value in the package. Just the information for how to find it.
EDIT: I may not have been clear on this. This process creates a completely separate file that the package looks to for that value. This way you don't have to store the expression or the value in the package itself; it just knows at run time to go look in that config file for any additional data.
EDIT 2: The package configuration will only be overwritten when you execute the package in BIDS/Visual Studio. The reason this happens is that the package evaluates and then saves prior to run time. This does not happen when you are using SQL Agent to run a package, so the value or the expression will not be stored in the source code. I hope I have clarified that for you.
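For reference, when configurations are stored in SQL Server rather than a file (as in the question), the Package Configuration wizard creates a table like the sketch below; the connection string then lives in ConfiguredValue, outside the package source. The layout is the wizard's default, and the row values are placeholders:

-- Default table created by the SQL Server package configuration wizard.
-- ConfiguredValue is pushed into the mapped property at load time.
CREATE TABLE [dbo].[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue     NVARCHAR(255) NULL,
    PackagePath         NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20)  NOT NULL
);

-- Hypothetical row for an Oracle connection string:
INSERT INTO [dbo].[SSIS Configurations]
    (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
VALUES
    (N'OracleConn',
     N'Data Source=ORCL;User Id=scott;Password=secret;',
     N'\Package.Variables[User::OracleConnStr].Properties[Value]',
     N'String');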

Check for multiple files

Okay, I'll try to explain as well as I can... quite a particular case.
Tools: SSIS 2008
We have a control flow that now needs to be triggered by an event: the presence of one or more files (1, 2, or 3).
The variables used:
BO_FileLocation_1
BO_FileLocation_2
BO_FileLocation_3
BO_FileName_1
BO_FileName_2
BO_FileName_3
There can be one, two, or three files, as defined in the above variables. When they are filled in, the files should be processed. When they are empty, that means there's just one file; the process should ignore them and jump to the next (file watcher?) task.
For example:
BO_FileLocation_1 = "C:\"
BO_FileLocation_2 = NULL
BO_FileLocation_3 = NULL
BO_FileName_1 = "test.csv"
BO_FileName_2 = NULL
BO_FileName_3 = NULL
The report only needs one file.
I need a generic concept that checks for the presence of these files; it may have to be more generic than my SSIS knowledge can handle right now. It would be handy if, for example, a fourth file is added in the future. I was also thinking of working with a single script to handle all the logic.
Thanks in advance
If all you want is to trigger the Copy Source File task when one or more of the files is present, just use the OR constraint in your flow:
First, connect all the tasks to the destination.
Then click one of the green arrows; this will make its properties window pop up. Select Logical OR instead of Logical AND.
If everything went well, you should now see the connections as dashed lines.
There are several possible solutions:
1. Create a sequence container and include all the file imports in the sequence container. Add int variables for RowCountFile1, RowCountFile2, and RowCountFile3 and set the value to 0 (this is the default value when you create an int variable). Add a Row Count transformation to each of the data flows. Create a precedence constraint from the sequence container to the "Do something" task. Set the precedence constraint to Success and Expression, and set the expression value to @RowCountFile1 > 0 || @RowCountFile2 > 0 || @RowCountFile3 > 0. The advantage of this approach is that you can take an action as soon as the files are detected, you import all available files, and you only take the action after all the files have been imported. You could then schedule this SSIS package as a SQL Server Agent job step and run it as frequently as you want.
2. A variant on solution 1 is to use Foreach Loop containers with a file enumerator inside the sequence container. This would be useful if you don't know the exact name of the file and you expect to import more than one under some circumstances. For instance, if you get a file every few minutes with a timestamp in its file name and your process doesn't run for some reason, then you may have to process multiple files to get caught up and then take an action once that has been done.
3. You could use the File Watcher task as you outlined in your question. The only problem I have with the File Watcher task is that the package has to be in a constantly running state. This makes it hard to troubleshoot problems and performance. It can also introduce other problems; I remember having some trouble with the File Watcher task years ago when it first came out. It may well be a totally stable task now, but I prefer other methods after having been burned previously. If you really want the package to run continuously instead of having it be called by a job, you could always use a script task to check for the file, sleep the thread if it's not found, check again, and so on. I'm sure that's what the File Watcher task does, but I would trust my own C# over the task. Power to anyone who has had better experiences than me with File Watcher...
4. Use PowerShell. If you just want to take an action when a file appears and you aren't importing the data, then a PowerShell script could do this just as well as an SSIS package. The drawbacks are that you have to learn some basic PowerShell, it may be hard to maintain in the future since PowerShell is probably not your bread-and-butter language, and you may have to rewrite the code as an SSIS package if you later want to import the data. You would probably call the PowerShell script from a SQL Server Agent job step, so scheduling can be handled pretty easily.
There are more options than what I listed, so let me know if you still want more suggestions.

Is it possible to force an error in an Integration Services data flow to demonstrate its rollback?

I have been tasked with demoing how Integration Services handles an error during a data flow, to show that no data makes it into the destination. This is an existing package, and I want to limit the changes to it as much as possible (since this is most likely a one-time deal).
The scenario to be demonstrated is a "systemic" failure: the source file disappears midstream, the file server loses power, etc.
I know I can make this happen by setting the source's Error Output to Failure and introducing bad data, but I would like to do something lighter than that.
I suppose I could add a Script transform, look for a certain value, and throw an error, but I was hoping someone has come up with something easier / more elegant.
Thanks,
Matt
Mess up the file that you are trying to import by pasting in some bad data or saving it in another format, like UTF-8.
We always have a task at the end that closes the data flow in our metadata tables. To test errors, I simply remove the ? that is the variable for the stored proc it runs. It's easy to do, easy to put back the way it was, and it doesn't mess up anything data-wise, since our error trapping then closes the data flow with an error. You could do something similar by adding a task that calls a stored proc with an input variable but assigns no parameters to it, so it will fail. Then once the test is done, simply disable that task.
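A hedged sketch of that trick (the procedure name below is hypothetical; ? is the OLE DB parameter placeholder the Execute SQL Task maps to an SSIS variable):

-- Normal statement in the Execute SQL Task: ? is bound to an SSIS variable.
EXEC dbo.CloseDataFlowLog ?;

-- Sabotaged for the test: the required argument is gone, so the call
-- fails with an "expects parameter" error and the error path fires.
EXEC dbo.CloseDataFlowLog;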
Data will make it to the destination if the flow is not running as a transaction. If you want to prevent populating partial data, you have to use transactions. There is also an option to set the end result of a control flow item to "failed" irrespective of the actual result, but this is not available for data flow items. You will have to either produce an actual error in the data or code in a situation that will create an error. There is no other way...
Could we try the transaction-level property (TransactionOption) of the package?
On failure of the data flow, it will revert all the data from the target; only on a successful data flow will it commit the data to the target.
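This is not how SSIS implements it internally (TransactionOption enlists tasks in a distributed transaction via MSDTC), but the all-or-nothing semantics match this plain T-SQL sketch, with a hypothetical target table:

-- Illustration of the all-or-nothing behavior a transaction provides.
-- dbo.TargetTable is a hypothetical destination.
BEGIN TRANSACTION;
BEGIN TRY
    INSERT INTO dbo.TargetTable (Id, Payload)
    VALUES (1, N'row one'), (2, N'row two');

    -- Uncomment to simulate a midstream failure:
    -- RAISERROR(N'simulated failure', 16, 1);

    COMMIT TRANSACTION;    -- success: rows become visible
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;  -- failure: no partial data remains
    THROW;
END CATCH;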

SSIS Intermittent variable error: The system cannot find the file specified

Our SSIS packages are structured as one control package and many child packages (about 30) that are invoked from the control package. The child packages are invoked with the Execute Package Task; there is one Execute Package Task per child package. Each Execute Package Task uses a File Connection Manager to specify the path to the child package's dtsx file. There is one File Connection Manager per child package, and each one has an expression defined for its ConnectionString property. This expression looks like this:
@[Template::FolderPackages] + "MyPackage.dtsx"
The file name is different for each package. The variable (FolderPackages) is specified in the SSIS package configuration file.
The error that is generated at run time is:
Error 0x80070002 while loading package file "MyPackage.dtsx". The system cannot find the file specified.
The package that fails differs from run to run, and sometimes no packages fail at all. This is when running on exactly the same environment, data, etc.
I ran FileMon during this error and found that when the error happens, SSIS tries to read the dtsx file from the wrong place, namely from system32. I checked that this is identical to what would happen if the @[Template::FolderPackages] variable were empty, but because the very same variable is used for every child package and works for some while sometimes failing for others, I have no explanation for this.
Anything obvious, or time to raise a support call with Microsoft?
Are you using Expressions on the SSIS variables directly? Variables with Expressions are calculated each time the variable is referenced by the consuming object which needs to use it. That is where the race condition bug exists, because sometimes the expression doesn't get evaluated if another thread is already evaluating a different variable, and the default value for the variable is provided to the consumer object.
If that matches your design, these two bugs on the connect site discuss the problem, and the workarounds:
https://connect.microsoft.com/SQLServer/feedback/details/332372/ssis-variable-expressions-dont-always-evaluate
A second one is at connect.microsoft.com/SQLServer/feedback/details/406534/ssis-2008-variable-expressions-dont-always-evaluate
A summary of the workarounds:
- Note the parallel tasks that could run in your SSIS control flow and that use these expression variables. If you have two tasks side by side that each rely on the same variable, and that variable has an expression to set its value, then you could hit this.
- Manually sequentialize such tasks so that they don't run in parallel, i.e. add a green arrow on the control flow so that the tasks occur in order Task1, Task2, Task3, rather than side by side on parallel paths and rather than inside the same container with no paths.
- Avoid variable expressions: assign local variables in the required order using a home-made script task that does the same kind of work, so that variables are not evaluated using expressions (i.e. the thing which can hit this race condition). In other words, manually assign the variable values at a point in your control flow just before they are used. The point of using expressions on variables is to dynamically set a value based on another value whenever it is used, so this achieves a similar design goal but in a manual way.
- Reduce threads to minimize the potential: set the Data Flow task's EngineThreads to 1 and the package's MaxConcurrentExecutables to 1. This helps sequentialize execution of your package to one task at a time, but it has the side effect of slower performance.
- Create and set values on distinct copies of variables at different scope levels in the design, so that they evaluate in different parallel execution scopes and avoid the expression evaluation on parallel threads, e.g. Master::Var1, Child1::Var1, Child2::Var1.
A bit of a stab in the dark but...
I've had a similar issue with variables where ReadOnly = False and multiple components were reading the variable at the same time, causing locking issues.
I consistently recreated the problem by running a pair of data flows that did nothing but reference the variable inside a For Loop container; changing the variable to read-only resolved the problem.
If you temporarily hardcode the package name does this resolve the issue?
Turns out after sending trace info to Microsoft that we are encountering heap corruption. I'll update this question if we get to the bottom of it.
The current suggestion is to disable heap lookaside for dtexec.exe.
The official answer to this issue is that it is a bug in SQL 2005 and 2008. Many tasks accessing the same variable cause a race condition, and some tasks get the default value for the expression instead of the evaluated value.
The workaround is to ensure that the default value (the value defined in the property sheet for whatever property you are having trouble with) is a value that will work in your production environment.
This way, when the race condition happens in prod, SSIS will fall back to the package value, which will still work.
In dev? Well, you're just going to have to deal with that manually until we get a bug fix from Microsoft.
There is a KB article relating to this issue: http://support.microsoft.com/kb/2448991 which states when and where this was fixed.