SSIS Intermittent variable error: The system cannot find the file specified

Our SSIS packages are structured as one control package and many child packages (about 30) that are invoked from the control package. The child packages are invoked with the Execute Package Task, one Execute Package Task per child package. Each Execute Package Task uses a File Connection Manager to specify the path to the child package's dtsx file; there is one File Connection Manager per child package. Each File Connection Manager has an expression defined for its ConnectionString property, which looks like this:
@[Template::FolderPackages] + "MyPackage.dtsx"
The file name is different for each package. The variable (FolderPackages) is specified in the SSIS package configuration file.
The error that is generated during run time is
Error 0x80070002 while loading package file "MyPackage.dtsx". The system cannot find the file specified.
The package that fails differs from run to run, and sometimes no packages fail at all. This happens when running on exactly the same environment, data, etc.
I ran FileMon during this error and found that when the error happens, SSIS tries to read the dtsx file from the wrong place, namely from system32. I checked that this is identical to what would happen if the @[Template::FolderPackages] variable were empty, but because the very same variable is used for every child package and works for some while sometimes failing for others, I have no explanation for this.
Anything obvious, or time to raise a support call with Microsoft?

Are you using Expressions on the SSIS variables directly? Variables with Expressions are calculated each time the variable is referenced by the consuming object which needs to use it. That is where the race condition bug exists, because sometimes the expression doesn't get evaluated if another thread is already evaluating a different variable, and the default value for the variable is provided to the consumer object.
If that matches your design, these two bugs on the connect site discuss the problem, and the workarounds:
https://connect.microsoft.com/SQLServer/feedback/details/332372/ssis-variable-expressions-dont-always-evaluate
A second one at
connect.microsoft.com/SQLServer/feedback/details/406534/ssis-2008-variable-expressions-dont-always-evaluate
A summary of the workarounds:
- Note the parallel tasks that could run in your SSIS control flow and use these expression variables. If you have two tasks side by side and each relies upon the same variable, and that variable has an Expression to set its value, then you could hit this. Manually sequentialize such tasks so that they don't run in parallel, i.e. add a green arrow on the control flow so that the tasks occur in order Task1, Task2, Task3, rather than side by side on parallel paths, and rather than inside the same container with no paths.
- Avoid variable expressions: assign local variables in the required order using a home-made Script Task that does the same kind of work, so that variables are not evaluated using expressions (i.e. the thing which can hit this race condition). In other words, manually assign the variable values at a point in time in your control flow just before they are used (see the sketch after this list). The point of using expressions on variables is to dynamically set a value based on another value whenever it is used, so this achieves a similar design goal, just in a manual way.
- Reduce threads to minimize the potential: set the Data Flow task's EngineThreads to 1 and the package's MaxConcurrentExecutables to 1. This helps sequentialize execution of your package to one task at a time, but has the side effect of slower performance.
- Create and set values on distinct copies of variables at different scope levels in the design, so that they evaluate in different parallel execution scopes and avoid expression evaluation on parallel threads: Master::Var1, Child1::Var1, Child2::Var1.
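For illustration, a minimal sketch of the Script Task workaround, assuming the Template::FolderPackages variable from the question plus a hypothetical User::PackagePath variable that would otherwise carry the expression (the variable names and the ScriptResults enum from the SSIS script template are assumptions, not from the original posts):

// SSIS Script Task body: assign the value at a fixed point in the control
// flow instead of relying on an expression that is evaluated on demand.
// Template::FolderPackages must be listed under ReadOnlyVariables and
// User::PackagePath under ReadWriteVariables in the task's editor.
public void Main()
{
    string folder = Dts.Variables["Template::FolderPackages"].Value.ToString();
    Dts.Variables["User::PackagePath"].Value = folder + "MyPackage.dtsx";
    Dts.TaskResult = (int)ScriptResults.Success;
}

Place this task immediately before the Execute Package Task that consumes the value, connected with a precedence constraint. The File Connection Manager's ConnectionString expression can then reference @[User::PackagePath] directly; since the variable itself no longer carries an expression, the race condition described above is avoided.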

A bit of a stab in the dark but...
I've had a similar issue with variables where ReadOnly=False and multiple components were reading the variable at the same time, causing locking issues.
I consistently recreated the problem by running a pair of data flows that did nothing but reference the variable inside a For Loop container; changing the variable to read-only resolved the problem.
If you temporarily hardcode the package name does this resolve the issue?

Turns out after sending trace info to Microsoft that we are encountering heap corruption. I'll update this question if we get to the bottom of it.
The current suggestion is to disable heap lookaside for dtexec.exe.

The official answer to this issue is that it is a bug in SQL 2005 and 2008. Many tasks accessing the same variable cause a race condition, and some tasks get the default value for the expression instead of the evaluated value.
The workaround is to ensure that the default value (the value defined in the property sheet for whatever property you are having trouble with) is the value that will work in your production environment.
This way, when the race condition happens in prod, SSIS will fall back to the package value, which will still work.
In dev? Well you're just going to have to deal with that manually until we get a bug fix from Microsoft.

There is a KB article relating to this issue: http://support.microsoft.com/kb/2448991 which states when and where this was fixed.

Related

Accessing the attempt counter for change in job behaviour

I have a job that might fail with a specific configuration; what I want to do is let it run once and, if it fails, run a slightly different configuration.
I found the attempt parameter, but I can't find a way to access it outside the resources tag...
Do you know how to access it, or any alternative?
The attempt counter is passed as an argument (--attempt <int>) to the job script (in my case a wrapper script in Python), therefore you can access it, for example, with:

import sys

# the integer that follows '--attempt' on the command line
attempt = int(sys.argv[sys.argv.index('--attempt') + 1])
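For example, if the scheduler invokes the wrapper as jobscript.py --attempt 2 (an illustrative command line, not from the original post), the snippet above sets attempt to 2, and the wrapper can branch to the fallback configuration whenever attempt > 1.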

SSIS SQL Step Seems to fail

I have an SSIS package (SQL Server 2012) I'm working on which begins with an Execute SQL Task that checks whether a table exists, creates it if it doesn't, and then truncates it. The table is a work table for the Data Flow task that follows.
When I run the task on its own, it works. When I run the package (after deleting the table...), it fails looking for the missing table (which the SQL task creates if it's missing...). Is this because it's "pre-checking" the Data Flow task? How do I get around the issue?
When a package receives the signal to start, the SSIS engine looks at every component and verifies that it exists, that the metadata signatures match, etc. Then, when a component gets the signal that it can run, the metadata is rechecked prior to execution.
To get around this issue, you need to use the DelayValidation property to indicate that validation should only occur when it is ready to execute.
Depending on how your package is structured, you might need to set this at both the Task (Data Flow) as well as the Package (control flow) level.
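Normally you would just tick DelayValidation in the property sheet in the designer, but for illustration, here is a rough sketch using the SSIS managed API (the Microsoft.SqlServer.ManagedDTS assembly) that sets it at both levels; the package path is a placeholder:

using Microsoft.SqlServer.Dts.Runtime;

class SetDelayValidation
{
    static void Main()
    {
        var app = new Application();
        // Load the package, flip DelayValidation at the package (control flow)
        // level and on every task host (e.g. the Data Flow task), then save.
        Package pkg = app.LoadPackage(@"C:\packages\MyPackage.dtsx", null);
        pkg.DelayValidation = true;
        foreach (Executable exe in pkg.Executables)
        {
            if (exe is TaskHost taskHost)
                taskHost.DelayValidation = true;
        }
        app.SaveToXml(@"C:\packages\MyPackage.dtsx", pkg, null);
    }
}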

Check for multiple files

Okay, I'll try to explain as well as I can... Quite a particular case.
Tools: SSIS 2008
We have a control flow that now needs to be triggered by an event: the presence of one or multiple files (1, 2, or 3).
The variables used:
BO_FileLocation_1
BO_FileLocation_2
BO_FileLocation_3
BO_FileName_1
BO_FileName_2
BO_FileName_3
There can be one, two, or three files, defined in the above variables. When they are filled in, they should be processed. When they are empty, this means there's just one file; the process should ignore them and jump to the next (file watcher?) task.
For example:
BO_FileLocation_1 = "C:\"
BO_FileLocation_2 = NULL
BO_FileLocation_3 = NULL
BO_FileName_1 = "test.csv"
BO_FileName_2 = NULL
BO_FileName_3 = NULL
The report only needs one file.
I need a generic concept that checks for the presence of these files; it may have to be more generic than my SSIS knowledge can handle right now, so that it stays handy when, for example, there's a 4th file in the future. I was also thinking of working with a single script to handle all the logic.
Thanks in advance
If all you want is to trigger the Copy Source File task when one or more of the files is present, just use the OR constraint in your flow. First connect all the tasks to the destination. Then click one of the green arrows; this will make its properties window pop up. Select Logical OR instead of Logical AND. If everything went well, you should now see the connections as dashed lines.
There are several possible solutions:
1. Create a Sequence Container and include all the file imports in it. Add int variables RowCountFile1, RowCountFile2, and RowCountFile3, and set their values to 0 (the default when you create an int variable). Add a Row Count transformation to each of the data flows. Create a precedence constraint from the Sequence Container to the "Do something" task, set it to Success and Expression, and set the expression to @[User::RowCountFile1] > 0 || @[User::RowCountFile2] > 0 || @[User::RowCountFile3] > 0. The advantage of this approach is that you can take an action as soon as the files are detected, you import all available files, and you only take an action after all the files have been imported. You could then schedule this SSIS package as a SQL Server Agent job step and run it as frequently as you want.
2. A variant on solution 1 is to use Foreach Loop containers with file enumerators inside the Sequence Container. This would be useful if you don't know the exact name of the file and you expect to import more than one under some circumstances. For instance, if you get a file every few minutes with a timestamp in its name and your process doesn't run for some reason, then you may have to process multiple files to get caught up and then take an action once that is done.
3. Use the File Watcher task as you outlined in your question. The only problem I have with the File Watcher task is that the package has to be in a constantly running state, which makes it hard to troubleshoot problems and performance. It also introduced other problems for me when it first came out; it may well be a totally stable task now, but I prefer other methods after having been burned previously. If you really want the package to run continuously instead of being called by a job, you could always use a Script Task to check for the file, sleep the thread if it's not found, check again, and so on (see the sketch after this list). I'm sure that's what the File Watcher task does, but I would trust my own C# over the task. Power to anyone who has had better experiences than me with it.
4. Use PowerShell. If you just want to take an action when a file appears and you aren't importing the data, then a PowerShell script could do this just as well as an SSIS package. The drawbacks are that you have to learn some basic PowerShell, it may be hard to maintain in the future if PowerShell is not your bread-and-butter language, and you may have to rewrite the code as an SSIS package if you later want to import the data. You would probably call the PowerShell script from a SQL Server Agent job step, so scheduling can be handled pretty easily.
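As a rough illustration of the single-script idea, here is a minimal C# Script Task sketch. The BO_* variable names follow the question; the User namespace, the User::AllFilesPresent output variable, and the ReadOnlyVariables/ReadWriteVariables wiring are assumptions:

// SSIS Script Task body: check which BO_FileName_N slots are filled in and
// verify that each corresponding file exists. The BO_* variables must be
// listed under ReadOnlyVariables, and the hypothetical boolean
// User::AllFilesPresent under ReadWriteVariables.
public void Main()
{
    bool allPresent = true;

    for (int i = 1; i <= 3; i++)   // bump the bound if a 4th file is added
    {
        object loc = Dts.Variables["User::BO_FileLocation_" + i].Value;
        object name = Dts.Variables["User::BO_FileName_" + i].Value;

        // Empty/NULL variables mean "this slot is not used": skip them.
        if (loc == null || name == null ||
            string.IsNullOrEmpty(loc.ToString()) ||
            string.IsNullOrEmpty(name.ToString()))
            continue;

        string path = System.IO.Path.Combine(loc.ToString(), name.ToString());
        if (!System.IO.File.Exists(path))
            allPresent = false;
    }

    Dts.Variables["User::AllFilesPresent"].Value = allPresent;
    Dts.TaskResult = (int)ScriptResults.Success;
}

A precedence constraint with the expression @[User::AllFilesPresent] can then gate the rest of the flow; for a polling variant, wrap the check in a loop that calls System.Threading.Thread.Sleep between attempts.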
There are more options than what I listed, so let me know if you still want more suggestions.

Reusing tasks in SSIS

How can a task be reused in SSIS without copy/paste?
For example, I'd like to use the tasks I've defined in an event handler for one executable in another executable, but not with all executables in the package. So far, I haven't found any solutions other than writing a complete custom component, which seems like overkill. Any suggestions?
Have you considered using an event at the package level, and filtering it to fire only when your particular condition requires it?
E.g. you could use the OnPostExecute event by putting a dummy task in your flow with a name that starts with a specific string like "RunMyTasks", and then checking System::SourceName in the handler to see if it starts with "RunMyTasks". If it does, branch to run your tasks (and otherwise branch to handle the event as you normally would).
You could do a similar thing using OnVariableValueChanged - this might be better (although you'd need to test it). Create a variable with RaiseChangedEvent=TRUE. Create a script task / component to change the value of the variable; finally, put your task set into the event handler.
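A minimal sketch of the trigger side, assuming a hypothetical int variable User::FireReusableTasks created with RaiseChangedEvent=TRUE and listed in the Script Task's ReadWriteVariables (the variable name is illustrative):

// SSIS Script Task body: bump the variable's value so that its
// OnVariableValueChanged event handler (which holds the shared tasks) fires.
public void Main()
{
    var v = Dts.Variables["User::FireReusableTasks"];
    v.Value = (int)v.Value + 1;   // any change in value raises the event
    Dts.TaskResult = (int)ScriptResults.Success;
}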
Check the scoping notes at the bottom of Jamie's post here.
If you can use third-party solutions, check the commercial CozyRoc SSIS+ library. It includes an enhanced Script Task Plus, which allows exporting a script to an external file and then linking to and reusing it in other packages.

What are the pitfalls of using a separate SSIS package for centralized error management?

We have a few loosely coupled SSIS packages that are in charge of batch integration. When they hit an error (a validation issue with data, or an actual OnError error), they all do the same thing: they email a message to a distribution list. The content of the message varies, and sometimes other people need to be cc'd, but it is basically the same process for everything.
I am thinking of creating a single ErrorHandler package that has a few parameters (error message, cc address, subject line, etc.) and just getting the parent packages to run an Execute Package Task when they need to send an error message.
The way I see it, we then have one single SSIS package that allows us to manage what we do with the incoming errors. If we decide we want to write stuff into a log file, or call a web service, it only has to be changed in the one place.
Limited testing so far looks fine. Am I missing something obvious here? Why doesn't everybody do this? Is there a transactional or cascading issue that could be a problem?
Well, one pitfall we have encountered now we've started down this road is that you have to keep your SSIS packages in the same project otherwise they can't invoke each other.
This negates some of the good reasons we chose to use this pattern in the first place.
For instance, SSIS Parameters can be defined at the package level or the project level, but not in between, so now we have parameters that need to be shared between three packages, but which are visible to/shared across ALL packages because they must be kept at the project level.
You can't organise SSIS packages within a project by, for instance, grouping them into subfolders; the only way to visually organise them is to introduce some kind of naming scheme.
There are ways to overcome this, by invoking packages via stored procs or script steps, but these come with their own shortfalls. (For instance, you can specify the Environment you want to use when invoking a package from a stored proc, but how does the calling package know what Environment it is being run in, in order to invoke the child package with the same Environment? By Environment I mean the logical grouping of parameters into environments such as Test, Staging, and Prod that can be defined on the SQL Server.)
Microsoft is aware of this issue, but does not seem to want to do anything about it. https://connect.microsoft.com/SQLServer/feedback/details/779789/ssis-2012-execute-package-task-external-reference-does-not-work-with-ssisdb