SSIS 2012 package hangs randomly

Initially the package was in the package deployment model (SSIS 2008); it exported data from a local database to local CSV files in parallel.
I've converted it to the project deployment model. The same parallelism still exists, but now it is driven by calling a child package (across 26 threads) through an Execute Package Task (earlier it was through an Execute Process Task), using the out-of-process execution option in order to utilize the resources.
The child package picks a random customer out of 15K customers and exports its related data from a view to a CSV file.
The customers are placed in a table, and all the threads read that table. A mutex is applied over it using TABLOCKX: whichever thread gets write access first picks up a customer and updates its load status to 'Progress'. The other threads waiting for write access follow the same process (a sketch of this claim step follows below).
The process in each thread is repeated for all the customers using a For Loop container.
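For illustration, here is a minimal sketch of that claim step, assuming a hypothetical dbo.CustomerLoad table with CustomerID and LoadStatus columns and a 'Pending' starting status (the real schema may differ):

DECLARE @CustomerID int;

BEGIN TRAN;

-- TABLOCKX serialises the threads: only one thread at a time can claim a customer.
UPDATE TOP (1) c
SET    c.LoadStatus = 'Progress',
       @CustomerID  = c.CustomerID
FROM   dbo.CustomerLoad AS c WITH (TABLOCKX)
WHERE  c.LoadStatus = 'Pending';              -- the 'Pending' starting status is an assumption

COMMIT;

SELECT @CustomerID AS CustomerID;             -- mapped back to an SSIS variable for the export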
The exports run well and quickly, but surprisingly the package hangs for several minutes at the 576th execution, on a random customer. I've tried to reproduce it several times, and it hangs at the same point every time.
Your help on this is very much appreciated!!
PS: The issue does not occur in the earlier version of my package.

There is a bug in SSIS 2012 due to which my migrated package hangs.
When an SSIS package executes multiple child packages all at once, a deadlock is created in the internal catalog tables. So starting a child package from many parallel threads at exactly the same moment should be avoided; if needed, start them with a small delay between them (> 100 ms).
Adding a delay resolved the problem. Hopefully this bug will be fixed by Microsoft in later versions of SSIS.
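For example, one way to add that stagger (a minimal sketch; the 150 ms spacing and the @ThreadIndex variable mapping are assumptions, not part of the original fix) is an Execute SQL Task placed just before each Execute Package Task:

DECLARE @ThreadIndex int = ?;   -- 0..25, mapped from an SSIS variable identifying the branch
-- Each branch waits ThreadIndex * 150 ms, so no two child packages start at exactly the same instant.
DECLARE @Delay datetime = DATEADD(MILLISECOND, @ThreadIndex * 150, CAST('00:00:00' AS datetime));
WAITFOR DELAY @Delay;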

Related

mulesoft batch job is not executed

I'm running Mule Runtime 4.1.5, and a batch job handles the work of synchronizing data.
When the batch job completes normally, the log looks like this:
Created instance 'dc97a040-009e-11ec-a7bf-00155d801499' for batch job 'sendFlow_Job'
splitAndLoad: Starting loading phase for instance 'dc97a040-009e-11ec-a7bf-00155d801499' of job 'sendFlow_Job'
Finished loading phase for instance dc97a040-009e-11ec-a7bf-00155d801499 of job sendFlow_Job. 1 records were loaded
Started execution of instance 'dc97a040-009e-11ec-a7bf-00155d801499' of job 'sendFlow_Job'
batch step customer log ....
Finished execution for instance 'dc97a040-009e-11ec-a7bf-00155d801499' of job 'sendFlow_Job'. Total Records processed: 1. Successful records: 1. Failed Records: 0
=================end=======================
The log in question is as follows:
Created instance 'dc97a040-009e-11ec-a7bf-00155d801499' for batch job 'sendFlow_Job'
splitAndLoad: Starting loading phase for instance 'dc97a040-009e-11ec-a7bf-00155d801499' of job 'sendFlow_Job'
Finished loading phase for instance dc97a040-009e-11ec-a7bf-00155d801499 of job sendFlow_Job. 1 records were loaded
Started execution of instance 'dc97a040-009e-11ec-a7bf-00155d801499' of job 'sendFlow_Job'
=================end===================
As you can see, the log shows that the batch job only completed the first (loading) phase. After that, it is as if the batch job never existed: there is no further log output and no errors are thrown. And in the target database, the data is indeed not synchronized.
I tested this in my local environment and reproduced the problem: if I use kill -9 to kill the process while a batch step is executing, the process restarts, and from then on all batch jobs have this problem.
I found the queue files used by the batch job in the .mule folder. They have names similar to BSQ-batch-job-flow-name-dc97a040-009e-11ec-a7bf-00155d801499-XXX
Under normal circumstances, each batch job creates three BSQ files and deletes them when it completes.
In my case, the BSQ files are created but never deleted.
I looked up some posts, and they suggested deleting the .mule folder and restarting. In the actual environment, I don't know when the problem will occur, and deleting the .mule folder does not completely solve the problem of batch jobs not being executed.
Is anyone proficient with Mule batch jobs? Any suggestions would be appreciated, thanks.
You should not delete the .mule directory. There is other information in there, unrelated to batch, that would be lost: clustering configurations, persistent object stores, other applications' batches and queues. It may be OK to delete it inside the Studio embedded runtime, because that is just your development environment and you are probably not losing production data, but in any case just deleting information is not a solution.
There are too many possible causes to identify the right one, and you should provide a lot more information. My first recommendation is to ensure your Mule 4.1.5 has the latest cumulative patch, so that all known issues are resolved. Note that Mule 4.1.5 was released almost 3 years ago. If at all possible, migrate to the latest Mule 4.3.0 with the latest cumulative patching; it should be more stable and performant than 4.1.5.

SQL Server agent job failure handling

My team relies heavily on SSIS to manage the upkeep of our large datamart. We have over 1000 jobs, and 4000 packages. Managing failing jobs and packages is a never-ending task.
We're currently using SSIS package "error-handling" flows. However, I was wondering if there were existing tools or features (or strategies) for handling foreseeable job failures?
Most job failures are due to an exogenous problem (such as a source database not being available); however, our solution to these problems is often "take no action - rerun the job on the next day". How can we code ourselves out of doing this manually, for at least a large percentage of these problems, if not all of them?
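One strategy that covers part of this (a sketch only; the job and step names below are placeholders): SQL Server Agent job steps support automatic retries, which handles the "source was briefly unavailable, just try again later" class of failure without manual intervention:

-- Retry the load step up to 3 times, waiting 60 minutes between attempts,
-- before the step (and therefore the job) is reported as failed.
EXEC msdb.dbo.sp_update_jobstep
     @job_name       = N'DataMart - Load CustomerFacts',   -- placeholder job name
     @step_id        = 1,
     @retry_attempts = 3,
     @retry_interval = 60;                                  -- minutes between retries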

SSIS - Connection Management Within a Loop

I have the following SSIS package:
(Screenshot of the package: http://www.freeimagehosting.net/uploads/5161bb571d.jpg)
The problem is that within the Foreach loop a connection is opened and closed for each iteration.
On running SQL Profiler I see a series of:
Audit Login
RPC:Completed
Audit Logout
The duration for the login and the RPC that actually does the work is minimal. However, the duration for the logout is significant, running into several seconds each. This causes the JOB to run very slowly - taking many hours. I get the same problem when running either on a test server or stand-alone laptop.
Could anyone please suggest how I may change the package to improve performance?
Also, I have noticed that when running the package from Visual Studio it looks as though it is still running, with the component blocks going amber then green, even though all the processing has actually completed and SQL Profiler has gone silent.
Thanks,
Rob.
Have you tried running your data flow task in parallel vs serial? You can most likely break up your for loops to enable you to run each 'set' in parallel, so while it might still be expensive to login/out, you will be doing it N times simultaneously.
SQL Server is most performant when running a batch of operations in a single query. Is it possible to redesign your package so that it batches updates in a single call, rather than having a procedural workflow with for-loops, as you have it here?
If the design of your application and the RPC permits (or can be refactored to permit it), this might be the best solution for performance.
For example, instead of something like:
for each Facility
    for each Stock
        update Qty
See if you can create a structure (using SQL, or a bulk update RPC with a single connection) like:
update Qty
from Qty join Stock join Facility
...
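For example, a hedged T-SQL version (table and column names, including the dbo.StagedQty source of new quantities, are illustrative stand-ins for whatever the RPC currently touches one row at a time):

-- One set-based statement replacing the per-(Facility, Stock) RPC calls.
UPDATE q
SET    q.Qty = src.NewQty
FROM   dbo.Qty       AS q
JOIN   dbo.Stock     AS s   ON s.StockID    = q.StockID
JOIN   dbo.Facility  AS f   ON f.FacilityID = q.FacilityID
JOIN   dbo.StagedQty AS src ON src.StockID    = q.StockID
                           AND src.FacilityID = q.FacilityID;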
If you control the implementation of the RPC, it could maintain the same API (if needed) by delegating to another RPC that performs the batch operation, passing a single-record restriction (WHERE record = someRecord).
Have you tried doing the following?
In your connection managers for the connection that is used within the loop, right click and choose properties. In the properties for the connection, find "RetainSameConnection" and change it to True from the default of False. This will let your package maintain the connection throughout your package run. Your profiler would then probably look like:
Audit Login
RPC:Completed
RPC:Completed
RPC:Completed
RPC:Completed
RPC:Completed
RPC:Completed
...
Audit Logout
With the final Audit Logout happening at the end of package execution.

PSEXEC...DTEXEC ERROR 128

I'm using PSEXEC within batch files to execute DTEXEC (SQL SSIS jobs) as part of a scheduling system. What I'm finding is when a bunch of jobs are triggered together (or even close to one another) I get multiple ERROR 128 messages and the DTEXEC jobs immediately abort. I'm guessing there is some sort of problem running multiple instances of DTEXEC (or at least a maximum allowed number).
Aside from staggering the jobs is there any other settings or ways to avoid the errors?
Sounds similar to this error resolved here:
SSIS Package Execution using dtexec utility

Spawning multiple SQL tasks in SQL Server 2005

I have a number of stored procs which I would like to all run simultaneously on the server. Ideally all on the server without reliance on connections to an external client.
What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?
I have thought of:
- Launching multiple connections from a client, having each start the appropriate SP.
- Setting up jobs for each SP and starting the jobs from a SQL Server connection or SP (see the sketch after this list).
- Using xp_cmdshell to start additional runs, equivalent to osql or whatever.
- SSIS - I need to see if the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production.
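As a sketch of the jobs option (job names are placeholders), sp_start_job returns as soon as the Agent job has been started, so a single connection can kick off several procs without waiting for any of them to finish:

-- Each Agent job wraps one stored proc; sp_start_job is asynchronous, so these return immediately.
EXEC msdb.dbo.sp_start_job @job_name = N'Load_Fact_Sales';
EXEC msdb.dbo.sp_start_job @job_name = N'Load_Fact_Inventory';
EXEC msdb.dbo.sp_start_job @job_name = N'Load_Dim_Customer';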
In the job and cmdshell cases, I'm probably going to run into permissions level problems from the DBA...
SSIS could be a good option - if I can table-drive the SP list.
This is a datawarehouse situation, and the work is largely independent and NOLOCK is universally used on the stars. The system is an 8-way 32GB machine, so I'm going to load it down and scale it back if I see problems.
I basically have three layers. Layer 1 has a small number of processes and depends on basically all the facts/dimensions already being loaded (effectively, the stars are a Layer 0 - and yes, unfortunately they will all need to be loaded). Layer 2 has a number of processes which depend on some or all of Layer 1, and Layer 3 has a number of processes which depend on some or all of Layer 2. I already have the dependencies in a table, and initially I would only launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
Is SSIS an option for you? You can create a simple package with parallel Execute SQL tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not get benefit from starting this in parallel (e.g. if they all access the same table records, one may have to wait for locks to be released etc.)
At one point I did some architectural work on a product known as Acumen Advantage that has a warehouse manager that does this.
The basic strategy for this is to have a control DB with a list of the sprocs and their dependencies. Based on the dependencies you can do a Topological Sort to give them an order to run in. If you do this, you need to manage the dependencies - all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
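As an illustration, assuming a hypothetical dependency table dbo.ProcDependency with one row per (ProcName, DependsOn) pair and a NULL DependsOn for procs with no predecessors, a run level per proc can be derived with a recursive CTE; everything sharing a level can be launched together once the previous level has completed:

WITH Levels AS
(
    SELECT d.ProcName, 0 AS Lvl
    FROM   dbo.ProcDependency AS d
    WHERE  d.DependsOn IS NULL                -- roots: no predecessors
    UNION ALL
    SELECT d.ProcName, l.Lvl + 1
    FROM   dbo.ProcDependency AS d
    JOIN   Levels AS l ON l.ProcName = d.DependsOn
)
SELECT   ProcName, MAX(Lvl) AS RunLevel       -- a proc runs after its deepest predecessor
FROM     Levels
GROUP BY ProcName
ORDER BY RunLevel, ProcName;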
Implementing this meant knocking much of the SSIS functionality on the head and implementing another scheduler. This is OK for a product but probably overkill for a bespoke system. A simpler solution is thus:
You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as Subject Oriented ETL) where a single SSIS package and set of sprocs takes the data from extraction through to producing dimensions or fact tables. Typically the dimensions will mostly be siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.
Each loader becomes relatively modular and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler work it out. The dependencies will contain some redundancy. For example an ODS table may not be dependent on a dimension load being completed but the upstream package itself takes the components right through to the dimensional schema before it completes. However this is not likely to be an issue in practice for the following reasons:
The load process probably has plenty of other tasks that can execute in the meantime
The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.
You can construct the SSIS packages so they pick up all of their configuration from an XML file whose location is supplied externally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M.
This means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures, and can maintain the config files on a per-environment basis without having to manually fiddle with configuration in the SSIS packages.
You might want to look at Service Broker and its activation stored procedures... might be an option...
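For what it's worth, a minimal sketch of that idea (assuming Service Broker is enabled on the database; all object names are illustrative, and the usual RECEIVE loop, error handling, and poison-message handling are omitted): a proc name is sent as a message, and the activation procedure executes it asynchronously on the server, with MAX_QUEUE_READERS controlling how many run at once.

CREATE QUEUE dbo.SprocLaunchQueue;
CREATE SERVICE SprocLaunchService ON QUEUE dbo.SprocLaunchQueue ([DEFAULT]);
GO
CREATE PROCEDURE dbo.usp_RunQueuedSproc
AS
BEGIN
    DECLARE @handle  uniqueidentifier,
            @msgType sysname,
            @body    varbinary(MAX);

    RECEIVE TOP (1) @handle  = conversation_handle,
                    @msgType = message_type_name,
                    @body    = message_body
    FROM dbo.SprocLaunchQueue;

    IF @msgType = N'DEFAULT'                  -- our payload: a proc name sent as nvarchar
    BEGIN
        DECLARE @procName nvarchar(400) = CAST(@body AS nvarchar(400));
        EXEC (@procName);                     -- run the requested stored procedure
    END;

    IF @handle IS NOT NULL
        END CONVERSATION @handle;
END;
GO
ALTER QUEUE dbo.SprocLaunchQueue
WITH ACTIVATION (STATUS = ON,
                 PROCEDURE_NAME = dbo.usp_RunQueuedSproc,
                 MAX_QUEUE_READERS = 8,       -- up to 8 procs in flight at a time
                 EXECUTE AS OWNER);
GO
-- Caller: one SEND per proc to launch; control returns immediately.
DECLARE @h uniqueidentifier;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE SprocLaunchService
    TO SERVICE 'SprocLaunchService'
    ON CONTRACT [DEFAULT]
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @h (CAST(N'dbo.Load_FactSales' AS varbinary(MAX)));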
In the end, I created a C# management console program which launches the processes asynchronously as they become able to run and keeps track of the connections.