Trapping All Batch Jobs from MVS - batch-processing

I'm trying to trap all batch jobs from MVS.
I want to transmit all the batch job information (start, end, error) to an external system in order to conduct further analysis.
Has anyone got any idea how to do this?

Write an IEFACTRT exit (or whatever its modern day equivalent is) and have the systems programmers install it.

IBM actually provides a facility for this. You can have it write SMF (System Management Facilities) records for all jobs. The record layouts are documented, and you can write code to analyze them yourself, or you can get third-party products like OMEGAMON that will do the analysis and reporting for you.

In my shop, we print the job information to plain files, FTP them down to some file servers, run extract/format scripts there, and pull the data into a BI platform for later analysis and visualisation.
Currently, we are studying how to use the power of a graph database like Neo4j to understand our batch job relationships more deeply and to present them better to the people who are interested. For now we think a graph database is a very neat tool for this kind of thing (batch job management)...
Hope my answer gives you some inspiration/reminders...

Typically, installations cut SMF type 30 records. Subtype 1 is written when a new transaction is started. Here, transaction means a System Resources Manager (SRM) transaction; don't confuse it with transactions in the context of, e.g., a database system. A batch job that begins execution is such a transaction. Subtype 5 is written when a transaction ends. Along with subtype 5 there is a completion section that reports the job termination status.
Now, SMF processing is traditionally done in batch, as you have to prepare the SMF records first, either by extracting them from the log stream or from one of the SYS1.MANx data sets.
But recently, capabilities have been added to z/OS that allow you to hook into the process when SMF records are written. A product like the IBM Common Data Provider for z/OS can be used to transform the data the way you want and to stream it to a destination of choice, for instance Logstash. Following such a technique allows you to process SMF records almost in real time.
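Once the type 30 data has been extracted and landed in a relational staging table on the receiving side, the start/end/error view the question asks for is a simple query. The table and columns below are purely illustrative, not an IBM-provided layout:

CREATE TABLE dbo.Smf30Job (
    JobName        VARCHAR(8)  NOT NULL,
    JobId          VARCHAR(8)  NOT NULL,   -- JESx job identifier
    StartTime      DATETIME2   NULL,       -- taken from subtype 1
    EndTime        DATETIME2   NULL,       -- taken from subtype 5
    CompletionCode INT         NULL,       -- from the completion section
    AbendFlag      BIT         NOT NULL DEFAULT 0
);

-- Jobs that ended in error, newest first.
SELECT JobName, JobId, StartTime, EndTime, CompletionCode
FROM   dbo.Smf30Job
WHERE  AbendFlag = 1 OR CompletionCode <> 0
ORDER BY EndTime DESC;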

Related

Setting timeout for query in oracle

We have a data warehouse setup where we use Oracle 12c and Informatica for ETL. We call some hourly procedures from an Informatica workflow. Sometimes these procedures take more than one hour for various reasons. Is it possible to set a timeout event, at the database level or at the Informatica level, that will terminate the current execution and generate a mail alert?
Best Regards
Well... no. This and a bunch of other features are not part of Informatica. This is where an external orchestration tool is very helpful: one that takes care of file watching and triggers workflows upon file arrival, reports when a workflow runs too long or too short, notifies you when a file you expect has not been received, and so on.

SQL Server: Using triggers for workflow automation

In a media management system, my task is to create a workflow automation. Currently, I have created it using SQL Server triggers, with the UI built in ASP.NET and jQuery.
For example:
When a new file enters the system, the trigger fires and updates the database metadata table with some data for that file.
Millions of assets pass through the system. Is it ideal to have triggers do this processing?
Is there a better way to create this automation?
Is there a "best practice" for this kind of work?
I'm having the same issue, and data enters my central asset database in several ways (which may differ from client to client).
So I also want to create an easily customizable workflow in the data layer (with no other dependencies).
As other people have mentioned, triggers may affect the parent activity.
That is overcome by writing the action that should be performed away to a queue table.
Example trigger condition: Hardware.Status = 'Issue Work Order'
INSERT INTO Queue (Created, Task, Completed) VALUES (GETUTCDATE(), 'EXEC dbo.IssueWorkOrder 123', 0);
Inserting a record into your queue table reduces the problems highlighted in the other comments.
Then you build a scheduling tool (Hangfire, SQL Server Agent jobs, or whatever) that executes the tasks in the queue in the order they were added.
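A minimal sketch of what the queue table and trigger might look like, assuming a hypothetical dbo.Hardware table with a HardwareId key and the dbo.IssueWorkOrder procedure from the example above:

-- Hypothetical queue table; names follow the example above.
CREATE TABLE dbo.Queue (
    QueueId   INT IDENTITY(1,1) PRIMARY KEY,
    Created   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    Task      NVARCHAR(400) NOT NULL,   -- T-SQL to run later
    Completed BIT           NOT NULL DEFAULT 0
);
GO
CREATE TRIGGER trg_Hardware_Status ON dbo.Hardware
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Only enqueue here; do no real work inside the trigger.
    INSERT INTO dbo.Queue (Task)
    SELECT 'EXEC dbo.IssueWorkOrder @HardwareId = '
           + CAST(i.HardwareId AS VARCHAR(10))
    FROM   inserted i
    JOIN   deleted  d ON d.HardwareId = i.HardwareId
    WHERE  i.Status = 'Issue Work Order'
      AND  d.Status <> i.Status;        -- fire only on the status transition
END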
Now, of course, in practice it's not as simple as that. You will have to address the following:
1. What if a step fails?
2. Dependencies on previous steps having completed first.
3. Multiple operators changing a record (the delay between a job step being executed and another person updating the same record).
I guess #2 and #3 are issues with any workflow engine/pipeline. To address them, a locking mechanism must be put in place.
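For the locking mechanism, one common pattern (again just a sketch against the queue table above) is to dequeue with UPDLOCK and READPAST so several workers can poll the queue without picking up the same row twice:

-- Worker loop body: claim the oldest unprocessed task, run it, mark it done.
DECLARE @QueueId INT, @Task NVARCHAR(400);

BEGIN TRAN;

SELECT TOP (1) @QueueId = QueueId, @Task = Task
FROM   dbo.Queue WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE  Completed = 0
ORDER BY Created;

IF @QueueId IS NOT NULL
BEGIN
    EXEC (@Task);
    UPDATE dbo.Queue SET Completed = 1 WHERE QueueId = @QueueId;
END

COMMIT;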

What is the practice for scheduling multiple inter-dependent SQL Server Agent jobs?

The way my team currently schedules jobs is through the SQL Server Job Agent. Many of these jobs have dependencies on other internal servers which in turn have their own SQL Server Jobs that need to be run to keep their data up to date.
This has created dependencies in the start time and length of each of our SQL Server jobs. Job A might depend on Job B finishing, so we schedule Job B a certain estimated time ahead of Job A. All of this is very subjective and not scalable, as we add more jobs and servers, which creates more dependencies.
I would love to get out of the business of subjectively scheduling these jobs and hoping that the dominos fall in the right order. I am wondering what the accepted practices for scheduling SQL Server jobs are. Do people use SSIS to chain jobs together? Is there tooling already built into the SQL Server Job Agent to handle this?
What is the accepted way to handle the scheduling of multiple SQL Server jobs with dependencies on each other?
I have used Control-M before to schedule multiple inter-dependent jobs in different environments. Control-M generally works by using batch files (from what I remember) to execute SSIS packages.
We had a complicated environment hosting two data warehouses side by side (one international and one US local). There were jobs that were dependent on other jobs, and those jobs on others, and so on, but by using Control-M we could easily define the dependencies (it has a really nice and intuitive GUI). Another tool that comes to mind is Tidal Scheduler.
There is no set standard for job scheduling, but I think it's safe to say that job schedules depend entirely on what an organization needs. For example, Finance jobs might be dependent on Sales, and Sales on Inventory, and so on. But the point is, if you need job inter-dependency, using third-party software such as Control-M is a safe bet. It can control jobs in different environments and gives you a real sense of company-wide job control.
We too had the requirement to manage dependencies between multiple Agent jobs. After looking at various third-party tools and discounting them for various reasons (mainly internal constraints relating to the use of third-party software), we decided to create our own solution.
The solution centres around a configuration database that holds details about the processes (jobs) that need to run, how they are grouped (batches), and the dependencies between processes.
Summary of configuration tables used (a rough DDL sketch follows the list):
Batch - high-level definition of a group of related processes; includes metadata such as max concurrent processes, current batch instance, etc.
Process - metadata relating to a process (job), such as name, max wait time, earliest run time, status (enabled/disabled), batch (which batch the process belongs to), process job name, etc.
Batch Instance - the active instance of a given batch
Process Instance - active instances of processes for a given batch
Process Dependency - dependency matrix
Batch Instance Status - lookup for batch instance status
Process Instance Status - lookup for process instance status
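Purely as an illustration of the shape of such a configuration database (the table and column choices below are assumptions, not the poster's actual schema):

-- Illustrative configuration schema; column choices are assumptions.
CREATE TABLE dbo.Batch (
    BatchId           INT IDENTITY PRIMARY KEY,
    BatchName         VARCHAR(100) NOT NULL,
    MaxConcurrent     INT          NOT NULL DEFAULT 4,
    CurrentInstanceId INT          NULL
);

CREATE TABLE dbo.Process (
    ProcessId       INT IDENTITY PRIMARY KEY,
    BatchId         INT          NOT NULL REFERENCES dbo.Batch(BatchId),
    ProcessName     VARCHAR(100) NOT NULL,
    AgentJobName    sysname      NOT NULL,   -- SQL Agent job that runs it
    Enabled         BIT          NOT NULL DEFAULT 1,
    MaxWaitMinutes  INT          NULL,
    EarliestRunTime TIME         NULL
);

CREATE TABLE dbo.ProcessDependency (          -- dependency matrix
    ProcessId          INT NOT NULL REFERENCES dbo.Process(ProcessId),
    DependsOnProcessId INT NOT NULL REFERENCES dbo.Process(ProcessId),
    PRIMARY KEY (ProcessId, DependsOnProcessId)
);

CREATE TABLE dbo.ProcessInstance (
    ProcessInstanceId INT IDENTITY PRIMARY KEY,
    BatchInstanceId   INT NOT NULL,
    ProcessId         INT NOT NULL REFERENCES dbo.Process(ProcessId),
    StatusId          INT NOT NULL,           -- FK to a status lookup
    StartedAt         DATETIME2 NULL,
    EndedAt           DATETIME2 NULL
);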
Each batch has two control jobs - START BATCH and UPDATE BATCH. The first deals with starting all processes that belong to the batch, and the second is the last to run in any given batch and deals with updating the outcome statuses.
Each process has an Agent job associated with it that gets executed by the START BATCH job. Processes have a capped concurrency (defined in the batch configuration), so processes are started up to a maximum of x at a time, and START BATCH then waits until a free slot becomes available before starting the next process.
The process Agent job steps call a templated SSIS package that deals with the actual ETL work and with the decision making around whether the process needs to run, whether it has to wait for dependencies, etc.
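Using the illustrative tables above, the "wait for dependencies / free slot" check could be as simple as the following sketch (again an assumption, not the poster's code; status values 2 = running and 3 = completed are made up):

-- Returns 1 when the given process instance may start: all of its
-- dependencies have completed and a concurrency slot is free.
CREATE FUNCTION dbo.CanProcessStart (@ProcessInstanceId INT)
RETURNS BIT
AS
BEGIN
    DECLARE @ok BIT = 1;
    DECLARE @ProcessId INT, @BatchInstanceId INT;

    SELECT @ProcessId = ProcessId, @BatchInstanceId = BatchInstanceId
    FROM   dbo.ProcessInstance
    WHERE  ProcessInstanceId = @ProcessInstanceId;

    -- Any dependency not yet completed? (StatusId 3 = completed, assumed)
    IF EXISTS (SELECT 1
               FROM dbo.ProcessDependency d
               JOIN dbo.ProcessInstance pi
                 ON pi.ProcessId = d.DependsOnProcessId
                AND pi.BatchInstanceId = @BatchInstanceId
               WHERE d.ProcessId = @ProcessId
                 AND pi.StatusId <> 3)
        SET @ok = 0;

    -- Concurrency cap reached? (StatusId 2 = running, assumed)
    IF (SELECT COUNT(*)
        FROM dbo.ProcessInstance pi
        WHERE pi.BatchInstanceId = @BatchInstanceId
          AND pi.StatusId = 2)
       >= (SELECT b.MaxConcurrent
           FROM dbo.Batch b
           JOIN dbo.Process p ON p.BatchId = b.BatchId
           WHERE p.ProcessId = @ProcessId)
        SET @ok = 0;

    RETURN @ok;
END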
We are currently looking to move to a Service Broker solution for greater flexibility and control.
Anyway, probably too much detail and not enough example here, so the VS2010 project is available on request.
I'm not sure how much this will help, but we ended up creating an email solution for scheduling.
We built an email reader that accesses an Exchange mailbox. As jobs finish, they send an email to the mail reader to start another job. The other nice part is that most applications have email notifications built in, so there really isn't much custom programming needed.
We really only built it in the first place to handle data files coming in from lots of other partners. It was much easier to give them an email address rather than setting them up with an ftp site, etc.
The mail reader app now has grown to include basic filtering, time of day scheduling, use of semaphores to prevent concurrent jobs, etc. It really works great.

Sql Server 2005 - SSIS statistics per component per run

Coming from a different ETL tool, I'm trying to figure out how to get (production) statistics on each component as it runs in SSIS.
For example, if the flat file is reading from an external source that has a high deviation (the rows/sec changes drastically at different times), I would like to know that information.
If an SSIS package has a significant 'slow point' (a buffer filling up / the data stream being impacted), I would also like to know that information.
And, using sprocs for comparison, the CPU time and read IO/write IO from the DMVs would also be ideal (and useful for people showing improvement by moving from sprocs to SSIS in a consistent, measurable way).
The reason I'm asking this question is that I see the rows going through BIDS during debugging, but that may not reflect the actual rows/sec of each component in production.
How would one enable/introspect/obtain these kinds of statistics for production environments (even if it takes a small hit, the numbers are a big deal)?
Thanks!
-Darren
This is difficult to do in SSIS 2005. I have seen the runtime engine "just stop" when trying to perform task-level logging from event handlers in complex SSIS packages. One thought: instrument the Data Flows only, by adding Row Count transformations just after the Source Adapters and on each Data Flow Path that outputs rows. Then add an Execute SQL Task to each Data Flow Task's OnPreExecute event handler to log the start of execution, and add another Execute SQL Task to the corresponding OnPostExecute event handler. In the OnPostExecute logic, store the row counts and the end time of the Data Flow Task execution. I believe that will provide enough metrics to calculate throughput for the data flow pipeline.
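To make that concrete, here is one possible shape for the logging table and the statements those Execute SQL Tasks might run. Table, column, and variable names are assumptions, and the ? parameter markers assume an OLE DB connection with parameters mapped to System::ExecutionInstanceGUID, System::TaskName, and a user variable holding the Row Count result:

-- Hypothetical logging table.
CREATE TABLE dbo.DataFlowLog (
    LogId       INT IDENTITY PRIMARY KEY,
    ExecutionId UNIQUEIDENTIFIER NOT NULL,  -- System::ExecutionInstanceGUID
    TaskName    NVARCHAR(200)    NOT NULL,  -- System::TaskName
    StartedAt   DATETIME         NULL,
    EndedAt     DATETIME         NULL,
    RowsRead    INT              NULL       -- value captured by Row Count
);

-- OnPreExecute Execute SQL Task:
INSERT INTO dbo.DataFlowLog (ExecutionId, TaskName, StartedAt)
VALUES (?, ?, GETDATE());

-- OnPostExecute Execute SQL Task:
UPDATE dbo.DataFlowLog
SET    EndedAt = GETDATE(), RowsRead = ?
WHERE  ExecutionId = ? AND TaskName = ?;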
Hope this helps,
Andy
Not sure if it will help, but maybe you can try configuring logging on your package and selecting "SSIS log provider for SQL Server Profiler".
It shows various information between the beginning and end of the data source processing.

Spawning multiple SQL tasks in SQL Server 2005

I have a number of stored procs which I would like to run simultaneously, ideally all on the server without reliance on connections to an external client.
What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?
I have thought of:
1. Launching multiple connections from a client, having each start the appropriate SP.
2. Setting up jobs for each SP and starting the jobs from a SQL Server connection or SP.
3. Using xp_cmdshell to start additional runs, equivalent to osql or whatever.
4. SSIS - I need to see if the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production.
In the job and cmdshell cases, I'm probably going to run into permissions level problems from the DBA...
SSIS could be a good option - if I can table-drive the SP list.
This is a data warehouse situation, and the work is largely independent; NOLOCK is universally used on the stars. The system is an 8-way, 32 GB machine, so I'm going to load it down and scale back if I see problems.
I basically have three layers. Layer 1 has a small number of processes and depends on basically all the facts/dimensions already being loaded (effectively, the stars are a Layer 0 - and yes, unfortunately they will all need to be loaded), Layer 2 has a number of processes which depend on some or all of Layer 1, and Layer 3 has a number of processes which depend on some or all of Layer 2. I already have the dependencies in a table, and would initially launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
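For the "jobs for each SP" option, a sketch of kicking off everything in one layer from such a dependency table might look like this (table and column names are hypothetical; msdb.dbo.sp_start_job returns as soon as the job is requested, so the calls don't block one another):

-- Assumed dependency table: one row per proc, with its layer and the
-- name of an Agent job whose single step runs that proc.
DECLARE @Layer INT = 1;
DECLARE @JobName sysname;

DECLARE layer_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT AgentJobName
    FROM   dbo.ProcDependency          -- hypothetical table
    WHERE  Layer = @Layer;

OPEN layer_cursor;
FETCH NEXT FROM layer_cursor INTO @JobName;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- sp_start_job is asynchronous, so all jobs in the layer
    -- end up running in parallel.
    EXEC msdb.dbo.sp_start_job @job_name = @JobName;
    FETCH NEXT FROM layer_cursor INTO @JobName;
END
CLOSE layer_cursor;
DEALLOCATE layer_cursor;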
Is SSIS an option for you? You can create a simple package with parallel Execute SQL Tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not get a benefit from starting them in parallel (e.g. if they all access the same table records, one may have to wait for locks to be released, etc.).
At one point I did some architectural work on a product known as Acumen Advantage that has a warehouse manager that does this.
The basic strategy for this is to have a control DB with a list of the sprocs and their dependencies. Based on the dependencies you can do a Topological Sort to give them an order to run in. If you do this, you need to manage the dependencies - all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
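As an illustration (not the product's actual code), the ordering part of a topological sort can be computed straight from a dependency table with a recursive CTE, assuming hypothetical tables dbo.Sproc (SprocName) and dbo.SprocDependency (SprocName, DependsOn); everything at run level n is only launched after level n-1 has completed:

WITH Levels AS (
    -- Level 0: sprocs with no predecessors.
    SELECT s.SprocName, 0 AS Lvl
    FROM   dbo.Sproc s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.SprocDependency d
                       WHERE d.SprocName = s.SprocName)
    UNION ALL
    -- A sproc sits one level below each of its predecessors;
    -- its final level is the maximum over all dependency paths.
    SELECT d.SprocName, l.Lvl + 1
    FROM   dbo.SprocDependency d
    JOIN   Levels l ON l.SprocName = d.DependsOn
)
SELECT SprocName, MAX(Lvl) AS RunLevel
FROM   Levels
GROUP BY SprocName
ORDER BY RunLevel, SprocName;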
Implementing this meant knocking much of the SSIS functionality on the head and implementing another scheduler. This is OK for a product, but probably overkill for a bespoke system. A simpler solution is this:
You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as subject-oriented ETL), where a single SSIS package and set of sprocs takes the data from extraction through to producing the dimension or fact tables. Typically the dimensions will be mostly siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.
Each loader becomes relatively modular, and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler work it out. The dependencies will contain some redundancy. For example, an ODS table may not depend on a dimension load being completed, but the upstream package itself takes the components right through to the dimensional schema before it completes. However, this is not likely to be an issue in practice, for the following reasons:
The load process probably has plenty of other tasks that can execute in the meantime
The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.
You can construct the SSIS packages so they pick up all of their configuration from an XML file whose location is supplied externally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M.
This means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures, and can maintain the config files on a per-environment basis without having to manually fiddle with configuration in the SSIS packages.
You might want to look at Service Broker and its activation stored procedures... might be an option...
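For reference, a minimal Service Broker activation sketch. All object names here are made up, it assumes Service Broker is enabled in the database, and the message body simply carries the T-SQL to run; up to MAX_QUEUE_READERS activation readers then execute the queued procs in parallel:

CREATE MESSAGE TYPE RunProcMsg VALIDATION = NONE;
CREATE CONTRACT RunProcContract (RunProcMsg SENT BY INITIATOR);
CREATE QUEUE dbo.RunProcQueue;
CREATE SERVICE RunProcService ON QUEUE dbo.RunProcQueue (RunProcContract);
GO
CREATE PROCEDURE dbo.RunProcActivated
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @h UNIQUEIDENTIFIER, @mt sysname,
            @body VARBINARY(MAX), @sql NVARCHAR(MAX);
    WHILE 1 = 1
    BEGIN
        WAITFOR (
            RECEIVE TOP (1)
                @h    = conversation_handle,
                @mt   = message_type_name,
                @body = message_body
            FROM dbo.RunProcQueue
        ), TIMEOUT 1000;
        IF @@ROWCOUNT = 0 BREAK;          -- queue drained, let activation end

        IF @mt = 'RunProcMsg'
        BEGIN
            SET @sql = CAST(@body AS NVARCHAR(MAX));
            EXEC (@sql);                  -- run the requested stored proc
        END
        END CONVERSATION @h;
    END
END
GO
ALTER QUEUE dbo.RunProcQueue
    WITH ACTIVATION (STATUS = ON,
                     PROCEDURE_NAME = dbo.RunProcActivated,
                     MAX_QUEUE_READERS = 4,   -- up to 4 procs run concurrently
                     EXECUTE AS OWNER);
GO
-- Queue one message per stored proc; Service Broker fans them out.
DECLARE @h UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE RunProcService TO SERVICE 'RunProcService'
    ON CONTRACT RunProcContract WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @h MESSAGE TYPE RunProcMsg (N'EXEC dbo.LoadFactSales');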
In the end, I created a C# management console program which launches the processes asynchronously as they become able to run, and keeps track of the connections.