How to create a Priority queue schedule in Autosys? - batch-processing

Technologies available: Autosys, Informatica, Unix scripting, Database (available via informatica)
Our batch currently works with file watchers looking for a file called "control.txt", which gets deleted when a feed starts processing. It is recreated once the feed completes, which lets one of the waiting "control" Autosys jobs pick up the control file and begin processing, so data feeds are processed one by one.
However, the system has grown large, some feeds have become more important than others, and we're looking at ways to improve our scheduler so that it prioritizes certain feeds over others.
With the current design, where a single file decides when the next feed runs, it can't be done, and I haven't been able to come up with a simple solution to make it happen.
Example:
1. Feed A is processing
2. Feed B, Feed C, Feed X, Feed F come in while Feed A is processing
3. Need to ensure that Feed B is processed next, even though C, X, F are ready.
4. C, X, F have a lower priority than A and B, but have the same priority and can process in any order

A very interesting question. One thing that I can think of is to have an extra Autosys job with a shell script that copies the files in a certain order. Like:
Create input folder e.g. StageFolder
Let's call your current Autosys input folder "the InputFolder"
Have Autosys monitor it and, for any file, run an OrderedFileCopyScript.sh every minute
OrderedFileCopyScript.sh should copy one file from StageFolder to InputFolder, in the desired order, and only if InputFolder is empty
I hope I made myself clear.
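The selection logic of such a script can be sketched in Python (a shell version would follow the same steps). This is only a sketch under assumptions: the priority map, the default priority and the function names are hypothetical, the feed name is taken from the file name without its extension, and the file is moved rather than copied so that it cannot be released twice.

```python
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical priority map: lower number = more urgent.
PRIORITY = {"feedA": 0, "feedB": 0}
DEFAULT_PRIORITY = 10  # every feed not listed above


def pick_next(stage: Path) -> Optional[Path]:
    """Return the staged file with the best priority; ties go to the oldest file."""
    files = [p for p in stage.iterdir() if p.is_file()]
    if not files:
        return None
    return min(
        files,
        key=lambda p: (PRIORITY.get(p.stem, DEFAULT_PRIORITY), p.stat().st_mtime, p.name),
    )


def release_one(stage: Path, inbox: Path) -> Optional[Path]:
    """Move one file into the input folder, but only if that folder is empty."""
    if any(inbox.iterdir()):
        return None  # a feed is still waiting or being processed
    nxt = pick_next(stage)
    if nxt is None:
        return None
    return Path(shutil.move(str(nxt), str(inbox / nxt.name)))
```

Calling release_one once a minute gives the behaviour described above: nothing happens while InputFolder is occupied, and the most urgent staged file goes in as soon as it is empty.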

I oppose the use of Autosys for this requirement! Wrong tool!
I don't know all the details, but I'm assuming an application with the usual reference tables.
In this case you should make use of the feed reference table to include relative priorities.
I would suggest creating (or reusing) a table to be loaded by the successor job of the file watcher.
1) The table contains the unprocessed files with their corresponding priorities; use it to process the files in priority order.
2) Remove/archive the entries once done.
3) Have another job for this and run it like a daemon with start_times/run_window.
This gives the flexibility to deal with change in priorities and keeps overall design simple.
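A minimal sketch of the table-driven approach, using Python's built-in sqlite3 with an in-memory database purely for illustration. The table name feed_queue, its columns and the function names are assumptions, not part of any existing schema; in the setup described here the table would live in the application database.

```python
import sqlite3


def open_queue():
    """Create an in-memory queue table; a real setup would use the application database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feed_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # arrival order
        " feed_name TEXT NOT NULL,"
        " priority INTEGER NOT NULL)"  # lower number = processed earlier
    )
    return conn


def enqueue(conn, feed_name, priority):
    """Called by the job that follows the file watcher."""
    conn.execute(
        "INSERT INTO feed_queue (feed_name, priority) VALUES (?, ?)",
        (feed_name, priority),
    )
    conn.commit()


def take_next(conn):
    """Return the most urgent waiting feed and remove its entry, or None if idle."""
    row = conn.execute(
        "SELECT id, feed_name FROM feed_queue ORDER BY priority, id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM feed_queue WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]
```

With feeds C, X, B and F arriving in that order and B given a better priority, repeated calls to take_next return B first and then C, X, F in arrival order. Changing a feed's priority is then a data change rather than a scheduler change.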


How to set KOFAX KTM Server global variable value which will be initialized in Batch open, updated in SeparateCurrentPage & used in BatchClose?

I am trying to count occurrences of a specific barcode value in Project.Document_SeparateCurrentPage and use the count in BatchClose: if it is greater than 1, the batch should be sent to a specific queue with a specific priority. I used a global variable in the KTM Project Script to hold the count, initialized to 0 in Batch open. It worked fine during unit testing. But our automation team found that, out of 20 similar batches, a few were sent to the queue that is meant only for batches whose count is greater than one, even though they used only one barcode.
I googled and found that KTM Server script events do not allow information to be shared between different processes (https://docshield.kofax.com/KTM/en_US/6.4.0-uuxag78yhr/help/SCRIPT/ScriptDocumentation/c_ServerScriptEvents.html). Then I tried to use a batch field to hold the barcode count, but I was unable to update its value from the Project.Document_SeparateCurrentPage function using pXRootFolder.Fields.ItemByName("BatchFieldName").Text = "GreaterThanOne". The logs show that the batch reads the first page three times and then errors out.
Any links would help. Thanks in advance.
As you mentioned, the different phases of batch/document processing can execute in different processes, so global variables initialized in one event won’t necessarily be available in others. Ideally you should only use global variables if their content can be set from Application_InitializeScript or Application_InitializeBatch, because these events occur in each separate process. As you’ve found out, you shouldn’t use a global variable for your use case, because Document_SeparateCurrentPage and Batch_Close for one batch may occur in different processes, just as the same process will likely execute those events for multiple batches.
Also, you cannot set batch fields from document level events for a related reason: any number of separate processes could be processing documents of a batch in parallel, so batch level data is read-only to document events. It is a bit unintuitive, but separation is a document level event even though it seems like it is acting on the whole batch. (The three times you saw is just an error retry mechanism.)
If it meets your needs, the simplest answer might be to use a barcode locator as part of normal extraction (not just separation), and assign to a field if needed. While you cannot set batch fields from document events, you can read document data from batch events. So instead of trying to track something like a count over the course of document events, just make sure whatever data you need is saved at a document level. Then in a Batch_Close you can iterate the documents and count/calculate whatever you need. (In your case maybe the number of locator alternatives for the barcode locator, across each document.)
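The shape of that batch-level aggregation can be illustrated in Python. This is not KTM script code and uses no Kofax API: the Document class, the barcodes attribute and the queue names are all hypothetical stand-ins for "data saved at document level" and "decision made in Batch_Close".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    # Hypothetical stand-in for barcode values stored on each document.
    barcodes: List[str] = field(default_factory=list)


def count_barcode(documents, wanted):
    """Count occurrences of one barcode value across all documents of a batch."""
    return sum(doc.barcodes.count(wanted) for doc in documents)


def choose_queue(documents, wanted):
    """Decide the target queue once, at batch level, from document-level data."""
    return "special" if count_barcode(documents, wanted) > 1 else "default"
```

Because nothing is accumulated across events, the result does not depend on which process handled which document.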

ABAP Program to notify Users X amount of days before user account will be disabled

I'm currently learning ABAP and trying to make an enhancement but have broken down in confusion on how to go about building on top of existing code. I have a program that runs periodically via a background job that disables user accounts X amount of days (in this case 90 days of inactive usage based on USR02~TRDAT).
I want to add an enhancement that notifies the user by e-mail when their last logon date is 30, 15, 7, 5, 3, and 1 day away from the 90-day inactivity threshold, with a link to the SAP system to help them reactivate the account with ease. The e-mail address comes from matching usr02~bname to usr21~bname, then passing usr21~persnumber and usr21~addrnumber to adr6, which points to the user's adr6~smtp_addr (providing the usr02~bname -> adr6~smtp_addr relationship).
I'm beginning to think that an enhancement might not be a good idea, and that I should rather create a new program and schedule it as a daily background job. Any guidance or information would be greatly appreciated...
Extract
CLASS cl_inactive_users_reader DEFINITION.
  PUBLIC SECTION.
    TYPES:
      BEGIN OF ts_inactive_user,
        user_name          TYPE syst_uname,
        days_of_inactivity TYPE int1,
      END OF ts_inactive_user.
    TYPES tt_inactive_users TYPE STANDARD TABLE OF ts_inactive_user WITH EMPTY KEY.
    CLASS-METHODS read_inactive_users
      IMPORTING
        min_days_of_inactivity TYPE int1
      RETURNING
        VALUE(result) TYPE tt_inactive_users.
ENDCLASS.
Then refactor
REPORT block_inactive_users.
DATA(inactive_users) = cl_inactive_users_reader=>read_inactive_users( 90 ).
LOOP AT inactive_users INTO DATA(inactive_user).
  " block user
ENDLOOP.
And add
REPORT warn_inactive_users.
DATA(inactive_users) = cl_inactive_users_reader=>read_inactive_users( 60 ).
LOOP AT inactive_users INTO DATA(inactive_user).
  CASE inactive_user-days_of_inactivity.
    " choose urgency
  ENDCASE.
  " send e-mail
ENDLOOP.
and run both reports daily.
Don't create a big ball of mud by squeezing new features into existing code.
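The date arithmetic behind the "choose urgency" step can be sketched in Python (the ABAP version would do the same subtraction on the last logon date). The constant and function names are hypothetical; the 90-day limit and the 30/15/7/5/3/1 warning days come from the question.

```python
from datetime import date

LOCK_AFTER_DAYS = 90
WARN_DAYS_BEFORE = {30, 15, 7, 5, 3, 1}


def days_until_lock(last_logon: date, today: date) -> int:
    """Days remaining before the account reaches the inactivity limit."""
    return LOCK_AFTER_DAYS - (today - last_logon).days


def should_warn(last_logon: date, today: date) -> bool:
    """True only on the exact days on which a reminder is due."""
    return days_until_lock(last_logon, today) in WARN_DAYS_BEFORE
```

Because the check is for an exact match, it relies on the report running every day; a skipped day means a skipped reminder.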
From SAP wiki:
The enhancement concept allows you to add your own functionality to SAP's standard business applications without having to modify the original applications. To modify the standard SAP behavior as per customer requirements, we can use enhancement framework.
As per your description, it doesn't sound like a use case for an enhancement. It isn't an intervention in an existing process. The original process and your new requirement are two different processes with some mutual logical part - selection of days of inactivity of users. The two shouldn't rely on each other.
Structurally I think it is best to have a separate program for computing which e-mails need to be sent and when, and a separate program for actually sending them.
I would copy your original program to a new one, and modify it a little bit so that instead of disabling a user, it records into some table for each user: 1) an e-mail 2) a date when to send 3) how many days left (30, 15, 7, etc) 4) status if the e-mail was sent or not. Initially you can even have multiple such jobs for each period (30, 15, 7 etc) and pass it as a parameter (which you use inside instead of 90).
This program you run daily as a job and it populates that table with e-mail "tasks" of what needs to be sent today. It just adds new lines, so lines from yesterday should stay in there.
The 2nd program should just read that table and send actual e-mails and update the statuses. You run that program daily as well.
This way you have:
overview: just check the table to see what's going on
control: if the e-mailer dies or hangs, you can restart it and it will continue where it left off; with statuses you avoid sending duplicate mails
you can make sure that you don't send outdated e-mails if in your mailer script you ignore all tasks older than say 2 days
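A small Python sketch of the mailer side, to show how the status flag and the age cut-off interact. The MailTask fields mirror the four columns suggested above; the names, the 2-day cut-off constant and the send callback are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MailTask:
    email: str
    send_on: date
    days_left: int
    sent: bool = False


MAX_AGE = timedelta(days=2)  # tasks older than this are considered outdated


def run_mailer(tasks, today, send):
    """Send every unsent task that is due and not outdated; return how many were sent."""
    count = 0
    for task in tasks:
        if task.sent or task.send_on > today:
            continue  # already handled, or not due yet
        if today - task.send_on > MAX_AGE:
            continue  # outdated: skip rather than send a misleading warning
        send(task)
        task.sent = True  # status update is what prevents duplicates on a restart
        count += 1
    return count
```

Running the mailer a second time on the same day sends nothing, which is the restart behaviour described above.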
I want to clarify your confusion about the use of enhancements:
You would use an enhancement when 'something' happens or is going to happen in the system and you want to change the standard way it is handled.
That something, let's call it an event or process, could be, for example, an order being placed, a certain user logging on to the system, or a material that has been or is going to be changed.
The change could be notifying another system of an order, or running additional checks on the logged-on user, for example checking his/her GUI version and warning him/her if it is not up to date.
Ask yourself what process on the system the execution of your program or code depends on. Does anything need to happen before the program is executed? No, only elapsing time.
Even if you had found an enhancement you wanted to use: if the process using the enhancement were not run within 90 days, your mails would not be sent, because the enhancement would never have been called.
edit: That being said, if by enhancement you mean 'building on your existing program' instead of 'creating a new one', that is not the right terminology for an enhancement in the SAP universe.
I would extend the functionality of your existing program, since you already compute how many days are left and you would have only one job to maintain.

Combine the Output of 3 Transformation in Pentaho

I'm executing 3 transformations in parallel. The output of the three transformations contains the same column names.
I've connected the output of all transformations to a common dummy step in the job, added a WaitForSQL step to wait until all 3 transformations have completed execution, and added a Unique step in the next transformation to remove duplicate records.
Everything works properly up to WaitForSQL, but when the next transformation gets the rows from the result and performs the Unique step, I still get duplicate records.
Does anyone have a solution for this issue? Please reply.
You have to sort your resulting stream after the dummy step before removing the duplicate rows. The sort will also make sure that all 3 streams are completed before sorting.
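Why the sort matters can be shown with a few lines of Python. The sketch assumes the Unique step only compares each row with the row immediately before it, so duplicates that are not adjacent survive; itertools.groupby behaves the same way and is used here as a stand-in.

```python
from itertools import groupby


def unique_adjacent(rows):
    """Drop a row only when it equals the row immediately before it."""
    return [key for key, _ in groupby(rows)]


rows = ["a", "b", "a", "c", "b"]  # rows from three streams, interleaved

print(unique_adjacent(rows))          # ['a', 'b', 'a', 'c', 'b']  duplicates remain
print(unique_adjacent(sorted(rows)))  # ['a', 'b', 'c']
```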
I didn't know you could use the dummy step to combine stream results. I always used the append streams-step for that.
Several points:
You cannot simply combine the outputs of multiple transformations at the job level. You will need another transformation to read the data using the Get rows from result step; jobs don't know about data streams, they only know about tasks (job entries) and exit status.
Be careful with "Launch next entries in parallel" at the job level. Let's say you have 2 transformations, trans1 and trans2, launched in parallel, followed by a dummy step. The dummy will be called TWICE, once after trans1 finishes and again when trans2 finishes. A job hop is not a data stream, it's a workflow. If you want to run transformations in parallel and later go back to a single workflow, you need a subjob that calls the transformations and doesn't have a Success job entry. That way, the subjob only finishes after the 2nd transformation finishes, and only then does it go to the dummy step in the parent.
Why do you need those transformations running inside a job? If they have the same column structure, why don't you call them as sub-transformations inside a transformation, and not a job? Steps in a transformation are always launched in parallel, so if you're parallelizing things for performance, a transformation is the way to do it, not a job. A job is meant to run multiple tasks in sequential order, one after the other, with workflow control depending on the result of the previous step.
If you want the output from all three in a single file, you could run the same 3 instances in a single transformation with the append to target option ticked in your output step.

Process Each Row in Kettle ONE AT A TIME?

I was wondering if it is possible to work on a per-row basis in Kettle?
I am trying to implement a reporting scheme which consists of a table, where the requests get queued for processing, and a Pentaho job that picks up the records from that table.
My job currently has 3 transformations in it:
The 1st gets the records from the queued requests table.
The 2nd analyzes the values of each record and comes up with multiple results based on that record. For example, a user would request records of movies of the horror genre; then it should spit out the horror movies.
The 3rd retrieves further information about the movies, such as the year, director, etc., which is to be output to an Excel file.
This is the idea, but it's a bit challenging doing it in Pentaho as it does everything at the same time. Is there a way that I can make my job work on the records one by one?
EDIT.
Just to add, I have been trying to extend the implementation of the Pentaho cookbook sample, but compared to my design, it only covers steps 2 and 3.
I can't seem to make the Table input step work on one row at a time.
I just made it act like the implementation in the cookbook, with some adjustments. Instead of using two transformations to gather all the necessary fields, I retrieved all the information that I need in 1 transformation.
After that I copied that information to the next steps, added some queries to complete the information, and it is now working.
Passing parameters between transformations is a bit confusing: there are parameters to be set on the transformation itself and also on the job that contains the transformations, so I went on guessing for some time just to make it work.

How to backup tcsh history periodically to a single file in chronological manner?

I use tcsh at work, and one of the features I use extensively is command-line history completion at the shell prompt. Currently, I've limited the size of my history file to 2000 lines (as I don't want to slow down the shell too much). However, at times I need a command that I know I used a month or two back but that has since been erased. So I want a system wherein:
My history buffer stores 2000 lines only
Instead of older commands getting erased, they should be saved into a "master" history file, ordered chronologically, i.e. if two shells were open, the commands entered should be sorted by timestamp (not by the order in which the shells were closed)!
It would be perfect if this master history file could be backed up automatically, say on a weekly basis.
I'm sure many avid shell users have faced a situation like this, and I'm hoping to get an answer from one of them!
2000 is pretty low. You could raise that a fair amount without suffering too much.
Next you probably want to store the history on logout, since this is when new commands are added to the .history file.
Create a file called .logout in your $HOME (for bash users, this file is .bash_logout). In this, copy the contents of the history to a permanent store. For example:
cat $HOME/.history >> $HOME/.ancient_history
This will append the history to a file ".ancient_history". For bash users, the file to copy is called .bash_history.
Then create a cron job that creates a backup of this every now and again. For starters, here is one that moves the file to a filename with a timestamp at 5 minutes past midnight every day (inside a crontab the % has to be escaped as \%, otherwise cron treats it as a line break).
5 0 * * * mv $HOME/.ancient_history $HOME/.ancient_history_`date +\%s`
There are probably more things you could do with this, but this is enough to get started. It's a pretty good idea that I hadn't thought of doing before either :-)
I never quite thought of doing this, but the simplest way would be to write a cron job that appends the history file to another file. The problem with this is that you would get duplicates unless you wrote the cron job to clear the history file after it did the dump.
History is stored (as far as I am aware) by line number only, so the numbers would repeat for each dump, but you could add a marker line with the date of the dump.
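The chronological merge and the duplicate problem can both be handled if the saved history carries timestamps. The Python sketch below assumes each command in the history file is preceded by a line of the form #+<epoch seconds>; check your own ~/.history before relying on that. Function names are hypothetical and multi-line commands are not handled.

```python
def parse_history(text):
    """Turn '#+<epoch>' / command line pairs into (timestamp, command) tuples."""
    entries = []
    stamp = None
    for line in text.splitlines():
        if line.startswith("#+") and line[2:].isdigit():
            stamp = int(line[2:])
        elif stamp is not None:
            entries.append((stamp, line))
            stamp = None
    return entries


def merge_histories(*texts):
    """Merge several history dumps chronologically, dropping exact repeats."""
    seen = set()
    merged = []
    for text in texts:
        for entry in parse_history(text):
            if entry not in seen:  # same timestamp and same command = same entry
                seen.add(entry)
                merged.append(entry)
    merged.sort(key=lambda entry: entry[0])  # stable: ties keep their dump order
    return merged
```

Since an entry is identified by its timestamp and command together, appending the same history file twice does not create duplicates in the master file.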