submit TWS schedule on specific AGENT - workload-scheduler

Is it possible to submit or run a TWS schedule on a specific agent?
For example, TWS currently selects an agent dynamically (Agent A) and runs the jobs there. How can we specifically select Agent B and run the scheduled jobs on that agent instead?

This is not a native scenario, but there are a few different ways to address it.
The schedule's workstation is not relevant; what you need to change is the job workstation only.
You can use one of the following approaches.
REST APIs (a sketch follows this list)
Run a query on the job stream definition to get its id: POST /model/jobstream/header/query
Generate a new temporary instance of the job stream: POST /plan/current/jobstream/{jobstreamId}/action/make_jobstream
Modify the returned JSON, replacing the job workstations
Submit the modified instance: POST /plan/{planId}/jobstream/action/submit_jobstream
Use pools
If you actually want to move all jobs from one agent to another, you can use pool workstations and change the actual agents in the pool: change the members for static pools, or move a logical resource for dynamic pools.
Clone the definition
Another option is to clone the definition with composer or with workload application templates and then submit the cloned definition. If you want, you can remove the cloned definition after the submission.
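Coming back to the REST API option, a minimal sketch in Python might look like the following. The engine URL, the credentials, the query payload, and the JSON field names used to swap the workstation are all assumptions; check your version's REST API reference for the exact request and response shapes.

# Hedged sketch: resubmit a TWS job stream with its jobs pointed at a different agent.
# Base URL, credentials, payloads, and JSON field names below are assumptions.
import requests

BASE = "https://tws-master.example.com:9443/twsd"   # hypothetical engine URL
AUTH = ("twsuser", "password")                      # hypothetical credentials
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}

# 1. Query the job stream definitions to get the id (filter shape is an assumption).
resp = requests.post(f"{BASE}/model/jobstream/header/query",
                     json={"filters": {"jobstreamFilter": {"jobStreamName": "MY_STREAM"}}},
                     auth=AUTH, headers=HEADERS, verify=False)
jobstream_id = resp.json()[0]["id"]                 # response shape is an assumption

# 2. Generate a temporary instance of the job stream.
resp = requests.post(f"{BASE}/plan/current/jobstream/{jobstream_id}/action/make_jobstream",
                     json={}, auth=AUTH, headers=HEADERS, verify=False)
instance = resp.json()

# 3. Replace the workstation on each job ("jobs"/"workstation" keys are assumptions).
for job in instance.get("jobs", []):
    job["workstation"] = "AGENT_B"

# 4. Submit the modified instance (using "current" as the planId).
resp = requests.post(f"{BASE}/plan/current/jobstream/action/submit_jobstream",
                     json=instance, auth=AUTH, headers=HEADERS, verify=False)
print(resp.status_code, resp.text)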

Related

How to make Dataproc detect Python-Hive connection as a Yarn Job?

I launch a Dataproc cluster and serve Hive on it. Remotely from any machine I use Pyhive or PyODBC to connect to Hive and do things. It's not just one query. It can be a long session with intermittent queries. (The query itself has issues; will ask separately.)
Even during one single, active query, the operation does not show as a "Job" (I guess it's Yarn) on the dashboard. In contrast, when I "submit" tasks via Pyspark, they show up as "Jobs".
Besides the lack of task visibility, I also suspect that, w/o a Job, the cluster may not reliably detect a Python client is "connected" to it, hence the cluster's auto-delete might kick in prematurely.
Is there a way to "register" a Job to accompany my Python session, and cancel/delete the job at times of my choosing? In my case, it would be a "dummy", "nominal" job that does nothing.
Or maybe there's a more proper way to let Yarn detect my Python client's connection and create a job for it?
Thanks.
This is not supported right now; you need to submit jobs via the Dataproc Jobs API to make them visible on the Jobs UI page and to have them taken into account by the cluster TTL feature.
If you cannot use the Dataproc Jobs API to execute your actual jobs, you can submit a dummy Pig job that sleeps for the desired time (5 hours in the example below) to prevent cluster deletion by the max idle time feature:
gcloud dataproc jobs submit pig --cluster="${CLUSTER_NAME}" \
--execute="sh sleep $((5 * 60 * 60))"

Pentaho Logging specify Job or Trans for each line

I am running Pentaho Kettle 6.1 through a Java application. All of the Pentaho logs are directed through the Java app and written to the same log file at the Java level.
When a job starts or finishes, the logs indicate which job is starting or finishing, but while a job is running, the log output only indicates the specific step it is on, without any indication of which job or trans is executing.
This causes confusion and is difficult to follow when there is more than one job running simultaneously. Does anyone know of a way to prepend the name of the job or trans to each log entry?
Not that I know of, and I doubt there is, for the simple reason that the same transformation/job may be split to run on more than one machine, by more than one user, and/or launched in parallel in different job hierarchies of callers.
The general answer is to log to a database (right-click anywhere, Parameters, Logging, define the logging table and what you want to log). All the logging will be copied to a database table together with a channel_id. This is a unique number attributed to each "run" that links together all the logging information coming from the dependent jobs/transformations. You can then view this information with a SELECT...WHERE channel_id=...
However, your case seems to be simpler. Use database logging with a log interval of, say, 2 seconds and run SELECT TRANSNAME/JOBNAME, LOG_FIELD FROM LOG_TABLE continuously on your terminal.
You can also follow a specific job/transformation by logging in a specific table, but this means you know in advance which is the job/transformation to debug.
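As a rough illustration of the polling suggestion above, a small script that tails the log table could look like this. The connection, table name, and column names are assumptions based on the default Kettle transformation log table layout; check them against your own logging configuration.

# Hedged sketch: poll the Kettle transformation log table every couple of seconds
# and print each new log line prefixed with the transformation name.
import time
import pyodbc

conn = pyodbc.connect("DSN=kettle_logging")   # placeholder connection
seen = {}                                     # channel_id -> length of log text already printed

while True:
    cursor = conn.cursor()
    cursor.execute(
        "SELECT CHANNEL_ID, TRANSNAME, LOG_FIELD FROM TRANS_LOG WHERE STATUS = 'running'"
    )
    for channel_id, transname, log_field in cursor.fetchall():
        log_field = log_field or ""
        new_text = log_field[seen.get(channel_id, 0):]   # only what was added since last poll
        for line in new_text.splitlines():
            print(f"[{transname}] {line}")               # prepend the trans name to each line
        seen[channel_id] = len(log_field)
    cursor.close()
    time.sleep(2)                                        # matches the suggested 2-second log interval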

Using SoftLayer API to configure Evault Backup (configure agent, jobs, and schedule)

I would like to automate, via the SoftLayer API, the configuration of an Evault Backup system -- configure the agent, create a job, set the file selection to back up, and create the schedule. I can't find any structures that seem to contain the information needed to create that configuration (except for creating a schedule). Does anyone know if the items needed are available through the SoftLayer API?
To try to get a better picture of the related underlying structures, I went through the GUI and created an agent, jobs, and a schedule, and I can see that the backups are running. I can then use the SoftLayer API to query some things -- the job details (job name/description, last run date and result) and the agent status -- but I cannot seem to query the schedules or replication schedule, nor any of the agent configuration information beyond its status.
As far as I know, with the SoftLayer API you will only be able to get information about the Evault; to configure the device you need to use the WebCC client.
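For the read-only side, a hedged sketch with the SoftLayer Python client is shown below. It only lists Evault storage information for the account; the getEvaultNetworkStorage call and the object mask fields are assumptions to verify against the SoftLayer_Account service documentation, and the actual agent/job/schedule configuration still has to be done through WebCC.

# Hedged sketch: list Evault storage details for an account with the SoftLayer
# Python client. Read-only; configuration is done via WebCC, not the API.
import SoftLayer

client = SoftLayer.create_client_from_env(username="SL_USER", api_key="SL_API_KEY")  # placeholders

# getEvaultNetworkStorage and the mask fields are assumptions; adjust to the docs.
evaults = client.call(
    "Account", "getEvaultNetworkStorage",
    mask="mask[id,username,capacityGb,serviceResourceBackendIpAddress]",
)
for ev in evaults:
    print(ev.get("id"), ev.get("username"), ev.get("capacityGb"))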

What is the practice for scheduling multiple inter-dependent SQL Server Agent jobs?

The way my team currently schedules jobs is through the SQL Server Job Agent. Many of these jobs have dependencies on other internal servers which in turn have their own SQL Server Jobs that need to be run to keep their data up to date.
This has created dependencies in the start time and duration of each of our SQL Server jobs. Job A might depend on Job B finishing, so we schedule Job B a certain estimated time in advance of Job A. This whole process is very subjective and does not scale as we add more jobs and servers, which create more dependencies.
I would love to get out of the business of subjectively scheduling these jobs and hoping that the dominos fall in the right order. I am wondering what the accepted practices for scheduling SQL Server jobs are. Do people use SSIS to chain jobs together? Is there tooling already built into the SQL Server Job Agent to handle this?
What is the accepted way to handle the scheduling of multiple SQL Server jobs with dependencies on each other?
I have used Control-M before to schedule multiple inter-dependent jobs in different environments. Control-M generally works by using batch files (from what I remember) to execute SSIS packages.
We had a complicated environment hosting two data warehouses side by side (one international and one US-local). There were jobs that were dependent on other jobs, and those jobs on others, and so on, but with Control-M we could easily define the dependencies (it has a really nice and intuitive GUI). Another tool that comes to mind is Tidal Scheduler.
There is no set standard for job scheduling, but I think it's safe to say that job schedules depend entirely on what an organization needs. For example, Finance jobs might depend on Sales, and Sales on Inventory, and so on. The point is, if you need inter-job dependency, using a third-party tool such as Control-M is a safe bet. It can control jobs in different environments and give you a real sense of company-wide job control.
We too had the requirement to manage dependencies between multiple agent jobs. After looking at various third-party tools and ruling them out for various reasons (mainly internal constraints relating to the use of third-party software), we decided to create our own solution.
The solution centres around a configuration database that holds details about processes (jobs) that need to run and how they are grouped (batches), along with the dependencies between processes.
Summary of configuration tables used:
Batch - high-level definition of a group of related processes; includes metadata such as max concurrent processes, current batch instance, etc.
Process - metadata relating to a process (job), such as name, max wait time, earliest run time, status (enabled/disabled), batch (which batch the process belongs to), process job name, etc.
Batch Instance - the active instance of a given batch
Process Instance - active instances of processes for a given batch
Process Dependency - dependency matrix
Batch Instance Status - lookup for batch instance status
Process Instance Status - lookup for process instance status
Each batch has two control jobs - START BATCH and UPDATE BATCH. The first deals with starting all the processes that belong to the batch, and the second is the last to run in any given batch and deals with updating the outcome statuses.
Each process has an agent job associated with it that gets executed by the START BATCH job. Processes have capped concurrency (defined in the batch configuration), so processes are started up to a maximum of x at a time, and START BATCH then waits until a free slot becomes available before starting the next process.
The process agent job steps call a templated SSIS package that deals with the actual ETL work and with the decision making around whether the process needs to run and has to wait for dependencies etc.
We are currently looking to move to a Service Broker solution for greater flexibility and control.
Anyway, this is probably too much detail and not enough example, so the VS2010 project is available on request.
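As a very rough illustration of the kind of configuration schema described above (my own simplified guess, not the poster's actual design), the core tables might be created along these lines; table and column names are illustrative only.

# Hedged sketch: a simplified guess at the configuration schema, created via pyodbc.
import pyodbc

DDL = """
CREATE TABLE Batch (
    BatchId INT IDENTITY PRIMARY KEY,
    BatchName NVARCHAR(100) NOT NULL,
    MaxConcurrentProcesses INT NOT NULL DEFAULT 4,
    CurrentBatchInstanceId INT NULL
);
CREATE TABLE Process (
    ProcessId INT IDENTITY PRIMARY KEY,
    BatchId INT NOT NULL REFERENCES Batch(BatchId),
    ProcessName NVARCHAR(100) NOT NULL,
    AgentJobName SYSNAME NOT NULL,            -- the SQL Agent job that runs this process
    MaxWaitMinutes INT NULL,
    EarliestRunTime TIME NULL,
    IsEnabled BIT NOT NULL DEFAULT 1
);
CREATE TABLE ProcessDependency (              -- dependency matrix: a process waits for its parents
    ProcessId INT NOT NULL REFERENCES Process(ProcessId),
    DependsOnProcessId INT NOT NULL REFERENCES Process(ProcessId),
    PRIMARY KEY (ProcessId, DependsOnProcessId)
);
CREATE TABLE BatchInstance (
    BatchInstanceId INT IDENTITY PRIMARY KEY,
    BatchId INT NOT NULL REFERENCES Batch(BatchId),
    StatusId INT NOT NULL,                    -- FK to a batch instance status lookup
    StartedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME()
);
CREATE TABLE ProcessInstance (
    ProcessInstanceId INT IDENTITY PRIMARY KEY,
    BatchInstanceId INT NOT NULL REFERENCES BatchInstance(BatchInstanceId),
    ProcessId INT NOT NULL REFERENCES Process(ProcessId),
    StatusId INT NOT NULL                     -- FK to a process instance status lookup
);
"""

with pyodbc.connect("DSN=scheduler_config", autocommit=True) as conn:  # placeholder DSN
    for statement in DDL.split(";"):
        if statement.strip():
            conn.execute(statement)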
I'm not sure how much this will help, but we ended up creating an email solution for scheduling.
We built an email reader that accesses an Exchange mailbox. As jobs finish, they send an email to the mail reader to start another job. The other nice part is that most applications have email notifications built in, so there really isn't much in the way of custom programming.
We originally built it to handle data files coming in from lots of other partners. It was much easier to give them an email address than to set them up with an FTP site, etc.
The mail reader app has now grown to include basic filtering, time-of-day scheduling, use of semaphores to prevent concurrent jobs, etc. It really works great.
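A toy version of that pattern, using a plain IMAP mailbox instead of the Exchange-specific pieces and a made-up subject convention, might look like this; the host, credentials, subject, and launched command are all placeholders, and the filtering, time-of-day rules, and semaphores mentioned above are not shown.

# Hedged sketch: poll a mailbox and start a downstream job when a "finished" email arrives.
import email
import imaplib
import subprocess
import time

IMAP_HOST = "imap.example.com"                          # placeholder
USER, PASSWORD = "scheduler@example.com", "secret"      # placeholders

def check_mailbox():
    with imaplib.IMAP4_SSL(IMAP_HOST) as imap:
        imap.login(USER, PASSWORD)
        imap.select("INBOX")
        # Look for unread notifications that the upstream job has finished.
        _, data = imap.search(None, '(UNSEEN SUBJECT "Job B finished")')
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(msg_data[0][1])
            print("Trigger received:", msg["Subject"])
            subprocess.run(["run_job_a.cmd"], check=False)   # placeholder downstream command
            imap.store(num, "+FLAGS", "\\Seen")              # mark the trigger as handled

while True:
    check_mailbox()
    time.sleep(60)   # poll once a minute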

How to start a job or process when a job on another server finished?

I want to send a mail or automatically start a job on my server as soon as a job on another server has finished successfully. I have access to the other server and can view the job status but I cannot change the job itself, which is running an SSIS package.
Basically, I want to start refreshing my database (by running an ETL job) as soon as the source has stopped refreshing itself. I would love to have suggestions besides this Windows service implementation.
I took the liberty of editing the question and title to make it more explicit. As I understand it from your description, you want to run a job (or start some other process) on server A when a job on server B has completed successfully. You cannot change the job definition on server B, but you can log on to it and view the job history.
If you can't change the job or anything else on server B, that means it cannot notify server A when the job is complete. Therefore, you need to query server B from server A, using a Windows service or possibly a simple script that runs every few minutes (or hours, or whatever is appropriate).
You can query the status of a job from .NET or PowerShell using the SMO Job class, or from T-SQL using the sp_help_job procedure. Which of these is the better solution depends on how you want to implement your polling mechanism.
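For the T-SQL route, a polling script along these lines could run on (or next to) server A. Server names, job names, and connection strings are placeholders, and the sp_help_job result columns used here should be checked against the msdb documentation.

# Hedged sketch: poll server B's Agent via sp_help_job and start the local job on
# server A once the upstream job has finished successfully.
import time
import pyodbc

SERVER_B = "Driver={ODBC Driver 17 for SQL Server};Server=serverB;Database=msdb;Trusted_Connection=yes"
SERVER_A = "Driver={ODBC Driver 17 for SQL Server};Server=serverA;Database=msdb;Trusted_Connection=yes"
UPSTREAM_JOB, LOCAL_JOB = "Refresh Source", "Refresh My DB"   # placeholder job names

def upstream_succeeded():
    with pyodbc.connect(SERVER_B) as conn:
        cursor = conn.cursor()
        cursor.execute("EXEC msdb.dbo.sp_help_job @job_name = ?, @job_aspect = 'JOB'", UPSTREAM_JOB)
        row = cursor.fetchone()
        job = dict(zip([c[0] for c in cursor.description], row))
        # last_run_outcome = 1 means the last run succeeded; current_execution_status = 4 means idle.
        return job["last_run_outcome"] == 1 and job["current_execution_status"] == 4

while True:
    if upstream_succeeded():
        with pyodbc.connect(SERVER_A, autocommit=True) as conn:
            conn.execute("EXEC msdb.dbo.sp_start_job @job_name = ?", LOCAL_JOB)
        break
    time.sleep(300)   # poll every 5 minutes

In practice you would also compare last_run_date and last_run_time against the values from the previous poll, so that an old success does not trigger the downstream job again.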