Camunda modular design - bpmn

I want to manage a huge workflow in Camunda.
I have decided to split this into different processes like Create, Configuration, Review & Confirm. Each of these processes have 10 to 15 tasks. These processes should be executed in sequence.
If I want to design my workflow like this, how will I link each process. What is the proper way for Camunda modular design.

You would probably go with some kind of sub-process. If you plan to model separate processes, you will most likely use Call Activities and execute them one after another in some kind of root process.
Be aware that each called process starts its own process instance, so you have to handle different execution scopes. That becomes relevant when you request information from the system, e.g. the list of user tasks: you cannot use the processInstanceId of the root process in that case and will have to use a businessKey instead.
You also have to handle the process variables and decide which variables you want to propagate to the sub process.
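To make the businessKey point concrete, here is a minimal sketch against the Camunda 7 REST API. The base URL, process definition key and business key are illustrative assumptions, and it presumes the Call Activities are configured to pass the business key down to the called processes (e.g. via a camunda:in businessKey mapping):

```python
import requests

BASE = "http://localhost:8080/engine-rest"  # assumption: default engine-rest endpoint

# Start the root process with a business key; each Call Activity will start
# its own process instance, so the shared business key is what ties them together.
start = requests.post(
    f"{BASE}/process-definition/key/order-root/start",  # 'order-root' is a made-up key
    json={
        "businessKey": "order-4711",
        "variables": {"customerId": {"value": "C-42", "type": "String"}},
    },
)
start.raise_for_status()

# Query user tasks across the root process and all called sub-processes.
# Filtering by the root processInstanceId would miss the sub-process tasks,
# which is why the business key is used instead.
tasks = requests.get(f"{BASE}/task", params={"processInstanceBusinessKey": "order-4711"})
tasks.raise_for_status()
for task in tasks.json():
    print(task["id"], task["name"])
```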

Related

Azure Data Factory: Execute Pipeline activity cannot reference calling pipeline, cyclical behaviour required

I have a number of pipelines that need to cycle depending on availability of data. If the data is not there, wait and try again. The pipe behaviours are largely controlled by a database which captures logs that are used to make decisions about processing.
I read the Microsoft documentation about the Execute Pipeline activity, which states that
The Execute Pipeline activity allows a Data Factory or Synapse pipeline to invoke another pipeline.
It does not explicitly state that a pipeline invoking itself is impossible, though. I tried to reference Pipe_A from Pipe_A, but the pipeline is not visible in the drop-down. I need a work-around for this restriction.
Constraints:
The pipe must not call all pipes again, just the pipe in question. The preceding pipe is running all pipes in parallel.
I don't know how many iterations are needed and cannot specify this quantity.
As far as possible, best effort has been implemented, and this pattern should continue.
Ideas:
1. Create an intermediary pipe that can be referenced. This is no good: I would need to do this for every pipe that requires this behaviour, because dynamic content is not allowed for pipe selection. This approach would also pollute the Data Factory workspace.
2. Direct the control flow backwards, after waiting, inside the same pipeline if a condition is met. This won't work either: the If activity does not allow flow to be expressed within the same context as the If activity itself.
3. Externalise this behaviour to a Python application, which could be attached to an Azure Function if needed. The application would handle the scheduling and waiting. It could call any pipe it needed and could itself be invoked by the pipe in question. This seems drastic!
4. Finally, I discovered the Until activity, which has do-while behaviour. I could wrap these pipes in an Until: the pipe executes and either finishes and sets the database state to 'finished', or cannot finish, sets the state to incomplete, and waits. The expression then either kicks off another execution or it does not. Additional conditional logic can be included as required in the procedure that sets the value of the variable used by the Until expression. I would need a variable per pipe.
I think idea 4 makes sense; I thought I would post this anyway in case people can spot limitations in this approach and/or recommend an approach.
Yes, I absolutely agree with All About BI; it seems that in your scenario the best-suited ADF activity is Until:
The Until activity in ADF functions as a wrapper and parent component for iterations, with inner child activities comprising the block of items to iterate over. The result(s) from those inner child activities must then be used in the parent Until expression to determine if another iteration is necessary.
The assessment condition for the Until activity might comprise outputs from other activities, pipeline parameters, or variables.
When used in conjunction with the Wait activity, the Until activity allows you to create loop conditions to periodically check the status of specific operations. Here are some examples:
Check to see if the database table has been updated with new rows.
Check to see if the SQL job is complete.
Check to see whether any new files have been added to a specific folder.
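For what it's worth, the control flow that the Until + Wait + Lookup combination encodes is just a polling loop. Sketched in plain Python (this is not ADF code; the function names and the 'finished' state value are assumptions based on the control database described in the question):

```python
import time


def run_pipe_once(pipe_name: str) -> None:
    # Placeholder for the work one iteration does (the Execute Pipeline /
    # stored procedure that updates the control-database state).
    print(f"running {pipe_name}")


def read_pipe_state(pipe_name: str) -> str:
    # Placeholder for the Lookup activity reading the control database;
    # returns 'finished' here so the sketch terminates when run.
    return "finished"


def until_finished(pipe_name: str, wait_seconds: int = 300) -> None:
    while True:                                   # the Until activity
        run_pipe_once(pipe_name)                  # inner child activities
        if read_pipe_state(pipe_name) == "finished":
            break                                 # Until expression is satisfied
        time.sleep(wait_seconds)                  # the Wait activity


until_finished("Pipe_A", wait_seconds=60)
```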

Hangfire - Is there a way to attach additional meta data to jobs when they are created to be able to identify them later?

I am looking to implement Hangfire within an Asp.Net Core application.
However, I'm struggling to understand how best to prevent the user from creating duplicate Hangfire "Fire-and-Forget" jobs.
The Problem
Say the user, via the app, creates a job that does some processing relating to a specific client. The process may take several minutes to complete. I want to be able to prevent the user from creating another job for the same client while there are other jobs for that client still being processed by Hangfire (i.e. there can only be 1 processing job for a specific client at any one time, although several different clients could also each have their own job being processed).
Solution?
I need a way to attach additional meta-data (in this example, the client id) to each job as it is created, which I can then use to interrogate the jobs currently processing in Hangfire to see if any of them relate to the client id in question.
It seems like such a basic feature that would prove so useful for such scenarios, but I'm coming to the conclusion that such a thing isn't supported, which surprises me.
... Unless you know different.
Hangfire looks great, and I'm keen to use it, but this might be a show-stopper for me.
Any advice would be greatly received.
Thanks
I need a way to attach additional meta-data (in this example, the client id) to each job as it is created
Adding metadata to jobs can be achieved by means of Hangfire filters.
You may have a look at this answer.
https://stackoverflow.com/a/57396553/1236044
Depending on your needs, you may use more filter types.
For example, the IElectStateFilter may be useful to filter out jobs if another one is currently processing.
If you have several processing servers, you will need your own storage solution to handle your custom currently-processing/priority/locking mechanism.
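Hangfire itself is .NET, so this is only a language-neutral sketch (in Python) of the shared-storage locking idea from the last paragraph: one "processing" slot per client id, kept somewhere all servers can see. The class and names are invented for illustration; in practice the set would live in a database table or similar shared store rather than in memory.

```python
import threading


class ClientJobGate:
    """In-memory stand-in for the shared storage (a DB table, a cache key, ...)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._processing: set[str] = set()

    def try_acquire(self, client_id: str) -> bool:
        # Equivalent to a filter deciding whether a job may start processing.
        with self._lock:
            if client_id in self._processing:
                return False  # another job for this client is already running
            self._processing.add(client_id)
            return True

    def release(self, client_id: str) -> None:
        # Called when the job succeeds, fails or is deleted.
        with self._lock:
            self._processing.discard(client_id)


gate = ClientJobGate()


def run_client_job(client_id: str) -> None:
    if not gate.try_acquire(client_id):
        print(f"job for {client_id} rejected: one is already processing")
        return
    try:
        pass  # the long-running per-client work goes here
    finally:
        gate.release(client_id)
```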

Prevent one failed subtask failing all tasks in Flyte

I have a dynamic_task which kicks off a number of python_tasks. However, as soon as one of the python_tasks fails, the other ones that are still running would fail as well. Is this by design? Is there a way to change this behavior so that other tasks can still complete without failing?
This is by design, as a means to save resources, but it is configurable. Presumably, dynamic tasks are related to each other, and downstream tasks will need the output of all of them. So if one fails, the default behavior is to fail the rest.
If you'd like to change this, pass a float for this argument when creating your dynamic task with the decorator: https://github.com/lyft/flytekit/blob/d4cfedc4c580f08bf904e6e474a0b948a4608737/flytekit/common/tasks/sdk_dynamic.py#L84
The idea is that partial failures are not tolerated within a data passing DAG. If some node fails, then by definition the data is partial.
But for dynamic array tasks, Flyte makes a special provision (via the Array task plugin) that allows users to provide an acceptable ratio of successful tasks.
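As far as I can tell, the float argument at the linked line is allowed_failure_ratio. A minimal sketch in the legacy flytekit 0.x style that the link refers to; the decorator imports, the yield pattern and the exact parameter name are assumptions to verify against your flytekit version, and the task names are made up:

```python
from flytekit.sdk.tasks import dynamic_task, inputs, outputs, python_task
from flytekit.sdk.types import Types


@inputs(index=Types.Integer)
@outputs(doubled=Types.Integer)
@python_task
def work_item(wf_params, index, doubled):
    # One of the many sub-tasks spawned by the dynamic task; any of them may fail.
    doubled.set(index * 2)


# allowed_failure_ratio=0.2 would mean up to 20% of the spawned sub-tasks may
# fail before the dynamic task itself (and the still-running siblings) is failed.
@inputs(count=Types.Integer)
@outputs(results=[Types.Integer])
@dynamic_task(allowed_failure_ratio=0.2)
def fan_out(wf_params, count, results):
    collected = []
    for i in range(count):
        task = work_item(index=i)
        yield task
        collected.append(task.outputs.doubled)
    results.set(collected)
```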

How to reflect automatic processes and processes done by different users with BPMN?

Let's say I have the following simplified process:
How should I reflect there that the data could be added not only by manual input, but could also be received from another system (without user verification)?
And is there a more correct way to display the same actions done by different users (see the Verification step done by Manager 1 or Manager 2; in reality there are many more steps than just Verification, and they are all the same in the Manager 1 and Manager 2 columns)?
Obviously there are many open questions regarding your specific requirements, so I can just give you an example:
I am using two lanes, one for the manager, one for the user. I assume that the concrete person (or sub-role) necessary to carry out the steps for the "manager" needs to be determined within the process. From a process perspective it's just one role, carried out by people with different skill sets or authorizations. I show the "Assign" task here as an automatic step, but it could also be a manual step.
A BPMN process can have several start events; I am using two of them here to show the different ways in which the process can start. I am using a collapsed pool "External System" and a message flow to indicate where the automatic message is coming from.
(Please note that BPMN processes are typically modeled from left to right, but may also be modeled from top to bottom. Also note that for more complex processes and a more fine-grained level of detail, it is often preferable to show every process participant in a separate pool with a separate process and an exchange of messages between them. Modeling one process pool with several lanes quite soon reaches practical limits!)

Spawning multiple SQL tasks in SQL Server 2005

I have a number of stored procs which I would like to all run simultaneously on the server, ideally without reliance on connections from an external client.
What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?
I have thought of:
Launching multiple connections from a client, having each start the appropriate SP.
Setting up jobs for each SP and starting the jobs from a SQL Server connection or SP.
Using xp_cmdshell to start additional runs equivalent to osql or whatever.
SSIS - I need to see if the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production.
In the job and cmdshell cases, I'm probably going to run into permissions level problems from the DBA...
SSIS could be a good option - if I can table-drive the SP list.
This is a data warehouse situation, and the work is largely independent; NOLOCK is universally used on the stars. The system is an 8-way 32 GB machine, so I'm going to load it down and scale it back if I see problems.
I basically have three layers: Layer 1 has a small number of processes and depends on basically all the facts/dimensions already being loaded (effectively, the stars are a Layer 0 - and yes, unfortunately they will all need to be loaded); Layer 2 has a number of processes which depend on some or all of Layer 1; and Layer 3 has a number of processes which depend on some or all of Layer 2. I have the dependencies in a table already, and would initially only launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
Is SSIS an option for you? You can create a simple package with parallel Execute SQL tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not get benefit from starting this in parallel (e.g. if they all access the same table records, one may have to wait for locks to be released etc.)
At one point I did some architectural work on a product known as Acumen Advantage that has a warehouse manager that does this.
The basic strategy for this is to have a control DB with a list of the sprocs and their dependencies. Based on the dependencies you can do a Topological Sort to give them an order to run in. If you do this, you need to manage the dependencies - all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
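Assuming the control DB can be flattened to simple (sproc, predecessor) rows, the ordering step is a standard topological sort. A minimal Python sketch of that step (the table shape and names are illustrative, not taken from Acumen Advantage):

```python
from collections import defaultdict, deque
from typing import Optional


def execution_order(dependencies: list[tuple[str, Optional[str]]]) -> list[str]:
    """dependencies: (sproc, predecessor) pairs; predecessor None means no dependency."""
    indegree: dict[str, int] = defaultdict(int)
    successors: dict[str, list[str]] = defaultdict(list)
    for sproc, pred in dependencies:
        indegree.setdefault(sproc, 0)
        if pred is not None:
            indegree[sproc] += 1
            successors[pred].append(sproc)
            indegree.setdefault(pred, 0)

    # Kahn's algorithm: repeatedly emit sprocs whose predecessors have all been emitted.
    ready = deque(s for s, d in indegree.items() if d == 0)
    order: list[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("cyclic dependency in the control table")
    return order
```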
Implementing this meant knocking much of the SSIS functionality on the head and implementing another scheduler. This is OK for a product but probably overkill for a bespoke system. A simpler solution is thus:
You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as Subject Oriented ETL) where a single SSIS package and set of sprocs takes the data from extraction through to producing dimensions or fact tables. Typically the dimensions will mostly be siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.
Each loader becomes relatively modular and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler work it out. The dependencies will contain some redundancy. For example an ODS table may not be dependent on a dimension load being completed but the upstream package itself takes the components right through to the dimensional schema before it completes. However this is not likely to be an issue in practice for the following reasons:
The load process probably has plenty of other tasks that can execute in the meantime
The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.
You can construct the SSIS packages so they pick up all of their configuration from an XML file, and the location can be supplied externally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M.
This means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures and can maintain the config files on a per-environment basis without having to manually fiddle with configuration in the SSIS packages.
You might want to look at Service Broker and its activation stored procedures... might be an option...
In the end, I created a C# management console program which launches the processes asynchronously as they become able to run and keeps track of the connections.
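For anyone landing here later, a rough Python analogue of that kind of launcher (this is not the original C# program; the connection string, driver and proc names are assumptions): run everything in one layer concurrently on separate connections, wait for the layer to drain, then move on to the next layer.

```python
import concurrent.futures

import pyodbc

# Assumption: adjust driver, server and database to your environment.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dw-server;DATABASE=dw;Trusted_Connection=yes"
)


def run_proc(proc_name: str) -> str:
    # Each proc gets its own connection so the layer genuinely runs in parallel.
    conn = pyodbc.connect(CONN_STR, autocommit=True)
    try:
        conn.execute(f"EXEC {proc_name}")
    finally:
        conn.close()
    return proc_name


def run_layer(proc_names: list[str], max_workers: int = 8) -> None:
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        for finished in pool.map(run_proc, proc_names):
            print(f"{finished} completed")


if __name__ == "__main__":
    # Layers come from the dependency table; procs within a layer are orthogonal,
    # so each layer can be launched as one parallel batch.
    layers = [
        ["dbo.Load_Fact_Sales", "dbo.Load_Fact_Inventory"],  # Layer 1 (made-up names)
        ["dbo.Load_Agg_SalesByRegion"],                      # Layer 2
    ]
    for layer in layers:
        run_layer(layer)
```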