Custom process state in a Zeebe workflow instance


Is there a way to attach/retrieve a custom state (a string value) to/from a running Zeebe workflow?
Example: consider a canonical charge-credit-card workflow in Zeebe.
Start -> ChargeCreditCard (service task) -> End
The ChargeCreditCard task is modelled as an external task, with a Node.js worker app listening on a topic for it. Assuming this Node.js app takes 1 minute to execute and complete, I'd like to define/attach 2 custom state names to this model.
State # 1. charging-credit-card (before 1 min)
State # 2. credit-card-charged-successfully (after 1 min)
so that if someone retrieves the state of a running workflow instance via the zeebe rest api, they get state # 1 before the execution and state # 2 after 1 minute when the nodejs worker is done.
My question: is there a native way to do this in Zeebe using the standard BPMN objects? If not, are there any workarounds to achieve the same?

I think the previous answer can be improved by emphasizing that querying workflow state is not supported by design, for the purpose of scalability. So you should not even want to query the Zeebe broker/engine for its internal state. Let it handle its own state in isolation, and limit yourself to the artifacts the Zeebe broker publishes or exports asynchronously about its state.
What I do not agree with is cluttering BPMNs with service workers that have no functional meaning but serve as technical workarounds to achieve a goal you should not be pursuing in the first place. That defeats the purpose of using BPMN, which is to clarify and orchestrate your process flow.
The solution that I am now working on, and will open source in a month or so, is roughly as follows.
- A modern Javascript user interface that connects with an API server and a socket server
- An API server exposing a RESTful interface for creating workflow instances, putting data into workflow instances, etc.
- A zeebe installation with a kafka exporter
- A socket server that subscribes to kafka topics related to workflow instance events (using kafkajs in my case), processes kafka messages and emits processed data over a socket back to the JavaScript front-end application
A rudimentary proof-of-concept can be found here https://gitlab.com/werk-en-inkomen/zeebe-kafka-socket . The neater and more elaborate, fully Dockerized solution will follow soon.
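The processing step in the socket server can be sketched as follows. This is only an illustration of the idea, not code from the linked repository: the record field names (`valueType`, `intent`, `value.workflowInstanceKey`, `value.elementId`) are assumptions about what an exported record might look like, and `emit` is a hypothetical callback standing in for the socket.

```python
import json
from typing import Callable, Dict, Optional


def process_record(raw: str,
                   emit: Callable[[dict], None],
                   states: Dict[int, dict]) -> Optional[dict]:
    """Reduce one exported record to a small state update and emit it.

    All field names below are assumed for illustration only.
    """
    record = json.loads(raw)
    # Ignore records that are not about workflow instances.
    if record.get("valueType") != "WORKFLOW_INSTANCE":
        return None
    value = record.get("value", {})
    update = {
        "instanceKey": value.get("workflowInstanceKey"),
        "elementId": value.get("elementId"),
        "intent": record.get("intent"),
    }
    # Keep only the most recent update per workflow instance.
    states[update["instanceKey"]] = update
    # Push the update to the front-end (socket stand-in).
    emit(update)
    return update
```

In a real setup the Kafka consumer would call `process_record` once per message, and `states` could be used to answer "what is this instance doing right now?" without ever querying the broker.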

There is no Zeebe REST API. Nor is there any Zeebe gRPC Query API to retrieve the state of a running workflow instance (at least not in any release supported for production; it was removed in 0.18). There is discussion about getting updates on running workflows in this feature request: "Awaitable Workflow Outcomes".
So there is currently no way to query a workflow state. You have to broadcast/message it out from a worker.
You could achieve what you want to do by putting the "Charge Credit Card" Service Task in a sub-process, and put non-interrupting boundary event timers on the sub-process to trigger the state updates via service tasks.
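Broadcasting the state from the worker itself could look roughly like the sketch below. It does not use a Zeebe client: `StatePublisher` is a hypothetical in-memory stand-in for whatever channel you choose (message broker, database, socket), `handle_charge_job` is a hypothetical job handler, and the two state names are the ones from the question.

```python
from typing import Callable, Dict, List, Tuple

# State names taken from the question.
CHARGING = "charging-credit-card"
CHARGED = "credit-card-charged-successfully"


class StatePublisher:
    """In-memory stand-in for a real channel (assumption, for illustration)."""

    def __init__(self) -> None:
        self.latest: Dict[str, str] = {}
        self.history: List[Tuple[str, str]] = []

    def publish(self, instance_key: str, state: str) -> None:
        # Remember the latest state per instance and keep a full history.
        self.latest[instance_key] = state
        self.history.append((instance_key, state))


def handle_charge_job(instance_key: str,
                      charge: Callable[[], None],
                      publisher: StatePublisher) -> None:
    """Hypothetical job handler: announce state before and after the work."""
    publisher.publish(instance_key, CHARGING)
    charge()  # the long-running call to the payment provider
    publisher.publish(instance_key, CHARGED)
```

Anyone interested in the state then reads it from the channel the worker publishes to, not from the broker. A failure path (publishing a third state when `charge` raises) is left out of the sketch.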

Related

How to organize scheduled data polling during the application scaling?

I have a microservice that among other things is used as a "caching proxy" (I'm not sure that this term is correct). It is in between the application API and Azure API. This microservice periodically fetches some data from Azure for several resources and stores it in Redis. Application API from the other side requests the resource data but reads it not from Azure itself, but from Redis.
(This is done in order to limit the scale of requests hitting the Azure API when having a high load on the application API.)
The periodical polling is currently implemented as a naive "while not canceled - fetch, update Redis and sleep for 15 seconds".
This worked well while I had only one instance of the microservice. But now due to new requirements, I have an automatic scaling of my microservice. And that means that if there are 5 instances of the microservice running right now - I'm hitting the Azure API 5 times more frequently than I should.
My question is how can I fix this to do "one request to Azure API per resource once in 15 seconds" - no matter how many microservice instances I have?
My constraints are:
do the minimal changes since the microservice is already in Production;
use the existing resources as much as possible (apart from Redis the microservice is already using message queues - Azure Service Bus).
Ideas I have:
make only one instance a "master" - only this instance will fetch data from Azure. But what should I do when auto-scaling shuts this instance down? How can I detect this and decide on a new master instance? Maybe I could store the master instance identifier in a short-living key in Redis and prolong it every time the resource data is retrieved from Azure? If there is no key in Redis - a new master instance is selected.
use Azure Service Bus message scheduling - on microservice application startup the instance schedules a message in the next 15 seconds which will be received by only one microservice instance. On receiving this message the microservice instance will fetch the data from Azure, update Redis - and schedule another message in the next 15 seconds. This time another microservice instance can receive the message and do the same - fetch data, update Redis, and schedule the next message. But I don't know how to avoid parallel message chains initiated when several microservice instances are started/restarted.
Anyway, I don't see any good solution for my problem and would appreciate a hint.
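A variation on the short-living Redis key idea can be sketched like this: instead of electing a long-lived master, every instance keeps its existing loop but tries to take a per-resource lock with a 15-second expiry before each fetch, so only one instance wins each window. The sketch uses an in-memory `LeaseStore` as a stand-in for Redis `SET key value NX PX ttl`; `LeaseStore`, `poll_once` and the `poll-lock:` key prefix are hypothetical names for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class LeaseStore:
    """In-memory stand-in for Redis `SET key value NX PX ttl` (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set_if_absent(self, key: str, value: str, ttl_seconds: float) -> bool:
        now = self._clock()
        current = self._data.get(key)
        # The key counts as present only while its expiry lies in the future.
        if current is not None and current[1] > now:
            return False
        self._data[key] = (value, now + ttl_seconds)
        return True


def poll_once(store: LeaseStore, resource: str, instance_id: str,
              fetch: Callable[[str], None], interval: float = 15.0) -> bool:
    """Fetch the resource only if this instance wins the lock for this window."""
    if not store.set_if_absent(f"poll-lock:{resource}", instance_id, interval):
        return False  # another instance already fetched within the interval
    fetch(resource)
    return True
```

Because the lock simply expires, an instance that is shut down by auto-scaling needs no special handling: whichever instance tries first after the expiry takes over.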

Microservices Why Use RabbitMQ?

I haven't found an existing post asking this but apologize if I missed it.
I'm trying to get my head round microservices and have come across articles where RabbitMQ is used. I'm confused why RabbitMQ is needed. Is the intention that the services will use a web api to communicate with the outside world and RabbitMQ to communicate with each other?

In Microservices architecture you have two ways to communicate between the microservices:
Synchronous - that is, each service directly calls the other microservice, which results in a dependency between the services
Asynchronous - you have some central hub (or message queue) where you place all requests between the microservices, and the corresponding service takes the request, processes it and returns the result to the caller. This is what RabbitMQ (or any other message queue - MSMQ and Apache Kafka are good alternatives) is used for. In this case all microservices know only about the existence of the hub.
microservices.io has some very nice articles about using microservices

A message queue provides an asynchronous communications protocol - you have the option to send a message from one service to another without having to know whether the other service is able to handle it immediately or not. Messages can wait until the responsible service is ready. A service publishing a message does not need to know anything about the inner workings of the services that will process that message. This way of handling messages decouples the producer from the consumer.
A message queue will keep the processes in your application separated and independent of each other; this way of handling messages could create a system that is easy to maintain and easy to scale.
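The decoupling described above can be illustrated with a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a broker such as RabbitMQ, and the message names are made up; the point is only that the producer publishes before any consumer is running, and the messages wait.

```python
import queue
import threading
from typing import List


def consumer(jobs: "queue.Queue", results: List[str]) -> None:
    """Process messages until a None sentinel arrives."""
    while True:
        message = jobs.get()
        if message is None:
            break
        results.append(f"processed {message}")


jobs: "queue.Queue" = queue.Queue()
results: List[str] = []

# The producer publishes while no consumer exists yet; the messages just wait.
for name in ("order-1", "order-2", "order-3"):
    jobs.put(name)

# The consumer starts later and works through the backlog in order.
worker = threading.Thread(target=consumer, args=(jobs, results))
worker.start()
jobs.put(None)  # sentinel: tell the consumer to stop
worker.join()
```

The producer never references the consumer, only the queue, which is the same relationship services have with the broker.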
Simply put, two obvious cases can be used as examples of when message queues really shine:
For long-running processes and background jobs
As the middleman in between microservices
For long-running processes and background jobs:
When requests take a significant amount of time, it is the perfect scenario to incorporate a message queue.
Imagine a web service that handles multiple requests per second and cannot under any circumstances lose one. Plus the requests are handled through time-consuming processes, but the system cannot afford to be bogged down. Some real-life examples could include:
Image scaling
Sending large/many emails (like newsletters)
Search engine indexing
File scanning
Video encoding
Delivering notifications
PDF processing
Calculations
The middleman in between microservices:
For communication and integration within and between applications, i.e. as the middleman between microservices, a message queue is also useful. Think of a system that needs to notify another part of the system to start to work on a task or when there are a lot of requests coming in at the same time, as in the following scenarios:
Order handling (Order placed, update order status, send an order, payment, etc.)
Food delivery service (Place an order, prepare an order, deliver food)
Any web service that needs to handle multiple requests
Here is a story explaining how Parkster (a digital parking service) are breaking down their system into multiple microservices by using RabbitMQ.
This guide follows a scenario where a web application allows users to upload information to a web site. The site will handle this information, generate a PDF and email it back to the user. Handling the information, generating the PDF and sending the email will in this example case take several seconds, and that is one of the reasons why a message queue will be used.
Here is a story about how and why CloudAMQP used message queues and RabbitMQ between microservices.
Here is a story about the usage of RabbitMQ in an event-based microservices architecture to support 100 million users a month.
And finally a link to Kontena, about why they chose RabbitMQ for their microservice architecture: "Because we needed a stable, manageable and highly-available solution for messaging.".
Please note that I work for the company behind CloudAMQP (hosting provider of RabbitMQ).

The same question could be asked about REST: why is REST necessary for microservices? The microservice concept is nothing new under the sun. Distributing a workflow across components has long been used in backend engineering and asynchronous request processing. A microservice is the same kind of component running in a separate JVM, which matches the S (single responsibility) in SOLID. What makes it a micro SERVICE is that it is load-balanced, and that is all! In particular (!), it can be a REST service built on Spring Cloud, registered with Eureka, with a proxy gateway and load balancing via Zuul and Ribbon. But that is not the whole world of microservices! By the way, asynchronous distributed processing is one of the tasks microservices are used for. A long time ago, services (components) in separate JVMs were integrated over messaging, and that pattern is known as ESB. Microservices are subject to the same pattern. Because Spring Cloud is fashionable, REST seems like the only way to build microservices. Nope! Message-based asynchronous microservice architecture is supported by Vert.x, for example: https://dzone.com/articles/asynchronous-microservices-with-vertx. Why not use RabbitMQ as the message channel? In that case load balancing can be provided by building a RabbitMQ cluster. For example: https://codeburst.io/using-rabbitmq-for-microservices-communication-on-docker-a43840401819. So the world is much wider than REST.

On Heroku, does utilising Node.js prevent the need for queues + worker dynos for third-party API calls?

The Heroku Dev Center on the page about using worker dynos and background jobs states that you need to use worker's + queues to handle API calls, such as fetching an RSS feed, as the operation may take some time if the server is slow and doing this on a web dyno would result in it being blocked from receiving additional requests.
However, from what I've read, it seems to me that one of the major points of Node.js is that it doesn't suffer from blocking under these conditions due to its asynchronous event-based runtime model.
I'm confused because wouldn't this imply that it would be ok to do API calls (asynchronously) in the web dynos? Perhaps the docs were written more for the Ruby/Python/etc use cases where a synchronous model was more prevalent?

NodeJS is an implementation of the reactor pattern. The default build of NodeJS uses 5 reactors. Once these 5 reactors are being used for IO bound tasks, the main event loop will block.
A common misconception about NodeJS is that it is a system that allows you to do many things at once. This is not necessarily the case: it allows you to do other things while waiting on IO bound tasks, up to 5 at a time.
Any CPU bound tasks are always executed in the main event loop, meaning they will block.
This means if your "job" is IO bound, like putting things in databases, then you can probably get away with not using worker dynos. This of course depends on how many things you plan on having going on at once. Remember, any task you put in your main app will take away resources from other incoming requests.
Generally this is not recommended. If you have a job that does some processing, it belongs in a queue that is executed in its own process or thread.

Long running workflow in asp.net mvc

I'm developing an intranet site using asp.net mvc4 to manage some of our data. One important feature of this site is to trigger import/export jobs. These jobs can take anywhere between 5 minutes to 1 hour. Users of the site need to be able to determine whether a job is currently running as well as the status of prior jobs. Many jobs will often include warning messages concerning duplicate data and these warnings need to be visible on the site.
My plan is to implement these long running processes as a WCF Workflow Service that the asp.net site will interact with. I've got much of the business logic implemented via activities and have tested it using a simple console application. I should note I'm using a correlation handle in order to partition the service based on specific "Projects" on the site.
My problem is how do I go about querying the status of an active job (if one exists) as well as the warning messages of previous jobs. I suspect the best way to do this would be to use the AppFabric tracking service and have my asp.net site query a SQL monitoring store and report back on the current status. After setting up AppFabric and adding custom tracking messages, I ran into a few issues. My first issue is that I cannot figure out how to filter out workflow instances that were not using the correct correlation handle, as I'd like to show only workflows for a specific project. The other issue is that the tracking database can be delayed quite a bit, which causes issues for me when trying to determine if a workflow is currently running.
Another possible solution could be to have the workflow explicitly update a database with its current status and any error messages. I'm leaning towards this solution but could use some expert advice.
TL;DR: I need to know the best way to query the execution status and any warning messages of a WCF Workflow service.

As you want to query workflow status and messages even after the workflow is finished, I would start by creating a table where you can convert the correlation values a client sends to the related workflow ID. I would create a custom activity to do that and drop it right after the receive that creates the workflow.
Next I would create a regular WCF service the client app uses to query the status. This WCF service can query the WF persistence store to see if a given workflow is still running. If so the active bookmarks column will tell you what SOAP messages the workflow is currently waiting for.
As far as messages go you can either use the AppFabric tracking infrastructure to store and retrieve them or you could create a custom activity and store them in your own database. It really depends if you are also interested in the standard WF tracking messages generated.
Update on checking for running workflow instances:
There are several downsides to adding an IsRunning message to your workflow. For one, you would need to make sure one branch keeps looping and waiting for the message but stops as soon as the other, real workflow branch is done. Certainly possible, but it complicates the workflow and is a possible source of errors. And as it is not part of the business problem, it really has no place in the workflow as far as I am concerned. It also means that you will have to load a workflow from disk and persist it back just to tell you that it is there. If it was finished you will need to wait for a fault to indicate there was no workflow instance. And that usually means you get a timeout exception after, by default, 60 seconds. Add throttling to that and your request might be queued because there are too many other workflow instances or SOAP requests being processed. So a timeout might mean that a workflow instance exists but is unreachable due to system constraints. Instead I would opt for the simple thing and check if the record in the instance store is still available. The additional info from the active bookmarks column will tell you what the workflow is waiting on, information I have used in the past to dynamically update the UI by enabling/disabling UI elements.

NServiceBus Sagas and REST API Integration best-practices

What is the most sensible approach to integrate/interact NServiceBus Sagas with REST APIs?
The scenario is as follows,
We have a load balanced REST API. Depending on the load we can add more nodes.
REST API is a wrapper around a DomainServices API. This means the API can be consumed directly.
We would like to use Sagas for workflow and implement NServiceBus Distributor to scale-out.
Question is, if we use the REST API from Sagas, the actual processing happens in the API farm. This in a way defeats the purpose of implementing distributor pattern.
On the other hand, using the DomainServices API directly from Sagas allows processing locally within worker nodes. With this approach we will have to maintain API assemblies in multiple locations, but the throughput could be higher.
I am trying to understand the best approach. Personally, I’d prefer to consume the API (if readily available), but this could introduce chattiness to the system and could take longer to complete as compared to in-process.
A typical sequence could be similar to publishing an online advertisement,
Advertiser submits a new advertisement request via a web application.
Web application invokes the relevant API endpoint and sends a command message.
Command message initiates a new publish advertisement Saga instance.
Saga sends a command to validate caller permissions (in process/out of process API call)
Saga sends a command to validate the advertisement data (in process/out of process API call)
Saga sends a command to the fraud service (third party service)
Once the content and fraud verifications are successful, Saga sends a command to the billing system.
Saga invokes an API call to save ad details. (in process/out of process API call)
And this goes on until the advertisement is expired, there are a number of retry and failure condition paths.

After a number of design iterations we came up with the following guidelines,
Treat REST API layer as the integration platform.
Assume API endpoints are capable of abstracting fairly complex micro work-flows. Micro work-flows are operations that execute in a single burst (not interruptible) and complete within a short time span (<1 second).
Assume API farm is capable of serving many concurrent requests and can be easily scaled-out.
Favor synchronous invocations over asynchronous message based invocations when the target operation is fairly straightforward.
When asynchronous processing is required use a single message handler and invoke API from the handlers. This will delegate work to the API farm. This will also eliminate the need for a distributor and extra hardware resources.
Avoid Sagas unless the business work-flow contains multiple transactions, compensation logic and resumes. Tests reveal Sagas do not perform well under load.
Avoid consuming DomainServices directly from a message handler. This will do the work locally and also introduces a deployment hassle by distributing business logic.
Happy to hear your thoughts.

You are right on with identifying that you will need Sagas to manage workflow. I'm willing to bet that your Domain hooks up to a common database. If that is true then it will be faster to use your Domain directly and remove the serialization/network overhead. You will also lose the ability to easily manage the transactions at the database level.
Assuming you are directly calling your Domain, the performance becomes a question of how the Domain performs. You may take steps to optimize the database, drive down distributed transaction costs, shard the data, etc. You may end up using the Distributor to have multiple Saga processing nodes, but it sounds like you have some more testing to do once a design is chosen.
Generically speaking, we use REST APIs to model the commands as resources (via POST) to allow interaction with NSB from clients who don't have direct access to messaging. This is a potential solution to get things onto NSB from your web app.
