Correlation between BPMN event and trigger - bpmn

I have been reading up on BPMN 2.0. Can someone please tell me the difference between an event and a trigger, and how the two correlate from a BPMN perspective?

Generally, this question can be answered in two ways.
Simplified, with all secondary details dropped:
An event answers the question: WHAT happened?
A trigger answers the question: WHY did it happen?
Detailed, with all the required details:
An event is something that happens within the context of the process and changes the process state, data objects, or process flow.
The fundamental events of any process are:
Start event
End event
Intermediate event, i.e. something really significant
The first two determine whether the process is still running or has stopped.
Triggers are much more versatile entities, also called Event Definitions, and are meant to be caught by events, i.e. to activate them. A trigger can be described as a set of conditions which, when true, fire an event. Triggers can be:
Message triggers. They fire when a message is received.
Timer triggers. They fire when a timer goes off.
Conditional triggers. They fire when certain conditions are met.
Escalation triggers. They fire when the process is escalated.
...and several more (for example error, signal, compensation, cancel, link, and terminate).
The exact set of available triggers depends on the concrete BPMS and would need a separate article of its own.
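As a rough mental model only (this is not the API of BPMN or of any real BPMS), an event can be seen as a node that is activated when its trigger, i.e. its event definition, evaluates to true for some incoming stimulus. The `Trigger` and `Event` classes and the stimulus dictionaries below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trigger:
    """Hypothetical event definition: WHY an event fires."""
    kind: str                        # e.g. "message", "timer", "conditional"
    matches: Callable[[dict], bool]  # condition over an incoming stimulus

@dataclass
class Event:
    """Hypothetical catching event: WHAT happened."""
    name: str
    position: str                    # "start" or "intermediate"
    trigger: Trigger

    def catch(self, stimulus: dict) -> bool:
        """Return True if the stimulus satisfies this event's trigger."""
        return self.trigger.matches(stimulus)

order_received = Event(
    name="Order received",
    position="start",
    trigger=Trigger("message", lambda s: s.get("type") == "message" and s.get("name") == "Order"),
)
payment_overdue = Event(
    name="Payment overdue",
    position="intermediate",
    trigger=Trigger("timer", lambda s: s.get("type") == "timer" and s.get("elapsed_days", 0) >= 14),
)

print(order_received.catch({"type": "message", "name": "Order"}))   # True
print(payment_overdue.catch({"type": "timer", "elapsed_days": 3}))  # False
```

The same event type can be paired with different triggers; the pairing is what turns "something happened" into "this happened because a message arrived" or "because 14 days elapsed".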

Related

Use cases of Event Sourcing, when we don't care about past states

I have been reading about the Event Sourcing pattern, and I have seen it used in projects I have worked on, but I have yet to see any benefit from it, while it makes the design much more complicated.
That is, many sources mention that Event Sourcing is good if you want an audit log or want to be able to reconstruct the state from 15 days ago, and I see that Event Sourcing solves all of that beautifully. But apart from that, what is the point?
Yes, I can imagine that in the relational world writes are comparatively slow, as they lock the data and so on. But that problem is much easier to solve by going NoSQL and using something like Cassandra. Cassandra's writes are super fast because they are append-only (kind of a temporary event source), and it scales beautifully as well. Sources also mention that Event Sourcing helps scaling: how on earth can it help you scale when, instead of storing ~1 row of data per user, you now have 9000, and instead of retrieving that single row, you are now replaying 9000 rows (or fewer, if you complicate the design even more by adding temporal snapshots of state and replaying the current state from the last snapshot)?
Any examples of real-life problems that Event Sourcing solves, or links to them, would be much appreciated.
While I haven't implemented a distributed, event-sourced sub-system as yet (so I'm no expert), I have been researching and evaluating the approach. Event sourcing provides a number of key benefits:
Reliability
Scalability
Evolvability
Audit
I'm sure there are more. To a large extent, the benefits of event sourcing depend on the baseline you are comparing it against (CRUD, event-driven DDD, CQRS, or whatever), and the domain.
Let's look at each of those in turn:
Reliability
With event-driven systems that fire events whenever the system is updated, you often have a problem: how do you both update the system state and fire the event in one go? If the second operation fails, your system is left in a broken, inconsistent state. Event sourcing provides a neat solution to this, since the system requires only a single operation for the state change, which either succeeds or fails atomically: writing the event. Other solutions, such as two-phase commit, tend to be more complex and less scalable.
This is a big benefit in a large, high transaction system, where components are failing, being updated or replaced all the time while transactions are going on. The ability to terminate a process at any time without any worry about data corruption or consistency is a big benefit and helps you sleep at night.
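A minimal in-memory sketch of the "single write" idea, assuming a bank-account-like stream (the `EventLog` class and event shapes are hypothetical; a real store would persist and replicate the log). The only state-changing operation is the append; the current state is derived by replaying the events, and subscribers learn about changes by reading the same log, so there is no second write that can fail.

```python
from typing import Dict, List

class EventLog:
    """Hypothetical in-memory event log; the append is the only write."""
    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}

    def append(self, stream_id: str, event: dict) -> int:
        """Append one event and return the stream's new length (position)."""
        events = self._streams.setdefault(stream_id, [])
        events.append(event)  # in a real store, this is the single atomic, durable write
        return len(events)

    def read(self, stream_id: str, from_position: int = 0) -> List[dict]:
        return list(self._streams.get(stream_id, [])[from_position:])

def current_balance(events: List[dict]) -> int:
    """Rebuild the state by replaying (folding) the events in order."""
    balance = 0
    for e in events:
        if e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return balance

log = EventLog()
log.append("account-1", {"type": "Deposited", "amount": 100})
log.append("account-1", {"type": "Withdrawn", "amount": 30})
print(current_balance(log.read("account-1")))  # 70

# A subscriber that has processed one event catches up from position 1:
# "update the state" and "publish the event" are the same append.
print(log.read("account-1", from_position=1))  # [{'type': 'Withdrawn', 'amount': 30}]
```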
In many domains you won't have concurrent writes to the same entities, or you won't require events since a state change has no knock-on effects, in which case event sourcing is unlikely to be a good approach, and simpler approaches like CRUD may be fine.
Scalability
First of all, event streams make consistent writes very efficient - it's just an append only log, which makes replication and 'compare and set' simple to optimise. Something like Cassandra is quite slow in the scenario where you need to protect your invariants - that is, you need to validate a command against the current state of a 'row', and reject the update if the row changes before you have a chance to update it. You either need to use 'lightweight transactions' to ensure consistency, or have a single writer thread per partition, so that you can be sure that you can successfully validate a command against the current state of the system before allowing the update. Of course you can implement an event store in Cassandra, using either of these approaches (single thread/lightweight transactions).
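The "compare and set" append can be sketched as an append that carries the stream version the command was validated against; if another writer got there first, the append is rejected and the command can be retried against the new state. This is an in-memory illustration with hypothetical names (`EventStore`, `expected_version`); in a real store the version check and the append must happen atomically, e.g. via a single writer per partition or a lightweight transaction.

```python
from typing import Dict, List

class ConcurrencyError(Exception):
    pass

class EventStore:
    """Hypothetical store with optimistic concurrency on each stream."""
    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}

    def read(self, stream_id: str) -> List[dict]:
        return list(self._streams.get(stream_id, []))

    def append(self, stream_id: str, expected_version: int, events: List[dict]) -> None:
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:  # someone appended since we validated
            raise ConcurrencyError(f"expected {expected_version}, found {len(stream)}")
        stream.extend(events)

def balance_of(events: List[dict]) -> int:
    return sum(e["amount"] if e["type"] == "Deposited" else -e["amount"] for e in events)

def withdraw(store: EventStore, account: str, amount: int) -> None:
    events = store.read(account)
    if balance_of(events) < amount:  # invariant: no overdraft
        raise ValueError("insufficient funds")
    store.append(account, len(events), [{"type": "Withdrawn", "amount": amount}])

store = EventStore()
store.append("acc", 0, [{"type": "Deposited", "amount": 100}])

# Two writers validate a withdrawal of 80 against the same version (1).
stale_version = len(store.read("acc"))
withdraw(store, "acc", 80)  # the first writer wins; the stream is now at version 2
rejected = False
try:
    store.append("acc", stale_version, [{"type": "Withdrawn", "amount": 80}])
except ConcurrencyError as err:
    rejected = True
    print("rejected:", err)  # rejected: expected 1, found 2
print(balance_of(store.read("acc")))  # 20
```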
Read scalability is the biggest performance benefit though - since you can build as many different eventually consistent projections (views) on the data as you want by reading from event streams, and horizontally scale query services on these views as much as you want. These views can use custom databases (Cassandra, graph databases) as necessary to allow queries to be optimised as much as you want. They can store denormalised data, to allow all required data to be fetched in a single (non-joined) database query. They can even store the projected state in memory, for maximum performance. While this can potentially be achieved without event sourcing, it is much more complex to implement.
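To make the "many projections from one stream" point concrete, here is a sketch with two independent read models fed by the same hypothetical order events: one denormalised per customer, one keeping a running revenue total. Each could run as its own service with its own database; here they are just in-memory objects.

```python
from collections import defaultdict
from typing import Dict, List

events: List[dict] = [
    {"type": "OrderPlaced", "order": "o1", "customer": "alice", "total": 40},
    {"type": "OrderPlaced", "order": "o2", "customer": "bob", "total": 25},
    {"type": "OrderPlaced", "order": "o3", "customer": "alice", "total": 10},
    {"type": "OrderCancelled", "order": "o2"},
]

class OrdersByCustomer:
    """Denormalised view: customer -> open order ids (no joins needed to query)."""
    def __init__(self) -> None:
        self.view: Dict[str, List[str]] = defaultdict(list)
        self._owner: Dict[str, str] = {}

    def apply(self, e: dict) -> None:
        if e["type"] == "OrderPlaced":
            self.view[e["customer"]].append(e["order"])
            self._owner[e["order"]] = e["customer"]
        elif e["type"] == "OrderCancelled":
            self.view[self._owner[e["order"]]].remove(e["order"])

class RevenueTotal:
    """A second view over the same events, optimised for a different query."""
    def __init__(self) -> None:
        self.total = 0
        self._totals: Dict[str, int] = {}

    def apply(self, e: dict) -> None:
        if e["type"] == "OrderPlaced":
            self._totals[e["order"]] = e["total"]
            self.total += e["total"]
        elif e["type"] == "OrderCancelled":
            self.total -= self._totals.pop(e["order"])

by_customer, revenue = OrdersByCustomer(), RevenueTotal()
for e in events:  # each projection reads the stream independently
    by_customer.apply(e)
    revenue.apply(e)

print(dict(by_customer.view))  # {'alice': ['o1', 'o3'], 'bob': []}
print(revenue.total)           # 50
```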
If you don't have complex querying and high scalability requirements, event sourcing may not be the right solution.
Evolvability
If you need to look at your data in a new way, say you create a new client app or screen in an app, it's very easy to add new projections of the event streams as new, independent services. If you need to add some data to an existing read view that you missed, or fix a bug in the read view, you can just rebuild the views using the event streams and throw away the old ones. The advantages here vs. the non-event sourced case are:
You don't need to write both DB migration code and code to keep the view up to date as events come in. Instead, you just write the code to keep it up to date, and run it on the events from the start of time.
Related to this, you can do the update without having to bring down the query service to do a schema change - instead, just leave the old service version running against the old DB, generate a new DB with the new service version, and when it's caught up with the event streams, just atomically switch over then clean up the old service and DB once you're happy the new one is stable (noting that the old service will be keeping itself up to date in the meantime, if you need to roll back!). This is likely to be extremely difficult to achieve without event sourcing.
If you need any temporal information to be added to your views (e.g. when was the last update, when was this created), that's already available and easy to add, but impossible to add retrospectively without event sourcing.
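A sketch of rebuilding a view from the start of time, assuming each event carries a timestamp (the `at` field and the customer events are hypothetical). The new view version adds `created` and `last_updated` columns the old view never stored, is built while the old one keeps serving, and readers are then switched over.

```python
from typing import Dict, List

stream: List[dict] = [
    {"type": "CustomerRegistered", "id": "c1", "name": "Ann", "at": "2016-01-03"},
    {"type": "CustomerRenamed", "id": "c1", "name": "Anne", "at": "2016-02-10"},
    {"type": "CustomerRegistered", "id": "c2", "name": "Bo", "at": "2016-03-01"},
]

def build_customer_view_v2(events: List[dict]) -> Dict[str, dict]:
    """Replay every event from the beginning; no migration script needed."""
    view: Dict[str, dict] = {}
    for e in events:
        if e["type"] == "CustomerRegistered":
            view[e["id"]] = {"name": e["name"], "created": e["at"], "last_updated": e["at"]}
        elif e["type"] == "CustomerRenamed":
            view[e["id"]]["name"] = e["name"]
            view[e["id"]]["last_updated"] = e["at"]
    return view

old_view = {"c1": {"name": "Anne"}, "c2": {"name": "Bo"}}  # v1 had no timestamps
active_views = {"customers": old_view}
new_view = build_customer_view_v2(stream)  # built while v1 keeps serving queries
active_views["customers"] = new_view       # switch readers once v2 has caught up
print(active_views["customers"]["c1"])
# {'name': 'Anne', 'created': '2016-01-03', 'last_updated': '2016-02-10'}
```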
Note that the above isn't about modifying event streams (which is trickier; see my comment on challenges below). It's about using the existing event streams to enhance a view or create a new one.
There are simple ways to do this without event sourcing, such as using database views (with an RDBMS), but they aren't as scalable.
Event sourcing also has some challenges for evolvability - you need to take care of event versioning, probably using a combination of weak event schema (so you can add properties with default values) and stream replacement (when you want to do a bigger change to your events). Greg Young is writing a good book on this.
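The "weak schema" part of event versioning can be sketched as a tolerant reader: missing properties take defaults and unknown ones are ignored, so events written by older or newer code versions still deserialize. `ItemAdded` and its `currency` field are hypothetical; stream replacement for bigger changes is not covered here.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

@dataclass
class ItemAdded:
    sku: str
    quantity: int
    currency: str = "EUR"  # hypothetical property added in a later version, with a default

def deserialize(raw: Dict[str, Any]) -> ItemAdded:
    """Keep only known keys; dataclass defaults fill in anything missing."""
    known = {f.name for f in fields(ItemAdded)}
    return ItemAdded(**{k: v for k, v in raw.items() if k in known})

old_event = {"sku": "A-1", "quantity": 2}                                  # written before `currency` existed
new_event = {"sku": "B-7", "quantity": 1, "currency": "USD", "note": "x"}  # carries an extra, unknown key
print(deserialize(old_event))  # ItemAdded(sku='A-1', quantity=2, currency='EUR')
print(deserialize(new_event))  # ItemAdded(sku='B-7', quantity=1, currency='USD')
```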
Audit
As you mentioned, you're not interested in this.

(Fluent) NHibernate progress events for lengthy transactions?

We've hooked up the ISaveOrUpdateEventListener event and hoped we could tie it to a progress bar update for each node visited during the save traversal of a pretty big model, BUT the event only fires once, when the save operation starts (only on the node on which Save() was initiated, not on any sub-nodes).
Are there any other events that are more appropriate to listen to for this?
We've also tried breaking up the save operation (of a hierarchical model) by doing the traversal ourselves, but that seems to degrade the performance even further.
Perhaps we're trying to solve a problem FNH wasn't designed for. We're new to it.
We've also set up an alternative solution using SqlBulkCopy, as recommended elsewhere.
We've seen comments that FNH is primarily intended for smaller transactions (OLTP), not for the kind of exhaustive model our problem binds us to (signal processing of huge data volumes).
Background:
We're trying to use Fluent NHibernate on a larger database project with data gathered from fairly complex real-time analysis (high frequency, multiple input signals, long experiment times, etc.). In a prototype we've built, we see pretty scary wait times at the moment, and we need to hook in some sort of reliable progress indicator.
Yes, now confirmed, as mentioned in my comment above. One possible solution is simply to turn off cascades, traverse the model manually, and make explicit Save() calls.
This works, although it's not as neat as just handling an event. Still, given the overall design of NHibernate, I bet there's an event somewhere that could be intercepted; the question is just under what name. I bet someone on here knows more.
To improve performance, we also used a stateless session, experimented with different batch sizes, and periodically/explicitly called Flush() and Clear(). See the articles below for further details:
http://davybrion.com/blog/2008/10/bulk-data-operations-with-nhibernates-stateless-sessions/
http://ideas-net.blogspot.com/2009/03/nhibernate-update-performance-issue.html
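The manual-traversal workaround can be sketched language-neutrally as below. `FakeSession` is a hypothetical stand-in that only mimics the shape of an ORM session's save/flush/clear calls; it is not NHibernate's API. The traversal saves parents before children, reports progress after every node, and flushes and clears every `batch_size` nodes so the session does not grow without bound.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

class FakeSession:
    """Hypothetical stand-in for an ORM session (cascades assumed off)."""
    def __init__(self) -> None:
        self.pending: List[Node] = []
        self.written = 0

    def save(self, node: Node) -> None:
        self.pending.append(node)

    def flush(self) -> None:  # pretend to write pending changes to the database
        self.written += len(self.pending)
        self.pending.clear()

    def clear(self) -> None:  # drop tracked objects so memory stays bounded
        self.pending.clear()

def count(node: Node) -> int:
    return 1 + sum(count(c) for c in node.children)

def save_tree(session: FakeSession, root: Node, batch_size: int,
              on_progress: Callable[[int, int], None]) -> None:
    total, done = count(root), 0
    stack = [root]  # iterative pre-order traversal: parents are saved before children
    while stack:
        node = stack.pop()
        session.save(node)  # explicit save instead of relying on cascades
        done += 1
        on_progress(done, total)
        if done % batch_size == 0:
            session.flush()
            session.clear()
        stack.extend(reversed(node.children))
    session.flush()  # write the final partial batch

root = Node("experiment", [Node(f"signal-{i}", [Node(f"sample-{i}-{j}") for j in range(3)])
                           for i in range(4)])
progress: List[int] = []
session = FakeSession()
save_tree(session, root, batch_size=5, on_progress=lambda d, t: progress.append(d * 100 // t))
print(session.written, progress[-1])  # 17 100
```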
Hope this helps.

Restarting agent program after it crashes

Consider a distributed bank application in which distributed agent machines modify the value of a global variable, say "balance".
The agents' requests are queued. Each request adds a value to the global variable on behalf of a particular agent. So the code for the agent is of the form:
agent
{
    look_queue();   // take a look at the leftmost request on the queue without dequeuing
    lock_global_variable(balance, agent_machine_id);
    ///////////////////// **POINT A**
    modify(balance, value);
    unlock_global_variable(balance, agent_machine_id);
    /////////////////// **POINT B**
    dequeue();      // once the transaction is complete, the request can be dequeued
}
Now, if an agent's code crashes at POINT B, the request must not be processed again; otherwise the variable will be modified twice for the same request. To avoid this, we can make the code atomic, thus:
agent
{
    look_queue();   // take a look at the leftmost request on the queue without dequeuing
    *atomic*
    {
        lock_global_variable(balance, agent_machine_id);
        modify(balance, value);
        unlock_global_variable(balance, agent_machine_id);
        dequeue();  // once the transaction is complete, the request can be dequeued
    }
}
I am looking for answers to these questions:
How can we automatically identify points in code that need to be executed atomically?
If the code crashes during execution, how much will logging the transaction and variable values help? Are there other approaches to the problem of crashed agents?
Again, logging does not scale to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
In general, how can we identify such atomic blocks when agents work together? If one agent fails, do the others have to wait for it to restart? How can software testing help us identify potential cases in which an agent crash leads to an inconsistent program state?
How can we make the atomic blocks more fine-grained, to reduce performance bottlenecks?
Q> How can we automatically identify points in code that need to be executed atomically?
A> Any time something stateful is shared across different contexts (not all parties need to be mutators; it is enough for at least one to be). In your case, that is balance, which is shared between different agents.
Q> If the code crashes during execution, how much will logging the transaction and variable values help? Are there other approaches to the problem of crashed agents?
A> It can help, but it has high costs attached: you need to roll back X entries, replay the scenario, etc. A better approach is either to make it all transactional or to have an effective automatic rollback scenario.
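A sketch of the "make it all transactional" option, assuming the queue and the balance live in the same transactional store (here an in-memory SQLite database via Python's `sqlite3`; the table names are hypothetical). Used as a context manager, the connection commits both statements together or rolls both back on an exception, so a crash at POINT B can no longer leave the balance modified while the request is still queued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE request (id INTEGER PRIMARY KEY, agent TEXT NOT NULL, amount INTEGER NOT NULL);
    INSERT INTO account VALUES (1, 100);
    INSERT INTO request (agent, amount) VALUES ('agent-a', 25), ('agent-b', -10);
""")

def process_next_request(conn: sqlite3.Connection) -> bool:
    # look_queue(): peek at the oldest request without removing it
    row = conn.execute("SELECT id, amount FROM request ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return False
    req_id, amount = row
    with conn:  # one transaction: commit both statements, or roll both back on error
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 1", (amount,))
        conn.execute("DELETE FROM request WHERE id = ?", (req_id,))  # dequeue()
    return True

while process_next_request(conn):
    pass
print(conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0])  # 115
```

With several agents, two of them could still peek at the same request before either commits; the DELETE (or an "owner" column) would then need to be checked inside the same transaction, for example by treating a DELETE that affects zero rows as "already processed".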
Q> Again, logging does not scale to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
A> In some cases you can relax consistency. For example, Java's CopyOnWriteArrayList performs each write on a fresh copy of the underlying array and makes that copy visible to new readers only once the write is complete; if the write fails, the copy can safely be discarded. There's also compare-and-swap. Also see the link for the previous question.
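A sketch of the compare-and-swap retry loop applied to the shared balance. Python has no user-level CAS instruction, so the hypothetical `AtomicCell` emulates one with a tiny internal lock (in Java this role is played by `AtomicInteger.compareAndSet`). Writers never hold a lock across the whole read-compute-write; they simply retry if someone else changed the value in between.

```python
import threading

class AtomicCell:
    """Hypothetical CAS cell; the lock only guards the compare-and-set itself."""
    def __init__(self, value: int) -> None:
        self._value = value
        self._lock = threading.Lock()

    def get(self) -> int:
        return self._value

    def compare_and_set(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False  # another writer got there first
            self._value = new
            return True

def add(cell: AtomicCell, delta: int) -> int:
    while True:  # optimistic retry loop
        current = cell.get()
        if cell.compare_and_set(current, current + delta):
            return current + delta

balance = AtomicCell(0)
threads = [threading.Thread(target=lambda: [add(balance, 1) for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance.get())  # 8000
```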
Q> In general, how can we identify such atomic blocks when agents work together?
A> See your first question.
Q> If one agent fails, do the others have to wait for it to restart?
A> Most policies/APIs define maximum timeouts for critical-section execution; otherwise the system risks ending up in a perpetual deadlock.
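A single-process sketch of such a timeout: a lock held under a lease that expires, so a crashed agent that never unlocks does not block the others forever. `LeaseLock` is hypothetical and not thread-safe; a real distributed lock would keep the holder and expiry in an atomic shared store. The clock is injected so the expiry can be demonstrated without sleeping.

```python
import time
from typing import Callable, Optional

class LeaseLock:
    """Hypothetical lock whose ownership lapses after `lease_seconds`."""
    def __init__(self, lease_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, agent_id: str) -> bool:
        now = self.clock()
        if self.holder is None or now >= self.expires_at:  # free, or the previous lease expired
            self.holder, self.expires_at = agent_id, now + self.lease_seconds
            return True
        return False

    def release(self, agent_id: str) -> None:
        if self.holder == agent_id:  # ignore releases from agents whose lease was taken over
            self.holder = None

fake_now = [0.0]
lock = LeaseLock(lease_seconds=5.0, clock=lambda: fake_now[0])
print(lock.try_acquire("agent-1"))  # True
print(lock.try_acquire("agent-2"))  # False: agent-1 holds a valid lease
fake_now[0] = 6.0                   # agent-1 "crashed" and never released
print(lock.try_acquire("agent-2"))  # True: the lease expired
```

An expired lease only solves the waiting problem; whatever the crashed agent did before POINT B still has to be made atomic or idempotent, as discussed in the earlier answers.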
Q> How can software testing help us identify potential cases in which an agent crash leads to an inconsistent program state?
A> It can, to a fair degree. However, testing concurrent code requires as much skill as writing the code itself, if not more.
Q> How can we make the atomic blocks more fine-grained, to reduce performance bottlenecks?
A> You have answered the question yourself :) If one atomic operation needs to modify 10 different shared state variables, there's not much you can do apart from trying to push the external contract down so that it needs to modify fewer of them. This is pretty much why databases are not as scalable as NoSQL stores: they might need to modify dependent foreign keys, execute triggers, etc. Alternatively, try to promote immutability.
If you are a Java programmer, I would definitely recommend reading this book. I'm sure there are good counterparts for other languages, too.

How would you create a cyclic task graph in TPL, and/or is this possible?

My project has a requirement to gather data from a number of sources, then do things in response to the completion of the gathering of that data. Some of the gathering tasks have dependencies on prior gathering tasks. TPL has been a good fit because it naturally continues with tasks from their antecedents, and the "final" tasks that use the results are again dependents. Great. However, we would like to have a "sleep and regather" task that starts upon completion of the "final" tasks; this task's job is logically to be the antecedent of the "final" tasks and kick off the next cycle. In effect, the TPL's DAG becomes cyclic, or, if thought of sequentially, a loop.
Is it possible to express this cyclic requirement completely within the TPL API? If so, how? Our current implementation instead does a WaitAll() on the antecedents, and then a Task.Factory.StartNew() given a delegate that sleeps and then rebuilds the task graph with the WaitAll(). This works, but seems a bit artificial.
There are a few options here. What you are doing now seems reasonable.
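For comparison, the same "build the DAG, wait for it, sleep, repeat" structure looks like this in Python's asyncio (an analogue, not the TPL API). The `gather_*` coroutines are hypothetical placeholders for the real data sources; awaiting a task plays the role of a continuation on its antecedent. The cycle lives in an ordinary loop around an acyclic graph, which is essentially what the WaitAll-then-rebuild approach does.

```python
import asyncio
from typing import List

async def gather_a() -> int:
    await asyncio.sleep(0)  # stands in for I/O
    return 1

async def gather_b(a: int) -> int:  # depends on gather_a's result
    await asyncio.sleep(0)
    return a + 1

async def gather_c() -> int:  # independent of the others
    await asyncio.sleep(0)
    return 10

async def run_cycle() -> int:
    a_task = asyncio.create_task(gather_a())
    c_task = asyncio.create_task(gather_c())  # runs concurrently with a and b
    b = await gather_b(await a_task)          # continuation on its antecedent
    c = await c_task
    return b + c                              # the "final" task that uses the results

async def run_cycles(cycles: int, pause: float) -> List[int]:
    results = []
    for _ in range(cycles):  # `while True:` for an unbounded loop
        results.append(await run_cycle())
        await asyncio.sleep(pause)  # the "sleep and regather" step
    return results

print(asyncio.run(run_cycles(cycles=3, pause=0.01)))  # [12, 12, 12]
```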
However, you could potentially set up the entire operation as a producer/consumer scenario using BlockingCollection<T>. If your consuming enumerable used a ManualResetEvent that was set after the WaitAll completed, it could allow a single "item" to be consumed at a time, using tasks as you have written them now.
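The producer/consumer shape can be sketched with Python's `queue.Queue` standing in for BlockingCollection<T> and `threading.Event` standing in for ManualResetEvent (an analogue, not the .NET API). The producer enqueues the next cycle only after the consumer signals that the previous one finished; a real version would also sleep between cycles.

```python
import queue
import threading

work = queue.Queue()  # stands in for BlockingCollection<T>
cycle_done = threading.Event()  # stands in for ManualResetEvent
completed = []

def producer(cycles: int) -> None:
    for n in range(cycles):
        work.put(n)          # one item = one gather/final cycle
        cycle_done.wait()    # block until the consumer finishes this cycle
        cycle_done.clear()
    work.put(None)           # sentinel: no more cycles

def consumer() -> None:
    while (item := work.get()) is not None:
        completed.append(item)  # run the task graph for this cycle here
        cycle_done.set()        # signal once the equivalent of WaitAll() returns

threads = [threading.Thread(target=producer, args=(3,)), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(completed)  # [0, 1, 2]
```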
That being said, this seems like a perfect candidate for the TPL Dataflow library (in CTP).

Is there something like a "long running offline transaction" for NHibernate or any other ORM?

In essence this is a followup of this question. I'm beginning to feel that I should give up the whole idea, but I'll give it one more shot.
What I want is pretty much like a DB transaction. It should track my changes to the DB and then in the end allow me to either commit or rollback them. If I insert an object, I should get it back in my next (appropriate) SELECT query. If I delete it, future SELECT queries should not return it. Etc.
But there is one catch: this transaction would be very long running. It would start when the user opened a form (I'm talking about Windows Forms here), and the commit/rollback would happen when the user closed it (with OK/Cancel). So it could take anywhere from seconds to days. This requirement rules out a standard DB transaction, because that would lock the tables/rows it touched and other users wouldn't be able to use the system. Also, the transaction should not commit ANY changes to the DB until it is really committed. So if one user makes some changes, others don't see them until the OK button is pressed. This prevents errors in case the computer crashes or is disconnected from the network.
I'm quite OK if the solution puts constraints on my model (I'm using MSSQL 2008, btw). I can design the DB/code any way I like. I'm also fine with the idea that a commit could fail because someone already modified one of the objects my transaction touched.
Is there anything like this? I looked at NHibernate.Burrow, but I'm not sure that that's the thing I want.
Added: It's the very beginning of the project so I'm not tied to NHibernate. I started out with it but I can still change easily.
As far as I can judge, DataObjects.Net supports exactly this concept via DisconnectedState. The feature is very new (released just a few weeks ago); its preliminary documentation is here. The WPF sample for DataObjects.Net uses it for UI transactions.
I'm not sure if it is mentioned there, but DisconnectedState, as well as its OperationLog can be serialized. So its cached state can survive even application restarts.
I don't think anyone will implement this in the NHibernate core, because nobody will use it. A view model is not the same as a domain model.
This is not a direct answer to your question, but this is the sort of thing that WWF (gotta love the name) was meant to solve (not that it did, at least as of v3.5).
If you're still following this, Ayende Rahien has an article in MSDN magazine http://msdn.microsoft.com/en-us/magazine/ee819139.aspx about the session per form/presenter approach. Also take a look at chapter 5 of the NHibernate book http://manning.com/kuate/ (sample chapter available), the one on transactions and conversations.
As long as you delay the flush/transaction till the ok button is pressed, it should work (depending on the flush mode). But complete isolation is a difficult ask because your session will be able to access data that has been committed by other sessions when dealing with multiple entities. You will have to think about handling such issues.
As an aside, how would you deal with this situation if you don't use NHibernate?
EclipseLink has limited support for such a beast. They call it "conforming", and they implemented it in the "unit of work" context.
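A sketch of what such a long-running offline unit of work could look like, under stated assumptions: all classes are hypothetical (not the API of NHibernate, DataObjects.Net, or EclipseLink), each row carries a version number, and changes are tracked in memory until OK is pressed. Queries are "conformed" by overlaying the pending changes on committed data, and the commit checks row versions first, so it fails cleanly if another user modified a touched row in the meantime. In a real database, the version check and the writes would run inside one short DB transaction at commit time.

```python
from typing import Dict, Set, Tuple

class ConflictError(Exception):
    pass

class Database:
    """Stand-in for the shared database: id -> (value, row version)."""
    def __init__(self, rows: Dict[int, Tuple[str, int]]) -> None:
        self.rows = rows

class Conversation:
    """Hypothetical offline unit of work spanning the lifetime of one form."""
    def __init__(self, db: Database) -> None:
        self.db = db
        self.inserted: Dict[int, str] = {}
        self.updated: Dict[int, str] = {}
        self.deleted: Set[int] = set()
        self.read_versions: Dict[int, int] = {}  # version seen when a row was first touched

    def _touch(self, row_id: int) -> None:
        self.read_versions.setdefault(row_id, self.db.rows[row_id][1])

    def insert(self, row_id: int, value: str) -> None:
        self.inserted[row_id] = value

    def update(self, row_id: int, value: str) -> None:  # existing rows only, in this sketch
        self._touch(row_id)
        self.updated[row_id] = value

    def delete(self, row_id: int) -> None:
        self._touch(row_id)
        self.deleted.add(row_id)

    def select_all(self) -> Dict[int, str]:
        """Conforming query: committed rows overlaid with this conversation's changes."""
        result = {k: v for k, (v, _) in self.db.rows.items()}
        result.update(self.updated)
        result.update(self.inserted)
        for k in self.deleted:
            result.pop(k, None)
        return result

    def commit(self) -> None:
        # Check every version first, so a conflict leaves the database untouched.
        for row_id, version in self.read_versions.items():
            if row_id not in self.db.rows or self.db.rows[row_id][1] != version:
                raise ConflictError(row_id)
        for row_id, value in {**self.updated, **self.inserted}.items():
            self.db.rows[row_id] = (value, self.db.rows.get(row_id, ("", 0))[1] + 1)
        for row_id in self.deleted:
            del self.db.rows[row_id]

db = Database({1: ("alpha", 1), 2: ("beta", 1)})
form = Conversation(db)            # user opens the form
form.insert(3, "gamma")
form.delete(2)
print(form.select_all())           # {1: 'alpha', 3: 'gamma'}
print(sorted(db.rows))             # [1, 2]: other users see nothing yet
db.rows[2] = ("beta (edited)", 2)  # another user commits a change to row 2 meanwhile
conflict_row = None
try:
    form.commit()                  # user presses OK
except ConflictError as err:
    conflict_row = err.args[0]
    print("conflict on row", err)  # conflict on row 2
```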