When does an information system get its own lane in BPMN? - bpmn

I understand that in BPMN, each actor within a pool gets a distinct lane. My general guideline of whether or not a non-human information system gets its own lane is if the system carries out any automated tasks that are displayed in the BPMN diagram--if it autonomously carries out a task, then it gets its own dedicated lane.
In particular, if the information system appears only as a data object (that is, a database) that supplies messages or data associations to tasks performed by human actors, without the system having any tasks of its own, then I do not give the database its own lane; instead I place it in the lane of the most logical human actor.
Is this usage correct, or are there better or more accurate rules for when information systems get their own BPMN lane?

I agree with your usage. I generally use lanes for human actors, to show responsibility for a set of tasks. I do not model many lanes for systems: most of the time an automated task falls under the (business) responsibility of an actor in the process.

Related

Scheduling a Job that consumes multiple resources

I've got a problem which I think OptaPlanner may be able to solve, but I haven't seen a demo that quite fits what I'm looking to do. My problem is scheduling IoT node usage for a testbed. Each test execution (job) places a different set of constraints on the nodes it will use. For example, a job may ask for M nodes with resource A and N nodes with resource B. It will also specify the length of time it needs the nodes for and a window in which the job may start. To successfully schedule a job, it must be able to claim enough resources to meet the job-specific requirements (i.e., hard limits).
Being new to OptaPlanner, my understanding is that most of the examples focus on needing only one resource per job. Any insight into whether this problem could be solved with OptaPlanner, and where to start, would be highly appreciated.
If you haven't already, look at the [cheap time scheduling example](https://www.youtube.com/watch?v=r6KsveB6v-g&list=PLJY69IMbAdq0uKPnjtWXZ2x7KE1eWg3ns) and the project job scheduling example.
The differentiating question is whether, when job J1 needs M nodes with resource A, any of those M nodes can also supply resource B, just not at the same time.
If that's not the case, this is an easy model: you can treat resource A as a capacity, as in the cloud balancing example.
If that is the case, it's a complex model (but still possible), for example the jobs are chained or time grained (=> planning var 1) and each job has tasks which are assigned to nodes (=> planning var 2). All of this is likely to need custom moves for efficiency.
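To make the "easy" case concrete, here is a minimal plain-Python sketch of the hard constraint only. It is not OptaPlanner code and does not use its API. It assumes each node supplies exactly one resource type, that time is an integer grid, and that reservations are `(node, start, end)` tuples with an exclusive end; the function names and the job dictionary layout are made up for illustration.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if the half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


def free_count(nodes, reservations, start, end):
    """Count nodes that have no reservation overlapping [start, end)."""
    busy = {node for node, s, e in reservations if overlaps(s, e, start, end)}
    return sum(1 for node in nodes if node not in busy)


def can_schedule(job, nodes_by_resource, reservations, start):
    """Hard constraint: enough free nodes of every requested resource type."""
    end = start + job["duration"]
    return all(
        free_count(nodes_by_resource.get(resource, []), reservations, start, end) >= needed
        for resource, needed in job["needs"].items()
    )


def first_feasible_start(job, nodes_by_resource, reservations):
    """Scan the job's start window (inclusive) for the first feasible start time."""
    for start in range(job["earliest"], job["latest"] + 1):
        if can_schedule(job, nodes_by_resource, reservations, start):
            return start
    return None


# Hypothetical testbed: three nodes with resource A, one with resource B.
nodes_by_resource = {"A": ["a1", "a2", "a3"], "B": ["b1"]}
reservations = [("a1", 0, 10)]  # a1 is taken for t in [0, 10)
job = {"needs": {"A": 3, "B": 1}, "duration": 5, "earliest": 3, "latest": 12}
print(first_feasible_start(job, nodes_by_resource, reservations))
```

A solver would search over start times and node assignments for many jobs at once; this only shows what a single feasibility check looks like when resources behave like capacities.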

How to model Storage Capacity in BPMN?

I am right now trying to model a warehouse with import and export processes. I have the problem that I do not know how I should model the capacity of different storage places in the warehouse. There are processes where vehicles with different loadings come and all of them need to be stored in the warehouse with a limited capacity. Else the arriving goods have to be declined.
I am modeling this process in a BPM suite and was thinking about using Python to address this problem. I thought that I could simply use variables and if clauses to check the capacity of each storage place. But when I simulate the process with this approach, the variables are re-instantiated with their start value each time and do not hold the current value, because the script is included in the model as a script task.
Does anyone have other ideas for modeling capacity in BPMN?
Have you considered not using BPMN, as it clearly adds more complexity than benefit in your case? Look at Cadence Workflow, which lets you specify orchestration logic in normal code and would support your requirements directly without any ugly workarounds.

How should I handle measurement logging in my Discrete Event Simulation engine?

NOTE: This question has been ported over from Programmers since it appears to be more appropriate here given the limitation of the language I'm using (VBA), the availability of appropriate tags here and the specificity of the problem (on the inference that Programmers addresses more theoretical Computer Science questions).
I'm attempting to build a Discrete Event Simulation library by following this tutorial and fleshing it out. I am limited to using VBA, so "just switch to [insert language here] and it's easy!" is unfortunately not possible. I have specifically chosen to implement this in Access VBA to have a convenient location to store configuration information and metrics.
How should I handle logging metrics in my Discrete Event Simulation engine?
If you don't want/need background, skip to The Design or The Question section below...
Simulation
The goal of a simulation of the type in question is to model a process to perform analysis of it that wouldn't be feasible or cost-effective in reality.
The canonical example of a simulation of this kind is a Bank:
Customers enter the bank and get in line with a statistically distributed frequency
Tellers are available to handle customers from the front of the line one by one taking an amount of time with a modelable distribution
As the line grows longer, the number of tellers available may have to be increased or decreased based on business rules
You can break this down into generic objects:
Entity: These would be the customers
Generator: This object generates Entities according to a distribution
Queue: This object represents the line at the bank. They find much real world use in acting as a buffer between a source of customers and a limited service.
Activity: This is a representation of the work done by a teller. It generally processes Entities from a Queue
Discrete Event Simulation
Instead of a continuous tick-by-tick simulation such as one might do with physical systems, a "Discrete Event" Simulation recognizes that in many systems only critical events require processing; the rest of the time, nothing important to the state of the system is happening.
In the case of the Bank, critical events might be a customer entering the line, a teller becoming available, the manager deciding whether or not to open a new teller window, etc.
In a Discrete Event Simulation, the flow of time is kept by maintaining a Priority Queue of Events instead of an explicit clock. Time is incremented by popping the next event in chronological order (the minimum event time) off the queue and processing as necessary.
The Design
I've got a Priority Queue implemented as a Min Heap for now.
In order for the objects of the simulation to be processed as events, they implement an ISimulationEvent interface that provides an EventTime property and an Execute method. Those together mean the Priority Queue can schedule the events, then Execute them one at a time in the correct order and increment the simulation clock appropriately.
The simulation engine is a basic event loop that pops the next event and Executes it until there are none left. An event can reschedule itself to occur again or allow itself to go idle. For example, when a Generator is Executed it creates an Entity and then reschedules itself for the generation of the next Entity at some point in the future.
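For readers who want to see the shape of that loop, here is a minimal sketch in Python rather than VBA. It assumes a fixed inter-arrival interval instead of a statistical distribution, and the class and method names (`Simulation`, `schedule`, `execute`) are illustrative stand-ins for the ISimulationEvent interface described above, not the actual implementation.

```python
import heapq
import itertools


class Simulation:
    """Event loop driven by a min-heap keyed on event time."""

    def __init__(self):
        self.now = 0.0
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so equal times never compare events

    def schedule(self, time, event):
        heapq.heappush(self._heap, (time, next(self._seq), event))

    def run(self, until):
        # Pop events in chronological order; the clock jumps to each event's time.
        while self._heap and self._heap[0][0] <= until:
            time, _, event = heapq.heappop(self._heap)
            self.now = time
            event.execute(self)


class Generator:
    """Creates one entity per execution, then reschedules itself."""

    def __init__(self, interval):
        self.interval = interval  # assumption: constant gap between arrivals
        self.created = []         # creation times of the generated entities

    def execute(self, sim):
        self.created.append(sim.now)
        sim.schedule(sim.now + self.interval, self)


sim = Simulation()
gen = Generator(interval=2.0)
sim.schedule(0.0, gen)
sim.run(until=6.0)
print(gen.created)
```

The sequence counter in the heap entries matters: without it, two events with the same time would fall back to comparing the event objects themselves, which fails for plain classes.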
The Question
How should I handle logging metrics in my Discrete Event Simulation engine?
In the midst of this simulation, it is necessary to take metrics. How long are Entities waiting in the Queue? How many Activity resources are being utilized at any one point? How many Entities were generated since the last metrics were logged?
It follows logically that the metric logging should be scheduled as an event to take place every few units of time in the simulation.
The difficulty is that this ends up being a cross-cutting concern: metrics may need to be taken of Generators or Queues or Activities or even Entities. Consider also that it might be necessary to take derivative calculated metrics: e.g. measure a, b, c, and ((a-c)/100) + Log(b).
I'm thinking there are a few main ways to go:
Have a single, global Stats object that is aware of all of the simulation objects. Have the Generator/Queue/Activity/Entity objects store their properties in an associative array so that they can be referred to at runtime (VBA doesn't support much in the way of reflection). This way the statistics can be attached as needed Stats.AddStats(Object, Properties). This wouldn't support calculated metrics easily unless they are built into each object class as properties somehow.
Have a single, global Stats object that is aware of all of the simulation objects. Create some sort of ISimStats interface for the Generator/Queue/Activity/Entity classes to implement that returns an associative array of the important stats for that particular object. This would also allow runtime attachment, Stats.AddStats(ISimStats). The calculated metrics would have to be hardcoded in the straightforward implementation of this option.
Have multiple Stats objects, one per Generator/Queue/Activity/Entity as a child object. This might make it easier to implement simulation object-specific calculated metrics, but clogs up the Priority Queue a little bit with extra things to schedule. It might also cause tighter coupling, which is bad :(.
Some combination of the above or completely different solution I haven't thought of?
Let me know if I can provide more (or less) detail to clarify my question!
Any and every performance metric is a function of the model's state. The only time the state changes in a discrete event simulation is when an event occurs, so events are the only time you have to update your metrics. If you have enough storage, you can log every event, its time, and the state variables which got updated, and retrospectively construct any performance metric you want. If storage is an issue you can calculate some performance measures within the events that affect those measures. For instance, the appropriate time to calculate delay in queue is when a customer begins service (assuming you tagged each customer object with its arrival time). For delay in system it's when the customer ends service. If you want average delays, you can update the averages in those events. When somebody arrives, the size of the queue gets incremented; when they begin service, it gets decremented. Etc., etc., etc.
You'll have to be careful calculating statistics such as average queue length, because you have to weight the queue lengths by the amount of time you were in that state: Avg(queue_length) = (1/T) integral[queue_length(t) dt]. Since the queue_length can only change at events, this actually boils down to summing the queue lengths multiplied by the amount of time spent at each length, then dividing by the total elapsed time.
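As a rough illustration of that time-weighted sum, here is a small Python sketch. The class name `TimeWeightedAverage` and its methods are invented for this example; the idea is just to accumulate value times duration each time the tracked quantity changes, then divide by elapsed time.

```python
class TimeWeightedAverage:
    """Accumulates value * duration so the mean is weighted by time in each state."""

    def __init__(self, start_time=0.0, initial_value=0):
        self._start = start_time
        self._last_time = start_time
        self._value = initial_value
        self._area = 0.0  # running integral of value over time

    def update(self, now, new_value):
        # Close out the interval spent at the old value, then switch to the new one.
        self._area += self._value * (now - self._last_time)
        self._last_time = now
        self._value = new_value

    def mean(self, now):
        elapsed = now - self._start
        if elapsed <= 0:
            return 0.0
        # Include the still-open interval at the current value.
        area = self._area + self._value * (now - self._last_time)
        return area / elapsed


queue_len = TimeWeightedAverage()
queue_len.update(2.0, 1)  # first arrival at t=2
queue_len.update(5.0, 2)  # second arrival at t=5
queue_len.update(6.0, 1)  # one customer begins service at t=6
print(queue_len.mean(10.0))
```

Here the queue spends 2 time units at length 0, 3 at length 1, 1 at length 2 and 4 at length 1, so the area is 9 over 10 elapsed units. A plain average of the observed lengths would give a different and misleading number.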

Implementation of achievement systems in modern, complex games

Many games created these days come with their own achievement system that rewards players/users for accomplishing certain tasks. The badge system here on Stack Overflow works in exactly the same way.
There are some problems though for which I couldn't figure out good solutions.
Achievement systems have to watch out for certain events all the time. Think of a game that offers 20 to 30 achievements for, e.g., combat. The server would have to check for these events (e.g., the player avoided x attacks of the opponent in this battle, or the player walked x miles) all the time.
How can a server handle this large amount of operations without slowing down and maybe even crashing?
Achievement systems usually need data that is only used in the core engine of the game and wouldn't be needed outside of it if it weren't for those nasty achievements (think of, e.g., how often the player jumped during each fight; you don't want to store all this information in a database). What I mean is that in some cases the only way of adding an achievement would be to add the code that checks its current state to the game core, and that's usually a very bad idea.
How do achievement systems interact with the core of the game that holds the later unnecessary information? (see examples above)
How are they separated from the core of the game?
My examples may seem "harmless" but think of the 1000+ achievements currently available in World of Warcraft and the many, many players online at the same time, for example.
Achievement systems are really just a form of logging. For a system like this, publish/subscribe is a good approach. In this case, players publish information about themselves, and interested software components (that handle individual achievements) can subscribe. This allows you to watch public values with specialised logging code, without affecting any core game logic.
Take your 'player walked x miles' example. I would implement the distance walked as a field in the player object, since this is a simple value to increment and does not require increasing space over time. An achievement that rewards players that walk 10 miles is then a subscriber of that field. If there were many players then it would make sense to aggregate this value with one or more intermediate broker levels. For example, if 1 million players exist in the game, then you might aggregate the values with 1000 brokers, each responsible for tracking 1000 individual players. The achievement then subscribes to these brokers, rather than to all the players directly. Of course, the optimal hierarchy and number of subscribers is implementation-specific.
In the case of your fight example, players could publish details of their last fight in exactly the same way. An achievement that monitors jumping in fights would subscribe to this info, and check the number of jumps. Since no historical state is required, this does not grow with time either. Again, no core code need be modified; you only need to be able to access some values.
Note also that most rewards do not need to be instantaneous. This allows you some leeway in managing your traffic. In the previous example, you might not update the broker's published distance travelled until a player has walked a total of one more mile, or a day has passed since last update (incrementing internally until then). This is really just a form of caching; the exact parameters will depend on your problem.
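A minimal sketch of this arrangement in Python, assuming an in-process broker and a single "distance" topic. All class names, the topic string, and the one-mile publish interval are illustrative choices, not taken from any particular engine.

```python
from collections import defaultdict


class Broker:
    """Routes published values to whoever subscribed to the topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, player_id, value):
        for callback in self._subscribers[topic]:
            callback(player_id, value)


class Player:
    """Core game object: it only increments a field and occasionally publishes it."""

    def __init__(self, player_id, broker, publish_every=1.0):
        self.player_id = player_id
        self.distance = 0.0
        self._broker = broker
        self._publish_every = publish_every  # miles between published updates
        self._last_published = 0.0

    def walk(self, miles):
        self.distance += miles
        # Cache internally; publish only after another full interval has been walked.
        if self.distance - self._last_published >= self._publish_every:
            self._last_published = self.distance
            self._broker.publish("distance", self.player_id, self.distance)


class DistanceAchievement:
    """Subscriber holding the achievement logic, kept outside the game core."""

    def __init__(self, broker, threshold):
        self.threshold = threshold
        self.awarded = set()
        broker.subscribe("distance", self.on_distance)

    def on_distance(self, player_id, distance):
        if distance >= self.threshold:
            self.awarded.add(player_id)


broker = Broker()
walker = DistanceAchievement(broker, threshold=10.0)
alice = Player("alice", broker)
for _ in range(25):
    alice.walk(0.5)
print(walker.awarded)
```

The player class knows nothing about achievements; adding a new distance-based achievement means adding another subscriber. In a real server the broker would more likely be a separate process or a tier of aggregating brokers, as described above.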
You can even do this if you don't have access to source, for example in videogame emulators. A simple memory-scan tool can be written to find the displayed score for example. Once you have that your achievement system is as easy as polling that memory location every frame and seeing if their current "score" or whatever is higher than their highest score. The cool thing about videogame emulators is that memory locations are deterministic (no operating system).
There are two ways this is done in normal games.
Offline games: nothing as complex as pub/sub - that's massive overkill. Instead you just use a big map / dictionary, and log named "events". Then every X frames, or Y seconds (or, usually: "every time something dies, and 1x at end of level"), you iterate across achievements and do a quick check. When the designers want a new event logged, it's trivial for a programmer to add a line of code to record it.
NB: pub/sub is a poor fit for this IME because the designers never want "when player.distance = 50". What they actually want is "when player's distance as perceived by someone watching the screen seems to have travelled past the first village, or at least 4 screen widths to the right" -- i.e. far more vague and abstract than a simple counter.
In practice, that means that the logic goes at the point where the change happens (before the event is even published), which is a poor way to use pub/sub. There are some game engines that make it easier to do a "logic goes at the point of receipt" (the "sub" part), but they're not the majority, IME.
Online games: almost identical, except you store "counters" (int that goes up), and usually also: "deltas" (circular buffers of what's-happened frame to frame), and: "events" (complex things that happened in game that can be hard-coded into a single ID plus a fixed-size array of parameters). These are then exposed via e.g. SNMP for other servers to collect at low CPU cost and asynchronously.
i.e. almost the same as the offline case above, except that you're careful to do two things:
Fixed-size memory usage; and if the "reading" servers go offline for a while, achievements won in that time will need to be re-won (although you usually can have a customer support person manually go through the main system logs and work out that the achievement "probably" was won, and manually award it)
Very low overhead; SNMP is a good standard for this, and most teams I know end up using it
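For the offline variant described above (a dictionary of named events, checked every so often), a rough Python sketch could look like the following. The event names, achievement names, and thresholds are invented for illustration.

```python
from collections import Counter

# One big map of named event counters; missing names read as 0.
events = Counter()


def log_event(name, amount=1):
    """The one-line call a programmer drops in wherever something notable happens."""
    events[name] += amount


# Each achievement is a name plus a cheap predicate over the counters.
ACHIEVEMENTS = {
    "First Blood": lambda e: e["enemy_killed"] >= 1,
    "Grasshopper": lambda e: e["jump"] >= 100,
}

unlocked = set()


def check_achievements():
    """Run occasionally (e.g. when something dies, or at end of level)."""
    newly = [
        name
        for name, rule in ACHIEVEMENTS.items()
        if name not in unlocked and rule(events)
    ]
    unlocked.update(newly)
    return newly


log_event("jump", 3)
log_event("enemy_killed")
print(check_achievements())
```

The check is deliberately not run on every event: iterating a few dozen predicates at a natural pause point is cheap, and nothing in the gameplay code needs to know which achievements exist.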
If your game architecture is event-driven, then you can implement the achievement system using finite-state machines.
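As one possible reading of that suggestion, here is a small Python sketch of a single achievement modelled as a state machine fed by game events. The achievement ("win a fight without taking damage"), the state names, and the event names are all hypothetical.

```python
class FlawlessVictory:
    """Unlocks when a fight is won without the player taking damage."""

    # (current state, event) -> next state; unlisted pairs leave the state unchanged.
    TRANSITIONS = {
        ("idle", "fight_started"): "clean",
        ("clean", "damage_taken"): "tainted",
        ("clean", "fight_won"): "unlocked",
        ("clean", "fight_lost"): "idle",
        ("tainted", "fight_won"): "idle",
        ("tainted", "fight_lost"): "idle",
    }

    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        """Feed one game event; returns True once the achievement is unlocked."""
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.state == "unlocked"


fsm = FlawlessVictory()
for event in ["fight_started", "damage_taken", "fight_won",
              "fight_started", "fight_won"]:
    fsm.handle(event)
print(fsm.state)
```

The first fight is spoiled by damage and resets the machine; the second is clean and unlocks it. Because "unlocked" has no outgoing transitions, later events cannot undo the award.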

Stopping fraud by looking for patterns in data

What applications are recommended for SQL Server auditing and, more specifically, fraud investigations?
I need a tool that allows an end user to correlate data values to find fraud patterns. This tool must allow tuning as needed to reduce false positives.
It's also important that it be fairly intuitive. Ideally, once in place it would allow an end user unfamiliar with SQL to interface with it directly and customize using a GUI interface.
Suggestions?
It varies from simple business rules (users of type X aren't allowed to change discounts, no more than N uses of a coupon)
through to some very clever Bayesian inference engine stuff that finds that customer X's surname is the Arabic translation of Mr Y's name, who signed for him as a mortgage guarantor, and that they claim different home addresses in the same zip code. This stuff gets very '6figure' pricey.
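At the simple end of that range, the two business rules mentioned could be sketched in a few lines of Python. The transaction field names, the role name, and the value of N are assumptions made for this example; a real deployment would read them from the audited database and a tunable configuration.

```python
from collections import Counter

MAX_COUPON_USES = 3                       # the "N" above; value is illustrative
ROLES_ALLOWED_TO_DISCOUNT = {"manager"}   # hypothetical role name


def flag_suspicious(transactions):
    """Return (transaction id, reason) pairs for every rule violation found."""
    flags = []
    coupon_uses = Counter()
    for tx in transactions:
        # Rule 1: only certain user types may change discounts.
        if tx.get("discount_changed") and tx["role"] not in ROLES_ALLOWED_TO_DISCOUNT:
            flags.append((tx["id"], "discount changed by unauthorised role"))
        # Rule 2: no more than N uses of the same coupon.
        coupon = tx.get("coupon")
        if coupon:
            coupon_uses[coupon] += 1
            if coupon_uses[coupon] > MAX_COUPON_USES:
                flags.append((tx["id"], "coupon used more than N times"))
    return flags


transactions = [
    {"id": 1, "role": "cashier", "discount_changed": True},
    {"id": 2, "role": "manager", "discount_changed": True},
    {"id": 3, "role": "cashier", "coupon": "SAVE10"},
    {"id": 4, "role": "cashier", "coupon": "SAVE10"},
    {"id": 5, "role": "cashier", "coupon": "SAVE10"},
    {"id": 6, "role": "cashier", "coupon": "SAVE10"},
]
for tx_id, reason in flag_suspicious(transactions):
    print(tx_id, reason)
```

Tuning to reduce false positives then amounts to adjusting the threshold and the allowed-role set, which is the kind of knob a GUI tool would expose to a non-SQL user.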
Data-mining is used by law enforcement and credit card companies to stop criminals. There are patterns in large data sets that can reveal a greater motive. The more data the law enforcement has, the better they can track down the criminal(s).
You want to gather as much data as you can about a crime that may happen. This means you want to run a Network Intrusion Detection System (NIDS) on the database's network. Snort is a very good NIDS, and it's free and open source. You want to provide as much evidence of a crime as you can to law enforcement, and the FBI will LOVE your Snort logs. I say when because it's only a matter of time.