Correct way to represent a while loop with one task in BPMN? - bpmn

Which is the more correct way in BPMN to represent a simple while loop that redirects to one task only?

I would say that using the loop activity is the better option, as it helps keep the process model tidy.
Also be careful when creating a loop in a process, as the task definition usually changes between the first iteration and the second. For example, the first iteration creates a file and the second actually edits it: two different actions (create and edit) should not be in a single task definition.

Normally, BPMN represents activities marching through time in a linear fashion, similar to a Value Stream Map. Creating a backward loop would disrupt the timeline.

Related

UML activity diagrams: the meaning of «iterative»

I want to check what is the definition of «iterative» in expansion regions in activity diagrams. For me personally this was never a question because I understand it as letting me do a For loop, e.g.,
For i=1 to 10
Do-Something // So it does it 10 times
End For
However, while I was presenting my UML diagram to an audience, an engineering team leader (not a UML maven) objected to the term ‘iterative’, because he understood ‘iterative’ to mean an 'iterative process' in which each step improves a result. I am also aware of this definition, but I assume the UML definition is not that, and rather means a simple For-loop.
Please confirm that the UML definition of «iterative» and iteration is like a simple For-loop, or explain otherwise if it is not.
No, it has a different meaning. UML 2.5 states on p. 480:
The mode of an ExpansionRegion controls how its expansion executions proceed.
If the value is iterative, the expansion executions must occur in an iterative sequence, with one completing before another can begin. The first expansion execution begins immediately when the ExpansionRegion starts executing, with subsequent executions starting when the previous execution is completed. If the input collections are ordered, then the expansion executions are sequenced in the order induced by the input collection. Otherwise, the order of the expansion executions is not defined.
Other values for this keyword are parallel and stream. As you can guess, behavior defined in a parallel region can be executed in parallel. stream is a bit more complicated; you can read about it on that page of the UML spec.
The for-loop itself comes from the input collection you pass to the region. This can be processed in either of the above ways.
tl;dr
So rather than denoting a for loop, the keyword «iterative» on the region says that its behavior may not be executed in parallel.
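To make the distinction concrete, here is a rough Python analogy (not UML tooling, and all function names are made up for illustration). In both modes the "loop" comes from the input collection; the mode only decides whether the executions may overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def run_iterative(items, behavior):
    """Analogy for «iterative»: one execution completes before the next begins."""
    results = []
    for item in items:
        # The next execution starts only after this one has returned.
        results.append(behavior(item))
    return results


def run_parallel(items, behavior, max_workers=4):
    """Analogy for «parallel»: executions are allowed to overlap in time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though the calls
        # themselves may run concurrently.
        return list(pool.map(behavior, items))


def square(x):
    """Hypothetical behavior applied to each element of the collection."""
    return x * x
```

Both calls, `run_iterative([1, 2, 3], square)` and `run_parallel([1, 2, 3], square)`, visit every element of the collection; only the first guarantees that the executions happen strictly one after another.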
Ahhh, semantics...
First a disclaimer: I am not a native English speaker. Yet I believe both my level of English and my IT experience are sufficient to answer this question.
Let's have a look at the dictionary definition of iterative first:
iterative adjective
/ˈɪtərətɪv/
/ˈɪtəreɪtɪv/, /ˈɪtərətɪv/
​(of a process) that involves repeating a process or set of instructions again and again, each time applying it to the result of the previous stage
We used an iterative process of refinement and modification.
an iterative procedure/method/approach
The highlight with a script font is mine.
Of course this is a pure word definition, not in the context of software development.
In real life a process can quite easily be considered repetitive without really being iterative. Imagine an assembly line in a mass production factory. At one of the positions a particular screw or set of screws is applied to join two or more elements. On every subsequent run, the same type and number of screws is applied to an identical set of elements. There is a virtually endless stream of similar part sets, each set consisting of the same type of parts as the previous one and requiring the same kind of connection. From the perspective of that position, joining the elements is a repetitive process but it is not iterative, as each join is applied to a different set of elements; it does not apply to those already joined.
If you think of code, it's somewhat different though. When applying a loop, you almost always have some sort of resulting set affected by it, and one can argue that with every loop step that resulting set is changed further, meaning the next loop step is applied to the result of the previous step. From this perspective almost every loop is iterative.
On the other hand, you can have a loop like this:
loop
wait 10
while buffer is empty
read buffer
You can clearly say it is a loop, yet nothing is being changed. All the code does is wait for a buffer to fill. So it is not iterative.
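The contrast between "repetitive" and "iterative" in the dictionary sense can be sketched in Python. Both functions below are illustrative assumptions, not taken from any library: the first applies the same operation to independent inputs (the assembly line), the second feeds each step's result into the next step (here, Newton's method for a square root).

```python
def repetitive(parts):
    """Same operation on each independent input; no step sees an earlier result."""
    return [part.upper() for part in parts]


def iterative_sqrt(value, steps=20):
    """Newton's method: each step is applied to the result of the previous step."""
    estimate = value if value > 1 else 1.0
    for _ in range(steps):
        # The new estimate is computed from the previous estimate.
        estimate = 0.5 * (estimate + value / estimate)
    return estimate
```

Both are written as loops, which is why the word "iterative" alone does not tell you which of the two is meant.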
For UML specifically though the precise meaning is included in qwerty_so's answer so I will not repeat it here.

How best to handle errors and/or missing data in a Neuraxle pipeline?

Let's assume you have a pipeline with steps that can fail for some input elements, for example:
FetchSomeImagesFromIds -> Resize -> DoSomethingElse
In this case the first step downloads only 10 out of 100 images and passes those to Resize.
I'm looking for suggestions on how to report or handle this missing data at the pipeline level, for example something like:
Pipeline.errors() -> PluginX: Succeed: 10, Failed: 90, Total: 100, Errors: key: error
My current implementation removes the missing keys from current_keys so that the key -> data mapping is kept, and actually exits the whole program if anything is missing, given the previous problem with https://github.com/Neuraxio/Neuraxle/issues/418
Thoughts?
I think that using a Service in your pipeline would be a good way. Here's what I'd do if I think about it, although more solutions could exist:
1. Create your pipeline and pipeline steps.
2. Create a context and add to the context a custom memory bank service in which you can keep track of which data was processed properly and which was not. Depending on your needs and broader context, it could be either a positive data bank or a negative one, in which you'd respectively either add the processed examples or subtract them from the set.
3. Adapt the pipeline made at point 1, and its steps, so that it can use the service from the context in the handle_transform_data_container methods. If you want your pipeline to work until everything has been processed, you could even have a WhileTrue() step that loops forever until a BreakIf() step evaluates that everything has been processed, fetching the batches as they come with no end condition other than the BreakIf step and its lambda. The lambda would call the service to know where the data processing is at.
4. At the end of the whole processing, whether you broke out prematurely (without any while loop) or only at the end, you still have access to the context and to what's stored inside.
More info:
To see an example on how to use the service and context together and using this in steps, see this answer: https://stackoverflow.com/a/64850395/2476920
Also note that the BreakIf and While steps are core steps that are not yet developed in Neuraxle. We recently had a brilliant idea with Vincent Antaki in which Neuraxle is a language, and therefore steps in a pipeline are like basic language keywords (While, Break, Continue, ForEach, and so forth). This abstraction is powerful in the sense that it makes it possible to control a data flow as a logical execution flow.
This is my best solution for now, and exactly this has never been done yet. There may be many other ways to do this, with creativity. One could even think of adding TryCatch steps to the pipeline to catch some errors and manage what happens in the execution flow of the data.
All of this remains to be done in Neuraxle. We're looking for collaborators. It would make a nice paper as well: "Machine Learning Pipelines as a Programming Language" :)
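As a rough, framework-free illustration of the "memory bank" idea above, the sketch below tracks which keys succeeded or failed in one step. It deliberately uses plain Python rather than the Neuraxle API; `ProcessingReport`, `run_step` and `fetch` are hypothetical names, and in a real pipeline the report object would be the service stored in the context.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingReport:
    """Hypothetical memory bank: records which keys succeeded or failed."""
    succeeded: list = field(default_factory=list)
    errors: dict = field(default_factory=dict)

    def summary(self):
        total = len(self.succeeded) + len(self.errors)
        return {
            "succeeded": len(self.succeeded),
            "failed": len(self.errors),
            "total": total,
        }


def run_step(step, items, report):
    """Apply step to each keyed item; drop failures but record them in report."""
    kept = {}
    for key, value in items.items():
        try:
            kept[key] = step(value)
        except Exception as exc:  # a real pipeline would catch narrower errors
            report.errors[key] = repr(exc)
        else:
            report.succeeded.append(key)
    return kept


def fetch(image_id):
    """Stand-in for a download step: fails for odd ids."""
    if image_id % 2:
        raise ValueError(f"image {image_id} not found")
    return f"image-{image_id}"
```

Calling `run_step(fetch, {i: i for i in range(4)}, report)` keeps the key -> data mapping for the items that survived, while `report.summary()` and `report.errors` give the per-step counts and the key -> error mapping asked for in the question.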

In “Given-When-Then” style BDD tests, is it OK to have multiple “When”s conjoined with an “And”?

I read Bob Martin's brilliant article on how "Given-When-Then" can actually be compared to an FSM. It got me thinking. Is it OK for a BDD test to have multiple "When"s?
For example:
GIVEN my system is in a defined state
WHEN an event A occurs
AND an event B occurs
AND an event C occurs
THEN my system should behave in this manner
I personally think these should be 3 different tests for good separation of intent. But other than that, are there any compelling reasons for or against this approach?
When multiple steps (WHEN) are needed before you do your actual assertion (THEN), I prefer to group them in the initial condition part (GIVEN) and keep only one in the WHEN section. This shows that the event that really triggers the "action" of my SUT is this one, and that the previous ones are more like steps to get there.
Your test would become:
GIVEN my system is in a defined state
AND an event A occurs
AND an event B occurs
WHEN an event C occurs
THEN my system should behave in this manner
but this is more of a personal preference I guess.
If you truly need to test that a system behaves in a particular manner under those specific conditions, it's a perfectly acceptable way to write a test.
I found that another limiting factor can arise in an E2E testing scenario where you would like to reuse a statement multiple times. In my case the BDD framework of my choice (pytest_bdd) is implemented in such a way that a given statement can have a single return value, and it maps the then input parameters automagically by the name of the function that was mapped to the given step. This design prevents reuse, which is what I wanted in my case. In short, I needed to create objects and add them to a sequence object provided by another given statement. I worked around this limitation by using a test fixture (which I named test_context), which was a Python dictionary (a hashmap), and by using when statements, which don't have the same single-return-value requirement. The '(when) add object to sequence' step looked up the sequence in the context and appended the object in question to it. So now I could reuse the 'add object to sequence' action multiple times.
This requirement was tricky because BDD aims to be descriptive. So I could have used a single given statement with the pickled memory map of the sequence object that I wanted to perform test action on. BUT would it have been useful? I think not. I needed to get the sequence constructed first and that needed reusable statements. And although this is not in the BDD bible I think in the end it is a practical and pragmatic solution to a very real E2E descriptive testing problem.
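The shared-context workaround can be sketched without any BDD framework. The plain functions below only mimic the pattern; they are not pytest_bdd step definitions, and all names are assumptions made for illustration. In a real suite the dictionary would be supplied by the test_context fixture and the functions would be registered as steps.

```python
def given_empty_sequence(context):
    """Given: an empty sequence exists in the shared context."""
    context["sequence"] = []


def when_object_added(context, obj):
    """When: look the sequence up in the context and append to it."""
    context["sequence"].append(obj)


def then_sequence_has_length(context, expected):
    """Then: assert on the state accumulated by the earlier steps."""
    assert len(context["sequence"]) == expected


def run_scenario():
    context = {}  # plays the role of the test_context fixture
    given_empty_sequence(context)
    when_object_added(context, "a")
    when_object_added(context, "b")  # the same step, reused
    then_sequence_has_length(context, 2)
    return context
```

Because the step mutates shared state instead of returning a value, it can appear any number of times in one scenario.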

Inactive sequence in stacked sequence structure in LabVIEW

I have a stacked sequence structure with 6 sequences. My problem is when I run the program, after the first sequence, data which should flow to the second one does not pass through this sequence. I checked it by having two numeric indicators, one inside and the other outside the wall of this sequence. Do you have any idea about this problem?
Thanks.
If you have a more recent version of LabVIEW, the fastest way to check your data flow is to right click on the sequence structure, and select Replace>>Replace with Flat Sequence.
This will convert any sequence locals to wires, which makes debugging much easier. You can use Undo to revert the structure back, if needed.
On a more general note, it's usually a good idea to avoid stacked sequences. They're difficult to use well.
I agree with Jakub that we need a screenshot, but here's my attempt: you can either use a "Local Variable" or a Shift Register (I prefer the former unless you are using the variable in every sequence).

How would you create a cyclic task graph in TPL, and/or is this possible?

My project has a requirement to gather data from a number of sources, then do things in response to the completion of the gathering of that data. Some of the gathering tasks have dependencies on prior gathering tasks. TPL has been a good fit because it naturally continues with tasks from their antecedents, and the "final" tasks that use the results are again dependents. Great. However, we would like to have a "sleep and regather" task that starts upon completion of the "final" tasks; this task's job is logically to be the antecedent of the "final" tasks and kick off the next cycle. In effect, the TPL's DAG becomes cyclic, or, if thought of sequentially, a loop.
Is it possible to express this cyclic requirement completely within the TPL API? If so, how? Our current implementation instead does a WaitAll() on the antecedents, and then a Task.Factory.StartNew() given a delegate that does a sleep followed by rebuilding a task graph with the WaitAll(). This works, but seems a bit artificial.
There are a few options here. What you are doing now seems reasonable.
However, you could potentially set up the entire operation as a producer/consumer scenario using BlockingCollection<T>. If your consuming enumerable used a ManualResetEvent that was set after the WaitAll completed, it could allow a single "item" to be consumed at a time, using tasks as you have it written now.
That being said, this seems like a perfect candidate for the TPL Dataflow library (in CTP).
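For readers who want to see the shape of "a loop around an acyclic graph", here is a small sketch. Note that it is not TPL and not TPL Dataflow: it uses Python's asyncio as a stand-in, and `gather_source`, `run_cycle` and `run_cycles` are hypothetical names. Each pass builds a fresh acyclic set of tasks, and the cycle is expressed as an ordinary loop with a sleep between passes, which is essentially what the question's current implementation does.

```python
import asyncio


async def gather_source(name, delay=0.0):
    """Hypothetical gathering task; returns a label instead of real data."""
    await asyncio.sleep(delay)
    return f"data:{name}"


async def run_cycle(sources):
    """One acyclic pass: gather everything, then run the 'final' step."""
    results = await asyncio.gather(*(gather_source(s) for s in sources))
    return list(results)  # stand-in for the 'final' tasks


async def run_cycles(sources, cycles, pause=0.0):
    """Express the cycle as a loop that rebuilds the graph on every pass."""
    outputs = []
    for _ in range(cycles):  # a real service would loop until cancelled
        outputs.append(await run_cycle(sources))
        await asyncio.sleep(pause)  # the 'sleep and regather' step
    return outputs
```

For example, `asyncio.run(run_cycles(["a", "b"], cycles=2))` runs the gather-then-finalize pass twice with a pause in between.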