When To Use Flowgear Subworkflow

I am creating a demo on the use of Flowgear sub-workflows. I have read the help documentation, but I do not understand when to use a sub-workflow. Can you please give me an example of how to create a sub-workflow in Flowgear, and in which situations I should use one?
Thanks

Think of referencing a sub-workflow as the equivalent of calling a function from another function in code. You should factor workflows into multiple workflows and call them from each other when your top-level workflow gets too big (readability), does too many things (separation of concerns), or you need to use elements of a workflow in other places (re-use).
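As a rough illustration of that analogy (plain Python with made-up names, nothing Flowgear-specific), factoring one big function into smaller ones mirrors factoring a large workflow into sub-workflows:

def fetch_customers():
    # "Sub-workflow" 1: pull data from the source system (stubbed here).
    return [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": ""}]

def validate(customers):
    # "Sub-workflow" 2: shared validation logic, reusable from other workflows.
    return [c for c in customers if c["email"]]

def sync_customers():
    # Top-level "workflow": stays readable because each concern lives in its own piece.
    return validate(fetch_customers())

print(sync_customers())

In Flowgear, each of those smaller pieces would be its own workflow that the parent workflow invokes.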

Related

How best to handle errors and/or missing data in a Neuraxle pipeline?

Let's assume you have a pipeline with steps that can fail for some input elements, for example:
FetchSomeImagesFromIds -> Resize -> DoSomethingElse
In this case, the first step downloads only 10 out of 100 images and passes those on to Resize.
I'm looking for suggestions on how to report or handle this missing data at the pipeline level, for example something like:
Pipeline.errors() -> PluginX: Succeeded: 10, Failed: 90, Total: 100, Errors: key: error
My current implementation removes the missing keys from current_keys so that the key -> data mapping is kept, and it actually exits the whole program if anything is missing, given the previous problem with https://github.com/Neuraxio/Neuraxle/issues/418
Thoughts?
I think that using a Service in your pipeline would be a good way to do this. Here's what I would do, although other solutions could exist:
Create your pipeline and pipeline steps.
Create a context and add to it a custom memory-bank service in which you keep track of which data was and was not processed properly. Depending on your needs and broader context, it could be either a positive data bank or a negative one: you would respectively either add the processed examples to the set or subtract them from it. A minimal sketch of such a service follows this list.
Adapt the pipeline made at point 1, and its steps, so that they can use the service from the context in their handle_transform_data_container methods. You could even have a WhileTrue() step that loops until a BreakIf() step evaluates that everything has been processed, if you want your pipeline to keep working until everything is done, fetching the batches as they come with no end condition other than the BreakIf step and its lambda. The lambda would call the service to know how far along the data processing is.
At the end of the whole processing, whether you broke out prematurely (without any while loop) or only broke at the end, you still have access to the context and to what is stored inside it.
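To make the memory-bank idea in step 2 concrete, here is a minimal plain-Python sketch. The class and method names are mine, not part of Neuraxle's API; registering such a service on the context and using it from steps is covered under "More info" below.

class ProcessingReportService:
    # A "memory bank" service: records which ids succeeded or failed.
    def __init__(self, expected_ids):
        self.expected_ids = set(expected_ids)
        self.succeeded = set()
        self.errors = {}  # id -> error message

    def mark_success(self, data_id):
        self.succeeded.add(data_id)

    def mark_failure(self, data_id, error):
        self.errors[data_id] = str(error)

    def all_done(self):
        # Usable as a BreakIf-style condition: stop once every id is accounted for.
        return self.expected_ids == self.succeeded | set(self.errors)

    def report(self):
        return {
            "total": len(self.expected_ids),
            "succeeded": len(self.succeeded),
            "failed": len(self.errors),
            "errors": dict(self.errors),
        }

A step's handle_transform_data_container would call mark_success or mark_failure as it goes, and the BreakIf lambda would simply check service.all_done().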
More info:
To see an example on how to use the service and context together and using this in steps, see this answer: https://stackoverflow.com/a/64850395/2476920
Also note that the BreakIf and While steps are core steps that are not yet developed in Neuraxle. We've recently had a brilliant idea with Vincent Antaki where Neuraxle is a language, and therefore steps in a pipeline are like basic language keywords (While, Break, Continue, ForEach) and so forth. This abstraction is powerful in the sense that it makes it possible to control a data flow as a logical execution flow.
This is my best solution for now, and exactly this has never been done yet. There may be many other ways to do it, with creativity. One could even think of adding TryCatch steps to the pipeline to catch some errors and manage what happens in the execution flow of the data.
All of this is to be done in Neuraxle and is not yet done. We're looking for collaborators. That would do a nice paper as well: "Machine Learning Pipelines as a Programming Language" :)

Google Dataflow Sharing Resources Between Windows

I am currently building a Google Dataflow pipeline that writes to multiple BigQuery tables at run-time. The problem I am currently facing is that I need to re-use resources like the BigQuery service instance, table info, etc. (I do not want to re-create those resources every time), but I am not able to cache them in an efficient way.
Currently I am using a simple factory to cache them (using a static concurrent hash map). The pipeline does not seem to pick those up from the cache (it actually does for a couple of them, but most are re-created).
I saw a workaround with fixed-size session windows, but I need a simpler solution if one exists.
So, are there any best practices or solutions for the problem I am facing?
Is there any way to share resources between windows?
Actually, I had misplaced the logging information, which led me to invert the result (my bad). But the solution of a static factory kept separate from the pipeline job does seem to resolve the resource-sharing issue. Hope this helps anyone having a similar issue. :)
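As a rough illustration of that static-factory idea, here is a sketch in Python using the Beam and BigQuery client libraries (the original question is Java, and the names get_bigquery_client and WriteRows are made up for the example). The cache lives at module level, outside the pipeline job, so every DoFn instance in a worker process reuses the same client instead of re-creating it per window or bundle.

import threading
import apache_beam as beam
from google.cloud import bigquery

_clients = {}
_lock = threading.Lock()

def get_bigquery_client(project):
    # Static-factory style: lazily create one client per project and reuse it
    # for the lifetime of the worker process.
    with _lock:
        if project not in _clients:
            _clients[project] = bigquery.Client(project=project)
        return _clients[project]

class WriteRows(beam.DoFn):
    def __init__(self, project, table):
        self._project = project
        self._table = table

    def setup(self):
        # Runs once per DoFn instance; the expensive resource comes from the shared factory.
        self._client = get_bigquery_client(self._project)

    def process(self, row):
        # insert_rows_json streams one row to the target table; failures are
        # returned as a list of errors rather than raised.
        errors = self._client.insert_rows_json(self._table, [row])
        if errors:
            yield {"row": row, "errors": errors}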

PigServer or PigRunner? Which is better?

I have written an embedded Pig program using the PigServer class, but I have come to know that we can also execute queries using the PigRunner class.
Can anyone tell me which one is better? Please explain the reason as well.
PigRunner essentially presents the same interface as the command-line program "pig", with the advantage that it can be called without going to the system shell and that it returns a PigStats object. It is therefore convenient for running complete, user-supplied scripts.
PigServer, however, allows on-the-fly creation and registration of queries, and then programmatic iteration over the results. It therefore provides a much more flexible and complete interface to Pig.

Help me define process and procedure?

I have never understood the basic difference (if there is any) between the two terms "process" and "procedure". Could you help me out? It can be answered in programming terms or in any other terms you like.
A process involves procedures, because the process is the whole while the procedure is a part. In some languages (like VB or SQL), a procedure is a method that does not return a value, as opposed to a function, which does. Also, in computing, a process means a program that is being executed, or at least is loaded into memory.
A process is business oriented (it can be represented by a workflow diagram) and normally includes a set of business rules, while a procedure is algorithm oriented (it can be represented by a flow diagram).
See:
http://en.wikipedia.org/wiki/Procedure_(term)
http://en.wikipedia.org/wiki/Process_(computing)
Here are the definitions for both terms provided by the Information Technology Infrastructure Library (ITIL):
Procedure: A Document containing steps that specify how to achieve an Activity. Procedures are defined as part of Processes. See Work Instruction.
Process: A structured set of activities designed to accomplish a specific Objective. A Process takes one or more defined inputs and turns them into defined outputs. A Process may include any of the Roles, responsibilities, tools and management Controls required to reliably deliver the outputs. A Process may define Policies, Standards, Guidelines, Activities, and Work Instructions if they are needed.
I found this link, which I think sums it up: Process versus Procedures.
I think the first two comparisons are crucial and give a good idea of what the rest elaborate on:
Procedures are driven by completion of the task
Processes are driven by achievement of a desired outcome
Procedures are implemented
Processes are operated
In the SICP book, there is a section 1.2, "Procedures and the Processes They Generate".
Its description of a procedure may help with understanding:
A procedure is a pattern for the local evolution of a computational process. It specifies how each stage of the process is built upon the previous stage. We would like to be able to make statements about the overall, or global, behavior of a process whose local evolution has been specified by a procedure. This is very difficult to do in general, but we can at least try to describe some typical patterns of process evolution.
Per my understanding, a procedure is how you program a solution to your problem in the programming language, while a process is what the computer needs to do according to your defined procedure.
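To make that distinction concrete, here is a small sketch in Python (SICP uses Scheme; factorial is just the classic example). The two procedures below compute the same value but generate different processes when they run.

def factorial_recursive(n):
    # The procedure is short, but the process it generates builds up a chain
    # of deferred multiplications (a recursive process).
    return 1 if n == 0 else n * factorial_recursive(n - 1)

def factorial_iterative(n):
    # Same result, but the process keeps all of its state in two variables
    # at every step (an iterative process).
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product

print(factorial_recursive(5), factorial_iterative(5))  # 120 120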
Policy is a rule or regulation for a task.
A process is a high-level view of how to achieve a task; put simply, it is the way.
A procedure is an instruction for performing an activity within a process.

DRY for JMeter tests

Is there a way to modularize JMeter tests?
I have recorded several use cases for our application. Each of them is in a separate thread group in the same test plan. To control the workflow I wrote some primitives (e.g. postprocessor elements) that are used in many of these thread groups.
Is there a way not to copy these elements into each thread group but to use some kind of referencing within the same test plan? What would also be helpful is a way to reference elements from a different file.
Does anybody have any solutions or workarounds. I guess I am not the only one trying to follow the DRY principle...
I think this post from Atlassian describes what you're after, using Module Controllers. I've not tried it myself yet, but it's on my list of things to do :)
http://blogs.atlassian.com/developer/2008/10/performance_testing_with_jmete.html
Jared
You can't do this with JMeter. The UI doesn't support it. The Workbench would be a perfect place to store those common elements but it's not saved in JMX.
However, you can parameterize just about anything so you can achieve similar effects. For example, we use the same regex post processor in several thread groups. Even though we can't share the processor, the whole expression is a parameter defined in the test plan, which is shared. We only need to change one place when the regex changes.
They are talking about saving the Workbench in a future version of JMeter. Once that's done, it would be trivial to add some UI to refer to elements in the Workbench.
Module controllers are useful for executing the same samples in different thread groups.
It is possible to use the same assertions in multiple thread groups very easily.
At your Test Plan level, create a set of User Defined variables with names like "Expected_Result_x". Then, in your response assertion, simply reference the variable name ${Expected_Result_x}. You would still need to add the assertion manually to every page you want a particular assertion on, but now you only have to change it one place if the assertion changes.