SCIPincludeBranchruleMybranchingrule() called many times - scip

I'm including my own branching rule on SCIP and I'm using the SCIPincludeBranchruleMybranchingrule() function to initialize some branching rule data. One of the things I do is to call the SCIPgetNVars() function. When I run the code, I see that the function is called many times (not once, as I thought, before the B&B algorithm starts) and I get the following error triggered by the SCIPgetNVars() function:
[src/scip/scip.c:10048] ERROR: invalid SCIP stage <0>
I'm confused about the use of SCIPincludeBranchruleMybranchingrule(), since the documentation states that this function can be use to initialize branching rule data. I would like to initialize some data that can be used at every B&B node, maybe the branching rule data is not the right way of doing it.
I'll appreciate any help!

The important thing to note here is that there is no problem available yet for which you want to access the variables.
Branching rules of SCIP provide several callbacks for data initialization. The include-
callback is only called once when SCIP starts, aka in the SCIP_STAGE_INIT stage of SCIP.
At this stage, you want the branching rule to inform SCIP that it exists, and optionally introduce some user parameters that are problem-independent.
There are two more callback-functions that allow for storing data which are better suited for what you intend to do; SCIPbranchruleInitsolMybranchingrule which is called just before the (presolved)
problem is about to be solved via branch-and-bound, and SCIPbranchruleInitMybranchingrule, which is called after a newly read problem was transformed.
Since the execution of a branching-rule is restricted to within the branch-and-bound-process, your callback is SCIPbranchruleInitSolMybranchingrule which you should implement by moving all problem specific data initialization there. Don't forget to also implement SCIPbranchruleExitsolMybranchrule to free the stored data every time the branch-and-bound search is terminated, either if search was terminated, or if a time limit was hit, or SCIP decided that it wants another restart.
FYI: Data that is allocated during the include-callback can be freed with the SCIPbranchruleFreeMybranchingrule-callback, which is executed once when SCIP is about to exit and free all left system-memory.

Related

When to use multiple MTLRenderCommandEncoders to perform my Metal rendering?

I'm learning Metal, and there's a conceptual question that I'm trying to wrap my head around: at what level, exactly, should my code handle successive drawing operations that require different pipeline states? As I understand it (from answers like this: https://stackoverflow.com/a/43827775/2752221), I can use a single MTLRenderCommandEncoder and change its pipeline state, the vertex buffer it's using, etc., between calls to drawPrimitives:, and the encoder state that was current at the time of each call to drawPrimitives: will be preserved. So that's great. But it also seems like the design of Metal is such that one can make multiple MTLRenderCommandEncoder instances, and use them to sequentially throw batches of commands into a MTLCommandBuffer. Given that the former works – using one MTLRenderCommandEncoder and changing its state – why would one do the latter? Under what circumstances is it correct to do the former, and under what circumstances is it necessary to do the latter? What is an example of a situation where the latter would be necessary/appropriate?
If it matters, I'm working on a macOS app, using Objective-C. Thanks.
Ignoring multithreaded encoding cases, which are somewhat advanced, the main reason you'd want to create multiple render command encoders during a frame is because you need to change which textures you're rendering to.
You'll notice that you need to provide a render pass descriptor when creating a render command encoder. For this reason, we often say that the sequence of commands belonging to a particular encoder constitute a render pass. The attachments of that descriptor refer to the textures that will be written to by the commands encoded by the encoder.
Many different techniques, including shadow mapping and postprocessing effects like bloom require multiple passes to produce. Since you can't change attachments in the midst of a pass, creating a new encoder is the only way to encode multiple passes in a frame.
Relatedly, you should ordinarily use one command buffer per frame. You can, however, sometimes reduce frame time by splitting your passes across multiple command buffers, but this is highly dependent on the shape of your workload and should only be done in tandem with profiling, as it's not always an optimization.
In addition to Warren's answer, another way to look at the question is by examining the API. A number of Metal objects are created from descriptors. The properties of the descriptor at the time an object is created from it govern that object for its lifetime. Those are aspects of the object that can't be changed after creation.
By contrast, the object will have various setter methods to modify other properties over its lifetime.
For a render command encoder, the properties that are fixed for its lifetime are those specified by the MTLRenderPassDescriptor used to create it. If you want to render with different values for any of those properties, the only way to do so is to create a new encoder from a different descriptor. On the other hand, if you can do everything you need/want to do by using the encoder's setter methods, then you don't need a new encoder.

How should I organise a pile of singly used functions?

I am writing a C++ OpenCV-based computer vision program. The basic idea of the program could be described as follows:
Read an image from a camera.
Do some magic to the image.
Display the transformed image.
The implementation of the core logic of the program (step 2) falls into a sequential calling of OpenCV functions for image processing. It is something about 50 function calls. Some temporary image objects are created to store intermediate results, but, apart from that, no additional entities are created. The functions from step 2 are used only once.
I am confused about organising this type of code (which feels more like a script). I used to create several classes for each logical step of the image processing. Say, here I could create 3 classes like ImagePreprocessor, ImageProcessor, and ImagePostprocessor, and split the abovementioned 50 OpenCV calls and temorary images correspondingly between them. But it doesn't feel like a resonable OOP design. The classes would be nothing more than a way to store the function calls.
The main() function would still just create a single object of each class and call thier methods consequently:
image_preprocessor.do_magic(img);
image_processor.do_magic(img);
image_postprocessor.do_magic(img);
Which is, to my impression, essentially the same thing as callling 50 OpenCV functions one by one.
I start to question whether this type of code requiers an OOP design at all. After all, I can simply provide a function do_magic(), or three functions preprocess(), process(), and postprocess(). But, this approach doesn't feel like a good practice as well: it is still just a pile of function calls, separated into a different function.
I wonder, are there some common practices to organise this script-like kind of code? And what would be the way if this code is a part of a large OOP system?
Usually, in Image Processing, you have a pipeline of various Image Processing Modules. Same is applicable on Video Processing, where each Image is processed according to its timestamp order in the video.
Constraints to consider before designing such pipeline:
Order of Execution of these modules is not always same. Thus, the pipeline should be easily configurable.
All modules of the pipeline should be executable in parallel with each other.
Each module of the pipeline may also have a multithreaded operation. (Out of scope of this answer, but is a good idea when a single module becomes the bottleneck for the pipeline).
Each module should easily adhere to the design and have the flexibility of internal implementation changes without affecting other modules.
The benefit of preprocessing of a frame by one module should be available to later modules.
Proposed Design.
Video Pipeline
A video pipeline is a collection of modules. For now, assume module is a class whose process method is called with some data. How each module can be executed will depend on how such modules are stored in VideoPipeline! To further explain, see below two categories:-
Here, let’s say we have modules A, B, and C which always execute in same order. We will discuss the solution with a video of Frame 1, 2 and 3.
a. Linked List: In a single-threaded application, frame 1 is first executed by A, then B and then C. The process is repeated for next frame and so on. So linked list seems like an excellent choice for the single threaded application.
For a multi-threaded application, speed is what matters. So, of course, you would want all your modules running 128-core machine. This is where Pipeline class comes into play. If each Pipeline object runs in a separate thread, the whole application which may have 10 or 20 modules starts running multithreaded. Note that the single-thread/multithread approach can be made configurable
b. Directed Acyclic Graph: Above-linked list implementation can be further improved when you have high processing power and want to reduce the lag between input and response time of pipeline. Such a case is when module C does not depend on B, but on A. In such case, any frame can be parallelly processed by module B and module C using a DAG based implementation. However, I wouldn’t recommend this as the benefits are not so great compared to the increased complexity, as further management of output from module B and C needs to be done by say module D where D depends on B or C or both. The number of scenarios increases.
Thus, for simplicity sake, let’s use LinkedList based design.
Pipeline
Create a linked list of PipelineElement.
Make process method of pipeline call process method of the first element.
PipelineElement
First, the PipelineElement processes the information by calling its ImageProcessor(read below). The PipelineElement will pass a Packet(of all data, read below) to ImageProcessor and receives the updated packet.
If next element is not null, call next PipelineElement process and pass updated packet.
If next element of a PipelineElement is null, stop. This element is special as it has an Observer object. Other PipelineElement will be set to null for Observer field.
FrameReader(VIdeoReader/ImageReader)
For video/image reader, create an abstract class. Whether you process video or image or multiple, processing is done one frame at a time, so create an abstract class(interface) ImageProcessor.
A FrameReader object stores reference to the pipeline.
For each frame, it pushes the information in by calling process method of Pipeline.
ImageProcessor
There is no Pre and Post ImageProcessor. For example, retinex processing is used as Post Processing but some application can use it as PreProcessing. Retinex processing class will implement ImageProcessor. Each element will hold Its ImageProcessor and Next PipeLineElement object.
Observer
A special class which extends PipelineElement and provides a meaningful output using GUI or disk.
Multithreading
1. Make each method run in its thread.
2. Each thread will poll messages from a BlockingQueue( of small size like 2-3 Frames) to act as a buffer between two PipelineElements. Note: The queue helps in averaging the speed of each module. Thus, small jitters(a module taking too long time for a frame) does not affect video output rate and provides smooth playback.
Packet
A packet will store all the information such as input or Configuration class object. This way you can store intermediate calculations as well as observe a real-time effect of changing configuration of an algorithm using a Configuration Manager.
To conclude, each element can now process in parallel. The first element will process nth frame, the second element will process n-1th frame, and soon, but with this, a lot more issues such as pipeline bottlenecks and additional delays due to less core power available to each element will pop up.
This structure lends itself to the pipes and filters architecture (see Pattern-Oriented Software Architecture Volume 1: A System of Patterns by Frank Buschmann):
The Pipes and Filters architectural pattern provides a structure for
systems that process a stream of data. Each processing step is
encapsulated in a filter component. Data is passed through pipes
between adjacent filters. Recombining filters allows you to build
families of related systems.
See also this short description (with images) from the Enterprise Integration Patterns book.

Get value of control refnum in one step in SubVI

I'm trying to de-spaghetti a big UI by creating SubVIs that handle only the controls that are relevant, via control refnums.
Now, when extracting the code from the main VI and re-wiring into the subVIs, things get clutter-y.
To read/write these refnums, I have to do a two-step process. First add a terminal to get the control refnum value and then another to get the value of the control.
Wiring the refnums everywhere is not really an option as that will create more spaghetti if there are more than two of them. (usually 4-10)
Is there a better way?
UPDATE
Guys, this is a low-level question about the picture above, not really a queston about large scale architecture / design patterns. I'm using QMH, classes, et.al. where appropriate.
I just feel there should be a way to get the typed value from a typed control ref in one step. It feels kind of common.
In the caller VI, where the controls/indicators actually live, create all your references, then bundle them into clusters of relevant pieces. Pass the clusters into your subVIs, giving a given subVI only the cluster it needs. This both keeps your conpane cleaned up and and makes it clear the interface that each subVI is talking to. Instead of a cluster, you may want to create a LV class to further encapsulate and define the sub-UI operations, but that's generally only on larger projects where some components of the UI will be reused in other UIs.
I'm not sure there is a low-touch way to de-spaghetti a UI with lots of controls and indicators.
My suggestion is to rework the top-level VI into a queued message handler, which would allow you to decouple the user interaction from the application's response. In other words, rather than moving both the controls and the code that handles their changes to subVIs (as you're currently doing), this would keep the controls where they are (so you don't need to use ref nums and property nodes) and only move the code to subVIs.
This design pattern is built-in to recent versions of LabVIEW: navigate to File » Create Project to make LabVIEW generate a project you can evaluate. For more information about understanding how to extend and customize it, see this NI slide deck: Decisions Behind the Design of the
Queued Message Handler Template.
In general, it is not the best practice to read/write value using refnum in perspective of performance. It requires a thread swap to the UI thread each time (which is a heavy process), whereas the FP Terminal is privileged to be able to update the panel without switching execution threads and without mutex friction.
Using references to access value
Requires to update the front panel item every single time they are called.
They are a pass by reference function as opposed to a pass by value function. This means they are essentially pointers to specific memory locations. The pointers must be de-referenced, and then the value in memory updated. The process of de-referencing the variables causes them to be slower than Controls/Indicators, or Local Variables.
Property Nodes cause the front panel of a SubVI to remain in memory, which increases memory use. If the front panel of a SubVI is not displayed, remove property nodes to decrease memory use.
If after this you want to use this method you can use VI scripting to speed up the process: http://sine.ni.com/nips/cds/view/p/lang/en/nid/209110

LabVIEW: Programmatically setting FPGA I/O variables (templates?)

Question
Is there a way to programmatically set what FPGA variables I am reading from or writing to so that I can generalize my main simulation loop for every object that I want to run? The simulation loops for each object are identical except for which FPGA variables they read and write. Details follow.
Background
I have a code that uses LabVIEW OOP to define a bunch of things that I want to simulate. Each thing then has an update method that runs inside of a Timed Loop on an RT controller, takes a cluster of inputs, and returns a cluster of outputs. Some of these inputs come from an FPGA, and some of the outputs are passed back to the FPGA for some processing before being sent out to hardware.
My problem is that I have a separate simulation VI for every thing in my code, since different values are read from and returned to the FPGA for each thing. This is a pain for maintainability and seems to cry out for a better method. The problem is illustrated below. The important parts are the FPGA input and output nodes (change for every thing), and the input and output clusters for the update method (always the same).
Is there some way to define a generic main simulation VI and then programmatically (maybe with properties stored in my things) tell it which specific inputs and outputs to use from the FPGA?
If so then I think the obvious next step would be to make the main simulation loop a public method for my objects and just call that method for each object that I need to simulate.
Thanks!
The short answer is no. Unfortunately once you get down to the hardware level with LabVIEW FPGA things begin to get very static and rely on hard-coded IO access. This is typically handled exactly how you have presented your current approach. However, you may be able encapsulate the IO access with a bit of trickery here.
Consider this, define the IO nodes on your diagram as interfaces and abstract them away with a function (or VI or method, whichever term you prefer). You can implement this with either a dynamic VI call or an object oriented approach.
You know the data types defined by your interface are well known because you are pushing and pulling them from clusters that do not change.
By abstracting away the hardware IO with a method call you can then maintain a library of function calls that represent unique hardware access for every "thing" in your system. This will encapsulate changes to the hardware IO access within a piece of code dedicated to that job.
Using dynamic VI calls is ugly but you can use the properties of your "things" to dictate the path to the exact function you need to call for that thing's IO.
An object oriented approach might have you create a small class hierarchy with a root object that represents generic IO access (probably doing nothing) with children overriding a core method call for reading or writing. This call would take your FPGA reference in and spit out the variables every hardware call will return (or vice versa for a read). Under the hood it is taking care of deciding exactly which IO on the FPGA to access. Example below:
Keep in mind that this is nowhere near functional, I just wanted you to see what the diagram might look like. The approach will help you further generalize your main loop and allow you to embed it within a public call as you had suggested.
This looks like an [object mapping] problem which LabVIEW doesn't have great support for, but it can be done.
My code maps one cluster to another assuming the control types are the same using a 2 column array as a "lookup."

Why certain block closure optimization is good and valid?

In a very interesting post from 2001 Allen Wirfs-Brock explains how to implement block closures without reifying the (native) stack.
From the many ideas he exposes there is one that I don't quite understand and I thought it would be a good idea to ask it here. He says:
Any variable that can never be assigned during the lifetime of a block (e.g., arguments of enclosing methods and blocks) need not be placed in the environment if instead a copy of the variable is placed in the closure when it is created
There are two things I'm not sure I understand well enough:
Why using two copies of the read-only variable is faster than having the variable moved to the environment? Is it because it would be faster for the enclosing context to access the (original) variable in the stack?
How can we ensure that the two variables remain synchronized?
In question 1 there must be another reason. Otherwise I don't see the gain (when compared with the cost of implementing the optimization.)
For Question 2 take a non argument that is assigned in the method and not in the block. Why the oop stored in the stack would remain unchanged during the life of the block?
I think I know the answer to Q2: Because the execution of the block cannot be intertwined with the execution of the method, i.e., while the block lives, the enclosing context does not run. But isn't there any way to modify the stack temporary while the block is alive?
Thanks to the comment of #aka.nice I found the answers to the two questions in Clement Bera's post, whose reading is both pleasant and clarifying.
For Q1 let's first say that Allen's remark means that the copy of the read-only variable can be placed in the block's stack, as if it were a local temporary of the block. The advantage of doing this only materializes if all variables defined outside the block and used inside it are never written in the block. Under these circumstances there would be no need to create the environment array and to emit any prolog or epilog to take care of it.
The machine code that accesses a stack variable is equivalent to the one required to access the environment one because the first would address the location using [ebp + offset] while the second would use [edi + offest], once edi has been set to point to the environment array (tempVector in Clement's notation.) So, there is no gain if some but not all of the environment variables are read-only.
The second question is also answered in Clement's excellent blog. Yes, there is another way to break the synchrony between the original variable and its copy in the block's stack: the debugger (as aka.nice would have told us!) If the programmer modifies the variable in the enclosing context, the debugger will need to detect the action and update the copy as well. Same if the programmer modifies the copy held in the block's stack.
I'm glad I decided to post the question here. The help I received from aka.nice and Clement Bera, plus the comments some people sent me by email helped a lot in augmenting my understanding.
One final remark. Wirfs-Brock claims that avoiding the reification of method contexts is mandatory. I tend to agree. However, many important operations on these data structures can be better implemented if the reification follows the lightweight pattern. More precisely, when debugging you can model these contexts with "viewers" that point to the native stack and use two indexes to delimit the portion that corresponds to the activation under analysis. This is both efficient and clean and the combination of both techniques leads to the best of the worlds because you can have speed and expressiveness at once. Smalltalk is amazing.