Which matplotlib.axes methods are safe to call from a background thread?

I’m writing an interactive application using wxPython and matplotlib. I have a single graph embedded in the window as a FigureCanvasWxAgg instance. I’m using this graph to display some large data sets, between 64,000 and 512,000 data points, so matplotlib’s rendering takes a while. New data can arrive every 1–2 seconds so rendering speed is important to me.
Right now I have an update_graph_display method that does all of the work of updating the graph. It handles updating the actual data as well as things like changing the y axis scale from linear to logarithmic in response to a user action. All in all, this method calls quite a few methods on my axes instance: set_xlim, set_ylabel, plot, annotate, and a handful of others.
The update_graph_display method is wrapped in a decorator that forces it to run on the main thread, to prevent the UI from being modified from multiple threads simultaneously. The problem is that all of this graph computation and drawing takes a while, and since all of that work happens on the main thread, the application is unresponsive for noticeable periods of time.
To what extent can the computation of the graph contents be done on some other thread? Can I call set_xlim, plot, and friends on a background thread, deferring just the final canvas.draw() call to the main thread? Or are there some axes methods which will themselves force the graph to redraw itself?

I will reproduce @tcaswell’s comment:
No Axes methods should force a re-draw (and if you find any that do please report it as a bug), but I don't know enough about threading to tell you they will be safe. You might get some traction using blitting and/or re-using artists as much as possible (via set_data calls), but you will have to write the logic to manage that yourself. Take a look at how the animation code works.
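For what it's worth, here is one minimal sketch of that split, not the original poster's code: the expensive computation runs on a worker thread, while every matplotlib call (set_data, set_xlim, and the draw) is marshalled back to the wx main thread with wx.CallAfter, and the Line2D artist is created once and reused via set_data as the comment suggests. GraphPanel and acquire_and_crunch_data are hypothetical names.

# Minimal sketch, assuming a wxPython app: heavy data preparation off the main thread,
# all matplotlib/Axes calls and the draw marshalled back with wx.CallAfter.
import threading

import numpy as np
import wx
from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg
from matplotlib.figure import Figure


def acquire_and_crunch_data():
    # Stand-in for the expensive per-update computation (64k-512k points).
    x = np.linspace(0.0, 1.0, 100_000)
    return x, np.abs(np.sin(50.0 * x))


class GraphPanel(wx.Panel):  # hypothetical panel that owns the canvas
    def __init__(self, parent):
        super().__init__(parent)
        self.figure = Figure()
        self.axes = self.figure.add_subplot(111)
        self.canvas = FigureCanvasWxAgg(self, wx.ID_ANY, self.figure)
        (self.line,) = self.axes.plot([], [])  # create the artist once

    def start_worker(self):
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            x, y = acquire_and_crunch_data()        # heavy work, off the main thread
            wx.CallAfter(self._apply_update, x, y)  # cheap UI work, on the main thread

    def _apply_update(self, x, y):
        self.line.set_data(x, y)          # reuse the existing artist
        self.axes.set_xlim(x[0], x[-1])
        self.axes.relim()
        self.axes.autoscale_view(scalex=False)
        self.canvas.draw_idle()           # schedule the redraw without blocking

If the full redraw is still too slow, blitting (canvas.copy_from_bbox, canvas.restore_region, axes.draw_artist, canvas.blit) can skip re-rendering the static parts, as the comment suggests, but it too belongs on the main thread.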

Related

In a 2d application where you're drawing a lot of individual sprites, will the rasterization stage inevitably become a bottleneck? [duplicate]

I'm in the process of learning Vulkan, and I have just integrated ImGui into my code using the Vulkan-GLFW example in the original ImGui repo, and it works fine.
Now I want to render both the GUI and my 3D model on the screen at the same time, and since the GUI and the model definitely need different shaders, I need to use multiple pipelines and submit multiple commands. The GUI is partly transparent, so I would like it to be rendered after the model. The Vulkan spec states that commands are not necessarily executed in the order I record them, so I need synchronization of some kind. In this Reddit post several methods of achieving exactly this were proposed, and I once believed that I had to use multiple subpasses (together with a subpass dependency), barriers, or some other synchronization mechanism to solve this problem.
Then I had a look at SaschaWillems' Vulkan examples. In the ImGui example, though, I see no synchronization between the two draw calls; it just records the command to draw the model first, and then the command to draw the GUI.
I am confused. Is synchronization really needed in this case, or did I misunderstand something about command re-ordering or blending? Thanks.
Think about what you're doing for a second. Why do you think there needs to be synchronization between the two sets of commands? Because the second set of commands needs to blend with the data in the first set, right? And therefore, it needs to do a read/modify/write (RMW), which must be able to read data written by the previous set of commands. The data being read has to have been written, and that typically requires synchronization.
But think a bit more about what that means. Blending has to read from the framebuffer to do its job. But... so does the depth test, right? It has to read the existing sample's depth value, compare it with the incoming fragment, and then discard the fragment or not based on the depth test. So basically every draw call that uses a depth test contains a framebuffer read/modify/write.
And yet... your depth tests work. Not only do they work between draw calls without explicit synchronization, they also work within a draw call. If two triangles in a draw call overlap, you don't have any problem with seeing the bottom one through the top one, right? You don't have to do inter-triangle synchronization to make sure that the previous triangles' writes are finished before the reads.
So somehow, the depth test's RMW works without any explicit synchronization. So... why do you think that this is untrue of the blend stage's RMW?
The Vulkan specification states that commands, and stages within commands, will execute in a largely unordered way, with several exceptions. The most obvious being the presence of explicit execution barriers/dependencies. But it also says that the fixed-function per-sample testing and blending stages will always execute (as if) in submission order (within a subpass). Not only that, it requires that the triangles generated within a command also execute these stages (as if) in a specific, well-defined order.
That's why your depth test doesn't need synchronization; Vulkan requires that this is handled. This is also why your blending will not need synchronization (within a subpass).
So you have plenty of options (in order from fastest to slowest):
Render your UI in the same subpass as the non-UI. Just change pipelines as appropriate.
Render your UI in a subpass with an explicit dependency on the framebuffer images of the non-UI subpass. While this is technically slower, it probably won't be slower by much if at all. Also, this is useful for deferred rendering, since your UI needs to happen after your lighting pass, which will undoubtedly be its own subpass.
Render your UI in a different render pass. This would only really be needed for cases where you need to do some full-screen work (SSAO) that would force your non-UI render pass to terminate anyway.

SceneKit crashes without verbose crash info

Update: For anyone who stumbles upon this, it seems like SceneKit has a threshold for the maximum number of objects it can render. Using [SCNNode flattenedClone] is a great way to help increase the amount of objects it can handle. As @Hal suggested, I'll file a bug report with Apple describing the performance issues discussed below.
I'm somewhat new to iOS and I'm currently working on my first OS X project for a class. I'm essentially creating random geometric graphs (random points in space connected to one another if the distance between them is ≤ a given radius) and I'm using SceneKit to display the results. I already know I'm pushing SceneKit to its limits, but if the number of objects I'm trying to graph is too large, the whole thing just crashes and I don't know how to interpret the results.
My SceneKit scene consists of the default camera, 2 lighting nodes, approximately 5,000 SCNSpheres each within an SCNNode (the nodes on the graph), and then about 50,000 connections of type SCNGeometryPrimitiveTypeLine which are also within SCNNodes. All of these nodes are then added to one large node which is added to my scene.
The code works for smaller numbers of spheres and connections.
When I run my app with these specifications, everything seems to work fine; then, 5-10 seconds after executing the following lines:
dispatch_async(dispatch_get_main_queue(), ^{
    [self.graphSceneView.scene.rootNode addChildNode:graphNodes];
});
the app crashes with the terse crash screen shown in the original screenshot (not reproduced here).
Given that I'm sort of new to Xcode and used to more verbose output upon crashing, I'm a bit in over my head. What can I do to get more information on this crash?
That's terse output for sure. You can attack it by simplifying until you don't see the crash anymore.
First, do you ever see anything on screen?
Second, your call to
dispatch_async(dispatch_get_main_queue(), ^{
    [self.graphSceneView.scene.rootNode addChildNode:graphNodes];
});
still runs on the main queue, so I would expect it to make no difference in perceived speed or responsiveness. So take addChildNode: out of the GCD block and invoke it directly. Does that make a difference? At the least, you'll see your crash immediately, and might get a better stack trace.
Third, creating your own geometry from primitives like SCNGeometryPrimitiveTypeLine is trickier than using the SCNGeometry subclasses. Memory mismanagement in that step could trigger mysterious crashes. What happens if you remove those connection lines? What happens if you replace them with long, skinny SCNBox instances? You might end up using SCNBox by choice, because it's easier to style in SceneKit than a primitive line.
Fourth, take a look at @rickster's answer to this question about optimization: SceneKit on OS X with thousands of objects. It sounds like your project would benefit from node flattening (flattenedClone), and possibly the use of SCNLevelOfDetail. But these suggestions fall into the category of premature optimization, the root of all evil.
It would be interesting to hear what results from toggling between the Metal and OpenGL renderers. That's a setting on the SCNView in IB ("preferred renderer" I think), and an optional entry in Info.plist.

Working around WebGL readPixels being slow

I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of gpu-land and into javascript-land.
As an example, here's my exact use case. The following circuit animation (not reproduced here) was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing.
The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).
What are some tricks for speeding up readPixels in this kind of use case?
Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
Should I be using a different method to get the information back out of the textures?
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens, because it's sending a request for data to the GPU and then waiting for it to respond, which normal draw calls don't have to do.
Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
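The asker's code is WebGL/JavaScript, but the "one combined buffer, read once, read late" idea is the same in any GL binding. Here is a rough sketch using PyOpenGL (an assumption for illustration, not the original stack): all eight per-column states are rendered side by side into one framebuffer-attached texture, and a single glReadPixels at the end pays the stall exactly once. draw_column_state is a placeholder for the per-column render, and a GL context is assumed to already be current.

# Rough sketch of batching many small reads into one late glReadPixels call.
import numpy as np
from OpenGL.GL import (
    glGenFramebuffers, glBindFramebuffer, glGenTextures, glBindTexture,
    glTexImage2D, glTexParameteri, glFramebufferTexture2D, glViewport,
    glReadPixels, GL_FRAMEBUFFER, GL_TEXTURE_2D, GL_RGBA, GL_UNSIGNED_BYTE,
    GL_COLOR_ATTACHMENT0, GL_TEXTURE_MIN_FILTER, GL_NEAREST,
)

COLS, W, H = 8, 256, 256  # eight per-column states, each W x H pixels


def read_all_columns(draw_column_state):
    # One wide render target holding all eight intermediate states side by side.
    fbo = glGenFramebuffers(1)
    tex = glGenTextures(1)
    glBindTexture(GL_TEXTURE_2D, tex)
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, COLS * W, H, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, None)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
    glBindFramebuffer(GL_FRAMEBUFFER, fbo)
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0)

    # Issue all of the draw calls first...
    for i in range(COLS):
        glViewport(i * W, 0, W, H)
        draw_column_state(i)  # placeholder for rendering column i's state

    # ...then stall the pipeline exactly once, as late as possible.
    data = glReadPixels(0, 0, COLS * W, H, GL_RGBA, GL_UNSIGNED_BYTE)
    return np.frombuffer(data, dtype=np.uint8).reshape(H, COLS * W, 4)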
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
This will get you immensely better performance.
If your graphics are all like you show above, you shouldn't need to do any “layout” at all (which is good, because it'd be very awkward to implement): everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes pointing at the simulation-state texel it depends on).
The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
I don't know enough about your use case, but just guessing: why do you need readPixels at all?
First, you don't need to draw text or the static parts of your diagram in WebGL. Put another canvas or svg or img over the WebGL canvas and set the CSS so they overlap. Let the browser composite them. Then you don't have to do it.
Second, let's assume you have a texture that has your computed results in it. Can't you just make some geometry that matches the places in your diagram that need to have colors, and use texture coords to look up the results from the correct places in the results texture? Then you don't need to call readPixels at all. That shader can use a ramp texture lookup or any other technique to convert the results to other colors to shade the animated parts of your diagram.
If you want to draw numbers based on the results you can use a technique like this, where you'd make a shader that references the results texture, looks up a result value, and then indexes glyphs from another texture based on that.
Am I making any sense?

Why is my graph so choppy?

I have created a scatter plot using Core Plot. My graph, however, needs to be refreshed dynamically (points are constantly being added and removed). I need the plot to update smoothly and appear to be "sliding across the graph". Instead I seem to be getting a choppy line that adds several values at once, freezes, and then again adds several values. What could be causing this behaviour?
-(void)updateDataWithVal:(double)percentageUsage
{
    if ([self.graphData count] >= 10)
    {
        [self.graphData removeLastObject];
    }
    [self.graphData insertObject:[NSNumber numberWithDouble:percentageUsage] atIndex:0];
    [self.graph reloadData];
}
Above is the function that is called every time I want the graph to change. The problem isn't with the data being updated. I debugged the function and noticed that the data is being updated at a steady rate (a point is added and removed from the data array once per second). The problem is with the graph actually changing. What could be causing the graph to freeze and then add several points at once (every 6-7 seconds) instead of continuously updating every second like the data does?
I doubt this is being caused by adding too many points in a short interval. Only one point is removed and added per second. Additionally, my graph has only one plot.
My graph is running on OS X, not iOS. All code is in Objective-C.
As requested, I can convert my comments into an answer so that this can be closed out.
Core Plot graphs are heavily reliant on display elements, so any updates to them must be performed on the main thread. Otherwise, you will see odd rendering behavior like inconsistent updates and visual artifacts, and your application will most likely crash at some point.
I have to do the same thing that you describe within one of my Mac applications. For this, I use a background GCD queue to handle the data acquisition and processing to avoid blocking the main thread. However, every time that I need to insert the results into the graph and have it update, I use dispatch_async() to wrap the appropriate code in a block to be performed on the main thread. This should protect you against rendering oddities like what you see here.

Skipping Kinect events

The Kinect sensor raises many events per second, and if you aren't fast enough at processing them (for example, when trying to animate a true 3D character), you fall behind within a few frames.
What is the best approach to handling only a reasonable number of events, without blocking the user interface?
Thanks.
I would suggest requesting the frame in a loop instead of using the event method.
To do this in your animation loop just call:
sensor.DepthStream.OpenNextFrame(millisecondsWait);
Or:
sensor.SkeletonStream.OpenNextFrame(millisecondsWait);
Or:
sensor.ColorStream.OpenNextFrame(millisecondsWait);
Event-driven programming is great, but when you run into problems like the one you mention, it is better to just call the functions when you need them.
I'd say that if you're animating something really quick and elaborate (e.g. a complex 60 fps 3D scene), the time it takes to fetch the image from the camera synchronously might create hitches in your rendering.
I'd try splitting the rendering and the Kinect processing/polling in separate threads; with that approach you could even keep using the 30fps event-driven model.
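The Kinect SDK itself is .NET/C++, so purely as a language-agnostic sketch of that pattern in Python (not Kinect-specific code): one thread polls the sensor and keeps overwriting a latest-frame slot, while the render loop consumes whatever is newest, so a slow frame never builds up a backlog of stale events. read_next_frame and draw are placeholders for the real sensor and UI calls.

# Sketch of "poll in one thread, render only the newest frame in another".
import threading
import time


class LatestFrame:
    # Holds only the most recent frame; anything older is silently dropped.
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        with self._lock:
            self._frame = frame

    def take(self):
        with self._lock:
            frame, self._frame = self._frame, None
        return frame


def read_next_frame():
    time.sleep(1 / 30)       # the sensor delivers roughly 30 frames per second
    return time.time()       # placeholder payload

def draw(frame):
    print("rendering frame", frame)  # placeholder for the real UI/3D update

latest = LatestFrame()

def poll_sensor():
    while True:
        latest.put(read_next_frame())  # overwrite; never let a backlog build up

def render_loop():
    while True:
        frame = latest.take()
        if frame is not None:
            draw(frame)
        time.sleep(1 / 60)   # render at your own pace, independent of the sensor

threading.Thread(target=poll_sensor, daemon=True).start()
render_loop()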