Colab kernel restarts for unknown reasons - pandas

My Colab kernel restarted itself for unknown reasons when I tried to plot my data with matplotlib's plot() (just before that, I had successfully plotted histograms of the same data). Colab gave a notification to check the runtime logs, which are as follows. I don't see any apparent reason why this happened.

I just found out why it was happening. The Colab runtime's RAM was being consumed entirely while running the last cell. It happened when I used matplotlib's plot(), but when I used matplotlib histograms, it turned out just fine. Histograms are probably lighter when you have high-precision float values.
I repeated the comparison over multiple trials between the two methods, and got the same result every time.
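One way to avoid exhausting the runtime's RAM in cases like this is to decimate the series before handing it to plot(); a minimal sketch (the array, its size, and the target point count are invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, just for this sketch
import matplotlib.pyplot as plt

# Hypothetical stand-in for a large high-precision column of data
data = np.random.default_rng(0).standard_normal(1_000_000)

# Plot at most ~10k points: stride the array before handing it to plot()
step = max(1, len(data) // 10_000)
sampled = data[::step]

fig, ax = plt.subplots()
ax.plot(sampled)  # far fewer vertices for the renderer to hold in memory
plt.close(fig)
```

This trades resolution for memory; for a faithful zoomed-out view a min/max per-window reduction (as discussed in the last question below) preserves extremes that plain striding can miss.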

Related

Geoviews bokeh vs matplotlib for plotting large xarrays

I am trying to plot a large xarray dataset of x=1000 by y=1000 for t=10 different timestamps in Google Colab.
See the following example notebook:
https://colab.research.google.com/drive/1HLCqM-x8kt0nMwbCjCos_tboeO6VqPjn
However, when I try to plot this with gv.extension('bokeh') it doesn't give any output, whereas gv.extension('matplotlib') does correctly show a plot of this data.
I suppose it has something to do with the amount of data bokeh can store in one view?
I already tried putting dynamic=True, which does make it work. But for my use case the delay in viewing different timestamps is not really desirable. Same goes for datashader regrid, which makes it run faster, but the delay in viewing different timestamps is not wanted.
Is there a way to plot this large xarray with bokeh making it as smoothly visible and slidable as with matplotlib?
Or are there any other ways I can try and visualise this data interactively (on a web app)?
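If the delay from dynamic=True or datashader's regrid is unacceptable, one alternative is to pre-aggregate each timestamp down to a grid small enough for bokeh to hold every frame in one view (xarray offers coarsen() for exactly this); a plain-NumPy sketch of the block-mean idea, with made-up sizes:

```python
import numpy as np

# Hypothetical 1000x1000 field for one timestamp
field = np.random.default_rng(0).random((1000, 1000))

# Aggregate 10x10 blocks into their mean, shrinking the grid to 100x100,
# a size bokeh can comfortably keep in memory for every timestamp at once
factor = 10
h, w = field.shape
coarse = field.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
```

With all timestamps reduced this way up front, a bokeh slider can switch between frames instantly because no per-frame recomputation is needed.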

How to redraw only the regions of an image that have changed using matplotlib

I have an application where only one row of a 1024x2048 pixel image changes at a rate of 100 times per second. I would like to display and update this image in real time using minimal resources.
However, matplotlib redraws the entire image every time I call the plt.draw() function. This is slow and processor intensive.
Is there a way to redraw only one line at a time?
I am not an expert on matplotlib internals, but I think it cannot be done that way. Matplotlib was not designed for displaying large, rapidly changing images at a high frame rate; it was designed to offer a high-level, easy-to-use API for interactive plots.
Internally it is implemented in both Python and C++ (the C++ parts handle the low-level, performance-critical rendering), and it draws through pluggable GUI backends such as Tk, Qt, or GTK (which is what gives it its great cross-platform portability).
So your 1024x2048 matrix has to be transformed several times before it is displayed.
If you do not need the extra features matplotlib provides (autoscaling, axes, interactive zoom...) and your main goal is speed, I recommend using a more performance-focused Python library for display.
There are lots of options: pyopencv, pySDL2...
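That said, matplotlib does expose a blitting API that avoids re-rendering the static parts (axes, ticks, labels) on every frame. It still re-rasterizes the whole image artist, so it is only a partial mitigation, not a true one-row redraw. A sketch using the Agg backend and invented data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; a GUI backend works the same way
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
img = np.zeros((1024, 2048))  # invented stand-in for the application's image

fig, ax = plt.subplots()
im = ax.imshow(img, vmin=0, vmax=1, animated=True)
fig.canvas.draw()                                # one full draw up front
background = fig.canvas.copy_from_bbox(ax.bbox)  # cache the static parts

for frame in range(5):
    img[frame, :] = rng.random(2048)             # the one row that changed
    im.set_data(img)
    fig.canvas.restore_region(background)        # repaint cached background
    ax.draw_artist(im)                           # redraw only the image artist
    fig.canvas.blit(ax.bbox)                     # push just the axes region
plt.close(fig)
```

Whether this reaches 100 updates per second depends on the backend and image size; for guaranteed throughput the dedicated libraries mentioned above remain the safer bet.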

PyPlot in Julia only showing plot when code ends

I have recently begun learning to use Julia, converting over from Matlab/Octave. I decided that the best way to get some experience was to convert some code I was already working on in Octave - a Newton solver for a complicated multidimensional problem. I have been able to convert the code successfully (and with noticeable speedup relative to Octave, without devectorisation or other performance-based changes), with only one issue arising.
I have chosen to use PyPlot for plotting, due to its similarity to Matlab/Octave's plotting functionality. However, there is some behaviour from PyPlot that is undesired. I use the plotting function to display the current state of the vector I am trying to get to zero (using the Newton solver part of the code), so that I can see what it is doing, and adjust the code to try to improve this behaviour. I input the number of Newton steps to take before the code stops, and then I can make adjustments or re-issue the command to continue attempting to converge.
I have the code set up to plot the current state every few steps, so that I can, for instance, have the code take 200 steps, but show me the status after every 10 steps. In Octave, this works perfectly, providing me with up-to-date information - should the behaviour of the code not be desirable, I can quickly cancel the code with Ctrl-C (this part works in Julia, too).
However, Julia does not produce or update the plots when the plot() command is used; instead, it produces the plot, or updates it if the plot window is already open, only when the code finishes. This entirely defeats the purpose of the intermittent plotting within the code. Once the code has completed, the plot is correctly generated, so I know that the plot() command itself is being used correctly.
I have tried adding either draw() or show() immediately after the plot command. I have also tried display(gcf()). None of these have modified the result. I have confirmed that isinteractive() outputs "true". I have also tried turning interactivity off (ioff()) and switching whether to use the python or julia backend (pygui(true) and pygui(false)), with no effect on this behaviour.
Have I missed something? Is there another package or option that needs to be set in order to force PyPlot to generate the current plot immediately, rather than waiting until Julia finishes its current code run to generate the plot?
Or is it perhaps possible that scope is causing a problem here, as the intermittent plotting happens inside a while loop?
I am using xubuntu 12.10 with Julia 0.2.1.
PyPlot defaults to this behavior in the REPL. To make it show the plots as they are plotted type ion(). To turn it off again type ioff().
ion() is only effective for the current session, so if you want it to stay on across sessions just add it to your .juliarc.jl file.
If you're using IPython, ion() will plot to a new window but ioff() will plot inline.
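Since PyPlot.jl wraps Python's matplotlib, the same toggle can be seen in matplotlib itself; a headless Python sketch of the interactive-mode pattern (the loop contents stand in for the "plot every few Newton steps" logic):

```python
import matplotlib
matplotlib.use("Agg")  # headless here; with a GUI backend the window updates live

import matplotlib.pyplot as plt

plt.ion()                    # interactive mode: figures redraw as you plot
fig, ax = plt.subplots()
for step in range(3):        # invented stand-in for intermittent status plots
    ax.plot([0, 1], [step, step])
    plt.pause(0.01)          # flush the draw and yield to the GUI event loop
plt.ioff()                   # back to draw-at-the-end behavior
plt.close(fig)
```

The plt.pause() call is what forces the pending draw to appear mid-loop; without it, even interactive mode may not repaint until control returns to the event loop.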

PyMC change backend after sampling

I have been using PyMC in an analysis of some high energy physics data. It has worked to perfection, the analysis is complete, and we are working on the paper.
I have a small problem, however. I ran the sampler with the RAM database backend. The traces have been sitting around in memory in an IPython kernel process for a couple of months now. The problem is that the workstation support staff want to perform a kernel upgrade and reboot that workstation. This will cause me to lose the traces. I would like to keep these traces (as opposed to just generating new), since they are what I've made all the plots with. I'd also like to include a portion of the traces (only the parameters of interest) as supplemental material with the publication.
Is it possible to take an existing chain in a pymc.MCMC object created with the RAM backend, change to a different backend, and write out the traces in the chain?
The trace values are stored as NumPy arrays, so you can use numpy.savetxt to send the values of each parameter to a file. (This is what the text backend does under the hood.)
While saving your current traces is a good idea, I'd suggest taking the time to make your analysis repeatable before publishing.
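Assuming a PyMC 2-style MCMC object M, the values come back as a plain array via something like M.trace('alpha')[:] (the parameter name here is invented); the save/restore round trip itself is pure NumPy:

```python
import io
import numpy as np

# Stand-in for the array a call like M.trace('alpha')[:] would return from a
# PyMC 2 MCMC object -- the name, shape, and values here are invented
alpha_trace = np.random.default_rng(1).normal(size=5000)

buf = io.StringIO()          # a real run would pass a filename instead
np.savetxt(buf, alpha_trace)
buf.seek(0)
restored = np.loadtxt(buf)   # the values survive the round trip
```

Writing one file per parameter of interest this way also gives you exactly the subset of traces needed for the supplemental material.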

Visualizing a large data series

I have a seemingly simple problem, but an easy solution is eluding me. I have a very large series (tens or hundreds of thousands of points), and I just need to visualize it at different zoom levels, but generally zoomed well out. Basically, I want to plot it in a tool like Matlab or Pyplot, but knowing that each pixel can't represent the potentially many hundreds of points that map to it, I'd like to see both the min and the max of all the array entries that map to a pixel, so that I can generally understand what's going on. Is there a simple way of doing this?
Try hexbin. By setting reduce_C_function I think you can get what you want. For example:
import matplotlib.pyplot as plt
import numpy as np
# x, y are your point coordinates and C = f(x, y) the values to aggregate
x, y = np.random.random((2, 100000))
C = np.sin(10 * x) * y
plt.hexbin(x, y, C=C, reduce_C_function=np.max)
would give you a hexagonal heatmap where the color of each hexagon is the maximum value among the points that fall in that bin.
If you only want to bin in one direction, see this method.
The first option you may want to try is Gephi: https://gephi.org/
Here is another option, though I'm not quite sure it will work. It's hard to say without seeing the data.
Try going to this link: http://bl.ocks.org/3887118. Do you see, toward the bottom of the page, data.tsv with all of the values? If you can save your data to resemble this, then the HTML code on that page should be able to build your data into the scatter plot example shown there.
Otherwise, try visiting this link to fashion your data to a more appropriate web page.
There are a set of research tools called TimeSearcher 1--3 that provide some examples of how to deal with large time-series datasets. Below are some example images from TimeSearcher 2 and 3.
I realized that a simple plot() in MATLAB actually gives me more or less what I want. When zoomed out, it renders all of the datapoints that map to a pixel column as a vertical line segment from the minimum to the maximum within that set, so as not to obscure the function's actual behavior. I used area() to increase the contrast.
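That min/max-per-pixel-column rendering is easy to reproduce explicitly in matplotlib/NumPy as well; a sketch with an invented series and an invented horizontal resolution:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, just for this sketch
import matplotlib.pyplot as plt

# Hypothetical long series: many samples per horizontal pixel when zoomed out
series = np.cumsum(np.random.default_rng(0).standard_normal(200_000))

pixels = 800                              # assumed horizontal plot resolution
n = len(series) - len(series) % pixels    # trim so chunks divide evenly
chunks = series[:n].reshape(pixels, -1)   # one chunk of samples per pixel
lo, hi = chunks.min(axis=1), chunks.max(axis=1)

fig, ax = plt.subplots()
ax.fill_between(np.arange(pixels), lo, hi)  # min/max band per pixel column
plt.close(fig)
```

Unlike naive striding, this envelope never hides spikes: every extreme in the underlying data shows up in its pixel column's band.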