Geoviews bokeh vs matplotlib for plotting large xarrays

I am trying to plot a large xarray dataset of x=1000 by y=1000 for t=10 different timestamps in Google Colab.
See the following example notebook:
https://colab.research.google.com/drive/1HLCqM-x8kt0nMwbCjCos_tboeO6VqPjn
However, when I try to plot this with gv.extension('bokeh') it doesn't give any output,
whereas gv.extension('matplotlib') correctly shows a plot of the same data.
I suppose it has something to do with the amount of data bokeh can store in one view?
I already tried setting dynamic=True, which does make it work, but for my use case the delay when switching between timestamps is not desirable. The same goes for datashader's regrid, which makes it run faster but still introduces a delay when switching timestamps.
Is there a way to plot this large xarray with bokeh so that it is as smooth to view and to slide through as with matplotlib?
Or are there any other ways I can try and visualise this data interactively (on a web app)?
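For reference, here is a minimal sketch of the setup described above; the dataset, the variable name 'value', and the coordinate ranges are assumptions rather than something taken from the linked notebook.

import numpy as np
import xarray as xr
import geoviews as gv
from holoviews.operation.datashader import regrid

gv.extension('bokeh')

# 10 timestamps of a 1000 x 1000 grid, as in the question
ds = xr.Dataset(
    {'value': (('time', 'y', 'x'), np.random.rand(10, 1000, 1000).astype('float32'))},
    coords={'time': np.arange(10),
            'y': np.linspace(-90, 90, 1000),
            'x': np.linspace(-180, 180, 1000)},
)

# Static version: embeds every frame in the bokeh output (large payload)
images = gv.Dataset(ds).to(gv.Image, ['x', 'y'], 'value')

# The two workarounds mentioned above, which trade memory for a delay per frame:
# images = gv.Dataset(ds).to(gv.Image, ['x', 'y'], 'value', dynamic=True)
# images = regrid(images)
images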

Related

Colab kernel restarts for unknown reasons

My Colab kernel restarted itself for unknown reasons when I tried to plot my data with a matplotlib plot (right before that, I had successfully plotted histograms of the same data). Colab gave a notification to check the runtime logs, which are as follows. I don't see any apparent reason why this happened.
I just found out why it was happening: the Colab runtime's RAM was being consumed entirely while running the last cell. It happened when I used a matplotlib plot, but when I used matplotlib histograms it turned out just fine. Histograms are probably lighter when you have high-precision float values.
I repeated the comparison over multiple trials and got the same result every time.

How to move my pandas dataframe to d3?

I am new to Python and have worked my way through a few books on it. Everything is great, except visualizations. I really dislike matplotlib and Bokeh requires too heavy of a stack.
The workflow I want is:
Data munging analysis using pandas in ipython notebook -> visualization using d3 in sublimetext2
However, being new to both Python and d3, I don't know the best way to export my pandas dataframe to d3. Should I just have it as a csv? JSON? Or is there a more direct way?
Side question: Is there any (reasonable) way to do everything in an ipython notebook instead of switching to sublimetext?
Any help would be appreciated.
Basically, there is no single best format that will fit all your visualization needs.
It really depends on the visualizations you want to obtain.
For example, a Stacked Bar Chart takes a CSV file as input, while an adjacency matrix visualization takes JSON.
From my experience:
to display relations between items, like an adjacency matrix or a chord diagram, you will prefer a JSON format, which lets you describe only the relations that actually exist. The data are stored as in a sparse matrix, and several levels can be nested using dictionaries. Moreover, this format can be parsed directly in Python.
to display properties of an array of items, a CSV format can be fine. A perfect example can be found here with a parallel chart display.
to display hierarchical data, like a tree, JSON is best suited.
The best thing to do to figure out which format you need is to have a look at the d3js gallery.
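As a rough sketch of that split (the file names and columns here are just placeholders), pandas can write either format directly:

import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C'], 'value': [1, 2, 3]})

# Flat, tabular charts (bar charts, scatter plots, parallel coordinates): CSV
df.to_csv('data.csv', index=False)

# Nested or relational structures (trees, chord diagrams, adjacency data): JSON
df.to_json('data.json', orient='records')

D3 can then load either file with d3.csv() or d3.json().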
You can use D3 directly inside Jupyter / IPython. Try the two links below:
http://blog.thedataincubator.com/2015/08/embedding-d3-in-an-ipython-notebook/
https://github.com/cmoscardi/embedded_d3_example/blob/master/Embedded_D3.ipynb
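As a rough sketch of what those posts describe (assuming a classic Jupyter notebook where RequireJS is available; the div id, column names, and CDN URL are just examples), you can push the DataFrame into the page as JSON and drive D3 from a cell:

import json
import pandas as pd
from IPython.display import HTML, Javascript, display

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 1, 7]})
records = json.dumps(df.to_dict(orient='records'))

# Create a target element, then run D3 against it
display(HTML('<div id="d3-target"></div>'))
display(Javascript("""
require(['https://d3js.org/d3.v5.min.js'], function (d3) {
    var data = %s;
    d3.select('#d3-target').selectAll('p')
      .data(data).enter().append('p')
      .text(function (d) { return 'x=' + d.x + ', y=' + d.y; });
});
""" % records))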

PyPlot in Julia only showing plot when code ends

I have recently begun learning to use Julia, converting over from Matlab/Octave. I decided that the best way to get some experience was to convert some code I was already working on in Octave - a Newton solver for a complicated multidimensional problem. I have been able to convert the code over successfully (and with noticeable speedup relative to Octave, without devectorisation or other performance-based changes), with only one issue arising.
I have chosen to use PyPlot for plotting, due to its similarity to Matlab/Octave's plotting functionality. However, there is some behaviour from PyPlot that is undesired. I use the plotting function to display the current state of the vector I am trying to get to zero (using the Newton solver part of the code), so that I can see what it is doing, and adjust the code to try to improve this behaviour. I input the number of Newton steps to take before the code stops, and then I can make adjustments or re-issue the command to continue attempting to converge.
I have the code set up to plot the current state every few steps, so that I can, for instance, have the code take 200 steps, but show me the status after every 10 steps. In Octave, this works perfectly, providing me with up-to-date information - should the behaviour of the code not be desirable, I can quickly cancel the code with Ctrl-C (this part works in Julia, too).
However, Julia does not produce or update the plots when the plot() command is used; instead, it produces the plot, or updates it if the plot window is already open, only when the code finishes. This entirely defeats the purpose of the intermittent plotting within the code. Once the code has completed, the plot is correctly generated, so I know that the plot() command itself is being used correctly.
I have tried adding either draw() or show() immediately after the plot command. I have also tried display(gcf()). None of these have modified the result. I have confirmed that isinteractive() outputs "true". I have also tried turning interactivity off (ioff()) and switching whether to use the python or julia backend (pygui(true) and pygui(false)), with no effect on this behaviour.
Have I missed something? Is there another package or option that needs to be set in order to force PyPlot to generate the current plot immediately, rather than waiting until Julia finishes its current code run to generate the plot?
Or is it perhaps possible that scope is causing a problem, here, as the intermittent plotting happens inside a while loop?
I am using xubuntu 12.10 with Julia 0.2.1.
PyPlot defaults to this behavior in the REPL. To make it show the plots as they are plotted, type ion(). To turn it off again, type ioff().
ion() is only effective for the current session, so if you want it to stay on across sessions, just add it to your .juliarc file.
If you're using IPython, ion() will plot to a new window but ioff() will plot inline.
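Since PyPlot is a thin wrapper around matplotlib, the mechanism can be sketched in matplotlib's own (Python) terms; the toy solver loop below is purely illustrative and not the asker's code.

import numpy as np
import matplotlib.pyplot as plt

plt.ion()                         # same effect as calling ion() from PyPlot
state = np.random.randn(100)

for step in range(1, 201):
    state *= 0.98                 # stand-in for one Newton step
    if step % 10 == 0:            # show the current state every 10 steps
        plt.clf()
        plt.plot(state)
        plt.title('step %d' % step)
        plt.pause(0.001)          # give the GUI a chance to repaint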

Visualizing a large data series

I have a seemingly simple problem, but an easy solution is eluding me. I have a very large series (tens or hundreds of thousands of points), and I just need to visualize it at different zoom levels, but generally zoomed well out. Basically, I want to plot it in a tool like Matlab or Pyplot, but knowing that each pixel can't represent the potentially many hundreds of points that map to it, I'd like to see both the min and the max of all the array entries that map to a pixel, so that I can generally understand what's going on. Is there a simple way of doing this?
Try hexbin. By setting the reduce_C_function I think you can get what you want. Ex:
import matplotlib.pyplot as plt
import numpy as np

# x and y are the point coordinates; C = f(x, y) is the value at each point
plt.hexbin(x, y, C=C, reduce_C_function=np.max)
plt.show()
This gives you a hexagonal heatmap where the color of each bin is the maximum value that falls in it.
If you only want to bin in one direction, see this method.
The first option you may want to try is Gephi: https://gephi.org/
Here is another option, though I'm not quite sure it will work. It's hard to say without seeing the data.
Try going to this link: http://bl.ocks.org/3887118. Do you see, toward the bottom of the page, data.tsv with all of the values? If you can save your data to resemble this, then the HTML code there should be able to build a scatter plot from your data like the example shown at that link.
Otherwise, try visiting this link to fashion your data to a more appropriate web page.
There are a set of research tools called TimeSearcher 1--3 that provide some examples of how to deal with large time-series datasets. Below are some example images from TimeSearcher 2 and 3.
I realized that simple plot() in MATLAB actually gives me more or less what I want. When zoomed out, it renders all of the datapoints that map to a pixel column as vertical line segments from the minimum to the maximum within the set, so as not to obscure the function's actual behavior. I used area() to increase the contrast.
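The same min/max-per-pixel-column idea can be reproduced manually in Python; here is a small sketch (the series and the target width are just example values):

import numpy as np
import matplotlib.pyplot as plt

y = np.cumsum(np.random.randn(200000))   # large 1-D series
width = 1000                              # target number of pixel columns
n = len(y) // width * width               # trim so it divides evenly
chunks = y[:n].reshape(width, -1)         # one row per pixel column

x = np.arange(width)
# Fill between the per-column min and max, like MATLAB's zoomed-out plot()
plt.fill_between(x, chunks.min(axis=1), chunks.max(axis=1))
plt.show()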

Plotting data in real time

I have a program which outputs a number to the terminal, one line at a time.
My goal is to have something else read these numbers and graph them in a line plot in real time. matplotlib and wxPython have been suggested, but I'm not sure how to go about implementing them.
See the following links:
What is the best real time plotting widget for wxPython?
Minimalistic Real-Time Plotting in Python
http://eli.thegreenplace.net/2008/08/01/matplotlib-with-wxpython-guis/
http://wxpython-users.1045709.n5.nabble.com/real-time-data-plots-td2344816.html
As some of those point out, you might be able to use wx's PyPlot for something really simple or use Chaco.
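If plain matplotlib is acceptable instead of wxPython, one common pattern is to pipe the producer's output into a script that redraws as numbers arrive; here is a minimal sketch (the script name and usage, e.g. ./producer | python live_plot.py, are just examples):

import sys
import matplotlib.pyplot as plt

plt.ion()                        # interactive mode: draw without blocking
fig, ax = plt.subplots()
line, = ax.plot([], [])
values = []

for raw in sys.stdin:            # one number per line from the other program
    values.append(float(raw))
    line.set_data(range(len(values)), values)
    ax.relim()
    ax.autoscale_view()
    plt.pause(0.01)              # let the GUI event loop update the figure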
I really like this library for HTML5 graphing. Here is a demo of real-time updates: http://dygraphs.com/gallery/#g/dynamic-update
Are you simply asking for recommendations on plotting libs?