How do I plot different trials with separate lines with equal styles using FacetGrid in seaborn? - pandas

My pandas dataset has velocity curves for different trials of an experiment and for different conditions ('shifts'). I want to plot the velocity lines for each trial in the same facet per condition, but without specifying that I want the trials to influence a style they are just plotted in one continuous line, where the end of the first is connected to the start of the second line, and so on. This also means I can't use alpha values to see where the velocity curves are most dense, because it's just one big line. Is there a way to separate them?
this is what it looks like without separation
this is what it's supposed to look like, just not with different hues for each line
This is the code I used for the second example
grid = sns.FacetGrid(half_second_df, col='shift', hue='trial', col_wrap=3)
grid.map(plt.plot, 't_rel_sacc', 'yaw_velo', linewidth=0.5, alpha=0.3)

Related

Optimal display for overlapping series in a line chart

In a context of a line chart displaying time data in regular intervals where multiple series might overlap what would be the optimal way to:
A) hint the user that the chart has overlapping series?
B) give the user the capability to visualize all those series? Like spanning the series somehow?
For overlapping series in a line chart, I would keep the traditional line chart but put a label at the end of the graph with a color legend. The legend and label will help the user get information quickly.
Another version of a line chart for overlapping series can be a line area chat.
If you are not stuck on only line charts, I would suggest a bar chart. Below are some examples that you can use.
Example 1:
Example 2:
Example 3:
There are couple ways to indicate that there are overlapping series on a chart. You can increase the marker radius of one of them. The number of legend elements tells you how many series there is, too. Finally, you can distribute series on a different yAxis, with different top and height properties. Also, in styled mode, when you hover on legend item, other series opacity changes.
API Reference:
http://api.highcharts.com/highcharts/plotOptions.line.marker.radius
Examples:
http://jsfiddle.net/whsgpdyw/ - changing marker radius
http://jsfiddle.net/fuq6j4sg/ - each series on a different yAxis

Interpolating data onto a line of points

I have some irregularly spaced data and need to analyze it. I can successfully interpolate this data onto a regular grid using mlab.griddata (or rather, the natgrid implementation of it). This allows me to use pcolormesh and contour to generate plots, extract levels, etc. Using plot.contour, I then extract a certain level using get_paths from the contour CS.collections().
Now, what I'd like to do is then, with my original irregularly spaced data, interpolate some quantities onto this specific contour line (i.e., NOT onto a regular grid). The similarly named griddata function from Scipy allows for this behavior, and it almost works. However, I find that as I increase the number of original points, I can get odd erratic behavior in the interpolation. I'm wondering if there's a way around this, i.e., another way to interpolate irregularly spaced (or regularly spaced data for that matter, since I can use my regularly spaced data from mlab.griddata) onto a specific line.
Let me show some numerical examples of what I'm talking about. Take a look at this figure:
The top left shows my data as points, and the line shows an extracted level of level=0 from some data D that I have at those points (x,y) [note, I have data 'D', 'Energy', and 'Pressure', all defined in this (x,y) space]. Once I have this curve, I can plot the interpolated quantities of D, Energy, and Pressure onto my specific line. First, note the plot of D (middle, right). It should be zero at all points, but it's not quite zero at all points. The likely cause of this is that the line that corresponds to the 0 level is generated from a uniform set of points that came from mlab.griddata, whereas the plot of 'D' is generated from my ORIGINAL data interpolated onto that level curve. You can also see some unphysical wiggles in 'Energy' and 'Pressure'.
Okay, seems easy enough, right? Maybe I should just get more original data points along my level=0 curve. Getting some more of these points, I then generate the following plots:
First look at the top left. You can see that I've sampled the hell out of the (x,y) space in the vicinity of my level=0 curve. Furthermore, you can see that my new "D" plot (middle, right) now correctly interpolates to zero in the region that it originally didn't. But now I get some wiggles at the start of the curve, as well as getting some other wiggles in the 'Energy' and 'Pressure' in this space! It is far from obvious to me that this should occur, since my original data points are still there and I've only supplemented additional points. Furthermore, some regions where my interpolation is going bad aren't even near the points that I added in the second run -- they are exclusively neighbored by my original points.
So this brings me to my original question. I'm worried that the interpolation that produces the 'Energy', 'D', and 'Pressure' curves is not working correctly (this is scigrid's griddata). Mlab's griddata only interpolates to a regular grid, whereas I want to interpolate to this specific line shown in the top left plot. What's another way for me to do this?
Thanks for your time!
After posting this, I decided to try scipy.interpolate.SmoothBivariateSpline, which produced the following result:
You can now see that my line is smoothed, so it seems like this will work. I'll mark this as the answer unless someone posts something soon that hints that there may be an even better solution.
Edit: As requested, below is some of the code used to generate these plots. I don't have a minimally working example, and the above plots were generated in a larger framework of code, but I'll write the important parts schematically below with comments.
# x,y,z are lists of data where the first point is x[0],y[0],z[0], and so on
minx=min(x)
maxx=max(x)
miny=min(y)
maxy=max(y)
# convert to numpy arrays
x=np.array(x)
y=np.array(y)
z=np.array(z)
# here we are creating a fine grid to interpolate the data onto
xi=np.linspace(minx,maxx,100)
yi=np.linspace(miny,maxy,100)
# here we interpolate our data from the original x,y,z unstructured grid to the new
# fine, regular grid in xi,yi, returning the values zi
zi=griddata(x,y,z,xi,yi)
# now let's do some plotting
plt.figure()
# returns the CS contour object, from which we'll be able to get the path for the
# level=0 curve
CS=plt.contour(x,y,z,levels=[0])
# can plot the original data if we want
plt.scatter(x,y,alpha=0.5,marker='x')
# now let's get the level=0 curve
for c in CS.collections:
data=c.get_paths()[0].vertices
# lineX,lineY are simply the x,y coordinates for our level=0 curve, expressed as arrays
lineX=data[:,0]
lineY=data[:,1]
# so it's easy to plot this too
plt.plot(lineX,lineY)
# now what to do if we want to interpolate some other data we have, say z2
# (also at our original x,y positions), onto
# this level=0 curve?
# well, first I tried using scipy.interpolate.griddata == scigrid like so
origdata=np.transpose(np.vstack((x,y))) # just organizing this data like the
# scigrid routine expects
lineZ2=scigrid(origdata,z2,data,method='linear')
# plotting the above curve (as plt.plot(lineZ2)) gave me really bad results, so
# trying a spline approach
Z2spline=SmoothBivariateSpline(x,y,z2)
# the above creates a spline object on our original data. notice we haven't EVALUATED
# it anywhere yet (we'll want to evaluate it on our level curve)
Z2Line=[]
# here we evaluate the spline along all our points on the level curve, and store the
# result as a new list
for i in range(0,len(lineX)):
Z2Line.append(Z2spline(lineX[i],lineY[i])[0][0]) # the [0][0] is just to get the
# value, which is enclosed in
# some array structure for some
# reason otherwise
# you can then easily plot this
plt.plot(Z2Line)
Hope this helps someone!

How to create subplots of pictures made with the hist() function in matplotlib/pyplot/numpy ?

I would like to create four subplots of pictures made with the hist() function, using matplotlib, pyplot and/or numpy. (I am not entirely sure what the differences between these things are. Usually I just import whatever I need -- based on an example.)
In my case, I have a a list consisting of four lists that describe what the amount of virus particles are at the end of some simulation involving the virus population. Each of these four lists contain 30 (whole) numbers. Most of the numbers are either between 0 and 10, or between 450 and 600 (which means that, in the former case, the virus population has (nearly) become extinct, or that, in the latter case, the virus population has survived and adapted to certain changing conditions).
I would like to show, in each of the subplots created with the hist() function, how often the virus population (nearly) goes extinct, has adapted, or is somewhere in between. So I would like to create four subplot histogram pictures that are bundled together in one big picture. On the x-axis, the population at the end of the simulation is shown, and on the y-axis, the frequency of the virus population having this amount of virus particles is shown.
Also, I would like to be able to give each of the subplots a title and label the x- and y-axes.
I already tried to do this numerous times by looking at the documentation on the hist() function and the subplot option in pyplot, but I couldn't figure out how to combine these options. Could you please give me a small example of this? Then I will probably be able to extrapolate how to adjust the example to my situation.
Unless you are not already familiar with it, you should have a look at the matplotlib gallery.
There you will find lots of examples hot to use the add_subplot and subplots commands.
A minimal example of what you propably want to achieve is
import matplotlib.pyplot as plt
import numpy as np
data=np.random.random((4,10))
xaxes = ['x1','x2','x3','x4']
yaxes = ['y1','y2','y3','y4']
titles = ['t1','t2','t3','t4']
f,a = plt.subplots(2,2)
a = a.ravel()
for idx,ax in enumerate(a):
ax.hist(data[idx])
ax.set_title(titles[idx])
ax.set_xlabel(xaxes[idx])
ax.set_ylabel(yaxes[idx])
plt.tight_layout()
which results in a plot like

Reportlab LinePlot - how do I add a lineLegend...or label my lines?

I have a lineplot with 2 lines on it...they're two separate channels from the same data set. Would love to just label each one - the "labels" options are all about giving a number for each point on your plot, and that is simply not helpful.
Would love to know how to do any (really, all, but I just need to do one to be happy) of these:
plot each against its own y axis and be able to sensibly label that axis with units (and color the numbers to correspond to the data it correlates to)
put a legend on it. I can't figure out how to use lineLegend
just put any kind of (singular) label in the vicinity of the lines.

Plot variable size/color-heatmap for mulitple occurences of points in scatter plot

I'm stuck with the following problem and I hope I can explain it coherent.
So, I have a number (about 10) of descrete positions on a coordinate system.
Now, I want to analyse data from a program where user could label each point as somethingA and somethingB.
I extracted the data points for each class. So I have about 60 points for the somethingA class and a little bit less for the other class. One class stands for good points and one for bad points. I want to find the positions which have the most good/bad labels. I do that with machine learning algorithms, I just want to visualize this with plots.
I now want to plot those points. So I make one plot per class. But since in every class every point occurs at least once, the two plots would look exactly the same.
But, the amount of occurences has a different distribution thoughout the positions.
Maybe point A has 20 occurences in class A and 1 in class B, both plots would look the same.
So, my question is: How can I take the number of occurences for points into account when plotting scatters in Matplotlib?
Either with different colors (like a heatmap?) maybe with a cool legend.
Or with different sizes (e.g. higher amount = bigger cirlce).
Any help would be appreciated!
I don't know if this helps you but I have had a problem where I wanted a scatterplot to reflect both positions as well as two variables that were attributed to the data points.
Since size and color in the scatter function do not allow variables themselves, meaning one has to specify color code and size in the usual way, meaning sth like
ax.scatter(..., c=whatEverFunction, s=numberOfOccurences, ...)
did not work for me.
what I did was to bin the values of the two variables I wanted to visualize. In my case the variable nodeMass and another variable.
for i in range(Number):
mask[i] = False
if(lowerBound1<variableOne[i]<upperBound1):
mask[i] = True & pmask[i]
if len(positionX[mask])>0:
ax.scatter(positionX[mask], positionY[mask], positionZ[mask],C='#424242',s=10, edgecolors='none')
for i in range(Number):
mask[i] = False
if(lowerBound2<variableOne[i]<upperBound2):
mask[i] = True & pmask[i]
if len(positionX[mask])>0:
ax.scatter(positionX[mask], positionY[mask], positionZ[mask],c='#9E0050',s=25,edgecolors='none')
I know it is not very elegant but it worked for me. I had to make as many for loops as I had bins in my variables. With if-querys and the masks I could at least avoid redundant or 'unreadable' plots.