Squared-off line plot in Matplotlib

How do I generate a line graph in Matplotlib where lines connecting the data points are only vertical and horizontal, not diagonal, giving a "blocky" look?
Note that this is sometimes called zero order extrapolation.
MWE
import matplotlib.pyplot as plt
x = [1, 3, 5, 7]
y = [2, 0, 4, 1]
plt.plot(x, y)
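This gives a plot whose points are joined by diagonal segments, whereas I want them joined by horizontal and vertical segments only.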

I think you are looking for plt.step. Here are some examples.
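As a minimal sketch of that suggestion, using the data from the question: the `where="post"` choice and the `marker="o"` argument are assumptions about the desired look, not something the question specifies. The Agg backend line is only there so the sketch runs without a display; drop it and call plt.show() for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import matplotlib.pyplot as plt

x = [1, 3, 5, 7]
y = [2, 0, 4, 1]

fig, ax = plt.subplots()
# where="post": each y value is held from its own x until the next x,
# so consecutive points are joined by a horizontal then a vertical segment
(step_line,) = ax.step(x, y, where="post", marker="o")
fig.canvas.draw()  # render once; use plt.show() interactively
```

The other accepted values of `where` are "pre" (the default) and "mid", which move the position of the vertical jump.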


mouse-over only on actual data points

Here's a really simple line chart.
%matplotlib notebook
import matplotlib.pyplot as plt
lines = plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.setp(lines,marker='D')
plt.ylabel('foo')
plt.xlabel('bar')
plt.show()
If I move my mouse over the chart, I get the x and y values for wherever the pointer is. Is there any way to get values only when I'm actually over a data point?
I understood you wanted to modify the behavior of the coordinates displayed in the status bar at the bottom right of the plot, is that right?
If so, you can "hijack" the Axes.format_coord() function to make it display whatever you want. You can see an example of this on matplotlib's example gallery.
In your case, something like this seems to do the trick:
import numpy as np
import matplotlib.pyplot as plt

my_x = np.array([1, 2, 3, 4])
my_y = np.array([1, 4, 9, 16])
eps = 0.1  # tolerance, in data units

def format_coord(x, y):
    # Require the pointer to be close to the same data point in both x and y;
    # testing x and y separately could match two different points.
    close = np.isclose(my_x, x, atol=eps) & np.isclose(my_y, y, atol=eps)
    if np.any(close):
        i = np.argmax(close)  # index of the first matching point
        return 'x=%s y=%s' % (ax.format_xdata(my_x[i]), ax.format_ydata(my_y[i]))
    return ''

fig, ax = plt.subplots()
ax.plot(my_x, my_y, 'D-')
ax.set_ylabel('foo')
ax.set_xlabel('bar')
ax.format_coord = format_coord
plt.show()

Matplotlib `fill_between`: Remove thin boundary

Consider the following code:
import matplotlib.pyplot as plt
import numpy as np
from pylab import *
graph_data = [[0, 1, 2, 3], [5, 8, 7, 9]]
x = range(len(graph_data[0]))
y = graph_data[1]
fig, ax = plt.subplots()
alpha = 0.5
plt.plot(x, y, '-o',markersize=3, color=[1., alpha, alpha], markeredgewidth=0.0)
ax.fill_between(x, 0, y, facecolor=[1., alpha, alpha], interpolate=False)
plt.show()
filename = 'test1.pdf'
fig.savefig(filename, bbox_inches='tight')
It works fine. However, when I zoom in on the generated PDF, I can see two thin gray/black boundaries that separate the line from the filled area:
I can see this when viewing in both Edge and Chrome. My question is, how can I get rid of the boundaries?
UPDATE I forgot to mention, I was using Sage to generate the graph. Now it seems a problem specific to Sage (and not to Python in general). This time I used native Python, and got correct result.
I could not reproduce it, but you could try not drawing the connecting line and plotting only the markers:
import matplotlib.pyplot as plt
import numpy as np
from pylab import *
graph_data = [[0, 1, 2, 3], [5, 8, 7, 9]]
x = range(len(graph_data[0]))
y = graph_data[1]
fig, ax = plt.subplots()
alpha = 0.5
plt.plot(x, y, 'o',markersize=3, color=[1., alpha, alpha])
ax.fill_between(x, 0, y, facecolor=[1., alpha, alpha], interpolate=False)
plt.show()
filename = 'test1.pdf'
fig.savefig(filename, bbox_inches='tight')
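If the boundary comes from the outline of the filled polygon rather than from the line, another thing to try is asking fill_between to draw no edge at all. This is only a sketch under that assumption; it has not been confirmed to fix the Sage-specific case mentioned in the update. The name `tint` and the in-memory buffer are illustrative, and a filename such as 'test1.pdf' can be passed to savefig instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [5, 8, 7, 9]
tint = [1.0, 0.5, 0.5]  # same light red as in the question

fig, ax = plt.subplots()
ax.plot(x, y, "-o", markersize=3, color=tint, markeredgewidth=0.0)
# linewidth=0 and edgecolor="none" request a fill polygon without an outline
fill = ax.fill_between(x, 0, y, facecolor=tint, linewidth=0, edgecolor="none")

buf = io.BytesIO()  # in-memory target; pass a filename to write a file
fig.savefig(buf, format="pdf", bbox_inches="tight")
```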

Customizing legend with scatterplot

I am struggling to customize the legend of my scatterplot. Here is a snapshot:
And here is a code sample:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
my_df = pd.DataFrame([[5, 3, 1], [2, 1, 2], [3, 4, 1], [1, 2, 1]],
                     columns=["DUMMY_CT", "FOO_CT", "CI_CT"])
# x and y are passed by keyword; recent seaborn releases reject them as positional arguments
g = sns.scatterplot(x="DUMMY_CT", y="FOO_CT", data=my_df, size="CI_CT")
g.set_title("Number of Baz", weight="bold")
g.set_xlabel("Dummy count")
g.set_ylabel("Foo count")
g.get_legend().set_title("Baz count")
Also, I work in a Jupyter-lab notebook with Python 3, if it helps.
The red thingy issue
First things first, I wish to hide the name of the CI_CT variable (outlined in red in the picture). After exploring the documentation all afternoon, I found the get_legend_handles_labels method, which produces the following:
>>> g.get_legend_handles_labels()
([<matplotlib.collections.PathCollection at 0xfaaba4a8>,
<matplotlib.collections.PathCollection at 0xfaa3ff28>,
<matplotlib.collections.PathCollection at 0xfaa3f6a0>,
<matplotlib.collections.PathCollection at 0xfaa3fe48>],
['CI_CT', '0', '1', '2'])
There I can spot my dear CI_CT string. However, I'm unable to change this name or to hide it completely. I found a dirty workaround, which basically consists of not making efficient use of the DataFrame passed as the data parameter. Here is the scatterplot call:
g = sns.scatterplot(x="DUMMY_CT", y="FOO_CT", data=my_df, size=my_df["CI_CT"].values)
Result here :
It works, but is there a cleaner way to achieve this?
The green thingy issue
Displaying a 0 level in this legend is incorrect, since there is no zero value in the CI_CT column of my_df. It is therefore misleading for readers, who might assume the smaller dots represent a value of 0 or 1. I wish to set up a defined scale, the way one can for the x and y axes. However, I cannot achieve it. Any idea?
TL;DR : A broader question that could solve everything
Those adventures make me wonder if there is a way to handle the data you can pass to the scatterplots with hue and size parameters in a clean, x-and-y-axis way. Is it actually possible?
Please pardon my English, and let me know if the question is too broad or incorrectly labelled.
The "green thing issue", namely that there is one more legend entry than there are sizes, is solved by specifying legend="full".
g = sns.scatterplot(..., legend="full")
The "red thing issue" is more tricky. The problem here is that seaborn misuses a normal legend label as a headline for the legend. An option is indeed to supply the values directly instead of the name of the column, to prevent seaborn from using that column name.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
my_df = pd.DataFrame([[5, 3, 1], [2, 1, 2], [3, 4, 1], [1, 2, 1]],
                     columns=["DUMMY_CT", "FOO_CT", "CI_CT"])
# Passing the values rather than the column name keeps "CI_CT" out of the legend
g = sns.scatterplot(x="DUMMY_CT", y="FOO_CT", data=my_df, size=my_df["CI_CT"].values, legend="full")
g.set_title("Number of Baz", weight="bold")
g.set_xlabel("Dummy count")
g.set_ylabel("Foo count")
g.get_legend().set_title("Baz count")
plt.show()
If you really must use the column name itself, a hacky solution is to crawl into the legend and remove the label you don't want.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
my_df = pd.DataFrame([[5, 3, 1], [2, 1, 2], [3, 4, 1], [1, 2, 1]],
                     columns=["DUMMY_CT", "FOO_CT", "CI_CT"])
g = sns.scatterplot(x="DUMMY_CT", y="FOO_CT", data=my_df, size="CI_CT", legend="full")
g.set_title("Number of Baz", weight="bold")
g.set_xlabel("Dummy count")
g.set_ylabel("Foo count")
g.get_legend().set_title("Baz count")
#Hack to remove the first legend entry (which is the undesired title)
vpacker = g.get_legend()._legend_handle_box.get_children()[0]
vpacker._children = vpacker.get_children()[1:]
plt.show()
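A variant that avoids private attributes is to rebuild the legend from its handles and labels, leaving out the unwanted entry. The sketch below is an assumption-laden illustration, not seaborn's own mechanism: `rebuild_legend_without` is a hypothetical helper, and plain matplotlib scatter calls stand in for the entries seaborn creates so that the example stays self-contained. Whether the column name appears as a legend entry at all depends on the seaborn version, which is why the helper filters by label instead of blindly dropping the first entry.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import matplotlib.pyplot as plt


def rebuild_legend_without(ax, unwanted_label, **legend_kwargs):
    """Recreate the legend of `ax`, skipping entries labelled `unwanted_label`."""
    handles, labels = ax.get_legend_handles_labels()
    kept = [(h, l) for h, l in zip(handles, labels) if l != unwanted_label]
    kept_handles = [h for h, _ in kept]
    kept_labels = [l for _, l in kept]
    return ax.legend(kept_handles, kept_labels, **legend_kwargs)


fig, ax = plt.subplots()
# Stand-ins for the legend entries: an invisible "header" artist plus real ones
ax.scatter([], [], s=0, label="CI_CT")
ax.scatter([5, 3, 1], [3, 4, 2], s=20, label="1")
ax.scatter([2], [1], s=60, label="2")

legend = rebuild_legend_without(ax, "CI_CT", title="Baz count")
```

With a seaborn plot, the same call would be `rebuild_legend_without(g, "CI_CT", title="Baz count")`, since `g` is a matplotlib Axes.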
I finally managed to get the result I wanted, but in an ugly way. It might be useful to someone, but I would not advise doing this.
The trick for fixing the scale in the legend is to shift all the CI_CT values into the negatives (which keeps their order and the consistency of the marker sizes). The values displayed in the legend are then corrected to undo that shift.
However, I did not find any better way to make the "CI_CT" text disappear from the legend without leaving an atrociously huge blank space.
Here is the code sample and the result.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
my_df = pd.DataFrame([[5, 3, 1], [2, 1, 2], [3, 4, 1], [1, 2, 1]],
                     columns=["DUMMY_CT", "FOO_CT", "CI_CT"])
# Subtract the maximum of CI_CT from every value, so that all sizes are <= 0
max_val = my_df["CI_CT"].max()
my_df["CI_CT"] = my_df["CI_CT"] - max_val
# Scatterplot declaration
g = sns.scatterplot(x="DUMMY_CT", y="FOO_CT", data=my_df, size=my_df["CI_CT"].values)
g.set_title("Number of Baz", weight="bold")
g.set_xlabel("Dummy count")
g.set_ylabel("Foo count")
g.get_legend().set_title("Baz count")
# Correct the legend values by adding the maximum back
l = g.legend_
for t in l.texts:
    t.set_text(str(int(t.get_text()) + max_val))
# Restore the DataFrame
my_df["CI_CT"] = my_df["CI_CT"] + max_val
I'm still looking for a better way to achieve this.

How do I plot a hexagon in 3D using matplotlib [duplicate]

This question already has answers here:
How can matplotlib 2D patches be transformed to 3D with arbitrary normals?
I have tried a few things after searching, but I am missing some understanding of how the vertices work (or at least having a brain fade at the moment). Can someone help me? I need a regular hexagon.
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection, Line3DCollection
fig = plt.figure(figsize=(15,9))
ax = fig.add_subplot(111, projection='3d')
x = [0, 2, 1, 1,1,1]
y = [0, 0, 1, 0, 1,1]
z = [0, 0, 0, 1,1,1]
vertices = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3],[0,1,2],[0,1,2]]
tupleList = list(zip(x, y, z))
poly3d = [[tupleList[vertices[ix][iy]] for iy in range(len(vertices[0]))] for ix in range(len(vertices))]
ax.scatter(x,y,z)
ax.add_collection3d(Poly3DCollection(poly3d, facecolors='w', linewidths=1, alpha=0.5))
ax.add_collection3d(Line3DCollection(poly3d, colors='k', linewidths=0.2, linestyles=':'))
plt.show()
Matplotlib is not capable of real 3D.
The 3D support in matplotlib is mostly just for a nicer appearance of 2D data.
If you need real 3D visualization, I'd recommend Mayavi or VTK.
If your hexagon cannot be expressed as a mathematical function of two variables (e.g. z = f(x, y)), then matplotlib is the wrong tool for that.
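That said, for a single flat polygon the Poly3DCollection class that the question already imports may be enough. Here is a minimal sketch, assuming a regular hexagon of radius 1 lying in a plane of constant z; the names `radius`, `z_level` and `hexagon` are illustrative and not taken from the question. The six vertices are placed 60 degrees apart on a circle and passed as one polygon.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

radius = 1.0   # assumed circumradius of the hexagon
z_level = 0.5  # assumed height of the plane containing the hexagon

# Six vertices, 60 degrees apart, listed in order around the circle
hexagon = [
    (radius * math.cos(math.radians(60 * k)),
     radius * math.sin(math.radians(60 * k)),
     z_level)
    for k in range(6)
]

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# One polygon = one list of (x, y, z) vertices inside the outer list
ax.add_collection3d(
    Poly3DCollection([hexagon], facecolors="w", edgecolors="k", alpha=0.5)
)
# add_collection3d does not necessarily rescale the axes, so set limits by hand
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
ax.set_zlim(0, 1)
fig.canvas.draw()  # render once; use plt.show() interactively
```

For a regular hexagon the side length equals the circumradius, which gives a quick sanity check on the vertex list.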

How do I assign multiple labels at once in matplotlib?

I have the following dataset:
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[9, 8, 7, 6, 5] ]
Now I plot it with:
import matplotlib.pyplot as plt
plt.plot(x, y)
However, I want to label the 3 y-datasets with this command, which raises an error when .legend() is called:
lineObjects = plt.plot(x, y, label=['foo', 'bar', 'baz'])
plt.legend()
File "./plot_nmos.py", line 33, in <module>
plt.legend()
...
AttributeError: 'list' object has no attribute 'startswith'
When I inspect the lineObjects:
>>> lineObjects[0].get_label()
['foo', 'bar', 'baz']
>>> lineObjects[1].get_label()
['foo', 'bar', 'baz']
>>> lineObjects[2].get_label()
['foo', 'bar', 'baz']
Question
Is there an elegant way to assign multiple labels by just using the .plot() method?
You can iterate over your line objects list, so labels are individually assigned. An example with the built-in python iter function:
lineObjects = plt.plot(x, y)
plt.legend(iter(lineObjects), ('foo', 'bar', 'baz'))
Edit: after updating to matplotlib 1.1.1, it looks like plt.plot(x, y), with y as a list of lists (as provided by the author of the question), doesn't work anymore. One-step plotting without iterating over the y arrays is still possible, though, after converting y to a numpy.array (assuming numpy (http://numpy.scipy.org/) has been previously imported).
In this case, use plt.plot(x, y) (if the data in the 2D y array are arranged as columns [axis 1]) or plt.plot(x, y.transpose()) (if the data in the 2D y array are arranged as rows [axis 0]).
Edit 2: as pointed out by @pelson (see the comments below), the iter function is unnecessary and a simple plt.legend(lineObjects, ('foo', 'bar', 'baz')) works perfectly.
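Putting the two edits together, a minimal sketch with the question's data might look as follows. It assumes the datasets are stored as rows, as in the question, hence the transpose; the `line_objects` name and the use of an explicit Axes are choices made here, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9],
              [9, 8, 7, 6, 5]])

fig, ax = plt.subplots()
# Each row of y is one dataset, so transpose to get one column per line
line_objects = ax.plot(x, y.T)
# Pair the returned Line2D objects with their labels in a single call
legend = ax.legend(line_objects, ("foo", "bar", "baz"))
fig.canvas.draw()  # render once; use plt.show() interactively
```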
It is not possible to plot those two arrays against each other directly (with at least version 1.1.1), so you must loop over your y arrays. My advice would be to loop over the labels at the same time:
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [9, 8, 7, 6, 5] ]
labels = ['foo', 'bar', 'baz']
for y_arr, label in zip(y, labels):
    plt.plot(x, y_arr, label=label)
plt.legend()
plt.show()
Edit: @gcalmettes pointed out that, with numpy arrays, it is possible to plot all the lines at the same time (by transposing them). See @gcalmettes's answer and comments for details.
I came across the same problem and have now found what I think is the easiest solution. Hopefully it's not too late for you. No iterator; just unpack the result of plot into variables:
from numpy import *
from matplotlib.pyplot import *
from numpy.random import *
a = rand(4,4)
a
>>> array([[ 0.33562406, 0.96967617, 0.69730654, 0.46542408],
[ 0.85707323, 0.37398595, 0.82455736, 0.72127002],
[ 0.19530943, 0.4376796 , 0.62653007, 0.77490795],
[ 0.97362944, 0.42720348, 0.45379479, 0.75714877]])
[b,c,d,e] = plot(a)
legend([b,c,d,e], ["b","c","d","e"], loc=1)
show()
The best current solution is:
lineObjects = plt.plot(x, y) # y describes 3 lines
plt.legend(['foo', 'bar', 'baz'])
You can give the labels while plotting the curves:
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4]
y = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [9, 8, 7, 6, 5]]
labels = ['foo', 'bar', 'baz']
colors = ['r', 'g', 'b']
# Loop over data, labels and colors together
for y_arr, label, color in zip(y, labels, colors):
    plt.plot(x, y_arr, 'o-', color=color, label=label)
plt.legend()
plt.show()
Assigning multiple legend labels at once, one per column, when plotting a NumPy matrix
I would like to answer this question for the case of plotting a matrix that has two columns.
Say you have a two-column matrix Ret;
then you can use this code to assign multiple labels at once:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
pd.DataFrame(Ret).plot()
plt.xlabel('time')
plt.ylabel('Return')
plt.legend(['Bond Ret','Equity Ret'], loc=0)
plt.show()
I hope this helps
This problem comes up for me often when I have a single set of x values and multiple y values in the columns of an array. I really don't want to plot the data in a loop, and multiple calls to ax.legend/plt.legend are not really an option, since I want to plot other stuff, usually in an equally annoying format.
Unfortunately, plt.setp is not helpful here. In newer versions of matplotlib, it just converts your entire list/tuple into a string, and assigns the whole thing as a label to all the lines.
I've therefore made a utility function to wrap calls to ax.plot/plt.plot in:
def set_labels(artists, labels):
    for artist, label in zip(artists, labels):
        artist.set_label(label)
You can call it something like
x = np.arange(5)
y = np.random.randint(10, size=(5, 3))
fig, ax = plt.subplots()
set_labels(ax.plot(x, y), 'ABC')
ax.legend()  # draw the legend from the labels just assigned
This way you get to specify all your normal artist parameters to plot, without having to see the loop in your code. An alternative is to put the whole call to plot into a utility that just unpacks the labels, but that would require a lot of duplication to figure out how to parse multiple datasets, possibly with different numbers of columns, and spread out across multiple arguments, keyword or otherwise.
I used the following to show labels for a dataframe without using the dataframe plot:
lines_ = plt.plot(df)
plt.legend(lines_, df.columns)  # df.columns holds the labels
If you're using a DataFrame, you can also iterate over the columns of the data you want to plot:
# Plot figure
fig, ax = plt.subplots(figsize=(5, 5))
# `data` is assumed to be an existing DataFrame
# Plot one line per column, labelled with the column name
for col in data.columns:
    ax.plot(data[col], label=col)
ax.legend()
plt.show()