I have time series data recorded at discrete ordinal levels (e.g. 0, 1, 2), and I'd like to plot them with meaningful names (e.g. low, medium, high).
Currently I have:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({
    "x": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"],
    "y": [2, 1, 2, 0],
})
fig = px.line(x=df.x, y=df.y, line_shape="hv")
fig.show()
which produces:
But I'd like something like:
This feels like the easiest way:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({
    "x": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"],
    "y": [2, 1, 2, 0],
})
fig = px.line(x=df.x, y=df.y, line_shape="hv")
# Show a name instead of the number at each of the three tick positions
fig.update_yaxes(
    ticktext=["Low", "Medium", "High"],
    tickvals=[0, 1, 2],
)
fig.show()
Result:
In Plotly language this falls under the "categorical" umbrella.
If the order needs to be tweaked, categoryarray and categoryorder can also be set with update_yaxes:
https://plotly.com/python/reference/layout/yaxis/#layout-yaxis-categoryarray
https://plotly.com/python/reference/layout/yaxis/#layout-yaxis-categoryorder
I have a dataset containing row and column values and corresponding True or False, such as
Sr. Row Col Result
1 2 4 true
2 12 5 false
3 5 4 false
And I would like to plot it as below:
You can grab the positive indices via df[df['Result'] == 'true']. As your indices seem to start with 1, subtract 1 for the rows and 1 for the columns. Use these indices to fill in a matrix. Then use imshow to display that matrix.
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import numpy as np
import pandas as pd
df = pd.DataFrame({'Row': [2, 12, 5, 1, 12, 12],
'Col': [4, 5, 4, 1, 1, 8],
'Result': ['true', 'false', 'false', 'true', 'true', 'true']})
positive_indices = df[df['Result'] == 'true'][['Row', 'Col']].to_numpy() - np.array([1, 1])
matrix = np.zeros((12, 8), dtype=bool)
matrix[positive_indices[:, 0], positive_indices[:, 1]] = True
fig, ax = plt.subplots()
ax.imshow(matrix, extent=[.5, 8.5, .5, 12.5], cmap='bwr_r', origin='lower')
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.yaxis.set_major_locator(MultipleLocator(1))
plt.show()
PS: To have the first row at the top, reverse the y-limits in the extent and use origin='upper' (the default):
ax.imshow(matrix, extent=[.5, 8.5, 12.5, .5], cmap='bwr_r', origin='upper')
Colormap cmap='bwr' would color the 'true' cells red. A ListedColormap provides even more control over the color, e.g. cmap=matplotlib.colors.ListedColormap(['Crimson', 'DeepSkyBlue']).
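A minimal sketch of the ListedColormap variant, assuming the same 12x8 grid and the four 'true' cells from the example data above. The first listed color is used for False and the second for True.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np

# Zero-based (row, col) positions of the 'true' entries from the example data
matrix = np.zeros((12, 8), dtype=bool)
matrix[[1, 0, 11, 11], [3, 0, 0, 7]] = True

# Two-entry colormap: index 0 -> False cells, index 1 -> True cells
cmap = ListedColormap(['Crimson', 'DeepSkyBlue'])

fig, ax = plt.subplots()
im = ax.imshow(matrix, extent=[.5, 8.5, 12.5, .5], cmap=cmap, origin='upper')
```

Call plt.show() afterwards to display the figure.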
I am trying to do a scatter plot with pandas. Unfortunately kind='scatter' doesn't work. If I change this to kind='line' it works as expected. What can I do to fix this?
for label, d in df.groupby('m'):
    d[['te','n']].sort_values(by='n', ascending=False).plot(kind="scatter", x='n', y='te', ax=ax, label='m = '+str(label))
Use plot.scatter instead:
import pandas as pd
df = pd.DataFrame({'x': [0, 5, 7, 3, 2, 4, 6], 'y': [0, 5, 7, 3, 2, 4, 6]})
df.plot.scatter('x', 'y')
Use this snippet if you want individual labels and colours:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
'm': np.random.randint(0, 5, size=100),
'x': np.random.uniform(size=100),
'y': np.random.uniform(size=100),
})
fig, ax = plt.subplots()
for label, d in df.groupby('m'):
    # generate a random color:
    color = list(np.random.uniform(size=3))
    d.plot.scatter('x', 'y', label=f'group {label}', ax=ax, c=[color])
I have two different data sets. I want to plot a histogram for each of them while keeping the bins the same: the width and range of each bin should be identical.
import numpy as np
import matplotlib.pyplot as plt
Data1 = np.array([1,2,3,3,5,6,7,8])
Data2 = np.array([1,2,3,4,6,7,8,8])
n,bins,patches = plt.hist(Data1,bins=20)
plt.ylabel("no of states")
plt.xlabel("bins")
plt.savefig("./DOS")
You can look at the documentation for matplotlib.pyplot.hist and you will see that the bins argument can be an integer (defining the number of bins) or a sequence (defining the edges of the bins themselves).
Therefore, you need to manually define the bins you want to use and pass these to plt.hist:
import matplotlib.pyplot as plt
import numpy as np
bin_edges = [0, 2, 4, 6, 8]
data = np.random.rand(50) * 8
plt.hist(data, bins=bin_edges)
You can pass the bins returned from your first histogram plot as an argument to the second histogram to make sure both have the same bin sizes.
Complete answer:
import numpy as np
import matplotlib.pyplot as plt
Data1 = np.array([1, 2, 3, 3, 5, 6, 7, 8])
Data2 = np.array([1, 2, 3, 4, 6, 7, 8, 8])
n, bins, patches = plt.hist(Data1, bins=20, label='Data 1')
plt.hist(Data2, bins=bins, label='Data 2')
plt.ylabel("no of states")
plt.xlabel("bins")
plt.legend()
plt.show()
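Reusing the bins from the first histogram works as long as the second data set lies within the same range; values outside the outermost edges are silently ignored. If the ranges differ, one option is to compute the edges from both data sets together with np.histogram_bin_edges. In this sketch Data2 is a hypothetical variant with a wider range than the one in the question.

```python
import numpy as np
import matplotlib.pyplot as plt

Data1 = np.array([1, 2, 3, 3, 5, 6, 7, 8])
# Hypothetical second data set that extends beyond the range of Data1
Data2 = np.array([0, 2, 3, 4, 6, 7, 9, 10])

# 20 equal-width bins spanning the combined range of both data sets
edges = np.histogram_bin_edges(np.concatenate([Data1, Data2]), bins=20)

counts1, _, _ = plt.hist(Data1, bins=edges, alpha=0.5, label='Data 1')
counts2, _, _ = plt.hist(Data2, bins=edges, alpha=0.5, label='Data 2')
plt.legend()
```

Because both calls receive the same edges array, every bin has the same width and position in both histograms, and no value from either data set is dropped.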
I have the following dataset:
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[9, 8, 7, 6, 5] ]
Now I plot it with:
import matplotlib.pyplot as plt
plt.plot(x, y)
However, I want to label the 3 y-datasets with this command, which raises an error when .legend() is called:
lineObjects = plt.plot(x, y, label=['foo', 'bar', 'baz'])
plt.legend()
File "./plot_nmos.py", line 33, in <module>
plt.legend()
...
AttributeError: 'list' object has no attribute 'startswith'
When I inspect the lineObjects:
>>> lineObjects[0].get_label()
['foo', 'bar', 'baz']
>>> lineObjects[1].get_label()
['foo', 'bar', 'baz']
>>> lineObjects[2].get_label()
['foo', 'bar', 'baz']
Question
Is there an elegant way to assign multiple labels by just using the .plot() method?
You can iterate over your list of line objects, so that the labels are assigned individually. An example with the built-in Python iter function:
lineObjects = plt.plot(x, y)
plt.legend(iter(lineObjects), ('foo', 'bar', 'baz'))
Edit: after updating to matplotlib 1.1.1, it looks like plt.plot(x, y) with y as a list of lists (as provided by the author of the question) doesn't work anymore. One-step plotting without iterating over the y arrays is still possible, though, after converting y to a numpy.array (assuming numpy (http://numpy.scipy.org/) has been imported previously).
In this case, use plt.plot(x, y) (if the data in the 2D y array are arranged as columns [axis 1]) or plt.plot(x, y.transpose()) (if the data in the 2D y array are arranged as rows [axis 0]).
Edit 2: as pointed out by @pelson (see the comment below), the iter function is unnecessary and a simple plt.legend(lineObjects, ('foo', 'bar', 'baz')) works perfectly.
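Putting the two edits together, a short sketch using the data from the question: y is converted to a NumPy array whose rows are the data sets, so it is transposed before plotting and the returned line objects are passed straight to legend.

```python
import numpy as np
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
# Each row is one data set, so transpose to get one column per line
y = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9],
              [9, 8, 7, 6, 5]])

lineObjects = plt.plot(x, y.transpose())  # returns one Line2D per column
legend = plt.legend(lineObjects, ('foo', 'bar', 'baz'))
```

Call plt.show() afterwards to display the figure.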
It is not possible to plot those two arrays against each other directly (at least with version 1.1.1), so you must loop over your y arrays. My advice would be to loop over the labels at the same time:
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [9, 8, 7, 6, 5] ]
labels = ['foo', 'bar', 'baz']
for y_arr, label in zip(y, labels):
    plt.plot(x, y_arr, label=label)
plt.legend()
plt.show()
Edit: @gcalmettes pointed out that with numpy arrays it is possible to plot all the lines at the same time (by transposing them). See @gcalmettes' answer and comments for details.
I came across the same problem and found a solution that is very easy. Hopefully that's not too late for you. No iterator, just assign your result to a structure:
from numpy import *
from matplotlib.pyplot import *
from numpy.random import *
a = rand(4,4)
a
>>> array([[ 0.33562406, 0.96967617, 0.69730654, 0.46542408],
[ 0.85707323, 0.37398595, 0.82455736, 0.72127002],
[ 0.19530943, 0.4376796 , 0.62653007, 0.77490795],
[ 0.97362944, 0.42720348, 0.45379479, 0.75714877]])
[b,c,d,e] = plot(a)
legend([b,c,d,e], ["b","c","d","e"], loc=1)
show()
Looks like this:
The best current solution is:
lineObjects = plt.plot(x, y) # y describes 3 lines
plt.legend(['foo', 'bar', 'baz'])
You can give the labels while plotting the curves:
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [9, 8, 7, 6, 5] ]
labels=['foo', 'bar', 'baz']
colors=['r','g','b']
# loop over data, labels and colors
for i in range(len(y)):
    plt.plot(x,y[i],'o-',color=colors[i],label=labels[i])
plt.legend()
plt.show()
When plotting a numpy matrix, you can assign a legend entry to every column at once. I will answer this for a matrix that has two columns.
Say you have a two-column matrix Ret; then you can use this code to assign multiple labels at once:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
pd.DataFrame(Ret).plot()
plt.xlabel('time')
plt.ylabel('Return')
plt.legend(['Bond Ret','Equity Ret'], loc=0)
plt.show()
I hope this helps
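A related sketch, assuming Ret is a two-column NumPy array (random data stands in for it here): if the columns are named when the DataFrame is built, DataFrame.plot uses those names as legend labels, so no separate label list has to be passed to legend.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the two-column matrix Ret
rng = np.random.default_rng(0)
Ret = rng.normal(size=(50, 2))

# Column names become the legend labels
df = pd.DataFrame(Ret, columns=['Bond Ret', 'Equity Ret'])
ax = df.plot()
ax.set_xlabel('time')
ax.set_ylabel('Return')
```

Call plt.show() afterwards to display the figure.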
This problem comes up for me often when I have a single set of x values and multiple y values in the columns of an array. I really don't want to plot the data in a loop, and multiple calls to ax.legend/plt.legend are not really an option, since I want to plot other stuff, usually in an equally annoying format.
Unfortunately, plt.setp is not helpful here. In newer versions of matplotlib, it just converts your entire list/tuple into a string, and assigns the whole thing as a label to all the lines.
I've therefore made a utility function to wrap calls to ax.plot/plt.plot in:
def set_labels(artists, labels):
    # Pair each artist with its label and assign them one by one
    for artist, label in zip(artists, labels):
        artist.set_label(label)
You can call it something like this:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(5)
y = np.random.randint(10, size=(5, 3))
fig, ax = plt.subplots()
set_labels(ax.plot(x, y), 'ABC')
This way you get to specify all your normal artist parameters to plot, without having to see the loop in your code. An alternative is to put the whole call to plot into a utility that just unpacks the labels, but that would require a lot of duplication to figure out how to parse multiple datasets, possibly with different numbers of columns, and spread out across multiple arguments, keyword or otherwise.
I used the following to show labels for a dataframe without using the dataframe plot:
lines_ = plot(df)
legend(lines_, df.columns) # df.columns is a list of labels
If you're using a DataFrame, you can also iterate over the columns of the data you want to plot:
# data is assumed to be an existing DataFrame
fig, ax = plt.subplots(figsize=(5, 5))
# Plot each column as its own line, labeled with the column name
for i in data.columns:
    ax.plot(data[i], label=i)
ax.legend()
plt.show()