My pandas-generated subplots are laid out incorrectly

I ran the following code to get two plots next to each other (it is a minimal working example that you can copy):
import pandas as pd
import numpy as np
from matplotlib.pylab import plt
comp1 = np.random.normal(0,1,size=200)
values = pd.Series(comp1)
plt.close("all")
f = plt.figure()
plt.show()
sp1 = f.add_subplot(2,2,1)
values.hist(bins=100, alpha=0.5, color="r", density=True)
sp2 = f.add_subplot(2,2,2)
values.plot(kind="kde")
Unfortunately, I then get the following image: [image not preserved]
This is also an interesting layout, but I wanted the figures to be next to each other. What did I do wrong? How can I correct it?
For clarity, I could also use this:
import pandas as pd
import numpy as np
from matplotlib.pylab import plt
comp1 = np.random.normal(0,1,size=200)
values = pd.Series(comp1)
plt.close("all")
fig, axes = plt.subplots(2,2)
plt.show()
axes[0,0].hist(values, bins=100, alpha=0.5, color="r", density=True) # Until here, it works. You get a half-finished correct image of what I was going for (though it is 2x2 here)
axes[0,1].plot(values, kind="kde") # This does not work
Unfortunately, in this approach axes[0,1] is a matplotlib Axes object: it has a plot method, but that method does not know kind="kde". Please take into consideration that in the first version plot is called on the pandas object, whereas in the second version plot is called on the subplot, which does not accept the kind="kde" parameter.

Use the ax= argument to tell pandas which subplot (Axes object) to draw into:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
comp1 = np.random.normal(0,1,size=200)
values = pd.Series(comp1)
plt.close("all")
f = plt.figure()
sp1 = f.add_subplot(2,2,1)
# density=True replaces normed=True, which newer matplotlib versions no longer accept
values.hist(bins=100, alpha=0.5, color="r", density=True, ax=sp1)
sp2 = f.add_subplot(2,2,2)
values.plot(kind="kde", ax=sp2)  # kind="kde" is handled by pandas, not by the Axes
plt.show()
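The same ax= idea also rescues the plt.subplots attempt from the question: keep calling plot on the pandas Series (which understands kind="kde") and pass it the target Axes, instead of calling plot on the Axes itself. A minimal sketch, assuming a matplotlib version where hist takes density= instead of normed=, and with SciPy installed (pandas needs it for kind="kde"); it uses a 1x2 grid so the two plots sit side by side.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

values = pd.Series(np.random.normal(0, 1, size=200))

# One row, two columns: the two Axes sit next to each other.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Series.hist forwards bins/alpha/color/density to matplotlib's Axes.hist;
# ax= selects the subplot to draw into.
values.hist(bins=100, alpha=0.5, color="r", density=True, ax=axes[0])

# kind="kde" is a pandas option, so plot is called on the Series (not on the
# Axes) and the target subplot is passed via ax=. This step needs SciPy.
values.plot(kind="kde", ax=axes[1])

fig.tight_layout()
plt.show()
```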

Related

Cannot set ylim for sklearn partial dependence plot

I am following this example from the sklearn documentation.
I want to change the limits of y axis so I can visually compare results from different models.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor
from sklearn.inspection import PartialDependenceDisplay
diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = diabetes.target
tree = DecisionTreeRegressor()
tree.fit(X, y)
fig, ax = plt.subplots(figsize=(12, 6))
ax.set_ylim(50,300)
tree_disp = PartialDependenceDisplay.from_estimator(tree, X, ["age", "bmi"], ax=ax)
However, ax.set_ylim seems to be ignored no matter what I specify. On the other hand, ax.set_title, as used in the example, works fine.

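When a single ax is passed together with several features, PartialDependenceDisplay treats it as a bounding box: it turns that Axes off and draws a grid of new Axes inside it, one per feature. That is why ax.set_ylim has no visible effect, while the title set on ax is still drawn. The new Axes are stored in the display's axes_ attribute.
You can modify them as follows: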
tree_disp = PartialDependenceDisplay.from_estimator(tree, X, ["age", "bmi"], ax=ax)
tree_disp.axes_[0][0].set_ylim(50,300)
tree_disp.axes_[0][1].set_ylim(50,300)
This will output the following plot: [image not preserved]

Making sure 0 gets white in a RdBu colorbar

I create a heatmap with the following snippet:
import numpy as np
import matplotlib.pyplot as plt
d = np.random.normal(.4,2,(10,10))
plt.imshow(d,cmap=plt.cm.RdBu)
plt.colorbar()
plt.show()
The result is the plot below: [image not preserved]
Because the data are not centered on 0, the cells with value 0 are not white but slightly reddish.
How do I force the colormap so that max=blue, min=red and 0=white?

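Use a TwoSlopeNorm, which maps vmin, vcenter and vmax to the bottom, middle and top of the colormap, with a separate linear scale on each side of vcenter.
Note: before matplotlib 3.2 this class was called DivergingNorm; the old name was deprecated in 3.2 and is not available in current releases.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
d = np.random.normal(.4,2,(10,10))
# vmin -> red end, vcenter=0 -> white middle, vmax -> blue end of RdBu
norm = mcolors.TwoSlopeNorm(vmin=d.min(), vcenter=0, vmax=d.max())
plt.imshow(d, cmap=plt.cm.RdBu, norm=norm)
plt.colorbar()
plt.show()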
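To see why 0 ends up white, compare how a plain Normalize and a TwoSlopeNorm map the same value. The limits -2 and 6 below are made-up values, chosen to be asymmetric around 0 like the data in the question; this is an illustration, not part of the original answer.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

vmin, vmax = -2.0, 6.0  # assumed example limits, asymmetric around 0

linear = mcolors.Normalize(vmin=vmin, vmax=vmax)
two_slope = mcolors.TwoSlopeNorm(vmin=vmin, vcenter=0.0, vmax=vmax)

# Plain Normalize: 0 lands a quarter of the way up, on the red side of RdBu.
print(float(linear(0.0)))
# TwoSlopeNorm: 0 is pinned to 0.5, the white middle of RdBu; each side of
# vcenter gets its own linear scale ([-2, 0] -> [0, 0.5], [0, 6] -> [0.5, 1]).
print(float(two_slope(0.0)), float(two_slope(-1.0)), float(two_slope(3.0)))

cmap = plt.get_cmap("RdBu")
print(cmap(linear(0.0)))     # light red RGBA
print(cmap(two_slope(0.0)))  # near-white RGBA
```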

A previous SO post (Change colorbar gradient in matplotlib) wanted a solution for a more complicated situation, but one of the answers talked about the MidpointNormalize subclass in the matplotlib documentation. With that, the solution becomes:
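import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
class MidpointNormalize(mpl.colors.Normalize):
    ## class from the mpl docs:
    # https://matplotlib.org/users/colormapnorms.html
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        super().__init__(vmin, vmax, clip)
    def __call__(self, value, clip=None):
        # I'm ignoring masked values and all kinds of edge cases to make a
        # simple example...
        # Piecewise-linear map: vmin -> 0, midpoint -> 0.5, vmax -> 1,
        # extrapolating beyond vmin/vmax instead of clamping to 0 and 1
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y, left=-np.inf, right=np.inf))
    def inverse(self, value):
        # Maps normalized values back to data values; recent matplotlib
        # versions use this when drawing the colorbar
        y, x = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.interp(value, x, y, left=-np.inf, right=np.inf)
d = np.random.normal(.4,2,(10,10))
plt.imshow(d,cmap=plt.cm.RdBu,norm=MidpointNormalize(midpoint=0))
plt.colorbar()
plt.show()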
Kudos to Joe Kington for writing the subclass, and to Rutger Kassies for pointing out the answer.

Understanding plt.show() in Matplotlib

import numpy as np
import os.path
from skimage.io import imread
from skimage import data_dir
img = imread(os.path.join(data_dir, 'checker_bilevel.png'))
import matplotlib.pyplot as plt
#plt.imshow(img, cmap='Blues')
#plt.show()
imgT = img.T
plt.figure(1)
plt.imshow(imgT,cmap='Greys')
#plt.show()
imgR = img.reshape(20,5)
plt.figure(2)
plt.imshow(imgR,cmap='Blues')
plt.show(1)
I read that plt.figure() creates a new figure and assigns it an ID if one is not given explicitly. So here I have given the two figures IDs 1 and 2, respectively. Now I want to see only one of the images.
I tried plt.show(1), expecting that ONLY the first image would be displayed, but both of them are.
What should I write to get only one?

plt.clf() clears the current figure, so only what you plot after calling it is shown:
import matplotlib.pyplot as plt
plt.plot(range(10), 'r')
plt.clf()
plt.plot(range(12), 'g--')
plt.show()

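plt.show() displays all open figures; it does not take a figure number. Its only parameter is block (keyword-only in current matplotlib versions), which controls whether the call waits until the figure windows are closed. If you only want to show a particular figure, you can write a wrapper function:
import matplotlib.pyplot as plt
figures = [plt.subplots() for i in range(5)]  # list of (Figure, Axes) pairs
def show(figNum, figures):
    if plt.fignum_exists(figNum):
        # pick the Figure whose number matches figNum
        fig = [f[0] for f in figures if f[0].number == figNum][0]
        # Figure.show() does not run a GUI event loop, so in a plain script
        # the window may only appear briefly
        fig.show()
    else:
        print('figure not found')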
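Because Figure.show() does not manage a GUI event loop, the window it opens may vanish immediately in a plain script. A different workaround, sketched below with a hypothetical helper show_only(), closes every other figure and then calls plt.show(). Note that the closed figures are discarded, so this only suits scripts that do not need them afterwards.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_only(num):
    """Hypothetical helper: display figure `num` and nothing else.

    plt.show() always displays every open figure, so all other figures are
    closed (and therefore discarded) before it is called.
    """
    if not plt.fignum_exists(num):
        raise ValueError(f"figure {num} not found")
    for other in plt.get_fignums():
        if other != num:
            plt.close(other)
    plt.show()


plt.close("all")
plt.figure(1)
plt.imshow(np.eye(4), cmap="Greys")
plt.figure(2)
plt.imshow(np.arange(16).reshape(4, 4), cmap="Blues")
show_only(1)  # only figure 1 is left open, so only it is displayed
```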

matplotlib adding strings to an axis

import matplotlib.pyplot as plt
import numpy as np
ydata = [55,60,65,70,75,80]
xdata = [1,2,3,4,5,6]
plt.plot(xdata, ydata)
set(plt.gca,'XTickLabel',{'Jan','Feb','Mar','April','May','June'})
plt.show()
I am using matplotlib and trying to make text values appear on the x axis.
I have tried to use the following code, but get this error message:
set(plt.gca,'XTickLabel',
{'Jan','Feb','Mar','April','May','June'})
TypeError: set expected at most 1 arguments, got 3
I am not sure what this is referring to, since I passed gca (get current axes) and the values I want to set.

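set is Python's built-in set type (and {...} with comma-separated values is a set literal), so set(plt.gca, 'XTickLabel', {...}) fails because the set constructor takes at most one argument; it has nothing to do with MATLAB's set(gca, ...) syntax. In matplotlib you only need ax.set_xticks, to put a tick at every x value so that all labels show, and ax.set_xticklabels, to assign the text: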
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ydata = [55,60,65,70,75,80]
xdata = [1,2,3,4,5,6]
ax.set_xticks(xdata)
ax.set_xticklabels(['Jan','Feb','Mar','April','May','June'])
ax.plot(xdata, ydata)
plt.show()
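In matplotlib 3.5 and later, set_xticks also accepts the labels directly, so the two calls can be merged into one. A sketch assuming such a version:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "April", "May", "June"]
xdata = [1, 2, 3, 4, 5, 6]
ydata = [55, 60, 65, 70, 75, 80]

fig, ax = plt.subplots()
ax.plot(xdata, ydata)
# Requires matplotlib >= 3.5: places a tick at each x value and labels it.
ax.set_xticks(xdata, labels=months)
plt.show()
```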

Annotate labels in pandas scatter plot

I saw this method in an older post but can't get the plot I want.
To start:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
df = pd.DataFrame({'x':np.random.rand(10),'y':np.random.rand(10)},
index=list(string.ascii_lowercase[:10]))
Scatter plot:
ax = df.plot('x','y', kind='scatter', s=50)
Then define a function that annotates each row:
def annotate_df(row):
    ax.annotate(row.name, row.values,
                xytext=(10,-5),
                textcoords='offset points',
                size=18,
                color='darkslategrey')
Finally, apply it to get the annotations:
ab= df.apply(annotate_df, axis=1)
Somehow I just get a Series ab instead of the scatter plot I want. What is wrong? Thank you!

Your code works; you just need to call plt.show() at the end.
Your full code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
df = pd.DataFrame({'x':np.random.rand(10),'y':np.random.rand(10)},
index=list(string.ascii_lowercase[:10]))
ax = df.plot('x','y', kind='scatter', s=50)
def annotate_df(row):
    ax.annotate(row.name, row.values,
                xytext=(10,-5),
                textcoords='offset points',
                size=18,
                color='darkslategrey')
ab= df.apply(annotate_df, axis=1)
plt.show()

This no longer seems to work in newer versions, but the fix is easy: convert row.values from a numpy.ndarray to a list:
list(row.values)
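Another option is to skip df.apply, whose return value (a Series of None, since the function returns nothing) is what confused the question, and loop over the index and coordinates directly, passing a plain (x, y) tuple to ax.annotate. A sketch of that alternative:

```python
import string

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": np.random.rand(10), "y": np.random.rand(10)},
    index=list(string.ascii_lowercase[:10]),
)
ax = df.plot(x="x", y="y", kind="scatter", s=50)

# A plain loop makes the side effect (adding annotations) explicit and
# passes each point as a simple (x, y) tuple.
for label, x, y in zip(df.index, df["x"], df["y"]):
    ax.annotate(label, (x, y), xytext=(10, -5), textcoords="offset points",
                size=18, color="darkslategrey")

plt.show()
```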

We Keep Coding
