Matplotlib weird behavior with 2D arrays plot


As per the matplotlib documentation, x and/or y may be 2D arrays, and in this case the columns are treated as different datasets. When I follow the example in the matplotlib page it works fine:
>>> x = [1, 2, 3]
>>> y = np.array([[1, 2], [3, 4], [5, 6]])
>>> plot(x, y)
However, when I try with larger, float64 arrays, it plots a weird figure. This is what I got
from scipy.stats import chi2
x = np.linspace(0,5,1000)
chi2_2, chi2_5 = chi2.pdf(x,2), chi2.pdf(x,5)
y = np.array((chi2_2,chi2_5)).reshape(1000,2)
fig, ax = plt.subplots()
ax.plot(x,y)
and produces this plot:
if I plot them separately, it comes out fine:
fig, ax = plt.subplots()
ax.plot(x,chi2_2,'b')
ax.plot(x,chi2_5,'r')
I can't figure out what the difference is between the example and my case, other than using 2D arrays with float64 instead of int64.
Any help is appreciated.

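It looks like reshape isn't doing what you expect it to do. reshape keeps the elements in their original (row-major) order and only changes the shape, so each column ends up holding a mix of values from both curves. I think the function that you are looking for is transpose rather than reshape.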
from scipy.stats import chi2
x = np.linspace(0,5,1000)
chi2_2, chi2_5 = chi2.pdf(x,2), chi2.pdf(x,5)
y = np.array((chi2_2,chi2_5)).T
y2 = np.array((chi2_2,chi2_5)).reshape(1000,2)
print(np.array_equal(y,y2))
fig, ax = plt.subplots()
ax.plot(x,y)
plt.show()
Using transpose produces the plot that you want, and np.array_equal(y, y2) returning False confirms that the two arrays are not the same.
Below is the output:
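To see the difference between reshape and transpose without plotting anything, here is a minimal sketch on two tiny arrays. The names first, second, and stacked are made up for illustration; they stand in for chi2_2, chi2_5, and the stacked array from the question.

```python
import numpy as np

# Two small "datasets", standing in for chi2_2 and chi2_5
first = np.array([1, 2, 3])
second = np.array([10, 20, 30])

stacked = np.array((first, second))   # shape (2, 3): one dataset per row

transposed = stacked.T                # shape (3, 2): one dataset per column
reshaped = stacked.reshape(3, 2)      # shape (3, 2), refilled in row-major order

print(transposed)
# [[ 1 10]
#  [ 2 20]
#  [ 3 30]]
print(reshaped)
# [[ 1  2]
#  [ 3 10]
#  [20 30]]

# column_stack builds the column layout directly
print(np.array_equal(np.column_stack((first, second)), transposed))  # True
```

Only the transposed layout keeps each dataset in its own column, which is what plot(x, y) expects for a 2D y.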

Related

Changing subplots from 2x2 to 3x3? [duplicate]

I am a little confused about how this code works:
fig, axes = plt.subplots(nrows=2, ncols=2)
plt.show()
How does the fig, axes work in this case? What does it do?
Also why wouldn't this work to do the same thing:
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)

There are several ways to do it. The subplots method creates the figure along with the subplots that are then stored in the ax array. For example:
import matplotlib.pyplot as plt
x = range(10)
y = range(10)
fig, ax = plt.subplots(nrows=2, ncols=2)
for row in ax:
    for col in row:
        col.plot(x, y)
plt.show()
However, something like this will also work. It's not as "clean", though, since you create the figure first and then add the subplots to it one at a time:
fig = plt.figure()
plt.subplot(2, 2, 1)
plt.plot(x, y)
plt.subplot(2, 2, 2)
plt.plot(x, y)
plt.subplot(2, 2, 3)
plt.plot(x, y)
plt.subplot(2, 2, 4)
plt.plot(x, y)
plt.show()

import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 2)
ax[0, 0].plot(range(10), 'r') #row=0, col=0
ax[1, 0].plot(range(10), 'b') #row=1, col=0
ax[0, 1].plot(range(10), 'g') #row=0, col=1
ax[1, 1].plot(range(10), 'k') #row=1, col=1
plt.show()

You can also unpack the axes in the subplots call, and set whether you want to share the x and y axes between the subplots, like this:
import matplotlib.pyplot as plt
# fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
ax1, ax2, ax3, ax4 = axes.flatten()
ax1.plot(range(10), 'r')
ax2.plot(range(10), 'b')
ax3.plot(range(10), 'g')
ax4.plot(range(10), 'k')
plt.show()

You might be interested in the fact that as of matplotlib version 2.1 the second code from the question works fine as well.
From the change log:
Figure class now has subplots method
The Figure class now has a subplots() method which behaves the same as pyplot.subplots() but on an existing figure.
Example:
import matplotlib.pyplot as plt
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
plt.show()

Read the documentation: matplotlib.pyplot.subplots
pyplot.subplots() returns a tuple fig, ax which is unpacked in two variables using the notation
fig, axes = plt.subplots(nrows=2, ncols=2)
The code:
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
did not work before matplotlib 2.1, because subplots() was then only a function in pyplot, not a method of the Figure object.
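As a quick check of what the call returns, the following sketch prints the types and shapes involved. It assumes matplotlib 2.1 or later for the fig2.subplots call; the names fig2 and axes2 are only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# pyplot.subplots returns a (Figure, array of Axes) tuple
fig, axes = plt.subplots(nrows=2, ncols=2)
print(type(fig).__name__)                         # Figure
print(isinstance(axes, np.ndarray), axes.shape)   # True (2, 2)

# With matplotlib 2.1 or later, the same grid can be added to an existing figure
fig2 = plt.figure()
axes2 = fig2.subplots(nrows=2, ncols=2)
print(axes2.shape)                                # (2, 2)
```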

Iterating through all subplots sequentially:
fig, axes = plt.subplots(nrows, ncols)
for ax in axes.flatten():
    ax.plot(x,y)
Accessing a specific index:
for row in range(nrows):
    for col in range(ncols):
        axes[row,col].plot(x[row], y[col])

Subplots with pandas
This answer is for subplots with pandas, which uses matplotlib as the default plotting backend.
Here are four options to create subplots starting with a pandas.DataFrame
Implementation 1. and 2. are for the data in a wide format, creating subplots for each column.
Implementation 3. and 4. are for data in a long format, creating subplots for each unique value in a column.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
Imports and Data
import seaborn as sns  # sample data, and the figure-level plot in option 4
import pandas as pd
import matplotlib.pyplot as plt
# wide dataframe
df = sns.load_dataset('planets').iloc[:, 2:5]
   orbital_period   mass  distance
0         269.300   7.10     77.40
1         874.774   2.21     56.95
2         763.000   2.60     19.84
3         326.030  19.40    110.62
4         516.220  10.50    119.47
# long dataframe
dfm = sns.load_dataset('planets').iloc[:, 2:5].melt()
         variable    value
0  orbital_period  269.300
1  orbital_period  874.774
2  orbital_period  763.000
3  orbital_period  326.030
4  orbital_period  516.220
1. subplots=True and layout, for each column
Use the parameters subplots=True and layout=(rows, cols) in pandas.DataFrame.plot
This example uses kind='density', but there are different options for kind, and this applies to them all. Without specifying kind, a line plot is the default.
axes is an array of AxesSubplot returned by pandas.DataFrame.plot
See How to get a Figure object, if needed.
How to save pandas subplots
axes = df.plot(kind='density', subplots=True, layout=(2, 2), sharex=False, figsize=(10, 6))
# extract the figure object; only used for tight_layout in this example
fig = axes[0][0].get_figure()
# set the individual titles
for ax, title in zip(axes.ravel(), df.columns):
    ax.set_title(title)
fig.tight_layout()
plt.show()
2. plt.subplots, for each column
Create an array of Axes with matplotlib.pyplot.subplots and then pass axes[i, j] or axes[n] to the ax parameter.
This option uses pandas.DataFrame.plot, but can use other axes level plot calls as a substitute (e.g. sns.kdeplot, plt.plot, etc.)
It's easiest to collapse the subplot array of Axes into one dimension with .ravel or .flatten. See .ravel vs .flatten.
Any variables that apply to each axes and need to be iterated through are combined with zip (e.g. cols, axes, colors, palette, etc.). The objects should be the same length, because zip stops at the shortest one.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6)) # define the figure and subplots
axes = axes.ravel() # array to 1D
cols = df.columns # create a list of dataframe columns to use
colors = ['tab:blue', 'tab:orange', 'tab:green'] # list of colors for each subplot, otherwise all subplots will be one color
for col, color, ax in zip(cols, colors, axes):
    df[col].plot(kind='density', ax=ax, color=color, label=col, title=col)
    ax.legend()
fig.delaxes(axes[3]) # delete the empty subplot
fig.tight_layout()
plt.show()
Result for 1. and 2.
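Option 2 hard-codes axes[3] as the empty subplot. As a small sketch of the same idea for an arbitrary number of items, the leftover axes can be removed by slicing. The list names is hypothetical and stands in for the dataframe columns.

```python
import matplotlib.pyplot as plt

names = ["a", "b", "c"]   # hypothetical: three things to plot on a 2 x 2 grid
fig, axes = plt.subplots(nrows=2, ncols=2)
axes = axes.ravel()

# zip stops at the shortest input, so only the first three axes are used
for name, ax in zip(names, axes):
    ax.plot(range(10))
    ax.set_title(name)

# remove whatever is left over instead of hard-coding an index
for ax in axes[len(names):]:
    fig.delaxes(ax)

fig.tight_layout()
```

Calling plt.show() afterwards displays three subplots, with the fourth grid position left empty.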
3. plt.subplots, for each group in .groupby
This is similar to 2., except it zips color and axes to a .groupby object.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6)) # define the figure and subplots
axes = axes.ravel() # array to 1D
dfg = dfm.groupby('variable') # get data for each unique value in the first column
colors = ['tab:blue', 'tab:orange', 'tab:green'] # list of colors for each subplot, otherwise all subplots will be one color
for (group, data), color, ax in zip(dfg, colors, axes):
    data.plot(kind='density', ax=ax, color=color, title=group, legend=False)
fig.delaxes(axes[3]) # delete the empty subplot
fig.tight_layout()
plt.show()
4. seaborn figure-level plot
Use a seaborn figure-level plot, and use the col or row parameter. seaborn is a high-level API for matplotlib. See seaborn: API reference
p = sns.displot(data=dfm, kind='kde', col='variable', col_wrap=2, x='value', hue='variable',
facet_kws={'sharey': False, 'sharex': False}, height=3.5, aspect=1.75)
sns.move_legend(p, "upper left", bbox_to_anchor=(.55, .45))

Convert the axes array to 1D
Generating subplots with plt.subplots(nrows, ncols), where both nrows and ncols are greater than 1, returns a 2D array of <AxesSubplot:> objects.
It’s not necessary to flatten axes in cases where either nrows=1 or ncols=1, because axes will already be 1 dimensional, which is a result of the default parameter squeeze=True
The easiest way to access the objects, is to convert the array to 1 dimension with .ravel(), .flatten(), or .flat.
.ravel vs. .flatten
flatten always returns a copy.
ravel returns a view of the original array whenever possible.
Once the array of axes is converted to 1-d, there are a number of ways to plot.
This answer is relevant to seaborn axes-level plots, which have the ax= parameter (e.g. sns.barplot(…, ax=ax[0])).
seaborn is a high-level API for matplotlib. See Figure-level vs. axes-level functions and seaborn is not plotting within defined subplots
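The points about squeeze and about .ravel versus .flatten can be checked directly. This is a minimal sketch; the variable names flat_view, flat_copy, row_axes, and grid_axes are chosen only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(nrows=2, ncols=2)
flat_view = axes.ravel()      # no copy needed for this contiguous array
flat_copy = axes.flatten()    # always a new array

print(np.shares_memory(axes, flat_view))   # True
print(np.shares_memory(axes, flat_copy))   # False
# either way the elements are the same Axes objects, so plotting works identically
print(flat_view[0] is flat_copy[0])        # True

# squeeze=True (the default) drops length-1 dimensions
_, row_axes = plt.subplots(nrows=1, ncols=3)
print(row_axes.shape)                      # (3,)

# squeeze=False always returns a 2D array, which keeps indexing code uniform
_, grid_axes = plt.subplots(nrows=1, ncols=3, squeeze=False)
print(grid_axes.shape)                     # (1, 3)
```

For an array of Axes the view-versus-copy distinction rarely matters in practice, since both hold references to the same Axes objects.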
import matplotlib.pyplot as plt
import numpy as np # sample data only
# example of data
rads = np.arange(0, 2*np.pi, 0.01)
y_data = np.array([np.sin(t*rads) for t in range(1, 5)])
x_data = [rads, rads, rads, rads]
# Generate figure and its subplots
fig, axes = plt.subplots(nrows=2, ncols=2)
# axes before
array([[<AxesSubplot:>, <AxesSubplot:>],
[<AxesSubplot:>, <AxesSubplot:>]], dtype=object)
# convert the array to 1 dimension
axes = axes.ravel()
# axes after
array([<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>],
dtype=object)
Iterate through the flattened array
If there are more subplots than data, this will result in IndexError: list index out of range
Try option 3. instead, or select a subset of the axes (e.g. axes[:-2])
for i, ax in enumerate(axes):
    ax.plot(x_data[i], y_data[i])
Access each axes by index
axes[0].plot(x_data[0], y_data[0])
axes[1].plot(x_data[1], y_data[1])
axes[2].plot(x_data[2], y_data[2])
axes[3].plot(x_data[3], y_data[3])
Index the data and axes
for i in range(len(x_data)):
    axes[i].plot(x_data[i], y_data[i])
zip the axes and data together and then iterate through the list of tuples.
for ax, x, y in zip(axes, x_data, y_data):
    ax.plot(x, y)
Output
An option is to assign each axes to a variable, fig, (ax1, ax2, ax3) = plt.subplots(1, 3). However, as written, this only works in cases with either nrows=1 or ncols=1. This is based on the shape of the array returned by plt.subplots, and quickly becomes cumbersome.
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2) for a 2 x 2 array.
This option is most useful for two subplots (e.g.: fig, (ax1, ax2) = plt.subplots(1, 2) or fig, (ax1, ax2) = plt.subplots(2, 1)). For more subplots, it's more efficient to flatten and iterate through the array of axes.

You could use the following:
import numpy as np
import matplotlib.pyplot as plt
fig, _ = plt.subplots(nrows=2, ncols=2)
for i, ax in enumerate(fig.axes):
    ax.plot(np.sin(np.linspace(0,2*np.pi,100) + np.pi/2*i))
Or alternatively, using the second variable that plt.subplots returns:
fig, ax_mat = plt.subplots(nrows=2, ncols=2)
for i, ax in enumerate(ax_mat.flatten()):
    ...
ax_mat is a matrix of the axes. Its shape is nrows x ncols.

Here is a simple solution:
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=False)
for sp in fig.axes:
    sp.plot(range(10))

Go with the following if you really want to use a loop:
def plot(data):
    fig = plt.figure(figsize=(100, 100))
    for idx, k in enumerate(data.keys(), 1):
        x, y = data[k].keys(), data[k].values
        plt.subplot(63, 10, idx)
        plt.bar(x, y)
    plt.show()

Another concise solution is:
# set up structure of plots
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20,10))
# for plot 1
ax1.set_title('Title A')
ax1.plot(x, y)
# for plot 2
ax2.set_title('Title B')
ax2.plot(x, y)
# for plot 3
ax3.set_title('Title C')
ax3.plot(x,y)


change color of bar for data selection in seaborn histogram (or plt)

Let's say I have a dataframe like:
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)
a = pd.DataFrame({"X3": X3, "X2":X2})
and I am doing the following plotting routine:
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    sns.distplot(a[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
plt.tight_layout()
plt.show()
Which yields:
Now I want to be able to perform a data selection on the dataframe a. Let's say something like:
b = a[(a['X2'] <4)]
and highlight the selection from b in the posted histograms.
For example, if the first row of b is [32:0] for X3 and [0:5] for X2, the desired output would be:
Is it possible to do this with the above for loop and with sns? Many thanks!
EDIT: I am also happy with a matplotlib solution, if easier.
EDIT2:
If it helps, it would be similar to do the following:
b = a[(a['X3'] >38)]
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    sns.distplot(a[c], ax = axes[1,i])
    sns.distplot(b[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
plt.tight_layout()
plt.show()
which yields the following:
However, I would like to be able to just colour those bars in the first plot in a different colour!
I also thought about setting the ylim to only the size of the blue plot so that the orange won't distort the shape of the blue distribution. That still wouldn't be feasible: in reality I have about 10 histograms to show, and setting ylim would be pretty much the same as sharey=True, which I'm trying to avoid so that I can show the true shape of the distributions.

I think I found the solution for this using the inspiration from the previous answer and this video:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)
a = pd.DataFrame({"X3": X3, "X2":X2})
b = a[(a['X3'] < 30)]
hist_idx=[]
# for each column, find the bin edges that fall inside the range of the selection b
for i, c in enumerate(a.columns):
    bin_ = np.histogram(a[c], bins=20)[1]
    hist = np.where(np.logical_and(bin_<=max(b[c]), bin_>min(b[c])))
    hist_idx.append(hist)
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    axes[1, i].hist(a[c], bins = 20)
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
# recolor the bars that correspond to the selected bin edges
for it, index in enumerate(hist_idx):
    length = len(index[0])
    for r in range(length):
        try:
            axes[1, it].patches[index[0][r]-1].set_fc("red")
        except IndexError:
            pass
plt.tight_layout()
plt.show()
which yields the following for b = a[(a['X3'] < 30)] :
or for b = a[(a['X3'] > 36)]:
Thought I'd leave it here - although niche, might help someone in the future!

I created the following code with the understanding that the intent of your question is to add a different color to the histogram based on the data extracted under certain conditions.
Use np.histogram() to get an array of frequencies and an array of bins. Get the index of the value closest to the value of the first row of data extracted for a certain condition. Change the color of the histogram with that retrieved index. The same method can be used to deal with the other graph.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)
a = pd.DataFrame({"X3": X3, "X2":X2})
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    sns.distplot(a[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
b = a[(a['X2'] <4)]
hist3, bins3 = np.histogram(X3)
idx = np.abs(np.asarray(hist3) - b['X3'].head(1).values[0]).argmin()
for k in range(idx):
    axes[1,0].get_children()[k].set_color("red")
plt.tight_layout()
plt.show()
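A variation on the same idea, using only matplotlib: Axes.hist returns the bin edges and the bar patches, so each bar can be compared against the selected range directly. This is a sketch under assumptions, not the method from either answer: values and selected are made-up stand-ins for a["X3"] and b["X3"], and a bar is highlighted whenever its bin overlaps the range spanned by the selection.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2021)
values = rng.normal(34, 2, 200)     # stand-in for a["X3"]
selected = values[values > 36]      # stand-in for the selection b["X3"] (assumed non-empty)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(values, bins=20)

# highlight every bar whose bin overlaps the range covered by the selection
lo, hi = selected.min(), selected.max()
for left, right, patch in zip(edges[:-1], edges[1:], patches):
    if right > lo and left <= hi:
        patch.set_facecolor("red")
```

Because the bars are recolored in place, the y-axis scaling of the original histogram is unchanged, which avoids the distortion described in the question.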

Calculating and plotting parametric equations in sympy

So I'm struggling with these parametric equations in Sympy.
𝑓(𝜃) = cos(𝜃) − sin(𝑎𝜃) and 𝑔(𝜃) = sin(𝜃) + cos(𝑎𝜃)
with 𝑎 ∈ ℝ∖{0}.
import matplotlib.pyplot as plt
import sympy as sp
from IPython.display import display
sp.init_printing()
%matplotlib inline
This is what I have to define them:
f = sp.Function('f')
g = sp.Function('g')
f = sp.cos(th) - sp.sin(a*th)
g = sp.sin(th) + sp.cos(a*th)
I don't know how to define a with the domain ℝ∖{0} and it gives me trouble when I want to solve the equation
𝑓(𝜃)+𝑔(𝜃)=0
The solution should be:
𝜃=[3𝜋/4,3𝜋/4𝑎,𝜋/2(𝑎−1),𝜋/(𝑎+1)]
Next I want to plot the parametric equations when a=2, a=4, a=6 and a=8. I want to have a different color for every value of a. The most efficient way will probably be with a for-loop.
I also need to use lambdify to have a list of values but I'm fairly new to this so it's a bit vague.
This is what I already have:
fig, ax = plt.subplots(1, figsize=(12, 12))
theta_range = np.linspace(0, 2*np.pi, 750)
colors = ['blue', 'green', 'orange', 'cyan']
a = [2, 4, 6, 8]
for index in range(0, 4):
# I guess I need to use lambdify here but I don't see how
plt.show()
Thank you in advance!

You're asking two very different questions: one about solving a symbolic expression, and one about plotting curves.
First, about the symbolic expression. a can be defined as a = sp.symbols('a', real=True, nonzero=True) and theta as th = sp.symbols('theta', real=True). There is no need to define f and g with sp.Function first, as they get assigned a sympy expression anyway. To solve the equation, just use sp.solve(f+g, th). Sympy gives [pi, pi/a, pi/(2*(a - 1)), pi/(a + 1)] as the result.
Sympy also has a plotting function, which could be called as sp.plot(*[(f+g).subs({a:a_val}) for a_val in [2, 4, 6, 8]]). But there is very limited support for options such as color.
To have more control, matplotlib can do the plotting based on numpy functions. sp.lambdify converts the expression: sp.lambdify((th, a), f+g, 'numpy').
Then, matplotlib can do the plotting. There are many options to tune the result.
Here is some example code:
import matplotlib.pyplot as plt
import numpy as np
import sympy as sp
th = sp.symbols('theta', real=True)
a = sp.symbols('a', real=True, nonzero=True)
f = sp.cos(th) - sp.sin(a*th)
g = sp.sin(th) + sp.cos(a*th)
thetas = sp.solve(f+g, th)
print("Solutions for theta:", thetas)
fg_np = sp.lambdify((th, a), f+g, 'numpy')
fig, ax = plt.subplots(1, figsize=(12, 12))
theta_range = np.linspace(0, 2*np.pi, 750)
colors = plt.cm.Set2.colors
for a_val, color in zip([2,4,6,8], colors):
    plt.plot(theta_range, fg_np(theta_range, a_val), color=color, label=f'a={a_val}')
plt.axhline(0, color='black')
plt.xlabel("theta")
plt.ylabel(f+g)
plt.legend()
plt.grid()
plt.autoscale(enable=True, axis='x', tight=True)
plt.show()
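The code above plots f+g against theta. If the goal is instead the parametric curve itself, one possible reading of the question is to plot g(θ) against f(θ). The following is a sketch under that assumption; the name curve is made up, and the colors are the ones listed in the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import sympy as sp

th = sp.symbols('theta', real=True)
a = sp.symbols('a', real=True, nonzero=True)
f = sp.cos(th) - sp.sin(a*th)
g = sp.sin(th) + sp.cos(a*th)

# one numeric function that returns both coordinates as a tuple
curve = sp.lambdify((th, a), (f, g), 'numpy')

theta_range = np.linspace(0, 2*np.pi, 750)
fig, ax = plt.subplots(figsize=(8, 8))
for a_val, color in zip([2, 4, 6, 8], ['blue', 'green', 'orange', 'cyan']):
    x_vals, y_vals = curve(theta_range, a_val)
    ax.plot(x_vals, y_vals, color=color, label=f'a={a_val}')
ax.set_aspect('equal')
ax.legend()
```

Passing a tuple of expressions to lambdify keeps the two coordinates in sync, since both are evaluated from the same theta values in one call.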

Matplotlib: Don't show errorbars in legend

I'm plotting a series of data points with x and y error but do NOT want the errorbars to be included in the legend (only the marker). Is there a way to do so?
Example:
import matplotlib.pyplot as plt
import numpy as np
subs=['one','two','three']
x=[1,2,3]
y=[1,2,3]
yerr=[2,3,1]
xerr=[0.5,1,1]
fig,(ax1)=plt.subplots(1,1)
for i in np.arange(len(x)):
    ax1.errorbar(x[i],y[i],yerr=yerr[i],xerr=xerr[i],label=subs[i],ecolor='black',marker='o',ls='')
ax1.legend(loc='upper left', numpoints=1)
fig.savefig('test.pdf', bbox_inches=0)

You can modify the legend handler. See the legend guide of matplotlib.
Adapting your example, this could read:
import matplotlib.pyplot as plt
import numpy as np
subs=['one','two','three']
x=[1,2,3]
y=[1,2,3]
yerr=[2,3,1]
xerr=[0.5,1,1]
fig,(ax1)=plt.subplots(1,1)
for i in np.arange(len(x)):
    ax1.errorbar(x[i],y[i],yerr=yerr[i],xerr=xerr[i],label=subs[i],ecolor='black',marker='o',ls='')
# get handles
handles, labels = ax1.get_legend_handles_labels()
# remove the errorbars
handles = [h[0] for h in handles]
# use them in the legend
ax1.legend(handles, labels, loc='upper left',numpoints=1)
plt.show()
This produces

Here is an ugly patch:
pp = []
colors = ['r', 'b', 'g']
for i, (y, yerr) in enumerate(zip(ys, yerrs)):
    p = plt.plot(x, y, '-', color='%s' % colors[i])
    pp.append(p[0])
    plt.errorbar(x, y, yerr, color='%s' % colors[i])
plt.legend(pp, labels, numpoints=1)
Here is a figure for example:

The accepted solution works in simple cases but not in general. In particular, it did not work in my own more complex situation.
I found a more robust solution, which tests for ErrorbarContainer and did work for me. It was proposed by Stuart W D Grieve, and I copy it here for completeness:
import matplotlib.pyplot as plt
from matplotlib import container
label = ['one', 'two', 'three']
color = ['red', 'blue', 'green']
x = [1, 2, 3]
y = [1, 2, 3]
yerr = [2, 3, 1]
xerr = [0.5, 1, 1]
fig, (ax1) = plt.subplots(1, 1)
for i in range(len(x)):
    ax1.errorbar(x[i], y[i], yerr=yerr[i], xerr=xerr[i], label=label[i], color=color[i], ecolor='black', marker='o', ls='')
handles, labels = ax1.get_legend_handles_labels()
handles = [h[0] if isinstance(h, container.ErrorbarContainer) else h for h in handles]
ax1.legend(handles, labels)
plt.show()
It produces the following plot (on Matplotlib 3.1)

It works for me if I set the label argument to None.
plt.errorbar(x, y, yerr, label=None)
