How to use double markers in matplotlib - matplotlib

I wish to indicate that one curve is a hybrid of two other curves. I thought it would be a good idea to use the upper/lower triangles, and an overlay/superposition of the two for the hybrid. I.e.
'^'
'v'
a hexagram
Problem: there's no hexagram marker, and I haven't managed to use any of the LaTeX symbols (\DavidStar, \davidsstar, \largestarofdavid). How can I do so?
Alternative strategy:
overlay '^' and 'v'. How?
Alternative strategy: use some other 3 symbols that complement one another. However, I cannot find such a set in matplotlib.
Edit
In response to comments I tried this:
from matplotlib import pyplot
import numpy as np
fig, ax = pyplot.subplots()
a = np.arange(5)
lh1, = ax.plot(a, a, 'k', marker='v',ms=30,markerfacecolor='none',markeredgewidth=1.5)
lh2, = ax.plot(a, a, 'k', marker='^',ms=30,markerfacecolor='none',markeredgewidth=1.5)
ax.legend([lh1,lh2], ['1','2'] )
And this:
lh1, = ax.plot(a, a , 'r', marker=10 ,ms=30,markerfacecolor='none',markeredgewidth=1.5)
lh2, = ax.plot(a, 2*a, 'y', marker=11 ,ms=30,markerfacecolor='none',markeredgewidth=1.5)
lh3, = ax.plot(a, 3*a, 'o', marker='D',ms=30,markerfacecolor='none',markeredgewidth=1.5)
But the result is not good, and I'd like to avoid this type of hacking.
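One possible way to get the overlay without stacking two plot calls is to pass a custom Path as the marker. The following is only a sketch under assumptions: the helper name triangle, the marker size and the labels are made up for illustration, and the hexagram is simply the two triangle paths combined into one compound path.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.path import Path


def triangle(up=True):
    """Hypothetical helper: closed equilateral triangle centred on the origin."""
    start = 90 if up else 270
    angles = np.deg2rad([start, start + 120, start + 240])
    verts = np.column_stack([np.cos(angles), np.sin(angles)])
    verts = np.vstack([verts, verts[:1]])  # repeat the first vertex for CLOSEPOLY
    codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY]
    return Path(verts, codes)


up, down = triangle(True), triangle(False)
# Superpose both triangles in a single marker path (a hexagram outline).
hexagram = Path.make_compound_path(up, down)

a = np.arange(5)
fig, ax = plt.subplots()
style = dict(ms=20, markerfacecolor="none", markeredgewidth=1.5)
lh1, = ax.plot(a, a, "k", marker=up, **style)
lh2, = ax.plot(a, 2 * a, "k", marker=down, **style)
lh3, = ax.plot(a, 3 * a, "k", marker=hexagram, **style)
ax.legend([lh1, lh2, lh3], ["curve 1", "curve 2", "hybrid"])
```

Because the hexagram is one marker on one line, the legend entry for the hybrid curve shows the combined symbol directly. Call plt.show() or fig.savefig(...) to view the result.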

Related

Draw bar-charts with value_counts() for multiple columns in a Pandas DataFrame

I'm trying to draw bar-charts with counts of unique values for all columns in a Pandas DataFrame. Kind of what df.hist() does for numerical columns, but I have categorical columns.
I'd prefer to use the object-oriented approach, because it feels more natural and explicit to me.
I'd like to have multiple Axes (subplots) within a single Figure, in a grid fashion (again like what df.hist() does).
My solution below does exactly what I want, but it feels cumbersome. I doubt whether I really need the direct dependency on Matplotlib (and all the code for creating the Figure, removing the unused Axes etc.). I see that pandas.Series.plot has parameters subplots and layout which seem to point to what I want, but maybe I'm totally off here. I tried looping over the columns in my DataFrame and apply these parameters, but I cannot figure it out.
Does anyone know a more compact way to do what I'm trying to achieve?
import math
import matplotlib.pyplot as plt
# Defining the grid-dimensions of the Axes in the Matplotlib Figure
nr_of_plots = len(ames_train_categorical.columns)
nr_of_plots_per_row = 4
nr_of_rows = math.ceil(nr_of_plots / nr_of_plots_per_row)
# Defining the Matplotlib Figure and Axes
figure, axes = plt.subplots(nrows=nr_of_rows, ncols=nr_of_plots_per_row, figsize=(25, 50))
figure.subplots_adjust(hspace=0.5)
# Plotting on the Axes
i, j = 0, 0
for column_name in ames_train_categorical:
    if ames_train_categorical[column_name].nunique() <= 30:
        axes[i][j].set_title(column_name)
        ames_train_categorical[column_name].value_counts().plot(kind='bar', ax=axes[i][j])
        j += 1
        if j % nr_of_plots_per_row == 0:
            i += 1
            j = 0
# Cleaning up unused Axes
# plt.subplots creates a rectangular grid of Axes. On the last row, not all Axes will always be used. Unused Axes are removed here.
axes_flattened = axes.flatten()
for ax in axes_flattened:
    if not ax.has_data():
        ax.remove()
Edit: alternative idea
Using the pyplot/state-machine way of working, you could do it like this in very few lines of code. The downside is that every graph gets its own figure, so they're not nicely arranged in a grid.
for column_name in ames_train_categorical:
    ames_train_categorical[column_name].value_counts().plot(kind='bar')
    plt.show()
Desired output
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
{
"MS Zoning": ["RL", "FV", "RL", "RH", "RL", "RL"],
"Street": ["Pave", "Pave", "Pave", "Grvl", "Pave", "Pave"],
"Alley": ["Grvl", "Grvl", "Grvl", "Grvl", "Pave", "Pave"],
"Utilities": ["AllPub", "NoSewr", "AllPub", "AllPub", "NoSewr", "AllPub"],
"Land Slope": ["Gtl", "Mod", "Sev", "Mod", "Sev", "Sev"],
}
)
Here is a bit more idiomatic way to do it:
import math
from matplotlib import pyplot as plt
size = math.ceil(df.shape[1]** (1/2))
fig = plt.figure()
for i, col in enumerate(df.columns):
    fig.add_subplot(size, size, i + 1)
    df[col].value_counts().plot(kind="bar", ax=plt.gca(), title=col, rot=0)
fig.tight_layout()
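If you want to keep the explicit object-oriented style and a fixed number of columns per row, a variant along these lines might also work. It is only a sketch: the column count of 3 and the figure size are arbitrary assumptions, and the toy dataframe is repeated so the snippet runs on its own.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame(
    {
        "MS Zoning": ["RL", "FV", "RL", "RH", "RL", "RL"],
        "Street": ["Pave", "Pave", "Pave", "Grvl", "Pave", "Pave"],
        "Alley": ["Grvl", "Grvl", "Grvl", "Grvl", "Pave", "Pave"],
        "Utilities": ["AllPub", "NoSewr", "AllPub", "AllPub", "NoSewr", "AllPub"],
        "Land Slope": ["Gtl", "Mod", "Sev", "Mod", "Sev", "Sev"],
    }
)

ncols = 3  # assumed number of plots per row
nrows = math.ceil(df.shape[1] / ncols)
fig, axes = plt.subplots(nrows, ncols, figsize=(9, 3 * nrows), squeeze=False)
flat = axes.ravel()

# One bar chart of value counts per column, filling the grid row by row.
for ax, col in zip(flat, df.columns):
    df[col].value_counts().plot(kind="bar", ax=ax, title=col, rot=0)

# Remove the Axes left over on the last row.
for ax in flat[df.shape[1]:]:
    ax.remove()

fig.tight_layout()
```

Iterating over the flattened Axes array with zip avoids the manual i, j bookkeeping, and slicing past the number of columns finds the unused Axes without checking has_data().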

FacetGrid plot with aggregate in Seaborn/other library

I have a toy DataFrame like this:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'cat': ['a', 'a', 'a', 'b', 'b', 'b'], 'n1': [1,1,1,4,5,6], 'n2': [6,5,2,2,2,1]})
I want to group by cat and plot histograms for n1 and n2. Additionally, I want to plot those histograms without grouping. So first, transform the data to seaborn (long) format:
df2 = pd.melt(df, id_vars='cat', value_vars=['n1', 'n2'], value_name='value')
second add "all":
df_all = df2.copy()
df_all['cat'] = 'all'
df3 = pd.concat([df2, df_all])
Finally plot:
g = sns.FacetGrid(df3, col="variable", row="cat")
g.map(plt.hist, 'value', ec="k")
I wonder whether this could be done in a more elegant, concise way, without creating df3 or df2. A different library could be used.
As I mentioned in my comment, I think what you do is perfectly fine. Craft a function if needed to perform often. Nevertheless, you might be interested in pandas_profiling. This describes in detail the profile of your data, and in an interactive way. In my opinion, this is probably overkill for what you want to do, but I'll let you be the judge of that ;)
import pandas_profiling
df.profile_report()
Extract of the interactive output:
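On the "craft a function" suggestion: a minimal sketch of such a helper could look like the following. The name melt_with_all and its all_label parameter are hypothetical, and it only wraps the melt and concat steps from the question.

```python
import pandas as pd


def melt_with_all(df, id_var, value_vars, all_label="all"):
    """Hypothetical helper: long-format data plus a pooled copy labelled `all_label`."""
    long = pd.melt(df, id_vars=id_var, value_vars=value_vars, value_name="value")
    pooled = long.assign(**{id_var: all_label})  # same rows, grouping column overwritten
    return pd.concat([long, pooled], ignore_index=True)


df = pd.DataFrame(
    {"cat": ["a", "a", "a", "b", "b", "b"], "n1": [1, 1, 1, 4, 5, 6], "n2": [6, 5, 2, 2, 2, 1]}
)
df3 = melt_with_all(df, "cat", ["n1", "n2"])
```

The returned frame can then be passed to sns.FacetGrid(..., col="variable", row="cat") exactly as in the question, so the intermediate frames no longer clutter the calling code.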

Pandas histogram plot with Y axis or colorbar

In Pandas, I am trying to generate a Ridgeline plot for which the density values are shown (either as Y axis or color-ramp). I am using the Joyplot but any other alternative ways are fine.
So, first I created the Ridge plot to show the different distribution plot for each condition (you can reproduce it using this code):
import numpy as np
import pandas as pd
import joypy
import matplotlib
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'Category1':np.random.choice(['C1','C2','C3'],1000),'Category2':np.random.choice(['B1','B2','B3','B4','B5'],1000),
'year':np.arange(start=1900, stop=2900, step=1),
'Data':np.random.uniform(0,1,1000),"Period":np.random.choice(['AA','CC','BB','DD'],1000)})
data_pivot=df1.pivot_table('Data', ['Category1', 'Category2','year'], 'Period')
fig, axes = joypy.joyplot(data_pivot, column=['AA', 'BB', 'CC', 'DD'], by="Category1", ylim='own', figsize=(14,10), legend=True, alpha=0.4)
This generates the figure, but without my desired Y axis. Based on this post, I could add a color ramp, but it neither makes sense nor shows the differences between the distributions of the different categories on each line:
ar=df1['Data'].plot.kde().get_lines()[0].get_ydata() ## a workaround to get the probability values to set the colorramp max and min
norm = plt.Normalize(ar.min(), ar.max())
original_cmap = plt.cm.viridis
cmap = matplotlib.colors.ListedColormap(original_cmap(norm(ar)))
sm = matplotlib.cm.ScalarMappable(cmap=original_cmap, norm=norm)
sm.set_array([])
# plotting ....
fig, axes = joypy.joyplot(data_pivot,colormap = cmap , column=['AA', 'BB', 'CC', 'DD'], by="Category1", ylim='own', figsize=(14,10), legend=True, alpha=0.4)
fig.colorbar(sm, ax=axes, label="density")
But what I want is something like either of these figures (preferably with a color ramp):
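If a labelled density axis matters more than the overlapping ridgeline look, one alternative is a stack of subplots that share the x axis, one per category. This is only a sketch and not a joypy feature: it uses normalised histograms rather than a KDE, and the seed, bin count and figure size are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
n = 1000
df1 = pd.DataFrame(
    {
        "Category1": rng.choice(["C1", "C2", "C3"], n),
        "Data": rng.uniform(0, 1, n),
        "Period": rng.choice(["AA", "BB", "CC", "DD"], n),
    }
)

categories = sorted(df1["Category1"].unique())
fig, axes = plt.subplots(len(categories), 1, sharex=True, figsize=(8, 6))

for ax, cat in zip(axes, categories):
    subset = df1[df1["Category1"] == cat]
    for period, grp in subset.groupby("Period"):
        # density=True normalises each histogram so the y axis reads as a density
        ax.hist(grp["Data"], bins=20, range=(0, 1), density=True,
                histtype="stepfilled", alpha=0.4, label=period)
    ax.set_ylabel(f"{cat}\ndensity")

axes[0].legend(title="Period", ncol=4)
axes[-1].set_xlabel("Data")
fig.tight_layout()
```

Each row keeps its own y axis with density values, at the cost of losing the overlap between rows that a ridgeline plot has.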

bars not proportional to value - matplotlib bar chart [duplicate]

This question already has an answer here:
Difference in plotting with different matplotlib versions
(1 answer)
Closed 4 years ago.
I am new to matplotlib and am trying to plot a bar chart using pyplot. Instead of getting a plot where the height of bar represents the value, I am getting bars that are linearly increasing in height while their values are displayed on the y-axis as labels.
payment_modes = ['Q', 'NO', 'A', 'C', 'P', 'E', 'D']
l = []
for i in payment_modes:
    l.append(str(len(df[df['PMODE_FEB18']==i])))
# here l = ['33906', '37997', '815', '4350', '893', '98', '6']
plt.figure()
plt.bar(range(7),l)
This is what I am getting:
The problem is that you seem to be feeding bar with strings, not with numerical quantities. If you instead use the actual numerical quantities, bar will behave as you would expect:
import matplotlib.pyplot as plt
l = [33906, 37997, 815, 4350, 893, 98, 6]
plt.figure()
plt.bar(range(7),l)
plt.show()
gives
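Starting from the DataFrame, the counts can be kept numeric by skipping the str() conversion altogether. A small sketch follows; the sample data and the shortened payment_modes list are made up for illustration, and value_counts with reindex is just one way to get the counts in a fixed order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
from matplotlib import pyplot as plt

payment_modes = ["Q", "NO", "A", "C"]  # shortened, hypothetical list
df = pd.DataFrame({"PMODE_FEB18": ["Q", "Q", "NO", "A", "Q", "NO"]})  # made-up data

# Integer counts per mode, in the order of payment_modes; missing modes become 0.
counts = df["PMODE_FEB18"].value_counts().reindex(payment_modes, fill_value=0)

fig, ax = plt.subplots()
bars = ax.bar(payment_modes, counts.to_numpy())
ax.set_ylabel("count")
```

Since the heights are integers, the bar heights are proportional to the values, and the mode names on the x axis come for free.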

How can I change the filled color of stacked area plot in DataFrame?

I want to change the filled color in the stacked area plots drawn with Pandas.Dataframe.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax = df.plot.area(linewidth=0);
The area plot example
I guess that the instance returned by the plot function gives access to attributes such as colors so they can be modified.
But the Axes classes are too complicated to learn quickly, and I failed to find similar questions on Stack Overflow.
Can anyone help?
Use 'colormap' (see the documentation for more details):
ax = df.plot.area(linewidth=0, colormap="Pastel1")
The trick is using the 'color' parameter:
Soln 1: dict
Simply pass a dict of {column name: color}
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'], )
ax = df.plot.area(color={'b':'0', 'c':'#17A589', 'a':'#9C640C', 'd':'#ECF0F1'})
Soln 2: sequence
Simply pass a sequence of color codes (it will match the order of your columns).
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'], )
ax = df.plot.area(color=('0', '#17A589', '#9C640C', '#ECF0F1'))
No need to set linewidth (it will automatically adjust colors). Also, this wouldn't mess with the legend.
The matplotlib API is really complex, but the artist module documentation gives a very plain illustration. For bar/barh plots, the drawn elements can be accessed and modified through .patches, but for area plots they are in .collections.
To achieve the specific modification, use code like this.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax = df.plot.area(linewidth=0);
for collection in ax.collections:
    collection.set_facecolor('#888888')
highlight = 0
ax.collections[highlight].set_facecolor('#aa3333')
Other methods of the collections can be found by running
dir(ax.collections[highlight])
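To recolor every area by column name through .collections, something like the following might work. It is a sketch that assumes the collections are created in the same order as the DataFrame columns; the palette dict is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
from matplotlib.colors import to_hex

df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
ax = df.plot.area(linewidth=0)

# Hypothetical palette: column name -> fill color.
palette = {"a": "#9c640c", "b": "#000000", "c": "#17a589", "d": "#ecf0f1"}

# Assumption: ax.collections follows the column order of df.
for name, collection in zip(df.columns, ax.collections):
    collection.set_facecolor(palette[name])

ax.legend()  # rebuild the legend so its swatches pick up the new colors
```

The current fill of a given area can be checked with to_hex(ax.collections[0].get_facecolor()[0]).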