I am building hierarchical clusters in the form of dendrograms using Python 3.4 and Seaborn, based on the work of Olga Botvinnik (http://nbviewer.ipython.org/gist/olgabot/bfe1e3638af3eea52fb1#). My goal is to cluster U.S. cities based on greenhouse gas emissions. I was able to read my CSV file and create a figure with residential and commercial building emissions on the x-axis and city names on the y-axis, but I cannot read any of the city names because they are squished together. The image needs to be elongated so that the labels are legible. Can anyone point me in a good direction?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv('/Users/JCMartel 1/Desktop/ghg_directory/rescom.csv', index_col=0)
data.index = data.index.map(lambda x: x.strip())
sns.clustermap(data);
#Need to improve layout
fig = plt.gcf()
fig.savefig('clustermap_bbox_tight.png', bbox_inches='tight')
The following is the final script that I am using:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv('/Users/JCMartel 1/Desktop/ghg_directory/ghgmodel4.csv', index_col=0)
data.index = data.index.map(lambda x: x.strip())
cmap = sns.cubehelix_palette(as_cmap=True, rot=-.3, light=1)
sns.clustermap(data, col_cluster=False, cmap=cmap, linewidths=.5, figsize=(8, 30))
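What makes the labels readable in the final script is the figsize argument: a taller figure leaves room for each row label. As a rough sketch, the height can be derived from the number of rows rather than hard-coded. The helper name figsize_for_rows, the 0.25 inches per row, and the random stand-in data are assumptions for illustration and are not part of the original script.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import pandas as pd
import seaborn as sns


def figsize_for_rows(n_rows, width=8.0, inches_per_row=0.25, min_height=4.0):
    """Return (width, height) in inches so every row label gets some vertical room."""
    return (width, max(min_height, n_rows * inches_per_row))


# Made-up stand-in for the emissions table: 120 "cities", 2 emission columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.random((120, 2)),
    index=[f"City {i}" for i in range(120)],
    columns=["residential", "commercial"],
)

size = figsize_for_rows(len(data))

# yticklabels=True asks the underlying heatmap to draw a label for every row.
grid = sns.clustermap(data, col_cluster=False, figsize=size, yticklabels=True)
```

With 120 rows the helper returns (8.0, 30.0), which happens to match the figsize used in the script above. The per-row value is a starting point to tune against the font size.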
I'm trying to transform the scale on the y-axis to log values. For example, if one of the numbers on the y-axis is 0.01, I want to get -2 (which is log10(0.01)). How can I do this in matplotlib (or any other library)?
Thanks,
Without plt.yscale('log'), few of the visible y-ticks will fall on values whose logarithm is a round number. With a log scale in place, you can change the tick formatter to one that shows only the exponent. Also note that in recent seaborn versions distplot has been deprecated in favor of histplot(..., kde=True) or kdeplot(...).
Here is an example:
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterExponent
import numpy as np
import seaborn as sns
x = np.random.randn(10, 1000).cumsum(axis=1).ravel()
ax = sns.histplot(x, kde=True, stat='density', color='purple')
ax.set_yscale('log')
ax.yaxis.set_major_formatter(LogFormatterExponent(base=10.0, labelOnlyBase=True))
ax.set_ylabel(ax.get_ylabel() + ' (exponent)')
ax.margins(x=0)
plt.show()
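An alternative to relabelling the ticks is to transform the values themselves and plot the result on an ordinary linear axis. A minimal sketch of that transform follows. The helper name log10_values and the sample numbers are made up for illustration, and base 10 is assumed because the question expects 0.01 to map to -2.

```python
import numpy as np


def log10_values(values):
    """Return base-10 logarithms of the input; every value must be strictly positive."""
    arr = np.asarray(values, dtype=float)
    if np.any(arr <= 0):
        raise ValueError("log10 is only defined for positive values")
    return np.log10(arr)


y = [0.01, 0.1, 1.0, 10.0]
log_y = log10_values(y)
print(log_y)
```

The trade-off is that the axis then shows transformed data, so the axis label should say so (for example "log10(density)").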
I am trying to create a grouped bar graph using Seaborn but I am getting a bit lost in the weeds. I actually have it working but it does not feel like an elegant solution. Seaborn only seems to support clustered bar graphs when there is a binary option such as Male/Female. (https://seaborn.pydata.org/examples/grouped_barplot.html)
It does not feel right having to fall back onto matplotlib so much - using the subplots feels a bit dirty :). Is there a way of handling this completely in Seaborn?
Thanks,
Andrew
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
sns.set_theme(style="whitegrid")
rcParams.update({'figure.autolayout': True})
dataframe = pd.read_csv("https://raw.githubusercontent.com/mooperd/uk-towns/master/uk-towns-sample.csv")
dataframe = dataframe.groupby(['nuts_region']).agg(
    {'elevation': ['mean', 'max', 'min'], 'nuts_region': 'size'}
).reset_index()
dataframe.columns = list(map('_'.join, dataframe.columns.values))
# We need to melt our dataframe down into a long format.
tidy = dataframe.melt(id_vars='nuts_region_').rename(columns=str.title)
# Create a subplot. A Subplot makes it convenient to create common layouts of subplots.
# https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.subplots.html
fig, ax1 = plt.subplots(figsize=(6, 6))
# https://stackoverflow.com/questions/40877135/plotting-two-columns-of-dataframe-in-seaborn
g = sns.barplot(x='Nuts_Region_', y='Value', hue='Variable', data=tidy, ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
I'm not sure why you need seaborn. Your data is wide format, so pandas does it pretty well without the need for melting:
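import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
# `dataframe` is the aggregated frame with flattened column names from the question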
sns.set(style="whitegrid")
rcParams.update({'figure.autolayout': True})
fig, ax1 = plt.subplots(figsize=(12,6))
dataframe.plot.bar(x='nuts_region_', ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
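For what it's worth, hue is not limited to two levels, so the grouped plot can also stay within seaborn. The figure-level catplot creates the figure itself, so no plt.subplots call is needed. A minimal sketch, assuming a small made-up table in place of the CSV (the region names, the values, and the column names region, statistic and metres are all invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Made-up wide-format summary standing in for the aggregated CSV data.
wide = pd.DataFrame({
    "region": ["North", "South", "East"],
    "elevation_mean": [120.0, 80.0, 45.0],
    "elevation_max": [300.0, 210.0, 150.0],
    "elevation_min": [10.0, 5.0, 2.0],
})

# Long format: one row per (region, statistic) pair.
tidy = wide.melt(id_vars="region", var_name="statistic", value_name="metres")

sns.set_theme(style="whitegrid")

# height and aspect control the figure size; hue creates one bar per statistic.
g = sns.catplot(data=tidy, x="region", y="metres", hue="statistic",
                kind="bar", height=5, aspect=1.6)

# Rotate the category labels on the single Axes that catplot created.
for label in g.axes.flat[0].get_xticklabels():
    label.set_rotation(45)
    label.set_ha("right")
```

The melt step is still needed because seaborn expects long-format data for hue. Whether this is more elegant than the pandas one-liner is a matter of taste.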
So I tried to make a categorical plot of my data, and this is my code and the resulting graph.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
sns.set_style("ticks")
sns.set_context("paper", font_scale=1, rc={"lines.linewidth": 6})
sns.catplot(y = "Region",x = "Interest by subregion",data = sample)
How can I make the y-labels more spread out and have a bigger font?
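seaborn has no sns.figure function, and catplot is a figure-level function that creates its own figure, so set the size through its height and aspect parameters, e.g. sns.catplot(..., height=8, aspect=1.2). For a bigger font, raise font_scale in sns.set_context, e.g. sns.set_context("paper", font_scale=1.5).
Try different values for these parameters to get the best results.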
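Because catplot builds its own figure, the y-labels can be spread out through its height and aspect parameters, and the label font can be enlarged through font_scale. A sketch under stated assumptions: the sample frame below is made up to stand in for the question's data, and the 0.3 inches per region is an arbitrary starting value.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Made-up stand-in for the `sample` DataFrame from the question.
sample = pd.DataFrame({
    "Region": [f"Region {i}" for i in range(30)],
    "Interest by subregion": list(range(30)),
})

# A larger font_scale enlarges tick and axis labels.
sns.set_context("paper", font_scale=1.5)

# Give each category roughly 0.3 inches of vertical space (an assumption to tune).
n_regions = sample["Region"].nunique()
fig_height = max(4.0, 0.3 * n_regions)

g = sns.catplot(data=sample, y="Region", x="Interest by subregion",
                height=fig_height, aspect=0.8)
```

height is the figure height in inches per facet and aspect is the width-to-height ratio, so raising height spreads the categories out vertically.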
I'm generating a heatmap from a pandas DataFrame on my Mac, using code that looks like this:
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(figsize=(14,14))
sns.set(font_scale=1.4)
sns_plot = sns.heatmap(df, annot=True, linewidths=.5, fmt='g', ax=ax).set_yticklabels(ax.get_yticklabels(), rotation=0)
ax.set_ylabel('Product')
ax.set_xlabel('Manufacturer')
ax.xaxis.set_ticks_position('top')
ax.xaxis.set_label_position('top')
fig.savefig('output.png')
And I get a heatmap looking like this:
I then put my code in a Docker container with an Ubuntu image and installed the same version of seaborn. The only difference is that I need to add a matplotlib configuration so that Tcl doesn't scream:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import seaborn as sns
And I get a heatmap that looks like this (I use the same code and the same pandas dataframe):
I'm unable to find why the color gradient is inverted and would love to hear if you have any idea.
Thank you !
The default colormap for sequential data changed to 'rocket' with the 0.8 release of seaborn; see the release notes. The colormap looks this way now:
You can always use the cmap argument and specify which colormap you prefer to use. For example, to get the pre-0.8 colormap for non-divergent data use: cmap=sns.cubehelix_palette(light=.95, as_cmap=True).
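Passing cmap explicitly also means the colors no longer depend on whichever default the installed seaborn version uses, which helps when the same script runs on two machines. A minimal sketch, with a made-up DataFrame standing in for the real data (the product and manufacturer names are invented):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the Docker setup

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up counts standing in for the real product/manufacturer table.
rng = np.random.default_rng(1)
df = pd.DataFrame(
    rng.integers(0, 100, size=(4, 3)),
    index=["Product A", "Product B", "Product C", "Product D"],
    columns=["Maker X", "Maker Y", "Maker Z"],
)

# Explicit light-to-dark colormap instead of relying on the library default.
cmap = sns.cubehelix_palette(light=.95, as_cmap=True)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(df, annot=True, linewidths=.5, fmt="g", cmap=cmap, ax=ax)
ax.set_ylabel("Product")
ax.set_xlabel("Manufacturer")
```

Any named matplotlib or seaborn colormap can be used in place of the cubehelix palette; the point is only that the choice is written into the script.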
I am trying to use pandas plotting to create a stacked horizontal barplot with a seaborn import. I would like to remove space between the bars, but also not have the bars overlap. This is what I've tried:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot.barh(stacked=True, width=1)
This seems to work without importing seaborn, but I like the seaborn style, and seaborn is usually imported in the IPython notebook I am working in. Is this possible with seaborn imported?
This artifact is also visible with matplotlib defaults if you set the bar linewidth to what seaborn style has:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=.5)
A solution would be to increase the bar line width back to roughly the matplotlib default:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=1)
Perhaps you should reduce the line width?
import matplotlib.pyplot as plt
import seaborn as sns
# df is the DataFrame from the question
f, ax = plt.subplots(figsize=(10, 10))
df.plot(kind='barh', stacked=True, width=1, lw=0.1, ax=ax)
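Taking the reduced-line-width idea to its limit, the edge lines can be switched off entirely with lw=0, so that adjacent bars share no border at all. This is a sketch rather than something tested against the seaborn style in the question. The random data mirrors the question, and the check on the patches is only there to show that the keyword reaches every bar.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random data shaped like the question's example: 15 bars, 3 stacked segments each.
df = pd.DataFrame(np.random.default_rng(2).random((15, 3)))

fig, ax = plt.subplots(figsize=(6, 6))

# width=1 removes the gaps between bars; lw=0 removes the edge lines.
df.plot(kind="barh", stacked=True, width=1, lw=0, ax=ax)

# Collect the edge widths actually applied to the bar rectangles.
linewidths = {patch.get_linewidth() for patch in ax.patches}
print(linewidths)
```

If a thin separator between bars is still wanted, a small non-zero lw together with an explicit edgecolor is the other direction to experiment in.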