Why do stackplot and the pandas area plot draw such different stacked areas? - pandas

The pandas DataFrame df looks like the one below, with months from Jan to Oct.
The first graph was drawn with the command plt.stackplot(df.index,df, colors=pal, alpha=0.4) and the second graph with the command df.plot(kind='area', stacked=True, figsize=(18, 10)).
The second one is what I expected to see, and I believe it is also correct. But why are the two graphs so different when they come from the same dataset? How can I fix the first command so that the first graph is correct?
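A probable cause, assuming df has one row per month and one column per series: stackplot treats each row of a 2D y argument as one series, while DataFrame.plot treats each column as one series. The sketch below uses made-up data (the column names a, b, c and the values are hypothetical) and passes the transposed values so that both calls stack the same series.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the df in the question: rows are months, columns are series.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 10, size=(10, 3)), index=months, columns=["a", "b", "c"])

fig, ax = plt.subplots()
x = np.arange(len(df))  # numeric positions for the months

# stackplot expects y with shape (n_series, n_points), so transpose the frame's values.
polys = ax.stackplot(x, df.to_numpy().T, labels=list(df.columns), alpha=0.4)

ax.set_xticks(x)
ax.set_xticklabels(df.index)
ax.legend(loc="upper left")
```

With the transpose in place, each band corresponds to one column of df, which is also what df.plot(kind='area', stacked=True) draws.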

Related

Annotating numeric values on grouped bars chart in pyplot

Good evening all,
I have a pandas DataFrame called plot_eigen_vecs_df with shape (3, 11), and I am plotting each column value, grouped by row, on a bar chart. I am using the following code:
plot_eigen_vecs_df.plot(kind='bar', figsize=(12, 8),
title='First 3 PCs factor loadings',
xlabel='Evects', legend=True)
The result is this graph:
I would like to keep the graph (grouped) exactly as it is, but I need to show the numeric value above each bar.
Thank you
I tried the add_label method, but I am currently using a version of pyplot that is not the most recent, so .add_label doesn't work for me. Could you please help with this?
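One approach that does not depend on a recent matplotlib is to loop over the bar rectangles and annotate each one. The labelling helper for bar containers is Axes.bar_label, available from matplotlib 3.4, which may be the method meant above. The sketch below assumes made-up loadings (the index and column names are hypothetical) in place of the real plot_eigen_vecs_df.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for plot_eigen_vecs_df: 3 rows (PCs) x 11 columns (factors).
rng = np.random.default_rng(0)
plot_eigen_vecs_df = pd.DataFrame(
    rng.uniform(-1, 1, size=(3, 11)).round(2),
    index=["PC1", "PC2", "PC3"],
    columns=[f"f{i}" for i in range(11)],
)

ax = plot_eigen_vecs_df.plot(kind="bar", figsize=(12, 8),
                             title="First 3 PCs factor loadings", legend=True)
ax.set_xlabel("Evects")

labels = []
for bar in ax.patches:  # one Rectangle per drawn bar
    height = bar.get_height()
    above = height >= 0
    labels.append(ax.annotate(
        f"{height:.2f}",
        xy=(bar.get_x() + bar.get_width() / 2, height),  # top (or bottom) centre of the bar
        xytext=(0, 3 if above else -3),                  # small offset in points
        textcoords="offset points",
        ha="center",
        va="bottom" if above else "top",
        fontsize=7,
    ))
```

Negative bars get their label below the bar end so the text does not overlap the rectangle.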

problem in reordering the graph axis ggplot2, phyloseq

I have created a Shiny app which plots metagenome data using ggplot2, phyloseq and plotly, with dplyr and tidyr. It creates pretty good stacked barplots and heatmaps; the only problem is that it reorders the sample names on the x-axis, e.g. 1-10 are arranged as 1, 10, 2, ..., 5, 6, ... How can I correct that bug?

How to make a Scatter Plot for a Dataset with 4 Attributes and the 5th attribute being the Cluster

I have a dataset which looks like this:
It has four attributes, and the fifth column (which I added myself) is the cluster to which each row belongs.
I want to build something like a scatter plot for this dataset, but I am unable to do so. I have tried searching, and the best I could find was the following question on Stack Overflow:
How to make a 4d plot with matplotlib using arbitrary data
Using this, I was able to make a scatter plot, but only for three attributes, with the fourth being the cluster of each row.
Can anyone help me figure out how to make a scatter plot for a dataset similar to mine?
I would recommend something like seaborn's pairplot:
import seaborn as sns
sns.pairplot(df, hue="cluster")
See the images in the link for what it looks like.
This creates several pairwise scatterplots, instead of making a 3D plot and arbitrarily flattening one of the dimensions.
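If seaborn is not available, a similar view can be approximated with plain matplotlib. This is a different technique from pairplot (a manual grid of the six attribute pairs, coloured by cluster), sketched here with made-up data; the column names a1 to a4 and cluster are assumptions.

```python
import itertools
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical dataset: four attributes plus a cluster label per row.
rng = np.random.default_rng(0)
features = ["a1", "a2", "a3", "a4"]
df = pd.DataFrame(rng.normal(size=(60, 4)), columns=features)
df["cluster"] = rng.integers(0, 3, size=len(df))

# Four attributes give six unordered pairs, which fit a 2 x 3 grid.
pairs = list(itertools.combinations(features, 2))
fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, (xcol, ycol) in zip(axes.flat, pairs):
    ax.scatter(df[xcol], df[ycol], c=df["cluster"], cmap="viridis", s=15)
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
fig.tight_layout()
```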

What does ax=ax do while creating a plot in matplotlib?

I have a DataFrame of heart disease patients, which has over 300 values. What I have done initially is filter the patients aged over 50. Now I am trying to plot that DataFrame, and while searching on Google I found this piece of code that helped me plot it.
But I am not able to understand the concept of ax=ax here:
fig, ax = plt.subplots()
over_50.plot(x="age",
             y="chol",
             c="target",
             kind="scatter",
             ax=ax);  # <--- the argument in question
I want to learn the concept behind this little piece of code here. What is it doing at its core?
In this case (a single-axes plot) you can do without this parameter.
But there are more complex cases, where you create subplots with
several axes objects (a grid).
Then ax (the second result from plt.subplots()) is an array
of axes objects,
and for each plot you should specify the axes in which it
is to be drawn.
See e.g. https://matplotlib.org/3.1.0/gallery/subplots_axes_and_figures/subplots_demo.html
and find the title Stacking subplots in one direction.
It contains this example:
fig, axs = plt.subplots(2)
fig.suptitle('Vertically stacked subplots')
axs[0].plot(x, y)
axs[1].plot(x, -y)
Here:
a figure composed of 2 rows (vertically stacked axes) is created,
one line plot is drawn in the first axes, and another in the second.
An alternative way to specify the axes object in which a particular plot
is to be drawn is the ax parameter, as in your code,
where you can pass one of the axes objects from the current figure.
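To make this concrete, here is a minimal sketch that sends two pandas plots to two different axes through the ax parameter. The over_50 values are made up for illustration, and the colour argument from the question is left out to keep the figure to exactly two axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the filtered over_50 DataFrame.
over_50 = pd.DataFrame({
    "age": [52, 58, 61, 67, 70],
    "chol": [230, 284, 198, 250, 311],
    "target": [1, 0, 1, 0, 1],
})

fig, axs = plt.subplots(2)  # two vertically stacked axes

# ax= tells pandas which existing axes to draw into; the same axes is returned.
returned = over_50.plot(x="age", y="chol", kind="scatter", ax=axs[0])
over_50.plot(x="age", y="target", kind="scatter", ax=axs[1])
```

Without ax=, pandas would pick or create axes on its own, so you would have no control over which panel of the grid each plot ends up in.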

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image, right); however, for some the x axis collapses slightly (see image, left) or strongly (see image, middle).
https://i.imgur.com/dxLR4B4.png
Looking into the code for how seaborn plots a box/violin plot, https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing in code only?
I do not want to modify my dataframe (replacing the NaNs with zero).
Below you will find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a differently sized plot per cancer type
(depending on whether any category is completely NaN).
I am expecting each plot to have the same width.
Update
Trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use and display them, even if they are empty.
For instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])