How can I make this plot awesome (colours by group plus alpha value by second group) - matplotlib

I have the following dataframe:
I plotted it the following way:
Right now the plot looks ugly. Apart from using a different font size, marker edge width, marker face color, etc., I would like to have two colors, one for each protein (hum1 and hum2), and within each group the different pH values should have different intensities. What makes it more difficult is that my groups do not have the same size.
Any ideas ?
P.S. Such a built-in feature would be really cool, e.g. colourby = level_one thenby level_two

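import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 9))
ax = fig.add_subplot(1, 1, 1)
# 4 shades per protein, from mid to full intensity of each colormap
c1 = plt.cm.Greens(np.linspace(0.5, 1, 4))
c2 = plt.cm.Blues(np.linspace(0.5, 1, 4))
colors = np.vstack((c1, c2))
# gr is the grouped DataFrame from the question (not shown here)
gr.unstack(level=(0, 1))['conc_dil'].plot(marker='o', linestyle='-', color=colors, ax=ax)
plt.legend(loc=1, bbox_to_anchor=(0, 0, 1.5, 1), numpoints=1)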
gives:
P.S. This post helped me: stacked bar plot and colours
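The answer above hard-codes four shades per protein. Because the question mentions groups of unequal size, here is a minimal sketch that derives the number of shades from each group instead. It assumes the data has been reshaped so that the columns form a (protein, pH) MultiIndex; the data values, the helper name shades_by_group and the colormap choices are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def shades_by_group(columns, cmap_names, lo=0.4, hi=1.0):
    """Map each (group, subgroup) column key to an RGBA colour.

    Each top-level group gets its own colormap (one name per group, in order);
    within a group, the subgroups get evenly spaced shades from lo to hi, so
    groups of different sizes are handled automatically.
    """
    colors = {}
    # unique top-level groups, in their original order
    groups = list(dict.fromkeys(key[0] for key in columns))
    for group, cmap_name in zip(groups, cmap_names):
        members = [key for key in columns if key[0] == group]
        cmap = plt.get_cmap(cmap_name)
        for key, level in zip(members, np.linspace(lo, hi, len(members))):
            colors[key] = cmap(level)
    return colors


# Hypothetical data: hum1 measured at 3 pH values, hum2 at 4 (unequal group sizes)
cols = pd.MultiIndex.from_tuples(
    [("hum1", 5.0), ("hum1", 6.0), ("hum1", 7.0),
     ("hum2", 4.0), ("hum2", 5.0), ("hum2", 6.0), ("hum2", 7.0)],
    names=["protein", "pH"],
)
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.random((5, len(cols))).cumsum(axis=0), columns=cols)

colors = shades_by_group(wide.columns, ["Greens", "Blues"])
fig, ax = plt.subplots(figsize=(6, 4))
# pandas applies a list of colours to the columns in order
wide.plot(marker="o", linestyle="-", color=[colors[c] for c in wide.columns], ax=ax)
ax.legend(loc="upper left", bbox_to_anchor=(1.0, 1.0), numpoints=1)
```

Darker shades go to later pH values within each protein; when saving, bbox_inches="tight" keeps the outside legend in the file.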

Related

for loop of scatterplots, setting the colorbar of a scatterplot horizontally under each plot

I am really new to plots and matplotlib. I am trying to plot several scatterplots in a loop, using data from a pandas DataFrame. In each scatterplot, the x axis shows the data from one column and the y axis shows the data from the last column. I can make the plots work, but the color bar is displayed on the right side of each plot. I would like the color bar to be horizontal, under each plot, but unfortunately I lack the necessary knowledge.
My code looks like this:
num_cols = df.columns.to_list()[:-1]
for col in num_cols:
    df.plot.scatter(x=col, y=df.columns[-1], c=col, cmap='Paired', title=col, figsize=(5, 5))
The current result looks like this (sample of one plot from the loop): [current result] https://i.stack.imgur.com/m26wn.jpg
I tried a few lines of code, but with no success.
I would like to have something like this (excuse my Windows Paint skills): expected result
You can use cbar_location to move it to the bottom.
num_cols = df.columns.to_list()[:-1]
for col in num_cols:
    df.plot.scatter(x=col, y=df.columns[-1], c=col, cmap='Paired', title=col, figsize=(5, 5),
                    cbar_location="bottom", cbar_mode="edge", cbar_pad=0.25, cbar_size="15%")
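As far as I know, cbar_location, cbar_mode, cbar_pad and cbar_size are parameters of mpl_toolkits.axes_grid1.ImageGrid rather than of DataFrame.plot.scatter, which forwards unrecognised keywords to matplotlib's scatter, so the call above may raise an error instead of moving the colorbar. A minimal sketch of an alternative, assuming it is acceptable to draw the colorbar yourself: switch off pandas' own colorbar with colorbar=False and add a horizontal one with fig.colorbar. The DataFrame below is hypothetical stand-in data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: three feature columns plus the target in the last column
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 4)), columns=["a", "b", "c", "target"])

figures = []
for col in df.columns[:-1]:
    fig, ax = plt.subplots(figsize=(5, 5))
    # draw the scatter without pandas' default (vertical) colorbar
    df.plot.scatter(x=col, y=df.columns[-1], c=col, cmap="Paired",
                    title=col, ax=ax, colorbar=False)
    # ax.collections[0] is the PathCollection created by the scatter call;
    # a horizontal colorbar is placed below the Axes by default
    fig.colorbar(ax.collections[0], ax=ax, orientation="horizontal", pad=0.15, label=col)
    figures.append(fig)
```

The pad value controls the gap between the x axis and the colorbar; increase it if the x-axis label overlaps the bar.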

Create a bar chart with bars colored according to a category and line on the same chart

I trained a model to predict a value, and I want to make a bar chart that plots target - prediction for each sample and then color these bars according to a category. I then want to add two horizontal lines at plus and minus sigma around the central axis, so it's clear which predictions are very far off. Imagine we know sigma == 0.3 and we have a dataframe:
error   sample_id   category
 .1     1           'A'
 .4     2           'A'
 .1     3           'B'
-.2     4           'B'
-.1     5           'C'
How could I do this? I've managed to plot just the errors and the plus or minus sigma lines using matplotlib alone; here it is, to communicate what I mean.
You'll find the pd.Series.transform() and/or pd.DataFrame.apply() methods quite useful. Essentially, you can map each value of your input columns (in this case errors) into some valid color value, returning a pd.Series of colors that's the same shape as errors.
The phrasing of the question is unclear, but it sounds like you want a single pair of lines for each category? In that case, you will first need a pd.Series.groupby() operation to get the shape you want before the transform operation: probably just a Series of length 3, for your A, B and C categories.
Then, this Series (whether it is of length len(df) or df.category.nunique()) can be passed into your plt.bar method as the color argument.
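A minimal sketch of that approach with the example DataFrame from the question. It uses Series.map (rather than transform) to look up one color per row, passes the resulting colors to bar, and draws the plus/minus sigma lines with axhline. The tab10 colormap, the Patch-based legend and the axis labels are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

sigma = 0.3
df = pd.DataFrame({
    "error": [0.1, 0.4, 0.1, -0.2, -0.1],
    "sample_id": [1, 2, 3, 4, 5],
    "category": ["A", "A", "B", "B", "C"],
})

# one color per category, taken from a qualitative colormap
cmap = plt.get_cmap("tab10")
palette = {cat: cmap(i) for i, cat in enumerate(sorted(df["category"].unique()))}
# per-row lookup: a Series of colors with the same length as df
bar_colors = df["category"].map(palette)

fig, ax = plt.subplots()
ax.bar(df["sample_id"], df["error"], color=list(bar_colors))
# reference lines at +sigma, -sigma and zero
ax.axhline(sigma, color="grey", linestyle="--")
ax.axhline(-sigma, color="grey", linestyle="--")
ax.axhline(0, color="black", linewidth=0.8)
# legend entries built by hand, one patch per category
ax.legend(handles=[Patch(color=c, label=cat) for cat, c in palette.items()], title="category")
ax.set_xlabel("sample_id")
ax.set_ylabel("target - prediction")
```

Any bar that crosses a dashed line (here sample 2) is more than one sigma off.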
This is actually very easy; I just didn't understand the color option of plt.bar. If it is a list whose length equals the number of bars, each bar is colored with the corresponding color. It's as simple as
plt.bar(x, y, color=z)
# len(x) == len(y) == len(z), and z is a list of colors
As krukah mentions, you just need to translate categories to colors. I picked a color map, made a dictionary that picked a color for each unique category, and then turned the cats array (a 2d np array, each row encodes a category) into an array of colors.
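import numpy as np
import matplotlib.pyplot as plt

unique_cats = np.unique(cats, axis=0)
n_unique = unique_cats.shape[0]
# evenly spaced positions in [0, 1), one per category (avoids np.arange float-step surprises)
for_picking = np.linspace(0, 1, n_unique, endpoint=False)
# plt.get_cmap instead of plt.cm.get_cmap, which newer matplotlib versions removed
cmap = plt.get_cmap('plasma')
color_dict = {}
# this for loop fills in the dictionary by picking colors from the cmap
for i in range(n_unique):
    color_dict[str(unique_cats[i])] = cmap(for_picking[i])
# look up the color of every sample's category
color_cats = [color_dict[str(cat)] for cat in cats]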
Hopefully that helps someone some day.

Stacked hue histogram

Sorry, I don't have the reputation to add inline images.
This is the code I found:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

bins = np.linspace(df.Principal.min(), df.Principal.max(), 10)
g = sns.FacetGrid(df, col="Gender", hue="loan_status", palette="Set1", col_wrap=2)
g.map(plt.hist, 'Principal', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
Output:
I want to do something similar with some data I have:
bins = np.linspace(df.overall.min(), df.overall.max(), 10)
g = sns.FacetGrid(df, col="player_positions", hue="preferred_foot", palette="Set1", col_wrap=4)
g.map(plt.hist, 'overall', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
The hue "preferred_foot" is just left and right.
My output:
I am not sure why I can't see the Left values on the plot:
df['preferred_foot'].value_counts()
Right 13960
Left 4318
I am fairly sure those are not stacked histograms, but just two histograms one behind the other. I believe your "left" red bars are simply hidden behind the "right" blue bars.
You could try adding alpha=0.5 to the plotting call (e.g. g.map(plt.hist, 'overall', bins=bins, ec="k", alpha=0.5)) or changing the order of the hues (add hue_order=['Right','Left'] to the call to FacetGrid).
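If you actually want the bars stacked rather than overlapping, matplotlib's hist can stack several datasets when it is given a list of arrays and stacked=True. A minimal sketch for a single panel, using randomly generated stand-in data instead of the real overall/preferred_foot columns (values and proportions are made up); for the faceted version you could loop over player_positions and draw one such Axes per position. With seaborn 0.11 or newer, histplot and displot also accept multiple="stack".

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data: an 'overall' rating and a Left/Right foot label
rng = np.random.default_rng(0)
overall = rng.normal(65, 7, size=2000).round()
foot = np.where(rng.random(2000) < 0.24, "Left", "Right")

bins = np.linspace(overall.min(), overall.max(), 10)
fig, ax = plt.subplots()
# a list of arrays plus stacked=True piles the second dataset on top of the first
counts, edges, patches = ax.hist(
    [overall[foot == "Right"], overall[foot == "Left"]],
    bins=bins, stacked=True, label=["Right", "Left"], ec="k",
)
ax.legend()
```

With stacked=True the returned counts are cumulative, so the last entry holds the bar totals.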

Grouping the factors in ggplot

I am trying to create a graph based on a matrix similar to the one below. I want to group the Erosion values by "Slope".
library(ggplot2)
new_mat<-matrix(,nrow = 135, ncol = 7)
colnames(new_mat)<-c("Scenario","Runoff (mm)","Erosion (t/ac)","Slope","Soil","Tillage","Rotation")
for ( i in 1:nrow(new_mat)){
  new_mat[i,2]<-sample(10:50, 1)
  new_mat[i,3]<-sample(0.1:20, 1)
  new_mat[i,4]<-sample(c("S2","S3","S4","S5","S1"),1)
  new_mat[i,5]<-sample(c("Deep","Moderate","Shallow"),1)
  new_mat[i,7]<-sample(c("WBP","WBF","WF"),1)
  new_mat[i,6]<-sample(c("Intense","Reduced","Notill"),1)
  new_mat[i,1]<-paste0(new_mat[i,4],"_",new_mat[i,5],"_",new_mat[i,6],"_",new_mat[i,7],"_")
}
#### Graph part ########
grphs_mat<-as.data.frame(new_mat)
grphs_mat$`Runoff (mm)`<-as.numeric(as.character(grphs_mat$`Runoff (mm)`))
grphs_mat$`Erosion (t/ac)`<-as.numeric(as.character(grphs_mat$`Erosion (t/ac)`))
ggplot(grphs_mat, aes(Scenario, `Erosion (t/ac)`,group=Slope, colour = Slope))+
  scale_y_continuous(limits=c(0,max(as.numeric((grphs_mat$`Erosion (t/ac)`)))))+
  geom_point()+geom_line()
But when I run this code, the values are spread along the x-axis over all 135 scenarios. What I want is for the grouping to be done by slope, while also recognising the other shared factors (Soil + Rotation + Tillage) and placing those on the x-axis. For example:
For these five scenarios:
S1_Deep_Intense_WBF_
S2_Deep_Intense_WBF_
S3_Deep_Intense_WBF_
S4_Deep_Intense_WBF_
S5_Deep_Intense_WBF_
It should separate S1, S2, S3, S4 and S5, but also recognise that the other factors are the same and put them at one x-axis position, so that the slope lines are stacked on top of each other at 135/5 = 27 x-axis points. The final figure should look like this (see image). Apologies for not being able to explain it better.
I think I am making a mistake in grouping or in assigning the x-axis values.
I will appreciate your suggestions.
In the example you give, I didn't get every possible factor combination represented so the plots looked a bit weird. What I did instead was start with the following:
set.seed(42)
new_mat <- matrix(,nrow = 1000, ncol = 7)
And then deduplicated this by summarising the values. A possibly relevant step for your analysis is that I made a new variable with the interaction() function, which is the combination of three other factors.
library(tidyverse)
df <- grphs_mat
df$x <- with(df, interaction(Rotation, Soil, Tillage))
# The simulation did not yield unique combinations
df <- df %>% group_by(x, Slope) %>%
  summarise(n = sum(`Erosion (t/ac)`))
Next, I plotted this new x variable on the x-axis and used "stack" positions for the lines and points.
g <- ggplot(df, aes(x, y = n, colour = Slope, group = Slope)) +
  geom_line(position = "stack") +
  geom_point(position = "stack")
To make the x-axis slightly more readable, you can replace the dots that interaction() inserts with newlines.
g + scale_x_discrete(labels = function(x){gsub("\\.", "\n", x)})
Another option is to simply rotate the x axis labels:
g + theme(axis.text.x.bottom = element_text(angle = 90))
There are a few additional options for the x-axis if you go into ggplot2 extension packages.

Annotation box does not appear in matplotlib

The annotation box I planned does not appear on my plot, even though I've tried a wide range of values for its coordinates.
What's wrong with that?!
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def f(s,t):
    a = 0.7
    b = 0.8
    Iext= 0.5
    tau = 12.5
    v = s[0]
    w = s[1]
    dndt = v - np.power(v,3)/3 - w + Iext
    dwdt = (v + a - b * w)/tau
    return [dndt, dwdt]
t = np.linspace(0,200)
s0=[1,1]
s = odeint(f,s0,t)
plt.plot(t,s[:,0],'b-', linewidth=1.0)
plt.xlabel(r"$t(sec.)$")
plt.ylabel(r"$V (volt)$")
plt.legend([r"$V$"])
annotation_string = r"$I_{ext}=0.5$"
plt.text(15, 60, annotation_string, bbox=dict(facecolor='red', alpha=0.5))
plt.show()
The coordinates to plt.text are data coordinates by default. This means in order to be present in the plot they should not exceed the data limits of your plot (here, ~0..200 in x direction, ~-2..2 in y direction).
Something like plt.text(10, 1.8, annotation_string) should work.
The problem with that is that once the data limits change (because you plot something different or add another plot) the text item will be at a different position inside the canvas.
If this is undesired, you can specify the text position in axes coordinates (ranging from 0 to 1 in both directions). To always place the text in the top-left corner of the axes, independent of what you plot there, you can use e.g.
plt.text(0.03,0.97, annotation_string, bbox=dict(facecolor='red', alpha=0.5),
transform=plt.gca().transAxes, va = "top", ha="left")
Here the transform keyword tells the text to use axes coordinates, and va="top", ha="left" means that the top-left corner of the text is the anchor point.
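A minimal sketch contrasting the two coordinate systems with the object-oriented API. A sine curve stands in for the ODE solution from the question, and the y-limits are changed afterwards only to show that a label placed with transform=ax.transAxes stays in the top-left corner.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 200, 50)
fig, ax = plt.subplots()
ax.plot(t, np.sin(t / 10), "b-", linewidth=1.0, label=r"$V$")

# Data coordinates: the automatic y-range is roughly -1..1 here,
# so text placed at y = 60 would be drawn far outside the Axes.
ylim_before = ax.get_ylim()

# Axes coordinates: (0, 0) is the bottom-left and (1, 1) the top-right corner of the Axes.
label = ax.text(0.03, 0.97, r"$I_{ext}=0.5$", transform=ax.transAxes,
                va="top", ha="left", bbox=dict(facecolor="red", alpha=0.5))
ax.legend(loc="upper right")

# Changing the data limits afterwards does not move the axes-coordinate label.
ax.set_ylim(-5, 5)
fig.canvas.draw()
text_box = label.get_window_extent()
axes_box = ax.get_window_extent()
```

Placing the legend explicitly in the opposite corner keeps it from overlapping the label.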
The annotation is appearing far above your plot because you have given a 'y' coordinate of 60, whereas your plot ends at '2' (upwards).
Change the second argument here:
plt.text(15, 60, annotation_string, bbox=dict(facecolor='red', alpha=0.5))
It needs to be <= 2 to show up on the plot itself. You may also want to change the x coordinate (from 15 to something smaller), so that the box doesn't obscure your lines.
e.g.
plt.text(5, 1.5, annotation_string, bbox=dict(facecolor='red', alpha=0.5))
Don't be alarmed by my (5, 1.5) suggestion; I would then add the following line to the top of your script (beneath your imports):
plt.rcParams['legend.loc'] = 'best'
This will choose a 'best fit' for your legend; in this case, top left (just above your annotation). Both look quite neat then, your choice though :)