Generate legend with matplotlib based on the values in column [duplicate] - matplotlib

Assume we have a dataframe with the scores of 4 individuals in 2 different tests, where the 3rd column tells us whether they passed or failed overall:
df:
[10,20,failed
10,40,passed
20,40,passed
30,10,failed]
I would like to generate a scatter plot with the scores of the 1st test on the x axis, the scores of the 2nd test on the y axis, and indicate with color (or marker) whether they passed or failed. I have achieved this with:
plt.scatter(x=df[column1], y=df[column2], c=df[column3])
The question is: how can I add a legend that maps each color (or marker) to the value in column3?
[red: failed
blue: passed]

Here's my suggestion: plot the failed and passed points separately to get their handles, which can then be used for the legend.
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(111)
# one scatter call per outcome, so each call returns its own handle
passed = ax1.scatter(x=df.loc[df[column3].eq('passed'), column1], y=df.loc[df[column3].eq('passed'), column2], c='green')
failed = ax1.scatter(x=df.loc[df[column3].eq('failed'), column1], y=df.loc[df[column3].eq('failed'), column2], c='red')
ax1.legend(handles=[passed, failed], labels=['Passed', 'Failed'])
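When the column holds more than two categories, the same idea can be written as a loop over groupby, letting each scatter call carry its own label. This is only a minimal sketch, not the answerer's code: the column names test1, test2 and result and the color mapping are assumptions standing in for column1, column2 and column3.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical column names; the values mirror the example in the question.
df = pd.DataFrame({
    "test1": [10, 10, 20, 30],
    "test2": [20, 40, 40, 10],
    "result": ["failed", "passed", "passed", "failed"],
})
colors = {"failed": "red", "passed": "blue"}  # assumed color mapping

fig, ax = plt.subplots()
# One scatter call per category; the label becomes the legend entry.
for outcome, group in df.groupby("result"):
    ax.scatter(group["test1"], group["test2"], c=colors[outcome], label=outcome)

legend = ax.legend(title="result")
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Because the labels are attached to the artists, ax.legend() needs no explicit handles, and adding a new category only requires a new entry in the colors dictionary.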

Related

How to pick a new color for each plotted line in matplotlib? [duplicate]

I have a small problem with the colors when I create my lines in a for loop.
I would like each line to have a different color.
Can you help me, please?
data = {'Note':['3','3','3','4','4','4','5','5','5','1','2','2','1'],
'Score':['10','10.1','17','17.5','16.5','14.3','10','10.1','17','17.5','16.5','16.5','16.5']}
# Create DataFrame
df = pd.DataFrame(data)
# Create the plot with a for loop
groups = df.groupby("Note")["Score"].min()
for color in ['r', 'b', 'g', 'k', 'm']:
    for i in groups.values:
        plt.axhline(y=i, color=color)
plt.show()
You are not seeing the different colors because the two nested for loops draw every line once per color, so each line ends up covered by the last color in the list. In addition, several groups share the same minimum (like 16.5 and 10), so their lines are drawn one on top of the other. You can see the different colors by changing your code like this, with a single loop that picks the next color for each line. Note that for a repeated y value only the last color drawn will be visible.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Note':['3','3','3','4','4','4','5','5','5','1','2','2','1'],
'Score':['10','10.1','17','17.5','16.5','14.3','10','10.1','17','17.5','16.5','16.5','16.5']}
df = pd.DataFrame(data)
groups = df.groupby("Note")["Score"].min()
print(groups.values)
color = ['r', 'b', 'g', 'k', 'm']
j = 0
for i in groups.values:
    # cycle through the color list, one color per line
    plt.axhline(y=i, color=color[j % 5])
    j = j + 1
plt.show()
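A variant of the same loop, sketched under two assumptions that are not in the original post: the scores are meant to be numeric (so they are converted to float), and repeated minima can be dropped so that no line hides another. itertools.cycle replaces the manual counter.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {'Note': ['3', '3', '3', '4', '4', '4', '5', '5', '5', '1', '2', '2', '1'],
        'Score': ['10', '10.1', '17', '17.5', '16.5', '14.3', '10', '10.1', '17',
                  '17.5', '16.5', '16.5', '16.5']}
df = pd.DataFrame(data)
df["Score"] = df["Score"].astype(float)  # assumption: scores are meant to be numbers

# Minimum score per Note, keeping only the first occurrence of each repeated value.
minima = df.groupby("Note")["Score"].min().drop_duplicates()

fig, ax = plt.subplots()
colors = itertools.cycle(['r', 'b', 'g', 'k', 'm'])  # repeats if there are more lines than colors
for y, color in zip(minima.values, colors):
    ax.axhline(y=y, color=color, label=f"min = {y}")
ax.legend()
```

Since zip stops at the shorter input, the infinite color cycle is consumed only as far as there are lines to draw.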

What are the parameters of pyplot.subplot? [duplicate]

Can't understand pyplot.subplot(330+1+i)
# plot first few images
for i in range(9):
    # define subplot
    pyplot.subplot(330 + 1 + i)
    # plot raw pixel data
    pyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))
# show the figure
pyplot.show()
You can view the arguments in the documentation for subplot. The basic arguments are subplot(nrows, ncols, index) where nrows is the number of rows of plots, ncols is the number of columns of plots, and index is the plot number when counting across the grid of plots from left to right, top to bottom.
Another way to specify subplots is with a three-digit integer, as in your example. The documentation describes this form as follows:
A 3-digit integer. The digits are interpreted as if given separately as three single-digit integers, i.e. fig.add_subplot(235) is the same as fig.add_subplot(2, 3, 5). Note that this can only be used if there are no more than 9 subplots.
In your example, where i ranges from 0 to 8 inclusive, the argument to subplot ranges from 331 to 339. According to the documentation, your subplots are therefore laid out on a grid of 3 rows and 3 columns, with indices 1 to 9.
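The equivalence of the two forms can be checked directly. The sketch below is illustrative only: it builds the same grid twice, once with the three-digit form and once with explicit arguments, and records the grid geometry that matplotlib reports for each axes. The names fig_a, fig_b and geometry_pairs are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig_a = plt.figure()
fig_b = plt.figure()
geometry_pairs = []
for i in range(9):
    ax_a = fig_a.add_subplot(330 + 1 + i)  # three-digit form: 331 ... 339
    ax_b = fig_b.add_subplot(3, 3, i + 1)  # explicit (nrows, ncols, index) form
    # get_geometry() returns (rows, cols, start, stop) with 0-based cell numbers
    geometry_pairs.append((ax_a.get_subplotspec().get_geometry(),
                           ax_b.get_subplotspec().get_geometry()))

same_layout = all(a == b for a, b in geometry_pairs)
```

For grids with more than 9 cells the three-digit form no longer works, so the explicit form (or plt.subplots) is the safer habit.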

matplotlib subplots do not show the exact x tick labels passed to it as list

I am plotting a curve of accuracy versus var_smoothing for 4 different instances. My values are:
var_smoothing_values
>>
[1e-09, 1e-06, 0.001, 1]
gauss_accuracies
>>
[0.728, 0.8, 0.826, 0.832]
I am using 2 subplots, and on the second one I plot the values like this:
f,ax = plt.subplots(1,2,figsize=(15,5))
ax[1].plot(var_smoothing_values,gauss_accuracies,marker='*',markersize=12)
ax[1].set_ylabel('Accuracy')
ax[1].set_xlabel('var_smoothing values')
ax[1].set_title('Accuracy vs var_smoothing | GaussianNB',size='large')
plt.show()
Adding ax[1].set_xticks(var_smoothing_values) shows only 3 ticks.
How can I show exactly 4 ticks, one for each of my var_smoothing_values?
You need to use a log scale on the x-axis, since your x-values span several orders of magnitude. On a linear axis the three smallest values are all indistinguishable from 0, so their ticks overlap.
ax[1].set_xscale('log')
ax[1].set_xticks(var_smoothing_values)
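Put together with the data from the question, a complete version might look like the sketch below. The set_xticklabels call and the "g" number format are additions of mine, included only so that each tick shows the exact value that was passed in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

var_smoothing_values = [1e-09, 1e-06, 0.001, 1]
gauss_accuracies = [0.728, 0.8, 0.826, 0.832]

fig, ax = plt.subplots(1, 2, figsize=(15, 5))
ax[1].plot(var_smoothing_values, gauss_accuracies, marker='*', markersize=12)
ax[1].set_xscale('log')                 # spread the values evenly by order of magnitude
ax[1].set_xticks(var_smoothing_values)  # one major tick per data point
# Optional: label each tick with the value itself instead of the default 10^k notation.
ax[1].set_xticklabels([f"{v:g}" for v in var_smoothing_values])
ax[1].set_ylabel('Accuracy')
ax[1].set_xlabel('var_smoothing values')
ax[1].set_title('Accuracy vs var_smoothing | GaussianNB', size='large')

tick_positions = list(ax[1].get_xticks())
```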

Stacked bars with hue in seaborn and pandas [duplicate]

I have a dataframe that looks like this:
df = pd.DataFrame(columns=["type", "App","Feature1", "Feature2","Feature3",
"Feature4","Feature5",
"Feature6","Feature7","Feature8"],
data=[["type1", "SHA",0,0,1,5,1,0,1,0],
["type2", "LHA",1,0,1,1,0,1,1,0],
["type2", "FRA",1,0,2,1,1,0,1,1],
["type1", "BRU",0,0,1,0,3,0,0,0],
["type2", "PAR",0,1,1,4,1,0,1,0],
["type2", "AER",0,0,1,1,0,1,1,0],
["type1", "SHE",0,0,0,1,0,0,1,0]])
I want to make a stacked bar plot with type as a hue. That is, on the x axis I want the features, and for each feature I want 2 stacked bars, one for type1 and one for type2.
I have seen how to make a stacked bar plot with seaborn when the column type is dropped, and I can already plot the stacked bars for type1 alone. Instead, I want two stacked bars for each feature. Note: the values of App are shared between type1 and type2.
I don't think seaborn has a function for bar plots that are both stacked and grouped, but you can build one by hand in matplotlib itself.
I think what you are looking for is the melt function:
d = df.drop(columns='App')
d = d.melt('type', var_name='a', value_name='b')
sns.barplot(x='a', y='b', data=d, hue='type')
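Note that, as far as I can tell, melt followed by sns.barplot draws grouped (side-by-side) bars showing the mean per feature and type, not stacked ones. A by-hand version in plain matplotlib could look like the following sketch. It is one possible reading of the request, not code from either answer: each type gets its own bar per feature, and the segments of that bar are the rows (one per App) of that type. The bar width, the offsets and the totals dictionary are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=["type", "App", "Feature1", "Feature2", "Feature3", "Feature4",
             "Feature5", "Feature6", "Feature7", "Feature8"],
    data=[["type1", "SHA", 0, 0, 1, 5, 1, 0, 1, 0],
          ["type2", "LHA", 1, 0, 1, 1, 0, 1, 1, 0],
          ["type2", "FRA", 1, 0, 2, 1, 1, 0, 1, 1],
          ["type1", "BRU", 0, 0, 1, 0, 3, 0, 0, 0],
          ["type2", "PAR", 0, 1, 1, 4, 1, 0, 1, 0],
          ["type2", "AER", 0, 0, 1, 1, 0, 1, 1, 0],
          ["type1", "SHE", 0, 0, 0, 1, 0, 0, 1, 0]])

features = [c for c in df.columns if c.startswith("Feature")]
types = sorted(df["type"].unique())
x = np.arange(len(features))
width = 0.8 / len(types)  # assumed layout: the bars of one feature share 80% of a unit

fig, ax = plt.subplots(figsize=(10, 4))
totals = {}
for k, t in enumerate(types):
    bottom = np.zeros(len(features))
    # Stack one segment per App on top of the previous ones for this type.
    for _, row in df[df["type"] == t].iterrows():
        heights = row[features].to_numpy(dtype=float)
        ax.bar(x + k * width, heights, width, bottom=bottom,
               label=f"{t} / {row['App']}")
        bottom = bottom + heights
    totals[t] = bottom  # final stacked height per feature

# Center the tick under each pair of bars.
ax.set_xticks(x + width * (len(types) - 1) / 2)
ax.set_xticklabels(features)
ax.legend(ncol=2, fontsize="small")
```

The left bar of each pair is type1 and the right bar is type2; the legend identifies the App behind each segment.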

bars not proportional to value - matplotlib bar chart [duplicate]

I am new to matplotlib and am trying to plot a bar chart using pyplot. Instead of getting a plot where the height of bar represents the value, I am getting bars that are linearly increasing in height while their values are displayed on the y-axis as labels.
payment_modes = ['Q', 'NO', 'A', 'C', 'P', 'E', 'D']
l = []
for i in payment_modes:
    l.append(str(len(df[df['PMODE_FEB18']==i])))
# here l = ['33906', '37997', '815', '4350', '893', '98', '6']
plt.figure()
plt.bar(range(7), l)
This is what I am getting:
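The problem is that you are passing strings to bar, not numerical quantities. Matplotlib treats strings as categories and places them at consecutive positions on the axis, which is why the bars grow linearly and the values appear as y-axis labels. If you instead use the actual numerical quantities, bar will behave as you would expect: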
import matplotlib.pyplot as plt
l = [33906, 37997, 815, 4350, 893, 98, 6]
plt.figure()
plt.bar(range(7),l)
plt.show()
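If the counts already exist as strings, they can be converted before plotting. A minimal sketch, using the list from the question; passing payment_modes as the x positions is my own choice, so that the bars are labelled with the payment mode rather than with 0 to 6.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

payment_modes = ['Q', 'NO', 'A', 'C', 'P', 'E', 'D']
l = ['33906', '37997', '815', '4350', '893', '98', '6']  # counts as strings, as in the question

counts = [int(s) for s in l]  # convert to numbers so bar heights are proportional
fig, ax = plt.subplots()
bars = ax.bar(payment_modes, counts)
bar_heights = [bar.get_height() for bar in bars]
```

In the original loop, the simpler fix is to drop the str() call and append len(...) directly.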