How to show the titles of subplots on box plots done using Groupby? - pandas

I just created box plots using a groupby; however, I'm having trouble including the title of each box plot. To clarify, I don't want to change the subplot titles manually. I would like them to be displayed automatically, because right now I get all of the plots but have no idea which one belongs to which group.
Here's an example:
Here's the code I'm using:
gt_venta_precio_zona = gt_venta[['Precio USD','Zona']]
gt_venta_precio_zona.groupby('Zona').plot.box(fontsize=20,rot=90,figsize=(12,8),return_type='axes',patch_artist=True)
Any help will be highly appreciated.
Thank you in advance!

You can save the grouped data and then iterate over the groups, passing each group name (a key in grp.groups.keys()) as the title argument of the plot function:
import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so use numpy directly
df = pd.DataFrame({'col1': np.repeat(list('ABCD'), 50), 'col2': np.random.random(200)})
grp = df.groupby('col1')
for key in grp.groups.keys():
    # one figure per group, titled with the group name
    grp.get_group(key).plot.box(title=key)
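If you would rather have all groups side by side in one figure, here is a minimal sketch that creates one matplotlib subplot per group and sets each subplot's title to the group key. The toy columns col1 and col2 are assumptions standing in for Zona and Precio USD.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; call plt.show() instead when working interactively
import matplotlib.pyplot as plt

# toy data standing in for the real columns
rng = np.random.default_rng(0)
df = pd.DataFrame({'col1': np.repeat(list('ABCD'), 50), 'col2': rng.random(200)})

grouped = df.groupby('col1')
fig, axes = plt.subplots(1, grouped.ngroups, figsize=(12, 4), sharey=True)
for ax, (key, group) in zip(axes, grouped):
    # draw this group's box on its own axes, then label the axes with the group key
    group['col2'].plot.box(ax=ax)
    ax.set_title(str(key))
fig.tight_layout()

titles = [ax.get_title() for ax in axes]
```

Because the title is set from the key yielded by the groupby iteration, every subplot is labelled with its group without naming any group by hand.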

Related

How to create a (bar) graph using plotly express from a dictionary?

I have a dictionary that maps dates (keys) to a value for each date. I'm trying to show it as a simple bar graph using plotly express in Python. I've tried putting it in a pandas DataFrame or Series and also using the plain dict, but each time I either get an error or not the plot I want when I pass it to a plotly express bar graph like this:
fig = px.bar(daily_charge_dict, x=daily_charge_dict.keys(),y=daily_charge_dict.values(), barmode="group")
Any suggestions on how to complete this? Thanks!
To plot a bar chart from a dictionary, x and y should each be a list with one entry per bar. In your case, you want the x-axis to be a list of dates and the y-axis to be the value for each date, so the dictionary should look like:
the_dict = {'dates': ['2020-01-01', '2020-01-02'], 'y_vals': [100,200]}
So rather than having one key per date, use a two-key dictionary: one key holds the list of dates and the other holds the list of corresponding values.
Then plot them with plotly express:
import plotly.express as px
fig = px.bar(the_dict, x='dates', y='y_vals')
fig.show()

Seaborn time series plotting: a different problem for each function

I'm trying to use seaborn's DataFrame functionality (e.g. passing column names to the x, y and hue plot parameters) for my time-series plots (dates in pandas datetime format).
x should come from a time-series column (converted from a pd.Series of strings with pd.to_datetime)
y should come from a float column
hue comes from a categorical column that I calculated.
There are multiple streams in the same series that I am trying to separate (using hue to distinguish them visually), so they should not be connected by lines (as in a scatter plot).
I have tried the following plot types, each with a different problem:
sns.scatterplot: gets the plotting and the labels right but has problems with the x-limits, and I could not set them correctly with plt.xlim() using data.Dates.min() and data.Dates.max()
sns.lineplot: gets the limits and the labels right, but I could not find a setting to disable the lines between the individual data points, as you can in matplotlib. I tried setting the markers and dashes parameters, to no avail.
sns.stripplot: my last try; it plotted the data points correctly and got the x-limits right but messed up the tick labels
Example input data for easy reproduction:
import numpy as np
import pandas as pd

dates = pd.to_datetime(('2017-11-15',
'2017-11-29',
'2017-12-15',
'2017-12-28',
'2018-01-15',
'2018-01-30',
'2018-02-15',
'2018-02-27',
'2018-03-15',
'2018-03-27',
'2018-04-13',
'2018-04-27',
'2018-05-15',
'2018-05-28',
'2018-06-15',
'2018-06-28',
'2018-07-13',
'2018-07-27'))
values = np.random.randn(len(dates))
clusters = np.random.randint(3, size=len(dates))  # three clusters; randint(1, ...) only ever yields 0
D = {'Dates': dates, 'Values': values, 'Clusters': clusters}
data = pd.DataFrame(D)
To each of the functions I am passing the same arguments:
sns.OneOfThePlottingFunctions(x='Dates',
y='Values',
hue='Clusters',
data=data)
plt.show()
So, to recap: what I want is a plot that uses seaborn's pandas functionality and plots points (not lines) with correct x-limits and readable x labels :)
Any help would be greatly appreciated.
ax = sns.scatterplot(x='Dates', y='Values', hue='Clusters', data=data)
ax.set_xlim(data['Dates'].min(), data['Dates'].max())
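The answer above fixes the limits. To make the date labels readable as well, one option (a different technique from that answer) is to pair it with matplotlib's AutoDateLocator and ConciseDateFormatter. This is a sketch under assumptions: the toy data, the seven-day padding around the limits, and the fixed random seed are all illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; call plt.show() when working interactively
import matplotlib.dates as mdates
import seaborn as sns

# toy data in the same shape as the question's example
dates = pd.to_datetime(['2017-11-15', '2017-12-15', '2018-01-15',
                        '2018-02-15', '2018-03-15', '2018-04-13'])
rng = np.random.default_rng(0)
data = pd.DataFrame({'Dates': dates,
                     'Values': rng.standard_normal(len(dates)),
                     'Clusters': rng.integers(0, 3, len(dates))})

# points only, coloured by cluster
ax = sns.scatterplot(x='Dates', y='Values', hue='Clusters', data=data)

# pad the limits slightly so the first and last points are not cut off at the edges
pad = pd.Timedelta(days=7)
ax.set_xlim(data['Dates'].min() - pad, data['Dates'].max() + pad)

# let matplotlib choose date ticks and label them compactly
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
```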

How remove a specific legend label in the Dataframe.plot?

I am trying to plot two DataFrames together, one in 'bar' style and one in 'line' style, but I have trouble showing the legend only for the bars, excluding the line.
Here is my code:
import numpy as np
import pandas as pd
np.random.seed(5)
df = pd.DataFrame({'2012':np.random.random_sample((4,)),'2014':np.random.random_sample((4,))})
df.index = ['A','B','C','D']
sumdf = df.T.apply(np.sum,axis=1)
ax = df.T.plot.bar(stacked=True)
sumdf.plot(ax=ax)
ax.set_xlim([-0.5,1.5])
ax.set_ylim([0,3])
ax.legend(loc='upper center',ncol=3,framealpha=0,labelspacing=0,handlelength=4,borderaxespad=0)
Annoyingly, I got this figure, in which the line is also shown in the legend box. I want to remove that entry rather than make it invisible, but I cannot find a way.
Thank you!
If a matplotlib artist's label starts with an underscore, it is not shown in the legend by default.
You can simply change
sumdf.plot(ax=ax)
to
sumdf.plot(ax=ax, label='_')
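An alternative sketch, not part of the original answer: the stacked bars drawn by df.T.plot.bar end up in ax.containers (one BarContainer per column), so you can pass only those as legend handles and leave the line out without relabelling it. This relies on pandas drawing each bar series with a single ax.bar call, which gives one labelled container per series.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; call plt.show() when working interactively

np.random.seed(5)
df = pd.DataFrame({'2012': np.random.random_sample((4,)), '2014': np.random.random_sample((4,))})
df.index = ['A', 'B', 'C', 'D']
sumdf = df.T.sum(axis=1)  # same totals as df.T.apply(np.sum, axis=1)

ax = df.T.plot.bar(stacked=True)
sumdf.plot(ax=ax)
ax.set_xlim([-0.5, 1.5])
ax.set_ylim([0, 3])

# only the bar containers (A..D) are handed to the legend, so the line is excluded
ax.legend(handles=ax.containers, loc='upper center', ncol=3, framealpha=0,
          labelspacing=0, handlelength=4, borderaxespad=0)
```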

Adding Arbitrary points on pandas time series using Dataframe.plot function

I have been trying to plot some time series graphs using the pandas DataFrame plot function, and I want to add markers at some arbitrary points on the plot to show anomalous points. The code I used:
df1 = pd.DataFrame({'Entropy Values' : MeanValues}, index=DateRange)
df1.plot(linestyle = '-')
I have a list of Dates at which I need to add markers, such as:
Dates = ['15:45:00', '15:50:00', '15:55:00', '16:00:00']
I had a look at the question "matplotlib: Set markers for individual points on a line". Does DataFrame.plot have similar functionality?
I really appreciate the help. Thanks!
DataFrame.plot passes all keyword arguments it does not recognize to the matplotlib plotting method. To put markers at a few points in the plot you can use the markevery argument. Here is an example:
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': range(10)}).set_index('A')
df.plot(linestyle='-', markevery=[1, 5, 7, 8], marker='o', markerfacecolor='r')
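In your case, markevery needs integer positions (or a boolean mask) rather than time strings, so, assuming df1 has a DatetimeIndex, convert Dates to row positions first, something like
positions = [i for i, t in enumerate(df1.index.strftime('%H:%M:%S')) if t in Dates]
df1.plot(linestyle='-', markevery=positions, marker='o', markerfacecolor='r')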
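Note that matplotlib's markevery accepts integers, slices, lists of integer positions or a boolean mask, but not time strings. Here is a self-contained sketch of the whole workflow; the DateRange and MeanValues data are hypothetical (one reading every five minutes), chosen so that the question's Dates fall on existing rows.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; call plt.show() when working interactively

# hypothetical data: one value every 5 minutes starting at 15:30
DateRange = pd.date_range('2021-01-01 15:30:00', periods=12, freq='5min')
MeanValues = np.linspace(0.0, 1.0, len(DateRange))
df1 = pd.DataFrame({'Entropy Values': MeanValues}, index=DateRange)

Dates = ['15:45:00', '15:50:00', '15:55:00', '16:00:00']

# translate the wanted clock times into integer row positions for markevery
is_anomaly = df1.index.strftime('%H:%M:%S').isin(Dates)
positions = np.flatnonzero(is_anomaly).tolist()

ax = df1.plot(linestyle='-', markevery=positions, marker='o', markerfacecolor='r')
```

Matching on the formatted time of day ignores the date part, so if your index spans several days the same times would be marked on each day.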

Seaborn FacetGrid plots are empty if any of the sub-plots have no data

I have a dataset that I want to plot with FacetGrids using the seaborn library. The problem is that my data is "sparse": some of the individual subplots have no data at all (i.e., zero data points). I would like those cells either not to show up, or to show up blank, while still seeing the subplots that have data. Here's a simple example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(columns=['a','b','c','d'],
data=[[1,1,1,4],[1,2,2,8],[2,1,2,12],[2,1,3,14]])
print(df)
g = sns.FacetGrid(df, col='a', row='b', hue='c')
g.map(plt.scatter, 'c', 'd', marker='o')
Unfortunately, when I plot this, I get four empty plots instead of three filled plots and one empty one. If I change the last row of data to [2,2,3,14] instead, all four plots appear as expected. Is this a bug in seaborn? Can I work around it somehow?
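Whether FacetGrid still behaves this way may depend on the seaborn version. As a version-independent workaround, here is a sketch that builds the same row (b), column (a) and hue (c) grid directly with matplotlib and leaves empty cells blank. It replaces FacetGrid rather than fixing it, and the "C0", "C1", ... colour assignment per hue level is an assumption.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; call plt.show() when working interactively
import matplotlib.pyplot as plt

df = pd.DataFrame(columns=['a', 'b', 'c', 'd'],
                  data=[[1, 1, 1, 4], [1, 2, 2, 8], [2, 1, 2, 12], [2, 1, 3, 14]])

row_vals = sorted(df['b'].unique())
col_vals = sorted(df['a'].unique())
# one colour from the default property cycle per hue level
colors = {h: f"C{i}" for i, h in enumerate(sorted(df['c'].unique()))}

fig, axes = plt.subplots(len(row_vals), len(col_vals),
                         sharex=True, sharey=True, squeeze=False)
for i, b in enumerate(row_vals):
    for j, a in enumerate(col_vals):
        ax = axes[i][j]
        ax.set_title(f"b = {b} | a = {a}")
        cell = df[(df['a'] == a) & (df['b'] == b)]
        # an empty cell yields no groups, so its axes simply stays blank
        for h, sub in cell.groupby('c'):
            ax.scatter(sub['c'], sub['d'], color=colors[h], marker='o', label=str(h))

filled = sum(len(ax.collections) > 0 for ax in axes.flat)
```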