Plotly.express Area multiple plots data error

Plotly.express Area multiple plots data error - pandas

This is my code:
import pandas as pd
import plotly.express as px
df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,6],'y3':[3,4,5,6,7]}
df=pd.DataFrame(df)
fig = px.area(df, x="x", y=['y1','y2','y3'])
fig.show()
As you can see my Y data are maximum 7. Why the results in the figure shows wrong values?
Why the results in the figure shows wrong values?

plotly.express.area creates a stacked area plot, where each filled area corresponds to one column of the input data: https://plotly.com/python/filled-area-plots/
For example, at x = 5, the stacked area plot reaches y = 18 = 5 + 6 + 7.
To show each column's data series as-is, without stacking (i.e., starting from the y-coordinate of zero):
fig = px.line(df, x='x', y=['y1', 'y2', 'y3'])

Short answer:
Just add the following to your setup to obtain the desired behavior of px.area:
fig.update_traces(stackgroup = None, fill = 'tozeroy')
The details:
px.area produces a stacked area chart through the trace attribute stackgroup which by default is set to 'one'. To obtain what you're requesting here, you can update this attribute to None with:
fig.update_traces(stackgroup = None)
Plot 1 - Unstacked traces
As you can see, this will display the values in the way you're requesting, but that won't help you that much since the area fill colors are missing. By default, the fill attribute for the traces are set to tonexty. Other options are:
['none', 'tozeroy', 'tozerox', 'tonexty', 'tonextx', 'toself', 'tonext']
And if you set fill to tozeroy with fig.update_traces(fill = 'tozeroy') you'll get what you're looking for:
Plot 2 - Almost there, but what's up with the colors?
This is going to look a bit strange for your particular dataset since the resulting areas will cover each other all the way and because the area colors are opaque. But you can see by the colors of the legend that the color setup is still the same as in your plot. You can verify this by using the following dataset:
df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,2],'y3':[3,4,5,6,1]}
Plot 3 - Bingo!
Complete code:
import pandas as pd
import plotly.express as px
# df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,6],'y3':[3,4,5,6,7]}
df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,2],'y3':[3,4,5,6,1]}
df=pd.DataFrame(df)
fig = px.area(df, x="x", y=['y1','y2','y3'])
#fig.update_traces(stackgroup = None)
#fig.update_traces(fill = 'tozeroy')
fig.update_traces(stackgroup = None, fill = 'tozeroy')
fig.show()

Related

How to turn Pandas output into an image for download? [duplicate]

I constructed a pandas dataframe of results. This data frame acts as a table. There are MultiIndexed columns and each row represents a name, ie index=['name1','name2',...] when creating the DataFrame. I would like to display this table and save it as a png (or any graphic format really). At the moment, the closest I can get is converting it to html, but I would like a png. It looks like similar questions have been asked such as How to save the Pandas dataframe/series data as a figure?
However, the marked solution converts the dataframe into a line plot (not a table) and the other solution relies on PySide which I would like to stay away simply because I cannot pip install it on linux. I would like this code to be easily portable. I really was expecting table creation to png to be easy with python. All help is appreciated.

Pandas allows you to plot tables using matplotlib (details here).
Usually this plots the table directly onto a plot (with axes and everything) which is not what you want. However, these can be removed first:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.table.plotting import table # EDIT: see deprecation warnings below
ax = plt.subplot(111, frame_on=False) # no visible frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
table(ax, df) # where df is your data frame
plt.savefig('mytable.png')
The output might not be the prettiest but you can find additional arguments for the table() function here.
Also thanks to this post for info on how to remove axes in matplotlib.
EDIT:
Here is a (admittedly quite hacky) way of simulating multi-indexes when plotting using the method above. If you have a multi-index data frame called df that looks like:
first second
bar one 1.991802
two 0.403415
baz one -1.024986
two -0.522366
foo one 0.350297
two -0.444106
qux one -0.472536
two 0.999393
dtype: float64
First reset the indexes so they become normal columns
df = df.reset_index()
df
first second 0
0 bar one 1.991802
1 bar two 0.403415
2 baz one -1.024986
3 baz two -0.522366
4 foo one 0.350297
5 foo two -0.444106
6 qux one -0.472536
7 qux two 0.999393
Remove all duplicates from the higher order multi-index columns by setting them to an empty string (in my example I only have duplicate indexes in "first"):
df.ix[df.duplicated('first') , 'first'] = '' # see deprecation warnings below
df
first second 0
0 bar one 1.991802
1 two 0.403415
2 baz one -1.024986
3 two -0.522366
4 foo one 0.350297
5 two -0.444106
6 qux one -0.472536
7 two 0.999393
Change the column names over your "indexes" to the empty string
new_cols = df.columns.values
new_cols[:2] = '','' # since my index columns are the two left-most on the table
df.columns = new_cols
Now call the table function but set all the row labels in the table to the empty string (this makes sure the actual indexes of your plot are not displayed):
table(ax, df, rowLabels=['']*df.shape[0], loc='center')
et voila:
Your not-so-pretty but totally functional multi-indexed table.
EDIT: DEPRECATION WARNINGS
As pointed out in the comments, the import statement for table:
from pandas.tools.plotting import table
is now deprecated in newer versions of pandas in favour of:
from pandas.plotting import table
EDIT: DEPRECATION WARNINGS 2
The ix indexer has now been fully deprecated so we should use the loc indexer instead. Replace:
df.ix[df.duplicated('first') , 'first'] = ''
with
df.loc[df.duplicated('first') , 'first'] = ''

There is actually a python library called dataframe_image
Just do a
pip install dataframe_image
Do the imports
import pandas as pd
import numpy as np
import dataframe_image as dfi
df = pd.DataFrame(np.random.randn(6, 6), columns=list('ABCDEF'))
and style your table if you want by:
df_styled = df.style.background_gradient() #adding a gradient based on values in cell
and finally:
dfi.export(df_styled,"mytable.png")

The best solution to your problem is probably to first export your dataframe to HTML and then convert it using an HTML-to-image tool.
The final appearance could be tweaked via CSS.
Popular options for HTML-to-image rendering include:
WeasyPrint
wkhtmltopdf/wkhtmltoimage
Let us assume we have a dataframe named df.
We can generate one with the following code:
import string
import numpy as np
import pandas as pd
np.random.seed(0) # just to get reproducible results from `np.random`
rows, cols = 5, 10
labels = list(string.ascii_uppercase[:cols])
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 10)), columns=labels)
print(df)
# A B C D E F G H I J
# 0 44 47 64 67 67 9 83 21 36 87
# 1 70 88 88 12 58 65 39 87 46 88
# 2 81 37 25 77 72 9 20 80 69 79
# 3 47 64 82 99 88 49 29 19 19 14
# 4 39 32 65 9 57 32 31 74 23 35
Using WeasyPrint
This approach uses a pip-installable package, which will allow you to do everything using the Python ecosystem.
One shortcoming of weasyprint is that it does not seem to provide a way of adapting the image size to its content.
Anyway, removing some background from an image is relatively easy in Python / PIL, and it is implemented in the trim() function below (adapted from here).
One also would need to make sure that the image will be large enough, and this can be done with CSS's #page size property.
The code follows:
import weasyprint as wsp
import PIL as pil
def trim(source_filepath, target_filepath=None, background=None):
if not target_filepath:
target_filepath = source_filepath
img = pil.Image.open(source_filepath)
if background is None:
background = img.getpixel((0, 0))
border = pil.Image.new(img.mode, img.size, background)
diff = pil.ImageChops.difference(img, border)
bbox = diff.getbbox()
img = img.crop(bbox) if bbox else img
img.save(target_filepath)
img_filepath = 'table1.png'
css = wsp.CSS(string='''
#page { size: 2048px 2048px; padding: 0px; margin: 0px; }
table, td, tr, th { border: 1px solid black; }
td, th { padding: 4px 8px; }
''')
html = wsp.HTML(string=df.to_html())
html.write_png(img_filepath, stylesheets=[css])
trim(img_filepath)
Using wkhtmltopdf/wkhtmltoimage
This approach uses an external open source tool and this needs to be installed prior to the generation of the image.
There is also a Python package, pdfkit, that serves as a front-end to it (it does not waive you from installing the core software yourself), but I will not use it.
wkhtmltoimage can be simply called using subprocess (or any other similar means of running an external program in Python).
One would also need to output to disk the HTML file.
The code follows:
import subprocess
df.to_html('table2.html')
subprocess.call(
'wkhtmltoimage -f png --width 0 table2.html table2.png', shell=True)
and its aspect could be further tweaked with CSS similarly to the other approach.

Although I am not sure if this is the result you expect, you can save your DataFrame in png by plotting the DataFrame with Seaborn Heatmap with annotations on, like this:
http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.heatmap.html#seaborn.heatmap
It works right away with a Pandas Dataframe. You can look at this example: Efficiently ploting a table in csv format using Python
You might want to change the colormap so it displays a white background only.
Hope this helps.
Edit:
Here is a snippet that does this:
import matplotlib
import seaborn as sns
def save_df_as_image(df, path):
# Set background to white
norm = matplotlib.colors.Normalize(-1,1)
colors = [[norm(-1.0), "white"],
[norm( 1.0), "white"]]
cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", colors)
# Make plot
plot = sns.heatmap(df, annot=True, cmap=cmap, cbar=False)
fig = plot.get_figure()
fig.savefig(path)

The solution of #bunji works for me, but default options don't always give a good result.
I added some useful parameter to tweak the appearance of the table.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df.index = [item.strftime('%Y-%m-%d') for item in df.index] # Format date
fig, ax = plt.subplots(figsize=(12, 2)) # set size frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
ax.set_frame_on(False) # no visible frame, uncomment if size is ok
tabla = table(ax, df, loc='upper right', colWidths=[0.17]*len(df.columns)) # where df is your data frame
tabla.auto_set_font_size(False) # Activate set fontsize manually
tabla.set_fontsize(12) # if ++fontsize is necessary ++colWidths
tabla.scale(1.2, 1.2) # change size table
plt.savefig('table.png', transparent=True)
The result:

I had the same requirement for a project I am doing. But none of the answers came elegant to my requirement. Here is something which finally helped me, and might be useful for this case:
from bokeh.io import export_png, export_svgs
from bokeh.models import ColumnDataSource, DataTable, TableColumn
def save_df_as_image(df, path):
source = ColumnDataSource(df)
df_columns = [df.index.name]
df_columns.extend(df.columns.values)
columns_for_table=[]
for column in df_columns:
columns_for_table.append(TableColumn(field=column, title=column))
data_table = DataTable(source=source, columns=columns_for_table,height_policy="auto",width_policy="auto",index_position=None)
export_png(data_table, filename = path)

There is a Python library called df2img available at https://pypi.org/project/df2img/ (disclaimer: I'm the author). It's a wrapper/convenience function using plotly as backend.
You can find the documentation at https://df2img.dev.
import pandas as pd
import df2img
df = pd.DataFrame(
data=dict(
float_col=[1.4, float("NaN"), 250, 24.65],
str_col=("string1", "string2", float("NaN"), "string4"),
),
index=["row1", "row2", "row3", "row4"],
)
Saving a pd.DataFrame as a .png-file can be done fairly quickly. You can apply formatting, such as background colors or alternating the row colors for better readability.
fig = df2img.plot_dataframe(
df,
title=dict(
font_color="darkred",
font_family="Times New Roman",
font_size=16,
text="This is a title",
),
tbl_header=dict(
align="right",
fill_color="blue",
font_color="white",
font_size=10,
line_color="darkslategray",
),
tbl_cells=dict(
align="right",
line_color="darkslategray",
),
row_fill_color=("#ffffff", "#d7d8d6"),
fig_size=(300, 160),
)
df2img.save_dataframe(fig=fig, filename="plot.png")

If you're okay with the formatting as it appears when you call the DataFrame in your coding environment, then the absolute easiest way is to just use print screen and crop the image using basic image editing software.
Here's how it turned out for me using Jupyter Notebook, and Pinta Image Editor (Ubuntu freeware).

As jcdoming suggested, use Seaborn heatmap():
import seaborn as sns
import matplotlib.pyplot as plt
fig = plt.figure(facecolor='w', edgecolor='k')
sns.heatmap(df.head(), annot=True, cmap='viridis', cbar=False)
plt.savefig('DataFrame.png')

The easiest and fastest way to convert a Pandas dataframe into a png image using Anaconda Spyder IDE- just double-click on the dataframe in variable explorer, and the IDE table will appear, nicely packaged with automatic formatting and color scheme. Just use a snipping tool to capture the table for use in your reports, saved as a png:
This saves me lots of time, and is still elegant and professional.

The following would need extensive customisation to format the table correctly, but the bones of it works:
import numpy as np
from PIL import Image, ImageDraw, ImageFont
import pandas as pd
df = pd.DataFrame({ 'A' : 1.,
'B' : pd.Series(1,index=list(range(4)),dtype='float32'),
'C' : np.array([3] * 4,dtype='int32'),
'D' : pd.Categorical(["test","train","test","train"]),
'E' : 'foo' })
class DrawTable():
def __init__(self,_df):
self.rows,self.cols = _df.shape
img_size = (300,200)
self.border = 50
self.bg_col = (255,255,255)
self.div_w = 1
self.div_col = (128,128,128)
self.head_w = 2
self.head_col = (0,0,0)
self.image = Image.new("RGBA", img_size,self.bg_col)
self.draw = ImageDraw.Draw(self.image)
self.draw_grid()
self.populate(_df)
self.image.show()
def draw_grid(self):
width,height = self.image.size
row_step = (height-self.border*2)/(self.rows)
col_step = (width-self.border*2)/(self.cols)
for row in range(1,self.rows+1):
self.draw.line((self.border-row_step//2,self.border+row_step*row,width-self.border,self.border+row_step*row),fill=self.div_col,width=self.div_w)
for col in range(1,self.cols+1):
self.draw.line((self.border+col_step*col,self.border-col_step//2,self.border+col_step*col,height-self.border),fill=self.div_col,width=self.div_w)
self.draw.line((self.border-row_step//2,self.border,width-self.border,self.border),fill=self.head_col,width=self.head_w)
self.draw.line((self.border,self.border-col_step//2,self.border,height-self.border),fill=self.head_col,width=self.head_w)
self.row_step = row_step
self.col_step = col_step
def populate(self,_df2):
font = ImageFont.load_default().font
for row in range(self.rows):
print(_df2.iloc[row,0])
self.draw.text((self.border-self.row_step//2,self.border+self.row_step*row),str(_df2.index[row]),font=font,fill=(0,0,128))
for col in range(self.cols):
text = str(_df2.iloc[row,col])
text_w, text_h = font.getsize(text)
x_pos = self.border+self.col_step*(col+1)-text_w
y_pos = self.border+self.row_step*row
self.draw.text((x_pos,y_pos),text,font=font,fill=(0,0,128))
for col in range(self.cols):
text = str(_df2.columns[col])
text_w, text_h = font.getsize(text)
x_pos = self.border+self.col_step*(col+1)-text_w
y_pos = self.border - self.row_step//2
self.draw.text((x_pos,y_pos),text,font=font,fill=(0,0,128))
def save(self,filename):
try:
self.image.save(filename,mode='RGBA')
print(filename," Saved.")
except:
print("Error saving:",filename)
table1 = DrawTable(df)
table1.save('C:/Users/user/Pictures/table1.png')
The output looks like this:

People who use Plotly for data visualization:
You can easily convert the dataframe to go.Table.
You can save the dataframe with columns names.
You can format the dataframe through go.Table.
You can save the dataframe as pdf, jpg, or png with different scales and high resolution.
import plotly.express as px
df = px.data.medals_long()
fig = go.Figure(data=[
go.Table(
header=dict(values=list(df.columns),align='center'),
cells=dict(values=df.values.transpose(),
fill_color = [["white","lightgrey"]*df.shape[0]],
align='center'
)
)
])
fig.write_image('image.png',scale=6)
Note: the image is downloaded in the same directory where the current python file is running.
Output:

I really like the way Jupyter notebooks format the DataFrame and this library exports it in the same format:
import dataframe_image as dfi
dfi.export(df, "df.png")
There is also a dpi argument in case you want to increase the quality of the image. I'd recommend 300 for an ok quality, 600 for exelent, 1200 for perfect and more than that is probably too much.
import dataframe_image as dfi
dfi.export(df, "df.png", dpi = 600)

Seaborn countplot -- trouble adding counts to top of bars

I have a survey dataset that I'm trying to develop a number of summary bar plots.
I have successfully created the plots (with hatching -- in case I need to publish in black and white)... but, I'd like to add counts to the top of the bars, but can't seem to get this correct.
Notice that code below (see "bar.annotate..."). The code successfully displays the plot but doesn't display the counts at the top of the bars.
import seaborn as sns
import matplotlib.pyplot al plt
df = survey['Which of the following best describes your position?'] # it's not a dataframe, but if single columns, it's a series
df = df.rename("Fig01: Which of the following best describes your position?")
# Set style
sns.set(style="whitegrid", color_codes=True)
# Make the barplot
#bar = sns.barplot(x=df.index, hue="class", data=df)
bar = sns.countplot(x=df.index, data=df)
bar.set_title("Fig01: Which of the following best describes your position?")
# Define some hatches
#hatches = ['--', '++', 'Xx', '\\', '**', 'oo']
hatches = ['+', 'xx', '\\\\'] # repeating letters increases the density
# Loop over the bars
for i,thisbar in enumerate(bar.patches):
# Set a different hatch for each bar
thisbar.set_hatch(hatches[i])
bar.annotate('%{:d}'.format(thisbar.get_height()), (thisbar.get_x(), thisbar.get_height()+50))
plt.xticks(rotation=-45)
plt.show()
Here's what the output currently looks like (again, I'm trying to get the counts to display at the top of the bars).

Second Matplotlib figure doesn't save to file

I've drawn a plot that looks something like the following:
It was created using the following code:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# 1. Plot a figure consisting of 3 separate axes
# ==============================================
plotNames = ['Plot1','Plot2','Plot3']
figure, axisList = plt.subplots(len(plotNames), sharex=True, sharey=True)
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range('2015-01-01','2015-12-31',freq='D')
tempDF['value'] = np.random.randn(tempDF['date'].size)
tempDF['value2'] = np.random.randn(tempDF['date'].size)
for i in range(len(plotNames)):
axisList[i].plot_date(tempDF['date'],tempDF['value'],'b-',xdate=True)
# 2. Create a new single axis in the figure. This new axis sits over
# the top of the axes drawn previously. Make all the components of
# the new single axis invisibe except for the x and y labels.
big_ax = figure.add_subplot(111)
big_ax.set_axis_bgcolor('none')
big_ax.set_xlabel('Date',fontweight='bold')
big_ax.set_ylabel('Random normal',fontweight='bold')
big_ax.tick_params(labelcolor='none', top='off', bottom='off', left='off', right='off')
big_ax.spines['right'].set_visible(False)
big_ax.spines['top'].set_visible(False)
big_ax.spines['left'].set_visible(False)
big_ax.spines['bottom'].set_visible(False)
# 3. Plot a separate figure
# =========================
figure2,ax2 = plt.subplots()
ax2.plot_date(tempDF['date'],tempDF['value2'],'-',xdate=True,color='green')
ax2.set_xlabel('Date',fontweight='bold')
ax2.set_ylabel('Random normal',fontweight='bold')
# Save plot
# =========
plt.savefig('tempPlot.png',dpi=300)
Basically, the rationale for plotting the whole picture is as follows:
Create the first figure and plot 3 separate axes using a loop
Plot a single axis in the same figure to sit on top of the graphs
drawn previously. Label the x and y axes. Make all other aspects of
this axis invisible.
Create a second figure and plot data on a single axis.
The plot displays just as I want when using jupyter-notebook but when the plot is saved, the file contains only the second figure.
I was under the impression that plots could have multiple figures and that figures could have multiple axes. However, I suspect I have a fundamental misunderstanding of the differences between plots, subplots, figures and axes. Can someone please explain what I'm doing wrong and explain how to get the whole image to save to a single file.

Matplotlib does not have "plots". In that sense,
plots are figures
subplots are axes
During runtime of a script you can have as many figures as you wish. Calling plt.save() will save the currently active figure, i.e. the figure you would get by calling plt.gcf().
You can save any other figure either by providing a figure number num:
plt.figure(num)
plt.savefig("output.png")
or by having a refence to the figure object fig1
fig1.savefig("output.png")
In order to save several figures into one file, one could go the way detailed here: Python saving multiple figures into one PDF file.
Another option would be not to create several figures, but a single one, using subplots,
fig = plt.figure()
ax = plt.add_subplot(611)
ax2 = plt.add_subplot(612)
ax3 = plt.add_subplot(613)
ax4 = plt.add_subplot(212)
and then plot the respective graphs to those axes using
ax.plot(x,y)
or in the case of a pandas dataframe df
df.plot(x="column1", y="column2", ax=ax)
This second option can of course be generalized to arbitrary axes positions using subplots on grids. This is detailed in the matplotlib user's guide Customizing Location of Subplot Using GridSpec
Furthermore, it is possible to position an axes (a subplot so to speak) at any position in the figure using fig.add_axes([left, bottom, width, height]) (where left, bottom, width, height are in figure coordinates, ranging from 0 to 1).

How to pass different scatter kwargs to facets in lmplot in Seaborn

I'm trying to map a 3rd variable to the scatter point colour in the Seaborn lmplot. So total_bill on x, tip on y and point colour as function of size.
It works when no faceting is enabled but fails when col is used because the colour array size does not match the size of the data plotted in each facet.
This is my code
import matplotlib as mpl
import seaborn as sns
sns.set(color_codes=True)
# load data
data = sns.load_dataset("tips")
# size of data
print len(data.index)
### we want to plot scatter point colour as function of variable 'size'
# first, sort the data by 'size' so that high 'size' values are plotted
# over the smaller sizes (so they are more visible)
data = data.sort_values(by=['size'], ascending=True)
scatter_kws = dict()
cmap = mpl.cm.get_cmap(name='Blues')
# normalise 'size' variable as float range needs to be
# between 0 and 1 to map to a valid colour
scatter_kws['c'] = data['size'] / data['size'].max()
# map normalised values to colours
scatter_kws['c'] = cmap(scatter_kws['c'].values)
# colour array has same size as data
print len(scatter_kws['c'])
# this works as intended
g = sns.lmplot(data=data, x="total_bill", y="tip", scatter_kws=scatter_kws)
The above works well and produces the following (not allowed to include images yet, so here's the link):
lmplot with point colour as function of size
However, when I add col='sex' to lmplot (try code below), the issue is that the colour array has the size of the original dataset which is larger than the size of data plotted in each facet. So, for example col='male' has 157 data points so first 157 values from the colour array are mapped to the points (and these aren't even the correct ones). See below:
lmplot with point colour as function of size with col=sex
g = sns.lmplot(data=data, x="total_bill", y="tip", col="sex", scatter_kws=scatter_kws)
Ideally, I'd like to pass an array of scatter_kws to the lmplot so that each facet uses the correct colour array (which I'd calculate before passing to lmplot). But that doesn't seem to be an option.
Any other ideas or workarounds that still allow me to use the functionality of Seaborn's lmplot (meaning, without resorting to re-creating lmplot functionality from FacetGrid?

In principle the lmplot with different cols seems to be just a wrapper for several regplots. So instead of one lmplot we could use two regplots, one for each sex.
We therefore need to separate the original dataframe into male and female, the rest is rather straight forward.
import matplotlib.pyplot as plt
import seaborn as sns
data = sns.load_dataset("tips")
data = data.sort_values(by=['size'], ascending=True)
# make a new dataframe for males and females
male = data[data["sex"] == "Male"]
female = data[data["sex"] == "Female"]
# get normalized colors for all data
colors = data['size'].values / float(data['size'].max())
# get colors for males / females
colors_male = colors[data["sex"].values == "Male"]
colors_female = colors[data["sex"].values == "Female"]
# colors are values in [0,1] range
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(9,4))
#create regplot for males, put it to left axes
#use colors_male to color the points with Blues cmap
sns.regplot(data=male, x="total_bill", y="tip", ax=ax1,
scatter_kws= {"c" : colors_male, "cmap":"Blues"})
# same for females
sns.regplot(data=female, x="total_bill", y="tip", ax=ax2,
scatter_kws={"c" : colors_female, "cmap":"Greens"})
ax1.set_title("Males")
ax2.set_title("Females")
for ax in [ax1, ax2]:
ax.set_xlim([0,60])
ax.set_ylim([0,12])
plt.tight_layout()
plt.show()

Using pd.cut to create bins for a graph, but bin values are not coming out as expected

Here is the code I'm running:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y =titanic.groupby([titanic.fare//1,'sex']).survived.mean().reset_index() #grouping by 'fare' rounded to an integer and 'sex' and then getting the survivability
x =pd.cut(y.fare, (0,17,35,70,300,515)) #I'm not sure if my format is correct but this is how I cut up the fare values
y['Fare_bins']= x # adding the newly created bins to a new column "Fare_bins' in original dataframe.
#graphing with seaborn
sns.set(style="whitegrid")
g = sns.factorplot(x='Fare_bins', y= 'survived', col = 'sex', kind ='bar' ,data= y,
size=4, aspect =2.5 , palette="muted")
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
The problem I'm having is that Fare_values are showing up as (0,17].
The left side is a circle bracket and the right side is square bracket.
If possible I would like to have something like this:
(0-17) or [0-17]
Next, there seems to be a gap between each bar plot. I was expecting them to be adjoined. There are two graphs being represented, so I don't expect of the bars to be ajoined, but the first 5 bars(first graph)should be connected and the last 5 bars to eachother(second graph).
How can I go about fixing these two issues?

It seems I can add labels.
Just by adding labels to the "cut" method parameters, I can display the Fare_values as I want.
x =pd.cut(y.fare, (0,17,35,70,300,515), labels = ('(0-17)', '(17-35)', '(35-70)', '(70-300)','(300-515)') )
As for the brackets showing around the fare_value groups,
according to the documentation:
right : bool, optional
Indicates whether the bins include the rightmost edge or not. If right == True (the default), then the bins [1,2,3,4] indicate (1,2], (2,3], (3,4].
Still not sure if it's possible to join the bars though.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Plotly.express Area multiple plots data error - pandas

Related

How to turn Pandas output into an image for download? [duplicate]

Seaborn countplot -- trouble adding counts to top of bars

Second Matplotlib figure doesn't save to file

How to pass different scatter kwargs to facets in lmplot in Seaborn

Using pd.cut to create bins for a graph, but bin values are not coming out as expected

Categories

Resources