Pandas dataframe rendered with bokeh shows no marks - pandas

I am attempting to create a simple hbar() chart on two columns [project, bug_count]. Sample dataframe follows:
df = pd.DataFrame({'project': ['project1', 'project2', 'project3', 'project4'],
'bug_count': [43683, 31647, 27494, 24845]})
When attempting to render any chart: scatter, circle, vbar etc... I get a blank chart.
This very simple code snippet shows an empty viz. This example shows a f.circle() just for demonstration, I'm actually trying to implement a f.hbar().
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
f = figure()
f.circle(df['project'], df['bug_count'],size = 10)
show(f)

The values of df['project'] are strings, i.e. categorical values, not numbers. Categorical ranges must be provided explicitly, since you are the only person who possesses the knowledge of what order the arbitrary factors should appear in on the axis. Something like:
p = figure(x_range=sorted(set(df['project'])))
There is an entire chapter in the User's Guide devoted to Handling Categorical Data, with many complete examples (including many bar charts) that you can refer to.
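For the hbar() case from the question, the categories go on the y-axis, so the factors are passed as y_range rather than x_range. The following is a minimal sketch using the sample dataframe from the question; the bar height of 0.8, the title, and keeping the dataframe's row order are illustrative choices, not requirements.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

df = pd.DataFrame({
    'project': ['project1', 'project2', 'project3', 'project4'],
    'bug_count': [43683, 31647, 27494, 24845],
})

# The factors (and their order on the axis) must be given explicitly.
# Here the dataframe's own row order is used; sort or reverse as needed.
factors = list(df['project'])

p = figure(y_range=factors, title="Bugs per project")

# hbar draws one horizontal bar per row: `y` is the category and
# `right` is where the bar ends (bars start at left=0 by default).
p.hbar(y='project', right='bug_count', height=0.8,
       source=ColumnDataSource(df))

# show(p)  # uncomment to display (call output_notebook() first in a notebook)
```

The same pattern should apply to vbar() or circle() with the categories on the x-axis, in which case the factors go into x_range as in the answer above.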

Related

Cannot plot a histogram from a Pandas dataframe

I've used pandas.read_csv to generate a 1000-row dataframe with 32 columns. I'm looking to plot a histogram or bar chart (depending on data type) of each column. For columns of type 'int64', I've tried doing matplotlib.pyplot.hist(df['column']) and df.hist(column='column'), as well as calling matplotlib.pyplot.hist on df['column'].values and df['column'].to_numpy(). Weirdly, they all take a really long time (>30s), and when I've allowed them to complete, I get unit-height bars in multiple colors, as if there's some sort of implicit grouping and they're all being separated into different groups. Any ideas about what I can do to get a normal histogram? Unfortunately I closed the charts so I can't show you an example right now.
Edit - this seems to be a much bigger problem with Int columns, and casting them to float fixes the problem.
Follow these two steps:
import the pyplot module from the Matplotlib library
call its hist function, which accepts a dataframe column (a Series) as its argument
import matplotlib.pyplot as plt
plt.hist(df['column'], color='blue', edgecolor='black', bins=int(45/1))
Here's the source.
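The edit in the question reports that casting integer columns to float avoided the problem. A minimal sketch of that workaround follows. The dataframe here is synthetic, 'column' is the placeholder name used above, and the cast is the asker's reported fix rather than a confirmed diagnosis of the cause.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for one int64 column of the 1000-row dataframe.
rng = np.random.default_rng(0)
df = pd.DataFrame({'column': rng.integers(0, 100, size=1000)})

# Pass a single 1-D column, cast to float as the question's edit suggests.
values = df['column'].astype(float)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(values, bins=45, color='blue', edgecolor='black')
ax.set_xlabel('column')
ax.set_ylabel('count')
# plt.show()  # uncomment to display outside a notebook
```

With a single 1-D input, the bar heights in counts should add up to the number of rows, which is a quick way to check that the data was not split into separate groups.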

Make colors in seaborn based on column names

I have a table with three columns: Names, A, B that I use to create a plot with the following code:
import seaborn as sns
sns.scatterplot(data=df, x="B", y="A")
How can I make two different colors for dots based on column names? (i.e. A - red, B - green)
In a scatterplot, each point represents a pair of values relating two sets of data, in your case A and B. Therefore, since each point on the graph is a pair, you can't colour an individual point differently based on 'A' or 'B'.
What you can do is set a different colour based on your Names column, using the hue argument.
Below is an example using seaborn's tips dataset.
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
In your case try something like:
sns.scatterplot(data=df, x="B", y="A", hue="Names")
https://seaborn.pydata.org/generated/seaborn.scatterplot.html
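If the goal really is one color per column (A in red, B in green), one option is to reshape the table to long form so that the column name becomes a variable that hue can use. This is a sketch under assumptions: the sample values are made up, and plotting each value against the Names column is only one possible reading of the question.

```python
import pandas as pd
import seaborn as sns

# Made-up data with the three columns described in the question.
df = pd.DataFrame({
    "Names": ["n1", "n2", "n3", "n4"],
    "A": [1.0, 2.5, 3.1, 4.2],
    "B": [2.0, 1.5, 3.8, 3.9],
})

# Long form: one row per (Names, column) pair, with the original
# column name stored in "column" and its value in "value".
long_df = df.melt(id_vars="Names", var_name="column", value_name="value")

# A dict palette maps each hue level to a fixed color.
ax = sns.scatterplot(data=long_df, x="Names", y="value", hue="column",
                     palette={"A": "red", "B": "green"})
```

Each name then gets two dots, a red one for its A value and a green one for its B value.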

Pandas styling - change font size and format float/apply background gradient

I am building an application that displays stock correlations data in various visual forms, including a matrix with a heatmap applied. My heatmap is created by passing the correlation matrix dataframe into IPy Widgets Output, so I can display it as part of a VBox later on. I have successfully applied a background gradient and formatted my numbers to 2dp. Can anyone help me edit the function to also reduce the font size? I just want to shrink it a little.
Note: I chose to do this using dataframe styling over matplotlib as I had a number of issues getting the output to display in the way I wanted. I also have a function that downloads the dataframe to excel with the styling applied.
I have tried putting the following line of code at the beginning of my notebook so I can leave it outside of the function, but it seems to get ignored once the dataframe is passed to Output.
pd.options.display.float_format = "{:,.2f}".format
Here is my code sample:
import seaborn as sns
import ipywidgets as ipw
import pandas as pd
import numpy as np
#Sample Data
data = np.random.randint(5,30,size=500)
df = pd.DataFrame(data.reshape((50,10)))
corr = df.corr()
#Function produces dataframe as Output
def output_heatmap_df(df):
    out = ipw.Output()
    with out:
        display(df.style\
            .background_gradient(cmap=sns.diverging_palette(220,10, as_cmap=True),axis=None).format("{:,.2f}"))
    out.layout.width='1600px'
    return out
output_heatmap_df(corr)
In case anyone should come across this, the below code worked for me in the end:
def output_heatmap_df(df):
    out = ipw.Output()
    with out:
        display(df.style\
            .background_gradient(cmap=sns.diverging_palette(220,10, as_cmap=True),axis=None).format("{:,.2f}")
            .set_properties(**{'text-align':'center','font-size':'10px'})
            .set_table_styles([{'selector':'th','props':[('text-align','center'),('font-size','10px')]}])
        )
    out.layout.width='1600px'
    return out

visualizing longitudinal patient data: Adding specific icons or symbols to certain cells in a time series heatmap to indicate events/outcomes

I am currently involved in a clinical study. We are trying to visualize patients' blood work over time using seaborn cluster maps (for example patient CPR levels). For reference: We have some 200 patients and up to 60 days of observed data, so cells in the plot are pretty small.
Some patients during the observation period either died or developed an outcome of interest. We would love to visualize these key events with some form of symbol or icon. I am imagining something like this:
In addition to its color coding, the field at the date of death gets a big dot right in the middle, or even a symbolic cross or some other symbol.
Things that might work, but I do not know how to do:
I am using lines to separate cells. Changing the widths and color of the cells at the date an event occurred might work.
Things that don't work:
cells in my heatmap are too small for custom annotations
import pandas as pd
import seaborn as sns
df= pd.read_excel('data.xlsx')
heatmap = sns.clustermap(df, col_cluster=False, row_cluster=False, cmap='YlOrRd', mask=(df == 0), vmax=10, vmin=0, linewidths=1, linecolor='black', figsize=(20,16), cbar_pos=(0.1, 0.2, .02, .6))
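One way to mark events without text annotations is to draw scatter markers on top of the heatmap axes, since each cell at row i and column j is centered at (j + 0.5, i + 0.5) in data coordinates. The sketch below uses plain sns.heatmap with made-up data. The patient and day labels and the events table are hypothetical. With sns.clustermap the same axes are available as the ax_heatmap attribute of the returned object, and with clustering disabled as in the question the row and column order should match the dataframe.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up values: 6 patients observed over 10 days.
rng = np.random.default_rng(0)
values = pd.DataFrame(
    rng.integers(0, 11, size=(6, 10)),
    index=[f"patient{i}" for i in range(1, 7)],
    columns=[f"day{d}" for d in range(1, 11)],
)

# Hypothetical event table: which patient had an event on which day.
events = pd.DataFrame({"patient": ["patient2", "patient5"],
                       "day": ["day4", "day9"]})

fig, ax = plt.subplots(figsize=(8, 4))
sns.heatmap(values, cmap="YlOrRd", vmin=0, vmax=10,
            linewidths=1, linecolor="black", ax=ax)

# Translate labels to integer positions, then shift by 0.5 to hit cell centers.
rows = values.index.get_indexer(events["patient"])
cols = values.columns.get_indexer(events["day"])
markers = ax.scatter(cols + 0.5, rows + 0.5, marker="X", s=60,
                     color="black", zorder=3)
```

Because the marker size s is given in points rather than data units, it can be tuned independently of how small the cells are.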

Holoviews Polygons inputs

I have been able to make a choropleth map in Bokeh using multiple lists (latitudes, longitudes, county names, value to display, color to display). I wanted to use Holoviews with Bokeh to get their color legend as I prefer it over Bokeh's disjoint grouping one.
In general, I have been unable to find good documentation on structuring a dataframe so that Holoviews can pull data from it. I found mentions of it on their GeoViews documentation, and tried to replicate the Choropleths example they give but cannot get it to work. How do dataframes need to be formatted for Holoviews?
If you want to render polygons from dataframes in HoloViews/GeoViews you have one of two options:
1) Use geopandas dataframes, which will work out of the box. Just pass your geopandas dataframe to the Polygons element and it will display itself.
2) Pass in a list of dataframes, one for each polygon. In the following example we create a list of dataframes by creating Box elements and calling dframe on them. This list of dataframes can then be passed to the Polygons element:
import holoviews as hv
list_of_dfs = [hv.Box(0, 0, i/10.).dframe() for i in range(10, 1, -1)]
hv.Polygons(list_of_dfs)