Choropleth us-states map showing incorrect data and color coding - pandas

I am trying to show hospitals by type across US states. The dataset I am using is here: https://www.kaggle.com/carlosaguayo/usa-hospitals
I am using a choropleth, and my code is below. I have a dropdown with the hospital types, and when one is selected I compute the count per state.
@app.callback(Output('figure-1', 'figure'),
              [Input('options-drop', 'value')])
def make_figure(varname):
    mygraphtitle = f'Hospitals of {varname}'
    mycolorscale = 'Blues'
    mycolorbartitle = "Count"
    data=go.Choropleth(
        locations=df['STATE'],
        locationmode = 'USA-states',
        z = df[df["TYPE"] == varname]["STATE"].value_counts(),
        colorscale = mycolorscale,
        colorbar_title = mycolorbartitle,
    )
    fig = go.Figure(data)
    fig.update_layout(
        title_text = mygraphtitle,
        geo_scope='usa',
        width=1200,
        height=800
    )
    return fig
I have 3 issues with the graph:
1. Data is not being shown for all states.
2. The data shown for a few states is incorrect.
3. The color coding is incorrect even for the states with incorrect data: a state with a higher hospital count is shown in lighter blue, whereas a state with a lower count is shown in darker blue.
From pandas, I can tell that Texas has only 65 critical access hospitals, but the US map shows a count of 78. Even if it were 78, Texas is colored light blue compared with other states that have fewer hospitals. Where am I going wrong?

I have not verified this in a Dash environment, but I believe the behavior will be the same. The cause is that locations is set to the STATE column of the entire data frame, while z holds the counts from the filtered data, so the two do not line up. The easiest fix is to take both the locations and the values from the filtered data frame.
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('./data/Hospitals.csv')
varname = 'CRITICAL ACCESS'
filtered_df = df[df["TYPE"] == varname]["STATE"].value_counts().to_frame('value')
#print(filtered_df)
fig = go.Figure(data=go.Choropleth(
    locations=filtered_df.index,
    z = filtered_df['value'],
    locationmode = 'USA-states',
    colorscale = 'Blues',
    colorbar_title = "Count",
))
fig.update_layout(
    title_text = f'Hospitals of {varname}',
    geo_scope='usa',
    width=1200,
    height=800
)
fig.show()
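To see why the original trace misbehaves, here is a minimal sketch that uses a small made-up table instead of the Kaggle file (the rows and counts are invented for illustration, and the variable names `locations_wrong`, `locations`, and `z` are mine). It shows that `df['STATE']` has one entry per hospital, while `value_counts()` has one entry per state. The trace pairs `locations` and `z` by position, so with mismatched inputs the counts land on the wrong states.

```python
import pandas as pd

# Hypothetical stand-in for Hospitals.csv: one row per hospital.
df = pd.DataFrame({
    "STATE": ["TX", "TX", "KS", "TX", "KS", "CA", "CA"],
    "TYPE": ["CRITICAL ACCESS", "CRITICAL ACCESS", "CRITICAL ACCESS",
             "CRITICAL ACCESS", "CRITICAL ACCESS", "CRITICAL ACCESS",
             "GENERAL ACUTE CARE"],
})
varname = "CRITICAL ACCESS"

# One count per state, for the selected hospital type only.
counts = df[df["TYPE"] == varname]["STATE"].value_counts()

# Mismatched inputs: one location per hospital row vs. one count per state.
locations_wrong = df["STATE"]
print(len(locations_wrong), len(counts))

# Aligned inputs: locations and values both come from the same Series.
locations = counts.index.tolist()
z = counts.tolist()
print(dict(zip(locations, z)))
```

Passing `locations` and `z` built this way to `go.Choropleth` is equivalent to using `filtered_df.index` and `filtered_df['value']` in the answer above.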

Related

How to include a matplotlib graph for an interactive dashboard?

I want to include a line chart (constructed with matplotlib) in an interactive dashboard. My graph describes the evolution for one year of the frequency of the word "France" in 7 media for Central Africa. The database is called: "df_france_pivot".
From what I've seen so far, I first have to turn my plot into an object with go.Figure. So I tried this code:
app = dash.Dash()
def update_graph():
    plt.style.use('seaborn-darkgrid')
    fig, ax = plt.subplots()
    ax.set_prop_cycle(color=['304558', 'FE9235', '526683', 'FE574B', 'FFD104', '6BDF9C'])
    num=0
    for column in df_france_pivot.drop('month_year', axis=1):
        num+=1
        plt.plot(df_france_pivot['month_year'], df_france_pivot[column], marker='',
                 linewidth=1, alpha=0.9, label=column)
    plt.xticks(rotation=45)
    plt.legend(loc=0, prop={'size': 9},bbox_to_anchor=(1.05, 1.0), title='Media in South Africa')
    plt.title("Frequency of the word 'France' in the media ", loc='left', fontsize=12, fontweight=0, color='orange')
    plt.xlabel("Time")
    plt.ylabel("Percentage")
    figure = go.Figure(fig)
    return figure
app.layout = html.Div(id = 'parent', children = [
    html.H1(id = 'H1', children = 'Styling using html components', style = {'textAlign':'center',\
            'marginTop':40,'marginBottom':40}),
    dcc.Graph(id = 'line_plot', figure = update_graph())
    ]
)
When I run it, I get this response: "Output exceeds the size limit. Open the full output data in a text editor." Is it because my line chart is more complex, i.e. has 7 lines?
Thank you in advance!

mplcursors on multiaxis graph

In my program, I'm using mplcursors on a matplotlib graph so I can identify certain points precisely.
mplcursors.cursor(multiple=True).connect("add", lambda sel: sel.annotation.draggable(False))
Now I have made a complex graph with multiple axes:
first = 1
offset = 60
for x in range(len(cat_list)):
    if "Time" not in cat_list[x]:
        if first and not cat_list[x].startswith("EngineSpeed"):
            parasites[x] = ParasiteAxes(host, sharex = host)
            host.parasites.append(parasites[x])
            parasites[x].axis["right"].set_visible(True)
            parasites[x].set_ylabel(cat_list[x])
            parasites[x].axis["right"].major_ticklabels.set_visible(True)
            parasites[x].axis["right"].label.set_visible(True)
            p_plot, = parasites[x].plot(t, t_num_list[x], label = cat_list[x])
            #parasites[x].axis["right"+str(x+1)].label.set_color(p_plot.get_color())
            parasites[x].axis["right"].label.set_color(p_plot.get_color())
            first = 0
        elif not cat_list[x].startswith("EngineSpeed"):
            parasites[x] = ParasiteAxes(host, sharex = host)
            host.parasites.append(parasites[x])
            parasites[x].set_ylabel(cat_list[x])
            new_axisline = parasites[x].get_grid_helper().new_fixed_axis
            parasites[x].axis["right"+str(x+1)] = new_axisline(loc = "right",
                                                               axes = parasites[x],
                                                               offset = (offset, 0))
            p_plot, = parasites[x].plot(t, t_num_list[x])
            parasites[x].axis["right"+str(x+1)].label.set_color(p_plot.get_color())
            offset = offset + 60
host.legend()
fig.add_axes(host)
plt.show()
This code results in the following graph:
https://i.stack.imgur.com/Wl7yC.png
Now I need to be able to select points on whichever axis I choose. How do I make a selection menu for choosing the active axis, and how do I then use mplcursors to select my points?
Thanks,
Ziga

Getting "'function' object is not iterable" while trying to make a bar plot

Note: Not sure how I can make the title more descriptive without messing it up. Please let me know if it's necessary
So I'm trying to make a bar plot of 5 countries from a list of 52 countries (y-axis), while showing number of data scientists from those countries on the bar (x-axis)
Countop = df_demog_ds['Country'].value_counts()
Counval = []
Counindex = []
for index, val in Countop.iteritems():
    Counindex.append(index)
    Counval.append(val)
print(Counindex[0:5])
from matplotlib.ticker import FuncFormatter
fcounval = Counval[0:5]
fcounindex = Counindex[0:5]
barc = np.arange(5)
fig5, ax5 = plt.subplots
plt.bar(barc, fcounval)
plt.xticks(barc, fcounindex)
plt.show()
fcounval and fcounindex select the first 5 elements from Counval and Counindex. All 4 are lists
What might I be doing wrong here?
Link to the dataset - https://www7.zippyshare.com/v/IhcLQsKZ/file.html
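One thing that stands out in the snippet is `fig5, ax5 = plt.subplots`, which has no parentheses, so Python tries to unpack the function object itself rather than the tuple it returns. The sketch below reproduces that kind of error with a hypothetical stand-in function (`make_pair` is not part of matplotlib), so it runs without any plotting library. The exact wording of the message varies between Python versions.

```python
def make_pair():
    """Hypothetical stand-in for plt.subplots: returns two objects."""
    return "figure", "axes"

try:
    # Missing (): this tries to unpack the function object itself.
    fig, ax = make_pair
except TypeError as exc:
    message = str(exc)
    print(message)

# Calling the function returns a tuple that can be unpacked.
fig, ax = make_pair()
print(fig, ax)
```

Applied to the code above, that would mean writing `fig5, ax5 = plt.subplots()`.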

How to use hover events in mpl_connect in matplotlib

I'm working on line plotting a metric for a course module as well as each of its questions within a Jupyter Notebook using %matplotlib notebook. That part is no problem. A module has typically 20-35 questions, so it results in a lot of lines on a chart. Therefore, I am plotting the metric for each question in a low alpha and I want to change the alpha and display the question name when I hover over the line, then reverse those when no longer hovering over the line.
The thing is, I've tried every test version of interactivity from the matplotlib documentation on event handling, as well as those in this question. It seems like the mpl_connect event is never firing, whether I use click or hover.
Here's a test version with a reduced dataset using the solution to the question linked above. Am I missing something necessary to get events to fire?
def update_annot(ind):
    x,y = line.get_data()
    annot.xy = (x[ind["ind"][0]], y[ind["ind"][0]])
    text = "{}, {}".format(" ".join(list(map(str,ind["ind"]))),
                           " ".join([names[n] for n in ind["ind"]]))
    annot.set_text(text)
    annot.get_bbox_patch().set_alpha(0.4)
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = line.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
module = 'bd2bc472-ee0d-466f-8557-788cc6de3018'
module_metrics[module] = {
'q_count': 31,
'sequence_pks': [0.5274546300604932,0.5262044653349001,0.5360993905297703,0.5292329279700655,0.5268691588785047,0.5319099014547161,0.5305164319248826,0.5268235294117647,0.573648805381582,0.5647933116581514,0.5669839795681448,0.5646591970121382,0.5663157894736842,0.5646976090014064,0.5659005628517824,0.5693634879925391,0.5728268468888371,0.5668834184858337,0.5687237026647967,0.5795640965549567,0.5877684407096172,0.585690904839841,0.5766899766899767,0.5971341320178529,0.6059972105997211,0.6055516678329834,0.6209865053513262,0.6203121360354065,0.6153666510976179,0.6236909471724459,0.6387654898293196],
'q_pks': {
'0da04f02-4aad-4ac8-91a5-214862b5c0d0': [0.6686046511627907,0.6282051282051282,0.76,0.6746987951807228,0.7092198581560284,0.71875,0.6585365853658537,0.7070063694267515,0.7171052631578947,0.7346938775510204,0.7737226277372263,0.7380952380952381,0.6774193548387096,0.7142857142857143,0.7,0.6962962962962963,0.723404255319149,0.6737588652482269,0.7232704402515723,0.7142857142857143,0.7164179104477612,0.7317073170731707,0.6333333333333333,0.75,0.7217391304347827,0.7017543859649122,0.7333333333333333,0.7641509433962265,0.6869565217391305,0.75,0.794392523364486],
'10bd29aa-3a26-49e6-bc2c-50fd503d7ab5': [0.64375,0.6014492753623188,0.5968992248062015,0.5059523809523809,0.5637583892617449,0.5389221556886228,0.5576923076923077,0.51875,0.4931506849315068,0.5579710144927537,0.577922077922078,0.5467625899280576,0.5362318840579711,0.6095890410958904,0.5793103448275863,0.5159235668789809,0.6196319018404908,0.6143790849673203,0.5035971223021583,0.5897435897435898,0.5857142857142857,0.5851851851851851,0.6164383561643836,0.6054421768707483,0.5714285714285714,0.627906976744186,0.5826771653543307,0.6504065040650406,0.5864661654135338,0.6333333333333333,0.6851851851851852]
}}
suptitle_size = 24
title_size = 18
tick_size = 12
axis_label_size = 15
legend_size = 14
fig, ax = plt.subplots(figsize=(15,8))
fig.suptitle('PK by Sequence Order', fontsize=suptitle_size)
module_name = 'Test'
q_count = module_metrics[module]['q_count']
y_ticks = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
x_ticks = np.array([x for x in range(0,q_count)])
x_labels = x_ticks + 1
# Plot it
ax.set_title(module_name, fontsize=title_size)
ax.set_xticks(x_ticks)
ax.set_yticks(y_ticks)
ax.set_xticklabels(x_labels, fontsize=tick_size)
ax.set_yticklabels(y_ticks, fontsize=tick_size)
ax.set_xlabel('Sequence', fontsize=axis_label_size)
ax.set_xlim(-0.5,q_count-0.5)
ax.set_ylim(0,1)
ax.grid(which='major',axis='y')
# Output module PK by sequence
ax.plot(module_metrics[module]['sequence_pks'])
# Output PK by sequence for each question
for qid in module_metrics[module]['q_pks']:
    ax.plot(module_metrics[module]['q_pks'][qid], alpha=0.15, label=qid)
annot = ax.annotate("", xy=(0,0), xytext=(-20,20),textcoords="offset points", bbox=dict(boxstyle="round", fc="w"), arrowprops=dict(arrowstyle="->"))
annot.set_visible(False)
mpl_id = fig.canvas.mpl_connect('motion_notify_event', hover)
Since there are dozens of modules, I created an ipywidgets dropdown to select the module, which then runs a function to output the chart. Nonetheless, whether running it hardcoded as here or from within the function, mpl_connect never seems to fire.
Here's what this one looks like when run
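Below is a minimal sketch of the hover behavior described above, under some assumptions. The data and the names `series` and `lines` are hypothetical. The handler loops over every plotted line instead of relying on a single global `line` (the snippet as posted never defines `line` or `names`). The Agg backend is selected only so the sketch runs without a display; in a notebook you would keep `%matplotlib notebook` and drop the `matplotlib.use` call. It is not a diagnosis of why the events do not fire in the original setup.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Hypothetical per-question data, keyed by a short label.
series = {"q1": [0.6, 0.7, 0.65], "q2": [0.5, 0.55, 0.6]}
lines = [ax.plot(values, alpha=0.15, label=name)[0]
         for name, values in series.items()]

annot = ax.annotate("", xy=(0, 0), xytext=(-20, 20),
                    textcoords="offset points",
                    bbox=dict(boxstyle="round", fc="w"))
annot.set_visible(False)

def hover(event):
    # Find the first line under the cursor, if any.
    hit_line, hit_index = None, None
    if event.inaxes is ax:
        for line in lines:
            contained, info = line.contains(event)
            if contained:
                hit_line, hit_index = line, info["ind"][0]
                break
    # Highlight the hovered line and fade all the others.
    for line in lines:
        line.set_alpha(1.0 if line is hit_line else 0.15)
    # Show the label next to the hovered point, or hide the annotation.
    if hit_line is not None:
        x, y = hit_line.get_data()
        annot.xy = (x[hit_index], y[hit_index])
        annot.set_text(hit_line.get_label())
    annot.set_visible(hit_line is not None)
    fig.canvas.draw_idle()

mpl_id = fig.canvas.mpl_connect("motion_notify_event", hover)
```

Keeping every `Line2D` in a list means the handler does not depend on which line was plotted last, and the label shown comes from the line itself via `get_label()`.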

Issue when trying to plot geom_tile using ggplotly

I would like to plot a ggplot2 image using ggplotly.
What I am trying to do is first plot grey-filled rectangles without any aesthetic mapping, and then, in a second step, plot tiles and change their colors based on aesthetics. My code works when I use ggplot but crashes when I use ggplotly to make the graph interactive.
Here is some sample code:
library(ggplot2)
library(data.table)
library(plotly)
library(dplyr)
x = rep(c("1", "2", "3"), 3)
y = rep(c("K", "B","A"), each=3)
z = sample(c(NA,"A","L"), 9,replace = TRUE)
df <- data.table(x,y,z)
p<-ggplot(df)+
geom_tile(aes(x=x,y=y),width=0.9,height=0.9,fill="grey")
p<-p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p
But when I type this
ggplotly(p)
I get the following error
Error in `[.data.frame`(g, , c("fill_plotlyDomain", "fill")) :
  undefined columns selected
The versions I use are
> packageVersion("plotly")
[1] ‘4.7.1’
> packageVersion("ggplot2")
[1] ‘2.2.1.9000’
##########Edited example for Arthur
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill="G"),width=0.9,height=0.9)
p<- p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p<-p+ scale_fill_manual(
guide = guide_legend(title = "test",
override.aes = list(
fill =c("red","white") )
),
values = c("red","grey"),
labels=c("A",""))
p
This works
but ggplotly(p) adds the grey bar labeled G in the legend
The output of the ggplotly function is a list with the plotly class. It gets printed as a Plotly graph, but you can still work with it as a list. Moreover, the documentation indicates that modifying the list makes it possible to clear all or part of the legend. You only have to understand how the data is structured.
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
p2 <- ggplotly(p)
str(p2)
The global legend setting is in p2$x$layout$showlegend, and setting it to FALSE displays no legend at all.
The group-specific legend settings appear in each of the 9 p2$x$data elements, each with its own showlegend attribute. Only 3 of them are set to TRUE, corresponding to the 3 keys in the legend. The following loop thus clears all the undesired labels:
for(i in seq_along(p2$x$data)){
  if(p2$x$data[[i]]$legendgroup!='A'){
    p2$x$data[[i]]$showlegend <- FALSE
  }
}
Voilà!
This works here:
p <- ggplot(df)+
  geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
  scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
ggplotly(p)
I guess your problem comes from the use of 2 different data sources, df and filter(df,z=="A"), with columns with the same name.
[Note this is not an Answer Yet]
(Putting for reference, as it is beyond the limits for comments.)
The problem is rather complicated.
I just finished debugging the code of plotly. It seems like it's occurring here.
I have opened an issue on GitHub.
Here is the minimal code for the reproduction of the problem.
library(ggplot2)
set.seed(1503)
df <- data.frame(x = rep(1:3, 3),
y = rep(1:3, 3),
z = sample(c("A","B"), 9,replace = TRUE),
stringsAsFactors = F)
p1 <- ggplot(df)+
geom_tile(aes(x=x,y=y, fill="grey"), color = "black")
p2 <- ggplot(df)+
geom_tile(aes(x=x,y=y),fill="grey", color = "black")
class(plotly::ggplotly(p1))
#> [1] "plotly" "htmlwidget"
class(plotly::ggplotly(p2))
#> Error in `[.data.frame`(g, , c("fill_plotlyDomain", "fill")): undefined columns selected