bokeh.factor_mark() not mapping color even though the code runs without any errors?

I was trying to colormap by using a categorical column like the code below:
#Define input
x = df[x_column]
y = df[y_column]
#z = df[colors_list]
#Mapping color
#colormap = {'working': 'red', 'idle': 'green', 'no tray': 'blue','take item out': 'yellow','take tray out': 'black', 'put tray in':'gray'}
#colors_list = [colormap[i] for i in df['state']]
#Unique State
STATE = sorted(df.state.unique())
markers = ['circle_cross', 'circle_dot', 'circle_x', 'circle_y', 'cross', 'dash']
#groupby STATE
group = df.groupby(by=['state'])
index_cmap = factor_mark('state',
markers=markers,
factors=STATE,
)
#MARKERS = ['hex', 'circle_x', 'triangle']
#Source data
source = ColumnDataSource(data=dict(x=x, y=y))
# Set up plot
plot = figure(plot_height=700, plot_width=1000, title="Weight by Time",
x_axis_label="Time",
y_axis_label="Weights",
x_axis_type="datetime", x_axis_location="above",
background_fill_color="#fafafa",
x_range=[min(x), max(x)],
y_range=[min(y)-0.5, max(y)+1],)
plot.line(x='x',y='y',source=source)
plot.scatter(x='x', y='y', source=source,legend_field="STATE", fill_color=index_cmap, fill_alpha=0.5, size=4,)
However, the factor_mark() function does not seem to be working properly:
I was expecting the scatter plot to look like this:
I have read through the bokeh.transform documentation but still haven't quite figured out why this is happening. Your help will be greatly appreciated.
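A minimal sketch of one way to get both colors and markers from the state column, assuming the usual bokeh.transform helpers. factor_mark() maps categories to marker shapes, so it belongs in the marker argument; factor_cmap() is the matching helper for colors. Both look the field up in the ColumnDataSource, so the source needs a state column (the source above only has x and y, and legend_field="STATE" names a column that is not in it). The small data frame, palette and marker list here are made up for illustration.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import factor_cmap, factor_mark

# Hypothetical stand-in for the data frame in the question
df = pd.DataFrame({
    "time": pd.date_range("2021-01-01", periods=6, freq="D"),
    "weight": [1.0, 1.2, 0.9, 1.5, 1.1, 1.3],
    "state": ["idle", "working", "idle", "no tray", "working", "no tray"],
})

states = sorted(df["state"].unique())
markers = ["circle", "square", "triangle"]  # one marker per state (assumed choice)
palette = ["green", "blue", "red"]          # one color per state (assumed choice)

# The transforms read the 'state' column from the source, so it must be included
source = ColumnDataSource(data=dict(x=df["time"], y=df["weight"], state=df["state"]))

plot = figure(title="Weight by Time", x_axis_type="datetime",
              x_axis_label="Time", y_axis_label="Weights")
plot.line(x="x", y="y", source=source)
plot.scatter(
    x="x", y="y", source=source, size=8, fill_alpha=0.5,
    marker=factor_mark("state", markers, states),  # category -> marker shape
    color=factor_cmap("state", palette, states),   # category -> color
    legend_group="state",                          # one legend entry per state
)
```

Calling show(plot) from bokeh.plotting would then render it. If only colors are wanted, the marker argument can be dropped and factor_cmap() kept.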

Related

How to "bring to front" points plotted on a boxplot

As stated, I am trying to "bring to front" the points in the boxplot as shown in the below graph.
current plot
Here is the code I used to create the graph.
ggboxplot(dose2, x = "Group", y = "WTNeut", ylab = "NT50", palette = "jco", add = "point",
color = "black")+ scale_x_discrete(labels=c("Control","Infliximab","Ifx+Thio","Thiopurine","Tofacitinib","Ustekinumab","Vedolizumab"))+
theme(axis.text.x = element_text(angle=22.5, hjust=1))+
geom_boxplot(aes(fill = Group))+
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)), limits = c(1e-1, 1e6))
I have tried moving add = "point" to the front or to the back of the call, but it still didn't work.
Any thoughts would be gratefully appreciated! Thank you!

Choropleth us-states map showing incorrect data and color coding

I am trying to show hospitals by type in US states. The dataset I am using is here https://www.kaggle.com/carlosaguayo/usa-hospitals
I am using a choropleth, and here is my code. I basically have a dropdown with the type of hospital, and when one is selected, I get the count.
@app.callback(Output('figure-1', 'figure'),
              [Input('options-drop', 'value')])
def make_figure(varname):
    mygraphtitle = f'Hospitals of {varname}'
    mycolorscale = 'Blues'
    mycolorbartitle = "Count"
    data=go.Choropleth(
        locations=df['STATE'],
        locationmode = 'USA-states',
        z = df[df["TYPE"] == varname]["STATE"].value_counts(),
        colorscale = mycolorscale,
        colorbar_title = mycolorbartitle,
    )
    fig = go.Figure(data)
    fig.update_layout(
        title_text = mygraphtitle,
        geo_scope='usa',
        width=1200,
        height=800
    )
    return fig
I have 3 issues with the graph:
Data is not being shown for all states.
The data shown for a few states is incorrect.
Color coding is incorrect even for those states with incorrect data. A state with a higher hospital count is shown in lighter blue, whereas one with a lower count is shown in darker blue.
As you can see below from pandas, Texas has only 65 critical access hospitals, but the US map shows the count as 78. Even if it were 78, Texas is colored light blue compared with other states that have fewer hospitals. Where am I going wrong?
I have not verified this in the Dash environment, but I believe the behavior will be the same. The cause is that locations is set to the STATE column of the entire data frame, while z holds one count per state taken from the filtered data, so the two do not line up. The easiest fix is to take both the locations and the values from the filtered data frame.
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('./data/Hospitals.csv')
varname = 'CRITICAL ACCESS'
filtered_df = df[df["TYPE"] == varname]["STATE"].value_counts().to_frame('value')
#print(filtered_df)
fig = go.Figure(data=go.Choropleth(
locations=filtered_df.index,
z = filtered_df['value'],
locationmode = 'USA-states',
colorscale = 'Blues',
colorbar_title = "Count",
))
fig.update_layout(
title_text = f'Hospitals of {varname}',
geo_scope='usa',
width=1200,
height=800
)
fig.show()
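To see why the original pairing goes wrong, here is a small sketch using a made-up, seven-row version of the hospitals table. It assumes the choropleth matches locations and z position by position; the zip() calls only imitate that pairing and are not how Plotly is implemented.

```python
import pandas as pd

# Hypothetical miniature of the hospitals table (not the Kaggle data)
df = pd.DataFrame({
    "STATE": ["NY", "CA", "TX", "TX", "CA", "TX", "TX"],
    "TYPE": ["CRITICAL ACCESS"] * 6 + ["GENERAL ACUTE CARE"],
})
varname = "CRITICAL ACCESS"

# One count per state, sorted from most to fewest hospitals: TX 3, CA 2, NY 1
counts = df[df["TYPE"] == varname]["STATE"].value_counts()

# Pairing from the question: one location per hospital row, one z per state
mismatched = list(zip(df["STATE"].tolist(), counts.tolist()))

# Pairing from the answer: labels and values come from the same Series
aligned = list(zip(counts.index.tolist(), counts.tolist()))

print(mismatched)
print(aligned)
```

In the mismatched pairing, the largest count lands on whichever state happens to be in the first row of the table, which would explain both the wrong numbers and the apparently inverted colors.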

Confidence Interval for large dataset

I would like to get a confidence interval for a very large dataset. It consists of around 700,000 points each for x and y. I also tried using less data, around 200 points, and with that the interval is plotted. But with my full dataset, the confidence interval does not show up.
For that, my code is based on:
x_x = np.array(y_test[:, 0]) #about 700,000 points
y_y = np.array(y_pred[:, 0]) #about 700,000 points
sns.set(style = 'whitegrid')
p = sns.FacetGrid(d, size = 4, aspect = 1.5)
p.map(plt.scatter, 'x_x', 'y_y', color = 'red')
p.map(sns.regplot, 'x_x', 'y_y', scatter = False, ci = 95,
fit_reg = True, color = 'blue')
p.map(sns.regplot, 'x_x', 'y_y', scatter = False, ci = 0,
fit_reg = True, color = 'darkgreen')
And also the Figure so far:
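One possible explanation, which the post does not confirm, is that the band is being drawn but is too narrow to see: the uncertainty of a fitted line shrinks roughly with the square root of the number of points. The sketch below uses a normal approximation at the mean of x, with an assumed residual standard deviation of 1.0. seaborn estimates its band by bootstrapping, so this is only a rough guide to the scale, not what regplot computes.

```python
import math

def ci_half_width_at_mean(residual_sd, n, z=1.96):
    """Approximate 95% CI half-width of a fitted line at the mean of x."""
    return z * residual_sd / math.sqrt(n)

# Same (assumed) noise level, two sample sizes from the question
small = ci_half_width_at_mean(residual_sd=1.0, n=200)
large = ci_half_width_at_mean(residual_sd=1.0, n=700_000)

print(f"n=200:     +/-{small:.4f}")
print(f"n=700,000: +/-{large:.4f}")
print(f"ratio:     {small / large:.1f}x narrower")
```

If that is what is happening, zooming in on the regression line, or fitting on a random subsample, should make the band visible again.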

bnlearn error in structural.em

I got an error when trying to use structural.em in the "bnlearn" package.
This is the code:
cut.learn<- structural.em(cut.df, maximize = "hc",
+ maximize.args = "restart",
+ fit="mle", fit.args = list(),
+ impute = "parents", impute.args = list(), return.all = FALSE,
+ max.iter = 5, debug = FALSE)
Error in check.data(x, allow.levels = TRUE, allow.missing = TRUE,
warn.if.no.missing = TRUE, : at least one variable has no observed
values.
If anyone has had the same problem, please tell me how to fix it.
Thank you.
I got structural.em working. I am currently working on a Python interface to bnlearn that I call pybnl. I also ran into the problem you describe above.
Here is a jupyter notebook that shows how to use StructuralEM from python marks.
The gist of it is described in slides-bnshort.pdf on page 135, "The MARKS Example, Revisited".
You have to create an initial fit with an initial imputed data frame by hand and then pass it to structural.em like so (ldmarks is the latent-discrete-marks data frame where the LAT column only contains missing/NA values):
library(bnlearn)
data('marks')
dmarks = discretize(marks, breaks = 2, method = "interval")
ldmarks = data.frame(dmarks, LAT = factor(rep(NA, nrow(dmarks)), levels = c("A", "B")))
imputed = ldmarks
# Randomly set values of the unobserved variable in the imputed data.frame
imputed$LAT = sample(factor(c("A", "B")), nrow(dmarks), replace = TRUE)
# Fit the parameters over an empty graph
dag = empty.graph(nodes = names(ldmarks))
fitted = bn.fit(dag, imputed)
# Although we've set imputed values randomly, nonetheless override them with a uniform distribution
fitted$LAT = array(c(0.5, 0.5), dim = 2, dimnames = list(c("A", "B")))
# Use whitelist to enforce arcs from the latent node to all others
r = structural.em(ldmarks, fit = "bayes", impute="bayes-lw", start=fitted, maximize.args=list(whitelist = data.frame(from = "LAT", to = names(dmarks))), return.all = TRUE)
You have to use bnlearn 4.4-20180620 or later, because it fixes a bug in the underlying impute function.

When creating a table underneath the axis of a plot, is there a way to add some whitespace between the axis and the table using matplotlib?

I am trying to create a table inside a plot, right underneath the plot's axis, using matplotlib.
I am using the plt.table function to do this.
However, when I create the table, it is placed right on top of the axis and therefore overlaps the axis labels.
Is there a way to create some white space in between?
The code looks something like this:
for key, value in arrayToPlot.iteritems():
    ax1 = fig.add_subplot(2, 2, 1)
    if value["HorErr"]:
        cdf = []
        #calculate percentile stats for the value["HorErr"]
        cdfArrayPointer[key]["HorErr"]["percentileStats"]=libMath.percentileForListofPercentiles( value["HorErr"], PERCENTILE, validPointsOnly = True )
        # now calculate the cdf values
        cdfArrayPointer[key]["HorErr"]["cdf"] = libMath.cdf( value["HorErr"], 2, 400, validPointsOnly = True)
        for k, v in cdfArrayPointer[key]["HorErr"]["cdf"].iteritems():
            cdf.append( v )
        #plot the cdf value
        ax1.plot(cdf, 'o-', label = ('HorErr for ' + str( key) ), color = getColour(key), markersize=3)
        plt.title("CDF Plot of 2D-Horizontal Error", size = 8)
        plt.ylabel('Percentile %', size = 7)
        plt.xlabel('Horizontal Error [m]', size = 6)
        plt.axis([0, 150, 0, 110])
        leg = plt.legend(loc = 4)
        setLegendSize( leg, 7)
        # creating the table to be drawn on the axis
        tableTexts["rows"].append(key)
        tableTexts["rowColour"].append(getColour(key))
        if (len(tableTexts["col"]) == 0):
            tableTexts["col"] = tuple(cdfArrayPointer[key]["HorErr"]["percentileStats"].keys())
        tableTexts["values"].append(cdfArrayPointer[key]["HorErr"]["percentileStats"].values())
the_table = plt.table(cellText=tableTexts["values"], rowLabels= tableTexts["rows"], rowColours= tableTexts["rowColour"] ,colLabels= tableTexts["col"], loc="bottom")
What about breaking your figure up using subplot?
You could have the axis in one subplot, and the table in another. (See example near bottom of page here)
I can illustrate further if you can't follow.
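For what it's worth, the subplot idea could look roughly like the sketch below: the curves go in the upper subplot, and the table goes in a lower subplot whose own axis is switched off, so the gap between them is controlled by hspace. The run names, percentile labels and values are invented for illustration and stand in for the tableTexts data in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data standing in for the question's percentile stats
col_labels = ["50%", "67%", "95%"]
row_labels = ["run A", "run B"]
cell_text = [["12.0", "18.5", "44.0"], ["10.5", "16.0", "39.5"]]
curves = {"run A": [0, 40, 70, 95, 100], "run B": [0, 50, 80, 97, 100]}

# Two stacked subplots: a tall one for the plot, a short one for the table.
# hspace sets the white space between the x-axis labels and the table.
fig, (ax, ax_table) = plt.subplots(
    2, 1, gridspec_kw={"height_ratios": [3, 1], "hspace": 0.4}
)

for name, cdf in curves.items():
    ax.plot(cdf, "o-", label="HorErr for " + name, markersize=3)
ax.set_title("CDF Plot of 2D-Horizontal Error", size=8)
ax.set_xlabel("Horizontal Error [m]", size=6)
ax.set_ylabel("Percentile %", size=7)
ax.legend(loc=4)

# The lower subplot only hosts the table, so hide its frame and ticks
ax_table.axis("off")
the_table = ax_table.table(
    cellText=cell_text, rowLabels=row_labels, colLabels=col_labels, loc="center"
)
```

Calling fig.savefig(...) or plt.show() afterwards renders the result; increasing hspace widens the gap.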