how to display x and y error bars using openpyxl?

I am working on a project that requires generating xy plots in Excel. The x and y values have standard deviations associated with them, and I would like to display both the x and y standard deviations as error bars. I can get each of the error bars to display individually, but not both at the same time.
Below is an example of the code. Any suggestions would be greatly appreciated.
from openpyxl.chart import Reference, ScatterChart
from openpyxl.chart.data_source import NumDataSource, NumRef
from openpyxl.chart.error_bar import ErrorBars
from openpyxl.chart.series_factory import SeriesFactory
from openpyxl.chart.trendline import Trendline

# x values and their standard deviations
D201_avgX = Reference(resultsws, min_col=D201_avg_col, min_row=2, max_col=D201_avg_col, max_row=smprow_n)
D201_stdevX = NumDataSource(NumRef(Reference(resultsws, min_col=D201_stdev_col, min_row=2, max_col=D201_stdev_col, max_row=smprow_n)))
D201_stdev = ErrorBars(plus=D201_stdevX, minus=D201_stdevX, errBarType='both', errDir='x', errValType='cust')

# y values and their standard deviations
D199_avgY = Reference(resultsws, min_col=D199_avg_col, min_row=1, max_col=D199_avg_col, max_row=smprow_n)
D199_stdevY = NumDataSource(NumRef(Reference(resultsws, min_col=D199_stdev_col, min_row=1, max_col=D199_stdev_col, max_row=smprow_n)))
D199_stdev = ErrorBars(plus=D199_stdevY, minus=D199_stdevY, errBarType='both', errDir='y', errValType='cust')

D201_D199_ser = SeriesFactory(D199_avgY, D201_avgX, title_from_data=True)
D201_D199_ser.marker.symbol = 'circle'
D201_D199_ser.graphicalProperties.line.noFill = True
D201_D199_ser.errBars = D201_stdev
D201_D199_ser.errBars = D199_stdev  # this overwrites the x error bars assigned on the previous line

D201_D199_chart = ScatterChart()
D201_D199_chart.series.append(D201_D199_ser)
D201_D199_ser.trendline = Trendline(dispRSqr=True)
D201_D199_chart.x_axis.title = 'D201'
D201_D199_chart.y_axis.title = 'D199'
D201_D199_chart.height = 10
D201_D199_chart.width = 15

It's possible that the implementation isn't perfect: it's based on the specification, which isn't clear on some details. I suggest you create the relevant chart in Excel or similar and look at the generated XML. You should be able to create the same structure using openpyxl, which attempts to implement the spec faithfully. If there are any problems then please submit a bug report or pull request.
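When following the advice above, the thing to look for in the chart part (e.g. `xl/charts/chart1.xml` inside the saved workbook) is how many errBars elements the series carries. The fragment below is a hypothetical example of what Excel might write for a series with both x and y error bars (attributes trimmed for readability); the standard library is enough to inspect it:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a chart's XML for a scatter series with both
# x and y error bars; element names are from the DrawingML chart namespace.
NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
fragment = f"""
<ser xmlns="{NS}">
  <errBars>
    <errDir val="x"/><errBarType val="both"/><errValType val="cust"/>
  </errBars>
  <errBars>
    <errDir val="y"/><errBarType val="both"/><errValType val="cust"/>
  </errBars>
</ser>
"""

ser = ET.fromstring(fragment)
# Collect the direction of each errBars element in the series.
directions = [bars.find(f"{{{NS}}}errDir").get("val")
              for bars in ser.findall(f"{{{NS}}}errBars")]
print(directions)  # ['x', 'y']
```

Comparing that against what openpyxl writes for your chart should show whether the second `errBars` assignment has replaced the first.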

Related

Gseapy: how to get gene list used for each pathway

I am running an enrichment analysis with gseapy enrichr on a list of genes.
I am using the following code:
enr_res = gseapy.enrichr(gene_list=glist[:5000],
                         organism='Mouse',
                         gene_sets=['GO_Biological_Process_2021'],
                         description='pathway',
                         # cutoff=0.5
                         )
The result looks like this:
enr_res.results.head(10)
The question I have is: how do I get the full set of genes (the rightmost column in the picture) used for the individual pathways?
If I try the following code, it just gives me the genes that are already displayed. I added some clean-up steps to get a list that I could then use further in the analysis.
x = 'fatty acid beta-oxidation (GO:0006635)'
g_list = enr_res.results[enr_res.results.Term == x]['Genes'].to_string()
deliminator = ';'
g_list = [section + deliminator for section in g_list.split(deliminator) if section]
g_list = [s.replace(';', '') for s in g_list]
g_list = [s.replace(' ', '') for s in g_list]
g_list = [s.replace('.', '') for s in g_list]
first_gene = g_list[0:1]
first_gene = [sub[1 : ] for sub in first_gene]
g_list[0:1] = first_gene
for i in range(len(g_list)):
    g_list[i] = g_list[i].lower()
for i in range(len(g_list)):
    g_list[i] = g_list[i].capitalize()
g_list
I think my approach might be wrong and I am only getting the displayed genes. Does somebody have an idea how to get all the genes?
pd.set_option('display.max_colwidth', 3000)
This increases the number of characters that pandas displays, and that solved the problem for me. :)
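Once the full string is visible, the multi-step clean-up above can be collapsed into a single split on the semicolon. A minimal stdlib sketch (the gene string is invented for illustration; the real one comes from the 'Genes' column):

```python
def split_genes(gene_string):
    """Split a ';'-separated gene string and normalise each symbol
    to the capitalised form used in the clean-up above."""
    return [g.strip().capitalize() for g in gene_string.split(";") if g.strip()]

# Hypothetical 'Genes' cell content for one pathway
print(split_genes("ACAA1;ACAA2; ACADL ;HADHB"))  # ['Acaa1', 'Acaa2', 'Acadl', 'Hadhb']
```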

django model migrate: 3 problems (default, max_length, CharField)

I have a model like this. I ran the migration (the run itself was a success), but then hit the problems below.
class light_status(models.Model):
    class Meta:
        db_table = 'light_status'

    idx = models.AutoField(primary_key=True, max_length=11, null=False)
    did = models.IntegerField(max_length=11, null=False)
    onoff = models.CharField(max_length=3, default='on')
    level = models.IntegerField(max_length=3, default=100)
    ct = models.IntegerField(max_length=4)
    r = models.IntegerField(max_length=3)
    g = models.IntegerField(max_length=3)
    b = models.IntegerField(max_length=3)
    hue = models.IntegerField(max_length=4)
    sat = models.IntegerField(max_length=4)
    bright = models.IntegerField(max_length=4)
    x = models.IntegerField(max_length=4)
    y = models.IntegerField(max_length=4)
    update = models.DateTimeField()
And look at this result (screenshot of the generated table omitted).
There are three problems:
1. No default values are entered.
2. All IntegerField sizes are fixed at 11.
3. All CharField columns are converted to VARCHAR. (Isn't there a way to use CHAR?)
I would like to know the reasons for these three problems and how to solve them. Thank you for teaching me.
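For context on the three points: Django applies 'default' in Python when a row is saved and its migrations do not keep a DEFAULT clause in the database schema; 'max_length' is not a documented option of IntegerField, so it is ignored and MySQL falls back to its default INT(11) display width; and CharField always maps to VARCHAR. Getting a fixed-length CHAR column would need a custom field that overrides db_type(). A minimal sketch of that idea, using a plain-Python stand-in (a real field would subclass django.db.models.CharField):

```python
# Plain-Python stand-in for a custom Django field; in a real project this
# would subclass django.db.models.CharField and only override db_type().
class FixedCharField:
    def __init__(self, max_length):
        self.max_length = max_length

    def db_type(self, connection=None):
        # Django asks the field for its column DDL via db_type(); returning
        # CHAR(n) here yields a fixed-length column instead of VARCHAR(n).
        return "char(%d)" % self.max_length

print(FixedCharField(3).db_type())  # char(3)
```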

mplcursors on multiaxis graph

In my program, I'm using mplcursors on a matplotlib graph so I can identify certain points precisely.
mplcursors.cursor(multiple=True).connect("add", lambda sel: sel.annotation.draggable(False))
Now I made a complex graph with multiple axes:
first = 1
offset = 60
for x in range(len(cat_list)):
    if "Time" not in cat_list[x]:
        if first and not cat_list[x].startswith("EngineSpeed"):
            parasites[x] = ParasiteAxes(host, sharex=host)
            host.parasites.append(parasites[x])
            parasites[x].axis["right"].set_visible(True)
            parasites[x].set_ylabel(cat_list[x])
            parasites[x].axis["right"].major_ticklabels.set_visible(True)
            parasites[x].axis["right"].label.set_visible(True)
            p_plot, = parasites[x].plot(t, t_num_list[x], label=cat_list[x])
            # parasites[x].axis["right" + str(x + 1)].label.set_color(p_plot.get_color())
            parasites[x].axis["right"].label.set_color(p_plot.get_color())
            first = 0
        elif not cat_list[x].startswith("EngineSpeed"):
            parasites[x] = ParasiteAxes(host, sharex=host)
            host.parasites.append(parasites[x])
            parasites[x].set_ylabel(cat_list[x])
            new_axisline = parasites[x].get_grid_helper().new_fixed_axis
            parasites[x].axis["right" + str(x + 1)] = new_axisline(loc="right",
                                                                   axes=parasites[x],
                                                                   offset=(offset, 0))
            p_plot, = parasites[x].plot(t, t_num_list[x])
            parasites[x].axis["right" + str(x + 1)].label.set_color(p_plot.get_color())
            offset = offset + 60
host.legend()
fig.add_axes(host)
plt.show()
This code results in the following graph:
https://i.stack.imgur.com/Wl7yC.png
Now I have to somehow be able to select certain points on a chosen axis. How do I make a selection menu for choosing an active axis, and how do I then use mplcursors to select my points?
Thanks,
Ziga

Is non-identical not enough to be considered 'distinct' for kmeans centroids?

I have an issue with kmeans clustering when providing centroids. I saw the same problem already asked (K-means: Initial centers are not distinct), but the solution in that post is not working in my case.
I selected the centroids using ClusterR::KMeans_arma. I confirmed that my centroids are not identical using mgcv::uniquecombs, but I still get the 'initial centers are not distinct' error.
> dim(t(dat))
[1] 13540 11553
> centroids = ClusterR::KMeans_arma(data = t(dat), centers = 561,
n_iter = 50, seed_mode = "random_subset",
verbose = FALSE, CENTROIDS = NULL)
> dim(centroids)
[1] 561 11553
> x = mgcv::uniquecombs(centroids)
> dim(x)
[1] 561 11553
> res = kmeans(t(dat), centers = centroids, iter.max = 200)
Error in kmeans(t(dat), centers = centroids, iter.max = 200) :
initial centers are not distinct
Any suggestion to resolve this? Thanks!
I replicated the issue you've mentioned with the following data:
cols = 13540
rows = 11553
set.seed(1)
vec_dat = runif(rows * cols)
dat = matrix(vec_dat, nrow = rows, ncol = cols)
dim(dat)
dat = t(dat)
dim(dat)
There is no 'centers' parameter in the 'ClusterR::KMeans_arma()' function, so I've assumed you actually mean 'clusters':
centroids = ClusterR::KMeans_arma(data = dat,
                                  clusters = 561,
                                  n_iter = 50,
                                  seed_mode = "random_subset",
                                  verbose = TRUE,
                                  CENTROIDS = NULL)
str(centroids)
dim(centroids)
The 'centroids' object is a matrix of class "k-means clustering". If your intention is to obtain the cluster assignments, you can use:
clust = ClusterR::predict_KMeans(data = dat,
                                 CENTROIDS = centroids,
                                 threads = 6)
length(unique(clust))  # 561
class(centroids)       # "k-means clustering"
If you want to pass the 'centroids' to the base R 'kmeans' function, you first have to set the 'class' of the 'centroids' object to NULL. That is because the base R 'kmeans' function internally uses the base R 'duplicated()' function (you can view this by entering print(kmeans) in the R console), which does not recognize the 'centroids' object as a matrix or data.frame (it is an object of class "k-means clustering") and therefore performs the check column-wise rather than row-wise. The following should work for your case:
class(centroids) = NULL
dups = duplicated(centroids)
sum(dups) # this should actually give 0
res = kmeans(dat, centers = centroids, iter.max = 200)
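The row-wise versus column-wise distinction described above can be mimicked outside R. A rough Python sketch (not the R internals, just an illustration of why the same data can pass one duplicate check and fail the other):

```python
# Two distinct centroids that happen to share coordinate values.
centroids = [[1.0, 2.0], [2.0, 1.0]]

# Row-wise check (what kmeans needs): the rows are distinct.
row_dups = sum(row in centroids[:i] for i, row in enumerate(centroids))

# Element-wise check over the flattened values (roughly what happens when
# duplicated() does not recognise the object as a matrix): false positives.
flat = [v for row in centroids for v in row]
elem_dups = sum(v in flat[:i] for i, v in enumerate(flat))

print(row_dups, elem_dups)  # 0 2
```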
I've made a few adjustments to "ClusterR::predict_KMeans()": in particular I've added the "threads" parameter and a check for duplicates. If you want to obtain the clusters using multiple cores, you have to install the package from GitHub using:
remotes::install_github('mlampros/ClusterR',
                        upgrade = 'always',
                        dependencies = TRUE,
                        repos = 'https://cloud.r-project.org/')
The changes will take effect in the next version of the CRAN package, which will be 1.2.2.
UPDATE regarding output and performance (based on your comment):
data(dietary_survey_IBS, package = 'ClusterR')

kmeans_arma = function(data) {
  km_cl = ClusterR::KMeans_arma(data,
                                clusters = 2,
                                n_iter = 10,
                                seed_mode = "random_subset",
                                seed = 1)
  pred_cl = ClusterR::predict_KMeans(data = data,
                                     CENTROIDS = km_cl,
                                     threads = 1)
  return(pred_cl)
}

km_arma = kmeans_arma(data = dietary_survey_IBS)
km_algos = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

for (algo in km_algos) {
  cat('base-kmeans-algo:', algo, '\n')
  km_base = kmeans(dietary_survey_IBS,
                   centers = 2,
                   iter.max = 10,
                   nstart = 1,      # can be set to 5 or 10 etc.
                   algorithm = algo)
  km_cl = as.vector(km_base$cluster)
  print(table(km_arma, km_cl))
  cat('--------------------------\n')
}

microbenchmark::microbenchmark(kmeans(dietary_survey_IBS,
                                      centers = 2,
                                      iter.max = 10,
                                      nstart = 1,      # can be set to 5 or 10 etc.
                                      algorithm = algo),
                               kmeans_arma(data = dietary_survey_IBS),
                               times = 100)
I don't see any significant difference in the output clusters between the base R 'kmeans' and the 'kmeans_arma' function for any of the available base R 'kmeans' algorithms (you can test this on your own data sets as well). I am not sure which algorithm the 'armadillo' library uses internally, and the base R 'kmeans' additionally includes the 'nstart' parameter (you can consult the documentation for more info). Regarding performance, you won't see any substantial difference for small to medium data sets. But because the armadillo library uses OpenMP internally, on a machine with more than one core I think the 'ClusterR::KMeans_arma' function will return the 'centroids' faster for big data sets.

bnlearn error in structural.em

I got an error when trying to use structural.em from the "bnlearn" package.
This is the code:
cut.learn <- structural.em(cut.df, maximize = "hc",
                           maximize.args = "restart",
                           fit = "mle", fit.args = list(),
                           impute = "parents", impute.args = list(),
                           return.all = FALSE,
                           max.iter = 5, debug = FALSE)
Error in check.data(x, allow.levels = TRUE, allow.missing = TRUE,
warn.if.no.missing = TRUE, : at least one variable has no observed
values.
Has anyone had the same problem? If so, please tell me how to fix it.
Thank you.
I got structural.em working. I am currently working on a Python interface to bnlearn that I call pybnl. I also ran into the problem you describe above.
Here is a Jupyter notebook that shows how to use structural.em from Python on the marks data.
The gist of it is described in slides-bnshort.pdf on page 135, "The MARKS Example, Revisited".
You have to create an initial fit with an initial imputed data frame by hand, and then provide the arguments to structural.em like so (ldmarks is the latent-discrete-marks data frame, where the LAT column contains only missing/NA values):
library(bnlearn)
data('marks')
dmarks = discretize(marks, breaks = 2, method = "interval")
ldmarks = data.frame(dmarks, LAT = factor(rep(NA, nrow(dmarks)), levels = c("A", "B")))
imputed = ldmarks
# Randomly set values of the unobserved variable in the imputed data.frame
imputed$LAT = sample(factor(c("A", "B")), nrow(dmarks), replace = TRUE)
# Fit the parameters over an empty graph
dag = empty.graph(nodes = names(ldmarks))
fitted = bn.fit(dag, imputed)
# Although we've set imputed values randomly, nonetheless override them with a uniform distribution
fitted$LAT = array(c(0.5, 0.5), dim = 2, dimnames = list(c("A", "B")))
# Use whitelist to enforce arcs from the latent node to all others
r = structural.em(ldmarks, fit = "bayes", impute = "bayes-lw", start = fitted,
                  maximize.args = list(whitelist = data.frame(from = "LAT",
                                                              to = names(dmarks))),
                  return.all = TRUE)
You have to use bnlearn 4.4-20180620 or later, because it fixes a bug in the underlying impute function.