Related
I'm trying to create an empty pandas.Dataframe with a Multi-Index that I can later fill columnwise with my data. I've looked at other answers (here and here), but they all work with data that does not fill in columnwise, or that is somehow connected in the different columns.
The information I want to be contained in the Multi-Index looks like this:
GCM_list = ['BCC-CSM2-MR', 'CAMS-CSM1-0', 'CESM2', 'CESM2-WACCM', 'CMCC-CM2-SR5', 'EC-Earth3', 'EC-Earth3-Veg', 'FGOALS-f3-L', 'GFDL-ESM4', 'INM-CM4-8', 'INM-CM5-0', 'MPI-ESM1-2-HR', 'MRI-ESM2-0', 'NorESM2-MM', 'TaiESM1']
SSP_list = ['SSP_126', 'SSP_245', 'SSP_370', 'SSP_585']
index_years = [2030, 2040, 2050, 2060, 2070, 2080, 2090, 2100]
And I want it to look somewhat like this (for the three first items in GCM_list):
BCC-CSM2-MR CAMS-CSM1-0 CESM2
SSP_126 SSP_245 SSP_370 SSP_585 SSP_126 SSP_245 SSP_370 SSP_585 SSP_126 SSP_245 SSP_370 SSP_585
2030 | |
2040 | |
2050 V V
2060 1 2
2070
2080
2090
2100
The "arrows" in the first two columns should represent how and in what order I want to fill the Dataframe after the Index is created - if that's important for this question.
I've tried building the index like this, but I'm not sure what to make of the result. How should I proceed? Is there a way to build this empty dataframe so that I can fill it column after column?
arrays = [GCM_list, SSP_list]
index = pd.MultiIndex.from_arrays(arrays, names=('GCM', 'SSP'))
>>> index
MultiIndex(levels=[[u'BCC-CSM2-MR', u'CAMS-CSM1-0', u'CESM2', u'CESM2-WACCM', u'CMCC-CM2-SR5', u'EC-Earth3', u'EC-Earth3-Veg', u'FGOALS-f3-L', u'GFDL-ESM4', u'INM-CM4-8', u'INM-CM5-0', u'MPI-ESM1-2-HR', u'MRI-ESM2-0', u'NorESM2-MM', u'TaiESM1'], [u'SSP_126', u'SSP_245', u'SSP_370', u'SSP_585']],
labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14], [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]],
names=[u'GCM', u'SSP'])
Use MultiIndex.from_product:
arrays = [GCM_list, SSP_list]
mux = pd.MultiIndex.from_product(arrays, names=('GCM', 'SSP'))
df = pd.DataFrame(columns=mux, index=index_years)
enter image description hereenter image description hereI have a large dataframe with muliple columns and rows. They are grouped by geographic location and date. The problem is I have too many columns with dates. I think I need to further develop this dataframe so that I have: "GeographyCode", "Number of Awards", "Secondary School Stage", "SCQF Level" and "DateCode" as single rows. I do not know if my data can be used for Scikit Learn Linear Regression. Please help.
pivot02.columns
MultiIndex(levels=[['Number Of Awards', 'SCQF Level', 'Secondary School Stage'], ['2002/2003', '2003/2004', '2004/2005', '2005/2006', '2006/2007', '2007/2008', '2008/2009', '2009/2010', '2010/2011', '2011/2012', '2012/2013']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]],
names=[None, 'DateCode'])
I have successfuly grouped geographic location, Number Of Awards', 'SCQF Level', 'Secondary School Stage. But the final output is a multi index which I do not know if I can use in linear regression. Is this ok for machine learning?
I'm learning Bayesian data analysis. I try to replicate the tutorials by Trond Reitan by stan, which are originally created by WinBugs.
Specifically, I have following data and model
weta.windata<-list(numdet=c(0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 1, 2, 0, 3, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 0, 3, 1, 1, 3, 1, 1, 2, 0, 2, 1, 1, 1, 1,0, 0, 0, 2, 0, 2, 4, 3, 1, 0, 0, 2, 0, 2, 2, 1, 0, 0, 1),
numvisit=c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,4, 4, 4, 4, 4, 4, 4 ,4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
nsites=72)
model_string1="
data{
int nsites;
real<lower=0> numdet[nsites];
real<lower=0> numvisit[nsites];
}
parameters{
real<lower=0> p;
real<lower=0> psi;
int<lower=0> z[nsites];
}
model{
p~uniform(0,1);
psi~uniform(0,1);
for(i in 1:nsites){
z[i]~ bernoulli(psi);
p.site[i]~z[i]*p;
numdet[i]~binomial(numvisit[i],p.site[i]);
}
}
"
mcmc_samples <- stan(model_code=model_string1,
data=weta.windata,
pars=c("p","psi","z"),
chains=3, iter=30000, warmup=10000)
The context is about detecting wetas in fields. There are 72 sites. for each site, researchers visited several times (i.e., numvisit) and recorded the number of times weta found (i.e., numdet).
There is a latent variable z, describing whether one site has weta or not. psi is the probability that one site has weta. p is the detection rate.
The problem I have is I can not declare z to be integers
parameters or transformed parameters cannot be integer or integer array; found declared type int, parameter name=z
Problem with declaration.
However, if I set z to be real, that is,
real<lower=0> z[nsites];
the problem becomes I cannot set the variable from bernoulli as integer...
No matches for:
real ~ bernoulli(real)
I'm very new to stan. Forgive me if this question is very silly.
Stan doesn't support integer parameters or hacks to let you pretend real variables are integers. What it does support is marginalizing the integer variables out of the density. You can then reconstruct them with much more efficiency and much higher tail resolution.
The chapter in the manual on latent discrete parameters is the place to start. It includes an implementation of the CJS population models, which may be familiar. I implemented the Dorazio and Royle occupance models as a case study and Hiroki Ito translated the entire Kery and Schaub book to Stan. They're all linked under users >> documentation on the web site.
I ran into this mysterious error with ulam while answering practice problems in Statistical Rethinking. When you're constructing a list to pass to the data argument to ulam be sure to use = rather than <- for assignment. If you don't the list you construct won't have named components, and a missing name produces this error.
Say I have input data with variable length sequences loaded into memory:
sentences = [
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5 ],
[0, 1, 2, 3, 4, 5 ]
]
How can I use this to fill a queue? E.g. something like:
padding_q = tf.PaddingFIFOQueue(
capacity=len(sentences),
dtypes=[tf.int32], shapes=[[None]])
qr = tf.train.QueueRunner(padding_q, [the_wanted_op])
How does the_wanted_op look like? It should enqueue one sentence yet four enqueues must have enqueued each sentence once.
I have a pandas dataframe with a multiindex. Unfortunately one of the indices gives years as a string
e.g. '2010', '2011'
how do I convert these to integers?
More concretely
MultiIndex(levels=[[u'2010', u'2011'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, , ...]], names=[u'Year', u'Month'])
.
df_cbs_prelim_total.index.set_levels(df_cbs_prelim_total.index.get_level_values(0).astype('int'))
seems to do it, but not inplace. Any proper way of changing them?
Cheers,
Mike
Will probably be cleaner to do this before you assign it as index (as #EdChum points out), but when you already have it as index, you can indeed use set_levels to alter one of the labels of a level of your multi-index. A bit cleaner as your code (you can use index.levels[..]):
In [165]: idx = pd.MultiIndex.from_product([[1,2,3], ['2011','2012','2013']])
In [166]: idx
Out[166]:
MultiIndex(levels=[[1, 2, 3], [u'2011', u'2012', u'2013']],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
In [167]: idx.levels[1]
Out[167]: Index([u'2011', u'2012', u'2013'], dtype='object')
In [168]: idx = idx.set_levels(idx.levels[1].astype(int), level=1)
In [169]: idx
Out[169]:
MultiIndex(levels=[[1, 2, 3], [2011, 2012, 2013]],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
You have to reassign it to save the changes (as is done above, in your case this would be df_cbs_prelim_total.index = df_cbs_prelim_total.index.set_levels(...))