Pandas apply on index of dataframe with a multi-index - pandas

I have a dataframe with an index like
MultiIndex(levels=[['A', 'B', 'C', 'D', 'E', 'F', 'G'], [0, 1]],
labels=[[0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6], [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=['I1', 'I2'])
Now I would like to apply a function to index I1.
If it were a simple column I would do something like
df['I1'] = df['I1'].apply(lambda x: ...)
How can I apply this to an Index in a df with a multi-index?

I believe you need rename:
df = df.rename(lambda x: 'a' + x, level=0)
Or use Index.get_level_values to select a level of the MultiIndex, map it, and then build a new index with MultiIndex.from_arrays:
idx = df.index.get_level_values('I1').map(lambda x: 'a' + x)
df.index = pd.MultiIndex.from_arrays([idx, df.index.get_level_values('I2')])
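because set_levels expects the unique values of the level rather than one value per row, so passing the mapped values raises an error:
df.index = df.index.set_levels(idx, level='I1')
ValueError: Level values must be unique: ['aA', 'aA', 'aB', 'aB', 'aC', 'aD', 'aD', 'aE', 'aE', 'aF', 'aF', 'aG', 'aG'] on level 0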
Sample:
# `codes` was called `labels` before pandas 0.24
mux = pd.MultiIndex(levels=[['A', 'B', 'C', 'D', 'E', 'F', 'G'], [0, 1]],
codes=[[0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6], [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=['I1', 'I2'])
df = pd.DataFrame([0] * 13, index=mux, columns=['a'])
df = df.rename(lambda x: 'a' + x, level=0)
print(df)
       a
I1 I2
aA 0   0
   1   0
aB 0   0
   1   0
aC 1   0
aD 0   0
   1   0
aE 0   0
   1   0
aF 0   0
   1   0
aG 0   0
   1   0
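For comparison, a level can also be transformed through its unique values: set_levels takes one value per unique label, so mapping df.index.levels[0] instead of get_level_values avoids the "Level values must be unique" error. This is a minimal sketch, assuming the same sample data rebuilt with MultiIndex.from_arrays; the names via_rename and via_levels are illustrative only, and the mapping function is assumed to keep the level values unique.

```python
import pandas as pd

# Same index as the sample, built from one array per level.
i1 = ['A', 'A', 'B', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'F', 'G', 'G']
i2 = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
mux = pd.MultiIndex.from_arrays([i1, i2], names=['I1', 'I2'])
df = pd.DataFrame({'a': [0] * 13}, index=mux)

# Approach from the answer: rename one level with a function.
via_rename = df.rename(lambda x: 'a' + x, level=0)

# Alternative: map the unique level values and put them back with set_levels.
via_levels = df.copy()
new_level = via_levels.index.levels[0].map(lambda x: 'a' + x)
via_levels.index = via_levels.index.set_levels(new_level, level='I1')

print(via_levels.index.get_level_values('I1').tolist())
```

Both variants leave the original df untouched and return the same index values; rename is shorter, while set_levels only calls the function once per unique label.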

Related

how to split numpy array by step?

Example:
I have array:
[3, 0, 5, 0, 7, 0, 3, 1]
I want to split it like this:
[3, 5, 7, 3]
[0, 0, 0, 1]
Or a more understandable example:
['a1', 'a2', 'b1', 'b2'] -- > ['a1', 'b1'] and ['a2', 'b2']
You can do this with array slicing.
arr = np.array([3, 0, 5, 0, 7, 0, 3, 1])
A = arr[::2]
B = arr[1::2]
See the NumPy documentation on slicing for details.
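The same slicing idea extends to any step. A short sketch, assuming the array length is a multiple of the step; the variable names step, parts and parts_2d are made up for illustration:

```python
import numpy as np

arr = np.array([3, 0, 5, 0, 7, 0, 3, 1])
step = 2  # number of interleaved sequences (assumed to divide len(arr))

# One slice per offset: arr[0::2], arr[1::2], ...
parts = [arr[offset::step] for offset in range(step)]

# Equivalent result via reshape: each row of the transposed array is one part.
parts_2d = arr.reshape(-1, step).T

print(parts)
print(parts_2d)
```

The list of slices works for any length, whereas the reshape variant needs the length to divide evenly by step.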

how to change value based on criteria pandas

I have a following problem. I have this df:
d = {'id': [1, 1, 2, 2, 3], 'value': [0, 1, 0, 0, 1]}
df = pd.DataFrame(data=d)
I would like to have a new column whose value is 1 for every row of an id if any row with that id has value 1. See the desired output:
d = {'id': [1, 1, 2, 2, 3], 'value': [0, 1, 0, 0, 1], 'newvalue': [1, 1, 0, 0, 1]}
df = pd.DataFrame(data=d)
How can I do it please?
If you need to set 0/1 by a condition (here, at least one 1 per id), use GroupBy.transform with GroupBy.any to build the mask, then cast to integers to map True/False to 1/0:
df['newvalue'] = df['value'].eq(1).groupby(df['id']).transform('any').astype(int)
Alternative:
df['newvalue'] = df['id'].isin(df.loc[df['value'].eq(1), 'id']).astype(int)
Or, if only the values 0 and 1 are possible, simplify the solution by taking the maximum value per group for the new column:
df['newvalue'] = df.groupby('id')['value'].transform('max')
print (df)
   id  value  newvalue
0   1      0         1
1   1      1         1
2   2      0         0
3   2      0         0
4   3      1         1
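The restriction to 0/1 values matters for the max-based shortcut. The sketch below uses hypothetical data (not from the question) in which id 1 contains a 2 but no 1, to show where the two approaches diverge; the column names has_one and group_max are illustrative:

```python
import pandas as pd

# Hypothetical data: id 1 contains a 2 but no 1.
df = pd.DataFrame({'id': [1, 1, 2, 2, 3], 'value': [0, 2, 0, 0, 1]})

# Flag groups that contain at least one value equal to 1.
df['has_one'] = df['value'].eq(1).groupby(df['id']).transform('any').astype(int)

# Group maximum: matches has_one only when values are restricted to 0 and 1.
df['group_max'] = df.groupby('id')['value'].transform('max')

print(df)
```

Here has_one is 0 for id 1 while group_max is 2, so the explicit eq(1) mask is the safer choice when other values can occur.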

Some array indexing in numpy

lookup = np.array([60, 40, 50, 60, 90])
The values in the following arrays are equal to indices of lookup.
a = np.array([1, 2, 0, 4, 3, 2, 4, 2, 0])
b = np.array([0, 1, 2, 3, 3, 4, 1, 2, 1])
c = np.array([4, 2, 1, 4, 4, 0, 4, 4, 2])
array   1st column element   lookup value
a       1               -->  40
b       0               -->  60
c       4               -->  90
Maximum is 90.
So, first element of result is 4.
This way,
expected result = array([4, 2, 0, 4, 4, 4, 4, 4, 0])
How to get it?
I tried as:
d = np.vstack([a, b, c])
print (d)
res = lookup[d]
res = np.max(res, axis = 0)
print (d[enumerate(lookup)])
I got this error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Do you want this?
d = np.vstack([a,b,c])
# option 1
rows = lookup[d].argmax(0)
d[rows, np.arange(d.shape[1])]
# option 2
(lookup[:,None] == lookup[d].max(0)).argmax(0)
Output:
array([4, 2, 0, 4, 4, 4, 4, 4, 0])
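The two options agree on this data but can differ when lookup contains duplicate values (60 appears at positions 0 and 3). A runnable sketch, with the function names pick_by_row and pick_by_value and the single-column array tie invented for illustration:

```python
import numpy as np

lookup = np.array([60, 40, 50, 60, 90])
a = np.array([1, 2, 0, 4, 3, 2, 4, 2, 0])
b = np.array([0, 1, 2, 3, 3, 4, 1, 2, 1])
c = np.array([4, 2, 1, 4, 4, 0, 4, 4, 2])
d = np.vstack([a, b, c])

def pick_by_row(lookup, d):
    # Option 1: find the row holding the largest lookup value in each column,
    # then read the index stored at that position of d.
    rows = lookup[d].argmax(0)
    return d[rows, np.arange(d.shape[1])]

def pick_by_value(lookup, d):
    # Option 2: find the first position in lookup equal to each column maximum.
    return (lookup[:, None] == lookup[d].max(0)).argmax(0)

print(pick_by_row(lookup, d))
print(pick_by_value(lookup, d))

# Hypothetical column whose best index is 3 (lookup value 60, duplicated at 0).
tie = np.array([[3], [1], [2]])
print(pick_by_row(lookup, tie), pick_by_value(lookup, tie))
```

On the tie column, option 1 returns 3 (an index that is actually present in the column) while option 2 returns 0, the first position in lookup holding 60.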

tensorflow expand counts into ranges

We have a Tensor of unknown length N, containing some int32 values.
How can we generate another Tensor that contains N ranges concatenated together, each one running from 0 up to (but not including) the corresponding int32 value from the original tensor?
For example, if we have [4, 4, 5, 3, 1], the output Tensor should look like [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0].
Thank you for any advice.
You can make this work with a tensor as input by using a tf.RaggedTensor which can contain dimensions of non-uniform length.
# Or any other N length tensor
tf_counts = tf.convert_to_tensor([4, 4, 5, 3, 1])
tf.print(tf_counts)
# [4 4 5 3 1]
# Create a ragged tensor, each row is a sequence of length tf_counts[i]
tf_ragged = tf.ragged.range(tf_counts)
tf.print(tf_ragged)
# <tf.RaggedTensor [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2], [0]]>
# Read values
tf.print(tf_ragged.flat_values, summarize=-1)
# [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0]
For this 2-dimensional case the ragged tensor tf_ragged is a "matrix" of rows with varying length:
[[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2],
[0]]
Check tf.ragged.range for more options on how to create the sequences on each row: starts for inclusive lower limits, limits for exclusive upper limit, deltas for increment. Each may vary for each sequence.
Also mind that the dtype of the tf_counts tensor will propagate to the final values.
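If you want to have everything as a TensorFlow object, then use tf.range() along with tf.concat(). The .eval() calls below assume TensorFlow 1.x with a default session (for example tf.InteractiveSession()).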
In [88]: vals = [4, 4, 5, 3, 1]
In [89]: tf_range = [tf.range(0, limit=item, dtype=tf.int32) for item in vals]
# concat all `tf_range` objects into a single tensor
In [90]: concatenated_tensor = tf.concat(tf_range, 0)
In [91]: concatenated_tensor.eval()
Out[91]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
There are other approaches as well. Here, I assume that you want a constant tensor, but you can construct any tensor once you have the full range list.
First, we construct the full range list using a list comprehension, make a flat list out of it, and then construct a tensor.
In [78]: from itertools import chain
In [79]: vals = [4, 4, 5, 3, 1]
In [80]: range_list = list(chain(*[range(item) for item in vals]))
In [81]: range_list
Out[81]: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
In [82]: const_tensor = tf.constant(range_list, dtype=tf.int32)
In [83]: const_tensor.eval()
Out[83]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
On the other hand, we can also use tf.range() but then it returns an array when you evaluate it. So, you'd have to construct the list from the arrays and then make a flat list out of it and finally construct the tensor as in the following example.
list_of_arr = [tf.range(0, limit=item, dtype=tf.int32).eval() for item in vals]
range_list = list(chain(*[arr.tolist() for arr in list_of_arr]))
# output
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
const_tensor = tf.constant(range_list, dtype=tf.int32)
const_tensor.eval()
#output tensor as numpy array
array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
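As a cross-check of the expected output that does not need a TensorFlow session, the same expansion can be written with plain NumPy arithmetic. This is only a sketch of the idea (subtract each range's start offset from a global counter), not one of the TensorFlow answers above; the names counts, starts and ranges are illustrative:

```python
import numpy as np

counts = np.array([4, 4, 5, 3, 1], dtype=np.int32)

# Start position of each range inside the flattened output: [0, 4, 8, 13, 16]
starts = np.cumsum(counts) - counts

# Subtract each element's range start from its global position.
ranges = np.arange(counts.sum()) - np.repeat(starts, counts)

print(ranges)
```

This avoids a Python loop over the counts, which may help when N is large.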

Convert string to integer pandas dataframe index

I have a pandas dataframe with a multiindex. Unfortunately one of the indices gives years as a string
e.g. '2010', '2011'
how do I convert these to integers?
More concretely
MultiIndex(levels=[[u'2010', u'2011'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, , ...]], names=[u'Year', u'Month'])
df_cbs_prelim_total.index.set_levels(df_cbs_prelim_total.index.get_level_values(0).astype('int'))
seems to do it, but not inplace. Any proper way of changing them?
Cheers,
Mike
It will probably be cleaner to do this before you assign it as the index (as @EdChum points out), but when you already have it as the index, you can indeed use set_levels to alter the values of one level of your multi-index. This is a bit cleaner than your code (you can use index.levels[..]):
In [165]: idx = pd.MultiIndex.from_product([[1,2,3], ['2011','2012','2013']])
In [166]: idx
Out[166]:
MultiIndex(levels=[[1, 2, 3], [u'2011', u'2012', u'2013']],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
In [167]: idx.levels[1]
Out[167]: Index([u'2011', u'2012', u'2013'], dtype='object')
In [168]: idx = idx.set_levels(idx.levels[1].astype(int), level=1)
In [169]: idx
Out[169]:
MultiIndex(levels=[[1, 2, 3], [2011, 2012, 2013]],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
You have to reassign it to save the changes (as is done above; in your case this would be df_cbs_prelim_total.index = df_cbs_prelim_total.index.set_levels(...)).
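Applied to a DataFrame, the reassignment could look like the following sketch. The frame, its value column and the three months are made up for illustration; only the Year/Month level names come from the question:

```python
import pandas as pd

# Hypothetical frame with a string 'Year' level and an integer 'Month' level.
idx = pd.MultiIndex.from_product([['2010', '2011'], [0, 1, 2]],
                                 names=['Year', 'Month'])
df = pd.DataFrame({'value': range(6)}, index=idx)

# Convert the unique level values to int and assign the new index back.
df.index = df.index.set_levels(df.index.levels[0].astype(int), level='Year')

print(df.index.get_level_values('Year').tolist())
print(df.loc[(2011, 1), 'value'])
```

After the conversion, rows are selected with integer years such as (2011, 1) rather than string keys.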