Function like `enumerate` for arrays with custom indices?

For an array with a non-one-based index, like:
using OffsetArrays
a = OffsetArray([1,2,3], -1)
Is there a simple way to get a tuple of (index,value), similar to enumerate?
Enumerating still counts the elements... collect(enumerate(a)) returns:
3-element Array{Tuple{Int64,Int64},1}:
(1, 1)
(2, 2)
(3, 3)
I'm looking for:
(0, 1)
(1, 2)
(2, 3)

The canonical solution is to use pairs:
julia> a = OffsetArray( [1,2,3], -1);
julia> for (i, x) in pairs(a)
           println("a[", i, "]: ", x)
       end
a[0]: 1
a[1]: 2
a[2]: 3
julia> b = [1,2,3];
julia> for (i, x) in pairs(b)
           println("b[", i, "]: ", x)
       end
b[1]: 1
b[2]: 2
b[3]: 3
It works for other types of collections too:
julia> d = Dict(:a => 1, :b => 2, :c => 3);
julia> for (i, x) in pairs(d)
           println("d[:", i, "]: ", x)
       end
d[:a]: 1
d[:b]: 2
d[:c]: 3
You can find a lot of other interesting iterators by reading the documentation of Base.Iterators.

Try eachindex(a) to get the indices; see the example below:
julia> tuple.(eachindex(a),a)
3-element OffsetArray(::Array{Tuple{Int64,Int64},1}, 0:2) with eltype Tuple{Int64,Int64} with indices 0:2:
(0, 1)
(1, 2)
(2, 3)

Related

using agg to flatten a series of lists in pandas

I have a number of multi-index columns each with a list of tuples that I want to flatten (the list, not the tuples) but I'm struggling with it. Here's what I have:
df = pd.DataFrame([[[(1,'a')],[(6,'b')],np.nan,np.nan],[[(5,'d'),(10,'e')],np.nan,np.nan,[(8,'c')]]])
df.columns = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])
>>> df
                   a                   b
                   0         1        0         1
0           [(1, a)]  [(6, b)]      NaN       NaN
1  [(5, d), (10, e)]       NaN      NaN  [(8, c)]
Desired result:
>>> df
                        a              b
0        [(1, a), (6, b)]     [NaN, NaN]
1  [(5, d), (10, e), NaN]  [NaN, (8, c)]
How do I do this? From this related question, I tried the following:
>>> df.stack(level=1).groupby(level=[0]).agg(lambda x: np.array(list(x)).flatten())
a b
0 a b
1 a b
>>> df.stack(level=1).groupby(level=[0]).agg(lambda x: np.concatenate(list(x)))
...
Exception: Must produce aggregated value
Here's a way to do it:
# taken from https://stackoverflow.com/questions/12472338/flattening-a-list-recursively
def flatten(S):
    if S == []:
        return S
    if isinstance(S[0], list):
        return flatten(S[0]) + flatten(S[1:])
    return S[:1] + flatten(S[1:])
# reshape the data to get the desired structure
df2 = (df
       .unstack()
       .reset_index()
       .drop(columns='level_1')
       .groupby(['level_0', 'level_2'])[0]
       .apply(list)
       .apply(flatten)
       .unstack()
       .T)
df2.index.name = None
df2.columns.name = None
print(df2)
                       a             b
0       [(1, a), (6, b)]      [na, na]
1  [(5, d), (10, e), na]  [na, (8, c)]
Found a one-liner:
Using the flatten custom function given by @YOLO:
>>> df.stack(level=1).groupby(level=0).agg(list).applymap(flatten)
                        a              b
0        [(1, a), (6, b)]     [nan, nan]
1  [(5, d), (10, e), nan]  [nan, (8, c)]
where
def flatten(S):
    if S == []:
        return S
    if isinstance(S[0], list):
        return flatten(S[0]) + flatten(S[1:])
    return S[:1] + flatten(S[1:])
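For completeness, the same flattening can be sketched without stack/groupby by walking the top-level column labels directly (the flat helper below is illustrative, not part of the answers above; NaN cells are kept as scalars):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[[(1, 'a')], [(6, 'b')], np.nan, np.nan],
                   [[(5, 'd'), (10, 'e')], np.nan, np.nan, [(8, 'c')]]])
df.columns = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])

def flat(cells):
    # each cell is either a list of tuples or a scalar NaN
    out = []
    for cell in cells:
        out.extend(cell if isinstance(cell, list) else [cell])
    return out

# for each top-level label, flatten that label's sub-columns row by row
result = pd.DataFrame(
    {top: [flat(df.loc[i, top]) for i in df.index]
     for top in df.columns.get_level_values(0).unique()},
    index=df.index)
```

This avoids the recursion of flatten and keeps the row index unchanged.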

Selecting values with Pandas multiindex using lists of tuples

I have a DataFrame with a MultiIndex with 3 levels:
                 col1
id foo bar
0  1   a    -0.225873
   2   a    -0.275865
   2   b    -1.324766
3  1   a    -0.607122
   2   a    -1.465992
   2   b    -1.582276
   3   b    -0.718533
7  1   a    -1.904252
   2   a     0.588496
   2   b    -1.057599
   3   a     0.388754
   3   b    -0.940285
Preserving the id index level, I want to sum along the foo and bar levels, but with different values for each id.
For example, for id = 0 I want to sum over foo = [1] and bar = ["a", "b"]; for id = 3, over foo = [2] and bar = ["a", "b"]; and for id = 7, over foo = [1, 2] and bar = ["a"]. Giving the result:
         col1
id
0   -0.225873
3   -3.048268
7   -1.315756
I have been trying something along these lines:
df.loc(axis=0)[[(0, 1, ["a", "b"]), (3, 2, ["a", "b"]), (7, [1, 2], "a")]].sum()
Not sure if this is even possible. Any elegant solution (possibly removing the MultiIndex?) would be much appreciated!
The list of tuples is not the problem. The problem is that each tuple does not correspond to a single index entry (a list isn't a valid key). If you want to index a DataFrame like this, you need to expand the lists inside each tuple into their own entries.
Define your options like the following list of dictionaries, then transform using a list comprehension and index using all individual entries.
d = [
    {
        'id': 0,
        'foo': [1],
        'bar': ['a', 'b']
    },
    {
        'id': 3,
        'foo': [2],
        'bar': ['a', 'b']
    },
    {
        'id': 7,
        'foo': [1, 2],
        'bar': ['a']
    },
]
all_idx = [
    (el['id'], i, j)
    for el in d
    for i in el['foo']
    for j in el['bar']
]
# [(0, 1, 'a'), (0, 1, 'b'), (3, 2, 'a'), (3, 2, 'b'), (7, 1, 'a'), (7, 2, 'a')]
df.loc[all_idx].groupby(level=0).sum()
         col1
id
0   -0.225873
3   -3.048268
7   -1.315756
A more succinct solution using slicers:
sections = [(0, 1, slice(None)), (3, 2, slice(None)), (7, slice(1,2), "a")]
pd.concat(df.loc[s] for s in sections).groupby("id").sum()
         col1
id
0   -0.225873
3   -3.048268
7   -1.315756
Two things to note:
This may be less memory-efficient than the accepted answer, since pd.concat creates a new DataFrame.
The slice(None) entries are mandatory; otherwise the index levels of the df.loc[s] results don't match up when calling pd.concat.
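For reference, the expansion approach can be sketched end to end on hypothetical data (the values below are made up so the sums are easy to check; note that newer pandas raises KeyError when an expanded tuple is missing from the index, hence the intersection guard):

```python
import pandas as pd

# hypothetical values: the same index shape as the question, col1 = 0..11
idx = pd.MultiIndex.from_tuples(
    [(0, 1, 'a'), (0, 2, 'a'), (0, 2, 'b'),
     (3, 1, 'a'), (3, 2, 'a'), (3, 2, 'b'), (3, 3, 'b'),
     (7, 1, 'a'), (7, 2, 'a'), (7, 2, 'b'), (7, 3, 'a'), (7, 3, 'b')],
    names=['id', 'foo', 'bar'])
df = pd.DataFrame({'col1': range(12)}, index=idx)

d = [{'id': 0, 'foo': [1], 'bar': ['a', 'b']},
     {'id': 3, 'foo': [2], 'bar': ['a', 'b']},
     {'id': 7, 'foo': [1, 2], 'bar': ['a']}]
all_idx = [(el['id'], i, j) for el in d for i in el['foo'] for j in el['bar']]

# guard against expanded combinations that don't exist in the index
# ((0, 1, 'b') here); df.loc[all_idx] would raise KeyError on them
existing = df.index.intersection(all_idx)
res = df.loc[existing].groupby(level=0).sum()
```

The groupby(level=0) then collapses foo and bar, leaving one summed row per id.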

Cumulative Z op throws a "The iterator of this Seq is already in use/consumed by another Seq"

This is another way of solving a previous question:
my @bitfields;
for ^3 -> $i {
    @bitfields[$i] = Bool.pick xx 3;
}
my @total = [\Z+] @bitfields;
say @total;
It should zip-add every row to the next one, accumulating the values. However, this yields the error
The iterator of this Seq is already in use/consumed by another Seq
(you might solve this by adding .cache on usages of the Seq, or
by assigning the Seq into an array)
in block <unit> at vanishing-total.p6 line 8
Any idea how this might be solved?
First, xx creates a Seq:
say (Bool.pick xx 3).^name; # Seq
So you probably want to turn that into an Array (or List).
for ^3 -> $i {
    @bitfields[$i] = [Bool.pick xx 3];
}
Also rather than .pick xx 3, I would use .roll(3).
for ^3 -> $i {
    @bitfields[$i] = [Bool.roll(3)];
}
The zip (Z) meta operator creates Sequences as well.
say ( [1,2] Z [3,4] ).perl;
# ((1, 3), (2, 4)).Seq
say ( [1,2] Z+ [3,4] ).perl
# (4, 6).Seq
So [\Z+] won't even work the way you want for two inputs.
say [\Z+]( [1,2], [3,4] ).perl;
# (Seq.new-consumed(), Seq.new-consumed()).Seq
say [\Z+]( 1, 2 ).perl;
# (Seq.new-consumed(), Seq.new-consumed()).Seq
It does work if you do something to cache the intermediate values.
say [\Z+]( [1,2], [3,4] ).map(*.cache).perl
# ((3,), (4, 6)).Seq
say [\Z+]( [1,2], [3,4] ).map(*.list).perl
# ((3,), (4, 6)).Seq
say [\Z+]( [1,2], [3,4] ).map(*.Array).perl
# ([3], [4, 6]).Seq
You might want to also add a list to the front, and a .skip.
my @bitfields = [
    [Bool::True,  Bool::True,  Bool::False],
    [Bool::False, Bool::False, Bool::True ],
    [Bool::False, Bool::True,  Bool::True ],
];
say [\Z+]( @bitfields ).map(*.List)
# ((2) (1 1 1) (1 2 2))
say [\Z+]( (0,0,0), |@bitfields ).map(*.List).skip
# ((1 1 0) (1 1 1) (1 2 2))
If you don't need the intermediate results [Z+] would work just fine.
say [Z+]( Bool.roll(3) xx 3 ) for ^5;
# (0 1 3)
# (1 2 1)
# (1 0 3)
# (0 1 2)
# (1 2 2)

Filtering elements at odd indexes

I want to write a function that keeps only the elements at odd indexes:
(filter-idx '(0 1 2 3 4)) => '(1 3)
(filter-idx '(#\a #\b #\c (0 1))) => '(#\b (0 1))
So I wrote this, but it doesn't work:
(define (filter-idx xs)
  (memf (lambda (x)
          (= (remainder x 2) 1))
        xs))
You need to handle the indexes and the elements separately; memf tests the elements themselves (and returns the tail of the list starting at the first match), so it never sees an index and fails on non-numeric elements like #\a. This is one way to do it:
(define (filter-idx xs)
  (for/list ([i (range (length xs))] ; iterate over indexes of elements
             [x xs]                  ; iterate over elements of list
             #:when (odd? i))        ; keep only elements at odd indexes
    x))
For example:
(filter-idx '(0 1 2 3 4))
=> '(1 3)
(filter-idx '(#\a #\b #\c (0 1)))
=> '(#\b (0 1))

Fill pandas fields with tuples as elements by slicing

Sorry if this question has been asked before, but I did not find it here or elsewhere.
I want to fill some of the fields of a column with tuples. Currently I have to resort to:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4]})
df['b'] = ''
df['b'] = df['b'].astype(object)
mytuple = ('x','y')
for l in df[df.a % 2 == 0].index:
    df.set_value(l, 'b', mytuple)
with df being (which is what I want)
   a       b
0  1
1  2  (x, y)
2  3
3  4  (x, y)
This does not look very elegant to me and probably not very efficient. Instead of the loop, I would prefer something like
df.loc[df.a % 2 == 0, 'b'] = np.array([mytuple] * sum(df.a % 2 == 0), dtype=tuple)
which (of course) does not work. How can I improve my above method by using slicing?
In [57]: df.loc[df.a % 2 == 0, 'b'] = pd.Series([mytuple] * len(df.loc[df.a % 2 == 0])).values
In [58]: df
Out[58]:
   a       b
0  1
1  2  (x, y)
2  3
3  4  (x, y)
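Note that set_value was removed in pandas 1.0 (df.at[l, 'b'] = mytuple is the per-cell replacement). A sketch of the same sliced assignment on current pandas, using an index-aligned Series instead of .values (the column must already be object dtype so cells can hold tuples):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
df['b'] = ''
df['b'] = df['b'].astype(object)  # object dtype so cells can hold tuples
mytuple = ('x', 'y')

mask = df.a % 2 == 0
# one tuple per masked row; the Series index lines up with the masked rows,
# so each matching cell receives the whole tuple instead of being unpacked
df.loc[mask, 'b'] = pd.Series([mytuple] * mask.sum(), index=df.index[mask])
```

Assigning the bare tuple directly (df.loc[mask, 'b'] = mytuple) would try to broadcast its elements, which is why the tuple is wrapped in a Series cell by cell.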