Passing Tuple to a function via apply - pandas

I am trying to run the function below, which takes two points:
pointA = (2, 3)
pointB = (4, 5)
def Somefunc(pointA, pointB):
    x = pointA[0] + pointB[1]
    return x
Now, when I try to create a separate column based on this function, it throws errors like cannot convert the series to <class 'float'>, so I tried this:
df['T'] = df.apply(Somefunc((df['A'].apply(lambda x: float(x)), df['B'].apply(lambda x: float(x))),
                            (df['C'].apply(lambda x: float(x)), df['D'].apply(lambda x: float(x)))), axis=0)
Sample dataframe below:
A  B  C  D
1  2  3  5
2  4  7  8
4  7  9  0
Any help will be appreciated.

This is the best guess I can make as to what you're trying to do:
df['T'] = df.apply(lambda row: [(row['A'], row['B']), (row['C'], row['D'])], axis=1)
Edit: to apply your function:
df['T'] = df.apply(lambda row: Somefunc((row['A'], row['B']), (row['C'], row['D'])), axis=1)
That being said, the same result can be achieved much more quickly and idiomatically like so:
>>> df
   A  B  C  D
0  2  7  3  3
1  3  1  5  7
2  2  0  6  2
3  3  9  5  9
4  0  2  3  7
>>> df['T'] = df.apply(tuple, axis=1)
>>> df
   A  B  C  D             T
0  2  7  3  3  (2, 7, 3, 3)
1  3  1  5  7  (3, 1, 5, 7)
2  2  0  6  2  (2, 0, 6, 2)
3  3  9  5  9  (3, 9, 5, 9)
4  0  2  3  7  (0, 2, 3, 7)
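Since Somefunc only reads pointA[0] and pointB[1], the row-wise apply above reduces to adding two columns, so the same result can be computed vectorized. A minimal sketch, assuming the A-D columns from the sample dataframe (astype(float) stands in for the per-value float conversions):

# Somefunc((A, B), (C, D)) returns A + D, so skip apply entirely
# and let pandas add the columns element-wise.
df['T'] = df['A'].astype(float) + df['D'].astype(float)

This avoids calling a Python function once per row, which matters on large frames.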

Related

create column based on column values - merge integers

I would like to create a new column "Group". The integer values from column "Step_ID" should be converted into 1 and 2: the first two values should be converted to 1, the second two values to 2, the third two values to 1, and so on.
import pandas as pd
data = {'Step_ID': [1, 1, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 11]}
df1 = pd.DataFrame(data)
You can try:
m = (df1.Step_ID % 2) + df1.Step_ID
df1['new_group'] = (m.ne(m.shift()).cumsum() % 2).replace(0, 2)
Output:
    Step_ID  new_group
0         1          1
1         1          1
2         2          1
3         2          1
4         3          2
5         4          2
6         5          1
7         6          1
8         6          1
9         7          2
10        8          2
11        8          2
12        9          1
13       10          1
14       11          2
15       11          2
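The trick is easier to see with the intermediate values printed out. A short walk-through, as a sketch using the df1 from the question:

import pandas as pd

df1 = pd.DataFrame({'Step_ID': [1, 1, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 11]})

# Round every odd Step_ID up to the next even number, so the pairs
# (1, 2), (3, 4), (5, 6), ... collapse onto a single value.
m = (df1.Step_ID % 2) + df1.Step_ID
print(m.tolist())  # [2, 2, 2, 2, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 12, 12]

# Number each run of equal values, then alternate between 1 and 2.
runs = m.ne(m.shift()).cumsum()
print(runs.tolist())  # [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6]
df1['new_group'] = (runs % 2).replace(0, 2)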

Concatenate all combinations of sub-level columns in a pandas DataFrame

Given the following DataFrame:
cols = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])
example = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], columns=cols)
example
   A     B
   a  b  a   b
0  0  1  2   3
1  4  5  6   7
2  8  9  10  11
I would like to end up with the following one:
    A   B
0   0   2
1   4   6
2   8  10
3   0   3
4   4   7
5   8  11
6   1   2
7   5   6
8   9  10
9   1   3
10  5   7
11  9  11
I used this code:
concatenated = pd.DataFrame([])
for A_sub_col in ('a', 'b'):
    for B_sub_col in ('a', 'b'):
        new_frame = example[[('A', A_sub_col), ('B', B_sub_col)]]
        new_frame.columns = ['A', 'B']
        concatenated = pd.concat([concatenated, new_frame])
However, I strongly suspect that there is a more straightforward, idiomatic way to do this with pandas. How would one go about it?
Here's an option using a list comprehension:
pd.concat([
    example[[('A', i), ('B', j)]].droplevel(level=1, axis=1)
    for i in example['A'].columns
    for j in example['B'].columns
]).reset_index(drop=True)
Output:
    A   B
0   0   2
1   4   6
2   8  10
3   0   3
4   4   7
5   8  11
6   1   2
7   5   6
8   9  10
9   1   3
10  5   7
11  9  11
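The nested comprehension can also be written with itertools.product, which may read a little better if more sub-columns are involved; a sketch over the same example frame:

from itertools import product

pd.concat([
    example[[('A', i), ('B', j)]].droplevel(level=1, axis=1)
    for i, j in product(example['A'].columns, example['B'].columns)
]).reset_index(drop=True)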
Here is one way. Not sure how much more Pythonic it is, and it is definitely less readable :-), but on the other hand it does not use explicit loops:
(example
 .apply(lambda c: [list(c)])
 .stack(level=1)
 .apply(lambda c: [list(c)])
 .explode('A')
 .explode('B')
 .apply(pd.Series.explode)
 .reset_index(drop=True)
)
To understand what's going on, it helps to run this one step at a time (see the commented sketch after the output below), but the end result is:
    A   B
0   0   2
1   4   6
2   8  10
3   0   3
4   4   7
5   8  11
6   1   2
7   5   6
8   9  10
9   1   3
10  5   7
11  9  11
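Here is the same chain with each stage commented, which may help when stepping through it; a sketch, assuming the example frame from the question:

# Collapse each column into a single cell holding that column's values as a list.
step = example.apply(lambda c: [list(c)])
# Move the sub-level ('a'/'b') into the row index; cells are still lists.
step = step.stack(level=1)
# Collapse again: one row where 'A' and 'B' each hold both sub-column variants.
step = step.apply(lambda c: [list(c)])
# Explode twice to form the cartesian product of the 'A' and 'B' variants.
step = step.explode('A').explode('B')
# Unpack each list of row values back into individual rows, aligned per row.
step = step.apply(pd.Series.explode)
print(step.reset_index(drop=True))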

Append dataframe in specific row

I have a dataframe in the following format:
a  b  label
1  5  A
2  6  A
3  7  A
4  8  B
1  5  B
2  6  B
5  6  C
3  2  C
I want to append a new dataframe:
a  b  label
3  4  A
The result becomes this:
a  b  label
1  5  A
2  6  A
3  7  A
4  8  B
1  5  B
2  6  B
5  6  C
3  2  C
3  4  A  <-- New Data
My question is: how can I get this ordering every time I append new data?
a  b  label
1  5  A
2  6  A
3  7  A
3  4  A  <-- New Data
4  8  B
1  5  B
2  6  B
5  6  C
3  2  C
This is my code:
import pandas as pd
df1 = pd.DataFrame({"a": [1, 2, 3, 4, 1, 2, 5, 3],
                    "b": [5, 6, 7, 8, 5, 6, 6, 2],
                    "label": ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C']})
new_data = pd.DataFrame({"a": [3],
                         "b": [4],
                         "label": ['A']})
df1 = df1.append(new_data, ignore_index=True)
You can simply sort on the label column after the append:
import pandas as pd
df1 = pd.DataFrame({"a": [1, 2, 3, 4, 1, 2, 5, 3],
                    "b": [5, 6, 7, 8, 5, 6, 6, 2],
                    "label": ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C']})
new_data = pd.DataFrame({"a": [3],
                         "b": [4],
                         "label": ['A']})
df1 = df1.append(new_data, ignore_index=True).sort_values(by='label')
Result:
a  b  label
1  5  A
2  6  A
3  7  A
3  4  A  <-- new data here
4  8  B
1  5  B
2  6  B
5  6  C
3  2  C
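Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. A sketch of the same approach with pd.concat, using a stable sort so the appended row is guaranteed to land after the existing rows with the same label:

import pandas as pd

# pd.concat replaces the removed DataFrame.append
df1 = pd.concat([df1, new_data], ignore_index=True)
# kind='stable' preserves insertion order within equal labels
df1 = df1.sort_values(by='label', kind='stable').reset_index(drop=True)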

groupby list of lists of indexes

I have a list of np.arrays representing indexes of a pandas dataframe.
I need to group by index to get one group per array.
Let's say this is the df:
index  values
    0       2
    1       3
    2       2
    3       2
    4       4
    5       4
    6       1
    7       4
    8       4
    9       4
And this is the list of np.arrays:
[array([0, 1, 2, 3]), array([6, 7, 8])]
From this data I expect to get 2 groups, without loop operations, as a single groupby object:
group1:
index  values
    0       2
    1       3
    2       2
    3       2
group2:
index  values
    6       1
    7       4
    8       4
I would stress again that, in the end, I need a single groupby object.
Thank you!
This still uses a for-loop to create the groupby key dict:
import numpy as np
import pandas as pd
from collections import ChainMap

l = [np.array([0, 1, 2, 3]), np.array([6, 7, 8])]
df = pd.DataFrame([2, 3, 2, 2, 4, 4, 1, 4, 4, 4], columns=['values'])
df.index.name = 'index'
L = dict(ChainMap(*[dict.fromkeys(y, x) for x, y in enumerate(l)]))
list(df.groupby(L))
Out[33]:
[(0.0,  values
  index
  0          2
  1          3
  2          2
  3          2),
 (1.0,  values
  index
  6          1
  7          4
  8          4)]
df = pd.DataFrame([2, 3, 2, 2, 4, 4, 1, 4, 4, 4], columns=['values'])
df.index.name = 'index'
l = [np.array([0, 1, 2, 3]), np.array([6, 7, 8])]
group1 = df.loc[pd.Series(l[0])]
group2 = df.loc[pd.Series(l[1])]
This seems like an X-Y problem:
l = [np.array([0, 1, 2, 3]), np.array([6, 7, 8])]
df_indx = pd.DataFrame(l).stack().reset_index()
df_new = df.assign(foo=df['index'].map(df_indx.set_index(0)['level_0']))
for n, g in df_new.groupby('foo'):
    print(g)
Output:
   index  values  foo
0      0       2  0.0
1      1       3  0.0
2      2       2  0.0
3      3       2  0.0
   index  values  foo
6      6       1  1.0
7      7       4  1.0
8      8       4  1.0
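A variant of the first answer without ChainMap: build the index-to-group mapping with a single dict comprehension and group on df.index.map, which leaves unmapped rows out automatically. A sketch, assuming the df and l from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame([2, 3, 2, 2, 4, 4, 1, 4, 4, 4], columns=['values'])
l = [np.array([0, 1, 2, 3]), np.array([6, 7, 8])]

# Map each index value to the number of the array it appears in.
key = {i: g for g, arr in enumerate(l) for i in arr}

# Index values missing from `key` map to NaN and are dropped by groupby,
# so the result is a single groupby object with one group per array.
grouped = df.groupby(df.index.map(key))
for name, group in grouped:
    print(name, group, sep='\n')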

Conditional filter of entire group for DataFrameGroupBy

If I have the following data:
>>> data = pd.DataFrame({'day': [1, 1, 1, 1, 2, 2, 2, 2, 3, 4],
...                      'hour': [4, 5, 6, 7, 4, 5, 6, 7, 4, 7]})
>>> data
   day  hour
0    1     4
1    1     5
2    1     6
3    1     7
4    2     4
5    2     5
6    2     6
7    2     7
8    3     4
9    4     7
I would like to keep only the days where hour has 4 unique values, so I would think to do something like this:
>>> data.groupby('day').apply(lambda x: x[x['hour'].nunique() == 4])
But this raises KeyError: True.
I am hoping to get this:
>>> data
   day  hour
0    1     4
1    1     5
2    1     6
3    1     7
4    2     4
5    2     5
6    2     6
7    2     7
Here the rows with day == 3 and day == 4 have been filtered out because, when grouped by day, those days don't have 4 unique values of hour. I'm doing this at scale, so simply filtering out specific day values is not an option. I think grouping is a good way to do this, but I can't get it to work. Does anyone have experience applying functions to a DataFrameGroupBy?
I think you actually need to filter the data:
>>> data.groupby('day').filter(lambda x: x['hour'].nunique() == 4)
   day  hour
0    1     4
1    1     5
2    1     6
3    1     7
4    2     4
5    2     5
6    2     6
7    2     7
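When the frame is large, the same condition can be expressed as a boolean mask built with transform, which uses the built-in nunique aggregation instead of calling a Python lambda per group and is often faster; a sketch over the same data:

# Broadcast each day's count of unique hours back onto its rows, then mask.
data[data.groupby('day')['hour'].transform('nunique') == 4]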