How to combine two pandas DataFrames into a single one across axis=2 (i.e., so that the cell values are tuples)?

I have two (large) dataframes. They have the same index & columns, and I want to combine them so that they have tuple values in each cell.
The example explains it best:
df1 = pd.DataFrame({
'A':[True, True, False],
'B':[False, True, False],
})
df2 = pd.DataFrame({
'A':[1, 2, 3],
'B':[5, 6, 7],
})
# Desired output:
pd.DataFrame({
'A':[(True, 1), (True, 2), (False, 3)],
'B':[(False, 5), (True, 6), (False, 7)],
})
The DataFrames are large (1M+ rows), so I'm looking to do this reasonably efficiently.
I tried np.stack([df1.values, df2.values], axis=2), which gave me the right array of values, but I could not convert it into a DataFrame.
Any ideas?

I got your desired output with this solution:
import pandas as pd
df1 = pd.DataFrame({
'A':[True, True, False],
'B':[False, True, False],
})
df2 = pd.DataFrame({
'A':[1, 2, 3],
'B':[5, 6, 7],
})
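# Replace each column of df1 with (df1 value, df2 value) tuples.
# Note: this overwrites df1 in place; zip already yields tuples.
for df_1k, df_2k in zip(df1.columns, df2.columns):
    df1[df_1k] = list(zip(df1[df_1k], df2[df_2k]))
print(df1)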
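
The same idea can be sketched without modifying df1. This is only an illustration, not a benchmarked solution: the helper name combine_as_tuples is hypothetical, and it assumes both frames share the same index and columns in the same row order, as stated in the question.

```python
import pandas as pd


def combine_as_tuples(left, right):
    """Return a new DataFrame whose cells are (left, right) tuples.

    Assumes both frames have identical index and columns (hypothetical helper).
    """
    # Build one list of tuples per column; columns are matched by label.
    data = {col: list(zip(left[col], right[col])) for col in left.columns}
    return pd.DataFrame(data, index=left.index)


df1 = pd.DataFrame({
    'A': [True, True, False],
    'B': [False, True, False],
})
df2 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [5, 6, 7],
})

combined = combine_as_tuples(df1, df2)
print(combined)
```

Because each cell holds a Python tuple, the resulting columns have object dtype, so later operations on them will not be vectorized. If the goal is mainly to keep both values side by side, keeping the two frames separate (or concatenating them under a column MultiIndex) may be cheaper.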

Related

Set index for aggregated dataframe

I did some calculations on a list of DataFrames. I'd like the resulting DataFrame to use a RangeIndex. However, it uses one of the column names as the index, even though I set index=None.
import pandas as pd
d1 = {'id': [1, 2, 3, 4, 5], 'is_free': [True, False, False, True, True], 'level': ['Top', 'Mid', 'Top', 'Top', 'Low']}
d2 = {'id': [1, 3, 4, 5, 7], 'is_free': [True, True, False, False, False], 'level': ['Top', 'High', 'Top', 'Top', 'Low']}
d1 = pd.DataFrame(data=d1)
d2 = pd.DataFrame(data=d2)
df_list = [d1, d2]
dfs = []
for i, df in enumerate(df_list):
    df = df.groupby('is_free')['id'].count()
    dfs.append(df)
df = pd.DataFrame(data=dfs, index=None)
It returns
is_free  False  True
id           2     3
id           3     2
df.index returns
Index(['id', 'id'], dtype='object')

From your code:
df = pd.DataFrame(data=dfs, index=None).reset_index(drop=True)
However, in general, I would avoid appending in a loop. Try concat:
pd.concat({i: d.groupby('is_free')['id'].count()
           for i, d in enumerate(df_list)},
          axis=1).T
Or use pd.DataFrame:
pd.DataFrame({i: d.groupby('is_free')['id'].count()
              for i, d in enumerate(df_list)}).T
Output:
is_free  False  True
0            2     3
1            3     2

numpy unique over multiple arrays

numpy.unique expects a 1-D array. If the input is not a 1-D array, it flattens it by default.
Is there a way for it to accept multiple arrays? To keep it simple, let's say a pair of arrays, where uniqueness is determined by the pair of elements at each position across the two arrays.
For example, say I have two NumPy arrays as inputs
a = [1, 2, 3, 3]
b = [10, 20, 30, 31]
I'm checking uniqueness across both arrays, so against these 4 pairs: (1, 10), (2, 20), (3, 30), and (3, 31). These 4 are all unique, so I want my result to say
[True, True, True, True]
If instead the inputs are as follows
a = [1, 2, 3, 3]
b = [10, 20, 30, 30]
Then the last 2 elements are not unique. So the output should be
[True, True, True, False]

You could use the unique_indices value returned by numpy.unique():
In [243]: def is_unique(*lsts):
     ...:     arr = np.vstack(lsts)
     ...:     _, ind = np.unique(arr, axis=1, return_index=True)
     ...:     out = np.zeros(shape=arr.shape[1], dtype=bool)
     ...:     out[ind] = True
     ...:     return out
In [244]: a = [1, 2, 2, 3, 3]
In [245]: b = [1, 2, 2, 3, 3]
In [246]: c = [1, 2, 0, 3, 3]
In [247]: is_unique(a, b)
Out[247]: array([ True, True, False, True, False])
In [248]: is_unique(a, b, c)
Out[248]: array([ True, True, True, True, False])
You may also find this thread helpful.
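
As an alternative to the numpy.unique approach, the same first-occurrence mask can be sketched with pandas. This swaps in a different technique (DataFrame.duplicated) and is only an illustration; the helper name first_occurrence_mask is hypothetical, and it assumes all inputs have the same length.

```python
import numpy as np
import pandas as pd


def first_occurrence_mask(*arrays):
    """Return True where a row-wise combination appears for the first time.

    Hypothetical helper; assumes all input arrays have equal length.
    """
    # One column per input array, so each row is one combination to compare.
    frame = pd.DataFrame({i: np.asarray(arr) for i, arr in enumerate(arrays)})
    # duplicated() marks repeats of an earlier row; invert to keep first hits.
    return ~frame.duplicated().to_numpy()


a = [1, 2, 3, 3]
b = [10, 20, 30, 30]
mask = first_occurrence_mask(a, b)
print(mask)
```

With the inputs from the question, the last pair (3, 30) repeats the third one, so only the final entry of the mask is False.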

Efficient column MultiIndex ordering

I have this dataframe :
df = pandas.DataFrame({'A' : [2000, 2000, 2000, 2000, 2000, 2000],
'B' : ["A+", 'B+', "A+", "B+", "A+", "B+"],
'C' : ["M", "M", "M", "F", "F", "F"],
'D' : [1, 5, 3, 4, 2, 6],
'Value' : [11, 12, 13, 14, 15, 16] }).set_index((['A', 'B', 'C', 'D']))
df = df.unstack(['C', 'D']).fillna(0)
And I'm wondering if there is a more elegant way to order the column MultiIndex than the following code:
# rows ordering
df = df.sort_values(by = ['A', "B"], ascending = [True, True])
# col ordering
df = df.transpose().sort_values(by = ["C", "D"], ascending = [False, False]).transpose()
In particular, I feel the last line with the two transposes is far more complex than it should be. I tried using sort_index but wasn't able to use it in a MultiIndex context (for both rows and columns).

You can use sort_index on both column levels:
out = df.sort_index(level=[0,1],axis=1,ascending=[True, False])

I can pass axis=1 to sort_values, so the last line becomes
df = df.sort_values(axis = 1, by = ["C", "D"], ascending = [True, False])
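
A sketch of the sort_index approach that refers to the column levels by name rather than by position. This is an illustration under one assumption worth checking on your own data: after unstack(['C', 'D']) the columns here have three levels (an unnamed level holding 'Value', then 'C' and 'D'), so naming the levels avoids accidentally sorting on the unnamed one. The descending order on both levels mirrors the question's original transpose-based line.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 2000, 2000, 2000, 2000, 2000],
    'B': ['A+', 'B+', 'A+', 'B+', 'A+', 'B+'],
    'C': ['M', 'M', 'M', 'F', 'F', 'F'],
    'D': [1, 5, 3, 4, 2, 6],
    'Value': [11, 12, 13, 14, 15, 16],
}).set_index(['A', 'B', 'C', 'D'])
df = df.unstack(['C', 'D']).fillna(0)

# Rows: sort the (A, B) index ascending.
out = df.sort_index(axis=0)
# Columns: sort by the named levels C and D, both descending.
out = out.sort_index(axis=1, level=['C', 'D'], ascending=[False, False])
print(out)
```

Passing a list to ascending gives each listed level its own direction, which is what the two transposes were emulating.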

Printing unique list of indices in multiindex pandas dataframe

I am just starting out with pandas and have the following code:
import pandas as pd
d = {'num_legs': [4, 4, 2, 2, 2],
'num_wings': [0, 0, 2, 2, 2],
'class': ['mammal', 'mammal','bird-mammal', 'mammal', 'bird'],
'animal': ['cat', 'dog','cat', 'bat', 'penguin'],
'locomotion': ['walks', 'walks','hops', 'flies', 'walks']}
df = pd.DataFrame(data=d)
df = df.set_index(['class', 'animal', 'locomotion'])
I want to print everything that the animal cat does; here, that will be 'walks' and 'hops'.
I can filter to just the cat cross-section using
df2=df.xs('cat', level=1)
But from here, how do I access the level 'locomotion'?

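You can use get_level_values on the index of the cross-section (after xs drops the 'animal' level, 'locomotion' is at position 1):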
df.xs('cat', level=1).index.get_level_values(1)
Out[181]: Index(['walks', 'hops'], dtype='object', name='locomotion')

pandas dataframe subplot grouping by columns

df = pd.DataFrame([[0, 1, 2], [0, 1, 2]])
df.plot(subplots=True)
I want one subplot for columns [0, 1] and another for column [2]. Is there a way to do this?

You can use DataFrameGroupBy.plot, mapping the columns to two groups with Index.map and a dictionary:
mapping = {0:'a', 1:'a', 2:'b'}
df.groupby(df.columns.map(mapping.get), axis=1).plot()
Detail:
print (df.columns.map(mapping.get))
Index(['a', 'a', 'b'], dtype='object')
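
In recent pandas versions, passing axis=1 to groupby is deprecated, so here is a sketch that builds the column groups by hand and draws each on its own Axes. It is only one possible arrangement: the names groups and plot_groups are hypothetical, and the plotting helper assumes matplotlib is installed.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [0, 1, 2]])
mapping = {0: 'a', 1: 'a', 2: 'b'}

# Label each column with its group, then slice the frame per group.
labels = df.columns.map(mapping.get)
groups = {name: df.loc[:, labels == name] for name in labels.unique()}


def plot_groups(groups):
    """Draw one subplot per column group (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be available

    fig, axes = plt.subplots(nrows=len(groups), squeeze=False)
    for ax, (name, sub) in zip(axes.ravel(), groups.items()):
        sub.plot(ax=ax, title=str(name))
    return fig
```

Calling plot_groups(groups) should give two stacked subplots, one with columns 0 and 1 and one with column 2.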
