Consider the NumPy array arr, shown below:
arr = np.array([[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[4, 8, 4, 8, 4, 8],
[1, 2, 3, 4, 5, 6]])
I want to find all row permutations of arr. NOTE: the order of elements in any given row is unchanged. It is the entire rows that are being permuted.
Because arr has 5 rows, there will be 5! = 120 permutations. I’m hoping these could be ‘stacked’ into a 3d array p, having shape (120, 5, 6):
p = [[[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[4, 8, 4, 8, 4, 8],
[1, 2, 3, 4, 5, 6]],
[[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[1, 2, 3, 4, 5, 6],
[4, 8, 4, 8, 4, 8]],
… etc …
[[1, 2, 3, 4, 5, 6],
[4, 8, 4, 8, 4, 8],
[0, 1, 0, 1, 0, 1],
[2, 2, 2, 2, 2, 2],
[1, 5, 6, 3, 3, 7]]]
There is a lot of material online about permuting elements within rows, but I need help permuting the entire rows themselves.
You can make use of itertools.permutations and np.argsort:
from itertools import permutations
import numpy as np
out = np.array([arr[np.argsort(idx)] for idx in permutations(range(5))])
print(out)
[[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[1 2 3 4 5 6]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 2 3 4 5 6]
[4 8 4 8 4 8]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 2 3 4 5 6]]
...
[[1 2 3 4 5 6]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]
[[4 8 4 8 4 8]
[1 2 3 4 5 6]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]]
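Why the argsort version still produces every ordering: for a permutation tuple idx, np.argsort(idx) is its inverse permutation, and inverting is a one-to-one map on the set of permutations, so the 120 results are the same set, just visited in a different order. A small sketch checking this (the names `direct` and `via_argsort` are only for illustration):

```python
from itertools import permutations

import numpy as np

arr = np.array([[1, 5, 6, 3, 3, 7],
                [2, 2, 2, 2, 2, 2],
                [0, 1, 0, 1, 0, 1],
                [4, 8, 4, 8, 4, 8],
                [1, 2, 3, 4, 5, 6]])

perms = list(permutations(range(len(arr))))

# argsort of a permutation is its inverse: idx[inv[k]] == k for every k
for idx in perms:
    inv = np.argsort(idx)
    assert (np.asarray(idx)[inv] == np.arange(len(idx))).all()

# compare the two families of stacked arrays as sets (bytes are hashable)
direct = {arr[list(idx)].tobytes() for idx in perms}
via_argsort = {arr[np.argsort(idx)].tobytes() for idx in perms}
print(len(direct), direct == via_argsort)  # 120 True
```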
A similar answer, but there is no need to call .argsort: indexing with each permutation tuple directly already gives every row ordering.
from itertools import permutations
import numpy as np
arr = np.array([[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[4, 8, 4, 8, 4, 8],
[1, 2, 3, 4, 5, 6]])
output = np.array([arr[i, :] for i in permutations(range(5))])
print(output)
[[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[1 2 3 4 5 6]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 2 3 4 5 6]
[4 8 4 8 4 8]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 2 3 4 5 6]]
...
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 5 6 3 3 7]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 5 6 3 3 7]
[2 2 2 2 2 2]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]]
This is a bit faster; here are the timing comparisons:
%%timeit
output = np.array([arr[i, :] for i in permutations(range(5))])
381 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit
output = np.array([arr[np.argsort(idx)] for idx in permutations(range(5))])
863 µs ± 97.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
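A further variation, sketched here but not timed against the two versions above: build the (120, 5) index array once and index arr with it, so NumPy gathers all permutations in a single fancy-indexing call instead of 120 Python-level ones. The ordering matches the `arr[i, :]` version. Keep in mind that the result holds n! · rows · cols elements, so it grows very quickly (10 rows would already mean 3,628,800 permutations).

```python
from itertools import permutations

import numpy as np

arr = np.array([[1, 5, 6, 3, 3, 7],
                [2, 2, 2, 2, 2, 2],
                [0, 1, 0, 1, 0, 1],
                [4, 8, 4, 8, 4, 8],
                [1, 2, 3, 4, 5, 6]])

# (120, 5) integer array: one row of row-indices per permutation
idx = np.array(list(permutations(range(len(arr)))))

# one fancy-indexing call gathers every permutation at once -> shape (120, 5, 6)
out = arr[idx]
print(out.shape)  # (120, 5, 6)
```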
I have the following dataframe:
I would like to get the following output from the dataframe
Is there any way to group the other columns ['B', 'index'] based on column 'A' using a groupby aggregate function or pivot_table in pandas?
I couldn't think of an approach for writing the code.
Use:
df = df.reset_index()  # only needed if 'index' is not already a column
g=df['A'].ne(df['A'].shift()).cumsum()
new_df=df.groupby(g,as_index=False).agg(index=('index',list),A=('A','first'),B=('B',lambda x: list(x.unique())))
print(new_df)
For pandas versions older than 0.25 (which lack named aggregation):
new_df=df.groupby(g,as_index=False).agg({'index':list,'A':'first','B':lambda x: list(x.unique())})
If you also want to drop repeated values in the index column, use the same function for it as for B:
new_df=df.groupby(g,as_index=False).agg(index=('index',lambda x: list(x.unique())),A=('A','first'),B=('B',lambda x: list(x.unique())))
print(new_df)
Here is an example:
import pandas as pd
df=pd.DataFrame({'index':range(20),
'A':[1,1,1,1,2,2,0,0,0,1,1,1,1,1,1,0,0,0,3,3]
,'B':[1,2,3,5,5,5,7,8,9,9,9,12,12,14,15,16,17,18,19,20]})
print(df)
index A B
0 0 1 1
1 1 1 2
2 2 1 3
3 3 1 5
4 4 2 5
5 5 2 5
6 6 0 7
7 7 0 8
8 8 0 9
9 9 1 9
10 10 1 9
11 11 1 12
12 12 1 12
13 13 1 14
14 14 1 15
15 15 0 16
16 16 0 17
17 17 0 18
18 18 3 19
19 19 3 20
g=df['A'].ne(df['A'].shift()).cumsum()
new_df=df.groupby(g,as_index=False).agg(index=('index',list),A=('A','first'),B=('B',lambda x: list(x.unique())))
print(new_df)
index A B
0 [0, 1, 2, 3] 1 [1, 2, 3, 5]
1 [4, 5] 2 [5]
2 [6, 7, 8] 0 [7, 8, 9]
3 [9, 10, 11, 12, 13, 14] 1 [9, 12, 14, 15]
4 [15, 16, 17] 0 [16, 17, 18]
5 [18, 19] 3 [19, 20]
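The key step is `g`: `ne(shift())` is True on the first row of each run of equal `A` values, and `cumsum()` turns those flags into consecutive run ids. A sketch (requires pandas 0.25+ for named aggregation) that prints the ids and, as a small variation on the answer, groups on a renamed key and drops it with `reset_index(drop=True)` instead of using `as_index=False`, so the result contains only the three aggregated columns however your pandas version treats `as_index=False` with an external key:

```python
import pandas as pd

df = pd.DataFrame({
    "index": range(20),
    "A": [1, 1, 1, 1, 2, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 3, 3],
    "B": [1, 2, 3, 5, 5, 5, 7, 8, 9, 9, 9, 12, 12, 14, 15, 16, 17, 18, 19, 20],
})

# True where A differs from the previous row; cumsum turns the flags into run ids
run_id = df["A"].ne(df["A"].shift()).cumsum().rename("run")
print(run_id.tolist())

new_df = (
    df.groupby(run_id)
      .agg(index=("index", list),
           A=("A", "first"),
           B=("B", lambda s: list(s.unique())))
      .reset_index(drop=True)  # discard the helper run id
)
print(new_df)
```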
I am working with a pandas dataframe that looks something like this:
col1 col2 col3 col_num
0 [-0.20447069290738076, 0.4159556680196389, -0.... [-0.10935000772973974, -0.04425263358067333, -... [51.0834196, 10.4234469] 3160
1 [-0.42439951483476124, -0.3135960467759942, 0.... [0.3842614765721414, -0.06756644506033657, 0.4... [45.5643442, 17.0118954] 3159
3 [0.3158755226012898, -0.007057682056994253, 0.... [-0.33158941456615376, 0.09637640660002277, -0... [50.6402809, 4.6667145] 3157
5 [-0.011089723491692679, -0.01649481399305317, ... [-0.02827408211098023, 0.00019040943944721592,... [53.45733965, -2.22695880505223] 3157
I would like to concatenate the vectors in each row like so:
df['col1'] + df['col2'] + df['col3'] + df['col_num'].transform(lambda item: [item])
However, I get the following error:
/opt/conda/lib/python3.6/site-packages/pandas/core/ops.py in <lambda>(x)
708 if is_object_dtype(lvalues):
709 return libalgos.arrmap_object(lvalues,
--> 710 lambda x: op(x, rvalues))
711 raise
712
ValueError: operands could not be broadcast together with shapes (30,) (86597,)
It looks like, for some reason, it gets stuck concatenating the 3rd column, whose vectors have only 2 elements. The data is 86597 rows long. How can I fix this error?
You can convert the problematic column to lists like this:
df['col1'] + df['col2'] + df['col3'].apply(list) + df['col_num'].transform(lambda x: [x])
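A likely cause, assuming col3 holds NumPy arrays rather than plain lists (the question does not show the types): `+` concatenates two lists, but as soon as one operand is an ndarray, NumPy takes over and adds element-wise, which fails when the lengths differ. `.apply(list)` turns the arrays back into lists so `+` concatenates again. A minimal sketch of that behaviour:

```python
import numpy as np

lst = [1, 2, 3]
vec = np.array([10, 20, 30])

print(lst + [4, 5])             # list + list concatenates: [1, 2, 3, 4, 5]
print(lst + vec)                # list + ndarray adds element-wise: [11 22 33]
print(lst + vec[:2].tolist())   # back to a list: concatenation again

try:
    lst + np.array([1, 2])      # lengths 3 and 2 cannot be broadcast
except ValueError as err:
    print("ValueError:", err)
```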
Another solution, if the lists in each column all have the same length, is to convert each column to a 2D NumPy array and use hstack. This is preferable because storing lists inside a DataFrame loses the vectorised functionality that comes with NumPy arrays held in contiguous memory blocks:
import numpy as np
import pandas as pd
np.random.seed(123)
N = 10
df = pd.DataFrame({
"col1": [np.random.randint(10, size=3) for i in range(N)],
"col2": [np.random.randint(10, size=3) for i in range(N)],
"col3": [np.random.randint(10, size=2) for i in range(N)],
'col_num': range(N)
})
print (df)
col1 col2 col3 col_num
0 [2, 2, 6] [9, 3, 4] [2, 4] 0
1 [1, 3, 9] [6, 1, 5] [8, 1] 1
2 [6, 1, 0] [6, 2, 1] [2, 1] 2
3 [1, 9, 0] [8, 3, 5] [1, 3] 3
4 [0, 9, 3] [0, 2, 6] [5, 9] 4
5 [4, 0, 0] [2, 4, 4] [0, 8] 5
6 [4, 1, 7] [6, 3, 0] [1, 6] 6
7 [3, 2, 4] [6, 4, 7] [3, 3] 7
8 [7, 2, 4] [6, 7, 1] [5, 9] 8
9 [8, 0, 7] [5, 7, 9] [7, 9] 9
a = np.array(df['col1'].values.tolist())
b = np.array(df['col2'].values.tolist())
c = np.array(df['col3'].values.tolist())
# create an Nx1 (column) array
d = df['col_num'].values[:, None]
arr = np.hstack((a,b,c, d))
print (arr)
[[2 2 6 9 3 4 2 4 0]
[1 3 9 6 1 5 8 1 1]
[6 1 0 6 2 1 2 1 2]
[1 9 0 8 3 5 1 3 3]
[0 9 3 0 2 6 5 9 4]
[4 0 0 2 4 4 0 8 5]
[4 1 7 6 3 0 1 6 6]
[3 2 4 6 4 7 3 3 7]
[7 2 4 6 7 1 5 9 8]
[8 0 7 5 7 9 7 9 9]]
df = pd.DataFrame(arr)
print (df)
0 1 2 3 4 5 6 7 8
0 2 2 6 9 3 4 2 4 0
1 1 3 9 6 1 5 8 1 1
2 6 1 0 6 2 1 2 1 2
3 1 9 0 8 3 5 1 3 3
4 0 9 3 0 2 6 5 9 4
5 4 0 0 2 4 4 0 8 5
6 4 1 7 6 3 0 1 6 6
7 3 2 4 6 4 7 3 3 7
8 7 2 4 6 7 1 5 9 8
9 8 0 7 5 7 9 7 9 9
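If the list lengths differ from row to row, hstack cannot build a rectangular array. A sketch for that case (the small frame below is made up for illustration): concatenate each row's pieces with np.concatenate in a plain loop. This keeps one NumPy array per row, so it gives up the vectorised benefits mentioned above.

```python
import numpy as np
import pandas as pd

# hypothetical frame whose list lengths vary between rows
df = pd.DataFrame({
    "col1": [[1, 2, 3], [4, 5]],
    "col2": [[6], [7, 8]],
    "col3": [np.array([0.5, 1.5]), np.array([2.5])],
    "col_num": [10, 11],
})

# concatenate each row's pieces (plus the scalar as a 1-element list) into one 1-D array
combined = [
    np.concatenate([c1, c2, c3, [n]])
    for c1, c2, c3, n in zip(df["col1"], df["col2"], df["col3"], df["col_num"])
]
df["combined"] = pd.Series(combined, index=df.index)
print(df["combined"].tolist())
```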
First, I want to reshape a 2-D tensor into a 4-D tensor using tf.reshape().
I thought tf.reshape() would transform
[batch, array] -> [batch, width, height, channels] (NHWC) order
but in practice it transformed
[batch, array] -> [batch, channels, width, height] (NCHW) order
Example:
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)
sess = tf.Session()
a = np.array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]])
print(a.shape)
# [batch_size, channels, height, width]
b = sess.run(tf.reshape(a, shape=[2, 3, 4, 4]))
# [batch_size, height, width, channels]
c = sess.run(tf.reshape(a, shape=[2, 4, 4, 3]))
print(b)
print('*******')
print(c)
The result was:
(2, 48)
[[[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]]
[[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]]]
*******
[[[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[13 14 15]
[16 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 1]
[ 2 3 4]]
[[ 5 6 7]
[ 8 9 10]
[11 12 13]
[14 15 16]]]
[[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[13 14 15]
[16 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 1]
[ 2 3 4]]
[[ 5 6 7]
[ 8 9 10]
[11 12 13]
[14 15 16]]]]
So, I set data_format='channels_first' for the conv and pooling layers so that they use the reshaped tensor in NCHW order. In fact, the training went well. More specifically, it gave better results, as mentioned by @mrry here, which I think is understandable because NCHW is the default order of cuDNN.
However, I cannot add images to the summary using tf.summary.image(), which is documented here, because it requires the tensor to be in [batch_size, height, width, channels] order.
Moreover, if I train and visualize the input images in [batch, width, height, channels] order, the images are displayed incorrectly.
It is also worth mentioning that the training result was not as good as with the [batch, channels, width, height] order.
There are several questions:
1. Why does tf.reshape() transform [batch, array] -> (NCHW) order instead of (NHWC) order? I tested with both TF CPU and GPU and got the same result. I also tried np.reshape(), with the same result. (This is why I think I may be misunderstanding something here.)
2. How can I visualize images in TensorBoard using tf.summary.image() in (NCHW) order? (Question #2 solved using advice from @Maosi Chen. Thanks.)
I trained the model on GPU (TensorFlow version 1.4); the images are from the CIFAR-10 dataset.
Thanks
You can reorder the dimensions with tf.transpose (https://www.tensorflow.org/api_docs/python/tf/transpose).
Note that the entries of perm are dimension indices of the source tensor (here b): perm=[0, 2, 3, 1] places source dimensions 0, 2, 3 and 1 in that order.
import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()
a = np.array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]])
print(a.shape)
# [batch_size, channels, height, width]
b = sess.run(tf.reshape(a, shape=[2, 3, 4, 4]))
# [batch_size, height, width, channels]
c = sess.run(tf.transpose(b, perm=[0, 2, 3, 1]))
print(b)
print('*******')
print(c)
Results:
(2, 48)
[[[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12] [13 14 15 16]]
[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12] [13 14 15 16]]
[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12] [13 14 15 16]]]
[[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12] [13 14 15 16]]
[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12] [13 14 15 16]]
[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12] [13 14 15 16]]]]
*******
[[[[ 1 1 1] [ 2 2 2] [ 3 3 3] [ 4 4 4]]
[[ 5 5 5] [ 6 6 6] [ 7 7 7] [ 8 8 8]]
[[ 9 9 9] [10 10 10] [11 11 11] [12 12 12]]
[[13 13 13] [14 14 14] [15 15 15] [16 16 16]]]
[[[ 1 1 1] [ 2 2 2] [ 3 3 3] [ 4 4 4]]
[[ 5 5 5] [ 6 6 6] [ 7 7 7] [ 8 8 8]]
[[ 9 9 9] [10 10 10] [11 11 11] [12 12 12]]
[[13 13 13] [14 14 14] [15 15 15] [16 16 16]]]]
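On question 1: neither tf.reshape() nor np.reshape() reorders data; both read the elements in row-major order and only change how they are grouped. Each 48-value sample in the example is stored as three consecutive 16-value planes (CIFAR-10's raw format is laid out the same way: the red plane, then green, then blue), so [2, 3, 4, 4] is the shape that matches the memory layout, and a transpose is needed to get NHWC. A NumPy-only sketch of the same reasoning (ndarray.transpose takes the same axis order as perm in tf.transpose above):

```python
import numpy as np

# one flattened sample: three 4x4 planes stored one after another
flat = np.tile(np.arange(1, 17), 3)     # shape (48,)
batch = np.stack([flat, flat])          # shape (2, 48), like `a` in the question

# reshape only regroups elements in row-major (C) order; it never moves them
nchw = batch.reshape(2, 3, 4, 4)        # each [n, c] is one 4x4 plane
nhwc = nchw.transpose(0, 2, 3, 1)       # same data, channels last

# reshaping straight to (2, 4, 4, 3) mixes neighbouring values of one plane
wrong = batch.reshape(2, 4, 4, 3)

print(nhwc[0, 0, 0])    # [1 1 1]: pixel (0, 0) taken from all three planes
print(wrong[0, 0, 0])   # [1 2 3]: three neighbouring values of plane 0
```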
I am new to pandas and I am facing the following problem:
I have 2 data frames:
df1:
x  y
1  [3, 4]
2  nan
3  [6]
4  nan
5  [9, 2]
6  [1, 4, 9]
df2:
x  y
1  [2, 3, 6, 1, 5]
2  [4, 1, 8, 7, 5]
3  [6, 3, 1, 4, 5]
4  [2, 1, 3, 5, 4]
5  [9, 2, 3, 8, 7]
6  [1, 4, 5, 3, 7]
The two dataframes are the same size.
I want to merge the two dataframes so that the resulting dataframe is the following:
result:
x  y
1  [3, 4, 6, 1, 5]
2  [4, 1, 8, 7, 5]
3  [6, 3, 1, 4, 5]
4  [2, 1, 3, 5, 4]
5  [9, 2, 3, 8, 7]
6  [1, 4, 5, 6, 7]
So in the result, priority is given to df2. If there is a value in df2, it is put first and the remaining values are taken from df1 (keeping the same positions as in df1). There should be no repeated values in the result (i.e. if a value is in position 1 in df1 and position 3 in df2, then that value should appear only in position 1 in the result and not be repeated).
Any kind of help will be appreciated.
Thanks!
IIUC
Setup
import pandas as pd
df1 = pd.DataFrame(dict(x=range(1, 7),
y=[[3, 4], None, [6], None, [9, 2], [1, 4, 9]]))
df2 = pd.DataFrame(dict(x=range(1, 7), y=[[2, 3, 6, 1, 5], [4, 1, 8, 7, 5],
[6, 3, 1, 4, 5], [2, 1, 3, 5, 4],
[9, 2, 3, 8, 7], [1, 4, 5, 3, 7]]))
print(df1)
print()
print(df2)
x y
0 1 [3, 4]
1 2 None
2 3 [6]
3 4 None
4 5 [9, 2]
5 6 [1, 4, 9]
x y
0 1 [2, 3, 6, 1, 5]
1 2 [4, 1, 8, 7, 5]
2 3 [6, 3, 1, 4, 5]
3 4 [2, 1, 3, 5, 4]
4 5 [9, 2, 3, 8, 7]
5 6 [1, 4, 5, 3, 7]
Convert the lists to something more usable (one column per list position):
df1_ = df1.set_index('x').y.apply(pd.Series)
df2_ = df2.set_index('x').y.apply(pd.Series)
print(df1_)
print()
print(df2_)
0 1 2
x
1 3.0 4.0 NaN
2 NaN NaN NaN
3 6.0 NaN NaN
4 NaN NaN NaN
5 9.0 2.0 NaN
6 1.0 4.0 9.0
0 1 2 3 4
x
1 2 3 6 1 5
2 4 1 8 7 5
3 6 3 1 4 5
4 2 1 3 5 4
5 9 2 3 8 7
6 1 4 5 3 7
Combine with priority given to df1 (I think you meant df1, as that is what is consistent with my interpretation of your question and the expected output you provided), then reduce each row to eliminate duplicates:
print(df1_.combine_first(df2_).apply(lambda x: x.unique(), axis=1, result_type='expand'))
0 1 2 3 4
x
1 3 4 6 1 5
2 4 1 8 7 5
3 6 3 1 4 5
4 2 1 3 5 4
5 9 2 3 8 7
6 1 4 9 3 7
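If you would rather keep the values as lists instead of spreading them across columns (which also turns them into floats because of the NaN padding), here is a sketch of the same rule in plain Python, under the same interpretation as above: df1 keeps its positions, df2 fills the remaining positions, and duplicates are dropped in order of first appearance. `merge_row` is a hypothetical helper name.

```python
import pandas as pd

df1 = pd.DataFrame(dict(x=range(1, 7),
                        y=[[3, 4], None, [6], None, [9, 2], [1, 4, 9]]))
df2 = pd.DataFrame(dict(x=range(1, 7), y=[[2, 3, 6, 1, 5], [4, 1, 8, 7, 5],
                                          [6, 3, 1, 4, 5], [2, 1, 3, 5, 4],
                                          [9, 2, 3, 8, 7], [1, 4, 5, 3, 7]]))


def merge_row(first, second):
    """Keep `first` at its positions, fill the rest from `second`, drop repeats."""
    if not isinstance(first, (list, tuple)):   # None/NaN: nothing to keep
        first = []
    filled = list(first) + list(second[len(first):])
    return list(dict.fromkeys(filled))          # order-preserving de-duplication


merged = df1.merge(df2, on="x", suffixes=("_1", "_2"))
merged["y"] = pd.Series(
    [merge_row(a, b) for a, b in zip(merged["y_1"], merged["y_2"])],
    index=merged.index,
)
result = merged[["x", "y"]]
print(result)
```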