Permuting entire rows in a 2d numpy array

Consider the NumPy array arr, shown below:
import numpy as np
arr = np.array([[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[4, 8, 4, 8, 4, 8],
[1, 2, 3, 4, 5, 6]])
I want to find all row permutations of arr. NOTE: the order of elements in any given row is unchanged. It is the entire rows that are being permuted.
Because arr has 5 rows, there will be 5! = 120 permutations. I’m hoping these could be ‘stacked’ into a 3d array p, having shape (120, 5, 6):
p = [[[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[4, 8, 4, 8, 4, 8],
[1, 2, 3, 4, 5, 6]],
[[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[1, 2, 3, 4, 5, 6],
[4, 8, 4, 8, 4, 8]],
… etc …
[[1, 2, 3, 4, 5, 6],
[4, 8, 4, 8, 4, 8],
[0, 1, 0, 1, 0, 1],
[2, 2, 2, 2, 2, 2],
[1, 5, 6, 3, 3, 7]]]
There is a lot of material online about permuting elements within rows, but I need help with permuting the entire rows themselves.

You can make use of itertools.permutations and np.argsort:
from itertools import permutations
import numpy as np
out = np.array([arr[np.argsort(idx)] for idx in permutations(range(5))])
print(out)
[[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[1 2 3 4 5 6]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 2 3 4 5 6]
[4 8 4 8 4 8]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 2 3 4 5 6]]
...
[[1 2 3 4 5 6]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]
[[4 8 4 8 4 8]
[1 2 3 4 5 6]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]]
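
A note on why np.argsort still gives the right answer here: the argsort of a permutation tuple is its inverse permutation, which is itself a permutation. The list comprehension therefore still visits every row order exactly once, only in a different sequence than permutations yields them. A minimal sketch that checks this, assuming n = 5 as above (the names direct and inverted are illustrative, not from the answer):

```python
from itertools import permutations
import numpy as np

n = 5
# Row orders exactly as itertools yields them.
direct = {tuple(idx) for idx in permutations(range(n))}
# Row orders after passing each tuple through np.argsort (the inverse permutation).
inverted = {tuple(int(i) for i in np.argsort(idx)) for idx in permutations(range(n))}

print(len(direct), len(inverted))  # both sets hold n! orderings
print(direct == inverted)          # same set of orderings, different sequence
```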

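A similar answer, but without the extra np.argsort call: each tuple produced by permutations(range(5)) is already a valid row order, so it can index arr directly: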
from itertools import permutations
import numpy as np
arr = np.array([[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[4, 8, 4, 8, 4, 8],
[1, 2, 3, 4, 5, 6]])
output = np.array([arr[i, :] for i in permutations(range(5))])
print(output)
[[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[1 2 3 4 5 6]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 2 3 4 5 6]
[4 8 4 8 4 8]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 2 3 4 5 6]]
...
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 5 6 3 3 7]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 5 6 3 3 7]
[2 2 2 2 2 2]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]]
This is a bit faster; here are the timings:
%%timeit
output = np.array([arr[i, :] for i in permutations(range(5))])
381 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit
output = np.array([arr[np.argsort(idx)] for idx in permutations(range(5))])
863 µs ± 97.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
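
Another variant, offered as a sketch rather than something taken from the answers above: collect all row orders into one integer array and index arr with it a single time. Indexing with an integer array of shape (120, 5) returns an array of shape (120, 5, 6). The name perm_idx is illustrative, and this version has not been timed here.

```python
from itertools import permutations
import numpy as np

arr = np.array([[1, 5, 6, 3, 3, 7],
                [2, 2, 2, 2, 2, 2],
                [0, 1, 0, 1, 0, 1],
                [4, 8, 4, 8, 4, 8],
                [1, 2, 3, 4, 5, 6]])

# All row orders as an integer array of shape (120, 5).
perm_idx = np.array(list(permutations(range(arr.shape[0]))))
# Indexing with a 2-D integer array selects the rows for every order at once.
p = arr[perm_idx]
print(p.shape)
```

The first slice p[0] is arr itself, because permutations starts with the identity order.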

Related

Groupby transform to list in pandas does not work

Best described with an example
import pandas as pd
df = pd.DataFrame({
'a' : ['A','B','C','A','B','C','A','B','C'],
'b': [1,2,3,4,5,6,7,8,9]}
)
I want to create a column c that holds, for each row, a list of all the values of column b in that row's group of column a,
resulting in the following:
a b c
0 A 1 [1, 4, 7]
1 A 4 [1, 4, 7]
2 A 7 [1, 4, 7]
3 B 2 [2, 5, 8]
4 B 5 [2, 5, 8]
5 B 8 [2, 5, 8]
6 C 3 [3, 6, 9]
7 C 6 [3, 6, 9]
8 C 9 [3, 6, 9]
I can do this with groupby and apply or agg, and then join the dataframes like so:
df_tmp = df.groupby('a')['b'].agg(list).reset_index()
df.merge(df_tmp, on='a')
But I would also expect to be able to do the same with transform:
df['c'] = df.groupby('a')['b'].transform(list)
but column c comes out the same as column b.
Also, the following
df.groupby('a')['b'].transform(lambda x: len(x))
returns a Series with the value 3 in every row, i.e. the length of each group (as expected).
But this
df.groupby('a')['b'].transform(lambda x: list(x))
does not give the expected result either.
So, my question: how can I obtain the desired result with groupby and transform?
The pandas version is 1.0.5.

I came up with the fix below. PS: something seems to go wrong in transform when the returned object is a list, tuple or set.
df.groupby('a')['b'].transform(lambda x : [x.tolist()]*len(x))
Out[226]:
0 [1, 4, 7]
1 [1, 4, 7]
2 [1, 4, 7]
3 [2, 5, 8]
4 [2, 5, 8]
5 [2, 5, 8]
6 [3, 6, 9]
7 [3, 6, 9]
8 [3, 6, 9]
Name: b, dtype: object

Interesting problem; I am not sure what transform does in the background. One workaround is to map with groupby().agg():
df['c'] = df['a'].map(df.groupby('a')['b'].agg(list))
Output:
a b c
0 A 1 [1, 4, 7]
1 B 2 [2, 5, 8]
2 C 3 [3, 6, 9]
3 A 4 [1, 4, 7]
4 B 5 [2, 5, 8]
5 C 6 [3, 6, 9]
6 A 7 [1, 4, 7]
7 B 8 [2, 5, 8]
8 C 9 [3, 6, 9]
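
The map workaround can be run end to end as follows. This is a minimal sketch using the frame from the question; the intermediate name lists_by_group is introduced here for readability and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'b': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# One list per group, indexed by the group key.
lists_by_group = df.groupby('a')['b'].agg(list)
# Look up each row's key so the group's list is repeated on every row of the group.
df['c'] = df['a'].map(lists_by_group)
print(df)
```

Unlike the merge approach, this keeps the original row order of df.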

How to use padded_batch() in TensorFlow 2.0

import tensorflow as tf
X = tf.range(10)
dataset = tf.data.Dataset.from_tensor_slices(X)
dataset2 = dataset.repeat(3).padded_batch(7, padded_shapes=([]))
for item in dataset2:
    print(item)
Output:
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int32)
tf.Tensor([7 8 9 0 1 2 3], shape=(7,), dtype=int32)
tf.Tensor([4 5 6 7 8 9 0], shape=(7,), dtype=int32)
tf.Tensor([1 2 3 4 5 6 7], shape=(7,), dtype=int32)
tf.Tensor([8 9], shape=(2,), dtype=int32)
How should I define padded_shapes to get a result like the one below?
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int32)
tf.Tensor([7 8 9 0 1 2 3], shape=(7,), dtype=int32)
tf.Tensor([4 5 6 7 8 9 0], shape=(7,), dtype=int32)
tf.Tensor([1 2 3 4 5 6 7], shape=(7,), dtype=int32)
tf.Tensor([8 9 0 0 0 0 0], shape=(7,), dtype=int32)

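I solved the problem by calling batch(7) first. The last batch then has only 2 elements, and padded_batch pads it with zeros to length 7. Note that the result is a single tensor of shape (5, 7) rather than five separate tensors:
dataset2 = dataset.repeat(3).batch(7).padded_batch(7, padded_shapes=([None]))
Output: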
tf.Tensor(
[[0 1 2 3 4 5 6]
[7 8 9 0 1 2 3]
[4 5 6 7 8 9 0]
[1 2 3 4 5 6 7]
[8 9 0 0 0 0 0]], shape=(5, 7), dtype=int32)

ValueError: operands could not be broadcast together with shapes in concatenating arrays across pandas columns

I am working with a pandas dataframe that looks something like this:
col1 col2 col3 col_num
0 [-0.20447069290738076, 0.4159556680196389, -0.... [-0.10935000772973974, -0.04425263358067333, -... [51.0834196, 10.4234469] 3160
1 [-0.42439951483476124, -0.3135960467759942, 0.... [0.3842614765721414, -0.06756644506033657, 0.4... [45.5643442, 17.0118954] 3159
3 [0.3158755226012898, -0.007057682056994253, 0.... [-0.33158941456615376, 0.09637640660002277, -0... [50.6402809, 4.6667145] 3157
5 [-0.011089723491692679, -0.01649481399305317, ... [-0.02827408211098023, 0.00019040943944721592,... [53.45733965, -2.22695880505223] 3157
I would like to concatenate the vectors across each row, like so:
df['col1'] + df['col2'] + df['col3'] + df['col_num'].transform(lambda item: [item])
However, I get the following error:
/opt/conda/lib/python3.6/site-packages/pandas/core/ops.py in <lambda>(x)
708 if is_object_dtype(lvalues):
709 return libalgos.arrmap_object(lvalues,
--> 710 lambda x: op(x, rvalues))
711 raise
712
ValueError: operands could not be broadcast together with shapes (30,) (86597,)
It looks like, for some reason, it is getting stuck at concatenating the 3rd column, whose vectors have only 2 elements. The data is 86597 rows long. How can I fix this error?

You can convert the problematic column to lists like this:
df['col1'] + df['col2'] + df['col3'].apply(list) + df['col_num'].transform(lambda x: [x])
Another solution, if the lists within each column all have the same length, is to convert every column to a 2D NumPy array and use hstack. Storing lists in a column loses the vectorised functionality that comes with NumPy arrays held in contiguous memory blocks:
import numpy as np
import pandas as pd
np.random.seed(123)
N = 10
df = pd.DataFrame({
"col1": [np.random.randint(10, size=3) for i in range(N)],
"col2": [np.random.randint(10, size=3) for i in range(N)],
"col3": [np.random.randint(10, size=2) for i in range(N)],
'col_num': range(N)
})
print (df)
col1 col2 col3 col_num
0 [2, 2, 6] [9, 3, 4] [2, 4] 0
1 [1, 3, 9] [6, 1, 5] [8, 1] 1
2 [6, 1, 0] [6, 2, 1] [2, 1] 2
3 [1, 9, 0] [8, 3, 5] [1, 3] 3
4 [0, 9, 3] [0, 2, 6] [5, 9] 4
5 [4, 0, 0] [2, 4, 4] [0, 8] 5
6 [4, 1, 7] [6, 3, 0] [1, 6] 6
7 [3, 2, 4] [6, 4, 7] [3, 3] 7
8 [7, 2, 4] [6, 7, 1] [5, 9] 8
9 [8, 0, 7] [5, 7, 9] [7, 9] 9
a = np.array(df['col1'].values.tolist())
b = np.array(df['col2'].values.tolist())
c = np.array(df['col3'].values.tolist())
#create Nx1 array
d = df['col_num'].values[:, None]
arr = np.hstack((a,b,c, d))
print (arr)
[[2 2 6 9 3 4 2 4 0]
[1 3 9 6 1 5 8 1 1]
[6 1 0 6 2 1 2 1 2]
[1 9 0 8 3 5 1 3 3]
[0 9 3 0 2 6 5 9 4]
[4 0 0 2 4 4 0 8 5]
[4 1 7 6 3 0 1 6 6]
[3 2 4 6 4 7 3 3 7]
[7 2 4 6 7 1 5 9 8]
[8 0 7 5 7 9 7 9 9]]
df = pd.DataFrame(arr)
print (df)
0 1 2 3 4 5 6 7 8
0 2 2 6 9 3 4 2 4 0
1 1 3 9 6 1 5 8 1 1
2 6 1 0 6 2 1 2 1 2
3 1 9 0 8 3 5 1 3 3
4 0 9 3 0 2 6 5 9 4
5 4 0 0 2 4 4 0 8 5
6 4 1 7 6 3 0 1 6 6
7 3 2 4 6 4 7 3 3 7
8 7 2 4 6 7 1 5 9 8
9 8 0 7 5 7 9 7 9 9
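
If the vectors within a column differ in length from row to row, the 2D conversion above does not apply. A hedged alternative, not taken from the answer, is to concatenate row by row with np.concatenate. The frame below is made up for illustration, and combined is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Small made-up frame; cells hold NumPy arrays of differing lengths.
df = pd.DataFrame({
    'col1': [np.array([1.0, 2.0]), np.array([3.0])],
    'col2': [np.array([4.0]), np.array([5.0, 6.0])],
    'col3': [np.array([7.0, 8.0]), np.array([9.0, 10.0])],
    'col_num': [3160, 3159],
})

# Concatenate the vectors of each row, appending col_num as the final element.
combined = [
    np.concatenate([a, b, c, [n]])
    for a, b, c, n in zip(df['col1'], df['col2'], df['col3'], df['col_num'])
]
print(combined)
```

This is a plain Python loop, so it trades speed for flexibility compared with the hstack approach.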

Change axis of matrix - Python (concatenate)

I would like to concatenate two matrices of different sizes:
[[1 1 1]
[2 3 2]
[5 5 3]
[3 2 5]
[4 4 4]]
[[1 3 2 5 4]
[1 2 5 3 4]]
to obtain this matrix:
[[1 1 1 1 1]
[2 3 2 3 2]
[5 5 3 2 5]
[3 2 5 5 3]
[4 4 4 4 4]]
I know the sizes of the matrices are different ((5, 3) vs. (2, 5)),
but I want to concatenate them.
Is it possible to change the axes of the second matrix?
Thanks!
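
No answer is recorded for this question here. One reading of the desired output is that the second matrix is transposed to shape (5, 2) and appended to the first as extra columns. A minimal sketch under that assumption (the names a, b and result are illustrative):

```python
import numpy as np

a = np.array([[1, 1, 1],
              [2, 3, 2],
              [5, 5, 3],
              [3, 2, 5],
              [4, 4, 4]])
b = np.array([[1, 3, 2, 5, 4],
              [1, 2, 5, 3, 4]])

# b.T has shape (5, 2), so it can be appended to a column-wise.
result = np.hstack((a, b.T))
print(result)
```

np.concatenate((a, b.T), axis=1) would give the same array.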

How to add dimension to a python numpy array

Let's suppose I have this array:
import numpy as np
x = np.arange(4)
array([0, 1, 2, 3])
I want to write a very basic formula which will generate from x this array:
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
What is the shortest way to do that with Python and NumPy?
Thanks

The easiest way I can think of is to use numpy broadcasting.
x[:,None]+x
Out[87]:
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
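
To see why this works, it can help to look at the shapes involved: x[:, None] is a (4, 1) column, x is a (4,) row, and broadcasting stretches both to (4, 4). The sketch below spells that out; the names col and table are illustrative, and np.add.outer is shown only as an equivalent spelling.

```python
import numpy as np

x = np.arange(4)
col = x[:, None]  # same as x[:, np.newaxis]; shape (4, 1)
print(col.shape, x.shape)

# (4, 1) + (4,) broadcasts to (4, 4): entry [i, j] is x[i] + x[j].
table = col + x
print(table)

# np.add.outer builds the same table of pairwise sums.
print(np.array_equal(table, np.add.outer(x, x)))
```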

This should do what you want (note that I have introduced a different number of rows (5) than of columns (4) to make a clear distinction):
import numpy as np
A = np.tile(np.arange(4).reshape(1,4),(5,1))+np.tile(np.arange(5).reshape(5,1),(1,4))
print(A)
A break-down of steps:
np.tile(np.arange(4).reshape(1,4),(5,1)) creates a (5,4) matrix with entries 0,1,2,3 in each row:
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
np.tile(np.arange(5).reshape(5,1),(1,4)) creates a (5,4) matrix with 0,1,2,3,4 in each column:
[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]
[4 4 4 4]]
The sum of the two results in what you want:
[[0 1 2 3]
[1 2 3 4]
[2 3 4 5]
[3 4 5 6]
[4 5 6 7]]
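
For comparison, the same (5, 4) result can be obtained without np.tile by letting broadcasting do the repetition. This is a sketch that checks the two approaches agree; rows, cols, tiled and broadcast are names introduced here.

```python
import numpy as np

rows, cols = 5, 4
# The tile-based construction from the answer above.
tiled = (np.tile(np.arange(cols).reshape(1, cols), (rows, 1))
         + np.tile(np.arange(rows).reshape(rows, 1), (1, cols)))
# Broadcasting a (5, 1) column against a (4,) row avoids the explicit copies.
broadcast = np.arange(rows)[:, None] + np.arange(cols)
print(np.array_equal(tiled, broadcast))
```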
