Index based dot product with numpy

I am trying to optimize a transformation problem and let numpy do as much heavy-lifting as possible.
In my case I have a range of coordinate sets that each have to be dotted with corresponding indexed roll/pitch/yaw values.
Programmatically it looks like this:
In [1]: import numpy as np
...: from ds.tools.math import rotation_array
...: from math import pi
...:
...: rpy1 = rotation_array(pi, 0.001232, 1.234243)
...: rpy2 = rotation_array(pi/1, 1.325, 0.5674543)
In [2]: rpy1
Out[2]:
array([[ 3.30235500e-01,  9.43897768e-01, -1.23199969e-03],
       [ 9.43898485e-01, -3.30235750e-01,  1.22464587e-16],
       [-4.06850342e-04, -1.16288264e-03, -9.99999241e-01]])
In [3]: rpy2
Out[3]:
array([[ 2.05192356e-01,  1.30786082e-01, -9.69943863e-01],
       [ 5.37487075e-01, -8.43271987e-01,  2.97991829e-17],
       [-8.17926489e-01, -5.21332290e-01, -2.43328794e-01]])
...:
...: a1 = np.array([[-9.64996132, -5.42488639, -3.08443],
...: [-8.08814188, -4.56431952, -3.01381]])
...:
...: a2 = np.array([[-6.91346292, -3.91137259, -2.82621],
...: [-4.34534536, -2.34535546, -4.87692]])
Then I dot the coordinates in a1 with rpy1, and a2 with rpy2:
In [4]: a1.dot(rpy1)
Out[4]:
array([[-8.30604694, -7.31349869,  3.09631641],
       [-6.97801968, -6.12357288,  3.0237723 ]])
In [5]: a2.dot(rpy2)
Out[5]:
array([[-1.20926993,  3.86756074,  7.3933692 ],
       [ 1.83673215,  3.95195774,  5.40143613]])
Instead of iterating over lists of a's and rpy's, I want to do the whole thing in one operation. I was hoping the following code would have that effect, so that each set of coordinates in a12 is dotted with the correspondingly indexed array from rpy_a.
But as is clear from the output, I am getting more than I was hoping for:
In [6]: rpy_a = np.array([rpy1, rpy2])
...:
...: a12 = np.array([a1, a2])
In [7]: a12.dot(rpy_a)
Out[7]:
array([[[[-8.30604694, -7.31349869,  3.09631641],
         [-2.37306761,  4.92058705, 10.1104514 ]],
        [[-6.97801968, -6.12357288,  3.0237723 ],
         [-1.6478126 ,  4.36234287,  8.57839034]]],
       [[[-5.9738597 , -5.23064061,  2.83472524],
         [-1.20926993,  3.86756074,  7.3933692 ]],
        [[-3.64678058, -3.32137028,  4.88226976],
         [ 1.83673215,  3.95195774,  5.40143613]]]])
What I need is:
array([[[-8.30604694, -7.31349869,  3.09631641],
        [-6.97801968, -6.12357288,  3.0237723 ]],
       [[-1.20926993,  3.86756074,  7.3933692 ],
        [ 1.83673215,  3.95195774,  5.40143613]]])
Can anyone tell me how to achieve this?
EDIT:
Runnable example:
import numpy as np
rpy1 = np.array([[ 3.30235500e-01,  9.43897768e-01, -1.23199969e-03],
                 [ 9.43898485e-01, -3.30235750e-01,  1.22464587e-16],
                 [-4.06850342e-04, -1.16288264e-03, -9.99999241e-01]])
rpy2 = np.array([[ 2.05192356e-01,  1.30786082e-01, -9.69943863e-01],
                 [ 5.37487075e-01, -8.43271987e-01,  2.97991829e-17],
                 [-8.17926489e-01, -5.21332290e-01, -2.43328794e-01]])
a1 = np.array([[-9.64996132, -5.42488639, -3.08443],
               [-8.08814188, -4.56431952, -3.01381]])
a2 = np.array([[-6.91346292, -3.91137259, -2.82621],
               [-4.34534536, -2.34535546, -4.87692]])
print(a1.dot(rpy1))
# array([[-8.30604694, -7.31349869,  3.09631641],
#        [-6.97801968, -6.12357288,  3.0237723 ]])
print(a2.dot(rpy2))
# array([[-1.20926993,  3.86756074,  7.3933692 ],
#        [ 1.83673215,  3.95195774,  5.40143613]])
rpy_a = np.array([rpy1, rpy2])
a12 = np.array([a1, a2])
print(a12.dot(rpy_a))
# Result:
# array([[[[-8.30604694, -7.31349869,  3.09631641],
#          [-2.37306761,  4.92058705, 10.1104514 ]],
#         [[-6.97801968, -6.12357288,  3.0237723 ],
#          [-1.6478126 ,  4.36234287,  8.57839034]]],
#        [[[-5.9738597 , -5.23064061,  2.83472524],
#          [-1.20926993,  3.86756074,  7.3933692 ]],
#         [[-3.64678058, -3.32137028,  4.88226976],
#          [ 1.83673215,  3.95195774,  5.40143613]]]])
# Need:
# array([[[-8.30604694, -7.31349869,  3.09631641],
#         [-6.97801968, -6.12357288,  3.0237723 ]],
#        [[-1.20926993,  3.86756074,  7.3933692 ],
#         [ 1.83673215,  3.95195774,  5.40143613]]])

Assuming you want to handle an arbitrary number of arrays rpy1, rpy2, ..., rpyn and a1, a2, ..., an, I suggest explicit concatenation along the first axis, simply in the spirit of "explicit is better than implicit":
a12 = np.concatenate([_a[None, ...] for _a in (a1, a2)], axis=0)
rpy_a = np.concatenate([_a[None, ...] for _a in (rpy1, rpy2)], axis=0)
This is equivalent to:
a12 = np.array([a1, a2])
rpy_a = np.array([rpy1, rpy2])
np.array requires less code and is also faster than my explicit approach, but I just like defining the axis explicitly so that everyone reading the code can guess the resulting shape without executing it.
Whatever path you choose, the important part is the following:
np.einsum('jnk,jkm->jnm', a12, rpy_a)
# Out:
array([[[-8.30604694, -7.3134987 ,  3.09631641],
        [-6.97801969, -6.12357288,  3.0237723 ]],
       [[-1.20926993,  3.86756074,  7.3933692 ],
        [ 1.83673215,  3.95195774,  5.40143613]]])
Using the Einstein summation convention, you can define the matrix product (np.matmul, which equals np.dot for 2-D arrays) so that it is executed along a specific axis.
In this case, we define the concatenation axis j (the first dimension, axis 0) as the shared batch axis, along which the operation 'nk,km->nm' (equivalent to np.matmul; see the (n,k),(k,m)->(n,m) signature in the np.matmul documentation) is performed.
It is also possible to simply use np.matmul (or the Python operator @) for the same results:
np.matmul(a12, rpy_a)
a12 @ rpy_a
But again: For the general case, where the concatenation axis or shapes may change, the more explicit np.einsum is preferable. If you know that no changes will be made to shapes etc., np.matmul should be preferred (less code and faster).
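As a quick sanity check, here is a minimal sketch (reusing rpy1, rpy2, a1, a2 from the runnable example above) that compares both batched forms against the plain per-pair loop:

import numpy as np

a12 = np.array([a1, a2])          # shape (2, 2, 3)
rpy_a = np.array([rpy1, rpy2])    # shape (2, 3, 3)

# per-pair loop: the result the question computes by hand
looped = np.array([a.dot(rpy) for a, rpy in zip((a1, a2), (rpy1, rpy2))])

batched_einsum = np.einsum('jnk,jkm->jnm', a12, rpy_a)
batched_matmul = a12 @ rpy_a      # np.matmul broadcasts over the leading (batch) axis

print(np.allclose(looped, batched_einsum))  # True
print(np.allclose(looped, batched_matmul))  # True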

Related

How to Plot in 3D Principal Component Analysis Visualizations, using the fast PCA script from this answer

I found this fast script here on Stack Overflow for performing PCA with a given numpy array.
I don't know how to plot it in 3D, or how to plot the cumulative explained variance against the number of components. The script uses the covariance method rather than singular value decomposition; maybe that's the reason I can't get my cumulative variances?
I tried plotting with this, but it doesn't work.
This is the code and my output:
from numpy import array, dot, mean, std, empty, argsort
from numpy.linalg import eigh, solve
from numpy.random import randn
from matplotlib.pyplot import subplots, show
def cov(X):
    """
    Covariance matrix
    note: specifically for mean-centered data
    note: numpy's `cov` uses N-1 as normalization
    """
    return dot(X.T, X) / X.shape[0]
    # N = data.shape[1]
    # C = empty((N, N))
    # for j in range(N):
    #     C[j, j] = mean(data[:, j] * data[:, j])
    #     for k in range(j + 1, N):
    #         C[j, k] = C[k, j] = mean(data[:, j] * data[:, k])
    # return C

def pca(data, pc_count=None):
    """
    Principal component analysis using eigenvalues
    note: this mean-centers and auto-scales the data (in-place)
    """
    data -= mean(data, 0)
    data /= std(data, 0)
    C = cov(data)
    E, V = eigh(C)
    key = argsort(E)[::-1][:pc_count]
    E, V = E[key], V[:, key]
    U = dot(data, V)
    print(f'Eigen Values: {E}')
    print(f'Eigen Vectors: {V}')
    print(f'Key: {key}')
    print(f'U: {U}')
    print(f'shape: {U.shape}')
    return U, E, V

data = dftransformed.transpose()  # df transpose and convert to numpy
trans = pca(data, 3)[0]

fig, (ax1, ax2) = subplots(1, 2)
ax1.scatter(data[:50, 0], data[:50, 1], c='r')
ax1.scatter(data[50:, 0], data[50:, 1], c='b')
ax2.scatter(trans[:50, 0], trans[:50, 1], c='r')
ax2.scatter(trans[50:, 0], trans[50:, 1], c='b')
show()
I understand the eigenvalues & eigenvectors, but I can't understand this key value; the user didn't comment that section of code in the answer. Does anyone know what each printed variable means?
output:
Eigen Values: [126.30390621 68.48966957 26.03124927]
Eigen Vectors: [[-0.05998409 0.05852607 -0.03437937]
[ 0.00807487 0.00157143 -0.12352761]
[-0.00341751 0.03819162 0.08697668]
...
[-0.0210582 0.06601974 -0.04013712]
[-0.03558994 0.02953385 0.01885872]
[-0.06728424 -0.04162485 -0.01508154]]
Key: [439 438 437]
U: [[-12.70954048 8.97405411 -2.79812235]
[ -4.90853527 4.36517107 0.54129243]
[ -2.49370123 0.48341147 7.26682759]
[-16.07860635 6.16100749 5.81777637]
[ -1.81893291 6.48443689 -5.8655646 ]
[ 9.03939039 2.64196391 4.22056618]
[-14.71731064 9.19532016 -2.79275543]
[ 1.60998654 8.37866823 0.86207034]
[ -4.4503797 10.12688097 -5.12453656]
[ 12.16293556 2.2594413 -2.11730311]
[-15.76505125 9.48537581 -2.73906772]
[ -2.54289959 9.86768111 -4.84802992]
[ -5.78214902 9.21901651 -8.13594627]
[ -1.35428398 5.85550586 6.30553987]
[ 12.87261987 0.96283606 -3.26982121]
[ 24.57767477 -4.28214631 6.29510659]
[ 4.13941679 3.3688288 3.01194055]
[ -2.98318764 1.32775227 7.62610929]
[ -4.44461549 -1.49258339 1.39080386]
[ -0.10590795 -0.3313904 8.46363066]
[ 6.05960739 1.03091753 5.10875657]
[-21.27737352 -3.44453629 3.25115921]
[ -1.1183025 0.55238687 10.75611405]
[-10.6359291 7.58630341 -0.55088259]
[ 4.52557492 -8.05670864 2.23113833]
[-11.07822559 1.50970501 4.66555889]
[ -6.89542628 -19.24672805 -3.71322812]
[ -0.57831362 -17.84956249 -5.52002876]
[-12.70262277 -14.05542691 -2.72417438]
[ -7.50263129 -15.83723295 -3.2635125 ]
[ -7.52780216 -17.60790567 -2.00134852]
[ -5.34422731 -17.29394266 -2.69261597]
[ 9.40597893 0.21140292 2.05522806]
[ 12.12423431 -2.80281266 7.81182024]
[ 19.51224195 4.7624575 -11.20523383]
[ 22.38102384 0.82486072 -1.64716468]
[ -8.60947699 4.12597477 -6.01885407]
[ 9.56268414 1.18190655 -5.44074124]
[ 14.97675455 3.31666971 -3.30012109]
[ 20.47530869 -1.95896058 -1.91238615]]
shape: (40, 3)
trans = pca(data, 3)[0] is the U data, since [0] selects the first index of the returned data, and pca returns U, E, V
ax2.scatter(trans[:50, 0], trans[:50, 1], c='r') plots the first 50 rows of column 0 against the first 50 rows of column 1, and ax2.scatter(trans[50:, 0], trans[50:, 1], c='b') does the same for rows from 50 to the end. That split comes from the sample data used in the fast script, but your data only has shape (40, 3) (i.e. only 40 rows of data).
In order to plot trans as a 3d scatter plot, extract each of the 3 columns into a separate variable and plot as a scatter plot.
# imports as shown in the linked answer
from numpy import array, dot, mean, std, empty, argsort
from numpy.linalg import eigh, solve
from numpy.random import randn
from matplotlib.pyplot import subplots, show
# other imports
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.figure() below
# test data from linked answer (e.g. this fast script)
np.random.seed(365) # makes data repeatable
data = array([randn(8) for k in range(150)]) # creates array with shape (150, 8)
data[:50, 2:4] += 5 # adds 5 to first 50 rows of columns 2:4
data[50:, 2:5] += 5 # adds 5 to to rows from 50 of columns 2:5
# function call
trans = pca(data, 3)[0] # [0] gets U returned by pca(...)
# extract each column to a separate variable
x = trans[:, 0] # all rows of column 0
y = trans[:, 1] # all rows of column 1
z = trans[:, 2] # all rows of column 2
# plot 3d scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
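The question also asks about the cumulative explained variance. A minimal sketch (not part of the original answer; it assumes the pca function above and uses the eigenvalues it returns) derives it directly from the eigenvalues and plots it against the number of components:

# cumulative explained variance from the eigenvalues
# note: pca() standardizes `data` in place, so call it on a copy here
U_all, E_all, V_all = pca(data.copy(), None)    # None keeps all components

explained = E_all / E_all.sum()                 # fraction of variance per component
cumulative = np.cumsum(explained)               # cumulative explained variance

fig2, ax3 = plt.subplots()
ax3.plot(np.arange(1, len(cumulative) + 1), cumulative, marker='o')
ax3.set_xlabel('Number of components')
ax3.set_ylabel('Cumulative explained variance')
plt.show()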

Get column-wise maximums from a NumPy array

I have a 2D array, say
x = np.random.rand(10, 3)
array([[ 0.51158246, 0.51214272, 0.1107923 ],
[ 0.5210391 , 0.85308284, 0.63227215],
[ 0.57239625, 0.06276943, 0.1069803 ],
[ 0.71627613, 0.66454443, 0.56771438],
[ 0.24595493, 0.01007568, 0.84959605],
[ 0.99158904, 0.25034553, 0.00144037],
[ 0.43292656, 0.9247424 , 0.5123086 ],
[ 0.07224077, 0.57230282, 0.88522979],
[ 0.55665913, 0.20119776, 0.58865823],
[ 0.55129624, 0.26226446, 0.63070611]])
Then I find the indexes of maximum elements along the columns:
indexes = np.argmax(x, axis=0)
array([5, 6, 7])
So far so good.
But how do I actually get those elements? That is, how do I get some_operation(x, indexes) == [0.99158904, 0.9247424, 0.88522979]?
Note that I need both the indexes and the associated values.
The best I could come up with was x[indexes, range(x.shape[1])], but it looks kinda complicated and inefficient. Is there a more idiomatic way?
You can use np.amax to find the max value along an axis.
Using your example (x is the original array in your post):
In[1]: np.argmax(x, axis=0)
Out[1]:
array([5, 6, 7], dtype=int64)
In[2]: np.amax(x, axis=0)
Out[2]:
array([ 0.99158904, 0.9247424 , 0.88522979])
Documentation link
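If you need the values at exactly the indices returned by argmax (so indices and values stay paired), a short sketch not in the original answer, using either integer indexing or np.take_along_axis:

import numpy as np

x = np.random.rand(10, 3)
indexes = np.argmax(x, axis=0)                       # column-wise argmax, shape (3,)

# integer indexing: one row index per column, paired with the column index
values = x[indexes, np.arange(x.shape[1])]

# equivalent, using take_along_axis (NumPy >= 1.15)
values_alt = np.take_along_axis(x, indexes[None, :], axis=0)[0]

print(np.array_equal(values, np.amax(x, axis=0)))    # True
print(np.array_equal(values, values_alt))            # True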

Numpy compare values inside to return greater index

I have a numpy array and another array:
[array([-1.67397643, -2.77258872]), array([-1.67397643, -2.77258872]), array([-2.77258872, -1.67397643]), array([-2.77258872, -1.67397643])]
For each array, the index position of the larger value wins, i.e. since -1.67397643 > -2.77258872, the result for the first array would be 0.
The final output would be the numpy array [0, 0, 1, 1] (a list is fine too).
How can I do that?
It seems you have a list of arrays, so I would start by making them a proper numpy array:
a = [array([-1.67397643, -2.77258872]), array([-1.67397643, -2.77258872]), array([-2.77258872, -1.67397643]), array([-2.77258872, -1.67397643])]
b = np.array(a).T # .T transposes it.
c = b[0] < b[1]
c is now an array([False, False, True, True], dtype=bool), and probably serves your purpose. If you must have [0,0,1,1] instead, then:
d = np.zeros(len(c))
d[c] = 1
d is now an array([ 0., 0., 1., 1.])
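A possibly simpler alternative (a small sketch, not part of the original answer): since each row has exactly two entries, np.argmax along axis 1 gives the index of the larger value directly:

import numpy as np

a = [np.array([-1.67397643, -2.77258872]), np.array([-1.67397643, -2.77258872]),
     np.array([-2.77258872, -1.67397643]), np.array([-2.77258872, -1.67397643])]

# index of the maximum within each row
result = np.argmax(np.array(a), axis=1)
print(result)  # [0 0 1 1]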

With Numba's `guvectorize` targeted to CUDA, how do I specify a variable as both input and output?

I want to use Numba's guvectorize method to run code on my CUDA card. I first defined a CPU method
from numba import guvectorize
import numpy as np
@guvectorize(['float32[:,:], float32[:,:]',
              'float64[:,:], float64[:,:]'],
             '(n,m)->(n,m)', nopython=True, target='cpu')
def update_a_cpu(A, Anew):
    n, m = A.shape
    for j in range(1, n-1):
        for i in range(1, m-1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])
which gives the expected output for a test matrix
>>> A = np.arange(16, dtype=np.float32).reshape(4,4) # single precision for GTX card
>>> Anew = np.zeros((4,4), dtype=np.float32)
>>> res_cpu = update_a_cpu(A, Anew)
>>> print(res_cpu)
[[ 0. 0. 0. 0.]
[ 0. 5. 6. 0.]
[ 0. 9. 10. 0.]
[ 0. 0. 0. 0.]]
Actually, when targeting the CPU, Anew is mutated in place so there was no need to assign the output to res_cpu
>>> res_cpu is Anew
True
Changing the target to 'cuda' drastically changes the guvectorize behavior in a manner not documented for Generalized CUDA ufuncs. Here is the modified ufunc definition
@guvectorize(['float32[:,:], float32[:,:]',
              'float64[:,:], float64[:,:]'],
             '(n,m)->(n,m)', nopython=True, target='cuda')
def update_a_cuda(A, Anew):
    n, m = A.shape
    for j in range(1, n-1):
        for i in range(1, m-1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])
Now the function does not accept the second input matrix
>>> res_cuda = update_a_cuda(A, Anew)
...
TypeError: invalid number of input argument
and instead creates an empty matrix to put the value into
>>> res_cuda = update_a_cuda(A)
>>> print(res_cuda)
array([[ 1.55011636e-41, 1.55011636e-41, 1.55011636e-41, 1.55011636e-41],
[ 1.55011636e-41, 5.00000000e+00, 6.00000000e+00, 1.55011636e-41],
[ 1.55011636e-41, 9.00000000e+00, 1.00000000e+01, 1.55011636e-41],
[ 1.55011636e-41, 1.55011636e-41, 1.55011636e-41, 1.55011636e-41]], dtype=float32)
I would like the generalized ufunc to update the appropriate values of an input matrix rather than populating an empty matrix. When targeting a CUDA device, is there a way to specify a variable as both input and output?

transposing data in array using numpy

I have a list like the following that needs to be transposed to a numpy array:
samplelist = [ [ ['Name-1','Name-2','Name-3'], ['Age-1','Age-2','Age-3'] ],
               [ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
             ]
Expected Result:
samplearray = [ [ ['Name-1','Age-1'], ['Name-2','Age-2'], ['Name-3','Age-3'] ],
                [ ['new_Name_1','new_Age_1'], ['new_Name_2','new_Age_2'], ['new_Name_3','new_Age_3'] ]
              ]
np.transpose results:
np.transpose(a)
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
samplelist is a 3-D array.
In [58]: samplelist.shape
Out[58]: (2, 2, 3)
Using transpose swaps the first and last axes (0 and 2):
In [55]: samplelist.T
Out[55]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
In [57]: samplelist.swapaxes(0,2)
Out[57]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
To get the desired array, swap axes 1 and 2:
import numpy as np
samplelist = np.array([
[ ['Name-1','Name-2','Name-3'] , ['Age-1','Age-2','Age-3'] ],
[ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
])
print(samplelist.swapaxes(1,2))
# [[['Name-1' 'Age-1']
# ['Name-2' 'Age-2']
# ['Name-3' 'Age-3']]
# [['new_Name_1' 'new_Age_1']
# ['new_Name_2' 'new_Age_2']
# ['new_Name_3' 'new_Age_3']]]
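Equivalently (a small sketch, not part of the original answer), the same result can be obtained by passing an explicit axes order to np.transpose:

import numpy as np

samplelist = np.array([
    [ ['Name-1','Name-2','Name-3'], ['Age-1','Age-2','Age-3'] ],
    [ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
])

# keep axis 0, swap axes 1 and 2 -- same as samplelist.swapaxes(1, 2)
print(np.transpose(samplelist, (0, 2, 1)))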