Transposing data in an array using numpy

I have a list like the following that needs to be transposed into a numpy array:
samplelist= [ [ ['Name-1','Name-2','Name-3'] , ['Age-1','Age-2','Age-3'] ],
[ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
]
Expected Result:
samplearray = [ [ ['Name-1','Age-1'], ['Name-2','Age-2'], ['Name-3','Age-3'] ],
[ ['new_Name_1','new_Age_1'], ['new_Name_2','new_Age_2'], ['new_Name_3','new_Age_3'] ]
]
np.transpose results:
np.transpose(a)
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')

Once converted with np.array, samplelist is a 3-D array:
In [58]: samplelist.shape
Out[58]: (2, 2, 3)
Transposing reverses the order of the axes, which for a 3-D array swaps the first and last axes (0 and 2):
In [55]: samplelist.T
Out[55]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
In [57]: samplelist.swapaxes(0,2)
Out[57]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
To get the desired array, swap axes 1 and 2:
import numpy as np
samplelist = np.array([
[ ['Name-1','Name-2','Name-3'] , ['Age-1','Age-2','Age-3'] ],
[ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
])
print(samplelist.swapaxes(1,2))
# [[['Name-1' 'Age-1']
# ['Name-2' 'Age-2']
# ['Name-3' 'Age-3']]
# [['new_Name_1' 'new_Age_1']
# ['new_Name_2' 'new_Age_2']
# ['new_Name_3' 'new_Age_3']]]
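As a minimal sketch of the same idea, np.transpose can also be given an explicit axis order. The variable names via_transpose and via_swapaxes are made up for this illustration; the permutation (0, 2, 1) keeps axis 0 in place and exchanges axes 1 and 2, so it should match swapaxes(1, 2):

```python
import numpy as np

samplelist = np.array([
    [['Name-1', 'Name-2', 'Name-3'], ['Age-1', 'Age-2', 'Age-3']],
    [['new_Name_1', 'new_Name_2', 'new_Name_3'],
     ['new_Age_1', 'new_Age_2', 'new_Age_3']],
])

# Keep axis 0 where it is and exchange axes 1 and 2.
via_transpose = np.transpose(samplelist, axes=(0, 2, 1))
via_swapaxes = samplelist.swapaxes(1, 2)

print(via_transpose.shape)                          # (2, 3, 2)
print(np.array_equal(via_transpose, via_swapaxes))  # True
```

Spelling out the full permutation is a little more verbose than swapaxes, but it makes the target layout visible when more than two axes move.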

Index based dot product with numpy

I am trying to optimize a transformation problem and let numpy do as much heavy-lifting as possible.
In my case I have a range of coordinate sets that each have to be dotted with corresponding indexed roll/pitch/yaw values.
Programmatically it looks like this:
In [1]: import numpy as np
...: from ds.tools.math import rotation_array
...: from math import pi
...:
...: rpy1 = rotation_array(pi, 0.001232, 1.234243)
...: rpy2 = rotation_array(pi/1, 1.325, 0.5674543)
In [2]: rpy1
Out[2]:
array([[ 3.30235500e-01, 9.43897768e-01, -1.23199969e-03],
[ 9.43898485e-01, -3.30235750e-01, 1.22464587e-16],
[-4.06850342e-04, -1.16288264e-03, -9.99999241e-01]])
In [3]: rpy2
Out[3]:
array([[ 2.05192356e-01, 1.30786082e-01, -9.69943863e-01],
[ 5.37487075e-01, -8.43271987e-01, 2.97991829e-17],
[-8.17926489e-01, -5.21332290e-01, -2.43328794e-01]])
...:
...: a1 = np.array([[-9.64996132, -5.42488639, -3.08443],
...: [-8.08814188, -4.56431952, -3.01381]])
...:
...: a2 = np.array([[-6.91346292, -3.91137259, -2.82621],
...: [-4.34534536, -2.34535546, -4.87692]])
Then I dot the coordinates in a1 with rpy1 and a2 with rpy2:
In [4]: a1.dot(rpy1)
Out[4]:
array([[-8.30604694, -7.31349869, 3.09631641],
[-6.97801968, -6.12357288, 3.0237723 ]])
In [5]: a2.dot(rpy2)
Out[5]:
array([[-1.20926993, 3.86756074, 7.3933692 ],
[ 1.83673215, 3.95195774, 5.40143613]])
Instead of iterating over lists of a's and rpy's, I want to do the whole thing in one operation. I was hoping the following code would do that, so that each set of coordinates in a12 is dotted with the array at the corresponding index in rpy_a.
But as the output shows, I am getting more than I was hoping for:
In [6]: rpy_a = np.array([rpy1, rpy2])
...:
...: a12 = np.array([a1, a2])
In [7]: a12.dot(rpy_a)
Out[7]:
array([[[[-8.30604694, -7.31349869, 3.09631641],
[-2.37306761, 4.92058705, 10.1104514 ]],
[[-6.97801968, -6.12357288, 3.0237723 ],
[-1.6478126 , 4.36234287, 8.57839034]]],
[[[-5.9738597 , -5.23064061, 2.83472524],
[-1.20926993, 3.86756074, 7.3933692 ]],
[[-3.64678058, -3.32137028, 4.88226976],
[ 1.83673215, 3.95195774, 5.40143613]]]])
What I need is:
array([[[-8.30604694, -7.31349869, 3.09631641],
[-6.97801968, -6.12357288, 3.0237723 ]],
[[-1.20926993, 3.86756074, 7.3933692 ],
[ 1.83673215, 3.95195774, 5.40143613]]])
Can anyone tell me how to achieve this?
EDIT:
Runnable example:
import numpy as np
rpy1 = np.array([[ 3.30235500e-01, 9.43897768e-01, -1.23199969e-03],
[ 9.43898485e-01, -3.30235750e-01, 1.22464587e-16],
[-4.06850342e-04, -1.16288264e-03, -9.99999241e-01]])
rpy2 = np.array([[ 2.05192356e-01, 1.30786082e-01, -9.69943863e-01],
[ 5.37487075e-01, -8.43271987e-01, 2.97991829e-17],
[-8.17926489e-01, -5.21332290e-01, -2.43328794e-01]])
a1 = np.array([[-9.64996132, -5.42488639, -3.08443],
[-8.08814188, -4.56431952, -3.01381]])
a2 = np.array([[-6.91346292, -3.91137259, -2.82621],
[-4.34534536, -2.34535546, -4.87692]])
print(a1.dot(rpy1))
# array([[-8.30604694, -7.31349869, 3.09631641],
# [-6.97801968, -6.12357288, 3.0237723 ]])
print(a2.dot(rpy2))
# array([[-1.20926993, 3.86756074, 7.3933692 ],
# [ 1.83673215, 3.95195774, 5.40143613]])
rpy_a = np.array([rpy1, rpy2])
a12 = np.array([a1, a2])
print(a12.dot(rpy_a))
# Result:
# array([[[[-8.30604694, -7.31349869, 3.09631641],
# [-2.37306761, 4.92058705, 10.1104514 ]],
# [[-6.97801968, -6.12357288, 3.0237723 ],
# [-1.6478126 , 4.36234287, 8.57839034]]],
# [[[-5.9738597 , -5.23064061, 2.83472524],
# [-1.20926993, 3.86756074, 7.3933692 ]],
# [[-3.64678058, -3.32137028, 4.88226976],
# [ 1.83673215, 3.95195774, 5.40143613]]]])
# Need:
# array([[[-8.30604694, -7.31349869, 3.09631641],
# [-6.97801968, -6.12357288, 3.0237723 ]],
# [[-1.20926993, 3.86756074, 7.3933692 ],
# [ 1.83673215, 3.95195774, 5.40143613]]])
Assuming you want to handle an arbitrary number of arrays rpy1, rpy2, ..., rpyn and a1, a2, ..., an, I suggest explicit concatenation along a new first axis, following the principle that "explicit is better than implicit":
a12 = np.concatenate([_a[None, ...] for _a in (a1, a2)], axis=0)
rpy_a = np.concatenate([_a[None, ...] for _a in (rpy1, rpy2)], axis=0)
This is equal to:
a12 = np.array([a1, a2])
rpy_a = np.array([rpy1, rpy2])
np.array requires less code and is also faster than my explicit approach, but I just like defining the axis explicitly so that everyone reading the code can guess the resulting shape without executing it.
Whatever path you choose, the important part is the following:
np.einsum('jnk,jkm->jnm', a12, rpy_a)
# Out:
array([[[-8.30604694, -7.3134987 , 3.09631641],
[-6.97801969, -6.12357288, 3.0237723 ]],
[[-1.20926993, 3.86756074, 7.3933692 ],
[ 1.83673215, 3.95195774, 5.40143613]]])
Using the Einstein summation convention, you can specify the axis along which np.matmul (equivalent to np.dot for 2-D arrays) is executed.
In this case, we define the concatenation axis j (the first dimension, axis 0) to be the shared axis, along which the operation 'nk,km->nm' (equal to np.matmul, see the signature description in the out parameter) is performed.
It is also possible to simply use np.matmul (or the Python operator @) for the same results:
np.matmul(a12, rpy_a)
a12 @ rpy_a
But again: For the general case, where the concatenation axis or shapes may change, the more explicit np.einsum is preferable. If you know that no changes will be made to shapes etc., np.matmul should be preferred (less code and faster).
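To check the claim that the batched forms match the per-index dot products, here is a small sketch. It uses random stand-in data rather than the rotation matrices from the question, and the names rng, looped, batched_einsum and batched_matmul are made up for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a12 = rng.normal(size=(2, 2, 3))    # two coordinate sets, each of shape (2, 3)
rpy_a = rng.normal(size=(2, 3, 3))  # two stand-in matrices, each of shape (3, 3)

# Reference result: one ordinary 2-D dot product per index along axis 0.
looped = np.stack([a.dot(r) for a, r in zip(a12, rpy_a)])

batched_einsum = np.einsum('jnk,jkm->jnm', a12, rpy_a)
batched_matmul = a12 @ rpy_a

print(np.allclose(looped, batched_einsum))  # True
print(np.allclose(looped, batched_matmul))  # True

# np.dot instead pairs every coordinate set with every matrix.
print(a12.dot(rpy_a).shape)  # (2, 2, 2, 3)
```

The last line shows why the original attempt returned "more than hoped for": np.dot on N-D arrays sums over the last axis of the first array and the second-to-last axis of the second, and keeps all the remaining axes of both.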

What is the difference between the following matrices?

I have a piece of code like the following. I have to implement image2vector() that takes an input of shape (length, height, 3) and returns a vector of shape (length*height*3). It doesn't give me the result I expect, and I don't understand the difference between the result I got and the expected one.
def image2vector(image):
    v = None
    v = image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
    return v
image = np.array([[[ 0.67826139, 0.29380381],
[ 0.90714982, 0.52835647],
[ 0.4215251 , 0.45017551]],
[[ 0.92814219, 0.96677647],
[ 0.85304703, 0.52351845],
[ 0.19981397, 0.27417313]],
[[ 0.60659855, 0.00533165],
[ 0.10820313, 0.49978937],
[ 0.34144279, 0.94630077]]])
print ("image2vector(image) = " + str(image2vector(image)))
I got the following result:
image2vector(image) = [[ 0.67826139 0.29380381 0.90714982 0.52835647 0.4215251 0.45017551
0.92814219 0.96677647 0.85304703 0.52351845 0.19981397 0.27417313
0.60659855 0.00533165 0.10820313 0.49978937 0.34144279 0.94630077]]
But I want to get the following one:
[[ 0.67826139] [ 0.29380381] [ 0.90714982] [ 0.52835647] [ 0.4215251 ] [ 0.45017551] [ 0.92814219] [ 0.96677647] [ 0.85304703] [ 0.52351845] [ 0.19981397] [ 0.27417313] [ 0.60659855] [ 0.00533165] [ 0.10820313] [ 0.49978937] [ 0.34144279] [ 0.94630077]]
What is the difference between them? How do I get the second matrix from the first one?
Your image does not have the shape (length, height, 3):
In [1]: image = np.array([[[ 0.67826139, 0.29380381],
...: [ 0.90714982, 0.52835647],
...: [ 0.4215251 , 0.45017551]],
...:
...: [[ 0.92814219, 0.96677647],
...: [ 0.85304703, 0.52351845],
...: [ 0.19981397, 0.27417313]],
...:
...: [[ 0.60659855, 0.00533165],
...: [ 0.10820313, 0.49978937],
...: [ 0.34144279, 0.94630077]]])
In [2]: image.shape
Out[2]: (3, 3, 2)
and you can't do the reshape you are attempting:
In [3]: image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-aac5649a99ea> in <module>
----> 1 image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
ValueError: cannot reshape array of size 18 into shape (1,9,18)
It has only 18 elements; you can't increase the number of elements with reshape.
In [4]: image.reshape(1, image.shape[0] * image.shape[1] * image.shape[2])
Out[4]:
array([[0.67826139, 0.29380381, 0.90714982, 0.52835647, 0.4215251 ,
0.45017551, 0.92814219, 0.96677647, 0.85304703, 0.52351845,
0.19981397, 0.27417313, 0.60659855, 0.00533165, 0.10820313,
0.49978937, 0.34144279, 0.94630077]])
In [5]: _.shape
Out[5]: (1, 18)
The apparently desired shape is:
In [6]: image.reshape(image.shape[0] * image.shape[1] * image.shape[2],1)
Out[6]:
array([[0.67826139],
[0.29380381],
[0.90714982],
[0.52835647],
...
[0.94630077]])
In [7]: _.shape
Out[7]: (18, 1)
The difference is whether you want a plain 1-D vector array or a row or column vector.
Usually a column ("vertical") vector has the shape (n,1) and a row ("horizontal") vector has the shape (1,n):
import numpy as np
image = np.array([[[ 0.67826139, 0.29380381],
[ 0.90714982, 0.52835647],
[ 0.4215251 , 0.45017551]],
[[ 0.92814219, 0.96677647],
[ 0.85304703, 0.52351845],
[ 0.19981397, 0.27417313]],
[[ 0.60659855, 0.00533165],
[ 0.10820313, 0.49978937],
[ 0.34144279, 0.94630077]]])
reshapedImage = image.reshape(18,1)
reshapedImage
array([[0.67826139],
[0.29380381],
[0.90714982],
[0.52835647],
[0.4215251],
[0.45017551],
[0.92814219],
[0.96677647],
[0.85304703],
[0.52351845],
[0.19981397],
[0.27417313],
[0.60659855],
[0.00533165],
[0.10820313],
[0.49978937],
[0.34144279],
[0.94630077]])
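A short sketch of the three shapes side by side, using -1 so that reshape infers the element count instead of it being hard-coded as 18. The image here is a stand-in built with np.arange, and the names flat, row and column are made up for this illustration:

```python
import numpy as np

# Stand-in for the (3, 3, 2) image from the question.
image = np.arange(18, dtype=float).reshape(3, 3, 2)

flat = image.reshape(-1)        # plain 1-D vector
row = image.reshape(1, -1)      # row vector
column = image.reshape(-1, 1)   # column vector

print(flat.shape, row.shape, column.shape)  # (18,) (1, 18) (18, 1)
print(np.array_equal(row.T, column))        # True
```

All three hold the same 18 values in the same order; only the number of dimensions and their arrangement differ, which is also why the row vector can be turned into the column vector with a transpose.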

Get column-wise maximums from a NumPy array

I have a 2D array, say
x = np.random.rand(10, 3)
array([[ 0.51158246, 0.51214272, 0.1107923 ],
[ 0.5210391 , 0.85308284, 0.63227215],
[ 0.57239625, 0.06276943, 0.1069803 ],
[ 0.71627613, 0.66454443, 0.56771438],
[ 0.24595493, 0.01007568, 0.84959605],
[ 0.99158904, 0.25034553, 0.00144037],
[ 0.43292656, 0.9247424 , 0.5123086 ],
[ 0.07224077, 0.57230282, 0.88522979],
[ 0.55665913, 0.20119776, 0.58865823],
[ 0.55129624, 0.26226446, 0.63070611]])
Then I find the indexes of maximum elements along the columns:
indexes = np.argmax(x, axis=0)
array([5, 6, 7])
So far so good.
But how do I actually get those elements? That is, how do I get ?some_operation?(x, indexes) == [0.99158904, 0.9247424, 0.88522979]?
Note that I need both the indexes and the associated values.
The best I could come up with was x[indexes, range(x.shape[1])], but it looks kinda complicated and inefficient. Is there a more idiomatic way?
You can use np.amax to find the maximum value along an axis.
Using your example (x is the original array in your post):
In[1]: np.argmax(x, axis=0)
Out[1]:
array([5, 6, 7], dtype=int64)
In[2]: np.amax(x, axis=0)
Out[2]:
array([ 0.99158904, 0.9247424 , 0.88522979])
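If the values have to be looked up from indexes that were already computed, the following sketch shows two ways that should agree with the direct maximum. The random data, the seed and the names values and values_alt are assumptions made for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 3))

indexes = np.argmax(x, axis=0)

# Fancy indexing: pair each column number with the row index of its maximum.
values = x[indexes, np.arange(x.shape[1])]

# Same lookup with np.take_along_axis, which expects an index array
# with the same number of dimensions as x.
values_alt = np.take_along_axis(x, indexes[None, :], axis=0)[0]

print(np.array_equal(values, x.max(axis=0)))  # True
print(np.array_equal(values, values_alt))     # True
```

The indexing form from the question is therefore correct; it reuses the argmax result instead of scanning the array a second time.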

How do I create a surface plot with matplotlib of a closed loop revolve about an axis given coordinate data of the 2D profile?

I have the closed loop stored as a two column by N row numpy array.
The last row of the array is the same as the first row, implying that it is, indeed, a closed loop.
The number of angular divisions in the rotation ("slices of pie", so to speak) should be set by a variable called 'angsteps'.
The profile in question is plotted in the x-y coordinate plane, and is rotated about the 'x-axis'.
You can find the profile in question plotted here. https://i.imgur.com/yJoKIEp.png
I apologize for the lack of code, but the profile data has so many interdependencies that I can't post the code that generates it without basically taking a shortcut to plugging the github page for it.
A downsampled version of the curve data looks like this:
bulkmat = [[ 5.2 0. ]
[ 0.381 0. ]
[ 0.381 3.164 ]
[ 2. 3.164 ]
[ 2. 4.1 ]
[ 3.78 4.1 ]
[ 3.78 6.477 ]
[ 1.898 6.477 ]
[ 1.898 7. ]
[ 3.18 7. ]
[ 3.18 9.6 ]
[ 1.898 9.6 ]
[ 1.898 9.6 ]
[ 2.31987929 12.42620027]
[ 3.4801454 15.24663923]
[ 5.22074074 17.97407407]
[ 7.38360768 20.521262 ]
[ 9.81068861 22.80096022]
[ 12.34392593 24.72592593]
[ 14.825262 26.20891632]
[ 17.09663923 27.16268861]
[ 19. 27.5 ]
[ 19. 27.5 ]
[ 19.62962963 27.44718793]
[ 20.18518519 27.29972565]
[ 20.66666667 27.07407407]
[ 21.07407407 26.7866941 ]
[ 21.40740741 26.45404664]
[ 21.66666667 26.09259259]
[ 21.85185185 25.71879287]
[ 21.96296296 25.34910837]
[ 22. 25. ]
[ 22. 25. ]
[ 21.12125862 24.17043472]
[ 18.91060645 23.59946824]
[ 15.97201646 22.9218107 ]
[ 12.84280513 21.85346069]
[ 9.96762011 20.14089993]
[ 7.67242798 17.51028807]
[ 6.13850192 13.61665735]
[ 5.37640942 7.99310742]
[ 5.2 0. ]]
The following would be an example of a solid of revolution plotted around the z axis. As input we take some points and then create the necessary 2D arrays from them.
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
# input xy coordinates
xy = np.array([[1,0],[2,1],[2,2],[1,1.5],[1,0]])
# radial component is x values of input
r = xy[:,0]
# angular component is one revolution of 60 steps
phi = np.linspace(0, 2*np.pi, 60)
# create grid
R,Phi = np.meshgrid(r,phi)
# transform to cartesian coordinates
X = R*np.cos(Phi)
Y = R*np.sin(Phi)
# Z values are y values, repeated 60 times
Z = np.tile(xy[:,1],len(Y)).reshape(Y.shape)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax2 = fig.add_axes([0.05,0.7,0.15,.2])
ax2.plot(xy[:,0],xy[:,1], color="k")
ax.plot_surface(X, Y, Z, alpha=0.5, color='gold', rstride=1, cstride=1)
plt.show()
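The question asks for a rotation about the x-axis with the number of slices set by angsteps, so here is a hedged sketch of how the grids could be built for that case. The function name revolve_about_x is hypothetical, the profile is a shortened stand-in for bulkmat, and the y values are assumed to be the distance from the x-axis:

```python
import numpy as np

def revolve_about_x(profile, angsteps):
    """Return X, Y, Z grids for rotating an (N, 2) x-y profile about the x-axis."""
    profile = np.asarray(profile, dtype=float)
    x, r = profile[:, 0], profile[:, 1]   # y value taken as radius from the x-axis
    # angsteps slices need angsteps + 1 angles so the surface closes on itself.
    phi = np.linspace(0.0, 2.0 * np.pi, angsteps + 1)
    Phi, R = np.meshgrid(phi, r, indexing='ij')
    X = np.tile(x, (len(phi), 1))         # x is unchanged by the rotation
    Y = R * np.cos(Phi)
    Z = R * np.sin(Phi)
    return X, Y, Z

# Shortened stand-in for the closed loop in bulkmat (last row equals first row).
profile = np.array([[5.2, 0.0], [0.381, 0.0], [0.381, 3.164],
                    [2.0, 3.164], [5.2, 0.0]])
X, Y, Z = revolve_about_x(profile, angsteps=36)
print(X.shape)  # (37, 5)
```

The resulting arrays can be passed to ax.plot_surface(X, Y, Z) in the same way as in the z-axis example.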

one-hot encoding and existing data

I have a numpy array (N,M) where some of the columns should be one-hot encoded. Please help to make a one-hot encoding using numpy and/or tensorflow.
Example:
[
[ 0.993, 0, 0.88 ]
[ 0.234, 1, 1.00 ]
[ 0.235, 2, 1.01 ]
.....
]
The 2nd column here (with values 0, 1 and 2) should be one-hot encoded. I know that there are only 3 distinct values (0, 1, 2).
The resulting array should look like:
[
[ 0.993, 0.88, 0, 0, 0 ]
[ 0.234, 1.00, 0, 1, 0 ]
[ 0.235, 1.01, 1, 0, 0 ]
.....
]
That way I would be able to feed this array into tensorflow.
Please notice that the 2nd column was removed and its one-hot version was appended at the end of each sub-array.
Any help would be highly appreciated.
Thanks in advance.
Update:
Here is what I have right now:
Well, not exactly...
1. I have more than 3 columns in the array...but I still want to do it only with 2nd..
2. The first array is structured, i.e. its shape is (N,)
Here is what I have:
def one_hot(value, max_value):
    value = int(value)
    a = np.zeros(max_value, 'uint8')
    if value != 0:
        a[value] = 1
    return a
# data is structured array with the shape of (N,)
# it has strings, ints, floats inside..
# was get by np.genfromtxt(dtype=None)
unique_values = dict()
unique_values['categorical1'] = 1
unique_values['categorical2'] = 2
for row in data:
    row[col] = unique_values[row[col]]
codes = np.zeros((data.shape[0], len(unique_values)))
idx = 0
for row in data:
    codes[idx] = one_hot(row[col], len(unique_values)) # could be optimised by not creating new array every time
    idx += 1
data = np.c_[data[:, [range(0, col), range(col + 1, 32)]], codes[data[:, col].astype(int)]]
Also trying to concatenate via:
print(data.shape) # shape (5000,)
print(codes.shape) # shape (5000,3)
data = np.concatenate((data, codes), axis=1)
Here's one approach -
In [384]: a # input array
Out[384]:
array([[ 0.993, 0. , 0.88 ],
[ 0.234, 1. , 1. ],
[ 0.235, 2. , 1.01 ]])
In [385]: codes = np.array([[0,0,0],[0,1,0],[1,0,0]]) # define codes here
In [387]: codes
Out[387]:
array([[0, 0, 0], # encoding for 0
[0, 1, 0], # encoding for 1
[1, 0, 0]]) # encoding for 2
# Slice out the second column and append one-hot encoded array
In [386]: np.c_[a[:,[0,2]], codes[a[:,1].astype(int)]]
Out[386]:
array([[ 0.993, 0.88 , 0. , 0. , 0. ],
[ 0.234, 1. , 0. , 1. , 0. ],
[ 0.235, 1.01 , 1. , 0. , 0. ]])
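For a conventional one-hot layout, where class i sets position i, the lookup table can be generated with np.eye instead of being typed by hand. This is a sketch under that assumption and it does not reproduce the custom codes table above, which maps 0 to all zeros and 2 to [1, 0, 0]. The names col, num_classes, labels and one_hot are made up for this illustration:

```python
import numpy as np

a = np.array([[0.993, 0, 0.88],
              [0.234, 1, 1.00],
              [0.235, 2, 1.01]])

col = 1           # index of the column to encode
num_classes = 3   # assumed known number of distinct values (0, 1, 2)

labels = a[:, col].astype(int)
one_hot = np.eye(num_classes)[labels]   # row i of the identity encodes class i

# Drop the original column and append its one-hot version at the end.
result = np.c_[np.delete(a, col, axis=1), one_hot]
print(result)
```

np.delete removes the encoded column whatever the total number of columns is, so the same lines should work for arrays with more than three columns as long as the array is a plain 2-D numeric array rather than a structured one.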