matrix multiplication with numpy array object and dataframe - pandas

I did this using a for loop and feel like there is a faster way to achieve it but it eludes me.
import numpy as np
import pandas as pd

# datal is a chunk of a data frame: one vector (tx, ty, tz) per row
datal = pd.DataFrame(
    [[-9.8839112e-05, -0.001128727317, -0.000197679149],
     [-0.0009201639200000001, 0.0005601014289999999, 0.000496686232],
     [-0.000184700668, 9.414391600000001e-05, 0.000409526574]],
    columns=['tx', 'ty', 'tz'])
# bigtransfo holds one 3x3 transformation per row of datal
bigtransfo = [np.array([[ 0.89442732, 0.        ,  0.44721334],
                        [ 0.44721334, 0.        , -0.89442732],
                        [-0.        , 1.        ,  0.        ]]),
              np.array([[ 0.27639329, 0.85065091,  0.44721334],
                        [ 0.13819655, 0.42532516, -0.89442732],
                        [-0.9510565 , 0.30901705,  0.        ]]),
              np.array([[-0.72360684,  0.52573128,  0.44721334],
                        [-0.36180316,  0.26286545, -0.89442732],
                        [-0.58778535, -0.80901692,  0.        ]])]
vectorfield = []
for i in range(0, 3):
    # multiply the i-th transformation by the i-th row vector
    x = list(bigtransfo[i].dot(datal[['tx', 'ty', 'tz']].iloc[i]))
    vectorfield.append(x)
vf = pd.DataFrame(vectorfield, columns=['tx', 'ty', 'tz'])
output:
[[-0.00017680915486414586, 0.00013260746120237444, -0.001128727317],
[0.00044424836491567196, -0.0003331879850180065, 0.0010482087675674915],
[0.000366290815094829, -0.00027471928608663234, 3.240032593749042e-05]]
bigtransfo is an object containing 800 3x3 arrays, transformations. datal is just a chunk of a data frame that has 800 rows and three columns. The idea is to multiply the three components of each row, the selected vector, by the corresponding transformation.
Any ideas are welcome. Thanks in advance.
update: Added working example.
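One possible way to avoid the Python loop is to stack the transformations into a single (n, 3, 3) array and let np.einsum do the per-row matrix-vector product. This is only a sketch: the random data below is a hypothetical stand-in for the real 800 transformations and rows, and it assumes every element of bigtransfo is a 3x3 array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 800 3x3 matrices and 800 row vectors.
rng = np.random.default_rng(0)
n = 800
bigtransfo = [rng.normal(size=(3, 3)) for _ in range(n)]
datal = pd.DataFrame(rng.normal(size=(n, 3)), columns=['tx', 'ty', 'tz'])

T = np.stack(bigtransfo)                    # shape (n, 3, 3)
v = datal[['tx', 'ty', 'tz']].to_numpy()    # shape (n, 3)

# For each n: out[n, i] = sum_j T[n, i, j] * v[n, j]
out = np.einsum('nij,nj->ni', T, v)

vf = pd.DataFrame(out, columns=['tx', 'ty', 'tz'], index=datal.index)
```

The same product can also be written as (T @ v[:, :, None])[:, :, 0], where the leading axis is treated as a batch dimension by matmul.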

Related

numpy function to use for mathematical dot product to produce scalar

Question
What numpy function to use for mathematical dot product in the case below?
Backpropagation for a Linear Layer
Define sample (2,3) array:
In [299]: dldx = np.arange(6).reshape(2,3)
In [300]: w
Out[300]:
array([[0.1, 0.2, 0.3],
[0. , 0. , 0. ]])
Element wise multiplication:
In [301]: dldx*w
Out[301]:
array([[0. , 0.2, 0.6],
[0. , 0. , 0. ]])
and summing on the last axis (size 3) produces a 2 element array:
In [302]: (dldx*w).sum(axis=1)
Out[302]: array([0.8, 0. ])
Your (6) is the first term, dropping the 0. One might argue that the use of a dot/inner in (5) is a bit sloppy.
np.einsum borrows ideas from physics, where dimensions may be higher. This case can be expressed as
In [303]: np.einsum('ij,ij->i',dldx,w)
Out[303]: array([0.8, 0. ])
inner and dot do more calculations than we want. We just want the diagonal:
In [304]: np.dot(dldx,w.T)
Out[304]:
array([[0.8, 0. ],
[2.6, 0. ]])
In [305]: np.inner(dldx,w)
Out[305]:
array([[0.8, 0. ],
[2.6, 0. ]])
In matmul/@ terms, the size 2 dimension is a 'batch' one, so we have to add dimensions:
In [306]: dldx[:,None,:]@w[:,:,None]
Out[306]:
array([[[0.8]],
[[0. ]]])
This is (2,1,1), so we need to squeeze out the 1s.
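As a quick check, the approaches above can be put side by side. This sketch redefines dldx and w with the same values as in the session; the variable names by_sum, by_einsum, by_matmul and by_diag are just illustrative.

```python
import numpy as np

dldx = np.arange(6).reshape(2, 3)
w = np.array([[0.1, 0.2, 0.3],
              [0.0, 0.0, 0.0]])

# Row-wise dot product: multiply elementwise, then sum over the last axis.
by_sum = (dldx * w).sum(axis=1)

# Same thing with einsum: the shared index j is summed, i is kept.
by_einsum = np.einsum('ij,ij->i', dldx, w)

# Batched matmul: (2,1,3) @ (2,3,1) -> (2,1,1), then drop the size-1 axes.
by_matmul = (dldx[:, None, :] @ w[:, :, None]).squeeze(axis=(1, 2))

# dot computes the full (2,2) product; only its diagonal is wanted.
by_diag = np.diag(dldx @ w.T)

print(by_sum, by_einsum, by_matmul, by_diag)
```

All four should agree; the first two avoid computing the unwanted off-diagonal terms.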

Sparse matrix visualisation

I'm working on FEM analysis. I just wanted to evaluate a simple matrix multiplication and see the numeric result. How can I see the elements of the sparse matrix?
The code that I have used is:
U_h= 0.5 * np.dot(np.dot(U[np.newaxis], K), U[np.newaxis].T)
Since U is a 1x3 matrix, K is a 3x3 matrix and U.T is a 3x1 matrix, I expect a 1x1 matrix with a single number in it. However, the result is "[[<3x3 sparse matrix of type '<class 'numpy.float64'>' with 3 stored elements in Compressed Sparse Row format>]]"
In [260]: M = sparse.random(5,5,.2, format='csr')
What you got was the repr format of the matrix:
In [261]: M
Out[261]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [262]: repr(M)
Out[262]: "<5x5 sparse matrix of type '<class 'numpy.float64'>'\n\twith 5 stored elements in Compressed Sparse Row format>"
The str format used by print is:
In [263]: print(M)
(1, 0) 0.7152749140462651
(1, 1) 0.4298096228326874
(1, 3) 0.8148327301300698
(4, 0) 0.23366934073409018
(4, 3) 0.6117499168861333
In [264]: str(M)
Out[264]: ' (1, 0)\t0.7152749140462651\n (1, 1)\t0.4298096228326874\n (1, 3)\t0.8148327301300698\n (4, 0)\t0.23366934073409018\n (4, 3)\t0.6117499168861333'
If the matrix isn't big, displaying it as a dense array is nice. M.toarray() does that, or for short:
In [265]: M.A
Out[265]:
array([[0. , 0. , 0. , 0. , 0. ],
[0.71527491, 0.42980962, 0. , 0.81483273, 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0.23366934, 0. , 0. , 0.61174992, 0. ]])
For a graphical inspection, use plt.spy().
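For the product in the question itself, np.dot does not treat a scipy sparse matrix as a matrix, which is presumably why a nested object array came back. A minimal sketch, assuming K is a scipy.sparse CSR matrix and U a 1-D NumPy array (the values below are made up for illustration), is to use the @ operator, which the sparse matrix handles itself:

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x3 stiffness-like matrix and displacement vector.
K = sparse.csr_matrix(np.array([[2.0, 0.0, 0.0],
                                [0.0, 3.0, 0.0],
                                [0.0, 0.0, 4.0]]))
U = np.array([1.0, 2.0, 3.0])

# K @ U is a sparse-matrix / dense-vector product returning a dense 1-D array;
# the outer U @ (...) is then an ordinary dot product giving a scalar.
U_h = 0.5 * (U @ (K @ U))
print(U_h)

# Dense view of the (small) sparse matrix for inspection.
print(K.toarray())
```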

Using numpy einsum to perform high dimensional subtraction broadcasting

I'm having trouble using broadcasting for a subtraction. My problem is the following. I have an array x of shape [L,N], where L is an integer and N is the number of variables of my problem.
I need to compute a [L,N,N] array where at each element l,i,j it contains x[l,i]-x[l,j].
If L=1 this is equivalent to running broadcasting on the subtraction x.T-x (the column minus the row, so that element i,j is x[i]-x[j]).
For example here with L=1 and N=3:
import numpy as np
x = np.array([[0,2,4]])
x.T-x
However, if one increases the dimension L things become more complicated and enter the realm of the np.einsum function.
So I tried to recreate my example, in the case L=2, where I've replicated the two rows. What I'd expect is to get a 2x3x3 array with two 3x3 matrices with equal elements.
x = np.array([[0,2,4],[0,2,4]])
n = 3
k = 2
X = np.zeros([k,n,n])
for l in range(k):
    for i in range(n):
        for j in range(n):
            X[l,i,j] = x[l,i]-x[l,j]
print(X)
which returns
[[[ 0. -2. -4.]
  [ 2.  0. -2.]
  [ 4.  2.  0.]]
 [[ 0. -2. -4.]
  [ 2.  0. -2.]
  [ 4.  2.  0.]]]
But how to make this with numpy einsum? I can only obtain the product:
np.einsum('ki,kj->kij',x,-x)
Are there specific examples of numpy batched subtractions or additions with increased dimension?
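Since np.einsum expresses sums of products, plain broadcasting seems the more natural tool for a batched difference. A minimal sketch, using the same x as in the loop above: insert a new axis in two different positions so that the subtraction broadcasts over i and j while l stays aligned.

```python
import numpy as np

x = np.array([[0, 2, 4],
              [0, 2, 4]])           # shape (L, N)

# x[:, :, None] has shape (L, N, 1) and varies along i;
# x[:, None, :] has shape (L, 1, N) and varies along j.
# Broadcasting gives X[l, i, j] = x[l, i] - x[l, j] with shape (L, N, N).
X = x[:, :, None] - x[:, None, :]
print(X)
```

The same pattern works for addition or any other elementwise binary operation.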

Plotting a histogram of a 2D NumPy array of (latitude, longitude), in order to determine the proper values for DBSCAN

I am trying to apply DBSCAN to a dataset of (Lon, Lat) points. The algorithm is very sensitive to its parameters, EPS and MinPts.
I would like to look at a histogram of the data to determine the proper values. Unfortunately, Matplotlib's hist() takes only a 1D array.
When a 2D matrix is passed as the argument, hist() treats each column as a separate input.
Scatter plot and histograms:
Does anyone have a way to solve this?
If you follow the DBSCAN article, you only need the 4-nearest-neighbor distance for each object, not all pairwise distances. I.e., a 1 dimensional array.
Instead of doing a histogram, they sort the values, and try to choose a knee in this plot.
Find the 4 nearest neighbors of each object.
Collect all 4NN distances in one array.
Sort this array in descending order.
Plot the resulting curve.
Look for a knee, often best at around 5%-10% of your x axis (so 95%-90% of objects are core points).
For details, see the original DBSCAN publication!
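The steps above can be sketched as follows. This is an illustration under assumptions, not the procedure from the publication verbatim: the points are random stand-ins for the real data, scipy's cKDTree is used for the neighbor search, and distances are plain Euclidean, which is only an approximation for latitude/longitude pairs.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2016)
points = rng.random((100, 2))       # hypothetical (lat, lon) pairs

k = 4
# Ask for k + 1 neighbors: each point's nearest neighbor is itself (distance 0),
# so column k holds the distance to its k-th real neighbor.
dist, _ = cKDTree(points).query(points, k=k + 1)

# One value per object, sorted in descending order.
kdist = np.sort(dist[:, k])[::-1]
print(kdist[:10])
```

Plotting kdist (for example with plt.plot(kdist)) gives the curve in which to look for the knee; the y value at the knee is a candidate for EPS with MinPts around 4.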
You could use numpy.histogram2d:
import numpy as np
np.random.seed(2016)
N = 100
arr = np.random.random((N, 2))
xedges = np.linspace(0, 1, 10)
yedges = np.linspace(0, 1, 10)
lat = arr[:, 0]
lng = arr[:, 1]
hist, xedges, yedges = np.histogram2d(lat, lng, (xedges, yedges))
print(hist)
yields
[[ 0. 0. 5. 0. 3. 0. 0. 0. 3.]
[ 0. 3. 0. 3. 0. 0. 4. 0. 2.]
[ 2. 2. 1. 1. 1. 1. 3. 0. 1.]
[ 2. 1. 0. 3. 1. 2. 1. 1. 3.]
[ 3. 0. 3. 2. 0. 1. 0. 2. 0.]
[ 3. 2. 3. 1. 1. 2. 1. 1. 0.]
[ 2. 3. 0. 1. 0. 1. 3. 0. 0.]
[ 1. 1. 1. 1. 2. 0. 2. 1. 1.]
[ 0. 1. 1. 0. 1. 1. 2. 0. 0.]]
Or to visualize the histogram:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.imshow(hist)
plt.show()

matplotlib: Get the colormap array

I am new to matplotlib, and have gotten stuck on colormaps.
In matplotlib, how do I get the whole array of RGB colors for a specific colormap, let's say for "hot"? For example, if I were in MATLAB I would have just done this:
# in matlab
c = hot(256);
disp(c)
Any ideas?
You can look up the values by calling the colormap as a function, and it accepts numpy arrays to query many values at once:
In [12]: from matplotlib import cm
In [13]: cm.hot(range(256))
Out[13]:
array([[ 0.0416 , 0. , 0. , 1. ],
[ 0.05189484, 0. , 0. , 1. ],
[ 0.06218969, 0. , 0. , 1. ],
...,
[ 1. , 1. , 0.96911762, 1. ],
[ 1. , 1. , 0.98455881, 1. ],
[ 1. , 1. , 1. , 1. ]])
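To get something shaped like MATLAB's 256x3 result, the alpha column can be dropped. A minimal sketch, assuming the colormap is looked up by name with plt.get_cmap; the names rgba and rgb are just illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap('hot')

# Integer inputs index the colormap's lookup table directly;
# cmap.N is the number of entries in that table.
rgba = cmap(np.arange(cmap.N))      # shape (N, 4): R, G, B, alpha
rgb = rgba[:, :3]                   # keep only the RGB columns

print(rgb.shape)
print(rgb[:3])
```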
Got it! So you just go in the command window of your Matlab and type
cmap = colormap(nameOfTheColormapYouWant)
Possible colormaps in Matlab are: parula, jet, hsv, hot, cool, spring, summer, autumn, winter, gray, bone, copper, pink, lines, colorcube, prism, flag.
You get a matrix where each row is the color code used for the colormap.