Multiply every row of a matrix with every row of another matrix - numpy

In numpy / PyTorch, I have two matrices, e.g. X=[[1,2],[3,4],[5,6]], Y=[[1,1],[2,2]]. I would like to dot product every row of X with every row of Y, and have the results
[[3, 6],[7, 14], [11,22]]
How do I achieve this?, Thanks!

I think this is what you are looking for:
import numpy as np
x= [[1,2],[3,4],[5,6]]
y= [[1,1],[2,2]]
x = np.asarray(x) #convert list to numpy array
y = np.asarray(y) #convert list to numpy array
product = np.dot(x, y.T)
.T transposes the matrix, which is neccessary in this case for the multiplication (because of the way dot products are defined). print(product) will output:
[[ 3 6]
[ 7 14]
[11 22]]

Using einsum
np.einsum('ij,kj->ik', X, Y)
array([[ 3, 6],
[ 7, 14],
[11, 22]])

In PyTorch, you can achieve this using torch.mm(a, b) or torch.matmul(a, b), as shown below:
x = np.array([[1,2],[3,4],[5,6]])
y = np.array([[1,1],[2,2]])
x = torch.from_numpy(x)
y = torch.from_numpy(y)
# print(torch.matmul(x, torch.t(y)))
print(torch.mm(x, torch.t(y)))
output:
tensor([[ 3, 6],
[ 7, 14],
[11, 22]], dtype=torch.int32)

Related

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?
Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
import numpy as np
import scipy as sp
iX, iY = sp.nonzero(W)
values = np.sum(A[iX]*B[:, iY].T, axis=-1) #batched dot product
C = sp.sparse.coo_matrix(values, np.asarray([iX,iY]).T)
First, get the indexes of the non zero places in W, and then you can just get the (i,j) element of the result matrix by multiplying the i-th row in A with the j-th column in B, and save the result as a tuple (i,j,res) instead of saving it as a matrix (this is the right way to save sparse matrices).
Here's one approach using np.einsum for a vectorized solution -
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed codes and get sparse matrix output :
In [172]: # Using proposed codes
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally verify results by converting to dense/array version and checking against direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])

Is there a simple pad in numpy?

Is there a numpy function that pads an array this way?
import numpy as np
def pad(x, length):
tmp = np.zeros((length,))
tmp[:x.shape[0]] = x
return tmp
x = np.array([1,2,3])
print pad(x, 5)
Output:
[ 1. 2. 3. 0. 0.]
I couldn't find a way to do it with numpy.pad()
You can use ndarray.resize():
>>> x = np.array([1,2,3])
>>> x.resize(5)
>>> x
array([1, 2, 3, 0, 0])
Note that this functions behaves differently from numpy.resize(), which pads with repeated copies of the array itself. (Consistency is for people who can't remember everything.)
Sven Marnach's suggestion to use ndarray.resize() is probably the simplest way to do it, but for completeness, here's how it can be done with numpy.pad:
In [13]: x
Out[13]: array([1, 2, 3])
In [14]: np.pad(x, [0, 5-x.size], mode='constant')
Out[14]: array([1, 2, 3, 0, 0])

Transpose of a vector using numpy

I am having an issue with Ipython - Numpy. I want to do the following operation:
x^T.x
with and x^T the transpose operation on vector x. x is extracted from a txt file with the instruction:
x = np.loadtxt('myfile.txt')
The problem is that if i use the transpose function
np.transpose(x)
and uses the shape function to know the size of x, I get the same dimensions for x and x^T. Numpy gives the size with a L uppercase indice after each dimensions. e.g.
print x.shape
print np.transpose(x).shape
(3L, 5L)
(3L, 5L)
Does anybody know how to solve this, and compute x^T.x as a matrix product?
Thank you!
What np.transpose does is reverse the shape tuple, i.e. you feed it an array of shape (m, n), it returns an array of shape (n, m), you feed it an array of shape (n,)... and it returns you the same array with shape(n,).
What you are implicitly expecting is for numpy to take your 1D vector as a 2D array of shape (1, n), that will get transposed into a (n, 1) vector. Numpy will not do that on its own, but you can tell it that's what you want, e.g.:
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> a.T
array([0, 1, 2, 3])
>>> a[np.newaxis, :].T
array([[0],
[1],
[2],
[3]])
As explained by others, transposition won't "work" like you want it to for 1D arrays.
You might want to use np.atleast_2d to have a consistent scalar product definition:
def vprod(x):
y = np.atleast_2d(x)
return np.dot(y.T, y)
I had the same problem, I used numpy matrix to solve it:
# assuming x is a list or a numpy 1d-array
>>> x = [1,2,3,4,5]
# convert it to a numpy matrix
>>> x = np.matrix(x)
>>> x
matrix([[1, 2, 3, 4, 5]])
# take the transpose of x
>>> x.T
matrix([[1],
[2],
[3],
[4],
[5]])
# use * for the matrix product
>>> x*x.T
matrix([[55]])
>>> (x*x.T)[0,0]
55
>>> x.T*x
matrix([[ 1, 2, 3, 4, 5],
[ 2, 4, 6, 8, 10],
[ 3, 6, 9, 12, 15],
[ 4, 8, 12, 16, 20],
[ 5, 10, 15, 20, 25]])
While using numpy matrices may not be the best way to represent your data from a coding perspective, it's pretty good if you are going to do a lot of matrix operations!
For starters L just means that the type is a long int. This shouldn't be an issue. You'll have to give additional information about your problem though since I cannot reproduce it with a simple test case:
In [1]: import numpy as np
In [2]: a = np.arange(12).reshape((4,3))
In [3]: a
Out[3]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
In [4]: a.T #same as np.transpose(a)
Out[4]:
array([[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]])
In [5]: a.shape
Out[5]: (4, 3)
In [6]: np.transpose(a).shape
Out[6]: (3, 4)
There is likely something subtle going on with your particular case which is causing problems. Can you post the contents of the file that you're reading into x?
This is either the inner or outer product of the two vectors, depending on the orientation you assign to them. Here is how to calculate either without changing x.
import numpy
x = numpy.array([1, 2, 3])
inner = x.dot(x)
outer = numpy.outer(x, x)
The file 'myfile.txt' contain lines such as
5.100000 3.500000 1.400000 0.200000 1
4.900000 3.000000 1.400000 0.200000 1
Here is the code I run:
import numpy as np
data = np.loadtxt('iris.txt')
x = data[1,:]
print x.shape
print np.transpose(x).shape
print x*np.transpose(x)
print np.transpose(x)*x
And I get as a result
(5L,)
(5L,)
[ 24.01 9. 1.96 0.04 1. ]
[ 24.01 9. 1.96 0.04 1. ]
I would be expecting one of the two last result to be a scalar instead of a vector, because x^T.x (or x.x^T) should give a scalar.
b = np.array([1, 2, 2])
print(b)
print(np.transpose([b]))
print("rows, cols: ", b.shape)
print("rows, cols: ", np.transpose([b]).shape)
Results in
[1 2 2]
[[1]
[2]
[2]]
rows, cols: (3,)
rows, cols: (3, 1)
Here (3,) can be thought as "(3, 0)".
However if you want the transpose of a matrix A, np.transpose(A) is the solution. Shortly, [] converts a vector to a matrix, a matrix to a higher dimension tensor.

How do I assign multiple labels at once in matplotlib?

I have the following dataset:
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[9, 8, 7, 6, 5] ]
Now I plot it with:
import matplotlib.pyplot as plt
plt.plot(x, y)
However, I want to label the 3 y-datasets with this command, which raises an error when .legend() is called:
lineObjects = plt.plot(x, y, label=['foo', 'bar', 'baz'])
plt.legend()
File "./plot_nmos.py", line 33, in <module>
plt.legend()
...
AttributeError: 'list' object has no attribute 'startswith'
When I inspect the lineObjects:
>>> lineObjects[0].get_label()
['foo', 'bar', 'baz']
>>> lineObjects[1].get_label()
['foo', 'bar', 'baz']
>>> lineObjects[2].get_label()
['foo', 'bar', 'baz']
Question
Is there an elegant way to assign multiple labels by just using the .plot() method?
You can iterate over your line objects list, so labels are individually assigned. An example with the built-in python iter function:
lineObjects = plt.plot(x, y)
plt.legend(iter(lineObjects), ('foo', 'bar', 'baz'))`
Edit: after updating to matplotlib 1.1.1, it looks like the plt.plot(x, y), with y as a list of lists (as provided by the author of the question), doesn't work anymore. The one step plotting without iteration over the y arrays is still possible thought after passing y as numpy.array (assuming (numpy)[http://numpy.scipy.org/] as been previously imported).
In this case, use plt.plot(x, y) (if the data in the 2D y array are arranged as columns [axis 1]) or plt.plot(x, y.transpose()) (if the data in the 2D y array are arranged as rows [axis 0])
Edit 2: as pointed by #pelson (see commentary below), the iter function is unnecessary and a simple plt.legend(lineObjects, ('foo', 'bar', 'baz')) works perfectly
It is not possible to plot those two arrays agains each other directly (with at least version 1.1.1), therefore you must be looping over your y arrays. My advice would be to loop over the labels at the same time:
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [9, 8, 7, 6, 5] ]
labels = ['foo', 'bar', 'baz']
for y_arr, label in zip(y, labels):
plt.plot(x, y_arr, label=label)
plt.legend()
plt.show()
Edit: #gcalmettes pointed out that as numpy arrays, it is possible to plot all the lines at the same time (by transposing them). See #gcalmettes answer & comments for details.
I came over the same problem and now I found a solution that is most easy! Hopefully that's not too late for you. No iterator, just assign your result to a structure...
from numpy import *
from matplotlib.pyplot import *
from numpy.random import *
a = rand(4,4)
a
>>> array([[ 0.33562406, 0.96967617, 0.69730654, 0.46542408],
[ 0.85707323, 0.37398595, 0.82455736, 0.72127002],
[ 0.19530943, 0.4376796 , 0.62653007, 0.77490795],
[ 0.97362944, 0.42720348, 0.45379479, 0.75714877]])
[b,c,d,e] = plot(a)
legend([b,c,d,e], ["b","c","d","e"], loc=1)
show()
Looks like this:
The best current solution is:
lineObjects = plt.plot(x, y) # y describes 3 lines
plt.legend(['foo', 'bar', 'baz'])
You can give the labels while plotting the curves
import pylab as plt
x = [0, 1, 2, 3, 4]
y = [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [9, 8, 7, 6, 5] ]
labels=['foo', 'bar', 'baz']
colors=['r','g','b']
# loop over data, labels and colors
for i in range(len(y)):
plt.plot(x,y[i],'o-',color=colors[i],label=labels[i])
plt.legend()
plt.show()
In case of numpy matrix plot assign multiple legends at once for each column
I would like to answer this question based on plotting a matrix that has two columns.
Say you have a 2 column matrix Ret
then one may use this code to assign multiple labels at once
import pandas as pd, numpy as np, matplotlib.pyplot as plt
pd.DataFrame(Ret).plot()
plt.xlabel('time')
plt.ylabel('Return')
plt.legend(['Bond Ret','Equity Ret'], loc=0)
plt.show()
I hope this helps
This problem comes up for me often when I have a single set of x values and multiple y values in the columns of an array. I really don't want to plot the data in a loop, and multiple calls to ax.legend/plt.legend are not really an option, since I want to plot other stuff, usually in an equally annoying format.
Unfortunately, plt.setp is not helpful here. In newer versions of matplotlib, it just converts your entire list/tuple into a string, and assigns the whole thing as a label to all the lines.
I've therefore made a utility function to wrap calls to ax.plot/plt.plot in:
def set_labels(artists, labels):
for artist, label in zip(artists, labels):
artist.set_label(label)
You can call it something like
x = np.arange(5)
y = np.random.ranint(10, size=(5, 3))
fig, ax = plt.subplots()
set_labels(ax.plot(x, y), 'ABC')
This way you get to specify all your normal artist parameters to plot, without having to see the loop in your code. An alternative is to put the whole call to plot into a utility that just unpacks the labels, but that would require a lot of duplication to figure out how to parse multiple datasets, possibly with different numbers of columns, and spread out across multiple arguments, keyword or otherwise.
I used the following to show labels for a dataframe without using the dataframe plot:
lines_ = plot(df)
legend(lines_, df.columns) # df.columns is a list of labels
If you're using a DataFrame, you can also iterate over the columns of the data you want to plot:
# Plot figure
fig, ax = plt.subplots(figsize=(5,5))
# Data
data = data
# Plot
for i in data.columns:
_ = ax.plot(data[i], label=i)
_ = ax.legend()
plt.show()

connecting all numpy array plot points to each other using plt.plot() from matplotlib

I have a numpy array with xy co-ordinates for points. I have plotted each of these points and want a line connecting each point to every other point (a complete graph). The array is a 2x50 structure so I have transposed it and used a view to let me iterate through the rows. However, I am getting an 'index out of bounds' error with the following:
plt.plot(*zip(*v.T)) #to plot all the points
viewVX = (v[0]).T
viewVY = (v[1]).T
for i in range(0, 49):
xPoints = viewVX[i], viewVX[i+1]
print("xPoints is", xPoints)
yPoints = viewVY[i+2], viewVY[i+3]
print("yPoints is", yPoints)
xy = xPoints, yPoints
plt.plot(*zip(*xy), ls ='-')
I was hoping that the indexing would 'wrap-around' so that for the ypoints, it'd start with y0, y1 etc. Is there an easier way to accomplish what I'm trying to achieve?
import matplotlib.pyplot as plt
import numpy as np
import itertools
v=np.random.random((2,50))
plt.plot(
*zip(*itertools.chain.from_iterable(itertools.combinations(v.T,2))),
marker='o', markerfacecolor='red')
plt.show()
The advantage of doing it this way is that there are fewer calls to plt.plot. This should be significantly faster than methods that make O(N**2) calls to plt.plot.
Note also that you do not need to plot the points separately. Instead, you can use the marker='o' parameter.
Explanation: I think the easiest way to understand this code is to see how it operates on a simple v:
In [4]: import numpy as np
In [5]: import itertools
In [7]: v=np.arange(8).reshape(2,4)
In [8]: v
Out[8]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
itertools.combinations(...,2) generates all possible pairs of points:
In [10]: list(itertools.combinations(v.T,2))
Out[10]:
[(array([0, 4]), array([1, 5])),
(array([0, 4]), array([2, 6])),
(array([0, 4]), array([3, 7])),
(array([1, 5]), array([2, 6])),
(array([1, 5]), array([3, 7])),
(array([2, 6]), array([3, 7]))]
Now we use itertools.chain.from_iterable to convert this list of pairs of points into a (flattened) list of points:
In [11]: list(itertools.chain.from_iterable(itertools.combinations(v.T,2)))
Out[11]:
[array([0, 4]),
array([1, 5]),
array([0, 4]),
array([2, 6]),
array([0, 4]),
array([3, 7]),
array([1, 5]),
array([2, 6]),
array([1, 5]),
array([3, 7]),
array([2, 6]),
array([3, 7])]
If we plot these points one after another, connected by lines, we get our complete graph. The only problem is that plt.plot(x,y) expects x to be a sequence of x-values, and y to be a sequence of y-values.
We can use zip to convert the list of points into a list of x-values and y-values:
In [12]: zip(*itertools.chain.from_iterable(itertools.combinations(v.T,2)))
Out[12]: [(0, 1, 0, 2, 0, 3, 1, 2, 1, 3, 2, 3), (4, 5, 4, 6, 4, 7, 5, 6, 5, 7, 6, 7)]
The use of the splat operator (*) in zip and plt.plot is explained here.
Thus we've managed to massage the data into the right form to be fed to plt.plot.
With a 2 by 50 array,
for i in range(0, 49):
xPoints = viewVX[i], viewVX[i+1]
print("xPoints is", xPoints)
yPoints = viewVY[i+2], viewVY[i+3]
would get out of bounds for i = 47 and i = 48 since you use i+2 and i+3 as indices into viewVY.
This is what I came up with, but I hope someone comes up with something better.
def plot_complete(v):
for x1, y1 in v.T:
for x2, y2, in v.T:
plt.plot([x1, x2], [y1, y2], 'b')
plt.plot(v[0], v[1], 'sr')
The 'b' makes the lines blue, and 'sr' marks the points with red squares.
Have figured it out. Basically used simplified syntax provided by #Bago for plotting and considered #Daniel's indexing tip. Just have to iterate through each xy set of points and construct a new set of xx' yy' set of points to use to send to plt.plot():
viewVX = (v[0]).T #this is if your matrix is 2x100 ie row [0] is x and row[1] is y
viewVY = (v[1]).T
for i in range(0, v.shape[1]): #v.shape[1] gives the number of columns
for j in range(0, v.shape[1]):
xPoints = viewVX[j], viewVX[i]
yPoints = viewVY[j], viewVY[i]
xy = [xPoints, yPoints] #tuple/array of xx, yy point
#print("xy points are", xy)
plt.plot(xy[0],xy[1], ls ='-')