How to create Numpy matrix of row index where a certain condition is met? - numpy

How do I convert a numpy matrix of values to numpy matrix of row indexes where a certain condition is met?
Let's say
A = array([[ 0., 5., 0.],[ 0., 0., 3.],[ 0., 0., 0.]])
If there is a condition that I want to use here -- if an element is greater than 0 then replace it by row index+1, how would I do it?
So output should be,
B = array([[0., 1., 0.],[0., 0., 2.],[0., 0., 0.]])
Not sure if I am using np.where correctly. Thanks.

Using numpy.where
np.where(A>0, np.arange(1, A.shape[0]+1)[:, None], A)
array([[0., 1., 0.],
[0., 0., 2.],
[0., 0., 0.]])
Or you can use arithmetic (won't work if you have values less than 0):
(A > 0) * np.arange(1, A.shape[0]+1)[:, None]
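To make the difference between the two variants concrete, here is a small sketch. The matrix below is the one from the question with one negative entry (-4) added purely as an assumption for illustration; the names rows, B_where and B_arith are hypothetical.

```python
import numpy as np

# Question's matrix, plus one negative entry (-4.) added for illustration only
A = np.array([[0., 5., 0.],
              [0., -4., 3.],
              [0., 0., 0.]])

# Column vector of 1-based row numbers, shape (3, 1); it broadcasts across columns
rows = np.arange(1, A.shape[0] + 1)[:, None]

# np.where: entries failing the condition keep their original value from A
B_where = np.where(A > 0, rows, A)

# Arithmetic: entries failing the condition become 0, so the -4 is lost
B_arith = (A > 0) * rows

print(B_where)
print(B_arith)
```

With np.where the -4 survives in the result, while the arithmetic version zeroes it, which is the caveat about values less than 0.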

Related

Add x,y Values to numpy Matrix

So, what I have is a data file in the form of
1 , 1 , 2
2 , 5 , 8
3 , 9 , 10
...
...
In my case, every single triplet is in the form of: value , x-position , y-position.
What I want to achieve is to insert this data into a 2D matrix, which I already created using the np.zeros function. However, I am stuck and can't figure out how to write a function which puts the given values at the right x and y position in the matrix.
My current Matrix (named matrix) looks like:
array([[0,0,0,...,0]
[0,0,0,...,0]
[... ]
[0,0,0,...,0]])
and if I used matrix[1,1]=2 (first line of data) I would get:
array([[0,0,0,...,0]
[0,2,0,...,0]
[... ]
[0,0,0,...,0]])
My goal is to insert all lines of data in this way.
You can use the np.genfromtxt function [numpy-doc] and pass the comma (',') as the delimiter=… parameter. So given a file data.txt, you can load it into a numpy array with:
>>> import numpy as np
>>> np.genfromtxt('data.txt', delimiter=',')
array([[ 1., 1., 2.],
[ 2., 5., 8.],
[ 3., 9., 10.]])
Or if you are only interested in the x/y values, you can use the usecols=… parameter:
>>> np.genfromtxt('data.txt', delimiter=',', usecols=(1,2))
array([[ 1., 2.],
[ 5., 8.],
[ 9., 10.]])
You can load the data using genfromtxt():
import numpy as np
tmp = np.genfromtxt('data.txt', delimiter=',', dtype=int)
and then generate an empty data matrix a from the first two columns of tmp
a = np.zeros(np.max(tmp[:, :2], axis=0) + 1)
and populate it with values from tmp
a[tmp[:, 0], tmp[:, 1]] = tmp[:, 2]
a
# array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
# [ 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0., 8., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 10.]])
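For trying this out without a file on disk, the following is a minimal sketch that feeds the three rows from the question through io.StringIO as a stand-in for data.txt. It keeps the column interpretation used in the answer above (first two columns as indices, third as value); the names rows, cols, vals and acc are hypothetical.

```python
import io
import numpy as np

# Stand-in for data.txt, holding the three rows shown in the question
text = io.StringIO("1,1,2\n2,5,8\n3,9,10\n")
tmp = np.genfromtxt(text, delimiter=',', dtype=int)

rows, cols, vals = tmp[:, 0], tmp[:, 1], tmp[:, 2]

# Size the matrix so the largest index in each column fits
a = np.zeros((rows.max() + 1, cols.max() + 1))
a[rows, cols] = vals          # if a position repeats, the last value wins

# Accumulating variant: repeated positions are summed instead of overwritten
acc = np.zeros_like(a)
np.add.at(acc, (rows, cols), vals)

print(a.shape)
```

Plain fancy-index assignment silently overwrites when the same position occurs more than once, so np.add.at may be the better fit if the file can contain repeated positions.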

How can I convert a sparse representation in a .txt file to a dense matrix in scipy?

I have a .txt file from the Epinions data set in a sparse representation (i.e.
23 387 5 represents the fact "user 23 has rated item 387 as 5"). I want to convert this sparse format to its dense representation in SciPy so I can do matrix factorization on it.
I loaded the file with loadtxt() from numpy and got a [664824, 3] array. I then passed it to scipy.sparse.csr_matrix and called todense(), hoping to get the dense format, but I always get the same [664824, 3] matrix back. How can I turn it into the original [40163,139738] dense representation?
import numpy as np
from scipy.sparse import csr_matrix
d = np.loadtxt("MFCode/Epinions_dataset.txt")
S = csr_matrix(d)
D = S.todense()
I expected a dense matrix with the shape of [40163,139738]
A small sample csv like text:
In [219]: txt = """0 1 3
...: 1 0 4
...: 2 2 5
...: 0 3 6""".splitlines()
In [220]: data = np.loadtxt(txt)
In [221]: data
Out[221]:
array([[0., 1., 3.],
[1., 0., 4.],
[2., 2., 5.],
[0., 3., 6.]])
Using sparse, using the (data, (row, col)) style of input:
In [222]: from scipy import sparse
In [223]: M = sparse.coo_matrix((data[:,2], (data[:,0], data[:,1])), shape=(5,4))
In [224]: M
Out[224]:
<5x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in COOrdinate format>
In [225]: M.A
Out[225]:
array([[0., 3., 0., 6.],
[4., 0., 0., 0.],
[0., 0., 5., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Alternatively fill in a zeros array directly:
In [226]: arr = np.zeros((5,4))
In [227]: arr[data[:,0].astype(int), data[:,1].astype(int)]=data[:,2]
In [228]: arr
Out[228]:
array([[0., 3., 0., 6.],
[4., 0., 0., 0.],
[0., 0., 5., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
But beware that np.zeros([40163,139738]) could raise a memory error. M.A (M.toarray()) could also do that.
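To put a number on the memory concern, here is a rough sketch that estimates the dense float64 footprint for the shape quoted in the question (about 45 GB) and keeps the data in a compressed sparse format instead. The triplets are the small sample from above, not the Epinions data, and the variable names are hypothetical.

```python
import numpy as np
from scipy import sparse

# Dense float64 storage needed for the shape quoted in the question
n_rows, n_cols = 40163, 139738
dense_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize
print(f"dense storage: {dense_bytes / 1e9:.1f} GB")

# Small stand-in triplets (row, col, value), same as the sample above
data = np.array([[0, 1, 3], [1, 0, 4], [2, 2, 5], [0, 3, 6]], dtype=float)
rows = data[:, 0].astype(int)   # index arrays cast to int explicitly
cols = data[:, 1].astype(int)

# Build in COO form, then convert to CSR for arithmetic and row slicing
M = sparse.coo_matrix((data[:, 2], (rows, cols)), shape=(5, 4)).tocsr()
print(M.nnz, "stored values")
```

Whether the factorization routine you plan to use accepts a sparse matrix directly is something to check for that particular library; if it does, the dense conversion can be skipped entirely.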

Multiplying numpy array stack with hermitian transpose of itself without loop

I want to completely get rid of for loops in my code.
I have a complex numpy array stack1 of dimension OxMxN. This is a stack of MxN arrays stacked along the first dimension. For each MxN array, which we call A, I want to compute the matrix multiplication:
for k in range(stack1.shape[0]):
    A=stack1[k,:,:]
    newstack[k,:,:]=A.dot( numpy.conj(numpy.transpose(A)) )
I tried
newstack = stack1 @ np.conj(stack1.T)
but I run into an issue because the dimensions won't match.
We can use einsum -
np.einsum('ijk,ilk->ijl',stack1,np.conj(stack1))
We can also use np.matmul -
np.matmul(stack1,np.conj(stack1).swapaxes(1,2))
On Python 3.5+, this simplifies with the @ operator -
stack1 @ np.conj(stack1).swapaxes(1,2)
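As a quick sanity check, the sketch below compares the loop against the batched forms on a small random complex stack. The shape (2, 3, 4) and the seed are arbitrary assumptions, and the names loop, batched and via_einsum are hypothetical.

```python
import numpy as np

# Small random complex stack of shape (O, M, N) = (2, 3, 4); values are arbitrary
rng = np.random.default_rng(0)
stack1 = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))

# Reference: explicit loop over the first axis, A times its conjugate transpose
loop = np.stack([A @ A.conj().T for A in stack1])

# Batched matmul: swap only the last two axes, leaving the stack axis in place
batched = stack1 @ stack1.conj().swapaxes(1, 2)

# Same contraction written with einsum
via_einsum = np.einsum('ijk,ilk->ijl', stack1, stack1.conj())

print(batched.shape)
print(np.allclose(loop, batched), np.allclose(loop, via_einsum))
```

The attempt with stack1.T fails because .T reverses all three axes, giving shape (N, M, O), whereas swapaxes(1, 2) transposes each MxN slice and keeps the stack axis first.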
Just try to correct your for loop
a=[]
for k in range(stack1.shape[0]):
    A=stack1[k,:,:]
    a.append(A.dot( np.conj(np.transpose(A)) ))
np.array(a)
Out[399]:
array([[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]])

NumPy matrix to SciPy sparse matrix: What is the safest way to add a scalar?

First off, I'm no mathematician. I admit that. Yet I still need to understand how SciPy's sparse matrices work arithmetically in order to switch from a dense NumPy matrix to a SciPy sparse matrix in an application I have to work on. The issue is memory usage. A large dense matrix will consume tons of memory.
The formula portion at issue is where a matrix is added to a scalar.
A = V + x
Where V is a square matrix (it's large, say 60,000 x 60,000) and sparsely populated. x is a float.
The operation with NumPy will (if I'm not mistaken) add x to each field in V. Please let me know if I'm completely off base, and x will only be added to non-zero values in V.
With SciPy, not all sparse matrices support the same features, like scalar addition. dok_matrix (Dictionary of Keys) supports scalar addition, but it looks like (in practice) it's allocating each matrix entry, effectively rendering my sparse dok_matrix as a dense matrix with more overhead. (not good)
The other matrix types (CSR, CSC, LIL) don't support scalar addition.
I could try constructing a full matrix with the scalar value x, then adding that to V. I would have no problems with matrix types as they all seem to support matrix addition. However I would have to eat up a lot of memory to construct x as a matrix, and the result of the addition could end up being fully populated matrix as well.
There must be an alternative way to do this that doesn't require allocating 100% of a sparse matrix.
I'm willing to accept that large amounts of memory are needed, but I thought I would seek some advice first. Thanks.
Admittedly sparse matrices aren't really in my wheelhouse, but ISTM the best way forward depends on the matrix type. If you're DOK:
>>> S = dok_matrix((5,5))
>>> S[2,3] = 10; S[4,1] = 20
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 10., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 20., 0., 0., 0.]])
Then you could update:
>>> S.update(zip(S.keys(), np.array(S.values()) + 99))
>>> S
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Dictionary Of Keys format>
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 119., 0., 0., 0.]])
Not particularly performant, but is O(nonzero).
OTOH, if you have something like COO, CSC, or CSR, you can modify the data attribute directly:
>>> C = S.tocoo()
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.data
array([ 119., 109.])
>>> C.data += 1000
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 1119., 0., 0., 0.]])
Note that you're probably going to want to add an additional
>>> C.eliminate_zeros()
to handle the possibility that you've added a negative number and so there's now a 0 which is actually being recorded. By itself, that should work fine, but the next time you do the C.data += some_number trick, it would add some_number to that zero you introduced.
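The same idea on a CSR matrix can be sketched as follows. Note the assumption baked in: this adds x only to the stored (non-zero) entries, which is not the same as the dense V + x that touches every entry. The 3x3 matrix and the names V, x and A here are illustrative only.

```python
import numpy as np
from scipy import sparse

# Tiny stand-in for the large sparse matrix V
V = sparse.csr_matrix(np.array([[0., 0., 0.],
                                [0., 10., 0.],
                                [20., 0., 0.]]))
x = 99.0

A = V.copy()          # work on a copy so V itself is left untouched
A.data += x           # shifts only the stored entries; cost is O(nnz)
A.eliminate_zeros()   # drop entries that happened to become exactly 0

print(A.toarray())
print(A.nnz)
```

If the formula really needs x added to every entry, the result is dense by definition, so one option is to keep V sparse and carry x along as a separate scalar in the later algebra rather than materializing V + x.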

Pairwise calculation on a 1D-Array with Matrix-like output

Assume you have the following 1D-Array:
array([1,2,3,4,5])
I want to perform different (simple) calculations between each combination of numbers (such as addition, subtraction, etc.) resulting in a matrix-type output (without duplication), i.e. for the above array, the output should be as below if we wanted to calculate the pairwise difference:
array([0,-,-,-,-],
[1,0,-,-,-],
[2,1,0,-,-],
[3,2,1,0,-],
[4,3,2,1,0])
Of course one could use brute force with two for loops but I feel like there is a better way, I just can't seem to find the right method.
For anyone interested, I managed to find a solution using pairwise_distances from scikit-learn. This will by default just calculate the absolute distance between any pair, but it is possible to supply a custom function that takes two arguments, i.e. two numbers of a pair, for more elaborate calculations. It will require a slight reshape for 1D arrays.
import numpy as np
from sklearn.metrics import pairwise_distances
def custom_calc(x,y):
    return (y-x)
a = np.array([1,2,3,4,5])
matrix = pairwise_distances(a.reshape(-1,1), metric=custom_calc)
matrix will look as follows:
array([[0., 1., 2., 3., 4.],
[1., 0., 1., 2., 3.],
[2., 1., 0., 1., 2.],
[3., 2., 1., 0., 1.],
[4., 3., 2., 1., 0.]])
Use numpy broadcasting to calculate the pairwise difference; no loops are needed. The operation has to be done between a column vector and a row vector built from the same array.
import numpy as np
x = np.arange(1,6, dtype=float)
# x[:,None] adds a second axis to the array
mat = x[:,None]-x
this yields:
array([[ 0., -1., -2., -3., -4.],
[ 1., 0., -1., -2., -3.],
[ 2., 1., 0., -1., -2.],
[ 3., 2., 1., 0., -1.],
[ 4., 3., 2., 1., 0.]])
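To get closer to the "without duplication" layout from the question, one possible sketch is to keep only the lower triangle with np.tril. Zero is used here as the filler for the upper triangle, which is an assumption since the question only marks those cells with dashes. The ufunc .outer methods give the same pairwise tables for other operations.

```python
import numpy as np

x = np.arange(1, 6)

# Full pairwise difference table via broadcasting: diff[i, j] = x[i] - x[j]
diff = x[:, None] - x

# Keep the diagonal and everything below it; the upper triangle is set to 0
lower = np.tril(diff)

# Equivalent formulations for other pairwise operations
same_diff = np.subtract.outer(x, x)
sums = np.add.outer(x, x)

print(lower)
```

If the filler must be distinguishable from a genuine zero, a float array with np.nan in the upper triangle or a masked array would be alternatives.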