How to use CNTK classification_error()?

I am trying to understand the correct usage of cntk.metrics.classification_error() and use it to verify a batch of predictions against their ground truths.
Consider the toy example below (based on the Python API docs):
import numpy as np
from cntk.metrics import classification_error
predictions = np.asarray([[1., 2., 3., 4.],[1., 2., 3., 4.],[1., 2., 3., 4.]], dtype=np.float32)
labels = np.asarray([[0., 0., 0., 1.],[0., 0., 0., 1.],[0., 0., 1., 0.]], dtype=np.float32)
classification_error(predictions, labels).eval()
yields the following result:
array([[ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 0.,  0.,  1.]], dtype=float32)
Is there a way I can obtain a vector rather than a square matrix? The square matrix seems inefficient, given that I would like to process a large batch.
I've tried using the axis keyword when calling classification_error(), but whether I set axis=0 or axis=1 I get an empty result.

This happens because CNTK is trying to be user-friendly and ends up being confused about the types :-) You can tell because the classification error is not even correct.
If you add a little bit of typing information it gets the semantics right.
import cntk as C
# Declaring typed input variables tells CNTK that each row is one sample
p = C.input(4)
y = C.input(4)
classification_error(p, y).eval({p: predictions, y: labels})
array([[ 0.],
       [ 0.],
       [ 1.]], dtype=float32)
We will work on a fix that will prevent the confusion.
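If you then want a single error rate for the whole batch, a small follow-up sketch (building on the snippet above, not an official CNTK recipe) is to average the per-sample 0/1 indicators that eval() returns:
import numpy as np
# errors is the [[0.], [0.], [1.]] column vector from above
errors = classification_error(p, y).eval({p: predictions, y: labels})
batch_error_rate = float(np.mean(errors))  # 1/3 for this example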

Related

How to fill particular indices of a dimension of a tensor with a constant value k in TensorFlow?

I am trying to find a way to replace certain indices of a dimension in a tensor with a constant value k, similar to index_fill_ in PyTorch.
I have checked tensor_scatter_nd_update, but it requires the entire tensor along with the indices and values to be replaced: the indices must be given with respect to the whole tensor rather than to one particular dimension, and the values must be a tensor rather than a single constant. I am looking for something simpler.
If anyone knows a way, could you please provide a solution or point me in a direction I should be looking? Thank you.
You haven't given an example, but you could slice and assign like this:
import tensorflow as tf

aa = tf.Variable(tf.zeros([10, 4]))
tensor = tf.constant(10, shape=(4, 3))
# Assign ones into rows 0-3, columns 1-3 via slice assignment
aa[0:4, 1:4].assign(tf.ones_like(tensor, dtype=tf.float32))
print(aa)
# The same assignment with a plain nested list
aa[0:4, 1:4].assign([[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]])
print(aa)
Both formats of assign print this.
<tf.Variable 'Variable:0' shape=(10, 4) dtype=float32, numpy=
array([[0., 1., 1., 1.],
       [0., 1., 1., 1.],
       [0., 1., 1., 1.],
       [0., 1., 1., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]], dtype=float32)>
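If the indices to fill are not a contiguous slice, a mask-based sketch is another option (the names x, idx, and k here are my own illustration, not from the question):
import tensorflow as tf

x = tf.zeros([4, 5])
idx = tf.constant([1, 3])                # indices to fill along axis 1
k = 7.0
col = tf.range(tf.shape(x)[1])           # [0, 1, 2, 3, 4]
# True wherever the column index is one of idx
mask = tf.reduce_any(tf.equal(col[None, :], idx[:, None]), axis=0)
result = tf.where(mask[None, :], k * tf.ones_like(x), x)
print(result)                            # columns 1 and 3 hold 7.0, the rest 0.0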

How can I convert a sparse representation in a .txt file to a dense matrix in scipy?

I have a .txt file from the Epinions data set which is a sparse representation (i.e. 23 387 5 represents the fact "user 23 has rated item 387 as 5"). From this sparse format I want to convert it to its dense representation in scipy so I can do matrix factorization on it.
I have loaded the file with loadtxt() from numpy and it is a [664824, 3] array. Using scipy.sparse.csr_matrix I converted it to a sparse matrix, and using todense() from scipy I was hoping to obtain the dense format, but I always get the same [664824, 3] matrix. How can I turn it into the original [40163, 139738] dense representation?
import numpy as np
from scipy.sparse import csr_matrix

d = np.loadtxt("MFCode/Epinions_dataset.txt")
S = csr_matrix(d)   # NOTE: this treats d as a dense [664824, 3] array, not as (row, col, value) triples
D = S.todense()
I expected a dense matrix with the shape of [40163,139738]
Start with a small sample of csv-like text:
In [219]: txt = """0 1 3
...: 1 0 4
...: 2 2 5
...: 0 3 6""".splitlines()
In [220]: data = np.loadtxt(txt)
In [221]: data
Out[221]:
array([[0., 1., 3.],
       [1., 0., 4.],
       [2., 2., 5.],
       [0., 3., 6.]])
Using scipy.sparse with the (data, (row, col)) style of input:
In [222]: from scipy import sparse
In [223]: M = sparse.coo_matrix((data[:,2], (data[:,0], data[:,1])), shape=(5,4))
In [224]: M
Out[224]:
<5x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in COOrdinate format>
In [225]: M.A
Out[225]:
array([[0., 3., 0., 6.],
       [4., 0., 0., 0.],
       [0., 0., 5., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
Alternatively fill in a zeros array directly:
In [226]: arr = np.zeros((5,4))
In [227]: arr[data[:,0].astype(int), data[:,1].astype(int)]=data[:,2]
In [228]: arr
Out[228]:
array([[0., 3., 0., 6.],
       [4., 0., 0., 0.],
       [0., 0., 5., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
But beware that np.zeros([40163, 139738]) could raise a memory error. M.A (i.e. M.toarray()) could also do that.
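For completeness, the same (data, (row, col)) pattern applied to the asker's file, staying sparse throughout (a sketch; whether the Epinions ids are 0- or 1-based is an assumption to verify against the data):
import numpy as np
from scipy import sparse

d = np.loadtxt("MFCode/Epinions_dataset.txt")
rows = d[:, 0].astype(int)    # user ids
cols = d[:, 1].astype(int)    # item ids
vals = d[:, 2]                # ratings
R = sparse.csr_matrix((vals, (rows, cols)), shape=(40163, 139738))
# Most matrix-factorization code can consume R directly; only call
# R.toarray() if a [40163, 139738] dense array actually fits in memory.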

Multiplying a numpy array stack with the Hermitian transpose of itself without a loop

I want to completely get rid of for loops in my code.
I have a complex numpy array stack1 of shape OxMxN; this is a stack of MxN arrays stacked along the first dimension. For each MxN array, call it A, I want to compute the matrix product:
for k in range(stack1.shape[0]):
    A = stack1[k, :, :]
    newstack[k, :, :] = A.dot(numpy.conj(numpy.transpose(A)))
I tried
newstack = stack1 @ np.conj(stack1.T)
but I ran into an issue because the dimensions won't match.
We can use einsum -
np.einsum('ijk,ilk->ijl',stack1,np.conj(stack1))
We can also use np.matmul -
np.matmul(stack1,np.conj(stack1).swapaxes(1,2))
On Python 3.x, this simplifies with the @ operator -
stack1 @ np.conj(stack1).swapaxes(1,2)
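A quick sanity check (a sketch with random complex data) that the vectorized forms match the original loop:
import numpy as np

rng = np.random.default_rng(0)
stack1 = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))
loop = np.array([A @ A.conj().T for A in stack1])            # the for-loop version
vec = stack1 @ np.conj(stack1).swapaxes(1, 2)                # vectorized matmul
ein = np.einsum('ijk,ilk->ijl', stack1, np.conj(stack1))     # einsum version
print(np.allclose(loop, vec) and np.allclose(loop, ein))     # True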
Or simply correct your for loop:
a = []
for k in range(stack1.shape[0]):
    A = stack1[k, :, :]
    a.append(A.dot(numpy.conj(numpy.transpose(A))))
np.array(a)
Out[399]:
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

NumPy matrix to SciPy sparse matrix: What is the safest way to add a scalar?

First off, I'm no mathematician. I admit that. Yet I still need to understand how SciPy's sparse matrices work arithmetically in order to switch from a dense NumPy matrix to a SciPy sparse matrix in an application I have to work on. The issue is memory usage: a large dense matrix will consume tons of memory.
The formula portion at issue is where a matrix is added to a scalar.
A = V + x
where V is a square matrix (it's large, say 60,000 x 60,000) and sparsely populated, and x is a float.
The operation with NumPy will (if I'm not mistaken) add x to each entry in V. Please let me know if I'm completely off base and x will only be added to the non-zero values in V.
With SciPy, not all sparse matrices support the same features, such as scalar addition. dok_matrix (Dictionary of Keys) supports scalar addition, but it looks like (in practice) it allocates each matrix entry, effectively rendering my sparse dok_matrix as a dense matrix with extra overhead (not good).
The other matrix types (CSR, CSC, LIL) don't support scalar addition.
I could try constructing a full matrix with the scalar value x, then adding that to V. I would have no problems with matrix types, as they all seem to support matrix addition. However, I would have to eat up a lot of memory to construct x as a matrix, and the result of the addition could end up being a fully populated matrix as well.
There must be an alternative way to do this that doesn't require allocating 100% of a sparse matrix.
I'm willing to accept that large amounts of memory are needed, but I thought I would seek some advice first. Thanks.
Admittedly sparse matrices aren't really in my wheelhouse, but ISTM the best way forward depends on the matrix type. If you're using DOK:
>>> from scipy.sparse import dok_matrix
>>> import numpy as np
>>> S = dok_matrix((5,5))
>>> S[2,3] = 10; S[4,1] = 20
>>> S.todense()
matrix([[  0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,  10.,   0.],
        [  0.,   0.,   0.,   0.,   0.],
        [  0.,  20.,   0.,   0.,   0.]])
Then you could update:
>>> S.update(zip(S.keys(), np.array(S.values()) + 99))
>>> S
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Dictionary Of Keys format>
>>> S.todense()
matrix([[   0.,    0.,    0.,    0.,    0.],
        [   0.,    0.,    0.,    0.,    0.],
        [   0.,    0.,    0.,  109.,    0.],
        [   0.,    0.,    0.,    0.,    0.],
        [   0.,  119.,    0.,    0.,    0.]])
Not particularly performant, but it is O(nonzero).
OTOH, if you have something like COO, CSC, or CSR, you can modify the data attribute directly:
>>> C = S.tocoo()
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.data
array([ 119., 109.])
>>> C.data += 1000
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.todense()
matrix([[    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,  1109.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,  1119.,     0.,     0.,     0.]])
Note that you're probably going to want to add an additional
>>> C.eliminate_zeros()
to handle the possibility that you've added a negative number and so there's now a 0 which is actually being recorded. By itself, that should work fine, but the next time you did the C.data += some_number trick, it would add some_number to that zero you introduced.
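Wrapping the data-attribute trick into a helper (the function name is my own, not part of scipy), with the caveat that this adds x only to the stored entries, which is not the same as the dense V + x:
import numpy as np
from scipy import sparse

def add_scalar_to_nonzeros(S, x):
    # Add x to the explicitly stored entries of a CSR/CSC/COO matrix.
    T = S.copy()
    T.data = T.data + x
    T.eliminate_zeros()   # drop entries that x cancelled to exactly 0
    return T

V = sparse.random(5, 5, density=0.2, format='csr', random_state=0)
print(add_scalar_to_nonzeros(V, 1.5).toarray())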

Pairwise calculation on a 1D-Array with Matrix-like output

Assume you have the following 1D-Array:
array([1,2,3,4,5])
I want to perform different (simple) calculations between each combination of numbers (such as addition, subtraction, etc.), resulting in a matrix-type output (without duplication), i.e. for the above array, the output should be as below if we wanted to calculate the pairwise difference:
array([[0, -, -, -, -],
       [1, 0, -, -, -],
       [2, 1, 0, -, -],
       [3, 2, 1, 0, -],
       [4, 3, 2, 1, 0]])
Of course one could use brute force with two for loops, but I feel like there is a better way; I just can't seem to find the right method.
For anyone interested, I managed to find a solution using pairwise_distances from scikit-learn. By default it just calculates the absolute distance between any pair, but it is possible to supply a custom function that takes two arguments, i.e. the two numbers of a pair, for more elaborate calculations. It requires a slight reshape for 1D arrays.
import numpy as np
from sklearn.metrics import pairwise_distances

def custom_calc(x, y):
    return y - x

a = np.array([1, 2, 3, 4, 5])
matrix = pairwise_distances(a.reshape(-1, 1), metric=custom_calc)
matrix will look as follows (note that pairwise_distances treats the metric as symmetric and mirrors one triangle, which is why the signs from y - x are lost):
array([[0., 1., 2., 3., 4.],
       [1., 0., 1., 2., 3.],
       [2., 1., 0., 1., 2.],
       [3., 2., 1., 0., 1.],
       [4., 3., 2., 1., 0.]])
Make use of numpy broadcasting to calculate the pairwise difference. This way no loops are needed. For that to happen, the operation has to be done between a row vector and a column vector of the same array.
import numpy as np

x = np.arange(1, 6, dtype=float)
# x[:, None] adds a second axis, turning x into a column vector
mat = x[:, None] - x
this yields:
array([[ 0., -1., -2., -3., -4.],
       [ 1.,  0., -1., -2., -3.],
       [ 2.,  1.,  0., -1., -2.],
       [ 3.,  2.,  1.,  0., -1.],
       [ 4.,  3.,  2.,  1.,  0.]])
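Since the question asked for the matrix without the duplicated upper half, you can mask the broadcast result afterwards (a sketch; np.tril fills the upper triangle with zeros rather than leaving it blank):
import numpy as np

x = np.arange(1, 6, dtype=float)
mat = x[:, None] - x
print(np.tril(mat))   # keeps the lower triangle, zeros elsewhere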