How can I convert a sparse representation in a .txt file to a dense matrix in SciPy? - numpy

I have a .txt file from the Epinions data set in a sparse representation (i.e., the line 23 387 5 means "user 23 has rated item 387 as 5"). I want to convert this sparse format to its dense representation in SciPy so I can do matrix factorization on it.
I loaded the file with loadtxt() from NumPy and got a [664824, 3] array. I then passed it to scipy.sparse.csr_matrix and called todense(), hoping to get the dense format, but I always get the same [664824, 3] matrix back. How can I turn it into the original [40163, 139738] dense representation?
import numpy as np
from scipy.sparse import csr_matrix
d = np.loadtxt("MFCode/Epinions_dataset.txt")
S = csr_matrix(d)
D = S.todense()
I expected a dense matrix with the shape of [40163,139738]

A small sample csv like text:
In [219]: txt = """0 1 3
...: 1 0 4
...: 2 2 5
...: 0 3 6""".splitlines()
In [220]: data = np.loadtxt(txt)
In [221]: data
Out[221]:
array([[0., 1., 3.],
[1., 0., 4.],
[2., 2., 5.],
[0., 3., 6.]])
Using scipy.sparse, with the (data, (row, col)) style of input:
In [222]: from scipy import sparse
In [223]: M = sparse.coo_matrix((data[:,2], (data[:,0], data[:,1])), shape=(5,4))
In [224]: M
Out[224]:
<5x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in COOrdinate format>
In [225]: M.A
Out[225]:
array([[0., 3., 0., 6.],
[4., 0., 0., 0.],
[0., 0., 5., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Alternatively fill in a zeros array directly:
In [226]: arr = np.zeros((5,4))
In [227]: arr[data[:,0].astype(int), data[:,1].astype(int)]=data[:,2]
In [228]: arr
Out[228]:
array([[0., 3., 0., 6.],
[4., 0., 0., 0.],
[0., 0., 5., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
But beware that np.zeros([40163, 139738]) could raise a memory error: at 8 bytes per float64 value that is roughly 45 GB. M.A (i.e. M.toarray()) could fail the same way.
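If the goal is matrix factorization, it may not be necessary to densify at all. A minimal sketch, assuming the ids in the file are 0-based non-negative integers (if the Epinions ids start at 1, subtract 1 first) and that inferring the shape from the largest ids is acceptable. triplets_to_csr is a hypothetical helper name, not a SciPy function:

```python
import numpy as np
from scipy import sparse

def triplets_to_csr(triplets, shape=None):
    """Build a CSR matrix from rows of (row_id, col_id, value)."""
    triplets = np.asarray(triplets)
    rows = triplets[:, 0].astype(np.int64)
    cols = triplets[:, 1].astype(np.int64)
    vals = triplets[:, 2].astype(np.float64)
    if shape is None:
        # Assumption: ids are 0-based, so the size is the largest id plus one.
        shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    # Duplicate (row, col) pairs are summed when converting COO to CSR.
    return sparse.coo_matrix((vals, (rows, cols)), shape=shape).tocsr()

sample = np.array([[0, 1, 3], [1, 0, 4], [2, 2, 5], [0, 3, 6]], dtype=float)
ratings = triplets_to_csr(sample)

# Size of the full dense float64 matrix from the question, in gigabytes.
dense_gb = 40163 * 139738 * 8 / 1e9
```

Only the stored ratings take memory here, and CSR supports the row slicing and matrix products that many factorization routines rely on.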

Related

Add x,y Values to numpy Matrix

So, what I have is a data file in the form of
1 , 1 , 2
2 , 5 , 8
3 , 9 , 10
...
...
In my case, every single triplet is in the form of: value , x-position , y-position.
What I want to achieve is to insert this data into a 2D matrix, which I have already created using the np.zeros function. However, I am stuck and can't figure out how to write a function that puts the given values at the right x and y positions in the matrix.
My current Matrix (named matrix) looks like:
array([[0,0,0,...,0]
[0,0,0,...,0]
[... ]
[0,0,0,...,0]])
and if I used matrix[1,1]=2 (first line of data) I would get:
array([[0,0,0,...,0]
[0,2,0,...,0]
[... ]
[0,0,0,...,0]])
My goal is to insert all lines of data in this way.
You can use the np.genfromtxt function and pass the comma (',') as its delimiter=… parameter. So, given a file data.txt, you can load it into a NumPy array with:
>>> import numpy as np
>>> np.genfromtxt('data.txt', delimiter=',')
array([[ 1., 1., 2.],
[ 2., 5., 8.],
[ 3., 9., 10.]])
Or if you are only interested in the x/y values, you can use the usecols=… parameter:
>>> np.genfromtxt('data.txt', delimiter=',', usecols=(1,2))
array([[ 1., 2.],
[ 5., 8.],
[ 9., 10.]])
You can load the data using genfromtxt():
import numpy as np
tmp = np.genfromtxt('data.txt', delimiter=',', dtype=int)
and then create a zero matrix a whose shape comes from the largest indices in the first two columns of tmp
a = np.zeros(np.max(tmp[:, :2], axis=0) + 1)
and populate it with values from tmp
a[tmp[:, 0], tmp[:, 1]] = tmp[:, 2]
a
# array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
# [ 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0., 8., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 10.]])
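Putting the steps together in a form that runs without a file on disk. The in-memory text and autostrip=True are my additions, and the column order (row, column, value) follows the matrix[1,1]=2 example from the question:

```python
import io
import numpy as np

# Stand-in for data.txt so the example is self-contained.
text = "1 , 1 , 2\n2 , 5 , 8\n3 , 9 , 10\n"

# autostrip=True removes the spaces around each comma-separated field.
tmp = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=int, autostrip=True)
rows, cols, vals = tmp[:, 0], tmp[:, 1], tmp[:, 2]

# Size the matrix from the largest row and column index, then fill it
# in one vectorized assignment.
a = np.zeros((rows.max() + 1, cols.max() + 1))
a[rows, cols] = vals
```

If the same (row, column) pair appears more than once, this assignment keeps only one of the values.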

How is the IoU calculated for multiple bounding box predictions in Tensorflow Object Detection API?

How is the IoU metric calculated for multiple bounding box predictions in the TensorFlow Object Detection API?
I'm not sure exactly how TensorFlow does it, but here is one way that I recently got to work, since I didn't find a good solution online. I used NumPy arrays to get the IoU and other metrics (TP, FP, TN, FN) for multi-object detection.
Let's say for this example that your image is 6x6.
import numpy as np
import cv2
empty = np.zeros((6, 6))  # blank 6x6 image
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
And you have the ground truth for 2 objects, one in the bottom left of the image and one smaller one in the top right.
bbox_actual_obj1 = [[0, 3], [2, 5]] # top left coord & bottom right coord
bbox_actual_obj2 = [[4, 0], [5, 1]]
Using OpenCV, you can add these objects to a copy of the empty image array.
actual = empty.copy()
actual = cv2.rectangle(
actual,
bbox_actual_obj1[0],
bbox_actual_obj1[1],
1,
-1
)
actual = cv2.rectangle(
actual,
bbox_actual_obj2[0],
bbox_actual_obj2[1],
1,
-1
)
array([[0., 0., 0., 0., 1., 1.],
[0., 0., 0., 0., 1., 1.],
[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.]])
Now let's say that below are our predicted bounding boxes:
bbox_pred_obj1 = [[1, 3], [3, 5]] # top left coord & bottom right coord
bbox_pred_obj2 = [[3, 0], [5, 2]]
Now we do the same thing as above but change the value we assign within the array.
pred = empty.copy()
pred = cv2.rectangle(
pred,
bbox_pred_obj1[0],
bbox_pred_obj1[1],
2,
-1
)
pred = cv2.rectangle(
pred,
bbox_pred_obj2[0],
bbox_pred_obj2[1],
2,
-1
)
array([[0., 0., 0., 2., 2., 2.],
[0., 0., 0., 2., 2., 2.],
[0., 0., 0., 2., 2., 2.],
[0., 2., 2., 2., 0., 0.],
[0., 2., 2., 2., 0., 0.],
[0., 2., 2., 2., 0., 0.]])
If we convert these arrays to matrices and add them, we get the following result
actual_matrix = np.matrix(actual)
pred_matrix = np.matrix(pred)
combined = actual_matrix + pred_matrix
matrix([[0., 0., 0., 2., 3., 3.],
[0., 0., 0., 2., 3., 3.],
[0., 0., 0., 2., 2., 2.],
[1., 3., 3., 2., 0., 0.],
[1., 3., 3., 2., 0., 0.],
[1., 3., 3., 2., 0., 0.]])
Now all we need to do is count how often each value occurs in the combined matrix to get the TP, FP, TN, and FN counts.
combined = np.squeeze(
np.asarray(
pred_matrix + actual_matrix
)
)
unique, counts = np.unique(combined, return_counts=True)
zipped = dict(zip(unique, counts))
{0.0: 15, 1.0: 3, 2.0: 8, 3.0: 10}
Legend (value in the combined array):
True Negative: 0
False Negative: 1
False Positive: 2
True Positive/Intersection: 3
Union: cells with value 1, 2, or 3
Resulting metrics:
IoU: 10/(3 + 8 + 10) = 0.48
Precision: 10/(10 + 8) = 0.56
Recall: 10/(10 + 3) = 0.77
F1: 10/(10 + 0.5 * (3 + 8)) = 0.65
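The same counts can be obtained without OpenCV or np.matrix by working with boolean masks. This is a sketch that assumes the boxes use inclusive (x, y) pixel corners, as in the cv2.rectangle calls above. fill_box is a hypothetical helper:

```python
import numpy as np

def fill_box(mask, box):
    """Mark an inclusive [[x1, y1], [x2, y2]] box as True in a 2D mask."""
    (x1, y1), (x2, y2) = box
    mask[y1:y2 + 1, x1:x2 + 1] = True  # rows are y, columns are x

actual = np.zeros((6, 6), dtype=bool)
pred = np.zeros((6, 6), dtype=bool)
for box in ([[0, 3], [2, 5]], [[4, 0], [5, 1]]):
    fill_box(actual, box)
for box in ([[1, 3], [3, 5]], [[3, 0], [5, 2]]):
    fill_box(pred, box)

tp = int(np.count_nonzero(actual & pred))    # intersection
fp = int(np.count_nonzero(~actual & pred))
fn = int(np.count_nonzero(actual & ~pred))
tn = int(np.count_nonzero(~actual & ~pred))

iou = tp / (tp + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = tp / (tp + 0.5 * (fp + fn))
```

Note that this pools all objects in the image into one mask, so it is an image-level IoU rather than a per-object one.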
Each predicted bounding box around an object has an IoU (intersection over union) with the ground-truth box of that object. It is calculated by dividing the area of overlap between the predicted box and the ground-truth box by the area of their union, i.e. the sum of the two areas minus the overlap. After calculating the IoUs for all boxes around an object, the ones with the highest IoU are selected as the result. Here it is explained better.
Also you can print the IoU value after this line.
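To make the per-box definition concrete, here is a small sketch that assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates with x1 < x2 and y1 < y2. box_iou is an illustrative helper, not part of the TensorFlow Object Detection API:

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Union is the sum of both areas minus the part counted twice.
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlapping = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = box_iou((0, 0, 1, 1), (2, 2, 3, 3))
identical = box_iou((0, 0, 2, 2), (0, 0, 2, 2))
```

Here the overlapping pair shares a 1x1 square out of a union of 4 + 4 - 1 = 7, so its IoU is 1/7.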

How to create Numpy matrix of row index where a certain condition is met?

How do I convert a numpy matrix of values to numpy matrix of row indexes where a certain condition is met?
Let's say
A = array([[ 0., 5., 0.],[ 0., 0., 3.],[ 0., 0., 0.]])
If there is a condition that I want to use here -- if an element is greater than 0 then replace it by row index+1, how would I do it?
So output should be,
B = array([[0., 1., 0.],[0., 0., 2.],[0., 0., 0.]])
Not sure if I am using np.where correctly. Thanks.
Using numpy.where
np.where(A>0, np.arange(1, A.shape[0]+1)[:, None], A)
array([[0., 1., 0.],
[0., 0., 2.],
[0., 0., 0.]])
Or you can use arithmetic (this won't match the np.where result if A has values less than 0, since those become 0 instead of being kept):
(A > 0) * np.arange(1, A.shape[0]+1)[:, None]
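A quick comparison of the two approaches on an array that contains a negative value. The -1 entry is added here purely for illustration and is not in the question's data:

```python
import numpy as np

A = np.array([[0., 5., 0.],
              [0., -1., 3.],
              [0., 0., 0.]])

# Column vector [[1], [2], [3]] that broadcasts across each row.
row_ids = np.arange(1, A.shape[0] + 1)[:, None]

via_where = np.where(A > 0, row_ids, A)  # keeps the -1 as is
via_mult = (A > 0) * row_ids             # turns the -1 into 0
```

Both give the same result whenever A has no negative entries.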

How to use CNTK classification_error()?

I am trying to understand the correct usage of cntk.metrics.classification_error() and use it to verify a batch of predictions against their ground truths.
The below toy example (based on the Python API docs):
import numpy as np
from cntk.metrics import classification_error
predictions = np.asarray([[1., 2., 3., 4.],[1., 2., 3., 4.],[1., 2., 3., 4.]], dtype=np.float32)
labels = np.asarray([[0., 0., 0., 1.],[0., 0., 0., 1.],[0., 0., 1., 0.]], dtype=np.float32)
classification_error(predictions, labels).eval()
yields the following result:
array([[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.]], dtype=float32)
Is there a way to obtain a vector rather than a square matrix? The matrix seems inefficient given that I would like to process a large batch.
I've tried using the axis keyword when calling classification_error(), but whether I set axis=0 or axis=1 I get an empty result.
This happens because CNTK is trying to be user-friendly and ends up being confused about the types :-) You can tell because the classification error is not even correct.
If you add a little bit of typing information it gets the semantics right.
import cntk as C
p = C.input(4)
y = C.input(4)
classification_error(p, y).eval({p:predictions, y:labels})
array([[ 0.],
[ 0.],
[ 1.]], dtype=float32)
We will work on a fix that will prevent the confusion.
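As a sanity check on the expected values, the per-sample error can be reproduced in plain NumPy. This is only a sketch of the metric's meaning (1 where the highest-scoring class differs from the labeled class), not CNTK's implementation:

```python
import numpy as np

predictions = np.asarray([[1., 2., 3., 4.],
                          [1., 2., 3., 4.],
                          [1., 2., 3., 4.]], dtype=np.float32)
labels = np.asarray([[0., 0., 0., 1.],
                     [0., 0., 0., 1.],
                     [0., 0., 1., 0.]], dtype=np.float32)

# Compare the predicted class with the one-hot label class per sample.
predicted_class = predictions.argmax(axis=1)
true_class = labels.argmax(axis=1)
per_sample_error = (predicted_class != true_class).astype(np.float32)
mean_error = float(per_sample_error.mean())
```

This gives one value per sample, matching the column of 0, 0, 1 in the typed CNTK call above.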

NumPy matrix to SciPy sparse matrix: What is the safest way to add a scalar?

First off, I'm no mathematician. I admit that. Yet I still need to understand how SciPy's sparse matrices work arithmetically in order to switch from a dense NumPy matrix to a SciPy sparse matrix in an application I have to work on. The issue is memory usage. A large dense matrix will consume tons of memory.
The formula portion at issue is where a matrix is added to a scalar.
A = V + x
Where V is a square matrix (it's large, say 60,000 x 60,000) and sparsely populated. x is a float.
The operation with NumPy will (if I'm not mistaken) add x to each field in V. Please let me know if I'm completely off base, and x will only be added to non-zero values in V.
With SciPy, not all sparse matrices support the same features, like scalar addition. dok_matrix (Dictionary of Keys) supports scalar addition, but it looks like (in practice) it allocates every matrix entry, effectively turning my sparse dok_matrix into a dense matrix with more overhead. (not good)
The other matrix types (CSR, CSC, LIL) don't support scalar addition.
I could try constructing a full matrix with the scalar value x, then adding that to V. I would have no problems with matrix types as they all seem to support matrix addition. However, I would have to eat up a lot of memory to construct x as a matrix, and the result of the addition could end up being a fully populated matrix as well.
There must be an alternative way to do this that doesn't require allocating 100% of a sparse matrix.
I'm willing to accept that large amounts of memory are needed, but I thought I would seek some advice first. Thanks.
Admittedly sparse matrices aren't really in my wheelhouse, but ISTM the best way forward depends on the matrix type. If you're DOK:
>>> import numpy as np
>>> from scipy.sparse import dok_matrix
>>> S = dok_matrix((5,5))
>>> S[2,3] = 10; S[4,1] = 20
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 10., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 20., 0., 0., 0.]])
Then you could update:
>>> S.update(zip(S.keys(), np.array(S.values()) + 99))
>>> S
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Dictionary Of Keys format>
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 119., 0., 0., 0.]])
Not particularly performant, but is O(nonzero).
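The update() call above was written for Python 2 and an older SciPy, and may not run unchanged on current versions. A sketch of the same O(nonzero) idea using only item access, which I would expect to be more portable:

```python
from scipy.sparse import dok_matrix

S = dok_matrix((5, 5))
S[2, 3] = 10
S[4, 1] = 20

# Snapshot the coordinates of the stored entries, then bump each one.
# Only the stored entries are visited, so the matrix stays sparse.
stored = S.tocoo()
for i, j in zip(stored.row, stored.col):
    S[i, j] += 99
```

The entries that were never stored remain implicit zeros, which is the point of the exercise.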
OTOH, if you have something like COO, CSC, or CSR, you can modify the data attribute directly:
>>> C = S.tocoo()
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.data
array([ 119., 109.])
>>> C.data += 1000
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 1119., 0., 0., 0.]])
Note that you're probably going to want to add an additional
>>> C.eliminate_zeros()
to handle the possibility that you've added a negative number and so there's now a 0 that is explicitly stored. By itself that is harmless, but the next time you did the C.data += some_number trick, it would add some_number to that zero you introduced.
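A compact sketch of the whole pattern on CSR, assuming the intent is to add x to the stored entries only and leave the implicit zeros alone. The small matrix and the value of x are made up for illustration:

```python
import numpy as np
from scipy import sparse

V = sparse.csr_matrix(np.array([[0., 0., 0.],
                                [0., 10., 0.],
                                [20., 0., -5.]]))
x = 5.0

A = V.copy()          # leave V untouched
A.data += x           # touches only the 3 stored values
A.eliminate_zeros()   # -5 + 5 became a stored 0, so drop it
```

After this, A holds two stored values (15 and 25), and a later A.data += y will not resurrect the cancelled entry.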