Theano TypeError - numpy

I am reading jpg images and then reshaping them into a tensor. I am casting the images as float32:
def load(folder, table):
    X = []
    train = pd.read_csv(table)
    for i, img_id in enumerate(train['Image']):
        img = io.imread(folder + img_id[2:])
        X.append(img)
    X = np.array(X) / 255.
    X = X.astype(np.float32)
    X = X.reshape(-1, 1, 225, 225)
    return X
However, I am getting this error:
TypeError: ('Bad input argument to theano function with name "/Users/mas/PycharmProjects/Whale/nolearn_convnet/Zahraa5/lib/python2.7/site-packages/nolearn/lasagne/base.py:435" at index 1(0-based)', 'TensorType(int32, vector) cannot store a value of dtype float32 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to int32, or 2) set "allow_input_downcast=True" when calling "function".', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))

This is a cross-post to the theano-users mailing list.
Doug provided an answer there:
The theano variable you are using is defined as integer, but you
passed in a float, hence the error 'TensorType(int32, vector) cannot
store a value of dtype float32...'. You can either modify your data
loading code to cast it as int32, or change the symbolic variable to
something that supports float32.
So somewhere you have a line that looks something like:
x = T.ivector()
or
x = T.vector(dtype='int32')
It looks like you need to change this to something like
x = T.tensor4()
where the dtype has been changed to equal theano.config.floatX and the dimensionality has been changed to 4 to match the 4-dimensional nature of X.
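For reference, the change might look something like this (a minimal sketch; the variable name x is only illustrative):
import theano
import theano.tensor as T

# before: an int32 vector, which cannot hold the float32 image tensor
# x = T.ivector('x')

# after: a 4-D tensor whose dtype follows theano.config.floatX,
# matching the (batch, channel, height, width) shape of X
x = T.tensor4('x', dtype=theano.config.floatX)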

If you didn't figure it out, I had a similar error and here's how I fixed it:
Cast your y as int32. The x values can be floatX, but the y MUST be int32 in nolearn for classification.
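In other words, something along these lines before handing the data to nolearn (a sketch; variable names are only illustrative):
X = X.astype(np.float32)   # features can stay floatX
y = y.astype(np.int32)     # class labels must be int32 for nolearn classification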

Add x,y Values to numpy Matrix

So, what I have is a data file in the form of
1 , 1 , 2
2 , 5 , 8
3 , 9 , 10
...
...
In my case, every single triplet is in the form of: value , x-position , y-position.
What I want to achieve is to insert this data into a 2D matrix, which I have already created using the np.zeros function. However, I am stuck and can't figure out how to write a function which puts the given values at the right x and y positions in the matrix :/
My current matrix (named matrix) looks like:
array([[0,0,0,...,0]
[0,0,0,...,0]
[... ]
[0,0,0,...,0]])
and if I used matrix[1,1] = 2 (first line of data) I would get:
array([[0,0,0,...,0]
[0,2,0,...,0]
[... ]
[0,0,0,...,0]])
My goal is to insert all lines of data in this way.
You can make use of the np.genfromtxt function [numpy-doc], setting the delimiter=… parameter to a comma (','). So given a file data.txt, you can load it into a numpy array with:
>>> import numpy as np
>>> np.genfromtxt('data.txt', delimiter=',')
array([[ 1., 1., 2.],
[ 2., 5., 8.],
[ 3., 9., 10.]])
Or if you are only interested in the x/y values, you can use the usecols=… parameter:
>>> np.genfromtxt('data.txt', delimiter=',', usecols=(1,2))
array([[ 1., 2.],
[ 5., 8.],
[ 9., 10.]])
You can load the data using genfromtxt():
import numpy as np
tmp = np.genfromtxt('data.txt', delimiter=',', dtype=int)
and then generate an empty data matrix a from the first two columns of tmp
a = np.zeros(np.max(tmp[:, :2], axis=0) + 1)
and populate it with values from tmp
a[tmp[:, 0], tmp[:, 1]] = tmp[:, 2]
a
# array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
# [ 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0., 8., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 10.]])
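Note that this answer takes the first two columns as the row/column indices and the last column as the value, which matches the matrix[1,1] = 2 example above. If the file really is ordered value, x-position, y-position as stated, the same idea works with the columns swapped (a small sketch under that assumption):
vals, xs, ys = tmp[:, 0], tmp[:, 1], tmp[:, 2]
a = np.zeros(np.max(tmp[:, 1:], axis=0) + 1)
a[xs, ys] = vals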

Keras: result of model.evaluate() stays high with all the weights and biases being 0

I created a VGG16 model using Keras application (TensorFlow backend). Then I wanted to change part of those weights and then test the accuracy of this modified model. To be direct and intuitive, I changed ALL the weights and biases in ALL layers to 0 like this:
model = VGG16(weights='imagenet', include_top=True)

# here is the test data and label containing 10 pictures I created.
data = np.load('./10_random_samples_array.npz')
data, label = data["X"], data["Y"]

# Modify the weights to zero
for z in [1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 15, 16, 17]:  # Conv layers
    weight_bias = model.layers[z].get_weights()
    shape_weight = np.shape(weight_bias[0])
    shape_bias = np.shape(weight_bias[1])
    weight_bias[0] = np.zeros(shape=(shape_weight[0], shape_weight[1], shape_weight[2], shape_weight[3]))
    weight_bias[1] = np.zeros(shape=(shape_bias[0],))
    model.layers[z].set_weights(weight_bias)

for z in [20, 21, 22]:  # FC layers
    weight_bias = model.layers[z].get_weights()
    shape_weight = np.shape(weight_bias[0])
    print(z, shape_weight)
    shape_bias = np.shape(weight_bias[1])
    weight_bias[0] = np.zeros(shape=(shape_weight[0], shape_weight[1]))
    weight_bias[1] = np.zeros(shape=(shape_bias[0],))
    model.layers[z].set_weights(weight_bias)

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

# To check if the weights have been modified.
print(model.layers[1].get_weights())

loss, acc = model.evaluate(data, label, verbose=1)
print(acc)
Then I got a result like this:
[array([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
...(All zero, I omit them)
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]], dtype=float32),
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
10/10 [==============================] - 2s 196ms/step
0.9989999532699585
Firstly, you can tell that all the weights and biases have already been changed to 0, but the accuracy still stays very high. That is unreasonable. (The original result returned by model.evaluate() is 0.9993000030517578.)
Secondly, I used only 10 pictures as my test dataset, so the accuracy should only be able to take values like 0.0, 0.1, ..., 1.0. But I got 0.9989999532699585.
I also tried modifying only the weights in Conv1-1 to zero, and the result is also 0.9989999532699585. It seems to be the minimum result. Is there something wrong with my model? Or can the weights not be modified in this way? Or does model.evaluate() not work the way I assume?

NumPy matrix to SciPy sparse matrix: What is the safest way to add a scalar?

First off, I'm no mathematician. I admit that. Yet I still need to understand how SciPy's sparse matrices work arithmetically in order to switch from a dense NumPy matrix to a SciPy sparse matrix in an application I have to work on. The issue is memory usage. A large dense matrix will consume tons of memory.
The formula portion at issue is where a matrix is added to a scalar.
A = V + x
where V is a square matrix (it's large, say 60,000 x 60,000) and sparsely populated, and x is a float.
The operation with NumPy will (if I'm not mistaken) add x to every entry in V. Please let me know if I'm completely off base and x would only be added to the non-zero values in V.
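For what it's worth, a quick check with a small dense array confirms that broadcasting behaviour (zeros included):
>>> import numpy as np
>>> V = np.zeros((3, 3))
>>> V[0, 1] = 5.0
>>> V + 2.0          # the scalar is added to every entry, not just the non-zeros
array([[2., 7., 2.],
       [2., 2., 2.],
       [2., 2., 2.]])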
With SciPy, not all sparse matrix types support the same features, such as scalar addition. dok_matrix (Dictionary of Keys) supports scalar addition, but in practice it appears to allocate every matrix entry, effectively turning my sparse dok_matrix into a dense matrix with extra overhead (not good).
The other matrix types (CSR, CSC, LIL) don't support scalar addition.
I could try constructing a full matrix with the scalar value x, then adding that to V. I would have no problems with matrix types, as they all seem to support matrix addition. However, I would have to eat up a lot of memory to construct x as a matrix, and the result of the addition could end up being a fully populated matrix as well.
There must be an alternative way to do this that doesn't require allocating 100% of a sparse matrix.
I'm willing to accept that large amounts of memory are needed, but I thought I would seek some advice first. Thanks.
Admittedly sparse matrices aren't really in my wheelhouse, but ISTM the best way forward depends on the matrix type. If you're using DOK:
>>> import numpy as np
>>> from scipy.sparse import dok_matrix
>>> S = dok_matrix((5,5))
>>> S[2,3] = 10; S[4,1] = 20
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 10., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 20., 0., 0., 0.]])
Then you could update:
>>> S.update(zip(S.keys(), np.array(S.values()) + 99))
>>> S
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Dictionary Of Keys format>
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 119., 0., 0., 0.]])
Not particularly performant, but it is O(nonzero).
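If that update call doesn't work on your setup (on Python 3, keys() and values() return views rather than lists, so the np.array trick above fails), one unexciting but version-proof alternative is a plain loop over the stored keys:
>>> for k in list(S.keys()):
...     S[k] += 99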
OTOH, if you have something like COO, CSC, or CSR, you can modify the data attribute directly:
>>> C = S.tocoo()
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.data
array([ 119., 109.])
>>> C.data += 1000
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 1119., 0., 0., 0.]])
Note that you're probably going to want to add an additional
>>> C.eliminate_zeros()
to handle the possibility that you've added a negative number and so there's now a 0 which is actually being recorded. By itself, that should work fine, but the next time you did the C.data += some_number trick, it would add some_number to that zero you introduced.
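Putting those pieces together, here is a small sketch of a helper along these lines (the name add_to_nonzero is mine, not part of SciPy):
def add_to_nonzero(mat, x):
    """Add the scalar x to the stored (non-zero) entries of a SciPy sparse matrix."""
    out = mat.tocsr(copy=True)   # COO/CSC/CSR all expose a flat .data array
    out.data += x
    out.eliminate_zeros()        # drop entries the addition cancelled to exactly 0
    return out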

Filling multiple diagonal elements of a numpy 2D array

What is the best way to fill multiple diagonal elements (but not all of them) of a 2-dimensional numpy array?
I know numpy.fill_diagonal is the recommended way to fill all the diagonal elements.
Currently I am just using a loop:
for i in a_list_of_indices: a_2d_array[i,i] = num
If the array is large and the number of diagonal elements to be filled is also large, is there a better way than the above?
You can use this without looping:
a_2d_array[a_list_of_indices,a_list_of_indices] = num
Example:
a_2d_array = np.zeros((5,5))
a_list_of_indices = [2, 3]
a_2d_array[a_list_of_indices, a_list_of_indices] = 1
returns:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0.]])
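The same fancy-indexing pattern also takes an array on the right-hand side if you want a different value at each selected diagonal position (a small sketch):
idx = [0, 2, 4]
vals = [7., 8., 9.]
a_2d_array = np.zeros((5, 5))
a_2d_array[idx, idx] = vals   # places 7., 8. and 9. at (0,0), (2,2) and (4,4)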

How to create a diagonal multi-dimensional (i.e. greater than 2) array in numpy

Is there a higher (than two) dimensional equivalent of diag?
L = [...] # some arbitrary list.
A = ndarray.diag(L)
will create a diagonal 2-d matrix shape=(len(L), len(L)) with elements of L on the diagonal.
I'd like to do the equivalent of:
length = len(L)
A = np.zeros((length, length, length))
for i in range(length):
    A[i][i][i] = L[i]
Is there a slick way to do this?
Thanks!
You can use diag_indices to get the indices to be set. For example,
x = np.zeros((3,3,3))
L = np.arange(6,9)
x[np.diag_indices(3,ndim=3)] = L
gives
array([[[ 6., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]],
[[ 0., 0., 0.],
[ 0., 7., 0.],
[ 0., 0., 0.]],
[[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 8.]]])
Under the hood diag_indices is just the code Jaime posted, so which to use depends on whether you want it spelled out in a numpy function, or DIY.
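For reference, diag_indices simply returns one index array per axis, which is why it can be used directly as an index:
>>> np.diag_indices(3, ndim=3)
(array([0, 1, 2]), array([0, 1, 2]), array([0, 1, 2]))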
You can use fancy indexing:
In [2]: a = np.zeros((3,3,3))
In [3]: idx = np.arange(3)
In [4]: a[[idx]*3] = 1
In [5]: a
Out[5]:
array([[[ 1., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]],
[[ 0., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 0.]],
[[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 1.]]])
For a more general approach, you could set the diagonal of an arbitrarily sized array doing something like:
def set_diag(arr, values):
    idx = np.arange(np.min(arr.shape))
    # recent NumPy versions require a tuple (not a list) of index arrays here
    arr[tuple([idx] * arr.ndim)] = values
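A quick check of that helper (the same tuple-index note applies to the In [4] example above on recent NumPy versions):
a = np.zeros((4, 4, 4))
set_diag(a, np.arange(1, 5))
# a now holds 1, 2, 3, 4 at positions (0,0,0), (1,1,1), (2,2,2), (3,3,3)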