I want to create a network that has specific fixed connections between layers.
For example, a sparsely connected neural network.
I tried looking through the functions in TensorFlow, but I only found dense layers with regularizers, which don't do what I want.
If it's not possible in TensorFlow, please suggest some other library that could be used. Thanks!
You can always find a workaround. Say a layer computes y = xW (Wx works the same way), but you want some of the entries of W to always be zero. You can do it column-wise:
For column i of the output (or element i, since y is a vector), y_i = x * D_i * W_i. The matrix D_i is a constant diagonal matrix (tf.constant, tf.diag) whose zero entries control which connections are removed.
Then you can use tf.concat to combine all the y_i into the output Y.
You can abstract this into a function whose signature may look like def sparse_layer(input_layer, gates_matrix, activation_f, ...) which returns the output layer.
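For reference, a minimal sketch of what such a helper could look like (all names are illustrative; the elementwise 0/1 gate mask used here is equivalent to the column-wise D_i construction, just written as a single product):

import tensorflow as tf

def sparse_layer(input_layer, gates_matrix, activation_f=tf.nn.relu):
    # gates_matrix: fixed 0/1 array of shape (in_dim, out_dim); a 0 entry
    # permanently disables that connection (same effect as the D_i matrices above)
    in_dim, out_dim = gates_matrix.shape
    W = tf.Variable(tf.random.normal([in_dim, out_dim], stddev=0.1), name="W")
    b = tf.Variable(tf.zeros([out_dim]), name="b")
    gates = tf.constant(gates_matrix, dtype=W.dtype)   # constant, not trainable
    y = tf.matmul(input_layer, gates * W) + b          # masked weights stay at zero
    return activation_f(y)

Because the gates are a constant factor in front of W, the masked weights also receive zero gradient, so the disabled connections stay disabled during training.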
Since I am not very experienced, I am struggling with a Siamese twin network.
I have 2 images which run through the same CNN and each generate a distinct feature vector. I would like to train a further network that interprets these two image vectors (each with 32 elements). As an intermediate step, I would like to use these vectors as input for a function NCC, which sits as a layer between the CNN and the NN and is defined in the following snippet (i.e. its output should be used by the next NN):
import tensorflow as tf
from tensorflow.keras.layers import Flatten

def NCC(a, b):
    l = a.shape[1]
    # center both vectors
    av_a = tf.math.reduce_mean(a)
    av_b = tf.math.reduce_mean(b)
    a = a - av_a
    b = b - av_b
    # normalize both vectors
    norm_a = tf.math.sqrt(tf.math.reduce_sum(a * a))
    norm_b = tf.math.sqrt(tf.math.reduce_sum(b * b))
    a = a / norm_a
    b = b / norm_b
    # outer product of the two vectors, flattened
    A = tf.reshape(tf.repeat(a, axis=0, repeats=l), (l, l))
    B = tf.reshape(tf.repeat(b, axis=0, repeats=l), (l, l))
    ncc = Flatten()(A * tf.transpose(B))
    return ncc
The output vector (for batch size = 1) should have 32x32 = 1024 elements. It seems to work for a batch size of 1. If I increase the batch size, I run into trouble because the input vectors are now tensors with shape = (batch_size, 32). I think this is a very stupid question, but how can I circumvent this issue? (Note that I also want the output tensor to have shape = (batch_size, 1024).)
Thanks in advance
Mike
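One possible direction (only a sketch, not necessarily the intended solution, assuming both inputs have shape (batch_size, 32)): do every reduction per sample and build the outer product with broadcasting instead of tf.repeat, then wrap the function in a Lambda layer between the CNN and the NN.

import tensorflow as tf

def ncc_batched(a, b):
    # a, b: shape (batch_size, n); every reduction is done per sample
    n = a.shape[-1]
    a = a - tf.reduce_mean(a, axis=-1, keepdims=True)   # center per sample
    b = b - tf.reduce_mean(b, axis=-1, keepdims=True)
    a = a / tf.norm(a, axis=-1, keepdims=True)          # normalize per sample
    b = b / tf.norm(b, axis=-1, keepdims=True)
    # outer product per sample: outer[k, i, j] = b[k, i] * a[k, j],
    # matching A * tf.transpose(B) in the batch-size-1 version above
    outer = tf.expand_dims(b, -1) * tf.expand_dims(a, 1)
    return tf.reshape(outer, (-1, n * n))               # (batch_size, n*n)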
I want to do max pooling in my network. My network is 1D; think of the input as one sentence with 6 words, where every word has a 3-dimensional embedding. I don't know the length of the features in advance (not every sentence has the same length), so I can't set the pool_size in tf.layers.MaxPooling1D (https://www.tensorflow.org/api_docs/python/tf/layers/MaxPooling1D).
I just want to pool over every half of the features (or half of the sentence). Is there any function or method to do that?
(Note: My previous answer had an error that would have resulted in incorrect pooling windows. This one should be fine.)
Here is one possible way, written in "low level" TensorFlow. You might need to wrap this in a Keras layer (or just use Lambda) to integrate it into your model.
x = ... # input, shape batch x n_words x features
x = tf.reshape(x, [batch, 2, n_words//2, features]) # need to get these dimensions, can get them from tf.shape(x) as well
x = tf.reduce_max(x, axis=2)
This would implement max pooling; you could also use reduce_mean for average pooling, for example.
This has one limitation, namely that it's not going to work if n_words is odd. In that case, you might have to check whether it is odd and use tf.pad to add one element along the word axis to make it even, as in the sketch below.
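For example (only a sketch; the padding choice and names are illustrative), the reshape-and-reduce idea could be wrapped in a Lambda layer, with tf.pad handling an odd number of words:

import tensorflow as tf
from tensorflow.keras.layers import Lambda

def pool_halves(x):
    # x: (batch, n_words, features); max-pools each half of the word axis
    n_words = tf.shape(x)[1]
    # if n_words is odd, pad one extra word at the end to make it even
    # (zero padding; for strictly negative features you might pad with a
    # large negative constant instead)
    pad = n_words % 2
    x = tf.pad(x, [[0, 0], [0, pad], [0, 0]])
    shape = tf.shape(x)
    x = tf.reshape(x, [shape[0], 2, shape[1] // 2, shape[2]])
    return tf.reduce_max(x, axis=2)   # (batch, 2, features)

pooled = Lambda(pool_halves)(prev)    # prev: output tensor of the previous layer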
I'm very new to Keras and to neural networks in general, and I was wondering: if I have a list of points (x, y) that came from a quadratic function of the form ax^2 + bx + c, is it possible to feed the points into a neural network and get the coefficients a, b and c as an output from the network?
I know that I can simply use polynomial regression to achieve my goal; that is not the point.
If you are asking how to do polynomial regression using neural networks, here's the recipe.
Your dataset consists of points (x, y). Design your network as a fully connected (dense) network with 1 input layer and 1 output layer. The input layer consists of 2 nodes, the output layer of 1 node. Then, feed your network the inputs x and x^2. The output will be computed as:
y = w * X + c
where w is a matrix of learnable parameters. Specifically, it has shape 1x2 since it contains parameters a and b. c is a bias. The input matrix X has shape 2xN, where N is the number of points in your dataset and for each point, the first component is x^2 and the second component is x.
As loss function, use the standard Mean Squared Error loss. As for the optimizer, a simple Stochastic Gradient Descent should work just fine. At convergence, w and c will be good enough to approximate the true quadratic function.
I don't know Keras, but I don't think it will be tough to figure out for yourself how to implement this simple network.
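A minimal Keras sketch of this recipe might look as follows (the synthetic data and hyperparameters are just for illustration):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# synthetic data from y = a*x^2 + b*x + c (a, b, c chosen arbitrarily)
a_true, b_true, c_true = 2.0, -3.0, 1.0
x = np.random.uniform(-2, 2, size=(1000, 1))
X = np.hstack([x**2, x])                   # inputs: first component x^2, second x
y = a_true * x**2 + b_true * x + c_true    # targets

model = Sequential([Dense(1, input_shape=(2,))])   # kernel has shape (2, 1), plus the bias c
model.compile(optimizer=SGD(learning_rate=0.01), loss="mse")
model.fit(X, y, epochs=200, verbose=0)

w, c = model.layers[0].get_weights()
print("a ~", w[0, 0], " b ~", w[1, 0], " c ~", c[0])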
Say I have a 2000x100 matrix. I put it into a 10-dimensional embedding layer, which gives me a 2000x100x10 tensor, so 2000 examples where each example is a 100x10 matrix. Then I pass it to a Conv1D and K-max pooling to get a 2000x24 matrix, i.e. 2000 examples where each example is a 24-dimensional vector. Now I would like to recombine those examples before I apply another layer: I would like to group the first 10 examples together, and so on, so that I get a tuple, and then pass that tuple to the next layer.
My question is: can I do that with Keras? Any idea on how to do it?
The idea of using "samples" is that these samples should be unique and not relate to each other.
This is something Keras will demand from your model: if it started with 2000 samples, it must end with 2000 samples. Ideally, these samples do not talk to each other; you can use custom layers to hack this, but only in the middle. You will still need to end with 2000 samples.
I believe you're going to end your model with 200 groups, so maybe you should already start with shape (200,10,100) and use TimeDistributed wrappers:
inputs = Input((10,100)) #shape (200,10,100)
out = TimeDistributed(Embedding(....))(inputs) #shape (200,10,100,10)
out = TimeDistributed(Conv1D(...))(out) #shape (200,10,len,filters)
#here, you use your layer that will work on the groups without TimeDistributed.
To reshape a tensor without changing the batch size, use the Reshape(newShape) layer, where newShape does not include the first dimension (batch size).
To reshape a tensor including the batch size, use a Lambda(lambda x: K.reshape(x,newShape)) layer, where newShape includes the first dimension (batch size) - Here you must remember the warning above: somewhere you will need to undo this change so you end up with the same batch size as the input.
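To make that concrete, here is a small sketch of both variants for the shapes assumed above (prev stands for the output tensor of the previous layer):

from tensorflow.keras.layers import Reshape, Lambda
from tensorflow.keras import backend as K

# Without touching the batch dimension: (2000, 24) -> (2000, 4, 6)
out = Reshape((4, 6))(prev)

# Including the batch dimension: group every 10 samples,
# (2000, 24) -> (200, 10, 24); remember to undo this later so the model
# still ends with 2000 samples
out = Lambda(lambda x: K.reshape(x, (-1, 10, 24)))(prev)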
I have a model that outputs a Softmax, and I would like to develop a custom loss function. The desired behaviour would be:
1) Softmax to one-hot (normally I do numpy.argmax(softmax_vector) and set that index to 1 in a null vector, but this is not allowed in a loss function).
2) Multiply the resulting one-hot vector by my embedding matrix to get an embedding vector (in my context: the word-vector that is associated to a given word, where words have been tokenized and assigned to indices, or classes for the Softmax output).
3) Compare this vector with the target (this could be a normal Keras loss function).
I know how to write a custom loss function in general, but not how to do this. I found this closely related question (unanswered), but my case is a bit different, since I would like to preserve my softmax output.
It is possible to mix TensorFlow and Keras in your custom loss function. Once you have access to all the TensorFlow functions, things become very easy. Here is an example of how such a function could be implemented.
import tensorflow as tf

def custom_loss(target, softmax):
    max_indices = tf.argmax(softmax, -1)
    # Get the embedding vectors. In TensorFlow, this can be done directly
    # with tf.nn.embedding_lookup
    embedding_vectors = tf.nn.embedding_lookup(your_embedding_matrix, max_indices)
    # Do anything you want with a normal Keras loss function
    loss = some_keras_loss_function(target, embedding_vectors)
    loss = tf.reduce_mean(loss)
    return loss
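Assuming your_embedding_matrix and some_keras_loss_function are defined in the enclosing scope (both are placeholders here), the function can then be passed to Keras like any other loss, for example:

model.compile(optimizer='adam', loss=custom_loss)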
Fan Luo's answer points in the right direction, but ultimately will not work, because it involves non-differentiable operations. Note that such operations are acceptable for the true (target) value: a loss function takes a true value and a predicted value, and non-differentiable operations are only fine for the true value.
To be fair, that was what I was asking in the first place. It is not possible to do exactly what I wanted, but we can get a similar and differentiable behaviour:
1) Element-wise power of the softmax values. This makes smaller values much smaller. For example, with a power of 4 [0.5, 0.2, 0.7] becomes [0.0625, 0.0016, 0.2400]. Note that 0.2 is comparable to 0.7, but 0.0016 is negligible with respect to 0.24. The higher my_power is, the more similar to a one-hot the final result will be.
soft_extreme = Lambda(lambda x: x ** my_power)(softmax)
2) Importantly, both softmax and one-hot vectors are normalized, but not our "soft_extreme". First, find the sum of the array:
norm = tf.reduce_sum(soft_extreme, 1)
3) Normalize soft_extreme:
almost_one_hot = Lambda(lambda x: x / norm)(soft_extreme)
Note: Setting my_power too high in 1) will result in NaNs. If you need a better softmax to one-hot conversion, then you may do steps 1 to 3 two or more times in a row.
4) Finally, we want the vector from the dictionary. Lookup is forbidden, but we can take a weighted average vector using matrix multiplication. Because our almost_one_hot is close to a one-hot encoding, this average will be close to the vector associated with the highest argument (the originally intended behaviour). The higher my_power is in (1), the truer this will be:
target_vectors = tf.tensordot(almost_one_hot, embedding_matrix, axes=[[1], [0]])
Note: This will not work directly with batches! In my case, I reshaped my "one hot" from [batch, dictionary_length] to [batch, 1, dictionary_length] using tf.reshape, then tiled my embedding_matrix batch times and finally used:
predicted_vectors = tf.matmul(reshaped_one_hot, tiled_embedding)
There may be more elegant solutions (or less memory-hungry, if tiling the embedding matrix is not an option), so feel free to explore more.
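For completeness, here is a sketch of steps 1 to 4 combined into a single loss, using the reshape-and-tile approach from the note above (embedding_matrix, my_power and the final mean-squared-error comparison are placeholders; to use it with Keras you would typically close over embedding_matrix with a small wrapper function):

import tensorflow as tf

def one_hot_embedding_loss(target_vectors, softmax, embedding_matrix, my_power=4):
    # 1) sharpen the softmax towards a one-hot
    soft_extreme = softmax ** my_power
    # 2-3) renormalize per sample
    almost_one_hot = soft_extreme / tf.reduce_sum(soft_extreme, axis=1, keepdims=True)
    # 4) "soft lookup": reshape to (batch, 1, dict_len), tile the embedding
    #    matrix batch times and matmul, as described above
    batch = tf.shape(softmax)[0]
    one_hot_3d = tf.expand_dims(almost_one_hot, 1)                     # (batch, 1, dict_len)
    tiled_embedding = tf.tile(tf.expand_dims(embedding_matrix, 0),
                              [batch, 1, 1])                           # (batch, dict_len, emb_dim)
    predicted = tf.squeeze(tf.matmul(one_hot_3d, tiled_embedding), 1)  # (batch, emb_dim)
    # compare with the target word vectors (MSE used here as an example)
    return tf.reduce_mean(tf.square(target_vectors - predicted))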