I have a convolutional autoencoder model. While an autoencoder typically focuses on reconstructing the input without using any label information, I want to use the class label to perform class conditional scaling/shifting after convolutions. I am curious if utilizing the label in this way might help produce better reconstructions.
num_filters = 32
input_img = layers.Input(shape=(28, 28, 1)) # input image
label = layers.Input(shape=(10,)) # label
# separate scale value for each of the filter dimensions
scale = layers.Dense(num_filters, activation=None)(label)
# conv_0 produces something of shape (None,14,14,32)
conv_0 = layers.Conv2D(num_filters, (3, 3), strides=2, activation=None, padding='same')(input_img)
# TODO: Need help here. Multiply conv_0 by scale along each of the filter dimensions.
# This still outputs something of shape (None,14,14,32)
# Essentially each 14x14x1 has it's own scalar multiplier
In the example above, the output of the convolutional layer is (14,14,32) and the scale layer is of shape (32,). I want the convolutional output to be multiplied by the corresponding scale value along each filter dimension. For example, if these were numpy arrays I could do something like conv_0[:, :, i] * scale[i] for i in range(32).
I looked at tf.keras.layers.Multiply which can be found here, but based on the documentation I believe that takes in tensors of the same size as input. How do I work around this?
You don't have to loop. Simply do the following by making two tensors broadcast-compatible,
out = layers.Multiply()([conv_0, tf.expand_dims(tf.expand_dims(scale,axis=1), axis=1)])
I dont know if i actually understood what you are trying to achieve but i did a quick numpy test. I believe it should hold in tensorflow also:
conv_0 = np.ones([14, 14, 32])
scale = np.array([ i + 1 for i in range(32)])
result = conv_0 * scale
check whether channel-wise slices actually scaled element-wise in this case by the element found at index 1 in scale, which is 2
conv_0_slice_1 = conv_0[:, :, 1]
result_slice_1 = result[:, :, 1]
Related
Trying to build a Multilayer perceptron using only NumPy with iris flower dataset but I'm stuck with this:
Input layer of 4 nodes(iris dataset is shaped (112,4)).
I want that my hidden layer consists of 3 nodes, so in theory, the correct shape would be (112,3)?
I know that each input has its own weights example:
input[0] has weight[0] etc..
The question is what shape should my random init weights have to be able to perform the dot product correct in order to get the right hidden layer output?
The number of parameters of your function should not depend on the number of entries in your dataset. From your comment it, you are having 112 feature of size 4 and you want to project those with a linear functia on to feature size of 3. A fully connected layer comes down to matrix multiplication with optionally an additive bias. So in your case you only need to use a 4x3 matrix.
>>> x = np.random.rand(112,4)
>>> m = np.random.rand(4,3)
Inference:
>>> out = x#m # __maltmul__
>>> out.shape
(112, 3)
You could also add a bias:
>>> b = np.random.rand(1,3)
>>> out = x#m + b
I am trying to understand Bahdanaus attention using the following tutorial:
https://www.tensorflow.org/tutorials/text/nmt_with_attention
The calculation is the following:
self.attention_units = attention_units
self.W1 = Dense(self.attention_units)
self.W2 = Dense(self.attention_units)
self.V = Dense(1)
score = self.V(tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)))
I have two problems:
I cannot understand why the shape of tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) is (batch_size,max_len,attention_units) ?
Using the rules of matrix multiplication I got the following results:
a) Shape of self.W1(last_inp_dec) -> (1,hidden_units_dec) * (hidden_units_dec,attention_units) = (1,attention_units)
b) Shape of self.W2(last_inp_enc) -> (max_len,hidden_units_dec) * (hidden_units_dec,attention_units) = (max_len,attention_units)
Then we add up a) and b) quantities. How do we end up with dimensionality (max_len, attention_units) or (batch_size, max_len, attention_units)? How can we do addition with different size of second dimension (1 vs max_len)?
Why do we multiply tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) by self.V? Because we want alphas as scalar?
) I cannot understand why the shape of tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) is
(batch_size,max_len,attention_units) ?
From the comments section of the code in class BahdanauAttention
query_with_time_axis shape = (batch_size, 1, hidden size)
Note that the dimension 1 was added using tf.expand_dims to make the shape compatible with values for the addition. The added dimension of 1 gets broadcast during the addition operation. Otherwise, the incoming shape was (batch_size, hidden size), which would not have been compatible
values shape = (batch_size, max_len, hidden size)
Addition of the query_with_time_axis shape and values shape gives us a shape of (batch_size, max_len, hidden size)
) Why do we multiply tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) by self.V? Because we want alphas as scalar?
self.V is the final layer, the output of which gives us the score. The random weight initialization of the self.V layer is handled by keras behind the scene in the line self.V = tf.keras.layers.Dense(1).
We are not multiplying tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) by self.V.
The construct self.V(tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) means --> the tanh activations resulting from the operation tf.nn.tanh(self.W1(last_inp_dec) + self.W2(input_enc)) form the input matrix to the single output output layer represented by self.V.
The shapes are slightly different from the ones you have given. It is best understood with a direct example perhaps?
Assuming 10 units in the alignment layer and 128 embedding dimensions on the decoder and 256 dimensions on the encoder and 19 timesteps, then:
last_inp_dec and input_enc shapes would be (?,128) and (?,19,256). We need to now expand last_inp_dec over the time axis to make it (?,1,128) so that addition is possible.
The layer weights for w1,w2,v will be (?,128,10), (?,256,10) and (?,10,1) respectively. Notice how self.w1(last_inp_dec) works out to (?,1,10). This is added to each of the self.w2(input_enc) to give a shape of (?,19,10). The result is fed to self.v and the output is (?,19,1) which is the shape we want - a set of 19 weights. Softmaxing this gives the attention weights.
Multiplying this attention weight with each encoder hidden state and summing up returns the context.
To your question as to why 'v' is needed, it is needed because Bahdanau provides the option of using 'n' units in the alignment layer (to determine w1,w2) and we need one more layer on top to massage the tensor back to the shape we want - a set of attention weights..one for each time step.
I just posted an answer at Understanding Bahdanau's Attention Linear Algebra
with all the shapes the tensors and weights involved.
I am trying to implement the circuits listed on page 8 in the following paper: https://arxiv.org/pdf/1905.10876.pdf using Tensorflow Quantum (TFQ). I have done so previously for a subset of circuits using Qiskit, and ended up with accuracies that can be found on page 14 in the following paper: https://arxiv.org/pdf/2003.09887.pdf. In TFQ, my accuracies are way down. I think this delta originates because in TFQ, I only used 1 observable Pauli Z operator on the first qubit, and the circuits do not seem to "transfer all knowledge" to the first qubit. I place this in quotes, because I am sure there is a better way to describe this. In Qiskit on the other hand, 16 states (4^2) get mapped to 2 states.
My question: how can I get my accuracies back up?
Potential answer a): some method of "transferring all information" to a single qubit, potentially an ancilla qubit, and doing a readout on this qubit.
Potential answer b) placing a Pauli Z observable on all qubits (4 in total), mapping half of the 16 states to a label 0 and the other half to a label 1. I attempted this in the code below.
My attempt at answer b):
I have a Tensorflow Quantum (TFQ) circuit implemented in Tensorflow. The circuit has multiple observables, which I try to bring together in my loss function. I prefer to use as many standard components as possible, but need to map my quantum states to a label in order to determine the loss. I think what I am trying to achieve is not unique to TFQ. I define my model in the following way:
def circuit():
data_qubits = cirq.GridQubit.rect(4, 1)
circuit = cirq.Circuit()
...
return circuit, [cirq.Z(data_qubits[0]), cirq.Z(data_qubits[1]), cirq.Z(data_qubits[2]), cirq.Z(data_qubits[3])]
model_circuit, model_readout = circuit()
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(), dtype=tf.string),
# The PQC layer returns the expected value of the readout gate, range [-1,1].
tfq.layers.PQC(model_circuit, model_readout),
])
# compile model
model.compile(
loss = loss_mse,
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
metrics=[])
in loss_mse (Mean Square Error), I receive a (32, 4) tensor for y_pred. One row could look like
[-0.2, 0.33, 0.6, 0.3]
This would have to be first mapped from [-1,1] to a binarized version of [0,1], so that it looks like:
[0, 1, 1, 1]
Now, a table lookup needs to happen, which tells if this combination is 0 or 1. Finally, the regular (y_true-y_pred)^2 can be performed by that row, followed by a np.sum on all rows. I tried to implement this:
def get_label(measurement):
if measurement == [0,0,0,0]: return 0
...
elif measurement == [1,1,1,1]: return 0
else: return -1
def py_call(y_true, y_pred):
# cast tensor to numpy
y_pred_np = np.asarray(y_pred)
loss = np.zeros((len(y_pred))) # could be a single variable with += within the loop
# evalaute all 32 samples
for pred in range(len(y_pred_np)):
# map, binarize and lookup
y_labelled = get_label([0 if y<0 else 1 for y in y_pred_np[pred]])
# regular loss comparison
loss[pred] = (y_labelled - y_true[pred])**2
# reduce
loss = np.sum(loss)/len(y_true)
return loss
#tf.function
def loss_mse(y_true, y_pred):
external_list = []
loss = tf.py_function(py_call, inp=[y_true, y_pred], Tout=[tf.float64])
return loss
However, the system appears to still expect a (32,4) tensor. I would have thought I could simply provide a single loss values (float). My question: how can I map multiple values for y_true to a single number in order to compare with a single y_pred value in a tensorflow loss function?
So it looks like there are a couple of things going on here. To answer your question
how can I map multiple values for y_true to a single number in order to compare with a single y_pred value in a tensorflow loss function ?
What you might want is some kind of tf.reduce_* function like tf.reduce_mean or tf.reduce_sum. This function will allow you to apply this reduction operation accross a given tensor axis allowing you to convert a tensor of shape (32, 4) to a tensor of shape (32,) or a tensor of shape (4,). Here is a quick snippet:
#tf.function
def my_loss(y_true, y_pred):
# y_true is shape (32, 4)
# y_pred is shape (32, 4)
# Scale from [-1, 1] to [0, 1]
y_true += 1
y_true /= 2
y_pred += 1
y_pred /= 2
# These are now both (32,) with the reduction of taking the mean applied along
# the second axis.
reduced_true = tf.reduce_mean(y_true, axis=1)
reduced_pred = tf.reduce_mean(y_pred, axis=1)
# Now a scalar loss.
loss = tf.reduce_mean((reduce_true - reduced_pred) ** 2)
return loss
Now the above isn't exactly what you want, since it's not super clear to me at least what exact reduction rules you have in mind for taking something like [0,1,1,1] -> 0 vs [0,0,0,0] -> 1.
Another thing I will also mention is that if you want JUST the sum of these Pauli Operators in cirq that you have term by term in the list [cirq.Z(data_qubits[0]), cirq.Z(data_qubits[1]), cirq.Z(data_qubits[2]), cirq.Z(data_qubits[3])] and all you care about is the final sum of these expectations, you could just as easily do:
my_operator = sum([cirq.Z(data_qubits[0]), cirq.Z(data_qubits[1]),
cirq.Z(data_qubits[2]), cirq.Z(data_qubits[3])])
print(my_op)
Which should give something like:
cirq.PauliSum(cirq.LinearDict({frozenset({(cirq.GridQubit(0, 0), cirq.Z)}): (1+0j), frozenset({(cirq.GridQubit(0, 1), cirq.Z)}): (1+0j), frozenset({(cirq.GridQubit(0, 2), cirq.Z)}): (1+0j), frozenset({(cirq.GridQubit(0, 3), cirq.Z)}): (1+0j)}))
Which is also compatable as a readout operation in the PQC layer. Lastly if would recommend reading through some of the snippets and examples here:
https://www.tensorflow.org/quantum/api_docs/python/tfq/layers/PQC
and here:
https://www.tensorflow.org/quantum/api_docs/python/tfq/layers/Expectation
Which give a pretty good description of how the input and output signatures of the functions look as well as the shapes you can expect from them.
I have a 1D input signal. I want to compute autocorrelation as the part of the neural net for further use inside the network.
I need to perform convolution of input with input itself.
To perform convolution in keras custom layer/ tensorflow. We need the following parameters
data shape is "[batch, in_height, in_width, in_channels]",
filter shape is "[filter_height, filter_width, in_channels, out_channels]
There is no batch present in filter shape, which needs to be input in my case
TensorFlow now has an auto_correlation function. It should be in release 1.6. If you build from source you can use it right now (see e.g. the github code).
Here is a possible solution.
By self convolution, I understood a regular convolution where the filter is exactly the same as the input (if it's not that, sorry for my misunderstanding).
We need a custom function for that, and a Lambda layer.
At first I used padding = 'same' which brings outputs with the same length as the inputs. I'm not sure about what output length you want exactly, but if you want more, you should add padding yourself before doing the convolution. (In the example with length 7, for a complete convolution from one end to another, this manual padding would include 6 zeros before and 6 zeros after the input length, and use padding = 'valid'. Find the backend functions here)
Working example - Input (5,7,2)
from keras.models import Model
from keras.layers import *
import keras.backend as K
batch_size = 5
length = 7
channels = 2
channels_batch = batch_size*channels
def selfConv1D(x):
#this function unfortunately needs to know previously the shapes
#mainly because of the for loop, for other lines, there are workarounds
#but these workarounds are not necessary since we'll have this limitation anyway
#original x: (batch_size, length, channels)
#bring channels to the batch position:
x = K.permute_dimensions(x,[2,0,1]) #(channels, batch_size, length)
#suppose channels are just individual samples (since we don't mix channels)
x = K.reshape(x,(channels_batch,length,1))
#here, we get a copy of x reshaped to match filter shapes:
filters = K.permute_dimensions(x,[1,2,0]) #(length, 1, channels_batch)
#now, in the lack of a suitable available conv function, we make a loop
allChannels = []
for i in range (channels_batch):
f = filters[:,:,i:i+1]
allChannels.append(
K.conv1d(
x[i:i+1],
f,
padding='same',
data_format='channels_last'))
#although channels_last is my default config, I found this bug:
#https://github.com/fchollet/keras/issues/8183
#convolution output: (1, length, 1)
#concatenate all results as samples
x = K.concatenate(allChannels, axis=0) #(channels_batch,length,1)
#restore the original form (passing channels to the end)
x = K.reshape(x,(channels,batch_size,length))
return K.permute_dimensions(x,[1,2,0]) #(batch_size, length, channels)
#input data for the test:
x = np.array(range(70)).reshape((5,7,2))
#little model that just performs the convolution
inp= Input((7,2))
out = Lambda(selfConv1D)(inp)
model = Model(inp,out)
#checking results
p = model.predict(x)
for i in range(5):
print("x",x[i])
print("p",p[i])
You can just use tf.nn.conv3d by treating the "batch size" as "depth":
# treat the batch size as depth.
data = tf.reshape(input_data, [1, batch, in_height, in_width, in_channels])
kernel = [filter_depth, filter_height, filter_width, in_channels, out_channels]
out = tf.nn.conv3d(data, kernel, [1,1,1,1,1], padding='SAME')
I have the following situation:
I want to deploy a face detector model using Tensorflow Serving: https://www.tensorflow.org/serving/.
In Tensorflow Serving, there is a command line option called --enable_batching. This causes the model server to automatically batch the requests to maximize throughput. I want this to be enabled.
My model takes in a set of images (called images), which is a tensor of shape (batch_size, 640, 480, 3).
The model has two outputs: (number_of_faces, 4) and (number_of_faces,). The first output will be called faces. The last output, which we can call partitions is the index in the original batch for the corresponding face. For example, if I pass in a batch of 4 images and get 7 faces, then I might have this tensor as [0, 0, 1, 2, 2, 2, 3]. The first two faces correspond to the first image, the third face for the second image, the 3rd image has 3 faces, etc.
My issue is this:
In order for the --enable_batching flag to work, the output from my model needs to have the 0th dimension the same as the input. That is, I need a tensor with the following shape: (batch_size, ...). I suppose this is so that the model server can know which grpc connection to send each output in the batch towards.
What I want to do is to convert my output tensor from the face detector from this shape (number_of_faces, 4) to this shape (batch_size, None, 4). That is, an array of batches, where each batch can have a variable number of faces (e.g. one image in the batch may have no faces, and another might have 3).
What I tried:
tf.dynamic_partition. On the surface, this function looks perfect. However, I ran into difficulties after realizing that the num_partitions parameter cannot be a tensor, only an integer:
tensorflow_serving_output = tf.dynamic_partition(faces, partitions, batch_size)
If the tf.dynamic_partition function were to accept tensor values for num_partition, then it seems that my problem would be solved. However, I am back to square one since this is not the case.
Thank you all for your help! Let me know if anything is unclear
P.S. Here is a visual representation of the intended process:
I ended up finding a solution to this using TensorArray and tf.while_loop:
def batch_reconstructor(tensor, partitions, batch_size):
"""
Take a tensor of shape (batch_size, 4) and a 1-D partitions tensor as well as the scalar batch_size
And reconstruct a TensorArray that preserves the original batching
From the partitions, we can get the maximum amount of tensors within a batch. This will inform the padding we need to use.
Params:
- tensor: The tensor to convert to a batch
- partitions: A list of batch indices. The tensor at position i corresponds to batch # partitions[i]
"""
tfarr = tf.TensorArray(tf.int32, size=batch_size, infer_shape=False)
_, _, count = tf.unique_with_counts(partitions)
maximum_tensor_size = tf.cast(tf.reduce_max(count), tf.int32)
padding_tensor_index = tf.cast(tf.gather(tf.shape(tensor), 0), tf.int32)
padding_tensor = tf.expand_dims(tf.cast(tf.fill([4], -1), tf.float32), axis=0) # fill with [-1, -1, -1, -1]
tensor = tf.concat([tensor, padding_tensor], axis=0)
def cond(i, acc):
return tf.less(i, batch_size)
def body(i, acc):
partition_indices = tf.reshape(tf.cast(tf.where(tf.equal(partitions, i)), tf.int32), [-1])
partition_size = tf.gather(tf.shape(partition_indices), 0)
# concat the partition_indices with padding_size * padding_tensor_index
padding_size = tf.subtract(maximum_tensor_size, partition_size)
padding_indices = tf.reshape(tf.fill([padding_size], padding_tensor_index), [-1])
partition_indices = tf.concat([partition_indices, padding_indices], axis=0)
return (tf.add(i, 1), acc.write(i, tf.gather(tensor, partition_indices)))
_, reconstructed = tf.while_loop(
cond,
body,
(tf.constant(0), tfarr),
name='batch_reconstructor'
)
reconstructed = reconstructed.stack()
return reconstructed