Julia Neural Network Issues

I am trying to create a neural network using the Flux package that takes as input a 100x3 matrix of random points and outputs True or False. I have the corresponding labels in Y, which is a 100-element array of Boolean values (1 or 0).
This is my code so far.
# model
model = Chain(Dense(100, 32, relu), Dense(32, 1), sigmoid)
# loss function and the optimizer
loss(x, y) = Flux.binarycrossentropy(model(x), y)
opt = Flux.ADAM()
# Train
Flux.train!(loss, Flux.params(model), [(X, Y)], opt)
What I've noticed is that currently, if I call model(x), it outputs a 3-element array of probabilities, which is not what I want, since it should output at least a 2-element array with the probabilities for True and False. Also, if I change my model to
model = Chain(Dense(100, 32, relu), Dense(32, 100), sigmoid), it outputs a 100x3 matrix of probabilities, which is, again, not correct, as it should be 100x2, I believe.

In Flux.jl, the observation dimension is always the last one.
I think in your problem, each row of the matrix is an observation. Am I right?
If that's the case, I think this will solve your problem:
using Flux
X = rand(3, 100) # each column has one observation
bools = rand(Bool, 100)
Y = hcat([[b, !b] for b in bools]...) # each column has one label
dataset = [(X, Y)]
model = Chain(
    Dense(3 => 32, relu),
    Dense(32 => 2),
    sigmoid
)
opt = Flux.setup(Adam(), model)
loss(model, x, y) = Flux.binarycrossentropy(model(x), y)
Flux.train!(loss, model, dataset, opt)
Note: this code uses the newer Flux.jl training API (Flux.setup and train! with an explicit model argument).

Neural Network with inputs of different sizes

I would like to use a neural network in Keras that takes 2 inputs of different sizes (a vector v and a matrix A) and outputs a vector u, which is v after being acted upon by A.
I have managed to input the matrix and vector. The problem is, when I try to use the vector u as the target when fitting the model, it complains:
ValueError: Data cardinality is ambiguous:
x sizes: 70, 312
y sizes: 70
Make sure all arrays contain the same number of samples.
Zero padding (padding the smaller input up to a fixed size) would likely be the best decision in your situation. The padded entries are simply zeroed out to account for the absence of data; the same technique is frequently used at the borders of images in CNNs.
A simpler alternative is an RNN, which can easily accommodate your variable-length inputs.
Here is some example code; I think it can help you.
from tensorflow.keras.layers import (Input, Conv2D, Dropout,
                                     GlobalAveragePooling2D, Dense)
from tensorflow.keras.models import Model

# `channels` is assumed to be the number of input channels;
# the spatial dimensions are left as None so inputs of different sizes can be fed in
input_layer = Input(shape=(None, None, channels))
x = Conv2D(16, (4, 4), activation='relu')(input_layer)
x = Conv2D(32, (4, 4), activation='relu')(x)
x = Dropout(0.2)(x)
x = Conv2D(64, (4, 4), activation='relu')(x)
x = Dropout(0.5)(x)
x = Conv2D(128, (1, 1))(x)
# GlobalAveragePooling2D collapses the variable spatial dimensions to a fixed size
x = GlobalAveragePooling2D()(x)
output_layer = Dense(5, activation="softmax")(x)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

How do you use Keras preprocessing.Normalization layers with multi-Input models and a Dataset?

All the documentation for Keras pre-processing seems to assume a single Input. If you have a model with multiple Inputs:
x_norm = preprocessing.Normalization()
y_norm = preprocessing.Normalization()
x_input = layers.Input(shape=(1,))
x = x_norm(x_input)
y_input = layers.Input(shape=(1,))
y = y_norm(y_input)
concated = layers.Concatenate()([x, y])
output = layers.Dense(1)(concated)
# the Model's inputs must be the Input tensors, not the normalized outputs
model = keras.Model(inputs=[x_input, y_input], outputs=output)
It's unclear how to use adapt() on a Dataset to "train" each preprocessing layer (i.e. x_norm and y_norm). With a single Input and preprocessing layer (e.g. preprocessing_layer) you simply do:
preprocessing_layer.adapt(dataset)
But in the case of multiple inputs how do I select the right input feature to use in adapt()?
The best I've come up with so far is:
normalization_layers = {
    'x': preprocessing.Normalization(),
    'y': preprocessing.Normalization(),
}
for batch in dataset:
    for name, layer in normalization_layers.items():
        layer.adapt(batch[0][name])
I don't know if this is efficient, and TensorFlow gives a warning about a tf.function (inside adapt()) being called in a loop.
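One way to avoid the Python loop (a minimal sketch, assuming each element of dataset is a (features, label) tuple where features is a dict keyed by the input names, as in the loop above) is to select each feature with Dataset.map and call adapt() once per layer:
# select each feature stream and adapt the matching layer on it
x_norm.adapt(dataset.map(lambda features, label: features['x']))
y_norm.adapt(dataset.map(lambda features, label: features['y']))
adapt() accepts a tf.data.Dataset directly, so each layer sees every batch of its feature exactly once.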

How to use Keras Multiply() with tf.Variable?

How do I multiply tf.keras.layers with tf.Variable?
Context: I am creating a sample-dependent convolutional filter, which consists of a generic filter W that is transformed through sample-dependent shifting and scaling. Therefore, the original convolutional filter W is transformed into aW + b, where a is a sample-dependent scale and b is a sample-dependent shift. One application of this is training an autoencoder where the sample dependency is the label, so each label shifts/scales the convolutional filter. Because of the sample/label-dependent convolutions, I am using tf.nn.conv2d, which takes the actual filters as input (as opposed to just the number/size of filters), and a Lambda layer with tf.map_fn to apply a different "transformed filter" (based on the label) to each sample. Although the details are different, this kind of sample-dependent convolution approach is discussed in this post: Tensorflow: Convolutions with different filter for each sample in the mini-batch.
Here is what I am thinking:
input_img = keras.Input(shape=(28, 28, 1))
label = keras.Input(shape=(10,)) # number of classes
num_filters = 32
shift = layers.Dense(num_filters, activation=None, name='shift')(label) # (32,)
scale = layers.Dense(num_filters, activation=None, name='scale')(label) # (32,)
# filter is of shape (filter_h, filter_w, input channels, output filters)
filter = tf.Variable(tf.ones((3,3,input_img.shape[-1],num_filters)))
# TODO: need to shift and scale -> scale*filter + shift along each output filter dimension (32 filter dimensions)
I am not sure how to implement the TODO part. I was thinking of tf.keras.layers.Multiply() for scaling and tf.keras.layers.Add() for shifting, but they do not seem to work with tf.Variable, to my knowledge. How do I get around this? Assuming the dimensions/shape broadcasting works out, I would like to do something like this (note: the output should still be the same shape as var and just scaled along each of the 32 output filter dimensions):
output = tf.keras.layers.Multiply()([var, scale])
It requires some work and needs a custom layer; for example, you cannot use a tf.Variable with tf.keras.layers.Lambda.
class ConvNorm(layers.Layer):
    def __init__(self, height, width, n_filters):
        super(ConvNorm, self).__init__()
        self.height = height
        self.width = width
        self.n_filters = n_filters

    def build(self, input_shape):
        self.in_channels = input_shape[-1]
        self.filter = self.add_weight(
            shape=(self.height, self.width, self.in_channels, self.n_filters),
            initializer='glorot_uniform',
            trainable=True)
        # TODO: Add bias too

    def call(self, x, scale, shift):
        # (batch, n_filters) -> (batch, 1, 1, n_filters) so the tensors
        # broadcast against the conv output of shape (batch, h, w, n_filters)
        shift_reshaped = tf.expand_dims(tf.expand_dims(shift, 1), 1)
        scale_reshaped = tf.expand_dims(tf.expand_dims(scale, 1), 1)
        # conv2d cannot take a different filter per sample, but it is linear in
        # the filter: conv(x, scale*W + shift) == scale*conv(x, W) + shift*conv(x, ones)
        base_out = tf.nn.conv2d(x, self.filter, strides=(1, 1, 1, 1), padding='SAME')
        ones_filter = tf.ones((self.height, self.width, self.in_channels, 1))
        ones_out = tf.nn.conv2d(x, ones_filter, strides=(1, 1, 1, 1), padding='SAME')
        norm_conv_out = base_out * scale_reshaped + ones_out * shift_reshaped
        return norm_conv_out
Using the layer:
import tensorflow as tf
import tensorflow.keras.layers as layers
input_img = layers.Input(shape=(28, 28, 1))
label = layers.Input(shape=(10,)) # number of classes
num_filters = 32
shift = layers.Dense(num_filters, activation=None, name='shift')(label) # (32,)
scale = layers.Dense(num_filters, activation=None, name='scale')(label) # (32,)
conv_norm_out = ConvNorm(3,3,32)(input_img, scale, shift)
print(conv_norm_out.shape)
Note: I haven't added a bias. You will need a bias for the convolution layer as well, but that's straightforward.
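As a follow-up, one way to train this end to end is to wrap the inputs and the ConvNorm output in a functional Model (a sketch; the 'mse' loss is just a placeholder for whatever your autoencoder actually optimizes):
model = tf.keras.Model(inputs=[input_img, label], outputs=conv_norm_out)
model.compile(optimizer='adam', loss='mse')  # placeholder loss for illustration
model.summary()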

Custom TensorFlow loss function with batch size > 1?

I have a neural network with the following code snippets; note that batch_size == 1 and input_dim == output_dim:
net_in = tf.Variable(tf.zeros(shape = [batch_size, input_dim]), dtype=tf.float32)
input_placeholder = tf.compat.v1.placeholder(shape = [batch_size, input_dim], dtype=tf.float32)
assign_input = net_in.assign(input_placeholder)
# Some matmuls, activations, dropouts, normalizations...
net_out = tf.tanh(output_before_activation)
def loss_fn(output, input):
    # input.shape = output.shape = (batch_size, input_dim)
    output = tf.reshape(output, [input_dim,])  # shape them into 1d vectors
    input = tf.reshape(input, [input_dim,])
    return my_fn_that_only_takes_in_vectors(output, input)
# Create session, preprocess data ...
for epoch in range(epoch_num):
    for batch in range(total_example_num // batch_size):
        sess.run(assign_input, feed_dict={input_placeholder: some_appropriate_numpy_array})
        sess.run(optimizer.minimize(loss_fn(net_out, net_in)))
Currently the neural network above works fine, but it is very slow because it updates the gradients after every sample (batch size = 1). I would like to set batch size > 1, but my_fn_that_only_takes_in_vectors cannot accommodate matrices whose first dimension is not 1. Due to the nature of my custom loss, flattening the batch input into a vector of length (batch_size * input_dim) does not seem to work.
How would I write my new custom loss_fn now that the input and output are N x input_dim where N > 1? In Keras this would not have been an issue, because Keras somehow takes the average of the gradients of each example in the batch. For my TensorFlow function, should I take each row as a vector individually, pass them to my_fn_that_only_takes_in_vectors, and then take the average of the results?
You can use a function that computes the loss on the whole batch and works independently of the batch size. Basically, the operations are applied along the whole first dimension of the input (the first dimension represents the element's position in the batch). Here is an example; I hope this helps to see how the operations are carried out:
def my_loss(y_true, y_pred):
    dx2 = tf.math.squared_difference(y_true[:, 0], y_true[:, 2])  # shape: (BatchSize,)
    dy2 = tf.math.squared_difference(y_true[:, 1], y_true[:, 3])  # shape: (BatchSize,)
    denominator = dx2 + dy2  # shape: (BatchSize,)
    dst_vec = tf.math.squared_difference(y_true, y_pred)  # shape: (BatchSize, n_labels)
    numerator = tf.reduce_sum(dst_vec, axis=-1)  # shape: (BatchSize,)
    # loss_vector contains the loss of each element of the batch
    loss_vector = tf.cast(numerator / denominator, dtype="float32")  # shape: (BatchSize,)
    loss = tf.reduce_sum(loss_vector)  # if you want to sum the losses
    return loss
I am not sure whether you need to return the sum or the average of the losses for the batch.
If you sum, make sure to use a validation dataset with the same batch size, otherwise the losses are not comparable.
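If you would rather keep my_fn_that_only_takes_in_vectors unchanged, the row-by-row approach from the question can be sketched with tf.map_fn (assuming the function maps two 1-D tensors to a scalar float):
def batched_loss(output, input):
    # apply the vector-only function to each (row, row) pair in the batch
    per_example = tf.map_fn(
        lambda pair: my_fn_that_only_takes_in_vectors(pair[0], pair[1]),
        (output, input),
        dtype=tf.float32)
    return tf.reduce_mean(per_example)  # average over the batch
Note that tf.map_fn runs the function once per example, so the vectorized whole-batch formulation above will generally be much faster.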

Expected to see 1 array(s), but instead got the following list of 2 arrays:

I am attempting to make a multilayer model with the functional API from Keras. The idea is to take a vector that describes an image (it comes from a pretrained model) and concatenate it with another vector that comes from the text associated with that image. The first branch processes the input coming from the images; the second branch takes the text as input. Both are concatenated and then processed together with a few more layers.
When I train the model with:
model = Model(inputs=[xa.input, xb.input], outputs=z)
and this fit call:
model.fit([oraciones_img, X], y, epochs=2, batch_size=5, verbose=1)
everything works out fine.
The problem comes when I want to do a prediction. I provided two arrays with the same dimensions as oraciones_img and X to the model.predict function, but it throws this error:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 2 arrays: [array([[1.1081828 , 0.4828459 , 1.6334529 , ..., 0.03270344, 0.17258629,
0.03792314],
[0.8138524 , 0. , 0.65887845, ..., 0.6183274 , 0.00807555,
0.6506448 ],
[0.8...
I double-checked that the dimensions of the inputs during training and testing are the same, but that did not work out. I tried the training data on the predict function, assuming that at least that should work, but weirdly it throws the same error. In theory, I should be providing this:
model.predict([ (n_samples, 405504) , (n_samples, 50) ])
where:
(n_samples, 405504) are the vectors coming from n images
(n_samples, 50) are the vectors coming from n texts associated to the images
but somehow the model is asking me for a single array. I tried providing one to follow up, and then the predict function asks for a single array of dim 4, which is even more confusing.
I have seen other posts regarding similar issues with the shape of the data, but all of them have problems running the fit function. I am already past that point; this is about the predict function.
Any idea of what is going on?
Model definition:
from tensorflow.keras.layers import Input, Dense, Embedding, Flatten, Reshape, LSTM, concatenate
from tensorflow.keras.models import Model
seq_length = 50
seq_length_img = (200704 + 2048)*2
InputA = Input(shape=(seq_length_img,))
InputB = Input(shape=(seq_length,))
xa = Dense(512, activation='relu')(InputA)
xa = Model(inputs = InputA, outputs= xa)
xb = Embedding(input_dim=vocab_size, output_dim=512, input_length=seq_length)(InputB)
xb = Flatten()(xb)
xb = Dense(512, activation='relu')(xb)
xb = Model(inputs = InputB, outputs= xb)
combined = concatenate([xa.output, xb.output], axis=-1)
combined = Reshape((1024, 1))(combined)
z = LSTM(1024, input_shape=(1024, 1), return_sequences=True)(combined)
z = LSTM(1024)(z)
z = Dense(100, activation='relu')(z)
z = Dense(vocab_size, activation='softmax')(z)
model = Model(inputs=[xa.input, xb.input], outputs=z)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit([oraciones_img, X], y, epochs=2, batch_size=5, verbose=1)
yhat = model.predict([feat, pred_sentence])
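For what it's worth, a quick sanity check (a sketch; feat and pred_sentence are your arrays) is to compare the shapes the model expects with the shapes you actually pass, forcing both arrays to 2-D first:
import numpy as np

feat = np.asarray(feat).reshape(-1, seq_length_img)                # (n_samples, 405504)
pred_sentence = np.asarray(pred_sentence).reshape(-1, seq_length)  # (n_samples, 50)
print([inp.shape for inp in model.inputs])  # what the model expects
print(feat.shape, pred_sentence.shape)      # what you are passing
yhat = model.predict([feat, pred_sentence])
If either array arrives as a plain Python list or with an extra dimension, Keras can misinterpret the outer list, which can produce errors like this one.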