Suppose I have a tensorflow graph implementing a classification model:
x = tf.placeholder(tf.float32, shape)
# [insert mdoel here]
logits = tf.layers.dense(inputs=..., units=num_labels, activation=None)
Now suppose I want to optimize over the inputs using the Adam optimizer.
For instance, in order to find targeted adversarial examples, I would declare a variable to optimize over (initialized at some sample during execution), specify a target class different from the true class, compute the cross-entropy and minimize it.
var_to_optimize = tf.Variable(np.zeros(shape, dtype=np.float32))
tgt_label = tf.placeholder(tf.float32, shape=[num_labels])
xent = tf.nn.softmax_cross_entropy_with_logits_v2(labels=tgt_label, logits=logits)
I would then like to minimize the cross-entropy by perturbing the inputs
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
training_op = optimizer.minimize(xent, var_list=[var_to_optimize])
However, xent requires that I feed values for the input placeholder x. How do I link the model's logits with var_to_optimize?
The question I was trying to answer is essentially the following: how can one create two separate optimization procedures on the same tensorflow graph?
The tutorial in the following link describes how to do this: a tensorflow graph is defined that trains a neural network and then adds random noise (uniform across samples) optimized to induce misclassification of most samples.
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/12_Adversarial_Noise_MNIST.ipynb
Related
I want to train a Neural Network for a classification task in Keras using a TensorFlow backend with a custom loss function. In my loss, I want to give different weights to different training examples. I have some datapoints I consider important and some I do not consider as important. I want my loss function to take this into account and punish errors in important examples more than in less important ones.
I have already built my model:
input = tf.keras.Input(shape=(16,))
hidden_layer_1 = tf.keras.layers.Dense(5, kernel_initializer='glorot_uniform', activation='relu')(input)
output = tf.keras.layers.Dense(1, kernel_initializer='normal', activation='softmax')(hidden_layer_1)
model = tf.keras.Model(input, output)
model.compile(loss=custom_loss(input), optimizer='adam', run_eagerly=True, metrics = [tf.keras.metrics.Accuracy(), 'acc'])
and the currrent state of my loss function is:
def custom_loss(input):
def loss(y_true, y_pred):
return ...
return loss
I'm struggling with implementing the loss function in the way I explained above, mainly because I don't exactly know what input, y_pred and y_true are (KerasTensors, I know - but what is the content? And is it for one training example only or for the whole batch?). I'd appreciate help with
printing out the values of input, y_true and y_pred
converting the input value to a numpy ndarray ([1,3,7] for example) so I can use the array to look up my weight for this specific training data point
once I have my weigth as a number (0.5 for example), how do I implement the computation of the loss function in Keras? My loss for one training exaple should be 0 if the classification was correct and weight if it was incorrect.
I have created this Linear regression model using Tensorflow (Keras). However, I am not getting good results and my model is trying to fit the points around a linear line. I believe fitting points around degree 'n' polynomial can give better results. I have looked googled how to change my model to polynomial linear regression using Tensorflow Keras, but could not find a good resource. Any recommendation on how to improve the prediction?
I have a large dataset. Shuffled it first and then spited to 80% training and 20% Testing. Also dataset is normalized.
1) Building model:
def build_model():
model = keras.Sequential()
model.add(keras.layers.Dense(units=300, input_dim=32))
model.add(keras.layers.Activation('sigmoid'))
model.add(keras.layers.Dense(units=250))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=200))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=150))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=100))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=50))
model.add(keras.layers.Activation('linear'))
model.add(keras.layers.Dense(units=1))
#sigmoid tanh softmax relu
optimizer = tf.train.RMSPropOptimizer(0.001,
decay=0.9,
momentum=0.0,
epsilon=1e-10,
use_locking=False,
centered=False,
name='RMSProp')
#optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae'])
return model
model = build_model()
model.summary()
2) Train the model:
class PrintDot(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs):
if epoch % 100 == 0: print('')
print('.', end='')
EPOCHS = 500
# Store training stats
history = model.fit(train_data, train_labels, epochs=EPOCHS,
validation_split=0.2, verbose=1,
callbacks=[PrintDot()])
3) plot Train loss and val loss
enter image description here
4) Stop When results does not get improved
enter image description here
5) Evaluate the result
[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)
#Testing set Mean Abs Error: 1.9020842795676374
6) Predict:
test_predictions = model.predict(test_data).flatten()
enter image description here
7) Prediction error:
enter image description here
Polynomial regression is a linear regression with some extra additional input features which are the polynomial functions of original input features.
i.e.;
let the original input features are : (x1,x2,x3,...)
Generate a set of polynomial functions by adding some transformations of the original features, for example: (x12, x23, x13x2,...).
One may decide which all functions are to be included depending on their constraints such as intuition on correlation to the target values, computational resources, and training time.
Append these new features to the original input feature vector. Now the transformed input feature vector has a size of len(x1,x2,x3,...) + len(x12, x23, x13x2,...)
Further, this updated set of input features (x1,x2,x3,x12, x23, x13x2,...) is feeded into the normal linear regression model. ANN's architecture may be tuned again to get the best trained model.
PS: I see that your network is huge while the number of inputs is only 32 - this is not a common scale of architecture. Even in this particular linear model, reducing the hidden layers to one or two hidden layers may help in training better models (It's a suggestion with an assumption that this particular dataset is similar to other generally seen regression datasets)
I've actually created polynomial layers for Tensorflow 2.0, though these may not be exactly what you are looking for. If they are, you could use those layers directly or follow the procedure used there to create a more general layer https://github.com/jloveric/piecewise-polynomial-layers
I am experimenting with LSTMs in Keras with little to no luck. At some moment I decided to scale back to the most basic problems in order finally achieve some positive result.
However, even with simplest problems I find that Keras is unable to converge while the implementation of the same problem in Tensorflow gives stable result.
I am unwilling to just switch to Tensorflow without understanding why Keras keeps diverging on any problem I attempt.
My problem is a many-to-many sequence prediction of delayed sin echo, example below:
Blue line is a network input sequence, red dotted line is an expected output.
The experiment was inspired by this repo and workable Tensorflow solution was also created from it too.
The relevant excerpts from the my code are below, and full version of my minimal reproducible example is available here.
Keras model:
model = Sequential()
model.add(LSTM(n_hidden,
input_shape=(n_steps, n_input),
return_sequences=True))
model.add(TimeDistributed(Dense(n_input, activation='linear')))
model.compile(loss=custom_loss,
optimizer=keras.optimizers.Adam(lr=learning_rate),
metrics=[])
Tensorflow model:
x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_steps])
weights = {
'out': tf.Variable(tf.random_normal([n_hidden, n_steps], seed = SEED))
}
biases = {
'out': tf.Variable(tf.random_normal([n_steps], seed = SEED))
}
lstm = rnn.LSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(lstm, inputs=x,
dtype=tf.float32,
time_major=False)
h = tf.transpose(outputs, [1, 0, 2])
pred = tf.nn.bias_add(tf.matmul(h[-1], weights['out']), biases['out'])
individual_losses = tf.reduce_sum(tf.squared_difference(pred, y),
reduction_indices=1)
loss = tf.reduce_mean(individual_losses)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) \
.minimize(loss)
I claim that other parts of code (data_generation, training) are completely identical. But learning progress with Keras stalls early and yields unsatisfactory predictions. Graphs of logloss for both libraries and example predictions are attached below:
Logloss for Tensorflow-trained model:
Logloss for Keras-trained model:
It's not easy to read from graph, but Tensorflow reaches target_loss=0.15 and stops early after about 10k batches. But Keras uses up all 13k batches reaching loss about only 1.5. In a separate experiment where Keras was running for 100k batches it went no further stalling around 1.0.
Figures below contain: black line - model input signal, green dotted line - ground truth output, red line - acquired model output.
Predictions of Tensorflow-trained model:
Predictions of Keras-trained model:
Thank you for suggestions and insights, dear colleagues!
Ok, I have managed to solve this. Keras implementation now converges steadily to a sensible solution too:
The models were in fact not identical. You may inspect with extra caution the Tensorflow model version from the question and verify for yourself that actual Keras equivalent is listed below, and isn't what stated in the question:
model = Sequential()
model.add(LSTM(n_hidden,
input_shape=(n_steps, n_input),
return_sequences=False))
model.add(Dense(n_steps, input_shape=(n_hidden,), activation='linear'))
model.compile(loss=custom_loss,
optimizer=keras.optimizers.Adam(lr=learning_rate),
metrics=[])
I will elaborate. Workable solution here uses that last column of size n_hidden spat out by LSTM as an intermediate activation then fed to the Dense layer.
So, in a way, the actual prediction here is made by the regular perceptron.
One extra take away note - source of mistake in the original Keras solution is already evident from the inference examples attached to question. We see there that earlier timestamps fail utterly, while later timestamps are near perfect. These earlier timestamps correspond to the states of LSTM when it were just initialized on new window and clueless of context.
I am still relatively new to the world of Deep Learning. I wanted to create a Deep Learning model (preferably using Tensorflow/Keras) for image anomaly detection. By anomaly detection I mean, essentially a OneClassSVM.
I have already tried sklearn's OneClassSVM using HOG features from the image. I was wondering if there is some example of how I can do this in deep learning. I looked up but couldn't find one single code piece that handles this case.
The way of doing this in Keras is with the KerasRegressor wrapper module (they wrap sci-kit learn's regressor interface). Useful information can also be found in the source code of that module. Basically you first have to define your Network Model, for example:
def simple_model():
#Input layer
data_in = Input(shape=(13,))
#First layer, fully connected, ReLU activation
layer_1 = Dense(13,activation='relu',kernel_initializer='normal')(data_in)
#second layer...etc
layer_2 = Dense(6,activation='relu',kernel_initializer='normal')(layer_1)
#Output, single node without activation
data_out = Dense(1, kernel_initializer='normal')(layer_2)
#Save and Compile model
model = Model(inputs=data_in, outputs=data_out)
#you may choose any loss or optimizer function, be careful which you chose
model.compile(loss='mean_squared_error', optimizer='adam')
return model
Then, pass it to the KerasRegressor builder and fit with your data:
from keras.wrappers.scikit_learn import KerasRegressor
#chose your epochs and batches
regressor = KerasRegressor(build_fn=simple_model, nb_epoch=100, batch_size=64)
#fit with your data
regressor.fit(data, labels, epochs=100)
For which you can now do predictions or obtain its score:
p = regressor.predict(data_test) #obtain predicted value
score = regressor.score(data_test, labels_test) #obtain test score
In your case, as you need to detect anomalous images from the ones that are ok, one approach you can take is to train your regressor by passing anomalous images labeled 1 and images that are ok labeled 0.
This will make your model to return a value closer to 1 when the input is an anomalous image, enabling you to threshold the desired results. You can think of this output as its R^2 coefficient to the "Anomalous Model" you trained as 1 (perfect match).
Also, as you mentioned, Autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras Blog post Building Autoencoders in Keras, where they explain in detail about the implementation of them with the Keras library.
It is worth noticing that Single-class classification is another way of saying Regression.
Classification tries to find a probability distribution among the N possible classes, and you usually pick the most probable class as the output (that is why most Classification Networks use Sigmoid activation on their output labels, as it has range [0, 1]). Its output is discrete/categorical.
Similarly, Regression tries to find the best model that represents your data, by minimizing the error or some other metric (like the well-known R^2 metric, or Coefficient of Determination). Its output is a real number/continuous (and the reason why most Regression Networks don't use activations on their outputs). I hope this helps, good luck with your coding.
I am having trouble understanding the way that the content and style filters are being trained for (e.g. in this paper ) in style transfer algorithms using TensorFlow.
I have examined a few implementations of the algorithm in the linked paper, but I can't quite grok their treatment of this step. To that end, I thought it would be to helpful implement a naive version, without using the pre-trained model. My understanding of the steps involved are:
Train a CNN on a single image (in the paper they use the pre-trained VGG network)
Using the trained network, feed in a white noise image. Define a new loss function that is minimized by updating the input image (this is how the image is 'painted') e.g. 'content' is derived by minimizing the distance between the conv layers in the trained model, and those resulting from the input (white noise) image
Thus, the implementation should be something like:
import TensorFlow as tf
x_in = tf.placeholder(tf.float32, shape=[None, num_pixels], name='x')
y_ = tf.placeholder(tf.float32, shape=[None, num_pixels], name='y')
...
diff = y_-y_out
loss = tf.reduce_sum(tf.abs(diff)) # minimizing 'pixel difference'
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
# training model
for i in range(NUM_TRAINING_STEPS):
_, loss_val = sess.run([train_step, loss],
feed_dict={x_in: input_image, y_: input_image})
After training the model, I can generate a white noise image, but how can I used the trained model to update my input image? My suspicion is that I need to create a second network, where x_in is of type tf.Variable and load the weights and biases from the trained model, but the details of this elude me.
Yes, you could store your input image in a tf.Variable, load the weights from the trained model, and run an optimization loop with the style transfer loss function wrt to the input variable.
you can just use a style transfer as a service site to train styles like http://somatic.io