My model takes an image and calculates a certain value. The input layer is a cropping layer that removes a number of pixels from the top and bottom of the image. The model works quite well, but when I change the settings of the cropping layer (say, remove 25 pixels from the top instead of 75), the browser window (Chrome) flickers and outputs the following error:
NOTE: Right before the above error, it prints out the message "Couldn't parse line number in error", followed by what appears to be GLSL code.
The same error shows up if I remove the cropping layer altogether.
I'm using tfjs v3.8.0, but I also tested v2.0.0 with a similar outcome. This is my model:
const model = tf.sequential();

// Cropping layer
model.add(
  tf.layers.cropping2D({
    // If I change 75 to anything below 50, it crashes before completing the first epoch.
    // If this layer is removed, it crashes almost immediately after training starts.
    cropping: [
      [75, 25],
      [0, 0]
    ],
    // image height, width, depth
    inputShape: [160, 320, 3]
  })
);

model.add(
  tf.layers.conv2d({
    filters: 16,
    kernelSize: [3, 3],
    strides: [2, 2],
    activation: 'relu',
  })
);

model.add(
  tf.layers.maxPool2d({
    poolSize: [2, 2]
  })
);

model.add(
  tf.layers.conv2d({
    filters: 32,
    kernelSize: [3, 3],
    strides: [2, 2],
    activation: 'relu'
  })
);

model.add(
  tf.layers.maxPool2d({
    poolSize: [2, 2]
  })
);

model.add(tf.layers.flatten());
model.add(tf.layers.dense({ units: 1024, activation: 'relu' }));
model.add(tf.layers.dropout({ rate: 0.25 }));
model.add(tf.layers.dense({ units: 128, activation: 'relu' }));
model.add(tf.layers.dense({ units: 1, activation: 'linear' }));

model.compile({
  optimizer: 'adam',
  loss: 'meanSquaredError',
  metrics: ['accuracy'],
});
Am I doing something obviously wrong?
Lost GL context means you've likely run out of GPU memory. The WebGL backend has massive overhead compared to what the model actually needs. You might get a bit further by forcing GL memory cleanup instead of leaving it at the defaults, but it comes at a cost (it slows things down significantly):
tf.ENV.set('WEBGL_DELETE_TEXTURE_THRESHOLD', 0);
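For reference, in more recent tfjs releases the same flag is usually set through tf.env() rather than the older tf.ENV object; a minimal sketch, assuming the WebGL backend is active:

// 0 tells the WebGL backend to delete textures as soon as they are unused
// (frees GPU memory sooner, at the cost of speed).
tf.env().set('WEBGL_DELETE_TEXTURE_THRESHOLD', 0);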
The problem did end up being the GPU running out of memory, as #vladimir-mandic suggested, but setting WEBGL_DELETE_TEXTURE_THRESHOLD to zero did not help in my case.
It took me a while to verify this because it happened between batches: I could not track it via tf.memory() at batch end, because by the time the callback ran the memory had either already been released or the GPU had crashed before reaching that point. I ended up doing two things to overcome the problem:
Reduce the image size: the cropping layer was helping avoid the out-of-memory state, which is why removing it or reducing the number of cropped pixels made the application crash. But since the cropping layer also allocates tensors, I decided to resize the images via canvas manipulation BEFORE feeding them into the model.
Reduce the batchSize: I had been working with the default batchSize of 32, and it was only when I reduced it that the crash went away. This led me to investigate the internals of model.fitDataset, and that's how I found the excessive memory consumption between batches. Both workarounds are sketched below.
As #vladimir-mandic recommended, setting WEBGL_DELETE_TEXTURE_THRESHOLD to 0 SHOULD also help alleviate this issue, but I didn't notice any significant effect in my case, so I didn't end up using it.
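Here is a minimal sketch of both workarounds. It is not my exact code: sourceCanvas, dataset, the target dimensions and the batch size are all illustrative.

// 1. Downscale the frame on a canvas BEFORE it ever becomes a tensor,
//    so the model never allocates the full-resolution image.
const resized = document.createElement('canvas');
resized.width = 160;    // illustrative target size
resized.height = 80;
resized.getContext('2d').drawImage(sourceCanvas, 0, 0, resized.width, resized.height);
const input = tf.browser.fromPixels(resized).toFloat().div(255);

// 2. Batch the dataset with something smaller than 32 before calling fitDataset.
const batched = dataset.batch(8);
await model.fitDataset(batched, { epochs: 10 });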
Related
Hello, I am creating my first neural network using TensorFlow.js.
I want to use the points (0,0), (0,1), (1,0), (1,1) and the labels 0, 1, 1, 0 as inputs to my NN. I tried it the following way:
async function runModel() {
  // Build and compile model.
  const model = tf.sequential();
  model.add(tf.layers.dense({units: 2, inputShape: [2]}));
  model.compile({optimizer: 'sgd', loss: 'meanSquaredError'});
  // Generate some synthetic data for training.
  const xs = tf.tensor2d([[1], [0]], [2, 1]);
  const ys = tf.tensor2d([[1]], [1, 1]);
  // Train model with fit().
  await model.fit(xs, ys, {epochs: 10});
  // Run inference with predict().
  model.predict(tf.tensor2d([[0], [1]], [2, 1])).print();
}
runModel();
I end up with the error:
Uncaught (in promise) Error: Error when checking input: expected
dense_Dense1_input to have shape [,2], but got array with shape [2,1].
I have tried playing with all the parameters, but even with the documentation I can't figure out how to make it work.
As already explained here and there, this error is thrown when there is a mismatch between the shape expected by the model and the shape of the training data.
expected dense_Dense1_input to have shape [,2], but got array with shape [2,1]
The error thrown is meaningful enough to help solve the issue. The first layer expects a tensor of shape [,2], since the inputShape is [2]. But xs has the shape [2, 1]; it should instead have the shape [1, 2].
In the model, the last layer returns 2 values, whereas in reality it should return only one (an XOR operation outputs a single value). Therefore, instead of units: 2 it should be units: 1, which means that ys should have the shape [,1]. The shape of ys is already what the model expects, so no changes there.
The shape of the tensor used for prediction should likewise match the model input shape, i.e. [,2].
By making the above changes, it becomes the following:
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [2]}));
model.compile({optimizer: 'sgd', loss: 'meanSquaredError'});
// Generate some synthetic data for training.
const xs = tf.tensor2d([[1, 0]]);
const ys = tf.tensor2d([[1]], [1, 1]);
// Train model with fit().
await model.fit(xs, ys, {epochs: 10});
// Run inference with predict().
model.predict(tf.tensor([[0, 1]], [1, 2])).print()
As a beginner, I have tried to build a really simple multi-class classifier in TensorFlow.js which is supposed to predict the direction of my eye sight.
Step 1: I created a data set in the browser to train my model, storing images of my eyes rendered by the webcam on an HTML5 canvas. I use the arrow keys to label my images as 0=left, 1=normal and 2=right. To train the model, I convert these labels using tf.oneHot() before passing them in.
// data collection
let imageArray = [];
let labelArray = [];
let collectData = (label) => {
  const img = tf.tidy(() => {
    const captureImg = getImage();
    //console.log(captureImg.shape)
    return captureImg;
  })
  imageArray.push(img)
  labelArray.push(label) //--- labels are 0,1,2
}

// label conversion
let labelSet = tf.oneHot(tf.tensor1d(labelArray, 'int32'), 3);
Step 2: Instead of loading any pre-trained model, I used a custom model that I built with TensorFlow.js.
let createModel = () => {
  const model = tf.sequential();

  let config_one = {
    kernelSize: 3,
    filters: 40,
    strides: 1,
    activation: 'relu',
    inputShape: [imageHeight, imageWidth, imageChannels]
  }
  model.add(tf.layers.conv2d(config_one));

  let config_two = {
    poolSize: [2, 2],
    strides: [2, 2],
  }
  model.add(tf.layers.maxPooling2d(config_two));

  model.add(tf.layers.flatten());
  model.add(tf.layers.dropout(0.2));

  // Three output classes: left, normal, right
  let congfig_output = {
    units: 3,
    activation: 'tanh',
  }
  model.add(tf.layers.dense(congfig_output));

  // Adam optimizer with a learning rate of 0.00005 and categorical cross-entropy loss
  let config_compile = {
    optimizer: tf.train.adam(0.00005),
    loss: 'categoricalCrossentropy',
  }
  model.compile(config_compile);
  tf.memory()
  return model;
}
Problems: There are several problems I am facing right now.
When I use meanSquaredError as the loss function with an Adam learning rate of 0.000005, my model starts predicting, but it only predicts two of the eye states (normal and either left or right). So, to do proper multi-class classification, I changed the loss function to categoricalCrossentropy, but the result is the same or sometimes worse.
I tried other combinations of hyperparameters, but no luck. The worst situation I got into was the loss showing only three constant values repeatedly.
My browser would crash in some cases, for example if I pass in too much data or use another type of optimizer in the compile config, such as sgd. From a quick Google search I found that I can use tf.memory() to check for a memory leak that could be causing the browser crash, but that line didn't log anything to the console.
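As a side note on that last point: tf.memory() only returns a stats object and does not print anything on its own, so it has to be passed to console.log. A minimal sketch, for example inside a training callback:

// tf.memory() returns an object like { numTensors, numDataBuffers, numBytes, ... }
console.log(tf.memory());
console.log('live tensors:', tf.memory().numTensors);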
I was adjusting various values and parameters in the code and retraining the model, which sometimes made it work partially, but most of the time it didn't work at all. It was all trial and error. Eventually I learned which values to use for the loss function in the compile method and for the activation function in the conv2d input layer, but other things are still confusing, such as the number of epochs, the batch size, the learning rate in adam, etc.
I understood, or I think I understood, kernelSize, filters, strides and inputShape, but I still have no idea how to decide on the number of layers, the various hyperparameters, etc.
Edit: this is what I get after updating the code as per the suggestion. I still don't get proper classification. I am training with a minimum of 1000+ images.
A. I still get the loss recurring with fixed values.
B. Accuracy also keeps repeating itself, cycling between 1, 0.5 and 0.
function getImage() {
  return tf.tidy(function () {
    const image = tf.browser.fromPixels($('#eyes')[0]);
    const batchedImage = image.expandDims(0);
    const norm = batchedImage.toFloat().div(tf.scalar(255)).sub(tf.scalar(1));
    return norm;
  });
}
[Console output and sample eye images were attached here as screenshots.]
The most obvious thing wrong with this is your output layer's activation function: where you use tanh, you should be using softmax instead. Next, your learning rate is way too low; try setting it to 0.001, which is a good default.
You also probably don't need dropout, since you haven't seen any results suggesting the model is overfitting. You could also add more convolutional layers; try the example below.
model.add(tf.layers.conv2d({
  inputShape: [28, 28, 1],  // adjust to your [imageHeight, imageWidth, imageChannels]
  kernelSize: 5,
  filters: 8,
  strides: 1,
  activation: 'relu',
}));
model.add(tf.layers.maxPooling2d({
  poolSize: [2, 2],
  strides: [2, 2],
}));
model.add(tf.layers.conv2d({
  kernelSize: 5,
  filters: 16,
  strides: 1,
  activation: 'relu',
}));
model.add(tf.layers.maxPooling2d({
  poolSize: [2, 2],
  strides: [2, 2],
}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({
  units: 3,
  activation: 'softmax',
}));

const LEARNING_RATE = 0.001;
const optimizer = tf.train.adam(LEARNING_RATE);
model.compile({
  optimizer: optimizer,
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy'],
});
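To connect this with the data-collection step in the question, a rough sketch of stacking the collected images and training is below. imageArray and labelSet are the variables from the question; the epochs and batchSize values are just illustrative.

// Each entry in imageArray is a [1, h, w, c] tensor from getImage(),
// so concatenating along axis 0 gives a single [N, h, w, c] batch.
const xs = tf.concat(imageArray, 0);
const ys = labelSet;   // one-hot labels, shape [N, 3]

await model.fit(xs, ys, {
  epochs: 20,        // illustrative
  batchSize: 16,     // illustrative
  shuffle: true,
  callbacks: { onEpochEnd: (epoch, logs) => console.log(epoch, logs.loss, logs.acc) }
});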
I am attempting to load a frozen graph created through TensorFlow Keras (.pb) onto a memory-limited microcontroller. Via hyperopt I am optimizing my training hyperparameters; however, I would like to include the size of the resulting model in the search space. At the moment I am feeding a weighted loss back to hyperopt, something like the following:
def optimizer(args):
    ...
    training_history = nn.train_v00(....)
    file_bytes = int(os.path.getsize(frozen_graph_filepath))  # returns size in bytes
    final_loss = training_history.history['loss'][-1]
    weighted_loss = final_loss * (file_bytes / (100 * 1024))  # want model smaller than 100KB
    if OPTIMIZATION_TARGET == 'loss':
        return_struct = {
            'status': STATUS_OK,
            'loss': weighted_loss,
            'epochs': epochs,
            'metrics': {
                'accuracy': final_acc
            }
        }
        return return_struct

space = {
    'learning_rate': hp.loguniform('learning_rate', np.log(0.00001), np.log(0.05)),
    'dropout': hp.uniform('dropout', 0, 0.5),
    'batch_size': hp.quniform('batch_size', 32, 128, 2),
    'input_seq_len': hp.quniform('seq_len', 32, N_ADC_SAMPLES, 2),
    'cnn_n_filters': hp.quniform('cnn_n_filters', 1, 10, 1),
    'cnn_kernel_size': hp.quniform('cnn_kernel_size', 1, 10, 1),
    'cnn_pool_size': hp.quniform('cnn_pool_size', 1, 10, 1)
}

t = Trials()
best = fmin(optimizer, space, algo=tpe.suggest, max_evals=MAX_EVALS, trials=t)
From what I have found thus far, there isn't a way to directly backpropagate the size of the model back through the training, but is there a better way of doing this?
Thanks for the consideration!
If I understand you correctly, this is a question about how to craft your loss function, not about hyperopt directly. Correct? If so, and you indeed have a hard cut-off at 100KB, I wouldn't scale your training loss (it might lead to strange artifacts like a poorly performing but very tiny model taking the first spot).
How about you report the regular training loss if the model size is smaller than 100kb and otherwise return an astronomically large number? Just an idea.
I am trying to learn and practice with TensorFlow.js.
So I tried to train a neural network on a [,2]-shaped array as x (as I understand it, this simulates a problem where I have x samples, each with 2 variables) and a [,1]-shaped array as y (which would mean, if I'm correct, that the combination of my 2 variables generates 1 output).
And I tried to code it:
const model = tf.sequential();
model.add(tf.layers.dense({ units: 2, inputShape: [2] }));
model.add(tf.layers.dense({ units: 64, inputShape: [2] }));
model.add(tf.layers.dense({ units: 1, inputShape: [64] }));
// Prepare the model for training: Specify the loss and the optimizer.
model.compile({ loss: 'meanSquaredError', optimizer: 'sgd' });
// Generate some synthetic data for training.
const xs = tf.tensor([[1,5], [2,10], [3,15], [4,20], [5,25], [6,30], [7,35], [8,40]], [8, 2]);
const ys = tf.tensor([1, 2, 3, 4, 5, 6, 7, 8], [8, 1]);
// Train the model using the data.
model.fit(xs, ys, { epochs: 100 }).then(() => {
// Use the model to do inference on a data point the model hasn't seen before:
// Open the browser devtools to see the output
model.predict(tf.tensor([10, 50], [1, 2])).print();
});
But what I am facing is that when I try to predict the [10, 50] input, I get the following console output:
Tensor
[[NaN],]
So, I think my problem might be very simple, but I am really stuck with this and probably it is a matter of some background knowledge I'm missing.
Thank you!
The first layer takes the shape of the input data:
model.add(tf.layers.dense({ units: 2, inputShape: [2] }))
The inputShape is [2], which means that your input x is of shape [2].
The last layer's units value gives the dimension of the output y:
model.add(tf.layers.dense({ units: 1, inputShape: [64] }));
So the shape of y should be [1].
In this case, the NaN prediction is related to the number of epochs for your training. If you decrease it to 2 or 3, it will return a numerical value. Actually, the error is related to how your optimizer is updating the weights. Alternatively, you can change the optimizer to adam and it will be fine.
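A minimal sketch of that compile change ('adam' is the string shorthand; tf.train.adam(learningRate) can be used instead if you want to set the learning rate explicitly):

// Same model and data as above, only the optimizer changes.
model.compile({ loss: 'meanSquaredError', optimizer: 'adam' });
// or, with an explicit learning rate:
model.compile({ loss: 'meanSquaredError', optimizer: tf.train.adam(0.01) });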
I think I am late but I hope this helps someone.
I had the same problem once; it was because I was reading the training and testing data from a file using the "fs" module. I solved the problem by applying this to the variable before returning it to the main function to start training:
JSON.parse(JSON.stringify(data))
I don't know the reason, but for some reason the TensorFlow model only accepted a JSON array and not my original JavaScript array, so by doing this you convert your array to a JSON array instead of leaving it as a JavaScript array.
Hope this saves someone's time.
I dealt with this same issue for the past 2 days, and the problem was that I trained my model with a GPU (using Google Colab) and performed inference on the CPU. After changing the settings on Google Colab to use no hardware acceleration, my problem was fixed!
Hope this helps someone in the future.
I'm building a small program to predict a float from a 1d array of floats. So far I've been using dense layers to achieve this:
const model = sequential();
model.add(layers.dense({units: 32, inputShape: [numCols,]}));
model.add(layers.activation({activation: 'relu'}));
model.add(layers.dense({units: 4}));
model.add(layers.dense({units: 1}));
Where my xs input shape is [numRows, numCols] (e.g. [132, 100] - in a dataset of 132 examples: [[1, 2, 3, ...], [4, 5, 6, ...], ...]) and my ys output is a single value [num] (e.g. [17.50]).
But I wanted to try out an LSTM to test whether it would perform better. The issue is that the LSTM layers want 3D input, and I was not sure how to go about reshaping my data.
I've tried the following:
const trainXs = xs.clone()
.reshape([numRows, numCols, 1]);
The above converted my input [[1, 2, 3, ...], [4, 5, 6, ...], ...] to [[[1], [2], [3], ...], [[4], [5], [6], ...], ...].
And the layers:
const model = sequential();
model.add(layers.simpleRNN({
  units: 32,
  inputShape: [numCols, numRows], // [100, 132]
  recurrentInitializer: 'glorotNormal',
  returnSequences: true
}));
model.add(layers.simpleRNN({
  units: 32,
  recurrentInitializer: 'glorotNormal',
  returnSequences: true
}));
But the above would fail with the following error:
Error: Error when checking input: expected simple_rnn_SimpleRNN1_input to have shape [,100,132], but got array with shape [132,100,1].
I'm a bit confused and I'm not sure how I should reshape my 2d tensor to fit the requirements of the LSTM layers.
Update:
The fit call:
model.fit(trainXs, trainYs, {
  epochs: 1000,
  batchSize: 12,
  validationData: [testXs, testYs] // Test data has the same shape as trainXs/trainYs
});
I only have a single layer at the moment:
model.add(layers.simpleRNN({
  units: 32,
  inputShape: [1, numCols, numRows],
  recurrentInitializer: 'glorotNormal',
  returnSequences: true
}));
The reference says:
The shape of the input (not including the first, batch dimension) needs to be at least 2-D, with the first dimension being time steps.
So the first dimension of your input should contain the time steps; for simplicity, just use 1. In your case, the shape of the tensor passed to the cell would then be [1, numCols, numRows], as you already got in the error message.
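For concreteness, here is a minimal sketch of one common way to arrange this, treating each example as a single time step with numCols features (numRows and numCols are the names from the question; the final dense layer is an assumption to match the single-float target):

// Data: [numRows, numCols] -> [numRows, 1, numCols], i.e. (batch, timeSteps, features).
const trainXs = xs.reshape([numRows, 1, numCols]);

const model = sequential();
model.add(layers.simpleRNN({
  units: 32,
  inputShape: [1, numCols],        // [timeSteps, features]
  recurrentInitializer: 'glorotNormal',
  returnSequences: false           // one output vector per example
}));
model.add(layers.dense({ units: 1 }));
model.compile({ loss: 'meanSquaredError', optimizer: 'adam' });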