Your input ran out of data - tensorflow2.0

history = model.fit_generator(
    train_generator,
    steps_per_epoch=50,
    epochs=10,
    verbose=1,
    validation_data=validation_generator,
    validation_steps=50)
tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset.

To solve this problem, we need to pay attention to two items:
How to define batch_size and steps_per_epoch.
The simple answer is steps_per_epoch=total_train_size//batch_size
How to define max number of epochs for the training process.
This is not as straightforward as the first one.
Most answers cover the first topic, but I didn't find a good answer for the second, so I'll try to explain it below.
I have a training set of 93 samples and a batch_size of 32, so for the first question:
steps_per_epoch=total_train_size//batch_size=93//32=2
For the second question, it depends on how many non-repeated batches your data generator can provide. With 93 samples and 32 samples per batch, each epoch has 2 training steps. By my reckoning you get 93//2 = 46 epochs of non-repeated batches, and epoch 47 triggers this error.
I didn't find a reference for the TensorFlow data generator, so this is just my understanding. If anything is wrong, please correct me. Thanks!
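The check that the warning describes can be sketched in plain Python. This is a minimal sketch, assuming a finite input that is passed over once without repeating; the helper names batches_needed and batches_available are made up for illustration and are not part of TensorFlow.

```python
import math


def batches_needed(steps_per_epoch, epochs):
    """Total number of batches fit() will request over the whole run."""
    return steps_per_epoch * epochs


def batches_available(num_samples, batch_size, passes=1):
    """Batches a finite input yields (hypothetical helper).

    Assumes the final partial batch of each pass is kept.
    """
    return math.ceil(num_samples / batch_size) * passes


total_train_size, batch_size, epochs = 93, 32, 10
steps_per_epoch = total_train_size // batch_size  # 93 // 32

needed = batches_needed(steps_per_epoch, epochs)
available = batches_available(total_train_size, batch_size)

print(f"needed={needed}, available={available}")
if available < needed:
    print("input would run out; repeat the data or lower epochs")
```

If the input is a tf.data dataset, calling repeat() on it, as the warning suggests, removes the limit on available batches.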

Related

What to do if I don't want to waste samples by flooring steps_per_epoch in model.fit?

I have a model that has 536 training samples, and would like to run through all the samples per epoch. The batch size is 32, epoch is 50. Here is the code and its error:
results = model.fit(train_X, train_y, batch_size = 32, epochs = 50, validation_data=(val_X, val_y), callbacks=callbacks)
The dataset you passed contains 837 batches, but you passed epochs=50 and steps_per_epoch=17, which is a total of 850 steps. We cannot draw that many steps from this dataset. We suggest to set steps_per_epoch=16.
Total number of samples / batch size = steps per epoch = 536/32 = 16.75. model.fit works if I set steps per epoch = 16. Doesn't this mean I'm discarding 24 samples (0.75 * 32) in each epoch?
If yes, how can I avoid discarding these samples? One way would be to adjust the batch size so that it divides the number of samples with no remainder.
If there are other ways, please enlighten me.
If you do not want to discard any training samples, it is better not to set steps_per_epoch once batch_size is defined, because the model can calculate the steps per epoch itself from the batch size and the number of training samples provided.
Please go through the attached gist, where I have tried to elaborate on these terms in more detail.
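The arithmetic behind this, as a small sketch. It assumes that, for array inputs with steps_per_epoch left unset, the last partial batch is kept rather than dropped, which is my understanding of how model.fit batches NumPy arrays.

```python
import math

num_samples, batch_size = 536, 32

# Flooring keeps only the full batches.
floor_steps = num_samples // batch_size

# Rounding up adds one final, smaller batch so every sample is used.
ceil_steps = math.ceil(num_samples / batch_size)

# Samples left over after the full batches; these form the last batch.
last_batch_size = num_samples - floor_steps * batch_size

print(floor_steps, ceil_steps, last_batch_size)
```

With 16 steps the 24 leftover samples are skipped in that epoch; with 17 steps they are seen as one smaller batch.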

Unknown number of steps - Training convolution neural network at Google Colab Pro

I am trying to train my CNN on Google Colab Pro. The code runs fine, but it does not know the number of steps, so it loops indefinitely.
Mounted at /content/drive
2.2.0-rc3
Found 10018 images belonging to 2 classes.
Found 1336 images belonging to 2 classes.
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
Epoch 1/300
8/Unknown - 364s 45s/step - loss: 54.9278 - accuracy: 0.5410
I am using ImageDataGenerator() for loadings images. How can I fix it?
An iterator does not store anything; it generates the data dynamically. When you are using a dataset or dataset iterator, you must provide steps_per_epoch, because the length of an iterator is unknown until you iterate through it. You could explicitly pass len(datafiles) into the .fit function. So you need to provide steps_per_epoch as shown below.
model.fit_generator(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size
)
More details are mentioned here
steps_per_epoch: Integer or None. Total number of steps (batches of
samples) before declaring one epoch finished and starting the next
epoch. When training with input tensors such as TensorFlow data
tensors, the default None is equal to the number of samples in your
dataset divided by the batch size, or 1 if that cannot be determined.
If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch
will run until the input dataset is exhausted. This argument is not
supported with array inputs.
I notice you are using binary classification. One more thing to remember when you use ImageDataGenerator is to provide class_mode as shown below. Otherwise, there will be a bug (in keras) or 50% accuracy (in tf.keras).
train_data_gen = train_image_generator.flow_from_directory(
    batch_size=batch_size,
    directory=train_dir,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')

MLP output of first layer is zero after one epoch

I've been running into an issue lately trying to train a simple MLP.
I'm basically trying to get a network to map the XYZ position and RPY orientation of the end-effector of a robot arm (6-dimensional input) to the angle of every joint of the robot arm to reach that position (6-dimensional output), so this is a regression problem.
I've generated a dataset using the angles to compute the current position, and generated datasets with 5k, 500k and 500M sets of values.
My issue is the MLP I'm using doesn't learn anything at all. Using Tensorboard (I'm using Keras), I've realized that the output of my very first layer is always zero (see image 1), no matter what I try.
Basically, my input is a shape (6,) vector and the output is also a shape (6,) vector.
Here is what I've tried so far, without success:
I've tried MLPs with 2 layers of size 12, 24; 2 layers of size 48, 48; 4 layers of size 12, 24, 24, 48.
Adam, SGD, RMSprop optimizers
Learning rates ranging from 0.15 to 0.001, with and without decay
Both Mean Squared Error (MSE) and Mean Absolute Error (MAE) as the loss function
Normalizing the input data, and not normalizing it (the first 3 values are between -3 and +3, the last 3 are between -pi and pi)
Batch sizes of 1, 10, 32
Tested the MLP on all 3 datasets of 5k values, 500k values and 5M values.
Tested with number of epochs ranging from 10 to 1000
Tested multiple initializers for the bias and kernel.
Tested both the Sequential model and the Keras functional API (to make sure the issue wasn't how I called the model)
All 3 of sigmoid, relu and tanh activation functions for the hidden layers (the last layer is a linear activation because it's a regression)
Additionally, I've tried the very same MLP architecture on the basic Boston housing price regression dataset by Keras, and the net was definitely learning something, which leads me to believe that there may be some kind of issue with my data. However, I'm at a complete loss as to what it may be as the system in its current state does not learn anything at all, the loss function just stalls starting on the 1st epoch.
Any help or lead would be appreciated, and I will gladly provide code or data if needed!
Thank you
EDIT:
Here's a link to 5k samples of the data I'm using. Columns B-G are the output (angles used to generate the position/orientation) and columns H-M are the input (XYZ position and RPY orientation). https://drive.google.com/file/d/18tQJBQg95ISpxF9T3v156JAWRBJYzeiG/view
Also, here's a snippet of the code I'm using:
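# Imports assumed for this snippet (tf.keras shown; the question does not list them)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
df = pd.read_csv('kinova_jaco_data_5k.csv',
                 names=['state0', 'state1', 'state2', 'state3', 'state4', 'state5',
                        'pose0', 'pose1', 'pose2', 'pose3', 'pose4', 'pose5'])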
states = np.asarray(
[df.state0.to_numpy(), df.state1.to_numpy(), df.state2.to_numpy(), df.state3.to_numpy(), df.state4.to_numpy(),
df.state5.to_numpy()]).transpose()
poses = np.asarray(
[df.pose0.to_numpy(), df.pose1.to_numpy(), df.pose2.to_numpy(), df.pose3.to_numpy(), df.pose4.to_numpy(),
df.pose5.to_numpy()]).transpose()
x_train_temp, x_test, y_train_temp, y_test = train_test_split(poses, states, test_size=0.2)
x_train, x_val, y_train, y_val = train_test_split(x_train_temp, y_train_temp, test_size=0.2)
mean = x_train.mean(axis=0)
x_train -= mean
std = x_train.std(axis=0)
x_train /= std
x_test -= mean
x_test /= std
x_val -= mean
x_val /= std
n_epochs = 100
n_hidden_layers = 2
n_units = [48, 48]
activation = 'relu'  # undefined in the original snippet; 'relu' assumed from the first layer
inputs = Input(shape=(6,), dtype='float32', name='input')
x = Dense(units=n_units[0], activation=activation, name='dense1')(inputs)
for i in range(1, n_hidden_layers):
    x = Dense(units=n_units[i], activation=activation, name='dense'+str(i+1))(x)
out = Dense(units=6, activation='linear', name='output_layer')(x)
model = Model(inputs=inputs, outputs=out)
optimizer = SGD(learning_rate=0.1, momentum=0.4)  # 'lr' is the older name for this argument
model.compile(optimizer=optimizer, loss='mse', metrics=['mse', 'mae'])
history = model.fit(x_train,
y_train,
epochs=n_epochs,
verbose=1,
validation_data=(x_test, y_test),
batch_size=32)
Edit 2
I've tested the architecture with a random dataset where the input was a (6,) vector where input[i] is a random number and the output was a (6,) vector with output[i] = input[i]² and the network didn't learn anything. I've also tested a random dataset where the input was a random number and the output was a linear function of the input, and the loss converged to 0 pretty quickly. In short, it seems the simple architecture is unable to map a non-linear function.
"The output of my very first layer is always zero."
This typically means that the network does not "see" any pattern in the input at all, which causes it to always predict the mean of the target over the entire training set, regardless of input. Your output is in the range of -𝜋 to 𝜋 probably with an expected value of 0, so it checks out.
My guess is that the model is too small to represent the data efficiently. I would suggest that you increase the number of parameters in the model by a factor of 10 or 100 and see if it starts seeing something. Limiting the number of parameters has a regularizing effect on the network, and strong regularization usually leads to the aforementioned collapse to the mean.
I'm by no means a robotics expert, but I guess that there are a lot of situations where a small nudge in the output parameters causes a large change of the input. Let's say I'm trying to scratch my back with my left hand - the farther my hand goes to the left, the harder the task becomes, so at some point I might want to switch hands, which is a discontinuous configuration change. A bad analogy, sure, but I hope it demonstrates my hunch that there are certain places in the configuration space where small target changes cause large configuration changes.
Such large changes will cause a very large, very noisy gradient around those points. I'm not sure how well the network will work around these noisy gradients, but I would suggest as an experiment that you try to limit the training dataset to a set of outputs that are connected smoothly to one another in the configuration space of the arm, if that makes sense. Going further, you should remove any points from the dataset that are close to such configuration boundaries. To make up for that at inference time, you might instead want to sample several close-by points and choose the most common prediction as the final result. Hopefully some of those points will land in a smooth configuration area.
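A toy illustration of that hunch, not a model of the actual arm: for a single rotating joint, the angle that points at a target is atan2(y, x), and it jumps by almost 2π when the target crosses the negative x-axis. The function name joint_angle is made up for this sketch.

```python
import math


def joint_angle(x, y):
    """Angle (radians) a single planar joint needs to point at (x, y)."""
    return math.atan2(y, x)


# Two targets 0.002 apart, on either side of the negative x-axis.
above = joint_angle(-1.0, 0.001)
below = joint_angle(-1.0, -0.001)

target_change = 0.002
angle_change = abs(above - below)

print(f"target moved {target_change}, angle moved {angle_change:.3f} rad")
```

A network trained with MSE on samples from both sides of such a boundary is pulled toward the average of the two angles, which is close to zero here.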
Also, adding batch normalization before each dense layer will help smooth the gradient and provide for more reliable training.
As for the rest of your hyperparameters:
A batch size of 32 is good; a very small batch size will make the gradient too noisy.
The loss function is not critical; both MSE and MAE should work.
The activation functions aren't critical; ReLU is a good default choice.
The default initializers are good enough.
Normalizing is important for Dense layers, so keep it.
Train for as many epochs as you need, as long as both the training and validation loss are dropping. If the validation loss hasn't dropped for 5-10 epochs, you might as well stop early.
Adam is a good default choice. Start with a small learning rate and increase the learning rate at the beginning of training only if the training loss is dropping consistently over several epochs.
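The stopping rule from the list above, sketched in plain Python. The function name should_stop and the default patience of 5 are assumptions for illustration; Keras ships an EarlyStopping callback that plays this role in a real training loop.

```python
def should_stop(val_losses, patience=5):
    """Return True if the best validation loss is at least `patience` epochs old."""
    if len(val_losses) <= patience:
        return False
    best_epoch = val_losses.index(min(val_losses))  # first epoch that reached the minimum
    epochs_since_best = len(val_losses) - 1 - best_epoch
    return epochs_since_best >= patience


# Made-up loss histories, one value per epoch.
improving = [1.0, 0.8, 0.6, 0.5, 0.45, 0.4, 0.38]
stalled = [1.0, 0.8, 0.6, 0.61, 0.62, 0.60, 0.63, 0.61]

print(should_stop(improving))
print(should_stop(stalled))
```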
Further reading: 37 Reasons why your Neural Network is not working
I ended up replacing the first dense layer with a Conv1D layer and the network now seems to be learning decently. It's overfitting to my data, but that's territory I'm okay with.
I'm closing the thread for now, I'll spend some time playing with the architecture.

Low accuracy of DNN created using tf.keras on dataset having small feature set

total train data record: 460000
total cross-validation data record: 89000
number of output class: 392
tensorflow 1.8.0 CPU installation
Each data record has 26 features, where 25 are numeric and one is categorical which is one hot encoded into 19 additional features. Initially, not all feature value was present for each data record. I have used avg to fill missing float type features and most frequent value for missing int type feature. Output can be any of 392 classes labeled as 0 to 391.
Finally, all features are passed through a StandardScaler()
Here is my model:
import numpy as np
import tensorflow as tf
output_class = 392
X_train, X_test, y_train, y_test = get_data()
# y_train and y_test contain ints from 0 to 391
# Make y_train and y_test categorical (one-hot)
y_train = tf.keras.utils.to_categorical(y_train, output_class)
y_test = tf.keras.utils.to_categorical(y_test, output_class)
# Convert to float type
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
# tf.enable_eager_execution() # turned off to use rmsprop optimizer
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(400, activation=tf.nn.relu, input_shape=(44,)))
model.add(tf.keras.layers.Dense(40000, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(392, activation=tf.nn.softmax))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
import logging
logging.getLogger().setLevel(logging.INFO)
model.fit(X_train, y_train, epochs=3)
loss, acc = model.evaluate(X_test, y_test)
print('Accuracy', acc)
But this model gives only 28% accuracy on both training and test data. What should I change here to get good accuracy on both training and test data? Should I go wider and deeper? Or should I consider taking more features?
Note: there were total 400 unique features in the dataset. But most of the features only appeared randomly in 5 to 10 data record. And some features have no relevance in other data records. I picked 26 features based on domain knowledge and frequency in data records.
Any suggestion is appreciated. Thanks.
EDIT: I forgot to add this in the original post. @Neb suggested a narrower, deeper network; I actually tried this. My first model had layers [44, 400, 400, 392]. It gave me around 30% accuracy in training and testing.
Your model is too wide. You have 400 nodes in the first hidden layer and 40,000 in the second, for a total of 400*44 + 40,000*400 + 392*40,000 = 31,697,600 weights (not counting biases). However, you only input 44 features!
Because of this, your net is capable of detecting even the smallest, most imperceptible variations in the inputs, and it ends up treating them as valuable information instead of noise. I'm quite sure that if you leave your network training for a long time (here I only see 3 epochs), it will end up overfitting your training set.
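Parameter counts for a stack of Dense layers can be checked with a few lines of Python. This is a sketch; count_dense_params is a made-up helper, and each layer is assumed to be fully connected to the one before it, so the output layer here sees 40,000 inputs.

```python
def count_dense_params(layer_sizes, include_bias=True):
    """Count trainable parameters in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out          # weight matrix
        if include_bias:
            total += n_out             # one bias per output unit
    return total


wide = [44, 400, 40000, 392]     # the model in the question
narrow = [44, 128, 512, 392]     # a much smaller layout for comparison

print(count_dense_params(wide, include_bias=False))
print(count_dense_params(wide))
print(count_dense_params(narrow))
```

Almost all of the wide model's parameters sit in the two weight matrices that touch the 40,000-unit layer.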
You have some solutions:
Reduce the number of nodes per layer. You may also experiment with adding 1 or 2 new layers. A possible structure might be [44, 128, 512, 392].
Implement regularization. You have multiple ways to do this:
restrict the range in which network parameters live
implement Dropout
implement Batch normalization (which is known to have a small regularization effect)
use Adam Optimizer instead of RMSprop
If your features are somewhat correlated, you may try a CNN instead of a Fully connected network.
Then, to improve generalization you can:
explore the dataset looking for outliers and remove them. An outlier is a sample which can confuse the network or does not convey any additional information.
"randomly" initialize your parameters, e.g using Xavier's Initialization
Finally, I would say: do you really need 392 classes? Could you merge some of them?

What is the relationship between steps and epochs in TensorFlow?

I am going through TensorFlow get started tutorial. In the tf.contrib.learn example, these are two lines of code:
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y, batch_size=4, num_epochs=1000)
estimator.fit(input_fn=input_fn, steps=1000)
I am wondering what is the difference between argument steps in the call to fit function and num_epochs in the numpy_input_fn call. Shouldn't there be just one argument? How are they connected?
I have found that code is somehow taking the min of these two as the number of steps in the toy example of the tutorial.
At least, one of the two parameters either num_epochs or steps has to be redundant. We can calculate one from the other. Is there a way I can know how many steps (number of times parameters get updated) my algorithm actually took?
I am curious about which one takes precedence. And does it depend on some other parameters?
TL;DR: An epoch is when your model goes through your whole training data once. A step is when your model trains on a single batch (or a single sample if you send samples one by one). Training for 5 epochs on 1000 samples with 10 samples per batch will take 500 steps.
The contrib.learn.io module is not documented very well, but it seems that the numpy_input_fn() function takes some numpy arrays and batches them together as input for a classifier. So, the number of epochs probably means "how many times to go through the input data I have before stopping". In this case, they feed two arrays of length 4 in 4-element batches, so it just means that the input function will do this at most 1000 times before raising an "out of data" exception. The steps argument in the estimator fit() function is how many times the estimator should run the training loop. This particular example is somewhat perverse, so let me make up another one to make things a bit clearer (hopefully).
Let's say you have two numpy arrays (samples and labels) that you want to train on. They have 100 elements each. You want your training to take batches with 10 samples per batch. So after 10 batches you will have gone through all of your training data. That is one epoch. If you set your input generator to 10 epochs, it will go through your training set 10 times before stopping, that is, it will generate at most 100 batches.
Again, the io module is not documented, but considering how other input related APIs in tensorflow work, it should be possible to make it generate data for unlimited number of epochs, so the only thing controlling the length of training are going to be the steps. This gives you some extra flexibility on how you want your training to progress. You can go a number of epochs at a time or a number of steps at a time or both or whatever.
Epoch: One pass through the entire data.
Batch size: The number of examples seen in one batch.
If there are 1000 examples and the batch size is 100, then there will be 10 steps per epoch.
The number of epochs and the batch size completely define the number of steps.
steps_cal = (number_of_examples / batch_size) * number_of_epochs
estimator.fit(input_fn=input_fn)
If you just write the above code, then the value of 'steps' is 'steps_cal' from the formula above.
estimator.fit(input_fn=input_fn, steps=steps_less)
If you give a value (say 'steps_less') less than 'steps_cal', then only 'steps_less' steps will be executed. In this case, training will not cover the full number of epochs specified.
estimator.fit(input_fn=input_fn, steps=steps_more)
If you give a value (say 'steps_more') greater than 'steps_cal', then still only 'steps_cal' steps will be executed.
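The rule in this answer amounts to taking the smaller of the two limits. A minimal sketch, assuming the number of examples divides evenly by the batch size; steps_executed is a made-up name, not a TensorFlow function.

```python
def steps_executed(num_examples, batch_size, num_epochs, steps=None):
    """Steps one fit() call runs under the rule described above."""
    steps_cal = (num_examples // batch_size) * num_epochs
    if steps is None:
        return steps_cal            # the input alone decides
    return min(steps, steps_cal)    # whichever limit is hit first


# 1000 examples, batch size 100, 5 epochs gives steps_cal = 50.
print(steps_executed(1000, 100, 5))
print(steps_executed(1000, 100, 5, steps=20))   # steps_less
print(steps_executed(1000, 100, 5, steps=80))   # steps_more
```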
Let's start in the opposite order:
1) Steps - number of times the training loop in your learning algorithm will run to update the parameters in the model. In each loop iteration, it will process a chunk of data, which is basically a batch. Usually, this loop is based on the Gradient Descent algorithm.
2) Batch size - the size of the chunk of data you feed in each loop of the learning algorithm. You can feed the whole data set, in which case the batch size is equal to the data set size. You can also feed one example at a time. Or you can feed some number N of examples.
3) Epoch - the number of times you run over the data set extracting batches to feed the learning algorithm.
Say you have 1000 examples. Setting batch size = 100, epoch = 1 and steps = 200 gives a process with one pass (one epoch) over the entire data set. In each pass it will feed the algorithm a batch with 100 examples. The algorithm will run 200 steps in each batch. In total, 10 batches are seen. If you change the epoch to 25, then it will do this 25 times, and you get 25x10 batches seen altogether.
Why do we need this? There are many variations on gradient descent (batch, stochastic, mini-batch) as well as other algorithms for optimizing the learning parameters (e.g., L-BFGS). Some of them need to see the data in batches, while others see one datum at a time. Also, some of them include random factors/steps, hence you might need multiple passes on the data to get good convergence.
This answer is based on the experimentation I have done on the getting started tutorial code.
Mad Wombat has given a detailed explanation of the terms num_epochs, batch_size and steps. This answer is an extension to his answer.
num_epochs - The maximum number of times the program can iterate over the entire dataset in one train(). Using this argument, we can restrict the number of batches that can be processed during execution of one train() method.
batch_size - The number of examples in a single batch emitted by the input_fn
steps - Number of batches the LinearRegressor.train() method can process in one execution
max_steps is another argument of the LinearRegressor.train() method. This argument defines the maximum number of steps (batches) that can be processed in the LinearRegressor object's lifetime.
Let's see what this means. The following experiments change two lines of the code provided by the tutorial. The rest of the code remains as is.
Note: For all the examples, assume the number of training examples, i.e. the length of x_train, is equal to 4.
Ex 1:
input_fn = tf.estimator.inputs.numpy_input_fn(
{"x": x_train}, y_train, batch_size=4, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, we defined the batch_size = 4 and num_epochs = 2. So, the input_fn can emit just 2 batches of input data for one execution of train(). Even though we defined steps = 10, the train() method stops after 2 steps.
Now, execute the estimator.train(input_fn=input_fn, steps=10) again. We can see that 2 more steps have been executed. We can keep executing the train() method again and again. If we execute train() 50 times, a total of 100 steps have been executed.
Ex 2:
input_fn = tf.estimator.inputs.numpy_input_fn(
{"x": x_train}, y_train, batch_size=2, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, the value of batch_size is changed to 2 (it was equal to 4 in Ex 1). Now, in each execution of train() method, 4 steps are processed. After the 4th step, there are no batches to run on. If the train() method is executed again, another 4 steps are processed making it a total of 8 steps.
Here, the value of steps doesn't matter because the train() method can get a maximum of 4 batches. If the value of steps is less than (num_epochs x training_size) / batch_size, see ex 3.
Ex 3:
input_fn = tf.estimator.inputs.numpy_input_fn(
{"x": x_train}, y_train, batch_size=2, num_epochs=8, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
Now, let batch_size = 2, num_epochs = 8 and steps = 10. The input_fn can emit a total of 16 batches in one run of the train() method. However, steps is set to 10. This means that even though input_fn can provide 16 batches for execution, train() must stop after 10 steps. Of course, the train() method can be re-executed for more steps cumulatively.
From examples 1, 2, & 3, we can clearly see how the values of steps, num_epoch and batch_size affect the number of steps that can be executed by train() method in one run.
The max_steps argument of train() method restricts the total number of steps that can be run cumulatively by train()
Ex 4:
If batch_size = 4 and num_epochs = 2, the input_fn can emit 2 batches for one train() execution. But if max_steps is set to 20, then no matter how many times train() is executed, only 20 steps will run in optimization. This is in contrast to example 1, where the optimizer can run up to 200 steps if the train() method is executed 100 times.
Hope this gives a detailed understanding of what these arguments mean.
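The four experiments can be replayed with a small step counter. This is a toy stand-in for the behaviour described above, not the real tf.estimator code: ToyEstimator and its train() signature are made up for this sketch, and it only counts steps.

```python
class ToyEstimator:
    """Counts steps only; mimics the limits described in the examples."""

    def __init__(self):
        self.global_step = 0  # steps run over the object's lifetime

    def train(self, training_size, batch_size, num_epochs, steps=None, max_steps=None):
        """Simulate one train() call and return how many steps it ran."""
        # Number of batches the input function can emit in this call.
        limit = (training_size * num_epochs) // batch_size
        if steps is not None:
            limit = min(limit, steps)
        if max_steps is not None:
            limit = min(limit, max(0, max_steps - self.global_step))
        self.global_step += limit
        return limit


ex1 = ToyEstimator()
print(ex1.train(4, batch_size=4, num_epochs=2, steps=10))   # Ex 1

ex2 = ToyEstimator()
print(ex2.train(4, batch_size=2, num_epochs=2, steps=10))   # Ex 2

ex3 = ToyEstimator()
print(ex3.train(4, batch_size=2, num_epochs=8, steps=10))   # Ex 3

ex4 = ToyEstimator()
for _ in range(100):                                        # Ex 4
    ex4.train(4, batch_size=4, num_epochs=2, max_steps=20)
print(ex4.global_step)
```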
num_epochs: the maximum number of epochs (seeing each data point).
steps: the number of updates (of parameters).
You can update multiple times in an epoch when the batch size is smaller than the number of training examples.