I have a data set with 30 time steps and 1 input. It has a binary label (i.e., target). So, based on the sequence, I would like to predict whether the sequence is a "1" or "0". Simple!
I build a model like this:
n_steps = 30
n_inputs = 1
model = keras.models.Sequential([
keras.layers.SimpleRNN(8, return_sequences=True, input_shape=[n_steps, n_inputs] ),
keras.layers.SimpleRNN(8, return_sequences=True),
keras.layers.SimpleRNN(8),
keras.layers.Dense(1, activation='sigmoid')
])
It works just fine. When I try to make a prediction, it requires a similar instance (with 30 time steps and 1 input) - which is fine.
Though, I can build the same model like this too (without the input_shape):
n_steps = 30
n_inputs = 1
model = keras.models.Sequential([
keras.layers.SimpleRNN(8, return_sequences=True),
keras.layers.SimpleRNN(8, return_sequences=True),
keras.layers.SimpleRNN(8),
keras.layers.Dense(1, activation='sigmoid')
])
This also trains fine (and generates the same accuracy value). Though, now, I can predict an instance that has different time steps. For example, I can predict an instance with 3 time steps and 1 input.
How is this happening??? Where can I find more information about this? Why doesn't the model require an instance with 30 time steps in this case? If I send an instance with only 3 time steps, how is the model accounting for the missing 27 time steps? Any information is greatly appreciated.
Related
hello everyone i am trying to train a model using cnn and keras but the training don't finish and i got this warning and it stops training , i don't know why and i didn't understand where the problem is can anyone gives me a advice or what i should change in the code
def myModel():
no_Of_Filters=60
size_of_Filter=(5,5) # THIS IS THE KERNEL THAT MOVE AROUND THE IMAGE TO GET THE FEATURES.
# THIS WOULD REMOVE 2 PIXELS FROM EACH BORDER WHEN USING 32 32 IMAGE
size_of_Filter2=(3,3)
size_of_pool=(2,2) # SCALE DOWN ALL FEATURE MAP TO GERNALIZE MORE, TO REDUCE OVERFITTING
no_Of_Nodes = 500 # NO. OF NODES IN HIDDEN LAYERS
model= Sequential()
model.add((Conv2D(no_Of_Filters,size_of_Filter,input_shape=(imageDimesions[0],imageDimesions[1],1),activation='relu'))) # ADDING MORE CONVOLUTION LAYERS = LESS FEATURES BUT CAN CAUSE ACCURACY TO INCREASE
model.add((Conv2D(no_Of_Filters, size_of_Filter, activation='relu')))
model.add(MaxPooling2D(pool_size=size_of_pool)) # DOES NOT EFFECT THE DEPTH/NO OF FILTERS
model.add((Conv2D(no_Of_Filters//2, size_of_Filter2,activation='relu')))
model.add((Conv2D(no_Of_Filters // 2, size_of_Filter2, activation='relu')))
model.add(MaxPooling2D(pool_size=size_of_pool))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(no_Of_Nodes,activation='relu'))
model.add(Dropout(0.5)) # INPUTS NODES TO DROP WITH EACH UPDATE 1 ALL 0 NONE
model.add(Dense(noOfClasses,activation='softmax')) # OUTPUT LAYER
# COMPILE MODEL
model.compile(Adam(lr=0.001),loss='categorical_crossentropy',metrics=['accuracy'])
return model
############################### TRAIN
model = myModel()
print(model.summary())
history=model.fit_generator(dataGen.flow(X_train,y_train,batch_size=batch_size_val),steps_per_epoch=steps_per_epoch_val,epochs=epochs_val,validation_data=(X_validation,y_validation),shuffle=1)
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 20000 batches). You may need to use the repeat() function when building your dataset.
While using generators, you can either run the model without the step_per_epoch parameter and let the model figure out how many steps are there to cover an epoch.
history=model.fit_generator(dataGen.flow(X_train,y_train,batch_size=batch_size_val),
epochs=epochs_val,
validation_data=(X_validation,y_validation),
shuffle=1)
OR
you'll have to calculate steps_per_epoch and use it while training as follows;
history=model.fit_generator(dataGen.flow(X_train,y_train,batch_size=batch_size_val),
steps_per_epoch=(data_samples/batch_size)
epochs=epochs_val,
validation_data=(X_validation,y_validation),
shuffle=1)
Let us know if the issue still persists. Thanks!
I'm learning how to train RNN model on Keras and I was expecting that training a model to predict the Moving Average of the last N steps would be quite easy.
I have a time series with thousands of steps and I'm able to create a model and train it with batches of data.
If I train it with the following model though, the test set predictions differ a lot from real values. (batch = 30, moving average window = 10)
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=False)(inputs)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
To be able to get good predictions, I need to add another layer of TimeDistributed, getting 2D predictions instead of 1D ones (I get one prediction per each time step)
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=True)(inputs)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_labels))(x)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
I suggest that if your goal is to give as input the last 10 timesteps and have as a prediction the moving average to try a regressor model with Densely Connected layers rather than an RNN. (Linear activation with regularization might work well enough)
That option would be cheaper to train and run than an LSTM
I am experimenting with stateful LSTM on a time-series regression problem by using TensorFlow. I apologize that I cannot share the dataset.
Below is my code.
train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))
batch_size = 64
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam',
loss='mse',
metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(train_feature, train_label,
epochs=10,
batch_size=batch_size)
When I run the above code, after the end of the first epoch, I will get an error as follows.
InvalidArgumentError: [_Derived_] Invalid input_h shape: [1,64,50] [1,49,50]
[[{{node CudnnRNN}}]]
[[sequential_1/lstm_1/StatefulPartitionedCall]] [Op:__inference_train_function_1152847]
Function call stack:
train_function -> train_function -> train_function
However, the model will be successfully trained if I change the batch_size to 1, and change the code for model training to the following.
total_epochs = 10
for i in range(total_epochs):
model.fit(train_feature, train_label,
epochs=1,
validation_data=(val_feature, val_label),
batch_size=batch_size,
shuffle=False)
model.reset_states()
Nevertheless, with a very large data (1 million rows), the model training will take a very long time since the batch_size is 1.
So, I wonder, how to train a stateful LSTM with a batch size larger than 1 (e.g. 64), without getting the invalid input_h shape error?
Thanks for your answers.
The fix is to ensure batch size never changes between batches. They must all be the same size.
Method 1
One way is to use a batch size that perfectly divides your dataset into equal-sized batches. For example, if total size of data is 1500 examples, then use a batch size of 50 or 100 or some other proper divisor of 1500.
batch_size = len(data)/proper_divisor
Method 2
The other way is to ignore any batch that is less than the specified size, and this can be done using the TensorFlow Dataset API and setting the drop_remainder to True.
batch_size = 64
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
steps_per_epoch = len(train_feature) // batch_size
model.fit(train_data,
epochs=10, steps_per_epoch = steps_per_epoch)
When using the Dataset API like above, you will need to also specify how many rounds of training count as an epoch (essentially how many batches to count as 1 epoch). A tf.data.Dataset instance (the result from tf.data.Dataset.from_tensor_slices) doesn't know the size of the data that it's streaming to the model, so what constitutes as one epoch has to be manually specified with steps_per_epoch.
Your new code will look like this:
train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))
batch_size = 64
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam',
loss='mse',
metrics=[tf.keras.metrics.RootMeanSquaredError()])
steps_per_epoch = len(train_feature) // batch_size
model.fit(train_data,
epochs=10, steps_per_epoch = steps_per_epoch)
You can also include the validation set as well, like this (not showing other code):
batch_size = 64
val_data = tf.data.Dataset.from_tensor_slices((val_feature, val_label))
val_data = val_data.repeat().batch(batch_size, drop_remainder=True)
validation_steps = len(val_feature) // batch_size
model.fit(train_data, epochs=10,
steps_per_epoch=steps_per_epoch,
validation_steps=validation_steps)
Caveat: This means a few datapoints will never be seen by the model. To get around that, you can shuffle the dataset each round of training, so that the datapoints left behind each epoch changes, giving everyone a chance to be seen by the model.
buffer_size = 1000 # the bigger the slower but more effective shuffling.
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.shuffle(buffer_size=buffer_size, reshuffle_each_iteration=True)
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
Why the error occurs
Stateful RNNs and their variants (LSTM, GRU, etc.) require fixed batch size. The reason is simply because statefulness is one way to realize Truncated Backprop Through Time, by passing the final hidden state for a batch as the initial hidden state of the next batch. The final hidden state for the first batch has to have exactly the same shape as the initial hidden state of the next batch, which requires that batch size stay the same across batches.
When you set the batch size to 64, model.fit will use the remaining data at the end of an epoch as a batch, and this may not have up to 64 datapoints. So, you get such an error because the batch size is different from what the stateful LSTM expects. You don't have the problem with batch size of 1 because any remaining data at the end of an epoch will always contain exactly 1 datapoint, so no errors. More generally, 1 is always a divisor of any integer. So, if you picked any other divisor of your data size, you should not get the error.
In the error message you posted, it appears the last batch has size of 49 instead of 64. On a side note: The reason the shapes look different from the input is because, under the hood, keras works with the tensors in time_major (i.e. the first axis is for steps of sequence). When you pass a tensor of shape (10, 15, 2) that represents (batch_size, steps_per_sequence, num_features), keras reshapes it to (15, 10, 2) under the hood.
While training a neural network, on the fashion mnist dataset, I decided to have a greater number of nodes in my output layer than the number of classes in the dataset.
The dataset has 10 classes, while I trained my network to have 15 nodes in the output layer. I also used a softmax.
Now surprisingly, this gave me an accuracy of 97% which is quite good.
This leads me to the question, what do those extra 5 nodes even mean, and what do they do here?
Why is my softmax able to work properly when the label range(0-9) isn't equal to the number of nodes(15)?
And finally, in general, what does it mean to have more nodes in your output layer than the number of classes, in a classification task?
I understand the effects of having lesser nodes than the number of classes, and also that the rule of thumb is to use number of nodes = number of classes. Yet, I've never seen someone use a greater number of nodes, and I'd like to understand why/why not.
I'm attaching some code so that the results can be reproduced. This was done using Tensorflow 2.3
import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.mnist
(training_images, training_labels) , (test_images, test_labels) = mnist.load_data()
training_images = training_images/255.0
test_images = test_images/255.0
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
tf.keras.layers.Dense(256, activation=tf.nn.relu),
tf.keras.layers.Dense(15, activation=tf.nn.softmax)])
model.compile(optimizer = 'adam',
loss = 'sparse_categorical_crossentropy',
metrics = ['accuracy'])
model.fit(training_images, training_labels, epochs=5)
model.evaluate(test_images, test_labels)
The only reason you are able to use such a configuration is because you have specified your loss function as sparse_categorical_crossentropy.
let's understand the effects of greater output nodes in forward propagation.
Consider a neural network with 2 layers.
1st layer - 6 neurons (Hidden layer)
2nd layer - 4 neurons (output layer)
You have dataset X whose shape is(100*12) ie. 12 features and 100 rows.
you have labels y whose shape is (100,) containing two unique values 0 and 1.
Therefore essentially this is a binary classification problem but we will use 4 neurons in our output layer.
Consider each neuron as a logistic regression unit. Therefore each of your neurons will 12 weights (w1, w2,.....,w12)
Why? - Because you have 12 features.
Each neuron will output a single term given by a. I will give the computation of a in two steps.
z = w1x1 + w2x2 + ........ + w12*x12 + w0 # w0 is bias
a = activation(z)
Therefore, your 1st layer will output 6 values for each row in our dataset.
So now you have a feature matrix of 100 * 6.
This is passed to the 2nd layer and the same process repeats.
So in essence you are able to complete the forward propagation step even when you have more neurons than the actual classes.
Now let's see backpropagation.
For backpropagation to exist you must be able to calculate the loss_value.
we will take a small example:
y_true has two labels as in our problem and y_pred has 4 probability values since we have 4 units in our final layer.
y_true = [0, 1]
y_pred = [[0.03, 0.90, 0.02, 0.05], [0.15, 0.02, 0.8, 0.03]]
# Using 'auto'/'sum_over_batch_size' reduction type.
scce = tf.keras.losses.SparseCategoricalCrossentropy()
scce(y_true, y_pred).numpy() # 3.7092905
How is it calculated:
( log(0.03) + log(0.02) ) / 2
So essentially we can compute the loss so we can also compute its gradients.
Therefore no problem in using backpropagation too.
Therefore our model can very well train and achieve 90 % accuracy.
So the final question, what are these extra neurons representing. ie( neuron 2 and neuron 3).
Ans - They are representing the probability of the example being of class 2 and class 3 respectively. But since the labels contain no values of class 2 and class 3 they will have zero contribution in calculating the loss value.
Note- If you encode your y_label in one-hot-encoding and use categorical_crossentropy as your loss you will encounter an error.
I'm trying to categorize which electronic devices that are turned ON based only the sum of all electricity for my apartment. I have a setup where I measure each watt hour (blink of a LED), so the current consumption in watts have a precision of about 10 seconds, which is great.
I am trying to do this in tensorflow, and in the first iteration I want to use only one input (the total watts, e.g. 200W), and I want to have one output per electronic device. I also use dummy data now to see how it works (and because it would be very troublesome to categorize every measurement to be able to teach the algorithm).
Here is my code now:
import tensorflow as tf
import numpy as np
LABELS = [
'Nothing',
'Toaster', # Toaster uses 800W
'Lamp'] # Lamp uses just 100W
DATA_LENGTH = 20000
np.random.seed(1) # To be able to reproduce
# Create dummy data (1:s or 0:s)
nothing_data = np.array([1] * DATA_LENGTH)
toaster_data = np.random.randint(2, size=DATA_LENGTH)
lamp_data = np.random.randint(2, size=DATA_LENGTH)
labels = np.array(list(zip(nothing_data, toaster_data, lamp_data)))
x_train = (toaster_data * 800 + lamp_data * 100) / 900 # Normalize
y_train = labels
# Split up train and test data
x_test = x_train[15000:]
y_test = y_train[15000:]
x_train = x_train[:15000]
y_train = y_train[:15000]
# The model
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(1, input_dim=1),
tf.keras.layers.Dense(4, activation=tf.nn.relu),
tf.keras.layers.Dense(4, activation=tf.nn.relu),
tf.keras.layers.Dense(3, activation=tf.nn.sigmoid)
])
model.compile(optimizer='adadelta',
loss='binary_crossentropy',
metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=10)
val_loss, val_acc = model.evaluate(x_test, y_test)
print(val_loss, val_acc)
Now to the problem, the val_acc is 1.0, 100%. (val_loss=0.059, val_acc=1.0)
Still when I predict, the predictions are very off.
# Predict
predict_input = [0.88888, 0.111111, 1.0000, 0.222]
predict_output = model.predict(predict_input)
First one should be toaster + nothing, but it also has 33% lamp. I would have liked binary output, if that was possible.
Do I need to have a "nothing" output?
You need to match the model type to your problem. You've applied what is basically a mixed linear regression prediction, to a problem of binary classification. The model is good if you want to predict that wattage, given the appliances turned on, but it's not so good in the opposite direction.
It's going to try all sorts of things with the paucity of data given and the freedom inherent in the model. Note that you really have only four training inputs: making multiple copies in equal amounts doesn't really make your training better.
Most of all, why are you not doing this with the "sum to target" algorithm, a much simpler and more effective way to solve the problem. The presented problem isn't really a ML sort of problem.
If you simply want to do this by training a model, then build one with multiple binary outputs. You can research "multiple labels" for leads on how to do so. If you're doing it only for a handful of appliances in your home, you might want to beat it to death with 2^n output states, and not worry about the structural accuracy.