I have time-series data (ECG) with annotations for blocks of 30 seconds.
Each block has 1000 data points, and we have 500 of those blocks.
The targets (the annotations) are, e.g., in the range 1 to 5.
To be clear, please see the figure.
About X-data
How do I translate that into the Keras notation for input data [samples, timesteps, features]?
My guess:
samples = blocks (500)
timesteps = values (1000)
features = the ECG itself (1)
resulting in [500, 1000, 1]
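A minimal NumPy sketch of that layout, assuming the 500 blocks are already stored as a 2-D array of shape (500, 1000). The name `blocks` and the random values are placeholders for the real ECG.

```python
import numpy as np

# placeholder for the real ECG: 500 blocks of 1000 samples each
blocks = np.random.rand(500, 1000)

# add a trailing feature axis to get [samples, timesteps, features]
X = blocks.reshape(500, 1000, 1)  # equivalently: blocks[..., np.newaxis]
print(X.shape)  # (500, 1000, 1)
```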
About y-data (target)
My target (y) data would have the shape
[500, 1, 1]
and after one-hot encoding it would be
[500, 5, 1]
The problem is that Keras expects the X and y data to have the same dimensions. But expanding my y data to 1000 values per block (one per timestep) would not make sense to me.
Thanks for your help.
P.S. I cannot answer directly as I am with my parent-in-law. Thanks in advance.
I think you're thinking about y incorrectly. From my understanding, based on your graph,
y is actually (500, 5) after one-hot encoding. That is, for every block there is a single outcome.
Also, there is no need for X and y to have the same dimensions in Keras (unless you have a seq2seq requirement, which is not the case here).
What we do want is the model to give us a probability distribution over
the possible labels for each block, and that we'll achieve using a softmax
on the last (Dense) layer.
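For the target side, a small NumPy sketch of the one-hot encoding. The random `annotations` array is a placeholder for the real labels; labels 1..5 are shifted to column indices 0..4.

```python
import numpy as np

# placeholder annotations: one label in 1..5 per block
annotations = np.random.randint(1, 6, size=500)

# row i of np.eye(5) is the one-hot vector for class i
y = np.eye(5)[annotations - 1]
print(y.shape)  # (500, 5)
```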
Here is how I simulated your problem:
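import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, LSTM
from keras.utils import to_categorical
# 500 blocks x 1000 timesteps x 1 feature (the raw ECG value)
series = np.random.rand(500, 1000, 1)
# one random label in 1..5 per block, shifted to 0..4 and one-hot encoded -> shape (500, 5)
labels = to_categorical(np.random.randint(1, 6, size=500) - 1, num_classes=5)
inp = Input(shape=(1000, 1))
# the LSTM returns only its last hidden state: shape (batch, 128)
lstm = LSTM(128)(inp)
# softmax gives a probability distribution over the 5 labels for each block
out = Dense(5, activation='softmax')(lstm)
model = Model(inputs=[inp], outputs=[out])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(series, labels)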
Related
I have a set of observations made of 10 features, each a real number in the interval (0, 2). Say I wanted to train a simple neural network to classify whether the average of those features is above or below 1.0.
Unless I'm missing something, a two-layer network with one neuron in each layer should be enough. The activation functions would be a linear one (i.e. no activation function) on the first layer and a sigmoid on the output layer. An example of a network with this architecture that would work is one that calculates the average in the first layer (i.e. all weights = 0.1 and bias = 0) and assesses whether that is above or below 1.0 in the second layer (i.e. weight = 1.0 and bias = -1.0).
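As a sanity check of that claim, here is a NumPy-only sketch (no TensorFlow) that plugs the hand-picked weights into the two layers and measures accuracy. The seed and data generation are assumptions that mirror the question's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10000, 10)) * 2.0
y = (X.mean(axis=1) >= 1.0).astype(int)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# layer 1: one linear unit computing the mean (all weights 0.1, bias 0)
h = X @ np.full(10, 0.1) + 0.0
# layer 2: one sigmoid unit (weight 1.0, bias -1.0)
p = sigmoid(1.0 * h - 1.0)

# threshold the sigmoid output at 0.5, as a binary accuracy metric would
pred = (p >= 0.5).astype(int)
accuracy = (pred == y).mean()
print(accuracy)
```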
When I implement this using TensorFlow (see code below), I quickly get a very high accuracy, but never reach 100%. I would like some help understanding conceptually why this is the case. I don't see why the backpropagation algorithm does not reach a set of optimal weights (maybe this is related to the loss function I'm using, which has local minima?). I would also like to know whether 100% accuracy is achievable if I use different activations and/or a different loss function.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
X = [np.random.random(10)*2.0 for _ in range(10000)]
X = np.array(X)
y = X.mean(axis=1) >= 1.0
y = y.astype('int')
train_ratio = 0.8
train_len = int(X.shape[0] * train_ratio)
X_train, X_test = X[:train_len, :], X[train_len:, :]
y_train, y_test = y[:train_len], y[train_len:]
def create_classifier(lr=0.001):
    classifier = tf.keras.Sequential()
    # first layer: 1 unit, linear activation (ideally learns the mean)
    classifier.add(tf.keras.layers.Dense(units=1))
    # output layer: 1 unit, sigmoid (ideally learns the threshold at 1.0)
    classifier.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    # no trailing comma here: it would wrap the list in a tuple
    metrics = [tf.keras.metrics.BinaryAccuracy()]
    classifier.compile(optimizer=optimizer,
                       loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                       metrics=metrics)
    return classifier
classifier = create_classifier(lr = 0.1)
history = classifier.fit(X_train, y_train, batch_size=1000, validation_split=0.1, epochs=2000)
Ignoring the fact that a neural network is an odd approach for this problem, and answering your specific question: it looks like your learning rate (0.1 with Adam) might be too high, which could explain the fluctuations around the optimal point.
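A complementary observation, offered as a NumPy sketch rather than as part of the answer above. With a sigmoid output and binary cross-entropy, even the hand-built perfect solution from the question is not a minimum of the loss. Scaling the output weight and bias by a factor `s` (a name introduced here for illustration) keeps the same decision boundary but keeps lowering the loss. So the optimizer has no finite point to settle on, which may be part of why training never comes to rest. The seed and data mirror the question's setup and are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10000, 10)) * 2.0
y = (X.mean(axis=1) >= 1.0).astype(int)

def bce(p, y, eps=1e-12):
    # binary cross-entropy averaged over samples (clipped to avoid log(0))
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

h = X @ np.full(10, 0.1)  # the "mean" unit from the question
losses = {}
for s in [1.0, 10.0, 100.0]:
    # output weight s and bias -s: same boundary at h = 1.0, sharper sigmoid
    z = s * (h - 1.0)
    losses[s] = bce(1.0 / (1.0 + np.exp(-z)), y)
    print(s, losses[s])
```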
I would like to use TFP to write a neural network where the output are the probabilities of a categorical variable with 3 classes, and train it using the negative log-likelihood.
As I'm taking my first steps with TF and TFP, I started with a toy model where the input layer has only 1 unit receiving a null input, and the output layer has 3 units with a softmax activation function. The idea is that the biases should learn (up to an additive constant) the log of the probabilities.
Here below is my code: true_p are the true parameters I use to generate the data and would like to learn, while learned_p is what I get from the NN.
import numpy as np
import tensorflow as tf
from tensorflow import keras
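# nll comes from a local module (functions.py) that isn't shown; assumed here to be the usual TFP negative log-likelihood
def nll(y_true, y_pred_dist):
    return -y_pred_dist.log_prob(y_true)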
from tensorflow.keras.optimizers import SGD
import tensorflow.keras.layers as layers
import tensorflow_probability as tfp
tfd = tfp.distributions
# params
true_p = np.array([0.1, 0.7, 0.2])
n_train = 1000
# training data
x_train = np.array(np.zeros(n_train)).reshape((n_train,))
y_train = np.array(np.random.choice(len(true_p), size=n_train, p=true_p)).reshape((n_train,))
# model
input_layer = layers.Input(shape=(1,))
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)
model_p = keras.models.Model(inputs=input_layer, outputs=p_y)
model_p.compile(SGD(), loss=nll)
# training
hist_p = model_p.fit(x=x_train, y=y_train, batch_size=100, epochs=3000, verbose=0)
# check result
learned_p = np.round(model_p.layers[1].call(tf.constant([0], shape=(1, 1))).numpy(), 3)
learned_p
With this setup, I get the result:
>>> learned_p
array([[0.005, 0.989, 0.006]], dtype=float32)
I over-estimate the second category and can't really distinguish between the first and the third one. What's worse, if I plot the probabilities at the end of each epoch, they appear to converge monotonically to the vector [0,1,0], which doesn't make sense (it seems to me the gradient should push in the opposite direction once I start to over-estimate).
I really can't figure out what's going on here, but have the feeling I'm doing something plain wrong. Any idea? Thank you for your help!
For the record, I also tried using other optimizers like Adam or Adagrad playing with the hyper-params, but with no luck.
I'm using Python 3.7.9, TensorFlow 2.3.1 and TensorFlow probability 0.11.1
I believe the default (first positional) argument to tfd.Categorical is not the vector of probabilities but the vector of logits (values you would take the softmax of to get probabilities). This helps maintain precision in internal Categorical computations such as log_prob. I think you can simply remove the softmax activation from the Dense layer, and it should work. Please update if it doesn't!
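A NumPy sketch of why this produces the [0, 1, 0] drift (plain NumPy, no TFP; `softmax` and `expected_nll` are helpers defined here). If softmax outputs are read as logits, a second softmax is applied on top. Because the first softmax confines its outputs to [0, 1], the largest probability any class can get is e / (2 + e) ≈ 0.576, reached at [0, 1, 0]. Under that mis-specified likelihood, [0, 1, 0] even scores better than outputting true_p itself.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

true_p = np.array([0.1, 0.7, 0.2])

def expected_nll(first_softmax):
    # the first softmax output is read as logits, so the model's probabilities are softmax(first_softmax)
    implied_p = softmax(first_softmax)
    return -np.sum(true_p * np.log(implied_p))

# the Dense+softmax output the question converged to
converged = np.array([0.0, 1.0, 0.0])
print(np.round(softmax(converged), 3))  # [0.212 0.576 0.212]
# outputting true_p itself scores worse under this likelihood
print(expected_nll(converged) < expected_nll(true_p))  # True
```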
EDIT: alternatively you can replace the tfd.Categorical with
lambda p: tfd.Categorical(probs=p)
but you'll lose the aforementioned precision gains. Just wanted to clarify that passing probs is an option, just not the default.
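A NumPy sketch of what the corrected model does, under these assumptions:

- Only the output biases matter, because the input is always 0.
- The Dense output is used directly as logits, as suggested above.
- Plain full-batch gradient descent stands in for SGD.

With these, minimising the mean categorical NLL over the logits drives softmax(logits) to the empirical class frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.1, 0.7, 0.2])
y_train = rng.choice(len(true_p), size=1000, p=true_p)
freq = np.bincount(y_train, minlength=3) / len(y_train)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# the Dense biases; with a constant 0 input the kernel never contributes
logits = np.zeros(3)
lr = 1.0
for _ in range(2000):
    # gradient of the mean NLL w.r.t. the logits: softmax(logits) - empirical frequencies
    logits -= lr * (softmax(logits) - freq)

learned_p = softmax(logits)
print(np.round(learned_p, 3), np.round(freq, 3))
```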
I know that an LSTM layer expects a 3-dimensional input (samples, timesteps, features). But along which of these dimensions is the data treated as a sequence?
From reading some sites, I understood that it is the timesteps dimension, so I tried to create a simple problem to test this.
In this problem, the LSTM model needs to sum the values along the timesteps dimension. Assuming that the model takes the previous values along the timesteps dimension into account, it should output the sum of the values.
I tried to fit with 4 samples and the result was not good. Does my reasoning make sense?
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
X = np.array([
[5.,0.,-4.,3.,2.],
[2.,-12.,1.,0.,0.],
[0.,0.,13.,0.,-13.],
[87.,-40.,2.,1.,0.]
])
X = X.reshape(4, 5, 1)
y = np.array([[6.],[-9.],[0.],[50.]])
model = Sequential()
model.add(LSTM(5, input_shape=(5, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=1000, batch_size=4, verbose=0)
print(model.predict(np.array([[[0.],[0.],[0.],[0.],[0.]]])))        # true sum: 0
print(model.predict(np.array([[[10.],[-10.],[10.],[-10.],[0.]]])))  # true sum: 0
print(model.predict(np.array([[[10.],[20.],[30.],[40.],[50.]]])))   # true sum: 150
output:
[[-2.2417212]]
[[7.384143]]
[[0.17088854]]
First of all, yes, you're right that the timestep dimension is the one treated as the data sequence.
Next, I think there is some confusion about what you mean by this line:
"assuming that the model will consider the previous values of the timestep"
In any case, an LSTM doesn't look back at the previous input values directly. At each time step, it takes the current input together with its own output (its hidden and cell state) from the previous time step.
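A hand-written NumPy sketch of that idea. It uses a deliberately simple linear recurrence instead of a real LSTM cell, an assumption made only for illustration. At each step the cell sees the current input plus the state carried over from the previous step. With both weights set to 1, that state is exactly the running sum the question is after.

```python
import numpy as np

X = np.array([
    [5., 0., -4., 3., 2.],
    [2., -12., 1., 0., 0.],
    [0., 0., 13., 0., -13.],
    [87., -40., 2., 1., 0.],
]).reshape(4, 5, 1)

def linear_rnn(x, w_in=1.0, w_state=1.0):
    # x has shape (samples, timesteps, features); h is the carried-over state
    h = np.zeros(x.shape[0])
    for t in range(x.shape[1]):
        # new state = weighted current input + weighted previous state
        h = w_in * x[:, t, 0] + w_state * h
    return h

print(linear_rnn(X))  # [ 6. -9.  0. 50.]
```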
Also, the reason your output is wrong is that you're using a very small dataset to train the model. Recall that, no matter which machine learning algorithm you use, it will need many data points. In your case, 4 data points are not enough to train the model. I used a slightly larger number of parameters, and here are the sample results.
However, remember that there is a small problem here. I initialised the training data between 0 and 50, so predictions for numbers outside this range won't be accurate anymore: the farther a number is from this range, the lower the accuracy. This is because the task has become more of a function-mapping problem than addition. By function mapping, I mean that your model learns to map the values seen in the training set (provided it's trained for enough epochs) to outputs. You can learn more about it here.
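The larger training set used in the answer isn't shown. One way to build such a set is sketched below, assuming uniform random values in [0, 50) and the same 5-step, 1-feature layout as the question; the sample count is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, timesteps = 10000, 5  # sample count chosen arbitrarily for illustration

# inputs drawn from [0, 50): predictions far outside this range will extrapolate poorly
X = rng.uniform(0, 50, size=(n_samples, timesteps, 1))
# target: the sum over the timesteps dimension, shape (n_samples, 1)
y = X.sum(axis=1)
print(X.shape, y.shape)  # (10000, 5, 1) (10000, 1)
```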
I'm trying to use Keras to implement part of an algorithm that requires weight clipping, i.e. limiting the weight values after a gradient update. I haven't found any solutions through web searches so far.
For background, this has to do with the WGANs algorithm:
https://arxiv.org/pdf/1701.07875.pdf
If you look at algorithm 1 on page 8, you'll see the following:
I've highlighted the lines that I'm trying to implement in Keras: after computing a gradient to use to update the weights in the network, I want to make sure that all the weights are clipped between some values [-c, c] that I can set.
How could I go about doing this in Keras?
For reference I am using the TensorFlow backend. I don't mind digging into things and adding messy quick-fixes for now.
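The step being asked about is a gradient update followed by clipping every weight into [-c, c]. It can be sketched in plain NumPy, independent of Keras. In this sketch:

- A plain gradient step stands in for the paper's RMSProp update; the sign depends on whether you minimise or maximise.
- `c`, `lr` and the random arrays are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
c, lr = 0.01, 5e-5
w = rng.normal(scale=0.05, size=(30, 100))  # critic weights (placeholder)
g = rng.normal(size=w.shape)                # gradient of the critic loss (placeholder)

# 1) ordinary gradient step (plain SGD here instead of RMSProp)
w = w - lr * g
# 2) weight clipping: force every entry back into [-c, c]
w = np.clip(w, -c, c)
print(np.abs(w).max() <= c)  # True
```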
While creating the optimizer object, set the clipvalue parameter. Note that this clips the gradients, not the weights themselves (see the edit below).
from keras.optimizers import RMSprop
# all parameter gradients will be clipped to
# a maximum value of 0.5 and
# a minimum value of -0.5.
rmsprop = RMSprop(clipvalue=0.5)
and then use this object when compiling the model:
model.compile(loss='mse', optimizer=rmsprop)
For more details, see the Keras optimizers documentation.
Also, I prefer clipnorm over clipvalue, because with clipnorm the optimization remains more stable. For example, say you have 2 parameters and the gradients come out as [0.1, 3]. With clipvalue=0.5 the gradients become [0.1, 0.5], so the direction of steepest descent can change drastically. clipnorm doesn't have this problem: the gradients are scaled down together, so their direction is preserved while the constraint on the gradient's magnitude is still enforced.
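The [0.1, 3] example can be checked with a few NumPy lines. This only illustrates the idea behind clipvalue and clipnorm on a single gradient vector; it is not Keras's own implementation.

```python
import numpy as np

g = np.array([0.1, 3.0])  # the gradient from the example
limit = 0.5

# clipvalue: clip each component independently
g_value = np.clip(g, -limit, limit)

# clipnorm: rescale the whole vector if its L2 norm exceeds the limit
norm = np.linalg.norm(g)
g_norm = g * (limit / norm) if norm > limit else g

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(g_value, cosine(g, g_value))  # direction changes
print(g_norm, cosine(g, g_norm))    # direction preserved
```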
Edit: the question asks about weight clipping, not gradient clipping:
Clipping the weights themselves is not built into Keras, but a max-norm weight constraint is (see the Keras constraints documentation).
Having said that, weight clipping can easily be implemented as a custom constraint. Here is a very small example:
from keras.constraints import Constraint
from keras import backend as K
class WeightClip(Constraint):
    '''Clips every weight of the constrained tensor to the range [-c, c]
    '''
    def __init__(self, c=2):
        self.c = c
    def __call__(self, p):
        # applied to the weights after each optimizer update
        return K.clip(p, -self.c, self.c)
    def get_config(self):
        return {'name': self.__class__.__name__,
                'c': self.c}
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
# kernel_constraint was called W_constraint in Keras 1
model.add(Dense(30, input_dim=100, kernel_constraint=WeightClip(2)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')
X = np.random.random((1000, 100))
Y = np.random.random((1000, 1))
model.fit(X, Y)
I have tested that the above code runs, but not the validity of the constraints. You can check them by getting the model weights after training with model.get_weights() or model.layers[idx].get_weights() and checking whether they abide by the constraints.
Note: the constraint is not added to all the model weights, only to the weights of the specific layer it is used on. Also, W_constraint (kernel_constraint in Keras 2) adds a constraint to the W param, and b_constraint (bias_constraint in Keras 2) to the b (bias) param.
I am trying to implement a layer that is not fully connected. I have a matrix that specifies the connectivity I want in the variable connectivity_matrix, which is a numpy array of ones and zeros.
The way I am currently trying to implement the layer is by element-wise multiplying the weights by this connectivity matrix F:
Is this the correct way to do this in TensorFlow? Here is what I have so far:
import numpy as np
import tensorflow as tf
import tflearn
num_input = 10
num_layer1 = 313
num_output = 700
# For example:
connectivity_matrix = np.array(np.random.choice([0, 1], size=(num_layer1, num_output)), dtype='float32')
input = tflearn.input_data(shape=[None, num_input])
# Here is where I specify the connectivity in tensorflow
connectivity = tf.constant(connectivity_matrix, shape=[num_layer1, num_output])
# One basic, fully connected layer
layer1 = tflearn.fully_connected(input, num_layer1, activation='relu')
# Here is where I want to have a non-fully connected layer
W = tf.Variable(tf.random_uniform([num_layer1, num_output]))
b = tf.Variable(tf.zeros([num_output]))
# so take a fully connected W and multiply it element-wise by the connectivity matrix
W_filtered = tf.multiply(connectivity, W)  # tf.mul was renamed tf.multiply in TF 1.0
output = tf.matmul(layer1, W_filtered) + b
Masking out unwanted connections in each iteration should work, but I am not sure what the convergence properties are like. Maybe it's okay for a small enough learning rate?
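One reason masking should work: the mask is applied in the forward pass, so the chain rule multiplies the weight gradient by the same mask. The masked weights therefore receive exactly zero gradient and are never changed by plain gradient descent. Below is a NumPy sketch with a hypothetical squared-error loss and small random shapes in place of the real layers.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))                 # batch of layer1 activations (placeholder)
W = rng.normal(size=(5, 3))
mask = rng.choice([0.0, 1.0], size=(5, 3))  # connectivity matrix
target = rng.normal(size=(8, 3))

# forward pass with the masked weights
out = x @ (mask * W)
# hypothetical loss = 0.5 * sum((out - target)**2), so dL/dout = out - target
d_out = out - target
# chain rule through out = x @ (mask * W): dL/dW = (x.T @ d_out) * mask
d_W = (x.T @ d_out) * mask

print(np.all(d_W[mask == 0] == 0))  # True: masked weights get zero gradient
```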
Another approach would be to penalize unwanted weights in the cost function. You would use a mask matrix with 1s at unwanted connections and 0s at wanted ones (or a smoother transition). This would be multiplied element-wise by the weights, squared, scaled, and added to the cost function. This should converge more smoothly.
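A NumPy sketch of that penalty term and its gradient. The names `penalty_mask` and `lam` are hypothetical. `penalty_mask` is 1 at unwanted connections and 0 at wanted ones, as described above. The result would be added to the network's usual loss.

```python
import numpy as np

def connectivity_penalty(W, penalty_mask, lam=1e-3):
    # only weights at unwanted positions contribute; they are squared and scaled
    return lam * np.sum((penalty_mask * W) ** 2)

def connectivity_penalty_grad(W, penalty_mask, lam=1e-3):
    # gradient of the penalty: pulls unwanted weights towards 0, leaves wanted ones alone
    return 2.0 * lam * penalty_mask ** 2 * W

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
connectivity = rng.choice([0.0, 1.0], size=(4, 3))  # 1 = wanted connection
penalty_mask = 1.0 - connectivity                  # 1 = unwanted connection

penalty = connectivity_penalty(W, penalty_mask)
print(penalty)
```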
P.S.: If you've made progress on this, it would be great to hear your comments as I am also working on this problem.