I have a Keras/TF network where two branches are built:
- one where a short sequence of words is transformed into 300-dim embeddings
- the other where the same sequence of words is transformed into ngrams
I then end up with two data structures:
termwords.shape = (?, 42, 300)
termngrams.shape = (?, 42)
(I make sure that both branches have the same 'length' of 42, i.e. at most 42 words and at most 42 ngrams, padding/cutting where needed.) I'd then need to merge these into one branch to arrive at the prediction layer.
But
merged = merge([termwords, termngrams], mode='concat')
tells me that the ranks don't match. I was hoping concat would allow me to append 'termngrams' to 'termwords' so that I end up with a data structure of shape (?, 42, 301). But I can't find the proper way to express that.
The "rank" error is telling you that the tensors don't have the same number of dimensions. One is 2D and the other is 3D.
Use a Lambda layer with expand_dims to add an extra dimension to the 2D one.
import keras.backend as K
from keras.layers import Lambda
termngrams = Lambda(lambda x: K.expand_dims(x))(termngrams)  # outputs (?, 42, 1)
Then use a Concatenate() layer (by default it uses the last axis, as you want).
merged = Concatenate()([termwords, termngrams])
(This assumes you're using a functional API Model rather than Sequential models; Sequential models aren't good for branching.)
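Putting it together, here is a minimal end-to-end sketch; the Input layers and the final Dense head are assumptions added just to make it runnable:

import keras.backend as K
from keras.layers import Input, Lambda, Concatenate, Flatten, Dense
from keras.models import Model

words_in = Input(shape=(42, 300))   # (?, 42, 300) word-embedding branch
ngrams_in = Input(shape=(42,))      # (?, 42) ngram branch, rank 2

termngrams = Lambda(lambda x: K.expand_dims(x))(ngrams_in)  # (?, 42, 1)
merged = Concatenate()([words_in, termngrams])              # (?, 42, 301)

out = Dense(1, activation='sigmoid')(Flatten()(merged))  # hypothetical head
model = Model(inputs=[words_in, ngrams_in], outputs=out)
model.summary()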
I'm doing Reinforcement Learning to teach agents to accomplish tasks in a 2-dimensional world. A big part of that is to figure out how to represent their environment as neurons.
So far I've represented the world as a 3-d grid of shape (10, 10, 7). The first two 10s are because the size of the grid is 10 in each direction, and the 7 is because I have 7 different kinds of things to say about each space (whether it has food, an enemy, a wall, ...).
I then used convolutional layers in Keras to process this information and learn from it. It worked and the creatures are successfully walking towards the food.
Now I would like to also add more information that the neural network might figure out how to use. For example, I'd like to encode the last action the agent took. I might also encode the distance or angle to the nearest food. Obviously, this is not 3-d data, this is a sequence of 1-d data.
I want Keras to be able to use that as input together with the 3-d input, and learn from it. I've represented that combined data as a structured array in NumPy:
observation = np.zeros((1,), dtype=[('grid', np.float64, (10, 10, 7)), ('sequential', np.float64, (7,))])
That way it's possible to access the grid data as observation['grid'] and the sequential data as observation['sequential'].
Unfortunately I don't know how to get Keras to work with this kind of structured array. My reasoning is that I should build a model using the Functional API, and that model will have two "prongs" for the input that'll connect together to a concatenate at some point and get merged to a final output layer.
But, I have no idea how to make Keras figure out that the NumPy structured array should be broken down to the subarrays that it's made of. Is that possible?
If I'm going the wrong way with this, please advise.
In Keras you can give different inputs as follows:
from keras.models import Model
from keras.layers import Input, Conv2D, Dense, Flatten, concatenate

first_input = Input(shape=(10, 10, 7))
second_input = Input(shape=(7,))

c1 = Conv2D(32, (3, 3), padding='same')(first_input)
c1 = Flatten()(c1)
d1 = Dense(10)(second_input)

m = concatenate([c1, d1])
m = Dense(5, activation='softmax')(m)  # softmax to match categorical_crossentropy

model = Model(inputs=[first_input, second_input], outputs=m)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit([observation['grid'], observation['sequential']], Y_train)
This is a rough design, but it will take your two inputs, concatenate them, and produce a result.
I'm working with padded sequences of maximum length 50. I have two types of sequence data:
1) A sequence, seq1, of integers (1-100) that correspond to event types (e.g. [3, 6, 3, 1, 45, 45, ..., 3]).
2) A sequence, seq2, of integers representing time, in minutes, from the last event in seq1. So the last element is zero, by definition. For example: [100, 96, 96, 45, 44, 12, ..., 0]. seq1 and seq2 are the same length, 50.
I'm trying to run the LSTM primarily on the event (seq1) data, but have the time (seq2) data strongly influence the forget gate within the LSTM. The reason for this is that I want the LSTM to heavily penalize older events and be more likely to forget them. I was thinking about multiplying the forget weight by the inverse of the current value of the time (seq2) sequence, or maybe 1/(seq2_element + 1) to handle cases where it's zero minutes.
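For what it's worth, here is a quick NumPy illustration of that 1/(t+1) decay on a few values like those in seq2 (just the arithmetic, not the LSTM change itself):

import numpy as np

seq2 = np.array([100, 96, 45, 12, 0], dtype=np.float64)
decay = 1.0 / (seq2 + 1.0)  # recent events (small t) keep a weight near 1
print(decay)                # approx. [0.0099 0.0103 0.0217 0.0769 1.0]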
I see in the keras code (LSTMCell class) where the change would have to be:
f = self.recurrent_activation(x_f + K.dot(h_tm1_f, self.recurrent_kernel_f))
So I need to modify Keras' LSTM code to accept multiple inputs. As an initial test, within the LSTMCell class, I changed the call function to look like this:
def call(self, inputs, states, training=None):
    time_input = inputs[1]
    inputs = inputs[0]
So that it can handle two inputs given as a list.
When I try running the model with the Functional API:
# Input 1: event type sequences
# Take the event integer sequences, run them through an embedding layer to get float vectors, then run through LSTM
main_input = Input(shape=(max_seq_length,), dtype='int32', name='main_input')
x = Embedding(output_dim=embedding_length, input_dim=num_unique_event_symbols, input_length=max_seq_length, mask_zero=True)(main_input)

## Input 2: time vectors
auxiliary_input = Input(shape=(max_seq_length, 1), dtype='float32', name='aux_input')
m = Masking(mask_value=99999999.0)(auxiliary_input)

lstm_out = LSTM(32)(x, time_vector=m)
# Auxiliary loss here from first input
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
# An arbitrary number of dense, hidden layers here
x = Dense(64, activation='relu')(lstm_out)
# The main output node
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
## Compile and fit the model
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'], loss_weights=[1., 0.2])
print(model.summary())
np.random.seed(21)
model.fit([train_X1, train_X2], [train_Y, train_Y], epochs=1, batch_size=200)
However, I get the following error:
An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 50, 1), ndim=3)]; however `cell.state_size` is (32, 32)
Any advice?
You can't pass a list of inputs to the default recurrent layers in Keras. The input_spec is fixed, and the recurrent code is implemented around a single tensor input, as also pointed out in the documentation; i.e., it doesn't magically iterate over two inputs with the same timesteps and pass them to the cell. This is partly because of how the iterations are optimised and the assumptions made if the network is unrolled, etc.
If you want 2 inputs, you can pass constants (doc) to the cell, which will pass the tensor along as-is. This is mainly there to implement attention models in the future. So one input will iterate over timesteps while the other will not. If you really want two inputs to be iterated together like zip() in Python, you will have to implement a custom layer; a rough sketch follows.
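To make the custom-layer route concrete, here is a minimal sketch under loudly-stated assumptions: TimeDecayCell is a made-up toy tanh cell (not a full LSTM), and the 1/(t+1) decay scaling the recurrent contribution merely stands in for the forget-gate idea. It sidesteps the single-tensor restriction by concatenating the time channel onto the event features, so one tensor iterates over timesteps and the cell splits it apart again at every step:

import keras.backend as K
from keras.layers import Layer, RNN, Input, Embedding, Concatenate
from keras.models import Model

class TimeDecayCell(Layer):
    # Toy cell: the recurrent term is scaled by 1/(t+1), so older events fade.
    def __init__(self, units, **kwargs):
        self.units = units
        self.state_size = units
        super(TimeDecayCell, self).__init__(**kwargs)

    def build(self, input_shape):
        feat_dim = input_shape[-1] - 1  # the last channel is the elapsed time
        self.kernel = self.add_weight(name='kernel',
                                      shape=(feat_dim, self.units),
                                      initializer='glorot_uniform')
        self.recurrent_kernel = self.add_weight(name='recurrent_kernel',
                                                shape=(self.units, self.units),
                                                initializer='orthogonal')
        super(TimeDecayCell, self).build(input_shape)

    def call(self, inputs, states):
        x, t = inputs[:, :-1], inputs[:, -1:]  # split event features / time
        decay = 1.0 / (t + 1.0)                # the questioner's 1/(t+1) idea
        h = K.tanh(K.dot(x, self.kernel)
                   + decay * K.dot(states[0], self.recurrent_kernel))
        return h, [h]

events = Input(shape=(50,), dtype='int32')
times = Input(shape=(50, 1), dtype='float32')
x = Embedding(input_dim=101, output_dim=16)(events)  # (?, 50, 16)
x = Concatenate(axis=-1)([x, times])                 # (?, 50, 17), zip-like
h = RNN(TimeDecayCell(32))(x)
model = Model(inputs=[events, times], outputs=h)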
I would like to throw in a couple of different ideas here; they don't require you to modify the Keras code.
After the embedding layer for the event types, stack the embeddings together with the elapsed time. The Keras layer for this is keras.layers.Concatenate(axis=-1). Imagine this: a single event type is mapped to an n-dimensional vector by the embedding layer. You just add the elapsed time as one more dimension after the embedding, so that it becomes an (n+1)-dimensional vector.
Another idea, somewhat related to your problem/question and which may help here, is 1D convolution. The convolution can happen right after the concatenated embeddings. The intuition for applying convolution to event types and elapsed time is that of 1x1 convolution: you linearly combine the two together, and the parameters are trained. Note that in convolution terms, the dimensions of the vectors are called channels. Of course, you can also convolve more than one event at a step. Just try it; it may or may not help. A sketch combining both ideas follows.
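A rough sketch of both ideas together, with hypothetical layer sizes and vocabulary size (101 event symbols, 16-dimensional embeddings): the elapsed time becomes one extra channel after the embedding, and a kernel-size-1 Conv1D linearly mixes the channels before the LSTM.

from keras.layers import Input, Embedding, Concatenate, Conv1D, LSTM, Dense
from keras.models import Model

events = Input(shape=(50,), dtype='int32')
times = Input(shape=(50, 1), dtype='float32')
x = Embedding(input_dim=101, output_dim=16)(events)  # (?, 50, 16)
x = Concatenate(axis=-1)([x, times])                 # (?, 50, 17): n+1 channels
x = Conv1D(filters=17, kernel_size=1)(x)             # 1x1 conv mixes channels
out = Dense(1, activation='sigmoid')(LSTM(32)(x))
model = Model(inputs=[events, times], outputs=out)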
I have timeseries data (ECG). I have annotations for blocks of 30 seconds.
Each block has 1000 data points, and we have 500 of those data blocks.
The targets (the annotations) are in the range 1 to 5.
To be clear, please see the figure.
About X-DATA
How do I translate that into the Keras notation for input data, [samples, timesteps, features]?
My guess:
samples = blocks (500)
timesteps = values (1000)
features = the ECG signal itself (1)
resulting in [500, 1000, 1]
About Y-DATA (target)
My target or y data would result in
[500, 1, 1]
and after one-hot encoding it would be
[500, 5, 1]
The problem is that Keras expects the X and y data to have the same dimensions. But increasing my y data to 1000 per timestep would not make sense to me.
Thanks for your help
P.S. I cannot answer directly as I am with my parents-in-law. Thanks in advance.
I think you're thinking about y incorrectly. From my understanding, based on your graph,
y actually is (500, 5) after one-hot encoding. That is, for every block there is a single outcome.
Also, there is no need for X and y to have the same dimensions in Keras (unless you have a seq2seq requirement, which is not the case here).
What we do want is for the model to give us a probability distribution over the possible labels for each block, and we'll achieve that using a softmax on the last (Dense) layer.
Here is how I simulated your problem:
import numpy as np
from keras.models import Model
from keras.layers import Dense, Input, LSTM

series = np.random.rand(500, 1000, 1)
# using eye doesn't capture real one-hot labels, but it works for the example
labels = np.eye(500, 5)

inp = Input(shape=(1000, 1))
lstm = LSTM(128)(inp)
out = Dense(5, activation='softmax')(lstm)

model = Model(inputs=[inp], outputs=[out])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(series, labels)
I have a pretrained seq-to-seq slot tagger network which in its simplest form is as follows:
Network_1 = Sequential([
    Embedding(emb_dim),
    Recurrence(LSTM(LSTM_dim)),
    Dense(num_labels)
])
I would like to use the output of this as initial layers in another network. Basically I would like to concatenate the embeddings from the network_1 (pretrained) to an embedding layer in the network_2 as follows:
Network_2 = Sequential([
    Concat_embeddings(Embedding(emb_dim), Network_1_embed()),
    Recurrence(LSTM(LSTM_dim)),
    (Label('encoded_h'), Label('encoded_c'))
])
def Network_1_embed():
    loaded_model = load_model(path_to_network_1_saved_model)
    cloned_model = loaded_model.clone(CloneMethod.freeze)
    return cloned_model

def Concat_embeddings(emb1, emb2):
    X = Placeholder()
    return splice(emb1(X), emb2(X))
This is giving me the following error
ValueError: Times: The 1 leading dimensions of the right operand with shape '[50360]' do not match the left operand's trailing dimensions with shape '[293]'
For reference, we get [293] since emb_dim=256, and num_network_1_labels=37, while [50360] is the vocabulary size of the network_2 input. The Network_1 also had the same vocabulary mapping when being trained, so it can take the same input, and output a 37 dimensional vector for each token.
How do I make this work?
Thanks
I think your problem is that you are using the entire Network_1 as the embedding, instead of just its embedding layer.
One way would be to define embed separately and train it through Network_1:
embed = Embedding(emb_dim)
Network_1 = Sequential([
    embed,
    Recurrence(LSTM(LSTM_dim)),
    Dense(num_labels)
])
Then train Network_1, but save embed:
embed.save(EMBED_PATH)
Explanation: since Network_1 just invokes embed, they share parameters, so training Network_1 will train embed's parameters. Saving embed then gives you the embedding layer trained by Network_1. Quite straightforward, actually.
Then, to train your second model (in a second script), load embed from disk and just use it:
Network_1_embed = load_model(EMBED_PATH)
Network_2 = Sequential([
    (Embedding(emb_dim), Network_1_embed()),
    splice,
    Recurrence(LSTM(LSTM_dim)),
    (Label('encoded_h'), Label('encoded_c'))
])
Note the use of a function tuple as the first item passed to Sequential(). The tuple means to apply both functions to the same input, and generates two outputs, which are then the input to the subsequent function, splice.
To keep embed constant, clone it with Freeze option as you already did in your example.
(I am not in front of a computer with the latest CNTK and cannot test this, so it is possible that I made a mistake.)
I'm doing matrix factorization in TensorFlow, and I want to use coo_matrix from scipy.sparse because it uses less memory and makes it easy to put all my data into a matrix for the training data.
Is it possible to use coo_matrix to initialize a variable in tensorflow?
Or do I have to create a session and feed the data into TensorFlow using sess.run() with a feed_dict?
I hope you understand my question and my problem; otherwise, comment and I will try to clarify.
The closest thing TensorFlow has to scipy.sparse.coo_matrix is tf.SparseTensor, which is the sparse equivalent of tf.Tensor. It will probably be easiest to feed a coo_matrix into your program.
A tf.SparseTensor is a slight generalization of COO matrices, in which the tensor is represented as three dense tf.Tensor objects (a small constructed example follows the list):
indices: An N x D matrix of tf.int64 values in which each row represents the coordinates of a non-zero value. N is the number of non-zeroes, and D is the rank of the equivalent dense tensor (2 in the case of a matrix).
values: A length-N vector of values, where element i is the value of the element whose coordinates are given on row i of indices.
dense_shape: A length-D vector of tf.int64, representing the shape of the equivalent dense tensor.
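For instance, a hypothetical 3x4 matrix with two non-zero entries would be represented like this:

import tensorflow as tf

# Dense equivalent: [[0, 2, 0, 0],
#                    [0, 0, 0, 5],
#                    [0, 0, 0, 0]]
st = tf.SparseTensor(indices=[[0, 1], [1, 3]],  # N=2 non-zeros, D=2 coordinates
                     values=[2.0, 5.0],         # one value per row of indices
                     dense_shape=[3, 4])        # shape of the dense equivalent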
For example, you could use the following code, which uses tf.sparse_placeholder() to define a tf.SparseTensor that you can feed, and a tf.SparseTensorValue that represents the actual value being fed:
import numpy as np
import scipy.sparse
import tensorflow as tf

sparse_input = tf.sparse_placeholder(dtype=tf.float32, shape=[100, 100])
# ...
train_op = ...

coo_matrix = scipy.sparse.coo_matrix(...)

# Wrap `coo_matrix` in the `tf.SparseTensorValue` form that TensorFlow expects.
# SciPy stores the row and column coordinates as separate vectors (`row` and
# `col`), so we must stack and transpose them to make an indices matrix of the
# appropriate shape.
tf_coo_matrix = tf.SparseTensorValue(
    indices=np.array([coo_matrix.row, coo_matrix.col]).T,
    values=coo_matrix.data,
    dense_shape=coo_matrix.shape)
Once you have converted your coo_matrix to a tf.SparseTensorValue, you can feed sparse_input with the tf.SparseTensorValue directly:
sess.run(train_op, feed_dict={sparse_input: tf_coo_matrix})