keras: add constant number to all input values - tensorflow

I want to create a simple toy model in Keras. The model should take an input, add 1 to every element, and produce the result as its output.
I found an example using Keras, but it requires two inputs:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# create model
input1 = layers.Input(shape=(2,))
input2 = layers.Input(shape=(2,))
added = layers.Add()([input1, input2])
model = keras.models.Model(inputs=[input1, input2], outputs=added)
# run inference
input_shape = (2,)
x1 = tf.ones(input_shape)
x2 = tf.ones(input_shape)
y = model([x1, x2])
However, I need the model to only have a single input and simply increase every input value by 1, for example.

You can replace the second input of your toy model with a call to tf.ones_like:
input1 = layers.Input(shape=())
added = layers.Add()([input1, tf.ones_like(input1)])
model = keras.models.Model(inputs=input1, outputs=added)
tf.ones_like creates a tensor full of ones of the shape of the tensor passed as an argument. As this op depends only on the shape of the input tensor, you can technically create your network without a specified input shape, and it will accept any shape as input:
>>> model(3)
<tf.Tensor: shape=(), dtype=float32, numpy=4.0>
>>> model(tf.ones((1,2,3)))
<tf.Tensor: shape=(1, 2, 3), dtype=float32, numpy=
array([[[2., 2., 2.],
        [2., 2., 2.]]], dtype=float32)>
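For comparison, a Lambda layer can express the same thing without building a tensor of ones. The following is only a minimal sketch, assuming TensorFlow 2.x with its bundled Keras; the name add_one_model is made up for illustration and is not part of the answer above.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Single input with two features per sample, as in the question.
inputs = layers.Input(shape=(2,))
# Lambda wraps an arbitrary tensor function; here it adds 1 to every element.
outputs = layers.Lambda(lambda t: t + 1.0)(inputs)
add_one_model = keras.Model(inputs=inputs, outputs=outputs)  # hypothetical name

# Run inference on a batch that contains one sample of ones.
y = np.asarray(add_one_model(tf.ones((1, 2))))
print(y)
```

Unlike the tf.ones_like version, this sketch fixes the per-sample shape to (2,), so it trades flexibility for a more explicit model definition.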

How to interpret get_weights for Keras GRU?

I am unable to interpret the results of get_weights from a GRU layer. Here's my code -
#Modified from - https://machinelearningmastery.com/understanding-simple-recurrent-neural-networks-in-keras/
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
model = Sequential()
model.add(GRU(units = 2, input_shape = (3,1), activation = 'linear'))
model.add(Dense(units = 1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
initial_weights = model.layers[0].get_weights()
print("Shape = ",initial_weights)
I am familiar with GRU concepts. In addition, I understand how get_weights works for the Keras SimpleRNN layer, where the first array holds the input weights, the second the recurrent weights, and the third the bias. However, I am lost with the output for GRU, which is given below:
Shape = [array([[-0.64266175, -0.0870676 , -0.25356603, -0.03685969, 0.22260845,
-0.04923642]], dtype=float32), array([[ 0.01929092, -0.4932567 , 0.3723044 , -0.6559699 , -0.33790302,
0.27062896],
[-0.4214194 , 0.46456426, 0.27233726, -0.00461334, -0.6533575 ,
-0.32483965]], dtype=float32), array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]], dtype=float32)]
I am assuming it has something to do with GRU gates.
Update:7/4 - This page says that keras GRU has 3 gates, update, reset and output. However, based on this, GRU shouldn't have the output gate.

The best way I know is to track the add_weight() calls in the build() function of GRUCell.
Let's take an example model,
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.GRU(32, input_shape=(5, 10), name='gru'),
        tf.keras.layers.Dense(10)
    ]
)
Now we'll print some metadata about what's returned by weights = model.get_layer('gru').get_weights(), which gives:
Number of arrays in weights: 3
Shape of each array in weights: [(10, 96), (32, 96), (2, 96)]
Let's go back to the weights defined by GRUCell. We have:
self.kernel = self.add_weight(
    shape=(input_dim, self.units * 3),
    ...
)
self.recurrent_kernel = self.add_weight(
    shape=(self.units, self.units * 3),
    ...
)
...
bias_shape = (2, 3 * self.units)
self.bias = self.add_weight(
    shape=bias_shape,
    ...
)
This is what you're seeing as weights (in that order). Here's why they are shaped like this. GRU computations are outlined here.
The first matrix in weights (of shape [10, 96]) is a concatenation of Wz|Wr|Wh (in that order). Each of these is a [10, 32] sized tensor. Concatenation gives a [10, 32*3=96] sized tensor.
Similarly, the second matrix is a concatenation of Uz|Ur|Uh. Each of these is a [32, 32] sized tensor which becomes [32, 96] after concatenation.
You can see how they break this combined weight matrix to each of z, r and h components here.
Finally, the bias. It is a [2, 96] sized tensor holding two biases, input_bias and recurrent_bias. Again, the biases of all gates are combined into a single tensor. Typically, only the input_bias is used, but if reset_after (which decides how the reset gate is applied) is set to True, the recurrent_bias is used as well. It's an implementation detail.
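The slicing described above can be sketched with NumPy alone. The arrays below are random stand-ins with the shapes reported for the example layer (units=32, input_dim=10); with a real model they would come from get_weights(). The z, r, h order follows the description above, and the variable names are my own.

```python
import numpy as np

units, input_dim = 32, 10
rng = np.random.default_rng(0)

# Stand-ins for: kernel, recurrent_kernel, bias = model.get_layer('gru').get_weights()
kernel = rng.normal(size=(input_dim, 3 * units))
recurrent_kernel = rng.normal(size=(units, 3 * units))
bias = rng.normal(size=(2, 3 * units))

# Split each concatenated matrix along its last axis into the z, r and h parts.
W_z, W_r, W_h = np.split(kernel, 3, axis=-1)
U_z, U_r, U_h = np.split(recurrent_kernel, 3, axis=-1)

# The first bias row is the input bias, the second the recurrent bias.
input_bias, recurrent_bias = bias
b_z, b_r, b_h = np.split(input_bias, 3)

print(W_z.shape, U_z.shape, b_z.shape)  # (10, 32) (32, 32) (32,)
```

Applied to the question's layer (units=2, one input feature), the same split turns the (1, 6) kernel into three (1, 2) blocks.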

I'm getting error (Inputs to a layer should be tensors) when using tf.data.Dataset and the Window creation function

The problem I'm stuck on is an error in the fit method when training a neural network on a dataset generated with the tf.data.Dataset.window function.
My training dataset is too big to fit in memory, and I have to train on data arranged into windows. For this reason, the dataset is loaded through the tf.data.experimental.CsvDataset function.
Each row of the dataset is a sequence of numeric values, where the first 7 values contain labels and the next 100 values contain features. Only one value is used as the label; the remaining 6 are omitted and serve only for additional experiments with the quality of training.
import tensorflow as tf
from tensorflow import keras
XLength = 107
types = [tf.constant(0, dtype=tf.float32)]
ds = tf.data.experimental.CsvDataset(train_file_list, types*XLength, header=False, field_delim = ";", compression_type="GZIP")
The pack_row function extracts the value at index 3 of each row as the label, along with the 100 feature values:
PLength = 7  # number of leading label columns (inferred from the description above)
def pack_row(*row):
    label = row[3]
    features = tf.stack(row[PLength:XLength], 1)
    return features, label
Next, we map the rows into a dataset of (features, labels) pairs and add the window function:
window_ds_train = ds.batch(1000).map(pack_row, num_parallel_calls=4).unbatch().window(10, shift=1, drop_remainder=True)
The features dataset looks like this:
for x in window_ds_train.take(1):
    for n in x[0]:
        print(n)
tf.Tensor(
[1.1039783 1.1163003 1.1081576 1.1117266 1.1180297 1.2345679 1.3053098
1.3443557 1.3639535 1.26 1.2604042 1.1780168 1.1761158 1.2451861
1.4478064 1.4914197 1.35623 1.4864376 1.4237918 1.4029851 1.434866
1.1298449 1.0216535 1.0060976 1.0190678 1.0550661 0.99117 0.8632287
0.7545455 0.7396314 0.7372093 0.7226107 0.7727273 0.766129 1.0083683
1.5096774 1.4933333 1.2517985 1.537037 1.6262627 1.5851064 1.2197802
1.1764706 1.6491228 4.631579 5.25 4.7 4.3333335 4.
3.5714285 0.28 0.25 0.2307692 0.212766 0.1904762 0.2159091
0.606383 0.85 0.8198198 0.6308725 0.6149068 0.6506024 0.7988506
0.6696429 0.6623932 0.9917012 1.3052632 1.2941177 1.383871 1.3564669
1.3520249 1.3253012 1.1584415 1.0089086 0.9478079 0.981289 0.9939394
0.9788054 0.8850772 0.6969292 0.7127659 0.7023498 0.6727494 0.7373381
0.6705021 0.6907001 0.8030928 0.8502564 0.8488844 0.7933962 0.7936508
0.7331628 0.7438507 0.7661017 0.81 0.8944306 0.8995017 0.9023987
0.8958163 0.9058149], shape=(100,), dtype=float32)
tf.Tensor(
[1.0480686 1.0768552 1.0823635 1.0807899 1.0946314 1.1049724 1.0976744
1.1112158 1.1066037 1.0180608 1.0143541 1.0478215 1.1168385 1.1465721
1.1544029 1.1672772 1.0481482 1.0198511 0.9598997 1.0053476 1.1888889
0.9557377 0.8722689 0.9482759 0.948718 0.9485149 0.9144603 0.7938144
0.6960168 0.6963124 0.7188209 0.7328605 0.6848341 0.686747 0.589242
0.5806451 0.5614035 0.4371859 0.483965 0.4721408 0.7163461 0.8951613
0.8403361 0.8703704 1.1428572 0.9264706 0.7460318 0.65 0.5925926
0.9615384 1.04 1.6875 1.5384616 1.3404255 1.0793651 0.875
1.1489362 1.19 1.1171172 1.3959732 2.1180124 2.066265 2.2873564
1.78125 1.7222222 1.6970954 1.4561404 1.4602076 1.3645161 1.3911672
1.4361371 1.436747 1.2597402 1.0935411 1.0542798 1.054054 1.0545454
1.1464355 1.0463122 0.8411215 0.9946808 1.0417755 0.9805353 0.9540636
0.8566946 0.8662487 0.872165 0.8953846 0.9543611 0.9858491 0.9822596
0.9036658 0.8999152 0.9110169 0.905 0.9135495 0.9252492 0.9239041
0.9286301 0.954136 ], shape=(100,), dtype=float32)
I had to omit some of the output because it is too large; the full window has shape (10, 100).
The labels look like this:
for x in window_ds_train.take(1):
    for n in x[1]:
        print(n)
tf.Tensor(-0.21, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
Next, I would like to make a flat_map transformation to a dataset, but when I try to execute:
flatten = window_ds_train.flat_map(lambda x:x.batch(10))
of course, I get an error: TypeError: <lambda>() takes 1 positional argument but 2 were given, since both features and labels are bundled inside the dataset, and the function can apparently only process one component.
The model I'm trying to train looks like this:
inputs = keras.Input(shape=(100))
x = keras.layers.Dense(204, activation='relu')(inputs)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(400, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(400, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(204, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(102, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(10)(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss = 'mse', metrics="mae")
If I then run training:
model.fit(window_ds_train, epochs=1, verbose=1)
I get an error: TypeError: Inputs to a layer should be tensors. Got: <_VariantDataset shapes: (100,), types: tf.float32>
Accordingly, I understand that the incoming data must be a tensor, while it is of type _VariantDataset, which is not acceptable.
To work around this problem, I attempted to split the dataset into features and labels and process them in separate flat_map threads. To do this, I had to additionally introduce two functions, the first returns features, and the second labels:
def label_row(*row):
    label = row[3]
    return label
def features_row(*row):
    features = tf.stack(row[PLength:XLength], 1)
    return features
Next, we form a data set with window functions for features and labels separately for each:
feature_flatten = feature_window_ds_train.flat_map(lambda x:x.batch(10))
label_flatten = label_window_ds_train.flat_map(lambda x:x.batch(10))
When trying to train a model:
history = model.fit(feature_flatten, label_flatten, epochs=1, verbose=1)
I get the error: y argument is not supported when using dataset as input
Evidently, the model expects a single Dataset that yields both x and y; here I pass x separately from y, which is not supported.
If someone has ideas on how to train a model that accepts the output of Dataset.window as input, I would be very grateful for clarification.

Let's first create a dataset compatible with your model:
N = 50
c = 1
ds = tf.data.Dataset.from_tensor_slices(
    (
        tf.random.normal(shape=(N, c, 100)),
        tf.random.normal(shape=(N, c))
    )
)
Then we can simply call
model.fit(ds, epochs=1)
But notice that the return type of window is not the same as the initial dataset: ds is a dataset of tuples, whereas each dsw is a tuple of _VariantDatasets.
print(ds)
# <TensorSliceDataset shapes: ((1, 100), (1,)), types: (tf.float32, tf.float32)>
for dsw in ds.window(30):
    print(dsw)
# (<_VariantDataset shapes: (1, 100), types: tf.float32>, <_VariantDataset shapes: (1,), types: tf.float32>)
# (<_VariantDataset shapes: (1, 100), types: tf.float32>, <_VariantDataset shapes: (1,), types: tf.float32>)
What you can do to get a window of the dataset with the same type is to combine skip and take:
def simple_window(ds, size):
    # cardinality() returns a scalar tensor, so convert it to a Python int
    for start in range(0, int(ds.cardinality()), size):
        yield ds.skip(start).take(size)
Then you can train on the different windows:
for dsw in simple_window(ds, 30):
    model.fit(dsw, epochs=1)
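Separately from the skip/take approach, the flat_map error in the question goes away when the mapped function takes one argument per dataset component. A common idiom batches each component and zips them back together. This is only a sketch on random stand-in data (25 rows, window size 10), assuming TensorFlow 2.x in eager mode; to_tensors is a hypothetical helper name.

```python
import tensorflow as tf

# Stand-in data: 25 rows with 100 features and one scalar label each.
features = tf.random.normal((25, 100))
labels = tf.random.normal((25,))
ds = tf.data.Dataset.from_tensor_slices((features, labels))

# Each element of `windows` is a tuple of two nested datasets.
windows = ds.window(10, shift=1, drop_remainder=True)

def to_tensors(feature_window, label_window):
    # Turn each nested dataset into a single tensor, then pair them up again.
    return tf.data.Dataset.zip(
        (
            feature_window.batch(10, drop_remainder=True),
            label_window.batch(10, drop_remainder=True),
        )
    )

flat = windows.flat_map(to_tensors)

first_x, first_y = next(iter(flat))
num_windows = sum(1 for _ in flat)
print(first_x.shape, first_y.shape, num_windows)  # (10, 100) (10,) 16
```

Each element of flat is then an ordinary (features, labels) pair of tensors rather than a pair of nested datasets. Whether those shapes suit a particular model is a separate question.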

How does BatchNormalization work on an example?

I am trying to understand batchnorm.
My humble example
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.,4.]])
out = layer1(x)
print(out)
Prints
tf.Tensor([[2.99850112 3.9980015 ]], shape=(1, 2), dtype=float64)
My attempt to reproduce it
e=0.001
m = np.sum(x)/2
b = np.sum((x - m)**2)/2
x_=(x-m)/np.sqrt(b+e)
print(x_)
It prints
[[-0.99800598 0.99800598]]
What am I doing wrong?

There are two problems here.
First, batch norm has two "modes": training, where normalization is done via the batch statistics, and inference, where normalization is done via "population statistics" that are collected from batches during training. By default, Keras layers/models run in inference mode, and you need to pass training=True in their call to change this (there are other ways, but that is the simplest one).
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.,4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
This prints tf.Tensor([[0. 0.]], shape=(1, 2), dtype=float32). Still not right!
Second, batch norm normalizes over the batch axis, separately for each feature. However, the way you specify the input (as a 1x2 array) is basically a single input (batch size 1) with two features. Batch norm just normalizes each feature to mean 0 (standard deviation is not defined). Instead, you want two inputs with a single feature:
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.],[4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
This prints
tf.Tensor(
[[-0.99800634]
[ 0.99800587]], shape=(2, 1), dtype=float32)
Alternatively, specify the "feature axis":
layer1 = tf.keras.layers.BatchNormalization(axis=0, scale=False, center=False)
x = np.array([[3.,4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
Note that the input shape is "wrong", but we told batchnorm that axis 0 is the feature axis (it defaults to -1, the last axis). This will also give the desired result:
tf.Tensor([[-0.99800634 0.99800587]], shape=(1, 2), dtype=float32)
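The numbers in this thread can be reproduced with plain NumPy. This is only a sketch of the arithmetic, assuming the Keras defaults (epsilon=0.001, moving mean initialised to 0, moving variance initialised to 1) together with scale=False and center=False; the function names are made up for illustration.

```python
import numpy as np

EPS = 1e-3  # assumed Keras default for BatchNormalization(epsilon=...)

def batchnorm_inference(x, moving_mean=0.0, moving_var=1.0, eps=EPS):
    # Inference mode: normalize with the stored population statistics.
    return (x - moving_mean) / np.sqrt(moving_var + eps)

def batchnorm_training(x, eps=EPS):
    # Training mode: normalize each feature with statistics over the batch axis.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

one_sample = np.array([[3.0, 4.0]])     # batch size 1, two features
two_samples = np.array([[3.0], [4.0]])  # batch size 2, one feature

inference_out = batchnorm_inference(one_sample)  # approx [[2.9985, 3.9980]]
train_one = batchnorm_training(one_sample)       # [[0., 0.]]
train_two = batchnorm_training(two_samples)      # approx [[-0.998], [0.998]]
print(inference_out, train_one, train_two, sep="\n")
```

The first result matches the original output of the question, the second the all-zero output, and the third the value the asker computed by hand.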

Tensorflow tf.nn.embedding_lookup

Is there a small neural network inside tf.nn.embedding_lookup?
When I train on some data, the value at the same index keeps changing.
So is it also trained while I'm training my model?
I checked the official embedding_lookup code, but I cannot see any tf.Variables for training the embedding parameters.
But when I print all tf.Variables, I can find a Variable within the embedding scope.
Thank you.

Yes, the embedding is learned. You can look at the tf.nn.embedding_lookup operation as doing the following matrix multiplication more efficiently:
import tensorflow as tf
import numpy as np
NUM_CATEGORIES, EMBEDDING_SIZE = 5, 3
y = tf.placeholder(name='class_idx', shape=(1,), dtype=tf.int32)
RS = np.random.RandomState(42)
W_em_init = RS.randn(NUM_CATEGORIES, EMBEDDING_SIZE)
W_em = tf.get_variable(name='W_em',
                       initializer=tf.constant_initializer(W_em_init),
                       shape=(NUM_CATEGORIES, EMBEDDING_SIZE))
# Using tf.nn.embedding_lookup
y_em_1 = tf.nn.embedding_lookup(W_em, y)
# Using multiplication
y_one_hot = tf.one_hot(y, depth=NUM_CATEGORIES)
y_em_2 = tf.matmul(y_one_hot, W_em)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
sess.run([y_em_1, y_em_2], feed_dict={y: [1]})
# [array([[ 1.5230298 , -0.23415338, -0.23413695]], dtype=float32),
# array([[ 1.5230298 , -0.23415338, -0.23413695]], dtype=float32)]
The variable W_em will be trained in exactly the same way irrespective of whether you use y_em_1 or y_em_2 formulation; y_em_1 is likely to be more efficient, though.
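The equivalence does not depend on TensorFlow. As a minimal NumPy sketch using the same table size and seed as above, indexing rows of the table and multiplying by a one-hot matrix give the same result (the variable names lookup, one_hot and product are mine):

```python
import numpy as np

NUM_CATEGORIES, EMBEDDING_SIZE = 5, 3
W_em = np.random.RandomState(42).randn(NUM_CATEGORIES, EMBEDDING_SIZE)
idx = np.array([1])

# Lookup: select the rows of the table directly.
lookup = W_em[idx]

# Same result through a one-hot matrix product.
one_hot = np.eye(NUM_CATEGORIES)[idx]
product = one_hot @ W_em

print(np.allclose(lookup, product))  # True
```

The lookup touches a single row, while the product multiplies the whole table by a mostly-zero matrix, which is why the lookup form is usually the cheaper one.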

Initializing LSTM hidden state Tensorflow/Keras

Can someone explain how I can initialize the hidden state of an LSTM in TensorFlow? I am trying to build an LSTM recurrent autoencoder, so once that model is trained I want to transfer the learned hidden state of the unsupervised model to the hidden state of the supervised model.
Is that even possible with the current API?
This is the paper I am trying to recreate:
http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf

Yes, this is possible, but it is truly cumbersome. Let's go through an example.
Defining a model:
from keras.layers import LSTM, Input
from keras.models import Model
input = Input(batch_shape=(32, 10, 1))
lstm_layer = LSTM(10, stateful=True)(input)
model = Model(input, lstm_layer)
model.compile(optimizer="adam", loss="mse")
It's important to build and compile the model first, because the initial states are reset during compilation. Moreover, you need to specify a batch_shape with a fixed batch_size, because in this scenario the network must be stateful (which is done by setting stateful=True).
Now we could set the values of initial states:
import numpy
import keras.backend as K
hidden_states = K.variable(value=numpy.random.normal(size=(32, 10)))
cell_states = K.variable(value=numpy.random.normal(size=(32, 10)))
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
Note that you need to provide the states as Keras variables. states[0] holds the hidden states and states[1] holds the cell states.
Hope that helps.

As stated in the Keras API documentation for recurrent layers (https://keras.io/layers/recurrent/):
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states with the keyword argument states. The value of states should be a numpy array or list of numpy arrays representing the initial state of the RNN layer.
Since the LSTM layer has two states (hidden state and cell state) the value of initial_state and states is a list of two tensors.
Examples
Stateless LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
import tensorflow as tf
import numpy as np
inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8)
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
Stateful LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
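Note that for a stateful LSTM you also need to specify the batch size (here through batch_input_shape).
import tensorflow as tf
import numpy as np
from pprint import pprint
inputs = np.random.random([1, 10, 1]).astype(np.float32)
# batch_input_shape = (batch, timesteps, features); this is the tf.keras 2.x argument
lstm = tf.keras.layers.LSTM(8, stateful=True, batch_input_shape=(1, 10, 1))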
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
With a stateful LSTM, the states are not reset at the end of each sequence, and we can see that the output of the layer corresponds to the hidden state (i.e. lstm.states[0]) at the last timestep:
>>> pprint(outputs)
<tf.Tensor: id=821, shape=(1, 8), dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>
>>>
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.14726108, 0.13584498, -0.12986949, -0.22309153, 0.0125412 ,
-0.11446435, 0.22290672, 0.05397629]], dtype=float32)>]
Calling reset_states() makes it possible to reset the states:
>>> lstm.reset_states()
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>]
>>>
or to set them to a specific value:
>>> lstm.reset_states(states=[h_0, c_0])
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>]
>>>
>>> pprint(h_0)
<tf.Tensor: id=422, shape=(1, 8), dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>
>>>
>>> pprint(c_0)
<tf.Tensor: id=421, shape=(1, 8), dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>
>>>

I used this approach, and it worked for me:
lstm_cell = LSTM(cell_num, return_state=True)
output, h, c = lstm_cell(input, initial_state=[h_prev, c_prev])
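A self-contained version of that snippet might look as follows. It is only a sketch, assuming TensorFlow 2.x; the sizes and the zero/one starting states are arbitrary stand-ins for states taken from another model.

```python
import numpy as np
import tensorflow as tf

batch, timesteps, features, units = 1, 10, 1, 8

inputs = tf.constant(np.random.random([batch, timesteps, features]).astype(np.float32))
# Stand-in initial states; in practice these would come from the pretrained model.
h_prev = tf.zeros([batch, units])
c_prev = tf.ones([batch, units])

# return_state=True makes the layer return the final hidden and cell states too.
lstm = tf.keras.layers.LSTM(units, return_state=True)
output, h, c = lstm(inputs, initial_state=[h_prev, c_prev])

output, h, c = (np.asarray(t) for t in (output, h, c))
print(output.shape, h.shape, c.shape)  # (1, 8) (1, 8) (1, 8)
```

Because return_sequences is left at its default of False, output is the hidden state at the last timestep and therefore equals h.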

Assuming an RNN is in layer 1 and the hidden/cell states are numpy arrays, you can do this:
from keras import backend as K
K.set_value(model.layers[1].states[0], hidden_states)
K.set_value(model.layers[1].states[1], cell_states)
States can also be set using
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
but when I did it this way my state values stayed constant even after stepping the RNN.