How to define positional arguments: 'op', 'value_index', and 'dtype' in a tensor? - tensorflow

When I run the piece of code below
import tensorflow as tf
from tensorflow.python.ops.numpy_ops import np_config
np_config.enable_numpy_behavior()
import pandas as pd
df = pd.DataFrame(
    {'x': [1., 2., 3., 4.],
     'y': [1.59, 4.24, 2.38, 0.53]}
)
data = tf.data.Dataset.from_tensor_slices(df.to_numpy())
data = data.flat_map(lambda x: x.reshape((2, 1)))
I receive:
TypeError: __init__() missing 3 required positional arguments: 'op', 'value_index', and 'dtype'. I understand why this is happening, as I didn't define values for 'op', 'value_index', and 'dtype', so TensorFlow can't produce the tensors.
Basically, I want to use the flat_map function to create tensors with shape=(1,) and dtype=tf.float64, such that when I run
for item in data:
    print(item)
the printed tensors look like:
tf.Tensor([1.], shape=(1,), dtype=float64)
tf.Tensor([2.], shape=(1,), dtype=float64)
tf.Tensor([3.], shape=(1,), dtype=float64)
tf.Tensor([4.], shape=(1,), dtype=float64)
tf.Tensor([1.59], shape=(1,), dtype=float64)
tf.Tensor([4.24], shape=(1,), dtype=float64)
tf.Tensor([2.38], shape=(1,), dtype=float64)
tf.Tensor([0.53], shape=(1,), dtype=float64)
How can I specify those values inside the flat_map function, or in any other function that I can pass to flat_map?
I checked here https://www.tensorflow.org/api_docs/python/tf/Tensor but unfortunately I couldn't come up with a solution.
Thanks!
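A possible fix (a sketch, not from the original thread): the error appears because the function passed to flat_map must return a tf.data.Dataset, while x.reshape((2, 1)) returns a plain tensor. Wrapping the reshaped row in from_tensor_slices yields the shape-(1,) float64 tensors:
import tensorflow as tf
import pandas as pd

df = pd.DataFrame({'x': [1., 2., 3., 4.],
                   'y': [1.59, 4.24, 2.38, 0.53]})
data = tf.data.Dataset.from_tensor_slices(df.to_numpy())
# Each (2,) row is reshaped to (2, 1) and sliced into two (1,) tensors.
data = data.flat_map(
    lambda x: tf.data.Dataset.from_tensor_slices(tf.reshape(x, (2, 1))))
for item in data:
    print(item)  # tf.Tensor([1.], shape=(1,), dtype=float64), then [1.59], [2.], ...
Note that this iterates row-wise (1., 1.59, 2., 4.24, ...); to get the column-wise order shown above, you could instead slice the transposed array, e.g. tf.data.Dataset.from_tensor_slices(df.to_numpy().T.reshape(-1, 1)).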

Related

How can I use Tensorflow's GlorotUniform Initializer with stateless semantics?

How can I use the GlorotUniform Initializer with stateless semantics? In other words, I would like GlorotUniform to produce the same result on different calls. The following code does not work:
import tensorflow as tf
tf.random.set_seed(1234)
initializer = tf.keras.initializers.GlorotUniform(seed=3)
print (initializer(shape=(2, 2)))
initializer = tf.keras.initializers.GlorotUniform(seed=3)
print (initializer(shape=(2, 2)))
which produces
tf.Tensor(
[[1.1279136 0.19878006]
[0.34682322 1.1320969 ]], shape=(2, 2), dtype=float32)
tf.Tensor(
[[ 0.9531394 0.22104084]
[ 0.41438842 -1.1447294 ]], shape=(2, 2), dtype=float32)
I understand one can use tf.random.stateless_uniform, but that is not Glorot uniform.
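One workaround (a sketch, not an official Keras initializer; the helper name stateless_glorot_uniform is made up here for illustration): reproduce the Glorot uniform formula on top of tf.random.stateless_uniform, which is deterministic for a fixed seed pair:
import tensorflow as tf

def stateless_glorot_uniform(shape, seed):
    # Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
    fan_in, fan_out = shape[0], shape[-1]
    limit = tf.sqrt(6.0 / float(fan_in + fan_out))
    return tf.random.stateless_uniform(shape, seed=seed, minval=-limit, maxval=limit)

print(stateless_glorot_uniform((2, 2), seed=[3, 0]))
print(stateless_glorot_uniform((2, 2), seed=[3, 0]))  # identical values on every call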

keras: add constant number to all input values

I want to create a simple toy model in Keras. The model should take an input, add 1 to every element, and produce an output.
I found an example using Keras, but it requires two inputs:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# create model
input1 = layers.Input(shape=(2,))
input2 = layers.Input(shape=(2,))
added = layers.Add()([input1, input2])
model = keras.models.Model(inputs=[input1, input2], outputs=added)
# run inference
input_shape = (2,)
x1 = tf.ones(input_shape)
x2 = tf.ones(input_shape)
y = model([x1, x2])
However, I need the model to only have a single input and simply increase every input value by 1, for example.
You can replace the second input of your toy model with a call to tf.ones_like:
input1 = layers.Input(shape=())
added = layers.Add()([input1, tf.ones_like(input1)])
model = keras.models.Model(inputs=input1, outputs=added)
tf.ones_like creates a tensor full of ones of the shape of the tensor passed as an argument. As this op depends only on the shape of the input tensor, you can technically create your network without a specified input shape, and it will accept any shape as input:
>>> model(3)
<tf.Tensor: shape=(), dtype=float32, numpy=4.0>
>>> model(tf.ones((1,2,3)))
<tf.Tensor: shape=(1, 2, 3), dtype=float32, numpy=
array([[[2., 2., 2.],
[2., 2., 2.]]], dtype=float32)>
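An alternative sketch (not part of the original answer): a Lambda layer that adds the constant directly also keeps the model at a single input:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

input1 = layers.Input(shape=(2,))
# Add 1 to every element of the input.
added = layers.Lambda(lambda x: x + 1.0)(input1)
model = keras.models.Model(inputs=input1, outputs=added)
print(model(tf.ones((1, 2))))  # [[2. 2.]]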

How to generate encoded text directly from tf.data.Dataset.from_generator method?

The Better performance with the tf.data API TensorFlow tutorial shows a simple and efficient Dataset implementation. When working with text datasets, this implementation would be something like:
class TextDataset(tf.data.Dataset):
    def _generator(dataset_dir, num_samples):
        # Opening the dataset file
        dataset_file = open(dataset_dir, "r")
        for sample_idx in range(num_samples):
            # Reading data (line, record) from the file
            sample = dataset_file.readline()
            yield {"idx": sample_idx, "text": sample}

    def __new__(cls, dataset_dir, num_samples=3):
        return tf.data.Dataset.from_generator(
            cls._generator,
            output_types={"idx": tf.dtypes.int64, "text": tf.dtypes.string},
            output_shapes={"idx": (), "text": ()},
            args=(dataset_dir, num_samples,)
        )
which generates the following dataset:
{'idx': <tf.Tensor: shape=(), dtype=int64, numpy=0>,
'text': <tf.Tensor: shape=(), dtype=string, numpy=b'sample one'>},
{'idx': <tf.Tensor: shape=(), dtype=int64, numpy=1>,
'text': <tf.Tensor: shape=(), dtype=string, numpy=b'sample two'>},
{'idx': <tf.Tensor: shape=(), dtype=int64, numpy=2>,
'text': <tf.Tensor: shape=(), dtype=string, numpy=b'sample three'>}
...
Now, instead of yielding the text as a string in the _generator method, it would be interesting to return only the identifiers of the string's tokens (i.e. to encode it). This can be done with a tokenizer.
So, how can the text be encoded as a list of integers before yielding it in the _generator method?
Note: a working example is available in Google Colab.
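One possible approach (a sketch; the Keras Tokenizer, its toy vocabulary, and the class name EncodedTextDataset are illustrative assumptions, not from the original post): fit a tokenizer beforehand, encode each line inside _generator, and declare the 'text' output as a variable-length int64 tensor:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical tokenizer fitted on the corpus ahead of time.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(["sample one", "sample two", "sample three"])

class EncodedTextDataset(tf.data.Dataset):
    def _generator(dataset_dir, num_samples):
        with open(dataset_dir, "r") as dataset_file:
            for sample_idx in range(num_samples):
                sample = dataset_file.readline()
                # Encode the line into a list of token ids before yielding it.
                token_ids = tokenizer.texts_to_sequences([sample])[0]
                yield {"idx": sample_idx, "text": token_ids}

    def __new__(cls, dataset_dir, num_samples=3):
        return tf.data.Dataset.from_generator(
            cls._generator,
            output_types={"idx": tf.dtypes.int64, "text": tf.dtypes.int64},
            output_shapes={"idx": (), "text": (None,)},
            args=(dataset_dir, num_samples,)
        )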

How to align shape of a tensor returned by an iterator with a tensorflow variable

This is probably a very simple question, however I am fairly new to tensorflow and have been stuck at this issue. I use tensorflow 1.12 and python 3.
My question is, what is the proper way to set the shape of a tensor object that is returned by the iterator?
With placeholders I can make something like this code work, but I would like to make this work without a placeholder, using tensorflow datasets.
I cannot figure out how to align the shape of a tensor with a matrix in order to use tf.matmul.
The error I receive is: ValueError: Shape must be rank 2 but is rank 1 for 'MatMul_19' (op: 'MatMul') with input shapes: [2], [2,1].
The dataset of the iterator is specified as: <TensorSliceDataset shapes: (2,), types: tf.float32>.
Thanks in advance!
import tensorflow as tf
import numpy as np
batch_size = 200
# this simulates a dataset read from a csv.....
x=np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]],dtype="float32")
y=np.array([0, 0, 0, 1],dtype="float32")
dataset = tf.data.Dataset.from_tensor_slices((x))
print(dataset) # <TensorSliceDataset shapes: (2,), types: tf.float32>
dataset = dataset.repeat(10000)
print('repeat ds ', dataset) # repeat ds <RepeatDataset shapes: (2,), types: tf.float32>
iter = dataset.make_initializable_iterator()
print('iterator ', iter) # iterator <tensorflow.python.data.ops.iterator_ops.Iterator object at 0x0000028589C62550>
sess = tf.Session()
sess.run(iter.initializer)
next_elt= iter.get_next()
print('shape of dataset ', dataset , '[iterator] elt ', next_elt) # shape of dataset <RepeatDataset shapes: (2,), types: tf.float32> [iterator] elt Tensor("IteratorGetNext_105:0", shape=(2,), dtype=float32)
print('shape of it ', next_elt.shape) # shape of it (2,)
for i in range(4):
    print(sess.run(next_elt))
''' outputs:
[0. 0.]
[1. 0.]
[0. 1.]
[1. 1.]
'''
w = tf.Variable(tf.random_uniform([2,1], -1, 1, seed = 1234),name="weights_layer_1")
# this is where the error is because of the shape mismatch between the iterator element and the w variable.
# How do I make the shape of the iterator element (2,1) so that matmul can be used?
# What is the proper way of aligning a tensor shape with the input data?
# The output of the error:
# ValueError: Shape must be rank 2 but is rank 1 for 'MatMul_19' (op: 'MatMul') with input shapes: [2], [2,1].
H = tf.matmul( sess.run(next_elt) , w)
You can use tf.reshape. Just apply tf.reshape(next_elt, [1, 2]) prior to the matmul op.
More on reshape https://www.tensorflow.org/api_docs/python/tf/reshape
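Applied to the snippet above, a minimal sketch would be to build the matmul on the graph tensor itself (rather than on the result of sess.run) after reshaping it to rank 2:
# Reshape the (2,) element to [1, 2] so matmul gets compatible ranks.
H = tf.matmul(tf.reshape(next_elt, [1, 2]), w)
sess.run(tf.global_variables_initializer())
print(sess.run(H))  # shape (1, 1)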

Initializing LSTM hidden state Tensorflow/Keras

Can someone explain how I can initialize the hidden state of an LSTM in TensorFlow? I am trying to build an LSTM recurrent auto-encoder, so after I have that model trained I want to transfer the learned hidden state of the unsupervised model to the hidden state of the supervised model.
Is that even possible with current API?
This is paper I am trying to recreate:
http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf
Yes - this is possible but truly cumbersome. Let's go through an example.
Defining a model:
from keras.layers import LSTM, Input
from keras.models import Model
input = Input(batch_shape=(32, 10, 1))
lstm_layer = LSTM(10, stateful=True)(input)
model = Model(input, lstm_layer)
model.compile(optimizer="adam", loss="mse")
It's important to build and compile the model first, as the initial states are reset during compilation. Moreover, you need to specify a batch_shape in which the batch size is given, since in this scenario our network should be stateful (which is done by setting stateful=True).
Now we could set the values of initial states:
import numpy
import keras.backend as K
hidden_states = K.variable(value=numpy.random.normal(size=(32, 10)))
cell_states = K.variable(value=numpy.random.normal(size=(32, 10)))
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
Note that you need to provide the states as Keras variables. states[0] holds the hidden state and states[1] holds the cell state.
Hope that helps.
As stated in the Keras API documentation for recurrent layers (https://keras.io/layers/recurrent/):
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states with the keyword argument states. The value of states should be a numpy array or list of numpy arrays representing the initial state of the RNN layer.
Since the LSTM layer has two states (hidden state and cell state) the value of initial_state and states is a list of two tensors.
Examples
Stateless LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
import tensorflow as tf
import numpy as np
inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8)
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
Stateful LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
Note that for a stateful LSTM you also need to specify the batch size (here via batch_input_shape).
import tensorflow as tf
import numpy as np
from pprint import pprint
inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8, stateful=True, batch_input_shape=(1, 10, 1))
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
With a stateful LSTM, the states are not reset at the end of each sequence, and we can see that the output of the layer corresponds to the hidden state (i.e. lstm.states[0]) at the last timestep:
>>> pprint(outputs)
<tf.Tensor: id=821, shape=(1, 8), dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>
>>>
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.14726108, 0.13584498, -0.12986949, -0.22309153, 0.0125412 ,
-0.11446435, 0.22290672, 0.05397629]], dtype=float32)>]
By calling reset_states() it is possible to reset the states:
>>> lstm.reset_states()
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>]
>>>
or to set them to a specific value:
>>> lstm.reset_states(states=[h_0, c_0])
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>]
>>>
>>> pprint(h_0)
<tf.Tensor: id=422, shape=(1, 8), dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>
>>>
>>> pprint(c_0)
<tf.Tensor: id=421, shape=(1, 8), dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>
>>>
I used this approach; it totally worked out for me:
lstm_cell = LSTM(cell_num, return_state=True)
output, h, c = lstm_cell(input, initial_state=[h_prev, c_prev])
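A self-contained sketch of that approach (the shapes and variable values here are illustrative, not from the original post):
import tensorflow as tf
from tensorflow.keras.layers import LSTM

cell_num = 8
x = tf.random.normal([1, 10, 1])      # (batch, timesteps, features)
h_prev = tf.zeros([1, cell_num])      # previous hidden state
c_prev = tf.zeros([1, cell_num])      # previous cell state

lstm_cell = LSTM(cell_num, return_state=True)
output, h, c = lstm_cell(x, initial_state=[h_prev, c_prev])
print(output.shape, h.shape, c.shape)  # (1, 8) (1, 8) (1, 8)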
Assuming the RNN is in layer 1 and the hidden/cell states are numpy arrays, you can do this:
from keras import backend as K
K.set_value(model.layers[1].states[0], hidden_states)
K.set_value(model.layers[1].states[1], cell_states)
States can also be set using
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
but when I did it this way my state values stayed constant even after stepping the RNN.