TensorFlow CNN with complex features and labels?

I recently found a paper where they used a CNN with complex 2D feature maps as input. Their network also outputs a complex output vector. They used Keras with the TensorFlow backend.
Here is the link: https://arxiv.org/pdf/1802.04479.pdf
I asked myself whether it is possible to build complex-valued deep neural networks such as CNNs with TensorFlow. As far as I know it is not possible. Did I miss something?
There are other related questions which address the same problem, with no answer: Complex convolution in tensorflow
When building a really simple toy model with real-valued inputs and outputs, everything works correctly:
import tensorflow as tf
from numpy import random

n = 10
feature_vec_real = random.rand(1, n)
X_real = tf.placeholder(tf.float64, feature_vec_real.shape)

def model(x):
    out = tf.layers.dense(
        inputs=x,
        units=2
    )
    return out

model_output = model(X_real)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
output = sess.run(model_output, feed_dict={X_real: feature_vec_real})
But when using complex inputs:
import tensorflow as tf
from numpy import random, empty

n = 10
feature_vec_complex = empty(shape=(1, n), dtype=complex)
feature_vec_complex.real = random.rand(1, n)
feature_vec_complex.imag = random.rand(1, n)
X_complex = tf.placeholder(tf.complex128, feature_vec_complex.shape)

def complex_model(x):
    out = tf.layers.dense(
        inputs=x,
        units=2
    )
    return out

model_output = complex_model(X_complex)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
output = sess.run(model_output, feed_dict={X_complex: feature_vec_complex})
I get the following error:
ValueError: An initializer for variable dense_7/kernel of <dtype: 'complex128'> is required
So what is the correct way to initialize the weights of the dense kernel when the inputs are complex?
I know there is the possibility to handle the complex numbers as two separate real-valued layers in the network, but this is not what I want.
Thanks for your help!
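For reference, a minimal sketch of the split-into-real-parts workaround mentioned above (an illustration under TF 1.x assumptions, not the paper's method): a dense layer with complex kernel W = A + iB can be emulated with two real kernels, since (x_r + i*x_i)(A + iB) = (x_r A - x_i B) + i(x_i A + x_r B).

import tensorflow as tf
from numpy import random, empty

n = 10
feature_vec_complex = empty(shape=(1, n), dtype=complex)
feature_vec_complex.real = random.rand(1, n)
feature_vec_complex.imag = random.rand(1, n)
X_complex = tf.placeholder(tf.complex128, feature_vec_complex.shape)

def complex_dense(z, units):
    # Split the complex tensor into two real-valued tensors.
    x_r = tf.real(z)
    x_i = tf.imag(z)
    # Two real kernels A and B stand in for one complex kernel W = A + iB.
    A = tf.layers.Dense(units, use_bias=False)
    B = tf.layers.Dense(units, use_bias=False)
    out_r = A(x_r) - B(x_i)
    out_i = A(x_i) + B(x_r)
    return tf.complex(out_r, out_i)

model_output = complex_dense(X_complex, 2)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
output = sess.run(model_output, feed_dict={X_complex: feature_vec_complex})

Because only real-valued variables are created, the standard glorot initializer applies and no complex kernel initializer is needed.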

Related

Different outputs with LSTM in PyTorch vs Tensorflow

I am trying to convert a TensorFlow (1.15) model to a PyTorch model. Since I was getting very different loss values, I compared the output of the LSTM in the forward pass for the same input. The declaration and initialization of the LSTM are given below:
Tensorflow Code
rnn_cell_video_fw = tf.contrib.rnn.LSTMCell(
    num_units=self.options['rnn_size'],
    state_is_tuple=True,
    initializer=tf.orthogonal_initializer()
)
rnn_cell_video_fw = tf.contrib.rnn.DropoutWrapper(
    rnn_cell_video_fw,
    input_keep_prob=1.0 - rnn_drop,
    output_keep_prob=1.0 - rnn_drop
)
sequence_length = tf.expand_dims(tf.shape(video_feat_fw)[1], axis=0)
initial_state = rnn_cell_video_fw.zero_state(batch_size=batch_size, dtype=tf.float32)
rnn_outputs_fw, _ = tf.nn.dynamic_rnn(
    cell=rnn_cell_video_fw,
    inputs=video_feat_fw,
    sequence_length=sequence_length,
    initial_state=initial_state,
    dtype=tf.float32
)
PyTorch code
self.rnn_video_fw = nn.LSTM(self.options['video_feat_dim'],
                            self.options['rnn_size'],
                            dropout=self.options['rnn_drop'])
rnn_outputs_fw, _ = self.rnn_video_fw(video_feat_fw)
Initialization for LSTM in train.py
def init_weight(m):
    if type(m) in [nn.LSTM]:
        nn.init.orthogonal_(m.weight_hh_l0)
        nn.init.orthogonal_(m.weight_ih_l0)
(Screenshots of the TensorFlow and PyTorch outputs are not reproduced here.)
The same is pretty much the case for every data item, and my PyTorch model isn't converging. Is my suspicion correct that the difference in LSTM output is the reason? If so, where am I going wrong?
Link to the paper
Link to TF code
Let me know if anything else is required.
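Two possible sources of divergence, offered as assumptions rather than a confirmed diagnosis: tf.contrib.rnn.LSTMCell adds a forget_bias of 1.0 at run time and stores its gates in the order (i, c, f, o), whereas nn.LSTM uses the order (i, f, g, o) and keeps everything in its bias tensors; and DropoutWrapper drops inputs and outputs of the single layer at every time step, while nn.LSTM's dropout argument only acts between stacked layers (it is a no-op for a one-layer LSTM). A sketch of copying TF weights into PyTorch under those assumptions (load_tf_lstm_weights is a hypothetical helper):

import numpy as np
import torch
import torch.nn as nn

def load_tf_lstm_weights(lstm, tf_kernel, tf_bias, input_dim, hidden):
    # TF fuses [x, h] into one kernel of shape (input_dim + hidden, 4 * hidden).
    w_ih_tf = tf_kernel[:input_dim, :]
    w_hh_tf = tf_kernel[input_dim:, :]

    def reorder(w):
        # TF gate order (i, c, f, o) -> PyTorch gate order (i, f, g, o).
        i, c, f, o = np.split(w, 4, axis=-1)
        return np.concatenate([i, f, c, o], axis=-1)

    with torch.no_grad():
        # PyTorch stores weights as (4 * hidden, in_features), hence the transpose.
        lstm.weight_ih_l0.copy_(torch.from_numpy(reorder(w_ih_tf).T))
        lstm.weight_hh_l0.copy_(torch.from_numpy(reorder(w_hh_tf).T))
        b = reorder(tf_bias).copy()
        b[hidden:2 * hidden] += 1.0  # fold in TF's implicit forget_bias = 1.0
        lstm.bias_ih_l0.copy_(torch.from_numpy(b))
        lstm.bias_hh_l0.zero_()

With matched weights (and dropout disabled), the two forward passes should agree to floating-point precision for the same input.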

GRU/RNN state in graph mode Vs eager execution mode

I have the same piece of code written first in eager execution mode and then in graph mode. I am not quite able to figure out why the GRU state is not retained in graph mode while it works fine in eager mode.
Here's the eager mode code:
import tensorflow as tf
import xxhash
import numpy as np

tf.enable_eager_execution()
rnn_units = 1024

def hash_code(arr):
    return xxhash.xxh64(arr).hexdigest()

model = tf.keras.Sequential([
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform',
                        batch_input_shape=[1, None, 256])
])

lstm_wt = np.load('lstm_wt.npy', allow_pickle=True)  # fixed weights for comparison
lstm_re_wt = np.load('lstm_re_wt.npy', allow_pickle=True)
lstm_bias = np.load('lstm_bias.npy', allow_pickle=True)
model.layers[0].set_weights([lstm_wt, lstm_re_wt, lstm_bias])

op_embed = np.load('op_embed.npy', allow_pickle=True)  # fixed input
op_lstm = model(op_embed)
print(hash_code(op_lstm.numpy()))

op_lstm = model(op_embed)
print(hash_code(op_lstm.numpy()))

model.layers[0].reset_states()  # after resetting the state, the initial output comes back
op_lstm = model(op_embed)
print(hash_code(op_lstm.numpy()))
Output of this code:
d092fdb4739588a3
cdfdf8b8e292c6e8
d092fdb4739588a3
Now, the graph-mode code:
import tensorflow as tf
import xxhash
import numpy as np

# checking lstm
op_embed = np.load('op_embed.npy', allow_pickle=True)
# load op_embed, lstm weights
lstm_wt = np.load('lstm_wt.npy', allow_pickle=True)
lstm_re_wt = np.load('lstm_re_wt.npy', allow_pickle=True)
lstm_bias = np.load('lstm_bias.npy', allow_pickle=True)

rnn_units = 1024
layers = tf.keras.layers.GRU(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform')

x_placeholder = tf.placeholder(shape=op_embed.shape, dtype=tf.float32)
op_lstm = layers(x_placeholder)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
layers.set_weights([lstm_wt, lstm_re_wt, lstm_bias])
tf.assign(layers.weights[0], lstm_wt).eval(session=sess)
tf.assign(layers.weights[1], lstm_re_wt).eval(session=sess)
tf.assign(layers.weights[2], lstm_bias).eval(session=sess)

print('keras op hash', xxhash.xxh64(sess.run(op_lstm, feed_dict={x_placeholder: op_embed})).hexdigest())
print('keras op hash', xxhash.xxh64(sess.run(op_lstm, feed_dict={x_placeholder: op_embed})).hexdigest())
output:
keras op hash d092fdb4739588a3
keras op hash d092fdb4739588a3
Any insights on how to fix this discrepancy and retain the state in graph mode?
There's a similar question asked before but unanswered: Statefulness in eager mode vs non-eager mode
Specifying the solution here (answer section), even though it is present in the link provided in the question, for the benefit of the community.
By default, a recurrent network (RNN, GRU, or LSTM) loses its state when executing in non-eager/graph mode.
If we want to retain the state, we need to pass the initial state explicitly in the RNN call, as shown below:
current_state = np.zeros((1, rnn_units), dtype=np.float32)  # state shape is (batch, units)
state_placeholder = tf.placeholder(tf.float32, shape=[1, rnn_units])
# rnn must be built with return_state=True so it also returns the final state
output, state = rnn(x, initial_state=state_placeholder)
Then, while evaluating the output, we need to pass the state as well, in addition to the input, in feed_dict.
So, the code,
print('keras op hash',xxhash.xxh64(sess.run(op_lstm, feed_dict={x_placeholder:op_embed})).hexdigest())
can be replaced with
for _ in range(No_Of_TimeSteps):
    op_val, state_val = sess.run([op_lstm, state],
                                 feed_dict={x_placeholder: op_embed,
                                            state_placeholder: current_state.astype(np.float32)})
    current_state = state_val
    print('keras op hash', xxhash.xxh64(op_val).hexdigest())
Hope this helps. Happy Learning!
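Tying the fix back to the question's snippet, a minimal self-contained sketch (assuming TF 1.x; note the layer is built with return_state=True so the final state is available to feed back in):

import numpy as np
import tensorflow as tf

rnn_units = 1024
gru = tf.keras.layers.GRU(rnn_units, return_sequences=True, return_state=True)

x_placeholder = tf.placeholder(tf.float32, shape=[1, None, 256])
state_placeholder = tf.placeholder(tf.float32, shape=[1, rnn_units])
op_lstm, final_state = gru(x_placeholder, initial_state=state_placeholder)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

op_embed = np.zeros((1, 4, 256), dtype=np.float32)  # stand-in for the real input
current_state = np.zeros((1, rnn_units), dtype=np.float32)
for _ in range(3):
    op_val, current_state = sess.run([op_lstm, final_state],
                                     feed_dict={x_placeholder: op_embed,
                                                state_placeholder: current_state})
    # current_state now carries the GRU state across sess.run calls,
    # so consecutive outputs differ, matching the eager-mode behaviour.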

How to use tensorflow.distributions in a custom loss function for a keras model

For a deep learning model I defined with TF 2.0 Keras, I need to write a custom loss function.
As this will depend on things like entropy and the normal log_prob, it would really make my life less miserable if I could use tf.distributions.Normal and use two model outputs as mu and sigma, respectively.
However, as soon as I put this into my loss function, I get the Keras error that no gradient is defined for this function.
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I tried encapsulating the call in a tf.contrib.eager.Variable, as I read somewhere. It did not help.
What is the trick to using them? I don't see a reason in the fundamental architecture why I should not be able to use them in a mixed form.
# This is just an example which does not produce a meaningful result.
import tensorflow as tf
import tensorflow.keras as K
import numpy as np

def custom_loss_fkt(extra_output):
    def loss(y_true, y_pred):
        dist = tf.distributions.Normal(loc=y_pred, scale=extra_output)
        d = dist.entropy()
        return K.backend.mean(d)
    return loss

input_node = K.layers.Input(shape=(1,))
dense = K.layers.Dense(8, activation='relu')(input_node)
#dense = K.layers.Dense(4, activation='relu')(dense)
out1 = K.layers.Dense(4, activation='linear')(dense)
out2 = K.layers.Dense(4, activation='linear')(dense)

model = K.Model(inputs=input_node, outputs=[out1, out2])
model.compile(optimizer='adam', loss=[custom_loss_fkt(out2), custom_loss_fkt(out1)])
model.summary()

x = np.zeros((1, 1))
y1 = np.array([[0., 0.1, 0.2, 0.3]])
y2 = np.array([[0.1, 0.1, 0.1, 0.1]])
model.fit(x, [y1, y2], epochs=1000, verbose=0)
print(model.predict(x))
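One thing worth noting (a hedged observation, not from the original post): the entropy of a Normal distribution depends only on its scale, so dist.entropy() contributes no gradient to y_pred through loc at all. A loss that ties y_pred in through log_prob avoids that; a sketch below, where nll_loss and the softplus reparameterization are illustrative additions, not part of the original model:

import tensorflow as tf
import tensorflow.keras as K

def nll_loss(extra_output):
    def loss(y_true, y_pred):
        # softplus keeps the scale strictly positive; an assumption added here.
        scale = tf.nn.softplus(extra_output) + 1e-6
        dist = tf.distributions.Normal(loc=y_pred, scale=scale)
        # log_prob depends on loc, so gradients flow back into y_pred.
        return -K.backend.mean(dist.log_prob(y_true))
    return loss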

Tensorflow Dataset issue at inference phase

I created a char-level language-generation model with TensorFlow, available here. I used the tf.placeholder API, which, according to the Google docs:
Feeding is the least efficient way to feed data into a TensorFlow program.
I decided to change my code and replace it with the new TensorFlow Dataset API.
I used from_generator to create the Dataset:
dataset = tf.data.Dataset.from_generator(gen, (tf.int32, tf.int32),
                                         (tf.TensorShape([None, None]),
                                          tf.TensorShape([None, None])))
self.iterator = dataset.make_initializable_iterator()
self.inp, self.target = self.iterator.get_next()
As can be seen in the code above, I used TensorShape([None, None]) to give the model more generality. During training everything is perfectly fine, but at inference a problem arises. With the tf.placeholder API I used the following code to generate characters:
def inference(self):
    converter = utils.TextReader(filename=FLAGS.CONVERTER_PATH)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        samples = []
        new_state = sess.run(self.init_state)
        c = 12  # random starting token
        samples.append(c)
        for i in range(1000):
            x = np.zeros((1, 1))
            x[0, 0] = c
            feed_dict = {
                self.inp: x,
                self.init_state: new_state
            }
            preds, new_state = sess.run([self.prediction, self.final_state], feed_dict=feed_dict)
            c = utils.pick_top_n(preds, converter.vocab_size)
            samples.append(c)
        samples = np.array(samples)
        print(converter.arr_to_text(samples))
With the Dataset API, I don't have a tf.placeholder to feed my previous character into. When I use the above code, as expected, the following error happens:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,50] vs. shape[1] = [32,50]
At inference, the model uses the same input shape ([32, 50]) that I used for training, which is not what I want. (I actually defined TensorShape([None, None]) to handle this, but it does not work.)
How can I fix the issue with the new Dataset API?
Complete code.
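The ConcatOp in the traceback points into the RNN itself, which suggests (an assumption, not confirmed in the post) that the initial state was built with the hard-coded training batch size of 32, while the inference batch has size 1. One fix the [None, None] shapes already permit is deriving the batch size dynamically from the iterator output when constructing the initial state; a sketch with hypothetical names (cell, rnn_inputs):

# Derive the batch size from the current iterator output instead of
# hard-coding the training value of 32.
batch_size = tf.shape(self.inp)[0]  # 32 during training, 1 at inference
self.init_state = cell.zero_state(batch_size, tf.float32)
outputs, self.final_state = tf.nn.dynamic_rnn(cell, rnn_inputs,
                                              initial_state=self.init_state)

With that change, feeding a [1, 1] character array through a separate single-example dataset (or a second initializable iterator) no longer collides with the training-time state shape.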

Initializing tensorflow Variable with an array larger than 2GB

I am trying to initialize a TensorFlow Variable with pre-trained word2vec embeddings.
I have the following code:
import tensorflow as tf
from gensim import models

model = models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
X = model.syn0

embeddings = tf.Variable(tf.random_uniform(X.shape, minval=-0.1, maxval=0.1), trainable=False)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(embeddings.assign(X))
And I am receiving the following error:
ValueError: Cannot create an Operation with a NodeDef larger than 2GB.
The array (X) I am trying to assign is of shape (3000000, 300) and its size is 3.6GB.
I am getting the same error if I try tf.convert_to_tensor(X) as well.
I know that it fails because the array is larger than 2GB. However, I do not know how to assign an array larger than 2GB to a TensorFlow Variable.
It seems like the only option is to use a placeholder. The cleanest way I can find is to initialize to a placeholder directly:
X_init = tf.placeholder(tf.float32, shape=(3000000, 300))
X = tf.Variable(X_init)
# The rest of the setup...
sess.run(tf.initialize_all_variables(), feed_dict={X_init: model.syn0})
The easiest solution is to feed_dict it into a placeholder node that you then tf.assign to the variable.
X = tf.Variable([0.0])
place = tf.placeholder(tf.float32, shape=(3000000, 300))
# validate_shape=False lets the assign grow X from shape [1] to the full matrix
set_x = tf.assign(X, place, validate_shape=False)
# set up your session here....
sess.run(set_x, feed_dict={place: model.syn0})
As Joshua Little noted in a separate answer, you can also use it in the initializer:
X = tf.Variable(place) # place as defined above
...
init = tf.initialize_all_variables()
... create sess ...
sess.run(init, feed_dict={place: model.syn0})
Try this:
import tensorflow as tf
from gensim import models

model = models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin', binary=True)
X = model.syn0

embeddings = tf.Variable(tf.random_uniform(X.shape, minval=-0.1, maxval=0.1), trainable=False)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
embeddings.load(model.syn0, sess)
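A hedged aside for newer versions: in TF 2.x eager execution, creating a Variable directly from a NumPy array does not go through a serialized GraphDef constant, so the 2GB NodeDef limit does not apply:

import numpy as np
import tensorflow as tf  # TF 2.x

X = np.zeros((3000000, 300), dtype=np.float32)  # stand-in for model.syn0
# In eager mode the array is copied straight into the variable's buffer,
# so no >2GB constant node is ever created.
embeddings = tf.Variable(X, trainable=False)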