CNTK: Applying average pooling over LSTM output

I am trying to apply an average pooling function over the outputs of an LSTM for a sequence:
Sequential([
    Embedding(emb_dim),
    pooling(Recurrence(LSTM(hidden_dim), go_backwards=False), PoolingType_Average, (hidden_dim,)),
    Dense(num_labels)
])
When I was just using the last element of the sequence it was working without problems:
Sequential([
    Embedding(emb_dim),
    sequence.last(Recurrence(LSTM(hidden_dim), go_backwards=False)),
    Dense(num_labels)
])
a. Is the addition of pooling in the network definition correct, and does the shape I set describe the operation I am trying to perform (i.e. averaging the vectors coming from the LSTM for each sample in the sequence)?
b. The format of my input data that worked when using sequence.last is the following (for 1 sequence). Does it need to change to apply mean pooling?
1 |x 5:1 |y 1 0 0 0 0
1 |x 414:1
1 |x 8:1
The error I get is:
File ".../model_training.py", line 55, in train
criterion.placeholders[1]: Input(num_labels, dynamic_axes=[Axis.default_batch_axis()])})
File ".../anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/utils/swig_helper.py", line 58, in wrapper
result = f(*args, **kwds)
File ".../anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/ops/functions.py", line 449, in replace_placeholders
return super(Function, self).replace_placeholders(substitutions)
File ".../anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/cntk_py.py", line 1246, in replace_placeholders
return _cntk_py.Function_replace_placeholders(self, placeholderReplacements)
RuntimeError: Currently if an operand of a elementwise operation has any dynamic axes, those must match the dynamic axes of the other operands

Pooling only works with static axes. There is a branch with a sequence.pooling operation that should be available in master around the end of January (2017). You can also do average pooling with a recurrence. This example pools with a "learned" average pooling via a recurrence.
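For intuition, here is a minimal NumPy sketch (not CNTK code) of the operation the question is after: averaging the per-step LSTM output vectors along the sequence axis, compared with taking only the last step. The function names and the toy arrays are made up for illustration, and the LSTM outputs are assumed to be already available as a (steps, hidden_dim) array per sequence.

```python
import numpy as np

def mean_pool_over_time(lstm_outputs):
    """Average the per-step vectors of one sequence: (steps, hidden_dim) -> (hidden_dim,)."""
    return np.asarray(lstm_outputs, dtype=float).mean(axis=0)

def last_step(lstm_outputs):
    """Keep only the final step, which is what sequence.last does conceptually."""
    return np.asarray(lstm_outputs, dtype=float)[-1]

# Two toy sequences of different lengths (hypothetical values, hidden_dim = 2).
seq_a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
seq_b = np.array([[2.0, 0.0], [4.0, 2.0]])

# Each sequence collapses to one vector, so the batch has no sequence axis left.
pooled = np.stack([mean_pool_over_time(s) for s in (seq_a, seq_b)])
```

The point of the sketch is that the averaging runs over the sequence (dynamic) axis, not over the hidden_dim (static) axis, which is why a static-axis pooling op does not fit here.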

Convert a TF-Agents ActorDistributionNetwork into a TensorFlow Lite model

I would like to convert the ActorDistributionNetwork from a trained PPOClipAgent into a TensorFlow Lite model for deployment. How should I accomplish this?
I have tried following this tutorial (see section at bottom converting policy to TFLite), but the network outputs a single action (the policy) rather than the density function over actions that I desire.
I think perhaps something like this could work:
tf.compat.v2.saved_model.save(actor_net, saved_model_path, signature=?)
... if I knew how to set the signature parameter. That line of code executes without error when I omit the signature parameter, but I get the following error on load (I assume because the signature is not set up correctly):
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
File "/home/ais/salesmentor.ai/MDPSolver/src/solver/ppo_budget.py", line 336, in train_eval
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
File "/home/ais/.local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py", line 1275, in from_saved_model
raise ValueError("Only support a single signature key.")
ValueError: Only support a single signature key.
This appears to work. I won't accept the answer until I have completed an end-to-end test, though.
import os
import tensorflow as tf

def export_model(actor_net, observation_spec, saved_model_path):
    # Wrap the network in a single-argument function that returns the logits.
    predict_signature = {
        'action_pred':
            tf.function(func=lambda x: actor_net(x, None, None)[0].logits,
                        input_signature=(tf.TensorSpec(shape=observation_spec.shape),)
                        )
    }
    tf.saved_model.save(actor_net, saved_model_path, signatures=predict_signature)
    # Convert to TensorFlow Lite model.
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path,
                                                         signature_keys=["action_pred"])
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
        tf.lite.OpsSet.SELECT_TF_OPS  # enable TensorFlow ops.
    ]
    tflite_policy = converter.convert()
    with open(os.path.join(saved_model_path, 'policy.tflite'), 'wb') as f:
        f.write(tflite_policy)
The solution wraps the actor_net in a lambda because I was unable to figure out how to specify the signature with all three expected arguments. Through the lambda, I convert the function into using a single argument (a tensor). I expect to pass None to the other two arguments in my use case, so there is nothing lost in this approach.
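Since the exported signature returns logits rather than probabilities, one more step is needed to get the density over actions. Below is a minimal NumPy sketch of that step, assuming the actor network's output distribution is categorical (it exposes .logits, but that is an assumption about the agent's action spec). The function name and the example logits are hypothetical.

```python
import numpy as np

def logits_to_probs(logits):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits, standing in for the output of the 'action_pred' signature.
example_logits = np.array([2.0, 1.0, 0.1])
probs = logits_to_probs(example_logits)
```

The largest logit maps to the most probable action, and the probabilities sum to one.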
It looks like you are working from the linked TF-Agents Checkpointer tutorial, which uses CartPole as the simulated environment and a DQN agent for training and evaluation. To keep things simple, you need to understand the distributions involved and the limits of your model (fewer than 6 actions decided at a time).
The steps below describe a discrete distribution, which answers the question directly; the links only show how TF-Agents implements the DQN agent.
temp = tf.random.normal([10], 1, 0.2, tf.float32) draws 10 values from a normal distribution whose mean is 1 and whose standard deviation is 0.2, so each value lies close to 1. With 10 actions to choose from, the chance of the result being the same action is 1 in 5 or 0.5.
The coefficients are ladder steps; you can think of them as IF/ELSE or SWITCH conditions over ranges such as 0 to 5, 5 to 10, 10 to 15, and so on.
The product of the coefficient matrix and the random values selects 4 to 5 actions, sorted by priority and significance, and picks the ones with the most effect in each row.
The ArgMax is in the range 0 to 9, i.e. actions 0 to 9, which respond to the covariances of the environment input.
Sample: to the point, random distributions and selective agents (I call it a selective agent; the questioner may have confused it with a DQN neural network).
temp = tf.random.normal([10], 1, 0.2, tf.float32)
temp = np.asarray(temp) * np.asarray([ coefficient_0, coefficient_1, coefficient_2, coefficient_3, coefficient_4, coefficient_5, coefficient_6, coefficient_7, coefficient_8, coefficient_9 ])
temp = tf.nn.softmax(temp)
action = int(np.argmax(temp))

How to apply Mean Square Error row-wise in Python using NumPy without looping

I'm building a primitive neural network to emulate an AND gate. The loss function is MSE:
def mse(predicted, desired):
    return np.square(np.subtract(predicted, desired)).mean()
In the following there are a prediction, and the desired outputs (a.k.a. labels):
predicted = np.array(
    [[0.5000, 0.5000],   # 0 AND 0
     [0.4721, 0.5279],   # 0 AND 1
     [0.3049, 0.6951],   # 1 AND 0
     [0.3345, 0.6655]])  # 1 AND 1
desired = np.array(
    [[1, 0],   # False
     [1, 0],   # False
     [1, 0],   # False
     [0, 1]])  # True
Each row (in both of the above matrices) indicates a single case. I want to keep all the cases to be held together like this, rather than splitting them into vectors. The catch is, I need to treat each row individually.
I'm trying to get the following result, but so far I haven't been able to:
returned output =
[0.2500, # 1st CASE ERROR
0.2786, # 2nd CASE ERROR
0.4831, # 3rd CASE ERROR
0.1118] # 4th CASE ERROR
I tried the following function...
np.apply_along_axis(mse, 1, predicted, desired)
but it didn't work because "desired" is passed as the whole matrix rather than one row at a time. So, is there any way to achieve this without loops and without changing the implementation of the mse function?
Because all your data is in nicely formed ndarrays you can make NumPy do all the heavy lifting. In this case you can convert your for loop into a reduction along one of the array dimensions.
np.square(np.subtract(predicted, desired)).mean(1)
or
((predicted-desired)**2).mean(1)
which is more readable IMO.
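As a self-contained check, here is the same reduction applied to the arrays from the question. The helper name mse_rows is made up for this sketch; the only change from the original mse is the axis argument to mean.

```python
import numpy as np

predicted = np.array([[0.5000, 0.5000],   # 0 AND 0
                      [0.4721, 0.5279],   # 0 AND 1
                      [0.3049, 0.6951],   # 1 AND 0
                      [0.3345, 0.6655]])  # 1 AND 1
desired = np.array([[1, 0],
                    [1, 0],
                    [1, 0],
                    [0, 1]])

def mse_rows(predicted, desired):
    """Mean squared error per row: average over axis 1 (the columns)."""
    return np.square(np.subtract(predicted, desired)).mean(axis=1)

row_errors = mse_rows(predicted, desired)
# One error per case, roughly 0.250, 0.279, 0.483 and 0.112.
```

Averaging row_errors again gives the same number as the original mse over the whole matrix, because every row has the same number of columns.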

Why does the TensorFlow error `Failed to convert object of type <class 'dict'> to Tensor` happen, and how can I solve it?

I am working on a traffic analysis task and I am stuck on an error in my code. My data rows look like this:
quarter | DOW (day of week) | hour | density | speed | label (predicted speed for another half an hour)
The values are like this:
1, 6, 19, 23, 53.32, 45.23
This means that on a specific street, during the 1st quarter of the 19 o'clock hour on Friday, the measured traffic density is 23 and the current speed is 53.32. The predicted speed would be 45.23.
The task is to predict the speed half an hour ahead from the predictors given above.
I am using this code to build a TensorFlow DNNRegressor for data:
import pandas as pd
data = pd.read_csv('dataset.csv')
X = data.iloc[:,:5].values
y = data.iloc[:, 5].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=0)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(data=scaler.transform(X_train),columns = ['quarter','DOW','hour','density','speed'])
X_test = pd.DataFrame(data=scaler.transform(X_test),columns = ['quarter','DOW','hour','density','speed'])
y_train = pd.DataFrame(data=y_train,columns = ['label'])
y_test = pd.DataFrame(data=y_test,columns = ['label'])
import tensorflow as tf
speed = tf.feature_column.numeric_column('speed')
hour = tf.feature_column.numeric_column('hour')
density = tf.feature_column.numeric_column('density')
quarter= tf.feature_column.numeric_column('quarter')
DOW = tf.feature_column.numeric_column('DOW')
feat_cols = [quarter, DOW, hour, density, speed]
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train ,batch_size=10,num_epochs=1000,shuffle=False)
model = tf.estimator.DNNRegressor(hidden_units=[5,5,5],feature_columns=feat_cols)
model.train(input_fn=input_func,steps=25000)
predict_input_func = tf.estimator.inputs.pandas_input_fn(
    x=X_test,
    batch_size=10,
    num_epochs=1,
    shuffle=False)
pred_gen = model.predict(predict_input_func)
predictions = list(pred_gen)
final_preds = []
for pred in predictions:
    final_preds.append(pred['predictions'])
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test,final_preds)**0.5
When I run this code, it throws an error ending with:
TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'label': <tf.Tensor 'fifo_queue_DequeueUpTo:6' shape=(?,) dtype=float64>}. Consider casting elements to a supported type.
First, what does this error mean? I couldn't find an explanation of its cause. How can I modify the code to fix it?
Second, would using TensorFlow's categorical_column_with_identity instead of numeric_column for DOW (the day of the week) improve the model's performance?
I would also like to know whether it is useful to merge quarter and hour into a single column such as time of day (quarter is the minutes within the hour, which will be normalized between 0 and 1).
"First, what does this error mean? I couldn't find an explanation of its cause. How can I modify the code to fix it?"
Let me first talk about the solution to the problem. You need to change parameter y in pandas_input_fn as follows.
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train['label'],batch_size=10,num_epochs=1000,shuffle=False)
It seems that the y parameter of pandas_input_fn does not support a DataFrame by the time model.train() runs. In this case pandas_input_fn parses y into a form like {column_name: value}, which is the dict in the error message, and model.train() cannot convert it to a Tensor. So you need to pass a Series instead.
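The difference between the two types can be seen without TensorFlow. This is a small pandas sketch with made-up label values (only 45.23 comes from the question): selecting the single column turns the one-column DataFrame into a one-dimensional Series.

```python
import pandas as pd

# A one-column DataFrame, built the same way as y_train in the question.
# The values are made up for illustration.
y_train = pd.DataFrame(data=[45.23, 50.10, 38.75], columns=['label'])

# Selecting the column by name yields a Series with the same index.
labels = y_train['label']

frame_shape = y_train.shape    # two-dimensional: (rows, 1)
series_shape = labels.shape    # one-dimensional: (rows,)
```

Passing labels (the Series) as y is what the corrected input_func line does.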
"Second, would using TensorFlow's categorical_column_with_identity instead of numeric_column for DOW (the day of the week) improve the model's performance?"
This comes down to when to choose a categorical or a numeric column in feature engineering. A simple rule: choose numeric if the ordering of the values, bigger versus smaller, is meaningful for the feature. If bigger or smaller carries no meaning, choose categorical. So I would lean towards categorical_column_with_identity for DOW.
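To make the distinction concrete, here is a NumPy sketch of what a categorical treatment of DOW amounts to once it reaches a dense network: each day becomes its own one-hot slot, so no day is "bigger" than another. It assumes DOW is stored as integer ids 0 to 6 and is not passed through MinMaxScaler first; the helper name and sample values are hypothetical. With feature columns, a DNN estimator would typically need the identity column wrapped in an indicator or embedding column to get this dense form.

```python
import numpy as np

def one_hot(ids, num_buckets):
    """Turn integer category ids into one-hot rows of length num_buckets."""
    ids = np.asarray(ids, dtype=int)
    return np.eye(num_buckets)[ids]

# Hypothetical day-of-week ids, assuming the range 0..6.
dow = np.array([6, 0, 3])
encoded = one_hot(dow, num_buckets=7)
```

A numeric column would instead feed the single scaled number to the network, which implies that day 6 is six times day 1.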
"I would also like to know whether it is useful to merge quarter and hour into a single column such as time of day (quarter is the minutes within the hour, which will be normalized between 0 and 1)."
Cross features can bring some benefit, as with latitude and longitude features. I recommend using tf.feature_column.crossed_column here. It returns a column that performs crosses of categorical features. You can also keep the separate quarter and hour features in the model at the same time.
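As a rough illustration of what crossing hour and quarter means, the sketch below builds one combined id per (hour, quarter) pair by hand. This is not how crossed_column works internally (it hashes the combination into a fixed number of buckets); it is a deterministic stand-in under the assumptions that hour is an integer in 0..23 and quarter an integer in 1..4, taken before any scaling. The function name is hypothetical.

```python
import numpy as np

def time_slot(hour, quarter):
    """Combine hour (0..23) and quarter (1..4) into one id in 0..95."""
    hour = np.asarray(hour, dtype=int)
    quarter = np.asarray(quarter, dtype=int)
    return hour * 4 + (quarter - 1)

# The first pair matches the sample row in the question: hour 19, 1st quarter.
slots = time_slot([19, 0, 23], [1, 1, 4])
```

Each of the 96 slots can then be treated as its own category, while the original hour and quarter columns stay available as separate features.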
A similar error occurred to me:
Failed to convert object of type <class 'tensorflow.python.autograph.operators.special_values.Undefined'> to Tensor.
It occurred in a tf.function when I tried to use a variable that I had not assigned before.
To debug this, you have to remove tf.function from the method ;-)

Computing Hessians w.r.t. a higher-rank variable works with neither tf.hessians() nor tf.gradients()

When we need a second-order gradient or Hessian in TensorFlow, we can use tf.hessians(F(x), x) or tf.gradients(tf.gradients(F(x), x)[0], x)[0]. However, when x is not rank one, tf.hessians() gives the following error:
ValueError: Cannot compute Hessian because element 0 of xs does not
have rank one.. Tensor model_inputs/action:0 must have rank 1.
Received rank 2, shape (?, 1)
in the following code:
with tf.name_scope("1st scope"):
    self.states = tf.placeholder(tf.float32, (None, self.state_dim), name="states")
    self.action = tf.placeholder(tf.float32, (None, self.action_dim), name="action")
with tf.name_scope("2nd scope"):
    with tf.variable_scope("3rd scope"):
        self.policy_outputs = self.policy_network(self.states)
# use tf.gradients twice
self.actor_action_gradients = tf.gradients(self.policy_outputs, self.action)[0]
self.actor_action_hessian = tf.gradients(self.actor_action_gradients, self.action)[0]
# or use tf.hessians
self.actor_action_hessian = tf.hessians(self.policy_outputs, self.action)
Using tf.gradients() also causes an error:
in create_variables self.actor_action_hessian =
tf.gradients(self.actor_action_gradients, self.action)[0]
AttributeError: 'NoneType' object has no attribute 'dtype'
How can I fix this? Can neither tf.gradients() nor tf.hessians() be used in this case?
The second approach is fine; the error is somewhere else: your graph is not connected.
self.actor_action_gradients = tf.gradients(self.policy_outputs, self.action)[0]
self.actor_action_hessian = tf.gradients(self.actor_action_gradients, self.action)[0]
The error is thrown on the second line because self.actor_action_gradients is None, so you can't compute its gradient. Nothing in your code suggests that self.policy_outputs depends on self.action (and it shouldn't, since it's the action that depends on the policy, not the policy on the action).
Once you fix this you will notice that the "hessian" is not really a Hessian but a vector. To form a proper Hessian of f with respect to x, you have to iterate over all values returned by tf.gradients and compute tf.gradients of each one independently. This is a known limitation in TF, and no simpler way is available right now.
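The row-by-row construction can be sketched outside TensorFlow. The NumPy example below swaps tf.gradients for central finite differences, so it only illustrates the structure: take the gradient once, then differentiate each gradient component on its own to get one Hessian row. The test function f, the helper names, and the step sizes are all made up for this sketch.

```python
import numpy as np

def f(x):
    # Hypothetical scalar function: f(x) = x0^2 * x1 + x1^3
    return x[0] ** 2 * x[1] + x[1] ** 3

def gradient(func, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

def hessian(func, x, eps=1e-4):
    """Build the Hessian one row at a time from the gradient components."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        # Treat the i-th gradient component as its own scalar function
        # and differentiate it independently.
        component = lambda z, i=i: gradient(func, z)[i]
        rows.append(gradient(component, x, eps))
    return np.stack(rows)

H = hessian(f, [1.0, 2.0])
# Analytically, the Hessian at (1, 2) is [[4, 2], [2, 12]].
```

In TensorFlow the loop has the same shape: one tf.gradients call per element of the first gradient, with the results stacked into a matrix.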

pybrain LSTM sequence to predict sequential data

I have written some simple code using pybrain to predict simple sequential data.
For example, the sequence 0,1,2,3,4 is supposed to produce an output of 5 from the network. The dataset specifies the remaining sequences.
Below is my implementation:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import SequentialDataSet
from pybrain.structure import SigmoidLayer, LinearLayer
from pybrain.structure import LSTMLayer
import itertools
import numpy as np
INPUTS = 5
OUTPUTS = 1
HIDDEN = 40
net = buildNetwork(INPUTS, HIDDEN, OUTPUTS, hiddenclass=LSTMLayer, outclass=LinearLayer, recurrent=True, bias=True)
ds = SequentialDataSet(INPUTS, OUTPUTS)
ds.addSample([0,1,2,3,4],[5])
ds.addSample([5,6,7,8,9],[10])
ds.addSample([10,11,12,13,14],[15])
ds.addSample([16,17,18,19,20],[21])
net.randomize()
trainer = BackpropTrainer(net, ds)
for _ in range(1000):
    print(trainer.train())
x = net.activate([0,1,2,3,4])
print(x)
The output on my screen keeps showing [0.99999999 0.99999999 0.9999999 0.99999999] every single time. What am I missing? Is the training not sufficient? I ask because trainer.train() shows an output of 86.625..
The pybrain SigmoidLayer implements the sigmoid squashing function. The relevant part of its source code is this:
def sigmoid(x):
    """ Logistic sigmoid function. """
    return 1. / (1. + safeExp(-x))
So, no matter what the value of x, it will only ever return values between 0 and 1. For this reason, and for others, it is a good idea to scale your input and output values to between 0 and 1. For example, divide all your inputs by the maximum value (assuming the minimum is no lower than 0), and the same for your outputs. Then do the reverse with the result (e.g. multiply by 25 if you were dividing by 25 at the beginning).
Also, I'm no expert on pybrain, but I wonder if you need OUTPUTS = 4? It looks like you have only one output in your data, so I'm wondering if you could just use OUTPUTS = 1.
You may also try scaling the inputs and outputs to a particular part of the sigmoid curve (e.g. between 0.1 and 0.9) to make pybrain's job easier, but that makes the scaling before and after a little more complex.
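The scaling and its reverse could look roughly like this in NumPy. The constant 25 follows the example figure above and is assumed to be an upper bound on all inputs and outputs; the helper names are hypothetical. The same two functions cover both the plain 0 to 1 range and the narrower 0.1 to 0.9 range.

```python
import numpy as np

SCALE = 25.0  # assumed upper bound on the raw values

def scale(values, lo=0.0, hi=1.0):
    """Map raw values in [0, SCALE] into [lo, hi]."""
    return lo + (np.asarray(values, dtype=float) / SCALE) * (hi - lo)

def unscale(values, lo=0.0, hi=1.0):
    """Reverse of scale: map network outputs in [lo, hi] back to raw units."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo) * SCALE

inputs = scale([0, 1, 2, 3, 4])           # what the network would be fed
target = scale([5])                       # what it would be trained towards
restored = unscale(target)                # back in the original units
narrow = scale([0, 25], lo=0.1, hi=0.9)   # the 0.1 to 0.9 variant
```

Whatever range is used for training has to be passed to unscale as well, otherwise the predictions come back in the wrong units.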