How to print out the minibatchData's value? - cntk

minibatch_size = 5
data = reader.next_minibatch(minibatch_size, input_map={  # fetch minibatch
    x: reader.streams.query,
    y: reader.streams.slot_labels
})
evaluator = C.eval.Evaluator(loss, progress_printer)
evaluator.test_minibatch(data)
print("labels=", data[y].as_sequences())
I got an error for data[y].as_sequences() saying:
raise ValueError('cannot convert sparse value to sequences '
ValueError: cannot convert sparse value to sequences without the corresponding variable
How do I fix this? What is a variable? What should I put?

data[y].as_sequences(variable=y) should do the trick, but I wouldn't recommend it. The "variable" is the input variable the data is bound to, here the y you used as a key in input_map.
On larger datasets, as_sequences and asarray quickly raise out-of-memory exceptions.
I ended up using this:
true_labels = cntk.ops.argmax(labels_input).eval(minibatch[labels_input]).astype(int)
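The idea behind the argmax line can be illustrated without CNTK. The following is only a NumPy stand-in, assuming the labels are one-hot rows (one row per token); the array one_hot_labels is made up for the example and is not taken from the question's reader.

```python
import numpy as np

# Hypothetical one-hot labels: 4 tokens, 3 possible classes.
one_hot_labels = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])

# Taking the argmax over the class axis recovers one integer id per token,
# which is much smaller than the dense one-hot representation.
true_labels = np.argmax(one_hot_labels, axis=-1).astype(int)
print(true_labels)
```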

Related

Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative

I am trying to split my data set into train and test sets by using:
for train_set, test_set in stratified.split(complete_df, complete_df["loan_condition_int"]):
    stratified_train = complete_df.loc[train_set]
    stratified_test = complete_df.loc[test_set]
My dataframe complete_df does not have any NaN values. I made sure of this by using complete_df.isnull().sum().max(), which returned 0.
But I still get a warning saying:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
And it leads to an error later. I tried some techniques I found online, but they still do not fix it.
First, you should clarify what is stratified. I'm assuming it's a sklearn's StratifiedShuffleSplit object.
"my data set complete_df does not have any NAN value."
The "missing labels" in the warning message do not refer to missing values, i.e. NaNs. The warning says that train_set and/or test_set contain values (labels) that are not present in the index of complete_df. That is because .loc indexes by row (and column) labels, not by row position, while train_set and test_set hold row positions. So if the index of your DataFrame does not coincide with the integer positions of the rows, which seems to be the case here, the warning is raised.
To select by row position, use iloc. This should work
for train_set, test_set in stratified.split(complete_df, complete_df["loan_condition_int"]):
    stratified_train = complete_df.iloc[train_set]
    stratified_test = complete_df.iloc[test_set]
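A minimal sketch of the difference, assuming a small made-up DataFrame whose index does not start at 0 (the data and the positions list are hypothetical, standing in for complete_df and the arrays returned by split):

```python
import pandas as pd

# Made-up frame: the index labels (10, 20, ...) differ from the row positions (0, 1, ...).
df = pd.DataFrame({"loan_condition_int": [0, 1, 0, 1]}, index=[10, 20, 30, 40])

# Row positions, like the arrays a splitter's split() yields.
positions = [0, 2]

# iloc selects by position, so this picks the first and third rows.
by_position = df.iloc[positions]

# loc selects by label; 0 and 2 are not labels in this index, so it fails.
try:
    df.loc[positions]
    loc_failed = False
except KeyError:
    loc_failed = True

print(by_position)
print("loc raised KeyError:", loc_failed)
```

If label-based access is preferred elsewhere in the code, resetting the index with df.reset_index(drop=True) makes labels and positions coincide again.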

tf.io.decode_raw return tensor how to make it bytes or string

I've been struggling with this for a while. I searched Stack Overflow and checked the TF2
docs many times. One solution is suggested there, but
I don't understand why my own approach doesn't work.
In my case, I store a binary string (i.e., bytes) in TFRecords.
If I iterate over the dataset via as_numpy_iterator, or call numpy() directly
on each item, I get the binary string back,
so plain iteration over the dataset works.
I'm not sure what exactly map() passes to test_callback.
I see that it has neither a method nor a property named numpy, and the same goes for the value
tf.io.decode_raw returns (it is a Tensor, but it has no numpy either).
Essentially, I need to take a binary string, parse it via my
x = decoder.FromString(y), and then pass x to my encoder,
which will transform it into a tensor.
def test_callback(example_proto):
    # I tried to figure out. can I use bytes?decode
    # directly and what is the most optimal solution.
    parsed_features = tf.io.decode_raw(example_proto, out_type=tf.uint8)
    # tf.io.decoder returns tensor with N bytes.
    x = creator.FromString(parsed_features.numpy)
    encoded_seq = midi_encoder.encode(x)
    return encoded_seq

raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(test_callback)
Thank you, folks.
I found one solution but I would love to see more suggestions.
def test_callback(example_proto):
    from_string = creator.FromString(example_proto.numpy())
    encoded_seq = encoder.encoder(from_string)
    return encoded_seq

raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(lambda x: tf.py_function(test_callback, [x], [tf.int64]))
My understanding is that tf.py_function carries a performance penalty.
Thank you
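For readers who want to try the tf.py_function approach without the protobuf classes, here is a self-contained sketch assuming TensorFlow 2.x. The function decode_record is a hypothetical stand-in for the creator.FromString plus encoder step, and the in-memory byte strings replace the TFRecord file; neither comes from the original post.

```python
import numpy as np
import tensorflow as tf


def decode_record(raw):
    # Inside tf.py_function the argument is an eager tensor, so .numpy() works
    # and returns the stored Python bytes.
    payload = raw.numpy()
    # Stand-in for "parse the bytes, then encode": one int64 per byte.
    return np.frombuffer(payload, dtype=np.uint8).astype(np.int64)


def wrapped(raw):
    # map() traces this function in graph mode, where `raw` is a symbolic
    # tensor without .numpy(); tf.py_function runs decode_record eagerly.
    seq = tf.py_function(decode_record, [raw], tf.int64)
    # py_function drops static shape information, so declare a 1-D shape.
    seq.set_shape([None])
    return seq


# In-memory stand-in for tf.data.TFRecordDataset(...).
dataset = tf.data.Dataset.from_tensor_slices([b"\x01\x02\x03", b"\x04\x05"])
dataset = dataset.map(wrapped)

decoded = [item.numpy().tolist() for item in dataset]
print(decoded)
```

The Python function still runs under the interpreter lock, so this keeps the performance concern mentioned above; it only shows why .numpy() is available in one place and not the other.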

Tensorflow: InvalidArgumentError: Input ... incompatible with expected float_ref

The following code results in a very unhelpful error:
import tensorflow as tf
x = tf.Variable(tf.constant(0.), name="x")
with tf.Session() as s:
    val = s.run(x.assign(1))
    print(val) # 1
    val = s.run(x, {x: 2})
    print(val) # 2
    val = s.run(x.assign(1), {x: 0.}) # InvalidArgumentError
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 0 of node Assign_1 was passed float from _arg_x_0_0:0 incompatible with expected float_ref.
How did I get this error?
Why do I get this error?
Here's what I could infer.
How did I get this error?
This error is seen when attempting to perform the following two operations in a single session run:
A Tensorflow variable is assigned a value
That same variable is also passed a value as part of the feed_dict
This is why the first two runs succeed: neither of them attempts both of these operations at once.
Why do I get this error?
I am not sure, but I don't think this was an intentional design choice by Google. Here's my explanation:
First, the TensorFlow (TF) source code (basically) resolves x.assign(1) to tf.assign(x, 1), which helps in understanding what the error message means by Input 0.
The error message refers to x when it says Input 0 of the assign op.
It goes on to say that the first argument of the assign op was passed float from _arg_x_0_0:0.
TLDR
Thus, in a run where a TF variable is supplied as a feed, that variable is no longer treated as a variable (but as the value it was fed), so any attempt to assign a value to it in the same run fails, since only TF variables can be assigned a value in the graph.
Fix
If your graph has a variable assignment operation, don't pass a value to that same variable in your feed_dict. ¯_(ツ)_/¯. Assuming you're using the feed_dict to provide an initial value, you could instead assign it in a prior session run. Or use tf.control_dependencies when building your graph to assign the initial value from a placeholder, as shown below:
import tensorflow as tf
x = tf.Variable(tf.constant(0.), name="x")
initial_x = tf.placeholder(tf.float32)
assign_from_placeholder = x.assign(initial_x)
with tf.control_dependencies([assign_from_placeholder]):
    x_assign = x.assign(1)
with tf.Session() as s:
    val = s.run(x_assign, {initial_x: 0.}) # Success!

Placeholders for LSTM-RNN parameters in TensorFlow

I would like to use placeholders for the dropout rate, number of hidden units, and number of layers in an LSTM-based RNN. Below is the code I am currently trying.
dropout_rate = tf.placeholder(tf.float32)
n_units = tf.placeholder(tf.uint8)
n_layers = tf.placeholder(tf.uint8)
net = rnn_cell.BasicLSTMCell(n_units)
net = rnn_cell.DropoutWrapper(net, output_keep_prob = dropout_rate)
net = rnn_cell.MultiRNNCell([net] * n_layers)
The last line gives the following error:
TypeError: Expected uint8, got <tensorflow.python.ops.rnn_cell.DropoutWrapper
object ... of type 'DropoutWrapper' instead.
I would appreciate any help.
The error is raised by the following expression: [net] * n_layers.
You are trying to make a list looking like [net, net, ..., net] (with a length of n_layers), but n_layers is now a placeholder of unknown value.
I can't think of a way to do that with a placeholder, so I guess you must go back to a standard n_layers=3. (Anyway, putting n_layers as a placeholder was not a good practice in the first place.)
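A plain-Python sketch of the point above: the number of layers has to be a concrete Python int when the list of cells is built. Cell and build_cells are hypothetical stand-ins, not TensorFlow classes or functions.

```python
class Cell:
    """Hypothetical stand-in for an RNN cell; not a TensorFlow class."""

    def __init__(self, n_units):
        self.n_units = n_units


def build_cells(n_units, n_layers):
    # range() needs a concrete int, so n_layers must be known when the
    # network is constructed, not supplied later through a feed.
    return [Cell(n_units) for _ in range(n_layers)]


cells = build_cells(n_units=128, n_layers=3)

# For comparison: list repetition reuses one object n times,
# whereas the comprehension above creates a separate cell per layer.
repeated = [Cell(128)] * 3

print(len(cells), len({id(c) for c in cells}), len({id(c) for c in repeated}))
```

To compare different depths, one option is to rebuild the network for each value of n_layers instead of feeding it at run time.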

Is there a way to easily get the logarithm of a np.ndarray containing errors

If I have an np.array of values, Y, with an np.array of corresponding errors, Err, the error in the log scale will be
Err_{log} = log(Y+Err) - log(Y) = log ((Y+Err)/Y)
While I can put this in my code, it isn't very readable. Is there a function that does that?
NumPy has the function log1p(x) that computes the log of 1+x. So you could write:
Err_log = np.log1p(Err/Y)
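A quick check that the two forms agree, using made-up sample values for Y and Err (the helper name log_error is only for illustration):

```python
import numpy as np


def log_error(Y, Err):
    # log((Y + Err) / Y) = log(1 + Err / Y), which is what log1p computes.
    return np.log1p(Err / Y)


# Hypothetical sample values and their errors.
Y = np.array([10.0, 100.0, 1000.0])
Err = np.array([1.0, 5.0, 20.0])

err_log = log_error(Y, Err)
err_log_direct = np.log(Y + Err) - np.log(Y)

print(np.allclose(err_log, err_log_direct))
```

log1p is also more accurate than the direct formula when Err is very small relative to Y.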