get TensorFlow tf.Operation inputs by name - tensorflow

Let op be a tf.Operation. I can get its inputs via op.inputs. But the inputs also have names defined by the op type, which I probably can extract somehow via op.op_def. But is there an easy way?
In the end, I would want a dict like {input_name[i]: op.inputs[i] for i ...}.
My current solution:
def get_protobuf_fields(obj):
"""
:param obj: protobuf object
:rtype: dict[str]
"""
return {k.name: v for (k, v) in obj.ListFields()}
def get_operation_input_names(op):
"""
:param tf.Operation op:
:rtype: list[str]
"""
op_def_fields = get_protobuf_fields(op.op_def)
args_pb = [get_protobuf_fields(a) for a in op_def_fields["input_arg"]]
return [a["name"] for a in args_pb]
Then you get the dict like:
inputs_by_name = dict(zip(get_operation_input_names(op), op.inputs))
For example, for op = tf.add(3,4).op, you would get inputs_by_name = {'x': ..., 'y': ...}.
Is there a simpler way?

Related

Keras custom layer on ragged tensor to reduce dimensionallity

I'm trying to write a custom layer that will handle variable-length vectors, and reduce them to the same length vector.
The length is known in advance because the reason for the variable lengths is that I have several different data types that I encode using a different number of features.
In a sense, it is similar to Embedding only for numerical values.
I've tried using padding, but the results were bad, so I'm trying this approach instead.
So, for example let's say I have 3 data types, which I encode with 3, 4, 6 length vectors.
arr = [
# example one (data type 1 [len()==3], datat type 3[len()==6]) - force values as floats
[[1.0,2.0,3],[1,2,3,4,5,6]],
# example two (data type 2 [len()==4], datat type 3len()==6]) - force values as floats
[[1.0,2,3,4],[1,2,3,4,5,6]],
]
I tried implementing a custom layer like:
class DimensionReducer(tf.keras.layers.Layer):
def __init__(self, output_dim, expected_lengths):
super(DimensionReducer, self).__init__()
self._supports_ragged_inputs = True
self.output_dim = output_dim
for l in expected_lengths:
setattr(self,f'w_{l}', self.add_weight(shape=(l, self.output_dim),initializer='random_normal',trainable=True))
setattr(self, f'b_{l}',self.add_weight(shape=(self.output_dim,), initializer='random_normal',trainable=True))
def call(self, inputs):
print(inputs.shape)
# batch
if len(inputs.shape) == 3:
print("batch")
result = []
for i,x in enumerate(inputs):
_result = []
for v in x:
l = len(v)
print(l)
print(v)
w = getattr(self, f'w_{l}')
b = getattr(self, f'b_{l}')
out = tf.matmul([v],w) + b
_result.append(out)
result.append(tf.concat(_result, 0))
r = tf.stack(result)
print("batch output:",r.shape)
return r
Which seems to be working when called directly:
dim = DimensionReducer(3, [3,4,6])
dim(tf.ragged.constant(arr))
But when I try to incorporate it into a model, it fails:
import tensorflow as tf
val_ragged = tf.ragged.constant(arr)
inputs_ragged = tf.keras.layers.Input(shape=(None,None), ragged=True)
outputs_ragged = DimensionReducer(3, [3,4,6])(inputs_ragged)
model_ragged = tf.keras.Model(inputs=inputs_ragged, outputs=outputs_ragged)
# this one with RaggedTensor doesn't
print(model_ragged(val_ragged))
With
AttributeError: 'DimensionReducer' object has no attribute 'w_Tensor("dimension_reducer_98/strided_slice:0", shape=(), dtype=int32)'
I'm not sure how am I to implement such a layer, or what I'm doing wrong.

how to iterate over tensorflow tensor_list referencing both index and individual tensor

I tried following code, used tf.map_fn, and still got error as: "Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn."
def get_tensor_A(i):
return mask[i] #another tensor returned
def get_tensor_B(x):
return tf.nn.softmax(x, 1)
def final_rt(i, inp):
return get_tensor_A(idx)*get_tensor_B(inp)
tensor_2_list = [ tf.map_fn(lambda x: final_rt(x(0), x(1)), (idx, inp)) for idx, inp in enumerate(tensor_1_list)]

How to perform string find and replace on Tensorflow String Tensor?

I currently am using the Tensorflow dataset api to perform some augmentations to images at a specified path. The filename itself contains information that states whether to augment the file or not. So what I want to do is read in the files from the dataset and for each file, perform a find within the filename and if I find a specific substring, then set a bool flag and replace the substring with "".
The error I get is:
AttributeError: 'Tensor' object has no attribute 'find'
I can't perform a "find" on the tensor with dtype string entries because find is not a part of the Tensor, so I am trying to figure out how I can go about performing the above action. I have shared some code below that I think demonstrates what I am trying to do. Performance is important, so I would prefer to do this the correct way if anyone sees that I am going about doing this via the Dataset API incorrectly.
def preproc_img(filenames):
def parse_fn(filename):
augment_inst = False
if cfg.SPLIT_INTO_INST:
#*****************************************************
#*** THIS IS WHERE THE LOGIC IS CURRENTLY BREAKING ***
#*****************************************************
if filename.find('_data_augmentation') != -1:
augment_inst = True
filename = filename.replace('_data_augmentation', '')
image_string = tf.read_file(filename)
img = tf.image.decode_image(image_string, channels=3)
return dict(zip([filename], [img]))
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.map(parse_fn)
iterator = dataset.make_one_shot_iterator()
return iterator.get_next()
def perform_train():
if __name__ == '__main__':
filenames = helper.get_image_paths()
next_batch = preproc_img(filenames)
with tf.Session() as sess:
with sess .graph.as_default():
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
dat = sess.run(next_batch)
# I would now go about calling any of my tf op code below
You can use tf.regex_replace for replacing text in a tf.string tensor.
filename = tf.regex_replace(filename, "_data_augmentation", "")
For TF 2.0
filename = tf.strings.regex_replace(filename, "_data_augmentation", "")

TensorFlow input function for reading sparse data (in libsvm format)

I'm new to TensorFlow and trying to use the Estimator API for some simple classification experiments. I have a sparse dataset in libsvm format. The following input function works for small datasets:
def libsvm_input_function(file):
def input_function():
indexes_raw = []
indicators_raw = []
values_raw = []
labels_raw = []
i=0
for line in open(file, "r"):
data = line.split(" ")
label = int(data[0])
for fea in data[1:]:
id, value = fea.split(":")
indexes_raw.append([i,int(id)])
indicators_raw.append(int(1))
values_raw.append(float(value))
labels_raw.append(label)
i=i+1
indexes = tf.SparseTensor(indices=indexes_raw,
values=indicators_raw,
dense_shape=[i, num_features])
values = tf.SparseTensor(indices=indexes_raw,
values=values_raw,
dense_shape=[i, num_features])
labels = tf.constant(labels_raw, dtype=tf.int32)
return {"indexes": indexes, "values": values}, labels
return input_function
However, for a dataset of a few GB size I get the following error:
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
How can I avoid this error? How should I write an input function to read medium-sized sparse datasets (in libsvm format)?
When use estimator, for libsvm data input, you can create dense index list, dense value list, then use feature_column.categorical_column_with_identity and feature_column.weighted_categorical_column to create feature column, finally, put feature columns to estimator. Maybe your input features length is variable, you can use padded_batch to handle it.
here some codes:
## here is input_fn
def input_fn(data_dir, is_training, batch_size):
def parse_csv(value):
## here some process to create feature_indices list, feature_values list and labels
return {"index": feature_indices, "value": feature_values}, labels
dataset = tf.data.Dataset.from_tensor_slices(your_filenames)
ds = dataset.flat_map(
lambda f: tf.data.TextLineDataset(f).map(parse_csv)
)
ds = ds.padded_batch(batch_size, ds.output_shapes, padding_values=(
{
"index": tf.constant(-1, dtype=tf.int32),
"value": tf.constant(0, dtype=tf.float32),
},
tf.constant(False, dtype=tf.bool)
))
return ds.repeat().prefetch(batch_size)
## create feature column
def build_model_columns():
categorical_column = tf.feature_column.categorical_column_with_identity(
key='index', num_buckets=your_feature_dim)
sparse_columns = tf.feature_column.weighted_categorical_column(
categorical_column=categorical_column, weight_feature_key='value')
dense_columns = tf.feature_column.embedding_column(sparse_columns, your_embedding_dim)
return [sparse_columns], [dense_columns]
## when created feature column, you can put them into estimator, eg. put dense_columns into DNN, and sparse_columns into linear model.
## for export savedmodel
def raw_serving_input_fn():
feature_spec = {"index": tf.placeholder(shape=[None, None], dtype=tf.int32),
"value": tf.placeholder(shape=[None, None], dtype=tf.float32)}
return tf.estimator.export.build_raw_serving_input_receiver_fn(feature_spec)
Another way, you can create your custom feature column, like this: _SparseArrayCategoricalColumn
I have been using tensorflow.contrib.libsvm. Here's an example (i am using eager execution with generators)
import os
import tensorflow as tf
import tensorflow.contrib.libsvm as libsvm
def all_libsvm_files(folder_path):
for file in os.listdir(folder_path):
if file.endswith(".libsvm"):
yield os.path.join(folder_path, file)
def load_libsvm_dataset(path_to_folder):
return tf.data.TextLineDataset(list(all_libsvm_files(path_to_folder)))
def libsvm_iterator(path_to_folder):
dataset = load_libsvm_dataset(path_to_folder)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
yield libsvm.decode_libsvm(tf.reshape(next_element, (1,)),
num_features=666,
dtype=tf.float32,
label_dtype=tf.float32)
libsvm_iterator gives you a feature-label pair back on each iteration, from multiple files inside a folder that you specify.

while_loop error in Tensorflow

I tried to use while_loop in Tensorflow, but when I try to return the target output from callable in while loop, it gives me an error because the shape is increased every time.
The output should be contains (0 or 1) values based on data value (input array). If data value is large than 5 return 1 else return 0. The returned value must be added into output
This is the code::
import numpy as np
import tensorflow as tf
data = np.random.randint(10, size=(30))
data = tf.constant(data, dtype= tf.float32)
global output
output= tf.constant([], dtype= tf.float32)
i = tf.constant(0)
c = lambda i: tf.less(i, 30)
def b(i):
i= tf.add(i,1)
cond= tf.cond(tf.greater(data[i-1], tf.constant(5.)), lambda: tf.constant(1.0), lambda: tf.constant([0.0]))
output =tf.expand_dims(cond, axis = i-1)
return i, output
r,out = tf.while_loop(c, b, [i])
print(out)
sess= tf.Session()
sess.run(out)
The error::
r, out = tf.while_loop(c, b, [i])
ValueError: The two structures don't have the same number of elements.
First structure (1 elements): [tf.Tensor 'while/Identity:0' shape=()
dtype=int32]
Second structure (2 elements): [tf.Tensor 'while/Add:0' shape=()
dtype=int32, tf.Tensor 'while/ExpandDims:0' shape=unknown
dtype=float32>]
I use tensorflow-1.1.3 and python-3.5
How can I change my code to gives me the target result?
EDIT::
I edit the code based on #mrry answer, but I still have an issue that the output is incorrect answer
the output is numbers summation
a = tf.ones([10,4])
print(a)
a = tf.reduce_sum(a, axis = 1)
i =tf.constant(0)
c = lambda i, _:tf.less(i,10)
def Smooth(x):
return tf.add(x,2)
summ = tf.constant(0.)
def b(i,_):
global summ
summ = tf.add(summ, tf.cast(Smooth(a[i]), tf.float32))
i= tf.add(i,1)
return i, summ
r, smooth_l1 = tf.while_loop(c, b, [i, smooth_l1])
print(smooth_l1)
sess = tf.Session()
print(sess.run(smooth_l1))
the out put is 6.0 (wrong).
The tf.while_loop() function requires that the following four lists have the same length, and the same type for each element:
The list of arguments to the cond function (c in this case).
The list of arguments to the body function (b in this case).
The list of return values from the body function.
The list of loop_vars representing the loop variables.
Therefore, if your loop body has two outputs, you must add a corresponding argument to b and c, and a corresponding element to loop_vars:
c = lambda i, _: tf.less(i, 30)
def b(i, _):
i = tf.add(i, 1)
cond = tf.cond(tf.greater(data[i-1], tf.constant(5.)),
lambda: tf.constant(1.0),
lambda: tf.constant([0.0]))
# NOTE: This line fails with a shape error, because the output of `cond` has
# a rank of either 0 or 1, but axis may be as large as 28.
output = tf.expand_dims(cond, axis=i-1)
return i, output
# NOTE: Use a shapeless `tf.placeholder_with_default()` because the shape
# of the output will vary from one iteration to the next.
r, out = tf.while_loop(c, b, [i, tf.placeholder_with_default(0., None)])
As noted in the comments, the body of the loop (specifically the call to tf.expand_dims()) seems to be incorrect and this program won't work as-is, but hopefully this is enough to get you started.
If you see this error:
ValueError: The two structures don't have the same number of elements.
If you see it in a while_loop, that means your inputs and outputs out of the while loop have different shapes.
I solved it by making sure that I return the same structure of loop_vars from my while loop function, the condition function must also accept same loop vars.
Here is an example code
loop_vars = [i, loss, batch_size, smaller_str_lens]
def condition(*loop_vars):
i = loop_vars[0]
batch_size = loop_vars[2]
return tf.less(i, batch_size)
def body(*loop_vars):
i, loss, batch_size, smaller_str_lens = loop_vars
tf.print("The loop passed here")
## logic here
i = tf.add(i, 1)
return i, loss, batch_size, smaller_str_lens
loss = tf.while_loop(condition, compare_strings, loop_vars)[1]
The body func must return loop vars, and the condition func must accept loop vars