Mapping tensors to operations in TensorFlow

I'm trying to combine a number of tf.write_file ops into a single op to save a batch of images inside a single Session.run() call, with no luck.
This code:
r = tf.range(tf.shape(images)[0])
crops = tf.image.crop_and_resize(images, boxes, r, [160, 160])
crops = tf.image.convert_image_dtype(crops, dtype=tf.uint8)
def write_ith_image(i):
    nonlocal out_file_tensor, crops
    encoded_image = tf.image.encode_jpeg(crops[i])
    return tf.write_file(out_file_tensor[i], encoded_image)
encode_all = tf.map_fn(write_ith_image, r)
results in:
TypeError: Can't convert Operation 'map/while/WriteFile' to Tensor (target dtype=None, name='value', as_ref=False)
tf.while_loop gives similar results. Is there a way to map a tensor to a group of operations that can be executed in a single tf.Session.run() call? Why does this work for tensors, but not for operations with side effects?

You could return a dummy value.
def write_ith_image(i):
    nonlocal out_file_tensor, crops
    encoded_image = tf.image.encode_jpeg(crops[i])
    with tf.control_dependencies([tf.write_file(out_file_tensor[i], encoded_image)]):
        dummy = tf.constant([0])
    return dummy
encode_all = tf.map_fn(write_ith_image, r)
Not pretty, but gets the job done.
You could get something slightly more satisfying with tf.while_loop (no dummy values), but it would be more verbose and probably not any more efficient.
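For completeness, here is a minimal sketch of what that tf.while_loop variant could look like, reusing crops, out_file_tensor and images from the question (my own sketch, not the answerer's code):
def write_ith_image_body(i):
    encoded_image = tf.image.encode_jpeg(crops[i])
    write_op = tf.write_file(out_file_tensor[i], encoded_image)
    with tf.control_dependencies([write_op]):
        return i + 1  # the loop counter carries the dependency on the write
n = tf.shape(images)[0]
encode_all = tf.while_loop(lambda i: i < n, write_ith_image_body, [tf.constant(0)])
Running encode_all in a single Session.run() call forces every write to execute, because each loop iteration depends on its tf.write_file op.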

Related

tf.io.decode_raw returns a tensor: how to make it bytes or a string?

I've been struggling with this for a while. I searched Stack Overflow and checked the TF2
docs a bunch of times. There is one solution indicated below, but
I don't understand why my solution doesn't work.
In my case, I store a binary string (i.e., bytes) in TFRecords.
If I iterate over the dataset via as_numpy_iterator() or directly call numpy()
on each item, I can get back the binary string;
when iterating over the dataset directly, it does work.
I'm not sure what exactly map() passes to test_callback.
The argument doesn't have a numpy method or property, and the same is true of what
tf.io.decode_raw returns (it is a Tensor, but it has no numpy() either).
Essentially I need to take a binary string, parse it via my
x = decoder.FromString(y), and then pass x to my encoder,
which transforms the binary string into a tensor.
def test_callback(example_proto):
    # I tried to figure out whether I can use bytes.decode directly,
    # and what the most optimal solution is.
    parsed_features = tf.io.decode_raw(example_proto, out_type=tf.uint8)
    # tf.io.decode_raw returns a tensor with N bytes.
    x = creator.FromString(parsed_features.numpy)
    encoded_seq = midi_encoder.encode(x)
    return encoded_seq
raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(test_callback)
Thank you, folks.
I found one solution but I would love to see more suggestions.
def test_callback(example_proto):
    from_string = creator.FromString(example_proto.numpy())
    encoded_seq = encoder.encoder(from_string)
    return encoded_seq
raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(lambda x: tf.py_function(test_callback, [x], [tf.int64]))
My understanding is that tf.py_function carries a performance penalty.
Thank you
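If you stay with tf.py_function, a small follow-up that may help (my own sketch, reusing test_callback from above): py_function discards static shape information, so you can restore it with set_shape and let tf.data parallelize the calls.
def decode_example(example_proto):
    # Tout must match what the callback actually returns
    encoded_seq, = tf.py_function(test_callback, [example_proto], [tf.int64])
    # assumption: the encoder yields a 1-D sequence of int64 tokens
    encoded_seq.set_shape([None])
    return encoded_seq
raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(decode_example,
                              num_parallel_calls=tf.data.experimental.AUTOTUNE)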

TensorFlow/Keras: how to convert tf.feature_column into input tensors?

I have the following code to average embeddings for a list of item IDs.
(The embedding is trained on review_meta_id_input and used as a lookup for priors_input and for getting the average embedding.)
review_meta_id_input = tf.keras.layers.Input(shape=(1,), dtype='int32', name='review_meta_id')
priors_input = tf.keras.layers.Input(shape=(None,), dtype='int32', name='priors') # array of ids
item_embedding_layer = tf.keras.layers.Embedding(
    input_dim=100,  # max number
    output_dim=self.item_embedding_size,
    name='item')
review_meta_id_embedding = item_embedding_layer(review_meta_id_input)
selected = tf.nn.embedding_lookup(review_meta_id_embedding, priors_input)
non_zero_count = tf.cast(tf.math.count_nonzero(priors_input, axis=1), tf.float32)
embedding_sum = tf.reduce_sum(selected, axis=1)
item_average = tf.math.divide(embedding_sum, non_zero_count)
I also have some feature columns, such as:
(I just thought feature_column looked cool, but there isn't much documentation to go on.)
kid_youngest_month = feature_column.numeric_column("kid_youngest_month")
kid_age_youngest_buckets = feature_column.bucketized_column(kid_youngest_month, boundaries=[12, 24, 36, 72, 96])
I'd like to define [review_meta_id_input, priors_input, (tensors from feature_columns)] as the inputs to a Keras Model.
something like:
inputs = [review_meta_id_input, priors_input] + feature_layer
model = tf.keras.models.Model(inputs=inputs, outputs=o)
In order to get tensors from feature columns, the closest lead I have now is
fc_to_tensor = {fc: input_layer(features, [fc]) for fc in feature_columns}
from https://github.com/tensorflow/tensorflow/issues/17170
However, I'm not sure what the features argument should be in that code.
There's no clear example on https://www.tensorflow.org/api_docs/python/tf/feature_column/input_layer either.
How should I construct the features variable for fc_to_tensor?
Or is there a way to use keras.layers.Input and feature_column at the same time?
Or is there an alternative to tf.feature_column for doing the bucketing as above? Then I'll just drop feature_column for now.
The behavior you desire can be achieved through the following steps.
This works in TF 2.0.0-beta1, but may be changed or even simplified in future releases.
Please check out the issue in the TensorFlow GitHub repository, Unable to use FeatureColumn with Keras Functional API #27416. There you will find a more general example and useful comments about tf.feature_column and the Keras Functional API.
Meanwhile, based on the code in your question, the input tensor for the feature_column can be obtained like this:
# The feature column you have already defined
kid_youngest_month = feature_column.numeric_column("kid_youngest_month")
kid_age_youngest_buckets = feature_column.bucketized_column(kid_youngest_month, boundaries=[12, 24, 36, 72, 96])
# Then define the layer
feature_layer = tf.keras.layers.DenseFeatures([kid_age_youngest_buckets])
# The inputs for the DenseFeatures layer should be defined for each original feature column as a dictionary, where
# keys are the names of the feature columns, and
# values are tf.keras.Input(shape=(1,), name='name_of_feature_column', dtype=<actual dtype of the original column>)
feature_layer_inputs = {}
feature_layer_inputs['kid_youngest_month'] = tf.keras.Input(shape=(1,), name='kid_youngest_month', dtype=tf.int8)
# Then you can collect the inputs of your other layers and feature_layer_inputs into one list
inputs = [review_meta_id_input, priors_input] + [v for v in feature_layer_inputs.values()]
# Then define the outputs of this DenseFeatures layer
feature_layer_outputs = feature_layer(feature_layer_inputs)
# And pass them into another layer like any other
x = tf.keras.layers.Dense(256, activation='relu')(feature_layer_outputs)
# Or maybe concatenate them with outputs from your other layers
combined = tf.keras.layers.concatenate([x, feature_layer_outputs])
# And you will probably finish with a final output layer, maybe like this for classification
o = tf.keras.layers.Dense(classes_number, activation='softmax', name='sequential_output')(combined)
# So you pass to the model:
model_combined = tf.keras.models.Model(inputs=[s_inputs] + [v for v in feature_layer_inputs.values()], outputs=o)
Also note that in the model's fit() method you should specify which data should be used for each input.
One way, if you use tf.data.Dataset, is to make sure you use the same names for the features in the Dataset as for the keys in the feature_layer_inputs dictionary.
The other way is to use explicit notation like:
model.fit({'review_meta_id': review_meta_id_data, 'priors': priors_data, 'kid_youngest_month': kid_youngest_month_data},
          {'sequential_output': target_data},  # keys must match the Input/output layer names; target_data stands for your label array
          ...
          )
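For the tf.data route, a minimal sketch of such a Dataset (my own example; review_meta_id_data, priors_data, kid_youngest_month_data and target_data are the same placeholder arrays as above, with priors_data padded to a fixed length):
train_ds = tf.data.Dataset.from_tensor_slices((
    {
        'review_meta_id': review_meta_id_data,          # matches Input(name='review_meta_id')
        'priors': priors_data,                          # matches Input(name='priors')
        'kid_youngest_month': kid_youngest_month_data,  # matches the feature_layer_inputs key
    },
    {'sequential_output': target_data},                 # matches the output layer name
)).batch(32)
model_combined.fit(train_ds, epochs=5)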

Dataset API 'flat_map' method producing error for same code which works with 'map' method

I am trying to create a pipeline to read multiple CSV files using the TensorFlow Dataset API and Pandas. However, using the flat_map method produces errors, whereas with the map method I am able to build the code and run it in a session. This is the code I am using. I already opened issue #17415 in the TensorFlow GitHub repository, but apparently it is not a bug and they asked me to post here.
folder_name = './data/power_data/'
file_names = os.listdir(folder_name)
def _get_data_for_dataset(file_name, rows=100):
    print(file_name.decode())
    df_input = pd.read_csv(os.path.join(folder_name, file_name.decode()),
                           usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
    X_data = df_input.as_matrix()
    X_data.astype('float32', copy=False)
    return X_data
dataset = tf.data.Dataset.from_tensor_slices(file_names)
dataset = dataset.flat_map(lambda file_name: tf.py_func(_get_data_for_dataset,
                                                        [file_name], tf.float64))
dataset = dataset.batch(2)
iter = dataset.make_one_shot_iterator()
get_batch = iter.get_next()
I get the following error: map_func must return a Dataset object. The pipeline works without error when I use map, but it doesn't give the output I want. For example, if Pandas reads N rows from each of my CSV files, I want the pipeline to concatenate data from B files and give me an array with shape (N*B, 2). Instead, it is giving me (B, N, 2), where B is the batch size: map adds another axis instead of concatenating on the existing axis. From what I understood in the documentation, flat_map is supposed to give a flattened output. In the documentation, both map and flat_map return type Dataset. So how is my code working with map and not with flat_map?
It would also be great if you could point me towards code where the Dataset API has been used with the Pandas module.
As mikkola points out in the comments, Dataset.map() and Dataset.flat_map() expect functions with different signatures: Dataset.map() takes a function that maps a single element of the input dataset to a single new element, whereas Dataset.flat_map() takes a function that maps a single element of the input dataset to a Dataset of elements.
If you want each row of the array returned by _get_data_for_dataset() to
become a separate element, you should use Dataset.flat_map() and convert the output of tf.py_func() to a Dataset, using Dataset.from_tensor_slices():
folder_name = './data/power_data/'
file_names = os.listdir(folder_name)
def _get_data_for_dataset(file_name, rows=100):
    df_input = pd.read_csv(os.path.join(folder_name, file_name.decode()),
                           usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
    X_data = df_input.as_matrix()
    return X_data.astype('float32', copy=False)
dataset = tf.data.Dataset.from_tensor_slices(file_names)
# Use `Dataset.from_tensor_slices()` to make a `Dataset` from the output of
# the `tf.py_func()` op.
dataset = dataset.flat_map(lambda file_name: tf.data.Dataset.from_tensor_slices(
    tf.py_func(_get_data_for_dataset, [file_name], tf.float32)))
dataset = dataset.batch(2)
iter = dataset.make_one_shot_iterator()
get_batch = iter.get_next()
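For reference, here is a rough port of the same pipeline to TF 2.x eager execution (my own sketch, assuming the same folder layout): tf.py_func becomes tf.py_function, and the one-shot iterator becomes plain Python iteration.
import os
import pandas as pd
import tensorflow as tf
folder_name = './data/power_data/'
file_names = os.listdir(folder_name)
def _get_data_for_dataset(file_name, rows=100):
    # inside tf.py_function the argument is an EagerTensor, hence .numpy()
    df_input = pd.read_csv(os.path.join(folder_name, file_name.numpy().decode()),
                           usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
    return df_input.to_numpy().astype('float32')
dataset = tf.data.Dataset.from_tensor_slices(file_names)
dataset = dataset.flat_map(lambda file_name: tf.data.Dataset.from_tensor_slices(
    tf.py_function(_get_data_for_dataset, [file_name], tf.float32)))
dataset = dataset.batch(2)
for batch in dataset.take(1):
    print(batch.shape)  # e.g. (2, 2): two rows of two columns per batch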

Convert string tensor to lower case

Is there any way to convert a string tensor to lower case, without evaluating it in the session? Some sort of tf.string_to_lower op?
More specifically, I am reading data from tfrecords files, so my data is made of tensors. I then want to use tf.contrib.lookup.index_table_from_* to look up indices for words in the data, and I need this to be case-insensitive. Lowering the data before writing it to tfrecords is not an option, as it needs to be kept in its original format. One option would be to store both the original and the lowered version, but I'd like to avoid this if possible.
Here's an implementation with tensorflow ops:
def lowercase(s):
    upchars = tf.constant([chr(i) for i in range(65, 91)], dtype=tf.string)
    lchars = tf.constant([chr(i) for i in range(97, 123)], dtype=tf.string)
    upcharslut = tf.contrib.lookup.index_table_from_tensor(mapping=upchars, num_oov_buckets=1, default_value=-1)
    splitchars = tf.string_split(tf.reshape(s, [-1]), delimiter="").values
    upcharinds = upcharslut.lookup(splitchars)
    return tf.reduce_join(tf.map_fn(lambda x: tf.cond(x[0] > 25, lambda: x[1], lambda: lchars[x[0]]),
                                    (upcharinds, splitchars), dtype=tf.string))
if __name__ == "__main__":
    s = "komoDO DragoN "
    sess = tf.Session()
    x = lowercase(s)
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    print(sess.run([x]))
returns [b'komodo dragon ']
You can use tf.py_func to wrap a Python function that manipulates your string; it is executed within the graph.
You can do something like:
# I suppose your string tensor is tensorA
lower = tf.py_func(lambda x: x.lower(), [tensorA], tf.string, stateful=False)
# Starting from TF 2.0, `tf.py_func` is deprecated, so the correct code is
lower = tf.py_function(lambda x: x.numpy().lower(), [tensorA], tf.string)
Unfortunately, tf.py_func doesn't work in all cases, such as serving or TFT. The following snippet is a simple in-graph TF solution.
import tensorflow as tf
def to_lower_case(text):
    chars = tf.strings.unicode_decode(text, input_encoding='UTF-8')
    capital_mask = tf.logical_and(tf.greater_equal(chars, 65), tf.less(chars, 91))
    chars = chars + tf.cast(capital_mask, tf.int32) * 32
    return tf.strings.unicode_encode(chars, output_encoding='UTF-8')
with tf.Session() as sess:
    print(sess.run(to_lower_case('Test')))
In TensorFlow 1.14, a lower op has been added. A short code snippet (in eager execution mode) looks like the following:
astring = tf.constant('A String', dtype=tf.string)
tf.strings.lower(astring)
<tf.Tensor: id=79, shape=(), dtype=string, numpy=b'a string'>
If the characters you are using are limited to ASCII, I have a working solution for that (in graph). The idea is:
Create a lookup table whose keys are the characters with codes in [32, 127) and whose values are the same characters, except that those in [65, 91) are replaced with those in [97, 123). Method: tf.contrib.lookup.HashTable.
Split the string into characters. Method: tf.string_split.
Use the lookup to map uppercase characters to lowercase characters. Method: case_table.lookup (if the HashTable is called case_table).
Join the characters back into the string. Method: tf.reduce_join.
A concrete example can be found here: https://github.com/bshao001/ChatLearner/blob/master/chatbot/tokenizeddata.py
This approach should extend to other character sets. Note that if you tried to convert only those characters that need to be changed (such as the 26 English uppercase letters), it would be harder (I'm not sure it is even doable), as you would have to use tf.cond to check whether each character is in the key set or not, and it would be less efficient too.
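Here is a minimal sketch of those four steps with the TF 1.x contrib API (my own code, ASCII-only, not the snippet from the linked repository):
import tensorflow as tf
# Step 1: build the case table (printable ASCII, with A-Z mapped to a-z)
keys = tf.constant([chr(i) for i in range(32, 127)])
values = tf.constant([chr(i + 32) if 65 <= i < 91 else chr(i) for i in range(32, 127)])
case_table = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.KeyValueTensorInitializer(keys, values), default_value=' ')
def to_lower(s):
    # Step 2: split into characters; Step 3: look them up; Step 4: join back
    chars = tf.string_split(tf.reshape(s, [-1]), delimiter="").values
    return tf.reduce_join(case_table.lookup(chars))
with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    print(sess.run(to_lower(tf.constant("Komodo Dragon"))))  # b'komodo dragon'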

TensorFlow: How to apply the same image distortion to multiple images

Starting from the Tensorflow CNN example, I'm trying to modify the model to have multiple images as an input (so that the input has not just 3 input channels, but multiples of 3 by stacking images).
To augment the input, I'm trying to use the random image operations provided in TensorFlow, such as flipping, contrast and brightness.
My current solution to apply the same random distortion to all input images is to use a fixed seed value for these operations:
def distort_image(image):
    flipped_image = tf.image.random_flip_left_right(image, seed=42)
    contrast_image = tf.image.random_contrast(flipped_image, lower=0.2, upper=1.8, seed=43)
    brightness_image = tf.image.random_brightness(contrast_image, max_delta=0.2, seed=44)
    return brightness_image
This method is called multiple times, once per image, at graph construction time, so I thought each image would use the same random number sequence and, consequently, the same image operations would be applied to all images in my input sequence.
# ...
# distort images
distorted_prediction = distort_image(seq_record.prediction)
distorted_input = []
for i in xrange(INPUT_SEQ_LENGTH):
    distorted_input.append(distort_image(seq_record.input[i, :, :, :]))
stacked_distorted_input = tf.concat(2, distorted_input)
# Ensure that the random shuffling has good mixing properties.
min_queue_examples = int(num_examples_per_epoch *
                         MIN_FRACTION_EXAMPLES_IN_QUEUE)
# Generate a batch of sequences and prediction by building up a queue of examples.
return generate_sequence_batch(stacked_distorted_input, distorted_prediction,
                               min_queue_examples, batch_size, shuffle=True)
In theory, this works fine. And after doing some test runs, this really seemed to solve my problem. But after a while, I found out that I have a race condition, because I use the input pipeline of the CNN example code with multiple threads (which is the suggested method in TensorFlow to improve performance and reduce memory consumption at runtime):
def generate_sequence_batch(sequence_in, prediction, min_queue_examples,
                            batch_size):
    num_preprocess_threads = 8  # <-- !!!
    sequence_batch, prediction_batch = tf.train.shuffle_batch(
        [sequence_in, prediction],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size,
        min_after_dequeue=min_queue_examples)
    return sequence_batch, prediction_batch
Because multiple threads create my examples, it is no longer guaranteed that all image operations are performed in the right order (in the sense of the right order of random operations).
Here I came to a point where I got completely stuck. Does anyone know how to solve this problem to apply the same image distortion to multiple images?
Some thoughts of mine:
I thought about doing some synchronization around these image distortion methods, but I couldn't find anything provided by TensorFlow.
I tried to generate a random number myself, e.g. for the random brightness delta, using tf.random_uniform() and to use this value for tf.image.adjust_contrast(). But the result of the TensorFlow random generator is always a tensor, and I have not found a way to use this tensor as a parameter for tf.image.adjust_contrast(), which expects a simple float32 for its contrast_factor parameter.
A solution that would (partly) work would be to combine all images into one huge image using tf.concat(), apply the random operations to change contrast and brightness, and split the image afterwards. But this would not work for random flipping, because it would (at least in my case) change the order of the images, and there is no way to detect whether tf.image.random_flip_left_right() has performed a flip or not, which would be required to fix the wrong order of images if necessary.
Here is what I came up with by looking at the code of random_flip_up_down and random_flip_left_right within TensorFlow:
def image_distortions(image, distortions):
    distort_left_right_random = distortions[0]
    mirror = tf.less(tf.pack([1.0, distort_left_right_random, 1.0]), 0.5)
    image = tf.reverse(image, mirror)
    distort_up_down_random = distortions[1]
    mirror = tf.less(tf.pack([distort_up_down_random, 1.0, 1.0]), 0.5)
    image = tf.reverse(image, mirror)
    return image

distortions = tf.random_uniform([2], 0, 1.0, dtype=tf.float32)
image = image_distortions(image, distortions)
label = image_distortions(label, distortions)
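If you port this snippet to a newer TensorFlow release, note that tf.pack was renamed to tf.stack and tf.reverse now takes a list of axis indices instead of a boolean mask. A rough sketch of the same idea against the current API (my own port, assuming image is a height x width x channels tensor):
def image_distortions_v2(image, distortions):
    # distortions is the same [2] vector of uniforms in [0, 1) as above
    flip_left_right = distortions[0] < 0.5
    flip_up_down = distortions[1] < 0.5
    image = tf.cond(flip_left_right, lambda: tf.reverse(image, axis=[1]), lambda: image)
    image = tf.cond(flip_up_down, lambda: tf.reverse(image, axis=[0]), lambda: image)
    return image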
I would do something like this using tf.case. It allows you to specify what to return if a certain condition holds: https://www.tensorflow.org/api_docs/python/tf/case
import tensorflow as tf

def distort(image, x):
    # flip vertically, horizontally, both, or do nothing
    image = tf.case({
        tf.equal(x, 0): lambda: tf.reverse(image, [0]),
        tf.equal(x, 1): lambda: tf.reverse(image, [1]),
        tf.equal(x, 2): lambda: tf.reverse(image, [0, 1]),
    }, default=lambda: image, exclusive=True)
    return image

def random_distortion(image):
    x = tf.random_uniform([1], 0, 4, dtype=tf.int32)
    return distort(image, x[0])
To check that it works:
import numpy as np
import matplotlib.pyplot as plt

# create image
image = np.zeros((25, 25))
image[:10, 5:10] = 1.
# create subplots
fig, axes = plt.subplots(2, 2)
for i in axes.flatten(): i.axis('off')

with tf.Session() as sess:
    for i in range(4):
        distorted_img = sess.run(distort(image, i))
        axes[i % 2][i // 2].imshow(distorted_img, cmap='gray')
plt.show()
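To apply the same random distortion to several tensors, which is what the question asks for, draw x once and reuse it for every tensor (a small sketch reusing distort from above, with image and label as in the previous answer):
x = tf.random_uniform([], 0, 4, dtype=tf.int32)  # one draw, shared by all tensors
distorted_image = distort(image, x)
distorted_label = distort(label, x)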