convert tf.dense Tensor to tf.one_hot Tensor on Graph execution Tensorflow - tensorflow2.0

TF version: 2.11
I try to train a simple 2input classifier with TFRecords tf.data pipeline
I do not manage to convert the tf.dense Tensor with containing only a scalar to a tf.onehot vector
# get all recorddatasets abspath
training_names= [record_path+'/'+rec for rec in os.listdir(record_path) if rec.startswith('train')]
# load in tf dataset
train_dataset = tf.data.TFRecordDataset(training_names[1])
train_dataset = train_dataset.map(return_xy)
mapping function:
def return_xy(example_proto):
#parse example
sample= parse_function(example_proto)
#decode image 1
encoded_image1 = sample['image/encoded_1']
decoded_image1 = decode_image(encoded_image1)
#decode image 2
encoded_image2 = sample['image/encoded_2']
decoded_image2 = decode_image(encoded_image2)
#decode label
print(f'image/object/class/'+level: {sample['image/object/class/'+level]}')
class_label = tf.sparse.to_dense(sample['image/object/class/'+level])
print(f'type of class label :{type(class_label)}')
print(class_label)
# conversion to onehot with depth 26 :: -> how can i extract only the value or convert directly to tf.onehot??
label_onehot=tf.one_hot(class_label,26)
#resizing image
input_left=tf.image.resize(decoded_image1,[416, 416])
input_right=tf.image.resize(decoded_image2,[416, 416])
return {'input_3res1':input_left, 'input_5res2':input_right} , label_onehot
output:
image/object/class/'+level: SparseTensor(indices=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:14", shape=(None, 1), dtype=int64), values=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:31", shape=(None,), dtype=int64), dense_shape=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:48", shape=(1,), dtype=int64))
type of class label :<class 'tensorflow.python.framework.ops.Tensor'>
Tensor("SparseToDense:0", shape=(None,), dtype=int64)
However I am sure that the label is in this Tensor because when run it eagerly
raw_dataset = tf.data.TFRecordDataset([rec_file])
parsed_dataset = raw_dataset.map(parse_function) # only parsing
for sample in parsed_dataset:
class_label=tf.sparse.to_dense(sample['image/object/class/label_level3'])[0]
print(f'type of class label :{type(class_label)}')
print(f'labels from labelmap :{class_label}')
I get output:
type of class label :<class 'tensorflow.python.framework.ops.EagerTensor'>
labels from labelmap :7
If I just chose a random number for the label and pass it to tf_one_hot(randint, 26) then the model begins to train (obviously nonsensical).
So the question is how can i convert the:
Tensor("SparseToDense:0", shape=(None,), dtype=int64)
to a
Tensor("one_hot:0", shape=(26,), dtype=float32)
What I tried so far
in the call data.map(parse_xy)
i tried to just call .numpy() on the tf tensors but didnt work , this only works for eager tensors.
In my understanding i cannot use eager execution because everthing in the parse_xy function gets excecuted on the whole graph:
ive already tried to enable eager execution -> failed
https://www.tensorflow.org/api_docs/python/tf/config/run_functions_eagerly
Note: This flag has no effect on functions passed into tf.data transformations as arguments.
tf.data functions are never executed eagerly and are always executed as a compiled Tensorflow Graph.
ive also tried to use the tf_pyfunc but this only returns another tf.Tensor with an unknown shape
def get_onehot(tensor):
class_label=tensor[0]
return tf.one_hot(class_label,26)
and add the line in parse_xy:
label_onehot=tf.py_function(func=get_onehot, inp=[class_label], Tout=tf.int64)
but there i always get an unknown shape which a cannot just alter with .set_shape()

I was able to solve the issue by only using TensorFlow functions.
tf.gather allows to index a TensorFlow tensor:
class_label_gather = tf.sparse.to_dense(sample['image/object/class/'+level])
class_indices = tf.gather(tf.cast(class_label_gather,dtype=tf.int32),0)
label_onehot=tf.one_hot(class_indices,26)

Related

How to avoid memory leakage in an autoregressive model within tensorflow

Recently, I am training a LSTM with attention mechanism for regressionin tensorflow 2.9 and I met an problem during training with model.fit():
At the beginning, the training time is okay, like 7s/step. However, it was increasing during the process and after several steps, like 1000, the value might be 50s/step. Here below is a part of the code for my model:
class AttentionModel(tf.keras.Model):
def __init__(self, encoder_output_dim, dec_units, dense_dim, batch):
super().__init__()
self.dense_dim = dense_dim
self.batch = batch
encoder = Encoder(encoder_output_dim)
decoder = Decoder(dec_units,dense_dim)
self.encoder = encoder
self.decoder = decoder
def call(self, inputs):
# Creat a tensor to record the result
tempt = list()
encoder_output, encoder_state = self.encoder(inputs)
new_features = np.zeros((self.batch, 1, 1))
dec_initial_state = encoder_state
for i in range(6):
dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
tempt.append(dec_result.logits)
new_features = dec_result.logits
dec_initial_state = dec_state
result=tf.concat(tempt,1)
return result
In the official documents for tf.function, I notice: "Don't rely on Python side effects like object mutation or list appends".
Since I use a dynamic python list with append() to record the intermediate variables, I guess each time during training, a new tf.graph was added. Is the reason my training is getting slower and slower?
Additionally, what should I use instead of python list to avoid this? I have tried with a numpy.zeros matrix but it will lead to another problem:
tempt = np.zeros(shape=(1,6))
...
for i in range(6):
dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
tempt[i]=(dec_result.logits)
...
Cannot convert a symbolic tf.Tensor (decoder/dense_3/BiasAdd:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.

Get output layer Tensorflow 1.14

I want to create an multi-modal machine learning model using TLSTM for time-variant data.
In order to concatinate time-variant with time-invariant data I need to get the output vector of the TLSTM.
I´m using this TLSTM Model: https://github.com/illidanlab/T-LSTM
I updated the repo to be compatible with Tensorflow 1.14 and Python 3.7.12.
I assume you can exreact the output vector at the get_output function:
def get_output(self, state):
output = tf.nn.relu(tf.matmul(state, self.Wo) + self.bo)
output = tf.nn.dropout(output, self.keep_prob)
output = tf.matmul(output, self.W_softmax) + self.b_softmax
return output
If I print the output I get an tensor:
output = tf.nn.relu(tf.matmul(state, self.Wo) + self.bo)
print(output)
-> Tensor("map_5/while/Relu:0", shape=(?, 64), dtype=float32)
print(output[0])
-> Tensor("map_5/while/strided_slice:0", shape=(64,), dtype=float32)
print(output[0, 0])
-> Tensor("map_5/while/strided_slice_1:0", shape=(), dtype=float32)
The 64 dimension seems to be the output I´m looking for, how can I access it?
Solution: tf.print()
Note that inside of a session it must be called as control_dependence:
def get_output(self, state):
output = tf.nn.relu(tf.matmul(state, self.Wo) + self.bo)
print_op = tf.print(
output,
summarize=-1,
output_stream="file://C:/Users/my_path/T-LSTM-master/features/foo.out")
with tf.control_dependencies([print_op]):
output = tf.nn.dropout(output, self.keep_prob)
output = tf.matmul(output, self.W_softmax) + self.b_softmax
return output
This example directly saves the feature as a file. Summarize=-1 to save/print the entire tensor.
If you just want to print your output tensor, most of the time tf.print(output) would give you the required result.

ValueError when trying to fine-tune GPT-2 model in TensorFlow

I am encountering a ValueError in my Python code when trying to fine-tune Hugging Face's distribution of the GPT-2 model. Specifically:
ValueError: Dimensions must be equal, but are 64 and 0 for
'{{node Equal_1}} = Equal[T=DT_FLOAT, incompatible_shape_error=true](Cast_18, Cast_19)'
with input shapes: [64,0,1024], [2,0,12,1024].
I have around 100 text files that I concatenate into a string variable called raw_text and then pass into the following function to create training and testing TensorFlow datasets:
def to_datasets(raw_text):
# split the raw text in smaller sequences
seqs = [
raw_text[SEQ_LEN * i:SEQ_LEN * (i + 1)]
for i in range(len(raw_text) // SEQ_LEN)
]
# set up Hugging Face GPT-2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token
# tokenize the character sequences
tokenized_seqs = [
tokenizer(seq, padding="max_length", return_tensors="tf")["input_ids"]
for seq in seqs
]
# convert tokenized sequences into TensorFlow datasets
trn_seqs = tf.data.Dataset \
.from_tensor_slices(tokenized_seqs[:int(len(tokenized_seqs) * TRAIN_PERCENT)])
tst_seqs = tf.data.Dataset \
.from_tensor_slices(tokenized_seqs[int(len(tokenized_seqs) * TRAIN_PERCENT):])
def input_and_target(x):
return x[:-1], x[1:]
# map into (input, target) tuples, shuffle order of elements, and batch
trn_dataset = trn_seqs.map(input_and_target) \
.shuffle(SHUFFLE_BUFFER_SIZE) \
.batch(BATCH_SIZE, drop_remainder=True)
tst_dataset = tst_seqs.map(input_and_target) \
.shuffle(SHUFFLE_BUFFER_SIZE) \
.batch(BATCH_SIZE, drop_remainder=True)
return trn_dataset, tst_dataset
I then try to train my model, calling train_model(*to_datasets(raw_text)):
def train_model(trn_dataset, tst_dataset):
# import Hugging Face GPT-2 model
model = TFGPT2Model.from_pretrained("gpt2")
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=tf.metrics.SparseCategoricalAccuracy()
)
model.fit(
trn_dataset,
epochs=EPOCHS,
initial_epoch=0,
validation_data=tst_dataset
)
The ValueError is triggered on the model.fit() call. The variables in all-caps are settings pulled in from a JSON file. Currently, they are set to:
{
"BATCH_SIZE":64,
"SHUFFLE_BUFFER_SIZE":10000,
"EPOCHS":500,
"SEQ_LEN":2048,
"TRAIN_PERCENT":0.9
}
Any information regarding what this error means or ideas on how to resolve it would be greatly appreciated. Thank you!
I'm having the same problem but when I change the batch size to 12 (same as n_layer parameter in the gpt-2 config file) it works.
I don't Know why it works but you can try it...
If you manage to solve it on different way I will be glad to hear.

tensorflow error - you must feed a value for placeholder tensor 'in'

I'm trying to implement queues for my tensorflow prediction but get the following error -
you must feed a value for placeholder tensor 'in' with dtype float and shape [1024,1024,3]
The program works fine if I use the feed_dict, Trying to replace feed_dict with queues.
The program basically takes a list of positions and passes the image np array to the input tensor.
for each in positions:
y,x = each
images = img[y:y+1024,x:x+1024,:]
a = images.astype('float32')
q = tf.FIFOQueue(capacity=200,dtypes=dtypes)
enqueue_op = q.enqueue(a)
qr = tf.train.QueueRunner(q, [enqueue_op] * 1)
tf.train.add_queue_runner(qr)
data = q.dequeue()
graph=load_graph('/home/graph/frozen_graph.pb')
with tf.Session(graph=graph,config=tf.ConfigProto(log_device_placement=True)) as sess:
p_boxes = graph.get_tensor_by_name("cat:0")
p_confs = graph.get_tensor_by_name("sha:0")
y = [p_confs, p_boxes]
x = graph.get_tensor_by_name("in:0")
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord,sess=sess)
confs, boxes = sess.run(y)
coord.request_stop()
coord.join(threads)
How can I make sure the input data that I populated to the queue is recognized while running the graph in the session.
In my original run I call the
confs, boxes = sess.run([p_confs, p_boxes], feed_dict=feed_dict_testing)
I'd suggest not using queues for this problem, and switching to the new tf.data API. In particular tf.data.Dataset.from_generator() makes it easier to feed in data from a Python function. You can rewrite your code to be much simpler, as follows:
def generator():
for y, x in positions:
images = img[y:y+1024,x:x+1024,:]
yield images.astype('float32')
dataset = tf.data.Dataset.from_generator(
generator, tf.float32, [1024, 1024, img.shape[3]])
# Add any extra transformations in here, like `dataset.batch()` or
# `dataset.repeat()`.
# ...
iterator = dataset.make_one_shot_iterator()
data = iterator.get_next()
Note that in your program, there's no connection between the data tensor and the graph you loaded in load_graph() (at least, assuming that load_graph() doesn't grab data from the global state!). You will probably need to use tf.import_graph_def() and the input_map argument to associate data with one of the tensors in your frozen graph (possibly "in:0"?) to complete the task.

"Output 0 of type double does not match declared output type string" while running the iris sample program in TensorFlow Serving

I am running the sample iris program in TensorFlow Serving. Since it is a TF.Learn model, I am exporting the model using the following classifier.export(export_dir=model_dir,signature_fn=my_classification_signature_fn) and the signature_fn is defined as shown below:
def my_classification_signature_fn(examples, unused_features, predictions):
"""Creates classification signature from given examples and predictions.
Args:
examples: `Tensor`.
unused_features: `dict` of `Tensor`s.
predictions: `Tensor` or dict of tensors that contains the classes tensor
as in {'classes': `Tensor`}.
Returns:
Tuple of default classification signature and empty named signatures.
Raises:
ValueError: If examples is `None`.
"""
if examples is None:
raise ValueError('examples cannot be None when using this signature fn.')
if isinstance(predictions, dict):
default_signature = exporter.classification_signature(
examples, classes_tensor=predictions['classes'])
else:
default_signature = exporter.classification_signature(
examples, classes_tensor=predictions)
named_graph_signatures={
'inputs': exporter.generic_signature({'x_values': examples}),
'outputs': exporter.generic_signature({'preds': predictions})}
return default_signature, named_graph_signatures
The model gets successfully exported using the following piece of code.
I have created a client which makes real-time predictions using TensorFlow Serving.
The following is the code for the client:
flags.DEFINE_string("model_dir", "/tmp/iris_model_dir", "Base directory for output models.")
tf.app.flags.DEFINE_integer('concurrency', 1,
'maximum number of concurrent inference requests')
tf.app.flags.DEFINE_string('server', '', 'PredictionService host:port')
#connection
host, port = FLAGS.server.split(':')
channel = implementations.insecure_channel(host, int(port))
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
# Classify two new flower samples.
new_samples = np.array([5.8, 3.1, 5.0, 1.7], dtype=float)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'iris'
request.inputs["x_values"].CopyFrom(
tf.contrib.util.make_tensor_proto(new_samples))
result = stub.Predict(request, 10.0) # 10 secs timeout
However, on making the predictions, the following error is displayed:
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INTERNAL, details="Output 0 of type double does not match declared output type string for node _recv_input_example_tensor_0 = _Recv[client_terminated=true, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=2016246895612781641, tensor_name="input_example_tensor:0", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/cpu:0"]()")
Here is the entire stack trace.
enter image description here
The iris model is defined in the following manner:
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
hidden_units=[10, 20, 10],
n_classes=3, model_dir=model_dir)
# Fit model.
classifier.fit(x=training_set.data,
y=training_set.target,
steps=2000)
Kindly guide a solution for this error.
I think the problem is that your signature_fn is going on the else branch and passing predictions as the output to the classification signature, which expects a string output and not a double output. Either use a regression signature function or add something to the graph to get the output in the form of a string.