TensorFlow program gets stuck reading a TFRecord - tensorflow

I don't know what's going wrong with this small program.
Here is a snippet of the MCVE writer:
def convert_to_example():
    example = tf.train.Example(features=tf.train.Features(feature={
        'bboxes': _floats_feature([0., 1.])
    }))
    return example

writer = tf.python_io.TFRecordWriter(output_file)
...
for filename in filenames:
    ...
    example = convert_to_example()
    writer.write(example.SerializeToString())
writer.close()
This is how I read the examples:
filename = '/path/to/file'
record_iter = tf.python_io.tf_record_iterator(path=filename)
example = tf.train.Example()
l = []
for record in record_iter:
    example.ParseFromString(record)
    bboxes = example.features.feature['bboxes'].float_list.value[:]
    l.append(bboxes)
print(l)
I have narrowed the problem down to:
it works with bytes_list
it works with int64_list if the list is just one integer but not a list of integers
it works with float_list if the list is just one float but not a list of floats
So, if I use a list of floats/integers, the execution reaches a deadlock or crashes. If I use a single float/integer, everything runs smoothly.
Any ideas?

This error is system dependent. It works just fine on the workstation, but not on my PC. I opened an issue on GitHub.
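For reference, here is a minimal write/read round trip for a float_list that works for me; the _floats_feature helper is not shown in the question, so the version below is my assumption of the usual wrapper:
import tensorflow as tf

# Assumed helper (not shown in the question): wraps a list of floats.
def _floats_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

# Write a single example holding a two-element float list.
with tf.python_io.TFRecordWriter('/tmp/test.tfrecord') as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        'bboxes': _floats_feature([0., 1.])
    }))
    writer.write(example.SerializeToString())

# Read it back with the record iterator.
for record in tf.python_io.tf_record_iterator('/tmp/test.tfrecord'):
    parsed = tf.train.Example()
    parsed.ParseFromString(record)
    print(parsed.features.feature['bboxes'].float_list.value[:])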

Related

TFRecordWriter does not write

I am trying to create a tf.data.Dataset from a generator I wrote, and following this great answer: Split .tfrecords file into many .tfrecords files
Generator Code
def get_examples_generator(num_variants, vcf_reader):
    def generator():
        counter = 0
        for vcf_read in vcf_reader:
            is_vcf_ok = ...  # checking whether this "vcf" example is ok
            if is_vcf_ok and counter < num_variants:
                counter += 1
                # features extraction ...
                # we create an example
                example = make_example(img=img, label=label)  # returns a SerializedExample
                yield example
    return generator
TFRecordsWriter Usage Code
def write_sharded_tfrecords(filename, path, vcf_reader,
                            num_variants,
                            shard_len):
    assert Path(path).exists(), "path does not exist"
    generator = get_examples_generator(num_variants=num_variants,
                                       vcf_reader=vcf_reader)
    dataset = tf.data.Dataset.from_generator(generator,
                                             output_types=tf.string,
                                             output_shapes=())
    num_shards = int(np.ceil(num_variants / shard_len))
    formatter = lambda batch_idx: f'{path}/{filename}-{batch_idx:05d}-of-' \
                                  f'{num_shards:05d}.tfrecord'
    # inspired by https://stackoverflow.com/questions/54519309/split-tfrecords-file-into-many-tfrecords-files
    for i in range(num_shards):
        shard_path = formatter(i)
        writer = tf.data.experimental.TFRecordWriter(shard_path)
        shard = dataset.shard(num_shards, index=i)
        writer.write(shard)
This is supposed to be a straightforward use of the TFRecords writer. However, it does not write any files at all. Does anyone understand why this doesn't work?
In my functions, I call the writer with tf.io.TFRecordWriter. Try changing your writer and see if it works:
writer = tf.io.TFRecordWriter
...
As a further reference, this answer helped me:
https://stackoverflow.com/a/60283571
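For what it's worth, here is a minimal sketch of how this could look if you drive tf.io.TFRecordWriter from the generator directly, writing one shard at a time; the generator of serialized examples and the file-name pattern mirror the question and are assumptions on my part:
import numpy as np
import tensorflow as tf

def write_sharded_tfrecords(filename, path, serialized_examples,
                            num_variants, shard_len):
    # serialized_examples yields serialized tf.train.Example byte strings,
    # like the generator() in the question.
    num_shards = int(np.ceil(num_variants / shard_len))
    examples = iter(serialized_examples)
    for shard_idx in range(num_shards):
        shard_path = f'{path}/{filename}-{shard_idx:05d}-of-{num_shards:05d}.tfrecord'
        # tf.io.TFRecordWriter writes each record eagerly, so no session or
        # dataset graph is involved.
        with tf.io.TFRecordWriter(shard_path) as writer:
            for _ in range(shard_len):
                try:
                    writer.write(next(examples))
                except StopIteration:
                    return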

Unfurling TFRecords growing slower and slower

I'm trying to convert a TFRecord dataset back to images and I'm using the following code to do so:
def get_im_and_label_helper(parsed_features, im_format, label_format):
    im = tf.image.decode_png(parsed_features['image/encoded'])
    label = tf.image.decode_png(parsed_features['image/segmentation/class/encoded'])
    im, label = im.eval(), label.eval()
    return im, label

for tfr_file_path_name in tfr_files_list:
    tfr_file_path = os.path.join(sub_dataset_dir, tfr_file_path_name)
    record_iterator = tf.python_io.tf_record_iterator(tfr_file_path)
    for string_record in record_iterator:
        parsed_features = tf.parse_single_example(string_record, READ_FEATURES)
        filename = parsed_features['image/filename'].eval().decode("utf-8")
        im, label = get_im_and_label_helper(parsed_features, im_format, label_format)
        imageio.imwrite(os.path.join(target_dir, "images", filename + ".png"), im)
        imageio.imwrite(os.path.join(target_dir, "labels", filename + ".png"), label)
It works fine and does what I expect: it extracts the images and labels and saves them in the proper place. It starts fast, but gets slower and slower as it goes on. I'm inexperienced with TensorFlow, so I assume I'm causing some computation graph to grow bigger and bigger, but I don't really know.
Any ideas?
Using tf.enable_eager_execution() followed by tf.executing_eagerly(), and replacing all .eval() with .numpy(), solved the problem.
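As a rough sketch of what that looks like in practice, here is the same loop rewritten for eager mode; tfr_files_list, sub_dataset_dir, target_dir and READ_FEATURES are the names from the question and are assumed to be defined:
import os
import imageio
import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x; in TF 2.x eager execution is the default
assert tf.executing_eagerly()

for tfr_file_path_name in tfr_files_list:
    tfr_file_path = os.path.join(sub_dataset_dir, tfr_file_path_name)
    for string_record in tf.python_io.tf_record_iterator(tfr_file_path):
        parsed = tf.parse_single_example(string_record, READ_FEATURES)
        filename = parsed['image/filename'].numpy().decode("utf-8")
        # .numpy() replaces .eval(), so no graph keeps growing between records.
        im = tf.image.decode_png(parsed['image/encoded']).numpy()
        label = tf.image.decode_png(parsed['image/segmentation/class/encoded']).numpy()
        imageio.imwrite(os.path.join(target_dir, "images", filename + ".png"), im)
        imageio.imwrite(os.path.join(target_dir, "labels", filename + ".png"), label)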

OpenCv_Python - Convert Frame Sequence To a Video

I am a newbie to OpenCV with Python. I am currently working on a project that uses OpenCV in Python. I have a video data set named "VideoDataSet/dynamicBackground/canoe/input" that stores a sequence of image frames, and I would like to convert this sequence of frames into a video. However, I am getting an error when I execute the program. I have tried various codecs but it still gives me the same errors; can any of you please shed some light on what might be wrong? Thank you.
This is my sample code:
import cv2
import numpy as np
import os
import glob as gb

filename = "VideoDataSet/dynamicBackground/canoe/input"
img_path = gb.glob(filename)
videoWriter = cv2.VideoWriter('test.avi', cv2.VideoWriter_fourcc(*'MJPG'),
                              25, (640, 480))
for path in img_path:
    img = cv2.imread(path)
    img = cv2.resize(img, (640, 480))
    videoWriter.write(img)
print("you are success create.")
This is the error:
Error prompt out:cv2.error: OpenCV(3.4.1) D:\Build\OpenCV\opencv-3.4.1\modules\imgproc\src\resize.cpp:4044: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize
(Note: the problem occurs with the line img = cv2.resize(img, (640, 480)))
It is returning this error because you are trying to resize the directory entry! You need to put:
filename = "VideoDataSet/dynamicBackground/canoe/input/*"
So that it will match all the files in the folder when you glob it. The error actually suggested that the source image had either zero width or zero height. Putting:
print( img_path )
after your glob call showed that it was only returning the directory entry itself.
You subsequently discovered that although it was now generating a file, it was corrupted. This is because you are incorrectly specifying the codec. Replace your fourcc parameter with this:
cv2.VideoWriter_fourcc('M','J','P','G')
you can try this:
img_path = gb.glob(filename)
videoWriter = cv2.VideoWriter('frame2video.avi', cv2.VideoWriter_fourcc(*'MJPG'), 25, (640, 480))
for path in img_path:
    img = cv2.imread(path)
    img = cv2.resize(img, (640, 480))
    videoWriter.write(img)
videoWriter.release()  # finalize the file so the video is playable
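Note that this snippet still relies on the fix from the answer above: filename has to include a wildcard (for example the trailing /*) so that gb.glob returns the image files inside the folder rather than the directory entry itself.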

How does tf.contrib.seq2seq.gather_tree work?

How exactly does gather_tree in contrib.seq2seq work? I can see that it takes the predicted ids and beam parent ids and somehow returns the final beams, but what's actually going on underneath the hood? There doesn't seem to be any Python code base I could examine to figure it out. The API isn't very explanatory.
Is there any source code for tf.contrib.seq2seq.gather_tree? I am using TensorFlow 1.3, and looking inside gen_beam_search_ops.py doesn't seem helpful.
The code is detailed as follows:
def gather_tree_py(values, parents):
    """Gathers path through a tree backwards from the leaf nodes. Used
    to reconstruct beams given their parents."""
    beam_length = values.shape[0]
    num_beams = values.shape[1]
    res = np.zeros_like(values)
    res[-1, :] = values[-1, :]
    for beam_id in range(num_beams):
        parent = parents[-1][beam_id]
        for level in reversed(range(beam_length - 1)):
            res[level, beam_id] = values[level][parent]
            parent = parents[level][parent]
    return np.array(res).astype(values.dtype)

def gather_tree(values, parents):
    """Tensor version of gather_tree_py"""
    res = tf.py_func(
        func=gather_tree_py, inp=[values, parents], Tout=values.dtype)
    res.set_shape(values.get_shape().as_list())
    return res
github: seq2seq beam_search
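To make the backtracking concrete, here is a tiny hand-made example run through the gather_tree_py helper above; the token ids and parent ids are made up purely for illustration:
import numpy as np

# Shapes are [max_time, beam_width]: a token id per step and the index of the
# beam each entry was extended from.
values = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])
parents = np.array([[0, 0],
                    [1, 0],
                    [0, 1]])

print(gather_tree_py(values, parents))
# Walking back from the last step, beam 0 is reconstructed as [2, 3, 5]
# and beam 1 as [1, 4, 6]:
# [[2 1]
#  [3 4]
#  [5 6]]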

TensorFlow eval inbetween two queues

My goal is as follows:
1). Use tf.train.string_input_producer and tf.TextLineReader to read lines from files.
2). Convert the resulting tensors containing the files' lines into ordinary strings using eval to do preprocessing before batching (TensorFlow's limited string operations are insufficient for my purposes)
3). Convert these preprocessed strings back to tensors (presumably using tf.constant ?)
4). Use tf.train.batch on the resulting tensors.
The following code is a simplified version of what I'm working on.
The "After batch" print statement gets executed, the REPL hangs on the print statement with the final eval.
From what I've read, I have a feeling this is because
threads = tf.train.start_queue_runners(coord = coord, sess = sess)
needs to be run after calling tf.train.batch. But if I do this, then the REPL will of course hang on the first eval
evalue = value.eval(session = sess)
needed to do the preprocessing.
What is the best way to convert back and forth between tensors and their values in between queues? (I'm really hoping I can do this without preprocessing my data files beforehand.)
import tensorflow as tf
import os

def process(string):
    return string.upper()

def main():
    sess = tf.Session()
    filenames = tf.constant(["test_data/" + f for f in os.listdir("./test_data")])
    filename_queue = tf.train.string_input_producer(filenames)
    file_reader = tf.TextLineReader()
    key, value = file_reader.read(filename_queue)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord = coord, sess = sess)

    evalue = value.eval(session = sess)
    proc_value = process(evalue)
    tensor_value = tf.constant(proc_value)

    batch = tf.train.batch([tensor_value], batch_size = 2, capacity = 2)
    print "After batch."
    print batch.eval(session = sess)
We discussed a slightly different approach, which I think achieves what you need here:
Converting TensorFlow tutorial to work with my own data
Not sure what file formats you are reading, but the above example reads CSVs row-by-row and packs them into randomized batches.
If you are reading from a CSV then, in a nutshell, I think what you might want to do is: instead of returning value from file_reader.read(filename_queue) immediately, try doing some pre-processing first and return THAT instead, something like this:
rDefaults = [['a'] for row in range((ROW_LENGTH))]
_, value = reader.read(filename_queue)
whole_row = tf.decode_csv(value, record_defaults=rDefaults)
cell1 = tf.slice(whole_row, [0], [1]) # one specific cell that contains a string
cell2 = tf.slice(whole_row, [1], [2]) # another cell that contains a string
# do some processing on cell1 and cell2
return cell1, cell2
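For completeness, here is a rough sketch of how those pre-processed tensors could then feed into batching, following the same queue-runner pattern as the question; the read_row helper name and the two-column CSV layout are assumptions for illustration:
import os
import tensorflow as tf

def read_row(filename_queue, row_length):
    # Hypothetical helper wrapping the snippet above: decode a CSV row and
    # slice out two string cells before batching.
    reader = tf.TextLineReader()
    r_defaults = [['a'] for _ in range(row_length)]
    _, value = reader.read(filename_queue)
    whole_row = tf.decode_csv(value, record_defaults=r_defaults)
    cell1 = tf.slice(whole_row, [0], [1])
    cell2 = tf.slice(whole_row, [1], [1])
    # heavier preprocessing that TF ops cannot express could be wrapped
    # in tf.py_func here, keeping everything inside the graph
    return cell1, cell2

filenames = ["test_data/" + f for f in os.listdir("./test_data")]
filename_queue = tf.train.string_input_producer(filenames)
cell1, cell2 = read_row(filename_queue, row_length=2)
batch1, batch2 = tf.train.batch([cell1, cell2], batch_size=2, capacity=2)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)
    print(sess.run([batch1, batch2]))
    coord.request_stop()
    coord.join(threads)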