I am trying to reproduce the experiments in the paper Cross Modal Focal Loss for RGBD Face Anti-Spoofing (https://arxiv.org/pdf/2103.00948.pdf). I've pointed my preprocessed directory to the MC-PixBiS-224 preprocessed data in order to train the RGBDMH-CMFL model. I've selected the grandtest protocol for training and pointed the annotations directory to the PROTOCOL-grand_test-curated.csv file. However, my DataFolder class fails to load any training samples: when printed, the length of the dataset is 0.
Traceback (most recent call last):
File "bin/train_generic.py", line 22, in <module>
sys.exit(bob.learn.pytorch.scripts.train_generic.main())
File "/home/hazeeq/anaconda3/envs/bob.paper.cross_modal_focal_loss_cvpr2021/lib/python3.7/site-packages/bob/learn/pytorch/scripts/train_generic.py", line 150, in main
shuffle=True,
File "/home/hazeeq/anaconda3/envs/bob.paper.cross_modal_focal_loss_cvpr2021/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
sampler = RandomSampler(dataset, generator=generator) # type: ignore
File "/home/hazeeq/anaconda3/envs/bob.paper.cross_modal_focal_loss_cvpr2021/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 104, in __init__
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
(bob.paper.cross_modal_focal_loss_cvpr2021) hazeeq@hazeeq-U3033:~/test/bob.paper.cross_modal_focal_loss_cvpr2021$
Line 152 of train_generic.py falls in this section of the code, where dataloader["train"] fails to be constructed as a proper DataLoader object within the 'else' statement:
# Which device to use is figured out at this point, no need to use `use-gpu` flag anymore

# get data
if hasattr(configuration, "dataset"):

    dataloader = {}

    if not do_crossvalidation:
        logger.info(
            "There are {} training samples".format(
                len(configuration.dataset["train"])
            )
        )

        dataloader["train"] = torch.utils.data.DataLoader(
            configuration.dataset["train"],
            batch_size=batch_size,
            num_workers=num_workers,
            shuffle=True,
        )
    else:
        dataloader["train"] = torch.utils.data.DataLoader(
            configuration.dataset["train"],
            batch_size=batch_size,
            num_workers=num_workers,
            shuffle=True,
        )
        dataloader["val"] = torch.utils.data.DataLoader(
            configuration.dataset["val"],
            batch_size=batch_size,
            num_workers=num_workers,
            shuffle=True,
        )

        logger.info(
            "There are {} training samples".format(
                len(configuration.dataset["train"])
            )
        )
        logger.info(
            "There are {} validation samples".format(
                len(configuration.dataset["val"])
            )
        )

else:
    logger.error("Please provide a dataset in your configuration file !")
    sys.exit()

assert hasattr(configuration, "optimizer")

# train the network
if hasattr(configuration, "network"):
    trainer = GenericTrainer(
        configuration.network,
        configuration.optimizer,
        configuration.compute_loss,
        learning_rate=learning_rate,
        device=device,
        verbosity_level=verbosity_level,
        tf_logdir=output_dir + "/tf_logs",
        do_crossvalidation=do_crossvalidation,
        save_interval=save_interval,
    )

    trainer.train(dataloader, n_epochs=epochs, output_dir=output_dir, model=model)
else:
    logger.error("Please provide a network in your configuration file !")
    sys.exit()
The code also reports multiple missing files, so I am not sure whether there are files missing that should be part of the MC-PixBiS-224 preprocessed data. I have attached some of the missing-file messages below; there are more missing files than the ones shown here.
...............................HLDI self.annotation_directory ./hqwmca-protocols-csv/PROTOCOL-grand_test-curated.csv
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/02.04.19/1_03_0064_0000_06_01_013-e3a1456b.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/01.04.19/1_03_0001_0000_07_00_001-c8bd4c01.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/02.04.19/1_03_0001_0000_06_01_001-48c7d79c.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/11.03.19/1_03_0523_0018_08_00_004-315ad7b2.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/02.04.19/1_03_0002_0000_06_01_002-173e70ed.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/11.10.19/1_01_0002_0000_00_00_000-51e86383.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/11.10.19/1_01_0002_0000_00_00_000-7517b634.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/07.10.19/1_01_0077_0000_00_00_000-9f7b92f8.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/07.10.19/1_01_0077_0000_00_00_000-d416451d.hdf5
Missing file: /home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed/face-station/11.10.19/1_01_0084_0000_00_00_000-305a3a31.hdf5
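As a sanity check, I counted how many preprocessed .hdf5 files are actually on disk with this small snippet (a hypothetical check I wrote for debugging, pointed at the preprocessed directory from the logs above):
import glob
import os

preprocessed_dir = "/home/Dataset/FaceAntiSpoofing/HQ-WMCA/MC-PixBiS-224/preprocessed"
# recursively count the preprocessed .hdf5 files the DataFolder would look for
found = glob.glob(os.path.join(preprocessed_dir, "**", "*.hdf5"), recursive=True)
print("Found {} preprocessed .hdf5 files".format(len(found)))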
The CSV files are not annotations; they just show the distribution of files across the folds of each protocol, in case you want to use the datasets outside bob. Moreover, the MC-PixBiS-224 preprocessed files do not correspond to the RGB-D data; they correspond to an earlier paper. For the RGB-D data you have to access the RAW data and preprocess it using the documentation.
Regards,
Anjith
Related
I am currently writing a script to augment a dataset using tf.keras (code given below). I'm pretty new to tf and data augmentation, so I've been following a tutorial (https://blog.devgenius.io/data-augmentation-programming-e9a4703198be) pretty religiously. Despite this, I've been running into a lot of errors when I try to actually apply the ImageDataGenerator object to the image I'm loading. Specifically, I keep getting this error:
Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: '/home/kai/SURF22/yolov5/data/sc_google_aug/aug_0_3413.png'
File "/home/kai/SURF22/yolov5/data_augmentation", line 45, in <module>
for batch in idg.flow(aug_array,
It seems like tf can't find the image I want it to augment, but I have no idea why, because I load the image and input it as an array just like the tutorial does. I tried inputting the absolute file path to the image instead at one point, but then I got a "string to float" error. Basically, I have no idea what is wrong, and no one else seems to be getting this error when applying a for loop to .flow(). If anyone has advice on what could be going wrong I'd really appreciate it!
# imports used below
import os
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array

# images folder directory
folder_dir = "/home/kai/SURF22/yolov5/data/"
# initialize count
i = 0

for image in os.listdir(folder_dir + "prelim_data/sc_google_trans"):
    # open the image
    img = Image.open(folder_dir + "prelim_data/sc_google_trans/" + image)
    # make copy of image to augment
    # want to preserve original image
    aug_img = img.copy()
    # define an ImageDataGenerator object
    idg = ImageDataGenerator(horizontal_flip=True,
                             vertical_flip=True,
                             rotation_range=360,
                             brightness_range=[0.2, 1.0],
                             shear_range=45)
    # aug_img = load_img(folder_dir + "prelim_data/sc_google_trans/0.png")
    # reshape image to a 4D array to be used with keras flow function
    aug_array = img_to_array(aug_img)
    aug_array = aug_array.reshape((1,) + aug_array.shape)
    # augment image
    for batch in idg.flow(aug_array,
                          batch_size=1,
                          save_to_dir='/home/kai/SURF22/yolov5/data/sc_google_aug',
                          save_prefix='aug',
                          save_format='png'):
        i += 1
        if i > 3:
            break
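One hedged guess I have not yet ruled out: flow() only writes into save_to_dir and never creates it, so if that directory is missing, saving the augmented image can fail with exactly this kind of FileNotFoundError. A minimal sketch of that check, assuming this is the cause:
import os

# create the output directory if it does not exist yet;
# flow(save_to_dir=...) writes into it but does not create it
save_dir = '/home/kai/SURF22/yolov5/data/sc_google_aug'
os.makedirs(save_dir, exist_ok=True)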
I am trying to create a TFRecord file from a folder of numpy arrays; the folder contains about 2000 numpy files of 50 MB each.
def convert(image_paths, out_path):
    # Args:
    #   image_paths   List of file-paths for the images.
    #   out_path      File-path for the TFRecords output file.
    print("Converting: " + out_path)

    # Number of images. Used when printing the progress.
    num_images = len(image_paths)

    # Open a TFRecordWriter for the output-file.
    with tf.python_io.TFRecordWriter(out_path) as writer:
        # Iterate over all the image-paths.
        for i, path in enumerate(image_paths):
            # Print the percentage-progress.
            print_progress(count=i, total=num_images - 1)

            # Load the image-file (a numpy array on disk).
            img = np.load(path)

            # Convert the image to raw bytes.
            img_bytes = img.tostring()

            # Create a dict with the data we want to save in the
            # TFRecords file. You can add more relevant data here.
            data = {
                'image': wrap_bytes(img_bytes)
            }

            # Wrap the data as TensorFlow Features.
            feature = tf.train.Features(feature=data)

            # Wrap again as a TensorFlow Example.
            example = tf.train.Example(features=feature)

            # Serialize the data.
            serialized = example.SerializeToString()

            # Write the serialized data to the TFRecords file.
            writer.write(serialized)
I think it converts about 200 files and then I get this:
Converting: tf.recordtrain
- Progress: 3.6%Traceback (most recent call last):
File "tf_record.py", line 71, in <module>
out_path=path_tfrecords_train)
File "tf_record.py", line 54, in convert
writer.write(serialized)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/lib/io/tf_record.py", line 236, in write
self._writer.WriteRecord(record, status)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: tf.recordtrain; File too large
Any suggestions to fix this would be helpful. Thanks in advance.
I'm not sure what the limits on TFRecord files are, but the more common approach, assuming you have enough disk space, is to store your dataset over several TFRecord files, e.g. store every 20 numpy files in a separate TFRecord file, as in the sketch below.
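A minimal sketch of that sharding idea, reusing the convert() function from the question (the list all_paths, the shard size, and the output filename pattern are illustrative assumptions):
files_per_shard = 20

for shard, start in enumerate(range(0, len(all_paths), files_per_shard)):
    # each shard gets its own TFRecord file, e.g. train-0000.tfrecord
    shard_paths = all_paths[start:start + files_per_shard]
    convert(shard_paths, out_path="train-%04d.tfrecord" % shard)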
I've been trying to adapt the reddit_tft example from the cloud-ml github samples repo to my needs.
I've been able to get it running as per the tutorial readme.
However, what I want to use it for is a binary classification problem, and I also want to output keys in batch prediction.
So I have made a copy of the tutorial code here and changed it in a few places to have a model type of deep_classifier that uses a DNNClassifier instead of a DNNRegressor.
I've changed the score variable to be
if(score>0,1,0) as score
It trains fine and deploys to Cloud ML, but I'm not sure how to now get keys back from my predictions.
I've updated the SQL pulling from BigQuery to include id as example_id here.
It seems the code from the tutorial had some sort of placeholder for example_id, so I'm trying to leverage that.
It all seems to work, but when I get batch predictions, all I get is JSON like this:
{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]}
{"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457]}
...
So example_id does not seem to be making it into the serving functions like I need.
I've tried to follow the approach here, which is based on adapting the census example for keys.
I just can't figure out how to finish adapting this reddit example to also output keys in the predictions, as it looks a bit different to me in terms of design and the functions being used.
Update 1
My latest attempt is here, trying to use the approach outlined here.
However this is giving errors:
NotFoundError (see above for traceback): /tmp/tmp2jllvb/model.ckpt-1_temp_9530d2c5823d4462be53fa5415e429fd; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, dnn/hiddenlayer_0/kernel/part_2/read, dnn/dnn/hiddenlayer_0/kernel/part_2/Adagrad/read, dnn/hiddenlayer_1/kernel/part_2/read, dnn/dnn/hiddenlayer_1/kernel/part_2/Adagrad/read, dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/read, dnn/dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/Adagrad/read, dnn/logits/bias/part_0/read, dnn/dnn/logits/bias/part_0/Adagrad/read, global_step)]]
Update 2
My latest attempt and details are here.
I'm now getting an error from tensorflow-transform (run_preprocess.sh works fine in tft 0.1):
File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 282, in __setstate__
self._dtype = tf.as_dtype(state['dtype'])
TypeError: string indices must be integers, not str
Update 3
I have changed things to just use Beam + CSV and avoid tft. Also, I'm now using the approach outlined here for extending the canned estimator to get the key back with the predictions.
However, when following this post to try to get the comments in as features, I'm now running into a new error:
The replica worker 3 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 87, in new_model_fn
    spec = estimator.model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 203, in public_model_fn
    return self._call_model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 520, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 158, in _dnn_linear_combined_model_fn
    dnn_logits = dnn_logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn.py", line 89, in dnn_logit_fn
    features=features, feature_columns=feature_columns)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/feature_column/feature_column.py", line 226, in input_layer
    with variable_scope.variable_scope(None, default_name=column.name):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1826, in __enter__
    current_name_scope_name = self._current_name_scope.__enter__()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4932, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3514, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding' is not a valid scope name
My repo for this attempt/approach is here. This all runs fine if I just use subreddit as a feature; it's adding in the comment feature that seems to be causing the problems. Lines 103 to 111 are where I have followed this approach.
I'm not sure what's triggering the error in my code from reading the trace. Does anyone have any ideas?
Or can anyone point me towards another approach to go from text to bag-of-words to an embedding feature in TF? (A hedged sketch of the sort of thing I mean is below.)
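For concreteness, one sketch of a text-to-embedding route with TF 1.x feature columns; the column name 'comment_words', the bucket size, and the dimension are illustrative assumptions, not the repo's actual code:
# hash the (already tokenized) comment words into a categorical column,
# then learn a dense embedding over the hash buckets
comment_col = tf.feature_column.categorical_column_with_hash_bucket(
    'comment_words', hash_bucket_size=10000)
comment_emb = tf.feature_column.embedding_column(comment_col, dimension=16)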
See:
https://medium.com/@lakshmanok/how-to-extend-a-canned-tensorflow-estimator-to-add-more-evaluation-metrics-and-to-pass-through-ddf66cd3047d
Here's what the code looks like to pass through keys:
def forward_key_to_export(estimator):
    estimator = tf.contrib.estimator.forward_features(estimator, KEY_COLUMN)
    ## This shouldn't be necessary (I've filed CL/187793590 to update extenders.py with this code)
    config = estimator.config

    def model_fn2(features, labels, mode):
        estimatorSpec = estimator._call_model_fn(features, labels, mode, config=config)
        if estimatorSpec.export_outputs:
            for ekey in ['predict', 'serving_default']:
                estimatorSpec.export_outputs[ekey] = \
                    tf.estimator.export.PredictOutput(estimatorSpec.predictions)
        return estimatorSpec

    return tf.estimator.Estimator(model_fn=model_fn2, config=config)
    ##

# Create estimator to train and evaluate
def train_and_evaluate(output_dir):
    estimator = tf.estimator.DNNLinearCombinedRegressor(...)
    estimator = forward_key_to_export(estimator)
    ...
    tf.estimator.train_and_evaluate(estimator, ...)
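Note that forward_features can only pass the key through if the serving input function actually accepts it. A hedged sketch of such a serving input function (KEY_COLUMN as above; the other feature names and dtypes are illustrative assumptions):
def serving_input_fn():
    # placeholders for the key and for the model's real features
    feature_placeholders = {
        KEY_COLUMN: tf.placeholder(tf.string, [None]),
        'subreddit': tf.placeholder(tf.string, [None]),
    }
    # add the batch dimension expected by the feature columns
    features = {k: tf.expand_dims(v, -1)
                for k, v in feature_placeholders.items()}
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)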
We have plans, but haven't moved the changes into Census yet for the output keys. In the meantime, can you please see if this gist helps: https://gist.github.com/andrewm4894/ebd3ac3c87e2ab4af8a10740e85073bb#file-with_keys_model-py
Please feel free to send a PR if you get to it sooner and we will merge your contribution.
I was training a neural network and had run over all the training data for several epochs successfully.
However, a corrupted-TFRecord error suddenly came up, as follows:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/tf_record.py", line 77, in tf_record_iterator
reader.GetNext(status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 106241330
I checked the data file again and it was indeed corrupted at that offset. But the data was intact before I ran the training code, and I simply read the data with the following code:
batch_data = []
record_iterator = tf.python_io.tf_record_iterator(path=file, options=options)

for string_record in record_iterator:
    example = tf.train.Example()
    example.ParseFromString(string_record)
    data = generate_data_from_record(example)  # record parsing code
    batch_data.append(data)
    if len(batch_data) == batch_size:
        yield batch_data
        batch_data = []
I am wondering why the data file was corrupted and how I can preserve the integrity of the data file.
You should make a clean copy of your TFRecord files. Whenever your working copy gets corrupted, replace it from the clean copy. The DataLossError seems to be a result of reading the same record several times, and it is also dependent on the disk.
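As a complement, a hedged sketch for locating the damage: it counts how many records are still readable before the corruption point, using the same TF 1.x API as the question (the function name is my own):
def count_valid_records(path, options=None):
    # iterate until the reader hits the corrupted record
    n = 0
    try:
        for _ in tf.python_io.tf_record_iterator(path, options=options):
            n += 1
    except tf.errors.DataLossError:
        pass
    return n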
If someone is facing this problem: the above answer by @nwoye-cid worked for me, plus the link below to install everything properly.
Also, restart your kernel from scratch; only if nothing works should you go for other solutions.
Link!
What related GitHub issues or Stack Overflow threads have you found by searching the web for your problem?
I searched #1269 #504
Environment info
Mac OS for the build, and Android version 5 to run the .apk demo.
If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)
I followed the steps mentioned in #1269 and was able to run the example successfully, but the accuracy of the result is very low and often wrong. I have trained my system on 25 different daily-used products like soap, soup, noodles, etc.
Whereas when I run the same example using the following script, it gives me very high accuracy (approx. 90-95%):
import sys
import tensorflow as tf

# change this as you see fit
image_path = sys.argv[1]

# Read in the image_data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()

# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
               in tf.gfile.GFile("/tf_files/retrained_labels.txt")]

# Unpersists graph from file
with tf.gfile.FastGFile("/tf_files/retrained_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    # Feed the image_data as input to the graph and get first prediction
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})

    # Sort to show labels of first prediction in order of confidence
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))
The only difference I see here is that the model file used in the Android demo is stripped, because it does not support DecodeJpeg, whereas in the above code it's the actual unstripped model as generated. Is there any specific reason for this, or am I wrong somewhere?
I also tried using optimize_for_inference,
but unfortunately it fails with the following error:
[milinddeore#P028: ~/tf/tensorflow ] bazel-bin/tensorflow/python/tools/optimize_for_inference --input=/Users/milinddeore/tf_files_nm/retrained_graph.pb --output=/Users/milinddeore/tf/tensorflow/tensorflow/examples/android/assets/tf_ul_stripped_graph.pb --input_names=DecodeJpeg/content —-output_names=final_result
Traceback (most recent call last):
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/optimize_for_inference.py", line 141, in <module>
app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/optimize_for_inference.py", line 90, in main
FLAGS.output_names.split(","), FLAGS.placeholder_type_enum)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/optimize_for_inference_lib.py", line 91, in optimize_for_inference
placeholder_type_enum)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/strip_unused_lib.py", line 71, in strip_unused
output_node_names)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/framework/graph_util_impl.py", line 141, in extract_sub_graph
assert d in name_to_node_map, "%s is not in graph" % d
AssertionError: is not in graph
I suspect that this problem is due to Android not being able to parse DecodeJpeg, but please correct me if I am wrong.
What other attempted solutions have you tried?
Yes, I tried the above script and it gives me quite a high-accuracy result.
Well, the reason for the bad accuracy is the following:
I ran this example code on a Lenovo Vibe K5 mobile (this has a Snapdragon 415), and the demo wasn't compiled for the Hexagon DSP; even though the DSP on the 415 is very old compared to the 835 (Hexagon DSP 682), in fact I am not quite sure whether the Hexagon SDK will work with the 415 at all; I haven't tried. This means the example was running on the CPU, first detecting motion and then classifying, hence the poor performance.
Slow FPS: it captures images very slowly, so moving objects are really difficult to catch.
And if you have a bad image, there is a very strong chance that the prediction will also be bad.
Camera capture and classification take a long time; due to the latency it's not quite real-time.