Converting my own image data to TFRecords - tensorflow

I am practicing converting my own image data to TFRecords for TensorFlow. I am really new to TensorFlow, so I just modified the build_image_data.py script which I got from GitHub.
This is some parts of the original code:
bazel-bin/inception/build_image_data \
--train_directory="${TRAIN_DIR}" \
--validation_directory="${VALIDATION_DIR}" \
--output_directory="${OUTPUT_DIRECTORY}" \
--labels_file="${LABELS_FILE}" \
--train_shards=128 \
--validation_shards=24 \
--num_threads=8
And I replaced it with:
# convert the data.
bazel-bin/inception/build_image_data \
--train_directory=("C:/Dataset/Training data")
--validation_directory=("C:/Dataset/Test data")
--output_directory=("C:/Dataset/Trf")
--labels_file="C:/Dataset/Labels file"
--train_shards=128
--validation_shards=24
--num_threads=8
But I got an error as follows:
File "<ipython-input-12-4e5ff554c85f>", line 90
bazel-bin/inception/build_image_data --train_directory=("C:/Dataset/Training data")
^
SyntaxError: can't assign to operator
Could someone help me, please?
Thanks.

Just remove the parentheses around the paths (and keep the trailing backslashes so the shell treats it as one command):
bazel-bin/inception/build_image_data \
--train_directory="C:/Dataset/Training data" \
--validation_directory="C:/Dataset/Test data" \
--output_directory="C:/Dataset/Trf" \
--labels_file="C:/Dataset/Labels file" \
--train_shards=128 \
--validation_shards=24 \
--num_threads=8
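Note also that the traceback points at <ipython-input-12-4e5ff554c85f>, which means the command was typed into a Python/IPython session and parsed as Python code; run it in a terminal instead (or prefix it with ! in IPython). If you would rather launch it from Python, here is a minimal sketch using subprocess, with the paths taken from the question and the bazel-bin path assumed to exist on your machine:
import subprocess

# Run the build_image_data tool as an external process instead of
# pasting shell syntax into a Python session.
subprocess.run([
    "bazel-bin/inception/build_image_data",
    "--train_directory=C:/Dataset/Training data",
    "--validation_directory=C:/Dataset/Test data",
    "--output_directory=C:/Dataset/Trf",
    "--labels_file=C:/Dataset/Labels file",
    "--train_shards=128",
    "--validation_shards=24",
    "--num_threads=8",
], check=True)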

Related

Using UDFs and simple dataframes in PySpark

I am new to PySpark and am trying to do something like the below:
call a function PrintDetails for each cookie and then write the result to a file. The spark.sql query returns the correct data, and I can serialize it to a file as well.
Can someone help with the for statement on each cookie? What should the syntax be for calling the UDF, and how can I write the output to a text file?
Any help is appreciated.
Thanks
@udf(returnType=StringType())
def PrintDetails(cookie, timestamps, current_day, current_hourly_threshold, current_daily_threshold):
    # DO SOME WORK
    return "%s\t%d\t%d\t%d\t%d\t%s" % (some_data)

def main(argv):
    spark = SparkSession \
        .builder \
        .appName("parquet_test") \
        .config("spark.debug.maxToStringFields", "100") \
        .getOrCreate()

    inputPath = r'D:\Hadoop\Spark\parquet_input_files'
    inputFiles = os.path.join(inputPath, '*.parquet')
    impressionDate = datetime.strptime("2019_12_31", '%Y_%m_%d')
    current_hourly_threshold = 40
    current_daily_threshold = 200

    parquetFile = spark.read.parquet(inputFiles)
    parquetFile.createOrReplaceTempView("parquetFile")
    cookie_and_time = spark.sql("SELECT cookie, collect_list(date_format(from_unixtime(ts), 'YYYY-mm-dd-H:M:S')) as imp_times FROM parquetFile group by 1 ")

    for cookie in cookie_and_time:
        PrintDetails(cookie('cookie'), cookie('imp_times'), impressionDate, current_hourly_threshold, current_daily_threshold)
You can do it like below.
cookie_df = cookie_and_time.withColumn("cookies", PrintDetails(col('cookie'), col('imp_times'), lit(impressionDate), lit(current_hourly_threshold), lit(current_daily_threshold)))
Or you can define all your variables inside the udf function itself and avoid passing them as arguments.
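For completeness, a minimal end-to-end sketch (a hedged illustration, not the only way: the column and variable names come from the question, the UDF body is a placeholder, and the output path is hypothetical) that applies the UDF and then writes the resulting string column to text files:
from pyspark.sql.functions import col, lit, udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def PrintDetails(cookie, timestamps, current_day, current_hourly_threshold, current_daily_threshold):
    # Placeholder body: format one cookie's data as a tab-separated line.
    return "%s\t%d" % (cookie, len(timestamps))

cookie_df = cookie_and_time.withColumn(
    "cookies",
    PrintDetails(col("cookie"), col("imp_times"),
                 lit(impressionDate.strftime('%Y_%m_%d')),  # pass the date as a plain string
                 lit(current_hourly_threshold),
                 lit(current_daily_threshold)))

# DataFrameWriter.text expects a single string column.
cookie_df.select("cookies").write.mode("overwrite").text(r"D:\Hadoop\Spark\parquet_output")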

Image Classification Graph model making wrong predictions

I'm using the make_image_classifier python script to retrain a MobileNetV2 on a new set of images. My end goal is to make predictions in tfjs in the browser.
This is exactly what I'm doing:
Step 1: Retrain the model
make_image_classifier \
--image_dir input_data \
--tfhub_module https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4 \
--image_size 224 \
--saved_model_dir ./trained_model \
--labels_output_file class_labels.txt \
--tflite_output_file new_mobile_model.tflite
Step 2: Convert the tf saved model to a graph model using tensorflowjs_converter
tensorflowjs_converter \
--input_format=tf_saved_model \
--output_format=tfjs_graph_model \
--signature_name=serving_default \
--saved_model_tags=serve \
trained_model/ \
web_model/
Step 3: Load the new model in the browser, preprocess an image input, and ask the model to make a prediction
const model = tf.loadGraphModel('model.json').then(function(m){
    var img = document.getElementById("img");
    var processed = preprocessImage(img, "mobilenet");
    window.prediction = m.predict(processed);
    window.prediction.print();
});

function preprocessImage(image, modelName){
    let tensor = tf.browser.fromPixels(image)
        .resizeNearestNeighbor([224, 224])
        .toFloat();
    console.log('tensor pro', tensor);
    if (modelName == undefined) {
        return tensor.expandDims();
    }
    if (modelName == "mobilenet") {
        let offset = tf.scalar(127.5);
        console.log('offset', offset);
        return tensor.sub(offset)
            .div(offset)
            .expandDims();
    } else {
        throw new Error("Unknown Model error");
    }
}
I'm getting invalid results. I checked the predictions made by the initial model and they are correct, so what I'm thinking is that either the conversion is not happening properly or I'm not preprocessing the image in the same manner as the initial script does.
Help.
P.S.: When running the converter, I'm getting the following message. Not sure if it's directly relevant to what I'm experiencing.
tensorflow/core/graph/graph_constructor.cc:750 Node 'StatefulPartitionedCall' has 71 outputs but the _output_shapes attribute specifies shapes for 605 outputs. Output shapes may be inaccurate.
make_image_classifier creates a saved_model specific to TensorFlow Lite. If you would rather convert MobileNet to TensorFlow.js, the command to use has been given in this answer.
Instead of using make_image_classifier, you would need to use retrain.py, which can be downloaded with the following:
curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
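To narrow down whether the conversion or the browser preprocessing is at fault, one hedged check is to run the SavedModel in Python with the same preprocessing the JS code applies and compare the outputs (the paths and the 224x224 input size come from the question; test.jpg is a hypothetical sample image, and this assumes the export loads as a Keras model, with tf.saved_model.load as the fallback):
import numpy as np
import tensorflow as tf

# Load the SavedModel produced by make_image_classifier in step 1.
model = tf.keras.models.load_model("./trained_model")

# Preprocess one image the same way the JS code does: nearest-neighbor
# resize to 224x224, then scale pixels from [0, 255] to [-1, 1].
img = tf.io.decode_image(tf.io.read_file("test.jpg"), channels=3)
img = tf.image.resize(img, [224, 224], method="nearest")
img = (tf.cast(img, tf.float32) - 127.5) / 127.5

probs = model(tf.expand_dims(img, 0)).numpy()
print("predicted class index:", np.argmax(probs))
If this prediction matches what the browser returns, the conversion is fine and the preprocessing is the place to look, and vice versa.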

An error occurred while training DeepLabv3+ using the CityScapes dataset: "data split name train not recognized"

I followed these steps:
1. CityScapes dataset preparation
2. Generate TFRecords of CityScapes
3. Download the pre-trained model
4. Run the official instruction:
python deeplab/train.py \
--logtostderr \
--training_number_of_steps=1000 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=1 \
--dataset="cityscapes" \
--tf_initial_checkpoint='/root/newP/official_tf/models-master/research/deeplab/backbone/deeplabv3_cityscapes_train/model.ckpt' \
--train_logdir='/root/newP/official_tf/models-master/research/deeplab/exp/train_on_train_set/train' \
--dataset_dir='/root/dataset/cityscapesScripts/tfrecord'
An error occurred while training DeepLabv3+ using the CityScapes dataset:
"data split name train not recognized".
I found the problem after debugging: "train" no longer exists in
_CITYSCAPES_INFORMATION.splits_to_sizes.
Content in the code:
_CITYSCAPES_INFORMATION = DatasetDescriptor(
    splits_to_sizes={'train_fine': 2975,
                     'train_coarse': 22973,
                     'trainval_fine': 3475,
                     'trainval_coarse': 23473,
                     'val_fine': 500,
                     'test_fine': 1525},
    num_classes=19,
    ignore_label=255,
)
I tried several of the other splits, "train_fine" and "train_coarse". A new error occurred:
"Total size of new array must be unchanged for image_pooling/weights lh_shape: [(1, 1, 2048, 256)], rh_shape: [(1, 1, 320, 256)]".
What modifications should I make?
I found that the latest version of the pretrained model had a problem; I could run training directly when I was not using the pretrained model.
https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
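If you want --train_split="train" to be accepted again, one option, sketched here against the descriptor shown in the question (in recent versions of the repo it lives in research/deeplab/datasets/data_generator.py), is to register the split yourself with the size of the fine training set:
# Hypothetical patch: add a 'train' entry mirroring 'train_fine' so the
# old --train_split="train" flag is recognized again. This only helps if
# your TFRecord files are actually named train-*; otherwise simply pass
# --train_split="train_fine" to match the files the conversion script wrote.
_CITYSCAPES_INFORMATION = DatasetDescriptor(
    splits_to_sizes={'train': 2975,  # alias for 'train_fine'
                     'train_fine': 2975,
                     'train_coarse': 22973,
                     'trainval_fine': 3475,
                     'trainval_coarse': 23473,
                     'val_fine': 500,
                     'test_fine': 1525},
    num_classes=19,
    ignore_label=255,
)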

Adding sqlite Qt5 plugin in Yocto

Following this answer, I'm trying to add the sqlite (sqlite3) Qt5 plugin I forgot to enable during the last Yocto build. Here is what I did:
Under my own custom layer (meta-custom-layer/recipes-core) I added a file qtbase_%.bbappend.
Inside I put:
PACKAGECONFIG_append = " sql-sqlite"
PACKAGECONFIG[sql-sqlite] = "-sql-sqlite,-no-sql-sqlite,sqlite3"
Then I deleted the tmp folder and issued bitbake qtbase. I didn't remove the sstate-cache, because I added something rather than removed or changed anything.
After parsing the recipes it successfully rebuilt the tmp folder, but I cannot find anything related to the requested plugin (it should be libqsqlite.so).
I didn't understand the answer provided in the link above.
What is the right method to add this plugin?
UPDATE
To be sure there's nothing else to tune, here are the contents of the image bb file:
SUMMARY = "blabla"
LICENSE = "Proprietary"
include recipes-st/images/st-image.inc
inherit core-image distro_features_check
CONFLICT_DISTRO_FEATURES = "x11 wayland"
IMAGE_LINGUAS = "en-us"
IMAGE_FEATURES += "splash package-management ssh-server-dropbear"
IMAGE_ROOTFS_MAXSIZE = ""
IMAGE_QT_MANDATORY_PART = " \
qtbase \
qtbase-plugins \
qtbase-tools \
"
IMAGE_QT_OPTIONAL_PART = " \
qtserialport \
"
CORE_IMAGE_EXTRA_INSTALL += " \
systemd-networkd-configuration \
\
packagegroup-framework-tools-core-base \
packagegroup-framework-tools-kernel-base \
packagegroup-framework-tools-network-base \
packagegroup-framework-tools-python2-base \
packagegroup-framework-tools-python3-base \
\
packagegroup-framework-tools-core \
packagegroup-framework-tools-kernel \
packagegroup-framework-tools-network \
packagegroup-framework-tools-python2 \
packagegroup-framework-tools-python3 \
\
packagegroup-core-eclipse-debug \
\
${IMAGE_QT_MANDATORY_PART} \
${IMAGE_QT_OPTIONAL_PART} \
"
and here are the contents of the RDEPENDS_${PN} variable in layers/meta-qt5/recipes-qt/packagegroups/packagegroup-qt5-toolchain-target.bb:
RDEPENDS_${PN} += " \
packagegroup-core-standalone-sdk-target \
libsqlite3-dev \
qtbase-dev \
qtbase-mkspecs \
qtbase-plugins \
qtbase-staticdev \
qtconnectivity-dev \
qtconnectivity-mkspecs \
qtmqtt-dev \
qtmqtt-mkspecs \
qtserialport-dev \
qtserialport-mkspecs \
qtserialbus-dev \
qtserialbus-mkspecs \
qtsystems-dev \
qtsystems-mkspecs \
qttools-dev \
qttools-mkspecs \
qttools-staticdev \
qtwebsockets-dev \
qtwebsockets-mkspecs \
qtwebchannel-dev \
qtwebchannel-mkspecs \
"
The PACKAGECONFIG is already there:
PACKAGECONFIG[sql-sqlite] = "-sql-sqlite -system-sqlite,-no-sql-sqlite,sqlite3"
Your problem is most likely due to you redefining it (wrongly, as you can see above).
You do not have to define a new PACKAGECONFIG. Just enable it with:
PACKAGECONFIG_append = " sql-sqlite"

Distributed training fails on two machines: InvalidArgumentError

I have two machines, each of which has 4 GPUs. I use
with tf.device('/job:worker/replica:%d/task:%d/gpu:%d' % (FLAGS.replica_id, FLAGS.task_id, FLAGS.gpu_device_id)):
to dictate the device, but it failed with this error log:
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'init_all_tables': Could not satisfy explicit device specification '/job:worker/replica:1/task:4/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:ps/replica:0/task:0/cpu:0,
/job:worker/replica:0/task:0/cpu:0, /job:worker/replica:0/task:0/gpu:0, /job:worker/replica:0/task:0/gpu:1, /job:worker/replica:0/task:0/gpu:2, /job:worker/replica:0/task:0/gpu:3, /job:worker/replica:0/task:1/cpu:0, /job:worker/replica:0/task:1/gpu:0, /job:worker/replica:0/task:1/gpu:1, /job:worker/replica:0/task:1/gpu:2, /job:worker/replica:0/task:1/gpu:3, /job:worker/replica:0/task:2/cpu:0, /job:worker/replica:0/task:2/gpu:0, /job:worker/replica:0/task:2/gpu:1, /job:worker/replica:0/task:2/gpu:2, /job:worker/replica:0/task:2/gpu:3, /job:worker/replica:0/task:4/cpu:0, /job:worker/replica:0/task:4/gpu:0, /job:worker/replica:0/task:4/gpu:1, /job:worker/replica:0/task:4/gpu:2, /job:worker/replica:0/task:4/gpu:3, /job:worker/replica:0/task:5/cpu:0, /job:worker/replica:0/task:5/gpu:0, /job:worker/replica:0/task:5/gpu:1, /job:worker/replica:0/task:5/gpu:2, /job:worker/replica:0/task:5/gpu:3, /job:worker/replica:0/task:6/cpu:0, /job:worker/replica:0/task:6/gpu:0, /job:worker/replica:0/task:6/gpu:1, /job:worker/replica:0/task:6/gpu:2, /job:worker/replica:0/task:6/gpu:3, /job:worker/replica:0/task:7/cpu:0, /job:worker/replica:0/task:7/gpu:0, /job:worker/replica:0/task:7/gpu:1, /job:worker/replica:0/task:7/gpu:2, /job:worker/replica:0/task:7/gpu:3
It seems like TensorFlow can't find machine B? But I have exactly the same hardware and software configuration on both machines.
The start script:
# machine 10.10.12.28
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=0 \
--gpu_device_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=1 \
--gpu_device_id=1 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=2 \
--gpu_device_id=2 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=3 \
--gpu_device_id=3 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
CUDA_VISIBLE_DEVICES='' ~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--job_name='ps' \
-task_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
# machine 10.10.12.29
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=4 \
--gpu_device_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=5 \
--gpu_device_id=1 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=6 \
--gpu_device_id=2 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=7 \
--gpu_device_id=3 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
TL;DR: Don't ever use '/replica:%d' in your device specification.
The problem seems to be in your device string:
'/job:worker/replica:%d/task:%d/gpu:%d' % (FLAGS.replica_id, FLAGS.task_id, FLAGS.gpu_device_id)
The device specification '/replica:%d' is not supported in the open-source version of TensorFlow (but it is retained for some backwards compatibility reasons). The replica ID should be 0 for all tasks. You can solve this immediately by passing 0 as the --replica_id for each task, but you should really remove that flag from your version of the code.
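For illustration, a minimal sketch of the corrected placement (the host list is rewritten with four consecutive ports per machine, which appears to be what the question intended; FLAGS are the ones defined by the training script, and the model-building code is omitted):
import tensorflow as tf

# One global task index per worker process (0-7 across both machines);
# never encode a replica number in the device string.
cluster = tf.train.ClusterSpec({
    "ps": ["10.10.102.28:2220"],
    "worker": ["10.10.102.28:2221", "10.10.102.28:2222",
               "10.10.102.28:2223", "10.10.102.28:2224",
               "10.10.102.29:2221", "10.10.102.29:2222",
               "10.10.102.29:2223", "10.10.102.29:2224"],
})
server = tf.train.Server(cluster, job_name=FLAGS.job_name,
                         task_index=FLAGS.task_id)

with tf.device('/job:worker/task:%d/gpu:%d' % (FLAGS.task_id,
                                               FLAGS.gpu_device_id)):
    # Build the model here; device strings carry only job, task, and gpu.
    ...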