How can I save a CNTK model as text? - cntk

I have a neural network created and trained in CNTK. I can save it with model.save_model("mymodel.dnn") in Python. This produces a file serialized in protobuf format.
How can I either save the model as plain text or convert the .dnn file to plain text?

The format CNTK uses is protobuf, so you can use things like
import google.protobuf.text_format
to create a readable output. This page has further information.
Our protobuf files are currently in this location; I'm hard-linking to version 2b9. Make sure you use the right .proto file for your CNTK version.
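If you want to stay in Python, a minimal sketch of that idea (assuming you have first compiled CNTK.proto with protoc --python_out so that a generated CNTK_pb2 module is importable; the module name depends on how you invoke protoc):
# Sketch, not an official CNTK API: decode the serialized model Dictionary
# using the Python classes generated from CNTK.proto.
import google.protobuf.text_format as text_format
import CNTK_pb2  # assumed name of the protoc-generated module

model = CNTK_pb2.Dictionary()          # the .dnn file is a serialized Dictionary
with open("mymodel.dnn", "rb") as f:
    model.ParseFromString(f.read())

with open("mymodel.txt", "w") as f:
    f.write(text_format.MessageToString(model))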

The protobuf compiler can generate a textual representation from a binary model file; you just need to point it to the CNTK proto definition and tell it to expect a Dictionary inside the model file:
%PROTOBUF_PATH%\bin\protoc --decode CNTK.proto.Dictionary --proto_path [CNTK root]\Source\CNTKv2LibraryDll\proto\ [CNTK root]\Source\CNTKv2LibraryDll\proto\CNTK.proto < mymodel.dnn > mymodel.txt

With BrainScript you can add
command = <yourCommands>:DumpNodeInfo
modelDir = "./ANNmodel"
modelPath = "$modelDir$/NN.dnn"
...
# dump parameter values
DumpNodeInfo = {
    action = "dumpNode"
    printValues = true
}
See the documentation for more information.

You can convert a model trained by CNTK to a text format with CNTK's dumpnode command. Here are the contents of the config file txt.conf:
command = convert2txt
convert2txt = [
    action = "dumpnode"
    modelPath = "./cntkSpeechFF.dnn.5"
    nodeName = "Prior"                        # if not specified, all nodes will be printed
    outputFile = "./cntkSpeechFF.dnn.5.txt"   # path to the output file; if not specified, a file name will be generated from the modelPath
    printValues = true
    printMetadata = true
]
Then you run cntk as
cntk configFile=txt.conf

Related

COCO json annotation to YOLO txt format

How do I convert a single COCO JSON annotation file into YOLO Darknet format, where each individual image has its own filename.txt file?
My classmates and I have created a Python package called PyLabel to help others with this task and other labelling tasks.
Our package does this conversion! You can see an example in this notebook: https://github.com/pylabel-project/samples/blob/main/coco2yolov5.ipynb
Your answer should be in there! You should be able to do this conversion by doing something like:
!pip install pylabel
from pylabel import importer
dataset = importer.ImportCoco(path=path_to_annotations, path_to_images=path_to_images)
dataset.export.ExportToYoloV5(dataset)
You can find the source code that is used behind the scenes here https://github.com/pylabel-project/
I built a tool:
https://github.com/tw-yshuang/coco2yolo
Download this repo and use the following command:
python3 coco2yolo.py [OPTIONS]
Help output:
Usage: coco2yolo.py [OPTIONS] [CAT_INFOS]...
Options:
  -ann-path, --annotations-path TEXT      JSON file. Path for label. [required]
  -img-dir, --image-download-dir TEXT     The directory of the image data place.
  -task-dir, --task-categories-dir TEXT   Build a directory that follows the task-required categories.
  -cat-t, --category-type TEXT            Category input type. (interactive | file) [default: interactive]
  -set, --set-computing-type TEXT         Set Computing for the data. (union | intersection) [default: union]
  --help                                  Show this message and exit.
There is an open-source tool called makesense.ai for annotating your images. You can download YOLO txt format once you annotate your images. But you won't be able to download the annotated images.
There are three ways:
1. Use Roboflow: https://roboflow.com/formats (you can also find other solutions there). There are usage guides for Roboflow, e.g. https://medium.com/red-buffer/roboflow-d4e8c4b52515
2. Search for 'convert coco format to yolo format'; you will find open-source code that converts annotations to the YOLO format.
3. Write your own code to convert COCO format to YOLO format; the core arithmetic is sketched below.
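If you go the write-your-own-code route, the core of the conversion is just bounding-box arithmetic: COCO stores [x_min, y_min, width, height] in pixels, while YOLO wants class, x_center, y_center, width, height normalized by the image size, one .txt file per image. Here is a rough sketch (the output layout and the class-id handling are assumptions; COCO category_ids usually need remapping to contiguous zero-based class indices):
import json, os
from collections import defaultdict

def coco_to_yolo(coco_json_path, out_dir):
    with open(coco_json_path) as f:
        coco = json.load(f)
    images = {img["id"]: img for img in coco["images"]}
    lines = defaultdict(list)
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        w, h = img["width"], img["height"]
        x, y, bw, bh = ann["bbox"]                      # COCO: top-left x, y, width, height in pixels
        xc, yc = (x + bw / 2) / w, (y + bh / 2) / h     # YOLO: normalized box center
        cls = ann["category_id"]                        # remap to 0-based ids for your own label set
        lines[img["file_name"]].append(f"{cls} {xc:.6f} {yc:.6f} {bw / w:.6f} {bh / h:.6f}")
    os.makedirs(out_dir, exist_ok=True)
    for file_name, rows in lines.items():
        txt_name = os.path.splitext(os.path.basename(file_name))[0] + ".txt"
        with open(os.path.join(out_dir, txt_name), "w") as f:
            f.write("\n".join(rows))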

Passing commandline argument in google colab

How do I pass command-line arguments when running Python code in Google Colab?
I have written code that takes a file as input via sys.argv. How do I do this?
As far as I know, there is no special way to pass command-line arguments to Python code. This is a working code sample I use when creating tfrecords:
!python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/
I don't see any difference between regular command-line argument passing in Python and in Colab. Please add more code to your question to get better help.
I tried this in a Google Colab notebook:
import sys
sys.argv[0] = "first_arg" # this is to assign the first command line argument
sys.argv[1] = "second_arg" # This line to assign the second arg for example
And it worked for me.
So if you want to run Python code that works like this:
!python test.py --image_folder '/content/image' --workers 2 --Prediction CTC --rgb True
you have to open test.py (or your file) in an editor; there you will find lines similar to these:
parser = argparse.ArgumentParser()
parser.add_argument('--image_folder', required=True, help='path to image_folder')
parser.add_argument('--workers', type=int, default=1, help='number of workers')
parser.add_argument('--Prediction', type=str, default='CTC', help='Prediction stage.')
parser.add_argument('--rgb', action='store_true', help='use rgb input')
args = parser.parse_args()
But this will give you an "Error SystemExit: 2".
Then you have to change it like this:
parser = argparse.ArgumentParser()
parser.add_argument('--image_folder', required=False, default='/content/image', help='path to image_folder')
parser.add_argument('--workers', type=int, default=2, help='number of workers')
parser.add_argument('--Prediction', type=str, default='CTC', help='Prediction stage.')
parser.add_argument('--rgb', action='store_false', help='use rgb input')
parser.add_argument("-f", "--file", required=False)
args = parser.parse_args()
You must add, after the other parser.add_argument lines:
parser.add_argument("-f", "--file", required=False)
Then you can access the command-line arguments like this:
image = args.image_folder
Or
img = Image.open(args.image_folder)
workers = args.workers
But if your last line is like this:
args = vars(parser.parse_args())
Then you have to access them like this:
image = args["image_folder"]
Or
img = Image.open(args["image_folder"])
workers = args["workers"]
# Note: ( action='store_true' ) will default to ( False )
# Likewise, ( action='store_false' ) will default to ( True )
Tested with Google Colab.
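A slightly cleaner variant of the same workaround, sketched here with made-up argument names, is to either rebuild sys.argv inside the notebook before parsing, or use parse_known_args() so the kernel's extra -f argument is ignored instead of needing a dummy --file option:
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('--image_folder', default='/content/image')
parser.add_argument('--workers', type=int, default=2)

# Option 1: inject the arguments you would normally type on the command line.
sys.argv = ['test.py', '--image_folder', '/content/image', '--workers', '2']

# Option 2: ignore unknown arguments such as the notebook kernel's "-f ...".
args, _unknown = parser.parse_known_args()
print(args.image_folder, args.workers)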
I made a bioinformatics tool locally on my machine to parse UniProt big data files of proteins.
The tool needs different parameters passed as command-line arguments. After the tool was working locally, I uploaded the data files and Python source files to my Google Drive.
I did not make any changes to my files. I just ran the following command directly in Google Colab:
!python3 drive/MyDrive/uniprot/uniprot_select.py FIELDS "ID,OS,SQ" FROM drive/MyDrive/data/uniprot.dat WHERE "SQ#EYDRRR" FASTA
It works perfectly!
No special parsing is needed, no additional imports. All the work you normally do locally on your machine can be executed without changes.

TFX StatisticsGen for image data

Hi, I'm trying to get a TFX pipeline going, just as an exercise really. I'm using ImportExampleGen to load TFRecords from disk. Each Example in the TFRecord contains a jpg in the form of a byte string, plus height, width, depth, steering and throttle labels.
I'm trying to use StatisticsGen but I'm receiving this warning:
WARNING:root:Feature "image_raw" has bytes value "None" which cannot be decoded as a UTF-8 string.
and it is crashing my Colab notebook. As far as I can tell, none of the byte-string images in the TFRecord are corrupt.
I cannot find concrete examples of StatisticsGen handling image data. According to the docs, TensorFlow Data Validation can deal with image data:
In addition to computing a default set of data statistics, TFDV can also compute statistics for semantic domains (e.g., images, text). To enable computation of semantic domain statistics, pass a tfdv.StatsOptions object with enable_semantic_domain_stats set to True to tfdv.generate_statistics_from_tfrecord.
But I'm not sure how this fits in with StatisticsGen.
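For reference, the standalone TFDV call those docs describe looks roughly like this (a sketch; the TFRecord path is a placeholder):
import tensorflow_data_validation as tfdv

stats_options = tfdv.StatsOptions(enable_semantic_domain_stats=True)
stats = tfdv.generate_statistics_from_tfrecord(
    data_location='/path/to/tfrecords/*',
    stats_options=stats_options)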
Here is the code that instantiates an ImportExampleGen and then the StatisticsGen:
from tfx.utils.dsl_utils import tfrecord_input
from tfx.components.example_gen.import_example_gen.component import ImportExampleGen
from tfx.proto import example_gen_pb2

examples = tfrecord_input(_tf_record_dir)

# https://www.tensorflow.org/tfx/guide/examplegen#custom_inputoutput_split
# has a good explanation of splitting the data via the 'output_config' param.
# Input train split is _tf_record_dir/*
# Output 2 splits: train:eval=8:2.
train_ratio = 8
eval_ratio = 10 - train_ratio
output = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=train_ratio),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=eval_ratio)
    ]))

example_gen = ImportExampleGen(input=examples, output_config=output)
context.run(example_gen)

statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)
Thanks in advance.
From a GitHub issue response (thanks, Evan Rosen):
Hi Folks,
The warnings you are seeing indicate that StatisticsGen is trying to treat your raw image features like a categorical string feature. The image bytes are being decoded just fine. The issue is that when the stats (including top K examples) are being written, the output proto is expecting a UTF-8 valid string, but instead gets the raw image bytes. Nothing is wrong with your setups from what I can tell, but this is just an unintended side-effect of a well-intentioned warning in the event that you have a categorical string feature which can't be serialized. We'll look into finding a better default that handles image data more elegantly.
In the meantime, to tell StatisticsGen that this feature is really an opaque blob, you can pass in a user-modified schema as described in the StatsGen docs. To generate this schema, you can run StatisticsGen and SchemaGen once (on a sample of data) and then modify the inferred schema to annotate the image features. Here is a modified version of the colab from #tall-josh:
Open In Colab
The additional steps are a bit verbose, but having a curated schema is often a good practice for other reasons. Here is the cell that I added to the notebook:
from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2

# Load the autogenerated schema (using stats from a small batch)
schema = tfx.utils.io_utils.SchemaReader().read(
    tfx.utils.io_utils.get_only_uri_in_dir(
        tfx.types.artifact_utils.get_single_uri(schema_gen.outputs['schema'].get())))

# Modify the schema to indicate which string features are images.
# Ideally you would persist a golden version of this schema somewhere rather
# than regenerating it on every run.
for feature in schema.feature:
    if feature.name == 'image/raw':
        feature.image_domain.SetInParent()

# Write the modified schema to a local file
user_schema_dir = '/tmp/user-schema/'
tfx.utils.io_utils.write_pbtxt_file(
    os.path.join(user_schema_dir, 'schema.pbtxt'), schema)

# Create an ImporterNode to make the modified schema available to other components
user_schema_importer = tfx.components.ImporterNode(
    instance_name='import_user_schema',
    source_uri=user_schema_dir,
    artifact_type=tfx.types.standard_artifacts.Schema)

# Run the user schema ImporterNode
context.run(user_schema_importer)
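The remaining wiring, not shown in the cell above, would be to feed the imported schema into StatisticsGen. This is a sketch that assumes your TFX version's StatisticsGen accepts a schema channel and that ImporterNode exposes its artifact under the 'result' key:
# Pass the user-curated schema so the image feature is treated as an image
# domain rather than a categorical string feature.
statistics_gen = StatisticsGen(
    examples=example_gen.outputs['examples'],
    schema=user_schema_importer.outputs['result'])
context.run(statistics_gen)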
Hopefully you find this workaround useful. In the meantime, we'll take a look at a better default experience for image-valued features.
Grokked this and found the solution to be dramatically simpler than I thought...
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
import logging
...
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
...
context = InteractiveContext(pipeline_name='my_pipe')
...
c = StatisticsGen(...)
...
context.run(c)

Trying to save a tensorflow model but failing to create directory "failed to create a directory: /tmp/serving_savemodel\1565577669"

I'm trying to save a DNN classifier model so I can generate a tflite model, but in the last line, when trying to export to a directory, I get the error below:
failed to create a directory: /tmp/serving_savemodel\1565577669
feature_spec = tf.feature_column.make_parse_example_spec(feat_cols)
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
servable_model_dir = "/tmp/serving_savemodel"
servable_model_path = dnn_model.export_savedmodel(servable_model_dir, export_input_fn)
I faced a similar problem on my Windows machine.
I solved it by using "\" instead of "/" as the path separator.
Example for your case:
servable_model_dir = r"\tmp\serving_savemodel"
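A more portable variant of the same fix (a sketch reusing feat_cols and dnn_model from the question) is to build the export path with os.path.join and create the directory up front, so the separators always match the host OS:
import os
import tensorflow as tf

feature_spec = tf.feature_column.make_parse_example_spec(feat_cols)
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

# Let the OS pick the separators instead of hard-coding "/" or "\".
servable_model_dir = os.path.join(os.getcwd(), "serving_savemodel")
os.makedirs(servable_model_dir, exist_ok=True)
servable_model_path = dnn_model.export_savedmodel(servable_model_dir, export_input_fn)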
Related question: Tensorflow error failed to create directory

ValueError: Input 0 of node Variable/Assign was passed int32 from Variable:0 incompatible with expected int32_ref

I am currently trying to get a trained TF seq2seq model working with TensorFlow.js. I need to get the JSON files for this. My input is a few sentences and the output is "embeddings". The model works when I read in the checkpoint; however, I can't get it converted for tf.js. Part of the conversion process is to freeze my latest checkpoint as a protobuf (pb) file and then convert that to the JSON format expected by tensorflow.js.
The above is my understanding, and since I haven't done this before it may be wrong, so please feel free to correct anything I have deduced incorrectly from my reading.
When I try to convert to the tensorflow.js format I use the following command:
sudo tensorflowjs_converter --input_format=tf_frozen_model \
    --output_node_names='embeddings' \
    --saved_model_tags=serve \
    ./saved_model/model.pb /web_model
This then displays the error listed in this post:
ValueError: Input 0 of node Variable/Assign was passed int32 from
Variable:0 incompatible with expected int32_ref.
One of the problems I'm running into is that I'm not even sure how to troubleshoot this, so I was hoping someone might have some guidance or know what my issue may be.
I have uploaded the code I used to convert the checkpoint file to protobuf at the link below. At the bottom of the notebook I then added an import of that file, which produces the same error I get when trying to convert to the tensorflowjs format. (Just scroll to the bottom of the notebook.)
https://github.com/xtr33me/textsumToTfjs/blob/master/convert_ckpt_to_pb.ipynb
Any help would be greatly appreciated!
Still unsure as to why I was getting the above error; however, in the end I was able to resolve the issue by switching over to TF's SavedModel via tf.saved_model. A rough example of what worked for me can be found below, should anyone in the future run into something similar. After saving out the model below, I was able to run tensorflowjs_converter on it and export the correct files.
if first_iter == True:  # first time through
    first_iter = False
    # Let's try saving this bad boy
    cwd = os.getcwd()
    path = os.path.join(cwd, 'simple')
    shutil.rmtree(path, ignore_errors=True)

    inputs_dict = {
        "batch_decoder_input": tf.convert_to_tensor(batch_decoder_input)
    }
    outputs_dict = {
        "batch_decoder_output": tf.convert_to_tensor(batch_decoder_output)
    }

    tf.saved_model.simple_save(
        sess, path, inputs_dict, outputs_dict
    )
    print('Model Saved')
    # End save model code