GStreamer: How to read a structure inside the property of an element - tensorflow

First of all, I am very new to GStreamer, and it would be very helpful if someone could just give me a simple explanation of what I'm asking here.
There is a pipeline which feeds raw video from a camera source to a TensorFlow element which detects faces, stores the face ROI coordinates in a structure, and also updates some kind of metadata. After this, there is a display overlay element which draws a bounding box using the inference results read from the metadata updated by the TensorFlow element.
Tensorflow (plugin) --> Post processing (property) --> Detection (structure)
To put it simply, I need to get the values of that structure, which is updated on each face detection while the pipeline is running. I checked the gst_bin_get_by_name() + g_object_class_find_property() + g_object_get() API combination, but it seems that can only read static state from the property, such as enabled/disabled or a parameter string.
I don't know if I was able to convey my requirement properly.
Someone please help me out.

It appears that you can use nnstreamer for your application: https://github.com/nnstreamer/nnstreamer
The corresponding (GStreamer) pipeline would look like the following (with a lot of assumptions):
v4l2src ! # assuming a USB camera
tee name=rawvideo ! queue max-size-buffers=2 leaky=2 !
videoconvert ! videoscale ! videorate ! # basic preprocessing for live video
video/x-raw,format=RGB,width=300,height=300 ! # another assumption on your model
tensor_converter ! # now, the format becomes video/x-raw --> other/tensors
tensor_transform mode=arithmetic option=typecast:float32,div:255 ! # you may add different mode & options for your pre-processing needs (e.g., transpose, standardization, ...)
tensor_filter framework=tensorflow model=YOURMODELFILE.pb !
tee name=result ! appsink name=yourappcangetrawstructuretensors
result. ! tensor_decoder mode=bounding_boxes option..... ! # if the corresponding subplugin exists for the given structure, you can simply designate it here. Otherwise, you can write your own code and attach it here.
mix.sink_1
rawvideo. ! queue leaky=2 max-size-buffers=2 ! mix.sink_0
### use compositor or videomixer to overlay bounding boxes
compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! videoconvert !
autovideosink ## you may use a different video sink for your system.
In other words, you can have separate streams for the "structure" (the output of tensorflow) and the "video" (the input of tensorflow, or the original video stream) and merge them later at the compositor in the example above.
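As a rough sketch of the application side (my own illustration, not from the answer above): with the GStreamer Python bindings you can pull the raw other/tensors buffers from the appsink named in the pipeline and decode them yourself. How you parse the bytes depends entirely on your model's output layout, so treat this as an assumption-laden outline:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

Gst.init(None)
PIPELINE_DESC = "..."  # the nnstreamer pipeline string shown above
pipeline = Gst.parse_launch(PIPELINE_DESC)

appsink = pipeline.get_by_name('yourappcangetrawstructuretensors')
appsink.set_property('emit-signals', True)

def on_new_sample(sink):
    sample = sink.emit('pull-sample')      # one other/tensors buffer per inference
    buf = sample.get_buffer()
    ok, info = buf.map(Gst.MapFlags.READ)
    if ok:
        raw = bytes(info.data)             # raw tensor bytes (the detection "structure")
        # ... decode bounding boxes / scores from `raw` according to your model ...
        buf.unmap(info)
    return Gst.FlowReturn.OK

appsink.connect('new-sample', on_new_sample)
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()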

Related

Convert a .npy file to .wav following Tacotron2 training

I am training the Tacotron2 model using TensorFlowTTS for a new language.
I managed to train the model (performed pre-processing, normalization, and decoded the few generated output files).
The files in the output directory are .npy files, which makes sense as they are mel-spectrograms.
I am trying to find a way to convert these files to a .wav file in order to check if my work has been fruitful.
I used this:
melspectrogram = librosa.feature.melspectrogram(
    "/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy", sr=22050,
    window=scipy.signal.hanning, n_fft=1024, hop_length=256)
print('melspectrogram.shape', melspectrogram.shape)
print(melspectrogram)
audio_signal = librosa.feature.inverse.mel_to_audio(
    melspectrogram, sr=22050, n_fft=1024, hop_length=256, window=scipy.signal.hanning)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)
But it gives me this error: Audio data must be of type numpy.ndarray.
Although I am already giving it a numpy.ndarray file.
Does anyone know where the issue might be, or a better way to do this?
I'm not sure what your error is, but the output of a Tacotron 2 system is log Mel spectral features, and you can't just apply the inverse Fourier transform to get a waveform, because you are missing the phase information and because the features are not invertible. You can learn about why this is at places like Speech.Zone (https://speech.zone/courses/).
Instead of using librosa as you are doing, you need to use a vocoder like HiFi-GAN (https://github.com/jik876/hifi-gan) that is trained to reconstruct a waveform from log Mel spectral features. You can use a pre-trained model with most off-the-shelf vocoders, but make sure that the sample rate, Mel range, FFT size, hop size and window size are all the same between your Tacotron2 feature prediction network and whatever vocoder you choose, otherwise you'll just get noise!
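As a side note on the reported type error (my own sketch, not part of the answer above): librosa.feature.melspectrogram expects an audio array, not a file path, and the saved .npy file already contains mel features, so the immediate fix is to load the array with numpy. A rough outline, assuming the TensorFlowTTS file holds normalized log-mel features of shape (frames, n_mels); even with this fix, Griffin-Lim style inversion of normalized log-mels will sound far worse than a proper vocoder:

import numpy as np
import librosa
import soundfile as sf

# load the saved mel features instead of handing librosa a file path
mel = np.load("/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy")

# assumption: TensorFlowTTS stores (frames, n_mels); librosa expects (n_mels, frames)
mel = mel.T

# assumption: undo the normalization and log scaling here before inverting,
# otherwise mel_to_audio receives values on the wrong scale
audio = librosa.feature.inverse.mel_to_audio(
    mel, sr=22050, n_fft=1024, hop_length=256)

sf.write('test.wav', audio, 22050)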

How to make an FSNS dataset with my own images for the attention OCR TensorFlow model

I want to apply attention-ocr to detect all the digits on car number plates.
I've read the README.md of attention_ocr on GitHub (https://github.com/tensorflow/models/tree/master/research/attention_ocr), and also how I should use my own image data to train the model, from this Stack Overflow answer (https://stackoverflow.com/a/44461910/743658).
However, I didn't get any information about how to store the annotation or label of the picture, or the format required for this problem.
For an object detection model, I was able to make my dataset with LabelImg, convert it into a csv file, and finally make a .tfrecord file.
I want to make the .tfrecord file in the FSNS dataset format.
Can you give me your advice on how to go on with these training steps?
Please reread the mentioned answer; it has a section explaining how to store the annotation. It is stored in the three features image/text, image/class and image/unpadded_class. The image/text field is used for visualization; some models support unpadded sequences and use image/unpadded_class, while the default version relies on the text, padded with null characters to the same length, stored in the feature image/class. Here is the excerpt that stores the text annotation:
char_ids_padded, char_ids_unpadded = encode_utf8_string(
    text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
    feature={
        'image/class': _int64_feature(char_ids_padded),
        'image/unpadded_class': _int64_feature(char_ids_unpadded),
        'image/text': _bytes_feature(text)
        ...
    }
))
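For context (my own sketch, not part of the quoted excerpt): the _int64_feature and _bytes_feature helpers referenced above are the usual thin wrappers around tf.train.Feature, and the finished example is serialized into the .tfrecord file, roughly like this:

import tensorflow as tf

def _int64_feature(values):
    # values: a list of ints, e.g. the (un)padded character ids
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _bytes_feature(value):
    # value: raw bytes, e.g. text.encode('utf-8') or the encoded image
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# write one serialized example per image (the shard name is just an illustration)
with tf.io.TFRecordWriter('fsns-train-00000-of-00001') as writer:
    writer.write(example.SerializeToString())  # `example` built as in the excerpt above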
If you have worked with TensorFlow object detection, then the approach should be much easier for you.
You can create the annotation file (in .csv format) using labelImg or any other annotation tool.
However, before converting it into TensorFlow format (.tfrecord), you should keep in mind the annotation format (the FSNS format in this case).
The format is: files text xmin ymin xmax ymax
So while annotating, don't bother much about the class (as you would have done in object detection); some random name should suffice.
Convert it into .tfrecords.
And finally the labelMap is a list of the characters which you have annotated.
Hope it helps!

Incorrect Broadcast input array shape error when trying to use Pretraining

I am trying to use spaCy's 'pretrain' feature for an NER task, so here is what I tried doing (I am still trying to use it):
Step 1: I started by initializing the model with 'en_core_web_lg', then I saved this model to disk and tested its NER capability on a few lines to see if it recognizes the tags in those test lines (and made a note of the ignored tags).
Step 2: Next I created a .jsonl file with new data to train on (about 20 new lines; I wanted to see the model's capability given new data around an entity, i.e. the ignored tags found earlier, and whether it would correctly identify the tags after transfer learning). Using this .jsonl file and the model I saved earlier, I used the 'spacy pretrain' command to train; this created a tok2vec .bin file for me (model999.bin).
Step 3: Next I created a function that takes the location of the earlier saved model (the model saved in step 1) and the location of the tok2vec file (the model999.bin file obtained in step 2). Inside the function, it loads the model, creates/gets the pipe, disables the rest of the pipes, and uses (pipe_name).model.tok2vec.from_bytes(file_.read()) to read from model999.bin and broadcast the learned vectors into the base model.
But when I run this function, I get this error:
ValueError: could not broadcast input array from shape (96,3,384) into shape (96,3,480)
(I have uploaded the entire notebook here: https://github.com/pratikdk/ner_test/blob/master/base_model_contextual_TF.ipynb).
In order to pretrain I used this command:
python -m spacy pretrain ub.jsonl model_saves w2s
Here are the 20 lines I tried training on top of the base model:
https://github.com/pratikdk/ner_test/blob/master/ub.jsonl
What exactly am I doing wrong here? Can you please also point out the fix; I am sure many would need insight on this.
Environment
Operating System: CentOS
Python Version Used: 3.7.3
spaCy Version Used: 2.1.3
Environment Information: Anaconda Jupyter Lab
So I was able to fix this; the developer (on GitHub) answered my question.
Here is the answer:
https://github.com/explosion/spaCy/issues/3616
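For readers who do not want to follow the link, the gist (paraphrased from the linked issue, so double-check it there): the mismatch of (96,3,384) vs (96,3,480) means the pretrained tok2vec was built with a different architecture than the tok2vec inside en_core_web_lg, so the 'spacy pretrain' settings have to match the model you later load the weights into. With the spaCy 2.1 CLI that means something along the lines of:

# hypothetical flags: width/depth/use-vectors must mirror the base model's tok2vec
python -m spacy pretrain ub.jsonl model_saves w2s --width 96 --depth 4 --use-vectors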

Predict value of single image after training model on TPU

I still want to know how I can predict the value of a single image after training the network, but it seems like it is not supported yet. Any idea for a workaround (taken from mnist_tpu.py)?
if mode == tf.estimator.ModeKeys.PREDICT:
    raise RuntimeError("mode {} is not supported yet".format(mode))
Besides Stack Overflow, is there anywhere else I can get support for implementing my models on TPUs?
Here is a Python program that sends an image to a TPU-trained model (ResNet in this case) and gets back a classification:
import base64
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import tensorflow as tf

with tf.gfile.FastGFile('/some/path.jpg', 'rb') as ifp:  # read the image as bytes
    credentials = GoogleCredentials.get_application_default()
    api = discovery.build('ml', 'v1', credentials=credentials,
                          discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')
    request_data = {'instances':
        [
            {"image_bytes": {"b64": base64.b64encode(ifp.read())}}
        ]
    }
    parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, MODEL, VERSION)
    response = api.projects().predict(body=request_data, name=parent).execute()
    print("response={0}".format(response))
Full code is here: https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/quests/tpu/flowers_resnet.ipynb
This article documents the process of writing a model for the Cloud TPU: https://medium.com/tensorflow/how-to-write-a-custom-estimator-model-for-the-cloud-tpu-7d8bd9068c26
It is supported now. Changes have been made to https://github.com/tensorflow/models/blob/master/official/mnist/mnist_tpu.py to make it work.
Besides Stack Overflow, you can file issues on GitHub: https://github.com/tensorflow/tpu/issues.
According to the documentation, you can choose online or batch modes for prediction, but you can't select the target device. As stated, "the prediction service allocates resources to run your job."
The documentation says that prediction is performed by nodes. I thought I'd read somewhere that prediction nodes are always CPUs in the Google Compute Engine, but I can't find a clear reference.

TensorFlow Lite export looks like it does not add weights and adds unsupported operations

I want to reload some of my model variables with the saved weights from the checkpoint and then export it to a .tflite file.
The question is a bit tricky without seeing the code, so I made this Colab Jupyter notebook with the complete code to explain it better (all the code is working; you can copy it into a new Colab and change it if you want):
https://colab.research.google.com/drive/1wSor4CxEz36LgElVi4y_N8uiSt4-j9b2#scrollTo=XKBQzoW_wd4A
I got it working, but with two issues:
The exported .tflite file is about 3 KB, so I do not believe it contains the entire model with the weights. The input alone is a 128x128x3 image; one weight per input value would already be more than 3K.
When I finally import the model in Android, I get this error: "Didn't find custom op for name 'VariableV2' \n Didn't find custom op for name 'ReorderAxes' \n Registration failed."
Maybe the last error is caused by the save/restore operations? They look like they are there when I save the graph definition.
Thanks in advance.
I realized my problem: I was trying to convert a model to TFLite without previously freezing it. TFLite does not allow "VariableV2" nodes because they should not be there.
The whole problem is fixed by freezing the model like this:
output_graph_def = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), ["output"])
I lost some time looking for that, hope it helps.
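For completeness, a minimal end-to-end sketch of the freeze-then-convert flow (my own outline, assuming TF 1.x, a hypothetical checkpoint name 'model.ckpt', and hypothetical tensor names 'input'/'output'; adapt these to the notebook's actual graph):

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
    # restore the trained variables from the checkpoint first
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')

    # freeze: bake the variable values into constants so no VariableV2 nodes remain
    frozen_graph_def = graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['output'])
    tf.train.write_graph(frozen_graph_def, '.', 'frozen.pb', as_text=False)

# convert the frozen graph with the TF 1.x converter
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'frozen.pb', input_arrays=['input'], output_arrays=['output'])
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)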