I obtained test results (bounding box size and location) via the code below,
!./darknet detector map data/data.obj cfg/yolov4-obj.cfg backup/yolov4_last.weights -ext_output <data/test.txt> result.txt
and I want to generate a confusion matrix from them.
How can I make one?
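To make the question concrete, this is roughly the matching I have in mind once the detections are parsed out of result.txt (a rough NumPy sketch, not working darknet code; the (x1, y1, x2, y2) box format, the 0.5 IoU threshold, and the extra background row/column are my own assumptions):

import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def update_confusion(conf, preds, gts, iou_thr=0.5):
    """preds/gts: lists of (class_id, box) for one image.
    conf: (num_classes+1)^2 matrix; last row/column is background
    (missed ground truth / spurious detections)."""
    bg = conf.shape[0] - 1
    matched = set()
    for p_cls, p_box in preds:
        best, best_iou = None, iou_thr
        for j, (g_cls, g_box) in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(p_box, g_box)
            if overlap >= best_iou:
                best, best_iou = j, overlap
        if best is None:
            conf[bg, p_cls] += 1            # false positive
        else:
            matched.add(best)
            conf[gts[best][0], p_cls] += 1  # ground-truth class vs predicted class
    for j, (g_cls, _) in enumerate(gts):
        if j not in matched:
            conf[g_cls, bg] += 1            # missed ground truth
    return conf

# conf = np.zeros((num_classes + 1, num_classes + 1), dtype=int), updated per image.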
I am training the Tacotron2 model using TensorFlowTTS for a new language.
I managed to train the model (performed pre-processing, normalization, and decoded the few generated output files).
The files in the output directory are .npy files, which makes sense as they are mel-spectrograms.
I am trying to find a way to convert these files to a .wav file in order to check whether my work has been fruitful.
I used this:
melspectrogram = librosa.feature.melspectrogram(
    "/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy", sr=22050,
    window=scipy.signal.hanning, n_fft=1024, hop_length=256)
print('melspectrogram.shape', melspectrogram.shape)
print(melspectrogram)

audio_signal = librosa.feature.inverse.mel_to_audio(
    melspectrogram, sr=22050, n_fft=1024, hop_length=256,
    window=scipy.signal.hanning)
print(audio_signal, audio_signal.shape)

sf.write('test.wav', audio_signal, sample_rate)
But it gives me this error: Audio data must be of type numpy.ndarray.
Although I am already giving it a numpy.ndarray file.
Does anyone know where the issue might be, or a better way to do this?
I'm not sure what your error is, but the output of a Tacotron 2 system is log Mel spectral features, and you can't just apply the inverse Fourier transform to get a waveform, because you are missing the phase information and because the features are not invertible. You can learn about why this is at places like Speech.Zone (https://speech.zone/courses/).
Instead of using librosa as you are doing, you need to use a vocoder like HiFi-GAN (https://github.com/jik876/hifi-gan) that is trained to reconstruct a waveform from log Mel spectral features. You can use a pre-trained model from most off-the-shelf vocoders, but make sure that the sample rate, Mel range, FFT size, hop size and window size are all the same between your Tacotron2 feature prediction network and whatever vocoder you choose, otherwise you'll just get noise!
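For concreteness, here is a rough sketch of running a pre-trained HiFi-GAN generator over one of the .npy mel files. The layout (Generator in models.py, AttrDict in env.py, a JSON config, a checkpoint dict with a 'generator' key) follows the jik876/hifi-gan repo's inference scripts, but the paths and pre-processing here are assumptions; check the repo's inference_e2e.py for the authoritative version. Also note that TensorFlowTTS saves mean-variance normalized features ('-norm-feats'), so you may need to de-normalize them with the dataset statistics before feeding the vocoder.

# Sketch only: synthesize a waveform from a Tacotron2 mel file with HiFi-GAN.
import json
import numpy as np
import torch
import soundfile as sf

from models import Generator   # from the jik876/hifi-gan repo
from env import AttrDict       # from the jik876/hifi-gan repo

with open("config.json") as f:                            # config matching the checkpoint
    h = AttrDict(json.load(f))

generator = Generator(h)
state = torch.load("generator_v1", map_location="cpu")    # pre-trained checkpoint
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()

# Tacotron2 output is (frames, n_mels); HiFi-GAN expects (1, n_mels, frames).
mel = np.load("/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy")
mel = torch.from_numpy(mel.T).unsqueeze(0).float()

with torch.no_grad():
    audio = generator(mel).squeeze().numpy()

sf.write("test.wav", audio, h.sampling_rate)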
I downloaded the following graph-cut code:
https://github.com/shaibagon/GCMex
I compiled the mex files and ran it on the pre-defined image in the code (which is an RGB image).
I want to optimize the image segmentation results.
I have a probability map of the image whose dimensions are (width, height, 5): five probability distributions over the image, stacked together, each relating to one of the classes.
My problem is which parts of the code should be changed to make use of this probability image.
I want to define the data and smoothness terms based on my application.
My questions are:
1) Has someone adapted the code to define a different energy function (I want to change the unary and pairwise formulation)?
2) I have a stack of 3D images. I want to define a 6-neighborhood system: 4 neighbors in the current slice and the other two from the two adjacent slices. In which function and part of the code can I make these refinements?
Thanks
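For reference, this is the kind of construction I have in mind, written as a plain NumPy sketch rather than actual GCMex calls (the array names, the negative-log-likelihood unary, and the edge enumeration are just my illustration of the standard setup):

import numpy as np

def unary_costs(prob, eps=1e-10):
    """prob: (H, W, K) class probabilities -> (K, H*W) unary costs."""
    cost = -np.log(np.clip(prob, eps, 1.0))        # negative log-likelihood data term
    return cost.reshape(-1, prob.shape[-1]).T      # one row per label

def six_neighbors(shape):
    """shape: (D, H, W). Return pairs of linear indices for a 6-connected 3D grid."""
    D, H, W = shape
    idx = np.arange(D * H * W).reshape(D, H, W)
    pairs = []
    for axis in range(3):                          # z, y, x directions
        a = np.take(idx, range(0, idx.shape[axis] - 1), axis=axis).ravel()
        b = np.take(idx, range(1, idx.shape[axis]), axis=axis).ravel()
        pairs.append(np.stack([a, b], axis=1))
    return np.concatenate(pairs, axis=0)           # each edge listed once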
I am trying to do a deep learning project using TensorFlow.
Each item in my dataset consists of 2 files (a PNG image file + a TXT vector file), which are put in different folders as follows:
./data/image/  # folder containing images of different sizes
./data/vector/ # folder containing the vector of the corresponding image
#For example: apple.png + apple.txt
An example of the vector content is as follows:
10.0,2.5,5,13
And since the image sizes differ, resizing and some transformations applied to the vectors are required. It is important that I can do this processing while TensorFlow is running. Is there a good way to manage this kind of dataset?
I have looked at a lot of basic tutorials, but most of them do not go into much detail about arranging customized data inputs and outputs. Please give me some advice!
I recommend you take a look at TFRecords and queues. Basically the idea is the following: you resize all your images to the same format and store them together with your txt vectors in one TFRecord file. This is done separately, before you run your model.
When you create your model you create a queue which reads data from the TFRecord file and feeds it to your model.
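A minimal sketch of that idea in TF 1.x style (to match the versions discussed here); the fixed 224x224 resize, the vector length of 4, and the file names are assumptions you would adapt:

import numpy as np
import tensorflow as tf
from PIL import Image

def write_tfrecord(pairs, out_path, size=(224, 224)):
    """pairs: list of (png_path, txt_path). Resize images and store them with their vectors."""
    with tf.python_io.TFRecordWriter(out_path) as writer:
        for png_path, txt_path in pairs:
            img = np.asarray(Image.open(png_path).convert("RGB").resize(size))
            vec = np.loadtxt(txt_path, delimiter=",").astype(np.float32)
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[img.tobytes()])),
                "vector": tf.train.Feature(
                    float_list=tf.train.FloatList(value=vec.tolist())),
            }))
            writer.write(example.SerializeToString())

def read_example(filename_queue, size=(224, 224), vec_len=4):
    """Read one (image, vector) pair from the queue feeding the model."""
    _, serialized = tf.TFRecordReader().read(filename_queue)
    parsed = tf.parse_single_example(serialized, features={
        "image": tf.FixedLenFeature([], tf.string),
        "vector": tf.FixedLenFeature([vec_len], tf.float32),
    })
    image = tf.reshape(tf.decode_raw(parsed["image"], tf.uint8), [size[0], size[1], 3])
    return image, parsed["vector"]

# Usage sketch: filename_queue = tf.train.string_input_producer(["data.tfrecord"])
#               image, vector = read_example(filename_queue)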
I'm using TensorBoard (TensorFlow 1.1.0) to show the results of my CNN classifier.
I added some output vectors as tf.summary.histogram in order to show the counts of outputs in each bin, but TensorBoard seems to automatically compute an interpolation and shows them as a (somehow) smoothed distribution
(and therefore I cannot find the exact counts for the bins).
Could someone tell me how I can avoid the interpolation and show usual histograms with bars?
I'm not sure that there is an easy way to do it.
I'm very unsure about the text below, so correct me if I'm wrong.
From this file https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/histogram/vz_histogram_timeseries/index.html it seems that the histogram comes to TensorBoard as double values.
The summary op uses either the histogram from https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/python/ops/histogram_ops.py (1) or https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/core/lib/histogram/histogram.cc (2).
I suppose that it uses the 2nd one, because here https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/python/summary/summary.py#L189 it calls a function from a generated file. In my package's code, this generated file contains another function call:
result = _op_def_lib.apply_op("HistogramSummary", tag=tag, values=values,
                              name=name)
I grepped the whole repo and it seems there is no other Python code that defines anything with "HistogramSummary", so it seems like it's really defined here https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/core/kernels/summary_op.cc and that code uses the code mentioned above (2).
So, it seems to me that the histogram which is used now is buried deep inside the framework, and I'm not sure that it's easy to rewrite it.
On this page there is an email for support: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/summary . I suppose it's better to contact that person or open an issue on GitHub.
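If the goal is just to see exact counts, one workaround (a sketch; the bin range, bin count and names are placeholders) is to compute the histogram yourself with tf.histogram_fixed_width and log each bin as a scalar summary, which TensorBoard displays without any smoothing:

import tensorflow as tf

outputs = tf.placeholder(tf.float32, [None], name="cnn_outputs")  # your output vector
value_range = [0.0, 1.0]   # adjust to the actual range of your outputs
nbins = 10

# Exact integer counts per bin, computed inside the graph.
counts = tf.histogram_fixed_width(outputs, value_range, nbins=nbins)

# One scalar per bin shows up in TensorBoard as plain, non-interpolated values.
for i in range(nbins):
    tf.summary.scalar("output_bin_%02d" % i, tf.cast(counts[i], tf.float32))
merged = tf.summary.merge_all()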
I have trained my custom object detector using faster_rcnn_inception_v2 and tested it using object_detection_tutorial.ipynb, and it works perfectly: I can find the bounding boxes for the objects inside the test image. My problem is how I can actually count the number of those bounding boxes, or simply count the number of objects detected for each class.
Because of low reputation I cannot comment.
As far as I know the Object Detection API unfortunately has no built-in function for this.
You have to write this function yourself. I assume you run eval.py for the evaluation!? To access the individual detected objects for each image you have to follow this chain of scripts:
eval.py -> evaluator.py -> object_detection_evaluation.py -> per_image_evaluation.py
In the last script you can count the detected objects and bounding boxes per image. You just have to save the numbers and sum them up over your entire dataset.
Does this already help you?
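If counting at inference time (rather than inside eval.py) is enough, here is a small sketch based on the output_dict and category_index used in object_detection_tutorial.ipynb; the 0.5 score threshold is an assumption:

from collections import Counter

def count_detections(output_dict, category_index, score_threshold=0.5):
    """Count detected objects per class for one image."""
    num = int(output_dict["num_detections"])
    classes = output_dict["detection_classes"][:num]
    scores = output_dict["detection_scores"][:num]
    names = [category_index[int(c)]["name"]
             for c, s in zip(classes, scores) if s >= score_threshold]
    return Counter(names)

# e.g. print(count_detections(output_dict, category_index))
# Counter({'class_a': 3, 'class_b': 1})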
I solved this using the TensorFlow Object Counting API. It has an example of counting objects in an image, single_image_object_counting.py. I just replaced ssd_mobilenet_v1_coco_2017_11_17 with my own model containing the inference graph:
input_video = "image.jpg"
detection_graph, category_index = backbone.set_model(MODEL_DIR)