I want to train my word2vec models on the HPC cluster provided through my university. However, I have been told that, to optimize storage on the cluster, I must transform my data into HDF5 and upload that to the cluster instead. My data consists of txt files (the txt files I want to train word2vec on). How am I supposed to transform txt files into HDF5?
I have been searching the documentation but cannot seem to find a tool for txt files. Should I write a script of my own?
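A short script is the usual route here. Below is a minimal sketch using h5py; the corpus/ directory, the corpus.h5 output name, and the one-dataset-per-file layout are all assumptions you would adapt to your data:

import glob
import os
import h5py

# Assumed layout: all txt files live in "corpus/"; output file is "corpus.h5".
txt_dir = "corpus"
with h5py.File("corpus.h5", "w") as h5:
    str_dt = h5py.string_dtype(encoding="utf-8")  # variable-length UTF-8 strings
    for path in sorted(glob.glob(os.path.join(txt_dir, "*.txt"))):
        with open(path, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
        # One dataset per txt file, one string entry per line; gzip keeps it compact.
        name = os.path.splitext(os.path.basename(path))[0]
        h5.create_dataset(name, data=lines, dtype=str_dt, compression="gzip")

Reading it back for training is the mirror image: open the file with h5py.File in read mode and decode each dataset's entries back to str before feeding them to word2vec.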
I have tried multiple Python scripts (like this one), but the output of the code block is empty and no YOLO txt file is created in the directory.
I have structured my images, annotations, and the Python script as shown here.
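Without seeing the linked script, the annotation format can only be guessed at; here is a minimal conversion sketch assuming Pascal VOC-style XML annotations in an annotations/ folder and a hypothetical class list, writing one YOLO txt file per annotation into labels/:

import glob
import os
import xml.etree.ElementTree as ET

classes = ["cat", "dog"]  # assumed class list; replace with your own
os.makedirs("labels", exist_ok=True)

for xml_path in glob.glob("annotations/*.xml"):
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    rows = []
    for obj in root.findall("object"):
        cls_id = classes.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # YOLO txt format: class x_center y_center width height, normalized to [0, 1]
        rows.append(f"{cls_id} {(xmin + xmax) / 2 / img_w:.6f} {(ymin + ymax) / 2 / img_h:.6f} "
                    f"{(xmax - xmin) / img_w:.6f} {(ymax - ymin) / img_h:.6f}")
    out_name = os.path.splitext(os.path.basename(xml_path))[0] + ".txt"
    with open(os.path.join("labels", out_name), "w") as out:
        out.write("\n".join(rows))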
I see in the explanation that a TFRecord can contain multiple classes and multiple images (a cat and a bridge). When it is written, both images go into one TFRecord, and on read-back it is verified that the TFRecord contains two images.
Elsewhere I have seen people generate one TFRecord per image. I know you can load multiple TFRecord files like this:
train_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("<Path>/*.tfrecord"))
But which way is recommended? Should I build one TFRecord per image, or one TFRecord for multiple images? And if I put multiple images into one TFRecord, what is the maximum number?
As you said, it is possible to save an arbitrary number of entries in a single TFRecord file, and one can create as many TFRecord files as desired (see the sketch after the list below).
I would recommend using practical considerations to decide how to proceed:
On one hand, try to use fewer TFRecord files, which makes moving files around the filesystem easier.
On the other hand, avoid growing a TFRecord file to a size that becomes a problem for the filesystem.
Keep in mind that it is useful to keep separate TFRecord files for the train/validation/test split.
Sometimes the nature of the dataset makes the split into separate files obvious (for example, I have a video dataset where I use one TFRecord file per participant session).
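To make the write/read round trip concrete, here is a minimal sketch that serializes several images into one shard and then loads all shards back; the shard names, image files, and integer labels are placeholders:

import tensorflow as tf

def serialize_example(image_bytes, label):
    # One tf.train.Example per image: raw encoded bytes plus an integer label.
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Write several images into a single shard file (hypothetical image files).
samples = [(open(p, "rb").read(), i) for i, p in enumerate(["cat.jpg", "bridge.jpg"])]
with tf.io.TFRecordWriter("train-00000.tfrecord") as writer:
    for image_bytes, label in samples:
        writer.write(serialize_example(image_bytes, label))

# Read every shard back; TFRecordDataset does not expand wildcards, so glob first.
dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("train-*.tfrecord"))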
I have trained two models and successfully generated their detect.tflite files. Is there any way to merge both detect.tflite files so that the resulting single file can be used in an Android/iOS application?
I did fairly thorough research on this and came to the conclusion that two .tflite files cannot be merged; however, one can combine the datasets, retrain the model, and generate a new .tflite file that does the job of both previous .tflite files.
I am trying to classify wav files into different classes. However, the number of sound files is far more than I can load into RAM (>10000 files). So the only practical way to feed these files in is in batches, using a data-generator function (like ImageDataGenerator with flow_from_directory). Can someone please help me with it? I have a custom spectrogram function that I would like to apply to each wav file as it is processed.
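Not a definitive implementation, but a minimal sketch of such a generator using tf.keras.utils.Sequence; it assumes your spectrogram function takes a wav file path and returns a fixed-shape array:

import math
import numpy as np
import tensorflow as tf

class WavSequence(tf.keras.utils.Sequence):
    """Yields (spectrogram_batch, label_batch) without loading all wav files at once."""

    def __init__(self, file_paths, labels, spectrogram_fn, batch_size=32):
        self.file_paths = file_paths        # list of .wav paths
        self.labels = np.asarray(labels)    # class labels, same order as the paths
        self.spectrogram_fn = spectrogram_fn
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.file_paths) / self.batch_size)

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # Only this batch's files are read from disk and transformed.
        specs = [self.spectrogram_fn(p) for p in self.file_paths[sl]]
        return np.stack(specs), self.labels[sl]

An instance can be passed directly to model.fit, which pulls one batch at a time; per-epoch shuffling can be added by overriding on_epoch_end.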
I'm trying to deploy a trained model to Google Cloud, but I'm having trouble with the file size: Google Cloud has a 250 MB file size limit. I was able to quantise the .pb file down to a smaller size, but I don't know how to reduce the file size of the .pbtxt. Is it possible to quantise the .pbtxt as well? If so, how? Or is there another method to reduce the size?
Thanks
As noted by Bhupesh, the service accepts both .pb and .pbtxt files; the former is a binary format and is stored much more efficiently on disk.
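If the graph currently only exists as a .pbtxt, converting it to binary is a few lines. This sketch assumes the file holds a GraphDef in protobuf text format, and the file names are hypothetical:

import tensorflow as tf
from google.protobuf import text_format

# Parse the text-format GraphDef (assumed content of the .pbtxt).
with open("model.pbtxt", "r") as f:
    graph_def = text_format.Parse(f.read(), tf.compat.v1.GraphDef())

# The binary serialization of the same graph is far smaller on disk.
with open("model.pb", "wb") as f:
    f.write(graph_def.SerializeToString())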