I have a data set containing pictures in 2 classes.
The dataset is organized in 1 folder that contains 2 folders, 1 for each class.
class 1 pictures path: "Project\Class_A"
class 2 pictures path: "Project\Class_B"
I want to:
read all the data (quickly),
split it into train, validation, and test sets,
train a TensorFlow model.
I saw that the image_dataset_from_directory function does the trick nicely, but I cannot find how to split off a test dataset (only train and validation).
I want the data to be split randomly (and not split manually by moving some pictures to a different folder).
Do you have any idea for a quick way to read all the data and split it into train, validation, and test datasets?
Thank you.
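A minimal sketch of a random three-way split, assuming the layout described above (one subfolder per class under a single root). The helper names list_labeled_images and split_three_way, the suffix list, and the default fractions are hypothetical choices for illustration, not part of any TensorFlow API. It only splits file paths; loading the images is left to whatever loader you prefer.

```python
import random
from pathlib import Path

# Assumption: these suffixes cover the image types in the class folders.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def list_labeled_images(root):
    """Return sorted (path, class_name) pairs, one class per subfolder of root."""
    root = Path(root)
    pairs = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((str(path), class_dir.name))
    return pairs


def split_three_way(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of items and cut it into train, validation, and test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Usage would look like train, val, test = split_three_way(list_labeled_images("Project")). Because the shuffle happens once, before the cuts, no file can end up in more than one subset.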
Related
I am building a CNN model to classify my own dataset in Python using TensorFlow 2. How should I organize my dataset in the directory so I can load it into the model?
Create the train dataset and test dataset, extract them into 2 different folders named as “train” and “test”. The train folder should contain ‘n’ folders each containing images of respective classes. For example, in the Dog vs Cats dataset, the train folder should have 2 folders, namely “Dogs” and “Cats” containing respective images inside them. The same should be done for the test folder.
Then, you should use tf.keras.preprocessing.image.ImageDataGenerator and its flow_from_directory function.
My situation is this: I have an Excel file with 747 nodes as input, each with a value (imagine 747 columns of floats), and an output of 741 values/columns, again floats. These are the inputs and outputs of a geological simulation. So one row has 747 (input) + 741 (output) = 1488 floats, which is one dataset (from one simulation). I have 4 such datasets (rows) to train a neural network, so that when I test it on 3 test datasets (747 columns each) I get an output of 741 columns. This is just a simple run to get the skeleton of the neural network going before further modifications.
I have come across the Multi-Target Regression example of NYCTaxi (https://github.com/zeahmed/DeepLearningWithMLdotNet/tree/master/NYCTaxiMultiOutputRegression) but I can't seem to wrap my head around it.
This is the training set (Input till and including column 'ABS', rest is output):
https://docs.google.com/spreadsheets/d/12TKVbGExt9KcK5RQKTexrToVo8qA5YfeItSaa7E2QdU/edit?usp=sharing
This is the test set:
https://docs.google.com/spreadsheets/d/1-RjyZsdguucCSOr9QTdTp2ehJBqWCr5yz1-aRjQ_4zo/edit?usp=sharing
This is the test Output (To validate) : https://docs.google.com/spreadsheets/d/10O_6711CEpJ4DN1w-kCmW01NikjFVZTDmNRuqO3U_6A/edit?usp=sharing
Any guidance/tips would be well appreciated. TIA!
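The row layout described above (747 input floats followed by 741 output floats) can be separated into model inputs and targets before any training code is written. This is only a sketch of that slicing step, assuming each row has already been read from the spreadsheet into a flat list of floats; split_row and split_rows are hypothetical helper names.

```python
N_INPUTS = 747   # input columns per row, as stated in the question
N_OUTPUTS = 741  # output columns per row, as stated in the question


def split_row(row):
    """Split one 1488-value simulation row into (inputs, targets)."""
    expected = N_INPUTS + N_OUTPUTS
    if len(row) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(row)))
    return list(row[:N_INPUTS]), list(row[N_INPUTS:])


def split_rows(rows):
    """Apply split_row to every row and return (all_inputs, all_targets)."""
    all_inputs, all_targets = [], []
    for row in rows:
        inputs, targets = split_row(row)
        all_inputs.append(inputs)
        all_targets.append(targets)
    return all_inputs, all_targets
```

With the 4 training rows this gives 4 input vectors of length 747 and 4 target vectors of length 741, which is the shape a multi-output regression model would expect.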
We can use an AutoEncoder for this task. An AutoEncoder takes in the data and compresses it into a latent representation. This representation vector is then used to construct the output variable.
So, you can feed the 747-dimensional input vector to the model and generate a 741-dimensional vector, which is the output. After proper training, the model will be able to generate the target variables for a given set of inputs.
I want to train YOLO to detect only the class person. Therefore I downloaded the COCO dataset, adjusted the labels to only this class, and changed the config files accordingly. I then trained YOLO by following the steps described in the section "Training YOLO on COCO" on this site: https://pjreddie.com/darknet/yolo/.
But the mean average precision (mAP) with my trained weights for the class person is much worse than the mAP for the same class when I use the trained weights from the same page under the caption "Performance on the COCO Dataset". I was wondering what the reason for this could be, and which data was used to train the weights available on the homepage.
Probably something went wrong when you modified the cfg file (classes, filters, etc.). Anyway, what is the purpose of your task? Do you really need to retrain the model, or do you only need to filter for 1 class and run detection?
If you only want the person label out of the 80 classes, you can use this workaround. You don't need to retrain the model; you just use the weights provided by the author on the YOLO website.
For a quick and simple way using the COCO dataset, follow these steps:
Modify (or copy for backup) the coco.names file in darknet\data\coco.names
Delete all classes except person
Modify your cfg file (e.g. yolov3.cfg): change the 3 classes values on lines 610, 696, 783 from 80 to 1
Change the 3 filters values in the cfg file on lines 603, 689, 776 from 255 to 18 (derived from (classes+5)x3)
Run the detector: ./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/your_image.jpg
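The filters value in the steps above follows from the (classes+5)x3 rule: each of the 3 anchor boxes per scale predicts 4 box coordinates, 1 objectness score, and one score per class. A small sketch of that arithmetic (the function name yolo_head_filters is a hypothetical helper, not part of darknet):

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters for the conv layer feeding a [yolo] layer: (classes + 5) * anchors."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    # 5 = 4 bounding-box coordinates + 1 objectness score
    return (num_classes + 5) * anchors_per_scale


print(yolo_head_filters(80))  # 255, the value in the stock 80-class cfg
print(yolo_head_filters(1))   # 18, the value for a single "person" class
```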
For a more advanced way using the COCO dataset, you can use this repo to create YOLO datasets based on VOC, COCO, or Open Images: https://github.com/holger-prause/yolo_utils
Also refer to this : How can I download a specific part of Coco Dataset?
It would be much simpler, IMO, to use the pretrained weights with the presupplied *.names, *.data and *.cfg files. Then you run the detector on single images or on a list of image file names in the mode (I do not remember the details) where YOLO outputs a list of detections in the form of class name, confidence, and bbox. You then simply ignore anything other than the "person" class.
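The filtering step described above could look roughly like this, assuming the detector output has already been parsed into (class name, confidence, bbox) records. The Detection type, the bbox layout, and the keep_class helper are hypothetical; the actual darknet output format would need its own parsing.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One parsed detection; the field layout is an assumption for illustration."""
    class_name: str
    confidence: float
    bbox: Tuple[float, float, float, float]


def keep_class(detections: List[Detection], wanted: str = "person",
               min_confidence: float = 0.0) -> List[Detection]:
    """Return only the detections of the wanted class at or above min_confidence."""
    return [d for d in detections
            if d.class_name == wanted and d.confidence >= min_confidence]
```

For example, keep_class(detections, "person", 0.5) would drop every non-person detection and any person detection with confidence below 0.5.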
I am training a UNET for semantic segmentation but I only have 200 labeled images. Given the small size of the dataset, it definitely needs some data augmentation techniques.
I have a question about the test and validation sets.
I have a custom data generator that keeps feeding data from a folder to the model during training.
So what I plan to do is:
do data augmentation on the training set and keep all of it in the same folder
"randomly" move some of the training data into the test and validation sets (before training, of course).
I am not sure if this is fine, since the augmentation is just simple processing (flipping, transposing, adjusting brightness).
Would it be better to separate the data first and then augment only the data that remains in the training folder?
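The second option in the question (separate first, augment afterwards) can be sketched as below. Splitting before augmenting means a flipped or transposed copy of a validation or test image can never appear in the training set. This is a toy illustration, assuming images and masks are nested lists; the helper names and fractions are hypothetical, and a real pipeline would use array operations instead.

```python
import random


def flip_horizontal(image):
    """Mirror each row of a 2-D nested-list image."""
    return [list(reversed(row)) for row in image]


def transpose(image):
    """Swap rows and columns of a 2-D nested-list image."""
    return [list(col) for col in zip(*image)]


def split_then_augment(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Split (image, mask) pairs first, then augment only the training part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]

    augmented = []
    for image, mask in train:
        augmented.append((image, mask))
        # Geometric transforms must be applied to the mask as well,
        # otherwise the labels no longer line up with the pixels.
        augmented.append((flip_horizontal(image), flip_horizontal(mask)))
        augmented.append((transpose(image), transpose(mask)))
    return augmented, val, test
```

Brightness changes would be applied to the image only, since they do not move pixels and the mask stays valid.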
I am interested in using TensorFlow to train a CNN for binary classification on my data.
I am wondering how to set the filter values and the number of output nodes in the convolution layers.
I have read many tutorials and examples. However, most of them use image data, and I cannot compare that with my data, which is customer data, not pixels.
Could you give me some suggestions on this?
If your data varies in time or space, then you can use a CNN. I am currently working with an EEG data set, which varies in time. You can also refer to this paper
http://www.nlpr.ia.ac.cn/english/irds/People/lwang/M-MCG_EN/Publications/2015/YD2015ACPR.pdf
where the input data (which is not an image) is presented as an image to the CNN.
You have to reshape the data to be 4D. In this example, I have only 4 columns.
import numpy as np
# the 4 feature columns of each row become a 2x2 single-channel "image"
x_train = np.reshape(x_train, (x_train.shape[0], 2, 2, 1))
x_test = np.reshape(x_test, (x_test.shape[0], 2, 2, 1))
This is a good example of using non-image data:
https://github.com/fengjiqiang/LSTM-Wind-Speed-Forecasting
You just need to change the following :
prediction_cols
feature_cols
features
and dataload
This tutorial is for text: here.
You might use one of the following classes:
class Dataset: Represents a potentially large set of elements.
class FixedLengthRecordDataset: A Dataset of fixed-length records from one or more binary files.
class Iterator: Represents the state of iterating through a Dataset.
class TFRecordDataset: A Dataset comprising records from one or more TFRecord files.
class TextLineDataset: A Dataset comprising lines from one or more text files.
Tutorial
official documentation