I'm using Random Forest in the Weka 3.9 GUI. My dataset contains 211,965 instances with 95 attributes, and I'm predicting a numerical value.
When I save the model, its size is 1,951,834 KB (about 1.9 GB), which is far too big to load in my Java application via the Weka API.
Am I doing something wrong that causes the file to be that big?
Here is the classifier output from Weka so you can see the parameters I used (I removed the attribute list to keep it shorter).
=== Run information ===

Scheme:       weka.classifiers.trees.RandomForest -P 100 -I 30 -num-slots 0 -K 0 -M 1.0 -V 0.001 -S 1 -depth 21
Relation:     all_cars_wo_cena300k3
Instances:    211965
Attributes:   95
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

RandomForest

Bagging with 30 iterations and base learner

weka.classifiers.trees.RandomTree -K 0 -M 1.0 -V 0.001 -S 1 -depth 21 -do-not-check-capabilities

Time taken to build model: 15.02 seconds

=== Evaluation on training set ===

Time taken to test model on training data: 6.36 seconds

=== Summary ===

Correlation coefficient                  0.9978
Mean absolute error                   1532.7018
Root mean squared error               3087.1285
Relative absolute error                  5.1246 %
Root relative squared error              6.9288 %
Total Number of Instances           211965
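For scale, here is a rough back-of-envelope estimate of the serialized forest size. This is a sketch only: the per-node byte cost is a guess at Java serialization overhead, and only the instance count, tree count, depth, and the -M 1.0 setting come from the run above.

# Back-of-envelope estimate of the serialized RandomForest size (a sketch, not a measurement).
n_instances = 211965
n_trees = 30            # -I 30
max_depth = 21          # -depth 21

# With -M 1.0 (minimum one instance per leaf) a RandomTree can keep splitting until
# leaves are nearly pure, so the leaf count is bounded by min(2**max_depth, n_instances).
leaves_per_tree = min(2 ** max_depth, n_instances)
nodes_per_tree = 2 * leaves_per_tree - 1
bytes_per_node = 250    # guess at per-node serialization overhead

estimated_gb = n_trees * nodes_per_tree * bytes_per_node / 1e9
print(f"~{estimated_gb:.1f} GB")   # same order of magnitude as the ~1.9 GB file observed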
I'm trying to understand the output of my textcat_multilabel job. I have 4 text categories and I'm using spaCy version 3.2.0 (the methodology has changed a lot recently and I don't really understand the documentation).
E    #     LOSS TEXTC...  CATS_SCORE  SCORE
---  ----  -------------  ----------  -----
0    0              1.00       51.86   0.52
0    200          122.15       52.90   0.53
This is what I have in my config file. (By the way, what is v1?)
scorer = {"@scorers":"spacy.textcat_multilabel_scorer.v1"}
threshold = 0.5
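For reference, here is roughly how this component looks when added through the Python API instead of the config file; a minimal sketch on a blank pipeline, not my actual training setup.

import spacy

# Minimal sketch: the textcat_multilabel component added via the Python API.
# The threshold matches the config above; when the scorer is not overridden,
# it defaults to the registered spacy.textcat_multilabel_scorer.v1 function.
nlp = spacy.blank("en")
nlp.add_pipe("textcat_multilabel", config={"threshold": 0.5})
print(nlp.pipe_names)   # ['textcat_multilabel']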
In fact, everything in the standard config file is unchanged from the suggested defaults except the dropout, which I increased to 0.5.
The final row of my job shows these values: 0 8400 2.59 87.29 0.87
I am very impressed with the results that I'm getting with this job. Just need to understand what I'm doing.
E is epochs
# is training iterations / batches (see here)
LOSS_TEXTCAT is the loss of your textcat component. The loss normally fluctuates for the first few iterations and then trends downward; the exact values are meaningless.
CATS_SCORE is the score of your textcat component on your dev set; see the docs for details on how it's computed.
SCORE is the overall score of your pipeline, a weighted average of the scores of any components you have. Since you only have a textcat component, it's essentially the same as that score (see the small worked example below).
v1 is just version 1; components are versioned in case they are updated later, so you can keep using older versions with newer code.
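Here is the small worked example mentioned above for the SCORE column; the weight name below is illustrative and not taken from your config.

# SCORE is a weighted average of the component scores. With a single
# textcat_multilabel component, cats_score carries all the weight, so
# SCORE is just CATS_SCORE rescaled to 0-1 (52.90 -> 0.53 in the log above).
score_weights = {"cats_score": 1.0}              # illustrative weights
component_scores = {"cats_score": 52.90 / 100.0}

overall = sum(w * component_scores[name] for name, w in score_weights.items())
print(round(overall, 2))   # 0.53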
I need to classify small images into 4 different categories, plus a "background" category for false detections.
While training, the loss quickly drops to 0.7 but stays there even after 800k steps. In the end, the frozen graph seems to classify most images with the background label.
I'm probably missing something. I'll detail the steps I used below, and any feedback is welcome.
I'm new to tf-slim, so it could be an obvious mistake, or maybe too few samples? (See the quick count sketch after the steps below.) I'm not looking for top accuracy, just something that works for prototyping.
Source materials can be found here: https://www.dropbox.com/s/k55xoygdzb2efag/TilesDataset.zip?dl=0
I used tensorflow-gpu 1.15.3 on Windows 10.
I created the dataset using:
python ./createTfRecords.py --tfrecord_filename=tilesV2_40 --dataset_dir=.\tilesV2\Tiles_40
I added a dataset provider in models-master\research\slim\datasets based on the flowers provider.
I modified mobilenet_v2.py in models-master\research\slim\nets\mobilenet, changing num_classes=5 and mobilenet.default_image_size = 40
I trained the net with: python ./models-master/research/slim/train_image_classifier.py --model_name "mobilenet_v2" --learning_rate 0.045 --preprocessing_name "inception_v2" --label_smoothing 0.1 --moving_average_decay 0.9999 --batch_size 96 --learning_rate_decay_factor 0.98 --num_epochs_per_decay 2.5 --train_dir ./weight --dataset_name Tiles_40 --dataset_dir .\tilesV2\Tiles_40
When I try python .\models-master\research\slim\eval_image_classifier.py --alsologtostderr --checkpoint_path ./weight/model.ckpt-XXX --dataset_dir ./tilesV2/Tiles_40 --dataset_name Tiles_40 --dataset_split_name validation --model_name mobilenet_v2, I get eval/Recall_5[1] eval/Accuracy[1]
I then export the graph with python .\models-master\research\slim\export_inference_graph.py --alsologtostderr --model_name mobilenet_v2 --image_size 40 --output_file .\export\output.pb --dataset_name Tiles_40
And freeze it with freeze_graph --input_graph .\export\output.pb --input_checkpoint .\weight\model.ckpt-XXX --input_binary true --output_graph .\export\frozen.pb --output_node_names MobilenetV2/Predictions/Reshape_1
I then try the net on images from the dataset with python .\label_image.py --graph .\export\frozen.pb --labels .\tilesV2\Tiles_40\labels.txt --image .\tilesV2\Tiles_40\photos\lac\1_1.png --input_layer input --output_layer MobilenetV2/Predictions/Reshape_1. This is where I get wrong classifications,
like 0:background 0.92839915 2:lac 0.020171663 1:house 0.019106707 3:road 0.01677236 4:start 0.0155500565 for a "lac" image from the dataset.
I tried changing the depth_multiplier, the learning rate, training on a CPU, and removing --preprocessing_name "inception_v2" from the training command. I don't have any ideas left...
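One more thing I could still check, given the "too few samples" worry above, is how many examples each class actually has in the generated records. A quick counting sketch, assuming my records keep the flowers-style image/class/label feature key; the glob pattern below is a placeholder for the real shard names.

from collections import Counter
import tensorflow as tf   # written against the tf 1.15 API used above

counts = Counter()
# Placeholder pattern; point it at the actual tfrecord shards.
for path in tf.io.gfile.glob("./tilesV2/Tiles_40/*.tfrecord"):
    for record in tf.compat.v1.io.tf_record_iterator(path):
        example = tf.train.Example.FromString(record)
        label = example.features.feature["image/class/label"].int64_list.value[0]
        counts[label] += 1
print(counts)   # per-class-id sample counts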
Change your learning rate, maybe start from the usual choice of 3e-5.
I am going through the tutorial on retraining Inception's final layer, having installed TensorFlow for Ubuntu with regular CPU support. I successfully made the flower example work; however, after switching to a new set of categories with ten sub-folders, I cannot make Inception produce ten scores for each input image rather than the default five. My current command line to run a test image looks like this, working with labels 0-9.
bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--output_layer=final_result \
--input_layer=Mul \
--image=$HOME/Input/Example.jpg
This produces the following result:
5 (4): 0.642959
3 (2): 0.243444
9 (8): 0.0513504
4 (5): 0.0231318
6 (7): 0.0180509
However, I cannot find anything in the programs that Inception runs to reconfigure how many output scores are produced, so that all ten of my categories get scores rather than just five. How do I change this?
I tried with 8 categories and was able to get results for all of them.
If your code has the line below:
top_k = predictions[0].argsort()[-5:][::-1]
change it to
top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
If your code contains predictions = np.squeeze(predictions), then use predictions instead of predictions[0].
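Put together, the ranking logic looks like this as a standalone sketch; the label list and prediction vector below are stand-ins for the real output_labels.txt entries and graph output.

import numpy as np

labels = [str(i) for i in range(10)]                       # stand-in for output_labels.txt entries
predictions = np.random.dirichlet(np.ones(len(labels)))    # stand-in for the real softmax output

# Rank every class instead of only the top 5.
top_k = predictions.argsort()[-len(predictions):][::-1]
for node_id in top_k:
    print("%s (score = %.5f)" % (labels[node_id], predictions[node_id]))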
I ran this using the following command instead of bazel and found it easier:
python /path_to_file/label_image.py /path_to_image/image.jpeg
First make sure that the graph is created after you run retrain.py and that it is in the correct location (the default is inside /tmp/).
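A quick sanity check for that last point; the paths are the defaults used in the commands above.

import os

# Check that retrain.py produced its outputs in the default location.
for path in ("/tmp/output_graph.pb", "/tmp/output_labels.txt"):
    print(path, "exists" if os.path.exists(path) else "MISSING")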
I've tried searching the internet for input on this one, but without success.
I am using libSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) and I've encountered this while training the SVM with rbf kernel.
If a feature contains very small numbers, like feature 15 in the following
0 1:4.25606e+07 2:4.2179e+07 3:5.1059e+07 4:7.72388e+06 5:7.72037e+06 6:8.87669e+06 7:4.40263e-06 8:0.0282494 9:819 10:2.34513e-05 11:21.5385 12:95.8974 13:179.117 14:9 15:6.91877e-310
libSVM will fail reading the file with the error code Wrong input at line <lineID>.
After some testing, I was able to confirm that changing such a small number to 0 appears to fix the error, i.e. this line is read correctly:
0 1:4.17077e+07 2:4.12838e+07 3:5.04597e+07 4:7.76011e+06 5:7.74881e+06 6:8.91813e+06 7:3.97472e-06 8:0.0284308 9:936 10:2.46506e-05 11:22.8714 12:100.969 13:186.641 14:17 15:0
Can anybody help me figure out why this is happening? My file contains a lot of numbers around that order of magnitude.
I am calling the SVM via terminal on Ubuntu like:
<path to>/svm-train -s 0 -t 2 -g 0.001 -c 100000 <path to features file> <path for output model file>
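As a workaround, a preprocessing pass that clamps such tiny magnitudes to zero (which, as noted above, makes the lines readable) could look like the sketch below; the file names and the threshold are placeholders.

# Clamp sub-threshold magnitudes to 0 in a libsvm-format feature file.
# 6.91877e-310 is a subnormal double; zeroing values like it was observed above
# to make the file readable. File names and THRESHOLD are placeholders.
THRESHOLD = 1e-300

with open("features.txt") as src, open("features_clamped.txt", "w") as dst:
    for line in src:
        parts = line.split()
        out = [parts[0]]                     # the class label
        for item in parts[1:]:
            idx, val = item.split(":")
            if abs(float(val)) < THRESHOLD:
                val = "0"
            out.append(f"{idx}:{val}")
        dst.write(" ".join(out) + "\n")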
I am facing a problem with libsvm and I am hoping you can help me.
When I use svm-train.exe with default parameters like this...
svm-train dikomou
svm-predict dikomou.t dikomou.model dikomou.t.predict
I get accuracy 84.72%
When I use [-1, 1] scaling, applying the same scaling to the training and testing files like this...
svm-scale -l -1 -u 1 -s range1 dikomou > dikomou.scale
svm-scale -r range1 dikomou.t > dikomou.t.scale
svm-train dikomou.scale
svm-predict dikomou.t.scale dikomou.scale.model dikomou.t.predict
I get less accuracy 81.94%
If I do the scaling to [0, 1] instead (svm-scale -l 0 -u 1, same procedure otherwise) I get accuracy 87.5%.
So I keep the 0 to 1 scaling.
BUT when I use grid.py with the 0 to 1 scaled data like this
grid.py dikomou.scale
..
8 0.0078125 84.25
$ ./svm-train -c 8 -g 0.0078125 dikomou.scale
$ ./svm-predict dikomou.t.scale dikomou.scale.model dikomou.t.predict
I get a cross-validation rate of 84.25% and total accuracy of 79.166%, with best c = 8 and gamma = 0.0078125.
So grid.py gives me lower accuracy than training with the defaults. So I have two questions.
How is this possible?
What are the default values of c and gamma that svm-train uses? (I can't find this stated clearly in the documentation. Is gamma 1/number_of_features and c = 1?) And why do they do better than the values grid.py finds?
easy.py also gives me worse results than the defaults. What can I do?