TensorFlow linear regression task - very high loss problem

I'm trying to build a linear model on my own:
import numpy as np
import tensorflow as tf

# Create features
X = np.array([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
# Create labels
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])
# One hidden layer of 50 ELU units followed by a single linear output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation="elu", input_shape=[1]),
    tf.keras.layers.Dense(1)
])
model.compile(loss="mae",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              metrics=["mae"])
model.fit(X, y, epochs=150)
When I train with the above X and y data, the loss value starts from a normal value.
experience salary
0 0 2250
1 1 2750
2 5 8000
3 8 9000
4 4 6900
5 15 20000
6 7 8500
7 3 6000
8 2 3500
9 12 15000
10 10 13000
11 14 18000
12 6 7500
13 11 14500
14 12 14900
15 3 5800
16 2 4000
But when I use a dataset like this one (with the same model as above), the initial loss value starts at 800.
What could be the reason for this?

Your learning rate is too high. Try a much lower initial learning rate, such as 0.0001 or 0.00001.
Apart from that, your setup is fine: the last layer uses the default 'linear' activation, and the loss function and metric are appropriate. Also note that when batch_size is not specified explicitly, it defaults to 32.
Update: as the author of the question determined, underfitting was also a fundamental part of the problem. Adding several more layers helped solve it.
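As a side check that is not part of the answer above: MAE is measured in the units of the labels, so the starting loss grows with the magnitude of the labels. Here is a minimal sketch, assuming an untrained model whose predictions are all close to zero (an assumption; the real initial predictions depend on the random weight initialisation). The helper name mae is hypothetical.

```python
# Labels from the two datasets in the question.
toy_labels = [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0]
salaries = [2250, 2750, 8000, 9000, 6900, 20000, 8500, 6000, 3500,
            15000, 13000, 18000, 7500, 14500, 14900, 5800, 4000]


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumption: an untrained model predicts roughly 0 for every input.
toy_baseline = mae(toy_labels, [0.0] * len(toy_labels))
salary_baseline = mae(salaries, [0.0] * len(salaries))

print("toy baseline MAE:", toy_baseline)
print("salary baseline MAE:", salary_baseline)
```

Under that assumption the baseline is simply the mean absolute label, so a large starting loss on the salary data does not by itself mean the model is broken.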

Related

knn classifiers troubleshooting

I'm exploring knn classifiers using some stock data. The features I'm using to classify are mean_return and volatility. My class labels are 'green' and 'red', encoded as 0 and 1 respectively.
This is my code so far, including training and testing:
year_one.loc[year_one['labels'] == 'green', 'label_two'] = 0
year_one.loc[year_one['labels'] == 'red', 'label_two'] = 1
X = year_one.iloc[:, 2:4] # features
y = year_one.iloc[:, -1] # label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 20)
And this is what my dataframe looks like:
Year Week_Number mean_return volatility labels label_two
159 2020 1 1.57500 0.738242 green 0
160 2020 2 1.21760 0.672509 green 0
161 2020 3 -0.20475 3.040763 red 1
162 2020 4 -2.10100 3.879057 red 1
163 2020 5 0.35420 5.266582 green 0
164 2020 6 0.57760 1.611520 green 0
165 2020 7 -0.49050 3.277057 red 1
166 2020 8 -1.11040 3.086351 red 1
167 2020 9 -0.31020 4.117689 red 1
168 2020 10 -4.88960 12.424480 red 1
When I try to run the knn classifier from sklearn, I get an error that says "ValueError: Unknown label type: 'unknown'".
classifier = KNeighborsClassifier(n_neighbors = 3)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
Any idea what the error is and what I'm doing wrong? Thanks.

Trying to understand shuffle within mini-batch in tensorflow Dataset

From here I understand what shuffle, batch and repeat do. I'm working on medical image data where each mini-batch contains slices from one patient record. I'm looking for a way to shuffle within the mini-batch while training. I cannot increase the buffer size because I don't want slices from different records to get mixed up. Could someone please explain how this can be done?
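import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(tf.range(1, 20))
# Batch first, then shuffle: this reorders whole batches, not their contents
data = dataset.batch(5).shuffle(5).repeat(1)
for element in data.as_numpy_iterator():
    print(element)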
Current Output :
[ 6 7 8 9 10]
[1 2 3 4 5]
[11 12 13 14 15]
[16 17 18 19]
Expected Output :
[ 6 8 9 7 10]
[3 4 1 5 2]
[15 12 11 14 13]
[16 17 19 20 17]
I just realized that there is no need to shuffle within the mini-batch, as doing so doesn't improve training in any way. I'd appreciate hearing from anyone who has other views on this.
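For anyone who still wants the behaviour described in the question, here is a framework-free sketch of shuffling inside each mini-batch without letting elements cross batch boundaries. The function name shuffled_batches and the seed parameter are hypothetical, and unlike the expected output above this sketch keeps the batches in their original order. In tf.data the same idea would presumably be expressed by mapping a per-batch shuffle such as tf.random.shuffle over the batched dataset; that variant is not run here.

```python
import random


def shuffled_batches(items, batch_size, seed=None):
    """Yield consecutive batches, each shuffled internally.

    Elements never move between batches, and batch order is preserved.
    """
    rng = random.Random(seed)
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        rng.shuffle(batch)  # in-place shuffle of this batch only
        yield batch


# Same values as tf.range(1, 20) in the question: 1..19.
batches = list(shuffled_batches(list(range(1, 20)), batch_size=5, seed=0))
for batch in batches:
    print(batch)
```

Each printed batch is a permutation of one original batch, so slices from different patient records stay separate.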

Internal node predictions of xgboost model

Is it possible to calculate the internal node predictions of an xgboost model? The R package, gbm, provides a prediction for internal nodes of each tree.
The xgboost output, however, only shows predictions for the final leaves of the model.
xgboost output:
Notice that the Quality column has the final prediction for the leaf node in row 6. I would like that value for each of the internal nodes as well.
Tree Node ID Feature Split Yes No Missing Quality Cover
1: 0 0 0-0 Sex=female 0.50000 0-1 0-2 0-1 246.6042790 222.75
2: 0 1 0-1 Age 13.00000 0-3 0-4 0-4 22.3424225 144.25
3: 0 2 0-2 Pclass=3 0.50000 0-5 0-6 0-5 60.1275253 78.50
4: 0 3 0-3 SibSp 2.50000 0-7 0-8 0-7 23.6302433 9.25
5: 0 4 0-4 Fare 26.26875 0-9 0-10 0-9 21.4425507 135.00
6: 0 5 0-5 Leaf NA <NA> <NA> <NA> 0.1747126 42.50
R gbm output:
In the R gbm package output, the Prediction column contains values for both the leaf nodes (SplitVar == -1) and the internal nodes. I would like access to these values from the xgboost model.
SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight Prediction
0 1 0.000000000 1 8 15 32.564591 445 0.001132514
1 2 9.500000000 2 3 7 3.844470 282 -0.085827382
2 -1 0.119585850 -1 -1 -1 0.000000 15 0.119585850
3 0 1.000000000 4 5 6 3.047926 207 -0.092846157
4 -1 -0.118731665 -1 -1 -1 0.000000 165 -0.118731665
5 -1 0.008846912 -1 -1 -1 0.000000 42 0.008846912
6 -1 -0.092846157 -1 -1 -1 0.000000 207 -0.092846157
Question:
How do I access or calculate predictions for the internal nodes of an xgboost model? I would like to use them for a greedy, poor man's version of SHAP scores.
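The solution to this problem is to dump the xgboost model as JSON with the node statistics enabled (all_stats=True in my setup; the Python Booster.get_dump and dump_model methods call this option with_stats). That adds the cover statistic to the output, which can be used to propagate the leaf values up through the internal nodes: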
def _calculate_contribution(node: AnyNode) -> float32:
    if isinstance(node, Leaf):
        return node.contrib
    else:
        return (
            node.left.cover * Node._calculate_contribution(node.left)
            + node.right.cover * Node._calculate_contribution(node.right)
        ) / node.cover
The internal contribution is the weighted average of the child contributions. Using this method, the generated results exactly match those returned when calling the predict method with pred_contribs=True and approx_contribs=True.
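The snippet above depends on classes that are not shown, so here is a self-contained sketch of the same cover-weighted average. The Leaf and Split dataclasses, the node_value function, and the example tree with its numbers are all hypothetical stand-ins for illustration; they are not the xgboost API or its dump format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    cover: float    # cover statistic of the leaf
    contrib: float  # leaf value


@dataclass
class Split:
    cover: float    # cover statistic of the internal node
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def node_value(node: Node) -> float:
    """Return the leaf value, or the cover-weighted average of the children."""
    if isinstance(node, Leaf):
        return node.contrib
    return (
        node.left.cover * node_value(node.left)
        + node.right.cover * node_value(node.right)
    ) / node.cover


# Made-up tree: the cover of each split equals the sum of its children's cover.
tree = Split(
    cover=10.0,
    left=Leaf(cover=4.0, contrib=0.5),
    right=Split(
        cover=6.0,
        left=Leaf(cover=2.0, contrib=-1.0),
        right=Leaf(cover=4.0, contrib=0.25),
    ),
)

print("right subtree:", node_value(tree.right))
print("root:", node_value(tree))
```

In this toy tree the right subtree evaluates to (2 * -1.0 + 4 * 0.25) / 6 and the root to the cover-weighted mix of that value and the left leaf.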

Converting .tflite to .pb

Problem: How can I convert a .tflite (serialised flat buffer) to .pb (frozen model)? The documentation only covers conversion in one direction.
Use case: I have a model that was trained and converted to .tflite, but unfortunately I do not have details of the model and I would like to inspect the graph. How can I do that?
I found the answer here
We can use the Interpreter to analyse the model; the code looks like the following:
import numpy as np
import tensorflow as tf
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
Netron is the best analysis and visualisation tool I have found. It understands many formats, including .tflite.
I don't think there is a way to restore a tflite model back to pb, as some information is lost during conversion. An indirect way to get a glimpse of what is inside a tflite model is to read back each of its tensors.
import tensorflow as tf

# tf.contrib.lite.Interpreter in TF 1.x releases; tf.lite.Interpreter in later ones
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
# get_tensor_details() lists every tensor, so there is no need to guess the count
for detail in interpreter.get_tensor_details():
    print(detail['index'], detail['name'], detail['shape'])
You would then see something like the output below. As only a limited set of operations is currently supported, it is not too difficult to reverse engineer the network architecture. I have also put some tutorials on my Github.
0 MobilenetV1/Logits/AvgPool_1a/AvgPool [ 1 1 1 1024]
1 MobilenetV1/Logits/Conv2d_1c_1x1/BiasAdd [ 1 1 1 1001]
2 MobilenetV1/Logits/Conv2d_1c_1x1/Conv2D_bias [1001]
3 MobilenetV1/Logits/Conv2d_1c_1x1/weights_quant/FakeQuantWithMinMaxVars [1001 1 1 1024]
4 MobilenetV1/Logits/SpatialSqueeze [ 1 1001]
5 MobilenetV1/Logits/SpatialSqueeze_shape [2]
6 MobilenetV1/MobilenetV1/Conv2d_0/Conv2D_Fold_bias [32]
7 MobilenetV1/MobilenetV1/Conv2d_0/Relu6 [ 1 112 112 32]
8 MobilenetV1/MobilenetV1/Conv2d_0/weights_quant/FakeQuantWithMinMaxVars [32 3 3 3]
9 MobilenetV1/MobilenetV1/Conv2d_10_depthwise/Relu6 [ 1 14 14 512]
10 MobilenetV1/MobilenetV1/Conv2d_10_depthwise/depthwise_Fold_bias [512]
11 MobilenetV1/MobilenetV1/Conv2d_10_depthwise/weights_quant/FakeQuantWithMinMaxVars [ 1 3 3 512]
12 MobilenetV1/MobilenetV1/Conv2d_10_pointwise/Conv2D_Fold_bias [512]
13 MobilenetV1/MobilenetV1/Conv2d_10_pointwise/Relu6 [ 1 14 14 512]
14 MobilenetV1/MobilenetV1/Conv2d_10_pointwise/weights_quant/FakeQuantWithMinMaxVars [512 1 1 512]
15 MobilenetV1/MobilenetV1/Conv2d_11_depthwise/Relu6 [ 1 14 14 512]
16 MobilenetV1/MobilenetV1/Conv2d_11_depthwise/depthwise_Fold_bias [512]
17 MobilenetV1/MobilenetV1/Conv2d_11_depthwise/weights_quant/FakeQuantWithMinMaxVars [ 1 3 3 512]
18 MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Conv2D_Fold_bias [512]
19 MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Relu6 [ 1 14 14 512]
20 MobilenetV1/MobilenetV1/Conv2d_11_pointwise/weights_quant/FakeQuantWithMinMaxVars [512 1 1 512]
I have done this with TOCO, using tf 1.12
tensorflow_1.12/tensorflow/bazel-bin/tensorflow/contrib/lite/toco/toco \
  --output_file=coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.pb \
  --output_format=TENSORFLOW_GRAPHDEF \
  --input_format=TFLITE \
  --input_file=coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.tflite \
  --inference_type=FLOAT \
  --input_type=FLOAT \
  --input_array="" \
  --output_array="" \
  --input_shape=1,450,450,3 \
  --dump_graphviz=./
(you can remove the dump_graphviz option)

Keras (tf backend) memory allocation problems

I am using Keras with Tensorflow backend.
I am facing a batch size limitation due to high memory usage.
My data consists of four 1D signals, processed with a sample (window) size of 801 per channel, so the global sample size is 3204.
Input data:
4 channels of N 1D signals of length 7003
The input is generated by applying a sliding window to the 1D signals
This gives an input data shape of (N*6203, 801, 4)
N is the number of signals used to build one batch
My Model:
Input 801 x 4
Conv2D 5 x 1, 20 channels
MaxPooling 2 x 1
Conv2D 5 x 1, 20 channels
MaxPooling 2 x 1
Conv2D 5 x 1, 20 channels
MaxPooling 2 x 1
Conv2D 5 x 1, 20 channels
Flatten
Dense 2000
Dense 5
With my GPU (Quadro K6000, 12189 MiB) I can fit only N=2 without a warning.
With N=3 I get a "ran out of memory" warning.
With N=4 I get a "ran out of memory" error.
It sounds like batch_size is limited by the space used by all tensors.
Input 801 x 4 x 1
Conv 797 x 4 x 20
MaxPooling 398 x 4 x 20
Conv 394 x 4 x 20
MaxPooling 197 x 4 x 20
Conv 193 x 4 x 20
MaxPooling 96 x 4 x 20
Conv 92 x 4 x 20
Dense 2000
Dense 5
With a 1D signal of 7001 with 4 channels -> 6201 samples
Total = N*4224 MiB.
N=2 -> 8448 MiB fit in GPU
N=3 -> 12672 MiB work but warning: failed to allocate 1.10 GiB then 3.00 GiB
N=4 -> 16896 MiB fail, only one message: failed to allocate 5.89 GiB
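The 4224 MiB per signal figure can be reproduced with a short calculation. This is a sketch under stated assumptions: every activation listed above is stored once as a 4-byte float32, and weights, gradients and framework workspace are ignored. The variable names are hypothetical.

```python
# Activation shapes listed above, as (length, width, channels).
activation_shapes = [
    (801, 4, 1),   # Input
    (797, 4, 20),  # Conv
    (398, 4, 20),  # MaxPooling
    (394, 4, 20),  # Conv
    (197, 4, 20),  # MaxPooling
    (193, 4, 20),  # Conv
    (96, 4, 20),   # MaxPooling
    (92, 4, 20),   # Conv
]
dense_units = [2000, 5]

BYTES_PER_FLOAT32 = 4  # assumption: all activations are float32
windows_per_signal = 7001 - 801 + 1  # sliding window of 801 over 7001 points

# Number of activation values kept for one 801 x 4 window.
values_per_window = (
    sum(length * width * channels for length, width, channels in activation_shapes)
    + sum(dense_units)
)

bytes_per_signal = values_per_window * BYTES_PER_FLOAT32 * windows_per_signal
mib_per_signal = bytes_per_signal / 2**20

for n in (2, 3, 4):
    print(f"N={n}: {n * mib_per_signal:.0f} MiB")
```

Under these assumptions the estimate counts forward activations only, so the real footprint during training is likely higher than N*4224 MiB.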
Does it work like that? Is there any way to reduce the memory usage?
To give a time estimate: 34 batches run in 40 s, and I have N total = 10^6.
Thank you for your help :)
Example with python2.7: https://drive.google.com/open?id=1N7K_bxblC97FejozL4g7J_rl6-b9ScCn