How to get the same model and predictions in scikit-learn 0.17.1 and 0.22.1?

My current code is written in Python 2.7 using scikit-learn 0.17.1. I am trying to move to the latest versions, Python 3.7 and scikit-learn 0.22.1. For the exact same feature set, my model object after calling .fit is different: the values of coef_ do not match at all. As a result my predictions do not match either. Is this expected?
This is a big code-rewrite exercise, and at the end I need to verify that my changes work correctly.
My initial test plan was to compare predictions for the exact same inputs between the old code and the new code. But if the predictions are going to differ, I will have to check instead that prediction accuracy has improved in the new code, and I would still not be sure whether I have missed some transformation.
If you have faced a similar problem, please advise what you did in such a scenario.
Here is the code I am using, which is the same in 0.17.1 and 0.22.1:
model = ElasticNetCV(l1_ratio=[0.5, 0.75, 0.95, 0.99, 1.0], normalize=True, n_jobs=1)
model.fit(features.values, targets.values)
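One documented change between these releases is that the default cv for the *CV estimators moved from 3-fold to 5-fold in scikit-learn 0.22, and the coordinate-descent solver itself has also evolved, so bit-for-bit identical coefficients are not guaranteed. A minimal sketch of what I would try first (the parameter values below are illustrative, and old_coef / old_predictions are hypothetical arrays saved from the 0.17.1 run, not names from the original code): pin every default that may have moved, then compare coefficients and predictions within a tolerance rather than exactly.

from numpy.testing import assert_allclose
from sklearn.linear_model import ElasticNetCV

model = ElasticNetCV(
    l1_ratio=[0.5, 0.75, 0.95, 0.99, 1.0],
    normalize=True,
    cv=3,            # old default; 0.22 uses 5-fold when cv is left unset
    max_iter=1000,   # make the iteration budget explicit in both versions
    tol=1e-4,        # make the convergence tolerance explicit in both versions
    n_jobs=1,
)
model.fit(features.values, targets.values)

# Compare against coefficients/predictions saved from the 0.17.1 run,
# within a tolerance instead of expecting exact equality.
assert_allclose(model.coef_, old_coef, rtol=1e-3)
assert_allclose(model.predict(features.values), old_predictions, rtol=1e-3)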

Related

'use_multiprocessing=True' in Mask RCNN with Keras 2.x & Tensorflow 2.x

I am using Keras 2.9.0 and TensorFlow 2.9.2 and have already managed to make the necessary changes to compile the Mask R-CNN model (there are many compatibility issues, as it is a 2017 model).
The code is running on Colab with GPU.
I now get the following warning:
WARNING:tensorflow:Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
This comes from the following lines in the model.py file:
self.keras_model.fit(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=workers,
    use_multiprocessing=True,
)
The problem is that the training is stuck at Epoch 1/XXX before even starting to really train.
I am pretty sure this is due to the multiprocessing warning (based on other threads here).
This thread refers here, but it takes a very different approach to generating the data than Mask R-CNN does, and therefore I'd like to avoid making such a big change (it could potentially create many other issues).
Moreover, if I set use_multiprocessing=False (the default), I get the following error:
RuntimeError: Your generator is NOT thread-safe. Keras requires a thread-safe generator when use_multiprocessing=False, workers > 1
As far as I understand, the solutions suggested there do not apply directly to the Mask R-CNN model.
Question: is there a way to resolve this issue with Mask R-CNN, preferably while keeping the option to run with multiprocessing (for speed)?
EDIT:
Even if I reduce the original number of workers (12) to 1 (as hinted at in the warning message), the model is still stuck at the same stage.
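One workaround that is often suggested for this pair of messages (a generic sketch, not code from the Mask R-CNN repository, and whether it satisfies the exact thread-safety check in your Keras version is an assumption) is to serialise access to the Python generator with a lock and then train with use_multiprocessing=False:

import threading

class ThreadSafeIterator:
    """Wraps a plain Python generator so several Keras worker threads can
    pull batches without racing each other. Generic helper, not part of
    the Mask R-CNN code base."""

    def __init__(self, generator):
        self.generator = generator
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.generator)

# Hypothetical usage inside model.py's train(): wrap both generators and
# switch multiprocessing off, keeping several (thread-based) workers.
# train_generator = ThreadSafeIterator(train_generator)
# val_generator = ThreadSafeIterator(val_generator)
# self.keras_model.fit(..., workers=workers, use_multiprocessing=False)

The alternative that both the warning and the error point toward is rewriting the data generation as a keras.utils.Sequence, which is the bigger change mentioned above.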

Functional CNN model written in Keras 2.2.4 not learning in TensorFlow or Keras 2.4

I am dealing with an object detection problem and using a model which is known to work (its results have been published in a paper and I have the original code). Originally, the code was written with Keras 2.2.4 without importing TensorFlow, and it was trained and tested on the same dataset that I am using at the moment. However, when I try to run the same model with TensorFlow 2.x it just won't learn a thing.
I have tried importing everything from TensorFlow 2.4, but I have the same problem if I import everything (layers, models, optimizers...) from Keras 2.4. And I have tried to do so on two different devices, both using a GPU. Namely, what is happening is that the loss function decreases ridiculously fast, but the accuracy won't increase a bit (or, if it does, it gets stuck around 10% or so). Also, every now and then this happens from one epoch to the next:
Loss undergoes HUGE jumps between consecutive epochs, and all this without any changes in accuracy
I have tried to train the network on another dataset (had to change the last layers in order to match the required dimensions) and the model seemed to be learning in a normal way, i.e. the accuracy actually increases and the loss doesn't reach 0.0x in one epoch.
I can't post the script, but the model is an Encoder-Decoder network: consecutive Convolutions with increasing number of filters reduce the dimensions of the image, and a specular path of Transposed Convolutions restores the original dimensions. So basically the network only contains:
Conv2D
Conv2DTranspose
BatchNormalization
Activation("relu")
Activation("sigmoid")
concatenate
concatenate is used to put together outputs from parallel paths or distant layers; BatchNormalization and Activation('relu') are used after every Conv2D or Conv2DTranspose; Activation('sigmoid') is used only as the final activation, i.e. in the output layer.
I think the problem is pretty generic and I am honestly surprised that I couldn't find a single question about it. What could be happening here? The problem must have something to do with the TF/Keras versions, but I can't find any documentation about it, and nothing I change makes a difference. It's frustrating because, if I didn't know that the model works, I would try to rewrite it from scratch; so I am afraid that this problem may occur with a new network and I won't be able to tell whether it's the libraries or the model itself.
Thank you in advance! :)
EDIT
Code snippets:
Convolutional block:
encoder1 = Conv2D(filters=first_layer_channels, kernel_size=2, strides=2)(input)
encoder1 = BatchNormalization()(encoder1)
encoder1 = Activation('relu')(encoder1)
Decoder block:
decoder1 = Conv2DTranspose(filters=first_layer_channels, kernel_size=2, strides=2)(encoder4)
decoder1 = BatchNormalization()(decoder1)
decoder1 = Activation('relu')(decoder1)
Final layers:
final = Conv2D(filters=total, kernel_size=1)(decoder4)
final = BatchNormalization()(final)
Last_Conv = Activation('sigmoid')(final)
The task is human pose estimation: the network (which, I recall, works on this specific task with Keras 2.2.4) has to predict twenty binary maps containing the positions of specific keypoints.
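For what it's worth, here is a minimal, self-contained sketch of the kind of encoder-decoder described above, written with the tf.keras functional API. The input shape, the number of blocks and the channel counts are placeholders, not the published architecture, but a skeleton like this can be trained under both library versions to see where the behaviour diverges.

import tensorflow as tf
from tensorflow.keras import layers, Model

first_layer_channels = 64   # placeholder value
total = 20                  # number of keypoint maps

inputs = layers.Input(shape=(256, 256, 3))  # placeholder input size

# Encoder: strided convolutions halve the spatial dimensions at each step.
x = layers.Conv2D(first_layer_channels, kernel_size=2, strides=2)(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
skip = x
x = layers.Conv2D(first_layer_channels * 2, kernel_size=2, strides=2)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)

# Decoder: transposed convolutions restore the original dimensions.
x = layers.Conv2DTranspose(first_layer_channels, kernel_size=2, strides=2)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.concatenate([x, skip])  # join a parallel/skip path

x = layers.Conv2DTranspose(first_layer_channels, kernel_size=2, strides=2)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)

# Final 1x1 convolution followed by a sigmoid produces the binary maps.
x = layers.Conv2D(total, kernel_size=1)(x)
x = layers.BatchNormalization()(x)
outputs = layers.Activation('sigmoid')(x)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()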

Outputting multiple loss components to tensorboard from tensorflow estimators

I am pretty new to tensorflow and I am struggling to get tensorboard to display some of my custom metrics. The model I am working with is a tf.estimator.Estimator, with an associated EstimatorSpec. The first new metric I am trying to log is from my loss function, which is composed of two components: a loss for an age prediction (tf.float32) and a loss for a class prediction (one-hot/multiclass), which I add together to determine a total loss (my model is predicting both a class and an age). The total loss is output just fine during training and shows up on tensorboard, but I would like to track the individual age and the class prediction loss components as well.
I think a solution that is supposed to work is to add an eval_metric_ops argument to the EstimatorSpec, as described here (Custom eval_metric_ops in Estimator in Tensorflow). I have not been able to make this approach work, however. I defined a custom metric function that looks like this:
def age_loss_function(labels, ages_pred, ages_true):
    per_sample_age_loss = get_age_loss_per_sample(ages_pred, ages_true) ### works fine
    #### The error happens on this line:
    mean_abs_age_diff, age_loss_update_fn = tf.metrics.Mean(per_sample_age_loss)
    ######
    return mean_abs_age_diff, age_loss_update_fn

eval_metric_ops = {"age_loss": age_loss_function} #### Want to use this in EstimatorSpec
The instructions seem to say that I need both the error metric and the update function which should both be returned from the tf.metrics command as in examples like the one I linked. But this command fails for me with the error message:
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
I am probably just misusing the APIs. If someone can guide me on the proper usage I would really appreciate it. Thanks!
It looks like the problem was due to a version change: I had updated to TensorFlow 2.0 while the instructions I was following were written for 1.x. Using tf.compat.v1.metrics.mean() instead gets past this problem.
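To make that fix concrete, here is a hedged sketch of the pattern (the helper name make_estimator_spec and the per-sample loss tensors are placeholders, not from the original post): each entry of eval_metric_ops must be the (value_op, update_op) pair itself, built inside model_fn, and tf.compat.v1.metrics.mean() returns exactly that pair under TensorFlow 2.x.

import tensorflow as tf

def make_estimator_spec(mode, total_loss, train_op, per_sample_age_loss,
                        per_sample_class_loss):
    # Each value in eval_metric_ops is a (value_op, update_op) tuple.
    eval_metric_ops = {
        "age_loss": tf.compat.v1.metrics.mean(per_sample_age_loss),
        "class_loss": tf.compat.v1.metrics.mean(per_sample_class_loss),
    }
    # Optionally surface the components during training as well, so they
    # show up on TensorBoard alongside the total loss.
    tf.compat.v1.summary.scalar("age_loss", tf.reduce_mean(per_sample_age_loss))
    tf.compat.v1.summary.scalar("class_loss", tf.reduce_mean(per_sample_class_loss))
    return tf.estimator.EstimatorSpec(
        mode=mode,
        loss=total_loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops,
    )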

Training multiple Keras models in one script

I want to train different Keras models (or in some cases just multiple runs of the same model to compare the results) in a queue (using TensorFlow as the backend if that matters). In my current setup I create and fit all of these models in one big python script, e.g. (in a simplified way):
for i in range(10):
    model = create_model(i)
    model.compile(...)
    model.fit(...)
    some_function_to_save_model(model)
The create_model(i) function creates the specific model for the i'th run. This includes changing the number of inputs / labels for example. The compile function can be different (e.g. different optimizer) for each run as well.
While this code works for me and I have not found any problems, I am unclear if this is the correct way to do it because all of the models reside in the same TensorFlow Graph (if I understand the way Keras / TensorFlow work together correctly). My questions are:
Is this the correct way to run multiple independent models? (I do not want the i'th run to have any influence on the (i+1)'th run.)
Is running the models from different Python scripts (in this example model1.py, model2.py, ..., model9.py) technically better in any way (I am not referring to readability/reproducibility here), because each model would then have its own separate TensorFlow Graph/Session?
Does clearing the Session / deleting the Graph via keras.backend.clear_session() have any influence in this case if it is run after the save function (some_function_to_save_model() inside the for loop)? Is this in some way beneficial compared to the current setup?
Once again: I am not concerned with the problems that might arise from messy code when all models are crammed together in one script instead of one script per model; I only care about creating and training the models independently.
Unfortunately I did not find a concise answer to this (only suggestions using both methods). Maybe someone here can enlighten me?
Edit: Maybe I should be more precise. Basically I would like to have a technical explanation regarding the differences (advantages & disadvantages) of the following three cases:
create_and_train.py:
for i in range(10):
    model = create_model(i)
    model.compile(...)
    model.fit(...)
    some_function_to_save_model(model)
create_and_train.py:
for i in range(10):
    model = create_model(i)
    model.compile(...)
    model.fit(...)
    some_function_to_save_model(model)
    # clear session:
    keras.backend.clear_session()
create_and_train_i.py with i in [0, 1, ..., 9]:
i = 5 # (e.g.)
model = create_model(i)
model.compile(...)
model.fit(...)
some_function_to_save_model(model)
and, e.g., a bash script that loops through these scripts.
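For completeness, the third case does not strictly need a bash script; a small Python driver gives the same isolation, because every run gets a fresh interpreter and therefore its own TensorFlow graph/session and GPU memory. This assumes create_and_train.py reads the run index from sys.argv, which is not shown in the original snippets.

import subprocess
import sys

for i in range(10):
    # Each run is a separate process, so no state leaks between models.
    subprocess.run([sys.executable, "create_and_train.py", str(i)], check=True)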

Why does the LSTM cell unit need the reuse flag?

I was looking at the TensorFlow LSTM tutorial, and it had this piece of code:
def lstm_cell():
    # With the latest TensorFlow source code (as of Mar 27, 2017),
    # the BasicLSTMCell will need a reuse parameter which is unfortunately not
    # defined in TensorFlow 1.0. To maintain backwards compatibility, we add
    # an argument check here:
    if 'reuse' in inspect.getargspec(tf.contrib.rnn.BasicLSTMCell.__init__).args:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True,
            reuse=tf.get_variable_scope().reuse)
    else:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True)
which seems confusing to me. There appears to be an issue with the reuse flag. Why is it important? Why do we need it? I know RNNs always share parameters as we produce new states, so why would it be decided, seemingly at random, to remove such an important flag — which makes me think I might not understand what I am missing. I understand Chris's blog post on LSTMs very well, so I find this piece of code in the tutorial really mysterious.
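For intuition, here is a minimal TF 1.x-style illustration (matching the API era the tutorial targets) of what reuse controls: inside a variable scope, the first tf.get_variable call creates the weights, and later calls must explicitly opt in to reusing them rather than silently creating duplicates. The LSTM cell takes the flag so that building it again under the same scope (for example when the graph is rebuilt for validation) shares one set of weights instead of raising an error.

import tensorflow as tf  # TF 1.x-style graph mode is assumed here

with tf.variable_scope("cell"):
    w1 = tf.get_variable("w", shape=[4, 4])   # variables are created here
with tf.variable_scope("cell", reuse=True):
    w2 = tf.get_variable("w", shape=[4, 4])   # the same variables are returned
assert w1 is w2  # one set of weights, no accidental duplicates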