Python 3.8.8 Jupyter notebook kernel dies when I call model.fit() when I try to use my GPU - tensorflow

My TensorFlow installation recognizes my GPU.
However, when I call model.fit() on my data it prints Epoch 1/2 and then the kernel dies immediately.
If I run the same code in a separate virtual environment with no GPU, it works fine.
I have simplified the model architecture and cut the training set down to only ten points as a quick test, and it still fails.
Simple example
from numpy import loadtxt
import keras
from keras.models import Sequential
from keras.layers import Dense

# X_train and y_train are prepared elsewhere (reduced to ten samples for this test)
model = keras.Sequential()
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
info = model.fit(X_train, y_train, epochs=2, batch_size=2, shuffle=True, verbose=1)
Versions:
Python 3.8.8
Num GPUs Available: 1
TensorFlow 2.5.0-dev20210227
Keras 2.4.3
CUDA v11.2
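
For reference, the "Num GPUs Available" count above is typically produced by a check like this (a minimal sketch; the exact snippet used here is not shown):

import tensorflow as tf

# Lists the GPUs TensorFlow can see; should report 1 device in this setup.
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))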

I am going to answer my own question rather than deleting this because maybe someone else will be making the same simple mistake I was.
The main mistake I made was downloading the wrong CUDA version. You can check which versions are compatible at this link:
https://www.tensorflow.org/install/source#gpu
TLDR: Just follow this video:
https://www.youtube.com/watch?v=hHWkvEcDBO0
This also highlighted the importance of a virtual environment where you control the package versions to prevent incompatibilities.
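
As an additional sanity check before downloading anything, recent TensorFlow releases can report which CUDA and cuDNN versions the installed binary was built against (a sketch, assuming a TF version new enough to provide tf.sysconfig.get_build_info):

import tensorflow as tf

# Build info includes the CUDA/cuDNN versions the wheel expects (GPU builds only).
build = tf.sysconfig.get_build_info()
print("CUDA:", build.get("cuda_version"), "cuDNN:", build.get("cudnn_version"))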

I had the same problem. I moved the code into a plain Python file, which surfaced the root cause. In my case the fix was copying the cuDNN DLL files into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin. Check the following link as well:
Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found

KeyError: 'The optimizer cannot recognize variable dense_1/kernel:0. for pretrained keras model VGG19

I'm using the following code to load an imagenet pre-trained VGG19 model and fit it to my custom dataset.
import tensorflow as tf
from tensorflow import keras
from keras.applications.vgg19 import VGG19

# n_classes, x_train, y_train and the scheduler function are not shown here
optim = tf.keras.optimizers.RMSprop(momentum=0.9)

vgg19 = VGG19(include_top=False, weights='imagenet',
              input_tensor=tf.keras.layers.Input(shape=(224, 224, 3)))
vgg19.trainable = False
# x = keras.layers.GlobalAveragePooling2D()(model_vgg19_pt.output)
x = keras.layers.Flatten()(vgg19.output)
output = keras.layers.Dense(n_classes, activation='softmax')(x)
model_vgg19_pt = keras.models.Model(inputs=[vgg19.input], outputs=[output])

model_vgg19_pt.compile(optimizer=optim,
                       loss='categorical_crossentropy',
                       metrics=['categorical_accuracy'])

callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
model_vgg19_pt.fit(x_train, y_train, batch_size=20,
                   epochs=50, callbacks=[callback])
On the model.fit() line, I get the following error:
KeyError: 'The optimizer cannot recognize variable dense_1/kernel:0. This usually means you are trying to call the optimizer to update different parts of the model separately. Please call optimizer.build(variables) with the full list of trainable variables before the training loop or use legacy optimizer `tf.keras.optimizers.legacy.{self.class.name}.'
What does it mean and how can I fix it?
I get the same error for keras.applications.inception_v3 too, when using the same implementation approach.
Additionally, this was working in a Jupyter notebook with CPU-only TensorFlow, but when running on a remote machine with tensorflow-gpu installed, I get these errors.
This works fine with the SGD optimizer, but not with RMSprop. Why?
Additional
Using this:
model_vgg19_pt.compile(optimizer=tf.keras.optimizers.RMSprop(momentum=0.9),
                       loss='categorical_crossentropy',
                       metrics=['categorical_accuracy'])
instead of the compile call above works. But can somebody explain why?
Which version of TensorFlow GPU have you installed? TensorFlow 2.10 was the last release that supported GPU on native Windows. Please check the link to install TensorFlow and follow all of the hardware/software requirements for GPU support.
The scheduler function you pass to LearningRateScheduler for the callback is not defined in the code you posted.
I was able to train the model after removing the callback from model.fit(). (Attaching the gist here for your reference)
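For completeness, the error message itself suggests two workarounds: build the optimizer against the model's variables before training, or fall back to the legacy optimizer. A minimal sketch of both, reusing model_vgg19_pt and the RMSprop settings from the question (an illustration, not a verified drop-in fix):

import tensorflow as tf

# Option 1: build a fresh optimizer explicitly for this model's variables.
optim = tf.keras.optimizers.RMSprop(momentum=0.9)
optim.build(model_vgg19_pt.trainable_variables)

# Option 2: use the legacy optimizer class, as the error message suggests.
legacy_optim = tf.keras.optimizers.legacy.RMSprop(momentum=0.9)
model_vgg19_pt.compile(optimizer=legacy_optim,
                       loss='categorical_crossentropy',
                       metrics=['categorical_accuracy'])

This KeyError typically appears when an optimizer instance that has already been built for one set of variables is reused with a different model, which is why constructing the RMSprop instance inside compile(), as in the "Additional" snippet, also works.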

Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib. ----- Library not loaded: @rpath/libiomp5.dylib

"/Users/dozieanazia/.conda/envs/Exercise Files/bin/python" "/Users/dozieanazia/Dropbox/My Mac (Dozie’s MacBook Pro)/Downloads/Ex_Files_Building_Deep_Learning_Apps/Exercise Files/03/create_model.py"
INTEL MKL ERROR: dlopen(/Users/dozieanazia/.conda/envs/Exercise Files/lib/libmkl_intel_thread.dylib, 9): Library not loaded: @rpath/libiomp5.dylib
Referenced from: /Users/dozieanazia/.conda/envs/Exercise Files/lib/libmkl_intel_thread.dylib
Reason: image not found.
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib.
Process finished with exit code 2
This is the full error I received. I've scoured Stack Overflow for the right answer, but nothing I've tried has worked. These particular lessons are from a LinkedIn Learning course on Keras.
I don't know if it has something to do with my hardware, since I'm using a MacBook from 2012.
The interesting thing is that the data pre-processing file runs fine in the Python 3.10 environment (I did have this error there until I tinkered with it and finally got it to work). When I move to the create_model file, the error comes back and I'm stuck again. With Python 3.10 the whole file has problems, so I switched Python to 3.7; now there are no errors until I actually run it, which is when I get the error listed above.
Please help.
This is the code file below:
import pandas as pd
from keras.models import Sequential
from keras.layers import *
training_data_df = pd.read_csv("sales_data_training_scaled.csv")
X = training_data_df.drop('total_earnings', axis=1).values
Y = training_data_df[['total_earnings']].values
# Define the model
model = Sequential()
model.add(Dense(50, input_dim=9, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss="mean_squared_error", optimizer="adam")

How to retrain (fine-tune) the efficientnet-lite models provided by tfhub on tf2.1

The version I use is tensorflow-gpu version 2.1.0, installed from pip.
import tensorflow as tf
import tensorflow_hub as hub
tf.keras.backend.set_learning_phase(True)
module_url = "https://tfhub.dev/tensorflow/efficientnet/lite0/classification/2"
module2 = tf.keras.Sequential([
    hub.KerasLayer(module_url, trainable=False, input_shape=(224, 224, 3))
])
output1 = module2(tf.ones(shape=(1,224,224,3)))
print(module2.summary())
When I set trainable=True, the call raises an error.
So, can I not retrain it on TF 2.1?
The EfficientNet-Lite models on TFHub are based on TensorFlow 1, and are therefore subject to several restrictions when used from TF2, including no fine-tuning, as you've discovered. The EfficientNet models were updated to TF2, but we're still waiting for their lite counterparts.
https://www.tensorflow.org/hub/model_compatibility
https://github.com/tensorflow/hub/issues/751
UPDATE: Beginning October 5, 2021, the EfficientNet-Lite models on TFHub are available for TensorFlow 2.
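With a TF2-compatible SavedModel, fine-tuning is just a matter of flipping trainable to True in the snippet from the question. A minimal sketch, assuming module_url now resolves to (or is replaced with) the TF2 version of the model:

import tensorflow as tf
import tensorflow_hub as hub

# Assumed: this URL points at the TF2-compatible EfficientNet-Lite SavedModel.
module_url = "https://tfhub.dev/tensorflow/efficientnet/lite0/classification/2"

module2 = tf.keras.Sequential([
    hub.KerasLayer(module_url, trainable=True, input_shape=(224, 224, 3))
])
output1 = module2(tf.ones(shape=(1, 224, 224, 3)))
print(module2.summary())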

How to run TensorFlow for SSE4.1 SSE4.2 AVX AVX2 FMA on Python with Spyder on MacOs

I am trying to run the code:
from keras.datasets import imdb as im
from keras.preprocessing import sequence as seq
from keras.models import Sequential
from keras.layers import Embedding
from keras.layers import LSTM
from keras.layers import Dense
train_set, test_set = im.load_data(num_words = 10000)
X_train, y_train = train_set
X_test, y_test = test_set
X_train_padded = seq.pad_sequences(X_train, maxlen = 100)
X_test_padded = seq.pad_sequences(X_test, maxlen = 100)
model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=128))
model.add(LSTM(units=128))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
scores = model.fit(X_train_padded,y_train)
When I run the code, it gives me a message:
I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
I don't understand what the issue is or what I am supposed to do next. I installed the "tensorflow" package (1.14.0), but that doesn't solve the issue.
I have looked at this reference but I don't know what I am looking for:
https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions
Can someone please help me. Thanks.
my config: osx-64, MacOS Mojave v.10.14.6, Python 3.7 with Spyder with Anaconda, conda version : 4.7.12
You can ignore the message and everything will work fine.
As far as I can gather from https://github.com/tensorflow/tensorflow/pull/24782/commits/7faefa4bb665e115cc744d7895a407338624993f, when TensorFlow is compiled with MKL-DNN support (which it is, according to your message), MKL-DNN will take care of using all available CPU performance features. So it doesn't matter that TensorFlow wasn't compiled to use them.
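If the informational log lines themselves are the nuisance, they can also be silenced with the standard TF_CPP_MIN_LOG_LEVEL environment variable, set before TensorFlow is imported:

import os

# 0 = all logs, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = hide all but FATAL.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

import tensorflow as tf  # the cpu_feature_guard INFO message no longer appears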
This might not answer the exact question you asked, but I had a very similar error message when running a similar task.
In addition to the error message above, I also had the following error message:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Error was solved with:
conda install nomkl
This is as per this stackoverflow post

Keras / tensorflow - limit number of cores (intra_op_parallelism_threads not working)

I've been trying to run keras on a CPU cluster, and for this I need to limit the number of cores used (it's a shared system). So to limit the number of cores, I landed on this answer. However, this simply doesn't work. I tried running with this basic code:
from keras.applications.vgg16 import VGG16
from keras import backend as K
import numpy as np
conf = K.tf.ConfigProto(device_count={'CPU': 1},
                        intra_op_parallelism_threads=2,
                        inter_op_parallelism_threads=2)
K.set_session(K.tf.Session(config=conf))
model = VGG16(weights='imagenet', include_top=False)
x = np.random.randn(1000, 224, 224, 3)
features = model.predict(x)
When I run this and check htop, it uses all (128) logical cores. Is this a bug in keras? Or am I doing something wrong?
Keras says that my CPU supports SSE4.1 and SSE4.2, which are not used because I didn't compile TensorFlow from source. Will compiling from source also fix the original issue?
EDIT: I've found a workaround when launching the keras script from a unix machine:
taskset -c 0-23 python keras_script.py
This will run the script on the first 24 cores of the machine. It works, but it would still be nice if this was available from within keras/tensorflow.
I found this snippet of code that works for me, hope it helps:
from keras import backend as K
import tensorflow as tf
jobs = 2  # number of cores to use
config = tf.ConfigProto(intra_op_parallelism_threads=jobs,
                        inter_op_parallelism_threads=jobs,
                        allow_soft_placement=True,
                        device_count={'CPU': jobs})
session = tf.Session(config=config)
K.set_session(session)
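Note that ConfigProto and Session are TensorFlow 1.x APIs. On TensorFlow 2.x the equivalent thread limits are set through tf.config (a sketch; these must be called before any TensorFlow ops have run):

import tensorflow as tf

jobs = 2  # number of cores to use

# TF2 replacements for intra_op/inter_op_parallelism_threads in ConfigProto.
tf.config.threading.set_intra_op_parallelism_threads(jobs)
tf.config.threading.set_inter_op_parallelism_threads(jobs)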