Keras / tensorflow - limit number of cores (intra_op_parallelism_threads not working)

I've been trying to run keras on a CPU cluster, and for this I need to limit the number of cores used (it's a shared system). So to limit the number of cores, I landed on this answer. However, this simply doesn't work. I tried running with this basic code:
from keras.applications.vgg16 import VGG16
from keras import backend as K
import numpy as np
conf = K.tf.ConfigProto(device_count={'CPU': 1},
intra_op_parallelism_threads=2,
inter_op_parallelism_threads=2)
K.set_session(K.tf.Session(config=conf))
model = VGG16(weights='imagenet', include_top=False)
x = np.random.randn(1000, 224, 224, 3)
features = model.predict(x)
When I run this and check htop, it uses all (128) logical cores. Is this a bug in keras? Or am I doing something wrong?
TensorFlow also reports that my CPU supports SSE4.1 and SSE4.2, which are not used because I didn't compile it from source. Will compiling from source also fix the original problem?
EDIT: I've found a workaround when launching the keras script from a unix machine:
taskset -c 0-23 python keras_script.py
This will run the script on the first 24 cores of the machine. It works, but it would still be nice if this were available from within Keras/TensorFlow.
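The same pinning can be done from inside the script itself. This is a minimal sketch, assuming Linux: os.sched_setaffinity and os.sched_getaffinity are not available on macOS or Windows. `pin_to_first_cpus` is a hypothetical helper name. It should run in the main thread before keras/tensorflow is imported, because the affinity mask applies to the calling thread and is inherited only by threads started afterwards. It confines where TensorFlow's threads run. It does not change how many threads TensorFlow creates.

```python
import os

def pin_to_first_cpus(n_cpus):
    """Restrict the calling thread, and threads it starts later, to n_cpus allowed CPUs.

    Linux-only: os.sched_getaffinity / os.sched_setaffinity do not exist on macOS or Windows.
    """
    allowed = sorted(os.sched_getaffinity(0))  # CPUs this process may currently run on
    chosen = set(allowed[:n_cpus])             # keep the lowest-numbered ones, like taskset -c 0-23
    os.sched_setaffinity(0, chosen)            # 0 means "the caller"
    return chosen
```

For example, calling `pin_to_first_cpus(24)` at the top of keras_script.py should behave roughly like `taskset -c 0-23`, provided CPUs 0-23 are in the process's allowed set.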

I found this snippet of code that works for me, hope it helps:
from keras import backend as K
import tensorflow as tf
jobs = 2  # number of cores (threads) to use
config = tf.ConfigProto(intra_op_parallelism_threads=jobs,
inter_op_parallelism_threads=jobs,
allow_soft_placement=True,
device_count={'CPU': jobs})
session = tf.Session(config=config)
K.set_session(session)
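On builds that use MKL/OpenMP, thread counts are also controlled by environment variables. The libraries read these when their thread pools are first created, so they must be set before tensorflow is imported. This is a minimal sketch, and `limit_threads_via_env` is a hypothetical helper. TF_NUM_INTRAOP_THREADS and TF_NUM_INTEROP_THREADS are included on the assumption that your TensorFlow version honours them. In TensorFlow 2.x, the ConfigProto settings above are replaced by tf.config.threading.set_intra_op_parallelism_threads and tf.config.threading.set_inter_op_parallelism_threads.

```python
import os

# Thread-count variables to set before importing tensorflow (or an MKL-backed numpy).
THREAD_VARS = (
    "OMP_NUM_THREADS",         # OpenMP runtime (used by MKL / MKL-DNN builds)
    "MKL_NUM_THREADS",         # Intel MKL
    "TF_NUM_INTRAOP_THREADS",  # TensorFlow intra-op pool (assumed honoured by your TF version)
    "TF_NUM_INTEROP_THREADS",  # TensorFlow inter-op pool (assumed honoured by your TF version)
)

def limit_threads_via_env(n_threads):
    """Set every variable in THREAD_VARS to n_threads and return the resulting mapping."""
    value = str(int(n_threads))
    for name in THREAD_VARS:
        os.environ[name] = value
    return {name: os.environ[name] for name in THREAD_VARS}
```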

Related

Python 3.8.8 Jupyter notebook kernel dies when I call model.fit() when I try to use my GPU

My TensorFlow installation recognizes my GPU.
However, when I call model.fit() on my data, it shows
epoch(1/2) and then the kernel dies immediately.
If I run this in a separate virtual environment with no GPU, it works fine.
I have simplified the model architecture and reduced the number of training points to only ten as a quick test, and it still fails.
Simple example
from numpy import loadtxt
import keras
from keras.models import Sequential
from keras.layers import Dense
# X_train and y_train are assumed to be loaded beforehand (e.g. with loadtxt)
model = Sequential()
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
info = model.fit(X_train, y_train, epochs=2, batch_size=2, shuffle=True, verbose=1)
versions:
Python: 3.8.8
Num GPUs Available: 1
TensorFlow: 2.5.0-dev20210227
Keras: 2.4.3
CUDA: v11.2
I am going to answer my own question rather than deleting this because maybe someone else will be making the same simple mistake I was.
The main mistake I made was installing the wrong CUDA version. You can check which CUDA and cuDNN versions are compatible with each TensorFlow release at this link:
https://www.tensorflow.org/install/source#gpu
TLDR: Just follow this video:
https://www.youtube.com/watch?v=hHWkvEcDBO0
This also highlighted the importance of a virtual environment where you control the package versions to prevent incompatibilities.
I had the same problem. I moved the code into a Python file and found the root cause. In my case, the fix was copying the cuDNN DLL files into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin. Check the following link as well:
Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
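Since the failure in this answer came down to Windows not finding the cuDNN DLL, a quick diagnostic is to check which PATH directories contain it. This is a rough sketch, and the helper name `find_on_path` is hypothetical. It only inspects PATH, which is where the CUDA `bin` directory is normally listed. It does not reproduce Windows' full DLL search order.

```python
import os

def find_on_path(filename, path=None):
    """Return every directory on the search path that contains `filename`.

    `path` defaults to the PATH environment variable; pass a string to check another one.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries (e.g. from a trailing separator) and check for a regular file.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits
```

For example, `find_on_path("cudnn64_8.dll")` returning an empty list suggests the DLL is not in any PATH directory.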

How to run TensorFlow for SSE4.1 SSE4.2 AVX AVX2 FMA on Python with Spyder on MacOs

I am trying to run the code:
from keras.datasets import imdb as im
from keras.preprocessing import sequence as seq
from keras.models import Sequential
from keras.layers import Embedding
from keras.layers import LSTM
from keras.layers import Dense
train_set, test_set = im.load_data(num_words = 10000)
X_train, y_train = train_set
X_test, y_test = test_set
X_train_padded = seq.pad_sequences(X_train, maxlen = 100)
X_test_padded = seq.pad_sequences(X_test, maxlen = 100)
model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=128))
model.add(LSTM(units=128))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
scores = model.fit(X_train_padded,y_train)
When I run the code, it gives me a message:
I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
I don't understand what the issue is or what I am supposed to do next. I installed the "tensorflow" package (1.14.0), but that doesn't solve the issue.
I have looked at this reference but I don't know what I am looking for:
https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions
Can someone please help me. Thanks.
My config: osx-64, macOS Mojave v10.14.6, Python 3.7 with Spyder and Anaconda, conda version 4.7.12
You can ignore the message and everything will work fine.
As far as I can gather from https://github.com/tensorflow/tensorflow/pull/24782/commits/7faefa4bb665e115cc744d7895a407338624993f, when TensorFlow is compiled with MKL-DNN support (which it is, according to your message), MKL-DNN will take care of using all available CPU performance features. So it doesn't matter that TensorFlow wasn't compiled to use them.
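If the message is just noise, TensorFlow's C++ logging can be filtered with the TF_CPP_MIN_LOG_LEVEL environment variable. It must be set before tensorflow is imported. This is a minimal sketch: the messages above are prefixed with `I` (INFO), so level 1 should hide them while still showing warnings and errors.

```python
import os

# TF_CPP_MIN_LOG_LEVEL filters TensorFlow's C++ log messages:
#   "0" = show all, "1" = hide INFO, "2" = also hide WARNING, "3" = also hide ERROR.
# It has to be set before tensorflow is imported to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

# Import tensorflow/keras only after the variable is set, e.g.:
# import tensorflow as tf
```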
This might not be answering the exact question you have put, but I had a very similar error message when running a similar task.
In addition to the error message above, I also had the following error message:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Error was solved with:
conda install nomkl
This is as per this Stack Overflow post.

Tensorflow MKL-DNN build in Ubuntu silently produces erroneus results

I have built Tensorflow 1.6.0-rc0 from source in Ubuntu 16.04 with MKL-DNN support following this guide. The build proceeds without any problem. Testing it with keras 2.1.3 on a simple convnet from this example "as is" is two times slower than with the non-MKL build.
Now, tuning the MKL parameters as recommended in the guide gives almost a 2x speedup over the non-MKL build, but produces complete nonsense in terms of accuracy (and loss).
This comes with no errors or warnings from the console. The MKL parameters were tuned as follows:
import os
from keras import backend as K
K.set_session(K.tf.Session(config=K.tf.ConfigProto(inter_op_parallelism_threads=1)))
os.environ["KMP_BLOCKTIME"] = "0"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
The CPU is an i7-4790K.
For reference, results obtained from a run without tuning the MKL parameters are as expected.
Did anyone come across a similar issue? Just to check it against the community before firing an issue on GitHub.
You won't get such flat accuracy if the parameter "inter_op_parallelism_threads" is 2.
Below is a modified version of the MKL tuning parameters that might speed up your code:
import os
from keras import backend as K
import tensorflow as tf
config = tf.ConfigProto(intra_op_parallelism_threads=<Number.of Cores>, inter_op_parallelism_threads=2, allow_soft_placement=True, device_count = {'CPU': <Number.of Cores>})
session = tf.Session(config=config)
K.set_session(session)
os.environ["OMP_NUM_THREADS"] = "<Number.of Cores>"
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"]= "granularity=fine,verbose,compact,1,0"
For example:
import os
from keras import backend as K
import tensorflow as tf
config = tf.ConfigProto(intra_op_parallelism_threads=8, inter_op_parallelism_threads=2, allow_soft_placement=True, device_count = {'CPU': 8})
session = tf.Session(config=config)
K.set_session(session)
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"]= "granularity=fine,verbose,compact,1,0"
Since the optimal parameter values differ from one workload to another, you may need to experiment with the settings above to find the best speedup.
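One way to experiment systematically is to time each combination of thread settings on a representative workload. This is a minimal sketch. `sweep` and the `run_fn` callable are hypothetical, and `run_fn(intra, inter)` is assumed to build a session with those thread counts and run one batch of work. Settings such as OMP_NUM_THREADS are read when the OpenMP runtime starts, so in practice `run_fn` may need to launch a fresh process per configuration. Given the question above, it is also worth checking accuracy for the fastest configuration, not just speed.

```python
import itertools
import time

def sweep(run_fn, intra_values, inter_values, repeats=3):
    """Time run_fn(intra, inter) for every combination of thread settings.

    run_fn is a hypothetical, user-supplied callable that configures TensorFlow with the
    given intra/inter-op thread counts and runs one representative workload.
    Returns (best_seconds, intra, inter) tuples, fastest first.
    """
    results = []
    for intra, inter in itertools.product(intra_values, inter_values):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_fn(intra, inter)
            best = min(best, time.perf_counter() - start)  # keep the fastest repeat
        results.append((best, intra, inter))
    results.sort()  # ascending by elapsed time
    return results
```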

Reproducible results with keras

How can I get reproducible results with keras? I followed these steps but I am still getting different results every time I run the Jupyter notebook. I also tried setting shuffle=False when calling model.fit().
My configuration:
conda 4.3.25
keras 2.0.6 with tensorflow backend
tensorflow-gpu 1.2.1
python 3.5
windows 10
See the answer I posted to another question. The main idea is to first disable the GPU and then seed the libraries you use (numpy, random, etc.). In short, including the code below at the beginning of your script may solve your problem.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import numpy as np
import tensorflow as tf
import random as rn
from keras import backend as K
sd = 1
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED']=str(sd)
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
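The seeding part of the code above can be wrapped in a small helper that works whether or not NumPy and TensorFlow are installed. This is a minimal sketch, and `set_seeds` is a hypothetical name. It only seeds the random number generators. The single-threaded session config and the disabled GPU from the answer are still needed for reproducible training.

```python
import random

def set_seeds(seed):
    """Seed Python's random module, plus NumPy and TensorFlow if they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
    except ImportError:
        return
    if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)   # TensorFlow 2.x
    else:
        tf.set_random_seed(seed)   # TensorFlow 1.x, as used in the answer above
```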

How to get reproducible result when running Keras with Tensorflow backend

Every time I run an LSTM network with Keras in a Jupyter notebook, I get a different result. I have googled a lot and tried several solutions, but none of them work. Here are the solutions I tried:
set numpy random seed
random_seed=2017
from numpy.random import seed
seed(random_seed)
set tensorflow random seed
from tensorflow import set_random_seed
set_random_seed(random_seed)
set built-in random seed
import random
random.seed(random_seed)
set PYTHONHASHSEED
import os
os.environ['PYTHONHASHSEED'] = '0'
add PYTHONHASHSEED to the Jupyter notebook kernel.json
{
"language": "python",
"display_name": "Python 3",
"env": {"PYTHONHASHSEED": "0"},
"argv": [
"python",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
]
}
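Putting PYTHONHASHSEED in kernel.json is the variant that can actually take effect. Python reads PYTHONHASHSEED once, at interpreter startup. Assigning os.environ['PYTHONHASHSEED'] inside an already-running notebook therefore only affects child processes started afterwards. The sketch below illustrates this by hashing a string in fresh interpreters, and `child_hash` is a hypothetical helper name.

```python
import os
import subprocess
import sys

def child_hash(text, hashseed):
    """Return hash(text) computed in a fresh interpreter started with PYTHONHASHSEED=hashseed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hashseed))
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)
```

With the same PYTHONHASHSEED, `child_hash("keras", 0)` returns the same value on every call. Without a fixed seed, string hashes normally change from run to run.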
and the versions in my environment are:
Keras: 2.0.6
Tensorflow: 1.2.1
CPU or GPU: CPU
and this is my code:
model = Sequential()
model.add(LSTM(16, input_shape=(time_steps,nb_features), return_sequences=True))
model.add(LSTM(16, input_shape=(time_steps,nb_features), return_sequences=False))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='linear'))
model.compile(loss='mse',optimizer='adam')
The seed is definitely missing from your model definition. Detailed documentation can be found here: https://keras.io/initializers/.
In essence, your layers' parameters are initialized from random values, so you get different outputs every time.
One example:
model.add(Dense(1, activation='linear',
kernel_initializer=keras.initializers.RandomNormal(seed=1337),
bias_initializer=keras.initializers.Constant(value=0.1)))
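A seeded initializer helps because it draws from its own generator. The initial weights then depend only on the seed, not on whatever global random state happens to exist. Below is a toy pure-Python illustration of that idea. This is not Keras' implementation, and `random_normal_init` is a hypothetical name.

```python
import random

def random_normal_init(rows, cols, mean=0.0, stddev=0.05, seed=None):
    """Toy stand-in for a seeded normal initializer; returns a rows x cols list of lists.

    A private random.Random(seed) is used, so the values depend only on `seed`
    (when it is not None) and never on the global `random` state.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mean, stddev) for _ in range(cols)] for _ in range(rows)]

w1 = random_normal_init(3, 2, seed=1337)
random.random()  # disturb the global generator; the seeded initializer is unaffected
w2 = random_normal_init(3, 2, seed=1337)
print(w1 == w2)  # True
```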
Keras itself has a section about getting reproducible results in its FAQ (https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development), with the following code snippet for reproducible results:
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/fchollet/keras/issues/2280#issuecomment-306959926
import os
os.environ['PYTHONHASHSEED'] = '0'
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
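The FAQ forces a single thread because floating-point addition is not associative. When several threads combine partial sums, the order can change between runs, and so can the last bits of the result, which then propagate through training. Below is a small pure-Python illustration, where `sum_in_order` is a hypothetical helper.

```python
def sum_in_order(values, order):
    """Add values[i] for i in `order`, left to right, mimicking one possible reduction order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

values = [0.1, 0.2, 0.3]
forward = sum_in_order(values, [0, 1, 2])   # ((0.0 + 0.1) + 0.2) + 0.3
backward = sum_in_order(values, [2, 1, 0])  # ((0.0 + 0.3) + 0.2) + 0.1
print(forward, backward, forward == backward)  # 0.6000000000000001 0.6 False
```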
Keras + TensorFlow.
Step 1: disable the GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Step 2: seed the libraries your code uses, e.g. tensorflow, numpy and random.
import tensorflow as tf
import numpy as np
import random as rn
sd = 1 # Here sd means seed.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED']=str(sd)
from keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
Make sure both pieces of code are included at the very start of your script; then the results will be reproducible.
I resolved this issue by adding os.environ['TF_DETERMINISTIC_OPS'] = '1'
Here is an example:
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
#rest of the code
#TensorFlow version 2.3.1
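TF_DETERMINISTIC_OPS can be combined with seeding in one setup function. This is a minimal sketch, assuming TensorFlow 2.x, and `enable_determinism` is a hypothetical name. Newer releases also provide tf.keras.utils.set_random_seed and tf.config.experimental.enable_op_determinism, added around TF 2.7 and 2.8 respectively. The code therefore only calls them when they exist.

```python
import os
import random

def enable_determinism(seed=0):
    """Best-effort determinism setup for TensorFlow 2.x; call before building any model.

    Returns the TensorFlow version string, or None if TensorFlow is not installed.
    """
    # Set before TensorFlow creates any ops; TF 2.1+ reads it to pick deterministic GPU kernels.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return None
    if hasattr(tf.keras.utils, "set_random_seed"):
        # Newer TF 2.x: seeds Python's random, NumPy and TensorFlow in one call.
        tf.keras.utils.set_random_seed(seed)
    else:
        tf.random.set_seed(seed)
    if hasattr(tf.config.experimental, "enable_op_determinism"):
        # Newer TF 2.x: ops without a deterministic implementation are expected to raise an error.
        tf.config.experimental.enable_op_determinism()
    return tf.__version__
```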