I built the latest version of darknet with CUDA 8 and OpenCV 3.4.0. When I run ./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights shortvid.mp4, it loads everything properly and looks like it's starting, but then it just hangs after printing the name of the video file.
It isn't doing anything: my GPU, CPU and memory aren't being utilized. Does anybody know what could be happening? I don't have any errors to go by.
My tfjs-node-gpu code works great on an NVIDIA P4 on GKE (and using WebGL in a browser), but it fails on a V100 and a T4.
Node crashes in the first predict call inside my warmup. I'm using small 128x128 tiles to predict a 4x image upscale using the idealo-gans. The V100 initializes fine, shows up in nvidia-smi, is listed as a TF device, and the NUMA configuration is all fine. It just hard-crashes my Node/Express server. I'm having trouble finding the crash stack, since the server is started in a Docker container and my last attempt to log the crash from stderr failed.
I've tried both the latest tfjs-node-gpu 3.0 and 2.8.5. GKE is configured to install the NVIDIA drivers, currently 410.104, and CUDA 10.0.
I've tried enabling debug mode and passing {verbose: true} to the failing model.predict() call in my warmup function. Neither added any output to the warmup call, which is odd, since I do see output from the actual, non-warmup call to model.predict().
Any suggestions on how to debug further?
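One way to get more information out of a hard crash is to launch the server through a small wrapper that copies stderr to a file and records how the process exited. A native crash (for example inside a CUDA library) typically kills the process with a signal rather than producing a JavaScript stack trace, and on POSIX Python's subprocess module reports that as a negative return code. This is a minimal sketch, assuming a POSIX container; the command and the log file name are hypothetical placeholders.

```python
import signal
import subprocess
import sys


def run_and_report(cmd, log_path):
    """Run cmd, echo its stderr while also saving it to log_path, and describe the exit."""
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
        for line in proc.stderr:  # raw bytes, one line at a time
            sys.stderr.write(line.decode(errors="replace"))
            log.write(line)
            log.flush()  # keep the file current even if the run is long
        proc.wait()
    code = proc.returncode
    if code < 0:  # negative return code: the child was killed by a signal
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = str(-code)
        return f"killed by signal {name}"
    return f"exited with status {code}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python crash_wrapper.py node server.js
    print(run_and_report(sys.argv[1:], "server-stderr.log"))
```

If the result is something like "killed by signal SIGSEGV" or "killed by signal SIGABRT", the crash is happening in native code, which would explain why {verbose: true} adds nothing before the process dies.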
I started a new machine learning project.
According to this document (https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras),
TF with TensorBoard appears to support GPU profiling, so I used the same code in my Jupyter Notebook for testing.
The sample code generates profiling results. However, there is no GPU tracing information in the resulting file (only CPU).
This is my main problem.
I am using two RTX 2080 Ti graphics cards.
Both cards were functioning when I ran the code.
The sample code does not use MirroredStrategy, so I could see that only one of them was busy.
At first, I thought TensorBoard was the problem, but I soon realized that TF itself was not generating the GPU tracing information.
The resulting file (local.trace) contained no GPU data.
My system specification:
OS Ubuntu 18.04
jupyter-client 5.3.4
jupyter-core 4.6.1
jupyter-tensorboard 0.1.10
tensorflow-gpu 2.0.0
tensorflow-estimator 2.0.1
tensorflow-metadata 0.15.1
tensorboard 2.0.2
NVIDIA driver 410.104
CUDA 10.0
anaconda 4.7.12 (with python 3.6)
There was also a warning message, although it looks irrelevant.
I tested this on another PC and got the same result. Perhaps GPU profiling is only supported on Google Colab (I am still confused). I have searched Google for a fix but still have not found an answer.
Is anyone using GPU profiling on their own system instead of Google Colab?
Any advice would be appreciated.
I figured out what caused the problem.
It was related to CUPTI (CUDA Profiling Tools Interface).
Unlike in Jupyter Notebook, running the code from the Ubuntu shell printed a warning message:
CUPTI error: CUPTI could not be loaded or symbol could not be found.
TF could not find the CUPTI libraries; this was the root cause of the problem.
After adding the CUPTI path to LD_LIBRARY_PATH as described in the link below, the problem was fixed!
https://stackoverflow.com/a/58752904/5553618
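Before editing LD_LIBRARY_PATH, it can help to confirm where the CUPTI libraries actually are and whether that directory is already listed. The sketch below assumes the common location for a CUDA install under /usr/local/cuda (extras/CUPTI/lib64); that path is an assumption, so add your own candidate directories if CUDA lives elsewhere.

```python
import os
from pathlib import Path

# Assumed default CUPTI location for CUDA installed under /usr/local/cuda.
CANDIDATE_DIRS = ["/usr/local/cuda/extras/CUPTI/lib64"]


def dirs_with_cupti(candidates):
    """Return the candidate directories that contain a libcupti.so* file."""
    return [d for d in candidates
            if Path(d).is_dir() and any(Path(d).glob("libcupti.so*"))]


def missing_from_ld_library_path(dirs, env=None):
    """Return the directories in dirs that LD_LIBRARY_PATH does not list."""
    env = os.environ if env is None else env
    listed = {os.path.normpath(p)
              for p in env.get("LD_LIBRARY_PATH", "").split(":") if p}
    return [d for d in dirs if os.path.normpath(d) not in listed]


if __name__ == "__main__":
    # Print a suggested export line for each CUPTI directory not yet on the path.
    for d in missing_from_ld_library_path(dirs_with_cupti(CANDIDATE_DIRS)):
        print(f'export LD_LIBRARY_PATH="{d}:$LD_LIBRARY_PATH"')
```

The export has to be in effect before Jupyter (or Python) starts, because the dynamic loader reads LD_LIBRARY_PATH at process startup; changing os.environ inside a running notebook does not help.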
I'm doing a course in machine learning and can't get TensorBoard to work. I have saved runs from training a DQN, and I run:
tensorboard -logdir runs
with the following result:
2019-12-28 18:32:04.265065: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
TensorBoard 1.7.0 at http://david-linux:6006 (Press CTRL+C to quit)
So I click the link and get:
No dashboards are active for the current data set.
Probable causes:
You haven’t written any data to your event files
TensorBoard can’t find your event files.
I also get this result after having the code running for a while:
"W1228 18:34:34.186506 Thread-2 application.py:272] path /[[_dataImageSrc]] not found, sending 404
W1228 18:34:34.205581 Thread-2 application.py:272] path /[[_imageURL]] not found, sending 404"
I'm running this on Linux with Anaconda Python 3.6 because that is what the course book uses. I have no idea what the above errors mean; I'm quite new to coding in general and to reinforcement learning in particular.
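To tell apart the two probable causes TensorBoard lists, you can check whether the log directory actually contains event files. This sketch assumes TensorBoard's usual behaviour of searching the log directory recursively for files whose names contain "tfevents" (normally events.out.tfevents.*); note that a relative path such as runs is resolved against the directory you launch tensorboard from.

```python
import sys
from pathlib import Path


def find_event_files(logdir):
    """List event files under logdir as sorted (relative path, size in bytes) pairs."""
    root = Path(logdir).resolve()  # "runs" is resolved against the current directory
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory")
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*tfevents*")
        if p.is_file()
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_events.py runs
    for name, size in find_event_files(sys.argv[1]):
        print(f"{size:>10}  {name}")
```

An empty list (or a FileNotFoundError) points to TensorBoard not finding your files; files that exist but stay tiny suggest the training code is not writing any summaries.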
This can happen if the browser is out of date. You could also try installing the latest version of TensorBoard:
pip uninstall tensorflow-tensorboard
pip install tensorboard
Also try using different browsers.
Can you just try going to http://localhost:6006 instead? It looks like your hostname is not one that actually resolves in DNS.
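A quick way to check the DNS theory is to ask the local resolver whether the hostname from the TensorBoard URL (david-linux in the output above) resolves at all. This is a minimal sketch using only the standard library; the resolver consults /etc/hosts as well as DNS, so a True result on the TensorBoard machine does not guarantee that other machines can resolve the name.

```python
import socket


def resolves(hostname):
    """Return True if hostname resolves to an IPv4 address from this machine."""
    try:
        socket.gethostbyname(hostname)
        return True
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    name = socket.gethostname()  # the name TensorBoard prints in its URL
    print(name, "resolves" if resolves(name) else "does not resolve")
```

If the machine's own hostname does not resolve, the localhost URL is the simplest workaround.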
The GPU version of TensorBoard has certain issues in Colab, although the CPU version works fine. I could not find much about this in the docs.
For installation, I tried both the GPU and non-GPU versions, and neither works until I disable the GPU in the runtime settings. Any help would be appreciated.
Last week I ran the TensorFlow object_detection API correctly on my webcam (on Windows). This week, after TensorFlow was updated to 1.4.0, it compiled correctly but did not draw any boxes, even for its test images. Since I did not have the TensorFlow 1.4.0 source files for Windows, I ran it on Ubuntu, but the result was the same: no boxes were drawn.
When I inspected my variables in the Spyder IDE, the scores for the detected classes were very low. Why did this happen? Am I running it wrong?
Thanks for your help, guys.
This was a bug in the latest TensorFlow update. I reported it on GitHub and it has been fixed. This is the link
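Low scores and missing boxes are connected: the object detection API's visualization helper only draws detections whose score clears a minimum threshold (min_score_thresh, 0.5 by default in visualize_boxes_and_labels_on_image_array), so a model that outputs uniformly weak scores produces an image with no boxes even though detection ran. The sketch below is a simplified, hypothetical stand-in for that filtering step, assuming a strict greater-than comparison; it is not the library's code.

```python
def boxes_to_draw(boxes, scores, classes, min_score_thresh=0.5):
    """Keep (box, class, score) triples whose score exceeds min_score_thresh.

    Detections at or below the threshold are dropped silently, which is why
    uniformly low scores leave nothing to draw.
    """
    return [(box, cls, score)
            for box, score, cls in zip(boxes, scores, classes)
            if score > min_score_thresh]


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 1.0, 1.0), (0.1, 0.1, 0.5, 0.5)]
    print(boxes_to_draw(boxes, [0.05, 0.02], [1, 3]))
```

Temporarily lowering the threshold is a quick way to confirm that detections exist but are being filtered out, as opposed to the model producing nothing at all.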