Is there any difference between nvprof and pgprof? - gpu

I am interested to know if pgprof == nvprof + nvvp.
For instance, I would like to know if they are interchangeable: will nvprof or nvvp profile a PGI OpenACC application exactly as pgprof does?
Also, pgprof preselects the CUDA toolkit that comes with the OpenACC installation, while nvvp selects the one in /usr/local/cuda. Is there any problem with mixing the toolkits?
I'm a bit confused because the documentation of both tools (NVIDIA Profiler documentation & PGI Profiler Guide) looks exactly the same.
Also, doing a diff of the help output doesn't show any clear difference:
$ nvprof --help > help.nv
$ pgprof --help > help.pgi
$ diff help.pgi help.nv
1c1
< Usage: pgprof [options] [application] [application-arguments]
---
> Usage: nvprof [options] [application] [application-arguments]
113c113
< this pgprof instance. Note: Only one instance of pgprof
---
> this nvprof instance. Note: Only one instance of nvprof
305c305
< Suppress all pgprof output.
---
> Suppress all nvprof output.
346c346
< Make pgprof send all its output to the specified file, or
---
> Make nvprof send all its output to the specified file, or

When NVIDIA acquired PGI a few years ago, we did merge pgprof and nvprof, with pgprof's CPU profiling brought into nvprof. The main difference is that pgprof enables CPU profiling by default, while this needs to be enabled via the "--cpu-profiling on" option when using nvprof.
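For example, with a placeholder application name, these two invocations should then produce comparable profiles:
$ pgprof ./myapp
$ nvprof --cpu-profiling on ./myapp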

Related

/include/boost/thread/pthread/mutex.hpp:111: boost::mutex::~mutex(): Assertion `!res' failed

On Ubuntu 16.04, I compiled the Spinnaker SDK example in src/Acquisition with make and got the "Acquisition" binary under bin/.
When I run it, I get the following error:
Number of cameras detected: 1
Running example for camera 0...
* DEVICE INFORMATION *
DeviceID: 18073382
DeviceSerialNumber: 18073382
DeviceVendorName: Point Grey Research
DeviceModelName: Grasshopper3 GS3-U3-32S4M
DeviceType: U3V
DeviceDisplayName: Point Grey Research
DeviceAccessStatus: OpenReadWrite
DeviceVersion: FW:v2.25.3.00 FPGA:v2.02
DeviceDriverVersion: none : 0.0.0.0
DeviceUserID:
DeviceIsUpdater: 0
DeviceInstanceId: 0113C726
DeviceLocation:
DeviceCurrentSpeed: HighSpeed
GUIXMLLocation: Device
GUIXMLPath: Input.xml
GenICamXMLLocation: Device
GenICamXMLPath:
DeviceU3VProtocol: 1
* IMAGE ACQUISITION *
Acquisition mode set to continuous...
Unable to begin image acquisition. Aborting with error -1010...
Camera 0 example complete...
Done! Press Enter to exit...
Acquisition_C: /softwarelib/Boost/boost_1_60_0/GCC_5_3_1/linux_cpp11/release/amd64/include/boost/thread/pthread/mutex.hpp:111: boost::mutex::~mutex(): Assertion `!res' failed
The sample code itself doesn't use a mutex at all.
This error is due to insufficient usbfs memory allocation. Please refer to section 3 of the Spinnaker readme, quoted below, for info on how to increase the value to 1000:
===============================================================================
3. USB RELATED NOTES
On Linux systems, the USB-FS memory is restricted to 16 MB or less by default. To
increase this limit to make use of the imaging hardware's full capabilities, a
minor change needs to be made to the system.
To PERMANENTLY modify the USB-FS memory:
1. Open the /etc/default/grub file in any text editor. Find and replace:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
with this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash usbcore.usbfs_memory_mb=1000"
2. Update grub with these settings:
$ sudo update-grub
3. Reboot and test a USB 3.1 camera.
If this method fails to set the memory limit, to TEMPORARILY modify the USB-FS
memory until the next reboot, run the following command:
$ sudo sh -c 'echo 1000 > /sys/module/usbcore/parameters/usbfs_memory_mb'
To confirm that the memory limit has been successfully updated, run the following command:
$ cat /sys/module/usbcore/parameters/usbfs_memory_mb
If using multiple USB3 cameras, the USB-FS memory limit may need to exceed 1000.
More information on these changes can be found at:
https://www.flir.com/support-center/iis/machine-vision/application-note/understanding-usbfs-on-linux

nv-nsight-cu-cli caused Tensorflow to fail

I've downloaded the newest Nsight Compute profiling tool and I want to use it to benchmark Tensorflow applications. The code I'm using is here. It runs perfectly fine when I execute it, and benchmarking it with nvprof ./mnist.py works with no problem at all. However, when I try to run it with the command sudo ./nv-nsight-cu-cli [path to the file], I get the following error:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
I suspect that nv-nsight-cu-cli somehow didn't recognize the environment variables at all. Is there any workaround?
You need to search for differences in both environments:
- env variables
  - LD_LIBRARY_PATH
  - /etc/ld.so.conf
  - /etc/ld.so.conf.d/*
- cuBLAS
  - Is installation complete/not broken?
  - Is it installed at the same location on both machines?
  - Versions
  - ...
You can start with locate libcublas.so on both machines to see if there's a difference. Alternatively, you can run the program under strace -f -e open to check where it tries to load libcublas.so from.
Your error has (for now) nothing to do with GPUs: libcublas.so.9.0 simply cannot be found. Find it, figure out why Tensorflow cannot find it, and your problem will be solved.
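One environment difference worth checking specifically, since you run the tool under sudo: by default sudo resets the environment and strips LD_* variables, so an LD_LIBRARY_PATH that works for your user will not be visible to the profiled process. As a quick check you can pass it through explicitly (the script path below is just a placeholder):
$ sudo env LD_LIBRARY_PATH="$LD_LIBRARY_PATH" ./nv-nsight-cu-cli /path/to/mnist.py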
It appears that GP100 is not supported by the tool at this moment.
The answer is found here:
Nsight Compute only supports Pascal (other than GP100) and later GPUs.

Access pagemap in gem5 FS mode

I am trying to run an application which uses pagemap in gem5 FS mode.
But I am not able to use pagemap in gem5. It throws the error below:
"assert(pagemap>=0) failed"
The relevant code is:
int pagemap = open("/proc/self/pagemap", O_RDONLY);
assert(pagemap >= 0);
Also, if I try to run my application in the gem5 terminal with sudo, it throws the error:
sudo command not found
How can I use sudo in gem5?
These problems are not gem5 specific, but rather image / Linux specific, and would likely happen on any simulator or on real hardware. So I recommend that you remove gem5 from the equation completely and ask a Linux or image specific question next time, saying exactly what image you are using and which kernel configs, and providing a minimal C example that reproduces the problem: this will greatly improve the probability that you will get help.
I have just done open("/proc/self/pagemap", O_RDONLY) successfully with this program on this fs.py setup on aarch64; see also these comments.
If /proc/<pid>/pagemap is not present for any process, do the following:
ensure that procfs is mounted on /proc. This is normally done with an fstab entry of type:
proc /proc proc defaults 0 0
but your init script needs to use fstab as well.
Alternatively, you can mount proc manually with:
mount -t proc proc proc/
you will likely want to ensure that /sys and /dev are mounted as well.
grep the kernel to see if there is some config controlling the file creation.
These kinds of things are often easy to find without knowing anything about the kernel.
If I do:
git grep '"pagemap'
to find the pagemap string, which is likely the creation point, on v4.18 this leads me to fs/proc/base.c, which contains:
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("pagemap", S_IRUSR, proc_pagemap_operations),
#endif
so make sure CONFIG_PROC_PAGE_MONITOR is set.
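If you are unsure whether the kernel image you boot in gem5 has that option enabled, a quick check is to grep the config you built it with (the build-tree path is a placeholder), or /proc/config.gz on the running guest if CONFIG_IKCONFIG_PROC is enabled:
$ grep PROC_PAGE_MONITOR /path/to/linux-build/.config
$ zcat /proc/config.gz | grep PROC_PAGE_MONITOR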
sudo: most embedded / simulator images don't have it; you just log in as root directly and can do anything by default without it. This can be seen from the conventional # in the prompt instead of $.

callgrind with spawn-fcgi not creating profiling data

I need to profile my C++ application, which is started with spawn-fcgi.
I tried to use callgrind, but the callgrind output in KCachegrind does not show any information about my application.
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes --demangle=no --trace-children=yes --callgrind-out-file=%p spawn-fcgi -s /tmp/sock.tmp ./myApp arg1 arg2
This command creates two files, 10012 and 10013, but the second file is empty.
The first file only has function information for spawn-fcgi, ld-2.*.so and libc.
Please suggest the correct options to get profiling information for my application.
I experienced similar behaviour when the profiled FastCGI process crashed on exit (so statistics aren't dumped when it terminates with SIGSEGV).
I used the callgrind_control tool to dump statistics at an arbitrary point in time:
callgrind_control --dump
You can also stop gathering statistics with callgrind_control -i off, re-enable it with callgrind_control -i on, and reset it with callgrind_control -z.
See the callgrind_control manual for details: http://valgrind.org/docs/manual/cl-manual.html
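If several instrumented processes are alive at once (here spawn-fcgi plus the child it spawns), callgrind_control also accepts a PID to target one of them; the PID below is just the one from the question:
$ callgrind_control --dump 10013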

Get current CPU usage in monkeyrunner script

I am using a monkeyrunner Jython script to automate some UI tests. I want to confirm that the previous step is complete before doing the next step, based on the current CPU usage of the OS (of the PC the emulator is running on). Hence I need a way to get the current CPU usage in a monkeyrunner Jython script.
I've done some research, but it looks like a monkeyrunner Jython script does not work with psutil: Monkeyrunner doesnt find my module
Could anyone tell me the easiest way to get the current CPU usage in a monkeyrunner Jython script?
Thanks.
You can invoke shell commands directly from MonkeyDevice:
top10 = device.shell('top -n 1 -m 10')
Try the top command to get CPU usage.
1. Try this if you want the CPU usage of the Android device:
import os
top_10_ps_list = os.popen('adb shell top -n 1 -m 10').read()
2. Try this if you want the CPU usage of the PC OS:
import os
top_10_ps_list = os.popen('top -b -n 1').read()
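If you specifically need the PC's overall CPU load from inside the Jython script (psutil is not available there), a minimal sketch, assuming the PC runs Linux so /proc/stat is readable, is to sample the kernel's CPU counters twice and wait until the busy fraction drops below a threshold. The function names and threshold values below are just illustrative:
import time

def cpu_times():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
    f = open('/proc/stat')
    fields = [int(x) for x in f.readline().split()[1:]]
    f.close()
    idle = fields[3] + fields[4]   # idle + iowait
    return idle, sum(fields)

def cpu_usage(interval=1.0):
    # Busy percentage over `interval` seconds, computed from two samples.
    idle1, total1 = cpu_times()
    time.sleep(interval)
    idle2, total2 = cpu_times()
    dt = total2 - total1
    if dt == 0:
        return 0.0
    return 100.0 * (1.0 - float(idle2 - idle1) / dt)

def wait_until_idle(threshold=20.0, timeout=60):
    # Poll until overall CPU usage drops below `threshold` percent or `timeout` seconds pass.
    waited = 0
    while waited < timeout:
        if cpu_usage(1.0) < threshold:
            return True
        waited += 1
    return False
You could then call wait_until_idle() between UI steps before issuing the next MonkeyDevice action.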