How can I lower ksoftirqd usage when it hits 100% of a CPU?

I have a Linux server with 48 CPUs and, from time to time, some of them start hitting almost 100% usage. When that happens, the usage doesn't go down, and that affects the performance of some services.
When I reboot the server, everything is fine for several days until some CPUs start hitting almost 100% usage again. It's like a cycle.
The problem is that once a CPU hits 100%, its usage doesn't change unless I reboot the server, and there is a service that performs badly while that is happening.
When I use the htop command, I can see that the processes ksoftirqd/0, ksoftirqd/1, ksoftirqd/2, ... are responsible for this usage. I don't know what to do.
One approach I tried: I observed that the CPUs where this occurs tend to be in the first half (0-23), so I changed the affinity of those ksoftirqd processes to CPUs in the second half (24-47), but without any success.
What I want to know is: how can I lower the usage of these ksoftirqd processes or, at least, distribute the load so each CPU's usage stays lower?
I'm not sure whether the solution is changing the affinity or some other attribute. I'm really clueless in this situation.
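One idea I have not tried yet, sketched below and not verified on this server: since each ksoftirqd/N thread is pinned to CPU N, moving the thread itself cannot help, so the plan would be to find which softirq class and which hardware IRQs are loading CPUs 0-23 and steer those IRQs to CPUs 24-47 instead (the IRQ number below is just a placeholder):
watch -n1 'cat /proc/softirqs'   # which softirq class (NET_RX, TIMER, ...) grows fastest on the busy CPUs
cat /proc/interrupts             # which hardware IRQs fire on those CPUs
echo 24-47 > /proc/irq/56/smp_affinity_list   # placeholder IRQ 56: allow it only on CPUs 24-47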
This is a physical server. Some info about it:
output of lsb_release -a:
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
output of uname -a:
Linux tiamat-dc 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux
output of lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 44 bits physical, 48 bits virtual
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 46
Model name: Intel(R) Xeon(R) CPU E7530 @ 1.87GHz
Stepping: 6
CPU MHz: 1239.565
BogoMIPS: 3723.93
Virtualization: VT-x
L1d cache: 768 KiB
L1i cache: 768 KiB
L2 cache: 6 MiB
L3 cache: 48 MiB
NUMA node0 CPU(s): 0,4,8,12,16,20,24,28,32,36,40,44
NUMA node1 CPU(s): 1,5,9,13,17,21,25,29,33,37,41,45
NUMA node2 CPU(s): 2,6,10,14,18,22,26,30,34,38,42,46
NUMA node3 CPU(s): 3,7,11,15,19,23,27,31,35,39,43,47
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d

Related

hashcat benchmark not starting at all

I'm trying to start hashcat on Windows 10 using my AMD RX 580 GPU.
What I have done so far:
Uninstalled all AMD drivers and rebooted
Started Driver Fusion cleaned all AMD Display drivers and rebooted
Manually removed OpenCL.dll from C:\windows\system32\ and c:\windows\syswow64 and rebooted
Installed the AMD Crimson edition drivers - tried 17.8.2, 17.11.1, 17.11.4
My problem:
When I try to start a benchmark for WPA2 hash with PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -m 2500 -b
It starts and just quits without any output:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -m 2500 -b
hashcat (v5.1.0) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
* Device #2: Not a native Intel OpenCL runtime. Expect massive speed loss.
You can use --force to override, but do not report related errors.
OpenCL Platform #1: Advanced Micro Devices, Inc.
================================================
* Device #1: Ellesmere, 3264/4096 MB allocatable, 36MCU
* Device #2: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, skipped.
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)
PS D:\crack\hashcat-5.1.0>
Output from hashcat -I:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -I
hashcat (v5.1.0) starting...
OpenCL Info:
Platform ID #1
Vendor : Advanced Micro Devices, Inc.
Name : AMD Accelerated Parallel Processing
Version : OpenCL 2.0 AMD-APP (2442.8)
Device ID #1
Type : GPU
Vendor ID : 1
Vendor : Advanced Micro Devices, Inc.
Name : Ellesmere
Version : OpenCL 2.0 AMD-APP (2442.8)
Processor(s) : 36
Clock : 1366
Memory : 3264/4096 MB allocatable
OpenCL Version : OpenCL C 2.0
Driver Version : 2442.8
Device ID #2
Type : CPU
Vendor ID : 128
Vendor : GenuineIntel
Name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Version : OpenCL 1.2 AMD-APP (2442.8)
Processor(s) : 8
Clock : 3599
Memory : 6131/24526 MB allocatable
OpenCL Version : OpenCL C 1.2
Driver Version : 2442.8 (sse2,avx)
PS D:\crack\hashcat-5.1.0>
The same happens if I try to start a mask attack.
Other benchmarks seem to work, but when it reaches WPA2 it just returns me to the command prompt:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -b
hashcat (v5.1.0) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
* Device #2: Not a native Intel OpenCL runtime. Expect massive speed loss.
You can use --force to override, but do not report related errors.
OpenCL Platform #1: Advanced Micro Devices, Inc.
================================================
* Device #1: Ellesmere, 3264/4096 MB allocatable, 36MCU
* Device #2: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, skipped.
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 0 - MD5
Speed.#1.........: 12381.1 MH/s (96.87ms) @ Accel:256 Loops:512 Thr:256 Vec:1
Hashmode: 100 - SHA1
Speed.#1.........: 4268.9 MH/s (70.02ms) @ Accel:256 Loops:128 Thr:256 Vec:1
Hashmode: 1400 - SHA2-256
Speed.#1.........: 1870.1 MH/s (80.00ms) @ Accel:256 Loops:64 Thr:256 Vec:1
Hashmode: 1700 - SHA2-512
Speed.#1.........: 461.1 MH/s (81.22ms) @ Accel:128 Loops:32 Thr:256 Vec:1
Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)
PS D:\crack\hashcat-5.1.0>
Any idea what could cause this behavior?
It looks like it was a crash in the OpenCL runtime. I installed the newest Adrenalin drivers (19.3.2), but ran into another issue:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -b -m 2500
hashcat (v5.1.0) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
OpenCL Platform #1: Advanced Micro Devices, Inc.
================================================
* Device #1: Ellesmere, 3264/4096 MB allocatable, 36MCU
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)
* Device #1: ATTENTION! OpenCL kernel self-test failed.
Your device driver installation is probably broken.
See also: https://hashcat.net/faq/wrongdriver
Speed.#1.........: 179.3 kH/s (87.14ms) @ Accel:128 Loops:64 Thr:256 Vec:1
Started: Mon Mar 18 19:05:55 2019
Stopped: Mon Mar 18 19:06:26 2019
PS D:\crack\hashcat-5.1.0>
Then I used the --self-test-disable option and it started. I have also created a post in the hashcat forum: https://hashcat.net/forum/thread-8226-post-44141.html#pid44141 in case anyone is interested.
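For reference, the benchmark invocation with the self-test skipped looks like this (the same command as above plus the extra flag):
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -b -m 2500 --self-test-disable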

running gem5 with SPEC2006

When running gem5 X86 in SE mode, I am trying to run bzip2 from SPEC2006. At first it was failing because it said it can't run dynamically linked executables, so I compiled it with the -static flag.
Now I get this error:
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Oct 27 2018 00:36:02
gem5 started Dec 22 2018 18:16:40
gem5 executing on Dan
command line: ./build/X86/gem5.opt configs/example/se.py -c /home/dan/SPEC2006/benchspec/CPU2006/401.bzip2/exe/bzip2_base.ia64-gcc42 -i /home/dan/SPEC2006/benchspec/CPU2006/401.bzip2/data/test/input/dryer.jpg
Could not import 03_BASE_FLAT
Could not import 03_BASE_NARROW
Global frequency set at 1000000000000 ticks per second
warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (4096 Mbytes)
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
**** REAL SIMULATION ****
info: Entering event queue @ 0. Starting simulation...
panic: Tried to write unmapped address 0xffffedd8. Inst is at 0x400da4
@ tick 5500
[invoke:build/X86/arch/x86/faults.cc, line 160]
Memory Usage: 4316736 KBytes
Program aborted at tick 5500
Aborted (core dumped)
I am running gem5 on Ubuntu 17.10.
I tried to find solutions on Google, but I didn't see anyone referring to this problem. Does anyone know how to fix it?
Please check your host machine configuration. bzip2 does not work on a 32-bit machine. My desktop is a dual-core 32-bit x86 machine, and when I tried to run bzip2 it showed the same error.
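A quick sanity check, sketched here with the paths from the question, is to confirm that the host is 64-bit and that the benchmark binary really is a static 64-bit executable:
uname -m    # should print x86_64 on a 64-bit host
file /home/dan/SPEC2006/benchspec/CPU2006/401.bzip2/exe/bzip2_base.ia64-gcc42
# expect something like: ELF 64-bit LSB executable, x86-64, ..., statically linked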

Segmentation fault (core dumped) on tf.Session()

I am new to TensorFlow.
I just installed TensorFlow and, to test the installation, I tried the following code; as soon as I initiate the TF session, I get the Segmentation fault (core dumped) error.
bafhf@remote-server:~$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/home/bafhf/anaconda3/envs/ismll/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
>>> tf.Session()
2018-05-15 12:04:15.461361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1349] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
Segmentation fault (core dumped)
My nvidia-smi is:
Tue May 15 12:12:26 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:04:00.0 Off | 0 |
| N/A 38C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 On | 00000000:05:00.0 Off | 2 |
| N/A 31C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
And nvcc --version is:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
Also gcc --version is:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Following is my PATH:
/home/bafhf/bin:/home/bafhf/.local/bin:/usr/local/cuda/bin:/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib:/home/bafhf/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
and the LD_LIBRARY_PATH:
/usr/local/cuda/bin:/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
I am running this on a server and I don't have root privileges. Still I managed to install everything as per the instructions on the official website.
Edit: New observations:
It seems the GPU allocates memory for the process for a second, and then the Segmentation fault (core dumped) error is thrown.
Edit 2: Changed TensorFlow version
I downgraded my TensorFlow version from v1.8 to v1.5. The issue still remains.
Is there any way to address or debug this issue?
This could possibly occur because you are using multiple GPUs here. Try setting CUDA_VISIBLE_DEVICES to just one of the GPUs. See this link for instructions on how to do that. In my case, this solved the problem.
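A minimal sketch of that, assuming you want to expose only the first K80 (device 0, the one without ECC errors):
export CUDA_VISIBLE_DEVICES=0                       # hide GPU 1 from TensorFlow
python -c "import tensorflow as tf; tf.Session()"   # should no longer segfault if GPU 1 was the culprit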
If you look at the nvidia-smi output, the second GPU has an ECC error count of 2. This error manifests itself irrespective of the CUDA or TF version, usually as a segfault, and sometimes with the CUDA_ERROR_ECC_UNCORRECTABLE flag in the stack trace.
I got to this conclusion from this post:
"Uncorrectable ECC error" usually refers to a hardware failure. ECC is
Error Correcting Code, a means to detect and correct errors in bits
stored in RAM. A stray cosmic ray can disrupt one bit stored in RAM
every once in a great while, but "uncorrectable ECC error" indicates
that several bits are coming out of RAM storage "wrong" - too many for
the ECC to recover the original bit values.
This could mean that you have a bad or marginal RAM cell in your GPU
device memory.
Marginal circuits of any kind may not fail 100%, but are more likely
to fail under the stress of heavy use - and associated rise in
temperature.
A reboot is usually supposed to clear the ECC error. If not, it seems the only option is to replace the hardware.
So what did I do, and how did I finally fix the issue?
I tested my code on a separate machine with an NVIDIA 1050 Ti, and it executed perfectly fine.
I made the code run only on the first card, for which the ECC value was normal, just to narrow down the issue. I did this by setting the CUDA_VISIBLE_DEVICES environment variable, following this post.
I then requested a restart of the Tesla K80 server to check whether a restart could fix the issue; it took a while, but the server was restarted.
Now the issue is gone and I can run both cards for my TensorFlow implementations.
In case anyone is still interested: I happened to have the same issue, with the "Volatile Uncorr. ECC" output. My problem was incompatible versions, as shown below:
Loaded runtime CuDNN library: 7.1.1 but source was compiled with:
7.2.1. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a
binary install, upgrade your CuDNN library. If building from sources,
make sure the library loaded at runtime is compatible with the version
specified during compile configuration. Segmentation fault
After I upgraded the cuDNN library to 7.3.1 (which is greater than 7.2.1), the segmentation fault error disappeared. To upgrade, I did the following (as also documented here).
Download CuDNN library from NVIDIA website
sudo tar -xzvf [TAR_FILE]
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
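To confirm which cuDNN version is actually in place after copying the files (a quick check, assuming the default /usr/local/cuda prefix used above):
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h   # prints CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL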
I was also facing the same issue. I have a workaround you can try.
I followed these steps:
1. Reinstall Python 3.5 or above.
2. Reinstall CUDA and add the cuDNN libraries to it.
3. Reinstall the TensorFlow 1.8.0 GPU version.
Check that you are using the exact versions of CUDA and cuDNN required by TensorFlow, and also that you are using the graphics card driver version that comes with that CUDA version.
I once had a similar issue with a driver that was too recent. Downgrading it to the version shipped with the CUDA release required by TensorFlow solved the issue for me.
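A quick way to collect the three version numbers to compare against TensorFlow's compatibility table (the same commands already used elsewhere in this thread):
nvidia-smi | head -n 3        # shows the driver version
nvcc --version                # shows the CUDA toolkit version
python -c "import tensorflow as tf; print(tf.__version__)"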
I encountered this problem recently.
The reason was multiple GPUs in a Docker container.
The solution is pretty simple; you either:
set CUDA_VISIBLE_DEVICES in the host
(refer to https://stackoverflow.com/a/50464695/2091555)
or
use --ipc=host to launch Docker if you need multiple GPUs,
e.g.:
docker run --runtime nvidia --ipc host \
  --rm -it \
  nvidia/cuda:10.0-cudnn7-runtime-ubuntu16.04
This problem is actually pretty nasty: the segfault happens during cuInit() calls in the Docker container, while everything works fine on the host. I will leave the log here so that search engines can help other people find this answer more easily.
(base) root@e121c445c1eb:~# conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
Collecting package metadata (current_repodata.json): / Segmentation fault (core dumped)
(base) root@e121c445c1eb:~# gdb python /data/corefiles/core.conda.572.1569384636
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
warning: core file may not match specified executable file.
[New LWP 572]
[New LWP 576]
warning: Unexpected size of section `.reg-xstate/572' in core file.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/conda/bin/python /opt/conda/bin/conda upgrade conda'.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Unexpected size of section `.reg-xstate/572' in core file.
#0 0x00007f829f0a55fb in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
[Current thread is 1 (Thread 0x7f82bbfd7700 (LWP 572))]
(gdb) bt
#0 0x00007f829f0a55fb in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#1 0x00007f829f06e3a5 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#2 0x00007f829f07002c in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#3 0x00007f829f0e04f7 in cuInit () from /usr/lib/x86_64-linux-gnu/libcuda.so
#4 0x00007f82b99a1ec0 in ffi_call_unix64 () from /opt/conda/lib/python3.7/lib-dynload/../../libffi.so.6
#5 0x00007f82b99a187d in ffi_call () from /opt/conda/lib/python3.7/lib-dynload/../../libffi.so.6
#6 0x00007f82b9bb7f7e in _call_function_pointer (argcount=1, resmem=0x7ffded858980, restype=<optimized out>, atypes=0x7ffded858940, avalues=0x7ffded858960, pProc=0x7f829f0e0380 <cuInit>,
flags=4353) at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/callproc.c:827
#7 _ctypes_callproc () at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/callproc.c:1184
#8 0x00007f82b9bb89b4 in PyCFuncPtr_call () at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/_ctypes.c:3969
#9 0x000055c05db9bd2b in _PyObject_FastCallKeywords () at /tmp/build/80754af9/python_1553721932202/work/Objects/call.c:199
#10 0x000055c05dbf7026 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1553721932202/work/Python/ceval.c:4619
#11 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1553721932202/work/Python/ceval.c:3124
#12 0x000055c05db9a79b in function_code_fastcall (globals=<optimized out>, nargs=0, args=<optimized out>, co=<optimized out>)
at /tmp/build/80754af9/python_1553721932202/work/Objects/call.c:283
#13 _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1553721932202/work/Objects/call.c:408
#14 0x000055c05dbf2846 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1553721932202/work/Python/ceval.c:4616
#15 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1553721932202/work/Python/ceval.c:3124
... (stack omitted)
#46 0x000055c05db9aa27 in _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1553721932202/work/Objects/call.c:433
---Type <return> to continue, or q <return> to quit---q
Quit
Another attempt was installing with pip:
(base) root@e121c445c1eb:~# pip install torch torchvision
(base) root@e121c445c1eb:~# python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
Segmentation fault (core dumped)
(base) root@e121c445c1eb:~# gdb python /data/corefiles/core.python.28.1569385311
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
warning: core file may not match specified executable file.
[New LWP 28]
warning: Unexpected size of section `.reg-xstate/28' in core file.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bt
Core was generated by `python'.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Unexpected size of section `.reg-xstate/28' in core file.
#0 0x00007ffaa1d995fb in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) bt
#0 0x00007ffaa1d995fb in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007ffaa1d623a5 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffaa1d6402c in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffaa1dd44f7 in cuInit () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffaee75f724 in cudart::globalState::loadDriverInternal() () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#5 0x00007ffaee760643 in cudart::__loadDriverInternalUtil() () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#6 0x00007ffafe2cda99 in __pthread_once_slow (once_control=0x7ffaeebe2cb0 <cudart::globalState::loadDriver()::loadDriverControl>,
... (stack omitted)
I am using TensorFlow in a cloud environment from Paperspace.
Updating to cuDNN 7.3.1 did not work for me.
One way is to build TensorFlow with proper GPU and CPU support.
This is not a proper solution, but it solved my issue temporarily (downgrade TensorFlow to 1.5.0):
pip uninstall tensorflow-gpu
pip install tensorflow==1.5.0
pip install numpy==1.14.0
pip install six==1.10.0
pip install joblib==0.12
Hope this helps!

How to make a KVM VM use shadow page tables?

I want to measure shadow page table performance vs. EPT. I know that in KVM, EPT and shadow page tables share some code paths, and there is a switch that checks whether EPT is enabled.
So I turned off EPT; I think this is the way to make a KVM VM use shadow page tables.
I executed some commands on the host:
# cat /sys/module/kvm_intel/parameters/ept
Y //check if EPT is enabled, Yes it is
# rmmod kvm_intel
# modprobe kvm_intel ept=0,vpid=0 //Re-modprobe kvm_intel, but turn off EPT and VPID
# cat /sys/module/kvm_intel/parameters/ept
N // EPT is disabled
# cat /sys/module/kvm_intel/parameters/vpid
Y //VPID is still enabled
After these commands, I expected EPT to be disabled. However, when I create a VM with 4 vCPUs, htop inside the VM only shows 1 vCPU. I don't know why.
So I turned EPT on again, and then I can see 4 vCPUs in the VM using htop. But on another server, when I turn EPT off, I can still see 4 vCPUs in the VM.
On the host, I checked the number of QEMU threads:
pstree -p | grep qemu
|-qemu-kvm(20594)-+-{qemu-kvm}(20612)
| |-{qemu-kvm}(20613)
| |-{qemu-kvm}(20614)
| |-{qemu-kvm}(20615)
| |-{qemu-kvm}(20618)
| `-{qemu-kvm}(23429)
There are still multiple threads.
In the KVM VM, I used lscpu to check:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0
Off-line CPU(s) list: 1-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
Stepping: 4
CPU MHz: 1999.999
BogoMIPS: 3999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase tsc_adjust smep erms xsaveopt
VPID is still enabled because your command format was incorrect; the parameters must be separated by spaces, not commas:
# modprobe kvm_intel ept=0 vpid=0
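With the parameters separated by spaces, the full reload sequence from the question should then report both features disabled:
# rmmod kvm_intel
# modprobe kvm_intel ept=0 vpid=0
# cat /sys/module/kvm_intel/parameters/ept
N
# cat /sys/module/kvm_intel/parameters/vpid
N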

Processing a log file using Pig

I have the following logs. Can anyone tell me how I can process them using Pig Latin?
**
SYSTEM IP:192.168.68.78
Distro info:Red Hat Enterprise Linux Server release 6.6 (Santiago)
Kernel:Linux bugzilla-blr-in 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
Uptime:12:27:42 up 8 days, 17:57, 0 users, load average: 0.00, 0.00, 0.00
Memory:Total:1869Mb Memory:Used:1567Mb Memory:Free:302Mb
Swap:Total:1999Mb Swap:Used:0Mb Swap:Free: 1999Mb
Architecture:x86_64
Processor:0:Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
Date:Wed Jun 29 12:27:42 IST 2016
SCRIPT USER
User:aimsadm (uid:503)
Groups:aimsadm
Working dir:/home/aimsadm
Home dir:/home/aimsadm
NETWORK DETAILS
Hostname:bugzilla-blr-in
IP ( ):127.0.0.1/8
IP (eth0):192.168.68.78/24
Gateway:192.168.68.1
Name Server:8.8.8.8
Name Server:192.168.68.80
LIST OF USERS:sdudam,sudutha,djegathesa,aimsadm,krishnang,
CLAMD STATUS: CLAM AV service is stopped or not installed
NAGIOS STATUS: Nagios service is running
OSSEC STATUS: Ossec service is stopped or not installed
NTPD STATUS: NTP service is running
HARDENING STATUS:Hardening Done
AD INTEGRATION STATUS:AD Integration Not Done
HARDWARE/PLATFORM DETAILS
Hardware Platform:64Bit
Hardware Info :DMI 2.3 present.
DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
OS DETAILS
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Linux bugzilla-blr-in 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
CPU INFO
model name : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
MEMORY INFO
MemTotal: 1914776 kB
RAM:1 GB
HARD DISK DETAILS
MOUNT DETAILS
Filesystem:/dev/mapper/vg_bugzillablrin-LogVol00,Type:ext4,Total Size:22G,Used:2.4G,Avail:19G,Use%:12%,Mounted on:/
Filesystem:tmpfs,Type:tmpfs,Total Size:981M,Used:0,Avail:981M,Use%:0%,Mounted on:/dev/shm
Filesystem:/dev/sda1,Type:ext4,Total Size:297M,Used:95M,Avail:186M,Use%:34%,Mounted on:/boot
Filesystem:/dev/mapper/vg_bugzillablrin-LogVol01,Type:ext4,Total Size:21G,Used:5.8G,Avail:14G,Use%:30%,Mounted on:/var
LSBLK OUTPUT
NAME:sr0,
MAJ:MIN:11:0,RM:1,SIZE:1024M,RO:0,TYPE:rom,MOUNTPOINT::
NAME:sda,
MAJ:MIN:8:0,RM:0,SIZE:60G,RO:0,TYPE:disk,MOUNTPOINT::
NAME:sda1,
MAJ:MIN:8:1,RM:0,SIZE:300M,RO:0,TYPE:part,MOUNTPOINT::/boot
NAME:sda2,
MAJ:MIN:8:2,RM:0,SIZE:59.7G,RO:0,TYPE:part,MOUNTPOINT::
RUNNING SERVICES
auditd running...
crond running...
messagebus running...
nrpe running...
ntpd running...
rhnsd running...
rhsmcertd running...
rpcbind running...
openssh-daemon running...
**
SYSTEM IP:192.168.68.35
Distro info:CentOS release 6.6 (Final)
Kernel:Linux altifin-ci-app 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Uptime:12:28:06 up 48 days, 20:31, 0 users, load average: 0.00, 0.00, 0.00
Memory:Total:11903Mb Memory:Used:1277Mb Memory:Free:10625Mb
Swap:Total:8191Mb Swap:Used:0Mb Swap:Free: 8191Mb
Architecture:x86_64
Processor:0:Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Processor:1:Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Date:Wed Jun 29 12:28:06 IST 2016
SCRIPT USER
User:aimsadm (uid:509)
Groups:aimsadm
Working dir:/home/aimsadm
Home dir:/home/aimsadm
NETWORK DETAILS
Hostname:altifin-ci-app
IP (lo):127.0.0.1/8
IP (eth0):192.168.68.35/24
Gateway:192.168.68.1
Name Server:192.168.68.10
Name Server:192.168.68.4
LIST OF USERS:altipay,aramesh,sdudam,nagios,kpankaj,sudutha,miyappan,skosanam,djegathesa,aimsadm,
CLAMD STATUS: CLAM AV service is stopped or not installed
NAGIOS STATUS: Nagios service is running
OSSEC STATUS: Ossec service is stopped or not installed
NTPD STATUS: NTP service is running
HARDENING STATUS:Hardening Done
AD INTEGRATION STATUS:AD Integration Not Done
HARDWARE/PLATFORM DETAILS
Hardware Platform:64Bit
Hardware Info :DMI 2.3 present.
DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
OS DETAILS
CentOS release 6.6 (Final)
Linux altifin-ci-app 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
CPU INFO
model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
MEMORY INFO
MemTotal: 12189032 kB
RAM:11 GB
HARD DISK DETAILS
MOUNT DETAILS
Filesystem:/dev/mapper/vg_altifinci-LogVol01,Type:ext4,Total Size:203G,Used:80G,Avail:113G,Use%:42%,Mounted on:/
Filesystem:tmpfs,Type:tmpfs,Total Size:6.3G,Used:0,Avail:6.3G,Use%:0%,Mounted on:/dev/shm
Filesystem:/dev/sda1,Type:ext4,Total Size:500M,Used:64M,Avail:410M,Use%:14%,Mounted on:/boot
LSBLK OUTPUT
NAME:sr0,
MAJ:MIN:11:0,RM:1,SIZE:1024M,RO:0,TYPE:rom,MOUNTPOINT::
NAME:sda,
MAJ:MIN:8:0,RM:0,SIZE:200G,RO:0,TYPE:disk,MOUNTPOINT::
NAME:sda1,
MAJ:MIN:8:1,RM:0,SIZE:500M,RO:0,TYPE:part,MOUNTPOINT::/boot
NAME:sda2,
MAJ:MIN:8:2,RM:0,SIZE:199.5G,RO:0,TYPE:part,MOUNTPOINT::
RUNNING SERVICES
abrtd running...
abrt-dump-oops running...
acpid running...
atd running...
auditd running...
automount running...
crond running...
cupsd running...
hald running...
mcelog running...
messagebus running...
MySQL but
rpc.statd running...
nrpe running...
ntpd running...
rpcbind running...
openssh-daemon running...
Yes.
There is a way. Let me explain. Though the given sample data falls into the category of 'unstructured', we always look for 'something' in it.
Having said that, we look for a pattern, say a line or lines containing the required data you are interested in.
To achieve this, we need to identify the 'pattern' in the sample data and use an appropriate regular expression (regex) to pull it out.
Also, Pig comes with the built-in 'piggybank' jar to support various pre-defined file formats, including unstructured ones like yours. Try the 'RegExLoader' class, which is part of the package below from Pig's piggybank:
(Package org.apache.pig.piggybank.storage)
https://pig.apache.org/docs/r0.15.0/api/
Also, let us all know the exact output you are looking for.
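To make this concrete, here is a rough, untested sketch of how such a loader could pull out, say, the SYSTEM IP lines from this log, using MyRegExLoader (the concrete piggybank subclass of RegExLoader that accepts an inline pattern); the piggybank jar path, input path, and regex are placeholders to adapt to the fields you actually need:
cat > parse_report.pig <<'EOF'
-- placeholder paths; adjust to your environment
REGISTER /usr/lib/pig/piggybank.jar;
-- MyRegExLoader keeps only the lines matching the pattern and emits the capture groups as fields
ips = LOAD '/data/system_report.log'
      USING org.apache.pig.piggybank.storage.MyRegExLoader('^SYSTEM IP:(.*)$')
      AS (system_ip:chararray);
DUMP ips;
EOF
pig -x local parse_report.pig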