collectd: What are the units of the measurements captured by the disk, interface, irq, and swap_io plugins?

What are the units of the measurements captured by the disk, interface, irq, and swap_io collectd plugins?
I'm comparing collectd 5 running on one machine with collectd 4 running on another, and trying to configure them both so they capture the same metrics.
disk reports fractional floats on collectd 5 (KiB?), integers on 4 (bytes).
interface reports fractional floats on collectd 5 (KiB?), integers on 4 (bytes).
irq apparently reports percentages on collectd 5, cumulative jiffies(?) on 4.
swap_io reports floats (with an occasional 'nan') on collectd 5, integers on 4.
Additionally, according to the collectd wiki, the cpu plugin is supposed to capture jiffies when ValuesPercentage is false and both ReportByCpu and ReportByState are true (their defaults). The collectd 4 cpu plugin does not offer these options and measures in jiffies. I set the collectd.conf of the version 5 instance to the settings just mentioned (even explicitly setting the defaults) and... it stubbornly continues to report percentages instead of jiffies!
Update: After taking a close look at the code for collectd 5, it has become clear that the collectd wiki is lying: when ValuesPercentage is false and both ReportByCpu and ReportByState are true, the cpu plugin reports rates of change and not cumulative jiffies (as in collectd 4).
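To make the difference concrete, here is a hypothetical sketch (not collectd source code) of the two reporting styles. collectd 4 submits the raw cumulative jiffy counter it reads, while collectd 5 with ValuesPercentage false divides the delta between samples by the elapsed time:

```python
# Hypothetical illustration of the two reporting styles; not collectd code.

# collectd 4: submit the cumulative jiffy counter as-is.
def report_v4(jiffies_now):
    return jiffies_now  # monotonically increasing counter

# collectd 5 (ValuesPercentage false): submit the rate of change instead.
def report_v5(jiffies_prev, jiffies_now, interval_seconds):
    return (jiffies_now - jiffies_prev) / interval_seconds  # jiffies/second

print(report_v4(123456))                # 123456 (raw counter)
print(report_v5(123456, 123756, 60.0))  # 5.0 (jiffies/second)
```

So with a 60-second interval, a counter that grew by 300 jiffies shows up as 5.0 on collectd 5 but as the raw running total on collectd 4.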
Further Update: I had misinterpreted the collectd 4 behaviour for swap_io. Turns out the swap metrics are in bytes while swap_io is in pages (and you can't configure that), so the correct collectd 5 setting is ReportBytes = false and not ReportBytes = true. That's one less discrepancy.
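For anyone reconciling the two data sets after the fact: a page is page-size bytes (4096 on most Linux/x86 systems, but query it rather than assume). A rough conversion sketch:

```python
import resource

# Page size in bytes; 4096 on most Linux/x86 systems, but query it
# instead of hardcoding it.
PAGE_SIZE = resource.getpagesize()

def pages_to_bytes(pages):
    # Convert a swap_io measurement in pages (collectd 4 / ReportBytes false)
    # to bytes (the ReportBytes true representation in collectd 5).
    return pages * PAGE_SIZE

print(pages_to_bytes(10))  # e.g. 40960 when the page size is 4096
```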
collectd.conf for version 4:
# We're running collectd 4.10.9
# FQDNLookup since 4.3, became true by default with 5.0
FQDNLookup false
##
## Interval (in seconds) at which to query values.
## Starting with 5.4.3, this may be overridden on a per-plugin basis.
Interval 60
##
## Client part
##
## Logging (only one allowed)
##
# syslog since 4.0
LoadPlugin syslog
<Plugin syslog>
LogLevel info
</Plugin>
##
## Inputs
##
# cpu since 1.3
LoadPlugin cpu
# df since 3.6
LoadPlugin df
<Plugin df>
FSType tmpfs
IgnoreSelected true
# ReportByDevice since 4.8
ReportByDevice true
# ReportInodes false
# ReportReserved false by default; the option became true and was removed with version 5
ReportReserved true
# ValuesAbsolute, ValuesPercentage starting with collectd 5.4
</Plugin>
# disk since 1.5
LoadPlugin disk
# interface since 1.0
LoadPlugin interface
# irq since 4.0
LoadPlugin irq
# load since 1.0 (not sure about ReportRelative)
LoadPlugin load
<Plugin load>
ReportRelative true
</Plugin>
# memory since 1.0
# No options for 4.10
LoadPlugin memory
# nfs since 3.3
LoadPlugin nfs
# processes since 3.2
LoadPlugin processes
<Plugin processes>
# ProcessMatch since 4.5
ProcessMatch "all" "(.*)"
</Plugin>
# protocols since 4.7
LoadPlugin protocols
# swap since 2.1
# No options for 4.10
LoadPlugin swap
<Plugin swap>
# 4.10 reports swap I/O in pages (and cannot be configured otherwise)
# 4.10 reports swap cached/free/used in bytes
</Plugin>
# tcpconns since 4.2
LoadPlugin tcpconns
# thermal since 4.5
LoadPlugin thermal
# uptime since 4.7
LoadPlugin uptime
##
## Server part
##
## Output (only one allowed)
##
# csv since 4.0
LoadPlugin csv
<Plugin csv>
DataDir "/var/collectd/csv"
# StoreRates since 4.3
StoreRates false
</Plugin>
# write_graphite since 5.1
# write_http since 4.8
collectd.conf for version 5:
# We are running collectd 5.4.0.git
# FQDNLookup since 4.3, became true by default with 5.0
FQDNLookup false
# Interval (in seconds) at which to query values. This may be overridden on
# a per-plugin basis using the 'Interval' option of the LoadPlugin block.
# This capability was announced with version 5.2 but became functional
# only with 5.4.3.
Interval 60
# Logging
# syslog since 4.0
LoadPlugin syslog
<Plugin syslog>
LogLevel info
</Plugin>
# LoadPlugin section
# cpu since 1.3
LoadPlugin cpu
<Plugin cpu>
# ReportByCpu true
# ReportByState true
# ValuesPercentage starting with collectd 5.5
# ValuesPercentage true
# ValuesPercentage false forces measurements in jiffies/second when both
# ReportByCpu and ReportByState are true (the defaults); collectd 4 does
# not do time derivation (and time derivation cannot be turned off in 5)
ValuesPercentage false
</Plugin>
# df since 3.6
LoadPlugin df
<Plugin df>
FSType rootfs
FSType sysfs
FSType proc
FSType devtmpfs
FSType devpts
FSType tmpfs
FSType fusectl
FSType cgroup
IgnoreSelected true
# ReportByDevice since 4.8
ReportByDevice true
# ValuesAbsolute, ValuesPercentage since 5.4
# ValuesAbsolute for reporting in bytes (true by default)
# ValuesAbsolute true
ValuesPercentage false
</Plugin>
# disk since 1.5
LoadPlugin disk
# irq since 4.0
LoadPlugin irq
# load since 1.0 (not sure about ReportRelative)
LoadPlugin load
<Plugin load>
ReportRelative true
</Plugin>
# memory since 1.0
# No options for 4.10
LoadPlugin memory
<Plugin memory>
ValuesAbsolute true
ValuesPercentage false
</Plugin>
# nfs since 3.3
LoadPlugin nfs
# processes since 3.2
LoadPlugin processes
<Plugin processes>
# ProcessMatch since 4.5
ProcessMatch "all" "(.*)"
</Plugin>
# protocols since 4.7
LoadPlugin protocols
# swap since 2.1
# No options for 4.10
LoadPlugin swap
<Plugin swap>
ReportByDevice true
# ReportIO true
# ReportBytes is false by default; when false, swap I/O is in pages
# 4.10 reports swap I/O in pages (and cannot be configured otherwise)
ReportBytes false
ValuesAbsolute true
ValuesPercentage false
# 4.10 reports swap cached/free/used in bytes, swap_io in/out in pages
</Plugin>
# tcpconns since 4.2
LoadPlugin tcpconns
# thermal since 4.5
LoadPlugin thermal
# uptime since 4.7
LoadPlugin uptime
# Server part
LoadPlugin write_graphite
<Plugin write_graphite>
<Node "node-graphite-1">
Host "192.168.1.170"
Port "1111"
Protocol "tcp"
EscapeCharacter "_"
AlwaysAppendDS true
SeparateInstances false
</Node>
</Plugin>
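For reference, write_graphite sends each value over TCP as a plaintext line of the form "path value timestamp". The sketch below shows roughly how a collectd identifier maps to such a line under the settings above; the host and function names are hypothetical, and this is an approximation of the mapping, not write_graphite source code. The relevant knobs are EscapeCharacter (replaces '.' in identifier parts) and AlwaysAppendDS (appends the data-source name):

```python
def graphite_line(host, plugin, plugin_instance, type_, type_instance, ds_name,
                  value, timestamp, escape_character="_", always_append_ds=True):
    # Hypothetical sketch of the collectd -> Graphite path mapping.
    # EscapeCharacter replaces '.' so host names don't split the metric path.
    parts = [
        host.replace(".", escape_character),
        plugin + ("-" + plugin_instance if plugin_instance else ""),
        type_ + ("-" + type_instance if type_instance else ""),
    ]
    if always_append_ds:  # AlwaysAppendDS true appends the data-source name.
        parts.append(ds_name)
    return "%s %f %d" % (".".join(parts), value, timestamp)

print(graphite_line("web01.example.com", "interface", "eth0",
                    "if_octets", "", "rx", 1234.0, 1514764800))
# web01_example_com.interface-eth0.if_octets.rx 1234.000000 1514764800
```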

disk reports cumulative operations (ops) and also cumulative bytes (Io).
interface reports cumulative packet and error counts on the one hand, and cumulative bytes on the other.
swap_io reports cumulative operations.
Can you post your exact full config, please, so we can troubleshoot your collectd 5 CPU config?

Well, that turned out to be a simple, stupid mistake. The csv plugin used on one side defaults StoreRates to false, while the write_graphite plugin used on the other defaults it to true; that is why some (but not all) metrics were integers (cumulative counts) while others were floats (rates of increase).
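The effect of that default mismatch can be sketched as follows (hypothetical numbers; StoreRates false writes the raw counter, StoreRates true writes its derivative):

```python
# Hypothetical sketch of the StoreRates difference for a cumulative counter
# sampled every 60 seconds (e.g. interface octets).
samples = [(0, 1000), (60, 4000), (120, 10000)]  # (timestamp, counter value)

# StoreRates false (csv default): the counter is written as-is.
raw = [value for _, value in samples]

# StoreRates true (write_graphite default): consecutive samples are
# differenced and divided by the elapsed time, yielding float rates.
rates = [
    (v2 - v1) / (t2 - t1)
    for (t1, v1), (t2, v2) in zip(samples, samples[1:])
]

print(raw)    # [1000, 4000, 10000]
print(rates)  # [50.0, 100.0]
```

Hence integer counters on the csv side and fractional floats on the Graphite side, from the very same plugin output.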

Related

Flink standalone cluster: SIGSEGV crashing TaskManager

We have a simple standalone session cluster (20 TaskManagers) with several Flink streaming jobs.
Periodically (maybe a couple of times a month) one of our TaskManagers dies with a SIGSEGV error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f8302eab031, pid=3947, tid=0x00007f82876f6700
#
# JRE version: OpenJDK Runtime Environment (8.0_272-b10) (build 1.8.0_272-8u272-b10-0+deb9u1-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.272-b10 mixed mode linux-amd64 )
# Problematic frame:
# V [libjvm.so+0x5b6031]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# //hs_err_pid3947.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
hs_err_pid content is here
As I understand it, the problem is somewhere in native code, and I suppose it could be a RocksDB error (we use RocksDB as the state backend in our jobs).
All the information about similar errors on the internet is pretty old, e.g. https://issues.apache.org/jira/browse/FLINK-8309
We use Flink 1.10.0.
I would be glad of any help or advice; I have no idea how to localize the problem.

Java 7 supported Application crashes on Mojave

My application is built on:
jdk1.7.0_76
JavaFx2.2.76_b13
Netbeans IDE
It ran successfully up to macOS High Sierra.
When I try to run the application on Mojave using NetBeans, it crashes with the following error:
Launching <fx:deploy> task from /Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre/../lib/ant-javafx.jar
jfx-deployment-script:
jfx-deployment:
jar:
objc[8382]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre/bin/java (0x1018244c0) and /Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre/lib/jli/./libjli.dylib (0x10b4f3480). One of the two will be used. Which one is undefined.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGILL (0x4) at pc=0x00007fff4200543b, pid=8382, tid=775
#
# JRE version: Java(TM) SE Runtime Environment (7.0_80-b15) (build 1.7.0_80-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C [CoreFoundation+0x13f43b] _CFRelease+0x434
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/rahulsharma/NetBeansProjects/CreatFXMLTst/hs_err_pid8382.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Java Result: 134
debug:
jfxsa-debug:
BUILD SUCCESSFUL (total time: 17 seconds)
We're seeing the exact same crash in Firefox (illegal instruction at that address); it's probably an issue in CoreFoundation:
Firefox crashes @ CoreFoundation+0x13f43b

SIGSEGV - Fatal Error in JavaFX Application - libjvm.so [duplicate]

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ff17a60c678, pid=4219, tid=140673779791616
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b124) (build 1.8.0-ea-b124)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b66 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x665678] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x38
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /media/data/K's World/javaFX/ChatApp/hs_err_pid4219.log
Compiled method (c1) 16675 988 3 java.util.concurrent.atomic.AtomicBoolean::set (14 bytes)
total in heap [0x00007ff16535ef50,0x00007ff16535f2a0] = 848
relocation [0x00007ff16535f070,0x00007ff16535f0a0] = 48
main code [0x00007ff16535f0a0,0x00007ff16535f1c0] = 288
stub code [0x00007ff16535f1c0,0x00007ff16535f250] = 144
metadata [0x00007ff16535f250,0x00007ff16535f258] = 8
scopes data [0x00007ff16535f258,0x00007ff16535f268] = 16
scopes pcs [0x00007ff16535f268,0x00007ff16535f298] = 48
dependencies [0x00007ff16535f298,0x00007ff16535f2a0] = 8
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
I am writing a chat app in JavaFX, using the Eclipse IDE.
My application was running well, but I don't know why it suddenly stopped.
It sounds like you're running JavaFX with Java 8 on Linux, and you've run into this bug:
https://bugs.openjdk.java.net/browse/JDK-8141687
App crashes while starting Main.class in JavaFx
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
ADDITIONAL OS VERSION INFORMATION : Mint17.2 Cinnamon 64Bit
SUGGESTION: Try a different version of Java/JavaFX.
Run sudo update-alternatives --config java to see what alternatives are already present on your system. I would downgrade to Java 1.7 if possible.
https://askubuntu.com/questions/272187/setting-jdk-7-as-default
If there are no suitable candidates, use apt-get install openjdk-7-jdk:
https://www.digitalocean.com/community/tutorials/how-to-install-java-on-ubuntu-with-apt-get
I had the same issue (except that it was java-8-oracle build 101) and found out why it was happening:
I have a login screen that appears before my main application and gets closed after the login occurs; apparently, closing it (or even hiding it) and then showing a new window makes it crash.

How can I run Tensorflow on one single core?

I'm using Tensorflow on a cluster and I want to tell Tensorflow to run only on one single core (even though there are more available).
Does someone know if this is possible?
To run Tensorflow on one single CPU thread, I use:
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
device_count limits the number of CPUs being used, not the number of cores or threads.
tensorflow/tensorflow/core/protobuf/config.proto says:
message ConfigProto {
  // Map from device type name (e.g., "CPU" or "GPU") to maximum
  // number of devices of that type to use. If a particular device
  // type is not found in the map, the system picks an appropriate
  // number.
  map<string, int32> device_count = 1;
On Linux you can run sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread" to see how many CPUs/cores/threads you have; e.g. the following machine has 2 CPUs, each with 8 cores, each core with 2 threads, giving a total of 2*8*2=32 threads:
fra@s:~$ sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread"
Socket Designation: CPU1
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Socket Designation: CPU2
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Tested with Tensorflow 0.12.1 and 1.0.0 with Ubuntu 14.04.5 LTS x64 and Ubuntu 16.04 LTS x64.
Yes, it is possible via thread affinity. Thread affinity lets you decide which specific core of the CPU executes a given thread. For thread affinity you can use "taskset" or "numactl" on Linux. You can also use https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html and https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
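Along the same lines, a minimal sketch using Python's os.sched_setaffinity (Linux-only; the choice of core 0 is arbitrary):

```python
import os

# Pin the current process (pid 0 means "self") to CPU core 0 only.
# Do this before TensorFlow spawns its worker threads, so they inherit
# the affinity mask. Core 0 is an arbitrary choice for illustration.
os.sched_setaffinity(0, {0})

print(os.sched_getaffinity(0))  # e.g. {0}
```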
Note that the following code will not instruct TensorFlow to run only on one single core.
TensorFlow 1
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
TensorFlow 2
import os
# reduce number of threads
os.environ['TF_NUM_INTEROP_THREADS'] = '1'
os.environ['TF_NUM_INTRAOP_THREADS'] = '1'
import tensorflow
This will create at least N threads in total, where N is the number of CPU cores. Most of the time only one thread will be running while the others sleep.
Sources:
https://github.com/tensorflow/tensorflow/issues/42510
https://github.com/tensorflow/tensorflow/issues/33627
You can restrict the number of devices of a certain type that TensorFlow uses by passing the appropriate device_count in a ConfigProto as the config argument when creating your session. For instance, you can restrict the number of CPU devices as follows:
config = tf.ConfigProto(device_count={'CPU': 1})
sess = tf.Session(config=config)
with sess.as_default():
    print(tf.constant(42).eval())

JBoss 7.1.1 Final occupying huge physical RAM on Linux

I am using JBoss 7.1.1 Final with JDK 1.6.0_45, and when starting JBoss I configured 5 GB for the heap and 1 GB for non-heap. My Linux machine's total RAM is around 60 GB. Some time after starting JBoss, the Linux top command shows around 50 GB of RAM in use. Moreover, in the jconsole and jvisualvm tools I can see my JBoss RAM utilization going up and down, peaking around 90% (approx. using between 4-5 GB).
top - 11:52:35 up 1 day, 17:40, 4 users, load average: 0.89, 1.20, 1.27
Tasks: 174 total, 1 running, 173 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.1%us, 1.8%sy, 0.0%ni, 85.8%id, 0.0%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 62.948G total, 49.872G used, 13.076G free, 347.309M buffers
Swap: 8197.219M total, 0.000k used, 8197.219M free, 2053.590M cached
And Jboss parameters are like this :
-D[Standalone] -server -Xms5120m -Xmx5120m -XX:MaxPermSize=1024m -XX:PermSize=1024m -Djava.net.preferIPv4Stack=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Dorg.jboss.resolver.warning=true -XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+UseParNewGC
Please help out: why is so much Linux RAM being consumed?
Regards
Veera