collectd disk aggregation does not work? - collectd

The following aggregates all the CPUs in one specific host, creating cpu-all-sum/cpu-idle, cpu-all-sum/cpu-nice, etc.
<Aggregation>
  Plugin "cpu"
  Type "cpu"
  SetPlugin "cpu"
  SetPluginInstance "all-%{aggregation}"
  GroupBy "Host"
  GroupBy "TypeInstance"
  CalculateSum true
</Aggregation>
However, the following does not work:
<Aggregation>
  Plugin "disk"
  PluginInstance "/xvd./"
  Type "disk"
  SetPlugin "disk"
  SetPluginInstance "all-%{aggregation}"
  GroupBy "Host"
  GroupBy "TypeInstance"
  CalculateSum true
</Aggregation>
... it is supposed to aggregate IO ops on all the "xvd" disks. It creates no files, and there's nothing in the log.
Any clues?

Here's what collectd logs for me in a similar situation with debug log level:
aggregation plugin: The "disk_octets" type (data set) has more than one data source. This is currently not supported by this plugin. Sorry.
I'm not quite sure, but it seems that the aggregation plugin does not support types with multiple data sources: cpu has only one value per sample, while disk has two (read and write).

Related

Pylint: same pylint and pandas version on 2 machines, 1 fails

I have 2 places running the same linting job:
Machine 1: Ubuntu over SSH
  pandas==1.2.3
  pylint==2.7.4
  Python 3.8.10
Machine 2: GitLab CI Docker image, python:3.8.12-buster
  pandas==1.2.3
  pylint==2.7.4
  Python 3.8.12
The Ubuntu machine is able to lint all the code fine, and has been for many months. The same was true for the CI job, except it had been running Python 3.7.8. Now that I've upgraded the Docker image to Python 3.8.12, it throws several no-member linting errors on some Pandas objects. I've tried clearing CI caches, etc.
I wish I could provide something more reproducible, but, to check my understanding of what a linter is doing: is it theoretically possible that a small version difference in Python messes up Pylint like this? For something like a no-member error on Pandas objects, I would think the dominant factor is the pandas version, but those are equal, so I'm confused!
Update:
I've looked at the Pandas code for pd.read_sql_query, which is what's causing the no-member error. It says:
def read_sql_query(
    sql,
    con,
    index_col=None,
    coerce_float=True,
    params=None,
    parse_dates=None,
    chunksize: Optional[int] = None,
) -> Union[DataFrame, Iterator[DataFrame]]:
In Docker, I get E1101: Generator 'generator' has no 'query' member (no-member) (because I'm running .query on the returned dataframe). So it seems Pylint thinks that this function returns a generator. But it does not make this assumption in my other setup. (I've also verified the SHA sum of pandas/io/sql.py matches). This seems similar to this issue, but I am still baffled by the discrepancy in environments.
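For context, a minimal, runnable sketch of the call pattern that triggers the warning (the in-memory SQLite connection and the column name are placeholders for illustration):
import sqlite3

import pandas as pd

# Placeholder connection; the real code uses a production database.
con = sqlite3.connect(":memory:")

# read_sql_query is annotated as returning Union[DataFrame, Iterator[DataFrame]];
# in the failing environment Pylint appears to settle on the iterator branch,
# so the .query() call below is flagged with E1101 (no-member).
df = pd.read_sql_query("SELECT 1 AS some_column", con)
filtered = df.query("some_column > 0")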
A fix that worked was to bump a limit like:
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"
in my .pylintrc file, as explained here.
I'm unsure why/if this is connected to my change in Python version, but I'm happy to use this and move on for now. It's probably complex.
(Another hack was to write a function that returns the passed argument if it is a DataFrame, and returns a single DataFrame if it is an iterable of DataFrames. The ambiguously-typed object could then be passed through this wrapper to clarify things for Pylint. While this was more intrusive on our codebase, we had dozens of calls to pd.read_csv and pd.read_sql_query, and only about 3 calls confused Pylint, so we almost used this solution.)
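A minimal sketch of that wrapper idea, assuming the chunked case should be collapsed into one frame (the name ensure_dataframe is hypothetical, and concatenating the chunks is just one possible reading of "returns a single DataFrame"):
from typing import Iterable, Union

import pandas as pd

def ensure_dataframe(result: Union[pd.DataFrame, Iterable[pd.DataFrame]]) -> pd.DataFrame:
    """Return a single DataFrame whether pandas gave us a frame or an iterator of chunks."""
    if isinstance(result, pd.DataFrame):
        return result
    # When chunksize is set, pandas yields chunks; combine them into one frame.
    return pd.concat(result, ignore_index=True)
Passing the result of pd.read_sql_query through such a wrapper gives Pylint a single concrete return type, so the no-member warning on .query should disappear.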

DataFrames, df."example vector in the data()" raises a MethodError

So I have a dataframe df from CSV.read(...), and it has a column labeled 'Population in thousands (2017)',
and I used the command
df."Population in thousands (2017)"
This used to work... but I installed some packages and created something, and now I get this error when I input
df."Population in thousands (2017)"
ERROR: MethodError: no method matching getproperty(::DataFrame, ::String)
Closest candidates are:
getproperty(::AbstractDataFrame, ::Symbol) at C:\Users\jerem\.julia\packages\DataFrames\S3ZFo\src\abstractdataframe\abstractdataframe.jl:295
getproperty(::Any, ::Symbol) at Base.jl:33
Stacktrace:
[1] top-level scope
@ REPL[10]:1
Thank you in advance.
I can confirm that this works on the current (at the time of writing) DataFrames release:
(jl_yo71eu) pkg> st
Status `...\AppData\Local\Temp\jl_yo71eu\Project.toml`
[a93c6f00] DataFrames v1.2.2
julia> using DataFrames
julia> df = DataFrame("Population in thousands (2017)" => rand(5));
julia> df."Population in thousands (2017)"
5-element Vector{Float64}:
0.8976467991472025
0.32646068570785514
0.5168819082429569
0.8488198612708232
0.27250141317576815
I'm assuming you're on an outdated version of DataFrames?
Edited to add, following discussion in the comments:
Bogumil can of course read your DataFrames version off the random folder name, so it appears you really are on an outdated version. You should do add DataFrames#1.2 in the package manager to force an upgrade, which will tell you which packages in your current environment are holding you back.

How to determine at runtime if the dl4j/nd4j backend is CPU or GPU?

There is an optimization for dl4j that only works with GPUs:
DataTypeUtil.setDTypeForContext(DataBuffer.Type.HALF)
I'd like to only make that call if the backend is a GPU.
In my Maven pom.xml, I've got
<!-- CPU or GPU -->
<nd4j.backend>nd4j-native-platform</nd4j.backend>
<!--<nd4j.backend>nd4j-cuda-8.0-platform</nd4j.backend>-->
And I was looking at ways to read that value from Java, all of which seem clunky. It would be much easier if I could query dl4j or nd4j for "What flavor of backend are we running?" and then make the optimization call based on that.
Edit from answer:
Nd4jBackend.load().let { be ->
    println("nd4j Backend: ${be.javaClass.simpleName}")
    if (be.javaClass.simpleName.toLowerCase().contains("gpu")) {
        println("Optimizing for GPU")
        DataTypeUtil.setDTypeForContext(DataBuffer.Type.HALF)
    }
}
See if you can use Nd4j.backend. Printing it with CUDA enabled, I get:
org.nd4j.linalg.jcublas.JCublasBackend
and without cuda:
org.nd4j.linalg.cpu.nativecpu.CpuBackend
It is also printed at startup when nd4j initializes; there should be a vendor string printed for the backend.

Getting CPU statistics from libvirt

Is it possible using the python bindings of libvirt to get the running, waiting and ready time of a VM from the host?
I'm not sure what you mean by "waiting" and "ready" time, but whatever that is, I believe it is possible using the getCPUStats() function called on a domain object. The docstring of that function follows:
getCPUStats(total, flags=0)
Extracts CPU statistics for a running domain. On success it will return a list of data of dictionary type. If boolean total is False or 0, the first element of the list refers to CPU0 on the host, second element is CPU1, and so on. The format of data struct is as follows:
[{cpu_time:xxx}, {cpu_time:xxx}, ...]
If it is True or 1, it returns total domain CPU statistics in the format of
[{cpu_time:xxx, user_time:xxx, system_time:xxx}]
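For example, a minimal sketch using the Python bindings (the connection URI "qemu:///system" and the domain name are assumptions; adjust them for your setup):
import libvirt

conn = libvirt.open("qemu:///system")  # hypothetical hypervisor URI
dom = conn.lookupByName("my-vm")       # hypothetical domain name

# total=True: one dict with cpu_time, user_time and system_time for the whole domain
print(dom.getCPUStats(True))

# total=False: one dict per host CPU, each containing that CPU's cpu_time
print(dom.getCPUStats(False))

conn.close()
These are cumulative CPU-time counters rather than scheduler-state breakdowns, so if "waiting" and "ready" refer to scheduler states, that information would likely have to come from other host-side sources.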

Accumulo-Pig error - Connector info for AccumuloInputFormat can only be set once per job

Versions:
Accumulo 1.5
Pig 0.10
Attempted:
Read and write data in Accumulo from Pig, using accumulo-pig.
Encountered an error - any insight into getting past this error is greatly appreciated.
Switching to Accumulo 1.4 is not an option as we are using the Accumulo Thrift Proxy in our C# codebase.
Impact:
This is currently a roadblock in our project.
Source reference:
Source code - https://git-wip-us.apache.org/repos/asf/accumulo-pig.git
Error:
In attempting to read a dataset in Accumulo from Pig, I am getting the following error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118:
Connector info for AccumuloInputFormat can only be set once per job
Code snippet:
DATA = LOAD 'accumulo://departments?instance=indra&user=root&password=xxxxxxx&zookeepers=cdh-dn01:2181' using org.apache.accumulo.pig.AccumuloStorage() AS (row, cf, cq, cv, ts, val);
dump DATA;
Try using the ACCUMULO-1783-1.5 branch from the same repository. The way that Pig sets up the InputFormat doesn't play nicely with how Accumulo sets up InputFormats (notably, Accumulo makes a funny assertion that you never call the same static method more than once for a Configuration).
I have been using Pig 0.12. I doubt there's a difference in how 0.10 sets up the InputFormats as opposed to 0.12, but I'm not positive, so YMMV.
I just pushed a fix to the above branch that gets rid of the previously mentioned limitation on Hadoop version.