Is there a way to obtain the total size of a device? Currently I can get free, used, and reserved metrics, but the total size of a device/file system doesn't seem to be available.
Update 1
The following aggregation plugin combinations were tried without the required results:
GroupBy Host+TypeInstance
GroupBy Host+PluginInstance
GroupBy Host+PluginInstance+TypeInstance
Sample configuration:
<Plugin aggregation>
  <Aggregation>
    Plugin "df"
    Type "df_complex"
    SetPlugin "df"
    GroupBy "Host"
    GroupBy "TypeInstance"
    CalculateSum true
  </Aggregation>
</Plugin>
I suggest you take a look at the aggregation plugin, which might enable you to compute this.
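For illustration, here is a hedged sketch of the kind of configuration the aggregation plugin expects: it sums the free, used, and reserved type instances of df_complex per host and mount point, which in principle adds up to the total size. The SetPlugin name is arbitrary, and whether this produces the result you need may depend on your collectd version and df plugin settings:
<Plugin aggregation>
  <Aggregation>
    # sum free + used + reserved for each host and mount point
    Plugin "df"
    Type "df_complex"
    SetPlugin "df_total"
    GroupBy "Host"
    GroupBy "PluginInstance"
    CalculateSum true
  </Aggregation>
</Plugin>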
I have a dataset with 200 rows. When setting
pd.options.display.max_rows = 200
we see all rows in a scrollable area:
But if we set it to less than the full dataset, thus requiring truncation, then we get a summary view and only 10 rows:
pd.options.display.max_rows = 100
How can the options be set to really display 100 rows?
It appears that this behavior is controlled, to the extent it can be, by display.large_repr:
display.large_repr : 'truncate'/'info'
For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can
show a truncated table (the default from 0.13), or switch to the view from
df.info() (the behaviour in earlier versions of pandas).
[default: truncate] [currently: truncate]
The default, truncate, gives the present behavior. info does not show any data at all:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Columns: 3 entries, column to column_val_cnt
dtypes: int64(1), object(2)
memory usage: 4.8+ KB
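For reference, a minimal snippet to reproduce both representations; the DataFrame below is only a stand-in for the 200-row frame described above:
import pandas as pd

# any 200-row frame will do; this one just mirrors the shape described above
df = pd.DataFrame({"column": range(200),
                   "column_val": ["v"] * 200,
                   "column_val_cnt": ["c"] * 200})

pd.options.display.max_rows = 100           # 200 rows > 100, so the repr is truncated
pd.options.display.large_repr = "truncate"  # default: truncated table (head and tail only)
print(df)

pd.options.display.large_repr = "info"      # switch to the df.info()-style summary instead
print(df)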
So there does not appear to be any way to achieve what I'm looking for.
Reference: Pandas Display Configurations
By the way, I also looked into an indirect way to get what I was looking for:
from IPython.display import display, HTML, Markdown
This did not work either. I'm getting close to "calling it a day" on this feature as "not supported".
I am trying to write PromQL on the Prometheus UI to get the CPU usage of all ReplicaSets and their containers, fixing the cluster, namespace, and deployment. My desired outcome is to graph the CPU usage of each {replicaset, container} pair on the same graph. Since there are no labels within container_cpu_usage_seconds_total that allow me to group by replicaset name, I am sure that I have to retrieve the replicaset name from the kube_pod_info metric, somehow aggregate by container, and then join the different metrics to get what I want.
Below is what I have come up with so far:
avg by (replicaset, container) (
  container_cpu_usage_seconds_total
    * on (replicaset) group_left (created_by_kind, created_by_name)
      (kube_pod_info{app_kubernetes_io_name="kube-state-metrics", cluster=~"${my_cluster}", namespace=~"${my_namespace}", created_by_kind=~"ReplicaSet"} * 0)
)
I got an error saying "many-to-many matching not allowed: matching labels must be unique on one side".
My desired output is:
*{r, c} denotes a particular {replicaset, container} pair
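In case it helps frame the intended join, here is a hedged sketch of the usual info-metric pattern. It assumes both series carry matching namespace and pod labels, that kube_pod_info always has the value 1 and is unique per (namespace, pod), and it uses rate() over 5m as a stand-in for "CPU usage"; label names may need adjusting in your setup:
# sketch only: join cadvisor CPU series to kube_pod_info on (namespace, pod)
sum by (created_by_name, container) (
  rate(container_cpu_usage_seconds_total{cluster=~"${my_cluster}", namespace=~"${my_namespace}", container!=""}[5m])
  * on (namespace, pod) group_left (created_by_name)
    kube_pod_info{app_kubernetes_io_name="kube-state-metrics", cluster=~"${my_cluster}", namespace=~"${my_namespace}", created_by_kind="ReplicaSet"}
)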
I am using Spark 3.x with Python. I have millions of rows of data in CSV files that I have to index in Apache Solr.
I have installed the pysolr module for this purpose:
import pysolr
def index_module(row):
...
solr_client = pysolr.Solr(SOLR_URI)
solr_client.add(row)
...
df = spark.read.format("csv").option("sep", ",").option("quote", "\"").option("escape", "\\").option("header", "true").load("sample.csv")
df.toJSON().map(index_module).count()
index_module simply gets one row of the data frame as JSON and then indexes it in Solr via the pysolr module. pysolr supports indexing a list of documents instead of a single one. I have to update my logic so that instead of sending one document per request, I send a list of documents; that will definitely improve performance.
How can I achieve this in PySpark? Is there any alternative or better approach instead of map and toJSON?
Also, all my work is done in transformation functions; I am using count just to start the job. Is there an alternative dummy function (of action type) in Spark to do the same?
Finally, I have to create a Solr object for every row; is there an alternative to this?
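Not a definitive answer, but a minimal sketch of the batched approach described above. It assumes foreachPartition-style batching (an action, so no dummy count() is needed) and that SOLR_URI points at your collection:
import json
import pysolr

SOLR_URI = "http://localhost:8983/solr/mycollection"  # assumed endpoint

def index_partition(rows):
    # one Solr client per partition instead of one per row
    solr_client = pysolr.Solr(SOLR_URI)
    docs = [json.loads(r) for r in rows]
    if docs:
        solr_client.add(docs)  # pysolr accepts a list of documents

df = spark.read.format("csv").option("sep", ",").option("quote", "\"").option("escape", "\\").option("header", "true").load("sample.csv")

# foreachPartition is an action, so it triggers the job by itself
df.toJSON().foreachPartition(index_partition)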
I am trying to use Flink 1.9's LAST_VALUE. Unlike the Alibaba docs, it does not accept a second argument for ordering, and it does not like the OVER(...) clause. So I am not sure how to feed a criterion into LAST_VALUE.
I was hoping that if you set the processing to "event time", LAST_VALUE would return the latest value based on event time, but instead it is returning the latest value read?
The LAST_VALUE function is only supported by the Blink planner when running SQL on Flink. One needs to explicitly activate the Blink planner via:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;
StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(bsEnv, bsSettings);
Only then will you be able to run SQL queries containing the LAST_VALUE function.
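For completeness, a hedged sketch of the kind of query that should then be accepted; the table name, column names, and job name below are made up for illustration:
import org.apache.flink.table.api.Table;
import org.apache.flink.types.Row;

// "sensor_readings" (sensor_id STRING, reading DOUBLE) is assumed to be registered
// with bsTableEnv beforehand; LAST_VALUE here simply keeps the last value seen per group.
Table result = bsTableEnv.sqlQuery(
    "SELECT sensor_id, LAST_VALUE(reading) AS last_reading " +
    "FROM sensor_readings GROUP BY sensor_id");

bsTableEnv.toRetractStream(result, Row.class).print();
bsEnv.execute("last-value-example");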
gem5 stats include some lines with multiple value columns.
How should we understand them?
For example:
The following line is very intuitive and self-explanatory.
system.cpu.itb.wrAccesses 761786015 # TLB accesses on write requests
However, the following lines have more than one value columns.
system.cpu.iq.fu_full::MemRead 165768608 93.48% 93.49% # attempts to use FU when none available
system.cpu.iq.fu_full::MemWrite 43109 0.02% 93.52% # attempts to use FU when none available
system.cpu.iq.fu_full::FloatMemRead 11493101 6.48% 100.00% # attempts to use FU when none available
In my understanding, the first value (11493101) is the absolute number of attempts to use the FloatMemRead unit when none is available, which is 6.48% of the attempts to use any unit when none is available. The last column looks like the cumulative distribution of attempts to use an FU when none is available. Is that correct?
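A quick arithmetic check of that reading, using just the three counts quoted above:
# check the "count, percent, cumulative percent" interpretation of fu_full
counts = {"MemRead": 165768608, "MemWrite": 43109, "FloatMemRead": 11493101}
total = sum(counts.values())
cumulative = 0.0
for name, n in counts.items():
    pdf = 100.0 * n / total
    cumulative += pdf
    print(f"{name}: {pdf:.2f}% (cumulative {cumulative:.2f}%)")
# FloatMemRead comes out at ~6.48% with a cumulative of 100.00%,
# which matches the second and third columns of the stat lines above.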