Apache NiFi Historical Statistics of a Component

Can anybody explain the min/max/mean values in the following screenshot?

It shows the minimum, maximum, and mean number of output files per some predetermined interval of time (my guess is a minute in your case).
"Min/Max/Mean: The minimum, maximum, and mean (arithmetic mean, or average) values are shown. These values are based only on the range of time selected, if any time range is selected. If this instance of NiFi is clustered, these values are shown for the cluster as a whole, as well as each individual node. In a clustered environment, each node is shown in a different color. This also serves as the graph's legend, showing the color of each node that is shown in the graph. Hovering the mouse over the Cluster or one of the nodes in the legend will also make the corresponding node bold in the graph."
You can read more about it in the official documentation:
User Guide - Historical Statistics of a Component
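As a rough illustration of what the quoted passage describes, here is a minimal Python sketch that computes min/max/mean over a series of samples, optionally restricted to a selected time range. The sample values, the one-minute spacing, and the `summarize` helper are all hypothetical; this is not how NiFi computes the values internally.

```python
from statistics import mean

# Hypothetical (minute, flowfiles_out) samples; not taken from a real NiFi instance.
samples = [(0, 4), (1, 10), (2, 7), (3, 1), (4, 8)]

def summarize(samples, start=None, end=None):
    """Return (min, max, mean) of the values whose timestamp lies in [start, end]."""
    values = [value for minute, value in samples
              if (start is None or minute >= start)
              and (end is None or minute <= end)]
    return min(values), max(values), mean(values)

whole_range = summarize(samples)          # no time range selected
selected_range = summarize(samples, 1, 2) # only minutes 1 and 2 selected
print(whole_range, selected_range)
```

Selecting a narrower range changes all three values, which is why the numbers in the dialog move when a portion of the graph is selected.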

Related

What do the entries in Lamport clocks representations represent?

I'm trying to understand an illustrative example of how Lamport's algorithm is applied. In the course that I'm taking, we were presented with two representations of the clocks within three [distant] processes, one with the Lamport algorithm applied and the other without.
Without the Lamport algorithm:
With the Lamport algorithm applied:
My question is concerning the validity of the change that was applied to the third entry of the table pertaining to the process P1. Shouldn't it be, as the Lamport algorithm instructs, max(2, 2) + 1, which is 3 not 4?
When I asked some of my classmates regarding this issue, one of them informed me that the third entry of the table of P1 represents a "local" event that happened within P1, and so when message A is arrived, the entry is updated to max(2, 3) + 1, which is 4. However, if that was the case, shouldn't the receipt of the message be represented in a new entry of its own, instead of being put in the same entry that represents the local event that happened within P1?
Upon further investigation, I found, in the same course material, a figure taken from Tanenbaum's Distributed Systems: Principles and Paradigms, in which the new value of an entry that corresponds to the receipt of a message is computed by adding 1 to the max of the entry before it in the same table and the timestamp of the received message, as shown below, which is quite different from what was done in the first illustration.
I'm unsure if the problem relates to a faulty understanding that I have regarding the algorithm, or to the possibility that the two illustrations are using different conventions with respect to what the entries represent.

validity of the change that was applied to the third entry of the table pertaining to the process P1
In the classical Lamport algorithm, there is no need to increase the local counter before taking the max. If you do that, it still works, but it seems like a useless operation. In the second example, all events are still properly ordered. In general, as long as the counters go up, the algorithm works.
Another way of looking at correctness is to try to rebuild the total order manually. The hard requirement is that if an event A happens before an event B, then A is placed before B in the total order. In both picture 2 and picture 3, everything is good.
Let's look at picture 2. The event (X) in the second cell of P0 happens before the event (Y) in the third cell of P1. To make sure X comes before Y in the total order, Y's time must be larger than X's. And it is. It doesn't matter if the time difference is 1 or 2 or 100.
in which the new value of an entry that corresponds to the receipt of a message is computed by adding 1 to the max of the entry before it in the same table and the timestamp of the received message, as shown below, which is quite different from what was done in the first illustration
It's actually pretty much the same logic, with the exception of incrementing the local counter before taking the max. Generally speaking, every process has its own clock, and every event increases that clock by one. The only exception is when the clock of a different process is already ahead; then taking the max is required to make sure all events get a correct total order. So, in the third picture, P2 adjusts its clock (taking the max) because P3 is way ahead. The same goes for P1's adjustment.
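To make the two conventions concrete, here is a minimal Python sketch. The `LamportClock` class, its method names, and the `replay` helper are illustrative assumptions; the scenario (P1 at local time 2 receiving a message stamped 2 from P0) is taken from the numbers in the question, not from the figures themselves.

```python
class LamportClock:
    """A per-process logical clock (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or send: advance the local counter by one."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Classical rule: max(local, message timestamp) + 1."""
        self.time = max(self.time, message_time) + 1
        return self.time

    def receive_increment_first(self, message_time):
        """Variant: bump the local counter first, then apply max(...) + 1."""
        self.time += 1
        self.time = max(self.time, message_time) + 1
        return self.time


def replay(receive_method_name):
    """P0 sends at time 2; P1 is at local time 2 when the message arrives."""
    p0, p1 = LamportClock(), LamportClock()
    p0.tick()
    sent = p0.tick()      # message carries timestamp 2
    p1.tick()
    p1.tick()             # P1's local clock is now 2
    received = getattr(p1, receive_method_name)(sent)
    return sent, received


classical = replay("receive")                        # max(2, 2) + 1
increment_first = replay("receive_increment_first")  # max(3, 2) + 1
print(classical, increment_first)
```

Under both conventions the receive event gets a timestamp strictly larger than the send event, which is the only property the ordering argument needs.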

Grand totals row not summing in Google Data Studio

Well, I'm an absolute newbie in Google Data Studio, but for some reason, my grand totals row is not working.
I'm learning to use this tool, and I made an easy table with just countries and sessions.
Piece of cake. Now I just want to add a total row that sums all sessions. That's all. I activated the Show Summary Row option, but it shows nothing.
Things I've tried that did not work:
Updated and refreshed.
Changed the time period and tried different dates, just in case.
Deleted and re-created the whole table.
Checked the connection. I get data and the data is right; I just cannot sum it.
Changed the size and format of the table, in case it was a problem with margins or font color.
I know it can be done, because different sources show it. I've read this question here:
Grand Total is wrong in Google Data Studio
But it did not help. In that question, a user posted an image in the comments:
As you can see, he managed to get what I'm trying to do.
So I must be doing something wrong, and I do not know why.
UPDATE 2: If I apply a filter, I get no totals. You can see my config in the right side of image.
Can anybody give me a clue of how to make a grand totals row in Google Data Studio?
Thanks

Sounds like a bug. It should be a case of selecting that tick box. Strangely, I looked at an existing table I have with totals and when I unticked the box and then ticked again, the totals didn't reappear and disappeared off another table on the page (like your example). They did reappear eventually with some refreshing of the data and page but seems like there's something wrong with them.

I don't think this is a bug; I think it is part of the design.
I just discovered the reason this is happening, at least for me. The grand total summary of a table does not sum the rows shown in the table; it is the total of the metric itself. So if you have a dimension (like age or gender) to which Google applies data thresholding internally, and you use a metric such as users, the grand total shows the metric value without the thresholding that the dimension applies to the rows.
Proof below
You can see that the values in column 2 add up to 453.6, yet the grand total shows 953.6. If I look at a dimension without thresholding (country),
you can see where the 953.6 comes from: the data source supplied to the table uses 80% of all users, and 1192 * 0.8 gives 953.6, which is what the grand total displays. In conclusion, when a table combines a thresholded dimension with a metric, there will be a discrepancy, because the grand total comes from the metric's source data, which for some odd reason does not have the table's dimension applied, rather than from the values in the table.
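The arithmetic in this answer can be checked with a short Python sketch. The three input numbers come from the answer; reading 453.6 as the sum of the visible (thresholded) rows is an assumption, and the variable names are made up for illustration.

```python
total_users = 1192        # from the answer: all users in the source
source_share = 0.8        # from the answer: the data source uses 80% of all users
visible_rows_sum = 453.6  # assumption: what the thresholded rows add up to

# Grand total as the answer describes it: taken from the metric, not the rows.
grand_total = round(total_users * source_share, 1)

# The gap between the grand total and the rows actually displayed.
withheld = round(grand_total - visible_rows_sum, 1)
print(grand_total, withheld)
```

On this reading, more than half of the metric's total is attributed to rows that the thresholded dimension never shows, which is why the summary row looks wrong.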

How to see absolute values instead of percents on Events graphs?

I use custom events to track how often some deprecated modules are still used by users. I want to remove the migrations from a deprecated module to the new one once the number of usages drops below a "waterline".
So, it is not very convenient to track this by clicking on a date on a graph and checking the number of events for that date. Could I somehow switch the graph to show absolute values?

Mike from Fabric here. For the graphs, we will either show the percentage if the custom attribute is a string or the 25th, median and 75th percentiles if the custom attribute is a number. However, the top 10 custom attribute count will be present below the graph.

Keeping dValIds For auto-generated dimensions consistent

I am working with Endeca 6.4.1 and have many auto-generated dimensions in my pipeline (mapped using Dev-studio). The application's indexing is CAS-less, so only FCM is creating dimensions and assigning dValIds. I am using Endeca SEO, so the dValId is directly reflected in my URL, and if the ID of an auto-generated dimension value changes, links to that navigation state are lost.
I have a flat file as the dimension's source, for example
product.feature|neon finish
What I want is that, if the value some day changes to Neon-finish or Neon color, the dValId that was assigned to neon finish should be transferred to the new value. I can keep a custom mapping of the change to track that neon finish has been changed to a new value.
Is there any way to achieve this, maybe by using some manipulators?
Please share your thoughts.

There are two basic ways to do this:
1) Update the state files when you change a dimension value (APPDIR/data/state/autogen_dimensions.xml ). This would most likely be a manual process.
2) A more robust but complex solution is to change the dimension values to be some ID number and use a synonym for the display name. Then the display name can change without a change to the id number. This may require some serious changes to your pipeline.
Good luck

Build a Kibana Histogram with buckets dynamically created by ElasticSearch terms aggregation

I want to be able to combine the functionality of the Kibana Terms Graph (creating buckets based on the unique values of a particular attribute) and the Histogram Graph (separating data into buckets based on queries and then plotting the data over time).
Overall, I want to create a Histogram, but I only want to create the Histogram based on the results of one query, not multiple queries like it's being done in the Kibana demo app. Instead, I want each bucket to be dynamically created per unique value of my particular field. For example, consider the following data returned by my query:
{"myValueType": "New York"}
{"myValueType": "New York"}
{"myValueType": "New York"}
{"myValueType": "San Francisco"}
{"myValueType": "San Francisco"}
Also assume that each record has a timestamp field for separating histogram data by date. For that particular date, I want the data to be communicated as a count of 3 into the New York bucket and a count of 2 into the San Francisco bucket. However, I am only able to show a count of 5 for my one linked query. When I configure the Histogram, I am able to specify a field to use for my timestamp, but not to create buckets from. I could've sent a field to compute a total/min/max/mean, but this field would've had to be numeric, so that is not the solution either.
If I were to use a Term Graph to create a pie or bar graph, I am indeed able to separate my data into buckets based on the unique values of my specified field (in this case, "myValueType"), but this would total up the data for all-time, not split up the data by timestamp. Although this is good information to know, it is not ideal because I wouldn't be able to detect trends in my data.
I am looking for a solution that will do one of the following:
Let me dynamically create queries in my Kibana dashboard to create "buckets" in a Histogram
Allow me to run an ElasticSearch Terms Aggregation to split up my data into buckets based on "myValueType" and integrate these results into my Histogram
Customize the JSON of my dashboard, but this doesn't look possible to me
Create my own custom panel, but this is not desirable
Link a Kibana "TopN" query in Kibana. Actually, this has proven to be a work-around for my problem because the TopN query dynamically created one query per unique value/term from the specified fieldName. However, the problem is that I can only link one colour to this TopN query and each unique term will be placed in a bucket that uses a different shade of the colour. Ideally, every bucket in my Histogram will have a completely different colour associated to it. Imagine how difficult it will be to distinguish unique terms as the number of buckets grows.
If all else fails, I make one query per unique value from my search field. This will allow me to have one unique colour per bucket, but as the number of unique terms in the "myValueType" field changes, I need to keep adding/removing queries from Kibana, which can get quite messy.
I'm sure there is something that I am missing here. Please help me out. Many thanks.
A highly related SOF question: Is it Possible to Use Histogram Facet or Its Curl Response in Kibana

This would be a great feature. It looks like it will be supported in Kibana 4, but there doesn't seem to be much more info out there than that.
For reference: https://github.com/elasticsearch/kibana/issues/1249

Maybe a little late but it is actually possible in the newest BETA release.
kibana 4 beta 3 installation download
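For reference, the bucketing the question asks for (one series per unique value, split by date) can be sketched in plain Python. This only illustrates the desired result; it is not Kibana or Elasticsearch code. The `timestamp` values and the `histogram_by_term` helper are hypothetical. In Elasticsearch terms, the same shape roughly corresponds to nesting a terms aggregation inside a date histogram aggregation.

```python
from collections import Counter, defaultdict

# The five records from the question, with hypothetical timestamps added.
records = [
    {"myValueType": "New York", "timestamp": "2014-05-01T09:00:00"},
    {"myValueType": "New York", "timestamp": "2014-05-01T10:30:00"},
    {"myValueType": "New York", "timestamp": "2014-05-01T17:45:00"},
    {"myValueType": "San Francisco", "timestamp": "2014-05-01T11:15:00"},
    {"myValueType": "San Francisco", "timestamp": "2014-05-01T20:05:00"},
]

def histogram_by_term(records, field, ts_field="timestamp"):
    """Count records per day, then per unique value of `field` within each day."""
    buckets = defaultdict(Counter)
    for record in records:
        day = record[ts_field][:10]   # keep only the YYYY-MM-DD part
        buckets[day][record[field]] += 1
    return {day: dict(counts) for day, counts in buckets.items()}

result = histogram_by_term(records, "myValueType")
print(result)
```

Each inner key would become its own series in the histogram, so new values of "myValueType" appear as new buckets without anyone having to add a query by hand.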
