How to identify the CloudWatch metrics for a specific KCL application in Kinesis Streams - amazon-cloudwatch

We have multiple Kinesis consumer applications (KCL 2.0) consuming data from the same Kinesis stream. All the consumers send their metrics to CloudWatch, and those metrics are showing up there.
If I want to specifically understand and scale out to multiple instances of one consumer application, how can we achieve that?
CloudWatch metrics of interest: GetRecords iterator age, Incoming data - Sum (Count)

KCL metrics are published under a separate namespace in CloudWatch. The namespace used to upload the metrics is the applicationName that you provide in the KCL configuration. So if you have multiple KCL applications with different applicationNames, you will find their metrics in the CloudWatch metrics console under "Custom namespaces".
The complete list of KCL metrics can be found here.
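For example, a quick way to see what a given application has published is to list the metrics in its namespace. A minimal sketch in Python/boto3, assuming the KCL applicationName is "my-kcl-app" (a placeholder):

# List the metrics that the KCL published under the application's custom namespace.
# "my-kcl-app" and the region are placeholders for your own values.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="my-kcl-app"):
    for metric in page["Metrics"]:
        dims = {d["Name"]: d["Value"] for d in metric["Dimensions"]}
        print(metric["MetricName"], dims)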

Related

How to get Units in the CloudWatch metrics ingested via the amazon-cloudwatch-agent collectd socket listener?

I have the amazon-cloudwatch-agent configured to retrieve custom metrics via collectd.
That is working in the sense that I can see the metrics in the CloudWatch UI, but they have no unit (for example, collectd_GenericJMX_java_heap_used should have Bytes as its CloudWatch Unit).
As far as I know, collectd does not have the concept of units, and it seems that neither does telegraf.Metric (which I think is the underlying type the amazon-cloudwatch-agent uses to ingest collectd metrics).
Is there any way to specify the "units" for collectd metrics?
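For context, a metric published directly through the CloudWatch API can carry an explicit Unit field; a rough Python/boto3 sketch of what that field looks like (namespace and value here are placeholders, and this is not the agent's collectd ingestion path):

# Publish one datapoint with an explicit CloudWatch Unit.
# "CWAgent" is the agent's default namespace; adjust to your configuration.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="CWAgent",
    MetricData=[{
        "MetricName": "collectd_GenericJMX_java_heap_used",
        "Value": 123456789.0,
        "Unit": "Bytes",  # the Unit field that is missing from the collectd-ingested metric
    }],
)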

Sensor data timestamps using VictoriaMetrics

I am trying to figure out how to record timestamped sensor data to an instance of VictoriaMetrics. I have an embedded controller with a sensor that is read once per second. I would like VictoriaMetrics to poll the controller once a minute, and log all 60 readings with their associated timestamps into the TSDB.
I have the server and client running, and measuring system metrics is easy, but I can't find an example of how to get a batch of sensor readings to be reported by the embedded client, nor have I been able to figure it out from the docs.
Any insights are welcome!
VictoriaMetrics supports data ingestion via various protocols. All these protocols support batching, i.e. multiple measurements can be sent in a single request. So you can choose the protocol best suited for inserting batches of collected measurements into VictoriaMetrics. For example, if the Prometheus text exposition format is selected for data ingestion, then a batch of metrics could look like the following:
measurement_name{optional="labels"} value1 timestamp1
...
measurement_name{optional="labels"} valueN timestampN
VictoriaMetrics can poll (scrape) metrics from a configured address via HTTP. It expects the application to return metric values in the text exposition format. This format is compatible with Prometheus, so Prometheus client libraries for different languages are compatible with VictoriaMetrics as well.
There is also a how-to guide for instrumenting a Go application to expose metrics and scrape them with VictoriaMetrics here. It describes the monitoring basics for any service or application.
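As an illustration of the batching idea, here is a minimal Python sketch that formats one minute of buffered readings as exposition lines with millisecond timestamps and pushes them to a VictoriaMetrics import endpoint. The host, metric name, and label are assumptions; check your VictoriaMetrics version for the exact endpoint to use.

# Push 60 buffered (timestamp, value) readings in one request.
# "sensor_temperature_celsius", the label, and the host are placeholders.
import time
import requests

readings = [(time.time() - 60 + i, 20.0 + 0.01 * i) for i in range(60)]  # (unix_seconds, value)

lines = []
for ts, value in readings:
    # Exposition format: name{labels} value timestamp_ms
    lines.append(f'sensor_temperature_celsius{{sensor="board1"}} {value} {int(ts * 1000)}')

resp = requests.post("http://victoria:8428/api/v1/import/prometheus",
                     data="\n".join(lines) + "\n")
resp.raise_for_status()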

API or other queryable source for getting total NiFi queued data

Is there an API endpoint or any other queryable source where I can get the total queued data?
Setting up a little dataflow in NiFi to monitor NiFi itself sounds sketchy, but if it's common practice, so be it. Anyway, I cannot find the API endpoint to get that total.
Note: I have a single NiFi instance; I don't have and won't implement S2S reporting, since I am on a single-instance, single-node NiFi setup.
The Site-to-Site Reporting tasks were developed because they work for clustered, standalone, and multiple instances thereof. You'd just need to put an Input Port on your canvas and have the reporting task send to it.
An alternative as of NiFi 1.10.0 (via NIFI-6780) is to get the nifi-sql-reporting-nar and use QueryNiFiReportingTask, which lets you run a SQL query to get the metrics you want. It uses a RecordSinkService controller service to determine how to send the results; there are various implementations such as Site-to-Site, Kafka, Database, etc. The NAR is not included in the standard NiFi distribution due to size constraints, but you can get the latest version (1.11.4) here, or change the URL to match your NiFi version.
#jonayreyes You can find information about how to get queue data from the NiFi API here:
NiFi Rest API - FlowFile Count Monitoring
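For a single node, the controller-level status exposed by the REST API is probably the simplest source of the queued totals. A rough Python sketch, assuming an unsecured instance on localhost and the /nifi-api/flow/status endpoint; verify the field names against your NiFi version:

# Read the overall queued totals for the whole instance.
# Host, port, and field names are assumptions; add auth/TLS as needed.
import requests

resp = requests.get("http://localhost:8080/nifi-api/flow/status")
resp.raise_for_status()
status = resp.json()["controllerStatus"]

print("FlowFiles queued:", status.get("flowFilesQueued"))
print("Bytes queued:", status.get("bytesQueued"))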

Creating multiple subs on the same topic to implement load sharing (pub/sub)

I spent almost a day on the Google Pub/Sub documentation to create a small app. I am thinking of switching from RabbitMQ to Google Pub/Sub. Here is my question:
I have an app that pushes messages to a topic (T). I wanted to do load sharing via subscribers, so I created 3 subscribers to T. I kept the name of all 3 subscribers the same (S), so that I don't get the same message 3 times.
I have 2 issues:
Nowhere in the console do I see 3 identical subscribers to T; it shows only 1.
If I try to start all 3 subscriber instances at the same time, I get "A service error has occurred." The error disappears if I start them sequentially.
Lastly, is Google serious about Pub/Sub? Looking at the documentation and public participation, I am not sure if I should switch to Google Pub/Sub.
Thanks,
In Pub/Sub, each subscription gets a copy of every message. So to load-balance message handling, you don't want 3 different subscriptions, but a single subscription that distributes messages to 3 workers.
If you are using pull delivery, simply create a single subscription (as a one-time action when you set up the system), and have each worker pull from that same subscription.
If you are using push delivery, have a single subscription pushing to a single endpoint that provides load balancing (e.g. push to an HTTP load balancer with multiple instances in a backend service).
Google is serious about Pub/Sub; it is deeply integrated into many products (GCS, BigQuery, Dataflow, Stackdriver, Cloud Functions, etc.) and Google uses it internally.
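A minimal sketch of the pull-delivery pattern in Python with the google-cloud-pubsub client: every worker instance runs this same code against the single shared subscription, and Pub/Sub distributes messages across the attached workers. The project and subscription IDs are placeholders.

# One shared subscription, many identical worker processes pulling from it.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "S")

def callback(message):
    print("worker handled:", message.data)
    message.ack()  # ack only after the message has been fully processed

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull.result()  # block and keep pulling until interrupted
except KeyboardInterrupt:
    streaming_pull.cancel()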
As per the documentation on GCP, https://cloud.google.com/pubsub/architecture:
Load-balanced subscribers are possible, but all of them have to use the same subscription. I don't have a code sample or POC ready, but I'm working on one.

Is there a way to do hourly batched writes from Google Cloud Pub/Sub into Google Cloud Storage?

I want to store IoT event data in Google Cloud Storage, which will be used as my data lake. But doing a PUT call for every event is too costly, so I want to append events to a file and then do one PUT call per hour. How can I do this without losing data if a node in my message-processing service goes down?
Because if my processing service ACKs the message, the message will no longer be in Google Pub/Sub but is not yet in Google Cloud Storage, and if the processing node goes down at that moment, I will have lost the data.
My desired usage is similar to this post, which talks about using AWS Kinesis Firehose to batch messages before PUT-ing them into S3, but even Kinesis Firehose's maximum batch interval is only 900 seconds (or 128 MB):
https://aws.amazon.com/blogs/big-data/persist-streaming-data-to-amazon-s3-using-amazon-kinesis-firehose-and-aws-lambda/
If you want to continuously receive messages from your subscription, then you would need to hold off acking the messages until you have successfully written them to Google Cloud Storage. The latest client libraries in Google Cloud Pub/Sub will automatically extend the ack deadline of messages for you in the background if you haven't acked them.
Alternatively, what if you just start your subscriber every hour for some portion of time? Every hour, you could start up your subscriber, receive messages, batch them together, do a single write to Cloud Storage, and ack all of the messages. To determine when to stop your subscriber for the current batch, you could either keep it up for a certain length of time or you could monitor the num_undelivered_messages attribute via Stackdriver to determine when you have consumed most of the outstanding messages.
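A hedged Python sketch of that hourly approach: synchronously pull a batch, write it to Cloud Storage as one object, and only ack afterwards, so an unwritten batch is redelivered if the node dies mid-way. All names are placeholders and the repeat/stop logic is omitted.

# Pull a batch, write one GCS object, then ack; run this once per hour.
import datetime
from google.cloud import pubsub_v1, storage

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "iot-events-sub")

response = subscriber.pull(request={"subscription": subscription_path, "max_messages": 1000})
if response.received_messages:
    payload = b"\n".join(m.message.data for m in response.received_messages)
    blob_name = "events/%s.ndjson" % datetime.datetime.utcnow().strftime("%Y-%m-%dT%H")
    storage.Client().bucket("my-data-lake").blob(blob_name).upload_from_string(payload)
    # Ack only after the object has been durably written.
    ack_ids = [m.ack_id for m in response.received_messages]
    subscriber.acknowledge(request={"subscription": subscription_path, "ack_ids": ack_ids})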