Sensor data timestamps using VictoriaMetrics

I am trying to figure out how to record timestamped sensor data to an instance of VictoriaMetrics. I have an embedded controller with a sensor that is read once per second. I would like VictoriaMetrics to poll the controller once a minute, and log all 60 readings with their associated timestamps into the TSDB.
I have the server and client running, and measuring system metrics is easy, but I can't find an example of how to get a batch of sensor readings to be reported by the embedded client, nor have I been able to figure it out from the docs.
Any insights are welcome!

VictoriaMetrics supports data ingestion via various protocols. All of these protocols support batching, i.e. multiple measurements can be sent in a single request, so you can choose the protocol best suited to inserting batches of collected measurements into VictoriaMetrics. For example, if the Prometheus text exposition format is used for data ingestion, a batch of metrics could look like the following:
measurement_name{optional="labels"} value1 timestamp1
...
measurement_name{optional="labels"} valueN timestampN

VictoriaMetrics can poll (scrape) metrics from a configured address via HTTP. It expects the application to return metric values in the text exposition format. This format is compatible with Prometheus, so Prometheus client libraries for different languages work with VictoriaMetrics as well.
There is also a how-to guide for instrumenting a Go application to expose metrics and scrape them with VictoriaMetrics here. It describes the monitoring basics for any service or application.
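As a rough illustration of the scrape-based approach, here is a minimal sketch (TypeScript for Node.js, with a hypothetical metric name sensor_temperature_celsius and a simple in-memory buffer) of an endpoint that exposes the last minute of buffered readings, each with its own millisecond timestamp, in the text exposition format shown above:

// Minimal sketch: buffer one reading per second in memory and expose the batch
// in Prometheus text exposition format ("name{labels} value timestamp").
// Metric name and label are hypothetical; adjust for your sensor.
import * as http from 'http';

interface Reading {
  value: number;
  timestampMs: number; // Unix timestamp in milliseconds, as the text format expects
}

const buffer: Reading[] = [];

// Simulate the once-per-second sensor read on the controller.
setInterval(() => {
  buffer.push({ value: readSensor(), timestampMs: Date.now() });
}, 1000);

function readSensor(): number {
  return 20 + Math.random(); // placeholder for the real sensor read
}

http
  .createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.statusCode = 404;
      res.end();
      return;
    }
    // One line per buffered reading: name{labels} value timestamp.
    const body = buffer
      .map((r) => `sensor_temperature_celsius{sensor="s1"} ${r.value} ${r.timestampMs}`)
      .join('\n');
    buffer.length = 0; // clear so the next scrape only sees new samples
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(body + '\n');
  })
  .listen(9101); // arbitrary port for this sketch

VictoriaMetrics (or vmagent) would then be configured to scrape this endpoint once per minute; because each sample carries its own timestamp, the one-second resolution of the readings should be preserved.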

Related

Using Azure IoT - telemetry from a Windows desktop application

I work for a company that manufactures large scientific instruments, with a single instrument having 100+ components: pumps, temperature sensors, valves, switches and so on. I write the WPF desktop software that customers use to control their instrument, which is connected to the PC via a serial or TCP connection. The concept is the same though - to change a pump's speed, for example, I would send a "command" to the instrument, where an FPGA and custom firmware take care of handling that command. The desktop software also needs to display dozens of "readback" values (temperatures, pressures, valve states, etc.), which are again retrieved by issuing a "command" to request a particular readback value from the instrument.
We're considering implementing some kind of telemetry service, whereby the desktop application will record maybe a couple of dozen readback values, each having its own interval - weekly, daily, hourly, per minute or per second.
Now, I could write my own telemetry solution, whereby I record the data locally to disk then upload to a server (say) once a week, but I've been wondering if I could utilise Azure IoT for collecting the data instead. After wading through the documentation and concepts I'm still none the wiser! I get the feeling it is designed for "physical" IoT devices that are directly connected to the internet, rather than data being sent from a desktop application?
Assuming this is feasible, I'd be grateful for any pointers to the relevant areas of Azure IoT. Also, how would I map a single instrument and all its components (valves, pumps, etc) to an Azure IoT "device"? I'm assuming each component would be a device, in which case is it possible to group multiple devices together to represent one customer instrument?
Finally, how is the collected data reported on? Is there something built-in to Azure, or is it essentially a glorified database that would require bespoke software to analyse the recorded data?
Azure IoT would give you:
- Device SDKs for connecting (MQTT or AMQP), sending telemetry, receiving commands, receiving messages, reporting properties, and receiving property update requests.
- An HA/DR service (IoT Hub) for managing devices and their authentication, and for configuring telemetry routes (where to route the incoming messages).
- Service SDKs for managing devices, sending commands, requesting property updates, and sending messages.
If it matches your solution, you could also make use of the Device Provisioning Service, where devices connect and are assigned an IoT hub. This would make sense, for instance, if you have devices around the world and wish to have them connect to the closest IoT hub you have deployed.
Those are the building blocks. You'd integrate the device SDK into your WPF app. It doesn't have to be a physical device, but the fact that it has access to sensor data makes it behave like one, and that seems like a good fit. Then you'd build a service app using the Service SDKs to manage the fleet of WPF apps (each of which represents an instrument with components, right?). For monitoring telemetry, it depends on how you choose to route it. By default, it goes to an Event Hub instance created for you, and you'd use the Event Hub SDK to subscribe to those messages. Alternatively, or in addition, the telemetry messages could be routed to Azure Storage, where you could perform historical analysis. There are other routing options as well.
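For illustration only, here is a minimal sketch of what the telemetry side could look like, using the Node.js flavour of the device SDK (azure-iot-device / azure-iot-device-mqtt) rather than the .NET one you'd likely use from WPF; the connection string, readback names, and interval are placeholders:

// Sketch: send a "readback" sample as device-to-cloud telemetry.
// Connection string, metric names, and values are placeholders.
import { Client, Message } from 'azure-iot-device';
import { Mqtt } from 'azure-iot-device-mqtt';

const connectionString =
  process.env.IOTHUB_DEVICE_CONNECTION_STRING ?? '<device connection string>';
const client = Client.fromConnectionString(connectionString, Mqtt);

async function sendReadback(name: string, value: number): Promise<void> {
  const body = JSON.stringify({ name, value, timestamp: new Date().toISOString() });
  await client.sendEvent(new Message(body)); // device-to-cloud telemetry
}

function readPumpTemperature(): number {
  return 42; // placeholder for the instrument "command" that fetches the readback
}

async function main(): Promise<void> {
  await client.open();
  // e.g. sample one readback value once per minute
  setInterval(() => {
    sendReadback('pump1_temperature_c', readPumpTemperature()).catch(console.error);
  }, 60_000);
}

main().catch(console.error);

The C# device SDK offers the same concepts (DeviceClient, telemetry messages, reported properties), so the shape of the code from your WPF app would be very similar.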
Does that help?

API or other queryable source for getting total NiFi queued data

Is there an API endpoint or some other queryable source where I can get the total queued data?
Setting up a little dataflow in NiFi to monitor NiFi itself sounds sketchy, but if it's common practice, so be it. Either way, I cannot find the API endpoint to get that total.
Note: I have a single NiFi instance. I don't have, nor will I implement, S2S reporting, since I am on a single-instance, single-node NiFi setup.
The Site-to-Site Reporting Tasks were developed because they work for clustered and standalone deployments alike, and for multiple instances thereof. You'd just need to put an Input Port on your canvas and have the reporting task send to it.
An alternative as of NiFi 1.10.0 (via NIFI-6780) is to get the nifi-sql-reporting-nar and use QueryNiFiReportingTask, which lets you run a SQL query to get the metrics you want. It uses a RecordSinkService controller service to determine how to send the results; there are various implementations, such as Site-to-Site, Kafka, Database, etc. The NAR is not included in the standard NiFi distribution due to size constraints, but you can get the latest version (1.11.4) here, or change the URL to match your NiFi version.
#jonayreyes You can find information about how to get queue data from the NiFi API here:
NiFi Rest API - FlowFile Count Monitoring
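If you just want the aggregate queue total over the REST API on a standalone instance, a rough sketch (Node 18+ for the built-in fetch, assuming an unsecured instance at http://localhost:8080; the exact field names such as flowFilesQueued and bytesQueued may differ slightly between NiFi versions):

// Rough sketch: poll the root process group status and print the aggregate
// queued FlowFile count and bytes. Assumes an unsecured standalone instance;
// field names may vary slightly between NiFi versions.
const NIFI_STATUS_URL = 'http://localhost:8080/nifi-api/flow/process-groups/root/status';

async function getTotalQueued(): Promise<void> {
  const res = await fetch(NIFI_STATUS_URL);
  if (!res.ok) {
    throw new Error(`NiFi API returned ${res.status}`);
  }
  const body: any = await res.json();
  const snapshot = body.processGroupStatus?.aggregateSnapshot;
  console.log(`Queued FlowFiles: ${snapshot?.flowFilesQueued}`);
  console.log(`Queued bytes:     ${snapshot?.bytesQueued}`);
  console.log(`Queued (pretty):  ${snapshot?.queued}`); // e.g. "12 / 3.4 MB"
}

getTotalQueued().catch(console.error);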

Puppeteer: WebRTC statistics

I am planning to use Puppeteer for WebRTC calls. I hope it should be easy. I am not sure how to collect statistics such as whether the WebRTC call passed or failed, how many media (UDP) packets were exchanged, STUN/TURN pass or fail, and media parameters like jitter, delay, etc.
Can somebody please help me understand how one can collect WebRTC-related statistics using Puppeteer?
There is a well-known WebRTC test engine based on Selenium and Selenium Grid called KITE. For reference and a quick start, you can check the simple KITE-AppRTC-Test implementation to see how they collect and display the stats. You might want to run the demos as well, because they seem to produce the results you are looking for.
Among many other approaches, one might be:
- Collect WebRTC connection metrics by calling the getStats API. What you see in chrome://webrtc-internals is a visual representation of this getStats API: it collects getStats snapshots at regular intervals and shows them after some post-processing.
- Collect getStats data from Puppeteer's page.evaluate, send it to a server, and then analyse the data in real time or at the end of the call, depending on your use case (see the sketch after the links below).
There is a good amount of open-source work by WebRTC experts on how to collect WebRTC data, send it to a server, and visualise it:
https://github.com/fippo/webrtc-externals
https://github.com/fippo/webrtc-dump-importer
https://github.com/fippo/dump-webrtc-event-log
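Here is a rough sketch of the page.evaluate approach mentioned above, assuming the page under test exposes its RTCPeerConnection as window.pc (that part is hypothetical and depends entirely on your application):

// Rough sketch: poll WebRTC getStats() from Puppeteer and log inbound-rtp
// stats (jitter, packetsReceived, packetsLost, ...). Assumes the page exposes
// its RTCPeerConnection as window.pc - that part is application-specific.
import puppeteer from 'puppeteer';

async function collectStats(url: string): Promise<void> {
  const browser = await puppeteer.launch({
    // Fake devices/UI so the call can run unattended.
    args: ['--use-fake-ui-for-media-stream', '--use-fake-device-for-media-stream'],
  });
  const page = await browser.newPage();
  await page.goto(url);

  // Take a getStats() snapshot every 2 seconds for ~20 seconds.
  for (let i = 0; i < 10; i++) {
    const stats: any[] = await page.evaluate(async () => {
      const pc = (window as any).pc; // hypothetical handle to the RTCPeerConnection
      const out: any[] = [];
      const report = await pc.getStats();
      report.forEach((entry: any) => out.push(entry));
      return out;
    });
    const inbound = stats.filter((s) => s.type === 'inbound-rtp');
    console.log(JSON.stringify(inbound)); // or send to your server for analysis
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }

  await browser.close();
}

collectStats('https://example.com/call').catch(console.error);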

Handling of pubsub subscribers for distributed long-running tasks

I am evaluating the use of pubsub for long-running tasks such as video transcoding, where a particular transcode may take between 2 and 10 minutes. Is pubsub a good approach for such task distribution? For example, let's say I have five servers:
- publisher1
- publisher2
- publisher3
- publisher4
- publisher5
And a topic called "videos". Would it be possible to spread out the messages equally across those five servers? What about when servers are added or removed? What would be a good approach to doing this, or is pubsub not the right tool for something like this?
This does sound like a reasonable use case for pubsub. Specifically, if you use a pull subscriber, you can configure flow control settings to have at most one outstanding message per server, and configure the max ack extension period (in Java) to be a reasonable upper bound of your processing time. This API is described here: http://googleapis.github.io/google-cloud-java/google-cloud-clients/apidocs/index.html?com/google/cloud/pubsub/v1/package-summary.html
This should effectively load balance across your servers by default if all servers pull from the same subscription. If a server is added and a backlog exists, it will receive new entries. If a server is removed, it will no longer be sent messages. If it is removed while processing, or crashes, the message it was working on will be resent to another server.
One concern, however, is that pubsub has a limit of 10MB per message. You might consider instead putting the data itself in a Google Cloud Storage bucket; Cloud Storage can publish the file location to a pubsub topic when an upload is complete. https://cloud.google.com/storage/docs/pubsub-notifications
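For reference, the same idea expressed with the Node.js client (@google-cloud/pubsub) rather than the Java one linked above; the subscription name and job shape are hypothetical, and flow control limits each worker to a single outstanding message:

// Sketch: each transcoding server runs this, pulling from the same
// subscription ("video-transcode-sub" is hypothetical). Flow control limits
// each worker to one outstanding message at a time.
import { PubSub, Message } from '@google-cloud/pubsub';

const pubsub = new PubSub();
const subscription = pubsub.subscription('video-transcode-sub', {
  flowControl: {
    maxMessages: 1,          // at most one message in flight per worker
    allowExcessMessages: false,
  },
});

async function transcode(job: { bucket: string; name: string }): Promise<void> {
  // placeholder for the actual transcoding work (2-10 minutes)
}

subscription.on('message', async (message: Message) => {
  try {
    // e.g. the message body carries the Cloud Storage location of the video
    const job = JSON.parse(message.data.toString());
    await transcode(job);
    message.ack();           // only ack once the transcode has finished
  } catch (err) {
    console.error('Transcode failed, message will be redelivered', err);
    message.nack();
  }
});

subscription.on('error', (err) => console.error('Subscription error', err));

The client library extends the ack deadline while a message is outstanding, which is what the max ack extension period mentioned above controls.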

What are streaming APIs?

Basically, I want to understand, from both a high-level and a technical point of view, what constitutes a streaming API. There is all sorts of material available, but I could not find a satisfactory explanation of what a streaming API is, or how it differs from general APIs (REST, if applicable).
PS: I am not asking about multimedia streaming.
Kind of a vague question. I guess streaming usually means one of the following (or a combination):
- downloading data for immediate consumption, rather than a whole file for storage, potentially with support for delivering partial data (lower quality, only the relevant pieces, etc.), sometimes even without any storage at all between producer and consumer
- a persistent connection that continues to deliver new data as it becomes available, rather than having the client poll
A good example of the first pattern is streaming XML parsers (such as SAX). They allow you to handle XML data that is too big to fit into memory (which is what a DOM parser would require).
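As a small sketch of that first pattern (using the sax npm package purely as an example), the XML is consumed as a stream of events while it is being read, so the full document never has to be held in memory:

// Sketch: stream-parse a large XML file with the `sax` package. The file is
// consumed chunk by chunk; only the current event is held in memory.
import * as fs from 'fs';
import * as sax from 'sax';

const saxStream = sax.createStream(true); // strict mode

let recordCount = 0;

saxStream.on('opentag', (node) => {
  // react to elements as they arrive, e.g. count <record> elements
  if (node.name === 'record') {
    recordCount++;
  }
});

saxStream.on('end', () => {
  console.log(`Parsed ${recordCount} records without loading the file into memory`);
});

saxStream.on('error', (err) => {
  console.error('Parse error:', err);
});

// Pipe the (potentially huge) file through the streaming parser.
fs.createReadStream('huge-export.xml').pipe(saxStream);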
I just found another good answer here:
https://www.quora.com/What-is-meant-by-streaming-API
A streaming API differs from a normal REST API in that it leaves the HTTP connection open for as long as possible (i.e. a "persistent connection"). It pushes data to the client as and when it becomes available, and there is no need for the client to poll the server for newer data. Maintaining a persistent connection reduces network latency significantly when a server produces a continuous stream of data, as today's social media channels do. These APIs are mostly used to read/subscribe to data.
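To make the difference concrete, here is a small sketch of the second pattern using Server-Sent Events over plain Node HTTP (endpoint name and payload are made up): the client opens one request, and the server keeps pushing new data over that same connection instead of being polled.

// Sketch: a streaming endpoint using Server-Sent Events. The client makes one
// request to /stream; the server keeps the connection open and pushes an
// event whenever new data is available, instead of being polled.
import * as http from 'http';

http
  .createServer((req, res) => {
    if (req.url !== '/stream') {
      res.statusCode = 404;
      res.end();
      return;
    }
    res.writeHead(200, {
      'Content-Type': 'text/event-stream', // SSE content type
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    });

    // Push a new event every second for as long as the client stays connected.
    const timer = setInterval(() => {
      res.write(`data: ${JSON.stringify({ now: Date.now() })}\n\n`);
    }, 1000);

    req.on('close', () => clearInterval(timer));
  })
  .listen(3000);

// In a browser, the client side would be roughly:
//   const source = new EventSource('http://localhost:3000/stream');
//   source.onmessage = (e) => console.log(e.data);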