What are the most important metrics when monitoring OpenDJ?

And how do you decide which monitoring method to use (LDAP monitoring, SNMP, JMX, or the logs)? Is it worth implementing all of them? How different are they?
I pulled the metrics through all of these methods and they seem very similar to me. I'm new to working with directory servers, so I'm interested in what others think.
Also, after you pull the metrics, how do you make use of them (view them with JConsole, for example)?

With OpenDJ, all metrics are equally available through LDAP and JMX. SNMP only has a subset of the metrics, i.e. the ones that are defined in the standard directory MIB.
When monitoring any server, I don't think a single capture of metrics is useful. What you want is to compute averages and running averages over sample periods, define thresholds for some of the metrics, and raise alerts when those thresholds are exceeded.
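For example, a small polling loop against cn=monitor can compute a running average and raise an alert on a threshold. Here is a minimal sketch in Python with the ldap3 library; the port, the monitored attribute (currentConnections), the sample interval, and the threshold are all assumptions to adapt to your deployment:

import time
from collections import deque
from ldap3 import Server, Connection, ALL

WINDOW = 12          # samples per running average, e.g. 12 x 5 s = 1 minute
THRESHOLD = 500      # alert when the running average exceeds this value
samples = deque(maxlen=WINDOW)

server = Server("ldap://localhost:1389", get_info=ALL)
conn = Connection(server, auto_bind=True)  # assumes anonymous read on cn=monitor

while True:
    conn.search("cn=monitor", "(objectClass=*)", search_scope="BASE",
                attributes=["currentConnections"])
    samples.append(int(conn.entries[0].currentConnections.value))
    running_avg = sum(samples) / len(samples)
    if running_avg > THRESHOLD:
        print(f"ALERT: running average {running_avg:.1f} exceeds {THRESHOLD}")
    time.sleep(5)

The same loop works for any attribute exposed under cn=monitor, which is what makes the LDAP route convenient when you already have an LDAP client library at hand.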

Related

CoTURN Usage Statistics

I am still a bit new to the WebRTC world and trying to find my way through. I have successfully set up CoTURN and have been able to route calls behind a firewall with it. Now I am wondering whether it is possible to somehow inspect, and possibly visualize, CoTURN's usage statistics. I would love to know how many users are utilizing the server at any given time, what the bandwidth and CPU usage are, and so on. I saw details on how to optimize bandwidth and CPU usage in the official docs, but I haven't found any info on actually monitoring usage. Any help would be highly appreciated.
If you want to monitor standard usage statistics like CPU usage, load, bandwidth, etc., you can focus on what's available for your infrastructure. For example, in AWS you could use CloudWatch, while in generic Linux deployments you could export the usage stats with Prometheus and present them with Grafana.
For the coturn/TURN-specific statistics, coturn can store some metrics in Redis; this is described in https://github.com/coturn/coturn/blob/master/turndb/schema.stats.redis
Total traffic information is also reported when the allocation is deleted. The keys are
"turn/user/<username>/allocation/<id>/total_traffic" or "turn/user/<username>/allocation/<id>/total_traffic/peer".
Applications interested in the total amount of traffic per allocation can subscribe to these events as:
psubscribe turn/realm/*/user/*/allocation/*/total_traffic
psubscribe turn/realm/*/user/*/allocation/*/total_traffic/peer
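If you go that route, consuming these events is only a few lines with redis-py; a sketch, assuming coturn publishes to a local Redis status database:

import redis

r = redis.Redis(host="localhost", port=6379, db=0)
p = r.pubsub()
p.psubscribe("turn/realm/*/user/*/allocation/*/total_traffic",
             "turn/realm/*/user/*/allocation/*/total_traffic/peer")

for message in p.listen():
    if message["type"] != "pmessage":
        continue  # skip the subscription confirmation messages
    # the channel name encodes realm/user/allocation; data holds the counters
    print(message["channel"].decode(), message["data"].decode())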

Baselining internal network traffic (corporate)

We are collecting network traffic from switches using Zeek in the form of ‘connection logs’. The connection logs are then stored in Elasticsearch indices via filebeat. Each connection log is a tuple with the following fields: (source_ip, destination_ip, port, protocol, network_bytes, duration) There are more fields, but let’s just consider the above fields for simplicity for now. We get 200 million such logs every hour for internal traffic. (Zeek allows us to identify internal traffic through a field.) We have about 200,000 active IP addresses.
What we want to do is digest all these logs and create a graph where each node is an IP address and an edge (directed, source to destination) represents traffic between two IP addresses. There will be one unique edge for each distinct (port, protocol) tuple. The edge will have these properties: average duration, average bytes transferred, and a histogram of log counts by hour of day.
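To make the required aggregation concrete, here is a sketch of the per-edge rollup in plain Python, using the field names above (the "ts" datetime field used for the hourly histogram is an assumption):

from collections import defaultdict

edges = defaultdict(lambda: {"count": 0, "bytes": 0, "duration": 0.0,
                             "hourly": [0] * 24})

def ingest(log):
    # one edge per distinct (source, destination, port, protocol)
    key = (log["source_ip"], log["destination_ip"], log["port"], log["protocol"])
    e = edges[key]
    e["count"] += 1
    e["bytes"] += log["network_bytes"]
    e["duration"] += log["duration"]
    e["hourly"][log["ts"].hour] += 1

def edge_properties(key):
    e = edges[key]
    return {"avg_bytes": e["bytes"] / e["count"],
            "avg_duration": e["duration"] / e["count"],
            "hourly_histogram": e["hourly"]}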
I have tried using Elasticsearch’s aggregation and also the newer Transform technique. While both work in theory, and I have tested them successfully on a very small subset of IP addresses, the processes simply cannot keep up for our entire internal traffic. E.g. digesting 1 hour of logs (about 200M logs) using Transform takes about 3 hours.
My question is:
Is post-processing Elasticsearch data the right approach to building this graph? Or is there some product we can use upstream to do this job? Someone suggested looking into ntopng, but I did not find this specific use case in their product description. (Not sure if it is relevant, but we use ntop's PF_RING product as a frontend for Zeek.) Are there other products that do the job out of the box? Thanks.
What problems or root causes are you attempting to elicit with a graph of Zeek east-west traffic?
It seems that a more tailored use case, such as a specific type of authentication, or even a larger problem set such as endpoint access expansion, might be a better use of storage, compute, memory, and your other valuable time and resources, no?
Even if you did want to correlate or group on Zeek data, try to normalize it to OSSEM; then there would be no reason to, say, collect the full tuple when you can collect community-id instead. You could correlate Zeek in the large to Suricata in the small. Perhaps a better data architecture would be VAST.
Kibana, in its latest iterations, does have Graph, and even older versions can leverage the third-party kbn_network plugin. I could see you hitting a wall with 200k active IP addresses and Elasticsearch aggregations, or even summary indexes.
Many orgs will build data architectures beyond the simple Serving layer provided by Elasticsearch. What I have heard of would be a Kappa architecture streaming into the graph database directly, such as dgraph, and perhaps just those edges of the graph available from a Serving layer.
There are other ways of asking questions from IP address data, such as the ML options in AWS SageMaker IP Insights or the Apache Spot project.
Additionally, I'm a huge fan of getting the right data only as the situation arises, although in an automated way so that the puzzle pieces bubble up for me and I can simply lock them into place. If I were working with Zeek data especially, I could leverage a platform such as SecurityOnion and its orchestrated Playbook engine to kick off other tasks for me, such as querying out with one of the Velocidex tools, or even cross-correlating using the built-in Sigma sources.

Controlling and monitoring use of BI Engine Reservations

With the new beta BI Engine Reservations, I've noticed some queries speed up, but others remain unaffected. Will it be possible
- to monitor how the reservation is being used?
- to have some control over how the reservation is used?
When it comes to control, I've seen no indication that you'll have any—the system decides what the most efficient mechanism is (BI Engine, query cache, etc.) and then allocates accordingly. Also, the size of your reservation, usage, and age are factored into what is added and subsequently removed from the BI Engine reservation.
While that may seem frustrating, it's also the selling point: zero-config, automatic acceleration of your dashboards. As Google iterates quickly on these products, I would expect some controls to find their way in eventually.
As a workaround, you could use a separate project for data you want to ensure has access to the full reservation (since BI Engine is project-level).
As was mentioned elsewhere, there are a handful of metrics that can be viewed using Stackdriver logging (if you enable it). These are all high-level metrics, and are listed in the documentation:
- Reservation Total Bytes
- Reservation Used Bytes
- Inflight Requests
- Request Count
- Request Execution Times
These won't likely give you a lot of the information you're looking for, but can be monitored for patterns.
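If you want to pull those metrics programmatically rather than watch them in the console, something like the Cloud Monitoring Python client should work. A sketch follows; the metric type string below is an assumption, so look up the exact BI Engine metric names in the documentation:

import time
from google.cloud import monitoring_v3

project_id = "your-project-id"  # placeholder
client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        # hypothetical metric type -- replace with the documented one
        "filter": 'metric.type = "bigquerybiengine.googleapis.com/reservation/used_bytes"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)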
You can also use Elasticsearch and Logstash for monitoring and for implementing a security environment. The way it works is simple and operates in near real time.

Using Graylog to monitor resources + notifications

Since we're already using Graylog (version 2.4.6) as a general purpose logging backend for our project, we thought we might as well also use it to monitor resource use. The three major benefits would be:
No need to change our codebase to add additional libraries.
Easy to create charts and graphs for the metrics we're tracking.
Built-in notifications.
Concretely, we're trying to track how many jobs our various Beanstalkd servers have in each of their tubes. If a given tube accumulates more than a certain number of jobs, we would like to be alerted.
Here's a typical message that we're using for a given tube:
{
    "count" => $totalJobsInTube,
    "tube"  => $tubeName,
    "env"   => $env,
}
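For reference, a sketch of shipping such a message to Graylog as GELF over UDP in Python; the host, port, and source name are assumptions, and note that GELF requires custom fields to be prefixed with an underscore:

import json, socket, zlib

def send_tube_stats(tube, count, env, host="graylog.example.com", port=12201):
    gelf = {
        "version": "1.1",
        "host": "beanstalk-monitor",              # message source
        "short_message": f"jobs in tube {tube}",
        "_count": count,                          # custom fields need a "_" prefix
        "_tube": tube,
        "_env": env,
    }
    payload = zlib.compress(json.dumps(gelf).encode("utf-8"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

send_tube_stats("emails", 42, "production")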
I can't think of a way to set up an alert condition in Graylog that allows me to specify a query + which field to look at. The only conditions we have are:
Field content alert condition
Field aggregation alert condition
Message count alert condition
Message conditional count alert condition
Can this even be done in Graylog?
Graylog uses Elasticsearch as a backend, which is not a good system for metrics (time-series data): it's not efficient and doesn't scale well with that kind of workload. This is why most people use a separate monitoring system for measuring resources and other time-series data. It depends on your stack, but there are lots of open-source and commercial offerings for that.
If you wanted to do logs and metrics together, I would suggest using open-source software: the Elastic Stack can do both, but that is only my recommendation if you have a limited number of metrics. Splunk and SumoLogic can also do logs and metrics, but they are not ideal for time series, especially large numbers of them.
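If you do split the metrics out as suggested, exposing the tube counts to Prometheus takes only a few lines with the prometheus_client library; a sketch, where get_jobs_in_tube() is a placeholder for your Beanstalkd stats call:

import time
from prometheus_client import Gauge, start_http_server

tube_jobs = Gauge("beanstalk_tube_jobs", "Current jobs in a Beanstalkd tube",
                  ["tube", "env"])

def get_jobs_in_tube(tube):
    raise NotImplementedError  # e.g. a stats-tube call against Beanstalkd

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics
    while True:
        for tube in ("emails", "exports"):  # example tube names
            tube_jobs.labels(tube=tube, env="production").set(get_jobs_in_tube(tube))
        time.sleep(15)

The "more than N jobs in a tube" alert then becomes a simple threshold rule in Prometheus rather than a Graylog alert condition.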

Protocol for remote logging of temperature, gas/electricity consumption

So, I'm managing a series of rented holiday homes, which all have dynamic-IP ADSL Internet connections.
We've wanted to keep track of a few types of data, e.g. per-room electricity usage, hot water temperature, thermostat setting, gas usage, network bandwidth usage, etc., and keep these centrally so we can perform analytics and graph them in real time.
I'm comfortable building the hardware required to log these variables every 1-5 seconds and get them into e.g. a Raspberry Pi, but I'm wondering what kind of framework would be suitable for transferring and storing the data on the server side.
My initial thought was something like SNMP, but a) this doesn't seem designed for non-network uses, b) it's not very secure, and c) I'm looking for something agent-to-server (so I don't have to know the IP of the agent, and it'll also traverse NAT, so I can have multiple devices logging different things on the same network.)
My second thought was something using a REST API, but making potentially hundreds of API calls per second via different TCP connections seems a bit wasteful.
I came across Cubism, but this seems to have the same disadvantages as a REST API: there's a lot of redundant data transmitted on every connection if I were to send the data every 5 seconds per sensor.
Names like AMQP and MQTT come up, though none of these seem particularly suited (natively) to travelling over the public Internet without configuring VPNs etc.
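For illustration, here's roughly what the agent-side push would look like with MQTT over TLS, sketched with the paho-mqtt library (1.x API); the broker hostname, credentials, and topic layout are all placeholders:

import json, ssl, time
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="holiday-home-7")   # one client per device
client.tls_set(cert_reqs=ssl.CERT_REQUIRED)        # TLS across the public Internet
client.username_pw_set("home7", "secret")          # placeholder credentials
client.connect("broker.example.com", 8883)
client.loop_start()

while True:
    reading = {"sensor": "hot_water_temp", "value": 54.2, "ts": time.time()}
    client.publish("homes/home7/hot_water_temp", json.dumps(reading), qos=1)
    time.sleep(5)

Since the connection is initiated outbound by the agent, it traverses NAT and the server never needs to know the device's IP, which at least satisfies requirement (c) above.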
Thoughts?
[This doesn't seem like a particularly niche problem, now that I think about it: weather logging, share prices, etc., although this is probably a smaller interval.]
I have a geospatial/environmental monitoring background and can tell you something about two major standards used today in environmental and infrastructure (electricity and water supply network) monitoring sensor networks.
The proprietary one: most sensors simply store time-series measurements in their own local data format. A server process polls every sensor from time to time to gather the time-series data (in most cases via a simple GPRS uplink), transforms it into an exchange format, and then stores it in a centralized database where you can work with the data. One of the industry leaders is Kisters AG with their exchange format ZRXP, which is simply time-series data in an ASCII format that gets imported into a database after calling the sensor over any connection.
The open geospatial standard: Sensor Observation Service (SOS) and SensorML, which I think fit your needs better, because these are web service specifications, whereas the proprietary stuff above is a complete system solution built by one vendor. There is a nearly ready-to-use Java reference implementation of SOS provided by 52 North, which should be easily runnable on a Pi. Although the SOS specification has a very strong geospatial background, that does not mean it can't be adapted for your purpose, I think. At least SensorML should give you some ideas.