Spring XD or Integration to parse log stats in real time

I have a dashboard web application that currently starts a thread and tails a log file. Every time a line is added to the file, the tailer picks it up, parses it, and publishes an event around the application, which in turn does things like sending it to the client over a WebSocket or updating a total stored on disk.
This all works fine and so far seems to be handling 500,000 log events a day without batting an eyelid.
But looking at the Spring family, maybe there is a better way of doing this in a more 'standardised' fashion that would make the code easier to maintain and easier for others to support.
As I'm new to Spring, can someone tell me whether this sort of thing is best suited to Spring Integration or Spring XD, or should I take it a stage further and use Spring Integration with RabbitMQ?

Spring Integration 3.0 now has a tail inbound adapter. It will be released soon; the release candidate was announced last week. XD uses it in its tail source module.

I'm doing much the same job with Spring XD: processing 30 GB of log files a day, enriching them, and sending them to Hadoop and Elasticsearch.
We are really happy with this technology, which combines Spring Integration and Spring Batch in a distributed way.
I noticed that Redis was a real bottleneck and switched to RabbitMQ to get better throughput.
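
For readers who want to see the shape of the tail, parse and publish flow described in the question, here is a minimal, framework-free sketch in Python. It is not Spring Integration or Spring XD code; the log format ("<date> <time> <LEVEL> <message>"), the `EventBus` class and the function names are all assumptions made for illustration.

```python
import os
import re
import time
from typing import Callable, Dict, Iterable, Iterator, List, Optional

# Assumed (hypothetical) log format: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def follow(path: str, poll_seconds: float = 0.5) -> Iterator[str]:
    """Yield lines appended to a file, similar to `tail -f`.

    Runs until interrupted. A sketch only: it does not handle log rotation.
    """
    with open(path) as handle:
        handle.seek(0, os.SEEK_END)  # start at the end, like a tailer
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one raw line into an event dict, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp, level, message = match.groups()
    return {"timestamp": timestamp, "level": level, "message": message}


class EventBus:
    """Tiny in-process publish/subscribe hub (hypothetical helper)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Dict[str, str]], None]] = []

    def subscribe(self, handler: Callable[[Dict[str, str]], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Dict[str, str]) -> None:
        for handler in self._subscribers:
            handler(event)


def process(lines: Iterable[str], bus: EventBus) -> int:
    """Parse each line and publish the ones that match; return the count."""
    published = 0
    for line in lines:
        event = parse_line(line)
        if event is not None:
            bus.publish(event)
            published += 1
    return published
```

In use, something like `process(follow("app.log"), bus)` would run in the tailer thread, with one subscriber pushing events to the WebSocket and another updating the stored totals. A framework such as Spring Integration replaces the hand-written `follow` and `EventBus` parts with configured adapters and channels.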

Related

RabbitMQ Prometheus Exporter vs Prometheus Plugin for RabbitMQ

We are currently trying to use Prometheus/Grafana in order to monitor several RabbitMQ instances deployed on multiple Docker containers.
My question is quite simple: what's the difference between using the RabbitMQ Prometheus Exporter and the Prometheus Plugin for RabbitMQ?
Does the exporter scrape different or more information than the plugin?
Is there an overhead when using the plugin compared to the exporter?
Is it just a question of RabbitMQ's version?
What is the added value of using one of the two options?
So basically, which approach is better, or can they be used in combination?

I have not tried out the plugin, but as far as I have read it exports the same metrics as the exporter. The plugin has the advantage that it does not add complexity, whereas with the exporter:
You need to host the exporter (which is not much effort, but you still need to make sure it runs, is updated from time to time, ...)
You need an account for the exporter that can query the metrics, which is a security issue: your credentials might get stolen, or the exporter might get compromised, and an attacker would then have access to your RabbitMQ cluster.
Since there might be a network between your RabbitMQ cluster and the exporter, there might be situations where the exporter cannot reach the cluster while the plugin could still produce the metrics.
These are not big issues; we have used the exporter for years now and never had an issue with it. Still, if we were starting from scratch, we would give the plugin a try.
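
One way to check the "same metrics" claim for your own setup is to scrape both endpoints once and compare the metric names. The sketch below assumes only that both sources serve the Prometheus text exposition format and that you have already fetched each response body as a string; the function names are made up for illustration.

```python
import re
from typing import Dict, List, Set


def metric_names(exposition_text: str) -> Set[str]:
    """Collect metric names from Prometheus text exposition output.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. A sample
    line looks like: name{label="value"} 12 [timestamp]
    """
    names: Set[str] = set()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or whitespace (value).
        names.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    return names


def compare(first: Set[str], second: Set[str]) -> Dict[str, List[str]]:
    """Report which metric names are unique to each source or shared."""
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
        "shared": sorted(first & second),
    }
```

Calling `compare(metric_names(plugin_text), metric_names(exporter_text))` gives a quick view of the overlap. Metric names can differ between the two even when they measure the same thing, so an empty "shared" list does not by itself mean the information differs.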

Monitoring Yarn/Cloudera application logs in production

I am NOT talking about Cloudera or Yarn system level logs. I am talking about applications running on Cloudera/Yarn infrastructure.
We have tens of Java and Python applications running on our Cloudera infrastructure, and all of them generate application logs. I am looking for the best way to monitor these logs for any errors and warnings. For a pure standalone Java application, we would traditionally use one of the log scraper tools that send emails based on expression matching (to detect an error, a warning, or any other special situation). I am looking for something similar that can monitor our application logs and email us in real time, for better production application support.
If treating this like traditional application log monitoring is not the right way, then I would be happy to hear about any better industry-standard approaches. Thanks!

I guess the Elastic Stack (https://www.elastic.co/de/) could be one approach to solve this. You could use Filebeat to send your application logs to Logstash, which forwards them to Elasticsearch. You could then create a Watcher in Kibana which sends, e.g., emails based on some triggering condition (we use a webhook to send notifications into an MS Teams channel).
This solution should work at least in near real time (a delay of roughly 1-2 minutes, though this also depends on your watcher configuration).
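
Whichever tool evaluates it, the triggering condition boils down to matching log lines against an expression and turning the hits into a notification. Here is a small stand-alone sketch of that idea in Python. It is not Watcher configuration; the pattern, the function names and the addresses you pass in are assumptions for illustration.

```python
import re
from email.message import EmailMessage
from typing import Iterable, List

# Assumed convention: the level appears as the word ERROR, WARN or WARNING.
ALERT_RE = re.compile(r"\b(ERROR|WARN(?:ING)?)\b")


def find_alerts(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain an error or warning level."""
    return [line.rstrip("\n") for line in lines if ALERT_RE.search(line)]


def build_alert_email(app_name: str, alerts: List[str],
                      sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) an email summarising the matched lines."""
    message = EmailMessage()
    message["Subject"] = f"[{app_name}] {len(alerts)} log alert(s)"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("\n".join(alerts))
    return message
```

The message could then be handed to `smtplib.SMTP(...).send_message(message)` against your mail relay. For logs spread over many YARN containers, a central pipeline like the one above is likely to scale better than a scraper per host.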

Does javamelody work with spring webflux?

Can anyone point me to a resource on how to get spring-webflux and javamelody to work together?
It seems that a ServletContext is necessary for startup, which I don't have or need.
I'm aware of the cool metrics stuff that comes with spring-boot 2.x, but I don't have anything to display the metrics with, and I am locked into a company environment where just installing something isn't a valid option.
Thanks,
Henning

javamelody is mainly based on monitoring of memory, CPU, HTTP requests, SQL requests and Spring components, among other things. See javamelody-spring-boot-starter for example.
But as far as I know, Spring WebFlux does not use the servlet API. So what do you want to monitor?
If you just want to have graphs in a browser, then start an HTTP server for javamelody reports, as in standalone mode. And if you also want to monitor SQL requests and Spring components, then add to your application all the methods from this example, except monitoringSessionListener and monitoringFilter.
A new spring-boot-starter for javamelody in webflux could be created if it makes sense.

Does RabbitMQ contain functionality to deal with offline target nodes

Being new to RabbitMQ, I was wondering how to deal with an offline target node.
As an example this scenario:
1 log recording application that stores logs to some persistent storage
N log publishing applications that want their logs to be written to the persistent storage via the log recording server.
There would be two options:
Option 1: Each publishing application publishes its log messages to its local RabbitMQ instance, and the log recording server must subscribe to each of these.
Option 2: The log recording application has its own local RabbitMQ instance, to which each log publishing application delivers its messages.
Option 1 would require me to reconfigure/recode/notify the recording application each time a new application appears or moves. Therefore I would think Option 2 is the right one, each new publishing application simply writes to the RabbitMQ Node of the recording application.
The only thing I am struggling with is how to deal with a situation in which the Node of the recording application is down. Do I need to build my own system to store the messages until it's back online or can I use some functionality of RabbitMQ to deal with that? I.e. could the local RabbitMQ of each of the publishing applications just receive the messages and forward them to the recording application RabbitMQ as soon as it's back online?
I found something about the Federation plugin but couldn't understand whether that's the solution. Maybe I need something different, or maybe I have to write my own local queueing system (which I hope I don't have to) to queue messages when the target node is offline.
Any links to architectural examples or solutions are more than welcome.
BTW: https://groups.google.com/forum/#!topic/easynetq/nILIKSjxyMg states that you shouldn't install a RabbitMQ node for each application, so maybe I should resort to something like MSMQ or ZeroMQ (?)

From experience in what sounds like a similar situation, I would suggest using something other than a queue to store the messages locally, when offline.
Years ago, I built a system that had to work offline - no network connection at all - and then had to push messages through a message queue to the central server, when the laptop was brought back to the office.
I solved this by using a local database (sqlite at the time) to store my messages when the message queue was not available.
You should do something similar. Use a local database or even a plain text file or CSV file to store your messages when RabbitMQ is offline. When it reconnects, read the messages from your local file system and send them through RabbitMQ.
This is a good strategy to use, even if you do not expect RabbitMQ to go offline. Frankly, it will go offline at some point and you will have to deal with it. You should be prepared for that situation, and having a local store for your messages will help that.
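
To make the local-store idea concrete, here is a minimal sketch of an outbox backed by SQLite, written in Python. It is not the code from the system described above. The `publish` callable is a hypothetical wrapper around whatever RabbitMQ client you use, assumed to raise `ConnectionError` when the broker cannot be reached.

```python
import sqlite3
from typing import Callable


class Outbox:
    """Store messages locally and forward them when the broker is reachable."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self._db.commit()

    def send(self, body: str, publish: Callable[[str], None]) -> None:
        """Persist the message first, then try to drain the backlog in order."""
        self._db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))
        self._db.commit()
        self.flush(publish)

    def flush(self, publish: Callable[[str], None]) -> int:
        """Publish pending messages oldest first; stop at the first failure."""
        sent = 0
        rows = self._db.execute(
            "SELECT id, body FROM outbox ORDER BY id"
        ).fetchall()
        for row_id, body in rows:
            try:
                publish(body)
            except ConnectionError:
                break  # broker still offline; keep the rest for later
            self._db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self._db.commit()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of messages still waiting to be delivered."""
        return self._db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

This variant writes every message to the store before publishing, rather than only when the broker is down. That keeps ordering simple and means a crash between publish and delete leads to a duplicate rather than a lost message, so the recording side should tolerate duplicates. Pass a file path instead of `":memory:"` so the backlog survives a restart.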
...
Regarding an RMQ node per application: bad idea. This adds a ton of complexity to your system. You want as few RabbitMQ nodes as you can get away with, meaning one per system (a system being comprised of many applications) when possible, with the exception of RabbitMQ clusters for availability, but that's another line of questions and design entirely.
...
I did an interview with Aria Stewart about designing for failure with RabbitMQ and messaging systems, and have a small excerpt where she talks about how networks fail.
The point is, the network or RabbitMQ or something will fail and you will need a solution like a local datastore so that you can recover when RabbitMQ comes back online.

Tools to monitor performance on ActiveMQ

I am looking for proven tools to monitor performance on ActiveMQ 5.5. I come from an environment which used Glassfish and JMQ, where "imqcmd" can tell me the rate of messages produced and consumed on any given destination. Is there a similar tool for ActiveMQ, or a different way to go about it?
I see that there is a project at http://activemq.apache.org/activemq-performance-module-users-manual.html that will do some sort of performance reporting but it seems to be no more than a SNAPSHOT version that I cannot get to operate.
Any input would be appreciated.

There are several options for this: JMX, the AMQ web console, and others.
Here are my notes on this. I opted to go with JMX and built a simple web app (JSP, jQuery, Google Charts, etc.) that interfaces with JMX to gather queue stats, manage queues, etc.
http://www.consulting-notes.com/2010/08/monitoring-and-managing-activemq-with.html
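
On the rate question specifically: the destination MBeans expose cumulative counters such as EnqueueCount and DequeueCount, so a produced/consumed rate can be derived from two readings taken some time apart. Here is a small Python sketch of that arithmetic. How you read the counters (JMX client, web console, etc.) is left out, and the `Sample` type and function name are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sample:
    """One reading of a destination's cumulative counters."""
    timestamp: float      # seconds, e.g. from time.time()
    enqueue_count: int    # value of the EnqueueCount attribute
    dequeue_count: int    # value of the DequeueCount attribute


def rates(earlier: Sample, later: Sample) -> Dict[str, float]:
    """Messages per second produced and consumed between two samples."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return {
        "produced_per_sec": (later.enqueue_count - earlier.enqueue_count) / elapsed,
        "consumed_per_sec": (later.dequeue_count - earlier.dequeue_count) / elapsed,
    }
```

For example, readings of 100 and 600 enqueued messages taken 10 seconds apart give 50 messages per second produced. The counters restart from zero when the broker restarts or the statistics are reset, so a negative rate should be treated as "discard this interval".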
