Monitoring Yarn/Cloudera application logs in production - hadoop-yarn

I am NOT talking about Cloudera or Yarn system level logs. I am talking about applications running on Cloudera/Yarn infrastructure.
We have tens of Java and Python applications running on our Cloudera infrastructure, and all of them generate application logs. I am looking for the best way to monitor these logs for errors and warnings. For a pure standalone Java application, we would traditionally use one of the log scraper tools that send emails based on expression matching (to detect an error, a warning, or any other special situation). I am looking for something similar that can monitor our application logs and email us in real time, for better production application support.
If treating this like traditional application log monitoring is not the right way, then I am happy to hear about any better industry-standard approaches. Thanks!

I guess the Elastic Stack (https://www.elastic.co/de/) could be one approach to solve this. You could use Filebeat to send your application logs to Logstash, which forwards them to Elasticsearch. You could then create a Watcher in Kibana which sends, e.g., emails based on some triggering condition (we use a webhook to send notifications into an MS Teams channel).
This solution should work at least in near-realtime (~1-2 minutes delay, but this also depends on your watcher configuration).
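To make the "triggering condition" idea concrete independent of any tool, here is a minimal Python sketch of the expression matching itself. It is not Watcher or Logstash configuration. The pattern and the function names (find_alerts, summarize) are made up for illustration, and it assumes each log line carries a level token such as ERROR or WARN.

```python
import re
from collections import Counter

# Hypothetical pattern: assumes log lines contain a level token
# such as ERROR, WARN/WARNING or FATAL as a separate word.
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARN(?:ING)?|FATAL)\b")


def find_alerts(lines):
    """Return (line_number, level, text) for every line matching the pattern."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        match = LEVEL_PATTERN.search(line)
        if match:
            token = match.group(1)
            # Normalize WARNING to WARN so both spellings count as one level.
            level = "WARN" if token.startswith("WARN") else token
            alerts.append((number, level, line.rstrip("\n")))
    return alerts


def summarize(alerts):
    """Count alerts per level, e.g. to decide whether a notification is due."""
    return Counter(level for _, level, _ in alerts)
```

In a real setup the matching lines would be handed to whatever sends the notification (email, webhook); that part is left out here on purpose.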

Related

Redis connection settings for app "surviving" redis connectivity issues

I'm using azure redis cache for certain performance monitoring services. Basically when events like page loads, etc occur, I send a fire and forget command to redis to record the event. My goal is for my app to function fine whether or not it can contact the redis server. I'm looking for a best practice for this scenario. I would be OK with losing some events if necessary. I've been finding that even though I'm using fire and forget, the app staggers when the web server runs into high latency or connectivity issues with the server.
I'm using StackExchange.Redis. Any best practice configuration options/programming practices for this scenario?
The way I was implementing a singleton pattern on the connection turned out to be blocking requests. Once I fixed this my app behaves as I want (e.g. it still functions when redis connection dies).

Does RabbitMQ contain functionality to deal with offline target nodes

Being new to RabbitMQ, I was wondering how to deal with an offline target node.
As an example this scenario:
1 log recording application that stores logs to some persistent storage
N log publishing applications that want their logs to be written to the persistent storage via the log recording server.
There would be two options:
Option 1: Each publishing application publishes its log messages to its local RabbitMQ instance, and the log recording server must subscribe to each of these.
Option 2: The log recording application has its own local RabbitMQ instance, to which each log publishing application delivers its messages.
Option 1 would require me to reconfigure/recode/notify the recording application each time a new application appears or moves. Therefore I would think Option 2 is the right one, each new publishing application simply writes to the RabbitMQ Node of the recording application.
The only thing I am struggling with is how to deal with a situation in which the Node of the recording application is down. Do I need to build my own system to store the messages until it's back online or can I use some functionality of RabbitMQ to deal with that? I.e. could the local RabbitMQ of each of the publishing applications just receive the messages and forward them to the recording application RabbitMQ as soon as it's back online?
I found something about the Federation plugin but couldn't understand if that's the solution. Maybe I need something different, or maybe I have to write my own local queueing system (which I hope I don't have to) to queue messages when the target Node is offline.
Any links to architectural examples or solutions are more than welcome.
BTW: https://groups.google.com/forum/#!topic/easynetq/nILIKSjxyMg states that you shouldn't be installing a RabbitMQ Node for each application, so maybe I should resort to something like MSMQ or ZeroMQ (?)
From experience in what sounds like a similar situation, I would suggest using something other than a queue to store the messages locally, when offline.
Years ago, I built a system that had to work offline - no network connection at all - and then had to push messages through a message queue to the central server, when the laptop was brought back to the office.
I solved this by using a local database (sqlite at the time) to store my messages when the message queue was not available.
You should do something similar. Use a local database or even a plain text file or CSV file to store your messages when RabbitMQ is offline. When it reconnects, read the messages from your local file system and send them through RabbitMQ.
This is a good strategy to use, even if you do not expect RabbitMQ to go offline. Frankly, it will go offline at some point and you will have to deal with it. You should be prepared for that situation, and having a local store for your messages will help with that.
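A rough Python sketch of that local-store idea, using sqlite3 from the standard library. This is only an illustration under assumptions, not the code from the system described above: LocalOutbox, send and the publish callable are hypothetical names, and publish is assumed to raise ConnectionError when the broker is unreachable (a real RabbitMQ client library raises its own exception types, which you would catch instead).

```python
import sqlite3


class LocalOutbox:
    """Buffers messages in a local SQLite table while the broker is unreachable."""

    def __init__(self, path=":memory:"):
        # ":memory:" is only for demonstration; use a file path to survive restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def store(self, body):
        self.db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))
        self.db.commit()

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, publish):
        """Send stored messages oldest first; stop at the first failure."""
        sent = 0
        rows = self.db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
        for row_id, body in rows:
            try:
                publish(body)
            except ConnectionError:
                break  # broker still down, keep the remaining rows for later
            # Delete only after a successful publish so nothing is lost.
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent


def send(outbox, publish, body):
    """Publish directly; fall back to the local store if the broker is down."""
    try:
        publish(body)
        return True
    except ConnectionError:
        outbox.store(body)
        return False
```

On reconnect, call flush with the same publish callable. Note that a message can be delivered twice if the process dies between the publish and the delete, so the receiving side should tolerate duplicates.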
...
Regarding an RMQ node per application: bad idea. This adds a ton of complexity to your system. You want as few RabbitMQ nodes as you can get away with, meaning 1 per system (a system being comprised of many applications) when possible... with the exception of RabbitMQ clusters for availability, but that's another line of questions and design entirely.
...
I did an interview with Aria Stewart about designing for failure with RabbitMQ and messaging systems, and have a small excerpt where she talks about how networks fail.
The point is, the network or RabbitMQ or something will fail and you will need a solution like a local datastore so that you can recover when RabbitMQ comes back online.

Upload text logs to MVC 4 web server every second

I have a web server implemented using .NET MVC 4. Clients connected to this web server perform some operations and upload live logs to the server using the WebClient.UploadString method. The logs are sent from client to server in groups of 2500 characters at a time.
Things work fine as long as only 2-3 clients upload logs. However, when more than 3 clients try to upload logs simultaneously, they start receiving "http 500 internal server error".
I might have to scale up and add more slaves but that will make the situation worse.
I want to implement Jenkins-like live logging, where logs from the slaves are updated live.
Please suggest some better and scalable solution to this problem.
Have you considered looking into SignalR?
It can be used for anything from instant messaging to stocks! I have implemented both a chatbox, and a custom system that sends off messages, does calculations and then passes them back down to client. It is very reliable, there are some nice tutorials, and I think it's awesome.

Benchmarking Rabbitmq Tool

I have multiple components connected with RabbitMQ. Some are producers and consumers. I need to benchmark/load test my system. I need to ensure that the consumers can handle N messages/second. I've done some searching on the internet but haven't really found anything. Does anyone have any experience with benchmarking RabbitMQ? Ideally I'd like to just spam the network with messages without having to create a new producer.
Do you know the tool JMeter? With this tool you can easily simulate a heavy load on a server. I normally use it for web applications, but I have seen a JMeter-RabbitMQ plugin for testing AMQP message brokers like RabbitMQ with JMeter. I think you should have a look at it.
If you have a web application in front of RabbitMQ, then you can also test this application directly with JMeter.

Tools to monitor performance on ActiveMQ

I am looking for proven tools to monitor performance on ActiveMQ 5.5. I come from an environment which used Glassfish and JMQ, which can tell me the rate of messages produced and consumed on any given destination using "imqcmd". Is there a similar tool for ActiveMQ, or a different way to go about it?
I see that there is a project at http://activemq.apache.org/activemq-performance-module-users-manual.html that will do some sort of performance reporting but it seems to be no more than a SNAPSHOT version that I cannot get to operate.
Any input would be appreciated.
There are several options for this: JMX, the ActiveMQ web console, and others.
Here are my notes on this... I opted to go with JMX and built a simple web app (JSP, jQuery, Google Charts, etc.) to interface with JMX to gather queue stats, manage queues, etc.
http://www.consulting-notes.com/2010/08/monitoring-and-managing-activemq-with.html