What logs are important to monitor in Splunk for Continuum?

I am setting up Splunk to monitor Continuum and its logs. Which of Continuum's log files would be important to monitor?

Logs in VersionOne Continuum can be thought of in two categories - service logs and automation logs.
Service logs give you insight into warnings and errors from the UI, API, and other core processes. Specifically, watch for CRITICAL and ERROR in the messages.
Automation logs are more targeted at a team's use of the value stream orchestration. Still, it is useful to monitor the logs in /pi and /te for CRITICAL and ERROR, as well as the keyword failure in /pi and the keyword Error in /te, as these indicate automation routines that completed but did not achieve their goals.
From time to time, all systems experience problems sending email. Monitoring the ctm-core log will reveal when the system is unable to send a notification after the maximum number of retries.
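The keyword rules above can be sketched as a small filter. This is just an illustration of the matching you would configure in Splunk, not anything Continuum ships; the function name and log-area labels are assumptions, while the keywords and the /pi and /te areas come from the answer:

```python
import re

# Keywords from the answer: CRITICAL/ERROR everywhere, plus
# "failure" in /pi logs and "Error" (case-sensitive) in /te logs.
GLOBAL_PATTERNS = [re.compile(r"\bCRITICAL\b"), re.compile(r"\bERROR\b")]
AREA_PATTERNS = {
    "/pi": [re.compile(r"\bfailure\b")],
    "/te": [re.compile(r"\bError\b")],
}

def should_alert(log_area, line):
    """Return True if a log line from the given area matches an alert keyword."""
    patterns = GLOBAL_PATTERNS + AREA_PATTERNS.get(log_area, [])
    return any(p.search(line) for p in patterns)
```

In Splunk itself the equivalent would be a saved search or alert with these terms scoped to the corresponding source paths.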

Related

Monitoring Yarn/Cloudera application logs in production

I am NOT talking about Cloudera or Yarn system level logs. I am talking about applications running on Cloudera/Yarn infrastructure.
We have tens of Java and Python applications running on our Cloudera infrastructure, and all of them generate application logs. I am looking for the best way to monitor these logs for errors and warnings. For a pure stand-alone Java application, we could traditionally use one of the log-scraper tools that send emails based on expression matching (to detect an error, warning, or any other special situation). I am looking for something similar that can monitor our application logs and email us in real time, for better production application support.
If treating this like traditional application log monitoring is not the right approach, I am happy to hear about any better industry-standard approaches. Thanks!
I guess the Elastic Stack (https://www.elastic.co/de/) could be one approach to solve this. You could use Filebeat to send your application logs to Logstash, which forwards them to Elasticsearch. You could then create a Watcher in Kibana that sends e.g. emails based on some triggering condition (we use a webhook to send notifications into an MS Teams channel).
This solution should work at least in near real time (~1-2 minutes of delay, though this also depends on your Watcher configuration).
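As a rough sketch of the alerting half of that pipeline, here is what the query and notification could look like. Everything here is an assumption for illustration: the index name `app-logs-*`, the `level` field, and the webhook URL all depend on your own setup:

```python
import json
import urllib.request

def build_error_query(minutes=5):
    """Elasticsearch query body: ERROR/WARN entries from the last N minutes."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"level": ["ERROR", "WARN"]}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        }
    }

def notify_teams(webhook_url, text):
    """POST a simple text payload to an MS Teams incoming webhook."""
    body = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    # Hypothetical usage -- point these at your own Elasticsearch and webhook.
    print(json.dumps(build_error_query(10)))
```

A Kibana Watcher does the same thing server-side on a schedule, which is usually preferable to running your own poller.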

Google Cloud Pub/Sub - Publish test message

Does Cloud Pub/Sub support publishing test messages, i.e. messages that are truly published but not forwarded to the subscribers, to verify the integration between your application and the remote Cloud Pub/Sub?
I could imagine doing this manually by setting a test attribute flag on such messages so that subscribers can filter them out. I know that the Cloud Pub/Sub emulator also exists for local testing, but I was wondering whether such a feature exists.
No, this feature doesn't exist. However, your subscribers should be tolerant of errors and non-compliant messages.
I suggest publishing a deliberately malformed message. That tests your integration, and the subscriber's behavior and fault tolerance as well. As a bonus, you can filter your subscribers' log traces in the Stackdriver Logging service and create an alert for when a malformed message is received in the real production environment.
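The attribute-flag idea from the question can be handled entirely on the subscriber side. This is a sketch of that convention, not a Pub/Sub feature; the attribute name `test` and the helper names are assumptions:

```python
def is_test_message(attributes):
    """Treat messages carrying a 'test' attribute set to 'true' as test traffic."""
    return attributes.get("test", "").lower() == "true"

def handle(message_data, attributes, process):
    """Ack-and-skip test messages; pass real ones to the processing function."""
    if is_test_message(attributes):
        return "skipped"
    process(message_data)
    return "processed"
```

In a real subscriber callback you would still ack the skipped message, so test traffic does not accumulate in the subscription.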

Azure Web apps are getting very slow

I have an Azure web app which is connected to my mobile app. Sometimes the Azure app becomes slow. Why does it get slow intermittently?
I am getting this issue sometimes; please check the image.
502 means Bad Gateway: the front end of the Azure web app infrastructure wasn't able to communicate with the process serving your application. This could be due to a variety of reasons. The common ones include
application time-outs
deadlocks in the application
application restarts
I would recommend you to enable the following logging to collect some data and investigate the same:
Web Server Logging (use this to check the time-taken field)
Failed Request Tracing (this will help you determine which module is taking the time)
Detailed Error Messages (this will provide the exact error)
There is another option to investigate your app. Browse to Diagnose and solve problems section for your app and refer to the instructions there. See the screenshot below:
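Once Web Server Logging is on, the time-taken field can be scanned offline for slow requests. The sketch below assumes the standard W3C extended log layout, where a `#Fields:` header names the columns and time-taken is in milliseconds; the function name and threshold are illustrative:

```python
def slow_requests(lines, threshold_ms=3000):
    """Yield (url, time_taken_ms) for requests slower than the threshold.

    Expects W3C extended log format: a '#Fields:' header names the columns,
    including 'time-taken' (milliseconds) and 'cs-uri-stem' (the request path).
    """
    fields = []
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names follow the directive
            continue
        if line.startswith("#") or not line.strip():
            continue  # skip other directives and blank lines
        row = dict(zip(fields, line.split()))
        taken = int(row.get("time-taken", 0))
        if taken > threshold_ms:
            yield row.get("cs-uri-stem", "?"), taken
```

This is essentially what a `time_taken > 3000`-style query in Log Analytics or a Splunk search would do for you automatically.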

Does RabbitMQ contain functionality to deal with offline target nodes

Being new to the RabbitMQ I was wondering how to deal with an offline target node.
As an example this scenario:
1 log recording application that stores logs to some persistent storage
N log publishing applications that want their logs to be written to the persistent storage via the log recording server.
There would be two options:
Each publishing application publishes its log messages to its local RabbitMQ instance, and the log recording server must subscribe to each of these.
The log recording application has its local RabbitMQ instance, to which each log publishing application delivers its messages.
Option 1 would require me to reconfigure/recode/notify the recording application each time a new application appears or moves. Therefore I would think Option 2 is the right one: each new publishing application simply writes to the RabbitMQ node of the recording application.
The only thing I am struggling with is how to deal with a situation in which the node of the recording application is down. Do I need to build my own system to store the messages until it's back online, or can I use some functionality of RabbitMQ to deal with that? I.e., could the local RabbitMQ of each publishing application just receive the messages and forward them to the recording application's RabbitMQ as soon as it's back online?
I found something about the Federation plugin but couldn't work out whether that's the solution. Maybe I need something different, or maybe I have to write my own local queueing system (which I hope I don't have to) to queue messages while the target node is offline.
Any links to architectural examples or solutions are more than welcome.
BTW: https://groups.google.com/forum/#!topic/easynetq/nILIKSjxyMg states that you shouldn't be installing a RabbitMQ node for each application, so maybe I should resort to something like MSMQ or ZeroMQ (?)
From experience in what sounds like a similar situation, I would suggest using something other than a queue to store the messages locally, when offline.
Years ago, I built a system that had to work offline - no network connection at all - and then had to push messages through a message queue to the central server, when the laptop was brought back to the office.
I solved this by using a local database (sqlite at the time) to store my messages when the message queue was not available.
You should do something similar. Use a local database or even a plain text file or CSV file to store your messages when RabbitMQ is offline. When it reconnects, read the messages from your local file system and send them through RabbitMQ.
This is a good strategy to use, even if you do not expect RabbitMQ to go offline. Frankly, it will go offline at some point and you will have to deal with it. You should be prepared for that situation, and having a local store for your messages will help that.
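A minimal sketch of that local store, using Python's built-in sqlite3. The class and `publish_fn` are illustrative names; `publish_fn` stands in for whatever actually sends a message through RabbitMQ in your code:

```python
import sqlite3

class OfflineStore:
    """Buffer messages locally while the broker is down; flush when it returns."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, body TEXT)"
        )

    def store(self, body):
        """Record a message durably while RabbitMQ is unreachable."""
        self.db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))
        self.db.commit()

    def flush(self, publish_fn):
        """Replay buffered messages in order; drop each one only after it is sent."""
        rows = list(self.db.execute("SELECT id, body FROM outbox ORDER BY id"))
        for row_id, body in rows:
            publish_fn(body)  # raises if the broker is still down, keeping the row
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        self.db.commit()
        return len(rows)
```

Deleting each row only after a successful publish means a crash mid-flush at worst re-sends a message, which is the same at-least-once behavior you already have to design for with RabbitMQ.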
...
regarding a RabbitMQ node per application: bad idea. This adds a ton of complexity to your system. You want as few RabbitMQ nodes as you can get away with. Meaning, 1 per system (a system being comprised of many applications) when possible... with the exception of RabbitMQ clusters for availability - but that's another line of questions and design, entirely.
...
I did an interview with Aria Stewart about designing for failure with RabbitMQ and messaging systems, and have a small excerpt where she talks about how networks fail.
The point is, the network or RabbitMQ or something will fail and you will need a solution like a local datastore so that you can recover when RabbitMQ comes back online.

Get rabbitmq message body

I am using RabbitMQ 3.1.3 to handle Celery tasks, and have discovered a task that is locking up my workers. It is one of two messages in a queue, and I would love to see the content of the message to debug what is breaking my process flow. How can I dump the message body for debugging purposes? I have tried rabbitmqadmin, but get "Connection reset" errors on the login attempt (and the logs show a cryptic "{bad_header,<<"POST /ap">>}").
Log in to the web management GUI on the RabbitMQ server to view the status of your server in an easy way. You can see exchanges, queues, etc., and the status of the messages.
I would also recommend checking out the tutorials on the RabbitMQ website. They explain how everything works in a compact way, and there are code examples for several programming languages (PHP, Java, Ruby, etc.).
From the examples you can easily write small programs to test queues, send test messages, and so on.
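If rabbitmqadmin itself is broken, you can talk to the management plugin's HTTP API directly, which is what both the GUI and rabbitmqadmin use under the hood. A sketch with Python's stdlib, assuming the default management port 15672; host, credentials, vhost, and queue name are placeholders, and note the payload shape here matches recent management-plugin versions (very old brokers like 3.1.x used a boolean `requeue` field instead of `ackmode`):

```python
import base64
import json
import urllib.request

def peek_request(host, vhost, queue, user, password, count=1):
    """Build a management-API request that fetches messages without consuming them."""
    url = f"http://{host}:15672/api/queues/{vhost}/{queue}/get"
    # ack_requeue_true puts the peeked message back on the queue afterwards.
    payload = {"count": count, "ackmode": "ack_requeue_true", "encoding": "auto"}
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Basic {auth}", "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Hypothetical usage: peek at one message on the "celery" queue of vhost "/".
    req = peek_request("localhost", "%2F", "celery", "guest", "guest")
    with urllib.request.urlopen(req) as resp:
        for msg in json.load(resp):
            print(msg["payload"])
```

Each returned object includes the payload plus routing and property metadata, which is usually enough to spot a malformed Celery task body.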