Java application exception monitoring & alarms

Java application exception monitoring & alarms - notifications

We have a few applications which are running in Windows 2K, 2008 servers. They are written in java.
These applications needs to do many automation tasks. We are having difficulty to monitor these applications. Sometime due to XYZ reasons application either hangs or fail to perform desired job. We only come to know about this after a few days when some one reports that desired function hasn't been executed.
To come out of this issue, we added emails for each imp exceptions but then developer needs to spend time to check those 1000 emails everyday. Which is again not feasible & efficient solution.
Now we are looking for a alert, alarms, notification display & monitoring system. We need to have a remote application which can receive alarms from these java applications & then based on certain information/Condition/Configuration, remote application can display some red, orange, green text on the screen. Based on red text, users can be visually see that there is an issue in the system. If required users can be notified that there is a serious issue in the application.
Please help us to identify any existing mechanism, tool, package to achieve this goal. Any suggestion would be highly appreciated.
Thanks

There are myriad ways to achieve this, but all of them will require some effort. Which way to proceed depends on your needs and abilities. A couple of options occur to me:
Have your processes log their exceptions to a Syslog daemon, running on some central server. Then you could have an admin read through the log file for serious problems, but there are many ways to post-process syslog messages, a web search on it might give some more hints.
Is there any way, when logged into the server, to observe whether or not the process is running properly or not? You could install something like Nagios on a sever, and write a plugin that monitors your particular process on all the servers. The plugin can basically be a shell script that checks "ps", or a log file, or whatever you want.
If you are in an IT department, your organization might already have some system like this (NMS).
I'm not sure why this question is tagged "snmp", but it's technically possible to install an SNMP agent on each server, and have them send traps on certain conditions. I do think it would be slightly overkill because you would also have to get a good SNMP manager to receive the traps and alert a sysadmin.

I would go for a combination of the check_logfiles plugin to parse log exceptions and raise alerts, and check_jmx/jmxquery to check metrics inside the JVM such as heap usage and thread count.
check_logfiles
check_jmx

Related

ColdFusion 11 to 2018 Upgrade -- Server Locking Up, How to Test Better?

We are currently testing an upgrade from CF11 to CF2018 for my company's intranet. To give you an idea how long this site has been running, our first version of CF was 3.1! It is still using application.cfm, and there is code from 1998, when I started writing this thing. Yes, 21 years -- I'm astonished, too. It is a hodgepodge of all kinds of older frameworks, too, including Fusebox.
Anyway, we're running Win 2012 VM connected to a SQL 2016 farm. Everything looked OK initially, but in the Week I've been testing, the server has come to a slowdown once (a page took more than 5 seconds to run, something that usually takes 100ms, no DB involvement), and another time, the server came to a grinding halt. The only way I could restart CF App service was by connecting to the server with another server via Services, because doing it via Remote Desktop was so slow.
Now keep in mind -- it's just me testing. This is a site that doesn't have a ton of users, but still, having 5 concurrent connections is normal and there are upwards of 200-400 users hitting this thing every day.
I have FusionReactor running on this thing now, so the next time a lockup happens, I will be able to take a closer look, but what do you think is the best way I can test this? Our site is mostly transactional, users going and filling out forms to put internal orders through. We also connect to XML web services and REST services; we also provide REST services, too. Obviously there's no way to completely replicate a production server's requests onto a test server, but I need to do more thorough testing. Any advice would be hugely appreciated.

I realize your focus for now is trying to recreate the problem on test. That may not be as easy as hoped. Instead, you should be able to understand and resolve it in production. FusionReactor can help, but the answer may well be in the cf logs.
You don't mention assessing the logs at the time of the hangup. See especially the coldfusion-error log, for outofmemory conditions.
You mention raising the heap, but the problem may be with the metaspace instead. If so, consider simply removing the maxmetaspace setting in the jvm args. That may be the sole and likely cause of such new and unexpected outages.
Or if it's not, and there's nothing in the logs at the time, THEN do consider FR. Does IT show anything happening at the time?
If not then consider a need to tune the cf/web server connector. I assume you're using iis. How many sites do you have? And how many connectors (folders in the cf config/wsconfig folder)? What are the settings in their workers.properties file? Are they optimized for the number of sites using that connector?
Also, have you updated cf2018? Are there any errors in the update error log? Did you update the web server connector also?
Are you running the cf2018 pmt (performance monitoring tool set)? Have you updated it?
There could be still more to consider, but let's see how it goes with those. I have blog posts on these and many more topics that would elaborate on things, both at my site (carehart.org) and the Adobe cf portal (coldfusion.adobe.com).
But let's hear if any of this gets you going.

How important is Audit Logging in a Standalone Application

My experience are mostly in developing web applications and we do a lot of audit trails there. Literally every table is audited. I believe this is because user transactions are centralized to a server and they share the same table so it is important who did what.
But now I am assigned to a project developing a standalone application (specifically a mobile application with occasional server transactions). Some are suggesting to add Audit logging but I am not sure what is the norm for standalone applications. For those who have experiences, kindly share if you think it is mandatory or not. I'm leaning towards NO (that it is not that important) because it will only increase resource consumption (and mobile limited). It may affect performance, stability and usability.

Well for the app itself it may not be so important, however having automated error logs can be very useful when you faced with an angry customer(s) and need to quickly debug the app. You can even have a special 'debug mode' to collect more info.
You should also log your server transactions, adding an extra query in the request won't really affect performance.

Weblogic work manager

I am new to weblogic server. I am using work manager. I want to know what is work manager and why we need it. What is the difference between normal request with out work manager and with work manager !!

I think the documentation is rather good on this subject.
WebLogic Server prioritizes work and allocates threads based on an
execution model that takes into
account administrator-defined
parameters and actual run-time
performance and throughput.
Administrators can configure a set of
scheduling guidelines and associate
them with one or more applications, or
with particular application
components. For example, you can
associate one set of scheduling
guidelines for one application, and
another set of guidelines for other
application. At run-time, WebLogic
Server uses these guidelines to assign
pending work and enqueued requests to
execution threads.
Essentially, with work managers you can attach a scheduling policy to an application to e.g. make sure that a specific application gets a fair share of the available computing resources under a heavy load situation. Or you might want to restict the maximum number of threads that will be allocated to an application to prevent a buggy/untested application to bring the whole application server to its knees. (But surely all apps have been tested not to do anything like that.... ;) )

Outside of modifying the default allocation algorithms, the Work Manager is also useful if you are using a Foreign JMS Provider (such as IBM MQ) and need to process more than 16 messages at a time.

Windows Scheduler OR SQL Server Job for sending out digest e-mails

Will be sending out e-mails from an application on a scheduled basis.
I have an EmailController in my ASP.NET MVC application with action methods, one for each kind of notification/e-mail, that will need to be called at different times during the week.
Question: Is Windows Scheduler (running on a Server 2008 box) any better or worse than scheduling this via a SQL Server job? And why?
Thanks

IMHO having scheduler call into the controller and execute the action methods to fire off notifications worked out best. My process (for better of for worst) is as such:
Put the code to call the controller/action in a .vbs file. The action method requires a "security code" that must match a value in the web.config or else it will not execute (my thinking is that this will lessen the chance of some folk hitting the action method with there browser and running the send notification code when it shouldn't be run).
Create a scheduled task in Scheduler to call that file on a regular basis.
In my database, log all notification executions and include an attribute that defines the frequency in which different notification types should go out. This, again, is to lessen the chance of someone sending out notifications when they shouldn't.
Anyhow, this works. The only problem I had was hitting vis https. That didn't work as I believe the task was being challenged to provide some credentials (which it couldn't as it was being run programmatically). Changing it to http worked and imo doesn't create any kind of security risk.
Thoughts? Better way to implement this? I'd love to hear anything anyone has to offer.
Thanks

I prefer sending emails with a SQL server job. As we already had several jobs running on our SQL server it made sense to stick with this one approach. If we had gone down the scheduled task route we would then of had 2 different task scheduling systems which adds needless complexity. With all scheduled tasks occurring through one system its easy to track and maintain them.

(Windows) Exception Handling: to Event Log or to Database?

I've worked in shops where I've implemented Exception Handling into the event log, and into a table in the database.
Each have their merits, of which I can highlight a few based on my experience:
Event Log
Industry standard location for exceptions (+)
Ease of logging (+)
Can log database connection problems here (+)
Can build report and viewing apps on top of the event log (+)
Needs to be flushed every so often, if alot is reported there (-)
Not as extensible as SQL logging [add custom fields like method name in SQL] (-)
SQL/Database
Can handle large volumes of data (+)
Can handle rapid volume inserts of exceptions (+)
Single storage location for exception in load balanced environment (+)
Very customizable (+)
A little easier to build reporting/notification off of SQL storage (+)
Different from where typical exceptions are stored (-)
Am I missing any major considerations?
I'm sure that a few of these points are debatable, but I'm curious what has worked best for other teams, and why you feel strongly about the choice.

You need to differentiate between logging and tracing. While the lines are a bit fuzzy, I tend to think of logging as "non developer stuff". Things like unhandled exceptions, corrupt files, etc. These are definitely not normal, and should be a very infrequent problem.
Tracing is what a developer is interested in. The stack traces, method parameters, that the web server returned an HTTP Status of 401.3, etc. These are really noisy, and can produce a lot of data in a short amount of time. Normally we have different levels of tracing, to cut back the noise.
For logging in a client app, I think that Event Logs are the way to go (I'd have to double check, but I think ASP.NET Health Monitoring can write to the Event Log as well). Normal users have permissions to write to the event log, as long as you have the Setup (which is installed by an admin anyway) create the event source.
Most of your advantages for Sql logging, while true, aren't applicable to event logging:
Can handle large volumes of data:
Do you really have large volumes of unhandled exceptions or other high level failures?
Can handle rapid volume inserts of exceptions: A single unhandled exception should bring your app down - it's inherently rate limited. Other interesting events to non developers should be similarly aggregated.
Very customizable: The message in an Event Log is pretty much free text. If you need more info, just point to a text or structured XML or binary file log
A little easier to build reporting/notification off of SQL storage: Reporting is built in with the Event Log Viewer, and notification systems are, either inherent - due to an application crash - or mixed in with other really critical notifications - there's little excuse for missing an Event Log message. For corporate or other networked apps, there's a thousand and 1 different apps that already cull from Event Logs for errors...chances are your sysadmin is already using one.
For tracing, of which the specific details of an exception or errors is a part of, I like flat files - they're easy to maintain, easy to grep, and can be imported into Sql for analysis if I like.
90% of the time, you don't need them and they're set to WARN or ERROR. But, when you do set them to INFO or DEBUG, you'll generate a ton of data. An RDBMS has a lot of overhead - for performance (ACID, concurrency, etc.), storage (transaction logs, SCSI RAID-5 drives, etc.), and administration (backups, server maintenance, etc.) - all of which are unnecessary for trace logs.

I wouldn't log straight to the database. As you say, database issues become tricky to log :)
I would log to the filesystem, and then have a job which bulk-inserts from files to the database. Personally I like having the logs in the database in the log run primarily for the scaling situation - I pretty much assume I'll have more than one machine running, and it's handy to be able to effectively have a combined log. (Each entry should state the machine it comes from, of course.)
Report and viewing apps can be done very easily from a database - there may be fewer log-specialized reporting tools out there at the moment, but pretty much all databases have generalised reporting functionality.
For ease of logging, I'd use a framework like log4net which takes a lot of the effort out of it, and is a tried and tested solution. Aside from anything else, that means you can change your output strategy with no code changes. You could even log to both the event log and the database if necessary, or send some logs to one place and some to the other. (I've assumed .NET here, but there are similar logging frameworks for many platforms.)

One thing that needs considering about event logging is that there are products out there which can monitor your servers' event logs (like Microsoft Operations Manager) and intelligently do notification, and gather statistics on their contents.
A "minus" of SQL-based logging is that it adds another layer of dependencies to your application, which may or may not always be acceptable. I've done both in my career. I once or twice even used a MSMQ based message queue to queue log events and empty the queue into a MSSQL database to eliminate the need for my client software to have a connection to the DB.

One note about writing to the event log: that requires certain permissions for your application users that in some environments may be restricted by default.
Where I'm at we do most of our logging to a database, with flat files as backup. It's pretty nice, we can do things like get an RSS feed for an app to watch for a few days when we make a change.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas