Capture a flight recording from JMC/jcmd vs. WLDF image capture - WebLogic

I need a Java flight recording to diagnose a performance problem on production WebLogic servers. I'd also like to get the WebLogic events. Is there any difference between starting the flight recording from Java Mission Control (or, in my case, jcmd) versus initiating a WLDF diagnostic image capture? I understand the WLDF image contains zipped files in addition to the .jfr, but right now I'm only interested in the flight recording (.jfr) with both the HotSpot JVM and WebLogic events.
The reason I ask is that I noticed something in the WLDF docs called "Configure WLDF diagnostic volume" (Off, Low, Medium, High) where you set what types of WebLogic events you want to record. Will starting a flight recording from jcmd on a WebLogic Java instance include the WebLogic events at the preconfigured diagnostic volume, or do you need to start it from the WebLogic Admin Console?

There is no mechanism in WLS that continuously polls to see if a recording has been started using jcmd or JMC and, if so, enables the WLDF events.
You have to enable them separately in the WLDF GUI [1]. When you do that, you will also get JVM events roughly corresponding to what you get when you create a default recording. If you want more detailed information (profiling), you need to start two separate recordings.
[1] It can be good to know that the WLDF events are added using bytecode instrumentation, so the events are not even in the code until you enable the diagnostic feature.

Everything recorded into the flight recorder is recorded into the same buffers. See http://hirt.se/blog/?p=370. That said, the WLDF instrumentation settings will throttle what is actually recorded. So, there are various ways to achieve what you want. The first thing to do is to make sure that you've enabled the diagnostic volume in WLDF to record whatever you want WLDF to put into the flight recorder, for example "high".
Next you can either:
Start a continuous recording using command-line flags, with a template configured to record what you are interested in (for example, the profiling template minus the full thread stack dump events).
...or use jcmd to start a recording, again referring to a template that specifies what, in addition to the WLDF events, you want to record (see the example commands below).
...or use JMC to do pretty much the same thing - start a recording with the template settings you are interested in.
The advantage of the first alternative is that the events you are interested in will always be available, even if you dump an arbitrary time period. In the other two alternatives, they will only be available for the time you are running your (presumably) time limited recording. The advantage of the other alternatives is that you only pay for the (usually tiny) additional overhead of the additional events when your recordings are running.
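As a rough illustration of the first two alternatives, assuming an Oracle JDK 7u40+/8 JVM (where Flight Recorder is a commercial feature that has to be unlocked; JDK 11+ drops that requirement and some option names differ between updates) and a WebLogic server process with PID 12345, the commands might look like this. The recording names, duration and file paths are placeholders, and settings= can point at one of the shipped templates (default, profile) or a custom .jfc file as suggested above:

# continuous recording via JVM flags in the WebLogic start script
JAVA_OPTIONS="$JAVA_OPTIONS -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=name=continuous,settings=profile"

# later, dump an arbitrary window of the continuous recording
jcmd 12345 JFR.dump name=continuous filename=/tmp/continuous.jfr

# ...or start a time-limited recording on the running server with jcmd
jcmd 12345 VM.unlock_commercial_features
jcmd 12345 JFR.start name=diag settings=profile duration=10m filename=/tmp/diag.jfr

Either way, the WLDF events only end up in the resulting .jfr if the WLDF diagnostic volume is set to something other than Off.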

Related

Using Graylog to monitor resources + notifications

Since we're already using Graylog (version 2.4.6) as a general purpose logging backend for our project, we thought we might as well also use it to monitor resource use. The three major benefits would be:
No need to change our codebase to add additional libraries.
Easy to create charts and graphs for the metrics we're tracking.
Built-in notifications.
Concretely, we're trying to track how many jobs our various Beanstalk servers have in each of their tubes. If a given tube accumulates more than a certain number of jobs, we would like to be alerted.
Here's a typical message that we're using for a given tube:
{
"count" => $totalJobsInTube,
"tube" => $tubeName,
"env" => $env,
}
I can't think of a way to set up an alert condition in Graylog that allows me to specify a query + which field to look at. The only conditions we have are:
Field content alert condition
Field aggregation alert condition
Message count alert condition
Message conditional count alert condition
Can this even be done in Graylog?
Graylog uses Elasticsearch as a backend, which is not a good system for metrics (time-series data): it's not efficient and doesn't scale well for that kind of workload. This is why most people use a separate monitoring system for measuring resources and other time-series data. It depends on your stack, but there are plenty of open-source and commercial offerings that can do this.
If you want to do logs and metrics together, I would suggest the open-source Elastic Stack, which can do both, but that is only my recommendation if you have a limited number of metrics. Splunk and Sumo Logic can also handle logs and metrics, but they are not ideal for time series, especially large numbers of them.

Can VMs on Google Compute detect when they've been migrated?

Is it possible to notify an application running on a Google Compute VM when the VM migrates to different hardware?
I'm a developer for an application (HMMER) that makes heavy use of vector instructions (SSE/AVX/AVX-512). The version I'm working on probes its hardware at startup to determine which vector instructions are available and picks the best set.
We've been looking at running our program on Google Compute and other cloud engines, and one concern is that, if a VM migrates from one physical machine to another while running our program, the new machine might support different instructions, causing our program to either crash or execute more slowly than it could.
Is there a way to notify applications running on a Google Compute VM when the VM migrates? The only relevant information I've found is that you can set a VM to perform a shutdown/reboot sequence when it migrates, which would kill any currently-executing programs but would at least let the user know that they needed to restart the program.
We ensure that your VM instances never live migrate between physical machines in a way that would cause your programs to crash the way you describe.
However, for your use case you probably want to specify a minimum CPU platform version. You can use this to ensure that e.g. your instance has the new Skylake AVX instructions available. See the documentation on Specifying the Minimum CPU Platform for further details.
As per the Live Migration docs:
Live migration does not change any attributes or properties of the VM
itself. The live migration process just transfers a running VM from
one host machine to another. All VM properties and attributes remain
unchanged, including things like internal and external IP addresses,
instance metadata, block storage data and volumes, OS and application
state, network settings, network connections, and so on.
Google does provide a few controls to set instance availability policies, which also let you control aspects of live migration. They also mention what you can look for to determine when a live migration has taken place.
Live migrate
By default, standard instances are set to live migrate, where Google
Compute Engine automatically migrates your instance away from an
infrastructure maintenance event, and your instance remains running
during the migration. Your instance might experience a short period of
decreased performance, although generally most instances should not
notice any difference. This is ideal for instances that require
constant uptime, and can tolerate a short period of decreased
performance.
When Google Compute Engine migrates your instance, it reports a system
event that is published to the list of zone operations. You can review
this event by performing a gcloud compute operations list --zones ZONE
request or by viewing the list of operations in the Google Cloud
Platform Console, or through an API request. The event will appear
with the following text:
compute.instances.migrateOnHostMaintenance
In addition, you can detect directly on the VM when a maintenance event is about to happen.
Getting Live Migration Notices
The metadata server provides information about an instance's
scheduling options and settings, through the scheduling/
directory and the maintenance-event attribute. You can use these
attributes to learn about a virtual machine instance's scheduling
options, and use this metadata to notify you when a maintenance event
is about to happen through the maintenance-event attribute. By
default, all virtual machine instances are set to live migrate so the
metadata server will receive maintenance event notices before a VM
instance is live migrated. If you opted to have your VM instance
terminated during maintenance, then Compute Engine will automatically
terminate and optionally restart your VM instance if the
automaticRestart attribute is set. To learn more about maintenance
events and instance behavior during the events, read about scheduling
options and settings.
You can learn when a maintenance event will happen by querying the
maintenance-event attribute periodically. The value of this
attribute will change 60 seconds before a maintenance event starts,
giving your application code a way to trigger any tasks you want to
perform prior to a maintenance event, such as backing up data or
updating logs. Compute Engine also offers a sample Python script
to demonstrate how to check for maintenance event notices.
You can use the maintenance-event attribute with the waiting for
updates feature to notify your scripts and applications when a
maintenance event is about to start and end. This lets you automate
any actions that you might want to run before or after the event. The
following Python sample provides an example of how you might implement
these two features together.
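To make that concrete without reproducing Google's sample, here is a minimal Python sketch (my own illustration, not the official script) that long-polls the maintenance-event attribute using the documented wait_for_change mechanism and the required Metadata-Flavor: Google header; what you do when the value changes is up to your application:

import time
import requests  # third-party HTTP library, assumed to be installed

# Documented metadata path for maintenance notices.
METADATA_URL = ('http://metadata.google.internal/computeMetadata/v1/'
                'instance/maintenance-event')
HEADERS = {'Metadata-Flavor': 'Google'}

def watch_maintenance_events():
    last_etag = '0'
    while True:
        # Long-poll: returns when the value changes or the server times out.
        resp = requests.get(
            METADATA_URL,
            params={'wait_for_change': 'true', 'last_etag': last_etag},
            headers=HEADERS)
        if resp.status_code == 503:
            # Metadata server briefly unavailable; back off and retry.
            time.sleep(1)
            continue
        resp.raise_for_status()
        last_etag = resp.headers['etag']
        if resp.text == 'NONE':
            print('Maintenance event finished')
        else:
            # Typically MIGRATE_ON_HOST_MAINTENANCE; a real handler might
            # checkpoint work or re-probe CPU features after the event.
            print('Maintenance event starting:', resp.text)

if __name__ == '__main__':
    watch_maintenance_events()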
You can also choose to terminate and optionally restart your instance.
Terminate and (optionally) restart
If you do not want your instance to live migrate, you can choose to
terminate and optionally restart your instance. With this option,
Google Compute Engine will signal your instance to shut down, wait for
a short period of time for your instance to shut down cleanly,
terminate the instance, and restart it away from the maintenance
event. This option is ideal for instances that demand constant,
maximum performance, and your overall application is built to handle
instance failures or reboots.
Look at the Setting availability policies section for more details on how to configure this.
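For reference (flag names as in the gcloud documentation at the time of writing; instance name and zone are placeholders), switching an existing instance between the two behaviours is done with gcloud's set-scheduling command:

gcloud compute instances set-scheduling my-instance --zone us-central1-a --maintenance-policy MIGRATE
gcloud compute instances set-scheduling my-instance --zone us-central1-a --maintenance-policy TERMINATE

The automaticRestart attribute mentioned above maps to the --restart-on-failure / --no-restart-on-failure flags of the same command.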
If you use an instance with a GPU or a preemptible instance, be aware that live migration is not supported:
Live migration and GPUs
Instances with GPUs attached cannot be live migrated. They must be set
to terminate and optionally restart. Compute Engine offers a 60 minute
notice before a VM instance with a GPU attached is terminated. To
learn more about these maintenance event notices, read Getting live
migration notices.
To learn more about handling host maintenance with GPUs, read
Handling host maintenance on the GPUs documentation.
Live migration for preemptible instances
You cannot configure a preemptible instance to live migrate. The
maintenance behavior for preemptible instances is always set to
TERMINATE by default, and you cannot change this option. It is also
not possible to set the automatic restart option for preemptible
instances.
As Ramesh mentioned, you can specify a minimum CPU platform to ensure your instance only runs on hosts that offer at least the CPU platform you specified. In summary, when you specify a minimum CPU platform:
Compute Engine always uses the minimum CPU platform where available.
If the minimum CPU platform is not available or the minimum CPU platform is older than the zone default, and a newer CPU platform is available for the same price, Compute Engine uses the newer platform.
If the minimum CPU platform is not available in the specified zone and there are no newer platforms available without extra cost, the server returns a 400 error indicating that the CPU is unavailable.
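As a concrete illustration of the above (instance name, zone and platform string are placeholders), a minimum CPU platform is set at creation time with the --min-cpu-platform flag:

gcloud compute instances create hmmer-avx512 --zone us-central1-a --min-cpu-platform "Intel Skylake"

Combined with the maintenance-event notice described earlier, this gives your program a guaranteed floor on the CPU generation it runs on, plus advance notice when a maintenance event is about to affect the host.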

JProfiler: want to store measurement data to disk automatically without using the JProfiler GUI?

I use JProfiler to measure memory usage, CPU and garbage collection for my application. I can see all those measurements in the JProfiler GUI. I am also able to store the data for all of these measurements to disk after finishing the test, using options in the GUI, in order to generate a nice report (in Excel, for example).
But I want to do the same task automatically: for example, when the test completes, I want to store all measurements to disk without using the GUI.
Any help?
Thank you
Ibrahim
This is done with offline profiling.
In the "Triggers" section of the session settings, you can set up triggers that are executed for certain events, such as entry / exit of selected methods, timers, low heap / high CPU conditions, JVM start and exit and others.
Each trigger has a list of actions that control the profiling agent. Among other things, they can start and stop recording and save snapshots.
You can then export data from the saved snapshots programmatically with the command line utility jpexport.
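As a sketch of what that could look like (the snapshot and output file names are placeholders, and the available view names and options should be checked against jpexport -help for your JProfiler version), a post-test export of the telemetry data might be:

jpexport test-run.jps TelemetryHeap -format=csv heap.csv
jpexport test-run.jps TelemetryCPU -format=csv cpu.csv
jpexport test-run.jps TelemetryGC -format=csv gc.csv

The snapshot itself would be produced by a "Save snapshot" trigger action as described above, with the JVM started in offline profiling mode (an -agentpath option that includes the offline parameter and a session id).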

Java application exception monitoring & alarms

We have a few applications running on Windows 2000 and 2008 servers. They are written in Java.
These applications need to perform many automation tasks. We are having difficulty monitoring them. Sometimes, for one reason or another, an application either hangs or fails to perform its desired job. We only find out about this after a few days, when someone reports that a desired function hasn't been executed.
To work around this issue, we added emails for each important exception, but then a developer needs to spend time checking those 1000 emails every day, which again is neither a feasible nor an efficient solution.
Now we are looking for an alert, alarm, notification display and monitoring system. We need a remote application which can receive alarms from these Java applications and then, based on certain information/conditions/configuration, display some red, orange or green text on the screen. From the red text, users can visually see that there is an issue in the system. If required, users can be notified that there is a serious issue in the application.
Please help us identify any existing mechanism, tool or package to achieve this goal. Any suggestions would be highly appreciated.
Thanks
There are myriad ways to achieve this, but all of them will require some effort. Which way to proceed depends on your needs and abilities. A couple of options occur to me:
Have your processes log their exceptions to a syslog daemon running on some central server. Then you could have an admin read through the log file for serious problems; there are also many ways to post-process syslog messages, and a web search might give some more hints.
Is there any way, when logged into the server, to observe whether or not the process is running properly? You could install something like Nagios on a server and write a plugin that monitors your particular process on all the servers. The plugin can basically be a shell script that checks "ps", or a log file, or whatever you want.
If you are in an IT department, your organization might already have some system like this (NMS).
I'm not sure why this question is tagged "snmp", but it's technically possible to install an SNMP agent on each server, and have them send traps on certain conditions. I do think it would be slightly overkill because you would also have to get a good SNMP manager to receive the traps and alert a sysadmin.
I would go for a combination of the check_logfiles plugin to parse log exceptions and raise alerts, and check_jmx/jmxquery to check metrics inside the JVM such as heap usage and thread count.
check_logfiles
check_jmx
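As an illustration of the JMX half (hostname, port and thresholds are placeholders, and flag names vary a little between check_jmx variants, so treat this as a pattern rather than a copy-paste command), a Nagios service check for heap usage might look like:

check_jmx -U service:jmx:rmi:///jndi/rmi://app-host:9999/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -w 1500000000 -c 1800000000

This queries the HeapMemoryUsage attribute (used key) of the java.lang:type=Memory MBean and raises WARNING/CRITICAL when the thresholds are crossed; the monitored JVMs must be started with JMX remote access enabled for this to work.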

Automatic hibernation of application instance on CloudBees

I have a CloudBees enterprise instance that I use for performance and automated UI testing.
The free instance (which is limited in memory) cannot support the memory or requests per second that we need for testing.
I would like to have the instance automatically hibernated when I am not using it but have it wake up when requests come in. I would configure a jenkins job to wake the app up (by issuing a request) before kicking off my sauce lab based selenium jobs.
My question is: how do I configure automatic hibernation? The control panel has a minimum of one instance, which I guess means that the one instance stays up.
You are right - currently automatic hibernation is only for free applications. When an application is hibernated (as opposed to stopped), it will be automatically woken whenever someone needs to access it.
What you could do is have a job set your application to hibernated, say once a day (or at a certain time of day when you know it won't be needed). When it is needed again you won't need to do anything: simply accessing it will cause it to be activated (woken) again, so your test script can just ensure that is the case (and ideally, after a test run, set it to hibernated again).
It really depends on how often the app is needed: if you can work out at what points it isn't needed and trigger the hibernation off that (e.g. after a test run), then that is ideal (you minimise cost).