I am using Apache Ignite 2.8.0
GridGain web console is used to monitor the live performance of Ignite.
Is it possible to monitor historical records?
For example, I need to view the performance for the last week; is that possible with the GridGain web console?
When I initially downloaded it, the size was 77MB, but now the size is 256MB, so I suspect it is storing the data somewhere. Does it?
GridGain Control Center, the next monitoring tool from GridGain that will replace Web Console, supports tracing capabilities and more advanced monitoring dashboards that can work with historical data: https://www.gridgain.com/docs/control-center/latest/overview
Our company uses Geode services for some of our applications, and we also make use of Geode member group configurations for maintaining different regions.
We have been undergoing an effort of migrating our applications from Geode version 1.6 to the latest version 1.12.
We have seen a dramatic performance decrease after the upgrade when we use the older parameters in the server and locator startup scripts; things work fine when we remove those parameters.
We are now planning to review the parameters we used earlier, along with those now available, to determine the most optimal configuration for the server and locator and get the best out of the new Geode version.
I was wondering if someone has any best practices or recommendations to follow for this task.
Below are the configurations for the Geode locator and server startup scripts for old and new versions.
Locator startup command
---Old configuration (works great with Geode 1.6 but not with any version after Geode 1.8)
gfsh start locator --locators=$locators_str \
  --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net \
  --initial-heap=2G --max-heap=2G \
  --dir=/opt/compnayname/geode/locator \
  --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-locator.xml \
  --J=-DCLUSTER=${ECS_CLUSTER} \
  --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-locator.xml' \
  --J=-Dgemfire.distributed-system-id=${DISTRIBUTED_SYSTEM_ID} \
  --J=-Dgemfire.member-timeout=30000 \
  --J=-Dgemfire.max-num-reconnect-tries=0 \
  --J=-Dgemfire.jmx-manager=true \
  --J=-Dgemfire.jmx-manager-start=true \
  --J=-Dgemfire.jmx-manager-port=1099 \
  --J=-Dgemfire.http-service-port=0 \
  --J=-Dgemfire.log-level=info \
  --J=-Dgemfire.log-file-size-limit=10 \
  --J=-Dgemfire.log-disk-space-limit=10 \
  --J=-Dgemfire.disable-auto-reconnect=true
---New configuration (works great with all versions)
gfsh start locator --locators=$locators_str \
  --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net \
  --J=-Xmx2048m \
  --dir=/opt/compnayname/geode/locator \
  --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-locator.xml \
  --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-locator.xml'
Server Startup command
---Old configuration (works great with Geode 1.6 but not with any version after Geode 1.8)
gfsh start server --locators=$locators_str \
  --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net \
  --initial-heap=${GEODE_INIT_HEAP} --max-heap=${GEODE_MAX_HEAP} \
  --group=${SERVER_GROUP} \
  --dir=/opt/compnayname/geode/server \
  --classpath=/opt/compnayname/geode/services-geode.jar \
  --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-server.xml \
  --J=-DCLUSTER=${ECS_CLUSTER} \
  --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-server.xml' \
  --J=-Dgemfire.distributed-system-id=${DISTRIBUTED_SYSTEM_ID} \
  --J=-Dgemfire.member-timeout=30000 \
  --J=-Dgemfire.max-num-reconnect-tries=0 \
  --J=-Dgemfire.socket-buffer-size=16777215 \
  --J=-Dgemfire.off-heap-memory-size=${GEODE_OFF_HEAP} \
  --J=-XX:+UseParNewGC \
  --J=-XX:+UseConcMarkSweepGC \
  --J=-XX:CMSInitiatingOccupancyFraction=60 \
  --eviction-heap-percentage=70 --critical-heap-percentage=90 \
  --J=-Dgemfire.http-service-port=0 \
  --J=-Dgemfire.log-level=info \
  --J=-Dgemfire.log-file-size-limit=10 \
  --J=-Dgemfire.log-disk-space-limit=10 \
  --J=-Dgemfire.disable-auto-reconnect=true \
  ${ADDTL_GEODE_SERVER_OPTS}
---New configuration (works great with all versions)
gfsh start server --locators=$locators_str \
  --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net \
  --J=-Xmx${GEODE_MAX_HEAP} \
  --group=${SERVER_GROUP} \
  --dir=/opt/compnayname/geode/server \
  --classpath=/opt/compnayname/geode/services-geode.jar \
  --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-server.xml \
  --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-server.xml'
Test Environment Details
We are using the exact same environment (AWS) for testing the old and new configurations and running the same test to measure response time. We are using 3 Geode locators and 3 Geode servers for the different member groups.
The only difference is the Geode version.
We are actually doing a count operation: we have written a count function that executes on Geode regions and counts the records in the downloaded data, which is actually data sketches (https://datasketches.apache.org/). This count operation on the same data in the same testing environment gives a drastically slower response with the old configuration on any Geode version beyond 1.8.
Another surprising thing is that if I use the old configurations on my local laptop (which serves as both locator and server) with any Geode version greater than 1.8 (including the latest), I do not see this issue. Somehow these extra configurations cause slowness only in the distributed AWS infrastructure.
Please let me know if more information is required and I will be glad to provide more details.
Any information on this will be appreciated.
The main difference I see is the inclusion of --J='-javaagent:/opt/xyz/geode/jmxtrans-agent-1.2.6.jar=/opt/xyz/geode/jmxtrans-agent-server.xml' in the startup parameters. This seems to be a third party Java Agent to expose several JVM metrics through JMX.
Do you know if the agent itself is modifying the byte code? I've seen negative effects of that approach for Geode applications in the past (not performance related, though). Have you tried upgrading the agent to the latest available version (1.2.10)? As a side note, Geode already exposes a lot of metrics and information through JMX out of the box; is there any reason why you're relying on yet another external tool for this?
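If you want to see what Geode already exposes out of the box, a quick check from gfsh looks something like this (the locator host/port and member name below are placeholders, not taken from your setup):

$ gfsh
gfsh> connect --locator=localhost[10334]
gfsh> show metrics
gfsh> show metrics --member=server1 --categories=jvm

The same data is available over JMX through the JMX manager node (port 1099 in your old configuration), so most JMX-capable monitoring tools can read it without an extra agent.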
We have seen a dramatic performance decrease after the upgrade
How are you measuring performance? Where do you see the degradation? Are you executing exactly the same workload on exactly the same machines, where the only difference is the Geode version? There are several actors in play here.
That said, diagnosing and troubleshooting performance degradations can be a long and tough process, so my suggestion would be to open a Geode JIRA ticket with all the relevant information and artefacts.
Cheers.
Due to the firewall, I am unable to use PerfMon to check the load on an Apache server running on RHEL against the load from JMeter. So I need some other tool that can measure CPU and memory utilization of the RHEL 7 Apache server. If there is any tool that can be used to check performance under load, kindly suggest one.
Are you looking for a free or a paid solution? You can take a look at OctoPerf; we support monitoring dozens of backend technologies combined with JMeter load testing. Our on-premise monitoring agent supports out-of-the-box monitoring for Apache HTTPD Server, the Linux operating system, and more.
If you are looking for free solutions, you can take a look at Zabbix.
Currently I use the JMeter aggregate report or summary report for submitting reports, but they expect something extra. How can I provide that? Are there any plugins for getting server resource usage while load testing?
Reporting: since JMeter 3.0 there is an HTML Reporting Dashboard which can be generated during the test run. It contains exhaustive overview information. If you need to find out the reason for a bottleneck or memory leak or whatever, you can consider the extra graphs available via the JMeter Plugins project.
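For instance, generating the dashboard at the end of a non-GUI run looks roughly like this (the test plan and output paths are placeholders):

jmeter -n -t test_plan.jmx -l results.jtl -e -o report_dashboard/

The -e flag builds the report once the run finishes and -o points to an empty directory where the HTML files will land.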
The same JMeter Plugins project provides PerfMon, a client-server application which is able to collect over 70 different metrics and plot them via a JMeter listener. See the How to Monitor Your Server Health & Performance During a JMeter Load Test guide for detailed setup and usage instructions.
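To give an idea of the PerfMon setup: you run the ServerAgent on the machine under test and point the PerfMon Metrics Collector listener at it. Starting the agent looks something like this (the install path is an assumption; adjust the port to taste):

cd /opt/serveragent
./startAgent.sh --tcp-port 4444 --udp-port 0

The Metrics Collector in your test plan then connects to that host and port to pull CPU, memory, disk and network samples.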
There are quite a few plug-ins available that can help you analyze the results better. You can refer to https://jmeter-plugins.org/ for the same.
The most popularly used ones are:
Response Times Over Time
Response Times Percentiles
Transactions per Second
Response Latencies Over Time
For server usage, you can use the following, which comes with the JMeter plug-ins:
PerfMon Metrics Collector and Server Agent, or
on a Unix-based system, the sar command that comes with the sysstat package, or vmstat. On a Windows-based system, use Perfmon to capture system utilization data while the test is running, and then use kSar to plot graphs from the data collected with sar: https://sourceforge.net/projects/ksar/
If you have collected data using Perfmon, then plot the graphs using PAL: https://pal.codeplex.com/
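As a rough sketch of the sar route on Linux (the interval and sample counts are arbitrary choices):

# CPU usage every 5 seconds, 120 samples, binary output for kSar
sar -u -o cpu_load.sar 5 120
# memory utilization with the same cadence
sar -r -o mem_load.sar 5 120

The -o files can then be opened in kSar to produce the graphs.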
In this case, I would suggest using Grafana. It shows real-time results, and the best thing is, it can be configured according to your needs.
Now, the thing is: how do you use it? Using it is not that tough.
If you're using a Mac or Linux (any flavour), things are easy. If you're using Windows, I would suggest using a virtual machine, because Windows blocks traffic after a certain number of requests, and that causes a lot of headaches.
In my case, I used a virtual machine to set up Ubuntu inside it and then configured Grafana.
For working with Grafana, you need to have these two things installed.
Grafana itself
InfluxDB for the backend
Links for both here below:
https://grafana.com/grafana/download?platform=linux
https://portal.influxdata.com/downloads/
Once installed and set up, you need to use the Backend Listener to push results to the Graphite client (installed along with InfluxDB automatically).
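As a rough sketch of that wiring (the database name and port are the common defaults, adjust to your install), enable the Graphite listener in the InfluxDB config and point JMeter's Backend Listener (the GraphiteBackendListenerClient implementation) at it:

# /etc/influxdb/influxdb.conf
[[graphite]]
  enabled = true
  database = "jmeter"
  bind-address = ":2003"

In the Backend Listener, set graphiteHost to the InfluxDB machine and graphitePort to 2003; Grafana then reads from the jmeter database.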
I know it is a bit confusing but once you understand the thing, you and your client will love the detailed reports.
Remember, Grafana is all about configuration.
Let me know if you have any confusion regarding this.
Happy to help. :)
I am using collectd to gather metrics for system performance and MySQL and display them in Grafana. I have that working now, and I want to monitor the web server and services as well, but I am facing some issues. Is there another way to monitor them?
If you're willing to enable mod_status, collectd can scrape that page to provide web server metrics:
https://collectd.org/wiki/index.php/Plugin:Apache
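As a minimal sketch of that setup (the URL and instance name are illustrative), first expose mod_status in the Apache config, then point the collectd apache plugin at it:

# Apache: machine-readable status page
<Location "/server-status">
    SetHandler server-status
    Require local
</Location>

# /etc/collectd.conf
LoadPlugin apache
<Plugin apache>
  <Instance "local">
    URL "http://localhost/server-status?auto"
  </Instance>
</Plugin>

The ?auto suffix returns the plain-text variant of the page that collectd parses.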
Just to elaborate, you can use the collectd MySQL plugin to gather metrics from MySQL. Please see https://collectd.org/wiki/index.php/Plugin:MySQL
However, I would recommend a more robust solution using InfluxDB as the datastore software since it is designed to work with high I/O loads.
Check out this guide; it was very useful to me: https://www.linkedin.com/pulse/monitoring-collectd-influxdb-grafana-alan-wen (no, I'm not the author.)
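If you go that route, a minimal sketch of the hand-off (the port is the conventional default): collectd ships metrics with its network plugin and InfluxDB accepts them with its collectd input.

# /etc/collectd.conf — send metrics off-host
LoadPlugin network
<Plugin network>
  Server "influxdb-host" "25826"
</Plugin>

# /etc/influxdb/influxdb.conf — receive them
[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd"
  typesdb = "/usr/share/collectd/types.db"

Grafana is then pointed at the collectd database as an InfluxDB data source.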
Many times, I get:
- Frozen: the load goes to 5.0 and I can't use my box.
- It just doesn't work.
Do the following steps:
1. rabbitmq-plugins enable rabbitmq_management
2. service rabbitmq-server restart
3. Browse to http://rabbitmq-server-ip:15672
4. Log in with username guest and password guest.
Don't forget to change your password later.
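For that last step, a minimal sketch with rabbitmqctl (the user name and password are placeholders):

# create a proper admin account, then drop the default one
rabbitmqctl add_user admin 'use-a-strong-password'
rabbitmqctl set_user_tags admin administrator
rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
rabbitmqctl delete_user guest

Note that recent RabbitMQ versions only allow the guest account to log in from localhost anyway.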
As sheki notes, rabbitmqctl is your first port of call for diagnostics, and for building monitoring on top of, but being a manual command-line tool it's not suitable for actual monitoring directly.
I've found DataDog very good to monitor both the MQ details, plus the host platform in parallel. e.g. you can watch the queue levels and set alerts on queues backing-up, while also watching the CPU/memory/IO inflicted by these queue levels. It really helps to get ratios of resource usage, and the alerts are good. Having a uniform platform for both infrastructure and application level monitoring is surprisingly rare, but speeds up diagnoses of production issues hugely.
NewRelic is similar and also has a RabbitMQ plugin, although I've not used this plugin specifically, I've used NR for years and found it invaluable in diagnosing operational issues.
AppDynamics is another example. Similarly this allows you to drill down into your app from a high-level dashboard, and visually navigate from problems to causes. It's especially good with visualising the network of a distributed application across various services/servers. I've used this, for example, to find complex problems in .NET applications and SQL Server clusters using 3rd party Web Services (e.g. latency and its consequences to your app over chatty protocols). These things are very difficult to diagnose, especially for developers who are limited to checking their code. Diagnosing operational issues requires a much broader picture.
I gave up trying to even install and configure Nagios. I know it's the 'best' but it's the best of an old breed of self-configured beasts which we don't have time to manage. I didn't even get it going... and eventually turned to the more 'modern' cloud approach. Once you get over the trust factor, it's pretty liberating.
I'm using these APM platforms together* to aggregate data from:
Windows O/S level Event Logs/Services
Linux O/S level
AWS console level
RDS, EC2
Apache
MySQL
App integrations / custom NR plugins I've written
Rabbit MQ
*NewRelic can feed into Datadog! So if you are already using NR you don't need to install DD on those hosts as well.
Being able to view all these levels together gives you a view on the publishers, middleware, MQ servers, workers and front-end app - all in one dashboard.
I would highly recommend an approach like this, because just looking at one server alone leads you to a lot of head-scratching. Seeing an entire stack in one customisable dashboard is just so illuminating it takes most of the guesswork out of it.
Worried about installing these things? I found New Relic to be especially light-weight and unobtrusive. AppDynamics seemed to stress the host a bit more, but mostly that's because you had to run the visualisation tools on the host! (this may have changed). DataDog seems performant, but creates a lot of control panels/icons on the target host (perhaps just a visual impression).
To a four-year-old question: this answer probably wasn't available in 2011, but in 2015 these once 'startup' style APM services are just tens or hundreds of dollars a month for an unbelievably rich enterprise-level solution.
There are a bunch of RabbitMQ monitoring plugins available for different monitoring systems like Nagios, Zabbix, etc.
Look at http://www.rabbitmq.com/how.html#management
Using rabbitmqctl is the most straightforward solution to check the status of the node.
$ rabbitmqctl status
This should tell you the status of the RabbitMQ node.
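Beyond status, a few other rabbitmqctl commands are handy for spot checks (the columns shown are standard; the values of course depend on your setup):

# queue depths and consumer counts
rabbitmqctl list_queues name messages consumers
# current connections and channels
rabbitmqctl list_connections
rabbitmqctl list_channels

These are the same numbers the management UI displays, just from the shell.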
If you have PRTG (or any probe system with a HTTP sensor check), you can check the server status described at the following page:
https://blog.cdemi.io/monitoring-rabbitmq-in-prtg/
In particular, you have to:
Enable Management Plugin
The rabbitmq-management plugin provides an HTTP-based API for management and monitoring of your RabbitMQ server, along with a browser-based UI and a command line tool, rabbitmqadmin. The management plugin is included in the RabbitMQ distribution. To enable it, we need to run rabbitmq-plugins enable rabbitmq_management on the RabbitMQ nodes. For more details on the management plugin, refer to the RabbitMQ documentation.
The web UI is located at http://server-name:15672/. The HTTP API and its documentation are both located at http://server-name:15672/api/.
Once done, you can check the overview of your server with the API:
http://server-name:15672/api/overview
There you get a JSON document with all the details about the server, active connections, queues, etc.
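From a shell or any probe that can do HTTP basic auth, that check is a one-liner (credentials are whatever monitoring user you configured; note that guest only works from localhost on recent RabbitMQ versions):

curl -s -u guest:guest http://server-name:15672/api/overview | python -m json.tool

The PRTG HTTP sensor mentioned above polls the same URL in the same way.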
This command will help you: service rabbitmq-server status
Or try these: service rabbitmq-server stop and service rabbitmq-server start, then service rabbitmq-server status.
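On systemd-based distributions the equivalent calls are (assuming the same service name):

systemctl status rabbitmq-server
systemctl restart rabbitmq-server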