System.out slow in JBoss - jboss7.x

I'm implementing a project using JBoss 7.1.1 where I need to write about 1.5 MB of text with System.out, but it's taking more than 6 seconds to perform the task. Do you know if it is normal behavior for it to take so long?
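For reference, a minimal sketch of how the write can be timed with and without an explicit buffer (the payload below is just a placeholder string, and note that JBoss 7 typically captures System.out through its logging subsystem, so the println may go through the logger rather than straight to the console):

    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;

    public class SysOutTiming {
        public static void main(String[] args) {
            // Placeholder payload of roughly 1.5 MB; substitute the real text.
            StringBuilder sb = new StringBuilder(1500000);
            for (int i = 0; i < 1500000; i++) {
                sb.append('x');
            }
            String text = sb.toString();

            long start = System.nanoTime();
            System.out.println(text);                      // direct write
            long directMs = (System.nanoTime() - start) / 1000000;

            start = System.nanoTime();
            PrintWriter out = new PrintWriter(new BufferedWriter(
                    new OutputStreamWriter(System.out), 64 * 1024));
            out.println(text);
            out.flush();                                   // single flush at the end
            long bufferedMs = (System.nanoTime() - start) / 1000000;

            System.err.println("direct: " + directMs + " ms, buffered: " + bufferedMs + " ms");
        }
    }

Comparing the two timings shows whether the cost is in the output stream itself or in whatever is capturing it.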

Related

What is the best strategy to select which parameters to use for Geode server and locator startup script

Our company uses Geode services for some of our applications, and we also make use of Geode member group configurations to maintain different regions.
We have been migrating our applications from Geode version 1.6 to the latest version, 1.12.
We have seen a dramatic performance decrease after the upgrade when we keep the older parameters in the server and locator startup scripts; things work fine when we remove those parameters.
We are now planning to go through the parameters (those used earlier and those available) to determine the optimal configuration for the server and locator and get the best out of the new Geode version.
I was wondering if someone has any best practices or recommendations to follow for this task.
Below are the configurations for the Geode locator and server startup scripts for old and new versions.
Locator startup command
---Old configuration (works great with Geode 1.6 but not with any version after Geode 1.8)
gfsh start locator --locators=$locators_str --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net --initial-heap=2G --max-heap=2G --dir=/opt/compnayname/geode/locator --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-locator.xml --J=-DCLUSTER=${ECS_CLUSTER} --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-locator.xml' --J=-Dgemfire.distributed-system-id=${DISTRIBUTED_SYSTEM_ID} --J=-Dgemfire.member-timeout=30000 --J=-Dgemfire.max-num-reconnect-tries=0 --J=-Dgemfire.jmx-manager=true --J=-Dgemfire.jmx-manager-start=true --J=-Dgemfire.jmx-manager-port=1099 --J=-Dgemfire.http-service-port=0 --J=-Dgemfire.log-level=info --J=-Dgemfire.log-file-size-limit=10 --J=-Dgemfire.log-disk-space-limit=10 --J=-Dgemfire.disable-auto-reconnect=true
---New configuration (works great with all versions)
gfsh start locator --locators=$locators_str --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net --J=-Xmx2048m --dir=/opt/compnayname/geode/locator --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-locator.xml --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-locator.xml'
Server Startup command
---Old configuration (works great with Geode 1.6 but not with any version after Geode 1.8)
gfsh start server --locators=$locators_str --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net --initial-heap=${GEODE_INIT_HEAP} --max-heap=${GEODE_MAX_HEAP} --group=${SERVER_GROUP} --dir=/opt/compnayname/geode/server --classpath=/opt/compnayname/geode/services-geode.jar --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-server.xml --J=-DCLUSTER=${ECS_CLUSTER} --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-server.xml' --J=-Dgemfire.distributed-system-id=${DISTRIBUTED_SYSTEM_ID} --J=-Dgemfire.member-timeout=30000 --J=-Dgemfire.max-num-reconnect-tries=0 --J=-Dgemfire.socket-buffer-size=16777215 --J=-Dgemfire.off-heap-memory-size=${GEODE_OFF_HEAP} --J=-XX:+UseParNewGC --J=-XX:+UseConcMarkSweepGC --J=-XX:CMSInitiatingOccupancyFraction=60 --eviction-heap-percentage=70 --critical-heap-percentage=90 --J=-Dgemfire.http-service-port=0 --J=-Dgemfire.log-level=info --J=-Dgemfire.log-file-size-limit=10 --J=-Dgemfire.log-disk-space-limit=10 --J=-Dgemfire.disable-auto-reconnect=true ${ADDTL_GEODE_SERVER_OPTS}
---New configuration (works great with all versions)
gfsh start server --locators=$locators_str --name=${EC2_HOSTNAME}.aws.compnaynamedigital.net --J=-Xmx${GEODE_MAX_HEAP} --group=${SERVER_GROUP} --dir=/opt/compnayname/geode/server --classpath=/opt/compnayname/geode/services-geode.jar --J=-Dlog4j.configurationFile=/opt/compnayname/geode/log4j2-server.xml --J='-javaagent:/opt/compnayname/geode/jmxtrans-agent-1.2.6.jar=/opt/compnayname/geode/jmxtrans-agent-server.xml'
Test Environment Details
We are using the exact same environment (read AWS) for testing the old and new configurations and performing the same test to measure the response time. We are using 3 Geode locators and 3 Geode servers for the different member groups.
The only difference is the Geode version.
We are actually doing a count operation: we have written a count function that executes on Geode regions to count the records in the downloaded data, which is actually Data Sketches (https://datasketches.apache.org/). This count operation, on the same data in the same testing environment, gives a drastically slower response with the old configuration on any Geode version beyond 1.8.
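For reference, the count function is executed server-side on the region; a simplified sketch of what such a function looks like is below (this is not the exact code, the class name is a placeholder):

    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.execute.Function;
    import org.apache.geode.cache.execute.FunctionContext;
    import org.apache.geode.cache.execute.RegionFunctionContext;
    import org.apache.geode.cache.partition.PartitionRegionHelper;

    // Simplified sketch of a server-side count function for a partitioned region.
    public class CountFunction implements Function<Object> {

        @Override
        public void execute(FunctionContext<Object> context) {
            RegionFunctionContext rctx = (RegionFunctionContext) context;
            // Count only the data hosted locally on this member.
            Region<?, ?> localData = PartitionRegionHelper.getLocalDataForContext(rctx);
            context.getResultSender().lastResult(localData.size());
        }

        @Override
        public String getId() {
            return "CountFunction";
        }
    }

It is invoked with FunctionService.onRegion(region) and the per-member results are summed on the caller side.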
Another surprising thing is that if I use the old configurations on my local laptop (which serves as both locator and server) with any Geode version greater than 1.8 (including the latest), I do not see this issue. Somehow these extra configurations cause slowness only in the AWS environment with the distributed infrastructure.
Please let me know if more information is required and I will be glad to provide more details.
Any information on this will be appreciated.
The main difference I see is the inclusion of --J='-javaagent:/opt/xyz/geode/jmxtrans-agent-1.2.6.jar=/opt/xyz/geode/jmxtrans-agent-server.xml' in the startup parameters. This seems to be a third-party Java agent that exposes several JVM metrics through JMX.
Do you know if the agent itself is modifying the byte code? I've seen negative effects of that approach for Geode applications in the past (not performance related, though). Have you tried upgrading the agent to the latest available version (1.2.10)? As a side note, Geode already exposes a lot of metrics and information through JMX out of the box; is there any reason why you're relying on yet another external tool for this?
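To illustrate that last point, the built-in metrics can be read with a plain JMX client pointed at the JMX manager (a rough sketch; the host is a placeholder and the port matches the jmx-manager-port=1099 setting in the old locator command):

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ListGeodeMBeans {
        public static void main(String[] args) throws Exception {
            // Placeholder host: the locator acting as JMX manager.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://locator-host:1099/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // All Geode/GemFire metrics live under the "GemFire" JMX domain.
                Set<ObjectName> names = mbs.queryNames(new ObjectName("GemFire:*"), null);
                for (ObjectName name : names) {
                    System.out.println(name);
                }
            } finally {
                connector.close();
            }
        }
    }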
We have seen dramatic performance decrease after the upgrade
How are you measuring performance? Where do you see the degradation? Are you executing exactly the same workload on exactly the same machines, where the only difference is the Geode version? There are several actors in play here.
That said, diagnosing and troubleshooting performance degradations can be a long and tough process, so my suggestion would be to open a Geode JIRA ticket with all the relevant information and artefacts.
Cheers.

Selenium standalone in docker compose - killed by OS?

I am running an application using docker-compose.
One of the containers is a selenium/standalone-chrome image. I give it an shm_size of 2g.
The application works fine when there is no high load. However, I have noticed that whenever there are concurrent requests to the selenium container (9 concurrent requests on an 8-core machine), Selenium fails silently. It just dies and stays dead. Subsequent requests are not handled. There is nothing in the logs. The last message is:
17:41:00.083 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 5da2cd57f4e8e4f80b907564d7352051 (org.openqa.selenium.chrome.ChromeDriverService)
I am monitoring the RAM and CPU usage using both docker stats and top. RAM is fine, about 50% used. free -m shows shared memory at about 500m. The 8 cores take the load, staying at around 80% most of the time. However, whenever the last request arrives, the processes just die out. CPU usage drops. Shared memory does not seem to be released, though.
In order to make it work again, I have to restart the application. Otherwise, none of the subsequent requests are received or logged.
I suspect there might be some kind of limitation imposed by the OS on the containers, and once they start consuming resources the OS kills them, but to be fair, I have no idea what is going on.
Any help would be greatly appreciated.
Thanks!
Update:
Here is my docker-compose reference
selenium-chrome:
  image: selenium/standalone-chrome
  privileged: true
  shm_size: 2g
  expose:
    - "4444"
This is what my logs look like when it hangs:
And after I kill the docker-compose process and restart it:
I have also tested different images. These screenshots are actually with image selenium/standalone-chrome:3.141.59-gold.
One last thing that puzzles me even more - I am using Selenium for screenshots, and I have added a webhook call in the Java code in case the process fails. I would expect it to fire if the Selenium process dies; however, it seems the Java side does not consider the Selenium connection dead and keeps waiting until I docker-compose down. Then all the messages from the webhook are fired.
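One way to make the Java side notice a hung session is to put a hard timeout around the screenshot call itself, e.g. with a Future, so the webhook fires even if the HTTP connection to Selenium never errors out (a rough sketch; the timeout value and the webhook call are placeholders):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    import org.openqa.selenium.OutputType;
    import org.openqa.selenium.TakesScreenshot;
    import org.openqa.selenium.WebDriver;

    public class ScreenshotWithTimeout {

        private final ExecutorService executor = Executors.newCachedThreadPool();

        public byte[] capture(WebDriver driver) throws Exception {
            Future<byte[]> task = executor.submit(
                    () -> ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES));
            try {
                // Hard limit so a silently dead Selenium container cannot block forever.
                return task.get(60, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                task.cancel(true);
                notifyWebhook("selenium screenshot timed out");  // placeholder for the webhook call
                throw e;
            }
        }

        private void notifyWebhook(String message) {
            // placeholder: send the failure notification here
        }
    }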
Update2:
Here is what I have tried and I know so far:
1. chrome driver version makes no difference
2. shm_size increase does not make any difference
3. jvm memory limit makes no difference - command: ["java", "-Xmx2048m", "-jar", "/opt/selenium/selenium-server-standalone.jar"]
4. always hangs at the same spot .. 8 concurrent processes on an 8-core machine
5. once dead, stays dead
6. lots of chrome processes hang there - ps -aux | grep chrome
6.1 if those processes are killed - sudo kill -9 $(ps aux | grep 'chrome' | awk '{print $2}'), the process does not start again and stays dead.
7. --no-sandbox option does not help
8. the java process is alive on the host - telnet ip 4444 -> connects successfully
I suspect your selenium/standalone-chrome is implemented using Java technology.
And the container's JVM has its maximum memory bounded by a JVM argument such as -Xmx2048m or a similar value.
Research the Selenium JVM setup/configuration files.
What can happen is one or more of the following:
The container was killed for running out of memory because its memory limit was reached. Solution: decrease the JVM max memory bound so that the whole process fits within the container's memory limit (the JVM's total footprint with -Xmx2048m exceeds 2g).
The JVM application crashed with an out-of-memory error. Solution: increase the JVM max memory bound to match the container's memory limit (maybe 2048m is not sufficient for the task).
The container hit its CPU utilization limit for a moment and crashed. I assume Selenium implements massive parallelism (check its configuration). Solution: provide more compute power to the container, or decrease Selenium's parallelism.
Note that periodic resource monitoring tools can fail to identify peak resource stress if the peak is momentary and sharp; only if the resource stress builds up gradually can you identify the breaking point.

Portal running with Glassfish 2.1.1, Liferay 5.2 and SSL get too many blocked threads

I have a portal which runs over SSL on Glassfish and uses Liferay. The last time we sent an email that brought approximately 200 people to access the released information at the same time, our Glassfish "stalled".
From the server we could see that system resources were ok.
- Glassfish has up to 8 GB to use but was using 5 GB
- The server has 4 CPUs and the overall usage was around 30%
- Glassfish is configured up to 400 HTTP threads.
As soon as we detected that our server wasn't answering users, we started a profiler in order to understand what was going on.
The threads overview shows too many blocked threads:
From the stack it's not possible to see any code other than sun, grizzly and catalina classes:
I would like to fix this issue, but right now I can't tell whether I should work on our code or replace some component, for example by disabling SSL.
Any thoughts would be very appreciated.
Thanks.
A thread dump might have been easier and less intrusive than a profiler - this might have shown you where the threads are blocked in the actual running system.
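For example, a tiny helper invoked from a servlet or a scheduled task inside the running server can log which threads are blocked and on which monitor, using the standard ThreadMXBean (a minimal sketch; how you trigger and expose it is up to you):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class BlockedThreadDump {
        public static void dumpBlockedThreads() {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            // Include monitor and synchronizer info so the blocking lock is visible.
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                if (info.getThreadState() == Thread.State.BLOCKED) {
                    System.out.println(info.getThreadName()
                            + " blocked on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
            }
        }
    }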
You'll have to figure out where the blocking occurred: was it in Liferay's code or in your own? What do you have on the pages, and how is the theme done? Also, note that you're running a really old version of Liferay - if you're running CE, it has been out of maintenance for a few years now (the Enterprise Edition is still supported, but as you don't mention it, odds are you're running the Community Edition (CE)).
Further, if you cause situations like the one you describe (sending loads of people at the same time) you might want to load test your system with an artificial load in order to see how it behaves. Also, you might want the landing page to be buffered (this is not to say that 200 users are a lot, but for any such activity you probably want to know that your system can handle it)
Until you prove the opposite, I'd assume that there is some custom component on the page (either a portlet or the theme) that causes a bottleneck and the blocking that you discovered.

Glassfish 2.1.1 domain crash while deploying jbi service if heap size is 2 GB

I have Glassfish 2.1.1 running on top of 32-bit RHEL 4, where I need to frequently deploy and undeploy around 30-40 JBI service assemblies packaged in zip files. I use a shell script where the "asadmin deploy-jbi-service-assembly" and "asadmin start-jbi-service-assembly" commands run inside a loop. A peculiar thing I have encountered is that when I set the heap size of the domains through the Xmx JVM option to 2048m, undeployment goes fine, but during deployment, after around 6-7 deployments, the domain crashes. When I reduce the heap size from 2048m to around 1700m, I am able to deploy all assemblies without a hiccup. But then after deployment I again have to change the heap size and restart the domains. The server has about 48 GB of memory with 2 quad-core CPUs, so resources aren't scarce. This is getting to be a real headache. Can anyone help me out?

Is it normal that my Grails application is using more than 200 MB memory at startup?

My Grails application is running in a development environment. I haven't gone into production yet, but in any case, is it normal that my Grails application requires 230 MB at startup (with an empty bootstrap and no requests handled so far)?
Do you know why this is the case, how to improve memory usage in development mode and, most importantly, whether it is reduced in a production environment?
To answer your questions: yes, it is normal. It's especially normal if you have a lot of GSPs in your application. GSPs are compiled at runtime, so you can speed up their generation by increasing your permgen space.
You can improve memory use and performance in general by making sure that you pass the '-server' flag when you start your server JVM.
I wouldn't blame all that memory usage just on Grails. Because it uses an embedded Tomcat (Jetty in older versions) there will be a decent amount of overhead even when running an empty application.
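To see how much of those 230 MB is actually heap as opposed to permgen, thread stacks and other JVM overhead, you can log the heap figures at startup; a minimal sketch in plain Java (the number reported by top or the task manager will be higher than the heap figures here):

    public class HeapReport {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            long committedMb = rt.totalMemory() / (1024 * 1024);
            long maxMb = rt.maxMemory() / (1024 * 1024);
            // The OS-level figure also includes permgen, thread stacks and native
            // overhead, so it will always be larger than the heap numbers below.
            System.out.println("heap used=" + usedMb + " MB, committed=" + committedMb
                    + " MB, max=" + maxMb + " MB");
        }
    }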
IMO, 230 MB is a lot of memory use for a Java application, but high memory usage is just part of life when writing JVM-based applications.
My online Grails applications run in a VPS with only 512 MB (which includes a Drupal CMS, Apache, the email services, ... and the Tomcat to run Grails), so you can definitely tune your application to use less memory.