Portal running with Glassfish 2.1.1, Liferay 5.2 and SSL gets too many blocked threads

I have a portal that runs over SSL on Glassfish and uses Liferay. The last time we sent an email that brought approximately 200 people to the site at the same time to access newly released information, our Glassfish "stalled".
From the server we could see that system resources were OK.
- Glassfish has up to 8 GB to use but was using 5 GB
- The server has 4 CPUs and the overall usage was around 30%
- Glassfish is configured up to 400 HTTP threads.
As soon as we detected that our server wasn't answering users, we started a profiler to understand what was going on.
The threads overview showed too many blocked threads.
From the stacks it's not possible to see any code other than sun, grizzly and catalina classes.
I would like to fix this issue, but right now I can't tell whether I should work on our code or replace some component, for example by disabling SSL.
Any thoughts would be much appreciated.
Thanks.

A thread dump might have been easier and less intrusive than a profiler - this might have shown you where the threads are blocked in the actual running system.
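As a rough illustration of what a thread dump can tell you, the sketch below tallies thread states and the most contended lock addresses from dump text. It assumes HotSpot-style output (as printed by `jstack <pid>` or `kill -3 <pid>`) that includes `java.lang.Thread.State:` lines. The sample dump, the `com.example.Cache` frames and the function name `summarize` are invented for the example and are not taken from the system in the question.

```python
import re
from collections import Counter

# Hypothetical excerpt; a real dump from the stalled server would replace it.
SAMPLE_DUMP = '''\
"httpSSLWorkerThread-8181-0" daemon prio=10 tid=0x01 nid=0x1a waiting for monitor entry [0x02]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Cache.get(Cache.java:42)
    - waiting to lock <0x00000000c0010000> (a java.lang.Object)

"httpSSLWorkerThread-8181-1" daemon prio=10 tid=0x03 nid=0x1b waiting for monitor entry [0x04]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Cache.get(Cache.java:42)
    - waiting to lock <0x00000000c0010000> (a java.lang.Object)

"httpSSLWorkerThread-8181-2" daemon prio=10 tid=0x05 nid=0x1c runnable [0x06]
   java.lang.Thread.State: RUNNABLE
    at com.example.Cache.load(Cache.java:77)
    - locked <0x00000000c0010000> (a java.lang.Object)
'''

STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")
WAITING_RE = re.compile(r"- waiting to lock <(0x[0-9a-f]+)>")


def summarize(dump):
    """Count threads per state and waiters per lock address in a dump."""
    states = Counter(STATE_RE.findall(dump))
    contended = Counter(WAITING_RE.findall(dump))
    return states, contended


states, contended = summarize(SAMPLE_DUMP)
print("thread states:", dict(states))
print("most contended locks:", contended.most_common(3))
```

If most worker threads wait on the same lock address, the stack of the thread that holds it (the one with a matching `- locked <...>` line) is a reasonable place to start looking.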
You'll have to figure out where the blocking occurred: Was it in Liferay's code or in your own? What did you have on the pages, how is the theme done? Also, note that you're running a really old version of Liferay - in case you're running CE this has been out of maintenance for a few years now (Enterprise Edition still being supported, but as you don't mention this, odds are you're running Community Edition (CE))
Further, if you cause situations like the one you describe (sending loads of people at the same time) you might want to load test your system with an artificial load in order to see how it behaves. Also, you might want the landing page to be buffered (this is not to say that 200 users are a lot, but for any such activity you probably want to know that your system can handle it)
Until you prove the opposite, I'd assume that there is some custom component on the page (either a portlet or the theme) that causes a bottleneck and the blocking that you discovered.

Liferay Cloud IDE, Multiple developers working on same Liferay server

We want to start working with Liferay, but the server is too heavy and the developers' computers don't have enough RAM. We want to centralize the server instance.
In other words, we want to build a development server that all developers can connect to, so they can develop directly in their web browser, compile, view the result and push the code to a Git repository.
I found some good cloud IDEs like Eclipse Che and a good Maven archetype for Liferay projects, so I can build the project with Maven. But now I want to know whether it is possible to configure Liferay so that every developer can work without disturbing the others. And if so, how?
The developers can share the same database and can use different ports. Maybe the server can generate temporary URLs like some online cloud editors do.
I found the post "Liferay With Multiple Server Instances", but I don't think it is the best way because it creates one server per project. I think that is too heavy.
If necessary, we have Kubernetes in our IS.
Liferay's tomcat bundle, by default, is configured to take a maximum of 2.5G for the process, but it can run with far less - the default only recently was bumped up, because many people never change the default and then wonder why production systems run out of memory. For 1 concurrent user (the sole developer) on a machine, I guess that the previous default of 1G heap space is enough. Are you saying that that's too much for your developers' machines?
Having many developers on a shared server poses one problem: Yes, you may deploy different code from different machines, but: How about setting a breakpoint? Can you connect with multiple debuggers? If something fails, how do you know whose recent deployment caused the failure?
Sharing a server is an integration technique, not a development technique. If your developers don't have enough memory available for running their own Liferay server next to their IDE, it's a lot cheaper to upgrade their machines than to slow them down when everybody is accessing the same server and they can't properly debug. You pay for the memory once, but you pay your waiting developers by the hour.
Is it possible to share one server? Sure it is.
Is it possible to share one server without troubling each other? I doubt it.
When you say you think it's too heavy: What are you basing that assumption on? What does the actual developer machine look like and what keeps you from investing in the extra memory?
It's trivial to share some infrastructure - i.e. have all of them connect to the same database server (and give everyone their own schema). But just the extra effort and setup might require you to pay the developers by the hour as much as you'd otherwise pay for a couple of memory chips.
And yet another option is: Run Liferay on a remote server, but keep 1 instance per developer. This way you don't need the local memory, but can have the memory in the cloud. Calculate if you pay more for remote cloud machines than for local memory - that decision is up to you.

Instability on Worklight Server

I'm using websphere liberty profile v8.5.5.0 and worklight 6.2.
The full version of my WL and runtime is:
Server version: 6.2.0.00.20140922-2259
Project WAR version: 6.2.0.00.20140922-2259
I've noticed that I sometimes have trouble getting into the worklightconsole: the server takes too long to answer and most of the time it just gives me a timeout.
The JVM heap is at 60-70% of the total heap, most likely 1.5 GB or something like that.
In the FFDC log, I sometimes get an error along these lines:
FFDC Incident has been created: "javax.naming.ServiceUnavailableException: ldap.example.com:389; socket closed; remaining name 'o=example' com.ibm.ws.wim.adapter.ldap.LdapConnection 1670" at ffdc.log
My LDAP is connected to this WebSphere via VPN, and I know that WebSphere has historically had trouble dealing with LDAP.
However I don't see any more errors on the logs; the machine eventually recovers and is able to work correctly, but for some time is 'down'.
If I enable tracing, the verbosity overwhelms the machine and I can't even start the worklightconsole, nor continue to work with Worklight, for example by calling an adapter from an application.
There is one more thing: it seems that this happens more frequently after updates to existing application versions or adapters. Does this ring a bell with anyone?
If I ask for a restart when the machine is sluggish, stopping WebSphere takes quite some time, but it eventually stops normally, and when I start it again everything is fine right off the bat.
Before asking for a PMR, I would like to know if there is something else I could do to troubleshoot this problem.
Thanks in advance.
My initial "smell" of the problem is that sometimes your VPN connection with LDAP is very slow or your LDAP server is taking too long to respond.
My suggestion is that you investigate further with WAIT (wait.ibm.com), a non-invasive, easy-to-use diagnostic tool. If you find that the call to LDAP is hanging, then I suggest you try tuning the Liberty LDAP cache; this should help.
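Before reaching for heavier tooling, a quick way to test the "slow VPN or slow LDAP" suspicion is to time plain TCP connections to the LDAP port from the Worklight host. The following is only a minimal sketch: the function name `time_tcp_connect` is made up here, and it measures the TCP handshake alone, not an LDAP bind or search, so a fast connect does not rule out a slow directory server.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=5.0):
    """Return seconds needed to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None


# Example usage with the host and port from the FFDC message:
#   for _ in range(10):
#       print(time_tcp_connect("ldap.example.com", 389))
#       time.sleep(1)
```

Run in a loop during a sluggish period, occasional `None` results or multi-second timings would point towards the VPN link rather than the application server.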

Bottle WSGI server vs Apache

I don't actually have a problem, I'm just curious about something.
I am building a Python web framework based on Bottle (http://bottlepy.org/). Today I tried a quick comparison of the performance of Bottle's WSGI server and the Apache server. I work on Lubuntu 12.04, using Apache 2, Python 2.7 and the Bottle development version (0.12), and got this surprising result:
As stated in the Bottle documentation, the included WSGI server is only intended for development purposes. The question is: why is the development server faster than the deployment one (Apache)?
As far as I know, a development server is usually slower, since it provides some "debugging" features.
Also, I have never had a response in less than 100 ms when developing PHP applications. But look, it is just 13 ms in Bottle.
Can anybody please explain this? It just doesn't make sense to me. A deployment server should be faster than the development one.
Development servers are not necessarily faster than production grade servers, so such an answer is a bit misleading.
The real reason in this case is likely going to be due to lazy loading of your web application on the first request that hits a process. Especially if you don't configure Apache correctly, you could hit this lazy loading quite a bit if your site doesn't get much traffic.
I would suggest you go watch my PyCon talk which deals with some of these issues.
http://lanyrd.com/2013/pycon/scdyzk/
Especially make sure you aren't using prefork MPM. Use mod_wsgi daemon mode in preference.
A deployment server should be faster than the development one.
True. And it generally is faster... in a "typical" web server environment. To test this, try spinning up 20 concurrent clients and have them make continuous requests to each version of your server. You see, you've only tested 1 request at a time--certainly not a typical web environment. I suspect you'll see different results (we're thinking of both latency AND throughput here) with tens or hundreds of concurrent requests per second.
To put it another way: At 10, 20, 100 requests per second, you might still see ~200ms latency from Apache, but you'd see much worse latency from Bottle's server.
Incidentally, the Bottle docs do refer to concurrency:
The built-in default server is based on wsgiref WSGIServer. This
non-threading HTTP server is perfectly fine for development and early
production, but may become a performance bottleneck when server load
increases.
It's also worth noting that Apache is doing a lot more than the Bottle reference server is (checking .htaccess files, dispatching to child process/thread, robust logging, etc.) and all those features necessarily add to request latency.
Finally, I'd ask whether you tuned the Apache installation. It's possible that you could configure it to be faster than it is now, e.g. by tuning the MPM, simplifying logging, disabling .htaccess checks.
Hope this helps. And if you do run a concurrent benchmark, please do share the results with us.
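A concurrent benchmark along the lines suggested above could be sketched as follows. This is not a replacement for a dedicated load-testing tool; the names `fetch` and `benchmark`, the default of 20 clients and the URLs in the usage comment are assumptions for illustration. It is written for Python 3, so on the Python 2.7 setup from the question it would need to run from a separate interpreter.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Issue one GET request and return its latency in seconds."""
    start = time.perf_counter()
    with urlopen(url, timeout=10) as response:
        response.read()
    return time.perf_counter() - start


def benchmark(url, clients=20, requests_per_client=10, request=fetch):
    """Run `clients` parallel workers, each making sequential requests."""
    def worker(_client_id):
        return [request(url) for _ in range(requests_per_client)]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        batches = list(pool.map(worker, range(clients)))
    elapsed = time.perf_counter() - start

    latencies = sorted(t for batch in batches for t in batch)
    return {
        "requests": len(latencies),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
        "throughput_rps": len(latencies) / elapsed,
    }


# Example usage (hypothetical URLs for the two servers under test):
#   print(benchmark("http://localhost:8080/"))  # Bottle's built-in server
#   print(benchmark("http://localhost/"))       # Apache
```

Comparing the median and maximum latency as well as the throughput for both servers gives a fairer picture than a single request timed in isolation.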

Apache Tomcat 6.0.35 is taking 100% CPU in production

I have been using apache-tomcat-6.0.35 in a production environment. Our application is hosted on Amazon EC2 on a Small instance. The problem we are facing is that Tomcat is using 100% CPU. We verified this by running htop, which shows multiple Tomcat threads running.
Our application was developed in Grails 2.0.1.
We are puzzled as to why this is happening. Can anybody suggest a solution?
Thanks
Probable Cause
Most likely this has been caused by the recent Leap Second and its impact on quite some unaware/unprepared IT systems, including parts of Linux, MySQL, Java and indeed Tomcat - see the Wired article about the ‘Leap Second’ Bug Wreaks Havoc Across Web for the whole story:
[...], saying it experienced the leap bug problem with the
Java-happy Tomcat web servers it uses to serve up its site. “Our web
servers running tomcat came close to zero response (we were able to
handle some requests),” read an e-mail from a site spokesman. “We were
able to connect to servers in order to reset them. Only rebooting the
servers cleared up the issue.” [emphasis mine]
Workaround / Fix
Accordingly, the solution usually boils down to turning it off and on again, i.e. restarting the server in question, though you might be able to avoid this by simply setting the date, as suggested e.g. in the context of:
Linux/Tomcat, see July 1 2012 Linux problems? High CPU/Load? Probably caused by the Leap Second!:
Apparently, simply forcing a reset of the date is enough to fix the
problem:
date -s "`date`"
MySQL, see MySQL and the Leap Second, High CPU and the Fix (also linked from the comments on wwwhizz' answer to MySQL high CPU usage, where you'll find two specific variations how to do this depending on your OS):
The fix is quite simple – simply set the date. Alternatively, you can
restart the machine, which also works. Restarting MySQL (or Java, or
whatever) does NOT fix the problem.
Background / Proposed Solutions
Please note that while the underlying issue is utterly tricky, it is anything but unknown in principle, hence there have been prominent posts/users warning about and explaining this and offering suggestions on how to deal with it in principle, in particular:
An humble attempt to work around the leap second by Marco Marongiu
Time, technology and leaping seconds by Christopher Pascoe
We can't say anything for sure with the information provided. For performance issues, I would recommend a profiler, especially JProfiler, to investigate the cause of the problem. That way you will be able to locate where the problem is.
This program has a trial license, I think that's enough for a quick look.
UPDATE: after carefully reading your question, I see that you have many Tomcat instances running for one website? That would mean that previous Tomcat instances failed to stop; they are still running and hogging all the resources. This is possible. You must kill all the old Tomcat processes before trying to start a new one.
You can kill the processes by hand by "kill -9 " if you are on Linux, before trying to start the server again.
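To check whether several Tomcat processes are really running (htop can list the threads of a single JVM as separate rows, which is easy to mistake for multiple instances), the process list can be filtered for Tomcat's main class, `org.apache.catalina.startup.Bootstrap`. The sketch below assumes Linux `ps -eo pid,args` output; the sample text, its paths and PIDs, and the function name `tomcat_pids` are invented for the example.

```python
TOMCAT_MAIN_CLASS = "org.apache.catalina.startup.Bootstrap"

# Hypothetical `ps -eo pid,args` output; paths and PIDs are made up.
SAMPLE_PS = """\
  PID COMMAND
 1203 /usr/bin/java -Dcatalina.base=/opt/tomcat org.apache.catalina.startup.Bootstrap start
 1877 /usr/bin/java -Dcatalina.base=/opt/tomcat org.apache.catalina.startup.Bootstrap start
 2011 sshd: ubuntu@pts/0
"""


def tomcat_pids(ps_output):
    """Return the PIDs of processes whose command line runs Tomcat."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # PID column, then the full command line
        if len(parts) == 2 and parts[0].isdigit() and TOMCAT_MAIN_CLASS in parts[1]:
            pids.append(int(parts[0]))
    return pids


print(tomcat_pids(SAMPLE_PS))

# On a real machine the input could come from:
#   import subprocess
#   ps_output = subprocess.check_output(["ps", "-eo", "pid,args"], text=True)
```

More than one PID for the same `catalina.base` would support the theory that an earlier instance never shut down; a single PID would suggest that htop was showing threads.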

Server Setup: Based on Apache and Tomcat needs

I'm trying to setup a server based on our needs for a new website. Basically, I need to build a website based on social engine, and according to the platform's requirements (found here: http://www.socialengine.net/support/documentation/article?q=152&question=SocialEngine-Requirements) it requires the webserver to be Apache based.
Now my issue comes with the addition of a web application that needs to be included in the site. The web application requires the server to be capable of Asynchronous Request Processing, and is currently only supported by Tomcat or GlassFish.
I found a couple of tutorials such as this one http://www.serverwatch.com/tutorials/article.php/2203891/Integrating-Tomcat-with-Apache.htm that explain how to "integrate" Tomcat into Apache. Would a server running Tomcat alone be able to handle the applet needs as well as serve the Apache (assuming HTTP) needs from the Social Engine platform? Are there any hosting providers any of you would recommend?
Although I've done a lot of front end stuff before, this is the first time I have to deal with any of the back end details, so my knowledge of server side functionality is really garbage. Please let me know if I'm not asking the right questions.
Thanks
You wouldn't really be able to use Tomcat for both apps, since the other one needs PHP. It's pretty common to have both Tomcat and Apache running on the same server. You might want to look up more recent documentation on mixing them, even this but definitely have a look at mod_proxy_ajp.
What's the other application? It's a little tricky to set up Asynchronous Request Processing if you are new to server apps, but there is also a lot of documentation, so if you're game, you can probably figure it out OK. You might also want to see if that app would work with node.js (hosting info here)
If you want to set it all up yourself, you could get a virtual private server from Rackspace Cloud or similar host or get a shared host that has the required apps already set up, which would limit your ability to customize the environment and may require 2 hosting plans, but would be easier to set up. It also somewhat depends on if both apps need to be on the same machine for any reason and/or on the same domain.
A regular LAMP stack will run SE4 just fine, however, you will need to do some tuning to get the page loads under 3 seconds. You will want to remove any Apache modules that you aren't using with a2dismod. For instance, if you're not using any Ruby on the site, a2dismod ruby. This will help get memory usage under control. APC is a must.
For a much more in depth read on tuning php/apache, please read this: Performance tuning on Apache, PHP, MySQL, WordPress v1.1 – Updated