Apache Tomcat Threads in WAITING state while thread pool increases - apache

I am trying to analyse thread dumps taken from my Tomcat server. One dump was taken after a couple of minutes of uptime and shows a thread pool of about 70 threads, several of them in WAITING state. I left a script hitting the server overnight and took another thread dump in the morning. Comparing the two dumps, I can see that the thread pool has grown from 70 threads to 90. I can also see that the same threads are in WAITING state in both dumps, while 20 new threads have been added. Does this suggest a bug in my application, or is this standard behavior? I am wondering why the waiting threads are not being reused and new threads are being created instead. I am assuming that these threads have not been used at all between the two dumps because the dump reports them as "waiting on <id>", where the value in <> is the same in both dumps. Is this assumption correct?
For example, from my initial thread dump I see this:
"http-8000-40" - Thread t#74
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <4fd24389> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at java.lang.Object.wait(Object.java:485)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
- None
I can then see the same thread in the following morning's dump, in the same state and waiting on the same object (I am assuming this from the value in <>):
"http-8000-40" - Thread t#74
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <4fd24389> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at java.lang.Object.wait(Object.java:485)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
- None

Tomcat needs to spend some time managing threads and other resources even after your webapp's code completes processing a request. In order to keep up with the load, Tomcat will allocate new threads if enough aren't available.
If you have 70 total threads and 70 simultaneous requests, all should be well. If one request (of 70) completes (that is, the client has received all the data) and another is made before Tomcat is fully done with the request-processor thread, another thread will be allocated to handle the new request, resulting in a thread pool of size 71.
This can happen many times because it's not deterministic due to context switches, GC pauses, etc. that can interfere with exact timing of everything happening on the server.
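The race described above can be modelled in a few lines. This is only a toy sketch, not Tomcat's JIoEndpoint code: the class WorkerPoolSketch, its method names, and the last-in-first-out reuse of idle workers are assumptions made for illustration. It shows how a request that arrives before a finished worker has been recycled pushes the pool from 70 to 71.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a worker pool that grows when no idle worker is registered (hypothetical, not Tomcat code). */
class WorkerPoolSketch {
    private final Deque<Integer> idle = new ArrayDeque<>(); // ids of workers that are parked and reusable
    private int created = 0;                                 // total number of workers ever created

    /** Hands out the most recently recycled idle worker, or creates a new one if none is idle. */
    int acquire() {
        Integer worker = idle.pollFirst();
        if (worker != null) {
            return worker;
        }
        created++;
        return created; // the id of the newly created worker
    }

    /** Called only after the worker has finished its post-request housekeeping. */
    void recycle(int worker) {
        idle.addFirst(worker);
    }

    int poolSize() { return created; }

    int idleCount() { return idle.size(); }

    public static void main(String[] args) {
        WorkerPoolSketch pool = new WorkerPoolSketch();
        int[] busy = new int[70];
        for (int i = 0; i < busy.length; i++) {
            busy[i] = pool.acquire();          // 70 simultaneous requests
        }
        // The client of busy[0] already has its response, but the worker is not recycled yet.
        pool.acquire();                        // the next request arrives first: worker 71 is created
        pool.recycle(busy[0]);                 // only now does the old worker become idle
        System.out.println("pool size = " + pool.poolSize() + ", idle = " + pool.idleCount());
        // prints "pool size = 71, idle = 1"
    }
}
```

If idle workers are reused most-recently-recycled first, as assumed here, the workers at the bottom of the stack can stay in WAITING for hours under light load, which would be consistent with seeing the same thread waiting on the same object in both dumps.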

Related

Apache Ignite - Partition Map exchange causes deadlock when used with write through enabled cache

We have Ignite running in server mode in our JVM. Ignite goes into a deadlock in the following scenario (the thread stacks are at the end of this question):
a. Create a cache with write-through enabled.
b. In the CacheWriter.write() implementation:
   1. Wait for a second to allow step c to be invoked.
   2. Try to read from another cache.
c. While step b is executing, trigger a thread that creates a new cache.
d. On executing the above scenario, Ignite goes into a deadlock because:
   1. A read lock has been acquired by the cache.put() operation.
   2. When cache creation is triggered in a separate thread, Partition Map Exchange (PME) is also started.
   3. PME tries to acquire all 16 locks, but waits because one read lock is already held.
   4. The cache.get() inside the writer cannot complete, because it waits for the current Partition Map Exchange to complete.
We have faced this issue in production; the above scenario is just a reproducer. The write-through implementation is only trying to read from a cache, and the cache creation happens in a totally different thread.
Why does Ignite block all cache.get() operations for PME when PME does not even hold all the required locks? Shouldn’t the call be blocked only after PME operation has all the locks?
Why does PME stop everything? If I create cache A, then only the operations related to cache A or its cache group should be stopped.
Also is there any solution to solve this deadlock?
Thread executing cache.put() and write through
"main" #1 prio=5 os_prio=0 tid=0x0000000003505000 nid=0x43f4 waiting on condition [0x000000000334b000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4870)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(GridCacheAdapter.java:4830)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1463)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1128)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:688)
at ReadWriteThroughInterceptor.write(ReadWriteThroughInterceptor.java:70)
at org.apache.ignite.internal.processors.cache.GridCacheLoaderWriterStore.write(GridCacheLoaderWriterStore.java:121)
at org.apache.ignite.internal.processors.cache.store.GridCacheStoreManagerAdapter.put(GridCacheStoreManagerAdapter.java:585)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(GridCacheMapEntry.java:6468)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6239)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5923)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4041)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3935)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2039)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1923)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1734)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1717)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2327)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2553)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2016)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1833)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1692)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:300)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:481)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:441)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:249)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1147)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:615)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2571)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2550)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1337)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:868)
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest.writeToCache(WriteReadThroughTest.java:54)
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest.lambda$runTest$0(WriteReadThroughTest.java:26)
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest$$Lambda$1095/2028767654.execute(Unknown Source)
at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:50)
at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:37)
at org.junit.jupiter.api.Assertions.assertDoesNotThrow(Assertions.java:3060)
at WriteReadThroughTest.runTest(WriteReadThroughTest.java:24)
PME thread waiting for locks
"exchange-worker-#39" #56 prio=5 os_prio=0 tid=0x0000000022b91800 nid=0x450 waiting on condition [0x000000002866e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076e73b428> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:897)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1222)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lockInterruptibly(ReentrantReadWriteLock.java:998)
at org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.lock0(StripedCompositeReadWriteLock.java:192)
at org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.lockInterruptibly(StripedCompositeReadWriteLock.java:172)
at org.apache.ignite.internal.util.IgniteUtils.writeLock(IgniteUtils.java:10487)
at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.updateTopologyVersion(GridDhtPartitionTopologyImpl.java:272)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updateTopologies(GridDhtPartitionsExchangeFuture.java:1269)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1028)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3370)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3197)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
at java.lang.Thread.run(Thread.java:748)
Technically, you have answered your own question, which is great work, to be honest.
You are not supposed to have blocking methods in your write-through cache store implementation that might conflict with PME or cause pool starvation.
You have to remember that PME is a show-stopper mechanism: the entire user load is stopped. In short, that is required to ensure ACID guarantees. The lock is indeed divided into multiple parts to speed up processing, i.e. allowing up to 16 threads to perform cache operations concurrently. But a PME does need exclusive control over the cluster, so it acquires the write lock on all of those parts.
"Shouldn’t the call be blocked only after PME operation has all the locks?"
Yes, that's indeed how it's supposed to work. But in your case, PME tries to get the write lock while a read lock is still held, so it waits for that read lock to be released, and all further read lock requests are queued behind the write lock.
"Also is there any solution to solve this deadlock?"
- Move cache-related logic out of the CacheStore. Ideally, do not start caches dynamically, since that triggers PME. Create them in advance if possible.
- Check whether other mechanisms, such as continuous queries or entry processors, would work.
But still, it all depends on your use case.
I don't think creating a cache inside the cache store will work. From the documentation for CacheWriter:
A CacheWriter is used for write-through to an external resource.
(Emphasis mine.)
Without knowing your use case, it's difficult to suggest an alternative approach, but creating your caches in advance or using a continuous query as a trigger works in similar situations.
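The queueing behaviour described in the first answer (a waiting writer holds back later readers) can be reproduced with a plain java.util.concurrent lock. This is a minimal sketch and not Ignite code: the class QueuedWriterSketch and its method are hypothetical, a single ReentrantReadWriteLock stands in for one stripe of Ignite's striped lock, and the 200 ms timeout is arbitrary. Note that in the dump above the cache.get() is parked on a future rather than on the lock itself, so the sketch illustrates only the lock ordering, not the full deadlock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo: a reader arriving after a queued writer has to wait behind it. */
class QueuedWriterSketch {

    /** Returns true if a second reader obtained the read lock while a writer was already queued. */
    static boolean secondReaderGetsLock() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // nonfair, like the lock in the dump
        lock.readLock().lock(); // plays the role of cache.put() holding a read lock

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();   // plays the role of PME: parks until the first reader releases
            lock.writeLock().unlock();
        });
        writer.setDaemon(true);
        writer.start();

        boolean[] acquired = new boolean[1];
        try {
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(10);      // wait until the writer is really parked in the lock's queue
            }
            Thread reader = new Thread(() -> {
                try {
                    // The timed tryLock honours the queue, so it waits behind the writer and times out.
                    acquired[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (acquired[0]) {
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            reader.start();
            reader.join();

            lock.readLock().unlock();  // the first reader finishes, so the writer can proceed
            writer.join();
            return acquired[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired: " + secondReaderGetsLock());
        // prints "second reader acquired: false"
    }
}
```

In the sketch the first reader eventually releases its lock, so everything drains. In the reported scenario the first reader never releases, because it is itself waiting for the exchange to finish, which is why the answers recommend keeping cache operations out of the cache store.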

Weblogic Stuck thread impacts other runnable threads in it

I am using WebLogic 10.3.6 with 8 managed servers, configured with a session timeout of 600 seconds. My issue is that when a session times out after 600 seconds (I receive STUCK thread alerts, which are also configured), my application becomes slow. My question is:
Will all threads be impacted because of one STUCK thread (the STUCK thread was due to a DB transaction timeout)?
I assume they will not be, but wanted to confirm.
It depends on your application. In general, no, but if, for example, the stuck thread is holding a lock on a resource (database, file, etc.) needed by other requests, those may be affected too. Also, depending on what the stuck thread is doing, it may use excessive resources (CPU, memory, disk, etc.). I suggest investigating why the thread is taking so long and if it's possible to

One thread receiving another thread result

I have a thread pool in which I create 10 threads. These threads are used to fetch data between two different processes (process A is on the local machine and process B is on a server) via a socket.
When I run my code, I observe that the result I expect on thread 1 arrives on thread 4; the results are always swapped.
How can I stop this swapping?
To distinguish between threads I am using pthread_key_create(), and for maintaining the threads I am using pthread_setspecific().

Apache Tomcat Threads in WAITING State with 100% CPU utilisation

The application, when subjected to load, sometimes utilises 100% of the CPU.
Doing a kill -quit <pid> showed 1100+ threads in WAITING state:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode):
"http-8080-1198" daemon prio=10 tid=0x00007f17b465c800 nid=0x2061 in Object.wait() [0x00007f1762b6e000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007f17cb087890> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at java.lang.Object.wait(Object.java:485)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
- locked <0x00007f17cb087890> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
at java.lang.Thread.run(Thread.java:619)
"http-8080-1197" daemon prio=10 tid=0x00007f17b465a800 nid=0x2060 in Object.wait() [0x00007f1762c6f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007f17cb14f460> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at java.lang.Object.wait(Object.java:485)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
- locked <0x00007f17cb14f460> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
at java.lang.Thread.run(Thread.java:619)
............
The state does not change even when the application-context is undeployed OR the DB is restarted.
Please suggest a probable cause.
App Server: Apache Tomcat 6.0.26
Max Threads: 1500
Threads in WAITING state : 1138
"waiting on" is not a problem. The thread is waiting to be notified, and in this case it is locked on the JIoEndpoint.Worker:
"The background thread that listens for incoming TCP/IP connections and hands them off to an appropriate processor."
So I think these threads are waiting for actual requests to come in.
Firstly, CPU utilization actually increases when you have many threads, because of the large amount of context switching. Do you actually need 1500? Can you try reducing that number?
Secondly, is the application hogging memory or garbage-collecting too often?
"waiting for" would be a problem if you saw it. Do you have any BLOCKED (on object monitor) or "waiting to lock <...>" entries in the stack traces?
On a Solaris system you can use the command
prstat -L -p <pid> 0 1 > filename.txt
This will give you a breakdown of each lightweight process (LWP) doing work on the CPU, keyed by the lightweight process ID (LWPID) instead of the PID. In your thread dump, you can match the LWPID to the NID (or TID, depending on the implementation) shown on the first line of each thread's entry. By matching the two, you can tell which of your threads is the CPU hog.
Here is an example of the output:
PID USERNAME  SIZE  RSS STATE PRI NICE    TIME   CPU PROCESS/LWPID
687 user     1024M 891M sleep  59    0 0:40:07 12.0% java/5
687 user     1024M 891M sleep  59    0 0:34:43 15.3% java/4
687 user     1024M 891M sleep  59    0 0:17:00  7.6% java/3
687 user     1024M 891M sleep  59    0 1:00:07 31.4% java/2
Then, in a thread dump taken at the same time, you can find these threads:
"GC task thread#0 (ParallelGC)" prio=3 tid=0x00065295 nid=0x2 runnable
"GC task thread#1 (ParallelGC)" prio=3 tid=0x00012345 nid=0x3 runnable
"GC task thread#2 (ParallelGC)" prio=3 tid=0x0009a765 nid=0x4 runnable
"GC task thread#3 (ParallelGC)" prio=3 tid=0x0003456b nid=0x5 runnable
So in this high-CPU case, the problem was garbage collection. You can see this by matching the nid with the LWPID field.
If this helps, I would suggest writing a script that captures the prstat output and the CPU usage all at once. This will give you the most accurate picture of your application.
As for your original two threads, @joseK was correct. Those threads are sitting and waiting to take a request from a user. There is no problem there.
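The matching step can be scripted. In HotSpot thread dumps the nid is printed in hexadecimal, while prstat prints the LWPID in decimal, so the nid has to be converted before the two are compared. The following is a small sketch of that conversion; the class NidMatcher and its method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that turns a thread-dump nid into the decimal id shown by prstat. */
class NidMatcher {
    // Matches the "nid=0x..." token on the first line of a thread's entry in the dump.
    private static final Pattern NID = Pattern.compile("nid=(0x[0-9a-fA-F]+)");

    /** Converts a nid such as "0x2061" into its decimal value. */
    static long nidToLwpId(String nid) {
        return Long.decode(nid); // Long.decode understands the 0x prefix
    }

    /** Extracts the nid from a thread header line and converts it, or returns -1 if there is none. */
    static long lwpIdFromHeader(String headerLine) {
        Matcher m = NID.matcher(headerLine);
        return m.find() ? nidToLwpId(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String gcThread = "\"GC task thread#0 (ParallelGC)\" prio=3 tid=0x00065295 nid=0x2 runnable";
        System.out.println("LWPID " + lwpIdFromHeader(gcThread)); // prints "LWPID 2", i.e. java/2 above
        System.out.println("LWPID " + nidToLwpId("0x2061"));      // prints "LWPID 8289"
    }
}
```

With this, the line for java/2 (31.4% CPU) in the prstat output maps to the thread whose header carries nid=0x2.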

How to cause locks to be freed in one thread which were set by another

I have a simple thread pool written in pthreads implemented using a pool of locks so I know which threads are available. Each thread also has a condition variable it waits on so I can signal it to do work.
When work comes in, I pick a thread by finding an available one in the lock pool. I then fill in a data structure associated with that thread with the work it needs to do, and signal on its condition variable that the thread should start working.
The problem is when the thread completes work. I need to unlock the thread in the lock pool so it's available for more work. However, the controlling thread is the one which set the lock, so the thread can't free this lock itself. (And the controlling thread doesn't know when work is done.)
Any suggestions?
I could rearchitect my thread pool to use a queue where all threads are signaled when work is added so one thread can grab it. However, in the future, thread affinity will likely be a problem for incoming work and the lock pool makes implementation of this easier.
It seems to me that the piece of data that you're trying to synchronize access to is the free/busy status of each thread.
So, have a table (array) that records the free/busy status of each thread, and use a mutex to protect access to that table. Any thread (controller or worker) that wants to examine/change the thread status needs to seize the mutex, but the lock needs to be held only while the status is being examined/changed, not for the entire duration of the thread's work.
To assign work to a thread, you would do:
pthread_mutex_lock(&thread_status_table_lock);
/* search the table for an available thread */
/* assign the work to that thread */
/* set that thread's status to "busy" */
pthread_mutex_unlock(&thread_status_table_lock);
/* signal the thread's condition variable */
And when the thread finishes its work, it would change its status back to "free":
pthread_mutex_lock(&thread_status_table_lock);
/* set this thread's status to "free" */
pthread_mutex_unlock(&thread_status_table_lock);