I notice that there are some stuck threads.
I would like to work out what could be causing the stuck thread based on the thread dump logs below. Any advice, anyone? Also, what is the difference between a fat lock and a thin lock?
"[STUCK] ExecuteThread: '25' for queue: 'weblogic.kernel.Default (self-tuning)'" id=87495 idx=0x274 tid=15308 prio=1 alive, in native, blocked, daemon
-- Blocked trying to get lock: com/jnn/testController#0x135a26c0[thin lock]
One set of thread dumps alone won't be very helpful in getting to the root cause. Take 4 or 5 sets of thread dumps at an interval of 5 seconds between each, so at the end you have a single log file with around 20-25 seconds' worth of activity on the app server.
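If you don't want to trigger each dump by hand (normally you'd use kill -3 or jrcmd against the server process), here is a minimal in-process sketch of the same idea, using ThreadMXBean to append five dumps at 5-second intervals to one file; the class name and file name are made up for illustration:

import java.io.FileWriter;
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: collects 5 thread dumps, 5 seconds apart, into one log file,
// so roughly 20-25 seconds of activity on the JVM is captured.
public class ThreadDumpCollector {
    public static void main(String[] args) throws Exception {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        try (PrintWriter out = new PrintWriter(new FileWriter("thread-dumps.log", true))) {
            for (int i = 0; i < 5; i++) {
                out.println("=== Thread dump " + (i + 1) + " at " + new java.util.Date() + " ===");
                for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                    out.println(info); // thread state, lock info and a (truncated) stack trace
                }
                out.flush();
                Thread.sleep(5000);
            }
        }
    }
}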
Then look at whether a stuck thread or long-running transaction is happening: if it is, all the thread dumps will show a certain thread id at the same line in your Java stack trace. In simpler terms, the transaction (say in an EJB or database) spans multiple thread dumps and hence needs more investigation.
Now when you run these through Samurai or TDA (I haven't used TDA myself), it will highlight these in red so you can quickly click on them and get to the lines showing issues.
See an example of this here. Look at the Samurai output image in that link. Green is fine. Red and grey need looking at.
In your case, thread 25 is blocked trying to get the lock on this object
com/jnn/testController#0x135a26c0
Search the rest of the log to see what is holding a lock on the same object, and see why it is not releasing it - this will be visible in the stack trace.
A "thin lock" is a lock with no contention (contention happens when a thread has to wait before acquiring a lock).
Thin locks get promoted to "fat locks" when there is contention, and a list is made of all the threads waiting to acquire the lock.
You can read more on the topic here:
http://download.oracle.com/docs/cd/E13150_01/jrockit_jvm/jrockit/geninfo/diagnos/thread_basics.html
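To make that concrete, here is a minimal, hypothetical Java sketch. While only one thread uses the monitor, the JVM can keep it as a cheap thin lock; the moment a second thread has to wait on it, a JVM like JRockit inflates it to a fat lock with a queue of waiting threads. The inflation happens inside the JVM, so nothing in the source changes:

public class LockContentionDemo {
    private static final Object monitor = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Uncontended: the monitor can stay a "thin" lock.
        synchronized (monitor) {
            System.out.println("main acquired the lock with no contention");
        }

        // Contended: the second thread has to wait, which is when the JVM
        // would promote the monitor to a "fat" lock with a wait queue.
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                try { Thread.sleep(2000); } catch (InterruptedException ignored) { } // simulate slow work while holding the lock
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (monitor) { // blocks until "holder" releases the monitor
                System.out.println("waiter finally got the lock");
            }
        }, "waiter");

        holder.start();
        Thread.sleep(100);  // make sure "holder" owns the lock first
        waiter.start();     // a thread dump taken now shows "waiter" as blocked on the monitor
        holder.join();
        waiter.join();
    }
}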
Related
I am using WebLogic 10.3.6 with 8 managed servers, configured with a session timeout of 600 seconds. I have an issue with my application: when a session times out after 600 seconds (I am receiving STUCK alerts, which are also configured), I am facing slowness in my application. My question is:
Will all threads be impacted because of one STUCK thread (the STUCK thread was due to a DB transaction timeout)?
I assume it will not be, but wanted to confirm.
Depends on your application. In general no, but if, for example, the stuck thread is holding a lock on an object (database, file, etc.) used by other requests, those may be affected too. Also, depending on what the stuck thread is doing, it may use excessive resources (CPU, memory, disk, etc.). I suggest investigating why the thread is taking so long and whether that can be avoided.
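If you want to confirm programmatically (rather than by eyeballing dumps) which thread is holding the lock that others are blocked on, one option is ThreadMXBean from inside the affected JVM (or via a JMX proxy). This is just a hedged sketch of that approach, not WebLogic-specific tooling:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Prints every BLOCKED thread together with the lock it is waiting on
// and the name of the thread currently holding that lock.
public class BlockedThreadReport {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                System.out.printf("%s is blocked on %s held by %s%n",
                        info.getThreadName(),
                        info.getLockName(),        // e.g. com.jnn.testController@135a26c0
                        info.getLockOwnerName());  // this is the thread to investigate
            }
        }
    }
}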
Hi spring batch users,
regarding the documentation http://docs.spring.io/spring-batch/reference/htmlsingle/#d5e1320
"If the process died ("kill -9" or server failure) the job is, of course, not running, but the JobRepository has no way of knowing because no-one told it before the process died."
I am trying to find and restart the stale job executions using:
Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(jobName);
for (JobExecution jobExecution : jobExecutions) {
    jobExecution.setStatus(BatchStatus.FAILED);
    jobExecution.setEndTime(new Date());
    jobRepository.update(jobExecution);
    jobOperator.restart(jobExecution.getId());
}
But this seems to be very inconvenient.
1) I have to do this before other (new) jobs can be started.
2) I have to handle multiple running server instances, so findRunningJobExecutions will not do the trick.
You can find other questions regarding this topic:
https://jira.spring.io/browse/BATCH-2433?jql=project%20%3D%20BATCH%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
Spring Batch after JVM crash
I would love to see a solution where I could register a "start up clean jobs listener". This would still not fix the problems originating from the multi-server environment, because Spring Batch does not know whether a JobExecution marked as STARTED is actually running on another instance.
Thanks for any advice
Alex
Your job cannot and should not recover "automatically" from a kill -9 scenario. A kill -9 is treated very differently than your application throwing a caught Exception. The reason for this is that you've effectively pulled the carpet out from under the application without giving it a chance to reach a synchronization point with the database to commit any necessary information to the ExecutionContext or update the job/step status(es). Therefore, the last status touchpoint with the database will remain and the job will still look STARTED.
"OK, fine" you say, "but if I start another execution, I want it to find that STARTED execution, and pick up where it left off." The problem here is that there is no clean way for the application to distinguish a job that is ACTUALLY RUNNING from one that has failed but couldn't up the database. The framework here correctly errs on the side of caution and prevents you from starting a job that already appears running, and this is a GOOD thing.
Why? Because let's assume your job was actually still running and you restarted it by accident. As coded, the framework will start to spin up, see your running execution and fail with the message "A job execution for this job is already running". I can't tell you how many times we've been saved by this because someone accidentally launched a job twice!
If you were to implement the listener you suggest, the 2nd execution would instead be allowed to start and you'd have 2 different JVMs repeating the same work, possibly writing to the same files/tables and causing a huge data mess that could be impossible to clean up.
Trust me, in the event the Linux terminal kills your job or your job dies because the connection to the database has been severed, you WANT human eyes on those execution states before you attempt a restart.
Finally, on the off chance you actually wanted to kill your job, you can leverage several other standard patterns for stopping jobs:
Stop via throw Exception
Stop via JobOperator.stop()
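For the second pattern, a minimal sketch of requesting a graceful stop of whatever executions are currently reported as running might look like this (the wiring of the beans is assumed, not shown):

import java.util.Set;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;

// Hypothetical helper: requests a graceful stop of every execution currently
// reported as running for the given job name.
public class GracefulStopper {

    private final JobExplorer jobExplorer;   // assumed to be wired by Spring
    private final JobOperator jobOperator;   // assumed to be wired by Spring

    public GracefulStopper(JobExplorer jobExplorer, JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobOperator = jobOperator;
    }

    public void stopRunningExecutions(String jobName) throws Exception {
        Set<JobExecution> running = jobExplorer.findRunningJobExecutions(jobName);
        for (JobExecution execution : running) {
            // Sets the execution to STOPPING; the job stops at the next chunk boundary.
            jobOperator.stop(execution.getId());
        }
    }
}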
I am trying to analyse thread dumps I have taken from my Tomcat server. One of the thread dumps was taken after a couple of minutes of uptime and shows a thread pool of about 70 threads, with several in WAITING state. I left a script hitting the server overnight and took another thread dump in the morning. When comparing the two dumps I can see that the thread pool has increased from 70 threads to 90 threads. I can also see that the same threads are in a WAITING state from one dump to the other, while 20 new threads have been added. Would this suggest that there is some bug in my application, or is this standard behavior? I am wondering why the threads that are waiting are not being re-used and new threads are being created instead. I am assuming that the threads have not been re-used at all from one dump to the other because the dump file reports them as "waiting on <...>" where the number in <> is the same from one dump to the other; is this assumption correct?
For example, from my initial thread dump I see this:
"http-8000-40" - Thread t#74
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <4fd24389> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at java.lang.Object.wait(Object.java:485)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
- None
and then I can see the same thread in the dump from the following morning, in the same state and waiting on the same object (I am assuming this from the numbers in "<>"):
"http-8000-40" - Thread t#74
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <4fd24389> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
at java.lang.Object.wait(Object.java:485)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
- None
Tomcat needs to spend some time managing threads and other resources even after your webapp's code completes processing a request. In order to keep up with the load, Tomcat will allocate new threads if enough aren't available.
If you have 70 total threads and 70 simultaneous requests, all should be well. If one request (of the 70) completes (that is, the client has received all the data) and another is made before Tomcat is fully done with the request-processor thread, another thread will be allocated to handle the new request, resulting in a thread pool of size=71.
This can happen many times because the timing isn't deterministic: context switches, GC pauses, etc. can interfere with the exact timing of everything happening on the server.
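Tomcat 6's JIoEndpoint manages its own worker pool rather than java.util.concurrent, so this is not its actual code, but the growth behaviour can be sketched with a plain ThreadPoolExecutor: idle workers sit parked in WAITING and are reused, and extra threads are created only when work arrives while no worker is free. The numbers printed are illustrative and can vary slightly with timing:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    public static void main(String[] args) throws InterruptedException {
        // Core of 10 threads, allowed to grow to 90, idle extras kept alive for 60s.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 90, 60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());

        // First burst: 10 concurrent "requests" -> 10 worker threads are created.
        for (int i = 0; i < 10; i++) {
            pool.execute(PoolGrowthDemo::work);
        }
        Thread.sleep(500);
        System.out.println("threads after first burst:  " + pool.getPoolSize());  // ~10

        // Second burst arrives while the 10 workers are idle (parked, WAITING):
        // they are reused, but extra threads are created for the overflow.
        for (int i = 0; i < 15; i++) {
            pool.execute(PoolGrowthDemo::work);
        }
        Thread.sleep(500);
        System.out.println("threads after second burst: " + pool.getPoolSize());  // ~15

        pool.shutdown();
    }

    private static void work() {
        try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}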
When using NSThread's detachNewThreadSelector:toTarget:withObject:, I'm finding that the thread fully completes its execution before the application terminates normally when the user attempts to quit the application while the background process is executing.
In this case, this is the behavior I desire, but I couldn't find anything in Apple's docs that suggests that this will always be the case. The only relevant information I was able to find was the following, from Apple's Threading Programming Guide:
Important: At application exit time, detached threads can be terminated immediately but joinable threads cannot. Each joinable thread must be joined before the process is allowed to exit. Joinable threads may therefore be preferable in cases where the thread is doing critical work that should not be interrupted, such as saving data to disk.
So from this, I know that detached threads can be terminated at the time of application exit, but will they ever be terminated automatically? Or, am I always safe to assume the thread will complete its execution before the application quits?
You cannot assume that any thread -- including the main thread -- will ever complete execution normally, regardless of the documentation.
This is because the user can quit an application at any time, the system may lose power/panic, or the app may crash.
As for detached threads, it would not be unheard of for the system frameworks to automatically terminate the app forcibly after some timeout once the main event loop has given up the ghost.
I have a VB.NET application that uses threads to asynchronously process some tasks in a "Scheduled Task" (console application).
We are limiting this application to run 10 threads at once, like so:
(pseudo-code)
- Create a generic list of 10 threads
- Spawn off the threadproc for each one
- Do a thread.join statement for each thread to wait for the longest running one to complete.
I am finding that if the code called by the threadproc contains any Debug.WriteLine or Trace.TraceInformation statements, the thread hangs. I can see the thread in the Debug > Windows > Threads window and switch to it, but it highlights the Debug.WriteLine statement and never gets past it.
Is there something special about the Debug or Trace statements that makes them non-thread-safe?
Why would this hang things up? If I leave the debug statement in, the thread never completes. If I take the debug statement out, the thread completes in less than 5 seconds.
Yes and no.
Internally, Debug.WriteLine ends up calling into TraceInternal.WriteLine. This particular function does not explicitly stop thread execution, but it does acquire a process-global lock during the execution of the method. This lock both protects the list of trace listeners and serializes the processing of WriteLine commands.
It's possible for 2 threads to simultaneously hit this WriteLine statement and hence have one thread pause for a short period of time. It is also possible for a custom trace listener to be doing a very long-lived or blocking operation, which would essentially freeze all other threads for a noticeable period of time.
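The .NET internals aside, the pattern itself is easy to reproduce. Here is a hedged Java sketch (not the .NET implementation, just the same shape) of a logging facade that serializes every caller on one global lock, so one slow listener stalls every thread that tries to log:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy logging facade: every writeLine() call takes the same global lock,
// so a single slow listener blocks every other thread that tries to log.
public class GlobalLockLogger {
    private static final Object LOCK = new Object();
    private static final List<Consumer<String>> LISTENERS = new ArrayList<>();

    public static void addListener(Consumer<String> listener) {
        synchronized (LOCK) {
            LISTENERS.add(listener);
        }
    }

    public static void writeLine(String message) {
        synchronized (LOCK) {                  // process-wide serialization point
            for (Consumer<String> listener : LISTENERS) {
                listener.accept(message);      // a blocking listener freezes everyone
            }
        }
    }

    public static void main(String[] args) {
        // A "custom listener" that blocks for a long time, like the one in the question.
        addListener(msg -> {
            try { Thread.sleep(10_000); } catch (InterruptedException ignored) { }
        });

        // Each worker only wants to log one line, but the last one waits ~90 seconds
        // because the listener holds the global lock for 10 seconds per call.
        for (int i = 0; i < 10; i++) {
            final int id = i;
            new Thread(() -> writeLine("worker " + id + " starting"), "worker-" + id).start();
        }
    }
}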
Use Visual Studio to check and see what other threads are currently broken in this function. See if that gives you a clue as to what is holding up this process.
You may have a trace listener that is interfering with the Debug.WriteLine.
There is a custom trace listener in this application. Once I commented it out, my locking problems were solved. Now if I could only track down the original developer to find out what they were doing with this custom listener...