What causes stuck threads other than contention, such as slow I/O or slow backends (DB queries, web services, RMI calls)? - weblogic

I am trying to figure out the main reasons for stuck threads. WebLogic Server diagnoses a thread as stuck if it has been continually working (not idle) for a set period of time. A user can tune a server's stuck-thread detection behavior by changing the length of time before a thread is diagnosed as stuck (Stuck Thread Max Time) and by changing the frequency with which the server checks for stuck threads. My analysis is that stuck threads are caused either by contention or by other reasons such as slow I/O or slow backends (DB queries, web services, RMI calls). Only rarely are they caused by bad coding (infinite loops) or huge data.
Other than the reasons above, are there more reasons for a thread to get stuck?

Not sure exactly what your question is, but here's my 2 cents.
Bad coding can lead to stuck threads.
Say a developer uses a singleton map or hash table that all servlets need to access. Under high load this can cause contention for that resource, which easily leads to stuck threads.
Stuck threads can be caused by a slow-running server (high CPU).
Sometimes bugs in WLS can keep it busy with internal processes, resulting in stuck threads, for example WLS getting stuck in cluster communication.
You can even get a stuck thread when the Admin Server is waiting to hear from a managed server that has failed.
The list goes on and on. Only by taking 3-4 thread dumps in a short span of time can you confirm the cause.
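To illustrate the "3-4 thread dumps in a short span" advice, here is a minimal Java sketch, not a WebLogic tool. It samples the JVM's own threads through the standard `ThreadMXBean` API and reports threads whose state and top stack frame are identical in every sample. The class name `StuckThreadSampler`, its helper methods, and the sample count and interval are assumptions made for illustration. On a live server you would more commonly capture dumps externally (for example with `jstack` or `kill -3`) and compare them the same way.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares a few in-process thread dumps taken close together.
final class StuckThreadSampler {

    /** One line per live thread: name, state and top stack frame, keyed by thread id. */
    static Map<Long, String> sampleTopFrames(ThreadMXBean mx) {
        Map<Long, String> frames = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread ended while the dump was being taken
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no stack>";
            frames.put(info.getThreadId(),
                    info.getThreadName() + " [" + info.getThreadState() + "] at " + top);
        }
        return frames;
    }

    /** Keeps only the threads whose line is identical in every sample. */
    static Map<Long, String> unchangedAcross(List<Map<Long, String>> samples) {
        if (samples.isEmpty()) {
            return new HashMap<>();
        }
        Map<Long, String> result = new HashMap<>(samples.get(0));
        for (Map<Long, String> sample : samples.subList(1, samples.size())) {
            result.entrySet().removeIf(e -> !e.getValue().equals(sample.get(e.getKey())));
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Map<Long, String>> samples = new ArrayList<>();
        for (int i = 0; i < 3; i++) {      // 3 dumps, assumed 2 seconds apart
            samples.add(sampleTopFrames(mx));
            Thread.sleep(2000);
        }
        // Threads that did not move between dumps are candidates for a closer look.
        unchangedAcross(samples).values().forEach(System.out::println);
    }
}
```

Idle pool threads also sit on the same frame between samples, so the output is only a list of candidates. The state and the frame itself (a socket read, a lock, a JDBC call) are what point to the actual cause.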

Related

boost::asio and boost::thread_group where each thread has its own libpqxx connection

I'm trying to combine boost::asio and boost::thread_group, where each thread has its own libpqxx (Postgres) connection to the database. I can't seem to find any examples of asio/thread_group where the thread a task runs on holds connection-specific information. Asio seems to be built around the task itself containing all the information required to run it. Am I looking at the wrong combination to solve my specific problem?
I have a lot of requests coming in to my program, and each of these requests requires SQL commands to be run against the DB (timescaledb in my case). These requests must be run on a limited number of connections against the DB (normally 8 in total).
My plan was to set up a thread_group of 8 threads, each with its own connection to the DB and each calling Asio's run(), so that I could post new queries via asio::post and get a callback via signals2 when the result comes in.
Asio "hides" the threads, and with asio::strand you can largely avoid dealing with concurrency yourself. In short, you only hand tasks to Asio, and each task runs as soon as a thread is available. Asio does have a learning curve, though, as does concurrency in general.
As you describe your problem, thread-local storage is the answer.

Weblogic Stuck thread impacts other runnable threads in it

I am using Weblogic 10.3.6 with 8 managed servers, configured with a session timeout of 600 seconds. I have an issue with my application: when a session times out after 600 seconds (I receive STUCK alerts, which are also configured), the application becomes slow. My question is:
Will all threads be impacted because of one STUCK thread (the STUCK thread was due to a DB transaction timeout)?
I assume not, but wanted to confirm.
Depends on your application. In general no, but if, for example, the stuck thread is holding a lock on an object (database, file, etc.) that other requests need, those requests may be affected too. Also, depending on what the stuck thread is doing, it may use excessive resources (CPU, memory, disk, etc.). I suggest investigating why the thread is taking so long and if it's possible to

Known issues for Weblogic 10 concurrency issues?

Recently, our production Weblogic has been taking too much time to process queues. Besides investigating the queues, DB queries and other things, I thought I would look into any known memory and concurrency issues in Weblogic.
Does anyone know of any?
Summary of the problem:
We had about 2 queues and about 8-9 clusters. One of the queues was down for some reason, the other queue started to pile up, and Weblogic took forever to process it. DB I/O increased, and so did CPU consumption.
We had a similar production issue recently.
Check whether Flow Control is set at the connection factory level. With this setting Weblogic can throttle message production when it sees that the queue is being overloaded.
Weblogic's checklist of things to do when you have a large message backlog is useful to compare against your own scenario.

SQL Server running slow due to thread count increase

On our production server, at a specific time of day, the thread count keeps climbing to the point where, although CPU utilization is normal (30-50%), queries start to run slowly and we see a lot more blocking statements.
I am not sure where to look. When our site runs normally the thread count is around 150, but during a specific time of day (1:30 to 2:30) it goes up to 270. There are no extra SQL transactions going on and everything is as normal as before, but the thread count grows and SQL Server starts behaving very, very slowly.
Immediately after restarting the SQL service the thread count returns to normal, and our site functions fine for another 24 hours.
We are using SQL Server 2005 on a 24-core machine.
Any ideas?
Blocking statements steal workers (sys.dm_os_workers), so the server will spawn more workers to handle the incoming tasks. At 24 cores you'll have some 700 max worker threads out of the box by default. So seeing 270 'threads' is not an issue; it is well within normal operating parameters. Your real problem must be the blocking, and you have to investigate it accordingly: who is blocking whom, and why. My bet is that you have a job running between 1:30 and 2:30 that is locking large portions of the database (a delete job perhaps?) and your queries block on locked rows. You'll have to investigate, find the root cause, and act accordingly. Reboot is not a solution, nor is blaming unrelated components (thread count). Use Activity Monitor, use Who Is Active, follow the methodical approach of the Waits and Queues methodology. There are plenty of ways to identify the real problem. SQL Server will never appear slow due to thread count. It simply doesn't work like that.
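As a rough check on the worker-thread ceiling mentioned above, the sketch below applies the formula Microsoft documents for the default "max worker threads" value on SQL Server 2005-era builds: 256 (32-bit) or 512 (64-bit) for up to 4 logical CPUs, plus 8 or 16 per additional CPU. The class and method names are made up for illustration, and the formula is assumed to apply unchanged to the poster's instance.

```java
// Hypothetical helper: default 'max worker threads' per the documented formula.
final class MaxWorkerThreads {

    static int defaultMaxWorkerThreads(int logicalCpus, boolean is64Bit) {
        int base = is64Bit ? 512 : 256;        // value for 4 or fewer CPUs
        int perExtraCpu = is64Bit ? 16 : 8;    // added per CPU beyond the fourth
        return logicalCpus <= 4 ? base : base + (logicalCpus - 4) * perExtraCpu;
    }

    public static void main(String[] args) {
        System.out.println(defaultMaxWorkerThreads(24, true));   // 832
        System.out.println(defaultMaxWorkerThreads(24, false));  // 416
    }
}
```

On either architecture, 270 threads is below the default ceiling for a 24-core machine, which is consistent with the point that the thread count is a symptom of blocking rather than the cause.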
You can control the degree of parallelism using the MAXDOP query hint. For more details please check this article:
http://blog.sqlauthority.com/2010/03/15/sql-server-maxdop-settings-to-limit-query-to-run-on-specific-cpu/
Thanks for your valuable feedback. Yes, it is true: SQL Server itself was not behaving weirdly. The culprit was our site, which is based on Ektron CMS. One piece of Ektron CMS functionality (PageBuilder) was holding locks on a table badly while operating on a piece of content. We have around 10 million users on our site, and since this was blocking the tables, SQL Server went nuts and did not respond very well.
We have finally eliminated the issue.

Intermittent SQL timeouts

A few days ago, the SQL server (Microsoft SQL Server 2005) backing our site started occasionally timing out. It happens at seemingly random times, approximately every hour or two. Each episode usually lasts about 10 minutes, during which we see hundreds of timed-out requests. Under normal circumstances, most of our queries take less than 50ms. A query which takes a significant fraction of a second is an exception.
I have effectively killed a day trying to figure out at least something without any real progress. Normally, the server load is about 10-20%, and when the timeouts happen, we don’t see any increased CPU load. Also, there is nothing special happening during the timeouts; no overzealous web crawler, no heavy background tasks, no increased network traffic, no increased number of connections etc. Simply, everything looks as usual.
Not making any progress, we decided to restart it (and install the latest SP while we were at it), which seems to have fixed the problem. It has already been over six hours without any incident. Also, the CPU load has gone down to under 10%.
It almost seems as if the SQL server "deteriorated" over time. Perhaps some internal structure (some cache or statistic) got out of shape and caused the occasional problems. I don’t have any other explanation.
The only thing I noticed while monitoring the server (I got lucky once and was present when the timeouts were happening) was several long-running queries waiting on CXPACKET. But I learned that this is most likely just a consequence of some other problem. I wrote a script to monitor SQL requests, so hopefully I will have more information the next time it happens.
Has anybody had similar experience? I’m not an SQL Server guru. Any suggestions are welcome.
Since everything looked normal (CPU load, nothing special happening, no overzealous web crawler, no heavy background tasks, no increased network traffic, no increased number of connections, etc.), I'd look into locking, blocking, or a race condition. Use this to see what (if anything) is locking when the timeouts are happening:
How to find out what SQL queries are being blocked and what's blocking them?