WebLogic STUCK thread impacts other runnable threads

I am using WebLogic 10.3.6 with 8 managed servers, configured with a session timeout of 600 seconds. When a session times out after 600 seconds (I receive STUCK alerts, which are also configured), my application slows down. My question is:
Will all threads be impacted by one STUCK thread (the STUCK thread was caused by a DB transaction timeout)?
I assume not, but wanted to confirm.

It depends on your application. In general, no. But if, for example, the stuck thread is holding a lock on a resource (database, file, etc.) that other requests also need, those requests may be affected too. Also, depending on what the stuck thread is doing, it may consume excessive resources (CPU, memory, disk, etc.). I suggest investigating why the thread is taking so long and if it's possible to
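To illustrate the point about locks, here is a minimal Java sketch. It is not WebLogic-specific, and the class name `StuckThreadDemo`, the lock, and the timings are all made up for illustration. One task holds a lock for a long time, a second task needs the same lock and gives up after a timeout, and a third task that touches nothing shared completes normally.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class StuckThreadDemo {

    /**
     * Returns {dependentGotLock, independentFinished}.
     * holdMillis: how long the "stuck" task keeps the lock (hypothetical value).
     * waitMillis: how long the dependent task is willing to wait for it.
     */
    public static boolean[] run(long holdMillis, long waitMillis) throws Exception {
        // Stands in for any shared resource: a DB row, a file, a singleton map...
        final ReentrantLock sharedLock = new ReentrantLock();
        final CountDownLatch lockHeld = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // "Stuck" request: takes the lock and holds it for holdMillis.
            pool.submit(() -> {
                sharedLock.lock();
                try {
                    lockHeld.countDown();
                    Thread.sleep(holdMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    sharedLock.unlock();
                }
            });
            lockHeld.await(); // make sure the lock is really taken first

            // Dependent request: needs the same lock, gives up after waitMillis.
            Future<Boolean> dependent = pool.submit(() -> {
                boolean got = sharedLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
                if (got) {
                    sharedLock.unlock();
                }
                return got;
            });

            // Independent request: touches nothing shared.
            Future<Boolean> independent = pool.submit(() -> true);

            return new boolean[] { dependent.get(), independent.get() };
        } finally {
            pool.shutdownNow(); // interrupts the sleeping "stuck" task
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run(2000, 200);
        System.out.println("dependent request got lock: " + r[0]);
        System.out.println("independent request finished: " + r[1]);
    }
}
```

With a 2000 ms hold and a 200 ms wait, the dependent request should fail to get the lock while the independent one finishes. That matches the behaviour described above: only requests that need what the stuck thread holds are blocked.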

Related

CXSYNC_PORT wait type in Azure Sql Database

I'm facing this issue intermittently now: the query (called from a stored procedure) goes into the CXSYNC_PORT wait type and stays there for a long time (sometimes 8 hours at a stretch). I have to kill the process and then rerun the procedure. This procedure is called every 2 hours from an ADF pipeline.
What's the reason for this behavior and how do I fix the issue?
I searched a lot and there are no Microsoft documents that discuss the wait type CXSYNC_PORT. Others have asked the same question, but still with no more details.
Most suggestions are to ask the same question in more forums, or to ask a professional engineer for help, who will deal with your problem separately and confidentially.
Ask Azure support for detailed help: https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request
Here is the same question, where a Microsoft engineer gave more details about the issue:
As part of a fix CXPACKET waits were further broken down into
CXSYNC_CONSUMER and CXSYNC_PORT (and data transfer waits still
reported as CXPACKET) as to distinguish between different wait times
for correct diagnose of the problem.
Basically, CXPACKET is divided into 3: CXPACKET, CXSYNC_PORT,
CXSYNC_CONSUMER. CXPACKET is used for data transfer sync, while
CXSYNC_* are used for other synchronizations. CXSYNC_PORT is used for
synchronizing opening/closing of exchange port between consuming
thread and producing thread. Long waits here may indicate server load
and lack of available threads. Plans containing sort may contribute
this wait type because complete sorting may occur before port is
synchronized.
Please refer to the linked question "What is causing wait type CXSYNC_PORT and what to do about it?" for more information. For now, there isn't an exact solution.
Use the query hint OPTION (MAXDOP 1).
This runs your long-running query on a single thread, so you won't get the CX-type waits. In my experience this can cut execution time by 10-20x and free up CPU for other tasks, since there is no context switching or thread coordination activity.

boost::asio and boost::thread_group where each thread has its own libpqxx connection

I'm trying to combine boost::asio and boost::thread_group, where each thread has its own libpqxx (Postgres) connection to the database. I can't find any examples of asio/thread_group where the thread a task runs on carries connection-specific information. Asio seems to assume that the task itself contains all the information required to run it. Am I looking at the wrong combination to solve my specific problem?
I have a lot of requests coming in to my program, and each of these requests requires SQL commands to be run against the DB (timescaledb in my case). These requests must be run on a limited number of connections to the DB (normally 8 in total).
My plan was to set up a thread_group of 8 threads, each with its own connection to the DB, and each thread attached to asio::run, so that I could post new queries via asio::post and get a callback via signals2 when the result comes in.
Asio "hides" the threads, and thanks to asio strands you can more or less avoid dealing with concurrency. In short, you just hand tasks to Asio, and each task runs as soon as a thread is available. But Asio has a learning curve, as does concurrency ...
As you describe your problem, thread-local storage is the answer.

What causes stuck threads other than contention, such as slow IO or slow backends (DB queries, web services, RMI calls)?

I am trying to figure out the main reasons for stuck threads. WebLogic Server diagnoses a thread as stuck if it has been continually working (not idle) for a set period of time. A user can tune a server's stuck thread detection by changing the length of time before a thread is diagnosed as stuck (Stuck Thread Max Time), and by changing the frequency with which the server checks for stuck threads. My analysis is that stuck threads are caused either by contention or by other reasons such as slow IO or slow backends (DB queries, web services, RMI calls). Rarely, they are caused by bad coding (infinite loops) or huge data.
Other than the above, are there more reasons for a thread to get stuck?
Not sure what your question is here, but here's my 2 cents:
Bad coding can lead to stuck threads.
Say a developer uses a singleton map or hash that all servlets need to access. Under high load this can lead to contention for that resource, and easily to stuck threads.
Stuck threads can be caused by a slow-running server (high CPU).
Sometimes bugs in WLS can cause it to be busy with internal processes, resulting in stuck threads, for example WLS stuck in cluster communication.
You can even get a stuck thread when the Admin server is waiting to hear from a managed server that failed.
The list can go on and on. Only by taking 3-4 thread dumps in a short span of time can one confirm the cause.
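As a rough illustration of the "several thread dumps in a short span" approach, here is a minimal Java sketch using the standard `java.lang.management.ThreadMXBean` API. The class name `ThreadDumpSampler` and its methods are hypothetical; in practice you would more likely take the dumps with `jstack` or `kill -3` and compare them by hand. The idea is the same: a thread that sits in the same top stack frame across every dump is a candidate for closer inspection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThreadDumpSampler {

    /** One in-process "thread dump": thread id -> name, state and top stack frame. */
    public static Map<Long, String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> top = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip threads that died mid-dump
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0) {
                top.put(info.getThreadId(),
                        info.getThreadName() + " [" + info.getThreadState() + "] at " + stack[0]);
            }
        }
        return top;
    }

    /**
     * Takes `dumps` snapshots, `intervalMillis` apart, and returns the threads
     * whose state and top frame were identical in every snapshot.
     */
    public static List<String> unchangedThreads(int dumps, long intervalMillis)
            throws InterruptedException {
        Map<Long, String> candidates = snapshot();
        for (int i = 1; i < dumps; i++) {
            Thread.sleep(intervalMillis);
            Map<Long, String> next = snapshot();
            // Keep only threads that look exactly the same as before.
            candidates.entrySet().removeIf(e -> !e.getValue().equals(next.get(e.getKey())));
        }
        return new ArrayList<>(candidates.values());
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : unchangedThreads(4, 1000)) {
            System.out.println(line);
        }
    }
}
```

Note that idle pool threads waiting for work will also show up as unchanged, so the output is only a list of candidates. You still have to read the full stacks to tell lock contention from a slow backend call or a busy loop.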

Known concurrency issues in WebLogic 10?

Recently, our production WebLogic has been taking too much time to process queues. Besides investigating the queues, DB queries and other things, I thought I would look into any known memory and concurrency issues in WebLogic.
Does anyone know of any?
Summary of the problem:
We had about 2 queues and about 8-9 clusters. One of the queues went down for some reason, the other queue started to pile up, and WebLogic took forever to process it. DB I/O increased, and so did CPU consumption.
We had a similar production issue recently.
Check whether Flow Control is set at the connection factory level. With this setting, WebLogic can throttle message production when it sees that the queue is being overloaded.
WebLogic's checklist of things to do when you have a large message backlog is useful to compare against your own scenario.

SQL Compact lock timeout on __SysObjects

I'm using SQL Compact 3.5 SP2. My application is multi-threaded, but it does not share connections across threads. Instead, I use a custom object pool to ensure that each thread gets its own connection. That said, it's possible that a connection might be re-used on different threads at different times... in other words, I'm assuming that the connections don't have thread affinity. Also, not sure if it matters, but I'm using Entity Framework in .NET 3.5 SP1.
Anyway, when I've got high load situations (8+ threads), I'm getting lock timeout exceptions (regardless of the length of the timeout setting), and the exception always says the lock was on the __SysObjects table.
I'm not doing any DDL, so I don't understand why I would get locking timeouts on that table. Ideas?
I somewhat resolved this issue by making sure that my connections were closed after each use (as opposed to pooling the open connections), but if I let the code run for a long period of time I started getting OutOfMemoryException and AccessViolation exceptions.
This smells like the SqlCeConnection class has some kind of thread affinity dependency. Either that, or it has a memory leak of some kind.
At any rate, I've given up on trying to pool these objects.
EDIT: This actually appears to be an issue addressed by Cumulative Update 2. Since updating my references to the new libs, I haven't seen this problem. See: http://support.microsoft.com/kb/983516