JMS Service and Broker count issue - glassfish

Whenever I am trying to deploy my application I keep getting this Exception in the logs:
MQJMSRA_LB4001: start:Aborted:Unable to ping Broker within 60000 millis
I couldn't understand why this was happening so I checked domains/domain1/imq/logs/log.txt and this is what I found:
No threads are available to process a new connection on service admin. 10 threads out of a maximum of 10 threads are already in use by other connections. A minimum of 2 threads must be available to process the connection. Please either limit the # of connections or increase the imq.<service>.max_threads property. Closing the new connection.
Count: service=5 broker=5
Can someone help me understand how to increase this count?
I would really appreciate your help on this.

You should change the connection-service thread property (max_threads) of the broker, as the error message suggests. The broker configuration file is \domains\<domain-name>\imq\instances\imqbroker\props\config.properties.
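For example (a hedged sketch; the value is illustrative, but the broker log above names the admin connection service as the one that is exhausted, and the error message itself gives the imq.<service>.max_threads property pattern), you could raise the limit in that file and restart the domain:
# config.properties: raise the thread ceiling for the connection service named in the log
imq.admin.max_threads=20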

This depends on whether you are using OpenMQ in embedded mode or not. If you are using embedded MQ, look for the Thread Pools section of your configuration in the admin console. One of them will have max threads set to 10; that is the one to increase.
It's hard to be sure since you haven't given any other details from the logs, but that is very likely what you need to change.

Related

Masstransit 3.5.7 causes leak in erlang processes & number of channels

We are using:
MassTransit 3.5.7
RabbitMQ 3.6.5
Our environment is running ~2000 microservices, and we use a RabbitMQ cluster.
We are experiencing a leak in the number of channels as well as in the number of Erlang processes being used.
We can see that we have roughly 46,000 channels.
If we look into the connections, we see many idle channels in each connection.
In addition, possibly related to this, the number of Erlang processes is constantly increasing.
Can someone please share some information and help with this behavior?
Yes, the Erlang process count is related to open channels. I simulated this (without MassTransit, just a regular app) by opening thousands of channels and deliberately not closing them, and the result was similar to yours.
As for the issue itself, it's possibly related to:
https://github.com/MassTransit/MassTransit/issues/266
So you can try what is suggested there:
it's necessary to set up a cleanup timer on the SendEndpointCache so
that unused endpoints are shut down after a few minutes.
Hope it helps.
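Until such a cleanup exists in the library, one application-side mitigation (a hedged sketch, not MassTransit's internal fix; CachedEndpoints and the queue URI are illustrative names) is to resolve each send endpoint once and reuse it, so channels are not opened over and over for the same destination:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using MassTransit;

// Application-side cache of send endpoints, keyed by destination address.
// Assumes MassTransit 3.x, where IBus.GetSendEndpoint returns Task<ISendEndpoint>.
static class CachedEndpoints
{
    static readonly ConcurrentDictionary<Uri, Task<ISendEndpoint>> Cache =
        new ConcurrentDictionary<Uri, Task<ISendEndpoint>>();

    public static Task<ISendEndpoint> Get(IBus bus, Uri address)
    {
        // Resolve the endpoint once per address and reuse it for all later sends.
        return Cache.GetOrAdd(address, a => bus.GetSendEndpoint(a));
    }
}

// Usage (illustrative):
// var endpoint = await CachedEndpoints.Get(bus, new Uri("rabbitmq://my-host/my_queue"));
// await endpoint.Send(new MyMessage());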

ADO.NET Pooled connections are unable to reuse

I'm working on an ASP.NET MVC application which uses EF 6.x to work with my Azure SQL Database. Recently, with increased load, the app started to get into a state where it can no longer communicate with the SQL server. I can see that there are 100 active connections to my database using exec sp_who, and any new connection fails to open with the following error:
System.Data.Entity.Core.EntityException: The underlying provider
failed on Open. ---> System.InvalidOperationException: Timeout
expired. The timeout period elapsed prior to obtaining a connection
from the pool. This may have occurred because all pooled connections
were in use and max pool size was reached.
Most of the time the app works with an average active connection count of 10 to 20, and load doesn't change this number... Even when load is high it stays at 10-20. But in certain situations it can jump to 100 in less than a minute, without any ramp-up time, and this puts the app in a state where all my requests fail. All those 100 connections are in a sleeping state and awaiting command.
The good part is that I found a workaround which helps mitigate the issue - clearing the connection pool from the client side. I'm using SqlConnection.ClearAllPools() and it instantly closes all the connections; sp_who shows my regular 10-20 connections after that.
The bad part is that I still don't know the root cause.
Just to clarify, the app load is about 200-300 concurrent users, which generate 1000 requests per minute.
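For reference, the mitigation amounts to something like this (a hedged sketch; SaveChangesOrQuery is a hypothetical data-access call, and where you hook the flush in - a catch block, a health check, a scheduled job - is up to you):

using System.Data.Entity.Core;
using System.Data.SqlClient;

// Illustrative only: if a request dies on pool exhaustion, flush the pools once
// so the app can recover while the real leak is being hunted down.
try
{
    SaveChangesOrQuery(); // hypothetical EF call that hits the database
}
catch (EntityException)
{
    SqlConnection.ClearAllPools(); // drops every pooled connection held by this process
    throw;
}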
With the great suggestion from @DavidBrowne to track leaked connections with a simple pattern, I was able to find connections being leaked while configuring the Owin engine:
private void ConfigureOAuthTokenGeneration(IAppBuilder app)
{
    // here in the create method I'm also creating a connection leak tracker
    app.CreatePerOwinContext(() => MyCoolDb.Create());
    ...
}
Basically, with every request Owin creates a connection and never lets it go, and when the Web API load increases I run into trouble.
Could this be the real cause? And is Owin smart enough to lazily create the connection only when needed (using the factory provided) and release it once it has been used?
It's very unlikely that this is caused by anything other than your application code leaking connections.
Here's a helper library you can use to track when a connection is leaked, and report the call site that initially opened the connection.
http://ssbwcf.codeplex.com/SourceControl/latest#SsbTransportChannel/SqlConnectionLifetimeTracker.cs
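The linked helper isn't reproduced here, but the idea is simple enough to sketch (hedged; TrackedConnectionFactory is an illustrative name, not the library's API): wrap connection creation, remember the call site, and complain if the connection is finalized without ever having been disposed.

using System;
using System.Data.SqlClient;
using System.Diagnostics;

// Illustrative leak tracker: records where each connection was created and
// logs that call site if the connection is garbage-collected without Dispose.
public static class TrackedConnectionFactory
{
    public static SqlConnection Open(string connectionString)
    {
        var connection = new SqlConnection(connectionString);
        var tracker = new LeakTracker(Environment.StackTrace);
        // The connection keeps the tracker alive; Dispose marks it as clean.
        connection.Disposed += (s, e) => tracker.MarkDisposed();
        connection.Open();
        return connection;
    }

    private sealed class LeakTracker
    {
        private readonly string _createdAt;
        private bool _disposed;

        public LeakTracker(string createdAt) { _createdAt = createdAt; }

        public void MarkDisposed()
        {
            _disposed = true;
            GC.SuppressFinalize(this);
        }

        // Runs only if the tracker (and therefore the connection) was never disposed.
        ~LeakTracker()
        {
            if (!_disposed)
                Trace.WriteLine("Leaked SqlConnection, created at: " + _createdAt);
        }
    }
}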

How to handle long asynchronous requests with pyramid and celery?

I'm setting up a web service with Pyramid. A typical request for a view will take very long, about 15 minutes to finish. So my idea is to queue jobs with Celery and a RabbitMQ broker.
I would like to know what would be the best way to ensure that bad things cannot happen.
Specifically, I would like to prevent the task queue from overflowing, for example.
A first measure will be defining quotas per IP, to limit the number of requests a given IP can submit per hour.
However I cannot predict the number of involved IPs, so this cannot solve everything.
I have read that it's not possible to limit the queue size with Celery/RabbitMQ. I was thinking of retrieving the queue size before pushing a new item into it, but I'm not sure it's a good idea.
I'm not familiar with good practices in messaging/job scheduling. Is there a recommended way to handle this kind of problem?
RabbitMQ has flow control built into its QoS handling. If RabbitMQ cannot handle the publishing rate, it will adjust the TCP window size to slow down the publishers. If too many messages are sent to the server, it will also overflow them to disk. This allows your consumer to be a bit more naive, although if you restart the connection on error and flood the connection you can cause problems.
I've always decided to spend more time making sure the publishers/consumers could work with multiple queue servers instead of trying to make them more intelligent about a single queue server. The benefit is that if you are really overloading a single server you can just add another one (or another pair if using RabbitMQ HA). There is a useful video from PyCon about Messaging at Scale using Celery and RabbitMQ that should be of use.

Prioritize real time msgs over batch msgs using Queues/MDBs

In my application a specific service has a constant bandwidth (e.g., 100 transactions at a time). Requests to the service arrive in real time as well as from batch jobs (queues). The real-time requests don't have a uniform distribution. I need a way to make sure that real-time jobs are processed before the batch jobs, and also to make sure that at any time I don't exceed the service's threshold.
Please evaluate the following approach.
Have 2 queues: A for real-time requests and B for batch jobs. Have a thread pool of size = 100 (the service threshold) and let the thread pool first try to pick messages from A, if any, else pick from B.
My application runs on WebLogic. I want to make use of MDBs instead of the thread pool, but there is no way to make an MDB listen to multiple queues.
Within JMS you can set a message priority, which should be respected if possible. This may be something simple to try.
Another option could be to set a JMS property on the message from the client and use a message selector on the MDB. You could set MY_MESSAGE_TYPE=batch/rt and then have multiple MDBs deployed that listen to the same queue but can be assigned to different work managers. Keep in mind that Work Manager != Thread Pool. You can also set a Request Class to ensure that if the batch pool is in use, the RT pool will not be starved for threads/CPU.
With this design I believe that if you have two MDBs, one with a message selector, messages that meet the selector criteria should be delivered to the MDB with that selector (RT) before an MDB with no selector (BATCH). This would be a fairly simple POC: set up a client that sends messages to the queue, some of which have the JMS property set to RT and others that do not.
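For the selector part, the wiring looks roughly like this (a hedged sketch using the standard EJB 3 deployment descriptor, following the two-selector variant; the bean names and MY_MESSAGE_TYPE values are illustrative, and the enclosing <enterprise-beans> element is omitted):

<!-- ejb-jar.xml fragment: two MDBs on the same queue, split by message selector -->
<message-driven>
  <ejb-name>RealTimeWorkMDB</ejb-name>
  <activation-config>
    <activation-config-property>
      <activation-config-property-name>destinationType</activation-config-property-name>
      <activation-config-property-value>javax.jms.Queue</activation-config-property-value>
    </activation-config-property>
    <activation-config-property>
      <activation-config-property-name>messageSelector</activation-config-property-name>
      <activation-config-property-value>MY_MESSAGE_TYPE = 'rt'</activation-config-property-value>
    </activation-config-property>
  </activation-config>
</message-driven>
<message-driven>
  <ejb-name>BatchWorkMDB</ejb-name>
  <activation-config>
    <activation-config-property>
      <activation-config-property-name>destinationType</activation-config-property-name>
      <activation-config-property-value>javax.jms.Queue</activation-config-property-value>
    </activation-config-property>
    <activation-config-property>
      <activation-config-property-name>messageSelector</activation-config-property-name>
      <activation-config-property-value>MY_MESSAGE_TYPE = 'batch'</activation-config-property-value>
    </activation-config-property>
  </activation-config>
</message-driven>
<!-- The sending client sets the property before the send, e.g.
     message.setStringProperty("MY_MESSAGE_TYPE", "rt") -->

Each MDB can then be attached to its own work manager (with a Request Class on the real-time one) so that batch traffic cannot starve the real-time pool.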
10.0 reference (which is still applicable): http://docs.oracle.com/cd/E11035_01/wls100/config_wls/self_tuned.html

NServiceBus delay retries

We need to be able to specify a delay when retrying failed messages. NServiceBus retries more or less instantly, up to n times (as configured), before moving the message to the error queue.
What I need to be able to do is, for a given message type, specify that it is not to be retried for an arbitrary period of time.
I've read the post here:
NServiceBus Retry Delay
but this doesn't give what I'm looking for.
Kind regards
Ben
This isn't supported as of right now. What you can do is let the messages go to the error queue and set up an endpoint to monitor that queue. Your code could then determine the rules for replaying messages. You could use a Saga to achieve this, in combination with the Timeout manager.
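A rough sketch of that saga-plus-timeout idea (hedged: MessageFailed, RetryDelayExpired, and ReplayFailedMessage are made-up message types, the correlation mapping is omitted, and the exact saga/timeout API differs between NServiceBus versions; this assumes one that exposes RequestTimeout<T> and IHandleTimeouts<T>):

using System;
using NServiceBus;
using NServiceBus.Saga;

// Started by a notification that a message failed; waits, then asks the
// endpoint watching the error queue to replay the message.
public class DelayedRetrySaga : Saga<DelayedRetrySagaData>,
    IAmStartedByMessages<MessageFailed>,
    IHandleTimeouts<RetryDelayExpired>
{
    public void Handle(MessageFailed message)
    {
        Data.FailedMessageId = message.FailedMessageId;
        // Do not replay immediately; wait out the configured delay first.
        RequestTimeout<RetryDelayExpired>(TimeSpan.FromMinutes(10));
    }

    public void Timeout(RetryDelayExpired state)
    {
        // Ask whatever owns the error queue to put the message back on its source queue.
        Bus.Send(new ReplayFailedMessage { FailedMessageId = Data.FailedMessageId });
        MarkAsComplete();
    }

    // ConfigureHowToFindSaga / correlation mapping omitted for brevity.
}

public class DelayedRetrySagaData : IContainSagaData
{
    public Guid Id { get; set; }
    public string Originator { get; set; }
    public string OriginalMessageId { get; set; }
    public Guid FailedMessageId { get; set; }
}

// Hypothetical messages used by this sketch.
public class MessageFailed : IMessage { public Guid FailedMessageId { get; set; } }
public class RetryDelayExpired { }
public class ReplayFailedMessage : IMessage { public Guid FailedMessageId { get; set; } }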
Typically you'll have some rules around when to replay messages. In NSB 3.0 we have a better way to do this using the FaultManager. This gives you options on where to put failed messages and includes the exception. One of the options is a database, against which you could set up a job that inspects the exception and determines what to do with it.
Lastly, a low-tech way of getting this is to schedule a job that runs the ReturnToSourceQueue tool periodically to "clean up". We are doing this, and including an alert, so we don't endlessly cycle messages around.