MuleSoft CloudHub timeout although it works well locally

Mule on CloudHub times out, but only over HTTPS.
I have a Mule HTTP listener in my flow. It works fine both locally and on CloudHub after deployment.
To add security, I switched the listener to HTTPS, following this guide:
https://docs.mulesoft.com/runtime-manager/building-an-https-service
It works fine locally, but once deployed to CloudHub it starts timing out with HTTP 504. I've even increased the idle timeout to a fairly high value, but it still times out.
Has anyone faced this? It would be great to get a solution for it.

Reason 1) There is a hard timeout of 5 minutes for requests passing through CloudHub. If your API exceeds it, that's the reason for the timeout in the cloud. As far as I know, it is not configurable or overridable per organization.
You might have to look into a callback model here.
Reason 2) If your API is not taking 5 minutes or more to process, then it might be something with the SSL configuration (see the sketch after this list).
Reason 3) If the above two items check out, then this might be a network-connectivity issue. Please verify the firewall rules in that case.
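For Reason 2, here is a minimal sketch of what the HTTPS listener configuration typically looks like on CloudHub (Mule 4 syntax; the config name, keystore path, alias, and passwords are placeholders). One detail worth checking: CloudHub workers must bind to host 0.0.0.0 and the reserved port ${https.port} (8082); a hardcoded local port works on premises but leaves the platform load balancer unable to reach the worker, which can surface as a 504.

<http:listener-config name="httpsListenerConfig">
    <http:listener-connection host="0.0.0.0" port="${https.port}" protocol="HTTPS">
        <tls:context>
            <!-- Keystore bundled under src/main/resources; all values are placeholders -->
            <tls:key-store type="jks" path="keystore.jks" alias="selfsigned"
                           keyPassword="changeit" password="changeit"/>
        </tls:context>
    </http:listener-connection>
</http:listener-config>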

Related

Debugging Performance of Spring Cloud Gateway with Netty

I have Spring Cloud Gateway (Greenwich) running with Netty. This application receives a request and then forwards it to downstream applications depending on the route configuration.
Randomly, a few requests take a long time (> 70 s). Even though the downstream server responds within 5 seconds, the Netty threads (reactor-http-epoll-*) do not pick up the response. I have enabled debug logs to see what those threads are doing. From preliminary analysis, it looks like they are processing something else and are always in the runnable state. When this happens, the traffic to the server is not unusual; it's the same as before.
My questions are:
Why was the response not processed by the reactor threads even though it was received? (According to the downstream app's logging, it sent the response; however, the spring-cloud app logged receiving it far too late.) Is it possible that all the threads were busy doing other things?
Is there a runbook for investigating such issues?
In some places in the logs I see a high number of inactive connections, but I'm not sure whether that has any impact ("Channel cleaned, now 56 active connections and 1400 inactive connections").
Any general guidance on how to investigate why random slowness is happening in the application would really help. Thanks for the help in advance.
Okay, so I ended up doing the things below, and after a lot of investigation it started working fine for me.
Enable logging and look at how many connections are being created. In my case, a lot of new connections were being created and they were not being reused:
# Report Netty buffer leaks (expensive; enable only while debugging)
io.netty.leakDetectionLevel=paranoid
# Verbose channel and connection-pool logging from Reactor Netty
logging.level.reactor.netty=DEBUG
logging.level.reactor.netty.channel.FluxReceive=DEBUG
# Log the raw bytes on the wire for the gateway's HTTP client and server
spring.cloud.gateway.httpclient.wiretap=true
spring.cloud.gateway.httpserver.wiretap=true
Make sure there is no blocking code running on the reactor-http-epoll-* threads.
I upgraded the Spring Cloud dependencies from the Greenwich train to the latest version of the Hoxton train.

Maximum concurrent requests in SignalR self-hosted in Kestrel

I've encountered a strange problem with an application I've developed. The application is a Windows service hosting ASP.NET Core 2.0 running on Kestrel. It receives requests through an IIS site acting as a proxy.
In this application I also use SignalR 2.2.2, integrated using Microsoft.AspNetCore.Owin. All worked well until I noticed that the application had stopped responding to requests.
Other applications on the same machine, using the same IIS server as a proxy, were working fine. Restarting the application pool serving the site solved the problem temporarily.
The problem resurfaced, and digging through monitoring information, the application seems to hang when there are 400 SignalR SSE connections on the same machine. This seems plausible, as I've found that by default OWIN limits the number of concurrent requests to 100 * number of CPUs. (Note that a site on the same machine serves 5000 requests per minute without breaking a sweat, but those are not long-lived requests like the SignalR ones.)
The problem is that I can't seem to find the same option when hosting OWIN inside ASP.NET Core. Does anyone know whether this could be the solution, and what the correct setting is?
EDIT: I'm fairly certain the issue is caused by the number of SignalR connections opened concurrently, because disabling SignalR in the JavaScript made the problem vanish.
2nd EDIT: SignalR does not seem to be the culprit after all: load testing the site with crank, both in test and in production, worked up to 5000 concurrent connections, which is the default IIS limit and is fine by me.
After some trial and error I was able to identify and correct the problem, but it was no easy task, so I'm leaving this answer behind in case someone else stumbles upon the same problem.
Disabling SignalR did not solve the problem, but it did make it appear less often.
Thanks to the monitoring in place on the server and IIS, I observed that the problem appeared when the number of connections to the site started growing rapidly. This system primarily makes requests to other services, so it has neither a database nor expensive computations.
Examining the code, I found three problems:
a new HttpClient was created for every request, which can exhaust the sockets, since they are not reused between requests (blog, blog2, blog3)
by default there is a maximum number of concurrent connections from HttpClient to a single domain, and that limit defaults to 2 (!!!) (blog4)
the code was waiting synchronously on every web request to another system (the program was ported from an MVC 4 site, which never exhibited this problem). This worked fine in MVC, but ASP.NET Core is very sensitive to it: it rapidly exhausts all available threads, and because the thread pool starts with a number of threads equal to the core count, they run out quickly, leaving all requests waiting. The limit can be raised as a temporary stopgap with ThreadPool.SetMaxThreads(Int32, Int32), but the only real solution is to convert all the calls to async calls (see the sketch below).
Once all calls were made async, the problem never returned. Basically, the problem was thread-pool starvation and ASP.NET Core's sensitivity to it compared with MVC. Here you can find a nice explanation and a detection method using PerfView.
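To make the three fixes concrete, here is a hedged sketch (the class name and URL are invented; ServicePointManager.DefaultConnectionLimit applies when the service runs on the full .NET Framework):

using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class DownstreamClient
{
    // Fix 1: one shared HttpClient for the whole process, so sockets are reused.
    private static readonly HttpClient Client = new HttpClient();

    static DownstreamClient()
    {
        // Fix 2: raise the per-host connection limit (defaults to 2 on the full framework).
        ServicePointManager.DefaultConnectionLimit = 100;
    }

    // Fix 3: await the call instead of blocking with .Result or .Wait(),
    // which is what starves the thread pool under ASP.NET Core.
    public async Task<string> GetDataAsync()
    {
        var response = await Client.GetAsync("https://other-service.example/api/data");
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}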
This could be the issue, but it's unlikely. When hosting on .NET Core you are probably using Kestrel as the web server implementation; to change limits such as the number of concurrent connections, you can use the KestrelServerLimits class, as described in this Microsoft article.
That said, KestrelServerLimits should not be causing you any problems, since the default value for MaxConcurrentConnections is unlimited.
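If you do want to cap them explicitly, this is roughly where the limits live (ASP.NET Core 2.x API; the Startup class and the values are assumptions):

using Microsoft.AspNetCore.Hosting;

var host = new WebHostBuilder()
    .UseKestrel(options =>
    {
        // Both limits default to null, i.e. unlimited
        options.Limits.MaxConcurrentConnections = 5000;
        options.Limits.MaxConcurrentUpgradedConnections = 5000; // e.g. WebSockets
    })
    .UseStartup<Startup>()
    .Build();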

Upload text logs to MVC 4 web server every second

I have a web server implemented in .NET MVC 4. Clients connected to this web server perform some operations and upload live logs to it using the WebClient.UploadString method. The logs are sent from client to server in chunks of 2500 characters.
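The upload loop on each client looks roughly like this (a simplified sketch; the URL and the log accessor are placeholders):

using System;
using System.Net;

string log = GetPendingLog(); // hypothetical accessor for the buffered log text
using (var client = new WebClient())
{
    client.Headers[HttpRequestHeader.ContentType] = "text/plain";
    // Push the log in 2500-character chunks, one UploadString call per chunk
    for (int i = 0; i < log.Length; i += 2500)
    {
        string chunk = log.Substring(i, Math.Min(2500, log.Length - i));
        client.UploadString("http://server.example/api/logs", chunk);
    }
}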
Things work fine while only 2-3 clients upload logs. However, when more than 3 clients try to upload logs simultaneously, they start receiving "HTTP 500 Internal Server Error".
I might have to scale up and add more slaves, but that will only make the situation worse.
I want to implement Jenkins-like live logging, where logs from the slaves are updated live.
Please suggest a better, more scalable solution to this problem.
Have you considered looking into SignalR?
It can be used for anything from instant messaging to stocks! I have implemented both a chatbox and a custom system that sends off messages, does calculations, and then passes the results back down to the client. It is very reliable, there are some nice tutorials, and I think it's awesome.
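As a rough sketch of how live log streaming could look with SignalR 2.x (the hub, group, and method names are all invented):

using System.Threading.Tasks;
using Microsoft.AspNet.SignalR;

// Browsers connect to this hub and join a group per slave they want to watch.
public class LogHub : Hub
{
    public Task Watch(string slaveId)
    {
        return Groups.Add(Context.ConnectionId, slaveId);
    }
}

public static class LogBroadcaster
{
    // Called from the MVC action that receives an uploaded chunk.
    public static void Publish(string slaveId, string chunk)
    {
        var context = GlobalHost.ConnectionManager.GetHubContext<LogHub>();
        // appendLog is the client-side callback the browser registers
        context.Clients.Group(slaveId).appendLog(chunk);
    }
}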

The server rejected the session-establishment request: WCF hosted on IIS

We have some WCF services implemented in an IIS application, communicating over net.tcp on the default port (808) using the Microsoft Net.Tcp Port Sharing Service, and they are throwing an error on the production servers. When I instantiate a connection to the first of the services, I get back an exception:
The server at <URL> rejected the session-establishment request. All the other services respond fine.
But it runs fine on our test servers.
I initially thought there was something wrong with the particular service that was failing, but I tried rearranging the list of services into a different order, and it SEEMS to always be the first service I hit that fails. (I say SEEMS because I think that once, in the early iterations of testing, I saw it happen on the second service hit. But I haven't been able to reproduce that.)
I've looked at application startup delays, and that doesn't seem to be the problem, because I can come back and rerun the test as soon as it finishes - a delay of only a minute or two - and get the same error. Also, in the lower-level environments there is a startup delay of probably 30 seconds to a minute, but the result still comes back as expected.
I've tried accessing the services over HTTP from INetManager, and I get intermittent failures on all the services - a particular service will return a yellow screen of death on one invocation, then come up with the expected link to the WSDL seconds later.
I'm completely at a loss to explain this behavior or how to resolve it. I've googled the error message and not found anything helpful. It may be a configuration issue - the production servers are newly provisioned VMs, and we may not have the config exactly right (whereas all the lower-level environments have been running this and other similar apps for some time) - but I have no idea what to look for. I've also compared the properties of the app pool the app runs on against the lower-level environments without finding any differences.
If somebody can point me in the right direction, you would have my undying gratitude.
Things I can find:
http://go4answers.webhost4life.com/Example/connect-busy-wcf-service-host-while-725.aspx:
MaxConcurrentSessions (default = 10) [Per-channel] The maximum number of sessions that a service can accept at one time. Only comes into play with session-based bindings (wsHttp or netTcp)"
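If that throttle is what you're hitting, it can be raised in the service's web.config; a sketch with illustrative values only:

<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- Values are examples; tune them to your actual load -->
        <serviceThrottling maxConcurrentCalls="64"
                           maxConcurrentSessions="400"
                           maxConcurrentInstances="464" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>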
http://blogs.infosupport.com/unable-to-generate-a-wcf-proxy-using-svcutil-but-retreiving-the-wsdl-works/
So in the end the trick is to add the additional right on the c:\windows\temp folder for your App Pool Identity [for the service to be able to generate metadata] to solve the problem.
Also, are timeouts or other limits configured and being hit? Give tracing a look, and access the service using WcfTestClient to see if you can find any underlying errors.

WCF client application hang -- need repro advice

I have a WCF application with a couple thousand clients connecting to a pair of services running under IIS. What I've noticed is that some of these clients get into a hung state, and I'm trying to reproduce this.
When this problem was first noticed, I had not modified the throttling configuration, and the services were set to ConcurrencyMode.Single. One thing I noticed was that an IISReset on the server caused many clients to hang. Yet pulling the same stunt against IIS on my local machine doesn't seem to cause the problem.
I caught this only once in the wild, but didn't have debugging enabled at the time. The symptom I witnessed was that the client appeared to be trying to open a connection to the web server but did not succeed. While monitoring with Fiddler, I saw no attempt to reach the service endpoint. Obviously that makes me suspect the client proxy.
I have a very solid hunch as to what's happening - namely, that I've been using Close() instead of Abort() when the service throws an exception, which I believe is corrupting the channels. The pattern I believe the code should be using is the standard close-or-abort idiom, sketched below. But considering the effort involved in getting a new version out there, I need to reproduce the problem by making a client on my own machine hang before I can start changing the code.
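A sketch of that idiom (MyServiceClient stands in for the generated proxy):

using System;
using System.ServiceModel;

var client = new MyServiceClient();
try
{
    client.DoWork();
    client.Close();   // graceful close on the happy path
}
catch (CommunicationException)
{
    client.Abort();   // channel is faulted; Close() would throw or hang
}
catch (TimeoutException)
{
    client.Abort();
}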
Where should I start?
Thanks in advance,
roufamatic
Have you got any logging turned on? That could help in diagnosing the problem, and it can be done entirely in config, so there is no need to build a new version. Use the Service Configuration Editor tool to set it all up. The Visual Studio 2008 Training Kit has a good tutorial on how to use logging and the log viewer.
I suppose this was too vague a question, though I was mostly curious what people might suggest. As it turns out, there was a nontrivial difference between my workstation and the production environment that, once resolved, allowed me to see the problem. In this case, somehow using Fiddler to watch the traffic actually prevented the error from occurring! Now to ask another question.