heartBeatIntervalInSecs has no effect if the application starts with the server stopped - ibm-mobilefirst

WL 6.1
I have an application with:
ConnectOnStartup: true
heartBeatIntervalInSecs: 30
If the server is running when I start the application, the application log shows a heartbeat trace every 30 seconds.
But if the server is stopped when I start the application, there is no heartbeat trace.
I handle the connection error in onConnectionFailure and let the application start.
Is this OK? How can I enable the heartbeat manually?
I have tested this on Android.
Thank you.

There is an API for this: WL.Client.setHeartBeatInterval(interval)
Accepts:
-1 to disable the heartbeat
Any other number as the interval (in seconds)
In your implementation, simply disable or enable the heartbeat (by setting an interval) whenever required.
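As a rough sketch of how this could be wired up, the helper below wraps the two calls described above. The names makeHeartbeatControl and heartbeat are hypothetical, and the client parameter stands in for the global WL.Client so the helper can be tried outside a Worklight app. Whether the heartbeat actually starts firing while the server is still unreachable is not confirmed here.

```javascript
// Sketch only: `client` is expected to expose setHeartBeatInterval(interval),
// as WL.Client does. makeHeartbeatControl is a hypothetical helper name.
function makeHeartbeatControl(client, intervalSecs) {
  return {
    // Turn the heartbeat on with the configured interval (in seconds).
    enable: function () {
      client.setHeartBeatInterval(intervalSecs);
    },
    // -1 disables the heartbeat.
    disable: function () {
      client.setHeartBeatInterval(-1);
    }
  };
}

// Hypothetical wiring inside the app's init options (not executed here):
// var heartbeat = makeHeartbeatControl(WL.Client, 30);
// var wlInitOptions = {
//   connectOnStartup: true,
//   heartBeatIntervalInSecs: 30,
//   onConnectionFailure: function () {
//     heartbeat.enable(); // then continue starting the app
//   }
// };
```

Calling heartbeat.enable() from onConnectionFailure keeps the interval in one place, and heartbeat.disable() can be used wherever the app should stop pinging the server.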


IIS shuts down website even though timeout for app pool is set to 0

Title pretty much says it all. I have created a simple ASP.NET Core 3.1 website which runs an MQTT subscriber. My problem is that after a while I see this message in Event Viewer:
Application 'MACHINE/WEBROOT/APPHOST/MYWEBSITE' has shutdown.
Is there something I should add in the website itself or in IIS to make the website always running?
I found a complete step-by-step guide on how to fix this:
https://www.taithienbo.com/how-to-auto-start-and-keep-an-asp-net-core-web-application-and-keep-it-running-on-iis/
The missing part in my config was adding Application Initialization to the server roles.
I had the same problem before and solved it by setting Rapid-Fail Protection to false, which is reasonable if you trust your server's performance. Alternatively, increase the failure interval and the maximum failures count.

Apache Curator connection state listener not always called with RECONNECTED state change

I am using Apache Curator v4.3.0 (ZK v3.5.8), and I noticed that in some disconnect/reconnect scenarios, I stop getting a RECONNECTED event to the registered listener/s.
CuratorFramework client = ...;
// retry policy is RetryUntilElapsed with Integer.MAX_VALUE
// sessionTimeout is 15 sec
// connectionTimeout is 5 sec
client.getConnectionStateListenable().addListener(new ConnectionStateListener()...
Although I do see that the ConnectionStateManager prints the state change:
[org.apache.zookeeper.ClientCnxn] - Client session timed out, have not heard from server in 15013ms for sessionid 0x10000037e340012, closing socket connection and attempting reconnect
[org.apache.zookeeper.ClientCnxn] - Opening socket connection to server
...
[org.apache.curator.ConnectionState] - Session expired event received
[org.apache.zookeeper.ClientCnxn] - Session establishment complete on server
[org.apache.curator.framework.state.ConnectionStateManager] - State change: RECONNECTED
Usually my listener's stateChanged is called right after this, but not always.
The CuratorFramework client is shared between multiple components that register different listeners. I didn't find any restriction requiring a separate client per listener. But when I don't share the client, the problem no longer occurs.
Any suggestions on how to proceed debugging this problem?
Thank you,
Meron
This appears to be the bug that was fixed in Curator 5.0.0: https://issues.apache.org/jira/browse/CURATOR-525. If you can, please test with 5.0.0 and see if it fixes the issue.

Intermittent problems starting Azure App Services: "500.37 ANCM Failed to Start Within Startup Time Limit"

Our app services intermittently fail to restart under the hosting environment (ANCM).
When that happens, the user sees the following screen:
Http Error 500.37
Our production subscription consists of up to 8 different app services, and the problem can randomly affect one or several of them.
The problem can occur several times a week, or just once a month.
The bootstrapping procedure of our app services is not time consuming.
The last occurrence of the problem produced these entries in the event log:
Failed to gracefully shutdown application 'MACHINE/WEBROOT/APPHOST/XXXXXXXXX'.
followed by:
Application '/LM/W3SVC/815681839/ROOT' with physical root 'D:\home\site\wwwroot' failed to load coreclr. Exception message: Managed server didn't initialize after 120000 ms
In most cases the problem can be resolved by manually stopping and starting the app service. In some cases we had to do that twice.
We are not able to reproduce that behavior locally.
The App Service Plan is S2 and we actually use just one instance.
The documentation of the Http error 500.37 recommends:
"You may need to stagger the startup process of multiple apps."
But there is no hint of how to do that.
How can we ensure that our app services are restarted without errors?
HTTP Error 500.37 - ANCM Failed to Start Within Startup Time Limit
You can try the following approaches:
Approach 1: If possible, try moving one app into a new App Service with a separate App Service plan, then check whether it starts as expected.
Please note that creating and using a separate App Service plan incurs additional charges.
Approach 2: Increase the startupTimeLimit attribute of the aspNetCore element.
For more information about the startupTimeLimit attribute, please check: https://learn.microsoft.com/en-us/aspnet/core/host-and-deploy/aspnet-core-module?view=aspnetcore-3.1#attributes-of-the-aspnetcore-element

Splunk 7.2.9.1 Universal Forwarder on SUSE Linux 12.4 stops communicating and forwarding logs to the indexer after a certain period of time

I have noticed that the Splunk 7.2.9.1 Universal Forwarder on SUSE Linux 12.4 stops communicating with the deployment server and stops forwarding logs to the indexer after a certain period of time. The "splunkd" process still appears to be running while the issue persists.
I have to restart the forwarder for it to resume communicating with the deployment server and forwarding logs, but it stops again after a certain period of time.
I cannot see any specific entries in splunkd.log while this issue occurs.
However, I noticed the message below in watchdog.log:
06-16-2020 11:51:09.055 +0200 ERROR Watchdog - No response received from IMonitoredThread=0x7f24365fdcd0 within 8000 ms. Looks like thread name='Shutdown' is busy !? Starting to trace with 8000 ms interval.
Can somebody help me understand what is causing this issue?
This appears to be a Known Issue. From the 7.2.9.1 release notes:
Universal Forwarders stop sending data repeatedly throughout the day
Workaround: In limits.conf, try changing file_tracking_db_threshold_mb
in the [inputproc] stanza to a lower value.
I did not find a version where this is not listed as a known problem.

RUN#Cloud consistently throws me out during a heavy operation

I'm using a large app instance to run a basic Java web application (GWT + Spring). There's an expensive operation within my application (a report) which takes a long time to execute.
I've tried running it with the CloudBees SDK on my local machine, with settings similar to those on the cloud, and it seems to function just fine. It runs in about 3-4 minutes.
On the cloud, it seems to take longer. The problem isn't that it takes long. What happens is that CloudBees terminates the session after 5 minutes and gives me an error in my browser saying 'Unable to connect to server. Please contact your administrator'. A report which doesn't take as long runs just fine. My application has a session timeout of 30 minutes, so that isn't the problem either.
What could possibly be going wrong? Is it something to do with CloudBees?
This may be due to proxy buffering of your request through the routing layer (revproxy), so it most likely isn't a session timeout but the HTTP connection getting cut.
You can set proxyBuffering=false via the bees CLI command (e.g. when you deploy the app), which will allow longer-running connections to work.
Ideally, however, you could change the app slightly so that it returns to the browser immediately with a token, which the browser then uses to poll for completion status. Even if a connection can last that long, over the internet it may give a worse experience than it does locally.
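The token-and-poll idea can be sketched roughly as follows. This is an in-memory illustration only, not CloudBees or Spring code: createJobStore, start, complete and poll are hypothetical names, and a real app would keep the job state on the server and expose start and poll as HTTP endpoints.

```javascript
// Minimal in-memory job registry illustrating the token/poll pattern.
// All names here are hypothetical and exist only for this sketch.
function createJobStore() {
  var jobs = {};
  var nextId = 1;
  return {
    // Register a long-running job and hand back a token right away,
    // so the initial HTTP request can return immediately.
    start: function () {
      var token = "job-" + nextId++;
      jobs[token] = { status: "running", result: null };
      return token;
    },
    // Called by the background worker once the report is ready.
    complete: function (token, result) {
      if (!jobs[token]) {
        throw new Error("unknown token: " + token);
      }
      jobs[token] = { status: "done", result: result };
    },
    // What a polling endpoint would return for a given token.
    poll: function (token) {
      var job = jobs[token];
      if (!job) {
        return { status: "unknown", result: null };
      }
      return { status: job.status, result: job.result };
    }
  };
}
```

The browser would call the start endpoint once, then poll every few seconds until the status is "done", so no single HTTP connection has to stay open for the full duration of the report.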