IBM MobileFirst 7.1 does not auto-recover after a network failure / loss of connection, even though all services and connections are back to normal.
We have a clustered / farm setup with 2 web and app servers (Tomcat). Both app servers are able to serve incoming transactions. We had an incident in which there was a network failure / lost connection, and during that time all transactions were pointing to 1 app server. Although all connections went back to normal, this 1 app server was still unable to connect to the configuration DB. What we did was turn off the failed server and try the app, which was then pointing to the other app server, and the app worked. We then restarted the failed app server and tested the app, and it is now accepting transactions. The question is, why does it not auto-recover, so that the Tomcat service needs to be restarted? Is MobileFirst 7.1 designed/built to behave this way (no auto-recovery)?
The expectation is that it should auto-recover.
Please help and advise what can be checked/adjusted.
Thanks in advance.
Best regards,
Jonathan
The default DB configuration (data source configuration) provided with MFP is not designed to auto-recover when there is a DB connectivity issue. You should be able to configure MFP for auto-reconnect by providing the correct data source configuration. See this article on how this is done for different app servers: https://www.techpaste.com/2016/04/jndi-autoreconnect-java-application-servers/
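Auto-reconnect settings of this kind usually come down to "validate on borrow": before the pool hands out a connection, it runs a cheap test query and replaces the connection if the test fails. Below is a minimal sketch of that idea in Python, using sqlite3 as a stand-in database. The class name ValidatingPool and its parameters are made up for illustration and are not part of MobileFirst or Tomcat.

```python
import sqlite3


class ValidatingPool:
    """Toy single-connection pool that tests a connection before handing it out."""

    def __init__(self, connect, validation_query="SELECT 1"):
        self._connect = connect              # factory that opens a new connection
        self._validation_query = validation_query
        self._conn = None
        self.reconnects = 0                  # how many times a connection was opened

    def _is_valid(self, conn):
        # A dead or closed connection raises instead of returning a row.
        try:
            conn.execute(self._validation_query).fetchone()
            return True
        except sqlite3.Error:
            return False

    def borrow(self):
        # Open a connection on first use, or replace one that fails validation.
        if self._conn is None or not self._is_valid(self._conn):
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn


pool = ValidatingPool(lambda: sqlite3.connect(":memory:"))
first = pool.borrow()
first.close()                 # simulate the connection dying during an outage
second = pool.borrow()        # validation fails, so the pool opens a new connection
print(second.execute("SELECT 1").fetchone()[0], pool.reconnects)
```

Without the validation step, the pool would keep handing out the dead connection until the process is restarted, which matches the behaviour described in the question.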
We have a JBoss 7 instance running and hosting a web application. JMX remote has been turned on with username/password authentication and we are able to connect to it fine. Kindly note we are using Jboss/bin/jconsole.bat to connect.
However, at times we notice that after either of the following 2 cases it stops allowing any more connections to JMX unless we restart the JBoss server. The cases are:
1) we attempt a heap dump of the JVM using jconsole
2) We invoke a softreset method on a c3p0 datasource object that has been exposed via spring JMX
It does not necessarily stop working every time we do either of the 2. At times it stops taking new connections after one heap dump, and at times only after 3-4 successful attempts.
Any clue on this random behaviour of jconsole?
I think you were bitten by the connection leak bug that AS 7.1.x had; it is fixed in the 7.2.x versions.
I would recommend that you take EAP 6.1.0.Alpha1 (same as 7.2.0.Final) and try again.
If I recall correctly this was the original issue https://issues.jboss.org/browse/REMJMX-45
We have one VM for BizTalk and a separate VM for the SQL backend. We are using Veeam for backups which basically kicks off a snapshot of the VM. When this snapshot is being finalized on the SQL VM, BizTalk services on the application server fail. Usually they restart automatically but sometimes this requires manual intervention to start the services. The error below is logged on the BizTalk server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
An error occurred that requires the BizTalk service to terminate. The most common causes are the following:
1) An unexpected out of memory error.
OR
2) An inability to connect or a loss of connectivity to one of the BizTalk databases.
The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.
Error message: [DBNETLIB][ConnectionRead (recv()).]General network error. Check your network documentation.
Error source:
BizTalk host name: BizTalkServerApplication
Windows service name: BTSSvc$BizTalkServerApplication
We experienced the same situation and error with both BizTalk 2009 and BizTalk 2013, each set up with two App servers and one SQL DB server.
When our VMware does the final step of the snapshot backup on the application servers, it freezes the application server for about 10 seconds, preventing it from receiving packets. By default, SQL Server 2008 and 2012 send keep-alive packets to the clients every 30 seconds (30,000 ms). If the SQL server fails to receive a response back from the app server, it sends 5 retries (the default setting) of the keep-alive request, 1 second (1,000 ms) apart. If SQL still does not receive a response, it terminates the connection, which causes the BizTalk hosts on the app server to reset. In our case, when our German-made ERP system sends its EDI documents over to BizTalk during that reset period, the transmission fails.
We trapped the issue by running NetMon on the DB and app servers and waiting for the next error message. Upon inspection, we saw the five SQL keep-alive packets being sent to the app servers 1 second apart, while at the same time NO packets at all were received on the application server. At first guess, one might think they were "just dropped network packets", which is rarely the case. We then made the correlation to the timing of the VM snapshots, and can now confirm that each time the snapshot finishes each day, the app servers freeze.
As a short-to-mid-term workaround, we raised the number of retries SQL attempts before declaring a connection dead (5 by default) by adding the registry value TcpMaxDataRetransmissions and setting it to 30 (thus 30 seconds before SQL declares the client unresponsive). This has masked the problem for us for now; use at your own discretion.
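Taking the figures in this post at face value (retries spaced 1 second apart, a freeze of roughly 10 seconds), the arithmetic behind the workaround can be sketched as follows. This is a simplified model for illustration only; the function names are made up and real TCP retry timing may differ.

```python
def detection_window_ms(retries, interval_ms=1000):
    """Time the server keeps retrying an unanswered keep-alive before dropping the connection."""
    return retries * interval_ms


def survives_freeze(freeze_ms, retries, interval_ms=1000):
    """True if a client frozen for freeze_ms is back before the retries run out."""
    return freeze_ms < detection_window_ms(retries, interval_ms)


FREEZE_MS = 10_000  # roughly the 10 second snapshot freeze described above

# Default of 5 retries: a 5 second window, shorter than the freeze.
print(detection_window_ms(5), survives_freeze(FREEZE_MS, 5))
# Raised to 30 retries: a 30 second window, longer than the freeze.
print(detection_window_ms(30), survives_freeze(FREEZE_MS, 30))
```

Under these assumptions the default window (5 seconds) is shorter than the freeze, so the connection is dropped, while 30 retries leave enough room for the server to come back.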
We are also looking at an Agent-based version of the VM Snapshot, which may alleviate the condition of freezing the server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
Not that I am aware of, however you might want to Google config options in the btsntsvc.exe.config file which is located in your BizTalk installation directory.
All messages that pass through BizTalk are written to the BizTalkMsgBoxDb and its other databases are involved if you are running tracking, BAM etc. The only service that can cache 'stuff' and handle a database outage is the Enterprise Single Sign-On (ESSO) Service. BizTalk therefore needs a persistent connection to the database server to remain 'up', hence why your Host Instance (BizTalkServerApplication) is stopping - it simply wouldn't be able to process messages if the database wasn't there.
I would add that your approach to backups probably isn't supported by Microsoft, and I would further suggest that you seriously consider whether an approach that takes your database server offline during the backup is viable.
BizTalk has a pretty robust backup solution for its various databases built into the product, and I would recommend that you take a look at using this supported method.
If you do need to take snapshots of the database system - say once a night - you might want to consider stopping the BizTalk Host Instances, performing the snapshot, and then re-starting the Host Instances through some scripted task.
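A scripted task along those lines could look roughly like the sketch below, run on the BizTalk server. It assumes the Windows service name from the error message in the question and uses the standard `net stop` / `net start` commands. The `snapshot` argument is a placeholder for whatever triggers your backup, and the function name is made up for illustration.

```python
import subprocess

# Windows service name taken from the error message in the question.
SERVICE = "BTSSvc$BizTalkServerApplication"


def run_with_hosts_stopped(snapshot, service=SERVICE, run=subprocess.run):
    """Stop the host instance service, run snapshot(), then always restart the service."""
    run(["net", "stop", service], check=True)
    try:
        snapshot()
    finally:
        # Restart even if the snapshot step raised, so BizTalk is not left down.
        run(["net", "start", service], check=True)
```

The `run` parameter exists only so the sequence can be dry-run with a stub before pointing it at a real server. With more than one host instance or app server, the same stop/start pair would have to be repeated for each of them.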
You might also want to consider checking whether there are any hotfixes for your version of BizTalk Server included in a Cumulative Update that might help address your problem.
I have a web service application which connects to databases through ODBC, using the SQL Native Client and SQL Server drivers. All of a sudden the application stopped connecting to the database, throwing error 08001. But when I recycled the application pool it started working. Now it is happening intermittently and has become a headache for me. It can't be a memory problem, as it once happened immediately after an app pool recycle, but it was corrected again after one more app pool recycle. I don't know what is happening, as none of the error logs give any clue :(. Please help me...
The first step is to be able to diagnose what is going on. You cannot fix what you cannot measure. To do this I would enable pooling in the data source console for the driver, then add the counters to the performance monitor to see what the connection pool is doing.
I'm not sure what the relationship between IIS application pool processes and ODBC connections is, but we are seeing some unexpected behaviour in this area. Also, the ODBC connection performance counters are visible if I connect to the driver through a locally installed console application, but I cannot see any performance counter activity for connections made via the web service app pool in IIS. Odd!?
The setup is as follows:
A C++ client connects via OLEDB/SQL Native Client to a SQL Server 2005 database located on another machine. The server is setup with mirroring (automatic failover) with a synchronized server located on yet another server and a witness server on another server.
Occasionally (once every couple of days), our application seizes up: it appears to attempt to establish a connection to the database, and rather than simply failing with OLEDB throwing a database connection failure, it just gets "stuck" (we have a timeout for the connection but it never times out). 24 to 36 hours later we'll get an error:
TCP Provider: An existing connection was forcibly closed by the remote host.
And things will continue on with lots of these errors, and our app will eventually need to be restarted. We can't really figure out what condition could be causing this behavior or what we can do about it.
In preliminary research, I've seen some related problems that were solved by setting the Connection Lifetime connection string property to something non-zero.
Does anyone have any thoughts on what might be going on here?