Connection Problems With Mirrored Database - sql-server-2005

The setup is as follows:
A C++ client connects via OLEDB/SQL Native Client to a SQL Server 2005 database located on another machine. The server is set up for mirroring with automatic failover: the synchronized mirror lives on a second machine and the witness on a third.
Occasionally (once every couple of days), our application seizes up: it appears to attempt to establish a database connection, but rather than simply failing and OLEDB throwing a connection error, it just gets "stuck" (we have a timeout set for the connection, but it never times out). 24 to 36 hours later we'll get an error:
TCP Provider: An existing connection was forcibly closed by the remote host.
Things then continue on with lots of these errors, and our app eventually needs to be restarted. We can't figure out what condition could be causing this behavior or what we can do about it.
In preliminary research, I've seen some related problems that were solved by setting the Connection Lifetime connection string property to something non-zero.
Does anyone have any thoughts on what might be going on here?
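For context, the failover and timeout behaviour here is driven by the provider connection string. A hedged sketch of the kind of string involved (server and database names are placeholders, not taken from the question; keyword spellings vary by provider, e.g. the SQL Native Client OLE DB provider uses FailoverPartner while ADO.NET uses Failover Partner):

Provider=SQLNCLI;Data Source=PrimaryServer;FailoverPartner=MirrorServer;Initial Catalog=AppDb;Integrated Security=SSPI;Connect Timeout=15

If pooling is in play, Connection Lifetime=120 (or similar) is the non-zero setting referred to above; it retires pooled connections after a fixed age instead of reusing them indefinitely, though whether the keyword is honoured depends on the data access stack in use.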

Related

BizTalk connectivity issue to SQL during VM snapshot

We have one VM for BizTalk and a separate VM for the SQL backend. We are using Veeam for backups which basically kicks off a snapshot of the VM. When this snapshot is being finalized on the SQL VM, BizTalk services on the application server fail. Usually they restart automatically but sometimes this requires manual intervention to start the services. The error below is logged on the BizTalk server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
An error occurred that requires the BizTalk service to terminate. The most common causes are the following:
1) An unexpected out of memory error.
OR
2) An inability to connect or a loss of connectivity to one of the BizTalk databases.
The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.
Error message: [DBNETLIB][ConnectionRead (recv()).]General network error. Check your network documentation.
Error source:
BizTalk host name: BizTalkServerApplication
Windows service name: BTSSvc$BizTalkServerApplication
We experienced the same situation and error with both BizTalk 2009 and BizTalk 2013, each set up with two App servers and one SQL DB server.
When our VMware does the final step of the snapshot backup on the application servers, it freezes the application server for about 10 seconds, preventing it from receiving packets. By default, SQL Server 2008 and 2012 send keep-alive packets to their clients every 30 seconds (30,000 ms). If the SQL server fails to receive a response back from the app server, it sends 5 retries (the default setting) of the keep-alive request, 1 second (1,000 ms) apart. If SQL still does not receive a response, it terminates the connection, which causes the BizTalk hosts on the app server to reset; in our case, when our German-made ERP system sends its EDI documents over to BizTalk during that reset window, the transmission fails.
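To make the timing concrete: the 30,000 ms / 1,000 ms figures above are ordinary TCP keep-alive parameters (idle time before the first probe, then the interval between probe retransmissions). Purely as an illustration of those two numbers, here is what the equivalent settings look like on a Windows client socket in Python; SQL Server itself is configured through its own network settings, not through code like this:

import socket

# Keep-alive timing from the answer above: first probe after 30 s idle
# (30,000 ms), then probe retransmissions every 1 s (1,000 ms).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SIO_KEEPALIVE_VALS is Windows-only: (enable, idle time in ms, interval in ms)
sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, 30000, 1000))

A 10-second freeze only becomes fatal when it happens to span the probe window: once the first probe goes unanswered, the connection has roughly 5 x 1 s left before it is declared dead.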
We trapped the issue by running NetMon on the DB and App servers and waiting for the next error. On inspection, we saw the five SQL keep-alive packets being sent to the App servers 1 second apart, and at the same time no packets at all were received on the Application server. At first guess one might think they were "just dropped network packets", which is rarely the case. We then made the correlation to the timing of the VM snapshots, and have now confirmed that each day, when the snapshot finishes, the App servers freeze.
As a short-to-mid-term workaround, we raised the number of retries SQL attempts before declaring a connection dead (5 by default) by adding the registry value TcpMaxDataRetransmissions and setting it to 30 (thus 30 seconds before SQL declares the client unresponsive). This has masked the problem for us for now; use it at your own discretion.
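For reference, a minimal sketch of setting that value with Python's winreg module. Assumptions: the value lives under the usual Tcpip Parameters key, the script runs elevated, and a reboot is needed before the change takes effect; verify the key path and supported range against Microsoft's documentation for your Windows version before using anything like this.

import winreg

# TcpMaxDataRetransmissions normally lives under the TCP/IP Parameters key;
# confirm the path and value range for your OS version (assumption).
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                    winreg.KEY_SET_VALUE) as key:
    # Raise the retry count from the default 5 to 30, as in the answer above.
    winreg.SetValueEx(key, "TcpMaxDataRetransmissions", 0,
                      winreg.REG_DWORD, 30)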
We are also looking at an Agent-based version of the VM Snapshot, which may alleviate the condition of freezing the server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
Not that I am aware of, however you might want to Google config options in the btsntsvc.exe.config file which is located in your BizTalk installation directory.
All messages that pass through BizTalk are written to the BizTalkMsgBoxDb, and its other databases are involved if you are running tracking, BAM, etc. The only service that can cache 'stuff' and handle a database outage is the Enterprise Single Sign-On (ESSO) Service. BizTalk therefore needs a persistent connection to the database server to remain 'up', which is why your Host Instance (BizTalkServerApplication) is stopping - it simply wouldn't be able to process messages if the database wasn't there.
I would add that your approach to backups probably isn't supported by Microsoft, and I would further suggest that you seriously consider whether an approach that takes your database server offline during the backup is viable.
BizTalk has a pretty robust backup solution for its various databases built into the product, and I would recommend that you take a look at using this supported method.
If you do need to take snapshots of the database system - say once a night - you might want to consider stopping the BizTalk Host Instances, performing the snapshot, and then re-starting the Host Instances through some scripted task.
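A minimal sketch of what such a scripted task might look like, assuming the single host instance named in the error above and a hypothetical hook for kicking off the snapshot; a real script would enumerate every host instance (for example via WMI) and handle failures more carefully:

import subprocess

# Service name taken from the error message above; other host instances
# would have their own BTSSvc$<HostName> services.
HOST_INSTANCE_SERVICE = "BTSSvc$BizTalkServerApplication"

def trigger_snapshot():
    # Hypothetical placeholder: start the Veeam snapshot job here.
    pass

subprocess.run(["net", "stop", HOST_INSTANCE_SERVICE], check=True)
try:
    trigger_snapshot()
finally:
    # Restart the host instance even if the snapshot step fails.
    subprocess.run(["net", "start", HOST_INSTANCE_SERVICE], check=True)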
You might also want to consider checking whether there are any hotfixes for your version of BizTalk Server included in a Cumulative Update that might help address your problem.

DataSnap server - share DB connection or new connection per client request?

I have a Delphi XE2 DataSnap server (Windows service) connected to a backend MS SQL Server 2008 (same server box) serving REST client requests.
Everything had been working great for some time until recently, when I had an issue where, for some reason, the DataSnap service lost its connection to the SQL Server.
The service failed to re-establish a connection and I had to restart the DataSnap service to continue.
This got me thinking, because currently the service uses only one SQL connection (TADOConnection) shared across all client requests. I did this because I didn't want the overhead of instantiating a new SQL connection for every client request.
I'm considering whether it actually would be better to have a separate SQL connection for each request and if the overhead would be noticeable - can anybody comment/advise on this?
This is where having a well-constructed Data Access Layer that can be modified to try different approaches and isolates your db connection from the rest of your code is really useful.
The pooling approach (as suggested by mjn) is strongly recommended if you're using MIDAS (DataSnap) from your clients to your DataSnap server as I've found it has a large connection overhead.
I've built a few web services (fairly low traffic) that use a plain TADOConnection at run-time and have found the overhead of establishing a database connection to be negligible, certainly compared with the overall network latency from the device to the server and back.
If you found TADOConnection still gave too much overhead in a high-traffic environment you could easily add your own connection pooling as above to such a system.
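To illustrate the pooling idea, here is a deliberately minimal sketch in Python rather than Delphi (a real DataSnap server would wrap TADOConnection instances); the point is only the structure: a fixed set of connections created up front and handed out per request.

import queue

class ConnectionPool:
    # Connections are created once and reused, so each request avoids the
    # cost of opening a new connection without all requests sharing one.
    def __init__(self, create_connection, size=5):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(create_connection())

    def acquire(self, timeout=10):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Usage with a dummy factory; swap in a real database connection factory.
pool = ConnectionPool(lambda: object(), size=3)
conn = pool.acquire()
# ... run the query for this request ...
pool.release(conn)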

Predis - Removing server from the connection pool

Say I have N servers in the predis connection pool. I found that when one of the servers goes down, predis does not work (i.e. new predis/client(s1, s2, ...) does not return successfully if any of the servers Si is down). First, the entry for the failed server has to be removed manually, and only then does predis resume working.
Since predis claims to use consistent hashing, shouldn't predis automatically detect which server is not responding (and has failed), and distribute the keys stored on the failed server across the other working servers?
Predis does use consistent hashing, but it is left to you to ensure that all servers in the pool are up and responding. Monitoring server availability is not intrinsically implied with consistent hashing.
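To see why consistent hashing alone doesn't buy you failure handling, here is a deliberately minimal hash ring in Python (no virtual nodes, and not predis's own distributor): removing a node only remaps the keys that lived on it, but something still has to notice the failure and call remove.

import bisect
import hashlib

def _h(value):
    # Hash a string onto the ring as a large integer.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self._points = sorted((_h(n), n) for n in nodes)

    def node_for(self, key):
        hashes = [h for h, _ in self._points]
        i = bisect.bisect(hashes, _h(key)) % len(self._points)
        return self._points[i][1]

    def remove(self, node):
        self._points = [(h, n) for h, n in self._points if n != node]

ring = Ring(["s1", "s2", "s3"])
before = {k: ring.node_for(k) for k in ("user:1", "user:2", "user:3")}
ring.remove("s2")   # this step is manual - nothing here detects the failure
after = {k: ring.node_for(k) for k in before}
# Only keys that mapped to "s2" change owners; the rest stay where they were.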
You can check each server before you attempt to connect, and modify your connection pool based on your checks. You can store the available server list for the pool elsewhere and have some other process constantly watching and modifying the available server list. You could just assume they are always all up, and only check which ones need to be removed on a failure, or you can use any combination of the above. The bottom line is that predis does not, at the moment, do it for you.
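As a concrete sketch of the "check each server before you attempt to connect" option (shown with the Python redis-py client purely to illustrate the idea; in PHP you would do the equivalent PING check before constructing Predis\Client with the surviving servers):

import redis

# Placeholder addresses; replace with the real pool members.
candidate_servers = [
    {"host": "10.0.0.1", "port": 6379},
    {"host": "10.0.0.2", "port": 6379},
]

def reachable(server, timeout=0.5):
    try:
        return redis.Redis(socket_connect_timeout=timeout, **server).ping()
    except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
        return False

# Build the pool only from servers that answered PING.
available = [s for s in candidate_servers if reachable(s)]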

ODBC connection re establishes after application pool recycle

I have a web service application which connects to databases through the ODBC SQL Native Client and SQL Server drivers. All of a sudden the application stopped connecting to the database, throwing error 08001. But when I recycled the application pool, it started working. Now it is happening intermittently and has become a headache for me. It can't be a memory problem, as it once happened immediately after an app pool recycle but was again fixed by one more recycle. I don't know what is happening, as none of the error logs give any clue. Please help me.
The first step is to be able to diagnose what is going on - you cannot fix what you cannot measure. To do this I would enable pooling in the data source console for the driver, then add the counters to Performance Monitor to see what the connection pool is doing.
I'm not sure what the relationship between IIS application pool processes and ODBC connections is, but we are seeing some unexpected behaviour in this area. Also, the ODBC connection performance counters are visible if I connect to the driver through a locally installed console application, but I cannot see any performance counter activity for connections made via the web service app pool in IIS. Odd!

WCF Server/Client connection handling

I'm having some problems with my WCF server and client connections.
If I use my service and client normally, meaning my client connects and disconnects through normal operations (myclient.Close()), everything works fine. But if the server is running and I use the Visual Studio "Stop" button to exit my client application, the server and client communication seems to get into a bad state: the server doesn't clean up the connection correctly, any new connections from my client are very unstable, and only certain function calls work until one of them throws an exception:
System.ServiceModel.CommunicationException: An error occurred while receiving the HTTP response to http://localhost:8080/Design_Time_Addresses/Service_Beta1/Service. This could be due to the service endpoint binding not using the HTTP protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the service shutting down). See server logs for more details. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive.
And many other exceptions after that, and I'll spare you from reading those.
If I shut down the server and client and restart both, all my calls work fine until I do the drastic "Stop" in Visual Studio. How can I force the server to clean up improperly closed connections? I know the "Stop" button is gone in production and in theory there shouldn't be problems, but I don't want server connection problems from client crashes or bad disconnects, because inevitably there will be those cases. It's best to fix this now, before 20+ clients are trying to connect and getting exceptions.
Thanks
Sorry for taking such a long time to post a reply. My problem was that I was passing a DataTable back to the client without giving the table a name on creation. See below.
Dim dt As New DataTable() 'Passing just a blank un-named table to client gave errors.
Dim dt As New DataTable("Table") 'Naming the table like so passes just fine.