Is it required to start MSDTC service on database server along with web server? Also should it be running on mirroring server too? - sql-server-2012

My project supports nested transactions and thus we have MSDTC service running on web server as well as on database server. The project is working fine. However, we have database mirroring established over database server and thus whenever fail-over happens, site page where, nested transactions are used, throws an error:
The operation is not valid for the state of the transaction.
We have MSTDC service running on mirroring database too. Please suggest what should be done to overcome this problem.

In the default DTC setup it is the DTC of the server that initiates the transactions (the web server in your case) that coordinates them. When the first database server goes down, it rollbacks its current transaction and notifies the transaction coordinator of this and that is why you get the error. The webserver cannot commit the transaction because at least one participant has voted for a rollback.
I don't think you can get around that. What your webserver should do is retry the complete transaction. Database calls would than be handled by the mirror server and would succeed.
That is at least my opinion. I'm no authority on distributed transactions, nor on database clusters with automatic failover...

Related

weblogic server goes down when any database glitches occurs

Dears,
I am facing a problem with my weblogic 12c Server. I have 3 managed nodes and one admin server. I have configured an Oracle Datasource. Whenever any network glitches or database downtime happens, weblogic server also goes down completely. Is there any configuration which I can make to hold on the server until everything comes back.
Servers are trying to restart automatically.But it is in STARTING state. I have to again start all servers and it is taking time. Your suggestions will help me a lot. Thanks in advance.
If you don't want your servers to restart automatically, the server failure action can probably be tweaked:
{managed_server} -> Configuration -> Overload -> Failure Action
However, probably you should be more interested in exploring if you are missing connection timeout config for your datasource.

Active connections on web farm

I'm trying to build a simple chat with websockets. I'm also displaying the current active users in the chat, and here is where the problems start: we use a web farm.
A user can connect through a loadbalancer with a server. When a new connection hits a server, it increases a counter in a SQL database and notifies the other servers in the farm through rabbit MQ.
All other servers fetch the new data and send that number back to their connected users.
If an user disconnects, the same will happen: The server decreases the counter in the SQL database and through rabbit MQ all other servers will know about this.
But, what will happen when a server dies? for example, If 10 users will be connected with this server. When that server goes down, all the users are disconnected, but that is not updated in the database anymore.
What's the best solution to get the total amount of active users in a web farm? And notifying the users when this amount has changed?
Thanks in advance!
Oh btw, we're using signalr
I think the typical way to deal with nodes asynchronously disconnecting from a mesh is to implement a heartbeat/keep-alive mechanism. In this case the heartbeat message would be between servers and there must also be an accessible record of which users are connected to which server. When a server does not produce a heartbeat for a period of time, then all other servers can update their records and mark all the users associated with the server as disconnected.
Looks like you may have a few options on how to keep track of users (SQL database or every server listens a Rabbit MQ message). As far as the heartbeat, you can implement it yourself or try to see if the laodbalancer's detection method can be utilized.

Database Sync failing after a day

We setup database sync between two databases on the same server. It worked fine yesterday and then stopped working today. I tried killing connections to the database and stopping the web apps that are connected to the database thinking maybe it was a connection limit. I also reset the user and pass after verifying the connection is correct.
This is the error we're getting:
Database re-provisioning failed with the exception "The current operation could not be completed because the database is not provisioned for sync or you not have permissions to the sync configuration tables." For more information, provide tracing ID ‘b4b76a8c-38ae-4b48-ad08-6c07933c23c1’ to customer support.
The error log indicated that the previously provision related operation was not completed yet, so you were not able to re-provision the sync group at the same time.
May I know whether you are still experiencing this problem? If yet could you please provide the latest error log? I'll update the answer then.

BizTalk connectivity issue to SQL during VM snapshot

We have one VM for BizTalk and a separate VM for the SQL backend. We are using Veeam for backups which basically kicks off a snapshot of the VM. When this snapshot is being finalized on the SQL VM, BizTalk services on the application server fail. Usually they restart automatically but sometimes this requires manual intervention to start the services. The error below is logged on the BizTalk server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
An error occurred that requires the BizTalk service to terminate. The most common causes are the following:
1) An unexpected out of memory error.
OR
2) An inability to connect or a loss of connectivity to one of the BizTalk databases.
The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.
Error message: [DBNETLIB][ConnectionRead (recv()).]General network error. Check your network documentation.
Error source:
BizTalk host name: BizTalkServerApplication
Windows service name: BTSSvc$BizTalkServerApplication
We experienced the same situation and error with both BizTalk 2009 and BizTalk 2013, each set up with two App servers and one SQL DB server.
When our VMware does the final step of the Snapshot backup on the Application servers, it freezes the application server for about 10 seconds, preventing it from receiving packets. On SQL Server 2008 and 2012, it by default will send out keep-alive packets to the clients every 30 seconds (30,000 ms). If the SQL server fails to receive a response back from the App server, it will send out 5 retries (default setting) of the keep-alive request 1 second (1,000 ms) apart. If SQL still does not receive the response back, it will terminate the connection, which will cause the BizTalk hosts on the App server to reset, and in our case, when our German-made ERP system sends its EDI documents over to BizTalk during that reset period, the transmission will fail.
We trapped the issue by running NetMon on the DB and App servers, waiting for the next error message. Upon inspection, we see the five SQL keep-alive packets being sent to the App servers 1 second apart, and at the same time there were NO packets at all received on the Application server. At first guess, one might think they were "just dropped network packets", which is rarely the case. We then made the correlation to the timing of the VM Snapshots, and now confirm each time the snapshot finishes each day, the App servers freeze.
As a Short-to-mid-term workaround, we raised the number of retries SQL attempts before declaring a connection dead, (5 by default), by adding the registry value TcpMaxDataRetransmissions and setting it to 30 (thus 30 seconds before SQL declares the client unresponsive). This has masked the problem for now for us, and use at your own discretion.
We are also looking at an Agent-based version of the VM Snapshot, which may alleviate the condition of freezing the server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
Not that I am aware of, however you might want to Google config options in the btsntsvc.exe.config file which is located in your BizTalk installation directory.
All messages that pass through BizTalk are written to the BizTalkMsgBoxDb and its other databases are involved if you are running tracking, BAM etc. The only service that can cache 'stuff' and handle a database outage is the Enterprise Single Sign-On (ESSO) Service. BizTalk therefore needs a persistent connection to the database server to remain 'up', hence why your Host Instance (BizTalkServerApplication) is stopping - it simply wouldn't be able to process messages if the database wasn't there.
I would add that your approach to back-ups probably isn't supported by Microsoft and I would further suggest that you seriously consider whether an approach that takes your database server offline during the backup is viable?
BizTalk has a pretty robust backup solution for its various databases built into the product, and I would recommend that you take a look at using this supported method.
If you do need to take snapshots of the database system - say once a night - you might want to consider stopping the BizTalk Host Instances, performing the snapshot, and then re-starting the Host Instances through some scripted task.
You might also want to consider checking whether there are any hotfixes for your version of BizTalk Server included in a Cumulative Update that might help address your problem.

Predis - Removing server from the connection pool

Say, I am having N servers in the predis connection pool. I found that when one of the server does down, predis does not work(i.e. new predis/client(s1,s2,...) does not return successfully if any of the server Si is down). First, the entry of that failed server needs to be removed manually, and only after this predis resumes its work.
Since, predis claims to be using consistent hashing, shouldn't this be the case that predis automatically detects which of the server is not responding ( & has failed), and distribute the keys stored on the failed server to the other working servers?
Predis does use consistent hashing, but it is left to you to ensure that all servers in the pool are up and responding. Monitoring server availability is not intrinsically implied with consistent hashing.
You can check each server before you attempt to connect, and modify your connection pool based on your checks. You can store the available server list for the pool elsewhere and have some other process constantly watching and modifying the available server list. You could just assume they are always all up, and only check which ones need to be removed on a failure, or you can use any combination of the above. The bottom line is that predis does not, at the moment, do it for you.