Can anybody suggest how to update a web service / WCF service that is holding live transactions, without any downtime?
Please don't suggest updating at off-peak hours, because our service runs 24x7 and transactions may run for any length of time.
So what is the best way to update such a service so that current transactions are not affected?
I use load balancing to prevent errors and keep the site up during updates. For this to work you need at least two servers behind a load balancer that sends traffic to whichever server is up. My procedure for updating my sites is:
1. Tell "Server A" to start serving an error page at the URL that the load balancer pings. This tells the load balancer to stop sending traffic to this server.
2. Wait 30 seconds or so until traffic stops hitting the server.
3. Update the code on this server.
4. Tell the server to stop serving an error page at the ping URL.
5. Wait until that server is getting traffic again.
6. Repeat steps 1-5 with "Server B".
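As a rough sketch of the procedure above (not the exact tooling I use), the loop could look like this in Python. The callables `set_health_check`, `has_traffic` and `deploy` are hypothetical placeholders for whatever your load balancer and deployment tools provide.

```python
import time
from typing import Callable, Iterable


def rolling_update(
    servers: Iterable[str],
    set_health_check: Callable[[str, bool], None],  # hypothetical: toggle the ping URL
    has_traffic: Callable[[str], bool],             # hypothetical: is the server being hit?
    deploy: Callable[[str], None],                  # hypothetical: update the code
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Update servers one at a time so the others keep serving traffic."""
    for server in servers:
        # Serve an error page at the ping URL so the balancer stops routing here.
        set_health_check(server, False)
        # Wait until traffic has stopped hitting this server.
        while has_traffic(server):
            sleep(poll_seconds)
        # Update the code while the server is idle.
        deploy(server)
        # Stop serving the error page at the ping URL.
        set_health_check(server, True)
        # Wait until the balancer sends traffic here again before moving on.
        while not has_traffic(server):
            sleep(poll_seconds)
```

Note that this only drains a server by waiting for traffic to stop, so a transaction that runs longer than the drain wait still needs `has_traffic` to report it.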
From my experience: I use a Git server and a local machine to edit the code and make sure it all works, then push to the server and pull on the live site, and it has zero downtime.
With exactly 0 seconds of downtime, it is not possible.
The least you can do is update the web services and test them thoroughly on, say, a staging server. Verify that the updates work perfectly.
Then update the services part (an xyz.war file in my case) and restart the server for the change to take effect.
If there are any DB or other changes involved in this process, I make a script for them and test it on the staging server before going live.
Just one suggestion: take a proper backup of the current live server in case something goes wrong.
Related
Dears,
I am facing a problem with my WebLogic 12c server. I have 3 managed nodes and one admin server, and I have configured an Oracle datasource. Whenever there is a network glitch or database downtime, the WebLogic server also goes down completely. Is there any configuration I can make so that the server holds on until everything comes back?
The servers try to restart automatically, but they stay in the STARTING state. I then have to start all the servers again, which takes time. Your suggestions will help me a lot. Thanks in advance.
If you don't want your servers to restart automatically, the server failure action can probably be tweaked:
{managed_server} -> Configuration -> Overload -> Failure Action
However, you should probably first check whether you are missing a connection timeout setting on your datasource.
For a benchmarking test, I have a very basic setup in which a single user loops 100 times (loop delay 100 ms), hitting an HTTPS endpoint (GET) with the HttpClient4 implementation; keep-alive is turned on.
In the test results, I observed a pattern in which the connect metric is higher on every 5th or 6th request, as if a full SSL handshake were occurring (see the image below). I am a bit confused by this. Any ideas on what's going on here and why the connect times are higher every nth request?
[UPDATE]
I was able to troubleshoot this issue a bit further today after turning on access logs on the load balancer (target of this test) and I can see a pattern wherein JMeter seems to be switching the ports on the client side every few requests - the frequency matches the pattern observed previously with the JMeter test results.
This probably explains the elevated connect times. Now the question is: why does JMeter switch ports?
This could be keep-alive; it certainly was in my case. First, make sure it's enabled on the sampler. There is also this JMeter setting, which says how long to keep connections alive:
httpclient4.time_to_live
I set it to 120000 in jmeter.properties, although according to the docs the user.properties file should be used. I do know that jmeter.properties with a setting of 120000 worked for me.
I set the value high to see whether HTTP keep-alive was causing the port switch. Whatever you set it to, you need to ensure the client you are emulating does the same.
Since you get some quick results, I would guess it is a short timer somewhere rather than the server side not allowing keep-alive at all. Wireshark can help you pinpoint this, as it could be the server side resetting the connection after a certain time. The config above extends the client-side time, which may get you the information you need; if not, have a look at the server-side equivalent, which will vary depending on what serves the endpoint.
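To illustrate why a short connection lifetime would show up as a slow connect on every nth request, here is a toy model in Python. It is only a sketch: the function name, the fixed request interval and the 500 ms lifetime in the usage note are assumptions for illustration, not values taken from JMeter or from your load balancer.

```python
def requests_needing_connect(n_requests, interval_ms, ttl_ms):
    """Return the 1-based indices of requests that open a new connection.

    Toy model: one request starts every interval_ms, and a connection is
    reused only while its age (time since it was opened) is below ttl_ms.
    """
    reconnects = []
    opened_at = None  # start time of the current connection, if any
    for i in range(n_requests):
        now = i * interval_ms
        if opened_at is None or now - opened_at >= ttl_ms:
            # Connection missing or too old: open a new one (new client port,
            # new TLS handshake, hence the higher connect time).
            opened_at = now
            reconnects.append(i + 1)
    return reconnects
```

With `interval_ms=100` and an assumed `ttl_ms=500`, `requests_needing_connect(12, 100, 500)` gives `[1, 6, 11]`, while a lifetime of 120000 gives just `[1]`. In a real test the interval is the loop delay plus the response time, so the period will not be this regular.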
I have a web server implemented using .NET MVC 4. Clients connect to this web server, perform some operations and upload live logs to the server using the WebClient.UploadString method. The logs are sent from client to server in chunks of 2500 characters at a time.
Things work fine as long as only 2-3 clients upload logs. However, when more than 3 clients try to upload logs simultaneously, they start receiving "http 500 internal server error".
I might have to scale up and add more slaves, but that will make the situation worse.
I want to implement Jenkins-like live logging, where logs from the slaves are updated live.
Please suggest a better, more scalable solution to this problem.
Have you considered looking into SignalR?
It can be used for anything from instant messaging to stocks! I have implemented both a chatbox and a custom system that sends off messages, does calculations and then passes the results back down to the client. It is very reliable, there are some nice tutorials, and I think it's awesome.
We have one VM for BizTalk and a separate VM for the SQL backend. We are using Veeam for backups which basically kicks off a snapshot of the VM. When this snapshot is being finalized on the SQL VM, BizTalk services on the application server fail. Usually they restart automatically but sometimes this requires manual intervention to start the services. The error below is logged on the BizTalk server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
An error occurred that requires the BizTalk service to terminate. The most common causes are the following:
1) An unexpected out of memory error.
OR
2) An inability to connect or a loss of connectivity to one of the BizTalk databases.
The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.
Error message: [DBNETLIB][ConnectionRead (recv()).]General network error. Check your network documentation.
Error source:
BizTalk host name: BizTalkServerApplication
Windows service name: BTSSvc$BizTalkServerApplication
We experienced the same situation and error with both BizTalk 2009 and BizTalk 2013, each set up with two App servers and one SQL DB server.
When our VMware does the final step of the snapshot backup on the application servers, it freezes the application server for about 10 seconds, preventing it from receiving packets. By default, SQL Server 2008 and 2012 send keep-alive packets to clients every 30 seconds (30,000 ms). If SQL Server fails to receive a response from the app server, it retries the keep-alive request 5 times (the default setting), 1 second (1,000 ms) apart. If it still receives no response, it terminates the connection, which causes the BizTalk hosts on the app server to reset. In our case, when our German-made ERP system sends its EDI documents to BizTalk during that reset period, the transmission fails.
We trapped the issue by running NetMon on the DB and app servers and waiting for the next error message. On inspection, we saw the five SQL keep-alive packets being sent to the app servers 1 second apart, while at the same time NO packets at all were received on the application server. At first guess, one might think these were "just dropped network packets", which is rarely the case. We then correlated this with the timing of the VM snapshots, and have now confirmed that each time the snapshot finishes each day, the app servers freeze.
As a short-to-mid-term workaround, we raised the number of retries SQL attempts before declaring a connection dead (5 by default) by adding the registry value TcpMaxDataRetransmissions and setting it to 30 (thus 30 seconds before SQL declares the client unresponsive). This has masked the problem for us for now; use it at your own discretion.
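The arithmetic behind that workaround can be sketched in a few lines of Python. This is a simplified worst-case model (the freeze starts just as the first keep-alive goes unanswered), and the function name is made up for illustration; only the numbers 5, 30, 1 second and roughly 10 seconds come from our setup.

```python
def survives_freeze(freeze_s, retries, retry_interval_s=1.0):
    """Return True if the retry window outlasts a client freeze.

    Simplified worst case: the client stops answering exactly when the
    keep-alive retries begin, so the connection is dropped once
    retries * retry_interval_s seconds pass without a response.
    """
    retry_window_s = retries * retry_interval_s
    return freeze_s < retry_window_s


# Default of 5 retries, 1 s apart: a ~10 s snapshot freeze is too long.
print(survives_freeze(10, retries=5))   # False
# TcpMaxDataRetransmissions raised to 30: the 30 s window covers the freeze.
print(survives_freeze(10, retries=30))  # True
```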
We are also looking at an Agent-based version of the VM Snapshot, which may alleviate the condition of freezing the server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
Not that I am aware of, however you might want to Google config options in the btsntsvc.exe.config file which is located in your BizTalk installation directory.
All messages that pass through BizTalk are written to the BizTalkMsgBoxDb, and its other databases are involved if you are running tracking, BAM etc. The only service that can cache 'stuff' and handle a database outage is the Enterprise Single Sign-On (ESSO) Service. BizTalk therefore needs a persistent connection to the database server to remain 'up', which is why your Host Instance (BizTalkServerApplication) is stopping: it simply wouldn't be able to process messages if the database wasn't there.
I would add that your approach to backups probably isn't supported by Microsoft, and I would further suggest that you seriously consider whether an approach that takes your database server offline during the backup is viable.
BizTalk has a pretty robust backup solution for its various databases built into the product, and I would recommend that you take a look at using this supported method.
If you do need to take snapshots of the database system - say once a night - you might want to consider stopping the BizTalk Host Instances, performing the snapshot, and then re-starting the Host Instances through some scripted task.
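A scripted task along those lines could be structured like the Python sketch below. The three callables are hypothetical stand-ins for whatever you use to stop and start Host Instances and to trigger the snapshot; the point is only the ordering and the guarantee that the hosts are restarted even if the snapshot fails.

```python
from typing import Callable, Iterable


def snapshot_with_hosts_stopped(
    host_instances: Iterable[str],
    stop_host: Callable[[str], None],    # hypothetical: stop one Host Instance
    start_host: Callable[[str], None],   # hypothetical: start one Host Instance
    take_snapshot: Callable[[], None],   # hypothetical: trigger the VM snapshot
) -> None:
    """Stop the given Host Instances, take the snapshot, then restart them."""
    stopped = []
    try:
        for name in host_instances:
            stop_host(name)
            stopped.append(name)
        take_snapshot()
    finally:
        # Restart whatever was stopped, in reverse order, even on failure.
        for name in reversed(stopped):
            start_host(name)
```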
You might also want to consider checking whether there are any hotfixes for your version of BizTalk Server included in a Cumulative Update that might help address your problem.
I have a question regarding web servers (such as nginx, Cherokee or Oracle iPlanet) and Java containers (such as GlassFish): can we control what happens if the user drops an unfinished connection?
When a browser opens an HTTP/HTTPS connection to a server, it hits the web server (nginx, Cherokee or Oracle iPlanet), which then reverse proxies to the Java container (GlassFish). The Java application then executes and does quite a lot of things, such as calculations, and finally needs to write to, say, 3 different databases. If it has finished writing to the 1st database, but not yet to the 2nd and 3rd, and the user closes the connection (by closing the browser window, losing the network connection, etc.), what will happen to the process?
Specifically, I would like the process to CONTINUE until it finishes executing all the code. I know one way is to spin the work off onto a new thread, but this will incur computation costs. So, are there any settings/config I can use to make sure it continues to execute even though the user has broken the connection?
With nginx, you can set proxy_ignore_client_abort on; and it will not close the connection to the backend if the client closes its connection.