We have a number of web apps running on IIS 6 in a cluster of machines. One of those machines is also the state server for the cluster. We do not use sticky IPs.
When we need to take down the state server machine, the entire cluster has to be offline for a few minutes while the state server is switched from one machine to another.
Is there a way to switch a state server from one machine to another with zero downtime?
You could use Velocity, which is a distributed caching technology from Microsoft. You would install the cache on two or more servers. Then you would configure your web app to store session data in the Velocity cache. If you needed to reboot one of your servers, the entire state for your cluster would still be available.
You could use the SQL server option to store state. I've used this in the past and it works well as long as the ASPState table it creates is in memory. I don't know how well it would scale as an on-disk table.
If SQL server is not an option for whatever reason, you could use your load balancer to create a virtual IP for your state server and point it at the new state server when you need to change. There'd be no downtime, but people who are on your site at the time would lose their session state. I don't know what you're using for load balancing, so I don't know how difficult this would be in your environment.
I am dealing with the infrastructure for a new project. It is a standard Laravel stack: PHP, an SQL database, and Nginx. For the PHP + Nginx part we are using a Kubernetes cluster, so scaling and blue/green deployments are taken care of.
When it comes to the database I am a bit unsure. We don't want to run SQL in Kubernetes, so the current idea is to go with the Google Cloud SQL managed service (are the competitors better for blue/green deployment of SQL?). The question is whether it can sync the data between the old and new versions of the database nodes.
Let's say that we have 3 active Pods and at least 2 active database nodes (and a load balancer).
So the standard deployment should look like this:
A Pod with the new code is created.
A new database node is created with the current data.
The new Pod gets new environment variables to connect to the new database.
Database migrations are run on the new database node.
A health check for the new Pod is run; if it passes, the Pod starts to receive traffic.
One of the old Pods is taken offline.
This iteration repeats until all of the Pods and database nodes are replaced.
The question is: can this work with the database? Imagine a user on the website writes some data through the last OLD database node; when they are switched to a NEW database node, that data simply isn't there until the last database node is upgraded. Can the nodes be synced behind the scenes? Does the Google Cloud SQL managed service provide that?
Or is there a completely different and better solution to this problem?
Thank you!
I'm not 100% sure this is what you are looking for, but to my understanding, Cloud SQL replicas would be a better solution. You can have read replicas [1], which are copies of the master instance and come with different options [2]:
A read replica is a copy of the master that reflects changes to the master instance in almost real time. You create a replica to offload read requests or analytics traffic from the master. You can create multiple read replicas for a single master instance.
or a failover replica [3], so that if the master goes down, the data continues to be available there:
If an instance configured for high availability experiences an outage or becomes unresponsive, Cloud SQL automatically fails over to the failover replica, and your data continues to be available to clients. This is called a failover.
You can combine those if you need.
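To make the read-replica idea concrete, here is a minimal Java/JDBC sketch of routing writes to the master and reads to a replica. The hostnames, credentials, the visits table, and the MySQL driver on the classpath are all placeholders for illustration, not anything Cloud SQL prescribes:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch: writes go to the Cloud SQL master, read-only queries go to
// a read replica. Hostnames, database name and credentials are placeholders.
public class ReplicaRouting {
    private static final String MASTER_URL  = "jdbc:mysql://10.0.0.10:3306/app";
    private static final String REPLICA_URL = "jdbc:mysql://10.0.0.11:3306/app";
    private static final String USER = "app";
    private static final String PASS = "secret";

    public static void main(String[] args) throws SQLException {
        // Writes always target the master instance.
        try (Connection master = DriverManager.getConnection(MASTER_URL, USER, PASS);
             Statement st = master.createStatement()) {
            st.executeUpdate("INSERT INTO visits (page) VALUES ('home')");
        }

        // Read-only / reporting queries can be offloaded to the replica.
        try (Connection replica = DriverManager.getConnection(REPLICA_URL, USER, PASS);
             Statement st = replica.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM visits")) {
            while (rs.next()) {
                System.out.println("total visits: " + rs.getLong(1));
            }
        }
    }
}
```

Keep in mind that replication to a read replica is asynchronous, so a read from the replica can briefly lag a write to the master; queries that must see their own writes should stay on the master.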
We are trying to prevent our application startup from just spinning if we cannot reach the remote cluster. From what I've read, Force Server Mode states:
In this case, discovery will happen as if all the nodes in topology
were server nodes.
What I want to know is:
Does this client then permanently act as a server, which would run computes and store cache data?
If a connection to the cluster does not happen at first, could a later connection to an established cluster cause issues with consistency? What would be the expected behavior with a topology version mismatch? Is there potential for a split-brain scenario?
No, it's still a client node, but it behaves as a server at the discovery protocol level. For example, it can start without any server nodes running.
A client node can never cause data inconsistency, as it never stores the data. This does not depend on the forceServerMode flag.
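For reference, a minimal Java sketch of the setup being discussed; the discovery addresses are placeholders. The relevant pieces are setClientMode(true) on the configuration and setForceServerMode(true) on the TCP discovery SPI:

```java
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ClientStartup {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // The node joins as a client: it does not store cache data or act as
        // a primary/backup for partitions the way server nodes do.
        cfg.setClientMode(true);

        // Force server mode on the discovery SPI only, so the client can
        // start even if no server nodes are reachable yet.
        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setForceServerMode(true);

        // Placeholder addresses of the remote cluster.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList("10.0.0.1:47500..47509",
                                            "10.0.0.2:47500..47509"));
        discovery.setIpFinder(ipFinder);

        cfg.setDiscoverySpi(discovery);

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Started client node: " + ignite.cluster().localNode().id());
    }
}
```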
The Bluemix documentation leads a reader to believe that the only persistent storage for a virtual server is Bluemix Block Storage. It also leads you to believe that the virtual server's own storage will not persist over restarts or failures. In practice, however, this doesn't seem to be the case, at least as far as restarts are concerned. We haven't suffered any virtual server outages yet.
So we want a clearer understanding of the rationale for separating the virtual server's own storage from its attached Block Storage.
Use case: I am moving our Git server and a couple of small LAMP-based assets to a Bluemix Virtual Server as we simultaneously develop new mobile apps using Cloud Foundry. In our case, we don't anticipate scaling up the work that the virtual server does any time soon. We just want a reliable new home for an existing website.
Even if you separate application files and databases out into block storage, re-provisioning the virtual server in the event of its loss is not trivial, even when the provisioning is automated with Ansible or the like. So we are not expecting to have to regularly re-provision the non-persistent storage of a Bluemix Virtual Server.
The Bluemix doc you reference is a bit misleading and is being corrected. The virtual server's storage on local disk does persist across restart, reboot, suspend/resume, and VM failure. If that were not the case, the OS image would be lost during any such event.
One of the key advantages of storing application data in a block storage volume is that the data will persist beyond the VM's lifecycle. That is, even if the VM is deleted, the block storage volume can be left intact to persist data. As you mentioned, block storage volumes are often used to back DB servers so that the user data is isolated, which lends itself well to providing a higher class of storage specifically for application data, backup, recovery, etc.
In use cases where VM migration is desired the VMs can be set up to boot from a block storage volume, which enables one to more easily move the VM to a different hypervisor and simply point to the same block storage boot volume.
Based on your use case description you should be fine using VM local storage.
So, we have a standalone Graphite node which sits behind a CNAME and collects all the metrics. If this goes down, it's not going to be good. So, my question is how do I not only replicate all the existing Whisper data to another node, but also set up replication using carbon-relay? What would the migration workflow look like, in short? How should I configure the carbon relay? I want to do this in as transparent a way as possible, with minimum downtime.
The migration will require a minimal, unavoidable amount of downtime.
I would proceed as follows:
decrease the TTL of your CNAME on your DNS server (it will be a lower bound on the duration of your downtime)
prepare a second server with aggregators, a cache, and a relay pointing to both boxes (replication factor 2), but leave the relay stopped
stop Graphite on your first server (downtime starts now)
change the CNAME to point to the second server
archive all metrics on the first server, copy them to the second, and extract them there
start the relay (end of downtime)
at the time of the DNS change + TTL, all your clients will have moved to the second relay and all data is written to both servers
You can then start to make your setup more reliable with a relay on the first server as well (sharing a virtual IP, for instance).
In our setup we have separated the relay servers (2 of them, active/active) from the aggregator+cache servers.
Redundancy in Graphite is tricky, however: when a server is down it won't fetch the missed updates once it's back up, so you'll have to backfill them manually.
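Not part of the migration itself, but once the relay is replicating with factor 2 it can be handy to push a test datapoint and confirm it shows up on both boxes. A small Java sketch that writes the same metric to the plaintext listener on each server; the hostnames are placeholders and port 2003 is assumed to be the default carbon-cache plaintext port:

```java
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sends one datapoint in Carbon's plaintext format ("path value timestamp\n")
// to both servers, so you can check that a test metric lands on each box.
public class GraphiteSanityCheck {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"graphite-old.example.com", "graphite-new.example.com"};
        long now = System.currentTimeMillis() / 1000L;
        String line = "test.migration.heartbeat 1 " + now + "\n";

        for (String host : hosts) {
            try (Socket socket = new Socket(host, 2003);
                 Writer out = new OutputStreamWriter(socket.getOutputStream(),
                                                     StandardCharsets.UTF_8)) {
                out.write(line);
                out.flush();
                System.out.println("sent to " + host + ": " + line.trim());
            }
        }
    }
}
```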
Currently I have the following setup:
A hardware load balancer directs traffic to two physical servers, each running 2 instances of WebLogic.
This works OK. I'd like to be able to shut down one of the servers without dropping active sessions. Right now, if I shut down one of the physical servers, any traffic that was going there gets bounced back to a login screen.
I'm looking for the simplest way of accomplishing this with the smallest performance hit.
Things I've considered so far:
1. See if I can somehow store the session information on the load balancer and, through some load balancer magic, have it notice a server is dead and try another one with the same session information (not sure this is possible)
2. Configure WebLogic clustering. Not sure what the performance hit would be. I'm guessing this is what I'll end up with, but I'm still fishing for alternatives.
3. ?
What I currently have is an overly designed DR solution (which was the requirement), but I'd like to move it more in the direction of HA (for the flexibility).
Edit: Also, is it worthwhile to create 2 clusters and replicate the sessions between them (I was thinking one cluster per site; the sites are close enough)? This would cover the event of one cluster failing.
You could try setting up JDBC session storage, pointing (of course) both instances at the same datasource without setting up a cluster, but I think the right approach would be setting up a WebLogic cluster.
A nice thing about clustering WebLogic Servers (from the link above, emphasis mine):
Sessions can be shared across clustered WebLogic Servers. Note that session persistence is no longer a requirement in a WebLogic Cluster. Instead, you can use in-memory replication of state. For more information, see Using WebLogic Server Clusters.
We've got a write-up of this on our blog, http://blog.c2b2.co.uk/2012/10/basic-clustering-with-weblogic-12c-and.html, which provides step-by-step instructions on setting up web session failover in a cluster.
Clusters are not heavyweight, assuming you don't store huge amounts of data in the cluster, as it will be replicated.
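One practical point when relying on in-memory replication: every object placed in the HTTP session must be serializable, or that attribute will not survive failover to another managed server. A minimal servlet sketch (the class and attribute names are made up for illustration):

```java
import java.io.IOException;
import java.io.Serializable;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// For in-memory session replication to work across the cluster, every object
// stored in the session must be serializable; otherwise that attribute is
// lost when the request fails over to another managed server.
public class LoginServlet extends HttpServlet {

    // Hypothetical session payload; implementing Serializable is the key part.
    static class UserInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String username;
        UserInfo(String username) { this.username = username; }
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        HttpSession session = req.getSession(true);
        session.setAttribute("userInfo", new UserInfo(req.getParameter("username")));
        resp.getWriter().println("logged in");
    }
}
```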