How to replicate an existing Graphite server to a new one using carbon-relay (replication)

So, we have a standalone Graphite node which is behind a CNAME and collecting all the metrics. If this goes down, it's not going to be good. So my question is: how do I not only replicate all the existing Whisper data to another node, but also set up replication using carbon-relay? What would the migration workflow look like, in short? How should I configure the carbon relay? I want to do this in as transparent a way as possible, with minimum downtime.

The migration will require a small amount of unavoidable downtime.
I would proceed by:
decrease the TTL of your CNAME in your DNS server (it will be a lower bound on the duration of your downtime)
prepare a second server with aggregators, cache and a relay pointing to both boxes (replication factor 2), but keep the relay stopped (see the config sketch after this list)
stop Graphite on your first server (downtime starts now)
change the CNAME to point to the second server
archive all metrics on the first server, copy the archive to the second, and extract it there
start the relay (end of downtime)
at the time of the DNS change + TTL, all your clients will have moved to the second relay, and all data is written to both servers
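For the relay configuration itself, a minimal sketch of the [relay] section of carbon.conf on the second server could look like this (host names are placeholders; with consistent hashing and a replication factor of 2 over two destinations, every metric is sent to both boxes):

    [relay]
    RELAY_METHOD = consistent-hashing
    REPLICATION_FACTOR = 2
    # placeholder host names; 2004 is the default pickle receiver port of each carbon-cache
    DESTINATIONS = graphite-new:2004:a, graphite-old:2004:a

Adjust the ports and instance names to match your carbon-cache configuration.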
You can then start to make your setup more reliable with a relay on the first server as well (sharing a virtual IP, for instance).
In our setup we have separated the relay servers (two of them, active/active) from the aggregator+cache servers.
Redundancy in Graphite is tricky, however: when a server is down it won't fetch the updates it missed once it is back up, so you'll have to backfill them manually.

Related

Ignite TimeServer sync behind NAT

We're currently running Apache Ignite in a Docker container but are having problems with the time server sync. Each node reports all of its known IP addresses, which are later used by a remote peer to send a time sync message over UDP.
Is there a way to specify the externally reachable IP address that the peers will use for time sync?
It's possible to set, for every node, the network interface it should use for all network-related communication via the IgniteConfiguration.setLocalHost(...) method. The time server will use the addresses specified this way as well.
However, it's not critical if the time server doesn't work on your side, because it is only used for the cache CLOCK mode, whose usage is discouraged.
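If you still want to pin the address explicitly, a minimal sketch of the setLocalHost(...) approach could look like this (the IP shown is just a placeholder for your externally reachable address):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class IgniteLocalHostExample {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Bind all network communication (the time server included) to the
            // externally reachable address of this node. "203.0.113.10" is a
            // placeholder - put the NAT'ed, externally reachable IP here.
            cfg.setLocalHost("203.0.113.10");

            Ignite ignite = Ignition.start(cfg);
            System.out.println("Started node " + ignite.cluster().localNode().id());
        }
    }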

Detect cluster node failure in JBoss AS 7.1.1-Final

I have configured a 2-node cluster in JBoss AS 7.1.1-Final. I am planning to use sticky sessions. Meanwhile, I am also recording the number of active online users in an Infinispan cache, together with the IP of the node where each user session was created, for reporting purposes.
I have taken care of the login/logout scenarios, where I clear out the cache entries. The problem is that if one of the server nodes goes down, I also need a cleanup routine to clear that node's records from the cache.
One option is to write a client that checks at a regular interval whether the server is alive and, if not, triggers a cleanup routine. This approach would work, but I am looking for a cleaner one: if I could detect a server node failure and have it reported to the other live nodes, I could run the cleanup then.
From the console I know that it shows when a server goes down or comes up. But what would be the listener to listen to such events? Any thoughts?
If you just need to know when a node leaves from within a server module (inside the JBoss server), you can use the ViewChanged listener.
You cannot get this information on clients connected via the REST or memcached protocols. With the HotRod protocol it is doable but pretty hackish: you'd have to override TransportFactory.updateServers (probably just extend TcpTransportFactory; see the configuration property infinispan.client.hotrod.transport_factory).
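For the in-server approach, a rough sketch of a ViewChanged listener could look like the following (the cleanup method is a placeholder for your own routine):

    import org.infinispan.notifications.Listener;
    import org.infinispan.notifications.cachemanagerlistener.annotation.ViewChanged;
    import org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent;
    import org.infinispan.remoting.transport.Address;

    @Listener
    public class NodeFailureListener {

        @ViewChanged
        public void onViewChanged(ViewChangedEvent event) {
            // Nodes present in the old view but missing from the new one have left or crashed.
            for (Address node : event.getOldMembers()) {
                if (!event.getNewMembers().contains(node)) {
                    cleanUpEntriesFor(node); // placeholder for your own cleanup routine
                }
            }
        }

        private void cleanUpEntriesFor(Address node) {
            // e.g. remove the cache records created for sessions on this node
        }
    }

    // Registration on each node, given an EmbeddedCacheManager:
    // cacheManager.addListener(new NodeFailureListener());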

Redis clients broadcast problems (in the context of Socket.IO)

So I've read some articles about scaling Socket.IO. For various reasons I don't want to use the built-in Socket.IO scaling mechanism (mostly because it seems inefficient, since it publishes a lot more to Redis than is required, from my point of view).
So I came up with this simple idea:
Each Socket.IO server creates Redis pub/sub/store clients, connects to Redis and subscribes to a channel. Now, when I want to broadcast data I just publish it to Redis and all other Socket.IO servers get it and push it to users.
There is a problem, though (which I think is also a problem for Socket.IO built-in mechanism). Let's say I want to know the number of all connected users. There are at least two ways of doing that:
Server A publishes give_me_clients to Redis. Then each Socket.IO server counts connections and publishes number_of_clients. Server A grabs this data, combines it and sends it to the client.
Each server updates number_of_clients_for::ID_HERE in Redis whenever a user connects to or disconnects from that server. Then Server A just fetches the data and combines it. This might be more efficient.
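To illustrate the second option, here is a rough sketch of the pattern using the Java Jedis client (the key prefix, the host and the TTL-based expiry are my own assumptions, added so that counters of crashed servers eventually disappear):

    import java.util.Set;
    import redis.clients.jedis.Jedis;

    public class ClientCounter {
        private static final String PREFIX = "number_of_clients_for::"; // assumed key prefix
        private final Jedis jedis = new Jedis("redis-host", 6379);      // placeholder host
        private final String key;

        public ClientCounter(String serverId) {
            this.key = PREFIX + serverId;
        }

        // Call these from the connect/disconnect handlers of this server.
        // The 60 s expiry is an assumption: it lets the counter of a crashed server
        // disappear on its own, as long as live servers keep touching their key.
        public void onConnect()    { jedis.incr(key); jedis.expire(key, 60); }
        public void onDisconnect() { jedis.decr(key); jedis.expire(key, 60); }

        // Server A: sum the counters of all servers (KEYS is fine for a handful of servers).
        public long totalClients() {
            long total = 0;
            Set<String> keys = jedis.keys(PREFIX + "*");
            for (String k : keys) {
                String value = jedis.get(k);
                if (value != null) {
                    total += Long.parseLong(value);
                }
            }
            return total;
        }
    }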
There are problems with these solutions though:
Server A is not aware of the other servers. Therefore it does not know when it should stop listening for number_of_clients. One could fix this by making Server A aware of the other servers: whenever a server connects to Redis, it publishes new_server (Server A grabs the data and stores it in memory). But what to do when the Redis - Socket.IO connection breaks? Is there a way for Redis to notify clients that one of the clients has disconnected?
Actually the same as above: when a Socket.IO server crashes, how do you clear its number_of_clients data?
So the real question is: can Redis notify clients (publish to a channel) that the connection with one of them has just ended?
After a lot of testing it seems that Redis does not have such functionality. I've also found that scaling Socket.IO is really a pain.
So I switched from Socket.IO to WS (see this link). It is low level (but perfect for my use) and it only supports WebSockets (in all major versions). But then again, I only want to support WebSockets and FlashSocket (which I have to implement manually, but that's fine).
The advantage is that I can easily create a cluster of such servers. HAProxy works with them almost out of the box (only minor tuning needed). The servers can easily communicate on a local network (with UDP, or a central TCP server if the cluster is big).
The disadvantage is that you have to implement some nice features manually, like heartbeats, broadcasting, rooms, etc. Also, you won't have a long-polling fallback, but that's fine in my case. Scaling is still more important, imho.

Web App: High Availability / How to prevent a single point of failure?

Can someone explain to me how high availability ("HA") works for a web application? I assume HA means that there is no single point of failure.
However, even if a load balancer is used, isn't that itself a single point of failure?
I have found this article on the subject:
http://www.tenereillo.com/GSLBPageOfShame.htm
Basically, if you do not require long-lasting sticky sessions, you can configure your DNS servers to return multiple A records (IP addresses) for your website.
Web browsers are smart enough to try all the addresses until they find one that works.
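For example, a zone file fragment returning two A records for the same name could look like this (addresses are placeholders):

    www.example.com.    300    IN    A    203.0.113.10
    www.example.com.    300    IN    A    198.51.100.20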
In simple words, high availability can be defined as running a system 24x7 without downtime even if there are hardware and software failures; in other words, a fault-tolerant application. This helps ensure uninterrupted use of the application for its intended users.
Read more on High Availability Deployment Architecture
It works the following way: you set up two HAProxy servers with heartbeat, so when one fails (stops responding to queries), it is removed from the cluster.
Requests from HAProxy can be forwarded to the web servers in round-robin fashion, and if one web server fails, the HAProxy servers do not try to contact it until it is alive again.
The web servers store all dynamic information in a database, which is replicated across two MySQL instances.
As you can see, HAProxy and clustered MySQL (or simply MySQL replication), as well as IP clustering, are the key here.
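A minimal sketch of the HAProxy part of such a setup could look like this (names, addresses and the health-check URL are placeholders; the virtual IP itself would be moved between the two HAProxy boxes by heartbeat or keepalived):

    frontend www
        bind 192.0.2.10:80            # the virtual IP shared by the two HAProxy nodes
        default_backend webfarm

    backend webfarm
        balance roundrobin            # round-robin across the web servers
        option httpchk GET /health    # placeholder health-check URL
        server web1 10.0.0.11:80 check
        server web2 10.0.0.12:80 check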
Sure it is, when operated alone. A usual highly available setup includes 2 or more load balancers running in a cluster in either active/active or active/passive configuration. To further increase availability you can have 2 different Internet Service Providers (or geo-distributed datacenters), each running a pair of clustered load balancers. Then you configure a DNS A record resolving to 2 distinct public IP addresses, which guarantees round-robin processing that splits DNS requests evenly (CloudFlare is very fast and reliable at this). There is also the possibility of returning the IP address of the datacenter closest to the originating geo location by using something like PowerDNS dnsdist.
This is what big players do to make their services highly available.
Please read https://docs.oracle.com/cd/E23824_01/html/821-1453/gkkky.html for more clarity. Actually, both load balancers use the same VIP (virtual IP address, https://techterms.com/definition/vip).
HA architecture is an entire field, and multiple books have been written on it, so it is hard to answer in a short paragraph.
To sum up the ideal situation: you would be using multiple servers, interconnected through a layer of multiple load balancers. The nodes and LBs would be located in a few different data centers and connected to different network backbones, ideally with the data centers spread all over the world.
In short, every component will have redundancy, including the load balancers.
For a starting point, see Wikipedia for High Availability Cluster

Switching state server to another machine in cluster

We have a number of web apps running on IIS 6 in a cluster of machines. One of those machines is also the state server for the cluster. We do not use sticky IPs.
When we need to take down the state server machine, the entire cluster has to be offline for a few minutes while the state server is switched from one machine to another.
Is there a way to switch a state server from one machine to another with zero downtime?
You could use Velocity, which is a distributed caching technology from Microsoft. You would install the cache on two or more servers. Then you would configure your web app to store session data in the Velocity cache. If you needed to reboot one of your servers, the entire state for your cluster would still be available.
You could use the SQL Server option to store state. I've used this in the past and it works well as long as the ASPState table it creates is kept in memory. I don't know how well it would scale as an on-disk table.
If SQL Server is not an option for whatever reason, you could use your load balancer to create a virtual IP for your state server and point it at the new state server when you need to switch. There'd be no downtime, but people who are on your site at the time would lose their session state. I don't know what you're using for load balancing, so I don't know how difficult this would be in your environment.
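To illustrate the virtual IP idea: the web apps keep pointing at a stable name and the load balancer decides which physical box answers. A sketch of the relevant web.config fragment could look like this (the host name is a placeholder, 42424 being the default state server port):

    <system.web>
      <!-- "state-vip" is a placeholder for the virtual IP / DNS name fronting the state server -->
      <sessionState mode="StateServer"
                    stateConnectionString="tcpip=state-vip:42424"
                    timeout="20" />
    </system.web>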