When I try to create an AKS cluster with the portal, I'm facing the following problem:
This problem got "solved" when I scaled-down the cluster to just one node. But when I tried to browse the cluster, it wasn't possible. It was giving me timeouts. It seems that cluster was not health anyway.
Now, when I try to create a new cluster with just one node (A0), it's taking foreveeeeeer! Look:
I'm wondering if there's any special limitation for the Azure pass subscription. But it's weird because when I take a look at the subscription quota, I'm not even close to reaching the limit.
This is a new and clean subscription, I have nothing else there.
So the problem was that I was using an A0 instance type.
AKS cluster nodes need at least 3.5GB of memory, that's why it had errors or took so much time to be created (it never finished actually).
I end up using a Standard A2 v2 (2 vcpus, 4 GB memory)
Related
I use Ignite.Net and run ignite in my .net core app process.
My application receives some messages (5000 per second) and I put or remove some keys according to the messages received. The cache mode is replicated, with default Primary_Sync write mode.
Everything is good and I can process up to 20,000 messages/sec.
But when I run another ignite node on another machine, everything changes. Processing speed is reduced up to 1000 messages per second.
perhaps it's due to that some operations do on the network, but I want just put or remove keys on the local instance and replicate them (changed keys) to other nodes. Write mode is Primary_Sync and this means ignite must put or remove key on the local node (because all nodes are the same due to replicated mode and no need to distribute them on other nodes) and then replicate them to other nodes asynchronously.
Where is the problem?
Is the slowdown due to network operations?
Looking at the code (could not run it - requires messing with SQL server), I can provide the following recommendations:
Use DataStreamer. Always use streamer when adding/removing batches of data.
Try using multiple threads to load the data. Ignite APIs are thread-safe.
Maybe try CacheWriteSynchronizationMode.FullAsync
Together this should result in a noticeable speedup, no matter how many nodes.
I have 2 nodes, in which im trying to run 4 ignite servers, 2 on each node and 16 ignite clients, 8 on each node. I am using replicated cache mode. I could see the load on cluster is not distributed eventually to all servers.
My intension of having 2 servers per node is to split the load of 8 local clients to local servers and server can work in write behind to replicate the data across all servers.
But I could notice that only one server is taking the load, which is running at 200% cpu and other 3 servers are running at very less usage of around 20%cpu. How can I setup the cluster to eventually distribute the client loads across all servers. Thanks in advance.
I'm generating load by inserting same value 1Million times and trying to get the value using the same key
Here is your problem. Same key is always stored on the same Ignite node, according to Affinity Function (see https://apacheignite.readme.io/docs/data-grid), so only one node takes read and write load.
You should use a wide range of keys instead.
I would like to know what each strategy means and how they work behind the scenes (i.e., Highlander, Red/Black, Rolling Push). It would be very useful to have this information on the official website.
Thanks
There is useful information out there that can help you with your question, I'll do my best to summarize it below.
Type and Strategies of Deployments Introduction
"There are a variety of techniques to deploy new applications to
production, so choosing the right strategy is an important decision,
weighing the options in terms of the impact of change on the system,
and on the endusers."
Recreate: (also known as Highlander) Version A is terminated then version B is rolled out.
Ramped (also known as Rolling-Update or Incremental): Version B is slowly rolled out and replacing version A.
Blue/Green (also known as Red/Black): Version B is released alongside version A, then the traffic is switched to version B.
Canary: Version B is released to a subset of users, then proceed to a full rollout.
A/B Testing: Version B is released to a subset of users under specific condition.
Shadow: Version B receives real-world traffic alongside version A and doesn’t impact the response.
Type and Strategies of Deployments Summary Table
Ref link 1: https://thenewstack.io/deployment-strategies/
Spinnaker Deployment Strategies
Spinnaker treats cloud-native deployment strategies as first class constructs, handling the underlying orchestration such as verifying health checks, disabling old server groups and enabling new server groups.
Spinnaker supported deployment strategies (in active development):
Highlander
Red/Black (a.k.a. Blue/Green)
Rolling Red/Black
Canary
Illustrated in the Figure below as follows:
Highlander: This deployment strategy is aptly named after the film Highlander because of the famous line, "there can be only one." With this strategy, there is a load balancer fronting a single cluster. Highlander destroys the previous cluster after the deployment is completed. This is the simplest strategy, and it works well when rollback speed is unimportant or infrastructure costs need to be kept down.
Red/Black: This deployment strategy is also referred to as Blue/Green. The Red/Black strategy uses a load balancer and two target clusters / server groups (known as red/black or blue/green). The load balancer routes traffic to the active (enabled) cluster / server group. Then, a new deployment replaces servers (w/ K8s provider -> Replica Sets & Pods) in the disabled cluster / server group. When the newly enabled cluster / server group is ready, the load balancer routes traffic to this cluster and the previous cluster becomes disabled. The currently disabled cluster / server group (previously enabled cluster / server groups) is kept around by spinnaker in case a rollback is needed for the next X deployments (which is a configurable parameter).
Rolling Red/Black: is a slower red/black with more possible verification points. The process is the same as red/black, but difference is in how traffic switches over. The above image illustrates this difference. Blue is the enabled cluster. Blue instances are gradually replaced by new instances in the green cluster until all enabled instances are running the newest version. The rollout may occur in 20% increments, so it can be 80/20, 60/40, 40/60, 20/80, or 100%. Both blue/green clusters receive traffic until the rollout is complete.
Canary: deployments is a process in which a change is partially deployed, then tested against baseline metrics before continuing. This process reduces the risk that a change will cause problems when it has been completely rolled out by limiting your blast radius to a small percentage of your user-base. The baseline metrics are set when configuring the canary. Metrics may be error count or latency. Higher-than-baseline error counts or latency spikes kill the canary, and thus stop the pipeline.
Ref link 2: https://www.spinnaker.io/concepts/#deployment-strategies
Ref link 3: https://blog.armory.io/advanced-deployment-strategies-with-armory-spinnaker/
Ref link 4: https://www.weave.works/blog/kubernetes-deployment-strategies
As I understand it:
Highlander: when the new Auto Scaling group (ASG) is up and healthy, all old ASGs are destroyed automatically.
Red/Black: A new ASG is launched, some manual (or more complicated than in Highlander) verification steps are done, and only after those steps are completed is the old ASG manually deleted. Netflix blog post here: http://techblog.netflix.com/2013/08/deploying-netflix-api.html
Rolling push: "Old instances get gracefully deleted and replaced by new instances one or two at a time until all the instances in the ASG have been replaced." Netflix blog post here: http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
At my company we only use Highlander and Red/Black on a regular basis.
I am doing data base scaling using postgresql.
Currently i am using pg_shard for scaling and able to do sharding and replication. i have tested the example that mentioned in Readme file of pg_shard.
But i need dynamically scale a cluster as new machines are added or old ones are retired.I am using google cloud VM to setup database .So once one VM is filled with data i want to setup new instance with same configuration.
ie,if the current machine size is 4GB and is of out of memory then it should create one more VM with 4GB size and next entries should come there.
I have gone through http://slideplayer.com/slide/4896815/ and after reading this i understood that it is possible to do but the steps are not mentioned anywhere.
How to achieve this using pg_shard?
I got the answer myself.
We can use CitusDB for this.
CitusDB is installed with an extension called "shard_rebalancer", which helps you to move the shards around when new nodes are added to the cluster. For this, you need to follow the installation instructions for CitusDB.
In this documentation, you can find about the related information for the shard rebalancer functions (i.e., rebalance_table_shards and replicate_table_shards)
With simpler words, you must follow the steps:
Add CitusDB node(s) to the cluster
Add the IPs (or host names) to pg_worker_list.conf
Reload the master node configuration, so that the master becomes aware of the new worker node(s)
Run "SELECT rebalance_table_shards('tablename')" on the master node.
My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?
This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):
With fewer larger nodes, in case of a node failure the impact to the
application will be greater
Thank you in advance for your time :)
By default when data is written into couchbase client returns success just after that data is written to one node's memory. After that couchbase save it to disk and does replication.
If you want to ensure that data is persisted to disk in most client libs there is functions that allow you to do that. With help of those functions you can also enshure that data is replicated to another node. This function is called observe.
When one node goes down, it should be failovered. Couchbase server could do that automatically when Auto failover timeout is set in server settings. I.e. if you have 3 nodes cluster and stored data has 2 replicas and one node goes down, you'll not lose data. If the second node fails you'll also not lose all data - it will be available on last node.
If one node that was Master goes down and failover - other alive node becames Master. In your client you point to all servers in cluster, so if it unable to retreive data from one node, it tries to get it from another.
Also if you have 2 nodes in your disposal you can install 2 separate couchbase servers and configure XDCR (cross datacenter replication) and manually check servers availability with HA proxies or something else. In that way you'll get only one ip to connect (proxy's ip) which will automatically get data from alive server.
Hopefully Couchbase is a good system for HA systems.
Let me explain in few sentence how it works, suppose you have a 5 nodes cluster. The applications, using the Client API/SDK, is always aware of the topology of the cluster (and any change in the topology).
When you set/get a document in the cluster the Client API uses the same algorithm than the server, to chose on which node it should be written. So the client select using a CRC32 hash the node, write on this node. Then asynchronously the cluster will copy 1 or more replicas to the other nodes (depending of your configuration).
Couchbase has only 1 active copy of a document at the time. So it is easy to be consistent. So the applications get and set from this active document.
In case of failure, the server has some work to do, once the failure is discovered (automatically or by a monitoring system), a "fail over" occurs. This means that the replicas are promoted as active and it is know possible to work like before. Usually you do a rebalance of the node to balance the cluster properly.
The sentence you are commenting is simply to say that the less number of node you have, the bigger will be the impact in case of failure/rebalance, since you will have to route the same number of request to a smaller number of nodes. Hopefully you do not lose data ;)
You can find some very detailed information about this way of working on Couchbase CTO blog:
http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
Note: I am working as developer evangelist at Couchbase