How to do dynamic scaling using pg_shard

I am doing database scaling using PostgreSQL.
Currently I am using pg_shard for scaling and am able to do sharding and replication. I have tested the example mentioned in the README file of pg_shard.
But I need to dynamically scale the cluster as new machines are added or old ones are retired. I am using Google Cloud VMs to set up the database, so once one VM is filled with data I want to set up a new instance with the same configuration.
That is, if the current machine size is 4 GB and it fills up, one more 4 GB VM should be created and the next entries should go there.
I have gone through http://slideplayer.com/slide/4896815/ and after reading it I understood that this is possible, but the steps are not mentioned anywhere.
How can I achieve this using pg_shard?

I got the answer myself.
We can use CitusDB for this.
CitusDB is installed with an extension called "shard_rebalancer", which helps you move the shards around when new nodes are added to the cluster. For this, you need to follow the installation instructions for CitusDB.
The documentation describes the shard rebalancer functions (i.e., rebalance_table_shards and replicate_table_shards).
In simpler words, you must follow these steps:
Add CitusDB node(s) to the cluster
Add the IPs (or host names) to pg_worker_list.conf
Reload the master node configuration, so that the master becomes aware of the new worker node(s)
Run "SELECT rebalance_table_shards('tablename')" on the master node.

Related

Setting up a multi-broker cluster (Kafka quickstart)

I am trying to follow the QUICKSTART on the Apache Kafka homepage.
In Step 6: Setting up a multi-broker cluster, it says "edit the server properties":
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
My question is:
Do I use the vi editor to edit? If yes, do I just change those 'broker.id', 'listeners', and 'log.dirs' values as above?
Whichever tool you prefer, it needs to be able to change values and save the file. You could use sed if you really wanted to (like some Docker containers might do... on that note, you could even just use Docker Compose to start two Kafka containers side by side).
But, as the docs point out, those three properties should be unique across all brokers on a single machine, and all others can remain unchanged.
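For instance, the sed route could look roughly like this (a sketch assuming the stock config/server.properties that ships with Kafka, where broker.id defaults to 0, and GNU sed):
# Create per-broker copies of the stock config
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
# Make the three per-broker properties unique in each copy
sed -i 's|^broker.id=.*|broker.id=1|' config/server-1.properties
sed -i 's|^#\?listeners=.*|listeners=PLAINTEXT://:9093|' config/server-1.properties
sed -i 's|^log.dirs=.*|log.dirs=/tmp/kafka-logs-1|' config/server-1.properties
sed -i 's|^broker.id=.*|broker.id=2|' config/server-2.properties
sed -i 's|^#\?listeners=.*|listeners=PLAINTEXT://:9094|' config/server-2.properties
sed -i 's|^log.dirs=.*|log.dirs=/tmp/kafka-logs-2|' config/server-2.properties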
I would like to point out that having two broker processes on one machine, sharing a single disk, is not a good idea. You'd get better performance with a larger heap space on that one server.

Can we copy an Apache Ignite cluster to another Ignite cluster?

I want to back up an entire Ignite cluster so that the backup cluster can be used if the original (active) cluster goes down. Is there any approach for this?
If you need two separate clusters with replication across data centers, it would be better to look at GridGain's solutions, which support data center replication.
Unfortunately, Ignite itself does not support DR.
With Apache Ignite you can logically divide your cluster into two zones to guarantee that each zone contains a full copy of the data. However, there is no way to choose the primary node for partitions manually. See AffinityFunction and the affinityBackupFilter() method of the standard implementations.
As answered above, a ready-made solution is only available in the paid version. Open-source Apache Ignite provides the ability to take a cluster-wide snapshot. You can add a cron job in your Ignite cluster to take this snapshot, and another job to copy the snapshot data to object storage like S3.
On the restore side, you download this data node by node into the work directories of the respective nodes, following the manual restore procedure, and start the cluster. It should activate automatically once all baseline nodes have started successfully, and your cluster is ready to use.
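As a rough sketch of that backup job (assumes Apache Ignite 2.9+, which ships the snapshot feature, plus the AWS CLI; the snapshot name, paths, and bucket are placeholders):
# Take a cluster-wide snapshot (run against any node of the cluster)
SNAP="backup_$(date +%Y%m%d)"
"$IGNITE_HOME/bin/control.sh" --snapshot create "$SNAP"
# Copy this node's snapshot data to object storage;
# snapshots land under work/snapshots by default
aws s3 sync "$IGNITE_HOME/work/snapshots/$SNAP" "s3://my-ignite-backups/$SNAP/$(hostname)/"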

Google Cloud Compute - Virtual Machine Scaling and Load Balancing

I signed up for Google Cloud the other day using their free trial promotion. I love it so far. I've got a couple of questions that are probably generic to cloud computing, which I'm new to. I have my test virtual machine up without any issues, using Ubuntu Linux.
My questions about cloud concepts are:
- How do you scale an instance? Can you scale from micro to small (and vice versa)?
- If scaling isn't done that way, and it's about using instance groups, how do load balancing and instance groups work?
- This is the concept I'm most confused with: how would I push a code update if I had 3 instances behind the load balancer?
Thanks for your help!
First question: how do you vertically scale an instance? Answer: you must re-create the instance and destroy the old one; you can't just make an existing instance smaller or larger. Luckily, you can script the whole setup. GCE allows you to pass a flag called --metadata-from-file. If you are using systemd, I recommend something to the effect of --metadata-from-file user-data=cloud-config.yaml. Since you are using Ubuntu, and Ubuntu's support for systemd is sketchy at best, you probably just want to do something like --metadata-from-file startup-script=my-startup-script.sh. Scripting your deployment allows you to scale, re-create, and document your deployment, and is a best practice in cloud computing.
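As a rough illustration of that re-create step (the instance name, zone, machine type, and script name are placeholders):
# Destroy the old instance and re-create it with a bigger machine type,
# re-running the same startup script on first boot
gcloud compute instances delete myinstance --zone us-central1-a
gcloud compute instances create myinstance \
  --zone us-central1-a \
  --machine-type n1-standard-2 \
  --metadata-from-file startup-script=my-startup-script.sh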
Second question: How do instance groups and load balancing groups work? Answer: Instance groups in GCE are almost always of the "managed" variety. This allows you to create a template that defines how you want your instances to work. Then you can horizontally scale them (i.e. add more or take some away) behind a load balancer. You can even leverage preemptible instances to save you some cash.
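A minimal sketch of that setup (all names are placeholders): define a template, then run several identical instances from it in a managed instance group.
# Define how instances should look
gcloud compute instance-templates create web-template \
  --machine-type n1-standard-1 \
  --metadata-from-file startup-script=my-startup-script.sh \
  --preemptible   # optional: cheaper instances that GCE may reclaim
# Run three identical instances from that template as one group
gcloud compute instance-groups managed create web-group \
  --zone us-central1-a \
  --base-instance-name web \
  --template web-template \
  --size 3
You would then add web-group as a backend of your load balancer.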
Third question: How do I push an update? This depends on how you deploy. But in general I would say:
If you use Docker, push a new image to GCR and have your instances pull it.
If you use CM (like Salt or Ansible), just use those tools normally; they work fine on GCE.
If you use startup scripts, do something like gcloud compute instances add-metadata myinstance --metadata-from-file startup-script=newScript.sh (and restart afterwards).
If everything is contained in a managed instance template, update your template (see the sketch below).
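A rough sketch of that last option (template and group names are placeholders; on older gcloud releases rolling-action may live under gcloud beta):
# Build a new template with the updated startup script
gcloud compute instance-templates create web-template-v2 \
  --machine-type n1-standard-1 \
  --metadata-from-file startup-script=newScript.sh
# Point the managed group at it and roll the instances over
gcloud compute instance-groups managed rolling-action start-update web-group \
  --zone us-central1-a \
  --version template=web-template-v2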

Updating Redis Click-To-Deploy Configuration on Compute Engine

I've deployed a single micro-instance redis on compute engine using the (very convenient) click-to-deploy feature.
I would now like to update this configuration to have a couple of instances, so that I can benchmark how this increases performance.
Is it possible to modify the config while it's running?
The other option would be to add a whole new redis deployment, bleed traffic onto that over time and eventually shut down the old one. Not only does this sound like a pain in the butt, but I also can't see any way in the web UI to click-to-deploy multiple clusters.
I've got my learner's license with all this, so I would also appreciate any general 'good-to-knows'.
I'm on the Google Cloud team working on this feature and wanted to chime in. Sorry no one replied to this for so long.
We are working on some of the features you describe that would surely make the service more useful and powerful. Stay tuned on that.
I admit that there really is not a good solution for modifying an existing deployment to date, unless you launch a new cluster and migrate your data over / redirect reads and writes to the new cluster. This is a limitation we are working to fix.
As a workaround for creating two deployments using Click to Deploy with Redis, you could create a separate project.
Also, if you wanted to migrate to your own template using the Deployment Manager API (https://cloud.google.com/deployment-manager/overview), keep in mind that Deployment Manager does not have this limitation: you can create multiple deployments from the same template in the same project.
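For example (the config file name is a placeholder), two deployments from the same config in one project:
gcloud deployment-manager deployments create redis-a --config redis.yaml
gcloud deployment-manager deployments create redis-b --config redis.yaml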
Chris

Accumulo -- Adding a new node

I'm trying to learn Accumulo, but I have a couple of questions I couldn't find direct answers to:
First, can we add a new server to an existing Accumulo system without any downtime? If yes, the new node will have its share (of DB data) arranged by the master, right? Since it has failure recovery, I believe that will be automatic.
Can we define the number of replicas, or is the whole data set distributed with some failure-recovery mechanism by itself? How can I learn the details of the replication and data distribution process?
Thanks a lot :)
Yes, you can dynamically add/remove worker nodes at any time. They just need to have the same configuration options available to them so that they can join the cluster (shared secret, zookeeper quorum, etc... basically, the same accumulo-site.xml that you are using).
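A rough sketch for an Accumulo 1.x layout (host name and paths are placeholders; in 2.x the hosts file is conf/tservers and the start scripts differ):
# Ship the same configuration (accumulo-site.xml, shared secret, ZK quorum) to the new host
scp -r /path/to/accumulo/conf new-worker-1:/path/to/accumulo/
# Register the host so the cluster start/stop scripts know about it
echo "new-worker-1" >> /path/to/accumulo/conf/slaves
# Start the local Accumulo processes (tablet server) on the new host
ssh new-worker-1 '/path/to/accumulo/bin/start-here.sh'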
By default, the "master" process will assign tablets to each "tablet server" processes so that each host will be serving roughly the same amount of data.
Not sure I understand your second question, but Accumulo generally uses HDFS for its backing store, which handles replication and data recovery at the "file" level.
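For example, you can inspect the replication HDFS applies to Accumulo's files (the /accumulo path is the common default; adjust if yours differs):
# Default HDFS replication factor for the cluster
hdfs getconf -confKey dfs.replication
# Replication and block health of Accumulo's files
hdfs fsck /accumulo -files -blocks | head -n 40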