Setting up a multi-broker cluster (Kafka quickstart) - properties file

I am trying to follow the quickstart on the Apache Kafka homepage.
In Step 6: Setting up a multi-broker cluster, it says "edit the server properties":
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
My question is:
Do I use the vi editor to edit it? If yes, do I just change the values of 'broker.id', 'listeners' and 'log.dirs' as above?

Whichever tool you prefer - it just needs to be able to change those values and save the file. You could use sed if you really wanted to (like some Docker containers might do)... On that note, you could even just use Docker Compose to start two Kafka containers side by side.
But, as the docs point out, those three properties must be unique for every broker running on a single machine; all the other settings can remain unchanged.
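For example, here is a minimal sketch of scripting that copy-and-edit step with cp and sed (it assumes the stock config/server.properties from the quickstart download, where broker.id=0 and the listeners line is commented out as #listeners=PLAINTEXT://:9092; adjust the patterns if your version differs, or just open the copy in vi and change the three values by hand):
cp config/server.properties config/server-1.properties
sed -i 's/^broker.id=0/broker.id=1/' config/server-1.properties
sed -i 's|^#listeners=PLAINTEXT://:9092|listeners=PLAINTEXT://:9093|' config/server-1.properties
sed -i 's|^log.dirs=/tmp/kafka-logs|log.dirs=/tmp/kafka-logs-1|' config/server-1.properties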
I would also like to point out that having two broker processes on one machine, sharing a single disk, is not a good idea. You'd get better performance running a single broker with a larger heap on that one server.

Related

How to do dynamic scaling using pg_shard

I am doing database scaling using PostgreSQL.
Currently I am using pg_shard for scaling and I am able to do sharding and replication. I have tested the example mentioned in the README file of pg_shard.
But I need to dynamically scale the cluster as new machines are added or old ones are retired. I am using Google Cloud VMs to set up the database, so once one VM is filled with data I want to set up a new instance with the same configuration.
I.e., if the current machine has 4GB and runs out of memory, it should create one more 4GB VM and the next entries should go there.
I have gone through http://slideplayer.com/slide/4896815/ and after reading it I understood that this is possible, but the steps are not mentioned anywhere.
How do I achieve this using pg_shard?
I got the answer myself.
We can use CitusDB for this.
CitusDB is installed with an extension called "shard_rebalancer", which helps you to move the shards around when new nodes are added to the cluster. For this, you need to follow the installation instructions for CitusDB.
In this documentation, you can find the related information about the shard rebalancer functions (i.e., rebalance_table_shards and replicate_table_shards).
In simpler words, you must follow these steps (a minimal command sketch follows the list):
Add CitusDB node(s) to the cluster
Add the IPs (or host names) to pg_worker_list.conf
Reload the master node configuration, so that the master becomes aware of the new worker node(s)
Run "SELECT rebalance_table_shards('tablename')" on the master node.

Running multiple Kettle transformation on single JVM

We want to use pan.sh to execute multiple Kettle transformations. After exploring the script I found that it internally calls the spoon.sh script which runs in PDI. Now the problem is that every time a new transformation starts, it creates a separate JVM for its execution (invoked via a .bat file); however, I want to group them to use a single JVM to overcome the memory constraints that the multiple JVMs are putting on the batch server.
Could somebody guide me on how I can achieve this, or share the documentation/resources with me?
Thanks for the good work.
Use Carte. This is exactly what it is for. You can start up a server (on the local box if you like) and then submit your jobs to it. One JVM, one heap, shared resources.
A further benefit is scalability: when your box becomes too busy, just add another one, also running Carte, and start sending some of the jobs to that other server.
There's an old but still current blog post here:
http://diethardsteiner.blogspot.co.uk/2011/01/pentaho-data-integration-remote.html
As well as documentation on the Pentaho website.
Starting the server is as simple as:
carte.sh <hostname> <port>
There is also a status page which you can use to query your Carte servers, so if you have a cluster of servers you can pick a quiet one to send your job to.
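For example, assuming you started a server with carte.sh localhost 8081, you can query its status page like this (cluster/cluster is the default Carte user/password, which you should change for anything real):
curl -u cluster:cluster http://localhost:8081/kettle/status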

Google Cloud Compute - Virtual Machine Scaling and Load Balancing

I signed up for Google Cloud the other day using their free trial promotion. I love it so far. I've got a couple of questions that are probably generic to cloud computing, which I'm new to. I have my test virtual machine up without any issues, using Ubuntu Linux.
My questions about cloud concepts are, first:
- How do you scale an instance? Can you scale from micro to small (and vice versa)?
If scaling isn't done that way, and it's about using instance groups, how do load balancing and instance groups work?
This is the concept I'm most confused with... how would I push a code update if I had 3 instances behind the load balancer?
Thanks for your help!
First question: How do you vertically scale an instance? Answer: you must re-create the instance and destroy the old one; you can't just make an existing instance smaller or larger. Luckily, you can script the whole setup. GCE allows you to add a flag called --metadata-from-file. If you are using systemd, I recommend something to the effect of --metadata-from-file user-data=cloud-config.yaml. Since you are using Ubuntu, and Ubuntu's support for systemd is sketchy at best, you probably just want to do something like --metadata-from-file startup-script=my-startup-script.sh. Scripting your deployment will allow you to scale, re-create and document your deployment, and is a best practice in cloud computing.
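For illustration, a rough sketch of creating an instance with a startup script (the instance name, machine type and script path are placeholders):
gcloud compute instances create my-instance --machine-type g1-small --metadata-from-file startup-script=my-startup-script.sh
Re-creating it with a different --machine-type is then just a matter of deleting the old instance and re-running the same command.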
Second question: How do instance groups and load balancing work? Answer: Instance groups in GCE are almost always of the "managed" variety. This allows you to create a template that defines how you want your instances to work. Then you can horizontally scale them (i.e. add more or take some away) behind a load balancer. You can even leverage preemptible instances to save you some cash.
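As a rough illustration of that pattern (all names are placeholders), creating a small managed group from a template looks something like:
gcloud compute instance-templates create my-template-v1 --machine-type g1-small --metadata-from-file startup-script=my-startup-script.sh
gcloud compute instance-groups managed create my-group --template my-template-v1 --base-instance-name my-app --size 3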
Third question: How do I push an update? This depends on how you deploy. But in general I would say:
If you use Docker, push a new image to GCR and have your instances pull it.
If you use CM (like Salt or Ansible), just use those tools normally. They work fine on GCE.
If you use startup scripts, do something like gcloud compute instances add-metadata myinstance --metadata-from-file startup-script=newScript.sh (and restart afterwards)
If everything is contained in a managed instance template, update your template (see the sketch after this list).
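For that last case, a hedged sketch of what the update looks like (template, group and instance names are placeholders; existing instances only pick up the new template once they are recreated):
gcloud compute instance-templates create my-template-v2 --metadata-from-file startup-script=newScript.sh
gcloud compute instance-groups managed set-instance-template my-group --template my-template-v2
gcloud compute instance-groups managed recreate-instances my-group --instances my-instance-1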

Multiple docker containers

I am reading about Docker and I am trying to understand whether or not this is something I should learn to use.
From what I read, best practice says that you should have one process per container. Now, this means that I need one container for JBoss, one for the database, one for file storage, one for the build server, ...
Now, would I manually have to start each of these containers? Or is there some kind of dependency you can set up?
What about the order and requirements that one process in a container can have? JBoss needs the database to be started before it starts, etc.
Is this handled?
one process per container
This advice is valid if you want to follow a microservices architecture. Microservices have advantages but also drawbacks. Depending on your situation, you might find it more convenient to have a container running multiple processes.
Running multiple containers on one single host
If you want to start multiple containers together on a single Docker host, the easiest way is to use fig. The fig configuration file is very easy to understand, as its syntax mimics docker commands. This video gives you a nice presentation of fig (by one of fig's authors, Aanand Prasad).
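To give an idea, here is a minimal hypothetical fig.yml with an application container linked to a database container (the image names and ports are just illustrative):
app:
  image: jboss/wildfly
  links:
    - db
  ports:
    - "8080:8080"
db:
  image: postgres
Running fig up then starts both containers, with db started before app because of the link.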
Note that tools such as fig, AFAIK, won't be able to wait for a first container to start and finish initializing before starting another container that depends on it. The way to handle this is to have the 2nd container implement some kind of test and loop until the dependency is ready, and only then start its process. This can be achieved by different means (a wrapper script, straight in your application code, ...).
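For example, a crude wrapper script along those lines (the host name db, the port and the JBoss path are assumptions based on the fig example above):
#!/bin/sh
# loop until the database container accepts connections, then start the real process
until nc -z db 5432; do
  echo "waiting for db..."
  sleep 1
done
exec /opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0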
Running multiple processes in one container
As a Docker container will stop as soon as no process is running in the foreground, there are different techniques you can use (supervisor, running the first processes as daemons and the last one in the foreground, using phusion/baseimage, ...).
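As an illustration of the supervisor approach, a minimal hypothetical supervisord.conf running two processes inside one container (the program names and commands are placeholders):
[supervisord]
nodaemon=true

[program:app]
command=/opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0

[program:worker]
command=/usr/local/bin/my-worker.sh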

Redis active-active replication

I am using Redis version 2.8.3. I want to build a Redis cluster, but in this cluster there should be multiple masters. This means I need multiple nodes that have write access and replicate their writes to all other nodes.
I could build a cluster with one master and multiple slaves. I just configured the slaves' redis.conf files and added this:
slaveof myMasterIp myMasterPort
That's all. Then I tried to write something into the db via the master. It was replicated to all slaves, and I really like that.
But when I tried to write via a slave, it told me that slaves have no right to write. After that I set slave-read-only to no in the slave's redis.conf, and then I could write something into the db.
But I realized that the write is not replicated to my master, so it is not replicated to any of the other slaves either.
This means I could not build an active-active cluster.
I tried to find out whether Redis has active-active cluster capability, but I could not find an exact answer about it.
Is it possible to build an active-active cluster with Redis?
If it is, how can I do it?
Thank you!
Redis v2.8.3 does not support multi-master setups. The real question, however, is why do you want to set one up? Put differently, what challenge/problem are you trying to solve?
It looks like the challenge you're trying to solve is how to reduce the network load (more on that below) by eliminating over-the-net reads. Since Redis isn't multi-master (yet), the only way to do it is by setting up each app server with a master and a slave (to the other master) - i.e. a grand total of 4 Redis instances (and twice the RAM).
The simple scenario is when each app updates only a mutually-exclusive subset of the database's keys. In that scenario this kind of setup may actually be beneficial (at least in the short term). If, however, both apps can touch all keys or if even just one key is "shared" for writes between the apps, then you'll need to bake locking/conflict resolution/etc... logic into your apps to consolidate local master and slave differences (and that may be a bit of an overkill). In either case, however, you'll end up with too many (i.e. more than 1) Redises, which means more admin effort at the very least.
Also note that by colocating app and database on the same server you're setting yourself up for near-certain scalability failure. What will happen when you need more compute resources for your apps or Redis? How will you add yet another app server to the mix?
Which brings me back to the actual problem you are trying to solve - network load. Why exactly is that an issue? Are your apps so throughput-heavy or is the network so thin that you are willing to go to such lengths? Or maybe latency is the issue that you want to resolve? Be that as it may, I recommend that you consider a time-proven design instead, namely separating Redis from the apps and putting it on its own resources. True, the network will hit you in the face and you'll have to work around/with it (which is what everybody else does). On the other hand, you'll have more flexibility and control over your much simpler setup, and that, in my book, is a huge gain.
Redis Enterprise has had this feature for quite a while, but if you are looking for an open source solution KeyDB is a fork with Active Active support (called Active Replica).
Setting it up is just a little more work than standard replication (a command sketch follows the steps):
Both servers must have "active-replica yes" in their respective configuration files
On server B execute the command "replicaof [A address] [A port]"
Server B will drop its database and load server A's dataset
On server A execute the command "replicaof [B address] [B port]"
Server A will drop its database and load server B's dataset (including the data it just transferred in the prior step)
Both servers will now propagate writes to each other. You can test this by writing to a key on Server A and ensuring it is visible on B and vice versa.
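For example, assuming both servers already have "active-replica yes" in their configuration files, the steps above boil down to something like this (the host names and port are placeholders; keydb-cli is the client shipped with KeyDB):
keydb-cli -h serverB replicaof serverA 6379
keydb-cli -h serverA replicaof serverB 6379
keydb-cli -h serverA set somekey somevalue
keydb-cli -h serverB get somekey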
https://github.com/JohnSully/KeyDB/wiki/KeyDB-(Redis-Fork):-Active-Replica-Support