Is it a good way to run Kafka on Kubernetes? - redis

For a large online application, use k8s to run it. The scale maybe daily activity user 500,000.
The application inside k8s need messaging feature - Pub/Sub, there are these options:
Kafka
RabbitMQ
Redis
Kafka
It needs zookeeper and good to run on os depends on disk I/O. So if install it into k8s cluster, how? The performance will be worse?
And, if keep Kafka outside of the k8s cluster, connect Kafka from application inside the k8s cluster, how about that performance? They are in the different layer, won't be slow?
RabbitMQ
It's slow than Kafka, but for a daily activity user 500,000 application, is it good enough? If so, maybe it's a good choice.
Redis
It's another option. Maybe the most simple one. But from the internet I got that it will lose message sometimes. If true, that's terrible.
So, the most important thing is, use Kafka(also with zookeeper) on k8s, good or not in this use case?

Yes, running Kafka on Kubernetes is great. Check out this example: https://github.com/Yolean/kubernetes-kafka. It includes ZooKeeper and Kafka as StatefulSets.
PS. Running any of the services in your question on Kubernetes will be pleasant. You can Google the name of the service and "kubernetes" and find example manifests. Many examples here: https://github.com/kubernetes/charts.

For Kafka, you can find some suggestion here. Kubernetes 1.7+ supports local persistent volume, which may be good for Kafka deployment.

You can also take a look to the following project :
https://github.com/EnMasseProject/barnabas
It's about running Kafka on Kubernetes and OpenShift as well. It provides deploying with StatefulSets with persistent volumes or just in memory (for developing or just testing purpose). It provides deploying for Kafka Connect and Prometheus metrics as well.

Another simple configuration of Kafka/Zookeeper on Kubernetes in DigitalOcean with external access:
https://github.com/StanislavKo/k8s_digitalocean_kafka
You can connect to Kafka from outside of AWS/DO/GCE by regular binary protocol. Connection is PLAINTEXT or SASL_PLAINTEXT (user/password).
Kafka cluster is StatefulSet, so you can scale cluster easily.

Related

How can I setup Redis Cluster mode or master slave mode in PCF?

This is regarding the use case where we are trying to use the Redis in PCF (Pivotal Cloud Foundry). In our use case, we will refresh the Redis cache daily once or twice with the required data and then API will query Redis and then provide the response.
One thing of particular concern for us is that we want API queries to happen from Redis only that means Redis to be available at all times. But whenever we are refreshing the Redis DB, Redis would not be able to serve the APIs since it is refreshing the keys. To avoid that we wanted to setup a Redis in cluster mode or master-slave mode so if one instance is being written another can be read from.
How can we setup Redis cluster or master-slave mode in PCF and then fulfil our requirement?
Please provide any other suggestions as well that you may have.
At the time I write this, the Redis for Pivotal Platform product does not support clustering. See Availability, in the docs here -> https://docs.pivotal.io/redis/2-3/erc.html#offerings.
All Redis for Pivotal Platform services are single VMs without clustering capabilities. This means that planned maintenance jobs (e.g., upgrades) can result in 2–10 minutes of downtime, depending on the nature of the upgrade. Unplanned downtime (e.g., VM failure) also affects the Redis service.
Redis for Pivotal Platform has been used successfully in enterprise-ready apps that can tolerate downtime. Pre-existing data is not lost during downtime with the default persistence configuration. Successful apps include those where the downtime is passively handled or where the app handles failover logic.
If you require clustered Redis, you'd need to look at a different offering. Redis Labs has some offerings that integrate with PCF, you could use a Cloud Provider's Redis offering, or you could host your own.
If the solution you use isn't integrated into PCF, you can create a user-provided service with cf cups and provide the Redis credentials to your application that way. It will function just like a Redis service instance created through the marketplace.

Is there any way we can configure hazelcast in master-slave architecture like redis with Spring boot

Currently hazelcast is using cloud discovery for communication.
So if there are 4 kubernetes pods and each of them is having in-memory hazelcast. whenever hazelcast cache is updated in one of the pod, it gets updated in one of the other pod. but in case both of these pods get downscaled and get terminated, the data which is only in these 2 pods is lost. Can we have something like redis where we can provide server, port of the hazelcast cluster and it will be independent of kubernetes pod
Please check the following Blog Post ("Scale without Data Loss!" section) to read how to scale Hazelcast cluster on Kubernetes to avoid data loss.
Also, you can check the official README of hazelcast/hazelcast-kubernetes plugin. There is a section dedicated to scaling there.

Prometheus target management

We are using prometheus in our production envirment recently. Before we only have 30-40 nodes for each service and those servers not change very often, so we just write it in the prometheus.yml, but right now it become too long to hold in one file and change much frequently then before, so my question is should i use file_sd_config to put those server list out of yml file and change those config files sepearately, or using consul for service discovery(same much easy to handle changes).
I have install 3 nodes consul cluster in data center and as i can see if i change to use consul to slove this problem , i also need to install consul client in each server(node) and define its services info. Is that correct? or does anyone have good advise.
Thanks
I totally advocate the use of a service discovery system. It may be a bit hard to deploy at first but surely it will worth it in the future.
That said, Prometheus comes with a lot of service discovery integrations. It's possible that you don't need a Consul cluster. If your servers are in a cloud provider like AWS, GCP, Azure, Openstack, etc, prometheus are able to autodiscover the instances.
If you keep running with Consul, the answer is yes, the agent must be running in every node. You can also register services and nodes via API but it's easier to deploy the agent.

High-availability Redis?

I am currently setting up an infrastructure for an App in AWS. App is written in Django and is using Redis for some transactions. High availability is key for this application and I am having a hard time trying to get my head around how to configure Redis for High availability.
Application level changes are not an option.
Ideally I would like to have a redis setup, to which I can write and read and replicate and scale when required.
Current Setup is a Redis Fail-over scenario with HAProxy --> Redis Master --> Replica Slave.
Could someone guide me understand various options ? and how to scale redis for high availability !
Use AWS ElastiCache Redis Cluster with Multi-AZ. They provides automatic fail-over. It provides endpoint to access master node.
If master goes down AWS route your endpoint to another node. everything happens automatically, you don't have to do anything.
Just make sure that if you are doing DNS to IP caching in your application, its set to 60 seconds or so instead of default.
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/AutoFailover.html
Thanks,
KS

How to configure Redis in Spring XD distributed runtime?

The Spring XD documentation (http://docs.spring.io/spring-xd/docs/1.0.0.RC1/reference/html/) recommends Zookeeper to be run in ensemble so that Zookeeper is highly available. There is not lot of details about Redis about high availability.
If I were to run 2 XD admin instances and say 4 Container instances, I see 3 options
should I run a Redis instance in each server that runs container or admin? In that case does the Distributed runtime work properly with different Redis instances handling transport of different modules?
OR
should I run 1 Redis instance in a separate server and configure all XD instances to talk to this instance? In this case 1 instance of Redis is not highly available
OR
should I configure Redis cluster or Redis Sentinel high availability? I am not sure how XD or any other client will connect to a cluster or HA.
Thanks
I would suggest that you run a single Redis instance, there are some settings for persistence that you can change that may meet your requirements.
http://redis.io/topics/persistence
We will be adding support for Redis Sentinal, certainly in the Spring XD 1.1 release, but possibly in a maintenance release depending on what library changes we need to pick up. Spring Data Redis and Spring Boot have recent updates to support Redis Sentinal.
If you are using Redis as a message transport and want higher guarantees, I would switch to using Rabbit HA configuration of the MessageBus.
Cheers,
Mark