Ignite error upgrading the setup in Kubernetes - ignite

While upgrading Ignite deployed in Kubernetes (EKS) to address the Log4j vulnerability, I got the error below:
[ignite-1] Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (54b55de4-7742-4e82-9212-7158bf51b4a9) is not compatible with BaselineTopology in the cluster. Joining node BlT id (4) is greater than cluster BlT id (3). New BaselineTopology was set on joining node with set-baseline command. Consider cleaning persistent storage of the node and adding it to the cluster again.
The setup is a 3-node cluster with native persistence enabled (PVC), deployed following the official guide. This has happened several times in our journey with Apache Ignite.
I cannot clean the storage because the pod keeps restarting; by the time I get a shell into the pod, it crashes and restarts again.

This might be caused by a wrong startup order: starting the nodes manually in reverse order may resolve it, but I'm not sure that is possible in K8s. Another possible cause is baseline auto-adjustment, which can change your baseline unexpectedly; I suggest turning it off if it's enabled.
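For reference, a minimal way to check the baseline and disable auto-adjustment is via the control script that ships with Ignite (the auto_adjust subcommand needs Ignite 2.8+). The pod name and the control.sh path inside the image are assumptions here, so adjust them to your deployment:
# Print the current baseline topology (pod name and control.sh path are assumptions)
kubectl exec -it ignite-0 -- /opt/ignite/apache-ignite/bin/control.sh --baseline
# Disable baseline auto-adjustment so the baseline is not changed behind your back
kubectl exec -it ignite-0 -- /opt/ignite/apache-ignite/bin/control.sh --baseline auto_adjust disable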
One (quite tricky) workaround to clean the DB of a failing pod is to replace the Ignite image with a simple image such as plain Debian or Alpine (just to be able to access the CLI) while keeping the same PVC attached; once you fix the persistence issue, set the Ignite image back. The other is to access the underlying PV directly, if possible, and do the surgery in place.
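As a rough sketch of the first workaround (the StatefulSet name, container index and pod name below are assumptions about your deployment), you could patch the workload so the pods come up with an idle Debian image while the same PVCs stay mounted:
# Swap the image for plain Debian and make the container just sleep, keeping the PVCs attached
kubectl patch statefulset ignite --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "debian:bookworm"},
  {"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]}
]'
# Delete the crashing pod so it is recreated from the patched template, then open a shell in it
kubectl delete pod ignite-1
kubectl exec -it ignite-1 -- bash
# ... clean the Ignite work directory on the mounted PVC, then patch the image back (and remove
# the command override) so the node can rejoin the cluster.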

Related

Google cloud kubernetes cluster newbie question

I am new to GKE. I created a GKE cluster with a very simple setup: it has only one GPU node, and everything else was left at the defaults. After the cluster was up, I was able to list the nodes and SSH into them. But I have two questions.
I tried to install nvidia driver using the command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
It output:
kubectl apply --filename https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
daemonset.apps/nvidia-driver-installer configured
But 'nvidia-smi' cannot be found at all. Should I do something else to make it work?
On the worker node, there was no .kube directory or 'config' file. I had to copy it from the master node to the worker node to make things work. The config file on the master node updates automatically, so I have to copy it again and again. Did I miss some steps when creating the cluster, and how can I resolve this?
I'd appreciate it if someone could shed some light on this; it has driven me crazy after several days of working on it.
Tons of thanks.
Alex.
For the DaemonSet to work, your worker node needs the label cloud.google.com/gke-accelerator (see this line). The DaemonSet checks for this label on a node before scheduling any driver-installer pods on it. I'm guessing the default node pool you created did not have this label. You can find more details in the GKE docs here.
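To check whether your GPU node actually carries that label (the value GKE sets depends on the GPU type of the node pool), you can list it as a column:
# Show the accelerator label for every node; a GKE GPU node pool should have a value here
kubectl get nodes -L cloud.google.com/gke-accelerator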
The worker nodes, by design, are just that: worker nodes. They do not need privileged access to the Kubernetes API, so they don't need any kubeconfig files. Communication between worker nodes and the API is strictly controlled through the kubelet binary running on the node. Therefore, you will never find kubeconfig files on a worker node, and you should never put them there either: if a node gets compromised, the keys in that file can be used to damage the API server. Instead, make it a habit to either use the master nodes for kubectl commands or, better yet, keep the kubeconfig on your local machine, keep it safe, and issue commands remotely to your cluster.
After all, all you need is access to an API endpoint for your Kubernetes API server, and it shouldn't matter where you access it from, as long as the endpoint is reachable. So, there is no need whatsoever to have kubeconfig on the worker nodes :)
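On GKE, setting up that local kubeconfig is a single command (the cluster name and zone below are placeholders):
# Fetch credentials for the cluster and merge them into ~/.kube/config on your workstation
gcloud container clusters get-credentials my-cluster --zone us-central1-a
# From then on, local kubectl talks to the cluster's API endpoint
kubectl get nodes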

Certain java-based containers throw "UnknownHostException"

I have two issues with my Kubernetes cluster (Kubernetes 1.12.5, Ubuntu 16.04).
The first issue: occasionally, containers on a specific node are restarted, including kube-proxy, and the kernel logs show:
kernel: IPVS: rr TCP - no destination available
IPVS: __ip_vs_del_service:enter
net_ratelimit: callbacks suppressed
While these logs are being recorded continuously, the load average on the node is rather high, and the Docker containers scheduled on the node keep restarting. In this case, draining the node relieves the symptoms.
The second issue: certain Java-based containers throw "UnknownHostException". Restarting the container manually resolves the symptoms.
Should I look at the container deployment settings, or at the cluster DNS and resolver-related settings? I want to know whether UnknownHostException is related to DNS settings. Any advice would be appreciated.
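One quick check I can run to see whether in-cluster DNS is at fault is to resolve a well-known service name from a throwaway pod (using the busybox image tag commonly recommended for this check):
# Run a one-off pod and try to resolve the API service through the cluster DNS
kubectl run dnstest --image=busybox:1.28 --restart=Never --rm -it -- nslookup kubernetes.default
If that lookup fails intermittently, it would point at the cluster DNS path rather than at the Java containers themselves.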

Migrate a (Storm + Nimbus) cluster to a new Zookeeper, without losing information or having downtime

I have a Nimbus + Storm cluster using Zookeeper, and I wish to move my cluster and point it to a new Zookeeper. Do you know if this is possible? Can I keep all the information from the old Zookeeper and save it in the new one? Is it possible to do it without downtime?
I have looked on the internet for this procedure but have not found much.
Would it be as simple as changing the storm.yml file on both the master and worker nodes? Do I need a restart afterwards?
# storm.zookeeper.servers:
# - "server1"
# - "server2"
If you just change storm.yml, you'd be pointing Storm at a new empty Zookeeper cluster, and it will be like you just installed Storm from scratch. More likely, you want to grow your Zookeeper cluster to include your new machines, then update storm.yml to point at the new machines, then shrink the cluster to exclude the machines you want to move away from. That way, your Zookeeper quorum is preserved even though you've moved to other physical machines.
This is easier to do on Zookeeper 3.5 with dynamic reconfiguration http://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html. I'm unsure whether Storm will run on Zookeeper 3.5, but you may consider investigating whether you can upgrade to 3.5 before growing/shrinking the cluster.
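If you do end up on ZooKeeper 3.5 (with dynamic reconfiguration enabled on the ensemble), the membership change itself is a couple of commands at the zkCli.sh prompt; the server IDs, hostnames and ports below are placeholders:
# At the zkCli.sh prompt, connected to the existing ensemble: add the new server, then drop an old one
reconfig -add server.4=new-zk-1:2888:3888;2181
reconfig -remove 1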
Otherwise you will have to do a rolling restart to add the new Zookeeper nodes, then do another one to remove the old machines once the cluster has stabilized.
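Either way, once the new ensemble is in place, the storm.yml on Nimbus and the supervisors just needs to list the new servers, followed by a restart of the Storm daemons so they pick it up (hostnames below are placeholders):
# storm.yml after the Zookeeper membership change
storm.zookeeper.servers:
  - "new-zk-1"
  - "new-zk-2"
  - "new-zk-3"
storm.zookeeper.port: 2181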
Let me suggest a hack here. This is a script provided by Microsoft for migration on HDInsight clusters, but you can adapt it to your needs.
The script can be downloaded from https://github.com/hdinsight/hdinsight-storm-examples/tree/master/tools/zkdatatool-1.0 and you can read more about it here:
https://blogs.msdn.microsoft.com/azuredatalake/2017/02/24/restarting-storm-eventhub/
I have used it in the past when I had to migrate some things between PaaS clusters, and I can confirm it works fine!

Should all pods using a Redis cache be constrained to the same node as the Redis cache itself?

We are running one of our services in a newly created Kubernetes cluster, and because of that we have switched it from the previous in-memory cache to a Redis cache.
Preliminary tests on our application, which exposes an API, show that we experience timeouts from our application to the Redis cache. I have no idea why, and the issue pops up very irregularly.
So I'm thinking the reason for these timeouts may actually be network related. Is it a good idea to add affinity so we always run the Redis cache on the same nodes as the application, to prevent network issues?
The issue has not arisen during "very high load" situations, which concerns me a bit.
This is an opinion question so I'll answer in an opinionated way:
As you mentioned, I would try to put the Redis and application pods on the same node; that would rule out wire-level networking issues. You can accomplish that with Kubernetes pod affinity. But you can also try a nodeSelector; that way you always pin your Redis and application pods to a specific node.
Another way to do this is to taint your nodes where you want to run your workloads and then add a toleration to the Redis and your application pods.
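A rough sketch of the pod-affinity variant on the application Deployment; the app: redis-cache label and the topology key are assumptions about how your Redis pods are labelled:
# In the application pod template: only schedule these pods onto nodes
# that already run a pod labelled app: redis-cache
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: redis-cache
      topologyKey: kubernetes.io/hostname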
Hope it helps!

Resolving Chef Dependencies

In my lab, I am currently managing a 20-node cluster with Cobbler and Chef. Cobbler is used for OS provisioning and basic network settings, and it works fine as expected. I can manage several OS distributions with preseed-based NQA installation and local repo mirroring.
We also successfully installed the Chef server and started managing nodes, but Chef is not working as I expected. The issue is that I am not able to set node dependencies within Chef. One important use case for us is this:
We are setting up Ceph and OpenStack on these nodes
Ceph should be installed before OpenStack, because OpenStack uses Ceph as back-end storage
The Ceph monitor should be installed before the Ceph OSDs, because creating an OSD requires talking to a monitor
The dependency between OpenStack and Ceph does not matter much, because it is a dependency within a single node; just installing OpenStack later resolves it.
However, a problem arises with the dependency between the Ceph monitor and the Ceph OSDs. Ceph OSD provisioning requires a running Ceph monitor, so the ceph-osd recipe should always run after the ceph-mon recipe has finished on another node. Our current method is simply to run "chef-client" on the "ceph-osd" node after the "chef-client" run has completely finished on the "ceph-mon" node, but I think this is too much of a hassle. Is there a way to set these dependencies in Chef so that nodes are provisioned sequentially according to their dependencies? If not, are there good frameworks that handle this?
In Chef itself, I know of no method for orchestration (that's not Chef's job).
A workaround, given your use case, could be to use tags and search.
Your monitor recipe could tag the node at the end (with tag("CephMonitor"), or by setting any attribute you wish to search on).
After that, the Chef server's Solr index has to catch up (usually within a minute), and then you can use search in the ceph-osd recipe; you can do something like this:
ceph_monitors = search(:node, 'tags:CephMonitor')      # search returns an array of matching node objects
return if ceph_monitors.nil? || ceph_monitors.empty?   # bail out until the monitor node has been tagged and indexed
ceph_mon = ceph_monitors.first
# ... rest of the ceph-osd recipe, using ceph_mon['fqdn'] or other attributes from that node ...
The same pattern can be used to avoid running the OpenStack recipe until the OSD recipe has run.
The drawback is that it will take 2 or 3 chef-client runs to get to a converged infrastructure.
I have nothing specific to recommend for the orchestration itself; ZooKeeper or Consul could help instead of tags and could be used to trigger the runs.
Rundeck can stage the runs on different nodes and aggregate them into one job.
Which one is best depends on how you feel about them.