How do we restore Kubeflow from backups if the installation is destroyed, or how can we bring Kubeflow back to the way it was if the EKS cluster is destroyed? - amazon-eks

How am I going to take a backup of my Kubeflow pipelines and restore them if the installation fails or the EKS cluster is destroyed? I did some digging to find the image of the vanilla database I am using and to work out how to back it up and restore it, but I haven't had any luck so far.
I have Kubeflow running on an AWS EKS cluster with 15-16 Kubeflow pipelines, and I used the vanilla (in-cluster) database. I now need help with how to back up the pipelines and restore them if anything happens to Kubeflow or to EKS.

If you inspect the Kubeflow manifests, you'll see the list of dependencies Kubeflow has. The largest one is the database. On AWS, you can use RDS as the target for the database rather than the self-hosted one running inside Kubernetes; that way the pipeline metadata lives outside the cluster, and RDS gives you automated backups and point-in-time recovery even if the EKS cluster is destroyed.
You can see the instructions for that here.
I used to be the Product Manager for Open Source MLOps at AWS, and my team wrote that post.
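If you stay on the in-cluster (vanilla) MySQL for now, a minimal sketch of backing up and restoring the Kubeflow Pipelines metadata is to dump the database from the MySQL pod and keep the dump outside the cluster. The namespace, pod label, and database name below ("kubeflow", app=mysql, "mlpipeline") are assumptions based on a default Kubeflow install, so verify them against your deployment:

# Find the MySQL pod backing Kubeflow Pipelines (names assumed, verify first).
MYSQL_POD=$(kubectl -n kubeflow get pod -l app=mysql -o jsonpath='{.items[0].metadata.name}')

# Back up: dump the pipelines database and copy it somewhere durable, e.g. S3.
# Add -p<password> to mysqldump/mysql if your MySQL root user has a password set.
kubectl -n kubeflow exec "$MYSQL_POD" -- sh -c 'mysqldump -u root mlpipeline' > mlpipeline-backup.sql
aws s3 cp mlpipeline-backup.sql s3://my-backup-bucket/kubeflow/mlpipeline-backup.sql

# Restore: after reinstalling Kubeflow on a new cluster, load the dump back in.
aws s3 cp s3://my-backup-bucket/kubeflow/mlpipeline-backup.sql .
kubectl -n kubeflow exec -i "$MYSQL_POD" -- sh -c 'mysql -u root mlpipeline' < mlpipeline-backup.sql

Note that pipeline artifacts (run inputs/outputs) live in the MinIO/S3 object store rather than in MySQL, so back that bucket up separately, or point Kubeflow at S3 so the artifacts survive the cluster.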

Related

In Amazon EKS - how to view logs from before EKS Fargate node creation and logs while a pod is starting up

I'm using Amazon EKS on Fargate. I can see container logs using a Fluent Bit sidecar etc. with no problem at all, but those logs ONLY show what is happening inside the container AFTER it has started up.
I have fully enabled EKS cluster logging.
Now I would like to see, in CloudWatch, the equivalent of the
kubectl describe pod
command.
I have searched the ENTIRE CloudWatch log group for the cluster and am not able to find logs like
"pulling image into container"
"efs not mounted"
etc.
I want to see logs in CloudWatch from before the actual container creation stage.
Is it possible at all using EKS Fargate?
Thanks a bunch
You can use Container Insights, which collects metrics via performance log events in the embedded metric format. The logs are stored in CloudWatch Logs, and CloudWatch automatically generates several metrics from them that you can view in the CloudWatch console.
In Amazon EKS and Kubernetes, Container Insights uses a containerized version of the CloudWatch agent to discover all of the running containers in a cluster. It then collects performance data at every layer of the performance stack.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-view-metrics.html
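On top of that, the pod-startup messages you are after ("pulling image", failed volume mounts, scheduling) are Kubernetes events, which is what kubectl describe pod shows in its Events section, and the control-plane logs you enabled land in the /aws/eks/<cluster-name>/cluster log group. A minimal sketch of digging both out, where my-pod and my-cluster are placeholders:

# Pod lifecycle events: scheduling, image pulls, failed volume mounts, probes.
kubectl get events --field-selector involvedObject.name=my-pod --sort-by=.lastTimestamp

# Control-plane logs (API server, scheduler, authenticator, ...) mentioning the pod,
# from the log group created when cluster logging is enabled.
aws logs filter-log-events \
  --log-group-name /aws/eks/my-cluster/cluster \
  --filter-pattern "my-pod" \
  --max-items 50

Events are only retained in the cluster for a short time (about an hour by default), so if you want them in CloudWatch permanently you would need something that ships them there, such as an event exporter.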

Google cloud kubernetes cluster newbie question

I am a newbie with GKE. I created a GKE cluster with a very simple setup: it has only one GPU node and everything else was left at the defaults. After the cluster came up, I was able to list the nodes and SSH into them. But I have two questions here.
I tried to install the NVIDIA driver using this command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
It output:
kubectl apply --filename https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
daemonset.apps/nvidia-driver-installer configured
But 'nvidia-smi' cannot be found at all. Should I do something else to make it work?
On the worker node, there was no .kube directory and no 'config' file. I had to copy it from the master node to the worker node to make things work, and the config file on the master node updates automatically, so I have to copy it again and again. Did I miss some steps when creating the cluster, or how can I resolve this problem?
I would appreciate it if someone could shed some light on this. It has been driving me crazy after working on it for several days.
Tons of thanks.
Alex.
For the DaemonSet to work, your worker node needs the cloud.google.com/gke-accelerator label (see this line). The DaemonSet checks for this label on a node before scheduling any of the driver-installer pods there. I'm guessing the default node pool you created did not have this label on it. You can find more details on this in the GKE docs here.
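A quick way to confirm is to look for that label on your nodes; if it is missing, creating a node pool with an accelerator attached will make GKE add it for you. The cluster name, zone, GPU type, and pool name below are placeholders:

# Show which nodes (if any) carry the label the driver DaemonSet selects on.
kubectl get nodes -L cloud.google.com/gke-accelerator

# Create a GPU node pool; GKE applies the cloud.google.com/gke-accelerator label
# to these nodes automatically. Adjust names, zone, and GPU type to your project.
gcloud container node-pools create gpu-pool \
  --cluster my-cluster --zone us-central1-a \
  --accelerator type=nvidia-tesla-t4,count=1 \
  --num-nodes 1

# The DaemonSet from your output is named nvidia-driver-installer; it normally
# lives in kube-system (check with --all-namespaces if you don't see it there).
kubectl get daemonset nvidia-driver-installer -n kube-system

Also note that on COS nodes the installer puts the driver binaries outside the default PATH, so nvidia-smi is typically exercised from a pod that requests a GPU rather than from an SSH session on the node.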
The worker nodes, by design, are just that: worker nodes. They do not need privileged access to the Kubernetes API, so they don't need any kubeconfig files. The communication between worker nodes and the API is strictly controlled through the kubelet binary running on the node. Therefore, you will never find kubeconfig files on a worker node, and you should never put them there either: if a node gets compromised, the keys in that file can be used to damage the API server. Instead, make it a habit to either use the master nodes for kubectl commands or, better yet, keep the kubeconfig on your local machine, keep it safe, and issue commands remotely to your cluster.
After all, all you need is access to the API endpoint of your Kubernetes API server, and it shouldn't matter where you access it from as long as the endpoint is reachable. So there is no need whatsoever to have a kubeconfig on the worker nodes :)
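On GKE specifically there is no need to copy config files between nodes at all; a minimal sketch of working from your own machine (cluster name and zone are placeholders):

# Writes/updates an entry for this cluster in your local ~/.kube/config.
gcloud container clusters get-credentials my-cluster --zone us-central1-a

# kubectl then talks to the GKE API endpoint remotely.
kubectl get nodes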

CI/CD pipeline on AWS cloud for pyspark EMR application

I need to create a CI/CD pipeline in the AWS cloud for a PySpark application; ultimately this PySpark job is to be invoked through an Airflow DAG.
I am no expert on this either, but you can follow this guide:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-apache-spark-applications-using-aws/
The idea is to automate job testing in Spark local mode, then run a live job on infrastructure created on the fly, and finally deploy the job to production if all the previous steps succeed. I would keep my production jobs automated in Airflow and run this CI/CD pipeline on development branches (those without deploying to production, of course) as well as on PRs to the main branch. That way your production jobs will always keep functioning correctly and will only incorporate new functionality/changes after they have been fully tested on development branches.
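To illustrate just the local-mode test stage (not taken from that guide; the file layout, bucket, and names here are assumptions), the CI step before anything touches EMR can be as small as:

# Run the PySpark unit tests in Spark local mode inside the CI job.
# Assumes the tests create their own SparkSession with .master("local[*]").
pip install pyspark pytest
pytest tests/

# Only if the tests pass, publish the job script where EMR/Airflow can pick it up;
# the bucket and paths are placeholders.
aws s3 cp src/my_job.py s3://my-artifacts-bucket/spark/my_job.py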

Creating a kubernetes cluster on GCP using Spinnaker

For end-to-end DevOps automation I want to have an environment on demand. For this I need to spin up an environment on Kubernetes, which is eventually hosted on GCP.
My Use case
1. Developer Checks in the code in feature branch
2. Environment is spun up on Google Cloud with Kubernetes
3. Application gets deployed on Kubernetes
4. Gets tested and then the environment gets destroyed.
I am able to do everything with Spinnaker except #2, i.e. creating the Kubernetes cluster on GCP using Spinnaker.
Any help please
Thanks,
Amol
I'm not sure Spinnaker was meant for doing what the second point in your list describes. Spinnaker assumes a collection of resources (VMs or a Kubernetes cluster) already exists and then works with that. So instead of spinning up a new GKE cluster, Spinnaker makes use of existing clusters. I think it'd be better (for your costs as well ;) if you separate the environments using Kubernetes namespaces.
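A minimal sketch of the namespace-per-branch approach (the branch name and manifest path are placeholders to wire into whatever stage runs on your feature branch):

# Create an isolated namespace for the feature branch, deploy into it, test, tear it down.
BRANCH=feature-123
kubectl create namespace "$BRANCH"
kubectl -n "$BRANCH" apply -f k8s/        # your application manifests
# ... run the tests against the app in this namespace ...
kubectl delete namespace "$BRANCH"        # deletes everything deployed inside it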

Automatic cluster setup and app deployment on GCE Kubernetes

We are looking for a solid, declarative (YAML-based) procedure to automate the setup of our Kubernetes cluster and application deployments on Google Container Engine.
As our last resort in a serious failure we want to be able to:
Create a new GCE cluster
Execute all our deployments to their latest versions
Execute all the steps in the correct order
What solutions are people currently using? Doing this manually takes us about an hour and is error prone; automated, it should really take 15-20 minutes.
You should take a look at Google Cloud Deployment Manager. It "automates the creation and management of your Google Cloud Platform resources for you", meaning it can create a Google Container Engine cluster as well as create your deployments.
Looking through the GKE Deployment Manager example should help get you started.
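To give a flavor of the declarative flow, here is a minimal sketch loosely based on the public Deployment Manager examples; the deployment name, cluster name, zone, and node count are placeholders, and a real config would add node pools, networking, and the application manifests:

# cluster.yaml - declare a small GKE cluster as a Deployment Manager resource.
cat > cluster.yaml <<'EOF'
resources:
- name: my-gke-cluster
  type: container.v1.cluster
  properties:
    zone: us-central1-a
    cluster:
      initialNodeCount: 3
EOF

# Create (or later delete) the environment from that single declaration.
gcloud deployment-manager deployments create my-env --config cluster.yaml

# Once the cluster exists, point kubectl at it and apply your app manifests in order.
gcloud container clusters get-credentials my-gke-cluster --zone us-central1-a
kubectl apply -f k8s/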