Never successfully built a large Hadoop & Spark cluster - google-hadoop

I was wondering if anybody could help me with this issue in deploying a Spark cluster using the bdutil tool.
When the total number of cores increases (>= 1024), the deployment fails every time for one of the following reasons:
Reason 1: some machines are never sshable, e.g. "Tue Dec 8 13:45:14 PST 2015: 'hadoop-w-5' not yet sshable (255); sleeping"
Reason 2: some nodes fail with an "Exited 100" error when the Spark worker nodes are deployed, e.g. "Tue Dec 8 15:28:31 PST 2015: Exited 100 : gcloud --project=cs-bwamem --quiet --verbosity=info compute ssh hadoop-w-6 --command=sudo su -l -c "cd ${PWD} && ./deploy-core-setup.sh" 2>>deploy-core-setup_deploy.stderr 1>>deploy-core-setup_deploy.stdout --ssh-flag=-tt --ssh-flag=-oServerAliveInterval=60 --ssh-flag=-oServerAliveCountMax=3 --ssh-flag=-oConnectTimeout=30 --zone=us-central1-f"
In the log file, it says:
hadoop-w-40: ==> deploy-core-setup_deploy.stderr <==
hadoop-w-40: dpkg-query: package 'openjdk-7-jdk' is not installed and no information is available
hadoop-w-40: Use dpkg --info (= dpkg-deb --info) to examine archive files,
hadoop-w-40: and dpkg --contents (= dpkg-deb --contents) to list their contents.
hadoop-w-40: Failed to fetch http://httpredir.debian.org/debian/pool/main/x/xml-core/xml-core_0.13+nmu2_all.deb Error reading from server. Remote end closed connection [IP: 128.31.0.66 80]
hadoop-w-40: E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
I tried 16-core 128-node, 32-core 64-node, 32-core 32-node, and other configurations of 1024 cores or more, but either Reason 1 or Reason 2 above always shows up.
I also tried modifying the ssh-flag to raise ConnectTimeout to 1200s, and editing bdutil_env.sh to set the polling interval to 30s, 60s, and so on; none of it works. There are always some nodes that fail.
Here is one of the configurations that I used:
time ./bdutil \
--bucket $BUCKET \
--force \
--machine_type n1-highmem-32 \
--master_machine_type n1-highmem-32 \
--num_workers 64 \
--project $PROJECT \
--upload_files ${JAR_FILE} \
--env_var_files hadoop2_env.sh,extensions/spark/spark_env.sh \
deploy

To summarize some information that came out of a separate email discussion: as IP mappings change and different Debian mirrors get assigned, there can be occasional problems where the concurrent calls to apt-get install during a bdutil deployment either overload some unbalanced servers or trigger DDoS protections, leading to deployment failures. These do tend to be transient, and at the moment it appears I can again deploy large clusters successfully in zones like us-east1-c and us-east1-d.
There are a few options you can take to reduce the load on the Debian mirrors:
Set MAX_CONCURRENT_ASYNC_PROCESSES inside bdutil_env.sh to a much smaller value than the default of 150, such as 10, to deploy only 10 machines at a time; this makes the deployment take longer, but lightens the load as if you had done several back-to-back 10-node deployments (see the sketch after these options).
If the VMs were successfully created but the deployment steps fail, then instead of retrying the whole delete/deploy cycle, you can try ./bdutil <all your flags> run_command -t all -- 'rm -rf /home/hadoop' followed by ./bdutil <all your flags> run_command_steps to rerun the whole deployment attempt.
Incrementally build your cluster using resize_env.sh: initially set --num_workers 10 and deploy your cluster, then edit resize_env.sh to set NEW_NUM_WORKERS=20 and run ./bdutil <all your flags> -e extensions/google/experimental/resize_env.sh deploy; this deploys only the new workers 10-20 without touching the first 10. Then repeat, adding another 10 workers to NEW_NUM_WORKERS each time. If a resize attempt fails, you simply run ./bdutil <all your flags> -e extensions/google/experimental/resize_env.sh delete to delete only those extra workers without affecting the ones you already deployed successfully.
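As a hedged sketch of options 1 and 3 above (flags abbreviated to <all your flags>; the worker counts are illustrative):
# Option 1: in bdutil_env.sh, throttle the deployment concurrency
MAX_CONCURRENT_ASYNC_PROCESSES=10
# Option 3: grow the cluster in increments of 10 workers
./bdutil <all your flags> --num_workers 10 deploy
# edit extensions/google/experimental/resize_env.sh: NEW_NUM_WORKERS=20
./bdutil <all your flags> -e extensions/google/experimental/resize_env.sh deploy
# repeat with NEW_NUM_WORKERS=30, 40, ... until you reach your target size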
Finally, if you're looking for more reproducible and optimized deployments, consider using Google Cloud Dataproc, which lets you use the standard gcloud CLI to deploy clusters, submit jobs, and further manage/delete clusters without needing to remember your bdutil flags or keep track of which clusters you have on your client machine. You can SSH into Dataproc clusters and use them in basically the same way as bdutil clusters, with some minor differences; for example, Dataproc's DEFAULT_FS is HDFS, so any GCS paths you use should fully specify the complete gs://bucket/object name.
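For example, a minimal Dataproc flow with gcloud (the cluster name is illustrative, and flags may vary by gcloud version):
gcloud dataproc clusters create my-cluster --num-workers 64 --zone us-central1-f
gcloud dataproc jobs submit spark --cluster my-cluster --jar ${JAR_FILE}
gcloud dataproc clusters delete my-cluster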

Related

Kubernetes rolling update with updating value in deployment file

I wanted to share a solution I built with Kubernetes and get your opinion on the best practice in such a case. I'm still new to Kubernetes.
My problem: I wanted to be able to update my application by restarting my deployment's pod, which already executes all the necessary update actions in its start command.
I'm using microk8s, and I wanted to just go to the right folder, execute microk8s kubectl apply -f myfilename, and let Kubernetes handle the rest with a rolling update.
My issue was how to set a dynamic value inside my .yaml file so that the apply command would detect a change and start the rollout.
I planned to write a bash script that does the job, like the following:
file="my-file-deployment.yaml"
oldstr=`grep 'my' $file | xargs`
timestamp="$(date +"%Y-%m-%d-%H:%M:%S")"
newstr="value: my-version-$timestamp"
sed -i "s/$oldstr/$newstr/g" $file
echo "old version : $oldstr"
echo "Replaced String : $newstr"
sudo microk8s kubectl apply -f $file
In my deployment.yaml file I set the following env:
env:
  - name: version
    value: my-version-2022-09-27-00:57:15
I switch the timestamp to a new value, then launch the command:
microk8s kubectl apply -f myfilename
It is working great for the moment. I still have to configure a startupProbe for a smoother rolling update, because I currently get a few seconds of downtime, which isn't ideal.
Is there a better solution to work with rolling update using microk8s?
If you are trying to trigger a rolling update on your deployment (assuming it is a deployment), you can patch the deployment and let the cluster handle the rollout. Here's a trick I use and it's literally a one-liner:
kubectl -n {namespace} patch deployment {name-of-your-deployment} \
-p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}"
This will patch your deployment, adding an annotation to the template block. In this way, the cluster thinks there is a change requiring an update to the deployment's pods, and will cycle them while following the rollingUpdate clause.
The date +'%s' will resolve to a different number each time so every time you run this, it will cause the cluster to cycle the deployment's pods.
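For example, it simply prints the Unix epoch time in seconds (output is illustrative):
$ date +'%s'
1664236635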
We use this trick to force a rolling update when we have done an update that requires our pods to be restarted.
You can accompany this with the rollout status command to wait for the update to complete:
kubectl rollout status deployment/{name-of-your-deployment} -n {namespace}
So a complete line would be something like this if I wanted to rolling update my nginx deployment and wait for it to complete:
kubectl -n nginx patch deployment nginx \
-p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" \
&& kubectl rollout status deployment/nginx -n nginx
One caveat, though. Using kubectl patch does not make changes to the yamls on disk, so if you want a copy of the change recorded locally, such as for auditing purposes (similar to what you are doing at the moment), you can adapt this to do a dry-run and redirect the output to a file:
kubectl -n nginx patch deployment nginx \
-p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" \
--dry-run=client \
-o yaml >patched-nginx.yaml

podman CentOS 8 not starting container as non-root user

I am trying to start a busybox container as a non-root user on a CentOS 8 server, but it gives the message below.
What is the correct way to start the container as non-root user?
podman run -it --name busy docker.io/library/busybox sh
Trying to pull docker.io/library/busybox...
Getting image source signatures
Copying blob bdbbaa22dec6 done
Copying config 6d5fcfe5ff done
Writing manifest to image destination
Storing signatures
ERRO[0003] Error pulling image ref //busybox:latest: Error committing the finished image: error adding layer with blob "sha256:bdbbaa22dec6b7fe23106d2c1b1f43d9598cd8fc33706cc27c1d938ecd5bffc7": Error processing tar file(exit status 1): there might not be enough IDs available in the namespace (requested 65534:65534 for /home): lchown /home: invalid argument
Failed
Error: unable to pull docker.io/library/busybox: unable to pull image: Error committing the finished image: error adding layer with blob "sha256:bdbbaa22dec6b7fe23106d2c1b1f43d9598cd8fc33706cc27c1d938ecd5bffc7": Error processing tar file(exit status 1): there might not be enough IDs available in the namespace (requested 65534:65534 for /home): lchown /home: invalid argument
Yes, the command you run is correct. On my Fedora 31 system it works just fine.
[testuser@fedora31 ~]$ podman run -it --name busy docker.io/library/busybox sh
Trying to pull docker.io/library/busybox...
Getting image source signatures
Copying blob bdbbaa22dec6 done
Copying config 6d5fcfe5ff done
Writing manifest to image destination
Storing signatures
/ # exit
[testuser@fedora31 ~]$ podman --version
podman version 1.8.0
[testuser@fedora31 ~]$
The flag --rm is also often useful.
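For example, the same run with the container automatically removed on exit:
podman run -it --rm --name busy docker.io/library/busybox sh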
It seems the error you get is related to the UID mapping.
Here is some information regarding running "rootless" podman:
https://github.com/containers/libpod/blob/master/docs/tutorials/rootless_tutorial.md
What also might be interesting:
"Does not work on NFS or parallel filesystem homedirs"
Quote from
https://github.com/containers/libpod/blob/master/rootless.md
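If the root cause is missing subordinate ID ranges, a minimal sketch of the usual fix (the 100000-165535 range is only a common default; see the rootless tutorial above for details):
sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 $USER
podman system migrate   # have podman pick up the new /etc/subuid and /etc/subgid entries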

gke cant disable Transparent Huge Pages... permission denied

I am trying to run a redis image in gke. It works except I get the dreaded "Transparent Huge Pages" warning:
WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
Redis is currently too slow to be useful... so I tried turning off THP:
sheena@gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
sheena@gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ echo never > /sys/kernel/mm/transparent_hugepage/enabled
-bash: /sys/kernel/mm/transparent_hugepage/enabled: Permission denied
sheena@gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ sudo echo never > /sys/kernel/mm/transparent_hugepage/enabled
-bash: /sys/kernel/mm/transparent_hugepage/enabled: Permission denied
These permission errors are disconcerting. Redis wants THP off so it can work properly.
I did a little digging and found that Google uses a special OS image that makes /sys/ a read-only path. There's an alternative image based on Debian 7. It got me all excited, but in the end I have exactly the same problem.
So how do I stop Redis from being affected by THP on Google Container Engine?
It's not like I'm doing something unique here. Running databases in containers is pretty normal, and it's pretty normal for a database to malfunction when THP is enabled. So... what am I missing here?
Your command is slightly incorrect: echo runs as root, but the redirection itself (>) runs as your unprivileged user, so it can't write to /sys/.
The following command works fine both on container-vm (Debian-based) and gci (ChromeOS-based):
sudo sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
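An equivalent alternative is to pipe into tee, since tee runs under sudo and opens the file itself:
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled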
Persisting this setting on container-vm
Add this kernel command line parameter into /etc/default/grub (don't forget to run sudo update-grub and sudo reboot afterwards, as shown below):
GRUB_CMDLINE_LINUX="... transparent_hugepage=never"
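Then apply the change and reboot:
sudo update-grub
sudo reboot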
Persisting this setting on gci
First, using the Cloud Console, copy the instance template that is in use by the node pool.
Second, under metadata, change the value for userdata:
#cloud-config
write_files:
  - path: /etc/systemd/system/hugepage.service
    permissions: 0644
    owner: root
    content: |
      [Unit]
      Description=Disable THP

      [Service]
      Type=oneshot
      ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"

      [Install]
      WantedBy=kubernetes.target
...
runcmd:
  - ...
  - systemctl enable hugepage.service
  - systemctl start kubernetes.target
Third, change the instance template to the newly created one:
gcloud compute instance-groups managed set-instance-template \
gke-YOUCLUSTER-YOURPOOL-grp \
--template=YOURNEWTEMPLATENAME \
--zone=...
Fourth, recreate the instance(s):
gcloud compute instance-groups managed recreate-instances \
gke-YOUCLUSTER-YOURPOOL-grp \
--zone=... \
--instances=...
The instances will lose all data and come up with THP disabled. All new instances in this node pool will have THP disabled as well.

Upgrade Redis cluster Ubuntu

I have installed Redis cluster 3.0.0 but want to upgrade it to 3.0.7. Can somebody tell me the steps to do it?
I don't want to lose any data, and I don't want any downtime either.
These are the steps I took when upgrading from 2.9.101 to the 3.0 release. I hope they will do for upgrading to 3.0.7 too.
Compile 3.0.7 from source and start several instances with cluster enabled.
Let the 3.0.7 instances replicate the 3.0.0 instances as slaves.
Connect to each 3.0.7 instance and do a manual failover; the 3.0.0 masters become slaves after several seconds (see the sketch after these steps).
Wait for your application to connect to the new masters; also check the configuration files and modify the entries to point at the new masters as needed.
Remove those slaves.
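A hedged sketch of steps 2 and 3 with redis-cli (the port and node ID are illustrative; run both commands against the 3.0.7 instance):
# step 2: make the 3.0.7 instance on port 7006 a slave of a 3.0.0 master
redis-cli -p 7006 cluster replicate <3.0.0-master-node-id>
# step 3: manual failover; the old master becomes a slave after several seconds
redis-cli -p 7006 cluster failover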
UPDATE: Docker approach
As you probably can't replace the binary executable while the process is still alive, you can instead run Redis in Docker.
First, install Docker on your machine and pull the Redis image, or pull a basic OS image and build Redis 3.0.7 in it manually, whichever you prefer.
Based on this image, you are supposed to
copy your current redis.conf into it
make sure the dir exists in the image (cluster-config-file can be the same for all the containers, as each is saved separately in its own filesystem)
make sure the directory for logfile exists and is not the same as dir (we will later map this directory to the host)
leave port and logfile set to anything you like, as they are specified when a container is started
commit the image as redis-3.0.7 (see the sketch below)
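For instance, assuming you prepared the image in a container named redis-build (the name is hypothetical):
docker commit redis-build redis-3.0.7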
Now launch a containerized Redis. Suppose your logfile is located in /var/log/redis/, this Redis binds to :8000, and your config file in the image is /etc/redis/redis.conf:
docker run -d --net=host -v /var/log/redis:/var/log/redis \
-p 8000:8000 -t redis-3.0.7 \
/usr/bin/redis-server /etc/redis/redis.conf \
--port 8000 \
--logfile /var/log/redis/redis_8000.log
Now you have a Redis 3.0.7 instance and are ready to finish the remaining steps from the first part.

Docker replicate UID/GID in container from host

When creating Docker containers I keep running into the issue of the UID/GID not being reflected in the container (I realize this is by design). What I am looking for is a way to keep host permissions reasonable and / or to replicate the UID/GID from the host user / group accounts in my Docker container. For instance:
host -
woot4moo:x:504:504:woot4moo:/home/woot4moo:/bin/bash
I would like this same behavior in the Docker container. That being said, is this even the right way to do this type of thing? My belief is I could simply run:
useradd -u 504 -g 504 woot4moo
as part of my Dockerfile, but I am not sure if that is valid.
You wouldn't want to run that as part of the image build process (in your Dockerfile), because the host on which someone is running a container is often not the host on which you are building the image.
One way of solving this is passing in UID/GID information via environment variables:
docker run -e APP_UID=100 -e APP_GID=100 ...
And then have an ENTRYPOINT script that includes something like the following before running the CMD:
useradd -c 'container user' -u $APP_UID -g $APP_GID appuser
chown -R $APP_UID:$APP_GID /app/data
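A fuller sketch of such an entrypoint (assuming su-exec, mentioned in the next answer, is installed in the image; the appuser name and /app/data path are illustrative):
#!/bin/sh
set -e
# create a group and user matching the IDs passed in from the host
groupadd -g "$APP_GID" appuser
useradd -c 'container user' -u "$APP_UID" -g "$APP_GID" appuser
chown -R "$APP_UID:$APP_GID" /app/data
# drop root and hand control to the image's CMD
exec su-exec appuser "$@"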
I had similar issues and typically included entrypoint scripts in every image, as has already been mentioned (using https://github.com/ncopa/su-exec for interactive terminal programs). However, I kept repeating the same steps in multiple Dockerfiles. After using "docker.inside" from Jenkins Pipeline, which handles the user id mapping auto-magically, I decided to build a Python 3 package based on docker-py that does this in a (hopefully) similar way (with some extended features I found helpful):
https://github.com/boon-code/docker-inside
I realize that the post is rather old; maybe it's still helpful to someone with the same problem...