I am running Spark2 from Zeppelin (0.7 in HDP 2.6) and I am doing an IDF transformation which crashes after many hours. It runs on a cluster with a master and 3 datanodes: s1, s2 and s3. All nodes have a Spark2 client, and each has 8 cores and 16 GB RAM.
I just noticed it is only running on one node, s3, with 5 executors.
In zeppelin-env.sh I have set zeppelin.executor.instances to 32 and zeppelin.executor.mem to 12g and it has the line:
export MASTER=yarn-client
I have set yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
I also set spark.executor.instances to 32 in the Spark2 interpreter.
Anyone have any ideas what else I can try to get the other nodes doing their share?
The answer is that I was being an idiot: only s3 had the DataNode and NodeManager installed. Hopefully this helps someone.
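For anyone hitting something similar: a quick sanity check is to list the NodeManagers YARN actually knows about (standard YARN CLI):
yarn node -list -all
In my case only s3 showed up as a running NodeManager, which is why every executor landed there.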
I have a small question regarding my geth node.
I have started the node on my machine with the following command:
geth --snapshot=false --mainnet --syncmode "full" --datadir=$HOME/.ethereum --port 30302 --http --http.addr localhost --http.port 8543 --ws --ws.port 8544 --ws.api personal,eth,net,web3 --http.api personal,eth,net,web3
A full geth node is currently supposed to take up around 600 GB of storage on my disk, but after checking my used disk space (with du -h on Ubuntu) I spotted this:
Can anyone explain to me why my full node is using 1.4 TB of disk space for chaindata? The node has been running for some time (around two weeks) and is fully synced. I am using Ubuntu 20.04.
Thanks in advance!
You set syncmode to "full" and disabled the snapshot. This gives you an archive node, which is much bigger than 600 GB. You can still get a full (but not archive) node by running with the default snapshot and syncmode settings.
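For example, starting from the command in the question and simply dropping those two flags should give you the defaults (just a sketch; as far as I know the existing oversized chaindata will not shrink on its own, so a fresh sync may be needed to reclaim the space):
geth --mainnet --datadir=$HOME/.ethereum --port 30302 --http --http.addr localhost --http.port 8543 --ws --ws.port 8544 --ws.api personal,eth,net,web3 --http.api personal,eth,net,web3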
I am running an EMR cluster with 3 m5.xlarge nodes (1 master, 2 core) and Flink 1.8 installed (emr-5.24.1).
On the master node I start a Flink session on the YARN cluster using the following command:
flink-yarn-session -s 4 -jm 12288m -tm 12288m
That is the maximum memory and number of slots per TaskManager that YARN lets me set up based on the selected instance types.
During startup there is a log:
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=12288, taskManagerMemoryMB=12288, numberTaskManagers=1, slotsPerTaskManager=4}
This shows that there is only one task manager. Also, when looking at the YARN Node Manager, I see that there is only one container running on one of the core nodes. The YARN Resource Manager shows that the application is using only 50% of the cluster.
With the current setup I would assume that I can run a Flink job with parallelism set to 8 (2 TaskManagers * 4 slots), but if the submitted job has its parallelism set to more than 4, it fails after a while because it cannot get the desired resources.
If the job parallelism is set to 4 (or less), the job runs as it should. Looking at CPU and memory utilisation with Ganglia, only one node is utilised while the other stays flat.
Why does the application run on only one node, and how can I get the other node utilised as well? Do I need to configure something in YARN so that it sets up Flink on the other node too?
In previous versions of Flink there was a startup option -n which was used to specify the number of task managers. That option is now obsolete.
When you're starting a 'Session Cluster', you should initially see only one container, which is used for the Flink Job Manager. This is probably what you see in the YARN Resource Manager. Additional containers will automatically be allocated for Task Managers once you submit a job.
How many cores do you see available in the Resource Manager UI?
Don't forget that the Job Manager also uses cores out of the available 8.
You need to do a little "Math" here.
For example, if you had set the number of slots to 2 per TM and less memory per TM, and then submitted a job with a parallelism of 6, it should have worked with 3 TMs.
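A rough sketch of that variant (the memory values are illustrative and your-job.jar is a placeholder; adjust to what YARN actually allows on m5.xlarge):
flink-yarn-session -s 2 -jm 2048m -tm 5120m -d
flink run -p 6 your-job.jar
With 2 slots per TM, a job with parallelism 6 makes the session request 3 TaskManager containers, which YARN can then spread across both core nodes.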
I am trying to understand K8s GPU practices better and am implementing a small K8s GPU cluster which is supposed to work as described below.
This is going to be a somewhat long explanation, but I hope it helps to have many questions in one place to understand GPU practices in Kubernetes better.
Application Requirement
I want to create a K8s autoscaling cluster.
Pods run the models, say a TensorFlow-based deep learning program.
Pods wait for a message to appear in a Pub/Sub queue and proceed with execution once a message is received.
Now a message is queued in the Pub/Sub queue.
As soon as a message is available, a pod reads it and executes the deep learning program.
Cluster requirement
If no message is present in the queue and none of the GPU-based pods is executing a program (i.e. not using its GPU), then the GPU node pool should scale down to 0.
Design 1
Create a GPU node pool. Each node contains N GPUs, where N >= 1.
Assign a model-trainer pod to each GPU, i.e. a 1:1 mapping of pods to GPUs.
I tried assigning 2 pods to a 2-GPU machine, where each pod is supposed to run an MNIST program.
What I noticed is:
One pod got allocated, executed the program and later went into a crash loop. Maybe I am making some mistake, as my Docker image is supposed to run the program only once, and I was just doing a feasibility test of running 2 pods simultaneously on 2 GPUs of the same node. Below is the error:
Message Reason First Seen Last Seen Count
Back-off restarting failed container BackOff Jun 21, 2018, 3:18:15 PM Jun 21, 2018, 4:16:42 PM 143
pulling image "nkumar15/mnist" Pulling Jun 21, 2018, 3:11:33 PM Jun 21, 2018, 3:24:52 PM 5
Successfully pulled image "nkumar15/mnist" Pulled Jun 21, 2018, 3:12:46 PM Jun 21, 2018, 3:24:52 PM 5
Created container Created Jun 21, 2018, 3:12:46 PM Jun 21, 2018, 3:24:52 PM 5
Started container Started Jun 21, 2018, 3:12:46 PM Jun 21, 2018, 3:24:52 PM 5
The other pod didn't get assigned to a GPU at all. Below is the message from the pod events:
0/3 nodes are available: 3 Insufficient nvidia.com/gpu.
Design 2
Have multiple GPU machines in the GPU node pool, with each node having only 1 GPU.
K8s will assign each pod to an available GPU in a node, and hopefully there won't be any issues. I have yet to try this.
Questions
1) Is there any suggested practice for designing the above type of system in Kubernetes as of version 1.10?
2) Is the Design 1 approach not feasible as of the 1.10 release? For example, I have a 2-GPU node with 24 GB of GPU memory; is it possible for K8s to assign 1 pod to each GPU, with each pod executing its own workload under a 12 GB memory limit?
3) How do I scale the GPU node pool down to size 0 through the autoscaler?
4) In Design 2, what if I run out of GPU memory? Currently in GCP a single GPU node doesn't have more than 16 GB of memory.
Again, apologies for such a long question, but I hope it will help others too.
Updates
For question 2
I created a new cluster to reproduce the same issue, which I had faced multiple times before. I am not sure what changed this time, but the 2nd pod was successfully allocated a GPU. I think with this result I can confirm that a 1-GPU-to-1-pod mapping is allowed on a single multi-GPU node.
However, restricting memory per GPU process is not feasible as of 1.10.
Both designs are supported in 1.10. I view Design 2 as a special case of Design 1: you don't necessarily need to have 1 GPU per node. If your pods need more GPUs and memory, you have to have multiple GPUs per node, as you mentioned in question (4). I'd go with Design 1 unless there's a reason not to.
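As a sketch of the 1:1 mapping in Design 1 (the pod name and restartPolicy are my assumptions, the image is the one from the question): each pod simply requests a single GPU, and two such pods can be scheduled onto one 2-GPU node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mnist-trainer-1          # hypothetical name
spec:
  restartPolicy: OnFailure       # a run-once training container restarted under the default Always policy can show up as a crash loop
  containers:
  - name: trainer
    image: nkumar15/mnist
    resources:
      limits:
        nvidia.com/gpu: 1        # one GPU per pod
EOF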
I think the best practice would be to create a new cluster with no GPUs (a cluster has a default node pool), and then create a GPU node pool and attach it to the cluster. Your non-GPU workload can run in the default pool, and the GPU workload can run in the GPU pool. To support scaling down to 0 GPUs, you need to set --num-nodes and --min-nodes to 0 when creating the GPU node pool; see the sketch after the links below.
Docs:
Create a cluster with no GPUs: https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-cluster#creating_a_cluster
Create a GPU node pool for an existing cluster: https://cloud.google.com/kubernetes-engine/docs/concepts/gpus#gpu_pool
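A minimal sketch of that setup with the gcloud CLI (cluster name, accelerator type and the maximum node count are placeholders; GPU nodes on GKE also need the NVIDIA driver DaemonSet described in the GPU docs above):
gcloud container clusters create my-cluster --num-nodes 3
gcloud container node-pools create gpu-pool \
  --cluster my-cluster \
  --accelerator type=nvidia-tesla-k80,count=1 \
  --num-nodes 0 \
  --enable-autoscaling --min-nodes 0 --max-nodes 3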
I am trying to deploy a production cluster for my Flink program. I am using a standard hadoop-core EMR cluster with Flink 1.3.2 installed, using YARN to run it.
I am trying to configure RocksDB to write my checkpoints to an S3 bucket. I am going through these docs: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#set-s3-filesystem. The problem seems to be getting the dependencies working correctly. I receive this error when trying to run the program:
java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:93)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.initialize(HadoopFileSystem.java:328)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:350)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:389)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293)
at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory.<init>(FsCheckpointStreamFactory.java:99)
at org.apache.flink.runtime.state.filesystem.FsStateBackend.createStreamFactory(FsStateBackend.java:282)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createStreamFactory(RocksDBStateBackend.java:273)
I have tried both adjusting core-site.xml and leaving it as is. I have tried setting HADOOP_CLASSPATH to /usr/lib/hadoop/share, which contains (what I assume are) most of the JARs described in the above guide. I tried downloading the Hadoop 2.7.2 binaries and copying them into the flink/lib directory. All of this results in the same error.
Has anyone successfully gotten Flink to write to S3 on EMR?
EDIT: My cluster setup
AWS Portal:
1) EMR -> Create Cluster
2) Advanced Options
3) Release = emr-5.8.0
4) Only select Hadoop 2.7.3
5) Next -> Next -> Next -> Create Cluster (I do fill out names/keys/etc.)
Once the cluster is up I ssh into the Master and do the following:
1 wget http://apache.claz.org/flink/flink-1.3.2/flink-1.3.2-bin-hadoop27-scala_2.11.tgz
2 tar -xzf flink-1.3.2-bin-hadoop27-scala_2.11.tgz
3 cd flink-1.3.2
4 ./bin/yarn-session.sh -n 2 -tm 5120 -s 4 -d
5 Change conf/flink-conf.yaml
6 ./bin/flink run -m yarn-cluster -yn 1 ~/flink-consumer.jar
In my conf/flink-conf.yaml I add the following fields:
state.backend: rocksdb
state.backend.fs.checkpointdir: s3:/bucket/location
state.checkpoints.dir: s3:/bucket/location
My program's checkpointing setup:
env.enableCheckpointing(getCheckpointRate,CheckpointingMode.EXACTLY_ONCE)
env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
env.getCheckpointConfig.setMinPauseBetweenCheckpoints(getCheckpointMinPause)
env.getCheckpointConfig.setCheckpointTimeout(getCheckpointTimeout)
env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
env.setStateBackend(new RocksDBStateBackend("s3://bucket/location", true))
If there are any steps you think I am missing, please let me know.
I assume that you installed Flink 1.3.2 on your own on the EMR Yarn cluster, because Amazon does not yet offer Flink 1.3.2, right?
Given that, it seems as if you have a dependency conflict. The method org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration) was only introduced with Hadoop 2.4.0. Therefore I assume that you have deployed a Flink 1.3.2 version which was built with Hadoop 2.3.0. Please deploy a Flink version which was built with the Hadoop version running on EMR. This will most likely solve all dependency conflicts.
Putting the Hadoop dependencies into the lib folder seems to not reliably work because the flink-shaded-hadoop2-uber.jar appears to have precedence in the classpath.
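If it helps, a quick way to double-check the match on the EMR master is to compare the cluster's Hadoop version with the Flink binary you deploy (the archive URL is my assumption; the file name is the same one used above):
hadoop version
wget http://archive.apache.org/dist/flink/flink-1.3.2/flink-1.3.2-bin-hadoop27-scala_2.11.tgz
For emr-5.8.0 with Hadoop 2.7.3, the -hadoop27 build is the matching one.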
I have Proxmox VE 3.0 with 2x1TB HDDs in a mirror:
I have one huge container (~430 GB) that is located in local. I need to:
extend local with the new 2x1TB disks,
or create a new storage (on the new disks) and migrate the container to the new storage.
How can this be done without reinstalling Proxmox?
Thanks!
P.S. I can create a new storage item as LVM, but LVM is not suitable for containers.
Updating to the latest release should fix this ;)
https://forum.proxmox.com/threads/proxmox-ve-6-2-released.69647/
Proxmox VE 3 runs on Debian 7, whose support ended in 2018. Your system is vulnerable like hell, and even if you keep it, you will probably hit a wall as soon as you need to install ANY new package: all the repositories have already been unavailable for about a year.
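Whatever you decide, back up that big container before touching the old install; on PVE 3.x something like vzdump should still work (the container ID and storage name are placeholders):
vzdump 101 --mode suspend --storage backup-store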