Hadoop JobTracker memory usage increasing

When I open jobtrackerhost:50030/jobtracker.jsp
I can see the heap size, for example:
Cluster Summary (Heap Size is 1.17 GB/7.99 GB)
It keeps increasing; after 3-5 days it reaches the peak.
We have 2 Hadoop clusters.
On cluster A, the heap size stops increasing around the peak.
On cluster B, the heap size keeps increasing, and after 3-5 days the JobTracker goes down (the process is gone).
Now I really want to know why the heap size keeps increasing. Is it normal, or is it a problem?
Thanks,
Xinsong

@vefthym, I think the mapred-site.xml file is not complete, since the Hadoop cluster is managed by Cloudera Manager.
Here is the content of the mapred-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2013-07-01T01:39:46.361Z-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://xxxx.com:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
</configuration>
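For what it's worth, in MRv1 the JobTracker holds metadata for completed and retired jobs in memory, so some heap growth between restarts is expected; if it grows until the process dies, the retention settings are worth checking. A sketch of the mapred-site.xml properties that, to my knowledge, bound that growth (values are illustrative, and in a Cloudera Manager setup they would presumably be set through CM rather than by editing the file by hand):
<property>
<name>mapred.jobtracker.completeuserjobs.maximum</name>
<value>25</value>
<!-- completed jobs kept in JobTracker memory per user before being retired -->
</property>
<property>
<name>mapred.job.tracker.retiredjobs.cache.size</name>
<value>100</value>
<!-- retired jobs cached in memory for the JobTracker web UI -->
</property>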

Related

Resources are not exhausted, so why is YARN waiting for resources?

I have a 3-node cluster (1 ResourceManager, 2 NodeManagers with 32 GB each).
I just followed the instructions from the Apache Flink YARN Setup doc. When I started the YARN session, flink run reported that there were not enough resources to start the AM. After I stopped the YARN session, flink run worked.
Application is Activated, waiting for resources to be assigned for AM. Last Node which was processed for the application : flink02.myminda.com:39826 ( Partition : [], Total resource : , Available resource : ). Details : AM Partition = ; Partition Resource = ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 18.333334 % ; Queue's Absolute max capacity = 100.0 % ;
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>flink01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>30720</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.log.aggregation-enable</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>4</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>10240</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx8192m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>10240</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx8192m</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>10240</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx8192m</value>
</property>
</configuration>

When YARN is running, the submitted Hadoop job gets stuck in the ACCEPTED state

I am using VirtualBox to run an Ubuntu 14 VM on a Windows laptop. I have configured the Apache distribution of HDFS and YARN for a single node. When I run DFS and YARN, all the required daemons are running. When I don't configure YARN and run DFS only, I can execute a MapReduce job successfully, but when I run YARN as well, the job gets stuck in the ACCEPTED state. I tried many changes to the node's memory settings, but no luck.
I followed this link to set up the single node:
https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/SingleCluster.html
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
settings of hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/shaileshraj/hadoop/name/data</value>
</property>
</configuration>
settings of mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
settings of yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2200</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
</configuration>
RM Web UI
Here is the Application Master screen of the RM Web UI. From what I can see, the AM container is not allocated; maybe that is the problem.
If the job is not getting enough resources, it stays in the ACCEPTED state; whenever it gets resources, it changes to the RUNNING state.
In your case, open the ResourceManager Web UI and check how many resources are available to run jobs.
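A common cause on small single-node setups like this one (a guess from the numbers above, not something confirmed by your screenshots) is the CapacityScheduler's cap on how much memory Application Masters may use: with 2200 MB on the node and the default 10% AM limit, the default MapReduce AM container cannot be allocated. A sketch of raising the cap in capacity-scheduler.xml (the value is illustrative):
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.5</value>
<!-- allow up to 50% of the queue's memory to be used by Application Masters -->
</property>
Alternatively, shrinking yarn.app.mapreduce.am.resource.mb in mapred-site.xml so the AM container fits on the 2200 MB node can have the same effect.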

Yarn-site.xml changes not reflecting

We have an application managed by YARN. When we change yarn-site.xml, those changes are not reflected; the application is still running with the old configuration. We are new to YARN, so any help in this regard would be appreciated.
Note: we have already tried restarting YARN using stop-yarn.sh and start-yarn.sh, and also restarted DFS using start-dfs.sh and stop-dfs.sh. We are using Hadoop 2.7.3.
This is what YARN shows: only 16 GB of maximum memory configured, as shown in the picture, but the actual configuration is ~22 GB according to yarn-site.xml.
This is the yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hdfs-name-node</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>21528</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>6</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>21528</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///tmp/hadoop/data/nm-local-dir,file:///tmp/hadoop/data/nm-local-dir/filecache,file:///tmp/hadoop/data/nm-local-dir/usercache</value>
</property>
<property>
<name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
<value>500</value>
</property>
<property>
<name>yarn.nodemanager.localizer.cache.target-size-mb</name>
<value>512</value>
</property>
</configuration>
This is the node configuration:
1 Master/Driver Node: Memory: 24 GB, Cores: 8
4 Worker Nodes: Memory: 24 GB, Cores: 8

My Yarn Map-Reduce Job is taking a lot of time

Input File size : 75GB
Number of Mappers : 2273
Number of reducers : 1 (As shown in the web UI)
Number of splits : 2273
Number of Input files : 867
Cluster : Apache Hadoop 2.4.0
5 nodes cluster, 1TB each.
1 master and 4 Datanodes.
It's been 4 hours now and still only 12% of the map phase is complete. Given my cluster configuration, does this make sense, or is there something wrong with the configuration?
Yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>The hostname of the RM.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>32</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
This is a MapReduce job where I am using multiple outputs, so the reducer will emit multiple files. Each machine has 15 GB RAM. 8 containers are running. Total memory available is 32 GB in the RM Web UI.
Any guidance is appreciated. Thanks in advance.
A few points to check:
The block and split sizes seem very small for the data volume you shared. Try increasing both to an optimal level (a configuration sketch follows this list).
If you are not already doing so, use a custom partitioner that spreads your data uniformly across reducers.
Consider using a combiner.
Consider using appropriate compression (for example, when storing mapper output).
Use an optimal block replication factor.
Increase the number of reducers as appropriate.
These will help increase performance. Give it a try and share your findings!
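On the first point, a sketch of how the split and block sizes could be raised (the values are illustrative, the property names assume Hadoop 2.x, and dfs.blocksize belongs in hdfs-site.xml and only affects newly written files):
<property>
<name>mapreduce.input.fileinputformat.split.minsize</name>
<value>268435456</value>
<!-- 256 MB minimum split size: fewer, larger map tasks than the current 2273 -->
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
<!-- 256 MB block size for files written from now on (hdfs-site.xml) -->
</property>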
Edit 1: Try to compare the log generated by a successful map task with that of a long-running map task attempt (12% means ~272 map tasks completed). That will tell you where it got stuck.
Edit 2: Tweak these parameters: yarn.scheduler.minimum-allocation-mb, yarn.scheduler.maximum-allocation-mb, yarn.nodemanager.resource.memory-mb, mapreduce.map.memory.mb, mapreduce.map.java.opts, mapreduce.reduce.memory.mb, mapreduce.reduce.java.opts, mapreduce.task.io.sort.mb, mapreduce.task.io.sort.factor
These should improve the situation; take a trial-and-error approach.
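As a rough illustration only (the numbers below are arbitrary starting points for 15 GB nodes, not recommendations for your workload), such a tuning pass in mapred-site.xml might look like:
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
<!-- container size for map tasks -->
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1638m</value>
<!-- JVM heap roughly 80% of the container size -->
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3276m</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
<!-- larger sort buffer means fewer map-side spills -->
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>64</value>
<!-- merge more spill files per pass -->
</property>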
Also refer: Container is running beyond memory limits
Edit 3: Try to understand part of the logic, convert it to a Pig script, execute it, and see how it behaves.

NoClassDefFoundError HBase with YARN

I know this is one of those topics that gets asked a lot. Still, after digging into all of the topics I could find (most of them talking about CLASSPATH), I can't solve mine.
Examples of the topics I found and tried:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
java.lang.NoClassDefFoundError with HBase Scan
I'm using Hadoop 2.5.1 with HBase 0.98.11 on Ubuntu 14.04
I set up pseudo-distributed mode and ran Hadoop with HBase successfully. When I then tried to set up fully-distributed mode, jobs failed with a NoClassDefFound error. I tried adding "export HADOOP_CLASSPATH=/usr/local/hbase-0.98.11-hadoop2/bin/hbase classpath" to hadoop-env (also yarn-env), but it still doesn't work.
One thing I noticed is that if I comment out
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
I can run the jobs SUCCESSFULLY, BUT then it seems the job runs on a single node rather than across the cluster.
Here are some of the configs:
mapred-site
<property>
<name>mapred.job.tracker</name>
<value>hadoop1:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
hdfs-site
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
yarn-site
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run
</description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
In yarn-env and hadoop-env everything is at the defaults except HADOOP_CLASSPATH (which doesn't change anything whether I add it or not).
Here is the error trace:
2015-04-25 23:29:25,143 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at apriori2$FrequentItemsReduce.reduce(apriori2.java:550)
at apriori2$FrequentItemsReduce.reduce(apriori2.java:532)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1651)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Thanks a lot for any help.
With YARN, you need to set the "yarn.application.classpath" property to the classpath your MapReduce job needs; "export HADOOP_CLASSPATH" alone will not work with YARN.
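For illustration only (the exact entries depend on the installation; the ones below assume the stock yarn-default.xml layout plus the /usr/local/hbase-0.98.11-hadoop2 prefix mentioned above), yarn.application.classpath in yarn-site.xml might look roughly like this, with the HBase lib directory appended:
<property>
<name>yarn.application.classpath</name>
<!-- default Hadoop entries, followed by the HBase jars so containers can load HBaseConfiguration -->
<value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,/usr/local/hbase-0.98.11-hadoop2/lib/*</value>
</property>
After changing it, the NodeManagers would need a restart for the new classpath to take effect.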