I occasionally see the following log in the web UI when my operators are getting killed. Is there any way to control the memory settings that are used when negotiating a container with YARN?
How do the typical YARN settings for container heap and maximum memory relate to the Apex memory allocation model?
The info messages I see in the web UI are as follows:
Container [pid=14699,containerID=container_1462863487071_0015_01_000012] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 6.1 GB of 3.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462863487071_0015_01_000012 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 14817 14699 14699 14699 (java) 1584 1654 6426968064 393896 /usr/java/default/bin/java -Xmx4429185024 -Ddt.attr.APPLICATION_PATH=hdfs://dwh109.qaperf2.sac.int.threatmetrix.com:8020/user/dtadmin/datatorrent/apps/application_1462863487071_0015 -Djava.io.tmpdir=/data3/yarn/nm/usercache/root/appcache/application_1462863487071_0015/container_1462863487071_0015_01_000012/tmp -Ddt.cid=container_1462863487071_0015_01_000012 -Dhadoop.root.logger=INFO,RFA -Dhadoop.log.dir=/data3/yarn/container-logs/application_1462863487071_0015/container_1462863487071_0015_01_000012 -Ddt.loggers.level=com.datatorrent.*:INFO,org.apache.*:INFO com.datatorrent.stram.engine.StreamingContainer
|- 14699 14697 14699 14699 (bash) 1 2 108646400 303 /bin/bash -c /usr/java/default/bin/java -Xmx4429185024 -Ddt.attr.APPLICATION_PATH=hdfs://dwh109.qaperf2.sac.int.threatmetrix.com:8020/user/dtadmin/datatorrent/apps/application_1462863487071_0015 -Djava.io.tmpdir=/data3/yarn/nm/usercache/root/appcache/application_1462863487071_0015/container_1462863487071_0015_01_000012/tmp -Ddt.cid=container_1462863487071_0015_01_000012 -Dhadoop.root.logger=INFO,RFA -Dhadoop.log.dir=/data3/yarn/container-logs/application_1462863487071_0015/container_1462863487071_0015_01_000012 -Ddt.loggers.level=com.datatorrent.*:INFO,org.apache.*:INFO com.datatorrent.stram.engine.StreamingContainer 1>/data3/yarn/container-logs/application_1462863487071_0015/container_1462863487071_0015_01_000012/stdout 2>/data3/yarn/container-logs/application_1462863487071_0015/container_1462863487071_0015_01_000012/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
It looks like the operator requires more memory. You can add the following property to have more memory allocated to the container. In properties.xml, for an operator O in the application, specify:
<property>
<name>dt.operator.O.attr.MEMORY_MB</name>
<value>2048</value>
</property>
For more advanced options, take a look at the physical plan preparation code:
https://github.com/apache/incubator-apex-core/blob/ddb7471edd37ef228432c7d80e1e118368e68450/engine/src/main/java/com/datatorrent/stram/plan/physical/PhysicalPlan.java
For a more detailed troubleshooting guide, take a look at:
http://docs.datatorrent.com/troubleshooting/#configuring-memory
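As a rough cross-check of the log in the question, the process-tree figures can be converted to the units used in the kill message. This is only a sketch: the numbers are copied from the dump above, and the 4 KiB page size is an assumption (the usual default on x86-64 Linux), not something the log states.

```python
# Figures copied from the process-tree dump in the question.
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86-64 Linux default

processes = [
    # (command, VMEM_USAGE in bytes, RSSMEM_USAGE in pages)
    ("java", 6426968064, 393896),
    ("bash", 108646400, 303),
]
XMX_BYTES = 4429185024  # from -Xmx4429185024 on the java command line

GIB = 1024 ** 3
MIB = 1024 ** 2

# YARN accounts for the whole process tree, so sum over both processes.
total_vmem_gib = sum(vmem for _, vmem, _ in processes) / GIB
total_rss_gib = sum(pages * PAGE_SIZE for _, _, pages in processes) / GIB
xmx_mib = XMX_BYTES / MIB

print(f"virtual memory : {total_vmem_gib:.2f} GiB")
print(f"resident memory: {total_rss_gib:.2f} GiB")
print(f"max heap (-Xmx): {xmx_mib:.0f} MiB")
```

Under that page-size assumption, the totals line up with the "1.5 GB physical" and "6.1 GB virtual" figures in the kill message, and the maximum heap works out to 4224 MiB, which is larger than the 1.5 GB physical limit of the container.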
I have a Java program running on a CentOS box.
My -Xmx and -Xms are set to 4000 MB.
The program works fine.
But when I run free -m, the used memory shows as 506 MB. As I understand it, the -Xms memory should be reserved for the JVM. Why does the free command not show the memory used by Java?
I have also run jstat -gccapacity $(pidof java), and there NGCMN and NGCMX are updated and have the same value.
Any support would be helpful.
I'm running my program as java -Xms41000m -Xmx42000m -jar
Even when -Xmx and -Xms are set to the same value, the space reserved for the Java heap is not immediately allocated in RAM.
The operating system typically allocates physical memory lazily, only on the first access to a virtual page. So as long as the unused part of the Java heap is not touched, it won't actually consume physical memory.
You may use the -XX:+AlwaysPreTouch option to forcibly touch all heap pages on JVM start.
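The lazy allocation described above is not specific to the JVM. As a minimal sketch (in Python rather than Java, on a Unix-like system, with an arbitrary 64 MiB demo size), the following maps anonymous memory and compares the process's peak resident size before and after writing to each page. The exact numbers vary by system, so none are shown here.

```python
import mmap
import resource  # Unix-only module

SIZE = 64 * 1024 * 1024  # 64 MiB, an arbitrary demo size


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


before = peak_rss()
region = mmap.mmap(-1, SIZE)  # anonymous mapping: reserves address space only
after_map = peak_rss()

# Write one byte per page; the first access is what makes the OS back
# each page with physical memory.
for offset in range(0, SIZE, mmap.PAGESIZE):
    region[offset] = 1
after_touch = peak_rss()

print("growth after mapping :", after_map - before)
print("growth after touching:", after_touch - after_map)
region.close()
```

On a typical Linux system the growth after mapping should be close to zero and the growth after touching should be roughly the size of the region, which mirrors what free -m shows for an untouched heap.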
On Ubuntu 16.04, I compiled the Spinnaker SDK example by running make in src/Acquisition, which produced the "Acquisition" binary under bin/.
When I run it, I get the following error:
Number of cameras detected: 1
Running example for camera 0...
* DEVICE INFORMATION *
DeviceID: 18073382
DeviceSerialNumber: 18073382
DeviceVendorName: Point Grey Research
DeviceModelName: Grasshopper3 GS3-U3-32S4M
DeviceType: U3V
DeviceDisplayName: Point Grey Research
DeviceAccessStatus: OpenReadWrite
DeviceVersion: FW:v2.25.3.00 FPGA:v2.02
DeviceDriverVersion: none : 0.0.0.0
DeviceUserID:
DeviceIsUpdater: 0
DeviceInstanceId: 0113C726
DeviceLocation:
DeviceCurrentSpeed: HighSpeed
GUIXMLLocation: Device
GUIXMLPath: Input.xml
GenICamXMLLocation: Device
GenICamXMLPath:
DeviceU3VProtocol: 1
* IMAGE ACQUISITION *
Acquisition mode set to continuous...
Unable to begin image acquisition. Aborting with error -1010...
Camera 0 example complete...
Done! Press Enter to exit...
Acquisition_C: /softwarelib/Boost/boost_1_60_0/GCC_5_3_1/linux_cpp11/release/amd64/include/boost/thread/pthread/mutex.hpp:111: boost::mutex::~mutex(): Assertion `!res' failed
The sample code itself doesn't use a mutex at all.
This error is due to insufficient usbfs memory allocation. Section 3 of the Spinnaker README, reproduced below, explains how to increase the value to 1000:
===============================================================================
3. USB RELATED NOTES
On Linux systems, the USB-FS memory is restricted to 16 MB or less by default. To
increase this limit to make use of the imaging hardware's full capabilities, a
minor change needs to be made to the system.
To PERMANENTLY modify the USB-FS memory:
1. Open the /etc/default/grub file in any text editor. Find and replace:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
with this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash usbcore.usbfs_memory_mb=1000"
2. Update grub with these settings:
$ sudo update-grub
3. Reboot and test a USB 3.1 camera.
If this method fails to set the memory limit, to TEMPORARILY modify the USB-FS
memory until the next reboot, run the following command:
$ sudo sh -c 'echo 1000 > /sys/module/usbcore/parameters/usbfs_memory_mb'
To confirm that the memory limit has been successfully updated, run the following command:
$ cat /sys/module/usbcore/parameters/usbfs_memory_mb
If using multiple USB3 cameras, the USB-FS memory limit may need to exceed 1000.
More information on these changes can be found at:
https://www.flir.com/support-center/iis/machine-vision/application-note/understanding-usbfs-on-linux
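The confirmation step from the README can also be scripted. The sketch below reads the same sysfs parameter file and compares it with a required value; the function name and the default threshold of 1000 are illustrative choices, not part of the Spinnaker SDK.

```python
from pathlib import Path

# Same file the README reads with `cat`.
USBFS_PARAM = Path("/sys/module/usbcore/parameters/usbfs_memory_mb")


def usbfs_limit_ok(required_mb=1000, param_path=USBFS_PARAM):
    """Return (current_mb, ok); current_mb is None if the file cannot be read."""
    try:
        current_mb = int(Path(param_path).read_text().strip())
    except OSError:
        # Missing or unreadable file, e.g. on a non-Linux system.
        return None, False
    return current_mb, current_mb >= required_mb


if __name__ == "__main__":
    current, ok = usbfs_limit_ok()
    if current is None:
        print("usbfs_memory_mb is not available on this system")
    elif ok:
        print(f"usbfs limit is {current} MB: sufficient")
    else:
        print(f"usbfs limit is {current} MB: too low, raise it to at least 1000")
```

Passing param_path explicitly makes the check easy to try against a scratch file before running it on the camera host.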
My test execution throws a "gc memory overhead exceeded" exception on CentOS 7 Linux. I changed the heap settings in jmeter.bat to a maximum of 6g and a minimum of 512m. I am not using any listeners, preprocessors, or HTTP Header Managers. I use a Regular Expression Extractor for 2 samplers and a shared Constant Timer. I run the test from the terminal and store the results in a JTL file. I ran it with 250 users, a ramp-up period of 1, and a scheduler duration of 5400 seconds, but the issue still persists.
System configuration:
Ram 8 GB
CPU octa core 3.12 GHz
Swap memory 16 GB
You say that you changed jmeter.bat, but the problem is on Linux, which doesn't use jmeter.bat. Unless it's a typo, try to change jmeter or jmeter.sh (whichever one you use to invoke JMeter).
Generally I would not recommend more than 2GB for moderate use, and 4GB for heavy use. For instance my settings are:
HEAP="-Xms4096m -Xmx4096m"
and I can run up to 300 concurrent users with a lot of samplers and heavy scripting, even in GUI mode. Setting a larger heap may lead to longer GC pauses, which can cause the exception you are getting.
After you start JMeter, run the following command to make sure the memory settings are indeed as you expect them to be:
ps -ef | grep JMeter
I had actually changed Xmx in jmeter.bat instead of jmeter.sh, even though I was running this test on Linux. jmeter.bat is for Windows and jmeter.sh is for Linux, which is why the error occurred. Once I changed it in jmeter.sh, it worked perfectly.
I would like to know the relation between the mapreduce.map.memory.mb and mapred.map.child.java.opts parameters.
Is mapreduce.map.memory.mb > mapred.map.child.java.opts?
mapreduce.map.memory.mb is the upper memory limit that Hadoop allows to be allocated to a mapper, in megabytes. The default is 512.
If this limit is exceeded, Hadoop will kill the mapper with an error like this:
Container[pid=container_1406552545451_0009_01_000002,containerID=container_234132_0001_01_000001]
is running beyond physical memory limits. Current usage: 569.1 MB of
512 MB physical memory used; 970.1 MB of 1.0 GB virtual memory used.
Killing container.
A Hadoop mapper is a Java process, and each Java process has its own maximum heap allocation, configured via mapred.map.child.java.opts (or mapreduce.map.java.opts in Hadoop 2+).
If the mapper process runs out of heap memory, the mapper throws a Java out-of-memory exception:
Error: java.lang.RuntimeException: java.lang.OutOfMemoryError
Thus, the Hadoop and the Java settings are related. The Hadoop setting is more of a resource enforcement/control setting, and the Java setting is more of a resource configuration setting.
The Java heap setting should be smaller than the Hadoop container memory limit because we need to reserve memory for Java code. Usually, it is recommended to reserve 20% of the memory for code. If the settings are correct, Java-based Hadoop tasks should never get killed by Hadoop, so you should never see the "Killing container" error shown above.
If you experience Java out of memory errors, you have to increase both memory settings.
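The 20% rule of thumb above can be written down as a small calculation. This is a sketch only: the helper name and the 2048 MB example value are made up for illustration, and the property names are the Hadoop 2 ones mentioned in this answer.

```python
def map_memory_settings(container_mb, reserve_percent=20):
    """Derive a map-task heap size from a container limit.

    reserve_percent is the share of the container left for non-heap memory
    (20 follows the rule of thumb above; it is not a Hadoop default).
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mb = container_mb * (100 - reserve_percent) // 100
    return {
        "mapreduce.map.memory.mb": str(container_mb),
        "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
    }


for name, value in map_memory_settings(2048).items():
    print(f"{name}={value}")
```

For a 2048 MB container this gives -Xmx1638m, so raising one setting without the other is easy to spot.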
The following properties let you specify options to be passed to the JVMs running your tasks. These can be used with -Xmx to control heap available.
Hadoop 0.x, 1.x (deprecated)         Hadoop 2.x
-------------------------------      --------------------------
mapred.child.java.opts
mapred.map.child.java.opts           mapreduce.map.java.opts
mapred.reduce.child.java.opts        mapreduce.reduce.java.opts
Note there is no direct Hadoop 2 equivalent for the first of these; the advice in the source code is to use the other two. mapred.child.java.opts is still supported (but is overridden by the other two more-specific settings if present).
Complementary to these, the following let you limit total memory (possibly virtual) available for your tasks - including heap, stack and class definitions:
Hadoop 0.x, 1.x (deprecated)         Hadoop 2.x
-------------------------------      --------------------------
mapred.job.map.memory.mb             mapreduce.map.memory.mb
mapred.job.reduce.memory.mb          mapreduce.reduce.memory.mb
I suggest setting -Xmx to 75% of the memory.mb values.
In a YARN cluster, jobs must not use more memory than the server-side config yarn.scheduler.maximum-allocation-mb or they will be killed.
To check the defaults and precedence of these, see JobConf and MRJobConfig in the Hadoop source code.
Troubleshooting
Remember that your mapred-site.xml may provide defaults for these settings. This can be confusing - e.g. if your job sets mapred.child.java.opts programmatically, this would have no effect if mapred-site.xml sets mapreduce.map.java.opts or mapreduce.reduce.java.opts. You would need to set those properties in your job instead, to override the mapred-site.xml. Check your job's configuration page (search for 'xmx') to see what values have been applied and where they have come from.
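The precedence trap described above can be modelled in a few lines. This is a simplified sketch of the behaviour as this answer describes it, not Hadoop's actual lookup code; the -Xmx values are hypothetical, and details such as properties marked final in the site file are ignored.

```python
GENERIC_KEY = "mapred.child.java.opts"


def effective_java_opts(site_conf, job_conf, task_type):
    """Return (value, source_key) for a 'map' or 'reduce' task.

    Job settings override site settings for the same key, and the
    task-specific key wins over the generic one whenever it is present.
    """
    merged = {**site_conf, **job_conf}
    specific_key = f"mapreduce.{task_type}.java.opts"
    for key in (specific_key, GENERIC_KEY):
        if key in merged:
            return merged[key], key
    return None, None


# Hypothetical values: the site file sets the map-specific property,
# while the job only sets the generic one.
site = {"mapreduce.map.java.opts": "-Xmx1024m"}
job = {"mapred.child.java.opts": "-Xmx2048m"}

print(effective_java_opts(site, job, "map"))
print(effective_java_opts(site, job, "reduce"))
```

In this model the map tasks still run with the site's -Xmx1024m even though the job asked for -Xmx2048m, while the reduce tasks pick up the job's generic setting.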
ApplicationMaster memory
In a YARN cluster, you can use the following two properties to control the amount of memory available to your ApplicationMaster (to hold details of input splits, status of tasks, etc):
Hadoop 0.x, 1.x                      Hadoop 2.x
-------------------------------      --------------------------
                                     yarn.app.mapreduce.am.command-opts
                                     yarn.app.mapreduce.am.resource.mb
Again, you could set -Xmx (in the former) to 75% of the resource.mb value.
Other configurations
There are many other configurations relating to memory limits, some of them deprecated - see the JobConf class. One useful one:
Hadoop 0.x, 1.x (deprecated)         Hadoop 2.x
-------------------------------      --------------------------
mapred.job.reduce.total.mem.bytes    mapreduce.reduce.memory.totalbytes
Set this to a low value (10) to force shuffle to happen on disk in the event that you hit an OutOfMemoryError at MapOutputCopier.shuffleInMemory.
I've recently become quite fond of Upstart. Previously I've been using God, Monit and Bluepill but I don't really like these solutions so I'm giving Upstart a try.
I've been using the Foreman gem to generate some basic Upstart configuration files for my processes in /etc/init. However, these generated files only handle the respawning of a crashed process. I was wondering whether it's possible to tell Upstart to restart a process that's consuming, for example, more than 150 MB of memory, as you would with Monit, God or Bluepill.
I read through the Upstart docs and this looks like the thing I'm looking for, though I have no clue how to configure something like this.
What I basically want is quite simple. I want to restart my web process if its memory usage exceeds 150 MB of RAM. These are the files I have:
|-- myapp-web-1.conf
|-- myapp-web-2.conf
|-- myapp-web-3.conf
|-- myapp-web.conf
|-- myapp.conf
And their contents are:
myapp.conf
pre-start script
bash << "EOF"
mkdir -p /var/log/myapp
chown -R deployer /var/log/myapp
EOF
end script
myapp-web.conf
start on starting myapp
stop on stopping myapp
myapp-web-1.conf / myapp-web-2.conf / myapp-web-3.conf
start on starting myapp-web
stop on stopping myapp-web
respawn
exec su - deployer -c 'cd /var/applications/releases/20110607140607; cd myapp && bundle exec unicorn -p $PORT >> /var/log/myapp/web-1.log 2>&1'
Any help much appreciated!
Appending this to the end of myapp-web-*.conf will cause any allocation calls trying to allocate more than 150mb of memory to return ENOMEM:
limit rss 157286400 157286400
The process might crash at this point, or it might not. That's up to the process!
Here's a test for this in the Upstart Source.
From the Upstart docs, the limits come from the rlimit system call options. (http://upstart.ubuntu.com/cookbook/#limit)
Since Linux 2.4, setting the rss (Resident Set Size) limit has no effect.
An alternative, already suggested in other answers, is "as", which sets the limit on the virtual memory (address space) size. This has a very different effect from setting a 'real' memory limit.
limit as <soft limit> <hard limit>
Excerpt from man pages for setrlimit:
RLIMIT_AS
The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2), and mremap(2),
which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that
kills the process if no alternate stack has been made available via sigaltstack(2)). Since the value is a long, on machines with a
32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
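To see what an address-space limit does in practice, the sketch below applies RLIMIT_AS to a child process through Python's resource module (a thin wrapper over setrlimit) and then attempts an allocation larger than the limit. The 150 MB limit matches the value used earlier in this thread; the 200 MB request is arbitrary. Whether the limit is enforced depends on the platform, so the outcome is reported rather than assumed.

```python
import subprocess
import sys

LIMIT_BYTES = 150 * 1024 * 1024  # 157286400, the value used earlier in this thread

# Runs in a child process so the limit does not affect the caller.
CHILD_SCRIPT = """
import resource, sys
limit, request = int(sys.argv[1]), int(sys.argv[2])
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))  # soft and hard limit
try:
    block = bytearray(request)
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def try_allocation(limit_bytes, request_bytes):
    """Return the child's report: 'allocated', 'MemoryError', or '' on failure."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, str(limit_bytes), str(request_bytes)],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(try_allocation(LIMIT_BYTES, 200 * 1024 * 1024))
```

Where the limit is enforced, the allocation fails inside the process with an ENOMEM-style error (a MemoryError in Python) rather than triggering a restart by Upstart, which is the behavioural difference the answers above point out.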