Screenshot of my memory status
Hi, I'm getting an error when I try to run a TPC-DS benchmark query:
Memory Limit Exceeded by fragment: 9944e21b4d6634c0:1
HDFS_SCAN_NODE (id=2) could not allocate 1.95 KB without exceeding limit.
Process: memory limit exceeded. Limit=256.00 MB Total=286.62 MB Peak=380.11 MB
My computer has 10 GB of RAM; however, Impala seems to be allocated only 256 MB.
I have tried to increase the memory limit on startup using the mem_limit option, but it doesn't do the trick.
I was able to solve my problem via Cloudera Manager.
Go to Cloudera Manager Services > Impala > Configuration.
Under Configuration, search for "Memory" in the search bar. You will find the option to increase the memory of the Impala daemon, which can be set appropriately.
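For reference, the same thing can also be done outside Cloudera Manager; a rough sketch, assuming a plain impalad installation and an 8 GB target on a 10 GB machine (adjust the value to your hardware):
# raise the per-daemon memory limit at startup
impalad --mem_limit=8g
# or cap memory per session from impala-shell
SET MEM_LIMIT=8g;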
Related
We have a Keycloak deployment running on Kubernetes. Our containers need to be periodically restarted because of high memory consumption. I want to analyze what is causing high memory consumption. How can I take JVM Heap dumps without modifying the Keycloak container image?
First, you can dump the heap on demand with the jmap command, outside the container.
You can also enable an automatic heap dump on an out-of-memory condition with the -XX:+HeapDumpOnOutOfMemoryError JVM flag. Add -XX:HeapDumpPath to specify where to store the heap dumps. JVM options can be added without modifying the container image; just add the following environment variable:
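One way to do this (a sketch, assuming jmap is present in the Keycloak image's JDK, the Keycloak JVM is PID 1 in the container, and the pod name keycloak-0 is hypothetical) is via kubectl exec:
# trigger a binary heap dump of live objects inside the pod
kubectl exec keycloak-0 -- jmap -dump:live,format=b,file=/tmp/keycloak.hprof 1
# copy the dump out for analysis in VisualVM or Eclipse MAT
kubectl cp keycloak-0:/tmp/keycloak.hprof ./keycloak.hprof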
JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/storage/path"
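On Kubernetes this variable can be added to the Deployment without rebuilding the image; a sketch, assuming the Deployment is named keycloak:
kubectl set env deployment/keycloak JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/storage/path"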
Finally, since these JVM options are manageable, you can also set them at runtime with jcmd:
jcmd <PID> VM.set_flag HeapDumpOnOutOfMemoryError true
jcmd <PID> VM.set_flag HeapDumpPath /storage/path
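jcmd can also trigger a dump on demand, which is handy for comparing heaps before and after load; assuming the Keycloak JVM is PID 1 inside the container:
jcmd 1 GC.heap_dump /storage/path/keycloak.hprof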
We are using AWS Fargate ECS tasks for our Spring WebFlux Java 11 microservice. We build on the gcr.io/distroless/java:11 base image (FROM gcr.io/distroless/java:11). When our application is dockerised locally and deployed as an image inside a Docker container, the memory utilization is quite efficient and we can see that the heap usage never crosses 50%.
However, when we deploy the same image using the same Dockerfile to AWS Fargate as an ECS task, the AWS dashboard shows a completely different picture. The memory utilization never comes down, and the CloudWatch logs show no OutOfMemory issues at all. Once deployed in AWS ECS, we ran a peak load test and a stress test, after which the memory utilization reached 94%, and then a 6-hour soak test. The memory utilization was still 94%, without any OOM errors. Garbage collection is happening constantly and not letting the application go OOM, but the utilization stays at 94%.
For testing the application's memory utilization locally we are using VisualVM. We are also trying to connect to the remote ECS task in AWS Fargate using Amazon ECS Exec, but that is a work in progress.
We have seen the same issue with other microservices in our cluster and in other clusters as well. Once the memory utilization reaches a maximum, it never comes down. Kindly help if someone has faced the same issue before.
Edit on 10/10/2022:
We connected to the AWS Fargate ECS task using Amazon ECS Exec, and below are the findings.
We analysed the GC logs of the AWS ECS Fargate task and could see the messages below. The JVM is using the collector it selects by default for this container, i.e. the Serial GC. We keep getting "Pause Young (Allocation Failure)" messages, which means an allocation could not be satisfied in the young generation, so a collection is triggered.
[2022-10-09T13:33:45.401+0000][1120.447s][info][gc] GC(1417) Pause Full (Allocation Failure) 793M->196M(1093M) 410.170ms
[2022-10-09T13:33:45.403+0000][1120.449s][info][gc] GC(1416) Pause Young (Allocation Failure) 1052M->196M(1067M) 460.286ms
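To double-check which collector the JVM ergonomics actually picked inside the task, something like the following can be run via ECS Exec (a sketch; it assumes jcmd and a shell with grep are available in the task, and that the application JVM is PID 1):
jcmd 1 VM.flags | grep -Ei 'UseSerialGC|UseParallelGC|UseG1GC'
# or ask the same java binary what it would choose for this container size
java -XX:+PrintFlagsFinal -version | grep -Ei 'UseSerialGC|UseG1GC|MaxHeapSize'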
We made some code changes related to a byte array being copied in memory twice, and the memory did come down, but not by much.
/app # ps -o pid,rss
PID RSS
1 1.4g
16 16m
30 27m
515 23m
524 688
1655 4
/app # ps -o pid,rss
PID RSS
1 1.4g
16 15m
30 27m
515 22m
524 688
1710 4
Even after a full GC like the one below, the memory does not come down:
[2022-10-09T13:39:13.460+0000][1448.505s][info][gc] GC(1961) Pause Full (Allocation Failure) 797M->243M(1097M) 502.836ms
One important observation was that after running a heap inspection, a full GC got triggered, and even that didn't clear up the memory. The GC log shows 679M->149M, but the ps -o pid,rss command does not show the drop, and neither does the AWS Container Insights graph:
[2022-10-09T13:54:50.424+0000][2385.469s][info][gc] GC(1967) Pause Full (Heap Inspection Initiated GC) 679M->149M(1047M) 448.686ms
[2022-10-09T13:56:20.344+0000][2475.390s][info][gc] GC(1968) Pause Full (Heap Inspection Initiated GC) 181M->119M(999M) 448.699ms
How are you running it locally? Do you set any parameters (CPU/memory) for the container you launch? On Fargate there are multiple levels of resource configuration (the size of the task and the amount of resources you assign to the container - check out this blog for more details). The other thing to consider is that, with Fargate, you may land on an instance with much more capacity than the task size you configured. Fargate will create a cgroup that boxes your container(s) to that size, but some old programs (and Java versions) are not cgroup-aware, and they may assume that the amount of memory available to them is the memory on the instance (which you don't see) rather than the task size (and cgroup) that was configured.
I don't have an exact answer (and this did not fit into a comment) but this may be an area you can explore (being able to exec into the container should help - ECS exec is great for that).
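If ergonomics does turn out to be the issue, one thing worth trying (a sketch, not a confirmed fix) is pinning the heap explicitly to a fraction of the container memory and enabling GC logging via the task definition's environment, e.g.:
JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0 -Xlog:gc*"
MaxRAMPercentage sizes the heap relative to the memory the JVM detects (the cgroup limit on a container-aware JDK 11), and -Xlog:gc* is the JDK 11 unified GC logging switch, so you can see the heap ceiling the JVM actually chose.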
We have a test server which hosts lots of test applications. When there are lots of processes (or threads) running, we found that new processes or threads cannot be created:
for a C program: "cannot fork, resource unavailable"
for a Java program: it throws the exception "OutOfMemory, unable to create native thread"
I think it is due to a hard limit on the maximum number of processes. I tried to set ulimit -u 255085. ulimit -a shows the following:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
open files (-n) 90000
pipe size (512 bytes, -p) 10
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 255085
virtual memory (kbytes, -v) unlimited
But it doesn't work. I tried to run many processes at the same time with different users, and they all stop with the same error at the same time. Therefore, I think there is a "limit" for the whole system, regardless of the users logged in.
Your system looks to be out of virtual memory. In that case, there is no point in raising the number of processes.
Increase the swap area size to allow more processes to run.
Make sure you have enough RAM to run all these processes, otherwise performance will suffer.
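A quick way to check, and to add swap if it really is exhausted (a sketch for a typical Linux box; the 8G size and /swapfile path are only examples):
free -m            # RAM and current swap usage
swapon --show      # active swap areas
fallocate -l 8G /swapfile && chmod 600 /swapfile
mkswap /swapfile && swapon /swapfile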
jvm 1 | WARN | Store limit is 102400 mb, whilst the data directory: C:\apache-activemq-5.8.0\bin\win32\..\..\data\kahadb only has 44093 mb of usable space
jvm 1 | ERROR | Temporary Store limit is 51200 mb, whilst the temporary data directory: C:\apache-activemq-5.8.0\bin\win32\..\..\data\localhost\tmp_storage only has 44093 mb of usable space
It's telling you that your configured limits don't fit within the amount of disk space available at the store location. This can lead to broker failure in older versions, as the limits are not lowered automatically to match the disk space; in the latest release the broker will lower the limits itself. When you see this, it means you should either rethink your store location or your broker configuration.
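Those limits live in the systemUsage section of conf/activemq.xml; a sketch with values lowered to fit the ~44 GB volume from the warning (tune them to your own disk):
<systemUsage>
  <systemUsage>
    <storeUsage>
      <storeUsage limit="30 gb"/>   <!-- KahaDB store limit -->
    </storeUsage>
    <tempUsage>
      <tempUsage limit="10 gb"/>    <!-- tmp_storage limit -->
    </tempUsage>
  </systemUsage>
</systemUsage>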
I'm using:
Cloudera Manager Free Edition: 4.5.1
Cloudera Hadoop Distro: CDH 4.2.0-1.cdh4.2.0.p0.10 (Parcel)
Hive Metastore with the Cloudera Manager embedded PostgreSQL database.
My Cloudera Manager is running on a separate machine, and it's not part of the cluster.
After setting up the cluster using Cloudera Manager, I started using Hive through Hue + Beeswax.
Everything was running fine for a while, and then all of a sudden, whenever I ran any query against a particular table that had a large number of partitions (about 14000), the query started to time out:
FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
When I noticed this, I looked at the logs and found out that the connection to the Hive Metastore was timing out:
WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Having seen this, I thought there was a problem with the Hive Metastore, so I looked at the logs for the Hive Metastore and discovered java.lang.OutOfMemoryErrors:
/var/log/hive/hadoop-cmf-hive1-HIVEMETASTORE-hci-cdh01.hcinsight.net.log.out:
2013-05-07 14:13:08,744 ERROR org.apache.thrift.ProcessFunction: Internal error processing get_partitions_with_auth
java.lang.OutOfMemoryError: Java heap space
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.datanucleus.util.ClassUtils.newInstance(ClassUtils.java:95)
at org.datanucleus.store.rdbms.sql.expression.SQLExpressionFactory.newLiteralParameter(SQLExpressionFactory.java:248)
at org.datanucleus.store.rdbms.scostore.RDBMSMapEntrySetStore.getSQLStatementForIterator(RDBMSMapEntrySetStore.java:323)
at org.datanucleus.store.rdbms.scostore.RDBMSMapEntrySetStore.iterator(RDBMSMapEntrySetStore.java:221)
at org.datanucleus.sco.SCOUtils.populateMapDelegateWithStoreData(SCOUtils.java:987)
at org.datanucleus.sco.backed.Map.loadFromStore(Map.java:258)
at org.datanucleus.sco.backed.Map.keySet(Map.java:509)
at org.datanucleus.store.fieldmanager.LoadFieldManager.internalFetchObjectField(LoadFieldManager.java:118)
at org.datanucleus.store.fieldmanager.AbstractFetchFieldManager.fetchObjectField(AbstractFetchFieldManager.java:114)
at org.datanucleus.state.AbstractStateManager.replacingObjectField(AbstractStateManager.java:1183)
at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java)
at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceFields(MStorageDescriptor.java)
at org.datanucleus.jdo.state.JDOStateManagerImpl.replaceFields(JDOStateManagerImpl.java:2860)
at org.datanucleus.jdo.state.JDOStateManagerImpl.replaceFields(JDOStateManagerImpl.java:2879)
at org.datanucleus.jdo.state.JDOStateManagerImpl.loadFieldsInFetchPlan(JDOStateManagerImpl.java:1647)
at org.datanucleus.store.fieldmanager.LoadFieldManager.processPersistable(LoadFieldManager.java:63)
at org.datanucleus.store.fieldmanager.LoadFieldManager.internalFetchObjectField(LoadFieldManager.java:84)
at org.datanucleus.store.fieldmanager.AbstractFetchFieldManager.fetchObjectField(AbstractFetchFieldManager.java:104)
at org.datanucleus.state.AbstractStateManager.replacingObjectField(AbstractStateManager.java:1183)
at org.apache.hadoop.hive.metastore.model.MPartition.jdoReplaceField(MPartition.java)
at org.apache.hadoop.hive.metastore.model.MPartition.jdoReplaceFields(MPartition.java)
at org.datanucleus.jdo.state.JDOStateManagerImpl.replaceFields(JDOStateManagerImpl.java:2860)
at org.datanucleus.jdo.state.JDOStateManagerImpl.replaceFields(JDOStateManagerImpl.java:2879)
at org.datanucleus.jdo.state.JDOStateManagerImpl.loadFieldsInFetchPlan(JDOStateManagerImpl.java:1647)
at org.datanucleus.ObjectManagerImpl.performDetachAllOnTxnEndPreparation(ObjectManagerImpl.java:3552)
at org.datanucleus.ObjectManagerImpl.preCommit(ObjectManagerImpl.java:3291)
at org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:369)
at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:256)
At this point, the Hive Metastore gets shut down and restarted:
2013-05-07 14:39:40,576 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: Shutting down hive metastore.
2013-05-07 14:41:09,979 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: Starting hive metastore on port 9083
Now, to fix this, I've changed the max heap size of both the Hive Metastore server and the Beeswax server:
1. Hive/Hive Metastore Server(Base)/Resource Management/Java Heap Size of Metastore Server : 2 GiB (First thing I did.)
2. Hue/Beeswax Server(Base)/Resource Management/Java Heap Size of Beeswax Server : 2 GiB (After reading some groups posts and stuff online, I tried this as well.)
Neither of the above two steps seems to have helped, as I continue to see OOMEs in the Hive Metastore log.
Then I noticed that the actual metastore 'database' is being run as part of my Cloudera Manager installation, and I'm wondering if that PostgreSQL process is running out of memory. I looked for ways to increase the memory available to that process and found very little documentation about it.
I was wondering if one of you guys could help me solve this issue.
Should I increase the java heap size for the embedded database? If so, where would I do this?
Is there something else that I'm missing?
Thanks!
Did you try doing the below?
'SET hive.metastore.client.socket.timeout=300;'
This solved the issue for me. Let me know how it went.
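If it does, the same setting can be made permanent in hive-site.xml (in CDH, via the Hive client configuration safety valve in Cloudera Manager); the property below is the standard one, and 300 seconds is just the value from above:
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>300</value>
</property>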