JVM heap dumps of Keycloak running on Kubernetes

We have a Keycloak deployment running on Kubernetes. Our containers need to be periodically restarted because of high memory consumption. I want to analyze what is causing high memory consumption. How can I take JVM Heap dumps without modifying the Keycloak container image?

First, you can dump the heap on demand with the jmap command from outside the container.
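As a minimal sketch of that on Kubernetes (the pod name keycloak-0 and PID 1 are assumptions; jmap has to be available in the container image):
kubectl exec keycloak-0 -- jmap -dump:live,format=b,file=/tmp/keycloak.hprof 1
kubectl cp keycloak-0:/tmp/keycloak.hprof ./keycloak.hprof
Note that kubectl cp relies on tar being present in the container; if it is missing, write the dump to a mounted volume instead.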
You can also enable an automatic heap dump on an out-of-memory condition with the -XX:+HeapDumpOnOutOfMemoryError JVM flag. Add -XX:HeapDumpPath to specify where heap dumps should be stored. JVM options can be added without modifying the container image; just set the following environment variable:
JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/storage/path"
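On Kubernetes this variable can be set on the existing workload without rebuilding the image, for example (the deployment name keycloak is an assumption):
kubectl set env deployment/keycloak JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/storage/path"
The JVM prints "Picked up JAVA_TOOL_OPTIONS: ..." at startup, which confirms the options were applied once the pods restart.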
Finally, since these JVM flags are manageable, you can also set them at runtime with jcmd:
jcmd <PID> VM.set_flag HeapDumpOnOutOfMemoryError true
jcmd <PID> VM.set_flag HeapDumpPath /storage/path
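Inside a Kubernetes pod these can be run without a shell in the image, for example (the pod name and PID are assumptions again):
kubectl exec keycloak-0 -- jcmd 1 VM.set_flag HeapDumpOnOutOfMemoryError true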

Related

AWS ECS Fargate Memory Utilization vs Local Docker

We are using AWS Fargate ECS tasks for our Spring WebFlux Java 11 microservice. We use a FROM gcr.io/distroless/java:11 base image. When the application is dockerised locally and deployed as an image inside a Docker container, memory utilization is quite efficient and the heap usage never crosses 50%.
However, when we deploy the same image using the same Dockerfile in AWS Fargate as an ECS task, the AWS dashboard shows a completely different picture. The memory utilization never comes down, and CloudWatch logs show no OutOfMemory issues at all. In AWS ECS, once deployed, we ran a peak load test and a stress test, after which the memory utilization reached 94%, and then a soak test for 6 hrs. The memory utilization was still 94% without any OOM errors. Garbage collection is happening constantly and not letting the application go OOM, but memory stays at 94%.
For testing the application's memory utilization locally we are using VisualVM. We are also trying to connect to the remote ECS task in AWS Fargate using Amazon ECS Exec, but that is work in progress.
We have seen the same issue with other microservices in our and other clusters as well. Once it reaches a maximum number it never comes down. Kindly help if someone has faced the same issue earlier.
Edit on 10/10/2022:
We connected to the AWS Fargate ECS task using Amazon ECS Exec and below are the findings.
We analysed the GC logs of the AWS ECS Fargate task and could see the following messages. It uses the default GC, i.e. the Serial GC. We keep getting "Pause Young (Allocation Failure)", which means that the memory assigned to the young generation is not enough and hence the GC fails.
[2022-10-09T13:33:45.401+0000][1120.447s][info][gc] GC(1417) Pause Full (Allocation Failure) 793M->196M(1093M) 410.170ms
[2022-10-09T13:33:45.403+0000][1120.449s][info][gc] GC(1416) Pause Young (Allocation Failure) 1052M->196M(1067M) 460.286ms
We made some code changes related to a byte array being copied in memory twice, and the memory did come down, but not by much.
/app # ps -o pid,rss
PID RSS
1 1.4g
16 16m
30 27m
515 23m
524 688
1655 4
/app # ps -o pid,rss
PID RSS
1 1.4g
16 15m
30 27m
515 22m
524 688
1710 4
Even after a full GC like the one below, the memory does not come down:
[2022-10-09T13:39:13.460+0000][1448.505s][info][gc] GC(1961) Pause Full (Allocation Failure) 797M->243M(1097M) 502.836ms
One important observation was that after running a heap inspection, a full GC got triggered, and even that didn't clear up the memory. It shows 679M->149M, but the ps -o pid,rss command does not show the drop, and neither does the AWS Container Insights graph.
[2022-10-09T13:54:50.424+0000][2385.469s][info][gc] GC(1967) Pause Full (Heap Inspection Initiated GC) 679M->149M(1047M) 448.686ms
[2022-10-09T13:56:20.344+0000][2475.390s][info][gc] GC(1968) Pause Full (Heap Inspection Initiated GC) 181M->119M(999M) 448.699ms
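(For reference, one way to compare the JVM's own view of the heap with the RSS that ps reports; a sketch assuming PID 1 is the JVM and that jcmd is available in the image:)
jcmd 1 GC.heap_info
Used heap dropping while committed heap and RSS stay flat would be consistent with the numbers above, since the collector does not necessarily return freed pages to the operating system.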
How are you running it locally? Do you set any parameters (CPU/memory) for the container you launch? On Fargate there are multiple levels of resource configuration (the size of the task and the amount of resources you assign to the container; check out this blog for more details). The other thing to consider is that, with Fargate, you may land on an instance with far more capacity than the task size you configured. Fargate will create a cgroup that boxes your container(s) to that size, but some old programs (and Java versions) are not cgroup-aware and may assume the memory available is the memory on the instance (which you don't see) rather than the task size (and cgroup) that was configured.
I don't have an exact answer (and this did not fit into a comment), but this may be an area you can explore (being able to exec into the container should help; ECS Exec is great for that).
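As a hedged sketch of one thing to try while exploring (the values are placeholders, not recommendations): make the heap ceiling explicit relative to the task size instead of relying on ergonomics, and turn on GC logging so the JVM's idea of its limit is visible, e.g. via the JAVA_TOOL_OPTIONS environment variable in the task definition:
JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0 -Xlog:gc*"
On JDK 11 the percentage is applied against the container (cgroup) memory limit when container support is active, so the resulting max heap should line up with the Fargate task size; an explicit -Xmx works as well.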

How to trace memory allocations in Apache httpd server?

I am running Apache Benchmark (ab) against a server (httpd 2.4.52) running locally. I want to track how many memory allocations the server makes, and of what sizes.
I run 'valgrind --trace-malloc=yes ab -n 10 http://127.0.0.1/'
But the number of allocations is ~4.6k regardless of the number of requests (I tried 10, 100, and 1000).
Is this because Apache uses its own custom memory allocator?
How can I track the allocations (specifically #allocations, total/avg size of allocations) for this custom allocator?
This page mentions an option named ALLOC_USE_MALLOC in the APR code, but I could not find this option in the APR source code (I checked versions 1.7.0, 1.4.8, 1.4.2 and httpd 2.0.51).
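One thing that may be worth checking (a sketch, not a confirmed answer): in the command above valgrind wraps ab, the benchmarking client, so the reported allocations are ab's own and would not be expected to scale with the server's request handling. To trace the server's allocations, httpd itself would need to run under valgrind, for example in single-process foreground mode:
valgrind --trace-malloc=yes /usr/sbin/httpd -X
The -X flag runs a single worker without detaching; the httpd binary path here is an assumption and depends on the installation.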

Weblogic 10.3.6 generates empty heapdump on OutOfMemoryError

I'm trying to generate a full heap dump from WebLogic 10.3.6 due to an OutOfMemoryError generated by a web application deployed on the server.
I've set the following options in the start script:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/heapdump
When the OutOfMemoryError occurs, WebLogic generates an empty hprof file (0 bytes) in the /path/to/heapdump folder, and nothing else happens: the server remains in RUNNING mode, even though it is not reachable anymore.
The Java process is still alive, but with 0% CPU usage.
Even the server.out log seems completely frozen, without any trace of the OutOfMemoryError.
What's wrong with the configuration?
You can probably use Java Flight Recorder to record events and check which objects are generating the OOM
(any profiler should work as well).
Been there :( . I remember at the time we found it somewhat logical: since there was not enough memory for normal operation, the JVM could not automagically find enough memory to create a heap dump either. If memory serves me well, at the time we did two things to debug the memory leak. First, we were "lucky" enough that the problem was happening fairly regularly, so close manual monitoring was possible (watching gc.log for repeated Full GCs, and watching the performance tab in the console). Knowing when the onset of the problem was starting, we did some kill -3 to get the dump manually. We also used jstack {PID} (JDK 1.6 on Linux) with some luck. With those, at the time, the devs were able to identify the memory leak. Hope that helps.
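For reference, the manual steps described above look roughly like this (the PID is the WebLogic JVM process; kill -3 writes a thread dump to stdout/server.out, not a heap dump):
kill -3 <PID>
jstack <PID> > threads.txt
An on-demand heap dump with jmap, as the next answer shows, can also work when taken before memory is fully exhausted.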
Okay, your configuration looks alright. You might want to check whether the WebLogic process user has the rights to write the heap dump file.
You can take a heap dump with the Java tools:
$JAVA_HOME/bin/jmap -dump:format=b,file=path_of_the_file <pid>
OR
%JROCKIT_HOME%\bin\jrcmd <pid> hprofdump filename=path_of_the_file

how to configure hard memory limit for builds in drone.io

Perhaps I am missing it, but I see no method to control the hard memory limit for any given build (I have builds being murdered because of it). Is the build memory limit based on the build params supplied by the client (which would mean a single client can bring down everything), or is there someplace I can configure the service to allow only 512 MB (for example) per build?
You can limit the maximum amount of memory per container by setting the global DRONE_LIMIT_MEM variable (on the server). It should be set to the amount of memory in bytes, for example:
DRONE_LIMIT_MEM_SWAP=512000000
DRONE_LIMIT_MEM=512000000
These limits are passed to Docker when Drone starts a container [1]. It is equivalent to the following Docker command:
docker run --memory=512000000 <image>
[1] https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory
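To sanity-check that the limit actually reached the build container, the cgroup limit can be read from inside it (a sketch; the path below is for cgroup v1, on cgroup v2 hosts it is /sys/fs/cgroup/memory.max):
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
Seeing 512000000 rather than the host's total memory confirms the --memory value was applied.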

TeamCity how to set JVM Arguments

My TeamCity build server has the following JVM arguments:
-Xmx512m -XX:MaxPermSize=270m
Sometimes it shows a memory problem message like "TeamCity server memory usage for PS Old Gen pool exceeded 91% of 341 MB maximum available. 437 MB used of 506 MB total heap available. See the TeamCity documentation for possible solutions."
I read here https://confluence.jetbrains.com/display/TCD8/Installing+and+Configuring+the+TeamCity+Server#InstallingandConfiguringtheTeamCityServer-SettingUpMemorysettingsforTeamCityServer that the minimum recommended settings are -Xmx750m -XX:MaxPermSize=270m.
How/where do I change this setting?
In TC9+ it is possible to set this variable in TC Server GUI:
Administration -> Diagnostics -> Internal Properties -> Edit internal properties
For 64-bit JVM the recommended setting is:
TEAMCITY_SERVER_MEM_OPTS=-Xmx4g -XX:MaxPermSize=270m -XX:ReservedCodeCacheSize=350m
Just add this line to the Internal properties edit box
I would recommend adding the JVM memory options in the startup script (start.sh) for a server-based startup, using the variable TEAMCITY_SERVER_MEM_OPTS. Please do not set it in the profile of the user that runs TeamCity.
This link should be helpful to you.
In case you want different memory settings for the server and the agent (usually that's the case), be selective in naming the variables so that the JVM options for server startup and agent startup can be told apart.
As a rule of thumb for TeamCity setups, I normally give my TeamCity server 20% more memory than my average usage to account for increased load during peak usage periods.
Internal properties are read after the JVM is started and so the heap settings will not take effect if put where another answer suggests. I was looking into how to do this for a TeamCity container and the best option seems to be to use environment variables (TEAMCITY_SERVER_MEM_OPTS). For a container, those can be set by passing -e TEAMCITY_SERVER_MEM_OPTS='...' when creating the container.
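As an illustration (the image name is the official jetbrains/teamcity-server image; the values are only examples, borrowed from the other answer):
docker run -e TEAMCITY_SERVER_MEM_OPTS='-Xmx4g -XX:ReservedCodeCacheSize=350m' jetbrains/teamcity-server
For a non-container install, exporting the same variable in the environment that launches start.sh has the same effect.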