Is there any way to find out, from a dump file, how many garbage collections have been performed for each generation? When I run some psscor4 commands, I get the following:
0:003> !GCUsage
The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, or
we are at the initialization or shutdown of the gc heap. Commands related to
displaying, finding or traversing objects as well as gc heap segments may not
work properly. !dumpheap and !verifyheap may incorrectly complain of heap
consistency errors.
Error: Requesting GC Heap data
0:003> !CLRUsage
The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, or
we are at the initialization or shutdown of the gc heap. Commands related to
displaying, finding or traversing objects as well as gc heap segments may not
work properly. !dumpheap and !verifyheap may incorrectly complain of heap
consistency errors.
Error: Requesting GC Heap data
I can get output from !EEHeap, but it does not show what I am looking for.
0:003> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0000000002c81030
generation 1 starts at 0x0000000002c81018
generation 2 starts at 0x0000000002c81000
ephemeral segment allocation context: none
segment begin allocated size
0000000002c80000 0000000002c81000 0000000002c87fe8 0x6fe8(28648)
Large object heap starts at 0x0000000012c81000
segment begin allocated size
0000000012c80000 0000000012c81000 0000000012c9e358 0x1d358(119640)
Total Size: Size: 0x24340 (148288) bytes.
------------------------------
GC Heap Size: Size: 0x24340 (148288) bytes.
Dumps
You can see the number of garbage collections in Performance Monitor. However, the way performance counters work makes me believe that this information is not available in a dump file, and probably not even during live debugging.
Think of Debug.WriteLine(): once the text was written to the debug output, it is gone. If you didn't have DebugView running at the time, the information is lost. And that's good, otherwise it would look like a memory leak.
Performance counters (as I understand them) work in a similar fashion. Various "pings" are sent out for someone else (the performance monitor) to record. If no one does, the ping and all its information are gone.
Live debugging
As already mentioned, you can try Performance Monitor. If you prefer WinDbg, you can use sxe clrn to break on CLR notifications and watch garbage collections happen.
PSSCOR
The commands you mentioned do not show information about the garbage collection count:
0:016> !gcusage
Number of GC Heaps: 1
------------------------------
GC Heap Size 0x36d498(3,593,368)
Total Commit Size 0000000000384000 (3 MB)
Total Reserved Size 0000000017c7c000 (380 MB)
0:016> !clrusage
Number of GC Heaps: 1
------------------------------
GC Heap Size 0x36d498(3,593,368)
Total Commit Size 0000000000384000 (3 MB)
Total Reserved Size 0000000017c7c000 (380 MB)
Note: I'm using PSSCOR2 here, since I have the same .NET 4.5 issue on this machine. But I expect the output of PSSCOR4 to be similar.
I'm using GraphDB Free 8.6.1 in a research project. I'm running it with the default configuration on a Linux server with 4 GB of memory.
However, it has started to throw exceptions pointing to insufficient memory:
Caused by: org.eclipse.rdf4j.repository.RepositoryException: Query evaluation error: Insufficient free Heap Memory 238Mb for group by and distinct, threshold:250Mb, reached 0Mb (HTTP status 500)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1143)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1066)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQueryViaHttp(SPARQLProtocolSession.java:834)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:763)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQuery(SPARQLProtocolSession.java:391)
at org.eclipse.rdf4j.repository.http.HTTPTupleQuery.evaluate(HTTPTupleQuery.java:69)
Please, can you help me to identify the problem?
How can I properly configure GraphDB?
The behavior you observe is part of the memory optimization of DISTINCT/GROUP BY operations. The error message refers to the default threshold of 250 MB and is there to let you know that you need to adjust your memory settings. When free heap memory falls below the threshold, a QueryEvaluationException is thrown to avoid running out of memory because of a memory-hungry DISTINCT/GROUP BY operation. You can reduce the threshold to minimize those errors by passing the following argument when starting GraphDB: "-Ddefault.min.distinct.threshold=XXX" (where XXX is the threshold in bytes).
Insufficient free Heap Memory 238Mb for group by and distinct, threshold:250Mb, reached 0Mb
238Mb = free heap space reported by the JVM
250Mb = the default threshold below which the protection raises an exception to prevent an OutOfMemoryError (OME)
0Mb = the current buffer used for distinct and group by
I suspect another operation takes most of your RAM, so as soon as you run any DISTINCT/GROUP BY query, it stops immediately because of the OME protection.
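To relate the numbers in that message to what the JVM itself reports, here is a minimal sketch. It assumes the guard is roughly "free heap below threshold, so reject the query". The class name, the method names and the way free heap is computed are all assumptions made for illustration; this is not GraphDB's actual code.

```java
final class HeapHeadroomCheck {
    // Hypothetical constant mirroring the 250 MB default threshold from the error message.
    static final long THRESHOLD_BYTES = 250L * 1024 * 1024;

    // Heap the JVM could still hand out: maximum heap minus the part currently in use.
    static long freeHeapBytes(long maxMemory, long totalMemory, long freeMemory) {
        long used = totalMemory - freeMemory;
        return maxMemory - used;
    }

    // True when the remaining heap is under the threshold, i.e. the guard would reject the query.
    static boolean belowThreshold(long freeHeapBytes, long thresholdBytes) {
        return freeHeapBytes < thresholdBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long free = freeHeapBytes(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.printf("free heap: %d MB, threshold: %d MB, below threshold: %b%n",
                free / (1024 * 1024), THRESHOLD_BYTES / (1024 * 1024),
                belowThreshold(free, THRESHOLD_BYTES));
    }
}
```

Running this with the same -Xmx value as the GraphDB process gives a rough feel for how much headroom a given heap size leaves above the threshold.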
This answer would have helped me.
For example, if your machine has 32 GB of RAM:
/opt/graphdb-free/app/graphdb-free.cfg (cutout)
[JVMOptions]
-Xms20G
-Xmx20G
-XX:PermSize=4G
-XX:MaxPermSize=4G
-Dfile.encoding=UTF-8
-Djava.net.preferIPv4Stack=true
--add-exports
jdk.management.agent/jdk.internal.agent=ALL-UNNAMED
--add-opens
java.base/java.lang=ALL-UNNAMED
Via GUI and Settings:
graphdb.page.cache.size 10G
Please help with the following question.
Assume that I have a class that contains only methods. Will heap space be allocated for objects created from this class? If yes, what does it contain?
The question linked by Fairoz contains the most relevant data, but I'll try to narrow the information down to your case.
Yes. The JVM will take a contiguous space off the heap to store these objects.
The contents are specific to the JVM implementation. In HotSpot, you can see the specifics in the source code.
There will be a machine word called "Mark", which is defined here, and is used to keep the hashCode, the locking state, and garbage collection information. On a 64-bit JVM this takes 8 bytes.
Next will be a pointer to the Klass, which contains information about the class, such as methods.
If you're on a 64-bit JVM with compressed oops enabled (the default on Java 8), the Klass pointer takes only 4 bytes. Since you have no fields, the header alone is 12 bytes. However, the JVM aligns object sizes to 8 bytes, so your object gets 4 bytes of padding. In total, 16 bytes.
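The arithmetic above can be written out as a small sketch. It assumes an 8-byte mark word and 8-byte object alignment, as described in this answer. The class and method names are made up for illustration, and the code only computes the expected size; it does not measure a real object.

```java
final class ObjectSizeEstimate {
    // Round size up to the next multiple of alignment.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Size of an object with no fields: mark word + klass pointer, padded to the alignment.
    static long emptyObjectSize(int markBytes, int klassPointerBytes, int alignment) {
        return alignUp((long) markBytes + klassPointerBytes, alignment);
    }

    public static void main(String[] args) {
        // Compressed class pointer: 8 + 4 = 12, padded to 16.
        System.out.println(emptyObjectSize(8, 4, 8));
        // Uncompressed class pointer: 8 + 8 = 16, no padding needed.
        System.out.println(emptyObjectSize(8, 8, 8));
    }
}
```

Under these assumptions both cases come out at 16 bytes, so for a field-less object the compressed pointer saves nothing; the saving only shows up once fields can occupy the 4 bytes that would otherwise be padding.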
Some useful documentation:
- https://www.infoq.com/articles/Introduction-to-HotSpot
- https://psy-lob-saw.blogspot.com.es/2013/05/know-thy-java-object-memory-layout.html
I am trying to analyze the memory usage pattern of a Java process running the G1 garbage collector, using jstat:
jstat -gc <Process_ID> 60s
The output looks like the following:
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
0.0 229376.0 0.0 229376.0 1998848.0 1253376.0 16646144.0 301183.5 50176.0 40977.8 8704.0 5303.9 10 0.296 0 0.000 0.296
As I understand it, jstat provides information about young generation GCs as well as full GCs, but it doesn't distinguish between minor and mixed collections. Considering that a well-tuned G1 collector is not expected to run full GCs and that mixed GCs mostly take care of the tenured generation, I want to get information about the different types of YGC.
Is there any specific option for jstat which I should use?
I have noticed this discussion on the OpenJDK forum, but I'm not sure whether such a feature is available at this point in time.
Please note, I am aware that GC logs can help me here, but I am specifically interested in jstat (considering that it's lightweight and can be used in production as needed).
You can see this blog, https://blogs.oracle.com/poonam/entry/understanding_g1_gc_logs, which has more detailed information about understanding G1 GC logs.
At least as of JDK 11, there are three different time counters for -gc:
YGCT: Young Garbage Collection Time, spent on young-only garbage collection.
FGCT: Full Garbage Collection Time, spent on the fallback full stop-the-world GC.
GCT: Total time spent on garbage collections of all types.
So, young-only collections are accounted to YGCT and GCT; mixed collections go to GCT only (I'm not sure whether the young portion is accounted to YGCT, but I suspect so based on the times); full/fallback collections go to FGCT and GCT. So GCT minus FGCT minus YGCT equals the time spent collecting the old generation within mixed collections.
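As a rough sketch of that subtraction, the following parses one header line and one value line of jstat -gc output and computes GCT minus FGCT minus YGCT. The class and method names are made up for illustration, and the code assumes every value is a plain decimal number. Columns are looked up by name, so extra columns that newer JDKs print (such as CGC/CGCT) do not break the parsing, although their time would then also be part of the remainder. Whether the remainder really corresponds to old-generation work in mixed collections is the assumption made in this answer, not something the code verifies.

```java
import java.util.HashMap;
import java.util.Map;

final class JstatGcTimes {
    // Pair each column name from the header with the value in the same position.
    static Map<String, Double> parse(String header, String values) {
        String[] names = header.trim().split("\\s+");
        String[] nums = values.trim().split("\\s+");
        if (names.length != nums.length) {
            throw new IllegalArgumentException("header and value column counts differ");
        }
        Map<String, Double> row = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], Double.parseDouble(nums[i]));
        }
        return row;
    }

    // GC time (seconds) not attributed to the young or full counters.
    static double remainderSeconds(Map<String, Double> row) {
        return row.get("GCT") - row.get("FGCT") - row.get("YGCT");
    }

    public static void main(String[] args) {
        // Sample taken from the question above.
        String header = "S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT";
        String values = "0.0 229376.0 0.0 229376.0 1998848.0 1253376.0 16646144.0 301183.5 "
                + "50176.0 40977.8 8704.0 5303.9 10 0.296 0 0.000 0.296";
        Map<String, Double> row = parse(header, values);
        System.out.printf("remainder: %.3f s%n", remainderSeconds(row));
    }
}
```

For the sample in the question, GCT and YGCT are both 0.296 and FGCT is 0, so the remainder is zero; a non-zero remainder only appears once some pause time is counted in GCT but in neither of the other two counters.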
I'm trying to understand the apc.shm_strings_buffer setting in apc.ini. After restarting PHP, the pie chart in the APC admin shows 8MB of cache is already used, even though there are no cached entries (except for apc.php, of course). I've found this relates to the apc.shm_strings_buffer setting.
Can someone help me understand what the setting means? The config file notes that this is the "shared memory size reserved for strings, with M/G suffixe", but I fail to comprehend.
I'm using APC with PHP-FPM.
The easy part to explain is "with M/G suffixe", which means that if you set it to 8M, 8 megabytes will be allocated, and 1G would allocate 1 gigabyte of memory.
The more difficult bit to explain is that it's a cache for strings that APC uses internally when it compiles and caches opcodes.
The config value was introduced in this change, and the bulk of the change was to add apc_string.c to the APC project. The main function defined in that C file is apc_new_interned_string, which is called from apc_string_pmemcpy in apc_compile.c and, through it, used by the rest of the APC module to store strings.
For example, in apc_compile.c:
/* private members are stored inside property_info as a mangled
* string of the form:
* \0<classname>\0<membername>\0
*/
CHECK((dst->name = apc_string_pmemcpy((char *)src->name, src->name_length+1, pool TSRMLS_CC)));
When APC goes to store a string, the function apc_new_interned_string checks whether that string is already saved in memory by hashing the string; if it is already stored, it returns the previously stored instance.
Only if that string is not already stored in the cache does a new piece of memory get allocated to store the string.
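The look-up-then-reuse idea can be illustrated with a minimal sketch. It is written in Java purely for illustration and is not APC's implementation: APC does this in C, inside a fixed-size shared memory buffer. The class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

final class InternPool {
    // Maps each distinct string value to the single instance kept for it.
    private final Map<String, String> pool = new HashMap<>();

    // Return the stored instance if an equal string exists; otherwise store this one.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        InternPool pool = new InternPool();
        String a = new String("MyClass::secret");
        String b = new String("MyClass::secret"); // equal content, separate object
        System.out.println(pool.intern(a) == pool.intern(b)); // same instance is returned
        System.out.println(pool.size());                      // only one copy is stored
    }
}
```

The hash map stands in for the hash lookup described above; the second call finds the first string and returns it instead of storing another copy.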
If you're running PHP with PHP-FPM, I'm 90% confident that the cache of stored strings is shared amongst all the workers in a single pool, but am still double-checking that.
The whole size allocated to storing shared strings is allocated when PHP starts up - it's not allocated dynamically. So it's to be expected that APC shows the 8MB used for the string cache, even though hardly any strings have actually been cached yet.
Edit
Although this answers what it does, I have no idea how to see how much of the shared string buffer is being used, so there's no way of knowing what it should be set to.
The size of shared memory ("local memory" in OpenCL terms) is only 16 KiB on most NVIDIA GPUs today.
I have an application in which I need to create an array of 10,000 integers, so the amount of memory I need is 10,000 * 4 bytes = 40 KB.
How can I work around this?
Is there any GPU that has more than 16 KiB of shared memory?
Think of shared memory as explicitly managed cache. You will need to store your array in global memory and cache parts of it in shared memory as needed, either by making multiple passes or some other scheme which minimises the number of loads and stores to/from global memory.
How you implement this will depend on your algorithm - if you can give some details of what it is exactly that you are trying to implement you may get some more concrete suggestions.
One last point: be aware that shared memory is shared between all threads in a block, so you have far less than 16 KB per thread, unless you have a single data structure that is common to all threads in the block.
All compute capability 2.0 and greater devices (most in the last year or two) have 48 KB of available shared memory per multiprocessor. That being said, Paul's answer is correct in that you likely will not want to load all 10K integers into a single multiprocessor.
You can try the cudaFuncSetCacheConfig(nameOfKernel, cudaFuncCachePrefer{Shared, L1}) function.
If you prefer L1 to Shared, then 48KB will go to L1 and 16KB will go to Shared.
If you prefer Shared to L1, then 48KB will go to Shared and 16KB will go to L1.
Usage:
cudaFuncSetCacheConfig(matrix_multiplication, cudaFuncCachePreferShared);
matrix_multiplication<<<bla, bla>>>(bla, bla, bla);