I am running an application and, after a while, the logs display:
15:31:41 [WARN] [15:31:41.002] [JOURNAL_FLUSHER] WARNING Journal flush operation took 3,678ms last 8 cycles average is 459ms
15:33:53 [WARN] [15:33:43.878] [JOURNAL_FLUSHER] WARNING Journal flush operation took 4,872ms last 8 cycles average is 609ms
15:34:48 [WARN] [15:34:40.084] [JOURNAL_FLUSHER] WARNING Journal flush operation took 4,941ms last 8 cycles average is 772ms
15:35:13 [WARN] [15:35:13.051] [JOURNAL_FLUSHER] WARNING Journal flush operation took 12,299ms last 8 cycles average is 2,306ms
15:38:19 [WARN] [15:38:13.718] [JOURNAL_FLUSHER] WARNING Journal flush operation took 2,366ms last 8 cycles average is 295ms
15:39:01 [WARN] [15:38:53.897] [JOURNAL_FLUSHER] WARNING Journal flush operation took 8,651ms last 8 cycles average is 1,377ms
What exactly does this mean? After a while, the application grinds to a halt and throws a heap space error. Does this mean I have to increase the heap size or the permgen size? My application runs well on another computer, however, on this one it will get exponentially slower. I am not really asking how to fix this problem, but what this warning means.
The JOURNAL_FLUSHER thread is responsible for writing the journal buffer to disk. These log entries mean that there may be an issue with disk performance. What are your platform (OS) and JVM settings? More detailed information is required to fully diagnose your problem.
I am working on a project with a master computer connected via a CANOpen network to 4 slaves.
At each time step, the computer receives a measurement message from each slave, and sends them a control message. In total, 4 messages are received and 4 messages are sent at each time sample.
The message sent is a PDO with 6 data bytes (8 bytes including COB-ID)
The message received is a PDO with 8 data bytes (10 bytes including COB-ID)
My CAN network is configured at 1Mbit/s, and I run my program at 1000 Hz (1 ms sampling time). As the total load resulting from the messages described is 576 bits/cycle, the total load expected in the network is 576kbit/s, or 57%.
What I see, however, is that:
The controlling computer measures a load of ~86% (with minima of 68% and peaks of 100%).
A USB CAN bus analyser I connect to the network registers a message count that is around half of what I nominally expect (i.e., 4 sent and 4 received each cycle for 50 seconds should result in 50k messages, while I only see 18-25k). Moreover, I receive 1-2 error messages per cycle from the slave devices saying that the network is overloaded. Before it is pointed out, even counting the size of these messages as part of the traffic wouldn't come close to explaining the anomaly in load.
What I'd like to know is whether my way of calculating the CANOpen network load is correct. For instance, are there any protocol-specific handshakes, CRCs, or any sort of extra bytes sent to make the network simply work? It's nothing I could see in the wiki page of CANOpen, but I do know there are such appendices to messages in the original CAN bus standard.
In a CAN message, there is more than the data to be transmitted.
There is also the arbitration ID (11 or 29 bits, depending on whether you use CAN 2.0A or 2.0B), a 15-bit CRC, a 7-bit EOF marker, the control field, and some other reserved bits.
Depending on the data, there may also be stuff bits.
Using CAN 2.0B and assuming 48 bits (6 bytes) of data, you will get a message size of roughly 132 bits, and roughly 151 bits for your 64-bit messages.
Summing this up (4 x 132 + 4 x 151), you will get roughly 1132 bits per cycle, which is too much for a 1 Mbit/s bus at 1000 Hz: only 1000 bits fit into each 1 ms cycle.
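The per-frame arithmetic can be sketched as follows. This is a minimal sketch, not an exact model of your bus: it assumes classical CAN data frames, counts the 3-bit interframe space with each frame, and uses the usual worst-case bound for stuff bits. The function names are made up for illustration. CANopen PDOs typically use 11-bit identifiers, so both frame formats are covered.

```python
def can_frame_bits(data_bytes, extended=False, worst_case_stuffing=False):
    """Bits on the wire for one classical CAN data frame,
    including the 3-bit interframe space that follows it."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("classical CAN carries 0..8 data bytes")
    data_bits = 8 * data_bytes
    # Fields subject to bit stuffing (SOF up to and including the CRC):
    #   standard: SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4 + CRC 15 = 34
    #   extended: SOF 1 + ID 11 + SRR 1 + IDE 1 + ID 18 + RTR 1
    #             + r1/r0 2 + DLC 4 + CRC 15 = 54
    stuffable = (54 if extended else 34) + data_bits
    # Never stuffed: CRC delimiter 1 + ACK slot 1 + ACK delimiter 1
    # + EOF 7 + interframe space 3 = 13
    fixed_tail = 13
    # Worst case: one stuff bit after the first 5 bits, then one per 4 bits.
    stuff_bits = (stuffable - 1) // 4 if worst_case_stuffing else 0
    return stuffable + fixed_tail + stuff_bits


def bus_load(payload_sizes, cycle_hz, bitrate, **frame_options):
    """Fraction of the bus used when every frame in payload_sizes
    (a list of data lengths in bytes) is sent once per cycle."""
    bits_per_cycle = sum(can_frame_bits(n, **frame_options) for n in payload_sizes)
    return bits_per_cycle * cycle_hz / bitrate


# The traffic from the question: four 6-byte and four 8-byte PDOs per cycle.
frames = [6] * 4 + [8] * 4
for extended in (False, True):
    low = bus_load(frames, 1000, 1_000_000, extended=extended)
    high = bus_load(frames, 1000, 1_000_000, extended=extended,
                    worst_case_stuffing=True)
    print("29-bit IDs" if extended else "11-bit IDs", f"{low:.1%} to {high:.1%}")
```

Under these assumptions the load for 11-bit identifiers lies between about 82% (no stuff bits) and 100% (worst-case stuffing), and the 29-bit estimate of roughly 1132 bits per cycle falls inside the corresponding range. Either way, counting only data and COB-ID bytes underestimates the load considerably.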
Hope that helps.
I'm using GraphDB Free 8.6.1 in a research project, running it with the default configuration on a Linux server with 4 GB of memory.
However, it has started to throw exceptions pointing to insufficient memory:
Caused by: org.eclipse.rdf4j.repository.RepositoryException: Query evaluation error: Insufficient free Heap Memory 238Mb for group by and distinct, threshold:250Mb, reached 0Mb (HTTP status 500)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1143)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1066)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQueryViaHttp(SPARQLProtocolSession.java:834)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:763)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQuery(SPARQLProtocolSession.java:391)
at org.eclipse.rdf4j.repository.http.HTTPTupleQuery.evaluate(HTTPTupleQuery.java:69)
Please, can you help me to identify the problem?
How can I properly configure GraphDB?
The behavior you observe is part of the memory optimization of DISTINCT/GROUP BY operations. The error message relates to the default threshold of 250 MB and is there to tell you that you need to adjust your memory settings. When the free heap memory falls below the threshold, a QueryEvaluationException is thrown to avoid running out of memory because of a memory-hungry DISTINCT/GROUP BY operation. You can reduce the threshold to minimize these errors by passing the following argument when starting GraphDB: "-Ddefault.min.distinct.threshold=XXX" (where XXX is the threshold as an amount of memory in bytes).
Insufficient free Heap Memory 238Mb for group by and distinct, threshold:250Mb, reached 0Mb
238Mb = free heap space reported by the JVM
250Mb = the default threshold below which the protection raises an exception to prevent an OutOfMemoryError (OME)
0Mb = the current buffer used for distinct and group by
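To track these three values over time, for example when scanning client logs, the message can be split with a small parser. This is only a sketch: the pattern is inferred from the single message quoted above, and the function and field names are made up for illustration.

```python
import re

# Pattern inferred from the one error message in the question; other
# GraphDB versions may word the message differently.
HEAP_GUARD = re.compile(
    r"Insufficient free Heap Memory (\d+)Mb for group by and distinct, "
    r"threshold:(\d+)Mb, reached (\d+)Mb"
)


def parse_heap_guard(message):
    """Return the three numbers from the message, or None if it does not match."""
    match = HEAP_GUARD.search(message)
    if match is None:
        return None
    free_mb, threshold_mb, buffer_mb = (int(group) for group in match.groups())
    return {
        "free_mb": free_mb,            # free heap reported by the JVM
        "threshold_mb": threshold_mb,  # limit below which the query is aborted
        "buffer_mb": buffer_mb,        # buffer used by DISTINCT/GROUP BY so far
        "shortfall_mb": threshold_mb - free_mb,
    }


line = ("Query evaluation error: Insufficient free Heap Memory 238Mb for "
        "group by and distinct, threshold:250Mb, reached 0Mb (HTTP status 500)")
print(parse_heap_guard(line))
```

A buffer of 0 Mb together with a small shortfall suggests that the heap was already nearly full before the query started, which is consistent with the explanation that something else is occupying the memory.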
I suspect another operation takes most of your RAM, and once you run any DISTINCT/GROUP BY query it stops immediately because of the OME protection.
This answer would have helped me.
Example, if your machine has 32 GB RAM:
/opt/graphdb-free/app/graphdb-free.cfg (cutout)
[JVMOptions]
-Xms20G
-Xmx20G
-XX:PermSize=4G
-XX:MaxPermSize=4G
-Dfile.encoding=UTF-8
-Djava.net.preferIPv4Stack=true
--add-exports
jdk.management.agent/jdk.internal.agent=ALL-UNNAMED
--add-opens
java.base/java.lang=ALL-UNNAMED
Via GUI and Settings:
graphdb.page.cache.size 10G
I read in [Java Performance] that if the minimum and maximum heap sizes are set to the same value, and the minimum and maximum young generation sizes likewise, the JVM disables the AdaptiveSizePolicy feature automatically.
But this did not work on my test bed with JDK 1.6 and 1.7.
Only adding the option -XX:-UseAdaptiveSizePolicy stops the JVM from automatically resizing the young generation and the heap.
Set -XX:-UseAdaptiveSizePolicy
We have a Java-based message processing system with nearly 25 different queues and a topic. The system is limited to a maximum memory usage of 2 GB and processes 40 messages per second on a normal day. It works fine for a couple of days, then memory usage starts to spike until it reaches the limit.
In our analysis, we found that MemoryUsage holds the key to the cause. Below is the leak suspect stack trace from the heap dump for one of the queues, which is using nearly 50% of the memory. It is possible that a higher volume of messages loaded the queue heavily. What is the optimal MemoryUsage configuration for this system?
519,955,448 (62.85%) [72] 8 org/apache/activemq/usage/MemoryUsage 0x80d8d180
519,843,456 (62.84%) [16] 2 java/util/concurrent/CopyOnWriteArrayList 0x80d8d210
519,843,392 (62.84%) [352] 89 array of java/lang/Object 0x822cd2e0
411,721,616 (49.77%) [72] 9 org/apache/activemq/usage/MemoryUsage 0x83833378
411,721,248 (49.77%) [16] 2 java/util/concurrent/CopyOnWriteArrayList 0x83835898
411,721,184 (49.77%) [8] 2 array of java/lang/Object 0x8383a730
411,718,600 (49.77%) [336] 33 org/apache/activemq/broker/region/Queue 0x83833120
411,693,720 (49.77%) [16] 2 org/apache/activemq/store/kahadb/KahaDBTransactionStore$1 0x838353e0
411,693,256 (49.77%) [24] 3 org/apache/activemq/store/kahadb/KahaDBTransactionStore 0x80d76aa0
411,689,856 (49.76%) [280] 37 org/apache/activemq/store/kahadb/KahaDBStore 0x80d74de0
358,088,168 (43.29%) [104] 14 org/apache/kahadb/journal/Journal 0x80d76790
356,119,216 (43.05%) [48] 1 java/util/concurrent/ConcurrentHashMap 0x80d773c0
356,119,168 (43.05%) [64] 16 array of java/util/concurrent/ConcurrentHashMap$Segment 0x80d8e628
It's hard to speculate much with this limited amount of information. If your consumers fall behind, the memory will start to fill up and there is not much you can do about it. At 40 messages per second, I guess it will fill up fast.
What you can do is overflow the queue to disk once some memory limit is reached. That would slow the system down, but at least it keeps running during a spike.
The area is generally complex and, as far as I know, there is no silver bullet.
Read up on message cursors, memory usage limits, and producer flow control.
Is there any way to find out from a dump file how many garbage collections have been performed for the different generations? When I try to run some psscor4 commands I get the following.
0:003> !GCUsage
The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, or
we are at the initialization or shutdown of the gc heap. Commands related to
displaying, finding or traversing objects as well as gc heap segments may not
work properly. !dumpheap and !verifyheap may incorrectly complain of heap
consistency errors.
Error: Requesting GC Heap data
0:003> !CLRUsage
The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, or
we are at the initialization or shutdown of the gc heap. Commands related to
displaying, finding or traversing objects as well as gc heap segments may not
work properly. !dumpheap and !verifyheap may incorrectly complain of heap
consistency errors.
Error: Requesting GC Heap data
I can get output from !EEHeap, though, but it does not give me what I am looking for.
0:003> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0000000002c81030
generation 1 starts at 0x0000000002c81018
generation 2 starts at 0x0000000002c81000
ephemeral segment allocation context: none
segment begin allocated size
0000000002c80000 0000000002c81000 0000000002c87fe8 0x6fe8(28648)
Large object heap starts at 0x0000000012c81000
segment begin allocated size
0000000012c80000 0000000012c81000 0000000012c9e358 0x1d358(119640)
Total Size: Size: 0x24340 (148288) bytes.
------------------------------
GC Heap Size: Size: 0x24340 (148288) bytes.
Dumps
You can see the number of garbage collections in Performance Monitor. However, the way performance counters work makes me believe that this information is not available in a dump file and probably not even available during live debugging.
Think of Debug.WriteLine(): once the text was written to the debug output, it is gone. If you didn't have DebugView running at the time, the information is lost. And that's good, otherwise it would look like a memory leak.
Performance counters (as I understand them) work in a similar fashion. Various "pings" are sent out for someone else (the performance monitor) to record. If no one does, the ping with all its information is gone.
Live debugging
As already mentioned, you can try performance monitor. If you prefer WinDbg, you can use sxe clrn to see garbage collections happen.
PSSCOR
The commands you mentioned do not show information about the garbage collection count:
0:016> !gcusage
Number of GC Heaps: 1
------------------------------
GC Heap Size 0x36d498(3,593,368)
Total Commit Size 0000000000384000 (3 MB)
Total Reserved Size 0000000017c7c000 (380 MB)
0:016> !clrusage
Number of GC Heaps: 1
------------------------------
GC Heap Size 0x36d498(3,593,368)
Total Commit Size 0000000000384000 (3 MB)
Total Reserved Size 0000000017c7c000 (380 MB)
Note: I'm using PSSCOR2 here, since I have the same .NET 4.5 issue on this machine. But I expect the output of PSSCOR4 to be similar.