My Java app runs out of memory, and it takes about 40 minutes until CF shows the status as crashed.
2019-09-13T16:11:56.72+0200 [RTR/3] OUT
2019-09-13T16:12:47.13+0200 [APP/PROC/WEB/0] ERR Resource exhaustion event: the JVM was unable to allocate memory from the heap.
2019-09-13T16:12:47.13+0200 [APP/PROC/WEB/0] ERR ResourceExhausted! (1/0)
2019-09-13T16:13:13.69+0200 [APP/PROC/WEB/0] OUT | Instance Count | Total Bytes | Class Name |
2019-09-13T16:13:13.71+0200 [APP/PROC/WEB/0] OUT | 13758 | 1210704 | Ljava/lang/reflect/Method; |
Two minutes later there is still no reaction: the CF Apps Manager does not show any issues with this application.
The application is no longer responding.
2019-09-13T16:49:07.93+0200 [HEALTH/0] ERR Failed to make TCP connection to port 8080: timed out after 1.00 seconds
2019-09-13T16:49:07.93+0200 [CELL/0] OUT Container became unhealthy
Only about 34 minutes after the heap dump does CF report an unhealthy container.
Hm, that is a long time.
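To put a number on the delay, the gap can be computed directly from the timestamps in the log lines above. This is only a small sketch: the class name `LogGap` and the method `between` are made up for illustration, and the pattern assumes every line uses the same two-digit fraction and `+0200` style offset as the excerpt.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of any CF tooling.
final class LogGap {
    // Matches the timestamp prefix of the log lines, e.g. 2019-09-13T16:12:47.13+0200
    private static final DateTimeFormatter CF_TIMESTAMP =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSZ");

    // Elapsed time between two log timestamps.
    static Duration between(String earlier, String later) {
        return Duration.between(
                OffsetDateTime.parse(earlier, CF_TIMESTAMP),
                OffsetDateTime.parse(later, CF_TIMESTAMP));
    }

    public static void main(String[] args) {
        // "Resource exhaustion event" line vs. "Container became unhealthy" line
        Duration gap = between("2019-09-13T16:12:47.13+0200", "2019-09-13T16:49:07.93+0200");
        System.out.println(gap.toMinutes() + " min " + (gap.getSeconds() % 60) + " s");
        // prints "36 min 20 s"
    }
}
```

So the health check only failed a bit more than 36 minutes after the JVM first reported that it could not allocate from the heap.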
While sending high traffic (same message and payload), we intermittently observed 503 responses.
We also collected a tcpdump over the same time interval and observed a GOAWAY HTTP/2 frame sent by the Netty server every 10 minutes.
We had set idleTimeout = 600000 using the addServerCustomizers call below:
factory.addServerCustomizers(httpServer -> httpServer.idleTimeout(Duration.ofMillis(idleTimeout)));
In the next test we changed idleTimeout to 420000 and observed a GOAWAY HTTP/2 frame in the tcpdump alternately after 3 min 23 sec and 3 min 37 sec.
In the test after that we disabled idleTimeout by commenting out the addServerCustomizers call, and the 503 was no longer observed.
So Netty sends a GOAWAY HTTP/2 frame after idleTimeout even while traffic is running.
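One detail in these numbers is easy to miss: the two alternating gaps from the second test add up to exactly the configured idleTimeout. The sketch below only does that arithmetic on the reported values (the class name `GoAwayIntervals` is made up); it does not explain why Reactor Netty behaves this way.

```java
import java.time.Duration;

// Hypothetical helper that only checks the arithmetic on the reported intervals.
final class GoAwayIntervals {
    // Length of one full cycle made of two alternating gaps.
    static Duration cycle(Duration first, Duration second) {
        return first.plus(second);
    }

    public static void main(String[] args) {
        Duration idleTimeout = Duration.ofMillis(420_000);            // value from the second test
        Duration gapA = Duration.ofMinutes(3).plusSeconds(23);        // 203 s
        Duration gapB = Duration.ofMinutes(3).plusSeconds(37);        // 217 s
        System.out.println(cycle(gapA, gapB).equals(idleTimeout));    // prints "true"
    }
}
```

That the gaps sum to 420 seconds suggests the GOAWAY frames are tied to the idle timer rather than to the traffic pattern, which is consistent with the 503s disappearing once the customizer was removed.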
* Reactor version(s) used: 1.0.23
* Other relevant libraries versions (eg. `netty`, ...): (spring-boot-starter-webflux:2.7.4, spring-boot-starter:2.7.4 (*), spring-boot-starter-json:2.7.4 (*), spring-boot-starter-reactor-netty:2.7.4)
+--- org.springframework.boot:spring-boot-starter-webflux:2.7.4
| +--- org.springframework.boot:spring-boot-starter:2.7.4 (*)
| +--- org.springframework.boot:spring-boot-starter-json:2.7.4 (*)
| +--- org.springframework.boot:spring-boot-starter-reactor-netty:2.7.4
| | \--- io.projectreactor.netty:reactor-netty-http:1.0.23
* JVM version (`java -version`): 17.0.5
* OS and version (eg. `uname -a`): Linux
Why is Netty sending a GOAWAY HTTP/2 frame after idleTimeout even though we have running traffic?
I also need some help debugging this issue: can I enable some logs or track a metric?
SD 7.4.4 (Ubuntu 16), Director 7.4.4 (Ubuntu 16), FD 5.2.10 (Windows)
I'm having trouble backing up Windows clients with Bacula. A backup runs just fine when its size is around 1 or 2 MB, but when I run a backup of 500 MB I get the same error every time:
"Director's connection to SD for this Job was lost."
Some things to mention. When I issue status client:
Terminated Jobs: JobId Level Files Bytes Status Finished
======================================================================
81 Full 5,796 514.8 M OK 06-Nov-17 12:50 BackupComputerA
When I issue status dir:
06-Nov 17:58 acme-director JobId 81: Error: Director's connection to SD for this Job was lost.
06-Nov 17:58 acme-director JobId 81: Error: Bacula acme-director 7.4.4 (202Sep16):
Build OS: arm-unknown-linux-gnueabihf debian 9.0
JobId: 81
Job: BackupComputerA.2017-11-06_17.41.01_03
Backup Level: Full (upgraded from Incremental)
Client: "Computer-A-fd" 5.2.10 (28Jun12) Microsoft (build 9200), 32-bit,Cross-compile,Win32
FileSet: "Full Set" 2017-11-03 22:12:58
Pool: "RemoteFile" (From Job resource)
Catalog: "MyCatalog" (From Client resource)
Storage: "File1" (From Job resource)
Scheduled time: 06-Nov-2017 17:40:59
Start time: 06-Nov-2017 17:41:04
End time: 06-Nov-2017 17:58:00
Elapsed time: 16 mins 56 secs
Priority: 10
FD Files Written: 5,796
SD Files Written: 0
FD Bytes Written: 514,883,164 (514.8 MB)
SD Bytes Written: 0 (0 B)
Rate: 506.8 KB/s
Software Compression: 100.0% 1.0:1
Snapshot/VSS: yes
Encryption: yes
Accurate: no
Volume name(s):
Volume Session Id: 1
Volume Session Time: 1509989906
Last Volume Bytes: 8,045,880,119 (8.045 GB)
Non-fatal FD errors: 1
SD Errors: 0
FD termination status: OK
SD termination status: Error
Termination: *** Backup Error ***
About 5 minutes into the backup, I get a message:
Running Jobs:
Console connected at 06-Nov-17 18:08
JobId Type Level Files Bytes Name Status
======================================================================
83 Back Full 0 0 BackupComputerE has terminated
====
The job completes and terminates, but it loses the connection afterwards and I never get an "OK" for the status update.
I have added "Heartbeat Interval = 1 Minute" to all the daemons and still have no luck. I am using MySQL as the database on the Director.
Thanks in advance for any help.
For anyone having the same issue: I was able to fix this problem between the SD and the Director by adding the heartbeat interval to the clients and adjusting the keepalive time with
sysctl -w net.ipv4.tcp_keepalive_time=60
on both the Storage daemon and the Director. Connecting remotely to the Director with bconsole also interrupted jobs, so I ran bconsole on the same machine as the Director and connected to that machine via ssh.
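For readers unfamiliar with the setting: net.ipv4.tcp_keepalive_time is the idle time before the kernel sends the first keepalive probe on a connection that has keepalive enabled, and the Linux default is 7200 seconds. The sketch below shows the same mechanism per socket in Java, purely as an illustration. Bacula is not written in Java, the class name `KeepAliveSketch` is made up, and `TCP_KEEPIDLE` needs Java 11 or later and is not supported on every platform, hence the guard.

```java
import java.io.IOException;
import java.net.Socket;
import jdk.net.ExtendedSocketOptions;

// Hypothetical illustration of TCP keepalive; unrelated to Bacula's own code.
final class KeepAliveSketch {
    // Turns keepalive on and, where the platform allows it, sets the idle time
    // before the first probe. Returns true if the idle time could be set.
    static boolean enableKeepAlive(Socket socket, int idleSeconds) throws IOException {
        socket.setKeepAlive(true); // SO_KEEPALIVE; timing falls back to the kernel default
        if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            // Per-socket counterpart of the system-wide tcp_keepalive_time value
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) { // unconnected socket, enough to set options
            boolean tuned = enableKeepAlive(socket, 60);
            System.out.println("keepalive=" + socket.getKeepAlive() + " idleTimeSet=" + tuned);
        }
    }
}
```

With a 60 second idle time, a connection that carries no data for a while still produces periodic packets, so a firewall or NAT device in between is less likely to treat it as dead.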
I am getting the error below in the Fuse log. To solve it, I tried to grant permissions on the db-32.log file, but I am not allowed to change the permissions of this file. Instead I get a warning:
"cannot change properties of db-32.log"
The error log is below:
07:17:20,163 | INFO | AMQ-1-thread-1 | ActiveMQServiceFactory | 197 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-621084 | Broker amq failed to start. Will try again in 10 seconds
07:17:20,163 | ERROR | AMQ-1-thread-1 | ActiveMQServiceFactory | 197 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-621084 | Exception on start: /opt/install/jboss/jboss-fuse-6.2.1.redhat-084/data/amq/kahadb/db-32.log (Permission denied)
java.io.FileNotFoundException: /opt/install/jboss/jboss-fuse-6.2.1.redhat-084/data/amq/kahadb/db-32.log (Permission denied)
at java.io.RandomAccessFile.open0(Native Method)[:1.8.0_91]
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)[:1.8.0_91]
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)[:1.8.0_91]
Did you restart the broker under a different user than the one previously used to run it?
Make sure the OS level user that runs the broker has full access to the configured KahaDB folder on the file system.
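A quick way to check this is to list, for each file in the KahaDB folder, who owns it and whether the current user can read and write it. The following is a minimal sketch using only the standard library. The class name `KahaDbAccessCheck` is made up, the directory is passed as an argument, and the program has to be started as the same OS user that runs the broker for the result to mean anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of Fuse or ActiveMQ.
final class KahaDbAccessCheck {
    // One-line report: owner of the file and what the current user may do with it.
    static String describe(Path file) throws IOException {
        return String.format("%s owner=%s readable=%b writable=%b (running as %s)",
                file, Files.getOwner(file).getName(),
                Files.isReadable(file), Files.isWritable(file),
                System.getProperty("user.name"));
    }

    // True if the current user can both read and write the file.
    static boolean usable(Path file) {
        return Files.isReadable(file) && Files.isWritable(file);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java KahaDbAccessCheck <kahadb-directory>");
            return;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path file : files) {
                System.out.println(describe(file));
            }
        }
    }
}
```

A file such as db-32.log that shows a different owner and writable=false would match the "Permission denied" in the stack trace; fixing the ownership then has to be done by that owner or by root.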
We're using the so-called JDBC Master Slave architecture with an Oracle DB. We have 2 nodes, each with one broker. We start Broker1 (on node1) and it becomes the MASTER, obtaining the lock on the tables. Then we start Broker2 on node2, and this one starts as SLAVE. We can see in the log of the SLAVE broker that it tries to obtain the lock every 10 seconds but fails:
2013-06-12 19:32:38,714 | INFO | Default failed to acquire lease. Sleeping for 10000 milli(s) before trying again... | org.apache.activemq.store.jdbc.LeaseDatabaseLocker | main
2013-06-12 19:32:48,720 | INFO | Default Lease held by Default till Wed Jun 12 19:32:57 UTC 2013 | org.apache.activemq.store.jdbc.LeaseDatabaseLocker | main
Everything works fine until, at some point, the SLAVE's log shows that it suddenly becomes the MASTER:
2013-06-13 00:38:11,262 | INFO | Default Lease held by Default till Thu Jun 13 00:38:17 UTC 2013 | org.apache.activemq.store.jdbc.LeaseDatabaseLocker | main
2013-06-13 00:38:11,262 | INFO | Default failed to acquire lease. Sleeping for 10000 milli(s) before trying again... | org.apache.activemq.store.jdbc.LeaseDatabaseLocker | main
...
2013-06-13 00:38:21,314 | INFO | Default, becoming the master on dataSource: org.apache.commons.dbcp.BasicDataSource#9c6a99d | org.apache.activemq.store.jdbc.LeaseDatabaseLocker | main
2013-06-13 00:38:21,576 | INFO | Apache ActiveMQ 5.8.0 (Default, ID:corerec3-49774-1371083901328-0:1) is starting | org.apache.activemq.broker.BrokerService | main
2013-06-13 00:38:21,692 | WARN | Failed to start jmx connector: Cannot bind to URL [rmi://localhost:1616/jmxrmi]: javax.naming.NameAlreadyBoundException: jmxrmi [Root exception is java.rmi.AlreadyBoundException: jmxrmi]. Will restart management to re-create jmx connector, trying to remedy this issue. | org.apache.activemq.broker.jmx.ManagementContext | JMX connector
2013-06-13 00:38:21,700 | INFO | Listening for connections at: tcp://corerec3:61617?transport.closeAsync=false | org.apache.activemq.transport.TransportServerThreadSupport | main
2013-06-13 00:38:21,700 | INFO | Connector openwire Started | org.apache.activemq.broker.TransportConnector | main
2013-06-13 00:38:21,701 | INFO | Apache ActiveMQ 5.8.0 (Default, ID:corerec3-49774-1371083901328-0:1) started | org.apache.activemq.broker.BrokerService | main
2013-06-13 00:38:21,701 | INFO | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main
2013-06-13 00:38:21,701 | ERROR | Memory Usage for the Broker (512 mb) is more than the maximum available for the JVM: 245 mb | org.apache.activemq.broker.BrokerService | main
2013-06-13 00:38:22,157 | INFO | Web console type: embedded | org.apache.activemq.web.WebConsoleStarter | main
2013-06-13 00:38:22,292 | INFO | ActiveMQ WebConsole initialized. | org.apache.activemq.web.WebConsoleStarter | main
2013-06-13 00:38:22,353 | INFO | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main
Meanwhile, the MASTER's log shows no change from what it usually outputs.
So it seems that the SLAVE somehow obtains the lock (for example, due to a connection loss between the master and the DB), and unless we restart the brokers we start losing messages.
The problem is that the producers' log shows the messages being sent successfully to QueueX, but we don't see the consumers taking them from the queue.
If we go to the DB and query the ACTIVEMQ_MSGS table, we see that the messages are unprocessed.
It looks as if the broker that the producers are connected to has the lock and inserts the messages into the DB, while the broker that the consumers are connected to doesn't have the lock and can't read the tables.
I don't know if all this makes much sense, but I hope someone can shed some light on it.
I didn't want to saturate the post with the configuration details, but if you need specific details like failover config, IPs, ports etc. I will post it...
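One thing the SLAVE log above does show is the timing of the takeover. The sketch below is a deliberately simplified model of a lease check, not ActiveMQ's actual LeaseDatabaseLocker code: it assumes a waiting broker may take over once the advertised expiry is in the past, and that the log timestamps and the "held till" times are in the same time zone. The class name `LeaseCheck` and the method `leaseExpired` are made up.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical, simplified lease model for reading the log; not ActiveMQ code.
final class LeaseCheck {
    // Timestamp format of the broker log lines, e.g. 2013-06-13 00:38:21,314
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Assumed rule: the lease is free once the attempt happens after its expiry.
    static boolean leaseExpired(String attemptTime, String leaseHeldUntil) {
        LocalDateTime attempt = LocalDateTime.parse(attemptTime, LOG_TIME);
        LocalDateTime expiry = LocalDateTime.parse(leaseHeldUntil, LOG_TIME);
        return attempt.isAfter(expiry);
    }

    public static void main(String[] args) {
        // Normal operation: attempt at 19:32:48, lease held till 19:32:57
        System.out.println(leaseExpired("2013-06-12 19:32:48,720", "2013-06-12 19:32:57,000")); // false
        // Takeover: last known expiry 00:38:17, next attempt at 00:38:21
        System.out.println(leaseExpired("2013-06-13 00:38:21,314", "2013-06-13 00:38:17,000")); // true
    }
}
```

Read this way, the SLAVE became master because the lease had not been extended past 00:38:17 by the time of its next attempt, so it may be worth checking what the MASTER and its DB connection were doing in the seconds before that.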
I regularly see damaged queues in our ActiveMQ 5.4.2, meaning the queues are malformed, and I have had to remove the KahaDB files and bounce the broker to resolve this. All messages stored in the queues are lost in the process. How can I prevent these damaged queues without losing data?
The broker logs are below:
ERROR | Failed to reset batching | org.apache.activemq.store.kahadb.KahaDBStore | ActiveMQ Broker[AMQBROKER-TEST] Scheduler
java.lang.IllegalStateException: PageFile is not loaded
at org.apache.kahadb.page.PageFile.assertLoaded(PageFile.java:715)
at org.apache.kahadb.page.PageFile.tx(PageFile.java:239)
at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.resetBatching(KahaDBStore.java:510)
at org.apache.activemq.store.ProxyMessageStore.resetBatching(ProxyMessageStore.java:93)
at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.resetBatch(QueueStorePrefetch.java:85)
at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:254)
at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:108)
at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
at org.apache.activemq.broker.region.Queue.doBrowse(Queue.java:1026)
at org.apache.activemq.broker.region.Queue.expireMessages(Queue.java:783)
at org.apache.activemq.broker.region.Queue.access$100(Queue.java:83)
at org.apache.activemq.broker.region.Queue$2.run(Queue.java:123)
at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
INFO | Transport failed: java.net.SocketException: Broken pipe | org.apache.activemq.broker.TransportConnection.Transport | Async Exception Handler
WARN | Failed to register MBean: org.apache.activemq:BrokerName=AMQBROKER-TEST,Type=Queue,Destination=_ onEvent&X171249188Y1Z
INFO | Transport failed: java.net.SocketException: Broken pipe
INFO | Transport failed: java.net.SocketException: Connection reset
INFO | Transport failed: java.io.EOFException