I am trying to add a message to a queue with the Java client, but RabbitMQ keeps blocking me.
The official documentation at https://www.rabbitmq.com/disk-alarms.html says:
When free disk space drops below a configured limit (50 MB by default), an alarm will be triggered and all producers will be blocked.
My disk space looks like this:
So, I set the disk space in the config file:
disk_free_limit.absolute = 1000MB
but it does not increase the disk space. It still looks the same as above.
Also log file says this:
2022-01-17 16:17:34.538000+03:00 [info] <0.399.0> Enabling free disk space monitoring
2022-01-17 16:17:34.538000+03:00 [info] <0.399.0> Disk free limit set to 1000MB
2022-01-17 16:17:34.844000+03:00 [info] <0.399.0> Free disk space is insufficient. Free bytes: 40. Limit: 1000000000
2022-01-17 16:17:34.844000+03:00 [info] <0.223.0> Running boot step code_server_cache defined by app rabbit
2022-01-17 16:17:34.844000+03:00 [warning] <0.395.0> disk resource limit alarm set on node 'rabbit@BLG2A-V1-BB0268'.
2022-01-17 16:17:34.844000+03:00 [warning] <0.395.0>
2022-01-17 16:17:34.844000+03:00 [warning] <0.395.0> **********************************************************
2022-01-17 16:17:34.844000+03:00 [warning] <0.395.0> *** Publishers will be blocked until this alarm clears ***
2022-01-17 16:17:34.844000+03:00 [warning] <0.395.0> **********************************************************
2022-01-17 16:17:34.844000+03:00 [warning] <0.395.0>
How can I increase the disk space?
My setup:
OS: Windows 10
RabbitMQ: 3.9.12
Erlang/OTP: 24.2
The alarm is telling you that the disk RabbitMQ is trying to write to has less free space left than the configured limit (50 MB by default).
The disk_free_limit setting doesn't control how much disk is allocated; it controls how much free disk is expected. If you set it to 1000MB, the alarm will be triggered as soon as there is only 1000MB left, rather than waiting until there is only 50MB left.
Making more disk space available is the same as it would be for any other program:
Delete other things that are using up your disk space - e.g. make sure log files are compressed and deleted after a certain amount of time
Configure RabbitMQ to use a different disk or partition, if you already have one that's bigger
Install a larger disk if it's a physical host, or allocate a larger disk image if it's a VM
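Separately, a publishing client doesn't have to hang silently while the alarm is active: it can register a blocked-connection callback and a blocked-connection timeout. Below is a minimal sketch using the Python pika client (the question uses the Java client, which has an equivalent addBlockedListener); the host, queue name and timeout are assumptions.

# Minimal sketch: notice (and eventually fail) when the broker blocks the
# connection because of a resource alarm. Host/queue/timeout are assumptions.
import pika

def on_blocked(*args):
    # pika passes the connection and the Connection.Blocked method frame
    # (the exact signature varies slightly between pika versions).
    print("Broker has blocked this connection (resource alarm):", args)

def on_unblocked(*args):
    print("Broker has unblocked this connection; publishing can resume")

params = pika.ConnectionParameters(
    host="localhost",
    # Give up instead of hanging forever if the alarm never clears.
    blocked_connection_timeout=60,
)
connection = pika.BlockingConnection(params)
connection.add_on_connection_blocked_callback(on_blocked)
connection.add_on_connection_unblocked_callback(on_unblocked)

channel = connection.channel()
channel.queue_declare(queue="demo")
channel.basic_publish(exchange="", routing_key="demo", body=b"hello")
connection.close()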
This issue will be fixed in 3.9.13
https://github.com/rabbitmq/rabbitmq-server/pull/3970
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
I changed the RabbitMQ version to 3.9.11 and now it is fixed.
I guess version 3.9.12 was the problem.
I know this isn't the right place to ask this, but this site has the most users on it. I recently bought a Kemp LoadMaster LM-2600 load balancer for my web servers. However, this unit didn't include an SSD because the previous owner decided to erase it. So, I downloaded the VirtualBox version of the free VLM from Kemp's website. Then I used VBoxManage clonehd LMOS.vmdk LMOS.img --format RAW to turn the disk into a raw img file. Then I used dd if=LMOS.img of=/dev/sdb to flash a USB with the OS. Then I booted my LoadMaster with the USB.
The boot process went like normal until it finished booting, and then the machine switched to runlevel 0 (shutdown).
These are the logs I got when I plugged the USB into my computer (the log file was so big that Stack Overflow won't allow me to paste it here):
https://pastebin.com/5PbKzRi6
I noticed that it said something about eth0 being down, so I plugged in an ethernet cable and booted it again. The same thing happened, but I got a different error (the log was shorter, so I labeled it):
-- BOOT --
2022-08-07T19:50:06+00:00 lb100 syslog-ng: syslog-ng starting up; version='3.25.1'
-- ERROR --
2022-08-07T19:50:07+00:00 lb100 raid_events_handler: RAID controller not detected yet (check # 0)
-- LOGIN --
2022-08-07T19:50:11+00:00 lb100 login: pam_unix(login:session): session opened for user bal by LOGIN(uid=0)
-- ERROR --
2022-08-07T19:50:14+00:00 lb100 raid_events_handler: RAID controller not detected yet (check # 1)
-- SHUTDOWN --
2022-08-07T19:50:15+00:00 lb100 init: Switching to runlevel: 0
2022-08-07T19:50:15+00:00 lb100 kernel: S99final (938): drop_caches: 1
2022-08-07T19:50:17+00:00 lb100 syslog-ng: syslog-ng shutting down; version='3.25.1'
2022-08-07T19:50:17+00:00 lb100 kernel: Kernel logging (proc) stopped.
2022-08-07T19:50:17+00:00 lb100 kernel: Kernel log daemon terminating.
2022-08-07T19:50:17+00:00 lb100 sslproxy: (815) caught signal 15
2022-08-07T19:50:17+00:00 lb100 raid_events_handler: stop
I have no idea what to do right now. I already tried everything I knew. What should I do?
Any help would be great,
Thanks!
I use a Celery worker server with Redis as the broker (for receiving tasks) as well as the result backend.
BROKER_URL = 'redis://localhost:6379/2'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/2'
app = Celery('myceleryapp', broker=BROKER_URL,backend=CELERY_RESULT_BACKEND)
I launch the celery worker server using celery -A myceleryapp worker -l info -c 8
The worker processes start processing my tasks from the Redis queue until, at some point, I receive the infamous MISCONF Redis error and the Celery worker process terminates.
Unrecoverable error: ResponseError('MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.',)
I checked the redis log files in /var/log/redis and the tail end of the file has the following
24745:C 19 Aug 09:20:26.169 * RDB: 0 MB of memory used by copy-on-write
1590:M 19 Aug 09:20:26.247 * Background saving terminated with success
1590:M 19 Aug 09:25:27.080 * 10 changes in 300 seconds. Saving...
1590:M 19 Aug 09:25:27.081 * Background saving started by pid 25397
25397:C 19 Aug 09:25:27.082 # Write error saving DB on disk: No space left on device
1590:M 19 Aug 09:25:27.181 # Backgroun
1590:M 19 Aug 09:51:03.042 * 1 changes in 900 seconds. Saving...
1590:M 19 Aug 09:51:03.042 * Background saving started by pid 26341
26341:C 19 Aug 09:51:03.405 * DB saved on disk
26341:C 19 Aug 09:51:03.405 * RDB: 22 MB of memory used by copy-on-write
1590:M 19 Aug 09:51:03.487 * Background saving terminated with success
The dump.rdb file is being written to /var/lib/redis/dump.rdb.
Since the logs reported a No space left on device, I checked the disk space where /var is mounted and there seems to be sufficient space left (1.2GB).
How do I get to the root cause of this error if there is enough disk space? Of course, to prevent this error from happening, I could set config set stop-writes-on-bgsave-error no in redis-cli. But I want to get to the root cause of this error. Any help or pointers?
Maybe this is caused by the swap file, because the swap file took the 1.2 GB of space on your disk, so Redis complains there is no space to write.
Try the swapon -s command to check this.
I think 1.2 GB is not enough if this disk also takes RAM page swaps; you should move the RDB dir to a bigger disk.
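To dig for the root cause, it also helps to confirm from the running instance where Redis is actually writing dump.rdb, how much space is free on that filesystem, and what the last background save reported. A small sketch with redis-py (host/port/db copied from the Celery settings above; adjust if yours differ):

import shutil
import redis

r = redis.Redis(host="localhost", port=6379, db=2, decode_responses=True)

rdb_dir = r.config_get("dir")["dir"]          # e.g. /var/lib/redis
usage = shutil.disk_usage(rdb_dir)
print("RDB dir:", rdb_dir)
print("Free on that filesystem: %.1f MB" % (usage.free / 1024 / 1024))

persistence = r.info("persistence")
print("Last bgsave status:", persistence.get("rdb_last_bgsave_status"))
print("stop-writes-on-bgsave-error:", r.config_get("stop-writes-on-bgsave-error"))

If the free space looks fine when you check but saves still fail, run the check while a BGSAVE is in progress; something else (logs, a swap file as suggested above) may be filling the disk only at that moment.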
Help please, folks.
I am trying to set up my Hadoop multinode environment (1 master, 1 secondary and 3 slaves; Hadoop 2.7.1/Ubuntu 14 on AWS) and I am getting a "NameSystem.getDatanode" ERROR message. I browsed, read and tried, but I have reached my limits. Could you at least point me in some direction?
Logs (extract) from the master; xxx-141/142/143 are the IPs of the slaves:
'''''''''''''''''''''''''''''''''''
Line 134: 2016-01-23 17:36:19,432 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.143:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.
Line 135: 2016-01-23 17:36:19,457 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.
Line 159: 2016-01-23 17:36:20,988 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.141:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node XXX.XX.XX.143:50010 is expected to serve this storage.
Extract From SLAVE2 SERVER logs
'''''''''''''''''''''''''''''''''''''
2016-01-23 17:36:14,812 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
2016-01-23 17:36:18,607 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x3c90bbfe60c, containing 1 storage report(s), of which we sent 0. The reports had 1 total blocks and used 0 RPC(s). This took 4 msec to generate and 144 msecs for RPC and NN processing. Got back no commands.
2016-01-23 17:36:18,608 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000 is shutting down
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(1XX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 1XX.XX.XX.141:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:495)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1791)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1315)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:163)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28543)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy13.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:199)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:463)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:688)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
at java.lang.Thread.run(Thread.java:745)
2016-01-23 17:36:18,610 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d)
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-1050309752-MAST.XX.XX.169-1453113991010
2016-01-23 17:36:20,611 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2016-01-23 17:36:20,613 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2016-01-23 17:36:20,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ip-SLAV-XX-XX-142/1XX.XX.XX.142
************************************************************/
It looks like you have three slaves:
172.31.22.141:50010
172.31.22.142:50010
172.31.22.143:50010
and you created two of them from a clone of the first slave, after the slave was already included in the cluster.
The two clones now already have a copy of the DFS and use the same storage ID as the first slave. The namenode expects only one slave with each storage ID, and it is trying to tell you this by logging:
[...] is attempting to report storage ID [...].
Node [...]:50010 is expected to serve this storage.
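Before deleting anything, you can confirm that the slaves really do share one storage/datanode UUID by looking at the VERSION file in each datanode's data directory. A short sketch to run on each slave (the path is an assumption based on the default /tmp/hadoop-<user>/dfs layout used below):

# Print the IDs this datanode will report to the namenode; run on each slave.
version_file = "/tmp/hadoop-hadoop/dfs/data/current/VERSION"  # assumed default path

with open(version_file) as fh:
    props = dict(line.strip().split("=", 1)
                 for line in fh
                 if "=" in line and not line.startswith("#"))

print("datanodeUuid:", props.get("datanodeUuid"))
print("storageID:   ", props.get("storageID"))
# Two slaves printing the same values are the clones the namenode complains about.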
You can try removing the dfs directory on two of the slaves, then restarting them.
i.e. stop the slaves, do a rm -rf on the dfs directory like:
rm -rf /tmp/hadoop-hadoop/dfs/
You can then restart and check that all slaves are connecting and test file replication, e.g. by setting the replication level to 4 for some files like:
hdfs dfs -setrep -w 4 -R /user/somedir
The -w option causes the command to wait until replication has succeeded.
We have our hosting in AWS. Recently, after moving our blog from WordPress to AWS, we have been experiencing a noticeable delay in server response time, mainly while accessing the blog. Below are the logs from the error_log file:
[Wed Feb 25 06:10:10 2015] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Wed Feb 25 06:12:22 2015] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Wed Feb 25 06:12:36 2015] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Wed Feb 25 06:12:50 2015] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Wed Feb 25 06:13:35 2015] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[error] (12)Cannot allocate memory: fork: Unable to fork new process
[Wed Feb 25 06:27:14 2015] [error] (12)Cannot allocate memory: fork: Unable to fork new process
We increased the memory limit from 256 MB to 512 MB in the php.ini file, but the issue still exists.
We also set KeepAlive to On. That still doesn't resolve it. Any suggestions or solutions would be of great help.
I've faced that problem too, while hosting a Java app with Jenkins, MySQL & Tomcat on an Ubuntu VM on AWS.
At first I worked around the problem by restarting the VM.
AWS doesn't give you swap space on the hard drive by default, so you'd better set it up yourself; how to do this you can find here. Worth mentioning: the swap-partition solution (I have no idea why) didn't work for me, and I had to create a swap file instead.
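For reference, this is the usual sequence for adding a swap file on such an instance, wrapped in a small Python/subprocess sketch (the 1G size and the /swapfile path are assumptions; run it as root):

import subprocess

SWAPFILE = "/swapfile"   # assumed path
SIZE = "1G"              # assumed size; pick something sensible for the instance

for cmd in (["fallocate", "-l", SIZE, SWAPFILE],  # pre-allocate the file
            ["chmod", "600", SWAPFILE],           # swap must not be world-readable
            ["mkswap", SWAPFILE],                 # format it as swap space
            ["swapon", SWAPFILE]):                # enable it immediately
    subprocess.run(cmd, check=True)

# To keep the swap file across reboots, add this line to /etc/fstab:
#   /swapfile none swap sw 0 0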
Good luck to you!
I had the same problem. To fix it there are two options:
1. Move from micro instances to small; this was the change that solved the problem (micro instances on Amazon tend to have large CPU steal time).
2. Tune the MySQL database server configuration and the Apache configuration to use a lot less memory.
A tuning guide for a low-memory situation such as this one: http://www.narga.net/optimizing-apachephpmysql-low-memory-server/
(But don't use its suggestion of MyISAM tables - horrible...)
These two options will make the problem happen much less often.
I am still looking for a better solution that closes the processes that are done and kills the ones that hang around.
Change Apache's prefork MPM settings in httpd.conf.
These are the values I ended up using:
StartServers 1
MinSpareServers 1
MaxSpareServers 5
ServerLimit 16
MaxClients 16
MaxRequestsPerChild 0
ListenBacklog 100
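The reasoning behind keeping these values so low: each prefork child with mod_php loaded typically holds tens of MB of resident memory, so MaxClients (plus MySQL and the OS) has to fit in the instance's RAM, otherwise the kernel cannot fork new processes and you get exactly the error above. A rough back-of-the-envelope sketch (the per-child and reserved figures are assumptions; measure yours with ps or top):

# Rough prefork sizing; all figures in MB and assumptions for a small instance.
total_ram = 1024    # e.g. a small EC2 instance
reserved  = 400     # OS + MySQL + everything that is not Apache
per_child = 35      # typical RSS of one prefork child with PHP loaded

max_clients = (total_ram - reserved) // per_child
print("Keep MaxClients at or below", max_clients)   # -> 17 with these figures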
Then, try to deactivate some Apache/PHP modules with
sudo a2dismod name_of_module
I just installed OpenStack Juno using devstack and observed that RabbitMQ (package rabbitmq-server-3.1.5-10, installed by yum) is not stable, i.e. it quickly eats up the memory and shuts down; there is 2 GB of RAM. Below are the messages from the logs and 'systemctl status' before the daemon died:
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
vm_memory_high_watermark clear. Memory used:835116352 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:40 ===
memory resource limit alarm cleared on node rabbit@node
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
accepting AMQP connection <0.27011.5> (10.0.0.11:55198 -> 10.0.0.11:5672)
=INFO REPORT==== 18-Dec-2014::01:25:41 ===
vm_memory_high_watermark set. Memory used:850213192 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:41 ===
memory resource limit alarm set on node rabbit@node.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
rabbitmqctl[770]: ===========
rabbitmqctl[770]: nodes in question: [rabbit@node]
rabbitmqctl[770]: hosts, their running nodes and ports:
rabbitmqctl[770]: - node: [{rabbitmqctl770,40089}]
rabbitmqctl[770]: current node details:
rabbitmqctl[770]: - node name: rabbitmqctl770@node
rabbitmqctl[770]: - home dir: /var/lib/rabbitmq
rabbitmqctl[770]: - cookie hash: FftrRFUESg4RKWsyb1cPqw==
systemd[1]: rabbitmq-server.service: control process exited, code=exited status=2
systemd[1]: Unit rabbitmq-server.service entered failed state.
I know about set_vm_memory_high_watermark, but it doesn't solve the issue. I want to ensure that the daemon doesn't shut down abruptly. Has anyone seen this before and could advise?
Thanks.
UPDATE
Upgraded to version 3.4.2, taken directly from www.rabbitmq.com/download.html. The new version doesn't consume RAM that fast and tends to work longer than the previous version, but eventually it still eats up all the memory and shuts down.
I think the number of connections to the server is increasing and they are being held open without being closed; that's why it is consuming more memory. When RAM usage goes beyond the watermark, the RabbitMQ server won't accept any network requests. Either you close the connections that are being left open, or you increase the RAM of the system. But increasing the RAM will only reduce the problem for a while and you'll face it again, so it is better to close the connections.
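One way to check whether connections really are piling up is the management plugin's HTTP API. A small sketch (assumes the management plugin is enabled; port 15672 and guest/guest are the defaults and may differ in your deployment):

import requests

resp = requests.get("http://localhost:15672/api/connections", auth=("guest", "guest"))
resp.raise_for_status()
connections = resp.json()

print("open connections:", len(connections))
for conn in connections:
    # Each entry describes one client connection held open against the broker.
    print(conn.get("name"), conn.get("state"), "channels:", conn.get("channels"))

If that number keeps growing while the OpenStack services run, the services are leaking connections, and raising the watermark or adding RAM will only delay the next shutdown.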
Try using CloudAMQP instead of installing locally; that should fix it.
First, create a RabbitMQ account here: https://customer.cloudamqp.com/signup.
Then create your queue there and connect your application to it.
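If you go that route, connecting is just a matter of using the AMQP URL shown in the CloudAMQP console. A minimal sketch with the Python pika client (the URL and queue name below are placeholders, not real credentials):

import pika

# Placeholder URL; copy the real one from the CloudAMQP console.
url = "amqps://user:password@host.rmq.cloudamqp.com/vhost"

params = pika.URLParameters(url)
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)   # hypothetical queue name
channel.basic_publish(exchange="", routing_key="tasks", body=b"hello from the app")
connection.close()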